API Reference

Pilot-MapReduce

MapReduce

class pmr.MapReduce(pmrDesc, coordinationUrl)[source]

MapReduce: Class for managing MapReduce Jobs

__init__(pmrDesc, coordinationUrl)[source]

Initializes MapReduce with the Pilot Compute/Data description and the coordination system URL

startPilot()[source]

Starts the pilot compute and data services

stopPilot()[source]

Stops the pilot compute and data services

setNbrReduces(nbrReduces)[source]

Sets the number of Reduce tasks for the MapReduce job

@param nbrReduces: number of Reduce tasks, as an integer

setChunk(chunkDesc)[source]

Registers the chunk task description

@param chunkDesc: SAGA Job Description of the chunk task

setMapper(mapDesc)[source]

Registers the Map task description

@param mapDesc: SAGA Job Description of the Map task

setReducer(reduceDesc)[source]

Registers the Reduce task description

@param reduceDesc: SAGA Job Description of the Reduce task

setOutputPath(path)[source]

Sets the output path where the final results of the MapReduce job are stored

@param path: output path for the final results

getPilotComputes()[source]

Returns the Pilot Computes

getDetails()[source]

Returns the execution times of the MapReduce phases

@return: dictionary with execution timing details of MapReduce phases

chunkOnly(inputDu)[source]

Executes the chunk Job only

@param inputDu: input Data Units

@return: List of chunk task output Data Units

mapOnly(chunkDus)[source]

Executes the Map Job only

@param chunkDus: chunk/split Data Units as input

@return: List of Map task output Data Units

submitComputeUnit(desc)[source]

Submits a SAGA Job description to the Pilot

@param desc: SAGA Job description of the compute task

submitDataUnit(desc)[source]

Submits a Data Unit description to the Pilot

@param desc: Data Unit description

reduceOnly(mapDus)[source]

Executes the Reduce job only

@param mapDus: Map task output Data Units as input

@return: Output Data Unit

runJob()[source]

Executes the entire MapReduce workflow
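
A minimal end-to-end sketch of the lifecycle above, assuming a Redis coordination backend and dictionary-based descriptions; every field name in pmr_desc and the task descriptions below is a hypothetical placeholder, not a documented schema:

    from pmr import MapReduce

    # Hypothetical Pilot Compute/Data description; the real schema is
    # deployment-specific.
    pmr_desc = {
        "resource_url": "fork://localhost",
        "number_of_processes": 4,
        "working_directory": "/tmp/pmr",
    }
    coordination_url = "redis://localhost:6379"  # assumed coordination backend

    mr = MapReduce(pmr_desc, coordination_url)
    mr.startPilot()

    # Hypothetical SAGA-style job descriptions for the three phases.
    mr.setChunk({"executable": "/usr/bin/python", "arguments": ["chunk.py"]})
    mr.setMapper({"executable": "/usr/bin/python", "arguments": ["mapper.py"]})
    mr.setReducer({"executable": "/usr/bin/python", "arguments": ["reducer.py"]})
    mr.setNbrReduces(2)
    mr.setOutputPath("/tmp/pmr/output")

    mr.runJob()             # runs chunk -> map -> reduce
    print(mr.getDetails())  # per-phase timing details
    mr.stopPilot()

When finer control is needed, the phase-wise calls can be chained the same way: chunkOnly() feeds mapOnly(), whose output Data Units go to reduceOnly().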

Mapper

class pmr.Mapper(args)[source]

Mapper: Class for managing the Map phase of a MapReduce Job

__init__(args)[source]

Initializes the Map task with parameters passed by the MapReduce framework on the command line.

partition(key)[source]

Default partition function; it can be overridden by a custom partition function.

@param key: key emitted by the map task

@return: number of the partition to which the key/value pair is written

emit(key, value)[source]

Emits the key/value pair to the partition selected by the partition function

finalize()[source]

Prepares the map output files and sorts their contents
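
A hedged sketch of a word-count mapper built on this interface; the subclassing pattern, the run() entry point, and the nbr_reduces attribute are assumptions for illustration, while partition(), emit(), and finalize() come from the reference above:

    import sys

    from pmr import Mapper

    class WordCountMapper(Mapper):
        def partition(self, key):
            # Override the default partitioner: spread keys across reducers
            # by hash. self.nbr_reduces is a hypothetical attribute holding
            # the configured number of Reduce tasks.
            return hash(key) % self.nbr_reduces

        def run(self, lines):
            # run() is illustrative, not part of the documented interface.
            for line in lines:
                for word in line.split():
                    self.emit(word, 1)  # routed via partition(word)
            self.finalize()             # sort and write the map output files

    if __name__ == "__main__":
        # The framework passes its parameters on the command line.
        WordCountMapper(sys.argv[1:]).run(sys.stdin)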

Reducer

class pmr.Reducer(args)[source]

Reducer: Class for managing the Reduce phase of a MapReduce Job

__init__(args)[source]

Initializes the Reduce task with parameters passed by the MapReduce framework on the command line.

emit(key, value)[source]

Emits the key/value pair to the reduce output file

finalize()[source]

Closes the reduce output file
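
A matching word-count reducer sketch; run() and the tab-separated input format are assumptions, while emit() and finalize() come from the reference above:

    import sys

    from pmr import Reducer

    class WordCountReducer(Reducer):
        def run(self, lines):
            # Assume the map phase produced sorted "key<TAB>value" lines.
            counts = {}
            for line in lines:
                word, value = line.rstrip("\n").split("\t")
                counts[word] = counts.get(word, 0) + int(value)
            for word, total in counts.items():
                self.emit(word, total)  # write key/value to the reduce file
            self.finalize()             # close the reduce output file

    if __name__ == "__main__":
        WordCountReducer(sys.argv[1:]).run(sys.stdin)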

Hadoop

class pmr.Hadoop(pmrDesc, coordinationUrl)[source]

Hadoop: Class for managing Hadoop Jobs

setUpCluster()[source]

Sets up the Hadoop cluster

submitJob(desc)[source]

Submits a Hadoop Job description

stopCluster()[source]

Tears down the Hadoop cluster
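
A sketch of the three-call cluster lifecycle; the description fields are hypothetical placeholders, as in the MapReduce example above:

    from pmr import Hadoop

    pmr_desc = {"resource_url": "fork://localhost"}  # hypothetical field
    coordination_url = "redis://localhost:6379"      # assumed backend

    hadoop = Hadoop(pmr_desc, coordination_url)
    hadoop.setUpCluster()

    # Hypothetical job description; the actual schema is deployment-specific.
    hadoop.submitJob({
        "jar": "hadoop-examples.jar",
        "arguments": ["wordcount", "input/", "output/"],
    })

    hadoop.stopCluster()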

Yarn

class pmr.Yarn(pmrDesc, coordinationUrl)[source]

Yarn: Class for managing Yarn Jobs

setUpCluster()[source]

Sets up the Yarn cluster

submitJob(desc)[source]

Submits a Yarn Job description

stopCluster()[source]

Tears down the Yarn cluster

Spark

class pmr.Spark(pmrDesc, coordinationUrl)[source]

Spark: Class for managing Spark clusters

setUpCluster()[source]

Sets up the Spark cluster

submitJob(desc)[source]

Submits a Spark Job description

stopCluster()[source]

Tears down the Spark cluster
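
Yarn and Spark expose the same setUpCluster/submitJob/stopCluster lifecycle as Hadoop. A hedged Spark sketch, again with hypothetical description fields:

    from pmr import Spark

    pmr_desc = {"resource_url": "fork://localhost"}  # hypothetical field
    coordination_url = "redis://localhost:6379"      # assumed backend

    spark = Spark(pmr_desc, coordination_url)
    spark.setUpCluster()
    spark.submitJob({"executable": "spark-submit",
                     "arguments": ["app.py"]})       # hypothetical schema
    spark.stopCluster()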