API Reference

Pilot-MapReduce

MapReduce

class pmr.MapReduce(pmrDesc, coordinationUrl)[source]

MapReduce: Class for managing MapReduce Jobs

__init__(pmrDesc, coordinationUrl)[source]

Initializes MapReduce with the Pilot Compute/Data description and the coordination system URL

startPilot()[source]

Starts the pilot compute and data services

stopPilot()[source]

Stops the pilot compute and data services

setNbrReduces(nbrReduces)[source]

Sets the number of Reduce tasks for the MapReduce job

@param nbrReduces: number of Reduce tasks, as an integer

setChunk(chunkDesc)[source]

Registers the chunk task description

@param chunkDesc: SAGA Job Description of the chunk task

setMapper(mapDesc)[source]

Registers the Map task description

@param mapDesc: SAGA Job Description of the Map task

setReducer(reduceDesc)[source]

Registers the Reduce task description

@param reduceDesc: SAGA Job Description of the Reduce task

setOutputPath(path)[source]

Sets the output path where the final results of the MapReduce job are stored

@param path: output path for the final results

getPilotComputes()[source]

Returns the Pilot Computes

getDetails()[source]

Returns the execution times of the MapReduce phases

@return: dictionary with execution timing details of MapReduce phases

chunkOnly(inputDu)[source]

Executes the chunk Job only

@param inputDu: input Data Units

@return: List of chunk task output Data Units

mapOnly(chunkDus)[source]

Executes the Map Job only

@param chunkDus: chunk/split Data Units as input

@return: List of Map task output Data Units

submitComputeUnit(desc)[source]

Submits a SAGA Job description to the Pilot

@param desc: SAGA Job description of the compute task

submitDataUnit(desc)[source]

Submits a Data Unit description to the Pilot

@param desc: Data Unit description

reduceOnly(mapDus)[source]

Executes the Reduce job only

@param mapDus: Map task output Data Units as input

@return: Output Data Unit

runJob()[source]

Executes the entire MapReduce workflow
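
A minimal end-to-end sketch of the lifecycle above, assuming a Redis coordination backend and dictionary-based descriptions; every field name in pmr_desc and the task descriptions below is a hypothetical placeholder, not a documented schema:

    from pmr import MapReduce

    # Hypothetical Pilot Compute/Data description; the real schema is
    # deployment-specific.
    pmr_desc = {
        "resource_url": "fork://localhost",
        "number_of_processes": 4,
        "working_directory": "/tmp/pmr",
    }
    coordination_url = "redis://localhost:6379"  # assumed coordination backend

    mr = MapReduce(pmr_desc, coordination_url)
    mr.startPilot()

    # Hypothetical SAGA-style job descriptions for the three phases.
    mr.setChunk({"executable": "/usr/bin/python", "arguments": ["chunk.py"]})
    mr.setMapper({"executable": "/usr/bin/python", "arguments": ["mapper.py"]})
    mr.setReducer({"executable": "/usr/bin/python", "arguments": ["reducer.py"]})
    mr.setNbrReduces(2)
    mr.setOutputPath("/tmp/pmr/output")

    mr.runJob()             # runs chunk -> map -> reduce
    print(mr.getDetails())  # per-phase timing details
    mr.stopPilot()

When finer control is needed, the phase-wise calls can be chained the same way: chunkOnly() feeds mapOnly(), whose output Data Units go to reduceOnly().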

Mapper

class pmr.Mapper(args)[source]

Mapper: Class for managing the Map phase of a MapReduce Job

__init__(args)[source]

Initializes the Map task with parameters passed by the MapReduce framework on the command line.

partition(key)[source]

Default partition function; it can be overridden by a custom partition function.

@param key: key emitted by the map task

@return: number of the partition to which the key/value pair is written

emit(key, value)[source]

Emits the key/value pair to the partition selected by the partition function

finalize()[source]

Prepares the map output files and sorts their contents
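
A hedged sketch of a word-count mapper built on this interface; the subclassing pattern, the run() entry point, and the nbr_reduces attribute are assumptions for illustration, while partition(), emit(), and finalize() come from the reference above:

    import sys

    from pmr import Mapper

    class WordCountMapper(Mapper):
        def partition(self, key):
            # Override the default partitioner: spread keys across reducers
            # by hash. self.nbr_reduces is a hypothetical attribute holding
            # the configured number of Reduce tasks.
            return hash(key) % self.nbr_reduces

        def run(self, lines):
            # run() is illustrative, not part of the documented interface.
            for line in lines:
                for word in line.split():
                    self.emit(word, 1)  # routed via partition(word)
            self.finalize()             # sort and write the map output files

    if __name__ == "__main__":
        # The framework passes its parameters on the command line.
        WordCountMapper(sys.argv[1:]).run(sys.stdin)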

Reducer

class pmr.Reducer(args)[source]

Reducer: Class for managing the Reduce phase of a MapReduce Job

__init__(args)[source]

Initializes the Reduce task with parameters passed by the MapReduce framework on the command line.

emit(key, value)[source]

Emits the key/value pair to the reduce output file

finalize()[source]

Closes the reduce output file
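
A matching word-count reducer sketch; run() and the tab-separated input format are assumptions, while emit() and finalize() come from the reference above:

    import sys

    from pmr import Reducer

    class WordCountReducer(Reducer):
        def run(self, lines):
            # Assume the map phase produced sorted "key<TAB>value" lines.
            counts = {}
            for line in lines:
                word, value = line.rstrip("\n").split("\t")
                counts[word] = counts.get(word, 0) + int(value)
            for word, total in counts.items():
                self.emit(word, total)  # write key/value to the reduce file
            self.finalize()             # close the reduce output file

    if __name__ == "__main__":
        WordCountReducer(sys.argv[1:]).run(sys.stdin)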

Hadoop

class pmr.Hadoop(pmrDesc, coordinationUrl)[source]

Hadoop: Class for managing Hadoop Jobs

setUpCluster()[source]

Sets up the Hadoop cluster

submitJob(desc)[source]

Submits a Hadoop Job description

stopCluster()[source]

Tears down the Hadoop cluster
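
A sketch of the three-call cluster lifecycle; the description fields are hypothetical placeholders, as in the MapReduce example above:

    from pmr import Hadoop

    pmr_desc = {"resource_url": "fork://localhost"}  # hypothetical field
    coordination_url = "redis://localhost:6379"      # assumed backend

    hadoop = Hadoop(pmr_desc, coordination_url)
    hadoop.setUpCluster()

    # Hypothetical job description; the actual schema is deployment-specific.
    hadoop.submitJob({
        "jar": "hadoop-examples.jar",
        "arguments": ["wordcount", "input/", "output/"],
    })

    hadoop.stopCluster()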

Yarn

class pmr.Yarn(pmrDesc, coordinationUrl)[source]

Yarn: Class for managing Yarn Jobs

setUpCluster()[source]

Sets up the Yarn cluster

submitJob(desc)[source]

Submits a Yarn Job description

stopCluster()[source]

Tears down the Yarn cluster

Spark

class pmr.Spark(pmrDesc, coordinationUrl)[source]

Spark: Class for managing Spark clusters

setUpCluster()[source]

Sets up the Spark cluster

submitJob(desc)[source]

Submits a Spark Job description

stopCluster()[source]

Tears down the Spark cluster
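
Yarn and Spark expose the same setUpCluster/submitJob/stopCluster lifecycle as Hadoop. A hedged Spark sketch, again with hypothetical description fields:

    from pmr import Spark

    pmr_desc = {"resource_url": "fork://localhost"}  # hypothetical field
    coordination_url = "redis://localhost:6379"      # assumed backend

    spark = Spark(pmr_desc, coordination_url)
    spark.setUpCluster()
    spark.submitJob({"executable": "spark-submit",
                     "arguments": ["app.py"]})       # hypothetical schema
    spark.stopCluster()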