Map Reduce
MapReduce is a design pattern suitable when you have either:
- Large input data (e.g., multiple files to process), or
- Large output data (e.g., multiple forms to fill)
and there is a logical way to break the task into smaller, ideally independent parts. You first break down the task using BatchNode in the map phase, followed by aggregation in the reduce phase.
Example: Document Summarization
class MapSummaries(BatchNode):
def prep(self, shared): return [shared["text"][i:i+10000] for i in range(0, len(shared["text"]), 10000)]
def exec(self, chunk): return call_llm(f"Summarize this chunk: {chunk}")
def post(self, shared, prep_res, exec_res_list): shared["summaries"] = exec_res_list
class ReduceSummaries(Node):
def prep(self, shared): return shared["summaries"]
def exec(self, summaries): return call_llm(f"Combine these summaries: {summaries}")
def post(self, shared, prep_res, exec_res): shared["final_summary"] = exec_res
# Connect nodes
map_node = MapSummaries()
reduce_node = ReduceSummaries()
map_node >> reduce_node
# Create flow
summarize_flow = Flow(start=map_node)
summarize_flow.run(shared)