Generative InputFormats for MapReduce

Gluster

2013-10-10

InputFormats in Hadoop are commonly used to abstract the process of reading input records for mappers. Here's how they work:

1) The InputFormat itself is specified at runtime, as part of the job configuration.

2) The InputFormat, through its RecordReader, provides an iterator-like API:

– nextKeyValue (returns a boolean)

3) The InputFormat class also provides the RecordReader and the InputSplits to the higher-level MapReduce framework, which creates the mappers and feeds individual records to them.
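The iterator-like contract described above can be sketched in plain Java. This is only an illustration: `SimpleRecordReader`, `ListRecordReader` and `ReaderLoopDemo` are hypothetical stand-ins written for this post, not Hadoop classes, although the method names mirror those of Hadoop's `RecordReader`. The `drain` method plays the role of the framework, calling `nextKeyValue` until it returns false.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the reading contract; not Hadoop's actual class.
interface SimpleRecordReader<K, V> {
    boolean nextKeyValue();   // advance to the next record; false once exhausted
    K getCurrentKey();
    V getCurrentValue();
}

// Reads records from an in-memory list, keyed by their position in the list.
class ListRecordReader implements SimpleRecordReader<Integer, String> {
    private final List<String> lines;
    private int index = -1;

    ListRecordReader(List<String> lines) {
        this.lines = lines;
    }

    @Override
    public boolean nextKeyValue() {
        if (index + 1 >= lines.size()) {
            return false;
        }
        index++;
        return true;
    }

    @Override
    public Integer getCurrentKey() {
        return index;
    }

    @Override
    public String getCurrentValue() {
        return lines.get(index);
    }
}

class ReaderLoopDemo {
    // Drains a reader the way a mapper's run loop would, collecting "key=value" strings.
    static <K, V> List<String> drain(SimpleRecordReader<K, V> reader) {
        List<String> seen = new ArrayList<>();
        while (reader.nextKeyValue()) {
            seen.add(reader.getCurrentKey() + "=" + reader.getCurrentValue());
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints [0=a, 1=b, 2=c]
        System.out.println(drain(new ListRecordReader(Arrays.asList("a", "b", "c"))));
    }
}
```

The point of the boolean return is that the consumer never needs to know how many records exist up front; it just keeps asking until the reader says it is done.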

The most common InputFormat is FileInputFormat, which provides a series of InputSplits that collectively represent a whole file.
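To make "collectively represent a whole file" concrete, here is a deliberately simplified sketch that cuts a file length into contiguous byte ranges. `ByteRangeSplits` is a hypothetical helper, and the fixed `splitSize` is an assumption; Hadoop's real split computation also takes block sizes and other settings into account, which this ignores.

```java
import java.util.ArrayList;
import java.util.List;

class ByteRangeSplits {
    // Returns {start, length} pairs that together cover the bytes [0, fileLength).
    static List<long[]> split(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            // The last range may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - start);
            ranges.add(new long[] { start, length });
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] range : split(250, 100)) {
            System.out.println("start=" + range[0] + " length=" + range[1]);
        }
    }
}
```

For a 250-byte file and a split size of 100, this yields three ranges of 100, 100 and 50 bytes, with no gaps or overlaps between them.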

So – what if you want to generate input on the fly?

In this case, we can create our own custom input format, which keeps returning key-value pairs until it has produced as many as we want. The number of pairs returned can be read from a configuration parameter if we want.

Here’s an example:
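What follows is a minimal sketch of the idea in plain Java rather than a drop-in Hadoop class. `GeneratedSplit`, `GeneratingReader` and `GenerativeInput` are hypothetical stand-ins for an InputSplit, a RecordReader and an InputFormat; the configuration keys are made up for illustration, and a plain `Map` stands in for the job configuration. A real implementation would extend Hadoop's own classes instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an input split: no file offsets, only the first
// key to emit and how many records to synthesize.
class GeneratedSplit {
    final long firstKey;
    final long recordCount;

    GeneratedSplit(long firstKey, long recordCount) {
        this.firstKey = firstKey;
        this.recordCount = recordCount;
    }
}

// Hypothetical stand-in for a record reader that fabricates its records
// instead of reading them from storage.
class GeneratingReader {
    private final GeneratedSplit split;
    private long produced = 0;
    private long currentKey;
    private String currentValue;

    GeneratingReader(GeneratedSplit split) {
        this.split = split;
    }

    boolean nextKeyValue() {
        if (produced >= split.recordCount) {
            return false; // this split has generated everything it was asked for
        }
        currentKey = split.firstKey + produced;
        currentValue = "record-" + currentKey;
        produced++;
        return true;
    }

    long getCurrentKey() {
        return currentKey;
    }

    String getCurrentValue() {
        return currentValue;
    }
}

class GenerativeInput {
    // Hypothetical configuration keys, invented for this sketch.
    static final String NUM_SPLITS = "generative.input.splits";
    static final String RECORDS_PER_SPLIT = "generative.input.records.per.split";

    // One split per requested mapper; each split covers its own key range.
    static List<GeneratedSplit> getSplits(Map<String, String> conf) {
        int splits = Integer.parseInt(conf.getOrDefault(NUM_SPLITS, "1"));
        long perSplit = Long.parseLong(conf.getOrDefault(RECORDS_PER_SPLIT, "10"));
        List<GeneratedSplit> result = new ArrayList<>();
        for (int i = 0; i < splits; i++) {
            result.add(new GeneratedSplit(i * perSplit, perSplit));
        }
        return result;
    }

    static GeneratingReader createRecordReader(GeneratedSplit split) {
        return new GeneratingReader(split);
    }

    // Drives every split to exhaustion and counts the generated records.
    static long countRecords(Map<String, String> conf) {
        long count = 0;
        for (GeneratedSplit split : getSplits(conf)) {
            GeneratingReader reader = createRecordReader(split);
            while (reader.nextKeyValue()) {
                count++;
            }
        }
        return count;
    }
}
```

With the split count set to 3 and the records per split set to 4, `countRecords` returns 12, and the second split starts at key 4. Because each split only stores a starting key and a count, the number of mappers and the volume of generated input are both controlled purely by configuration.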
