Reduce job pending in HFileOutputFormat - hadoop

I am using
Hbase:0.92.1-cdh4.1.2, and
Hadoop:2.0.0-cdh4.1.2
I have a mapreduce program that will load data from HDFS to HBase using HFileOutputFormat in cluster mode.
In that mapreduce program i'm using HFileOutputFormat.configureIncrementalLoad() to bulk load a 800000 record
data set which is of 7.3GB size and it is running fine, but it's not running for 900000 record data set which is of 8.3GB.
In the case of 8.3GB data my mapreduce program have 133 maps and one reducer,all maps completed successfully.My reducer status is always in Pending for a long time. There is nothing wrong with the cluster since other jobs are running fine and this job also running fine upto 7.3GB of data.
What could i be doing wrong?
How do I fix this issue?

I ran into the same problem. Looking at the DataTracker logs, I noticed there was not enough free space for the single reducer to run on any of my nodes:
2013-09-15 16:55:19,385 WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node tracker_slave01.mydomain.com:localhost/127.0.0.1:43455 has 503,777,017,856 bytes free; but we expect reduce input to take 978136413988
This 503gb refers to the free space available on one of the hard drives on the particular slave ("tracker_slave01.mydomain.com"), thus the reducer apparently needs to copy all the data to a single drive.
The reason this happens is your table only has one region when it is brand new. As data is inserted into that region, it'll eventually split on its own.
A solution to this is to pre-create your regions when creating your table. The Bulk Loading Chapter in the HBase book discusses this, and presents two options for doing this. This can also be done via the HBase shell (see create's SPLITS argument I think). The challenge though is defining your splits such that the regions get an even distribution of keys. I've yet to solve this problem perfectly, but here's what I'm doing currently:
HTableDescriptor desc = new HTableDescriptor();
desc.setName(Bytes.toBytes(tableName));
desc.addFamily(new HColumnDescriptor("my_col_fam"));
admin.createTable(desc, Bytes.toBytes(0), Bytes.toBytes(2147483647), 100);
An alternative solution would be to not use configureIncrementalLoad, and instead: 1) just generate your HFile's via MapReduce w/ no reducers; 2) use completebulkload feature in hbase.jar to import your records to HBase. Of course, I think this runs into the same problem with regions, so you'll want to create the regions ahead of time too (I think).

Your job is running with single reduces, means 7GB data getting processed on single task.
The main reason of this is HFileOutputFormat starts reducer that sorts and merges data to be loaded in HBase table.
here, Num of Reducer = num of regions in HBase table
Increase the number of regions and you will achieve parallelism in reducers. :)
You can get more details here:
http://databuzzprd.blogspot.in/2013/11/bulk-load-data-in-hbase-table.html

Related

HBase aggregation, Get And Put operation, Bulk Operation

I would like to know how can I map a value of a key.
I know that it can be done with Get and then Put operations. Is there any other way to do it efficiently? 'checkAndPut' is not ver helpful
can it be done with something like :
(key,value) => value+g()
I have read the book HBase the Definitive Guide and it seems like Map Reduce Job interpreted to Put/Get operations on top of HBase. Does it means that it is not a 'Bulk Operation' (since it's an operation per key) ?
How /Does Spark relevant here ?
HBase has scans (1) to retrieve multiple rows; and MapReduce jobs can and do use this command (2).
For HBase 'bulk' is mostly [or solely] is 'bulk load'/'bulk import' where one adds data via constructing HFiles and 'injecting' them to HBase cluster (as opposed to PUT-s) (3).
Your task can be implemented as a MapReduce Job as well as a Spark app (4 being one of examples, maybe not the best one), or a Pig script, or a Hive query if you use HBase table from Hive (5); pick your poison.
If you set up a Table with a counter then you can use an Increment to add a certain amount to the existing value in an atomic operation.
From a MapReduce job you would aggregate your input in micro batches (wherever you have your incremental counts), group them by key/value, sum them up, and then issue a Put from your job (1 Put per key).
What I mentioned above is not a 'bulk' operation but it would probably work just fine if the amount of rows that you modify in each batch is relatively small compared to the total number or rows in your table.
IFF you expect to modify your entire table at each batch then you should look at Bulk Loads. This will require you to write a job that reads your existing values in HBase, your new values from the incremental sources, add them together, and write them back to HBase (In a 'bulk load' fashion, not directly)
A Bulk Load writes HFiles directly to HDFS without going through the HBase 'write pipeline' (Memstore, minor compactions, major compactions, etc), and then issue a command to swap the existing files with the new ones. The swap is FAST! Note, you could also generate the new HFile outside the HBase cluster (not to overload it) and then copy them over and issue the swap command.

Is there an Alternative for HBaseStorage in PIG

I am using HBaseStorage with -caching option in pig script as follows
HBaseStorage('countDetails:ansCount countDetails:divCount countDetails:unansCount countDetails:engCount countDetails:ineffCount countDetails:totalCount', '-caching 1000');
I can see this was reflecting in my job.xml
but I can see there is no time difference in it I am processing 10 million records and store data around 160mb in to HBase.
When I store the result in hdfs its taking 3 mins to process the same job takes 30mins to store into HBase.
I even tried by setting
SET hbase.client.scanner.caching 1000;
Please let me know how can I reduce the time.
Is there any alternative for HBaseStorage?
http://apmblog.compuware.com/2013/02/19/speeding-up-a-pighbase-mapreduce-job-by-a-factor-of-15/
the above blog says that I have to set hbase.client.scanner.caching in bootstrap scrip
I don't know how to do that
will it be enough If I set it in Hbase-conf.
Please help me out of this
hbase.client.scanner.caching points to number of rows that will be fetched when calling next on a scanner if it is not served from (local, client) memory.
Higher caching values will enable faster scanners but will eat up more memory and some calls of next may take longer and longer time when the cache is empty. Do not set this value such that the time between invocations is greater than the scanner timeout;
i.e. hbase.regionserver.lease.period This property is 1 min by default. Clients must
report in within this period else they are considered dead.
In my experience HBase doesn't perform very well with Pig. It you don't have requirement for random look-up then use only HDFS otherwie HBase MR job would be better option. Also, In Hadoop MR job, you can connect to Hbase(This option gave me the best performance).

Is there any way to control in Hadoop MapReduce framework on which node reducer will be started?

shortly speaking I need a way to give Hadoop MapRedice API hint on what host I'd like to run certain reducer based on its partition. Is there any way?
Somewhat longer story:
I have few mapper tasks which generate (or import from another source) records for certain HBase table. Emitted records have ImmutableBytesWritable as keys. Number of reducers for this job exactly matches number of table regions and custom partitioner is used to distribute records so records of every region gets to appropriate reducer.
Reducers are intended to generate HFile images, one image per region so later bulk load could be used on them. The only serious problem here is I'd like reducers at least to 'try to run' on the same hosts appropriate region servers are running. This is to get good probability of generated HFiles locality (in terms of HDFS) for appropriate HBase region servers.
Any idea how to get this behavior?
Alternative could be how to 'request' HDFS file to 'get local'. Having this I could start another MR job with mappers bound to region servers (through splits) and request corresponding HFile to get local.
There is no out-of-box way to do this yet, short of writing a custom scheduler, which would be an overkill.
An upstream ticket does track this feature request at https://issues.apache.org/jira/browse/MAPREDUCE-199.

HBase as Input -> unable to balance load over available map tasks

I want each hadoop mapper to process a separate portion of data at a M/R job and I would like to test on a pseudo-distributed (single-node) setup the case where many mappers would be necessary to exist as a result of a bigger input-data size. Given the size of my current input and the standalone mode I am experimenting on, I can only see 1 map task.
My input comes from an hbase table and I thought that the number of regions per hbase table is equal to the number of mappers used to process the table's data.
So, as to reproduce a case where many mappers would process the input data, I predefined regions of table through shell like this :
create 't1', 'f1', {NUMREGIONS => 4, SPLITALGO => 'HexStringSplit'}
or setting 'UniformSplit' as SPLITALGO, but even if mappers indeed increase to the specified number of regions (after importing data to the respective table), all the input data (at a subsequent test job where I try to read from this table) pass through only one mapper - with the others processing none of the input rows.
I work on a pseudo-distributed (single-node) setup and I really don't know how to solve this. Does anyone have any ideas? Thanks!
Are you scanning the entire table or just a section of it? If you are scanning a section of the table, then that might be the cause of your problem as your data source isn't big enough to trigger multiple mappers.
You can try to decrease the region size in your hbase-size.xml configuration and restart hbase to achieve the desired effect.
Lastly, in your mapred-site.xml configuration, how many mapper slots do you have? If it is just 1, this will not limit the number of map jobs, but it will limit the number of map jobs that can be run at a time on that server.
Other than that, I don't think you have much control over specifying the number of mappers per job- not like you do with the number of reducers.

Want to compare two consecutive jobs on Hadoop

I want to know if I can compare two consecutive jobs in Hadoop. If not I would appreciate if anyone can tell me how to proceed with that. To be precise, I want to compare the jobs in terms of what exactly two jobs did? The reason behind doing this is to create a statistics about how many jobs executed on Hadoop were similar in terms of the behavior. For example how many times same sorting function was executed on the same input.
For example if first job did something like SortList(A) and some other job did SortList(A)+Group(result(SortList(A)). Now, I am wondering if in Hadoop there is some mapping being stored somewhere like JobID X-> SortList(A).
So far, I thought of this problem as finding the entry point in Hadoop and try to understand how job is created and what information is being kept with a jobID and in what form (in a code form or some description) , but I was not able to figure it out successfully.
Hadoop's Counters might be a good place to start. You can define your own counter names (like each counter name is a data set you are working on) and increment that counter each time you perform a sort on it. Finding which data set you are working on, however, may be the more difficult task.
Here's a tutorial I found:
http://philippeadjiman.com/blog/2010/01/07/hadoop-tutorial-series-issue-3-counters-in-action/
No. Hadoop jobs are just programs. They can have any side effects. They can write ordinary files, hdfs file, or a database. Nothing in hadoop is recording all of their activities. All hadoop is manage the schedule and the flow of data.

Resources