HBase HFiles size generation - hadoop

I am working on an HBase cluster with 28 region servers.
I have a table, which uses a wide-table definition. The row key is a Hex string, while each row has exactly one column family, which in turn has 80 qualifiers.
Each qualifier name is an int (numbered 1 to 80) and each value is a long.
The table has been pre-split into 28 regions, using the classic getHexSplits method defined in the HBase manual here.
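For reference, that split helper looks roughly like this (quoted from memory, so close to, but not necessarily identical to, the reference guide version):

// Uses java.math.BigInteger. Evenly divides the hex key space between
// startKey and endKey into numRegions ranges; the returned split points
// are what get passed to admin.createTable(desc, splits).
public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
  byte[][] splits = new byte[numRegions - 1][];
  BigInteger lowestKey = new BigInteger(startKey, 16);
  BigInteger highestKey = new BigInteger(endKey, 16);
  BigInteger range = highestKey.subtract(lowestKey);
  BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
  lowestKey = lowestKey.add(regionIncrement);
  for (int i = 0; i < numRegions - 1; i++) {
    BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
    splits[i] = String.format("%016x", key).getBytes();
  }
  return splits;
}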
I have a Map-Reduce job which creates the table, and has to load about 1.8 TB of data in it.
I am using HFileOutputFormat to create the HFiles. The problem is that, despite the fact that the job is configured with 28 reducers and hbase.hregion.max.filesize is set to the default (10 GB), I get a lot more HFiles than I expect (1,149 of approx. 1.61 GB each!).
Once the table is created and the HFiles are loaded, the table immediately starts both MAJOR and MINOR compactions, which trigger a lot of I/O and affect my next Map-Reduce job, which reads from the table. I suppose this happens because there are multiple HFiles per region, and HBase tries to compact them to optimize the reads?
How can I make sure I get a smaller number of HFiles, in order to avoid the compactions? What would be the ideal number of regions for the table, and what other parameters can I set to make sure I get no compactions?
My table is written only once, and then used just for reads.

Related

HBase aggregation, Get And Put operation, Bulk Operation

I would like to know how I can map the value of a key.
I know that it can be done with a Get and then a Put operation. Is there any other way to do it efficiently? 'checkAndPut' is not very helpful.
Can it be done with something like:
(key, value) => value + g()
I have read the book HBase: The Definitive Guide, and it seems like a MapReduce job is translated into Put/Get operations on top of HBase. Does that mean it is not a 'bulk operation' (since it's an operation per key)?
How does Spark fit in here, if at all?
HBase has scans (1) to retrieve multiple rows, and MapReduce jobs can and do use this command (2).
For HBase, 'bulk' mostly [or solely] means 'bulk load'/'bulk import', where one adds data by constructing HFiles and 'injecting' them into the HBase cluster (as opposed to issuing Puts) (3).
Your task can be implemented as a MapReduce job as well as a Spark app (4 being one example, maybe not the best one), or a Pig script, or a Hive query if you use the HBase table from Hive (5); pick your poison.
If you set up a Table with a counter then you can use an Increment to add a certain amount to the existing value in an atomic operation.
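For example, a minimal sketch of the Increment path (the table, family, qualifier, and row names here are made up, and the API shown is the pre-1.0 HTable one):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterIncrementExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "counters");            // hypothetical table
    Increment inc = new Increment(Bytes.toBytes("row-1"));  // hypothetical row key
    // Atomically adds 5 to the current value of cf:q1 (no Get-then-Put race).
    inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q1"), 5L);
    table.increment(inc);
    table.close();
  }
}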
From a MapReduce job you would aggregate your input in micro batches (wherever you have your incremental counts), group them by key/value, sum them up, and then issue a Put from your job (1 Put per key).
What I mentioned above is not a 'bulk' operation, but it would probably work just fine if the number of rows that you modify in each batch is relatively small compared to the total number of rows in your table.
If (and only if) you expect to modify your entire table in each batch, then you should look at bulk loads. This will require you to write a job that reads your existing values from HBase and your new values from the incremental sources, adds them together, and writes them back to HBase (in a 'bulk load' fashion, not directly).
A bulk load writes HFiles directly to HDFS without going through the HBase 'write pipeline' (MemStore, minor compactions, major compactions, etc.), and then issues a command to swap the existing files with the new ones. The swap is FAST! Note that you could also generate the new HFiles outside the HBase cluster (so as not to overload it), then copy them over and issue the swap command.
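To make that flow concrete, here is a rough sketch with the pre-1.0 Java API (the table name and staging path are placeholders, and the mapper that emits the Puts is omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");   // hypothetical table name

    Job job = new Job(conf, "hfile-generation");
    // Input path and a mapper emitting (ImmutableBytesWritable, Put) go here.
    // configureIncrementalLoad wires in TotalOrderPartitioner and sorting so
    // that each generated HFile falls entirely within one region of the table.
    HFileOutputFormat.configureIncrementalLoad(job, table);
    Path hfileDir = new Path("/tmp/hfiles");       // hypothetical staging dir
    FileOutputFormat.setOutputPath(job, hfileDir);
    job.waitForCompletion(true);

    // The 'swap' step: moves the generated HFiles into the table's regions.
    new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
  }
}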

Are all the data with the same row key stored in the same node?

I have got a question regarding HBase. We access the data first by row key, then by column family, and finally by column qualifier.
My question is: will HBase store all column families with the same row key together on one node or not?
UPDATE: As an example, I want to multiply val1 and val2 in a map/reduce job, where val1 and val2 are stored like this: Row=00000, Column Family: M, m000001_1234567=val1; Row=00000, Column Family: R, r000001_1234567=val2. Can I make sure that I have access to both val1 and val2 on the same node running the map?
As you might be aware, it is actually the HFile that holds the key-value data, and it is distributed across the DataNodes. ZooKeeper, the HLog, and the MemStore help in locating the row key's data and retrieving it.
The key-value storage would be grouped and stored per node; say keys [A-L] go to one node and the rest [M-Z] to another node, in a two-node scenario.
Question 1: Will HBase store all column families with the same row key together in one node?
Yes, but there are a few special cases.
The recommended way to set up an HBase cluster is the collocated (or co-located) configuration: use the same machines for HDFS DataNodes and HBase Region Servers (in contrast to dedicating machines to only one of these roles, in which case all reads would be remote and performance would suffer). In such a setup, when a Region Server saves data to HDFS, the first replica of the data always gets saved to the local disk. However, the placement of any further replicas is not consistent: different parts may be placed on different nodes. This means that if a machine dies, no data gets lost, but the data of that region will not be found on any single machine any more; it will be scattered all around the cluster instead. Even in this case, a single row will probably still be stored on a single DataNode, but it won't be local to the new Region Server any more.
This is not the only way data locality can get lost; previously, even restarting HBase had this effect. A lot of older posts mention this, but it has since been fixed in HBASE-2896.
Even if data locality gets lost, the next major compaction will restore it.
Sources and recommended reading:
How Scaling Really Works in Apache HBase
HBase and data locality
HBase File Locality in HDFS
Major compaction and data locality
Question 2: When reading an HBase table from a MapReduce job, does each mapper run on the node where the data it uses is stored?
My understanding is that apart from the special case mentioned above, the answer is yes, but I couldn't find this explicitly mentioned anywhere.
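For what it's worth, a typical table-backed job is wired up roughly like this (the table name is a placeholder); TableInputFormat creates one split per region and reports the hosting region server as the split's preferred location, which is what lets the scheduler run each mapper next to its data:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class LocalityScanJob {
  // Placeholder mapper: sees every row of the region assigned to its split.
  static class RowMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
      // process the row here
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table-scan");
    job.setJarByClass(LocalityScanJob.class);
    Scan scan = new Scan();
    scan.setCaching(500);        // fewer RPC round trips per mapper
    scan.setCacheBlocks(false);  // don't pollute the block cache from MR scans
    TableMapReduceUtil.initTableMapperJob("my_table", scan, RowMapper.class,
        NullWritable.class, NullWritable.class, job);   // "my_table" is hypothetical
    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);
    job.waitForCompletion(true);
  }
}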
Sources and recommended reading:
Understanding Map Reduce on HTable
The MapReduce Integration section of Tutorial: HBase

HBase bulk load usage

I am trying to import some HDFS data to an already existing HBase table.
The table I have was created with 2 column families, and with all the default settings that HBase comes with when creating a new table.
The table is already filled up with a large volume of data, and it has 98 online regions.
The row keys it has are of the form (simplified version):
2-CHARS_ID + 6-DIGIT-NUMBER + 3 X 32-CHAR-MD5-HASH.
Example of key: IP281113ec46d86301568200d510f47095d6c99db18630b0a23ea873988b0fb12597e05cc6b30c479dfb9e9d627ccfc4c5dd5fef.
The data I want to import is on HDFS, and I am using a Map-Reduce process to read it. I emit Put objects from my mapper, which correspond to each line read from the HDFS files.
The existing data has keys which will all start with "XX181113".
The job is configured with :
HFileOutputFormat.configureIncrementalLoad(job, hTable)
Once I start the process, I see it configured with 98 reducers (equal to the online regions the table has), but the issue is that 4 reducers got 100% of the data split among them, while the rest did nothing.
As a result, I see only 4 folder outputs, which have a very large size.
Do these files correspond to 4 new regions which I can then import into the table? And if so, why only 4, while 98 reducers were created?
Reading the HBase docs:
In order to function efficiently, HFileOutputFormat must be configured such that each output HFile fits within a single region. In order to do this, jobs whose output will be bulk loaded into HBase use Hadoop's TotalOrderPartitioner class to partition the map output into disjoint ranges of the key space, corresponding to the key ranges of the regions in the table.
confused me even more as to why I get this behaviour.
Thanks!
The number of reducers that actually receive data doesn't depend on the number of regions you have in the table, but rather on how the data falls across those regions (each region holds a contiguous range of keys). Since you mention that all your new data starts with the same prefix, it most likely fits into only a few regions.
You can pre-split your table so that the new data is divided among more regions.
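As a rough illustration with the old HBaseAdmin API (the table and family names are made up, and the split points below are placeholders; real ones should be derived from your actual key distribution, e.g. the known prefix plus the 6-digit-number range):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTableExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("my_table"); // hypothetical name
    desc.addFamily(new HColumnDescriptor("cf1"));
    desc.addFamily(new HColumnDescriptor("cf2"));

    // Hypothetical split points: one region boundary per block of the
    // 6-digit number that follows the 2-character prefix.
    byte[][] splits = new byte[][] {
        Bytes.toBytes("XX181113250000"),
        Bytes.toBytes("XX181113500000"),
        Bytes.toBytes("XX181113750000"),
    };
    admin.createTable(desc, splits);
  }
}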

Reduce job pending in HFileOutputFormat

I am using
HBase: 0.92.1-cdh4.1.2, and
Hadoop:2.0.0-cdh4.1.2
I have a MapReduce program that loads data from HDFS to HBase using HFileOutputFormat in cluster mode.
In that MapReduce program I'm using HFileOutputFormat.configureIncrementalLoad() to bulk load an 800,000-record
data set, which is 7.3 GB in size, and it runs fine, but it does not run for a 900,000-record data set of 8.3 GB.
In the case of the 8.3 GB data, my MapReduce program has 133 maps and one reducer; all maps completed successfully, but my reducer status stays Pending for a long time. There is nothing wrong with the cluster, since other jobs run fine and this job also runs fine up to 7.3 GB of data.
What could I be doing wrong?
How do I fix this issue?
I ran into the same problem. Looking at the JobTracker logs, I noticed there was not enough free space for the single reducer to run on any of my nodes:
2013-09-15 16:55:19,385 WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node tracker_slave01.mydomain.com:localhost/127.0.0.1:43455 has 503,777,017,856 bytes free; but we expect reduce input to take 978136413988
This 503 GB refers to the free space available on one of the hard drives of that particular slave ("tracker_slave01.mydomain.com"), so the reducer apparently needs to copy all the data to a single drive.
The reason this happens is your table only has one region when it is brand new. As data is inserted into that region, it'll eventually split on its own.
A solution to this is to pre-create your regions when creating your table. The Bulk Loading chapter in the HBase book discusses this, and presents two options for doing so. It can also be done via the HBase shell (see create's SPLITS argument, I think). The challenge, though, is defining your splits such that the regions get an even distribution of keys. I've yet to solve this problem perfectly, but here's what I'm doing currently:
// Pre-creates the table with 100 regions, with split points spaced evenly
// between the 4-byte int keys 0 and Integer.MAX_VALUE (2147483647).
HTableDescriptor desc = new HTableDescriptor();
desc.setName(Bytes.toBytes(tableName));
desc.addFamily(new HColumnDescriptor("my_col_fam"));
admin.createTable(desc, Bytes.toBytes(0), Bytes.toBytes(2147483647), 100);
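As an aside, the HBase shell route mentioned above would look roughly like this (table, family, and split points are made up):

create 'my_table', 'my_col_fam', SPLITS => ['g', 'm', 's']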
An alternative solution would be to not use configureIncrementalLoad, and instead: 1) just generate your HFiles via MapReduce with no reducers; 2) use the completebulkload feature in hbase.jar to import your records into HBase. Of course, I think this runs into the same problem with regions, so you'll want to create the regions ahead of time too (I think).
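The completebulkload step itself is a command-line tool; its usage is roughly as follows (the jar version, output path, and table name are placeholders):

hadoop jar hbase-VERSION.jar completebulkload [-c /path/to/hbase-site.xml] /user/me/hfile-output my_table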
Your job is running with a single reducer, which means 7 GB of data is being processed by a single task.
The main reason for this is that HFileOutputFormat starts a reducer that sorts and merges the data to be loaded into the HBase table.
Here, number of reducers = number of regions in the HBase table.
Increase the number of regions and you will achieve parallelism in reducers. :)
You can get more details here:
http://databuzzprd.blogspot.in/2013/11/bulk-load-data-in-hbase-table.html

HBase as Input -> unable to balance load over available map tasks

I want each Hadoop mapper to process a separate portion of the data in an M/R job, and I would like to test, on a pseudo-distributed (single-node) setup, the case where many mappers would be needed as a result of a larger input size. Given the size of my current input and the standalone mode I am experimenting with, I can only see 1 map task.
My input comes from an HBase table, and I thought that the number of regions per HBase table is equal to the number of mappers used to process the table's data.
So, to reproduce a case where many mappers would process the input data, I predefined the regions of the table through the shell like this:
create 't1', 'f1', {NUMREGIONS => 4, SPLITALGO => 'HexStringSplit'}
or with 'UniformSplit' as the SPLITALGO. But even though the mappers do increase to the specified number of regions (after importing data into the respective table), all the input data (in a subsequent test job where I try to read from this table) passes through only one mapper, with the others processing none of the input rows.
I work on a pseudo-distributed (single-node) setup and I really don't know how to solve this. Does anyone have any ideas? Thanks!
Are you scanning the entire table or just a section of it? If you are scanning a section of the table, then that might be the cause of your problem as your data source isn't big enough to trigger multiple mappers.
You can try to decrease the region size in your hbase-site.xml configuration and restart HBase to achieve the desired effect.
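For example, something like this in hbase-site.xml (the 256 MB value is only an illustration):

<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- regions split once a store file grows past this size, in bytes (268435456 = 256 MB) -->
  <value>268435456</value>
</property>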
Lastly, in your mapred-site.xml configuration, how many mapper slots do you have? If it is just 1, this will not limit the number of map tasks, but it will limit the number of map tasks that can run at a time on that server.
Other than that, I don't think you have much control over specifying the number of mappers per job, unlike the number of reducers.

Resources