Getting Hbase Exception No regions passed - hadoop

Hi i am new to Hbase and im trying to learn how to load bulk data to Hbase table using MapReduce
But i am getting below Exception
Exception in thread "main" java.lang.IllegalArgumentException: No regions passed
at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.writePartitions(HFileOutputFormat2.java:307)
at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configurePartitioner(HFileOutputFormat2.java:527)
at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configureIncrementalLoad(HFileOutputFormat2.java:391)
at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configureIncrementalLoad(HFileOutputFormat2.java:356)
at JobDriver.run(JobDriver.java:108)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at JobDriver.main(JobDriver.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
This is mY Mapper Code
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
System.out.println("Value in Mapper"+value.toString());
String[] values = value.toString().split(",");
byte[] row = Bytes.toBytes(values[0]);
ImmutableBytesWritable k = new ImmutableBytesWritable(row);
KeyValue kvProtocol = new KeyValue(row, "PROTOCOLID".getBytes(), "PROTOCOLID".getBytes(), values[1]
.getBytes());
context.write(k, kvProtocol);
}
This is my Job Configuration
public class JobDriver extends Configured implements Tool{
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
ToolRunner.run(new JobDriver(), args);
System.exit(0);
}
#Override
public int run(String[] arg0) throws Exception {
// TODO Auto-generated method stub
// HBase Configuration
System.out.println("**********Starting Hbase*************");
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "TestHFileToHBase");
job.setJarByClass(JobDriver.class);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(KeyValue.class);
job.setMapperClass(LoadMapper.class);
job.setOutputFormatClass(HFileOutputFormat2.class);
HTable table = new HTable(conf, "kiran");
FileInputFormat.addInputPath(job, new Path("hdfs://192.168.61.62:9001/sampledata.csv"));
FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.61.62:9001/deletions_6.csv"));
HFileOutputFormat2.configureIncrementalLoad(job, table);
//System.exit(job.waitForCompletion(true) ? 0 : 1);
return job.waitForCompletion(true) ? 0 : 1;
}
}
Can Anyone please help me in resolvin the exception.

You have to create the table first. You can do it with the below code
//Create table and do pre-split
HTableDescriptor descriptor = new HTableDescriptor(
Bytes.toBytes(tableName)
);
descriptor.addFamily(
new HColumnDescriptor(Constants.COLUMN_FAMILY_NAME)
);
HBaseAdmin admin = new HBaseAdmin(config);
byte[] startKey = new byte[16];
Arrays.fill(startKey, (byte) 0);
byte[] endKey = new byte[16];
Arrays.fill(endKey, (byte)255);
admin.createTable(descriptor, startKey, endKey, REGIONS_COUNT);
admin.close();
or directly from the hbase shell with the command:
create 'kiran', 'colfam1'
The exception is caused because the startkeys list is empty: line 306
More info can be found here.
Note that the table name must be the same with the one you use in your code (kiran).

Related

Hadoop mapreduce - mapping NullPointerException

I need to write a simple map-reduce program that , given as input a directed graph represented as a list of edges, produces the same graph where each edge (x,y) with x>y is replaced by (y,x) and there are no repetitions of edges in the output graph.
INPUT
1;3
2;1
0;1
3;1
2;0
1;1
2;1
OUTPUT
1;3
1;2
0;1
0;2
1;1
This is the code :
public class ExamGraph {
// mapper class
public static class MyMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
#Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
value = new Text( value.toString());
String[] campi = value.toString().split(";");
if (Integer.getInteger(campi[0]) > Integer.getInteger(campi[1]))
context.write(new Text(campi[1]+";"+campi[0]), NullWritable.get());
else context.write(new Text(campi[0]+";"+campi[1]), NullWritable.get());
}
}
// reducer class
public static class MyReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
#Override
protected void reduce(Text key, Iterable <NullWritable> values , Context context)
throws IOException, InterruptedException {
context.write(key, NullWritable.get());
}
}
public static void main(String[] args) throws Exception {
// create new job
Job job = Job.getInstance(new Configuration());
// job is based on jar containing this class
job.setJarByClass(ExamGraph.class);
// for logging purposes
job.setJobName("ExamGraph");
// set input path in HDFS
FileInputFormat.addInputPath(job, new Path(args[0]));
// set output path in HDFS (destination must not exist)
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// set mapper and reducer classes
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
// An InputFormat for plain text files.
// Files are broken into lines. Either linefeed or carriage-return are used
// to signal end of line. Keys are the position in the file, and values
// are the line of text.
job.setInputFormatClass(TextInputFormat.class);
// set type of output keys and values for both mappers and reducers
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
// start job
job.waitForCompletion(true);
}
}
When I run the jar file using :
hadoop jar path/jar JOBNAME /inputlocation /outputlocation
I got this error :
18/05/22 02:13:11 INFO mapreduce.Job: Task Id : attempt_1526979627085_0001_m_000000_1, Status : FAILED
Error: java.lang.NullPointerException
at ExamGraph$MyMapper.map(ExamGraph.java:38)
at ExamGraph$MyMapper.map(ExamGraph.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
But I did not find the error in the code.
Found the problem , I confused the method getInteger() with the parseInt() in the mapper.

Reading ORC File using MapReduce

I'm trying to read ORC File compressed using SNAPPY via MapReduce. My intent is just to leverage IdentityMapper, essentially to merge small files. However, I get NullPointerException on doing so. I can see from the log that schema is being inferred, do I still need to set the schema for the output file from mapper ?
public class Test {
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
Job job = new Job(conf, "test");
job.setJarByClass(Test.class);
job.setMapperClass(Mapper.class);
conf.set("orc.compress", "SNAPPY");
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Writable.class);
job.setInputFormatClass(OrcInputFormat.class);
job.setOutputFormatClass(OrcOutputFormat.class);
job.setNumReduceTasks(0);
String source = args[0];
String target = args[1];
FileInputFormat.setInputPath(job, new Path(source))
FileOutputFormat.setOutputPath(job, new Path(target));
boolean result = job.waitForCompletion(true);
System.exit(result ? 0 : 1);
}
Error: java.lang.NullPointerException at
org.apache.orc.impl.WriterImpl.(WriterImpl.java:178) at
org.apache.orc.OrcFile.createWriter(OrcFile.java:559) at
org.apache.orc.mapreduce.OrcOutputFormat.getRecordWriter(OrcOutputFormat.java:55)
at
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.(MapTask.java:644)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at
org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:415) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

Not understanding the path in distributed path

From the below code I didn't understand 2 things:
DistributedCache.addcachefile(new URI ('/abc.dat'), job.getconfiguration())
I didn't understand URI path has to be present in the HDFS. Correct me if I am wrong.
And what is p.getname().equals() from the below code:
public class MyDC {
public static class MyMapper extends Mapper < LongWritable, Text, Text, Text > {
private Map < String, String > abMap = new HashMap < String, String > ();
private Text outputKey = new Text();
private Text outputValue = new Text();
protected void setup(Context context) throws
java.io.IOException, InterruptedException {
Path[] files = DistributedCache.getLocalCacheFiles(context.getConfiguration());
for (Path p: files) {
if (p.getName().equals("abc.dat")) {
BufferedReader reader = new BufferedReader(new FileReader(p.toString()));
String line = reader.readLine();
while (line != null) {
String[] tokens = line.split("\t");
String ab = tokens[0];
String state = tokens[1];
abMap.put(ab, state);
line = reader.readLine();
}
}
}
if (abMap.isEmpty()) {
throw new IOException("Unable to load Abbrevation data.");
}
}
protected void map(LongWritable key, Text value, Context context)
throws java.io.IOException, InterruptedException {
String row = value.toString();
String[] tokens = row.split("\t");
String inab = tokens[0];
String state = abMap.get(inab);
outputKey.set(state);
outputValue.set(row);
context.write(outputKey, outputValue);
}
}
public static void main(String[] args)
throws IOException, ClassNotFoundException, InterruptedException {
Job job = new Job();
job.setJarByClass(MyDC.class);
job.setJobName("DCTest");
job.setNumReduceTasks(0);
try {
DistributedCache.addCacheFile(new URI("/abc.dat"), job.getConfiguration());
} catch (Exception e) {
System.out.println(e);
}
job.setMapperClass(MyMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
The idea of Distributed Cache is to make some static data available to the task node before it starts its execution.
File has to be present in HDFS ,so that it can then add it to the Distributed Cache (to each task node)
DistributedCache.getLocalCacheFile basically gets all the cache files present in that task node. By if (p.getName().equals("abc.dat")) { you are getting the appropriate Cache File to be processed by your application.
Please refer to the docs below:
https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#DistributedCache
https://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/filecache/DistributedCache.html#getLocalCacheFiles(org.apache.hadoop.conf.Configuration)
DistributedCache is an API which is used to add a file or a group of files in the memory and will be available for every data-nodes whether the map-reduce will work. One example of using DistributedCache is map-side joins.
DistributedCache.addcachefile(new URI ('/abc.dat'), job.getconfiguration()) will add the abc.dat file in the cache area. There can be n numbers of file in the cache and p.getName().equals("abc.dat")) will check the file which you required. Every path in HDFS will be taken under Path[] for map-reduce processing. For example :
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
The first Path(args[0]) is the first argument
(input file location) you pass while Jar execution and Path(args[1]) is the second argument which the output file location. Everything is taken as Path array.
In the same way when you add any file to cache it will get arrayed in the Path array which you shud be retrieving using the below code.
Path[] files = DistributedCache.getLocalCacheFiles(context.getConfiguration());
It will return all the files present in the cache and you will your file name by p.getName().equals() method.

Exception in thread "main" java.lang.IllegalArgumentException: Unable to find StyleMasterPage name

I'm developing an app that processes files in ODS format.The code snippet is as follows:
public static void main(String[] args) throws IOException {
// Set the platform L&F.
try {
UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
} catch (Exception e) {
e.printStackTrace();
}
display();
//print();
}
private static void display() throws IOException {
// Load the spreadsheet.
final OpenDocument doc = new OpenDocument();
doc.loadFrom("temperature3.ods");
String styleName = "Calibri";
StyleHeader header = new StyleHeader();
header.setStyleDisplay("Testing");
StyleMasterPage page = new StyleMasterPage();
page.setStyleHeader(header);
page.setStyleName(styleName);
OfficeMasterStyles off = new OfficeMasterStyles();
off.addMasterPage(off.getMasterPageFromStyleName(styleName));
doc.setMasterStyles(off);
// Show time !
final JFrame mainFrame = new JFrame("Viewer");
DefaultDocumentPrinter printer = new DefaultDocumentPrinter();
ODSViewerPanel viewerPanel = new ODSViewerPanel(doc, true);
mainFrame.setContentPane(viewerPanel);
mainFrame.pack();
mainFrame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
mainFrame.setLocation(10, 10);
mainFrame.setVisible(true);
}
I intend loading the file into a jcomponent for easy manipulation but I'm having this error message in the netbeans console:
Exception in thread "main" java.lang.IllegalArgumentException: Unable to find StyleMasterPage named:Calibri
at org.jopendocument.model.office.OfficeMasterStyles.getMasterPageFromStyleName(Unknown Source)
at starzsmarine1.PrintSpreadSheet.display(PrintSpreadSheet.java:60)
at starzsmarine1.PrintSpreadSheet.main(PrintSpreadSheet.java:45)
Is there an alternative API for this purpose?

Read/Write Hbase without reducer, exceptions

I want to read and write hbase without using any reducer.
I followed the example in "The Apache HBaseâ„¢ Reference Guide", but there are exceptions.
Here is my code:
public class CreateHbaseIndex {
static final String SRCTABLENAME="sourceTable";
static final String SRCCOLFAMILY="info";
static final String SRCCOL1="name";
static final String SRCCOL2="email";
static final String SRCCOL3="power";
static final String DSTTABLENAME="dstTable";
static final String DSTCOLNAME="index";
static final String DSTCOL1="key";
public static void main(String[] args) {
System.out.println("CreateHbaseIndex Program starts!...");
try {
Configuration config = HBaseConfiguration.create();
Scan scan = new Scan();
scan.setCaching(500);
scan.setCacheBlocks(false);
scan.addColumn(Bytes.toBytes(SRCCOLFAMILY), Bytes.toBytes(SRCCOL1));//info:name
HBaseAdmin admin = new HBaseAdmin(config);
if (admin.tableExists(DSTTABLENAME)) {
System.out.println("table Exists.");
}
else{
HTableDescriptor tableDesc = new HTableDescriptor(DSTTABLENAME);
tableDesc.addFamily(new HColumnDescriptor(DSTCOLNAME));
admin.createTable(tableDesc);
System.out.println("create table ok.");
}
Job job = new Job(config, "CreateHbaseIndex");
job.setJarByClass(CreateHbaseIndex.class);
TableMapReduceUtil.initTableMapperJob(
SRCTABLENAME, // input HBase table name
scan, // Scan instance to control CF and attribute selection
HbaseMapper.class, // mapper
ImmutableBytesWritable.class, // mapper output key
Put.class, // mapper output value
job);
job.waitForCompletion(true);
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
System.out.println("Program ends!...");
}
public static class HbaseMapper extends TableMapper<ImmutableBytesWritable, Put> {
private HTable dstHt;
private Configuration dstConfig;
#Override
public void setup(Context context) throws IOException{
dstConfig=HBaseConfiguration.create();
dstHt = new HTable(dstConfig,SRCTABLENAME);
}
#Override
public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
// this is just copying the data from the source table...
context.write(row, resultToPut(row,value));
}
private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException {
Put put = new Put(key.get());
for (KeyValue kv : result.raw()) {
put.add(kv);
}
return put;
}
#Override
protected void cleanup(Context context) throws IOException, InterruptedException {
dstHt.close();
super.cleanup(context);
}
}
}
By the way, "souceTable" is like this:
key name email
1 peter a#a.com
2 sam b#b.com
"dstTable" will be like this:
key value
peter 1
sam 2
I am a newbie in this field and need you help. Thx~
You are correct that you don't need a reducer to write to HBase, but there are some instances where a reducer might help. If you are creating an index, you might run into situations where two mappers are trying to write the same row. Unless you are careful to ensure that they are writing into different column qualifiers, you could overwrite one update with another due to race conditions. While HBase does do row level locking, it won't help if your application logic is faulty.
Without seeing your exceptions, I would guess that you are failing because you are trying to write key-value pairs from your source table into your index table, where the column family doesn't exist.
In this code you are not specifying the output format. You need to add the following code
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE,
DSTTABLENAME);
Also, we are not supposed to create new configuration in the set up, we need to use the same configuration from context.

Resources