Connect to HBase using Hadoop config in Spark

I am trying to create an HBase connection inside a MapPartitionsFunction in Spark, but I get:
Caused by: java.io.NotSerializableException: org.apache.hadoop.conf.Configuration
I tried the following code:
SparkConf conf = new SparkConf()
        .setAppName("EnterPrise Risk Score")
        .setMaster("local");
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
conf.set("spark.kryo.registrationRequired", "true");
conf.registerKryoClasses(new Class<?>[] {
        Class.forName("org.apache.hadoop.conf.Configuration"),
        Class.forName("org.apache.hadoop.hbase.client.Table"),
        Class.forName("com.databricks.spark.avro.DefaultSource$SerializableConfiguration")});
SparkSession sparkSession = SparkSession.builder().config(conf)
        .getOrCreate();
Configuration hbaseConf = HBaseConfiguration.create(hadoopConf);
I am using sparkSession to create a Dataset and pass hbaseConf to create connections to HBase.
Is there any way to connect to HBase?

You probably implicitly pass an HBase configuration to a Spark action like this:
Configuration hbaseConfiguration = HBaseConfiguration.create();
sc.hadoopFile(inDirTrails, AvroInputFormat.class, AvroWrapper.class, NullWritable.class).mapPartitions(i -> {
    // hbaseConfiguration is captured from the driver, so Spark tries to serialize it
    Connection connection = ConnectionFactory.createConnection(hbaseConfiguration);
    // more valid code
});
Why don't you just create the Configuration right inside of it, like this:
sc.hadoopFile(inDirTrails, AvroInputFormat.class, AvroWrapper.class, NullWritable.class).mapPartitions(i -> {
    // Created on the executor, so nothing needs to be serialized from the driver
    Configuration hbaseConfiguration = HBaseConfiguration.create();
    hbaseConfiguration.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM);
    Connection connection = ConnectionFactory.createConnection(hbaseConfiguration);
    // more valid code
});
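For completeness, here is a minimal sketch of that per-partition pattern with the connection closed when the partition is finished. It assumes the same placeholders as above (inDirTrails, HBASE_ZOOKEEPER_QUORUM) plus a hypothetical table name my_table:

import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

sc.hadoopFile(inDirTrails, AvroInputFormat.class, AvroWrapper.class, NullWritable.class)
    .mapPartitions(records -> {
        // Built on the executor, so nothing has to be serialized from the driver
        Configuration hbaseConfiguration = HBaseConfiguration.create();
        hbaseConfiguration.set("hbase.zookeeper.quorum", HBASE_ZOOKEEPER_QUORUM);
        // try-with-resources closes the connection even if a record fails
        try (Connection connection = ConnectionFactory.createConnection(hbaseConfiguration);
             Table table = connection.getTable(TableName.valueOf("my_table"))) {
            while (records.hasNext()) {
                records.next();
                // ... per-record Get/Put calls against table go here ...
            }
        }
        return Collections.<String>emptyIterator();
    });

Creating one connection per partition (rather than per record) keeps the ZooKeeper handshake cost down while still avoiding driver-side serialization.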

Related

Can't connect to remote HDFS using Spring Hadoop

I'm just starting out in Hadoop and I'm trying to read from a remote HDFS (running in a Docker container, accessible at localhost:32783) through Spring Hadoop, but I get the following error:
org.springframework.data.hadoop.HadoopException:
Cannot list resources Failed on local exception:
java.io.EOFException; Host Details : local host is: "user/127.0.1.1";
destination host is: "localhost":32783;
I'm trying to read the file using the following code:
HdfsClient hdfsClient = new HdfsClient();
Configuration conf = new Configuration();
conf.set("fs.defaultFS","hdfs://localhost:32783");
FileSystem fs = FileSystem.get(conf);
SimplerFileSystem sFs = new SimplerFileSystem(fs);
hdfsClient.setsFs(sFs);
String filePath = "/tmp/tmpTestReadTest.txt";
String output = hdfsClient.readFile(filePath);
What hdfsClient.readFile(filePath) does is the following:
public class HdfsClient {
    private SimplerFileSystem sFs;

    public void setsFs(SimplerFileSystem sFs) {
        this.sFs = sFs;
    }

    public String readFile(String filePath) throws IOException {
        FSDataInputStream inputStream = this.sFs.open(filePath);
        String output = getStringFromInputStream(inputStream.getWrappedStream());
        inputStream.close();
        return output;
    }
}
Any guess why I can't read from the remote HDFS? If I remove conf.set("fs.defaultFS","hdfs://localhost:32783"); I can read, but only from the local file path.
I understand that "hdfs://localhost:32783" is correct, because changing it to a random URI gives a Connection refused error instead.
Might there be something wrong in my Hadoop configuration?
Thank you!

How to fetch data from an HBase table running on Linux from a Java program running on Windows: Could not locate executable null\bin\winutils.exe

My HBase table runs on a Linux system and my Java program runs on Windows. When I try to connect, I get: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
This is my code to connect:
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "192.168.20.129");
conf.set("hbase.zookeeper.property.clientPort", "2181");
conf.set("hbase.master", "192.168.20.129:60010");
Just add this method and call it before connecting:
private static void workaround() {
    // Workaround for: java.io.IOException: Could not locate executable
    // null\bin\winutils.exe in the Hadoop binaries.
    File workaround = new File(".");
    System.getProperties().put("hadoop.home.dir", workaround.getAbsolutePath());
    new File("./bin").mkdirs();
    try {
        new File("./bin/winutils.exe").createNewFile();
    } catch (IOException e) {
        logger.error(e);
    }
}
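A minimal usage sketch, reusing the connection settings from the question (the HBase client only needs the ZooKeeper quorum, so hbase.master is omitted here; the table name my_table is a placeholder):

public static void main(String[] args) throws IOException {
    workaround(); // must run before any Hadoop class reads hadoop.home.dir

    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "192.168.20.129");
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("my_table"))) {
        // ... fetch data from the table here ...
    }
}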

Create HBaseConfiguration once in a MapReduce job

I am writing a map-reduce job in Java.
I want to use an external table for writing HBase Increment objects.
For that, I am creating a new HBaseConfiguration.
I want to be able to create it once and use it in all mappers.
Any idea?
The Job configuration is already passed to the mappers (and to the reducers) as a part of the context. You can access any HBase table with it.
Job job = new Job(HBaseConfiguration.create(), ...);
/* ... Rest of the job setup ... */
job.waitForCompletion(true);
Within your mapper setup method:
Configuration config = context.getConfiguration();
HTable mytable = new HTable(config, "my_table_name");
// ...
You can even pass custom parameters or arguments so you can instantiate any kind of object you might need in your mappers:
Configuration config = HBaseConfiguration.create();
config.set("myStringParam", "customValue");
config.setStrings("myStringsParam", "customValue1", "customValue2", "customValue3");
config.setBoolean("myBooleanParam", true);
Job job = new Job(config, ...);
/* ... Rest of the job setup ... */
job.waitForCompletion(true);
Within your mapper setup method:
Configuration config = context.getConfiguration();
String myStringParam = config.get("myStringParam");
String[] myStringsParam = config.getStrings("myStringsParam");
boolean myBooleanParam = config.getBoolean("myBooleanParam", false);
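Putting it together, a minimal sketch of a mapper that opens the table once in setup() and closes it in cleanup(). The type parameters and table name are placeholders, and newer HBase versions replace HTable with Connection.getTable:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    private HTable myTable;
    private String myStringParam;

    @Override
    protected void setup(Context context) throws IOException {
        Configuration config = context.getConfiguration();
        myStringParam = config.get("myStringParam");
        // One table handle per mapper task, reused for every record
        myTable = new HTable(config, "my_table_name");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... build Increment/Put objects using myTable and myStringParam ...
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        myTable.close();
    }
}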
BTW: I don't know why this question has so many downvotes, it's not a bad question.

How to get job counters with the Sqoop 1.4.4 Java API?

I'm using Sqoop 1.4.4 and its Java API to run an import job, and I'm having trouble figuring out how to access the job counters once the import has completed. I see suitable methods in the ConfigurationHelper class, like getNumMapOutputRecords, but I'm not sure how to pass the job to them.
Is there a way to get at the job from the SqoopTool or Sqoop objects?
My code looks something like this:
SqoopTool sqoopTool = new ImportTool();
SqoopOptions options = new SqoopOptions();
options.setConnectString(connectString);
options.setUsername(username);
options.setPassword(password);
options.setTableName(table);
options.setColumns(columns);
options.setWhereClause(whereClause);
options.setTargetDir(targetDir);
options.setNumMappers(1);
options.setFileLayout(FileLayout.TextFile);
options.setFieldsTerminatedBy(delimiter);
Configuration config = new Configuration();
config.set("oracle.sessionTimeZone", timezone.getID());
System.setProperty(Sqoop.SQOOP_RETHROW_PROPERTY, "1");
Sqoop sqoop = new Sqoop(sqoopTool, config, options);
String[] nullArgs = new String[0];
Sqoop.runSqoop(sqoop, nullArgs);

How to read a file from Hadoop using Java without the command line

I wanted to read a file from HDFS, and I could do that using the code below:
String uri = theFilename;
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(uri), conf);
InputStream in = null;
try {
    in = fs.open(new Path(uri));
    IOUtils.copyBytes(in, System.out, 4096, false);
} finally {
    IOUtils.closeStream(in);
}
To run this I have to use hadoop jar myjar.jar com.mycompany.cloud.CatFile /filepathin_hadoop.
That works, but how can I do the same from another program, i.e. without using the hadoop jar command?
You can add your core-site.xml to that Configuration object so it knows the URI for your HDFS instance. This method requires HADOOP_HOME to be set.
Configuration conf = new Configuration();
Path coreSitePath = new Path(System.getenv("HADOOP_HOME"), "conf/core-site.xml");
conf.addResource(coreSitePath);
FileSystem hdfs = FileSystem.get(conf);
// rest of code the same
Now, without using hadoop jar, you can open a connection to your HDFS instance.
Edit: You have to use conf.addResource(Path). If you use a String argument, it looks in the classpath for that filename.
There is another Configuration method, set(parameterName, value). If you use this method, you don't have to specify the location of core-site.xml at all, which is useful for accessing HDFS from a remote location such as a web server. Usage is as follows:
String uri = theFilename;
Configuration conf = new Configuration();
conf.set("fs.default.name","hdfs://10.132.100.211:8020/");
FileSystem fs = FileSystem.get(conf);
// Rest of the code
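Putting that together, a minimal standalone reader might look like this. The fs.default.name value is the example address from above (newer Hadoop versions use the key fs.defaultFS), and the program takes the HDFS path as its first argument:

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point directly at the NameNode; no core-site.xml or HADOOP_HOME needed
        conf.set("fs.default.name", "hdfs://10.132.100.211:8020/");
        FileSystem fs = FileSystem.get(conf);

        InputStream in = null;
        try {
            in = fs.open(new Path(args[0]));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}

Run it with a plain java command, as long as the Hadoop client jars are on the classpath.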
