JNI error while creating external table using gphdfs protocol : greenplum - greenplum

1) Completed "One-time HDFS Protocol Installation" using link - http://gpdb.docs.pivotal.io/4360/admin_guide/load/topics/g-one-time-hdfs-protocol-installation.html#topic20
2) copied the 'csv' file on hdfs system at path - data/etl/ext01
3) created external table using following command
create external table orgData(orghk varchar(200),eff_datetime timestamp, source varchar(20), handle_id varchar(200), created_by_d varchar(100), created_datetime timestamp)
location ('gphdfs://<hostname>:8020/data/etl/ext01/part-r-00000-3eae416a-d0ff-4562-a762-d53469d42cd2.csv')
Format 'CSV' (DELIMITER ',')
However after executing the command - select * from orgData
I am getting following error
ERROR: ERROR: external table gphdfs protocol command ended with
error. Error: A JNI error has occurred, please check your
installation and try again (seg1 slice1
<hostname2>:40000 pid=4977) Detail:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/mapreduce/lib/input/FileInputFormat at
java.lang.Class.getDeclaredMethods0(Native Method) at
java.lang.Class.privateGetDeclaredMethods(Class.java:2701) at
java.lang.Class.privateGetMethodRecursive(Class.java:3048) at
java.lang.Class.getMethod0(Class.java:3018) at
java.lang.Class.getMethod(Class.java:1784) at
sun.launcher.LauncherHelper.valid Command:
'gphdfs://<hostname>:8040/data/etl/ext01/part-r-00000-3eae416a-d0ff-4562-a762-d53469d42cd2.csv'
External table orgdata, file
gphdfs://<hostname>:8040/data/etl/ext01/part-r-00000-3eae416a-d0ff-4562-a762-d53469d42cd2.csv
Am I missing something?

Can you verify you set JAVA_HOME and HADOOP_HOME on ALL segments, then restarted the cluster?
gpssh -f clusterHostfile -e 'egrep (JAVA_HOME|HADOOP_HOME) ~/.bashrc | wc -l'
You should see the number 2 from each host in the cluster.

Related

Sqoop Job Exception

I am getting the following Exception when I'm trying to List the Sqoop JOBS.
I'm not able to create the Soop jobs because of this exception:
root#ubuntu:/usr/lib/sqoop/conf# sqoop job --list 16/04/11 01:51:44
ERROR tool.JobTool: I/O error performing job operation:
java.io.IOException: Exception creating SQL connection at
com.cloudera.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:220)
at
com.cloudera.sqoop.metastore.hsqldb.AutoHsqldbStorage.open(AutoHsqldbStorage.java:113)
at com.cloudera.sqoop.tool.JobTool.run(JobTool.java:279) at
com.cloudera.sqoop.Sqoop.run(Sqoop.java:146) at
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at
com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:182) at
com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:221) at
com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:230) at
com.cloudera.sqoop.Sqoop.main(Sqoop.java:239) Caused by:
Caused by: java.sql.SQLException: General error: java.lang.ClassFormatError: >Truncated class file
at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
at org.hsqldb.jdbc.jdbcConnection.(Unknown Source)
at org.hsqldb.jdbcDriver.getConnection(Unknown Source)
at org.hsqldb.jdbcDriver.connect(Unknown Source)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at com.cloudera.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:>180)
... 8 more
Sqoop Version: 1.3.0-cdh3u5
Please help
Commands used as below:
sqoop job --list
sqoop job --create sqoopjob21 -- import --connect jdbc:mysql://localhost/mysql1 --table emp --target-dir /importjob21 ;
This may possibly be because sqoop cannot find the hsqldb which it uses to store job information. Check if you have "metastore.db.script" file in the sqoop installed directory, if not create it.
Create a file named "metastore.db.script" and put the following lines
CREATE SCHEMA PUBLIC AUTHORIZATION DBA
CREATE MEMORY TABLE SQOOP_ROOT(VERSION INTEGER,PROPNAME VARCHAR(128) NOT NULL,PROPVAL VARCHAR(256),CONSTRAINT SQOOP_ROOT_UNQ UNIQUE(VERSION,PROPNAME))
CREATE MEMORY TABLE SQOOP_SESSIONS(JOB_NAME VARCHAR(64) NOT NULL,PROPNAME VARCHAR(128) NOT NULL,PROPVAL VARCHAR(1024),PROPCLASS VARCHAR(32) NOT NULL,CONSTRAINT SQOOP_SESSIONS_UNQ UNIQUE(JOB_NAME,PROPNAME,PROPCLASS))
CREATE USER SA PASSWORD ""
GRANT DBA TO SA
SET WRITE_DELAY 10
SET SCHEMA PUBLIC
INSERT INTO SQOOP_ROOT VALUES(NULL,'sqoop.hsqldb.job.storage.version','0')
INSERT INTO SQOOP_ROOT VALUES(0,'sqoop.hsqldb.job.info.table','SQOOP_SESSIONS')
Now create "metastore.db.properties" file and put these lines
#HSQL Database Engine 1.8.0.10
#Fri Aug 04 14:07:10 IST 2017
hsqldb.script_format=0
runtime.gc_interval=0
sql.enforce_strict_size=false
hsqldb.cache_size_scale=8
readonly=false
hsqldb.nio_data_file=true
hsqldb.cache_scale=14
version=1.8.0
hsqldb.default_table_type=memory
hsqldb.cache_file_scale=1
hsqldb.log_size=200
modified=no
hsqldb.cache_version=1.7.0
hsqldb.original_version=1.8.0
hsqldb.compatible_version=1.8.0
Now create a directory named ".sqoop" if not already created and put these two files there. Now run your job.

Load CSV data to HBase using pig or hive

Hi I have created a pig script which loads data into hbase. My csv file is stored into hadoop location at /hbase_tables/zip.csv
Pig Script
register /home/hduser/pig-0.12.0/lib/pig-0.8.0-core.jar;
A = LOAD '/hbase_tables/zip.csv' USING PigStorage(',') as (id:chararray, zip:chararray, desc1:chararray, desc2:chararray, income:chararray);
STORE A INTO 'hbase://mydata' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('zip:zip,desc:desc1,desc:desc2,income:income');
when i execute it gives below error
Pig Stack Trace
ERROR 2017: Internal error creating job configuration.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:667)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
at org.apache.pig.PigServer.execute(PigServer.java:1190)
at org.apache.pig.PigServer.access$100(PigServer.java:128)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:1517)
at org.apache.pig.PigServer.executeBatchEx(PigServer.java:362)
at org.apache.pig.PigServer.executeBatch(PigServer.java:329)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:510)
at org.apache.pig.Main.main(Main.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hbase://mydata_logs
at org.apache.hadoop.fs.Path.initialize(Path.java:148)
at org.apache.hadoop.fs.Path.<init>(Path.java:71)
at org.apache.hadoop.fs.Path.<init>(Path.java:45)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:470)
... 20 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hbase://mydata_logs
at java.net.URI.checkPath(URI.java:1804)
at java.net.URI.<init>(URI.java:752)
at org.apache.hadoop.fs.Path.initialize(Path.java:145)
... 23 more
Please let me know how i can import csv data file into hbase or if you have any alternate solution.
Seems like your problem is with "Relative path" in absolute URI: hbase://mydata_logs.
Are you sure the path is correct?
Probably table mydata_logs does not exist. Start: hbase shell and type list. Is your table mydata_logs on the list?
I had the same task once and have fully-working solution (actually, I'm not sure about commas in your third line of the code):
%default hbase_home `echo \$HBASE_HOME`;
%default tmp '/user/alexander/tmp/users_dump/k14'
set zookeeper.znode.parent '/hbase-unsecure';
set hbase.zookeeper.quorum 'dmp-hbase.local';
register $hbase_home/lib/zookeeper-3.4.5.jar;
register $hbase_home/hbase-0.94.20.jar;
UsersHdfs = LOAD '$tmp' using PigStorage('\t', '-schema');
store UsersHdfs into 'hbase://user_test' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'id:DEFAULT id:last_modified birth:year gender:female gender:male','-caster HBaseBinaryConverter'
);
That code works for me, maybe the matter is in you hbase configs.
You could provide your .csv file and we could talk about it in more details.

SerDe problems with Hive 0.12 and Hadoop 2.2.0-cdh5.0.0-beta2

The title is a bit weird as I'm having difficulties narrowing down the problem. I used my solution on Hadoop 2.0.0-cdh4.4.0 and hive 0.10 without issues.
I can't create a table using this SerDe: https://github.com/rcongiu/Hive-JSON-Serde
first try:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.objectinspector.primitive.AbstractPrimitiveJavaObjectInspector.<init>(Lorg/apache/hadoop/hive/serde2/objectinspector/primitive/PrimitiveObjectInspectorUtils$PrimitiveTypeEntry;)V
second try:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Could not initialize class org.openx.data.jsonserde.objectinspector.JsonObjectInspectorFactory
I can create a table with this SerDe: https://github.com/cloudera/cdh-twitter-example
I create an external table with tweets from flume. I can't do "SELECT * FROM tweets;"
FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDejava.lang.RuntimeException: java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDe
I can do SELECT id, text FROM tweets;
I can do a SELECT COUNT(*) FROM tweets;
I can't self join this table:
Execution log at: /tmp/jochen.debie/jochen.debie_20140311121313_164611a9-b0d8-4e53-9bda-f9f7ac342aaf.log
2014-03-11 12:13:30 Starting to launch local task to process map join; maximum memory = 257294336
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID:
Stage-5
mentioned execution log:
2014-03-11 12:13:30,331 ERROR mr.MapredLocalTask (MapredLocalTask.java:executeFromChildJVM(324)) - Hive Runtime Error: Map local work failed
org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDejava.lang.RuntimeException: java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDe
Does anyone know how to fix this or at least show me where the problem is?
EDIT: Can it be a problem that I built the serde on a Hadoop 2.0.0-cdh4.4.0 and hive 0.10?
From what I've seen, Hive-.11+ has a bug in join with custom SerDe.
https://github.com/Esri/gis-tools-for-hadoop/issues/9
You might try the workaround of copying the JAR file containing the SerDe class, to $HIVE_HOME/lib .
(I see in your question you got ClassNotFoundException both in join and in other cases; so far the times I have encountered such were all with join.)
[Edit] Another workaround is to use HADOOP_CLASSPATH:
env HADOOP_CLASSPATH=some.jar:other.jar hive ...
[Edit] The work around applies to Hive versions 0.11 and 0.12; then 0.13 and above contain the fix for HIVE-6670.

HIve shell exception java type java.lang.Integer cant be mapped for this datastore

I have hadoop and hbase installed. When i run show tables comand in hive shell the following error raised.
Hive version 0.10.0
Hbase version 0.90.6
Hadoop version 1.1.2
hive> show tables;
FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.hive.metastore.api.MetaException javax.jdo.JDOFatalInternalException: JDBC type integer declared for field
"org.apache.hadoop.hive.metastore.model.MTable.createTime" of java type java.lang.Integer cant be mapped for this datastore.
NestedThrowables:
org.datanucleus.exceptions.NucleusException: JDBC type integer declared for field "org.apache.hadoop.hive.metastore.model.MTable.createTime" of java type java.lang.Integer cant be mapped for this datastore.)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I found where the problem comes from. Error is related to language settings of the linux box. Before launching hive export LANG=C is needful.

Error When Loading Data from HDFS and Writing to HBase using Pig

How to load the output data of a mapreduce program which is in the hdfs into hbase?
I tried to running the following pig command to load the data from hdfs to hbase:-
A = LOAD 'hdfs://b**/user/user1/development/hbase/output/part-00000' USING PigStorage('t') as (strdata1:chararray, strdata2:chararray);
STORE A INTO 'hbase://mydata' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycf:strdata2');
where, hdfs://b**/user/user1/development/hbase/output/part-00000 is the map-reduce output mydata is the hbase table name created
mycf is the column family name
I am getting the following error:-
ERROR 2017: Internal error creating job configuration.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:673)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
at org.apache.pig.PigServer.execute(PigServer.java:1190)
at org.apache.pig.PigServer.access$100(PigServer.java:128)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:1517)
at org.apache.pig.PigServer.executeBatchEx(PigServer.java:362)
at org.apache.pig.PigServer.executeBatch(PigServer.java:329)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:406)
at org.apache.pig.Main.main(Main.java:107)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hbase://mydata_logs at org.apache.hadoop.fs.Path.initialize(Path.java:148)
at org.apache.hadoop.fs.Path.<init>(Path.java:71)
at org.apache.hadoop.fs.Path.<init>(Path.java:45)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:476)
... 15 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hbase://mydata_logs at java.net.URI.checkPath(URI.java:1787)
at java.net.URI.<init>(URI.java:735)
at org.apache.hadoop.fs.Path.initialize(Path.java:145)
Only remove the hbase:// schema from data source string like that :
STORE A INTO 'mydata' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycf:strdata2');

Resources