apache pig not connecting to hdfs - hadoop

I have Hadoop version 2.6.3 and pig-0.6.0
I have all the daemons up and running in Single node cluster.
After firing the pig command . The pig is only connecting to file:/// not hdfs
could you please tell me how to make it to connect hdfs
below is the INFO log that could i see
2016-01-10 20:58:30,431 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2016-01-10 20:58:30,650 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
when I hit the command in GRUNT
grunt> ls hdfs://localhost:54310/
2016-01-10 21:05:41,059 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Wrong FS: hdfs://localhost:54310/, expected: file:///
Details at logfile: /home/hguna/pig_1452488310172.log
I have no clue has to why it is expecting file:///
ERROR 2999: Unexpected internal error. Wrong FS: hdfs://localhost:54310/, expected: file:///
java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:54310/, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:305)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
at org.apache.pig.tools.grunt.GruntParser.processLS(GruntParser.java:576)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:304)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:352)
Did I configure hadoop correctly ? or some where I am wrong please let me know if there is any file that I need to share . I have done enough researching could not fix it .Btw I am a newbie to Hadoop and pig
please help me .
Thanks

Chek your configuration in hadoop-site.xml, core-site.xml and mapred-site.xml
Use PIG_CLASSPATH to specify addition classpath entries. For eg, to add hadoop configuration files (hadoop-site.xml, core-site.xml) to classpath
export PIG_CLASSPATH=<path_to_hadoop_conf_dir>
you should override default classpath entries by setting PIG_USER_CLASSPATH_FIRST
export PIG_USER_CLASSPATH_FIRST=true
After that you can able to start the grunt shell

Related

INFO Configuration deprecation session id is deprecated Instead use dfs metrics session-id

I am trying to set up hadoop 2.6.2. Almost everything has been setup.
My Ubuntu version: 15.10
My hadoop path is /usr/local/hadoop/hadoop-2.6.2
Java path is /usr/local/java/jdk1.8.0_65
I have mentioned java and hadoop path in /etc/profile
I have edited 4 files inside hadoop-2.6.2/etc/hadoop: core-site.xml, hadoop-env.sh, hdfs-site.xml and mapred-site.xml
But when I try to execute following command from hadoop site
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar grep input output 'dfs[a-z.]+'
Then it gives me following error
INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/11/25 07:57:09 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
java.net.ConnectException: Call From jass-VirtualBox/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
What can be the reason?
I had the same problem but on ubuntu 14.04 LTS.
I have solved it with following commands:
sbin/stop-dfs.sh
bin/hdfs namenode -format
sbin/start-dfs.sh
The first command will stop all daemons.
The second will format file system.
The third will start all daemons again.

Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)

I am running a hive query throwh oozie using hue..
I am creating a table through hue-oozie work flow...
My job is failing but when I check in hive the table is created.
Log shows below error:
16157 [main] INFO org.apache.hadoop.hive.ql.hooks.ATSHook - Created ATS Hook
2015-09-24 11:05:35,801 INFO [main] hooks.ATSHook (ATSHook.java:<init>(84)) - Created ATS Hook
16159 [main] ERROR org.apache.hadoop.hive.ql.Driver - hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
2015-09-24 11:05:35,803 ERROR [main] ql.Driver (SessionState.java:printError(960)) - hive.exec.post.hooks Class not found:org.apache.atlas.hive.hook.HiveHook
16159 [main] ERROR org.apache.hadoop.hive.ql.Driver - FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.atlas.hive.hook.HiveHook)
java.lang.ClassNotFoundException: org.apache.atlas.hive.hook.HiveHook
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
Not able to identify the issue....
I am usig HDP 2.3.1
Basically this error is due to missing atlas jar in oozie share lib.
In HDP the Atlas jar is available in /usr/hdp/2.3.0.0-2557/atlas/
Put all the jars related to atlas in hadoop share lib ..
hadoop fs -put /usr/hdp/2.3.0.0-2557/atlas/hook/hive/* /user/oozie/share/lib/lib200344/hive
Add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' in hive-env.sh .
Copy <atlas package>/conf/application.propertiesto hive conf directory.
Restart the oozie services. This will solve this problem. If anybody face the problem please comment here so that I can help.
[Comment by Immo Huneke: when using the Hortonworks sandbox VM, I found that just putting the jar files in the share/lib folder under HDFS was enough to resolve the problem. I didn't have to update hive-env.sh or copy the application.properties file. But check the exact path of your share/lib folder by executing the command hdfs dfs -ls /user/oozie/share/lib before copying.]
hive>add jar /usr/hdp//atlas/hook/hive/hive-bridge-${VERSION}.jar
it will be ok.
hope help for u.
It Seems You CLASS is not found exception.
Have you installed Oozie Sharedlib, if Yes, please update all the hive dependent Jar in the sharedLib Location, and check if the status
Also check if Hive Client is available in all the Nodes under the cluster and same should be running
​I tried each and every possible solution mentioned in this forum and in stackoverflow, but it did not resolve my issue.
Finally, I resolved it by copying all the jars in /hook/hive to lib (create a new lib folder at job.properties level) folder of my oozie workflow

PIG setup throwing error

I was trying to install PIG v0.13.0 in my Fedora 20 system. After extracting the tar.gz contents, I did the PATH setup for JAVA_HOME and PIG/bin. Then I type the command pig in the console and this is what I got: Unable to understand what went wrong:
[root#localhost /]# pig
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-12-21 00:05:16,082 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-12-21 00:05:16,083 [main] INFO org.apache.pig.Main - Logging error messages to: //pig_1419100516081.log
2014-12-21 00:05:16,130 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-12-21 00:05:16,765 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:16,771 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-12-21 00:05:16,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2014-12-21 00:05:16,780 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-12-21 00:05:19,130 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:19,130 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2014-12-21 00:05:19,136 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> ls
2014-12-21 00:05:33,697 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Call From localhost.localdomain/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Details at logfile: //pig_1419100516081.log
Please let me know why did the ls command in grunt shell throw the error?
Please guide.
When you type pig in console, by default it will go to MAPREDUCE mode, for that you need access to a Hadoop cluster and HDFS installation. Mapreduce mode is the default mode in pig.
It looks like your hadoop cluster is not configured properly that is the reason you are getting the connection refunded error. Please follow up this link to solve this connect-refused problem.http://wiki.apache.org/hadoop/ConnectionRefused.
As a workaround use LOCAL mode, this doesn't need hadoop installation.
In the console type pig -x local this will bring the grunt shell and type ls command.
Local mode
$ pig -x local
Mapreduce mode
$ pig
(or) //try to connect HDFS
$ pig -x mapreduce
Ok I got this one working. if I connect to the pig mapreduce mode the the ls command will change to ls hdfs:/. Hence changing the above command from ls to ls hdfs:/ resolves my problem. But again, if I am connecting to the local mode then the ls command works fine.

ERROR 2998: Unhandled internal error. Run the code

executing the following command -x local -f /Hbase/load_hbase.pig
I get the following error
2014-11-08 23:36:47,455 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.1 (r1585011) compiled Apr 05 2014, 01:41:34
2014-11-08 23:36:47,455 [main] INFO org.apache.pig.Main - Logging error messages to: /home/eduardo/pig_1415497007452.log
2014-11-08 23:36:47,817 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/eduardo/.pigbootup not found
2014-11-08 23:36:47,918 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2014-11-08 23:36:48,436 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/hadoop/hbase/filter/WritableByteArrayComparable
Here is the code that I run:
raw_data = LOAD '/data/QCLCD201211/201201hourly.txt' USING PigStorage(',');
weather_data = FOREACH raw_data GENERATE $1, $10;
ranked_data = RANK weather_data;
final_data = FILTER ranked_data BY $0 IS NOT NULL;
STORE final_data INTO 'hbase://weather'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:date info:temp');
I wonder what I'm doing wrong I'll put down the version of hadoop, hbase and the pig.
Hadoop: hadoop-1.2.1
Hbase: hbase-0.96.2-hadoop1
Pig: pig-0.12.1
copy pig jar and hbase jar in hadoop
1) COPY THESE FILES TO THE HADOOP LIBRARY.
sudo cp /usr/lib/pig/lib/pig-common-0.8.0-cdh3u0.jar /usr/lib/hadoop/lib/
sudo cp /usr/lib/pig/lib/hbase-0.96.2-cdh3u0.jar /usr/lib/hadoop/lib/
sudo cp /usr/lib/pig/lib/hbase-0.96.2-cdh3u0.jar /usr/lib/hadoop/lib/
2)CLOSE HBASE AND HADOOP USING FOLLOWING COMMOND
/usr/lib/hadoop/bin/stop-all.sh
/usr/lib/hbase/bin/stop-hbase.sh
3) RESTART HBASE AND HADOOP USING COMMOND
/usr/lib/hadoop/bin/start-all.sh
/usr/lib/hadoop/bin/start-hbase.sh

Running hadoop job using java org.apache.hadoop.util.RunJar command

I want to submit a job to jobtracker using java (instead of hadoop) so that I can debug classpath issue.
export HADOOP_CLASSPATH=hbase-util-0.0.1-SNAPSHOT.jar:/etc/hadoop/conf:hbase-util-0.0.1-SNAPSHOT.jar:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hbase/*:/usr/lib/hadoop/etc/hadoop/mapred-site.xml:/usr/lib/zookeeper/zookeeper.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.0.1.jar:/usr/lib/hbase/hbase-0.92.1-cdh4.0.1-security.jar:/usr/lib/hbase/lib/zookeeper.jar:/usr/lib/hbase/lib:/etc/hbase/conf:/usr/lib/hbase/lib/guava-11.0.2.jar:/usr/lib/hbase/lib/jackson-mapper-asl-1.5.5.jar:/usr/lib/hbase/lib/jackson-core-asl-1.5.5.jar:/usr/lib/hbase:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/*:/usr/lib/hadoop-0.20-mapreduce/.//*
java -cp ${HADOOP_CLASSPATH} org.apache.hadoop.util.RunJar hbase-util-0.0.1-SNAPSHOT.jar hbase.util.RowDiffCounter SRM hdfs://dchilcmsnn01:8020/tmp/hadoop/mapred/temp/job1-temp-1491763074 /tmp/hadoop/mapred/temp/job1-temp-1491763075D SOURCE_MANAGEMENT SOURCE_MANAGEMENT
I get an error
ERROR [main] (UserGroupInformation.java:1235) - PriviledgedActionException as:devuser (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
Adding the following properties does not help. I checked the job configuration page on the jobtracker to get the correct value.
-D mapreduce.framework.name=local
-D mapred.job.tracker=host101:8021
Do I need to pass in the user info as well?

Resources