pig script param with ERROR 2999 - hadoop

Hi, I am trying to run a Pig script with parameters.
cat unbz2.pig
a = load '$source' using PigStorage();
store a into '$target' using PigStorage();
Then I run the following command from the command line:
$ pig -f /home/user/unbz2.pig –param source=/part-m-* -param target=/unzip2
2014-08-22 11:51:33,015 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.6.0 (rexported) compiled Feb 26 2014, 03:01:22
2014-08-22 11:51:33,016 [main] INFO org.apache.pig.Main - Logging error messages to: /home/user/pig_1408722693009.log
2014-08-22 11:51:34,041 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/user/.pigbootup not found
2014-08-22 11:51:34,151 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Undefined parameter : source
Details at logfile: /home/user/pig_1408722693009.log
I don't know what I am doing wrong. Please help.

You need to pass the parameters before the script path on the command line; right now you are passing them after it. Pass each parameter like this:
pig -param parameter=<parameter value> /home/user/unbz2.pig
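A minimal sketch of the full invocation, using the paths from the question. The first option in the question's command appears to be typed as –param (an en dash) rather than -param; if that is what was actually entered, Pig would probably not treat it as an option, which would be consistent with the "Undefined parameter : source" error. The helper name run_unbz2 is hypothetical, and the fallback branch exists only so the sketch can be tried on a machine without Pig.

```shell
# run_unbz2 is a hypothetical helper name used only in this sketch.
# $1 = source path or glob, $2 = target directory
run_unbz2() {
  # Every option uses an ASCII hyphen, and both parameters precede the script.
  set -- -param "source=$1" -param "target=$2" -f /home/user/unbz2.pig
  if command -v pig >/dev/null 2>&1; then
    pig "$@"
  else
    # pig is not on the PATH here: list the arguments that would be passed
    printf '%s\n' "$@"
  fi
}

# Quote the glob so the shell hands it to Pig unexpanded.
run_unbz2 '/part-m-*' /unzip2
```

Quoting '/part-m-*' matters because an unquoted glob may be expanded by the shell against the local filesystem before Pig ever sees it.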

Alternatively, you could drop the $ and hard-code the values, writing the actual paths in place of 'source' and 'target':
a = load 'source' using PigStorage();
store a into 'target' using PigStorage();

Related

Not able to export Hbase table into CSV file using HUE Pig Script

I have installed Apache Ambari and configured Hue. I want to export HBase table data into a CSV file using a Pig script, but I am getting the following error.
2017-06-03 10:27:45,518 [ATS Logger 0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Exception caught by TimelineClientConnectionRetry, will try 30 more time(s).
Message: java.net.ConnectException: Connection refused
2017-06-03 10:27:45,703 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-06-03 10:27:45,709 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: file '/usr/lib/hbase/lib/hbase-common-1.2.0-cdh5.11.0.jar' does not exist.
2017-06-03 10:27:45,899 [main] INFO org.apache.pig.Main - Pig script completed in 4 seconds and 532 milliseconds (4532 ms)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], exit code 2
Oozie Launcher failed, finishing Hadoop job gracefully
Please help me understand what I am doing wrong.

Pig script "does not exist" error, even though I can see it in HDFS

I am trying to run a Pig script using the -useHCatalog and -f options, but it fails.
It says the script does not exist, although I can see that the file is present in HDFS. See below.
[hdfs@ip-xx-xx-xx-x-xx ec2-user]$ pig -useHCatalog -f /user/admin/pig/scripts/hcat1.pig
WARNING: Use "yarn jar" to launch YARN applications.
16/04/01 13:44:13 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
16/04/01 13:44:13 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
16/04/01 13:44:13 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2016-04-01 13:44:13,645 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0.2.3.4.0-3485 (rexported) compiled Dec 16 2015, 04:30:33
2016-04-01 13:44:13,645 [main] INFO org.apache.pig.Main - Logging error messages to: /tmp/hsperfdata_hdfs/pig_1459532653643.log
2016-04-01 13:44:14,184 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. File /user/admin/pig/scripts/hcat1.pig does not exist
Details at logfile: /tmp/hsperfdata_hdfs/pig_1459532653643.log
2016-04-01 13:44:14,203 [main] INFO org.apache.pig.Main - Pig script completed in 753 milliseconds (753 ms)
[hdfs@ip-xxx-xx-xx-xx ec2-user]$ hadoop fs -cat /user/admin/pig/scripts/hcat1.pig
a = load 'trucks' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by truckid == 'A1';
store b INTO '/user/admin/pig/scritps/outputb1';
You need to specify the complete HDFS URI to run the scripts that are stored in HDFS.
Here is what you need:
$ pig -useHCatalog hdfs://namenode_hostname:port/user/admin/pig/scripts/hcat1.pig
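If you would rather not type the NameNode host and port by hand, the following sketch reads them from the client configuration with hdfs getconf. It assumes the hdfs and pig commands are both on the PATH and that fs.defaultFS points at the cluster holding the script; script_uri is a hypothetical helper name.

```shell
# script_uri is a hypothetical helper: it joins a filesystem URI
# (for example the value of fs.defaultFS) and an absolute HDFS path.
script_uri() {
  # ${1%/} drops one trailing slash so the result has no double slash
  printf '%s%s\n' "${1%/}" "$2"
}

if command -v hdfs >/dev/null 2>&1 && command -v pig >/dev/null 2>&1; then
  # Ask the client configuration for the NameNode URI instead of typing it
  fs=$(hdfs getconf -confKey fs.defaultFS)
  pig -useHCatalog "$(script_uri "$fs" /user/admin/pig/scripts/hcat1.pig)"
fi
```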

PIG setup throwing error

I was trying to install Pig v0.13.0 on my Fedora 20 system. After extracting the tar.gz contents, I set JAVA_HOME and added PIG/bin to the PATH. Then I typed the command pig in the console and got the output below. I am unable to understand what went wrong:
[root@localhost /]# pig
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-12-21 00:05:16,082 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-12-21 00:05:16,083 [main] INFO org.apache.pig.Main - Logging error messages to: //pig_1419100516081.log
2014-12-21 00:05:16,130 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-12-21 00:05:16,765 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:16,771 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-12-21 00:05:16,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2014-12-21 00:05:16,780 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-12-21 00:05:19,130 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:19,130 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2014-12-21 00:05:19,136 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> ls
2014-12-21 00:05:33,697 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Call From localhost.localdomain/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Details at logfile: //pig_1419100516081.log
Why did the ls command in the grunt shell throw this error?
Please advise.
When you type pig in the console, it starts in MAPREDUCE mode by default, and for that you need access to a Hadoop cluster and an HDFS installation.
It looks like your Hadoop cluster is not configured properly, which is why you are getting the connection refused error. See this page to resolve the problem: http://wiki.apache.org/hadoop/ConnectionRefused
As a workaround, use LOCAL mode, which does not need a Hadoop installation.
In the console, type pig -x local to bring up the grunt shell, then type the ls command.
Local mode
$ pig -x local
Mapreduce mode
$ pig
(or) //try to connect HDFS
$ pig -x mapreduce
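As a rough sketch, a wrapper script could probe the NameNode address before deciding which mode to start. This assumes the NameNode is expected at localhost:8020 (the address shown in the log above) and that nc is installed; choose_exectype is a hypothetical helper name, and if nc is missing the sketch simply falls back to local mode.

```shell
# choose_exectype is a hypothetical helper: it prints "mapreduce" when
# something accepts connections at host:port, otherwise "local".
choose_exectype() {
  host=$1
  port=$2
  # nc -z only tests whether the port accepts a connection; -w 2 caps the wait
  if command -v nc >/dev/null 2>&1 && nc -z -w 2 "$host" "$port" >/dev/null 2>&1; then
    echo mapreduce
  else
    echo local
  fi
}

mode=$(choose_exectype localhost 8020)
# Print the command rather than opening an interactive grunt shell here
echo "would run: pig -x $mode"
```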
OK, I got this working. If I connect in mapreduce mode, the ls command has to be changed to ls hdfs:/. Changing the command from ls to ls hdfs:/ resolved my problem. In local mode, plain ls works fine.

ERROR 2998: Unhandled internal error. Run the code

When I run Pig with the options -x local -f /Hbase/load_hbase.pig
I get the following error:
2014-11-08 23:36:47,455 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.1 (r1585011) compiled Apr 05 2014, 01:41:34
2014-11-08 23:36:47,455 [main] INFO org.apache.pig.Main - Logging error messages to: /home/eduardo/pig_1415497007452.log
2014-11-08 23:36:47,817 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/eduardo/.pigbootup not found
2014-11-08 23:36:47,918 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2014-11-08 23:36:48,436 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/hadoop/hbase/filter/WritableByteArrayComparable
Here is the code that I run:
raw_data = LOAD '/data/QCLCD201211/201201hourly.txt' USING PigStorage(',');
weather_data = FOREACH raw_data GENERATE $1, $10;
ranked_data = RANK weather_data;
final_data = FILTER ranked_data BY $0 IS NOT NULL;
STORE final_data INTO 'hbase://weather'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:date info:temp');
I wonder what I'm doing wrong. Here are the versions of Hadoop, HBase and Pig that I am using:
Hadoop: hadoop-1.2.1
Hbase: hbase-0.96.2-hadoop1
Pig: pig-0.12.1
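Copy the Pig jar and the HBase jar into the Hadoop library directory. The jar names below are examples; adjust them to the versions installed on your system.
1) COPY THESE FILES TO THE HADOOP LIBRARY.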
sudo cp /usr/lib/pig/lib/pig-common-0.8.0-cdh3u0.jar /usr/lib/hadoop/lib/
sudo cp /usr/lib/pig/lib/hbase-0.96.2-cdh3u0.jar /usr/lib/hadoop/lib/
2) STOP HBASE AND HADOOP USING THE FOLLOWING COMMANDS
/usr/lib/hadoop/bin/stop-all.sh
/usr/lib/hbase/bin/stop-hbase.sh
3) RESTART HADOOP AND HBASE USING THE FOLLOWING COMMANDS
/usr/lib/hadoop/bin/start-all.sh
/usr/lib/hbase/bin/start-hbase.sh

Reading hive table using Pig script

I am trying to read a Hive table using a Pig script, but when I run the Pig code it gives me the following error:
2014-02-12 15:48:36,143 [main] WARN org.apache.hadoop.hive.conf.HiveConf
-hive-site.xml not found on CLASSPATH 2014-02-12 15:49:10,781 [main] ERROR
org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate
exception from backed error: Error: Found class
org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected
(Ignore newlines and whitespace added for readability)
Hadoop version
1.1.1
Hive version
0.9.0
Pig version
0.10.0
Pig code
a = LOAD '/user/hive/warehouse/test' USING
org.apache.pig.piggybank.storage.HiveColumnarLoader('name string');
Is it due to a version mismatch?
Why not use HCatalog to access the Hive metadata from Pig?
Check this for an example
