PIG setup throwing error


I was trying to install Pig v0.13.0 on my Fedora 20 system. After extracting the tar.gz contents, I set JAVA_HOME and added PIG/bin to PATH. Then I typed the command pig in the console and this is what I got. I am unable to understand what went wrong:
[root@localhost /]# pig
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-12-21 00:05:16,082 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-12-21 00:05:16,083 [main] INFO org.apache.pig.Main - Logging error messages to: //pig_1419100516081.log
2014-12-21 00:05:16,130 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-12-21 00:05:16,765 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:16,771 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-12-21 00:05:16,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2014-12-21 00:05:16,780 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-12-21 00:05:19,130 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:19,130 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2014-12-21 00:05:19,136 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> ls
2014-12-21 00:05:33,697 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Call From localhost.localdomain/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Details at logfile: //pig_1419100516081.log
Why did the ls command in the grunt shell throw this error?
Please guide.

When you type pig in the console, it starts in MAPREDUCE mode by default, which requires access to a Hadoop cluster and an HDFS installation.
It looks like your Hadoop cluster is not configured properly or is not running, which is why you are getting the connection refused error. Follow this link to troubleshoot the problem: http://wiki.apache.org/hadoop/ConnectionRefused
As a workaround, use LOCAL mode, which doesn't need a Hadoop installation.
In the console, type pig -x local to bring up the grunt shell, then type the ls command.
Local mode
$ pig -x local
Mapreduce mode (tries to connect to HDFS)
$ pig
(or)
$ pig -x mapreduce
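The choice between the two modes can be sketched as a small shell check. This is only an illustration: pick_pig_mode is a hypothetical helper, and it assumes the NameNode address localhost:8020 shown in the log above. It relies on bash's /dev/tcp feature, so under a shell without it the check always falls back to local.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "mapreduce" if something accepts TCP
# connections on the NameNode port, otherwise print "local".
pick_pig_mode() {
  local host="${1:-localhost}" port="${2:-8020}"
  # bash's /dev/tcp pseudo-device attempts a TCP connect; running it in a
  # subshell means the descriptor is closed again as soon as the check ends.
  if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo mapreduce
  else
    echo local
  fi
}

mode="$(pick_pig_mode localhost 8020)"
echo "Suggested Pig mode: ${mode}"
# Then start the grunt shell with: pig -x "${mode}"
```

An open port only shows that something is listening. It does not prove that HDFS is healthy, so the ConnectionRefused wiki page is still the place to go for a proper diagnosis.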

OK, I got this working. When I connect in Pig's mapreduce mode, the ls command has to become ls hdfs:/, so changing the command from ls to ls hdfs:/ resolved my problem. In local mode, the plain ls command works fine.

Related

Not able to export Hbase table into CSV file using HUE Pig Script

I have installed Apache Ambari and configured Hue. I want to export HBase table data into a CSV file using a Pig script, but I am getting the following error.
2017-06-03 10:27:45,518 [ATS Logger 0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Exception caught by TimelineClientConnectionRetry, will try 30 more time(s).
Message: java.net.ConnectException: Connection refused
2017-06-03 10:27:45,703 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-06-03 10:27:45,709 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: file '/usr/lib/hbase/lib/hbase-common-1.2.0-cdh5.11.0.jar' does not exist.
2017-06-03 10:27:45,899 [main] INFO org.apache.pig.Main - Pig script completed in 4 seconds and 532 milliseconds (4532 ms)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], exit code 2
Oozie Launcher failed, finishing Hadoop job gracefully
Please help me understand what I am doing wrong.
Let me know if you need more details.

Pig script does not exist error, even though I can see it in HDFS

I am trying to run a Pig script with the -useHCatalog and -f options, but it fails.
It says the script does not exist, although I can see that the file is present in the HDFS file system. See below.
[hdfs@ip-xx-xx-xx-x-xx ec2-user]$ pig -useHCatalog -f /user/admin/pig/scripts/hcat1.pig
WARNING: Use "yarn jar" to launch YARN applications.
16/04/01 13:44:13 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
16/04/01 13:44:13 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
16/04/01 13:44:13 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2016-04-01 13:44:13,645 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0.2.3.4.0-3485 (rexported) compiled Dec 16 2015, 04:30:33
2016-04-01 13:44:13,645 [main] INFO org.apache.pig.Main - Logging error messages to: /tmp/hsperfdata_hdfs/pig_1459532653643.log
2016-04-01 13:44:14,184 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. File /user/admin/pig/scripts/hcat1.pig does not exist
Details at logfile: /tmp/hsperfdata_hdfs/pig_1459532653643.log
2016-04-01 13:44:14,203 [main] INFO org.apache.pig.Main - Pig script completed in 753 milliseconds (753 ms)
[hdfs@ip-xxx-xx-xx-xx ec2-user]$ hadoop fs -cat /user/admin/pig/scripts/hcat1.pig
a = load 'trucks' using org.apache.hive.hcatalog.pig.HCatLoader();
b = filter a by truckid == 'A1';
store b INTO '/user/admin/pig/scritps/outputb1';

To run a script that is stored in HDFS, you need to specify its complete HDFS URI. Without the hdfs:// scheme, Pig looks for the script on the local file system.
Here is what you need:
$ pig -useHCatalog hdfs://namenode_hostname:port/user/admin/pig/scripts/hcat1.pig
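A minimal sketch of building that URI in the shell, so the scheme and path are joined without a doubled slash. The helper name hdfs_script_uri is hypothetical, and port 8020 is only an assumed NameNode port. On a configured client, hdfs getconf -confKey fs.defaultFS can supply the base URI instead of typing it by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join a filesystem URI and an absolute path
# without producing a doubled slash between them.
hdfs_script_uri() {
  local base="${1%/}" path="${2#/}"
  echo "${base}/${path}"
}

# namenode_hostname is a placeholder; 8020 is an assumed port.
script="$(hdfs_script_uri "hdfs://namenode_hostname:8020" "/user/admin/pig/scripts/hcat1.pig")"
echo "${script}"

# If the hdfs client is installed, read the default filesystem from its
# configuration and print the resulting pig command line.
if command -v hdfs >/dev/null 2>&1; then
  base="$(hdfs getconf -confKey fs.defaultFS)"
  echo "pig -useHCatalog $(hdfs_script_uri "${base}" "/user/admin/pig/scripts/hcat1.pig")"
fi
```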

PIG command execution

I am learning Hadoop by myself, so I am not sure if what I am asking is even a problem. When I run the command pig -x local to run Pig locally, I get the following messages:
15/10/05 15:23:28 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
15/10/05 15:23:28 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2015-10-05 15:23:28,830 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0 (r1682971) compiled Jun 01 2015, 11:44:35
2015-10-05 15:23:28,831 [main] INFO org.apache.pig.Main - Logging error messages to: /home/nkhl/pig_1444038808829.log
2015-10-05 15:23:29,050 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/nkhl/.pigbootup not found
2015-10-05 15:23:29,333 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-10-05 15:23:29,334 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-10-05 15:23:29,335 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2015-10-05 15:23:29,562 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
It looks different on my online tutor's screen so I am a little confused.
What concerns me most is the deprecation part. Can someone help me with that please? What is it trying to say? Don't get me wrong, everything works fine. The GRUNT shell loads up, and things execute fine. I just wanted to know what that meant.
It's an Ubuntu machine.
Thanks!

Running Pig in local mode is great, AFAIK, if you are using it for some quick testing, like displaying the sysout in a UDF.
You can safely ignore the messages above. They say that some of the variables set in conf-site.xml are deprecated.
You can switch off those messages by editing the
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation
entry in the log4j.properties file.
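As a sketch of that edit, the snippet below writes a small log4j 1.2 properties file that raises the deprecation logger to WARN, which hides the INFO-level notices. The file name and the console appender layout are assumptions chosen for illustration. Pig's -log4jconf option can then point at the file.

```shell
#!/usr/bin/env bash
# Assumed location for the example file; any writable path works.
conf_file="${TMPDIR:-/tmp}/pig-quiet-log4j.properties"

cat > "${conf_file}" <<'EOF'
# Minimal console logging so other messages are still shown.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
# Raise the deprecation logger above INFO so the notices are not printed.
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
EOF

echo "wrote ${conf_file}"
# To use it: pig -x local -log4jconf "${conf_file}"
```

This only hides the messages. The deprecated keys are still read exactly as before.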

You have some Hadoop-related variables set, such as HADOOP_HOME or HADOOP_PREFIX or HADOOP_CONF_DIR, which aren't needed if you are running Pig in local mode.
unset HADOOP_HOME
unset HADOOP_PREFIX
unset HADOOP_CONF_DIR
Deprecations aren't scary. They are a reminder that the code is calling on something that will eventually go away in a future version. These specific deprecations are caused by differences between Hadoop 1 vs Hadoop 2. Pig is compatible with both versions. If you happened to be using Hadoop 1.2.1 instead of 2.x, you wouldn't see the warnings. This is because Pig is checking the Hadoop 1 values first.
If you're interested in learning more, you can check out the Pig source code.
https://github.com/apache/pig/blob/release-0.15.0/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java#L219-L222

Flag -useHCatalog not working

I installed CDH 5.4 on a single node following the instructions here. I also put the hive-metastore in local mode using these instructions, and everything works perfectly except when I try to connect Pig with the metastore:
➜ ~ pig -useHCatalog
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2015-05-01 15:45:08,657 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.4.0 (rUnversioned directory) compiled Apr 21 2015, 12:19:15
2015-05-01 15:45:08,658 [main] INFO org.apache.pig.Main - Logging error messages to: /home/itam/pig_1430495108571.log
2015-05-01 15:45:09,035 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:09,035 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:09,035 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2015-05-01 15:45:09,940 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:09,941 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2015-05-01 15:45:09,941 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:09,999 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,001 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,088 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,089 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,125 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,126 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,160 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,162 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,194 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,195 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,227 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,228 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,261 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,262 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,295 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,296 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
and when I tried to access the table:
grunt> a = load 'ufos' using org.apache.hcatalog.pig.HCatLoader();
2015-05-01 15:46:11,656 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /home/itam/pig_1430495108571.log
grunt>
Hadoop version
➜ ~ hadoop version
Hadoop 2.6.0-cdh5.4.0
Subversion http://github.com/cloudera/hadoop -r c788a14a5de9ecd968d1e2666e8765c5f018c271
Compiled by jenkins on 2015-04-21T19:16Z
Compiled with protoc 2.5.0
From source with checksum cd78f139c66c13ab5cee96e15a629025
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.4.0.jar
UPDATE: I just tried with Impala, and it doesn't see anything either:
➜ ~ impala-shell
/usr/lib/python2.7/dist-packages/pkg_resources.py:1049: UserWarning: /home/itam/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
warnings.warn(msg, UserWarning)
Starting Impala Shell without Kerberos authentication
Connected to 6b512e41337d:21000
Server version: impalad version 2.2.0-cdh5 RELEASE (build 2ffd73a4255cefd521362ffe1cfb37463f67f75c)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: Impala Shell v2.2.0-cdh5 (2ffd73a) built on Tue Apr 21 12:09:21 PDT 2015)
[6b512e41337d:21000] > invalidate metadata;
Query: invalidate metadata
[6b512e41337d:21000] > show tables;
Query: show tables
Fetched 0 row(s) in 0.00s
but from beeline:
~ beeline -u jdbc:hive2://
scan complete in 2ms
Connecting to jdbc:hive2://
Connected to: Apache Hive (version 1.1.0-cdh5.4.0)
Driver: Hive JDBC (version 1.1.0-cdh5.4.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.4.0 by Apache Hive
0: jdbc:hive2://> show tables;
OK
+-----------+--+
| tab_name  |
+-----------+--+
| ufos      |
+-----------+--+
1 row selected (0.701 seconds)
It worked... What is happening?
UPDATE: I am running hcatalog too
➜ ~ sudo service hive-webhcat-server status
* WEBHCat server is running
➜ ~ hcat -e "desc ufos"
OK
timestamp string from deserializer
city string from deserializer
state string from deserializer
shape string from deserializer
duration string from deserializer
summary string from deserializer
posted string from deserializer
Time taken: 1.314 seconds
UPDATE: The problem with Impala was that I hadn't copied hive-site.xml to /etc/impala/conf. Once I did that, impala-shell worked properly.

The loader you are using is deprecated. Instead of using org.apache.hcatalog.pig.HCatLoader, you need to use org.apache.hive.hcatalog.pig.HCatLoader.
From org.apache.hcatalog.pig.HCatLoader:
Deprecated.
Use/modify HCatLoader instead

I was facing this issue with HDP 2.3 and Pig 0.15.
The package name for the HCatLoader() class is different in the Hortonworks distribution.
The following worked for me
USING org.apache.hive.hcatalog.pig.HCatLoader()
instead of
USING org.apache.hcatalog.pig.HCatLoader();
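For existing scripts, the rename can be applied mechanically. The following is a minimal sketch: fix_hcat_package is a hypothetical helper that rewrites the old package prefix on standard input, and it is shown here on the load statement from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the old HCatalog Pig package prefix to the
# org.apache.hive one. Lines that already use the new prefix do not match
# the pattern, so running it twice is harmless.
fix_hcat_package() {
  sed 's/org\.apache\.hcatalog\.pig\./org.apache.hive.hcatalog.pig./g'
}

printf '%s\n' "a = load 'ufos' using org.apache.hcatalog.pig.HCatLoader();" | fix_hcat_package
```

To update a script file, redirect it through the function into a new file and review the result before replacing the original.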

As you started to see, the issue is with the hive-site.xml file: you need to place it in the classpath.
As mentioned here:
A workflow action interacting with HCatalog requires the following
jars in the classpath: hcatalog-core.jar, webhcat-java-client.jar,
hive-common.jar, hive-exec.jar, hive-metastore.jar, hive-serde.jar and
libfb303.jar. hive-site.xml which has the configuration to talk to the
HCatalog server also needs to be in the classpath. The correct version
of HCatalog and hive jars should be placed in classpath based on the
version of HCatalog installed on the cluster.
The jars can be added to the classpath of the action using one of the
below ways.
You can place the jars and hive-site.xml in the system shared library.
The shared library for a pig, hive or java action can be overridden to
include hcatalog shared libraries along with the action's shared
library. Refer to Shared Libraries for more information. The
oozie-sharelib-[version].tar.gz in the oozie distribution bundles the
required HCatalog jars in a hcatalog sharelib. If using a different
version of HCatalog than the one bundled in the sharelib, copy the
required HCatalog jars from such version into the sharelib.
You can place the jars and hive-site.xml in the workflow application
lib/ path.
You can specify the location of the jar files in archive tag and
the hive-site.xml in file tag in the corresponding pig, hive or java
action.
If you are going to use an Oozie coordinator, upload them to the HDFS coordinator path.

Pig error while entering simple scripts in hadoop 2 environment

I am using hadoop-2.5.1 and pig-0.13.0, and my Hadoop cluster is running very well. When I try to run a simple Pig script
test = load '/input-data/data10' using PigStorage(',');
I am getting an error:
2014-11-13 15:41:19,278 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-11-13 15:41:19,279 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.addres
If anyone has a solution, please let me know.

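This is not an error, it's just INFO-level logging.
If your script does not attempt to perform any action on the loaded data and doesn't throw any error, then it probably works fine. A LOAD statement on its own does not run a job; Pig only executes the plan when it reaches an output statement such as DUMP or STORE.
Try adding an action on the test variable, for example: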
DUMP test;
