Flag -useHCatalog not working - hadoop

I installed CDH 5.4 on a single node following the instructions here, and I put the hive-metastore in local mode using these instructions. Everything works perfectly, except when I try to connect Pig to the metastore:
➜ ~ pig -useHCatalog
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2015-05-01 15:45:08,657 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0-cdh5.4.0 (rUnversioned directory) compiled Apr 21 2015, 12:19:15
2015-05-01 15:45:08,658 [main] INFO org.apache.pig.Main - Logging error messages to: /home/itam/pig_1430495108571.log
2015-05-01 15:45:09,035 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:09,035 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:09,035 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2015-05-01 15:45:09,940 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:09,941 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2015-05-01 15:45:09,941 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:09,999 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,001 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,088 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,089 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,125 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,126 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,160 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,162 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,194 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,195 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,227 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,228 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,261 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,262 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-05-01 15:45:10,295 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-05-01 15:45:10,296 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
and when I tried to access the table:
grunt> a = load 'ufos' using org.apache.hcatalog.pig.HCatLoader();
2015-05-01 15:46:11,656 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve org.apache.hcatalog.pig.HCatLoader using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /home/itam/pig_1430495108571.log
grunt>
Hadoop version
➜ ~ hadoop version
Hadoop 2.6.0-cdh5.4.0
Subversion http://github.com/cloudera/hadoop -r c788a14a5de9ecd968d1e2666e8765c5f018c271
Compiled by jenkins on 2015-04-21T19:16Z
Compiled with protoc 2.5.0
From source with checksum cd78f139c66c13ab5cee96e15a629025
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.4.0.jar
UPDATE: I just tried with Impala, and it does not see anything either:
➜ ~ impala-shell
/usr/lib/python2.7/dist-packages/pkg_resources.py:1049: UserWarning: /home/itam/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
warnings.warn(msg, UserWarning)
Starting Impala Shell without Kerberos authentication
Connected to 6b512e41337d:21000
Server version: impalad version 2.2.0-cdh5 RELEASE (build 2ffd73a4255cefd521362ffe1cfb37463f67f75c)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Shell build version: Impala Shell v2.2.0-cdh5 (2ffd73a) built on Tue Apr 21 12:09:21 PDT 2015)
[6b512e41337d:21000] > invalidate metadata;
Query: invalidate metadata
[6b512e41337d:21000] > show tables;
Query: show tables
Fetched 0 row(s) in 0.00s
but from beeline:
~ beeline -u jdbc:hive2://
scan complete in 2ms
Connecting to jdbc:hive2://
Connected to: Apache Hive (version 1.1.0-cdh5.4.0)
Driver: Hive JDBC (version 1.1.0-cdh5.4.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.4.0 by Apache Hive
0: jdbc:hive2://> show tables;
OK
+-----------+--+
| tab_name |
+-----------+--+
| ufos |
+-----------+--+
1 row selected (0.701 seconds)
It worked... What is happening?
UPDATE: I am running hcatalog too
➜ ~ sudo service hive-webhcat-server status
* WEBHCat server is running
➜ ~ hcat -e "desc ufos"
OK
timestamp string from deserializer
city string from deserializer
state string from deserializer
shape string from deserializer
duration string from deserializer
summary string from deserializer
posted string from deserializer
Time taken: 1.314 seconds
UPDATE: The Impala problem was because I had not copied hive-site.xml to /etc/impala/conf. Once I did that, impala-shell worked properly.
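For anyone hitting the same thing, the copy step can be sketched roughly as below. Only the destination /etc/impala/conf comes from the update above; the source directory /etc/hive/conf is an assumption about where Hive keeps its client configuration, and copy_hive_site is a hypothetical helper name used just for this sketch.

```shell
#!/usr/bin/env bash
# Sketch: copy Hive's hive-site.xml into another service's config directory
# so that both point at the same metastore.
# copy_hive_site is a hypothetical helper, not part of any CDH tooling.
copy_hive_site() {
    local src_dir="$1" dst_dir="$2"
    # Refuse to continue if the source file is missing.
    if [ ! -f "$src_dir/hive-site.xml" ]; then
        echo "hive-site.xml not found in $src_dir" >&2
        return 1
    fi
    cp "$src_dir/hive-site.xml" "$dst_dir/hive-site.xml"
}

# Intended usage on the real machine (ASSUMPTION: Hive's config lives in
# /etc/hive/conf; run with sufficient privileges):
#   copy_hive_site /etc/hive/conf /etc/impala/conf
```

Impala may need a restart before it picks up the new file; that step is an assumption and is not stated in the update.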

The loader you are using is deprecated. Instead of using org.apache.hcatalog.pig.HCatLoader, you need to use org.apache.hive.hcatalog.pig.HCatLoader.
From org.apache.hcatalog.pig.HCatLoader:
Deprecated.
Use/modify HCatLoader instead

I faced the same issue on HDP 2.3 with Pig 0.15.
The package name of the HCatLoader class is different in the Hortonworks distribution.
The following worked for me:
USING org.apache.hive.hcatalog.pig.HCatLoader()
instead of
USING org.apache.hcatalog.pig.HCatLoader();
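If the old package name appears in existing scripts, the rename described in the answers above can be applied with sed. This is a minimal sketch: the scratch file and its single LOAD line are made up for the demo, and only the two package names come from the answers.

```shell
#!/usr/bin/env bash
# Create a throwaway Pig script that still uses the deprecated package.
# (Hypothetical file, used only to demonstrate the substitution.)
script="$(mktemp)"
cat > "$script" <<'EOF'
a = LOAD 'ufos' USING org.apache.hcatalog.pig.HCatLoader();
EOF

# Rewrite org.apache.hcatalog.pig. to org.apache.hive.hcatalog.pig.
# Dots are escaped so they match literally; a .bak copy of the original is kept.
sed -i.bak 's/org\.apache\.hcatalog\.pig\./org.apache.hive.hcatalog.pig./g' "$script"

cat "$script"
```

Running the substitution a second time changes nothing, because the new package name does not contain the old one as a substring.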

As you have started to see, the issue is with the hive-site.xml file: you need to place it in the classpath.
As mentioned here:
A workflow action interacting with HCatalog requires the following
jars in the classpath: hcatalog-core.jar, webhcat-java-client.jar,
hive-common.jar, hive-exec.jar, hive-metastore.jar, hive-serde.jar and
libfb303.jar. hive-site.xml which has the configuration to talk to the
HCatalog server also needs to be in the classpath. The correct version
of HCatalog and hive jars should be placed in classpath based on the
version of HCatalog installed on the cluster.
The jars can be added to the classpath of the action using one of the
below ways.
You can place the jars and hive-site.xml in the system shared library.
The shared library for a pig, hive or java action can be overridden to
include hcatalog shared libraries along with the action's shared
library. Refer to Shared Libraries for more information. The
oozie-sharelib-[version].tar.gz in the oozie distribution bundles the
required HCatalog jars in a hcatalog sharelib. If using a different
version of HCatalog than the one bundled in the sharelib, copy the
required HCatalog jars from such version into the sharelib.
You can place the jars and hive-site.xml in the workflow application lib/
path.
You can specify the location of the jar files in archive tag and
the hive-site.xml in file tag in the corresponding pig, hive or java
action.
If you are going to use an Oozie coordinator, upload them to the coordinator's HDFS path.

Related

Error While executing pig script which is saved in HDFS path

I have placed a .pig file and a .txt file in an HDFS path.
When I try to execute the .pig file from Grunt, I get the error below.
students.txt
001,Rajiv,Hyderabad
002,siddarth,Kolkata
003,Rajesh,Delhi
Script.pig
student = LOAD 'hdfs://localhost:8020/pig_data/students.txt' USING PigStorage(',')
as (id:int,name:chararray,city:chararray);
Dump student;
grunt> exec /user/cloudera/pig_data/script.pig
2019-07-24 12:13:35,303 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-07-24 12:13:35,304 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-07-24 12:13:35,376 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2019-07-24 12:13:35,378 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-07-24 12:13:35,384 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. File not found: /user/cloudera/pig_data/script.pig
Details at logfile: /home/cloudera/pig_1563995548146.log
grunt>

PIG command execution

I am learning Hadoop by myself, so I am not sure if what I am asking is even a problem. When I run the command pig -x local to run Pig locally, I get the following messages:
15/10/05 15:23:28 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
15/10/05 15:23:28 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2015-10-05 15:23:28,830 [main] INFO org.apache.pig.Main - Apache Pig version 0.15.0 (r1682971) compiled Jun 01 2015, 11:44:35
2015-10-05 15:23:28,831 [main] INFO org.apache.pig.Main - Logging error messages to: /home/nkhl/pig_1444038808829.log
2015-10-05 15:23:29,050 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/nkhl/.pigbootup not found
2015-10-05 15:23:29,333 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-10-05 15:23:29,334 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-10-05 15:23:29,335 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2015-10-05 15:23:29,562 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
It looks different on my online tutor's screen so I am a little confused.
What concerns me most is the deprecation part. Can someone help me with that please? What is it trying to say? Don't get me wrong, everything works fine. The GRUNT shell loads up, and things execute fine. I just wanted to know what that meant.
It's an Ubuntu machine.
Thanks!
Running Pig in local mode is great, as far as I know, for quick testing, such as displaying the sysout of a UDF.
You can safely ignore the messages above. They say that some of the properties set in the Hadoop configuration files are deprecated.
You can switch those messages off by editing the
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation
entry in the log4j.properties file.
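A minimal sketch of that edit: raising the deprecation logger to WARN drops the INFO-level "is deprecated" lines. Which log4j.properties file your installation actually reads is an assumption you need to check; silence_deprecation is a hypothetical helper name, and the demo below runs against a scratch file rather than a real config.

```shell
#!/usr/bin/env bash
# Append the deprecation-logger setting to a log4j.properties file,
# unless the exact line is already there (so repeated runs are harmless).
# silence_deprecation is a hypothetical helper for this sketch.
silence_deprecation() {
    local props="$1"
    local line='log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN'
    grep -qxF "$line" "$props" 2>/dev/null || echo "$line" >> "$props"
}

# Demo on a scratch file; point it at your real log4j.properties instead.
props="$(mktemp)"
silence_deprecation "$props"
silence_deprecation "$props"   # second call is a no-op
cat "$props"
```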
You have some Hadoop-related variables set, such as HADOOP_HOME or HADOOP_PREFIX or HADOOP_CONF_DIR, which aren't needed if you are running Pig in local mode.
unset HADOOP_HOME
unset HADOOP_PREFIX
unset HADOOP_CONF_DIR
Deprecations aren't scary. They are a reminder that the code is calling on something that will eventually go away in a future version. These specific deprecations are caused by differences between Hadoop 1 vs Hadoop 2. Pig is compatible with both versions. If you happened to be using Hadoop 1.2.1 instead of 2.x, you wouldn't see the warnings. This is because Pig is checking the Hadoop 1 values first.
If you're interested in learning more, you can check out the Pig source code.
https://github.com/apache/pig/blob/release-0.15.0/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java#L219-L222

hbase warning about deprecated native.lib

Hadoop conf file: opt/hadoop/etc/hadoop/core-site.xml
When I set
<name>hadoop.native.lib</name>
and then start the HBase shell, I get several lines of warning:
2015-02-10 11:07:46,956 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2015-02-10 11:07:47,005 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2015-02-10 11:07:47,046 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2015-02-10 11:07:47,081 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2015-02-10 11:07:47,169 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
But when I set
<name>io.native.lib.available</name>
and then start the HBase shell, I get one line of warning:
2015-02-10 11:07:46,956 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
How can I configure it so that none of these warnings are shown?
I'm on Hadoop 2.5.2 and HBase 0.98.8, on Ubuntu x64.

PIG setup throwing error

I was trying to install Pig v0.13.0 on my Fedora 20 system. After extracting the tar.gz contents, I set up JAVA_HOME and added PIG/bin to the PATH. Then I typed the command pig in the console and got the output below. I am unable to understand what went wrong:
[root@localhost /]# pig
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/12/21 00:05:15 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-12-21 00:05:16,082 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-12-21 00:05:16,083 [main] INFO org.apache.pig.Main - Logging error messages to: //pig_1419100516081.log
2014-12-21 00:05:16,130 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-12-21 00:05:16,765 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:16,771 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-12-21 00:05:16,771 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2014-12-21 00:05:16,780 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-12-21 00:05:19,130 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-12-21 00:05:19,130 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
2014-12-21 00:05:19,136 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> ls
2014-12-21 00:05:33,697 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Call From localhost.localdomain/127.0.0.1 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Details at logfile: //pig_1419100516081.log
Why did the ls command in the Grunt shell throw this error?
Please advise.
When you type pig in the console, it starts in MAPREDUCE mode by default, and that mode needs access to a Hadoop cluster and an HDFS installation.
It looks like your Hadoop cluster is not configured properly, which is why you are getting the connection refused error. Please follow this link to solve the problem: http://wiki.apache.org/hadoop/ConnectionRefused
As a workaround, use LOCAL mode, which doesn't need a Hadoop installation.
In the console, type pig -x local to bring up the Grunt shell, then run the ls command.
Local mode
$ pig -x local
Mapreduce mode
$ pig
(or) //try to connect HDFS
$ pig -x mapreduce
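Before picking a mode, you can check whether anything is listening on the NameNode port. The sketch below assumes localhost:8020, taken from the log in the question; port_open is a hypothetical helper that relies on bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds, non-zero otherwise.
# The connection is opened in a subshell so the descriptor is closed on exit.
port_open() {
    local host="$1" port="$2"
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# ASSUMPTION: the NameNode is expected at localhost:8020, as in the log above.
if port_open localhost 8020; then
    mode=mapreduce
else
    mode=local
fi
echo "would start: pig -x $mode"
```

A refused connection on localhost fails immediately; against a remote host behind a firewall the check may block for a while, so a timeout wrapper could be worth adding there.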
OK, I got this one working. If I connect in Pig's mapreduce mode, the ls command has to become ls hdfs:/. Changing the command from ls to ls hdfs:/ resolved my problem. In local mode, the plain ls command works fine.

Pig error while entering simple scripts in hadoop 2 environment

I am using hadoop-2.5.1 and pig-0.13.0, and my Hadoop cluster is running very well. When I try to run a simple Pig script
test = load '/input-data/data10' using PigStorage(',');
I am getting an error:
2014-11-13 15:41:19,278 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-11-13 15:41:19,279 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.addres
If anyone has a solution, please let me know.
This is not an error; it is just INFO-level logging.
If your script does not attempt to perform any action on the loaded data and doesn't throw any error, then it probably works fine.
Try adding an action on the test variable, for example:
DUMP test;
