Not able to read HDFS files through Pig on a pseudo-distributed cluster - hadoop

I have this very basic test (immediately after installing both Hadoop 2.7 and Pig 0.14).
The file exists in HDFS:
hdfs://master:50070/user/raghav/family<r 2> 32
hdfs://master:50070/user/raghav/nsedata <dir>
However, when I run the following:
A = LOAD 'family';
dump A;
I get the following error message:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.7.0 0.14.0 raghav 2015-05-19 21:38:35 2015-05-19 21:38:41 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1432066972596_0002 A MAP_ONLY Message: Job failed! hdfs://master:50070/tmp/temp-1977333348/tmp-1065056833,
Input(s):
Failed to read data from "hdfs://master:50070/user/raghav/family"
Output(s):
Failed to produce result in "hdfs://master:50070/tmp/temp-1977333348/tmp-1065056833"
Further investigation reveals a bit more. As indicated, I can see the file on HDFS (from within Pig through the ls command) and also from the shell prompt using hadoop fs commands. However, neither Pig nor Hive is able to read the files on HDFS.
I also tried playing around with the namenode ports (tried different values: 8020, 9000, 50070) but the behaviour remains the same. I looked through the namenode and datanode logs too, but couldn't find anything more...
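For reference, this is the shape of what I have in core-site.xml (a sketch; the port value is what I have been varying. Note that 50070 is the NameNode web UI port, while the default RPC port is 8020):
<property>
  <!-- 8020 is the default NameNode RPC port; 50070 is the web UI -->
  <name>fs.defaultFS</name>
  <value>hdfs://master:8020</value>
</property>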
Serious help required!
Answers to some questions
myhost raghav$ hdfs dfs -ls /user/raghav/family
15/05/20 08:03:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-rw-r--r-- 2 raghav supergroup 32 2015-05-15 01:01 /user/raghav/family
myhost raghav$ hdfs dfs -ls /user/raghav/
15/05/20 08:04:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 2 raghav supergroup 32 2015-05-15 01:01 /user/raghav/family
drwxr-xr-x - raghav supergroup 0 2015-05-15 00:25 /user/raghav/nsedata
myhost raghav$ hadoop fs -ls /
15/05/20 08:04:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - raghav supergroup 0 2015-05-19 23:06 /tmp
drwxr-xr-x - raghav supergroup 0 2015-05-20 07:30 /user
myhost raghav$
Further tests reveal that Hive is able to use HDFS, but Pig still can't. I could create an external table in Hive, successfully pointing at the example file 'family':
create external table xfamily(name STRING, age INT)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> STORED AS TEXTFILE
> LOCATION '/user/raghav';
OK
Time taken: 0.023 seconds
hive> select * from xfamily;
xxxxxx - expected data shows up.
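For comparison, the Pig equivalent with an explicit delimiter and schema would be (a sketch; the field names simply mirror the Hive DDL above):
A = LOAD '/user/raghav/family' USING PigStorage(',') AS (name:chararray, age:int); -- schema assumed from the Hive DDL
DUMP A;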

Related

FAILED: HiveAuthzPluginException Error getting permissions for hdfs

I am trying to insert data into a Hive table from a file in an HDFS directory with this query:
$ jdbc:hive2://localhost:10000> LOAD DATA INPATH '/user/xyz/stdfiles/testtbl.txt' OVERWRITE INTO TABLE testdb.testtbl;
But the query fails with:
Error: Error while compiling statement: FAILED:
HiveAuthzPluginException Error getting permissions for
hdfs://localhost:9000/user/xyz/stdfiles/testtbl.txt: null
(state=42000,code=40000)
I have tried granting permissions with the following commands, which run without error:
$ hdfs dfs -chown -R stdfiles /user/xyz/stdfiles
$ hdfs dfs -chmod -R 777 /user/xyz/stdfiles/testtbl.txt
Checked:
$ hdfs dfs -ls /user/xyz/stdfiles
19/05/22 09:15:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rwxrwxrwx 1 stdfiles supergroup 6 2019-05-22 08:45 /user/xyz/stdfiles/testtbl.txt
Successfully inserting the data is the desired output.
Adding the following properties to the Hadoop configuration file core-site.xml worked for me :)
<property>
  <name>hadoop.proxyuser.niazullah.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.niazullah.groups</name>
  <value>*</value>
</property>
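Note that proxyuser settings are read by the NameNode, so after editing core-site.xml you typically need to restart it or refresh the configuration:
hdfs dfsadmin -refreshSuperUserGroupsConfiguration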
Also check the user's access in HDFS:
$ hdfs dfs -ls /user
Output:
drwxr-xr-x - main supergroup 0 2019-05-22 13:22 /user/test
Where "main" is the user change it do the hive user

How to use hadoop from spark thrift server?

Please consider the following setup.
hadoop version 2.6.4
spark version 2.1.0
OS CentOS Linux release 7.2.1511 (Core)
All software is installed on a single machine as a single-node cluster; Spark is installed in standalone mode.
I am trying to use Spark Thrift Server.
To start the spark thrift server I run the shell script
start-thriftserver.sh
After starting the thrift server, I can run the beeline command line tool and issue the following commands, which run successfully:
!connect jdbc:hive2://localhost:10000 user_name '' org.apache.hive.jdbc.HiveDriver
create database testdb;
use testdb;
create table names_tab(a int, name string) row format delimited fields terminated by ' ';
My first question is: where on Hadoop is the underlying file/folder for this table/database created?
The problem is that even if Hadoop is stopped using stop-all.sh, the create table/database command still succeeds,
which makes me think that the table is not created on Hadoop at all.
My second question is: how do I tell Spark where Hadoop is installed,
and how do I ask Spark to use Hadoop as the underlying data store for all queries run from beeline?
Am I supposed to install spark in some other mode?
Thanks in advance.
My objective was to get the beeline command line utility to work through Spark Thrift Server using Hadoop as the underlying data store, and I got it to work. My setup was like this:
Hadoop <--> Spark <--> SparkThriftServer <--> beeline
I wanted to configure Spark so that it uses Hadoop for all queries run from the beeline command line utility.
The trick was to specify the following property in spark-defaults.conf:
spark.sql.warehouse.dir hdfs://localhost:9000/user/hive/warehouse
By default Spark uses Derby for the metastore and a local directory for the data itself (called the warehouse in Spark).
In order to have Spark use Hadoop for the warehouse, I had to add this property.
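To confirm the setting took effect after restarting the thrift server, you can query it from beeline (a quick sanity check):
0: jdbc:hive2://localhost:10000> SET spark.sql.warehouse.dir;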
Here is a sample output
./beeline
Beeline version 1.0.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000 abbasbutt '' org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://localhost:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/abbasbutt/Projects/hadoop_fdw/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/abbasbutt/Projects/hadoop_fdw/apache-hive-1.0.1-bin/lib/hive-jdbc-1.0.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Spark SQL (version 2.1.0)
Driver: Hive JDBC (version 1.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000> create database my_test_db;
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.379 seconds)
0: jdbc:hive2://localhost:10000> use my_test_db;
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.03 seconds)
0: jdbc:hive2://localhost:10000> create table my_names_tab(a int, b string) row format delimited fields terminated by ' ';
+---------+--+
| Result |
+---------+--+
+---------+--+
No rows selected (0.11 seconds)
0: jdbc:hive2://localhost:10000>
Here are the corresponding files in hadoop
[abbasbutt@localhost test]$ hadoop fs -ls /user/hive/warehouse/
17/01/19 10:48:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
drwxrwxr-x - abbasbutt supergroup 0 2017-01-18 23:45 /user/hive/warehouse/fdw_db.db
drwxrwxr-x - abbasbutt supergroup 0 2017-01-18 23:23 /user/hive/warehouse/my_spark_db.db
drwxrwxr-x - abbasbutt supergroup 0 2017-01-19 10:47 /user/hive/warehouse/my_test_db.db
drwxrwxr-x - abbasbutt supergroup 0 2017-01-18 23:45 /user/hive/warehouse/testdb.db
[abbasbutt@localhost test]$ hadoop fs -ls /user/hive/warehouse/my_test_db.db/
17/01/19 10:50:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxrwxr-x - abbasbutt supergroup 0 2017-01-19 10:50 /user/hive/warehouse/my_test_db.db/my_names_tab
[abbasbutt@localhost test]$

Folder Not Created with hadoop fs -mkdir

Hey, I am installing Hive on a Hadoop 2.0 multi-node cluster, and I am not able to create a folder using this command:
[hadoop@master ~]$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
16/07/19 14:20:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@master ~]$ $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
16/07/19 14:24:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Importantly, I am not able to find the created folder. I am not sure where it gets created. Please help.
JPS for Hadoop is working fine:
[hadoop@master ~]$ jps
2977 ResourceManager
2613 DataNode
3093 NodeManager
2822 SecondaryNameNode
2502 NameNode
5642 Jps
The warning you are getting after running the -mkdir command does not impact Hadoop functionality. It's just a warning; ignore it. See here for details.
About creating directories under the root, i.e. "/": it is a one-time activity and should be done by the superuser. Once you have created top-level directories like "/tmp", "/user" etc., you can create user-specific folders like "/user/hduser" and own them using commands:
sudo -u hdfs hdfs dfs -mkdir /tmp
OR
sudo -u hdfs hdfs dfs -mkdir -p /user/hive/warehouse
Once you have the main folder ready, just chown it to the user who will be using it:
sudo -u hdfs hdfs dfs -chown hduser:hadoop /user/hive/warehouse
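To verify the ownership afterwards (a quick check):
hdfs dfs -ls /user/hive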
If you want to find the files/directories created on HDFS, you have to interact with the HDFS filesystem using CLI commands, e.g.:
hdfs dfs -ls /
The data created on HDFS also has a physical location on your local filesystem, but you won't see it there as regular files and directories. Look for the dfs.namenode.name.dir and dfs.datanode.data.dir properties in hdfs-site.xml under your installation, usually located at "/usr/local/hadoop/etc/hadoop/hdfs-site.xml".
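For reference, those entries typically look something like this (the local paths here are assumptions; yours will depend on your setup):
<!-- example paths only; adjust to your installation -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///usr/local/hadoop/data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///usr/local/hadoop/data/datanode</value>
</property>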

Create directory in hadoop filesystem

I'm new to Hadoop. I am trying to create a directory in HDFS but I am not able to.
I am logged in as "hduser", hence I assumed "/home/hduser" pre-exists as in the Unix fs. So I tried to create a Hadoop directory using the command below.
[hduser@Virus ~]$ hadoop fs -mkdir /home/hduser/mydata/
14/12/03 15:04:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mkdir: `/home/hduser/mydata/': No such file or directory
After searching online, I thought it possible that Hadoop cannot understand "/home/hduser", or that since I am using Hadoop 2, mkdir won't work like the Unix command "mkdir -p" (i.e. recursively). Hence I tried to create "/mydata", but no luck.
[hduser@Virus ~]$ hadoop fs -mkdir /mydata
14/12/03 15:09:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mkdir: Cannot create directory /mydata. Name node is in safe mode.
I tried to leave safe mode, but the issue still persists.
[hduser@Virus ~]$ hdfs dfsadmin -safemode leave
14/12/03 15:09:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Safe mode is OFF
I also tried "/user/mydata", as "/user" is the directory Hadoop treats as home.
[hduser@Virus ~]$ hadoop fs -mkdir /user/mydata
14/12/03 15:36:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mkdir: Cannot create directory /user/mydata. Name node is in safe mode.
How do I debug this further?
To leave safe mode, try the command below, since hadoop dfsadmin -safemode is deprecated in newer distributions of Hadoop:
hdfs dfsadmin -safemode leave
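You can confirm the current state with:
hdfs dfsadmin -safemode get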
By default, a user's home directory in HDFS is '/user/hduser', not '/home/hduser'.
If you create a directory with a relative path, it is created under that home, e.g. as '/user/hduser/sampleDir'; to create it elsewhere, give an absolute path, like below.
hadoop fs -mkdir /path/to/be/created
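For instance, a relative path lands in your HDFS home (a sketch; 'sampleDir' is just an illustrative name):
hadoop fs -mkdir sampleDir   # ends up as /user/hduser/sampleDir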
On HDFS,
hdfs dfs -mkdir -p /this/is/a/new/directory
Create a directory /user:
hadoop fs -mkdir /user
then one with your user name:
hadoop fs -mkdir /user/yourusername
Now try creating your directory.
List your directories:
hadoop fs -ls /
Output:
Found 1 items
drwxr-xr-x - sony supergroup 0 2016-12-10 16:45 /usr
Create a directory:
hadoop fs -mkdir /app
It is created successfully; check:
hadoop fs -ls /
Output:
Found 2 items
drwxr-xr-x - sony supergroup 0 2016-12-12 04:11 /usr
drwxr-xr-x - sony supergroup 0 2016-12-10 16:45 /app

Hadoop\HDFS: "no such file or directory"

I have installed Hadoop 2.2 on a single machine using this tutorial: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Some details were changed a little bit - for example, I used Java 8 and /hadoop as the root dir. Users, SSH, and config keys are the same.
Namenode was successfully formatted:
13/12/22 05:42:31 INFO common.Storage: Storage directory /hadoop/tmp/dfs/name has been successfully formatted.
13/12/22 05:42:31 INFO namenode.FSImage: Saving image file /hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
13/12/22 05:42:32 INFO namenode.FSImage: Image file /hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 198 bytes saved in 0 seconds.
13/12/22 05:42:32 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
13/12/22 05:42:32 INFO util.ExitUtil: Exiting with status 0
13/12/22 05:42:32 INFO namenode.NameNode: SHUTDOWN_MSG:
However, neither 'mkdir' nor even the 'ls' command worked:
$ /hadoop/hadoop/bin/hadoop fs -ls
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
13/12/22 05:39:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `.': No such file or directory
Thanks for any help guys.
Try:
hadoop fs -ls /
Tested on hadoop 2.4. Without a path argument, hadoop fs -ls lists your HDFS home directory (/user/<username>), which does not exist yet after a fresh format; that is why you see the `.': No such file or directory error.
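If you want the bare listing to work, one option (a sketch, assuming your user is allowed to create directories under /user) is to create the home directory first:
hdfs dfs -mkdir -p /user/$(whoami)   # creates /user/<your-username>, assuming permission
hadoop fs -ls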
In Hadoop 2.4
hdfs dfs -mkdir /input
hdfs dfs -ls /
What worked in my case:
First, get the Hadoop install path:
echo ${HADOOP_INSTALL}   # in my case the output is: /user/local/hadoop
Then create the directory under your Hadoop install path (if you already know your install directory, skip the command above):
hadoop fs -mkdir -p /user/local/hadoop/your_directory
Here hadoop is a directory.
Tested on hadoop 2.4
I have verified this worked in Hadoop 2.5
hdfs dfs -mkdir /input
(where /input is the HDFS directory)
