Hadoop Edge HDFS points to local FS - hadoop

I have set up my Hadoop cluster with 1 NameNode and 2 DataNodes, and everything works perfectly :)
Now I want to add a Hadoop edge node (aka Hadoop gateway). I followed the instructions here and finally executed:
hadoop fs -ls /
Unfortunately, where I expected to see my HDFS content, I see my local FS instead:
Found 22 items
-rw-r--r-- 1 root root 0 2017-03-30 16:44 /autorelabel
dr-xr-xr-x - root root 20480 2017-03-30 16:49 /bin
...
drwxr-xr-x - root root 20480 2016-07-08 17:31 /home
I think my core-site.xml is configured as needed, with the specific property:
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopnodemaster1:8020/</value>
</property>
hadoopmaster1 is my NameNode and is reachable.
I don't understand why I see my local FS and not my HDFS. Thank you :)
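When a gateway falls back to the local file system, it is usually because the client is not reading the core-site.xml that was edited. A quick way to check which configuration the client actually resolves (just a sketch; hdfs getconf is available on Hadoop 2.x, and the hostname below is the one from the core-site.xml above):
echo $HADOOP_CONF_DIR                          # should point at the directory holding the edited core-site.xml
hdfs getconf -confKey fs.defaultFS             # should print hdfs://hadoopnodemaster1:8020/
hadoop fs -ls hdfs://hadoopnodemaster1:8020/   # bypasses the default FS and talks to the NameNode directly
If the first two commands point somewhere else or print file:///, the edge node is picking up a default configuration rather than the one you edited.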

Related

Unable to start datanode and file permissions of datanode are changing when start-dfs.sh is started

I was facing issues deploying local files to HDFS and found "drwx------" permissions on my datanode and namenode directories.
Initial permission status of the datanode and namenode storage directories:
drwx------ 3 hduser hadoop 4096 Mar 2 16:45 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 17:30 namenode
The permissions of the datanode directory were changed to 755:
hduser@pradeep:~$ chmod -R 755 /usr/local/hadoop_store/hdfs/
hduser@pradeep:~$ ls -l /usr/local/hadoop_store/hdfs/
total 8
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 16:45 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 17:30 namenode
After running start-dfs.sh, the datanode didn't start and the permissions on the datanode directory were restored to their original state.
hduser@pradeep:~$ $HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop- hduser-namenode-pradeep.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-pradeep.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-pradeep.out
hduser@pradeep:~$ jps
4385 Jps
3903 NameNode
4255 SecondaryNameNode
hduser@pradeep:~$ ls -l /usr/local/hadoop_store/hdfs/
total 8
drwx------ 3 hduser hadoop 4096 Mar 2 22:34 datanode
drwxr-xr-x 3 hduser hadoop 4096 Mar 2 22:34 namenode
As the datanode is not running, I am not able to deploy data to HDFS from the local file system. I cannot understand or find any reason why the file permissions are restored to the previous state only for the datanode folder.
It appears the namespaceID generated by the NameNode is different from the one on your DataNode.
Solution:
Go to the path where your Hadoop files are stored on the local file system, for example /usr/local/hadoop. Go down to /usr/local/hadoop/tmp/dfs/name/current/VERSION, copy the namespaceID, then open /usr/local/hadoop/tmp/dfs/data/current/VERSION and replace the namespaceID there.
I hope this helps.
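If you want to confirm the mismatch before editing anything, you can compare the two VERSION files directly. A minimal sketch, assuming the /usr/local/hadoop_store/hdfs layout from the question (on some Hadoop versions the datanode keeps its namespaceID in a block-pool subdirectory, hence the recursive grep):
grep namespaceID /usr/local/hadoop_store/hdfs/namenode/current/VERSION
grep -r namespaceID /usr/local/hadoop_store/hdfs/datanode/current/
The datanode log (the .log file next to the .out file reported by start-dfs.sh) should also state explicitly why the datanode refused to start.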

HDP: unable to start Phoenix sqlline.py

I am using Sandbox HDP 2.2
I did a yum install phoenix (version is 4.2)
But when I run these:
./sqlline.py localhost:2181
./sqlline.py localhost
./sqlline.py sandbox.hortonworks.com:2181
./sqlline.py sandbox.hortonworks.com
I got the error:
15/07/03 08:26:31 ERROR client.ConnectionManager$HConnectionImplementation:
The node /hbase is not in ZooKeeper. It should have been written by the master.
Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch
with the one configured in the master.
I tried to run:
./sqlline.py sandbox.hortonworks.com:2181:/hbase-unsecure
But it "hangs" - after 20 minutes still no response
I have this in my /etc/hbase/conf/hbase-site.xml:
<property>
<name>hbase.zookeeper.quorum</name>
<value>sandbox.hortonworks.com</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase-unsecure</value>
</property>
You have to create links in the directory where sqlline.py lives to the two .xml files provided by HBase/Hadoop.
$ pwd
/usr/hdp/2.2.8.0-3150/phoenix/bin
$ ll | grep xml
lrwxrwxrwx 1 root root 29 Dec 16 13:34 core-site.xml -> /etc/hbase/conf/core-site.xml
lrwxrwxrwx 1 root root 30 Dec 16 13:34 hbase-site.xml -> /etc/hbase/conf/hbase-site.xml
With those in place and $JAVA_HOME and java on your $PATH, you can now run sqlline.py:
$ ./sqlline.py localhost:2181/hbase-unsecure
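For reference, the links can be created like this (a sketch assuming the HDP 2.2 paths shown above; the connection string uses the /hbase-unsecure znode from the question's hbase-site.xml):
cd /usr/hdp/2.2.8.0-3150/phoenix/bin
ln -s /etc/hbase/conf/hbase-site.xml hbase-site.xml
ln -s /etc/hbase/conf/core-site.xml core-site.xml
./sqlline.py localhost:2181:/hbase-unsecure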
You will need to specify the root dir "/hbase-unsecure" in the connection string, because by default Phoenix tries to connect to /hbase. Try this:
./sqlline.py localhost:2181:/hbase-unsecure

Cannot load a file from Hadoop HDFS from Pig Latin

I am having trouble trying to load a CSV file. I keep getting the following error:
Input(s):
Failed to read data from "hdfs://localhost:9000/user/der/1987.csv"
Output(s):
Failed to produce result in "hdfs://localhost:9000/user/der/totalmiles3"
Looking at the Hadoop HDFS installed on my local machine, I can see the file. In fact, the file is present in multiple locations, such as /, /user/, etc.
hdfs dfs -ls /user/der
Found 1 items
-rw-r--r-- 1 der supergroup 127162942 2015-05-28 12:42 /user/der/1987.csv
My Pig script is as follows:
records = LOAD '1987.csv' USING PigStorage(',') AS
(Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime,
CRSArrTime, UniqueCarrier, FlightNum, TailNum,ActualElapsedTime,
CRSElapsedTime,AirTime,ArrDelay, DepDelay, Origin, Dest,
Distance:int, TaxIn, TaxiOut, Cancelled,CancellationCode,
Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay,
lateAircraftDelay);
milage_recs= GROUP records ALL;
tot_miles = FOREACH milage_recs GENERATE SUM(records.Distance);
STORE tot_miles INTO 'totalmiles3';
I ran Pig with the -x local option and was able to read the file from my local hard disk. I got the right answer, and a tail -f on the Hadoop NameNode log did not scroll, which proves the job ran entirely against the local disk:
pig -x local totalmiles.pig
Now I am getting errors. The Hadoop NameNode does seem to be receiving requests, because with tail -f I can see its log scroll.
pig totalmiles.pig
records = LOAD '/user/der/1987.csv' USING PigStorage(',') AS
I get the following error:
Failed Jobs:
JobId   Alias   Feature   Message   Outputs
job_local602774674_0001   milage_recs,records,tot_miles   GROUP_BY,COMBINER   Message: ENOENT: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:230)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:724)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:502)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:600)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:94)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:193)
...blah...
Input(s):
Failed to read data from "/user/der/1987.csv"
Output(s):
Failed to produce result in "hdfs://localhost:9000/user/der/totalmiles3"
I used hdfs to check the permissions with mkdir, and that seems OK:
hdfs dfs -mkdir /user/der/temp2
hdfs dfs -ls /user/der
Found 3 items
-rw-r--r-- 1 der supergroup 127162942 2015-05-28 12:42 /user/der/1987.csv
drwxr-xr-x - der supergroup 0 2015-05-28 16:21 /user/der/temp2
drwxr-xr-x - der supergroup 0 2015-05-28 15:57 /user/der/test
I tried Pig with the mapreduce option and still get the same type of error:
pig -x mapreduce totalmiles.pig
2015-05-28 20:58:44,608 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:totalmiles.pig while submitting
ENOENT: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:230)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:724)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:502)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:600)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:94)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:193)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
My core-site.xml has the temp dir as follows:
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop</value>
<description>A base for other temporary directories.
</description>
</property>
and my hdfs-site.xml as the namenode and datanode as follows:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/dfs/datanode</value>
</property>
I've gotten a bit further in debugging the issue. It seems my namenode is misconfigured as I cannot reformat it:
[hadoop hdfs formatting gets error failed for Block pool ]
We have to give the Hadoop file path, i.e. /user/der/1987.csv:
records = LOAD '/user/der/1987.csv' USING PigStorage(',') AS
(Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime,
CRSArrTime, UniqueCarrier, FlightNum, TailNum,ActualElapsedTime,
CRSElapsedTime,AirTime,ArrDelay, DepDelay, Origin, Dest,
Distance:int, TaxIn, TaxiOut, Cancelled,CancellationCode,
Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay,
lateAircraftDelay);
If it's just for testing, you can keep the file 1987.csv in the path from which you are executing the Pig script, i.e. keep 1987.csv and the .pig file in the same location.
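A quick way to sanity-check both setups, using only the paths and commands from the question (just a sketch):
hdfs dfs -ls /user/der/1987.csv      # confirm the file really is in HDFS
pig totalmiles.pig                   # mapreduce mode: the script should LOAD '/user/der/1987.csv'
pig -x local totalmiles.pig          # local mode: keep 1987.csv next to totalmiles.pig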

Hadoop previous.checkpoint location

I tried hadoop-1.0.3 as well as 1.0.4. Both under pseudo cluster mode.
My understanding is that the previous.checkpoint directory should be created under the SecondaryNameNode directory designated by "fs.checkpoint.dir". On all occasions I am finding it under the NameNode directory designated by "dfs.name.dir". Is this something to do with pseudo mode, or is my understanding wrong? Can someone please help!
Below is my hdfs-site.xml file.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/hadoop/lab/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/lab/hdfs/datanode</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/home/hadoop/lab/hdfs/secnamenode</value>
</property>
</configuration>
Below is the high-level directory structure for the daemons:
hadoop@ubuntu:~/lab/hdfs$ ls -l
total 12
drwxr-xr-x 6 hadoop hadoop 4096 Mar 14 03:41 datanode
drwxrwxr-x 5 hadoop hadoop 4096 Mar 14 03:41 namenode
drwxrwxr-x 4 hadoop hadoop 4096 Mar 14 04:46 secnamenode
Below are the NameNode directory details:
hadoop@ubuntu:~/lab/hdfs$ ls -l namenode
total 12
drwxrwxr-x 2 hadoop hadoop 4096 Mar 14 04:46 current
drwxrwxr-x 2 hadoop hadoop 4096 Mar 14 03:13 image
-rw-rw-r-- 1 hadoop hadoop 0 Mar 14 03:41 in_use.lock
drwxrwxr-x 2 hadoop hadoop 4096 Mar 14 03:34 previous.checkpoint
Below are the SNN directory details:
hadoop@ubuntu:~/lab/hdfs$ ls -l secnamenode
total 8
drwxrwxr-x 2 hadoop hadoop 4096 Mar 14 04:46 current
drwxrwxr-x 2 hadoop hadoop 4096 Mar 14 03:46 image
-rw-rw-r-- 1 hadoop hadoop 0 Mar 14 03:41 in_use.lock
If you need any further details please let me know.
Thanks
Rags
I have done an extensive and desperate search on this. There seems to be a long-pending bug, HDFS-1839, which removes the previous.checkpoint directory from the SecondaryNameNode. The same bug might be responsible for this directory being created under the NameNode.
In every version of Hadoop I have seen so far, the previous.checkpoint directory is consistently created under the NameNode.
Hopefully this bug will be fixed soon, or Apache Hadoop will clarify why the directory is created under the NameNode.
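If you want to watch where the checkpoint actually lands, one option is to force one manually and then compare the two directories from the hdfs-site.xml above. This is only a sketch: the -checkpoint force option is the documented 1.0.x SecondaryNameNode usage, and it should be run while the regular SecondaryNameNode daemon is stopped so the two do not fight over in_use.lock.
hadoop secondarynamenode -checkpoint force
ls -l /home/hadoop/lab/hdfs/namenode /home/hadoop/lab/hdfs/secnamenode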

How can I troubleshoot this Hadoop filesystem installation error?

I'm trying to install Hadoop on a non-Cloudera Ubuntu test image. Everything seemed to be going well until I ran ./bin/start-all.sh. The NameNode never comes up, so I can't even run hadoop fs -ls to connect to the filesystem.
Here's the namenode log:
2011-03-24 11:38:00,256 INFO org.apache.hadoop.ipc.Server: Stopping server on 54310
2011-03-24 11:38:00,257 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:312)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:293)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:224)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:306)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1006)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1015)
2011-03-24 11:38:00,258 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Brash/192.168.1.5
************************************************************/
I've run chmod -R 755 on the root directory and even gone so far as to make sure the directory exists by creating it with mkdir -p.
hadoop@Brash:/usr/lib/hadoop$ ls -la /usr/local/hadoop-datastore/hadoop-hadoop/dfs/
total 16
drwxr-xr-x 4 hadoop hadoop 4096 2011-03-24 11:41 .
drwxr-xr-x 4 hadoop hadoop 4096 2011-03-24 11:31 ..
drwxr-xr-x 2 hadoop hadoop 4096 2011-03-24 11:31 data
drwxr-xr-x 2 hadoop hadoop 4096 2011-03-24 11:41 name
Here's my /conf/hdfs-site.xml:
hadoop@Brash:/usr/lib/hadoop$ cat conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
You should never have to create the directory yourself; Hadoop will create it on its own. Did you forget to format the namenode? Delete the existing directory, reformat the namenode (bin/hadoop namenode -format), and try again.
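A sketch of that sequence, assuming the paths from the question and that nothing in HDFS needs to be preserved:
bin/stop-all.sh
rm -rf /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name
bin/hadoop namenode -format
bin/start-all.sh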
