Browsing into a folder in Hadoop - hadoop

I ssh to the dev box that is set up for Hadoop, and if I run hadoop fs -ls I get a lot of entries, including
drwxr-xr-x - root hadoop 0 2013-07-11 17:49 sandeep
drwxr-xr-x - root hadoop 0 2013-04-10 14:13 testprocedure
drwxr-xr-x - root hadoop 0 2013-04-03 13:56 tmp
I need to go inside that tmp folder. I took a look at the Hadoop shell commands here but still didn't find a command for it: http://hadoop.apache.org/docs/r0.18.3/hdfs_shell.html
So what's the command to go to that folder?

Specify the directory name, as follows:
hadoop fs -ls tmp
Sample output from my Demo VM:
[cloudera@localhost ~]$ hadoop fs -ls
Found 12 items
-rw-r--r-- 1 cloudera supergroup 46 2013-06-18 21:18 /user/cloudera/FileWrite.txt
-rw-r--r-- 1 cloudera supergroup 13 2013-06-18 15:34 /user/cloudera/HelloWorld.txt
drwxr-xr-x - cloudera supergroup 0 2013-07-01 22:07 /user/cloudera/hiveext
drwxr-xr-x - cloudera supergroup 0 2012-06-12 15:10 /user/cloudera/input
-rw-r--r-- 1 cloudera supergroup 176 2013-06-18 23:07 /user/cloudera/input_data.txt
drwxr-xr-x - cloudera supergroup 0 2012-09-06 15:44 /user/cloudera/movies_input
drwxr-xr-x - cloudera supergroup 0 2012-09-06 17:02 /user/cloudera/movies_output
drwxr-xr-x - cloudera supergroup 0 2012-09-06 14:53 /user/cloudera/output
drwxr-xr-x - cloudera supergroup 0 2013-07-01 23:45 /user/cloudera/sample_external_input
-rw-r--r-- 1 cloudera supergroup 16 2012-06-14 01:39 /user/cloudera/test.txt
drwxr-xr-x - cloudera supergroup 0 2012-06-13 00:00 /user/cloudera/weather_input
drwxr-xr-x - cloudera supergroup 0 2012-06-13 15:13 /user/cloudera/weather_output
When I specify a directory, e.g. hadoop fs -ls sample_external_input:
[cloudera@localhost ~]$ hadoop fs -ls sample_external_input
Found 2 items
-rw-r--r-- 1 cloudera supergroup 61 2013-07-01 23:17 /user/cloudera/sample_external_input/sample_external_data.txt
-rw-r--r-- 1 cloudera supergroup 13 2013-07-01 23:18 /user/cloudera/sample_external_input/sample_external_data2.txt

I need to go inside that tmp folder, took a look at Hadoop shell
commands in here but still didn't find the command for it.
http://hadoop.apache.org/docs/r0.18.3/hdfs_shell.html
There is nothing like cd that can take you inside a directory, so you can't "go to" a folder the way you can on your local FS. You can use ls as others have suggested, but that just lists the contents of a directory without taking you there. If you really want to browse into a particular directory, you can use the HDFS WebUI: point your web browser at NameNode_Machine:50070. It lets you browse the entire HDFS, and you can view and download files as well.
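As a minimal sketch, the usual workflow without a cd is to pass the path to each command instead (the file name below is just a placeholder):
# list the directory rather than "entering" it
hadoop fs -ls tmp
# print a file that lives inside it
hadoop fs -cat tmp/somefile.txt
# or pull it down to the local filesystem to inspect it there
hadoop fs -get tmp/somefile.txt /tmp/somefile.txt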

If you specify nothing after -ls, the folders listed are those in your "home" directory. If you want to give a path relative to your home folder, you can do so:
hadoop fs -ls tmp/someTmpStuff
(assuming tmp is a folder in your home directory) or use a fully qualified path:
hadoop fs -ls /user/me/tmp/someTmpStuff
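If you are not sure what your home directory resolves to, the two forms should list the same thing (this assumes the default HDFS home layout of /user/<username>):
# relative path, resolved against your HDFS home
hadoop fs -ls tmp/someTmpStuff
# the same path, fully qualified
hadoop fs -ls /user/$(whoami)/tmp/someTmpStuff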

First you need to check whether you have Hadoop access. If yes, then use the command:
[yourhost]$ hadoop fs -ls /dir1/
It will list the directories and files inside dir1.
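If you want the whole tree under dir1 in one go, -ls also has a recursive form (a separate -lsr command on very old releases):
hadoop fs -ls -R /dir1/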

Related

Navigate file system in Hadoop

When running hadoop fs -ls
drwxr-xr-x - chiki supergroup 0 2019-01-14 17:03 Party_output
drwxr-xr-x - chiki supergroup 0 2018-01-22 18:25 party_uploads
but when I try to access the directory
hadoop fs -ls /Party_output
showing output as
`/Party_output': No such file or directory
That's because hadoop fs -ls shows the contents of your HDFS home directory, /user/chiki.
You need to run hadoop fs -ls Party_output to see inside that directory (because it lives at /user/chiki/Party_output, not /Party_output).
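In other words, assuming the default home layout, these two forms point at the same directory, while the leading slash points at the HDFS root:
# relative to your HDFS home
hadoop fs -ls Party_output
# the same directory, fully qualified
hadoop fs -ls /user/chiki/Party_output
# the HDFS root, which has no Party_output
hadoop fs -ls /Party_output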

How to change supergroups in Hadoop?

drwxrwxrwx - hdfs supergroup 0 2017-10-23 09:15 /benchmarks
drwxr-xr-x - cloudera supergroup 0 2018-05-07 17:31 /data
drwxr-xr-x - hbase supergroup 0 2018-05-14 15:36 /hbase
drwxr-xr-x - solr solr 0 2017-10-23 09:18 /solr
drwxrwxrwt - hdfs supergroup 0 2018-05-16 18:13 /tmp
drwxrwxrwx - hdfs supergroup 0 2018-04-24 10:32 /user
drwxr-xr-x - hdfs supergroup 0 2017-10-23 09:17 /var
how to change /data to hdfs:supergroup?
how to change /user to cloudera:supergroup?
To change anything yourself, you need to be a user that has permissions to those files already.
how to change /data to hdfs:supergroup
sudo su - hdfs
hdfs dfs -chown -R hdfs:supergroup /data
how to change /user to cloudera:supergroup
While I would not recommend changing the ownership of /user to anyone but the HDFS superuser...
sudo su - hdfs
hdfs dfs -chown -R cloudera:supergroup /user
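Either way, you can confirm the new owner and group from the root listing afterwards:
hdfs dfs -ls /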

Hadoop error du: java.util.ConcurrentModificationException

While working on my HDFS cluster, I get this error
du: java.util.ConcurrentModificationException
whenever I run
hdfs dfs -du -h -s /some/path/
A quick check on the Internet showed it was a bug in Hadoop 2.7.0.
To fix the issue, I had to delete some of my Hadoop snapshot files. I believe one or more snapshots had been corrupted, as I had one of my data nodes decommissioned uncleanly from the cluster a few days ago.
hdfs lsSnapshottableDir
drwxr-xr-x 0 hdfs supergroup 0 2018-01-30 17:04 0 65536 /data
[hdfs@hmastera ~]$ hdfs dfs -ls /data/.snapshot
Found 5 items
drwxr-xr-x - hdfs supergroup 0 2017-08-19 01:06 /data/.snapshot/insight-dl-cluster_snapshot_20170819T010503
drwxr-xr-x - hdfs supergroup 0 2017-08-19 01:08 /data/.snapshot/insight-dl-cluster_snapshot_20170819T010746
drwxr-xr-x - hdfs supergroup 0 2017-08-19 01:12 /data/.snapshot/insight-dl-cluster_snapshot_20170819T011013
drwxr-xr-x - hdfs supergroup 0 2017-08-19 01:14 /data/.snapshot/insight-dl-cluster_snapshot_20170819T011219
drwxr-xr-x - hdfs supergroup 0 2018-01-13 16:24 /data/.snapshot/insight-dl-cluster_snapshot_20180113T162234
Then I started deleting the snapshots till I got my mojo back.
[hdfs@hmastera ~]$
hdfs dfs -deleteSnapshot /data insight-dl-cluster_snapshot_20170819T010503
hdfs dfs -deleteSnapshot /data insight-dl-cluster_snapshot_20170819T010746
hdfs dfs -deleteSnapshot /data insight-dl-cluster_snapshot_20170819T011013
hdfs dfs -deleteSnapshot /data insight-dl-cluster_snapshot_20170819T011219
hdfs dfs -deleteSnapshot /data insight-dl-cluster_snapshot_20180113T162234
[hdfs@hmastera ~]$ hdfs dfs -du -h -s /data
510.1 G /data
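If you still want snapshot protection on /data after the cleanup, a fresh snapshot can be taken (the name here is just an example; /data is already snapshottable per lsSnapshottableDir above):
hdfs dfs -createSnapshot /data insight-dl-cluster_snapshot_$(date +%Y%m%dT%H%M%S)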

Pig - Permission denied in map reduce mode

I am trying to load a CSV file from HDFS using PigStorage, limit the output to one record and dump it.
My HDFS snapshot is below.
I am running a 2-node cluster with 1 master (NN & Sec NN) and 1 data node & job tracker on a slave machine.
My Pig scripts run on the data node, as the root user:
grunt> x= load '/user/hadoop/input/myfile.csv' using PigStorage(',') as (colA:chararray);
grunt> y = limit x 1;
grunt> dump y;
console log:
> HadoopVersion PigVersion UserId StartedAt FinishedAt
> Features
> 1.0.4 0.11.1 root 2013-09-26 17:35:18 2013-09-26 17:35:47 LIMIT
>
> Failed!
>
> Failed Jobs: JobId Alias Feature Message Outputs
> job_201309190323_0019 x,y Message: Job failed! Error -
> JobCleanup Task Failure, Task: task_201309190323_0019_m_000002
I am getting a permission denied error and the log is
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hadoop, access=EXECUTE, inode="hadoop-root":root:supergroup:rwx------
which says that permission is denied when user "hadoop" is trying to execute on a folder "hadoop-root".
But my current user is root, from which I am running Pig, and my namenode is running as user hadoop (a superuser, I hope).
Why is the log showing user=hadoop instead of root? Am I doing anything wrong?
Snapshot of hdfs:
[hadoop@hadoop-master ~]$ hadoop fs -ls /
Warning: $HADOOP_HOME is deprecated.
Found 2 items
drwx------ - hadoop supergroup 0 2013-09-26 17:29 /tmp
drwxr-xr-x - hadoop supergroup 0 2013-09-26 14:20 /user
----------------------------------------------------------------------------------------
[root@hadoop-master hadoop]# hadoop fs -ls /user
Warning: $HADOOP_HOME is deprecated.
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2013-09-26 14:19 /user/hadoop
drwxr-xr-x - root root 0 2013-09-26 14:33 /user/root
----------------------------------------------------------------------------------------
[hadoop@hadoop-master ~]$ hadoop fs -ls /tmp
Warning: $HADOOP_HOME is deprecated.
Found 15 items
drwx------ - hadoop supergroup 0 2013-09-19 01:43 /tmp/hadoop-hadoop
drwx------ - root supergroup 0 2013-09-19 03:25 /tmp/hadoop-root
drwxr-xr-x - hadoop supergroup 0 2013-09-26 17:29 /tmp/temp-1036150440
drwxr-xr-x - root supergroup 0 2013-09-26 17:27 /tmp/temp-1270545146
drwx------ - root supergroup 0 2013-09-26 14:51 /tmp/temp-1286962351
drwx------ - hadoop supergroup 0 2013-09-26 14:12 /tmp/temp-1477800537
drwx------ - hadoop supergroup 0 2013-09-26 15:25 /tmp/temp-1503376062
drwx------ - root supergroup 0 2013-09-26 14:09 /tmp/temp-282162612
drwx------ - root supergroup 0 2013-09-26 17:22 /tmp/temp-758240893
drwx------ - root supergroup 0 2013-09-26 15:00 /tmp/temp1153649785
drwx------ - root supergroup 0 2013-09-26 13:35 /tmp/temp1294190837
drwx------ - root supergroup 0 2013-09-26 13:42 /tmp/temp1469783962
drwx------ - root supergroup 0 2013-09-26 14:45 /tmp/temp2087720556
drwx------ - hadoop supergroup 0 2013-09-26 14:29 /tmp/temp2116374858
drwx------ - root supergroup 0 2013-09-26 16:55 /tmp/temp299188455
I even tried to turn off the permission check (dfs.permissions in core-site.xml on both my nodes) as mentioned in Permission denied at hdfs, and restarted all my Hadoop services. But still no luck.
As per the log, I tried doing
hadoop fs -chmod -R 777 /tmp
as I identified that hadoop-root (which does not have permissions, per the log above) lives under the /tmp dir in HDFS.
But I got a different exception after changing the permission.
Message: java.io.IOException: The ownership/permissions on the staging directory hdfs://hadoop-master:9000/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxrwxrwx. The directory must be owned by the submitter root or by root and permissions must be rwx------
So, I reverted the permission with hadoop fs -chmod -R 700 /tmp, and the same old permission denied exception came back.
Could you please help?
Finally I was able to solve this problem.
My /tmp directory in HDFS did not have proper permissions. I tried to change the permission to 1777 (sticky bit) while I already had files in HDFS, but that did not work.
By trial and error, I took a backup of my HDFS to my local file system using -copyToLocal and removed all my files, including the /tmp folder.
I recreated the /tmp directory, this time with proper permissions:
hadoop fs -chmod 1777 /tmp
and copied all my files back into HDFS using the -put command.
This time my Pig script from the first post worked like a charm.
I checked the permissions of /tmp/hadoop-root/mapred/staging and they are set to what they should be:
drwxrwxrwx
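Roughly, the sequence was (the local backup path is just an example; this was Hadoop 1.x, hence -rmr):
# back everything up to the local filesystem first
hadoop fs -copyToLocal /tmp /home/hadoop/hdfs-backup
# remove the badly-permissioned directory and recreate it
hadoop fs -rmr /tmp
hadoop fs -mkdir /tmp
hadoop fs -chmod 1777 /tmp
# restore the contents
hadoop fs -put /home/hadoop/hdfs-backup/tmp/* /tmp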
Hope this helps anyone who is facing the same issue.
Cheers
Switch to the hdfs user first:
sudo su - hdfs
Once you're running as the hdfs user, you should be able to run
hadoop fs -chmod -R 777 /tmp
All file permissions should then be changed.
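You can verify the result afterwards; /tmp should show up as drwxrwxrwx (or drwxrwxrwt if you use mode 1777 as in the answer above):
hadoop fs -ls /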

HDFS path changing when trying to update files in HDFS

I am new to Hadoop and HDFS, so maybe it is something I am doing wrong when I copy from local (Ubuntu 10.04) to HDFS on a single node on localhost. The initial copy works fine, but when I modify my local input folder and try to copy back to HDFS, the HDFS path changes.
~$ $HADOOP_HOME/bin/hadoop dfs -copyFromLocal /tmp/anagram /user/hduser/anagram
~$ $HADOOP_HOME/bin/hadoop dfs -ls /user/hduser/anagram
Found 1 items
-rw-r--r-- 1 hduser supergroup 4067675 2011-08-29 05:44 /user/hduser/anagram/SINGLE.TXT
After adding another file (COMMON.TXT) to the same local directory, I run the same copy on the local directory to HDFS, but this time it copies to a different location than the first time (/user/hduser/anagram to /user/hduser/anagram/anagram).
~$ $HADOOP_HOME/bin/hadoop dfs -copyFromLocal /tmp/anagram /user/hduser/anagram
~$ $HADOOP_HOME/bin/hadoop dfs -ls /user/hduser/anagram
Found 2 items
-rw-r--r-- 1 hduser supergroup 4067675 2011-08-29 05:44 /user/hduser/anagram/SINGLE.TXT
drwxr-xr-x - hduser supergroup 0 2011-08-29 05:48 /user/hduser/anagram/anagram
~$ $HADOOP_HOME/bin/hadoop dfs -ls /user/hduser/anagram/anagram
Found 2 items
-rw-r--r-- 1 hduser supergroup 805232 2011-08-29 05:48 /user/hduser/anagram/anagram/COMMON.TXT
-rw-r--r-- 1 hduser supergroup 4067675 2011-08-29 05:48 /user/hduser/anagram/anagram/SINGLE.TXT
Has anyone run into this? I found that to resolve this, you need to remove the first directory and then copy over again:
~$ $HADOOP_HOME/bin/hadoop dfs -rmr /user/hduser/anagram/anagram
Deleted hdfs://localhost:54310/user/hduser/anagram/anagram
~$ $HADOOP_HOME/bin/hadoop dfs -rmr /user/hduser/anagram
Deleted hdfs://localhost:54310/user/hduser/anagram
~$ $HADOOP_HOME/bin/hadoop dfs -copyFromLocal /tmp/anagram /user/hduser/anagram
~$ $HADOOP_HOME/bin/hadoop dfs -ls /user/hduser/anagram
Found 2 items
-rw-r--r-- 1 hduser supergroup 805232 2011-08-29 05:55 /user/hduser/anagram/COMMON.TXT
-rw-r--r-- 1 hduser supergroup 4067675 2011-08-29 05:55 /user/hduser/anagram/SINGLE.TXT
Does anyone know how to do this without having to delete the directory every time?
It seems to me that this is a side effect (check FileUtil.java, the static method FileUtil.checkDest(String srcName, FileSystem dstFS, Path dst, boolean overwrite)).
Try this:
hadoop dfs -copyFromLocal /tmp/anagram/*.TXT /user/hduser/anagram
to update the directory.
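If you only added one new file locally, you can also copy just that file into the existing HDFS directory instead of re-copying the whole folder (existing files cannot be overwritten this way, so this only works for new files):
hadoop dfs -copyFromLocal /tmp/anagram/COMMON.TXT /user/hduser/anagram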
