hadoop creates dir that cannot be found

I use the following hadoop command to create a directory
hdfs dfs -mkdir /tmp/testing/morehere1
I get the following message:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
Not understanding the error, I run the command again, which returns this message:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
mkdir: `/tmp/testing/morehere2': File exists
Then, when I try to go to the directory just created, it's not there.
cd /tmp/testing/morehere2
-bash: cd: /tmp/testing/morehere2: No such file or directory
Any ideas what I am doing wrong?

hdfs dfs -mkdir /tmp/testing/morehere1
This command created a directory in HDFS. Don't worry about the log4j warning; the command created the directory successfully. That is why you got the error mkdir: `/tmp/testing/morehere2': File exists the second time you tried the command.
The following command will not work, since the directory was not created in your local filesystem, but in HDFS:
cd /tmp/testing/morehere2
Use the command below to check the created directory in HDFS:
hdfs dfs -ls /tmp/testing
You should be able to see the new directory there.
About the log4j warning: you can ignore it, as it will not cause your hadoop commands to fail. But if you want to correct it, you can add a file appender to log4j.properties.
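A minimal sketch of such a file appender in log4j.properties might look like this (the appender name and log path here are only placeholders; adjust them to your setup):
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/var/log/hadoop/hadoop-client.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n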

Remember that there's a difference between HDFS and your local file system. The first command you posted creates a directory in HDFS, not on your local system, so you can't cd to it or ls it directly; if you want to access it, you have to go through Hadoop. It's also very rare to be logging to HDFS, as file appends have never been well supported. I suspect that you actually want to be creating that directory locally, and that might be part of your problem.
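For illustration, using the path from the question, the two namespaces can be checked side by side; the first command looks at the local filesystem (and fails), while the second looks at HDFS (and succeeds, since the directory exists there):
ls /tmp/testing/morehere2
hdfs dfs -ls /tmp/testing/morehere2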

If your MR code was running fine previously and is now showing this log4j error, then restart all the Hadoop daemons. It may solve your problem, as it did mine :)
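If you go that route on a plain Apache Hadoop install, the stock scripts look roughly like this (they live under bin/ or sbin/ depending on the version; Cloudera Manager or Ambari installs restart services through their own UIs instead):
stop-all.sh
start-all.sh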

Related

permission denied error on hdfs while using put command

While trying to use the put command to add the patternsToSkip file to HDFS, I get an error saying: Permission denied: user=root, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
In the image below, you can see the sequence of commands written along with the error:
I tried user access as biadmin, root, and even hdfs, but with no luck! (details in the image)
Please help me fix this error. Thanks, folks.
The reason it is giving a permission issue is that you are trying to put the file inside the /user directory in HDFS, since you are using two dots in the put statement. You need to log in as supergroup to access or copy a file inside that particular directory.
What I would suggest is to try running the commands below to copy the file to HDFS.
Target with one dot:
hadoop fs -put patternsToSkip .
OR, giving the complete target directory path:
hadoop fs -put patternsToSkip /user/<instance_name>/output
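If you are unsure where the relative target ends up, listing with no path (or with a single dot) shows your HDFS home directory, which is normally /user/<your-username>:
hadoop fs -ls
hadoop fs -ls .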

Error when uploading file from local file system to HDFS

I'm following the Hortonworks Hadoop tutorial: https://hortonworks.com/tutorial/manage-files-on-hdfs-via-cli-ambari-files-view/section/1/#create-a-directory-in-hdfs-upload-a-file-and-list-contents.
I was able to create a directory in HDFS, but I'm having a problem uploading a file from my local system to that directory.
I gave root access to read and write to the user directory with the command hdfs dfs -chmod 777 /user, and then I gave ownership to root with the command hdfs dfs -chown root:hdfs /user/hadoop.
But for some reason, when I try to execute the command hdfs dfs -put sf-salaries-2011-2013.csv /user/hadoop/sf-salaries-2011-2013/sf-salaries-2011-2013.csv, I get the error:
put: `/user/hadoop/sf-salaries-2011-2013/sf-salaries-2011-2013.csv': No such file or directory: `hdfs://sandbox-hdp.hortonworks.com:8020/user/hadoop/sf-salaries-2011-2013/sf-salaries-2011-2013.csv'
Could it be a problem with port 8020? I'm following the tutorial step by step and cannot figure out what I may be missing here.
Here's my terminal view:
I can see the directory created in Ambari as well (created it twice):
The first one probably has a typo (upper-case H), but as far as I can see, the directory is empty but it exists.
You are trying to put into an HDFS directory that is not there in HDFS. It is not a permissions problem.
hdfs dfs -ls /user/hadoop/sf-salaries-2011-2013
The above directory is not there in HDFS: you are trying to put into the sf-salaries-2011-2013 directory, but the directory in HDFS is named sf-salaries-2011-1013. Check out the highlights in the picture below; you can find out what you are doing wrong here.
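For example, a sketch using the paths from the tutorial (adjust the names if yours differ): create the correctly spelled directory first, then retry the put and list the result:
hdfs dfs -mkdir -p /user/hadoop/sf-salaries-2011-2013
hdfs dfs -put sf-salaries-2011-2013.csv /user/hadoop/sf-salaries-2011-2013/
hdfs dfs -ls /user/hadoop/sf-salaries-2011-2013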

Permission denied issue in mapreduce?

I have tried the below command:
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount /home/cloudera/Desktop/words/output
The map reduce job starts, but after that it shows the below error. Can anyone please help with this issue?
15/11/04 10:33:57 INFO mapred.JobClient: Task Id : attempt_201511040935_0008_m_000002_0, Status : FAILED
org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Do I need to change anything in a config file or in Cloudera Manager?
The exception suggests that you are trying to write to the HDFS root directory "/", which you (user: cloudera) do not have permission to do.
Without knowing what your specific jar does:
I guess that the last argument ("/home/cloudera/Desktop/words/output") is where you wish to place the output.
I guess this is supposed to be within HDFS, where /home does not exist.
Try changing this to somewhere you can write, possibly "/user/cloudera/words/output".
There is a set of default directories to be created before you start using the hadoop cluster. Run the following; it should show you the directories:
$ hadoop fs -ls /
As a sample, if you want to run as the cloudera user you need the following on HDFS:
/user/cloudera -- the user running the program
/user/hadoop -- your hadoop file system user
/user/mapred -- your mapred user
/tmp -- temporary directory, which needs to have permission 1777 (hdfs chmod 1777)
HTH.
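A rough sketch of creating those directories as the HDFS superuser (assuming the superuser is hdfs, as is typical on Cloudera; the user and group names are examples only):
sudo -u hdfs hadoop fs -mkdir -p /user/cloudera /tmp
sudo -u hdfs hadoop fs -chown cloudera:cloudera /user/cloudera
sudo -u hdfs hadoop fs -chmod 1777 /tmp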
The last argument that you are passing should be the output path in HDFS, not in the default file system.
As you are running as the cloudera user, you can point it to /user/cloudera/words/output. But first you need to check whether you have /user/cloudera in your HDFS and whether you have write permission to it, by issuing the following:
hadoop fs -ls /user/
Once you have it, change your command to the following:
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount <path_where_you_have_write_permission_in_HDFS>
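For example, assuming /user/cloudera exists and is writable by you, the command from the question would become:
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount /user/cloudera/words/output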

Sqoop Permission Issue when running inside Map Reduce Code

I am trying to invoke Sqoop through a map reduce program using
Sqoop.runTool(arguments,_conf);
When executing, I receive the following error
Exception in thread "main" java.lang.RuntimeException: Could not create temporary directory: /tmp/sqoop-hdfs/compile/a609226c19d65f561dd7035c00d318f6; check for a directory permissions issue on /tmp.
I have set the permissions on /tmp and its subdirectories in HDFS to 777.
I can invoke the same command fine through the command line using sudo -u hdfs sqoop ...
This is Cloudera's hadoop distribution and I am running the job as the hdfs user.
This probably isn't the /tmp directory in HDFS, but rather the /tmp directory on the local file system. What are the permissions on that directory? (That would also explain why it works when you sudo the command.)
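To check, something along these lines on the local filesystem should show who owns those directories, and (assuming the job really does run as the hdfs user, as stated in the question) hand them over if needed:
ls -ld /tmp /tmp/sqoop-hdfs /tmp/sqoop-hdfs/compile
sudo chown -R hdfs:hdfs /tmp/sqoop-hdfs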
Just clean the /tmp/sqoop-hdfs/compile folder; it works.
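In other words, something like the following on the local filesystem (double-check the path before deleting anything):
rm -rf /tmp/sqoop-hdfs/compile/*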

hadoop mapred job - Error initializing attempt mapred task

I accidentally deleted hadoop.tmp.dir, in my case /tmp/{user.name}/*. Now every time I run a Hive query from the CLI, the mapred job fails at the task attempt as below:
Error initializing attempt_201202231712_1266_m_000009_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for ttprivate/taskTracker/hdfs/jobcache/job_201202231712_1266/jobToken
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:376)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:4432)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1301)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1242)
at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2541)
at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2505)
It's a test environment, I don't care about the data. How can I get the system back to normal?
You should run the stop-all.sh script, recreate the directory, and start again after formatting the tmp directory.
You can simply recreate the directory and change its owner to mapred: chown mapred:mapred <your dir>
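Putting the two suggestions together, a rough sequence might be the following (substitute your actual hadoop.tmp.dir path and the user your TaskTracker runs as):
stop-all.sh
mkdir -p /tmp/{user.name}
chown mapred:mapred /tmp/{user.name}
start-all.sh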
