security.UserGroupInformation: PriviledgedActionException error for MR - hadoop

Whenever I try to execute a MapReduce job that writes to an HBase table, I get the following error in the console. I am running the MR job from the user account.
ERROR security.UserGroupInformation: PriviledgedActionException as:user cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/data1/input/Filename.csv
I did a hadoop fs -ls; user is the owner of the file:
-rw-r--r-- 1 user supergroup 7998682 2014-04-17 18:49 /data1/input/Filename.csv
All my daemons are running fine, and if I use the HBase client API I am able to insert.
Please help, thanks in advance.
Thanks,
KG

If you look at the following path
Input path does not exist: file:/data1/input/Filename.csv
you can see that it is pointing to the local filesystem, not to HDFS. Try prefixing the path with the hdfs scheme, as follows:
hdfs://<NAMENODE-HOST>:<IPC-PORT>/data1/input/Filename.csv
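If the input path is set inside the job driver, the same fix can be applied there. Below is a minimal driver sketch (not the asker's code); the class name, namenode host placeholder and port are assumptions, and the HBase output setup is omitted.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class HBaseLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Either make HDFS the default filesystem for the job ...
    conf.set("fs.defaultFS", "hdfs://<NAMENODE-HOST>:8020");
    Job job = Job.getInstance(conf, "hbase-load");
    job.setJarByClass(HBaseLoadDriver.class);
    job.setInputFormatClass(TextInputFormat.class);
    // ... or fully qualify the input path with the hdfs:// scheme.
    FileInputFormat.addInputPath(job,
        new Path("hdfs://<NAMENODE-HOST>:8020/data1/input/Filename.csv"));
    // (mapper, reducer and HBase TableOutputFormat setup omitted)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}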

Related

Getting write permission from HDFS after updating flink-1.4.0 to flink-1.4.2

Environment
Flink-1.4.2
Hadoop 2.6.0-cdh5.13.0 with 4 nodes in service and Security is off.
Ubuntu 16.04.3 LTS
Java 8
Description
I have a Java job in flink-1.4.0 which writes to HDFS in a specific path.
After updating to flink-1.4.2 I'm getting the following error from Hadoop complaining that the user doesn't have write permission to the given path:
WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:xng (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=user1, access=WRITE, inode="/user":hdfs:hadoop:drwxr-xr-x
NOTE:
If I run the same job on flink-1.4.0, the error disappears, regardless of which version of the Flink dependencies (1.4.0 or 1.4.2) the job is built with.
Also, if I run the job's main method from my IDE and pass the same parameters, I don't get the above error.
Question
Any idea what's wrong, or how to fix it?
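No fix was posted here, but the error itself says the write is attempted as user1 directly under /user, which is owned by hdfs:hadoop and is not group or world writable. The following is a small diagnostic sketch of my own (not from the thread) that prints which user Hadoop resolves for the process and the permissions of the directory being written to; the namenode URI is an assumption.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class HdfsWriteCheck {
  public static void main(String[] args) throws Exception {
    // The namenode URI is a placeholder; substitute your cluster's.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
    // Which user will HDFS see for this process?
    System.out.println("Effective user: " + UserGroupInformation.getCurrentUser().getUserName());
    // Owner, group and mode of the directory the job is trying to write into.
    FileStatus st = fs.getFileStatus(new Path("/user"));
    System.out.println(st.getOwner() + ":" + st.getGroup() + " " + st.getPermission());
  }
}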

java.io.EOFException: Premature EOF: no length prefix available in Spark on Hadoop

I'm getting this weird exception. I'm using Spark 1.6.0 on Hadoop 2.6.4 and submitting the Spark job on a YARN cluster.
16/07/23 20:05:21 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-532134798-128.110.152.143-1469321545728:blk_1073741865_1041
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2203)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:867)
16/07/23 20:49:09 ERROR server.TransportRequestHandler: Error sending result RpcResponse{requestId=4719626006875125240, body=NioManagedBuffer{buf=java.nio.HeapByteBuffer[pos=0 lim=81 cap=81]}} to ms0440.utah.cloudlab.us/128.110.152.175:58944; closing connection
java.nio.channels.ClosedChannelException
I was getting this error when running on Hadoop 2.6.0 and thought the exception might be a bug like this one, but even after changing to Hadoop 2.6.4 I'm getting the same error. There isn't any memory problem; my cluster is fine with respect to HDFS and memory. I went through this and this but no luck.
Note: 1. I'm using plain Apache Hadoop and Spark, not CDH/HDP. 2. I'm able to copy data into HDFS and even able to execute another job on this cluster.
Check the file permissions of the dfs directory:
find /path/to/dfs -group root
In general, the owner and group should be hdfs.
Since I started the HDFS service as the root user, some dfs block files were generated with root ownership.
I solved the problem after changing them back to the right owner:
sudo chown -R hdfs:hdfs /path/to/dfs
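For reference, the same ownership check can be scripted. Here is a small sketch of my own (plain JDK, not from the answer) that walks the assumed DataNode data directory (/path/to/dfs is the same placeholder used above) and prints anything not owned by the hdfs user:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class DfsOwnershipCheck {
  public static void main(String[] args) throws IOException {
    // "/path/to/dfs" is a placeholder, as in the answer above.
    try (Stream<Path> paths = Files.walk(Paths.get("/path/to/dfs"))) {
      paths.filter(p -> {
        try {
          return !Files.getOwner(p).getName().equals("hdfs");
        } catch (IOException e) {
          return true; // unreadable entries are worth flagging too
        }
      }).forEach(p -> System.out.println("not owned by hdfs: " + p));
    }
  }
}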

Permission denied issue in mapreduce?

I have tried the command below.
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount /home/cloudera/Desktop/words/output
The MapReduce job starts, and after that it shows the error below. Can anyone please help with this issue?
15/11/04 10:33:57 INFO mapred.JobClient: Task Id : attempt_201511040935_0008_m_000002_0, Status : FAILED
org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Do I need to change anything in a config file or in Cloudera Manager?
The exception suggests that you are trying to write to the HDFS root directory "/", which you (user: cloudera) do not have permission to do.
Without knowing what your specific jar does:
I guess that the last argument ("/home/cloudera/Desktop/words/output") is where you wish to place the output.
I guess this is supposed to be within HDFS, where /home does not exist.
Try changing it to somewhere you can write, for example "/user/cloudera/words/output".
There is a set of default directories that should be created before you start using the Hadoop cluster.
Run the following; it should show you those directories:
$ hadoop fs -ls /
For example, if you want to run jobs as cloudera, you need the following on HDFS (a sketch that creates them through the Java API follows below):
/user/cloudera -- the user running the program
/user/hadoop -- your hadoop file system user
/user/mapred -- your mapred user
/tmp -- temporary directory, which needs permission 1777 (hdfs dfs -chmod 1777 /tmp)
HTH.
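As a hedged illustration (mine, not part of the answer), creating those directories through Hadoop's Java FileSystem API, run as the HDFS superuser, could look like this; the namenode URI is an assumption:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class CreateDefaultDirs {
  public static void main(String[] args) throws Exception {
    // Namenode URI is a placeholder; run this as the HDFS superuser.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
    // Home directory for the user submitting jobs.
    Path home = new Path("/user/cloudera");
    fs.mkdirs(home);
    fs.setOwner(home, "cloudera", "cloudera");
    // World-writable /tmp with the sticky bit (mode 1777), as recommended above.
    Path tmp = new Path("/tmp");
    fs.mkdirs(tmp);
    fs.setPermission(tmp, new FsPermission((short) 01777));
  }
}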
The last argument that you are passing should be an output path in HDFS, not a path on the local file system.
As you are running as the cloudera user, you can point it to /user/cloudera/words/output. But first you need to check whether you have a cloudera directory in HDFS and whether you have write permission to it, by issuing the following:
hadoop fs -ls /user/
Once you have it, change your command to the following:
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount <path_where_you_have_write_permission_in_HDFS>

Hadoop Hive: how to allow a regular user to continuously write data and create tables in the warehouse directory?

I am running Hadoop 2.2.0.2.0.6.0-101 on a single node.
I am trying to run a Java MRD program that writes data to an existing Hive table, from Eclipse, under a regular user. I get the exception:
org.apache.hadoop.security.AccessControlException: Permission denied: user=dev, access=WRITE, inode="/apps/hive/warehouse/testids":hdfs:hdfs:drwxr-xr-x
This happens because the regular user has no write permission to the warehouse directory; only the hdfs user does:
drwxr-xr-x - hdfs hdfs 0 2014-03-06 16:08 /apps/hive/warehouse/testids
drwxr-xr-x - hdfs hdfs 0 2014-03-05 12:07 /apps/hive/warehouse/test
To circumvent this I changed the permissions on the warehouse directory, so that everybody now has write permission:
[hdfs#localhost wks]$ hadoop fs -chmod -R a+w /apps/hive/warehouse
[hdfs#localhost wks]$ hadoop fs -ls /apps/hive/warehouse
drwxrwxrwx - hdfs hdfs 0 2014-03-06 16:08 /apps/hive/warehouse/testids
drwxrwxrwx - hdfs hdfs 0 2014-03-05 12:07 /apps/hive/warehouse/test
This helps to some extent, and the MRD program can now write to the warehouse directory as a regular user, but only once. When trying to write data into the same table a second time I get:
ERROR security.UserGroupInformation: PriviledgedActionException as:dev (auth:SIMPLE) cause:org.apache.hcatalog.common.HCatException : 2003 : Non-partitioned table already contains data : default.testids
Now, if I delete the output table and create it anew in the hive shell, I again get default permissions that do not allow a regular user to write data into this table:
[hdfs#localhost wks]$ hadoop fs -ls /apps/hive/warehouse
drwxr-xr-x - hdfs hdfs 0 2014-03-11 12:19 /apps/hive/warehouse/testids
drwxrwxrwx - hdfs hdfs 0 2014-03-05 12:07 /apps/hive/warehouse/test
Please advise on the correct Hive configuration steps that will allow a program running as a regular user to do the following operations in the Hive warehouse:
Programmatically create / delete / rename Hive tables?
Programmatically read / write data from Hive tables?
Many thanks!
If you maintain the table from outside Hive, then declare the table as external:
An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir.
A Hive administrator can create the table and point it toward your own user-owned HDFS storage location, and you grant Hive permission to read from there.
As a general comment, there are no ways for an unprivileged user to do an unauthorized privileged action. Any such way is technically an exploit, and you should never rely on it: even if it is possible today, it will likely be closed soon. Hive Authorization (and HCatalog authorization) is orthogonal to HDFS authorization.
Your application is also incorrect, irrespective of the authorization issues. You are trying to write twice into the same table, which means your application does not handle partitions correctly. Start from An Introduction to Hive’s Partitioning.
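For illustration only (this is not from the answer): creating such an external table over a user-owned HDFS directory through HiveServer2's JDBC interface might look like the sketch below. The endpoint, credentials, table name, schema and location are all assumptions; the point is only that the table's storage lives under a directory the regular user already owns.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateExternalTable {
  public static void main(String[] args) throws Exception {
    // HiveServer2 endpoint, user and schema are assumptions.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "dev", "");
         Statement stmt = conn.createStatement()) {
      // External table over a directory the dev user owns, so no write
      // access to /apps/hive/warehouse is needed.
      stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS testids (id INT) "
          + "LOCATION '/user/dev/hive/testids'");
    }
  }
}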
You can configure hdfs-site.xml as follows:
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
This setting disables permission checking on HDFS, so a regular user can perform these operations on HDFS.
I hope this solution helps you.

Oozie job configuration app directory not found on HDFS

I installed a pseudo-distributed version of Cloudera on my Linux box, and ran some simple MapReduce examples with success. However, I'm trying to get Oozie to work, and am completely baffled by the errors I am receiving when attempting to execute a simple job workflow:
tim#phocion:~$ oozie version
Oozie client build version: 3.1.3-cdh4.0.1
Copy the pre-packaged examples to HDFS and execute, per the documentation:
tim#phocion:~$ oozie job -oozie http://phocion:11000/oozie -config /user/tim/examples/apps/map-reduce/job.properties -run
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/map-reduce] does not exist
Check to see if the file exists:
tim#phocion:~$ hdfs dfs -ls /user/tim/examples/apps/map-reduce
Found 3 items
-rwxr-xr-x 1 tim tim 995 2012-10-03 14:47 /user/tim/examples/apps/map-reduce/job.properties
drwxrwxr-x - tim tim 4096 2012-10-03 14:47 /user/tim/examples/apps/map-reduce/lib
-rwxr-xr-x 1 tim tim 2559 2012-10-03 14:47 /user/tim/examples/apps/map-reduce/workflow.xml
It does. Can I connect to phocion:8020?
tim#phocion:~$ telnet phocion 8020
Trying 127.0.1.1...
Connected to phocion.
Escape character is '^]'.
I can. So, basically, I'm at a total loss as to what this error is trying to tell me - the folder very much does exist. I'm assuming the error is too vague to fully communicate what the issue is, but I've found virtually nothing out there that could point me in the right direction.
I can also replicate this error with other 3rd party tutorials.
I've spent so much time poring over configuration files that I don't want to look at a computer ever again. Maybe I'm overthinking the issue here, but any help would be greatly appreciated.
EDIT: Adding the full job.properties (not too different from the default):
nameNode=hdfs://phocion:8020
jobTracker=phocion:8021
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
MORE EDITS: I get the exact same error when the folder DOES NOT exist and after I put it back into HDFS. On the last-ditch idea that it's a permissions issue, chmod 777 still gives the same error. Passing the full HDFS path on the command line doesn't fix the issue. Running it under the oozie and even root accounts doesn't work either:
tim#phocion:~$ oozie job -oozie http://phocion:11000/oozie -run -config /home/tim/examples/apps/map-reduce/job.properties -Doozie.wf.application.path=hdfs://phocion:8020/user/tim/examples/apps/map-reduce
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/map-reduce] does not exist
tim#phocion:~$ hdfs dfs -put examples/ /user/tim/
12/10/04 13:26:43 INFO util.NativeCodeLoader: Loaded the native-hadoop library
tim#phocion:~$ oozie job -oozie http://phocion:11000/oozie -run -config /home/tim/examples/apps/map-reduce/job.properties -Doozie.wf.application.path=hdfs://phocion:8020/user/tim/examples/apps/map-reduce
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/map-reduce] does not exist
tim#phocion:~$ hdfs dfs -chmod -R 777 /user/tim/examples/
12/10/04 13:28:16 INFO util.NativeCodeLoader: Loaded the native-hadoop library
tim#phocion:~$ oozie job -oozie http://phocion:11000/oozie -run -config /home/tim/examples/apps/map-reduce/job.properties -Doozie.wf.application.path=hdfs://phocion:8020/user/tim/examples/apps/map-reduce
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/map-reduce] does not exist
tim#phocion:~$ sudo -u oozie oozie job -oozie http://phocion:11000/oozie -run -config /home/tim/examples/apps/map-reduce/job.properties -Doozie.wf.application.path=hdfs://phocion:8020/user/tim/examples/apps/map-reduce
[sudo] password for tim:
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/map-reduce] does not exist
tim#phocion:~$ sudo -u root oozie job -oozie http://phocion:11000/oozie -run -config /home/tim/examples/apps/map-reduce/job.properties -Doozie.wf.application.path=hdfs://phocion:8020/user/tim/examples/apps/map-reduce
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/map-reduce] does not exist
Should this command work in theory?
tim#phocion:~$ hdfs dfs -ls hdfs://phocion:8020/user/tim/examples/apps/map-reduce
ls: `hdfs://phocion:8020/user/tim/examples/apps/map-reduce': No such file or directory
This shows up in hadoop-hdfs logs after executing the oozie command:
2012-10-04 13:50:00,152 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 113297
2012-10-04 13:50:00,874 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://localhost.localdomain:50090/getimage?getimage=1&txid=113296&storageInfo=-40:2092007576:0:cluster8
2012-10-04 13:50:00,875 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.net.ConnectException: Connection refused
2012-10-04 13:50:00,876 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:234)
at sun.net.www.http.HttpClient.New(HttpClient.java:307)
at sun.net.www.http.HttpClient.New(HttpClient.java:324)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
In addition to HarshJ's comment, check your error message:
Error: E0504 : E0504: App directory [hdfs://phocion:8020/user/tim/examples/apps/demo] does not exist
And the hadoop fs -ls listing you provided:
/user/tim/examples/apps/map-reduce/
And play spot the difference:
/user/tim/examples/apps/demo
/user/tim/examples/apps/map-reduce/
Try configuring it as follows:
oozie.wf.application.path=/user/tim/examples/apps/map-reduce
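To double-check on the Java side whether that app path actually resolves on the namenode URI from job.properties, here is a small sketch of my own (not from the thread); it asks the hdfs://phocion:8020 filesystem directly, which is the same URI Oozie resolves from oozie.wf.application.path:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OozieAppPathCheck {
  public static void main(String[] args) throws Exception {
    // Query hdfs://phocion:8020 explicitly, bypassing whatever fs.defaultFS is set to.
    FileSystem fs = FileSystem.get(URI.create("hdfs://phocion:8020"), new Configuration());
    Path app = new Path("/user/tim/examples/apps/map-reduce");
    System.out.println(app + " exists: " + fs.exists(app));
    System.out.println("workflow.xml exists: " + fs.exists(new Path(app, "workflow.xml")));
  }
}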
I had the same issue and got it fixed by exporting the correct Oozie URL.
To export it, use the command below:
export OOZIE_URL=http://someip:11000/oozie
To get this Oozie URL, use Hue to connect to your cluster and navigate to Workflows, where you can find a tab called Oozie. Inside it you should see gauges where a lot of properties are listed. Look for the property oozie.servers.
What you need to do is -copyFromLocal the examples folder to the location specified in the job's config.