webhdfs open file NullPointerException - hadoop

I am trying to open a file from HDFS through the WebHDFS API. I can create files and upload them, but as soon as I try to open one I get this error:
{"RemoteException":{"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException","message":null}}
using the following command
curl -i -X GET "http://ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com:50070/webhdfs/v1/tmp/tmp.txt?op=OPEN"
I tried this from multiple machines (from the master node and remotely) and I get the same error. The cluster is running CDH 4.6.
thanks,

Apparently this is a bug in CDH running on Ubuntu 12.04, caused by /run being mounted with noexec.
It can be resolved as follows:
sudo mount -o remount,exec /run
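To confirm the fix (a quick check, assuming a stock Ubuntu layout), verify that noexec is gone from the /run mount options and retry the OPEN call, adding -L so curl follows the redirect to the datanode:
mount | grep ' /run '    # the option list should no longer contain noexec
curl -i -L "http://ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com:50070/webhdfs/v1/tmp/tmp.txt?op=OPEN"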

Related

CDH 5.3.2 - Need to restart impala daemon from shell/script

I am using a CDH 5.3.2 cluster and need to be able to start/stop Impala daemons from a script. The command mentioned in the Cloudera docs,
sudo service impala-server start
works fine on my local CDH 5.10 VM, but on the CDH 5.3.2 cluster I get the error "impala-server: unrecognized service". Checking /etc/init.d, I see that no such service is listed either (while it is listed in the 5.10 version).
I then tried to restart the service directly from the Impala bin directory:
cd /usr/bin
./impalad stop
However, I am now running into the error below:
E0918 11:55:27.815739 12046 JniFrontend.java:622] FileSystem is file:///
W0918 11:55:27.817589 12046 JniFrontend.java:534] Cannot detect CDH version. Skipping Hadoop configuration checks
E0918 11:55:27.817620 12046 impala-server.cc:210] Unsupported file system. Impala only supports DistributedFileSystem but the configured filesystem is: LocalFileSystem.fs.defaultFS(file:///) might be set incorrectly
E0918 11:55:27.817631 12046 impala-server.cc:212] Aborting Impala Server startup due to improper configuration
I checked core-site.xml in Cloudera Manager and fs.defaultFS is set correctly, so I am not sure where it is picking the value up from. Any pointers on how to proceed?
The init.d service packages to start Impala from the command line are meant to be used for CDH users who do NOT want to use Cloudera Manager. The right way to start and stop Impala on a Cloudera Manager cluster is to use the CM API:
https://cloudera.github.io/cm_api/apidocs/v17/index.html
start cluster service API
stop cluster service API
commands API
The tutorial shows how to use the CM APIs but for your situation you probably need to do:
$ curl -X POST -u USER:PASSWORD \
'CM_URL/api/v1/clusters/CLUSTERNAME/services/IMPALA_SERVICE/commands/stop'
replacing USER, PASSWORD, CM_URL, CLUSTERNAME, and IMPALA_SERVICE with the appropriate values. The curl command will return a command ID.
Then poll this API with the command ID to see that the start/stop operation completed.
$ curl -u USER:PASSWORD 'CM_URL/api/v1/commands/COMMAND_ID'
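Putting it together, a minimal bash sketch (the placeholders are the same as above; checking the "active" field is an assumption based on the command JSON that CM returns, and jq must be installed):
# stop Impala and wait for the command to finish (sketch)
CMD_ID=$(curl -s -X POST -u USER:PASSWORD \
  'CM_URL/api/v1/clusters/CLUSTERNAME/services/IMPALA_SERVICE/commands/stop' | jq -r '.id')
while curl -s -u USER:PASSWORD "CM_URL/api/v1/commands/$CMD_ID" | jq -e '.active' > /dev/null; do
  sleep 5    # keep polling until "active" becomes false
done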
However, if you still want to use the init.d service packages then you'll need to install the impala-server package.

Retrieve files from remote HDFS

My local machine does not have an HDFS installation. I want to retrieve files from a remote HDFS cluster. What's the best way to achieve this? Do I need to get the files from HDFS onto one of the cluster machines' local filesystem and then use ssh to retrieve them? I want to be able to do this programmatically, for example through a bash script.
Here are the steps:
Make sure there is connectivity between your host and the target cluster
Configure your host as a client: you need to install compatible Hadoop binaries, and your host needs to be running the same operating system.
Make sure you have the same configuration files (core-site.xml, hdfs-site.xml)
You can then run the hadoop fs -get command to get the files directly.
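For example (both paths below are placeholders):
hadoop fs -get /path/in/hdfs/file.txt /local/destination/file.txt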
There are also alternatives:
If WebHDFS/HttpFS is configured, you can download files using curl or even your browser, and you can write bash scripts around it.
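For example, a WebHDFS download with curl could look like this (hostname, user and paths are placeholders; -L follows the redirect from the namenode to a datanode):
curl -L "http://<namenode-host>:50070/webhdfs/v1/<hdfs_path>?op=OPEN&user.name=<user>" -o <local_file>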
If your host cannot have Hadoop binaries installed to act as a client, then you can use the following instructions:
Enable passwordless login from your host to one of the nodes on the cluster
Run the command ssh <user>@<host> "hadoop fs -get <hdfs_path> <os_path>"
Then use scp to copy the files to your host
You can put the above two commands in one script
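A minimal sketch of such a script (GATEWAY and the paths are placeholders, and passwordless ssh to the cluster node is assumed, as described above):
#!/bin/bash
# Copy a file from a remote HDFS cluster to this machine via one cluster node.
GATEWAY="user@cluster-node"              # placeholder: a node that has the Hadoop binaries
HDFS_PATH="/path/in/hdfs/file.txt"       # placeholder: source file in HDFS
LOCAL_PATH="./file.txt"                  # placeholder: destination on this machine
REMOTE_TMP="/tmp/$(basename "$HDFS_PATH")"

ssh "$GATEWAY" "hadoop fs -get $HDFS_PATH $REMOTE_TMP"   # HDFS -> node-local filesystem
scp "$GATEWAY:$REMOTE_TMP" "$LOCAL_PATH"                 # node -> your machine
ssh "$GATEWAY" "rm -f $REMOTE_TMP"                       # clean up on the node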

Hadoop NFS gateway - mount failed: No such file or directory

I'm trying to mount my HDFS using the NFS gateway as it is documented here:
http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html
Unfortunately, following the documentation step by step does not work for me (Hadoop 2.7.1 on CentOS 6.6). When executing the mount command I receive the following error message:
[root@server1 ~]# mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync server1:/ /hdfsmount/
mount.nfs: mounting server1:/ failed, reason given by server: No such file or directory
I created the folder /hdfsmount myself, so it definitely exists. My questions now are:
Did anyone face the same issue as I do?
Do I have to configure the NFS server before I start following the steps in the documentation (e.g. I read about editing /etc/exports)?
Any help is highly appreciated!
I found the problem deep in the logs. When executing the command (see below) to start the nfs3 component of HDFS, the executing user needs permissions to delete /tmp/.hdfs-nfs which is configured as nfs.dump.dir in core-site.xml.
If the permissions are not set, you'll receive a log message like:
15/08/12 01:19:56 WARN fs.FileUtil: Failed to delete file or dir [/tmp/.hdfs-nfs]: it still exists.
Exception in thread "main" java.io.IOException: Cannot remove current dump directory: /tmp/.hdfs-nfs
Another option is to simply start the nfs component as root.
[root]> /usr/local/hadoop/sbin/hadoop-daemon.sh --script /usr/local/hadoop/bin/hdfs start nfs3
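If you would rather not run it as root, a sketch of the permissions route described above (it assumes the gateway is started as the hdfs user; substitute whichever user runs nfs3 in your setup):
sudo rm -rf /tmp/.hdfs-nfs    # remove the stale dump dir so the gateway user can recreate it
sudo -u hdfs /usr/local/hadoop/sbin/hadoop-daemon.sh --script /usr/local/hadoop/bin/hdfs start nfs3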

pig local mode spill data issue

I am trying to solve this issue but am unable to understand it. The Pig script on my development machine ran successfully on a 1.8 GB data file.
When I try to run it on the server, it states that it cannot find a local device to spill the data (spill0.out).
I have modified the pig.temp.dir property in the pig.properties file to point to a location with free space.
error:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill0.out
So how do I find out where Pig is spilling the data, and can the Pig spill directory location be changed somehow?
I am using Pig in local mode.
Any ideas or suggestions or workarounds will be of great help.
Thanks..
I found an answer.
We need to put the following in the $PIG_HOME/conf/pig.properties file:
mapreduce.jobtracker.staging.root.dir
mapred.local.dir
pig.temp.dir
and then test.
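For reference, a minimal pig.properties sketch (the paths are placeholders; point them at a disk with enough free space):
# $PIG_HOME/conf/pig.properties
pig.temp.dir=/data/pig/tmp
mapred.local.dir=/data/pig/mapred-local
mapreduce.jobtracker.staging.root.dir=/data/pig/staging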
This has helped me solve the problem.
This is not a problem with Pig.
I'm not using Pig and I also have exactly the same error.
The problem seems to be more related to Hadoop. I also use it in local mode. I'm using Hadoop 2.6.0
I had no luck with these answers; Pig (version 0.15.0) was still writing pigbag* files to the /tmp dir, so I just renamed my /tmp dir and created a symbolic link to the desired location, like this:
sudo -s #change to root
cd /
mv tmp tmp_local
ln -s /desired/new/tmp/location tmp
chmod 1777 tmp
mv tmp_local/* tmp
Make sure there are no active applications writing to the /tmp folder while running these commands.

unable to set up pseudo-distributed hadoop cluster

I am using CentOS 7. I downloaded and untarred Hadoop 2.4.0 and followed the instructions as per the link Hadoop 2.4.0 setup.
I ran the following command:
./hdfs namenode -format
and got this error:
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
I see a number of posts with the same error with no accepted answers and I have tried them all without any luck.
This error can occur if the necessary jarfiles are not readable by the user running the "./hdfs" command or are misplaced so that they can't be found by hadoop/libexec/hadoop-config.sh.
Check the permissions on the jarfiles under: hadoop-install/share/hadoop/*:
ls -l share/hadoop/*/*.jar
and if necessary, chmod them as the owner of the respective files to ensure they're readable. Something like chmod 644 should be sufficient to at least check if that fixes the initial problem. For the more permanent fix, you'll likely want to run the hadoop commands as the same user that owns all the files.
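For example (a sketch; the install path is a placeholder, and the lib/ subdirectories hold additional jars in a stock 2.4.0 tarball):
cd /path/to/hadoop-install
chmod 644 share/hadoop/*/*.jar share/hadoop/*/lib/*.jar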
I followed the link Setup hadoop 2.4.0
and was able to get past the error message.
It seems the documentation on the Hadoop site is not complete.
