Spring XD on Hortonworks Sandbox on OSX - hadoop

I am trying to store Spring XD streams to the Hortonworks sandbox version 2.0 using xd-singlenode and xd-shell. No xd directory is created and no stream is stored in Hortonworks hadoop hdfs.
Environment:
Apple OSX 10.9.3, Hortonworks Sandbox running in Oracle VirtualBox (Red Hat 64-bit), using bridged networking. In my WiFi router I assigned a fixed IP address (192.168.178.30) to the VirtualBox MAC address. When I browse with OSX Safari to 192.168.178.30:8000 I can use the Hortonworks menus such as File Browser, Pig, Beeswax (Hive), etc.
A "check for misconfiguration" in the Hortonworks menu results into:
Configuration files located in /etc/hue/conf.empty
All OK. Configuration check passed.
I used Homebrew to install Spring XD. On OSX I changed the /usr/local/Cellar/springxd/1.0.0.M6/libexec/xd/config/servers.yml file to include:
# Hadoop properties
spring:
  hadoop:
    fsUri: hdfs://192.168.178.30:8020
and
# Zookeeper properties
# client connect string: host1:port1,host2:port2,...,hostN:portN
zk:
  client:
    connect: 192.168.178.30:2181
Within Virtualbox I changed the file /etc/hadoop/conf.empty/hadoop-env.sh to include:
export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf=/dev/null"
I start Spring XD with OSX with the following commands:
./xd-singlenode --hadoopDistro hadoop22
and in a second OSX terminal:
./xd-shell --hadoopDistro hadoop22
In xd-shell I enter:
hadoop config fs --namenode hdfs://192.168.178.30:8020
A "hadoop fs ls /" command in the xd-shell results into:
Hadoop configuration changed, re-initializing shell...
2014-06-24 00:55:56.632 java[7804:5d03] Unable to load realm info from SCDynamicStore
00:55:56,672 WARN Spring Shell util.NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 6 items
drwxrwxrwt - yarn hadoop 0 2013-10-21 00:19 /app-logs
drwxr-xr-x - hdfs hdfs 0 2013-10-21 00:08 /apps
drwxr-xr-x - mapred hdfs 0 2013-10-21 00:10 /mapred
drwxr-xr-x - hdfs hdfs 0 2013-10-21 00:10 /mr-history
drwxrwxrwx - hdfs hdfs 0 2013-10-28 16:34 /tmp
drwxr-xr-x - hdfs hdfs 0 2013-10-28 16:34 /user
When I create a Spring XD stream with the command
stream create --name twvoetbal --definition "twittersearch --consumerKey='<mykey>' --consumerSecret='<mysecret>' --query='voetbal' | file" --deploy
then in OSX a /tmp/xd/output/twvoetbal.out file is created.
Spring XD seems to work, including my Twitter developer secret keys.
When I create a Spring XD stream with the command
stream create --name twvoetbal --definition "twittersearch --consumerKey='<mykey>' --consumerSecret='<mysecret>' --query='voetbal' | hdfs" --deploy
then no xd directory and no files are created in Hadoop HDFS.
Questions:
How do I solve the "Unable to load realm info from SCDynamicStore" error in xd-shell?
How do I solve the "WARN Spring Shell util.NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable" error in xd-shell?
What else could I have done wrong?

The WARN messages are just noise - I always just ignore them. Did you create the xd directory on the sandbox? You need to give the user running the xd-singlenode the rights to create the needed directories.
You can ssh to the sandbox as root (password is hadoop) and run the following:
sudo -u hdfs hdfs dfs -mkdir /xd
sudo -u hdfs hdfs dfs -chmod 777 /xd
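If I remember correctly, the hdfs sink writes to /xd/<stream name> by default, so once /xd exists and the stream is redeployed you should be able to verify the output from xd-shell with something like:
hadoop fs ls /xd/twvoetbal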
We have a brief writeup for using Hadoop VMs with XD:
https://github.com/spring-projects/spring-xd/wiki/Using-Hadoop-VMs-with-Spring-XD#hortonworks-sandbox

Related

How to run Spark Streaming application on Windows 10?

I run a Spark Streaming application on MS Windows 10 64-bit that stores data in MongoDB using spark-mongo-connector.
Whenever I run the Spark application, even pyspark, I get the following exception:
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
Full stack trace:
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 32 more
I use Hadoop 3.0.0-alpha1, which I installed locally myself, with the HADOOP_HOME environment variable pointing to the Hadoop directory and %HADOOP_HOME%\bin in the PATH environment variable.
So I tried to do the following:
> hdfs dfs -ls /tmp
Found 1 items
drw-rw-rw- - 0 2016-12-26 16:08 /tmp/hive
I tried to change the permissions as follows:
hdfs dfs -chmod 777 /tmp/hive
but this command outputs:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I seem to be missing Hadoop's native library for my OS; after looking it up, it also appears that I need to recompile libhadoop.so.1.0.0 for a 64-bit platform.
Where can I find the native library for Windows 10 64-bit?
Or is there another way of solving this, apart from the library?
First of all, you don't have to install Hadoop to use Spark, including the Spark Streaming module, with or without MongoDB.
Since you're on Windows, there is the known issue with NTFS's POSIX incompatibility, so you have to have winutils.exe on your PATH, since Spark uses the Hadoop jars under the covers (for file system access). You can download winutils.exe from https://github.com/steveloughran/winutils. Download the one from hadoop-2.7.1 if you don't know which version you should use (but it should really reflect the version of Hadoop your Spark Streaming was built with, e.g. Hadoop 2.7.x for Spark 2.0.2).
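For example, assuming you saved winutils.exe under c:\hadoop\bin (that location is just an assumption, adjust it to wherever you put the file), you could set the environment variables from a command prompt like this and then open a new prompt for them to take effect:
setx HADOOP_HOME c:\hadoop
setx PATH "%PATH%;c:\hadoop\bin"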
Create the c:\tmp\hive directory and execute the following as admin (aka Run As Administrator):
winutils.exe chmod -R 777 \tmp\hive
PROTIP Read Problems running Hadoop on Windows for the Apache Hadoop project's official answer.
The message below is harmless and you can safely disregard it.
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform

No folders in Hadoop 2.6 after installing

I am new to Hadoop. I successfully installed Hadoop 2.6 on my Ubuntu 12.04 by following the link below.
Hadoop 2.6 Installation
All services are running, but when I try to load a file from local to HDFS, it does not show any folders in HDFS such as /user or /data.
hduse#vijee-Lenovo-IdeaPad-S510p:~$ jps
4163 SecondaryNameNode
4374 ResourceManager
3783 DataNode
3447 NameNode
5048 RunJar
18538 Jps
4717 NodeManager
hduse#vijee-Lenovo-IdeaPad-S510p:~$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar
hduse#vijee-Lenovo-IdeaPad-S510p:~$ hadoop fs -ls hdfs:/
No output
If I run the above command, hadoop fs -ls hdfs:/, it does not show any folders. I installed Pig as well and now I want to load data into Pig in mapreduce mode. Most websites just mention a URI in place of the HDFS path. Please guide me on how to create folders and load data into the HDFS path.
If you are using plain vanilla hadoop, you will not see any directories. You have to create those.
You can start creating by running hadoop fs -mkdir /user
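For example, for the hduse user from your jps output (the local file name below is just a placeholder):
hadoop fs -mkdir -p /user/hduse
hadoop fs -put /home/hduse/sample.txt /user/hduse/
hadoop fs -ls /user/hduse
After that, hadoop fs -ls hdfs:/ should list /user, and you can point Pig in mapreduce mode at paths like /user/hduse/sample.txt.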

hadoop fs -ls results in "no such file or directory"

I have installed and configured Hadoop 2.5.2 for a 10 node cluster. 1 is acting as masternode and other nodes as slavenodes.
I have a problem executing hadoop fs commands. The hadoop fs -ls command works fine with an HDFS URI, but it gives the message "ls: `.': No such file or directory" when used without an HDFS URI.
ubuntu#101-master:~$ hadoop fs -ls
15/01/30 17:03:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `.': No such file or directory
ubuntu#101-master:~$
Whereas, executing the same command with HDFS URI
ubuntu#101-master:~$ hadoop fs -ls hdfs://101-master:50000/
15/01/30 17:14:31 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x - ubuntu supergroup 0 2015-01-28 12:07 hdfs://101-master:50000/hvision-data
-rw-r--r-- 2 ubuntu supergroup 15512587 2015-01-28 11:50 hdfs://101-master:50000/testimage.seq
drwxr-xr-x - ubuntu supergroup 0 2015-01-30 17:03 hdfs://101-master:50000/wrodcount-in
ubuntu#101-master:~$
I am getting an exception in my MapReduce program due to this behavior. jarlib is referring to the HDFS file location, whereas I want jarlib to refer to the jar files stored on the local file system on the Hadoop nodes.
The behaviour that you are seeing is expected; let me explain what's going on when you work with hadoop fs commands.
The command's syntax is this: hadoop fs -ls [path]
By default, when you don't specify [path] for the above command, hadoop expands the path to /user/[username] in HDFS, where [username] is replaced with the Linux username of the user executing the command.
So, when you execute this command:
ubuntu#xad101-master:~$ hadoop fs -ls
the reason you are seeing the error ls: `.': No such file or directory is that hadoop is looking for the path /user/ubuntu, and it seems this path doesn't exist in HDFS.
This command:
ubuntu#101-master:~$ hadoop fs -ls hdfs://101-master:50000/
works because you have explicitly specified [path], and it is the root of HDFS. You can also do the same using this:
ubuntu#101-master:~$ hadoop fs -ls /
which automatically gets evaluated to the root of hdfs.
Hope this clears up the behaviour you are seeing when executing the hadoop fs -ls command.
Hence, if you want to specify a local file system path, use the file:/// URL scheme.
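For example (the local path is only illustrative):
hadoop fs -ls file:///home/ubuntu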
This has to do with the missing home directory for the user. Once I created the home directory under HDFS for the logged-in user, it worked like a charm.
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/{loggedin user}
hdfs dfs -ls
This method fixed my problem.
The user directory in Hadoop (in HDFS) is
/user/<your operating system user>
If you get this error message, it may be because you have not yet created your user directory within HDFS.
Use
hadoop fs -mkdir -p /user/<current OS user>
To see what your current operating system user is, use:
id -un
After that, hadoop fs -ls should start working.
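If you prefer not to type the username by hand, a one-liner like this should also work in a typical bash shell:
hadoop fs -mkdir -p /user/$(id -un)
hadoop fs -ls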
There are a couple of things at work here; based on "jarlib is referring to the HDFS file location", it sounds like you indeed have an HDFS path set as your fs.default.name, which is indeed the typical setup. So, when you type hadoop fs -ls, this is indeed trying to look inside HDFS, except it's looking in your current working directory, which should be something like hdfs://101-master:50000/user/ubuntu. The error message is unfortunately somewhat confusing since it doesn't tell you that . was interpreted to be that full path. If you run hadoop fs -mkdir /user/ubuntu, then hadoop fs -ls should start working.
This problem is unrelated to your "jarlib" problem; whenever you want to refer to files explicitly stored in the local filesystem, but where the path goes through Hadoop's Path resolution, you simply need to add file:/// to force Hadoop to refer to the local filesystem. For example:
hadoop fs -ls file:///tmp
Try passing your jar file paths as file:///path/to/your/jarfile and it should work.
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
This warning can be removed by adding the following line to your .bashrc file:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop/lib/native"
(/usr/local/hadoop is the location where Hadoop is installed.)
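Some guides also suggest exporting the native library directory alongside it (again assuming Hadoop is installed under /usr/local/hadoop); note that the warning itself is harmless either way:
export HADOOP_COMMON_LIB_NATIVE_DIR=/usr/local/hadoop/lib/native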

unable to load sample file into hadoop 2.2.0

I tried to install Hadoop 2.2.0 in pseudo-distributed mode. While trying to run copyFromLocal to copy some sample data,
I used /input as the destination path, like: bin/hadoop fs -copyFromLocal /home/prassanna/Desktop/input /input
I think it worked, and I verified the file using the command below:
bin/hadoop fs -ls /input
14/03/12 09:31:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 root supergroup 64
I also checked in the UI of the datanode, but it shows the used % as '0' only; shouldn't it show some KBs (64) for the file? Please tell me whether the input file was copied to HDFS properly, and where exactly the file is physically stored on the local machine. Please help me resolve this confusion. Thanks in advance.
If your source path is missing, then you have to check for the existence of the file on your local machine.
But if your source file is not missing, then check for the existence of the destination folder on HDFS.
For that, you can open the Hadoop HDFS web UI on port 50070 and then browse the file system.
Alternatively, you can check files through the command:
hadoop fs -ls /<path of HDFS directory >
If this works then put file with following command
hadoop fs -put <local file path> <path of HDFS directory>
If neither of these works, then your Hadoop is missing some important configuration.
If your web UI opens but the command does not run, then try it like this:
hadoop fs -ls hdfs://<hadoop Master ip>/<path of HDFS directory >
If this works, run the put command as below:
hadoop fs -put <local file path> hdfs://<hadoop Master ip>/<path of HDFS directory >

Hadoop 2.2 Add new Datanode to an existing hadoop installation

I first installed hadoop 2.2 on my machine (called Abhishek-PC) and everything worked fine. I am able to run the entire system successfully (both namenode and datanode).
Now I have created a VM called hdclient1 and I want to add it as a data node.
Here are the steps which I have followed
I setup SSH successfully and I can ssh into hdclient1 without a password and I can login from hdclient1 into my main machine without a password.
I setup hadoop 2.2 on this VM and I modified the configuration files as per many tutorials on the web. Here are my configuration files
Name Node configuration
https://drive.google.com/file/d/0B0dV2NMSGYPXdEM1WmRqVG5uYlU/edit?usp=sharing
Data Node configuration
https://drive.google.com/file/d/0B0dV2NMSGYPXRnh3YUo1X2Frams/edit?usp=sharing
Now when I run start-dfs.sh on my first machine, I can see that DataNode starts successfully on hdclient1. Here is a screenshot from my hadoop console.
https://drive.google.com/file/d/0B0dV2NMSGYPXOEJ3UV9SV1d5bjQ/edit?usp=sharing
As you can see, both machines appear in my cluster (main machine and data node).
Although both are called "localhost" for some strange reason.
I can see that the logs are being created on hdclient1; in those logs there are no exceptions.
Here are the logs from the name node
https://drive.google.com/file/d/0B0dV2NMSGYPXM0dZTWVRUWlGaDg/edit?usp=sharing
Here are the logs from the data node
https://drive.google.com/file/d/0B0dV2NMSGYPXNV9wVmZEcUtKVXc/edit?usp=sharing
I can log in to the namenode UI successfully at http://Abhishek-PC:50070,
but there, under live nodes, the UI says only 1 live node and there is no mention of hdclient1.
https://drive.google.com/file/d/0B0dV2NMSGYPXZmMwM09YQlI4RzQ/edit?usp=sharing
I can create a directory in HDFS successfully: hadoop fs -mkdir /small
From the datanode I can see that this directory has been created by using the command hadoop fs -ls /
Now when I try to add a file to my HDFS with
hadoop fs -copyFromLocal ~/Downloads/book/war_and_peace.txt /small
I get an error message:
abhishek#Abhishek-PC:~$ hadoop fs -copyFromLocal ~/Downloads/book/war_and_peace.txt /small
14/01/04 20:07:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/04 20:07:41 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /small/war_and_peace.txt.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
So my question is: what am I doing wrong here? Why do I get this exception when I try to copy the file into HDFS?
We have a 3-node cluster (all physical boxes) that's been working great for a couple of months. This article helped me the most with the setup.
