How to run a Spark Streaming application on Windows 10?

I am running a Spark Streaming application on MS Windows 10 64-bit that stores data in MongoDB using the spark-mongo-connector.
Whenever I run the Spark application, or even just pyspark, I get the following exception:
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
Full stack trace:
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 32 more
I use Hadoop 3.0.0-alpha1, which I installed locally myself, with the HADOOP_HOME environment variable pointing to the Hadoop directory and %HADOOP_HOME%\bin in the PATH environment variable.
So I tried to do the following:
> hdfs dfs -ls /tmp
Found 1 items
drw-rw-rw- - 0 2016-12-26 16:08 /tmp/hive
I tried to change the permissions as follows:
hdfs dfs -chmod 777 /tmp/hive
but this command outputs:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I seem to be missing Hadoop's native library for my OS, and after looking it up it also appears that I need to recompile libhadoop.so.1.0.0 for a 64-bit platform.
Where can I find the native library for Windows 10 64-bit?
Or is there another way of solving this, apart from the library?

First of all, you don't have to install Hadoop to use Spark, including the Spark Streaming module, with or without MongoDB.
Since you're on Windows, there is a known issue due to NTFS's POSIX incompatibility, so you have to have winutils.exe in PATH, because Spark uses the Hadoop JARs under the covers (for file system access). You can download winutils.exe from https://github.com/steveloughran/winutils. Download the one from hadoop-2.7.1 if you don't know which version you need (but it should really reflect the version of Hadoop your Spark Streaming was built with, e.g. Hadoop 2.7.x for Spark 2.0.2).
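For example, assuming you unpacked winutils.exe into c:\hadoop\bin (the path here is just an illustration), you would point HADOOP_HOME at that folder and put its bin directory on PATH in the same Command Prompt you launch Spark from:
rem assumes winutils.exe was saved to c:\hadoop\bin (illustrative path)
set HADOOP_HOME=c:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%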
Create the c:\tmp\hive directory and execute the following as admin (aka Run As Administrator):
winutils.exe chmod -R 777 \tmp\hive
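If the command succeeded, a quick check with winutils' own ls command should now report the directory as world-writable (something like drwxrwxrwx):
winutils.exe ls \tmp\hive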
PROTIP Read Problems running Hadoop on Windows for the Apache Hadoop project's official answer.
The message below is harmless and you can safely disregard it.
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform

Related

Unable to create a directory on hdfs on mac os

I am getting the below error message when I try to create a directory on HDFS.
I installed all the required software (ssh, Java) and set all the environment variables.
Not really sure where I am going wrong.
Could anyone share your thoughts on this? Thanks.
Command used:
bin/hdfs dfs -mkdir /Users/ravitejavutukuri/input
Error:
18/06/30 22:56:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mkdir: `/Users/ravitejavutukuri/input': No such file or directory
Currently I installed Hadoop 2.9.1 and I'm trying to experiment with pseudo-distributed-mode.
Try this command. It will create all the directories in the path.
bin/hdfs dfs -mkdir -p /Users/ravitejavutukuri/input
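If it succeeds, a quick listing of the parent directory (same path as in the question) should show the new folder:
bin/hdfs dfs -ls /Users/ravitejavutukuri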
HDFS has no /Users directory (it's not a Mac-equivalent structure).
Did you mean /user?
The correct way to make a user directory for yourself would be
hdfs dfs -mkdir -p /user/$(whoami)/
hdfs dfs -chmod -R 750 /user/$(whoami)/
Then, to make an input directory, note that not giving an absolute path automatically uses your HDFS user folder:
hdfs dfs -mkdir input/
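Because the relative path resolves against your HDFS home folder, a plain listing with no path argument should then show input under /user/<your-username>:
hdfs dfs -ls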

Spark installed but no command 'hdfs' or 'hadoop' found

I am a new pyspark user.
I just downloaded and installed a Spark cluster ("spark-2.0.2-bin-hadoop2.7.tgz").
After installation I wanted to access the file system (upload local files to the cluster). But when I typed hadoop or hdfs at the command line, it said "no command found".
Do I have to install Hadoop/HDFS (I thought it was built into Spark; I don't get it)?
Thanks in advance.
You have to install Hadoop first to access HDFS.
Follow this: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Choose the latest version of Hadoop from the Apache site.
Once you are done with the Hadoop setup, download Spark from http://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.7.tgz and extract the files. Set up JAVA_HOME and HADOOP_HOME in spark-env.sh.
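For reference, a minimal spark-env.sh addition might look like the lines below; the install locations are only examples and depend on where Java and Hadoop ended up on your machine:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # example path
export HADOOP_HOME=/usr/local/hadoop                  # example path
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop        # optional, helps Spark find the HDFS config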
You don't have hdfs or hadoop on your PATH, which is why you are getting the message "no command found".
If you run \yourpath\hadoop-2.7.1\bin\hdfs dfs -ls / it should work and show the root content.
But you can add your hadoop/bin (hdfs, hadoop, ...) commands to your PATH with something like this:
export PATH=$PATH:$HADOOP_HOME/bin
where HADOOP_HOME is your environment variable with the path to the Hadoop installation folder (downloading and installing Hadoop is required).
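For example, appended to ~/.bashrc (the installation path below is only illustrative):
# illustrative install path; adjust to where you extracted Hadoop
export HADOOP_HOME=/usr/local/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin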

Hadoop file system commands not found

I have installed Hadoop 2.6.0 on my laptop, which runs Ubuntu 14.04 LTS.
Below is the link I followed for Hadoop installation: https://github.com/ev2900/YouTube_Vedio/blob/master/Install%20Hadoop%202.6%20on%20Ubuntu%2014.04%20LTS/Install%20Hadoop%202.6%20--%20Virtual%20Box%20Ubuntu%2014.04%20LTS.txt
After installation, I ran two commands:
hadoop namenode -format - It works fine
hadoop fs -ls - It is giving the following error
15/11/15 16:15:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `.': No such file or directory
Please help me solve this error.
15/11/15 16:15:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable is a perpetual annoyance and not an error, so don't worry about that.
The ls: '.': No such file or directory error means that you haven't made your home directory yet, so you're trying to ls a folder that doesn't exist. Do the following (as the HDFS root user) to create your home folder. Ensure it has the correct permissions (which I guess depends on what specifically you want to do re: groups etc.):
hdfs dfs -mkdir -p /user/'your-username'
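If you run that as the HDFS superuser, you will usually also want to hand the folder over to your own account and tighten the permissions afterwards (your-username is a placeholder):
hdfs dfs -chown your-username:your-username /user/your-username   # your-username is a placeholder
hdfs dfs -chmod 750 /user/your-username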

Executing Mahout against Hadoop cluster

I have a jar file which contains the mahout jars as well as other code I wrote.
It works fine in my local machine.
I would like to run it in a cluster that has Hadoop already installed.
When I do
$HADOOP_HOME/bin/hadoop jar myjar.jar args
I get the error
Exception in thread "main" java.io.IOException: Mkdirs failed to create /some/hdfs/path (exists=false, cwd=file:local/folder/where/myjar/is)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
...
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I checked that I can access and create the directory in HDFS.
I have also run Hadoop code (without Mahout) without a problem.
I am running this on a Linux machine.
Check whether the Mahout user and the Hadoop user are the same, and also check Mahout and Hadoop version compatibility.
Regards,
Jyoti Ranjan Panda

Spring XD on Hortonworks Sandbox on OSX

I am trying to store Spring XD streams on the Hortonworks Sandbox version 2.0 using xd-singlenode and xd-shell. No xd directory is created and no stream is stored in the Hortonworks Hadoop HDFS.
Environment:
Apple OS X 10.9.3, Hortonworks Sandbox running in Oracle VirtualBox (Red Hat 64-bit), using bridged networking. In my WiFi router I assigned a fixed IP address (192.168.178.30) to the VirtualBox MAC address. When I browse with OS X Safari to 192.168.178.30:8000 I can use the Hortonworks menus such as File Browser, Pig, Beeswax (Hive), etc.
A "check for misconfiguration" in the Hortonworks menu results in:
Configuration files located in /etc/hue/conf.empty
All OK. Configuration check passed.
I used Homebrew to install Spring XD. In OS X, I changed the /usr/local/Cellar/springxd/1.0.0.M6/libexec/xd/config/servers.yml file to include:
# Hadoop properties
spring:
  hadoop:
    fsUri: hdfs://192.168.178.30:8020
and
# Zookeeper properties
# client connect string: host1:port1,host2:port2,...,hostN:portN
zk:
  client:
    connect: 192.168.178.30:2181
Within the VirtualBox VM I changed the file /etc/hadoop/conf.empty/hadoop-env.sh to include:
export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf=/dev/null"
I start Spring XD on OS X with the following commands:
./xd-singlenode --hadoopDistro hadoop22
and in a second OSX terminal:
./xd-shell --hadoopDistro hadoop22
In xd-shell I enter:
hadoop config fs --namenode hdfs://192.168.178.30:8020
A "hadoop fs ls /" command in the xd-shell results into:
Hadoop configuration changed, re-initializing shell...
2014-06-24 00:55:56.632 java[7804:5d03] Unable to load realm info from SCDynamicStore
00:55:56,672 WARN Spring Shell util.NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 6 items
drwxrwxrwt - yarn hadoop 0 2013-10-21 00:19 /app-logs
drwxr-xr-x - hdfs hdfs 0 2013-10-21 00:08 /apps
drwxr-xr-x - mapred hdfs 0 2013-10-21 00:10 /mapred
drwxr-xr-x - hdfs hdfs 0 2013-10-21 00:10 /mr-history
drwxrwxrwx - hdfs hdfs 0 2013-10-28 16:34 /tmp
drwxr-xr-x - hdfs hdfs 0 2013-10-28 16:34 /user
When I create a Spring XD stream with the command
stream create --name twvoetbal --definition "twittersearch --consumerKey='<mykey>' --consumerSecret='<mysecret>' --query='voetbal' | file" --deploy
then in OS X a /tmp/xd/output/twvoetbal.out file is created.
Spring XD seems to work, including my Twitter developer secret keys.
When I create a Spring XD stream with the command
stream create --name twvoetbal --definition "twittersearch --consumerKey='<mykey>' --consumerSecret='<mysecret>' --query='voetbal' | hdfs" --deploy
then no xd directory and no file(s) are created in Hadoop HDFS.
Questions:
How do I solve the "Unable to load realm info from SCDynamicStore" error in xd-shell?
How do I solve the "WARN Spring Shell util.NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable" error in xd-shell?
What else could I have done wrong?
The WARN messages are just noise - I always just ignore them. Did you create the xd directory on the sandbox? You need to give the user running the xd-singlenode the rights to create the needed directories.
You can ssh to the sandbox as root (password is hadoop) and run the following:
sudo -u hdfs hdfs dfs -mkdir /xd
sudo -u hdfs hdfs dfs -chmod 777 /xd
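If you want to double-check, a listing of the HDFS root should now show the /xd directory with the open permissions:
sudo -u hdfs hdfs dfs -ls /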
We have a brief writeup for using Hadoop VMs with XD:
https://github.com/spring-projects/spring-xd/wiki/Using-Hadoop-VMs-with-Spring-XD#hortonworks-sandbox
