Not able to start the Master process of Spark - Hadoop

I have set up a three-node Hadoop cluster and am trying to run Spark using Hadoop's YARN and HDFS.
I set the various environment variables such as HADOOP_HOME, HADOOP_CONF_DIR, SPARK_HOME, etc.
Now, when I try to start the Spark master process using start-master.sh, it gives me an exception; the main contents of the exception file are below:
Spark Command: /usr/local/java/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/*:/usr/local/hadoop/etc/hadoop/ -Xmx1g org.apache.spark.deploy.master.Master --host master.hadoop.cluster --port 7077 --webui-port 8080
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
Since it reports a class-not-found error, I am not able to understand how to provide this class, or which jar it can be picked up from. Does this jar come bundled with the Spark download?
Can anyone please help me fix this issue?
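A minimal diagnostic sketch, not a confirmed fix, reusing the paths from the command line above: it checks whether the SLF4J jars ship with this Spark build and, if this is a "without-hadoop" build, wires in the Hadoop-provided jars through SPARK_DIST_CLASSPATH as the Spark docs describe.

# A full Spark binary distribution normally ships slf4j under $SPARK_HOME/jars.
ls /usr/local/spark/jars | grep -i slf4j

# If this is a "without-hadoop" build, those jars are expected to come from the
# Hadoop installation instead; pointing SPARK_DIST_CLASSPATH at `hadoop classpath`
# in conf/spark-env.sh is the documented way to supply them.
echo 'export SPARK_DIST_CLASSPATH=$(hadoop classpath)' >> /usr/local/spark/conf/spark-env.sh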

Related

Running Hadoop-free Spark build on Windows 10

Currently I have Hadoop version 3.2.3 installed on Windows 10 and there is no problem running it. I also downloaded spark-3.3.0-bin-without-hadoop.tgz and set the environment variable SPARK_HOME appropriately.
I also checked hadoop classpath and it returned:
C:\hadoop-3.2.3\etc\hadoop;C:\hadoop-3.2.3\share\hadoop\common;C:\hadoop-3.2.3\share\hadoop\common\lib\*;C:\hadoop-3.2.3\share\hadoop\common\*;C:\hadoop-3.2.3\share\hadoop\hdfs;C:\hadoop-3.2.3\share\hadoop\hdfs\lib\*;C:\hadoop-3.2.3\share\hadoop\hdfs\*;C:\hadoop-3.2.3\share\hadoop\yarn;C:\hadoop-3.2.3\share\hadoop\yarn\lib\*;C:\hadoop-3.2.3\share\hadoop\yarn\*;C:\hadoop-3.2.3\share\hadoop\mapreduce\lib\*;C:\hadoop-3.2.3\share\hadoop\mapreduce\*
Seems fine.
I also added
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
in spark-env.sh,
but still, when I type the spark-shell or pyspark command on the command line, I get the famous error:
Error: A JNI error has occurred, please check your installation and try again. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
...
The path seems to be right.
Any idea what I should do now?
Thanks for your attention!
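One possible cause, offered as an assumption rather than a confirmed fix: the Windows launchers (spark-shell.cmd, pyspark.cmd) read conf\spark-env.cmd rather than the bash spark-env.sh, so the export above may never be applied. A hypothetical conf\spark-env.cmd for the Hadoop-free build could look like this:

@echo off
rem Hypothetical %SPARK_HOME%\conf\spark-env.cmd; the bash spark-env.sh is assumed
rem to be ignored by the Windows .cmd launch scripts.
rem Capture the single line printed by `hadoop classpath` into SPARK_DIST_CLASSPATH.
for /f "delims=" %%A in ('hadoop classpath') do set SPARK_DIST_CLASSPATH=%%A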

Spark 2.0.1 not finding file passed in through archives flag

I was running a Spark job that makes use of other files passed in through the --archives flag of spark-submit:
spark-submit .... --archives hdfs:///user/{USER}/{some_folder}.zip .... {file_to_run}.py
Spark is currently running on YARN, and when I tried it with Spark version 1.5.1 it was fine.
However, when I ran the same command with Spark 2.0.1, I got
ERROR yarn.ApplicationMaster: User class threw exception: java.io.IOException: Cannot run program "/home/{USER}/{some_folder}/.....": error=2, No such file or directory
Since the resource is managed by YARN, it is challenging to manually check whether the file gets successfully decompressed and exists when the job runs.
I wonder if anyone has experienced a similar issue.
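A hedged sketch of how such an archive is usually addressed on YARN; the names below are the placeholders from the question, not verified values. YARN unpacks each --archives entry into the container's working directory, and appending #alias fixes the directory name, so the job should read the contents relative to the working directory rather than from an absolute /home/{USER}/... path:

spark-submit --master yarn \
  --archives hdfs:///user/{USER}/{some_folder}.zip#some_folder \
  {file_to_run}.py
# Inside {file_to_run}.py, refer to the extracted files relatively,
# e.g. "some_folder/<file inside the zip>", not via an absolute home-directory path.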

Executing Mahout against Hadoop cluster

I have a jar file which contains the Mahout jars as well as other code I wrote.
It works fine in my local machine.
I would like to run it in a cluster that has Hadoop already installed.
When I do
$HADOOP_HOME/bin/hadoop jar myjar.jar args
I get the error
Exception in thread "main" java.io.IOException: Mkdirs failed to create /some/hdfs/path (exists=false, cwd=file:local/folder/where/myjar/is)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
...
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I checked that I can access and create the directory in HDFS.
I have also run Hadoop code (no Mahout) without a problem.
I am running this on a Linux machine.
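One hedged diagnostic, prompted by the cwd=file:... in the stack trace (an assumption, not a confirmed root cause): a fat jar that bundles Hadoop/Mahout dependencies can carry its own filesystem configuration, which makes the job resolve /some/hdfs/path against the local filesystem. Two quick checks:

# Look for Hadoop configuration files bundled inside the fat jar.
jar tf myjar.jar | grep -E '(core|hdfs|mapred)-(default|site)\.xml'

# Confirm the path is creatable with the configuration the hadoop script actually uses.
$HADOOP_HOME/bin/hadoop fs -mkdir -p /some/hdfs/path
$HADOOP_HOME/bin/hadoop fs -ls /some/hdfs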
Check that the Mahout user and the Hadoop user are the same, and also check Mahout and Hadoop version compatibility.
Regards
Jyoti ranjan panda

Hadoop runtime error

I have a school project to work with Hadoop, and it will be hosted on Amazon EMR.
To start with, I am trying to understand a simple word-count program, and it runs fine in the Eclipse IDE.
But if I try to run it from the command line, I get the error below.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at counter.WordCount.main(WordCount.java:56)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
Do you have any suggestions for this error, and any resources for understanding Hadoop and EMR?
Thanks,
myat
Don't run your job from the IDE or with the java command. Instead, use the hadoop script in the bin/ directory of the Hadoop installation.
Example: if your job's entry point is the mrjob.MyJob class and you have a jar (job.jar) containing your job class, you should run it like this:
path/to/bin/hadoop jar job.jar mrjob.MyJob inputFolder outputFolder
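If launching with plain java is unavoidable, the Hadoop client jars have to be added to the classpath explicitly; a minimal sketch reusing the counter.WordCount class from the question, with hypothetical jar and folder names:

# `hadoop classpath` prints the jars of the local Hadoop installation, which lets
# the plain java launcher resolve org.apache.hadoop.conf.Configuration.
java -cp "job.jar:$(hadoop classpath)" counter.WordCount inputFolder outputFolder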

Can't run Pig with a single-node Hadoop server

I have set up a VM with Ubuntu. It runs Hadoop as a single node. Later I installed Apache Pig on it. Apache Pig runs great in local mode, but it always fails with ERROR 2999: Unexpected internal error. Failed to create DataStorage
I am missing something very obvious. Can someone help me get this running, please?
More details:
1. I assume that Hadoop is running fine because I can run MapReduce jobs in Python.
2. pig -x local runs as I expect.
3. When I just type pig, it gives me the following error:
Error before Pig is launched
----------------------------
ERROR 2999: Unexpected internal error. Failed to create DataStorage
java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
at org.apache.pig.PigServer.<init>(PigServer.java:226)
at org.apache.pig.PigServer.<init>(PigServer.java:215)
at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
at org.apache.pig.Main.run(Main.java:452)
at org.apache.pig.Main.main(Main.java:107)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
... 9 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
================================================================================
The link helped me understand a possible cause of the failure.
Here is what fixed my problem:
1. Recompile Pig without Hadoop.
2. Update PIG_CLASSPATH to include all the jars from $HADOOP_HOME/lib.
3. Run pig.
Thanks.
Set your PIG_CLASSPATH to point to your correct HADOOP_HOME installation so that Pig can pick up your cluster information from core-site.xml, mapred-site.xml and hdfs-site.xml; it's best to follow the link for a correct installation.
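A minimal sketch of what the two answers above describe; the ant target, install path and conf directory are assumptions that depend on the Pig and Hadoop versions in use:

# Rebuild Pig without its bundled Hadoop jars (target name assumed for older Pig releases).
cd $PIG_HOME && ant clean jar-withouthadoop

# Point Pig at the running cluster's configuration and jars so it picks up
# core-site.xml, hdfs-site.xml and mapred-site.xml.
export HADOOP_HOME=/usr/local/hadoop            # hypothetical install path
export PIG_CLASSPATH=$HADOOP_HOME/conf:$HADOOP_HOME/lib/*

# Pig should now try to connect to the NameNode instead of failing with ERROR 2999.
pig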
Just install Cygwin, then add the Cygwin path to the Path Environment Variable:
For details see here.
