running hadoop-free spark build on windows 10 - windows

currently i installed Hadoop ver 3.2.3 on windows 10 and there is no problem running it. i also downloaded spark-3.3.0-bin-without-hadoop.tgz and set Enviornment variable SPARK_HOME appropriately.
i also check hadoop classpath and it returned:
C:\hadoop-3.2.3\etc\hadoop;C:\hadoop-3.2.3\share\hadoop\common;C:\hadoop-3.2.3\share\hadoop\common\lib\*;C:\hadoop-3.2.3\share\hadoop\common\*;C:\hadoop-3.2.3\share\hadoop\hdfs;C:\hadoop-3.2.3\share\hadoop\hdfs\lib\*;C:\hadoop-3.2.3\share\hadoop\hdfs\*;C:\hadoop-3.2.3\share\hadoop\yarn;C:\hadoop-3.2.3\share\hadoop\yarn\lib\*;C:\hadoop-3.2.3\share\hadoop\yarn\*;C:\hadoop-3.2.3\share\hadoop\mapreduce\lib\*;C:\hadoop-3.2.3\share\hadoop\mapreduce\*
seem's fine.
and i added
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
in the spark-env.sh
but still when i type spark-shell or pyspark command in command line i get the famous error
Error: A JNI error has occurred, please check your installation and try again. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
...
the path seem's to be right.
any idea what should i do now?
thanks for your attention!

Related

Spark-shell error on windows "Illegal character in path at index 32"

I am trying to setup spark on my new windows laptop. I am getting below error while running spark-shell :
"
ERROR Main: Failed to initialize Spark session.
java.lang.reflect.InvocationTargetException
Caused by: java.net.URISyntaxException: Illegal character in path at index 32: spark://DESKTOP-RCMDGS4:49985/C:\classes"
I am using below s/w :
Spark 3.2.1
Java 11
Hadoop: winutils
I have set below environment variables :
HADOOP_HOME, SPARK_HOME, JAVA_HOME, PATH
This is known issue in latest spark version. Downgrade to 3.0.3 could fix the issue.

NameNode: Failed to start namenode in windows 7

I am trying to install Hadoop in windows machine, in middle I got the below error.
Logs
17/11/28 16:31:48 ERROR namenode.NameNode: Failed to start namenode.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:996)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyze
Storage(Storage.java:490)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:369)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:225)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:978)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:685)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:819)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:803)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1500)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1566)
Looks like you didn't install Hadoop winutils or build Hadoop with Native Libraries
Native IO is mandatory on Windows and without it you will not be able to get your installation working. You must follow all the instructions from BUILDING.txt to ensure that Native IO support is built correctly
Hadoop2 on Windows
I also have the similar issue.
I am using Hadoop-2.8.1. These steps solved the error for me.
download the winutils of your version from GitHub
Copy paste winutils at <HADOOP_HOME>/bin/
Also. double check JAVA_HOME environment is correctly set and reference in hadoop-env.cmd file

Not able to start the Master process of Spark

I have set up a three node Hadoop cluster and trying to run Spark using Hadoop's YARN and HDFS.
I set the various environment variables like HADOOP_HOME , HADOOP_CONF_DIR, SPARK_HOME etc.
Now, when I try to run the master process of spark using start-master.sh , it is giving me some exceptions,the main contents of exception file is below:
Spark Command: /usr/local/java/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/*:/usr/local/hadoop/etc/hadoop/ - Xmx1g org.apache.spark.deploy.master.Master --host master.hadoop.cluster --port 7077 --webui-port 8080
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
As it says ClassNotFound exception,I am not able to understand how to provide this class, and which Jar to use from which it can pick the class file. Does this jar come as bundled with the Spark download?
Can anyone please help in fixing this issue.

Hadoop run time error

I have school project to work with hadoop and that will be hosted in amazon EMR.
At first, I'm trying to understand with simple wordcount program and it is running fine at eclipse IDE.
But if I tried to run from command line I'm getting below error.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at counter.WordCount.main(WordCount.java:56)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method).
Do you have any suggestion for this error and any resource to understand hadoop and EMR?
Thanks,
myat
Don't run your Job from the IDE or with the java command. Instead use the hadoop script in the bin/ directory of the hadoop installation.
Example: if your Job's starting point is in the mrjob.MyJob class and you have a jar (job.jar) containing your Job class, you should run it like this:
path/to/bin/hadoop jar job.jar mrjob.MyJob inputFolder outputFolder

cant run pig with single node hadoop server

I have setup a VM with ubuntu. It runs hadoop as a single node. Later I installed apache pig on it. apache pig runs great with local mode, but it always prom ERROR 2999: Unexpected internal error. Failed to create DataStorage
I am missing something very obvious. Can someone help me get this running please?
More details:
1. I assume that hadoop is running fine because, I could run MapReduce jobs in python.
2. pig -x local runs as i expect.
3. when i just type pig it gives me following error
Error before Pig is launched
----------------------------
ERROR 2999: Unexpected internal error. Failed to create DataStorage
java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
at org.apache.pig.PigServer.(PigServer.java:226)
at org.apache.pig.PigServer.(PigServer.java:215)
at org.apache.pig.tools.grunt.Grunt.(Grunt.java:55)
at org.apache.pig.Main.run(Main.java:452)
at org.apache.pig.Main.main(Main.java:107)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
... 9 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
================================================================================
Link helped me understand possible cause of failure.
Here is what fixed my problem.
1. Recompile pig without hadoop.
2. Update PIG_CLASSPATH to have all the jars from $HADOOP_HOME/lib
3. Run pig.
Thanks.
set your PIG_CLASSPATH to point to your correct HADOOP_HOME installation so that Pig can pick up ur cluster information from core-site.xml,mapreduce-site.xml and hdfs-site.xml,better to follow the link for correct installation.
Just install Cygwin, then add the Cygwin path to the Path Environment Variable:
For details see here.

Resources