Spark-shell error on Windows: "Illegal character in path at index 32"

I am trying to set up Spark on my new Windows laptop. I am getting the error below while running spark-shell:
"
ERROR Main: Failed to initialize Spark session.
java.lang.reflect.InvocationTargetException
Caused by: java.net.URISyntaxException: Illegal character in path at index 32: spark://DESKTOP-RCMDGS4:49985/C:\classes"
I am using the following software:
Spark 3.2.1
Java 11
Hadoop: winutils
I have set the following environment variables:
HADOOP_HOME, SPARK_HOME, JAVA_HOME, PATH

This is a known issue in recent Spark versions; downgrading to Spark 3.0.3 could fix it.
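For reference, a minimal sketch of the environment setup described above, in cmd syntax for the current session (all paths are placeholders, not taken from the question; use setx or the System Properties dialog to make them permanent):

:: Placeholder paths -- adjust to your layout; winutils.exe belongs in %HADOOP_HOME%\bin.
set JAVA_HOME=C:\Java\jdk-11
set HADOOP_HOME=C:\hadoop
set SPARK_HOME=C:\spark-3.0.3-bin-hadoop2.7
set PATH=%JAVA_HOME%\bin;%HADOOP_HOME%\bin;%SPARK_HOME%\bin;%PATH%
spark-shell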

Related

Running a Hadoop-free Spark build on Windows 10

I currently have Hadoop 3.2.3 installed on Windows 10 and have no problem running it. I also downloaded spark-3.3.0-bin-without-hadoop.tgz and set the SPARK_HOME environment variable appropriately.
I also checked hadoop classpath and it returned:
C:\hadoop-3.2.3\etc\hadoop;C:\hadoop-3.2.3\share\hadoop\common;C:\hadoop-3.2.3\share\hadoop\common\lib\*;C:\hadoop-3.2.3\share\hadoop\common\*;C:\hadoop-3.2.3\share\hadoop\hdfs;C:\hadoop-3.2.3\share\hadoop\hdfs\lib\*;C:\hadoop-3.2.3\share\hadoop\hdfs\*;C:\hadoop-3.2.3\share\hadoop\yarn;C:\hadoop-3.2.3\share\hadoop\yarn\lib\*;C:\hadoop-3.2.3\share\hadoop\yarn\*;C:\hadoop-3.2.3\share\hadoop\mapreduce\lib\*;C:\hadoop-3.2.3\share\hadoop\mapreduce\*
which seems fine.
I also added
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
to spark-env.sh,
but when I type the spark-shell or pyspark command on the command line I still get the famous error:
Error: A JNI error has occurred, please check your installation and try again. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
...
The path seems to be right.
Any idea what I should do now?
Thanks for your attention!
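For comparison, Spark's "Hadoop Free" build documentation gives the conf/spark-env.sh configuration in this form (a sketch; it is also worth verifying whether spark-env.sh is read at all in this setup, since on Windows the .cmd launchers load conf\spark-env.cmd instead):

# in conf/spark-env.sh
# If the 'hadoop' binary is on your PATH:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
# Or with an explicit path to the 'hadoop' binary:
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)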

Hive CLI startup throws error "Unrecognized Hadoop major version number: 1.0.4"

I am facing the issue below when starting Hive/Beeline:
Logging initialized using configuration in jar:file:/home/mine/work/apache-hive-2.3.6-bin/lib/hive-common-2.3.6.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 1.0.4
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:91)
I followed the URL below to set up Hive:
https://www.bogotobogo.com/Hadoop/BigData_hadoop_Hive_Install_On_Ubuntu_16_04.php
Previously I had Hadoop 1.2.1; now I have installed 2.7.3.
My bashrc contains:
mine@ubuntu:~$ echo $HADOOP_HOME
/home/mine/work/hadoop-2.7.3
mine@ubuntu:~$ echo $HIVE_HOME
/home/mine/work/apache-hive-2.3.6-bin
hive-env.sh contains:
export HADOOP_HOME=/home/mine/work/hadoop-2.7.3
The Derby server is started.
I do not understand where Hadoop 1.0.4 comes from. Is there a compatibility issue?
Please help me with your valuable suggestions.
Thanks in advance.
Try: export HADOOP_VERSION="2.7.3"
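A sketch of one way to apply that suggestion, using the paths from the question (whether this Hive version actually honors HADOOP_VERSION is an assumption to verify):

# Append to ~/.bashrc (paths from the question), then reload.
export HADOOP_HOME=/home/mine/work/hadoop-2.7.3
export HIVE_HOME=/home/mine/work/apache-hive-2.3.6-bin
export HADOOP_VERSION="2.7.3"
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH
source ~/.bashrc
hadoop version   # if this reports something other than 2.7.3, an older Hadoop is first on the PATH or classpath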

HBase-testing-util: All datanodes are bad error on Windows

I'm trying to use hbase-testing-util (1.2.0) in my project, but I get the following error:
An exception or error caused a run to abort: All datanodes 127.0.0.1:54655 are bad. Aborting...
I'm using IntelliJ on Windows 10, and I've correctly set up the HADOOP_HOME environment variable.
I read that the property mapred.max.split.size could help me, but I don't know which file I have to modify.

NameNode: Failed to start namenode on Windows 7

I am trying to install Hadoop on a Windows machine, and partway through I got the error below.
Logs:
17/11/28 16:31:48 ERROR namenode.NameNode: Failed to start namenode.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:996)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:490)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:369)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:225)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:978)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:685)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:819)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:803)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1500)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1566)
It looks like you didn't install the Hadoop winutils binaries or build Hadoop with native libraries.
Native IO is mandatory on Windows, and without it you will not be able to get your installation working. You must follow all the instructions in BUILDING.txt to ensure that Native IO support is built correctly.
See: Hadoop2 on Windows.
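For reference, the Windows native build that BUILDING.txt describes is driven by a Maven profile; the usual invocation is a one-liner like the following, run from a Visual Studio / Windows SDK command prompt (confirm the exact profiles against the BUILDING.txt of your Hadoop version):

mvn package -Pdist,native-win -DskipTests -Dtar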
I also had a similar issue.
I am using Hadoop 2.8.1. These steps solved the error for me:
1. Download the winutils binaries for your version from GitHub.
2. Copy winutils into <HADOOP_HOME>/bin/.
3. Double-check that the JAVA_HOME environment variable is correctly set and referenced in the hadoop-env.cmd file.
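A quick cmd sanity check for those steps (the HADOOP_HOME path is a placeholder; note, as an extra assumption worth verifying, that the NativeIO$Windows.access0 error usually also requires the native hadoop.dll from the same winutils distribution to sit next to winutils.exe):

:: Placeholder path -- adjust to your layout.
set HADOOP_HOME=C:\hadoop-2.8.1
:: Both files should exist after the winutils step:
dir %HADOOP_HOME%\bin\winutils.exe
dir %HADOOP_HOME%\bin\hadoop.dll
:: Should print a valid JDK path (avoid spaces, or use the 8.3 short form):
echo %JAVA_HOME%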

Can't run Pig with a single-node Hadoop server

I have set up a VM with Ubuntu that runs Hadoop as a single node. Later I installed Apache Pig on it. Pig runs great in local mode, but it always throws ERROR 2999: Unexpected internal error. Failed to create DataStorage.
I am probably missing something obvious. Can someone help me get this running, please?
More details:
1. I assume that Hadoop is running fine because I can run MapReduce jobs in Python.
2. pig -x local runs as I expect.
3. When I just type pig, it gives me the following error:
Error before Pig is launched
----------------------------
ERROR 2999: Unexpected internal error. Failed to create DataStorage
java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
at org.apache.pig.PigServer.<init>(PigServer.java:226)
at org.apache.pig.PigServer.<init>(PigServer.java:215)
at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
at org.apache.pig.Main.run(Main.java:452)
at org.apache.pig.Main.main(Main.java:107)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
... 9 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
================================================================================
The link helped me understand the possible cause of the failure.
Here is what fixed my problem:
1. Recompile Pig without Hadoop.
2. Update PIG_CLASSPATH to include all the jars from $HADOOP_HOME/lib.
3. Run pig.
Thanks.
Set your PIG_CLASSPATH to point at your correct HADOOP_HOME installation so that Pig can pick up your cluster information from core-site.xml, mapred-site.xml, and hdfs-site.xml; it is better to follow the link for a correct installation.
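A sketch of the classpath setup both answers describe (paths are placeholders; on Hadoop 1.x, as in this question, the config directory is $HADOOP_HOME/conf, while on 2.x it is $HADOOP_HOME/etc/hadoop):

# Placeholder paths -- point these at your actual installation.
export HADOOP_HOME=/usr/local/hadoop
# Config dir with core-site.xml, mapred-site.xml, and hdfs-site.xml,
# plus the Hadoop jars that a Pig built without Hadoop needs at runtime:
export PIG_CLASSPATH="$HADOOP_HOME/conf:$HADOOP_HOME/lib/*"
pig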
Just install Cygwin, then add the Cygwin path to the Path environment variable.
For details, see here.
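For the current cmd session, appending Cygwin's bin directory looks like this (the install path is a placeholder; use setx or the System Properties dialog to make the change permanent):

set PATH=%PATH%;C:\cygwin64\bin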
