cant run pig with single node hadoop server - hadoop

I have setup a VM with ubuntu. It runs hadoop as a single node. Later I installed apache pig on it. apache pig runs great with local mode, but it always prom ERROR 2999: Unexpected internal error. Failed to create DataStorage
I am missing something very obvious. Can someone help me get this running please?
More details:
1. I assume that hadoop is running fine because, I could run MapReduce jobs in python.
2. pig -x local runs as i expect.
3. when i just type pig it gives me following error
Error before Pig is launched
----------------------------
ERROR 2999: Unexpected internal error. Failed to create DataStorage
java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
at org.apache.pig.PigServer.(PigServer.java:226)
at org.apache.pig.PigServer.(PigServer.java:215)
at org.apache.pig.tools.grunt.Grunt.(Grunt.java:55)
at org.apache.pig.Main.run(Main.java:452)
at org.apache.pig.Main.main(Main.java:107)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
... 9 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
================================================================================

Link helped me understand possible cause of failure.
Here is what fixed my problem.
1. Recompile pig without hadoop.
2. Update PIG_CLASSPATH to have all the jars from $HADOOP_HOME/lib
3. Run pig.
Thanks.

set your PIG_CLASSPATH to point to your correct HADOOP_HOME installation so that Pig can pick up ur cluster information from core-site.xml,mapreduce-site.xml and hdfs-site.xml,better to follow the link for correct installation.

Just install Cygwin, then add the Cygwin path to the Path Environment Variable:
For details see here.

Related

running hadoop-free spark build on windows 10

currently i installed Hadoop ver 3.2.3 on windows 10 and there is no problem running it. i also downloaded spark-3.3.0-bin-without-hadoop.tgz and set Enviornment variable SPARK_HOME appropriately.
i also check hadoop classpath and it returned:
C:\hadoop-3.2.3\etc\hadoop;C:\hadoop-3.2.3\share\hadoop\common;C:\hadoop-3.2.3\share\hadoop\common\lib\*;C:\hadoop-3.2.3\share\hadoop\common\*;C:\hadoop-3.2.3\share\hadoop\hdfs;C:\hadoop-3.2.3\share\hadoop\hdfs\lib\*;C:\hadoop-3.2.3\share\hadoop\hdfs\*;C:\hadoop-3.2.3\share\hadoop\yarn;C:\hadoop-3.2.3\share\hadoop\yarn\lib\*;C:\hadoop-3.2.3\share\hadoop\yarn\*;C:\hadoop-3.2.3\share\hadoop\mapreduce\lib\*;C:\hadoop-3.2.3\share\hadoop\mapreduce\*
seem's fine.
and i added
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
in the spark-env.sh
but still when i type spark-shell or pyspark command in command line i get the famous error
Error: A JNI error has occurred, please check your installation and try again. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
...
the path seem's to be right.
any idea what should i do now?
thanks for your attention!

Installing splice machine on HDP 2.6.5

We tried almost all the guides that we can find online to splice machine as a ambari service.
But everytime we run the sqlshell.sh, it just says there is no server running and unable to connect to port 1527 on localhost.
We have a simple HDP sandbox version 2.6.5 and a three node 2.6.5 cluster. We are trying to install version 2.87 of splicemachine.
These are the guides we followed.
https://github.com/splicemachine/spliceengine/blob/branch-2.8/platforms/hdp2.6.5/docs/HDP-installation.md
That did not work on our three node cluster
Then we tried the sandbox with this tutorial
https://github.com/splicemachine/splice-ambari-service
Again the same result.
Please let us know if there is anything that we have missed in the guide/ or are there any extra steps.
Can you confirm you are running sqlshell.sh on a server that is running a region server? I would look at the region server log files. You want to look for 'Ready to accept JDBC connections on 0.0.0.0:1527'. That is the indicator that a region server is up and running. If you do not see that, then can you look in the log file and see if there are any error messages?
Found out the answer - Apart from what is mentioned in the doc here - https://github.com/splicemachine/spliceengine/blob/branch-2.8/platforms/hdp2.6.5/docs/HDP-installation.md.
There were multiple NoClassDefFoundError errors being thrown in the hbase master logs for classes found in spark2 jars.
One of them was
master.HMaster: The coprocessor com.splicemachine.hbase.SpliceMasterObserver threw java.io.IOException: Unexpected exception
java.io.IOException: Unexpected exception
at com.splicemachine.si.data.hbase.coprocessor.CoprocessorUtils.getIOException(CoprocessorUtils.java:30)
at com.splicemachine.hbase.SpliceMasterObserver.start(SpliceMasterObserver.java:111)
at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$Environment.startup(CoprocessorHost.java:415)
at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.loadInstance(CoprocessorHost.java:256)
at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.loadSystemCoprocessors(CoprocessorHost.java:159)
at org.apache.hadoop.hbase.master.MasterCoprocessorHost.<init>(MasterCoprocessorHost.java:93)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:774)
at org.apache.hadoop.hbase.master.HMaster.access$900(HMaster.java:225)
at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:2038)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/spark_project/guava/util/concurrent/ThreadFactoryBuilder
at com.splicemachine.timestamp.impl.TimestampClient.<init>(TimestampClient.java:108)
at com.splicemachine.timestamp.hbase.ZkTimestampSource.initialize(ZkTimestampSource.java:62)
at com.splicemachine.timestamp.hbase.ZkTimestampSource.<init>(ZkTimestampSource.java:48)
at com.splicemachine.si.data.hbase.coprocessor.HBaseSIEnvironment.<init>(HBaseSIEnvironment.java:146)
at com.splicemachine.si.data.hbase.coprocessor.HBaseSIEnvironment.loadEnvironment(HBaseSIEnvironment.java:100)
at com.splicemachine.hbase.SpliceMasterObserver.start(SpliceMasterObserver.java:81)
... 8 more
Caused by: java.lang.ClassNotFoundException: org.spark_project.guava.util.concurrent.ThreadFactoryBuilder
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
We found out that this class was in the spark-network-common jar and symlinked it to one of the folders that was part of HBASE_CLASSPATH. But then it threw another classdef not found error for a different class.
We ended up symlinking all the jars in spark2/jars folder to one of the folders that was part of HBASE_CLASSPATH.
After this the hbase master came up successfully and started the splice db process and we were able to connect with the sqlshell.sh
Nots: Make sure to do this symlinking on all nodes that have a regionserver.
What helped us in figuring this out were these two java files in splice documentation
https://github.com/splicemachine/spliceengine/blob/master/hbase_sql/src/main/java/com/splicemachine/hbase/SpliceMasterObserver.java
and
https://github.com/splicemachine/spliceengine/blob/ee61cadf17c97a0c5d866e2b764142f8e55311a5/splice_timestamp_api/src/main/java/com/splicemachine/timestamp/impl/TimestampServer.java

NameNode: Failed to start namenode in windows 7

I am trying to install Hadoop in windows machine, in middle I got the below error.
Logs
17/11/28 16:31:48 ERROR namenode.NameNode: Failed to start namenode.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:996)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyze
Storage(Storage.java:490)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:369)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:225)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:978)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:685)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:819)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:803)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1500)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1566)
Looks like you didn't install Hadoop winutils or build Hadoop with Native Libraries
Native IO is mandatory on Windows and without it you will not be able to get your installation working. You must follow all the instructions from BUILDING.txt to ensure that Native IO support is built correctly
Hadoop2 on Windows
I also have the similar issue.
I am using Hadoop-2.8.1. These steps solved the error for me.
download the winutils of your version from GitHub
Copy paste winutils at <HADOOP_HOME>/bin/
Also. double check JAVA_HOME environment is correctly set and reference in hadoop-env.cmd file

Hadoop java.io.IOException: Mkdirs failed to create /some/path, when running mapreduce job on mac osx

When I run my MR job on mac osx, I face on the following exception:
Exception in thread "main" java.io.IOException: Mkdirs failed to create /var/folders/9m/w_vzzmtx0rq0tt9whf_r4yhr0000gn/T/hadoop-unjar7688811202881231043/META-INF/license
at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:128)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:104)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:81)
at org.apache.hadoop.util.RunJar.run(RunJar.java:209)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
According other post, people gave alternative way to remove META-INF/LICENSE from jar file. I feel it seems temporary solution.
I think it will resolve if path trying to store tmp files below:
/var/folders/9m/.../META-INF/license
I checked permission and tried to change "hadoop.tmp.dir" value in core-site.xml, but it doesn't work for me.
PS. I know the issue is caused case-insensitive property for osx. Then, I am working with directory mounted disk image, which is case sensitive.
Thanks in advance!

Pig, Oozie, and HBase - java.io.IOException: No FileSystem for scheme: hbase

My Pig script works fine on its own, until I put it in an Oozie workflow, where I receive the following error:
ERROR 2043: Unexpected error during execution.
org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
...
Caused by: java.io.IOException: No FileSystem for scheme: hbase
I registered the HBase and Zookeeper jars successfully, but received the same error.
I also attempted to set the Zookeeper Quorum by adding variation of these lines in the Pig script:
SET hbase.zookeeper.quorum 'vm-myhost-001,vm-myhost-002,vm-myhost-003'
Some searching on the internet instructed me to add this to the beginning of my workflow.xml:
SET mapreduce.fileoutputcommitter.marksuccessfuljobs false
This solved the problem. I was even able to remove the registration of the HBase and Zookeeper jars and the Zookeeper quorum.
Now after double checking, I noticed that my jobs actually do their job: they store the results in HBase as expected. But, Oozie claims that a failure occurred, when it didn't.
I don't think that setting the mapreduce.fileoutputcommitter.marksuccessfuljobs to false constitutes a solution.
Are there any other solutions?
It seems that there is currently no real solution for this.
However, this answer to a different question seems to indicate that the best workaround is to create the success flag 'manually'.

Resources