Running pysparkling-water using Livy spark failed - h2o

I have been able to run the ChicagoCrimeDemo.py script successfully using spark-submit (spark-submit --master=yarn-client --py-files /opt/sparkling-water-1.6.10/py/build/dist/h2o_pysparkling_1.6-1.6.10-py2.7.egg /opt/sparkling-water-1.6.10/py/examples/scripts/ChicagoCrimeDemo.py).
However, when I try to execute the same script through Livy (Spark), I get the following error:
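For comparison, the spark-submit invocation above maps roughly onto a Livy batch request. The following is only a minimal sketch, assuming Livy's POST /batches endpoint with its `file` and `pyFiles` request fields; the helper name `build_livy_batch` is hypothetical, and the request body is built but not sent. Whether Livy can read these local /opt paths depends on the server's configuration, which is not shown in the question.

```python
import json

# Paths taken from the spark-submit command in the question.
EGG = "/opt/sparkling-water-1.6.10/py/build/dist/h2o_pysparkling_1.6-1.6.10-py2.7.egg"
SCRIPT = "/opt/sparkling-water-1.6.10/py/examples/scripts/ChicagoCrimeDemo.py"


def build_livy_batch(script, py_files):
    """Build the JSON body for a Livy batch (hypothetical helper).

    `file` plays the role of the script passed to spark-submit and
    `pyFiles` plays the role of --py-files.
    """
    return {"file": script, "pyFiles": list(py_files)}


payload = build_livy_batch(SCRIPT, [EGG])
body = json.dumps(payload)
# To submit, POST `body` with Content-Type: application/json to
# http://<livy-host>:8998/batches (host and port are assumptions).
```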

Related

Error in : hadoop jar tutorial_classes_com.jar WordCount /WordCountApp/input /WordCountApp/output is showing error in mapreduce.xml in one system

When I run this command it shows an error on one system, but on another system it works. Why is that? I'm using hadoop-3.3.2 from an lubuntu terminal.
The job is not being submitted. It reports an error in the mapred.xml configuration, but everything looks correct, and the same setup works on the other system. What should I do?
hadoop jar tutorial_classes_com.jar WordCount /WordCountApp/input /WordCountApp/output

Spark 2.0.1 not finding file passed in through archives flag

I was running a Spark job that makes use of other files passed in through the --archives flag of Spark:
spark-submit .... --archives hdfs:///user/{USER}/{some_folder}.zip .... {file_to_run}.py
Spark is running on YARN, and with Spark version 1.5.1 this worked fine.
However, when I ran the same command with Spark 2.0.1, I got:
ERROR yarn.ApplicationMaster: User class threw exception: java.io.IOException: Cannot run program "/home/{USER}/{some_folder}/.....": error=2, No such file or directory
Since the resource is managed by YARN, it is hard to check manually whether the file was successfully decompressed and exists when the job runs.
I wonder if anyone has experienced a similar issue.
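One way to see what actually got unpacked is to log it from inside the job. spark-submit documents --archives as being extracted into the working directory of each executor, so a relative lookup from the current directory may behave differently from the absolute /home/{USER}/... path in the error. The sketch below is only a diagnostic under that assumption; the helper name `describe_archive` is hypothetical, and the directory name to pass in depends on how the archive was named or aliased.

```python
import os


def describe_archive(name, base_dir=None):
    """Report whether `name` exists as a directory under base_dir.

    Returns (exists, sorted_entries). base_dir defaults to the current
    working directory of the process calling this function.
    """
    base_dir = base_dir or os.getcwd()
    path = os.path.join(base_dir, name)
    if not os.path.isdir(path):
        return False, []
    return True, sorted(os.listdir(path))
```

Calling `describe_archive("{some_folder}.zip")` at the start of the job and printing the result puts the answer in the YARN container logs, without needing shell access to the node.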

Can't seem to build hive for spark

I have been trying to run this code in pyspark.
sqlContext = HiveContext(sc)
datumDF = sqlContext.createDataFrame(datumX, schema)
But I have been receiving this error:
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o44))
I log in to AWS and spin up clusters with this command: /User/Downloads/spark-1.5.2-bin-hadoop2.6/ec2/spark-ec2 -k name -i /User/Desktop/pemfile.pem login clustername
However, all the docs I've found involve the commands below, which exist in the folder /users/downloads/spark-1.5.2/.
I've run them anyway, and then tried logging into AWS using the ec2 command in that folder. Still, I just got the same error.
I run export SPARK_HIVE=TRUE before running these commands on my local machine, but I've seen messages saying it's deprecated and will be ignored anyway.
Build Hive with Maven:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
Build Hive with sbt:
build/sbt -Pyarn -Phadoop-2.3 assembly
And another I found:
./sbt/sbt -Phive assembly
I also took the hive-site.xml file and put it in both the /Users/Downloads/spark-1.5.2-bin-hadoop2.6/conf folder and the /Users/Downloads/spark-1.5.2/conf folder.
Still no luck.
I can't seem to run the Hive commands no matter what I build it with or how I log in. Is there anything obvious I'm missing?
I had the same error when using a HiveContext on an EC2 cluster built with the ec2 scripts that come with the Spark package (v1.5.2 in my case). Through much trial and error, I found that building an EC2 cluster with the following options got the right version of Hadoop with Hive properly built, so that I can use a HiveContext in my PySpark jobs:
spark-ec2 -k <your key pair name> -i /path/to/identity-file.pem -r us-west-2 -s 2 --instance-type m3.medium --spark-version 1.5.2 --hadoop-major-version yarn launch <your cluster name>
The key is to set --spark-version to 1.5.2 and --hadoop-major-version to yarn, even though you aren't going to use YARN to submit jobs, because this forces the Hadoop build to be 2.4. Of course, adjust the other parameters as appropriate for your desired cluster.

Hadoop mkdirs fails during execution of a jar file

I am a complete beginner with Hadoop. I built a jar and tried to execute it with the command below, but I got this error: Mkdirs failed to create D:...\META-INF\license
I checked all permissions and gave full access, but that did not help.
command: hadoop jar wiki-stats.jar example/data/stats.txt example/results/
Thanks in advance
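A commonly reported cause of this message is a jar that contains both a file and a directory whose names differ only by case (for example META-INF/LICENSE and META-INF/license/), which cannot both be unpacked on a case-insensitive filesystem. Whether that applies to wiki-stats.jar is an assumption; the sketch below only checks for such collisions, and the helper names `case_collisions` and `jar_case_collisions` are hypothetical.

```python
import zipfile
from collections import defaultdict


def case_collisions(names):
    """Find paths in an archive listing that differ only by letter case.

    Every path prefix is considered, so a file META-INF/LICENSE is
    matched against a directory implied by META-INF/license/x.txt.
    """
    groups = defaultdict(set)
    for name in names:
        parts = name.strip("/").split("/")
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            groups[prefix.lower()].add(prefix)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def jar_case_collisions(jar_path):
    """Run the same check on the entries of a jar (zip) file."""
    with zipfile.ZipFile(jar_path) as jar:
        return case_collisions(jar.namelist())
```

Running `jar_case_collisions("wiki-stats.jar")` returns an empty list when no such collision exists, in which case the cause lies elsewhere.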

Error: Failed to create Data Storage while running embedded pig in java

I wrote a simple program to test embedded Pig in Java, running in mapreduce mode.
The Hadoop version on the server I am using is 0.20.2-cdh3u4a, and the Pig version is 0.10.0-cdh3u4a.
When I run in local mode, it runs successfully. But when I run in mapreduce mode, it gives me an error.
I run my program using the following commands, as shown in http://pig.apache.org/docs/r0.9.1/cont.html#embed-java:
javac -cp pig.jar EmbedPigTest.java
javac -cp pig.jar:.:/etc/hadoop/conf EmbedPigTest.java input.txt
My program fails with this error:
Exception in thread "main" java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:214)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
at org.apache.pig.PigServer.<init>(PigServer.java:226)
at org.apache.pig.PigServer.<init>(PigServer.java:215)
at org.apache.pig.PigServer.<init>(PigServer.java:211)
at org.apache.pig.PigServer.<init>(PigServer.java:207)
at WordCount.main(EmbedPigTest.java:9)
Some online resources say that this problem is caused by mismatched Hadoop versions, but I don't understand what I should do about it. Suggestions, please!
This is happening because you are linking to the wrong jar. Please see the link below; it describes this issue very well.
http://localsteve.wordpress.com/2012/09/30/embedding-pig-for-cdh4-java-apps-fer-realz/
I faced the same kind of issue when I tried to use Pig in mapreduce mode without starting the Hadoop services.
Please check that all services are running, using jps, before using Pig in mapreduce mode.
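The jps check can be scripted. This is a minimal sketch, assuming jps prints one "pid name" pair per line; the helper name `missing_daemons`, the list of required daemons (chosen here for a classic MapReduce setup), and the sample output are all illustrative assumptions, not taken from the question.

```python
def missing_daemons(jps_output, required):
    """Return the required daemon names absent from jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Skip blank lines and lines that carry only a pid.
        if len(parts) >= 2:
            running.add(parts[1])
    return sorted(set(required) - running)


# Assumed daemon names for a classic (MRv1) cluster; adjust as needed.
REQUIRED = ["NameNode", "DataNode", "JobTracker", "TaskTracker"]

# Made-up sample output for illustration only.
sample = """\
4821 NameNode
4950 DataNode
5210 Jps
"""

print(missing_daemons(sample, REQUIRED))  # → ['JobTracker', 'TaskTracker']
```

On a real node the input can come from `subprocess.check_output(["jps"])`, decoded to text.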
