See print output from a Python script running on Spark with spark-submit

I have to test some code using Spark and I'm pretty new to it.
The code I have runs an ETL script on a cluster. The ETL script is written in Python and has several print statements in it, but I'm unable to see their output. The Python script is passed to spark-submit via the --py-files flag. I don't know whether those prints are unreachable because they happen in the YARN executors, and whether I should change them to log4j logging or add the messages to an accumulator the driver can read.
Any suggestions would help.
The final goal is to see how the execution of the code is going. I don't know if simple prints are the best solution, but they were already in the code I was given to test.
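Executor-side print output isn't actually lost: it goes to each container's stdout, which can be pulled back after the run with yarn logs -applicationId <appId>. Below is a minimal PySpark sketch of the two alternatives mentioned above (driver-side log4j logging and an accumulator); the logger name and transform function are illustrative, not from the original ETL script, and sc._jvm is a private API that may change between Spark versions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-test").getOrCreate()
sc = spark.sparkContext

# Option 1: log through Spark's own log4j from the driver, so messages
# land in the driver log with the rest of Spark's output.
# NOTE: sc._jvm is a private API; this is a common pattern, not a public one.
log4j = sc._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("etl")  # "etl" is an illustrative name
logger.info("ETL run started")

# Option 2: count events on the executors with an accumulator that the
# driver can read back once an action completes.
rows_seen = sc.accumulator(0)

def transform(record):
    rows_seen.add(1)   # updated on the executors
    return record      # placeholder for the real ETL logic

sc.parallelize(range(100)).map(transform).count()
print("rows processed:", rows_seen.value)  # runs on the driver, so visible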

Related

What does "moveToLocal: Option '-moveToLocal' is not implemented yet." mean?

I'm running an Oozie workflow with some bash scripts in a Hadoop environment (Hadoop 2.7.3), but the workflow is failing because my shell action gets an error. After saving the command's output to a log file, I found the following entry:
moveToLocal: Option '-moveToLocal' is not implemented yet.
Does my shell action fail because it treats this message as an error and aborts the entire action?
Also, does that line mean that my version of Hadoop (2.7.3) doesn't support that command?
According to the documentation for Hadoop 2.7.3:
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/FileSystemShell.html#moveToLocal
the command is not supported yet. My shell action treats it as an error and terminates the script, so I'm replacing the command with an equivalent.
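For reference, the usual equivalent is to copy the file to the local filesystem and then delete the HDFS source, since -moveToLocal is unimplemented in this release. A minimal sketch in Python (the paths are illustrative):

import subprocess

# Emulate -moveToLocal: copy the file out of HDFS, then delete the source.
SRC = "/user/me/output/part-00000"   # example HDFS path
DST = "/tmp/part-00000"              # example local path

subprocess.run(["hdfs", "dfs", "-copyToLocal", SRC, DST], check=True)
subprocess.run(["hdfs", "dfs", "-rm", SRC], check=True)

check=True makes the script fail loudly if either step returns a nonzero exit code, which mirrors how the shell action treats errors.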

Hadoop WordCount program gets stuck and doesn't finish running

I have configured Hadoop on Windows 10. When I try to run the WordCount program, it never finishes the job.
What can I do to fix it?
Note: I have made a small change to the original program; the input and output files are built into the program and don't need to be provided as arguments.

Pig job hangs on first failure

I'm encountering a problem with Pig and Oozie.
I have a Pig script that tries to read data from a non-existent table, so an exception occurs in the initialize method of RecordReader. That is expected, since the table definitely doesn't exist.
The problem starts when such a script is launched via Oozie on a multi-node Hadoop cluster: after the first attempt, the job just hangs and does nothing until any other job is submitted to the cluster.
If launched from the command line (pig -f test.pig) it doesn't hang. It also doesn't hang when launched in local mode or on a single-node cluster (via the command line or via Oozie).
I really hope someone had a problem like this and can help me.

Running MapReduce code that uses zooKeeper

I want to ask how to execute MapReduce Java code that uses ZooKeeper.
My first program just creates a znode and has each mapper modify it.
So I modified the WordCount code just to test ZooKeeper for the first time.
When I run it from the Eclipse console, everything goes well and I can see the changes to the znode's value, etc.
However, when I tried to execute it from the Linux command line:
bin/hadoop jar ./myjar.jar algo.WordCount /input.txt /out
I got the following error:
Error: java.lang.ClassNotFoundException: org.apache.zookeeper.Watcher
I added the path of the jar file using conf.set("mapred.jar","...."); in the MapReduce code, but I don't know why it did not recognize the ZooKeeper classes.
Any idea?
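The usual remedy for a task-side ClassNotFoundException is to ship the dependency jar with the job rather than only referencing it in the driver. Here is a sketch of launching the job with -libjars (which only works if the main class parses generic options via ToolRunner) and the client-side classpath set; the ZooKeeper jar path below is an assumption, not taken from the question.

import os
import subprocess

ZK_JAR = "/opt/zookeeper/zookeeper-3.4.6.jar"  # example path; adjust to your install

env = dict(os.environ)
env["HADOOP_CLASSPATH"] = ZK_JAR  # client-side classpath for job submission

subprocess.run(
    ["bin/hadoop", "jar", "./myjar.jar", "algo.WordCount",
     "-libjars", ZK_JAR,          # ships the jar to the tasks' classpath
     "/input.txt", "/out"],
    env=env,
    check=True,
)

Alternatively, bundling ZooKeeper into a single fat jar avoids the classpath question entirely.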

Running Hadoop examples halt in Pseudo-Distributed mode

Everything runs well in standalone mode. After moving to pseudo-distributed mode, HDFS works fine: I can put files into HDFS and browse them, and I also checked that there is one DataNode in the live nodes list.
However, when I run bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+', the program just halts there without producing any error. And from http://ereg.adobe.com:50070/dfsnodelist.jsp?whatNodes=LIVE I can see that nothing has ever been run on that DataNode.
I followed the configuration in the tutorial for those XML conf files. Does anyone have an idea what other mistakes I might have made? By the way, I'm running this on Mac OS X.
By halt, do you mean it hangs, or that it just silently returns? For MapReduce issues, you should check the JobTracker's web page (at port 50030) to see the status of the submitted job.
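If you want to script that check, here is a minimal sketch that fetches the classic JobTracker page; the localhost host and the jobtracker.jsp path match old (1.x-era) Hadoop UIs and are assumptions to adjust for your setup.

import urllib.request

# Fetch the JobTracker's status page (port 50030 in classic Hadoop).
with urllib.request.urlopen("http://localhost:50030/jobtracker.jsp") as resp:
    page = resp.read().decode("utf-8", errors="replace")

# Crude check for the sections that list submitted jobs.
for section in ("Running Jobs", "Failed Jobs"):
    print(section, "present:", section in page)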
