Spark hangs/fails on manually starting master node on Windows

I am trying to manually start a master node on Spark (2.1.0) on Windows 7, but the process hangs before setup completes.
$ bin\spark-class org.apache.spark.deploy.master.Master
17/05/17 14:23:52 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
It gets stuck here indefinitely (more than 10 minutes).
My Spark installation otherwise works fine: I have used pyspark to write and run scripts locally with pyspark --master local[x]. I am using winutils, as this is being run in standalone mode.
I also have 2 other machines that I wish to use as workers; these work fine when I run the same command on them (setup is near instant), and all environment variables appear to be set up the same on these workers as on my (intended) master.

For anyone else coming across this issue: I do not know the cause, but a fresh download of Spark placed in the same location resolved the problem.
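For reference, a minimal sketch of the standalone startup sequence being attempted here; spark://<master-host>:7077 is Spark's default standalone master URL, and <master-host> is a placeholder for the master machine's hostname:
REM On the master machine (Windows):
bin\spark-class org.apache.spark.deploy.master.Master
REM On each worker machine, point at the master's default port:
bin\spark-class org.apache.spark.deploy.worker.Worker spark://<master-host>:7077
Once a worker registers, it should appear in the master's web UI on port 8080.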

Related

Is there a way to load the install-interpreter.sh file in EMR in order to load 3rd party interpreters?

I have an Apache Zeppelin notebook running, and I'm trying to load the jdbc and/or postgres interpreter into my notebook in order to write to a Postgres DB from Zeppelin.
The main resource for loading new interpreters here tells me to run the command below to get other interpreters:
./bin/install-interpreter.sh --all
However, when I run this command in the EMR terminal, I find that the EMR cluster does not come with an install-interpreter.sh executable file.
What is the recommended path?
1. Should I find the install-interpreter.sh file and upload it to the EMR cluster under ./bin/?
2. Is there an EMR configuration at start time that would enable the install-interpreter.sh file?
Currently, all tutorials and documentation assume that you can run the install-interpreter.sh file.
The solution is not to run the command below from the root directory (i.e., ./):
./bin/install-interpreter.sh --all
Instead, on EMR, run the command from the Zeppelin installation directory, which on an EMR cluster is /usr/lib/zeppelin.
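A minimal sketch, assuming the default EMR layout and that only the jdbc interpreter is needed (the script supports both --name and --all; sudo may be required depending on how Zeppelin was installed):
# Run from the Zeppelin install directory on the EMR master node
cd /usr/lib/zeppelin
# Install just the jdbc interpreter (or use --all for everything)
sudo ./bin/install-interpreter.sh --name jdbc
# Restart the Zeppelin daemon afterwards so the new interpreter is picked up
After restarting, the interpreter can be configured from Zeppelin's interpreter settings page.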

Windows/Drillbit Error: Could not find or load main class org.apache.drill.exec.server.Drillbit

I have set up a Hadoop single-node cluster with pseudo-distributed operation and YARN running. I am able to use the Spark Java API to run queries as a YARN client. I wanted to go one step further and try Apache Drill on this "cluster". I installed ZooKeeper, which is running smoothly, but I am not able to start Drill, and I get this log:
nohup: ignoring input
Error: Could not find or load main class
org.apache.drill.exec.server.Drillbit
Any idea?
I am on Windows 10 with JDK 1.8.
The Drill classpath is not initialized when the Drillbit is launched this way on your machine.
To start Drill on a Windows machine, you need to run the sqlline.bat script, for example:
C:\bin\sqlline> sqlline.bat -u "jdbc:drill:zk=local;schema=dfs"
See more info: https://drill.apache.org/docs/starting-drill-on-windows/
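A minimal sketch of the fix, assuming Drill was unpacked to C:\drill (a placeholder path); sqlline.bat is run from the installation's bin directory:
REM Change into the Drill installation's bin directory
cd C:\drill\bin
REM Start Drill in embedded mode with a local ZooKeeper connection string
sqlline.bat -u "jdbc:drill:zk=local;schema=dfs"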

How can I run a Bigtable HBase shell from any directory?

I started by following these instructions to install HBase and configure it to hit my Bigtable instance. That all works fine; next, I wanted to additionally configure the installation so that I can run hbase shell from anywhere.
So I added the following to my .zshrc:
export HBASE_HOME=/path/to/my/hbase
export PATH=$HBASE_HOME:...
When I run hbase shell now, I get the following:
2017-04-28 09:58:45,069 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
NativeException: java.io.IOException: java.lang.ClassNotFoundException: com.google.cloud.bigtable.hbase1_2.BigtableConnection
initialize at /Users/mmscibor/.hbase/lib/ruby/hbase/hbase.rb:42
(root) at /Users/mmscibor/.hbase/bin/hirb.rb:131
I figured something was up with where it was looking for its .jars, and noticed that the .tar I downloaded had a lib directory, so I additionally tried:
hbase shell -cp $HBASE_HOME/lib/
But no luck. However, if I navigate to $HBASE_HOME and run hbase shell, everything works fine again.
What am I missing here?
You are probably running into the issue described here:
https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/issues/226
You need to set GOOGLE_APPLICATION_CREDENTIALS in your environment, or run gcloud auth application-default login.
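A minimal sketch of both options; /path/to/service-account-key.json is a placeholder for an actual service-account key file:
# Option 1: point the client libraries at a service-account key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
# Option 2: use your own user credentials via the gcloud CLI
gcloud auth application-default login
Since environment variables are already being managed in .zshrc here, the export line can live there next to HBASE_HOME.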

Spark History Server on Yarn only shows Python application

I have two Spark contexts running on a box, one from Python and one from Scala. They are similarly configured, yet only the Python application appears on the Spark history page pointed to by the YARN tracking URL. Is there extra configuration I am missing here? (Both run in yarn-client mode.)

H2O: unable to connect to h2o cluster through python

I have a 5-node Hadoop cluster running HDP 2.3.0. I set up an H2O cluster on YARN as described here.
On running the following command:
hadoop jar h2odriver_hdp2.2.jar water.hadoop.h2odriver -libjars ../h2o.jar -mapperXmx 512m -nodes 3 -output /user/hdfs/H2OTestClusterOutput
I get the following output:
H2O cluster (3 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...
When I try to execute the command
h2o.init(ip="10.113.57.98", port=54321)
The process remains stuck at this stage. On trying to connect to the web UI at ip:54321, the browser endlessly tries to load the H2O admin page, but nothing ever displays.
On forcefully terminating the init process, I get the following error:
No instance found at ip and port: 10.113.57.98:54321. Trying to start local jar...
However, if I use H2O from Python without setting up an H2O cluster, everything runs fine.
I executed all commands as the root user, which has permission to read and write the /user/hdfs HDFS directory.
I'm not sure whether this is a permissions error or the port is not accessible.
Any help would be greatly appreciated.
It looks like you are using H2O2 (H2O Classic). I recommend upgrading to the latest H2O (H2O 3). There is a build specifically for HDP 2.3 here: http://www.h2o.ai/download/h2o/hadoop
Running H2O 3 is a little cleaner, too:
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 6g -output hdfsOutputDirName
Also, 512 MB per node is tiny; what is your use case? I would give the nodes more memory, for example:
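A minimal sketch combining that advice with the original 3-node setup; /user/hdfs/H2OTestClusterOutput2 is illustrative, since the driver typically refuses to reuse an existing output directory, and -disown (mentioned in the driver output above) detaches the driver once the cluster forms:
# Launch a 3-node H2O 3 cluster on YARN with 6 GB per mapper
hadoop jar h2odriver.jar \
  -nodes 3 \
  -mapperXmx 6g \
  -output /user/hdfs/H2OTestClusterOutput2 \
  -disown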
