How can I run a Bigtable HBase shell from any directory? - shell

I started by following these instructions to install HBase and configure it to connect to my Bigtable instance. That all works fine; however, I next wanted to configure this installation so that I can run hbase shell from anywhere.
So I added the following to my .zshrc:
export HBASE_HOME=/path/to/my/hbase
export PATH=$HBASE_HOME:...
When I run hbase shell now I get the following:
2017-04-28 09:58:45,069 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
NativeException: java.io.IOException: java.lang.ClassNotFoundException: com.google.cloud.bigtable.hbase1_2.BigtableConnection
initialize at /Users/mmscibor/.hbase/lib/ruby/hbase/hbase.rb:42
(root) at /Users/mmscibor/.hbase/bin/hirb.rb:131
I figured something was up with where it was looking for its .jars, and noticed that the .tar I downloaded had a lib directory, so I additionally tried:
hbase shell -cp $HBASE_HOME/lib/
But no luck. However, if I navigate to $HBASE_HOME and run hbase shell everything works fine again.
What am I missing here?

You are probably running into the issue described here:
https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/issues/226
You need to set GOOGLE_APPLICATION_CREDENTIALS in your environment, or run gcloud auth application-default login.
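A minimal sketch of those two options, assuming a service-account key file at a placeholder path (adjust to your setup):

# Option 1: point the client at a service-account key file (the path is a placeholder)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

# Option 2: use your own user credentials as application default credentials
gcloud auth application-default login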

Related

Hadoop : start-dfs.sh does not work when calling directly

I have a very strange problem when starting hadoop.
When I call start-dfs.sh using the absolute path /usr/local/hadoop/etc/hadoop/sbin/start-dfs.sh, it starts without any problem.
But when I add hadoop to my environment variables:
export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
I would like to call it directly as start-dfs.sh. But when I start it like this, it throws an error:
20/10/26 16:36:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: Error: JAVA_HOME is not set and could not be found.
localhost: Error: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
20/10/26 16:36:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I wonder what the problem is. I have my Java home and core-site.xml configured properly. Why doesn't it work when I start it directly from bash?
It seems that you need to set the JAVA_HOME environment variable to where your Java package is located on your (I suppose) Linux distribution. To do that, you first have to locate the path to the Java installation.
You can use the following command in your terminal:
find /usr -name java 2> /dev/null
which will output one or more paths, depending on how many Java versions you have on your system.
Choose one of the versions (or just take the single one you have) and copy its path.
Next, to set the JAVA_HOME environment variable, take the path you got from the output above and trim the trailing /bin/java off of it, so the variable points at the root of the installation.
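If you'd rather do the trimming on the command line, a small sketch (assuming the find output was /usr/lib/jvm/java-11-openjdk-i386/bin/java, matching the example below):

# strip the trailing /bin/java to get the installation root
dirname "$(dirname /usr/lib/jvm/java-11-openjdk-i386/bin/java)"
# prints /usr/lib/jvm/java-11-openjdk-i386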
For my system I chose the third Java version listed, so I went into the .bashrc file and added these 2 lines at the bottom (notice how JAVA_HOME ends before the /bin directory, while the PATH entry ends after the /bin directory):
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-i386
export PATH=$PATH:/usr/lib/jvm/java-11-openjdk-i386/bin/
With those two lines at the bottom of the .bashrc file, start-dfs.sh works without the full path (and the same goes for the start-all.sh and stop-all.sh scripts).
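A quick way to verify the setup after editing .bashrc (a sketch; the paths come from the example above):

source ~/.bashrc            # reload the environment in the current shell
echo "$JAVA_HOME"           # should print /usr/lib/jvm/java-11-openjdk-i386
"$JAVA_HOME/bin/java" -version
start-dfs.sh                # should now resolve JAVA_HOME correctly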
Finally, it turned out the problem was that I have another Hadoop installation in /opt/module. When I call hdfs, for example, it resolves to the /opt/module one rather than the one in /usr/local.
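To see which installation a given command resolves to, a quick check along these lines can help:

type -a hdfs            # lists every hdfs found on PATH, in the order bash resolves them
type -a start-dfs.sh
echo "$PATH"            # the first matching directory wins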

Spark 2.0.1 not finding file passed in through archives flag

I was running a Spark job which makes use of other files that are passed in through the --archives flag of spark-submit:
spark-submit .... --archives hdfs:///user/{USER}/{some_folder}.zip .... {file_to_run}.py
Spark is currently running on YARN and when I tried it with spark version 1.5.1 it was fine.
However, when I ran the same commands with spark 2.0.1, I got
ERROR yarn.ApplicationMaster: User class threw exception: java.io.IOException: Cannot run program "/home/{USER}/{some_folder}/.....": error=2, No such file or directory
Since the resource is managed by YARN, it is challenging to manually check if the file gets successfully decompressed and exists when the job runs.
I wonder if anyone has experienced a similar issue.
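One way to check what actually ends up in the container is to inspect the archive locally and then look at the YARN application logs; a sketch (the application ID is a placeholder):

# confirm the archive is well-formed and see which paths it contains
unzip -l {some_folder}.zip

# after the job fails, pull the aggregated logs; the ApplicationMaster log
# usually shows which resources were localized for the containers
yarn logs -applicationId application_1234567890123_0001 | less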

Spark hangs/fails on manually starting master node on windows

I am trying to manually start a master node in Spark (2.1.0) on Windows 7, but the process hangs before it is set up.
$ bin\spark-class org.apache.spark.deploy.master.Master
17/05/17 14:23:52 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
It gets stuck here indefinitely (more than 10 mins)
My Spark installation works fine otherwise; I have used pyspark to write and run scripts locally using pyspark --master local[x]. I am using winutils as this is being run in standalone mode.
Also, I have 2 other machines that I wish to use as workers; these work fine when I run this command on them (setup is near instant), and all environment variables appear to be set up the same on these workers as on my (intended) master.
For anyone else coming across this issue: I do not know the cause, but a fresh download of Spark placed in the same location resolved the problem.

How to run Spark Streaming application on Windows 10?

I run a Spark Streaming application on MS Windows 10 64-bit that stores data in MongoDB using spark-mongo-connector.
Whenever I run the Spark application, even pyspark, I get the following exception:
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
Full stack trace:
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 32 more
I use Hadoop 3.0.0-alpha1, which I installed myself locally, with the HADOOP_HOME environment variable pointing to the Hadoop directory and %HADOOP_HOME%\bin in the PATH environment variable.
So I tried to do the following:
> hdfs dfs -ls /tmp
Found 1 items
drw-rw-rw- - 0 2016-12-26 16:08 /tmp/hive
I tried to change the permissions as follows:
hdfs dfs -chmod 777 /tmp/hive
but this command outputs:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I seem to be missing Hadoop's native library for my OS; after looking it up, it also appears that I need to recompile libhadoop.so.1.0.0 for a 64-bit platform.
Where can I find the native library for Windows 10 64-bit?
Or is there another way of solving this, apart from the library?
First of all, you don't have to install Hadoop to use Spark, including the Spark Streaming module, with or without MongoDB.
Since you're on Windows, there is the known issue with NTFS's POSIX incompatibility, so you have to have winutils.exe on PATH, because Spark uses Hadoop jars under the covers (for file system access). You can download winutils.exe from https://github.com/steveloughran/winutils. Download the one from hadoop-2.7.1 if you don't know which version you should use (but it should really reflect the version of Hadoop your Spark Streaming was built with, e.g. Hadoop 2.7.x for Spark 2.0.2).
Create the c:/tmp/hive directory and execute the following as admin (aka Run As Administrator):
winutils.exe chmod -R 777 \tmp\hive
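Put together, the steps look roughly like this in an elevated Command Prompt (a sketch, assuming winutils.exe is already on PATH as described above):

REM run these in a Command Prompt started with Run As Administrator
mkdir c:\tmp\hive
winutils.exe chmod -R 777 \tmp\hive
REM verify that the permissions took effect
winutils.exe ls \tmp\hive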
PROTIP Read Problems running Hadoop on Windows for the Apache Hadoop project's official answer.
The message below is harmless and you can safely disregard it.
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform

Spark with HDFS without YARN

I am trying to configure HDFS for Spark. Simply running spark-submit with --master spark://IP:7077 --deploy-mode cluster.... ends up with
16/04/08 10:16:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Then it stops working.
I downloaded and launched a Hadoop cluster, for testing purposes on only one machine. I also set environment variables, although I think I forgot about some of them. In fact I set:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/spark/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/
Could you help me?
I followed this guide: http://www.ccs.neu.edu/home/cbw/spark.html
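As a side note on the NativeCodeLoader warning shown above, Hadoop ships a checknative command that reports whether the native library was actually loaded; a quick sketch (assuming hadoop is on PATH as in the exports above):

# reports whether libhadoop and the compression codecs were loaded
hadoop checknative -a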
