Zeppelin can't execute Hive using shell - hadoop

I am using Zeppelin-0.5.6.
I checked the log and found the following:
However, it works well when I use the terminal.
I also tried some other Hive commands. When I used the commands below in Zeppelin I got the same error, while they worked well in the terminal.
%sh
hive -e "show databases"
Can anybody help?
Thanks.

Related

migration from Cloudera Hadoop to HDINSIGHT

Can anyone help me? I have HQL scripts that I used to run on Cloudera using hive -f scriptname.hql. Now I want to run these scripts on HDInsight (Hadoop cluster), but the hive command line is not available in HDInsight. Can someone guide me on how to do that?
beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http' -i query.hql
Does anyone have experience using the above rather than
hive -f query.hql
I don't see any other way to execute the HQL files. You can refer to this document: https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-beeline#run-a-hiveql-file
You can also use the ZooKeeper quorum to avoid failure of queries during head node failover:
beeline -u '<zookeeper quorum>' -i /path/query.hql
Create an environment variable:
export hivef="beeline -u 'jdbc:hive2://hn0-hdi-uk.witechmill.co.uk:10001/default;principal=hive/_HOST@witechmill.CO.UK;auth=kerberos;transportMode=http' -n umerrkhan "
witechmill is my cluster name
Then call the script using the command below:
$hivef scriptname.hql
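A minimal sketch of the same idea as a shell function rather than an environment variable (the connection string and username are copied from the example above; adjust them for your own cluster), using beeline's -f flag to run the script file:
hivef() {
  # hypothetical wrapper; replace host, realm and user with your cluster's values
  beeline -u 'jdbc:hive2://hn0-hdi-uk.witechmill.co.uk:10001/default;principal=hive/_HOST@witechmill.CO.UK;auth=kerberos;transportMode=http' -n umerrkhan -f "$1"
}
hivef scriptname.hql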

unable to start a job using spark-submit via ssh (on EC2)

I set up spark on a single EC2 machine and, when I am connected to it, I am able to use spark either with jupyter or spark-submit, without any issue. Unfortunately, though, I am not able to use spark-submit via ssh.
So, to recap:
This works:
ubuntu@ip-198-43-52-121:~$ spark-submit job.py
This does not work:
ssh -i file.pem ubuntu@blablablba.compute.amazon.com "spark-submit job.py"
Initially, I kept getting the following error message over and over:
'java.io.IOException: Cannot run program "python": error=2, No such file or directory'
After having read many articles and posts about this issue, I thought that the problem was due to some variables not having been set properly, so I added the following lines to the machine's .bashrc file:
export SPARK_HOME=/home/ubuntu/spark-3.0.1-bin-hadoop2.7 #(it's where i unzipped the spark file)
export PATH=$SPARK_HOME/bin:$PATH
export PYTHONPATH=/usr/bin/python3
export PYSPARK_PYTHON=python3
(As the error message referenced python, I also tried adding the line "alias python=python3" to .bashrc, but nothing changed)
After all this, if I try to submit the spark job via ssh I get the following error message:
"command spark-submit not found".
As it looks like the system ignores all the environment variables when sending commands via SSH, I decided to source the machine's .bashrc file before trying to run the spark job. As I was not sure about the most appropriate way to send multiple commands via SSH, I tried all the following ways:
ssh -i file.pem ubuntu#blabla.compute.amazon.com "source .bashrc; spark-submit job.file"
ssh -i file.pem ubuntu#blabla.compute.amazon.com << HERE
source .bashrc
spark-submit job.file
HERE
ssh -i file.pem ubuntu@blabla.compute.amazon.com <<- HERE
source .bashrc
spark-submit job.file
HERE
(ssh -i file.pem ubuntu@blabla.compute.amazon.com "source .bashrc; spark-submit job.file")
All attempts worked with other commands like ls or mkdir, but not with source and spark-submit.
I have also tried providing the full path running the following line:
ssh -i file.pem ubuntu#blabla.compute.amazon.com "/home/ubuntu/spark-3.0.1-bin-hadoop2.7/bin/spark-submit job.py"
In this case too I get, once again, the following message:
'java.io.IOException: Cannot run program "python": error=2, No such file or directory'
How can I tell spark which python to use if SSH seems to ignore all environment variables, no matter how many times I set them?
It's worth mentioning that I got into coding and data a bit more than a year ago, so I am really a newbie here, and any help would be highly appreciated. The solution may be very simple, but I cannot get my head around it. Please help.
Thanks a lot in advance :)
The problem was indeed with the way I was expecting the shell to work (which was wrong).
My issue was solved by:
Setting my variables in .profile instead of .bashrc
Providing the full path to python
Now I can launch spark jobs via ssh.
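For illustration, a minimal sketch of what a working invocation can look like (the paths are the ones mentioned in the question; adjust them to your layout). Since a non-interactive SSH session does not read .bashrc, the Python interpreter and the full path to spark-submit are given explicitly:
ssh -i file.pem ubuntu@blabla.compute.amazon.com \
  "export PYSPARK_PYTHON=/usr/bin/python3; /home/ubuntu/spark-3.0.1-bin-hadoop2.7/bin/spark-submit job.py"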
I found the solution in the answer @VinkoVrsalovic gave to this post:
Why does an SSH remote command get fewer environment variables then when run manually?
Cheers

How to test if hbase is correctly running

I just installed HBase on an EC2 server (I also have HDFS installed, and it's working).
My problem is that I don't know how to check whether my HBase is correctly installed.
To install HBase I followed this tutorial, which says we can check the HBase instance on the web UI at addressOfMyMachine:60010. I also checked on port 16010, but this is not working.
I have an error saying this :
Sorry, the page you are looking for is currently unavailable.
Please try again later.
If you are the system administrator of this resource then you should check the error log for details.
I managed to run the HBase shell, but I don't know if my installation is working well.
To check whether HBase is running from a shell script, execute the command below.
if echo -e "list" | hbase shell 2>&1 | grep -q "ERROR:" 2>/dev/null ;then echo "Hbase is not running"; fi

Unable to run psql command from within a BASH script

I have run into a problem with the psql command in my BASH script as I am trying to log in to my local Postgres database and submit a query. I am using the command in the following way:
psql -U postgres -d rebasoft_appauditor -c "SELECT * FROM katan_scripts"
However, I get the following error message.
psql: FATAL: Ident authentication failed for user "postgres"
This runs perfectly fine from the command line after I appended the following changes to /var/lib/pgsql/data/pg_hba.conf:
local all all trust
host all all 127.0.0.1/32 trust
Also, could this please be verified for correctness?
I find it rather strange that database authentication works fine on the command line but in a script it fails. Could anyone please help with this?
Note: I am using Mac OS X
It might depend on your bash script.
Watch out for the asterisk (*) being replaced with the file names in your current directory. A semicolon or \g might also help to actually send the SQL statement to the database server.
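For example, a minimal sketch using the database and table names from the question: keeping the SQL in a single-quoted variable prevents the shell from globbing the asterisk, and the statement ends with a semicolon as the answer suggests:
# single quotes stop the shell from expanding the *
QUERY='SELECT * FROM katan_scripts;'
psql -U postgres -d rebasoft_appauditor -c "$QUERY"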

How to invoke impala-shell using ldap from a script?

I've installed CDH 5.4 and the latest version of Impala (2.2.0) along with OpenLDAP. When invoking impala-shell in interactive mode, everything works well, as the password is not visible. However, I want to run this arrangement in a shell script. If I give:
impala-shell -l -u testuser -f /path/to/file
it does ask for the password, which I have to provide manually. Can this be automated using any other parameter? Is there any parameter akin to the "-w" parameter of Beeline, so that the password can be secured?
Any help will be much appreciated.