Migration from Cloudera Hadoop to HDInsight

I have HQL scripts that I used to run on Cloudera with hive -f scriptname.hql. Now I want to run these scripts on an HDInsight (Hadoop) cluster, but the hive command line is not available there. Can someone guide me on how to do that?

beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http' -i query.hql
Has anyone had experience using the above rather than
hive -f query.hql

I don't see any other way to execute the HQL files. You can refer to this document: https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-beeline#run-a-hiveql-file
You can also use the ZooKeeper quorum to avoid query failures during head node failover:
beeline -u '<zookeeper quorum>' -i /path/query.hql

Create an environment variable:
export hivef="beeline -u 'jdbc:hive2://hn0-hdi-uk.witechmill.co.uk:10001/default;principal=hive/_HOST@witechmill.CO.UK;auth=kerberos;transportMode=http' -n umerrkhan"
witechmill is my cluster name.
Then call the script using:
$hivef scriptname.hql
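Note that storing a quoted command line in an environment variable does not always expand cleanly (the embedded single quotes are passed literally). A shell function is a more robust equivalent; here is a minimal sketch, assuming the plain HDInsight URL from the first answer (adjust the URL, auth options and user name to your cluster):

# put this in ~/.bashrc: hivef <script.hql> [extra beeline args]
hivef() {
  local script="$1"; shift
  # -f executes the file and exits; the -i form shown above also works
  beeline -u 'jdbc:hive2://headnodehost:10001/default;transportMode=http' \
          -f "$script" "$@"
}

# usage, e.g. with a hivevar:
hivef scriptname.hql --hivevar run_date=2020-01-01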

Related

Read remote properties file over SSH (Oozie)

I want to run an Oozie workflow from a remote machine, whereas my config file exists on the local machine. Could you please help me with how I can achieve this?
I tried the approach below, but it didn't work:
ssh user@remote_host "oozie job -run -config" < config.properties
It gives this error:
Invalid sub-command: Missing argument for option: config
use 'help [sub-command]' for help details
You can pass the properties from your config file with -D, such as:
ssh user@remote_host "oozie job -run -D oozie.wf.application.path=hdfs://hdf-example-cluster -D key=value"
Oozie will read the values from system properties if you don't provide a config file.
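If you'd rather keep maintaining the properties file locally, two simple workarounds (a sketch; it assumes plain key=value lines with no spaces, and a writable /tmp on the remote host):

# option 1: copy the file over first, then point -config at the remote copy
scp config.properties user@remote_host:/tmp/oozie-job.properties
ssh user@remote_host "oozie job -run -config /tmp/oozie-job.properties"

# option 2: expand the local file into -D arguments as suggested above
ARGS=$(grep -v '^#' config.properties | sed 's/^/-D /' | tr '\n' ' ')
ssh user@remote_host "oozie job -run $ARGS"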

How to test if HBase is running correctly

I just installed HBase on an EC2 server (I also have HDFS installed; it's working).
My problem is that I don't know how to check whether HBase is correctly installed.
To install HBase I followed this tutorial, in which they say you can check the HBase instance in the web UI at addressOfMyMachine:60010. I also checked port 16010, but this is not working.
I get an error saying:
Sorry, the page you are looking for is currently unavailable.
Please try again later.
If you are the system administrator of this resource then you should check the error log for details.
I managed to run the HBase shell, but I don't know if my installation is working well.
To check whether HBase is running from a shell script, execute the command below:
if echo -e "list" | hbase shell 2>&1 | grep -q "ERROR:" 2>/dev/null; then echo "HBase is not running"; fi
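A few complementary checks (a sketch; the master web UI listens on 60010 in HBase 0.9x and 16010 in 1.x and later, so use whichever matches your version):

# are the daemons running at all?
jps | grep -E 'HMaster|HRegionServer'

# ask the cluster for its status through the shell
echo "status" | hbase shell 2>/dev/null

# probe the master web UI (swap 16010 for 60010 on older releases)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:16010/master-status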

Zeppelin can't execute Hive via the %sh (shell) interpreter

I am using Zeppelin 0.5.6.
I checked the log and found the output below.
However, it works well when I use the terminal.
I also tried some other Hive commands. When I used the commands below in Zeppelin, I got the same error, while they worked well in the terminal.
%sh
hive -e "show databases"
Can anybody help?
Thanks.
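Since the same command works from a terminal, a reasonable first debugging step (a sketch, not a confirmed fix) is to compare the environment the %sh interpreter actually runs under (user, PATH, Hive/Hadoop variables) with your login shell:

%sh
# run this in a Zeppelin paragraph and compare against the same commands in your terminal
whoami
which hive
echo "$PATH"
echo "$HIVE_HOME $HADOOP_HOME"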

How to invoke impala-shell using LDAP from a script?

I've installed CDH 5.4 and the latest version of Impala (2.2.0) along with OpenLDAP. When invoking impala-shell in interactive mode, all works well, as the password is not visible. However, I want to run this arrangement in a shell script. If I give:
impala-shell -l -u testuser -f /path/to/file
it does ask for the password, which I have to provide manually. Can this be automated using any other parameter? Is there any parameter akin to the "-w" parameter of Beeline, so that the password can be secured?
Any help will be much appreciated.
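Newer impala-shell builds accept an --ldap_password_cmd option, which runs a command to fetch the LDAP password instead of prompting for it; check impala-shell --help on the 2.2.0 build to confirm it is available before relying on this sketch:

# keep the password in a file readable only by the job's user,
# and let impala-shell fetch it non-interactively
impala-shell -l -u testuser \
  --ldap_password_cmd="cat /secure/ldap_pass.txt" \
  -f /path/to/file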

SFTP file system in Hadoop

Does Hadoop 2.0.0 / CDH4 have an SFTP file system in place? I know Hadoop has support for an FTP filesystem. Does it have something similar for SFTP? I have seen some patches submitted for the same, though I couldn't make sense of them.
Consider using hadoop distcp.
Check here. That would be something like:
hadoop distcp \
-D fs.sftp.credfile=/user/john/credstore/private/mycreds.prop \
sftp://myHost.ibm.com/home/biadmin/myFile/part1 \
hdfs:///user/john/myfiles
After some research, I figured out that Hadoop currently doesn't have a FileSystem written for SFTP. Hence, if you wish to read data over an SFTP channel, you have to either write an SFTP FileSystem (which is quite a big deal, extending and overriding lots of classes and methods; patches for this have been developed, though not yet integrated into Hadoop), or get a customized InputFormat that reads from streams, which again is not implemented in Hadoop.
You need to ensure core-site.xml has the property fs.sftp.impl set to org.apache.hadoop.fs.sftp.SFTPFileSystem.
After this, the hadoop commands will work. A couple of samples are given below.
ls command
On Hadoop:
hadoop fs -ls /
Equivalent for SFTP:
hadoop fs -D fs.sftp.user.{hostname}={username} -D fs.sftp.password.{hostname}.{username}={password} -ls sftp://{hostname}:22/
distcp command
On Hadoop:
hadoop distcp {sourceLocation} {destinationLocation}
Equivalent for SFTP:
hadoop distcp -D fs.sftp.user.{hostname}={username} -D fs.sftp.password.{hostname}.{username}={password} sftp://{hostname}:22/{sourceLocation} {destinationLocation}
Ensure you replace all the placeholders when trying these commands. I tried them on AWS EMR 5.28.1, which has Hadoop 2.8.5 installed.
So, hopefully cleaning these answers up into something more digestible: basically, Hadoop/HDFS is capable of supporting SFTP; it's just not enabled by default, nor is it documented very well in core-default.xml.
The key configuration you need to set to enable SFTP support is:
<property>
<name>fs.sftp.impl</name>
<value>org.apache.hadoop.fs.sftp.SFTPFileSystem</value>
</property>
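Once core-site.xml is updated, you can confirm the setting is picked up before running any jobs (a quick sanity check, assuming your client reads the same configuration directory):

# should print org.apache.hadoop.fs.sftp.SFTPFileSystem
hdfs getconf -confKey fs.sftp.impl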
Alternatively, you can set it right at the CLI, depending on your command:
hdfs dfs \
-Dfs.sftp.impl=org.apache.hadoop.fs.sftp.SFTPFileSystem \
-Dfs.sftp.keyfile=~/.ssh/java_sftp_testkey.ppk \
-ls sftp://$USER@localhost/tmp/
The biggest requirement is that your SSH keyfile needs to be password-less to work. This can be done via:
cp ~/.ssh/mykeyfile.ppk ~/.ssh/mykeyfile.ppk.orig
ssh-keygen -p -P MyPass -N "" -f ~/.ssh/mykeyfile.ppk
mv ~/.ssh/mykeyfile.ppk ~/.ssh/mykeyfile_nopass.ppk
mv ~/.ssh/mykeyfile.ppk.orig ~/.ssh/mykeyfile.ppk
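Before pointing Hadoop at the stripped copy, it's worth confirming that it really has no passphrase (a sketch; substitute your SFTP user and host):

# should print "ok" with no passphrase or password prompt
ssh -i ~/.ssh/mykeyfile_nopass.ppk -o BatchMode=yes user@sftp-host 'echo ok'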
And finally, the biggest (and maybe neatest) use is via distcp, if you need to send/receive a large amount of data to/from an SFTP server. There's an oddity in that the SSH keyfile is needed locally to generate the directory listing, as well as on the cluster for the actual workers.
Something like this should work well enough:
cd workdir
ln -s ~/.ssh/java_sftp_testkey.ppk
hadoop distcp \
--files ~/.ssh/java_sftp_testkey.ppk \
-Dfs.sftp.impl=org.apache.hadoop.fs.sftp.SFTPFileSystem \
-Dfs.sftp.keyfile=java_sftp_testkey.ppk \
hdfs:///path/to/source/ \
sftp://user@FQDN/path/to/dest
