I am writing a bash script to export a dynamic SQL query into a .hql file in an HDFS directory. I am going to run this bash script through Oozie.
sql_v="select 'create table table_name from user_tab_columns where ...;'"
beeline -u "$sql_v" > local_path
The sql_v variable will store a dynamic CREATE TABLE command, which I want to store in a .hql file in an HDFS directory. If I run the above two steps it works fine because I am storing the output in a local path, but instead of passing local_path I want to store the SQL in an HDFS directory. Is there a way I can pass an HDFS path instead of local_path, like below? This doesn't work. Can I use any other command instead of beeline to achieve this?
beeline -u "$sql_v" | hdfs dfs -appendToFile -
If the goal is to write the output of beeline to an HDFS file, then the options below should work, since both commands pipe the standard output of beeline into the hadoop commands, which read from standard input when the source is given as -.
beeline -u beeline_connection_string .... -e "$sql_v" | hadoop fs -put - /user/userid/file.hql
beeline -u beeline_connection_string .... -e "$sql_v" | hadoop fs -appendToFile - /user/userid/file.hql
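Putting it together, here is a minimal sketch of the whole step as one script. The JDBC URL, the query and the HDFS target path are placeholders; --silent and --outputformat are only there to keep beeline's log noise out of the .hql file, and -f makes put overwrite an existing file.
#!/bin/bash
# Placeholders: replace the JDBC URL, the query and the HDFS target path with your own values.
sql_v="select 'create table table_name from user_tab_columns where ...;'"
beeline -u "jdbc:hive2://hiveserver2-host:10000/default" --silent=true --outputformat=tsv2 -e "$sql_v" \
  | hadoop fs -put -f - /user/userid/file.hql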
Note:
1. It's a little unclear from your question and comments why you can't use the suggestion given by #cricket_007, and why you want to go through beeline in particular:
echo "$sql_v" > file.hql
hadoop fs -put file.hql /user/userid/file.hql
beeline -u beeline_connection_string .... -e "$sql_v" > file.hql
hadoop fs -appendToFile file.hql /user/userid/file.hql
beeline -u beeline_connection_string .... -e "$sql_v" > file.hql
hadoop fs -put file.hql /user/userid/file.hql
2. If an Oozie shell action is used to run the bash script containing sql_v and the beeline command, beeline needs to be present on the node where the shell action runs; otherwise you will get a "beeline not found" error.
Refer: beeline-command-not-found-error
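As a hedged sketch, the shell action's script can also guard against that case before calling beeline (the error message is just illustrative):
# Fail fast if beeline is not on the PATH of the node running the shell action.
if ! command -v beeline >/dev/null 2>&1; then
    echo "beeline not found on this node" >&2
    exit 1
fi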
Related
I've installed hadoop and hive. I am trying to configure hive as follows:
hadoop fs -mkdir /data/hive/warehouse
I keep getting this error:
mkdir: '/data/hive/warehouse': No such file or directory
Do I need to create the directories with os commands before issuing the hadoop fs command? Any ideas?
You're missing the -p option, similar to mkdir -p on UNIX/Linux.
$ hadoop fs -mkdir -p /data/hive/warehouse
In addition, you should chmod 1777 this directory if you're setting it up for multiple users, and add /user/hive if you're running Hive as user hive.
$ hadoop fs -chmod -R 1777 /data/hive/warehouse
$ hadoop fs -mkdir -p /user/hive
$ hadoop fs -chown hive:hive /user/hive
See Apache Hive File System Permissions in CDH and Where does Hive store files in HDFS?.
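To double-check the setup, listing the directory entries themselves should show the sticky bit (drwxrwxrwt) on the warehouse directory and hive as the owner of /user/hive:
$ hadoop fs -ls -d /data/hive/warehouse
$ hadoop fs -ls -d /user/hive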
I want to store some .tbl files in hadoop.
I am using this command: hadoop fs -put customer.tbl
But I'm getting:
Usage: java FsShell [-put <localsrc> ... <dst>]
If I do hadoop fs -cat customer.tbl
it appears that the file does not exist.
It seems like you need to provide local-src and HDFS-dst.
Can you try to add destination?
e.g. hadoop fs -put customer.tbl .
Please also try executing "ls" on HDFS:
hadoop fs -ls
Please also try executing "ls" on HDFS using the hdfs command; 'hdfs' should be found under hadoop-version-number/bin/:
hdfs dfs -ls
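For example, a minimal sketch with an explicit destination (the directory name is just an illustration):
hadoop fs -mkdir -p /user/yourname/tbl_data
hadoop fs -put customer.tbl /user/yourname/tbl_data/
hadoop fs -ls /user/yourname/tbl_data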
I'm trying to program a simple task with crontab: copy some files from local to HDFS. My code is this:
#!/bin/ksh
ANIO=$(date +"%Y")
MES=$(date +"%m")
DIA=$(date +"%d")
HORA=$(date +"%H")
# LOCAL AND HDFS DIRECTORIES
DIRECTORIO_LOCAL="/home/cloudera/bicing/data/$ANIO/$MES/$DIA/stations"$ANIO$MES$DIA$HORA"*"
DIRECTORIO_HDFS="/bicing/data/$ANIO/$MES/$DIA/"
# Test if the destination directory exists and create it if necessary
echo "hdfs dfs -test -d $DIRECTORIO_HDFS">>/home/cloudera/bicing/data/logFile
hdfs dfs -test -d $DIRECTORIO_HDFS
if [ $? != 0 ]
then
echo "hdfs dfs -mkdir -p $DIRECTORIO_HDFS">>/home/cloudera/bicing/data/logFile
hdfs dfs -mkdir -p $DIRECTORIO_HDFS
fi
# Upload the files to HDFS
echo "hdfs dfs -put $DIRECTORIO_LOCAL $DIRECTORIO_HDFS">>/home/cloudera/bicing/data/logFile
hdfs dfs -put $DIRECTORIO_LOCAL $DIRECTORIO_HDFS
As you can see it is quite simple: it only defines the folder variables, creates the directory in HDFS (if it doesn't exist) and copies the files from local to HDFS.
The script works if I launch it directly on the Terminal but when I schedule it with Crontab it doesn't "put" the files in HDFS.
Moreover, the script creates a "logFile" with the commands that should have been executed. When I copy them to the Terminal they work perfectly.
hdfs dfs -test -d /bicing/data/2015/12/10/
hdfs dfs -mkdir -p /bicing/data/2015/12/10/
hdfs dfs -put /home/cloudera/bicing/data/2015/12/10/stations2015121022* /bicing/data/2015/12/10/
I have checked the directories and files, but I can't find the key to solve it.
Thanks in advance!!!
When you execute these commands on the console, they run fine because "HADOOP_HOME" is set. But when the cron job runs, most likely the "HADOOP_HOME" environment variable is not available.
You can resolve this problem in 2 ways:
In the script, add the following statements at the beginning. This will add the paths of all the Hadoop jars to your environment.
export HADOOP_HOME={Path to your HADOOP_HOME}
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/etc/hadoop:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*:$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_HOME/share/hadoop/tools/*:$HADOOP_HOME/share/hadoop/tools/lib/*:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/yarn/lib/*
You can also update your .profile (present in $HOME/.profile) or .kshrc (present in $HOME/.kshrc) to include the HADOOP paths.
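Another option, sketched here under the assumption that the script lives at /home/cloudera/bicing/copy_to_hdfs.sh (the name is hypothetical), is to have the crontab entry itself source the profile so the job inherits the same environment as an interactive shell:
# m h dom mon dow   command
0 * * * * . $HOME/.profile; /home/cloudera/bicing/copy_to_hdfs.sh >> /home/cloudera/bicing/data/logFile 2>&1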
That should solve your problem.
I was trying to unzip a zip file stored in the Hadoop file system and store it back in the Hadoop file system. I tried the following commands, but none of them worked.
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop fs -put - /tmp/
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop fs -put - /tmp
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop put - /tmp/
hadoop fs -cat /tmp/test.zip|gzip -d|hadoop put - /tmp
I get errors like "gzip: stdin has more than one entry--rest ignored", "cat: Unable to write to output stream.", and "Error: Could not find or load main class put" on the terminal when I run those commands. Any help?
Edit 1: I don't have access to the UI, so only command lines are allowed. Unzip/gzip utilities are installed on my Hadoop machine. I'm using Hadoop version 2.4.0.
To unzip a gzipped (or bzipped) file, I use the following
hdfs dfs -cat /data/<data.gz> | gzip -d | hdfs dfs -put - /data/
If the file sits on your local drive, then
zcat <infile> | hdfs dfs -put - /data/
Most of the time I use HDFS fuse mounts for this.
So you could just do
$ cd /hdfs_mount/somewhere/
$ unzip file_in_hdfs.zip
http://www.cloudera.com/content/www/en-us/documentation/archive/cdh/4-x/4-7-1/CDH4-Installation-Guide/cdh4ig_topic_28.html
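If a fuse mount is not already in place, per the CDH documentation linked above it is typically created along these lines (NameNode host, port and mount point are assumptions in this sketch):
$ sudo mkdir -p /hdfs_mount
$ sudo hadoop-fuse-dfs dfs://namenode-host:8020 /hdfs_mount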
Edit 1/30/16: In case you use HDFS ACLs: in some cases fuse mounts don't adhere to HDFS ACLs, so you'll be able to do file operations that are permitted by basic Unix access privileges. See https://issues.apache.org/jira/browse/HDFS-6255, comments at the bottom, which I recently asked to reopen.
To stream the data through a pipe to hadoop, you need to use the hdfs command.
cat mydatafile | hdfs dfs -put - /MY/HADOOP/FILE/PATH/FILENAME.EXTENSION
gzip uses -c to read data from stdin.
hadoop fs -put doesn't support reading data from stdin.
I tried a lot of things and nothing would help. I can't find zip input support in Hadoop, so I was left with no choice but to download the file from HDFS to the local file system, unzip it, and upload it to HDFS again.
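A minimal sketch of that workaround (the paths are illustrative):
hdfs dfs -get /tmp/test.zip /tmp/test.zip        # copy the archive out of HDFS
unzip -o /tmp/test.zip -d /tmp/test_unzipped     # unzip on the local file system
hdfs dfs -put /tmp/test_unzipped /tmp/           # upload the extracted files back to HDFS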
I have constructed a single-node Hadoop environment on CentOS using the Cloudera CDH repository. When I want to copy a local file to HDFS, I used the command:
sudo -u hdfs hadoop fs -put /root/MyHadoop/file1.txt /
But the result depressed me:
put: '/root/MyHadoop/file1.txt': No such file or directory
I'm sure this file does exist.
Please help me, thanks!
As user hdfs, do you have access rights to /root/ (on your local hard drive)? Usually you don't.
You must copy file1.txt to a place where the local hdfs user has read rights before trying to copy it to HDFS.
Try:
cp /root/MyHadoop/file1.txt /tmp
chown hdfs:hdfs /tmp/file1.txt
# older versions of Hadoop
sudo -u hdfs hadoop fs -put /tmp/file1.txt /
# newer versions of Hadoop
sudo -u hdfs hdfs dfs -put /tmp/file1.txt /
--- edit:
Take a look at roman-nikitchenko's cleaner answer below.
I had the same situation and here is my solution:
HADOOP_USER_NAME=hdfs hdfs dfs -put /root/MyHadoop/file1.txt /
Advantages:
You don't need sudo.
You don't actually need an appropriate local user 'hdfs' at all.
You don't need to copy anything or change permissions because of previous points.
Try to create a dir in HDFS by using: $ hadoop fs -mkdir your_dir
and then put the file into it: $ hadoop fs -put /root/MyHadoop/file1.txt your_dir
Here is a command for writing a DataFrame df directly to the HDFS file system in a Python script:
df.write.save('path', format='parquet', mode='append')
mode can be append | overwrite
If you want to put it in HDFS using the shell, use this command:
hdfs dfs -put /local_file_path_location /hadoop_file_path_location
You can then check the NameNode UI at localhost:50070 for verification.
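Alternatively, the same verification can be done from the shell:
hdfs dfs -ls /hadoop_file_path_location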