Strange issue running HiveQL using the -e option from a .sh file

I have searched Stack Overflow but could not find an answer, which is why I am posting a new question.
The issue is related to executing HiveQL with the -e option from a .sh file.
If I run hive as $ bin/hive, everything works fine and all databases and tables are displayed properly.
If I run hive as $ ./hive, as $ hive (set in the PATH variable), or as $HIVE_HOME/bin/hive, only the default database is displayed, and even that without any table information.
I am learning Hive and trying to execute a Hive command using $HIVE_HOME/bin/hive -e from a .sh file, but it always gives a "database not found" error.
So I understand that it is related to how the metadata is read, but I cannot understand why it behaves this way.
However, Hadoop commands work fine from any directory.
Below is one command I am trying to execute from the .sh file:
$HIVE_HOME/bin/hive -e 'LOAD DATA INPATH hdfs://myhost:8040/user/hduser/sample_table INTO TABLE rajen.sample_table'
Information:
I am using hive-0.13.0 and hadoop-1.2.1.
Can anybody please explain how to solve or work around this issue?

Can you correct the query first? Hive expects the path in a LOAD statement to be enclosed in quotes.
Try this from the shell first: $HIVE_HOME/bin/hive -e "LOAD DATA INPATH '/user/hduser/sample_table' INTO TABLE rajen.sample_table"
Or put your command in a test.hql file and run it with $ hive -f test.hql
-- test.hql
LOAD DATA INPATH '/user/hduser/sample_table' INTO TABLE rajen.sample_table;
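The quoting rule from this answer can be captured in a small helper that writes the statement into an .hql file, which also avoids nesting single quotes inside a double-quoted -e string. This is a minimal sketch; the function name write_load_hql and its argument order are hypothetical, not part of Hive.

```bash
# Sketch (hypothetical helper): write one LOAD DATA statement to an .hql file,
# wrapping the HDFS path in single quotes as Hive requires.
# $1: HDFS path, $2: target table (db.table), $3: output .hql file
write_load_hql() {
  local hdfs_path="$1" table="$2" out_file="$3"
  printf "LOAD DATA INPATH '%s' INTO TABLE %s;\n" "$hdfs_path" "$table" > "$out_file"
}
```

Usage: write_load_hql /user/hduser/sample_table rajen.sample_table test.hql, then "$HIVE_HOME/bin/hive" -f test.hql.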

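I was finally able to fix the issue.
The issue was that I had kept Hive's default embedded Derby setup for the metastore (metastore_db). Derby creates metastore_db in whatever directory hive is started from, so each time I triggered the hive -e command from a different location, it created a new, empty copy of metastore_db there. That is also why bin/hive, run from the original directory, still saw all databases and tables.
So I created the metastore in MySQL, which is shared globally, and now the same metastore database is used wherever I trigger the hive -e command from.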
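For reference, a minimal sketch of the hive-site.xml metastore settings that replace embedded Derby with MySQL, written from a shell function. The four javax.jdo.option.* property names are standard Hive metastore settings; the host, database name, user and password are hypothetical placeholders, and com.mysql.jdbc.Driver assumes the MySQL Connector/J jar has been copied into $HIVE_HOME/lib. If a hive-site.xml already exists, merge these properties into it rather than overwriting it.

```bash
# Sketch: write a minimal hive-site.xml that points the metastore at a shared
# MySQL database instead of the embedded Derby metastore_db, which is created
# in the directory the Hive CLI is launched from.
# All argument values are hypothetical placeholders.
# The MySQL Connector/J jar must be in $HIVE_HOME/lib for the driver class.
write_mysql_hive_site() {
  local out_file="$1" mysql_host="$2" db_name="$3" db_user="$4" db_pass="$5"
  cat > "$out_file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://${mysql_host}/${db_name}?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>${db_user}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${db_pass}</value>
  </property>
</configuration>
EOF
}
```

Usage (placeholder values): write_mysql_hive_site "$HIVE_HOME/conf/hive-site.xml" dbhost metastore hiveuser secret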

Related

Copy large datasets from Hive to local directory

I'm trying to copy data from a Hive table to my local directory.
The code I am using is:
nohup hive -e "set hive.cli.print.header=true; set hive.resultset.use.unique.column.names=false; select * from sample_table;" | sed 's/[\t]/|/g' > /home/sample.txt &
The issue is that the file will be around 400 GB, and the process takes forever to complete.
Is there a better way to do this, such as compressing the file as it is being generated?
I need to have the data as a .txt file, but I have not been able to find a quick workaround for this problem.
Any smart ideas would be really helpful.
Have you tried doing it with the -getmerge option of the hadoop command? That's typically what I use to merge Hive text tables and export them to a local shared drive.
hadoop fs -getmerge ${SOURCE_DIR}/table_name ${DEST_DIR}/table_name.txt
I think the sed command would also slow things down significantly. If you do the character replacement in Hive before extracting the data, that would be faster than a single-threaded sed command running on your edge node.
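For the question's request to compress while the file is being generated, one option is to replace sed with tr (enough for a single-character substitution) and pipe straight into gzip, so the 400 GB uncompressed file is never written. This is a sketch that swaps tr in for the question's sed; the function name is hypothetical. Note that hive -e prints tab-separated columns, while the raw files of a default Hive text table use \001 (Ctrl-A) as the field delimiter.

```bash
# Sketch (hypothetical helper): turn delimiter-separated rows on stdin into
# pipe-delimited rows and gzip them on the fly.
# $1: source field delimiter as a tr escape, e.g. '\t' for hive -e output,
#     '\001' for files of a default Hive text table
# $2: output path for the gzip-compressed result
to_pipe_delimited_gz() {
  local src_delim="$1" out_file="$2"
  tr "$src_delim" '|' | gzip -c > "$out_file"
}
```

Usage, either from the query output or directly from the table files:
hive -e "select * from sample_table;" | to_pipe_delimited_gz '\t' /home/sample.txt.gz
hadoop fs -cat "${SOURCE_DIR}/table_name/*" | to_pipe_delimited_gz '\001' "${DEST_DIR}/table_name.txt.gz"
Running gunzip on the result later yields the plain .txt file.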

How can I run a series of queries/commands at once in Vertica?

Currently I am using Vertica on Ubuntu from the terminal as dbadmin. I use admintools to connect to a database and then execute queries like CREATE TABLE, SELECT, and INSERT in the terminal.
Is there any way I can write the commands in an external text file and execute all the queries at once? For example, in Oracle we can create a SQL file in Notepad++ and then run all the queries against the database.
Not only can you use scripts, but it's a good practice for any operation that you might need to repeat.
From the vsql prompt, use the \i command to run a script:
vsql> \i create_tables.sql
From outside the vsql prompt, you can invoke vsql with -f filename.
File paths, if not absolute, are relative to the current working directory.
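A small wrapper around vsql -f, sketched under the assumption that vsql is on the PATH. The function name run_vsql_script is hypothetical; any extra arguments (for example -U dbadmin) are passed through to vsql unchanged.

```bash
# Sketch (hypothetical helper): run a SQL script with vsql -f, failing early
# with a clear message if the file cannot be read.
# Relative paths resolve against the current working directory.
# $1: script path; remaining arguments are passed to vsql as-is
run_vsql_script() {
  local script="$1"
  shift
  if [ ! -r "$script" ]; then
    echo "run_vsql_script: cannot read $script" >&2
    return 1
  fi
  vsql "$@" -f "$script"
}
```

Usage: run_vsql_script create_tables.sql -U dbadmin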

Ambari - Import multiple files into Hive

I have a Python script that generates schemas, DROP TABLE commands, and LOAD commands for the files in a directory that I want to import into Hive. I can then run these in Ambari to import the files. Multiple CREATE TABLE commands can be executed at once, but when uploading files to import into their respective Hive tables, I can only upload one file at a time.
Is there a way to put these commands in a file and execute them all at once, so that all tables are created and the relevant files are then loaded into their respective tables?
I have also tried copying the files into HDFS, with the aim of then loading them into Hive from Linux, using 'hdfs dfs -copyFromLocal /home/ixroot/Documents/ImportToHDFS /hadoop/hdfs' commands, but errors such as 'no such directory' come up for '/hadoop/hdfs'. I have tried changing permissions using chmod, but that does not seem to help either.
I would be very grateful if anyone could tell me which route is better for efficiently importing multiple files into their respective tables in Hive.
1) Is there a way to perhaps put these commands in a file and execute them all at once so that all tables are created and the relevant files are subsequently uploaded to their respective tables?
You can put all the queries in a .hql file, for example test.hql, and run hive -f test.hql to execute all the commands in one shot.
2) errors such as 'no such directory'
First run hadoop fs -mkdir -p /hadoop/hdfs, and then run hadoop fs -copyFromLocal /home/ixroot/Documents/ImportToHDFS /hadoop/hdfs
Edit: for permissions:
hadoop fs -chmod -R 777 /user/ixroot
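Since the statements are already generated by a script, all the per-file loads can go into one .hql file and run with a single hive -f call. This sketch uses Hive's LOAD DATA LOCAL INPATH, which reads from the local filesystem of the machine running the CLI, so the separate copyFromLocal step is not needed. The one-table-per-file naming rule (customers.csv loads into db.customers), the .csv extension and the function name are assumptions; the target tables must already exist and file names must be valid Hive table names.

```bash
# Sketch (hypothetical helper): write one LOAD DATA LOCAL INPATH statement per
# .csv file in a local directory into a single .hql file.
# Assumed convention: <dir>/<name>.csv is loaded into table <db>.<name>.
# $1: local source directory, $2: Hive database, $3: output .hql file
generate_load_hql() {
  local src_dir="$1" db="$2" out_file="$3"
  local f name
  : > "$out_file"                   # start from an empty file
  for f in "$src_dir"/*.csv; do
    [ -e "$f" ] || continue         # no matches: the glob stays literal
    name="$(basename "$f" .csv)"
    # OVERWRITE replaces existing table data, so re-running does not duplicate rows
    printf "LOAD DATA LOCAL INPATH '%s' OVERWRITE INTO TABLE %s.%s;\n" \
      "$f" "$db" "$name" >> "$out_file"
  done
}
```

Usage: generate_load_hql /home/ixroot/Documents/ImportToHDFS staging load_all.hql, then hive -f load_all.hql (after the CREATE TABLE statements have run).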

Writing Hive query output to an HDFS file

I have tested writing Hive query output to a file by executing Hive queries inside a shell script using the hive -e and hive -f options. When I execute the shell script from PuTTY it works fine; however, when the same shell script runs from an Oozie workflow in Hue, it does not write any results to the local file.
Using INSERT OVERWRITE DIRECTORY, I can write Hive query output directly to a directory in HDFS; however, each query creates a new directory, so I cannot use this option.
Please suggest an alternative way to write the output of multiple Hive queries to a single file by executing a shell script from an Oozie workflow.
Thanks in advance.
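When a shell action runs via an Oozie workflow, it can run on any of the data nodes, not on the machine you log into with PuTTY, so a "local" file is written to the local disk of whichever node ran the action. Check that the output path is present on that data node.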
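One way to get a single result file regardless of which node the Oozie shell action lands on: append every query's results to one file in the action's working directory, then copy that file into HDFS at the end. This is a sketch; the function name and the HDFS path in the usage line are hypothetical.

```bash
# Sketch (hypothetical helper): run several Hive queries and append all of
# their results to a single local file.
# $1: output file; remaining arguments: one query each
run_queries_to_file() {
  local out_file="$1"
  shift
  : > "$out_file"                   # start with an empty file
  local q
  for q in "$@"; do
    # -S (silent mode) suppresses Hive's informational messages
    hive -S -e "$q" >> "$out_file" || return 1
  done
}
```

Usage inside the Oozie shell action:
run_queries_to_file results.txt "select count(*) from t1" "select count(*) from t2"
hadoop fs -put results.txt /user/hduser/output/results.txt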

Saving Hive queries

I need to know how I can save a query I have written on the command line, just like we do in SQL (in SQL Server we use Ctrl+S).
I heard that HiveQL queries use the .q or .hql extension. Is there any way to save my queries, so that I keep the list of commands I am executing?
Sure. Whatever IDE you use, you can just save your file as myfile.q and then run it from the command line as
hive -f myfile.q
You can also do
hive -f myfile.q > myfileResults.log
if you want to redirect your results into a log file.
Create a new file using the cat command (you can also use an editor), and write all the queries you want to run inside the file, each ending with a semicolon:
$ cat > MyQueries.hql
query1;
query2;
.
.
Ctrl+D
Note: the .hql or .q extension is not required. It is just a convention to identify the file as a Hive query file.
You can execute all the queries inside the file at once using
$ hive -f MyQueries.hql
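To keep a record of commands as you run them, a helper can append each statement to an .hql file before executing it, so the file can be replayed later with hive -f. This is a hypothetical helper, not a Hive feature; separately, the Hive CLI keeps its interactive command history in ~/.hivehistory.

```bash
# Sketch (hypothetical helper): save a statement to an .hql file, ending it
# with exactly one semicolon, then run it with hive -e.
# $1: .hql file to append to, $2: the statement
run_and_save_query() {
  local hql_file="$1" query="$2"
  case "$query" in
    *';') printf '%s\n' "$query" >> "$hql_file" ;;
    *)    printf '%s;\n' "$query" >> "$hql_file" ;;
  esac
  hive -e "$query"
}
```

Usage: run_and_save_query MyQueries.hql "SHOW TABLES", and later replay everything with hive -f MyQueries.hql.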
You can use Hue or a web interface to access Hive instead of the terminal. It provides a UI where you can write and execute queries, which solves the copy problem too.
http://gethue.com/
https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface
