Saving hive queries - hadoop

I need to know how to store a query I have written at the command line, just as we do in SQL (Ctrl+S in SQL Server).
I have heard that HiveQL queries use a .q or .hql extension. Is there any way to save the list of commands I am executing so I can reuse them the same way?

Sure. Whatever IDE or editor you use, you can just save your file as myfile.q and then run it from the command line as
hive -f myfile.q
You can also do
hive -f myfile.q > myfileResults.log
if you want to redirect your results into a log file.
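As a side note, a saved query file can also be parameterized before it is run; this is a minimal sketch in which myfile.q, my_table, ds, and run_date are just placeholder names:
# myfile.q contains, for example:
#   SELECT * FROM my_table WHERE ds = '${hivevar:run_date}';
hive --hivevar run_date=2015-01-01 -f myfile.q > myfileResults.log
The value passed with --hivevar is substituted for ${hivevar:run_date} when the file runs.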

Create a new file using the "cat" command (you can also use an editor) and write all the queries you want to run inside it:
$cat > MyQueries.hql
query1
query2
.
.
Ctrl+D
Note: the .hql or .q extension is not required; it is just a convention to identify the file as a Hive query file.
You can then execute all the queries in the file at once using
$hive -f MyQueries.hql
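For a concrete illustration (the table and column names below are made up), a file such as the following runs end to end with a single call, and the combined output can be redirected into one log file:
-- MyQueries.hql
CREATE TABLE IF NOT EXISTS employees (id INT, name STRING);
SHOW TABLES;
SELECT COUNT(*) FROM employees;
$ hive -f MyQueries.hql > MyQueriesResults.log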

You can also use Hue or the Hive Web Interface to access Hive instead of the terminal. They provide a UI where you can write, save, and execute queries, which solves the copy problem too.
http://gethue.com/
https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface

Related

Uploading multiple CSVs from local source to GBQ using command line

I'm putting together a for loop on Google Cloud SDK Shell that will upload every CSV from the current directory (on my local computer) to a separate Google BigQuery table, all in the same dataset. Also, I want the created tables in GBQ to have the same name of their corresponding CSV files (except the .csv part).
I was actually able to do all that using the following command line, except that it appends all the CSVs to the same table instead of loading them into separate tables.
for %d in (*.csv); do set var1=%d & bq load --autodetect --source_format=CSV "DatasetName.%var1:~0,-5%" %d
Hint: it seems to me that the variable "var1" gets updated in each iteration, but the bq load command doesn't use the updated value; it keeps the original value until the loop ends.
Although I was not able to reproduce a BigQuery load from my local environment, I was able to reproduce this case by uploading .csv files from Cloud Shell to BigQuery.
I tried running your code but my attempts were unsuccessful, so I created the following bash script to map and upload all the .csv files to BigQuery with the bq load command, described here.
#!/bin/bash
echo "Starting the script"
for i in *.csv; do
  echo "${i%.csv} loading"
  bq load --autodetect --source_format=CSV project_id:dataset.Table_${i%.csv} ./$i
  echo "${i%.csv} was loaded"
done
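To run it, assuming it is saved as load_csvs.sh in the directory that contains the CSV files (the script name is just a placeholder):
chmod +x load_csvs.sh
./load_csvs.sh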
Notice that the script only picks up the .csv files in the directory where it is located. Also, ${i%.csv} returns the filename without the extension, which is used to name the destination table, while $i returns the whole filename including .csv, so it is used to point to the source file in the bq load command.
About the bq command: the --autodetect flag is used to auto-detect the schema of each table.
Furthermore, since this load job is from a local data source, it is necessary to specify the project id in the destination table path, as in project_id:dataset.Table_${i%.csv}.
As a bonus, you can also upload your data to a Google Cloud Storage bucket and load the files into BigQuery using wildcards, a Python script with a loop, or Dataflow (streaming or batch), depending on your needs.
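As a rough sketch of the bucket approach with the same one-table-per-file behaviour (the bucket and dataset names are placeholders), you can loop over the objects in Cloud Storage instead of the local directory:
for uri in $(gsutil ls gs://my-bucket/*.csv); do
  name=$(basename "$uri" .csv)
  bq load --autodetect --source_format=CSV "dataset.Table_${name}" "$uri"
done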

How to create BiqQuery view from SQL source in a file (Windows command line)

I created a number of BigQuery views and all works well. I need to move the SQL source for the queries into my source control and manage changes from there. Is there a way to create/update a view from the command line using the source from a file? The bq mk command seems to only allow the SQL code to be given inline on the command line via the --view flag. Some of my views are quite lengthy and I'm sure there are characters that would need to be escaped, which I obviously don't want to get into. I'm running on Windows. Thanks.
Simply use the flagfile parameter:
bq mk --help:
--flagfile: Insert flag definitions from the given file into the command line.
bq mk --view --flagfile=<path_to_your_file> dataset.newview
Let us assume that the file MyQuery.sql contains the view definition.
Create a script file script.sh with the following contents:
query=`cat MyQuery.sql`
bq mk --use_legacy_sql=false --view "$query" dataset.myview
Run it using the command sh script.sh.
This worked for me in a shell; you can make the necessary changes for Windows.
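If the view already exists and you just want to push a new definition from source control, the same pattern should also work with bq update (a sketch; the dataset and view names are placeholders):
query=`cat MyQuery.sql`
bq update --use_legacy_sql=false --view "$query" dataset.myview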

How can I run a series of queries/commands at once in Vertica?

Currently I am using Vertica on Ubuntu from the terminal as dbadmin. I am using admintools to connect to a database and then executing queries like CREATE TABLE, SELECT, and INSERT in the terminal.
Is there any way I can write the commands in an external text file and execute all the queries at once? For Oracle, for example, we can create a SQL file in Notepad++ and then run all the queries against the database.
Not only can you use scripts, but it's a good practice for any operation that you might need to repeat.
From the vsql prompt, use the \i command to run a script:
vsql> \i create_tables.sql
From outside the vsql prompt, you can invoke vsql with -f filename.
File paths, if not absolute, are relative to the current working directory.
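A minimal sketch of both forms (the file, user, and database names are placeholders):
# from the shell, non-interactively:
vsql -U dbadmin -d mydb -f create_tables.sql
# or from inside the vsql prompt:
\i create_tables.sql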

Writing hive Query output to a HDFS file

I have tested writing Hive query output to a file by executing Hive queries inside a shell script using the hive -e and hive -f options. When I execute the shell script from PuTTY it works fine; however, when the same shell script runs from an Oozie workflow in Hue, it does not write any results to the local file.
Using INSERT OVERWRITE DIRECTORY I can write Hive query output directly to a directory in HDFS, but it creates a new directory for each query, so I cannot use this option.
Please suggest an alternative way to write the output of multiple Hive queries to a single file when executing the shell script from an Oozie workflow.
Thanks in advance.
When a shell action is run via an Oozie workflow, it can run on any of the data nodes. Check whether the output path is present on the data node where the action ran.
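One possible workaround (a sketch only; the queries and paths are hypothetical) is to have the shell action collect the output of several hive -e calls into one local file and then copy that single file to HDFS, so the result does not depend on which data node ran the action:
OUT=/tmp/all_results.txt
hive -e "SELECT ... FROM table1" > "$OUT"
hive -e "SELECT ... FROM table2" >> "$OUT"
hdfs dfs -put -f "$OUT" /user/hduser/reports/all_results.txt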

Strange issue running hiveql using -e option from .sh file

I have checked Stack Overflow but could not find any help, which is why I am posting a new question.
The issue is related to executing HiveQL using the -e option from a .sh file.
If I run Hive as $ bin/hive, everything works fine and all databases and tables are displayed properly.
If I run Hive as $ ./hive, or $ hive (as set in the PATH variable), or $HIVE_HOME/bin/hive, only the default database is displayed, and without any table information.
I am learning Hive and trying to execute a Hive command using $HIVE_HOME/bin/hive -e from a .sh file, but it always says the database is not found.
So I understand this is something related to reading the metadata, but I cannot understand why it behaves this way.
However, Hadoop commands work fine from anywhere.
Below is one command I am trying to execute from the .sh file:
$HIVE_HOME/bin/hive -e 'LOAD DATA INPATH hdfs://myhost:8040/user/hduser/sample_table INTO TABLE rajen.sample_table'
Information:
I am using hive-0.13.0 and hadoop-1.2.1.
Can anybody please explain how to solve or work around this issue?
First, correct the query: Hive expects the path in a LOAD DATA statement to be enclosed in quotes.
Try this first from the shell: $HIVE_HOME/bin/hive -e "LOAD DATA INPATH '/user/hduser/sample_table' INTO TABLE rajen.sample_table"
Or put your command in a test.hql file and test it with $ hive -f test.hql:
--test.hql
LOAD DATA INPATH '/user/hduser/sample_table' INTO TABLE rajen.sample_table
I was finally able to fix the issue.
The problem was that I had kept the default embedded Derby setup for the Hive metastore (metastore_db), so wherever I triggered the hive -e command from, a new local copy of the metastore database was created.
So I set up the metastore in MySQL, which made it global, and now the same metastore database is used no matter where I trigger hive -e from.
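For illustration (the paths are hypothetical), the behaviour with the default embedded Derby metastore looks like this, because Hive creates a metastore_db directory in whatever working directory it is launched from:
cd /home/hduser && $HIVE_HOME/bin/hive -e 'SHOW DATABASES;'            # uses /home/hduser/metastore_db
cd /home/hduser/scripts && $HIVE_HOME/bin/hive -e 'SHOW DATABASES;'    # uses /home/hduser/scripts/metastore_db
Pointing hive-site.xml at a shared MySQL metastore gives every invocation the same metadata, which is what fixed it here.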
