Running a number of hive queries and writing output to file - bash

I'm trying to make use of the DESCRIBE function via Hive to output the column descriptions of each of the tables out to individual files. I've discovered the -f option so I can just read from a file and write the output back out:
hive -f nameOfSqlQueryFile.sql > out.txt
However, if I open the output file, the descriptions all run back to back, and it's unclear where one table's description ends and the next begins.
So, I've tried making a batch file that uses -e to describe each of the tables individually and output to a file:
#!/bin/bash
nameArr=( $(hive -e 'show tables;') )
count=0
for i in "${nameArr[@]}"
do
    echo "Working on table($count): $i"
    hive -e "describe $i" > "${i}_.txt"
    count=$((count+1))
done
However, because this needs to reconnect for each query, it's remarkably slow, taking hours to process several hundred queries.
Does anyone have an idea of how else I might run each of these DESCRIBE functions, and ideally output to separate files?

You can probably use one of these, depending on how you process the output:
Just use the OK line as a separator and search for it using a script.
Use DESCRIBE EXTENDED, which adds a line at the end with info on the table, including its location; that line can be used to extract the table name (using sed, for example).
If you're just using the output file as a manual reference, insert a SQL statement that prints a separator of your choice between each table, e.g.:
DESCRIBE table;
SELECT '-----------------' FROM table;
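If the hours-long runtime from reconnecting per table is the bigger problem, another option (a sketch, not from the original answers; the parallelism level of 4 and the file naming are arbitrary choices) is to keep one DESCRIBE per hive invocation but run several invocations in parallel with xargs:
#!/bin/bash
# List the tables once, then describe them four at a time, one output file per table.
hive -e 'show tables;' |
    xargs -P 4 -I '{}' sh -c 'hive -e "DESCRIBE {}" > "{}_.txt"'
This does not remove the per-table connection cost, but running several hive processes concurrently cuts the wall-clock time, and each table still gets its own file.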

Related

Read many values from one variable in a config file, then connect to an Oracle DB with ksh and run queries

Hi all,
I need to write a script that reads variables from a config file, connects to an Oracle DB, runs some queries, and saves the results in a logfile.
To connect to the DB, my file has the line
sqlplus $Username/$Userpassword@$Enviroment
Then I have a function that executes one query and saves the result in the logfile.
While there is only one environment everything works fine, but I need to upgrade this script so that it takes the environment values from the config file and runs in a loop.
In the config file I have information like this:
Username=user Userpassword=pass Enviroments=Env1,Env2,Env3,EnvX
The problem is that the script reads Enviroments=Env1,Env2,Env3,EnvX as one value and doesn't split it into, e.g.,
Enviroment=Env1 -> connect to the DB and do some jobs
Enviroment=Env2 -> connect to the DB and do some jobs
...
I have a for loop that looks like this:
for Test in $(echo $Enviroment | sed "s/,/ /g"); do
    echo "do some job"
done
With this I split the environment names: the program understands that after each "," the next value starts.
It works, but only at the end of the script. At the beginning, where the program reads the $Enviroment variable, it takes everything as one value, and that's where the errors start. I can't figure out how to put this into my script and wrap it all in one big loop.
I hope you understand the problem I have.
I expect a loop that reads the variables from the config file and, if there are many environments, takes the values one by one and uses them in the code.
As rtoijala said, you should always provide a minimal reproducible example. That said, using what I understood from your text I've put together the code below. Hopefully it gives you some pointers.
Using a configuration file as :
Username=user1 Userpassword=pass1 Enviroments=Env11,Env12,Env13,Env1X
Username=user2 Userpassword=pass2 Enviroments=Env21,Env22,Env23
Username=user3 Userpassword=pass3 Enviroments=Env31,Env32
This code :
#!/bin/ksh
while read -r a b c; do
    locUser=${a#*=}
    locPasswd=${b#*=}
    locEnv=${c#*=}
    arEnv=( ${locEnv//,/ } )
    for i in "${!arEnv[@]}"; do
        print "sqlplus $locUser/$locPasswd@${arEnv[$i]}"
    done
done < ./config.txt
will produce :
sqlplus user1/pass1@Env11
sqlplus user1/pass1@Env12
sqlplus user1/pass1@Env13
sqlplus user1/pass1@Env1X
sqlplus user2/pass2@Env21
sqlplus user2/pass2@Env22
sqlplus user2/pass2@Env23
sqlplus user3/pass3@Env31
sqlplus user3/pass3@Env32
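To go from printing the sqlplus command to actually running a query per environment, the inner loop could pipe the statements into sqlplus. This is only a sketch: the sample query, the per-environment logfile name, and the -s (silent) flag are assumptions rather than part of the original question.
#!/bin/ksh
while read -r a b c; do
    locUser=${a#*=}
    locPasswd=${b#*=}
    locEnv=${c#*=}
    for env in ${locEnv//,/ }; do
        # Run one query per environment and append the output to its own logfile.
        { print "SELECT sysdate FROM dual;"; print "EXIT"; } |
            sqlplus -s "$locUser/$locPasswd@$env" >> "query_${env}.log"
    done
done < ./config.txt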

kickstart five concurrent processes in bash

I have a folder named datafolder which contains five csv files aa.csv ab.csv ac.csv ad.csv ae.csv Each csv file contains data from an excel sheet in the format: date, product type, name, address etc. and I am only interested in the second column which is named product. Basically what I want to happen is for the jobmaster script to count the number of files in datafolder and then to start a map process for each individual file. I have the following scripts:
The jobmaster script runs without problems; however, once the map script starts, only the first echo ("mapping $1") is displayed, and the process is stuck in an infinite loop (my guess). When I run the ps command I expect to see 5 map.sh processes running, but there are none.
I suspect you missed an input redirection in map.sh:
file=$1
echo "mapping $file"
while IFS="," read -r value1 product remainder; do
# ...
done < "$file"
# ^^^^^ provide the standard input to from this file to `read`
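The jobmaster itself isn't shown in the question, so here is a minimal sketch of how such a launcher could kick off one map.sh per CSV concurrently; the datafolder path and the map.sh name come from the question, while the use of & and wait is an assumption about how the kickoff was intended to work:
#!/bin/bash
# Launch one mapper per CSV in the background, then wait for all of them.
for f in datafolder/*.csv; do
    ./map.sh "$f" &     # each mapper runs concurrently
done
wait                    # block until every background mapper has finished
echo "all mappers done"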

Pre-pending and appending to a shell variable

My goal is to load an external table's log file into a CLOB column in an Oracle database. I've been having issues with the maximum size you can insert at once, but I am able to insert the whole file if I to_clob each line of the log file, concatenate them, and then insert (as far as I'm aware this seems to be the quickest and easiest way?):
insert into clob_insert_test values (to_clob('hfsdjhfjsdhfjksd')||chr(10)||to_clob('jhfklsdjfklsdjklfjdsjlk'));
My question is:
I'm reading the file into a shell variable as below. What I need to do is prepend to_clob(' to the beginning of each line of the variable, append ')||chr(10)|| to the end of each line, and then remove the final ||chr(10)|| to finish. I can then use that variable in the SQL insert statement for the CLOB column. Is there a way to do this directly on the variable rather than modifying the log file before reading it in?
log_content=$(<"$log_file")
Edit:
Sorry I don't think I was clear. Given the example log file I would expect the following variable contents.
Input file:
LOG file opened at 05/05/15 15:12:24
Field Definitions for table ext_loading
Record format DELIMITED BY NEWLINE
Variable contents:
to_clob('LOG file opened at 05/05/15 15:12:24')||chr(10)||to_clob('Field Definitions for table ext_loading')||chr(10)||to_clob('Record format DELIMITED BY NEWLINE')
I assume you have a file like:
this is me||chr(10)||adfasdf
asdas||chr(10)||asdfasdfasdas
And you want it to become something like:
to_clob('this is meadfasdf')||chr(10)||
to_clob('asdasasdfasdfasdas')||chr(10)||
If so, you can use sed like this:
sed -e "s/||chr(10)||//" -e "s/^/to_clob('/" -e "s/$/')||chr(10)||/" file
That is:
remove ||chr(10)|| once from each line.
add to_clob(' to the beginning of each line.
add ')||chr(10)|| to the end of each line.
And to store it in a variable:
log_content=$(sed -e "s/||chr(10)||//" -e "s/^/to_clob('/" -e "s/$/')||chr(10)||/" "$log_file")
Update
To match what you really need, you can also do this:
line=$(sed -e "/./s/^/to_clob('/" -e "/./s/$/')||chr(10)||/" "$log_file")
Then the output is:
$ echo $line # note, without quotes to have all of it together!
to_clob('LOG file opened at 05/05/15 15:12:24')||chr(10)|| to_clob('Field Definitions for table ext_loading')||chr(10)|| to_clob('Record format DELIMITED BY NEWLINE')||chr(10)||
And remove the last ||chr(10)|| with:
$ echo $line | sed 's/||chr(10)||$//'
to_clob('LOG file opened at 05/05/15 15:12:24')||chr(10)|| to_clob('Field Definitions for table ext_loading')||chr(10)|| to_clob('Record format DELIMITED BY NEWLINE')
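As a small variant (a sketch, not part of the original answer), the trailing ||chr(10)|| can also be stripped with shell parameter expansion instead of a second sed pass:
#!/bin/bash
# Build the to_clob(...) expression, then trim the trailing separator in the shell.
log_content=$(sed -e "/./s/^/to_clob('/" -e "/./s/$/')||chr(10)||/" "$log_file")
log_content=$(echo $log_content)               # unquoted on purpose, like the echo above: joins the lines
log_content=${log_content%"||chr(10)||"}       # drop the final ||chr(10)||
printf '%s\n' "$log_content"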

Grep -f and only return the first match

I'm working with a large CSV that follows a basic process.
Backup the working original
Generate a skeleton CSV
Read from another CSV, format the contents, and then append it to the skeleton
Append the data from the backup to the new one.
The issue I'm running into is that when I read in the contents from the backup, I'm using grep -Ev -f with a file containing regexes to exclude undesired data from the backup from being included in the next revision. This currently presents a problem because grep appears to evaluate each regex in the file against every line from STDIN, which causes duplicates. The simple solution would be to pipe it through sort | uniq and call it a day, but that would break the formatting of the CSV currently in use. I can elaborate if needed, but the short of it is: I run a script to bulk-process IP addresses, but the file is also edited manually by other people, and with the current form of the script the final output is all of the automated content with the manual entries at the bottom of the file.
So, is there any way, without some ugly looping of grep, to tell it to stop evaluating a line after a pattern is matched? Using -m 1 stops grep after the first match in the whole stream, whereas I need it to stop after each new line.
For the task you want to accomplish, it would be best in my opinion to use AWK. You can find an excellent tutorial for AWK at http://www.grymoire.com/Unix/Awk.html. You basically need to change the input field separator for awk with -F (note that lowercase -f names the awk program file, so the separator option is the capital -F):
awk -F',' -f foo.awk bar.dat
As far as the problem with sorting is concerned, follow this: http://www.linuxquestions.org/questions/linux-general-1/how-to-use-awk-to-sort-243177/
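Since the objection to sort | uniq is that it destroys the existing ordering, one awk idiom worth mentioning (a sketch, not part of the original answer; the file names are placeholders) removes duplicate lines while keeping each first occurrence in place:
# Print a line only the first time it is seen: seen[$0]++ is 0 (false) on the
# first occurrence, so !seen[$0]++ is true exactly once per distinct line.
awk '!seen[$0]++' backup.csv > deduped.csv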

Bash script to add new directories into a PostgreSQL table

I'm trying to write a script which lists a directory and creates an SQL script to insert these directories. The problem is I only want to insert new directories. Here is what I have so far:
#If file doesn't exist add the search path test
if [ ! -e /home/aydin/movies.sql ]
then
    echo "SET SEARCH_PATH TO noti_test;" >> /home/aydin/movies.sql;
fi
cd /media/htpc/
for i in *
do
    #for each directory escape any single quotes
    movie=$(echo $i | sed "s:':\\\':g" )
    #build sql insert string
    insertString="INSERT INTO movies (movie) VALUES (E'$movie');";
    #if sql string exists in file already
    if grep -Fxq "$insertString" /home/aydin/movies.sql
    then
        #comment out string
        sed -i "s/$insertString/--$insertString/g" /home/aydin/movies.sql
    else
        #add sql string
        echo $insertString >> /home/aydin/movies.sql;
    fi
done;
#execute script
psql -U "aydin.hassan" -d "aydin_1.0" -f /home/aydin/movies.sql;
It seems to work apart from one thing: the script doesn't recognise entries with single quotes in them, so upon running the script again with no new dirs, this is what the file looks like:
--INSERT INTO movies (movie) VALUES (E'007, Moonraker (1979)');
--INSERT INTO movies (movie) VALUES (E'007, Octopussy (1983)');
INSERT INTO movies (movie) VALUES (E'007, On Her Majesty\'s Secret Service (1969)');
I'm also open to suggestions on a better way to do this; my process seems pretty long-winded and inefficient :)
Script looks generally good to me. Consider the revised version (untested):
#! /bin/bash
#If file doesn't exist add the search path test
if [ ! -e /home/aydin/movies.sql ]
then
    echo 'SET search_path=noti_test;' > /home/aydin/movies.sql;
fi
cd /media/htpc/
for i in *
do
    movie="$i"   # no escaping needed: single quotes are fine inside dollar-quoting
    #build sql insert string - single quotes work fine inside dollar-quoting
    insertString="INSERT INTO movies (movie) SELECT \$x\$$movie\$x\$
WHERE NOT EXISTS (SELECT 1 FROM movies WHERE movie = \$x\$$movie\$x\$);"
    #no need for grep. SQL is self-contained.
    echo "$insertString" >> /home/aydin/movies.sql
done
#execute script
psql -U "aydin.hassan" -d "aydin_1.0" -f /home/aydin/movies.sql;
To start a new file, use > instead of >>
Use single quotes ' for string constants without variables to expand
Use PostgreSQL dollar-quoting so you don't have to worry about single quotes in the strings. You'll have to escape the $ character so the shell doesn't give it its special meaning.
Use an "impossible" string for the dollar-quote, so it cannot appear in the string. If you don't have one, you can test for the quote-string and alter it in the unlikely case it should be matched, to be absolutely sure.
Use SELECT .. WHERE NOT EXISTS for the INSERT to automatically prevent already existing entries from being re-inserted. This prevents duplicate entries in the table completely - not just among the new entries.
An index on movies.movie (possibly, but not necessarily UNIQUE) would speed up the INSERTs.
Why bother with grep and sed and not just let the database detect duplicates?
Add a unique index on movie, create a new (temporary) insert script on each run, and then execute it with autocommit (the default) or with the -v ON_ERROR_ROLLBACK=1 option of psql. To get a full insert script of your movie database, dump it with the --column-inserts option of pg_dump.
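A minimal sketch of that unique-index approach (the index name, the temporary file path, and keeping the search_path line are assumptions, not part of the original answer):
#!/bin/bash
# One-time setup: a unique index makes duplicate INSERTs fail instead of creating copies.
psql -U "aydin.hassan" -d "aydin_1.0" -c 'CREATE UNIQUE INDEX movies_movie_idx ON movies (movie);'

# Each run: rebuild the insert script from scratch; duplicate rows simply error and are skipped.
echo 'SET search_path=noti_test;' > /tmp/movies_new.sql
cd /media/htpc/
for i in *; do
    printf "INSERT INTO movies (movie) VALUES (\$x\$%s\$x\$);\n" "$i" >> /tmp/movies_new.sql
done
psql -U "aydin.hassan" -d "aydin_1.0" -v ON_ERROR_ROLLBACK=1 -f /tmp/movies_new.sql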
Hope this helps.
There's a utility daemon called incron, which will fire your script whenever a file is written in a watched directory. It uses kernel events, no loops - Linux only.
In its config (full file path):
/media/htpc IN_CLOSE_WRITE /home/aydin/added.sh $@/$#
Then the simplest added.sh script, without any param check:
#!/bin/bash
cat <<-EOsql | psql -U "aydin.hassan" -d "aydin_1.0"
INSERT INTO movies (movie) VALUES (E'$1');
EOsql
You can have thousands of files in one directory without the issues you can face with your original script.
