Get the table names from a file using a UNIX script - shell

I have a sample file, shown below. It is an SQL*Loader control file:
LOAD DATA
APPEND
INTO TABLE XXWIN_TMP_LOADER_TAB
( seq POSITION(1:10) INTEGER EXTERNAL
,h_record POSITION(11:20) CHAR
,h_file_name POSITION(21:55) CHAR
)
APPEND
INTO TABLE XXWIN_SQL_LOADER_TAB
( seq POSITION(1:10) INTEGER EXTERNAL
,h_record POSITION(11:20) CHAR
,h_file_name POSITION(21:55) CHAR
)
APPEND
INTO TABLE XXWIN_SQL_LOADER_TAB
( seq POSITION(1:10) INTEGER EXTERNAL
,h_record POSITION(11:20) CHAR
,h_file_name POSITION(21:55) CHAR
)
I would like to select all the table names occurring in the file that start with 'XX_' and end with '_TAB', and store them in an array using a UNIX script.
Please advise.
Thanks,
Arun

If the file syntax is not changing (the table names start with XX, not XX_):
tnames=`grep -o "TABLE XX[^ ].*_TAB" <file_name> | sed 's/TABLE //g'`
for tn in $tnames; do echo $tn; done
Change the <file_name> to the name of the file.
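Note that tnames here is a single whitespace-separated string rather than a true array. If you do want a Bash array, one option (a sketch relying on default word splitting) is:
tarr=($tnames)                  # word-splits the string into a Bash array
echo "found ${#tarr[@]} table names"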

You don't say which shell, but since sh doesn't support arrays I'm assuming Bash.
tables=($(sed -n '/TABLE /s/TABLE \(XX[^ ]*TAB\) *$/\1/p' inputfile))
for table in "${tables[@]}"
do
echo "$table"
done
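On Bash 4 or newer, mapfile reads the matches straight into the array without relying on word splitting (a sketch using the same sed command as above):
mapfile -t tables < <(sed -n '/TABLE /s/TABLE \(XX[^ ]*TAB\) *$/\1/p' inputfile)
printf '%s\n' "${tables[@]}"    # one table name per line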

Related

Export HQL output to CSV in beeline

I am trying to export my HQL output to CSV in beeline using the command below:
beeline -u "jdbc:hive2://****/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"?tez.queue.name=devices-jobs --outputformat=csv2 -e "use schema_name; select * from table_name where open_time_new>= '2020-07-13' and open_time_new < '2020-07-22'" > filename.csv
The problem is that some column values in the table contain commas, which pushes the data of that column into the next column.
For example:
| abcd | as per data,outage fault,xxxx.
| xyz |as per the source,ghfg,hjhjg.
The above data will get saved as 4 columns instead of 2.
Need help!
Try the approach with a local directory:
insert overwrite local directory '/tmp/local_csv_report'
row format delimited fields terminated by "," escaped by '\\'
select *
from table_name
where open_time_new >= '2020-07-13'
and open_time_new < '2020-07-22'
This will create several CSV files under your local /tmp/local_csv_report directory, so a simple cat afterwards will merge the results into a single file.
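For instance, a hedged sketch of that merge step (report.csv is just an example name):
cat /tmp/local_csv_report/* > report.csv    # concatenate the part files into one CSV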

Replace an array of strings (passed as an argument to the script) in an HQL file using a Bash shell script?

I have a script which accepts 3 arguments $1 $2 $3
but $3 is an array like ("2018" "01")
so I am executing my script as :
sh script.sh Employee IT "2018 01"
and there is an HQL file (emp.hql) in which I want to replace my partition columns with the passed array, like below:
***"select deptid , employee_name from {TBL_NM} where year={par_col[i]} and month={par_col[i]}"***
Below is the code I have tried:
Table=$1
dept=$2
Par_cols=($3)
for i in "${par_cols[#]}" ;do
sed -i "/${par_col[i]}/${par_col[i]}/g" /home/hk/emp.hql
done
Error :
sed: -e expression #1, char 0: no previous regular expression
sed: -e expression #2, char 0: no previous regular expression
But I think my logic to replace the partition columns is wrong. Could you please help me with this?
Desired Output in HQL file :
select deptid ,employee_name from employee where year=2018 and month=01
This is somewhat related to:
Shell script to find, search and replace array of strings in a file
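A hedged sketch of a corrected version (assumptions: the placeholders appear literally in emp.hql as {TBL_NM}, {par_col[0]} and {par_col[1]}; the missing s command in the sed expression and the [#]-for-[@] mix-up were the immediate errors):
#!/bin/bash
table=$1                  # e.g. Employee
dept=$2                   # e.g. IT
par_cols=($3)             # "2018 01" word-splits into (2018 01)

# substitute the table-name placeholder first
sed -i "s/{TBL_NM}/${table}/g" /home/hk/emp.hql

# replace each {par_col[i]} placeholder with the matching array element
for i in "${!par_cols[@]}"; do
    sed -i "s/{par_col\[$i\]}/${par_cols[$i]}/g" /home/hk/emp.hql
done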

bash: separate blocks of lines between pattern x and y

I have a similar question to this one: Sed/Awk - pull lines between pattern x and y. However, in my case I want to output each block of lines to an individual file (named after the first pattern).
Input example:
-- filename: query1.sql
-- sql comments goes here or else where
select * from table1
where id=123;
-- eof
-- filename: query2.sql
insert into table1
(id, date) values (1, sysdate);
-- eof
I want the bash script to generate 2 files: query1.sql and query2.sql with the following content:
query1.sql:
-- sql comments goes here or else where
select * from table1
where id=123;
query2.sql:
insert into table1
(id, date) values (1, sysdate);
Thank you
awk '/-- filename/{if(f)close(f); f=$3;next} !/eof/&&/./{print $0 >> f}' input
A brief explanation:
/-- filename/{if(f)close(f); f=$3;next}: locate the records containing 'filename' and assign the third field (the file name) to f
!/eof/&&/./{print $0 >> f}: if the following lines neither contain 'eof' nor are empty, append them to the corresponding file.
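For readability, the same program written out with comments (a sketch; behavior matches the one-liner above):
awk '
/-- filename/ {        # header line, e.g. "-- filename: query1.sql"
    if (f) close(f)    # close the previous output file, if any
    f = $3             # the third field is the output file name
    next               # do not copy the header line itself
}
!/eof/ && /./ {        # skip "-- eof" markers and empty lines
    print $0 >> f
}' input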
This might work for you (GNU sed):
sed -r '/-- filename: (\S+)/!d;s##/&/,/-- eof/{//d;w \1#p;s/.*/}/p;d' file |
sed -nf - file
Create a sed script from the input file, then run it against the input file.
N.B. Two lines are needed for each query as the program for the query must be surrounded by braces and the w command must end in a newline.
Using GNU awk to handle multiple open files for you:
awk '/^-- eof/{f=0} f{print > out} /^-- filename/{out=$3; f=1}' file
or with any awk:
awk '/^-- eof/{f=0} f{print > out} /^-- filename/{close(out); out=$3; f=1}' file

Command-line arguments in Hive (.hql) files from a bash script

I have a main bash script that runs several other bash scripts and HQL files. The HQL files contain Hive queries, each with a WHERE clause on a date field. I am trying to automate a process, and I need the WHERE clause to change based on today's date (which is obtained from the main bash script).
For example, the .hql file looks like this:
This is selectrows.hql:
DROP TABLE IF EXISTS tv.events_tmp;
CREATE TABLE tv.events_tmp
( origintime STRING,
deviceid STRING,
clienttype STRING,
loaddate STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 'hdfs://nameservice1/data/full/events_tmp';
INSERT INTO TABLE tv.events_tmp SELECT origintime, deviceid, clienttype, loaddate FROM tv.events_tmp WHERE origintime >= '2015-11-02 00:00:00' AND origintime < '2015-11-03 00:00:00';
Since today is 2015-11-11, I want to be able to pass (today - 9 days) and (today - 8 days) to the .hql script from the bash script. Is there a way to pass these two variables from the bash script to the .hql file?
So the main bash script looks like this:
#!/bin/bash
# today's date
prodate=`date +%Y-%m-%d`
echo $prodate
dateneeded=`date -d "$prodate - 8 days" +%Y-%m-%d`
echo $dateneeded
# CREATE temp table
beeline -u 'jdbc:hive2://datanode:10000/;principal=hive/datanode#HADOOP.INT.BELL.CA' -d org.apache.hive.jdbc.HiveDriver -f /home/automation/tv/selectrows.hql
echo "created table"
Thanks in advance.
You can use the beeline -e option to execute queries passed as strings, and then interpolate the date parameters into the string.
#!/bin/bash
# today's date
prodate=`date +%Y-%m-%d`
echo $prodate
dateneeded8=`date -d "$prodate - 8 days" +%Y-%m-%d`
dateneeded9=`date -d "$prodate - 9 days" +%Y-%m-%d`
echo $dateneeded8
echo $dateneeded9
hql="
DROP TABLE IF EXISTS tv.events_tmp;
CREATE TABLE tv.events_tmp
( origintime STRING,
deviceid STRING,
clienttype STRING,
loaddate STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 'hdfs://nameservice1/data/full/events_tmp';
INSERT INTO TABLE tv.events_tmp SELECT origintime, deviceid, clienttype, loaddate FROM tv.events_tmp WHERE origintime >= '"
echo "$hql""$dateneeded9""' AND origintime < '""$dateneeded8""';"
# CREATE temp table
beeline -u 'jdbc:hive2://datanode:10000/;principal=hive/datanode#HADOOP.INT.BELL.CA' -d org.apache.hive.jdbc.HiveDriver -e "$hql""$dateneeded9""' AND origintime < '""$dateneeded8""';"
echo "created table"
An alternative way to pass arguments:
Create a Hive .hql file with defined variables:
vi multi_var_file.hql
SELECT * FROM TEST_DB.TEST_TB WHERE TEST1='${var_1}' AND TEST2='${var_2}';
Then pass the same variables into the Hive script when running it:
hive -hivevar var_1='TEST1' -hivevar var_2='TEST2' -f multi_var_file.hql
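The same mechanism should also work through beeline (a hedged sketch; --hivevar support in your beeline version is an assumption, and the start_date/end_date placeholder names are examples), which would let selectrows.hql stay parameterized with ${start_date} and ${end_date} in its WHERE clause, following the same ${var} convention as above:
beeline -u 'jdbc:hive2://datanode:10000/;principal=hive/datanode#HADOOP.INT.BELL.CA' \
    -d org.apache.hive.jdbc.HiveDriver \
    --hivevar start_date="$dateneeded9" --hivevar end_date="$dateneeded8" \
    -f /home/automation/tv/selectrows.hql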

How to check the string length using grep?

I have a lot of Teradata SQL files. An example file looks like the one below:
create multiset volatile table abcde_fghijk_lmnop as(
select
a.oppnl3_budssstr as nip,
from T45_BACKJJU_33KUT.BRANDFO9 a
) with data on commit preserve rows;
create multiset volatile table mari_lee as(
select
b.getter3,
from maleno_fugi75_pratq b
) with data on commit preserve rows;
create multiset table blabla1 as (
select
a.atomic94,
from b4ty7_manto.pretyu59_bxcx a
) with data on commit preserve rows;
CREATE multiset table blablabla2 AS (
SELECT
a.prompter_to12
FROM tresh_old44 a
) WITH data on commit preserve rows;
CREATE multiset table blablablabla3 AS (
SELECT
c.future_opt86
FROM GFTY_133URO c
) WITH data on commit preserve rows;
I want to create a grep command that checks the length of each table name, which can't exceed 10 characters.
I have created a few greps, but none of them work, and I don't know why. What have I done wrong?
for f in /path/to/sql/files/*.sql; do
if grep -q ' table \{1,10\}' "$f"; then
echo "correct length of table name $f"
fi
done
Other greps I tried:
if grep -q ' table \{1,10\} as ' "$f"; then
if grep -q ' table \[[:alnum:]]\{1,10\} ' "$f"; then
if grep -q ' table\[[:space:]][[:alnum:]]\{1,10\} ' $f; then
There are a couple of problems with your attempts. Firstly, it looks like you're escaping the [ in some of your bracket expressions, which means that the [ will be interpreted as a literal character instead. Secondly, you need to take care to match 1 to 10 legal characters, followed by a different character.
This pattern does what you want (I removed the -q so that you can see which table definitions match):
$ grep ' table [[:alnum:]_]\{1,10\}[^[:alnum:]_]' file
create multiset volatile table mari_lee as(
create multiset table blabla1 as (
CREATE multiset table blablabla2 AS (
This pattern matches 1 to 10 alphanumeric characters or underscores, followed by a different character, meaning that the longer table names no longer match.
As it appears that the casing is inconsistent, you should probably also use the -i switch to grep, to enable case-insensitive matching. Otherwise, any definitions that use "TABLE" would not match.
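For example, the case-insensitive variant of the same pattern:
grep -i ' table [[:alnum:]_]\{1,10\}[^[:alnum:]_]' file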
Use grep with word boundary to list only valid table names:
grep -E 'table +.{1,10}\b' "$f"
create multiset volatile table mari_lee as(
create multiset table blabla1 as (
CREATE multiset table blablabla2 AS (
To suppress output use -q and check return status:
grep -qE 'table +.{1,10}\b' "$f"
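Putting that back into the original loop (a sketch; the pattern is tightened to alphanumerics/underscores and made case-insensitive, per the first answer):
for f in /path/to/sql/files/*.sql; do
    if grep -qiE ' table +[[:alnum:]_]{1,10}[^[:alnum:]_]' "$f"; then
        echo "correct length of table name $f"
    fi
done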
