Bash - String verification method

I have a lot of Teradata SQL files (example code from one of these files is below).
create multiset volatile table abc_mountain_peak as(
select
a.kkpp_nip as nip,
from BM_RETABLE_BATOK.EDETON a
) with data on commit preserve rows;
create multiset table qazxsw_asd_1 as (
select
a.address_id,
from DE30T_BIOLOB.HGG994P_ABS_ADDRESS_TRE a,
) with data on commit preserve rows;
create multiset volatile table xyz_sea_depth as(
select
a.trip,
from tele_line_tryt a
) with data on commit preserve rows;
CREATE multiset table wsxzaq_zxc_2 AS (
SELECT
a.bend_data
FROM lokl_station a ,
) WITH data on commit preserve rows;
CREATE multiset table rfvbgt_ttuop_3 AS (
SELECT
a.heret_bini
FROM fvgty_blumion a ,
) WITH data on commit preserve rows;
DROP qazxsw_asd_1;
DROP wsxzaq_zxc_2;
.EXIT
What I need to do is to create a bash script that verifies whether the multiset tables are dropped.
Two kinds of tables are created:
multiset volatile tables (which shouldn't be dropped), and
multiset tables (which must be dropped)
In my example code, 2 of the 3 multiset tables are dropped (which is correct), and one of them is not (which is incorrect).
Do you have any idea how to create a script that could verify something like that (i.e. report that one table, or several tables, aren't dropped)? I am a real beginner in bash. My idea (which could be wrong) is to create an array holding the names of the multiset tables (but not the multiset volatile tables), then create another array from the 'drop' statements with the names of the dropped tables, and finally check whether every table from the first array is also in the second one.
What do you think? Any help will be gratefully appreciated.

You can do it fairly easily by reading each line of the file, isolating the table names from the multiset table statements into one array (dropnames), and isolating the table names following the DROP statements into another array (droptable). Then it is just a matter of comparing both arrays to find any table in one that is not in the other. A short script like the following will do it for you:
#!/bin/bash

declare -a tmparray                     ## declare array names
declare -a dropnames
declare -a droptable

volstr="multiset volatile table"        ## set query strings
dropstr="multiset table"

## read all lines and collect table names
while read -r line; do
    [[ $line =~ $dropstr ]] && {        ## collect "multiset table" names (4th word)
        tmparray=( $line )
        dropnames+=( "${tmparray[3]}" )
    }
    [[ $line =~ DROP ]] && {            ## collect DROP table names (strip "DROP " and ";")
        tmp="${line/DROP /}"
        droptable+=( "${tmp%;*}" )
    }
    unset tmparray                      ## clear the temporary array
done

## compare droptable to dropnames, print missing table(s)
if [ ${#dropnames[@]} -gt ${#droptable[@]} ]; then
    printf "\n The following tables are missing from DROP tables:\n\n"
    for i in "${dropnames[@]}"; do
        found=0
        for j in "${droptable[@]}"; do
            [ "$i" = "$j" ] && found=1 && break
        done
        [ $found -eq 0 ] && printf " %s\n" "$i"
    done
elif [ ${#dropnames[@]} -lt ${#droptable[@]} ]; then
    printf "\n The following DROP statements have no matching multiset table:\n\n"
    for i in "${droptable[@]}"; do
        found=0
        for j in "${dropnames[@]}"; do
            [ "$i" = "$j" ] && found=1 && break
        done
        [ $found -eq 0 ] && printf " %s\n" "$i"
    done
fi

printf "\n"
exit 0
Output
$ bash sqlfinddrop.sh <dat/sql.dat
The following tables are missing from DROP tables:
rfvbgt_ttuop_3

I would do it in two parts using sed:
Create list of creates:
sed -ne 's/^.*create multiset \(volatile \)\?table \(\w\+\).*$/\2/Ip' INPUT FILES | sort > creates.txt
Create list of deletes:
sed -ne 's/^.*drop \(\w\+\).*$/\1/Ip' INPUT FILES | sort > drops.txt
Tables which were created and dropped:
join creates.txt drops.txt
Tables created and not dropped:
combine creates.txt not drops.txt
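Here combine comes from the moreutils package; if it isn't installed, comm -23 on the two sorted lists gives the same "created but not dropped" set. A minimal end-to-end sketch, assuming a single input file named example.sql (a placeholder) and GNU sed, and restricting the create list to non-volatile multiset tables since only those must be dropped:
#!/bin/bash
# Placeholder filename; point this at your own SQL file(s).
sqlfile=example.sql
# Non-volatile "create multiset table" names, matched case-insensitively, sorted.
sed -nE 's/^.*create multiset table (\w+).*$/\1/Ip' "$sqlfile" | sort > creates.txt
# Table names following DROP, sorted.
sed -nE 's/^.*drop (\w+).*$/\1/Ip' "$sqlfile" | sort > drops.txt
# Lines only in creates.txt, i.e. created but never dropped.
comm -23 creates.txt drops.txt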

Related

Export results from DB2 to CSV including column names via bash

This question branches off a question already asked.
I want to make a csv file with the db2 results including column names.
EXPORT TO ...
SELECT 1 as id, 'COL1', 'COL2', 'COL3' FROM sysibm.sysdummy1
UNION ALL
(SELECT 2 as id, COL1, COL2, COL3 FROM myTable)
ORDER BY id
While this does work, I am left with an unwanted id column full of 1's and 2's.
Is there a way to do this via the db2 command or a full bash alternative without redundant columns while keeping the header at the top?
e.g.
Column 1 Column 2 Column 3
data 1 data 2 data3
... ... ...
instead of:
1 Column 1 Column 2 Column 3
2 data 1 data 2 data3
2 ... ... ...
All the answers I've seen use two separate export statements. The first generates the column headers:
db2 "EXPORT TO /tmp/header.csv of del
SELECT
SUBSTR(REPLACE(REPLACE(XMLSERIALIZE(CONTENT XMLAGG(XMLELEMENT(NAME c,colname)
ORDER BY colno) AS VARCHAR(1500)),'<C>',', '),'</C>',''),3)
FROM syscat.columns WHERE tabschema=${SCHEMA} and tabname=${TABLE}"
then the query body
db2 "EXPORT TO /tmp/body.csv of del
SELECT * FROM ${SCHEMA}.${TABLE}"
then
cat /tmp/header.csv /tmp/body.csv > ${TABLE}.csv
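Tied together, the three steps might look like the following minimal sketch; the database name, SCHEMA, and TABLE values are placeholders to adapt to your environment:
#!/bin/bash
# Placeholder values; adjust to your environment.
DB=MYDB
SCHEMA=MYSCHEMA
TABLE=MYTABLE
db2 connect to ${DB}
# 1) Header row built from the catalog.
db2 "EXPORT TO /tmp/header.csv OF DEL
SELECT SUBSTR(REPLACE(REPLACE(XMLSERIALIZE(CONTENT XMLAGG(XMLELEMENT(NAME c, colname)
ORDER BY colno) AS VARCHAR(1500)),'<C>',', '),'</C>',''),3)
FROM syscat.columns WHERE tabschema='${SCHEMA}' AND tabname='${TABLE}'"
# 2) The data itself.
db2 "EXPORT TO /tmp/body.csv OF DEL SELECT * FROM ${SCHEMA}.${TABLE}"
# 3) Header on top, data below.
cat /tmp/header.csv /tmp/body.csv > "${TABLE}.csv"
db2 connect reset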
If you just want headers for the extracted data, want those headers always on top, and want to be able to rename them so they appear more user-friendly, all in one CSV file, you can do the following:
# Creates headers and new output file
HEADERS="ID,USERNAME,EMAIL,ACCOUNT DISABLED?"
echo "$HEADERS" > "$OUTPUT_FILE"
# Gets results from database
db2 -x "select ID, USERNAME, DISABLED FROM ${SCHEMA}.USER WHERE lcase(EMAIL)=lcase('$USER_EMAIL')" | while read ID USERNAME DISABLED ;
do
# Appends result to file
echo "${ID},${USERNAME},${USER_EMAIL},${DISABLED}" >> "$OUTPUT_FILE"
done
No temporary files or merging required.
Db2 for Linux/Unix/Windows lacks a (long overdue) simple option on the export command for this common requirement.
But using the bash shell you can run two separate exports (one for the column headers, the other for the data) and concatenate the results into a file via an intermediate named pipe.
Using an intermediate named pipe means you don't need two flat-file copies of the data.
It is ugly and awkward, but it works.
Example fragment (you can initialize the variables to suit your environment):
mkfifo ${target_file_tmp}
(( $? != 0 )) && echo "ERROR: failed to create named pipe ${target_file_tmp}" && exit 1
db2 -v "EXPORT TO ${target_file_header} of del SELECT 'COL1', 'COL2', 'COL3' FROM sysibm.sysdummy1 "
cat ${target_file_header} ${target_file_tmp} >> ${target_file} &
(( $? > 0 )) && echo "Failed to append to ${target_file}. Check permissions and free space" && exit 1
db2 -v "EXPORT TO ${target_file_tmp} of del SELECT COL1, COL2, COL3 FROM myTable ORDER BY 1 "
rc=$?
(( rc == 1 )) && echo "Export found no rows matching the query" && exit 1
(( rc == 2 )) && echo "Export completed with warnings, your data might not be what you expect" && exit 1
(( rc > 2 )) && echo "Export failed. Check the messages from export" && exit 1
This would work for your simple case
EXPORT TO ...
SELECT C1, C2, C3 FROM (
SELECT 1 as id, 'COL1' as C1, 'COL2' as C2, 'COL3' as C3 FROM sysibm.sysdummy1
UNION ALL
(SELECT 2 as id, COL1, COL2, COL3 FROM myTable)
)
ORDER BY id
Longer term, EXTERNAL TABLE support (already in Db2 Warehouse), which has an INCLUDEHEADER option, will (I guess) appear in Db2 at some point.
I wrote a stored procedure that extracts the header via the describe command. The names can be retrieved from a temporary table and exported to a file. The only thing that is still not possible is concatenating the files via SQL, so a cat of both files redirected into a third file is necessary as a last step.
CALL DBA.GENERATE_HEADERS('SELECT * FROM SYSCAT.TABLES') #
EXPORT TO myfile_header OF DEL SELECT * FROM SESSION.header #
EXPORT TO myfile_body OF DEL SELECT * FROM SYSCAT.TABLES #
!cat myfile_header myfile_body > myfile #
The code of the stored procedure is at: https://gist.github.com/angoca/8a2d616cd1159e5d59eff7b82f672b72
More information at: https://angocadb2.blogspot.com/2019/11/export-headers-of-export-in-db2.html.

How to merge two data files based on one column

I have two data files saved as .txt in a folder.
The first file, data1, contains a single column, from, as follows:
from
A0A0A6YXQ7
A0A0A6YXS5
A0A0A6YXW8
A0A0A6YXX6
A0A0A6YXZ1
A0A0A6YY28
A0A0A6YY43
A0A0A6YY47
A0A0A6YY78
A0A0A6YY89
A0A0A6YY91
A0A0A7NQN9
and the second file, data2, has two columns, from and to:
from to
A0A0A6YXQ7 Myo1f
A0A0A6YXW8 Pak2
A0A0A6YXX6 Arhgap15
A0A0A6YXZ1 Igtp
A0A0A6YY28 pol
A0A0A6YY47 MumuTL
A0A0A6YY78 MumuTL
A0A0A6YY78 MumuTLM
A0A0A6YY91 MumuTL
A0A0A6YY91 MumuTLM
data1 and data2 both have a column named from.
All strings in data1 should also be in data2, but some are not.
I want to load the two files and, if any string from data1 does not exist in data2, add it to the output anyway with a placeholder.
For example, the following strings are missing from data2:
A0A0A6YXS5, A0A0A6YY43, A0A0A6YY89 and A0A0A7NQN9
so the output will look like this:
From To
A0A0A6YXQ7 Myo1f
A0A0A6YXS5 -
A0A0A6YXW8 Pak2
A0A0A6YXX6 Arhgap15
A0A0A6YXZ1 Igtp
A0A0A6YY28 pol
A0A0A6YY43 -
A0A0A6YY47 MumuTL
A0A0A6YY78 MumuTL;MumuTLM
A0A0A6YY89 -
A0A0A6YY91 MumuTL;MumuTLM
A0A0A7NQN9 -
How about:
#!/bin/bash

declare -A hash

# scan data2 and build a key -> value(s) table
while read -r key value; do
    if [ -z "${hash[$key]}" ]; then
        hash[$key]=$value
    else
        hash[$key]="${hash[$key]};$value"
    fi
done < data2

# read data1 as keys and print the appropriate value(s)
while read -r key; do
    if [ -z "${hash[$key]}" ]; then
        echo "$key -"
    else
        echo "$key ${hash[$key]}"
    fi
done < data1
Note that the "from"/"to" header pair happens to be handled correctly as well.
Hope this helps.
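An alternative is to let awk do the same key lookup in one pass; a minimal sketch, assuming whitespace-separated files named data2 and data1 as above:
# first pass (data2): build a key -> value(s) map; second pass (data1): look each key up
awk 'NR == FNR { to[$1] = ($1 in to) ? to[$1] ";" $2 : $2; next }
     { print $1, (($1 in to) ? to[$1] : "-") }' data2 data1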

using for loop when scripting bash shell in linux

I have this script:
#!/bin/bash
echo Id,Name,Amount,TS > lfs.csv
I want to insert values that match the columns I created (as above in the script); for example, I want to insert: 56,"Danny",579,311413567
I want to insert them using a 'for' loop that keeps inserting without stopping but changes the values on each insert.
More detail on exactly what you would like to achieve would be useful, so I made an infinite for loop that appends lines to the CSV with numbers incremented via the loop counter $id. (I cannot comment yet to ask you.)
Update:
I am still using an infinite loop to count upward endlessly, plus a variable (u_id) that counts from 1 to 100 and resets back to 1 once it reaches 100.
#!/bin/bash
echo 'Id,Name,Amount,TS,unique_ID' > lfs.csv
u_id=0
for (( id=1 ; ; id++ ))
do
    [[ $u_id == 100 ]] && (( u_id = 1 )) || (( u_id += 1 ))
    echo $id",Danny_"$id","$id","$id","$u_id >> lfs.csv
done
If you would like to start Amount and TS from a bigger number, you can do that by changing $id to something like $(( id + 50000 )):
echo $id",Danny_"$id","$(( id + 300 ))","$(( id + 50000 ))","$u_id >> lfs.csv

Storing multiple columns of data from a file in a variable

I'm trying to read a file and pull two important pieces of data from each line to use in a bash script: a string and then a number, for example:
Box 12
Toy 85
Dog 13
Bottle 22
I was thinking I could write a while loop to loop through the file and store the data into a variable. However I need two different variables, one for the number and one for the word. How do I get them separated into two variables?
Example code:
#!/bin/bash
declare -a textarr numarr
while read -r text num; do
    textarr+=("$text")
    numarr+=("$num")
done <file
echo ${textarr[1]} ${numarr[1]} #will print Toy 85
The data are stored in two array variables: textarr and numarr.
You can access each element using an index, ${textarr[$index]}, or all of them at once with ${textarr[@]}.
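For example, to walk both arrays in parallel by index (a small sketch reusing textarr and numarr from above):
for i in "${!textarr[@]}"; do
    printf '%s -> %s\n' "${textarr[$i]}" "${numarr[$i]}"
done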
To read all the data into a single associative array (in bash 4.0 or newer):
#!/bin/bash
declare -A data=( )
while read -r key value; do
    data[$key]=$value
done <file
With that done, you can retrieve a value by key efficiently:
echo "${data[Box]}"
...or iterate over all keys:
for key in "${!data[@]}"; do
    value=${data[$key]}
    echo "Key $key has value $value"
done
You'll note that read takes multiple names on its argument list. When given more than one argument, it splits fields by IFS, putting columns into their respective variables (with the entire rest of the line going into the last variable named, if more columns exist than variables are named).
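A quick illustration of that splitting behavior (assuming the default IFS):
# 'rest' receives everything after the first two whitespace-separated fields
read -r key value rest <<< "Box 12 extra words here"
printf 'key=%s value=%s rest=%s\n' "$key" "$value" "$rest"
# prints: key=Box value=12 rest=extra words here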
Here I provide my own solution, which is open to discussion; I am not sure whether it is a good one. Piping into a while read loop runs the loop in a subshell, so it cannot update variables outside the loop; the approach below sidesteps that by capturing the awk output in a variable first. Here is example code which you can modify to suit your own needs. If you have more columns of data to use, a slight adjustment is needed.
#!/bin/sh
res=$(awk 'BEGIN{OFS=" "}{print $2, $3}' mytabularfile.tab)
n=0
for x in $res; do
    row=$(expr $n / 2)
    col=$(expr $n % 2)
    #echo "row: $row column: $col value: $x"
    if [ $col -eq 0 ]; then
        if [ $n -gt 0 ]; then
            echo "row: $row "
            echo col1=$col1 col2=$col2
        fi
        col1=$x
    else
        col2=$x
    fi
    n=$(expr $n + 1)
done
row=$(expr $row + 1)
echo "last row: $row col1=$col1 col2=$col2"

Merging rows in .csv in order

After analysis of brain scans I ended up with around 1000 .csv files, one for each scan. I've merged them into one, in order (by subject ID and date). My problem is that some subjects had two or more consecutive scans and some had only one. The database now looks like this:
ID, CC_area, CC_perimeter, CC_circularity
024_S_0985, 407.00, 192.15, 0.138530 //first scan of A
024_S_0985, 437.50, 204.80, 0.131074 //second scan of A
024_S_0985, 400.75, 198.80, 0.127420 //third scan of A
024_S_1063, 544.50, 214.34, 0.148939 //first and only scan of B
024_S_1171, 654.75, 240.33, 0.142453 //first scan of C
024_S_1171, 659.50, 242.21, 0.141269 //second scan of C
...
But I want it to look like this:
ID, CC_area, CC_perimeter, CC_circularity, CC_area2, CC_perimeter2, CC_circularity2, CC_area3, CC_perimeter3, CC_circularity3, ..., CC_circularity6
024_S_0985, 407.00, 192.15, 0.138530, 437.50, 204.80, 0.131074, 400.75, 198.80, 0.127420, ... ,
024_S_1063, 544.50, 214.34, 0.148939,,,,,, ...,
024_S_1171, 654.75, 240.33, 0.142453, 659.50, 242.21, 0.141269,,, ... ,
...
What is important is that the order of the data must not be changed and that the number of rows per ID is not known (it varies from 1 to 6). (So first the columns of scan 1, then scan 2, etc.) Could you help me with, or provide, a solution for that using bash? I am not experienced in programming and I have lost hope that I could do it myself.
You can combine the lines that share the same ID (or initial index) using a normal while read loop and acting on three conditions: (1) whether it is the first line following the header; (2) whether the current index is equal to the last; and (3) whether the current index differs from the last. There are a number of ways to approach this, but a short bash script could look like the following:
#!/bin/bash

fn="${1:-/dev/stdin}"                ## accept filename or stdin

[ -r "$fn" ] || {                    ## validate file is readable
    printf "error: file not found: '%s'\n" "$fn"
    exit 1
}

declare -i cnt=0                     ## flag for 1st iteration

while read -r line; do               ## for each line in file
    ## read header, print & continue
    [ "${line%%,*}" = ID ] && printf "%s\n" "$line" && continue
    line="${line%% //*}"             ## strip trailing //first scan of A....
    idx="${line%%,*}"                ## parse ID index from line
    line="${line#*, }"               ## strip index
    if [ "$cnt" -eq 0 ]; then        ## if first line - print
        printf "%s, %s" "$idx" "$line"
        ((cnt++))
    elif [ "$idx" = "$lidx" ]; then  ## if indexes equal, append
        printf ", %s" "$line"
    else                             ## else, newline & print
        printf "\n%s, %s" "$idx" "$line"
    fi
    lidx=$idx                        ## save last index
done <"$fn"

printf "\n"
Input
$ cat dat/cmbcsv.dat
ID, CC_area, CC_perimeter, CC_circularity
024_S_0985, 407.00, 192.15, 0.138530 //first scan of A
024_S_0985, 437.50, 204.80, 0.131074 //second scan of A
024_S_0985, 400.75, 198.80, 0.127420 //third scan of A
024_S_1063, 544.50, 214.34, 0.148939 //first and only scan of B
024_S_1171, 654.75, 240.33, 0.142453 //first scan of C
024_S_1171, 659.50, 242.21, 0.141269 //second scan of C
Output
$ bash cmbcsv.sh dat/cmbcsv.dat
ID, CC_area, CC_perimeter, CC_circularity
024_S_0985, 407.00, 192.15, 0.138530, 437.50, 204.80, 0.131074, 400.75, 198.80, 0.127420
024_S_1063, 544.50, 214.34, 0.148939
024_S_1171, 654.75, 240.33, 0.142453, 659.50, 242.21, 0.141269
Note: I didn't know whether you needed all the additional commas or ellipses or whether they were just there to show that there could be more entries for the same index (e.g. ,,...,). You can easily add them if need be.
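For comparison, the same grouping can be done in a single awk pass; a minimal sketch, assuming the merged file is named file.csv (a placeholder) and is shaped like the sample above, trailing //comments included:
awk -F', *' '
    NR == 1 { print; next }               # pass the header through unchanged
    {
        sub(/ *\/\/.*$/, "")              # drop the trailing //comment
        id = $1
        rest = $0
        sub(/^[^,]*, */, "", rest)        # keep everything after the ID
        if (id == prev) {
            out[id] = out[id] ", " rest   # same ID: append this scan's columns
        } else {
            order[++n] = id               # new ID: remember its first position
            out[id] = rest
            prev = id
        }
    }
    END {
        for (i = 1; i <= n; i++)
            print order[i] ", " out[order[i]]
    }
' file.csv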
Well, if you know which scan belongs to which person you can add an extra column like patient name or ID, but I guess that only works if you have the original information about how many scans there are per person.
