This question branches off a question already asked.
I want to make a CSV file from the Db2 results, including column names.
EXPORT TO ...
SELECT 1 as id, 'COL1', 'COL2', 'COL3' FROM sysibm.sysdummy1
UNION ALL
(SELECT 2 as id, COL1, COL2, COL3 FROM myTable)
ORDER BY id
While this does work, I am left with an unwanted first column full of 1s and 2s.
Is there a way to do this via the db2 command, or a full bash alternative, without the redundant column while keeping the header at the top?
e.g.
Column 1 Column 2 Column 3
data 1 data 2 data3
... ... ...
instead of:
1 Column 1 Column 2 Column 3
2 data 1 data 2 data3
2 ... ... ...
All the answers I've seen use two separate export statements. The first generates the column headers:
db2 "EXPORT TO /tmp/header.csv of del
SELECT
SUBSTR(REPLACE(REPLACE(XMLSERIALIZE(CONTENT XMLAGG(XMLELEMENT(NAME c,colname)
ORDER BY colno) AS VARCHAR(1500)),'<C>',', '),'</C>',''),3)
FROM syscat.columns WHERE tabschema='${SCHEMA}' AND tabname='${TABLE}'"
then the query body
db2 "EXPORT TO /tmp/body.csv of del
SELECT * FROM ${SCHEMA}.${TABLE}"
then
cat /tmp/header.csv /tmp/body.csv > ${TABLE}.csv
If you just want headers for the extracted data, want them always on top, and want to be able to rename them so the output is more user-friendly, all in one CSV file, you can do the following:
# Creates headers and new output file
HEADERS="ID,USERNAME,EMAIL,ACCOUNT DISABLED?"
echo "$HEADERS" > "$OUTPUT_FILE"
# Gets results from database
db2 -x "select ID, USERNAME, DISABLED FROM ${SCHEMA}.USER WHERE lcase(EMAIL)=lcase('$USER_EMAIL')" | while read ID USERNAME DISABLED ;
do
# Appends result to file
echo "${ID},${USERNAME},${USER_EMAIL},${DISABLED}" >> "$OUTPUT_FILE"
done
No temporary files or merging required.
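The read loop itself can be tried without a database by piping sample rows in place of the `db2 -x` output (the values and file name below are made up for the demo):

```shell
OUTPUT_FILE=result.csv
echo "ID,USERNAME,EMAIL,ACCOUNT DISABLED?" > "$OUTPUT_FILE"

# stands in for: db2 -x "select ID, USERNAME, DISABLED from ..."
printf '1 alice N\n2 bob Y\n' | while read -r ID USERNAME DISABLED; do
    echo "${ID},${USERNAME},user@example.com,${DISABLED}" >> "$OUTPUT_FILE"
done

cat "$OUTPUT_FILE"
```

The whitespace-separated query output is split by `read` into one variable per column, and each record is re-emitted comma-separated.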
Db2 for Linux/Unix/Windows lacks a (long overdue) simple option to the EXPORT command for this common requirement.
But using the bash shell you can run two separate exports (one for the column headers, the other for the data) and concatenate the results into a file via an intermediate named pipe.
Using an intermediate named pipe means you don't need two flat-file copies of the data.
It is ugly and awkward but it works.
Example fragment (you can initialize the variables to suit your environment):
mkfifo ${target_file_tmp}
(( $? != 0 )) && echo -e "\nERROR: failed to create named pipe ${target_file_tmp}" && exit 1
db2 -v "EXPORT TO ${target_file_header} of del SELECT 'COL1', 'COL2', 'COL3' FROM sysibm.sysdummy1"
# start the reader first; it blocks on the pipe until the data export writes to it
cat ${target_file_header} ${target_file_tmp} >> ${target_file} &
db2 -v "EXPORT TO ${target_file_tmp} of del SELECT COL1, COL2, COL3 FROM myTable ORDER BY 1"
rc=$?
(( rc == 1 )) && echo "Export found no rows matching the query" && exit 1
(( rc == 2 )) && echo "Export completed with warnings, your data might not be what you expect" && exit 1
(( rc > 2 )) && echo "Export failed. Check the messages from export" && exit 1
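The pipe-and-append plumbing itself can be rehearsed without Db2; here plain `echo`s stand in for the two EXPORT commands, and all file names are illustrative:

```shell
mkfifo body.pipe

echo "COL1,COL2,COL3" > header.csv        # stands in for the header export

# the reader must be started first; it blocks until the pipe is written to
cat header.csv body.pipe >> combined.csv &

echo "1,foo,bar" > body.pipe              # stands in for the data export
wait                                      # let the background cat finish

rm body.pipe
cat combined.csv
```

The background `cat` drains `header.csv` immediately, then waits on the pipe, so the header always lands first no matter how long the data export takes.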
This would work for your simple case:
EXPORT TO ...
SELECT C1, C2, C3 FROM (
SELECT 1 as id, 'COL1' as C1, 'COL2' as C2, 'COL3' as C3 FROM sysibm.sysdummy1
UNION ALL
SELECT 2 as id, COL1, COL2, COL3 FROM myTable
) AS t
ORDER BY id
Longer term, EXTERNAL TABLE support (already in Db2 Warehouse), which has the INCLUDEHEADER option, is (I guess) going to appear in Db2 at some point.
I wrote a stored procedure that extracts the header via the DESCRIBE command. The names are put into a temporary table and can be exported to a file. The only thing that is still not possible is concatenating the files via SQL, so a final cat of both files redirected into a third file is necessary as a last step.
CALL DBA.GENERATE_HEADERS('SELECT * FROM SYSCAT.TABLES') #
EXPORT TO myfile_header OF DEL SELECT * FROM SESSION.header #
EXPORT TO myfile_body OF DEL SELECT * FROM SYSCAT.TABLES #
!cat myfile_header myfile_body > myfile #
The code of the stored procedure is at: https://gist.github.com/angoca/8a2d616cd1159e5d59eff7b82f672b72
More information at: https://angocadb2.blogspot.com/2019/11/export-headers-of-export-in-db2.html.
I would like to make an assertion about the output of a Postgres query using bash. Concretely, I am writing a bash job that counts the number of rows and, if the count is not equal to zero, does something to raise an alert.
$ psql MY_DATABASE -c "SELECT COUNT(*) WHERE foo=bar"
count
-------
0
(1 row)
In my script, I would like to assert that the output of the above query is zero. However, I am not sure where to begin, because the output is not a number but a formatted multi-line string.
Is there an option in psql that makes it output a single number when counting, or could you think of any other approaches?
I would suggest using a temporary file to capture the output. Once your work is done, delete the temp file.
psql your_database -c "SELECT COUNT(*) as Count from table_a where c1=something" -t >assert.tmp
line=$(head -n 1 assert.tmp)
if [ "$line" -ge 0 ]; then
echo "greater than or equal to 0 and value is: $line"
fi
rm assert.tmp
Hope it works for you.
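On the question's last point: psql can emit the bare number itself; `-t` suppresses the header and row-count footer, and `-A` turns off aligned padding. Since no database is available here, a stand-in value shows the shape of the check (the query and table name are illustrative):

```shell
# with a live database this would be:
#   count=$(psql MY_DATABASE -tAc "SELECT COUNT(*) FROM mytable WHERE foo = 'bar'")
count=0   # stand-in for the query result

if [ "$count" -eq 0 ]; then
    echo "no matching rows"
else
    echo "alert: $count matching rows"
fi
```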
I have two files, file1.sh and file2.sh.
file1.sh contains a DB2 query that returns the total number of employees in the employee table.
Now I want to assign that total to a variable within file1.sh.
File 1:
#!/bin/bash
#database connection goes here
echo The total number employees:
db2 -x "select count(*) from employee"
When I run the file above, it displays the total number of employees.
But I want to store that total in a variable and access it from another file, file2.sh.
File 2:
#!/bin/bash
#Here i want to use total number of employees
#Variable to be accessed here
Using the following two scripts, driver.sh and child.sh:
driver.sh
#!/bin/bash
cnt=`./child.sh syscat.tables`
echo "Number of tables: ${cnt}"
cnt=`./child.sh syscat.columns`
echo "Number of columns: ${cnt}"
child.sh
#!/bin/bash
db2 connect to pocdb > /dev/null 2>&1
cnt=`db2 -x "select count(*) from ${1}"`
db2 connect reset > /dev/null 2>&1
db2 terminate > /dev/null 2>&1
echo ${cnt}
results
[db2inst1@dbms stack]$ ./driver.sh
Number of tables: 474
Number of columns: 7006
[db2inst1@dbms stack]$ ./child.sh syscat.columns
7006
I have a lot of Teradata SQL files (example code of one of this file is below).
create multiset volatile table abc_mountain_peak as(
select
a.kkpp_nip as nip,
from BM_RETABLE_BATOK.EDETON a
) with data on commit preserve rows;
create multiset table qazxsw_asd_1 as (
select
a.address_id,
from DE30T_BIOLOB.HGG994P_ABS_ADDRESS_TRE a,
) with data on commit preserve rows;
create multiset volatile table xyz_sea_depth as(
select
a.trip,
from tele_line_tryt a
) with data on commit preserve rows;
CREATE multiset table wsxzaq_zxc_2 AS (
SELECT
a.bend_data
FROM lokl_station a ,
) WITH data on commit preserve rows;
CREATE multiset table rfvbgt_ttuop_3 AS (
SELECT
a.heret_bini
FROM fvgty_blumion a ,
) WITH data on commit preserve rows;
DROP qazxsw_asd_1;
DROP wsxzaq_zxc_2;
.EXIT
What I need to do is create a bash script that verifies whether the multiset tables are dropped.
Two kinds of tables are created:
multiset volatile tables (which shouldn't be dropped), and
multiset tables (which must be dropped)
In my example code, two of the three multiset tables are dropped (which is correct), and one of them is not (which is incorrect).
Do you have any idea how to create a script that verifies something like that (i.e. reports that one or more tables aren't dropped)? I am a real beginner in bash. My idea (which could be wrong) is to create an array holding the names of the multiset tables (but not the multiset volatile tables), then create another array with the names of the dropped tables, and finally check whether every table in the first array also appears in the second.
What do you think? Any help will be gratefully appreciated.
You can do it fairly easily by reading each line in the file, isolate the table names associated with the multiset table commands into one array (dropnames), you then isolate the table names following the DROP statements into another array (droptable). Then it is just a matter of comparing both arrays to find the table in one that is not in the other. A short script like the following will do it for you:
#!/bin/bash
declare -a tmparray ## declare array names
declare -a dropnames
declare -a droptable
volstr="multiset volatile table" ## set query strings
dropstr="multiset table"
## read all lines and collect table names
while read -r line; do
[[ $line =~ $dropstr ]] && { ## collect "multiset table" names
tmparray=( $line )
dropnames+=( ${tmparray[3]} )
}
[[ $line =~ DROP ]] && { ## collect DROP table names
tmp="${line/DROP /}"
droptable+=( ${tmp%;*} )
}
unset tmparray
done
## compare droptable to dropnames, print missing table(s)
if [ ${#dropnames[@]} -gt ${#droptable[@]} ]; then
printf "\n The following tables are missing from DROP tables:\n\n"
for i in "${dropnames[@]}"; do
found=0
for j in "${droptable[@]}"; do
[ "$i" = "$j" ] && found=1 && break
done
[ $found -eq 0 ] && printf " %s\n" "$i"
done
elif [ ${#dropnames[@]} -lt ${#droptable[@]} ]; then
printf "\n The following tables were dropped but never created:\n\n"
for i in "${droptable[@]}"; do
found=0
for j in "${dropnames[@]}"; do
[ "$i" = "$j" ] && found=1 && break
done
[ $found -eq 0 ] && printf " %s\n" "$i"
done
fi
printf "\n"
exit 0
Output
$ bash sqlfinddrop.sh <dat/sql.dat
The following tables are missing from DROP tables:
rfvbgt_ttuop_3
I would do it in two parts using sed:
Create list of creates:
sed -ne 's/^.*create multiset \(volatile \)\?table \(\w\+\).*$/\2/Ip' INPUT FILES | sort > creates.txt
Create list of deletes:
sed -ne 's/^.*drop \(\w\+\).*$/\1/Ip' INPUT FILES | sort > drops.txt
Tables which were created and dropped:
join creates.txt drops.txt
Tables created and not dropped:
combine creates.txt not drops.txt
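`combine` ships with moreutils and may not be installed; `comm` from coreutils does the same job on the sorted lists. A self-contained check, with invented table names standing in for the sed output:

```shell
# stand-ins for the sorted sed output files
printf 'a_tab\nb_tab\nc_tab\n' | sort > creates.txt   # created tables
printf 'a_tab\nc_tab\n'        | sort > drops.txt     # dropped tables

# lines unique to creates.txt = created but never dropped
comm -23 creates.txt drops.txt
```

`comm -23` suppresses lines common to both files (`-3`) and lines unique to the second file (`-2`), leaving only the tables that were never dropped.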
How can I get a row count from all tables using Hive? I am interested in the database name, table name and row count.
You will need to do a
select count(*) from table
for all tables.
To automate this, you can make a small bash script and some bash commands.
First run
$hive -e 'show tables' | tee tables.txt
This stores all tables in the database in a text file tables.txt
Create a bash file (count_tables.sh) with the following contents.
while read line
do
echo "$line "
eval "hive -e 'select count(*) from $line'"
done
Now run the following commands.
$chmod +x count_tables.sh
$./count_tables.sh < tables.txt > counts.txt
This creates a text file (counts.txt) with the counts of all the tables in the database.
A much faster way to get approximate count of all rows in a table is to run explain on the table. In one of the explain clauses, it shows row counts like below:
TableScan [TS_0] (rows=224910 width=78)
The benefit is that you are not actually spending cluster resources to get that information.
The HQL command is explain select * from table_name; but when the table is not optimized it does not show row counts in the TableScan.
select count(*) from table
I think there is no more efficient way.
You can collect statistics on the table by using the Hive ANALYZE command. Hive's cost-based optimizer makes use of these statistics to create an optimal execution plan.
Below is the example of computing statistics on Hive tables:
hive> ANALYZE TABLE stud COMPUTE STATISTICS;
Query ID = impadmin_20171115185549_a73662c3-5332-42c9-bb42-d8ccf21b7221
Total jobs = 1
Launching Job 1 out of 1
…
Table training_db.stud stats: [numFiles=5, numRows=5, totalSize=50, rawDataSize=45]
OK
Time taken: 8.202 seconds
Links:
http://dwgeek.com/apache-hive-explain-command-example.html/
You can also set the database in the same command and separate with ;.
hive -e 'use myDatabase;show tables'
try this to automate it -- put it in a shell file, then run bash filename.sh
hive -e "select count(distinct fieldid) from table1 where extracttimestamp < '2018-04-26'" > sample.out
hive -e "select count(distinct fieldid) from table2 where day = '26'" >> sample.out
lc=$(uniq sample.out | wc -l)
if [ "$lc" -eq 1 ]; then
echo "PASS"
else
echo "FAIL"
fi
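The comparison trick here is that both counts go into the same file, so `uniq | wc -l` is 1 only when they match. It can be exercised with plain files in place of the hive output:

```shell
echo 100 >  sample.out   # stands in for the first count
echo 100 >> sample.out   # stands in for the second count

lc=$(uniq sample.out | wc -l)
if [ "$lc" -eq 1 ]; then
    echo "PASS"
else
    echo "FAIL"
fi
```

Note this only works because `uniq` collapses adjacent duplicates and the file holds exactly two lines.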
How do I specify the database it needs to use in the snippet below?
while read line
do
echo "$line "
eval "hive -e 'select count(*) from $line'"
done
Here's a solution I wrote that uses python:
import os
dictTabCnt={}
print("=====Finding Tables=====")
tableList = os.popen("hive --outputformat=dsv --showHeader=false -e \"use [YOUR DB HERE]; show tables;\"").read().split('\n')
print("=====Finding Table Counts=====")
for i in tableList:
if i != '':
strTemp = os.popen("hive --outputformat=dsv --showHeader=false -e \"use [YOUR DB HERE]; SELECT COUNT(*) FROM {}\"".format(i)).read()
dictTabCnt[i] = strTemp
print("=====Table Counts=====")
for table,cnt in dictTabCnt.items():
print("{}: {}".format(table,cnt))
Thanks to @mukul_gupta for providing the shell script.
However, we were encountering the error below for the same:
"bash syntax error near unexpected token done"
Solution for this at below link
BASH Syntax error near unexpected token 'done'
Also, if anyone needs to select a DB name:
$hive -e 'use databasename;show tables' | tee tables.txt
For passing the DB name in the select statement, give the DB name in the tables list file itself.
I have a shell script environment variable being populated by a SQL*Plus command.
The SQL command returns multiple records of 3 columns; I need to pass each record to another shell script.
QUERYRESULT=`${SQLPLUS_COMMAND} -s ${SQL_USER}/${SQL_PASSWD}@${SQL_SCHEMA}<<endl
set heading off feedback off
select col1, col2, col3
from mytable
where ......)
order by ......
;
exit
endl`
echo ${QUERYRESULT}
outputs a single line with all columns space-separated; all values are guaranteed to be non-null:
val1 val2 val3 val1 val2 val3 val1 val2 val3 ......
I need to call the following for each record
nextScript.bash val1 val2 val3
I can also run the query again and count the records, to determine how many times I need to call nextScript.bash.
Any thoughts on how to get from a single env variable holding multiple 3-column records to executing the next script?
Without the use of a variable:
( ${SQLPLUS_COMMAND} -s ${SQL_USER}/${SQL_PASSWD}@${SQL_SCHEMA}<<endl
set heading off feedback off
select col1, col2, col3
from mytable
where ......)
order by ......
;
exit
endl
) | while read -r val1 val2 val3; do ./nextScript.bash "$val1" "$val2" "$val3"; done
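If you do want to start from the flat `${QUERYRESULT}` variable instead of a pipe, the triples can be peeled off with the positional parameters. A sketch with made-up values, where `nextScript.bash` is echoed rather than executed:

```shell
QUERYRESULT="val1 val2 val3 val4 val5 val6"   # sample flat query output

set -- $QUERYRESULT            # word-split into positional parameters
while [ "$#" -ge 3 ]; do
    echo nextScript.bash "$1" "$2" "$3"    # replace echo with the real call
    shift 3
done
```

`set --` re-splits the variable on whitespace, and `shift 3` consumes one record per iteration, so the loop runs once per 3-column record.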