Bash: store SQL query output in arrays

I am running an SQL query in bash to get file names and their schedule times. The files will then be run at the schedule time associated with them. The query output is below. I need to capture the date/time and the file names, and run the files at their specified times. How do I store both columns in separate arrays?
file_name | schedule_time
--------------------------+---------------------
file 1 | 2016-02-25 07:26:00
file 2 | 2016-02-26 07:37:00
file 1 | 2016-02-27 07:39:00
file 3 | 2016-02-27 12:00:00
file 1 | 2016-02-28 07:25:00
file 2 | 2016-02-29 02:15:00
file 2 | 2016-02-29 08:38:00
file 1 | 2016-02-29 12:00:00

I don't know exactly why you need it, but here is a plain bash solution:
#!/bin/bash
sqldata="file_name | schedule_time
--------------------------+---------------------
file 1 | 2016-02-25 07:26:00
file 2 | 2016-02-26 07:37:00
file 1 | 2016-02-27 07:39:00
file 3 | 2016-02-27 12:00:00
file 1 | 2016-02-28 07:25:00
file 2 | 2016-02-29 02:15:00
file 2 | 2016-02-29 08:38:00
file 1 | 2016-02-29 12:00:00"
sqldata=$(echo "$sqldata" | tail -n +3) # skip the first 2 header lines
oldifs="$IFS"
IFS=$'\r\n'
lines=( $sqldata )
IFS="$oldifs"
files=()
dates=()
idx=0
for i in "${lines[@]}"
do
files[idx]=$(echo "$i" | sed -E 's/ +\|.*//')
dates[idx]=$(echo "$i" | sed -E 's/.*\| +//')
idx=$((idx + 1))
done
echo files:
echo "${files[@]}"
echo dates:
echo "${dates[@]}"
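To then actually run each file at its schedule time, the stated goal, here is a minimal sketch building on those arrays; it assumes GNU date for parsing the timestamps and a running at daemon:
for idx in "${!files[@]}"
do
# convert "2016-02-25 07:26:00" into at's -t format, CCYYMMDDhhmm (GNU date assumed)
ts=$(date -d "${dates[idx]}" +%Y%m%d%H%M)
# queue the file for execution at its schedule time
echo "bash '${files[idx]}'" | at -t "$ts"
done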


bash looping and extracting of the fragment of txt file

I am analyzing a large number of dlg text files located in the working directory. Each file has a table (usually located at a different position in each log) in the following format:
File 1:
CLUSTERING HISTOGRAM
____________________
________________________________________________________________________________
| | | | |
Clus | Lowest | Run | Mean | Num | Histogram
-ter | Binding | | Binding | in |
Rank | Energy | | Energy | Clus| 5 10 15 20 25 30 35
_____|___________|_____|___________|_____|____:____|____:____|____:____|____:___
1 | -5.78 | 11 | -5.78 | 1 |#
2 | -5.53 | 13 | -5.53 | 1 |#
3 | -5.47 | 17 | -5.44 | 2 |##
4 | -5.43 | 20 | -5.43 | 1 |#
5 | -5.26 | 19 | -5.26 | 1 |#
6 | -5.24 | 3 | -5.24 | 1 |#
7 | -5.19 | 4 | -5.19 | 1 |#
8 | -5.14 | 16 | -5.14 | 1 |#
9 | -5.11 | 9 | -5.11 | 1 |#
10 | -5.07 | 1 | -5.07 | 1 |#
11 | -5.05 | 14 | -5.05 | 1 |#
12 | -4.99 | 12 | -4.99 | 1 |#
13 | -4.95 | 8 | -4.95 | 1 |#
14 | -4.93 | 2 | -4.93 | 1 |#
15 | -4.90 | 10 | -4.90 | 1 |#
16 | -4.83 | 15 | -4.83 | 1 |#
17 | -4.82 | 6 | -4.82 | 1 |#
18 | -4.43 | 5 | -4.43 | 1 |#
19 | -4.26 | 7 | -4.26 | 1 |#
_____|___________|_____|___________|_____|______________________________________
The aim is to loop over all the dlg files and take from each table the single line corresponding to the widest cluster (the one with the largest number of # marks in the Histogram column). In the example above this is the third line.
3 | -5.47 | 17 | -5.44 | 2 |##
Then I need to add this line to final_log.txt together with the name of the log file (which should appear before the line). So in the end I should have something in the following format (for 3 different log files):
"Name of the file 1": 3 | -5.47 | 17 | -5.44 | 2 |##
"Name_of_the_file_2": 1 | -5.99 | 13 | -5.98 | 16 |################
"Name_of_the_file_3": 2 | -4.78 | 19 | -4.44 | 3 |###
A possible model of my BASH workflow would be:
#!/bin/bash
for f in ./*.dlg
do
file_name2=$(basename "$f")
file_name="${file_name2/.dlg}"
echo "Processing of $f..."
# take the name of the file and save it in the log
echo "$file_name" >> "$PWD/final_results.log"
# search for the beginning of the table inside each file and save it after the name
grep 'CLUSTERING HISTOGRAM' "$f" >> "$PWD/final_results.log"
# check whether it works
gedit "$PWD/final_results.log"
done
Here I need to replace the combination of echo and grep with something that extracts the selected parts of the table.
You can use this one-liner, which should be fast enough. Extra lines in your files besides the tables should not be a problem.
grep "#$" *.dlg | sort -rk11 | awk '!seen[$1]++'
grep fetches all the histogram lines, which are then sorted in reverse order by the last field (so the lines with the most # end up on top), and finally awk removes the duplicates, keeping the first line seen per file. Note that when grep parses more than one file it prints the filename at the beginning of each line by default (the -H behaviour), so if you test it on a single file, add -H explicitly.
Result should be like this:
file1.dlg: 3 | -5.47 | 17 | -5.44 | 2 |##########
file2.dlg: 3 | -5.47 | 17 | -5.44 | 2 |####
file3.dlg: 3 | -5.47 | 17 | -5.44 | 2 |#######
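For example, to test on a single file with the filename prefix forced, as noted above:
grep -H "#$" file1.dlg | sort -rk11 | awk '!seen[$1]++'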
Here is a modification to get the first appearance in case of several equal max lines in a file:
grep "#$" *.dlg | sort -k11 | tac | awk '!seen[$1]++'
We replaced sort's reverse flag with the tac command, which reverses the stream, so now for any equal lines the initial order is preserved.
Second solution
Here is one using only awk:
awk -F"|" '/#$/ && $NF > max[FILENAME] {max[FILENAME]=$NF; row[FILENAME]=$0}
END {for (i in row) print i ":" row[i]}' *.dlg
Update: if you execute it from a different directory and want to keep only the basename of every file, removing the path prefix:
awk -F"|" '/#$/ && $NF > max[FILENAME] {max[FILENAME]=$NF; row[FILENAME]=$0}
END {for (i in row) {f=i; sub(".*/","",f); print f ":" row[i]}}' path/*.dlg
Probably makes more sense as an Awk script.
This picks the first line with the widest histogram in the case of a tie within an input file.
#!/bin/bash
awk 'FNR == 1 { if(sel) print sel; sel = ""; max = 0 }
FNR < 9 { next }
length($10) > max { max = length($10); sel = FILENAME ":" $0 }
END { if (sel) print sel }' ./"$prot"/*.dlg
This assumes the histograms are always the tenth field; if your input format is even messier than the lump you show, maybe adapt to taste.
In some more detail, the first line triggers on the first line of each input file. If we have collected a previous line (meaning this is not the first input file), print that, and start over. Otherwise, initialize for the first input file. Set sel to nothing and max to zero.
The second line skips lines 1-8 which contain the header.
The third line checks if the current line's histogram is longer than max. If it is, update max to this histogram's length, and remember the current line in sel.
The last line is spillover for when we have processed all files. We never printed the sel from the last file, so print that too, if it's set.
If you mean to say we should find the lines between CLUSTERING HISTOGRAM and the end of the table, we should probably have more information about what the surrounding lines look like. Maybe something like this, though:
awk '/CLUSTERING HISTOGRAM/ { if (sel) print sel; looking = 1; sel = ""; max = 0 }
!looking { next }
looking > 1 && $1 != looking { looking = 0; nextfile }
$1 == looking && length($10) > max { max = length($10); sel = FILENAME ":" $0 }
$1 == looking { ++looking }
END { if (sel) print sel }' ./"$prot"/*.dlg
This sets looking to 1 when we see CLUSTERING HISTOGRAM, then counts up the cluster ranks until the first line whose first field is no longer the expected rank, which marks the end of the table.
I would suggest processing using awk:
FILES=./*.dlg # hypothetical: glob of input files, expanded by the unquoted loop below
for i in $FILES
do
printf '"%s": ' "$i"
awk 'BEGIN {
output="";
outputlength=0
}
/(^ *[0-9]+)/ { # process only lines that start with a number
if (length(substr($10, 2)) > outputlength) { # if line has more hashes, store it
output=$0;
outputlength=length(substr($10, 2))
}
}
END {
print output # output the resulting line
}' "$i"
done

Bash extract strings between two characters

I have the output of a query in a bash variable, stored as a single line.
-------------------------------- | NAME | TEST_DATE | ----------------
--------------------- | TESTTT_1 | 2019-01-15 | | TEST_2 | 2018-02-16 | | TEST_NAME_3 | 2020-03-17 | -------------------------------------
I would like to ignore the column names (NAME | TEST_DATE) and store the actual values of each name and test_date as a tuple in an array.
Here is the logic I am thinking of: extract the strings between two '|' characters, from the third one onwards. Within a tuple the strings are comma separated, and when a space is encountered we start the next tuple in the array.
Expected output:
array=(TESTTT_1,2019-01-15 TEST_2,2018-02-16 TEST_NAME_3,2020-03-17)
Any help is appreciated. Thanks.
Let's say your string is stored in variable a (or pipe your query output into the command below):
echo "$a"
-------------------------------- | NAME | TEST_DATE | ----------------
--------------------- | TESTTT_1 | 2019-01-15 | | TEST_2 | 2018-02-16 | | TEST_NAME_3 | 2020-03-17 | ------------------------------------
The command to obtain the desired result is:
array="$(echo "$a" | cut -d '|' -f2,3,5,6,8,9 | tail -n1 | sed 's/ | /,/g')"
This stores the output in a variable named array.
Output of above command is:
echo "$array"
TESTTT_1,2019-01-15,TEST_2,2018-02-16,TEST_NAME_3,2020-03-17
Explanation of the command: the output of echo "$a" is piped into cut, which, using '|' as the delimiter, extracts fields 2,3,5,6,8,9; that output is piped into tail to drop the undesired NAME and TEST_DATE headers and keep the values only; finally, sed converts the remaining ' | ' separators to commas, as in your expected output.
This string has only three name/date pairs; if you have more, just add more field numbers to the cut command. Given the format of your string, the field numbers follow the pattern 2,3,5,6,8,9,11,12,14,15 and so on.
Hope it solved your problem.
echo "$a" | awk -F "|" '{ for(i=2; i<=NF; i++){ print $i }}' | sed -e '1,3d' -e '$d' | tr ' ' '\n' | sed '/^$/d' | sed 's/^/,/g' | sed -e 'N;s/\n/ /' | sed 's/^.//g' | xargs | sed 's/ ,/, /g'
Above is an awk-based solution.
Output:
TESTTT_1, 2019-01-15 TEST_2, 2018-02-16 TEST_NAME_3, 2020-03-17
Is that OK?
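Alternatively, to build the array exactly in the expected tuple form, here is a minimal bash sketch; it assumes bash 4+ for mapfile, names matching [A-Z0-9_]+, and dates in YYYY-MM-DD form:
mapfile -t array < <(echo "$a" | grep -Eo '[A-Z0-9_]+ \| [0-9]{4}-[0-9]{2}-[0-9]{2}' | sed 's/ | /,/')
echo "${array[@]}"
TESTTT_1,2019-01-15 TEST_2,2018-02-16 TEST_NAME_3,2020-03-17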

Read content of file and put particular portion of content in separate files using bash

I would like to take specific portions of the content of a single file and put them into separate files via bash. With the code below I was able to extract the test1 content, but I failed to get everything into the respective files.
Tried code:
reportFile=/report.txt
test1File=/test1.txt
test2File=/test2.txt
test3File=/test3.txt
totalLineNo=$(wc -l < "$reportFile")
test1LineNo=$(grep -n "Test1 file content:" "$reportFile" | grep -Eo '^[^:]+')
test2LineNo=$(grep -n "Test2 file content:" "$reportFile" | grep -Eo '^[^:]+')
test3LineNo=$(grep -n "Test3 file content:" "$reportFile" | grep -Eo '^[^:]+')
exactTest1LineNo=$((test1LineNo - 1))
exactTest2LineNo=$((test2LineNo - 1))
exactTest3LineNo=$((test3LineNo - 1))
test1Content=$(head -n "$exactTest1LineNo" "$reportFile")
test3Content=$(tail -n "$exactTest3LineNo" "$reportFile")
echo -e "${test1Content}\r" >> "$test1File"
echo -e "${test3Content}\r" >> "$test3File"
report.txt:
-------------------------------------
My Report:
Test1 file content:
1
2
3
4
5
6
Test2 file content:
7
8
9
10
Test3 file content:
11
12
13
14
15
Note: Find my report above.
-------------------------------------
test1.txt (expected):
1
2
3
4
5
6
test2.txt (expected):
7
8
9
10
test3.txt (expected):
11
12
13
14
15
With a single awk command:
awk '/^Test[0-9] file content:/{ f=1; fn=tolower($1)".txt"; next }
     f && NF{ print > fn }
     !NF{ f=0 }' report.txt
Viewing results:
$ head test[0-9].txt
==> test1.txt <==
1
2
3
4
5
6
==> test2.txt <==
7
8
9
10
==> test3.txt <==
11
12
13
14
15
If I understand you correctly: you have a long file report.txt and you want to extract short files from it. The name of each file is followed by the string " file content:" in the file report.txt.
This is my solution:
#!/bin/bash
reportFile=report.txt
Files=$(grep 'file content' "$reportFile" | sed 's/ .*$//')
for F in $Files ; do
f=${F,}.txt # first letter lowercase and append .txt
awk "/$F file content/,/^\$/ {print}" "$reportFile" |
tail -n +2 | # remove first line with "Test* file content:"
head -n -1 > "$f" # remove last blank line
done
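For completeness, a rougher alternative sketch using csplit (GNU coreutils), which splits at every header line; each piece keeps its "TestN file content:" header and the last piece keeps the trailing note, so some cleanup would still be needed:
csplit -z -f part report.txt '/file content:/' '{*}'
# part00 holds the preamble; part01..part03 hold one piece per section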

convert a text file to csv using shell script

I am new to shell scripting. Can anyone give me a shell script for the condition below?
My Input:
id | name | values
----+------+--------
1 | abc | 2
1 | abc | 3
1 | abc | 4
1 | abc | 5
1 | abc | 6
1 | abc | 7
Expected Output:
1,abc,2
"
"
1 million records
You can use awk for this:
awk -F '[[:blank:]]*\\|[[:blank:]]*' -v OFS=, 'NF==3 && NR>1{sub(/^[[:blank:]]*/, "", $1); print}' file
1,abc,2
1,abc,3
1,abc,4
1,abc,5
1,abc,6
1,abc,7
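By the way, the | and ----+---- separators suggest this table comes from psql. If so, you can skip the post-processing entirely and ask psql for unaligned, comma-separated tuples (the query and table name here are placeholders):
psql -At -F',' -c 'select id, name, val from yourtable' > output.csv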

List of last generated file on each day from 7 days list

I've a list of files in the following format:
Group_2012_01_06_041505.csv
Region_2012_01_06_041508.csv
Region_2012_01_06_070007.csv
XXXX_YYYY_MM_DD_HHMMSS.csv
What is the best way to compile a list of the last generated file for each day, per group, from the last 7 days?
Version that worked on HP-UX
for d in 6 5 4 3 2 1 0
do
DATES[d]=$(perl -e "use POSIX;print strftime '%Y_%m_%d',localtime time-86400*$d;")
done
for group in `ls *.csv | cut -d_ -f1 | sort -u`
do
CSV_FILES=$working_dir/*.csv
if [ ! -f $CSV_FILES ]; then
break # if no file exists do not attempt processing
fi
for d in "${DATES[@]}"
do
file_nm=$(ls ${group}_$d* 2>>/dev/null | sort -r | head -1)
if [ "$file_nm" != "" ]
then
# Process file
fi
done
done
You can explicitly iterate over the group/time combinations:
for d in {1..6}
do
DATES[d]=`gdate +"%Y_%m_%d" -d "$d day ago"`
done
for group in `ls *csv | cut -d_ -f1 | sort -u`
do
for d in "${DATES[@]}"
do
echo "$group $d: " `ls ${group}_$d* 2>>/dev/null | sort -r | head -1`
done
done
Which outputs the following for your example data set:
Group 2012_01_06: Group_2012_01_06_041505.csv
Group 2012_01_05:
Group 2012_01_04:
Group 2012_01_03:
Group 2012_01_02:
Group 2012_01_01:
Region 2012_01_06: Region_2012_01_06_070007.csv
Region 2012_01_05:
Region 2012_01_04:
Region 2012_01_03:
Region 2012_01_02:
Region 2012_01_01:
XXXX 2012_01_06:
XXXX 2012_01_05:
XXXX 2012_01_04:
XXXX 2012_01_03:
XXXX 2012_01_02:
XXXX 2012_01_01:
Note that Region_2012_01_06_041508.csv is not shown for Region 2012_01_06, as it is older than Region_2012_01_06_070007.csv.
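Alternatively, since the timestamp embedded in the name sorts lexicographically, here is a short sketch that keeps the newest file per group and day (the 7-day window from the loops above would still need to be applied):
ls *.csv | sort | awk -F_ '{ last[$1 "_" $2 "_" $3 "_" $4] = $0 } END { for (k in last) print last[k] }'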
