add query results in a csv in linux - shell

I have a query in a shell script that gives me results like:
article;20200120
fruit;22
fish;23
I run that report every day. I would like that, when I run the query the next day, the output looks like this:
article;20200120;20200121
fruit;22;11
fish;23;12
I run this report with PostgreSQL in a Linux shell script, and the CSV output is generated by redirecting the output with ">>".
Any help to achieve that would be appreciated.
Thanks

This might be somewhat fragile, but it sounds like what you want can be accomplished with cut and paste.
Let's start with two files we want to join:
$ cat f1.csv
article;20200120
fruit;22
fish;23
$ cat f2.csv
article;20200121
fruit;11
fish;12
We first use cut to drop the leading label column from the second file, then send that into paste with the first file to combine corresponding lines:
$ cut -d ';' -f 2- f2.csv | paste -d ';' f1.csv -
article;20200120;20200121
fruit;22;11
fish;23;12
Parsing that command line: the -d ';' tells cut to use semicolons as the delimiter (the default is a tab), and -f 2- says to print the second and later fields; f2.csv is the input file for cut. Likewise, the -d ';' tells paste to use semicolons to join the lines, and f1.csv - are the two files to paste together, in that order, with - standing for the input piped in via the | shell operator.
Now, like I say, this is somewhat fragile. We're not matching the lines based on the header information, only their line number from the start of the file. If some fields are optional, or the set of fields changes over time, this will silently produce garbage. One way to mitigate that would be to first call cut -d ';' -f 1 on each of the input files and insist the results are the same before combining them.
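A guard along those lines might look like this (a minimal sketch, assuming bash and that f1.csv and f2.csv are yesterday's and today's reports; combined.csv is just an illustrative name):
if cmp -s <(cut -d ';' -f 1 f1.csv) <(cut -d ';' -f 1 f2.csv); then
    # label columns match, safe to combine
    cut -d ';' -f 2- f2.csv | paste -d ';' f1.csv - > combined.csv
else
    echo 'label columns differ, refusing to combine' >&2
fi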

Related

Differences between two .dat files using unix scripts

I need a UNIX script for the following requirements.
Input files: file1.dat.$prevday, file2.dat.$today
Requirement:
1) The script should take file1_today and file2_prevday as inputs
2) The script should compare both files and give the list of differing lines in two output files,
To_be_added.txt and To_be_removed.txt
3) To_be_added.txt – this should have the list of lines which are present in file1_today but not in file2_prevday.
4) To_be_removed.txt – this should have the list of lines which are present in file2_prevday but not in file1_today.
I'll show you how to build at least half of what you need using a few simple commands (teach a man to fish, and all that...)
You could do this with a scripting language like perl or ruby. If you've ever wanted to learn one of those languages, then a program like this would be the perfect opportunity.
You can also do this by chaining commands together.
To start, the unix command 'diff' gives you the info you want, just not in the format you want. If you run 'diff file2_prevday file1_today', it will show lines that only exist in file1_today with '> ' at the front (your To_be_added.txt), and those only in file2_prevday with '< ' at the front (your To_be_removed.txt). I suggest trying that now with some sample files.
Now we can search for just those lines with grep which will search the input only for lines that match, for example:
% diff file2_prevday file1_today | grep '^> '
Here we search for lines that match the pattern '^> '. The '^' is a special character for grep (and comparable tools) that matches the beginning of a line.
Unfortunately this leaves the '> ' at the beginning of all our output.
We can modify the lines that go through the pipe with sed, which lets us do a search and replace. We search for the same pattern and replace it with nothing:
% diff file2_prevday file1_today | grep '^> ' | sed -e 's/^> //'
This gives us our output for one of our files, which we can save:
% diff file2_prevday file1_today | grep '^> ' | sed -e 's/^> //' > To_be_added.txt
I'll leave the creation of the other file up to you.
Some questions you would probably benefit from answering for yourself:
Why do we need the '^' in the grep and sed?
How could I make a single alias that would run both commands?
How could I write this as a script in a language such as perl/ruby/python?
How could you generate the filenames using the date command and backquotes?
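As a rough starting point, the pieces above could be tied together like this (a sketch only; the date-based filenames are an assumption about your naming scheme, and -d yesterday is GNU date syntax). The other output file is still yours to write:
#!/bin/sh
today=`date +%Y%m%d`
prevday=`date -d yesterday +%Y%m%d`   # GNU date; adjust for your platform
file1_today="file1.dat.$today"
file2_prevday="file2.dat.$prevday"

# lines only in file1_today
diff "$file2_prevday" "$file1_today" | grep '^> ' | sed -e 's/^> //' > To_be_added.txt
# To_be_removed.txt: same idea, starting from the '< ' lines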

Bash tr -s command

So let's say I have several characters in an email address which don't belong. I want to take them out with the tr command. For example...
jsmith#test1.google.com
msmith#test2.google.com
zsmith#test3.google.com
I want to take out all of the test[123]. parts, so I am using the command tr -s 'test[123].' < email > mail. That is one of the ways I have tried, but the two or three attempts I have made all fail to work as intended. The output I am trying to get to is ...
jsmith#google.com
msmith#google.com
zsmith#google.com
You could use sed.
$ sed 's/#test[1-3]\./#/' file
jsmith#google.com
msmith#google.com
zsmith#google.com
[1-3] matches any character that falls within the range 1 to 3 (1, 2, 3). Add the in-place edit parameter -i to save the changes made. Note that tr translates or deletes individual characters rather than whole strings, which is why the tr -s approach cannot strip a multi-character piece like test1.
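If you want sed to edit the file directly, the in-place option looks like this (GNU and BSD/macOS sed spell it slightly differently):
$ sed -i 's/#test[1-3]\./#/' file        # GNU sed
$ sed -i '' 's/#test[1-3]\./#/' file     # BSD/macOS sed needs an (empty) suffix argument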

shell program to modify contents of a file

I have a file that has a list of product ids, each on its own line. I want to modify this file so that all product ids are on one line, comma-separated, and in inverted commas. Original format -
1\n2\n3\n
Expected format -
'1','2','3'
I tried the following command -
paste -s -d "','" velocities.txt > out.txt
The result is looking like this -
1',2'3'4,
I do understand that using the above command I won't get anything before the first product id, but I will be able to handle that case.
You could use sed to quote all digits (paste treats each character in its -d list as a separate delimiter and cycles through them, which is why your original command sprinkled ' and , between the ids instead of inserting the literal string ','):
paste -s -d, velocities.txt | sed "s|\([0-9]\+\)|'\1'|g" > out.txt
P.S. Another command that also handles IP addresses:
sed "s|^\(.*\)$|'\1'|g" velocities.txt | paste -s -d, - > out.txt

Shell scripting - how to properly parse output, learning examples.

So I want to automate a manual task using shell scripting, but I'm a little lost as to how to parse the output of a few commands. I would be able to do this in other languages without a problem, so I'll just explain what I'm going for in pseudo code and provide an example of the command output I'm trying to parse.
Example of output:
Chg 2167467 on 2012/02/13 by user1234#filename 'description of submission'
What I need to parse out is '2167467'. So what I want to do is split on spaces and take element 1 to use in another command. The output of my next command looks like this:
Change 2167463 by user1234#filename on 2012/02/13 18:10:15
description of submission
Affected files ...
... //filepath/dir1/dir2/dir3/filename#2298 edit
I need to parse out '//filepath/dir1/dir2/dir3/filename#2298' and use that in another command. Again, what I would do is remove the blank lines from the output, grab the 4th line, and split on space. From there I would grab the 1st element from the split and use it in my next command.
How can I do this in shell scripting? Examples or a point to some tutorials would be great.
It's not clear if you want to use the result from the first command when processing the second command. If that is true, then:
targString=$( cmd1 | awk '{print $2}')
command2 | sed -n "/${targString}/{n;n;n;s#.*[/][/]#//#;p;}"
Your example data has two different Chg values in it (2167467 and 2167463), so if you just want to process this output in two different ways, it's even simpler:
cmd1 | awk '{print $2}'
cmd2 | sed -n '/Change/{n;n;n;s#.*[/][/]#//#;p;}'
I hope this helps.
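Put together in a script, that might look roughly like this (cmd1, cmd2 and cmd3 stand in for your real commands, and the awk pattern assumes the affected-file lines always start with '... '):
chg=$(cmd1 | awk '{print $2}')                     # e.g. 2167467
path=$(cmd2 | awk '/^\.\.\. /{print $2; exit}')    # e.g. //filepath/dir1/dir2/dir3/filename#2298
cmd3 "$path"                                       # whatever comes next in your workflow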
I'm not 100% clear on your question, but I would use awk.
http://www.cyberciti.biz/faq/bash-scripting-using-awk/
Your first variable would look something like this
temp="Chg 2167467 on 2012/02/13 by user1234#filename 'description of submission'"
To get the number you want do this:
temp=`echo $temp | cut -f2 -d" "`
Let the output of your second command be saved to a file something like this
command $temp > file.txt
To get what you want from the file you can run this:
temp=`tail -1 file.txt | cut -f2 -d" "`
rm file.txt
The last block of code takes the last line of the file and extracts the second whitespace-delimited field from it.
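If you would rather skip the temporary file, the same extraction can be done with a command substitution (command here stands in for your second command):
temp=`command $temp | tail -1 | cut -f2 -d" "`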

File names stacking during for loop

I am new to shell scripting, so this might be a dumb question. I haven't found an answer online though. I am taking a coworker's script and changing it so that it works for my data. Right now I am running a test that only uses three of my data files. The code hits a spot in the script where it goes through a for loop, and it is supposed to run through the loop once for each of the different files (three times).
listtumor=`cat /Users/TReiersen/Work-Folder/OV/DataProcessing/TestRun/MatchedTumorTest.txt`
for i in $listtumor
do
lst=`ls /Users/TReiersen/Work-Folder/OV/DataProcessing/TestRun/freshstart/${i}*.txt | awk -F'/' '{print $9}'`
MatchedTumorTest.txt just contains the three different file names that I am using for the test, without '.txt'. As far as I can tell, this code should just run through the script three times, once for each file. Instead I am getting this error:
ls: /Users/TReiersen/Work-Folder/OV/DataProcessing/TestRun/freshstart/TCGA-04-1514-01A-01D-0500-02_S01_CGH_105_Dec08\rTCGA-04-1530-01A-02D-0500-02_S01_CGH_105_Dec08\rTCGA-04-1542-01A-01D-0500-02_S01_CGH_105_Dec08*.txt: No such file or directory
For some reason all of the file names are stacked on top of each other instead of the loop going to each one individually. Any ideas why this is happening?
Thanks,
T.J.
It looks like the lines in your text file may be separated by carriage returns instead of newlines. Since none of the file names in your example have spaces, the for loop should work just fine if you initialize your listtumor like this:
listtumor=`tr '\r' '\n' < /Users/TReiersen/Work-Folder/OV/DataProcessing/TestRun/MatchedTumorTest.txt`
The tr command will translate the carriage returns into newlines, which is what most text-processing tools (like the shell's own for loop) expect, and write the result to standard output.
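If you want to confirm the line endings first, od (available on both Linux and macOS) will show carriage returns as \r:
od -c /Users/TReiersen/Work-Folder/OV/DataProcessing/TestRun/MatchedTumorTest.txt | head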
The for loop doesn't do too well with some kinds of separators. Try this instead:
while read line; do
lst=`ls /Users/TReiersen/Work-Folder/OV/DataProcessing/TestRun/freshstart/${line}*.txt | awk -F'/' '{print $9}'`
...
done < /Users/TReiersen/Work-Folder/OV/DataProcessing/TestRun/MatchedTumorTest.txt
I'm assuming here that you're separating MatchedTumorTest.txt with newlines.
So combined all together:
dir="/Users/TReiersen/Work-Folder/OV/DataProcessing/TestRun"
file="$dir/MatchedTumorTest.txt"
< "$file" tr '\r' '\n' | while read tumor
do
ls "$dir/freshstart" | grep "$tumor.*\.txt$"
done
will print all the .txt file names in the directory $dir/freshstart that contain a name from the file MatchedTumorTest.txt.
