combine lines of csv in bash - bash

I want to create a new CSV file for each city by combining several CSV files. Each file has rows and columns, and one column holds the city names, which repeat across all the files.
For example,
I have files named by date (YYYYMMDD): 20140713.csv, 20140714.csv, 20140715.csv...
They all have the same structure, with the same number of rows and columns. For example, 20140713.csv:
City, Data, TMinreal, TMaxreal, TMinext, TMaxext, DiffTMin, DiffTMax
Milano,20140714,19.0,28.8,18,27,1,1.8
Rome,20140714,18.1,29.3,14,29,4.1,0.3
Pisa,20140714,10.8,27.5,8,29,2.8,-1.5
Venecia,20140714,21.1,29.1,16,27,5.1,2.1
I want to combine all these CSV files and get one CSV file per city, named after the city (for example, Milano.csv), containing all the rows for that city from the combined files.
For example, if I combine 20140713.csv, 20140714.csv and 20140715.csv, Milano.csv would contain:
Milano,20140713,19.0,28.8,18,26,1,2.8
Milano,20140714,19.0,28.8,20,27,-1,1.8
Milano,20140715,21.0,26.8,19,27,2,-0.2
Any ideas? Thank you.

untested, but this should work:
awk -F, 'FNR==1{next} {file = $1".csv"; print > file}' 20*.csv
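With many distinct cities, awk can run into the limit on simultaneously open files. Here is a minimal sketch of a variant that appends and closes each output file after every write, so only one is open at a time. The sample rows and the temporary directory are made up for illustration, and because it appends, rerunning it duplicates rows unless the old city files are removed first.

```shell
# Hypothetical sample data in a scratch directory (not from the question).
workdir=$(mktemp -d)
printf '%s\n' 'City,Data,TMinreal' 'Milano,20140713,19.0' 'Rome,20140713,18.1' > "$workdir/20140713.csv"
printf '%s\n' 'City,Data,TMinreal' 'Milano,20140714,19.0' 'Rome,20140714,18.1' > "$workdir/20140714.csv"

# Skip each file's header, then append the row to <city>.csv and close it again.
( cd "$workdir" && awk -F, 'FNR == 1 { next } { file = $1 ".csv"; print >> file; close(file) }' 20*.csv )

cat "$workdir/Milano.csv"
```

The glob 20*.csv only matches the dated input files, so the generated city files are not picked up as input.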

You can use this bash script:
#!/bin/bash
# Each argument is one input CSV file.
for FILE; do
{
read -r ## Skip header
# A gets the city name; B gets the rest of the line.
while IFS=, read -r A B; do
echo "$A,$B" >> "$A".csv
done
} < "$FILE"
done
Then run it as:
bash script.sh file1.csv file2.csv ...
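The script relies on read assigning everything after the first comma to the last variable, so the whole row is preserved even though only two names are given. A small sketch of that behaviour, using one row from the question (the variable names city and rest are just for illustration):

```shell
line='Milano,20140714,19.0,28.8,18,27,1,1.8'

# With IFS set to a comma, the first field goes to "city" and the
# remainder of the line, commas included, goes to "rest".
IFS=, read -r city rest <<EOF
$line
EOF

echo "city=$city"
echo "rest=$rest"
```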


How to parse specific values from csv file to a for loop command? [duplicate]

I am trying to write a for loop where I conditionally parse specific values from a csv file into the do command.
My situation is as follows:
I have several directories containing genome sequences. The samples are numbered and the directories are named accordingly.
Dir 1 contains sample1_genome.fasta
Dir 2 contains sample2_genome.fasta
Dir 3 contains sample3_genome.fasta
The genome sequences have differing average read lengths, and it is important to account for this. Therefore, I created a csv file containing the sample number and the corresponding average read length of the genome sequence.
csv file example (first column = sample_no, 2nd column = avg_read_length):
1,130
2,134
3,129
Now, I want to loop through the directories, take the genome sequences as input and pass the respective average read length to the process.
My code is as follows:
for f in *
do
shortbred_quantify.py --genome $f/sample${f%}.fasta --aerage_read_length *THE SAMPLE MATCHING VALUE FROM 2nd COLUMN* --results results/quantify_results_sample${f%}
done
Can you help me out with this?
Use awk. $2 is the second field, $1 is the first. eg:
$ cat input
1,130
2,134
3,129
$ awk '$2 == avgReadBP{ print $1 }' FS=, avgReadBP=134 input
2
So your command ends up looking like:
genome="$f"/genome_sample.fasta
# Look up the read length (column 2) for sample number "$f" (column 1) in the csv file "input"
shortbred_quantify.py --genome "$genome" \
--avgreadBP "$(awk '$1 == s { print $2 }' FS=, s="$f" input)" \
--results results/quantify_results_sample"${f}"
Don't forget to quote the filename.
I would structure it along these lines:
while IFS=, read -r sample read_length
do
shortbred_quantify.py --genome "$sample/genome_sample.fasta" --avgreadBP "$read_length" --results "results/quantify_results_sample$sample"
done < ./input
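Before running the real tool it can help to do a dry run that only prints the commands the loop would execute. A minimal sketch, assuming the csv from the question is saved as a file called input in a scratch directory; the echo stands in for the actual shortbred_quantify.py call:

```shell
workdir=$(mktemp -d)
printf '%s\n' '1,130' '2,134' '3,129' > "$workdir/input"

# Build one command line per csv row instead of executing anything.
commands=$(
  while IFS=, read -r sample read_length; do
    echo "shortbred_quantify.py --genome $sample/genome_sample.fasta --avgreadBP $read_length --results results/quantify_results_sample$sample"
  done < "$workdir/input"
)

printf '%s\n' "$commands"
```

Once the printed commands look right, the echo can be replaced by the real invocation.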

Need to export values stored in 2 different variables to two separate columns in csv file using bash script

My bash script stores data in two different variables, like this:
Var1="My deployment name is A"
Var2="My deployment url is B"
Note: Var1 and Var2 contain multiple lines, so each needs to be exported to its own CSV column (the 1st and 2nd, respectively).
I would like to export these two values to a CSV file so that the value of Var1 goes in the 1st column and the value of Var2 in the 2nd column of the same file.
Can someone please help me with this? Thanks in advance.
Two Variables = Two Cells
If the variables never contain special symbols like " or , or linebreaks then it is as simple as
echo "$Var1,$Var2" > file.csv
If there could be a special symbol inside them, you have to quote them, for instance using
printf '"%s","%s"\n' "${Var1//\"/\"\"}" "${Var2//\"/\"\"}" > file.csv
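The quoting rule used here (double every embedded quote, then wrap the value in quotes) can also be put in a small helper. This is only a sketch: csv_quote is a hypothetical name, and it uses sed instead of bash's pattern substitution so it also runs in a plain POSIX shell.

```shell
csv_quote() {
  # Double embedded quotes, then wrap the whole value in quotes.
  printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

# Made-up values containing a quote and a comma.
Var1='My deployment name is "A", version 1'
Var2='My deployment url is B'

row="$(csv_quote "$Var1"),$(csv_quote "$Var2")"
printf '%s\n' "$row"
```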
Two Variables = Two Columns With Multiple Cells (One Cell Per Line)
Without quoting
paste -d, <(echo "$Var1") <(echo "$Var2") > file.csv
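For a shell without process substitution, the same unquoted two-column result can be produced through temporary files. A minimal sketch with made-up two-line values; the file names col1, col2 and the scratch directory are assumptions for illustration:

```shell
Var1=$(printf 'name A\nname B')
Var2=$(printf 'url A\nurl B')

workdir=$(mktemp -d)
# One value per line in each column file.
printf '%s\n' "$Var1" > "$workdir/col1"
printf '%s\n' "$Var2" > "$workdir/col2"

# paste joins line N of col1 with line N of col2, separated by a comma.
paste -d, "$workdir/col1" "$workdir/col2" > "$workdir/file.csv"
cat "$workdir/file.csv"
```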
With quoting
varToCol() {
sed 's/"/""/g;s/.*/"&"/' <<< "${!1}"
}
paste -d, <(varToCol Var1) <(varToCol Var2) > file.csv
For a pure bash solution, replace the function with the following. Note that this bash version adds an unnecessary empty field at the bottom of the column if the variable ends with a newline character.
varToCol() {
local col="${!1//\"/\"\"}"
printf %s\\n "\"${col//$'\n'/$'"\n"'}\""
}

Want to sort a file based on another file in unix shell

I have 2 files refer.txt and parse.txt
refer.txt contains the following
julie,remo,rob,whitney,james
parse.txt contains
remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,whitney/hello/1.0,julie/hello/2.0,julie/hello/3.0,rob/hello/4.0,james/hello/6.0
Now my output.txt should list the entries in parse.txt in the order specified in refer.txt.
An example of output.txt:
julie/hello/2.0,julie/hello/3.0,remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,rob/hello/4.0,whitney/hello/1.0,james/hello/6.0
I have tried the following code:
sort -nru refer.txt parse.txt
but no luck.
Please assist me. TIA
You can do that using gnu-awk:
awk -F/ -v RS=',|\n' 'FNR==NR{a[$1] = (a[$1])? a[$1] "," $0 : $0 ; next}
{s = (s)? s "," a[$1] : a[$1]} END{print s}' parse.txt refer.txt
Output:
julie/hello/2.0,julie/hello/3.0,remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,rob/hello/4.0,whitney/hello/1.0,james/hello/6.0
Explanation:
-F/ # Use field separator as /
-v RS=',|\n' # Use record separator as comma or newline
NR == FNR { # While processing parse.txt
a[$1]=(a[$1])?a[$1] ","$0:$0 # create an array with 1st field as key and value as all the
# records with keys julie, remo, rob etc.
}
{ # while processing the second file refer.txt
s = (s)?s "," a[$1]:a[$1] # aggregate all values by reading key from 2nd file
}
END {print s } # print all the values
In pure native bash (4.x):
# read each file into an array
IFS=, read -r -a values <parse.txt
IFS=, read -r -a ordering <refer.txt
# create a map from content before "/" to comma-separated full values in preserved order
declare -A kv=( )
for value in "${values[@]}"; do
key=${value%%/*}
if [[ ${kv[$key]} ]]; then
kv[$key]+=",$value" # already exists, comma-separate
else
kv[$key]="$value"
fi
done
# go through refer list, putting full value into "out" array for each entry
out=( )
for value in "${ordering[@]}"; do
out+=( "${kv[$value]}" )
done
# print "out" array in comma-separated form
IFS=,
printf '%s\n' "${out[*]}" >output.txt
If you're getting more output fields than you have input fields, you're probably trying to run this with bash 3.x. Since associative array support is mandatory for correct operation, this won't work.
tr , "\n" <refer.txt | cat -n >person_id.txt # 'cat -n' is not POSIX; sed and paste could be used instead
# one file per person: the file name is the key, the content is the id
while read -r person_id person_key
do
printf '%s\n' "$person_id" >"$person_key"
done <person_id.txt
# prefix each record with its key: "remo remo/hello/1.0"
tr , "\n" <parse.txt | sed -E 's/^([^\/]*)(\/.*)$/\1 \1\2/' >person_data.txt
while read -r foreign_key person_data
do
person_id="$(<"$foreign_key")"
printf '%s %s\n' "$person_id" "$person_data" >>merge.txt
done <person_data.txt
sort -n merge.txt >output.txt
A text book data processing approach, a person id table, a person data table, merged on a common key field, which is the first name of the person:
[person_key] [person_id]
- person id table, a unique sortable 'id' for each person (line number in this instance, since that is the desired sort order), and key for each person (their first name)
[person_key] [person_data]
- person data table, the data for each person indexed by 'person_key'
[person_id] [person_data]
- a merge of the 'person_id' table and 'person_data' table on 'person_key', which can then be sorted on person_id, giving the output as requested
The trick is to implement an associative array using files, the file name being the key (in this instance 'person_key'), the content being the value. [Essentially a random access file implemented using the filesystem.]
This actually adds a step to the otherwise simple but not very efficient task of grepping parse.txt with each value in refer.txt. I'm not sure which of the two is more efficient.
NB: The above code is very unlikely to work out of the box.
NBB: On reflection, probably a better way of doing this would be to use the file system to create a random access file of parse.txt (essentially an index), and to then consider refer.txt as a batch file, submitting it as a job as such, printing out from the parse.txt random access file the data for each of the names read in from refer.txt in turn:
# 1) index data file on required field
mkdir -p ./person_data
tr , "\n" <parse.txt | while read -r data
do
key="$(printf '%s\n' "$data" | cut -d/ -f1)" # the name before the first slash
printf '%s\n' "$data" >>./person_data/"$key"
done
# 2) run batch job
tr , "\n" <refer.txt | while read -r key
do
cat ./person_data/"$key"
done
However having said that, using egrep is probably just as rigorous a solution or at least for small datasets, I would most certainly use this approach given the specific question posed. (Or maybe not! The above could well prove faster as well as being more robust.)
Command
while read line; do
grep -w "^$line" <(tr , "\n" < parse.txt)
done < <(tr , "\n" < refer.txt) | paste -s -d , -
Key points
For both files, commas are translated to newlines using the tr command (without actually changing the files themselves). This is useful because while read and grep work under the assumption that your records are separated by newlines instead of commas.
while read will read in every name from refer.txt (i.e. julie, remo, etc.) and then use grep to retrieve the lines from parse.txt that start with that name.
The ^ in the regex ensures matching is only performed from the start of the string and not in the middle (thanks to @CharlesDuffy's comment below), and the -w option for grep allows whole-word matching only. For example, this ensures that "rob" only matches "rob/..." and not "robby/..." or "throb/...".
The paste command at the end will comma-separate the results. Removing this command will print each result on its own line.
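The effect of combining ^ with -w can be checked in isolation. A tiny sketch with three made-up records, of which only the first should match:

```shell
# "robby" fails the whole-word test and "throb" fails the start-of-line anchor.
matches=$(printf '%s\n' 'rob/hello/4.0' 'robby/hello/1.0' 'throb/hello/2.0' | grep -w '^rob')

printf '%s\n' "$matches"
```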

How to split a CSV file into multiple files based on column value

I have CSV file which could look like this:
name1;1;11880
name2;1;260.483
name3;1;3355.82
name4;1;4179.48
name1;2;10740.4
name2;2;1868.69
name3;2;341.375
name4;2;4783.9
There could be more or fewer rows, and I need to split the file into multiple .dat files, each containing the rows that share the same value in the second column. (I will then make a bar chart from each .dat file.) In this case there should be two files:
data1.dat
name1;1;11880
name2;1;260.483
name3;1;3355.82
name4;1;4179.48
data2.dat
name1;2;10740.4
name2;2;1868.69
name3;2;341.375
name4;2;4783.9
Is there any simple way of doing it with bash?
You can use awk to generate a file containing only a particular value of the second column:
awk -F ';' '($2==1){print}' data.dat > data1.dat
Just change the value in the $2== condition.
Or, if you want to do this automatically, just use:
awk -F ';' '{print > ("data"$2".dat")}' data.dat
which will output to files containing the value of the second column in the name.
Try this:
while IFS=";" read -r a b c; do echo "$a;$b;$c" >> "data${b}.dat"; done <file
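One practical difference between the two approaches shows up when the split is run more than once: awk's print > file starts each output file fresh on every run, while the shell loop's >> keeps appending. A sketch that runs both twice on a shortened, made-up copy of the data; the awk_data and loop_data prefixes and the scratch directory are assumptions for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'name1;1;11880' 'name2;1;260.483' 'name1;2;10740.4' > "$workdir/data.csv"

for run in 1 2; do
  # awk truncates each output file the first time it opens it in a run.
  awk -F ';' -v dir="$workdir" '{ print > (dir "/awk_data" $2 ".dat") }' "$workdir/data.csv"
  # The shell loop appends, so rows accumulate across runs.
  while IFS=';' read -r a b c; do
    echo "$a;$b;$c" >> "$workdir/loop_data${b}.dat"
  done < "$workdir/data.csv"
done

awk_lines=$(($(wc -l < "$workdir/awk_data1.dat")))
loop_lines=$(($(wc -l < "$workdir/loop_data1.dat")))
echo "awk: $awk_lines lines, loop: $loop_lines lines"
```

With the loop version, removing the old data*.dat files before each run avoids the duplicates.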

comparing csv files

I want to write a shell script to compare two .csv files. The first one contains filename,path and the second one contains filename,path,target. I want to compare the two files and output the target name wherever a file from the first .csv exists in the second .csv.
Ex.
a.csv
build.xml,/home/build/NUOP/project1
eesX.java,/home/build/adm/acl
b.csv
build.xml,/home/build/NUOP/project1,M1
eesX.java,/home/build/adm/acl,M2
ddexse3.htm,/home/class/adm/33eFg
I want the output to be something like this.
M1 and M2
Please help
Thanks,
If you don't necessarily need a shell script, you can easily do it in Python like this:
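import csv

# Collect every (filename, path) pair from a.csv.
seen = set()
with open('a.csv', newline='') as f:
    for row in csv.reader(f):
        seen.add(tuple(row))
# Print the target of each b.csv row whose first two fields were seen.
with open('b.csv', newline='') as f:
    for row in csv.reader(f):
        if tuple(row[:2]) in seen:
            print(row[2])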
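Since the question asked for a shell script, the same set-lookup idea can be sketched with awk. This assumes that fields never contain embedded commas or quotes, which plain awk does not handle; the sample files are copied from the question into a scratch directory.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'build.xml,/home/build/NUOP/project1' 'eesX.java,/home/build/adm/acl' > "$workdir/a.csv"
printf '%s\n' 'build.xml,/home/build/NUOP/project1,M1' 'eesX.java,/home/build/adm/acl,M2' 'ddexse3.htm,/home/class/adm/33eFg' > "$workdir/b.csv"

targets=$(awk -F, '
  # First file: remember each filename,path pair.
  FNR == NR { seen[$1 FS $2]; next }
  # Second file: print the target when the pair was seen and a target exists.
  (($1 FS $2) in seen) && NF >= 3 { print $3 }
' "$workdir/a.csv" "$workdir/b.csv")

printf '%s\n' "$targets"
```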
if those M1 and M2 are always at field 3 and 5, you can try this
awk -F"," 'FNR==NR{
split($3,b," ")
split($5,c," ")
a[$1]=b[1]" "c[1]
next
}
($1 in a){
print "found: " $1" "a[$1]
}' file2.txt file1.txt
output
# cat file2.txt
build.xml,/home/build/NUOP/project1,M1 eesX.java,/home/build/adm/acl,M2 ddexse3.htm,/home/class/adm/33eFg
filename, blah,M1 blah, blah, M2 blah , end
$ cat file1.txt
build.xml,/home/build/NUOP/project1 eesX.java,/home/build/adm/acl
$ ./shell.sh
found: build.xml M1 M2
try http://sourceforge.net/projects/csvdiff/
Quote:
csvdiff is a Perl script to diff/compare two csv files with the possibility to select the separator. Differences will be shown like: "Column XYZ in record 999" is different. After this, the actual and the expected result for this column will be shown.
