Splitting a column in a CSV file in Bash

I would like to extract values from the second column in my csv file and store the extracted values in a new column.
sample of my dataset:
page_name post_id page_id
A 86680728811_272953252761568 86680728811
A 86680728811_273859942672742 86680728811
B 86680728033_281125741936891 86680728033
B 86680728033_10150500662053812 86680728033
I would like to extract the numbers that come after the underscore and store them in a new column. Sample output:
page_name post_id page_id
A 272953252761568 86680728811
A 273859942672742 86680728811
B 281125741936891 86680728033
B 10150500662053812 86680728033
I tried using this code:
cat FB_Dataset.csv | sed -Ee 's/(.*)post_id/\1post_id/' -e 's/,[_ ]/,/' -e 's/_/,/'
but I don't get the desired output.
Any help is appreciated. Thank you.

sed 's/[0-9][0-9]*_//' < a.csv
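where a.csv is the file with your original data.
The [0-9][0-9]* part (added in an edit) matches the run of digits before the underscore, so only that prefix and the underscore are removed.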
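The sed command above removes the first run of digits followed by an underscore anywhere on the line. If only the second column should be touched, an awk variant could look like the sketch below. It assumes the columns are separated by single spaces as in the sample, and the function name strip_post_prefix is made up for illustration.

```shell
# Hypothetical helper: drop everything up to and including the first
# underscore in column 2, leaving the header row (line 1) untouched.
# Reads the named files, or standard input when no file is given.
strip_post_prefix() {
    awk 'NR > 1 { sub(/^[^_]*_/, "", $2) } { print }' "$@"
}

# Sample rows from the question, fed on standard input.
printf '%s\n' \
    'page_name post_id page_id' \
    'A 86680728811_272953252761568 86680728811' \
    'B 86680728033_10150500662053812 86680728033' |
    strip_post_prefix
```

For the three sample lines this prints the header unchanged, followed by "A 272953252761568 86680728811" and "B 10150500662053812 86680728033".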

Related

Add word using sed to the last column of csv

I tried the following sed command, and even though I used $ to anchor the match at the end of the line, it doesn't seem to work. I also don't know whether there is a way to run this line by line with a while or for loop:
sed -E 's/$/Location/'
But the output I receive is:
Locationi,d,nm,yr,dt,mnn,rmd,g,gnr,rc,ct,st,sgns,tt,fl,cmr,lng,lt,gcdng
Location,2018,2018-10-25,sh,vhcl,28,M,B,St. Ls,MO,F,attack,flng,False,-90.219,38.773,True
Input
wi,d,nm,yr,dt,mnn,rmd,g,gnr,rc,ct,st,sgns,tt,fl,cmr,lng,lt,gcdng
2,4141,Armond,2018,2018-10-25,sh,vhcl,28,M,B,St. Ls,MO,F,attack,flng,False,-90.219,38.773,True
Expected output
wi,d,nm,yr,dt,mnn,rmd,g,gnr,rc,ct,st,sgns,tt,fl,cmr,lng,lt,gcdng
2,4141,Armond,2018,2018-10-25,sh,vhcl,28,M,B,St. Ls,MO,F,attack,flng,False,-90.219,38.773,True Location
If you have the example file ex.csv containing a header-row and 3 data-rows like this:
col1,col2,col3
1,2,3
4,5,6
7,8,9
Do the replacement from row 2 onward to avoid the header. This is the 2,$ part.
Then do the search and replace at the end of each row: s/$/ Location/
Put it all together as:
$ sed '2,$s/$/ Location/' ex.csv
col1,col2,col3
1,2,3 Location
4,5,6 Location
7,8,9 Location
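The output in the question, where "Location" appears at the start of each line, is typical of files with Windows (CRLF) line endings: the text is appended after the carriage return, so the terminal draws it over the beginning of the line. That is a guess about the asker's file, not something the question confirms. If it applies, stripping the carriage returns first may help. A sketch, with append_location as a made-up name:

```shell
# Hypothetical helper: remove carriage returns, then append " Location"
# to every row except the header (line 1). Note that tr removes every
# carriage return in the input, not only those at the end of a line.
append_location() {
    tr -d '\r' | sed '2,$s/$/ Location/'
}

# Sample input with CRLF line endings, as a Windows tool might write it.
printf 'col1,col2,col3\r\n1,2,3\r\n4,5,6\r\n' | append_location
```

This prints the header unchanged, then "1,2,3 Location" and "4,5,6 Location".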

Command line: retrieving specific column from CSV file

I have a CSV file called articles.csv with headers as follows:
article_id, article_title, article_shares, article_date.
The first row of data in the article is found as $ articles.csv | sed "1 d" and this returns: "895", "Trump, Clinton, America. Who will win, who will lose?", "100", "01/05/2016".
I want to return the fourth column of data (the date of the article) so I use the following code:
$ articles.csv | sed "1 d" | cut -d , -f 4.
However I don't get the date, I get America. Who will win. How do I get the output of the fourth column, regardless of the fact that some columns have commas in them?
A quick and dirty solution:
... | awk -F'",' '{print $4}'
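The one-liner above leaves the surrounding quotes (and any leading space) on the printed field. A slightly more careful sketch splits on the whole quote-comma-quote sequence, with optional spaces after the comma, and then strips the remaining quote. It assumes every field is wrapped in double quotes and that no field contains that separator sequence itself; fourth_field is a hypothetical name.

```shell
# Hypothetical helper: print the fourth field of fully quoted CSV rows
# read from standard input. The field separator is the regex `", *"`,
# i.e. closing quote, comma, optional spaces, opening quote.
fourth_field() {
    awk -F'", *"' '{ f = $4; gsub(/^"|"$/, "", f); print f }'
}

# The record from the question.
printf '%s\n' '"895", "Trump, Clinton, America. Who will win, who will lose?", "100", "01/05/2016"' |
    fourth_field
```

For the record from the question this prints 01/05/2016.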
A slow but clean solution:
... | ruby -ne $'require "csv"; print CSV.parse($_)[0][3]'
Note: CSV format should not have spaces between fields, so change your record to:
"895","Trump, Clinton, America. Who will win, who will lose?","100","01/05/2016"

How to split a CSV file into multiple files based on column value

I have CSV file which could look like this:
name1;1;11880
name2;1;260.483
name3;1;3355.82
name4;1;4179.48
name1;2;10740.4
name2;2;1868.69
name3;2;341.375
name4;2;4783.9
There could be more or fewer rows, and I need to split it into multiple .dat files, each containing the rows with the same value in the second column. (I will then make a bar chart for each .dat file.) For this case there should be two files:
data1.dat
name1;1;11880
name2;1;260.483
name3;1;3355.82
name4;1;4179.48
data2.dat
name1;2;10740.4
name2;2;1868.69
name3;2;341.375
name4;2;4783.9
Is there any simple way of doing it with bash?
You can use awk to generate a file containing only a particular value of the second column:
awk -F ';' '($2==1){print}' data.dat > data1.dat
Just change the value in the $2== condition.
Or, if you want to do this automatically, just use:
awk -F ';' '{print > ("data"$2".dat")}' data.dat
which will output to files containing the value of the second column in the name.
Try this:
while IFS=";" read -r a b c; do echo "$a;$b;$c" >> data${b}.dat; done <file
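With many distinct values in the second column, the awk one-liner keeps one output file open per value, and some awk implementations may hit a limit on open files. A variant that appends and closes the file after each row avoids that, at some cost in speed. This is only a sketch: split_by_second is a made-up name, and because it appends, existing data*.dat files should be removed before re-running it.

```shell
# Hypothetical helper: write each row of the semicolon-separated file $1
# to data<second column>.dat in the current directory, closing the output
# file after every row so only one is open at a time.
split_by_second() {
    awk -F ';' '{ out = "data" $2 ".dat"; print >> out; close(out) }' "$1"
}

# Demonstration in a scratch directory with rows from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'name1;1;11880' 'name2;1;260.483' 'name1;2;10740.4' > input.csv
split_by_second input.csv
cat data1.dat    # the two rows whose second column is 1
```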

combine lines of csv in bash

I want to create a new CSV file for each city by combining several CSV files. One column holds the city name, and the same cities appear in every file.
For example,
I have files named after the date (YYYYMMDD): 20140713.csv, 20140714.csv, 20140715.csv...
They all have the same structure, with the same number of rows and columns. For example, 20140713.csv:
City, Data, TMinreal, TMaxreal, TMinext, TMaxext, DiffTMin, DiffTMax
Milano,20140714,19.0,28.8,18,27,1,1.8
Rome,20140714,18.1,29.3,14,29,4.1,0.3
Pisa,20140714,10.8,27.5,8,29,2.8,-1.5
Venecia,20140714,21.1,29.1,16,27,5.1,2.1
I want to combine all these CSV files and get one CSV file per city, named after the city (for example Milano.csv), containing the rows for that city from all the combined files.
For example, if I combine 20140713.csv, 20140714.csv and 20140715.csv, Milano.csv would contain:
Milano,20140713,19.0,28.8,18,26,1,2.8
Milano,20140714,19.0,28.8,20,27,-1,1.8
Milano,20140715,21.0,26.8,19,27,2,-0.2
Any ideas? Thank you.
untested, but this should work:
awk -F, 'FNR==1{next} {file = $1".csv"; print > file}' 20*.csv
You can have this bash script:
#!/bin/bash
for FILE; do
    {
        read ## Skip header
        while IFS=, read -r A B; do
            echo "$A,$B" >> "$A".csv
        done
    } < "$FILE"
done
Then run as:
bash script.sh file1.csv file2.csv ...
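Since the awk one-liner above is marked as untested, here is a small trial run that can be pasted into a shell. It is a sketch under a few assumptions: the rows are taken from the question, the header is written without spaces, the wrapper name split_by_city is hypothetical, and everything happens in a scratch directory.

```shell
# Hypothetical wrapper around the awk one-liner: skip each file's header
# (FNR == 1) and write every other row to <city>.csv in the current directory.
split_by_city() {
    awk -F, 'FNR == 1 { next } { file = $1 ".csv"; print > file }' "$@"
}

# Build two small daily files in a scratch directory and split them.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
header='City,Data,TMinreal,TMaxreal,TMinext,TMaxext,DiffTMin,DiffTMax'
printf '%s\n' "$header" 'Milano,20140713,19.0,28.8,18,26,1,2.8' > 20140713.csv
printf '%s\n' "$header" 'Milano,20140714,19.0,28.8,20,27,-1,1.8' \
    'Rome,20140714,18.1,29.3,14,29,4.1,0.3' > 20140714.csv
split_by_city 20*.csv
cat Milano.csv    # the two Milano rows, in date order
```

Unlike the bash loop, which appends with >>, awk truncates each city file the first time it opens it in a run, so re-running the command does not duplicate rows.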

I want my bash script output in html format?

I am parsing a CSV file with a bash script. The output is tabular, with a number of rows and columns, but when I redirect it to a text file the alignment breaks and it looks messy.
Can anyone show me how to redirect the output to HTML format, or suggest another way?
Thanks in advance
If you don't really need the output in HTML, but you're having trouble with column alignment using tabs, you can get good column alignment with printf.
By the way, it would help if your question included some sample input, the script that you're using to parse and output it and some sample output.
Here is a simple demonstration of printf:
$ cat file
example text,123,word,23.12
more text,1004,long sequence of words,1.1
text,1,a,1000.42
$ cat script
#!/bin/bash
headformat='%-*s%-*s%*s%*s\n'
format='%-*s%-*s%*d%*.*f\n'
modwidth=16; descwidth=24; qtywidth=6; pricewidth=10
printf "$headformat" "$modwidth" Model "$descwidth" Desc. "$qtywidth" Qty "$pricewidth" Price
while IFS=, read model quantity description price
do
    printf "$format" "$modwidth" "$model" "$descwidth" "$description" "$qtywidth" "$quantity" "$pricewidth" 2 "$price"
done < file
$ ./script
Model           Desc.                      Qty     Price
example text    word                       123     23.12
more text       long sequence of words    1004      1.10
text            a                            1   1000.42
Write it out in TSV, then have a XSLT stylesheet convert it from TSV to XHTML. You can use $'\t' in bash to produce a tab character.
A simple solution would use column(1):
column -t -s, <( echo "head1,head2,head3,head4"; cat csv.dat )
with a result like this one:
head1  head2     head3   head4
aaaa   33333     bbb     123
aaa    333333    bbbx    123
aa     3333333   bbbxx   123
a      33333333  bbbxxx  123
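If HTML output really is wanted, as the question asks, awk can wrap each row in table markup. The following is a minimal sketch, not a complete converter: csv_to_html is a made-up name, fields are assumed to contain no commas or quotes, and nothing is HTML-escaped.

```shell
# Hypothetical helper: turn comma-separated rows on standard input into
# a bare HTML table. No escaping is done, so fields containing <, > or &
# would need extra handling.
csv_to_html() {
    awk -F, '
        BEGIN { print "<table>" }
        {
            printf "<tr>"
            for (i = 1; i <= NF; i++) printf "<td>%s</td>", $i
            print "</tr>"
        }
        END { print "</table>" }
    '
}

# Two rows from the printf example above.
printf '%s\n' 'example text,123,word,23.12' 'text,1,a,1000.42' | csv_to_html
```

Redirecting the result to a file (for example csv_to_html < file > table.html, where table.html is just an illustrative name) gives something a browser can display with its own column alignment.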
