Bash: replace values in multiple CSV columns

I have the following CSV format:
data_disk01,"/opt=920MB;4512;4917;0;4855","/=4244MB;5723;6041;0;6359","/tmp=408MB;998;1053;0;1109","/var=789MB;1673;1766;0;1859","/boot=53MB;656;692;0;729"
From each column except the first, I would like to keep only the last value of the semicolon-separated list, like this:
data_disk01,"/opt=4855","/=6359","/tmp=1109","/var=1859","/boot=729"
I have tried something like:
awk 'BEGIN {FS=OFS=","} {if(NF==!1);gsub(/\=.*/,",")} 1'
For a single string on its own, I managed to do it with:
string="/opt=920MB;4512;4917;0;4855"
echo $string | awk '{split($0,a,";"); print a[1],a[5]}' | sed 's#=.* #=#'
/opt=4855
But I could not make it work for the whole CSV.
Any hints are appreciated.

If your input never contains commas inside the quoted fields, a simple sed script should work:
sed 's/=[^"]*;/=/g' file.csv

Please try the following awk and let me know if it helps:
awk '{gsub(/=[^"]*;/,"=")} 1' Input_file
If you want to save the output back into Input_file, append > temp_file && mv temp_file Input_file to the command above.
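If you prefer to work field by field, closer to the split approach you already tried, a minimal awk sketch (again assuming no commas inside the quoted fields) could look like this:
awk '
BEGIN { FS = OFS = "," }
{
    for (i = 2; i <= NF; i++) {          # skip the first column
        n = split($i, a, ";")            # a[n] is the last ;-separated value
        sub(/=.*/, "=" a[n], $i)         # replace everything after "=" with it
    }
}
1' file.csv
Because a[n] still carries each field's closing quote, the quotes in the output are preserved.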

Related

Parsing and modifying csv with bash

I have a CSV file with tons of rows; a small example:
id,location_id,name,title,email,directorate
1,1, Amy lee,Singer,,
2,2,brad Pitt,Actor,,Production
3,5,Steven Spielberg,Producer,spielberg#my.com,Production
I need to:
capitalize the first and last name, for example Brad Pitt, Amy Lee;
create the email from the first letter of the first name plus the last name, all in lowercase, followed by the location_id value and #google.com, for example alee1#google.com, bpitt2#google.com;
save it to a new file.csv with the same structure, for example:
id,location_id,name,title,email,directorate
1,1, Amy Lee,Singer,alee1#google.com,
2,2,Brad Pitt,Actor,bpitt2#google.com,Production
3,5,Steven Spielberg,Producer,sspielberg5#google.com,Production
I started by creating an array and iterating through it with a bunch of sed and awk, but it gives me inconsistent results.
Please advise how to resolve this task.
while read -ra array; do
    for i in "${array[@]}"; do
        awk -F ',' '{print tolower(substr($3,1,1))$2$3"#google.com"}'
    done
    for i in "${array[@]}"; do
        awk -F "\"*,\"*" '{print $3}' | sed -e "s/\b\(.\)/\u\1/g"
    done
done < file.csv
The awk -F ',' '{print tolower(substr($3,1,1))$2$3"#google.com"}' part is not working correctly.
Using GNU sed
$ sed -E 's/([^,]*,([^,]*),) ?(([[:alpha:]])[^ ]* +)(([^,]*),[^,]*,)[^,]*/\1\u\3\u\5\L\4\6\2#google.com/' input_file
id,location_id,name,title,email,directorate
1,1,Amy Lee,Singer,alee1#google.com,
2,2,Brad Pitt,Actor,bpitt2#google.com,Production
3,5,Steven Spielberg,Producer,sspielberg5#google.com,Production
With your shown samples, please try the following awk.
awk '
BEGIN { FS = OFS = "," }
{
    split($3, arr, " ")
    val = (substr($3,1,1) arr[2] "#google.com,")
    $NF = tolower(val) $NF
    val = ""
}
1
' Input_file
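For completeness, here is a minimal sketch of a single awk that produces the expected output, assuming two-word names and no commas inside the fields (the output name new_file.csv is just illustrative):
awk '
BEGIN { FS = OFS = "," }
NR == 1 { print; next }                        # pass the header through unchanged
{
    sub(/^ +/, "", $3)                         # drop any stray leading space in the name
    n = split($3, name, " ")
    for (i = 1; i <= n; i++)                   # capitalize each name part
        name[i] = toupper(substr(name[i], 1, 1)) tolower(substr(name[i], 2))
    $3 = name[1] " " name[n]
    $5 = tolower(substr(name[1], 1, 1) name[n]) $2 "#google.com"
    print
}' file.csv > new_file.csv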

how to discard the last field of the content of a file using awk command

How do I discard the last field using awk?
The list.txt file contains data like below:
Ram/45/simple
Gin/Run/657/No/Sand
Ram/Hol/Sin
Tan/Tin/Bun
but I require output like below:
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
I tried the following command, but it prints only the last field:
cat list.txt |awk -F '/' '{print $(NF)}'
45
No
Hol
Tin
With GNU awk, you could try the following. Decrementing NF drops the last field and rebuilds the record; since the value of NF-- is the original field count, which is non-zero, the default action prints the shortened line.
awk 'BEGIN{FS=OFS="/"} NF--' Input_file
Or, with any awk, try the following:
awk 'BEGIN{FS=OFS="/"} match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}' Input_file
This simple awk should work:
awk '{sub(/\/[^/]*$/, "")} 1' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
Or this even simpler sed should also work:
sed 's~/[^/]*$~~' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
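If you would rather avoid awk and sed entirely, a rev/cut round trip does the same job (a sketch, assuming the rev utility from util-linux/BSD is available): reversing each line turns the last field into the first, cut -f2- drops it, and the second rev restores the original order.
rev list.txt | cut -d'/' -f2- | rev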

delete all lines after a specific date

I have a lot of *.csv files. I want to delete the content after a specific line: all lines after 20031231 should be removed.
How do I solve this problem with a few lines of a shell script?
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
Test,20040101,000100,0.73342,0.744318
Quick and dirty, given no other information about the constraints, keep only the lines that carry the date:
sed '/20031231/!d' YourFile
If you want to use a shell script, awk is the best fit. This will do the trick:
awk 'BEGIN {FS=","} {if ($2 == "20031231") print $0}' input.csv > output.csv
This code will write to a different file only the lines that have 20031231.
This ignores empty lines and unmatched data.
awk file:
$ cat awk.awk
{
    if ($2 <= "20031231" && $0 != "") {
        print $0
    } else {
        next
    }
}
execution:
$ awk -F',' -f awk.awk input
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
one liner:
$ awk -F',' '{if($2<="20031231" && $0!=""){print $0}else{next}}' input
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
With Miller (http://johnkerl.org/miller/doc/),
mlr --nidx --fs "," filter '$2>20031231' input
gives you the rows past the cutoff, i.e. the ones to be deleted:
Test,20040101,000100,0.73342,0.744318
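To keep only the rows up to the cutoff instead, the comparison can simply be flipped (a sketch along the same lines; with --nidx the second positional field is type-inferred as an integer):
mlr --nidx --fs "," filter '$2<=20031231' input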
With awk please try:
awk -F, '$2<=20031231' input.csv
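Since the sample file is sorted by date, another option is to stop reading at the first line past the cutoff, which is closer to "delete everything after" and avoids scanning the rest of the file (a sketch under that sorted-input assumption):
awk -F, '$2>20031231{exit} 1' input.csv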

Remove hyphens from duration-format times

I need to remove the hyphens from duration-format times, and I did not succeed with the sed command as I intended.
original output:
00:0-26:0-8
00:0-28:0-30
00:0-28:0-4
00:0-28:0-28
00:0-27:0-54
00:0-27:0-19
Expected output:
00:26:08
00:28:30
00:28:04
00:28:28
00:27:54
00:27:19
I tried the following command but I am stuck:
sed 's/;/ /g' temp_file.txt | awk '{print $8}' | grep - | sed 's/-//g;s/00:0/0:/g'
Using sed:
sed 's/\<[0-9]\>/0&/g;s/:00-/:/g' file
The first command, s/\<[0-9]\>/0&/g, zero-pads single-digit numbers.
The second command, s/:00-/:/g, then removes the padded 00- that follows each colon.
With your shown samples only, the following awk may help:
awk -F":" '{for(i=1;i<=NF;i++){sub(/0-/,"",$i); $i=(length($i)==1 ? "0" $i : $i)}} 1' OFS=":" Input_file
If you want to save the output into Input_file itself, append > temp_file && mv temp_file Input_file to the command above.
For the given example, this one-liner, which uses ":0-" itself as the field separator, does the job:
awk -F':0-' '{printf "%02d:%02d:%02d\n",$1,$2,$3}' file
What if I have output like the one below, where the duration is only one of several ;-separated columns? When I use one of your regexps above it also adds a "0" inside the timestamp and IP address columns, and I don't want that; only column $7 (the duration time) should be modified.
01;12May2018 8:20:36;192.168.1.111;78787;192.168.1.111;78787;80:25:0-49;2018-05-12_111111;RO
02;14May2018 2:43:16;192.168.1.132;78787;192.168.1.111;78787;36:10:0-10;2018-05-12_111111;RO
03;15May2018 7:40:01;192.168.131.1;78787;192.168.1.111;78787;18:39:0-44;2018-05-12_111111;RO
04;15May2018 12:37:46;192.168.1.201;78787;192.168.1.111;78787;12:51:0-14;2018-05-12_111111;RO
Here is the output:
root#root> sed 's/\<[0-9]\>/0&/g;s/:00-/:/g' temp_file
01;12May2018 08:20:36;192.168.01.111;78787;192.168.01.111;78787;80:25:49;2018-05-12_111111;RO
02;14May2018 02:43:16;192.168.01.132;78787;192.168.01.111;78787;36:10:10;2018-05-12_111111;RO
03;15May2018 07:40:01;192.168.131.01;78787;192.168.01.111;78787;18:39:44;2018-05-12_111111;RO
04;15May2018 12:37:46;192.168.01.201;78787;192.168.01.111;78787;12:51:14;2018-05-12_111111;RO
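A minimal sketch that touches only the seventh ;-separated field and leaves the timestamp and IP address columns alone (assuming the layout shown above) could look like this:
awk -F';' 'BEGIN{OFS=";"} {
    n = split($7, t, ":")              # split the duration into its : parts
    for (i = 1; i <= n; i++) {
        sub(/^0-/, "", t[i])           # drop the stray "0-" prefix
        t[i] = sprintf("%02d", t[i])   # zero-pad to two digits
    }
    $7 = t[1]                          # reassemble the duration
    for (i = 2; i <= n; i++) $7 = $7 ":" t[i]
} 1' temp_file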

cut out fields that match a regex from a delimited string

Example file:
35=A|11=ABC|55=AAA|20=DEF
35=B|66=ABC|755=AAA|800=DEF|11=ZZ|55=YYY
35=C|66=ABC|11=CC|755=AAA|800=DEF|55=UUU
35=C|66=ABC|11=XX|755=AAA|800=DEF
I want the output to print like the following, with only the 11= and 55= columns printed (they are not at fixed positions):
11=ABC|55=AAA
11=ZZ|55=YYY
11=CC|55=UUU
Thanks.
sed might be easier here:
sed -nr '/(^|\|)11=[^|]*.*\|55=/s~^.*(11=[^|]*).*(\|55=[^|]*).*$~\1\2~p' file
11=ABC|55=AAA
11=ZZ|55=YYY
11=CC|55=UUU
Try this:
$ awk -F'|' '{f=0;for (i=1;i<=NF;i++)if ($i~/^(11|55)=/){printf "%s",(f?"|":"")$i;f=1};print""}' file
11=ABC|55=AAA
11=ZZ|55=YYY
11=CC|55=UUU
11=XX
To only show lines that have both an 11 field and a 55 field:
$ awk -F'|' '/(^|\|)11=/ && /\|55=/{f=0;for (i=1;i<=NF;i++)if ($i~/^(11|55)=/){printf "%s",(f?"|":"")$i;f=1};print""}' file
11=ABC|55=AAA
11=ZZ|55=YYY
11=CC|55=UUU
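If you also want the output order fixed regardless of where the tags appear on the line, a small variation that collects the wanted tags into an array first might look like this (a sketch, assuming tag 11 should always print before tag 55):
awk -F'|' '{
    split("", want)                              # clear the array for each line
    for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "11" || kv[1] == "55")      # remember only the wanted tags
            want[kv[1]] = $i
    }
    if (("11" in want) && ("55" in want))        # print only if both are present
        print want["11"] "|" want["55"]
}' file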
