rearranging columns in a csv as new lines - bash

I have the following csv:
1,host1,group1,group2
2,host2,group3,group4
3,host3,group5
4,host4,group6,group7,group8
I want to achieve the following:
1,host1,group1
1,host1,group2
2,host2,group3
2,host2,group4
3,host3,group5
4,host4,group6
4,host4,group7
4,host4,group8
How do I do this using the Linux command line?

$ awk -F , -v OFS=, '{for(i=3;i<=NF;i++) print $1,$2,$i}' data
1,host1,group1
1,host1,group2
2,host2,group3
2,host2,group4
3,host3,group5
4,host4,group6
4,host4,group7
4,host4,group8

awk 'BEGIN{FS=","}
{nf=NF;count=3;
while(nf-2>0){
printf "%s,%s,%s\n",$1,$2,$count;
count++;nf--
}
}' your_file
This also gives the desired result:
1,host1,group1
1,host1,group2
2,host2,group3
2,host2,group4
3,host3,group5
4,host4,group6
4,host4,group7
4,host4,group8
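Both answers loop from field 3 to the last field and repeat the first two fields on every output line. As a minimal sketch, the one-liner can be wrapped in a function that reads stdin; the name expand_groups is only an illustration, not something from the answers above.

```shell
# expand_groups (hypothetical name): read CSV on stdin and print one line
# per group, repeating the first two fields (id and host) each time.
expand_groups() {
  awk -F, -v OFS=, '{ for (i = 3; i <= NF; i++) print $1, $2, $i }'
}

# Sample rows; the last one has no group fields at all.
result=$(printf '%s\n' \
  '1,host1,group1,group2' \
  '3,host3,group5' \
  '5,host5' | expand_groups)
printf '%s\n' "$result"
```

Note that a row with only an id and a host produces no output, because the loop body never runs for it.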

Related

how to discard the last field of the content of a file using awk command

how to discard the last field using awk
list.txt file contains data like below,
Ram/45/simple
Gin/Run/657/No/Sand
Ram/Hol/Sin
Tan/Tin/Bun
but I require output like below,
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
I tried the following command, but it prints only the last field:
cat list.txt |awk -F '/' '{print $(NF)}'
simple
Sand
Sin
Bun
With GNU awk, you could try the following:
awk 'BEGIN{FS=OFS="/"} NF--' Input_file
Or, with any awk, try the following:
awk 'BEGIN{FS=OFS="/"} match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}' Input_file
This simple awk should work:
awk '{sub(/\/[^/]*$/, "")} 1' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
Or even this simpler sed should also work:
sed 's~/[^/]*$~~' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
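If you would rather stay in the shell, the same "remove everything from the last slash" idea can be sketched with the ${var%pattern} parameter expansion. The helper name strip_last_field is made up for this example, and the loop will be slower than awk or sed on large files.

```shell
# strip_last_field (hypothetical helper): for each input line, remove the
# shortest suffix matching "/*", i.e. the last "/" and everything after it.
strip_last_field() {
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line%/*}"
  done
}

result=$(printf '%s\n' 'Ram/45/simple' 'Gin/Run/657/No/Sand' | strip_last_field)
printf '%s\n' "$result"
```

As with the sed version, a line that contains no slash is printed unchanged.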

delete all line after a specific date

I have a lot of *.csv files and want to delete the content after a specific line: all lines dated after 20031231 should go.
How do I solve this with a few lines of shell script?
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
Test,20040101,000100,0.73342,0.744318
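Quick and dirty, absent any other constraints. Note that the printed range ends at the first line after line 1 that contains 20031231, so later lines with the same date are dropped too: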
sed '1,/20031231/p;d' YourFile
If you want to use a shell script, the best option is awk. This will do the trick:
awk 'BEGIN {FS=","} {if ($2 == "20031231") print $0}' input.csv > output.csv
This writes only the lines whose second field is exactly 20031231 to a different file; lines with earlier dates are dropped as well.
This version ignores empty lines and rows dated after the cutoff.
awk file:
$ cat awk.awk
{
if($2<="20031231" && $0!=""){
print $0
}else{
next
}
}
execution:
$ awk -F',' -f awk.awk input
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
one liner:
$ awk -F',' '{if($2<="20031231" && $0!=""){print $0}else{next}}' input
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
With Miller (http://johnkerl.org/miller/doc/), this filter selects the rows dated after 20031231, that is, the ones to be removed:
mlr --nidx --fs "," filter '$2>20031231' input
gives you
Test,20040101,000100,0.73342,0.744318
With awk please try:
awk -F, '$2<=20031231' input.csv
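Since the question mentions many *.csv files, here is a rough sketch of applying the awk filter to each file in place. It assumes the date is always the second comma-separated field in YYYYMMDD form; the temporary directory and sample.csv exist only to make the example self-contained, and the ".tmp" suffix is an arbitrary choice.

```shell
cutoff=20031231

# Build a throwaway directory with one sample file to demonstrate the loop.
workdir=$(mktemp -d)
printf '%s\n' \
  'Test,20031231,000107,0.74843,0.74813' \
  'Test,20031231,000110,0.74842,0.74818' \
  'Test,20040101,000100,0.73342,0.744318' > "$workdir/sample.csv"

for f in "$workdir"/*.csv; do
  # Keep only rows dated on or before the cutoff, then replace the original
  # file. The mv runs only if awk succeeded.
  awk -F, -v cutoff="$cutoff" '$2 <= cutoff' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

result=$(cat "$workdir/sample.csv")
printf '%s\n' "$result"
rm -r "$workdir"
```

In real use you would point the loop at your own directory and probably keep a backup copy before overwriting anything.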

Using awk or sed split pipe separated values into new lines keeping first string as common

Below are my input and required output. How can I achieve this using sed or awk in unix? A single command or a pipe is fine.
Input
PRODUCT1,PRICEa|PRICEb|PRICEc
PRODUCT2,PRICEd
PRODUCT3,PRICEe|PRICEf
(and so on)
Output
PRODUCT1,PRICEa
PRODUCT1,PRICEb
PRODUCT1,PRICEc
PRODUCT2,PRICEd
PRODUCT3,PRICEe
PRODUCT3,PRICEf
(and so on)
The following simple awk may help:
awk -F, '{gsub(/\|/,ORS $1",")} 1' Input_file
bash
while IFS=',|' read -ra fields; do
printf "${fields[0]},%s\n" "${fields[@]:1}"
done < file
Another awk
awk -F '[,|]' -v OFS=, '{for (i=2; i<=NF; i++) print $1,$i}' file
With gnu sed
sed -E ':A;s/([^,]*,)(.*)\|(.*)/\1\2\n\1\3/;tA' infile
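The first awk answer is terse, so here is the same gsub idea spelled out with comments. Each "|" is replaced by a newline (ORS), the product name and a comma, which turns one record into several output lines. The function name split_prices is just for illustration.

```shell
# split_prices (hypothetical name): read "PRODUCT,price|price|..." lines on
# stdin and print one "PRODUCT,price" line per price.
split_prices() {
  awk -F, '{
    # Replace each "|" with: output record separator, product name, comma.
    gsub(/\|/, ORS $1 ",")
    print
  }'
}

result=$(printf '%s\n' \
  'PRODUCT1,PRICEa|PRICEb|PRICEc' \
  'PRODUCT2,PRICEd' | split_prices)
printf '%s\n' "$result"
```

One caveat: gsub treats "&" in the replacement text as "the matched text", so a product name containing "&" would need escaping. The loop-based awk answer does not have that problem.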

Parsing CSV file from DB

I have a DB dump as a comma-separated CSV file. The first line holds the headings/table names, the rest is data, and some entries are duplicated.
HOST_#_INFORMATION,HOST#,Primary Hostname,DNS Domain,IP_#_INFORMATION,Primary IP,DNS
,11,abc,example.com,,10.10.10.10,10.10.10.1
,12,bcd,example.com,,10.10.10.11,10.10.10.1
,13,cde,example.com,,10.10.10.12,10.10.10.1
,11,abc,example.com,,10.10.10.10,10.10.10.1
,13,cde,example.com,,10.10.10.12,10.10.10.1
I need to print only the unique columns between HOST_#_INFORMATION and IP_#_INFORMATION. The output I am looking for is
HOST#,Primary Hostname,DNS Domain
11,abc,example.com
12,bcd,example.com
13,cde,example.com
I tried awk with gsub, but it only prints the first line. How can I parse this CSV file? I am also open to a Perl solution. Thanks
[root@test /tmp]$ awk -F, -vOFS=, '{if(++a[$2,$3,$4]==1)print $2,$3,$4}' a
HOST#,Primary Hostname,DNS Domain
11,abc,example.com
12,bcd,example.com
13,cde,example.com
No need for awk or sed, use cut'n'sort instead:
cut -d, -f2-4 infile | sort -u
Output:
11,abc,example.com
12,bcd,example.com
13,cde,example.com
Assuming the input format shown (the OP asks for the fields between two markers but shows only one layout):
awk -F ',' 'NR == 1{print "HOST#,Primary Hostname,DNS Domain"} NR > 1{print $2 "," $3 "," $4}' YourFile
Assuming you will parse header separately from data, this is how to parse data and remove duplicates:
awk -F',' '{print $2","$3","$4}'|sort -u
In Perl you could use the Text::CSV module, which has a rich set of functions for dealing with CSV files.
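If you want the cut and sort approach but with the header kept as the first line, one possible sketch is to read the header off the stream before sorting the rest. The function name unique_hosts is invented for this example, and the column range 2-4 is taken from the sample data in the question.

```shell
# unique_hosts (hypothetical name): keep columns 2-4, print the header line
# first, then the remaining rows sorted with duplicates removed.
unique_hosts() {
  cut -d, -f2-4 | {
    IFS= read -r header        # consume only the first line of the stream
    printf '%s\n' "$header"
    sort -u                    # sort and de-duplicate everything that is left
  }
}

result=$(printf '%s\n' \
  'HOST_#_INFORMATION,HOST#,Primary Hostname,DNS Domain,IP_#_INFORMATION,Primary IP,DNS' \
  ',11,abc,example.com,,10.10.10.10,10.10.10.1' \
  ',12,bcd,example.com,,10.10.10.11,10.10.10.1' \
  ',13,cde,example.com,,10.10.10.12,10.10.10.1' \
  ',11,abc,example.com,,10.10.10.10,10.10.10.1' \
  ',13,cde,example.com,,10.10.10.12,10.10.10.1' | unique_hosts)
printf '%s\n' "$result"
```

Like the other cut and awk answers, this splits on every comma, so it is not suitable for CSV with quoted fields that contain commas.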

how can I insert a character at a certain position in a csv line

How should I go about inserting a character at a certain point in a csv line? For instance, if I had the following:
1,2,3,4,5,6,7
How could I insert ,,,,, at the spot where the 5 (fifth field) is, so it would look like
1,2,3,4,,,,,,5,6,7
I found a link for how to do this for java, but unfortunately I did not have much luck finding out how to do it with bash. Any help would be much appreciated, thanks!
You can use awk to change a specific field:
awk -F"," '{OFS=","; a=$5; $5=",,,,," a; print $0}' file
The idea is to update field 5 with the desired value and then print the whole line.
echo "1,2,3,4,5,6,7" | awk -F"," '{a=$5; $5=",,,,,"a; OFS=","; print}'
would print:
1,2,3,4,,,,,,5,6,7
awk -F, 'BEGIN{OFS=","}{$5=",,,,,"$5;print}' your_file
tested below:
> echo "1,2,3,4,5,6" | awk -F, 'BEGIN{OFS=","}{$5=",,,,,"$5;print}'
1,2,3,4,,,,,,5,6
>
or you can do it using perl:
> echo "1,2,3,4,5,6" | perl -F, -lane '$F[4]=~s/^/,,,,,/g;print join(",",@F)'
1,2,3,4,,,,,,5,6
>
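To avoid hard-coding the field number and the padding, the awk answers can be generalised with -v variables. This is only a sketch; pad_field and its two arguments are names made up for the example.

```shell
# pad_field (hypothetical helper): prefix one field of each CSV line.
#   $1 = field number to modify
#   $2 = string to insert in front of that field
pad_field() {
  awk -F, -v OFS=, -v n="$1" -v pad="$2" '{ $n = pad $n; print }'
}

result=$(echo '1,2,3,4,5,6,7' | pad_field 5 ',,,,,')
printf '%s\n' "$result"
```

Assigning to a field makes awk rebuild the line with OFS, which is why OFS is set to a comma here. Keep in mind that awk processes backslash escapes in -v values, so a padding string containing backslashes would need extra care.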
