I have the following csv:
1,host1,group1,group2
2,host2,group3,group4
3,host3,group5
4,host4,group6,group7,group8
I want to achieve the following:
1,host1,group1
1,host1,group2
2,host2,group3
2,host2,group4
3,host3,group5
4,host4,group6
4,host4,group7
4,host4,group8
How do I do this using linux command line?
$ awk -F , -v OFS=, '{for(i=3;i<=NF;i++) print $1,$2,$i}' data
1,host1,group1
1,host1,group2
2,host2,group3
2,host2,group4
3,host3,group5
4,host4,group6
4,host4,group7
4,host4,group8
awk 'BEGIN{FS=","}
{
    nf=NF; count=3
    while (nf-2 > 0) {
        printf "%s,%s,%s\n", $1, $2, $count
        count++; nf--
    }
}' your_file
would also give you the desired result:
1,host1,group1
1,host1,group2
2,host2,group3
2,host2,group4
3,host3,group5
4,host4,group6
4,host4,group7
4,host4,group8
How do I discard the last field using awk?
The list.txt file contains data like below:
Ram/45/simple
Gin/Run/657/No/Sand
Ram/Hol/Sin
Tan/Tin/Bun
but I require output like below,
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
I tried the following command, but it prints only the last field:
cat list.txt |awk -F '/' '{print $(NF)}'
45
No
Hol
Tin
With GNU awk, you could try the following. NF-- drops the last field, and since the expression evaluates to the original non-zero field count, the shortened record is printed:
awk 'BEGIN{FS=OFS="/"} NF--' Input_file
Or, with any awk, try the following.
awk 'BEGIN{FS=OFS="/"} match($0,/.*\//){print substr($0,RSTART,RLENGTH-1)}' Input_file
This simple awk should work:
awk '{sub(/\/[^/]*$/, "")} 1' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
Or this even simpler sed should also work:
sed 's~/[^/]*$~~' file
Ram/45
Gin/Run/657/No
Ram/Hol
Tan/Tin
I have a lot of *.csv files. I want to delete the content after a specific point: every line after the 20031231 entries should be removed.
How do I solve this problem with a few lines of a shell script?
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
Test,20040101,000100,0.73342,0.744318
Quick and dirty, given no other information about constraints (requires GNU sed): print lines until the first one that no longer contains 20031231, then quit.
sed '/20031231/!Q' YourFile
If you want to use a shell script, awk is the best tool for this. This will do the trick:
awk 'BEGIN {FS=","} {if ($2 == "20031231") print $0}' input.csv > output.csv
This writes only the lines whose second field is 20031231 to a separate file.
This skips empty lines and lines whose date field is greater than 20031231.
awk file:
$ cat awk.awk
{
    if ($2 <= "20031231" && $0 != "") {
        print $0
    } else {
        next
    }
}
execution:
$ awk -F',' -f awk.awk input
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
one liner:
$ awk -F',' '{if($2<="20031231" && $0!=""){print $0}else{next}}' input
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
With Miller (http://johnkerl.org/miller/doc/):
mlr --nidx --fs "," filter '$2 <= 20031231' input
gives you
Test,20031231,000107,0.74843,0.74813
Test,20031231,000107,0.74838,0.74808
Test,20031231,000108,0.74841,0.74815
Test,20031231,000108,0.74835,0.74809
Test,20031231,000110,0.74842,0.74818
With awk, please try the following:
awk -F, '$2<=20031231' input.csv
Below are my input and required output. How can I achieve this using sed or awk in unix? A single command or a pipe is fine.
Input
PRODUCT1,PRICEa|PRICEb|PRICEc
PRODUCT2,PRICEd
PRODUCT3,PRICEe|PRICEf
(and so on)
Output
PRODUCT1,PRICEa
PRODUCT1,PRICEb
PRODUCT1,PRICEc
PRODUCT2,PRICEd
PRODUCT3,PRICEe
PRODUCT3,PRICEf
(and so on)
The following simple awk may help; it replaces every | with a newline followed by the first field and a comma:
awk -F, '{gsub(/\|/,ORS $1",")} 1' Input_file
With bash:
while IFS=',|' read -ra fields; do
    printf "${fields[0]},%s\n" "${fields[@]:1}"
done < file
Another awk
awk -F '[,|]' -v OFS=, '{for (i=2; i<=NF; i++) print $1,$i}' file
With GNU sed:
sed -E ':A;s/([^,]*,)(.*)\|(.*)/\1\2\n\1\3/;tA' infile
I have this DB dump in a comma-separated CSV file. The first line is the heading/table names, the rest are data rows, and some of them are duplicate entries.
HOST_#_INFORMATION,HOST#,Primary Hostname,DNS Domain,IP_#_INFORMATION,Primary IP,DNS
,11,abc,example.com,,10.10.10.10,10.10.10.1
,12,bcd,example.com,,10.10.10.11,10.10.10.1
,13,cde,example.com,,10.10.10.12,10.10.10.1
,11,abc,example.com,,10.10.10.10,10.10.10.1
,13,cde,example.com,,10.10.10.12,10.10.10.1
I need to print only the unique columns between HOST_#_INFORMATION and IP_#_INFORMATION. The output I am looking for is:
HOST#,Primary Hostname,DNS Domain
11,abc,example.com
12,bcd,example.com
13,cde,example.com
I tried the awk gsub option, but it only prints the first line. How can I parse this CSV file? I am open to a Perl option as well. Thanks.
[root@test /tmp]$ awk -F, -vOFS=, '{if(++a[$2,$3,$4]==1)print $2,$3,$4}' a
HOST#,Primary Hostname,DNS Domain
11,abc,example.com
12,bcd,example.com
13,cde,example.com
No need for awk or sed, use cut'n'sort instead:
cut -d, -f2-4 infile | sort -u
Output:
11,abc,example.com
12,bcd,example.com
13,cde,example.com
Assuming your input format (the OP says "between two fields" but shows only one configuration):
awk -F ',' 'NR == 1{print "HOST#,Primary Hostname,DNS Domain"} NR > 1{print $2 "," $3 "," $4}' YourFile
Assuming you will parse header separately from data, this is how to parse data and remove duplicates:
awk -F',' '{print $2","$3","$4}'|sort -u
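For example, a minimal sketch of that full approach, keeping the header row intact and deduplicating only the data rows (assuming the input file is named a, as in the prompt shown above):
head -n 1 a | awk -F',' '{print $2","$3","$4}'
tail -n +2 a | awk -F',' '{print $2","$3","$4}' | sort -u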
In Perl, you could use the Text::CSV module, which has a rich set of functions for dealing with CSV files.
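For instance, a minimal sketch along those lines (the input file name a and the field positions 2-4 are assumptions based on the sample above):
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 }) or die "Cannot use Text::CSV: " . Text::CSV->error_diag;
my %seen;
open my $fh, '<', 'a' or die "a: $!";      # 'a' is an assumed input file name
while (my $row = $csv->getline($fh)) {
    my $key = join ',', @{$row}[1 .. 3];   # HOST#, Primary Hostname, DNS Domain
    print "$key\n" unless $seen{$key}++;   # print each combination only once
}
close $fh;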
How should I go about inserting a character at a certain point in a csv line? For instance, if I had the following:
1,2,3,4,5,6,7
How could I insert ,,,,, at the spot where the 5 (fifth field) is, so it would look like
1,2,3,4,,,,,,5,6,7
I found a link for how to do this in Java, but unfortunately I did not have much luck finding out how to do it with bash. Any help would be much appreciated, thanks!
You can use awk to change a specific field:
awk -F"," '{OFS=","; a=$5; $5=",,,,,",a; print $0}' file
The idea is to update field 5 with the desired value and then print the whole line.
echo "1,2,3,4,5,6,7" | awk -F"," '{a=$5; $5=",,,,,"a; OFS=","; print}'
would print:
1,2,3,4,,,,,,5,6,7
awk -F, 'BEGIN{OFS=","}{$5=",,,,,"$5;print}' your_file
tested below:
> echo "1,2,3,4,5,6" | awk -F, 'BEGIN{OFS=","}{$5=",,,,,"$5;print}'
1,2,3,4,,,,,,5,6
>
or you can do it using perl:
> echo "1,2,3,4,5,6" | perl -F, -lane '$F[4]=~s/^/,,,,,/g;print join(",",#F)'
1,2,3,4,,,,,,5,6
>