Script for changing CSV into Key Value (KV) format - shell

I have a CSV file with data as below:
row_identifier,DBNAME,tblsps_name,Cur_size,Max_size,Used,Free,Percentage
tablespace,MRETF,RERETOSB15_DATA,51200,45600,14284,31316,31
tablespace,MRETF,SPOTLIGHT_DATA,500,2000,259,1741,13
tablespace,MRETF,DDLAUDITING,25,25,2,23,8
I want the output in the following format:
tablespace,MRETF,tblsps_name:RERETOSB15_DATA,Cur_size:51200,Max_size:45600,Used:14284,Free:31316,Percentage:31
tablespace,MRETF,tblsps_name:SPOTLIGHT_DATA,Cur_size:500,Max_size:2000,Used:259,Free:1741,Percentage:13
and so on.
Is it possible to get the output in this key:value format?

Next time at least pretend that you tried something ;-)
awk -F"," 'FNR > 1 {print $1","$2",tblsps_name:"$3",Cur_size:"$4",Max_size:"$5",Used:"$6",Free:"$7",Percentage:"$8}' your.csv
-F"," is field separator, FNR > 1 skip first header line, $1 is first column and so on

AWK write to a file based on number of fields in csv

I want to iterate over a CSV file and, while writing it out to a file, discard any row that doesn't have all of its columns.
I have an input file mtest.csv like this
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP2##TestProcess2##TestDevice2
TestIP3##TestProcess3##TestDevice3##TestID3
But I want to write only those records where all 4 columns are present. The output should not contain the TestIP2 row at all, since it has only 3 columns.
Sample output should look like this:
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP3##TestProcess3##TestDevice3##TestID3
I used to do this to get all the columns, but it also writes the TestIP2 row, which has only 3 columns:
awk -F "\##" '{print $1"\##"substr($2,1,50)"\##"substr($3,1,50)"\##"substr($4,1,50)}' mtest.csv >output2.csv
But when I try to ensure that it writes to the file only when all 4 columns are present, it doesn't work:
awk -F "\##", 'NF >3 {print $1"\##"substr($2,1,50)"\##"substr($3,1,50)"\##"substr($4,1,50); exit}' mtest.csv >output2.csv
You are making things harder than they need to be. All you need to do is check NF==4 to output only those records containing four fields. Your complete awk expression would be:
awk -F'##' NF==4 < mtest.csv
(note: the default action by awk is print so there is no explicit print required.)
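In other words, a pattern with no action is shorthand for the same pattern with an explicit print:
awk -F'##' 'NF==4 { print $0 }' < mtest.csv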
Example Use/Output
With your sample input in mtest.csv, you would receive:
$ awk -F'##' NF==4 < mtest.csv
IP##Process##Device##ID
TestIP1##TestProcess2##TestDevice1##TestID1
TestIP3##TestProcess3##TestDevice3##TestID3
Thanks David and vukung
Both your solutions are okay. I want to write to a file so that I can trim the length of each field as well.
I think the statement below works:
awk -F "##" 'NF>3 {print $1"##"substr($2,1,50)"##"substr($3,1,2)"##"substr($4,1,3)}' mtest.csv >output2.csv

Bash script OSX : split CSV

I have these lines stored in a file:
ID1,A,B,,,,F
ID2,,,,D,E,F
ID3,,B,C,,,,
and I need to transform them like this:
ID1,A
ID1,B
ID1,F
ID2,D
ID2,E
...
I tried a loop with IFS (like IFS=","; declare -a Array=($*)) without success.
Does someone know how to do that?
Pretty straightforward in Awk:
awk 'BEGIN{FS=OFS=","}{first=$1; for (i=2;i<=NF;i++) if (length($i)) print first,$i}' file
Setting the input and output field separators to ,, we store the first field in the variable first and print it in front of each of the remaining non-empty fields.
As suggested by user PS. below, you can also do:
awk -F, '{for(i=2;i<=NF;i++) if(length($i)) print $1 FS $i}' file
awk -F, '{for (i=2; i<=NF; i++) if($i != "") print $1","$i}' File
ID1,A
ID1,B
ID1,F
ID2,D
ID2,E
ID2,F
ID3,B
ID3,C
With , as the field separator, for each line, loop from the 2nd field to the last field. If the current field is not empty, print the first field (IDx) and the current field, separated by a ,.
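Since the question mentions trying IFS in bash, here is a pure-bash sketch of the same idea (assuming bash, including the stock 3.2 on OSX; file stands for your input file):
while IFS=, read -r id rest; do
    # split the remainder of the line on commas
    IFS=, read -ra fields <<< "$rest"
    for f in "${fields[@]}"; do
        # print only the non-empty cells, prefixed with the ID
        [[ -n $f ]] && printf '%s,%s\n' "$id" "$f"
    done
done < file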

Filtering data in a text file in bash

I am trying to filter the data in a text file. There are 2 fields in the file. The first one is text, while the 2nd one has 3 parts separated by _. The first part of the second field is a date in yyyyMMdd format, and the next 2 are strings:
xyz yyyyMMdd_abc_lmn
Now I want to filter the lines in the file based on the date in the second field. I have come up with the following awk command, but it doesn't seem to work: it outputs the entire file, so I am definitely missing something.
Awk command:
awk -F'\t' -v ldate='20140101' '{cdate=substr($2, 1, 8); if( cdate <= ldate) {print $1'\t\t'$2}}' label
Try:
awk -v ldate='20140101' '{split($2,fld,/_/); if(fld[1]<=ldate) print $1,$2}' file
Note:
We are using the split function, which splits the field based on the regex provided as the third argument and stores the pieces in the array named as the second argument.
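For illustration, here is a standalone example of split (the field value is made up to match the sample format):
awk 'BEGIN { n = split("20140101_abc_lmn", fld, /_/); print n, fld[1], fld[2], fld[3] }'
It prints 3 20140101 abc lmn: split returns the number of pieces, and fld[1] holds the date part that is compared against ldate.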
You don't need to set -F'\t' unless your input file is tab-delimited. The default FS splits on any run of whitespace, so setting it to a tab when the file is space-delimited might throw off how $2 is interpreted.
To output with two tabs you can set the OFS variable like:
awk -F'\t' -v OFS='\t\t' -v ldate='20140101' '{split($2,fld,/_/); if(fld[1]<=ldate) print $1,$2}' file
Try this:
awk -v ldate='20140101' 'substr($NF,1,8) <= ldate' label
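For example, with a hypothetical label file (dates invented for illustration):
$ cat label
xyz 20131225_abc_lmn
foo 20140310_def_qrs
$ awk -v ldate='20140101' 'substr($NF,1,8) <= ldate' label
xyz 20131225_abc_lmn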

CSV join some rows that have the same id

I have a CSV file like this
1,A,abc
2,A,def
1,B,smthing
1,A,ghk
5,C,smthing
Now I want to join all the rows that have the same value in field 2. In this case, that's the rows whose second element is A. The resulting file should be:
1,A,abcdef,ghk
3,B,smthing
5,C,smthing
I'm trying with awk, and I can get the second and third fields, but not the whole file, like this:
awk -F, '{a[$2]=a[$2]?a[$2]$3:$3;}END{for (i in a)print i","a[i];}' old_file.csv > new_file.csv
Update
I solved my problem with 2 commands. The first creates new_file.csv (the command above).
The second command joins old_file with new_file:
awk -F, 'NR == FNR {a[$1] = $2;} NR != FNR && a[$2] {print $1","$2","a[$2];}' new_file.csv old_file.csv > last_file.csv
The resulting last_file.csv looks like this:
1,A,abcdefghk
2,A,abcdefghk
1,B,smthing
1,A,abcdefghk
5,C,smthing
So, how can I combine those 2 commands into one better command?
Thank you!
One awk is enough:
awk 'NR==FNR{a[$2]=a[$2]==""?$3:a[$2] $3;next}{$3=a[$2]}1' FS=, OFS=, file file
1,A,abcdefghk
2,A,abcdefghk
1,B,smthing
1,A,abcdefghk
5,C,smthing
Explanation
NR==FNR{a[$2]=a[$2]==""?$3:a[$2] $3;next} merges the records into array a (the key is column 2) during the first pass; the file is given twice, so NR==FNR is true only while the first copy is read.
$3=a[$2] then reads the input file again and replaces column 3 with the merged value.
To also remove the duplicate records (by column 2), keeping the first one:
awk 'NR==FNR{a[$2]=a[$2]==""?$3:a[$2] $3;next}!b[$2]++{$3=a[$2];print}' FS=, OFS=, file file
1,A,abcdefghk
1,B,smthing
5,C,smthing
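Applied to the original filenames, the first one-liner replaces both of your commands, reading old_file.csv twice and writing last_file.csv directly:
awk 'NR==FNR{a[$2]=a[$2]==""?$3:a[$2] $3;next}{$3=a[$2]}1' FS=, OFS=, old_file.csv old_file.csv > last_file.csv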

Output the first duplicate in a csv file

How do I output the first of each set of duplicates in a CSV file?
For example, if I have:
00:0D:67:24:D7:25,1,-34,123,135
00:0D:67:24:D7:25,1,-84,567,654
00:0D:67:24:D7:26,1,-83,456,234
00:0D:67:24:D7:26,1,-86,123,124
00:0D:67:24:D7:2C,1,-56,245,134
00:0D:67:24:D7:2C,1,-83,442,123
00:18:E7:EB:BC:A9,5,-70,123,136
00:18:E7:EB:BC:A9,5,-90,986,545
00:22:A4:25:A8:F9,6,-81,124,234
00:22:A4:25:A8:F9,6,-90,456,654
64:0F:28:D9:6E:F9,1,-67,789,766
64:0F:28:D9:6E:F9,1,-85,765,123
74:9D:DC:CB:73:89,10,-70,253,777
I want my output to look like this:
00:0D:67:24:D7:25,1,-34,123,135
00:0D:67:24:D7:26,1,-83,456,234
00:0D:67:24:D7:2C,1,-56,245,134
00:18:E7:EB:BC:A9,5,-70,123,136
00:22:A4:25:A8:F9,6,-81,124,234
64:0F:28:D9:6E:F9,1,-67,789,766
74:9D:DC:CB:73:89,10,-70,253,777
I was thinking along the lines of first outputting the first line of the CSV file, something like awk (code that outputs first row) >> file.csv, then comparing the first field of the row to the first field of the next row; if they are the same, check the next row. When it reaches a row with a new first field, the code outputs that new, different row, again with awk (code that outputs) >> file.csv, and it repeats until the check is complete.
I'm kind of new to bash coding, but I love it so far. I'm currently parsing a CSV file and I need some help. Thanks, everyone!
Using awk:
awk -F, '!a[$1]++' file.csv
awk builds an array where the 1st column is the key and the value counts how many times that key has been seen. a[$1]++ evaluates to the count before incrementing, so it is 0 (false) the first time a key appears; !a[$1]++ is therefore true only on the 1st occurrence of the 1st column, and hence only the first line for each key gets printed.
If I understand what you're getting at, you want something like this:
prev_field=""
while read -r line
do
    current_field=$(echo "$line" | cut -d ',' -f 1)
    [[ $current_field != "$prev_field" ]] && echo "$line"
    prev_field=$current_field
done < "stuff.csv"
Where stuff.csv is the name of your file. That's assuming that what you're trying to do is take the first field of each CSV row and only print its first occurrence; if that's the case, I think your expected output may be missing a few lines.
Using uniq:
sort lines.csv | uniq -w 17
Provided your first column is fixed width (17 characters); note that -w is a GNU uniq option. lines.csv is a file with your original input.
perl -F, -lane '$x{$F[0]}++;print if($x{$F[0]}==1)' your_file
If you want to change the file in place:
perl -i -F, -lane '$x{$F[0]}++;print if($x{$F[0]}==1)' your_file
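If you would rather keep a backup of the original, -i accepts a suffix, e.g.:
perl -i.bak -F, -lane '$x{$F[0]}++;print if($x{$F[0]}==1)' your_file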
