How to print a range of columns in a CSV in AWK? [duplicate] - bash

This question already has answers here:
Extract specific columns from delimited file using Awk
(8 answers)
Closed 4 years ago.
With awk, I can print any column within a CSV, e.g., this will print the 10th column in file.csv.
awk -F, '{ print $10 }' file.csv
If I need to print columns 5-10, including the comma, I only know this way:
awk -F, '{ print $5","$6","$7","$8","$9","$10 }' file.csv
This method is not so good if I want to print many columns. Is there a simpler syntax for printing a range of columns in a CSV in awk?

The standard way to do this in awk is using a for loop:
awk -v s=5 -v e=10 'BEGIN{FS=OFS=","}{for (i=s; i<=e; ++i) printf "%s%s", $i, (i<e?OFS:ORS)}' file
However, if your delimiter is simple (as in your example), you may prefer to use cut:
cut -d, -f5-10 file
Perl deserves a mention (using -a to enable autosplit mode):
perl -F, -lane '$"=","; print "#F[4..9]"' file

You can use a loop in awk to print columns from 5 to 10:
awk -F, '{ for (i=5; i<=10; i++) print $i }' file.csv
Keep in mind that using print it will print each columns on a new line. If you want to print them on same line using OFS then use:
awk -F, -v OFS=, '{ for (i=5; i<=10; i++) printf("%s%s", $i, OFS) }' file.csv

With GNU awk for gensub():
$ cat file
a,b,c,d,e,f,g,h,i,j,k,l,m
$
$ awk -v s=5 -v n=6 '{ print gensub("(([^,]+,){"s-1"})(([^,]+,){"n-1"}[^,]+).*","\\3","") }' file
e,f,g,h,i,j
s is the start position and n is the number of fields to print from that point on. Or if you prefer to specify start and end:
$ awk -v s=5 -v e=10 '{ print gensub("(([^,]+,){"s-1"})(([^,]+,){"e-s"}[^,]+).*","\\3","") }' file
e,f,g,h,i,j
Note that this will only work with single-character field separators since it relies on being able to negate the FS in a character class.

Related

How to awk to find and replace string in 2,3,4 and 5th column if matching pattern exist

My text file consist of 5 columns. Based on the user input want to search only in 2nd, 3rd, 4th and 5th column and replace it.
for example
oldstring="Old_word"
newstring="New_word"
So want to find all the exact match of oldstring and replace the same with newstring.
Column one should be untouched even if there is a match.
Browsed and found that awk will do but I am able to change in one particular column.
bash script
$ cat testfile
a,b,c Old_word,d,e
f,gOld_wordOld_wordOld_words,h,i,j
$ awk -F, -v OFS=, -v oldstring="Old_word" -v newstring="New_Word" '{
for (i=2; i<=5; i++) { gsub(oldstring,newstring,$i) }
print
}' testfile
a,b,c New_Word,d,e
f,gNew_WordNew_WordNew_Words,h,i,j
For more information about awk read the awk info page
Another way, similar to Glenn Jackman's answer is :
$ awk -F, -v OFS=, -v old="Old_word" -v new="New_word" '{
s=$1; $1=""; gsub(old,new,$0); $1=s
print }' <file>

Need to use awk to get a specific word or value after another specific word?

I need to use awk to get a specific word or value after another specific word, I tried some awk commands already but after many other filters like grep and sed. The file that I need to get the word from is having the same line more than one time like the below line:
Configuration: number=6 model=MSA SNT=4 IC=8 SIZE=16384MB NRF=24 meas=2.00
If need 24 I used
grep IC file | awk 'NF>1{print $NF}'
If need 16384MB I used
grep IC file | awk -F'SIZE=' '{ print $2 }'|awk '{ print $1 }'
We need to get any word from that line using awk? what I used can get what is needed but we still need a minimized awk command.
I am sure we can use one single awk to get the needed info from one line minimized command?
sed -r 's/.*SIZE=([^ ]+).*/\1/' input
16384MB
sed -r 's/.*NRF=([^ ]+).*/\1/' input
24
grep way :
grep -oP 'SIZE=\K[^ ]+' imput
16384MB
awk way :
awk '{for(i=1;i<=NF;i++) if($i ~ /SIZE=/) split($i,a,"=");print a[2]}' input
You could use an Awk with multi-character de-limiter as below to get this done. Loop through the fields, match the pattern you need and print the next field which contains the field value.
awk -F'[:= ]' -v option="${match}" '{for(i=1;i<=NF;i++) if ($i ~ option) {print $(i+1)}}' file
Examples,
match="number"
awk -F'[:= ]' -v option="${match}" '{for(i=1;i<=NF;i++) if ($i ~ option) {print $(i+1)}}' file
6
match="model"
awk -F'[:= ]' -v option="${match}" '{for(i=1;i<=NF;i++) if ($i ~ option) {print $(i+1)}}' file
MSA
match="meas"
awk -F'[:= ]' -v option="${match}" '{for(i=1;i<=NF;i++) if ($i ~ option) {print $(i+1)}}' file
2.00
here is a more general approach
... | awk -v k=NRF '{for(i=2;i<=NF;i++) {split($i,a,"="); m[a[1]]=a[2]} print m[k]}'
code will stay the same just change the key k.
If you have GNU awk you could use the third parameter of match:
$ awk 'match($0,/( IC=)([^ ]*)/,a)&& $0=a[2]' file
8
Or get the meas:
$ awk 'match($0,/( meas=)([^ ]*)/,a)&& $0=a[2]' file
2.00
Should you use some other awk, you could use this combination of split, substr and match:
$ awk 'split(substr($0,match($0,/ IC=[^ ]*/),RLENGTH),a,"=") && $0=a[2]' file
8

awk - how to replace semicolon in string in csv file?

I need to manage smtp logfile handling in my company.
These logfiles need to be imported to MSSQL, so it is my job to provide this data.
I got strange undelivery message with a ";" in the string, I need to replace this with a comma.
So what I got:
Sender;Recipient;Operation;Answer;Error;Servername
bla#bla.com;rockit#sohard.com;RCPT TO;450;+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions;+try+later;M0641
Mention the ";" in the Answer field after "restrictions", dunno why the mail server sends semicolons, maybe to annoy me :P
I tried following with awk after I did a lot of research:
awk 'BEGIN{FS=OFS=";"} {for (i=5;i<=NF;i++) gsub (";",",",$i)} 1' myfile.csv
This command actually works but it seems it does nothing with my file, the ";" in the error field remains. What I am missing here ?
Replacing the fifth and later ; with ,
$ awk -F\; '{for (i=1;i<=NF;i++) printf "%s%s",$i,(i==NF?ORS:(i<=4?";":","))}' myfile.csv
Sender;Recipient;Operation;Answer;Error,Servername
bla#bla.com;rockit#sohard.com;RCPT TO;450;+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions,+try+later,M0641
How it works:
-F\;
This sets the field separator for input to ;.
for (i=1;i<=NF;i++) printf "%s%s",$i,(i==NF?ORS:(i<=4?";":","))
This loops over every field and prints the field followed by (a) ORS if we are on the last field, or (b) , if were are on field 5 or later, or (c) ; if we are on one of the first four fields.
Replacing all ; with ,
Try:
$ awk -F\; '{$1=$1} 1' OFS=, myfile.csv
Sender,Recipient,Operation,Answer,Error,Servername
bla#bla.com,rockit#sohard.com,RCPT TO,450,+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions,+try+later,M0641
How it works:
-F\;
This sets the field separator on input to a semicolon.
$1=$1
This causes awk to think the the line has been changed so that awk will update the output line to use the new field separator.
1
This tells awk to print the line.
OFS=,
This sets the field separator on output to a comma.
Alternative #1
$ awk '{gsub(/;/, ",")} 1' myfile.csv
Sender,Recipient,Operation,Answer,Error,Servername
bla#bla.com,rockit#sohard.com,RCPT TO,450,+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions,+try+later,M0641
Alternative #2
$ sed 's/;/,/g' myfile.csv
Sender,Recipient,Operation,Answer,Error,Servername
bla#bla.com,rockit#sohard.com,RCPT TO,450,+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions,+try+later,M0641
I think your problem is replacing the unquotes delimiters in your logical 4th field in a five field wide input. Although this script is repetitious should be easier to understand
$ awk '{n=split($0,a,";");
for(i=1; i<4; i++) printf "%s;", a[i];
for(i=4; i<n-1; i++) printf "%s,", a[i];
printf "%s;%s\n", a[n-1], a[n]}' file
A better way to write the same based on #Ed Morton's comments
$ awk -F';' '{for(i=1; i<NF-1; i++) printf "%s"(i<4?FS:","), $i;
print $(NF-1) FS $NF}' file
For the input
1;2;3;4a;4b;4c;5
1;2;3;4;5
it generates
1;2;3;4a,4b,4c;5
1;2;3;4;5
If the offending semi-colons only appear in your 5th field then you can do this using GNU awk for the 3rd arg to match():
$ awk 'match($0,/(([^;]+;){4})(.*)(;[^;]+$)/,a){gsub(/;/,",",a[3]); print a[1] a[3] a[4]}' file
bla#bla.com;rockit#sohard.com;RCPT TO;450;+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions,+try+later;M0641
If your fifth ; should be removed, append $6 to $5 and advance accordingly. This could be done with for loop (there are examples in SO) but since the fault is so near the end, we'll just do this in a simpler way:
$ awk 'BEGIN {FS=OFS=";"} NR==1 {nf=NF} NF==(nf+1) {$5=$5 "," $6; $6=$7; NF=nf} 1' file
Explained:
BEGIN {FS=OFS=";"} # set separator
NR==1 {nf=NF} # get field count from the first record (6)
NF==(nf+1) { # if record is one field longer:
$5=$5 "," $6 # append $6 to $5, comma-separated
$6=$7 # set $7 (NF) to $6 (nf)
NF=nf # reset NF
} 1 # output
Testing: Running the program and sending the output to cut -d\; -f 5 outputs:
Error
+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions,+try+later

awk: sort file based on user input

I have this simple awk code:
awk -F, 'BEGIN{OFS=FS} {print $2,$1,$3}' $1
Works great, except I've hardcoded how I want to sort the comma-delimited fields of my plaintext file. I want to be able to specify at run time in which order I'd like to sort my fields.
One hacky way I thought about doing this was this:
read first
read second
read third
TOTAL=$first","$second","$third
awk -F, 'BEGIN{OFS=FS} {print $TOTAL}' $1
But this doesn't actually work:
awk: illegal field $(), name "TOTAL"
Also, I know a bit about awk's ability to accept user input:
BEGIN {
getline first < "-"
}
$1 == first {
}
But I wonder whether the variables created can in turn be used as variables in the original print command? Is there a better way?
You have to let bash expand $TOTAL before awk is called, so that awk sees the value of $TOTAL, not the literal string $TOTAL. This means using double, not single, quotes.
read first
read second
read third
# Dynamically construct the awk script to run
TOTAL="\$$first,\$$second,\$$third"
SCRIPT="BEGIN{OFS=FS} {print $TOTAL}"
awk -F, "$SCRIPT" "$1"
A safer method is to pass the field numbers as awk variables.
awk -F, -v c1="$first" -v c2="$second" -v c3="$third" 'BEGIN{OFS=FS} {print $c1, $c2, $c3}' "$1"
All you need is:
awk -v order='3 1 2' 'BEGIN{split(order,o)} {for (i=1;i<=NF;i++) printf "%s%s", $(o[i]), (i<NF?OFS:ORS)}'
e.g.:
$ echo 'a b c' | awk -v order='3 1 2' 'BEGIN{split(order,o)} {for (i=1;i<=NF;i++) printf "%s%s", $(o[i]), (i<NF?OFS:ORS)}'
c a b
$ echo 'a b c' | awk -v order='2 3 1' 'BEGIN{split(order,o)} {for (i=1;i<=NF;i++) printf "%s%s", $(o[i]), (i<NF?OFS:ORS)}'
b c a

creating a ":" delimited list in bash script using awk

I have following lines
380:<CHECKSUM_VALIDATION>
393:</CHECKSUM_VALIDATION>
437:<CHECKSUM_VALIDATION>
441:</CHECKSUM_VALIDATION>
I need to format it as below
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441
Is it possible to achieve above output using "awk"? [I'm using bash]
Thanks you!
Here you go:
awk -F '[:<>/]+' '{ n = $1; getline; print $2 ":" n ":" $1 }'
Explanation:
Set the field separator with -F to be a sequence of a mix of :<>/ characters, this way the first field will be the number, and the second will be CHECKSUM_VALIDATION
Save the first field in variable n and read the next line (which would overwrite $1)
Print the line: a combination of the number from the previous line, and the fields on the current line
Another approach without using getline:
awk -F '[:<>/]+' 'NR % 2 { n = $1 } NR % 2 == 0 { print $2 ":" n ":" $1 }'
This one uses the record counter NR to determine whether it's time to print: if NR is odd, save the first field in n, if NR is even, then print.
You can try this sed,
sed 'N; s/\([0-9]\+\):<\(.*\)>\n\([0-9]\+\):<\(.*\)>/\2:\1:\3/' file.txt
Test:
sat:~$ sed 'N; s/\([0-9]\+\):<\(.*\)>\n\([0-9]\+\):<\(.*\)>/\2:\1:\3/' file.txt
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441
Another way:
awk -F: '/<C/ {printf "CHECKSUM_VALIDATION:%d:",$1; next} {print $1}'
Here is one gnu awk
awk -F"[:\n<>]" 'NR==1{print $3,$1,$5;f=$3;next} $3{print f,$3,$7}' OFS=":" RS="</CH" file
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441
Based on Jonas post and avoiding getline, this awk should do:
awk -F '[:<>/]+' '/<C/ {f=$1;next} { print $2,f,$1}' OFS=\: file
CHECKSUM_VALIDATION:380:393
CHECKSUM_VALIDATION:437:441

Resources