Changing value in entire column of a CSV file using awk - bash

I have a CSV file like the example below. I want to change the value of the mail column in every line to the same mail address, using awk. I tried
awk -F ";" '{$18=firstname.lastname#testdata.invali} {print}' example_source_20180619.csv > erm.csv
but got an "invalid statement" error.
PNR;GES-SL-SAP;VERT-KANAL-SL-SAP;DSTNR-SAP;BTRBL-FKT;SCHWPKT-TAETIG-SL;SCHWPKT-TAETIG-TXT;BTRBL-TITEL-TXT;ANREDE;NAME;VORNAME;STRASSE;PLZ;ORT;DIENST-TEL-NR;TELEFAX-NR;MOBIL-TEL-NR;E-MAIL-ADR;INTERNET-ADR;P34F-KZ;HD-ANL-BER-KZ;VERT-KANAL-SL;
0000000;0010;2100 ;00602;Referent ;99;Sonstige/kein Schw. ;ohne Titel ;Sir ;John ;Doe ;Paul-Keller-Str. 21 ;92318;Neumarkt i.d.OPf. ;phone;0941/phone;;mail#mail.com;http://web.de ;NO;NO;

awk -F ";" '{OFS=";"; $18="firstname.lastname#testdata.invali"; print;}'
Put strings inside double quotes (").
Separate commands using ;.
Set the output separator (OFS) to ; as well, so the output keeps the same format as the input.
I guess there's no point in substituting the email address in the header line, so I added a small if below:
awk -F ";" '{ OFS=";"; if (NR != 1) { $18="firstname.lastname#testdata.invali"; } print; }'

Related

awk: select first column and value in column after matching word

I have a .csv where each row corresponds to a person (first column) followed by attribute/value pairs available for that person. I want to extract the names and values of a particular attribute for the persons where that attribute is available. The file is structured as follows:
name,attribute1,value1,attribute2,value2,attribute3,value3
joe,height,5.2,weight,178,hair,
james,,,,,,
jesse,weight,165,height,5.3,hair,brown
jerome,hair,black,breakfast,donuts,height,6.8
I want a file that looks like this:
name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
Following an earlier post, I've tried a few different awk approaches, but I'm still having trouble getting both the first column and whichever column holds the desired value for the attribute (say height). For example, the following returns everything:
awk -F "height," '{print $1 "," FS$2}' file.csv
I could grep only the rows with height in them, but I'd prefer to do everything in a single line if I can.
You may use this awk:
cat attrib.awk
BEGIN {
    FS = OFS = ","
    print "name,attribute,value"
}
NR > 1 && match($0, k "[^,]+") {
    print $1, substr($0, RSTART+1, RLENGTH-1)
}
# then run it as
awk -v k=',height,' -f attrib.awk file
name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
# or this one
awk -v k=',weight,' -f attrib.awk file
name,attribute,value
joe,weight,178
jesse,weight,165
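The script relies on match() setting the built-in variables RSTART (the position where the match begins) and RLENGTH (its length); substr() then peels off the leading comma. A standalone sketch of that mechanism:
echo 'joe,height,5.2,weight,178' | awk '{ if (match($0, /,height,[^,]+/)) print RSTART, RLENGTH, substr($0, RSTART+1, RLENGTH-1) }'
4 11 height,5.2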
With your shown samples, please try the following awk code, written and tested in GNU awk. In short: it sets RS (the record separator) to the regex ^[^,]*,height,[^,]* and then prints RT, the text that actually matched RS, to produce the expected output.
awk -v RS='^[^,]*,height,[^,]*' 'RT{print RT}' Input_file
I'd suggest a sed one-liner:
sed -n 's/^\([^,]*\).*\(,height,[^,]*\).*/\1\2/p' file.csv
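The two capture groups keep the leading name and the ,height,<value> chunk, while the greedy .* discards everything else. For example:
echo 'jerome,hair,black,breakfast,donuts,height,6.8' | sed -n 's/^\([^,]*\).*\(,height,[^,]*\).*/\1\2/p'
jerome,height,6.8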
One awk idea:
awk -v attr="height" '
BEGIN { FS=OFS="," }
FNR==1 { print "name", "attribute", "value"; next }
{ for (i=2;i<=NF;i+=2)        # loop through even-numbered fields
      if ($i == attr) {       # if field value is an exact match to the "attr" variable then ...
          print $1,$i,$(i+1)  # print current name, current field and next field to stdout
          next                # no need to check rest of current line; skip to next input line
      }
}
' file.csv
NOTE: this assumes the input value (height in this example) matches a field in the file exactly, including capitalization
This generates:
name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
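The same script handles any attribute name simply by changing the -v assignment, e.g. awk -v attr="hair" with the body above; the output (worked out from the sample data) would be:
name,attribute,value
joe,hair,
jesse,hair,brown
jerome,hair,black
Note that joe still prints with an empty value, since the exact-match test is on the attribute name, not the value.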
With a perl one-liner:
$ perl -lne '
print "name,attribute,value" if $.==1;
print "$1,$2" if /^(\w+).*(height,\d+\.\d+)/
' file
output
name,attribute,value
joe,height,5.2
jesse,height,5.3
jerome,height,6.8
awk accepts variable assignments via the -v flag, placed before the script. Thus, the name of the required attribute can be passed into an awk script using the general pattern:
awk -v attr=attribute1 ' {} ' file.csv
Inside the script, the value of the passed variable is referenced by the variable name, in this case attr.
Your criteria are to print the first column (holding the name), the column whose header matches the required attribute, and the column immediately after it (holding the matched values).
Thus, the following script fishes out the column headed "attribute1" and its next neighbour:
awk -v attr=attribute1 'BEGIN {FS=","} FNR==1 {for (i=1;i<=NF;i++) if ($i == attr) col=i} {print $1","$col","$(col+1)}' data.txt
result:
name,attribute1,value1
joe,height,5.2
james,,
jesse,weight,165
jerome,hair,black
another column (attribute 3):
awk -v attr=attribute3 'BEGIN {FS=","} FNR==1 {for (i=1;i<=NF;i++) if ($i == attr) col=i} {print $1","$col","$(col+1)}' awkNames.txt
result:
name,attribute3,value3
joe,hair,
james,,
jesse,hair,brown
jerome,height,6.8
Just change the value of the -v attr= argument for the required column.

How to set FS to eof?

I want to read the whole file, not line by line. How can I change the field separator to an end-of-file symbol?
I do:
awk "^[0-9]+∆DD$1[PS].*$" $(ls -tr)
$1 is a parameter (some integer), and .* is the message that I want to print. The problem is that the message can contain \n, so this code prints only the first line of the file. How can I scan the whole file rather than line by line?
Can I do this using awk, sed or grep? The script must be at most 60 characters long (including spaces).
Assuming you mean record separator, not field separator, with GNU awk you'd do:
gawk -v RS='^$' '{ print "<" $0 ">" }' file
Replace the print with whatever you really want to do and update your question with some sample input and expected output if you want help with that part too.
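A quick illustration of the slurp on a hypothetical two-line input:
printf 'a\nb\n' | gawk -v RS='^$' '{ print "<" $0 ">" }'
<a
b
>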
The portable way to do this, by the way, is to build up the record line by line and then process it in the END section:
awk '{rec = rec (NR>1?RS:"") $0} END{ print "<" rec ">" }' file
using nf = split(rec,flds) to create fields if necessary.
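For example, a sketch that recovers the line count after slurping (assuming the default newline RS):
awk '{ rec = rec (NR>1 ? RS : "") $0 } END { n = split(rec, lines, RS); print "lines: " n }' file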

setting the output field separator in awk

I'm trying this statement in my awk script (kept in a separate file, not inline), script name: print-table.awk
BEGIN {FS = "\t";OFS = "," ; print "about to open the file"}
{print $0}
END {print "about to close stream" }
and running it this way from the shell
awk -f print-table.awk table
where table is a tab-separated file.
My goal is to declare the field separator (FS) and the output field separator (OFS) inside the external script, and to call it from the shell simply as
awk -f file input
without setting the field separator on the command line with -F"\t", and without piping the output through a sed statement that replaces the tabs with commas.
Any advice on how I can do that?
You need to convince awk that something has changed to get it to reformat $0 using your OFS. The following works though there may be a more idiomatic way to do it.
BEGIN {FS = "\t";OFS = "," ; print "about to open the file"}
{$1=$1}1
END {print "about to close stream" }
You need to alter one of the fields in awk:
awk 'BEGIN {FS="\t";OFS=","; print "about to open the file"} {$1=$1}1' file
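The trailing 1 is a bare pattern that is always true, so awk's default action (printing the record) fires; the $1=$1 assignment is what marks $0 as modified so it is rebuilt with OFS. A minimal demonstration:
printf 'a\tb\tc\n' | awk 'BEGIN {FS="\t";OFS=","} {$1=$1}1'    # prints a,b,c
printf 'a\tb\tc\n' | awk 'BEGIN {FS="\t";OFS=","} 1'           # prints the line unchanged, tabs intact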

awk changes the text unexpectedly

I am using the following awk statement in my shell script.
#!/bin/sh
# read file line by line
file="/pdump/country.000000.txt"
while read line
do
    mycol=`echo $line | awk -F"," '{print $2}'`
    mycol_new=`echo $mycol | tr "[:lower:]" "[:upper:]"`
    echo $line | awk -v var="$mycol_new" -F"," '{print $1 "," var "," $3 "," $4 "," $5 "," $6 "," $7 "," $8}'
done < $file
It is working as expected.
The only problem is that if the original text in any other column (e.g. $4 or $7) is \N (backslash N), it changes to N (without the backslash).
How do I preserve the original values while replacing only the second column?
You need to use the -r option for read in your while loop:
while read -r line
That preserves backslashes in the input. That option should almost always be used. Make it a habit.
awk strips the backslash if it is not part of a recognized escape sequence. So if the text had been \n, awk would have recognized it as a newline, but \N is simply interpreted as N.
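Both effects are easy to reproduce; a minimal sketch:
printf 'a,\\N,c\n' | { read line; printf '%s\n' "$line"; }     # prints a,N,c  -- read ate the backslash
printf 'a,\\N,c\n' | { read -r line; printf '%s\n' "$line"; }  # prints a,\N,c -- backslash preserved
awk -v var='\N' 'BEGIN { print var }'                          # gawk warns and prints plain N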
If I read your code correctly, you are trying to:
Read input from a comma-separated-values (CSV) file
Change the second field to uppercase
Print the result.
If that is the case, use AWK directly. Save the following to toupper_second_field.awk:
BEGIN { FS = ","; OFS="," }
{ $2 = toupper($2); print }
The first line sets the field separator for both input (FS) and output (OFS) to a comma. The second converts field #2 to upper case, then prints the line. To invoke it:
awk -f toupper_second_field.awk /pdump/country.000000.txt
The logic is much simpler and you don't have to worry about backslashes.
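A quick sanity check with a made-up line:
printf 'us,texas,usa\n' | awk -f toupper_second_field.awk
us,TEXAS,usa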

How to print selected columns separated by tabs?

I have a txt file with columns separated by tabs, and based on that file I want to create a new file that only contains information from some of the columns.
This is what I have now:
awk '{ print $1, $5 }' filename > newfilename
That works, except that when column 5 contains spaces, e.g. 123 Street, only 123 shows up and Street is treated as another column.
How can I achieve what I'm trying to do?
You can specify the field separator as tab:
awk 'BEGIN { FS = "\t" } ; { print $1, $5 }' filename > newfilename
Or from the command line like this:
awk -F"\t" '{ print $1, $5 }' filename > newfilename
What about the simple cut shell command? Very simple, yet it does the job. Note that tab is already cut's default delimiter, so -d isn't needed here:
cut -f 1,5 filename > newfilename
You can use Bash syntax in the following way:
while IFS=$'\t' read -r -a cols; do
    printf "%s\t%s\n" "${cols[0]}" "${cols[4]}"
done < in.txt > newfile.txt
This will save the 1st and 5th columns, separated by a tab, into the new file.
