awk syntax — what is OFS and why the 1 at the end? - shell

awk -F"\t" -v OFS="\t" '{if($18~/^ *[0-9]*(\.[0-9]+)?" *$/)sub(/"/,"",$18);else $18=" "}1' sample.txt
The code above is from a script I'm modifying. I'm new to Unix, so I'm not able to understand the syntax of this awk command.
-F sets the delimiter used to split each line into columns.
What is OFS?
And what is the use of 1 at the end of the awk script?

-v OFS="\t" passes a variable named OFS from the shell to the awk script. Like the -F option (or FS), it is a field separator, but for the output. It is called the output field separator.
You can test it:
awk -v OFS=' ' '{print 1,2}' a.txt
Output separated by spaces:
1 2
1 2
awk -v OFS=';' '{print 1,2}' a.txt
Output separated by ;:
1;2
1;2
In your case it means that the output fields will be separated by tabs (like the input).
The 1 at the end of the awk script makes awk print each (possibly modified) line. An awk script usually consists of tests (regexes, etc.) and actions for them. The test 1 is always true, and since awk's default action is to print the current line, every line gets printed.
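To see the effect of a trailing 1 in isolation (a minimal sketch, not part of the original answer):

```shell
# `1` is a test that is always true. With no action attached, awk falls
# back to its default action: printing the current record ($0).
printf 'a\nb\n' | awk '1'
# prints:
# a
# b
# which is the same as the explicit form:
printf 'a\nb\n' | awk '{ print $0 }'
```

So in the original command, the if/else block edits $18 and the final 1 then prints every (possibly modified) line.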

Related

Get all the text blocks with more than a certain number lines in a file

The text blocks are separated by blank lines, like:
AAA
BBB
AAA'
BBB'
AAA
BBB
CCC
I'd like to get the last text block that has more than 2 lines.
I know I could write a Python script.
How can I do so by using some command line fu?
EDIT: I think I have misunderstood "get the last text block". To simply print all paragraphs with more than 2 lines:
awk -v RS= -v ORS='\n\n' -F '\n' 'NF>2' file
perl -F'\n' -00e 'print if $#F >= 2' file
awk:
awk -v RS= -F '\n' 'NF>2 {rec=$0} END {if (rec!="") print rec}' file
RS set to a null value enables "paragraph mode". FS has been set to \n (so that NF is equivalent to the number of lines within each paragraph). The awk program saves the latest record matching the criterion NF>2 and prints it at the end.
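To watch the awk version work on input shaped like the sample above (an illustrative run; the printf input is my own):

```shell
# Two paragraphs: the first has 2 lines, the second has 3.
# RS= splits records on blank lines; FS='\n' makes NF count the lines.
printf 'AAA\nBBB\n\nAAA\nBBB\nCCC\n' |
awk -v RS= -F '\n' 'NF>2 {rec=$0} END {if (rec!="") print rec}'
# prints the last paragraph with more than 2 lines:
# AAA
# BBB
# CCC
```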
perl using a similar idea (except that perl counts the number of fields differently):
perl -F'\n' -l -00e '$rec=$_ if $#F >= 2; END {print $rec if defined $rec}' file
Depending on the content of the file, it may be faster to read the file backwards, e.g. with tac:
tac file | perl -F'\n' -l -00e 'if ($#F >= 2) {print $_; exit}' | tac

Shell script to add values to a specific column

I have semicolon-separated columns, and I would like to add some characters to a specific column.
aaa;111;bbb
ccc;222;ddd
eee;333;fff
to the second column I want to add '#', so the output should be:
aaa;#111;bbb
ccc;#222;ddd
eee;#333;fff
I tried
awk -F';' -OFS=';' '{ $2 = "#" $2}1' file
It adds the character but replaces all the semicolons with spaces.
You could use sed to do your job:
# replaces just the first occurrence of ';', note the absence of `g` that
# would have made it a global replacement
sed 's/;/;#/' file > file.out
or, to do it in place:
sed -i 's/;/;#/' file
Or, use awk:
awk -F';' '{$2 = "#"$2}1' OFS=';' file
All the above commands result in the same output for your example file:
aaa;#111;bbb
ccc;#222;ddd
eee;#333;fff
@atb: Try:
1st:
awk -F";" '{print $1 FS "#" $2 FS $3}' Input_file
The above will work only when your Input_file has exactly 3 fields.
2nd:
awk -F";" -vfield=2 '{$field="#"$field} 1' OFS=";" Input_file
With the above code you can put in any field number, so it can be adapted to your request.
Here I set the field separator to ";", then use a variable named field that holds the field number, prepend "#" to that field's value, and use 1 as an always-true condition with no action, so the default action (printing the current line) takes place.
You just misunderstood how to set variables. Change -OFS to -v OFS:
awk -F';' -v OFS=';' '{ $2 = "#" $2 }1' file
but in reality you should set them both to the same value at one time:
awk 'BEGIN{FS=OFS=";"} { $2 = "#" $2 }1' file

modify line in a .txt file in bash [duplicate]

This question already has answers here:
Modify column 2 only using awk and sed
(2 answers)
Closed 6 years ago.
I have a .txt file that contains lines of data separated by ",", for example:
10,05,nov,2016,122,2,2,330,user
What I want is to be able to modify one field of a given line, using the first number as the search key; it is unique and never repeated.
For example: find the line whose first field is 10 and modify the field containing 122 (field 5).
I've tried it with sed but I can't do it.
I've been told that I could do it with awk, but I haven't studied that command.
Any help?
A simple awk script like the following should do the trick:
awk -v find="10" -v field="5" -v newval="abcd" 'BEGIN {FS=OFS=","} {if ($1 == find) $field=newval; print $0}' test.csv
Explanation:
awk -v find="10" -v field="5" -v newval="abcd" : defines 3 variables for awk: find, which contains the pattern we are looking for; field, which contains the number of the field we want to edit; and newval, with the replacement value.
BEGIN {FS=OFS=","} : before iterating through the file, we set the field separator (FS) and output field separator (OFS) to ",".
if ($1 == find) $field=newval: if the first field of a line matches the pattern we want, we set the Nth field (1st if field=1, 2nd if field=2, ...) to the value of newval.
print $0: whatever the result from the if test, we print the whole line.
A shorter (but less readable) version of this script could be written as follows:
awk -v a="10" -v f="5" -v n="abcd" -F, '$1 == a {$f=n}OFS=FS' test.csv
Where a refers to find, f refers to field, n refers to newval and -F, refers to FS=","
Script in action:
> cat test.csv
11,05,nov,2016,122,2,2,330,user
10,05,nov,2016,123,2,2,330,user
12,05,nov,2016,124,2,2,330,user
> awk -v find="10" -v field="5" -v newval="abcd" 'BEGIN {FS=OFS=","} {if ($1 == find) $field=newval; print $0}' test.csv
11,05,nov,2016,122,2,2,330,user
10,05,nov,2016,abcd,2,2,330,user
12,05,nov,2016,124,2,2,330,user
With sed:
$ sed '/^10/s/,[^,]*/,333/4' <<< "10,05,nov,2016,122,2,2,330,user"
10,05,nov,2016,333,2,2,330,user
In lines starting with 10, replace the 4th occurrence of a comma followed by non-comma characters with your substitution string.

awk - split only by first occurrence

I have a line like:
one:two:three:four:five:six seven:eight
and I want to use awk to get $1 to be one and $2 to be two:three:four:five:six seven:eight
I know I can get it by doing sed before. That is to change the first occurrence of : with sed then awk it using the new delimiter.
However replacing the delimiter with a new one would not help me since I can not guarantee that the new delimiter will not already be somewhere in the text.
I want to know if there is an option to get awk to behave this way
So something like:
awk -F: '{print $1,$2}'
should print:
one two:three:four:five:six seven:eight
I will also want to do some manipulations on $1 and $2 so I don't want just to substitute the first occurrence of :.
Without any substitutions
echo "one:two:three:four:five" | awk -F: '{ st = index($0,":");print $1 " " substr($0,st+1)}'
The index function finds the first occurrence of ":" in the whole string, so in this case the variable st would be set to 4. I then use the substr function to grab the rest of the string starting from position st+1; if no length is supplied, it goes to the end of the string. The output is
one two:three:four:five
If you want to do further processing you could always set the string to a variable for further processing.
rem = substr($0,st+1)
Note this was tested on Solaris AWK but I can't see any reason why this shouldn't work on other flavours.
Something like this?
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1'
one two:three:four:five:six
This replaces the first : with a space.
You can then later get it into $1, $2
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1' | awk '{print $1,$2}'
one two:three:four:five:six
Or in the same awk, so even with the substitution you get $1 and $2 the way you like:
echo "one:two:three:four:five:six" | awk '{sub(/:/," ");$1=$1;print $1,$2}'
one two:three:four:five:six
EDIT:
Using a different separator you can get the first part as field $1 and the rest in $2 like this:
echo "one:two:three:four:five:six seven:eight" | awk -F\| '{sub(/:/,"|");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
Unique separator
echo "one:two:three:four:five:six seven:eight" | awk -F"#;#." '{sub(/:/,"#;#.");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
The closest you can get is with GNU awk's FPAT:
$ awk '{print $1}' FPAT='(^[^:]+)|(:.*)' file
one
$ awk '{print $2}' FPAT='(^[^:]+)|(:.*)' file
:two:three:four:five:six seven:eight
$2 will include the leading delimiter, but you can use substr to fix that:
$ awk '{print substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
two:three:four:five:six seven:eight
So putting it all together:
$ awk '{print $1, substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
Storing the results of the substr back in $2 will allow further processing on $2 without the leading delimiter:
$ awk '{$2=substr($2,2); print $1,$2}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
A solution that should work with mawk 1.3.3 (shown here with the question's sample line as input):
echo 'one:two:three:four:five:six seven:eight' | awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1}' FS='\0'
one
echo 'one:two:three:four:five:six seven:eight' | awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $2}' FS='\0'
two:three:four:five:six seven:eight
echo 'one:two:three:four:five:six seven:eight' | awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1,$2}' FS='\0'
one two:three:four:five:six seven:eight
Just throwing this on here as a solution I came up with where I wanted to split the first two columns on : but keep the rest of the line intact.
Comments inline.
echo "a:b:c:d::e" | \
awk '{
split($0,f,":"); # split $0 into array of fields `f`
sub(/^([^:]+:){2}/,"",$0); # remove first two "fields" from `$0`
print f[1],f[2],$0 # print first two elements of `f` and edited `$0`
}'
Returns:
a b c:d::e
In my input I didn't have to worry about the first two fields containing escaped :, if that was a requirement, this solution wouldn't work as expected.
Amended to match the original requirements:
echo "a:b:c:d::e" | \
awk '{
split($0,f,":");
sub(/^([^:]+:)/,"",$0);
print f[1],$0
}'
Returns:
a b:c:d::e

awk: assigning a shell variable in awk script

I have a situation in awk where I need to convert an input format into another format and later use the number of records processed separately. Is there any way I can use a shell variable to get the value of NR in the END section? Something like:
cat file1 | awk 'some processing END{SHELL_VARIABLE=NR}' > file2
Then later use SHELL_VARIABLE outside awk.
I do not want to process the file and then do a wc -l separately as the files are huge.
One way: Use the redirection inside your awk command and print your result in the END block. And use command substitution to read the result in a shell variable:
my_var=$(awk '{ some processing; print "your output" > "file2" } END { print NR }' file1)
No subprocess can affect the parent's environment variables. What you can do is have awk write output to the file directly, then have it print the value you want to stdout and capture it. Or if you prefer, you could reverse that and have awk just print it to a file and read it back afterwards.
Incidentally, you have a UUOC (useless use of cat).
rows=$(awk '{ ...; print > "file2"} END {print NR}' file1)
Or
awk '... END{print NR > "rows"}' file1 >file2
rows=$(<rows)
rm rows
