awk rounding the float numbers - bash

My file a.txt contains the following lines, with fields separated by commas (,):
ab,b1,c,d,5.986627e738,e,5.986627e738
cd,g2,h,i,7.3423542344,j,7.3423542344
ef,l3,m,n,9.3124234323,o,9.3124234323
When I issue the command below
awk -F"," 'NR>-1{OFS=",";gsub($5,$5+10);OFS=",";print }' a.txt
it prints:
ab,b1,c,d,inf,e,inf
cd,g2,h,i,17.3424,j,17.3424
ef,l3,m,n,19.3124,o,19.3124
Here I have two issues:
I asked awk to add 10 to only the 5th column, but it also added it to the 7th column because of the duplicate values.
It is rounding the numbers; instead, I need the decimals printed as they are.
How can I fix this?

awk 'BEGIN {FS=OFS=","}{$5=sprintf("%.10f", $5+10)}7' file
In your data, $5 on line #1 has an e exponent (e738) that overflows a double, which is why the 5th and 7th columns show inf in your output.
You did the substitution with gsub, so every occurrence of the value in the record was replaced (demonstrated below).
printf/sprintf should be used to print numbers in a specific format.
Tested with gawk.
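A quick check of the gsub behavior on line 2 of the sample: $5 is used as a regex against the whole record, so both occurrences match, and the replacement number is stringified with the default CONVFMT (%.6g), hence the rounding:
$ echo 'cd,g2,h,i,7.3423542344,j,7.3423542344' | awk -F, '{gsub($5,$5+10)}1'
cd,g2,h,i,17.3424,j,17.3424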
If you want to set the format in printf dynamically:
kent$ cat f
ab,b1,c,d,5.9866,e,5.986627e738
cd,g2,h,i,7.34235,j,7.3423542344
ef,l3,m,n,9.312423,o,9.3124234323
kent$ awk 'BEGIN {FS=OFS=","}{split($5,p,".");$5=sprintf("%.*f",length(p[2]), $5+10)}7' f
ab,b1,c,d,15.9866,e,5.986627e738
cd,g2,h,i,17.34235,j,7.3423542344
ef,l3,m,n,19.312423,o,9.3124234323
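The %.*f specifier takes the precision from the preceding argument, which is what lets the input's decimal count drive the output. A minimal illustration (the * form works in gawk; not every awk supports it):
$ awk 'BEGIN{printf "%.*f\n", 4, 3.14159265}'
3.1416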

What you did was a replacement on the whole record; what you really want is a plain assignment to the 5th field:
awk 'BEGIN {FS=OFS=","}
{$5+=10}1' a.txt
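This fixes the duplicate-replacement issue, but note that reassigning a field makes awk stringify the number with CONVFMT (default %.6g), so the value still prints rounded. A sketch that keeps ten decimal places (adjust the precision to taste):
awk 'BEGIN {FS=OFS=","; CONVFMT="%.10f"}
{$5+=10}1' a.txt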

Related

Change date format in single column

I have a file with 1000 lines. The first column has a date in the European standard (DD.MM.YYYY). There are about 20 other columns with different data. The columns are separated by a semicolon. Here's a simplified example.
10.12.2020;name;address
18.12.2020;name2;address2
21.12.2020;name3;address3
What I want to do is change the date format in the first column to YYYY.MM.DD.
In this case the result should read:
2020.12.10;name;address
2020.12.18;name2;address2
2020.12.21;name3;address3
I tried to do it with a combination of awk and sed
awk -F';' 'BEGIN {OFS = FS} NR != 0 { sed 's/\([0-9]\{2\}\).\([0-9]\{2\}\).\([0-9]\{4\}\)/\3.\2.\1/g'; print; }'
which results in errors. There is probably a better way to do it with gsub but I wasn't able to understand the syntax.
Can anyone help me achieve this result? Could be with sed, gsub or any other way.
No need to combine sed and awk. Either is sufficient:
sed -E 's/^(..)\.(..)\.(....);/\3.\2.\1;/'
or
awk -F\; -v OFS=\; '{split($1,a,"."); $1=a[3] "." a[2] "." a[1]; print}'
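Since the question mentions gsub: gsub itself cannot refer back to capture groups, but GNU awk's gensub can. A sketch of that route (gawk only):
awk '{print gensub(/^([0-9]{2})\.([0-9]{2})\.([0-9]{4})/, "\\3.\\2.\\1", 1)}' file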

split up (with specified delimiter) a selected column

I have a tab-delimited file and want to modify it. The last column is pipe-delimited and I'd like to split that column up (from pipe to tab) while avoiding splitting up other columns with pipes.
This converts every pipe to a tab, but I'm unable to make it split only a selected column (13). Is there a way to make this work on just the last column without having to specify its number?
awk -F'|' '$13=$13' OFS="\t" inputfile.tsv > split.tsv
Let's consider this tab-delimited test file:
$ cat file
a|b c|d e|f g
one two three four
To break up the third column on |:
$ awk -F'\t' '{gsub(/[|]/, "\t", $3)} 1' OFS='\t' file
a|b c|d e f g
one two three four
For your file, you will want to replace $3 with $13.
awk -F'\t' '{gsub(/[|]/, "\t", $13)} 1' OFS='\t' file
Or, to replace the last column, whatever column it is, use:
awk -F'\t' '{gsub(/[|]/, "\t", $NF)} 1' OFS='\t' file
How it works
-F'\t' sets the field separator on input to a tab.
gsub(/[|]/, "\t", $13) replaces | with a tab in field $13.
1 is awk's cryptic short-hand for print-the-line (see the one-liner after this list).
OFS='\t' tells awk to use a tab as the field separator on output.
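As an aside, the bare 1 (and the 7 in the first answer above) works because any pattern that evaluates true with no action defaults to { print }. A minimal check:
$ printf 'x\ny\n' | awk 1
x
y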
Alternate form
It may be clearer and easier to maintain if \t is coded just once instead of three times. In that case (hat tip: Ed Morton):
awk 'BEGIN{FS=OFS="\t"} {gsub(/[|]/, OFS, $NF)} 1' file

Converting exponents to integers in Bash

I want to convert some timestamps in a text file from exponents to integers. The problem is that in the file there are already columns that are decimal numbers and those I don't want to change. I found this thread where the OP has a very similar problem and I came up with this code:
awk '{printf "%13.0f,%5.2f,%5.2f,%5.2f,%5.2f,%10.0f",$1,$2,$3,$4,$5,$6; print "\n"}' foo.csv
However, the code converts my timestamp just fine, but then converts all the other numbers to 0, like so:
1391470000000,0.00,0.00,0.00,0.00,0000000000
1391380000000,0.00,0.00,0.00,0.00,0000000000
What am I doing wrong?
EDIT:
My input numbers, foo.csv:
1.39147e+12,56.32,57.88,56.09,57.81,2911900000
1.39138e+12,58.15,58.48,56.08,56.15,2929300000
You haven't set the input field separator to a comma, so the whole record is read as a single field, and the %13.0f format converts just the first part (up to the first comma). The rest of the fields ($2 through $6) are empty, and therefore equal 0. Try:
awk -F, '{printf "%13.0f,%5.2f,%5.2f,%5.2f,%5.2f,%10.0f\n",$1,$2,$3,$4,$5,$6}' foo.csv
Or perhaps:
awk -F, -v OFS=, '{$1 = sprintf("%13.0f", $1); print}' foo.csv
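With the sample input, both commands should print the following (the %13.0f and %10.0f widths happen to match the digit counts exactly, so no padding appears):
1391470000000,56.32,57.88,56.09,57.81,2911900000
1391380000000,58.15,58.48,56.08,56.15,2929300000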

How can I remove duplicate lines in a whole file, omitting the first n characters of each line?

I have following data format:
123456786|data1
123456787|data2
123456788|data3
The first column is main_id. I need to remove all duplicated lines from the txt file, ignoring the main_id number. How can I do that?
Normally I use the following AWK script, but it compares whole lines, without omitting the first column:
awk '!x[$0]++' $2 > "$filename"_no_doublets.txt #remove doublets
Thanks for any help.
If you have more columns, this line should do:
awk '{a=$0;sub(/[^|]*\|/,"",a)}!x[a]++' file
example:
123456786|data1
12345676|data1
123456787|data2|foo
203948787|data2|foo
123456788|data3
kent$ awk '{a=$0;sub(/[^|]*\|/,"",a)}!x[a]++' f
123456786|data1
123456787|data2|foo
123456788|data3
You can use:
awk -F'|' '!x[$2]++'
This finds duplicates based only on field 2 (delimited by |), ignoring everything after the second column.
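The difference matters when later columns differ. With made-up lines (hypothetical data, purely for illustration), keying on $2 alone drops the second line even though its third column differs:
$ printf '1|data2|foo\n2|data2|bar\n' | awk -F'|' '!x[$2]++'
1|data2|foo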
UPDATE: two variants that key on everything after the first column:
awk '{line=$0; sub(/^[^|]+\|/, "", line)} !found[line]++' file
awk '{key=$0; sub(/[^|]+/,"",key)} !seen[key]++' file
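On the sample file f above, both variants give the same result as the first answer:
$ awk '{key=$0; sub(/[^|]+/,"",key)} !seen[key]++' f
123456786|data1
123456787|data2|foo
123456788|data3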

Cut and replace bash

I have to process a file with data organized like this
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
etc
Columns can have different lengths, but lines always have the same number of columns.
I want to be able to cut a specific column of a given line and change it to the value I want.
For example I'd apply my command and change the file to
AAAAA:BB:XXXX:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
I know how to select a specific line with sed and then cut the field, but I have no idea how to replace that field with the value I have.
Thanks
Here's a way to do it with awk:
Going with your example, if you wanted to replace the 3rd field of the 1st line:
awk 'BEGIN{FS=OFS=":"} {if (NR==1) {$3 = "XXXX"}; print}' input_file
Input:
AAAAA:BB:CCC:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
Output:
AAAAA:BB:XXXX:EEEE:DDDD
FF:III:JJJ:KK:LLL
MMMM:NN:OOO:PP
Explanation:
awk: invoke the awk command
'...': everything enclosed by the single quotes is the program given to awk.
BEGIN{FS=OFS=":"}: Use : as delimiters for both input and output. FS stands for Field Separator. OFS stands for Output Field Separator.
if (NR==1) {$3 = "XXXX"};: If Number of Records (NR) read so far is 1, then set the 3rd field ($3) to "XXXX".
print: print the current line
input_file: name of your input file.
If instead you are simply trying to replace all occurrences of CCC with XXXX in your file, do:
sed -i 's/CCC/XXXX/g' input_file
Note that this will also replace partial matches, such as ABCCCDD -> ABXXXXDD
This might work for you (GNU sed):
sed -r 's/^(([^:]*:?){2})CCC/\1XXXX/' file
or
awk -F: -vOFS=: '$3=="CCC"{$3="XXXX"};1' file
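If you want one command that works for any line, column, and value, here is a parameterized sketch (the variable names line, col, and val are mine, not from the answers above):
awk -v line=1 -v col=3 -v val='XXXX' 'BEGIN{FS=OFS=":"} NR==line{$col=val} 1' input_file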
