Converting exponents to integers in Bash

I want to convert some timestamps in a text file from exponents to integers. The problem is that in the file there are already columns that are decimal numbers and those I don't want to change. I found this thread where the OP has a very similar problem and I came up with this code:
awk '{printf "%13.0f,%5.2f,%5.2f,%5.2f,%5.2f,%10.0f",$1,$2,$3,$4,$5,$6; print "\n"}' foo.csv
The code converts my timestamp just fine, but it turns all the other numbers into 0, like so:
1391470000000,0.00,0.00,0.00,0.00,0000000000
1391380000000,0.00,0.00,0.00,0.00,0000000000
What am I doing wrong?
EDIT:
My input numbers, foo.csv:
1.39147e+12,56.32,57.88,56.09,57.81,2911900000
1.39138e+12,58.15,58.48,56.08,56.15,2929300000

You haven't set the input field separator to a comma, so the whole record is read as a single field. The %13.0f format converts just the leading numeric part of $1 (up to the first comma), and fields 2 through 6 are empty, so they print as 0. Try:
awk -F, '{printf "%13.0f,%5.2f,%5.2f,%5.2f,%5.2f,%10.0f\n",$1,$2,$3,$4,$5,$6}' foo.csv
Or perhaps:
awk -F, -v OFS=, '{$1 = sprintf("%13.0f", $1); print}' foo.csv
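With the sample input above, either command should print something like this (the first reformats every column, the second touches only $1; here the results coincide):
1391470000000,56.32,57.88,56.09,57.81,2911900000
1391380000000,58.15,58.48,56.08,56.15,2929300000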

Related

Searching for a string between two characters

I need to find two numbers in lines which look like this:
>Chr14:453901-458800
I have a large number of those lines mixed with lines that don't contain ":", so we can search for the colon to find the lines with numbers. Every line has different numbers.
I need to find the two numbers after ":", which are separated by "-", then subtract the first number from the second and print the result for each line.
I'd like this to be done using awk.
I managed to do something like this:
awk -e '$1 ~ /\:/ {print $0}' file.txt
but it's nowhere near the end result
For the example I showed above, the result would be:
4899
because 458800 - 453901 = 4899.
I can't figure it out on my own and would appreciate some help
With awk: separate each row into multiple columns using the : and - separators. In each row containing :, subtract the contents of column 2 from the contents of column 3 and print the result:
awk -F '[:-]' '/:/{print $3-$2}' file
Output:
4899
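To see how the [:-] field separator breaks up the sample line (a quick illustration; any POSIX awk behaves the same):
awk -F '[:-]' '{print $1 " | " $2 " | " $3}' <<<'>Chr14:453901-458800'
>Chr14 | 453901 | 458800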
Using awk
$ awk -F: '/:/ {split($2,a,"-"); print a[2] - a[1]}' input_file
4899

sed/awk unix csv file modification

I have a directory that is receiving .csv files.
column1,column2,column3,column4
value1,0021,value3,value4,
value1,00211,value3,value4,
I want to remove the header, pad the second column to 6 digits, and add ":" so it is in HH:MM:SS format, e.g.
value1,00:00:21,value3,value4,
value1,00:02:11,value3,value4,
I can pad the field to 6 digits using awk, but I am not sure how to insert the colon every 2 characters in $2. Alternatively, can this be done entirely in sed? Which would be better for performance?
Thank you
You may do it all with GNU awk:
awk 'BEGIN{FS=OFS=","} {$2=sprintf("%06d", $2); $2=substr($2,1,2) gensub(/.{2}/,":&","g",substr($2,3))}1' file
Details
BEGIN{FS=OFS=","} - sets input/output field separator to a comma
$2=sprintf("%06d", $2) - pads Field 2 with zeros
$2=substr($2,1,2) gensub(/.{2}/,":&","g",substr($2,3)) - sets Field 2 to the first two chars of the field (substr($2,1,2)) followed by the rest of the field, from the third char on, with : inserted before each two-char chunk
1 - default print action.
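If you also need the header line dropped, as the question asks, a minimal variant of the same idea (an untested sketch) guards the action with NR>1 and prints explicitly, so the header is never emitted:
awk 'BEGIN{FS=OFS=","} NR>1{$2=sprintf("%06d",$2); $2=substr($2,1,2) gensub(/.{2}/,":&","g",substr($2,3)); print}' file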
With awk formatting + substitution magic:
awk 'BEGIN{ FS = OFS = "," }
NR > 1{ $2 = sprintf("%06d", $2); gsub(/[0-9]{2}/, "&:", $2);
$2 = substr($2, 1, 8); print }' file
The output:
value1,00:00:21,value3,value4,
value1,00:02:11,value3,value4,
With sed:
$ sed -nE '2,$s/,([0-9]+)/,00000\1/;s/,0+(..)(..)(..),/,\1:\2:\3,/p' file
value1,00:00:21,value3,value4,
value1,00:02:11,value3,value4,
The first substitution left-pads the second field with zeros (only from line 2 on, so the header is skipped); the second keeps the last six digits, inserts the colons, and prints only the lines where it matched, which drops the header. I think it can be simplified a little.

Padding columns of csv

I have a CSV file which contains a large number of comma-separated lines of data. I want to pad each line to the maximum line length and then print NO in a new column.
file.csv
1,2,3,4,
1,4,7,8,9,10,11,13
1,2,
1,1,2,4,5,6,7,8,9,10,11
abc,def,ghi,jkl
Expected result:
1,2,3,4,,,,,,,,,,,,,,,,NO
1,4,7,8,9,10,11,13,,,,,NO
1,2,,,,,,,,,,,,,,,,,,,,NO
1,1,2,4,5,6,7,8,9,10,1,NO
abc,def,ghi,jkl,,,,,,,,NO
cat file | cat > file.csv
echo "N0" >> file.csv
Output obtained:
1,2,3,4,NO
1,4,7,8,9,10,11,13,NO
1,2,NO
1,1,2,4,5,6,7,8,9,10,11,NO
abc,def,ghi,jkl,NO
You need to read the file twice, once to get the maximum number of columns, once to print the output. The NR==FNR condition is true only while the first copy is being read, because FNR restarts from 1 for each input file:
awk -F, 'NR==FNR{if(m<=NF)m=NF;next} # Runs only on first iteration
{printf "%s",$0;for(i=0;i<=(m-NF);i++)printf ",";print "NO"}' file file
filename twice -----^
Output (12 columns in each row):
1,2,3,4,,,,,,,,NO
1,4,7,8,9,10,11,13,,,,NO
1,2,,,,,,,,,,NO
1,1,2,4,5,6,7,8,9,10,11,NO
abc,def,ghi,jkl,,,,,,,,NO
It's hard to imagine why you'd want to pad the lines with commas, so here's what I think you really want, which is to make every line have the same number of fields. Assigning to field m+1 forces awk to rebuild the record, inserting OFS for every intermediate field that doesn't exist yet:
$ awk 'BEGIN{FS=OFS=","} NR==FNR{m=(m>NF?m:NF);next} {$(m+1)="NO"} 1' file file
1,2,3,4,,,,,,,,NO
1,4,7,8,9,10,11,13,,,,NO
1,2,,,,,,,,,,NO
1,1,2,4,5,6,7,8,9,10,11,NO
abc,def,ghi,jkl,,,,,,,,NO
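You can see the field-extension behavior in isolation (a quick illustration): assigning a field past NF makes awk rebuild $0 with OFS between all fields, empty or not:
echo 'a,b' | awk 'BEGIN{FS=OFS=","} {$5="NO"} 1'
a,b,,,NO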
And here's what you said you want anyway. sprintf("%*s", w, "") builds a string of w spaces (the * takes the width from the argument list), and gsub then turns each space into a comma:
$ awk '{n=length()} NR==FNR{m=(m>n?m:n);next} {p=sprintf("%*s",m-n+1,""); gsub(/ /,",",p); $0=$0 p "NO"} 1' file file
1,2,3,4,,,,,,,,,,,,,,,,,NO
1,4,7,8,9,10,11,13,,,,,,NO
1,2,,,,,,,,,,,,,,,,,,,,,NO
1,1,2,4,5,6,7,8,9,10,11,NO
abc,def,ghi,jkl,,,,,,,,,NO
Another approach buffers every line, tracks the maximum field count, and does the padding in the END block:
awk -F, 'BEGIN{m=0}
{if(NF>m)m=NF;ar[NR]=$0;ars[NR]=NF;}
END{for(i=1;i<=NR;i++)
{for(j=ars[i];j<m;j++){ar[i]=ar[i]","}ar[i]=ar[i]"NO";
print ar[i]}}' <<<'1,2,3,4,
1,4,7,8,9,10,11,13
1,2,
1,1,2,4,5,6,7,8,9,10,11,12
abc,def,ghi,jkl
a,b'
output:
1,2,3,4,,,,,,,,NO
1,4,7,8,9,10,11,13,,,,NO
1,2,,,,,,,,,,NO
1,1,2,4,5,6,7,8,9,10,11,12NO
abc,def,ghi,jkl,,,,,,,,NO
a,b,,,,,,,,,,NO
If the lines must all have the same size in characters:
awk -F, 'BEGIN{m=0}
{if(length($0)>m)m=length($0);ar[NR]=$0;ars[NR]=length($0);}
END{for(i=1;i<=NR;i++)
{for(j=ars[i];j<m;j++)
{ar[i]=ar[i]","}ar[i]=ar[i]"NO";
print ar[i]}}' <<<'1,2,3,4,
1,4,7,8,9,10,11,13
1,2,
1,1,2,4,5,6,7,8,9,10,11,12
abc,def,ghi,jkl
a,b'
output:
1,2,3,4,,,,,,,,,,,,,,,,,,,NO
1,4,7,8,9,10,11,13,,,,,,,,NO
1,2,,,,,,,,,,,,,,,,,,,,,,,NO
1,1,2,4,5,6,7,8,9,10,11,12NO
abc,def,ghi,jkl,,,,,,,,,,,NO
a,b,,,,,,,,,,,,,,,,,,,,,,,NO
If you also want a comma after the max-length line, run the for loop until m+1 (i.e. use j < m+1, or equivalently j <= m).

awk rounding the float numbers

My file a.txt contains the following, and the fields are separated by commas (,):
ab,b1,c,d,5.986627e738,e,5.986627e738
cd,g2,h,i,7.3423542344,j,7.3423542344
ef,l3,m,n,9.3124234323,o,9.3124234323
when I issue the below command
awk -F"," 'NR>-1{OFS=",";gsub($5,$5+10);OFS=",";print }' a.txt
it is printing
ab,b1,c,d,inf,e,inf
cd,g2,h,i,17.3424,j,17.3424
ef,l3,m,n,19.3124,o,19.3124
Here I have two issues
I asked awk to add 10 to the 5th column only, but it added it to the 7th column as well, because the entries are duplicates
It rounds the numbers; instead, I need the decimals printed as-is
How can I fix this?
awk 'BEGIN {FS=OFS=","}{$5=sprintf("%.10f", $5+10)}7' file
In your data, $5 on line 1 contains an e (scientific notation), so after numeric conversion it came out as 10.0000... in the output here.
You did the substitution with gsub, so all occurrences were replaced.
printf/sprintf should be used when you need output in a specific format.
Tested with gawk.
If you want to set the format in printf dynamically:
kent$ cat f
ab,b1,c,d,5.9866,e,5.986627e738
cd,g2,h,i,7.34235,j,7.3423542344
ef,l3,m,n,9.312423,o,9.3124234323
kent$ awk 'BEGIN {FS=OFS=","}{split($5,p,".");$5=sprintf("%.*f",length(p[2]), $5+10)}7' f
ab,b1,c,d,15.9866,e,5.986627e738
cd,g2,h,i,17.34235,j,7.3423542344
ef,l3,m,n,19.312423,o,9.3124234323
What you did was a text substitution over the whole record; what you really want is an assignment to the field:
awk 'BEGIN {FS=OFS=","}
{$5+=10}1' a.txt
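One caveat on the short version: when awk rebuilds the record after the assignment, the computed $5 is converted back to text with CONVFMT (default "%.6g"), which is exactly the rounding the OP saw. Raising the precision is a quick workaround (a sketch; adjust the format to your data):
awk 'BEGIN{FS=OFS=","; CONVFMT="%.12g"} {$5+=10}1' a.txt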

awk combine 2 commands for csv file formatting

I have a CSV file which has 4 columns. I want to:
print the first 10 rows
then print only the items in the third column
My method is to pipe the first awk command into another, but I didn't get exactly what I wanted:
awk 'NR < 10' my_file.csv | awk '{ print $3 }'
The only missing thing was the -F.
awk -F "," 'NR < 10' my_file.csv | awk -F "," '{ print $3 }'
You don't need to run awk twice.
awk -F, 'NR<=10{print $3}' my_file.csv
This prints the third field for every line whose record number (line) is less than or equal to 10.
Note that < is different from <=. The former matches records one through nine, the latter matches records one through ten. If you need ten records, use the latter.
Note that this will walk through your entire file, so if you want to optimize your performance:
awk -F, 'NR>10{exit} {print $3}' my_file.csv
This exits as soon as the record number passes 10; until then it prints the third column, so it does not step through your entire file. Note that the exit test comes first: if the print action came first, the eleventh record's field would be printed before exiting.
Note also that awk's "CSV" matching is very simple; awk does not understand quoted fields, so the record:
red,"orange,yellow",green
has four fields, two of which have double quotes in them. YMMV depending on your input.
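For what it's worth, GNU awk 4.0+ can cope with simple quoted fields via its FPAT variable, which describes what a field looks like rather than what separates fields (a sketch; it does not handle escaped quotes inside fields):
echo 'red,"orange,yellow",green' | awk 'BEGIN{FPAT="([^,]+)|(\"[^\"]+\")"} {print $2}'
"orange,yellow"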
