I am finding the difference between two columns in a file, like this:
cat "trace-0-dir2.txt" | awk '{print expr $2-$1}' | sort
This gives me values like:
-1.28339e+09
-1.28339e+09
-1.28339e+09
-1.28339e+09
I want to avoid the rounding and get the exact value. How can this be achieved?
FYI, trace-0-dir2.txt contains:
1283453524.342134 65337.141749 10 2
1283453524.556784 65337.388047 11 2
1283453524.556794 65337.411165 12 2
1283453524.556806 65337.435947 13 2
1283453524.556811 65337.435989 14 2
1283453524.556816 65337.453931 15 2
1283453524.771522 65337.484866 16 2
The printf function can get you the formatting you need. You don't need expr and you don't need cat: awk can do any calculation itself, and you can invoke it directly on the file.
You can alter the 20.20 to whatever width and precision you are looking for.
[jaypal:~/Temp] cat file0
1283453524.342134 65337.141749 10 2
1283453524.556784 65337.388047 11 2
1283453524.556794 65337.411165 12 2
1283453524.556806 65337.435947 13 2
1283453524.556811 65337.435989 14 2
1283453524.556816 65337.453931 15 2
1283453524.771522 65337.484866 16 2
[jaypal:~/Temp] awk '{ printf("%20.20f\n", $2-$1)}' file0
-1283388187.20038509368896484375
-1283388187.16873693466186523438
-1283388187.14562892913818359375
-1283388187.12085914611816406250
-1283388187.12082219123840332031
-1283388187.10288500785827636719
-1283388187.28665614128112792969
From the man page:
Field Width:
An optional digit string specifying a field width; if the output string has fewer characters than the field width it will be blank-padded on the left (or right, if the left-adjustment indicator has been given) to make up the field width (note that a leading zero is a flag, but an embedded zero is part of a field width);
Precision:
An optional period, `.', followed by an optional digit string giving a precision which specifies the number of digits to appear after the decimal point, for e and f formats, or the maximum number of characters to be printed from a string; if the digit string is missing, the precision is treated as zero;
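For example, if six decimal places are enough, a smaller precision keeps the output compact (same data and command as above, only the precision changed; the first two result lines shown):
[jaypal:~/Temp] awk '{ printf("%.6f\n", $2-$1)}' file0
-1283388187.200385
-1283388187.168737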
This is a follow-up question to Right justify amounts in hledger journal text file.
hledger is a plain-text accounting CLI program that can output all the postings of a transaction aligned amongst themselves, but it cannot align the entire journal to a specific column number (see Add option to print with normalized output width · Issue #1045 · simonmichael/hledger).
# Test journal
; To test right-alignment of amounts

* Transactions

2018-01-01 #Payee | Internet
    expenses:communication:internet  123.00 EUR
    assets:cash:eur

2018-01-01 #Landlady | Rent
    expenses:housing:rent  321.00 EUR
    expenses:fees  2.50 EUR  ; Bank fee
    assets:bank:eur  -323.50 EUR

2016/01/01 Unit prices
    Expense:Foo  56 # $6.00 ; a comment after a space
    Cash  $-336  ; a comment after two spaces

2022-01-01 Time
    skill  10000h
    time  -10000h
    weird  ; weird comment on posting without amount

2018-01-01 #Some things in life | Are free
    ; With and without comment
    expenses:misc:stuff  0  ; A comment
    expenses:misc:things  0
    assets:cash:eur
Output should be
# Test journal
; To test right-alignment of amounts

* Transactions

2018-01-01 #Payee | Internet
    expenses:communication:internet       123.00 EUR
    assets:cash:eur

2018-01-01 #Landlady | Rent
    expenses:housing:rent                 321.00 EUR
    expenses:fees                           2.50 EUR  ; Bank fee
    assets:bank:eur                      -323.50 EUR

2016/01/01 Unit prices
    Expense:Foo                           56 # $6.00 ; a comment after a space
    Cash                                       $-336  ; a comment after two spaces

2022-01-01 Time
    skill                                     10000h
    time                                     -10000h
    weird                                           ; weird comment on posting without amount

2018-01-01 #Some things in life | Are free
    ; With and without comment
    expenses:misc:stuff                            0  ; A comment
    expenses:misc:things                           0
    assets:cash:eur
So the spaces have to stretch between the accounts (the indented entries on the left) and the amounts (the text on the right).
Sounds like a job for awk, printf, or something like that.
The solution is adapted from Ed Morton's answer to the initial question. The regexp is made more flexible, to cover more of the edge cases possible in a journal file.
Here is how to align right up to column 53.
align-journal.awk
# a[1] = indented account, a[2] = amount (if any), a[3] = comment (if any)
match($0,/^( {2,4}[^ ]+) +([^;]+\y)?( *;.*)?/,a) {
    $0 = sprintf("%-39s %12s%s", a[1], a[2], a[3])
}
{ print }
Running this AWK script on test.journal...
awk -f align-journal.awk test.journal
...produces the desired output. Use awk -f align-journal.awk test.journal > neat.journal to redirect it to a neat.journal file.
Explanations
The regular expression goes something like this: at the start of a line, in the first capture group match 2 to 4 spaces followed by 1 or more of any character that isn't space. Then 2 or more spaces, not captured. This is the space that's supposed to stretch to provide alignment / justification. Then second capture group, this is the amount of the posting, which can have many forms. Match whatever character isn't a ; (semicolons are how comments start in hledger journals) 1 or more times up to a word boundary. Do this if such a pattern exists. Then the third and final capture group, for the posting comment, if it exists: 0 or more spaces followed by a ;, followed by anything till the end of the line.
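To see what the three groups capture on a sample posting (a quick sanity check, not part of the script):

$ echo '    expenses:fees  2.50 EUR  ; Bank fee' |
  gawk '{ match($0,/^( {2,4}[^ ]+) +([^;]+\y)?( *;.*)?/,a)
          printf "1=[%s] 2=[%s] 3=[%s]\n", a[1], a[2], a[3] }'
1=[    expenses:fees] 2=[2.50 EUR] 3=[  ; Bank fee]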
The idea is not to try to match the form of the amount field, so the script will manage to align it correctly whatever it is.
a is the variable name for what will hold the capturing groups.
The stretching of the spaces in the middle is done by the sprintf() function, which formats text the way printf statements do for fancier printing. The format string %-39s %12s%s is chosen to align up to column 53, meaning the last character of the amount lands on column 52: the account field is left-justified in 39 columns, the literal space occupies column 40, and the amount is right-justified in columns 41-52. The literal space also ensures the two fields stay separated even if the account field overruns its 39 columns. That's why 39 and 12 add up to 51 rather than 52: the space in between makes up the difference.
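A minimal illustration of the padding at work (the posting is taken from the test journal):

$ awk 'BEGIN { printf "%-39s %12s%s\n", "    assets:bank:eur", "-323.50 EUR", "" }'
    assets:bank:eur                      -323.50 EUR

The amount's last character lands on column 52, leaving column 53 for a comment.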
The script was tested with gawk 5.0.0.
I have a VB (variable-length) file where a HEX value '0D25' can appear in any position from 1 to 20 (values from position 21 onward should not be changed). It needs to be replaced with the HEX value '4040'.
Input:
----+----1----+----2----+----3----+----4----+
0000/12345678  566  #(#)#0000/12345678  566
FFFF6FFFFFFFF02FFF44B475BFFFF6FFFFFFFF02FFF02
0000112345678D5566005DBD50000112345678D5566D5
Expected output:
----+----1----+----2----+----3----+----4----+
0000/12345678  566  #(#)#0000/12345678  566
FFFF6FFFFFFFF44FFF44B475BFFFF6FFFFFFFF02FFF02
000011234567800566005DBD50000112345678D5566D5
I was using SORT with the control card below.
SORT FIELDS=COPY
OUTREC FIELDS=(1,4,5,20,CHANGE=(20,X'0D25',X'4040'),
               NOMATCH=(5,20),
               21)
CHANGE= does not work the way you think it does. It does a lookup at only the specified position, then writes either the replacement character(s) or the NOMATCH= character(s), in exactly the length given as the first sub-parameter of CHANGE= (20 in your case).
FINDREP= searches for the specified character(s) at each position and replaces them with the replacement character(s). You limit the part of the record to be inspected with the STARTPOS= and ENDPOS= keywords, respectively.
In your case the following statement should do what you want (positions 5 to 24 cover logical positions 1 to 20 after the 4-byte RDW of your VB records, which is why your own card copied 1,4 separately):
OUTREC FINDREP=(INOUT=(X'0D25',X'4040'),STARTPOS=5,ENDPOS=24)
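Putting it together, the complete control cards would be (a sketch; SORT FIELDS=COPY carried over from your original job):

SORT FIELDS=COPY
OUTREC FINDREP=(INOUT=(X'0D25',X'4040'),STARTPOS=5,ENDPOS=24)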
Suppose I have two files
$A
a b
1 5
2 6
3 7
4 8
$B
a b
1 5
2 6
5 6
My question is: in the shell, how do I count how many of the values in B's first column (1, 2, 5) appear in A's first column (1, 2, 3, 4)? (Here the answer is 2, for the values 1 and 2.)
The following awk solution counts column-1 entries of file2 in file1:
awk 'FNR==1{next}NR==FNR{a[$1]=$2;next}$1 in a{count++}END{print count}' file1 file2
2
Skip the first line of both files using FNR==1{next}. You can remove this if you don't have header fields (a b) in your actual data files.
Read the entire first file into an array using NR==FNR{a[$1]=$2;next}. I am assigning column 2 here in case you wish to scale the solution to match both columns. You can also do a[$1]++ if you are not interested in column 2 at all. Won't hurt either way.
If the value of column 1 from the second file is in our array, increment a count variable.
In the END block, print the count variable.
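For completeness, here is the both-columns variant hinted at above (a sketch: it counts the rows of file2 whose first two columns both match a row of file1):

awk 'FNR==1{next} NR==FNR{a[$1]=$2;next} ($1 in a) && a[$1]==$2 {count++} END{print count}' file1 file2

For the sample files this also prints 2, since the rows 1 5 and 2 6 of B appear verbatim in A.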
I'm using the command line tool from Temperature Monitor, the mac software, which looks like this:
$ /Applications/TemperatureMonitor.app/Contents/MacOS/tempmonitor -c -l -a
SMART Disk Hitachi HTS547550A9E384 (J2250050GMBY3C): 30 C
SMART Disk TOSHIBA MK5065GSXF (71SNCPW4T): 28 C
SMC BATTERY: 30 C
SMC BATTERY POSITION 2: 31 C
SMC BATTERY POSITION 3: 28 C
SMC CPU A DIODE: 47 C
SMC CPU A PROXIMITY: 45 C
SMC GPU 1 CHIP: 40 C
SMC LEFT PALM REST: 28 C
SMC MAIN HEAT SINK 2: 38 C
SMC MAIN HEAT SINK 3: 37 C
SMC MAIN LOGIC BOARD: 36 C
SMC PLATFORM CONTROLLER HUB: 49 C
SMC SSD BAY: 36 C
I want to clean this up a bit. So for example, let's say I want to get the average of the three Battery temperature readings. I thought of piping into grep for Battery, then awking through all the fields for the correct data, but that seems really messy.
So I want the three variables $BATTERY_1, $BATTERY_2, and $BATTERY_3 to have the content 30, 31, and 28 respectively.
Any suggestions on the cleanest way to do so?
It will be easier to create an array and then move the values from the array into the plain variables. It is trivial to do the extraction with awk:
TEMPMON="/Applications/TemperatureMonitor.app/Contents/MacOS/tempmonitor"
battery=( $("$TEMPMON" -c -l -a | awk '/BATTERY/ { print $(NF-1) }') )
BATTERY_1=${battery[0]}
BATTERY_2=${battery[1]}
BATTERY_3=${battery[2]}
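A quick check of the result, using the sample readings above:

$ echo "$BATTERY_1 $BATTERY_2 $BATTERY_3"
30 31 28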
To complement @Jonathan Leffler's helpful answer:
If you don't actually need the individual values and instead want the average only, try:
... | awk '/ BATTERY/ {sum+=$(NF-1); ++i} END {OFMT="%.2f"; print sum / i}'
OFMT="%.2f" sets the (printf-style) output number format to 2 decimal places, resulting in 29.67.
Update: The OP, in a comment, asks for output in the format <Item name>: <avg temp> (<temp 1>, <temp 2>, <temp 3>):
... | awk -v itm='BATTERY' '
$0 ~ itm {
vals = vals (i ? " " : "") $(NF-1)
sum += $(NF-1); ++i
}
END {
printf "%s: %.2f (%s)\n", itm, sum / i, vals
}'
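With the sample readings above (30, 31, 28), this prints:

BATTERY: 29.67 (30 31 28)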
-v itm='BATTERY' passes the name of the items to match as awk variable itm.
$0 ~ itm matches (~) the current input line ($0) against itm (interpreted as a regular expression, which in this simple case performs substring matching).
awk splits input lines into fields $1, $2, ... by whitespace by default, and stores the number of fields in special variable NF. Since the values in the input data are in the next-to-last field, $(NF-1) references each line's value.
vals = ... builds up a string list of matching values; note how merely placing strings and variables next to each other causes them to be concatenated (as strings).
(i ? " " : "") is a C-style ternary conditional that returns a single space if condition i is true (i.e., if variable i has a nonzero value), and an empty string otherwise. In other words: if the value is not the first one, append a space before appending the value to the list of values built up so far. Note that uninitialized variables in awk default to an empty string in a string context, and 0 (false) in a numeric/Boolean context.
sum += ... sums up the values; ++i keeps the count of values.
END is a special pattern whose associated action (block) is processed after all input lines.
printf, for output based on a format (template) string, works like its C counterpart, and in this case outputs the item name (1st %s, instantiated with itm), the average with 2 decimal places (%.2f, instantiated with sum / i) and the list of values (last %s, instantiated with vals).
I am trying to resolve locations in lat and long in one file to a couple of named fields in another file.
I have one file that looks like this:
                 f1--f2--f3--------f4--------                f5---
R                20175155 41273951N078593973W                18012
R                20175156 41274168N078593975W                18000
R                20175157 41274387N078593976W                17999
R                20175158 41274603N078593977W                18024
R                20175159 41274823N078593978W                18087
Each character is in a specific place, so I need to define fields based on character positions:
f1: chars 18-21; f2: chars 22-25; f3: chars 26-35; f4: chars 36-45; f5: chars 62-66.
I have another, much larger CSV file in which fields 11, 12, and 13 correspond to f3, f4, and f5:
awk -F',' '{print $11, $12, $13}'
41.46703821 -078.98476926 519.21
41.46763555 -078.98477791 524.13
41.46824123 -078.98479015 526.67
41.46884129 -078.98480615 528.66
41.46943371 -078.98478482 530.50
I need to find the closest match to file 1 field 1 && 2 in file 2 field 11 && 12;
When the closest match is found I need to insert field 1, 2, 3, 4, 5 from file 1 into file 2 field 16, 17, 18, 19, 20.
As you can see, the format is slightly different. File 1 breaks down like this:
File 1
f3-------f4--------
DDMMSSdd DDDMMSSdd
41273951N078593973W
File 2
f11-------- f12---------
DD.dddddddd DDD.dddddddd
41.46703821 -078.98476926
N means f3 is a positive number, W means f4 is a negative number.
I changed file 1 with sed, a ridiculous one-liner that works great.. (better way???)
cat $file1 |sed 's/.\{17\}//' |sed 's/\(.\{4\}\)\(.\{4\}\)\(.\{9\}\)\(.\)\(.\{9\}\)\(.\)\(.\{16\}\)\(.\{5\}\)/\1,\2,\3,\4,\5,\6,\8/'|sed 's/\(.\{10\}\)\(.\{3\}\)\(.\{2\}\)\(.\{2\}\)\(.\{2\}\)\(.\{3\}\)\(.\{3\}\)\(.\{2\}\)\(.*\)/\1\2,\3,\4.\5\6\7,\8\9/'|sed 's/\(.\{31\}\)\(.\{2\}\)\(.*\)/\1,\2.\3/'
2017,5155, 41,27,39.51,N,078,59,39.73,W,18012
2017,5156, 41,27,41.68,N,078,59,39.75,W,18000
2017,5157, 41,27,43.87,N,078,59,39.76,W,17999
2017,5158, 41,27,46.03,N,078,59,39.77,W,18024
2017,5159, 41,27,48.23,N,078,59,39.78,W,18087
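(As an aside, since a better way was asked for: GNU awk can slice fixed-width records directly with its FIELDWIDTHS variable, producing the same CSV in one pass. A sketch, with the widths taken from the sed groups above:

gawk 'BEGIN { FIELDWIDTHS = "17 4 4 3 2 2 2 1 3 2 2 2 1 16 5"; OFS = "," }
      { print $2, $3, $4, $5, $6 "." $7, $8, $9, $10, $11 "." $12, $13, $15 }' "$file1"

This prints the same five lines as the sed pipeline.)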
Now I have to convert the formats. (RESOLVED this, see below. The problem was that the numbers were rounded off too far; I need at least six decimal places.)
awk -F',' '{
  for (i=1; i<=NF; i++) {
    if (i <= 2)                     printf ($i",");
    else if (i == 3 && $6 == "S")   printf("-"$3+($4/60)+($5/3600)",");
    else if (i == 3 && $6 == "N")   printf($3+($4/60)+($5/3600)",");
    else if (i == 7 && $10 == "W")  printf("-"$7+($8/60)+($9/3600)",");
    else if (i == 7 && $10 == "E")  printf($7+($8/60)+($9/3600)",");
    if (i == 11)                    printf ($i"\n")
  }
}'
2017,5155,41.461,-78.9944,18012
2017,5156,41.4616,-78.9944,18000
2017,5157,41.4622,-78.9944,17999
2017,5158,41.4628,-78.9944,18024
2017,5159,41.4634,-78.9944,18087
That's where I'm at.
RESOLVED THIS
I need to get the number format to have at least six decimal places from this formula:
printf($3+($4/60)+($5/3600))
Adding "%.8f" did it:
printf("%.8f", $3+($4/60)+($5/3600))
The next issue will be to match file 1 fields f3 and f4 to the closest match in file 2 fields f11 and f12.
Any ideas?
Then I will need to calculate the distance between the points.
In Excel the formula would be like this:
=ATAN2(COS(lat1)*SIN(lat2)-SIN(lat1)*COS(lat2)*COS(lon2-lon1), SIN(lon2-lon1)*COS(lat2))
What could I use for that calculation?
UPDATE ---
I am looking at short distances for the matching locations, so I was thinking of applying something simple like Pythagoras' theorem for the nearest match, maybe even with fewer decimal places. That has to be many times faster. Maybe something like this:
x = (lon2-lon1) * Math.cos((lat1+lat2)/2);
y = (lat2-lat1);
d = Math.sqrt(x*x + y*y) * R;
Then I could do the heavy calculations required for greater accuracy after the final file is updated.
Thanks
You can't do the distance calculation after you perform the closest match: closest is defined by comparison of the distance values. Awk can evaluate the formula that you want (looks like great-circle distance?). Take a look at this chapter to see what you need.
The big problem is finding the nearest match. Write an awk script that takes a single line of file 1 and outputs the lines in file 2 with an extra column. That column is the calculation of the distance between the pair of points according to your distance formula. If you sort that file numerically (sort -n) then your closest match is at the top. Then you need a script that loops over each line in file 1, calls your awk script, uses head -n1 to pull out the closest match and then output it in the format that you want.
This is all possible in bash and awk, but it would be a much simpler script in Python. Depends on which you prefer.
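A minimal sketch of that approach, combining the equirectangular approximation from the update with the sort-and-head trick (the file names, the column positions, and the helper script name are all assumptions):

dist.awk:

# Append an approximate distance (metres) from a fixed point LAT/LON
# (decimal degrees, passed in with -v) to each comma-separated line of file 2.
BEGIN { FS = ","; pi = atan2(0, -1); R = 6371000 }   # mean Earth radius in metres
{
    lat1 = LAT * pi / 180; lon1 = LON * pi / 180
    lat2 = $11 * pi / 180; lon2 = $12 * pi / 180
    x = (lon2 - lon1) * cos((lat1 + lat2) / 2)       # equirectangular approximation
    y = lat2 - lat1
    print sqrt(x * x + y * y) * R "," $0             # distance first, so sort -n works
}

Then loop over the converted file 1, keeping the top line for each point:

while IFS=, read -r f1 f2 lat lon f5; do
    awk -v LAT="$lat" -v LON="$lon" -f dist.awk file2.csv | sort -n | head -n1
done < file1.csv

Reassembling each winning line into your field 16-20 layout is then a small awk or cut job.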