I am trying to subtract the value in the previous group from the value in the current group in the 2nd column. For example, the following pattern repeats hundreds of times.
A
322 0.2
322 0.2
322 0.2
B
455 0.35
455 0.35
455 0.35
C
566 0.92
566 0.92
566 0.92
A
322 0.18
322 0.18
322 0.18
B
455 0.33
455 0.33
455 0.33
C
566 0.99
566 0.99
566 0.99
I want the starting point to be A; that means the fraction 0.2 is kept unchanged for the first group, and 0.18 for the second group. In other words, C-B, then B-A. See the desired output.
A
322 0.2
322 0.2
322 0.2
B
455 0.15
455 0.15
455 0.15
C
566 0.57
566 0.57
566 0.57
A
322 0.18
322 0.18
322 0.18
B
455 0.15
455 0.15
455 0.15
C
566 0.66
566 0.66
566 0.66
I tried this code to print the result in the third column, but it seems to subtract from the previous line, not the previous group.
awk '{$3 = $2 - prev2; prev2 = $2; print;}'
awk to the rescue!
based on the posted input/output and implicit assumptions...
$ awk '/^A/ {ia=1; c=0}
ia {a[c++]=$2}
/^[B-Z]/ {ia=c=0}
!ia && NF>1 {t=$2; $2-=a[++c]; a[c]=t}1' file
A
322 0.2
322 0.2
322 0.2
B
455 0.15
455 0.15
455 0.15
C
566 0.57
566 0.57
566 0.57
A
322 0.18
322 0.18
322 0.18
B
455 0.15
455 0.15
455 0.15
C
566 0.66
566 0.66
566 0.66
The records under each heading can differ, but every group is assumed to have the same number of records.
If your real input is not represented by this sample, you may need to tweak the conditions.
Explanation
/^A/ {ia=1; c=0} if the label starts with A, set the in-A indicator ia and reset the counter.
ia {a[c++]=$2} if in A, store values for each record
/^[B-Z]/ {ia=c=0} for other labels, reset in A and counter
!ia && NF>1 {t=$2; $2-=a[++c]; a[c]=t} if not in A and not a label (more than one field): save the numerical value, subtract the previously saved value for the corresponding record, and store the saved value as the new offset for that record position.
1 print
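As a quick check, the script above can be run against the posted sample; the file name sample.txt is just a choice for this demo (the answer calls it "file"):

```shell
# Recreate the posted input
cat > sample.txt <<'EOF'
A
322 0.2
322 0.2
322 0.2
B
455 0.35
455 0.35
455 0.35
C
566 0.92
566 0.92
566 0.92
A
322 0.18
322 0.18
322 0.18
B
455 0.33
455 0.33
455 0.33
C
566 0.99
566 0.99
566 0.99
EOF

# Run the answer's script and capture the result
result=$(awk '/^A/ {ia=1; c=0}
              ia {a[c++]=$2}
              /^[B-Z]/ {ia=c=0}
              !ia && NF>1 {t=$2; $2-=a[++c]; a[c]=t}1' sample.txt)
printf '%s\n' "$result"
```

The output matches the desired output in the question: the B groups become 0.15 (0.35-0.2 and 0.33-0.18) and the C groups become 0.57 and 0.66.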
Related
I have generated two column data files ($Data1 and $Data2) with set table; here are the first values of $Data1:
01/11/2021 00:15:00 15.0 70.0 0.10 1010.0 0.8 228 1.4 0.0
01/11/2021 00:30:00 14.8 71.0 0.20 1010.0 1.0 200 1.9 0.0
01/11/2021 00:45:00 14.6 73.0 0.30 1010.1 0.8 142 1.4 0.0
01/11/2021 01:00:00 14.6 74.0 0.20 1010.0 1.2 147 2.0 0.0
and Data2:
01/11/2021 00:15:00 14.8 56.0 0.00 1012.0 2.1 228 4.8 0.0
01/11/2021 00:30:00 14.2 59.0 0.00 1012.1 2.7 202 5.8 0.0
01/11/2021 00:45:00 14.6 62.0 0.00 1012.0 1.6 228 3.4 0.0
01/11/2021 01:00:00 14.0 65.0 0.00 1011.9 1.9 228 3.3 0.0
I have merged them into a new file called $Data with print, like this:
set print $Data
do for [i=1:|$Data1|-6] { print $Data1[i] }
do for [i=1:|$Data2|-6] { print $Data2[i] }
set print
I know how to plot the file $Data, but do you know how to edit it? (By editing I mean being able to read the numerical values, not to plot them.)
I have a text file with a listing as shown below. I want to fill in the missing numbers in the leading columns as shown.
Typical original text:
5 401 6 5.80 0.15 -3.56 0.61 -0.02 0.96
8 -6.11 -0.64 4.07 0.24 0.20 0.38
402 6 -0.33 1.07 0.30 1.29 -0.00 2.04
8 0.02 -0.59 0.21 0.50 0.22 0.79
403 6 3.77 -0.70 -2.74 -0.94 0.20 -1.48
8 -4.08 0.22 2.23 -0.06 -0.19 -0.09
404 6 -2.36 0.22 1.12 -0.26 0.21 -0.41
8 2.05 0.27 -1.63 0.20 -0.16 0.32
16 401 16 -6.30 -0.76 -3.61 0.64 -0.22 -1.01
227 5.99 0.27 4.12 0.47 0.15 -0.74
402 16 -12.50 0.14 -7.52 -0.01 -0.24 0.02
227 12.19 0.35 8.03 0.24 0.13 -0.38
403 16 20.48 0.19 12.84 -0.29 0.03 0.46
227 -20.79 -0.68 -13.35 -0.64 -0.18 1.02
404 16 14.28 1.09 8.93 -0.94 0.01 1.48
227 -14.59 -0.60 -9.44 -0.87 -0.21 1.38
709 401 374 -1.17 -0.99 25.11 0.63 -1.12 -0.11
204 1.05 0.79 -24.91 -0.19 -0.62 0.06
402 374 -1.55 1.09 30.49 -0.90 -1.40 0.14
204 1.43 -0.90 -30.28 0.41 -0.79 -0.09
403 374 1.90 -1.58 0.79 1.65 0.50 -0.21
204 -2.02 1.38 -0.99 -0.93 0.41 0.14
404 374 1.51 0.50 6.16 0.12 0.22 0.04
204 -1.64 -0.31 -6.37 -0.32 0.24 -0.02
How I want it to be:
5 401 6 5.80 0.15 -3.56 0.61 -0.02 0.96
5 401 8 -6.11 -0.64 4.07 0.24 0.20 0.38
5 402 6 -0.33 1.07 0.30 1.29 -0.00 2.04
5 402 8 0.02 -0.59 0.21 0.50 0.22 0.79
5 403 6 3.77 -0.70 -2.74 -0.94 0.20 -1.48
5 403 8 -4.08 0.22 2.23 -0.06 -0.19 -0.09
5 404 6 -2.36 0.22 1.12 -0.26 0.21 -0.41
5 404 8 2.05 0.27 -1.63 0.20 -0.16 0.32
16 401 16 -6.30 -0.76 -3.61 0.64 -0.22 -1.01
16 401 227 5.99 0.27 4.12 0.47 0.15 -0.74
16 402 16 -12.50 0.14 -7.52 -0.01 -0.24 0.02
16 402 227 12.19 0.35 8.03 0.24 0.13 -0.38
16 403 16 20.48 0.19 12.84 -0.29 0.03 0.46
16 403 227 -20.79 -0.68 -13.35 -0.64 -0.18 1.02
16 404 16 14.28 1.09 8.93 -0.94 0.01 1.48
16 404 227 -14.59 -0.60 -9.44 -0.87 -0.21 1.38
709 401 374 -1.17 -0.99 25.11 0.63 -1.12 -0.11
709 401 204 1.05 0.79 -24.91 -0.19 -0.62 0.06
709 402 374 -1.55 1.09 30.49 -0.90 -1.40 0.14
709 402 204 1.43 -0.90 -30.28 0.41 -0.79 -0.09
709 403 374 1.90 -1.58 0.79 1.65 0.50 -0.21
709 403 204 -2.02 1.38 -0.99 -0.93 0.41 0.14
709 404 374 1.51 0.50 6.16 0.12 0.22 0.04
709 404 204 -1.64 -0.31 -6.37 -0.32 0.24 -0.02
I had a similar problem before, where two "cells" were missing regularly (e.g. the 402 to 404 numbers above were also missing). Then I managed to use this script:
for /F "delims=" %%i in ('type "tmp1.txt"') do (
set row=%%i
set cnt=0
for %%l in (%%i) do set /A cnt+=1
if !cnt! equ 7 (
set row=!header! !row!
) else (
for /F "tokens=1,2" %%j in ("%%i") do set header=%%j %%k
)
echo.!row!
) >> "tmp2.txt"
Idea anyone?
Assuming, the file is formatted with spaces (no TABs):
@echo off
setlocal enabledelayedexpansion
(for /f "delims=" %%a in (tmp1.txt) do (
set "line=%%a"
set "col1=!line:~0,3!"
set "col2=!line:~3,5!"
set "rest=!line:~8!"
if "!col1!" == " " (
set "col1=!old1!"
) else (
set "old1=!col1!"
)
if "!col2!" == " " (
set "col2=!old2!"
) else (
set "old2=!col2!"
)
echo !col1!!col2!!rest!
))>tmp2.txt
You will notice that I don't split the lines into tokens with for /f; instead I take each line as a whole and "split" it manually to preserve the format (the length of each substring). Then I simply replace "empty" values with the saved value from the line before.
Edit in response to "I have made a mistake when pasting the original text. There are 4 (empty) spaces before all lines.":
Adapt the counting as follows (for the first "token" increase the length by 4; for the rest add 4 to the start position and keep the lengths unchanged):
set "col1=!line:~0,7!"
set "col2=!line:~7,5!"
set "rest=!line:~12!"
and adapt if "!col1!" == "   " ( to if "!col1!" == "       " ( (from three to seven spaces)
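For comparison, here is a sketch of the same fill-down in awk, keyed on field counts rather than character positions. It assumes full rows have 9 fields, rows missing the first column have 8, and rows missing the first two have 7; the fixed-width alignment is lost (pipe through column -t if that matters). The file name demo.txt is just for this sketch:

```shell
# A few sample rows from the question
cat > demo.txt <<'EOF'
5 401 6 5.80 0.15 -3.56 0.61 -0.02 0.96
8 -6.11 -0.64 4.07 0.24 0.20 0.38
402 6 -0.33 1.07 0.30 1.29 -0.00 2.04
8 0.02 -0.59 0.21 0.50 0.22 0.79
EOF

# Fill missing leading columns based on how many fields a row has
result=$(awk 'NF==9 {c1=$1; c2=$2}        # full row: remember col1 and col2
              NF==8 {c2=$1; $0=c1 FS $0}  # col1 missing: remember col2, prepend col1
              NF==7 {$0=c1 FS c2 FS $0}   # col1 and col2 missing: prepend both
              1' demo.txt)
printf '%s\n' "$result"
```

This relies entirely on the field counts being distinct per row type, so it breaks if real rows can have a different number of value columns.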
I have two files, file 1:
1 800 800 0.51
2 801 801 0.01
3 802 802 0.01
4 803 803 0.23
and file 2:
1 800 800 0.55
2 801 801 0.09
3 802 802 0.88
4 804 804 0.24
I have an awk script that looks in the second file for values that match the first three columns of the first file.
$ awk 'NR==FNR{a[$1,$2,$3];next} {if (($1,$2,$3) in a) {print $4} else {print "not found"}}' f1 f2
0.55
0.09
0.88
not found
Is there a way to make it such that any rows occurring in file 2 that are not in file 1 are still added at the end of the output, after the matches, such as this:
0.55
0.09
0.88
not found
4 804 804 0.24
That way, when I paste the two files back together, they will look something like this:
1 800 800 0.51 0.55
2 801 801 0.01 0.09
3 802 802 0.01 0.88
4 803 803 0.23 not found
4 804 804 not found 0.24
Or is there any other more elegant solution with completely different syntax?
awk '{k=$1FS$2FS$3}NR==FNR{a[k]=$4;next}
k in a{print $4;next}{print "not found";print}' f1 f2
The above one-liner will give you:
0.55
0.09
0.88
not found
4 804 804 0.24
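To verify, the one-liner can be run against the two posted files, recreated here as f1 and f2:

```shell
# Recreate the two input files from the question
cat > f1 <<'EOF'
1 800 800 0.51
2 801 801 0.01
3 802 802 0.01
4 803 803 0.23
EOF
cat > f2 <<'EOF'
1 800 800 0.55
2 801 801 0.09
3 802 802 0.88
4 804 804 0.24
EOF

# Run the answer's one-liner and capture the result
result=$(awk '{k=$1FS$2FS$3}NR==FNR{a[k]=$4;next}
              k in a{print $4;next}{print "not found";print}' f1 f2)
printf '%s\n' "$result"
```

Note that the unmatched f2 row is printed in place (immediately after its "not found" line), not collected at the end; with this sample the two behaviors happen to coincide because the unmatched row is last.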
How can I sort a file based on the values in columns 2-8?
Essentially I want ascending order based on the highest value that appears on the line in any of those fields, ignoring columns 1, 9 and 10. That is, the line with the highest value should be the last line of the file, the 2nd-largest value should be on the 2nd-last line, etc. If the next number in the ascending order appears on multiple lines (like A/B), I don't care in which order they get printed.
I've looked at using sort but can't figure out an easy way to do what I want...
I'm a bit stumped, any ideas?
Input:
#1 2 3 4 5 6 7 8 9 10
A 0.00 0.00 0.01 0.23 0.19 0.07 0.26 0.52 0.78
B 0.00 0.00 0.02 0.26 0.19 0.09 0.20 0.56 0.76
C 0.00 0.00 0.02 0.16 0.20 0.22 2.84 0.60 3.44
D 0.00 0.00 0.02 0.29 0.22 0.09 0.28 0.62 0.90
E 0.00 0.00 0.90 0.09 0.18 0.05 0.24 1.21 1.46
F 0.00 0.00 1.06 0.03 0.04 0.01 0.00 1.13 1.14
G 0.00 0.00 1.11 0.10 0.31 0.08 0.64 1.60 2.25
H 0.00 0.00 1.39 0.03 0.04 0.01 0.01 1.47 1.48
I 0.00 0.00 1.68 0.16 0.55 0.24 5.00 2.63 7.63
J 0.00 0.00 6.86 0.52 1.87 0.59 12.79 9.83 22.62
K 0.00 0.00 7.26 0.57 2.00 0.64 11.12 10.47 21.59
Expected output:
#1 2 3 4 5 6 7 8 9 10
A 0.00 0.00 0.01 0.23 0.19 0.07 (0.26) 0.52 0.78
B 0.00 0.00 0.02 (0.26) 0.19 0.09 0.20 0.56 0.76
D 0.00 0.00 0.02 (0.29) 0.22 0.09 0.28 0.62 0.90
E 0.00 0.00 (0.90) 0.09 0.18 0.05 0.24 1.21 1.46
F 0.00 0.00 (1.06) 0.03 0.04 0.01 0.00 1.13 1.14
G 0.00 0.00 (1.11) 0.10 0.31 0.08 0.64 1.60 2.25
H 0.00 0.00 (1.39) 0.03 0.04 0.01 0.01 1.47 1.48
C 0.00 0.00 0.02 0.16 0.20 0.22 (2.84) 0.60 3.44
I 0.00 0.00 1.68 0.16 0.55 0.24 (5.00) 2.63 7.63
K 0.00 0.00 7.26 0.57 2.00 0.64 (11.12) 10.47 21.59
J 0.00 0.00 6.86 0.52 1.87 0.59 (12.79) 9.83 22.62
Preprocess the data: print the max of columns 2 through 8 at the start of each line, then sort, then remove the added column:
awk '
NR==1{print "x ", $0}
NR>1{
max = $2;
for( i = 3; i <= 8; i++ )
if( $i > max )
max = $i;
print max, $0
}' OFS=\\t input-file | sort -n | cut -f 2-
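A quick run on the posted input (saved as sortme.txt for this sketch) confirms the ordering. The header stays first because its added key "x" compares as 0 under sort -n, below every real maximum:

```shell
# Recreate the posted input
cat > sortme.txt <<'EOF'
#1 2 3 4 5 6 7 8 9 10
A 0.00 0.00 0.01 0.23 0.19 0.07 0.26 0.52 0.78
B 0.00 0.00 0.02 0.26 0.19 0.09 0.20 0.56 0.76
C 0.00 0.00 0.02 0.16 0.20 0.22 2.84 0.60 3.44
D 0.00 0.00 0.02 0.29 0.22 0.09 0.28 0.62 0.90
E 0.00 0.00 0.90 0.09 0.18 0.05 0.24 1.21 1.46
F 0.00 0.00 1.06 0.03 0.04 0.01 0.00 1.13 1.14
G 0.00 0.00 1.11 0.10 0.31 0.08 0.64 1.60 2.25
H 0.00 0.00 1.39 0.03 0.04 0.01 0.01 1.47 1.48
I 0.00 0.00 1.68 0.16 0.55 0.24 5.00 2.63 7.63
J 0.00 0.00 6.86 0.52 1.87 0.59 12.79 9.83 22.62
K 0.00 0.00 7.26 0.57 2.00 0.64 11.12 10.47 21.59
EOF

# Prepend the max of columns 2-8 (tab-separated), sort numerically, drop the key
result=$(awk '
  NR==1{print "x ", $0}
  NR>1{
    max = $2;
    for( i = 3; i <= 8; i++ )
      if( $i > max )
        max = $i;
    print max, $0
  }' OFS=\\t sortme.txt | sort -n | cut -f 2-)
printf '%s\n' "$result"
```

Unlike the second answer below, this variant does not add the () decoration; it only reorders the lines.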
Another variant in pure awk (GNU awk, which provides PROCINFO["sorted_in"]):
$ awk 'NR==1; # print header
NR>1{ #For other lines,
a=$2;
ai=2;
for(i=3;i<=8;i++){
if($i>a){
a=$i;
ai=i;
}
} # Find the max number in the line
$ai= "(" $ai ")"; # decoration - mark highest with ()
g[$0]=a;
}
function cmp_num_val(i1, v1, i2, v2) {return (v1 - v2);} # sorting function
END{
PROCINFO["sorted_in"]="cmp_num_val"; # assign sorting function
for (a in g) print a; # print
}' sortme.txt | column -t # column -t for formatting.
#1 2 3 4 5 6 7 8 9 10
A 0.00 0.00 0.01 0.23 0.19 0.07 (0.26) 0.52 0.78
B 0.00 0.00 0.02 (0.26) 0.19 0.09 0.20 0.56 0.76
D 0.00 0.00 0.02 (0.29) 0.22 0.09 0.28 0.62 0.90
E 0.00 0.00 (0.90) 0.09 0.18 0.05 0.24 1.21 1.46
F 0.00 0.00 (1.06) 0.03 0.04 0.01 0.00 1.13 1.14
G 0.00 0.00 (1.11) 0.10 0.31 0.08 0.64 1.60 2.25
H 0.00 0.00 (1.39) 0.03 0.04 0.01 0.01 1.47 1.48
C 0.00 0.00 0.02 0.16 0.20 0.22 (2.84) 0.60 3.44
I 0.00 0.00 1.68 0.16 0.55 0.24 (5.00) 2.63 7.63
K 0.00 0.00 7.26 0.57 2.00 0.64 (11.12) 10.47 21.59
J 0.00 0.00 6.86 0.52 1.87 0.59 (12.79) 9.83 22.62
When I require open-uri and either active_support/core_ext/numeric/conversions.rb or active_support/core_ext/big_decimal/conversions.rb, open "http://some.website.com" becomes extremely slow.
How can I avoid this?
Ruby 2.0.0, active_support 4.0.0
EDIT
Here are the profiling results. There are a huge number of calls to Gem::Dependency#matching_specs (among others).
source (with conversions)
require 'open-uri'
require 'active_support/core_ext/numeric/conversions'
open 'http://stackoverflow.com'
result
% cumulative self self total
time seconds seconds calls ms/call ms/call name
21.46 0.56 0.56 22620 0.02 0.11 Gem::Dependency#matching_specs
13.41 0.91 0.35 4567 0.08 0.76 Array#each
5.36 1.05 0.14 1500 0.09 0.15 Gem::Version#<=>
4.98 1.18 0.13 3810 0.03 0.11 Gem::BasicSpecification#contains_requirable_file?
3.83 1.28 0.10 5353 0.02 0.03 Gem::StubSpecification#activated?
3.45 1.37 0.09 27604 0.00 0.00 Gem::StubSpecification#name
3.07 1.45 0.08 1382 0.06 0.33 nil#
3.07 1.53 0.08 2139 0.04 0.25 Gem::Specification#initialize
2.68 1.60 0.07 106 0.66 5.85 Kernel#gem_original_require
2.68 1.67 0.07 21258 0.00 0.00 String#===
...
source (without conversions)
require 'open-uri'
open 'http://stackoverflow.com'
result
% cumulative self self total
time seconds seconds calls ms/call ms/call name
36.36 0.08 0.08 46 1.74 10.65 Kernel#gem_original_require
22.73 0.13 0.05 816 0.06 0.09 nil#
4.55 0.14 0.01 46 0.22 11.09 Kernel#require
4.55 0.15 0.01 22 0.45 22.27 Net::BufferedIO#rbuf_fill
4.55 0.16 0.01 3 3.33 3.33 URI::Parser#split
4.55 0.17 0.01 88 0.11 0.34 Module#module_eval
4.55 0.18 0.01 133 0.08 0.45 Object#DelegateClass
4.55 0.19 0.01 184 0.05 0.11 Gem.find_unresolved_default_spec
4.55 0.20 0.01 1280 0.01 0.01 Integer#chr
4.55 0.21 0.01 1280 0.01 0.01 String#%
4.55 0.22 0.01 1381 0.01 0.01 Module#method_added
...