How to find a sequence of numbers in bash

I have a data file formatted like this:
0.00 0.00 0.00
1 10 1.0
2 12 1.0
3 15 1.0
4 20 0.0
5 23 0.0
0.20 0.15 0.6
1 12 1.0
2 15 1.0
3 20 0.0
4 18 0.0
5 20 0.0
0.001 0.33 0.15
1 8 1.0
2 14 1.0
3 17 0.0
4 25 0.0
5 15 0.0
I need to remove some data and reorder the lines like this:
1 10
1 12
1 8
2 12
2 15
2 14
3 15
3 20
3 17
4 20
4 18
4 25
5 23
5 20
5 15
My code does not output anything. The problem might be in the grep command. Could you please help me out?
touch extract_file.txt
for (( i=1; i<=band; i++))
do
sed -e '1, 7d' data_file | grep -w " '$(echo $i)' " | awk '{print $2}' > extract(echo $i).txt
paste -s extract_file.txt extract$(echo $i).txt > data
done
#rm eigen*.txt
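For reference, two things in that loop keep it from producing any output: the literal single quotes inside the grep pattern mean it never matches a data line, and extract(echo $i).txt is a shell syntax error. A minimal sketch of what the loop was probably meant to do (assuming band is set to the number of bands, 5 in the sample data):
for (( i = 1; i <= band; i++ )); do
    # keep only the rows whose first field is the current band index,
    # and drop the third column
    awk -v b="$i" '$1 == b { print $1, $2 }' data_file
done > extract_file.txt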

The following pipeline, with comments, produces the desired output:
cat <<EOF |
0.00 0.00 0.00
1 10 1.0
2 12 1.0
3 15 1.0
4 20 0.0
5 23 0.0
0.20 0.15 0.6
1 12 1.0
2 15 1.0
3 20 0.0
4 18 0.0
5 20 0.0
0.001 0.33 0.15
1 8 1.0
2 14 1.0
3 17 0.0
4 25 0.0
5 15 0.0
EOF
# remove lines not starting with a space
grep -v '^[^ ]' |
# remove leading space
sed 's/^[[:space:]]*//' |
# remove third arg
sed 's/[[:space:]]*[^[:space:]]*$//' |
# stable sort on first number
sort -s -n -k1 |
# each time first number changes, print additional newline
awk '{ if(length(last) != 0 && last != $1) printf "\n"; print; last=$1}'
outputs:
1 10
1 12
1 8
2 12
2 15
2 14
3 15
3 20
3 17
4 20
4 18
4 25
5 23
5 20
5 15
Tested on repl.

perl one-liner:
$ perl -lane 'push @{$nums{$F[0]}}, "@F[0,1]" if /^ /;
END { for $n (sort { $a <=> $b } keys %nums) {
print for @{$nums{$n}};
print "" }}' input.txt
1 10
1 12
1 8
2 12
2 15
2 14
3 15
3 20
3 17
4 20
4 18
4 25
5 23
5 20
5 15
Basically, for each line starting with a space, use the first number as a key to a hash table that stores lists of the first two numbers, and print them out sorted by first number.
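The same idea translates almost directly to awk. A minimal sketch, assuming (as the one-liner above does) that the data lines are the ones starting with a space and that the band indices are consecutive integers starting at 1:
awk '/^ / {                                      # data lines start with a space
         groups[$1] = groups[$1] $1 " " $2 "\n"  # append "index value" to its group
     }
     END {
         for (i = 1; i in groups; i++)           # print the groups in index order
             printf "%s\n", groups[i]            # extra newline between groups, like print ""
     }' input.txt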

Related

Regularly spaced numbers between bounds without jot

I want to generate a sequence of integer numbers between 2 included bounds. I tried with seq, but I could only get the following:
$ low=10
$ high=100
$ n=8
$ seq $low $(( (high-low) / (n-1) )) $high
10
22
34
46
58
70
82
94
As you can see, the 100 is not included in the sequence.
I know that I can get something like that using jot:
$ jot 8 10 100
10
23
36
49
61
74
87
100
But the server I use does not have jot installed, and I do not have permission to install it.
Is there a simple method that I could use to reproduce this behaviour without jot?
If you don't mind launching an extra process (bc) and if it's available on that machine, you could also do it like this:
$ seq -f'%.f' 10 $(bc <<<'scale=2; (100 - 10) / 7') 100
10
23
36
49
61
74
87
100
Or, building on oguz ismail's idea (but using a precision of 4 decimal places):
$ declare -i low=10
$ declare -i high=100
$ declare -i n=8
$ declare incr=$(( (${high}0000 - ${low}0000) / (n - 1) ))
$
$ incr=${incr::-4}.${incr: -4}
$
$ seq -f'%.f' "$low" "$incr" "$high"
10
23
36
49
61
74
87
100
You can try this naive implementation of jot:
jot_naive() {
    local -i reps=$1 begin=${2}00 ender=${3}00
    local -i x step='(ender - begin) / (reps - 1)'
    for ((x = begin; x <= ender; x += step)); do
        printf '%.f\n' ${x::-2}.${x: -2}
    done
}
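It takes the same arguments as jot above (count, start, end):
jot_naive 8 10 100    # 8 roughly evenly spaced integers from 10 to 100
Because the step is kept with only two decimal places, values that land exactly on a .5 boundary may round differently than jot does.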
You could use awk for that:
awk -v reps=8 -v begin=10 -v end=100 '
BEGIN{
step = (end - begin) / (reps-1);
for ( f = i = begin; i <= end; i = int(f += step) )
print i
}
'
10
22
35
48
61
74
87
100
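If you want jot-style rounding instead of the truncation above, a small tweak of the same awk (a sketch) is to print the rounded running value rather than the int() of it:
awk -v reps=8 -v begin=10 -v end=100 '
BEGIN{
    step = (end - begin) / (reps - 1)
    for (i = 0; i < reps; i++)
        printf "%.0f\n", begin + i * step    # round to nearest integer
}
'
which prints 10 23 36 49 61 74 87 100, matching the jot output above.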
UPDATE 1: fixed double-printing of the final row when the difference was smaller than a tiny epsilon.
To maintain directional consistency, rounding is performed based on the sign of the final value: e.g. if the final value is negative, any rounding is done as if the current step value (CurrSV) were negative, regardless of the actual sign of CurrSV.
While I haven't tested every possible edge case, I believe this version of the code handles both positive and negative rounding properly, for the most part.
That said, this isn't a jot replacement at all: it only implements a small subset of the step-counting feature rather than being a full-blown clone of it:
{m,g}awk '
function __________(_) {
return -_<+_?_:-_
}
BEGIN {
CONVFMT = "%.250g"; OFMT = "%.13f"
_____ = (_+=_^=_______=______="")^-_^!_
} {
____ = (((_=$(__=(___=$NF)^(_<_)))^(_______=______="")*___\
)-(__=$++__))/--_
_________ = (_____=(-(_^=_<_))^(________=+____<-____\
)*(_/++_))^(++_^_++-+_*_--+-_)
if (-___<=+___) {
_____=__________(_____)
_________=__________(_________)
}
do { print ______,
++_______, int(__+_____), -____+(__+=____)
} while(________? ___<(__-_________) : (__+_________)<___)
print ______, ++_______, int(___+_____), ___, ORS
}' <<< $'8 -3 -100\n8 10 100\n5 -15 -100\n5 15 100\n11 100 11\n10 100 11'
outputs:
1 -3 -3
2 -17 -16.8571428571429
3 -31 -30.7142857142857
4 -45 -44.5714285714286
5 -58 -58.4285714285714
6 -72 -72.2857142857143
7 -86 -86.1428571428572
8 -100 -100
1 10 10
2 23 22.8571428571429
3 36 35.7142857142857
4 49 48.5714285714286
5 61 61.4285714285714
6 74 74.2857142857143
7 87 87.1428571428572
8 100 100
1 -15 -15
2 -36 -36.2500000000000
3 -58 -57.5000000000000
4 -79 -78.7500000000000
5 -100 -100
1 15 15
2 36 36.2500000000000
3 58 57.5000000000000
4 79 78.7500000000000
5 100 100
1 100 100
2 91 91.1000000000000
3 82 82.2000000000000
4 73 73.3000000000000
5 64 64.4000000000000
6 55 55.5000000000000
7 47 46.6000000000000
8 38 37.7000000000000
9 29 28.8000000000000
10 20 19.9000000000000
11 11 11
1 100 100
2 90 90.1111111111111
3 80 80.2222222222222
4 70 70.3333333333333
5 60 60.4444444444445
6 51 50.5555555555556
7 41 40.6666666666667
8 31 30.7777777777778
9 21 20.8888888888889
10 11 11
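For readers who do not want to decode the obfuscation, a plain awk sketch of the same seq-like counting follows (same "reps begin end" input lines on stdin; rounding of exact .5 midpoints and the formatting of the final raw value may differ slightly from the version above):
awk '{
    reps = $1; begin = $2; end = $3
    step = (end - begin) / (reps - 1)         # may be negative
    for (i = 0; i < reps; i++) {
        v = begin + i * step
        printf "%d %.0f %.13f\n", i + 1, v, v # index, rounded value, raw value
    }
    print ""                                  # blank line between blocks
}'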

How to sort data based on the value of a column for part (multiple lines) of a file?

My data in the file file1 look like
3
0
2 0.5
1 0.8
3 0.2
3
1
2 0.1
3 0.8
1 0.4
3
2
1 0.8
2 0.4
3 0.3
Each block has the same number of rows (here it is 3+2 = 5). In each block, the first two lines are a header; the next 3 rows have two columns, and the first column is a label, one of the numbers from 1 to 3. I want to sort the rows in each block based on the value of the first column (excluding the first two rows). So the expected result is
3
0
1 0.8
2 0.5
3 0.2
3
1
1 0.4
2 0.1
3 0.8
3
2
1 0.8
2 0.4
3 0.3
I thought sort -k 1 -n file1 would be good for the whole file, but it gives me the wrong result:
0
1
2
3
3
3
2 0.1
3 0.2
3 0.3
1 0.4
2 0.4
2 0.5
1 0.8
1 0.8
3 0.8
This is not the expected result.
Sorting each block separately is still a problem for me. I think awk can handle this. Please give some suggestions.
Apply the DSU (Decorate/Sort/Undecorate) idiom using any awk+sort+cut, regardless of how many lines are in each block:
$ awk -v OFS='\t' '
NF<pNF || NR==1 { blockNr++ }
{ print blockNr, NF, NR, (NF>1 ? $1 : NR), $0; pNF=NF }
' file |
sort -n -k1,1 -k2,2 -k4,4 -k3,3 |
cut -f5-
3
0
1 0.8
2 0.5
3 0.2
3
1
1 0.4
2 0.1
3 0.8
3
2
1 0.8
2 0.4
3 0.3
To understand what that's doing, just look at the first 2 steps:
$ awk -v OFS='\t' 'NF<pNF || NR==1{ blockNr++ } { print blockNr, NF, NR, (NF>1 ? $1 : NR), $0; pNF=NF }' file
1 1 1 1 3
1 1 2 2 0
1 2 3 2 2 0.5
1 2 4 1 1 0.8
1 2 5 3 3 0.2
2 1 6 6 3
2 1 7 7 1
2 2 8 2 2 0.1
2 2 9 3 3 0.8
2 2 10 1 1 0.4
3 1 11 11 3
3 1 12 12 2
3 2 13 1 1 0.8
3 2 14 2 2 0.4
3 2 15 3 3 0.3
$ awk -v OFS='\t' 'NF<pNF || NR==1{ blockNr++ } { print blockNr, NF, NR, (NF>1 ? $1 : NR), $0; pNF=NF }' file |
sort -n -k1,1 -k2,2 -k4,4 -k3,3
1 1 1 1 3
1 1 2 2 0
1 2 4 1 1 0.8
1 2 3 2 2 0.5
1 2 5 3 3 0.2
2 1 6 6 3
2 1 7 7 1
2 2 10 1 1 0.4
2 2 8 2 2 0.1
2 2 9 3 3 0.8
3 1 11 11 3
3 1 12 12 2
3 2 13 1 1 0.8
3 2 14 2 2 0.4
3 2 15 3 3 0.3
Notice that the awk command is just creating the key values that sort needs to sort on (block number, line number, $1, etc.). So awk Decorates the input, sort Sorts it, and cut Undecorates it by removing the decoration values that the awk script added.
You can use sort and arrays in gawk
awk 'NF==1 && a[1]{               # a new block starts: flush the previous one, sorted
    n=asort(a)
    for(k=1; k<=n; k++) print a[k]
    delete a; i=1
}
NF==1{print}                      # header lines are printed as-is
NF==2{a[i]=$0; ++i}               # data lines are collected for sorting
END{n=asort(a); for(k=1; k<=n; k++) print a[k]}   # flush the last block
' file1
you get
3
0
1 0.8
2 0.5
3 0.2
3
1
1 0.4
2 0.1
3 0.8
3
2
1 0.8
2 0.4
3 0.3
This is similar to Ed Morton's solution, but without variable assignment; it uses only built-in variables instead:
λ cat input.txt
3
0
2 0.5
1 0.8
3 0.2
3
1
2 0.1
3 0.8
1 0.4
3
2
1 0.8
2 0.4
3 0.3
awk '{ print int((NR-1)/5), ((NR-1)%5<2) ? 0 : 1, (NF>1 ? $1 : NR), NR, $0 }' input.txt |
sort -n -k1,1 -k2,2 -k3,3 -k4,4 | cut -d ' ' -f5-
3
0
1 0.8
2 0.5
3 0.2
3
1
1 0.4
2 0.1
3 0.8
3
2
1 0.8
2 0.4
3 0.3
How it works
awk '{ print int((NR-1)/5), ((NR-1)%5<2) ? 0 : 1, (NF>1 ? $1 : NR), NR, $0 }' input.txt
0 0 1 1 3
0 0 2 2 0
0 1 2 3 2 0.5
0 1 1 4 1 0.8
0 1 3 5 3 0.2
1 0 6 6 3
1 0 7 7 1
1 1 2 8 2 0.1
1 1 3 9 3 0.8
1 1 1 10 1 0.4
2 0 11 11 3
2 0 12 12 2
2 1 1 13 1 0.8
2 1 2 14 2 0.4
2 1 3 15 3 0.3
A Ruby approach:
ruby -e '$<.read.split(/\n/).map(&:split).
slice_when { |a, b| b.length == 1 && b.length < a.length }.
map{|e| e.sort_by{|sl| sl.length()>1 ? -sl[-1].to_f : -1.0/0}}.
each{|e| e.each{|x| puts "#{x.join(" ")}"}}' file
Or, a DSU-style Ruby version:
ruby -lane 'BEGIN{lines=[]; block=0; lnf=0}
block+=1 if $F.length()>1 && lnf==1
lnf=$F.length()
lines << [block, -($F.length()>1 ? $F[-1].to_f : (-1.0/0)), $.] + $F
END{lines.sort().each{|sl| puts "#{sl[3..].join(" ")}"}}
' file

Need help to find average, min and max values in shell script from text file (again)

This is an update to a question I posted before. I've gotten a little farther into this but need help with a new problem.
I'm working on a shell script right now. I need to loop through a text file, grab the text from it, and find the average number, max number and min number from each line of numbers then print them in a chart with the name of each line. This is the text file:
Experiment1 9 8 1 2 9 0 2 3 4 5
collect1 83 39 84 2 1 3 0 9
jump1 82 -1 9 26 8 9
exp2 22 0 7 1 0 7 3 2
jump2 88 7 6 5
taker1 5 5 44 2 3
This is my code so far. It should be working but it won't do any of the calculations. The first loop grabs the line of text, the second loop separates the name from the numbers; these two work. The third loop takes the numbers and does the calculations. It keeps giving me an error saying "expr: non integer argument". Why is it doing that?
#!/bin/bash
while read line
do
echo $line | while read first second
do
echo $first
echo $second
sum=0
max=0
min=0
len=0
for arg in $second
do
sum=`expr $sum + $arg`
if [ $min > $arg ]
then
set min=$arg
fi
if [ $max < $arg ]
then
set max=$arg
fi
len=`expr $len + 1`
done
avg=`expr $sum / $len`
echo $avg
echo $min
echo $max
done
done < mystats.txt
This is the desired output when you type "bash statcalc.sh -s name mystats.txt"
Experiment Name Average Max Min
collect1 27 84 0
exp2 5 22 0
Experiment1 3 9 0
jump1 21 82 -1
jump2 31 88 5
taker1 13 44 2
Using awk
awk '{
    if (NR==1) print "Experiment Name Average Max Min"
    min=$2; max=$2
    for (i=2; i<=NF; i++) {
        a[$1]=a[$1]+$i
        if (min<$i) min=$i    # note: min ends up holding the largest value...
        if (max>$i) max=$i    # ...and max the smallest; they are printed in Max Min order below
    }
    print $1, int(a[$1]/(NF-1)), min, max
}'
Demo :
$awk '{if (NR==1)print "Experiment Name Average Max Min"; min=$2;max=$2;for(i=2;i<=NF;i++) {a[$1]=a[$1]+$i; if (min<$i) min=$i; if(max>$i)max=$i} print $1, int(a[$1]/(NF-1)),min,max}' file.txt | column -t
Experiment Name Average Max Min
Experiment1 4 9 0
collect1 27 84 0
jump1 22 82 -1
exp2 5 22 0
jump2 26 88 5
taker1 11 44 2
$cat file.txt
Experiment1 9 8 1 2 9 0 2 3 4 5
collect1 83 39 84 2 1 3 0 9
jump1 82 -1 9 26 8 9
exp2 22 0 7 1 0 7 3 2
jump2 88 7 6 5
taker1 5 5 44 2 3
$
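If you would rather stay in bash, here is a minimal sketch of the per-line loop with the usual fixes applied: [ $min > $arg ] and [ $max < $arg ] are output redirections rather than numeric comparisons (use -lt/-gt), set min=$arg does not assign a shell variable (plain min=$arg does), and min/max are seeded from the data instead of 0. Sorting by name and the -s option are left out; it assumes mystats.txt as in the question.
#!/bin/bash
echo "Experiment Name Average Max Min"
while read -r name nums; do
    [ -z "$name" ] && continue               # skip blank lines
    sum=0 len=0 min= max=
    for n in $nums; do
        sum=$(( sum + n ))
        len=$(( len + 1 ))
        # numeric comparisons need -lt/-gt; > and < inside [ ] redirect files
        if [ -z "$min" ] || [ "$n" -lt "$min" ]; then min=$n; fi
        if [ -z "$max" ] || [ "$n" -gt "$max" ]; then max=$n; fi
    done
    echo "$name $(( sum / len )) $max $min"  # integer average, like the awk answer
done < mystats.txt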

Pythagorean Theorem in bash as a function

I am trying to demonstrate the Pythagorean theorem in bash for my son, which should be easy. I need it in a function. However, the theorem a^2 + b^2 = c^2 is just not working out here. I don't know what I am doing wrong.
#!/bin/bash
read side_a side_b
hypo=$(( (side_a*side_a) + (side_b*side_b) ))
echo "side: $side_a side: $side_b hypotenuse: $hypo"
$ /tmp/hypo
5 5
side: 5 side: 5 hypotenuse: 50
Bash arithmetic is integer-only and has no square root, so it's time to switch to awk:
$ awk '{print "side:",$1,"side:",$2,"hypotenuse:",sqrt($1^2+$2^2)}'
3 4
side: 3 side: 4 hypotenuse: 5
$1 and $2 are the input fields; the rest should read trivially.
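If you want to keep a bash function interface, as the question asks, you can wrap the awk call. A minimal sketch (the function name hypo is just an example):
# bash arithmetic is integer-only, so delegate the square root to awk
hypo() {
    awk -v a="$1" -v b="$2" 'BEGIN { print sqrt(a*a + b*b) }'
}
hypo 3 4    # prints 5
hypo 5 5    # prints 7.07107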
With a little more effort, you can generate the integer solutions (Pythagorean triples) as well...
$ awk 'BEGIN{for(i=1;i<=10;i++) for(j=1;j<i;j++) print 2*i*j, i^2-j^2, i^2+j^2}'
4 3 5
6 8 10
12 5 13
8 15 17
16 12 20
24 7 25
10 24 26
20 21 29
30 16 34
40 9 41
12 35 37
...

Replace repeated elements in a list with unique identifiers

I have a list like the one below:
1 . Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 . Sam 3 4 56 6 89
3 . Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 . Pig 2 5 67 2 21
(except the real list is 40 million lines long).
There are repeated elements in the second column (i.e. the ".").
I want to replace these with unique identifiers (e.g. ".1", ".2", ".3" ... ".n").
I tried to do this with a bash loop / sed combination, but it didn't work...
Failed attempt:
for i in 1..4
do
sed -i "s_//._//."$i"_"$i""
done
(Essentially, I was trying to get sed to replace each nth "." with ".n", but this didn't work.)
Here's a way to do it with awk (assuming your file is called input):
$ awk '$2=="."{$2="."++counter}{print}' input
1 .1 Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 .2 Sam 3 4 56 6 89
3 .3 Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 .4 Pig 2 5 67 2 21
The awk program replaces the second column ($2) by a string formed by concatenating . and a pre-incremented counter (++counter) if the second column was exactly .. It then prints out all the columns it got (with $2 modified or not) ({print}).
Plain bash alternative:
c=1
while read -r a b line ; do
if [ "$b" == "." ] ; then
echo "$a ."$((c++))" $line"
else
echo "$a $b $line"
fi
done < input
Since your question is tagged sed and bash, here are a few examples for completeness.
Bash only
Use parameter expansion. The second column will be unique, but not sequential:
i=1; while read line; do echo ${line/\./.$((i++))}; done < input
1 .1 Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 .3 Sam 3 4 56 6 89
3 .4 Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 .6 Pig 2 5 67 2 21
Bash + sed
sed cannot increment variables; it has to be done externally.
For each line, increment $i if the line contains a ., then let sed append $i after the .:
i=0
while read line; do
[[ $line == *.* ]] && i=$((i+1))
sed "s#\.#.$i#" <<<"$line"
done < input
Output:
1 .1 Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 .2 Sam 3 4 56 6 89
3 .3 Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 .4 Pig 2 5 67 2 21
You can use this command (note that it replaces the . itself with the counter, and the counter advances on every line whether or not it contains a dot):
awk '{gsub(/\./,c++);print}' filename
Output:
1 0 Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 2 Sam 3 4 56 6 89
3 3 Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 5 Pig 2 5 67 2 21
