I have a file which contains the following 10 numbers:
> cat numbers
9
11
32
88
89
90
95
104
118
120
>
I would like to print the preceding number only if it is at least 5 smaller than the current number. So I expect output like this:
11
32
90
95
104
120
I have a script which does this:
> cat test.sh
#!/bin/bash
subtraction="5"
while read -r number; do
if [ -n "$previous_number" ] && (( $((number - subtraction)) >= previous_number )); then
echo "$previous_number"
fi
previous_number="$number"
done < "$1"
> ./test.sh numbers
11
32
90
95
104
>
However, it doesn't print 120. What is the most elegant/proper solution in such cases? Should I simply add tail -1 "$1" after the while loop?
For anyone else reading this for whom while read genuinely does not process the last line of a file, there's likely a different problem: an input file without a trailing newline.
For that, one can amend their code as follows:
while read -r number || [[ $number ]]; do
: "...logic here..."
done
This works because, without a trailing newline, read returns false on the last line, so with the original code the body of the loop is never executed for it, even though $number is still populated.
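A quick way to see the difference is to feed both loop forms the same input with the final newline missing. In this sketch, printf '9\n11\n120' stands in for such a file (made-up sample data), and the counters exist only to show how many times each loop body runs:

```shell
#!/bin/bash
# Sample input whose last line ("120") has no trailing newline (hypothetical data).

# Plain loop: read fails on the unterminated last line, so the body is skipped for it.
count_plain=0
while read -r number; do
  count_plain=$((count_plain + 1))
done < <(printf '9\n11\n120')

# Amended loop: the || [[ $number ]] test lets the body run once more for "120".
count_fixed=0
while read -r number || [[ $number ]]; do
  count_fixed=$((count_fixed + 1))
done < <(printf '9\n11\n120')

echo "plain loop saw $count_plain lines; amended loop saw $count_fixed lines"
```

On this input the plain loop reports 2 lines and the amended loop 3.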
However, for this specific program and its specific input given, there's nothing at all wrong with how the while read idiom handles the last line of an input; the output at hand follows from the program's logic as written and defined.
Consider the following version, which makes what's happening clearer:
#!/bin/bash
subtraction="5"
while read -r number; do
if [[ $previous_number ]] && (( (number - subtraction) >= previous_number )); then
printf '%q is at least %q away from %q\n' "$previous_number" "$subtraction" "$number"
else
printf '%q is not %q away from %q\n' "$previous_number" "$subtraction" "$number"
fi
previous_number="$number"
done <"$1"
Its output is:
'' is not 5 away from 9
9 is not 5 away from 11
11 is at least 5 away from 32
32 is at least 5 away from 88
88 is not 5 away from 89
89 is not 5 away from 90
90 is at least 5 away from 95
95 is at least 5 away from 104
104 is at least 5 away from 118
118 is not 5 away from 120
...as this last line of output shows, it is genuinely considering 120, and deciding not to print it per your program's logic as defined.
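On the tail -1 question: if you do want the final number printed unconditionally, as your expected output suggests, there is no need to read the file again, because previous_number still holds the last value once the loop ends. A sketch, assuming that unconditional last line really is the rule you want (print_gaps is a hypothetical function name):

```shell
#!/bin/bash
# Prints each number that is at least $subtraction smaller than the number after it,
# then (by assumption about the desired rule) always prints the final number.
print_gaps() {
  local subtraction=5 number previous_number=
  # "|| [[ $number ]]" also handles a last line without a trailing newline
  while read -r number || [[ $number ]]; do
    if [[ $previous_number ]] && (( number - subtraction >= previous_number )); then
      echo "$previous_number"
    fi
    previous_number=$number
  done
  # previous_number still holds the last value read, so tail -1 is unnecessary
  [[ $previous_number ]] && echo "$previous_number"
}
```

Usage: print_gaps < numbers. With your file this prints 11, 32, 90, 95, 104 and 120.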
It is easier to use awk for this job:
awk 'NR>1 && $1-p>=5{print p} {p=$1}' file
Output:
11
32
90
95
104
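btw, 120 won't be printed in the output because it is the last number, so it never becomes the preceding number p of anything; and 118 isn't printed either, because 120 - 118 = 2, which is less than 5.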
I have a little script to extract specific data and clean up the output a little. It seems overly messy, and I'm wondering if the script can be trimmed down a bit.
The input file consists of pairs of lines: names, followed by numbers.
Line pairs where the numeric value is not between 80 and 199 should be discarded.
Pairs may sometimes, but will not always, be preceded or followed by blank lines, which should be ignored.
Example input file:
al12t5682-heapmemusage-latest.log
38
al12t5683-heapmemusage-latest.log
88
al12t5684-heapmemusage-latest.log
100
al12t5685-heapmemusage-latest.log
0
al12t5686-heapmemusage-latest.log
91
Example/wanted output:
al12t5683 88
al12t5684 100
al12t5686 91
Current script:
grep --no-group-separator -PxB1 '([89][0-9]|1[0-9][0-9])' inputfile.txt \
| sed 's/-heapmemusage-latest.log//' \
| awk '{$1=$1;printf("%s ",$0)};NR%2==0{print ""}'
Extra input example
al14672-heapmemusage-latest.log
38
al14671-heapmemusage-latest.log
5
g4t5534-heapmemusage-latest.log
100
al1t0000-heapmemusage-latest.log
0
al1t5535-heapmemusage-latest.log
al1t4676-heapmemusage-latest.log
127
al1t4674-heapmemusage-latest.log
53
A1t5540-heapmemusage-latest.log
54
G4t9981-heapmemusage-latest.log
45
al1c4678-heapmemusage-latest.log
81
B4t8830-heapmemusage-latest.log
76
a1t0091-heapmemusage-latest.log
88
al1t4684-heapmemusage-latest.log
91
Extra Example expected output:
g4t5534 100
al1t4676 127
al1c4678 81
a1t0091 88
al1t4684 91
Another awk solution:
$ awk -F- 'NR%2{p=$1; next} 80<=$1 && $1<=199 {print p,$1}' file
al12t5683 88
al12t5684 100
al12t5686 91
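UPDATE
If every pair is separated by at least one empty line, awk's paragraph mode (an empty RS) makes each pair a single record, with the name in $1 and the number in $2: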
$ awk -v RS= '80<=$2 && $2<=199{sub(/-.*/,"",$1); print}' file
al12t5683 88
al12t5684 100
al12t5686 91
Consider implementing this in native bash, as in the following (which can be seen running with your sample input -- including sporadically-present blank lines -- at http://ideone.com/Qtfmrr):
#!/bin/bash
name=; number=
while IFS= read -r line; do
[[ $line ]] || continue # skip blank lines
[[ -z $name ]] && { name=$line; continue; } # first non-blank line becomes name
number=$line # second one becomes number
if (( number >= 80 && number < 200 )); then
name=${name%%-*} # prune everything after first "-"
printf '%s %s\n' "$name" "$number" # emit our output
fi
name=; number= # clear the variables
done <inputfile.txt
The above uses no external commands whatsoever -- so whereas it might be slower to run over large input than a well-implemented awk or perl script, it also has far shorter startup time since no interpreter other than the already-running shell is required.
See:
BashFAQ #1 - How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?, describing the while read idiom.
BashFAQ #100 - How do I do string manipulations in bash?; or The Bash-Hackers' Wiki on parameter expansion, describing how name=${name%%-*} works.
The Bash-Hackers' Wiki on arithmetic expressions, describing the (( ... )) syntax used for numeric comparisons.
With Perl:
perl -nle's/-.*//; chomp($n=<>); print "$_ $n" if 80<=$n && $n<=199' inputfile.txt
With GNU sed:
sed -E '
N
/\n[8-9][0-9]$/bA
/\n1[0-9]{2}$/!d
:A
s/([^-]*).*\n([0-9]+$)/\1 \2/
' infile
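Most of the answers above assume strict name/number pairs. In the extra example, al1t5535-heapmemusage-latest.log has no number after it, which throws that pairing off. A sketch that instead pairs each all-digit line with the most recent name line, assuming an orphaned name should simply be dropped (filter_heap is a hypothetical name):

```shell
#!/bin/bash
# Reads name/number lines on stdin and prints "shortname number" for numbers in 80..199.
# A name line with no number after it is replaced by the next name line.
filter_heap() {
  local line name=
  while read -r line || [[ $line ]]; do
    [[ $line ]] || continue                  # skip blank lines
    if [[ $line =~ ^[0-9]+$ ]]; then
      # 10#$line forces base 10, so a value such as 08 is not parsed as octal
      if [[ $name ]] && (( 10#$line >= 80 && 10#$line <= 199 )); then
        printf '%s %s\n' "${name%%-*}" "$line"   # keep text before the first "-"
      fi
      name=                                  # each name is used at most once
    else
      name=$line                             # newest name wins over an orphaned one
    fi
  done
}
```

Usage: filter_heap < inputfile.txt. With the extra input this should print the five lines of the extra expected output.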
I'm doing this for fun and as part of my learning process in shell scripting.
Let's say I have the initial input A B C.
What I'm trying to do is split the string and convert each character to its decimal value.
A B C = 65 66 67
Then I'll add an arbitrary number to each decimal value, let's say 1.
The decimal values then become 66 67 68.
Finally, I'll convert the decimal values back to characters, which gives B C D.
ubuntu@Ubuntu:~$ cat testscript.sh -n
#!/bin/bash
1 string="ABC"
2
3 echo -e "\nSTRING = $string"
4 echo LENGTH = ${#string}
5
6 # TUKAR STRING KE ARRAY ... word[x]
7 for i in $(seq 0 ${#string})
8 do word[$i]=${string:$i:1}
9 done
10
11 echo -e "\nZero element of array is [ ${word[0]} ]"
12 echo -e "Entire array is [ ${word[@]}] \n"
13
14 # CHAR to DECIMAL
15 for i in $(seq 0 ${#string})
16 do
17 echo -n ${word[$i]}
18 echo -n ${word[$i]} | od -An -tuC
19 chardec[$i]=$(echo -n ${word[$i]} | od -An -tuC)
20 done
21
22 echo -e "\nNEXT, DECIMAL VALUE PLUS ONE"
23 for i in $(seq 0 ${#string})
24 do
25 echo `expr ${chardec[$i]} + 1`
26 done
27
28 echo
This is the output:
ubuntu@Ubuntu:~$ ./testscript.sh
STRING = ABC
LENGTH = 3
Zero element of array is [ A ]
Entire array is [ A B C ]
A 65
B 66
C 67
NEXT, DECIMAL VALUE PLUS ONE
66
67
68
1
As you can see in the output, there are two problems (maybe more).
First, the last for loop processes an additional number. Any idea how to fix this?
NEXT, DECIMAL VALUE PLUS ONE
66
67
68
1
Second, this is the formula to convert a decimal value back to a character. I'm trying to put the incremented values into another array and then process them in another loop for this purpose. However, I still have no idea how to do this in a loop based on the previous data.
ubuntu@Ubuntu:~$ printf "\x$(printf %x 65)\n"
A
Please advise
Using bash you can replace all of your code with this code:
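for i; do
  dec=$(printf '%d' "'$i")                  # decimal code of the argument's first character
  printf "\x$(printf '%x' $((dec + 1))) "   # convert code+1 back to a character, then a space
done
echo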
When you run it as:
./testscript.sh P Q R S
It will print:
Q R S T
awk to the rescue!
It's simpler to do the same thing in awk:
$ echo "A B C" |
awk 'BEGIN{for(i=33;i<127;i++) o[sprintf("%c",i)]=i}
{for(i=1;i<=NF;i++) printf "%c%s", o[$i]+1, ((i==NF)?ORS:OFS)}'
B C D
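seq counts from FIRST to LAST inclusive, so if your string length is 3, then seq 0 3 gives you 0, 1, 2, 3. Your second-to-last loop (lines 15-20) is therefore running four iterations, but the last iteration prints nothing, because ${word[3]} is empty. That leaves chardec[3] empty too, so the last loop ends up running expr + 1, which prints the stray 1. Use seq 0 $((${#string} - 1)) in all three loops to iterate only over valid indices.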
To print the character for the incremented ASCII code, insert the conversion inline, like:
printf "\x$(printf %x `expr ${chardec[$i]} + 1`) "
or more readably:
dec=`expr ${chardec[$i]} + 1`
printf "\x$(printf %x $dec)\n"
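Putting the fixes together, here is a sketch that keeps the original script's approach (an array of decimal codes, then conversion back to characters) but with loop bounds that stop at ${#string} - 1. The function name shift_string and its optional offset argument are illustrative, not from the question:

```shell
#!/bin/bash
# Shifts every character of $1 by $2 (default 1) and prints the results space-separated.
shift_string() {
  local string=$1 offset=${2:-1} i code hex ch
  local -a chardec=() chars=()
  # Valid indices are 0 .. ${#string}-1, so no empty extra element is created
  for (( i = 0; i < ${#string}; i++ )); do
    printf -v code '%d' "'${string:i:1}"   # decimal code of character i
    chardec[i]=$(( code + offset ))
  done
  # Convert each shifted code back into a character via a \xHH escape
  for code in "${chardec[@]}"; do
    printf -v hex '%x' "$code"
    printf -v ch "\\x$hex"
    chars+=("$ch")
  done
  echo "${chars[*]}"                      # joined with spaces (first char of default IFS)
}
```

Usage: shift_string ABC prints B C D, and shift_string PQRS 1 prints Q R S T.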
How can I clean up a list of points stored in a variable, dropping a point if it is
the same point as one already kept, or
a nearby point (±5)?
Example, where each line is one point with two coordinates:
points="808,112\n807,113\n809,113\n155,183\n832,572\n196,652"
echo -e "$points"
#808,112
#807,113
#809,113
#155,183
#832,572
#196,652
I would like to ignore points within a range of ±5 counts. The result should be:
echo "$points_clean"
#808,112
#155,183
#832,572
#196,652
I thought about looping through the list, but I need help with how to check whether a point's coordinates already exist in the new list:
points_clean=$(for point in $points; do
x=$(echo "$point" | cut -d, -f1)
y=$(echo "$point" | cut -d, -f2)
# check if same or similar point coordinates already in $points_clean
echo "$x,$y"
done)
This seems to work with Bash 4.x (support for process substitution is needed):
#!/bin/bash
close=100
points="808,112\n807,113\n809,113\n155,183\n832,572"
echo -e "$points"
clean=()
distance()
{
echo $(( ($1 - $3) * ($1 - $3) + ($2 - $4) * ($2 - $4) ))
}
while read x1 y1
do
ok=1
for point in "${clean[@]}"
do
echo "compare $x1 $y1 with $point"
set -- $point
if [[ $(distance $x1 $y1 $1 $2) -le $close ]]
then
ok=0
break
fi
done
if [ $ok = 1 ]
then clean+=("$x1 $y1")
fi
done < <( echo -e "$points" | tr ',' ' ' | sort -u )
echo "Clean:"
printf "%s\n" "${clean[@]}" | tr ' ' ','
The sort is optional and may slow things down. Identical points will be too close together, so the second instance of a given coordinate will be eliminated even if the first wasn't.
Sample output:
808,112
807,113
809,113
155,183
832,572
compare 807 113 with 155 183
compare 808 112 with 155 183
compare 808 112 with 807 113
compare 809 113 with 155 183
compare 809 113 with 807 113
compare 832 572 with 155 183
compare 832 572 with 807 113
Clean:
155,183
807,113
832,572
The workaround for Bash 3.x (as found on Mac OS X 10.10.4, for example) is a tad painful; you need to send the output of the echo | tr | sort command to a file, then redirect the input of the pair of loops from that file (and clean up afterwards). Or you can put the pair of loops and the code that follows (the echo of the clean array) inside the scope of { …; } command grouping.
In response to the question 'what defines close?', wittich commented:
Let's say ±5 counts, e.g. 808(±5), 112(±5). That's why the second and third points would be "cleaned".
OK. One way of looking at that would be to adjust the close value to 50 in my script (allowing a difference of 5² + 5²), but that also rejects points up to just over 7 apart (√50 ≈ 7.07). You could revise the distance function to do ±5; it takes a bit more work and maybe an auxiliary abs function, or you could return the square of the larger delta and compare that with 25 (5², of course). You can play with what the criterion should be to your heart's content.
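Following the ±5 suggestion, here is a sketch that uses an auxiliary abs function and treats two points as close only when both coordinates differ by at most 5, which is one reading of wittich's comment. The names tolerance, abs and too_close are made up for this example, and the points string includes 196,652 from the question's expected output. Unlike the script above it keeps points in input order (no sort), so the first point of each cluster survives:

```shell
#!/bin/bash
# Keep a point only if no already-kept point is within ±tolerance on BOTH axes.
tolerance=5

# Absolute value of an integer (Bash arithmetic is integer-only)
abs() { echo $(( $1 < 0 ? -($1) : $1 )); }

# Succeeds when |x1 - x2| <= tolerance and |y1 - y2| <= tolerance
too_close() {
  local dx dy
  dx=$(abs $(( $1 - $3 )))
  dy=$(abs $(( $2 - $4 )))
  (( dx <= tolerance && dy <= tolerance ))
}

points="808,112\n807,113\n809,113\n155,183\n832,572\n196,652"
clean=()
# printf %b expands the \n escapes so each point arrives on its own line
while IFS=, read -r x y; do
  keep=1
  for kept in "${clean[@]}"; do
    IFS=, read -r kx ky <<< "$kept"
    if too_close "$x" "$y" "$kx" "$ky"; then
      keep=0
      break
    fi
  done
  (( keep )) && clean+=("$x,$y")
done < <(printf '%b\n' "$points")

printf '%s\n' "${clean[@]}"
```

With these sample points it prints 808,112, 155,183, 832,572 and 196,652, which is the result asked for in the question.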
Note that Bash shell arithmetic is integer arithmetic (only); you need Korn shell (ksh) or Z shell (zsh) to get real arithmetic in the shell, or you need to use bc or some other calculator.
The file I am working on looks like this:
header
//
[25]:0.00843832,469:0.0109533):0.00657864,((((872:0.00120503,((980:0.0001);
[29]:((962:0.000580339,930:0.000580339):0.00543993);
absolute:
gthcont: 5 4 2 1 3 4 543 5 67 657 78 67 8 5645 6
01010010101010101010101010101011111100011
1111010010010101010101010111101000100000
00000000000000011001100101010010101011111
I need it to be split into four files. The first file should be:
[25]:0.00843832,469:0.0109533):0.00657864,((((872:0.00120503,((980:0.0001);
[29]:((962:0.000580339,930:0.000580339):0.00543993);
The second file should be:
5 4 2 1 3 4 543 5 67 657 78 67 8 5645 6
The third file should be:
01010010101010101010101010101011111100011
1111010010010101010101010111101000100000
00000000000000011001100101010010101011111
So the header and the // line have to be excluded from the first file, the absolute: line should be removed, and the gthcont: prefix should not appear either.
Ideally, the script would just take the name of the input file and name the outputs first_input, second_input, third_input, and so on.
The fourth file should contain the numbers from within the brackets in the first file; in this case it would only be:
25
29
So my current attempt is:
awk.awk
BEGIN{body=0}
!body && /^\/\/$/ {body=1}
body && /^\[/ {print > "first_"FILENAME}
body && /^pos/{$1="";print > "second_"FILENAME}
body && /^[01]+/ {print > "third_"FILENAME}
body && /^\[[0-9]+\]/ {
print > "first_"FILENAME
print substr($0, 2, index($0,"]")-2) > "fourth_"FILENAME
}
But it somehow duplicates the lines in the first file, so it contains [25], [25], [29], [29].
Some very minor changes to your script produce the desired output:
!body && /^\/\/$/ {body=1}
body && sub(/^gthcont: */,"") {print > ("second_" FILENAME)}
body && /^[01]+/ {print > ("third_" FILENAME)}
body && /^\[[0-9]+\]/ {
print > ("first_" FILENAME)
print substr($0, 2, index($0,"]")-2) > ("fourth_" FILENAME)
}
The duplication problem was caused by the fact that you printed to the first file in two places.
I have used sub to remove the first part of the gthcont: line (and changed the pattern too). sub returns true if it makes any replacements, so you can use it as a test as well. The advantage of using a substitution rather than unsetting the first field is that you can also get rid of the leading white space from the line.
As pointed out in the comments, there is no need to initialise body, so I removed the BEGIN block too.
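To see sub() used as a pattern in isolation: this small sketch (the sample lines are made up) passes three lines through awk, and only the line that had the gthcont: prefix is printed, with the prefix already stripped:

```shell
#!/bin/bash
# sub() returns the number of substitutions made (0 or 1 here), so it can act
# as a pattern: only lines starting with "gthcont:" match, and they are printed
# after the prefix and the spaces following it have been removed.
sample=$'absolute:\ngthcont: 5 4 2 1 3\n0101\n'
stripped=$(printf '%s' "$sample" | awk 'sub(/^gthcont: */, "") { print }')
echo "$stripped"
```

It prints 5 4 2 1 3.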
I would just use a shell function for this:
function split3 {
if [[ $# -ne 1 ]]; then echo 'split3: error: require 1 argument.' >&2; return 1; fi;
while read -r; do
line=$REPLY;
if [[ "$line" =~ ^\[([0-9]+)\]: ]]; then
echo "$line" >&3;
echo "${BASH_REMATCH[1]}" >&6;
elif [[ "$line" =~ ^gthcont: ]]; then
echo "${line#gthcont: }" >&4;
elif [[ "$line" =~ ^[[:space:]]*[01]+[[:space:]]*$ ]]; then
echo "$line" >&5;
fi;
done <"$1" 3>"first_$1" 4>"second_$1" 5>"third_$1" 6>"fourth_$1";
};
split3 input; echo $?;
## 0
cat first_input;
## [25]:0.00843832,469:0.0109533):0.00657864,((((872:0.00120503,((980:0.0001);
## [29]:((962:0.000580339,930:0.000580339):0.00543993);
cat second_input;
## 5 4 2 1 3 4 543 5 67 657 78 67 8 5645 6
cat third_input;
## 01010010101010101010101010101011111100011
## 1111010010010101010101010111101000100000
## 00000000000000011001100101010010101011111
cat fourth_input;
## 25
## 29
I am trying to create a script to break a file down into 24 files. The file "infoband.dat" contains the data of 24 bands that I want to plot, but rather than writing each band separately, it first lists the 1st point of every band, then the 2nd point of every band, and so on.
My script is supposed to read each line of the file while counting to 24 over and over until the end of the file. On the first iteration of the for loop, it creates a file with the first line of each 24-line chunk, and it does this successfully. But the second iteration never even starts. What is breaking the loop?
#!/bin/sh
grep frequency band.yaml > infoband.dat
contadora=0
for i in {1..24} #loop to create the band file, 24 is the no. of bands
do
    contadora=$((contadora+1))
    contadorb=0
    contadorc=0
    while read line
    do
        contadorb=$((contadorb+1))
        if [ $contadorb -eq 25 ]
        then
            contadorb=1
            contadorc=$((contadorc+1))
        fi
        if [ $contadora -eq $contadorb ]
        then
            echo $contadora $contadorb $contadorc "$line" >> band_$contadora.dat
        fi
    done < infoband.dat
    echo "file of the band " $contadora "is finished"
done
Update: I got the code working using a different approach (the variable contadorc is useless, by the way):
#!/bin/sh
grep frequency band.yaml > infoband.dat
nband=24
contadorb=0
contadorc=0
while read line
do
    contadorb=$((contadorb+1))
    if [ $contadorb -eq $((nband+1)) ]
    then
        contadorb=1
        contadorc=$((contadorc+1))
    fi
    echo " "$contadorb" "$line" punto_q $contadorc">> test_infoband.dat
done < infoband.dat
for i in `seq 1 $nband`
do
    echo $i $nband
    grep " $i " test_infoband.dat > banda_$i.dat
done
/bin/sh doesn't do brace expansion on many systems (for example dash, which is /bin/sh on Debian and Ubuntu), so your loop has only one iteration, in which i is set to the literal string {1..24}. Either change the shebang line to #!/bin/bash and/or run the script with bash, or use
for i in $(seq 1 24)
(assuming your system has the seq command, otherwise you may need to just hard-code the list, or use a while loop to explicitly increment and test the value of i).
Did you try using the split command?
split -l 24 infoband.dat
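Note that split -l 24 cuts the file into consecutive 24-line chunks, so with the layout described in the question each chunk would hold one point of every band rather than one whole band. To get one file per band in a single pass, you can send line k to band ((k - 1) mod 24) + 1 instead. A sketch with awk, assuming the layout is exactly as described (deinterleave is a hypothetical helper name):

```shell
#!/bin/bash
# De-interleave $1 into band_1.dat .. band_N.dat, where N is $2 (default 24).
# Assumes line 1 is point 1 of band 1, line 2 is point 1 of band 2, ...,
# line N+1 is point 2 of band 1, and so on.
deinterleave() {
  # Line NR goes to band ((NR - 1) % nband) + 1; awk truncates each file on first open
  awk -v nband="${2:-24}" '{ print > ("band_" ((NR - 1) % nband + 1) ".dat") }' "$1"
}
```

For example, deinterleave infoband.dat 24 should produce band_1.dat through band_24.dat, each holding one band's points in order. Unlike the original script, this reads infoband.dat only once instead of 24 times.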