Subtracting numbers in a column - bash

Inside a file I have a column with numbers with 10 elements. I want to subtract the 1st from the 3rd number, the 2nd from the 4th, the 3rd from the 5th, the 4th from the 6th, and so on till the 8th from the 10th.
For example:
10.3456
6.3452
11.2456
5.6666
10.5678
6.4568
14.7777
7.5434
16.5467
8.9999
and get a file with the subtraction
3rd-1st
4th-2nd
5th-3rd
6th-4th
7th-5th
8th-6th
9th-7th
10th-8th

quick and dirty:
$ awk '{a[NR]=0+$0}END{for(i=3;i<=NR;i++)print a[i]-a[i-2]}' file
0.9
-0.6786
-0.6778
0.7902
4.2099
1.0866
1.769
1.4565
Update: came up with another funny way:
$ awk 'NF>1{print $1-$2}' <(paste <(sed -n '3,$p' file) file)
0.9
-0.6786
-0.6778
0.7902
4.2099
1.0866
1.769
1.4565
update2, make the result CSV:
kent$ awk '{a[NR]=0+$0}END{for(i=3;i<=NR;i++)
printf "%s%s", a[i]-a[i-2],NR==i?RS:","}' file
0.9,-0.6786,-0.6778,0.7902,4.2099,1.0866,1.769,1.4565

#!/bin/bash
# Read the input file into an array, one line per element
mapfile -t lines < inputFile
output=()
for index in "${!lines[@]}"; do
    # Check that the element two positions ahead exists
    if [[ -n ${lines[index + 2]} ]]; then
        # It does: subtract the current value from it
        # (bc is used because bash and expr cannot do floating-point math)
        output+=("$(bc <<< "${lines[index + 2]} - ${lines[index]}")")
    fi
done
printf "%s\n" "${output[@]}" > output

perly dog
perl -ne '$a{$.}=$_;print $_-$a{$.-2}."\n" if $a{$.-2}' file
Makes an array
If a key of two lines before exists then print that line minus the value from array.
0.9
-0.6786
-0.6778
0.7902
4.2099
1.0866
1.769
1.4565
To print the results in one row, as asked for in Kent's answer:
perl -ne '$a{$.}=$_;print $_-$a{$.-2}.(eof()?"\n":",") if $a{$.-2}' file
0.9,-0.6786,-0.6778,0.7902,4.2099,1.0866,1.769,1.4565

With awk, I'd write
awk -v ORS="" '
{a=b; b=c; c=$0} # remember the last 3 lines
NR >= 3 {print sep c-a; sep=","} # print the difference
END {print "\n"} # optional, add a trailing newline.
' file
Or let paste do the gruntwork
awk '{a=b;b=c;c=$0} NR >= 3 {print c-a}' file | paste -sd,
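The rolling-variable idea above generalises to any offset. A minimal sketch, assuming a hypothetical helper called lagdiff (the name and the k variable are illustrative, not from the answers): a small ring buffer indexed by NR % k holds the last k values, so the whole file is never kept in memory.

```shell
# lagdiff K [FILE...]: print line(N) - line(N-K) for every N > K.
# The function name and the awk variable k are illustrative only.
lagdiff() {
    lag=$1
    shift
    awk -v k="$lag" '
        NR > k { print $0 - buf[NR % k] }   # this slot still holds line NR-k
        { buf[NR % k] = $0 }                # now overwrite it with the current line
    ' "$@"
}

# For the question as asked (3rd-1st, 4th-2nd, ...):
# lagdiff 2 file
```

With no file argument the function reads standard input, so it can also sit at the end of a pipeline.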

Extract String before bracket and create new line

I have data in below format
ABC-ERW 12344 ZYX 12345
FFANKN 2345 QW [123457, 89053]
FAFDJ-ER 1234 MNO [6532, 789, 234578]
I want to create the data in below format using sed or awk.
ABC-ERW 12344 ZYX 12345
FFANKN 2345 QW 123457
FFANKN 2345 QW 89053
FAFDJ-ER 1234 MNO 6532
FAFDJ-ER 1234 MNO 789
FAFDJ-ER 1234 MNO 234578
I can extract the data before bracket but I don't know how to concatenate the same with data from bracket repeatedly.
My Effort :--
#!/bin/bash
while IFS= read -r line
do
echo "$line"
cnt=`echo $line | grep -o "\[" | wc -l`
if [ $cnt -gt 0 ]
then
startstr=`echo $line | awk -F[ '{print $1}'`
echo $startstr
intrstr=`echo $line | cut -d "[" -f2 | cut -d "]" -f1`
echo $intrstr
else
echo "$line" >> newfile.txt
fi
done < 1.txt
I am able to get the first part, and to keep the rows that have no "[" in the new file, but I don't know how to get the values inside "[" and append each of them at the end, as the number of values inside "[" keeps changing randomly.
Regards
With your shown samples, please try the following awk code.
awk '
match($0,/\[[^]]*\]$/){
num=split(substr($0,RSTART+1,RLENGTH-2),arr,", ")
for(i=1;i<=num;i++){
print substr($0,1,RSTART-1) arr[i]
}
next
}
1
' Input_file
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
match($0,/\[[^]]*\]$/){ ##Using match function to match from [ till ] at the end of line.
num=split(substr($0,RSTART+1,RLENGTH-2),arr,", ") ##Splitting matched values by regex above and passing into array named arr with delimiters comma and space.
for(i=1;i<=num;i++){ ##Running for loop till value of num.
print substr($0,1,RSTART-1) arr[i] ##printing sub string before matched along with element of arr with index of i.
}
next ##next will skip all further statements from here.
}
1 ##1 will print current line.
' Input_file ##Mentioning Input_file name here.
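Since the question started from a bash loop, the same match-then-split idea can also be sketched in pure bash. This is only an illustration under stated assumptions: the function name expand_brackets is made up, the list is assumed to sit at the very end of the line, and items are assumed to contain no spaces or commas themselves.

```shell
# expand_brackets: read lines on stdin; a line ending in "[a, b, c]" becomes
# one output line per list item, other lines pass through unchanged.
# The function name is illustrative, not from the answers.
expand_brackets() {
    local line prefix item items
    local re='^(.*)\[([^]]*)\]$'    # prefix, then a bracketed list at end of line
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            prefix=${BASH_REMATCH[1]}
            # split the list on commas and spaces
            IFS=', ' read -r -a items <<< "${BASH_REMATCH[2]}"
            for item in "${items[@]}"; do
                printf '%s%s\n' "$prefix" "$item"
            done
        else
            printf '%s\n' "$line"
        fi
    done
}

# Usage: expand_brackets < 1.txt > newfile.txt
```

Keeping the regular expression in a variable avoids quoting problems with the brackets inside [[ ... =~ ... ]].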
Suggesting simple awk script:
awk 'NR==1{print}{for (i=2;i<NF;i++)print $1, $i}' FS="( \\\[)|(, )|(\\\]$)" input.1.txt
Explanation:
FS="( \\\[)|(, )|(\\\]$)" Set the awk field separator to be either " [", ", " or "]" at end of line.
This makes the interesting fields, from $2 onward, available to be appended to $1 (the last field, after the closing ], is empty and is skipped).
NR==1{print} prints the first line only, as it is.
{for (i=2;i<NF;i++)print $1, $i} from the 2nd line on, prints field $1 followed by the current field.
This might work for you (GNU sed):
sed -E '/(.*)\[([^,]*), /{s//\1\2\n\1[/;P;D};s/[][]//g' file
Match the string up to the opening square bracket, and also the string after it up to the first comma and space.
Replace the entire match by the leading and trailing matching strings, followed by a newline and the leading matching string.
Print/delete the first line and repeat.
The last item of any such list will fail to match because there is no trailing comma and space, in which case the opening and closing square brackets are simply removed.
Alternative:
sed -E ':a;s/([^\n]*)\[([^,]*), /\1\2\n\1[/;ta;s/[][]//g' file

How to increment the last number in a string; bash

I have a string that looks something like
/foo/bar/baz59_ 5stuff.thing
I would like to increase the last number (5 in the example) by one, if it's greater than another variable. A 9 would be increased to 10. Note that the last number could be multiple digits also; also that "stuff.thing" could be anything; other than a number; so it can't be hard coded.
The above example would result in /foo/bar/baz59_ 6stuff.thing
I've found multiple questions (and answers) that would extract the last number from the string, and obviously that could then be used in a comparison.
The issue I'm having is how to ensure that when I do the replace, I only replace the last number (since obviously I can't just replace "5" for "6"). Can anyone make any suggestions?
awk/sed/bash/grep are all viable.
Updated Answer
Thanks to @EdMorton for pointing out the further requirement of the number exceeding a threshold. That can be done like this:
perl -spe 's/(\d+)(?!.*\d+)/$1>$thresh? $1+1 : $1/e' <<< "abc123_456.txt" -- -thresh=500
Original Answer
You can evaluate/calculate a replacement with /e in Perl regexes. Here I just add 1 to the captured string of digits but you can do more complicated stuff:
perl -pe 's/(\d+)(?!.*\d+)/$1+1/e' <<< "abc123_456.txt"
abc123_457.txt
The (?!.*\d+) is (hopefully) a negative look-ahead for any more digits.
The $1 represents any sequence of digits captured in the capture group (\d+).
Note that this would need modification to handle decimal numbers, negative numbers and scientific notation - but that is possible.
Using bash regular expression matching:
$ f="/foo/bar/baz59_ 99stuff.thing"
$ [[ $f =~ ([0-9]+)([^0-9]+)$ ]]
OK, what do we have now?
$ declare -p BASH_REMATCH
declare -ar BASH_REMATCH=([0]="99stuff.thing" [1]="99" [2]="stuff.thing")
So we can construct the new filename
if [[ $f =~ ([0-9]+)([^0-9]+)$ ]]; then
prefix=${f%${BASH_REMATCH[0]}} # remove "99stuff.thing" from $f
number=$(( 10#${BASH_REMATCH[1]} + 1 )) # use "10#" to force base10
new=${prefix}${number}${BASH_REMATCH[2]}
echo $new
fi
# => /foo/bar/baz59_ 100stuff.thing
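The bash version above increments unconditionally. A minimal sketch of how the threshold from the question could be folded in, assuming "greater than the threshold" is the intended test; the function name bump_last_number and its argument order are made up for illustration.

```shell
# bump_last_number STRING THRESHOLD: add 1 to the last run of digits in
# STRING if that number is greater than THRESHOLD; print the result.
# The function name and argument order are illustrative only.
bump_last_number() {
    local s=$1 threshold=$2 num prefix
    local re='([0-9]+)([^0-9]*)$'   # last number, then any non-digit tail
    if [[ $s =~ $re ]]; then
        num=$(( 10#${BASH_REMATCH[1]} ))        # 10# forces base 10 (e.g. 08)
        if (( num > threshold )); then
            prefix=${s%"${BASH_REMATCH[0]}"}    # quoted: strip it literally
            s=${prefix}$(( num + 1 ))${BASH_REMATCH[2]}
        fi
    fi
    printf '%s\n' "$s"
}

# bump_last_number "/foo/bar/baz59_ 5stuff.thing" 3
```

Two small differences from the snippet above: the tail may be empty, so a string that ends in a number is handled too, and the removed suffix is quoted so characters such as * or ? in the file name are not treated as glob patterns. Leading zeros are not preserved.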
With GNU awk for the 3rd arg to match():
$ awk -v t=3 'match($0,/(.*)([0-9]+)([^0-9]*)$/,a) && a[2]>t{a[2]++; $0=a[1] a[2] a[3]} 1' file
/foo/bar/baz59_ 6stuff.thing
Just set t to whatever your threshold value is for incrementing, e.g.:
$ awk -v t=7 'match($0,/(.*)([0-9]+)([^0-9]*)$/,a) && a[2]>t{a[2]++; $0=a[1] a[2] a[3]} 1' file
/foo/bar/baz59_ 5stuff.thing
if it's greater than a script argument.
If I understand correctly (I am assuming you pass an argument to a script and, if the number in the string's 2nd field is less than that value, you want to add 1 to that number), could you please try the following.
cat script.ksh
value=$1
echo "/foo/bar/baz59_ 5stuff.thing" |
awk -v arg="$value" '
match($2,/[0-9]+/){
val=substr($2,RSTART,RLENGTH)
val=val<arg?val+1:val
$2=val substr($2,RSTART+RLENGTH)
}
1'
Here is an example; when I run script.ksh it gives the following output.
./script.ksh 7
/foo/bar/baz59_ 6stuff.thing
Here is a shorter gnu awk approach:
cat incr.awk
{
n = split($0, a, /[0-9]+/, b)
for(i=1; i<n; i++)
s = s a[i] b[i] + (b[i] < max && i == n-1 ? 1 : 0)
print s a[i]
}
Then use it as:
awk -v max=80 -f incr.awk <<< '/foo/bar/baz59_ 5stuff.thing'
/foo/bar/baz59_ 6stuff.thing
awk -v max=80 -f incr.awk <<< '/foo/bar/baz59_ 79stuff.thing'
/foo/bar/baz59_ 80stuff.thing
awk -v max=80 -f incr.awk <<< '/foo/bar/baz59_ 90stuff.thing'
/foo/bar/baz59_ 90stuff.thing
awk -v max=80 -f incr.awk <<< '/foo/bar/baz59_ 80stuff.thing'
/foo/bar/baz59_ 80stuff.thing
awk -v max=80 -f incr.awk <<< '/foo/bar/stuff.thing'
/foo/bar/stuff.thing
An awk:
$ echo /foo/bar/baz59_ 99stuff.thing |
awk '
/[0-9]/ {
rstart=1 # keep track of the start
while(match(substr($0,rstart),/[0-9]+/)) { # while numbers last
rstart+=RSTART+RLENGTH-1 # increase rstart
rlength=RLENGTH # remember length too
}
v=substr($0,rstart-rlength,rlength)+1 # increase last number
print substr($0,1,rstart-rlength-1) v substr($0,rstart) # print in parts
next
}1' # in case there was no number
/foo/bar/baz59_ 100stuff.thing
Edit:
Whoops, I missed the argument requirement (increase the last number by one if it's greater than a script argument):
$ echo /foo/bar/baz59_ 99stuff.thing |
awk -v arg=100 '
/[0-9]/ {
rstart=1
while(match(substr($0,rstart),/[0-9]+/)) {
rstart+=RSTART+RLENGTH-1
rlength=RLENGTH
}
v=substr($0,rstart-rlength,rlength)
if(0+v>arg) { # test if v is greater than the argument
print substr($0,1,rstart-rlength-1) v+1 substr($0,rstart)
next
}
}1'
Output now:
/foo/bar/baz59_ 99stuff.thing
if the 'testing' number is in 'bound' variable:
perl -pe 'BEGIN{$bound=6} s{(\d+)_(\d+)(?!.*\d+)}{ $i=$2+1;($i>$bound? $1+1:$1)."_".$i}e' <<<"/foo/bar/baz59_5stuff.thing"
/foo/bar/baz59_6stuff.thing

Including empty lines using pattern

My problem is the following: I have a text file that contains no empty lines, and I would like to insert empty lines according to a pattern file, where 1 means print the next line of text and 0 means insert an empty line. My text file is:
apple
banana
orange
milk
bread
The pattern file is:
1
1
0
1
0
1
1
The desired output, correspondingly (with an empty line after banana and another after orange):
apple
banana
orange
milk
bread
What I tried is:
for i in $(cat pattern file);
do
awk -v var=$i '{if var==1 {print $0} else {printf "\n" }}' file;
done.
But it prints all the lines first, and only after that it changes $i
Thanks for any prompts.
Read the pattern file into an array, then use that array when processing the text file.
awk 'NR==FNR { newlines[NR] = $0; next}
{ print $0 (newlines[FNR] ? "" : "\n") }' patternfile textfile
This version allows multiple 0s between 1s.
Self-documented code:
awk '# for file 1 only
NR==FNR {
# load an array with 0 and 1 (inverted, because a non-existent element defaults to 0)
n[NR]=!$1
# cycle to next line (do not go further in the script for this line)
next
}
# at each line (of file 2, because of the next in the previous block)
{
# loop while the next element of the array (next because of a++) equals 1
for(a++;n[a]==1;a++){
# print an empty line
printf( "\n")
}
# print the original line
print
}' pattern YourFile
The values need to be inverted to avoid printing endless empty lines at the last line when the pattern file has fewer entries than the data file has lines.
Multiple 0s need a loop plus a test.
A mismatch between the number of entries in the pattern file and the number of lines in the data file is a problem when using a direct array (unless the array keeps how many newlines to insert, which is another way of doing it).
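The last remark hints at storing how many empty lines to insert rather than the raw flags. A minimal sketch of that variant, assuming the pattern file holds one 0 or 1 per line; the function name insert_blank_lines and the array name blanks are made up for illustration.

```shell
# insert_blank_lines PATTERNFILE TEXTFILE
# blanks[k] = number of 0 entries seen just before the k-th 1 in the pattern.
# Names are illustrative only; an empty pattern file is not handled.
insert_blank_lines() {
    awk '
        NR == FNR {                       # first file: the pattern
            if ($1 == 0) pending++
            else { blanks[++ones] = pending; pending = 0 }
            next
        }
        {                                 # second file: the text
            for (j = 0; j < blanks[FNR]; j++) print ""
            print
        }
    ' "$1" "$2"
}

# insert_blank_lines pattern YourFile
```

Because a missing blanks[] entry counts as zero, text lines beyond the end of the pattern are printed unchanged instead of triggering extra output.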
This is a bit of a hack, but I present it as an alternative to your traditionally awk-y solutions:
paste -d, file.txt <(cat pattern | tr '\n' ' ' | sed 's,1 0,10,g' | tr ' ' '\n' | tr -d '1') | tr '0' '\n' | tr -d ','
The output looks like this:
apple
banana
orange
milk
bread
Inverse of Barmar's, read the text into an array and then print as you process the pattern:
$ awk 'NR==FNR {fruit[NR]=$0; next} {print $0?fruit[++i]:""}' fruit.txt pattern.txt
apple
banana
orange
milk
For an answer using only bash:
i=0; mapfile file < file
for p in $(<pattern); do
((p)) && printf "%s" "${file[i++]}" || echo
done

How to add all values in a certain column?

I want to add all the 3rd fields from each line and produce the result.
Below is the way I solved the problem
sum=0
grep '2016Feb' input.txt|awk -F\- '{print $3}'|while read LINE; do
sum = $(expr $sum + $LINE)
done
echo $sum
Is there a better way of solving the problem than my code? Possibly a command that solves the problem at the command line itself?
For a file like:
$ cat input.txt
Feb2016-2016-110
Feb2016-2016-20
Feb2016-2016-220
Feb2016-2016-140
Feb2016-2016-100
The output is: 590.
Just set the field separator to the dash and sum the third column:
$ awk -F- '{sum+=$3} END{print sum+0}' file
590
# the +0 is there so that 0 is printed in case there are no matching lines
Since it looks like you are just counting those lines that contain the text "Feb2016", you can also add a filter:
awk -F- '/Feb2016/{sum+=$3} END{print sum+0}' file
# the /Feb2016/ pattern restricts the sum to lines containing the string "Feb2016"
$ cat data
Feb2016-2016-110
Feb2016-2016-20
Feb2016-2016-220
Feb2016-2016-140
Feb2016-2016-100
$ cut -d - -f 3 data | paste -s -d '+' | bc
590
$
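For comparison, here is a pure-bash sketch of the loop from the question. That loop cannot work as written for two reasons: `sum = ...` with spaces is not an assignment, and by default each part of a pipeline runs in a subshell, so a total updated inside `... | while read` is lost when the loop ends. The function name sum_third_field and its optional filter argument are assumptions made for illustration.

```shell
# sum_third_field FILE [FILTER]: add up the third "-"-separated field of the
# lines whose first field contains FILTER (every line if FILTER is empty).
# The function name and the FILTER argument are illustrative only.
sum_third_field() {
    local file=$1 filter=${2:-} sum=0 first second third
    while IFS=- read -r first second third; do
        [[ $first == *"$filter"* ]] || continue   # skip non-matching lines
        [[ $third =~ ^[0-9]+$ ]] || continue      # skip lines without a number
        (( sum += 10#$third ))                    # 10# avoids octal for 020 etc.
    done < "$file"            # a redirect instead of a pipe: no subshell
    printf '%s\n' "$sum"
}

# sum_third_field input.txt Feb2016
```

The awk one-liners remain the shorter choice; this only shows what the original loop would need in order to keep its total.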

Comparing values in two files

I am comparing two files, each having one column and n number of rows.
file 1
vincy
alex
robin
file 2
Allen
Alex
Aaron
ralph
robin
If the data of file 1 is present in file 2 it should return 1, or else 0, in a tab-separated file.
Something like this
vincy 0
alex 1
robin 1
What I am doing is
#!/bin/bash
for i in `cat file1 `
do
cat file2 | awk '{ if ($1=="'$i'") print 1 ; else print 0 }'>>binary
done
the above code is not giving me the output which I am looking for.
Kindly have a look and suggest correction.
Thank you
The simple awk solution:
awk 'NR==FNR{ seen[$0]=1 } NR!=FNR{ print $0 " " seen[$0] + 0}' file2 file1
A simple explanation: for the lines in file2, NR==FNR, so the first action is executed and we simply record that a line has been seen. In file1, the 2nd action is taken and the line is printed, followed by a space, followed by a "0" or a "1", depending on if the line was seen in file2.
AWK loves to do this kind of thing.
awk 'FNR == NR {a[tolower($1)]; next} {f = 0; if (tolower($1) in a) {f = 1}; print $1, f}' file2 file1
Swap the positions of file2 and file1 in the argument list to make file1 the dictionary instead of file2.
When FNR (the record number in the current file) and NR (the record number of all records so far) are equal, then the first file is the one being processed. Simply referencing an array element brings it into existence. This sets up the dictionary. The next instruction reads the next record.
Once FNR and NR aren't equal, subsequent file(s) are being processed and their data is looked up in the dictionary array.
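The same dictionary idea can be sketched in bash itself with an associative array. This assumes bash 4 or later (for `local -A` and the `${var,,}` lower-casing) and a case-insensitive comparison as in the awk version above; the function name mark_present is made up for illustration.

```shell
# mark_present FILE1 FILE2: for each line of FILE1 print the line, a tab,
# and 1 if it also occurs in FILE2 (ignoring case), otherwise 0.
# Requires bash 4+; the function name is illustrative only.
mark_present() {
    local -A seen=()
    local name
    # First pass: FILE2 becomes the dictionary
    while IFS= read -r name; do
        [[ -n $name ]] && seen[${name,,}]=1
    done < "$2"
    # Second pass: look up every line of FILE1
    while IFS= read -r name; do
        [[ -n $name ]] || continue
        printf '%s\t%d\n' "$name" "${seen[${name,,}]:-0}"
    done < "$1"
}

# mark_present file1 file2 > binary
```

Each file is read once, so the cost grows with the combined length of the two files rather than with their product, which is what the nested loop in the question would need.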
The following code should do it.
Take a close look at the BEGIN and END sections.
#!/bin/bash
rm -f binary
for i in $(cat file1); do
awk 'BEGIN {isthere=0;} { if ($1=="'$i'") isthere=1;} END { print "'$i'",isthere}' < file2 >> binary
done
There are several decent approaches. You can simply use line-by-line set math:
{
grep -xF -f file2 file1 | sed $'s/$/\t1/'
grep -vxF -f file2 file1 | sed $'s/$/\t0/'
} > somefile.txt
Another approach would be to simply combine the files and use uniq -c, then just swap the numeric column with something like awk:
sort file1 file2 | uniq -c | awk '{ print $2"\t"$1 }'
The comm command exists to do this kind of comparison for you.
The following approach does only one pass and scales well to very large input lists:
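#!/bin/bash
while read -r; do
    if [[ $REPLY = $'\t'* ]] ; then
        # leading tab: comm's third column, i.e. the line is in both files
        printf "%s\t1\n" "${REPLY#?}"
    else
        # no leading tab: the line is only in file1
        printf "%s\t0\n" "${REPLY}"
    fi
done < <(comm -2 <(tr '[A-Z]' '[a-z]' <file1 | sort) <(tr '[A-Z]' '[a-z]' <file2 | sort))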
See also BashFAQ #36, which is directly on-point.
Another solution, if you have Python installed and are familiar with it:
#!/usr/bin/env python3
# Read both files as lists of names, without the trailing newlines
f1 = open('file1').read().splitlines()
f2 = set(open('file2').read().splitlines())
# 1 if the name from file1 also appears in file2, else 0
f1_in_f2 = [int(x in f2) for x in f1]
for n, c in zip(f1, f1_in_f2):
    print(n, c, sep='\t')
