how to get substring from - bash

how to get substring from
42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!##Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND##!
to be
BEGIN!##Ghjk,GhjkEND##!
Note: there is whitespace at the end of the lines. I tried removing it, but I can't.
I tried
#!/bin/bash
s=$(awk '/BEGIN!##/,/END##!/' switch.log )
while IFS= read -r line
do
h=$(echo "$line" | awk '{$1=$1;print}')
for i in {0..100}
do
zzz=$(echo "$h" | awk '{print $(NF-$i)}')
if [ ! -z "$zzz" -a "$zzz" != " " ]; then
hh=$(echo "$h" | awk '{print $(NF-$i)}')
echo "$zzz"
echo -e "$zzz" >> ggg.txt
break
fi
done
done <<< "$s"
I got
BEGIN!##Ghjk,Ghj

Another option is using sed with the normal substitute method storing the text you want to keep as the first two backreferences. For example:
sed -E 's/^.*(BEGIN[^[:space:]]+).*(kEND[^[:space:]]+)/\1\2/' <<< 'your string'
Example Use/Output
(note: updated to handle whitespace at the end)
$ sed -E 's/^.*(BEGIN[^[:space:]]+).*(kEND[^[:space:]]+)/\1\2/' <<< '42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!##Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND##!'
BEGIN!##Ghjk,GhjkEND##!
(note: single-quoting the string is required due to '!')

Using sed
$ sed -E 's/[0-9]+[a-z]? +| +//g' input_file
BEGIN!##Ghjk,GhjkEND##!

UPDATED, to fix an error:
You have not defined precisely in your question, how the string to be extracted looks like in general, but based on your example, this would do:
if [[ $line =~ (BEGIN[^ ]+)\ .*([^ ]+END[^ ]+) ]]
then
substring=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
else
echo Pattern not found in line 1>&2
fi
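As a minimal, self-contained sketch of the same regex approach, the snippet below runs it against the sample line from the question, with trailing spaces added deliberately. Storing the pattern in a variable (here called re, a name chosen only for this illustration) is an assumption about style, not part of the answer above; it avoids having to escape the space inside [[ ]].

```bash
#!/usr/bin/env bash
# Sample line from the question; the trailing spaces are added on purpose.
line='42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!##Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND##!   '

# Pattern kept in a variable so the literal space needs no backslash.
re='(BEGIN[^ ]+) .*([^ ]+END[^ ]+)'

if [[ $line =~ $re ]]; then
    # Group 1 is the BEGIN word, group 2 is the word containing END.
    substring=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
    printf '%s\n' "$substring"
else
    echo 'Pattern not found in line' >&2
fi
```

Because both groups only match non-space characters, the trailing spaces never end up in the result; the script prints BEGIN!##Ghjk,GhjkEND##!.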

I would use GNU AWK for this task in the following way. Let the content of file.txt be
42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 6a BEGIN!##Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND##!
then
awk 'BEGIN{FPAT="[^[:space:]]*(BEGIN|END)[^[:space:]]*";OFS=""}{$1=$1;print}' file.txt
gives output
BEGIN!##Ghjk,GhjkEND##!
Explanation: I use the field pattern (FPAT) to tell GNU AWK that a field is BEGIN or (|) END, prefixed and suffixed by zero or more (*) non-whitespace ([^[:space:]]) characters, and I set the output field separator (OFS) to the empty string. Then, for each line, $1=$1 triggers a rebuild of the line, which is then printed. If you are sure that only space characters are used in the line, you may replace [^[:space:]] with [^ ].
(tested in gawk 4.2.1)

s=$(awk '/BEGIN!##/,/END##!/' switch.log)
echo "$s" > ggg.txt
ss=$(sed -E 's/[0-9]+[a-z]? +| +//g' ggg.txt )
echo "$ss" > ddd.txt
sss=$(awk '{print $1}' ddd.txt)
echo "$sss" > hhhh.txt
ssss=$(awk '/BEGIN!##/,/END##!/' hhhh.txt)
echo "$ssss" > hhh.txt
aaa=$(<hhh.txt)
aaa=$(cat hhh.txt | tr -d '\n' )

By setting the awk record separator RS to " ", awk processes one whitespace-separated portion of your file at a time (with each record containing only one field). So the two parts that are needed can be extracted with the simple awk condition patterns /BEGIN/ and /END/. There can be no white space in any record, since this was the delimiter.
If printed, the pattern-filtered records would normally be separated by a newline (the default output record separator ORS), but this can be changed to an empty string with ORS="" to make the two print statements run into one another with no space.
Thus this simple awk command returns the required fields as a concatenated string with no white space:
awk ' BEGIN{RS=" ";ORS=""} /BEGIN/{print} /END/{print}' file.txt
output:
BEGIN!##Ghjk,GhjkEND##!

$ grep -oE '\S*(BEGIN|END)\S*' file | paste -sd'\0'
BEGIN!##Ghjk,GhjkEND##!

echo ' 42 45 47 49 4e 21 40 23 47 68 6a 6b 2c 47 68 ' \
'6a BEGIN!##Ghjk,Ghj 6b 45 4e 44 23 40 21 kEND##!' |
{m,g}awk NF=NF FS='[ \t]*([^ \t][^ \t][ \t]+)+[ \t]*' OFS=
BEGIN!##Ghjk,GkEND##!


How to add the elements in a for loop [duplicate]

Basically, my code looks through the data and greps whatever each line begins with, and I've been trying to figure out a way to add those values together.
the sample input is
35 45 75 76
34 45 53 55
33 34 32 21
my code:
for id in $(awk '{ print $1 }' < $3); do echo $id; done
I'm printing it right now to see the values, but basically what is output is
35
34
33
I'm trying to add them all together but I can't figure out how. Some help would be appreciated.
my desired output would be
103
Lots of ways to do this, a few ideas ...
$ cat numbers.dat
35 45 75 76
34 45 53 55
33 34 32 21
Tweaking OP's current code:
$ sum=0
$ for id in $(awk '{ print $1 }' < numbers.dat); do ((sum+=id)); done
$ echo "${sum}"
102
Eliminating awk:
$ sum=0
$ while read -r id rest_of_line; do sum=$((sum+id)); done < numbers.dat
$ echo "${sum}"
102
Using just awk (looks like Aivean beat me to it):
$ awk '{sum+=$1} END {print sum}' numbers.dat
102
awk '{ sum += $1 } END { print sum }'
Test:
35 45 75 76
34 45 53 55
33 34 32 21
Result:
102
(sum(35, 34, 33) = 102, that's what you want, right?)
Here is the detailed explanation of how this works:
$1 is the first column of the input.
sum is the variable that holds the sum of all the values in the first column.
END { print sum } is the action to be performed after all the input has been processed.
So the awk program is basically summing up the first column of the input and printing the result.
This answer was partially generated by Davinci Codex model, supervised and verified by me.
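The same accumulate-then-print pattern works for any column. As a hedged sketch, the helper below (sum_column is a hypothetical name, not from the answers above) passes the column number into awk with -v and sums that field; the temporary file only stands in for the asker's input.

```bash
#!/usr/bin/env bash
# sum_column is a hypothetical helper. Usage: sum_column COLUMN FILE
sum_column() {
    # $col selects the field whose number is stored in the awk variable col;
    # "sum + 0" forces a numeric 0 to be printed when the input is empty.
    awk -v col="$1" '{ sum += $col } END { print sum + 0 }' "$2"
}

# Recreate the sample input from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' '35 45 75 76' '34 45 53 55' '33 34 32 21' > "$tmp"

sum_column 1 "$tmp"   # prints 102 (35 + 34 + 33)
sum_column 4 "$tmp"   # prints 152 (76 + 55 + 21)

rm -f "$tmp"
```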

Converting string using bash

I want to convert the output of command:
dmidecode -s system-serial-number
which is a string looking like this:
VMware-56 4d ad 01 22 5a 73 c2-89 ce 3f d8 ba d6 e4 0c
to:
564dad01-225a-73c2-89ce-3fd8bad6e40c
I suspect I first need to extract all letters and numbers after the "VMware-" part at the start, and then insert "-" at the known positions after characters 10, 14, 18, 22.
To try the first extraction I have tried:
$ echo `dmidecode -s system-serial-number | grep -oE '(VMware-)?[a0-Z9]'`
VMware-5 6 4 d a d 0 1 2 2 5 a 7 3 c 2 8 9 c e 3 f d 8 b a d 6 e 4 0 c
However this isn't going the right way.
EDIT:
This gets me to a single long string; however, it's not elegant:
$ echo `dmidecode -s system-serial-number | sed -s "s/VMware-//" | sed -s "s/-//" | sed -s "s/ //g"`
564dad01225a73c289ce3fd8bad6e40c
Like this :
dmidecode -s system-serial-number |
sed -E 's/VMware-//;
s/ +//g;
s/(.)/\1-/8;
s/(.)/\1-/13;
s/(.)/\1-/23'
You can use Bash sub string extraction:
$ s="VMware-56 4d ad 01 22 5a 73 c2-89 ce 3f d8 ba d6 e4 0c"
$ s1=$(echo "${s:7}" | tr -d '[:space:]')
$ echo "${s1:0:8}-${s1:8:4}-${s1:12:9}-${s1:21}"
564dad01-225a-73c2-89ce-3fd8bad6e40c
Or, built-ins only (ie, no tr):
$ s1=${s:7}
$ s1="${s1// /}"
$ echo "${s1:0:8}-${s1:8:4}-${s1:12:9}-${s1:21}"
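A variation on the built-ins-only approach, offered as a sketch: the function below (format_serial is a hypothetical name) strips the prefix, then removes the embedded hyphen as well as the spaces, so that every group can be cut at a fixed offset from a plain 32-character hex string. It assumes the input always carries 16 hex pairs, as in the sample.

```bash
#!/usr/bin/env bash
# format_serial is a hypothetical helper name; it is not part of dmidecode.
format_serial() {
    local s=${1#VMware-}        # drop the leading "VMware-" if present
    s=${s//[[:space:]-]/}       # remove all whitespace and hyphens
    # Cut the 32 hex digits into 8-4-4-4-12 groups.
    printf '%s-%s-%s-%s-%s\n' "${s:0:8}" "${s:8:4}" "${s:12:4}" "${s:16:4}" "${s:20}"
}

format_serial 'VMware-56 4d ad 01 22 5a 73 c2-89 ce 3f d8 ba d6 e4 0c'
# prints 564dad01-225a-73c2-89ce-3fd8bad6e40c
```

In real use the argument would be "$(dmidecode -s system-serial-number)" rather than a literal string.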

While loop in bash getting duplicate result

$ cat grades.dat
santosh 65 65 65 65
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
santosh 99 99 99 99 99
Script:
#!/usr/bin/bash
filename="$1"
while read line
do
a=`grep -w "santosh" $1 | awk '{print$1}' |wc -l`
echo "total is count of the file is $a";
done <"$filename"
Output:
total is count of the file is 2
total is count of the file is 2
total is count of the file is 2
total is count of the file is 2
total is count of the file is 2
The real output should be
total is count of the file is 2
Is that right? Please let me know what I am missing in the above script.
Whilst others have shown you better ways to solve your problem, the answer to your question is in the following line:
a=`grep -w "santosh" $1 | awk '{print$1}' |wc -l`
The while loop stores each line of the file in the variable "line", but that variable is never used. Instead, every iteration searches for "santosh", which appears twice. Because the same query runs once for each of the 5 lines in the file, you get 5 identical lines of output.
You could alter your current script like so:
a=$(grep -w "$line" "$filename" | awk '{print$1}' | wc -l)
The above is not meant to be a solution as others have pointed out, but it does solve your issue.
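If the goal is one count per name rather than one line of output per input line, a single awk pass avoids the loop entirely. The following is only a sketch of that idea (count_names is a hypothetical helper, and treating the first field as the name is an assumption based on the sample file):

```bash
#!/usr/bin/env bash
# count_names is a hypothetical helper: prints "name count" for each distinct first field.
count_names() {
    # count[] is keyed by the first field; sort makes the output order stable.
    awk '{ count[$1]++ } END { for (name in count) print name, count[name] }' "$1" | sort
}

# Recreate grades.dat from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
santosh 65 65 65 65
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
santosh 99 99 99 99 99
EOF

count_names "$tmp"
rm -f "$tmp"
```

For the sample data this lists andrea, jasper and john with a count of 1 and santosh with a count of 2. When only a single name matters, grep -cw "santosh" grades.dat prints the number of matching lines directly.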

Setting Bash variable to last number in output

I have bash running a command from another program (AFNI). The command outputs two numbers, like this:
70.0 13.670712
I need to make a bash variable that will be whatever the last # is (in this case 13.670712). I've figured out how to make it print only the last number, but I'm having trouble setting it to be a variable. What is the best way to do this?
Here is the code that prints only 13.670712:
test="$(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]')"; echo "${test}" | awk '{print $2}'
Just pipe (|) the command output to awk. In your example, awk reads the stdout of the previous command and prints the 2nd column; by default, fields are split on whitespace.
test="$(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]' | awk '{print $2}')"
printf "%s\n" "$test"
13.670712
(or) using echo
echo "$test"
13.670712
This is the simplest way to do it. If you are looking for a bash-specific alternative, use the read command with process substitution:
read -r _ val2 < <(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]')
printf "%s\n" "$val2"
13.670712
Another, more portable version uses set, which does not depend on bash-specific features:
set -- $(3dBrickStat -mask ../../template/ROIs.nii -mrange 41 41 -percentile 70 1 70 'stats.s1_ANTS+tlrc[25]');
printf "%s\n" "$2"
13.670712
You can use cut to print the second column:
$ echo "70.0 13.670712" | cut -d ' ' -f2
13.670712
And assign that to a variable with command substitution:
$ sc="$(echo '70.0 13.670712' | cut -d ' ' -f2)"
$ echo "$sc"
13.670712
Just replace echo '70.0 13.670712' with the command that is actually producing the two numbers.
If you want to grab the last value of some delimited field (or delimited output from a command), you can use parameter expansion. This is completely internal to Bash:
$ echo "$s"
$ echo ${s##*' '}
10
$ echo "$s2"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$ echo ${s2##*' '}
20
And then just assign directly:
$ echo "$s2"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
$ lf=${s2##*' '}
$ echo "$lf"
20
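Applied to the two-number output from the question, the same expansion can be sketched as follows. The variable output is a stand-in holding the sample text; the real 3dBrickStat command is not run here, and the names output and last are chosen only for this illustration.

```bash
#!/usr/bin/env bash
# Stand-in for the command's output (assumption: two space-separated numbers,
# with no trailing whitespace).
output='70.0 13.670712'

# ${var##* } removes the longest prefix ending in a space, leaving the last word.
last=${output##* }
printf '%s\n' "$last"   # prints 13.670712
```

Note that if the output ended with a space, this expansion would yield an empty string, so the awk or read variants are more forgiving in that case.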

Read the number of columns using awk/sed

I have the following test file
Kmax Event File - Text Format
1 4 1000
65 4121 9426 12312
56 4118 8882 12307
1273 4188 8217 12309
1291 4204 8233 12308
1329 4170 8225 12303
1341 4135 8207 12306
63 4108 8904 12300
60 4106 8897 12307
731 4108 8192 12306
...
ÿÿÿÿÿÿÿÿ
In this file I want to delete the first two lines and apply some mathematical calculations. For instance each column i will be $i-(i-1)*number. A script that does this is the following
#!/bin/bash
if test $1 ; then
if [ -f $1.evnt ] ; then
rm -f $1.dat
sed -n '2p' $1.evnt | (read v1 v2 v3
for filename in $1*.evnt ; do
echo -e "Processing file $filename"
sed '$d' < $filename > $1_tmp
sed -i '/Kmax/d' $1_tmp
sed -i '/^'"$v1"' '"$v2"' /d' $1_tmp
cat $1_tmp >> $1.dat
done
v3=`wc -l $1.dat | awk '{print $1}' `
echo -e "$v1 $v2 $v3" > .$1.dat
rm -f $1_tmp)
else
echo -e "\a!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
echo -e " Event file $1.evnt doesn't exist !!!!!!"
echo -e "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
fi
else
echo -e "\a!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
echo -e "!!!!! Give name for event files !!!!!"
echo -e "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
fi
awk '{print $1, $2-4096, $3-(2*4096), $4-(3*4096)}' $1.dat >$1_Processed.dat
rm -f $1.dat
exit 0
The file won't always have 4 columns. Is there a way to read the number of columns, print this number and apply those calculations?
EDIT The idea is to have an input file (*.evnt), convert it to *.dat or any other ascii file(it doesn't matter really) which will only include the number in columns and then apply the calculation $i=$i-(i-1)*number. In addition it will keep the number of columns in a variable, that will be called in another program. For instance in the above file, number=4096 and a sample output file is the following
65 25 1234 24
56 22 690 19
1273 92 25 21
1291 108 41 20
1329 74 33 15
1341 39 15 18
63 12 712 12
60 10 705 19
731 12 0 18
while in the console I will get the message There are 4 detectors.
Finally a new file_processed.dat will be produced, where file is the initial name of awk's input file.
The way it should be executed is the following
./myscript <filename>
where <filename> is the name without the format. For instance, the files will have the format filename.evnt so it should be executed using
./myscript filename
Let's start with this to see if it's close to what you're trying to do:
$ numdet=$( awk -v num=4096 '
NR>2 && NF>1 {
out = FILENAME "_processed.dat"
for (i=1;i<=NF;i++) {
$i = $i-(i-1)*num
}
nf = NF
print > out
}
END {
printf "There are %d detectors\n", nf | "cat>&2"
print nf
}
' file )
There are 4 detectors
$ cat file_processed.dat
65 25 1234 24
56 22 690 19
1273 92 25 21
1291 108 41 20
1329 74 33 15
1341 39 15 18
63 12 712 12
60 10 705 19
731 12 0 18
$ echo "$numdet"
4
Is that it?
Using awk
awk 'NR<=2{next}{for (i=1;i<=NF;i++) $i=$i-(i-1)*4096}1' file
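To also report the number of columns, the one-liner can be extended as in the sketch below. process_events is a hypothetical wrapper name, 4096 is the offset taken from the question, and the two data rows are copied from the sample file. Writing to "/dev/stderr" keeps the message out of the processed data; this assumes an awk or system that provides that path (gawk, mawk and Linux do).

```bash
#!/usr/bin/env bash
# process_events is a hypothetical wrapper. Usage: process_events FILE
process_events() {
    awk -v num=4096 '
        NR <= 2 { next }                      # skip the two header lines
        {
            # Column i becomes $i - (i-1)*num, for however many columns exist.
            for (i = 1; i <= NF; i++) $i -= (i - 1) * num
            print
            nf = NF                           # remember the column count
        }
        END { printf "There are %d detectors\n", nf > "/dev/stderr" }
    ' "$1"
}

# Small stand-in for an .evnt file: two header lines and two data rows.
tmp=$(mktemp)
printf '%s\n' 'Kmax Event File - Text Format' '1 4 1000' \
    '65 4121 9426 12312' '56 4118 8882 12307' > "$tmp"

process_events "$tmp"
rm -f "$tmp"
```

Worked by hand for the first row: 4121-4096=25, 9426-2*4096=1234, 12312-3*4096=24, so stdout shows 65 25 1234 24 followed by 56 22 690 19, while the detector count goes to stderr. A trailing non-numeric line such as the one in the sample file would need to be filtered out first, as the original script does with sed '$d'.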
