shell scripting do loop - shell

Sorry, I'm new to Unix. Is there any way I can turn the following code into a loop? For example, the number in the file name would change each time, from 1 to 50.
My script is:
cut -d ' ' -f5- cd1_abcd_w.txt > cd1_rightformat.txt ;
sed 's! \([^ ]\+\)\( \|$\)!\1 !g' cd1_rightformat.txt ;
sed -i 's/ //g' cd1_rightformat.txt;
cut -d ' ' -f1-4 cd1_abcd_w.txt > cd1_extrainfo.txt ;
I would like to make this into a loop where cd1_abcd_w.txt becomes cd2_abcd_w.txt, the output becomes cd2_rightformat.txt, and so on, all the way to 50.
So essentially cd$i.
Many thanks

In bash, you can use brace expansion:
for num in {1..10}; do
echo ${num}
done
Like a BASIC for i = 1 to 10 loop, it's inclusive at both ends, so that loop will output the numbers 1 through 10.
You then just replace the echo command with whatever you need to do, such as:
cut -d ' ' -f5- cd${num}_abcd_w.txt >cd${num}_rightformat.txt
# and so on
If you need the numbers less than ten to have a leading zero, change the expression in the for loop to be {01..50} instead. That doesn't appear to be the case here but it's very handy to know.
Also in the not-needed-but-handy-to-know category, you can also specify an increment if you don't want to use the default of one:
pax> for num in {1..50..9}; do echo ${num}; done
1
10
19
28
37
46
(equivalent to the BASIC for i = 1 to 50 step 9).
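Zero-padded brace expansion such as {01..50} needs bash 4 or later. Where that is not available, printf can do the padding instead. A minimal sketch, in which pad2 is a made-up helper name and not something from the question:

```shell
#!/usr/bin/env bash
# pad2 is a hypothetical helper: print a number zero-padded to two digits.
pad2() {
  printf '%02d\n' "$1"
}

# Build a few padded file names to show the effect.
for num in 1 9 10 50; do
  echo "cd$(pad2 "$num")_abcd_w.txt"
done
```

The numbers passed in should be plain decimals without leading zeros, since printf would read something like 08 as an invalid octal value.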

This should work:
for((i=1;i<=50;i++));do
cut -d ' ' -f5- cd${i}_abcd_w.txt > cd${i}_rightformat.txt ;
sed 's! \([^ ]\+\)\( \|$\)!\1 !g' cd${i}_rightformat.txt ;
sed -i 's/ //g' cd${i}_rightformat.txt;
cut -d ' ' -f1-4 cd${i}_abcd_w.txt > cd${i}_extrainfo.txt ;
done
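If some of the 50 input files might be missing, it can help to wrap the per-file steps in a function and skip numbers that have no input. A sketch under that assumption; process_cd is a hypothetical name, and only the two cut steps from the question are shown (the sed steps would go in the same place):

```shell
#!/usr/bin/env bash
# process_cd is a hypothetical helper: run the two cut steps from the
# question for one file number, silently skipping numbers with no input file.
process_cd() {
  local i=$1
  local src="cd${i}_abcd_w.txt"
  [ -f "$src" ] || return 0
  cut -d ' ' -f5-  "$src" > "cd${i}_rightformat.txt"
  cut -d ' ' -f1-4 "$src" > "cd${i}_extrainfo.txt"
}

# Loop from 1 to 50 inclusive.
i=1
while [ "$i" -le 50 ]; do
  process_cd "$i"
  i=$((i + 1))
done
```

Quoting the file names costs nothing here and keeps the loop safe if the naming scheme ever gains spaces.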

This would work in bash:
for i in $(seq 50)
do
cut -d ' ' -f5- cd${i}_abcd_w.txt > cd${i}_rightformat.txt;
sed 's! \([^ ]\+\)\( \|$\)!\1 !g' cd${i}_rightformat.txt;
sed -i 's/ //g' cd${i}_rightformat.txt;
cut -d ' ' -f1-4 cd${i}_abcd_w.txt > cd${i}_extrainfo.txt;
done
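One thing to watch when a variable is directly followed by an underscore in a file name: without braces the shell treats the underscore and the letters after it as part of the variable name. A small demonstration, with the value 7 picked arbitrarily:

```shell
#!/usr/bin/env bash
i=7
unset i_abcd_w                 # make sure the accidental name really is unset

wrong="cd$i_abcd_w.txt"        # the shell reads the variable name as i_abcd_w
right="cd${i}_abcd_w.txt"      # braces end the name after i

echo "without braces: $wrong"  # without braces: cd.txt
echo "with braces:    $right"  # with braces:    cd7_abcd_w.txt
```

This is why ${i} is the safer habit whenever the text after the variable could be part of an identifier.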

Related

Extract number in every line of TSV file

I have a file with tab-separated-values and also with blank spaces like this:
! (desambiguación) http://es.dbpedia.org/resource/!_(desambiguación) 5
! (álbum) http://es.dbpedia.org/resource/!_(álbum_de_Trippie_Redd) 2
!! http://es.dbpedia.org/resource/!! 4
$9.99 http://es.dbpedia.org/resource/$9.99 6
Tomlinson http://es.dbpedia.org/resource/(10108)_Tomlinson 20
102 Miriam http://es.dbpedia.org/resource/(102)_Miriam 2
2003 QQ47 http://es.dbpedia.org/resource/(143649)_2003_QQ47 2
I want to extract the last number of every line:
5
2
4
6
20
2
2
For that, I have done this:
while read line;
do
NUMBER=$(echo $line | cut -f 3 -d ' ')
echo $NUMBER
done < $PAIRCOUNTS_FILE
The main problem is that some lines have more spaces than others, and cut doesn't work for me with the default delimiter (tab). I don't know why; maybe because I am using WSL.
I have tried cut with several options, but none of them work:
NUMBER=$(echo $line | cut -f 3 -d ' ')
NUMBER=$(echo $line | cut -f 4 -d ' ')
NUMBER=$(echo $line | cut -f 2)
NUMBER=$(echo $line | cut -f 3)
Hope you can help me with this. Thanks in advance.
I want to extract the last number of every line:
You could use grep
grep -Eo '[[:digit:]]+$' file
Or mapfile (also known as readarray), which is a bash 4+ feature:
mapfile -t array < file
printf '%s\n' "${array[@]##* }"
You can use awk:
awk '{print $NF}' file
With cut (if it is truly TAB separated and 3 fields per line):
cat file | cut -f3
If you have some variable number of fields per line, use rev|cut|rev to get the last field:
cat file | rev | cut -f1 | rev
Or with pure Bash and parameter expansion:
while IFS= read -r line; do
last=${line##*$'\t'} # strip everything up to and including the last TAB
printf "%s\n" "$last";
done <file
Or, read into a bash array and echo the last field:
while IFS=$'\t' read -r -a arr; do
echo "${arr[${#arr[@]}-1]}"
done <file
If you have a mixture of tabs and spaces you can do what usually is a mistake and break a Bash variable on white spaces in general (tabs and spaces) into an array:
while IFS= read -r line; do
arr=($line) # break on either tab or space without quotes
echo "${arr[${#arr[@]}-1]}"
done <file
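Since the question mentions WSL, one possibility (an assumption, not something the question confirms) is that the file has Windows line endings, which would leave a stray carriage return on the last field. A sketch that tolerates that and accepts either tabs or spaces as separators; last_field is a made-up helper name:

```shell
#!/usr/bin/env bash
# last_field is a hypothetical helper: print the last whitespace-separated
# field of every line read from standard input.
last_field() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%$'\r'}                       # drop a trailing CR, if any
    printf '%s\n' "${line##*[[:space:]]}"    # keep what follows the last blank
  done
}

printf '!!\thttp://es.dbpedia.org/resource/!!\t4\r\n' | last_field   # prints 4
```

The || [ -n "$line" ] part only matters when the final line of the file has no newline.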

how to awk pattern as variable and loop the result?

I assign a keyword to a variable and need to filter a file with awk using that variable, then loop over the result. The file has millions of lines.
I have tried the code below.
DEVICE="DEV2"
while read -r line
do
echo $line
X_keyword=`echo $line | cut -d ',' -f 2 | grep -w "X" | cut -d '=' -f2`
echo $X_keyword
done <<< "$(grep -w $DEVICE $config)"
log="Dev2_PRT.log"
while read -r file
do
VALUE=`echo $file | cut -d '|' -f 1`
HEADER=`echo $VALUE | cut -c 1-4`
echo $file
if [[ $HEADER = 'PTR:' ]]; then
VALUE=`echo $file | cut -d '|' -f 4`
echo $VALUE
XCOORD+=($VALUE)
((X++))
fi
done <<< "awk /$X_keyword/ $log"
Expected result:
The log files contain lots of lines like the ones below:
PTR:1|2|3|4|X_keyword
PTR:1|2|3|4|Y_rest .....
Filter on X_keyword and get field 4.
Unfortunately your shell script is simply the wrong approach to this problem (see https://unix.stackexchange.com/q/169716/133219 for some of the reasons why) so you should set it aside and start over.
To demonstrate the solution, let's create a sample input file:
$ seq 10 | tee file
1
2
3
4
5
6
7
8
9
10
and a shell variable to hold a regexp that's a character list of the chars 5, 6, or 7:
$ var='[567]'
Now, given the above input, here is the solution for how to g/re/p pattern as variable and count how many results:
$ awk -v re="$var" '$0~re{print; c++} END{print "---" ORS c+0}' file
5
6
7
---
3
If that's not all you need then please edit your question to clarify your requirements and provide concise, testable sample input and expected output.
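Applied to the log format shown in the question, the same idea might look like the following. This is only a sketch: ptr_field4 is a made-up name, the keyword is treated as a regular expression, and the assumption is that the lines of interest start with PTR: and are separated by |.

```shell
#!/usr/bin/env bash
# ptr_field4 is a hypothetical helper: KEYWORD is a regexp, LOG is a
# '|'-separated file. Prints field 4 of PTR: lines that match KEYWORD.
ptr_field4() {
  local keyword=$1 log=$2
  awk -F'|' -v re="$keyword" '
    index($1, "PTR:") == 1 && $0 ~ re { print $4 }
  ' "$log"
}

# Sample data taken from the question.
log=$(mktemp)
printf '%s\n' 'PTR:1|2|3|4|X_keyword' 'PTR:1|2|3|4|Y_rest' > "$log"
ptr_field4 'X_keyword' "$log"    # prints 4
rm -f "$log"
```

A single awk pass like this avoids starting several cut and grep processes per line, which matters with millions of lines.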

bash calculations with numbers from files

I am trying to do a simple thing:
To get the second number in the line with the second occurrence of the word TER, lower it by one, and process it further. The tr -s ' ' is there because the file is not delimited by tabs, but by varying amounts of whitespace.
My script:
first_res_atombumb= grep 'TER' tata_sbox_cuda.pdb | head -n 2 | tail -1 |tr -s ' '| cut -f 2 -d ' '
echo $((first_res_atombumb-1))
but this only returns:
255
-1
Of course I want to have 254.
Adding | tr -d '\n' does not help either. What on earth is going on? I have already asked several people at work and no one seems to know.
The lines in question look like this:
TER 128 DA3 4
TER 255 DA3 8
and if I run grep 'TER' tata_sbox_cuda.pdb | head -n 2 | tail -1 | tr -s ' '| cut -f 2 -d ' ' on the command line, I get what I expect, just 255.
With bash, I'd write
n_ter=0
while read -a words; do
if [[ ${words[0]} == TER ]] && (( ++n_ter == 2 )); then
echo $(( ${words[1]} - 1 ))
fi
done < file
but I'd use awk
awk '$1 == "TER" && ++n == 2 {print $2 - 1}' file
The problem with your code: you forgot to use the $() command substitution syntax
first_res_atombumb= grep 'TER' tata_sbox_cuda.pdb | head -n 2 | tail -1 |tr -s ' '| cut -f 2 -d ' '
# .................^...............................................................................^
echo $((first_res_atombumb-1))
You're setting the variable to an empty string in the environment of the grep command. Then, since you're not capturing the output of that pipeline, "255" is printed to the terminal. Because the variable is unset in your current shell, it evaluates to 0 inside the arithmetic expansion, so you effectively get echo $((0-1)), which prints -1.
All you need is:
first_res_atombumb=$(grep 'TER' tata_sbox_cuda.pdb | head -n 2 | tail -1 |tr -s ' '| cut -f 2 -d ' ')
# .................^^...............................................................................^
But I'd still use awk.
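The difference between the two forms can be seen in isolation. In this sketch printf stands in for the grep pipeline, and n is just an illustrative variable name:

```shell
#!/usr/bin/env bash
unset n
n= printf '255\n' >/dev/null   # n is set (to empty) only for printf itself
before=${n-unset}              # n is still unset in this shell

n=$(printf '255\n')            # command substitution captures the output
result=$(( n - 1 ))

echo "before: $before, after: $n, result: $result"
# before: unset, after: 255, result: 254
```

The space after the = sign is what turns an assignment into a one-off environment setting for the command that follows.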
If I understand your problem correctly you can solve it using AWK:
awk 'BEGIN{v=0} $1 == "TER" {v++;if (v==2) {print $2-1 ;exit}}' tata_sbox_cuda.pdb
Explanation:
BEGIN{v=0} declares the variable and sets it to zero.
$1 == "TER" runs the block in {} only for lines whose first field is TER.
{v++;if (v==2) {print $2-1 ;exit}} increments v and checks whether it is 2 (that is, the second occurrence of TER); if so, it subtracts 1 from the second field, prints the result, and exits, which makes processing faster by skipping the remaining lines.
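Since the value is meant to be processed further, the awk call can be captured in a variable with command substitution. A sketch using the two sample lines from the question; second_ter_minus_one is a made-up function name:

```shell
#!/usr/bin/env bash
# second_ter_minus_one is a hypothetical helper: print (field 2 - 1) of the
# second line whose first field is TER, then stop reading.
second_ter_minus_one() {
  awk '$1 == "TER" && ++n == 2 { print $2 - 1; exit }' "$1"
}

pdb=$(mktemp)
printf '%s\n' 'TER     128      DA3     4' 'TER     255      DA3     8' > "$pdb"
value=$(second_ter_minus_one "$pdb")
echo "$value"    # 254
rm -f "$pdb"
```

awk splits on runs of whitespace by default, so the tr -s ' ' step from the question is not needed.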

Reading a file in a shell script and selecting a section of the line

This is probably pretty basic: I want to read in an occurrence file.
Then the program should find all occurrences of "CallTilEdb" in the file Hendelse.logg:
CallTilEdb 8
CallCustomer 9
CallTilEdb 4
CustomerChk 10
CustomerChk 15
CallTilEdb 16
and sum up the right column. In this case it would be 8 + 4 + 16, so the output I want is 28.
I'm not sure how to do this, and this is as far as I have gotten with vistid.sh:
#!/bin/bash
declare -t filename=hendelse.logg
declare -t occurance="$1"
declare -i sumTime=0
while read -r line
do
if [ "$occurance" = $(cut -f1 line) ] #line 10
then
sumTime+=$(cut -f2 line)
fi
done < "$filename"
so the execution in terminal would be
vistid.sh CallTilEdb
but the error I get now is:
/home/user/bin/vistid.sh: line 10: [: unary operator expected
You have a nice approach, but you could use awk to do the same thing, and it would be much faster:
$ awk -v par="CallTilEdb" '$1==par {sum+=$2} END {print sum+0}' hendelse.logg
28
It may look a bit weird if you haven't used awk so far, but here is what it does:
-v par="CallTilEdb" provide an argument to awk, so that we can use par as a variable in the script. You could also do -v par="$1" if you want to use a variable provided to the script as parameter.
$1==par {sum+=$2} this means: if the first field is the same as the content of the variable par, then add the second column's value into the counter sum.
END {print sum+0} this means: once you are done from processing the file, print the content of sum. The +0 makes awk print 0 in case sum was not set... that is, if nothing was found.
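Put together as a reusable function, the three parts might be wrapped like this. It is a sketch only: sum_for is a made-up name, and the sample file is rebuilt from the data in the question.

```shell
#!/usr/bin/env bash
# sum_for is a hypothetical helper: sum column 2 of FILE for rows whose
# first column equals KEY exactly.
sum_for() {
  awk -v par="$1" '$1 == par { sum += $2 } END { print sum + 0 }' "$2"
}

logg=$(mktemp)
printf '%s\n' 'CallTilEdb 8' 'CallCustomer 9' 'CallTilEdb 4' \
              'CustomerChk 10' 'CustomerChk 15' 'CallTilEdb 16' > "$logg"
sum_for CallTilEdb "$logg"     # 28
sum_for NoSuchEvent "$logg"    # 0
rm -f "$logg"
```

Because the comparison is $1 == par rather than a regexp match, CallTilEdb does not accidentally match a longer name that merely contains it.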
In case you really want to make it with bash, you can use read with two parameters, so that you don't have to make use of cut to handle the values, together with some arithmetic operations to sum the values:
#!/bin/bash
declare -t filename=hendelse.logg
declare -t occurance="$1"
declare -i sumTime=0
while read -r name value # read both values with -r for safety
do
if [ "$occurance" == "$name" ]; then # string comparison
((sumTime+=$value)) # sum
fi
done < "$filename"
echo "sum: $sumTime"
So that it works like this:
$ ./vistid.sh CallTilEdb
sum: 28
$ ./vistid.sh CustomerChk
sum: 25
First of all, you need to change the way you call cut, so that it reads the contents of the variable rather than a file named line:
$( echo "$line" | cut -f1 )
In line 10 the variable is never expanded, so the test should be:
if [ "$occurance" = "$( echo "$line" | cut -f1 )" ]
You can then sum by doing:
sumTime=$(( sumTime + $( echo "$line" | cut -f2 ) ))
But you can also use a different approach and put the line values in an array, the final script will look like:
#!/bin/bash
declare -t filename=prova
declare -t occurance="$1"
declare -i sumTime=0
while read -a line
do
if [ "$occurance" = "${line[0]}" ]
then
sumTime=$(( sumTime + ${line[1]} ))
fi
done < "$filename"
echo $sumTime
For reference:
id="CallTilEdb"
file="Hendelse.logg"
sum=$(echo "0 $(sed -n "s/^$id[^0-9]*\([0-9]*\)/\1 +/p" < "$file") p" | dc)
echo SUM: $sum
prints
SUM: 28
The sed extracts the numbers from lines that start with the given id, such as CallTilEdb,
and prints them in the format number +.
The echo prepares a string such as 0 8 + 4 + 16 + p, which is the calculation in RPN format.
The dc does the calculation.
another variant:
sum=$(sed -n "s/^$id[^0-9]*\([0-9]*\)/\1/p" < "$file" | paste -sd+ - | bc)
#or
sum=$(grep -oP "^$id\D*\K\d+" < "$file" | paste -sd+ - | bc)
The sed (or the grep) extracts and prints only the numbers.
The paste joins them into a string like number+number+number (-d+ sets the delimiter).
The bc does the calculation.
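To see the intermediate string that paste builds, the steps can be run one at a time. In this sketch the numbers are the three CallTilEdb values from the question, and shell arithmetic is swapped in for bc so it runs even where bc is not installed:

```shell
#!/usr/bin/env bash
# The three values that sed or grep would extract, one per line.
numbers=$(printf '%s\n' 8 4 16)

# paste -s joins all lines into one; -d+ puts a + between them.
expression=$(printf '%s\n' "$numbers" | paste -sd+ -)
echo "$expression"    # 8+4+16

# Evaluate the expression (bc in the original; shell arithmetic here).
sum=$(( $expression ))
echo "$sum"           # 28
```

Shell arithmetic only handles integers, so bc remains the better choice if the values can be fractional.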
or perl
sum=$(perl -slanE '$s+=$F[1] if /^$id/}{say $s' -- -id="$id" "$file")
sum=$(ID="CallTilEdb" perl -lanE '$s+=$F[1] if /^$ENV{ID}/}{say $s' "$file")
Awk translation to script:
#!/bin/bash
declare -t filename=hendelse.logg
declare -t occurance="$1"
declare -i sumTime=0
sumTime=$(awk -v entry="$occurance" '
$1==entry{time+=$NF+0}
END{print time+0}' "$filename")
echo "$sumTime"

Reverse input order with sed

I have a file, let's call it 'a.txt', and this file contains the following line of text:
do to what
I'm wondering what the SED command is to reverse the order of this text to make it look like
what to do
Do I have to do some sort of append? Like append 'do' to 'to' so it would look like
to ++ do (used ++ just to make it clear)
I know tac can do something related
$ cat file
do to what
$ tac -s' ' file
what to do $
Where the -s defines the separator, which is by default a newline.
I would use awk to do this:
awk '{ for (i=NF; i>=1; i--) printf (i!=1) ? $i OFS : $i "\n" }' file.txt
Results:
what to do
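A variant of the same loop that passes each word as an argument to printf rather than as the format string, so that words containing a % sign come through unchanged. A sketch; reverse_words is a made-up name:

```shell
#!/usr/bin/env bash
# reverse_words is a hypothetical helper: print the fields of each input
# line in reverse order, separated by OFS (a single space by default).
reverse_words() {
  awk '{
    for (i = NF; i >= 1; i--)
      printf "%s%s", $i, (i > 1 ? OFS : ORS)
  }' "$@"
}

printf 'do to what\n' | reverse_words    # what to do
```

Runs of several spaces between words collapse to one in the output, because awk splits on any amount of whitespace. Empty input lines produce no output at all.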
EDIT:
If you require a one-liner to modify your file "in-place", try:
{ rm file.txt && awk '{ for (i=NF; i>=1; i--) printf (i!=1) ? $i OFS : $i "\n" }' > file.txt; } < file.txt
sed answer
As this question was tagged sed, my 1st answer was:
First, using an arbitrary _ to mark the spaces already processed, when a.txt contains do to what:
sed -e '
:a;
s/\([^_]*\) \([^ ]*\)/\2_\1/;
ta;
y/_/ /;
' a.txt
what to do
Then, for input with repeated words such as do to to to what:
sed -e '
:a;
s/^\(\|.* \)\([^+ ]\+\) \2\([+]*\)\(\| .*\)$/\1\2\3+\4/g;
ta;
:b;
s/\([^_]*\) \([^ ]*\)/\2_\1/;
tb;
y/_/ /;
' <<<'do to to to what'
what to++ do
There is one + for each suppressed duplicate word:
sed -e ':a;s/^\(\|.* \)\([^+ ]\+\) \2\([+]*\)\(\| .*\)$/\1\2\3+\4/g;ta;
:b;s/\([^_]*\) \([^ ]*\)/\2_\1/;tb;
y/_/ /;' <<<'do do to what what what what'
what+++ to do+
bash answer
But as there are a lot of people searching for simple bash solutions, here is a simple way:
xargs < <(uniq <(tac <(tr \ \\n <<<'do do to what what what what')))
what to do
this could be written:
tr \ \\n <<<'do do to what what what what' | tac | uniq | xargs
what to do
or even with some bash scripting:
revcnt () {
local wrd cnt plus out="";
while read cnt wrd; do
printf -v plus %$((cnt-1))s;
out+=$wrd${plus// /+}\ ;
done < <(uniq -c <(tac <(tr \ \\n )));
echo $out
}
Will do:
revcnt <<<'do do to what what what what'
what+++ to do+
Or as pure bash
revcnt() {
local out i;
for ((i=$#; i>0; i--))
do
[[ $out =~ ${!i}[+]*$ ]] && out+=+ || out+=\ ${!i};
done;
echo $out
}
where the string has to be passed as arguments:
revcnt do do to what what what what
what+++ to do+
Or, if processing standard input (or a file) is required:
revcnt() {
local out i arr;
while read -a arr; do
out=""
for ((i=${#arr[@]}; i--; 1))
do
[[ $out =~ ${arr[i]}[+]*$ ]] && out+=+ || out+=\ ${arr[i]};
done;
echo $out;
done
}
So you can process multiple lines:
revcnt <<eof
do to what
do to to to what
do do to what what what what
eof
what to do
what to++ do
what+++ to do+
This might work for you (GNU sed):
sed -r 'G;:a;s/^\n//;t;s/^(\S+|\s+)(.*)\n/\2\n\1/;ta' file
Explanation:
G add a newline to the end of the pattern space (PS)
:a loop name space
s/^\n//;t when the newline is at the front of the PS, remove it and print line
s/^(\S+|\s+)(.*)\n/\2\n\1/;ta insert either a non-space or a space string directly after the newline and loop to :a
The -r switch makes the regexp easier-on-the-eye (grouping (...), alternation ...|... and the metacharacter for one-or-more + are relieved of the need of a backslash prefix).
Alternative:
sed -E 'G;:a;s/^(\S+)(\s*)(.*\n)/\3\2\1/;ta;s/.//' file
N.B. To reverse the line, adapt the above solution to:
sed -E 'G;:a;s/^(.)(.*\n)/\2\1/;ta;s/.//' file
Maybe you would like perl for this:
perl -F -lane '@rev=reverse(@F);print "@rev"' your_file
As Bernhard said, tac can be used here:
#!/usr/bin/env bash
set -eu
echo '1 2 3
2 3 4
3 4 5' | while IFS= read -r; do
echo -n "$REPLY " | tac -s' '
echo
done
$ ./1.sh
3 2 1
4 3 2
5 4 3
I believe my example is more helpful.
