get highest number then print next number in new file - shell

I have a pipe-delimited file info.txt. Can you give me an idea how to get the highest suffix and add new entries based on a pattern?
info="$HOME/info.txt"
echo "Input the pattern: "
read pattern
awk '/pattern/{ print $0 }' $info >> $HOME/temp1.$$
sed 's/MICRO_AU_FILE//g' $HOME/temp1.$$
##then count highest num but i think not good approach
##if got he highest num then print next number
for ACC_NUM in `cat acc`
do
echo "$pattern-FILE$Highestsufix|server|$ACC_NUM*| >> $HOME/tempfile.$$
cat $HOME/tempfile.$$ >> $info
done
fi
info.txt
MICRO_AU-FILE01|serve|12345
MICRO_AU-FILE02|serve|23456
MICRO_AU-FILE04|serve|34534
MICRO_PH-FILE01|serve|56457
MICRO_PH-FILE02|serve|12345
MICRO_BN-FILE01|serve|78564
MICRO_BN-FILE03|serve|45267
acc
11111
22222
Expected output if my pattern is MICRO_AU:
MICRO_AU-FILE01|serve|12345
MICRO_AU-FILE02|serve|23456
MICRO_AU-FILE04|serve|34534
MICRO_PH-FILE01|serve|56457
MICRO_PH-FILE02|serve|12345
MICRO_BN-FILE01|serve|78564
MICRO_BN-FILE03|serve|45267
MICRO_AU-FILE05|serve|11111
MICRO_AU-FILE06|serve|22222

I would extract the suffixes, sort them numerically in descending order, and take the first (i.e. highest) one. If the input is as regular as in the example, this would simply be
HIGHEST_INDEX=$(cut -c 14-15 "$HOME/temp1.$$" | sort -nr | head -n 1)
If the structure of the lines can vary, you would have to adapt the column selector (cut -c 14-15) to your needs.
UPDATE: I just noticed that you have tagged your question with shell and not with bash, zsh, or ksh. If your program also needs to run on the Bourne shell, you have to use
HIGHEST_INDEX=`cut -c 14-15 "$HOME/temp1.$$" | sort -nr | head -n 1`
In general, with this type of question it is best if you explicitly state which shell(s) your program should run on. The more specific you are in this respect, the better solution we can suggest. For example, getting the next higher number (after HIGHEST_INDEX) is more complicated in Bourne shell than in the other ones.
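Putting the pieces together, a minimal sketch in plain Bourne-compatible shell (assumptions: the two-digit suffix always sits in columns 14-15, entries use the serve field as in the sample data, and acc holds one account number per line):

#!/bin/sh
info="$HOME/info.txt"
printf 'Input the pattern: '
read pattern
# highest existing suffix for the chosen pattern, e.g. 04 for MICRO_AU
highest=`grep "^$pattern" "$info" | cut -c 14-15 | sort -nr | head -n 1`
next=`expr "$highest" + 1`
while read acc_num
do
    printf '%s-FILE%02d|serve|%s\n' "$pattern" "$next" "$acc_num" >> "$info"
    next=`expr "$next" + 1`
done < acc

With pattern MICRO_AU and the sample files, this appends MICRO_AU-FILE05|serve|11111 and MICRO_AU-FILE06|serve|22222 to info.txt.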

Related

How to loop a variable range in cut command

I have a file with 2 columns, and I want to use the values from the second column to set the range in the cut command to select a range of characters from another file. The range I want starts at the position given in the second column and extends for the next 10 characters. I will give an example below.
My files are something like that:
File with 2 columns and no blank lines between lines (file1.txt):
NAME1 10
NAME2 25
NAME3 48
NAME4 66
File from which I want to extract the variable ranges of characters (just one very long line with no spaces) (file2.txt):
GATCGAGCGGGATTCTTTTTTTTTAGGCGAGTCAGCTAGCATCAGCTACGAGAGGCGAGGGCGGGCTATCACGACTACGACTACGACTACAGCATCAGCATCAGCGCACTAGAGCGAGGCTAGCTAGCTACGACTACGATCAGCATCGCACATCGACTACGATCAGCATCAGCTACGCATCGAAGAGAGAGC
Desired resulting file, one sequence per line (result.txt):
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
The resulting file would have the characters from positions 10-20, 25-35, 48-58 and 66-76, each range on a new line. So it would always keep a range of 10 characters, but with different start points, and those start points are set by the values in the second column of the first file.
I tried the command:
for i in $(awk '{print $2}' file1.txt);
do
p1=$i;
p2=`expr "$1" + 10`
cut -c$p1-$2 file2.txt > result.txt;
done
I don't get any output or error message.
I also tried:
while read line; do
set $line
p2=`expr "$2" + 10`
cut -c$2-$p2 file2.txt > result.txt;
done <file1.txt
This last command gives me an error message:
cut: invalid range with no endpoint: -
Try 'cut --help' for more information.
expr: non-integer argument
There's no need for cut here; dd can do the job of indexing into a file and reading only the number of bytes you want. (Note that status=none is a GNUism; you may need to leave it out on other platforms and redirect stderr instead if you want to suppress informational logging.)
while read -r name index _; do
dd if=file2.txt bs=1 skip="$index" count=10 status=none
printf '\n'
done <file1.txt >result.txt
This approach avoids excessive memory requirements (as present when reading the whole of file2 -- assuming it's large), and has bounded performance requirements (overhead is equal to starting one copy of dd per sequence to extract).
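If status=none is not available on your platform, the stderr redirection mentioned above would be used instead; a sketch of the same dd call:

dd if=file2.txt bs=1 skip="$index" count=10 2>/dev/null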
Using awk
$ awk 'FNR==NR{a=$0; next} {print substr(a,$2+1,10)}' file2 file1
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
If file2.txt is not too large, then you can read it in memory,
and use Bash sub-strings to extract the desired ranges:
data=$(<file2.txt)
while read -r name index _; do
echo "${data:$index:10}"
done <file1.txt >result.txt
This will be much more efficient than running cut or another process for every single range definition.
(Thanks to @CharlesDuffy for the tip to read the data without a useless cat, and for the while loop.)
One way to solve it:
#!/bin/bash
while read line; do
pos=$(echo "$line" | cut -f2 -d' ')
x=$(head -c $(( $pos + 10 )) file2.txt | tail -c 10)
echo "$x"
done < file1.txt > result.txt
It's not the solution an experienced bash hacker would use, but it is very good for someone who is new to bash. It uses tools that are very versatile, although somewhat slow if you need high performance. Shell scripting is commonly used by people who rarely write shell scripts but know a few commands and just want to get the job done. That's why I'm including this solution, even if the other answers are superior for more experienced people.
The first line inside the loop is pretty easy: it just extracts the number from each line of file1.txt. The second line uses the very handy tools head and tail. Usually they are used with lines instead of characters; nevertheless, I print the first pos + 10 characters with head, and the result is piped into tail, which prints the last 10 characters.
Thanks to @CharlesDuffy for improvements.

How to find integer values and compare them then transfer the main files?

I have some output files (5000 .log files) which are the results of QM computations. Inside each file there are two special lines indicating the number of electrons and orbitals, like the example below (with exactly the spacing of the output files):
Number of electrons = 9
Number of orbitals = 13
I thought about a script (bash or Fortran) as a solution to this problem, which greps these two lines (at the same time), gets the corresponding integer values (9 and 13, for instance), compares them and finds the difference between the two values, and finally lists the differences in a new text file together with the corresponding filenames.
I would really appreciate any help given.
I am posting an attempt in GNU Awk and have tested it only there.
#!/bin/bash
for file in *.log
do
awk -F'=[[:blank:]]*' '/Number of/{printf "%s%s",$2,(NR%2?" ":RS)}' "$file" | awk 'function abs(v) {return v < 0 ? -v : v} {print abs($1-$2)}' >> output_"$file"
done
The reason I split the AWK logic in two was to reduce the complexity of doing it in a single huge command. The first part extracts the numbers from your log file in a columnar format, and the second part computes the absolute difference.
To break down the AWK logic:
-F'=[[:blank:]]*' is a multi-character field-delimiter expression consisting of = followed by zero or more [[:blank:]] whitespace characters.
'/Number of/{printf "%s%s",$2,(NR%2?" ":RS)}' selects the lines containing Number of and prints the extracted values in a columnar fashion, i.e. as 9 13 for your sample file.
The second part is self-explanatory: I have written a function to take the absolute value of the difference of the two returned values and print it.
Each output is saved in a file named output_ followed by the original filename, for you to process further.
Run the script from your command line as bash script.sh, where script.sh is the name of the file containing the above lines.
Update:
In case you are interested in negative values too, i.e. without the absolute-value function, change the awk statement to
awk -F'=[[:blank:]]*' '/Number of/{printf "%s%s",$2,(NR%2?" ":RS)}' "$file" | awk '{print ($1-$2)}' >> output_"$file"
Bad way to do it (but it will work)-
while read file
do
first=$(awk -F= '/^Number/ {print $2}' "$file" | head -1)
second=$(awk -F= '/^Number/ {print $2}' "$file" | tail -1)
if [ "$first" -gt "$second" ]
then
echo $(("$first" - "$second"))
else
echo $(("$second" - "$first"))
fi > "$file"_answer ;
done < list_of_files
This method picks up the values (in the awk one-liners) and compares them.
It then subtracts them to give you one value which it saves in the file called "$file"_answer. i.e. the initial file name with '_answer' as a suffix to the name.
You may need to tweak this code to fit your purposes exactly.
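If you would rather collect everything in one summary file with the filename next to each difference, as described in the question, a sketch along these lines might work (the name summary.txt is just an assumption; it also assumes each .log file contains exactly one of each Number of line):

#!/bin/bash
for file in *.log
do
    awk -F'=[[:blank:]]*' -v f="$file" '
        /Number of electrons/ { e = $2 }
        /Number of orbitals/  { o = $2 }
        END { d = e - o; if (d < 0) d = -d; print f, d }   # absolute difference
    ' "$file"
done > summary.txt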

Bash: Variable1 > get first n words > cut > Variable2

I've read so many entries here now and my head is exploding. I can't find the "right" solution; maybe my bad English is part of the reason, and for sure my really low bash skills.
I'm writing a script, which reads the input of an user (me) into a variable.
read TEXT
echo $TEXT
Hello, this is a sentence with a few words.
What I want is (I'm sure) maybe very simple: I now need the first n words in a second variable. Like
$TEXT tr/csplit/grep/truncate/cut/awk/sed/whatever get the first 5 words > $TEXT2
echo $TEXT2
Hello, this is a sentence
I've used, for example, ${TEXT:0:10}, but this also cuts in the middle of a word. And I don't want to use text-file inputs/outputs, just variables. Is there any really low-level, simple solution for it, without losing myself in big, complex code blocks and hundreds of (/[{*+$'-:%"})]... and so on? :(
Thanks a lot for any support!
Using cut could be a simple solution, but the solution below works too, with xargs:
firstFiveWords=$(xargs -n 5 <<< "Hello, this is a sentence with a few words." | awk 'NR>1{exit};1')
$ echo $firstFiveWords
Hello, this is a sentence
From the man page of xargs
-n max-args
    Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size (see the -s option) is exceeded, unless the -x option is given, in which case xargs will exit.
and awk 'NR>1{exit};1' will print the first line from its input.
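For reference, the cut variant mentioned at the top, plus a pure-Bash alternative using an array, might look like this (a sketch, assuming the sentence is already in TEXT and words are separated by single spaces):

TEXT="Hello, this is a sentence with a few words."

# cut: treat spaces as delimiters and keep fields 1-5
TEXT2=$(cut -d' ' -f1-5 <<< "$TEXT")

# pure Bash: split into an array and join the first 5 words again
read -ra words <<< "$TEXT"
TEXT3="${words[*]:0:5}"

echo "$TEXT2"   # Hello, this is a sentence
echo "$TEXT3"   # Hello, this is a sentence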

Count number of grep occurrences and store it in a variable

I want to do something like this - grep for a string in a particular file, store it in a variable and be able to print just the number of occurrences.
#!/bin/bash
count=$(grep *something* *somefile*| wc -l)
echo $count
This always gives a 0 value, when I know it should be more.
This is what I intend to do, but it's taking forever for the script to finish.
if egrep -iq "Android 6.0.1" $filename; then
count=$(egrep -ic "Android 6.0.1" $filename)
echo 'Operating System Version leaked number of times: '$count
fi
I have 7 other such if statements and I am running this for around 20 files.
Any more efficient way to make it faster?
grep has its own counting flag
-c, --count
Suppress normal output; instead print a count of matching lines for
each input file. With the -v, --invert-match option (see below), count
non-matching lines. (-c is specified by POSIX.)
count=$( grep -c 'match' file)
Note that the match part is quoted as well so if you use special characters they are not interpreted by the shell.
Also as stated in the excerpt from that man page multiple matches on a single line will be counted as a single match as it only counts matching lines:
$ echo "hello hello hello hello
> hello
> bye" | grep -c "hello"
2
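Applied to the script in the question, this means each file only needs to be scanned once per pattern instead of twice; a sketch using the variable names from the question:

count=$(grep -Eic "Android 6.0.1" "$filename")
if [ "$count" -gt 0 ]; then
    echo "Operating System Version leaked number of times: $count"
fi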
A much more efficient approach would be to run Awk once.
awk -v patterns="foo,bar,baz" 'BEGIN { n=split(patterns, pats, ",") }
{ for (i=1; i<=n; ++i) if ($0 ~ pats[i]) ++hits[i] }
END { for (i=1; i<=n; ++i) printf("%8d%s\n", hits[i], pats[i]) }' list of files
For bonus points, format the output in machine-readable format (depending on where it ends up, JSON might be a good choice); and/or add the human-readable explanation for the significance of each hit to the END block.
If that's not what you want, running grep -Eic and ditching any zero value would already improve your run time over grepping the file twice for each match in the worst case. (The pessimal situation would be when the last line and no other line matches your pattern.)
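For instance, the JSON idea for the END block might look something like this (sketch only; the keys are the raw patterns and no escaping is attempted):

END {
    printf("{")
    for (i = 1; i <= n; ++i)
        printf("%s\"%s\": %d", (i > 1 ? ", " : ""), pats[i], hits[i]+0)
    printf("}\n")
}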

Different output for pipe in script vs. command line

I have a directory with files that I want to process one by one and for which each output looks like this:
==== S=721 I=47 D=654 N=2964 WER=47.976% (1422)
Then I want to calculate the average percentage (column 6) by piping the output to AWK. I would prefer to do this all in one script and wrote the following code:
for f in $dir; do
echo -ne "$f "
process $f
done | awk '{print $7}' | awk -F "=" '{sum+=$2}END{print sum/NR}'
When I run this several times, I often get different results although in my view nothing really changes. The result is almost always incorrect though.
However, if I only put the for loop in the script and pipe to AWK on the command line, the result is always the same and correct.
What is the difference and how can I change my script to achieve the correct result?
I'm guessing a little about what you're trying to do; without more details it's hard to say exactly what is going wrong.
for f in $dir; do
unset TEMPVAR
echo -ne "$f "
TEMPVAR=$(process "$f" | awk '{print $6}')
ARRAY+=("$TEMPVAR")
done
I would append all your values to an array inside your for loop. Now all your percentages are in $ARRAY. It should be easy to calculate the average value, using whatever tool you like.
This will also help you troubleshoot. If you get too few elements in the array (check with ${#ARRAY[@]}), then you will know that your loop is terminating early.
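A possible way to finish the job from the array, as a sketch (assuming each element looks like WER=47.976%, as in the sample output line):

# strip the WER= prefix and the trailing % from every element, then average
sum=0
for v in "${ARRAY[@]}"; do
    v=${v#WER=}
    v=${v%\%}
    sum=$(echo "$sum + $v" | bc -l)
done
echo "scale=3; $sum / ${#ARRAY[@]}" | bc -l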
# To get the percentage of all files
Percs=$(sed -r 's/.*WER=([[:digit:].]*).*/\1/' *)
# The divisor
Lines=$(wc -l <<< "$Percs")
# To change new lines into spaces
P=$(echo $Percs)
# Execute one time without the bc. It's easier to understand
echo "scale=3; (${P// /+})/$Lines" | bc
