Explanation of an assignment - bash

I am NOT looking for an answer to this problem. I am having trouble understanding what I should be trying to accomplish in this assignment. I welcome pseudocode or hints if you would like, but what I really need is an explanation of what I need to be making and what the output should look like. Please do not write out a lot of code, though; I would like to try that on my own.
(()) = notes from me
The assignment is:
a program (prog.exe) ((we are given this program)) that reads 2 integers (m, n) and 1 double (a) from an input data file named input.in. For example, the given sample input.in file contains the values
5 7 1.23456789012345
when you run ./prog.exe the output is a long column of floating-point numbers
In addition to the program, there is a file called ain.in that contains a long column of double-precision values.
Copy prog.exe and ain.in to your working directory.
Write a bash script that does the following:
-Runs ./prog.exe for all combinations of
--m=0,1,...,10
--n=0,1,...,5
--a=every value in the file ain.in
-This is essentially a triple nested loop over m, n and the ain.in values
-For each combination of m, n and ain.in value above:
-- generate the appropriate input file input.in
-- run the program and redirect the output to some temporary output file.
--extract the 37th and 51st values from this temporary output file and store them in a file called average.in
-When the 3 nested loops terminate, the average.in file should contain a long list of floating-point values.
-Your script should return the average of the values contained in average.in
HINTS: seq, awk, and output redirection will be useful here
Thank you to whoever took the time to even read through this.
This is my second bash coding assignment and I'm still trying to get a grasp on it; a better explanation would be very helpful. Thanks again!

This is one way of generating all input combinations without explicit loops; joining on a nonexistent field (-j9) makes every line of one input pair with every line of the other, producing the cross product:
join -j9 <(join -j9 <(seq 0 10) <(seq 0 5)) ain.in | cut -d' ' -f2-

The idea is to write a bash script that will test prog.exe with a variety of input conditions. This means recreating input.in and running prog.exe many times. Each time you run prog.exe, input.in should contain a different set of three numbers, e.g.,
First run:
0 0 <first line of ain.in>
Second run:
0 0 <second line of ain.in>
. . . last run:
10 5 <last line of ain.in>
You can use seq and for loops to accomplish this.
Then, you need to systematically save the output of each run, e.g.,
./prog.exe > tmp.out
# extract lines 37 and 51 and append them to average.in
sed -n '37p; 51p; 51q' tmp.out >> average.in
Finally, after testing all the combinations, use awk to compute the average of all the lines in average.in.
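Putting those pieces together, a minimal skeleton of the script (only a sketch, assuming prog.exe and ain.in are in the current directory) might look like this:
#!/bin/bash
> average.in                                  # start with an empty results file
for m in $(seq 0 10); do
  for n in $(seq 0 5); do
    while read -r a; do                       # one a value per line of ain.in
      echo "$m $n $a" > input.in              # regenerate the input file
      ./prog.exe > tmp.out                    # run the program, capture its output
      sed -n '37p;51p;51q' tmp.out >> average.in   # keep the 37th and 51st values
    done < ain.in
  done
done
awk '{sum += $1; n++} END {print sum / n}' average.in   # print the average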

One-liner inspired by #karakfa:
join -j9 <(join -j9 <(seq 0 10) <(seq 0 5)) ain.in | cut -d' ' -f2- |
sed "s/.*/echo & >input.in;./prog.exe>tmp.out; sed -n '37p;51p;51q' tmp.out/" |
sh | awk '{sum+=$1; n++} END {print sum/n}'

How to loop a variable range in cut command

I have a file with 2 columns, and I want to use the values from the second column to set the range in the cut command to select a range of characters from another file. The range I want is the character at the position given by the value in the second column plus the next 10 characters. I will give an example below.
My files are something like that:
File with 2 columns and no blank lines between lines (file1.txt):
NAME1 10
NAME2 25
NAME3 48
NAME4 66
File that I want to extract the variable ranges of characters from (just one very long line with no spaces) (file2.txt):
GATCGAGCGGGATTCTTTTTTTTTAGGCGAGTCAGCTAGCATCAGCTACGAGAGGCGAGGGCGGGCTATCACGACTACGACTACGACTACAGCATCAGCATCAGCGCACTAGAGCGAGGCTAGCTAGCTACGACTACGATCAGCATCGCACATCGACTACGATCAGCATCAGCTACGCATCGAAGAGAGAGC
Desired resulting file, one sequence per line (result.txt):
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
The resulting file would have the characters from positions 10-20, 25-35, 48-58 and 66-76, each range on a new line. So it would always keep a range of 10 characters, but with different start points, and those start points are set by the values in the second column of the first file.
I tried the command:
for i in $(awk '{print $2}' file1.txt);
do
p1=$i;
p2=`expr "$1" + 10`
cut -c$p1-$2 file2.txt > result.txt;
done
I don't get any output or error message.
I also tried:
while read line; do
set $line
p2=`expr "$2" + 10`
cut -c$2-$p2 file2.txt > result.txt;
done <file1.txt
This last command gives me an error message:
cut: invalid range with no endpoint: -
Try 'cut --help' for more information.
expr: non-integer argument
There's no need for cut here; dd can do the job of indexing into a file and reading only the number of bytes you want. (Note that status=none is a GNUism; you may need to leave it out on other platforms and instead redirect stderr if you want to suppress informational logging.)
while read -r name index _; do
dd if=file2.txt bs=1 skip="$index" count=10 status=none
printf '\n'
done <file1.txt >result.txt
This approach avoids excessive memory requirements (as present when reading the whole of file2 -- assuming it's large), and has bounded performance requirements (overhead is equal to starting one copy of dd per sequence to extract).
Using awk
$ awk 'FNR==NR{a=$0; next} {print substr(a,$2+1,10)}' file2 file1
GATTCTTTTT
GGCGAGTCAG
CGAGAGGCGA
TATCACGACT
If file2.txt is not too large, you can read it into memory and use Bash substrings to extract the desired ranges:
data=$(<file2.txt)
while read -r name index _; do
echo "${data:$index:10}"
done <file1.txt >result.txt
This will be much more efficient than running cut or another process for every single range definition.
(Thanks to #CharlesDuffy for the tip to read data without a useless cat, and the while loop.)
One way to solve it:
#!/bin/bash
while read line; do
pos=$(echo "$line" | cut -f2 -d' ')
x=$(head -c $(( $pos + 10 )) file2.txt | tail -c 10)
echo "$x"
done < file1.txt > result.txt
It's not the solution an experienced bash hacker would use, but it is very good for someone who is new to bash. It uses tools that are very versatile, although somewhat slow if you need high performance. Shell scripting is commonly done by people who rarely write shell scripts but know a few commands and just want to get the job done; that's why I'm including this solution, even if the other answers are superior for more experienced people.
The first line of the loop body is pretty easy: it just extracts the position from each line of file1.txt. The second line uses the very versatile tools head and tail. Usually they are used with lines rather than characters; nevertheless, I print the first pos + 10 characters with head, and the result is piped into tail, which prints the last 10 characters.
Thanks to #CharlesDuffy for improvements.

How to find integer values and compare them then transfer the main files?

I have some output files (5000 .log files) which are the results of QM computations. Inside each file there are two special lines that indicate the number of electrons and orbitals, like the example below (with exactly the spacing used in the output files):
Number of electrons = 9
Number of orbitals = 13
I thought about a script (bash or Fortran) as a solution to this problem, which greps these two lines (at the same time), gets the corresponding integer values (9 and 13, for instance), compares them and finds the difference between the two values, and finally lists them in a new text file together with the corresponding filenames.
I would really appreciate any help given.
I'm posting an attempt in GNU Awk, and have tested it only with that.
#!/bin/bash
for file in *.log
do
awk -F'=[[:blank:]]*' '/Number of/{printf "%s%s",$2,(NR%2?" ":RS)}' "$file" | awk 'function abs(v) {return v < 0 ? -v : v} {print abs($1-$2)}' >> output_"$file"
done
The reason I split the AWK logic into two was to reduce the complexity of doing it in a single huge command. The first part extracts the numbers from your log file in a columnar format and the second gets the absolute value of their difference.
I will break down the AWK logic:
-F'=[[:blank:]]*' is a multi-character delimiter consisting of = followed by any number of [[:blank:]] whitespace characters.
'/Number of/{printf "%s%s",$2,(NR%2?" ":RS)}' matches the lines containing Number of and prints the two values on one line, i.e. as 9 13 for your sample file.
The second part is self-explanatory: I have written a function to get the absolute value of the difference between the two returned values and print it.
Each result is saved in a file named output_<logfile>, for you to process further.
Run the script from your command line as bash script.sh, where script.sh is the file containing the above lines.
Update:-
In case you are also interested in negative values, i.e. without the absolute value function, change the awk statement to
awk -F'=[[:blank:]]*' '/Number of/{printf "%s%s",$2,(NR%2?" ":RS)}' "$file" | awk '{print ($1-$2)}' >> output_"$file"
A bad way to do it (but it will work):
while read file
do
first=$(awk -F= '/^Number/ {print $2}' "$file" | head -1)
second=$(awk -F= '/^Number/ {print $2}' "$file" | tail -1)
if [ "$first" -gt "$second" ]
then
echo $(("$first" - "$second"))
else
echo $(("$second" - "$first"))
fi > "$file"_answer ;
done < list_of_files
This method picks up the two values (in the awk one-liners) and compares them.
It then subtracts them to give you one value, which it saves in a file called "$file"_answer, i.e. the initial filename with '_answer' appended.
You may need to tweak this code to fit your purposes exactly.
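If you would rather end up with a single list of filenames and differences, as the question describes, something along these lines could work. This is only a sketch: it assumes GNU awk (for BEGINFILE/ENDFILE) and exactly one electrons line and one orbitals line per file, and differences.txt is just an example output name:
awk -F'=[[:blank:]]*' '
    BEGINFILE {e = o = ""}                          # reset the values for each file
    /Number of electrons/ {e = $2}
    /Number of orbitals/  {o = $2}
    ENDFILE {d = e - o; if (d < 0) d = -d; print FILENAME, d}   # filename and absolute difference
' *.log > differences.txt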

How to process each grep (with -A option) result separately? Not by line, but by item

I have a long log to parse; the log message for each event usually spans more than one line. If I hardcode the number of lines per event, then I can use grep -A $EventLineNumbers to grab the whole message of each event.
Now, for example, I want to grep for two data fields in one event whose message is 20 lines long. That event might be logged 100 times in the log. I want to avoid processing the grep results line by line, because if one data field doesn't exist in one of the events, I will end up taking data from a different event and "bundling" it with the previous event.
In short, how do I process each grep individually given that I have set a -A option in the command?
For this kind of parsing, I would use awk(1). Compared to grep(1), it allows you to easily use variables and conditions. So you can:
match on the line(s) indicating the start of a bunch of lines you are interested in
set a variable to remember this state
process lines following, remember relevant fields
when you reach the end of this bunch of lines, print the formatted information
If in fact, those lines are not interesting, you can reset the state and skip to the next interesting message.
The command will be longer than a simple grep, but it remains concise and you don't need a full-blown programming language; a sketch of the idea follows.
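A minimal sketch of that state machine, assuming (purely as placeholders) that each event starts with a line containing START-OF-EVENT and that the two fields of interest look like fieldA=... and fieldB=...:
awk '
    /START-OF-EVENT/ {                  # a new event begins: emit the previous one
        if (inevent) print a, b
        inevent = 1; a = b = "missing"  # reset the remembered fields
        next
    }
    inevent && /fieldA=/ {a = $0}       # remember the relevant lines of this event
    inevent && /fieldB=/ {b = $0}
    END {if (inevent) print a, b}       # flush the last event
' logfile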
This should work:
Example Input:
1
2
3
1
4
5
6
2
4
Assuming that all blocks are 3 lines long and that you want to match the blocks containing a 1, the following code works:
echo -e '1\n2\n3\n1\n4\n5\n6\n2\n4' | grep 1 -A2 | awk -v n=3 '{print} NR % n == 0 { printf "\0" }' | xargs -0 -n 1 echo -n "WOOT: "
Explanation:
The grep 1 -A2 is your expression to find the relevant blocks. I assume that the matching expression occurs only once per block. The -A2 is for the two lines of context after the match.
The awk -v n=3 '{print} NR % n == 0 { printf "\0" }' part instructs awk to print a \0 character every n lines (here 3). Again, adjust this for the relevant block size. We need this for the next command.
The xargs -0 -n1 part executes the next command (which would be your extra filters), giving it one whole block at a time: -0 for null-terminated items (a block) and -n1 for passing only a single item to the next command.
In my example this will give the following output:
WOOT: 1
2
3
WOOT: 1
4
5
Meaning that echo was executed for each block exactly once.

Reverse lines in a file two by two

I'm trying to reverse the lines in a file, but I want to do it two lines at a time.
For the following input:
1
2
3
4
…
97
98
I would like the following output:
97
98
…
3
4
1
2
I found lots of ways to reverse a file line by line (especially on this topic: How can I reverse the order of lines in a file?).
tac. The simplest, but it doesn't seem to have an option for what I want, even though I tried playing around with the -r and -s options.
tail -r. Not POSIX compliant, and my version doesn't seem to be able to do this anyway.
That leaves three sed formulas, and I think a small modification would do the trick. But I don't even understand what they're doing, so I'm stuck here.
sed '1!G;h;$!d'
sed -n '1!G;h;$p'
sed 'x;1!H;$!d;x'
Any help would be appreciated. I'll try to understand these formulas and to answer this question by myself.
Okay, I'll bite. In pure sed, we'll have to build the complete output in the hold buffer before printing it (because we see the stuff we want to print first last). A basic template can look like this:
sed 'N;G;h;$!d' filename # Incomplete!
That is:
N # fetch another line, append it to the one we already have in the pattern
# space
G # append the hold buffer to the pattern space.
h # save the result of that to the hold buffer
$!d # and unless the end of the input was reached, start over with the next
# line.
The hold buffer always contains the reversed version of the input processed so far, and the code takes two lines and glues them to the top of that. In the end, it is printed.
This has two problems:
If the number of input lines is odd, it prints only the last line of the file, and
we get a superfluous empty line at the end of the input.
The first is because N bails out if no more lines exist in the input, which happens on the last cycle with an odd number of input lines; we can solve the problem by executing it only when the end of the input has not yet been reached. Just like the $!d above, this is done with $!N, where $ is the end-of-input address and ! negates it.
The second is because at the very beginning the hold buffer contains an empty line that G appends to the pattern space when the code runs for the very first time. Since with $!N we don't know whether the line counter is 1 or 2 at that point, we should inhibit G for both. This can be done with 1,2!G, where 1,2 is a range spanning from line 1 to line 2, so that 1,2!G will run G only if the line counter is not between 1 and 2.
The whole script then becomes
sed '$!N;1,2!G;h;$!d' filename
Another approach is to combine sed with tac, such as
tac filename | sed -r 'N; s/(.*)\n(.*)/\2\n\1/' # requires GNU sed
That is not the shortest possible way to use sed here (you could also use tac filename | sed -n 'h;$!{n;G;};p'), but perhaps easier to understand: Every time a new line is processed, N fetches another line, and the s command swaps them. Because tac feeds us the lines in reverse, this restores pairs of lines to their original order.
The key difference to the first approach is the behavior for an odd number of lines: with the second approach, the first line of the file will be alone without a partner, whereas with the first it'll be the last.
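For example (a quick check with GNU sed), on a five-line file holding 1 through 5 the two commands differ only in where the unpaired line ends up:
printf '%s\n' 1 2 3 4 5 > odd.txt
sed '$!N;1,2!G;h;$!d' odd.txt                    # 5 3 4 1 2 -- the last line sits alone on top
tac odd.txt | sed -r 'N; s/(.*)\n(.*)/\2\n\1/'   # 4 5 2 3 1 -- the first line ends up alone at the bottom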
I would go with this:
tac file | while read -r a && read -r b; do echo "$b"; echo "$a"; done
Here is an awk you can use:
cat file
1
2
3
4
5
6
7
8
awk '{a[NR]=$0} END {for (i=NR;i>=1;i-=2) print a[i-1]"\n"a[i]}' file
7
8
5
6
3
4
1
2
It stores all the lines in an array a, then prints them out in reverse, two at a time.

Copying part of a large file using command line

I have a text file with 2 million lines. Each line has some transaction information.
e.g.
23848923748, sample text, feild2 , 12/12/2008
etc
What I want to do is create a new file from a certain unique transaction number onwards. So I want to split the file at the line where this number exists.
How can I do this from the command line?
I can find the line by doing this:
cat myfile.txt | grep 23423423423
Use sed like this:
sed '/23423423423/,$!d' myfile.txt
Just confirm that the unique transaction number cannot appear as a pattern in some other part of a line (especially before the correct matching line) in your file.
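If there is any doubt, you can anchor the pattern; in the sample data the transaction number starts the line and is followed by a comma, so (as a sketch under that assumption) this is stricter:
sed '/^23423423423,/,$!d' myfile.txt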
There is already a perl answer here, so I'll give one more AWK way :-)
awk 'BEGIN{skip=1} /23423423423/{skip=0} skip!=1{print}' myfile.txt
On a random file in my tmp directory, this is how I output everything from the line matching popd onwards in a file named tmp.sh:
tail -n+`grep -n popd tmp.sh | cut -f 1 -d:` tmp.sh
tail -n+X prints from line number X onwards; grep -n prefixes each matching line with its line number (lineno:line), and cut extracts just the line number from the grep output.
So for your case it would be:
tail -n+`grep -n 23423423423 myfile.txt | cut -f 1 -d:` myfile.txt
And it should indeed match from the first occurrence onwards.
It's not a pretty solution, but how about using -A parameter of grep?
Like this:
mc#zolty:/tmp$ cat a
1
2
3
4
5
6
7
mc#zolty:/tmp$ cat a | grep 3 -A1000000
3
4
5
6
7
The only problem I see in this solution is the 1000000 magic number. Probably someone will know the answer without using such a trick.
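One way to avoid the magic number while keeping grep (just a sketch) is to let the file's own line count serve as the context size:
lines=$(wc -l < myfile.txt)                 # total number of lines in the file
grep 23423423423 -A "$((lines))" myfile.txt # context can never be shorter than the file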
You can probably get the line number using grep and then use tail to print the file from that point into your output file.
Sorry I don't have actual code to show, but hopefully the idea is clear.
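A sketch of that idea (essentially a spelled-out version of the tail/grep answer above), assuming the number matches exactly one line:
line=$(grep -n 23423423423 myfile.txt | cut -d: -f1)   # line number of the match
tail -n "+$line" myfile.txt > newfile.txt              # everything from that line onwards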
I would write a quick Perl script, frankly. It's invaluable for anything like this (relatively simple issues), and as soon as something more complex rears its head (as it will!) you'll need the extra power.
Something like:
#!/bin/perl
my $out = 0;
while (<STDIN>) {
$out = 1 if /23423423423/;
print $_ if $out;
}
and run it using:
$ perl mysplit.pl < input > output
Not tested, I'm afraid.
