search through file for data and create new txt file with just that data - bash

I have a txt file that is output from a machine, containing a bunch of writing/data/paragraphs that are not used for graphing purposes, but somewhere in the middle of the file is the actual data that I need to graph. I need to search the file for that data and print it to a new txt file so I can graph it later.
The data in the middle of the file looks like this (each data file can have a different number of rows/columns, and the numbers are separated by spaces):
<> 1 2 3 4 5 6 etc.
A 1.2 1.3 1.4 etc.
B 0.2 0.3 0.4 etc.
C 2.2 2.3 2.4 etc.
etc.
My thinking so far was to grep for '<>' to find the first line (grep '^<>' file), but I'm not sure how I would account for the variable number of rows/columns when trying to find the rest. Also, I am using awk to loop over all .txt files in the directory and print to a new outfile so I can do multiple files at once (so maybe I can do this search/printing in awk as well?).
Edit:
--input/expected output file--
input file
This is the data
Here are some paragraphs
<> 1 2 3
A 1.2 1.3 1.4
B 0.2 0.3 0.4
C 2.2 2.3 2.4
more paragraphs
more paragraphs
output file:
<> 1 2 3
A 1.2 1.3 1.4
B 0.2 0.3 0.4
C 2.2 2.3 2.4
I'm using awk to do this for multiple txt files in a directory.

Here's one in awk. It looks for <> or a decimal number ([0-9]+\.[0-9]+) in a record. If that's not enough, you could expand the decimal-number test to require, say, 3 numbers, something like: /( [0-9]+\.[0-9]+){3}/
$ awk '/<>/||/[0-9]+\.[0-9]+/' foo
<> 1 2 3
A 1.2 1.3 1.4
B 0.2 0.3 0.4
C 2.2 2.3 2.4
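To run this over every .txt file in a directory and write each result to its own output file, a minimal sketch along these lines could work (the data-row pattern and the .out suffix are assumptions for illustration):
#!/bin/bash
# For each input file, keep the "<>" header line and any row that is a letter
# label followed only by space-separated numbers; write the result to <name>.out.
for f in *.txt; do
    awk '/^<>/ || /^[A-Za-z]+( +[0-9]+(\.[0-9]+)?)+ *$/' "$f" > "${f%.txt}.out"
done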

Related

text manipulation by addition and multiplication

I have a text file saved under the name 'test_file' that contains 6 rows and 7 columns, as given below:
0.00 5.8 2.0 5.0 6.0 8.0 0.0
10.00 5.8 2.0 1.0 1.0 1.2 9.6
10.00 9.3 2.2 2.0 1.4 2.5 9.6
30.00 9.3 2.2 1.2 1.5 1.9 1.4
30.00 9.3 2.2 3.2 2.4 1.2 4.1
60.00 9.8 3.5 1.4 2.7 3.2 4.5
I want to do some text manipulation on the second and third columns.
In the third column, the first two rows should keep the same value (2.0 and 2.0), and the next three rows should be the second-row value incremented by 0.2 (2.0+0.2=2.2 for each of them). However, I don't want to change the last row; I want to keep it as it is.
After that, in the second column, the first two rows should be the first two rows of the third column multiplied by 2.9.
Similarly, the next three rows of the second column should be the corresponding rows of the third column multiplied by 4.227.
I don't want to change the values in the other columns at all.
Now I want to change the value of the first two rows of the third column sequentially to 2.1, 2.2, ..., 2.5, followed by the same increment and multiplication.
For example, when I change the first two rows of the third column from the original 2.0 to 2.1, the expected output should be:
0.00 6.09 2.1 5.0 6.0 8.0 0.0
10.00 6.09 2.1 1.0 1.0 1.2 9.6
10.00 9.722 2.3 2.0 1.4 2.5 9.6
30.00 9.722 2.3 1.2 1.5 1.9 1.4
30.00 9.722 2.3 3.2 2.4 1.2 4.1
60.00 9.8 3.5 1.4 2.7 3.2 4.5
and I want to save each output to a differently named file, such as file2.1.txt ... file2.5.txt.
awk to the rescue!
$ awk 'p {print p}
{pp=$0; v=$3; $3+=0.1; $2*=$3/v; p=$0}
END {print pp}' file | column -t
0.00 6.09 2.1 5.0 6.0 8.0 0.0
10.00 6.09 2.1 1.0 1.0 1.2 9.6
10.00 9.72273 2.3 2.0 1.4 2.5 9.6
30.00 9.72273 2.3 1.2 1.5 1.9 1.4
30.00 9.72273 2.3 3.2 2.4 1.2 4.1
60.00 9.8 3.5 1.4 2.7 3.2 4.5
Since you want special treatment for the last record, printing is delayed by one record: each modified line is saved in p and printed when the next record arrives. You also want the last record unmodified, so the original line is saved in pp before modification; at END, pp holds the untouched last record and is printed as-is. The delayed printing thus outputs the modified records, and the last one comes out unmodified.
You can specify number formatting as well but I didn't think it was important...
To run for multiple increments, just add an outer loop
$ for inc in {1..5};
do awk -v inc=$inc '...
... $3+=(inc/10) ...
...' file > file."$inc".txt
done
You can pass the increment (actually 10 times the increment) to the awk script as a variable and use it both in the script and in the output filename. The only change in the awk script is the increment.
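Putting the two together, a complete version might look like this (a sketch; the input name file and the file.N.txt output naming just follow the snippets above):
#!/bin/bash
# inc runs from 1 to 5, i.e. increments of 0.1 to 0.5 on column 3.
for inc in {1..5}; do
    awk -v inc="$inc" '
        p   {print p}
            {pp=$0; v=$3; $3+=(inc/10); $2*=$3/v; p=$0}
        END {print pp}
    ' file > file."$inc".txt
done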
Here's another version if you can't work with the other answer:
awk -v val=2.1 '{                   # "val" is the new value for column 3 on the first two lines
    if (NR==1 || NR==2) {           # if it is the first or second line
        $3 = val                    # set column 3 to val
        $2 = $3*2.9                 # set column 2 to column 3 multiplied by 2.9
    } else if (NR>=3 && NR<=5) {    # else if it is line 3 to 5
        $3 = val+0.2                # set column 3 to val+0.2
        $2 = $3*4.227               # set column 2 to column 3 multiplied by 4.227
    } else $3 = $3                  # touch $3 so the line is rebuilt with the same formatting
    print                           # print the result
}' test_file
The # comments are only explanatory; you can remove them before running if they get in the way when pasting.
Output:
0.00 6.09 2.1 5.0 6.0 8.0 0.0
10.00 6.09 2.1 1.0 1.0 1.2 9.6
10.00 9.7221 2.3 2.0 1.4 2.5 9.6
30.00 9.7221 2.3 1.2 1.5 1.9 1.4
30.00 9.7221 2.3 3.2 2.4 1.2 4.1
60.00 9.8 3.5 1.4 2.7 3.2 4.5
To loop over a range and save the results in different files, you can do it as shown below. I also exposed the other parameters as awk variables so you can set them when running the script:
#!/bin/bash
for val in $(seq 2.1 0.1 2.5)
do
    awk -v val="$val" -v fmul=2.9 -v add=0.2 -v smul=4.227 '{
        if (NR==1 || NR==2) {
            $3 = val
            $2 = $3*fmul
        } else if (NR>=3 && NR<=5) {
            $3 = val+add
            $2 = $3*smul
        } else $3 = $3
        print
    }' test_file > "output$val"
done

Multiple plots from a single text file (gnuplot)

Currently, I have a text file and I'm interested in plotting two different curves from this single file (the x-axis values are the same, column 1; the y-axis values are columns 3 and 4). The plot should go to stdout since I'm working over ssh. The file that I am working with looks like this (filename: tmp):
%Iter duration train_objective valid_objective difference
0 6.0 0.0195735 0.0610958 0.0415223
1 5.0 0.180216 0.191344 0.011128
2 5.0 0.223318 0.241081 0.017763
3 6.0 0.245895 0.262197 0.016302
4 6.0 0.25796 0.28056 0.0226
5 6.0 0.269223 0.291769 0.022546
6 5.0 0.281187 0.298474 0.017287
7 5.0 0.283891 0.305579 0.021688
8 5.0 0.296456 0.307381 0.010925
9 5.0 0.296856 0.315487 0.018631
10 5.0 0.295805 0.321391 0.025586
Total training time is 0:06:27
So far, I can only plot the values corresponding to the 3rd column using the following line:
cat tmp | gnuplot -e "set terminal dumb size 120, 30; set autoscale; plot '-' u 1:3 with lines notitle"
Could someone tell me how I could include the 4th column in the same plot? Is that possible?
Thanks!
There is nothing in your description that rules out the trivial answer:
gnuplot -e "plot 'tmp' u 1:3 with lines, '' u 1:4 with lines"
The terminal choice is not relevant (you used 'set term dumb', but it could just as easily be any other output terminal; connecting via ssh does not prevent that). If you have additional constraints that require a more complicated solution, please add them to the question.
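For example, keeping the dumb terminal from the question, both curves can be drawn in one call (the 'train'/'valid' titles are just illustrative choices):
gnuplot -e "set terminal dumb size 120, 30; set autoscale; plot 'tmp' u 1:3 with lines title 'train', '' u 1:4 with lines title 'valid'"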

diff -u -s, line count (+, -) not giving correct value

I am using diff -u -s file1 file2 and counting the + and - lines for added and deleted lines, as part of file-comparison automation. (A modified line counts as one + and one -.) These counts match the Araxis compare statistics (the script's total added+deleted equals Araxis's changed+deleted+new) for most of the files, but the script total and the Araxis total do not match for a few files.
P.S. I am using Cygwin to run the script on Windows. I tried dos2unix, tail -c 4, etc. in the hope of removing BOM characters, but some of these culprit files do not have a BOM and the counts still do not match. The following are a few sample culprit files:
(1) SIACPO_ActivacionDesactivacionBlacklist.aspx.vb - the script gives a total count of 57, while Araxis gives 55
(2) SIACPO_Suspension_Servicio.aspx - the script gives a total count of 2509, while Araxis gives 2473
(3) repCuadreProceso.aspx - the script gives a total count of 1165, while Araxis gives 1163
(4) detaPago.aspx.vb - This is a strange file. There is no change at all, except a BOM character on the 1st line. The script gives counts of 0, 0, so why is it in the modified list of files at all?
Now, how can I attach these 4 culprit files (both the Dev and Prod versions) for your troubleshooting?
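The question doesn't show how the + and - lines are counted, but one common pitfall with diff -u is that the +++/--- file header lines also start with + and -. A counting sketch that skips them might look like this (file1/file2 are placeholders for the Dev and Prod versions):
#!/bin/bash
# Count added/deleted lines in a unified diff, skipping the two
# "--- file1" / "+++ file2" header lines that diff -u emits.
diff_out=$(diff -u -s file1 file2)
added=$(printf '%s\n' "$diff_out" | tail -n +3 | grep -c '^+')
deleted=$(printf '%s\n' "$diff_out" | tail -n +3 | grep -c '^-')
echo "Added: $added  Deleted: $deleted"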

Comparing unknown number of variable in bash

I have 1 to 4 Linux server names in a configuration file. I have to take those names out of the configuration file and assign each of them a value (a floating point value derived from Linux commands). The number of servers taken out of the configuration file may vary depending on server availability (for example, if a server is down for some reason, we remove it from the configuration file or comment it out), so the maximum number of servers is 4 but it may be fewer. How do I compare the derived values and find the least/minimum of them? It would be great if someone could provide suggestions on this.
To compare two floating point numbers you can use bc. It will print (not return) 0 for false and 1 for true statements:
$ bc <<< '2.01 > 2.1'
0
$ bc <<< '2.1 > 2.01'
1
$ bc <<< '2.01 >= 2.1'
0
$ bc <<< '2.01 >= 2.01'
1
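Building on that, a minimal sketch for finding the minimum of a variable number of values could look like this (the values array is just a stand-in for whatever your commands produce):
#!/bin/bash
# Stand-in values; in practice there would be one per server name
# taken from the configuration file.
values=(3.14 2.71 1.41 9.81)

min=${values[0]}
for v in "${values[@]:1}"; do
    # bc prints 1 when the comparison is true, 0 otherwise
    if [ "$(bc <<< "$v < $min")" -eq 1 ]; then
        min=$v
    fi
done
echo "Minimum: $min"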

gnuplot not recognizing plot for syntax

I am trying to use the for syntax for multiple columns.
I have a data file colhead.dat:
Id a1 a2 a3
1 1 2 3
2 2 3 4
3 2 3 4
Following the answer https://stackoverflow.com/a/17525615/429850, I do
gnuplot> plot for [i=2:5] 'colhead.dat' u 1:i w lp title columnheader(i)
^
':' expected
How do I write the for loop? Here's the gnuplot version header:
Version 4.2 patchlevel 6
last modified Sep 2009
System: Linux 2.6.32-71.el6.x86_64
For-loops were implemented in version 4.6 of gnuplot, and there was nothing like loops in earlier versions, so you have to update your version!
Edit: As Christoph mentioned, the first for functionality was introduced in version 4.4. Either way, 4.2 is too old.
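Once you are on a new enough version, the iteration from the question works; for the sample file, which only has four columns, the range would be 2:4 (a sketch using the dumb terminal so it also runs over ssh):
# Requires gnuplot >= 4.6 for "plot for"; colhead.dat has columns 1 to 4.
gnuplot -e "set terminal dumb; plot for [i=2:4] 'colhead.dat' u 1:i w lp title columnheader(i)"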
