How to extract a path from a string in a shell script

I need to extract a path from a string, e.g.:
title="set key invert ; set bmargin 0 ; set multiplot ; set size 1.0 , 0.33 ; set origin 0.0 , 0.67 ; set format x "" ; set xtics offset 15.75 "1970-01-01 00:00:00" , 31556736 ; plot "/usr/local/lucid/www/tmp/20171003101438149255.dat" using 1:5 notitle with linespoints ls 2'"
Then the expected output should be
/usr/local/lucid/www/tmp/20171003101438149255.dat
using awk or grep.

sed approach:
title='set key invert ; set bmargin 0 ; set multiplot ; set size 1.0 , 0.33 ; set origin 0.0 , 0.67 ; set format x "" ; set xtics offset 15.75 "1970-01-01 00:00:00" , 31556736 ; plot "/usr/local/lucid/www/tmp/20171003101438149255.dat" using 1:5 notitle with linespoints ls 2'
sed 's/.* plot "\([^"]\+\).*/\1/' <<< "$title"
/usr/local/lucid/www/tmp/20171003101438149255.dat

With grep:
grep -oP '"\K/[^"]*(?=")' <<< "$title"
With awk:
awk '{match($0,/\/[^"]*/,a);print a[0]}' <<< "$title"

Shorter regex with grep:
grep -oP 'plot "\K[^"]+' <<< "$title"
/usr/local/lucid/www/tmp/20171003101438149255.dat
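If bash alone is enough, the same extraction can also be done with parameter expansion, no external tools at all. This is a sketch assuming, as above, that the path is the first double-quoted token after `plot `:

```shell
#!/bin/bash
title='set key invert ; plot "/usr/local/lucid/www/tmp/20171003101438149255.dat" using 1:5 notitle'
# strip everything up to and including the quote that follows "plot "
path=${title#*plot \"}
# strip everything from the next double quote onward
path=${path%%\"*}
echo "$path"
# → /usr/local/lucid/www/tmp/20171003101438149255.dat
```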

Related

bash / awk / gnuplot: pre-processing the data for plotting with gnuplot

I am dealing with the analysis of multi-column data, organized in the following format:
#Acceptor DonorH Donor Frames Frac AvgDist AvgAng
lig_608#O2 GLU_166#H GLU_166#N 708 0.7548 2.8489 160.3990
lig_608#O3 THR_26#H THR_26#N 532 0.5672 2.8699 161.9043
THR_26#O lig_608#H15 lig_608#N6 414 0.4414 2.8509 153.3394
lig_608#N2 HIE_163#HE2 HIE_163#NE2 199 0.2122 2.9167 156.3248
GLN_189#OE1 lig_608#H2 lig_608#N4 32 0.0341 2.8899 156.4308
THR_25#OG1 lig_608#H14 lig_608#N5 26 0.0277 2.8906 160.9933
lig_608#O4 GLY_143#H GLY_143#N 25 0.0267 2.8647 146.5977
lig_608#O3 THR_25#HG1 THR_25#OG1 16 0.0171 2.7618 152.3421
lig_608#O2 GLN_189#HE21 GLN_189#NE2 15 0.0160 2.8947 154.3567
lig_608#N7 ASN_142#HD22 ASN_142#ND2 10 0.0107 2.9196 147.8856
lig_608#O4 ASN_142#HD21 ASN_142#ND2 9 0.0096 2.8462 148.4038
HIE_41#O lig_608#H14 lig_608#N5 9 0.0096 2.8693 148.4560
GLN_189#NE2 lig_608#H2 lig_608#N4 7 0.0075 2.9562 153.6447
lig_608#O4 ASN_142#HD22 ASN_142#ND2 4 0.0043 2.8954 158.0293
THR_26#O lig_608#H14 lig_608#N5 2 0.0021 2.8259 156.4279
lig_608#O4 ASN_119#HD21 ASN_119#ND2 1 0.0011 2.8786 144.1573
lig_608#N2 GLU_166#H GLU_166#N 1 0.0011 2.9295 149.3281
My gnuplot script, integrated into Bash, filters the data, selecting only two columns that match these conditions: 1) the ID from the 1st or 3rd column, excluding names starting with "lig"; 2) values from the 5th column that are > 0.05.
#!/bin/bash
output=$(pwd)
# beginning pattern of each processed file
target='HBavg'
# loop each file and create a bar graph
for file in "${output}"/${target}*.log ; do
file_name3=$(basename "$file")
file_name2="${file_name3/.log/}"
file_name="${file_name2/${target}_/}"
echo "visualisation with Gnuplot!"
cat <<EOS | gnuplot > "${output}/${file_name2}.png"
set term pngcairo size 800,600
### conditional xtic labels
reset session
set termoption noenhanced
set title "$file_name" font "Century,22" textcolor "#b8860b"
set tics font "Helvetica,10"
FILE = "$file"
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set yrange [0:1]
set key off
set style fill solid 0.5
set boxwidth 0.9
set grid y
#set xrange[-1:5]
set table \$Filtered
myTic(col1,col2) = strcol(col1)[1:3] eq 'lig' ? strcol(col2) : strcol(col1)
plot FILE u ((y0=column(5))>0.05 ? sprintf("%g %s",y0,myTic(1,3)) : '') w table
unset table
plot \$Filtered u 0:1:xtic(2) w boxes, '' u 0:1:1 w labels offset 0,1
### end of script
EOS
done
Eventually it writes the filtered data into a new table, producing a multi-bar plot (image omitted).
As the plot shows, the bars are pre-sorted by the Y values (corresponding to the 5th column of the initial data). Would it be possible to sort the bars by the alphabetical order of the labels displayed on X instead (changing the order of the displayed bars on the graph)?
Since the original data is always sorted by the 5th column (Frac), would it be possible to re-sort it before providing it to Gnuplot?
The idea may be to pipe it directly into the gnuplot script with awk and sort, e.g.:
plot "<awk -v OFS='\t' 'NR > 1 && \$5 > 0.05' $file | sort -k1,1" using 0:5:xtic(3) with boxes
How could I do the same within my script (where the data is filtered by gnuplot and I only need to sort the bars produced via):
plot \$Filtered u 0:1:xtic(2) w boxes, '' u 0:1:1 w labels offset 0,1
edit: added color alternation
I would stick to external tools for processing the data, then call gnuplot:
#!/bin/bash
{
echo '$data << EOD'
awk 'NR > 1 && $5 > 0.05 {print ($1 ~ /^lig/ ? $2 : $1 ), $5}' file.log |
sort -t ' ' -k1,1 |
awk -v colors='0x4472c4 0xed7d31' '
BEGIN { nc = split(colors,clrArr) }
{ print $0, clrArr[NR % nc + 1] }
'
echo 'EOD'
cat << 'EOF'
set term pngcairo size 800,600
set title "file.log" font "Century,22" textcolor "#b8860b"
set xtics noenhanced font "Helvetica,10"
set xlabel "H-bond donor, residue"
set ylabel "Fraction, %"
set yrange [0:1]
set key off
set boxwidth 0.9
set style fill solid 1.0
plot $data using 0:2:3:xtic(1) with boxes lc rgb var, \
'' using 0:2:2 with labels offset 0,1
EOF
} | gnuplot > file.png
remarks:
The problem with printing the values on top of the bars in Gnuplot is that you can't do it directly from a stream; you need a file or a variable. Here I saved the input data into the $data variable.
You'll be able to expand shell variables in the heredoc if you unquote it (<< 'EOF' => << EOF), but then you have to make sure you escape the $ of $data.
The simplest way to add colors is to add a "color" field in the output of awk, but the sorting would mess it up; that's why I add the color in another awk after the sort.
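The filter-sort-color pipeline from this answer can be tried in isolation on a few rows of the sample data (the /tmp paths here are just for the demonstration):

```shell
#!/bin/bash
# three rows of the H-bond table from the question
cat > /tmp/hb.log <<'EOF'
#Acceptor DonorH Donor Frames Frac AvgDist AvgAng
lig_608#O2 GLU_166#H GLU_166#N 708 0.7548 2.8489 160.3990
THR_26#O lig_608#H15 lig_608#N6 414 0.4414 2.8509 153.3394
GLN_189#OE1 lig_608#H2 lig_608#N4 32 0.0341 2.8899 156.4308
EOF
# keep rows with Frac > 0.05, pick the non-lig ID, sort alphabetically,
# then attach an alternating color in a second awk pass
awk 'NR > 1 && $5 > 0.05 {print ($1 ~ /^lig/ ? $2 : $1), $5}' /tmp/hb.log |
sort -k1,1 |
awk -v colors='0x4472c4 0xed7d31' '
BEGIN { nc = split(colors, clrArr) }
{ print $0, clrArr[NR % nc + 1] }'
# → GLU_166#H 0.7548 0xed7d31
#   THR_26#O 0.4414 0x4472c4
```

The third sample row is dropped because its Frac (0.0341) is below the threshold.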

Gnuplot script error "invalid expression"

Below is the code meant to output successive images for a gif file:
for i in {1..600}
do
python Phy_asg.py $i
gnuplot <<- EOF
unset tics;unset key;unset border
set xrange [-15:15]
set yrange [-15:15]
set arrow 1 from 0.012*$i,cos(0.012*$i)-pi to sin(0.024*$i),cos(0.012*$i ) nohead ls 8 lw 2
set arrow 2 from sin(0.024*$i)+pi,0.012*$i to sin(0.024*$i),cos(0.012*$i ) nohead ls 8 lw 2
plot "< seq -9 .2 -3.1" u (cos(2*$1)):($1) with lines
replot "< seq -9 .2 -3.1" u ($1):(cos(2*$1)) with lines
replot "data_asg.txt" with lines lt 22 lw 2
set terminal png size 512,512
set output "Phy_gif_$i.png"
replot
EOF
done
Here Phy_asg.py is a Python script that produces data in the form of a text file named data_asg.txt. The shell gives me an error on line 10. It says:
gnuplot> plot "< seq -9 .2 -3.1" u (cos(2*)):() with lines
^
line 0: invalid expression
I am not able to figure out the problem. Is it with the seq command, or is it a formatting error?
The $1 is interpreted as a shell parameter instead of a data column. Either escape the dollar (\$1) or use column(1); I prefer the latter:
for i in {1..600}
do
python Phy_asg.py $i
gnuplot <<- EOF
set terminal png size 512,512
set output "Phy_gif_$i.png"
unset tics;unset key;unset border
set xrange [-15:15]
set yrange [-15:15]
set arrow 1 from 0.012*$i,cos(0.012*$i)-pi to sin(0.024*$i),cos(0.012*$i ) nohead ls 8 lw 2
set arrow 2 from sin(0.024*$i)+pi,0.012*$i to sin(0.024*$i),cos(0.012*$i ) nohead ls 8 lw 2
set style data lines
plot "< seq -9 .2 -3.1" u (cos(2*column(1) )):1, \
"< seq -9 .2 -3.1" u 1:(cos(2*column(1))), \
"data_asg.txt" lt 22 lw 2
EOF
done
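The expansion rule can be checked without gnuplot at all: in an unquoted heredoc the shell expands $i but leaves \$1 alone, which is exactly what gnuplot needs to see (cat stands in for gnuplot here):

```shell
#!/bin/bash
i=7
cat <<EOF
set output "Phy_gif_$i.png"
plot "< seq -9 .2 -3.1" u (cos(2*\$1)):(\$1) with lines
EOF
# → set output "Phy_gif_7.png"
#   plot "< seq -9 .2 -3.1" u (cos(2*$1)):($1) with lines
```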

How to make variable $f define how many "Freq" values will be printed from column number 3?

I need help with my bash script. I have a problem with this code:
for v in $(seq 1 $f)); do echo $(grep "Freq" freq.log) | awk '{print$3}')
because this command prints the whole of column number 3 $f times, when instead it should print $f values of "Freq" from column number 3.
I don't know how to make variable $f define how many "Freq" values will be printed from column number 3. In this file I have plenty of occurrences of "Freq", but I need just $f of them.
To be complete, here is the whole script:
#!/bin/bash
e=$(grep "atomic number" freq.log | tail -1 | awk '{print$2}')
echo "Liczba atomow znajdujacyh sie w podanej czasteczce wynosi: $e"
f=$(bc <<< "($e*3-6)/3")
echo "Liczba wartosci Freq, ktore wczyta skrypt to $f"
for v in $(seq 1 $f); do
echo "$(grep "Freq" freq.log | awk '{print$3}')"
done
Sample input data file (geometry optimization calculations in GAUSSIAN):
A A A
Frequencies -- 182.1477 202.8948 227.7144
Red. masses -- 6.6528 8.2622 6.3837
Frc consts -- 0.1300 0.2004 0.1950
IR Inten -- 0.8602 0.4870 1.2090
NAtoms= 35 NActive= 35 NUniq= 35 SFac= 1.00D+00 NAtFMM= 60 NAOKFM=F Big=F
Here is your bash script converted to a single awk script, script.awk:
/atomic number/ {        # for each line matching the pattern "atomic number"
    e = $2;              # store the current 2nd field in variable e
}
/Freq/ {                 # for each line matching the pattern "Freq"
    freqArr[fr++] = $3;  # append the 3rd field to array freqArr, increment counter fr
}
END {                    # after scanning the whole input file
    print "Liczba atomow znajdujacyh sie w podanej czasteczce wynosi: " e;
    f = ((e * 3) - 6) / 3;  # calculate variable f
    print "Liczba wartosci Freq, ktore wczyta skrypt to " f;
    for (i = 0; i < f && i < fr; i++)  # print only the first f Freq values
        print freqArr[i];
}
Run it with:
awk -f script.awk freq.log
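The same idea can be exercised on a made-up freq.log. Note the "atomic number" line layout below is a simplification (the value sits in field 3 here), so treat it as a sketch; the simplest pure-shell fix is to grep the Freq lines, take field 3, and head the first $f of them:

```shell
#!/bin/bash
# fabricated input: e=4 atoms, so f = (4*3-6)/3 = 2
cat > /tmp/freq.log <<'EOF'
atomic number 4
Frequencies -- 182.1477 202.8948 227.7144
Frequencies -- 300.1000 310.2000 320.3000
Frequencies -- 400.0000 410.0000 420.0000
EOF
e=$(grep 'atomic number' /tmp/freq.log | tail -1 | awk '{print $3}')
f=$(( (e * 3 - 6) / 3 ))
# print only the first f Freq values from column 3
grep 'Freq' /tmp/freq.log | awk '{print $3}' | head -n "$f"
# → 182.1477
#   300.1000
```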

How to subtract values of a specific row value from all the other row values?

My current working file is like this
ID Time A_in Time B_in Time C_in
Ax 0.1 10 0.1 15 0.1 45
By 0.2 12 0.2 35 0.2 30
Cz 0.3 20 0.3 20 0.3 15
Fr 0.4 35 0.4 15 0.4 05
Exp 0.5 10 0.5 25 0.5 10
My columns of interest are those with an "_in" header. In those columns, I want to subtract the value of the row whose ID is "Exp" from all the other row values.
Let's consider the A_in column, where the "Exp" row value is 10: I want to subtract 10 from all the other elements of that A_in column.
My amateur code is like this (I know it is silly):
# This part grabs all the values in the Exp row
Exp=$( awk 'BEGIN{OFS="\t";
PROCINFO["sorted_in"] = "#val_num_asc"}
FNR==1 { for (n=2;n<=NF;n++) { if ($n ~ /_GasOut$/) cols[$n]=n; }}
/Exp/ {
for (c in cols){
shift = $cols[c]
printf shift" "
}
}
' File.txt |paste -sd " ")
Exp_array=($Exp)
z=1
for i in "${Exp_array[@]}"
do
z=$(echo 2+$z | bc -l)
Exp_point=$i
awk -vd="$Exp_point" -vloop="$z" -v '
BEGIN{OFS="\t";
PROCINFO["sorted_in"] = "#val_num_asc"}
function abs(x) {return x<0?-x:x}
FNR==1 { for (n=2;n<=NF;n++) { if ($n ~ /_GasOut$/) cols[$n]=n; }}
NR>2{
$loop=abs($loop-d); print
}
' File.txt
done
My first desired outcome is this:
ID Time A_in Time B_in Time C_in
Ax 0.1 0.0 0.1 10 0.1 35
By 0.2 02 0.2 10 0.2 20
Cz 0.3 10 0.3 05 0.3 05
Fr 0.4 25 0.4 10 0.4 05
Exp 0.5 0.0 0.5 0.0 0.5 0.0
Now, from each "_in" column I want to find the IDs corresponding to the 2 smallest values. So
My second desired outcome is
A_in B_in C_in
Ax Cz Cz
By Exp Fr
Exp Exp
Perl to the rescue!
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

@ARGV = (@ARGV[0, 0]);  # Read the input file twice.
my @header = split ' ', <>;
my @in = grep $header[$_] =~ /_in$/, 0 .. $#header;
$_ = <> until eof;
my @exp = split;
my @min;
<>;
while (<>) {
    my @F = split;
    for my $i (@in) {
        $F[$i] = abs($F[$i] - $exp[$i]);
        @{ $min[$i] }[0, 1]
            = sort { $a->[0] <=> $b->[0] }
              [$F[$i], $F[0]], grep defined, @{ $min[$i] // [] }
            unless eof;
    }
    say join "\t", @F;
}
print "\n";
say join "\t", @header[@in];
for my $index (0, 1) {
    for my $i (@in) {
        next unless $header[$i] =~ /_in$/;
        print $min[$i][$index][1], "\t";
    }
    print "\n";
}
It reads the file twice. In the first read, it just remembers the first line as the @header array and the last line as the @exp array.
In the second read, it subtracts the corresponding exp value from each _in column. It also stores the two least numbers in the @min array at the position corresponding to the column position.
Formatting the numbers (i.e. 0.0 instead of 0 and 02 instead of 2) is left as an exercise for the reader. Same with redirecting the output to several different files.
After some fun and an hour or two I wrote this abomination:
cat <<EOF >file
ID Time A_in Time B_in Time C_in
Ax 0.1 10 0.1 15 0.1 45
By 0.2 12 0.2 35 0.2 30
Cz 0.3 20 0.3 20 0.3 15
Fr 0.4 35 0.4 15 0.4 05
Exp 0.5 10 0.5 25 0.5 10
EOF
# fix stackoverflow formatting
# input file should be separated with tabs
<file tr -s ' ' | tr ' ' '\t' > file2
mv file2 inputfile
# read headers to an array
IFS=$'\t' read -r -a hdrs < <(head -n1 inputfile)
# exp line read into an array
IFS=$'\t' read -r -a exps < <(grep -m1 $'^Exp\t' inputfile)
# column count
colcnt="${#hdrs[@]}"
if [ "$colcnt" -eq 0 ]; then
echo >&2 "ERROR - must be at least one column"
exit 1
fi
# numbers of those columns which headers have _in suffix
incolnums=$(
paste <(
printf "%s\n" "${hdrs[@]}"
) <(
# puff, the numbers will start from zero cause bash indexes arrays from zero
# but `cut` indexes fields from 1, so.. just keep in mind it's from 0
seq 0 $((colcnt - 1))
) |
grep $'_in\t' |
cut -f2
)
# read the input file
{
# preserve header line
IFS= read -r hdrline
( IFS=$'\t'; printf "%s\n" "$hdrline" )
# ok. read the file field by field
# I think we could awk here
while IFS=$'\t' read -r -a vals; do
# for each column number with _in suffix
while IFS= read -r incolnum; do
# update the column value
# I use bc for float calculations
vals[$incolnum]=$(bc <<-EOF
define abs(i) {
if (i < 0) return (-i)
return (i)
}
scale=2
abs(${vals[$incolnum]} - ${exps[$incolnum]})
EOF
)
done <<<"$incolnums"
# output the line
( IFS=$'\t'; printf "%s\n" "${vals[*]}" )
done
} < inputfile > MyFirstDesiredOutcomeIsThis.txt
# ok so, first part done
{
# output headers names with _in suffix
printf "%s\n" "${hdrs[@]}" |
grep '_in$' |
tr '\n' '\t' |
# omg, fix tr, so stupid
sed 's/\t$/\n/'
# puff
# output the corresponding ID of 2 smallest values of the specified column number
# @arg: $1 column number
tmpf() {
# remove header line
<MyFirstDesiredOutcomeIsThis.txt tail -n+2 |
# extract only this column
cut -f$(($1 + 1)) |
# unique numeric sort and extract two smallest values
sort -n -u | head -n2 |
# now, well, extract the id's that match the numbers
# append numbers with tab (to match the separator)
# suffix numbers with dollar (to match end of line)
sed 's/^/\t/; s/$/$/;' |
# how good is grep at buffering(!)
grep -f /dev/stdin <(
<MyFirstDesiredOutcomeIsThis.txt tail -n+2 |
cut -f1,$(($1 + 1))
) |
# extract numbers only
cut -f1
}
# the following is something like foldr $'\t' $(tmpf ...) for each $incolnums
# we need to buffer here, we are joining the output column-wise
output=""
while IFS= read -r incolnum; do
output=$(<<<"$output" paste - <(tmpf "$incolnum"))
done <<<"$incolnums"
# because we start with empty $output, paste inserts leading tabs
# remove them ... and finally output $output
<<<"$output" cut -f2-
} > MySecondDesiredOutcomeIs.txt
# fix formatting to post it on stackoverflow
# files have tabs, and column will output them with space
# which is just enough
echo '==> MyFirstDesiredOutcomeIsThis.txt <=='
column -t -s$'\t' MyFirstDesiredOutcomeIsThis.txt
echo
echo '==> MySecondDesiredOutcomeIs.txt <=='
column -t -s$'\t' MySecondDesiredOutcomeIs.txt
The script will output:
==> MyFirstDesiredOutcomeIsThis.txt <==
ID Time A_in Time B_in Time C_in
Ax 0.1 0 0.1 10 0.1 35
By 0.2 2 0.2 10 0.2 20
Cz 0.3 10 0.3 5 0.3 5
Fr 0.4 25 0.4 10 0.4 5
Exp 0.5 0 0.5 0 0.5 0
==> MySecondDesiredOutcomeIs.txt <==
A_in B_in C_in
Ax Cz Cz
By Exp Fr
Exp Exp
Written and tested at tutorialspoint.
I use bash and core-/more-utils to manipulate the file. First I identify the numbers of the columns whose headers end with the _in suffix. Then I buffer the values stored in the Exp line.
Then I just read the file line by line, field by field, and for each field belonging to a column whose header ends with the _in suffix, I subtract the corresponding Exp-line value from the field value. I think this part should be the slowest (I use a plain while IFS=$'\t' read -r -a vals), but some smart awk scripting could speed this process up. This generates your "first desired output", as you called it.
Then I need to output only the header names ending with the _in suffix. Then for each such column number, I need to identify the 2 smallest values in the column; I use a plain sort -n -u | head -n2. Then it gets a little tricky: I need to extract the IDs that hold one of those 2 smallest values in the column. This is a job for grep -f. I prepare proper regexes from the values using sed and let grep -f /dev/stdin do the filtering job.
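That grep -f step can be reproduced on a single tab-separated ID/value column (GNU sed is assumed, since \t in the replacement becomes a literal tab):

```shell
#!/bin/bash
# one subtracted column from the first outcome: ID <TAB> value
printf 'Ax\t0\nBy\t2\nCz\t10\nFr\t25\nExp\t0\n' > /tmp/col
# two smallest unique values -> "<TAB>value$" patterns -> matching IDs
cut -f2 /tmp/col | sort -n -u | head -n2 |
sed 's/^/\t/; s/$/$/' |
grep -f /dev/stdin /tmp/col |
cut -f1
# → Ax
#   By
#   Exp
```

Both Ax and Exp match because they share the smallest value, 0.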
Please just ask 1 question at a time. Here's how to do the first thing you asked about:
$ cat tst.awk
BEGIN { OFS="\t" }
NR==FNR { if ($1=="Exp") split($0,exps); next }
FNR==1 { $1=$1; print; next }
{
for (i=1; i<=NF; i++) {
val = ( (i-1) % 2 ? $i : exps[i] - $i )
printf "%s%s", (val < 0 ? -val : val), (i<NF ? OFS : ORS)
}
}
$ awk -f tst.awk file file
ID Time A_in Time B_in Time C_in
0 0.1 0 0.1 10 0.1 35
0 0.2 2 0.2 10 0.2 20
0 0.3 10 0.3 5 0.3 5
0 0.4 25 0.4 10 0.4 5
0 0.5 0 0.5 0 0.5 0
The above will work efficiently and robustly using any awk in any shell on every UNIX box.
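The two-file idiom can be exercised quickly on a cut-down sample (just the header, one data row, and the Exp row); this harness simply re-enters the script shown above:

```shell
#!/bin/bash
cat > /tmp/file <<'EOF'
ID Time A_in Time B_in Time C_in
Ax 0.1 10 0.1 15 0.1 45
Exp 0.5 10 0.5 25 0.5 10
EOF
cat > /tmp/tst.awk <<'EOF'
BEGIN { OFS="\t" }
NR==FNR { if ($1=="Exp") split($0,exps); next }
FNR==1 { $1=$1; print; next }
{
  for (i=1; i<=NF; i++) {
    val = ( (i-1) % 2 ? $i : exps[i] - $i )
    printf "%s%s", (val < 0 ? -val : val), (i<NF ? OFS : ORS)
  }
}
EOF
awk -f /tmp/tst.awk /tmp/file /tmp/file
# second data row → 0  0.1  0  0.1  10  0.1  35
```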
If after reading this, re-reading the previous awk answers you've received, and looking up the awk man page you still need help with the 2nd thing you asked about then ask a new standalone question just about that.

I can't get the results in gnuplot using shell scripting

I have to make a shell script with gnuplot to graph the data usage of my interface wlan0. Here's what I've tried:
==> for gnuplot:
set title "Data usage over the current hour"
unset multiplot
set xdata time
set style data lines
set term png
set timefmt '"%M:%S"'
set xrange ['"00:00"':'"59:59"']
set xlabel "Time"
set ylabel "Traffic"
set autoscale y
set output "datausage.png"
plot "monitor.dat" using 1:2 t "RX" w lines, "monitor.dat" using 1:3 t "TX" w lines
==> and this is my shell script:
#!/bin/bash
interface=$1
mkdir -p /tmp/netiface
while true;
do
recus=$(ifconfig "$interface" | awk -F":" 'NR == 8 { print $2}' | awk '{print $2}' | cut -d "(" -f 2)
transmis=$(ifconfig "$interface" | awk -F":" 'NR == 8 { print $3}' | awk '{print $2}' | cut -d "(" -f 2)
date=`date +%M:%S`
echo -e "$date $recus $transmis">>/tmp/netiface/monitor.dat
sleep 1
done
==> monitor.dat
08:18 823.6 121.4
08:19 823.6 121.4
08:20 823.6 121.4
08:21 823.6 121.4
08:22 823.7 121.5
08:23 824.3 121.5
08:24 824.6 121.5
08:25 824.6 121.5
08:26 824.6 121.5
08:27 824.6 121.5
08:28 824.6 121.5
08:29 824.6 121.5
08:30 824.6 121.5
08:31 824.6 121.5
08:32 824.6 121.5
but when I execute all of this, the resulting plot is not what I expect (image omitted). How should I change my script so that my data is plotted correctly?
Have you tried it in the console? I get a warning about an empty x range.
Your xrange is causing the problem; it looks like a time-parsing issue. Try set xrange [*:*] and you'll see the data. Experimenting with range values should tune you in to the right ranges.
Just don’t use so many quotes:
set timefmt "%M:%S"
set xrange ["00:00":"59:59"]
Bonus: your output will look nicer using:
set format x "%M:%S"
set xtics "15:00"
(I changed the last point in your example file to get more than a tiny little line.)
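Putting the fixes together, the whole gnuplot script would look roughly like this (a sketch; the "15:00" tic interval is a suggestion, not from the original):

```gnuplot
set title "Data usage over the current hour"
set xdata time
set timefmt "%M:%S"              # matches the MM:SS stamps in monitor.dat
set xrange ["00:00":"59:59"]
set format x "%M:%S"
set xtics "15:00"
set xlabel "Time"
set ylabel "Traffic"
set autoscale y
set style data lines
set term png
set output "datausage.png"
plot "monitor.dat" using 1:2 t "RX", "monitor.dat" using 1:3 t "TX"
```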
