How to convert bash shell string to command [duplicate] - bash

This question already has answers here:
How can I store a command in a variable in a shell script?
(12 answers)
Dynamic variable names in Bash
(19 answers)
Closed 1 year ago.
I am running different programs with different configurations. I tried to convert the strings (kmeans and bayes) in the inner loop into the variables I defined at the beginning, so that I can run the programs and capture their console output. kmeans_time and bayes_time are used to record the execution time of each program.
#!/bin/bash
kmeans="./kmeans -m40 -n40 -t0.00001 -p 4 -i inputs/random-n1024-d128-c4.txt"
bayes="./bayes -t 4 -v32 -r1024 -n2 -p20 -s0 -i2 -e2"
kmeans_time=0
bayes_time=0
for n in {1..10}
do
for prog in kmeans bayes
do
output=($(${prog} | tail -1))
${$prog + "_time"}=$( echo $kmeans_time + ${output[1]} | bc)
echo ${output[1]}
done
done
However, I got the following errors. It seems that prog is executed as a literal string instead of the command I defined. Also, the concatenation of the time variable name failed. I've tried various ways. How is this accomplished in Bash?
./test.sh: line 11: kmeans: command not found
./test.sh: line 12: ${$app + "_time"}=$( echo $kmeans_time + ${output[1]} | bc): bad substitution
What I am trying to do is execute the following, which works properly.
kmeans="./kmeans -m40 -n40 -t0.00001 -p 4 -i inputs/random-n1024-d128-c4.txt"
output=($($kmeans | tail -1))
# output[1] is the execution time
echo "${output[1]}"
kmeas_times=$kmeans_times+${output[1]}
I want to iterate over different programs and calculate each of their average execution time

I am vaguely guessing you are looking for printf -v.
The string in bayes is not a valid command, nor a valid sequence of arguments to another program, so I really can't guess what you are hoping for it to do.
Furthermore, output is not an array, so ${output[1]} is not well-defined. Are you trying to get the first token from the line? You seem to have misplaced the parentheses to make output into an array; but you can replace the tail call with a simple Awk script to just extract the token you want.
Your code would always add the value of kmeans_time to output; if you want to use the variable named by $prog you can use indirect expansion to specify the name of the variable, but you will need a temporary variable for that.
Mmmmaybe something like this? Hopefully this should at least show you what valid Bash syntax looks like.
kmeans_time=0
bayes_time=0
for n in {1..10}
do
for prog in kmeans bayes
do
case $prog in
kmeans) cmd=(./kmeans -m40 -n40 -t0.00001 -p 4 -i inputs/random-n1024-d128-c4.txt);;
bayes) cmd=(./bayes -t 4 -v32 -r1024 -n2 -p20 -s0 -i2 -e2);;
esac
output=$("${cmd[@]}" | awk 'END { print $2 }')
var=${prog}_time
printf -v "$var" %i $((${!var} + output))
echo "$output"
done
done
As an alternative to the indirect expansion, maybe use an associative array for the accumulated time. (Bash v4+ only, though.)
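The associative-array idea could look roughly like the sketch below. The fake_kmeans and fake_bayes functions are hypothetical stand-ins that just print a made-up time, so the example runs without the real programs. Note that Bash arithmetic is integer-only; if the real programs report fractional seconds, the addition would have to go through bc or awk instead.

```shell
#!/bin/bash
# Sketch: accumulate per-program totals in an associative array (Bash 4+).
declare -A total=([kmeans]=0 [bayes]=0)

# Hypothetical stand-ins for the real programs; each prints "time: <n>".
fake_kmeans() { echo "time: 3"; }
fake_bayes() { echo "time: 5"; }

for n in {1..10}; do
  for prog in kmeans bayes; do
    # Take the second token of the last output line as the run time.
    t=$("fake_$prog" | awk 'END { print $2 }')
    total[$prog]=$(( ${total[$prog]} + t ))
  done
done

for prog in kmeans bayes; do
  echo "$prog total=${total[$prog]} average=$(( ${total[$prog]} / 10 ))"
done
```

The array key is simply the program name, so no indirect expansion or printf -v is needed.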
If running the two programs alternatingly is not important, your code can probably be simplified.
kmeans () {
./kmeans -m40 -n40 -t0.00001 -p 4 -i inputs/random-n1024-d128-c4.txt
}
bayes () {
./bayes -t 4 -v32 -r1024 -n2 -p20 -s0 -i2 -e2
}
get_output () {
awk 'END { print $2 }'
}
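loop () {
local time=0 output n
for n in {1..10}; do
# "$@" runs the function named by the first argument (kmeans or bayes)
output=$("$@" | get_output)
time=$((time+output))
echo "$output"
done
# Store the total in kmeans_time or bayes_time
printf -v "${1}_time" %i "$time"
}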
loop kmeans
loop bayes
Maybe see also http://mywiki.wooledge.org/BashFAQ/050 ("I'm trying to put a command in a variable, but the complex cases always fail").

Related

How do I recursively replace part of a string with another given string in bash?

I need to write bash script that converts a string of only integers "intString" to :id. intString always exists after /, may never contain any other types (create_step2 is not a valid intString), and may end at either a second / or end of line. intString may be any 1-8 characters. Script needs to be repeated for every line in a given file.
For example:
/sample/123456/url should be converted to /sample/:id/url
and /sample_url/9 should be converted to /sample_url/:id; however, /sample_url_2/ should remain the same.
Any help would be appreciated!
It seems like the long way around the problem to go recursive but then I don't know what problem you are solving. It seems like a good sed command like
sed -E 's/\/[0-9]{1,}/\/:id/g'
could do it in one shot, but if you insist on being recursive then it might go something like this ...
#!/bin/bash
# Replace one /<digits> segment per call, then recurse until none remain.
restring()
{
local s
s="$(printf '%s\n' "$1" | sed -E 's/\/[0-9]{1,}/\/:id/')"
if printf '%s\n' "$s" | grep -Eq '\/[0-9]{1,}' ; then
restring "$s"
else
printf '%s\n' "$s"
fi
}
restring "$1"
now run it
$ ./restring.sh "/foo/123/bar/456/baz/45435/andstuff"
/foo/:id/bar/:id/baz/:id/andstuff
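For the stricter rules in the question (1-8 digits, ending at a / or at end of line), something like the following sketch might be closer. The to_id function name and the sample lines are only for illustration. The :a ... ta loop is a swapped-in alternative to the g flag, so that adjacent numeric segments such as /1/2 are both replaced.

```shell
#!/bin/bash
# Sketch: replace every path segment made of 1-8 digits with :id.
# A segment only counts if it is followed by / or by the end of the line.
to_id() {
  sed -E -e ':a' -e 's#/[0-9]{1,8}(/|$)#/:id\1#' -e 'ta'
}

converted=$(printf '%s\n' '/sample/123456/url' '/sample_url/9' '/sample_url_2/' '/a/1/2' | to_id)
printf '%s\n' "$converted"
```

To process a whole file, redirect it into the function, e.g. to_id < urls.txt (file name hypothetical); sed already works line by line.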

Bash command to read a line based on the parameters I pass - perform column-based lookups

I have a file links.txt:
1 a.sh
3 b.sh
6 c.sh
4 d.sh
So, if I pass 1,4 as parameters to another script (master.sh), a.sh and d.sh should be stored in a variable.
sed '3!d' would print the 3rd line, but not the line that starts with 3. For that, you need sed '/^3 /!d'. The problem is you can't combine them for more lines, as this means "Delete everything that doesn't start with a 3", which means all other lines will be missed. So, use sed -n '/^3 /p' instead, i.e. don't print by default and tell sed what lines to print, not what lines to delete.
You can loop over the argument and create a sed script from them that prints the lines, then run sed using this output:
#!/bin/bash
file=$1
shift
for id in "$@" ; do
echo "/^$id /p"
done | sed -nf- "$file"
Run as script.sh filename 3 4.
If you want to remove the id from the output, you can either use
cut -f2 -d' '
or you can modify the generated sed script to do the work
echo "/^$id /s/.* //p"
i.e. only print if the substitution was successful.
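Putting the generated-script idea together, a self-contained sketch could look like this. The temporary file stands in for links.txt, and the lookup function name is made up for the example. It feeds the generated sed script through a process substitution rather than a pipe, and it assumes the IDs contain no regex metacharacters.

```shell
#!/bin/bash
# Temporary stand-in for links.txt, filled with the sample data.
links=$(mktemp)
printf '%s\n' '1 a.sh' '3 b.sh' '6 c.sh' '4 d.sh' > "$links"

# Print the second column of every line whose first column is one of the given IDs.
lookup() {
  # printf repeats the format once per argument, producing one sed command per ID.
  sed -n -f <(printf '/^%s /s/.* //p\n' "$@") "$links"
}

found=$(lookup 4 1)
printf '%s\n' "$found"
rm -f "$links"
```

Note that the results come out in file order, not in the order the IDs were given.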
This loops through each argument and greps for it in the links file. The result is piped into cut where we specify the delimiter as a space with -d flag and the field number as 2 with -f flag. Finally this is appended to the array called files.
links="links.txt"
files=()
for arg in "$@"; do
files=("${files[@]}" $(grep "^$arg " "$links" | cut -d" " -f2))
done
echo "${files[@]}"
Usage:
$ ./master.sh 1 4
a.sh d.sh
Edit:
As pointed out by mklement0, the solution above reads the file once per arg. The following first builds the pattern then reads the file just once.
links="links.txt"
pattern="^$1\s"
for arg in "${@:2}"; do
pattern+="|^$arg\s"
done
files=($(grep -E "$pattern" "$links" | cut -d" " -f2))
echo "${files[@]}"
Usage:
$ ./master.sh 1 4
a.sh d.sh
Here is another example with grep and cut:
#!/bin/bash
for line in $(grep "$1\|$2" links.txt|cut -d' ' -f2)
do
echo $line
done
Example of usage:
./master.sh 1 4
a.sh
d.sh
Why not just store the values and call them at will:
items=()
while read -r num file
do
items[num]="$file"
done<links.txt
for arg
do
echo "${items[arg]}"
done
Now you can use the items array any time you like :)
The following awk solution:
preserves the argument order; that is, the results reflect the order in which the lookup values were specified (as opposed to the order in which the lookup values happen to occur in the file).
If that is not important (i.e., if outputting the results in file order is acceptable), the readarray technique below can be combined with this one-liner, which is a generalized variant of Panta's answer:
grep -f <(printf "^%s\n" "$@") links.txt | cut -d' ' -f2-
performs well, because the input file is only read once; the only requirement is that all key-value pairs fit into memory as a whole (as a single associative Awk array (dictionary)).
works with any lookup values that don't have embedded whitespace.
Similarly, the assumption is that the output column values (containing values such as a.sh in the sample input) have no embedded whitespace. awk doesn't handle quoted fields well, so more work would be needed.
#!/bin/bash
readarray -t files < <(
awk -v idList="$*" '
BEGIN { count=split(idList, idArr); for (i in idArr) idDict[idArr[i]]++ }
$1 in idDict { idDict[$1] = $2 }
END { for (i=1; i<=count; ++i) print idDict[idArr[i]] }
' links.txt
)
# Print results.
printf '%s\n' "${files[@]}"
readarray -t files reads stdin input (<) line by line into array variable files.
Note: readarray requires Bash v4+; on Bash 3.x, such as on macOS, replace this part with
IFS=$'\n' read -d '' -ra files
<(...) is a Bash process substitution that, loosely speaking, presents the output from the enclosed command as if it were (self-deleting) temporary file.
This technique allows readarray to run in the current shell (as opposed to a subshell if a pipeline had been used), which is necessary for the files variable to remain defined in the remainder of the script.
The awk command breaks down as follows:
-v idList="$*" passes the space-separated list of all command-line arguments as a single string to Awk variable idList.
Note that this assumes that the arguments have no embedded spaces, which is indeed the case here and also generally the case with identifiers.
BEGIN { ... } is only executed once, before the individual lines are processed:
split(idList, idArr) splits the input ID list into an array by whitespace and stores the result in idArr.
for (i in idArr) idDict[idArr[i]]++ then converts the (conceptually regular) array into associative array idDict (dictionary), whose keys are the input IDs - this enables efficient lookup by ID later, and also allows storing the lookup result for each ID.
$1 in idDict { idDict[$1] = $2 } is processed for every input line:
Pattern $1 in idDict returns true if the line's first whitespace-separated field ($1) - e.g., 6 - is among the keys (in) of associative array idDict, and, if so, executes the associated action ({...}).
Action { idDict[$1] = $2 } then assigns the second field ($2) - e.g., c.sh - to the idDict entry for key $1.
END { ... } is executed once, after all input lines have been processed:
for (i=1; i<=count; ++i) print idDict[idArr[i]] loops over all input IDs in order and prints each ID's lookup result, which is the value of the dictionary entry with that ID.
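To see the order-preserving behaviour on the sample data, here is a small self-contained sketch. The temporary file and the lookup wrapper are only there to make it runnable. One deliberate deviation: the dictionary entries are initialised to an empty string instead of using ++, so an ID that is not found yields an empty line rather than a leftover counter value.

```shell
#!/bin/bash
# Temporary stand-in for links.txt, filled with the sample data.
links=$(mktemp)
printf '%s\n' '1 a.sh' '3 b.sh' '6 c.sh' '4 d.sh' > "$links"

# Print one result per ID, in the order the IDs were passed.
lookup() {
  awk -v idList="$*" '
    BEGIN { count = split(idList, idArr); for (i in idArr) idDict[idArr[i]] = "" }
    $1 in idDict { idDict[$1] = $2 }
    END { for (i = 1; i <= count; ++i) print idDict[idArr[i]] }
  ' "$links"
}

readarray -t files < <(lookup 4 1)
printf '%s\n' "${files[@]}"
rm -f "$links"
```

Passing 4 before 1 yields d.sh before a.sh, even though the file lists them the other way round.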

Print line after the match in grep [duplicate]

This question already has answers here:
How to show only next line after the matched one?
(14 answers)
Closed 6 years ago.
I'm trying to get the current track running from 'cmus-remote -Q'
It's always underneath this line
tag genre Various
<some track>
Now, I need to keep it simple because I want to add it to my i3 bar. I used
cmus-remote -Q | grep -A 1 "tag genre"
but that grep's the 'tag' line AND the line underneath.
I want ONLY the line underneath.
With sed:
sed -n '/tag genre/{n;p}'
Output:
$ cmus-remote -Q | sed -n '/tag genre/{n;p}'
<some track>
If you want to use grep as the tool for this, you can achieve it by adding another segment to your pipeline:
cmus-remote -Q | grep -A 1 "tag genre" | grep -v "tag genre"
This will fail in cases where the string you're searching for is on two lines in a row. You'll have to define what behaviour you want in that case if we're going to program something sensible for it.
Another possibility would be to use a tool like awk, which allows for greater complexity in the line selection:
cmus-remote -Q | awk '/tag genre/ { getline; print }'
This searches for the string, then gets the next line, then prints it.
Another possibility would be to do this in bash alone:
while read line; do
[[ $line =~ tag\ genre ]] && read line && echo "$line"
done < <(cmus-remote -Q)
This implements the same functionality as the awk script, only using no external tools at all. It's likely slower than the awk script.
You can use awk instead of grep:
awk 'p{print; p=0} /tag genre/{p=1}' file
<some track>
/tag genre/{p=1} - sets a flag p=1 when it encounters tag genre in a line.
p{print; p=0} when p is non-zero then it prints a line and resets p to 0.
I'd suggest using awk:
awk 'seen && seen--; /tag genre/ { seen = 1 }'
when seen is true, print the line.
when seen is true, decrement the value, so that it is no longer true once the desired number of lines has been printed.
when the pattern matches, set seen to the number of lines to be printed.
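Because seen is a counter, the same one-liner generalizes to printing several lines after the match. A rough sketch, where the sample text is a made-up stand-in for cmus-remote -Q output and after_match is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical sample resembling `cmus-remote -Q` output.
sample=$'tag artist Someone\ntag genre Various\nfirst line after\nsecond line after\nthird line after'

# Print the n lines that follow each line matching "tag genre".
after_match() {
  awk -v n="$1" 'seen && seen--; /tag genre/ { seen = n }'
}

one=$(printf '%s\n' "$sample" | after_match 1)
two=$(printf '%s\n' "$sample" | after_match 2)
printf '%s\n' "$one" "--" "$two"
```

With the real tool, the input would come from cmus-remote -Q | after_match 1 instead of the sample string.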

bash script to modify and extract information

I am creating a bash script to modify and summarize information with grep and sed. But it gets stuck.
#!/bin/bash
# This script extracts some basic information
# from text files and prints it to screen.
#
# Usage: ./myscript.sh </path/to/text-file>
#Extract lines starting with ">#HWI"
ONLY=`grep -v ^\>#HWI`
#replaces A and G with R in lines
ONLYR=`sed -e s/A/R/g -e s/G/R/g $ONLY`
grep R $ONLYR | wc -l
The correct way to write a shell script to do what you seem to be trying to do is:
awk '
!/^>#HWI/ {
gsub(/[AG]/,"R")
if (/R/) {
++cnt
}
}
END { print cnt+0 }
' "$@"
Just put that in the file myscript.sh and execute it as you do today.
To be clear - the bulk of the above code is an awk script, the shell script part is the first and last lines where the shell just calls awk and passes it the input file names.
If you WANT to have intermediate variables then you can create/print them with:
awk '
!/^>#HWI/ {
only = $0
onlyR = only
gsub(/[AG]/,"R",onlyR)
print "only:", only
print "onlyR:", onlyR
if (onlyR ~ /R/) {
++cnt
}
}
END { print cnt+0 }
' "$@"
The above will work robustly, portably, and efficiently on all UNIX systems.
First of all, and as @fedorqui commented - you're not providing grep with a source of input, against which it will perform line matching.
Second, there are some problems in your script, which will result in unwanted behavior in the future, when you decide to manipulate some data:
Store matching lines in an array, or a file from which you'll later read values. The variable ONLY is not the right data structure for the task.
By convention, environment variables (PATH, EDITOR, SHELL, ...) and internal shell variables (BASH_VERSION, RANDOM, ...) are fully capitalized. All other variable names should be lowercase. Since variable names are case-sensitive, this convention avoids accidentally overriding environment and internal variables.
Here's a better version of your script, considering these points, but with an open question regarding what you were trying to do in the last line : grep R $ONLYR | wc -l :
#!/bin/bash
# This script extracts some basic information
# from text files and prints it to screen.
#
# Usage: ./myscript.sh </path/to/text-file>
input_file=$1
# Read lines not matching the provided regex, from $input_file
mapfile -t only < <(grep -v '^>#HWI' "$input_file")
#replaces A and G with R in lines
for((i=0;i<${#only[@]};i++)); do
only[i]="${only[i]//[AG]/R}"
done
# DEBUG
printf '%s\n' "Here are the lines, after replace:"
printf '%s\n' "${only[@]}"
# I'm not sure what you were trying to do here. Am I guessing right that you wanted
# to count the number of R's in ALL lines ?
# grep R $ONLYR | wc -l
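If the last line of the original script was meant to count the lines that contain an R after the replacement (which is what grep R ... | wc -l would report), a pure-Bash sketch might look like this. That reading is an assumption, and the temporary file with its contents is invented sample data.

```shell
#!/bin/bash
# Hypothetical sample input standing in for the text file.
input_file=$(mktemp)
printf '%s\n' '>#HWI-1' 'ACGT' 'TTCC' 'GGTA' > "$input_file"

# Keep the lines that do not start with ">#HWI".
mapfile -t only < <(grep -v '^>#HWI' "$input_file")

count=0
for line in "${only[@]}"; do
  line=${line//[AG]/R}                       # replace A and G with R
  if [[ $line == *R* ]]; then                # count lines that now contain an R
    count=$((count + 1))
  fi
done
echo "$count"
rm -f "$input_file"
```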

Setting a BASH environment variable directly in AWK (in an AWK one-liner)

I have a file that has two columns of floating point values. I also have a C program that takes a floating point value as input and returns another floating point value as output.
What I'd like to do is the following: for each row in the original, execute the C program with the value in the first column as input, and then print out the first column (unchanged) followed by the second column minus the result of the C program.
As an example, suppose c_program returns the square of the input and behaves like this:
$ c_program 4
16
$
and suppose data_file looks like this:
1 10
2 11
3 12
4 13
What I'd like to return as output, in this case, is
1 9
2 7
3 3
4 -3
To write this in really sketchy pseudocode, I want to do something like this:
awk '{print $1, $2 - `c_program $1`}' data_file
But of course, I can't just pass $1, the awk variable, into a call to c_program. What's the right way to do this, and preferably, how could I do it while still maintaining the "awk one-liner"? (I don't want to pull out a sledgehammer and write a full-fledged C program to do this.)
you just do everything in awk
awk '{cmd="c_program "$1; cmd|getline l; close(cmd); print $1,$2-l}' file
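To try the cmd | getline approach without the real program, here is a sketch that uses expr as a stand-in for c_program (it squares the first column, as in the question's example). The variable names are arbitrary. Closing the command after each line keeps awk from holding one pipe open per row.

```shell
#!/bin/bash
# Sample data from the question.
data=$'1 10\n2 11\n3 12\n4 13'

result=$(printf '%s\n' "$data" | awk '{
  # Stand-in for the real call, which would be: cmd = "c_program " $1
  cmd = "expr " $1 " \\* " $1
  if ((cmd | getline out) > 0) print $1, $2 - out
  close(cmd)
}')
printf '%s\n' "$result"
```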
This shows how to execute a command in awk:
ls | awk '/^a/ {system("ls -ld " $1)}'
You could use a bash script instead:
while read line
do
FIRST=`echo $line | cut -d' ' -f1`
SECOND=`echo $line | cut -d' ' -f2`
OUT=`expr $SECOND \* 4`
echo $FIRST $OUT `expr $OUT - $SECOND`
done
The shell is a better tool for this using a little used feature. There is a shell variable IFS which is the Input Field Separator that sh uses to split command lines when parsing; it defaults to <Space><Tab><Newline> which is why ls foo is interpreted as two words.
When set is given arguments not beginning with - it sets the positional parameters of the shell to the contents of the arguments as split via IFS, thus:
#!/bin/sh
while read line ; do
set $line
subtrahend=`c_program $1`
echo $1 `expr $2 - $subtrahend`
done < data_file
Pure Bash, without using any external executables other than your program:
#!/bin/bash
while read num1 num2
do
(( result = num2 - $(c_program "$num1") ))
echo "$num1 $result"
done
As others have pointed out, awk is not well equipped for this job. Here is a suggestion in bash:
#!/bin/bash
data_file=$1
while read -r column_1 column_2 the_rest
do
# second column minus the program's result for the first column
((result=column_2-$(c_program "$column_1")))
echo "$column_1" "$result" "$the_rest"
done < "$data_file"
Save this to a file, say myscript.sh, then invoke it as:
bash myscript.sh data_file
The read command reads each line from the data file (which was redirected to the standard input) and assign the first 2 columns to $column_1 and $column_2 variables. The rest of the line, if there is any, is stored in $the_rest.
Next, I calculate the result and print the line as you specified. Note that I surround $the_rest with quotes to preserve spacing; without them, multiple spaces in the input would be squeezed into one.
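One caveat: Bash's (( )) arithmetic handles integers only, while the question mentions floating point values. A possible workaround, sketched below, keeps the read loop but delegates the subtraction to awk. The c_program function here is a hypothetical stand-in that squares its argument, and the sample data is invented.

```shell
#!/bin/bash
# Hypothetical stand-in for the real c_program: prints the square of its argument.
c_program() { awk -v x="$1" 'BEGIN { print x * x }'; }

data=$'1.5 10.25\n2 11'

result=$(
  while read -r col1 col2 _; do
    out=$(c_program "$col1")
    # Bash arithmetic is integer-only, so do the subtraction in awk.
    awk -v a="$col1" -v b="$col2" -v c="$out" 'BEGIN { print a, b - c }'
  done <<< "$data"
)
printf '%s\n' "$result"
```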
