Speed up shell script/Performance enhancement of shell script - bash

Is there a way to speed up the shell script below? It takes me a good 40 minutes to update about 150,000 files every day. Given the volume of files to create and update, this may be acceptable; I don't deny that. However, if there is a much more efficient way to write this, or to rewrite the logic entirely, I'm open to it. Please help.
#!/bin/bash
DATA_FILE_SOURCE="<path_to_source_data/${1}"
DATA_FILE_DEST="<path_to_dest>"
for fname in $(ls -1 "${DATA_FILE_SOURCE}")
do
for line in $(cat "${DATA_FILE_SOURCE}"/"${fname}")
do
FILE_TO_WRITE_TO=$(echo "${line}" | awk -F',' '{print $1"."$2".daily.csv"}')
CONTENT_TO_WRITE=$(echo "${line}" | cut -d, -f3-)
if [[ ! -f "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}" ]]
then
echo "${CONTENT_TO_WRITE}" >> "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
else
if ! grep -Fxq "${CONTENT_TO_WRITE}" "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
then
sed -i "/${1}/d" "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
"${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
echo "${CONTENT_TO_WRITE}" >> "${DATA_FILE_DEST}"/"${FILE_TO_WRITE_TO}"
fi
fi
done
done

Some parts of your posted script are still unclear, like the sed command. Still, I rewrote it with saner practices and far fewer external calls, which should really speed it up.
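#!/usr/bin/env sh
DATA_FILE_SOURCE="<path_to_source_data/$1"
DATA_FILE_DEST="<path_to_dest>"
for fname in "$DATA_FILE_SOURCE/"*; do
  # split each line on the first two commas; the remainder goes into $content
  while IFS=, read -r a b content || [ -n "$a" ]; do
    destfile="$DATA_FILE_DEST/$a.$b.daily.csv"
    # only act when the line is not already there (-s: quiet if the file doesn't exist yet)
    if ! grep -Fxqs -- "$content" "$destfile"; then
      # drop the stale entry (same pattern as the original script) before appending
      [ -f "$destfile" ] && sed -i "/$1/d" "$destfile"
      printf '%s\n' "$content" >>"$destfile"
    fi
  done < "$fname"
done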

Make it parallel (as much as you can). Note that wait -p, used below, requires bash 5.1 or newer.
#!/bin/bash
set -e -o pipefail
declare -ir MAX_PARALLELISM=20 # pick a limit
declare -i pid
declare -a pids
# ...
for fname in "${DATA_FILE_SOURCE}/"*; do
if ((${#pids[@]} >= MAX_PARALLELISM)); then
wait -p pid -n || echo "${pids[pid]} failed with ${?}" 1>&2
unset 'pids[pid]'
fi
while IFS= read -r line; do
FILE_TO_WRITE_TO="..."
# ...
done < "${fname}" & # forking here
pids[$!]="${fname}"
done
for pid in "${!pids[@]}"; do
wait -n "$((pid))" || echo "${pids[pid]} failed with ${?}" 1>&2
done
Here’s a directly runnable skeleton showing how the harness above works (with 36 items to process and 20 parallel processes at most):
#!/bin/bash
set -e -o pipefail
declare -ir MAX_PARALLELISM=20 # pick a limit
declare -i pid
declare -a pids
do_something_and_maybe_fail() {
sleep $((RANDOM % 10))
return $((RANDOM % 2 * 5))
}
for fname in some_name_{a..f}{0..5}.txt; do # 36 items
if ((${#pids[@]} >= MAX_PARALLELISM)); then
wait -p pid -n || echo "${pids[pid]} failed with ${?}" 1>&2
unset 'pids[pid]'
fi
do_something_and_maybe_fail & # forking here
pids[$!]="${fname}"
echo "${#pids[@]} running" 1>&2
done
for pid in "${!pids[@]}"; do
wait -n "$((pid))" || echo "${pids[pid]} failed with ${?}" 1>&2
done
Strictly avoid running external processes (such as awk, grep and cut) once per input line. fork()ing for every line is extremely inefficient compared to:
Running one single awk / grep / cut process on an entire input file (to preprocess all lines at once for easier processing in bash) and feeding the whole output into (e.g.) a bash loop.
Using Bash expansions instead, where feasible, e.g. "${line/,/.}" and other tricks from the EXPANSION section of the man bash page, without fork()ing any further processes.
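A minimal sketch of the expansion approach, assuming input lines shaped like key1,key2,payload... as in the question (the sample line is made up). It computes the same destination name as the per-line awk call and the same payload as cut -d, -f3-, without starting any process:

```bash
#!/bin/bash
# Hypothetical sample line in the question's format: key1,key2,payload...
line='ACME,2023-01-05,10,20,30'

first=${line%%,*}     # everything before the first comma   (awk $1)
rest=${line#*,}       # everything after the first comma
second=${rest%%,*}    # field 2                              (awk $2)
content=${rest#*,}    # fields 3 onward                      (cut -d, -f3-)

file_to_write_to="$first.$second.daily.csv"
printf '%s\n' "$file_to_write_to" "$content"
```

Inside the real loop, these four expansions would replace the two $(echo ... | awk/cut) substitutions, which fork several processes for every line.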
Off-topic side notes:
ls -1 is unnecessary. First, ls won’t write multiple columns unless the output is a terminal, so a plain ls would do. Second, bash expansions are usually a cleaner and more efficient choice. (You can use nullglob to correctly handle empty directories / “no match” cases.)
Looping over the output of cat is a (less common) case of useless use of cat. Feed the file into a bash loop instead and read it line by line. (This also gives you more flexibility in the line format.)
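A small sketch of both side notes together, using throwaway directories and made-up file contents. nullglob makes the empty directory contribute nothing to the loop, and the while read loop keeps "hello world" as one line and the "*" unexpanded, where for line in $(cat file) would split the first and glob the second:

```bash
#!/bin/bash
shopt -s nullglob                 # an unmatched glob expands to nothing

src=$(mktemp -d)                  # throwaway demo directories
empty=$(mktemp -d)
printf '%s\n' 'a,b,hello world' 'c,d,*' > "$src/input1.csv"

count=0
for fname in "$src"/*.csv "$empty"/*.csv; do   # "$empty"/*.csv contributes nothing
  # IFS= and -r keep each line verbatim; the || test handles a last line without a newline
  while IFS= read -r line || [[ -n $line ]]; do
    printf 'line: %s\n' "$line"
    count=$((count + 1))
  done < "$fname"
done
rm -r "$src" "$empty"
```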

Related

bash: Wait for process substitution subshell to finish

How can bash wait for the subshell used in process substitution to finish in the following construct? (This is of course simplified from the real for loop and subshell which I am using, but it illustrates the intent well.)
for i in {1..3}; do
echo "$i"
done > >(xargs -n1 bash -c 'sleep 1; echo "Subshell: $0"')
echo "Finished"
Prints:
Finished
Subshell: 1
Subshell: 2
Subshell: 3
Instead of:
Subshell: 1
Subshell: 2
Subshell: 3
Finished
How can I make bash wait for those subshells to complete?
UPDATE
The reason for using process substitution is that I want to use file descriptors to control what is printed to the screen and what is sent to the process. Here is a fuller version of what I'm doing:
for myFile in file1 file2 file3; do
echo "Downloading $myFile" # Should print to terminal
scp -q $user@$host:$myFile ./ # Might take a long time
echo "$myFile" >&3 # Should go to process substitution
done 3> >(xargs -n1 bash -c 'sleep 1; echo "Processing: $0"')
echo "Finished"
Prints:
Downloading file1
Downloading file2
Downloading file3
Finished
Processing: file1
Processing: file2
Processing: file3
Processing each may take much longer than the transfer. The file transfers should be sequential since bandwidth is the limiting factor. I would like to start processing each file after it is received, without waiting for all of them to transfer. The processing can be done in parallel, but only with a limited number of instances (due to limited memory/CPU). So if the fifth file just finished transferring but only the second file has finished processing, the third and fourth files should complete processing before the fifth file is processed. Meanwhile the sixth file should start transferring.
Bash 4.4 lets you collect the PID of a process substitution with $!, so you can actually use wait, just as you would for a background process:
case $BASH_VERSION in ''|[123].*|4.[0123])
echo "ERROR: Bash 4.4 required" >&2; exit 1;;
esac
# open the process substitution
exec {ps_out_fd}> >(xargs -n1 bash -c 'sleep 1; echo "Subshell: $0"'); ps_out_pid=$!
for i in {1..3}; do
echo "$i"
done >&$ps_out_fd
# close the process substitution
exec {ps_out_fd}>&-
# ...and wait for it to exit.
wait "$ps_out_pid"
Beyond that, consider flock-style locking -- though beware of races:
for i in {1..3}; do
echo "$i"
done > >(flock -x my.lock xargs -n1 bash -c 'sleep 1; echo "Subshell: $0"')
# this is only safe if the "for" loop can't exit without the process substitution reading
# something (and thus signalling that it successfully started up)
flock -x my.lock echo "Lock grabbed; the subshell has finished"
That said, given your actual use case, what you want should presumably look more like:
download() {
  local arg retval=0
  for arg; do
    # user and host must be exported by the caller so the child bash can see them
    scp -q "$user@$host:$arg" ./ || (( retval |= $? ))
  done
  exit "$retval"
}
export -f download
printf '%s\0' file1 file2 file3 |
  xargs -0 -P2 -n1 bash -c 'download "$@"' _
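For the requirement as stated in the question (sequential transfers, bounded parallel processing), a plain background-job loop is another option. This is a sketch that assumes bash 4.3 or newer for wait -n. fetch_file and process_file are hypothetical stand-ins for the scp call and the real processing, and max_jobs is an arbitrary limit. The sketch only caps the number of concurrent processing jobs; it does not enforce the strict processing order described in the question.

```bash
#!/bin/bash
# Assumes bash >= 4.3 (wait -n). fetch_file/process_file are hypothetical placeholders.
max_jobs=2                      # arbitrary cap on concurrent processing jobs
running=0
log=$(mktemp)

fetch_file()   { sleep 0.1; echo "Downloading $1"; }              # stands in for scp
process_file() { sleep 0.3; echo "Processing: $1" >> "$log"; }    # stands in for the real work

for f in file1 file2 file3 file4 file5; do
  fetch_file "$f"               # transfers stay strictly one at a time
  if (( running >= max_jobs )); then
    wait -n                     # block until any one background job exits
    running=$((running - 1))
  fi
  process_file "$f" &           # start processing as soon as the file has arrived
  running=$((running + 1))
done
wait                            # wait for all remaining processing jobs
echo "Finished"
processed=$(wc -l < "$log")
rm -f "$log"
```

Because every processing job is a direct child of the script, the final wait is enough to guarantee that "Finished" is printed last.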
You could also have the subshell create a file that the main shell waits for:
tempfile=/tmp/finished.$$
for i in {1..3}; do
echo "$i"
done > >(xargs -n1 bash -c 'sleep 1; echo "Subshell: $0"'; touch $tempfile)
while ! test -f $tempfile; do sleep 1; done
rm $tempfile
echo "Finished"
You can use a bash coproc to hold a readable file descriptor that only reaches end-of-file once all of the process's children have exited:
coproc read # previously: `coproc cat`, see comments
for i in {1..3}; do
echo "$i"
done > >(xargs -n1 bash -c 'sleep 1; echo "Subshell: $0"')
exec {COPROC[1]}>&- # close my writing side
read -u ${COPROC[0]} # will wait until all potential writers (ie process children) end
echo "Finished"
If this is to be run on a system where there may be an attacker, you should not use a temp file name that can be guessed. So, based on @Barmar's solution, here is one that avoids that:
tempfile=$(mktemp)
for i in {1..3}; do
echo "$i"
done > >(xargs -n1 bash -c 'sleep 1; echo "Subshell: $0"'; rm "$tempfile")
while test -f "$tempfile"; do sleep 1; done
echo "Finished"
I think you are making it more complicated than it needs to be. Something like this works because the inner bash processes are direct children of the main shell, so wait blocks until all of them have finished before "Finished" is printed.
for i in {1..3}
do
bash -c "sleep 1; echo Subshell: $i" &
done
wait
echo "Finished"
Unix and its derivatives (such as Linux) let a process wait for its child processes, but not for grandchild processes such as the ones in your original code. Some would consider the polling solution, where you go back and check for completion, vulgar, since it does not use this mechanism.
The solution where the xargs PID was captured was not vulgar, just too complicated.

Why does bash script stop working

The script monitors incoming HTTP messages and forwards them to a monitoring application called Zabbix. It works fine; however, after about 1-2 days it stops working. Here's what I know so far:
Using pgrep i see the script is still running
the logfile file gets updated properly (first command of script)
The FIFO pipe seems to be working
The problem must be somewhere in the while loop or the tail command.
I'm new at scripting, so maybe someone can spot the problem right away?
#!/bin/bash
tcpflow -p -c -i enp2s0 port 80 | grep --line-buffered -oE 'boo.php.* HTTP/1.[01]' >> /usr/local/bin/logfile &
pipe=/tmp/fifopipe
trap "rm -f $pipe" EXIT
if [[ ! -p $pipe ]]; then
mkfifo $pipe
fi
tail -n0 -F /usr/local/bin/logfile > /tmp/fifopipe &
while true
do
if read line <$pipe; then
unset sn
for ((c=1; c<=3; c++)) # c is no of max parameters x 2 + 1
do
URL="$(echo $line | awk -F'[ =&?]' '{print $'$c'}')"
if [[ "$URL" == 'sn' ]]; then
((c++))
sn="$(echo $line | awk -F'[ =&?]' '{print $'$c'}')"
fi
done
if [[ "$sn" ]]; then
hosttype="US2G_"
host=$hosttype$sn
zabbix_sender -z nuc -s $host -k serial -o $sn -vv
fi
fi
done
You're inputting from the fifo incorrectly. By writing:
while true; do read line < $pipe ....; done
you are closing and reopening the FIFO on each iteration of the loop. The first time you close it, the process writing to the pipe (the tail -F) gets a SIGPIPE and dies. Change the structure to:
while true; do read line; ...; done < $pipe
Note that every process inside the loop now has the potential to inadvertently read from the pipe, so you'll probably want to explicitly close stdin for each.
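A self-contained sketch of the suggested structure, using a temporary FIFO and a short-lived producer in place of tcpflow and tail (the sample lines are made up). The FIFO is opened once for the whole loop, and the command inside the loop gets its stdin from /dev/null so it cannot consume lines meant for read:

```bash
#!/bin/bash
pipe=$(mktemp -u)                # unused temp path for the demo FIFO
mkfifo "$pipe"
trap 'rm -f "$pipe"' EXIT

# Producer (placeholder for tail -F): writes three lines, then closes its end
printf '%s\n' 'boo.php?sn=A1' 'boo.php?sn=B2' 'boo.php?sn=C3' > "$pipe" &

count=0
while IFS= read -r line; do
  # inner commands get stdin from /dev/null so they cannot swallow FIFO data
  printf 'got: %s\n' "$line" < /dev/null
  count=$((count + 1))
done < "$pipe"                   # FIFO opened once, for the lifetime of the loop
wait
```

In the real script, zabbix_sender would take the place of the printf, with the same < /dev/null redirection.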

running each element in array in parallel in bash script

Let's say I have a bash script that looks like this:
array=( 1 2 3 4 5 6 )
for each in "${array[@]}"
do
echo "$each"
command --arg1 $each
done
If I want to run everything in the loop in parallel, I could just change command --arg1 $each to command --arg1 $each &.
But now let's say I want to take the results of command --arg1 $each and do something with those results, like this:
array=( 1 2 3 4 5 6 )
for each in "${array[@]}"
do
echo "$each"
lags=($(command --arg1 $each))
lngth_lags=${#lags[*]}
for (( i=1; i<=$(( $lngth_lags -1 )); i++))
do
result=${lags[$i]}
echo -e "$timestamp\t$result" >> $log_file
echo "result piped"
done
done
If I just add a & to the end of command --arg1 $each, everything after it will run without command --arg1 $each finishing first. How do I prevent that from happening? Also, how do I limit the number of threads the loop can occupy?
Essentially, this block should run in parallel for 1,2,3,4,5,6
echo "$each"
lags=($(command --arg1 $each))
lngth_lags=${#lags[*]}
for (( i=1; i<=$(( $lngth_lags -1 )); i++))
do
result=${lags[$i]}
echo -e "$timestamp\t$result" >> $log_file
echo "result piped"
done
-----EDIT--------
Here is the original code:
#!/bin/bash
export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/kafka/kafka.client.jaas.conf"
IFS=$'\n'
array=($(kafka-consumer-groups --bootstrap-server kafka1:9092 --list --command-config /etc/kafka/client.properties --new-consumer))
lngth=${#array[*]}
echo "array length: " $lngth
timestamp=$(($(date +%s%N)/1000000))
log_time=`date +%Y-%m-%d:%H`
echo "log time: " $log_time
log_file="/home/ec2-user/laglogs/laglog.$log_time.log"
echo "log file: " $log_file
echo "timestamp: " $timestamp
get_lags () {
echo "$1"
lags=($(kafka-consumer-groups --bootstrap-server kafka1:9092 --describe --group $1 --command-config /etc/kafka/client.properties --new-consumer))
lngth_lags=${#lags[*]}
for (( i=1; i<=$(( $lngth_lags -1 )); i++))
do
result=${lags[$i]}
echo -e "$timestamp\t$result" >> $log_file
echo "result piped"
done
}
for each in "${array[@]}"
do
get_lags $each &
done
------EDIT 2-----------
Trying with answer below:
#!/bin/bash
export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/kafka/kafka.client.jaas.conf"
IFS=$'\n'
array=($(kafka-consumer-groups --bootstrap-server kafka1:9092 --list --command-config /etc/kafka/client.properties --new-consumer))
lngth=${#array[*]}
echo "array length: " $lngth
timestamp=$(($(date +%s%N)/1000000))
log_time=`date +%Y-%m-%d:%H`
echo "log time: " $log_time
log_file="/home/ec2-user/laglogs/laglog.$log_time.log"
echo "log file: " $log_file
echo "timestamp: " $timestamp
max_proc_count=8
run_for_each() {
local each=$1
echo "Processing: $each" >&2
IFS=$'\n' read -r -d '' -a lags < <(kafka-consumer-groups --bootstrap-server kafka1:9092 --describe --command-config /etc/kafka/client.properties --new-consumer --group "$each" && printf '\0')
for result in "${lags[@]}"; do
printf '%(%Y-%m-%dT%H:%M:%S)T\t%s\t%s\n' -1 "$each" "$result"
done >>"$log_file"
}
export -f run_for_each
export log_file # make log_file visible to subprocesses
printf '%s\0' "${array[@]}" |
xargs -P "$max_proc_count" -n 1 -0 bash -c 'run_for_each "$@"' _
The convenient thing to do is to push your background code into a separate script, or into an exported function. That way xargs can start a new shell that inherits the function from its parent. (Be sure to export any other variables that need to be available in the child as well.)
array=( 1 2 3 4 5 6 )
max_proc_count=8
log_file=out.txt
run_for_each() {
local each=$1
echo "Processing: $each" >&2
IFS=$' \t\n' read -r -d '' -a lags < <(yourcommand --arg1 "$each" && printf '\0')
for result in "${lags[@]}"; do
printf '%(%Y-%m-%dT%H:%M:%S)T\t%s\t%s\n' -1 "$each" "$result"
done >>"$log_file"
}
export -f run_for_each
export log_file # make log_file visible to subprocesses
printf '%s\0' "${array[@]}" |
xargs -P "$max_proc_count" -n 1 -0 bash -c 'run_for_each "$@"' _
Some notes:
Using echo -e is bad form. See the APPLICATION USAGE and RATIONALE sections of the POSIX spec for echo, which explicitly advise using printf instead (the spec does not define an -e option, and explicitly states that echo must not accept any options other than -n).
We're including the each value in the log file so it can be extracted from there later.
You haven't specified whether the output of yourcommand is space-delimited, tab-delimited, line-delimited, or otherwise. I'm thus accepting all these for now; modify the value of IFS passed to the read to taste.
printf '%(...)T' to get a timestamp without external tools such as date requires bash 4.2 or newer. Replace with your own code if you see fit.
read -r -a arrayname < <(...) is much more robust than arrayname=( $(...) ). In particular, it avoids treating emitted values as globs -- replacing *s with a list of files in the current directory, or Foo[Bar] with FooB should any file by that name exist (or, if the failglob or nullglob options are set, triggering a failure or emitting no value at all in that case).
Redirecting stdout to your log_file once for the entire loop is somewhat more efficient than redirecting it every time you want to run printf once. Note that having multiple processes writing to the same file at the same time is only safe if all of them opened it with O_APPEND (which >> will do), and if they're writing in chunks small enough to individually complete as single syscalls (which is probably happening unless the individual lags values are quite large).
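A quick demonstration of the glob problem mentioned above, run in a throwaway directory that contains a single file named FooB (produce is a hypothetical stand-in for yourcommand):

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1      # throwaway directory for the demo
touch FooB                       # a file that the glob Foo[Bar] happens to match
produce() { printf '%s\n' 'Foo[Bar]' '*'; }   # stand-in for yourcommand

# Unquoted command substitution: word-split AND glob-expanded
unsafe=( $(produce) )
# read -a: split on newlines only, no glob expansion
IFS=$'\n' read -r -d '' -a safe < <(produce && printf '\0')

printf 'unsafe: %s\n' "${unsafe[@]}"
printf 'safe:   %s\n' "${safe[@]}"
```

With default shell options, unsafe ends up holding FooB twice (once from Foo[Bar], once from *), while safe keeps Foo[Bar] and * verbatim.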
A lot of lengthy and theoretical answers here, so I'll try to keep it simple: what about using | (pipe) to connect the commands as usual? ;) And GNU parallel, which excels at this type of task:
seq 6 | parallel -j4 "command --arg1 {} | command2 > results/{}"
The -j4 limits the number of threads (jobs), as requested. You DON'T want to write to a single file from multiple jobs; output one file per job and join them after the parallel processing is finished.
Using GNU Parallel it looks like this:
array=( 1 2 3 4 5 6 )
parallel -0 --bar --tagstring '{= $_=localtime(time)."\t".$_; =}' \
command --arg1 {} ::: "${array[@]}" > output
GNU Parallel makes sure output from different jobs is not mixed.
If you prefer the output from jobs mixed:
parallel -0 --bar --line-buffer --tagstring '{= $_=localtime(time)."\t".$_; =}' \
command --arg1 {} ::: "${array[@]}" > output-linebuffer
Again GNU Parallel makes sure to only mix with full lines: You will not see half a line from one job and half a line from another job.
It also works if the array is a bit more nasty:
array=( "new
line" 'quotes" '"'" 'echo `do not execute me`')
Or if the command prints very long lines and half lines:
command() {
echo Input: "$@"
echo '" '"'"
sleep 1
echo -n 'Half a line '
sleep 1
echo other half
superlong_a=$(perl -e 'print "a"x1000000')
superlong_b=$(perl -e 'print "b"x1000000')
echo -n $superlong_a
sleep 1
echo $superlong_b
}
export -f command
GNU Parallel strives to be a general solution. This is because I have designed GNU Parallel to care about correctness and try vehemently to deal correctly with corner cases, too, while staying reasonably fast.
GNU Parallel guards against race conditions and does not split the words of one job's output across separate lines.
array=( $(seq 30) )
max_proc_count=30
command() {
# If 'a', 'b' and 'c' mix: Very bad
perl -e 'print "a"x3000_000," "'
perl -e 'print "b"x3000_000," "'
perl -e 'print "c"x3000_000," "'
echo
}
export -f command
parallel -0 --bar --tagstring '{= $_=localtime(time)."\t".$_; =}' \
command --arg1 {} ::: "${array[@]}" > parallel.out
# 'abc' should always stay together
# and there should only be a single line per job
cat parallel.out | tr -s abc
GNU Parallel works fine if the output has a lot of words:
array=(1)
command() {
yes "`seq 1000`" | head -c 10M
}
export -f command
parallel -0 --bar --tagstring '{= $_=localtime(time)."\t".$_; =}' \
command --arg1 {} ::: "${array[@]}" > parallel.out
GNU Parallel does not eat all your memory - even if the output is bigger than your RAM:
array=(1)
outputsize=1000M
export outputsize
command() {
yes "`perl -e 'print \"c\"x30_000'`" | head -c $outputsize
}
export -f command
parallel -0 --bar --tagstring '{= $_=localtime(time)."\t".$_; =}' \
command --arg1 {} ::: "${array[@]}" > parallel.out
You know how to execute commands in separate processes. The missing part is how to allow those processes to communicate, as separate processes cannot share variables.
Basically, you must choose whether to communicate using regular files or inter-process communication/FIFOs (which still boils down to using files).
The general approach :
Decide how you want to present the tasks to be executed. You could have them as separate files on the filesystem, as a FIFO special file that can be read from, etc. This could be as simple as writing each command to be executed to a separate file, or writing each command to a FIFO (one command per line).
In the main process, prepare the files describing tasks to perform or launch a separate process in the background that will feed the FIFO.
Then, still in the main process, launch worker processes in the background (with &), as many of them as you want parallel tasks being executed (not one per task to perform). Once they have been launched, use wait to, well, wait until all processes are finished. Separate processes cannot share variables, you will have to write any output that needs to be used later to separate files, or a FIFO, etc. If using a FIFO, remember more than one process can write to a FIFO at the same time, so use some kind of mutex mechanism (I suggest looking into the use of mkdir/rmdir for that purpose).
Each worker process must fetch the next task (from a file/FIFO), execute it, generate the output (to a file/FIFO), loop until there are no new tasks, then exit. If using files, you will need to use a mutex to "reserve" a file, read it, and then delete it to mark it as taken care of. This would not be needed for a FIFO.
Depending on the case, your main process may have to wait until all tasks are finished before handling the output, or in some cases may launch a worker process that will detect and handle output as it appears. This worker process would have to either be stopped by the main process once all tasks have been executed, or figure out for itself when all tasks have been executed and exit (while being waited on by the main process).
This is not detailed code, but I hope it gives you an idea of how to approach problems like this.
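A minimal sketch of the file-based variant, with all names hypothetical. The main process writes one task file per job, three background workers each walk the queue and claim a task with mkdir (which succeeds for exactly one process), and every result goes to its own file. The work done here, squaring a number, is only a placeholder.

```bash
#!/bin/bash
queue=$(mktemp -d)              # one file per task
results=$(mktemp -d)            # one output file per task
for n in 1 2 3 4 5 6; do echo "$n" > "$queue/task$n"; done
nworkers=3

worker() {
  local id=$1 task n
  for task in "$queue"/task*; do
    [[ $task == *.lock ]] && continue            # skip other workers' lock dirs
    mkdir "$task.lock" 2>/dev/null || continue   # mkdir is atomic: only one worker wins
    read -r n < "$task"
    echo "$((n * n))" > "$results/${task##*/}.out"   # placeholder work
    echo "worker $id handled ${task##*/}" >&2
  done
}

for ((w = 1; w <= nworkers; w++)); do
  worker "$w" &                 # fixed pool of workers, not one process per task
done
wait                            # the main process waits for the whole pool
```

Because each result lands in its own file, no locking is needed on the output side; the main process can combine the .out files after wait returns.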
(Community Wiki answer with the OP's proposed self-answer from the question -- now edited out):
So here is one way I can think of doing this. I'm not sure if it is the most efficient way, and I also can't control the number of threads (or processes, I think) it would use:
array=( 1 2 3 4 5 6 )
lag_func () {
echo "$1"
lags=($(command --arg1 $1))
lngth_lags=${#lags[*]}
for (( i=1; i<=$(( $lngth_lags -1 )); i++))
do
result=${lags[$i]}
echo -e "$timestamp\t$result" >> $log_file
echo "result piped"
done
}
for each in "${array[@]}"
do
lag_func $each &
done

Creating a for loop in a trap doesn't work in Shell script

I have been trying to create a trap in a script to basically create some logs of a script that has been running in the background.
Whenever I introduce a for loop in the trap, the script stops doing what it is supposed to do:
trap 'terminate' 10
...
write_log(){
local target=$1
local file="/tmp/"$target"_log.txt"
local lines=$(cat /tmp/"$target"_log.txt | wc -l)
printf "Log for $target\n" >> "log.txt" # This line is printed
for ((i=1;i<=$lines;i++)); # Nothing in this loop happens
do
local start_date=$(date -d "$(sed -n ""$i"p") $file | cut -f1")
local end_date=$(date -d "$sed -n ""$i"p") $file | cut -f2")
printf "Logged in $start_date, logged out $end_date" > "log.txt"
done
}
terminate(){
for target
do
echo "In the for loop!"
end_session "$target"
write_log "$target"
done
exit 0
}
When I run my script in the background and kill it with
kill -10 (process_id)
the script stops, and starts doing the cleanup, until the point where it finds a for loop. When I remove the for loop in terminate() and instead do individual calls to end_session() and write_log(), end_session() works just fine, and write_log() works fine--until it reaches the for loop.
I am probably missing something basic, but I have looked at this for a while now and can't seem to figure out what is happening. Is there any limitation to for loops in traps?
No arguments are passed to terminate when it is invoked by the trap, so its loop executes zero times (because for target; do …; done is shorthand for for target in "$@"; do …; done, and in a function, "$@" is the list of arguments to the function, not to the shell script as a whole).
If that's not what you want to have happen, you have to arrange to pass the relevant arguments to terminate in the trap. For example, you could pass all the arguments to the script via a global array:
args=( "$@" )
and inside terminate:
for target in "${args[@]}"
However, what's best depends on what you want to achieve.
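A runnable sketch of the global-array approach. set -- stands in for the real command-line arguments, and the handler only records each target instead of calling end_session and write_log. It also omits exit 0 so the result can be inspected afterwards. The signal is named (USR1) rather than numbered, because 10 means SIGUSR1 only on some platforms (e.g. Linux, not macOS).

```bash
#!/bin/bash
set -- alice bob                # stand-in for the script's real arguments
args=( "$@" )                   # copy them while "$@" still refers to the script
handled=()

terminate() {
  local target
  # inside a function, "$@" would be the function's own (empty) argument list
  for target in "${args[@]}"; do
    handled+=( "$target" )      # end_session/write_log would go here
  done
}
trap terminate USR1

kill -USR1 "$$"                 # bash runs the trap once kill has completed
printf 'handled: %s\n' "${handled[@]}"
```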
The function is hanging because the parentheses are messed up in the date commands. Try this:
local start_date=$(date -d "$(sed -n ${i}p "$file" | cut -f1)")
local end_date=$(date -d "$(sed -n ${i}p "$file" | cut -f2)")
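Calling sed -n once per line rereads the file on every iteration. Below is a sketch of the same loop with a single read per line, assuming the tab-separated start/end layout that cut -f1 and cut -f2 imply (the sample data and temp paths are made up). It also appends with >> once for the whole loop, since the > inside the question's loop truncates log.txt on every iteration.

```bash
#!/bin/bash
file=$(mktemp)                  # stands in for /tmp/<target>_log.txt
out=$(mktemp)                   # stands in for log.txt
printf '%s\t%s\n' '2024-01-01 08:00' '2024-01-01 17:00' \
                  '2024-01-02 09:30' '2024-01-02 12:00' > "$file"

while IFS=$'\t' read -r start_date end_date; do
  # pass the fields to GNU date -d here if they need reformatting
  printf 'Logged in %s, logged out %s\n' "$start_date" "$end_date"
done < "$file" >> "$out"        # one append-mode redirection for the whole loop

cat "$out"
```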

pass a command as an argument to bash script

How do I pass a command as an argument to a bash script?
In the following script, I attempted to do that, but it's not working!
#! /bin/sh
if [ $# -ne 2 ]
then
echo "Usage: $0 <dir> <command to execute>"
exit 1;
fi;
while read line
do
$($2) $line
done < $(ls $1);
echo "All Done"
A sample usage of this script would be
./myscript thisDir echo
Executing the call above ought to echo the name of all files in the thisDir directory.
First big problem: $($2) $line executes $2 by itself as a command, then tries to run its output (if any) as another command with $line as an argument to it. You just want $2 $line.
Second big problem: while read ... done < $(ls $1) doesn't read from the list of filenames; it tries to read the contents of a file named by the output of ls, and this will fail in any number of ways depending on the exact circumstances. Process substitution (while read ... done < <(ls $1)) would do more or less what you want, but it's a bash-only feature (i.e. you must start the script with #!/bin/bash, not #!/bin/sh). And anyway, it's a bad idea to parse ls; you should almost always just use a shell glob (*) instead.
The script also has some other potential issues with spaces in filenames (using $line without double-quotes around it, etc), and weird stylistic oddities (you don't need ; at the end of a line in shell). Here's my stab at a rewrite:
#! /bin/sh
if [ $# -ne 2 ]; then
echo "Usage: $0 <dir> <command to execute>"
exit 1
fi
for file in "$1"/*; do
$2 "$file"
done
echo "All done"
Note that I didn't put double-quotes around $2. This allows you to specify multiword commands (e.g. ./myscript thisDir "cat -v" would be interpreted as running the cat command with the -v option, rather than trying to run a command named "cat -v"). It would actually be a bit more flexible to take all arguments after the first one as the command and its argument, allowing you to do e.g. ./myscript thisDir cat -v, ./myscript thisDir grep -m1 "pattern with spaces", etc:
#! /bin/sh
if [ $# -lt 2 ]; then
echo "Usage: $0 <dir> <command to execute> [command options]"
exit 1
fi
dir="$1"
shift
for file in "$dir"/*; do
"$@" "$file"
done
echo "All done"
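The same loop wrapped in a function (run_in_dir is a hypothetical name) can be tried without a separate script file. Because the command is passed as separate words and expanded with "$@", a filename containing a space reaches the command as one argument:

```bash
#!/bin/bash
# Mirrors the multi-argument script above as a function, for quick testing.
run_in_dir() {
  local dir=$1 file
  shift                           # what remains in "$@" is the command and its options
  for file in "$dir"/*; do
    [ -e "$file" ] || continue    # skip the literal pattern if the directory is empty
    "$@" "$file"
  done
}

demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/with space.txt"
run_in_dir "$demo" printf '[%s]\n'
```

Each file name is printed inside its own pair of brackets, including "with space.txt", which shows it was passed as a single argument.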
Your "echo" command is "hidden" inside a subshell, away from its arguments in $line.
I think I understand what you're attempting with $($2), but it's probably overkill unless this isn't the whole story, so
while read line ; do
$2 $line
done < $(ls $1)
should work for your example with thisDir echo. If you really need the command substitution and the subshell, then put your arguments where they can see each other:
$($2 $line)
And as D.S. mentions, you might need eval before either of these.
IHTH
You could try (in your code):
echo "$2 $line"|sh
or the eval:
eval "$2 $line"

Resources