Bash multiple files processing

I have a file named data_file with data:
london
paris
newyork
italy...50 more items
I have a directory with over 75 files (say dfile1, dfile2 ... dfile75) in which I am searching for the entries in data_file.
files=$(find . -type f)
for f in $files; do
while read -r line; do
found=$(grep $line $f)
if [ ! -z "$found" ]; then
perform task here
fi
done < data_file
done
Because the loop processes the files one at a time, it takes a long time to finish. How can I speed it up? Can I run the for loop for multiple files at the same time?

Using GNU Parallel you can do something like this:
doit() {
f="$1"
line="$2"
found=$(grep "$line" "$f")
if [ ! -z "$found" ]; then
perform task here
fi
}
export -f doit
find . -type f | parallel doit :::: - data_file

The following example is a full blown parallel execution method, that deals with:
Execution time (will warn after a certain execution time, and stop tasks after more time has passed)
Async logging (keeps logging what's going on while tasks being executed)
Parallelism (allows to specify the number of simultaneous tasks)
IO related zombie tasks (will not block the execution)
Killing of grandchild pids
And quite a bit more
In your example, your (hardened) code would look like:
# Load the ExecTasks function described below (must be in the same directory as this one)
source ./exectasks.sh
directoryToProcess="/my/dir/to/find/stuff/into"
tasklist=""
# Prepare task list separated by semicolons
while IFS= read -r -d $'\0' file; do
# data_file holds the search patterns, one per line (as in the question)
if grep -f data_file "$file" > /dev/null 2>&1; then
tasklist="$tasklist""my_task;"
fi
done < <(find "$directoryToProcess" -type f -print0)
# Run tasks
ExecTasks "$tasklist" "trivial_task_id" false 1800 3600 18000 36000 true 1 1800 true false false 8
Here we used a complex function, ExecTasks, that queues the tasks and runs them in parallel, and lets you keep control of what is going on without fear of blocking the script because of some hung task.
Quick explanation of ExecTasks arguments:
"$tasklist" = variable containing task list
task id = name used to identify this run in the logs; it becomes part of a bash variable name, so it must not contain '-'
boolean: read tasks from a file (true) or from the semicolon-separated list (false); a file is useful when there are too many tasks to fit in a variable
1800 = maximum number of seconds a task may be executed before a warning is raised
3600 = maximum number of seconds a task may be executed before an error is raised and the tasks is stopped
18000 = maximum number of seconds the whole tasks may be executed before a warning is raised
36000 = maximum number of seconds the whole tasks may be executed before an error is raised and all the tasks are stopped
boolean: account execution time since beginning of tasks execution (true) or since script begin
1 = number of seconds between each state check (accepts float like .1)
1800 = Number of seconds between each "i am alive" log just to know everything works as expected
boolean: show spinner (true) or not (false)
boolean: log errors when reaching max times (false) or do not log them (true)
boolean: log errors (false) or do not log any errors at all (true)
And finally
8 = number of simultaneous tasks to launch (8 in our case)
Here's the source to exectasks.sh (which you can also copy paste directly into your script header instead of source ./exectasks.sh):
function Logger {
# Dummy log function, replace with whatever you need
echo "$2: $1"
}
# Nice cli spinner so we know execution is ongoing
_OFUNCTIONS_SPINNER="|/-\\"
function Spinner {
printf " [%c] \b\b\b\b\b\b" "$_OFUNCTIONS_SPINNER"
_OFUNCTIONS_SPINNER=${_OFUNCTIONS_SPINNER#?}${_OFUNCTIONS_SPINNER%%???}
return 0
}
# Portable child (and grandchild) kill function tested under Linux, BSD and MacOS X
function KillChilds {
local pid="${1}" # Parent pid to kill childs
local self="${2:-false}" # Should parent be killed too ?
# Paranoid checks, we can safely assume that $pid should not be 0 nor 1
if [ $(IsInteger "$pid") -eq 0 ] || [ "$pid" == "" ] || [ "$pid" == "0" ] || [ "$pid" == "1" ]; then
Logger "Bogus pid given [$pid]." "CRITICAL"
return 1
fi
if kill -0 "$pid" > /dev/null 2>&1; then
if children="$(pgrep -P "$pid")"; then
if [[ "$pid" == *"$children"* ]]; then
Logger "Bogus pgrep implementation." "CRITICAL"
children="${children/$pid/}"
fi
for child in $children; do
Logger "Launching KillChilds \"$child\" true" "DEBUG" #__WITH_PARANOIA_DEBUG
KillChilds "$child" true
done
fi
fi
# Try to kill nicely, if not, wait 15 seconds to let Trap actions happen before killing
if [ "$self" == true ]; then
# We need to check for pid again because it may have disappeared after recursive function call
if kill -0 "$pid" > /dev/null 2>&1; then
# Test the exit status of kill itself (not of the Logger call that follows it)
if kill -s TERM "$pid"; then
Logger "Sent SIGTERM to process [$pid]." "DEBUG"
return 0
else
Logger "Sending SIGTERM to process [$pid] failed." "DEBUG"
sleep 15
if ! kill -9 "$pid"; then
Logger "Sending SIGKILL to process [$pid] failed." "DEBUG"
return 1
fi
return 0
fi
else
return 0
fi
else
return 0
fi
}
function ExecTasks {
# Mandatory arguments
local mainInput="${1}" # Contains list of pids / commands separated by semicolons or filepath to list of pids / commands
# Optional arguments
local id="${2:-base}" # Optional ID in order to identify global variables from this run (only bash variable names, no '-'). Global variables are WAIT_FOR_TASK_COMPLETION_$id and HARD_MAX_EXEC_TIME_REACHED_$id
local readFromFile="${3:-false}" # Is mainInput / auxInput a semicolon separated list (false) or a filepath (true)
local softPerProcessTime="${4:-0}" # Max time (in seconds) a pid or command can run before a warning is logged, unless set to 0
local hardPerProcessTime="${5:-0}" # Max time (in seconds) a pid or command can run before the given command / pid is stopped, unless set to 0
local softMaxTime="${6:-0}" # Max time (in seconds) for the whole function to run before a warning is logged, unless set to 0
local hardMaxTime="${7:-0}" # Max time (in seconds) for the whole function to run before all pids / commands given are stopped, unless set to 0
local counting="${8:-true}" # Should softMaxTime and hardMaxTime be accounted since function begin (true) or since script begin (false)
local sleepTime="${9:-.5}" # Seconds between each state check. The shorter the value, the snappier ExecTasks will be, but as a tradeoff, more cpu power will be used (good values are between .05 and 1)
local keepLogging="${10:-1800}" # Every keepLogging seconds, an alive message is logged. Setting this value to zero disables any alive logging
local spinner="${11:-true}" # Show spinner (true) or do not show anything (false) while running
local noTimeErrorLog="${12:-false}" # Log errors when reaching soft / hard execution times (false) or do not log errors on those triggers (true)
local noErrorLogsAtAll="${13:-false}" # Do not log any errors at all (useful for recursive ExecTasks checks)
# Parallelism specific arguments
local numberOfProcesses="${14:-0}" # Number of simultaneous commands to run, given as mainInput. Set to 0 by default (WaitForTaskCompletion mode). Setting this value enables ParallelExec mode.
local auxInput="${15}" # Contains list of commands separated by semicolons or filepath to list of commands. Exit code of those commands decide whether main commands will be executed or not
local maxPostponeRetries="${16:-3}" # If a conditional command fails, how many times shall we try to postpone the associated main command. Set this to 0 to disable postponing
local minTimeBetweenRetries="${17:-300}" # Time (in seconds) between postponed command retries
local validExitCodes="${18:-0}" # Semi colon separated list of valid main command exit codes which will not trigger errors
local i
# Expand validExitCodes into array
IFS=';' read -r -a validExitCodes <<< "$validExitCodes"
# ParallelExec specific variables
local auxItemCount=0 # Number of conditional commands
local commandsArray=() # Array containing commands
local commandsConditionArray=() # Array containing conditional commands
local currentCommand # Variable containing currently processed command
local currentCommandCondition # Variable containing currently processed conditional command
local commandsArrayPid=() # Array containing commands indexed by pids
local commandsArrayOutput=() # Array containing command results indexed by pids
local postponedRetryCount=0 # Number of current postponed commands retries
local postponedItemCount=0 # Number of commands that have been postponed (keep at least one in order to check once)
local postponedCounter=0
local isPostponedCommand=false # Is the current command from a postponed file ?
local postponedExecTime=0 # How much time has passed since last postponed condition was checked
local needsPostponing # Does currentCommand need to be postponed
local temp
# Common variables
local pid # Current pid working on
local pidState # State of the process
local mainItemCount=0 # number of given items (pids or commands)
local readFromFile # Should we read pids / commands from a file (true)
local counter=0
local log_ttime=0 # local time instance for comparison
local seconds_begin=$SECONDS # Seconds since the beginning of the script
local exec_time=0 # Seconds since the beginning of this function
local retval=0 # return value of monitored pid process
local subRetval=0 # return value of condition commands
local errorcount=0 # Number of pids that finished with errors
local pidsArray # Array of currently running pids
local newPidsArray # New array of currently running pids for next iteration
local pidsTimeArray # Array containing execution begin time of pids
local executeCommand # Boolean to check if currentCommand can be executed given a condition
local functionMode
local softAlert=false # Does a soft alert need to be triggered, if yes, send an alert once
local failedPidsList # List containing failed pids with exit code separated by semicolons (eg : 2355:1;4534:2;2354:3)
local randomOutputName # Random filename for command outputs
local currentRunningPids # String of pids running, used for debugging purposes only
# fnver 2019081401
# Initialise global variable
eval "WAIT_FOR_TASK_COMPLETION_$id=\"\""
eval "HARD_MAX_EXEC_TIME_REACHED_$id=false"
# Init function variables depending on mode
if [ $numberOfProcesses -gt 0 ]; then
functionMode=ParallelExec
else
functionMode=WaitForTaskCompletion
fi
if [ $readFromFile == false ]; then
if [ $functionMode == "WaitForTaskCompletion" ]; then
IFS=';' read -r -a pidsArray <<< "$mainInput"
mainItemCount="${#pidsArray[@]}"
else
IFS=';' read -r -a commandsArray <<< "$mainInput"
mainItemCount="${#commandsArray[@]}"
IFS=';' read -r -a commandsConditionArray <<< "$auxInput"
auxItemCount="${#commandsConditionArray[@]}"
fi
else
if [ -f "$mainInput" ]; then
mainItemCount=$(wc -l < "$mainInput")
readFromFile=true
else
Logger "Cannot read main file [$mainInput]." "WARN"
fi
if [ "$auxInput" != "" ]; then
if [ -f "$auxInput" ]; then
auxItemCount=$(wc -l < "$auxInput")
else
Logger "Cannot read aux file [$auxInput]." "WARN"
fi
fi
fi
if [ $functionMode == "WaitForTaskCompletion" ]; then
# Force first while loop condition to be true because we don't deal with counters but pids in WaitForTaskCompletion mode
counter=$mainItemCount
fi
# soft / hard execution time checks that needs to be a subfunction since it is called both from main loop and from parallelExec sub loop
function _ExecTasksTimeCheck {
if [ $spinner == true ]; then
Spinner
fi
if [ $counting == true ]; then
exec_time=$((SECONDS - seconds_begin))
else
exec_time=$SECONDS
fi
if [ $keepLogging -ne 0 ]; then
# This log solely exists for readability purposes before having next set of logs
if [ ${#pidsArray[@]} -eq $numberOfProcesses ] && [ $log_ttime -eq 0 ]; then
log_ttime=$exec_time
Logger "There are $((mainItemCount-counter+postponedItemCount)) / $mainItemCount tasks in the queue of which $postponedItemCount are postponed. Currently, ${#pidsArray[@]} tasks running with pids [$(joinString , ${pidsArray[@]})]." "NOTICE"
fi
if [ $(((exec_time + 1) % keepLogging)) -eq 0 ]; then
if [ $log_ttime -ne $exec_time ]; then # Fix when sleep time lower than 1 second
log_ttime=$exec_time
if [ $functionMode == "WaitForTaskCompletion" ]; then
Logger "Current tasks still running with pids [$(joinString , ${pidsArray[@]})]." "NOTICE"
elif [ $functionMode == "ParallelExec" ]; then
Logger "There are $((mainItemCount-counter+postponedItemCount)) / $mainItemCount tasks in the queue of which $postponedItemCount are postponed. Currently, ${#pidsArray[@]} tasks running with pids [$(joinString , ${pidsArray[@]})]." "NOTICE"
fi
fi
fi
fi
if [ $exec_time -gt $softMaxTime ]; then
if [ "$softAlert" != true ] && [ $softMaxTime -ne 0 ] && [ $noTimeErrorLog != true ]; then
Logger "Max soft execution time [$softMaxTime] exceeded for task [$id] with pids [$(joinString , ${pidsArray[@]})]." "WARN"
softAlert=true
SendAlert true
fi
fi
if [ $exec_time -gt $hardMaxTime ] && [ $hardMaxTime -ne 0 ]; then
if [ $noTimeErrorLog != true ]; then
Logger "Max hard execution time [$hardMaxTime] exceeded for task [$id] with pids [$(joinString , ${pidsArray[@]})]. Stopping task execution." "ERROR"
fi
for pid in "${pidsArray[@]}"; do
KillChilds $pid true
if [ $? -eq 0 ]; then
Logger "Task with pid [$pid] stopped successfully." "NOTICE"
else
if [ $noErrorLogsAtAll != true ]; then
Logger "Could not stop task with pid [$pid]." "ERROR"
fi
fi
errorcount=$((errorcount+1))
done
if [ $noTimeErrorLog != true ]; then
SendAlert true
fi
eval "HARD_MAX_EXEC_TIME_REACHED_$id=true"
if [ $functionMode == "WaitForTaskCompletion" ]; then
return $errorcount
else
return 129
fi
fi
}
function _ExecTasksPidsCheck {
newPidsArray=()
if [ "$currentRunningPids" != "$(joinString " " ${pidsArray[@]})" ]; then
Logger "ExecTask running for pids [$(joinString " " ${pidsArray[@]})]." "DEBUG"
currentRunningPids="$(joinString " " ${pidsArray[@]})"
fi
for pid in "${pidsArray[@]}"; do
if [ $(IsInteger $pid) -eq 1 ]; then
if kill -0 $pid > /dev/null 2>&1; then
# Handle uninterruptible sleep state or zombies by omitting them from running process array (How to kill that which is already dead ? :)
pidState="$(eval $PROCESS_STATE_CMD)"
if [ "$pidState" != "D" ] && [ "$pidState" != "Z" ]; then
# Check if pid hasn't run more than soft/hard perProcessTime
pidsTimeArray[$pid]=$((SECONDS - seconds_begin))
if [ ${pidsTimeArray[$pid]} -gt $softPerProcessTime ]; then
if [ "$softAlert" != true ] && [ $softPerProcessTime -ne 0 ] && [ $noTimeErrorLog != true ]; then
Logger "Max soft execution time [$softPerProcessTime] exceeded for pid [$pid]." "WARN"
if [ "${commandsArrayPid[$pid]}" != "" ]; then
Logger "Command was [${commandsArrayPid[$pid]}]." "WARN"
fi
softAlert=true
SendAlert true
fi
fi
if [ ${pidsTimeArray[$pid]} -gt $hardPerProcessTime ] && [ $hardPerProcessTime -ne 0 ]; then
if [ $noTimeErrorLog != true ] && [ $noErrorLogsAtAll != true ]; then
Logger "Max hard execution time [$hardPerProcessTime] exceeded for pid [$pid]. Stopping command execution." "ERROR"
if [ "${commandsArrayPid[$pid]}" != "" ]; then
Logger "Command was [${commandsArrayPid[$pid]}]." "WARN"
fi
fi
KillChilds $pid true
if [ $? -eq 0 ]; then
Logger "Command with pid [$pid] stopped successfully." "NOTICE"
else
if [ $noErrorLogsAtAll != true ]; then
Logger "Could not stop command with pid [$pid]." "ERROR"
fi
fi
errorcount=$((errorcount+1))
if [ $noTimeErrorLog != true ]; then
SendAlert true
fi
fi
newPidsArray+=($pid)
fi
else
# pid is dead, get its exit code from wait command
wait $pid
retval=$?
# Check for valid exit codes
if [ $(ArrayContains $retval "${validExitCodes[@]}") -eq 0 ]; then
if [ $noErrorLogsAtAll != true ]; then
Logger "${FUNCNAME[0]} called by [$id] finished monitoring pid [$pid] with exitcode [$retval]." "ERROR"
if [ "$functionMode" == "ParallelExec" ]; then
Logger "Command was [${commandsArrayPid[$pid]}]." "ERROR"
fi
if [ -f "${commandsArrayOutput[$pid]}" ]; then
Logger "Truncated output:\n$(head -c16384 "${commandsArrayOutput[$pid]}")" "ERROR"
fi
fi
errorcount=$((errorcount+1))
# Welcome to variable variable bash hell
if [ "$failedPidsList" == "" ]; then
failedPidsList="$pid:$retval"
else
failedPidsList="$failedPidsList;$pid:$retval"
fi
else
Logger "${FUNCNAME[0]} called by [$id] finished monitoring pid [$pid] with exitcode [$retval]." "DEBUG"
fi
fi
fi
done
# hasPids can be false on last iteration in ParallelExec mode
pidsArray=("${newPidsArray[@]}")
# Trivial wait time for bash to not eat up all CPU
sleep $sleepTime
}
while [ ${#pidsArray[@]} -gt 0 ] || [ $counter -lt $mainItemCount ] || [ $postponedItemCount -ne 0 ]; do
_ExecTasksTimeCheck
retval=$?
if [ $retval -ne 0 ]; then
return $retval;
fi
# The following execution bloc is only needed in ParallelExec mode since WaitForTaskCompletion does not execute commands, but only monitors them
if [ $functionMode == "ParallelExec" ]; then
while [ ${#pidsArray[@]} -lt $numberOfProcesses ] && ([ $counter -lt $mainItemCount ] || [ $postponedItemCount -ne 0 ]); do
_ExecTasksTimeCheck
retval=$?
if [ $retval -ne 0 ]; then
return $retval;
fi
executeCommand=false
isPostponedCommand=false
currentCommand=""
currentCommandCondition=""
needsPostponing=false
if [ $readFromFile == true ]; then
# awk identifies first line as 1 instead of 0 so we need to increase counter
currentCommand=$(awk 'NR == num_line {print; exit}' num_line=$((counter+1)) "$mainInput")
if [ $auxItemCount -ne 0 ]; then
currentCommandCondition=$(awk 'NR == num_line {print; exit}' num_line=$((counter+1)) "$auxInput")
fi
# Check if we need to fetch postponed commands
if [ "$currentCommand" == "" ]; then
currentCommand=$(awk 'NR == num_line {print; exit}' num_line=$((postponedCounter+1)) "$RUN_DIR/$PROGRAM.${FUNCNAME[0]}-postponedMain.$id.$SCRIPT_PID.$TSTAMP")
currentCommandCondition=$(awk 'NR == num_line {print; exit}' num_line=$((postponedCounter+1)) "$RUN_DIR/$PROGRAM.${FUNCNAME[0]}-postponedAux.$id.$SCRIPT_PID.$TSTAMP")
isPostponedCommand=true
fi
else
currentCommand="${commandsArray[$counter]}"
if [ $auxItemCount -ne 0 ]; then
currentCommandCondition="${commandsConditionArray[$counter]}"
fi
if [ "$currentCommand" == "" ]; then
currentCommand="${postponedCommandsArray[$postponedCounter]}"
currentCommandCondition="${postponedCommandsConditionArray[$postponedCounter]}"
isPostponedCommand=true
fi
fi
# Check if we execute postponed commands, or if we delay them
if [ $isPostponedCommand == true ]; then
# Get first value before '#'
postponedExecTime="${currentCommand%%#*}"
postponedExecTime=$((SECONDS-postponedExecTime))
# Get everything after first '#'
temp="${currentCommand#*#}"
# Get first value before '#'
postponedRetryCount="${temp%%#*}"
# Replace currentCommand with actual filtered currentCommand
currentCommand="${temp#*#}"
# Since we read a postponed command, we may decrease postponedItemCount
postponedItemCount=$((postponedItemCount-1))
#Since we read one line, we need to increase the counter
postponedCounter=$((postponedCounter+1))
else
postponedRetryCount=0
postponedExecTime=0
fi
if ([ $postponedRetryCount -lt $maxPostponeRetries ] && [ $postponedExecTime -ge $minTimeBetweenRetries ]) || [ $isPostponedCommand == false ]; then
if [ "$currentCommandCondition" != "" ]; then
Logger "Checking condition [$currentCommandCondition] for command [$currentCommand]." "DEBUG"
eval "$currentCommandCondition" &
ExecTasks $! "subConditionCheck" false 0 0 1800 3600 true $SLEEP_TIME $KEEP_LOGGING true true true
subRetval=$?
if [ $subRetval -ne 0 ]; then
# is postponing enabled ?
if [ $maxPostponeRetries -gt 0 ]; then
Logger "Condition [$currentCommandCondition] not met for command [$currentCommand]. Exit code [$subRetval]. Postponing command." "NOTICE"
postponedRetryCount=$((postponedRetryCount+1))
if [ $postponedRetryCount -ge $maxPostponeRetries ]; then
Logger "Max retries reached for postponed command [$currentCommand]. Skipping command." "NOTICE"
else
needsPostponing=true
fi
postponedExecTime=0
else
Logger "Condition [$currentCommandCondition] not met for command [$currentCommand]. Exit code [$subRetval]. Ignoring command." "NOTICE"
fi
else
executeCommand=true
fi
else
executeCommand=true
fi
else
needsPostponing=true
fi
if [ $needsPostponing == true ]; then
postponedItemCount=$((postponedItemCount+1))
if [ $readFromFile == true ]; then
echo "$((SECONDS-postponedExecTime))#$postponedRetryCount#$currentCommand" >> "$RUN_DIR/$PROGRAM.${FUNCNAME[0]}-postponedMain.$id.$SCRIPT_PID.$TSTAMP"
echo "$currentCommandCondition" >> "$RUN_DIR/$PROGRAM.${FUNCNAME[0]}-postponedAux.$id.$SCRIPT_PID.$TSTAMP"
else
postponedCommandsArray+=("$((SECONDS-postponedExecTime))#$postponedRetryCount#$currentCommand")
postponedCommandsConditionArray+=("$currentCommandCondition")
fi
fi
if [ $executeCommand == true ]; then
Logger "Running command [$currentCommand]." "DEBUG"
# Build the output path before launching, so the file written to and the file recorded per pid are the same
randomOutputName="$RUN_DIR/$PROGRAM.${FUNCNAME[0]}.$id.$(date '+%Y%m%dT%H%M%S').$(PoorMansRandomGenerator 5).$SCRIPT_PID.$TSTAMP"
eval "$currentCommand" >> "$randomOutputName" 2>&1 &
pid=$!
pidsArray+=($pid)
commandsArrayPid[$pid]="$currentCommand"
commandsArrayOutput[$pid]="$randomOutputName"
# Initialize pid execution time array
pidsTimeArray[$pid]=0
else
Logger "Skipping command [$currentCommand]." "DEBUG"
fi
if [ $isPostponedCommand == false ]; then
counter=$((counter+1))
fi
_ExecTasksPidsCheck
done
fi
_ExecTasksPidsCheck
done
# Return exit code if only one process was monitored, else return number of errors
# As we cannot return multiple values, a global variable WAIT_FOR_TASK_COMPLETION contains all pids with their return value
eval "WAIT_FOR_TASK_COMPLETION_$id=\"$failedPidsList\""
if [ $mainItemCount -eq 1 ]; then
return $retval
else
return $errorcount
fi
}
Hope you have fun.
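The listing above calls several helpers and globals that it does not define (IsInteger, joinString, ArrayContains, PoorMansRandomGenerator, SendAlert, PROCESS_STATE_CMD, RUN_DIR, PROGRAM, SCRIPT_PID, TSTAMP, SLEEP_TIME, KEEP_LOGGING). The following is a minimal sketch of stand-ins, with behaviour inferred only from how the listing uses them; the bodies and the values chosen for the globals are assumptions, not the original author's code.

```shell
#!/usr/bin/env bash
# Assumed stand-ins for names that ExecTasks / KillChilds use but do not define.

# Globals referenced by the listing (the values are arbitrary choices).
RUN_DIR="${TMPDIR:-/tmp}"
PROGRAM="exectasks"
SCRIPT_PID=$$
TSTAMP="$(date '+%Y%m%dT%H%M%S')"
SLEEP_TIME=.5
KEEP_LOGGING=1800
# Run through eval while $pid is set; prints the one-letter process state.
PROCESS_STATE_CMD='ps -p $pid -o state= 2>/dev/null'

# Echo 1 if the argument is a non-negative integer, 0 otherwise.
IsInteger() {
    if [[ "$1" =~ ^[0-9]+$ ]]; then echo 1; else echo 0; fi
}

# joinString DELIM ITEM...: echo the items joined by the single character DELIM.
joinString() {
    local IFS="$1"
    shift
    echo "$*"
}

# ArrayContains NEEDLE ITEM...: echo 1 if NEEDLE equals one of the items, else 0.
ArrayContains() {
    local needle="$1" item
    shift
    for item in "$@"; do
        if [ "$item" == "$needle" ]; then
            echo 1
            return 0
        fi
    done
    echo 0
}

# PoorMansRandomGenerator N: echo N random decimal digits.
PoorMansRandomGenerator() {
    local digits="${1:-5}" out=""
    while [ "${#out}" -lt "$digits" ]; do
        out="$out$((RANDOM % 10))"
    done
    echo "$out"
}

# SendAlert: placeholder; replace with mail, a webhook, or whatever you need.
SendAlert() {
    echo "NOTICE: alert requested" >&2
}
```

These would need to be defined before ExecTasks is first called.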

You can do it like this :
files=$(find . -type f)
for f in $files; do
while read -r line; do
{
found=$(grep "$line" "$f")
if [ -n "$found" ]; then
: # perform task here
fi
} &
done < data_file
done
wait
It will execute the block within {} in the background. Because the block sits inside both loops, this starts one background process for every file/line pair, all at once, and the final wait blocks until they have all finished. If you want finer control over how many processes are actually spawned, you can instead use GNU Parallel.
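Where GNU Parallel is not installed, xargs -P can also cap the number of simultaneous processes. A rough sketch, assuming an xargs that supports -0 and -P (GNU and BSD do; neither option is POSIX); search_file and search_tree are hypothetical names, and search_file only stands in for the "perform task here" step:

```shell
#!/usr/bin/env bash
# search_file PATTERN_FILE FILE (hypothetical stand-in for the per-file task):
# print FILE when any line of PATTERN_FILE occurs in it.
search_file() {
    local patterns="$1" file="$2"
    # -q: stop at the first match, -F: literal strings, -f: patterns from a file
    if grep -qFf "$patterns" -- "$file"; then
        printf '%s\n' "$file"
    fi
}
export -f search_file   # make the function visible to the bash started by xargs

# search_tree PATTERN_FILE DIR [JOBS]: run search_file on every regular file
# under DIR, with at most JOBS (default 4) processes at a time.
search_tree() {
    local patterns="$1" dir="$2" jobs="${3:-4}"
    find "$dir" -type f -print0 |
        xargs -0 -n 1 -P "$jobs" bash -c 'search_file "$1" "$2"' _ "$patterns"
}
```

For example, search_tree data_file /my/dir 8 would use up to eight processes; keep data_file outside the searched directory so that it does not match itself.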

The find command will slow things down and the script is more complicated than it needs to be.
If you want to do this with grep, better to loop through data_file and within that grep $line * > /dev/null && do_something (or grep -R $line * > /dev/null && do_something if there are subdirectories to deal with)

You could use grep's q option to stop searching after the first match and f option to obtain the patterns from a file:
for f in $(find . -type f); do
if grep -qf data_file "$f"; then
...
fi
done
If data_file contains:
xxx
yyy
zzz
then grep -qf data_file "$f" evaluates to true if any of xxx, yyy, or zzz is found in $f.
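Taking the same idea one step further, a single recursive grep can report every matching file without a shell loop over the files. A small sketch, assuming a grep that supports -r (GNU and BSD grep do) and that the entries in data_file are literal strings rather than regular expressions; list_matching_files is a hypothetical name:

```shell
#!/usr/bin/env bash
# list_matching_files PATTERN_FILE DIR: print the path of every file under DIR
# that contains at least one line of PATTERN_FILE as a literal substring.
# Note: an empty line in PATTERN_FILE matches every file.
list_matching_files() {
    local patterns="$1" dir="$2"
    # -r: recurse, -l: print file names only, -F: literal strings, -f: pattern file
    grep -rlFf "$patterns" -- "$dir"
}
```

The task can then be run once per reported file, for example with while IFS= read -r f; do ...; done < <(list_matching_files data_file some_dir).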

Related

Keep retrying yarn script until it passes

I am new to bash and wondering if there is a way to run a script x amount of times until it succeeds? I have the following script, but it naturally bails out and doesn't retry until it succeeds.
yarn graphql
if [ $? -eq 0 ]
then
echo "SUCCESS"
else
echo "FAIL"
fi
I can see there is a way to continuously loop, however is there a way to throttle this to say, loop every second, for 30 seconds?
while :
do
command
done

I guess you could devise a dedicated bash function for this, relying on the sleep command.
E.g., this code is freely inspired from that code by Travis, distributed under the MIT license:
#!/usr/bin/env bash
ANSI_GREEN="\033[32;1m"
ANSI_RED="\033[31;1m"
ANSI_RESET="\033[0m"
usage() {
cat >&2 <<EOF
Usage: retry_until WAIT MAX_TIMES COMMAND...
Examples:
retry_until 1s 3 echo ok
retry_until 1s 3 false
retry_until 1s 0 false
retry_until 30s 0 false
EOF
}
retry_until() {
[ $# -lt 3 ] && { usage; return 2; }
local wait_for="$1" # e.g., "30s"
local max_times="$2" # e.g., "3" (or "0" to have no limit)
shift 2
local result=0
local count=1
local str_of=''
[ "$max_times" -gt 0 ] && str_of=" of $max_times"
while [ "$count" -le "$max_times" ] || [ "$max_times" -le 0 ]; do
[ "$result" -ne 0 ] && {
echo -e "\n${ANSI_RED}The command '$*' failed. Retrying, #$count$str_of.${ANSI_RESET}\n" >&2
}
"$@" && {
echo -e "\n${ANSI_GREEN}The command '$*' succeeded on attempt #$count.${ANSI_RESET}\n" >&2
result=0
break
} || result=$?
count=$((count + 1))
sleep "$wait_for"
done
[ "$max_times" -gt 0 ] && [ "$count" -gt "$max_times" ] && {
echo -e "\n${ANSI_RED}The command '$*' failed $max_times times.${ANSI_RESET}\n" >&2
}
return "$result"
}
Then to fully answer your question, you could run:
retry_until 1s 30 command
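If the limit should be expressed as elapsed time rather than a number of attempts ("every second, for 30 seconds"), bash's SECONDS variable can serve as the clock. A minimal sketch; retry_for is a hypothetical helper, not part of the code above, and it assumes the time budget is a whole number of seconds:

```shell
#!/usr/bin/env bash
# retry_for BUDGET INTERVAL COMMAND...
# Re-run COMMAND every INTERVAL seconds until it succeeds or BUDGET seconds
# have elapsed. Returns 0 on success, otherwise the last exit status of COMMAND.
retry_for() {
    local budget="$1" interval="$2"
    shift 2
    local deadline=$((SECONDS + budget))
    local status=0
    while :; do
        "$@" && return 0
        status=$?                                 # exit status of the failed attempt
        [ "$SECONDS" -ge "$deadline" ] && break   # time budget used up
        sleep "$interval"
    done
    return "$status"
}
```

With this, retry_for 30 1 yarn graphql would retry once per second for about 30 seconds.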

Using Flock in Bash so Request is Made Only Once

I'm trying to configure my script in such a way that
If some data isn't available, try to fetch it
If another process is already fetching it, wait for that process to finish
Use the data
And from here I found this very nice example of flock:
exec 200>$pidfile
flock -n 200 || exit 1
pid=$$
echo $pid 1>&200
And this fails if it can't acquire the lock (-n flag).
Can I assume that this means another file has locked the $pidfile, and how can I detect that the lock has been released in a different process?
I understand that wait $pid would wait until that process is complete, and so if there's some way to record which process currently holds the lock or just detect the unlocking so that other processes know once the data is available, then I think this will work.
Any ideas?

As per the flock (1) man page,
if the lock cannot be immediately acquired, [in the absence of a -w timeout], flock waits until the lock is available
You can use fuser to see which process is holding a file handle.
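The blocking behaviour described in the man page is enough for the "fetch once, everyone else waits" pattern: every process takes the lock without -n, and whoever gets it first does the fetch. A sketch under the assumption that the flock utility from util-linux is available; fetch_once and its arguments are hypothetical names:

```shell
#!/usr/bin/env bash
# fetch_once LOCKFILE DATAFILE COMMAND...
# Runs COMMAND (its stdout becomes DATAFILE) only if DATAFILE is missing or
# empty. Concurrent callers block on the lock and then find the data present.
fetch_once() {
    local lockfile="$1" datafile="$2"
    shift 2
    (
        flock 9 || exit 1                  # no -n: wait until the lock is free
        if [ ! -s "$datafile" ]; then
            # Write to a temporary name first so readers never see partial data
            "$@" > "$datafile.tmp" && mv "$datafile.tmp" "$datafile"
        fi
    ) 9>"$lockfile"                        # the lock is released when fd 9 closes
    cat "$datafile"
}
```

Because waiting is handled by flock itself, no pid file is needed to find out when the other process has finished.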

My solution uses two files, pid.temp and data.temp:
backgroundGetData() {
local data=$1
# if global is null, check file.
if [ -z "$data" ]; then
data=$( cat $DATA_TEMP_FILE 2>/dev/null )
fi
# if file is empty, check process is making the request
if [ -z "$data" ]; then
for i in {1..5}; do
echo "INFO - Process: $BASHPID - Attempting to lock data temp file" >&2
local request_pid=$( cat $PID_FILE 2>/dev/null )
if [ -z "$request_pid" ]; then request_pid=0; fi
local exit_code=1
if [ "$request_pid" -eq 0 ]; then
( flock -n 200 || exit 1
echo "INFO - Process: $BASHPID - Fetching data." >&2
echo "$BASHPID">"$PID_FILE"
getData > $DATA_TEMP_FILE
) 200>$DATA_TEMP_FILE
exit_code=$?
fi
echo "INFO - Process: $BASHPID - returned $exit_code from lock attempt.">&2
[ $request_pid -ne 0 ] && echo "INFO - Process: $BASHPID - $request_pid is possibly locking">&2
if [ $exit_code -ne 0 ] && [ $request_pid -ne 0 ]; then
echo "INFO - Process: $BASHPID - waiting on $request_pid to complete">&2
tail --pid=${request_pid} -f /dev/null
echo "INFO - Process: $BASHPID - finished waiting.">&2
break
elif [ $exit_code -eq 0 ]; then break;
else
sleep 2
fi
done
data=$( cat $DATA_TEMP_FILE )
if [ -z "$data" ]; then
echo "WARN - Process: $BASHPID - Failed to retrieve data.">&2
fi
fi
echo "$data"
}
And it can be used like so:
DATA=""
DATA_TEMP_FILE="data.temp"
PID_FILE="pid.temp"
backgroundGetData "$DATA" > /dev/null & ## Begin making request
doThing() {
if [ -z "$DATA" ]; then
# Duplicate stdout as fd 3, then send stderr of backgroundGetData to fd 3 so that
# logging messages can be shown and output can also be captured in the variable.
exec 3>&1
DATA=$( backgroundGetData $DATA 2>&3)
fi
}
for job in "$jobs"; do
doThing &
done
It's working for me, though I'm not 100% sure on how safe it is.

How to avoid printing an error in the console in a Bash script when executing a command?

How to avoid printing an error in Bash? I want to do something like this. If the user enters a wrong argument (like a "." for example), it will just exit the program rather than displaying the error on the terminal. (I've not posted the whole code here... That's a bit long).
if [ -n "$1" ]; then
sleep_time=$1
# it doesn't work, and displays the error on the screen
sleep $sleep_time > /dev/null
if [ "$?" -eq 0 ]; then
measurement $sleep_time
else
exit
fi
# if invalid arguments passed, take the refreshing interval from the user
else
echo "Proper Usage: $0 refresh_interval(in seconds)"
read -p "Please Provide the Update Time: " sleep_time
sleep $sleep_time > /dev/null
if [ "$?" -eq 0 ]; then
measurement $sleep_time
else
exit
fi
fi

2>/dev/null will discard any errors. Your code can be simplified like this:
#!/usr/bin/env bash
if [[ $# -eq 0 ]]; then
echo "Usage: $0 refresh_interval (in seconds)"
read -p "Please provide time: " sleep_time
else
sleep_time=$1
fi
sleep "$sleep_time" 2>/dev/null || { echo "Wrong time" >&2; exit 1; }
# everything OK - do stuff here
# ...
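Using sleep itself as the validator means a valid interval is slept through once before any work starts. If that delay is unwanted, the argument can be checked with a pattern first. A small sketch; is_duration is a hypothetical helper, and it only accepts plain seconds (digits with an optional fractional part), which is narrower than what GNU sleep accepts:

```shell
#!/usr/bin/env bash
# is_duration VALUE: succeed if VALUE looks like a non-negative number of
# seconds, e.g. 5 or 0.5. Rejects empty input, ".", and anything with letters.
is_duration() {
    [[ "$1" =~ ^[0-9]+([.][0-9]+)?$ ]]
}
```

It can be used as is_duration "$sleep_time" || { echo "Wrong time" >&2; exit 1; }.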

Create parallel processes and wait for all of them to finish, then redo steps

What I want to do should be pretty simple. I arrived at the solution below on my own; all I need is a few pointers telling me whether this is the way to do it or whether I should refactor anything in the code.
The code below should create a few parallel processes, wait for them to finish executing, then rerun the code again and again.
The script is triggered by a cron job once every 10 minutes; if the script is already running it does nothing, otherwise it starts the working process.
Any insight is highly appreciated since I am not that familiar with bash programming.
#!/bin/bash
# paths
THISPATH="$( cd "$( dirname "$0" )" && pwd )"
# make sure we move in the working directory
cd $THISPATH
# console init path
CONSOLEPATH="$( cd ../../ && pwd )/console.php"
# command line arguments
daemon=0
PHPPATH="/usr/bin/php"
help=0
# flag for binary search
LOOKEDFORPHP=0
# arguments init
while getopts d:p:h: opt; do
case $opt in
d)
daemon=$OPTARG
;;
p)
PHPPATH=$OPTARG
LOOKEDFORPHP=1
;;
h)
help=$OPTARG
;;
esac
done
shift $((OPTIND - 1))
# allow only one process
processesLength=$(ps aux | grep -v "grep" | grep -c $THISPATH/send-campaigns-daemon.sh)
if [ ${processesLength:-0} -gt 2 ]; then
# The process is already running
exit 0
fi
if [ $help -eq 1 ]; then
echo "---------------------------------------------------------------"
echo "| Usage: send-campaigns-daemon.sh |"
echo "| To force PHP CLI binary : |"
echo "| send-campaigns-daemon.sh -p /path/to/php-cli/binary |"
echo "---------------------------------------------------------------"
exit 0
fi
# php executable path, find it if not provided
if [ $PHPPATH ] && [ ! -f $PHPPATH ] && [ $LOOKEDFORPHP -eq 0 ]; then
phpVariants=( "php-cli" "php5-cli" "php5" "php" )
LOOKEDFORPHP=1
for i in "${phpVariants[@]}"
do
which $i >/dev/null 2>&1
if [ $? -eq 0 ]; then
PHPPATH=$(which $i)
fi
done
fi
if [ ! $PHPPATH ] || [ ! -f $PHPPATH ]; then
# Did not find PHP
exit 1
fi
# load options from app
parallelProcessesPerCampaign=3
campaignsAtOnce=10
subscribersAtOnce=300
sleepTime=30
function loadOptions {
local COMMAND="$PHPPATH $CONSOLEPATH option get_option --name=%s --default=%d"
parallelProcessesPerCampaign=$(printf "$COMMAND" "system.cron.send_campaigns.parallel_processes_per_campaign" 3)
campaignsAtOnce=$(printf "$COMMAND" "system.cron.send_campaigns.campaigns_at_once" 10)
subscribersAtOnce=$(printf "$COMMAND" "system.cron.send_campaigns.subscribers_at_once" 300)
sleepTime=$(printf "$COMMAND" "system.cron.send_campaigns.pause" 30)
parallelProcessesPerCampaign=$($parallelProcessesPerCampaign)
campaignsAtOnce=$($campaignsAtOnce)
subscribersAtOnce=$($subscribersAtOnce)
sleepTime=$($sleepTime)
}
# define the daemon function that will stay in loop
function daemon {
loadOptions
local pids=()
local k=0
local i=0
local COMMAND="$PHPPATH -q $CONSOLEPATH send-campaigns --campaigns_offset=%d --campaigns_limit=%d --subscribers_offset=%d --subscribers_limit=%d --parallel_process_number=%d --parallel_processes_count=%d --usleep=%d --from_daemon=1"
while [ $i -lt $campaignsAtOnce ]
do
while [ $k -lt $parallelProcessesPerCampaign ]
do
parallelProcessNumber=$(( $k + 1 ))
usleep=$(( $k * 10 + $i * 10 ))
CMD=$(printf "$COMMAND" $i 1 $(( $subscribersAtOnce * $k )) $subscribersAtOnce $parallelProcessNumber $parallelProcessesPerCampaign $usleep)
$CMD > /dev/null 2>&1 &
pids+=($!)
k=$(( k + 1 ))
done
i=$(( i + 1 ))
done
waitForPids pids
sleep $sleepTime
daemon
}
function daemonize {
$THISPATH/send-campaigns-daemon.sh -d 1 -p $PHPPATH > /dev/null 2>&1 &
}
function waitForPids {
stillRunning=0
for i in "${pids[@]}"
do
if ps -p $i > /dev/null
then
stillRunning=1
break
fi
done
if [ $stillRunning -eq 1 ]; then
sleep 0.5
waitForPids pids
fi
return 0
}
if [ $daemon -eq 1 ]; then
daemon
else
daemonize
fi
exit 0

When the script starts, create a lock file to mark that it is running, and delete the lock file when the script finishes. If somebody kills the process while it is running, the lock file remains forever, so test how old it is and delete it if it is older than a defined value. For example:
#!/bin/bash
# 10 min
LOCK_MAX=600
LOCKFILE=/var/lock/${0##*/}.lock
if [[ -f $LOCKFILE ]] ; then
# Age of the lock file: seconds since its last modification
TIMEINI=$( stat -c %Y "$LOCKFILE" )
SEGS=$(( $(date +%s) - TIMEINI ))
if [[ $SEGS -gt $LOCK_MAX ]] ; then
# Report the stale lock here (log, mail, or something else to inform you)
# Kill old instance ???
OLDPID=$(<$LOCKFILE)
[[ -e /proc/$OLDPID ]] && kill -9 $OLDPID
# Next time that the program is run, there is no lock file and it will run.
rm $LOCKFILE
fi
exit 65
fi
# Save PID of this instance to the lock file
echo "$$" > $LOCKFILE
### Your code go here
# Remove the lock file before script finish
[[ -e $LOCKFILE ]] && rm $LOCKFILE
exit 0
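The script above removes the lock file only on its normal exit path. As a minimal sketch of one way to narrow that gap, an EXIT trap can remove the lock whenever the shell running the job exits, including when the job fails. This shows only the cleanup half, not the existence check, and it cannot help against kill -9, which is not trappable, so the age check above stays useful. The function name run_with_lock and the temporary lock path are hypothetical, chosen for illustration.

```shell
#!/bin/bash
# Hypothetical lock path for illustration; the answer above uses /var/lock/<script>.lock.
LOCKFILE="${TMPDIR:-/tmp}/trap-lock-demo.$$.lock"

# Run a command while holding the lock file; the lock is removed when the job ends.
run_with_lock() {
    # A subshell is used so that the EXIT trap fires when the job ends,
    # not when the calling script ends.
    (
        echo "$BASHPID" > "$LOCKFILE"
        trap 'rm -f "$LOCKFILE"' EXIT
        "$@"
    )
}
```

Calling run_with_lock some_command returns the exit status of some_command, and the lock file is gone afterwards whether the command succeeded or failed.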

from here:
#!/bin/bash
...
echo PARALLEL_JOBS:${PARALLEL_JOBS:=1}
declare -a tests=($(.../find_what_to_run))
echo "${tests[@]}" | \
xargs -d' ' -n1 -P${PARALLEL_JOBS} -I {} bash -c ".../run_that {}" || { echo "FAILURE"; exit 1; }
echo "SUCCESS"
and here you can nick the code for portable locking with fuser
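The xargs -P pattern above can be tried in isolation with the sketch below. It makes a few assumptions that are not in the original: the items array and the process_item function are hypothetical stand-ins for the elided find_what_to_run and run_that commands, and the items are passed NUL-delimited with printf and xargs -0 rather than space-delimited with echo, so that the last item does not pick up a trailing newline and names containing spaces survive.

```shell
#!/bin/bash
# Hypothetical work items; the original obtains them from an elided command.
items=(london paris newyork italy)

# Hypothetical per-item task: print the item in upper case.
process_item() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}
# Export the function so the bash processes started by xargs can see it.
export -f process_item

run_parallel() {
    # One item per invocation (-n1), up to PARALLEL_JOBS invocations at a time (-P).
    printf '%s\0' "${items[@]}" |
        xargs -0 -n1 -P "${PARALLEL_JOBS:-2}" bash -c 'process_item "$1"' _
}
```

Because the jobs run concurrently, the order of the output lines is not guaranteed; sort the output if a stable order matters.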

Okay, so I guess I can answer my own question with a proper answer that works after many tests.
So here is the final version, simplified, without comments/echo:
#!/bin/bash
sleep 2
DIR="$( cd "$( dirname "$0" )" && pwd )"
FILE_NAME="$( basename "$0" )"
COMMAND_FILE_PATH="$DIR/$FILE_NAME"
if [ ! -f "$COMMAND_FILE_PATH" ]; then
exit 1
fi
cd $DIR
CONSOLE_PATH="$( cd ../../ && pwd )/console.php"
PHP_PATH="/usr/bin/php"
help=0
LOOKED_FOR_PHP=0
while getopts p:h: opt; do
case $opt in
p)
PHP_PATH=$OPTARG
LOOKED_FOR_PHP=1
;;
h)
help=$OPTARG
;;
esac
done
shift $((OPTIND - 1))
if [ $help -eq 1 ]; then
printf "%s\n" "HELP INFO"
exit 0
fi
if [ "$PHP_PATH" ] && [ ! -f "$PHP_PATH" ] && [ "$LOOKED_FOR_PHP" -eq 0 ]; then
php_variants=( "php-cli" "php5-cli" "php5" "php" )
LOOKED_FOR_PHP=1
for i in "${php_variants[@]}"
do
which $i >/dev/null 2>&1
if [ $? -eq 0 ]; then
PHP_PATH="$(which $i)"
break
fi
done
fi
if [ ! "$PHP_PATH" ] || [ ! -f "$PHP_PATH" ]; then
exit 1
fi
LOCK_BASE_PATH="$( cd ../../../common/runtime && pwd )/shell-pids"
LOCK_PATH="$LOCK_BASE_PATH/send-campaigns-daemon.pid"
function remove_lock {
if [ -d "$LOCK_PATH" ]; then
rmdir "$LOCK_PATH" > /dev/null 2>&1
fi
exit 0
}
if [ ! -d "$LOCK_BASE_PATH" ]; then
if ! mkdir -p "$LOCK_BASE_PATH" > /dev/null 2>&1; then
exit 1
fi
fi
process_running=0
if mkdir "$LOCK_PATH" > /dev/null 2>&1; then
process_running=0
else
process_running=1
fi
if [ $process_running -eq 1 ]; then
exit 0
fi
trap "remove_lock" 1 2 3 15
COMMAND="$PHP_PATH $CONSOLE_PATH option get_option --name=%s --default=%d"
parallel_processes_per_campaign=$(printf "$COMMAND" "system.cron.send_campaigns.parallel_processes_per_campaign" 3)
campaigns_at_once=$(printf "$COMMAND" "system.cron.send_campaigns.campaigns_at_once" 10)
subscribers_at_once=$(printf "$COMMAND" "system.cron.send_campaigns.subscribers_at_once" 300)
sleep_time=$(printf "$COMMAND" "system.cron.send_campaigns.pause" 30)
parallel_processes_per_campaign=$($parallel_processes_per_campaign)
campaigns_at_once=$($campaigns_at_once)
subscribers_at_once=$($subscribers_at_once)
sleep_time=$($sleep_time)
k=0
i=0
pp=0
COMMAND="$PHP_PATH -q $CONSOLE_PATH send-campaigns --campaigns_offset=%d --campaigns_limit=%d --subscribers_offset=%d --subscribers_limit=%d --parallel_process_number=%d --parallel_processes_count=%d --usleep=%d --from_daemon=1"
while [ $i -lt $campaigns_at_once ]
do
# Reset the per-campaign counter; otherwise only the first campaign spawns workers
k=0
while [ $k -lt $parallel_processes_per_campaign ]
do
parallel_process_number=$(( $k + 1 ))
usleep=$(( $k * 10 + $i * 10 ))
CMD=$(printf "$COMMAND" $i 1 $(( $subscribers_at_once * $k )) $subscribers_at_once $parallel_process_number $parallel_processes_per_campaign $usleep)
$CMD > /dev/null 2>&1 &
k=$(( k + 1 ))
pp=$(( pp + 1 ))
done
i=$(( i + 1 ))
done
wait
sleep ${sleep_time:-30}
$COMMAND_FILE_PATH -p "$PHP_PATH" > /dev/null 2>&1 &
remove_lock
exit 0

Usually it is a lock file, not a lock path. You hold the PID in the lock file so you can monitor your process. In this case your lock directory does not hold any PID information. Your script also does not do any PID file/directory maintenance at startup to handle an improper shutdown that left the lock behind.
I like your first script better with this in mind. Monitoring the running PIDs directly is cleaner. The only problem is that if cron starts a second instance, it is not aware of the PIDs belonging to the first instance.
You also have processLength -gt 2, which means 2, not 1, processes running, so you will duplicate your process threads.
It also seems that daemonize just calls the script again with daemon, which is not very useful. Also, having a variable with the same name as a function is confusing.
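The two points above (keep a PID with the lock, and clean up a stale lock at startup) can be combined with the mkdir approach from the script. The following is a rough sketch under stated assumptions, not the original author's code: the names acquire_lock, release_lock and LOCK_DIR are hypothetical, and the liveness check with kill -0 assumes the previous instance ran as the same user. The stale-lock takeover is itself a small test-and-set window, so treat it as an illustration of the idea.

```shell
#!/bin/bash
# Hypothetical lock location for illustration.
LOCK_DIR="${TMPDIR:-/tmp}/lockdir-demo.$$"

acquire_lock() {
    # mkdir is atomic: only one caller can create the directory.
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "$$" > "$LOCK_DIR/pid"
        return 0
    fi
    # The lock exists: treat it as stale if the recorded PID is no longer alive.
    local old_pid
    old_pid=$(cat "$LOCK_DIR/pid" 2>/dev/null)
    if [ -n "$old_pid" ] && ! kill -0 "$old_pid" 2>/dev/null; then
        rm -rf "$LOCK_DIR"
        mkdir "$LOCK_DIR" 2>/dev/null && echo "$$" > "$LOCK_DIR/pid" && return 0
    fi
    return 1
}

release_lock() {
    rm -rf "$LOCK_DIR"
}
```

A script would call acquire_lock || exit 0 near the top and release_lock on its way out.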

The correct way to make a lockfile is like this:
# Create a temporary file
echo $$ > ${LOCKFILE}.tmp$$
# Try the lock; ln without -f is atomic
if ln ${LOCKFILE}.tmp$$ ${LOCKFILE}; then
: # we got the lock
else
: # we didn't get the lock
fi
# Tidy up the temporary file
rm ${LOCKFILE}.tmp$$
And to release the lock:
# Unlock
rm ${LOCKFILE}
The key thing is to create the lock file to one side, using a unique name, and then try to link it to the real name. This is an atomic operation, so it should be safe.
Any solution that does "test and set" gives you a race condition to deal with. Yes, that can be sorted out, but you end up writing extra code.
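As a usage sketch, the ln technique can be wrapped in two small functions. The names try_lock and unlock and the temporary lock path are hypothetical, introduced only for illustration; the temporary file is created next to the lock file so that both are on the same filesystem, which hard links require.

```shell
#!/bin/bash
# Hypothetical lock path for illustration.
LOCKFILE="${TMPDIR:-/tmp}/ln-lock-demo.$$.lock"

try_lock() {
    local tmp="${LOCKFILE}.tmp$$"
    local status
    # Create the candidate lock to one side, under a unique name.
    echo "$$" > "$tmp"
    # ln without -f fails if LOCKFILE already exists; success means we own the lock.
    ln "$tmp" "$LOCKFILE" 2>/dev/null
    status=$?
    # Tidy up the temporary file either way.
    rm -f "$tmp"
    return "$status"
}

unlock() {
    rm -f "$LOCKFILE"
}
```

A caller would write try_lock || exit 1 before the critical section and unlock after it.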

How to kick out a user from my machine by executing a script ? Bash

I want to be able to kick some users off my machine and prevent them from executing anything.
For this I considered putting a script (kick-out.sh) in /etc/profile.d/ so that it runs automatically whenever somebody connects to my machine.
Do you have any idea how to do this?
N.B: I don't have admin privileges.
Thanks,
Debugger

For bash, a simple example (not bulletproof; it can be interrupted):
case `whoami` in
notwanted1|notwanted2|notwanted3) logout;;
esac
But on a normal system you cannot do this without root (admin) privileges.

It's been done:
http://se.archive.ubuntu.com/ubuntu/pool/universe/s/slay/slay_2.7.0_all.deb
#!/bin/sh
#
# slay 2.0 - kill all processes belonging to the specified user(s).
# originally by Chris Ausbrooks <fish@bucket.ualr.edu>
# based on kall (a script of unknown origin)
# Heavily rewritten by Pawel Wiecek <coven@debian.org> for Debian
# Revision history:
# 0.99 First attempt.
# 1.0 Added Butthead.
# 1.1 Added retribution.
# 1.2 Added slayee notification.
# 2.0 Completely rewritten
# 2.1 Fix an *ugly* bug that caused slayer to be slain...
# 2.2, 2.3 Debian specific updates
# 2.4 Updated command line handler to avoid username/signal name mismatches
# 2.5 Debian specific updates
# 2.6 Properly slay oneself, fixed misleading error messages
# 2.7 Set PATH to prevent all sorts of problems with it coming from outside
PATH=/usr/sbin:/sbin:/usr/bin:/bin
export PATH
USER=`whoami`
SIGNAL='-KILL'
SLAYEE=''
ME=`basename $0`
COOL='0'
# this piece of nested ifs is added for Debian package only
if [ -f /etc/slay_mode ]
then
if grep -q mean /etc/slay_mode
then
MODE='mean'
fi
if grep -q nice /etc/slay_mode
then
MODE='nice'
fi
if [ -z $SLAY_BUTTHEAD ]
then
if grep -q butthead /etc/slay_mode
then
SLAY_BUTTHEAD='on'
fi
if grep -q normal /etc/slay_mode
then
SLAY_BUTTHEAD='off'
fi
fi
else
MODE='mean'
if [ -z $SLAY_BUTTHEAD ]
then
SLAY_BUTTHEAD='off'
fi
fi
# Command line handling.
while [ $# -gt 0 ]
do
case $1 in
-*)
SIGNAL=$1
;;
*)
SLAYEE="$SLAYEE $1"
esac
shift
done
if [ "$SIGNAL" != "-clean" ]
then
SIGSHOW="$SIGNAL"
else
SIGSHOW="-TERM + -KILL"
fi
# Help for losers.
if [ "$SLAYEE" = "" -o "$SIGNAL" = "--help" ]
then
echo "usage: $ME [-signal] name [name...]"
if [ "$SLAY_BUTTHEAD" = "on" ]
then
echo " Like, kills people and stuff."
echo " With -clean kicks ass first and then does real pain."
else
echo " Kills all processes belonging to any of the given names."
echo " Use -clean as a signal name to kill with TERM first and then with KILL."
fi
exit -1
fi
# Misuse trap.
if [ "$USER" != "$(echo "${SLAYEE}" | tr -d ' ')" ]
then
if [ "$USER" != "root" ]
then
if [ "$MODE" = "mean" ]
then
$0 -KILL $USER
else
if [ "$SLAY_BUTTHEAD" = "on" ]
then
echo "${ME}: Cut it out."
else
echo "${ME}: Only root gets to do that."
fi
fi
exit 2
fi
fi
# Main body.
for slayee in $SLAYEE
do
if [ "$slayee" = "$USER" ]
then
if [ "$SLAY_BUTTHEAD" = "on" ]
then
echo "${ME}: Beavis, don't make me have to smack you."
else
echo "${ME}: Illegal operation."
fi
fi
COOL="1"
if [ "$SLAY_BUTTHEAD" = "on" ]
then
cat <<-_THEEND_ | write $slayee
${ME}: ${SIGSHOW} is kicking ${slayee}'s butt!
I'm kicking your butt.
_THEEND_
else
cat <<-_THEEND_ | write $slayee
${ME}: Sending ${SIGSHOW} signal to ${slayee}'s process(es)..."
Your current session has been terminated.
_THEEND_
fi
if [ "$SIGNAL" = "-clean" ]
then
su -m $slayee -c "kill -TERM -1"
sleep 10
su -m $slayee -c "kill -KILL -1"
else
if [ "$USER" != "root" ]
then
kill $SIGNAL -1
fi
su -m $slayee -c "kill $SIGNAL -1"
fi
done 2>/dev/null
# Error message.
if [ $COOL = "0" ]
then
if [ "$SLAY_BUTTHEAD" = "on" ]
then
echo "${ME}: How old are you, Beavis?"
else
echo "${ME}: Nothing done."
fi
exit 1
fi
# Non-error message.
if [ $COOL = "1" ]
then
if [ "$SLAY_BUTTHEAD" = "on" ]
then
echo "${ME}: Whoa, I have the power supreme."
else
echo "${ME}: Done."
fi
exit 0
fi
