Bash variable math expansion not working with printf - bash

I'm trying to get a formatted number that increments every time through a while loop.
I've got fnum=$(printf "%03d" $((++num)) ) but the number doesn't increment. fnum is "000" and remains at that.
Of course num=$((++num)) ; fnum=$(printf "%03d" $num) works but I'm wondering why the first one doesn't increment the number.

You don't need command substitution ($(..)) in the first place to store the output of printf; use the -v option to store it in a variable:
printf -v fnum "%03d" $((++num))
Also, the num variable is updated in a sub-shell: $(..) runs the command inside it in a separate shell, so the incremented value of num is never reflected back in the parent shell.
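The difference is easy to see side by side. The following is a minimal sketch (the loop bounds and the names a, b, fa and fb are illustrative and do not come from the question): the first loop increments inside $(...) and the change is lost, while the second uses printf -v and the change sticks.

```bash
#!/usr/bin/env bash
# Increment inside a command substitution: the ++ happens in a sub-shell.
a=0
for _ in 1 2 3; do
    fa=$(printf '%03d' "$((++a))")
done
echo "a=$a fa=$fa"    # prints: a=0 fa=001

# Increment with printf -v: the expansion runs in the current shell.
b=0
for _ in 1 2 3; do
    printf -v fb '%03d' "$((++b))"
done
echo "b=$b fb=$fb"    # prints: b=3 fb=003
```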

With:
$(printf "%03d" $((++num)))
the command inside $() is run in a sub-shell so changes to the num variable in there are not carried back to the parent shell.
With the working version, num=$((++num)) is executed in the context of the current shell, so num is modified.
Of course, it makes little sense to assign back to num since the side-effect of ++ is changing num anyway, so you can just do something like:
((++num)) ; fnum=$(printf "%03d" $num)
And you can totally avoid starting a sub-shell and just use internal bash stuff, which will make a large difference if you need to do this a lot(a):
((++num)) ; fnum=000${num} ; fnum=${fnum: -3} ; doSomethingWith ${fnum}
(a) As seen in the following script:
rm -f qq[12]
time (
var=0
while [[ ${var} -lt 99999 ]] ; do
((++var))
svar=$(printf "%05d" ${var})
echo ${svar}
done
) >>qq1
time (
var=0
while [[ ${var} -lt 99999 ]] ; do
((++var))
svar=00000${var}
svar=${svar: -5}
echo ${svar}
done
) >>qq2
The first snippet takes a little over nine seconds CPU time (user+system) to run, the second completes in about a second (the difference is even more pronounced if you measure wall clock time, since many copies of "printf in a subshell" need to be started):
real 0m30.875s
user 0m0.320s
sys 0m9.144s
real 0m1.008s
user 0m0.924s
sys 0m0.080s

Related

IF test against ARG_MAX and number of files in directory

I have a potentially large number of generated files and subdirectories in a directory, and the number is not known ahead of execution. For simplicity, let's say I just want to invoke
mv * /another_dir/.
But for large numbers of files and subdirectories, it'll come back with the dreaded 'argument too long'. Yes I know find and xargs is a way to deal with it. But I want to test that number against ARG_MAX and try it that way first.
I'm on a Mac and so I can't up the ulimit setting.
So far I have
# max_value
if ((expr `getconf ARG_MAX` - `env|wc -c` - `env|wc -l` \* 4 - 2048) ) ; then echo "ok" ; fi
which offers me a value to test against.
Let's say my test value, the number of files or subdirectories in the directory, is based on
# test_value
(find . -maxdepth 1 | wc -l)
How can I get a working expression no matter how many files
if (test_value < max_value ) ; then echo "do this" else echo "do that" ; fi
Every way I try to construct the if test, the syntax fails for some reason in trying to set the max_value and test_value parameters and then test them together. Grateful for help.
When writing shell scripts, you have to pay a lot of attention to what context you're in, and use the right syntax for that context. The thing that goes between if and then is treated as a command, so you could use e.g. if expr 2 \> 1; then. But the modern replacement of the expr command is the (( )) arithmetic expression, so you'd use if (( 2 > 1 )); then. Note that you don't need to escape the > because it's no longer part of a regular command, and that you cannot put spaces between the parentheses: ( ( something ) ) does something completely different from (( something )).
(( )) lets you run a calculation/comparison/etc as though it were a command. It's also common to want to use the result of a calculation as part of a command; for that, use $(( )):
max_value=$(($(getconf ARG_MAX) - $(env|wc -c) - $(env|wc -l) * 4 - 2048))
test_value=$(ls -A | wc -l)
Note that I used $( ) instead of backticks; it does essentially the same thing, with somewhat cleaner syntax. And again, the arrangement of parentheses matters: $(( )) does a calculation and captures its result; $( ) runs a command and captures its output. And $( ( ) ) would run a command in a pointless sub-subshell and capture its output. Oh, and I used ls -A because find . -maxdepth 1 will include . (the current directory) in its output, giving an overcount of 1. (And both the find and ls versions will miscount files with linefeeds in their name; oh, well.)
Then, to do the final comparison:
if ((test_value < max_value)) ; then echo "do this" ; else echo "do that" ; fi
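If names containing linefeeds are a concern, the count can be taken from a glob instead of from wc -l. This is only a sketch under assumptions: count_entries is a hypothetical helper, not part of the answer above, and the temporary directory exists purely for the demonstration.

```bash
#!/usr/bin/env bash
# count_entries (hypothetical helper): prints the number of entries in the
# directory given as $1, including dotfiles but excluding . and ..
# The function body is a sub-shell ( ... ) so shopt and cd do not leak out.
count_entries() (
    shopt -s nullglob dotglob   # empty dir gives zero matches; * matches dotfiles
    cd -- "$1" || exit
    entries=( * )
    echo "${#entries[@]}"
)

demo_dir=$(mktemp -d)
touch "$demo_dir/plain" "$demo_dir/.hidden" "$demo_dir/"$'two\nlines'
test_value=$(count_entries "$demo_dir")
echo "$test_value"              # prints 3
rm -rf -- "$demo_dir"
```

Each array element is one directory entry regardless of the characters in its name, so the result can be used directly in the (( test_value < max_value )) comparison.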

Use PS0 and PS1 to display execution time of each bash command

It seems that by executing code in PS0 and PS1 variables (which are eval'ed before and after a prompt command is run, as I understand) it should be possible to record time of each running command and display it in the prompt. Something like that:
user@machine ~/tmp
$ sleep 1
user@machine ~/tmp 1.01s
$
However, I quickly got stuck with recording time in PS0, since something like this doesn't work:
PS0='$(START=$(date +%s.%N))'
As I understand, START assignment happens in a sub-shell, so it is not visible in the outer shell. How would you approach this?
I was looking for a solution to a different problem and came upon this question, and decided that sounds like a cool feature to have. Using @Scheff's excellent answer as a base, in addition to the solutions I developed for my other problem, I came up with a more elegant and full-featured solution.
First, I created a few functions that read/write the time to/from memory. Writing to the shared memory folder avoids disk access, and the files do not persist across a reboot if they are not cleaned up for some reason.
function roundseconds (){
# rounds a number to 3 decimal places
echo m=$1";h=0.5;scale=4;t=1000;if(m<0) h=-0.5;a=m*t+h;scale=3;a/t;" | bc
}
function bash_getstarttime (){
# places the epoch time in ns into shared memory
date +%s.%N >"/dev/shm/${USER}.bashtime.${1}"
}
function bash_getstoptime (){
# reads stored epoch time and subtracts from current
local endtime=$(date +%s.%N)
local starttime=$(cat /dev/shm/${USER}.bashtime.${1})
roundseconds $(echo $(eval echo "$endtime - $starttime") | bc)
}
The input to the bash_ functions is the bash PID
Those functions and the following are added to the ~/.bashrc file
ROOTPID=$BASHPID
bash_getstarttime $ROOTPID
These create the initial time value and store the bash PID as a different variable that can be passed to a function. Then you add the functions to PS0 and PS1
PS0='$(bash_getstarttime $ROOTPID) etc..'
PS1='\[\033[36m\] Execution time $(bash_getstoptime $ROOTPID)s\n'
PS1="$PS1"'and your normal PS1 here'
Now it will generate the time in PS0 prior to processing terminal input, and generate the time again in PS1 after processing terminal input, then calculate the difference and add to PS1. And finally, this code cleans up the stored time when the terminal exits:
function runonexit (){
rm /dev/shm/${USER}.bashtime.${ROOTPID}
}
trap runonexit EXIT
Putting it all together, plus some additional code being tested, and it looks like this:
The important parts are the execution time in ms, and the user.bashtime files for all active terminal PIDs stored in shared memory. The PID is also shown right after the terminal input, as I added display of it to PS0, and you can see the bashtime files added and removed.
PS0='$(bash_getstarttime $ROOTPID) $ROOTPID experiments \[\033[00m\]\n'
As @tc said, using arithmetic expansion allows you to assign variables during the expansion of PS0 and PS1. Newer bash versions also allow PS* style expansion so you don't even need a subshell to get the current time. With bash 4.4:
# PS0 extracts a substring of length 0 from PS1; as a side-effect it assigns
# the current time as epoch seconds to PS0time (no visible output in this case)
PS0='\[${PS1:$((PS0time=\D{%s}, PS1calc=1, 0)):0}\]'
# PS1 uses the same trick to calculate the time elapsed since PS0 was output.
# It also expands the previous command's exit status ($?), the current time
# and directory ($PWD rather than \w, which shortens your home directory path
# prefix to "~") on the next line, and finally the actual prompt: 'user@host> '
PS1='\nSeconds: $((PS1calc ? \D{%s}-$PS0time : 0)) Status: $?\n\D{%T} ${PWD:PS1calc=0}\n\u@\h> '
(The %N date directive does not seem to be implemented as part of \D{...} expansion with bash 4.4. This is a pity since we only have a resolution in single second units.)
Since PS0 is only evaluated and printed if there is a command to execute, it sets the PS1calc flag to 1 so that PS1 expansion knows whether to compute the time difference following the command (PS1calc being 0 means PS0 was not previously expanded and so did not re-evaluate PS0time). PS1 then resets PS1calc to 0. In this way an empty line (just hitting return) doesn't accumulate seconds between return key presses.
One nice thing about this method is that there is no output when you have set -x active. No subshells or temporary files in sight: everything is done within the bash process itself.
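The side-effect trick can be tried outside a real prompt. The sketch below assumes bash 4.4 or newer, since it relies on the ${parameter@P} transformation to expand a string the way a prompt would be expanded; demo_prompt and demo_stamp are made-up names used only for illustration.

```bash
#!/usr/bin/env bash
# Requires bash 4.4+ for ${parameter@P}.
demo_stamp=0
# Same shape as the PS0 above: a zero-length substring whose offset
# expression assigns a variable as a side effect.
demo_prompt='${demo_prompt:$((demo_stamp=42, 0)):0}'

expanded=${demo_prompt@P}     # expand as if it were a prompt string
echo "output: '${expanded}' stamp: ${demo_stamp}"   # prints: output: '' stamp: 42
```

The expansion produces no text, yet demo_stamp changes in the current shell, which is exactly what a command substitution in PS0 cannot do.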
I took this as puzzle and want to show the result of my puzzling:
First I fiddled with time measurement. The date +%s.%N (which I wasn't aware of before) was where I started from. Unfortunately, bash's arithmetic evaluation does not seem to support floating point. Thus, I chose something else:
$ START=$(date +%s.%N)
$ awk 'BEGIN { printf("%fs", '$(date +%s.%N)' - '$START') }' /dev/null
8.059526s
$
This is sufficient to compute the time difference.
Next, I confirmed what you already described: sub-shell invocation prevents usage of shell variables. Thus, I thought about where else I could store the start time so that it is global for sub-shells but local enough to be used in multiple interactive shells concurrently. My solution is temporary files (in /tmp). To provide a unique name I came up with this pattern: /tmp/$USER.START.$BASHPID.
$ date +%s.%N >/tmp/$USER.START.$BASHPID ; \
> awk 'BEGIN { printf("%fs", '$(date +%s.%N)' - '$(cat /tmp/$USER.START.$BASHPID)') }' /dev/null
cat: /tmp/ds32737.START.11756: No such file or directory
awk: cmd. line:1: BEGIN { printf("%fs", 1491297723.111219300 - ) }
awk: cmd. line:1: ^ syntax error
$
Damn! Again I'm trapped in the sub-shell issue. To come around this, I defined another variable:
$ INTERACTIVE_BASHPID=$BASHPID
$ date +%s.%N >/tmp/$USER.START.$INTERACTIVE_BASHPID ; \
> awk 'BEGIN { printf("%fs", '$(date +%s.%N)' - '$(cat /tmp/$USER.START.$INTERACTIVE_BASHPID)') }' /dev/null
0.075319s
$
Next step: fiddle this together with PS0 and PS1. In a similar puzzle (SO: How to change bash prompt color based on exit code of last command?), I already mastered the "quoting hell". Thus, I should be able to do it again:
$ PS0='$(date +%s.%N >"/tmp/${USER}.START.${INTERACTIVE_BASHPID}")'
$ PS1='$(awk "BEGIN { printf(\"%fs\", "$(date +%s.%N)" - "$(cat /tmp/$USER.START.$INTERACTIVE_BASHPID)") }" /dev/null)'"$PS1"
0.118550s
$
Ahh. It starts to work. Thus, there is only one issue - to find the right start-up script for the initialization of INTERACTIVE_BASHPID. I found ~/.bashrc which seems to be the right one for this, and which I already used in the past for some other personal customizations.
So, putting it all together - these are the lines I added to my ~/.bashrc:
# command duration puzzle
INTERACTIVE_BASHPID=$BASHPID
date +%s.%N >"/tmp/${USER}.START.${INTERACTIVE_BASHPID}"
PS0='$(date +%s.%N >"/tmp/${USER}.START.${INTERACTIVE_BASHPID}")'
PS1='$(awk "BEGIN { printf(\"%fs\", "$(date +%s.%N)" - "$(cat /tmp/$USER.START.$INTERACTIVE_BASHPID)") }" /dev/null)'"$PS1"
The 3rd line (the date command) has been added to solve another issue. Comment it out and start a new interactive bash to find out why.
A snapshot of my cygwin xterm with bash where I added the above lines to ~/.bashrc:
Notes:
I consider this a solution to a puzzle rather than a "serious productive" solution. I'm sure that this kind of time measurement itself consumes a lot of time. The time command might provide a better solution: SE: How to get execution time of a script effectively?. However, this was a nice exercise for practicing bash...
Don't forget that this code pollutes your /tmp directory with a growing number of small files. Either clean-up the /tmp from time to time or add the appropriate commands for clean-up (e.g. to ~/.bash_logout).
Arithmetic expansion runs in the current process and can assign to variables. It also produces output, which you can consume with something like \e[$((...,0))m (to output \e[0m) or ${t:0:$((...,0))} (to output nothing, which is presumably better). Bash's 64-bit integer support can count POSIX nanoseconds until the year 2262.
$ PS0='${t:0:$((t=$(date +%s%N),0))}'
$ PS1='$((( t )) && printf %d.%09ds $((t=$(date +%s%N)-t,t/1000000000)) $((t%1000000000)))${t:0:$((t=0))}\n$ '
0.053282161s
$ sleep 1
1.064178281s
$
$
PS0 is not evaluated for empty commands, which leaves a blank line (I'm not sure if you can conditionally print the \n without breaking things). You can work around that by switching to PROMPT_COMMAND instead (which also saves a fork):
$ PS0='${t:0:$((t=$(date +%s%N),0))}'
$ PROMPT_COMMAND='(( t )) && printf %d.%09ds\\n $((t=$(date +%s%N)-t,t/1000000000)) $((t%1000000000)); t=0'
0.041584565s
$ sleep 1
1.077152833s
$
$
That said, if you do not require sub-second precision, I would suggest using $SECONDS instead (which is also more likely to return a sensible answer if something sets the time).
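A whole-second variant along those lines might look like the following sketch. It assumes bash 4.4 or newer (for PS0); cmd_timer_start, cmd_timer_on and cmd_timer_report are hypothetical names, and assigning PROMPT_COMMAND this way replaces anything already stored there.

```bash
# Lines for ~/.bashrc. All cmd_timer_* names are made up for this sketch.
# PS0 records the start time and raises a flag, printing nothing.
PS0='${PS0:$((cmd_timer_start=SECONDS, cmd_timer_on=1, 0)):0}'

# Runs before each prompt: report only if PS0 fired since the last prompt,
# so pressing Enter on an empty line prints nothing.
cmd_timer_report() {
    if (( cmd_timer_on )); then
        printf '%ds\n' "$(( SECONDS - cmd_timer_start ))"
    fi
    cmd_timer_on=0
}
PROMPT_COMMAND=cmd_timer_report
```

A separate flag is used instead of testing the start time itself because SECONDS can legitimately be 0 right after the shell starts.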
As correctly stated in the question, a command substitution in PS0 runs inside a sub-shell, which makes it unusable for setting the start time.
Instead, one can use the history command with epoch seconds %s and the built-in variable $EPOCHSECONDS to calculate when the command finished by leveraging only $PROMPT_COMMAND.
# Save start time before executing command (does not work due to PS0 sub-shell)
# preexec() {
# STARTTIME=$EPOCHSECONDS
# }
# PS0=preexec
# Save end time, without duplicating commands when pressing Enter on an empty line
precmd() {
local st=$(HISTTIMEFORMAT='%s ' history 1 | awk '{print $2}');
if [[ -z "$STARTTIME" || (-n "$STARTTIME" && "$STARTTIME" -ne "$st") ]]; then
ENDTIME=$EPOCHSECONDS
STARTTIME=$st
else
ENDTIME=0
fi
}
__timeit() {
precmd;
if ((ENDTIME - STARTTIME >= 0)); then
printf 'Command took %d seconds.\n' "$((ENDTIME - STARTTIME))";
fi
# Do not forget your:
# - OSC 0 (set title)
# - OSC 777 (notification in gnome-terminal, urxvt; note, this one has preexec and precmd as OSC 777 features)
# - OSC 99 (notification in kitty)
# - OSC 7 (set url) - out of scope for this question
}
export PROMPT_COMMAND=__timeit
Note: If you have ignoredups in your $HISTCONTROL, then this will not report back for a command that is re-run.
Following @SherylHohman's use of variables in PS0, I've come up with this complete script. I found that you don't need a PS0Time flag, as PS0Calc doesn't exist on empty prompts, so the _elapsed function just exits.
#!/bin/bash
# string preceding ms, use color code or ascii
_ELAPTXT=$'\E[1;33m \uf135 '
# extract time
_printtime () {
# drop the decimal separator, which is . or , depending on the locale
local _var=${EPOCHREALTIME/[.,]/};
echo ${_var%???}
}
# get diff time, print it and end color codings if any
_elapsed () {
[[ -v "${1}" ]] || ( local _VAR=$(_printtime);
local _ELAPSED=$(( ${_VAR} - ${1} ));
echo "${_ELAPTXT}$(_formatms ${_ELAPSED})"$'\n\e[0m' )
}
# format _elapsed with simple string substitution
_formatms () {
local _n=$((${1})) && case ${_n} in
? | ?? | ???)
echo $_n"ms"
;;
????)
echo ${_n:0:1}${_n:0,-3}"ms"
;;
?????)
echo ${_n:0:2}","${_n:0,-3}"s"
;;
??????)
printf $((${_n:0:3}/60))m+$((${_n:0:3}%60)),${_n:0,-3}"s"
;;
???????)
printf $((${_n:0:4}/60))m$((${_n:0:4}%60))s${_n:0,-3}"ms"
;;
*)
printf "too much!"
;;
esac
}
# prompts
PS0='${PS1:(PS0time=$(_printtime)):0}'
PS1='$(_elapsed $PS0time)${PS0:(PS0time=0):0}\u@\h:\w\$ '
Save it as _prompt and source it to try:
source _prompt
Change text, ascii codes and colors in _ELAPTXT
_ELAPTXT=$'\e[33m Elapsed time: '

bash loop taking extremely long time

I have a list of times that I am looping through in the format HH:MM:SS to find the nearest but not past time. The code that I have is:
for i in ${times[@]}; do
hours=$(echo $i | sed 's/\([0-9]*\):.*/\1/g')
minutes=$(echo $i | sed 's/.*:\([0-9]*\):.*/\1/g')
currentHours=$(date +"%H")
currentMinutes=$(date +"%M")
if [[ hours -ge currentHours ]]; then
if [[ minutes -ge currentMinutes ]]; then
break
fi
fi
done
The variable times is an array of all the times that I am sorting through (it's about 20-40 lines). I'd expect this to take less than 1 second; however, it is taking upwards of 5 seconds. Any suggestions for decreasing the time of the regular expression would be appreciated.
times=($(cat file.txt))
Here is a list of the times that are stored in a text file and are imported into the times variable using the above line of code.
6:05:00
6:35:00
7:05:00
7:36:00
8:08:00
8:40:00
9:10:00
9:40:00
10:11:00
10:41:00
11:11:00
11:41:00
12:11:00
12:41:00
13:11:00
13:41:00
14:11:00
14:41:00
15:11:00
15:41:00
15:56:00
16:11:00
16:26:00
16:41:00
16:58:00
17:11:00
17:26:00
17:41:00
18:11:00
18:41:00
19:10:00
19:40:00
20:10:00
20:40:00
21:15:00
21:45:00
One of the key things to understand in looking at bash scripts from a performance perspective is that while the bash interpreter is somewhat slow, the act of spawning an external process is extremely slow. Thus, while it can often speed up your scripts to use a single invocation of awk or sed to process a large stream of input, starting those invocations inside a tight loop will greatly outweigh the performance of those tools once they're running.
Any command substitution -- $() -- causes a second copy of the interpreter to be fork()ed off as a subshell. Invoking any command not built into bash -- date, sed, etc -- then causes a subprocess to be fork()ed off for that process, and then the executable associated with that process to be exec()'d -- something that involves a great deal of OS-level overhead (the binary needs to be linked, loaded, etc).
This loop would be better written as:
IFS=: read -r currentHours currentMinutes < <(date +"%H:%M")
while IFS=: read -r hours minutes _; do
if (( hours >= currentHours )) && (( minutes >= currentMinutes )); then
break
fi
done <file.txt
In this form only one external command is run, date +"%H:%M", outside the loop. If you were only targeting bash 4.2 and newer (with built-in time formatting support), even this would be unnecessary:
printf -v currentHours '%(%H)T' -1
printf -v currentMinutes '%(%M)T' -1
...will directly place the current hour and minute into the variables currentHours and currentMinutes using only functionality built into modern bash releases.
See:
BashFAQ #1 - How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
BashFAQ #100 - How can I do native string manipulations in bash? (Subsection: "Splitting a string into fields")
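Two caveats apply to comparisons of this kind. Zero-padded values such as 08 (which date +%H produces) are rejected as invalid octal in bash arithmetic, and comparing hours and minutes separately skips a time like 17:11 when the current time is 16:45. The sketch below sidesteps both by forcing base 10 and comparing minutes since midnight; next_departure is a hypothetical helper, and the current time is passed in as an argument so the example is reproducible.

```bash
#!/usr/bin/env bash
# next_departure (hypothetical helper): prints the first HH:MM in FILE that
# is not earlier than NOW (given as HH:MM). Uses only bash built-ins.
next_departure() {
    local file=$1 now=$2 hours minutes _ now_h now_m now_total
    IFS=: read -r now_h now_m <<<"$now"
    # 10# forces base 10, so 08 and 09 are not parsed as octal.
    now_total=$(( 10#$now_h * 60 + 10#$now_m ))
    while IFS=: read -r hours minutes _; do
        [[ -n $hours && -n $minutes ]] || continue   # skip blank lines
        if (( 10#$hours * 60 + 10#$minutes >= now_total )); then
            printf '%s:%s\n' "$hours" "$minutes"
            return 0
        fi
    done <"$file"
    return 1    # nothing left today
}

times_file=$(mktemp)
printf '%s\n' 6:05:00 8:08:00 16:41:00 17:11:00 >"$times_file"
result=$(next_departure "$times_file" 16:45)
echo "$result"          # prints 17:11
rm -f -- "$times_file"
```

In a real script the second argument could come from printf -v now '%(%H:%M)T' -1 on bash 4.2 or newer.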
To be honest I'm not sure why it's taking an extremely long time but there are certainly some things which could be made more efficient.
currentHours=$(date +"%H")
currentMinutes=$(date +"%M")
for time in "${times[@]}"; do
IFS=: read -r hours minutes seconds <<<"$time"
if [[ hours -ge currentHours && minutes -ge currentMinutes ]]; then
break
fi
done
This uses read, a built-in command, to split the text into variables, rather than calling external commands and creating subshells.
I assume that you want the script to run so quickly that it's safe to reuse currentHours and currentMinutes within the loop.
Note that you can also just use awk to do the whole thing:
awk -F: -v currentHours="$(date +"%H")" -v currentMinutes="$(date +"%M")" '
$1 >= currentHours && $2 >= currentMinutes { print; exit }' file.txt
Just to make the program produce some output, I added a print, so that the first matching line is printed.
awk to the rescue!
awk -v time="12:12:00" '
function pad(x) {split(x,ax,":"); return (ax[1]<10)?"0"x:x}
BEGIN {time=pad(time)}
time>pad($0) {next}
{print; exit}' times
12:41:00
With the hour zero-padded, you can do a string-only comparison.
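The same padding idea also works in plain bash, where [[ a < b ]] compares strings. This is a minimal sketch; pad_hour is a hypothetical helper that leaves its result in REPLY to avoid a command substitution.

```bash
#!/usr/bin/env bash
# pad_hour (hypothetical helper): stores the zero-padded time in REPLY.
pad_hour() {
    REPLY=$1
    # A single character before the first colon means the hour needs a 0.
    [[ $REPLY == ?:* ]] && REPLY=0$REPLY
    return 0
}

pad_hour 9:40:00;  early=$REPLY
pad_hour 12:12:00; late=$REPLY
if [[ $early < $late ]]; then
    echo "$early sorts before $late"   # prints: 09:40:00 sorts before 12:12:00
fi
```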

Infinite 'for' loop with Bash

I have a script like
#!/bin/bash
for i in {1..xx};do break="$i"
If....; then Some command
else break;fi
done
I need something which can repeat this script n times with incrementing $i.
I tried this:
For (( ; ; )); do i=1 && echo $i && ((i++));done
But this always shows 1, not an incrementing number. I also tried $((i+=1)).
Here xx must be an endless number,
and break="$i" tells me how many times the script has repeated.
Using for to create an endless loop is unidiomatic, but not hard. Just make the ending condition never true; or, trivially, omit it.
for((i=0; ;++i)); do
echo "$i"
done
The above is Bash only. The usual solution, which works in POSIX sh too, is to use while true (but then that doesn't come with an incrementing index, if that's really what you need).
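If the counter is still wanted with while true, it can be maintained by hand. A minimal sketch that sticks to POSIX syntax; the test on i is only a stand-in for whatever condition the real script checks.

```bash
#!/bin/sh
i=0
while true; do
    i=$((i + 1))
    # Placeholder condition: leave the loop once the check fails.
    if ! [ "$i" -lt 5 ]; then
        break
    fi
done
echo "loop body ran $i times"    # prints: loop body ran 5 times
```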

For loop with an argument based range

I want to run certain actions on a group of lexicographically named files (01-09 before 10). I have to use a rather old version of FreeBSD (7.3), so I can't use yummies like echo {01..30} or seq -w 1 30.
The only working solution I found is printf "%02d " {1..30}. However, I can't figure out why can't I use $1 and $2 instead of 1 and 30. When I run my script (bash ~/myscript.sh 1 30) printf says {1..30}: invalid number
AFAIK, variables in bash are typeless, so why can't printf accept an integer argument as an integer?
Bash supports C-style for loops:
s=1
e=30
for ((i=s; i<=e; i++)); do printf "%02d " "$i"; done
The syntax you attempted doesn't work because brace expansion happens before parameter expansion, so when the shell tries to expand {$1..$2}, it's still literally {$1..$2}, not {1..30}.
The answer given by @Kent works because eval goes back to the beginning of the parsing process. I tend to suggest avoiding making habitual use of it, as eval can introduce hard-to-recognize bugs -- if your command were whitelisted to be run by sudo and $1 were, say, '$(rm -rf /; echo 1)', the C-style-for-loop example would safely fail, and the eval example... not so much.
Granted, 95% of the scripts you write may not be accessible to folks executing privilege escalation attacks, but the remaining 5% can really ruin one's day; following good practices 100% of the time avoids being in sloppy habits.
Thus, if one really wants to pass a range of numbers to a single command, the safe thing is to collect them in an array:
a=( )
for ((i=s; i<=e; i++)); do a+=( "$i" ); done
printf "%02d " "${a[@]}"
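The expansion order can be checked directly. In this sketch the range is shortened to 1..3 to keep the output small, and words, padded and item are illustrative names.

```bash
#!/usr/bin/env bash
s=1 e=3

# Brace expansion runs first and sees no valid range in {$s..$e}, so the
# braces survive and only the variables are substituted afterwards.
words=( {$s..$e} )
echo "${#words[@]} word: ${words[0]}"    # prints: 1 word: {1..3}

# The C-style loop works on the variables' values instead.
padded=()
for ((i = s; i <= e; i++)); do
    printf -v item '%02d' "$i"           # format without a sub-shell
    padded+=( "$item" )
done
echo "${padded[*]}"                      # prints: 01 02 03
```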
I guess you are looking for this trick:
#!/bin/bash
s=1
e=30
printf "%02d " $(eval echo {$s..$e})
Ok, I finally got it!
#!/bin/bash
#BSD-only iteration method
#for day in `jot $1 $2`
for ((day=$1; day<$2; day++))
do
echo $(printf %02d $day)
done
I initially wanted to use the loop counter as a "day" in file names, but now I see that in my exact case it's easier to iterate through normal numbers (1, 2, 3 etc.) and process them into lexicographical ones inside the loop. When using jot, remember that $1 is the number of values to generate and $2 is the starting point.

Resources