Can I cache the output of a command on Linux from the CLI? - bash

I'm looking for an implementation of a 'cacheme' command, which 'memoizes' the output (stdout) of whatever it has in ARGV. If the command has never been run, it runs it and memoizes the output. If it has been run before, it just replays the output from the cache file (or even better, both stdout and stderr to &1 and &2 respectively).
Let's suppose someone wrote this command; it would work like this:
$ time cacheme sleep 1 # first time it takes one sec
real 0m1.228s
user 0m0.140s
sys 0m0.040s
$ time cacheme sleep 1 # second time it looks for stdout in the cache (dflt expires in 1h)
#DEBUG# Cache version found! (1 minute old)
real 0m0.100s
user 0m0.100s
sys 0m0.040s
This example is a bit silly because it has no output. Ideally it would be tested on a script like sleep-1-and-echo-hello-world.sh.
I created a small script that creates a file in /tmp/ named with a hash of the full command name and the username, but I'm pretty sure something like this already exists.
Are you aware of anything?
Note: why would I do this? Occasionally I run commands that are network- or compute-intensive; they take minutes to run and the output doesn't change much. If I know that in advance, I can just prepend cacheme <cmd>, go for dinner, and when I'm back I can rerun the SAME command over and over on the same machine and get the same answer in an instant.

Improved the solution above somewhat by also adding the expiry age as an optional argument.
#!/bin/sh
# save as e.g. $HOME/.local/bin/cacheme
# and then chmod u+x $HOME/.local/bin/cacheme
VERBOSE=false
PROG="$(basename "$0")"
DIR="${HOME}/.cache/${PROG}"
mkdir -p "${DIR}"
EXPIRY=600 # default to 10 minutes
# check if first argument is a number, if so use it as expiration (seconds)
[ "$1" -eq "$1" ] 2>/dev/null && EXPIRY=$1 && shift
[ "$VERBOSE" = true ] && echo "Using expiration $EXPIRY seconds"
CMD="$@"
HASH=$(echo "$CMD" | md5sum | awk '{print $1}')
CACHE="$DIR/$HASH"
test -f "${CACHE}" && [ $(expr $(date +%s) - $(date -r "$CACHE" +%s)) -le $EXPIRY ] || eval "$CMD" > "${CACHE}"
cat "${CACHE}"

I've implemented a simple caching script for bash, because I wanted to speed up plotting from a piped shell command in gnuplot. It can be used to cache the output of any command. The cache is used as long as the arguments are the same and the files passed in the arguments haven't changed. The system is responsible for cleaning up.
#!/bin/bash
# hash all arguments
KEY="$@"
# hash last modified dates of any files
for arg in "$@"
do
    if [ -f "$arg" ]
    then
        KEY+=$(date -r "$arg" +\ %s)
    fi
done
# use the hash as a name for temporary file
FILE="/tmp/command_cache.$(echo -n "$KEY" | md5sum | cut -c -10)"
# use cached file or execute the command and cache it
if [ -f "$FILE" ]
then
    cat "$FILE"
else
    "$@" | tee "$FILE"
fi
You can name the script cache, set executable flag and put it in your PATH. Then simply prefix any command with cache to use it.

Author of bash-cache here with an update. I recently published bkt, a CLI and Rust library for subprocess caching. Here's a simple example:
# Execute and cache an invocation of 'date +%s.%N'
$ bkt -- date +%s.%N
1631992417.080884000
# A subsequent invocation reuses the same cached output
$ bkt -- date +%s.%N
1631992417.080884000
It supports a number of features such as asynchronous refreshing (--stale and --warm), namespaced caches (--scope), and optionally keying off the working directory (--cwd) and select environment variables (--env). See the README for more.
It's still a work in progress but it's functional and effective! I'm using it already to speed up my shell prompt and a number of other common tasks.

I created bash-cache, a memoization library for Bash, which works exactly how you're describing. It's designed specifically to cache Bash functions, but obviously you can wrap calls to other commands in functions.
It handles a number of edge-case behaviors that many simpler caching mechanisms miss. It reports the exit code of the original call, keeps stdout and stderr separately, and retains any trailing whitespace in the output ($() command substitutions will truncate trailing whitespace).
Demo:
# Define function normally, then decorate it with bc::cache
$ maybe_sleep() {
sleep "$@"
echo "Did I sleep?"
} && bc::cache maybe_sleep
# Initial call invokes the function
$ time maybe_sleep 1
Did I sleep?
real 0m1.047s
user 0m0.000s
sys 0m0.020s
# Subsequent call uses the cache
$ time maybe_sleep 1
Did I sleep?
real 0m0.044s
user 0m0.000s
sys 0m0.010s
# Invocations with different arguments are cached separately
$ time maybe_sleep 2
Did I sleep?
real 0m2.049s
user 0m0.000s
sys 0m0.020s
There's also a benchmark function that shows the overhead of the caching:
$ bc::benchmark maybe_sleep 1
Original: 1.007
Cold Cache: 1.052
Warm Cache: 0.044
So you can see the read/write overhead (on my machine, which uses tmpfs) is roughly 1/20th of a second. This benchmark utility can help you decide whether it's worth caching a particular call or not.

How about this simple shell script (not tested)?
#!/bin/sh
mkdir -p cache
cachefile=cache/cache
for i in "$@"
do
    cachefile=${cachefile}_$(printf %s "$i" | sed 's/./\\&/g')
done
test -f "$cachefile" || "$@" > "$cachefile"
cat "$cachefile"

Improved upon the solution from @error:
Pipes output into the "tee" command, which allows it to be viewed in real time as well as stored in the cache.
Preserves colors (for example in commands like "ls --color") by using "script --flush --quiet /dev/null --command $CMD".
Avoids calling "exec" by using script as well.
Uses bash and [[.
#!/usr/bin/env bash
CMD="$@"
[[ -z $CMD ]] && echo "usage: EXPIRY=600 cache cmd arg1 ... argN" && exit 1
# set -e -x
VERBOSE=false
PROG="$(basename "$0")"
EXPIRY=${EXPIRY:-600} # default to 10 minutes, can be overridden
EXPIRE_DATE=$(date -Is -d "-$EXPIRY seconds")
[[ $VERBOSE = true ]] && echo "Using expiration $EXPIRY seconds"
HASH=$(echo "$CMD" | md5sum | awk '{print $1}')
CACHEDIR="${HOME}/.cache/${PROG}"
mkdir -p "${CACHEDIR}"
CACHEFILE="$CACHEDIR/$HASH"
if [[ -e $CACHEFILE ]] && [[ $(date -Is -r "$CACHEFILE") > $EXPIRE_DATE ]]; then
cat "$CACHEFILE"
else
script --flush --quiet --return /dev/null --command "$CMD" | tee "$CACHEFILE"
fi

The solution I came up with in Ruby is this. Does anybody see any optimization?
#!/usr/bin/env ruby
VER = '1.2'
$time_cache_secs = 3600
$cache_dir = File.expand_path("~/.cacheme")
require 'rubygems'
begin
  require 'filecache' # gem install ruby-cache
rescue Exception => e
  puts 'gem filecache requires installation, sorry. trying to install myself'
  system 'sudo gem install -r filecache'
  puts 'Try re-running the program now.'
  exit 1
end
=begin
# create a new cache called "my-cache", rooted in /home/simon/caches
# with an expiry time of 30 seconds, and a file hierarchy three
# directories deep
=end
def main
  cache = FileCache.new("cache3", $cache_dir, $time_cache_secs, 3)
  cmd = ARGV.join(' ').to_s # caching on full command, note that quotes are stripped
  cmd = 'echo give me an argument' if cmd.length < 1
  # caches the command and retrieves it
  if cache.get('output' + cmd)
    #deb "Cache found! (for '#{cmd}')"
  else
    #deb "Cache not found! Recalculating and setting for the future"
    cache.set('output' + cmd, `#{cmd}`)
  end
  #deb 'anyway calling the cache now'
  print(cache.get('output' + cmd))
end
main

An implementation exists here: https://bitbucket.org/sivann/runcached/src
Caches the executable path, output, and exit code, and remembers the arguments. Configurable expiration. Implemented in bash, C, and Python; choose whatever suits you.

Configure bash_history to show the pwd of each command

I'd like to save the current directory where each command was issued alongside the command in the history. In order not to mess things up, I was thinking about adding the current directory as a comment at the end of the line. An example might help:
$ cd /usr/local/wherever
$ grep timmy accounts.txt
I'd like bash to save the last command as:
grep timmy accounts.txt # /usr/local/wherever
The idea is that this way I could immediately see where I issued the command.
One-liner version
Here is a one-liner version. It's the original. I've also posted a short function version and a long function version with several added features. I like the function versions because they won't clobber other variables in your environment and they're much more readable than the one-liner. This post has some information on how they all work which may not be duplicated in the others.
Add the following to your ~/.bashrc file:
export PROMPT_COMMAND='hpwd=$(history 1); hpwd="${hpwd# *[0-9]* }"; if [[ ${hpwd%% *} == "cd" ]]; then cwd=$OLDPWD; else cwd=$PWD; fi; hpwd="${hpwd% ### *} ### $cwd"; history -s "$hpwd"'
This makes a history entry that looks like:
rm subdir/file ### /some/dir
I use ### as a comment delimiter to set it apart from comments that the user might type and to reduce the chance of collisions when stripping old path comments that would otherwise accumulate if you press enter on a blank command line. Unfortunately, the side effect is that a command like echo " ### " gets mangled, although that should be fairly rare.
Some people will find the fact that I reuse the same variable name to be unpleasant. Ordinarily I wouldn't, but here I'm trying to minimize the footprint. It's easily changed in any case.
It blindly assumes that you aren't using HISTTIMEFORMAT or modifying the history in some other way. It would be easy to add a date command to the comment in lieu of the HISTTIMEFORMAT feature. However, if you need to use it for some reason, it still works in a subshell since it gets unset automatically:
$ htf="%Y-%m-%d %R " # save it for re-use
$ (HISTTIMEFORMAT=$htf; history 20)|grep 11:25
There are a couple of very small problems with it. One is if you use the history command like this, for example:
$ history 3
echo "hello world" ### /home/dennis
ls -l /tmp/file ### /home/dennis
history 3
The result will not show the comment on the history command itself, even though you'll see it if you press up-arrow or issue another history command.
The other is that commands with embedded newlines leave an uncommented copy in the history in addition to the commented copy.
There may be other problems that show up. Let me know if you find any.
How it works
Bash executes a command contained in the PROMPT_COMMAND variable each time the PS1 primary prompt is issued. This little script takes advantage of that to grab the last command in the history, add a comment to it and save it back.
Here it is split apart with comments:
hpwd=$(history 1) # grab the most recent command
hpwd="${hpwd# *[0-9]* }" # strip off the history line number
if [[ ${hpwd%% *} == "cd" ]] # if it's a cd command, we want the old directory
then # so the comment matches other commands "where *were* you when this was done?"
cwd=$OLDPWD
else
cwd=$PWD
fi
hpwd="${hpwd% ### *} ### $cwd" # strip off the old ### comment if there was one so they
# don't accumulate, then build the comment
history -s "$hpwd" # replace the most recent command with itself plus the comment
hcmnt - long function version
Here is a long version in the form of a function. It's a monster, but it adds several useful features. I've also posted a one-liner (the original) and a shorter function. I like the function versions because they won't clobber other variables in your environment and
they're much more readable than the one-liner. Read the entry for the one-liner and the comments in the function below for additional information on how it works and some limitations. I've posted each version in its own answer in order to keep things more organized.
To use this one, save it in a file called hcmnt in a location like /usr/local/bin (you can chmod +x it if you want) then source it in your ~/.bashrc like this:
source /usr/local/bin/hcmnt
export hcmntextra='date "+%Y%m%d %R"'
export PROMPT_COMMAND='hcmnt'
Don't edit the function's file where PROMPT_COMMAND or hcmntextra are set. Leave them as is so they remain as defaults. Include them in your .bashrc as shown above and edit them there to set options for hcmnt or to change or unset hcmntextra. Unlike the short function, with this one you must both have the hcmntextra variable set and use the -e option to make that feature work.
You can add several options which are documented (with a couple of examples) in the comments in the function. One notable feature is to have the history entry with appended comment logged to a file and leave the actual history untouched. In order to use this function, just
add the -l filename option like so:
export PROMPT_COMMAND="hcmnt -l ~/histlog"
You can use any combination of options, except that -n and -t are mutually exclusive.
#!/bin/bash
hcmnt() {
# adds comments to bash history entries (or logs them)
# by Dennis Williamson - 2009-06-05 - updated 2009-06-19
# http://stackoverflow.com/questions/945288/saving-current-directory-to-bash-history
# (thanks to Lajos Nagy for the idea)
# the comments can include the directory
# that was current when the command was issued
# plus optionally, the date or other information
# set the bash variable PROMPT_COMMAND to the name
# of this function and include these options:
# -e - add the output of an extra command contained in the hcmntextra variable
# -i - add ip address of terminal that you are logged in *from*
# if you're using screen, the screen number is shown
# if you're directly logged in, the tty number or X display number is shown
# -l - log the entry rather than replacing it in the history
# -n - don't add the directory
# -t - add the from and to directories for cd commands
# -y - add the terminal device (tty)
# text or a variable
# Example result for PROMPT_COMMAND='hcmnt -et $LOGNAME'
# when hcmntextra='date "+%Y%m%d %R"'
# cd /usr/bin ### mike 20090605 14:34 /home/mike -> /usr/bin
# Example for PROMPT_COMMAND='hcmnt'
# cd /usr/bin ### /home/mike
# Example for detailed logging:
# when hcmntextra='date "+%Y%m%d %R"'
# and PROMPT_COMMAND='hcmnt -eityl ~/.hcmnt.log $LOGNAME@$HOSTNAME'
# $ tail -1 ~/.hcmnt.log
# cd /var/log ### dave@hammerhead /dev/pts/3 192.168.1.1 20090617 16:12 /etc -> /var/log
# INSTALLATION: source this file in your .bashrc
# will not work if HISTTIMEFORMAT is used - use hcmntextra instead
export HISTTIMEFORMAT=
# HISTTIMEFORMAT still works in a subshell, however, since it gets unset automatically:
# $ htf="%Y-%m-%d %R " # save it for re-use
# $ (HISTTIMEFORMAT=$htf; history 20)|grep 11:25
local script=$FUNCNAME
local hcmnt=
local cwd=
local extra=
local text=
local logfile=
local options=":eil:nty"
local option=
OPTIND=1
local usage="Usage: $script [-e] [-i] [-l logfile] [-n|-t] [-y] [text]"
local newline=$'\n' # used in workaround for bash history newline bug
local histline= # used in workaround for bash history newline bug
local ExtraOpt=
local LogOpt=
local NoneOpt=
local ToOpt=
local tty=
local ip=
# *** process options to set flags ***
while getopts $options option
do
case $option in
e ) ExtraOpt=1;; # include hcmntextra
i ) ip="$(who --ips -m)" # include the terminal's ip address
ip=($ip)
ip="${ip[4]}"
if [[ -z $ip ]]
then
ip=$(tty)
fi;;
l ) LogOpt=1 # log the entry
logfile=$OPTARG;;
n ) if [[ $ToOpt ]]
then
echo "$script: can't include both -n and -t."
echo $usage
return 1
else
NoneOpt=1 # don't include path
fi;;
t ) if [[ $NoneOpt ]]
then
echo "$script: can't include both -n and -t."
echo $usage
return 1
else
ToOpt=1 # cd shows "from -> to"
fi;;
y ) tty=$(tty);;
: ) echo "$script: missing filename: -$OPTARG."
echo $usage
return 1;;
* ) echo "$script: invalid option: -$OPTARG."
echo $usage
return 1;;
esac
done
text=("$@") # arguments after the options are saved to add to the comment
text="${text[*]:$OPTIND - 1:${#text[*]}}"
# *** process the history entry ***
hcmnt=$(history 1) # grab the most recent command
# save history line number for workaround for bash history newline bug
histline="${hcmnt% *}"
hcmnt="${hcmnt# *[0-9]* }" # strip off the history line number
if [[ -z $NoneOpt ]] # are we adding the directory?
then
if [[ ${hcmnt%% *} == "cd" ]] # if it's a cd command, we want the old directory
then # so the comment matches other commands "where *were* you when this was done?"
if [[ $ToOpt ]]
then
cwd="$OLDPWD -> $PWD" # show "from -> to" for cd
else
cwd=$OLDPWD # just show "from"
fi
else
cwd=$PWD # it's not a cd, so just show where we are
fi
fi
if [[ $ExtraOpt && $hcmntextra ]] # do we want a little something extra?
then
extra=$(eval "$hcmntextra")
fi
# strip off the old ### comment if there was one so they don't accumulate
# then build the string (if text or extra aren't empty, add them plus a space)
hcmnt="${hcmnt% ### *} ### ${text:+$text }${tty:+$tty }${ip:+$ip }${extra:+$extra }$cwd"
if [[ $LogOpt ]]
then
# save the entry in a logfile
echo "$hcmnt" >> $logfile || echo "$script: file error." ; return 1
else
# workaround for bash history newline bug
if [[ $hcmnt != ${hcmnt/$newline/} ]] # if there is a newline in the command
then
history -d $histline # then delete the current command so it's not duplicated
fi
# replace the history entry
history -s "$hcmnt"
fi
} # END FUNCTION hcmnt
# set a default (must use -e option to include it)
export hcmntextra='date "+%Y%m%d %R"' # you must be really careful to get the quoting right
# start using it
export PROMPT_COMMAND='hcmnt'
update 2009-06-19: Added options useful for logging (ip and tty), a workaround for the duplicate entry problem, removed extraneous null assignments
You could install Advanced Shell History, an open source tool that writes your bash or zsh history to a sqlite database. This records things like the current working directory, the command exit code, command start and stop times, session start and stop times, tty, etc.
If you want to query the history database, you can write your own SQL queries, save them and make them available within the bundled ash_query tool. There are a few useful prepackaged queries, but since I know SQL pretty well, I usually just open the database and query interactively when I need to look for something.
One query I find very useful, though, is looking at the history of the current working directory. It helps me remember where I left off when I was working on something.
vagrant@precise32:~$ ash_query -q CWD
session
when what
1
2014-08-27 17:13:07 ls -la
2014-08-27 17:13:09 cd .ash
2014-08-27 17:16:27 ls
2014-08-27 17:16:33 rm -rf advanced-shell-history/
2014-08-27 17:16:35 ls
2014-08-27 17:16:37 less postinstall.sh
2014-08-27 17:16:57 sudo reboot -n
And the same history using the current working directory (and anything below it):
vagrant@precise32:~$ ash_query -q RCWD
session
where
when what
1
/home/vagrant/advanced-shell-history
2014-08-27 17:11:34 nano ~/.bashrc
2014-08-27 17:12:54 source /usr/lib/advanced_shell_history/bash
2014-08-27 17:12:57 source /usr/lib/advanced_shell_history/bash
2014-08-27 17:13:05 cd
/home/vagrant
2014-08-27 17:13:07 ls -la
2014-08-27 17:13:09 cd .ash
/home/vagrant/.ash
2014-08-27 17:13:10 ls
2014-08-27 17:13:11 ls -l
2014-08-27 17:13:16 sqlite3 history.db
2014-08-27 17:13:43 ash_query
2014-08-27 17:13:50 ash_query -Q
2014-08-27 17:13:56 ash_query -q DEMO
2014-08-27 17:14:39 ash_query -q ME
2014-08-27 17:16:26 cd
/home/vagrant
2014-08-27 17:16:27 ls
2014-08-27 17:16:33 rm -rf advanced-shell-history/
2014-08-27 17:16:35 ls
2014-08-27 17:16:37 less postinstall.sh
2014-08-27 17:16:57 sudo reboot -n
FWIW - I'm the author and maintainer of the project.
hcmnts - short function version
Here is a short version in the form of a function. I've also posted a one-liner (the original) and a longer function with several added features. I like the function versions because they won't clobber other variables in your environment and they're much more readable than the one-liner. Read the entry for the one-liner for additional information on how this works and some limitations. I've posted each version in its own answer in order to keep things more organized.
To use this one, save it in a file called hcmnts in a location like /usr/local/bin (you can chmod +x it if you want) then source it in your ~/.bashrc like this:
source /usr/local/bin/hcmnts
Comment out the line that sets hcmntextra if you don't want the date and time (or you can change its format or use some other command besides date).
That's all there is to it.
#!/bin/bash
hcmnts() {
# adds comments to bash history entries
# the *S*hort version of hcmnt (which has many more features)
# by Dennis Williamson
# http://stackoverflow.com/questions/945288/saving-current-directory-to-bash-history
# (thanks to Lajos Nagy for the idea)
# INSTALLATION: source this file in your .bashrc
# will not work if HISTTIMEFORMAT is used - use hcmntextra instead
export HISTTIMEFORMAT=
# HISTTIMEFORMAT still works in a subshell, however, since it gets unset automatically:
# $ htf="%Y-%m-%d %R " # save it for re-use
# $ (HISTTIMEFORMAT=$htf; history 20)|grep 11:25
local hcmnt
local cwd
local extra
hcmnt=$(history 1)
hcmnt="${hcmnt# *[0-9]* }"
if [[ ${hcmnt%% *} == "cd" ]]
then
cwd=$OLDPWD
else
cwd=$PWD
fi
extra=$(eval "$hcmntextra")
hcmnt="${hcmnt% ### *}"
hcmnt="$hcmnt ### ${extra:+$extra }$cwd"
history -s "$hcmnt"
}
export hcmntextra='date +"%Y%m%d %R"'
export PROMPT_COMMAND='hcmnts'
For those who want this in zsh, I've modified Jeet Sukumaran's implementation and percol to allow interactive keyword searching and extraction of either the command or the path it was executed in. It's also possible to filter out duplicate commands and hide fields (date, command, path).
Here's a one-liner of what I use. Sticking it here because it's vastly simpler, and I have no problem with per-session history; I just also want a history with the working directory.
Also, the one-liner above mucks with your user interface too much.
export PROMPT_COMMAND='if [ "$(id -u)" -ne 0 ]; then echo "$(date "+%Y-%m-%d.%H:%M:%S") $(pwd) $(history 1)" >> ~/.bash.log; fi'
Since my home dir is typically a cross-mounted gluster thingy, this has the side effect of being a history of everything I've ever done. Optionally add $(hostname) to the echo command above... depending on your working environment.
Even with 100k entries, grep is more than good enough. No need to sqlite log it. Just don't type passwords on the command line and you're good. Passwords are 90's tech anyway!
Also, for searching I tend to do this:
function hh() {
grep "$1" ~/.bash.log
}
Gentlemen, this works better. The only thing I cannot figure out is how to stop the script from logging the last command in history to syslog on login. But it works like a charm so far.
#!/bin/bash
trackerbash() {
# adds comments to bash history entries
# by Dennis Williamson
# http://stackoverflow.com/questions/945288/saving-current-directory-to-bash-history
# (thanks to Lajos Nagy for the idea)
# Super-enhanced by QXT
# INSTALLATION: source this file in your .bashrc
export HISTTIMEFORMAT=
# export HISTTIMEFORMAT='%F %T '
local hcmnt
local cwd
local extra
local thistty
local whoiam
local sudouser
local shelldate
local TRACKIP
local TRACKHOST
thistty=`/usr/bin/tty|/bin/cut -f3-4 -d/`
whoiam=`/usr/bin/whoami`
sudouser=`last |grep $thistty |head -1 | awk '{ print $1 }' |cut -c 1-10`
hcmnt=$(history 1)
hcmnt="${hcmnt# *[0-9]* }"
cwd=`pwd`
hcmnt="${hcmnt% ### *}"
hcmnt=" $hcmnt ${extra:+$extra }"
shelldate=`date +"%Y %b %d %R:%S"`
TRACKHOST=`whoami | sed -r "s/.*\((.*)\).*/\\1/"`
TRACKIP=`last |grep $thistty |head -1 | awk '{ print $3 }'`
logger -p local1.notice -t bashtracker -i -- "$sudouser ${USER}: $thistty: $TRACKIP: $shelldate: $cwd : $hcmnt"
history -w
}
export PROMPT_COMMAND='trackerbash'
Full disclosure: I'm the author of the FOSS-tool
shournal - A (file-) journal for your shell:
Using its bash integration, the working directory of a command is also stored within shournal's sqlite database and can be retrieved via
shournal --query -cmdcwd "$PWD"
Querying for sub-working-directories can be done with
shournal --query -cmdcwd -like "$PWD/%"
You could consider an independent project (I wrote it) that supports saving the path for each command: https://github.com/chrissound/MoscoviumOrange
Where you can add a hook into Bash to save each entry with:
$(jq -n --arg command "$1" --arg path "$PWD" '{"command":$command, "path":$path}' | "$(echo 'readlink -f $(which nc)' | nix run nixpkgs.netcat)" -N -U ~/.config/moscoviumOrange/monitor.soc &)

Optimizing a script that lists available commands with manual pages

I'm using this script to generate a list of the available commands with manual pages on the system. Running this with time shows an average of about 49 seconds on my computer.
#!/usr/local/bin/bash
for x in $(for f in $(compgen -c); do which $f; done | sort -u); do
dir=$(dirname $x)
cmd=$(basename $x)
if [[ ! $(man --path "$cmd" 2>&1) =~ 'No manual entry' ]]; then
printf '%b\n' "${dir}:\n${cmd}"
fi
done | awk '!x[$0]++'
Is there a way to optimize this for faster results?
This is a small sample of my current output. The goal is to group commands by directory. This will later be fed into an array.
/bin: # directories generated by $dir
[ # commands generated by $cmd (compgen output)
cat
chmod
cp
csh
date
Going for a complete disregard of built-ins here. That's what which does, anyway. Script not thoroughly tested.
#!/bin/bash
shopt -s nullglob # need this for "empty" checks below
MANPATH=${MANPATH:-/usr/share/man:/usr/local/share/man}
IFS=: # chunk up PATH and MANPATH, both colon-delimited
# just look at the directory!
has_man_p() {
local needle=$1 manp manpp result=()
for manp in $MANPATH; do
# man? should match man0..man9 and a bunch of single-char things
# do we need 'man?*' for longer suffixes here?
for manpp in "$manp"/man?; do
# assumption made for filename formats. section not checked.
result=("$manpp/$needle".*)
if (( ${#result[@]} > 0 )); then
return 0
fi
done
done
return 1
}
unset seen
declare -A seen # for deduplication
for p in $PATH; do
printf '%b:\n' "$p" # print the path first
for exe in "$p"/*; do
cmd=${exe##*/} # the sloppy basename
if [[ ! -x $exe || ${seen[$cmd]} == 1 ]]; then
continue
fi
seen["$cmd"]=1
if has_man_p "$cmd"; then
printf '%b\n' "$cmd"
fi
done
done
Time on Cygwin with a truncated PATH (the full one with Windows has too many misses for the original version):
$ export PATH=/usr/local/bin:/usr/bin
$ time (sh ./opti.sh &>/dev/null)
real 0m3.577s
user 0m0.843s
sys 0m2.671s
$ time (sh ./orig.sh &>/dev/null)
real 2m10.662s
user 0m20.138s
sys 1m5.728s
(Caveat for both versions: most stuff in Cygwin's /usr/bin comes with a .exe extension)

wrong output because of backgrounded processes

If I run the script with ./test.sh 100 I do not get the output 100, because the increments run as background processes. What do I have to do to get the expected output? (I must not change test.sh, though.)
test.sh
#!/bin/bash
FILE="number.txt"
echo "0" > $FILE
for (( x=1; x<=$1; x++)); do
exec "./increment.sh" $FILE &
done
wait
cat $FILE
increment.sh
#!/bin/bash
value=$(< "$1")
let value++
echo $value > "$1"
EDIT
Well I tried this:
#!/bin/bash
flock $1 --shared 2>/dev/null
value=$(< "$1")
let value++
echo $value > "$1"
Now I get something like 98 or 99 all the time if I use ./test.sh 100.
It is not working very well and I do not know how to fix it.
If test.sh really cannot be improved, then each instance of increment.sh must serialize its own access to $FILE.
Filesystem locking is the obvious solution for this under UNIX. However, there is no shell builtin to accomplish this. Instead, you must rely on an external utility program like flock, setlock, or chpst -l|-L. For example:
#!/bin/bash
(
flock 100 # Lock *exclusively* (not shared)
value=$(< "$1")
let value++
echo $value > "$1"
) 100>>"$1"
A note of caution: using the file you'll be modifying as a lockfile gets tricky quickly — it's easy to truncate in shell when you didn't mean to, and the mixing of access modes above might offend some people — but the above avoids gross mistakes.

Concurrent or lock access in bash script function

Does anyone know how to lock a function in a bash script?
I want to do something like synchronized in Java, ensuring that each file saved in the monitored folder waits its turn whenever anything else is already using the submit function.
an excerpt from my script:
(...)
ON_EVENT () {
local date=$1
local time=$2
local file=$3
sleep 5
echo "$date $time New file created: $file"
submit $file
}
submit () {
local file=$1
python avsubmit.py -f $file -v
python dbmgr.py -a $file
}
if [ ! -e "$FIFO" ]; then
mkfifo "$FIFO"
fi
inotifywait -m -e "$EVENTS" --timefmt '%Y-%m-%d %H:%M:%S' --format '%T %f' "$DIR" > "$FIFO" &
INOTIFY_PID=$!
trap "on_exit" 2 3 15
while read date time file
do
on_event $date $time $file &
done < "$FIFO"
on_exit
I'm using inotify to monitor a folder when a new file is saved. For each file saved (received), submit to VirusTotal service (avsubmit.py) and TreathExpert (dbmgr.py).
Concurrent access would be ideal to avoid blocking every new file created in monitored folder, but lock submit function should be sufficient.
Thank you guys!
Something like this should work:
if (set -o noclobber; echo "$$" > "$lockfile") 2> /dev/null; then
trap 'rm -f "$lockfile"; exit $?' INT TERM EXIT
# Your code here
rm -f "$lockfile"
trap - INT TERM EXIT
else
echo "Failed to acquire $lockfile. Held by $(cat $lockfile)"
fi
Any code using rm in combination with trap or similar facility is inherently flawed against ungraceful kills, panics, system crashes, newbie sysadmins, etc. The flaw is that the lock needs to be manually cleaned after such catastrophic event for the script to run again. That may or may not be a problem for you. It is a problem for those managing many machines or wishing to have an unplugged vacation once in a while.
A modern solution using a file descriptor lock has been around for a while; I detailed it here, and a working example is on GitHub here. If you do not need to track the process ID for monitoring or other reasons, there is an interesting suggestion for a self-lock (I did not try it, and am not sure of its portability guarantees).
You can use a lock file to determine whether or not the file should be submitted.
Inside your ON_EVENT function, you should check if the appropriate lock file exists before calling the submit function. If it does exist, then return, or sleep and check again later to see if it's gone. If it doesn't exist, then create the lock and call submit. After the submit function completes, then delete the lock file.
See this thread for implementation details.
But I'd like files that cannot get the lock to stay on a waiting list (cache), to be submitted later.
I currently have something like this:
lockfile="./lock"
on_event() {
local date=$1
local time=$2
local file=$3
sleep 5
echo "$date $time New file created: $file"
if (set -o noclobber; echo "$$" > "$lockfile") 2> /dev/null; then
trap 'rm -f "$lockfile"; exit $?' INT TERM EXIT
submit_samples $file
rm -f "$lockfile"
trap - INT TERM EXIT
else
echo "Failed to acquire lockfile: $lockfile."
echo "Held by $(cat $lockfile)"
fi
}
submit_samples() {
local file=$1
python avsubmit.py -f $file -v
python dbmgr.py -a $file
}
Thank you once again ...
I had problems with this approach and found a better solution:
Procmail comes with a lockfile command which does what I wanted:
lockfile -5 -r10 /tmp/lock.file
do something very important
rm -f /tmp/lock.file
lockfile will try to create the specified lockfile. If it exists, it will retry in 5 seconds; this will be repeated a maximum of 10 times. If it can create the file, it goes on with the script.
Another solution is the lockfile-progs package in Debian; this example is directly from the man page:
Locking a file during a lengthy process:
lockfile-create /some/file
lockfile-touch /some/file &
# Save the PID of the lockfile-touch process
BADGER="$!"
do-something-important-with /some/file
kill "${BADGER}"
lockfile-remove /some/file
If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:
inotifywait -q -m -r -e CLOSE_WRITE --format %w%f $DIR |
parallel -u python avsubmit.py -f {}\; python dbmgr.py -a {}
It will run at most one python per CPU when a file is written (and closed). That way you can bypass all the locking, and you get the added benefit of avoiding a potential race condition where a file is immediately overwritten (how do you make sure that both the first and the second version were checked?).
You can install GNU Parallel simply by:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

How can I check if a program exists from a Bash script?

How would I validate that a program exists, in a way that will either return an error and exit, or continue with the script?
It seems like it should be easy, but it's been stumping me.
Answer
POSIX compatible:
command -v <the_command>
Example use:
if ! command -v <the_command> &> /dev/null
then
echo "<the_command> could not be found"
exit
fi
For Bash specific environments:
hash <the_command> # For regular commands. Or...
type <the_command> # To check built-ins and keywords
Explanation
Avoid which. Not only is it an external process you're launching for doing very little (meaning builtins like hash, type or command are way cheaper), you can also rely on the builtins to actually do what you want, while the effects of external commands can easily vary from system to system.
Why care?
Many operating systems have a which that doesn't even set an exit status, meaning the if which foo won't even work there and will always report that foo exists, even if it doesn't (note that some POSIX shells appear to do this for hash too).
Many operating systems make which do custom and evil stuff like change the output or even hook into the package manager.
So, don't use which. Instead use one of these:
command -v foo >/dev/null 2>&1 || { echo >&2 "I require foo but it's not installed. Aborting."; exit 1; }
type foo >/dev/null 2>&1 || { echo >&2 "I require foo but it's not installed. Aborting."; exit 1; }
hash foo 2>/dev/null || { echo >&2 "I require foo but it's not installed. Aborting."; exit 1; }
(Minor side-note: some will suggest 2>&- is the same 2>/dev/null but shorter – this is untrue. 2>&- closes FD 2 which causes an error in the program when it tries to write to stderr, which is very different from successfully writing to it and discarding the output (and dangerous!))
If your hash bang is /bin/sh then you should care about what POSIX says. type and hash's exit codes aren't terribly well defined by POSIX, and hash is seen to exit successfully when the command doesn't exist (haven't seen this with type yet). command's exit status is well defined by POSIX, so that one is probably the safest to use.
If your script uses bash though, POSIX rules don't really matter anymore and both type and hash become perfectly safe to use. type now has a -P to search just the PATH and hash has the side-effect that the command's location will be hashed (for faster lookup next time you use it), which is usually a good thing since you probably check for its existence in order to actually use it.
As a simple example, here's a function that runs gdate if it exists, otherwise date:
gnudate() {
if hash gdate 2>/dev/null; then
gdate "$@"
else
date "$@"
fi
}
Alternative with a complete feature set
You can use scripts-common to meet your need.
To check if something is installed, you can do:
checkBin <the_command> || errorMessage "This tool requires <the_command>. Install it please, and then run this tool again."
The following is a portable way to check whether a command exists in $PATH and is executable:
[ -x "$(command -v foo)" ]
Example:
if ! [ -x "$(command -v git)" ]; then
echo 'Error: git is not installed.' >&2
exit 1
fi
The executable check is needed because bash returns a non-executable file if no executable file with that name is found in $PATH.
Also note that if a non-executable file with the same name as the executable exists earlier in $PATH, dash returns the former, even though the latter would be executed. This is a bug and is in violation of the POSIX standard. [Bug report] [Standard]
Edit: This seems to be fixed as of dash 0.5.11 (Debian 11).
In addition, this will fail if the command you are looking for has been defined as an alias.
I agree with lhunath to discourage use of which, and his solution is perfectly valid for Bash users. However, to be more portable, command -v shall be used instead:
$ command -v foo >/dev/null 2>&1 || { echo "I require foo but it's not installed. Aborting." >&2; exit 1; }
Command command is POSIX compliant. See here for its specification: command - execute a simple command
Note: type is POSIX compliant, but type -P is not.
It depends on whether you want to know whether it exists in one of the directories in the $PATH variable or whether you know the absolute location of it. If you want to know if it is in the $PATH variable, use
if which programname >/dev/null; then
echo exists
else
echo does not exist
fi
otherwise use
if [ -x /path/to/programname ]; then
echo exists
else
echo does not exist
fi
The redirection to /dev/null in the first example suppresses the output of the which program.
I have a function defined in my .bashrc that makes this easier.
command_exists () {
type "$1" &> /dev/null ;
}
Here's an example of how it's used (from my .bash_profile.)
if command_exists mvim ; then
export VISUAL="mvim --nofork"
fi
Expanding on @lhunath's and @GregV's answers, here's the code for the people who want to easily put that check inside an if statement:
exists()
{
command -v "$1" >/dev/null 2>&1
}
Here's how to use it:
if exists bash; then
echo 'Bash exists!'
else
echo 'Your system does not have Bash'
fi
Try using:
test -x filename
or
[ -x filename ]
From the Bash manpage under Conditional Expressions:
-x file
True if file exists and is executable.
To use hash, as @lhunath suggests, in a Bash script:
hash foo &> /dev/null
if [ $? -eq 1 ]; then
echo >&2 "foo not found."
fi
This script runs hash and then checks if the exit code of the most recent command, the value stored in $?, is equal to 1. If hash doesn't find foo, the exit code will be 1. If foo is present, the exit code will be 0.
&> /dev/null redirects standard error and standard output from hash so that it doesn't appear onscreen and echo >&2 writes the message to standard error.
command -v works fine if the POSIX_BUILTINS option is set for the <command> to test for, but it can fail if not. (It has worked for me for years, but I recently ran into one case where it didn't.)
I find the following to be more failproof:
test -x "$(which <command>)"
Since it tests for three things: path, existence and execution permission.
There are a ton of options here, but I was surprised no quick one-liners. This is what I used at the beginning of my scripts:
[[ "$(command -v mvn)" ]] || { echo "mvn is not installed" 1>&2 ; exit 1; }
[[ "$(command -v java)" ]] || { echo "java is not installed" 1>&2 ; exit 1; }
This is based on the selected answer here and another source.
If you check for program existence, you are probably going to run it later anyway. Why not try to run it in the first place?
if foo --version >/dev/null 2>&1; then
echo Found
else
echo Not found
fi
It's a more trustworthy check that the program runs than merely looking at PATH directories and file permissions.
Plus you can get some useful result from your program, such as its version.
Of course the drawbacks are that some programs can be heavy to start and some don't have a --version option to immediately (and successfully) exit.
Check for multiple dependencies and inform status to end users
for cmd in latex pandoc; do
printf '%-10s' "$cmd"
if hash "$cmd" 2>/dev/null; then
echo OK
else
echo missing
fi
done
Sample output:
latex OK
pandoc missing
Adjust the 10 to the maximum command length. It is not automatic, because I don't see a non-verbose POSIX way to do it:
How can I align the columns of a space separated table in Bash?
Check if some apt packages are installed with dpkg -s and install them otherwise.
See: Check if an apt-get package is installed and then install it if it's not on Linux
It was previously mentioned at: How can I check if a program exists from a Bash script?
I never did get the previous answers to work on the box I have access to. For one, type has been installed (doing what more does). So the builtin directive is needed. This command works for me:
if [ `builtin type -p vim` ]; then echo "TRUE"; else echo "FALSE"; fi
I wanted the same question answered but to run within a Makefile.
install:
@if [[ ! -x "$(shell command -v ghead)" ]]; then \
echo 'ghead does not exist. Please install it.'; \
exit -1; \
fi
It could be simpler, just:
#!/usr/bin/env bash
set -x
# if local program 'foo' returns 1 (doesn't exist) then...
if ! type -P foo; then
echo 'crap, no foo'
else
echo 'sweet, we have foo!'
fi
Change foo to vi to get the other condition to fire.
hash foo 2>/dev/null: works with Z shell (Zsh), Bash, Dash and ash.
type -p foo: it appears to work with Z shell, Bash and ash (BusyBox), but not Dash (it interprets -p as an argument).
command -v foo: works with Z shell, Bash, Dash, but not ash (BusyBox) (-ash: command: not found).
Also note that builtin is not available with ash and Dash.
zsh only, but very useful for zsh scripting (e.g. when writing completion scripts):
The zsh/parameter module gives access to, among other things, the internal commands hash table. From man zshmodules:
THE ZSH/PARAMETER MODULE
The zsh/parameter module gives access to some of the internal hash ta‐
bles used by the shell by defining some special parameters.
[...]
commands
This array gives access to the command hash table. The keys are
the names of external commands, the values are the pathnames of
the files that would be executed when the command would be in‐
voked. Setting a key in this array defines a new entry in this
table in the same way as with the hash builtin. Unsetting a key
as in `unset "commands[foo]"' removes the entry for the given
key from the command hash table.
Although it is a loadable module, it seems to be loaded by default, as long as zsh is not used with --emulate.
example:
martin@martin ~ % echo $commands[zsh]
/usr/bin/zsh
To quickly check whether a certain command is available, just check if the key exists in the hash:
if (( ${+commands[zsh]} ))
then
echo "zsh is available"
fi
Note though that the hash will contain any files in $PATH folders, regardless of whether they are executable or not. To be absolutely sure, you have to spend a stat call on that:
if (( ${+commands[zsh]} )) && [[ -x $commands[zsh] ]]
then
echo "zsh is available"
fi
The which command might be useful. man which
It returns 0 if the executable is found and returns 1 if it's not found or not executable:
NAME
which - locate a command
SYNOPSIS
which [-a] filename ...
DESCRIPTION
which returns the pathnames of the files which would
be executed in the current environment, had its
arguments been given as commands in a strictly
POSIX-conformant shell. It does this by searching
the PATH for executable files matching the names
of the arguments.
OPTIONS
-a print all matching pathnames of each argument
EXIT STATUS
0 if all specified commands are
found and executable
1 if one or more specified commands is nonexistent
or not executable
2 if an invalid option is specified
The nice thing about which is that it figures out if the executable is available in the environment that which is run in - it saves a few problems...
Use Bash builtins if you can:
which programname
...
type -P programname
For those interested, none of the methodologies in previous answers work if you wish to detect an installed library. I imagine you are left either with physically checking the path (potentially for header files and such), or something like this (if you are on a Debian-based distribution):
dpkg --status libdb-dev | grep -q not-installed
if [ $? -eq 0 ]; then
apt-get install libdb-dev
fi
As you can see from the above, a "0" answer from the query means the package is not installed. This is a function of "grep" - a "0" means a match was found, a "1" means no match was found.
This will tell you, according to the location, whether the program exists or not:
if [ -x /usr/bin/yum ]; then
echo "This is Centos"
fi
I'd say there isn't any portable and 100% reliable way due to dangling aliases. For example:
alias john='ls --color'
alias paul='george -F'
alias george='ls -h'
alias ringo=/
Of course, only the last one is problematic (no offence to Ringo!). But all of them are valid aliases from the point of view of command -v.
In order to reject dangling ones like ringo, we have to parse the output of the shell built-in alias command and recurse into them (command -v isn't superior to alias here). There isn't any portable solution for it, and even a Bash-specific solution is rather tedious.
Note that a solution like this will unconditionally reject alias ls='ls -F':
test() { command -v "$1" | grep -qv alias; }
If you guys/gals can't get the things in the answers here to work and are pulling hair out of your back, try running the same command using bash -c. Just look at this somnambular delirium. This is what's really happening when you run $(sub-command):
First. It can give you completely different output.
$ command -v ls
alias ls='ls --color=auto'
$ bash -c "command -v ls"
/bin/ls
Second. It can give you no output at all.
$ command -v nvm
nvm
$ bash -c "command -v nvm"
$ bash -c "nvm --help"
bash: nvm: command not found
#!/bin/bash
a=$(apt-cache show program 2>/dev/null)
if [[ -z $a ]]
then
echo "the program doesn't exist"
else
echo "the program exists"
fi
#program is not literal; change it to the name of the program you want to check
The hash-variant has one pitfall: On the command line you can for example type in
one_folder/process
to have process executed. For this the parent folder of one_folder must be in $PATH. But when you try to hash this command, it will always succeed:
hash one_folder/process; echo $? # will always output '0'
I second the use of "command -v". E.g. like this:
md=$(command -v mkdirhier) ; alias md=${md:=mkdir} # bash
emacs="$(command -v emacs) -nw" || emacs=nano
alias e=$emacs
[[ -z $(command -v jed) ]] && alias jed=$emacs
I had to check if Git was installed as part of deploying our CI server. My final Bash script was as follows (Ubuntu server):
if ! builtin type -p git &>/dev/null; then
sudo apt-get -y install git-core
fi
To mimic Bash's type -P cmd, we can use the POSIX compliant env -i type cmd 1>/dev/null 2>&1.
man env
# "The option '-i' causes env to completely ignore the environment it inherits."
# In other words, there are no aliases or functions to be looked up by the type command.
ls() { echo 'Hello, world!'; }
ls
type ls
env -i type ls
cmd=ls
cmd=lsx
env -i type $cmd 1>/dev/null 2>&1 || { echo "$cmd not found"; exit 1; }
If there isn't any external type command available (as taken for granted here), we can use POSIX compliant env -i sh -c 'type cmd 1>/dev/null 2>&1':
# Portable version of Bash's type -P cmd (without output on stdout)
typep() {
command -p env -i PATH="$PATH" sh -c '
export LC_ALL=C LANG=C
cmd="$1"
cmd="`type "$cmd" 2>/dev/null || { echo "error: command $cmd not found; exiting ..." 1>&2; exit 1; }`"
[ $? != 0 ] && exit 1
case "$cmd" in
*\ /*) exit 0;;
*) printf "%s\n" "error: $cmd" 1>&2; exit 1;;
esac
' _ "$1" || exit 1
}
# Get your standard $PATH value
#PATH="$(command -p getconf PATH)"
typep ls
typep builtin
typep ls-temp
At least on Mac OS X v10.6.8 (Snow Leopard) using Bash 4.2.24(2) command -v ls does not match a moved /bin/ls-temp.
My setup for a Debian server:
I had the problem when multiple packages contained the same name.
For example apache2. So this was my solution:
function _apt_install() {
apt-get install -y $1 > /dev/null
}
function _apt_install_norecommends() {
apt-get install -y --no-install-recommends $1 > /dev/null
}
function _apt_available() {
if [ `apt-cache search $1 | grep -o "$1" | uniq | wc -l` = "1" ]; then
echo "Package is available : $1"
PACKAGE_INSTALL="1"
else
echo "Package $1 is NOT available for install"
echo "We can not continue without this package..."
echo "Exitting now.."
exit 0
fi
}
function _package_install {
_apt_available $1
if [ "${PACKAGE_INSTALL}" = "1" ]; then
if [ "$(dpkg-query -l $1 | tail -n1 | cut -c1-2)" = "ii" ]; then
echo "package is already_installed: $1"
else
echo "installing package : $1, please wait.."
_apt_install $1
sleep 0.5
fi
fi
}
function _package_install_no_recommends {
_apt_available $1
if [ "${PACKAGE_INSTALL}" = "1" ]; then
if [ "$(dpkg-query -l $1 | tail -n1 | cut -c1-2)" = "ii" ]; then
echo "package is already_installed: $1"
else
echo "installing package : $1, please wait.."
_apt_install_norecommends $1
sleep 0.5
fi
fi
}
