I need to write a Linux shell script that scans a root directory and prints the files that were modified after they were last executed.
For example, if File A was executed yesterday and I modify it today, the shell script must print File A. However, if File B was executed yesterday and I haven't modified it since, then File B shouldn't be printed.
Your primary problem is tracking when the files were executed.
The trouble is that Linux does not keep separate track of when a file was executed as opposed to when it was read for other purposes (such as backup, or review), so this is going to be extremely tricky.
There are a variety of tricks that could be considered, but none of them are particularly trivial or inviting. One option might be to enable process accounting. Another might be to modify each script to record when it is executed.
The 'last accessed' time (or atime, or st_atime, based on the name of the field in struct stat that contains the information) doesn't help you because, as already noted, it is modified whenever the file is read. Although an executed file would certainly have been accessed, there may be many read accesses that do not execute the file but that do trigger an update of the access time.
With those caveats in place, it may be that the access time is the best that you can do, and your script needs to look for files where the access time is equal to the modify time (which means the file was modified and has not been accessed since it was modified - neither read nor printed nor executed). It is less than perfect, but it may be the best approximation available, short of a complex execution tracking system.
Once you've got a mechanism in place to track the execution times of files, then you can devise an appropriate means of working out which files were modified since they were last executed.
A Unix system stores three time values for any file:
last access
last modification
last change.
I don't think you can get the last execution time without using some artificial means, like creating a log or temp file, etc., when an executable file runs.
PS: Remember that not every file in Unix is executable, so that's probably the reason they never thought of storing a file's last execution timestamp as well.
However, if you do want to get these time values, use:
stat -c "%X" file-name # to get last accessed time value as seconds since Epoch
stat -c "%Y" file-name # to get last modified time value as seconds since Epoch
stat -c "%Z" file-name # to get last change time value as seconds since Epoch
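Putting those together: a minimal sketch (GNU stat assumed, as above; the function name is just illustrative) that prints the files under a directory whose last-modified time is newer than their last-accessed time:

```shell
# Print files under $1 whose modification time is newer than their
# access time (GNU coreutils stat assumed; relies on atime being
# updated at all, so noatime mounts will defeat it).
modified_since_access() {
    find "$1" -type f | while IFS= read -r file; do
        atime=$(stat -c %X "$file")   # last accessed, seconds since Epoch
        mtime=$(stat -c %Y "$file")   # last modified, seconds since Epoch
        if [ "$mtime" -gt "$atime" ]; then
            printf '%s\n' "$file"
        fi
    done
}
```

Keep in mind the caveat above: this only tells you "modified since last read", not "modified since last executed".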
It is very hard to do this in shell, simply because it is very hard to get atime or mtime in a sensible format in shell. Consider moving the routine to a more full-featured language like Ruby or Perl:
ruby -e 'puts Dir["**/*"].select{ |file| File.mtime(file) > File.atime(file) }'
Use **/* for all files in the current directory and below, **/*.rb for all Ruby scripts in the current directory and below, /* for all files in root... you get the pattern.
Take note of what I wrote in a comment to @JonathanLeffler: UNIX does not differentiate between reading a file and executing it. Thus, printing the script out with cat ./script will have the same effect as executing it with ./script, as far as this procedure is concerned. There is no way to differentiate reading and executing that I can think of, short of making your own kernel.
However, in most cases you probably won't read the executables; and if you edit them, the save comes after the opening, so mtime will still trump atime. The only bad scenario is opening a file in an editor and then exiting without saving it (or just viewing it with less, without modification). As long as you avoid this, the method will work.
Also note that many editors do not actually modify the file in place, but create a new file, copy the contents from the old one, then overwrite the old one with the new one. This does not set the mtime, but the ctime. Modify the script accordingly, if this is your usage pattern.
EDIT: Apparently, stat can help with the sensible representation. This is in bash:
#!/bin/bash
for FILE in `find .`; do
    # Note: this is BSD stat syntax; with GNU coreutils stat use
    # stat -c "%Y -gt %X" "$FILE" instead.
    if [ `stat -f "%m -gt %a" "$FILE"` ]; then
        echo "$FILE"
    fi
done
Replace "find ." (with backticks) with * for just current directory, or /* for root. To use ctime instead of mtime, use %c instead of %m.
Say I source a bash script from within $PROMPT_COMMAND, which is to say every time Enter is pressed, which makes it quite often. Does bash optimize this somehow when the file hasn't changed?
EDIT:
Just to clarify, I only ask about loading the script's content from disk, not optimizing the code itself.
An example of an optimization one could do manually is to check whether the sourced file has the same modification date and size[1]; if so, don't read the file from disk again and instead execute an already-parsed copy of the script from memory. If that file contains only bash function definitions, one could also imagine an optimization where these definitions need not be re-evaluated at all, given that the contents are the same.
Is checking file size and modification date sufficient to determine whether a file has changed? It can certainly be subverted, but given that this is what rsync does by default, it is surely a method worth considering.
[1] If a filesystem also stores checksums for files then this would be an even better way to determine if a file on disk has or hasn't changed.
Just to avoid misunderstandings regarding the term optimization:
It seems you are concerned with the time it takes to load the sourced file from the disk (this special form of optimization is usually called caching)
... not about the time it takes to execute an already loaded file (optimization as done by compilers, e.g. gcc -O2)
As far as I know, bash neither caches file contents nor does it optimize scripts. Although the underlying file system or operating system may cache files, bash would have to parse the cached file again; which probably takes longer than loading it from a modern disk (e.g. an SSD).
I wouldn't worry too much about such things unless they actually become a problem for you. If they do, you can easily ...
Cache the script yourself
Wrap the entire content of the sourced file in a function definition. Then source the file once on shell startup. After that, you can run the function from memory.
define-my-prompt-command.sh
my_prompt_command() {
# a big script
}
.bashrc
source define-my-prompt-command.sh
PROMPT_COMMAND=my_prompt_command
You can try adding the following snippet to your script:
if ! type reload-sourced-file > /dev/null 2>&1 ; then
    echo "This is run when you first source the file..."
    PROMPT_COMMAND=$'reload-sourced-file\n'"$PROMPT_COMMAND"
    absolute_script_path=$(realpath -e "$BASH_SOURCE")
    script_previous_stat=$(stat -c "%B%y%z" "$absolute_script_path")
fi

reload-sourced-file(){
    local stat=$(stat -c "%B%y%z" "$absolute_script_path")
    if ! test "$stat" = "$script_previous_stat"; then # Re-source when stat changes.
        script_previous_stat="$stat"
        echo "You'll see this message for the following re-sourcings."
        source "$absolute_script_path"
    fi
}
The script will be re-sourced when the stat output changes. Hopefully the stat call is served from the file-system cache.
I have searched the forum and couldn't find one. Can we define a variable that only increments on every cron job run?
for example:
I have a script that runs every 5 minutes, so I need a variable that increments based on the cron run.
Say the job ran every 5 minutes for a while and the script got executed 6 times, so my counter variable should now be 6.
I'm expecting this in bash/shell.
Apologies if this is a duplicate question.
tried:
((count+1))
You can do it this way:
create two scripts: counter.sh and increment_counter.sh
add execution of increment_counter.sh in your cron job
add . /path/to/counter.sh into /etc/profile or /etc/bash.bashrc or wherever you need
counter.sh
declare -i COUNTER
COUNTER=1
export COUNTER
increment_counter.sh
#!/bin/bash
echo "COUNTER=\$COUNTER+1" >> /path/to/counter.sh
The shell that you've run the command in has exited; any variables it has set have gone away. You can't use variables for this purpose.
What you need is some sort of permanent data store. This could be a database, or a remote network service, or a variety of things, but by far the simplest solution is to store the value in a file somewhere on disk. Read the file in when the script starts and write out the incremented value afterwards.
You should think about what to do if the file is missing and what happens if multiple copies of the script are run at the same time, and decide whether those are situations you care about at all. If they are, you'll need to add appropriate error handling and locking, respectively, in your script.
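A minimal sketch of the file-based approach (the counter path and the use of flock(1) for locking are illustrative; flock ships with util-linux on Linux):

```shell
# Persist a run counter across cron invocations in a file.
# flock(1) serializes concurrent runs; a missing file counts as 0.
COUNTER_FILE=/tmp/myjob.count   # illustrative path

(
    flock -x 9                      # exclusive lock on fd 9
    if [ -f "$COUNTER_FILE" ]; then
        count=$(cat "$COUNTER_FILE")
    else
        count=0                     # first run: no file yet
    fi
    count=$((count + 1))
    echo "$count" > "$COUNTER_FILE"
    echo "run number $count"
) 9>>"$COUNTER_FILE.lock"
```

Each cron invocation reads the previous value, increments it, and writes it back; the lock file prevents two overlapping runs from losing an increment.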
Wouldn't this be a better solution?
...to define a file under /tmp, such that a command like:
echo -n "." >> $MyCounterFilename
Tracks the number of times something is invoked; in my particular case of application:
#!/bin/bash
xterm [ Options ] -T "$(cat $MyCounterFilename | wc -c )" &
echo -n "." >> $MyCounterFilename
Because I had to modify the way xterm is invoked for my purposes, and I found that, having opened many of these concurrently, one wastes less time when knowing exactly what is running in each one by its number (without having to cycle Alt+Tab and eyeball everything).
NOTE: /etc/profile, or better either ~/.profile or ~/.bash_profile, needs only an env. variable name defined containing the full path to your counter file.
Anyway, if you don't like the idea above, experiments might be performed to determine: a) the first time /etc/profile is executed after the machine is powered on and the system boots; b) whether /etc/profile is executed or not, and how many times (each time we open an xterm, for instance)... and thereafter the same sort of testing for the other files less general than the /etc one.
I need to create some sort of fail safe in one of my scripts, to prevent it from being re-executed immediately after failure. Typically when a script fails, our support team reruns the script using a 3rd party tool. Which is usually ok, but it should not happen for this particular script.
I was going to echo out a time-stamp into the log, and then make a condition to see if the current time-stamp is at least 2 hrs greater than the one in the log. If so, the script will exit itself. I'm sure this idea will work. However, this got me curious to see if there is a way to pull in the last run time of the script from the system itself? Or if there is an alternate method of preventing the script from being immediately rerun.
It's a SunOS Unix system, using the Ksh Shell.
Just do it as you proposed: save the date to some file and check it at the script start. You can:
check the last line (as a date string itself)
or the last modification time of the file (i.e. when the last date command modified the file)
Another common method is to create one specific lock-file or pid-file, such as /var/run/script.pid. Its content is usually the PID (and hostname, if needed) of the process that created it. The file-modification time tells you when it was created, and by its content you can check the running PID. If the PID doesn't exist (e.g. the previous process died) and the file-modification time is older than X minutes, you can start the script again.
This method is good mainly because you can simply use cron plus some script_starter.sh that will periodically check the script's running status and restart it when needed.
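A sketch of that lock-file idea (function name, path and the two-hour window are illustrative; note that -mmin is GNU find, so on SunOS you would emulate the age check differently, e.g. with touch on a reference file and find -newer):

```shell
# Return 0 (and record our PID) if it is safe to run; non-zero if a
# previous instance is still alive or finished too recently.
ok_to_run() {
    pidfile=$1
    if [ -f "$pidfile" ]; then
        oldpid=$(cat "$pidfile")
        # a process with that PID still exists -> refuse to run
        kill -0 "$oldpid" 2>/dev/null && return 1
        # pidfile modified less than 120 minutes ago -> refuse to run
        [ -n "$(find "$pidfile" -mmin -120 2>/dev/null)" ] && return 1
    fi
    echo $$ > "$pidfile"
    return 0
}
```

Note the usual caveat with pid-files: a stale PID can be reused by an unrelated process, so this check is a heuristic, not a guarantee.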
If you want to use system resources (and have root access), you can use accton + lastcomm.
I don't know SunOS, but it probably has those programs. accton starts system-wide accounting of all programs (needs root), and lastcomm command_name | tail -n 1 shows when command_name was last executed.
Check the man lastcomm for the command line switches.
I've written a bash script on Cygwin which is rather like rsync, although different enough that I believe I can't actually use rsync for what I need. It iterates over about a thousand pairs of files in corresponding directories, comparing them with cmp.
Unfortunately, this seems to run abysmally slowly -- taking about ten (Edit: actually 25!) times as long as it takes to generate one of the sets of files using a Python program.
Am I right in thinking that this is surprisingly slow? Are there any simple alternatives that would go faster?
(To elaborate a bit on my use-case: I am autogenerating a bunch of .c files in a temporary directory, and when I re-generate them, I'd like to copy only the ones that have changed into the actual source directory, leaving the unchanged ones untouched (with their old creation times) so that make will know that it doesn't need to recompile them. Not all the generated files are .c files, though, so I need to do binary comparisons rather than text comparisons.)
Maybe you should use Python to do some - or even all - of the comparison work too?
One improvement would be to only bother running cmp if the file sizes are the same; if they're different, clearly the file has changed. Instead of running cmp, you could think about generating a hash for each file, using MD5 or SHA1 or SHA-256 or whatever takes your fancy (using Python modules or extensions, if that's the correct term). If you don't think you'll be dealing with malicious intent, then MD5 is probably sufficient to identify differences.
Even in a shell script, you could run an external hashing command, and give it the names of all the files in one directory, then give it the names of all the files in the other directory. Then you can read the two sets of hash values plus file names and decide which have changed.
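That might look like this with md5sum (GNU coreutils assumed; the function name and temporary listing paths are illustrative): one hashing process per tree instead of one cmp per file pair, then a set difference over the listings.

```shell
# List files that are new or changed in tree $1 relative to tree $2,
# by comparing "hash path" lines from one md5sum run per tree.
changed_files() {
    new=$1 old=$2
    ( cd "$new" && find . -type f -print0 | xargs -0 md5sum ) | sort > /tmp/new.md5
    ( cd "$old" && find . -type f -print0 | xargs -0 md5sum ) | sort > /tmp/old.md5
    # lines unique to the new tree's listing = new or changed content
    comm -23 /tmp/new.md5 /tmp/old.md5 | awk '{print $2}'
}
```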
Yes, it does sound like it is taking too long. But the trouble includes having to launch 1000 copies of cmp, plus the other processing. Both the Python and the shell script suggestions above have in common that they avoid running a program 1000 times; they try to minimize the number of programs executed. This reduction in the number of processes executed will give you a pretty big bang for your buck, I expect.
If you can keep the hashes from 'the current set of files' around and simply generate new hashes for the new set of files, and then compare them, you will do well. Clearly, if the file containing the 'old hashes' (current set of files) is missing, you'll have to regenerate it from the existing files. This is slightly fleshing out information in the comments.
One other possibility: can you track changes in the data that you use to generate these files and use that to tell you which files will have changed (or, at least, limit the set of files that may have changed and that therefore need to be compared, as your comments indicate that most files are the same each time).
If you can reasonably do the comparison of a thousand odd files within one process rather than spawning and executing a thousand additional programs, that would probably be ideal.
The short answer: Add --silent to your cmp call, if it isn't there already.
You might be able to speed up the Python version by doing some file size checks before checking the data.
First, a quick-and-hacky bash(1) technique that might be far easier if you can change to a single build directory: use the bash -N test:
$ echo foo > file
$ if [ -N file ] ; then echo newer than last read ; else echo older than last read ; fi
newer than last read
$ cat file
foo
$ if [ -N file ] ; then echo newer than last read ; else echo older than last read ; fi
older than last read
$ echo blort > file # regenerate the file here
$ if [ -N file ] ; then echo newer than last read ; else echo older than last read ; fi
newer than last read
$
Of course, if some subset of the files depend upon some other subset of the generated files, this approach won't work at all. (This might be reason enough to avoid this technique; it's up to you.)
Within your Python program, you could also check the file sizes using os.stat() to determine whether or not you should call your comparison routine; if the files are different sizes, you don't really care which bytes changed, so you can skip reading both files. (This would be difficult to do in bash(1) -- I know of no mechanism to get the file size in bash(1) without executing another program, which defeats the whole point of this check.)
The cmp program will do the size comparison internally IFF you are using the --silent flag and both files are regular files and both files are positioned at the same place. (This is set via the --ignore-initial flag.) If you're not using --silent, add it and see what the difference is.
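For the copy-only-changed-files use case described above, the loop might look like this (directory layout and function name are illustrative; a flat directory of generated files is assumed):

```shell
# Copy a generated file over the source copy only when its content
# differs, so unchanged files keep their old timestamps for make.
sync_changed() {
    gen=$1 src=$2
    for f in "$gen"/*; do
        name=${f##*/}
        # cmp --silent: the exit status alone says whether they differ
        if ! cmp --silent "$f" "$src/$name"; then
            cp "$f" "$src/$name"
        fi
    done
}
```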
Let's say we use standard bash to write a for loop that runs srm to securely erase an item on your drive.
Now let's say we set it to iterate 10 times. After it has finished the first iteration, how can it still work on the file? The file should no longer exist, so how can it erase it? This is not a question specific to srm; anything can be run, even something like mv, when the file is no longer available.
It'll run through the loop 10 times, but on every iteration except the first, the command you're executing will fail (and return a non-zero exit status). The command will also write out any error messages it normally writes (to stdout, stderr, or a file).
#!/bin/bash
for i in {1..5}
do
    rm something
done
Now, assuming there's a file called something, you get:
rm: something: No such file or directory
rm: something: No such file or directory
rm: something: No such file or directory
rm: something: No such file or directory
Note that this happens 4 times, not 5, since the first time, rm ran successfully.
You can't. Once srm has returned, the file is gone.
Rather than writing a loop, you will want to adjust the arguments to srm so that it overwrites the data more times before returning.
According to the Wikipedia writeup on srm, the default mode is 35 pass Gutmann. Is that really not sufficient?
srm does the looping for you and then deletes the file; there is no need, or ability, to do what you want from bash. You would have to write something in C/C++ that talks directly to the filesystem using some OS-specific API.
Overkill. Just use shred --remove <file>; it's a dozen times easier.
If you're trying to wipe your whole drive, it's been pretty systematically proven that nothing gets you more bang for the buck than dd and writing your drive with zeroes.