Bash Script to Check That Files Are Being Created - bash

We have an Amazon EC2 instance where we upload output from our security cameras. Every now and then, the cameras have an issue, stop uploading, and need to be rebooted. The easiest way for us to detect this is to check whether files are still being created. The problem is that the cameras create lots and lots of files. If I use find with -ctime, the script takes a very long time to run. Is there a faster way to check whether files have been created since yesterday? I just need to capture the result (yes, there are some files, or no, there are not) and email a message, but it would be nice to have something that didn't take half an hour to run.
#!/bin/bash
find /vol/security_ftp/West -ctime -1
find /vol/security_ftp/BackEntrance -ctime -1
find /vol/security_ftp/BoardroomDoor -ctime -1
find /vol/security_ftp/MainEntrance -ctime -1
find /vol/security_ftp/North -ctime -1
find /vol/security_ftp/South -ctime -1

Using find is a natural solution, but if you really must avoid it, you can see the newest file in a directory using ls and sorting the output according to ctime, e.g.
ls -clt /vol/security_ftp/West | head --lines=2
With -l, the first line of output is the "total" line, so the newest file is on the second line. This would be enough if you want to see the date.
If you need better formatted output (or only the ctime, to process it further) you can feed the filename to stat. Note that ls prints the bare filename, so the directory has to be prepended:
stat --format="%z" "/vol/security_ftp/West/$( ls -ct /vol/security_ftp/West | head --lines=1 )"
This does not automatically tell you whether any file was created recently, though.
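To turn the ls/stat approach into a yes/no answer, the newest ctime can be compared against the current time. The following is only a sketch under a few assumptions: GNU ls, stat, and date are available, filenames contain no newlines, and the function name created_recently is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) if the entry with the newest
# ctime in directory $1 changed less than 24 hours (86400 seconds) ago.
created_recently() {
    local dir=$1 newest changed now
    # ls -ct sorts by ctime, newest first; take the first name only.
    newest=$(ls -ct "$dir" | head --lines=1)
    [ -n "$newest" ] || return 1            # empty directory: nothing recent
    # %Z is the ctime as seconds since the epoch (GNU stat).
    changed=$(stat --format=%Z "$dir/$newest") || return 1
    now=$(date +%s)
    [ $(( now - changed )) -lt 86400 ]
}
```

It could be used as `created_recently /vol/security_ftp/West || echo "West is stale"`. ls still has to read and sort the whole directory, so this is not necessarily faster than find.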

The simple solution (and the one recommended by man find) is:
find /vol/security_ftp/ -mtime 0
This finds files under /vol/security_ftp modified within the last 24 hours. Give it a try and see whether it meets your time requirements. We can look for another solution if the default isn't quick enough. If the delay is due to numerous subdirectories under /vol/security_ftp, then limit the depth and type with:
find /vol/security_ftp/ -maxdepth 1 -type f -mtime 0
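Since only a yes/no answer per camera is needed, find does not have to list every match. One possible sketch, assuming GNU find (for -quit) and using the made-up function names has_recent_files and stale_cameras, stops at the first recent file in each directory:

```shell
#!/bin/bash
# Hypothetical helper: succeeds if directory $1 holds at least one regular
# file modified within the last 24 hours. -quit (GNU find) makes find exit
# at the first match instead of walking the whole tree. A missing or
# unreadable directory produces no output and so counts as "no recent files".
has_recent_files() {
    [ -n "$(find "$1" -type f -mtime -1 -print -quit 2>/dev/null)" ]
}

# Prints the name of every camera directory under base directory $1
# (remaining arguments) that has no recent files.
stale_cameras() {
    local base=$1 camera
    shift
    for camera in "$@"; do
        has_recent_files "$base/$camera" || printf '%s\n' "$camera"
    done
}
```

For example, `stale=$(stale_cameras /vol/security_ftp West BackEntrance BoardroomDoor MainEntrance North South)` collects the quiet cameras, and a non-empty `$stale` could then be piped to a mail command. A directory that is working normally returns almost immediately; only a dead camera's directory is scanned in full.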

Related

Unix find list matching directories

I have a rather interesting problem that I am trying to find the optimal solution for. I am creating a file autocompletion backend for Emacs. This means that I am using the Linux find command to get files and directories.
The backend is given a partially completed path (e.g. /usr/folder/foo) and I want to grab all files and directories that could match the partial path, up to two directories down (for example, it could provide foo_bar/, foo_bar/bar, foo_bar/baz, foo_bar/bat/, foo_baz). So far I have only been able to break this down into 3 steps:
find all files in the current directory that may match the prefix
find foo* -maxdepth 1 -type f
collect all possible directories we may want to look through
find foo* -maxdepth 1 -type d
use each of those directories to make 2 more calls to find (I need to be able to differentiate between files and directories)
find foo_bar/ -maxdepth 1 -type d
find foo_bar/ -maxdepth 1 -type f
This solution involves a lot of calls to find (especially because the last step has to be called for every matching directory). This makes getting candidates slow, especially in large file systems. Ideally I would like to only make one call to get all the candidates. But I have not found a good way to do that. Does anyone know an optimal solution?
Looking through the find manpage, I ended up using -printf.
find -L foo* -maxdepth 1 -printf '%p\t%y\n'
This gives me everything I needed: a single command that differentiates between files and directories, limits the search depth, etc.
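The tab-separated output can then be split back into path and type on the consuming side. As a rough sketch (the function name list_candidates is made up, GNU find is assumed for -printf, and paths are assumed to contain no tabs or newlines), directories can be marked with a trailing slash like this:

```shell
#!/bin/bash
# Hypothetical helper: lists completion candidates for the prefix $1,
# relative to the current directory. Directories get a trailing "/".
list_candidates() {
    local prefix=$1 path type
    # %p is the path, %y the one-letter type (d = directory, f = file, ...).
    # Errors are discarded so an unmatched glob simply yields no candidates.
    find -L "$prefix"* -maxdepth 1 -printf '%p\t%y\n' 2>/dev/null |
    while IFS=$'\t' read -r path type; do
        case $type in
            d) printf '%s/\n' "$path" ;;
            *) printf '%s\n' "$path" ;;
        esac
    done
}
```

Running `list_candidates foo` in a directory containing foo_bar/ and foo_baz would print each matching entry plus the direct children of foo_bar/, one per line.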

How to delete specific files in unix

We have a few files on our server instance under the /wslogs/instance_name directory, and these are all log files created on a daily basis.
I am looking for a script to automatically delete those files based on date.
So let's say delete files older than 10 days. The problem is that the filename is not purely a date; rather, it is
hostname_%m%d%Y_access.log and hostname_%m%d%Y_error.log
For example, ra70960708_12042016_access.log and ra70960708_12042016_error.log (where ra70960708 is the server name or hostname).
I'm trying to use the rm command, but I can't figure out how to specify the files if, say, I have to delete those that are more than 10 days older than the current date.
Any help would be greatly appreciated.
Cheers,
Ashley
Forget about the name, and use the modification time instead:
The command below lists files in the current directory that match the glob hostname_*_error.log and were last modified more than 10 days ago:
find . -maxdepth 1 -mindepth 1 \
-type f -name 'hostname_*_error.log' \
-mtime +10
They can then be deleted by appending -delete to the command.
. is the directory to search in.
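Putting this together for both the access and error logs might look like the sketch below. It assumes a find that supports -delete (GNU and BSD find do); the function name purge_old_logs and its arguments are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: deletes *_access.log and *_error.log files directly
# inside directory $1 whose modification time is more than $2 days old
# (default 10). Each deleted path is printed before it is removed.
purge_old_logs() {
    local dir=$1 days=${2:-10}
    find "$dir" -maxdepth 1 -mindepth 1 -type f \
        \( -name '*_access.log' -o -name '*_error.log' \) \
        -mtime +"$days" -print -delete
}
```

A daily cron job could then call `purge_old_logs /wslogs/instance_name 10`. Dropping -delete from the find command first is a cheap way to preview what would be removed.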

find files in huge directory - very slow

I have a directory with files. The archive is very big and has 1.5 million PDF files inside.
The directory is stored on an IBM i server with OS V7R1, and the machine is new and very fast.
The files are named like this:
invoice_[custno]_[year]_[invoice_number].pdf
invoice_081500_2013_7534435564.pdf
Now I am trying to find files with the find command using the shell.
find . -name 'invoice_2013_*.pdf' -type f | ls -l > log.dat
The command took a long time so I aborted the operation with no result.
If I try it with smaller directories all works fine.
Later I want to have a job that runs every day and finds the files created in the last 24 hours, but if it always runs this slowly I can forget about that.
That invocation would never work because ls does not read filenames from stdin.
Possible solutions are:
Use the find utility's built-in list option:
find . -name 'invoice_2013_*.pdf' -type f -ls > log.dat
Use the find utility's -exec option to execute ls -l for each matching file:
find . -name 'invoice_2013_*.pdf' -type f -exec ls -l {} \; > log.dat
Pipe the filenames to the xargs utility and let it execute ls -l with the filenames as parameters:
find . -name 'invoice_2013_*.pdf' -type f | xargs ls -l > log.dat
A pattern search of 1.5 million files in a single directory is going to be inefficient on any filesystem.
For looking only at a list of new entries in the directory, you might consider journaling the directory. You would specify INHERIT(*NO) to prevent journaling all the files in the directory as well. Then you could simply extract the recent journal entries with DSPJRN to find out what objects had been added.
I don't think I'd put more than maybe 15k files in a single directory. Some QShell utilities run into trouble at around 16k files. But I'm not sure I'd store them in a directory in any case, except maybe for ones over 16MB if that's a significant fraction of the total. I'd possibly look to store them in CLOBs/BLOBs in the database first.
Storing as individual streamfile objects brings ownership/authority problems that need to be addressed. Some profile is getting entries into its owned-objects table, and I'd expect that profile to be getting pretty large. Perhaps getting to one or more limits.
By storing in the database, you drop to a single owned object.
Or perhaps a few similar objects... There might be a purging/archiving process that moves rows off to a secondary or tertiary table. Hard to guess how that might need to be structured, if at all.
Saves could also benefit, especially SAVSECDTA and SAV saves. Security data is greatly reduced. And saving a 4GB table is faster than saving a thousand 4MB objects (or whatever the breakdown might be).
Other than determining how the original setup and implementation would go in your environment, the big tricky part could involve volatility. If these are stable objects with relatively few changes and few deletions, it should be okay. But if BLOBs are often modified, it can bring trouble when the table takes up a significant fraction of DASD capacity. It gets particularly rough when it exceeds the size of DASD free space and a re-org is needed. With low volatility, that's much less of a concern.
Typically what is done in such cases is to create subdirectories, perhaps by using the first letter of each file. For example, the file
abcsdsjahdjhfdsfds.xyz would be stored in
/something/a/abcsdsjahdjhfdsfds.xyz
That would cut down on the size of each subdirectory.
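The first-letter scheme could be applied to an existing directory roughly as follows. This is a sketch in bash, which is an assumption here (the QShell on IBM i may behave differently), and the function name shard_by_first_letter is made up.

```shell
#!/bin/bash
# Hypothetical helper: moves every regular file directly inside directory $1
# into a subdirectory named after the first character of its filename,
# e.g. abcsdsjahdjhfdsfds.xyz ends up in a/abcsdsjahdjhfdsfds.xyz.
shard_by_first_letter() {
    local dir=$1 file name
    for file in "$dir"/*; do
        [ -f "$file" ] || continue      # skip subdirectories (and an unmatched glob)
        name=${file##*/}                # strip the leading directory part
        mkdir -p "$dir/${name:0:1}"
        mv -- "$file" "$dir/${name:0:1}/$name"
    done
}
```

For names that all share a prefix, such as the invoice_ files in the question, the first letter does not spread anything out; a different part of the name (the customer number, say) would be a more useful key.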

Bash script to find file older than X days, then subsequently delete it, and any files with the same base name?

I am trying to figure out a way to search a directory for a file older than 365 days. If it finds a match, I'd like it to both delete the file and locate any other files in the directory that have the same basename, and delete those as well.
File name examples: 12345.pdf (Search for) then delete, 12345_a.pdf, 12345_xyz.pdf (delete if exist).
Thanks! I am very new to BASH scripting, so patience is appreciated ;-))
I doubt this can be done cleanly in a single pass.
Your best bet is to use -mtime or a variant to collect names and then use another find command to delete files matching those names.
UPDATE
With respect to your comment, I mean something like:
# find basenames of old files
find .... -printf '%f\n' | sort -u > oldfiles
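# delete each old file and any file sharing its base name (e.g. 12345_a.pdf)
while IFS= read -r file; do find . \( -name "$file" -o -name "${file%.*}_*" \) -exec rm {} +; done < oldfiles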
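A fuller version of the two-pass idea might look like the sketch below. It assumes GNU find (for -printf and -delete), filenames without newlines or glob characters, and the naming scheme from the question (12345.pdf with siblings such as 12345_a.pdf); the function name purge_with_siblings is made up.

```shell
#!/bin/bash
# Hypothetical helper: for every regular file directly inside directory $1
# that was modified more than $2 days ago, deletes that file and every file
# in the same directory named "<stem>_*", where <stem> is the old file's
# name without its extension.
purge_with_siblings() {
    local dir=$1 days=$2 name stem
    # Pass 1: collect the names of the old files.
    find "$dir" -maxdepth 1 -type f -mtime +"$days" -printf '%f\n' | sort -u |
    while IFS= read -r name; do
        stem=${name%.*}
        # Pass 2: remove the old file itself plus its "_suffix" siblings.
        find "$dir" -maxdepth 1 -type f \
            \( -name "$name" -o -name "${stem}_*" \) -delete
    done
}
```

Called as `purge_with_siblings /path/to/dir 365`, an old 12345.pdf would take 12345_a.pdf and 12345_xyz.pdf with it, while 123456.pdf is left alone because the pattern requires an underscore after the stem.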

Script to copy files on CD and not on hard disk to a new directory

I need to copy files from a set of CDs that have a lot of duplicate content, with each other, and with what's already on my hard disk. The file names of identical files are not the same, and are in sub-directories of different names. I want to copy non-duplicate files from the CD into a new directory on the hard disk. I don't care about the sub-directories - I will sort it out later - I just want the unique files.
I can't find software to do that - see my post at SuperUser https://superuser.com/questions/129944/software-to-copy-non-duplicate-files-from-cd-dvd
Someone at SuperUser suggested I write a script using GNU's "find" and the Win32 version of some checksum tools. I glanced at that, and have not done anything like that before. I'm hoping something exists that I can modify.
I found a good program to delete duplicates, Duplicate Cleaner (it compares checksums), but it won't help me here, as I'd have to copy all the CDs to disk, and each is probably about 80% duplicates, and I don't have room to do that - I'd have to cycle through a few at a time copying everything, then turning around and deleting 80% of it, working the hard drive a lot.
Thanks for any help.
I don't use Windows, but I'll give a suggestion: a combination of GNU find and a Lua script. For find you can try
find / -exec md5sum '{}' ';'
If your GNU software includes xargs the following will be equivalent but may be significantly faster:
find / -print0 | xargs -0 md5sum
This will give you a list of checksums and corresponding filenames. We'll throw away the filenames and keep the checksums:
#!/usr/bin/env lua
-- Checksums already seen; filled first from stdin (the hard disk listing).
local checksums = {}
for l in io.lines() do
  local checksum, pathname = l:match('^(%S+)%s+(.*)$')
  checksums[checksum] = true
end
-- Checksum every file on the CD and copy the ones not seen before.
local cdfiles = assert(io.popen('find e:/ -print0 | xargs -0 md5sum'))
for l in cdfiles:lines() do
  local checksum, pathname = l:match('^(%S+)%s+(.*)$')
  if not checksums[checksum] then
    io.stderr:write('copying file ', pathname, '\n')
    os.execute('cp ' .. pathname .. ' c:/files/from/cd')
    checksums[checksum] = true  -- also skip later duplicates on the CD itself
  end
end
You can then pipe the output from
find / -print0 | xargs -0 md5sum
into this script.
There are a few problems:
If the filename has special characters, it will need to be quoted. I don't know the quoting conventions on Windows.
It would be more efficient to write the checksums to disk rather than to run find all the time. You could try
local csums = assert(io.open('/tmp/checksums', 'w'))
for cs in pairs(checksums) do csums:write(cs, '\n') end
csums:close()
And then read checksums back in from the file using io.lines again.
I hope this is enough to get you started. You can download Lua from http://lua.org, and I recommend the superb book Programming in Lua (check out the previous edition free online).
