Check which file in a directory is the most recent in a Bash shell script - bash

I am writing a bash script to run in a directory where files are generated every day, and to copy the most recent file to another directory.
This is what I am using now:
for [FILE in directory]
do
if [ls -Art | tail -n 1]
something...
else
something...
fi
done
I know this is not right. I would like to compare each file's modification date with the current date and, if they match, copy that file.
How would that work, or is there an easier way to do it?

We could use find:
find . -maxdepth 1 -daystart -type f -mtime -1 -exec cp -f {} dest \;
Explanation:
-maxdepth 1 limits the search to the current directory.
-daystart sets the reference time of -mtime to the beginning of today.
-type f limits the search to files.
-mtime -1 limits the search to files that have been modified less than 1 day from reference time.
-exec cp -f {} dest \; copies the found files to directory dest.
Note that -daystart -mtime -1 means anytime after today 00:00 (included), but also tomorrow or any time in the future. So if you have files with a last modification time in year 2042, they will be copied too. Use -mtime 0 if you prefer copying files that have been modified between today at 00:00 (excluded) and tomorrow at 00:00 (included).
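Spelled out, that stricter variant would be (same dest placeholder as above):
find . -maxdepth 1 -daystart -type f -mtime 0 -exec cp -f {} dest \;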
Note also that all this could be impacted by irregularities like daylight saving time or leap seconds (not tested).

The newest file is different from file(s) modified today.
Using ls is actually a pretty simple and portable approach. The stdout output format is defined by POSIX (if not printing to a terminal), and ls -A is also in newer POSIX standards.
It should look more like this though:
newest=$(ls -At | head -n 1)
You could add -1, but AFAIK it shouldn't be required, as it's not printing to a terminal.
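For the original task of copying that file, a minimal sketch using this variable (the destination path is a placeholder):
newest=$(ls -At | head -n 1)
[ -n "$newest" ] && cp -- "$newest" /path/to/dest/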
If you don't want to use ls, you can use this on Linux:
find . -mindepth 1 -maxdepth 1 -type f -exec stat -c '%Y:%n' {} + |
sort -n |
tail -n 1 |
cut -d : -f 2-
Note the use of 2- rather than 2 with cut, in case a filename contains :.
Also, the resulting file name will be a relative path (./file), or an empty string if no files exist.

Related

find the last created subdirectory in a directory of 500k subdirs

I have a folder with some 500k subfolders - and I would like to find the last directory which was added to this folder. I am having to do this due to a power failure issue :(
I don't exactly know when the power failed, so I am using this:
find . -type d -mmin -360 -print
which I believe covers the last 360 minutes? However, it gives me results which I am not exactly sure of.
In short, I would like to get the last directory which was created within this folder.
Any pointers would be great.
Suggesting:
find . -type d -printf "%C@ %p\n" | sort -n | tail -n 1 | awk '{print $2}'
Explanation:
find . -type d -printf "%C@ %p\n"
find . start searching from current directory recursively
-type d search only directory files
-printf "%C@ %p\n" for each directory, print its last change time in seconds since the Unix epoch (including the fractional part), followed by the file name with path.
For example: 1648051886.4404644000 /tmp/mc-dudi
| sort -n | tail -n 1
Sort the result from find as numbers, and print the last row.
awk '{print $2}'
From last row, print only second field
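Note that awk '{print $2}' drops everything after the first space, so paths containing spaces are truncated. A variant using cut -f 2- (the same trick as in the earlier answer) keeps the whole path:
find . -type d -printf "%C@ %p\n" | sort -n | tail -n 1 | cut -d ' ' -f 2-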
You might try this: it shows your last modification date/time in a sortable manner, and by sorting it, the last entry should be the most recent one:
find ./ -exec ls -dils --time-style=long-iso {} \; | sort -k8,9
Edit: and specific for directories:
find ./ -type d -exec ls -dils --time-style=long-iso {} \; | sort -k8,9
Assuming you're using a file system that tracks file creation ('birth' is the usual terminology) times, and GNU versions of the programs used below:
find . -type d -exec stat --printf '%W\t%n\0' {} + | sort -z -k1,1nr | head -z -n 1 | cut -z -f 2-
This will find all subdirectories of the current working directory, and for each one, print its birth time (The %W format field for stat(1)) and name (The %n format). Those entries are then sorted based on the timestamp, newest first, and the first line is returned minus the timestamp.
Unfortunately, GNU find's -printf doesn't support birth times, so it calls out to stat(1) to get those, using the multi-argument version of -exec to minimize the number of instances of the program that need to be run. The rest is straightforward sorting of a column, using 0-byte terminators instead of newlines to robustly handle filenames with newlines or other funky characters in them.
Maintaining a symbolic link to the last known subdirectory could avoid listing all of them to find the latest one.
ls -dl $(readlink ~/tmp/last_dir)
drwxr-xr-x 2 lmc users 4096 Jan 13 13:20 /home/lmc/Documents/some_dir
Find newer ones
ls -ldt $(find -L . -newer ~/tmp/last_dir -type d ! -path .)
drwxr-xr-x 2 lmc users 6 Mar 1 00:00 ./dir2
drwxr-xr-x 2 lmc users 6 Feb 1 00:00 ./dir1
Or
ls -ldt $(find -L . -newer ~/tmp/last_dir -type d ! -path .) | head -n 1
drwxr-xr-x 2 lmc users 6 Mar 1 00:00 ./dir2
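To keep such a link current, whatever creates each new subdirectory would also update the link, e.g. (a sketch; $new_dir is a placeholder, and the link path is the answer's example):
mkdir "$new_dir" && ln -sfn "$new_dir" ~/tmp/last_dir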
Don't use the chosen answer if you really want to find the last created sub-directory
According to the question:
Directories should be sorted by creation time instead of modification time.
find -mindepth 1 is necessary because we want to search only sub-directories.
Here are 2 solutions that both fulfill the 2 requirements:
GNU
find . -mindepth 1 -type d -exec stat -c '%W %n' '{}' '+' |
sort -nr | head -n1
BSD
find . -mindepth 1 -type d -exec stat -f '%B %N' '{}' '+' |
sort -nr | head -n1
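Either way the output still carries the leading birth timestamp; strip it as before if you only want the path, e.g. for the GNU variant:
find . -mindepth 1 -type d -exec stat -c '%W %n' '{}' '+' | sort -nr | head -n1 | cut -d ' ' -f 2-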

How do I grep only today's files in the current directory?

I want to grep files that were created today in the current directory. How many ways are there to do that, and what's the best way?
grep --color 'content' ./directory
This should do the trick for you:
find ./directory -maxdepth 1 -type f -daystart -ctime 0 -print | xargs grep --color 'content'
In the above command, we are using find to find all the files (-type f) in directory that were made today (-daystart -ctime 0) and then -print the full file paths to standard output. We then send the output to xargs. Using xargs we are able to execute each line of the output through the grep command. This is much simpler than having to create a for loop and iterate over each line of the output.
If I understand correctly, you want to grep "content" within all files in ./directory modified today; for that you can use a combination of find and xargs. For example, to find the files in ./directory modified today, you can give the -mtime 0 option, which finds files modified zero 24-hour periods ago (i.e. today). To handle strange filenames, use the -print0 option to have find output nul-terminated filenames. Your find command could be:
find . -maxdepth 1 -type f -mtime 0 -print0
Once the list of files is generated, you can pass the result to xargs -0, which will process the list of filenames as nul-terminated, and using your grep command, you would have:
xargs -0 grep --color 'content'
To put it altogether, simply pipe the result of find to xargs, e.g.
find . -maxdepth 1 -type f -mtime 0 -print0 |
xargs -0 grep --color 'content'
Give that a go and let me know if it does what you need or if you have further questions.
Edit Per Comment
If you want more exact control of the hour, minute or second from which you want to select your files, you can use the -newermt option for find to find all files newer than the date you give as the option. E.g. -newermt "2021-07-02 02:10:00" would select today's files created after 02:10:00 (all files after 2:10:00 am this morning).
Modifying the test above and replacing -mtime 0 with -newermt "2021-07-02 02:10:00" you would have:
find . -maxdepth 1 -type f -newermt "2021-07-02 02:10:00" -print0 |
xargs -0 grep --color 'content'
(adjust the time to your exact starting time you want to begin selecting files from)
Give that a go also. It is quite a bit more flexible, as you can specify any time within the day from which to begin selecting files based on the file's modification time.

Shell script to remove and compress files from 2 paths

I need to delete log files older than 60 days and compress files that are older than 30 days but newer than 60 days. I have to remove and compress files from 2 paths as mentioned in PURGE_DIR_PATH.
I also have to take the output of the find command and redirect it to a log file. Basically, I need to create an entry in the log file whenever a file is deleted. How can I achieve this?
I also have to validate whether the directory path is valid and put a message in the log file either way.
I have written a shell script, but it doesn't cover all the scenarios. This is my first shell script and I need some help. How do I keep just one variable, log_retention, and
use it to compress files when the condition is >30 days and <60 days? How do I validate whether the directories are valid? Is my IF condition checking that?
Please let me know.
#!/bin/bash
LOG_RETENTION=60
WEB_HOME="/web/local/artifacts"
ENG_DIR="$(dirname $0)"
PURGE_DIR_PATH="$(WEB_HOME)/backup/csvs $(WEB_HOME)/home/archives"
if[[ -d /PURGE_DIR_PATH]] then echo "/PURGE_DIR_PATH exists on your filesystem." fi
for dir_name in ${PURGE_DIR_PATH}
do
echo $PURGE_DIR_PATH
find ${dir_name} -type f -name "*.csv" -mtime +${LOG_RETENTION} -exec ls -l {} \;
find ${dir_name} -type f -name "*.csv" -mtime +${LOG_RETENTION} -exec rm {} \;
done
Off the top of my head -
#!/bin/bash
CSV_DELETE=60
CSV_COMPRESS=30
WEB_HOME="/web/local/artifacts"
PURGE_DIR_PATH=( "${WEB_HOME}/backup/csvs" "${WEB_HOME}/home/archives" ) # array, not single string
# eliminate the oldest
find "${PURGE_DIR_PATH[#]}" -type f -name "*.csv" -mtime +${CSV_DELETE} |
xargs -P 100 rm -f # run 100 in bg parallel
# compress the old-enough after the oldest are gone
find "${PURGE_DIR_PATH[#]}" -type f -name "*.csv" -mtime +${CSV_COMPRESS} |
xargs -P 100 gzip # run 100 in bg parallel
Shouldn't need loops.
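The question also asks about validating each directory and logging deletions. A minimal sketch of that piece, reusing the CSV_DELETE and PURGE_DIR_PATH variables above (the log file path is hypothetical):
LOG_FILE="/tmp/purge.log"   # hypothetical log location
for dir in "${PURGE_DIR_PATH[@]}"; do
    if [[ -d "$dir" ]]; then
        echo "$(date '+%F %T') $dir exists" >> "$LOG_FILE"
    else
        echo "$(date '+%F %T') $dir is not a valid directory" >> "$LOG_FILE"
        continue
    fi
    # -print writes each matching path to the log before -delete removes it
    find "$dir" -type f -name "*.csv" -mtime +"$CSV_DELETE" -print -delete >> "$LOG_FILE"
done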

Unix Count Multiple Folders Needed

I have a directory on unix server.
cd /home/client/files
It has multiple client folders like these below.
cd /home/client/files/ibm
cd /home/client/files/aol
cd /home/client/files/citi
All of them send us a file starting with either lower or upper case like below:
pre-ibm-03222017
PRE-aol-170322
Once we receive the files, we process them and convert pre to pro as below:
pro-ibm-03222017
PRO-aol-170322
I want to count the files processed each day. Here is what I am looking for:
If I can just get the total count per client, that would be perfect. If not, then the total count overall.
Keep in mind it has all files as below:
cd /home/client/files/ibm
pre-ibm-03222017
pro-ibm-03222017
cd /home/client/files/aol
PRE-aol-170322
PRO-aol-170322
And I want to COUNT ONLY the PRO/pro that will either be lower or upper case. One folder can get more than 1 file per day.
I am using the below command:
find /home/client/files -type f -mtime -1 -exec ls -1 {} \;| wc -l
But it is giving me the total count of pre and pro files, and it is also counting files for the last 24 hours, not the current day only.
For example, it is currently 09:00 PM. The above command includes files received yesterday between 09:00 PM and 12:00 AM as well. I don't want those. In other words, if I run it at 01:00 AM, it should have all files for 1 hour only and not the last 24 hours.
Thanks
---- Update -----
This works great for me.
touch -t 201703230000 first
touch -t 201703232359 last
find /home/client/files/ -newer first ! -newer last | grep -i pro | wc -l
Now, I was just wondering if I can pass the above as parameters.
For example, instead of using touch -t with a date and an alias, I want to type shortcuts and dates only to get the output. I have made the following aliases:
alias reset='touch -t `date +%m%d0000` /tmp/$$'
alias count='find /home/client/files/ -type f -newer /tmp/$$ -exec ls -1 {} \; | grep -i pro | wc -l'
This way, as soon as I log on to the server, I type reset and then count, and I get my daily number.
I was wondering if I can do something similar for any duration of days by setting date1 and date2 as aliases. If not, then perhaps a short script that would ask for parameters.
What about this?
touch -t `date +%m%d0000` /tmp/$$
find /home/client/files -type f -newer /tmp/$$ -exec ls -1 {} \; | grep -i pro | wc -l
rm /tmp/$$
Other options for finding a file created today can be found in this question:
How do I find all the files that were created today
Actually, a better way to do this is to just use this:
find /home/client/files -type f -mtime 0 | grep -i pro | wc -l
You can replace -mtime 0 with -mtime 5 to find files 5 days old.
For the same-day issue you can use -daystart (GNU find).
The regex matches paths containing /pro in either case:
find /home/client/files -regex '.*/[pP][rR][oO].*' -type f -daystart -mtime -1 -exec ls -1 {} \; | wc -l
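As for the update about passing dates as parameters rather than editing touch -t commands and aliases, a short script sketch built on the same technique (the script name and argument format are made up):
#!/bin/bash
# countpro.sh: count pro/PRO files between two timestamps (hypothetical script)
# usage: countpro.sh 201703230000 201703232359
first=$(mktemp)
last=$(mktemp)
touch -t "$1" "$first"
touch -t "$2" "$last"
find /home/client/files -type f -newer "$first" ! -newer "$last" | grep -i pro | wc -l
rm -f "$first" "$last"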

Copying all the files modified this month from the command line

I want to copy all the files in a directory that were modified this month. I can list those files like this:
ls -l * | grep Jul
And then to copy them I was trying to pipe the result into cp via xargs but had no success (I think) because I couldn't figure out how to parse the ls -l output to just grab the filename for cp.
I'm sure there are many ways of doing this; I'll give the correct answer out to the person who can show me how to parse ls -l in this manner (or talk me down from that position) though I'd be interested in seeing other methods as well.
Thanks!
Of course, just doing grep Jul is bad because you might have files with Jul in their name.
Actually, find is probably the right tool for your job. Something like this:
find $DIR -maxdepth 1 -type f -mtime -30 -exec cp {} $DEST/ \;
where $DIR is the directory where your files are (e.g. '.') and $DEST is the target directory.
The -maxdepth 1 flag means it doesn't look inside sub-directories for files (isn't recursive)
The -type f flag means it looks only at regular files (e.g. not directories)
The -mtime -30 flag means it looks at files modified within the last 30 days (+30 would mean older than 30 days)
The -exec flag means it executes the following command on each file, where {} is replaced with the file name and \; marks the end of the command
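If you want this calendar month specifically, rather than the last 30 days, GNU find's -newermt (used in an earlier answer above) can take the first of the current month. A sketch with the same $DIR and $DEST placeholders:
find $DIR -maxdepth 1 -type f -newermt "$(date +%Y-%m-01)" -exec cp {} $DEST/ \;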
Since you were interested in seeing how this might be done with zsh:
ls -lt *.*(mM0)
last month
ls -lt *.*(mM1)
or for precise date date ranges
autoload -U age
ls -tl *.*(e#age 2014/06/07 now#)
ls -tl *.*(e#age 2014/06/01 2014/06/20#)
