Archive files older than x days [duplicate] - bash

This question already has answers here:
How to create tar for files older than 7 days using linux shell scripting
(3 answers)
Closed 5 years ago.
I would like to archive all files (to one .tar.gz file) in a directory when they are older than X days.
I have this one liner:
find /home/xml/ -maxdepth 1 -mtime +14 -type f -exec sh -c 'tar -czvPf /home/xml/archive/archive_$(date +%F).tar.gz $0' {} \;
When I run this command, I see correct files selected in this directory, but in the archive is only the last file. Is there any way to get all files into one tar.gz archive?
One more problem after @Alex's answer: many files are still missing; check the screenshot.
Maybe the colons (:) are causing the problem?

-exec runs the command once for each selected file, so each invocation writes a new tar containing a single file and overwrites the previous archive, which is why you only get the last one. Instead, use find to generate the list of files and pipe it through xargs, which passes the whole list as arguments to a single tar command:
find /home/xml/ -maxdepth 1 -mtime +14 -type f | xargs tar -czvPf /home/xml/archive/archive_$(date +%F).tar.gz
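If any of the file names can contain spaces or newlines, the whitespace-delimited pipe above will split them. A sketch of a null-delimited variant (assuming GNU find/tar, with a temporary directory and backdated test files standing in for /home/xml):

```shell
set -e
# Demo stand-in for /home/xml: a temp dir with two backdated files.
d=$(mktemp -d)
mkdir "$d/archive"
touch -d '20 days ago' "$d/old file.txt" "$d/other.dmp"
# -print0/-0 keep names with spaces or newlines intact; a single tar
# invocation packs every match into one archive.
find "$d" -maxdepth 1 -mtime +14 -type f -print0 |
  xargs -0 tar -czvPf "$d/archive/archive_$(date +%F).tar.gz"
tar -tzf "$d/archive/archive_$(date +%F).tar.gz"
```

The -P flag keeps the absolute paths that find emits; without it, tar would strip the leading / and warn on every file.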
File names with colons work fine for me:
% dd if=/dev/urandom of=one:1 count=1
% dd if=/dev/urandom of=two:2 count=1
% dd if=/dev/urandom of=three:3 count=1
% dd if=/dev/urandom of=four:4 count=1
% dd if=/dev/urandom of=five:5 count=1
% find . -type f | xargs tar cvf foo.tar
./five:5
./four:4
./two:2
./three:3
./one:1
% tar tvf foo.tar
-rw------- alex/alex 512 2017-07-03 21:08 ./five:5
-rw------- alex/alex 512 2017-07-03 21:08 ./four:4
-rw------- alex/alex 512 2017-07-03 21:08 ./two:2
-rw------- alex/alex 512 2017-07-03 21:08 ./three:3
-rw------- alex/alex 512 2017-07-03 21:08 ./one:1

Related

Deleting empty files in tar.gz file in bash

I have a tar.gz file that contains .yang files, some of which are empty.
I want to go into the tar.gz file and delete only those empty files.
Currently I am using:
for f in *.tar.gz
do
    echo "Processing file $f"
    gzip -d "$f"
    find $PWD -size 0 -print -delete
    gzip -9 "${f%.*}"
    echo "******************************************"
done
but this is not working, maybe because I am operating on the current directory instead of on the contents of the tar.gz file.
Is there any other way to do this?
Your find command doesn't do anything useful to your tarballs because it searches and deletes in the current directory, not inside the tarballs.
So we need to first unpack the tarball (tar -xf), delete the empty files (find), and repack (tar -czf). As a safety measure we will work in temporary directories (mktemp -d) and create new tarballs (*.tar.gz.new) instead of overwriting the old ones. As you want to delete only empty .yang files, we will also use a few more find options. The following is for GNU tar; adapt it to your own tar version (or install GNU tar). Before using it, read what comes next, just in case...
for f in *.tar.gz; do
    echo "Processing file $f"
    d="$(mktemp -d)"
    tar -xf "$f" -C "$d"
    find "$d" -type f -name '*.yang' -size 0 -print -delete
    tar -C "$d" -czf "$f.new" .
    rm -rf "$d"
    echo "******************************************"
done
But what you want is more complex than it seems because your tarballs could contain files with meta-data (owner, permissions...) that you are not allowed to use. If you run what precedes as a regular user, tar will silently change the ownership and permissions of such files and directories. When re-packing they will thus have modified meta-data. If it is a problem and you absolutely want to preserve the meta-data there are basically two options:
Pretend you are root with fakeroot or an equivalent.
Delete the files inside the tarballs without unpacking.
To use fakeroot just run the above bash script inside a fakeroot environment:
$ fakeroot
# for f in *.tar.gz; do
# ...
# done
# exit
The second solution (in-place tarball edition) uses GNU tar and GNU awk:
for f in *.tar.gz; do
    echo "Processing file $f"
    t="${f%.*}"
    gzip -cd "$f" > "$t"
    tar -tvf "$t" | awk -vORS=$"\0" '/^-.*\.yang$/ && $3==0 {
      match($0,/(\S+\s+){4}\S+\s/); print substr($0,RLENGTH+1)}' |
      xargs -0 -n1 tar -f "$t" --delete
    gzip -c9 "$t" > "$f.new"
    echo "******************************************"
done
Explanations:
We use the GNU tar --delete option to delete files directly inside the tarball, without unpacking it, which is probably more elegant (even if it is also probably slower than a fakeroot-based solution).
Let's first find all empty files in the tarball:
$ tar -tvf foo.tar
drwx------ john/users 0 2021-10-18 14:26 ./
drwx------ john/users 0 2021-10-18 16:34 ./
-rw------- john/users 0 2021-10-18 16:34 ./nonyang
drwx------ john/users 0 2021-10-18 15:22 ./foo.yang/
-rw------- john/users 0 2021-10-18 16:01 ./empty.yang
-rw------- john/users 7 2021-10-18 15:22 ./nonempty.yang
-rw------- john/users 0 2021-10-18 16:01 ./filename with spaces.yang
As you can see, the size is in the third column. Directory names have a leading d and a trailing /. Symbolic links have a leading l. So, by keeping only lines starting with - and ending with .yang, we eliminate them. GNU awk can do this twofold filtering:
$ tar -tvf foo.tar | awk '/^-.*\.yang$/ && $3==0 {print}'
-rw------- john/users 0 2021-10-18 16:01 ./empty.yang
-rw------- john/users 0 2021-10-18 16:01 ./filename with spaces.yang
This is more than what we want, so let's print only the name part. We first measure the length of the first 5 fields, including the spaces, with the match function (which sets a variable named RLENGTH) and remove them with substr:
$ tar -tvf foo.tar | awk '/^-.*\.yang$/ && $3==0 {
match($0,/(\S+\s+){4}\S+\s/); print substr($0,RLENGTH+1)}'
./empty.yang
./filename with spaces.yang
We could try to optimize a bit by calling match only on the first line but I am not 100% sure that all output lines are perfectly aligned, so let's call it on each line.
We are almost done: just pass this to tar -f foo.tar --delete <filename>, one name at a time. xargs can do this for us but there is a last trick: as file names can contain spaces we must use another separator, something that cannot be found in file names, like the NUL character (ASCII code 0). Fortunately GNU awk can use NUL as Output Record Separator (ORS) and xargs has the -0 option to use it as input separator. So, let's put all this together:
$ tar -tvf foo.tar | awk -vORS=$"\0" '/^-.*\.yang$/ && $3==0 {
match($0,/(\S+\s+){4}\S+\s/); print substr($0,RLENGTH+1)}' |
xargs -0 -n1 tar -f foo.tar --delete
$ tar -tvf foo.tar
drwx------ john/users 0 2021-10-18 16:34 ./
-rw------- john/users 0 2021-10-18 16:34 ./nonyang
drwx------ john/users 0 2021-10-18 15:22 ./foo.yang/
-rw------- john/users 7 2021-10-18 15:22 ./nonempty.yang
Note that we must decompress the tarballs before editing them because GNU tar cannot edit compressed tarballs.
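As a minimal, self-contained illustration of that decompress/--delete/recompress round trip (hypothetical file names, GNU tar assumed):

```shell
set -e
d=$(mktemp -d)
cd "$d"
touch empty.yang
printf 'module x;' > nonempty.yang
tar -czf demo.tar.gz empty.yang nonempty.yang
# GNU tar cannot --delete from a compressed archive, so decompress first.
gzip -d demo.tar.gz
tar -f demo.tar --delete empty.yang
gzip -9 demo.tar
# Only nonempty.yang remains in the archive.
tar -tzf demo.tar.gz
```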

List the files that I can read

I would like to list any files that can be read by my current user in bash.
I'm not sure what the best way to check for that would be. I'm thinking of something along the lines of ls -l | grep <myusername>|<mygroupname>, or find ., but that doesn't deal with the other permissions.
Also, I'm working on a NetBSD box.
Considering the 2 files below, where one can be read by user, and the other can't:
[fsilveir@fsilveir tmp]$ ls -l ./test_dir/can_read.txt ./test_dir/cant_read.txt
-rw-r--r--. 1 root root 861784 May 29 20:34 ./test_dir/can_read.txt
-rwx------. 1 root root 0 May 29 20:30 ./test_dir/cant_read.txt
You can use find with the -perm option. For example, -perm -g+r matches files whose group-read bit is set, and ! -perm -g+r finds the ones where it is not. Note that -perm only inspects the permission bits; it does not check whether your user can actually read the file:
[fsilveir@fsilveir tmp]$ find . -name "*.txt" -perm -g+r 2>/dev/null
./test_dir/can_read.txt
[fsilveir@fsilveir tmp]$
Another approach is GNU find's -readable option, which tests actual readability, as shown below:
[fsilveir@fsilveir tmp]$ find . -name "*.txt" -readable
./test_dir/can_read.txt
[fsilveir@fsilveir tmp]$ find . -name "*.txt" ! -readable
./test_dir/cant_read.txt

Recursively touch files with file

I have a directory that contains sub-directories and other files and would like to update the date/timestamps recursively with the date/timestamp of another file/directory.
I'm aware that:
touch -r file directory
changes the date/timestamp for the file or directory with the others, but nothing within it. There's also the find version which is:
find . -exec touch -mt 201309300223.25 {} +
which would work fine if I could specify the actual file/directory and use another one's date/timestamp. Is there a simple way to do this? Even better, is there a way to avoid changing/updating timestamps when doing a cp?
even better, is there a way to avoid changing/updating timestamps when doing a 'cp'?
Yes, use cp with the -p option:
-p
same as --preserve=mode,ownership,timestamps
--preserve
preserve the specified attributes (default:
mode,ownership,timestamps), if possible additional attributes:
context, links, xattr, all
Example
$ ls -ltr
-rwxrwxr-x 1 me me 368 Apr 24 10:50 old_file
$ cp old_file not_maintains <----- does not preserve time
$ cp -p old_file do_maintains <----- does preserve time
$ ls -ltr
total 28
-rwxrwxr-x 1 me me 368 Apr 24 10:50 old_file
-rwxrwxr-x 1 me me 368 Apr 24 10:50 do_maintains <----- does preserve time
-rwxrwxr-x 1 me me 368 Sep 30 11:33 not_maintains <----- does not preserve time
To recursively touch files in a directory based on the corresponding file under another path, you can try something like the following:
find /your/path/ -exec touch -r $(echo {} | sed "s#/your/path#/your/original/path#g") {} \;
It does not work for me as-is, but I guess it is a matter of trying/testing a little bit more.
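The likely reason that one-liner fails is that the $(...) substitution is expanded once by the shell, before find ever runs, rather than once per file. A hedged sketch that defers the path rewriting to a per-file sh -c (temporary directories stand in for /your/path and /your/original/path):

```shell
set -e
src=$(mktemp -d)    # stands in for /your/original/path
dst=$(mktemp -d)    # stands in for /your/path
touch -d '2013-09-30 02:23' "$src/a"
cp "$src/a" "$dst/a"        # plain cp resets the mtime
# For every file under $dst, derive the matching path under $src
# and copy its timestamp with touch -r.
find "$dst" -type f -exec sh -c '
  ref="$2${1#$3}"
  [ -e "$ref" ] && touch -r "$ref" "$1"
' sh {} "$src" "$dst" \;
```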
In addition to 'cp -p', you can (re)create an old timestamp using 'touch -t'. See the man page of 'touch' for more details.
touch -t 200510071138 old_file.dat

reliable way to delete older files in unix shell script

I'm trying to construct a reliable shell script to remove files older than X days using find. However, the script seems to work only intermittently. Is there a better way? I list the files first to make sure I capture them, then use -exec rm {} to delete them.
I execute the script like so:
/home/scripts/rmfiles.sh /u05/backup/export/test dmp 1
#!/usr/bin/ksh
if [ $# != 3 ]; then
    echo "Usage: rmfiles.sh <directory> <log|dmp|par> <numberofdays>" 2>&1
    exit 1
fi
# Declare variables
HOURDATE=`date '+%Y%m%d%H%M'`;
CLEANDIR=$1;
DELETELOG=/tmp/cleanup.log;
echo "Listing files to remove..." > $DELETELOG 2>&1
/usr/bin/find $CLEANDIR -name "*.$2" -mtime +$3 -exec ls -ltr {} \; > $DELETELOG 2>&1
echo "Removing files --> $HOURDATE" > $DELETELOG 2>&1
#/usr/bin/find $CLEANDIR -name "*.$2" -mtime +$3 -exec rm {} \; > $DELETELOG 2>&1
My sample directory clearly has files older than one day as of today, but find is not picking them up, even though it did during some earlier testing.
Thu Sep 26 08:54:57 PDT 2013
total 161313630
-rw------- 1 oracle dba 10737418240 Sep 24 14:17 testexp01.dmp
-rw------- 1 oracle dba 10737418240 Sep 24 14:20 testexp02.dmp
-rw------- 1 oracle dba 10737418240 Sep 24 14:30 testexp03.dmp
-rw------- 1 oracle dba 508 Sep 24 15:41 EXPORT-20130924.log
-rw------- 1 oracle dba 509 Sep 25 06:00 EXPORT-20130925.log
-rw------- 1 oracle dba 508 Sep 26 08:30 EXPORT-20130926.log
Apart from a couple of small issues, the script looks good in general. My guess is that you want to add -daystart to the list of options so the base for the -mtime test is measured "from the beginning of today rather than from 24 hours ago. This option only affects tests which appear later on the command line."
If you have GNU find, then try find -D tree,search,stat,rates to see what is going on.
Some comments:
Always quote variables to make sure odd spaces don't have an effect: /usr/bin/find "$CLEANDIR" -name "*.$2" -mtime "+$3" .... Same with CLEANDIR="$1"
Don't terminate lines with ;, it's bad style.
You can replace -exec ls -ltr {} \; with -ls or -print. That way, you don't have to run the find command twice.
You should quote {} since some shells interpret them as special characters.
man find: the entry for -mtime refers you to the comment at -atime:
"When find figures out how many 24-hour periods ago the file was last accessed, any fractional part is ignored, so to match -atime +1, a file has to have been accessed at least two days ago."
The same applies to -mtime.
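That truncation is easy to demonstrate with two backdated files (hypothetical names, GNU touch assumed for the relative -d dates):

```shell
set -e
d=$(mktemp -d)
touch -d '36 hours ago' "$d/a.dmp"   # 1.5 days old -> truncated to 1
touch -d '60 hours ago' "$d/b.dmp"   # 2.5 days old -> truncated to 2
# -mtime +1 means "strictly more than 1 full 24-hour period",
# so only b.dmp matches.
find "$d" -name '*.dmp' -mtime +1
```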

How to view file date of result of find command in bash

I use a find command to find certain kinds of files in bash. Everything works fine, except that the result shown to me contains only the file name and not the (last modification) date of the file. I tried to pipe it into ls or ls -ltr, but it just does not show the file date column in the result. I also tried this:
ls -ltr | find . -ctime 1
but it didn't work.
Can you please guide me how can I view the filedate of files returned by a find command?
You need either xargs or -exec for this:
find . -ctime 1 -exec ls -l {} \;
find . -ctime 1 | xargs ls -l
(The first executes ls on every found file individually; the second bunches them up into one or more big ls invocations, so that they may be formatted slightly better.)
If all you want is to display an ls like output you can use the -ls option of find:
$ find . -name resolv.conf -ls
1048592 8 -rw-r--r-- 1 root root 126 Dec 9 10:12 ./resolv.conf
If you want only the timestamp you'll need to look at the -printf option
$ find . -name resolv.conf -printf "%a\n"
Mon May 21 09:15:24 2012
find . -ctime 1 -printf '%t\t%p\n'
prints the datetime and file path, separated by a tab character.
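If you prefer a sortable, custom-formatted timestamp, GNU find's %T directives let you pick the exact layout (file name here is hypothetical):

```shell
set -e
d=$(mktemp -d)
touch "$d/example.conf"
# %TY-%Tm-%Td and %TT expand to the modification date and time,
# %p to the file path; the result sorts chronologically as text.
find "$d" -type f -printf '%TY-%Tm-%Td %TT\t%p\n'
```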
