I'm trying to copy a very large filesystem with a parallel pipeline of tar create/extract jobs driven by xargs, but I can't figure out the correct syntax.
find image -maxdepth 2 -mindepth 2 -type d -print|xargs -P 48 tar cf - --files-from|(cd /testfiles; tar xf -)
I get these errors:
xargs: tar: terminated by signal 13
xargs: tar: terminated by signal 13
But if I execute the same command without the -P option, it runs. It's just single-threaded and will take forever for the 50 million files across the 700K subdirectories.
The following works, but is slow:
find image -maxdepth 2 -mindepth 2 -type d -print|xargs tar cf - --files-from|(cd /testfiles; tar xf -)
So what am I missing?
The problem is that the stdout of your parallel pipeline is being consumed by a single stdin, the |(cd /testfiles; tar xf -) at the end: 48 tar writers share one pipe, the lone extracting tar chokes on the interleaved stream, and the writers are killed by SIGPIPE, which is signal 13.
So you need to "also" parallelize the tar xf - part. A possible solution is treating that pipeline as a "mini-script" and having xargs pass it arguments through "$@":
find image -maxdepth 2 -mindepth 2 -type d -print| \
xargs -P 48 sh -c 'tar cf - --files-from "$@" | tar -C /testfiles -xf -' --
By the way, I'd also be careful with -P 48: start with more frugal values and work upward until you find a comfortable tradeoff against the I/O impact of the above.
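For instance, a sketch of such a sweep, reusing the mini-script technique from above; the -n 64 batch size and the parallelism values are arbitrary assumptions (without -n, xargs may pack everything into a single invocation and -P has nothing to parallelize):
for p in 4 8 16; do
    # note: for a fair comparison you would clear /testfiles between runs
    time ( find image -maxdepth 2 -mindepth 2 -type d -print | \
           xargs -n 64 -P "$p" sh -c 'tar cf - --files-from "$@" | tar -C /testfiles -xf -' -- )
done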
Using -n 1 with xargs makes it run a separate tar for each output line of the preceding find command.
find image -maxdepth 2 -mindepth 2 -type d -print|xargs -n 1 -P 48 tar cf - --files-from|(cd /testfiles; tar xf -)
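If you combine -n 1 with the per-job extraction shown in the earlier answer, each directory gets its own create/extract pair; a sketch with the same paths:
find image -maxdepth 2 -mindepth 2 -type d -print | \
xargs -n 1 -P 48 sh -c 'tar cf - --files-from "$1" | tar -C /testfiles -xf -' --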
I set up a daily cron job to backup my server.
In my backup folder, the backup command generates 2 files: the archive itself (.tar.gz) and an .info.json file, like the ones below:
-rw-r--r-- 1 root root 1617 Feb 2 16:17 20200202-161647.info.json
-rw-r--r-- 1 root root 48699726 Feb 2 16:17 20200202-161647.tar.gz
-rw-r--r-- 1 root root 1617 Feb 3 06:25 20200203-062501.info.json
-rw-r--r-- 1 root root 48737781 Feb 3 06:25 20200203-062501.tar.gz
-rw-r--r-- 1 root root 1618 Feb 4 06:25 20200204-062501.info.json
-rw-r--r-- 1 root root 48939569 Feb 4 06:25 20200204-062501.tar.gz
How do I write a bash script that keeps only the last 2 archives and deletes all the other backups (.tar.gz and .info.json)?
In this example, that would mean deleting 20200202-161647.info.json and 20200202-161647.tar.gz.
Edit:
I replaced -name with -wholename in the script, but when I run it, it apparently has no effect. The old archives are still there; they have not been deleted.
The script:
#!/bin/bash
DEBUG="";
DEBUG="echo DEBUG..."; #put last to safely debug without deleting files
keep=2;
for suffix in /home/archives .json .tar; do
    list=( $( find . -wholename "*$suffix" ) ); #allow for zero names
    if [ ${#list[@]} -gt $keep ]; then
        # delete all but the last $keep files
        ${DEBUG}rm -f "$( ls -tr "${list[@]}" | head -n-$keep )";
    fi
done
Edit 2:
If I run @sorin's script, does it actually delete everything, if I believe the script output?
The archive folder before running the script:
https://pastebin.com/7WtwVHCK
The script I run:
find home/archives/ \( -name '*.json' -o -name '*.tar.gz' \) -print0 |\
sort -zr |\
sed -z '3,$p' | \
xargs -0 echo rm -f
The script output:
https://pastebin.com/zd7a2zcq
Edit 3:
The command find /home/archives/ -daystart \( -name '*.json' -o -name '*.tar.gz' \) -mtime +1 -exec echo rm -f {} + works and does the job.
Marked as solved
If the files are generated daily, a simple approach is to take advantage of find's -mtime condition:
find /home/archives/ -daystart \( -name '*.json' -o -name '*.tar.gz' \) -mtime +1 -exec echo rm -f {} +
-daystart - use the start of the day for comparing modification times
\( -name '*.json' -o -name '*.tar.gz' \) - select files that end either in *.json or *.tar.gz
-mtime +1 - modification time is more than one full 24-hour day in the past, measured from the start of the day because of -daystart
-exec echo rm -f {} + - remove the files (remove the echo after testing and verifying the result is what you want)
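Since the backups are produced by a daily cron job anyway, the cleanup can be scheduled the same way; a hypothetical crontab entry (the 06:00 run time is an assumption):
# prune old backups every day at 06:00, after the backup job has run
0 6 * * * find /home/archives/ -daystart \( -name '*.json' -o -name '*.tar.gz' \) -mtime +1 -exec rm -f {} +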
A simpler solution that avoids ls and its pitfalls and does not depend on the modification times of the files:
find /home/archives/ \( -name '*.json' -o -name '*.tar.gz' \) -print0 |\
sort -zr |\
sed -nz '3,$p' | \
xargs -0 echo rm -f
\( -name '*.json' -o -name '*.tar.gz' \) - find files that end in either *.json or *.tar.gz
-print0 - print them null separated
sort -zr - -z tells sort to use null as a line separator, -r sorts them in reverse
sed -nz '3,$p' - -z same as above; '3,$p' prints the lines from the 3rd to the last ($), i.e. everything except the two newest names, which the reverse sort put first
xargs -0 echo rm -f - execute rm with the piped arguments (remove the echo after you have tested and are satisfied with the command)
Note: not all sort and sed implementations support -z, but most do. If you are stuck on a system whose tools don't, you might have to use a higher-level language.
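Before going that far, though, here is a portable sketch that drops the NUL separators entirely; it assumes the file names contain no newlines, which holds for these timestamped backups:
find /home/archives/ \( -name '*.json' -o -name '*.tar.gz' \) | sort -r | tail -n +3 | \
while IFS= read -r f; do echo rm -f "$f"; done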
Find the two most recent files in path:
most_recent_json=$(ls -t *.json | head -1)
most_recent_tar_gz=$(ls -t *.tar.gz | head -1)
Remove everything else, ignoring the two files just found:
rm -i $(ls -I "$most_recent_json" -I "$most_recent_tar_gz")
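These ls patterns are relative to the current directory, so a usage sketch would first change into the backup folder from the question:
cd /home/archives || exit 1
most_recent_json=$(ls -t *.json | head -1)
most_recent_tar_gz=$(ls -t *.tar.gz | head -1)
rm -i $(ls -I "$most_recent_json" -I "$most_recent_tar_gz")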
Automatic deletion can be hazardous to your mental state if it removes unwanted files or aborts long scripts early due to unexpected errors, say, when there are fewer than 1+2 files as in your example. Be sure the script does not fail if there are no files at all.
tdir=/home/archives/; #target dir
DEBUG="";
DEBUG="echo DEBUG..."; #put last to safely debug without deleting files
keep=2;
for suffix in .json .tar.gz; do
    list=( $( find "$tdir" -name "*$suffix" ) ); #allow for zero names
    if [ ${#list[@]} -gt $keep ]; then
        # delete all but the $keep newest files (names must not contain whitespace)
        ${DEBUG}rm -f $( ls -tr "${list[@]}" | head -n -$keep );
    fi
done
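A sketch of how a dry run would look, assuming the script is saved as cleanup.sh; with the second DEBUG= line in place, every candidate deletion is printed instead of executed:
bash cleanup.sh
# prints lines like: DEBUG... rm -f /home/archives/20200202-161647.info.json ...
Once the output looks right, delete the second DEBUG= line and run it again to actually remove the files.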
Assuming that you have fewer than 10 files and that they are created in pairs, then you can do something straightforward like this:
files_to_delete=$(ls -t1 | tail -n +5)
[ -n "$files_to_delete" ] && rm $files_to_delete
The -t1 tells the ls command to list the files in chronological order by modification time, newest first, each on its own line.
The tail -n +5 tells the tail command to start at the fifth line, skipping the first four lines, which are the two newest .tar.gz/.info.json pairs you want to keep.
If you have more than 10 files, a more complicated solution will be necessary, or you would need to run this multiple times.
I have a lot of folders I'd like to backup on a remote location.
I'd like to tar.gz and encrypt all of these, [if possible] in a single command line.
So far, I've successfully done half the work with
find . -type d -maxdepth 1 -mindepth 1 -exec tar czf {}.tar.gz {} \;
Now I'd like to add an encryption step to this command, if possible using gnupg.
Can someone help?
No, you can't directly include multiple piped commands in the -exec option of find.
On the other hand, you can easily iterate over the results. For example in bash, you can do:
find . -maxdepth 1 -mindepth 1 -type d | while read -r dir; do
    tar czO "${dir}" | gpg --output "${dir}".tar.gz.asc --encrypt --recipient foo@example.com
done
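That said, the pipeline can be handed to find as one command by wrapping it in a shell; a sketch using the same placeholder recipient as above (the single command find runs is sh):
find . -maxdepth 1 -mindepth 1 -type d -exec sh -c '
    tar czO "$1" | gpg --output "$1".tar.gz.asc --encrypt --recipient foo@example.com
' sh {} \;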
In my script I have the following 3 commands
Basically what it is trying to do is:
create a symlink to a certain bunch of files based on their filenames, in a temp directory.
change the name of the symlink to match the current date
move the symlinks from a temp directory to their proper location
find . -type f -name "*${regex}-*" -exec ln -s {} "${DataTempPath}/"{} \;
find "$DataTempPath" -type l | sed -e "p;s/A[0-9]*/A${today}/" | xargs -n2 mv
mv $DataTempPath/* $DataSetPath
This will be inserted as a cron job to run every 15 mins, which is not a problem when the source directory contains valid data.
However, when it doesn't contain any files, I get errors from the second find command and from the mv command.
What I want, I guess, is a way of not executing the last two lines of the script if the first one does not create any new links.
GNU xargs supports a --no-run-if-empty parameter that, to quote the documentation "If the standard input is completely empty, do not run the command. By default, the command is run once even if there is no input".
This should help avoid the xargs error (assuming you are running GNU xargs)
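Applied to the second command from the question, that would look like this sketch:
find "$DataTempPath" -type l | sed -e "p;s/A[0-9]*/A${today}/" | xargs --no-run-if-empty -n2 mv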
Check the exit status of the first command:
find . -type f -name "*${regex}-*" -exec ln -s {} "${DataTempPath}/"{} \;
if [[ $? == 0 ]]; then
    find "$DataTempPath" -type l | sed -e "p;s/A[0-9]*/A${today}/" | xargs -n2 mv
    mv "$DataTempPath"/* "$DataSetPath"
fi
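Note that find also exits with status 0 when it simply matches nothing, so a stricter guard (a sketch, assuming GNU find for -print -quit) would test whether any links actually exist before proceeding:
if [ -n "$(find "$DataTempPath" -type l -print -quit)" ]; then
    find "$DataTempPath" -type l | sed -e "p;s/A[0-9]*/A${today}/" | xargs -n2 mv
    mv "$DataTempPath"/* "$DataSetPath"
fi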
Purpose of the script :
1.This script will delete Files older than 4 months.
2.Files older than 3 days will be compressed.
A script has been written such as :
#!/bin/bash
exec >> /dir5/dir6/cleanup-logfiles.log 2>&1
# customer list job
cd /dir1/dir2/dir3/dir4/tmp
find -type f -mtime +120 -exec rm -v '{}' \;
find -type f -mtime +3 -name '*.csv' -exec gzip -v '{}' \;
Can anyone please explain the usage of both of the above commands, and how they serve this purpose?
Also, this script has been placed in /etc/. What could be the reason for that?
exec without a command parameter redirects all output (stdout + stderr [2>&1]) from the current shell (i.e. this script) to /dir5/dir6/cleanup-logfiles.log
cd changes directory ;)
the find commands will find all files (-type f) whose modification time (-mtime) is more than 120 days and 3 days old, respectively, and either delete them (-exec rm -v '{}' \;) or gzip them (-exec gzip -v '{}' \;). Gzipping only happens when the file has a .csv extension (-name '*.csv')
{} is a placeholder for the currently found file
the script is probably run through cron (/etc/cron.{d,daily,hourly,weekly,monthly} or /etc/crontab)
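For example, a hypothetical /etc/crontab entry running it nightly (the script file name is an assumption):
# run the log cleanup script every night at 01:00
0 1 * * * root /bin/bash /etc/cleanup-logfiles.sh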
OK, this is my third try posting this; maybe I'm asking the wrong question!
It's been a few years since I've done any shell programming so I'm a bit rusty...
I'm trying to create a simple shell script that finds all subdirectories under a certain named subdirectory in a tree and creates symbolic links to those directories (sounds more confusing than it is). I'm using cygwin on Windows XP.
This find/grep command finds the directories in the filesystem like I want it to:
find -mindepth 3 -maxdepth 3 -type d | grep "New Parts"
Now for the hard part... I just want to take that list, pipe it into ln and create some symlinks. The list of directories has some whitespace, so I was trying to use xargs to clean things up a bit:
find -mindepth 3 -maxdepth 3 -type d | grep "New Parts" | xargs -0 ln -s -t /cygdrive/c/Views
Unfortunately, ln spits out a long list of all the directories concatenated together (separated by \n) and fails with a "File name too long" error.
Ideas??
I think you can do this all within your find command. OTTOMH:
find -mindepth 3 -maxdepth 3 -type d -name "*New Parts*" -exec ln -s -t /cygdrive/c/Views {} \;
Hope I remembered that syntax right.
Your command
find -mindepth 3 -maxdepth 3 -type d | grep "New Parts" | xargs -0 ln -s -t /cygdrive/c/Views
passes the -0 argument to xargs, but you did not tell find to -print0 (and if you had, grep could not work in the pipe in between). What you want is, I guess, the following:
find -mindepth 3 -maxdepth 3 -type d | grep "New Parts" | tr '\012' '\000' | xargs -0 ln -s -t /cygdrive/c/Views
The tr command converts the newlines to ASCII NULs, so xargs -0 then splits the names correctly even though they contain whitespace.
Use a loop, reading the names line by line so that the whitespace in them survives:
find "$from_dir" -mindepth 3 -maxdepth 3 -type d | grep "New Parts" | while IFS= read -r name; do
    ln -s "$name" "$to_dir"
done
xargs has issues in that the input from the pipe goes at the end of a single command line; what you want here is multiple commands, not just one.
In my experience, doing things within the find command can sometimes be slow, although it does get the job done.