Ignore spaces in Solaris 'find' output - bash

I am trying to remove all empty files that are older than 2 days. Also I am ignoring hidden files, starting with dot. I am doing it with this code:
find /u01/ -type f -size 0 -print -mtime +2 | grep -v "/\\." | xargs rm
It works fine until there are spaces in the name of the file. How could I make my code ignore them?
OS is Solaris.

Option 1
Install GNU find and GNU xargs in an appropriate location (not /usr/bin) and use:
find /u01/ -type f -size 0 -mtime +2 -name '[!.]*' -print0 | xargs -0 rm
(Note that I removed what I think is a stray -print from your find options. Placed before -mtime +2, it prints every empty file regardless of age, so all of them would have been passed to rm. The options shown remove empty files modified more than 2 days ago whose names do not start with a ., which is the condition your original grep seemed to deal with.)
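A quick way to check the Option 1 command before pointing it at /u01/ is to run it against a scratch directory. The sketch below assumes GNU find and xargs (for -print0 and -0) and a mktemp that supports -d. The file names are made up so that each test in the expression gets exercised.

```shell
# Sketch only: a scratch directory stands in for /u01/ (assumption).
dir=$(mktemp -d)

# Old, empty, and the name contains a space: should be removed.
touch -t 200001010000 "$dir/old empty.txt"
# Old and empty but hidden: kept, because -name '[!.]*' rejects a leading dot.
touch -t 200001010000 "$dir/.hidden"
# Empty but just created: kept, because -mtime +2 needs more than 2 days.
touch "$dir/new empty.txt"
# Old but not empty: kept, because of -size 0.
printf 'data\n' > "$dir/old full.txt"
touch -t 200001010000 "$dir/old full.txt"

find "$dir" -type f -size 0 -mtime +2 -name '[!.]*' -print0 | xargs -0 rm
```

Only "old empty.txt" disappears. The space in its name is harmless because each name reaches rm as a single NUL-terminated item.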
Option 2
The problem is primarily that xargs is defined to split its input at white space (and to interpret quotes and backslashes in it). An alternative is to write your own xargs surrogate that behaves sensibly with spaces in names; I've done that. You then only run into problems if the file names contain newlines, which the file system allows. Using a NUL ('\0') terminator is guaranteed safe: it is the only character that can't appear in a path name (which is why GNU chose to use it with -print0 etc.).
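The answer doesn't show its surrogate, so the following is only a hypothetical sketch of the idea in POSIX sh. It defines a function, named xargs_lines here purely for illustration, that treats each input line as exactly one argument. It works with plain Solaris -print output, but, as noted above, a name containing a newline would still be split. It also runs the command once per file rather than batching.

```shell
# Hypothetical helper, not the answer author's actual tool: runs "$@" once per
# input line, so spaces survive but a name containing a newline is split.
xargs_lines() {
    while IFS= read -r line; do
        "$@" "$line"
    done
}

# A scratch directory stands in for /u01/ (assumption).
dir=$(mktemp -d)
touch -t 200001010000 "$dir/with space.txt" "$dir/plain.txt"

# Plain -print output is enough here; no GNU -print0 is required.
find "$dir" -type f -size 0 -mtime +2 -print | xargs_lines rm
```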
Option 3
A final better option is perhaps:
find /u01/ -type f -size 0 -mtime +2 -name '[!.]*' -exec rm {} \;
This avoids using xargs at all and handles all file names (path names) correctly — at the cost of executing rm once for each file found. That's not too painful if you're only dealing with a few files on each run.
POSIX 2008 introduces the notation + in place of the \; and then behaves rather like xargs, collecting as many arguments as will conveniently fit in the space it allocates for the command line before running the command:
find /u01/ -type f -size 0 -mtime +2 -name '[!.]*' -exec rm {} +
The versions of Solaris I've worked on do not support that notation, but I know I work on antique versions of Solaris. GNU find does support the + marker and therefore renders the -print0 and xargs -0 workaround unnecessary.
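The difference between the two terminators can be observed with a throwaway directory. This sketch assumes a find that supports + (GNU find does). It runs a tiny sh -c script that prints $#, the number of file names each invocation received, and it includes names with a space and a newline to show that -exec passes them through unchanged.

```shell
dir=$(mktemp -d)
# Three empty files; one name contains a space and one contains a newline.
: > "$dir/plain"
: > "$dir/with space"
: > "$dir/with
newline"

# With \; the command runs once per file, so every call reports 1 argument.
per_file=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} \;)
# With + the names are batched, so a single call receives all 3.
batched=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} +)

# Either terminator hands names straight to the command, newline and all.
find "$dir" -type f -exec rm {} +
```

Here per_file holds three lines of 1 and batched holds 3, and the final command empties the directory in one rm call.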

Related

Is this shell command to delete all but last X directories safe?

I've seen a lot of warnings against the dangers of filenames with funny characters wreaking havoc in shell scripts.
I've scoured SO and seen dozens of variants of xargs and -exec rm -rf {} \;, and "don't use ls for scripting" and I've come up with what I think is "safe" to run.
find /path/to/dir -mindepth 1 -maxdepth 1 -type d -print0 | sort -z | head -z -n -10 | xargs -r0 rm -rf
I've got a directory full of subdirectories in this format:
# find /srv/mywebsite/releases -mindepth 1 -maxdepth 1 -type d | sort
/srv/mywebsite/releases/2017-01-01T01:43:23Z
/srv/mywebsite/releases/2017-01-01T02:09:44Z
/srv/mywebsite/releases/2017-01-01T02:20:06Z
...
/srv/mywebsite/releases/2017-04-22T01:34:45Z
/srv/mywebsite/releases/2017-04-30T03:24:19Z
/srv/mywebsite/releases/2017-05-02T01:48:39Z
I want to delete all but the last 10 of them, sorted by the date in the directory name, not the directory mod/create-time. This is just a precaution in case one of the dirs gets touched and mtime/ctime doesn't match.
I think my shell command above should do exactly that, but I just want to double check that it won't blow up my server if one of the dirs ever contains a * or . or something.
This is safe, in that:
No shell evaluation whatsoever is run on the names. This specifically includes glob expansion, so a name containing a * will not result in additional rm arguments.
Since all names are prefixed with /path/to/dir, we don't need to worry about leading dashes being interpreted as options. (In a scenario where you did have this concern, xargs -r0 rm -rf -- would be appropriate; per POSIX utility syntax guideline #10, passing the string -- ensures that all subsequent arguments are parsed as positional).
Since all names are separated with NULs, and NULs can't exist in names, we can't have a single name result in multiple arguments to rm. (Poorly-written scripts often make a similar assumption about newlines, but that assumption is unfounded).
Inasmuch as you're depending on the names representing UTC timestamps in a specific format (and on new names continuing to match that format so they can be appropriately compared against old names), you might want to add an appropriate filter, making the full command something like:
find /path/to/dir -mindepth 1 -maxdepth 1 -type d \
-regextype posix-extended \
-regex '.*/[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}T[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}Z$' \
-print0 | sort -z | head -z -n -10 | xargs -r0 rm -rf --
None of this is particularly portable -- both the original code and the above suggestion require non-POSIX extensions to find, sort, head and xargs; and the naming convention itself wouldn't be allowed on Windows filesystems (where : is reserved) -- but if you're running a modern GNU toolchain on a UNIXy platform, this looks good to me.
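To try the filtered command without touching a real release directory, the sketch below builds twelve hypothetical timestamped directories plus one oddly named directory in a scratch location. It assumes GNU findutils and a coreutils recent enough to support head -z and sort -z.

```shell
dir=$(mktemp -d)
# Twelve hypothetical release directories named with UTC timestamps.
for day in 01 02 03 04 05 06 07 08 09 10 11 12; do
    mkdir "$dir/2017-01-${day}T00:00:00Z"
done
# A directory that does not follow the naming scheme; the regex must skip it.
mkdir "$dir/not a * release"

# Keep the 10 newest (by name), delete the rest.
find "$dir" -mindepth 1 -maxdepth 1 -type d \
    -regextype posix-extended \
    -regex '.*/[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}T[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}Z$' \
    -print0 | sort -z | head -z -n -10 | xargs -r0 rm -rf --
```

Only the two oldest timestamps are removed. The directory whose name contains a * never matches -regex, so it is left alone, and the * is never glob-expanded along the way.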

unix command for file separation in two different folders

I am currently in a data folder which has the following files and folders:
Folders:
ISOLATE
JUKEBOX
Files:
XXX-12-2345-67A-89T-1011-12.ab20.RenderBase20.ISOLATE.quantifier.txt
XXX-12-2345-67A-89T-1011-12.ab20.RenderBase20.JUKEBOX.quantifier.txt
XXX-24-2345-67A-89T-2022-24.ab10.RenderBase20.ISOLATE.quantifier.txt
XXX-24-2345-67A-89T-2022-24.ab10.RenderBase20.JUKEBOX.quantifier.txt
...
I want to put the files with .ISOLATE in Folder ISOLATE and .JUKEBOX ones in the JUKEBOX folder. How could I perform this task using terminal?
There are more than 12000 files, so I cannot really change the naming scheme.
Thanks in advance
Try to use wildcards:
mv *.ISOLATE.quantifier.txt ISOLATE/
mv *.JUKEBOX.quantifier.txt JUKEBOX/
If the number of files is too high, you might need to move them in smaller batches:
find . -maxdepth 1 -name '*.ISOLATE.quantifier.txt' -exec mv -t ISOLATE/ {} +
-exec with + accumulates the command-line arguments the same way xargs does, so you shouldn't exceed the maximum argument-list length. With +, {} has to be the last argument before the terminator, which is why GNU mv's -t option is used to name the target directory first.
Since you're dealing with a huge number of files, you can use mv with xargs:
printf '%s\0' *.ISOLATE.* | xargs -0 mv -t ISOLATE/
printf '%s\0' *.JUKEBOX.* | xargs -0 mv -t JUKEBOX/
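The printf variant works because printf is a shell builtin in bash: the expanded glob is never passed to execve(), so the ARG_MAX limit does not apply, and xargs then splits the list into mv calls that fit. The sketch below is a scaled-down test with hypothetical file names in the question's format. It assumes GNU mv for -t.

```shell
# Assumes bash (builtin printf) and GNU mv (-t); file names are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir ISOLATE JUKEBOX

# Create 3000 sample files per category in the question's naming format.
i=1
while [ "$i" -le 3000 ]; do
    : > "XXX-$i.ab20.RenderBase20.ISOLATE.quantifier.txt"
    : > "XXX-$i.ab20.RenderBase20.JUKEBOX.quantifier.txt"
    i=$((i + 1))
done

# The builtin printf emits each name followed by a NUL; xargs -0 batches the
# names into as many mv -t calls as needed.
printf '%s\0' *.ISOLATE.* | xargs -0 mv -t ISOLATE/
printf '%s\0' *.JUKEBOX.* | xargs -0 mv -t JUKEBOX/
```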
In addition to trying wildcards (bash pattern match or globs), which at some point will hit an upper limit based on the number of files, you can also use find and xargs:
find . -maxdepth 1 -name '*.ISOLATE.*.txt' -print0 | xargs -0 -IFILE mv FILE ./ISOLATE
find . -maxdepth 1 -name '*.JUKEBOX.*.txt' -print0 | xargs -0 -IFILE mv FILE ./JUKEBOX
Doing this won't be subject to the maximum number of command line arguments that the glob solution may hit.
The key things in the commands above are:
-maxdepth 1 ensures that find won't descend into the ./ISOLATE or ./JUKEBOX subdirectories.
-print0 causes find to terminate each file name with a null byte rather than a newline. This protects you against files that have spaces, newlines or other special characters in their names.
-0 causes xargs to split its input on the null byte rather than on whitespace, for the same reason.
-IFILE tells xargs to substitute each input name for the string FILE wherever it appears in the command, running one mv per file. By default xargs appends the file names at the end of the command line, which wouldn't work here, since mv expects the destination directory last.
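Prefixing the command with echo is a cheap way to see what -IFILE does before moving anything. In this sketch the file names are hypothetical; each NUL-terminated name is substituted for FILE in a separate invocation, spaces included.

```shell
# Dry run: echo prints each mv command instead of executing it.
out=$(printf '%s\0' 'first file.ISOLATE.txt' 'second.ISOLATE.txt' |
      xargs -0 -IFILE echo mv FILE ./ISOLATE)
printf '%s\n' "$out"
# Prints:
# mv first file.ISOLATE.txt ./ISOLATE
# mv second.ISOLATE.txt ./ISOLATE
```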
I tested the approach with a small shell script:
touch XXX-12-2345-67A-89T-1011-12.ab20.RenderBase20.ISOLATE.quantifier.txt
touch XXX-12-2345-67A-89T-1011-12.ab20.RenderBase20.JUKEBOX.quantifier.txt
touch XXX-24-2345-67A-89T-2022-24.ab10.RenderBase20.ISOLATE.quantifier.txt
touch XXX-24-2345-67A-89T-2022-24.ab10.RenderBase20.JUKEBOX.quantifier.txt
mkdir ISOLATE
mkdir JUKEBOX
find . -maxdepth 1 -name '*.ISOLATE.*.txt' -print0 | xargs -0 -IFILE mv FILE ./ISOLATE
find . -maxdepth 1 -name '*.JUKEBOX.*.txt' -print0 | xargs -0 -IFILE mv FILE ./JUKEBOX
find .
Which outputs:
$ bash example.sh
.
./example.sh
./ISOLATE
./ISOLATE/XXX-12-2345-67A-89T-1011-12.ab20.RenderBase20.ISOLATE.quantifier.txt
./ISOLATE/XXX-24-2345-67A-89T-2022-24.ab10.RenderBase20.ISOLATE.quantifier.txt
./JUKEBOX
./JUKEBOX/XXX-12-2345-67A-89T-1011-12.ab20.RenderBase20.JUKEBOX.quantifier.txt
./JUKEBOX/XXX-24-2345-67A-89T-2022-24.ab10.RenderBase20.JUKEBOX.quantifier.txt

Awk/Sed: How to do a recursive find/replace of a string in files with a certain file extension?

I need to recursively find and replace a string in my .cpp and .hpp files.
Looking at an answer to this question I've found the following command:
find /home/www -type f -print0 | xargs -0 sed -i 's/subdomainA.example.com/subdomainB.example.com/g'
Changing it to include my file type did not work; it did not change a single word:
find /myprojects -type f -name *.cpp -print0 | xargs -0 sed -i 's/previousword/newword/g'
Help appreciated.
Don't bother with xargs; use the -exec primary. (Split across two lines for readability.)
find /home/www -type f -name '*.cpp' \
-exec sed -i 's/previousword/newword/g' '{}' \;
chepner's helpful answer proposes the simpler and more efficient use of find's -exec action instead of piping to xargs.
Unless special xargs features are needed, this change is always worth making, and maps to xargs features as follows:
find ... -exec ... {} \; is equivalent to find ... -print0 | xargs -0 -n 1 ...
find ... -exec ... {} + is equivalent to find ... -print0 | xargs -0 ...
In other words:
the \; terminator invokes the target command once for each matching file/folder.
the + terminator invokes the target command once overall, supplying all matching file/folder paths as a single list of arguments.
Multiple calls happen only if the resulting command line becomes too long, which is rare, especially on Linux, where getconf ARG_MAX, the max. command-line length, is large.
Troubleshooting the OP's command:
Since the OP's xargs command passes all matching file paths at once (and, per xargs defaults, at the end of the command line), the resulting command effectively looks something like this:
sed -i 's/previousword/newword/g' /myprojects/file1.cpp /myprojects/file2.cpp ...
This can easily be verified by prepending echo to sed, though the (conceptual) quoting of arguments that need it (e.g., paths with embedded spaces) will not show (note the echo):
find /myprojects -type f -name '*.cpp' -print0 |
xargs -0 echo sed -i 's/previousword/newword/g'
Next, after running the actual command, check whether the last-modified dates of the files have changed, e.g. with stat.
If they have, yet the contents haven't changed, the implication is that sed has processed the files, but the regex in the s function call didn't match anything.
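Here is a minimal sketch of that check. It assumes GNU sed and GNU stat (-c FORMAT; BSD stat uses -f instead) and uses a scratch file in place of /myprojects. GNU sed -i always writes a new copy of each file, so the modification time changes even when the substitution matches nothing, while the contents (compared here with cksum) stay the same.

```shell
# A scratch file stands in for the files under /myprojects (assumption).
dir=$(mktemp -d)
printf 'int main() { return 0; }\n' > "$dir/main.cpp"
# Back-date the file so that any rewrite by sed shows up as a newer mtime.
touch -t 200001010000 "$dir/main.cpp"
before_mtime=$(stat -c '%Y' "$dir/main.cpp")
before_sum=$(cksum < "$dir/main.cpp")

# 'previousword' does not occur in the file, so the s command matches nothing.
find "$dir" -type f -name '*.cpp' -exec sed -i 's/previousword/newword/g' {} +

after_mtime=$(stat -c '%Y' "$dir/main.cpp")
after_sum=$(cksum < "$dir/main.cpp")
# Human-readable mtime and path, as you would inspect the real files.
find "$dir" -type f -name '*.cpp' -exec stat -c '%y %n' {} +
```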
It is conceivable that older GNU sed versions don't work properly when combining -i (in-place editing) with multiple file operands (though I couldn't find anything in the GNU sed release notes).
To rule that out, invoke sed once for each file:
If you still want to use xargs, add -n 1:
find /myprojects -type f -name '*.cpp' -print0 |
xargs -0 -n 1 sed -i 's/previousword/newword/g'
To use find's -exec action, see chepner's answer.
With a GNU sed version that does support updating of multiple files with the -i option - which is the case as of at least v4.2.2 - the best formulation of your command is (note the quoted *.cpp argument to prevent premature expansion by the shell, and the use of terminator + to only invoke sed once):
find /myprojects -type f -name '*.cpp' -exec sed -i 's/previousword/newword/g' '{}' +
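The following sketch exercises that final command on a scratch tree standing in for /myprojects (it assumes GNU sed). It also shows the failure mode of the unquoted pattern: with a .cpp file in the current directory, the shell replaces *.cpp with that file's name before find runs, so find would only match files with exactly that name.

```shell
# A scratch tree stands in for /myprojects (assumption).
dir=$(mktemp -d)
mkdir -p "$dir/src/sub dir"
printf 'previousword here\n' > "$dir/src/a.cpp"
printf 'previousword there\n' > "$dir/src/sub dir/b c.cpp"
printf 'previousword untouched\n' > "$dir/src/notes.txt"

# A stray .cpp in the current directory is what breaks an unquoted *.cpp:
# the shell expands the pattern before find ever sees it.
cd "$dir" || exit 1
: > local.cpp
expanded=$(printf '%s\n' *.cpp)   # what an unquoted *.cpp turns into here

# Quoted pattern, and + so that sed is invoked once for the whole batch.
find "$dir/src" -type f -name '*.cpp' -exec sed -i 's/previousword/newword/g' '{}' +
```

Both .cpp files under src are rewritten, including the one with spaces in its path, while notes.txt is untouched and expanded contains just local.cpp.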

Move files of specific size in Ubuntu using Terminal

I want to move all the files in a specific folder that have a size of 0 bytes. I know that the following prints all the files with a size of zero bytes:
find /home/Desktop/ -size 0
But I want to move them to another folder, so I tried:
find /home/Desktop/ -size 0 | xargs -0 mv /home/Desktop/a
But that doesn't work. Is there any other way to do it? What am I doing wrong?
You can do that in find itself using -exec option:
find /home/Desktop/ -size 0 -exec mv '{}' /home/Desktop/a \;
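By default, find prints each file name on standard output followed by a newline. The option -print0 prints the file name followed by a null character instead, and the -0 option of xargs means that its input items are terminated by a null character. Your command used -0 without -print0, and it also put the destination before the file names, whereas mv expects the destination last.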
find /home/Desktop/ -size 0 -print0 | xargs -0 -I {} mv {} /home/Desktop/a
You could instead use find's -exec option, as shown above.
In both cases, consider also using find's -type f option if you only want to find files, and -maxdepth 1 if you do not want find to descend into subdirectories. This is especially useful in your example, since you move the found files into a subdirectory!
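Putting the suggestions together, here is a sketch run against a scratch directory instead of /home/Desktop/ (the file names are hypothetical). It uses -print0 on the find side to match xargs -0, and places -maxdepth 1 before the tests, which is where GNU find expects global options.

```shell
# A scratch directory stands in for /home/Desktop/ (assumption).
dir=$(mktemp -d)
mkdir "$dir/a"
: > "$dir/empty one.txt"
: > "$dir/empty2.txt"
printf 'x\n' > "$dir/full.txt"

# -maxdepth 1: don't descend into a/ (or any other subdirectory).
# -type f: only regular files, never the destination directory itself.
find "$dir" -maxdepth 1 -type f -size 0 -print0 | xargs -0 -I {} mv {} "$dir/a"
```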

Explain how many processes created?

Could someone explain how many processes are created in each case for the commands below? I don't understand it:
The following three commands have roughly the same effect:
rm $(find . -type f -name '*.o')
find . -type f -name '*.o' | xargs rm
find . -type f -name '*.o' -exec rm {} \;
1. Exactly 2 processes: 1 for rm, the other for find.
2. 3 or more processes: 1 for find, another for xargs, and one or more for rm. xargs reads standard input and, if it reads more names than can be passed as arguments to a single program invocation (the limit is governed by ARG_MAX), it runs rm several times, each with a batch of names.
3. Many processes: 1 for find, plus one rm for each file ending in .o.
In my opinion, option 2 is the best, because it handles the maximum parameter limit correctly and doesn't spawn too many processes. However, I prefer to use it like this (with GNU find and xargs):
find . -type f -name '*.o' -print0 | xargs -0 rm
This terminates each filename with a \0 instead of a newline, since filenames in UNIX can legally contain newlines. This also handles spaces in filenames (much more common) correctly.
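The word-splitting problem with the first command, and how the NUL-terminated pipeline avoids it, can be seen with a single awkward file name. This sketch assumes GNU find and xargs, and that the mktemp directory path itself contains no whitespace or glob characters.

```shell
dir=$(mktemp -d)
: > "$dir/plain.o"
: > "$dir/has space.o"

# Unquoted $(...) is word-split: the single name "has space.o" becomes two
# words, so rm would be handed ".../has" and "space.o" (neither exists).
set -- $(find "$dir" -type f -name '*.o')
words=$#

# NUL-terminated hand-off keeps each name as exactly one argument.
find "$dir" -type f -name '*.o' -print0 | xargs -0 rm
```

Two files produce three words in the first case, while the -print0 | xargs -0 pipeline removes both files.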

Resources