delete n largest files in a directory in ubuntu terminal - bash

I want to delete n (say 2 in our case) largest files in a directory.
files=$(ls -S | head -2)
rm $files
This doesn't work because the file names have space and all sorts of special characters in them. I got similar results with this ls -xS | head -2 | xargs rm. I guess one should escape all the special characters in the file name but there are various types of special characters. Although it's doable, I didn't expect it to be this complicated.
I used -Q option to quote the file names, but I still get the same error.
Downloads > files=$(ls -SQ | head -1)
Downloads > echo $files
"[ www.UsaBit.com ] - Little Children 2006 720p BRRip x264-PLAYNOW.mp4"
Downloads > rm $files
rm: cannot remove ‘"[’: No such file or directory
rm: cannot remove ‘www.UsaBit.com’: No such file or directory
rm: cannot remove ‘]’: No such file or directory
rm: cannot remove ‘-’: No such file or directory
rm: cannot remove ‘Little’: No such file or directory
rm: cannot remove ‘Children’: No such file or directory
rm: cannot remove ‘2006’: No such file or directory
rm: cannot remove ‘720p’: No such file or directory
rm: cannot remove ‘BRRip’: No such file or directory
rm: cannot remove ‘x264-PLAYNOW.mp4"’: No such file or directory

choroba's answer works well, and even though use of eval happens to be safe in this case, it's better to form a habit of avoiding it if there are alternatives.
The same goes for parsing the output of ls.
The general recommendations are:
Avoid use of eval on input you don't control, because it can result in execution of arbitrary commands.
Do not parse ls output; if possible, use pathname expansion (globbing).
That said, sometimes ls offers so much convenience that it's hard not to use it, as is the case here: ls -S conveniently sorts by file size (in descending order); hand-crafting the same logic would be nontrivial.
The price you pay for parsing ls output is that filenames with embedded newlines (\n) won't be handled correctly (as is true of choroba's answer as well). That said, such filenames are rarely a real-world concern.
While xargs applies word-splitting to its input lines by default - which is why handling of filenames with embedded whitespace fails - it can be made to recognize each input line as a distinct, as-is argument (note that ls, when not outputting to a terminal, outputs each filename on its own line by default):
GNU xargs (as used on most Linux distros):
ls -S | head -2 | xargs -d $'\n' rm # $'\n' requires bash, ksh, or zsh
-d $'\n' tells xargs to pass each whole input line to rm as a separate argument.
BSD/macOS xargs (also works with GNU xargs):
This xargs implementation doesn't support the -d option, but it supports -0 to split the input into arguments by NULs (0x0 bytes). Therefore, an intermediate tr command is needed to translate \n to NULs:
ls -S | head -2 | tr '\n' '\0' | xargs -0 rm
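You can check the line-per-argument behavior in a throwaway directory. The sketch below assumes GNU xargs (for -d) and uses made-up file names with spaces and brackets. The -- after rm is a small safety addition for names that begin with a dash; it is not part of the commands above.

```bash
#!/usr/bin/env bash
# Sketch: delete the 2 largest files in a scratch directory.
# Assumes GNU xargs (-d); the file names are made up for illustration.
set -eu

cd "$(mktemp -d)"
head -c 3000 /dev/zero > '[ big ] - one.mp4'
head -c 2000 /dev/zero > 'second largest.mkv'
head -c 1000 /dev/zero > 'small file.txt'
head -c 10   /dev/zero > 'tiny.txt'

# ls -S sorts largest first and, writing to a pipe, prints one name per line;
# -d $'\n' makes each line a single argument ("--" guards against leading dashes).
ls -S | head -2 | xargs -d $'\n' rm --

# BSD/macOS equivalent: ls -S | head -2 | tr '\n' '\0' | xargs -0 rm --
ls   # the two smaller files remain
```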

If your ls supports the -Q option, it will enclose every name in double quotes (and backslash-escape any double quotes inside the names).
You can't use such an output directly as the argument of rm, as word-splitting won't respect the quotes. You can use eval to force a new word splitting:
eval rm $(ls -SQ | head -2)
Use this with care! eval is dangerous: it can turn data into code that runs beyond your control. My tests show that ls -Q turns a newline into \n, which isn't interpreted as a newline inside double quotes!

Related

Piping the contents of a file to ls

I have a file called "input.txt" that contains one line:
/bin
I would like to make the contents of the file be the input of the command ls
I tried doing
cat input.txt | ls
but it doesn't output the list of files in the /bin directory
I also tried
ls < input.txt
to no avail.
You are looking for the xargs (transpose arguments) command.
xargs ls < input.txt
You say you want /bin to be the "input" to ls, but that's not correct; ls doesn't do anything with its input. Instead, you want /bin to be passed as a command-line argument to ls, as if you had typed ls /bin.
Input and arguments are completely different things; feeding text to a command as input is not the same as supplying that text as an argument. The difference can be blurred by the fact that many commands, such as cat, will operate on either their input or their arguments (or both) – but even there, we find an important distinction: what they actually operate on is the content of files whose names are passed as arguments.
The xargs command was specifically designed to transform between those two things: it interprets its input as a whitespace-separated list of command-line arguments to pass to some other command. That other command is supplied to xargs as its command-line argument(s), in this case ls.
Thanks to the input redirection provided by the shell via <, the arguments xargs supplies to ls here come from the input.txt file.
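To see the difference between input and arguments for yourself, here is a small sketch that uses a scratch directory (with hypothetical names target, alpha and beta) in place of /bin so the output is predictable:

```bash
#!/usr/bin/env bash
# Sketch: ls ignores its input but honors its arguments.
# The directory and file names are hypothetical.
set -eu

dir=$(mktemp -d)
mkdir "$dir/target"
touch "$dir/target/alpha" "$dir/target/beta"
printf '%s\n' "$dir/target" > "$dir/input.txt"

cd "$dir"
# ls never reads standard input, so this lists the current directory.
ls < input.txt
echo ---
# xargs turns the input into an argument: this runs ls "$dir/target".
xargs ls < input.txt
```

The first ls prints input.txt and target; only the xargs version lists the contents of target.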
There are other ways of accomplishing the same thing; for instance, as long as input.txt does not have so many files in it that they won't fit in a single command line, you can just do this:
ls $(< input.txt)
Both the above command and the xargs version will treat any spaces in the input.txt file as separating filenames, so if you have filenames containing space characters, you'll have to do more work to interpret the file properly. Also, note that if any of the filenames contain wildcard/"glob" characters like ? or * or [...], the $(<...) version will expand them as wildcard patterns, while xargs will not.
ls takes the filenames from its command line, not its standard input, which | ls and ls < file would use.
If you have only one file listed in input.txt and the filename doesn't contain trailing newlines, it's enough to use (note quotes):
ls "$(cat input.txt)"
Or, in almost any shell except a plain POSIX sh:
ls "$(< input.txt)"
If there are many filenames in the file, you'd want to use xargs, but to deal with whitespace in the names, use -d "\n" (with GNU xargs) to take each line as a filename.
xargs -d "\n" ls < input.txt
Or, if you need to handle filenames with newlines, you can separate them using NUL bytes in the input, and use
xargs -0 ls < input.txt
(This also works even if there's only one filename.)
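The following sketch contrasts default xargs splitting with -d and -0 on a list that contains a name with a space. It assumes GNU xargs for -d; the file names are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: feed a list of names (one with a space) from a file to ls.
# Assumes GNU xargs for -d; the file names are made up.
set -eu

cd "$(mktemp -d)"
touch 'my file.txt' 'other.txt'
printf '%s\n' 'my file.txt' 'other.txt' > input.txt

# Default splitting breaks "my file.txt" into "my" and "file.txt", so ls fails.
xargs ls < input.txt || echo "default xargs failed, as expected"

# -d '\n': each whole line becomes one argument.
xargs -d '\n' ls < input.txt

# NUL-separated list: safe even for names that contain newlines.
printf '%s\0' 'my file.txt' 'other.txt' > input0.txt
xargs -0 ls < input0.txt
```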
Try xargs
cat file | xargs ls

Passing filepaths containing spaces with xargs

I'm trying to use xargs to pass the contents of a variable containing zero or more filepaths separated by newlines to another command and have been having inconsistent success.
My input is the output of this:
newHTK=`grep -Fxv -f $TMPFILE /Users/foo/.htk`
Which generates the aforementioned list of filenames. Here's where things go wrong (or sometimes inexplicably right):
echo "$newHTK" | xargs -L 1 xattr -w com.apple.metadata:kMDItemFinderComment htk
The intention is to use each line in $newHTK as a filename argument for xattr. What usually happens is xattr splits the input at the spaces. I think I might need to escape the filenames coming out of the echo command or somehow enclose them in double quotation marks (any advice on an easy way to do this would be appreciated). But if that's the case, why did it work for some of the files?
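You can use the xargs -I flag to do this (if your xargs has it; I don't know how portable it is). With -I, xargs takes each whole input line as one argument and substitutes it for the replacement string, here %:
grep -Fxv -f "$TMPFILE" /Users/foo/.htk | xargs -I % xattr -w com.apple.metadata:kMDItemFinderComment htk %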
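Here is a sketch of the difference between -L 1 and -I, with printf standing in for xattr so it runs anywhere. It uses {} rather than % as the replacement string only because % would also be replaced inside printf's format string. It also suggests why some files worked: a name without spaces comes through intact even when xargs splits at blanks.

```bash
#!/usr/bin/env bash
# Sketch: -L 1 still splits lines at blanks; -I keeps each line whole.
# printf stands in for xattr; the names are made up.
set -eu

list=$(printf '%s\n' 'my file.txt' 'plain.txt')

# -L 1: one command per line, but "my file.txt" arrives as two arguments.
echo "$list" | xargs -L 1 printf '<%s>\n'
echo ---
# -I {}: each whole line is substituted for {} as a single argument.
echo "$list" | xargs -I {} printf '<%s>\n' {}
```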

"filename too long" bash mv command old files

#! /bin/sh -
cd /PHOTAN || exit
fn=$(ls -t | tail -n -30)
mv -f -- "${fn}" /old
all I want to do is keep the most recent 30 files... but can't get past the mv
"File name too long" problem
please help
The notation "${fn}" puts all the file names into a single argument string, separated by newlines (which is how $(ls ...) captured them), and mv rejects that one very long "name". Just for once, assuming you don't have to worry about file names with spaces in them, you need:
mv -f -- ${fn} /old
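A minimal way to see the difference between the two notations is to count the arguments each one produces. The function count_args and the names in fn are hypothetical, and the default IFS (space, tab, newline) is assumed.

```bash
#!/usr/bin/env bash
# Sketch: how many arguments do "${fn}" and ${fn} expand to?
# count_args and the file names are hypothetical.
set -eu

fn=$(printf '%s\n' one.jpg two.jpg three.jpg)   # what fn=$(ls ...) would hold

count_args() { echo "$#"; }

count_args "${fn}"   # quoted: one argument containing newlines, prints 1
count_args ${fn}     # unquoted: split on whitespace, prints 3
```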
If you have file names with spaces in them, then you've got problems starting with parsing the output of the ls command.
But what if you do have to worry about spaces in your filenames?
Then, as I stated, you have major problems, starting with the issues of parsing the output of ls.
$ echo > 'a b'
$ echo > ' c d '
$
Two nice file names with spaces in them. They cause merry hell. I'm about to assume you're on Linux or something similar enough. You need to use bash arrays, the stat command, printf, sort -z, sed -z. Or you should simply outlaw filenames with spaces; it is probably easier.
names=( * )
The array names contains each file name as a separate array element, leading and trailing and embedded blanks all handled correctly.
names=( * )
for file in "${names[@]}"
do printf "%s\0" "$(stat -c '%Y' "$file") $file"
done |
sort -nzr |
sed -nze '1,30s/^[0-9][0-9]* //p' |
tr '\0' '\n'
The for loop evaluates the modification time of each file separately, and combines the modification time, a space, and the file name into a single string followed by a null byte to mark the end of the string. The sort command sorts the 'lines' numerically, assuming the lines are terminated by null bytes because of the -z option, and places the most recent file names first. The sed command prints the first 30 'lines' (file names) only; the tr command replaces null bytes with newlines (but in doing so, loses the identity of file name boundaries).
The code works even with file names containing newlines, but only on systems where sed and sort support the (non-standard) -z option to process null-terminated input 'lines' — that means systems using GNU sed and sort (even BSD sed as found on Mac OS X does not, though the Mac OS X sort is GNU sort and does support -z).
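To watch the pipeline work, the sketch below wraps it in a hypothetical function, recent_names, that takes the count (2 instead of 30) and runs it in a scratch directory. It assumes GNU stat, sort, sed and touch -d; the file names and dates are invented.

```bash
#!/usr/bin/env bash
# Sketch only: the stat/sort/sed pipeline from the answer, wrapped in a
# hypothetical function. Needs GNU stat, sort, sed and touch.
set -eu

# Print the $1 most recently modified names in the current directory.
recent_names() {
  local n=$1 file
  local names=( * )
  for file in "${names[@]}"
  do printf "%s\0" "$(stat -c '%Y' "$file") $file"   # "mtime name" + NUL
  done |
  sort -nzr |                                          # newest first
  sed -nze "1,${n}s/^[0-9][0-9]* //p" |                # keep first n, strip mtime
  tr '\0' '\n'                                         # readable output
}

cd "$(mktemp -d)"
touch -d '2020-01-01' ' c d '
touch -d '2021-01-01' 'a b'
touch -d '2022-01-01' 'newest'

recent_names 2
```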
Ugh! The shell was designed for spaces to appear between and not within file names.
As noted by BroSlow in a comment, if you assume 'no newlines in filenames', then the code can be simpler and more nearly portable — but it is still tricky:
ls -t |
tail -30 |
{
list=()
while IFS='' read -r file
do list+=( "$file" )
done
mv -f -- "${list[@]}" /old
}
The IFS='' is needed so that leading and trailing spaces in filenames are preserved (and tabs, too).
I note in passing that the Korn shell would not require the braces but Bash does.
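The sketch below runs the simpler version in a scratch directory, with hypothetical src and old directories, and prints each collected name in brackets so you can see that leading and trailing spaces survive. Setting the timestamps with touch -d assumes GNU coreutils.

```bash
#!/usr/bin/env bash
# Sketch: the "no newlines in filenames" version on scratch directories.
# src/ and old/ are hypothetical; touch -d assumes GNU coreutils.
set -eu

base=$(mktemp -d)
mkdir "$base/src" "$base/old"
cd "$base/src"
touch -d '2020-01-01' ' c d '
touch -d '2021-01-01' 'a b'
touch -d '2022-01-01' 'e'

ls -t |
tail -2 |
{
list=()
while IFS='' read -r file
do list+=( "$file" )
done
printf '[%s]\n' "${list[@]}"      # brackets show the spaces are intact
mv -f -- "${list[@]}" "$base/old"
}
```

Because ls -t lists the newest files first, tail -2 picks the two oldest here, so only e is left in src.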

perform an operation for *each* item listed by grep

How can I perform an operation for each item listed by grep individually?
Background:
I use grep to list all files containing a certain pattern:
grep -l '<pattern>' directory/*.extension1
I want to delete all listed files but also all files having the same file name but a different extension: .extension2.
I tried using the pipe, but it seems to take the output of grep as a whole.
In find there is the -exec option, but grep has nothing like that.
If I understand your specification, you want:
grep --null -l '<pattern>' directory/*.extension1 | \
xargs -n 1 -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}
This is essentially the same as what triplee's comment describes, except that it's newline-safe.
What's going on here?
grep with --null will return output delimited with nulls instead of newlines. Since file names can have newlines in them, delimiting with newlines makes it impossible to parse the output of grep safely, but null is not a valid character in a file name and thus makes a nice delimiter.
By default, xargs takes a stream of blank- or newline-delimited items and executes a given command, passing as many of those items as it can (one as each parameter) to that command (or to echo if no command is given). Thus if you said:
printf 'one\ntwo three \nfour\n' | xargs echo
xargs would execute echo one two three four: the space inside "two three " separates items just as the newlines do. This is not safe for file names, because file names might contain embedded spaces or even newlines.
The -0 switch to xargs changes it from splitting on blanks and newlines to splitting on null bytes only. This makes it match the output we got from grep --null and makes it safe for processing a list of file names.
Normally xargs simply appends the input items to the end of the command. The -I switch changes this: xargs runs the command once per input item, replacing the specified replacement string with that item. To get the idea, try this experiment:
printf 'one\ntwo three \nfour\n' | xargs -I{} echo foo {} bar
And note the difference from the earlier printf | xargs command.
In the case of my solution, the command I execute is bash, to which I pass -c. The -c switch causes bash to execute the commands in the following argument (and then terminate) instead of starting an interactive shell. The next block, 'rm "$1" "${1%.*}.extension2"', is the argument to -c and is the script which bash will execute. Any arguments following the script argument are assigned as the arguments to the script. Thus, if I were to say:
bash -c 'echo $0' "Hello, world"
Then Hello, world would be assigned to $0 (the first argument after the script string), and inside the script I could echo it back.
Since $0 is normally reserved for the script name I pass a dummy value (in this case --) as the first argument and, then, in place of the second argument I write {}, which is the replacement string I specified for xargs. This will be replaced by xargs with each file name parsed from grep's output before bash is executed.
The mini shell script might look complicated, but it's rather trivial. First, the entire script is single-quoted to prevent the calling shell from interpreting it. Inside the script I invoke rm and pass it two file names to remove: the $1 argument, which is the file name substituted for the replacement string above, and ${1%.*}.extension2. The latter is a parameter expansion on the $1 variable. The important part is %.*, which says:
% Match from the end of the variable's value and remove the shortest string matching the pattern.
.* The pattern is a single period followed by anything.
This effectively strips the extension, if any, from the file name. You can observe the effect yourself:
foo='my file.txt'
bar='this.is.a.file.txt'
baz='no extension'
printf '%s\n' "${foo%.*}" "${bar%.*}" "${baz%.*}"
Since the extension has been stripped I concatenate the desired alternate extension .extension2 to the stripped file name to obtain the alternate file name.
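Here is the whole pipeline run against a scratch directory, as a sketch. The directory name, the extensions and the pattern DELETE-ME are placeholders, and GNU grep is assumed for --null. The -n 1 from the answer is left out because -I already runs one command per item (some xargs versions warn when both are given).

```bash
#!/usr/bin/env bash
# Sketch: delete each matching .extension1 file and its .extension2 twin.
# "directory", the extensions and DELETE-ME are placeholders.
set -eu

cd "$(mktemp -d)"
mkdir directory
printf 'DELETE-ME\n' > 'directory/my file.extension1'   # matches
touch 'directory/my file.extension2'
printf 'keep\n'      > 'directory/other.extension1'     # does not match
touch 'directory/other.extension2'

# grep --null and xargs -0 keep the space in "my file" intact.
grep --null -l 'DELETE-ME' directory/*.extension1 |
xargs -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}

ls directory   # only the "other" pair is left
```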
If this does what you want, pipe the output through /bin/sh.
grep -l 'RE' folder/*.ext1 | sed 's/\(.*\)\.ext1$/rm "&" "\1.ext2"/'
Or if sed makes you itchy:
grep -l 'RE' folder/*.ext1 | while IFS= read -r file; do
echo rm "$file" "${file%.ext1}.ext2"
done
Remove echo if the output looks like the commands you want to run.
But you can do this with find as well:
find /path/to/start -name \*.ext1 -exec grep -q 'RE' {} \; -print | ...
where ... is either the sed script or the three lines from while to done.
The idea here is that find will ... well, "find" things based on the qualifiers you give it -- namely, that things match the file glob "*.ext1", AND that the result of the "exec" is successful. The -q tells grep to look for RE in {} (the file supplied by find) and exit with a success or failure status without generating any of its own output.
The only real difference between doing this in find vs doing it with grep is that you get to use find's awesome collection of conditions to narrow down your search further if required. man find for details. By default, find will recurse into subdirectories.
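The find variant can be tried as a dry run in the same way. In this sketch a scratch directory stands in for /path/to/start, RE is the literal placeholder pattern, dry_run is a hypothetical helper, and the loop uses IFS= read -r so leading blanks and backslashes in names are kept. The match in a subdirectory shows that find recurses by default.

```bash
#!/usr/bin/env bash
# Sketch: dry run of the find variant. A scratch directory replaces
# /path/to/start; RE, .ext1 and .ext2 are placeholders.
set -eu

start=$(mktemp -d)
mkdir "$start/sub"
printf 'RE\n' > "$start/sub/has match.ext1"   # contains the pattern
printf 'no\n' > "$start/other.ext1"           # does not

# Print (not run) the rm command for every .ext1 file that contains RE.
dry_run() {
  find "$1" -name \*.ext1 -exec grep -q 'RE' {} \; -print |
  while IFS= read -r file; do
    echo rm "$file" "${file%.ext1}.ext2"   # drop "echo" to actually delete
  done
}

dry_run "$start"
```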
You can pipe the list to xargs:
grep -l '<pattern>' directory/*.extension1 | xargs rm
As for the second set of files with a different extension, I'd do this (as usual use xargs echo rm when testing to make a dry run; I haven't tested it, it may not work correctly with filenames with spaces in them):
filelist=$(grep -l '<pattern>' directory/*.extension1)
echo $filelist | xargs rm
echo ${filelist//.extension1/.extension2} | xargs rm
Pipe the result to xargs; it will allow you to run a command for each match.

Listing files in date order with spaces in filenames

I am starting with a file containing a list of hundreds of files (full paths) in a random order. I would like to list the details of the ten latest files in that list. This is my naive attempt:
$ ls -las -t `cat list-of-files.txt` | head -10
That works, so long as none of the files have spaces in, but fails if they do as those files are split up at the spaces and treated as separate files. File "hello world" gives me:
ls: hello: No such file or directory
ls: world: No such file or directory
I have tried quoting the files in the original list-of-files file, but the command substitution still splits the files up at the spaces in the filenames, treating the quotes as part of the filenames:
$ ls -las -t `awk '{print "\"" $0 "\""}' list-of-files.txt` | head -10
ls: "hello: No such file or directory
ls: world": No such file or directory
The only way I can think of doing this, is to ls each file individually (using xargs perhaps) and create an intermediate file with the file listings and the date in a sortable order as the first field in each line, then sort that intermediate file. However, that feels a bit cumbersome and inefficient (hundreds of ls commands rather than one or two). But that may be the only way to do it?
Is there any way to pass "ls" a list of files to process, where those files could contain spaces - it seems like it should be simple, but I'm stumped.
Instead of splitting on any blank character (space, tab or newline), you can force bash to split only on newlines by changing the field separator, IFS:
OIFS=$IFS
IFS=$'\n'
ls -las -t $(cat list-of-files.txt) | head -10
IFS=$OIFS
However, I don't think this code would be more efficient than doing a loop; in addition, that won't work if the number of files in list-of-files.txt exceeds the max number of arguments.
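The IFS approach can be checked with a "hello world" file in a scratch directory. In this sketch -las is dropped so the output is just the names, and set -f (which turns off globbing, so names containing * or ? are not expanded) is an addition not in the answer. touch -d assumes GNU coreutils.

```bash
#!/usr/bin/env bash
# Sketch: split the command substitution on newlines only.
# touch -d assumes GNU coreutils; set -f is an addition to the answer.
set -eu

cd "$(mktemp -d)"
touch -d '2020-01-01' 'hello world'
touch -d '2021-01-01' 'plain'
printf '%s\n' "$PWD/hello world" "$PWD/plain" > list-of-files.txt

OIFS=$IFS
IFS=$'\n'
set -f                                  # no glob expansion of the names
ls -t $(cat list-of-files.txt) | head -10
set +f
IFS=$OIFS
```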
Try this (GNU xargs; -d '\n' stops xargs from splitting names at spaces):
xargs -a list-of-files.txt -d '\n' ls -last | head -n 10
I'm not sure whether this will work, but did you try escaping spaces with \? Using sed or something. sed "s/ /\\\\ /g" list-of-files.txt, for example.
This worked for me:
xargs -d\\n ls -last < list-of-files.txt | head -10
