"find | xargs | ls" not running ls on filenames from find - shell

So I have a directory with files and sub-directories in it. I want to get all the files recursively and then list them in long format, sorted by the modified date. Here's what I came up with.
find . -type f | xargs -d "\n" | ls -lt
However this only lists the files in the current directory and not the sub-directories. I don't understand why, given that the following prints out all the files.
find . -type f | xargs -d "\n" | cat
Any help appreciated.

xargs can only start ls if it's passed ls as an argument. When you pipe from xargs into ls, only one copy of ls is started -- by the parent shell -- and it isn't given any of the filenames from find | xargs as arguments -- instead they're on its stdin, but ls never reads its stdin, so it doesn't even know that they're there.
Thus, you need to remove the | character:
# Does what you specified in the common case, but buggy; don't use this
# (filenames can contain newlines!)
# ...also, xargs -d is GNU-only
find . -type f | xargs -d '\n' ls -lt
...or, better:
# uses NUL separators, which cannot exist inside filenames
# also, while a non-POSIX extension, this is supported in both GNU and BSD xargs
find . -type f -print0 | xargs -0 ls -lt
...or, even better than that:
# no need for xargs at all here; find -exec can do the same thing
# -exec ... {} + is POSIX-mandated functionality since 2008
find . -type f -exec ls -lt {} +
Much of the content in this answer is also covered in the Actions, Complex Actions, and Actions in Bulk sections of Using Find, which is well worth reading.
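To see the difference concretely, here is a minimal sketch. It assumes a throwaway directory created with mktemp, and the file names are invented for illustration. It compares what ls reports when the names arrive on its stdin with what it reports when they arrive as arguments.

```shell
# Hypothetical scratch tree: one file at the top level, one in a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/top.txt" "$demo/sub/nested.txt"

# Names arrive on ls's stdin, which ls never reads: it just lists ".".
piped=$(cd "$demo" && find . -type f | ls)

# Names arrive as arguments, so ls reports every file that find found.
as_args=$(cd "$demo" && find . -type f -exec ls -d {} +)

printf 'piped into ls:\n%s\n\n' "$piped"
printf 'passed as arguments:\n%s\n' "$as_args"

rm -rf "$demo"
```

The first listing shows only sub and top.txt (the contents of the current directory), while the second includes ./sub/nested.txt.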

Related

An alias in .bashrc fails vs command line succeeds

I run this from a user's home dir to show me the most recent files while omitting the shell profile files:
find ./ -type f -printf "%T@ %p\n"|grep -vP "/\.(bash|emacs|gtkrc|kde/|zshrc)" |sort -n| tail -10|cut -f2- -d" "|while read EACH; do ls -l "$EACH"; done;
This works, but just not as well when placed in my .bashrc as an alias:
alias recentfiles='find ./ -type f -printf "%T@ %p\n"|grep -vP "/\.\(bash|emacs|gtkrc|kde/|zshrc\)"|sort -n| tail -10|cut -f2- -d" "|while read EACH; do ls -l "$EACH"; done;'
In the image you see the results without doing any filtering, followed by the desired result using grep -v for filtering which works on command line. Then final result - only partially succeeds in weeding out those files.
I have tried using bash_ and [b]ash. Not even bas (which fails to even get .basin) work ?!? And also I can use macs or acs AND still get the .emacs omitted so obviously the syntax in my alias is not respecting the /. either. Not a problem with reserved words as I originally thought.
I DO get the expected results if I place my original command as is in a file and then use the alias that way:
alias recentfiles='. /root/mycommands/recentfiles'
Can someone explain or point me to a reference to understand what is at play here? I wouldn't know what phrase with the proper terms to search on.
This should fix your problems:
alias recentfiles='find ./ -type f -printf "%T@ %p\n"|grep -vP "/\.(bash|emacs|gtkrc|kde/|zshrc)"|sort -n| tail -10|cut -f2- -d" "|while read EACH; do ls -l "$EACH"; done;'
The issue is with grep -P, where -P makes it use Perl regular expressions. In Perl there is no need to put \ before grouping parentheses, so write (bash|emacs|...) instead of \(bash|emacs|...\). I really doubt the escaped version worked outside of .bashrc, unless you have some alias for grep that makes it behave differently there.
As others have said in the comments, your filtering is inefficient. It is better to rewrite your command as:
find ./ \( -name ".bash*" -o -name ".emacs*" -o -name .gtkrc -o -name .kde -o -name .zshrc \) -prune -o \( -type f -printf "%T@ %p\n" \) |sort -n| tail -10|cut -f2- -d" "| tr "\n" "\0" | xargs -0 ls -l;
This way it will not waste time searching files inside .emacs.d/ or inside .kde/, and will immediately prune the search. Also, xargs -0 ls -l is so much shorter and clearer than the while loop.
To avoid issues with filenames that contain newlines, it is better to use \0 (NUL) characters, which can never be part of a file name:
find ./ \( -name ".bash*" -o -name .emacs -o -name .gtkrc -o -name .kde -o -name .zshrc \) -prune -o \( -type f -printf "%T@ %p\0" \) |sort -n -z | tail -z -n -10| cut -z -f2- -d" " | xargs -0 ls -l
Part 1: Fixing The Issue
Use a function instead.
There are several major issues with aliases:
Because you pass your content to be string-prefixed inside quotes when creating an alias, it's parsed differently than it would be when typed directly at the command line.
Because an alias is simple prefix substitution, they don't have their own arguments ($1, $2, etc); they don't have a call stack; debugging mechanisms like PS4=':$BASH_SOURCE:$LINENO+'; set -x can't tell you which file code from an alias originated in; etc.
Aliases are an interactive feature; POSIX doesn't mandate that shells support them at all, and they're turned off by default during script execution.
Functions solve all these problems.
recentfiles() {
  find ./ \
    '(' -name '.bash*' -o -name '.emacs*' -o -name .gtkrc -o -name .kde -o -name .zshrc ')' -prune \
    -o -type f -printf "%T@ %p\0" |
    sort -nz |
    tail -z -n -10 |
    # drop the leading timestamp field, keep the NUL-terminated filename
    while read -d' ' _ && IFS= read -r -d '' file; do
      printf '%s\0' "$file"
    done |
    xargs -0 ls -ld --
}
Note that I also made several other changes:
Instead of using \n as a separator, the above code uses \0. This is because newlines can be found in filenames; a file that contained newlines in its name could look like any number of files, with any arbitrary sizes it wanted, to the rest of your pipeline. (Unfortunately, POSIX doesn't require that sort and tail support NUL delimiters, so the -z options used above are GNUisms).
Instead of using grep -v to remove dotfiles, I used the -prune option to find. This is particularly important for directories like .kde, since it stops find from spending the time and I/O bandwidth to recurse down directories for which you intend to throw the results away anyhow.
For documentation of the importance of the IFS= and -r arguments used in the while read loop, see BashFAQ #1. Both of these improve behavior in presence of unusual filenames (clearing IFS prevents trailing whitespace from being stripped; passing -r prevents literal backslashes from being elided).
Instead of grep -P -- a GNU extension which is only available if grep was compiled with libpcre support -- my first cut (prior to moving to find -prune) switched to grep -E, which is adequately expressive, much more widely available, and lends itself to higher performance implementations.
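The effect of IFS= and -r is easy to check in isolation. The following is a small sketch using a made-up input line (not taken from the original pipeline) that contains surrounding spaces and a literal backslash.

```shell
# A made-up line with leading/trailing spaces and a literal backslash.
line='  name with \backslash  '

# Plain read: IFS splitting trims the outer spaces, and the backslash is
# treated as an escape character and removed.
plain=$(printf '%s\n' "$line" | { read var; printf '%s' "$var"; })

# IFS= read -r: the line is stored exactly as it was written.
exact=$(printf '%s\n' "$line" | { IFS= read -r var; printf '%s' "$var"; })

printf 'plain: [%s]\n' "$plain"
printf 'exact: [%s]\n' "$exact"
```

The brackets in the output make the stripped whitespace visible: the plain variant prints [name with backslash], while the second keeps both the spaces and the backslash.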
Part 2: Explaining The Issue
Running your alias after set -x, we see:
+ find ./ -type f -printf '%T@ %p\n'
+ grep -vP '/\.\(bash|emacs|gtkrc|kde/|zshrc\)'
+ sort -n
+ tail -10
+ cut -f2- '-d '
+ read EACH
By contrast, running the command it was intended to wrap, we see:
+ find ./ -type f -printf '%T@ %p\n'
+ grep -vP '/\.(bash|emacs|gtkrc|kde/|zshrc)'
+ sort -n
+ tail -10
+ cut -f2- '-d '
+ read EACH
In the command itself, there are no literal backslashes before ( and ).
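Why those backslashes matter can be reproduced without the rest of the pipeline. This is a sketch under two assumptions: the sample paths are invented, and grep -E stands in for grep -P because both treat \( and \) as literal parentheses and -E is more widely available.

```shell
# Sample paths, invented for illustration.
paths='./.bashrc
./.emacs
./notes.txt'

# Unescaped parentheses group the alternatives, so both dotfiles are removed.
grouped=$(printf '%s\n' "$paths" | grep -vE '/\.(bash|emacs|gtkrc|kde/|zshrc)')

# Escaped parentheses are literal characters, so the alternation now splits
# the whole pattern into: "/\.\(bash" OR "emacs" OR "gtkrc" OR "kde/" OR "zshrc\)".
# ./.bashrc contains no "(" and therefore survives the filter, while
# ./.emacs is still removed because the bare word "emacs" matches it.
escaped=$(printf '%s\n' "$paths" | grep -vE '/\.\(bash|emacs|gtkrc|kde/|zshrc\)')

printf 'grouped:\n%s\n\n' "$grouped"
printf 'escaped:\n%s\n' "$escaped"
```

This is consistent with the partial filtering described in the question: the middle alternatives keep working as bare substrings, while the first and last ones require a literal parenthesis that never appears in a path.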

unix command for file seperation in two different folders

I am currently in data folder which has following files and folders
Folders:
ISOLATE
JUKEBOX
Files:
XXX-12-2345-67A-89T-1011-12.ab20.RenderBase20.ISOLATE.quantifier.txt
XXX-12-2345-67A-89T-1011-12.ab20.RenderBase20.JUKEBOX.quantifier.txt
XXX-24-2345-67A-89T-2022-24.ab10.RenderBase20.ISOLATE.quantifier.txt
XXX-24-2345-67A-89T-2022-24.ab10.RenderBase20.JUKEBOX.quantifier.txt
...
I want to put the files with .ISOLATE in Folder ISOLATE and .JUKEBOX ones in the JUKEBOX folder. How could I perform this task using terminal?
There are more than 12000 files, so I cannot really change the naming scheme.
Thanks in advance
Try to use wildcards:
mv *.ISOLATE.quantifier.txt ISOLATE/
mv *.JUKEBOX.quantifier.txt JUKEBOX/
If the number of files is too high, you might need to move them in smaller loads.
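find . -maxdepth 1 -name '*.ISOLATE.quantifier.txt' -exec mv -t ISOLATE/ {} +
-exec with + accumulates command-line arguments the same way xargs does, so you shouldn't exceed the maximum number of arguments. Note that {} must come immediately before the +, which is why this uses mv -t (a GNU extension) to name the destination first.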
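The batching behaviour can be observed directly. In this sketch (scratch directory and file names are assumptions), a tiny sh -c script prints how many arguments each invocation received.

```shell
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c" "$demo/d" "$demo/e"

# With \; the command runs once per file, so every run sees one argument.
per_file=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} \;)

# With + find appends as many names as fit, so a single run sees all five.
batched=$(find "$demo" -type f -exec sh -c 'echo "$#"' sh {} +)

runs=$(printf '%s\n' "$per_file" | wc -l | tr -d ' ')
printf 'invocations with \\; : %s\n' "$runs"
printf 'arguments with +    : %s\n' "$batched"

rm -rf "$demo"
```

With only five files everything fits into one batch; with many thousands, find would split the names across several invocations on its own.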
Since you're dealing with a huge number of files, you can use mv with xargs:
printf '%s\0' *.ISOLATE.* | xargs -0 mv -t ISOLATE/
printf '%s\0' *.JUKEBOX.* | xargs -0 mv -t JUKEBOX/
In addition to trying wildcards (bash pattern match or globs), which at some point will hit an upper limit based on the number of files, you can also use find and xargs:
find . -name '*.ISOLATE.*.txt' -maxdepth 1 -print0 | xargs -0 -IFILE mv FILE ./ISOLATE
find . -name '*.JUKEBOX.*.txt' -maxdepth 1 -print0 | xargs -0 -IFILE mv FILE ./JUKEBOX
Doing this won't be subject to the maximum number of command line arguments that the glob solution may hit.
The key things in the commands above are:
-maxdepth 1 ensures that find won't descend into the ./ISOLATE or ./JUKEBOX subdirectories
-print0 causes find to terminate each file name with a null byte rather than a newline. This protects you against files that have spaces or other special characters in their names.
-0 causes xargs to split its input on the null byte rather than on whitespace, for the same reason
-IFILE tells xargs to substitute each file name wherever the string FILE appears. By default xargs appends the file names at the end of the command, which wouldn't work here because mv expects the destination last.
I tested the approach with a small shell script:
touch XXX-12-2345-67A-89T-1011-12.ab20.RenderBase20.ISOLATE.quantifier.txt
touch XXX-12-2345-67A-89T-1011-12.ab20.RenderBase20.JUKEBOX.quantifier.txt
touch XXX-24-2345-67A-89T-2022-24.ab10.RenderBase20.ISOLATE.quantifier.txt
touch XXX-24-2345-67A-89T-2022-24.ab10.RenderBase20.JUKEBOX.quantifier.txt
mkdir ISOLATE
mkdir JUKEBOX
find . -name '*.ISOLATE.*.txt' -maxdepth 1 -print0 | xargs -0 -IFILE mv FILE ./ISOLATE
find . -name '*.JUKEBOX.*.txt' -maxdepth 1 -print0 | xargs -0 -IFILE mv FILE ./JUKEBOX
find .
Which outputs:
$ bash example.sh
.
./example.sh
./ISOLATE
./ISOLATE/XXX-12-2345-67A-89T-1011-12.ab20.RenderBase20.ISOLATE.quantifier.txt
./ISOLATE/XXX-24-2345-67A-89T-2022-24.ab10.RenderBase20.ISOLATE.quantifier.txt
./JUKEBOX
./JUKEBOX/XXX-12-2345-67A-89T-1011-12.ab20.RenderBase20.JUKEBOX.quantifier.txt
./JUKEBOX/XXX-24-2345-67A-89T-2022-24.ab10.RenderBase20.JUKEBOX.quantifier.txt
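For comparison, a plain shell loop is another option that is not subject to the argument limit, because the glob is expanded inside the shell and each mv receives a single file. This is only a sketch: the scratch directory and the shortened file names are assumptions, and one mv per file is slower than the batched approaches above.

```shell
demo=$(mktemp -d)
mkdir "$demo/ISOLATE" "$demo/JUKEBOX"
touch "$demo/s1.ab20.ISOLATE.quantifier.txt" "$demo/s1.ab20.JUKEBOX.quantifier.txt"

for kind in ISOLATE JUKEBOX; do
    for f in "$demo"/*."$kind".quantifier.txt; do
        [ -f "$f" ] || continue       # the pattern matched nothing
        mv -- "$f" "$demo/$kind/"
    done
done

moved=$(cd "$demo" && find . -type f | sort)
printf '%s\n' "$moved"
rm -rf "$demo"
```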

Terminal find, directories last instead of first

I have a makefile that concatenates JavaScript files together and then runs the file through uglify-js to create a .min.js version.
I'm currently using this command to find and concat my files
find src/js -type f -name "*.js" -exec cat {} >> ${jsbuild}$@ \;
But it lists files in directories first, this makes heaps of sense but I'd like it to list the .js files in the src/js files above the directories to avoid getting my undefined JS error.
Is there anyway to do this or? I've had a google around and seen the sort command and the -s flag for find but it's a bit above my understanding at this point!
[EDIT]
The final solution is slightly different to the accepted answer but it is marked as accepted as it brought me to the answer. Here is the command I used
cat `find src/js -type f -name "*.js" -print0 | xargs -0 stat -f "%z %N" | sort -n | sed -e "s|[0-9]*\ \ ||"` > public/js/myCleverScript.js
Possible solution:
use find for getting filenames and directory depth, i.e find ... -printf "%d\t%p\n"
sort list by directory depth with sort -n
remove directory depth from output to use filenames only
test:
without sorting:
$ find folder1/ -depth -type f -printf "%d\t%p\n"
2 folder1/f2/f3
1 folder1/file0
with sorting:
$ find folder1/ -type f -printf "%d\t%p\n" | sort -n | sed -e "s|[0-9]*\t||"
folder1/file0
folder1/f2/f3
the command you need looks like
cat $(find src/js -type f -name "*.js" -printf "%d\t%p\n" | sort -n | sed -e "s|[0-9]*\t||")>min.js
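Where find has no -printf (it is a GNU extension), the depth can be derived from the path itself. The sketch below is one possible variant, not the method from the answer above: it uses awk to count the /-separated fields, and it assumes file names without newlines. The function name js_by_depth and the sample tree are made up.

```shell
# Print the *.js files under directory $1, shallowest paths first.
js_by_depth() {
    find "$1" -type f -name '*.js' |
        awk -F/ '{ print NF "\t" $0 }' |
        sort -n |
        cut -f2-
}

demo=$(mktemp -d)
mkdir -p "$demo/src/js/lib"
touch "$demo/src/js/lib/helper.js" "$demo/src/js/app.js"

sorted=$(cd "$demo" && js_by_depth src/js)
printf '%s\n' "$sorted"
rm -rf "$demo"
```

The awk step prefixes each path with its field count, sort -n orders by that number, and cut removes it again.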
Mmmmm...
find src/js -type f
shouldn't find ANY directories at all, and doubly so as your directory names will probably not end in ".js". The brackets around your "-name" parameter are superfluous too, try removing them
find src/js -type f -name "*.js" -exec cat {} >> ${jsbuild}$@ \;
You could pass find the first directory level already expanded on the command line, which enforces the order of directory tree traversal. This solves the problem just for the top directory (unlike the already accepted solution by Sergey Fedorov), but this should answer your question too, and more options are always welcome.
Using GNU coreutils ls, you can sort directories before regular files with --group-directories-first option. From reading the Mac OS X ls manpage it seems that directories are grouped always in OS X, you should just drop the option.
ls -A --group-directories-first -r | tac | xargs -I'%' find '%' -type f -name '*.js' -exec cat '{}' + > ${jsbuild}$@
If you do not have the tac command, you could easily implement it using sed. It reverses the order of lines. See info sed tac of GNU sed.
tac(){
sed -n '1!G;$p;h'
}
You could do something like this...
First create a variable holding the name of our output file:
OUT="$(pwd)/theLot.js"
Then, get all "*.js" in top directory into that file:
cat *.js > $OUT
Then have "find" grab all other "*.js" files below current directory:
find . -type d ! -name . -exec sh -c "cd {} ; cat *.js >> $OUT" \;
Just to explain the "find" command, it says:
find
. = starting at current directory
-type d = all directories, not files
! -name . = except the current one
-exec sh -c = and for each one you find execute the following
"..." = go to that directory and concatenate all "*.js" files there onto end of $OUT
\; = and that's all for today, thank you!
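One caveat with embedding {} inside the sh -c string is that directory names containing spaces or shell metacharacters get re-parsed by the inner shell. A possible variant, sketched here with an invented scratch tree, passes the directory and the output file as positional arguments instead.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src/sub dir"
printf 'top\n' > "$demo/src/top.js"
printf 'nested\n' > "$demo/src/sub dir/nested.js"
OUT="$demo/theLot.js"

# Top-level files first, then every subdirectory found by find.
cat "$demo"/src/*.js > "$OUT"
(
    cd "$demo/src" &&
    find . -type d ! -name . -exec sh -c '
        # $1 is the directory, $2 is the output file
        for f in "$1"/*.js; do
            if [ -f "$f" ]; then cat "$f"; fi
        done >> "$2"
    ' sh {} "$OUT" \;
)

combined=$(cat "$OUT")
printf '%s\n' "$combined"
rm -rf "$demo"
```

The loop also avoids an error from cat when a directory contains no .js files at all.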
I'd get the list of all the files:
$ find src/js -type f -name "*.js" > list.txt
Sort them by depth, i.e. by the number of '/' in them, using the following ruby script:
sort.rb:
files=[]; while gets; files<<$_; end
files.sort! {|a,b| a.count('/') <=> b.count('/')}
files.each {|f| puts f}
Like so:
$ ruby sort.rb < list.txt > sorted.txt
Concatenate them:
$ cat sorted.txt | while read FILE; do cat "$FILE" >> output.txt; done
(All this assumes that your file names don't contain newline characters.)
EDIT:
I was aiming for clarity. If you want conciseness, you can absolutely condense it to something like:
find src/js -name '*.js'| ruby -ne 'BEGIN{f=[];}; f<<$_; END{f.sort!{|a,b| a.count("/") <=> b.count("/")}; f.each{|e| puts e}}' | xargs cat >> concatenated

Why ls command combined with xargs and cp move only 10 files?

I have a command that copies file from one dir to another
FILE_COLLECTOR_PATH="/var/www/";
FILE_BACKUP_PATH='/home/'
ls $FILE_COLLECTOR_PATH | head -${1} | xargs -i basename {} | xargs -t -i cp $FILE_COLLECTOR_PATH{} "${FILE_BACKUP_PATH}{}-`date +%F%H%M%S%N`"
I loop it in a shell script like,
#!/bin/sh
SLEEP=120
FILE_COLLECTOR_PATH="/var/www/";
FILE_BACKUP_PATH='/home/'
while true
do
ls $FILE_COLLECTOR_PATH | head -${1} | xargs -i basename {} | xargs -t -i cp $FILE_COLLECTOR_PATH{} "${FILE_BACKUP_PATH}{}-`date +%F%H%M%S%N`"
sleep ${SLEEP}
done
But it seems to move only 10 files and not all files in the dir, Why? It should suppose to move all files.
In general, don't try to parse the output of ls in a script. You can end up with many different types of subtle problems. There is almost always a better tool for the job. Many times, this tool is find. For example, to generate a list of all of the files in a directory and do something to each of them, you would do something like this:
find <search directory> -maxdepth 1 -type f -print0 | xargs -0i basename {} ...
The -print0 and -0 arguments allow find and xargs to communicate filenames in a way that handles special characters (like spaces) correctly.
The find command has other options that you may find useful in a backup script (which is what it appears you are building). Options like -mmin and -newer will enable you to only back up files that have changed since the last iteration.
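As a rough illustration of -newer, the sketch below keeps a marker file whose modification time stands for the previous backup run. The marker file, the directory layout and the timestamps are all assumptions made for the demo.

```shell
demo=$(mktemp -d)
mkdir "$demo/src"
touch -t 202001010000 "$demo/src/old.txt"   # modified before the last run
touch -t 202101010000 "$demo/marker"        # pretend the last run was here
touch "$demo/src/new.txt"                   # modified just now

# Only files changed after the marker are selected.
changed=$(find "$demo/src" -maxdepth 1 -type f -newer "$demo/marker" -exec basename {} \;)
printf 'to back up: %s\n' "$changed"

# A real script would copy the files here and then refresh the marker:
touch "$demo/marker"

rm -rf "$demo"
```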
Try doing
ls -1
instead of just ls, because ls by default doesn't put each file on its own line (head expects newlines), whereas ls -1 does.

Get the newest directory to a variable in Bash

I would like to find the newest sub directory in a directory and save the result to variable in bash.
Something like this:
ls -t /backups | head -1 > $BACKUPDIR
Can anyone help?
BACKUPDIR=$(ls -td /backups/*/ | head -1)
$(...) evaluates the statement in a subshell and returns the output.
There is a simple solution to this using only ls:
BACKUPDIR=$(ls -td /backups/*/ | head -1)
-t orders by time (latest first)
-d lists the directories themselves rather than their contents
*/ only lists directories
head -1 returns the first item
I didn't know about */ until I found Listing only directories using ls in bash: An examination.
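A quick way to try this out safely is a scratch directory with known timestamps. The directory names and dates below are invented; the stray regular file is there to show that */ really does restrict the listing to directories.

```shell
demo=$(mktemp -d)
mkdir "$demo/monday" "$demo/tuesday"
touch -t 202401010000 "$demo/monday"
touch -t 202401020000 "$demo/tuesday"
touch "$demo/stray-file.txt"          # newer than both, but not a directory

BACKUPDIR=$(ls -td "$demo"/*/ | head -n 1)
printf 'BACKUPDIR=%s\n' "$BACKUPDIR"

rm -rf "$demo"
```

Note that the result keeps the trailing slash from the glob, and that this approach still parses ls output, so directory names containing newlines would break it.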
This is a pure Bash solution:
topdir=/backups
BACKUPDIR=
# Handle subdirectories beginning with '.', and empty $topdir
shopt -s dotglob nullglob
for file in "$topdir"/* ; do
[[ -L $file || ! -d $file ]] && continue
[[ -z $BACKUPDIR || $file -nt $BACKUPDIR ]] && BACKUPDIR=$file
done
printf 'BACKUPDIR=%q\n' "$BACKUPDIR"
It skips symlinks, including symlinks to directories, which may or may not be the right thing to do. It skips other non-directories. It handles directories whose names contain any characters, including newlines and leading dots.
Well, I think this solution is the most efficient:
path="/my/dir/structure/*"
backupdir=$(find $path -type d -prune | tail -n 1)
Explanation why this is a little better:
We do not need sub-shells (aside from the one for getting the result into the bash variable).
We do not need a useless -exec ls -d at the end of the find command, it already prints the directory listing.
We can easily alter this, e.g. to exclude certain patterns. For example, if you want the second newest directory, because backup files are first written to a tmp dir in the same path:
backupdir=$(find $path -type d -prune -not -name "*temp_dir" | tail -n 1)
The above solution doesn't take into account things like files being written and removed from the directory resulting in the upper directory being returned instead of the newest subdirectory.
The other issue is that this solution assumes that the directory only contains other directories and not files being written.
Let's say I create a file called "test.txt" and then run this command again:
echo "test" > test.txt
ls -t /backups | head -1
test.txt
The result is test.txt showing up instead of the last modified directory.
The proposed solution "works" but only in the best case scenario.
Assuming you have a maximum of 1 directory depth, a better solution is to use:
find /backups/* -type d -prune -exec ls -d {} \; |tail -1
Just swap the "/backups/" portion for your actual path.
If you want to avoid showing an absolute path in a bash script, you could always use something like this:
LOCALPATH=/backups
DIRECTORY=$(cd $LOCALPATH; find * -type d -prune -exec ls -d {} \; |tail -1)
With GNU find you can get list of directories with modification timestamps, sort that list and output the newest:
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\0" | sort -z -n | cut -z -f2- | tail -z -n1
or newline separated
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\n" | sort -n | cut -f2- | tail -n1
With POSIX find (that does not have -printf) you may, if you have it, run stat to get file modification timestamp:
find . -mindepth 1 -maxdepth 1 -type d -exec stat -c '%Y %n' {} \; | sort -n | cut -d' ' -f2- | tail -n1
Without stat a pure shell solution may be used by replacing [[ bash extension with [ as in this answer.
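A rough sketch of what such a [ based loop might look like follows. The function name newest_dir and the scratch directories are made up, and it relies on the -nt test, which POSIX does not require of [ even though common shells such as bash, dash and ksh provide it. Unlike the Bash version, it does not look at directories whose names start with a dot.

```shell
# Print the most recently modified real subdirectory of $1 (symlinks skipped).
# Assumption: the shell's [ supports -nt, which POSIX does not mandate.
newest_dir() {
    newest=
    for d in "$1"/*/; do
        d=${d%/}                                   # drop the trailing slash
        { [ -d "$d" ] && [ ! -L "$d" ]; } || continue
        if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
            newest=$d
        fi
    done
    printf '%s\n' "$newest"
}

demo=$(mktemp -d)
mkdir "$demo/a-old" "$demo/b-new"
touch -t 202401010000 "$demo/a-old"
touch -t 202401020000 "$demo/b-new"

BACKUPDIR=$(newest_dir "$demo")
printf 'BACKUPDIR=%s\n' "$BACKUPDIR"
rm -rf "$demo"
```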
Your "something like this" was almost a hit:
BACKUPDIR=$(ls -t ./backups | head -1)
Combining what you wrote with what I have learned solved my problem too. Thank you for raising this question.
Note: I run the line above from GitBash within Windows environment in file called ./something.bash.
