How to pass a command that contains a pipe into xargs? - bash

I have a relatively short question: how do you do something like this?
ls -1a | xargs -L1 (find ./'{}' -type f | wc -l)
(it should basically count the number of files in each folder within the current folder)
(more precisely: how do I fit find ./'{}' -type f | wc -l into ls -1a | xargs -L1 (HERE)?)
Thank you in advance!
Edit
I'm using this command to get number of files in every subsequent folder, i.e. result should look like
2134123 # Folder #1
1234231 # Folder #2
12341 # Folder #3
2343224 # Folder #4

Please have a look at Why not parse ls?
find . -type d | while IFS= read -r d; do
    echo "$d" $(find "$d" -maxdepth 1 -type f | wc -l)
done
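To answer the literal question: you can hand a whole pipeline to xargs by wrapping it in a shell invocation. A sketch (with the usual caveat that parsing ls output stays fragile, so the find loop above is still the better tool):
# one shell per input line; the name is passed as a positional argument
# rather than spliced into the command string
# note: -a also yields "." and "..", as in the original command
ls -1a | xargs -I{} sh -c 'find "./$1" -type f | wc -l' _ {}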

Related

"find | xargs | ls" not running ls on filenames from find

So I have a directory with files and sub-directories in it. I want to get all the files recursively and then list them in long format, sorted by the modified date. Here's what I came up with.
find . -type f | xargs -d "\n" | ls -lt
However this only lists the files in the current directory and not the sub-directories. I don't understand why, given that the following prints out all the files.
find . -type f | xargs -d "\n" | cat
Any help appreciated.
xargs can only start ls if it's passed ls as an argument. When you pipe from xargs into ls, only one copy of ls is started, by the parent shell, and it isn't given any of the filenames from find | xargs as arguments; they're on its stdin instead, but ls never reads its stdin, so it doesn't even know they're there.
Thus, you need to remove the | character:
# Does what you specified in the common case, but buggy; don't use this
# (filenames can contain newlines!)
# ...also, xargs -d is GNU-only
find . -type f | xargs -d '\n' ls -lt
...or, better:
# uses NUL separators, which cannot exist inside filenames
# also, while a non-POSIX extension, this is supported in both GNU and BSD xargs
find . -type f -print0 | xargs -0 ls -lt
...or, even better than that:
# no need for xargs at all here; find -exec can do the same thing
# -exec ... {} + is POSIX-mandated functionality since 2008
find . -type f -exec ls -lt {} +
Much of the content in this answer is also covered in the Actions, Complex Actions, and Actions in Bulk sections of Using Find, which is well worth reading.

find folders with executable files

I wrote a script to find all folders that contain executable files. I was first seeking a one-liner command but couldn't find one. (In particular, I tried to use sort -k -u.) The script works fine, but my initial question remains: is there a one-liner command to do that?
#!/bin/bash
find "$1" -type d | while read -r Path
do
    X=$(ls -l "$Path" | grep '^-rwx' | wc -l)
    if ((X>0))
    then
        echo "$Path"
    fi
done
Using find:
find "$1" -type f -perm /111 -exec dirname {} \; | sort -u
This finds all files with at least one execute bit set (/111 matches any of the user, group, or other execute bits), but outputs only each file's directory name. To avoid duplicates, sort -u is used.
As pointed out by Paulo Almeida in the comments, this would also work:
find "$1" -type f -perm /111 -printf "%h\n" | sort -u
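Note that -perm /111 is a GNU extension (modern BSD find has it too). A sketch of a POSIX-portable equivalent using symbolic -perm tests:
# -perm -u+x etc. are POSIX; OR them together to match any execute bit
find "$1" -type f \( -perm -u+x -o -perm -g+x -o -perm -o+x \) -exec dirname {} \; | sort -u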

bash shell script not working as intended using cmp with output redirection

I am trying to write a bash script that remove duplicate files from a folder, keeping only one copy.
The script is the following:
#!/bin/sh
for f1 in `find ./ -name "*.txt"`
do
    if test -f $f1
    then
        for f2 in `find ./ -name "*.txt"`
        do
            if [ -f $f2 ] && [ "$f1" != "$f2" ]
            then
                # if cmp $f1 $f2 &> /dev/null # DOES NOT WORK
                if cmp $f1 $f2
                then
                    rm $f2
                    echo "$f2 purged"
                fi
            fi
        done
    fi
done
I want to redirect the output and stderr to /dev/null to avoid printing them to the screen. But using the commented statement, this script does not work as intended and removes all files but the first.
I'll give more informations if needed.
Thanks
A few comments:
First, the:
for f1 in `find ./ -name "*.txt"`
do
if test -f $f1
then
is the same as (it finds only plain files with the txt extension):
for f1 in `find ./ -type f -name "*.txt"`
Better syntax (bash only) is
for f1 in $(find ./ -type f -name "*.txt")
and finally the whole construct is wrong, because if a filename contains a space, the f1 variable will not get the full path name. So instead of the for, use:
find ./ -type f -name "*.txt" -print | while read -r f1
and as @Sir Athos pointed out, a filename can contain \n, so the best is to use:
find . -type f -name "*.txt" -print0 | while IFS= read -r -d '' f1
Second:
Use "$f1" instead of $f1 - again, because the $f1 can contain space.
Third:
doing N*N comparisons is not very efficient. You should make a checksum (md5 or, better, sha256) for every txt file. When the checksums are identical, the files are dups.
If you don't trust checksums, simply compare only the files that have identical checksums. Files with different checksums are SURE not to be duplicates. ;)
Making checksums is slow too, so you should first compare only files with the same size. Files of different sizes are not duplicates...
You can skip empty txt files - they are all duplicates of each other :).
so the final command can be:
find -not -empty -type f -name \*.txt -printf "%s\n" | sort -rn | uniq -d |\
xargs -I% -n1 find -type f -name \*.txt -size %c -print0 | xargs -0 md5sum |\
sort | uniq -w32 --all-repeated=separate
Commented:
#find all non-empty file with the txt extension and print their size (in bytes)
find . -not -empty -type f -name \*.txt -printf "%s\n" |\
#sort the sizes numerically, and keep only duplicated sizes
sort -rn | uniq -d |\
#for each size that is duplicated, find all files of that size and print their names (paths)
xargs -I% -n1 find . -type f -name \*.txt -size %c -print0 |\
#make an md5 checksum for them
xargs -0 md5sum |\
#sort the checksums and keep duplicated files separated with an empty line
sort | uniq -w32 --all-repeated=separate
Now you can simply edit the output and decide which files you want to remove and which to keep.
&> is bash syntax; you'll need to change the shebang line (the first line) to #!/bin/bash (or the appropriate path to bash).
Or if you're really using the Bourne Shell (/bin/sh), then you have to use old-style redirection, i.e.
cmp ... >/dev/null 2>&1
Also, I think &> was only introduced in bash 4, so if you're using bash 3.x you'll still need the old-style redirections.
IHTH
Credit to @kobame for this answer: this is really a comment, but posted as an answer for the formatting.
You don't need to call find twice; print out the size and the filename in the same find command:
find . -not -empty -type f -name \*.txt -printf "%8s %p\n" |
# find the files that have duplicate sizes
sort -n | uniq -Dw 8 |
# strip off the size and get the md5 sum
cut -c 10- | xargs md5sum
An example
$ cat a.txt
this is file a
$ cat b.txt
this is file b
$ cat c.txt
different contents
$ cp a.txt d.txt
$ cp b.txt e.txt
$ find . -not -empty -type f -name \*.txt -printf "%8s %p\n" |
sort -n | uniq -Dw 8 | cut -c 10- | xargs md5sum
76fd4c1589ef708d9203f3cf09cfd032 ./a.txt
e2d75fd6a1080efb6230d0608b1f9014 ./b.txt
76fd4c1589ef708d9203f3cf09cfd032 ./d.txt
e2d75fd6a1080efb6230d0608b1f9014 ./e.txt
To keep one and delete the rest, I would pipe the output into:
... | awk '++seen[$1] > 1 {print $2}' | xargs echo rm
rm ./d.txt ./e.txt
Remove the echo if your testing is satisfactory.
Like many complex pipelines, filenames containing newlines will break it.
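If you need something that survives arbitrary filenames, here is a rough sketch using a NUL-separated loop and an associative array keyed on the checksum (assumes bash 4+ and GNU md5sum):
#!/bin/bash
# Hash every .txt file; keep the first file seen per checksum and print
# rm commands for the rest. Drop the "echo" once the output looks right.
declare -A seen
while IFS= read -r -d '' f; do
    sum=$(md5sum < "$f") || continue   # hash stdin so odd names cannot garble the output
    if [[ -n ${seen[$sum]} ]]; then
        echo rm -- "$f"
    else
        seen[$sum]=$f
    fi
done < <(find . -type f -name '*.txt' -print0)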
All nice answers, so just one short suggestion: you can install and use fdupes:
fdupes -r .
From the man page:
Searches the given path for duplicate files. Such files are found by
comparing file sizes and MD5 signatures, followed by a byte-by-byte
comparison.
Added by @Francesco:
fdupes -rf . | xargs rm -f
to remove the dupes (the -f option makes fdupes omit the first occurrence of each file, so it lists only the dupes).
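Note that piping the fdupes output into xargs breaks on filenames with spaces or newlines; fdupes also has an interactive delete mode, fdupes -rd ., which sidesteps the quoting problem entirely.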

Script to count number of files in each directory

I need to count the number of files in a large number of directories. Is there an easy way to do this with a shell script (using find, wc, sed, awk or similar)? Just to avoid having to write a proper script in Python.
The output would be something like this:
$ <magic_command>
dir1 2
dir2 12
dir3 5
The number after the dir name would be the number of files. A plus would be being able to turn counting of dot/hidden files on and off.
Thanks!
Try this one:
du -a | cut -d/ -f2 | sort | uniq -c | sort -nr
from http://www.linuxquestions.org/questions/linux-newbie-8/how-to-find-the-total-number-of-files-in-a-folder-510009/#post3466477
find <dir> -type f | wc -l
find <dir> -type f lists all the files under the specified directory, one per line; wc -l counts the number of newlines it sees on stdin.
Also, for future reference: answers like this are a Google away.
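If filenames may contain newlines, counting NUL separators instead is more robust; a small sketch using find -print0:
# every file contributes exactly one NUL byte; count the NULs, not newlines
find <dir> -type f -print0 | tr -cd '\0' | wc -c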
More or less what I was looking for:
find . -type d -exec sh -c 'echo "{}" `ls "{}" |wc -l`' \;
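Embedding {} inside the sh -c string misbehaves if a directory name contains quotes or dollar signs; a safer variation passes the name as a positional parameter instead:
find . -type d -exec sh -c 'echo "$1" $(ls "$1" | wc -l)' _ {} \;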
Try ls | wc -l: it lists the files in your directory and feeds the listing to wc as input.
One way like this:
$ for dir in $(find . -type d )
> do
> echo $dir $(ls -A $dir | wc -l )
> done
Just remove the -A option if you do not want to count hidden files.
find . -type d | xargs ls -1 | perl -lne 'if(/^\./ || eof){print $a." ".$count;$a=$_;$count=-1}else{$count++}'
Below is the test:
> find . -type d
.
./SunWS_cache
./wicked
./wicked/segvhandler
./test
./test/test2
./test/tempdir.
./signal_handlers
./signal_handlers/part2
> find . -type d | xargs ls -1 | perl -lne 'if(/^\./ || eof){print $a." ".$count;$a=$_;$count=-1}else{$count++}'
.: 79
./SunWS_cache: 4
./signal_handlers: 6
./signal_handlers/part2: 5
./test: 6
./test/tempdir.: 0
./test/test2: 0
./wicked: 4
./wicked/segvhandler: 9
A generic version of Mehdi Karamosly's solution that lists the folders of any directory without changing the current directory:
DIR=~/test/ sh -c 'cd $DIR; du -a | cut -d/ -f2 | sort | uniq -c | sort -nr'
Explanation:
Extract directory into variable
Start new shell
Change directory in that shell, so the current shell's directory stays the same
Process
I use these functions:
nf()(for d;do echo $(ls -A -- "$d"|wc -l) "$d";done)
nfr()(for d;do echo $(find "$d" -mindepth 1|wc -l) "$d";done)
Both assume that filenames don't contain newlines.
Here's bash-only versions:
nf()(shopt -s nullglob dotglob;for d;do a=("$d"/*);echo "${#a[@]} $d";done)
nfr()(shopt -s nullglob dotglob globstar;for d;do a=("$d"/**);echo "${#a[@]} $d";done)
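Usage might look like this (the paths are just examples):
nf */              # count entries directly inside each subdirectory
nfr ~/test/        # count entries recursively under one directory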
I liked the output from the du based answer, but when I was looking at a large filesystem it was taking ages, so I put together a small ls based script which gives the same output, but much quicker:
for dir in `ls -1A ~/test/`;
do
    echo "$dir `ls -R1Ap ~/test/$dir | grep -Ev "[/:]|^\s*$" | wc -l`"
done
You can also copy the output of the ls command into a text file and then count the number of lines in that file.
ls $LOCATION > outText.txt; NUM_FILES=$(wc -l < outText.txt); echo $NUM_FILES
find -type f -printf '%h\n' | sort | uniq -c | sort -n
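Here %h is GNU find's format for a file's parent directory, so after sorting, uniq -c produces a per-directory file count, and the final sort -n orders the directories by that count.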

Get the newest directory to a variable in Bash

I would like to find the newest sub directory in a directory and save the result to variable in bash.
Something like this:
ls -t /backups | head -1 > $BACKUPDIR
Can anyone help?
BACKUPDIR=$(ls -td /backups/*/ | head -1)
$(...) evaluates the statement in a subshell and returns the output.
There is a simple solution to this using only ls:
BACKUPDIR=$(ls -td /backups/*/ | head -1)
-t orders by time (latest first)
-d lists the directory entries themselves instead of their contents
*/ only lists directories
head -1 returns the first item
I didn't know about */ until I found Listing only directories using ls in bash: An examination.
This is a pure Bash solution:
topdir=/backups
BACKUPDIR=
# Handle subdirectories beginning with '.', and empty $topdir
shopt -s dotglob nullglob
for file in "$topdir"/* ; do
    [[ -L $file || ! -d $file ]] && continue
    [[ -z $BACKUPDIR || $file -nt $BACKUPDIR ]] && BACKUPDIR=$file
done
printf 'BACKUPDIR=%q\n' "$BACKUPDIR"
It skips symlinks, including symlinks to directories, which may or may not be the right thing to do. It skips other non-directories. It handles directories whose names contain any characters, including newlines and leading dots.
Well, I think this solution is the most efficient:
path="/my/dir/structure/*"
backupdir=$(find $path -type d -prune | tail -n 1)
Explanation why this is a little better:
We do not need sub-shells (aside from the one for getting the result into the bash variable).
We do not need a useless -exec ls -d at the end of the find command; find already prints the directory names.
We can easily alter this, e.g. to exclude certain patterns. For example, if you want the second newest directory, because backup files are first written to a tmp dir in the same path:
backupdir=$(find $path -type d -prune -not -name "*temp_dir" | tail -n 1)
The above solution doesn't take into account files being written to or removed from the directory, which can result in the parent directory being returned instead of the newest subdirectory. Another issue is that it assumes the directory contains only other directories and no files being written.
Let's say I create a file called "test.txt" and then run this command again:
echo "test" > test.txt
ls -t /backups | head -1
test.txt
The result is test.txt showing up instead of the last modified directory.
The proposed solution "works", but only in the best-case scenario.
Assuming you have a maximum of 1 directory depth, a better solution is to use:
find /backups/* -type d -prune -exec ls -d {} \; |tail -1
Just swap the "/backups/" portion for your actual path.
If you want to avoid showing an absolute path in a bash script, you could always use something like this:
LOCALPATH=/backups
DIRECTORY=$(cd $LOCALPATH; find * -type d -prune -exec ls -d {} \; |tail -1)
With GNU find you can get a list of directories with their modification timestamps, sort that list, and output the newest:
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\0" | sort -z -n | cut -z -f2- | tail -z -n1
or newline separated
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\n" | sort -n | cut -f2- | tail -n1
With POSIX find (which does not have -printf) you may, if you have it, run stat to get each file's modification timestamp:
find . -mindepth 1 -maxdepth 1 -type d -exec stat -c '%Y %n' {} \; | sort -n | cut -d' ' -f2- | tail -n1
Without stat, a pure shell solution may be used, replacing the [[ bash extension with [ as in this answer.
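A rough sketch of that pure-shell approach (the -nt test is technically an extension, but dash, bash, and ksh all support it):
newest=
for d in /backups/*/; do
    [ -d "$d" ] || continue                      # skip if the glob matched nothing
    if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
        newest=$d
    fi
done
BACKUPDIR=$newest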
Your "something like this" was almost a hit:
BACKUPDIR=$(ls -t ./backups | head -1)
Combining what you wrote with what I have learned solved my problem too. Thank you for raising this question.
Note: I ran the line above from Git Bash in a Windows environment, in a file called ./something.bash.
