How do I limit the results of the find command in bash?

The following command:
find . -name "file2015-0*" -exec mv {} .. \;
It affects about 1500 results, moving them up one level one by one.
What if I want the results not to exceed, say, 400? How can I do that?

You can do this:
find . -name "file2015-0*" | head -400 | xargs -I filename mv filename ..
If you want to simulate what it does use echo:
find . -name "file2015-0*" | head -400 | xargs -I filename echo mv filename ..
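If the file names may contain spaces or other characters special to xargs, a null-delimited variant is safer (a sketch, assuming GNU head for the -z option):
find . -name "file2015-0*" -print0 | head -z -n 400 | xargs -0 -I filename mv filename ..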

You can, for example, feed the find output into a while read loop and keep track with a counter:
counter=1
while IFS= read -r file
do
    [ "$counter" -gt 400 ] && break   # stop once 400 files have been moved
    mv "$file" ..
    ((counter++))
done < <(find . -name "file2015-0*")
Note this can lead to problems if a file name contains newlines, which is quite unlikely. Also note the mv command moves everything into the parent of the current directory; if you want the destination to be relative to each file's own directory instead, a little parameter expansion (or dirname) can do it, as sketched below.
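For example, a minimal sketch of that variant, moving each match up one level from wherever it was found (same 400 cap; dirname is standard):
counter=1
while IFS= read -r file
do
    [ "$counter" -gt 400 ] && break
    mv "$file" "$(dirname "$file")/.."   # up one level from the file's own directory
    ((counter++))
done < <(find . -name "file2015-0*")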

Related

Searching for hundreds of files on a server

I have a list of 577 image files that I need to search for on a large server. I am no expert when it comes to bash so the best I could do myself was 577 lines of this:
find /source/directory -type f -iname "alternate1_1052956.tif" -exec cp {} /dest/directory \;
...repeating this line for each file name. It works... but it's unbelievably slow because it searches the entire server for one file and then moves on to the next line, but each search could take 20 minutes. I left this overnight and it only found 29 of them by the morning which is just way too slow. It could take two weeks at that rate to find all of these.
I've tried separating each line with -o as an OR separator in the hopes that it would search once for 577 files but I can't get it to work.
Does anyone have any suggestions? I also tried using the .txt file I have of the file names as a basis for the search but couldn't get that to work either. Unfortunately I don't have the paths for these files, only the basenames.
If you want to copy all .tif files
find /source/directory -type f -name "*.tif" -exec cp {} /dest/directory \;
On macOS, use the mdfind command, which looks up the filename in the Spotlight index. This is very fast, as it is only an index lookup, just like the locate command on Linux:
cp $(mdfind alternate1_1052956.tif) /dest/directory
If you have all the filenames in a file (one line per file), loop over it. Note the mdfind command substitution has to run once per filename, so it cannot simply be inlined into an xargs template (the shell would expand it once, before xargs runs):
while IFS= read -r name; do cp "$(mdfind "$name")" /dest/directory; done < file_with_list
Create a file with all filenames, then write a loop which runs through that file and executes each command in the background.
Note that this will use a lot of resources, as you will be running many searches simultaneously, so make sure your machine can handle it.
while read -r line; do
    find /source/directory -type f -iname "$line" -exec cp {} /dest/directory \; &
done < input.file
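A way to keep that resource usage bounded (a sketch, not from the original answer): let xargs cap the number of concurrent searches. The limit of 4 and the replacement token NAME are arbitrary choices; NAME is used instead of {} so it doesn't collide with find's own {}:
# Run at most 4 find searches at a time, one list entry per invocation
xargs -P 4 -I NAME find /source/directory -type f -iname NAME -exec cp {} /dest/directory \; < input.file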
There are a few assumptions made in this answer: you have a list of all 577 file names (let's call it inputfile.list), and there are no whitespace characters in the file names. The following may work:
$ cat findcopy.sh
#!/bin/bash
cmd=$(
echo -n 'find /path/to/directory -type f '
readarray -t filearr < inputfile.list # Read the list to an array
n=0
for f in "${filearr[@]}" # Loop over the array and print -iname
do
    (( n > 0 )) && echo "-o -iname ${f}" || echo "-iname ${f}"
    ((n++))
done
echo -n ' | xargs -I {} cp {} /path/to/destination/'
)
eval $cmd
execute: ./findcopy.sh
A note for macOS: it doesn't have readarray. Instead, use any other simple method to feed the list into an array, for example:
filearr=($(cat inputfile.list))
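Another option worth sketching (an alternative under the same assumptions, not from the answers above): walk the tree once with find and filter the paths against the list with grep, avoiding the giant generated command line. Since grep -f matches anywhere in the path, a listed name that happens to be a substring of a directory name would also match:
# -F: fixed strings, -i: case-insensitive, -f: read the patterns from the list file
find /source/directory -type f | grep -iFf inputfile.list |
while IFS= read -r f
do
    cp "$f" /dest/directory
done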

find files and delete by filename parameter

I have a folder with lots of images. In this folder are subfolders containing high resolution images. Images can be .png, .jpg or .gif.
Some images are duplicates, called a.jpg and a.hi.jpg, or a.b.c.gif and a.b.c.hi.gif. Base names are always distinct; there will never be an a.gif and an a.jpg at the same time, so I guess I don't have to take care of the extension.
These are the same images in different resolutions.
Now I want to write a script to delete all lower resolution images. But some files, like b.png, have no high resolution version, so I want to delete a file only if a high resolution image exists too.
I guess I have to do something like this, but can't figure out how exactly:
find . -type f -name "*" if {FILENAME%hi*} =2 --delete smallest else keep file
Could anyone help? Thanks
Something like the following could do the job:
#!/bin/bash
while IFS= read -r -d '' hi
do
    d=$(dirname "$hi")
    b=$(basename "$hi")
    low="${b//.hi./}"
    [[ -e "$d/$low" ]] && echo rm -- "$d/$low" # dry run - if satisfied, remove the echo
done < <(find /some/path -type f -name \*.hi.\* -print0)
How it works:
finds all files with .hi. in their names (not only images; you can extend the find to be more restrictive)
then, for each found image:
gets the directory it lives in
and the file name without the directory
removes all occurrences of the string .hi. from the name (i.e. builds the "lowres" name)
checks whether the lowres image exists
and deletes it if it does.
You can use bash extended glob features for this, which you can enable first by
shopt -s extglob
and using the pattern
!(pattern-list)
Matches anything except one of the given patterns.
Now to store the files not containing the string hi
shopt -s extglob
fileList=()
fileList+=( !(*hi*).jpg )
fileList+=( !(*hi*).gif )
fileList+=( !(*hi*).png )
You can print the array once to see whether it lists all the files you need:
printf "%s\n" "${fileList[@]}"
and to delete those files do
for eachfile in "${fileList[@]}"; do
rm -v -- "$eachfile"
done
(or), as Benjamin.W suggested in the comments, do
rm -v -- "${fileList[@]}"
Be aware that these globs collect every file without hi in its name, so this would also delete low resolution images that have no high resolution counterpart; filter the list first if that matters.
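Also, since the question mentions subfolders: the globs above only look in the current directory. With bash 4's globstar the same idea can recurse (a sketch):
shopt -s extglob globstar
fileList=( **/!(*hi*).jpg **/!(*hi*).gif **/!(*hi*).png )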
Now I want to write a script to delete all lower resolution images
This script could be used for that:
find /path/to/dir -type f \( -iname '*.hi.png' -o -iname '*.hi.gif' -o -iname '*.hi.jpg' \) | while read F; do LOWRES="$(echo "$F" | rev | cut -c7- | rev)$(echo "$F" | rev | cut -c 1-3 | rev)"; if [ -f "$LOWRES" ]; then echo rm -fv -- "$LOWRES"; fi; done
(The \( ... \) grouping is needed so -type f applies to all three patterns.)
You can run it to see what files will be removed first. If you're ok with results then remove echo before rm command.
Here is the non-one line version, but a script:
#!/bin/sh
find /path/to/dir -type f \( -iname '*.hi.png' -o -iname '*.hi.gif' -o -iname '*.hi.jpg' \) |
while read F; do
    NAME="$(echo "$F" | rev | cut -c7- | rev)"
    EXTENSION="$(echo "$F" | rev | cut -c 1-3 | rev)"
    LOWRES="$NAME$EXTENSION"
    if [ -f "$LOWRES" ]; then
        echo rm -fv -- "$LOWRES"
    fi
done
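The rev | cut round trips can also be done with plain parameter expansion (a sketch; it assumes names of the form name.hi.ext as in the question):
find /path/to/dir -type f \( -iname '*.hi.png' -o -iname '*.hi.gif' -o -iname '*.hi.jpg' \) |
while read F; do
    LOWRES="${F%.hi.*}.${F##*.}"   # strip the .hi infix: a.hi.png -> a.png
    if [ -f "$LOWRES" ]; then
        echo rm -fv -- "$LOWRES"
    fi
done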

How to find files containing exactly 16 lines?

I have to find files that contain exactly 16 lines in Bash.
My idea is:
find -type f | grep '/^...$/'
Does anyone know how to utilise find + grep or maybe find + awk?
Then:
Move the matching files to another directory.
Delete all non-matching files.
I would just do:
wc -l **/* 2>/dev/null | awk '$1=="16"'
(this assumes shopt -s globstar is set so that ** recurses into subdirectories)
Keep it simple:
find . -type f |
while IFS= read -r file
do
    size=$(wc -l < "$file")
    if (( size == 16 ))
    then
        mv -- "$file" /wherever/you/like
    else
        rm -f -- "$file"
    fi
done
If your file names can contain newlines, you need find's -print0 option and read's -d '' option to handle that; a sketch follows.
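A null-delimited version of the same loop (a minimal sketch along those lines):
find . -type f -print0 |
while IFS= read -r -d '' file
do
    size=$(wc -l < "$file")
    if (( size == 16 ))
    then
        mv -- "$file" /wherever/you/like
    else
        rm -f -- "$file"
    fi
done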
You should use grep instead of wc, because wc counts newline characters \n and will not count the last line if it doesn't end with a newline.
e.g.
grep -cH '' * 2>/dev/null | awk -F: '$2==16'
For a more correct approach (without error messages, and without the "argument list too long" error) you should combine it with the find and xargs commands, like:
find . -type f -print0 | xargs -0 grep -cH '' | awk -F: '$2==16'
If you don't want to count empty lines (i.e. count only lines that contain at least one character), you can replace the '' with '.'. And instead of awk, you can use a second grep, like:
find . -type f -print0 | xargs -0 grep -cH '.' | grep ':16$'
This will find all files that contain 16 non-empty lines, and so on.
sed can report a file's line count ($= prints the number of the last line), which you can then test:
[ "$(sed -n '$=' file)" = 16 ] && echo "file has 16 lines"
A pure bash version:
#!/usr/bin/bash
for f in *; do                 # Look for files in the present dir
    [ ! -f "$f" ] && continue  # Skip anything that is not a regular file
    cnt=0
    # Count the first 17 lines at most
    while ((cnt<17)) && read x; do ((++cnt)); done < "$f"
    if [ "$cnt" == 16 ]; then
        echo "Move '$f'"
    else
        echo "Delete '$f'"
    fi
done
This snippet will do the work:
find . -type f -readable -exec bash -c \
'if(( $(grep -m 17 -c "" "$0")==16 )); then echo "file $0 has 16 lines"; else echo "file $0 doesn'"'"'t have 16 lines"; fi' {} \;
Hence, if you need to delete the files that are not 16 lines long, and move those who are 16 lines long to folder /my/folder, this will do:
find . -type f -readable -exec bash -c \
'if(( $(grep -m 17 -c "" "$0")==16 )); then mv -nv "$0" /my/folder; else rm -v "$0"; fi' {} \;
Observe the quoting for "$0" so that it's safe regarding any file name with funny symbols in it (spaces, ...).
I'm using the -v option so that rm and mv are verbose (I like to know what's happening). The -n option to mv is no-clobber: a security to not overwrite an existing file; this option might not be available if you have an old system.
The good thing about this method: it's really safe regarding any filename containing funny symbols.
The bad thing(s): it forks a bash, a grep, and an mv or rm for each file found. This can be quite slow. It can be fixed using trickier stuff (while still remaining safe regarding funny symbols in filenames); see the sketch below. It will also break if a file can't be (re)moved.
Remark: I'm using the -readable option to find, so that it only considers files that are readable. If you have this option, use it; you'll have a more robust command!
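A sketch of that trickier batched variant (not the author's original): let find hand file names to a single bash in batches, so the per-file forks are only the grep and the mv/rm:
find . -type f -readable -exec bash -c '
    for f do                                        # iterate over the batch passed in by find
        if (( $(grep -m 17 -c "" "$f") == 16 )); then
            mv -nv -- "$f" /my/folder
        else
            rm -v -- "$f"
        fi
    done
' bash {} +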
I would go with
find . -type f | while IFS= read -r f ; do
    [[ $(wc -l < "${f}") -eq 16 ]] && mv "${f}" <any_directory> || rm -f "${f}"
done
or
find . -type f | while IFS= read -r f ; do
    [[ $(grep -c '' "${f}") -eq 16 ]] && mv "${f}" <any_directory> || rm -f "${f}"
done
Replace <any_directory> with the directory you actually want to move the files to.
BTW, the find command will descend into subdirectories; if you don't want that, change the find command to fit your needs (e.g. add -maxdepth 1).

Bash looping through files in Directory

I have a bash script, created by someone else, that I need to modify a little.
Since I'm new to Bash, I may need a little help with some common commands.
The script simply loops through a directory (recursively) for a specific file extension.
Here's the current script: (runme.sh)
#! /bin/bash
SRC=/docs/companies/
function report()
{
    echo "-----------------------"
    find $SRC -iname "*.aws" -type f -print
    echo -e "\033[1mSOURCE FILES=\033[0m" `find $SRC -iname "*.aws" -type f -print |wc -l`
    echo "-----------------------"
    exit 0
}
report
I simply type ./runme.sh and I can see a list of all files with the extension .aws.
My primary goal is to limit the search. (some directories have way too many files)
I would like to run the script, limiting it to just 20 files.
Do I need to place the entire script into a loop method?
That's easy -- as long as you want the first 20 files, just pipe the first find command through head -n 20. But I can't resist a little cleanup while I'm at it: as written, it runs find twice, once to print the filenames and once to count them; if there are a lot of files to search, this is a waste of time. Second, wrapping the actual content of the script in a function (report) doesn't make much sense, and having the function exit (rather than returning) makes even less. Finally, I like to protect filenames with double-quotes and hate backquotes (use $() instead). So I took the liberty of a bit of cleanup:
#! /bin/bash
SRC=/docs/companies/
files="$(find "$SRC" -iname "*.aws" -type f -print)"
if [ -n "$files" ]; then
    count="$(echo "$files" | wc -l)"
else
    # echo would print one line even if there are no files, so special-case the empty list
    count=0
fi
echo "-----------------------"
echo "$files" | head -n 20
echo -e "\033[1mSOURCE FILES=\033[0m $count"
echo "-----------------------"
Use head -n 20 (as proposed by Peter). An additional remark: the script is very inefficient, as it runs find twice. You should consider writing the listing to a temporary file when the command runs the first time, counting the lines of that file afterwards, and deleting the file.
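A sketch of that single-pass idea (the mktemp usage is an assumption; plain redirection is used rather than tee, because head exiting after 20 lines would kill a tee pipeline early):
tmp=$(mktemp)
find "$SRC" -iname "*.aws" -type f -print > "$tmp"   # single find run, full listing saved
echo "-----------------------"
head -n 20 "$tmp"
echo -e "\033[1mSOURCE FILES=\033[0m" $(wc -l < "$tmp")
echo "-----------------------"
rm -f "$tmp"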
I would personally prefer to do it like this (head and read are line-oriented, so the find must use -print, not -print0):
files=0
while IFS= read -r file ; do
    files=$((files + 1))
    echo "$file"
done < <(find "$SRC" -iname "*.aws" -type f -print | head -20)
echo "-----------------------"
echo -e "\033[1mSOURCE FILES=\033[0m" $files
echo "-----------------------"
If you just want the count, you could use find "$SRC" -iname "*.aws" -type f | head -20 | wc -l

Get the newest directory to a variable in Bash

I would like to find the newest sub directory in a directory and save the result to variable in bash.
Something like this:
ls -t /backups | head -1 > $BACKUPDIR
Can anyone help?
BACKUPDIR=$(ls -td /backups/*/ | head -1)
$(...) evaluates the statement in a subshell and returns the output.
There is a simple solution to this using only ls:
BACKUPDIR=$(ls -td /backups/*/ | head -1)
-t orders by modification time (newest first)
-d lists the matched directories themselves instead of their contents
*/ makes the glob match only directories
head -1 returns the first item
I didn't know about */ until I found Listing only directories using ls in bash: An examination.
This is a pure Bash solution:
topdir=/backups
BACKUPDIR=
# Handle subdirectories beginning with '.', and empty $topdir
shopt -s dotglob nullglob
for file in "$topdir"/* ; do
    [[ -L $file || ! -d $file ]] && continue
    [[ -z $BACKUPDIR || $file -nt $BACKUPDIR ]] && BACKUPDIR=$file
done
printf 'BACKUPDIR=%q\n' "$BACKUPDIR"
It skips symlinks, including symlinks to directories, which may or may not be the right thing to do. It skips other non-directories. It handles directories whose names contain any characters, including newlines and leading dots.
Well, I think this solution is the most efficient:
path="/my/dir/structure/*"
backupdir=$(find $path -type d -prune | tail -n 1)
Explanation why this is a little better:
We do not need sub-shells (aside from the one for getting the result into the bash variable).
We do not need a useless -exec ls -d at the end of the find command; it already prints the directory listing. Note that tail -n 1 returns the last entry in glob (alphabetical) order, so this picks the newest directory only when the names sort chronologically, as timestamped backup directories usually do.
We can easily alter this, e.g. to exclude certain patterns. For example, if you want the second newest directory, because backup files are first written to a tmp dir in the same path:
backupdir=$(find $path -type d -prune -not -name "*temp_dir" | tail -n 1)
The above solution doesn't take into account things like files being written and removed from the directory resulting in the upper directory being returned instead of the newest subdirectory.
The other issue is that this solution assumes that the directory only contains other directories and not files being written.
Let's say I create a file called "test.txt" and then run this command again:
echo "test" > test.txt
ls -t /backups | head -1
test.txt
The result is test.txt showing up instead of the last modified directory.
The proposed solution "works" but only in the best case scenario.
Assuming you have a maximum of 1 directory depth, a better solution is to use:
find /backups/* -type d -prune -exec ls -d {} \; |tail -1
Just swap the "/backups/" portion for your actual path.
If you want to avoid showing an absolute path in a bash script, you could always use something like this:
LOCALPATH=/backups
DIRECTORY=$(cd $LOCALPATH; find * -type d -prune -exec ls -d {} \; |tail -1)
With GNU find you can get a list of directories with modification timestamps, sort that list, and output the newest:
find . -mindepth 1 -maxdepth 1 -type d -printf "%T#\t%p\0" | sort -z -n | cut -z -f2- | tail -z -n1
or newline separated
find . -mindepth 1 -maxdepth 1 -type d -printf "%T#\t%p\n" | sort -n | cut -f2- | tail -n1
With POSIX find (that does not have -printf) you may, if you have it, run stat to get file modification timestamp:
find . -mindepth 1 -maxdepth 1 -type d -exec stat -c '%Y %n' {} \; | sort -n | cut -d' ' -f2- | tail -n1
Without stat, a pure shell solution may be used by replacing the [[ bash extension with [, as in the pure Bash answer above; a sketch follows.
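For instance (a sketch relying on test's -nt operator, which is a widespread extension rather than strict POSIX):
#!/bin/sh
newest=
for d in ./*/; do                  # trailing slash: match directories only
    [ -d "$d" ] || continue        # skip if the glob matched nothing
    if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
        newest=$d
    fi
done
printf '%s\n' "$newest"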
Your "something like this" was almost a hit:
BACKUPDIR=$(ls -t ./backups | head -1)
Combining what you wrote with what I have learned solved my problem too. Thank you for raising this question.
Note: I ran the line above from Git Bash in a Windows environment, in a file called ./something.bash.
