How to find files containing exactly 16 lines? - bash

I have to find files containing exactly 16 lines in Bash.
My idea is:
find -type f | grep '/^...$/'
Does anyone know how to utilise find + grep or maybe find + awk?
Then:
Move the matching files to another directory.
Delete all non-matching files.

I would just do:
wc -l **/* 2>/dev/null | awk '$1=="16"'
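Note that ** only recurses when the globstar shell option is enabled (it is off by default, in which case ** behaves like *). A minimal sketch extending this one-liner to the move/delete part of the question, assuming file names without newlines or surrounding whitespace; /dest/dir is a placeholder:
shopt -s globstar                 # make ** match recursively
wc -l **/* 2>/dev/null |
while read -r count file; do
    [ -f "$file" ] || continue    # skip directories and wc's "total" summary line
    if [ "$count" -eq 16 ]; then
        mv -- "$file" /dest/dir   # /dest/dir is a placeholder target directory
    else
        rm -f -- "$file"
    fi
done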

Keep it simple:
find . -type f |
while IFS= read -r file
do
    size=$(wc -l < "$file")
    if (( size == 16 ))
    then
        mv -- "$file" /wherever/you/like
    else
        rm -f -- "$file"
    fi
done
If your file names can contain newlines, look up find's -print0 and read's -d '' options to handle that.
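For reference, a hedged sketch of that newline-safe variant (GNU find's -print0 plus bash's read -d ''):
find . -type f -print0 |
while IFS= read -r -d '' file
do
    size=$(wc -l < "$file")
    if (( size == 16 ))
    then
        mv -- "$file" /wherever/you/like
    else
        rm -f -- "$file"
    fi
done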

You should use grep instead of wc, because wc counts newline characters (\n) and will not count the last line if it doesn't end with a newline.
e.g.
grep -cH '' * 2>/dev/null | awk -F: '$2==16'
For a more correct approach (without error messages, and without the "argument list too long" error), you should combine it with the find and xargs commands, like:
find . -type f -print0 | xargs -0 grep -cH '' | awk -F: '$2==16'
If you don't want to count empty lines (i.e. only lines that contain at least one character), you can replace the '' with '.'. And instead of awk, you can use a second grep, like:
find . -type f -print0 | xargs -0 grep -cH '.' | grep ':16$'
This will find all files that contain 16 non-empty lines, and so on.
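If you also need the move/delete step from the question, a rough sketch building on the same pipeline (assuming file names contain no ':' or newlines, since grep's name:count output is split on ':'; /dest/dir is a placeholder, and the rm branch for non-matching files works the same way):
find . -type f -print0 | xargs -0 grep -cH '' |
awk -F: '$2 == 16 { print $1 }' |
while IFS= read -r file; do
    mv -- "$file" /dest/dir
done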

GNU sed
sed -n '$=' file
This prints the number of lines in file; compare the result with 16.
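A minimal usage sketch for acting on that count (${lines:-0} falls back to 0 for an empty file, where sed prints nothing):
lines=$(sed -n '$=' file)
[ "${lines:-0}" -eq 16 ] && echo "file has 16 lines"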

A pure bash version:
#!/usr/bin/bash
for f in *; do                  # Look at entries in the present dir
    [ ! -f "$f" ] && continue   # Skip anything that is not a regular file
    cnt=0
    # Count at most the first 17 lines; no need to read further
    while ((cnt<17)) && IFS= read -r x; do ((++cnt)); done <"$f"
    if ((cnt == 16)); then echo "Move '$f'"
    else echo "Delete '$f'"
    fi
done

This snippet will do the work:
find . -type f -readable -exec bash -c \
'if(( $(grep -m 17 -c "" "$0")==16 )); then echo "file $0 has 16 lines"; else echo "file $0 doesn'"'"'t have 16 lines"; fi' {} \;
Hence, if you need to delete the files that are not 16 lines long, and move those who are 16 lines long to folder /my/folder, this will do:
find . -type f -readable -exec bash -c \
'if(( $(grep -m 17 -c "" "$0")==16 )); then mv -nv "$0" /my/folder; else rm -v "$0"; fi' {} \;
Observe the quoting for "$0" so that it's safe regarding any file name with funny symbols in it (spaces, ...).
I'm using the -v option so that rm and mv are verbose (I like to know what's happening). The -n option to mv is no-clobber: a security to not overwrite an existing file; this option might not be available if you have an old system.
The good thing about this method: it's really safe regarding any filename containing funny symbols.
The bad thing(s): it forks a bash and a grep and an mv or rm for each file found. This can be quite slow. This can be fixed using trickier stuff (while still remaining safe regarding funny symbols in filenames); if you really need it, see the sketch after the remark below. It will also break if a file can't be (re)moved.
Remark. I'm using the -readable option to find, so that it only considers files that are readable. If you have this option, use it, you'll have a more robust command!
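For the record, a hedged sketch of that trickier, faster variant: -exec ... + hands the files to bash in batches, so only one bash is forked per batch instead of one per file (grep, mv and rm still run once per file):
find . -type f -readable -exec bash -c '
    for f; do
        if (( $(grep -m 17 -c "" "$f") == 16 )); then
            mv -nv -- "$f" /my/folder
        else
            rm -v -- "$f"
        fi
    done' bash {} +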

I would go with
find . -type f | while read -r f ; do
    [[ $(wc -l < "${f}") -eq 16 ]] && mv "${f}" <any_directory> || rm -f "${f}"
done
or
find . -type f | while read -r f ; do
    [[ $(grep -c '' "${f}") -eq 16 ]] && mv "${f}" <any_directory> || rm -f "${f}"
done
Replace <any_directory> with the directory you actually want to move the files to. The second form counts with grep -c '', which also counts a final line that lacks a trailing newline.
BTW, the find command will descend into subdirectories; if you don't want this, change the find command to fit your needs (e.g. add -maxdepth 1 on GNU find).

Related

how to find every file in my repo that has a specific word in the last line?

In other words, how to combine tail and find/grep command in bash.
I want to find all the files (including the files in subdirectories) in my repo that have a specific word in the last line, say FIX in the last line. I tried grep -Rl "FIX" to display all the files containing "FIX", but I don't know how to combine the tail command with it. Can anyone help?
Run tail on all the files at once and then grep the output for FIX. When given multiple file names, tail prints a ==> file <== header before each file's output, so grepping with one line of leading context shows the matching file names as well.
find -type f -exec tail -n1 {} + | grep -B1 FIX
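To illustrate with two hypothetical files a.txt and b.txt, the multi-file output looks like this, so the -B1 context line carries the file name:
$ tail -n1 a.txt b.txt
==> a.txt <==
last line of a.txt

==> b.txt <==
this line has a FIX in it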
Or use ** to find all files and subdirectories, then run tail on each of them one at a time:
shopt -s globstar
for file in **; do
[[ -f $file ]] && tail -n1 "$file" | grep -q FIX && echo "$file"
done
Or use find to find all matches and pipe it to a while read loop:
find -type f -print0 | while IFS= read -rd '' file; do
tail -n1 "$file" | grep -q FIX && echo "$file"
done
Or do the same thing but with -exec + and an explicit sub-shell:
find -type f -exec sh -c 'for file; do tail -n1 "$file" | grep -q FIX && echo "$file"; done' sh {} +
If you want to know if the last line matches a pattern, use sed and restrict the match to the last line with $. sed doesn't easily give a return value or do pretty printing of the filename like grep, but it gets the job done.
find . -type f -exec sh -c 'sed -n "\$ { /FIX/p; }" "$1" | grep -q .' sh {} \; -print
Here, we use -n to suppress printing, and then print (with /p) only when the last line matches the pattern /FIX/. The output is piped to grep to get a return value that find uses to decide whether or not to -print the name.
Or, you can avoid using grep for the return by doing something like:
find . -type f -exec awk 'END{ exit ! match($0, "FIX")}' {} \; -print

How to use bash string formatting to reverse date format?

I have a lot of files that are named as: MM-DD-YYYY.pdf. I want to rename them as YYYY-MM-DD.pdf I’m sure there is some bash magic to do this. What is it?
For files in the current directory:
for name in ./??-??-????.pdf; do
if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[3]}-${BASH_REMATCH[2]}.pdf"
fi
done
Recursively, in or under the current directory:
find . -type f -name '??-??-????.pdf' -exec bash -c '
for name do
if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[3]}-${BASH_REMATCH[2]}.pdf"
fi
done' bash {} +
Enabling the globstar shell option in bash lets us do the following (will also, like the above solution, handle all files in or below the current directory):
shopt -s globstar
for name in **/??-??-????.pdf; do
if [[ "$name" =~ (.*)/([0-9]{2})-([0-9]{2})-([0-9]{4})\.pdf ]]; then
echo mv "$name" "${BASH_REMATCH[1]}/${BASH_REMATCH[4]}-${BASH_REMATCH[3]}-${BASH_REMATCH[2]}.pdf"
fi
done
All three of these solutions use a regular expression to pick out the relevant parts of the filenames and then rearrange these parts into the new name. The only difference between them is how the list of pathnames is generated.
The code prefixes mv with echo for safety. To actually rename files, remove the echo (but run at least once with echo to see that it does what you want).
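For instance, with files named like the ones in the next example, the dry run prints lines such as:
mv ./10-01-2018.pdf ./2018-10-01.pdf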
A direct approach example from the command line:
$ ls
10-01-2018.pdf 11-01-2018.pdf 12-01-2018.pdf
$ ls [0-9]*-[0-9]*-[0-9]*.pdf|sed -r 'p;s/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\1-\2/'|xargs -n2 mv
$ ls
2018-10-01.pdf 2018-11-01.pdf 2018-12-01.pdf
The ls output is piped to sed; we use the p command to print the input line without modification (in other words, the original name of the file), and s to perform and output the conversion.
The ls + sed result is a combined output that consists of a sequence of old_file_name and new_file_name pairs.
Finally we pipe the resulting feed through xargs to get the effective rename of the files.
From xargs man:
-n number Execute command using as many standard input arguments as possible, up to number arguments maximum.
You can use the following command, very close to the one from klashxx:
for f in *.pdf; do echo "$f"; mv "$f" "$(echo "$f" | sed 's#\(..\)-\(..\)-\(....\)#\3-\2-\1#')"; done
before:
ls *.pdf
12-01-1998.pdf 12-03-2018.pdf
after:
ls *.pdf
1998-01-12.pdf 2018-03-12.pdf
Also, if you have other pdf files in your folder that do not respect this format, you can select only the files that respect the format MM-DD-YYYY.pdf. To do so, use the following command:
for f in `find . -maxdepth 1 -type f -regextype sed -regex './[0-9]\{2\}-[0-9]\{2\}-[0-9]\{4\}.pdf' | xargs -n1 basename`; do echo "$f"; mv "$f" "$(echo "$f" | sed 's#\(..\)-\(..\)-\(....\)#\3-\2-\1#')"; done
Explanations:
find . -maxdepth 1 -type f -regextype sed -regex './[0-9]\{2\}-[0-9]\{2\}-[0-9]\{4\}.pdf': this find command looks only for files in the current working directory that respect your syntax, and xargs -n1 basename extracts their basename (removing the ./ at the beginning). Folders and other types of files that would have the same name are not taken into account, and other *.pdf files are also ignored.
For each file you do a move, and the resulting file name is computed using sed and back references to the three groups for MM, DD and YYYY.
For these simple filenames, using a more verbose pattern, you can simplify the body of the loop a bit:
twodigit=[[:digit:]][[:digit:]]
fourdigit="$twodigit$twodigit"
for f in $twodigit-$twodigit-$fourdigit.pdf; do
IFS=- read month day year <<< "${f%.pdf}"
mv "$f" "$year-$month-$day.pdf"
done
This is basically @Kusalananda's answer, but without the verbosity of regular-expression matching.

find files and delete by filename parameter

I have a folder with lots of images. In this folder are subfolders containing high resolution images. Images can be .png, .jpg or .gif.
Some images are duplicates, called a.jpg and a.hi.jpg, or a.b.c.gif and a.b.c.hi.gif. File names are always different; there will never be an a.gif, a.jpg and a.png at the same time, so I guess I don't have to take care of the extension.
These are the same images with different resolution.
Now I want to write a script to delete all lower resolution images. But there are files that do not have a high resolution version, like b.png. So I want to delete a file only if a high resolution image exists too.
I guess I have to do something like this, but can't figure out how exactly.
find . -type f -name "*" if {FILENAME%hi*} =2 --delete smallest else keep file
Could anyone help? Thanks
Something like the following could do the job:
#!/bin/bash
while IFS= read -r -d '' hi
do
d=$(dirname "$hi")
b=$(basename "$hi")
low="${b//.hi./}"
[[ -e "$d/$low" ]] && echo rm -- "$d/$low" #dry run - if satisfied, remove the echo
done < <(find /some/path -type f -name \*.hi.\* -print0)
How it works:
find all files with .hi. in their names (not only images; you can extend the find to be more restrictive)
for each found image:
get the directory it is in
and get the name of the file (without the directory)
in the name, remove all occurrences of the string .hi. (i.e. build the "lowres" name)
check for the existence of the lowres image
delete it if it exists.
You can use bash extended glob features for this, which you can enable first by
shopt -s extglob
and using the pattern
!(pattern-list)
Matches anything except one of the given patterns.
Now, to store the files not containing the string hi:
shopt -s extglob
fileList=()
fileList+=( !(*hi*).jpg )
fileList+=( !(*hi*).gif )
fileList+=( !(*hi*).png )
You can print the array once to see if it lists all the files you need:
printf "%s\n" "${fileList[@]}"
and to delete those files do
for eachfile in "${fileList[@]}"; do
rm -v -- "$eachfile"
done
(or) as Benjamin.W suggested in the comments below, do
rm -v -- "${fileList[@]}"
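One caveat worth knowing with this glob-based approach: if one of the patterns matches nothing, bash leaves the unexpanded pattern itself in the array. Enabling nullglob alongside extglob makes empty matches expand to nothing instead:
shopt -s extglob nullglob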
Now I want to write a script to delete all lower resolution images
This script could be used for that:
find /path/to/dir -type f \( -iname '*.hi.png' -or -iname '*.hi.gif' -or -iname '*.hi.jpg' \) | while read F; do LOWRES="$(echo "$F" | rev | cut -c7- | rev)$(echo "$F" | rev | cut -c 1-3 | rev)"; if [ -f "$LOWRES" ]; then echo rm -fv -- "$LOWRES"; fi; done
You can run it to see what files will be removed first. If you're ok with results then remove echo before rm command.
Here is the same thing as a script rather than a one-liner:
#!/bin/sh
find /path/to/dir -type f \( -iname '*.hi.png' -or -iname '*.hi.gif' -or -iname '*.hi.jpg' \) |
while read F; do
    NAME="$(echo "$F" | rev | cut -c7- | rev)"        # drop the last 6 characters, i.e. "hi.ext"
    EXTENSION="$(echo "$F" | rev | cut -c 1-3 | rev)" # keep the last 3 characters, the extension
    LOWRES="$NAME$EXTENSION"
    if [ -f "$LOWRES" ]; then
        echo rm -fv -- "$LOWRES"
    fi
done

Filenames are lost when I find multiple pdf files, xarg pdftotext, and grep pattern

I want to make a shell script for searching for a pattern in pdf files (to make a kind of corpus for myself!)
I stole the following snippet from here
How to search contents of multiple pdf files?
find /path/to/folder -name '*.pdf' | xargs -P 6 -I % pdftotext % - | grep -C1 --color "pattern"
and the output looks like this
--
--
small deviation of γ from the average value 0.33 triggers
a qualitative difference in the evolution pattern, even if the
Can I make this command print the filename?
It doesn't have to be a "one-liner".
Thank you.
Not much to it. Just split the command into a loop:
find /path/to/folder -name '*.pdf' | while read -r file
do
    pdftotext "$file" - | grep -C1 --color "pattern" && echo "$file"
done
EDIT: I just noticed the example included a parallel xargs command. This is not impossible to solve in a loop: you can write the pdftotext & grep command into a function and then use xargs.
EDIT2: only print out file when there is a match
it might look something like this:
#!/bin/bash
files=$(find /path/to/folder -name '*.pdf')
function PDFtoText
{
    file="$1"
    if [ "$#" -ne "1" ]
    then
        echo "Invalid number of input arguments"
        exit 1
    fi
    pdftotext "$file" - | grep -C1 --color "pattern" && echo "$file"
}
export -f PDFtoText
printf "%s\n" "$files" | xargs -P 6 -I '{}' bash -c 'PDFtoText "$@" || exit 255' arg0 {}
if [[ $? -ne 0 ]]
then
exit 1
fi
Why not use something like
find /path/to/folder/ -type f -name '*.pdf' -print0 | \
xargs -0 -I{} \
sh -c 'echo "===== file: $1"; pdftotext "$1" - | grep -C1 --color "pattern"' sh {}
It always prints the filename. Do you think it's an acceptable compromise? Otherwise the echo part can be moved after the grep with a && as suggested before.
I prefer to use -print0 in combination with -0 just to deal with filenames with spaces.
I'd remove the -P 6 option because the output of the 6 processes running in parallel could get mixed together.
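If you do want the parallelism without mixed output, GNU parallel (a separate tool, suggested here as an alternative) groups each job's output by default:
find /path/to/folder/ -type f -name '*.pdf' -print0 | \
parallel -0 'echo "===== file: {}"; pdftotext {} - | grep -C1 --color "pattern"'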

How do I limit the results of the command find in bash?

The following command:
find . -name "file2015-0*" -exec mv {} .. \;
It affects about 1500 results, moving them up one level one by one.
What if I want the results not to exceed, for example, 400? How could I do that?
You can do this:
find . -name "file2015-0*" | head -400 | xargs -I filename mv filename ..
If you want to simulate what it does use echo:
find . -name "file2015-0*" | head -400 | xargs -I filename echo mv filename ..
You can, for example, feed the find output into a while read loop and keep track with a counter:
counter=1
while IFS= read -r file
do
    [ "$counter" -gt 400 ] && break
    mv "$file" ..
    ((counter++))
done < <(find . -name "file2015-0*")
Note this can lead to problems if a file name contains newlines, which is quite unlikely. Also, note the mv command is now moving to the level above the current directory. If you want it to be relative to the path of each file's own directory, some bash conversion can make it; see the sketch below.
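As a hedged sketch of that last remark, a hypothetical variant that moves each file one level up from its own directory rather than relative to where the script runs:
counter=1
while IFS= read -r file
do
    [ "$counter" -gt 400 ] && break
    mv "$file" "$(dirname "$file")/.."   # one level above the file's own directory
    ((counter++))
done < <(find . -name "file2015-0*")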
