I'm looking for a way to automate renaming all images with a wrong filename extension. So far I at least found out how to get the list of all these files:
find /media/folder/ -name '*.jpg' -exec file {} \; | grep 'PNG\|GIF' > foobar.txt
find /media/folder/ -name '*.png' -exec file {} \; | grep 'JPEG\|GIF' >> foobar.txt
find /media/folder/ -name '*.gif' -exec file {} \; | grep 'JPEG\|PNG' >> foobar.txt
However, I would also like to automate the renaming. I tried things like
find /media/folder/ -name *.jpg -exec file {} \; | grep -l PNG | rename s/.jpg/.png/
but in this case grep -l or grep -lH don't list only filenames like I thought they would.
The -l and -H flags of grep are not useful in your example. They report the name of the file that grep itself is reading, and here grep reads standard input from the pipe, so the only name it can report is "(standard input)", not the image filenames. These flags are only useful if you specify files (or directories and the -r flag for recursion), for example:
grep -rl PNG path/to/dir1 file2 file3
Without -l, grep prints the complete lines that matched PNG, which in your example probably look something like this:
icon.jpg: PNG image data, 512 x 512, 8-bit/color RGBA, non-interlaced
To get only the filename, maybe you can cut off everything after the colon like this:
find /media/folder/ -name '*.jpg' -exec file {} \; | grep PNG | sed -e 's/:.*//' | rename 's/\.jpg$/.png/'
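The same idea can be sketched without parsing the human-readable output of file at all, by asking it for the MIME type instead. This is only a sketch under some assumptions: the function names ext_for_mime and fix_extensions are made up for illustration, file names are assumed to contain no newlines, and mv is assumed to support -n (GNU and BSD mv do).

```shell
#!/bin/sh
# Sketch: rename images whose extension disagrees with their detected type.
# Assumes file names contain no newlines; both function names are hypothetical.

# Map a MIME type reported by `file` to the extension we want to see.
ext_for_mime() {
    case "$1" in
        image/jpeg) echo jpg ;;
        image/png)  echo png ;;
        image/gif)  echo gif ;;
        *)          return 1 ;;   # not a type we rename
    esac
}

# Rename every .jpg/.png/.gif under directory "$1" that carries the wrong extension.
fix_extensions() {
    find "$1" -type f \( -name '*.jpg' -o -name '*.png' -o -name '*.gif' \) -print |
    while IFS= read -r path; do
        mime=$(file -b --mime-type -- "$path")
        want=$(ext_for_mime "$mime") || continue
        [ "${path##*.}" = "$want" ] && continue
        mv -n -- "$path" "${path%.*}.$want"   # -n: never overwrite an existing file
    done
}
```

It would be called as fix_extensions /media/folder/. Trying it on a copy of the data first is probably wise.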
I copied and re-sorted nearly 1TB of files on a Drobo using a find . -name \*.%ext% -print0 | xargs -I{} -0 cp -v {} %File%s command. I need to make sure all the files copied correctly. This is what I have so far:
#!/bin/sh
find . -type f -exec basename {} \; > Filelist.txt
sort -o Filelist.txt Filelist.txt
uniq -c Filelist.txt Duplist.txt
I need to find a way to get the checksum for each file, as well as to make sure all of them are duplicated. The source folder is in the same directory as the copies; it is arranged as follows:
_Stock
_Audio
_CG
_Images
_Source (the files in all other folders come from here)
_Videos
I'm working on OSX.
#!/bin/sh
find . \( ! -regex '.*/\..*' \) -type f -exec shasum {} \; -exec basename {} \; | cut -c -40 | sed 'N;s/\n/ /' > Filelist.txt
sort -o Filelist.txt Filelist.txt
uniq -c Filelist.txt Duplist.txt
sort -o Duplist.txt Duplist.txt
The regular expression excludes hidden files. The shasum and basename actions each write their own line of output, so we pipe to cut (which trims every line to 40 characters, reducing the shasum line to the bare hash) and then to sed (which joins each pair of lines into one), so that the sort and uniq commands can parse the result. The script is messy, but it got the job done quite nicely.
I have a directory that contains files and other directories, and one specific file that I know has duplicates somewhere in that directory tree.
How can I find these duplicates using Bash on macOS?
Basically, I'm looking for something like this (pseudo-code):
$ find-duplicates --of foo.txt --in ~/some/dir --recursive
I have seen that there are tools such as fdupes, but I'm neither interested in any duplicate files (only duplicates of a specific file) nor am I interested in duplicates anywhere on disk (only within the given directory or its subdirectories).
How do I do this?
For a solution compatible with macOS built-in shell utilities, try this instead:
find DIR -type f -print0 | xargs -0 md5 -r | grep "$(md5 -q FILE)"
where:
DIR is the directory you are interested in;
FILE is the file (path) you are searching for duplicates of.
If you only need the paths of the duplicated files, pipe the output through this as well (-f2- keeps paths that contain spaces intact):
cut -d' ' -f2-
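An alternative that avoids hashing every file is to let find pre-filter by size and then compare bytes with cmp. This is a sketch of that swapped-in technique, not the md5 approach above: find_duplicates_of is a made-up name, and it assumes a find that supports -samefile (GNU and macOS find both do).

```shell
#!/bin/sh
# Sketch: print every file under DIR that is byte-identical to FILE.
# Usage: find_duplicates_of FILE DIR   (function name is hypothetical)
find_duplicates_of() {
    target=$1
    dir=$2
    # Size in bytes; tr strips the padding some wc implementations add.
    size=$(wc -c < "$target" | tr -d ' ')
    # Only files of the same size can be duplicates; cmp -s confirms the content.
    # -samefile skips FILE itself (and hard links to it) if it lives under DIR.
    find "$dir" -type f -size "${size}c" ! -samefile "$target" \
        -exec cmp -s -- "$target" {} \; -print
}
```

For the example in the question this would be find_duplicates_of foo.txt ~/some/dir.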
If you're looking for a specific filename, you could do:
find ~/some/dir -name foo.txt
which would return a list of all files with the name foo.txt in the directory. If you want to check whether there are multiple files in the directory with the same name, you could do:
find ~/some/dir -exec basename {} \; | sort | uniq -d
This will give you a list of files with duplicate names (you can then use find again to figure out where those live).
---- EDIT -----
If you're looking for identical files (with the same md5 sum), you could also do:
find . -type f -exec md5sum {} \; | sort | uniq -d --check-chars=32
--- EDIT 2 ----
If your md5sum doesn't output the filename, you can use:
find . -type f -exec echo -n "{} " \; -exec md5sum {} \; | awk {'print $2 $1'} | sort | uniq -d --check-chars=32
--- EDIT 3 ----
If you're looking for files with a specific md5 sum:
sum=$(md5sum foo.txt | cut -f1 -d " ")
find ~/some/dir -type f -exec md5sum {} \; | grep "$sum"
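Where uniq lacks --check-chars, the grouping step can be done in awk instead. The sketch below assumes GNU md5sum output ("hash, two separator characters, path") and file names without newlines or backslashes; duplicate_groups is a name chosen here for illustration.

```shell
#!/bin/sh
# Sketch: print the paths of all files under "$1" that share their content
# with at least one other file, grouped by hash.
duplicate_groups() {
    find "$1" -type f -exec md5sum {} + | sort | awk '
        # md5sum prints a 32-character hash, two separator characters, then the path.
        { hash = substr($0, 1, 32); path = substr($0, 35) }
        hash == prev {
            if (!shown) print prevpath   # first member of the group, printed once
            print path
            shown = 1
            next
        }
        { prev = hash; prevpath = path; shown = 0 }
    '
}
```

Running duplicate_groups . prints nothing when every file is unique.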
ls -1 | xargs --verbose -I{} basename {} \
| sed 's/\.[^.]*$//' \
| xargs -I{} lame {}.* {}.wav
Using this code to convert all the wav files in the folder to mp3 throws an error:
xargs : lame : No such file or directory
Try this:
find . -type f -name "*.wav" -exec bash -c 'mv -- "$0" "${0%.wav}.mp3"' {} \;
find recursively searches, starting at the current directory, for regular files with a name like *.wav.
-exec renames each file from .wav to .mp3. Note that this only changes the extension; it does not re-encode the audio.
Why not a loop instead of multiple xargs?
LameApp="/YourAppPath/lame.exe"
for WavFile in *.wav
do
"${LameApp}" "${WavFile}" "${WavFile%.*}.mp3"
done
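The loop can be wrapped in a small function so that the encoder is easy to swap or dry-run. This is a sketch under assumptions: ENCODER, mp3_name and convert_all are names invented here, and the encoder is assumed to take the input file first and the output file second, as lame does.

```shell
#!/bin/sh
# Sketch: convert every .wav in a directory, one encoder call per file.
# ENCODER is a hypothetical override; it defaults to lame found on PATH.
ENCODER=${ENCODER:-lame}

# Derive the output name by swapping the last extension for .mp3.
mp3_name() {
    printf '%s\n' "${1%.*}.mp3"
}

# Usage: convert_all DIR
convert_all() {
    for wav in "$1"/*.wav; do
        [ -e "$wav" ] || continue          # the glob matched nothing
        "$ENCODER" "$wav" "$(mp3_name "$wav")"
    done
}
```

Setting ENCODER=echo before calling convert_all . prints the input and output names that would be used without converting anything.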
So after a lot of searching and trying to interpret others' questions and answers to my needs, I decided to ask for myself.
I'm trying to take a directory structure full of images and place all the images (regardless of extension) in a single folder. In addition to this, I want to be able to remove images matching certain filenames in the process. I have a find command working that outputs all the filepaths for me
find -type f -exec file -i -- {} + | grep -i image | sed 's/\:.*//'
but if I try to use that to copy files, I have trouble with the spaces in the filenames.
cp `find -type f -exec file -i -- {} + | grep -i image | sed 's/\:.*//'` out/
What am I doing wrong, and is there a better way to do this?
With the caveat that it won't work if files have newlines in their names:
find . -type f -exec file -i -- {} + |
awk -vFS=: -vOFS=: '$NF ~ /image/{NF--;printf "%s\0", $0}' |
xargs -0 cp -t out/
(Based on the answer by Jonathan Leffler and the subsequent comment discussion with him and @devnull.)
The find command works well if none of the file names contain any newlines. Within broad limits, the grep command works OK under the same circumstances. The sed command works fine as long as there are no colons in the file names. However, given that there are spaces in the names, the use of $(...) (command substitution, also indicated by back-ticks `...`) is a disaster. Unfortunately, xargs isn't readily a part of the solution; it splits on spaces by default. Because you have to run file and grep in the middle, you can't easily use the -print0 option to (GNU) find and the -0 option to (GNU) xargs.
In some respects, it is crude, but in many ways, it is easiest if you write an executable shell script that can be invoked by find:
#!/bin/bash
for file in "$@"
do
if file -i -- "$file" | grep -i -q "$file:.*image"
then cp "$file" out/
fi
done
This is a little painful in that it invokes file and grep separately for each name, but it is reliable. The file command is even safe if the file name contains a newline; the grep is probably not.
If that script is called 'copyimage.sh', then the find command becomes:
find . -type f -exec ./copyimage.sh {} +
And, given the way the grep command is written, the copyimage.sh file won't be copied, even though its name contains the magic word 'image'.
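The helper script and the find call can also be folded into one function by passing an inline sh -c body to -exec, and the colon parsing can be avoided by asking file for the bare MIME type. This is a sketch rather than the script above: copy_images is a made-up name, the destination is assumed to lie outside the source tree, and files with the same base name will overwrite each other.

```shell
#!/bin/sh
# Sketch: copy every file that `file` identifies as an image into one folder.
# Usage: copy_images SOURCE_DIR DEST_DIR   (DEST_DIR is assumed to lie outside SOURCE_DIR)
copy_images() {
    mkdir -p -- "$2" || return 1
    # find hands the names to the inline script as arguments, so spaces
    # (and even newlines) in file names never pass through a pipe.
    find "$1" -type f -exec sh -c '
        dest=$1; shift
        for f do
            case $(file -b --mime-type -- "$f") in
                image/*) cp -- "$f" "$dest"/ ;;   # same-named files overwrite each other
            esac
        done
    ' sh "$2" {} +
}
```

For example, copy_images ./photos /tmp/out would collect the images from ./photos and its subdirectories into /tmp/out.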
Pipe the results of your find command to
xargs -l --replace cp "{}" out/
Example of how this works for me on Ubuntu 10.04:
atomic@atomic-desktop:~/temp$ ls
img.png img space.png
atomic@atomic-desktop:~/temp$ mkdir out
atomic@atomic-desktop:~/temp$ find -type f -exec file -i \{\} \; | grep -i image | sed 's/\:.*//' | xargs -l --replace cp -v "{}" out/
`./img.png' -> `out/img.png'
`./img space.png' -> `out/img space.png'
atomic@atomic-desktop:~/temp$ ls out
img.png img space.png
atomic@atomic-desktop:~/temp$
find . -type f | xargs file | grep text | cut -d':' -f1 | xargs grep -l "TEXTSEARCH" {}
Is this a good solution for finding TEXTSEARCH recursively in text files only?
You can use the -r (recursive) and -I (ignore binary) options in grep:
$ grep -rI "TEXTSEARCH" .
-I Process a binary file as if it did not contain matching data; this is equivalent to the --binary-files=without-match option.
-r Read all files under each directory, recursively; this is equivalent to the -d recurse option.
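If only the names of the matching files are wanted, -l can be added to the same options. A minimal sketch, assuming a grep that supports -r, -I and --include (GNU and BSD grep do); search_text and its optional third argument are invented here for illustration.

```shell
#!/bin/sh
# Sketch: list text files under DIR that contain PATTERN.
# Usage: search_text DIR PATTERN [NAME_GLOB]   (function name is hypothetical)
search_text() {
    if [ -n "${3-}" ]; then
        # Restrict the search to file names matching the glob, e.g. '*.txt'.
        grep -rIl --include="$3" -- "$2" "$1"
    else
        grep -rIl -- "$2" "$1"
    fi
}
```

For example, search_text . TEXTSEARCH lists every non-binary file below the current directory that contains TEXTSEARCH.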
Another solution, less elegant than kev's, is to chain -exec commands in find together, without xargs and cut:
find . -type f -exec bash -c 'file -bi "$1" | grep -q text' _ {} \; -exec grep TEXTSEARCH {} \;
If you know the file extension you want to search, a very simple way to search all *.txt files from the current dir, recursively through all subdirs, case-insensitively, is:
grep -ri --include='*.txt' "sometext" *