Bulk convert cp1252 to utf-8 in Windows

I've been trying to convert a large Java source tree from cp1252 to UTF-8 on Windows, using tips and tricks I've found online, specifically here. The problem is that I'm on Windows, I don't do VB, and Cygwin's iconv doesn't take the -o switch.
The line I first tried to use is:
find . -type f -print -exec iconv -f cp1252 -t utf-8 {} > {}.converted \; -exec mv {}.converted {} \;
This creates a file {}.converted in the working directory and the second -exec fails for obvious reasons.
Putting quotes around the iconv expression:
find . -type f -print -exec 'iconv -f cp1252 -t utf-8 {} > {}.converted' \; -exec mv {}.converted {} \;
results in the following error:
find: `iconv -f cp1252 -t utf-8 ./java/dv/framework/activity/model/ActivitiesMediaViewImpl.java > ./java/dv/framework/activity/model/ActivitiesMediaViewImpl.java.converted': No such file or directory
though executing the individual expressions by hand works perfectly.
I've experimented with random quoting, but nothing seems to work. What am I missing? Why won't it work?
Thanks in advance,
Lars

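# Note: the for loop still splits names on whitespace; quoting only protects the commands inside it
for f in `find . -type f`; do
    iconv -f cp1252 -t utf-8 "$f" > "$f.converted"
    mv "$f.converted" "$f"
done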

All right, once again answering my own question (this is starting to become a bad habit...)
Although there is nothing wrong with Neevek's solution, the perfectionist in me wants to get the find -exec expression right. Wrapping the iconv statement in sh -c '...' does the trick:
find . -type f -print -exec sh -c 'iconv -f cp1252 -t utf-8 {} > {}.converted' \; -exec mv {}.converted {} \;
Still, the underlying question of why there is a problem using i/o redirection in find -exec statements remains unresolved...

I haven't used Cygwin very much, but there's a "native" Windows version of iconv that I use all the time. Here's an excerpt from a batch file that I use to convert all the files in a sub-directory from HP-ROMAN8 encoding to UTF-8 encoding, putting the result in "./temp" under the originals:
@set dir=original
@set ICONV="C:\Program Files (x86)\iconv-1.9.2.win32\bin\iconv"
if EXIST .\%dir%\temp (
    erase .\%dir%\temp\*.* /Q
    @if ERRORLEVEL 1 (@echo Unable to erase all files from the "temp" sub-directory
        @goto THE_END
    )
) else (
    mkdir .\%dir%\temp
    @if ERRORLEVEL 1 (@echo Unable to create the "temp" sub-directory
        @goto THE_END
    )
)
for %%f IN (./%dir%/*.xml) do (
%ICONV% -f HP-ROMAN8 -t UTF-8 "./%dir%/%%f" > "./%dir%/temp/%%f"
if ERRORLEVEL 1 (goto ICONV_ERROR)
)

The error in the first try is that the redirection operator '>' is evaluated by the shell before find starts, so the output of the whole find command goes to a single file literally named {}.converted.
The error in the second try is that the text between the single quotes is interpreted as the name of a command that is to be executed by find, but that doesn't exist.
In your working solution the first command to be executed by find is a subshell, and the options are enclosed in single quotes, so they are not interpreted by the outer shell but by the subshell.
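To make the point above concrete, here is a minimal sketch that runs in a scratch directory so no real sources are touched. It differs from the command in the question in one respect, which is my own assumption about what is safer: the file name is handed to the inner shell as a positional parameter ("$1") instead of being spliced into the command string through {}. The directory and the sample file name are made up for the demonstration.

```shell
#!/bin/sh
# Scratch directory with one cp1252-encoded sample file (hypothetical name).
demo=$(mktemp -d)
printf 'caf\351\n' > "$demo/Sample File.java"   # octal 351 is "é" in cp1252

# The redirection sits inside the quoted script, so the inner sh performs it
# once per file. find substitutes {} and the inner shell sees it as "$1".
find "$demo" -type f -name '*.java' -exec sh -c '
    iconv -f cp1252 -t utf-8 "$1" > "$1.converted" &&
    mv "$1.converted" "$1"
' sh {} \;

# Dump the converted bytes; "é" is now the two-byte sequence c3 a9.
od -An -tx1 "$demo/Sample File.java"
```

The extra "sh" after the script only fills $0 of the inner shell; the file name that find substitutes for {} becomes $1.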

Related

Removing files with/without spaces in filename

Hello stackoverflow community,
I'm facing a problem with removing files that contain spaces in their names. I have this piece of code, which is responsible for deleting files found in a directory:
for f in $(find $REP -type f -name "$Filtre" -mtime +${DelAvtPurge})
do
rm -f $f
I know that single or double quotes work for deleting files with spaces; they work for me when I try them on the command line, but when I put them around $f in the script it doesn't work at all.
Could anybody help me find a solution?
GNU find has -delete for that:
find "$REP" -type f -name "$Filtre" -mtime +"$DelAvtPurge" -delete
With any other find implementation, you can use bulk-exec:
find "$REP" -type f -name "$Filtre" -mtime +"$DelAvtPurge" -exec rm -f {} +
For a dry run, drop -delete from the first command to see the list of files that would be deleted; for the second, insert echo before rm.
The other answer has shown how to do this properly. But fundamentally the issue in your command is the lack of quoting, due to the way the shell expands variables:
rm -f $f
needs to become
rm -f "$f"
In fact, always quoting your variables is safe and generally a good idea.
However, this will not fix your code. Now filenames with spaces will work, but filenames with other valid characters (to wit, newlines) won’t. Try it:
touch foo$'\n'bar
for f in $(find . -maxdepth 1 -name foo\*); do echo "rm -f $f"; done
Output:
rm -f ./foo
rm -f bar
Clearly that won’t do. In fact, you mustn’t parse the output of find, for this reason. The only way of making this safe, apart from the solution via find -exec is to use the -print0 option:
find "$REP" -type f -name "$Filtre" -mtime +"$DelAvtPurge" -print0 \
| while IFS= read -r -d '' f; do
rm -f "$f"
done
Using -print0 instead of (implicit) -print causes find to delimit hits by the null character instead of newline. Correspondingly, IFS= read -r -d '' reads a null-character delimited input string, which we do in a loop using while (the -r option prevents read from interpreting backslashes as escape sequences).
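A small, self-contained sketch of the null-delimited loop described above. It needs bash (read -d and process substitution are not POSIX sh), and it works on throwaway files in a scratch directory; the file names and the counter variable are just for illustration.

```shell
#!/bin/bash
# Scratch directory with three awkward names, one of them containing a newline.
demo=$(mktemp -d)
touch "$demo/plain" "$demo/with space" "$demo/foo"$'\n'"bar"

count=0
# Process substitution instead of a pipe keeps the loop in the current shell,
# so $count is still set after the loop ends.
while IFS= read -r -d '' f; do
    rm -f -- "$f"
    count=$((count + 1))
done < <(find "$demo" -type f -print0)

echo "removed $count files"
```

Each of the three names reaches rm as a single argument, including the one with the embedded newline. With a plain pipe the deletion would work the same way, but the counter would be lost because the loop would run in a subshell.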

Gzip CSS/JS Using OS X Terminal

I am trying to use OSX Terminal to create gzip versions (.gz) of all css & js files in a folder. I found the following command, but when I cd to a test folder & then enter the command, it doesn't output anything & I would expect it to create a gzip copy in the folder next to the original file:
find . -regex ".*\(css\|js\)$" -exec bash -c 'echo Compressing "{}" && gzip -c --best "{}" > "{}.gz"' \;
What am I doing wrong? Or does the command need to be modified?
It seems that the OS X version of find treats -regex patterns as basic regular expressions by default, so the \| alternation is not recognized. A simple workaround is to use the logical OR operator (-o) like this:
find . \( -name "*\.js" -o -name "*\.css" \) -exec bash -c 'echo Compressing "{}" && gzip -c --best "{}" > "{}.gz"' \;
It will search for both file extensions and run the bash command for each file found.
Update.
I actually found out that your approach will also work if you pass the -E option to find, which switches it to extended regular expressions. With -E, grouping and alternation are written without backslashes, and the pattern should be enclosed in quotes:
find -E . -regex ".*(css|js)$" -exec bash -c 'echo Compressing "{}" && gzip -c --best "{}" > "{}.gz"' \;
From find man page:
-E Interpret regular expressions followed by -regex and -iregex
primaries as extended (modern) regular expressions rather than basic
regular expressions (BRE's). The re_format(7) manual page fully
describes both formats.
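For what it's worth, here is a sketch of the -name variant that avoids regular expressions altogether and should behave the same with GNU and BSD find. It passes the file names to the inner shell as arguments rather than embedding "{}" in the script text; that is my own variation, not part of the answers above. The scratch directory and the sample files are invented for the demonstration.

```shell
#!/bin/sh
# Scratch directory with two files to compress and one to ignore.
demo=$(mktemp -d)
printf 'body { margin: 0 }\n' > "$demo/site style.css"
printf 'console.log(1);\n'    > "$demo/app.js"
printf 'ignored\n'            > "$demo/notes.txt"

# "{} +" hands the inner shell a batch of names, available there as "$@".
find "$demo" -type f \( -name '*.css' -o -name '*.js' \) -exec sh -c '
    for f in "$@"; do
        echo "Compressing $f"
        gzip -c --best "$f" > "$f.gz"   # -c keeps the original next to the .gz
    done
' sh {} +
```

Because the names travel as arguments, a file name containing quotes or dollar signs cannot change the meaning of the script that the inner shell runs.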

batch rename file extensions in subdirectories

I'm trying to create a script on Linux that will allow me to change the extensions of files in multiple subdirectories. After much searching and experimenting I've found what seems to be a solution:
find /volume1/uploads -name "*.mkv" -exec rename .mkv .avi {} +
When running the script I get the following error:
find: -exec CMD must end by ';'
I've tried adding ; and \; (with or without +) but to no avail. What's wrong with the command and how can I fix it?
Edit: Running on a Synology NAS with DSM 4.2
You have to escape the characters that the shell would otherwise interpret. In your case that is the semicolon; the curly braces can be escaped as well, which is harmless (bash passes a bare {} through unchanged):
find /volume1/uploads -name "*.mkv" -exec rename .mkv .avi \{\} \;
The {} (in our case \{\}) is expanded to the file name, so the actual call would look like rename .mkv .avi /volume1/uploads/foo/bla.mkv (which is not the exact syntax that /usr/bin/rename needs, at least on my system).
Instead it would be something like:
find /volume1/uploads -name "*.mkv" -exec rename 's/\.mkv$/.avi/' \{\} \;
UPDATE
If you don't want to (or cannot) use Perl's rename script, you could use the following simple shell script and save it as /tmp/rename.sh
#!/bin/sh
INFILE=$1
OUTFILE="${INFILE%.mkv}.avi"
echo "moving ${INFILE} to ${OUTFILE}"
mv "${INFILE}" "${OUTFILE}"
Make it executable (chmod u+x /tmp/rename.sh) and call:
find /volume1/uploads -name "*.mkv" -exec /tmp/rename.sh \{\} \;
UPDATE2
It turned out that this question is really not about bash but about BusyBox.
With a limited environment like BusyBox, the simplest solution is just to append the new file extension:
find /volume1/uploads -name "*.mkv" -exec mv \{\} \{\}.avi \;
Not sure how different find and rename commands are on your DSM 4.2 OS so try something like:
find /volume1/uploads -name "*.mkv" | while IFS= read -r filename; do
    mv -v "${filename}" "$(echo "${filename}" | sed -e 's/\.mkv$/.avi/')"
done
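Here is a sketch that combines the ideas from the answers above without needing a rename tool or a helper script on disk: the ${1%.mkv} suffix removal is plain POSIX sh, so it should also be available in a BusyBox shell, though I have not tried it on DSM 4.2. It uses the "\;" form of -exec because that is the form the error message in the question asks for. The scratch directory and the file names are invented for the demonstration.

```shell
#!/bin/sh
# Scratch tree with two .mkv files (one in a directory with a space) and one bystander.
demo=$(mktemp -d)
mkdir -p "$demo/season 1"
touch "$demo/a.mkv" "$demo/season 1/b c.mkv" "$demo/keep.txt"

# One inner shell per file; ${1%.mkv} strips the old suffix before .avi is appended.
find "$demo" -type f -name '*.mkv' -exec sh -c '
    mv "$1" "${1%.mkv}.avi"
' sh {} \;

# Show what the tree looks like afterwards.
find "$demo" -type f | sort
```

To try it on real data, replace "$demo" with the directory in question and put echo in front of mv for a dry run first.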

One more bash (now .bat) script

I need to convert about 12000 TIF files in many directories, and I tried to write a bash script:
#!/bin/bash
find -name "*.tif" | while read f
do
convert "$f" "${f%.*}.png"
rm -f "$f"
done
Why does it say: x.sh: 6: Syntax error: end of file unexpected (expecting "do"), and what should I do?
Many thanks to you all, but I was misled: the computer on which this has to run works under Windows. I don't know how to work with strings and loops in DOS; my script now looks like this:
FOR /R %i IN (*.tif) DO @ (set x=%i:tif%png) & (gm convert %i %xtif) & (erase /q /f %i)
%i - one of the .tif files.
%x - file name with the .png extension.
gm convert - the GraphicsMagick utility; it works similarly to ImageMagick's convert on Linux.
The syntax looks okay, but if it's a problem with EOLs, try adding a semicolon before the do to fix the syntax error (or check the newlines are actually present/encoded as ghostdog74 suggests):
find -name "*.tif" | while read f ; do # ...
Note that the find/read pattern isn't robust. You can use find's exec capability directly (thanks Philipp for the inline command):
find -name "*.tif" -exec sh -c 'file=$0 && convert "$file" "${file%.tif}.png"' '{}' ';' -delete

How do I check if all files inside directories are valid jpegs (Linux, sh script needed)?

OK, I have a directory (for instance, named '/photos') containing several subdirectories (like '/photos/wedding', '/photos/birthday', '/photos/graduation', etc.) that have .jpg files in them. Unfortunately, some of the JPEG files are broken. I need a way to determine which files are broken.
I found out that there is a tool named ImageMagick, which can help a lot. If you use it like this:
identify -format '%f' whatever.jpg
it prints the name of the file only if the file is valid; if it is not, it prints something like "identify: Not a JPEG file: starts with 0x69 0x75 `whatever.jpg' @ jpeg.c/EmitMessage/232.".
So the correct solution should find all files ending with ".jpg" and apply "identify" to them. If the result is just the name of the file, do nothing; if the result is different from the name of the file, save the name of the file somewhere (like in a file "errors.txt").
Any ideas how I could possibly do that?
The short-short version:
find . -iname "*.jpg" -exec jpeginfo -c {} \; | grep -E "WARNING|ERROR"
You might not need the same find options, but jpeginfo was the solution that worked for me:
find . -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) | xargs jpeginfo -c | grep -E "WARNING|ERROR" | cut -d " " -f 1
As a script (as requested in this question):
#!/bin/sh
find . -type f \
\( -iname "*.jpg" \
-o -iname "*.jpeg" \) \
-exec jpeginfo -c {} \; | \
grep -E "WARNING|ERROR" | \
cut -d " " -f 1
I was clued into jpeginfo for this by http://www.commandlinefu.com/commands/view/2352/find-corrupted-jpeg-image-files and this explained mixing find -o OR with -exec
One problem with identify -format is that it doesn't actually verify that the file is not corrupt, it just makes sure that it's really a jpeg.
To actually test it you need something to convert it. But the convert that comes with ImageMagick seems to silently ignore non-fatal errors in the jpeg (such as being truncated.)
One thing that works is this:
djpeg -fast -grayscale -onepass file.jpg > /dev/null
If it returns an error code, the file has a problem. If not, it's good.
There are other programs that could be used as well.
You can put this into bash script file or run directly:
find -name "*.jpg" -type f |xargs --no-run-if-empty identify -format '%f' 1>ok.txt 2>errors.txt
In case identify is missing, here is how to install it in Ubuntu:
sudo apt install imagemagick --no-install-recommends
This script will print out the names of the bad files:
#!/bin/bash
find /photos -name '*.jpg' | while IFS= read -r FILE; do
    # %f prints the file name without its directory, so compare against the base name
    if [[ $(identify -format '%f' "$FILE" 2>/dev/null) != "${FILE##*/}" ]]; then
        echo "$FILE"
    fi
done
You could run it as is or as ./badjpegs > errors.txt to save the output to a file.
To break it down, the find command finds *.jpg files in /photos or any of its subdirectories. These file names are piped to a while loop, which reads them in one at a time into the variable $FILE. Inside the loop, we grab the output of identify using the $(...) operator and check if it matches the file name. If not, the file is bad and we print the file name.
It may be possible to simplify this. Most UNIX commands indicate success or failure in their exit code. If the identify command does this, then you could simplify the script to:
#!/bin/bash
find /photos -name '*.jpg' | while IFS= read -r FILE; do
    if ! identify "$FILE" &> /dev/null; then
        echo "$FILE"
    fi
done
Here the condition is simplified to if ! identify; then which means, "did identify fail?"
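The exit-code idea can also be written without the find/read pipe. The sketch below is only an illustration: list_failures is a made-up helper name, and it takes the checking command as arguments, so the same loop could be tried with identify, jpeginfo -c or djpeg, provided the chosen tool really does signal a bad file through its exit status.

```shell
#!/bin/sh
# list_failures DIR CMD [ARGS...]   (hypothetical helper)
# Print every *.jpg / *.jpeg under DIR for which "CMD ARGS... FILE" exits non-zero.
list_failures() {
    dir=$1
    shift
    find "$dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) -exec sh -c '
        file=$1
        shift
        # Run the checker quietly; report the file only when the checker fails.
        "$@" "$file" >/dev/null 2>&1 || printf "%s\n" "$file"
    ' sh {} "$@" \;
}

# Intended use, assuming ImageMagick is installed:
#   list_failures /photos identify > errors.txt
```

Since find runs the checker itself through -exec, file names with spaces or newlines need no special handling here.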
