I have a bunch of files encoded in GB2312 that I want to convert to UTF-8, so I ran the command below:
find . | xargs iconv -f GB2312 -t UTF-8
It converts them successfully, but the output is printed to the console.
I want the converted text saved back into the original files. How do I do that?
You could always use a loop instead of xargs. I wouldn't recommend overwriting files in a one-shot command-line call. How about moving them aside first:
for file in `find .`; do
mv "$file" "$file.old" && iconv -f GB2312 -t UTF-8 < "$file.old" > "$file"
done
Just take care with this. If your file names contain spaces, this loop might not work correctly.
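If you do need to handle names containing spaces or other awkward characters, one sketch of a more robust variant uses find -exec with a small inline shell; the .old suffix and the GB2312/UTF-8 codesets are the same as above:
find . -type f ! -name '*.old' -exec sh -c '
  for file do
    # keep the original alongside, then convert into the old name
    mv "$file" "$file.old" && iconv -f GB2312 -t UTF-8 < "$file.old" > "$file"
  done
' sh {} +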
Could you check the message below and give me a solution or hint for an issue with a for-loop variable?
The files I have are in the following format:
/directory/file1.gz -> after gunzip, file1 is in ASCII format
/directory/file2.gz -> after gunzip, file2 is still in gzip format
To handle this, I have written the small script below:
for f in ./ETF_Directories_*/*;
do
if file -z "$f" | grep -i ascii
then
echo "file is in ascii format"
else
gunzip "$f"
# If the file is not in ASCII format (the second case), I want to
# rename the gunzipped file back to .gz so that it ends up singly
# gzipped under the .gz name. What "mv" command do I need here?
fi
done
gunzip x.gz removes the .gz extension, so once gunzip has run, $f still ends in .gz but no file with that name exists any more.
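To end up with a single .gz name again, as asked, one sketch (assuming every $f really does end in .gz) is to gunzip once and then rename the decompressed result back to the original name:
gunzip "$f" && mv "${f%.gz}" "$f"    # ${f%.gz} is the name gunzip just produced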
I have a number of files whose names contain characters such as $'\351' and $'\346'. I haven't figured out how to refer to these files in bash. How can I use mv and sed to change their names?
When I run ls, one of the files, for example, is shown as:
'根'$'\346''%8B'$'\240''.html'
Most users in this situation would want to use convmv to convert the encoding of such filenames.
However, since you don't really know or care what the original filename was supposed to be but just want a reversible transformation to make the names easier to deal with, you could rename all files to a hexdump of their bytes:
export LC_ALL=C
for f in *
do
mv -- "$f" "$(printf '%s' "$f" | od -t x1 -An | tr -cd 'a-f0-9')"
done
This will e.g. turn the file '根'$'\346''%8B'$'\240''.html' into e6a0b9e6253842a02e68746d6c
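Because the new name is just a hex dump of the original bytes, the rename can be undone later. A minimal sketch, assuming xxd is available and that the directory now contains only the hex-renamed files:
export LC_ALL=C
for f in *
do
    mv -- "$f" "$(printf '%s' "$f" | xxd -r -p)"    # xxd -r -p decodes the hex back into the original bytes
done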
I have a few files that I want to copy, with the new file names generated by adding a fixed string to each of them.
E.g:
ls -ltr | tail -3
games.txt
files.sh
system.pl
Output should be:
games_my.txt
files_my.sh
system_my.pl
I am able to append to the end of the file names, but not before the extension (e.g. .txt).
for i in `ls -ltr | tail -10`; do cp $i `echo $i\_my`;done
I am thinking that if I can save the extension of each file with a simple cut, as follows,
ext=cut -d'.' -f2
then I can append the same in the above for loop.
do cp $i `echo $i$ext\_my`;done
How do I achieve this?
You can use the following:
for file in *
do
name="${file%.*}"
extension="${file##*.}"
cp -- "$file" "${name}_my.${extension}"
done
Note that ${file%.*} expands to the file name without its extension, so from hello.txt you get hello; "${name}_my.${extension}" then turns hello.txt into hello_my.txt.
Regarding the extension, extension="${file##*.}" gets it. It is based on the question Extract filename and extension in bash.
If the shell variable expansion mechanisms provided by fedorqui's answer look too unreadable to you, you can also use the Unix tool basename with a second argument to strip off the suffix:
for file in *.txt
do
cp -i "$file" "$(basename "$file" .txt)_my.txt"
done
By the way, in such cases I always suggest using the -i option with cp to prevent any unwanted overwrites due to typos or similar mistakes.
It's also possible to use a direct replacement with shell methods:
cp -i "$file" "${file/.txt/_my.txt}"
The ways are numerous :)
I have a directory of images:
path/to/directory/
image01.jpg
image02.jpg
...
and would like to convert it into a single PDF file:
path/to/directory.pdf
This is what I managed to code so far:
#!/bin/bash
echo Directory $1
out=$(echo $1 | sed 's|/$|.pdf|')
echo Output $out
mkdir tmp
for i in $(ls $1)
do
# MAC hates sed with "I" (ignore case) - thanks SO for the perl solution!
# I want to match "jpg, JPG, Jpg, ..."
echo $1$i $(echo "tmp/$i" | perl -C -e 'use utf8;' -pe 's/jpg$/pdf/i')
convert $1$i $(echo "tmp/$i" | perl -C -e 'use utf8;' -pe 's/jpg$/pdf/i')
done
pdftk tmp/*.pdf cat output $out
rm -rf tmp
So the idea is to convert each image into a PDF file with ImageMagick and then use pdftk to merge them into a single file. Thanks to the naming of the files, I don't have to worry about the ordering.
Since I'm a newbie at this, I'm sure there are many refinements one could make:
only iterate over image-files in the directory (in case there is some Readme.txt,...)
including the extensions png, jpeg, ...
using the trailing "/" is not elegant, I admit
etc.
Currently my main problem, however, is that some of my directories and image files contain spaces in their names. The for loop then iterates over substrings of the filenames, and I imagine the line with convert will fail as well.
I have tried a few things but haven't succeeded so far, and I hope someone here can help me.
If anyone has ideas for the other issues I listed above, I would be very glad to hear them too.
convert can do this in one go:
convert *.[jJ][pP][gG] output.pdf
Or to answer several of your other questions and replace your script:
#!/bin/bash
shopt -s nullglob nocaseglob
convert "$1"/*.{png,jpg,jpeg} "${1%/}.pdf"
This will pick up all files with the listed extensions in the directory given as the first argument, regardless of capitalization, and write them to yourdir.pdf. It will not break on spaces.
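Assuming the script is saved as, say, dir2pdf.sh (the name is only for illustration) and made executable, you would call it with the directory as its single argument:
chmod +x dir2pdf.sh                 # dir2pdf.sh is a placeholder name for the script above
./dir2pdf.sh "path/to/directory"    # produces path/to/directory.pdf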
I am trying to convert a PHP file (client.php) from UTF-8 to ISO-8859-1, and the following command does nothing to the file:
iconv -f UTF-8 -t ISO-8859-1 client.php
When I run it, the original file contents are simply displayed.
In fact, when I check the file's encoding after running iconv with:
file -I client.php
The same old utf-8 is shown:
client.php: text/x-php; charset=utf-8
The iconv utility shall convert the encoding of characters in file from one codeset to another and write the results to standard output.
Here's a solution :
Write stdout to a temporary file, then rename the temporary file over the original:
iconv -f UTF-8 -t ISO_8859-1 client.php > client_temp.php && mv -f client_temp.php client.php
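If the moreutils package happens to be installed (an assumption), sponge can handle the temporary file for you, since it soaks up all of stdin before writing to its argument:
iconv -f UTF-8 -t ISO-8859-1 client.php | sponge client.php    # sponge requires moreutils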
ASCII, UTF-8 and ISO-8859 are 100% identical encodings for the lowest 128 characters. If your file only contains characters in that range (which is basically the set of characters you find on a common US English keyboard), there's no difference between these encodings.
My guess as to what is happening: a plain text file has no associated encoding metadata, so you cannot know the encoding of a plain text file just by looking at it. The file utility is simply giving its best guess, and since there is no difference, it prefers to tell you the file is UTF-8 encoded, which technically it may well be.
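One way to test whether that is the case for your file is to look for any bytes outside the ASCII range; a sketch using GNU grep (the -P option is a GNUism) would be:
LC_ALL=C grep -n -P '[^\x00-\x7F]' client.php    # GNU grep; prints nothing if the file is pure ASCII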
In addition to jackjr300's answer, the following one-liner does it for all PHP files in the current folder:
for filename in *.php; do iconv -f ISO_8859-1 -t UTF-8 "$filename" > "temp_$filename" && mv -f "./temp_$filename" "./$filename"; done