Rename every unicode file of a linux directory in ascii - bash

I am trying to rename all unicode files name in ASCII.
I wanted to do something like this :
for file in `ls | egrep -v ^[a-z0-9\._-]+$`; do mv "$file" $(echo "$file" | slugify); done
But it doesn't work yet.
first, regexp ^[a-z0-9\._-]+$ doesn't seem to be enough.
second, slugify also transform the extension of the file so I have to cut the extension first and after put it back.
Any idea of a way to do that ?

First thing first, don't parse the output of ls. That is, in general, a bad idea, especially if you're expecting files that have any sort of strange characters in their names.
Assuming slugify does what you want with filenames in general, try:
for file in * ; do
if [ -f "$file" ] ; then
ext=${file##*.}
name=${file%.*}
new_name=$(echo "$name"|slugify)
if [[ $name != $new_name ]] ; then
echo mv -v "$name.$ext" "$new_name.$ext"
fi
fi
done
Warning: this will fail if you have files without an extension (it'll double-up the filename). See this other answer by Doctor J if you need to handle that.

Related

Script which will move non-ASCII files

I need help. I should write a script,whih will move all non-ASCII files from one directory to another. I got this code,but i dont know why it is not working.
#!/bin/bash
for file in "/home/osboxes/Parkhom"/*
do
if [ -eq "$( echo "$(file $file)" | grep -nP '[\x80-\xFF]' )" ];
then
if test -e "$1"; then
mv $file $1
fi
fi
done
exit 0
It's not clear which one you are after, but:
• To test if the variable $file contains a non-ASCII character, you can do:
if [[ $file == *[^[:ascii:]]* ]]; then
• To test if the file $file contains a non-ASCII character, you can do:
if grep -qP '[^[:ascii:]]' "$file"; then
So for example your code would look like:
for file in "/some/path"/*; do
if grep -qP '[^[:ascii:]]' "$file"; then
test -d "$1" && mv "$file" "$1"
fi
done
The first problem is that your first if statement has an invalid test clause. The -eq operator of [ needs to take one argument before and one after; your before argument is gone or empty.
The second problem is that I think the echo is redundant.
The third problem is that the file command always has ASCII output but you're checking for binary output, which you'll never see.
Using file pretty smart for this application, although there are two ways you can go on this; file says a variety of things and what you're interested in are data and ASCII, but not all files that don't identify as data are ASCII and not all files that don't identify as ASCII are data. You might be better off going with the original idea of using grep, unless you need to support Unicode files. Your grep is a bit strange to me so I don't know what your environment is but I might try this:
#!/bin/bash
for file in "/home/osboxes/Parkhom"/*
do
if grep -qP '[\0x80-\0xFF]' $file; then
[ -e "$1" ] && mv $file $1
fi
done
The -q option means be quiet, only return a return code, don't show the matches. (It might be -s in your grep.) The return code is tested directly by the if statement (no need to use [ or test). The && in the next line is just a quick way of saying if the left-hand side is true, then execute the right-hand side. You could also form this as an if statement if you find that clearer. [ is a synonym for test. Personally if $1 is a directory and doesn't change, I'd check it once at the beginning of the script instead of on each file, it would be faster.
If you mean you want to know if something is not a plain text file then you can use the file command which returns information about the type of a file.
[[ ! $( file -b "$file" ) =~ (^| )text($| ) ]]
The -b simply tells it not to bother returning the filename.
The returned value will be something like:
ASCII text
HTML document text
POSIX shell script text executable
PNG image data, 21 x 34, 8-bit/color RGBA, non-interlaced
gzip compressed data, from Unix, last modified: Mon Oct 31 14:29:59 2016
The regular expression will check whether the returned file information includes the word "text" that is included for all plain text file types.
You can instead filter for specific file types like "ASCII text" if that is all you need.

Bash rename last underscore in string

I have got a directory with files in which some of then end with an underscore.
I would like to test each file to see if it ends with an underscore and then strip off the underscore.
I am currently running the following code:
for file in *;do
echo $file;
if [[ "${file:$length:1}" == "_" ]];then
mv $file $(echo $file | sed "s/.$//g");
fi
done
But it does not seem to be renaming the files with underscore. For example if i have a file called all_indoors_ I expect it to give me all_indoors.
You could use built-in string substitution:
for file in *_; do
mv "$file" "${file%_}"
done
Just use a regex to check the string:
for file in *
do
[[ $file =~ "_$" ]] && echo mv "$file" "${file%%_}"
done
Once you are sure it works as intended, remove the echo so that the mv command executes!
It may even be cleaner to use *_ so that the for will just loop over the files with a name ending with _, as hek2mgl suggests in comments.
for file in *_
do
echo mv "$file" "${file%%_}"
done
You can use which will be recursive:
while read f; do
mv "$f" "${f:0:-1}"; # Remove last character from $f
done < <(find . -type f -name '*_')
Although not a pure bash approach, you can use rename.ul (written by Larry Wall, the person behind perl). Rename is not part of the default linux environment, but is part of util-linux.
You use rename with:
rename perlexpr files
(some flags ommitted).
So you could use:
rename 's/_$//' *
if you want to remove all characters including and after the underscore.
As #hek2mgl points out, there are multiple rename commands (see here), so first test if you have picked the right one.

bash scripting and conditional statements

I am trying to run a simple bash script but I am struggling on how to incoperate a condition. any pointers. the loop says. I would like to incoperate a conditions such that when gdalinfo cannot open the image it copies that particular file to another location.
for file in `cat path.txt`; do gdalinfo $file;done
works fine in opening the images and also shows which ones cannot be opened.
the wrong code is
for file in `cat path.txt`; do gdalinfo $file && echo $file; else cp $file /data/temp
Again, and again and again - zilion th again...
Don't use contsructions like
for file in `cat path.txt`
or
for file in `find .....`
for file in `any command what produces filenames`
Because the code will BREAK immediatelly, when the filename or path contains space. Never use it for any command what produces filenames. Bad practice. Very Bad. It is incorrect, mistaken, erroneous, inaccurate, inexact, imprecise, faulty, WRONG.
The correct form is:
for file in some/* #if want/can use filenames directly from the filesystem
or
find . -print0 | while IFS= read -r -d '' file
or (if you sure than no filename contains a newline) can use
cat path.txt | while read -r file
but here the cat is useless, (really - command what only copies a file to STDOUT is useless). You should use instead
while read -r file
do
#whatever
done < path.txt
It is faster (doesn't fork a new process, as do in case of every pipe).
The above whiles will fill the corect filename into the variable file in cases when the filename contains a space too. The for will not. Period. Uff. Omg.
And use "$variable_with_filename" instead of pure $variable_with_filename for the same reason. If the filename contains a white-space any command will misunderstand it as two filenames. This probably not, what you want too..
So, enclose any shell variable what contain a filename with double quotes. (not only filename, but anything what can contain a space). "$variable" is correct.
If i understand right, you want copy files to /data/temp when the gdalinfo returns error.
while read -r file
do
gdalinfo "$file" || cp "$file" /data/temp
done < path.txt
Nice, short and safe (at least if your path.txt really contains one filename per line).
And maybe, you want use your script more times, therefore dont out the filename inside, but save the script in a form
while read -r file
do
gdalinfo "$file" || cp "$file" /data/temp
done
and use it like:
mygdalinfo < path.txt
more universal...
and maybe, you want only show the filenames for what gdalinfo returns error
while read -r file
do
gdalinfo "$file" || printf "$file\n"
done
and if you change the printf "$file\n" to printf "$file\0" you can use the script in a pipe safely, so:
while read -r file
do
gdalinfo "$file" || printf "$file\0"
done
and use it for example as:
mygdalinfo < path.txt | xargs -0 -J% mv % /tmp/somewhere
Howgh.
You can say:
for file in `cat path.txt`; do gdalinfo $file || cp $file /data/temp; done
This would copy the file to /data/temp if gdalinfo cannot open the image.
If you want to print the filename in addition to copying it in case of failure, say:
for file in `cat path.txt`; do gdalinfo $file || (echo $file && cp $file /data/temp); done

In shell, how do I delete numbered duplicate files?

I've got a directory with a few thousand files in it, named things like:
filename.ext
filename (1).ext
filename (2).ext
otherfile.ext
otherfile (1).ext
etc.
Most of the files with bracketed numbers are duplicates of the original, but in some cases they're not.
How can I keep my original files, delete the duplicates, but not lose the files that are different?
I know that I could rm *\).ext, but that obviously doesn't make sure that files match the original.
I'm using OS X, so I have a md5 program that functions sort of like md5sum in Linux, though it puts the hash at the end of the line instead of the beginning. I was thinking I could use an awk script to take the output of md5 *.ext | awk 'some script', find duplicates by md5, and delete them, but the command line is too long (bash: /sbin/md5: Argument list too long).
And I don't know what to write in the script. I was thinking of storing things in an array with this:
awk '{a[$NF]++} a[$NF]>1{sub(/).*/,""); sub(/.*(/,""); system("rm " $0);}'
But that always seems to delete my original.
What am I doing wrong? How do I do it right?
Thanks.
Your awk script deletes original files because when you sort your files, . (period) sorts after (space). SO the first file that's seen is numbered, not the original, and subsequent checks (including the one against the original) compare files to the first numbered one.
Not only does rm *\).txt fail to match the original, it loses files that may not have an original in the first place.
I wouldn't do this quite this way. Rather than checking every numbered file and verifying whether it matches an original, you can go through your list of originals, then delete the numbered files that match them.
Instead:
$ for file in *[^\)].txt; do echo "-- Found: $file"; rm -v $(basename "$file" .txt)\ \(*\).txt; done
You can expand this to check MD5's along the way. But it's more code, so I'll break it into multiple lines, in a script:
#!/bin/bash
shopt -s nullglob # Show nothing if a fileglob matches no files
for file in *[^\)].ext; do
md5=$(md5 -q "$file") # The -q option gives you only the message digest
echo "-- Found: $file ($md5)"
for duplicate in $(basename "$file" .ext)\ \(*\).ext; do
if [[ "$md5" = "$(md5 -q "$duplicate")" ]]; then
rm -v "$duplicate"
fi
done
done
As an alternative, you can probably get away with doing this a little more simply, with less CPU overhead than calculating MD5 digests. Unix and Linux have a shell tool called cmp, which is like diff without the output. So:
#!/bin/bash
shopt -s nullglob
for file in *[^\)].ext; do
for duplicate in $(basename "$file" .ext)\ \(*\).ext; do
  if cmp "$file" "$duplicate"; then
rm -v "$file"
fi
done
done
If you don't need to use AWK, you could maybe do something simpler in bash:
for file in *\([0-9]*\)*; do
[ -e "$(echo "$file" | sed -e 's/ ([0-9]\+)//')" ] && rm "$file"
done
Hope this helps a little =)

In a small script to monitor a folder for new files, the script seems to be finding the wrong files

I'm using this script to monitor the downloads folder for new .bin files being created. However, it doesn't seem to be working. If I remove the grep, I can make it copy any file created in the Downloads folder, but with the grep it's not working. I suspect the problem is how I'm trying to compare the two values, but I'm really not sure what to do.
#!/bin/sh
downloadDir="$HOME/Downloads/"
mbedDir="/media/mbed"
inotifywait -m --format %f -e create $downloadDir -q | \
while read line; do
if [ $(ls $downloadDir -a1 | grep '[^.].*bin' | head -1) == $line ]; then
cp "$downloadDir/$line" "$mbedDir/$line"
fi
done
The ls $downloadDir -a1 | grep '[^.].*bin' | head -1 is the wrong way to go about this. To see why, suppose you had files named a.txt and b.bin in the download directory, and then c.bin was added. inotifywait would print c.bin, ls would print a.txt\nb.bin\nc.bin (with actual newlines, not \n), grep would thin that to b.bin\nc.bin, head would remove all but the first line leaving b.bin, which would not match c.bin. You need to be checking $line to see if it ends in .bin, not scanning a directory listing. I'll give you three ways to do this:
First option, use grep to check $line, not the listing:
if echo "$line" | grep -q '[.]bin$'; then
Note that I'm using the -q option to supress grep's output, and instead simply letting the if command check its exit status (success if it found a match, failure if not). Also, the RE is anchored to the end of the line, and the period is in brackets so it'll only match an actual period (normally, . in a regular expression matches any single character). \.bin$ would also work here.
Second option, use the shell's ability to edit variable contents to see if $line ends in .bin:
if [ "${line%.bin}" != "$line" ]; then
the "${line%.bin}" part gives the value of $line with .bin trimmed from the end if it's there. If that's not the same as $line itself, then $line must've ended with .bin.
Third option, use bash's [[ ]] expression to do pattern matching directly:
if [[ "$line" == *.bin ]]; then
This is (IMHO) the simplest and clearest of the bunch, but it only works in bash (i.e. you must start the script with #!/bin/bash).
Other notes: to avoid some possible issues with whitespace and backslashes in filenames, use while IFS= read -r line; do and follow #shellter's recommendation about double-quotes religiously.
Also, I'm not very familiar with inotifywait, but AIUI its -e create option will notify you when the file is created, not when its contents are fully written out. Depending on the timing, you may wind up copying partially-written files.
Finally, you don't have any checking for duplicate filenames. What should happen if you download a file named foo.bin, it gets copied, you delete the original, then download a different file named foo.bin. As the script is now, it'll silently overwrite the first foo.bin. If this isn't what you want, you should add something like:
if [ ! -e "$mbedDir/$line" ]; then
cp "$downloadDir/$line" "$mbedDir/$line"
elif ! cmp -s "$downloadDir/$line" "$mbedDir/$line"; then
echo "Eeek, a duplicate filename!" >&2
# or possibly something more constructive than that...
fi

Resources