Performance with bash loop when renaming files

Sometimes I need to rename a batch of files, for example to add a prefix or remove part of the name.
At first I wrote a Python script. It works well, but I wanted a shell version, so I wrote something like this:
# $1 - directory to list
# $2 - pattern to replace
# $3 - replacement
echo "usage: dir pattern replacement"
for fname in `ls $1`
do
newName=$(echo $fname | sed "s/^$2/$3/")
echo 'mv' "$1/$fname" "$1/$newName&&"
mv "$1/$fname" "$1/$newName"
done
It works, but very slowly, probably because each iteration has to create a process (here sed and mv), destroy it, and then create the same process again just to pass a different argument. Is that true? If so, how can I avoid it and get a faster version?
I thought about computing all the new names at once (a single sed run over all the files), but that still needs mv in the loop.
How do you do this? Thanks. If my question is hard to understand, please be patient; my English is not very good.
--- update ---
Sorry for the unclear description. My core question is: "If we have to run a command inside a loop, does that lower performance?" In for i in {1..100000}; do ls 1>/dev/null; done, creating and destroying processes takes most of the time. So what I want to know is: "Is there any way to reduce that cost?"
Thanks to kev and S.R.I for giving me a rename solution.

Every time you call an external binary (ls, sed, mv), bash has to fork itself to exec the command, and that is a big performance hit.
You can do everything you want in pure bash 4.x and only need to call mv:
pat_rename(){
if [[ ! -d "$1" ]]; then
echo "Error: '$1' is not a valid directory" >&2
return 1
fi
shopt -s globstar
cd "$1" || return
for file in **; do
# prints the commands only; call mv directly once the output looks right
echo "mv $file ${file//$2/$3}"
done
}
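The sed expression in the question anchors the pattern at the start of the name (s/^$2/$3/). As a minimal sketch of the same prefix replacement with parameter expansion only, so that mv is the single external command per file, something like the following could work. The function name prefix_rename, the sample file names, and the throwaway directory are assumptions made for illustration, and the prefix is treated as a literal string rather than a regular expression.

```shell
#!/bin/bash
# Hypothetical helper: replace a leading literal prefix in every file name in a directory.
prefix_rename() {
    local dir=$1 old=$2 new=$3 path name
    for path in "$dir"/*; do
        [[ -f $path ]] || continue            # skip directories and an unmatched glob
        name=${path##*/}                      # strip the directory part without calling basename
        [[ $name == "$old"* ]] || continue    # only names that start with the prefix
        mv -- "$path" "$dir/$new${name#"$old"}"
    done
}

# Demonstration in a throwaway directory
demo=$(mktemp -d)
touch "$demo/img_1.txt" "$demo/img_2.txt" "$demo/other.txt"
prefix_rename "$demo" img_ photo_
ls "$demo"
```

Here the only process created per file is mv; the name manipulation happens inside the shell.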

Simplest first. What's wrong with rename?
mkdir tstbin
for i in `seq 1 20`
do
touch tstbin/filename$i.txt
done
rename .txt .html tstbin/*.txt
Or are you using an older *nix machine?

To avoid re-executing sed on each file, you could instead set up two name streams, one original and one transformed, then sip from the ends:
exec 3< <(ls)
exec 4< <(ls | sed 's/from/to/')
while IFS= read -r -u3 orig && IFS= read -r -u4 to; do
mv -- "${orig}" "${to}"
done
exec 3<&- 4<&-

I think you can store all the file names in a file or a string and run awk or sed over them once, instead of once per file.
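One way this idea might look in bash 4 is sketched below: the names are collected with a glob, a single sed process transforms all of them, and the loop then pairs old and new names by index. The array names origs and news, the from/to pattern, and the throwaway directory are illustrative assumptions; the sketch also assumes that no file name contains a newline.

```shell
#!/bin/bash
demo=$(mktemp -d)
touch "$demo/from_a.txt" "$demo/from_b.txt" "$demo/keep.txt"
cd "$demo" || exit 1

origs=( * )                                   # collect names with a glob, not ls
# One sed process handles every name; mapfile reads one new name per line
mapfile -t news < <(printf '%s\n' "${origs[@]}" | sed 's/^from/to/')

for i in "${!origs[@]}"; do
    # Skip names that sed left unchanged
    [[ ${origs[i]} == "${news[i]}" ]] && continue
    mv -- "${origs[i]}" "${news[i]}"
done
ls
```

mv is still started once per renamed file, but sed is started only once no matter how many files there are.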

Related

bash script rename multiple files [duplicate]

Let's say I have a bunch of files named something like bsdsa120226.nai and bdeqa140223.nai, and I want to rename them to 120226.nai and 140223.nai. How can I achieve this using the script below?
#!/bin/bash
name1=`ls *nai*`
names=`ls *nai*| grep -Po '(?<=.{5}).+'`
for i in $name1
do
for y in $names
do
mv $i $y
done
done
Solution:
name1=`ls *nai*`
for i in $name1
do
y=$(echo "$i" | grep -Po '(?<=.{5}).+')
mv $i $y
done

This:
#!/bin/bash
shopt -s extglob nullglob
for file in *+([[:digit:]]).nai; do
echo mv -nv -- "$file" "${file##+([^[:digit:]])}"
done
Remove the echo if you're happy with the mv commands.
Note. This solution does not assume that there are 5 leading characters to delete. It will delete all the leading non-numeric characters.
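To make the difference concrete, here is a small sketch that applies both a fixed five-character offset and the extglob pattern to two sample names. The second name, abc140223.nai, is made up for illustration and has only three leading letters.

```shell
#!/bin/bash
shopt -s extglob    # needed for the +( ) pattern
for file in bsdsa120226.nai abc140223.nai; do
    fixed=${file:5}                        # drop exactly five characters
    stripped=${file##+([^[:digit:]])}      # drop the longest run of leading non-digits
    printf '%s: fixed offset -> %s, strip non-digits -> %s\n' "$file" "$fixed" "$stripped"
done
```

For the three-letter prefix, the fixed offset also eats the first two digits, while the pattern-based expansion keeps the whole number.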

Using only bash, you could do this:
for file in *nai* ; do
echo mv -- "$file" "${file:5}"
done
(Remove the echo when satisfied with the output.)
Avoid ls in scripts, except for displaying information. Use plain globbing instead.
See also How do I do string manipulations in bash? for more string manipulation techniques.
Your script can't work with that structure: if you have 5 files, it will call mv five times for the first file (once for each element in the second list), five times for the second, etc. You'd need to iterate over the two sets of names in lockstep. (It also doesn't deal with things like whitespace in filenames.)

You would be better off using rename (prename on some systems) since that allows you to use Perl regular expressions to do the renaming, along the lines of:
prename 's/^.{5}//' *.nai
The reason your script is not behaving is that, for every source file, you're attempting to rename it to every target file.
If you need to limit yourself to using that script, you need to work out the single target file for each source file, something like:
#!/bin/bash
for i in *.nai; do
y=$(echo "$i" | cut -c6-)
mv "$i" "$y"
done

If your system has the rename tool, it's better to go with the simple rename command:
rename 's/^.{5}//' *.nai
It just removes the first 5 characters from the file name.
Or:
for i in *.nai; do mv -- "$i" "$(grep -oP '(?<=^.{5}).+' <<< "$i")"; done

In shell, how do I delete numbered duplicate files?

I've got a directory with a few thousand files in it, named things like:
filename.ext
filename (1).ext
filename (2).ext
otherfile.ext
otherfile (1).ext
etc.
Most of the files with bracketed numbers are duplicates of the original, but in some cases they're not.
How can I keep my original files, delete the duplicates, but not lose the files that are different?
I know that I could rm *\).ext, but that obviously doesn't make sure that files match the original.
I'm using OS X, so I have a md5 program that functions sort of like md5sum in Linux, though it puts the hash at the end of the line instead of the beginning. I was thinking I could use an awk script to take the output of md5 *.ext | awk 'some script', find duplicates by md5, and delete them, but the command line is too long (bash: /sbin/md5: Argument list too long).
And I don't know what to write in the script. I was thinking of storing things in an array with this:
awk '{a[$NF]++} a[$NF]>1{sub(/).*/,""); sub(/.*(/,""); system("rm " $0);}'
But that always seems to delete my original.
What am I doing wrong? How do I do it right?
Thanks.

Your awk script deletes original files because, when the files are sorted, a space sorts before a period. So the first file seen is a numbered one, not the original, and subsequent checks (including the one against the original) compare files to that first numbered one.
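The ordering can be checked with a short sketch. It forces the C locale, where comparison is by byte value; other locales may collate differently, so treat this as an illustration rather than a statement about every system.

```shell
#!/bin/bash
# In the C locale a space (0x20) sorts before a period (0x2E),
# so the numbered copy is listed ahead of the original.
sorted=$(printf '%s\n' 'filename.ext' 'filename (1).ext' | LC_ALL=C sort)
printf '%s\n' "$sorted"
first=${sorted%%$'\n'*}     # first line of the sorted output
```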
Not only does rm *\).ext fail to check against the original, it also deletes numbered files that may have no original in the first place.
I wouldn't do it quite this way. Rather than checking every numbered file and verifying whether it matches an original, you can go through your list of originals, then delete the numbered files that match them:
$ for file in *[^\)].ext; do echo "-- Found: $file"; rm -v -- "$(basename "$file" .ext)"\ \(*\).ext; done
You can expand this to check MD5s along the way. But it's more code, so I'll break it into multiple lines, in a script:
#!/bin/bash
shopt -s nullglob # Show nothing if a fileglob matches no files
for file in *[^\)].ext; do
md5=$(md5 -q "$file") # The -q option gives you only the message digest
echo "-- Found: $file ($md5)"
for duplicate in "$(basename "$file" .ext)"\ \(*\).ext; do
if [[ "$md5" = "$(md5 -q "$duplicate")" ]]; then
rm -v "$duplicate"
fi
done
done
As an alternative, you can probably do this a little more simply, with less CPU overhead than calculating MD5 digests. Unix and Linux have a tool called cmp, which with the -s option is like diff without the output. So:
#!/bin/bash
shopt -s nullglob
for file in *[^\)].ext; do
for duplicate in "$(basename "$file" .ext)"\ \(*\).ext; do
if cmp -s "$file" "$duplicate"; then
rm -v "$duplicate" # remove the numbered copy, never the original
fi
done
done

If you don't need to use awk, you could do something simpler in bash:
for file in *\([0-9]*\)*; do
[ -e "$(echo "$file" | sed -e 's/ ([0-9][0-9]*)//')" ] && rm "$file"
done
Note that this only checks that the original exists, not that the contents match.
Hope this helps a little =)

Renaming multiples files with a bash loop

I need to rename 45 files, and I don't want to do it one by one. These are the file names:
chr10.fasta chr13_random.fasta chr17.fasta chr1.fasta chr22_random.fasta chr4_random.fasta chr7_random.fasta chrX.fasta
chr10_random.fasta chr14.fasta chr17_random.fasta chr1_random.fasta chr2.fasta chr5.fasta chr8.fasta chrX_random.fasta
chr11.fasta chr15.fasta chr18.fasta chr20.fasta chr2_random.fasta chr5_random.fasta chr8_random.fasta chrY.fasta
chr11_random.fasta chr15_random.fasta chr18_random.fasta chr21.fasta chr3.fasta chr6.fasta chr9.fasta
chr12.fasta chr16.fasta chr19.fasta chr21_random.fasta chr3_random.fasta chr6_random.fasta chr9_random.fasta
chr13.fasta chr16_random.fasta chr19_random.fasta chr22.fasta chr4.fasta chr7.fasta chrM.fasta
I need to change the extension ".fasta" to ".fa". I'm trying to write a bash script to do it:
for i in $(ls chr*)
do
NEWNAME = `echo $i | sed 's/sta//g'`
mv $i $NEWNAME
done
But it doesn't work. Can you tell me why, or give another quick solution?
Thanks!

Several mistakes here:
NEWNAME = should be written without spaces around the =. As written, bash looks for a command named NEWNAME, and that fails.
You parse the output of ls. This breaks on file names that contain spaces. Bash can build the list of files itself with the glob operator *.
You don't quote "$i" and "$NEWNAME". If either of them contains a space, mv receives two arguments instead of one.
If a file name begins with a dash, mv will treat it as an option. Use -- to stop option processing.
Try:
for i in chr*
do
mv -- "$i" "${i/%.fasta/.fa}"
done
or
for i in chr*
do
NEWNAME="${i/%.fasta/.fa}"
mv -- "$i" "$NEWNAME"
done
The "${var/%pat/replacement}" expansion looks for pat only at the end of the variable's value and replaces it with replacement.
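As a quick illustration of why the anchored form is safer than the unanchored substitution from the question (sed 's/sta//g'), the sketch below applies three variants of the expansion. The name sta_chr1.fasta is invented for the example.

```shell
#!/bin/bash
i=chr1_random.fasta
end_only=${i/%.fasta/.fa}     # chr1_random.fa: matched only at the end

j=sta_chr1.fasta
first_match=${j/sta/}         # _chr1.fasta: first occurrence removed, wrong place
all_matches=${j//sta/}        # _chr1.fa: every occurrence removed
anchored=${j/%.fasta/.fa}     # sta_chr1.fa: only the extension changes

printf '%s\n' "$end_only" "$first_match" "$all_matches" "$anchored"
```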

for f in chr*.fasta; do mv "$f" "${f/%.fasta/.fa}"; done

If you have the rename command, you can do:
rename .fasta .fa chr*.fasta

In a small script to monitor a folder for new files, the script seems to be finding the wrong files

I'm using this script to monitor the downloads folder for new .bin files being created. However, it doesn't seem to be working. If I remove the grep, I can make it copy any file created in the Downloads folder, but with the grep it's not working. I suspect the problem is how I'm trying to compare the two values, but I'm really not sure what to do.
#!/bin/sh
downloadDir="$HOME/Downloads/"
mbedDir="/media/mbed"
inotifywait -m --format %f -e create $downloadDir -q | \
while read line; do
if [ $(ls $downloadDir -a1 | grep '[^.].*bin' | head -1) == $line ]; then
cp "$downloadDir/$line" "$mbedDir/$line"
fi
done

The ls $downloadDir -a1 | grep '[^.].*bin' | head -1 is the wrong way to go about this. To see why, suppose you had files named a.txt and b.bin in the download directory, and then c.bin was added. inotifywait would print c.bin, ls would print a.txt\nb.bin\nc.bin (with actual newlines, not \n), grep would thin that to b.bin\nc.bin, head would remove all but the first line leaving b.bin, which would not match c.bin. You need to be checking $line to see if it ends in .bin, not scanning a directory listing. I'll give you three ways to do this:
First option, use grep to check $line, not the listing:
if echo "$line" | grep -q '[.]bin$'; then
Note that I'm using the -q option to suppress grep's output, and instead simply letting the if command check its exit status (success if it found a match, failure if not). Also, the RE is anchored to the end of the line, and the period is in brackets so it'll only match an actual period (normally, . in a regular expression matches any single character). \.bin$ would also work here.
Second option, use the shell's ability to edit variable contents to see if $line ends in .bin:
if [ "${line%.bin}" != "$line" ]; then
The "${line%.bin}" part gives the value of $line with .bin trimmed from the end if it's there. If that's not the same as $line itself, then $line must've ended with .bin.
Third option, use bash's [[ ]] expression to do pattern matching directly:
if [[ "$line" == *.bin ]]; then
This is (IMHO) the simplest and clearest of the bunch, but it only works in bash (i.e. you must start the script with #!/bin/bash).
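If the script has to stay on #!/bin/sh, a case statement offers the same kind of glob matching without [[ ]]. This is a further variant rather than one of the three options above, and the helper name ends_with_bin and the sample names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the argument ends in ".bin".
ends_with_bin() {
    case $1 in
        *.bin) return 0 ;;
        *)     return 1 ;;
    esac
}

for name in firmware.bin notes.txt bin; do
    if ends_with_bin "$name"; then
        echo "$name: copy"
    else
        echo "$name: skip"
    fi
done
```

In the monitoring loop, the test would then read if ends_with_bin "$line"; then, with no external process involved.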
Other notes: to avoid some possible issues with whitespace and backslashes in filenames, use while IFS= read -r line; do and follow @shellter's recommendation about double-quotes religiously.
Also, I'm not very familiar with inotifywait, but AIUI its -e create option will notify you when the file is created, not when its contents are fully written out. Depending on the timing, you may wind up copying partially-written files.
Finally, you don't have any checking for duplicate filenames. What should happen if you download a file named foo.bin, it gets copied, you delete the original, then download a different file named foo.bin? As the script is now, it'll silently overwrite the first foo.bin. If this isn't what you want, you should add something like:
if [ ! -e "$mbedDir/$line" ]; then
cp "$downloadDir/$line" "$mbedDir/$line"
elif ! cmp -s "$downloadDir/$line" "$mbedDir/$line"; then
echo "Eeek, a duplicate filename!" >&2
# or possibly something more constructive than that...
fi

Renaming part of a filename [duplicate]

I have loads of files which look like this:
DET01-ABC-5_50-001.dat
...
DET01-ABC-5_50-0025.dat
and I want them to look like this:
DET01-XYZ-5_50-001.dat
...
DET01-XYZ-5_50-0025.dat
How can I do this?

There are a couple of variants of the rename command. In your case, it may be as simple as:
rename ABC XYZ *.dat
You may instead have a version which takes a Perl regex:
rename 's/ABC/XYZ/' *.dat

for file in *.dat ; do mv -- "$file" "${file//ABC/XYZ}" ; done
No rename or sed needed. Just bash parameter expansion.

Something like this will do it. The for loop may need to be modified depending on which filenames you wish to capture.
for fspec1 in DET01-ABC-5_50-*.dat ; do
fspec2=$(echo "${fspec1}" | sed 's/-ABC-/-XYZ-/')
mv -- "${fspec1}" "${fspec2}"
done
You should always test these scripts on copies of your data, by the way, and in totally different directories.

You'll need to learn how to use sed: http://unixhelp.ed.ac.uk/CGI/man-cgi?sed
And also how to use for, so you can loop through your file entries: http://www.cyberciti.biz/faq/bash-for-loop/
Your command will look something like this (I don't have a terminal at hand, so I can't check it):
for i in *; do mv -- "$i" "$(echo "$i" | sed 's/orig/new/g')"; done

I like to do this with sed. In your case:
for x in DET01-*.dat; do
echo $x | sed -r 's/DET01-ABC-(.+)\.dat/mv -v "\0" "DET01-XYZ-\1.dat"/'
done | sh -e
It is best to omit the "sh -e" part first to see what will be executed.

All of these answers are simple and good. However, I always like to add an interactive mode to these scripts so that I can find false positives.
if [[ -n $inInteractiveMode ]]
then
echo -e -n "$oldFileName => $newFileName\nDo you want to do this change? [Y/n]: "
read run
[[ -z $run || "$run" == "y" || "$run" == "Y" ]] && mv "$oldFileName" "$newFileName"
fi
Or make interactive mode the default and add a force flag (-f | --force) for automated scripts or if you're feeling daring. And this doesn't slow you down too much: the default response is "yes, I do want to rename", so you can just hit the Enter key at each prompt (because of the -z $run test).
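A self-contained sketch of such a prompt, wrapped in a function, might look like the following. The function name confirm_mv and the demo files are hypothetical, and the answers are fed through here-strings only so that the example runs without a terminal.

```shell
#!/bin/bash
# Hypothetical helper: ask before renaming; an empty answer counts as "yes".
confirm_mv() {
    local old=$1 new=$2 answer
    printf '%s => %s\nDo you want to do this change? [Y/n]: ' "$old" "$new"
    read -r answer
    case $answer in
        ''|y|Y) mv -- "$old" "$new" ;;
        *)      echo "skipped" ;;
    esac
}

# Demonstration in a throwaway directory
demo=$(mktemp -d)
touch "$demo/a.dat" "$demo/b.dat"
confirm_mv "$demo/a.dat" "$demo/a.txt" <<< ""     # plain Enter: rename
confirm_mv "$demo/b.dat" "$demo/b.txt" <<< "n"    # "n": keep the old name
ls "$demo"
```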
