How to recursively copy files while removing part of the path - bash

I have hundreds of image files in a structure like this:
path/to/file/100/image1.jpg
path/to/file/9999/image765.jpg
path/to/file/333/picture2.jpg
I'd like to remove the 4th part of the path (100,9999,333, ...) so that I get this:
path/to/file/image1.jpg
path/to/file/image765.jpg
path/to/file/picture2.jpg
In this case the image file names have no duplicates, and the target directory could be named entirely differently if that makes things easier (e.g. the target could be "another/path/to/the/images/image1.jpg").
The solution might be some combination of find/cut/rename commands.
How can I do this in bash?

Since you only have "hundreds" of files, it's quite possible that you don't need to do anything special, and can just write:
mv path/to/file/*/*.jpg path/to/file/
But depending on the number of files and lengths of their names, this may turn out to be more than the kernel will let you pass to a single command, in which case you may need to write a for-loop instead:
for file in path/to/file/*/*.jpg ; do
mv "$file" path/to/file/
done
(Of course, this assumes you have mv on your path. There's no Bash builtin for renaming a file, so any approach will depend on what else is available on your system. If you don't have mv, you'll need to adjust the above accordingly.)
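Since the title asks about copying, and the question allows a different target directory, here is a minimal sketch that copies instead of moving. It assumes the images sit exactly one directory below the source (SRC/100/image1.jpg) and that names are unique, as the question states; copy_flatten is a hypothetical helper name, and cp -n is used so an unexpected duplicate is not overwritten.

```bash
#!/usr/bin/env bash
# copy_flatten SRC DEST (hypothetical helper): copy every *.jpg that sits
# exactly one level below SRC (e.g. SRC/100/image1.jpg) into DEST,
# dropping the intermediate directory name.
copy_flatten() {
  local src=$1 dest=$2 file
  mkdir -p -- "$dest"
  # -mindepth/-maxdepth 2 restrict matches to SRC/<dir>/<file>;
  # -print0 with read -d '' keeps names with spaces or newlines intact.
  while IFS= read -r -d '' file; do
    cp -n -- "$file" "$dest/"   # -n: never overwrite an existing file
  done < <(find "$src" -mindepth 2 -maxdepth 2 -type f -name '*.jpg' -print0)
}

# Example, using the paths from the question:
# copy_flatten path/to/file another/path/to/the/images
```

Swapping cp for mv turns this into the in-place move from the answer above, while still avoiding the argument-list limit because find feeds the names one at a time.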

I recommend using ruakh's solution if it will work, but if you need to explicitly test for those numeric directories, here's an alternative.
I'm just using echo to pipe the list of names in, and to show the mv at the end, but you could use find (example in a comment) and remove the echo on the mv to make it live.
IFS=/
echo "path/to/file/100/image1.jpg
path/to/file/9999/image765.jpg
path/to/file/333/picture2.jpg" |
# find path/to/file -name "*.jpg" |
while read -r orig
do this=""
read -r -a line <<< "$orig"
for sub in "${line[@]}"
do if [[ "$sub" =~ ^[0-9]+$ ]]
then continue
else this="$this$sub/"
fi
done
old="${line[*]}"
echo mv "$old" "${this%/}"
done
Output:
mv path/to/file/100/image1.jpg path/to/file/image1.jpg
mv path/to/file/9999/image765.jpg path/to/file/image765.jpg
mv path/to/file/333/picture2.jpg path/to/file/picture2.jpg
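If only the directory that directly contains each file needs testing, plain parameter expansion can compute the new path without changing IFS. This is a sketch under that assumption (unlike the loop above, it looks at the immediate parent only, not at every numeric component); new_path is a hypothetical helper name.

```bash
# new_path FILE (hypothetical helper): drop FILE's parent directory
# when that directory name is all digits.
new_path() {
  local f=$1
  local base=${f##*/}      # e.g. image1.jpg
  local dir=${f%/*}        # e.g. path/to/file/100
  local parent=${dir%/*}   # e.g. path/to/file
  local last=${dir##*/}    # e.g. 100
  if [[ $last =~ ^[0-9]+$ ]]; then
    printf '%s/%s\n' "$parent" "$base"
  else
    printf '%s\n' "$f"     # non-numeric parent: leave the path unchanged
  fi
}

# Dry run over find's output; remove the echo to actually move the files:
# find path/to/file -name '*.jpg' | while IFS= read -r f; do
#   n=$(new_path "$f"); [ "$n" != "$f" ] && echo mv -- "$f" "$n"
# done
```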

Related

automatically renaming files

I have a bunch of files (more than 1000) like the following:
$ ls
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-dev.lc
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-dev.lex
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-train.lc
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-train.lex
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm.lc
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm.lex
org.allenai.ari.solvers.termselector.ExpandedLearner.lc
org.allenai.ari.solvers.termselector.ExpandedLearner.lex
org.allenai.ari.solvers.termselector.ExpandedLearnerSVM.lc
org.allenai.ari.solvers.termselector.ExpandedLearnerSVM.lex
....
I have to rename these files by adding learners right before the capitalized name. For example,
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm.lex
would change to
org.allenai.ari.solvers.termselector.learners.BaselineLearnersurfaceForm.lex
and this one
org.allenai.ari.solvers.termselector.ExpandedLearner.lc
would change to
org.allenai.ari.solvers.termselector.learners.ExpandedLearner.lc
Any ideas how to do this automatically?
for f in org.*; do
echo mv "$f" "$( sed 's/\.\([A-Z]\)/.learners.\1/' <<< "$f" )"
done
This short loop outputs an mv command that renames the files in the manner that you wanted. Run it as-is first, and when you are certain it's doing what you want, remove the echo and run again.
The sed bit in the middle takes a filename ($f, via a here-string, so this requires bash) and replaces the first occurrence of a dot followed by a capital letter with .learners. followed by that same capital letter.
There is a tool called perl-rename, sometimes installed as rename. It is not to be confused with the rename from util-linux.
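To check the substitution before renaming anything, it can be wrapped in a function that only prints the new name. This sketch uses a hypothetical helper name (sed_new_name) and writes [[:upper:]] instead of [A-Z] so the match does not depend on the locale's collation. Because there is no g flag, only the first ".Capital" is rewritten, as the second call shows.

```bash
# sed_new_name NAME (hypothetical helper): print NAME with "learners."
# inserted before the first capitalized component; nothing is renamed.
sed_new_name() {
  sed 's/\.\([[:upper:]]\)/.learners.\1/' <<< "$1"
}

sed_new_name org.allenai.ari.solvers.termselector.ExpandedLearner.lc
# prints: org.allenai.ari.solvers.termselector.learners.ExpandedLearner.lc
sed_new_name org.example.Foo.Bar.lex   # hypothetical name with two capitals
# prints: org.example.learners.Foo.Bar.lex
```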
It's very good for tasks like this as it takes a perl expression and renames accordingly:
perl-rename 's/(?=\.[A-Z])/.learners/' *
You can play with the regex online
Alternatively, you can use a for loop and $BASH_REMATCH:
for file in *; do
[ -e "$file" ] || continue
[[ "$file" =~ ^([^A-Z]*)([A-Z].*)$ ]] || continue
mv -- "$file" "${BASH_REMATCH[1]}learners.${BASH_REMATCH[2]}"
done
A very simple approach (useful if you only need to do this once) is to dump the file list into a text file with ls > dummy, and then use find/replace in a text editor to turn each line into the form mv xxx.yyy xxx.learners.yyy. Then you can simply run the resulting file with bash dummy (or ./dummy after chmod +x dummy).
The exact find/replace commands depend on the text editor you use, but something like:
replace org. with mv org. (this gets you the mv at the beginning).
replace the regex mv org\.allenai\.ari\.solvers\.termselector\.(.*) with mv org.allenai.ari.solvers.termselector.$1 org.allenai.ari.solvers.termselector.learners.$1 to duplicate the filename and insert learners.
There is also a for-loop syntax that can probably do it in one (long) line, but I cannot explain it well; try help for if you want to learn about it.
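One way such a one-line for loop could look, as a sketch assuming bash: parameter expansion finds the split point without sed. ${f%%.[[:upper:]]*} removes the longest suffix that starts with a dot followed by a capital letter, which leaves the lowercase package prefix. learners_name is a hypothetical helper name.

```bash
# learners_name NAME (hypothetical helper): print the new name, or NAME
# unchanged when no component after a dot starts with a capital letter.
learners_name() {
  local f=$1 p=${1%%.[[:upper:]]*}   # p = org.allenai.ari.solvers.termselector
  if [ "$p" = "$f" ]; then printf '%s\n' "$f"
  else printf '%s\n' "$p.learners.${f#"$p".}"; fi
}

# The one-line dry run; remove the echo to rename for real.
for f in org.*; do [ -e "$f" ] || continue; echo mv -- "$f" "$(learners_name "$f")"; done
```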

bash script rename multiple files [duplicate]

This question already has answers here:
Rename filename to another name
(3 answers)
Closed 7 years ago.
Let's say I have a bunch of files named something like this: bsdsa120226.nai and bdeqa140223.nai, and I want to rename them to 120226.nai and 140223.nai. How can I achieve this using the script below?
#!/bin/bash
name1=`ls *nai*`
names=`ls *nai*| grep -Po '(?<=.{5}).+'`
for i in $name1
do
for y in $names
do
mv $i $y
done
done
Solution:
for i in *nai*
do
y=$(echo "$i" | grep -Po '(?<=.{5}).+')
mv -- "$i" "$y"
done
This:
#!/bin/bash
shopt -s extglob nullglob
for file in *+([[:digit:]]).nai; do
echo mv -nv -- "$file" "${file##+([^[:digit:]])}"
done
Remove the echo if you're happy with the mv commands.
Note: this solution does not assume that there are 5 leading characters to delete; it deletes all the leading non-numeric characters.
Using only bash, you could do this:
for file in *nai* ; do
echo mv -- "$file" "${file:5}"
done
(Remove the echo when satisfied with the output.)
Avoid ls in scripts, except for displaying information. Use plain globbing instead.
See also How do I do string manipulations in bash? for more string manipulation techniques.
Your script can't work with that structure: if you have 5 files, it will call mv five times for the first file (once for each element in the second list), five times for the second, etc. You'd need to iterate over the two sets of names in lockstep. (It also doesn't deal with things like whitespace in filenames.)
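Iterating "in lockstep" can be sketched with two bash arrays indexed by the same counter, so each source name is paired with exactly one target name. This assumes, as in the question, that the first 5 characters are to be dropped; lockstep_mv is a hypothetical helper name and it only prints the mv commands.

```bash
# lockstep_mv NAME... (hypothetical helper): pair each source name with one
# target name by index, then print one mv per pair (dry run).
lockstep_mv() {
  local -a src=( "$@" ) dst=()
  local f i
  for f in "${src[@]}"; do dst+=( "${f:5}" ); done   # targets, same order
  for i in "${!src[@]}"; do
    echo mv -n -- "${src[$i]}" "${dst[$i]}"   # remove echo to rename
  done
}

shopt -s nullglob   # no .nai files: no arguments, nothing printed
lockstep_mv *.nai
```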
You would be better off using rename (prename on some systems) since that allows you to use Perl regular expressions to do the renaming, along the lines of:
prename 's/^.{5}//' *.nai
The reason your script is not behaving is that, for every source file, you're attempting to rename it to every target file.
If you need to limit yourself to using that script, you need to work out the single target file for each source file, something like:
#!/bin/bash
for i in *.nai; do
y=$(echo "$i" | cut -c6-)
mv "$i" "$y"
done
If your system has the Perl-based rename tool, it's better to go with the simple rename command:
rename 's/^.{5}//' *.nai
It just removes the first 5 characters from the file name.
OR
for i in *.nai; do mv -- "$i" "$(grep -oP '(?<=^.{5}).+' <<< "$i")"; done

Cannot change the names of files that are the result of a for loop that echos file names

I've been successfully running a script that prints out the names of the files in a specific directory by using
for f in data/*
do echo "$f"
done
and when I run the program it gives me:
data/data-1.txt
data/data-2.txt
data/data-3.txt (the files in the data directory)
However, when I need to change all of the file names from data-*.txt to mydata-*.txt, I can't figure it out.
I keep trying to use sed s/data/mydata/g $f, but it prints out the whole file instead and doesn't change the name. Can anybody give me some tips on how to change the file names? It also seems to change the name of the directory if I use sed, so I'm at a dead end. Even using mv doesn't seem to do anything.
for f in data/*
do
NewName="$( echo "${f}" | sed 's#/data-\([0-9]*\.txt\)$#/mydata-\1#' )"
if [ ! "${f}" = "${NewName}" ]
then
mv -- "${f}" "${NewName}"
fi
done
This is based on your code, but there are lots of other ways to do it (e.g. find -exec).
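A find -exec variant might look like the sketch below. It assumes the files sit directly in the given directory and match data-*.txt; the inline sh script rebuilds each path with parameter expansion (${f%/*} is the directory, ${f##*/} the file name), so the directory part is never touched. rename_with_find is a hypothetical helper name.

```bash
# rename_with_find DIR (hypothetical helper): DIR/data-N.txt -> DIR/mydata-N.txt
rename_with_find() {
  find "$1" -maxdepth 1 -type f -name 'data-*.txt' -exec sh -c '
    for f in "$@"; do
      mv -- "$f" "${f%/*}/my${f##*/}"   # prepend "my" to the file name only
    done
  ' sh {} +
}

# rename_with_find data
```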

bash rename files with prefix serial number

I have loads of files in a folder. I want to do two things:
prefix them with xxx three digit serial numbers - ascending: 001 002 and so on
remove the prefix from their names, so 001a.xyz becomes a.xyz
I intend to do this using a simple bash script. What's the most elegant and simple to understand way to do this?
Edit:
The files are on a removable device, and I cannot seem to set chmod +x on the script there. So how do I run a script from my home directory that will change the files in another directory?
To add prefixes:
counter=1
for f in *; do
printf -v prefix_str '%03d' "$((counter++))"
mv "$f" "${prefix_str}$f"
done
To remove prefixes (caution -- this may overwrite if you have two files with the same suffix but different prefixes):
for f in [0-9][0-9][0-9]*; do
mv "$f" "${f:3}"
done
Use mv -n to avoid overwriting when two files have the same suffix.
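For the edit in the question (running a script kept in the home directory against files on the device), one option is to pass the target directory as an argument and start the script with bash, which needs no execute bit. This is a sketch: add_prefixes, prefix.sh and /path/to/device are hypothetical names, and subdirectories are skipped.

```bash
#!/usr/bin/env bash
# Run as: bash ~/prefix.sh /path/to/device   (no chmod +x needed)
# add_prefixes DIR (hypothetical helper): 001, 002, ... prefixes for files in DIR.
add_prefixes() {
  local dir=$1 counter=1 f prefix
  for f in "$dir"/*; do
    [ -f "$f" ] || continue                  # skip subdirectories
    printf -v prefix '%03d' "$((counter++))"
    mv -n -- "$f" "$dir/$prefix${f##*/}"     # -n: never overwrite
  done
}

if [ $# -gt 0 ]; then
  add_prefixes "$1"
fi
```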
This should work:
#!/bin/bash
count=1
for file in *; do
if [[ $file =~ ^[0-9][0-9][0-9] ]]; then
sfile="${file:3}"
new=$(printf "%03d" ${count})
mv "$file" "${new}${sfile}"
((count++))
else
new=$(printf "%03d" ${count})
mv "$file" "${new}${file}"
((count++))
fi
done
What this script does is check each file in the current directory. If the file already has a three-digit prefix, it removes it and assigns a new sequential prefix; if the file has no prefix, it adds one.
The end result should be that all the files in your current directory (some with and some without prefixes) have new sequential prefixes.

In shell, how do I delete numbered duplicate files?

I've got a directory with a few thousand files in it, named things like:
filename.ext
filename (1).ext
filename (2).ext
otherfile.ext
otherfile (1).ext
etc.
Most of the files with bracketed numbers are duplicates of the original, but in some cases they're not.
How can I keep my original files, delete the duplicates, but not lose the files that are different?
I know that I could rm *\).ext, but that obviously doesn't make sure that files match the original.
I'm using OS X, so I have an md5 program that functions sort of like md5sum in Linux, though it puts the hash at the end of the line instead of the beginning. I was thinking I could use an awk script to take the output of md5 *.ext | awk 'some script', find duplicates by md5, and delete them, but the command line is too long (bash: /sbin/md5: Argument list too long).
And I don't know what to write in the script. I was thinking of storing things in an array with this:
awk '{a[$NF]++} a[$NF]>1{sub(/).*/,""); sub(/.*(/,""); system("rm " $0);}'
But that always seems to delete my original.
What am I doing wrong? How do I do it right?
Thanks.
Your awk script deletes original files because when you sort your files, . (period) sorts after a space. So the first file that's seen is a numbered one, not the original, and subsequent checks (including the one against the original) compare files to that first numbered one.
Not only does rm *\).ext fail to check against the original, it also loses files that may not have an original in the first place.
I wouldn't do this quite this way. Rather than checking every numbered file and verifying whether it matches an original, you can go through your list of originals, then delete the numbered files that match them.
Instead:
$ for file in *[^\)].ext; do echo "-- Found: $file"; rm -v "$(basename "$file" .ext)"\ \(*\).ext; done
This deletes every numbered copy of a file that has an original, without comparing contents. You can expand it to check MD5s along the way, but that's more code, so I'll break it into multiple lines, in a script:
#!/bin/bash
shopt -s nullglob # Show nothing if a fileglob matches no files
for file in *[^\)].ext; do
md5=$(md5 -q "$file") # The -q option gives you only the message digest
echo "-- Found: $file ($md5)"
for duplicate in "$(basename "$file" .ext)"\ \(*\).ext; do
if [[ "$md5" = "$(md5 -q "$duplicate")" ]]; then
rm -v "$duplicate"
fi
done
done
As an alternative, you can probably get away with doing this a little more simply, with less CPU overhead than calculating MD5 digests. Unix and Linux have a tool called cmp, which compares two files byte by byte; with -s it prints nothing and reports the result only through its exit status. So:
#!/bin/bash
shopt -s nullglob
for file in *[^\)].ext; do
for duplicate in "$(basename "$file" .ext)"\ \(*\).ext; do
if cmp -s "$file" "$duplicate"; then
rm -v "$duplicate"
fi
done
done
If you don't need to use AWK, you could maybe do something simpler in bash (note that this only checks that the original exists, not that its contents match):
for file in *\([0-9]*\)*; do
[ -e "$(echo "$file" | sed -e 's/ ([0-9][0-9]*)//')" ] && rm "$file"
done
Hope this helps a little =)
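The existence check above can be combined with a content check, so a numbered file that differs from its original is kept. A minimal sketch, assuming names of the form "name (N).ext" in a single directory; remove_identical_copies is a hypothetical helper name, and it prints rm commands instead of deleting.

```bash
shopt -s nullglob   # no numbered files: the loop body never runs
# remove_identical_copies DIR (hypothetical helper): print rm for every
# "name (N).ext" whose original exists and is byte-for-byte identical.
remove_identical_copies() {
  local dir=$1 copy base orig
  for copy in "$dir"/*\ \([0-9]*\)*; do
    base=${copy##*/}                                     # work on the file name only
    orig="$dir/$(sed -e 's/ ([0-9][0-9]*)//' <<< "$base")"
    if [ -f "$orig" ] && cmp -s "$orig" "$copy"; then
      echo rm -- "$copy"                                 # remove echo to delete
    fi
  done
}

# remove_identical_copies .
```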
