Batch rename all files in a directory to basename-sequentialnumber.extension - bash

I have a directory containing .jpg files, currently named photo-1.jpg, photo-2.jpg etc. There are about 20,000 of these files, sequentially numbered.
Sometimes I delete some of these files, which creates gaps in the file naming convention.
Can you guys help me with a bash script that would sequentially rename all the files in the directory to eliminate the gaps? I have found many posts about renaming files and tried a bunch of things, but can't quite get exactly what I'm looking for.
For example:
photo-1.jpg
photo-2.jpg
photo-3.jpg
Delete photo-2.jpg
photo-1.jpg
photo-3.jpg
run script to sequentially rename all files
photo-1.jpg
photo-2.jpg
done

With find and sort.
First check the output of
find directory -type f -name '*.jpg' | sort -nk2 -t-
If the output is not what you expected, meaning the sort order is not correct, it might have something to do with your locale. Add LC_ALL=C before the sort:
find directory -type f -name '*.jpg' | LC_ALL=C sort -nk2 -t-
To keep a record of the output in a file, add a | tee output.txt after the sort.
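Put together:
find directory -type f -name '*.jpg' | LC_ALL=C sort -nk2 -t- | tee output.txt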
If needed, add LC_ALL=C before the sort in the code below as well.
#!/bin/sh
counter=1
find directory -type f -name '*.jpg' |
sort -nk2 -t- | while read -r file; do
  ext=${file##*[0-9]}
  filename=${file%-*}
  [ ! -e "$filename-$counter$ext" ] &&
    echo mv -v "$file" "$filename-$counter$ext"
  counter=$((counter+1))
done # 2>&1 | tee log.txt
Change directory in the commands above to the actual name of the directory that contains the files you need to rename.
If your sort has the -V flag/option then that should work too.
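For example, with a GNU sort that supports version sorting:
find directory -type f -name '*.jpg' | sort -V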
In sort -nk2 -t-, the -n means numeric sort, -k2 means sort on the second field, and -t- means the field delimiter/separator is a dash - (it can also be written -t -). Caveat: if the directory name contains a dash - as well, the sort will break; adjust the value of -k and it should work.
ext=${file##*[0-9]} is a parameter expansion; it keeps only the .jpg.
filename=${file%-*} is also a parameter expansion; it keeps only the photo plus the directory name before it.
[ ! -e "$filename-$counter$ext" ] triggers the mv ONLY if the target file does not exist.
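A quick demonstration of both expansions:
file=directory/photo-13.jpg
echo "${file##*[0-9]}"   # prints .jpg
echo "${file%-*}"        # prints directory/photo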
If you want some record or log, remove the # comment after the done.
Remove the echo once you have confirmed that the output is correct, so the mv actually runs.

How to get full path of last modified file in directory including nested directories?

How would I modify this code to give me the full file path of the last modified file in the code directory, including nested sub-directories?
# Gets the last modified file in the code directory.
get_filename(){
  cd "$code_directory" || no_code_directory_error # Stop script if directory doesn't exist.
  last_modified=$(ls -t | head -n1)
  echo "$last_modified"
}
Use find instead of ls, because parsing the output of ls is an anti-pattern.
Use a Schwartzian transform to prefix your data with a sort key.
Sort the data.
Take what you need.
Remove the sort key.
Post process the data.
find "$code_directory" -type f -printf '%T# %p\n' |
sort -rn |
head -1 |
sed 's/^[0-9.]\+ //' |
xargs readlink -f
You can use the realpath utility.
# Gets the last modified file in the code directory.
get_filename(){
  cd "$code_directory" || no_code_directory_error # Stop script if directory doesn't exist.
  last_modified=$(ls -t | head -1)
  echo "$last_modified"
  realpath "$last_modified"
}
Output:
blah.txt
/full/path/to/blah.txt
ls -t sorts by modification time, and if you want just the first one you can add | head -1. The -R flag recurses into subdirectories, but the catch is that ls -tR doesn't gather all the files together before sorting them (it sorts within each directory separately), so you can use
find . -type f -printf "%T@ %f\n" | sort -rn > out.txt
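To reduce that to just the newest file's full path, a small extension of the same pipeline (a sketch; %p prints the whole path where %f prints only the file name):
find . -type f -printf "%T@ %p\n" | sort -rn | head -n 1 | cut -d' ' -f2-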

plus 1 to filename

My task is log rotation, and I can't find any command that can increment the number in a filename by 1.
For example, I have some files with names like wrapper.log.1, wrapper.log.2.
I need to rename and move those files to another directory and get wrapper_1.log, wrapper_2.log. After a file has been moved it should be deleted from the original directory.
It is possible that the new folder already contains files with the same name.
So, I should take the last file and add 1 to the number in its filename, like wrapper_(2+1).log.
For the whole task I found something like
find . -name "wrapper.log.*"
mkdir $(date '+ %d.%m.%y')
find . -name "wrapper.log.*" |sort -r |head -n1 | sed -E 's/(.log)(.[0-9])/_$(2+1)\1/'
But, of course, it doesn't work after the second line.
And, eventually, it needs to be in bash.
P.S.: I also think it would be possible to just create a new file in the new folder with a timestamp or something like that as a postfix.
For example:
folder      file
01.01.19    wrapper_00_00_01
            wrapper_00_01_07
            wrapper_01_10_53
            wrapper_13_07_11
02.01.19    wrapper_01_00_01
            wrapper_03_01_07
            wrapper_05_10_53
            wrapper_13_07_11
To find the highest number of the wrapper_ log files:
find . -type f -name "*.log" -exec basename {} \; | ggrep -Po '(?<=wrapper_)[0-9]+' | sort -rn | head -n 1
I'm using grep's Perl switch (-P) to do a look-behind for "wrapper_", then reverse-sorting the numbers found and taking the first one. If you want to generate a new file name, I'd use awk, e.g.:
find . -type f -name "*.log" -exec basename {} \; | ggrep -Po '(?<=wrapper_)[0-9]+' | sort -rn | head -n 1 | awk '{ print "wrapper_" ($1 + 1) ".log" }'
This will produce a file name with the next number in the sequence.
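A minimal sketch of wiring that into the actual renaming, under some assumptions: the source files are named wrapper.log.N in the current directory, archive/ is a hypothetical destination holding the already-rotated wrapper_N.log files, and GNU grep is available as grep (ggrep on macOS, as above):
#!/bin/bash
dest=archive   # hypothetical destination directory
mkdir -p "$dest"
# Highest existing wrapper_N.log number in the destination, empty if none yet.
last=$(find "$dest" -type f -name 'wrapper_*.log' -exec basename {} \; |
       grep -Po '(?<=wrapper_)[0-9]+' | sort -rn | head -n 1)
next=$(( ${last:-0} + 1 ))
# Note: the glob expands in alphabetical order, not numeric suffix order.
for f in wrapper.log.*; do
  [ -e "$f" ] || continue   # skip when the glob matched nothing
  mv "$f" "$dest/wrapper_$next.log"
  next=$((next + 1))
done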
I don't understand your question entirely, but I know that using a dollar sign and double parentheses you can execute a calculation:
Prompt>echo $((1+1))
2
Finally, I found two solutions.
The first is bash, something like this:
#!/bin/bash
#DECLARE
FILENAME=$1
DATE=$(date '+%d.%m.%y')
SRC_DIR="/usr/local/apache-servicemix-6.1.0/data/log"
DEST_DIR="/mnt/smxlog/$HOSTNAME"
#START
mkdir -m 777 "$DEST_DIR/$DATE"
if [ -d "$DEST_DIR/$DATE" ]
then
  find "$SRC_DIR/" -name "$FILENAME.log.*" | while read -r f
  do
    TIME=$(date '+%H_%M_%S.%3N')
    NEW_FILENAME="$FILENAME-$TIME.log"
    NEW_DEST_WITH_FILE="$DEST_DIR/$DATE/$NEW_FILENAME"
    mv "$f" "$NEW_DEST_WITH_FILE"
    gzip "$NEW_DEST_WITH_FILE"
  done
else
  exit 1
fi
#END
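It would be invoked with the log base name as its single argument, e.g. (assuming the script was saved as rotate.sh):
./rotate.sh wrapper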
The second variant is using log4j logger properties, but it requires uploading log4j-1.2.17_fragment.jar and apache-log4j-extras-1.2.17_fragment to the servicemix system folder. It may be possible to upload them as bundles; I didn't try.
The two jars use different APIs. See
https://logging.apache.org/log4j/1.2/apidocs/index.html?overview-summary.html and
http://logging.apache.org/log4j/companions/apidocs/index.html?overview-summary.html
And the properties would be:
log4j.logger.wrapper.log=DEBUG, wrapper
log4j.additivity.logger.wrapper.log=false
log4j.appender.wrapper=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.wrapper.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
#This setting should be used with commented line log4j.appender.wrapper.File=... if it needs to zip to target directory immediately
#log4j.appender.wrapper.rollingPolicy.FileNamePattern=/mnt/smxlog/${env:HOSTNAME}/wrapper.%d{HH:mm:ss}.log.gz
#Or it is possible to log and zip in the same folder, and afterwards move the zipped files to the required folder with cron
log4j.appender.wrapper.rollingPolicy.FileNamePattern=${karaf.data}/log/wrapper.%d{HH:mm:ss}.log.gz
log4j.appender.wrapper.File=${karaf.data}/log/wrapper.log
log4j.appender.wrapper.triggeringPolicy=org.apache.log4j.rolling.SizeBasedTriggeringPolicy
#Size in bytes
log4j.appender.wrapper.triggeringPolicy.MaxFileSize=1000000
log4j.appender.wrapper.layout=org.apache.log4j.PatternLayout
log4j.appender.wrapper.layout.ConversionPattern=%d{dd-MM-yyyy_HH:mm:ss} %-5p [%t] - %m%n
log4j.appender.wrapper.Threshold=DEBUG
log4j.appender.wrapper.append=true

Faster way to list files with similar names (using bash)?

I have a directory with more than 20K files all with a random number prefix (eg 12345--name.jpg). I want to find files with similar names and remove all but one. I don't care which one because they are duplicates.
To find duplicated names I've use
find . -type f \( -name "*.jpg" \) | | sed -e 's/^[0-9]*--//g' | sort | uniq -d
as the list driving a for loop.
To find all but one to delete, I'm currently using
rm $(ls -1 *name.jpg | tail -n +2)
This operation is pretty slow. I want to speed this up. Any suggestions?
I would do it like this.
Note that you are dealing with the rm command, so make sure that you have a backup of the existing directory in case something goes south.
Create a backup directory and back up the existing files. Once done, check that all the files are there.
mkdir bkp_dir; cp *.jpg bkp_dir/
Create another temp directory where we will keep only 1 file for each similar name. All the unique file names will end up here.
$ mkdir tmp
$ for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
Explanation of the command is at the end. Once it has executed, check in the tmp directory that you got unique instances of the files.
Remove all *.jpg files from the main directory. Saying it again: please verify that all files have been backed up before executing the rm command.
rm *.jpg
Backup the unique instances from the temp directory.
cp tmp/*.jpg .
Explanation of the command in step 2.
The command to get unique file names in step 2 is
for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
$(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq) will get the unique file names like file1.jpg , file2.jpg
for i in $(...);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done will copy one file for each filename to tmp/ directory.
You should not be using ls in scripts and there is no reason to use a separate file list like in userunknown's reply.
keepone () {
  shift
  rm "$@"
}
keepone *name.jpg
If you are running find to identify the files you want to isolate anyway, traversing the directory twice is inefficient. Filter the output from find directly.
find . -type f -name "*.jpg" |
awk '{ f=$0; sub(/^[0-9]*--/, "", f); if (a[f]++) print }' |
xargs echo rm
Take out the echo if the results look like what you expect.
As an aside, the /g flag to sed is useless for a regex which can only match once. The flag says to replace all occurrences on a line instead of the first occurrence on a line, but if there can be only one, the first is equivalent to all.
Assuming no subdirectories and no whitespace-in-filenames involved:
find . -type f -name "*.jpg" | sed -e 's/^[0-9]*--//' | sort | uniq -d > namelist
removebutone () { shift; echo rm "$@"; }; cat namelist | while read n; do removebutone *--"$n"; done
or, better readable:
removebutone () {
  shift
  echo rm "$@"
}
cat namelist | while read n; do removebutone *--"$n"; done
shift takes the first parameter off of the positional parameters, so rm "$@" removes every match but the first.
Note that the parens around the -name parameter are superfluous, and that there shouldn't be two pipes before sed. Maybe you had something else there which needed to be covered.
If it looks promising, you of course have to remove the echo in front of rm.

Bash command to (re)move duplicate files [eg photos]

If I have a directory with a bunch of photos and some of them are duplicates [in everything except name], is there a way I can get a list of uniques and move them to another dir?
Eg
find . -type f -print0 | xargs -0 md5sum
that will give me a list of "md5 filename"
Now I just want to look at uniques based on that...
eg pipe that to sort -u.
After that I want to mv all of those files somewhere else, but I can worry about that later...
You can use fdupes:
fdupes -r .
to get a list of duplicates. The move should be possible with some command chaining.
fdupes -r -f .
It shows you only the duplicated files. So if you have an image twice, you'll get one entry instead of both duplicated paths.
To move you could do:
for file in $(fdupes -r -f . | grep -v '^$')
do
mv "$file" duplicated-files/
done
But be aware of name clashes..
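One way to sidestep the name clashes, assuming GNU coreutils mv (the --backup option is a GNU extension) and a pre-created duplicated-files/ directory; the while read loop also survives spaces in file names:
fdupes -r -f . | grep -v '^$' | while IFS= read -r file; do
  mv --backup=numbered "$file" duplicated-files/
done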
From there:
sort | uniq -w 32
will compare only the first 32 characters, which is exactly the length of the md5 hash.
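And a sketch of the deferred mv step built on the same idea, keeping the first file of each hash group and moving the rest (assumes GNU md5sum's two-space output format and filenames without newlines; duplicated-files/ is a hypothetical target):
find . -type f -print0 | xargs -0 md5sum |
  sort |                                      # group identical hashes together
  awk 'seen[$1]++ { print substr($0, 35) }' | # skip the first file of each hash; the name starts at column 35
  while IFS= read -r f; do
    echo mv "$f" duplicated-files/            # drop the echo once the list looks right
  done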

unix command to find most recent directory created

I want to copy the files from the most recent directory created. How would I do so in unix?
For example, if I have the directories names as date stamp as such:
/20110311
/20110318
/20110325
This is the answer to the question I think you are asking.
When I deal with many directories that have date/time stamps in the name, I always take the approach that you have, which is YYYYMMDD - the great thing about that is that date order is then also alphabetical order. In most shells (certainly in bash, and I am 90% sure of the others), the '*' expansion is done alphabetically, and by default ls returns results in alphabetical order. Hence
ls | head -1
ls | tail -1
Give you the earliest and the latest dates in the directory.
This can be extended to only keep the last 5 entries etc.
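For example, to see just the five most recent date-named directories:
ls | tail -n 5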
lastdir=`ls -tr <parentdir> | tail -1`
After some experimenting, I came up with the following:
The unix stat command is useful here. The '-t' option causes stat to print its output in terse mode (all on one line), and the 13th element of that terse output is the unix timestamp (seconds since the epoch) for the last-modified time. This command will list all directories (and sub-directories) in order from newest-modified to oldest-modified:
find -type d -exec stat -t {} \; | sort -r -n -k 13,13
Hopefully the "terse" mode of stat will remain consistent in future releases of stat!
Here's some explanation of the command-line options used:
find -type d # only find directories
find -exec [command] {} \; # execute given command against each *found* file.
sort -r # reverse the sort
sort -n # numeric sort (100 should not appear before 2!)
sort -k M,N # only sort the line using elements M through N.
Returning to your original request, to copy files, maybe try the following. To output just a single directory (the most recent), append this to the command (notice the initial pipe), and feed it all into your 'cp' command with backticks.
| head --lines=1 | sed 's/\ .*$//'
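Put together, the copy might look like this sketch (target-directory/ is a placeholder; -mindepth 1 keeps . itself out of the candidates):
cp `find . -mindepth 1 -type d -exec stat -t {} \; | sort -r -n -k 13,13 | head --lines=1 | sed 's/\ .*$//'`/* target-directory/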
The trouble with the ls based solutions is that they are not filtering just for directories. I think this:
cp `find . -mindepth 1 -maxdepth 1 -type d -exec stat -c "%Y %n" {} \; |sort -n -r |head -1 |awk '{print $2}'`/* /target-directory/.
might do the trick, though note that that will only copy files in the immediate directory. If you want a more general answer for copying anything below your newest directory over to a new directory I think you would be better off using rsync like:
rsync -av `find . -mindepth 1 -maxdepth 1 -type d -exec stat -c "%Y %n" {} \; |sort -n -r |head -1 |awk '{print $2}'`/ /target-directory/
but it depends a bit on which behaviour you want. The explanation of the stuff in the backticks is:
. - the current directory (you may want to specify an absolute path here)
-mindepth/-maxdepth - restrict the find command only to the immediate children of the current directory
-type d - only directories
-exec stat .. - outputs the modified time and the name of the directory from find
sort -n -r |head -1 | awk '{print $2}' - date orders the directory and outputs the name of the most recently modified
If your directories are named YYYYMMDD like your question suggests, take advantage of the alphabetic globbing.
Put all directories in an array, and then pick the first one:
dirs=(*/); first_dir="$dirs";
(This is actually a shortcut for first_dir="${dirs[0]}";.)
Similarly, for the last one:
dirs=(*/); last_dir="${dirs[$((${#dirs[@]} - 1))]}";
Ugly syntax, but this is what it breaks down to:
# Create an array of all directories inside the working directory.
dirs=(*/);
# Get the number of entries in the array.
num_dirs=${#dirs[@]};
# Calculate the index of the last entry.
last_index=$(($num_dirs - 1));
# Get the value at the last index.
last_dir="${dirs[$last_index]}";
I know this is an old question with an accepted answer, but I think this method is preferable as it does everything in Bash. No reason to spawn extra processes, let alone parse the output of ls. (Which, admittedly, should be fine in this particular case of YYYYMMDD names.)
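As an aside, if your bash is 4.3 or newer (an assumption worth checking), negative array subscripts make the last entry even shorter:
dirs=(*/); last_dir="${dirs[-1]}";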
Please try the following command:
ls -1tr | tail -1
ls -ltra
This one is simple and useful, which I learned recently.
It shows the results in chronological order, oldest first, so the most recently modified directory ends up on the last line. (Piping find into ls, as in find ~ -type d | ls -ltra, does not filter anything: ls ignores its standard input.)
I wrote a command that can be used to identify which folder or file was most recently created in a folder. :)
#!/bin/sh
path=/var/folder_name
newest=`find "$path" -maxdepth 1 -exec stat -t {} \; | sed 1d | sort -r -k 14 | head -1 | awk '{print $1}' | sed 's/\.\///g'`
find "$path" -maxdepth 1 | sed 1d | grep -v "$newest"
