How to decrement a number in each filename in a directory? - bash

Running "ls -lrt" on my terminal I get a large list that looks something like this:
-rw-r--r-- 1 pratik staff 1849089 Jun 23 12:24 cam13-vid.webm
-rw-r--r-- 1 pratik staff 1850653 Jun 23 12:24 cam12-vid.webm
-rw-r--r-- 1 pratik staff 1839110 Jun 23 12:24 cam11-vid.webm
-rw-r--r-- 1 pratik staff 1848520 Jun 23 12:24 cam10-vid.webm
-rw-r--r-- 1 pratik staff 1839122 Jun 23 12:24 cam1-vid.webm
I have only shown part of it above as a sample.
I would like to rename all the files to have a number one less than current.
For example,
mv cam1-vid.webm cam0-vid.webm
mv cam2-vid.webm cam1-vid.webm
.....
....
mv cam200-vid.webm cam199-vid.webm
How can this be done using an OS X / Linux bash script (perhaps using sed)?

You can do this with plain bash:
for i in {1..200}
do
mv "cam${i}-vid.webm" "cam$((i-1))-vid.webm"
done

I would use find, split up the file names, to find the number, subtract one, and rename:
find . -name "cam*-vid.webm" -print0 | while IFS= read -r -d '' old_name
do
old_name=${old_name#./} #Strip the leading './' that find adds
number=${old_name#cam} #Filter left to remove 'cam' prefix
number=${number%-vid.webm} #Filter right to remove '-vid.webm' suffix
number=$((number - 1))
new_name="cam${number}-vid.webm"
echo "mv \"$old_name\" \"$new_name\""
done | tee results
This will merely print out the commands (that is why I have echo). I'm piping the output into a file named results. Once this command completes, look at results and make sure it does everything it should. Whenever there's an operation like this, there can be a nasty surprise. For example, if I rename cam02-vid.webm to cam01-vid.webm before I rename cam01-vid.webm, I am going to overwrite cam01-vid.webm.
Maybe a safer way is to explicitly give the file numbers I need:
for number in {0..199}
do
old_number=$((number + 1))
echo mv "\"cam${old_number}-vid.webm\" \"cam${number}-vid.webm\""
done | tee results
Useful hint: If the result file looks good, you can actually just run it as a shell script:
$ bash results
Another possibility is to test to make sure the old file exists:
for number in {0..199}
do
old_number=$((number + 1))
if [ -f "cam${old_number}-vid.webm" ]
then
echo mv "\"cam${old_number}-vid.webm\" \"cam${number}-vid.webm\""
else
echo "ERROR: Can't find a file called 'cam${old_number}-vid.webm'" >&2
fi
done | tee results
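The ordering concern and the existence check above can also be combined in a loop that discovers the numbers from the directory instead of assuming a fixed range. This is only a sketch: the function name decrement_cam_files and the DRY_RUN variable are made up for illustration, it assumes the numbers carry no leading zeros, and it relies on mv -n (available in GNU and BSD mv) to refuse to overwrite an existing target.

```shell
#!/bin/bash
# Hypothetical helper: rename camN-vid.webm to cam(N-1)-vid.webm in a directory.
# Numbers are processed in ascending order so that no target is overwritten.
decrement_cam_files() {
    local dir=${1:-.} f n
    for f in "$dir"/cam*-vid.webm; do
        [ -f "$f" ] || continue
        n=${f##*/cam}        # drop the directory part and the 'cam' prefix
        n=${n%-vid.webm}     # drop the '-vid.webm' suffix
        # Keep only plain positive numbers without leading zeros
        [[ $n =~ ^[1-9][0-9]*$ ]] && printf '%s\n' "$n"
    done | sort -n | while IFS= read -r n; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Only print the command, as in the echo-based loops above
            echo "mv \"$dir/cam${n}-vid.webm\" \"$dir/cam$((n - 1))-vid.webm\""
        else
            mv -n "$dir/cam${n}-vid.webm" "$dir/cam$((n - 1))-vid.webm"
        fi
    done
}
```

Running DRY_RUN=1 decrement_cam_files . first prints the commands without touching anything; calling it without DRY_RUN performs the renames.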

A perl solution.
First it traverses all input files (@ARGV), keeps only those that are plain files and not links (grep), extracts the number (map), and sorts numerically in ascending order to avoid overwriting (sort). It then builds the new name by decrementing the number and renames the original:
perl -e '
for (
sort { $a->[0] <=> $b->[0] }
map { m/(\d+)/; [$1, $_ ] }
grep { -f $_ && ! -l $_ }
@ARGV
) {
$n = --$_-> [0];
($newname = $_->[1]) =~ s/\A(?i)(cam)\d+(.*)\z/$1$n$2/;
print "Executing command ===> rename $_->[1], $newname\n";
rename $_->[1], $newname;
}' *
Assuming initial content of the directory as:
cam1-vid.webm
cam13-vid.webm
cam12-vid.webm
cam11-vid.webm
cam10-vid.webm
cam2-vid.webm
Running the command yields:
cam0-vid.webm
cam10-vid.webm
cam11-vid.webm
cam12-vid.webm
cam1-vid.webm
cam9-vid.webm

Related

Bash: How do I check (and return) the results of a command filtered by file content

I executed a command on Linux to list all the files & subfiles (with specific format) in a folder.
This command is:
ls -R | grep -e "\.txt$" -e "\.py$"
On the other hand, I have some filenames stored in a .txt file (one per line).
I want to show the result of my previous command, but I want to filter the result using the file called filters.txt.
If the result is in the file, I keep it
Else, I do not keep it.
How can I do it, in bash, in only one line?
I suppose this is something like:
ls -R | grep -e "\.txt$" -e "\.py$" | grep filters.txt
An example of the files:
# filters.txt
README.txt
__init__.py
EDIT 1
I am trying to use a file instead of a list of arguments because I get the error:
'/bin/grep: Argument list too long'
EDIT 2
# The result of the command ls -R
-rw-r--r-- 1 XXX 1 Oct 28 23:36 README.txt
-rw-r--r-- 1 XXX 1 Oct 28 23:36 __init__.py
-rw-r--r-- 1 XXX 1 Oct 28 23:36 iamaninja.txt
-rw-r--r-- 1 XXX 1 Oct 28 23:36 donttakeme.txt
-rw-r--r-- 1 XXX 1 Oct 28 23:36 donttakeme2.txt
What I want as a result:
-rw-r--r-- 1 XXX 1 Oct 28 23:36 README.txt
-rw-r--r-- 1 XXX 1 Oct 28 23:36 __init__.py
You can use comm. It expects both inputs to be sorted, so sort each list first:
comm -12 <(ls -R | grep -e "\.txt$" -e "\.py$" | sort) <(sort filters.txt)
This will give you the intersection of the two lists.
EDIT
It seems that ls is not great for this; maybe find would be safer:
find . -type f | grep "$(sed ':a;N;$!ba;s/\n/\\|/g' filters.txt)"
That is, for each of your files, take your filters.txt and replace all newlines with \| using sed and then grep for all the entries.
Grep uses \| between items when grepping for more than one item. So the sed command transforms the filters.txt into such a list of items to be used by grep.
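As a variation on the idea above, the names in filters.txt can be compared exactly against the last path component instead of being joined into one regular expression, which avoids accidental substring or metacharacter matches. The function name filter_by_names is hypothetical, and the sketch assumes that file names contain no newlines.

```shell
#!/bin/bash
# Hypothetical helper: print the paths of .txt/.py files under a directory
# whose base name appears as a whole line in the filter file.
filter_by_names() {
    local filters=$1 dir=${2:-.}
    find "$dir" -type f \( -name '*.txt' -o -name '*.py' \) |
        # First input (the filter file): remember every line.
        # Second input (stdin, the find output): print a path only if its
        # last '/'-separated field was remembered.
        awk -F/ 'NR == FNR { keep[$0]; next } $NF in keep' "$filters" -
}
```

For example, filter_by_names filters.txt . prints only the paths whose file name is listed in filters.txt.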
grep -f filters.txt -r .
..where . is your current folder.
You can run this script in the target directory, giving the list file as a single argument.
#!/bin/bash -e
# exit early if awk fails (ie. can't read list)
shopt -s lastpipe
find . -mindepth 1 -type f \( -name '*.txt' -o -name '*.py' \) -print0 |
awk -v exclude_list_file="${1:?no list file provided}" \
'BEGIN {
while ((getline line < exclude_list_file) > 0) {
exclude_list[c++] = line
}
close(exclude_list_file)
if (c==0) {
exit 1
}
FS = "/"
RS = "\000"
}
{
for (i in exclude_list) {
if (exclude_list[i] == $NF) {
next
}
}
print
}'
It prints all paths, recursively, excluding any filename which exactly matches a line in the list file (so lines not ending .py or .txt wouldn’t do anything).
Only the filename is considered, the preceding path is ignored.
It fails immediately if no argument is given or it can't read a line from the list file.
The question is tagged bash, but if you change the shebang to sh, and remove shopt, then everything in the script except -print0 is POSIX. -print0 is common, it’s available on GNU (Linux), BSDs (including OpenBSD), and busybox.
The purpose of lastpipe is to exit immediately if the list file can't be read. Without it, find keeps running until completion (but nothing gets printed).
If you specifically want the ls -l output format, you could change awk to use a null output record separator (add ORS = "\000" to the end of BEGIN, directly below RS="\000"), and pipe awk into xargs -0 ls -ld.

Rename files with consecutive numbers, keeping the original filename

I got a bunch of mp3 files with random names and numbers like:
01_fileabc.mp3
01.filecdc.mp3
fileabc.mp3
929-audio.mp3
For sorting purposes, I need to add a sequential number in front of the file name like:
001_01_fileabc.mp3
002_01.filecdc.mp3
003_fileabc.mp3
004_929-audio.mp3
I checked some of the solutions I found here. One of the first solutions worked kind of but replaced the filename instead of adding to it.
num=0; for i in *; do mv "$i" "$(printf '%04d' $num).${i#*.}"; ((num++)); done
How can I modify this command to add to the filename instead?
I am sorry, but whatever I try I can't find a solution myself here.
Just replace ${i#*.} (which stands for "remove from $i, from the left, up to the first dot") with $i, which is the original name of the file (I'd probably use $filename, $oldfile, or at least $f instead of $i as the variable's name).
You can also replace the . before it with _, otherwise the files will be named
0001.01_fileabc.mp3
etc.
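Put together, the two substitutions described above give a loop along these lines. This is a sketch rather than the exact command from the question: the function name prefix_with_counter is made up, the counter starts at 1, and the width is three digits to match the names the question asks for.

```shell
#!/bin/bash
# Hypothetical helper: prepend a zero-padded sequence number to every .mp3
# file in a directory, keeping the original name after the underscore.
prefix_with_counter() {
    local dir=${1:-.} num=1 f base
    # The glob is expanded once, before the loop starts, so files that have
    # already been renamed are not picked up a second time.
    for f in "$dir"/*.mp3; do
        [ -f "$f" ] || continue
        base=${f##*/}
        mv -- "$f" "$dir/$(printf '%03d' "$num")_$base"
        num=$((num + 1))
    done
}
```

The numbering follows the shell's glob order, which depends on the locale, so it is worth checking the order with a plain ls before renaming.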
UPDATE: As RobC commented about this answer, whitespace or newline characters in file names can cause problems, because the original code (kept below) parses the output of ls into a bash array. It can be improved this way:
#!/bin/bash
i=0
for file in *.mp3; do
i=$((i+1))
mv "$file" "$(printf "%03d_%s" "$i" "$file")"
done
ORIGINAL ANSWER: You can try this code in a bash script. Remember to make it executable with
$ chmod +x script.sh
#!/bin/bash
contents_dir=($(ls *.mp3))
for file in ${!contents_dir[*]}; do
new=$(awk -v i="$file" -v cd="${contents_dir[$file]}" 'BEGIN {printf("%03d_%s", i+1, cd)}')
mv ${contents_dir[$file]} $new
done
It will add a consecutive zero-padded three-digit number, as you wanted, to all mp3 files found in the directory where the script is executed.
You could try this …
$ ls -l
total 4
-rw-r--r-- 1 plankton None 5 Nov 4 13:35 01.filecdc.mp3
-rw-r--r-- 1 plankton None 5 Nov 4 13:35 01_fileabc.mp3
-rw-r--r-- 1 plankton None 5 Nov 4 13:35 929-audio.mp3
-rw-r--r-- 1 plankton None 5 Nov 4 13:35 fileabc.mp3
$ t=0
$ for i in *mp3
> do
> # Use the seq command to get a formatted zero filled string.
> prefix=$(seq -f "%04g" $t $t)
>
> # Move $i to new file name.
> mv "$i" "${prefix}_${i}"
>
> # Increment our counter, t.
> t=$(expr $t + 1)
> done
$ ls -l
total 4
-rw-r--r-- 1 plankton None 5 Nov 4 13:35 0000_01.filecdc.mp3
-rw-r--r-- 1 plankton None 5 Nov 4 13:35 0001_01_fileabc.mp3
-rw-r--r-- 1 plankton None 5 Nov 4 13:35 0002_929-audio.mp3
-rw-r--r-- 1 plankton None 5 Nov 4 13:35 0003_fileabc.mp3

how to move specific files based on a key and rename them

I have over 100000 files.
for example, I mentioned 3 files below
bcbb79d8-1d4a-4fbb-b16c-4df86839773e.htseq.counts.gz
bcdc68db-c874-4097-9c46-b06e331caaf5.htseq.counts.gz
bd4b6975-90d9-43f8-aadc-344d04644822.htseq.counts.gz
I have a text file named key.txt with the following information.
File Name ID
bcbb79d8-1d4a-4fbb-b16c-4df86839773e.htseq.counts.gz TCCC-06-0210
bcdc68db-c874-4097-9c46-b06e331caaf5.htseq.counts.gz TCHA-27-2519
bd4b6975-90d9-43f8-aadc-344d04644822.htseq.counts.gz TCHU-76-4929
I want to take only those files whose names are in the key file, move them to a new folder, and rename them to the corresponding ID.
I guess a little more of a write up rather than a comment would be helpful. The approach to take is to read the filename (fname) and ID (id) from each line in key.txt and then validate that fname is a file and does exist, and then move the file in "$fname" to whatever "/path/to/move/to/$id" you need.
For example:
#!/bin/bash
## read each line into variables fname and id (handle non-POSIX eof)
while read -r fname id || [ -n "$fname" ]; do
## test that "$fname" is a file, and if so, move to destination
[ -f "$fname" ] && mv "$fname" "/path/to/move/to/$id"
done < key.txt
(note: a POSIX end-of-file (eof) is simply the final '\n' at the end of the last line. Some editors do not enforce it, and its absence will cause your read to miss the final line of data unless you check that "$fname" was filled with data (is non-empty). That is the purpose of the [ -n "$fname" ] added to the end of the while read -r ... condition.)
You are feeding the loop with a redirection of key.txt. Each iteration of the while loop will read a new line from key.txt into the variables fname and id (word-splitting on the default Internal Field Separator (IFS)). After the read and separation into fname and id, you simply verify $fname holds a valid filename (in the current working directory) and then mv the file where you want it.
You should execute the script in the directory containing the files, or prepend the relative or absolute path of the directory where they are located to "$fname".
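To see how read splits each line, the same loop can be run with the mv replaced by a printf. This is a minimal sketch; the function name show_key_fields is made up for illustration, and it reads the key file from standard input.

```shell
#!/bin/bash
# Hypothetical helper: show how each line of a key file is split by read.
# The first word goes to fname; everything after it goes to id.
show_key_fields() {
    local fname id
    # The '|| [ -n "$fname" ]' part keeps a final line that lacks a newline.
    while read -r fname id || [ -n "$fname" ]; do
        printf 'fname=<%s> id=<%s>\n' "$fname" "$id"
    done
}
```

Running show_key_fields < key.txt on the key file from the question shows that its header line is split into fname "File" and id "Name ID"; the real loop skips that line only because no file named File exists in the directory.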
Example
Here is a short example that may help clear things up:
The move_rename.sh script:
$ cat move_rename.sh
#!/bin/bash
## read each line into variables fname and id (handle non-POSIX eof)
while read -r fname id || [ -n "$fname" ]; do
## test that "$fname" is a file, and if so, move to destination
[ -f "$fname" ] && mv "$fname" "dest/$id.txt"
done < key.txt
The key.txt file:
$ cat key.txt
File Name ID
bcbb79d8-1d4a-4fbb-b16c-4df86839773e.htseq.counts.gz TCCC-06-0210
bcdc68db-c874-4097-9c46-b06e331caaf5.htseq.counts.gz TCHA-27-2519
bd4b6975-90d9-43f8-aadc-344d04644822.htseq.counts.gz TCHU-76-4929
File locations before script execution (dest is the directory to move to). Note that the first listing uses ls -1 (the digit one), not ls -l (lowercase L); the ls -al that follows does use a lowercase L.
$ ls -1
dest
bcbb79d8-1d4a-4fbb-b16c-4df86839773e.htseq.counts.gz
bcdc68db-c874-4097-9c46-b06e331caaf5.htseq.counts.gz
bd4b6975-90d9-43f8-aadc-344d04644822.htseq.counts.gz
key.txt
move_rename.sh
$ ls -al dest
total 16
drwxr-xr-x 2 david david 4096 Jan 17 20:05 .
drwxr-xr-x 16 david david 12288 Jan 17 20:05 ..
Execute the script
$ bash move_rename.sh
Working directory contents after execution
$ ls -1
dest
key.txt
move_rename.sh
Contents of dest after execution.
$ ls -al dest
total 8
drwxr-xr-x 2 david david 4096 Jan 17 20:00 .
drwxr-xr-x 3 david david 4096 Jan 17 20:00 ..
-rw-r--r-- 1 david david 0 Jan 17 19:59 TCCC-06-0210.txt
-rw-r--r-- 1 david david 0 Jan 17 19:59 TCHA-27-2519.txt
-rw-r--r-- 1 david david 0 Jan 17 19:59 TCHU-76-4929.txt

How to add formatting to, and batch rename filenames?

I have around 7,000 .txt files that have been spat out by a program where the naming convention clearly broke. The only saving grace is that they follow the following structure: id, date, time.
m031060209104704.txt --> id:m031 date:060209 time:104704.txt
Sample of other filenames (again same thing):
115-060202105710.txt --> id:115- date:060202 time: 105710.txt
x138051203125338.txt etc...
9756060201194530.txt etc..
I want to rename all 7,000 files in this directory to look like the following:
m031060209104704.txt --> 090206_104704_m031.txt
i.e date_time_id (each separated by underscores or hyphens, I don't mind). I need the date format to be switched from yymmdd to ddmmyy as shown directly above though!
I'm not clear on what's overkill here: a full program script or a bash command (Mac OS). Again, I don't mind, any and all help is appreciated.
Try something like:
#!/bin/bash
# directory to store renamed files
newdir="./renamed"
mkdir -p "$newdir"
for file in *.txt; do
if [[ $file =~ ^(....)([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{6})\.txt$ ]]; then
# extract parameters
id=${BASH_REMATCH[1]}
yy=${BASH_REMATCH[2]}
mm=${BASH_REMATCH[3]}
dd=${BASH_REMATCH[4]}
time=${BASH_REMATCH[5]}
# then rearrange them to new name
newname=${dd}${mm}${yy}_${time}_${id}.txt
# move to new directory
mv "$file" "$newdir/$newname"
fi
done
Bash string indexing makes it very easy and efficient to rework the filenames as you intend. You should also validate that you are only operating on input filenames of 20 characters. That can be accomplished as follows:
#!/bin/bash
for i in *.txt; do
## validate a 20 character filename
(( ${#i} == 20 )) || { printf "invalid length '%s'\n" "$i"; continue; }
echo "mv $i ${i:8:2}${i:6:2}${i:4:2}_${i:10:6}_${i:0:4}.txt" ## output rename
mv "$i" "${i:8:2}${i:6:2}${i:4:2}_${i:10:6}_${i:0:4}.txt" ## actual rename
done
Example Directory
$ ls -l
total 0
-rw-r--r-- 1 david david 0 Dec 21 19:16 115-060202105710.txt
-rw-r--r-- 1 david david 0 Dec 21 19:16 9756060201194530.txt
-rw-r--r-- 1 david david 0 Dec 21 19:15 m031060209104704.txt
-rw-r--r-- 1 david david 0 Dec 21 19:16 x138051203125338.txt
Example Use/Output
$ cd thedir
$ bash ../script.sh
mv 115-060202105710.txt 020206_105710_115-.txt
mv 9756060201194530.txt 010206_194530_9756.txt
mv m031060209104704.txt 090206_104704_m031.txt
mv x138051203125338.txt 031205_125338_x138.txt
$ ls -l
total 0
-rw-r--r-- 1 david david 0 Dec 21 19:42 010206_194530_9756.txt
-rw-r--r-- 1 david david 0 Dec 21 19:42 020206_105710_115-.txt
-rw-r--r-- 1 david david 0 Dec 21 19:42 031205_125338_x138.txt
-rw-r--r-- 1 david david 0 Dec 21 19:42 090206_104704_m031.txt
Look things over and let me know if you have any further questions.
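Keeping the name computation in a function of its own makes it easy to check the mapping before any file is moved. The sketch below combines the pattern check with the field reordering; the function name new_name is hypothetical, and it assumes the id is always the first four characters, as in the samples.

```shell
#!/bin/bash
# Hypothetical helper: map <4-char id><yymmdd><hhmmss>.txt to
# <ddmmyy>_<hhmmss>_<id>.txt. Prints nothing and returns 1 on a mismatch.
new_name() {
    local f=$1
    [[ $f =~ ^(.{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{6})\.txt$ ]] || return 1
    # Capture groups: 1=id 2=yy 3=mm 4=dd 5=time
    printf '%s%s%s_%s_%s.txt\n' "${BASH_REMATCH[4]}" "${BASH_REMATCH[3]}" \
        "${BASH_REMATCH[2]}" "${BASH_REMATCH[5]}" "${BASH_REMATCH[1]}"
}
```

A dry run could then look like for f in *.txt; do n=$(new_name "$f") && echo mv "$f" "$n"; done, with the echo removed once the output looks right.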

awk, IFS, and file name truncations

Updated question based on new information…
Here is a gist of my code, with the general idea that I store items in DropBox at:
~/Dropbox/Public/drops/xx.xx.xx/whatever
Where the date is always 2 chars, 2 chars, and 2 chars, dot separated. Within that folder can be more folders and more files, which is why when I use find I do not set the depth and allow it to scan recursively.
https://gist.github.com/anonymous/ad51dc25290413239f6f
Below is a shortened version of the gist, it won't run as it stands, I don't believe, though the gist will run assuming you have DropBox installed and there are files at the path location that I set up.
General workflow:
SIZE="+250k" # For `find` this is the value in size I am looking for files to be larger than
# Location where I store the output to `find` to process that file further later on.
TEMP="/tmp/drops-output.txt"
Next I rm the tmp file and touch a new one.
I will then cd into
DEST=/Users/$USER/Dropbox/Public/drops
Perform a quick conditional check to make sure that I am working where I want to be,
with all my values as variables, I could mess up easily and not be working where I
thought I would be.
# Conditional check: is the current directory the one I want to be the working directory?
if [ "$(pwd)" = "${DEST}" ]; then
echo -e "Destination and current working directory are equal, this is good!:\n $(pwd)\n"
fi
The meat of step one is the `find` command
# Use `find` to locate a subset of files that are larger than a certain size
# save that to a temp file and process it. I believe this could all be done in
# one find command with -exec or similar but I can't figure it out
find . -type f -size "${SIZE}" -exec ls -lh {} \; >> "$TEMP"
Inside $TEMP will be a data set that looks like this:
-rw-r--r--@ 1 me staff 61K Dec 28 2009 /Users/me/Dropbox/Public/drops/12.28.09/wor-10e619e1-120407.png
-rw-r--r--@ 1 me staff 230K Dec 30 2009 /Users/me/Dropbox/Public/drops/12.30.09/hijack-loop-d6250496-153355.pdf
-rw-r--r--@ 1 me staff 49K Dec 31 2009 /Users/me/Dropbox/Public/drops/12.31.09/mt-5a819185-180538.png
The trouble is that not all file names are free of spaces, though I have done all I can to make sure variables are quoted
and wrapped in parens or braces or quotes where applicable.
With the results in /tmp I run:
# Number of results located as a result of the find `command` above
RESULTS=$(wc -l "$TEMP" | awk '{print $1}')
echo -e "Located: [$RESULTS] total files greater than or equal to $SIZE\n"
# With a result set found via `find`, now use awk to print out the sorted list of file
# sizes and paths.
echo -e "SIZE DATE FILE PATH"
#awk '{print "["$5"] ", $9, $10}' < "$TEMP" | sort -n
awk '{for(i=5;i<=NF;i++) {printf $i " "} ; printf "\n"}' "$TEMP" | sort -n
With the changes to awk from how I had it originally, my result now looks like this:
751K Oct 21 19:00 ./10.21.14/netflix-67-190039.png
760K Sep 14 19:07 ./01.02.15/logos/RCA_old_logo.jpg
797K Aug 21 03:25 ./08.21.14/girl-88-032514.zip
916K Sep 11 21:47 ./09.11.14/small-shot-4d-214727.png
I want it to look like this:
SIZE FILE PATH
========================================
751K ./10.21.14/netflix-67-190039.png
760K ./01.02.15/logos/RCA_old_logo.jpg
797K ./08.21.14/girl-88-032514.zip
916K ./09.11.14/small-shot-4d-214727.png
# All Done
if [ "$?" -ne "0" ]; then
echo "find of drop files larger than $SIZE completed without errors.\n"
exit 1
fi
Original Post to Stack prior to gaining some new information leading to new questions…
Original Post is below, given new information, I tried some new tactics and have left myself with the above script and info.
I have a simple script, Mac OS X, it performs a find on a dir and locates all files of type file and of size greater than +SIZE
These are then appended to a file via >>
From there, I have a file that essentially contains a ls -la listing, so I use awk to get to the file size and the file name with this command:
# With a result set found via `find`, now use awk to print out the sorted list of file
# sizes and paths.
echo -e "SIZE FILE PATH"
awk '{print "["$5"] ", $9, $10}' < "$TEMP" | sort -n
All works as I want it to, but I get some filename truncation right at the above code. The entire file is around 30 lines, I have pinned it to this line. I think if I throw in a different Internal Field Sep that would fix it. I could use \t as there can't be a \t in Mac OS X filenames.
I thought it was just quoting, but I can't seem to see where if that is the case. Here is a sample of the data returned, usually I get about 50 results. The first one I stuffed in this file has filename truncation:
[1.0M] ./11.26.14/Bruna Legal
[1.4M] ./12.22.14/card-88-082636.jpg
[1.6M] ./12.22.14/thrasher-8c-082637.jpg
[11M] ./01.20.15/td-6e-225516.mp3
Bruna Legal is "Bruna Legal Name.pdf" on the filesystem.
You can avoid parsing the output of the ls command and do the whole job with find, using its -printf action (a GNU find extension), like:
find /tmp -maxdepth 1 -type f -size +4k -printf "%kKB %f\n" 2>/dev/null |
sort -nrk1,1
In my example it outputs every file that is bigger than 4 kilobytes. The issue is that the find command cannot print formatted output with the size in MB. In addition the numeric ordering does not work for me with square brackets surrounding the number, so I omit them. In my test it yields:
140KB +~JF7115171557203024470.tmp
140KB +~JF3757415404286641313.tmp
120KB +~JF8126196619419441256.tmp
120KB +~JF7746650828107924225.tmp
120KB +~JF7068968012809375252.tmp
120KB +~JF6524754220513582381.tmp
120KB +~JF5532731202854554147.tmp
120KB +~JF4394954996081723171.tmp
24KB +~JF8516467789156825793.tmp
24KB +~JF3941252532304626610.tmp
24KB +~JF2329724875703278852.tmp
16KB 578829321_2015-01-23_1708257780.pdf
12KB 575998801_2015-01-16_1708257780-1.pdf
8KB adb.log
EDIT: I've noticed that %k is not accurate enough, so you can use %s to print the size in bytes and convert it to KB or MB using awk, like:
find /tmp -maxdepth 1 -type f -size +4k -printf "%s %f\n" 2>/dev/null |
sort -nrk1,1 |
awk '{ $1 = sprintf("%.2fKB", $1 / 1024) } { print }'
It yields:
136.99KB +~JF7115171557203024470.tmp
136.99KB +~JF3757415404286641313.tmp
117.72KB +~JF8126196619419441256.tmp
117.72KB +~JF7068968012809375252.tmp
117.72KB +~JF6524754220513582381.tmp
117.68KB +~JF7746650828107924225.tmp
117.68KB +~JF5532731202854554147.tmp
117.68KB +~JF4394954996081723171.tmp
21.89KB +~JF8516467789156825793.tmp
21.89KB +~JF3941252532304626610.tmp
21.89KB +~JF2329724875703278852.tmp
14.14KB 578829321_2015-01-23_1708257780.pdf
10.13KB 575998801_2015-01-16_1708257780-1.pdf
4.01KB adb.log
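Where find has no -printf action (it is a GNU extension, and the question mentions Mac OS X), a similar listing can be sketched with wc -c supplying the byte count. The function name list_large_files and its two parameters are assumptions for illustration, as is the k suffix for -size, which GNU and BSD find accept; file names are assumed to contain no tabs or newlines, while spaces are fine.

```shell
#!/bin/bash
# Hypothetical helper: list files larger than a threshold, largest first,
# as "<size in KB><TAB><path>". Uses wc -c instead of GNU find's -printf.
list_large_files() {
    local dir=${1:-.} min_kb=${2:-250}
    find "$dir" -type f -size +"${min_kb}k" -exec sh -c '
        for f in "$@"; do
            # wc -c < file prints only the byte count (possibly space padded)
            printf "%s\t%s\n" "$(wc -c < "$f" | tr -d " ")" "$f"
        done' sh {} + |
        sort -rn |
        awk -F'\t' '{ printf "%.2fKB\t%s\n", $1 / 1024, $2 }'
}
```

Because the size and the path are separated by a tab and awk splits only on tabs, a name such as "Bruna Legal Name.pdf" comes through untruncated.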

Resources