Loop over filenames, rename them via a condition - bash

How do I loop over separate filenames and rename them?
The "task/condition" is:
Cut the first 5 characters and the last 4 characters.
e.g. I have these files:
1212erertugg.jpg
14rtzuzuiopo.jpg
tz7878nhmnop.jpg
etc...
The result should look like this:
rertugg
uzuiopo
8nhmnop

Use parameter expansion to extract the substrings:
#!/bin/bash
for file in 1212erertugg.jpg 14rtzuzuiopo.jpg tz7878nhmnop.jpg ; do
    substr=${file:5}          # drop the first 5 characters
    substr=${substr:0:-4}     # drop the last 4 characters; a negative length needs bash 4.2+
    mv -- "$file" "$substr"
done
You may need to check that you're not overwriting an existing file, either an original one or one the script itself created in an earlier iteration.
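A minimal sketch of that check, wrapped in a hypothetical trim_rename function (the name and the skip-on-conflict policy are assumptions, not part of the answer). It trims each name the same way as the loop above, but leaves a file alone when its new name already exists or when the name is too short to trim. It assumes the files are in the current directory, since ${file:5} would otherwise cut into the directory part of a path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: trim_rename FILE...
# Drops the first 5 and last 4 characters of each name, but skips any file
# whose new name already exists. ${new:0:-4} needs bash 4.2 or later.
trim_rename() {
    local file new
    for file in "$@"; do
        # 9 characters or fewer would leave nothing (or fail) after trimming
        if (( ${#file} <= 9 )); then
            echo "skipping $file: name too short" >&2
            continue
        fi
        new=${file:5}          # drop the first 5 characters
        new=${new:0:-4}        # drop the last 4 characters (".jpg")
        if [[ -e $new ]]; then
            echo "skipping $file: $new already exists" >&2
            continue
        fi
        mv -- "$file" "$new"
    done
}
```

Call it as trim_rename *.jpg; skipped files are reported on stderr.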

Related

How Can I Loop Edit Multiple Files in Bash script?

I have 40 csv files that I need to edit. 20 have matching format and the names only differ by one character, e.g., docA.csv, docB.csv, etc. The other 20 also match and are named pair_docA.csv, pair_docB.csv, etc.
I have the code written to edit and combine docA.csv and pair_docA.csv, but I'm struggling to write a loop that takes both files, edits them, combines them under the name combinedA.csv, and then goes on to the next pair.
Can anyone help with my rudimentary bash scripting? Here's what I have so far. I tried a single for loop, and now I'm trying 2 (probably 3) for loops, though I'd prefer to keep it to a single loop.
set -x
DIR=/path/to/file/location
for file in `ls $DIR/doc?.csv`
do
#code to edit the doc*.csv files ie $file
done
for pairdoc in `ls $DIR/pair_doc?.csv`
do
#code to edit the pair_doc*.csv files, i.e. $pairdoc
done
#still need to combine the files. I have the join written for a single iteration,
#but how do I loop the code to save each join as a different file corresponding
#to combined*.csv
Something along these lines:
#!/bin/bash
dir=/path/to/file/location
cd "$dir" || exit
for file in doc?.csv; do
pair=pair_$file
# "${file#doc}" deletes the prefix "doc"
combined=combined_${file#doc}
cat "$file" "$pair" > "$combined"   # ">" so a re-run doesn't append duplicate rows
done
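On principle, ls shouldn't be used in a shell script to iterate over files: its output is meant for people to read, word splitting breaks names that contain spaces or other special characters, and a glob such as doc?.csv already yields the exact list. ls is intended for interactive use and is almost never needed within a script. Also, all-uppercase variable names shouldn't be used for ordinary variables, since they may collide with the shell's own variables or with environment variables.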
Below is a version without changing the directory.
#!/bin/bash
dir=/path/to/file/location
for file in "$dir/"doc?.csv; do
basename=${file#"$dir/"}
pair=$dir/pair_$basename
combined=$dir/combined_${basename#doc}
cat "$file" "$pair" > "$combined"   # ">" so a re-run doesn't append duplicate rows
done
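A slightly more defensive sketch of the same loop, as a hypothetical combine_pairs function: it skips a docX.csv that has no pair_docX.csv, does nothing when no file matches (nullglob), writes with > so re-running doesn't duplicate rows, and names the output combinedX.csv as in the question. cat is still only a placeholder for the real edit/join step.

```bash
#!/usr/bin/env bash
# Hypothetical helper: combine_pairs DIR
# For every DIR/docX.csv with a matching DIR/pair_docX.csv, writes
# DIR/combinedX.csv. "cat" stands in for the real edit/join commands.
# The body runs in a subshell ( ... ) so "shopt -s nullglob" doesn't leak.
combine_pairs() (
    dir=$1
    shopt -s nullglob                 # no docX.csv at all -> loop is skipped
    for file in "$dir"/doc?.csv; do
        base=${file##*/}              # e.g. docA.csv
        pair=$dir/pair_$base          # e.g. pair_docA.csv
        if [[ ! -f $pair ]]; then
            echo "no pair file for $base, skipping" >&2
            continue
        fi
        # ">" rather than ">>", so running the script twice doesn't duplicate rows
        cat -- "$file" "$pair" > "$dir/combined${base#doc}"
    done
)
```

Usage: combine_pairs /path/to/file/location.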
This might work for you (GNU parallel):
parallel cat {1} {2} \> join_{1}_{2} ::: doc{A..T}.csv :::+ pair_doc{A..T}.csv
Replace cat with your own commands, where {1} stands for a docX.csv file and {2} for the matching pair_docX.csv file.
N.B. X stands for the letters A through T. As written, each output file is named join_docX.csv_pair_docX.csv.

Using brace expansion to move files on the command line

I have a question concerning why this doesn't work. Probably, it's a simple answer, but I just can't seem to figure it out.
I want to move a couple of files I have. They all have the same filename (let's say file1) but they are in different directories (let's say /tmp/dir1, dir2 and dir3). If I were to move them individually, I could do something like:
mv /tmp/dir1/file1 /tmp
That works. However, I have multiple directories, the files are all going to end up in the same place, and I don't want to overwrite anything. So I tried something like this:
mv /tmp/{dir1,dir2,dir3}/file1 /tmp/file1.{a,b,c}
When I try this I get:
/tmp/file1.c is not a directory
Just to clarify, this also works:
mv /tmp/dir1/file1 /tmp/file1.c
Pretty sure this has to do with brace expansion but not certain why.
Thanks
Prefix the command with echo to see how the shell expands it:
$ echo mv /tmp/{dir1,dir2,dir3}/file1 /tmp/file1.{a,b,c}
mv /tmp/dir1/file1 /tmp/dir2/file1 /tmp/dir3/file1 /tmp/file1.a /tmp/file1.b /tmp/file1.c
Now you can see that your command is not what you want, because in a mv command, the destination (directory or file) is the last argument.
Unfortunately, that's not how shell expansion works.
You'll probably have to use an associative array.
#!/bin/bash
declare -A MAP=( [dir1]=a [dir2]=b [dir3]=c )
for dir in "${!MAP[@]}"; do
    echo mv "/tmp/$dir/file1" "/tmp/file1.${MAP[$dir]}"
done
You get the following output when you run it:
mv /tmp/dir2/file1 /tmp/file1.b
mv /tmp/dir3/file1 /tmp/file1.c
mv /tmp/dir1/file1 /tmp/file1.a
As in many other languages, the key order of an associative array is not guaranteed.
${!MAP[@]} expands to all the keys, while ${MAP[@]} expands to all the values.
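If the order matters, one option (an assumption on my part, not from the answer) is two indexed arrays matched by position, which are always walked in index order. print_moves is a hypothetical name; like the loop above, it only echoes the commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print_moves BASE
# Two indexed arrays, matched by position, keep the pairs in a fixed order.
# Dry run: it only prints the mv commands.
print_moves() {
    local base=$1 i
    local -a dirs=(dir1 dir2 dir3)
    local -a exts=(a b c)
    for i in "${!dirs[@]}"; do          # indices 0 1 2, in order
        echo mv -- "$base/${dirs[i]}/file1" "$base/file1.${exts[i]}"
    done
}
```

Run print_moves /tmp, check the output, then remove the echo.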
Your syntax of /tmp/{dir1,dir2,dir3}/file1 expands to /tmp/dir1/file1 /tmp/dir2/file1 /tmp/dir3/file1. This is similar to the way * expansion works: the shell does not run your command once for each possible combination. It runs the command once, with your one word expanded into as many words as required.
Perhaps instead of a/b/c you could differentiate them with the actual number of the dir they came from?
$: for d in 1 2 3
do echo mv /tmp/dir$d/file1 /tmp/file1.$d
done
mv /tmp/dir1/file1 /tmp/file1.1
mv /tmp/dir2/file1 /tmp/file1.2
mv /tmp/dir3/file1 /tmp/file1.3
When happy with it, take out the echo.
A relevant point - brace expansion is not a wildcard. It has nothing to do with what's on disk. It just creates strings.
So, if you create a bunch of files named with single letters or digits, echo ? will match and list them all, but only the ones actually present: if there are files for vowels but not consonants, only the vowels will show. But if you say echo {foo,bar,nope}, it will output foo bar nope regardless of whether any of those exist as files or directories.
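To see the difference on disk, the sketch below (a hypothetical demo_brace_vs_glob function) works in a temporary directory so nothing real is touched: it creates only "vowel" files and compares a glob with a brace expansion.

```bash
#!/usr/bin/env bash
# Hypothetical demo: a glob lists only existing names, while brace
# expansion just generates words. Runs in a subshell inside a temp dir.
demo_brace_vs_glob() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    touch a e i                 # only "vowel" files exist
    echo glob: ?                # glob: matches what is on disk
    echo brace: {a,b,nope}      # brace: generates all three words anyway
    cd / && rm -r -- "$dir"
)
```

Running demo_brace_vs_glob prints "glob: a e i" followed by "brace: a b nope".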

Wildcard on mv folder destination

I'm writing a small piece of code that checks a specific folder for .mov files over 4 GB and writes their names (without the extension) to a log.txt file. I then read the names line by line in a while loop, which triggers some archiving and copying commands.
Consider a file named abcdefg.mov (new) and a corresponding folder elsewhere named abcdefg_20180525 (underscore + timestamp) that also contains a file named abcdefg.mov (old).
When reading a filename from log.txt, I strip the extension, store "abcdefg" in $in1, and use that variable to locate a folder elsewhere whose name starts with that string.
My problem is with how the mv command seems to support a wild card in the "source" string, but not in the "destination" string.
For example, I can write:
mv -f /Volumes/Myshare/SourceVideo/$in1*/$in1.mov /Volumes/Myshare/Archive
However, a wildcard in the destination doesn't work the same way. For example:
mv -f /Volumes/Myshare/Processed/$in1.mov Volumes/Myshare/SourceVideo/$in1*/$in1.mov
Is there an easy fix here that doesn't involve using another method?
Cheers for any help.
mv accepts a single destination path. Suppose that $in1 is abcdefg, and that both abcdefg_20180525 and abcdefg_20180526 already contain an abcdefg.mov. Then the command
mv -f /dir1/$in1.mov /dir2/$in1*/$in1.mov
expands to a single mv call with three paths:
mv -f /dir1/abcdefg.mov /dir2/abcdefg_20180525/abcdefg.mov /dir2/abcdefg_20180526/abcdefg.mov
With more than one source, mv treats the last argument as a target directory. Here the last argument is a regular file, so mv stops with a "target ... is not a directory" error and moves nothing. If the last argument were a directory, all the other arguments would be moved into it, which is rarely what you want either.
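To see what mv does with such an argument list, you can try it on throwaway files. The hypothetical demo_multi_source_mv below creates three files in a temporary directory and calls mv with two sources and a regular file as the last argument.

```bash
#!/usr/bin/env bash
# Hypothetical demo: mv with several sources requires the last argument to
# be a directory; with a regular file there, it refuses and moves nothing.
demo_multi_source_mv() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    touch new.mov old1.mov old2.mov
    # two sources, and a last argument that is a regular file
    if mv -f new.mov old1.mov old2.mov 2>/dev/null; then
        echo "mv succeeded"
    else
        echo "mv refused"
    fi
    ls                          # all three files are still there
    cd / && rm -r -- "$dir"
)
```

Running it prints "mv refused" followed by the three unchanged file names.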
Build an exact list of source and destination paths and move each file explicitly instead of relying on wildcards.
This is what I would probably do: generate a list of results, with full paths, in a file, then read those results in another function. I could have used arrays, but I wanted to keep it simple. At the bottom of the script, one function call scans FOLDER for files with the extension EXT (mp4, matched case-insensitively) that are larger than SIZE_CHECK_LIMIT_MB and writes their paths to a file in /tmp; a second function then reads that file and performs some operation (mv, etc.) on each entry. If functions are confusing, you can remove the function name() { } wrappers and the calls at the bottom, and it becomes a normal script again. Functions are really handy; learn to love them!
#!/usr/bin/env bash
readonly SIZE_CHECK_LIMIT_MB="10M"
readonly FOLDER="/tmp"
readonly DESTINATION_FOLDER="/tmp/archive"
readonly SAVE_LIST_FILE="/tmp/$(basename "$0")-save-list.txt"
readonly EXT="mp4"
readonly CASE="-iname" # change to -name to match the extension's case exactly
# Write the full path of every matching file over the size limit to SAVE_LIST_FILE.
function find_files_too_large() {
    : > "${SAVE_LIST_FILE}"   # truncate the list left over from a previous run
    find "${FOLDER}" -maxdepth 1 -type f "${CASE}" "*.${EXT}" -size "+${SIZE_CHECK_LIMIT_MB}" -print0 |
    while IFS= read -r -d '' line ; do
        echo "FOUND => $line"
        echo "$line" >> "${SAVE_LIST_FILE}"
    done
}
# Read the list back and act on each path.
function archive_large_files() {
    local read_file="${SAVE_LIST_FILE}"
    local write_folder="$DESTINATION_FOLDER"
    if [ ! -s "${read_file}" ] ;then   # list is missing or empty
        echo "No work to be done ... "
        return
    fi
    while IFS= read -r line ;do
        echo mv "$line" "$write_folder" ;sleep 1   # dry run: remove "echo" to actually move
    done < "${read_file}"
}
# MAIN (this is where the script starts): we just call two functions.
find_files_too_large
archive_large_files
It might be easier, I think, to rename the files after the folder name to begin with, so abcdefg.mov would become abcdefg_timestamp.mov. I can easily strip the timestamp from the filename after it's copied to the right location. I was hoping I just had a small syntax issue, but I think there's no easy way of doing what I thought I could...
I think you have a basic misunderstanding of how wildcards work here. The mv command doesn't support wildcards at all; the shell expands all wildcards into lists of matching files before they get passed to the mv command as arguments. Furthermore, the mv command doesn't know whether the arguments it got came from wildcards or not, and the shell doesn't know anything about what the command is going to do with them. For instance, if you run the command grep *, the grep command just gets a list of the names of files in the current directory as arguments, and will treat the first of them as a regex pattern ('cause that's what the first argument to grep is) to search the rest of the files for. If you ran mv * (note: don't do this!), it would interpret all but the last filename as sources, and the last one as the destination.
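A quick way to convince yourself of this is to replace mv with a stand-in that only reports what it received. show_args and demo_expansion are hypothetical helpers; the demo works in a temporary directory.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for mv: reports how many arguments it received and
# what they were. It never sees the "*" itself, only the expanded names.
show_args() {
    printf '%s args:' "$#"
    printf ' [%s]' "$@"
    echo
}
demo_expansion() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    touch clip1.mov clip2.mov
    show_args *.mov             # the shell has already expanded *.mov
    cd / && rm -r -- "$dir"
)
```

demo_expansion prints "2 args: [clip1.mov] [clip2.mov]".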
I think there's another source of confusion as well: when the shell expands a string containing a wildcard, it tries to match the entire thing against existing files and/or directories. So when you use Volumes/Myshare/SourceVideo/$in1*/$in1.mov, it looks for an already-existing file in a matching directory; as I understand it, the file isn't there yet, so there's no match. In that case (with bash's default settings), the shell passes the raw, unexpanded wildcard-containing string to mv as an argument; mv looks for that exact name, doesn't find it, and gives you an error.
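The same stand-in idea shows the no-match case. This sketch (hypothetical count_args and demo_unmatched helpers, in a scratch directory) creates a folder with no .mov in it: the full pattern is passed through unchanged, the trailing-slash pattern matches the folder, and with bash's nullglob option enabled an unmatched pattern disappears instead.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in that prints its argument count and arguments.
count_args() { echo "$# arg(s): $*"; }
demo_unmatched() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    mkdir abcdefg_20180525          # folder exists, but holds no .mov yet
    count_args abcdefg*/abcdefg.mov # no match: pattern passed through literally
    count_args abcdefg*/            # trailing slash: matches the folder itself
    shopt -s nullglob               # only affects this subshell
    count_args abcdefg*/abcdefg.mov # nullglob: unmatched pattern vanishes
    cd / && rm -r -- "$dir"
)
```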
(BTW, should there be a "/" at the front of that pattern? I assume so below.)
If I understand the situation correctly, you might be able to use this:
mv -f /Volumes/Myshare/Processed/$in1.mov /Volumes/Myshare/SourceVideo/$in1*/
Since the filename isn't supplied in the second string, it doesn't look for existing files by that name, just directories with the right prefix; mv will automatically retain the filename from the source.
However, I'll echo @Sergio's warning about chaos from multiple matches. In this case, it won't overwrite files (well, it might, but for other reasons), but if it finds multiple matching target directories, it'll move all but the last matching directory into the last one (along with the file you meant to move). You say you're 100% certain this won't be a problem, but in my experience that means there's at least a 50% chance that something you'd never have thought of will make it happen anyway. For instance, is it possible that $in1 could wind up empty, or contain a space, or...?
Speaking of spaces, I'd also recommend double-quoting all variable references. You want the variables inside double-quotes, but the wildcards outside them (or they won't be expanded), like this:
mv -f "/Volumes/Myshare/Processed/$in1.mov" "/Volumes/Myshare/SourceVideo/$in1"*/
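Building on that, a minimal sketch that refuses to guess: it collects the matches in an array and only moves the file when exactly one folder matches. The function name and its three arguments are hypothetical, and the name_* pattern (with an underscore) is an assumption based on the abcdefg_20180525 naming in the question. It also rejects an empty name, which would otherwise match every folder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move_into_single_match SOURCE_FILE PARENT_DIR NAME
# Moves SOURCE_FILE into PARENT_DIR/NAME_*/ only if exactly one folder matches.
# Runs in a subshell so nullglob doesn't leak into the caller.
move_into_single_match() (
    src=$1 parent=$2 name=$3
    [[ -n $name ]] || { echo "empty name, refusing" >&2; exit 1; }
    shopt -s nullglob
    matches=("$parent/$name"_*/)    # variables quoted, wildcard outside quotes
    if (( ${#matches[@]} != 1 )); then
        echo "expected 1 folder for $name, found ${#matches[@]}" >&2
        exit 1
    fi
    mv -f -- "$src" "${matches[0]}"
)
```

For the question's layout that would be something like move_into_single_match "/Volumes/Myshare/Processed/$in1.mov" /Volumes/Myshare/SourceVideo "$in1".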

Rename batch of specific files using bash

Say I have a folder which contains files like this: a constant prefix and then an underscore and some description which is different for every file:
constantnamehere_description1.doc
constantnamehere_description2.doc
.
.
etc
Here description1, description2, etc. just stand for the different descriptions, not for actual numbers 1, 2, etc.
How can I rename these files to just this?
constantnamehere1.doc
constantnamehere2.doc
.
.
etc
Here the numbers 1, 2, etc. are the actual sequential endings I want the files to have after renaming.
The sequential ending (1,2,3,...,end) is very important.
So far I have tried:
for i in *.doc; do mv "$i" "{i/_*.doc/ .doc}"; done
Example of actual file names:
1003407_cc_1.vtk
1003407_cc_2.vtk
1003407_cc_3.vtk
1003407_cv.left.right.vtk
1003407_thalamo_frontal.left.vtk
I want them to become:
1003407_1.vtk
1003407_2.vtk
1003407_3.vtk
1003407_4.vtk
1003407_5.vtk
To make it extremely clear: I want everything after the first underscore to be removed and replaced with sequential numbers, keeping the .vtk extension of the file.
Using an answer to Capturing Groups From a Grep RegEx, we can build a regex for these file names and then rename using the captured groups:
$ regex='([^_]*)_[^0-9]*([0-9]*)\.([a-z]*)'
$ for f in *.doc
do
[[ $f =~ $regex ]] || continue
echo "mv $f --> ${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
done
The regex says: take everything up to the first _, then skip characters until a digit is found. Capture that run of digits, then expect a dot (escaped as \., since a bare . matches any character) followed by the extension.
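Use rename (this is the Perl-based rename, sometimes installed as prename or file-rename; the util-linux rename takes different arguments):
i=1
for file in *_*.vtk
do
rename "s/_.*\./_${i}./" "$file"
i=$(( i + 1 ))
done
This replaces everything from the first underscore up to the last . with an underscore and the running number, so 1003407_cc_1.vtk becomes 1003407_1.vtk and 1003407_cv.left.right.vtk becomes 1003407_4.vtk. It is applied to every file matching the *_*.vtk pattern.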
EDIT: Solution modified according to modified question.
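I solved it like this (taking the prefix from each name, up to the first underscore, and numbering from 1 as in the desired output):
i=1
for file in *.vtk; do mv -- "$file" "${file%%_*}_${i}.vtk"; i=$((i+1)); done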
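A variant of that one-liner with a few guards, as a hypothetical number_vtk_files function: it does nothing if no file matches, leaves a file in place if it already has its target name, and stops rather than overwrite an existing file. Like the loop above, it assumes the files are in the current directory and numbers them in glob (alphabetical) order.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number_vtk_files
# Renames PREFIX_anything.vtk to PREFIX_N.vtk (N = 1, 2, ...) in the
# current directory, where PREFIX is everything before the first "_".
number_vtk_files() {
    local i=1 file new
    for file in *_*.vtk; do
        [[ -e $file ]] || continue          # the pattern matched nothing
        new="${file%%_*}_${i}.vtk"          # keep everything before the first "_"
        if [[ $new != "$file" ]]; then
            if [[ -e $new ]]; then
                echo "not overwriting $new; stopping" >&2
                return 1
            fi
            mv -- "$file" "$new"
        fi
        i=$((i + 1))
    done
}
```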

How to rename files with ordering using Bash {thisfile.jpg -> newfile1.jpg, thatfile.jpg -> newfile2.jpg}

Let's say I have 100 jpg files.
DSC_0001.jpg
DSC_0002.jpg
DSC_0003.jpg
....
DSC_0100.jpg
And I want to rename them like
summer_trip_1.jpg
summer_trip_2.jpg
summer_trip_3.jpg
.....
summer_trip_100.jpg
So I want these properties to be modified:
1. filename
2. order of the file(as the order by date files were created)
How could I achieve this by using bash? Like:
for file in *.jpg ; do mv blah blah blah ; done
Thank you!
It's very simple: have a variable and increment it at each step.
Example:
cur_number=1
prefix="summer_trip_"
suffix=""
for file in *.jpg ; do
echo mv "$file" "${prefix}${cur_number}${suffix}.jpg" ;
let cur_number=$cur_number+1 # or : cur_number=$(( $cur_number + 1 ))
done
and once you think it's ready, take out the echo to let the mv occur.
If you prefer them to be ordered by file date (useful, for example, when mixing photos from several cameras, or if the numbers on your camera rolled over):
change
for file in *.jpg ; do
into
for file in $( ls -1t *.jpg ) ; do
Note that the second example only works if your original filenames contain no spaces (or other unusual characters), which is the case for almost every camera I know of. Also, ls -t lists the newest file first; use ls -1tr if the oldest photo should become number 1.
Finally, instead of ${cur_number} you could use $(printf '%04d' "${cur_number}") so that the new number has leading zeros, which makes sorting much easier.
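A sketch of the counter loop with that padding, using printf -v to write the padded name straight into a variable instead of spawning a subshell. number_photos is a hypothetical name, the summer_trip_ prefix and four-digit width come from the question, and it is a dry run like the original (it only echoes the mv commands).

```bash
#!/usr/bin/env bash
# Hypothetical helper: number_photos FILE...
# Prints "mv -- FILE summer_trip_NNNN.jpg" for each file, counting from 1.
number_photos() {
    local n=1 file name
    for file in "$@"; do
        printf -v name 'summer_trip_%04d.jpg' "$n"   # zero-padded to 4 digits
        echo mv -- "$file" "$name"                   # dry run: drop "echo" to rename
        n=$((n + 1))
    done
}
```

Run number_photos DSC_*.jpg and remove the echo once the output looks right.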
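How about using rename?
rename DSC_ summer_trip_ *.jpg
This is the util-linux rename syntax (rename FROM TO FILES); with the Perl rename, the equivalent is rename 's/^DSC_/summer_trip_/' *.jpg. Either way, the original zero-padded numbers are kept (summer_trip_0001.jpg), and the files are not reordered by date. See the rename man page.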
This works if your original numbers are all zero-padded to the same length (so the glob sorts them in numeric order):
i=0; for f in DSC_*.jpg; do mv "$f" "summer_trip_$((++i)).jpg"; done
If I understand your goal correctly:
So I want these properties to be modified: 1. filename 2. order of the file(as the order by date files were created)
If the numbers of the renamed files should increase in the order the files were created, use the following for loop:
for file in $(ls -t -r *.jpg); do
-t sorts by mtime (last modification time, which is not exactly the creation time), newest first, and -r reverses that so the oldest file is listed first. This matters if the original .jpg numbers are not in the order the pictures were taken.
As mentioned previously, this won't work if file names contain whitespace. If your files have spaces in their names, try setting IFS before the for loop:
IFS=$'\n'
This makes the shell split the output of ls on newlines only (not on spaces or tabs). It still fails if a file name contains a newline (rather exotic, IMHO :). Changing IFS can have subtle effects later in your script, so save the old value and restore it after the loop.
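If you'd rather not parse ls at all, a pure-bash sketch can order the files with the [[ a -nt b ]] test, which compares modification times, in a simple insertion sort. jpgs_oldest_first is a hypothetical name; the approach is O(n^2), which should be fine for a few hundred photos, and names containing spaces survive because no word splitting is involved.

```bash
#!/usr/bin/env bash
# Hypothetical helper: jpgs_oldest_first
# Prints the *.jpg files in the current directory, oldest mtime first,
# one per line. Subshell body, so nullglob doesn't leak to the caller.
jpgs_oldest_first() (
    shopt -s nullglob
    sorted=()
    for f in *.jpg; do
        # insert f before the first already-sorted file that is newer than it
        i=0
        while (( i < ${#sorted[@]} )) && ! [[ ${sorted[i]} -nt $f ]]; do
            i=$((i + 1))
        done
        sorted=("${sorted[@]:0:i}" "$f" "${sorted[@]:i}")
    done
    if (( ${#sorted[@]} )); then
        printf '%s\n' "${sorted[@]}"
    fi
)
```

To rename, read the list into an array with mapfile -t photos < <(jpgs_oldest_first) (bash 4+) and run the counter loop over "${photos[@]}"; a name containing a newline would still break that last step.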
