How do I change all filenames with a similar but not identical structure? - bash

Due to a variety of complex photo library migrations that had to be done using a combination of manual copying and importing tools that renamed the files, it seems I wound up with a ton of files with a similar structure. Here's an example:
2009-05-05 - 2009-05-05 - IMG_0486 - 2009-05-05 at 10-13-43 - 4209 - 2009-05-05.JPG
What it should be:
2009-05-05 - IMG_0486.jpg
The other files have the same structure, but obviously the individual dates and IMG numbers are different.
Is there any way I can do some command line magic in Terminal to automatically rename these files to the shortened/correct version?

I assume you may have sub-directories and want to find all files inside this directory tree.
This first code block (which you could put in a script) is "safe" (it renames nothing), but will help you see what would be done.
datep="[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"
dir="PUT_THE_FULL_PATH_OF_YOUR_MAIN_DIRECTORY"
while IFS= read -r file
do
    name="$(basename "$file")"
    [[ "$name" =~ ^($datep)\ -\ $datep\ -\ ([^[:space:]]+)\ -\ $datep.*[.](.+)$ ]] || continue
    date="${BASH_REMATCH[1]}"
    imgname="${BASH_REMATCH[2]}"
    ext="${BASH_REMATCH[3],,}"    # ,, lowercases the extension (requires bash 4+)
    dir_of_file="$(dirname "$file")"
    target="$dir_of_file/$date - $imgname.$ext"
    echo "$file"
    echo "  would be moved to..."
    echo "  $target"
done < <(find "$dir" -type f)
Make sure the output is what you want and are expecting. I cannot test on your actual files, and if this script does not produce results that are entirely satisfactory, I do not take any responsibility for hair being pulled out. Do not blindly let anyone (including me) mess with your precious data by copying and pasting code from the internet if you have no reliable, checked backup.
Once you are sure, decide if you want to take a chance on some guy's code written without any opportunity for testing, and replace the three consecutive lines beginning with echo with this:
mv "$file" "$target"
Note that file names have to match to a pretty strict pattern to be considered for processing, so if you notice that some files are not being processed, then the pattern may need to be modified.
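If you notice a particular file being skipped, you can test its name against the pattern in isolation; a minimal standalone check, using the example filename from the question:
datep="[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"
name="2009-05-05 - 2009-05-05 - IMG_0486 - 2009-05-05 at 10-13-43 - 4209 - 2009-05-05.JPG"
if [[ "$name" =~ ^($datep)\ -\ $datep\ -\ ([^[:space:]]+)\ -\ $datep.*[.](.+)$ ]]; then
    echo "match: date=${BASH_REMATCH[1]} img=${BASH_REMATCH[2]} ext=${BASH_REMATCH[3]}"
else
    echo "no match - the pattern needs adjusting for this name"
fi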

Assuming they are all the exact same structure, spaces and everything, you can use awk to split the names up using the spaces as break points. Here's a quick and dirty example:
#!/bin/bash
output=""
for file in /path/to/files/*; do
unset output #clear variable from previous loop
output="$(echo $file | awk '{print $1}')" #Assign the first field to the output variable
output="$output"" - " #Append with [space][dash][space]
output="$output""$(echo $file | awk '{print $5}')" #Append with IMG_* field
output="$output""." #Append with period
#Use -F '.' to split by period, and $NF to grab the last field (to get the extension)
output="$output""$(echo $file | awk -F '.' '{print $NF}')"
done
From there, something like mv "$file" "/path/to/files/$output" as a final line in the file loop will rename the file. I'd copy a few files into another folder to test with first, since we're dealing with file manipulation.
All the output-assigning lines can be consolidated into a single line as well, but it's less easy to read.
output="$(echo $file | awk '{print $1 " - " $5 "."}')""$(echo $file | awk -F '.' '{print $NF}')"
You'll still want a file loop, though.
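Putting it all together, a sketch of that loop with the consolidated line and a safety echo in front of mv (remove the echo once the output looks right; note the extension keeps its original case, so add a tr '[:upper:]' '[:lower:]' step if you want .jpg exactly):
#!/bin/bash
for file in /path/to/files/*; do
    name="$(basename "$file")"
    output="$(echo "$name" | awk '{print $1 " - " $5 "."}')$(echo "$name" | awk -F '.' '{print $NF}')"
    echo mv "$file" "/path/to/files/$output"
done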

Assuming that you want to build the new filename from the first date and the IMG_* name, you can run the following in the folder:
IFS=$'\n'
for file in *
do
printf "mv '$file' '"
printf '%s' $(cut -d" " -f1,4,5 <<< "$file")
printf "'.jpg"
done | sh
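If you'd rather inspect the generated commands before anything is executed, drop the | sh and write them to a script for review first (rename.sh here is just a hypothetical name):
IFS=$'\n'
for file in *
do
    printf "mv '%s' '" "$file"
    printf '%s' "$(cut -d" " -f1,4,5 <<< "$file")"
    printf "'.jpg\n"
done > rename.sh
# review rename.sh, then run it with: sh rename.sh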

Related

Batch create files with name and content based on input file

I am a macOS user trying to batch create a bunch of files. I have a text file with a column of several hundred terms/subjects, e.g.:
hydrogen
oxygen
nitrogen
carbon
etcetera
I want to programmatically fill a directory with text files generated from this subject list. For example, "hydrogen.txt" and "oxygen.txt" and so on, with each file created by iterating through the lines of my list_of_names.txt file. Some lines are one word, but other lines are two or three words (eg: "carbon monoxide"). This I have figured out how to do:
awk 'NF>0' list_of_names.txt | while read line; do touch "${line}.txt"; done
Additionally I need to create two lines of content within each of these files, and the content is both static and dynamic...
# filename
#elements/filename
...where in the example above the pound sign ("#") and "elements/" would be the same in all of the files created, but "filename" would be variable (eg: "hydrogen" for "hydrogen.txt" and "oxygen" for "oxygen.txt" etc). One further wrinkle is that if any spaces appear at all on the second line of content, there needs to be a trailing pound sign. For example:
# filename
#elements/carbon monoxide#
...although this last part is not a dealbreaker and I can use grep to modify list_of_names.txt such that phrases like "carbon monoxide" become "carbon_monoxide" and just deal with the repercussions of this later. (But if it is easy to preserve the spaces, I would prefer that.)
After a couple hours of searching and attempts to use sed, awk, and so on, I am stuck at a directory full of files with the correct filename.txt format, but I can't get further than this. Mostly I think my efforts are failing because the solutions I can find for doing something like this use commands I am not familiar with, and they are structured for GNU tools and don't execute correctly in Terminal on macOS.
I am amenable to processing this in multiple steps (ie make all of the files.txt first, then run a second step to populate the content of the files), or as a single command that makes the files and all of their content simultaneously ('simultaneously' from a human timescale).
My horrible pseudocode (IN CAPS) for how this would look as 2 steps:
awk 'NF>0' list_of_names.txt | while read line; do touch "${line}.txt"; done
awk 'NF>0' list_of_names.txt | while read line; OPEN "${line}.txt" AND PRINT "# ${line}\n#elements/${line}"; IF ${line} CONTAINS CHARACTER " " PRINT "#"; done
You could use a simple Bash loop and create the files in one shot:
#!/bin/bash
while read -r name; do                          # loop through input file content
    [[ $name ]] || continue                     # skip empty lines
    output=("# $name")                          # initialize the array with the first line
    trailing=
    [[ $name = *" "* ]] && trailing="#"         # name has spaces in it, so add a trailing #
    output+=("#elements/$name$trailing")        # second line, with the trailing # only when needed
    printf '%s\n' "${output[@]}" > "$name.txt"  # write the array content to the output file
done < list_of_names.txt
Doing it in awk:
awk '
NF {
    trailing = (/ / ? "#" : "")
    out = $0 ".txt"
    printf("# %s\n#elements/%s%s\n", $0, $0, trailing) > out
    close(out)
}
' list_of_names.txt
Doing the whole job in awk will yield better performance than in bash, which isn't really suited to processing text like this.
It seems to me that this should cover the requirements you've specified:
awk '
{
    out = $0 ".txt"
    printf "# %s\n#elements/%s%s\n", $0, $0, (/ / ? "#" : "") >> out
    close(out)
}
' list_of_subjects.txt
Though you could shrink it to a one-liner:
awk '{printf "# %s\n#elements/%s%s\n",$0,$0,(/ /?"#":"")>($0".txt");close($0".txt")}' list_of_subjects.txt
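As a quick sanity check, here is an illustrative run with two sample subjects from the question standing in for your real list:
printf '%s\n' hydrogen "carbon monoxide" > list_of_subjects.txt
awk '{printf "# %s\n#elements/%s%s\n",$0,$0,(/ /?"#":"")>($0".txt");close($0".txt")}' list_of_subjects.txt
cat "carbon monoxide.txt"
# # carbon monoxide
# #elements/carbon monoxide#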

grep files against a list only containing numbers

I have several files (~70000) that have numbers in the name; a couple of examples would be 991000_Metatissue.qsub.file and 828000_Metatissue.qsub.file. Then I have another file (files_failed.txt) with a bunch of numbers that I use with grep. The list looks like this:
4578000
458000
4582000
527000
5288000
5733000
653000
6548000
6663000
I have tried with: ls -1 *.qsub.file | grep -F -f files_failed.txt - and even doing this:
ls -1 *.qsub.file > files_to_submit.txt
grep -F -f files_failed.txt files_to_submit.txt
But I always got all the qsub.files...
grep -f is poorly implemented (see GNU bug 16305), so I recommend using awk instead:
find . -name '*_*.qsub.file' | awk -F_ '
    NR == FNR { failed["./" $1] = 1; next }  # first file: remember each failed number, with the ./ prefix find adds
    $1 in failed
' files_failed.txt /dev/stdin
This uses find to locate the files in question, piping them into awk. Before awk processes that stream, it reads files_failed.txt and stores the values in an associative array (aka dictionary or hash); while NR (the number of records read so far across all inputs) equals FNR (the record number within the current file), we are still reading the first file. Each number is stored with a leading ./ so it lines up with find's output. If the first column of a find line (the number of the file, since we delimited by _) is in that array, it was a failure. awk's default action for a pattern with no action block is to print the line, so you get a list of the failed files.
Note the lack of regular expressions! On a big directory, this is much faster than grep -F -f …, which itself is much faster than grep -f …, even assuming the aforementioned bug is fixed.
You should be using find and you need to modify your "patterns". Here is one way that should work:
# List all files ending in "qsub.file"
find . -name '*.qsub.file' |
# Add ./ and _ to each number to make the match exact
grep -F -f <(sed -e 's:^:./:' -e 's/$/_/' files_failed.txt)
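For example, the first three entries of the list above come out like this; grep -F then looks for those fixed strings in find's ./-prefixed output:
sed -e 's:^:./:' -e 's/$/_/' files_failed.txt | head -3
# ./4578000_
# ./458000_
# ./4582000_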
70000 files is too many for ls; you should use find instead.
And I prefer to invert the logic: list just what I want instead of listing everything and then filtering.
Something like
while read -r line; do find . -iname "${line}_Metatissue.qsub.file"; done < files_failed.txt
If you need the output in another file:
while read -r line; do find . -iname "${line}_Metatissue.qsub.file"; done < files_failed.txt >> files_to_submit.txt
You can use the script below (anchoring the pattern with ^ and _ so that 458000 does not also match 4580000):
ls -1 *.qsub.file > filelist.txt
while read -r pattern
do
    filefound=$(grep "^${pattern}_" filelist.txt)
    if [ "$filefound" != "" ]; then
        echo "File Found : $filefound"
    fi
done < files_failed.txt
Second option:
while read -r pattern
do
    find . -name "${pattern}_*.qsub.file" >> filefound.txt
done < files_failed.txt
All the matching files will be listed in filefound.txt.

Move files in S3 to folders based on filename

I have an S3 folder where files are staged from an application.
I need to move these files based on a specified folder structure using the filenames.
The files are named in a particular format:
s3://bucketname/staging/file1_YYYY_MM_DD_HH_MM_SS
s3://bucketname/staging/file1_YYYY_MM_DD_HH_MM_SS
I need to move them to s3 folders of this format:
s3://bucketname/file1/YYYY/MM/DD
I have the following code now to store all the filenames present in the staging folder in a file.
path=s3://bucketname/staging
count=`s3cmd ls $path | wc -l`
echo $count
if [[ $count -gt 0 ]]; then
list_files_to_move_s3=$(s3cmd ls -r $path | awk '{print $4}' > files_in_bucket.txt)
echo "exists"
else
echo "do not exist"
fi
I now need to read the filenames and move the files accordingly.
Can you please help.
You can parse the contents of files_in_bucket.txt with sed to produce the output you want:
---> cat tests3.txt
s3://bucketname/staging/file1_YYYY_MM_DD_HH_MM_SS
s3://bucketname/staging/file1_YYYY_MM_DD_HH_MM_SS
---> sed -r "s|^(s3://.*)/.*/(.*)_(.*)_(.*)_(.*)_.*_.*_.*$|\1/\2/\3/\4/\5|g" tests3.txt
s3://bucketname/file1/YYYY/MM/DD
s3://bucketname/file1/YYYY/MM/DD
--->
What's happening there is that sed parses each line of tests3.txt, with each bit inside parentheses saved as a capture group that can then be referenced in the substitution string as \1, \2, \3, etc. So it picks out the first bit up to the first slash, skips over the "staging" component, and then picks out the file and date portions of the filename.
Note that this assumes a very standardized layout of the filenames and your desired output.
Let me know if you have any questions about this or need further help.
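If you then want to perform the moves, a rough sketch (untested; it assumes files_in_bucket.txt from your snippet, and relies on s3cmd mv treating a trailing-slash destination as a prefix) could be:
while IFS= read -r src; do
    # derive the target prefix from the source key with the sed expression above
    dst=$(sed -r "s|^(s3://.*)/.*/(.*)_(.*)_(.*)_(.*)_.*_.*_.*$|\1/\2/\3/\4/\5|" <<< "$src")
    s3cmd mv "$src" "$dst/"   # trailing slash keeps the original object name under the new prefix
done < files_in_bucket.txt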

Rename numbered files using names from a list in another file

I have a folder containing books, and I have a file with the real name of each one. I renamed the books so that I can easily see whether they are ordered, say "00.pdf", "01.pdf" and so on.
I want to know if there is a way, using the shell, to match each line of the file, say "names", with each book. That is, match line i of the file with the book in position i in sort order.
<name-of-the-book-in-the-1-line> -> <book-in-the-1-position>
<name-of-the-book-in-the-2-line> -> <book-in-the-2-position>
.
.
.
<name-of-the-book-in-the-i-line> -> <book-in-the-i-position>
.
.
.
I'm doing this in Windows, using Total Commander, but I want to do it in Ubuntu, so I don't have to reboot.
I know about mv and rename, but I'm not as good as I'd like to be with regular expressions...
renamer.sh:
#!/bin/bash
for i in `ls -v | grep -Ev '(renamer.sh|names.txt)'`; do
    read name
    mv "$i" "$name.pdf"
    echo "$i" renamed to "$name.pdf"
done < names.txt
names.txt: (the line count must exactly equal the number of numbered files)
name of first book
second-great-book
...
explanation:
ls -v returns a naturally sorted file list
grep excludes this script's name and the input file so they don't get renamed
we cycle through the found file names, read a value from names.txt, and rename each target file to that value
For testing purposes, you can comment out the mv command:
#mv "$i" "$name"
And now, simply run the script:
bash renamer.sh
This loops through names.txt, builds each numbered filename from a counter (padded to two digits with printf, assigned to a variable with -v), then renames that file to the corresponding book title with mv. ((++i)) increments the counter for the next filename.
#!/bin/bash
i=0
while IFS= read -r line; do
    printf -v fname "%02d.pdf" "$i"
    mv "$fname" "$line.pdf"
    ((++i))
done < names.txt
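As with the first script, it's worth a dry run before touching anything; the same loop, only printing the planned renames:
#!/bin/bash
i=0
while IFS= read -r line; do
    printf -v fname "%02d.pdf" "$i"
    echo "would rename: $fname -> $line.pdf"
    ((++i))
done < names.txt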

Add file date to file name in bash

I'm looking for a programmatic way to add a file's date to the filename. I'm on a Mac, Yosemite (10.10).
Using Bash, I have put a fair amount of effort into this, but I'm just too new to get there yet. Here's what I have so far:
#!/bin/bash
(IFS='
'
for x in `ls -l | awk '{print$9" "$7"-"$6"-"$9}'`
do
currentfilename=$(expr "$x" : '\($substring\)')
filenamewithdate=$(expr "$x" : '.*\($substring\)')
echo "$currentfilename"
echo "$filenamewithdate"
done)
The idea here is to capture detailed ls output, use awk to capture the strings for the columns with the filename ($9) and the date fields ($7 and $6), then loop over that output to capture the previous filename and the new filename with the date, so I can mv the file from the old name to the new. The awk statement adds a space to separate the current filename from the new one. The echo statements are there now to test whether I am able to parse the awk output. I just don't know what to add for $substring to get the parts of the string that are needed.
I have much more to learn about Bash scripting, so I hope you'll bear with me as I learn. Any guidance?
Thanks.
Looking at the stat man page, you'd want:
for file in *; do
    filedate=$(stat -t '%Y-%m-%dT%H:%M:%S' -f '%Sm' "$file")   # %Sm: modification time as a string, formatted per -t
    newfile="$file-$filedate"
    echo "current: $file -> new: $newfile"
done
Adjust your preferred datetime format to your taste.
You could save a line with
for file in *; do
    newfile=$(stat -t '%Y-%m-%dT%H:%M:%S' -f '%N-%Sm' "$file")   # %N is the file's name
    echo "new: $newfile"
done
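If you'd rather keep the extension at the end (photo-2015-05-05.jpg instead of photo.jpg-2015-05-05), here is a sketch using parameter expansion, assuming every file actually has an extension:
for file in *; do
    filedate=$(stat -t '%Y-%m-%d' -f '%Sm' "$file")
    base="${file%.*}"    # name without the extension
    ext="${file##*.}"    # the extension alone
    echo mv "$file" "$base-$filedate.$ext"    # drop the echo once the preview looks right
done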
