I have ~100,000 .txt files named with the following pattern: file_[combination of letters and numbers]_[number from 1 to 400].txt. Three examples would be:
file_ab34_1.txt, file_ab35_1.txt, file_bg12_2.txt. What I want to do automatically is move all the files ending in _1 to a subfolder named 1/, all the files ending in _2 to a subfolder 2/, and so on.
I need a bash script that does this automatically rather than one by one.
Please back up your files before trying this answer.
I would try rename, like this:
rename --dry-run 's|(.*)_(\d+)\.txt|$2/$1_$2.txt|' *.txt
'file_ab34_1.txt' would be renamed to '1/file_ab34_1.txt'
'file_ab35_1.txt' would be renamed to '1/file_ab35_1.txt'
'file_bg12_2.txt' would be renamed to '2/file_bg12_2.txt'
The --dry-run just shows you what it would do without actually doing anything, which is great for testing before you run it for real.
It is basically Perl, and it is doing a substitution on the filename. The bones of it is to substitute like this:
s|something|something else|
Wherever there are parentheses on the left-hand side (called capture groups), they capture part of the match, and that part is then available as a numbered item in the replacement on the right-hand side: $1 represents whatever was captured by the first set of parentheses, $2 whatever was captured by the second, and so on.
You will likely need the -p option to create the output directories, so:
rename -p ....
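Putting both together, it would look something like this (hedged: -p/--mkpath is supported by the Perl rename shipped with most Linux distributions, but rename variants differ, so check man rename first):
rename -p --dry-run 's|(.*)_(\d+)\.txt|$2/$1_$2.txt|' *.txt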
If you get errors about the argument list being too long, you will probably need to use find and xargs, along these lines (untested):
find . -maxdepth 1 -name '*.txt' -print0 | xargs -0 -n 1000 rename ....
You can do this with just mv, using shell globbing to select the files (create the target subfolders first):
mkdir -p 1 2
mv -t 1 file_*_1.txt
mv -t 2 file_*_2.txt
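If the subfolders don't exist yet, a small loop can create them and move everything in one pass (a minimal sketch, assuming GNU mv for -t):
for n in $(seq 1 400); do
  mkdir -p "$n"
  # the glob is anchored at the end, so file_*_1.txt will not match file_xx_21.txt
  mv -t "$n" file_*_"$n".txt 2>/dev/null
done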
Related
I have a very long list of files stored in a text file (missing-files.txt) that I want to locate on my drive. These files are scattered across different folders on my drive. I want to find the closest available match for each one.
missing-files.txt
wp-content/uploads/2019/07/apple.jpg
wp-content/uploads/2019/08/apricots.jpg
wp-content/uploads/2019/10/avocado.jpg
wp-content/uploads/2020/04/banana.jpg
wp-content/uploads/2020/07/blackberries.jpg
wp-content/uploads/2020/08/blackcurrant.jpg
wp-content/uploads/2021/06/blueberries.jpg
wp-content/uploads/2021/01/breadfruit.jpg
wp-content/uploads/2021/02/cantaloupe.jpg
wp-content/uploads/2021/03/carambola.jpg
....
Here's my working bash code:
while read p;
do
  file="${p##*/}"
  /usr/local/bin/fd "${file}" | /usr/local/bin/rg "${p}" | /usr/bin/head -n 1 >> collected-results.txt
done <missing-files.txt
What's happening in my bash code:
I iterate through my list of files
I use the FD (https://github.com/sharkdp/fd) command to locate those files on my drive
I then pipe it to RIPGREP (https://github.com/BurntSushi/ripgrep) to filter the results and find the closest match. The match I'm looking for should have the same file and folder structure. I limit it to one result.
Finally, I store the result in another text file where I can later evaluate the list for the next step
Where I need help:
Is this the most efficient way to do this? I have over 2,000 files that I need to locate. I'm open to other solutions; this is just something I devised.
For some reason my code broke; it stopped returning results to "collected-results.txt". My guess is that it breaks somewhere around the second pipe, right after the FD command. I haven't set up any condition for when it encounters an error or can't find a file, so it's hard for me to determine.
Additional Information:
I'm using Mac, and running on Catalina
Clearly this is not my area of expertise
"Missing" sounds like they do not exist where expected.
What makes you think they would be somewhere else?
If they are, I'd put the filenames in a list.txt file with just enough of a minimal pattern to pick them out of the output of find.
$: cat list.txt
/apple.jpg$
/apricots.jpg$
/avocado.jpg$
/banana.jpg$
/blackberries.jpg$
/blackcurrant.jpg$
/blueberries.jpg$
/breadfruit.jpg$
/cantaloupe.jpg$
/carambola.jpg$
Then search the whole machine, which is gonna take a bit...
$: find / | grep -f list.txt
/tmp/apricots.jpg
/tmp/blackberries.jpg
/tmp/breadfruit.jpg
/tmp/carambola.jpg
Or if you want those longer partial paths,
$: find / | grep -f missing-files.txt
That should show you the actual paths to wherever those files exist IF they do exist on the system.
From the way I understand it, you want to find all files that could match the directory structure:
path/to/file
So it should return something like "/full/path/to/file" and "/another/full/path/to/file"
Using a simple find command you can get a list of all files that match this criteria.
Using find you can search your hard disk in a single go with something of the form:
$ find -regex pattern
The idea is now to build pattern, which we can do from the file missing_files.txt. The pattern should look something like .*/\(file1\|file2\|...\|filen\). So we can use the following sed to do so:
$ sed ':a;N;$!ba;s/\n/\|/g' missing_files.txt
So now we can do exactly what you did, but a bit quicker, in the following way:
pattern="$(sed ':a;N;$!ba;s/\n/\|/g' missing_files.txt)"
pattern=".*/\($pattern\)"
find -regex "$pattern" > file_list.txt
In order to find the files, you can now do something like:
grep -F -f missing_files.txt file_list.txt
This will return all the matching cases. If you just want the first match per missing file, you can use awk:
awk '(NR==FNR){a[$0]++;next}{for(i in a) if (!(i in b)) if ($0 ~ i) {print; b[i]}}' missing_files.txt file_list.txt
Is this the most efficient way to do this?
I/O is usually the biggest bottleneck. You are running fd once per file. Instead, run it once to find all the files in a single pass. In shell you would do:
find . -type f '(' -name "first name" -o -name "other name" -o .... ')'
How can I iterate from a list of source files and locate those files on my disk drive?
Use -path to match the full path. First build the arguments then call find.
findargs=()
# Read bashfaq/001
while IFS= read -r patt; do
# I think */ should match anything in front.
findargs+=(-o -path "*/$patt")
done < <(
# TODO: escape glob better, not tested
# see https://pubs.opengroup.org/onlinepubs/009604499/utilities/xcu_chap02.html#tag_02_13
sed 's/[?*[]/\\&/g' missing-files.txt
)
# remove leading -o
unset 'findargs[0]'
find / -type f '(' "${findargs[@]}" ')'
Topics to research: var=() - bash arrays; < <(...) - shell redirection with process substitution and when to use it (bashfaq/024); glob (see man 7 glob); and man find.
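That last point is worth a quick demonstration, since it is exactly why the answer uses < <(...) instead of a pipe: a pipe runs the while loop in a subshell, so findargs would be empty afterwards.
count=0
printf '%s\n' a b c | while read -r _; do count=$((count+1)); done
echo "$count"   # prints 0: the loop ran in a subshell
while read -r _; do count=$((count+1)); done < <(printf '%s\n' a b c)
echo "$count"   # prints 3: the loop ran in the current shell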
I'm trying to create a bash script based on an input file (list.txt). The input file contains a list of files with absolute paths. The output should be a bash script (move.sh) which moves the files to another location, preserving the folder structure but slightly changing the target folder name first.
The input list.txt file looks like this:
/In/Folder_1/SomeFoldername1/somefilename_x.mp3
/In/Folder_2/SomeFoldername2/somefilename_y.mp3
/In/Folder_3/SomeFoldername3/somefilename_z.mp3
The output file (move.sh) should look like this after creation:
mv "/In/Folder_1/SomeFoldername1/somefilename_x.mp3" /gain/Folder_1/
mv "/In/Folder_2/SomeFoldername2/somefilename_y.mp3" /gain/Folder_2/
mv "/In/Folder_3/SomeFoldername3/somefilename_z.mp3" /gain/Folder_3/
The folder structure should be preserved, more or less.
After executing the created bash script (move.sh), the result should look like this:
/gain/Folder_1/somefilename_x.mp3
/gain/Folder_2/somefilename_y.mp3
/gain/Folder_3/somefilename_z.mp3
What I've done so far.
1. create a list of files with absolute path
find /In/ -iname "*.mp3" -type f > /home/maars/mp3/list.txt
2. create the move.sh script
cp -a /home/maars/mp3/list.txt /home/maars/mp3/move.sh
# read the list and split the absolute path into fields
while IFS= read -r line;do
fields=($(printf "%s" "$line"|cut -d'/' --output-delimiter=' ' -f1-))
done < /home/maars/mp3/move.sh
# add the target path based on variables at the end of the line
sed -i -E "s|\.mp3|\.mp3"\"" /gain/"${fields[1]}"/|g" /home/maars/mp3/move.sh
sed -i "s|/In/|mv "\""/In/|g" /home/maars/mp3/move.sh
The script just uses the value of ${fields[1]}, which is Folder_1, and puts it at the end of every line, instead of Folder_2 and Folder_3.
The current result looks like
mv "/In/Folder_1/SomeFoldername1/somefilename_x.mp3" /gain/Folder_1/
mv "/In/Folder_2/SomeFoldername2/somefilename_y.mp3" /gain/Folder_1/
mv "/In/Folder_3/SomeFoldername3/somefilename_z.mp3" /gain/Folder_1/
rsync is not an option since I need full control over which files are moved.
What could I do better to solve this issue?
EDIT: @Socowi helped me a lot by pointing me in the right direction. After a deep dive into the world of regex, I was able to solve my issues. Thank you very much.
The script just uses the value of ${fields[1]}, which is Folder_1, and puts it at the end of every line, instead of Folder_2 and Folder_3.
You iterate over all lines and update fields for every line. After the loop finishes, fields retains its value from the last line. You would have to move the sed commands into your loop and make sure sed replaces only the current line. However, there's a better way – see below.
What could I do better
There are a lot of things you could improve, for instance
Creating the array fields with mapfile -t -d/ fields instead of printf+cut+($()). That way, you also wouldn't have problems with spaces in paths (see the sketch after the next code block).
Use sed only once instead of creating the array fields and using multiple sed commands. You can replace step 2 with this small script:
cp -a /home/maars/mp3/list.txt /home/maars/mp3/move.sh
sed -i -E 's|^/[^/]*/([^/]*).*$|mv "&" "/gain/\1"|' /home/maars/mp3/move.sh
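As a sketch of the mapfile suggestion above (assuming bash 4.4+ for mapfile -d; untested), the whole of step 2 could also be a single loop with no cut or sed:
while IFS= read -r line; do
  mapfile -t -d/ fields < <(printf '%s' "$line")   # fields[2] is Folder_1 etc.
  printf 'mv "%s" "/gain/%s/"\n' "$line" "${fields[2]}"
done < /home/maars/mp3/list.txt > /home/maars/mp3/move.sh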
However, the best optimization would be to drop that three step approach and use only one script to find and move the files:
find /In/ -iname "*.mp3" -type f -exec rename -n 's|^/.*?/(.*?)/.*/(.*)$|/gain/$1/$2|' {} +
The -n option will print what would be renamed without actually renaming anything. Remove the -n when you are happy with the result. Here is the output:
rename(/In/Folder_1/SomeFoldername1/somefilename_x.mp3, /gain/Folder_1/somefilename_x.mp3)
rename(/In/Folder_2/SomeFoldername2/somefilename_y.mp3, /gain/Folder_2/somefilename_y.mp3)
rename(/In/Folder_3/SomeFoldername3/somefilename_z.mp3, /gain/Folder_3/somefilename_z.mp3)
It's not built into bash, but the mmv command is nice for this kind of mv where you need to use wildcards in paths. Something like the following should work:
mmv "in/*/*/*" "#1/#3"
Note that this won't create the directories for you - but in your example above it looks like these already exist?
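Since mmv won't create the target folders, one could pre-create them from list.txt first (a hedged one-liner; assumes the second path component, e.g. Folder_1, is the part to keep):
while IFS=/ read -r _ _ top _; do mkdir -p "/gain/$top"; done < list.txt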
I would appreciate any help; I'm relatively new here.
I have the following directory structure
Main_dir
|-Barcode_subdirname_01/(many further subfolders)/filename.pdf
|-Barcode_subdirname_02/(many further subfolders)/filename.csv
There are thousands of files within many subfolders.
The first-level subdirectories have the barcode associated with all files within, e.g. 123456_dirname.
I want to copy all files within all subfolders to main_dir and
rename the files subdirname_barcode_filename.extension (based only on the first-level subdirectory name and barcode).
I've been attempting to write a bash script to do this from main_dir but have hit the limit of my coding ability (I'm open to any other way that'll work).
First, identify the first-level subfolders:
find -maxdepth 1 -type d |
then cut out the first two parts delimited by the underscores:
cut -d\_ -f1 > barcode
then find the files within the subfolders, rename and move:
find -type f -print0 |
while IFS= read -r filenames; do
  newname="${barcode/sudirname/filename\/}"
  mv "filename" "main_dir"/"newname"
done
I can't get it to work and may be headed in the wrong direction.
You can use rename with sed-like substitution conventions, for example:
$ rename 's~([^_]+)_([^_]+)_.*/([^/.]+\..*)~$1_$2_$3~' barcode_subdir_01/a/b/c/file2.csv
will rename the file to
barcode_subdir_file2.csv
I used ~ instead of the more common / separator to make it clearer.
You can test the script with the -n option, which shows the renamed files without actually performing the action.
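To run this across thousands of files in one pass, you could drive it from find (untested sketch; assumes the Perl rename, and that you run it from main_dir so the renamed files land in the current directory):
find . -mindepth 2 -type f -exec rename -n 's~([^_]+)_([^_]+)_.*/([^/.]+\..*)~$1_$2_$3~' {} +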
I am using Mac OS X Lion.
I have a folder: LITERATURE with the following structure:
LITERATURE > Y > YATES, DORNFORD > THE BROTHER OF DAPHNE:
Chapters 01-05.txt
Chapters 06-10.txt
Chapters 11-end.txt
I want to recursively concatenate the chapters that are split into multiple files (not all are). Then, I want to write the concatenated file to its parent's parent directory. The name of the concatenated file should be the same as the name of its parent directory.
For example, after running the script (in the folder structure shown above) I should get the following.
LITERATURE > Y > YATES, DORNFORD:
THE BROTHER OF DAPHNE.txt
THE BROTHER OF DAPHNE:
Chapters 01-05.txt
Chapters 06-10.txt
Chapters 11-end.txt
In this example, the parent directory is THE BROTHER OF DAPHNE and the parent's parent directory is YATES, DORNFORD.
[Updated March 6th—Rephrased the question/answer so that the question/answer is easy to find and understand.]
It's not clear what you mean by "recursively" but this should be enough to get you started.
#!/bin/bash
titlecase () { # adapted from http://stackoverflow.com/a/6969886/874188
  local arr
  arr=("${@,,}")
  echo "${arr[@]^}"
}
for book in LITERATURE/?/*/*; do
  title=$(titlecase ${book##*/})
  for file in "$book"/*; do
    cat "$file"
    echo
  done >"$book/$title"
  echo '# not doing this:' rm "$book"/*.txt
done
This loops over LITERATURE/initial/author/BOOK TITLE and creates a file Book Title (where should a space be added?) from the concatenated files in each book directory. (I would generate it in the parent directory and then remove the book directory completely, assuming it contains nothing of value any longer.) There is no recursion, just a loop over this directory structure.
Removing the chapter files is a bit risky so I'm not doing it here. You could remove the echo prefix from the line after the first done to enable it.
If you have book names which contain an asterisk or some other shell metacharacter this will be rather more complex -- the title assignment assumes you can use the book title unquoted.
Only the parameter expansion with case conversion is beyond the very basics of Bash. The array operations could perhaps also be a bit scary if you are a complete beginner. Proper understanding of quoting is also often a challenge for newcomers.
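The case-conversion expansions are easier to see in isolation; a quick demo (bash 4+):
set -- THE BROTHER OF DAPHNE
arr=("${@,,}")        # lowercase every word: the brother of daphne
echo "${arr[@]^}"     # uppercase each first letter: The Brother Of Daphne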
cat Chapters*.txt > FinaleFile.txt.raw
Chapters="$( ls -1 Chapters*.txt | sed -n 'H;${x;s/\
//g;s/ *Chapters //g;s/\.txt/ /g;s/ *$//p;}' )"
mv FinaleFile.txt.raw "FinaleFile ${Chapters}.txt"
cat all the txt files at once (assuming a name-sorted list)
take the chapter numbers/refs from the ls of the folder, with a sed to adapt the format
rename the concatenated file to include the chapters
Shell doesn't like white space in names. However, over the years, Unix has come up with some tricks that'll help:
$ find . -name "Chapters*.txt" -type f -print0 | xargs -0 cat >> final_file.txt
Might do what you want.
The find recursively finds all of the directory entries in a file tree that match the query (in this case, the type must be a file, and the name must match the pattern Chapters*.txt).
Normally, find separates the directory entry names with NL, but -print0 says to separate the entry names with the NUL character instead. NL is a valid character in a file name, but NUL isn't.
The xargs command takes the output of find and processes it: it gathers all the names and passes them in bulk to the command you give it -- in this case, the cat command.
Normally, xargs separates files by whitespace, which means Chapters would be one file and 01-05.txt another. However, -0 tells xargs to use NUL as the file separator -- which matches what -print0 produces.
Thanks for all your input. They got me thinking, and I managed to concatenate the files using the following steps:
This script replaces spaces in filenames with underscores.
#!/bin/bash
# We are going to iterate through the directory tree, up to a maximum depth of 20.
for i in $(seq 1 20)
do
  # In UNIX-based systems, files and directories are the same (everything is a file!).
  # The 'find' command lists all files which contain spaces in the name. The | (pipe) …
  # … forwards the list to a 'while' loop that iterates through each file in the list.
  find . -maxdepth $i -name '* *' | while IFS= read -r file
  do
    # Here, we use 'sed' to replace spaces in the filename with underscores.
    # The 'echo' prints a message to the console before renaming the file using 'mv'.
    item=$(echo "$file" | sed 's/ /_/g')
    echo "Renaming '$file' to '$item'"
    mv "$file" "$item"
  done
done
This script concatenates text files that start with Part, Chapter, Section, or Book.
#!/bin/bash
# Here, we go through all the directories (up to a depth of 20).
for D in $(find . -maxdepth 20 -type d)
do
  # Check if the directory contains any files of interest.
  if ls "$D"/Part*.txt &>/dev/null ||
     ls "$D"/Chapter*.txt &>/dev/null ||
     ls "$D"/Section*.txt &>/dev/null ||
     ls "$D"/Book*.txt &>/dev/null
  then
    # If we get here, then there are split files in the directory; we will concatenate them.
    # First, we trim the full directory path ($D) so that we are left with the path to the …
    # … files' parent's parent directory—We will write the concatenated file here. (✝)
    ppdir="$(dirname "$D")"
    # Here, we concatenate the files using 'cat'. The 'awk' command extracts the name of …
    # … the parent directory from the full directory path ($D) and gives us the filename.
    # Finally, we write the concatenated file to its parent's parent directory. (✝)
    cat "$D"/*.txt > "$ppdir"/$(echo "$D" | awk -F'/' '$0=$NF').txt
  fi
done
Now, we delete all the files that we concatenated so that their parent directories are left empty.
find . -name 'Part*' -delete
find . -name 'Chapter*' -delete
find . -name 'Section*' -delete
find . -name 'Book*' -delete
The following command will delete empty directories. (✝) We wrote the concatenated file to its parent's parent directory so that its parent directory is left empty after deleting all the split files.
find . -type d -empty -delete
I know how I can rename files and such, but I'm having trouble with this.
I only need to rename test-this in a for loop.
test-this.ext
test-this.volume001+02.ext
test-this.volume002+04.ext
test-this.volume003+08.ext
test-this.volume004+16.ext
test-this.volume005+32.ext
test-this.volume006+64.ext
test-this.volume007+78.ext
If you have all of these files in one folder and you're on Linux you can use:
rename 's/test-this/REPLACESTRING/g' *
The result will be:
REPLACESTRING.ext
REPLACESTRING.volume001+02.ext
REPLACESTRING.volume002+04.ext
...
rename takes a Perl substitution expression as its first argument. The expression here consists of four parts:
s: flag to substitute a string with another string,
test-this: the string you want to replace,
REPLACESTRING: the string you want to replace the search string with, and
g: a flag indicating that all matches of the search string shall be replaced, i.e. if the filename is test-this-abc-test-this.ext the result will be REPLACESTRING-abc-REPLACESTRING.ext.
The syntax is Perl's s/// substitution; refer to perldoc perlop (or man sed, which uses similar conventions) for a detailed description of the flags.
Use rename (the util-linux version, which takes plain strings rather than a Perl expression) as shown below:
rename test-this foo test-this*
This will replace test-this with foo in the file names.
If you don't have rename, use a for loop as shown below:
for i in test-this*
do
  mv "$i" "${i/test-this/foo}"
done
Function
I'm on OS X, which doesn't come with a rename command. I created a function in my .bash_profile that takes a first argument, a pattern that should match only once in the filename, ignores whatever comes after it, and replaces it with the text of the second argument.
rename() {
  for i in "$1"*
  do
    mv "$i" "${i/$1/$2}"
  done
}
Input Files
test-this.ext
test-this.volume001+02.ext
test-this.volume002+04.ext
test-this.volume003+08.ext
test-this.volume004+16.ext
test-this.volume005+32.ext
test-this.volume006+64.ext
test-this.volume007+78.ext
Command
rename test-this hello-there
Output
hello-there.ext
hello-there.volume001+02.ext
hello-there.volume002+04.ext
hello-there.volume003+08.ext
hello-there.volume004+16.ext
hello-there.volume005+32.ext
hello-there.volume006+64.ext
hello-there.volume007+78.ext
Without using rename:
find -name test-this\*.ext | sed 'p;s/test-this/replace-that/' | xargs -d '\n' -n 2 mv
The way it works is as follows:
find will, well, find all files matching your criteria. If you pass -name a glob expression, don't forget to escape the *.
Pipe the newline-separated* list of filenames into sed, which will:
a. Print (p) one line.
b. Substitute (s///) test-this with replace-that and print the result.
c. Move on to the next line.
Pipe the newline-separated list of alternating old and new filenames to xargs, which will:
a. Treat newlines as delimiters (-d '\n').
b. Call mv repeatedly with up to 2 (-n 2) arguments each time.
For a dry run, try the following:
find -name test-this\*.ext | sed 'p;s/test-this/replace-that/' | xargs -d '\n' -n 2 echo mv
*: Keep in mind it won't work if your filenames include newlines.
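If you have GNU sed, a NUL-separated variant of the same pipeline sidesteps that caveat (untested sketch; -z requires GNU sed 4.2.2+):
find -name test-this\*.ext -print0 | sed -z 'p;s/test-this/replace-that/' | xargs -0 -n 2 mv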
To rename index.htm to index.html:
rename [what you want to rename] [what you want it to be] [match on these files]
rename .htm .html *.htm
This renames index.htm to index.html,
and it will do so for all files matching *.htm in the folder.
Thanks for your passion and answers. I also found a solution to rename multiple files on my Linux terminal and directly add a little counter, which gives me a very good chance of better SEO names.
Here is the command
count=1 ; zmv '(*).jpg' 'new-seo-name--$((count++)).jpg'
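Note that zmv is a zsh function, not a standalone command; if it isn't already loaded, enable it first with:
autoload -Uz zmv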
I also made a live-coding video and published it to YouTube.