Script to prepend all filenames within a directory - shell

I've found an issue with adobes bates numbering tool, where file names are messing up the order in which they are numbered.
I was hoping to write a script that users would be able to click on and add the folder extension for all the files.
Then the script would prepend all the file names within the folder with a 000001filename.pdf 000002filename.pdf etc...
I've never combined scripts before but i've found scripts that either rename OR prepend. and i couldn't find anything that would rename sequentially with preceding 0's.

without much testing:
n=0 # or 1 if you like
format="%06d" # format of prefix
find . -maxdepth 1 -type f | # only one level, no dirs but also no symlinks etc
cut -d/ -f2 | # remove leading ./
sort | # plugin your sorting here
while read file
do
prefix=`printf "%06d" $n`
mv "$file" "$prefix$file" # but mv is dangerous!
n=$((n+1))
done

Related

bash change absolute path in file line by line for script creation

I'm trying to create a bash script based on a input file (list.txt). The input File contains a list of files with absolute path. The output should be a bash script (move.sh) which moves the files to another location, preserve the folder structure, but changing the target folder name slightly before.
the Input list.txt File example looks like this :
/In/Folder_1/SomeFoldername1/somefilename_x.mp3
/In/Folder_2/SomeFoldername2/somefilename_y.mp3
/In/Folder_3/SomeFoldername3/somefilename_z.mp3
The output file (move.sh) should looks like this after creation :
mv "/In/Folder_1/SomeFoldername1/somefilename_x.mp3" /gain/Folder_1/
mv "/In/Folder_2/SomeFoldername2/somefilename_y.mp3" /gain/Folder_2/
mv "/In/Folder_3/SomeFoldername3/somefilename_z.mp3" /gain/Folder_3/
The folder structure should be preserved, more or less.
after executing the created bash script (move.sh), the result should looks like this :
/gain/Folder_1/somefilename_x.mp3
/gain/Folder_2/somefilename_y.mp3
/gain/Folder_3/somefilename_z.mp3
What I've done so far.
1. create a list of files with absolute path
find /In/ -iname "*.mp3" -type f > /home/maars/mp3/list.txt
2. create the move.sh script
cp -a /home/maars/mp3/list.txt /home/maars/mp3/move.sh
# read the list and split the absolute path into fields
while IFS= read -r line;do
fields=($(printf "%s" "$line"|cut -d'/' --output-delimiter=' ' -f1-))
done < /home/maars/mp3/move.sh
# add the target path based on variables at the end of the line
sed -i -E "s|\.mp3|\.mp3"\"" /gain/"${fields[1]}"/|g" /home/maars/mp3/move.sh
sed -i "s|/In/|mv "\""/In/|g" /home/maars/mp3/move.sh
The script just use the value of ${fields[1]}, which is Folder_1 and put this in all lines at the end. Instead of Folder_2 and Folder_3.
The current result looks like
mv "/In/Folder_1/SomeFoldername1/somefilename_x.mp3" /gain/Folder_1/
mv "/In/Folder_2/SomeFoldername2/somefilename_y.mp3" /gain/Folder_1/
mv "/In/Folder_3/SomeFoldername3/somefilename_z.mp3" /gain/Folder_1/
rsync is not an option since I need the full control of files to be moved.
What could I do better to solve this issue ?
EDIT : #Socowi helped me a lot by pointing me in the right direction. After I did a deep dive into the World of Regex, I could solve my Issues. Thank you very much
The script just use the value of ${fields[1]}, which is Folder_1 and put this in all lines at the end. Instead of Folder_2 and Folder_3.
You iterate over all lines and update fields for every line. After you finished the loop, fields retains its value (from the last line). You would have to move the sed commands into your loop and make sure that only the current line is replaced by sed. However, there's a better way – see down below.
What could I do better
There are a lot of things you could improve, for instance
Creating the array fields with mapfile -d/ fields instead of printf+cut+($()). That way, you also wouldn't have problems with spaces in paths.
Use sed only once instead of creating the array fields and using multiple sed commands. You can replace step 2 with this small script:
cp -a /home/maars/mp3/list.txt /home/maars/mp3/move.sh
sed -i -E 's|^/[^/]*/([^/]*).*$|mv "&" "/gain/\1"|' /home/maars/mp3/move.sh
However, the best optimization would be to drop that three step approach and use only one script to find and move the files:
find /In/ -iname "*.mp3" -type f -exec rename -n 's|^/.*?/(.*?)/.*/(.*)$|/gain/$1/$2|' {} +
The -n option will print what will be renamed without actually renaming anything . Remove the -n when you are happy with the result. Here is the output:
rename(/In/Folder_1/SomeFoldername1/somefilename_x.mp3, /gain/Folder_1/somefilename_x.mp3)
rename(/In/Folder_2/SomeFoldername2/somefilename_y.mp3, /gain/Folder_2/somefilename_y.mp3)
rename(/In/Folder_3/SomeFoldername3/somefilename_z.mp3, /gain/Folder_3/somefilename_z.mp3)
It's not builtin to bash, but the mmv command is nice for this kind of mv where you need to use wildcards in paths. Something like the following should work:
mmv "in/*/*/*" "#1/#3"
Note that this won't create the directories for you - but in your example above it looks like these already exist?

Comparing two directories to produce output

I am writing a Bash script that will replace files in folder A (source) with folder B (target). But before this happens, I want to record 2 files.
The first file will contain a list of files in folder B that are newer than folder A, along with files that are different/orphans in folder B against folder A
The second file will contain a list of files in folder A that are newer than folder B, along with files that are different/orphans in folder A against folder B
How do I accomplish this in Bash? I've tried using diff -qr but it yields the following output:
Files old/VERSION and new/VERSION differ
Files old/conf/mime.conf and new/conf/mime.conf differ
Only in new/data/pages: playground
Files old/doku.php and new/doku.php differ
Files old/inc/auth.php and new/inc/auth.php differ
Files old/inc/lang/no/lang.php and new/inc/lang/no/lang.php differ
Files old/lib/plugins/acl/remote.php and new/lib/plugins/acl/remote.php differ
Files old/lib/plugins/authplain/auth.php and new/lib/plugins/authplain/auth.php differ
Files old/lib/plugins/usermanager/admin.php and new/lib/plugins/usermanager/admin.php differ
I've also tried this
(rsync -rcn --out-format="%n" old/ new/ && rsync -rcn --out-format="%n" new/ old/) | sort | uniq
but it doesn't give me the scope of results I require. The struggle here is that the data isn't in the correct format, I just want files not directories to show in the text files e.g:
conf/mime.conf
data/pages/playground/
data/pages/playground/playground.txt
doku.php
inc/auth.php
inc/lang/no/lang.php
lib/plugins/acl/remote.php
lib/plugins/authplain/auth.php
lib/plugins/usermanager/admin.php
List of files in directory B (new/) that are newer than directory A (old/):
find new -newermm old
This merely runs find and examines the content of new/ as filtered by -newerXY reference with X and Y both set to m (modification time) and reference being the old directory itself.
Files that are missing in directory B (new/) but are present in directory A (old/):
A=old B=new
diff -u <(find "$B" |sed "s:$B::") <(find "$A" |sed "s:$A::") \
|sed "/^+\//!d; s::$A/:"
This sets variables $A and $B to your target directories, then runs a unified diff on their contents (using process substitution to locate with find and remove the directory name with sed so diff isn't confused). The final sed command first matches for the additions (lines starting with a +/), modifies them to replace that +/ with the directory name and a slash, and prints them (other lines are removed).
Here is a bash script that will create the file:
#!/bin/bash
# Usage: bash script.bash OLD_DIR NEW_DIR [OUTPUT_FILE]
# compare given directories
if [ -n "$3" ]; then # the optional 3rd argument is the output file
OUTPUT="$3"
else # if it isn't provided, escape path slashes to underscores
OUTPUT="${2////_}-newer-than-${1////_}"
fi
{
find "$2" -newermm "$1"
diff -u <(find "$2" |sed "s:$2::") <(find "$1" |sed "s:$1::") \
|sed "/^+\//!d; s::$1/:"
} |sort > "$OUTPUT"
First, this determines the output file, which either comes from the third argument or else is created from the other inputs using a replacement to convert slashes to underscores in case there are paths, so for example, running as bash script.bash /usr/local/bin /usr/bin would output its file list to _usr_local_bin-newer-than-_usr_bin in the current working directory.
This combines the two commands and then ensures they are sorted. There won't be any duplicates, so you don't need to worry about that (if there were, you'd use sort -u).
You can get your first and second files by changing the order of arguments as you invoke this script.

Recursively concatenating (joining) and renaming text files in a directory tree

I am using a Mac OS X Lion.
I have a folder: LITERATURE with the following structure:
LITERATURE > Y > YATES, DORNFORD > THE BROTHER OF DAPHNE:
Chapters 01-05.txt
Chapters 06-10.txt
Chapters 11-end.txt
I want to recursively concatenate the chapters that are split into multiple files (not all are). Then, I want to write the concatenated file to its parent's parent directory. The name of the concatenated file should be the same as the name of its parent directory.
For example, after running the script (in the folder structure shown above) I should get the following.
LITERATURE > Y > YATES, DORNFORD:
THE BROTHER OF DAPHNE.txt
THE BROTHER OF DAPHNE:
Chapters 01-05.txt
Chapters 06-10.txt
Chapters 11-end.txt
In this example, the parent directory is THE BROTHER OF DAPHNE and the parent's parent directory is YATES, DORNFORD.
[Updated March 6th—Rephrased the question/answer so that the question/answer is easy to find and understand.]
It's not clear what you mean by "recursively" but this should be enough to get you started.
#!/bin/bash
titlecase () { # adapted from http://stackoverflow.com/a/6969886/874188
local arr
arr=("${#,,}")
echo "${arr[#]^}"
}
for book in LITERATURE/?/*/*; do
title=$(titlecase ${book##*/})
for file in "$book"/*; do
cat "$file"
echo
done >"$book/$title"
echo '# not doing this:' rm "$book"/*.txt
done
This loops over LITERATURE/initial/author/BOOK TITLE and creates a file Book Title (where should a space be added?) from the catenated files in each book directory. (I would generate it in the parent directory and then remove the book directory completely, assuming it contains nothing of value any longer.) There is no recursion, just a loop over this directory structure.
Removing the chapter files is a bit risky so I'm not doing it here. You could remove the echo prefix from the line after the first done to enable it.
If you have book names which contain an asterisk or some other shell metacharacter this will be rather more complex -- the title assignment assumes you can use the book title unquoted.
Only the parameter expansion with case conversion is beyond the very basics of Bash. The array operations could perhaps also be a bit scary if you are a complete beginner. Proper understanding of quoting is also often a challenge for newcomers.
cat Chapters*.txt > FinaleFile.txt.raw
Chapters="$( ls -1 Chapters*.txt | sed -n 'H;${x;s/\
//g;s/ *Chapters //g;s/\.txt/ /g;s/ *$//p;}' )"
mv FinaleFile.txt.raw "FinaleFile ${Chapters}.txt"
cat all txt at once (assuming name sorted list)
take chapter number/ref from the ls of the folder and with a sed to adapt the format
rename the concatenate file including chapters
Shell doesn't like white space in names. However, over the years, Unix has come up with some tricks that'll help:
$ find . -name "Chapters*.txt" -type f -print0 | xargs -0 cat >> final_file.txt
Might do what you want.
The find recursively finds all of the directory entries in a file tree that matches the query (In this case, the type must be a file, and the name matches the pattern Chapter*.txt).
Normally, find separates out the directory entry names with NL, but the -print0 says to separate out the entries names with the NUL character. The NL is a valid character in a file name, but NUL isn't.
The xargs command takes the output of the find and processes it. xargs gathers all the names and passes them in bulk to the command you give it -- in this case the cat command.
Normally, xargs separates out files by white space which means Chapters would be one file and 01-05.txt would be another. However, the -0 tells xargs, to use NUL as a file separator -- which is what -print0 does.
Thanks for all your input. They got me thinking, and I managed to concatenate the files using the following steps:
This script replaces spaces in filenames with underscores.
#!/bin/bash
# We are going to iterate through the directory tree, up to a maximum depth of 20.
for i in `seq 1 20`
do
# In UNIX based systems, files and directories are the same (Everything is a File!).
# The 'find' command lists all files which contain spaces in its name. The | (pipe) …
# … forwards the list to a 'while' loop that iterates through each file in the list.
find . -name '* *' -maxdepth $i | while read file
do
# Here, we use 'sed' to replace spaces in the filename with underscores.
# The 'echo' prints a message to the console before renaming the file using 'mv'.
item=`echo "$file" | sed 's/ /_/g'`
echo "Renaming '$file' to '$item'"
mv "$file" "$item"
done
done
This script concatenates text files that start with Part, Chapter, Section, or Book.
#!/bin/bash
# Here, we go through all the directories (up to a depth of 20).
for D in `find . -maxdepth 20 -type d`
do
# Check if the parent directory contains any files of interest.
if ls $D/Part*.txt &>/dev/null ||
ls $D/Chapter*.txt &>/dev/null ||
ls $D/Section*.txt &>/dev/null ||
ls $D/Book*.txt &>/dev/null
then
# If we get here, then there are split files in the directory; we will concatenate them.
# First, we trim the full directory path ($D) so that we are left with the path to the …
# … files' parent's parent directory—We will write the concatenated file here. (✝)
ppdir="$(dirname "$D")"
# Here, we concatenate the files using 'cat'. The 'awk' command extracts the name of …
# … the parent directory from the full directory path ($D) and gives us the filename.
# Finally, we write the concatenated file to its parent's parent directory. (✝)
cat $D/*.txt > $ppdir/`echo $D|awk -F'/' '$0=$(NF-0)'`.txt
fi
done
Now, we delete all the files that we concatenated so that its parent directory is left empty.
find . -name 'Part*' -delete
find . -name 'Chapter*' -delete
find . -name 'Section*' -delete
find . -name 'Book*' -delete
The following command will delete empty directories. (✝) We wrote the concatenated file to its parent's parent directory so that its parent directory is left empty after deleting all the split files.
find . -type d -empty -delete
[Updated March 6th—Rephrased the question/answer so that the question/answer is easy to find and understand.]

Search files for a list of keywords using command line?

I have a list of keywords in a txt file like this:
keyword1
keyword2
keyword3
I need to search all of my files - EXCEPT for HTML and CSS files - for these keywords.
The only thing I need to know is which of the keyword DON'T appear inside any of the files. I don't care about the ones that do or what files they are in. I simply need to know which of the keywords aren't in any of the files.
Everything I've looked up keeps coming back with results about how to find keywords and outputs the files they are in. I'm open to doing this through command line, Perl, or whatever is the easiest way to get it done.
It looks like these commands should work for finding files not containing my keywords:
grep -L "foo" *
or
ack -L "foo" *
But I don't know how to pull the keywords from my txt file or how to make it search all files except .html or .css
I'm running this on my server so I'm not really too concerned about how resource-intensive it is...
Since your description is not as complete, I will assume the followings:
HTML file has .html extension (Note: it could have .htm .HTM, .HTML
extension, I just assume on of them, please adapt the answer to suit
your situation)
CSS file has .css extension (Again, it may have .CSS extension)
You keywords can be easily put into a grep command, i.e. NOT having
special regular expression characters, for example "^" means starts
of a line match, "$" means end of a line match.
You are trying to search for files under the a folder and its
subfolder.
Assume your keyword file is ../keywordfile.txt. Note: Since current
folder search is assumed, your keywordfile.txt can not be placed in
current folder, otherwise, searching keywordfile.txt itself yields
all matches, and nothing will output (since every keyword matches)
Now a quick-and-dirty way to do it:
#!/bin/bash
TMP=/tmp/filelist$$.txt
find . -type f | grep -v ".html$" | grep -v ".css$" > $TMP
## Note: if you are search only current fold but not subfolders,
## add "-maxdepth 1" option to "find" command
while read keyword; do
if [ `while read file; do \
cat "$file"; \
done < $TMP | grep -c "$keyword"` -eq 0 ]; then \
echo "$keyword does not appear in any files."; \
fi; \
done < ../keywordfile.txt
Try this:
#!/bin/bash
keywordlist=$(cat keywordfile.txt | tr "\n" "\|")
for x in $(find . ! -name "*.html" ! -name "*.css" -type f)
do
if ! grep -qE "(${keywordlist%"|"})" $x
then
echo $x
fi
done

Bash script pdftk merge PDFs

I have a few thousand PDFs that I need merged based on filename.
Named like:
Lastname, Firstname_12345.pdf
Instead of overwriting or appending, our software appends a number/datetime to the pdf if there are additional pages like:
Lastname, Firstname_12345_201305160953344627.pdf
For all the ones that don't have a second (or third) pdf the script doesn't need to touch. But, for all the ones that have multiples, they need to be merged into a new file *_merged.pdf? and the originals deleted.
I gave this my best effort and this is what I have so far.
#! /bin/bash
# list all pdfs to show shortest name first
LIST=$(ls -r *.pdf)
for x in "$LIST"
# Remove .pdf extension. merge pdfs. delete originals.
do
y=${x%%.*}
pdftk "$y"*.pdf cat output "$y"_merged.pdf
find "$y"*.pdf -type f ! -iname "*_merged.pdf" -delete
done
This script works to a certain extent. It will merge and delete the originals, but it doesn't have anything in it to skip ones that don't need anything appended to them, and when I run it in a folder with several test files it stops after one file. Can anyone point me in the right direction?
Since your file names contain spaces the for loop won't work as is.
Once you have a list of file names, a test on the number of files matching y*.pdf to determine if you need to merge the pdfs.
#!/bin/bash
LIST=( * )
# Remove .pdf extension. merge pdfs. delete originals.
for x in "${LIST[#]}" ; do
y=${x%%.pdf}
if [ $(ls "$y"*.pdf 2>/dev/null | wc -l ) -gt 1 ]; then
pdftk "$y"*.pdf cat output "$y"_merged.pdf
find "$y"*.pdf -type f ! -iname "*_merged.pdf" -delete
fi
done

Resources