How to recursively find & replace whole files with bash? - bash

I have hundreds of files that I need to recursively replace as the files are currently stored like so:
/2019/01/
file1.pdf
file2.pdf
/2019/02
file3.pdf
file4.pdf
etc
I then have all of the updated files in another directory like so:
/new-files
file1.pdf
file2.pdf
file3.pdf
file4.pdf
Could someone please tell me the best way of doing this with a bash script? I'd basically like to read the new-files directory and then replace any matching file names in the other folders.
Thanks in advance for any help!

Assuming that the 'new-files' directory and all the directory trees containing PDF files are under the current directory, try this Shellcheck-clean Bash code:
#! /bin/bash -p
find . -path ./new-files -prune -o -type f -name '*.pdf' -print0 \
| while IFS= read -r -d '' pdfpath; do
pdfname=${pdfpath##*/}
new_pdfpath=new-files/$pdfname
if [[ -f $new_pdfpath ]]; then
printf "Replace '%s' with '%s'\n" "$pdfpath" "$new_pdfpath" >&2
# cp -- "$new_pdfpath" "$pdfpath"
fi
done
The -path ./new-files -prune in the find command stops the 'new-files' directory from being searched.
The -o in the find command causes the next test and actions to be tried after checking for 'new-files'.
See BashFAQ/001 (How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?) for an explanation of the use of the -print0 option to find and the while IFS= read -r -d '' .... In short, the code can handle arbitrary file paths, including ones with whitespace and newline characters in them.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${pdfpath##*/}.
It's not clear to me if you want to copy or move the new file to replace the old file, or do something else. Run the code as it is to check if it is identifying the correct replacements to be done. If you are happy with it, uncomment the cp line, and modify it to do something different if that is what you want.
The -- in the cp command protects against arguments beginning with dash characters being interpreted as options. It's unnecessary in this case, but I always use it when arguments begin with variable (or other) expansions so the code will remain safe if it is used in other contexts.

I think this calls for a bash array.
#!/usr/bin/env bash
# Make an associative array
declare -A files=()
# Populate array as $files[file.pdf]="/path/to/file.pdf"
for f in 20*/*/*.pdf; do
files[${f##*/}]="$f"
done
# Step through files and replace
for f in new-files/*.pdf; do
if [[ ! -e "${files[${f##*/}]}" ]]; then
echo "ERROR: missing $f" >&2
continue
fi
mv -v "$f" "${files[${f##*/}]}"
done
Note that associative arrays require bash version 4 or above. If you're using the native bash on a Mac, this won't work as-is.
Note also that the continue matters. If you remove it, mv will be run with an empty target for any new file that has no match in the dated directories, since no target is known, and the move will fail.
If you wanted further protection you might use test -nt or friends to confirm that an update is happening in the right direction.
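As a minimal sketch of that extra check, the helper below only overwrites the target when the new file has a later modification time, using bash's -nt test. The name replace_if_newer is hypothetical, and it copies rather than moves, which is an assumption about what you want. Note that -nt is also true when the target does not exist yet.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy "$1" over "$2" only when "$1" is newer than "$2".
replace_if_newer() {
    local new=$1 old=$2
    if [[ $new -nt $old ]]; then
        cp -- "$new" "$old"
    else
        printf 'Skipping %s: not newer than %s\n' "$new" "$old" >&2
        return 1
    fi
}
```

In the loop above, this could stand in for the mv line, for example replace_if_newer "$f" "${files[${f##*/}]}".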

Related

Passing variables to cp / mv

I can use the cp or mv command to copy/mv files to a new folder manually but while in a for loop, it fails.
I've tried various ways of doing this and none seem to work. The most frustrating part is it works when run locally.
A simple version of what I'm trying to do is shown below:
#!bin/bash
#Define path variables
source_dir=/home/me/loop
destination_dir=/home/me/loop/new
#Change working dir
cd "$source_dir"
#Step through source_dir for each .txt. file
for f in *.txt
do
# If the txt file was modified within the last 300 minutes...
if [[ $(find "$f" -mmin -300) ]]
then
# Add breaks for any spaces in filenames
f="${f// /\\ }"
# Copy file to destination
cp "$source_dir/$f $destination_dir/"
fi
done
Error message is:
cp: missing destination file operand after '/home/me/loop/first\ second.txt /home/me/loop/new/'
Try 'cp --help' for more information.
However, I can manually run:
mv /home/me/loop/first\ second.txt /home/me/loop/new/
and it works fine.
I get the same error using cp and similar errors using rsync so I'm not sure what I'm doing wrong...
cp "$source_dir/$f $destination_dir/"
When you surround both arguments with double quotes you turn them into one argument with an embedded space. Quote them separately.
cp "$source_dir/$f" "$destination_dir/"
There's no need to do anything special for spaces beforehand. The quoting already ensures files with whitespace are handled correctly, so these lines can be removed:
# Add breaks for any spaces in filenames
f="${f// /\\ }"
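The difference between the one-argument and two-argument forms can be seen by counting what a command actually receives. This is only an illustration: count_args is a made-up helper, and the file name is taken from the error message in the question.

```bash
#!/usr/bin/env bash
# Made-up helper: print how many arguments were received.
count_args() { printf '%s\n' "$#"; }

source_dir=/home/me/loop
destination_dir=/home/me/loop/new
f='first second.txt'

count_args "$source_dir/$f $destination_dir/"     # prints 1
count_args "$source_dir/$f" "$destination_dir/"   # prints 2
```

With a single argument, cp has a source but no destination, which is exactly what the error message reports.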
Let's take a step back, though. Looping over all *.txt files and then checking each one with find is overly complicated. find already loops over multiple files and does arbitrary things to those files. You can do everything in this script in a single find command.
#!/bin/bash
source_dir=/home/me/loop
destination_dir=/home/me/loop/new
find "$source_dir" -maxdepth 1 -type f -name '*.txt' -mmin -300 -exec cp -t "$destination_dir" {} +
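The -t option of cp is a GNU coreutils extension. On systems without it, one alternative is to run cp once per file with the destination last. The sketch below assumes a find that supports -maxdepth and -mmin (GNU and BSD find both do). The function name copy_recent_txt and its parameters are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy *.txt files modified in the last N minutes
# from the top level of "$1" into "$2" (N defaults to 300).
copy_recent_txt() {
    local src=$1 dst=$2 minutes=${3:-300}
    # -maxdepth 1 keeps find out of subdirectories, including a destination
    # directory that lives inside the source directory.
    find "$src" -maxdepth 1 -type f -name '*.txt' -mmin "-$minutes" \
        -exec cp -- {} "$dst"/ \;
}
```

For the directories in the question this would be called as copy_recent_txt /home/me/loop /home/me/loop/new.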
You need to divide it into two strings, like this:
cp "$source_dir/$f" "$destination_dir/"
By quoting them as one string, you are telling cp that the entire string is the first parameter, when it is actually two (source and destination).
Edit: As @kamil-cuk and @aaron state, there are better ways of doing what you are trying to do. Please read their comments.

Replace the complete filenames for files with their MD5 hash string of the content in bash

Problem:
I have a bunch of files in a folder, and I want to rename all of them to the MD5 hash of their content.
What I tried:
This is the command I tried.
for i in $(find /home/admin/test -type f);do mv $i $(md5sum $i|cut -d" " -f 1);done
But this fails after some time with the error below, and only some files get renamed, leaving the rest untouched.
mv: missing destination file operand after /home/admin/test/help.txt
Try `mv --help' for more information.
Is the implementation correct? Am I doing something wrong in the script?
Make things simple by making use of the glob patterns that the shell provides, instead of using external utilities like find. Also see Why you don't read lines with "for"
Navigate inside the folder /home/admin/test and do the following which should be sufficient
for file in *; do
[ -f "$file" ] || continue
md5sum -- "$file" | { read -r sum _; mv -- "$file" "$sum"; }
done
Try using echo in place of mv first to check that the files would be renamed as expected.
To descend into sub-directories, which I assume would also be your requirement, enable globstar, which is one of the extended globbing options provided by the shell to go deeper
shopt -s globstar
for file in **/*; do
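A complete version of the recursive loop might look like the sketch below. It is not the answer's exact code: the function name rename_to_md5 is hypothetical, and the mv target is changed so that each file stays in its own directory instead of being moved into the current one. It assumes bash 4 or later (for globstar) and GNU md5sum. Hidden files are skipped unless dotglob is also enabled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rename every regular file under "$1" to the MD5 hash
# of its content, keeping each file in the directory where it was found.
# The body is a subshell so the shopt settings do not leak into the caller.
rename_to_md5() (
    shopt -s globstar nullglob
    for file in "$1"/**/*; do
        [[ -f $file ]] || continue
        # Reading from stdin makes md5sum print "<hash>  -", so the hash is
        # everything before the first space.
        sum=$(md5sum < "$file") || continue
        sum=${sum%% *}
        mv -- "$file" "${file%/*}/$sum"
    done
)
```

Called as rename_to_md5 /home/admin/test, this would rename files in that directory and in every directory below it.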
If you want to recursively rename all files with their md5 hash, you could try this:
find /home/admin/test -type f -exec bash -c 'md5sum "$1" | while read -r s f; do mv "${f#*./}" "$(dirname "${f#*./}")/$s"; done' _ {} \;
The hash and filename are read into the s and f variables. The ${f#*./} expansion removes the prefix added by the md5sum and find commands.
Note that if several files have exactly the same content, you will end up with only one file.
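To see in advance which files would be lost that way, a sketch like the following lists every file whose content hash has already been seen. The function name list_duplicate_content is hypothetical, and it assumes GNU md5sum and ordinary file names (md5sum escapes names containing backslashes or newlines, which this sketch does not handle).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the md5sum output lines for files under "$1"
# whose hash was already seen, i.e. files a rename-to-hash would overwrite.
list_duplicate_content() {
    find "$1" -type f -exec md5sum -- {} + | awk 'seen[$1]++'
}
```

If list_duplicate_content /home/admin/test prints nothing, no two files under that directory share the same content.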

Bash: remove first line of file, create new file with prefix in new dir

I have a bunch of files in a directory, old_dir. I want to:
remove the first line of each file (e.g. using "sed '1d'")
save the output as a new file with a prefix, new_, added to the original filename (e.g. using "{,new_}old_filename")
add these files to a different directory, new_dir, overwriting any conflicting filenames
How do I do this with a Bash script? Having trouble putting the pieces together.
#!/usr/bin/env bash
old_dir="/path/to/somewhere"
new_dir="/path/to/somewhere_else"
prefix="new_"
if [ ! -d "$old_dir" -o ! -d "$new_dir" ]; then
echo "ERROR: We're missing a directory. Aborting." >&2
exit 1
fi
for file in "$old_dir"/*; do
tail -n +2 "$file" > "$new_dir"/"${prefix}${file##*/}"
done
The important parts of this are:
The for loop, which allows you to do work on each $file.
tail -n +2, which prints the file starting from its second line and so removes the first line. The older tail +2 spelling is not accepted by every tail. You can get the same result with sed -e 1d.
${file##*/} which is functionally equivalent to basename "$file" but without spawning a child.
Really, none of this is bash-specific. You could run this in /bin/sh in most operating systems.
Note that the code above is intended to explain a process. Once you understand that process, you may be able to come up with faster, shorter strategies for achieving the same thing. For example:
find "$old_dir" -maxdepth 1 -type f -exec sh -c 'tail -n +2 "$1" > "$2/$3$(basename "$1")"' _ {} "$new_dir" "$prefix" \;
Note: I haven't tested this. If you plan to use either of these solutions, do make sure you understand them before you try, so that you don't clobber your data by accident.
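The parameter expansions used above can be tried on their own before touching any files. This is a small illustration with made-up values. It uses only POSIX shell features, so it should behave the same in /bin/sh and bash.

```bash
#!/bin/sh
# Illustrative values only; the file name is made up for this example.
file="/path/to/somewhere/report.txt"
prefix="new_"

name=${file##*/}      # remove the longest prefix ending in "/"
dir=${file%/*}        # remove the shortest suffix starting with "/"

echo "$name"            # report.txt
echo "$dir"             # /path/to/somewhere
echo "$prefix$name"     # new_report.txt
```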

Trouble iterating through all files in directory

Part of my Bash script's intended function is to accept a directory name and then iterate through every file.
Here is part of my code:
#! /bin/bash
# sameln --- remove duplicate copies of files in specified directory
D=$1
cd $D #go to directory specified as default input
fileNum=0 #save file numbers
DIR=".*|*"
for f in $DIR #for every file in the directory
do
files[$fileNum]=$f #save that file into the array
fileNum=$((fileNum+1)) #increment the fileNum
echo aFile
done
The echo statement is for testing purposes. I passed as an argument the name of a directory with four regular files, and I expected my output to look like:
aFile
aFile
aFile
aFile
but the echo statement only shows up once.
A single operation
Use find for this, it's perfect for it.
find <dirname> -maxdepth 1 -type f -exec echo "{}" \;
The flags explained: -maxdepth defines how deep in the hierarchy you want to look (dirs in dirs in dirs), -type f selects files, as opposed to -type d for dirs. And -exec allows you to process the found file/dir, which can be accessed through {}. You can alternatively pass it to a bash function to perform more tasks.
This simple bash script takes a dir as argument and lists all its files:
#!/bin/bash
find "$1" -maxdepth 1 -type f -exec echo "{}" \;
Note that the last line is equivalent to find "$1" -maxdepth 1 -type f -print.
Performing multiple tasks
Using find one can also perform multiple tasks by either piping to xargs or while read, but I prefer to use a function. An example:
#!/bin/bash
function dostuff {
# echo filename
echo "filename: $1"
# remove extension from file
mv "$1" "${1%.*}"
# get containing dir of file
dir="${1%/*}"
# get filename without containing dirs
file="${1##*/}"
# do more stuff like echoing results
echo "containing dir = $dir and file was called $file"
}
# export the function so you can call it in a child bash process (important!!!)
export -f dostuff
find . -maxdepth 1 -type f -exec bash -c 'dostuff "$1"' _ {} \;
Note that the function needs to be exported, as you can see. This is so you can call it in the child shell that is started by bash -c. To test it out, I suggest you comment out the mv command in dostuff, otherwise you will remove all your extensions haha.
Also note that this handles spaces in filenames. To stay safe with names that contain quotes or $ as well, pass the name as an argument (bash -c 'dostuff "$1"' _ {}) rather than embedding {} inside the command string.
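If starting one bash per file is too slow, a variation is to let find pass many names at once with + and loop over "$@" inside the child shell. This is only a sketch: the function name describe_files is hypothetical, and it just prints the directory and file name instead of renaming anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each regular file directly inside "$1", print its
# containing directory and its base name. find batches the names, so bash is
# started once per batch rather than once per file.
describe_files() {
    find "$1" -maxdepth 1 -type f -exec bash -c '
        for path in "$@"; do
            printf "dir=%s file=%s\n" "${path%/*}" "${path##*/}"
        done
    ' _ {} +
}
```

Because the names arrive as positional parameters, no exported function is needed and the names are never re-parsed as shell code.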
Closing note
If you decide to go with the find command, which is a great choice, I advise you read up on it because it is a very powerful tool. A simple man find will teach you a lot and you will learn a lot of useful options to find. You can for instance quit from find once it has found a result, this can be handy to check if dirs contain videos or not for example in a rapid way. It's truly an amazing tool that can be used on various occasions and often you'll be done with a one liner (kinda like awk).
You can directly read the files into the array, then iterate through them:
#! /bin/bash
cd "$1" || exit
files=(*)
for f in "${files[@]}"
do
echo "$f"
done
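The loop in the question runs exactly once because .*|* is treated as a single glob pattern. The | has no special meaning there, nothing matches, and the pattern is left as literal text. A sketch that covers both visible and hidden files with one glob is shown below. The function name list_regular_files is hypothetical, and dotglob and nullglob are bash-specific options.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of each regular file in "$1"
# (default: current directory), hidden files included. The body is a
# subshell so the shopt settings do not leak into the caller.
list_regular_files() (
    shopt -s dotglob nullglob   # match dotfiles; expand to nothing if empty
    for f in "${1:-.}"/*; do
        if [[ -f $f ]]; then
            printf '%s\n' "${f##*/}"
        fi
    done
)
```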
If you are iterating only files below a single directory, you are better off using simple filename/path expansion to avoid certain uncommon filename issues. The following will iterate through all files in a given directory passed as the first argument (default ./):
#!/bin/bash
srchdir="${1:-.}"
for i in "$srchdir"/*; do
printf " %s\n" "$i"
done
If you must iterate below an entire subtree that includes numerous branches, then find will likely be your only choice. However, be aware that using find or ls to populate a for loop brings with it the potential for problems with embedded characters such as a \n within a filename, etc. See Why for i in $(find . -type f) # is wrong even though unavoidable at times.
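When a whole subtree does have to be walked, one way around the newline problem is to have find emit NUL-terminated names and read them with read -d ''. The sketch below fills a global array named files. The function name collect_files is hypothetical, and it assumes bash and a find that supports -print0.

```bash
#!/usr/bin/env bash
# Hypothetical helper: store every regular file under "$1" in the global
# array "files", one element per file, regardless of odd characters.
collect_files() {
    files=()
    local path
    while IFS= read -r -d '' path; do
        files+=("$path")
    done < <(find "$1" -type f -print0)
}
```

After collect_files "$srchdir", the array can be iterated with for f in "${files[@]}".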

Rename files in shell

I have a folder and file structure like
Folder/1/fileNameOne.ext
Folder/2/fileNameTwo.ext
Folder/3/fileNameThree.ext
...
How can I rename the files such that the output becomes
Folder/1_fileNameOne.ext
Folder/2_fileNameTwo.ext
Folder/3_fileNameThree.ext
...
How can this be achieved in linux shell?
How many different ways do you want to do it?
If the names contain no spaces or newlines or other problematic characters, and the intermediate directories are always single digits, and if you have the list of the files to be renamed in a file file.list with one name per line, then one of many possible ways to do the renaming is:
sed 's%\(.*\)/\([0-9]\)/\(.*\)%mv \1/\2/\3 \1/\2_\3%' file.list | sh -x
You should avoid piping the output into the shell until you're sure it will do what you want; just look at the generated script until it's right.
There is also a command called rename — unfortunately, there are several implementations, not all equally powerful. If you've got the one based on Perl (using a Perl regex to map the old name to the new name) you'd be able to use:
rename 's%/(\d)/%/${1}_%' $(< file.list)
Use a loop as follows:
while IFS= read -d $'\0' -r line
do
mv "$line" "${line%/*}_${line##*/}"
done < <(find Folder -type f -print0)
This method handles spaces, newlines and other special characters in the file names, and the intermediate directories don't necessarily have to be single digits.
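A more cautious variant of the same loop is sketched below. It restricts find to files exactly two levels below the top folder, so files sitting directly in Folder or deeper down are left alone, and it defaults to a dry run. The function name flatten_one_level and its dry_run parameter are hypothetical, and -mindepth/-maxdepth are assumed to be available (GNU and BSD find). The emptied directories are left in place.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rename root/N/name to root/N_name for files exactly
# two levels below "$1". With dry_run=1 (the default) it only prints the plan.
flatten_one_level() {
    local root=$1 dry_run=${2:-1} path target
    while IFS= read -r -d '' path; do
        target="${path%/*}_${path##*/}"
        if (( dry_run )); then
            printf '%s -> %s\n' "$path" "$target"
        else
            mv -- "$path" "$target"
        fi
    done < <(find "$root" -mindepth 2 -maxdepth 2 -type f -print0)
}
```

Run flatten_one_level Folder first to review the planned renames, then flatten_one_level Folder 0 to perform them.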
This may work if the name is always the same, ie "file":
for i in {1..3};
do
mv $i/file ${i}_file
done
If you have more dirs on a number range, change {1..3} for {x..y}.
I use ${i}_file instead of $i_file because the shell would treat $i_file as a variable named i_file, while we want i to be the variable and _file to be literal text attached to it.
This solution from AskUbuntu worked for me.
Here is a bash script that does that:
Note: This script does not work if any of the file names contain spaces.
#! /bin/bash
# Only go through the directories in the current directory.
for dir in $(find ./ -type d)
do
# Remove the first two characters.
# Initially, $dir = "./directory_name".
# After this step, $dir = "directory_name".
dir="${dir:2}"
# Skip if $dir is empty. Only happens when $dir = "./" initially.
if [ ! $dir ]
then
continue
fi
# Go through all the files in the directory.
for file in $(ls -d $dir/*)
do
# Replace / with _
# For example, if $file = "dir/filename", then $new_file = "dir_filename"
# where $dir = dir
new_file="${file/\//_}"
# Move the file.
mv $file $new_file
done
# Remove the directory.
rm -rf $dir
done
Copy-paste the script in a file.
Make it executable using
chmod +x file_name
Move the script to the destination directory. In your case this should be inside Folder/.
Run the script using ./file_name.
