Remove YYYY_MM_DD_HH_MM from filename - bash

We have a few CSV and XML files in the following formats:
String_YYYY_MM_DD_HH_MM.csv
String_YYYY_MM_DD_HH_MM.xml
String.xml
String.csv
Examples:
Reference_Categories_2021_02_24_17_14.csv
CD_CategoryTree_2021_02_24_17_14.csv
New_Categories.xml
Mobile_Footnote_2021_03_05_16_21.csv
Campaign_Version_2018_09_24_20_00.xml
Campaign_new.csv
We need to remove _YYYY_MM_DD_HH_MM from the filenames, so the result will be:
Reference_Categories.csv
CD_CategoryTree.csv
New_Categories.xml
Mobile_Footnote.csv
Campaign_Version.xml
Campaign_new.csv
Any idea how to do that in bash?

In pure bash:
shopt -s nullglob  # a pattern with no matches expands to nothing
pat='_[0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9]_[0-9][0-9]_[0-9][0-9]'
for f in *$pat*.{csv,xml}; do echo mv "$f" "${f/$pat}"; done
Delete the echo if the output looks fine.
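To check what the ${f/$pat} expansion does before running the loop on real files, here is a minimal sketch that wraps it in a function. The name strip_ts is hypothetical; it assumes the timestamp always has the exact _YYYY_MM_DD_HH_MM shape from the question, and it prints the name unchanged when no timestamp is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a filename with its first
# _YYYY_MM_DD_HH_MM timestamp removed (unchanged if there is none).
strip_ts() {
    local pat='_[0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9]_[0-9][0-9]_[0-9][0-9]'
    # ${1/$pat} deletes the first match of the glob pattern $pat;
    # $pat stays unquoted so it is treated as a pattern, not literal text.
    printf '%s\n' "${1/$pat}"
}

# Usage:
# strip_ts Reference_Categories_2021_02_24_17_14.csv   # Reference_Categories.csv
# strip_ts Campaign_new.csv                            # Campaign_new.csv
```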

With bash, something like:
shopt -s nullglob extglob
for f in *.{xml,csv}; do
  ext="${f##*.}"
  # nothing to strip if the name has no _<digit> part
  [[ "${f%%_[0-9]*}" = *.@(xml|csv) ]] && continue
  echo mv -v -- "$f" "${f%%_[0-9]*}.$ext"
done
With the =~ operator and BASH_REMATCH:
shopt -s nullglob
regexp='^(.{1,})(_[[:digit:]]{4}_[[:digit:]]{2}_[[:digit:]]{2}_[[:digit:]]{2}_[[:digit:]]{2})([.].*)$'
for f in *.{xml,csv}; do
  # group 1 is the base name, group 3 the extension
  # (the negative index ${BASH_REMATCH[-1]} needs bash >= 4.3)
  [[ "$f" =~ $regexp ]] &&
    echo mv -v -- "$f" "${BASH_REMATCH[1]}${BASH_REMATCH[3]}"
done
Remove the echo if you're satisfied with the output.
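For reference, this is what the three capture groups hold for one of the sample names. The sketch below is illustrative only (show_groups is a hypothetical name) and uses ${BASH_REMATCH[3]} rather than the negative index, so it also runs on bash older than 4.3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the three capture groups for a filename,
# or fail if the name carries no _YYYY_MM_DD_HH_MM timestamp.
show_groups() {
    local regexp='^(.{1,})(_[[:digit:]]{4}_[[:digit:]]{2}_[[:digit:]]{2}_[[:digit:]]{2}_[[:digit:]]{2})([.].*)$'
    if [[ $1 =~ $regexp ]]; then
        printf 'base=%s stamp=%s ext=%s\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        printf 'no timestamp in %s\n' "$1"
        return 1
    fi
}
```

For Mobile_Footnote_2021_03_05_16_21.csv this prints base=Mobile_Footnote stamp=_2021_03_05_16_21 ext=.csv; for Campaign_new.csv it reports that there is no timestamp.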

Using bash, find, and awk:
Use find to locate files with a .csv or .xml suffix in the current directory (and below). Pipe the find output to awk, which prints the mv commands; bash then executes them. The generated commands quote each name, but a name containing a double quote, $ or backtick would still break them.
bash < <(find * -type f \( -name '*.csv' -o -name '*.xml' \) | awk '{orig=$0; gsub(/_[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]{2}_[0-9]{2}/,""); printf "mv -- \"%s\" \"%s\"\n", orig, $0}')
Directory contents before:
find * -type f
CD_CategoryTree_2021_02_24_17_14.csv
Campaign_Version_2018_09_24_20_00.xml
Campaign_new.csv
Mobile_Footnote_2021_03_05_16_21.csv
New_Categories.xml
Reference_Categories_2021_02_24_17_14.csv
Directory contents after:
find * -type f
CD_CategoryTree.csv
Campaign_Version.xml
Campaign_new.csv
Mobile_Footnote.csv
New_Categories.xml
Reference_Categories.csv
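Generating mv commands as text and feeding them back to bash re-parses every filename. A minimal alternative sketch does the substitution in the shell instead, assuming bash and a find with -maxdepth and -print0 (GNU and BSD find both have them). The function name strip_timestamps is hypothetical, and mv -n is used so that an existing target is never overwritten.

```shell
#!/usr/bin/env bash
# Hypothetical helper: in directory $1 (default .), rename *.csv / *.xml
# files by removing a _YYYY_MM_DD_HH_MM timestamp from their names.
strip_timestamps() {
    local dir=${1:-.}
    local pat='_[0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9]_[0-9][0-9]_[0-9][0-9]'
    local f base new
    # -print0 with read -d '' keeps names with spaces or newlines intact.
    while IFS= read -r -d '' f; do
        base=${f##*/}               # work on the file name only, not the path
        new=${base/$pat}
        [[ $new == "$base" ]] && continue   # no timestamp: leave it alone
        mv -n -- "$f" "${f%/*}/$new"        # -n: never overwrite a target
    done < <(find "$dir" -maxdepth 1 -type f \( -name '*.csv' -o -name '*.xml' \) -print0)
}
```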

Related

Rename files to unique names and move them into a single destination directory

I have hundreds of directories that each contain a file named content.html along with other files.
I am trying to copy all these content.html files into one directory, but since they have the same name, they overwrite each other.
So how can I rename them and move them all into one directory?
E.g.:
./0BD3D9D2-F8B1-4472-95C2-13319650A45C:
card.png content.html note.xhtml quickLook.png snippet.txt
./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0:
card.png content.html note.xhtml quickLook.png related snippet.txt
./1A33F29E-3938-4C2F-BA99-6B98FD045742:
card.png content.html note.xhtml quickLook.png snippet.txt
Commands I tried:
Step 1: rename content.html to content
find . -type f | grep content.html | while read f; do mv $f ${f/.html/}; done
Step 2: append a number to the filename "content" to make it unique
find . -type f | grep content | while read f; do i=1; echo mv $f $f$i.html; i=i+1; done
MacBook-Pro$ find . -type f | grep content | while read f; do i=1; echo mv $f $f$i.html; i=i+1; done
mv ./0BD3D9D2-F8B1-4472-95C2-13319650A45C/content ./0BD3D9D2-F8B1-4472-95C2-13319650A45C/content1.html
mv ./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0/content ./0EA34DB4-CD56-42BE-91DA-F631E44FB6E0/content1.html
mv ./1A33F29E-3938-4C2F-BA99-6B98FD045742/content ./1A33F29E-3938-4C2F-BA99-6B98FD045742/content1.html
Once the above step works, I should be able to do this to get my desired output:
find . -type f | grep content | while read f; do mv $f ../; done
However, I am sure this can be done in a single command, and my step 2 is not working either (i is not incrementing).
Any idea why step 2 is not working?
bash script:
#!/bin/bash
mkdir -p for_content
find . -type f -name content.html | while IFS= read -r f; do
  name=$(basename "$f")
  ((++i))
  mv "$f" "for_content/${name%.*}$i.html"
done
Replace for_content with your destination folder name.
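As for why step 2 in the question does not increment: i=1 sits inside the loop body, so the counter is reset on every iteration, and i=i+1 is a plain string assignment in bash (it stores the text i+1) unless i was declared with declare -i. A small sketch of the difference, using arithmetic (( )) as the script above does:

```shell
#!/usr/bin/env bash
# Step 2 as written: a string assignment, not arithmetic.
i=1
i=i+1
echo "$i"            # prints: i+1

# Fixed: initialise once, before the loop, and increment with (( )).
i=0
for f in a b c; do
    (( ++i ))
    echo "$f$i"      # prints a1, then b2, then c3
done
```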
Suppose you create a folder named final in your base directory to hold the
content.html files; then do something like the following:
find . -path ./final -prune -o -name "content.html" -print0 |
while read -r -d '' name
do
mv "$name" "./final/content$(mktemp -u XXXX).html"
# mktemp -u (dry run) only prints a random name; it does not create the file
done
At the end, all the content.html files will be in the ./final folder, named contentXXXX.html, where XXXX are random characters.
Note: -path ./final -prune -o in find prevents it from descending into the results folder.
The inode numbers of the files are unique within one filesystem, so you could use the following:
find $(pwd) -name "content.html" -printf %f" "%i" "%p"\n" | awk '{ system("mv "$3" <directorytomoveto>"$2$1) }'
I'd use something like this:
find . -type f -name 'test' | awk 'BEGIN{ cnt=0 }{ printf "mv %s ./output-dir/content_%03d.txt\n", $0, cnt++ }' | bash;
You can replace ./output-dir/ with your destination directory
Example:
[root@sl7-o2 test]# ls -R
.:
1 2 3 output-dir
./1:
test
./2:
test
./3:
test
./output-dir:
[root@sl7-o2 test]# find . -type f -name 'test' | awk 'BEGIN{ cnt=0 }{ printf "mv %s ./output-dir/content_%03d.txt\n", $0, cnt++ }' | bash;
[root@sl7-o2 test]# ls ./output-dir/
content_000.txt content_001.txt content_002.txt
You can use shopt -s globstar to grab all content.html files recursively and then use a loop to rename them:
#!/bin/bash
shopt -s globstar
counter=0
dest_dir=/path/to/destination
for f in **/content.html; do      # pick up all content.html files
  [[ -f "$f" ]] || continue       # skip if not a regular file
  mv "$f" "$dest_dir/content_$((++counter)).html"
done
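A self-contained version of the globstar loop, wrapped in a function so it can be tried on a scratch directory first. The function name collect_content and its two arguments are hypothetical; it assumes bash 4.0 or later for globstar, and it uses mv -n so an existing content_N.html is not overwritten.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move every content.html under $1 into $2,
# renaming them content_1.html, content_2.html, ...
collect_content() {
    local src=$1 dest=$2 f counter=0
    mkdir -p -- "$dest"
    # Note: this changes the shell options of the calling shell too.
    shopt -s globstar nullglob
    # With globstar, "$src"/**/ also matches $src itself (zero directories).
    for f in "$src"/**/content.html; do
        [[ -f $f ]] || continue                  # regular files only
        mv -n -- "$f" "$dest/content_$((++counter)).html"
    done
}
```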

Renaming all files in a folder with a prefix and in ascending order

How does one rename
random_files.jpg
that\ may\ contain\ spaces.jpg
and_differ_in_extensions.mp4
to
PREFIX_1.jpg
PREFIX_2.jpg
PREFIX_3.mp4
via bash script? More formally, how do I rename all files in a directory into an ordered list of form PREFIX_N.ext where .ext is preserved from the original filename.
My attempt:
for f in *; do
[[ -f "$f" ]] && mv "$f" "PREFIX_$f"
done
only adds a prefix; it does not number the files.
You can use a while loop over the output of find:
while IFS= read -rd '' file; do
  ext="${file##*.}"
  echo mv "$file" "PREFIX_$((++i)).$ext"
done < <(find . -maxdepth 1 -type f -name '*.*' -print0)
Once satisfied with the output, remove echo before mv command.
You can loop over the files using *, and then reference each one through a quoted variable to preserve any special characters.
You can then use parameter expansion to strip everything up to the last ., and append the remaining extension to your new filename.
x=1;for i in *;do [[ -f "$i" ]] && mv "$i" "PREFIX_$((x++)).${i##*.}";done
If you know x isn't already set, you can drop the assignment at the start and change $((x++)) to $((++x)).
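Both answers assume every file has an extension; for a name without a dot, ${f##*.} yields the whole name. A sketch that keeps the extension only when there is one and refuses to overwrite an existing target, assuming bash; number_files and its prefix argument are illustrative names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename regular files in the current directory
# to ${prefix}_N.ext, keeping each file's extension if it has one.
number_files() {
    local prefix=${1:-PREFIX} f ext n=0
    for f in *; do
        [[ -f $f ]] || continue
        if [[ $f == ?*.* ]]; then
            ext=.${f##*.}          # "photo.jpg" -> ".jpg"
        else
            ext=                   # no extension to keep
        fi
        mv -n -- "$f" "${prefix}_$((++n))$ext"   # -n: don't clobber
    done
}
```

The glob * is expanded once before the loop starts, so files renamed during the loop are not picked up again.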

Compare files with the same name

I created a script to compare files in a folder: files with a .jpg extension and files without it, but with the same name. The problem is that the script only searches one directory, not its subdirectories. How can I fix it?
for f in *
do
for n in *.jpg
do
tempfile="${n##*/}"
echo "Processing"
echo "${tempfile%.*}"
echo "$f"
if [[ "${tempfile%.*}" = $f ]]
then
echo "These files have the same name!"
# do something here
else
echo "No files"
fi
done
done
This requires bash version 4 for associative arrays.
shopt -s globstar nullglob extglob
declare -A jpgs
for jpg in **/*.jpg; do
name=$(basename "${jpg%.jpg}")
jpgs["$name"]=$jpg
done
for f in **/!(*.jpg); do
name=$(basename "$f")
if [[ -n ${jpgs["$name"]} ]]; then
echo "$f has the same name as ${jpgs["$name"]}"
fi
done
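The same two-pass idea as a function that can be tried on a scratch tree. It is a sketch, not the answer's exact code: it swaps the extglob pattern !(*.jpg) for a plain [[ ]] test, and additionally skips directories, which **/!(*.jpg) would also match. The name report_same_names is hypothetical; bash 4 or later is assumed for associative arrays.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report files under the current directory whose
# name equals the base name of some .jpg (e.g. b/photo vs a/photo.jpg).
report_same_names() {
    shopt -s globstar nullglob
    local -A jpgs
    local jpg f name
    # First pass: map "photo" -> "a/photo.jpg".
    for jpg in **/*.jpg; do
        name=${jpg%.jpg}
        jpgs[${name##*/}]=$jpg
    done
    # Second pass: every other regular file, looked up by its bare name.
    for f in **/*; do
        [[ -f $f && $f != *.jpg ]] || continue
        name=${f##*/}
        if [[ -n ${jpgs[$name]} ]]; then
            echo "$f has the same name as ${jpgs[$name]}"
        fi
    done
}
```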
You can also try using find:
find . -type f -name "*.jpg" -printf "%f\n" | cut -f1 -d '.' > jpg.txt
while IFS= read -r line
do
find . -name "$line.*" -print
done < jpg.txt

Bash script to list files not found

I have been looking for a way to list the files that do not exist, given a list of files that are required to exist. The files can exist in more than one location. What I have now:
#!/bin/bash
fileslist="$1"
while read fn
do
if [ ! -f `find . -type f -name $fn ` ];
then
echo $fn
fi
done < $fileslist
If a file does not exist, the find command prints nothing and the test does not work. Removing the ! and writing an if-then-else instead does not solve the problem.
How can I print the filenames from the list that are not found?
New script:
#!/bin/bash
fileslist="$1"
foundfiles="$HOME/tmp/tmp$(date +%Y%m%d%H%M%S).txt"
touch "$foundfiles"
while read -r fn
do
find . -type f -name "$fn" | sed 's:.*/::' >> "$foundfiles"
done < "$fileslist"
cat "$fileslist" "$foundfiles" | sort | uniq -u
rm "$foundfiles"
#!/bin/bash
fileslist="$1"
while read -r fn
do
FPATH=$(find . -type f -name "$fn")
if [ "$FPATH." = "." ]
then
echo "$fn"
fi
done < $fileslist
You were close!
Here is test.bash:
#!/bin/bash
fn=test.bash
exists=`find . -type f -name $fn`
if [ -n "$exists" ]
then
echo Found it
fi
It sets $exists to the output of find; the -n test checks whether that output is non-empty.
Try replacing the loop body with [[ -z "$(find . -type f -name $fn)" ]] && echo $fn (note that this code is bound to have problems with filenames containing spaces).
More efficient bashism:
diff <(sort $fileslist|uniq) <(find . -type f -printf %f\\n|sort|uniq)
In the diff output, lines marked with < are names from your list that were not found.
Give this a try:
find -type f -print0 | grep -Fzxvf - requiredfiles.txt
The -print0 and -z protect against filenames which contain newlines. If your utilities don't have these options and your filenames don't contain newlines, you should be OK.
The repeated find to filter one file at a time is very expensive. If your file list is directly compatible with the output from find, run a single find and remove any matches from your list:
find . -type f |
grep -Fvxf - "$1"
If not, maybe you can massage the output from find in the pipeline before the fgrep so that it matches the format in your file; or, conversely, massage the data in your file into find-compatible.
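One way to massage both sides into the same format is to compare bare file names with comm. A sketch, assuming the list file holds one bare file name per line and that no name contains a newline; LC_ALL=C makes both sorts use the same order, which comm requires. The function name missing_files is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print names from list file $1 that are not found
# anywhere under directory $2 (default: current directory).
missing_files() {
    local list=$1 dir=${2:-.}
    # comm -23 prints only the lines unique to the first input.
    LC_ALL=C comm -23 \
        <(LC_ALL=C sort -u -- "$list") \
        <(find "$dir" -type f | sed 's|.*/||' | LC_ALL=C sort -u)
}
```

A single find replaces one find per listed name, which is the main cost in the loop-based scripts.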
I use this script and it works for me
#!/bin/bash
fileslist="$1"
found="Found:"
notfound="Not found:"
len=`cat $1 | wc -l`
n=0;
while read fn
do
# don't worry about this; I use it to display progress through the file list
n=$((n + 1))
echo -en "\rLooking $(echo "scale=0; $n * 100 / $len" | bc)% "
if [ $(find / -name $fn | wc -l) -gt 0 ]
then
found=$(printf "$found\n\t$fn")
else
notfound=$(printf "$notfound\n\t$fn")
fi
done < $fileslist
printf "\n$found\n$notfound\n"
The following line counts the lines of find's output; if the count is greater than 0, the file was found. This searches the whole filesystem; replace / with . to search only the current directory.
$(find / -name $fn | wc -l) -gt 0
Then I simply run it, with the names in the file list separated by newlines:
./search.sh files.list

How to loop through a directory recursively to delete files with certain extensions

I need to loop through a directory recursively and remove all files with the extensions .pdf and .doc. I can loop through the directory recursively, but I can't manage to filter the files by those extensions.
My code so far:
#!/bin/sh
SEARCH_FOLDER="/tmp/*"
for f in $SEARCH_FOLDER
do
if [ -d "$f" ]
then
for ff in $f/*
do
echo "Processing $ff"
done
else
echo "Processing file $f"
fi
done
I need help to complete the code, since I'm not getting anywhere.
As a followup to mouviciel's answer, you could also do this as a for loop, instead of using xargs. I often find xargs cumbersome, especially if I need to do something more complicated in each iteration.
for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm $f; done
As a number of people have commented, this will fail if there are spaces in filenames. You can work around this by temporarily setting IFS (the internal field separator) to the newline character. It also fails if there are wildcard characters [?* in the file names. You can work around that by temporarily disabling wildcard expansion (globbing).
IFS=$'\n'; set -f
for f in $(find /tmp -name '*.pdf' -or -name '*.doc'); do rm "$f"; done
unset IFS; set +f
If you have newlines in your filenames, then that won't work either. You're better off with an xargs based solution:
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -print0 | xargs -0 rm
(The escaped parentheses are required here so that -print0 applies to both -or clauses.)
GNU and *BSD find also have a -delete action, which would look like this:
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete
find is just made for that.
find /tmp -name '*.pdf' -or -name '*.doc' | xargs rm
Without find:
for f in /tmp/**/* ; do
...
done
With the globstar option enabled (shopt -s globstar), /tmp/**/* matches files in /tmp itself and in all of its subdirectories; without globstar, ** behaves like *, so only one level of subdirectories is matched.
So for the question the code should look like this:
shopt -s globstar nullglob
for f in /tmp/**/*.pdf /tmp/**/*.doc ; do
rm "$f"
done
Note that this requires bash ≥4.0 (or zsh without shopt -s globstar, or ksh with set -o globstar instead of shopt -s globstar). Furthermore, in bash <4.3, this traverses symbolic links to directories as well as directories, which is usually not desirable.
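With globstar enabled, **/ also matches zero directories, so /tmp/**/*.pdf already includes PDFs directly in /tmp; listing /tmp/*.pdf as well would hand each top-level file to rm twice. A quick check on a scratch directory (the helper count_matches is hypothetical):

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob
# Hypothetical helper: print how many paths "$1"/**/*.pdf expands to.
count_matches() {
    local matches
    matches=("$1"/**/*.pdf)
    echo "${#matches[@]}"
}
```

For a tree holding top.pdf, a/mid.pdf and a/b/deep.pdf, all three are counted, including the top-level one.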
If you want to do something recursively, I suggest you use recursion (yes, you can do it using stacks and so on, but hey).
recursiverm() {
  for d in *; do
    if [ -d "$d" ]; then
      (cd -- "$d" && recursiverm)
    fi
  done
  # delete once per directory, after visiting its subdirectories
  rm -f -- *.pdf
  rm -f -- *.doc
}
(cd /tmp && recursiverm)
That said, find is probably a better choice as has already been suggested.
Here is an example using shell (bash):
#!/bin/bash
# loop & print a folder recursively
print_folder_recurse() {
for i in "$1"/*;do
if [ -d "$i" ];then
echo "dir: $i"
print_folder_recurse "$i"
elif [ -f "$i" ]; then
echo "file: $i"
fi
done
}
# try get path from param
path=""
if [ -d "$1" ]; then
path=$1;
else
path="/tmp"
fi
echo "base path: $path"
print_folder_recurse "$path"
This doesn't answer your question directly, but you can solve your problem with a one-liner:
find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -exec rm {} +
Some versions of find (GNU, BSD) have a -delete action which you can use instead of calling rm:
find /tmp \( -name "*.pdf" -o -name "*.doc" \) -type f -delete
For bash (since version 4.0):
shopt -s globstar nullglob dotglob
echo **/*".ext"
That's all.
The trailing ".ext" selects files (or directories) with that extension.
Option globstar enables ** (recursive matching).
Option nullglob makes a pattern that matches nothing expand to nothing, instead of being left as-is.
Option dotglob includes names that start with a dot (hidden files).
Beware that before bash 4.3, **/ also traverses symbolic links to directories, which is usually not desirable.
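The effect of the three options can be checked in a scratch directory. A sketch assuming bash 4.0 or later; list_ext is a hypothetical name, and its body runs in a subshell so the caller's shell options are left unchanged:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, one per line, every path under the current
# directory ending in ".$1", hidden files included.
list_ext() (
    shopt -s globstar nullglob dotglob
    for f in **/*".$1"; do
        printf '%s\n' "$f"
    done
)
```

Without nullglob, list_ext nomatch would print the unexpanded pattern **/*.nomatch; with it, nothing is printed.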
This method handles spaces well.
files="$(find -L "$dir" -type f)"
echo "Count: $(echo -n "$files" | wc -l)"
echo "$files" | while IFS= read -r file; do
echo "$file"
done
Edit: fixes the off-by-one error:
function count() {
files="$(find -L "$1" -type f)";
if [[ "$files" == "" ]]; then
echo "No files";
return 0;
fi
file_count=$(echo "$files" | wc -l)
echo "Count: $file_count"
echo "$files" | while IFS= read -r file; do
echo "$file"
done
}
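Counting lines of find output is still wrong whenever a filename contains a newline. Counting NUL terminators instead avoids that. A minimal sketch, assuming a find with -print0 (GNU and BSD both have it); count_files is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count regular files under $1 (following symlinks,
# like find -L above), correct even for names containing newlines.
count_files() {
    local n
    # Delete every byte except NUL, then count the remaining bytes.
    n=$(find -L "$1" -type f -print0 | tr -dc '\0' | wc -c)
    echo "Count: $((n))"    # $(( )) also strips wc's padding on BSD
}
```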
This is the simplest way I know to do this:
shopt -s globstar extglob
rm **/@(*.doc|*.pdf)
** makes this work recursively (requires the globstar option).
@(*.doc|*.pdf) matches a file ending in .doc OR .pdf (requires the extglob option).
It is easy to test safely by replacing rm with ls.
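A dry-run sketch of the one-liner that prints the matches instead of deleting them. extglob has to be switched on before any line containing @(...) is parsed, which is why the shopt line sits outside the function; list_doc_pdf is a hypothetical name.

```shell
#!/usr/bin/env bash
# extglob must be enabled before a line using @(...) is parsed.
shopt -s globstar extglob nullglob

# Hypothetical helper: print every .doc/.pdf path under the current directory.
list_doc_pdf() {
    local f
    for f in **/@(*.doc|*.pdf); do
        printf '%s\n' "$f"
    done
}
```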
The following function recursively iterates through all the directories under /home/ubuntu (the whole directory structure below it) and applies the necessary checks in the else block.
function check {
  for file in "$1"/*
  do
    if [ -d "$file" ]
    then
      check "$file"
    else
      ## check for the file: PDFs start with the bytes %PDF
      if [ "$(head -c 4 "$file")" = "%PDF" ]; then
        rm -- "$file"
      fi
    fi
  done
}
domain=/home/ubuntu
check "$domain"
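The head -c 4 test identifies PDFs by their leading %PDF bytes rather than by extension. Isolated as a predicate, with the command substitution quoted so that empty files don't break the [ test (is_pdf is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical predicate: succeed if $1 is a regular file whose first
# four bytes are "%PDF".
is_pdf() {
    [ -f "$1" ] && [ "$(head -c 4 -- "$1")" = "%PDF" ]
}
```

A file named text.pdf that does not start with %PDF is not matched, and a real PDF without the .pdf extension is.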
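There is no reason to pipe the output of find into another utility: GNU and BSD find have a built-in -delete action. Group the -name tests in parentheses; otherwise -delete applies only to the last one:
find /tmp \( -name '*.pdf' -or -name '*.doc' \) -delete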
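Without parentheses, the implicit -a between -name '*.doc' and the action binds tighter than -or, so the action runs only for the .doc branch. The same precedence can be observed safely by using -print in place of -delete (a sketch; demo_find_precedence is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical demo: compare the unparenthesized and parenthesized
# expressions on a scratch directory, printing instead of deleting.
demo_find_precedence() {
    local d
    d=$(mktemp -d) || return
    touch "$d/a.pdf" "$d/b.doc"
    echo "without parentheses:"
    find "$d" -name '*.pdf' -or -name '*.doc' -print       # only b.doc
    echo "with parentheses:"
    find "$d" \( -name '*.pdf' -or -name '*.doc' \) -print  # both files
    rm -rf -- "$d"
}
```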
The other answers don't include files or directories whose names start with a dot. The following worked for me:
#!/bin/sh
getAll()
{
local fl1="$1"/*;
local fl2="$1"/.[!.]*;
local fl3="$1"/..?*;
for inpath in "$1"/* "$1"/.[!.]* "$1"/..?*; do
if [ "$inpath" != "$fl1" -a "$inpath" != "$fl2" -a "$inpath" != "$fl3" ]; then
stat --printf="%F\0%n\0\n" -- "$inpath";
if [ -d "$inpath" ]; then
getAll "$inpath"
#elif [ -f $inpath ]; then
fi;
fi;
done;
}
I think the most straightforward solution is recursion. The following example prints all the file names in the directory and its subdirectories.
You can modify it according to your needs.
#!/bin/bash
printAll() {
  for i in "$1"/*; do          # for everything in the directory
    if [ -f "$i" ]; then       # if it is a regular file
      echo "$i"                # print the file name
    elif [ -d "$i" ]; then     # if it is a directory
      printAll "$i"            # call printAll inside it (recursion)
    fi
  done
}
printAll "$1"  # e.g.: ./printAll.sh .
OUTPUT:
> ./printAll.sh .
./demoDir/4
./demoDir/mo st/1
./demoDir/m2/1557/5
./demoDir/Me/nna/7
./TEST
It works fine with spaces as well!
Note:
You can use echo "$(basename "$i")" to print the file name without its path.
Or use echo "${i##*/}", which is much faster because it doesn't call the external basename.
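The expansion ${i##*/} removes the longest prefix ending in /, leaving just the file name. A small check against basename:

```shell
#!/usr/bin/env bash
i='./demoDir/mo st/1'
echo "${i##*/}"            # prints: 1
echo "$(basename "$i")"    # prints: 1
```

The two differ for a path with a trailing slash: basename a/b/ prints b, while ${j##*/} for j='a/b/' is empty. That doesn't arise in printAll, where "$1"/* never produces a trailing slash.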
Just do
find . -name '*.pdf'|xargs rm
If you can change the shell used to run the command, you can use ZSH to do the job.
#!/usr/bin/zsh
for file in /tmp/**/*
do
echo $file
done
This will recursively loop through all files/folders.
The following loops over the entries of the given directory and lists the contents of each one. Note that it only goes one level deep; it is not recursive.
for d in /home/ubuntu/*;
do
echo "listing contents of dir: $d";
ls -l "$d"/;
done
