In my directory I have thousands of PDF files. I want to write a shell script that goes through all the files, trims the last 16 characters of each name, and saves the file back to the directory without keeping the old filename.
Now:
KUD_1234_Abc_DEF_9055_01.pdf
New:
KUD_1234.pdf
How can I solve this?
Thank you all
Here's to the importance of analyzing and describing a problem properly in order to find a proper solution.
Here I implement exactly what you ask for:
#!/usr/bin/env sh
for oldname
do
    # Capture the old file name's extension for re-use, by trimming
    # everything up to and including the last dot
    extension=${oldname##*.}
    # Capture the old name without its extension
    extensionless=${oldname%.*}
    # Compose the new name by printing the old file name
    # up to its length minus 16 characters
    # and re-adding the extension
    newname=$(
        printf '%.*s.%s\n' $((${#extensionless}-16)) "$extensionless" "$extension"
    )
    # Demonstrate the rename as a dry run
    echo mv -- "$oldname" "$newname"
done
Works for your sample case:
mv -- KUD_1234_Abc_DEF_9055_01.pdf KUD_1234.pdf
Will collide with the result of the first rename instead of renaming this:
mv -- KUD_1234_ooh_fail_666_02.pdf KUD_1234.pdf
Will not work with names shorter than 16 characters:
mv -- notwork.pdf notwork.pdf
Will probably not do what you expect if the name has no dot extension:
mv -- foobar foobar.foobar
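If the collisions matter, a guard can be added before the mv. A minimal sketch of that variant (the existence check is my addition, not part of the script above):

#!/usr/bin/env sh
for oldname
do
    extension=${oldname##*.}
    extensionless=${oldname%.*}
    newname=$(
        printf '%.*s.%s\n' $((${#extensionless}-16)) "$extensionless" "$extension"
    )
    # Skip the rename instead of clobbering an existing target
    if [ -e "$newname" ]; then
        echo "skipping $oldname: $newname already exists" >&2
    else
        echo mv -- "$oldname" "$newname"
    fi
done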
This should work for you (please back up your data before trying):
find . -type f -name '*.pdf' | sed -nE 's|^(.+)(.{16})(\.pdf)$|\1\2\3 \1\3|p' | xargs -n2 mv --
(Note that this breaks on file names containing whitespace or quote characters.)
However, it's much easier to do with Python:
import os

os.chdir("/home/tkhalymon/dev/tmp/empty")
for f in os.listdir("."):
    name, ext = os.path.splitext(f)
    os.rename(f, f"{name[:-16]}{ext}")
I constantly get a bunch of files named "Unknown.png" in a folder, and oftentimes they get renamed to "unknown (1).png", "unknown (2).png", etc. This is a bit of a problem: sometimes, when cleaning up files and moving them somewhere else, I get asked whether I want to replace or rename them.
So I decided to make a crontab task that renames the files to CB_RANDOM; this way I don't even have to worry about potentially overwriting two files with the same name.
I got part of the way: I find the files, replace the name "Unknown" with "CB_", and add a random number.
The problem comes with the "(x)" at the end of the filename. I also managed to figure out how to solve that: I just strip away any parentheses and digits.
The problem is I can't figure out how to make the rename invocation follow both rules.
for u in (find -name unknown*); do
rCode = random
rename -v 's/unknown/CB_$rCode' $u
rename -v 's/[ ()0123456789]//g' $u
Ideally I'd like to be able to apply both rules on the same line of code, especially since, once the first rename runs, $u won't point at the file any more for the second step.
No need for a loop:
find -name 'unknown*' -exec rename 's/unknown \([0-9]+\)\.(.*)$/"CB_".sprintf("%04s",int(rand(10000))).".".$1/e' {} \;
find all the files, starting in the current directory, recursively, with names similar to "unknown (1).png"
rename them with a resulting filename similar to "CB_0135.png"
This produces an error message if a filename already exists.
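Since renames are destructive, it may be worth previewing the result first. Assuming the Perl-based rename, its -n (no-act) flag prints the planned renames without performing them:

find . -name 'unknown*' -exec rename -n 's/unknown \([0-9]+\)\.(.*)$/"CB_".sprintf("%04s",int(rand(10000))).".".$1/e' {} \;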
Your code should first be changed into
# find is a subcommand, use $()
# find a file with wildcard, use quotes
for u in $(find -name "unknown*"); do
# Is random a command? Use $()
rCode=$(random)
# Debug with echo, will show other problem
echo "File $u"
# $rCode will not be replaced by its value in single quotes
# Write a filename in double quotes, so it will not be split by a space
rename -v "s/unknown/CB_$rCode" "$u"
rename -v 's/[ ()0123456789]//g' "$u"
done
The new line with the echo shows that the loop is breaking up the filenames at the spaces. You can fix this with:
while IFS= read -r u; do
# Use unique timestamp, not random value
rCode=$(date '+%Y%m%d_%H%M')
echo "File $u"
rename -v "s/unknown/CB_$rCode" "$u"
rename -v 's/[ ()0123456789]//g' "$u"
done < <(find -name "unknown*")
I never use rename and would use
while IFS= read -r u; do
# Use unique timestamp, not random value
rCode=$(date '+%Y%m%d_%H%M')
# construct new filename.
# Restriction: Path to file is without newlines, spaces or parentheses
newfile=$(sed 's/[ ()]//g; s/.*unknown/&_'"${rCode}"'_/' <<< "$u")
echo "Moving file $u to ${newfile}"
mv "$u" to "${newfile}"
done < <(find -name "unknown*")
EDIT:
I removed a sed command for renaming files with (something) in it:
# Removed command
newfile=$(sed 's/\(.*\)(\(.*\))/\1'"${rCode}"'_\2/' <<< "$u")
I'm trying to remove a string located after the file name extension, on multiple files at once. I do not know where the files will be, just that they will reside in a subfolder of the one I am in.
I need to remove the last string, i.e. everything after the file extension. The file name is:
something-unknown.js?ver=12234.... (the last bit is unknown too)
This one (below) I found in this thread:
for nam in *sqlite3_done
do
newname=${nam%_done}
mv $nam $newname
done
I know that I have to use % to remove the bit from the end, but how do I use wildcards in the last bit, when I already have it as the "for any file" selector?
I have tried with a modified bit of the above:
for nam in *.js*
do
newname=${ nam .js% } // removing all after .js
mv $nam $newname
done
I'm on macOS Yosemite, with the bash shell and sed. I know of rename and sed, but I've only seen topics with specific strings, no wildcards for this issue, except these:
How to rename files using wildcard in bash?
https://unix.stackexchange.com/questions/227640/rename-first-part-of-multiple-files-with-mv
I think this is what you are looking for in terms of parameter substitution:
$ ls -C1
first-unknown.js?ver=111
second-unknown.js?ver=222
third-unknown.js?ver=333
$ for f in *.js\?ver=*; do echo ${f%\?*}; done
first-unknown.js
second-unknown.js
third-unknown.js
Note that we escape the ? as \? to say that we want to match the literal question mark, distinguishing it from the special glob symbol that matches any single character.
Renaming the files would then be something like:
$ for f in *.js\?ver=*; do echo "mv $f ${f%\?*}"; done
mv first-unknown.js?ver=111 first-unknown.js
mv second-unknown.js?ver=222 second-unknown.js
mv third-unknown.js?ver=333 third-unknown.js
Personally I like to output the commands, save it to a file, verify it's what I want, and then execute the file as a shell script.
If it needs to be fully automated you can remove the echo and do the mv directly.
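For example, that workflow could look like this (do-rename.sh is just a scratch file name I picked):

$ for f in *.js\?ver=*; do echo "mv '$f' '${f%\?*}'"; done > do-rename.sh
$ cat do-rename.sh    # review the generated mv commands
$ sh do-rename.sh     # execute once satisfied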
for x in $(find . -type f -name '*.js*'); do mv "$x" "$(echo "$x" | sed 's/\.js.*/.js/')"; done
I totally understand what the problem is here.
I have a set of files prefixed with 'cat' and 'dog', e.g. 'cat.jpg' and 'dog.jpg'. I just want to move the 'cat' files into a directory called 'cat', and the same with the 'dog' files.
for f in *.jpg; do
name=`echo "$f"|sed 's/ -.*//'`
firstThreeLetters=`echo "$name"|cut -c 1-3`
dir="path/$firstThreeLetters"
mv "$f" "$dir"
done
I get this message:
mv: cannot stat '*.jpg': No such file or directory
That's fine. But I can't find any way to iterate over these images without using that wildcard.
I don't want to use the wildcard. The only files there are prefixed with 'dog' or 'cat'. I don't need to match anything; all the files are .jpgs.
Can't I just iterate over the contents of the directory without using a wildcard? I know this is a bit of an XY Problem but still I would like to learn this.
*.jpg would yield the literal *.jpg when there are no matching files.
Looks like you need nullglob. With Bash, you can do this:
#!/bin/bash
shopt -s nullglob # makes glob expand to nothing in case there are no matching files
for f in cat*.jpg dog*.jpg; do # pick only cat & dog files
first3=${f:0:3} # grab first 3 characters of filename
[[ -d "$first3" ]] || continue # skip if there is no such dir
mv "$f" "$first3/$f" # move
done
I have several directories containing files whose names contain the name of the folder plus other words.
Example:
one/berg - one.txt
two/tree - two.txt
three/water - three.txt
and I would like them to end up like this:
one/berg.txt
two/tree.txt
three/water.txt
I tried with the sed command, the find command, a for loop, etc., but I failed to find a way to get it.
Could you help me? Thank you
Short and simple, if you have GNU find:
find . -name '* - *.*' -execdir bash -c '
for file; do
ext=${file##*.}
mv -- "$file" "${file%% - *}.${ext}"
done
' _ {} +
-execdir executes the given command within the directory where each set of files are found, so one doesn't need to worry about directory names.
for file; do is a shorter way to write for file in "$@"; do.
${file##*.} expands to the contents of $file, with everything up to and including the last . removed (thus, it expands to the file's extension).
"${varname%% - *}" expands to the contents of the variable varname, with everything after <space><dash><space> removed from the end.
In the idiom -exec bash -c '...' _ {} + (as with -execdir), the script passed to bash -c is run with _ as $0, and all files found by find in the subsequent positions.
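To see those expansions in isolation, try them on one of the names from this thread:

$ file='berg - one.txt'
$ ext=${file##*.}
$ echo "${file%% - *}.${ext}"
berg.txt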
Here's a way to do it with the help of sed:
#!/bin/bash
find -type f -print0 | \
while IFS= read -r -d '' old_path; do
new_path="$(echo "$old_path" | sed -e 's|/\([^/]\+\)/\([^/]\+\) - \1.\([^/.]\+\)$|/\1/\2.\3|')"
if [[ $new_path != "$old_path" ]]; then
echo mv -- "$old_path" "$new_path"
# ^^^^ remove this "echo" to actually rename the files
fi
done
You must cd to the top-level directory that contains all those files before running this. Also, it contains an echo, so it does not actually rename the files. Run it once to see if you like its output and, if you do, remove the echo and run it again.
The basic idea is that we iterate over all files and for each file, we try to find if the file matches with the given pattern. If it does, we rename it. The pattern detects (and captures) the second last component of the path and also breaks up the last component of the path into 3 pieces: the prefix, the suffix (which must match with the previous path component), and the extension.
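For instance, running the sed expression by hand on one of the sample paths (with GNU sed) shows the pattern at work:

$ echo "./one/berg - one.txt" | sed -e 's|/\([^/]\+\)/\([^/]\+\) - \1.\([^/.]\+\)$|/\1/\2.\3|'
./one/berg.txt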
I have a perl script (or any executable) E which will take a file foo.xml and write a file foo.txt. I use a Beowulf cluster to run E for a large number of XML files, but I'd like to write a simple job server script in shell (bash) which doesn't overwrite existing txt files.
I'm currently doing something like
#!/bin/sh
PATTERN="[A-Z]*0[1-2][a-j]"; # this matches foo in all cases
todo=`ls *.xml | grep $PATTERN -o`;
isdone=`ls *.txt | grep $PATTERN -o`;
whatsleft=todo - isdone; # what's the unix magic?
#tack on the .xml suffix with sed or something
#and then call the job server;
jobserve E "$whatsleft";
and then I don't know how to get the difference between $todo and $isdone. I'd prefer using sort/uniq to something like a for loop with grep inside, but I'm not sure how to do it (pipes? temporary files?)
As a bonus question, is there a way to do lookahead search in bash grep?
To clarify/extend the problem:
I have a bunch of programs that take input from sources like (but not necessarily) data/{branch}/special/{pattern}.xml and write output to another directory results/special/{branch}-{pattern}.txt (or data/{branch}/intermediate/{pattern}.dat, e.g.). I want to check in my jobfarming shell script if that file already exists.
So E transforms data/{branch}/special/{pattern}.xml->results/special/{branch}-{pattern}.dat, for instance. I want to look at each instance of the input and check if the output exists. One (admittedly simpler) way to do this is just to touch *.done files next to each input file and check for those results, but I'd rather not manage those, and sometimes the jobs terminate improperly so I wouldn't want them marked done.
N.B. I don't need to check concurrency yet or lock any files.
So a simple, clear way to solve the above problem (in pseudocode) might be
for i in `/bin/ls *.xml`
do
replace xml suffix with txt
if [that file does not exist]
add to whatsleft list
end
done
but I'm looking for something more general.
#!/bin/bash
shopt -s extglob # allow extended glob syntax, for matching the filenames
LC_COLLATE=C # use a sort order comm is happy with
IFS=$'\n' # so filenames can have spaces but not newlines
# (newlines don't work so well with comm anyhow;
# shame it doesn't have an option for null-separated
# input lines).
files_todo=( **([A-Z])0[1-2][a-j]*.xml )
files_done=( **([A-Z])0[1-2][a-j]*.txt )
files_remaining=( \
    $(comm -23 --nocheck-order \
        <(printf "%s\n" "${files_todo[@]%.xml}") \
        <(printf "%s\n" "${files_done[@]%.txt}") ))
echo jobserve E $(for f in "${files_remaining[@]}"; do printf "%s\n" "${f}.xml"; done)
This assumes that you want a single jobserve E call with all the remaining files as arguments; it's rather unclear from the specification if such is the case.
Note the use of extended globs rather than parsing ls, which is considered very poor practice.
To transform input to output names without using anything other than shell builtins, consider the following:
if [[ $in_name =~ data/([^/]+)/special/([^/]+)\.xml ]] ; then
out_name=results/special/${BASH_REMATCH[1]}-${BASH_REMATCH[2]}.dat
else
: # ...handle here the fact that you have a noncompliant name...
fi
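A quick sanity check with a made-up name (branchA and pat1 are hypothetical):

$ in_name=data/branchA/special/pat1.xml
$ [[ $in_name =~ data/([^/]+)/special/([^/]+)\.xml ]] &&
>     echo "results/special/${BASH_REMATCH[1]}-${BASH_REMATCH[2]}.dat"
results/special/branchA-pat1.dat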
The question title suggests that you might be looking for:
set -o noclobber
The question content indicates a wholly different problem!
It seems you want to run 'jobserve E' on each '.xml' file without a matching '.txt' file. You'll need to assess the TOCTOU (Time of Check, Time of Use) problems here because you're in a cluster environment. But the basic idea could be:
todo=""
for file in *.xml
do [ -f "${file%.xml}.txt" ] || todo="$todo $file"
done
jobserve E $todo
This will work with Korn shell as well as Bash. In Bash you could explore making 'todo' into an array; that will deal with spaces in file names better than this will.
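A minimal sketch of that array variant (Bash only, as POSIX sh has no arrays):

todo=()
for file in *.xml
do [ -f "${file%.xml}.txt" ] || todo+=("$file")
done
jobserve E "${todo[@]}"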
If you have processes still generating '.txt' files for '.xml' files while you run this check, you will get some duplicated effort (because this script cannot tell that the processing is happening). If the 'E' process creates the corresponding '.txt' file as it starts processing, that minimizes the chance of duplicated effort. Or consider separating the processed files from the unprocessed files, so the 'E' process moves the '.xml' file from the 'to-be-done' directory to the 'done' directory (and writes the '.txt' file to the 'done' directory too). If done carefully, this can avoid most of the multi-processing problems. For example, you could link the '.xml' file into the 'done' directory when processing starts, and ensure appropriate cleanup with an 'atexit()' handler (if you are moderately confident your processing program does not crash). Or other trickery of your own devising.
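A sketch of that claim-by-linking idea, with hypothetical 'to-be-done' and 'done' directory names; ln fails when the target already exists, so at most one worker claims each file:

#!/bin/bash
for f in to-be-done/*.xml; do
    base=$(basename "$f")
    # ln(1) refuses to overwrite: the first worker to link wins the claim
    if ln "$f" "done/$base" 2>/dev/null; then
        E "done/$base" && rm -- "$f"   # E writes done/${base%.xml}.txt
    fi
done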
whatsleft=$( ls *.xml *.txt | grep $PATTERN -o | sort | uniq -u )
Note this actually gets a symmetric difference.
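That is because uniq -u keeps only the lines that occur exactly once, e.g.:

$ printf '%s\n' foo bar bar baz | sort | uniq -u
baz
foo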
I am not exactly sure what you want, but you can check for the existence of the file first and, if it exists, create a new name. (Or you do this check in your E Perl script.)
if [ -f "$file" ];then
newname="...."
fi
...
jobserve E .... > $newname
If it's not what you want, describe more clearly in your question what you mean by "don't overwrite files".
For posterity's sake, this is what I found to work:
TMPA='neverwritethis.tmp'
TMPB='neverwritethat.tmp'
ls *.xml | grep $PATTERN -o > $TMPA;
ls *.txt | grep $PATTERN -o > $TMPB;
whatsleft=`sort $TMPA $TMPB | uniq -u | sed 's/$/.xml/'`;
rm $TMPA $TMPB;