Removing unknown / non-specific string after file extension on file names


I'm trying to remove a string that comes after the file name extension, on multiple files at once. I don't know exactly where the files will be, just that they will reside in a subfolder of the one I am in.
I need to remove the last part, everything after the file extension. A file name looks like:
something-unknown.js?ver=12234.... (last bit is unknown too)
I found the one below in this thread:
for nam in *sqlite3_done
do
newname=${nam%_done}
mv "$nam" "$newname"
done
I know that I have to use % to remove the part at the end, but how do I use a wildcard for that last part when I'm already using one as the "for any file" selector?
I have tried a modified version of the above:
for nam in *.js*
do
newname=${ nam .js% } // removing all after .js
mv $nam $newname
done
I'm on macOS Yosemite, with the bash shell and sed. I know of rename and sed, but I've only seen threads dealing with specific strings; the only ones using wildcards for this issue are these:
How to rename files using wildcard in bash?
https://unix.stackexchange.com/questions/227640/rename-first-part-of-multiple-files-with-mv

I think this is what you are looking for in terms of parameter substitution:
$ ls -C1
first-unknown.js?ver=111
second-unknown.js?ver=222
third-unknown.js?ver=333
$ for f in *.js\?ver=*; do echo ${f%\?*}; done
first-unknown.js
second-unknown.js
third-unknown.js
Note that we escape the ? as \? to say that we want to match the literal question mark, distinguishing it from the special glob symbol that matches any single character.
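To see the difference the escape makes, here is a small bash sketch (the classify function is purely illustrative) that tests names against both patterns with case:

```shell
#!/usr/bin/env bash
# Shows why the "?" in the pattern needs escaping:
#   \?  matches only a literal question mark
#   ?   matches any single character
classify() {
  case $1 in
    *.js\?*) echo "literal ?" ;;      # e.g. first-unknown.js?ver=111
    *.js?*)  echo "any character" ;;  # also catches names such as app.json
    *)       echo "no match" ;;
  esac
}

classify 'first-unknown.js?ver=111'   # literal ?
classify 'app.json'                   # any character
classify 'plain.js'                   # no match
```

Without the backslash, a loop over *.js?* would also pick up files like app.json, which is exactly what the escaped form avoids.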
Renaming the files would then be something like:
$ for f in *.js\?ver=*; do echo "mv $f ${f%\?*}"; done
mv first-unknown.js?ver=111 first-unknown.js
mv second-unknown.js?ver=222 second-unknown.js
mv third-unknown.js?ver=333 third-unknown.js
Personally, I like to output the commands, save them to a file, verify they are what I want, and then execute the file as a shell script.
If it needs to be fully automated you can remove the echo and do the mv directly.
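The question mentions that the files sit in subfolders, which a plain *.js\?ver=* glob in the current directory does not reach. A minimal sketch, assuming bash and a find that supports -print0 (GNU and macOS find both do); the function name strip_js_query and the DRY_RUN switch are hypothetical, not part of the answer above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove everything from the first "?" in the file name of
# every *.js?... file below a directory (default: the current one).
# By default it only prints the mv commands; run with DRY_RUN=0 to rename.
strip_js_query() {
  local root=${1:-.} path dir base new
  find "$root" -type f -name '*.js[?]*' -print0 |
    while IFS= read -r -d '' path; do
      dir=${path%/*}              # directory part of the path
      base=${path##*/}            # file name part
      new="$dir/${base%%\?*}"     # drop "?ver=..." from the file name only
      if [ "${DRY_RUN:-1}" = 1 ]; then
        printf 'mv -n -- %q %q\n' "$path" "$new"
      else
        mv -n -- "$path" "$new"   # -n: never overwrite an existing file
      fi
    done
}
```

Run strip_js_query . first and read the printed commands; DRY_RUN=0 strip_js_query . then performs the renames. Splitting off the directory part means only the file name is cut at the first ?, and the [?] in the find pattern matches a literal question mark just like \? does in the loop above.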

find . -type f -name '*.js[?]*' | while IFS= read -r x; do mv -- "$x" "$(echo "$x" | sed 's/\.js?.*$/.js/')"; done

Related

How can I create a rename script using multiple rules?

I constantly get a bunch of files named "Unknown.png" in a folder, and they often end up named "unknown (1).png", "unknown (2).png", etc. This is a bit of a problem, because when I clean up files and move them somewhere else I get asked whether I want to replace or rename them.
So I decided to create a crontab task that renames the files to CB_RANDOM; this way I don't have to worry about accidentally overwriting two files with the same name.
I have figured out this much: I find the files, replace the name Unknown with CB_, and add a random number.
The problem is the (x) at the end of the filename. I also figured out how to solve that: I just strip away any parentheses and digits.
What I can't figure out is how to make rename apply both rules.
for u in (find -name unknown*); do
rCode = random
rename -v 's/unknown/CB_$rCode' $u
rename -v 's/[ ()0123456789]//g' $u
Ideally I'd like to apply both rules in the same line of code, especially since, once the first line runs, $u no longer points to an existing file for the second step.

No need for a loop:
find . -name 'unknown*' -exec rename 's/unknown \([0-9]+\)\.(.*)$/"CB_".sprintf("%04d",int(rand(10000))).".".$1/e' {} \;
find all the files, starting in the current directory, recursively, with names similar to "unknown (1).png"
rename them with a resulting filename similar to "CB_0135.png"
This produces an error message if a filename already exists.
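If Perl's rename is not installed, a rough pure-bash counterpart of the same idea could look like this. It is a sketch under stated assumptions: a single directory (no recursion), bash's $RANDOM for the four digits, and a hypothetical function name rename_unknown. Both rules are applied by computing the final name first and calling mv only once, so there is no second step that loses track of the file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: give every "unknown*.ext" / "Unknown*.ext" file in one
# directory a name of the form CB_NNNN.ext with a single mv, retrying until
# the random name is not already taken.
rename_unknown() {
  local dir=${1:-.} f ext new restore
  restore=$(shopt -p nullglob || true)   # remember the caller's nullglob setting
  shopt -s nullglob                      # no matches -> the loop does not run
  for f in "$dir"/[Uu]nknown*.*; do
    [ -f "$f" ] || continue
    ext=${f##*.}                         # keep the original extension (png, ...)
    while :; do
      new=$(printf '%s/CB_%04d.%s' "$dir" "$((RANDOM % 10000))" "$ext")
      [ -e "$new" ] || break             # retry on a name collision
    done
    mv -- "$f" "$new"
  done
  eval "$restore"
}
```

A crontab entry would then call this function (for example from a small script) with the download folder as its argument.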

Your code should first be changed to:
# find is a subcommand, use $()
# find a file with wildcard, use quotes
for u in $(find -name "unknown*"); do
# Is random a command? Use $()
rCode=$(random)
# Debug with echo, will show other problem
echo "File $u"
# $rCode will not be replaced by its value in single quotes
# Write a filename in double quotes, so it will not be split by a space
rename -v "s/unknown/CB_$rCode/" "$u"
rename -v 's/[ ()0123456789]//g' "$u"
done
The added echo line shows that the loop splits the filenames at spaces. You can avoid this with:
while IFS= read -r u; do
# Use unique timestamp, not random value
rCode=$(date '+%Y%m%d_%H%M')
echo "File $u"
rename -v "s/unknown/CB_$rCode/" "$u"
rename -v 's/[ ()0123456789]//g' "$u"
done < <(find -name "unknown*")
I never use rename myself, and would use:
while IFS= read -r u; do
# Use unique timestamp, not random value
rCode=$(date '+%Y%m%d_%H%M')
# construct new filename.
# Restriction: Path to file is without newlines, spaces or parentheses
newfile=$(sed 's/[ ()]//g; s/.*unknown/&_'"${rCode}"'_/' <<< "$u")
echo "Moving file $u to ${newfile}"
mv "$u" "${newfile}"
done < <(find -name "unknown*")
EDIT:
I removed a sed command for renaming files with (something) in it:
# Removed command
newfile=$(sed 's/\(.*\)(\(.*\))/\1'"${rCode}"'_\2/' <<< "$u")

How to remove unknown file extensions from files using script

I can remove file extensions if I know the extensions, for example to remove .txt from files:
foreach file (`find . -type f`)
mv $file `basename $file .txt`
end
However if I don't know what kind of file extension to begin with, how would I do this?
I tried:
foreach file (`find . -type f`)
mv $file `basename $file .*`
end
but it wouldn't work.

What shell is this? At least in bash you can do:
find . -type f -name '?*.*' | while read -r; do
mv -- "$REPLY" "${REPLY%.*}"
done
(The usual caveats apply: This doesn't handle files whose name contains newlines.)

You can use sed to compute the base file name:
foreach file (`find . -type f`)
mv $file `echo $file | sed -e 's/^\(.*\)\.[^./][^./]*$/\1/'`
end

Be cautious: The command you seek to run could cause loss of data!
If you don't think your file names contain newlines or double quotes, then you could use:
find . -type f -name '?*.*' |
sed 's/\(.*\)\.[^.]*$/mv "&" "\1"/' |
sh
This generates your list of files (making sure that the names contain at least one character plus a .), runs each file name through the sed script to convert it into an mv command by effectively removing the material from the last . onwards, and then runs the stream of commands through a shell.
Clearly, you should test this first by omitting the | sh part. Consider running it with | sh -x to get a trace of what the shell is doing. Consider capturing the output of the shell, both standard output and standard error, in a log file so you have a record of the damage that occurred.
Do make sure you've got a backup of the original set of files before you start playing with this. It need only be a tar file stored in a different part of the directory hierarchy, and you can remove it as soon as you're happy with the results.
You can choose any shell; this doesn't rely on any shell constructs except pipes and single quotes and double quotes (pretty much common to all shells), and the sed script is version neutral too.
Note that if you have files xyz.c and xyz.h before you run this, you'll only have a file xyz afterwards (and what it contains depends on the order in which the files are processed, which needn't be alphabetic order).
If you think your file names might contain double quotes (but not single quotes), you can play with changing the quotes in the sed script. If you might have to deal with both, you need a more complex sed script. If you need to deal with newlines in file names, then it is time to (a) tell your user(s) to stop being silly and (b) fix the names so they don't contain newlines. Then you can use the script above. If that isn't feasible, you have to work a lot harder to get the job done accurately; you probably need to make sure you've got a find that supports -print0, a sed that supports -z and an xargs that supports -0 (installing the most recent GNU versions if you don't already have the right support in place).
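As a sketch of the harder, newline-safe route, here is a bash loop that swaps in read -d '' for the sed -z and xargs -0 pipeline mentioned above (an assumption: bash plus a find with -print0). The function name strip_extensions is hypothetical, and instead of letting xyz.h silently replace xyz.c it skips any file whose target already exists:

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the last extension from every regular file below a
# directory. Names are NUL-delimited, so spaces, quotes and newlines are safe.
strip_extensions() {
  local root=${1:-.} path dir base new
  # '?*.*' = at least one character before a dot, so ".bashrc" is not touched
  find "$root" -type f -name '?*.*' -print0 |
    while IFS= read -r -d '' path; do
      dir=${path%/*}
      base=${path##*/}
      new="$dir/${base%.*}"      # cut only inside the file name, not the directory
      if [ -e "$new" ]; then
        printf 'skipping %s: %s already exists\n' "$path" "$new" >&2
        continue
      fi
      mv -- "$path" "$new"
    done
}
```

Which of xyz.c and xyz.h survives as xyz still depends on the order find returns them, but the other one is kept instead of being lost.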

It's very simple:
$ set filename=/home/foo/bar.dat
$ echo ${filename:r}
/home/foo/bar
See more in man tcsh, in "History substitution":
r
Remove a filename extension '.xxx', leaving the root name.
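For comparison, a rough bash counterpart of :r might look like the function below. It is only an approximation (the name root_name is hypothetical, and it is not claimed to replicate every corner of tcsh's behaviour); it strips the last .xxx from the final path component only, so a dot in a directory name or the leading dot of a hidden file is left alone:

```shell
#!/usr/bin/env bash
# Rough bash analogue of tcsh's :r modifier (an approximation, not an exact
# replica): remove a trailing ".xxx" from the last path component only.
root_name() {
  local path=$1 dir='' base=$1
  case $path in
    */*) dir=${path%/*}/ ; base=${path##*/} ;;
  esac
  case $base in
    ?*.*) base=${base%.*} ;;   # has a dot after the first character
  esac
  printf '%s\n' "$dir$base"
}

root_name /home/foo/bar.dat     # /home/foo/bar
root_name /home/foo.d/bar       # /home/foo.d/bar
root_name .bashrc               # .bashrc
```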

automatically renaming files

I have a bunch of files (more than 1000) like the following:
$ ls
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-dev.lc
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-dev.lex
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-train.lc
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm-train.lex
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm.lc
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm.lex
org.allenai.ari.solvers.termselector.ExpandedLearner.lc
org.allenai.ari.solvers.termselector.ExpandedLearner.lex
org.allenai.ari.solvers.termselector.ExpandedLearnerSVM.lc
org.allenai.ari.solvers.termselector.ExpandedLearnerSVM.lex
....
I have to rename these files by adding learners right before the capitalized name. For example,
org.allenai.ari.solvers.termselector.BaselineLearnersurfaceForm.lex
would change to
org.allenai.ari.solvers.termselector.learners.BaselineLearnersurfaceForm.lex
and this one
org.allenai.ari.solvers.termselector.ExpandedLearner.lc
would change to
org.allenai.ari.solvers.termselector.learners.ExpandedLearner.lc
Any ideas how to do this automatically?

for f in org.*; do
echo mv "$f" "$( sed 's/\.\([A-Z]\)/.learners.\1/' <<< "$f" )"
done
This short loop outputs an mv command that renames the files in the manner that you wanted. Run it as-is first, and when you are certain it's doing what you want, remove the echo and run again.
The sed bit in the middle takes a filename ($f, via a here-string, so this requires bash) and replaces the first occurrence of a dot followed by a capital letter with .learners. followed by that same capital letter.

There is a tool called perl-rename, sometimes installed as rename; not to be confused with rename from util-linux.
It's very good for tasks like this as it takes a perl expression and renames accordingly:
perl-rename 's/(?=\.[A-Z])/.learners/' *
You can play with the regex online
Alternatively, you can use a for loop and $BASH_REMATCH:
for file in *; do
[ -e "$file" ] || continue
[[ "$file" =~ ^([^A-Z]*)([A-Z].*)$ ]] || continue
mv -- "$file" "${BASH_REMATCH[1]}learners.${BASH_REMATCH[2]}"
done

A very simple approach (useful if you only need to do this once) is to ls >dummy the names into a text file dummy, and then use find/replace in a text editor to make lines of the form mv xxx.yyy xxx.learners.yyy. Then you can simply execute the resulting file with sh dummy (or chmod +x dummy and then ./dummy).
The exact find/replace commands depend on the text editor you use, but something like
replace org. with mv org.. That gets you the mv in the beginning.
replace mv org.allenai.ari.solvers.termselector.$1 with mv org.allenai.ari.solvers.termselector.$1 org.allenai.ari.solvers.termselector.learners.$1 to duplicate the filename and insert learners.
There is also a for-loop syntax that can probably do it in one (long) line, but I cannot explain it; try help for if you want to learn about it.
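Automating the dummy-file approach, a short bash sketch can write the mv commands into a script for review instead of building them in an editor. The function name write_learners_script is hypothetical; names are quoted with printf %q, so the generated file is meant to be run with bash rather than plain sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one shell-quoted mv command per org.* file into a
# script, inserting "learners." before the first capitalised component.
write_learners_script() {
  local script=$1 f
  : > "$script"                                  # start with an empty script
  for f in org.*; do
    [ -e "$f" ] || continue                      # no org.* files at all
    [[ $f =~ ^([^[:upper:]]*)([[:upper:]].*)$ ]] || continue
    printf 'mv -n -- %q %q\n' "$f" \
      "${BASH_REMATCH[1]}learners.${BASH_REMATCH[2]}" >> "$script"
  done
}
```

After write_learners_script rename.sh, inspect rename.sh and then run bash rename.sh. Nothing is renamed until you run the generated file, which keeps the review step of the editor approach.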

Appending and Renaming File in Bash

I've got a file
sandeep_mems_SJ_23102003.txt, which needs to be renamed to sj_new_members_SJ_23102003.txt.
I'll be getting these files daily, so it's vital that everything after _SJ remains the same.
So far I've got the following:
for each in `/bin/ls -1`;do
sed -i 's/sandeep_mems_SJ/sj_new_members/g' $each ;
done

sed would help you if you were changing the contents of files. For renaming the file itself, you could do:
for each in sandeep_mems_SJ*; do
mv -- "$each" "sj_new_members${each#sandeep_mems}"
done
I used a glob rather than /bin/ls because it avoids spawning an extra process and uses Bash's built-in matching (globbing) mechanism.
Each filename is assigned to $each.
mv renames $each to the new prefix followed by the part of $each that you want to keep, using Bash's substring-removal mechanism. More details on how to use Bash substrings are here:
http://tldp.org/LDP/abs/html/string-manipulation.html
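For reference, the prefix and suffix removal forms described in that guide behave like this on the example name (expected output in the comments):

```shell
#!/usr/bin/env bash
# Prefix/suffix removal on the example name from the question.
each=sandeep_mems_SJ_23102003.txt
echo "${each#sandeep_mems}"                # _SJ_23102003.txt  (# removes a matching prefix)
echo "${each##*_}"                         # 23102003.txt  (## removes the longest matching prefix)
echo "${each%.txt}"                        # sandeep_mems_SJ_23102003  (% removes a suffix)
echo "sj_new_members${each#sandeep_mems}"  # sj_new_members_SJ_23102003.txt
```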
Also, here's an alternative that uses the cut command, which splits along a specified delimiter character, in this case _. I don't like it as much because it spawns a new process, but it works. See the cut man page for more details. Note that $(command) is equivalent to using backticks: it runs a command in a subshell.
for each in sandeep_mems_SJ*; do
mv -- "$each" "sj_new_members_$(cut -d '_' -f 3- <<< "$each")"
done

for each in sandeep_mems_SJ*; do
mv -- "$each" "sj_new_members_SJ${each##*SJ}"
done
The ##*SJ is parameter-expansion syntax that removes everything up to and including the last SJ. Haven't tested the whole thing, but it should work.

You can use rename utility:
rename 's/sandeep.*?_(SJ_\d+\.txt)$/sj_new_members_$1/' sandeep*txt

I tried to replicate your approach as closely as possible, so here's a solution that uses sed:
for each in sandeep*_SJ_*; do
new=$(echo "$each" | sed 's/.*_SJ_/sj_new_members_SJ_/')
mv -- "$each" "$new"
done
I don't believe you actually need the ls -1 command: the glob lists the matching files, and sed rewrites the part of each name up to _SJ_.
In essence, what my command does is save the new file name in a variable, new, and then mv renames it to the filename saved in the variable.
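Since the files arrive daily, the job will often run when there is nothing to rename, or when yesterday's file has already been renamed. A hedged sketch for that situation, assuming bash; rename_daily is a hypothetical name, nullglob keeps the loop from seeing the literal unmatched pattern, and mv -n avoids overwriting an existing target:

```shell
#!/usr/bin/env bash
# Hypothetical daily job: rename sandeep_mems_SJ_<date>.txt files in a directory
# to sj_new_members_SJ_<date>.txt, keeping everything from _SJ onwards.
rename_daily() {
  local dir=${1:-.} f base restore
  restore=$(shopt -p nullglob || true)   # remember the caller's setting
  shopt -s nullglob                      # no files today -> loop body never runs
  for f in "$dir"/sandeep_mems_SJ*; do
    base=${f##*/}
    mv -n -- "$f" "$dir/sj_new_members${base#sandeep_mems}"
  done
  eval "$restore"
}
```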

Using Wildcards with 'rename'

I have been using the rename command to batch rename files. Up to now, I have had files like:
2010.306.18.08.11.0000.BO.ADM..BHZ.SAC
2010.306.18.08.11.0000.BO.AMM..BHZ.SAC
2010.306.18.08.11.0000.BO.ASI..BHE.SAC
2010.306.18.08.11.0000.BO.ASI..BHZ.SAC
and using rename 2010.306.18.08.11.0000.BO. "" * and rename .. _. * I have reduced them to:
ADM_.BHZ.SAC
AMM_.BHZ.SAC
ASI_.BHE.SAC
ASI_.BHZ.SAC
which is exactly what I want. A bit clumsy, I guess, but it works. The problem occurs now that I have files like:
2010.306.18.06.12.8195.TW.MASB..BHE.SAC
2010.306.18.06.14.7695.TW.CHGB..BHN.SAC
2010.306.18.06.24.4195.TW.NNSB..BHZ.SAC
2010.306.18.06.25.0695.TW.SSLB..BHZ.SAC
which exist in the same folder. I have been trying to get similar results using wildcards in the rename command, e.g. rename 2010.306.18.*.*.*.*. "", but this prepends the first match of 2010.306.18.*.*.*.*. to all the other file names, which is clearly not what I'm after. I get:
2010.306.18.06.12.8195.TW.MASB..BHE.SAC
2010.306.18.06.12.8195.TW.MASB..BHE.SAC2010.306.18.06.14.7695.TW.CHGB..BHN.SAC
2010.306.18.06.12.8195.TW.MASB..BHE.SAC2010.306.18.06.24.4195.TW.NNSB..BHZ.SAC
2010.306.18.06.12.8195.TW.MASB..BHE.SAC2010.306.18.06.25.0695.TW.SSLB..BHZ.SAC
I guess I am missing a fairly fundamental principle of wildcards here, so can someone please explain why this doesn't work and what I can do to get the desired result (preferably using rename)?
N.B.
To clarify, the output should be:
ADM_.BHZ.SAC
AMM_.BHZ.SAC
ASI_.BHE.SAC
ASI_.BHZ.SAC
MASB.BHE.SAC
CHGB.BHN.SAC
NNSB.BHZ.SAC
SSLB.BHZ.SAC

You can try this first to see what commands would be executed:
for f in 2010*.TW.*; do echo mv "$f" "$(echo "$f" | sed 's/^2010.*\.TW\.//; s/\.\././')" ; done
If it's what you expect, remove echo from the command to execute it:
for f in 2010*.TW.*; do mv "$f" "$(echo "$f" | sed 's/^2010.*\.TW\.//; s/\.\././')" ; done

rename does not allow wildcards in the from and to strings. When you run rename 2010.306.18.*.*.*.*. "" *, it is actually your shell that first expands the wildcard and then passes the result of the expansion to rename, which is why it does not work.
Instead of using rename, use a loop as follows:
for file in 2010*.TW.*
do
tmp="${file##2010*TW.}" # remove the file prefix
mv -- "$file" "${tmp/../.}" # collapse the first ".." into "."
done
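If you'd rather not hard-code TW, a bash regular expression can pick out the station and channel for any network code. This is a sketch assuming the YEAR.DAY.HH.MM.SS.FFFF.NET.STA..CHA.SAC layout shown in the question; the function names are hypothetical. It yields ADM.BHZ.SAC rather than ADM_.BHZ.SAC for the BO files, so adjust the printf if you need the underscore form:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn YEAR.DAY.HH.MM.SS.FFFF.NET.STA..CHA.SAC into
# STA.CHA.SAC for any network code (BO, TW, ...). Names that do not match,
# such as the already shortened ADM_.BHZ.SAC, make it return 1.
sac_short_name() {
  local re='^[0-9]{4}(\.[0-9]+){5}\.[A-Z0-9]+\.([A-Z0-9]+)\.\.(.+)$'
  [[ $1 =~ $re ]] || return 1
  printf '%s.%s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
}

# Rename every matching file in the current directory; skip the rest.
rename_sac_files() {
  local f new
  for f in *; do
    new=$(sac_short_name "$f") || continue
    mv -n -- "$f" "$new"
  done
}
```

Because non-matching names are skipped, running rename_sac_files twice in the same folder is harmless.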
