How to remove unknown file extensions from files using a shell script

I can remove file extensions if I know the extension; for example, to remove .txt from files:
foreach file (`find . -type f`)
mv $file `basename $file .txt`
end
However, if I don't know what the file extension is to begin with, how would I do this?
I tried:
foreach file (`find . -type f`)
mv $file `basename $file .*`
end
but it wouldn't work.

What shell is this? At least in bash you can do:
find . -type f | while read -r; do
mv -- "$REPLY" "${REPLY%.*}"
done
(The usual caveats apply: This doesn't handle files whose name contains newlines.)
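If newlines in names are a genuine concern, a NUL-delimited variant is possible by combining find -print0 with bash's read -d '' (a sketch; like the loop above, it assumes the dot to strip lives in the final path component):
find . -type f -print0 | while IFS= read -r -d '' f; do
    mv -- "$f" "${f%.*}"
done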

You can use sed to compute base file name.
foreach file (`find . -type f`)
mv $file `echo $file | sed -e 's/^\(.*\)\.[^.]\+$/\1/'`
end
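To see what that sed expression does, feed it a sample name (GNU sed; the \+ is a GNU extension, and the greedy .* means only the last extension is removed):
$ echo notes.backup.txt | sed -e 's/^\(.*\)\.[^.]\+$/\1/'
notes.backup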

Be cautious: The command you seek to run could cause loss of data!
If you don't think your file names contain newlines or double quotes, then you could use:
find . -type f -name '?*.*' |
sed 's/\(.*\)\.[^.]*$/mv "&" "\1"/' |
sh
This generates your list of files (making sure that the names contain at least one character plus a .), runs each file name through the sed script to convert it into an mv command by effectively removing the material from the last . onwards, and then runs the stream of commands through a shell.
Clearly, you test this first by omitting the | sh part. Consider running it with | sh -x to get a trace of what the shell's doing. Consider making sure you capture the output of the shell, standard output and standard error, into a log file so you've got a record of the damage that occurred.
Do make sure you've got a backup of the original set of files before you start playing with this. It need only be a tar file stored in a different part of the directory hierarchy, and you can remove it as soon as you're happy with the results.
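For example, the throwaway backup and a logged run might look like this (a sketch; the archive and log names are arbitrary, and the log is written outside the tree so it doesn't get renamed itself):
tar -cf ../before-rename.tar .
find . -type f -name '?*.*' |
sed 's/\(.*\)\.[^.]*$/mv "&" "\1"/' |
sh -x > ../rename.log 2>&1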
You can choose any shell; this doesn't rely on any shell constructs except pipes and single quotes and double quotes (pretty much common to all shells), and the sed script is version neutral too.
Note that if you have files xyz.c and xyz.h before you run this, you'll only have a file xyz afterwards (and what it contains depends on the order in which the files are processed, which needn't be alphabetic order).
If you think your file names might contain double quotes (but not single quotes), you can play with changing the quotes in the sed script. If you might have to deal with both, you need a more complex sed script. If you need to deal with newlines in file names, then it is time to (a) tell your user(s) to stop being silly and (b) fix the names so they don't contain newlines. Then you can use the script above. If that isn't feasible, you have to work a lot harder to get the job done accurately; you probably need to make sure you've got a find that supports -print0, a sed that supports -z and an xargs that supports -0 (installing the most recent GNU versions if you don't already have the right support in place).
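For the record, that NUL-safe pipeline might be shaped roughly like this (a sketch; sed -z is a GNU extension, and xargs -0 -n 2 hands mv one old/new pair at a time):
find . -type f -name '?*.*' -print0 |
sed -z 'p; s/\.[^.]*$//' |
xargs -0 -n 2 mv --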

It's very simple:
$ set filename=/home/foo/bar.dat
$ echo ${filename:r}
/home/foo/bar
See more in man tcsh, in "History substitution":
r
Remove a filename extension '.xxx', leaving the root name.
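Applied to the loop from the question, that modifier looks like this (a sketch; like the original, it will misbehave on names containing spaces):
foreach file (`find . -type f`)
mv $file $file:r
end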

Related

How to list just one file from a (bash) shell directory listing

A bit lowly a query but here goes:
bash shell script. POSIX, Mint 21
I just want one/any (mp3) file from a directory. As a sample.
In normal execution (a full run), the code would be something like:
for f in *.mp3; do
#statements
done
This works fine but if I wanted to sample just one file of such an array/glob (?) without looping, how might I do that? I don't care which file, just that it is an mp3 from the directory I am working in.
Should I just start this for-loop and then exit (break) after one statement, or is there a neater, more tailored-for-the-job way?
for f in *.mp3; do
#statement
break
done
Ta (can't believe how dopey I feel asking this one; my forehead will hurt when I see the answers).
Since you are using Linux (Mint) you've got GNU find so one way to get one .mp3 file from the current directory is:
mp3file=$(find . -maxdepth 1 -mindepth 1 -name '*.mp3' -printf '%f' -quit)
-maxdepth 1 -mindepth 1 causes the search to be restricted to one level under the current directory.
-printf '%f' prints just the filename (e.g. foo.mp3). The -print option would print the path to the filename (e.g. ./foo.mp3). That may not matter to you.
-quit causes find to exit as soon as one match is found and printed.
Another option is to use the Bash : (colon) command and $_ (dollar underscore) special variable:
: *.mp3
mp3file=$_
: *.mp3 runs the : command with the list of .mp3 files in the current directory as arguments. The : command ignores its arguments and does nothing.
mp3file=$_ sets the value of the mp3file variable to the last argument supplied to the previous command (:).
The second option should not be used if the number of .mp3 files is large (hundreds or more) because it will find all of the files and sort them by name internally.
In both cases $mp3file should be checked to ensure that it really exists (e.g. [[ -e $mp3file ]]) before using it for anything else, in case there are no .mp3 files in the directory.
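A minimal guard might look like this (a sketch; the printf lines stand in for whatever you actually do with the file):
if [[ -e $mp3file ]]; then
    printf 'sampling %s\n' "$mp3file"
else
    printf 'no .mp3 files found\n' >&2
fi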
I would do it like this in POSIX shell:
mp3file=
for f in *.mp3; do
    if [ -f "$f" ]; then
        mp3file=$f
        break
    fi
done
# At this point, the variable mp3file contains a filename which
# represents a regular file (or a symbolic link) with the .mp3
# extension, or an empty string if there is no such file.
The fact that you use
for f in *.mp3; do
suggests to me that the MP3s are named without too many strange characters in the filename.
In that case, if you really don't care which MP3, you could:
f=$(ls *.mp3 | head -n 1)
statement
Or, if you want a different one every time:
f=$(ls *.mp3|sort -R | tail -1)
Note: if your filenames get more complicated (including spaces or other special characters), this will not work anymore.
Assuming you don't have spaces in your filenames (and personally I don't understand why the collective taboo is against using ls in scripts, rather than against having spaces in filenames), then:
ls *.mp3 | tr ' ' '\n' | sed -n '1p'

Removing unknown / non-specific string after file extension on file names

Trying to remove a string that is located after the file name extension, on multiple files at once. I do not know where the files will be, just that they will reside in a subfolder of the one I am in.
I need to remove the last string, i.e. everything after the file extension. The file name is:
something-unknown.js?ver=12234.... (last bit is unknown too)
This one (below) I found in this thread:
for nam in *sqlite3_done
do
newname=${nam%_done}
mv $nam $newname
done
I know that I have to use % to remove the bit from the end, but how do I use wildcards in the last bit, when I already have it as the "for any file" selector?
I have tried with a modified bit of the above:
for nam in *.js*
do
newname=${ nam .js% } // removing all after .js
mv $nam $newname
done
I'm on OS X Yosemite, with the bash shell and sed. I know of rename and sed, but I've only seen topics with specific strings, no wildcards for this issue, except these:
How to rename files using wildcard in bash?
https://unix.stackexchange.com/questions/227640/rename-first-part-of-multiple-files-with-mv
I think this is what you are looking for in terms of parameter substitution:
$ ls -C1
first-unknown.js?ver=111
second-unknown.js?ver=222
third-unknown.js?ver=333
$ for f in *.js\?ver=*; do echo ${f%\?*}; done
first-unknown.js
second-unknown.js
third-unknown.js
Note that we escape the ? as \? to say that we want to match the literal question mark, distinguishing it from the special glob symbol that matches any single character.
Renaming the files would then be something like:
$ for f in *.js\?ver=*; do echo "mv $f ${f%\?*}"; done
mv first-unknown.js?ver=111 first-unknown.js
mv second-unknown.js?ver=222 second-unknown.js
mv third-unknown.js?ver=333 third-unknown.js
Personally, I like to output the commands, save them to a file, verify they're what I want, and then execute the file as a shell script.
If it needs to be fully automated you can remove the echo and do the mv directly.
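That save-and-verify workflow might look something like this (a sketch; rename.sh is just a scratch file name):
$ for f in *.js\?ver=*; do echo "mv \"$f\" \"${f%\?*}\""; done > rename.sh
$ cat rename.sh     # inspect the generated commands first
$ sh rename.sh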
for x in $(find . -type f -name '*.js*'); do mv "$x" "$(echo "$x" | sed 's/\.js.*/.js/')"; done

Recursive batch rename

I have several folders with some files that I would like to rename from
Foo'Bar - Title
to
Title
I'm using OS X 10.7. I've looked at other solutions, but none that address recursion very well.
Any suggestions?
There are two parts to your problem: Finding files to operate on recursively, and renaming them.
For the first, if everything is exactly one level below the current directory, you can just list the contents of every directory in the current directory (as in Mattias Wadman's answer above), but more generally (and possibly easier to understand, to boot), you can just use the find command.
For the second, you can use sed and work out how to get the quoting and piping right (which you should definitely eventually learn), but it's much simpler to use the rename command. Unfortunately, this one isn't built in on Mac, but you can install it with, e.g., Homebrew, or just download the perl script and sudo install -m755 rename /usr/local/bin/rename.
So, you can do this:
find . -exec rename 's|[^/]* - ||' {} +
If you want to do a "dry run" to make sure it's right, add the "-n" flag to rename:
find . -exec rename -n 's|[^/]* - ||' {} +
To understand how it works, you really should read the tutorial for find, and the manpage for rename, but breaking it down:
find . means 'find all files recursively under the current directory'.
You can add additional tests to filter things (e.g., -type f if you want to skip everything but regular files, or -name '*Title' if you only want to change files that end in 'Title'), but that isn't necessary for your use.
-exec … + means to batch up the found files, and pass as many of them as possible in place of any {} in the command that appears in the '…'.
rename 's|[^/]* - ||' {} means for each file in {}, apply the perl expression s|[^/]* - || to the filename, and, if the result is different, rename it to that result.
s|[^/]* - || means to match the regular expression '[^/]* -' and replace the match with '' (the empty string).
[^/]* - means to match any string of non-slash characters that ends with ' - '. So, in './A/FooBar - Title', it'll match 'FooBar - '.
I should mention that, when I have something complicated to do like this, if after a few minutes and a couple attempts to get it right with find/sed/awk/rename/etc., I still haven't got it, I often just code it up imperatively with Python and os.walk. If you know Python, that might be easier for you to understand (although more verbose and less simple), and easier for you to modify to other use cases, so if you're interested, ask for that.
Try this:
ls -1 * | while read f ; do mv "$f" "`echo $f | sed 's/^.* - //'`" ; done
I recommend adding an echo before mv and running it first to make sure the commands look OK. And as abarnert noted in the comments, this command will only work for one directory at a time.
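The dry run would then be (a sketch):
ls -1 * | while read f ; do echo mv "$f" "`echo $f | sed 's/^.* - //'`" ; done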
Detailed explanation of the various commands:
ls -1 * will output a line for each file (and directory) in the current directory (except dot-files). The * is expanded by the shell into ls -1 file1 file2 ...; the -1 flag tells ls to list the filename only, one file per line.
The output is then piped into while read f ; ... ; done, which loops as long as read f returns zero, i.e. until it reaches end of file. read f reads one line at a time from standard input (which in this case is the output from ls -1 ...) and stores it in the variable specified, in this case f.
In the while loop we run a mv command with two arguments: first "$f" as the source file (note the quotes to handle filenames with spaces etc), and second the destination filename, which uses sed and backticks (`) to do what is called command substitution: the command inside the backticks is run and replaced by its standard output.
echo $f | sed 's/^.* - //' pipes the current filename $f into sed, which matches a regular expression, does a substitution (the s in s/), and outputs the result on standard output. The regular expression is ^.* - , which matches from the start of the string (^, called anchoring) followed by any characters (.*) and then ' - ', and replaces the match with the empty string (the string between the final //).
I know you asked for a batch rename, but I suggest you use Automator.
It works perfectly, and if you create it as a service you will have the option in your contextual menu :)
After some trial and error, I came across this solution that worked for me to solve the same problem.
find <dir> -name '*.<oldExt>' -exec rename -S .<oldExt> .<newExt> {} \;
Basically, I leverage the find and rename utilities. The trick here is figuring out where to place the '{}', which represents the files that need to be processed by rename.
P.S. rename is not a built-in utility; I work on OS X and used Homebrew to install it.

Properly handle lists of files with whitespace in filename

I want to iterate over a list of files in Bash and perform some action. The problem: the file names may contain whitespace, which creates an obvious problem with wildcards or ls:
touch a\ b
FILES=* # or $(ls)
for FILE in $FILES; do echo $FILE; done
yields
a
b
Now, the conventional way to handle this is to use find … -print0 instead. However, this only works (well) in conjunction with xargs -0, not with Bash variables / loops.
My idea was to set $IFS to the null character to make this work. However, comp.unix.shell seems to think that this is impossible in bash.
Bummer. Well, it’s theoretically possible to use another character, such as : (after all, $PATH uses this format, too):
IFS=$':'
FILES=$(find . -print0 | xargs -0 printf "%s:")
for FILE in $FILES; do echo $FILE; done
(The output is slightly different but fair enough.)
However, I can't help but feel that this is clumsy and that there should be a more direct way of accomplishing this, preferably using wildcards or ls.
The best way to handle this is to store the file list as an array, rather than a string (and be sure to double-quote all variable substitutions):
files=(*)
for file in "${files[@]}"; do
echo "$file"
done
If you want to generate an array from find's output (e.g. if you need to search recursively), see this previous answer.
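For reference, building the array from find's output typically looks like this (a sketch; read -d '' and process substitution are bash features, and -print0 keeps even newline-containing names intact):
files=()
while IFS= read -r -d '' f; do
    files+=("$f")
done < <(find . -type f -print0)
for file in "${files[@]}"; do
    echo "$file"
done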
Exactly what you have in the first example works fine for me in Msys Bash, Cygwin and on my Fedora box:
FILES=*
for FILE in $FILES
do
echo $FILE
done
It's very important to precede this with
IFS=""
otherwise files whose names contain two consecutive spaces will not be found.
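Put together, that would be (a sketch):
IFS=""
FILES=*
for FILE in $FILES; do
    echo "$FILE"
done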

Shell script: execute cmd on a file, with additional processing of file name

So I am going to post a question about shell scripting again.
Problem Definition: For all files under a dir, ex.:
A_anything.txt, B_anything.txt, ......
I want to execute a script, say 'CMD', on each of them, with the output files named like:
A_result.txt, B_result.txt, ......
In addition, on the first line of each output file, I want to have the file name of the original.
The 'find -exec' utility seems to me unable to extract part of the file name.
Does someone know a solution to this problem, by any means (shell, python, find, etc.)? Thank you!
cd /directory
for file in *.txt ; do
    newfilename=`echo "$file" | sed 's/\(.\+\)_.*/\1_result.txt/'`
    echo "$file" > "$newfilename"
    your-command "$file" >> "$newfilename"
done
HTH
Well, there's more than one way to do it (including using Perl, where that's the motto), but probably I'd write it like this:
find . -name '[A-Z]_*.txt' -type f -print0 |
xargs -0 modify_rename.sh
And then I'd write the script modify_rename.sh like this:
#!/bin/sh
for file in "$@"
do
    dirname=$(dirname "$file")
    basename=$(basename "$file" .txt)
    leadname=${basename%%_*}
    outname="$dirname/${leadname}_result.txt"
    # Optionally check for pre-existence of $outname
    {
        # Optionally echo "$basename.txt" instead of "$file"
        echo "$file"
        # Does this invocation of CMD write to standard output?
        # If not, adjust invocation appropriately.
        CMD "$file"
    } > "$outname"
done
The advantage of this separation into separate scripting operations is that the rename/modify operation can be checked out separately from the search process - which runs less risk of zapping your entire directory structure with bad commands.
Bash has the tools to avoid invoking basename and dirname but the notation is moderately excruciating; I find the clarity of the command names worth having. I'd be happy if bash implemented them as built-ins. There are plenty of other ways to get the prefix of the file; this should be safe, though, even in the presence of spaces (tabs, newlines) in file or directory names because of the careful use of double quotes.
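For comparison, the pure-bash spellings of those two calls look roughly like this (a sketch; note that the dirname replacement misbehaves when the path contains no slash, which is part of what the real commands handle for you):
file=./subdir/A_anything.txt
dir=${file%/*}              # ./subdir          (dirname "$file")
base=${file##*/}            # A_anything.txt    (basename "$file")
base=${base%.txt}           # A_anything        (basename "$file" .txt)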
