bash - mass renaming files with many special characters

I have a lot of files (in single directory) like:
[a]File-. abc'.d -001[xxx].txt
so there are many spaces, apostrophes, brackets, and full stops. The only differences between them are numbers in place of 001, and letters in place of xxx.
How can I remove the middle part, so that all that remains is
[a]File-001[xxx].txt
I'd like an explanation of how such code works, so I can adapt it for other uses and hopefully help answer others' similar questions.

Here is a simple script in pure bash:
for f in *; do                   # for all entries in the current directory
    if [ -f "$f" ]; then         # if the entry is a regular file (i.e. not a directory)
        mv "$f" "${f/-*-/-}"     # rename it: the pattern -*- matches everything from
                                 # the first dash up to the last dash, and the whole
                                 # match is replaced with a single dash
    fi
done
The magic done by the "${f/-*-/-}" expression is described in the bash manual (run info bash) in section 3.5.3, Shell Parameter Expansion.
The * pattern in the first line of the script can be replaced with anything that can help to narrow the list of the files you want to rename, e.g. *.txt, *File*.txt, etc.
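For example, applied to one of the names from the question, the expansion looks like this (a quick check you can run in any bash shell):
f="[a]File-. abc'.d -001[xxx].txt"
echo "${f/-*-/-}"    # prints: [a]File-001[xxx].txt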

If you have the rename (aka prename) utility that's a part of Perl distribution, you could say:
rename -n 's/([^-]*-).*-(.*)/$1$2/' *.txt
to rename all txt files to your desired format. With -n the command does not perform the actual rename; it only tells you what it would do. (To perform the actual rename, remove -n from the above command.)
For example, this would rename the file
[a]File-. abc'.d -001[xxx].txt
as
[a]File-001[xxx].txt
Regarding the explanation, this captures the part up to the first - into a group, and the part after the second (or last) one into another, and combines those.
Read about Regular Expressions. If you have perl docs available on your system, saying perldoc perlre should help.
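If you'd rather stay in bash, the same capture-and-recombine idea can be tried out with the =~ operator and the BASH_REMATCH array; this is just a sketch to illustrate the regex, not part of the rename answer above:
f="[a]File-. abc'.d -001[xxx].txt"
re='([^-]*-).*-(.*)'      # group 1: up to and including the first dash
                          # group 2: everything after the last dash
if [[ $f =~ $re ]]; then
    echo "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"    # [a]File-001[xxx].txt
fi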

Replace first character of file in folder from Uppercase to Lower case

I'm trying to convert 3,000 or so .svg files from CapitalCase to camelCase.
Current:
-Folder
--FileName1
--FileName2
--FileName3
Goal:
-Folder
--fileName1
--fileName2
--fileName3
How can I use the terminal to change the first character of each file name to lowercase?
Currently I've been trying something along these lines: for f in *.svg; do mv -v "$f" "${f:1}"; done
All files in the folder start with a letter or number.
This can be done very succinctly in zsh with zmv:
autoload zmv
zmv -nvQ '(**/)(?)(*.svg)(.)' '$1${(L)2}$3'
This will recurse through any number of directory levels, and can handle name collisions and other edge cases.
Some of the pieces:
-n: no execution. With this option, zmv will only report what changes it would make. It's a dry run that can be used to test out the patterns. Remove it when you're ready to actually change the names.
-v: verbose.
-Q: qualifiers. Used to indicate that the source pattern includes a glob qualifier (in our case (.)).
'(**/)(?)(*.svg)(.)': source pattern. This is simply a regular zsh glob pattern, divided into groups with parentheses. The underlying pattern is **/?*.svg(.). The pieces:
(**/): directories and subdirectories. This will match any number of directory levels (to only affect the current directory, see below).
(?): matches a single character at the start of the file name. We'll convert this to lowercase later.
(*.svg): matches the rest of the file name.
(.): regular files only. This is a zsh glob qualifier; zmv recognizes it as a qualifier instead of a grouping because of the -Q option. The . qualifier limits the matching to regular files so that we don't try to rename directories.
'$1${(L)2}$3': destination pattern. Each of the groupings in the source pattern is referenced in order with $1, $2, etc.
$1: the directory. This could contain multiple levels.
${(L)2}: The first letter in the file name, converted to lowercase. This uses the L parameter expansion flag to change the case.
The l expansion modifier will also work: $2:l.
The conversion can handle non-ASCII characters, e.g. Éxito would become éxito.
$3: the rest of the file name, including the extension.
Variations
This will only change files in the current directory:
zmv -nv '(?)(*.svg)' '$1:l$2'
The source pattern in the following version will only match files that start with an uppercase letter. Since the zmv utility won't rename files if the source and destination match, this isn't strictly necessary, but it will be slightly more efficient:
zmv -nvQ '(**/)([[:upper:]])(*.svg)(.)' '$1${(L)2}$3'
More information
zmv documentation:
https://zsh.sourceforge.io/Doc/Release/User-Contributions.html#index-zmv
zsh parameter expansion flags:
https://zsh.sourceforge.io/Doc/Release/Expansion.html#Parameter-Expansion-Flags
Page with some zsh notes, including a bunch of zmv examples:
https://grml.org/zsh/zsh-lovers.html
Here's a solution in bash, tested and working; be careful, though, with the files you run it on.
It renames files in the directory given as the first argument (use . for the current directory, or provide a path): it lowercases the first letter if it is uppercase and does nothing if the name starts with a digit. Both arguments must be provided:
# 1st argument - folder name
# 2nd argument - file extension (.txt, .svg, etc.)
# note: word-splitting the output of ls assumes the names contain no spaces
for filename in $(ls "$1" | grep "$2")
do
    firstChar=${filename:0:1}
    restChars=${filename:1}
    if [[ "$firstChar" =~ [A-Z] ]] && ! [[ "$firstChar" =~ [a-z] ]]; then
        toLowerFirstChar=$(echo "$firstChar" | awk '{print tolower($0)}')
        modifiedFilename="$toLowerFirstChar$restChars"
        mv "$1/$filename" "$1/$modifiedFilename"
    else
        echo "Non-alphabetic or already lowercase: $filename"
        # here you may handle files whose names start with digits
    fi
done
Use: bash script.sh Folder .txt
ATTENTION: after the renaming, some names may coincide, which would cause a conflict; the script does not currently handle that case.
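As a side note, if your bash is version 4 or newer, the case conversion itself can be done with the ${var,} parameter expansion, which lowercases only the first character, so no awk is needed. A minimal sketch, assuming bash 4+ and no resulting name collisions:
for f in *.svg; do
    new=${f,}                          # ${f,} lowercases only the first character
    [[ $f == "$new" ]] || mv -v -- "$f" "$new"
done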

move command with regular expressions

Bash is not recognizing the regular expression in this mv command:
mv ../downloads'^[exam].*$[.pdf] ../physics2400/exams
I'm trying to move files from a download directory to whatever directory I have made for them to go into.
An example of such a file is 'Exam 2 Practice Homework (Solutions).pdf'
(the single quotes are apparently part of the file name in Bash).
There are many other files in the download folder hence the regex or the attempt anyway.
When performing filename expansion, Bash does not use regular expressions. Instead, a type of pattern matching referred to as globbing is used. This is discussed in the Filename Expansion section of the Bash manual.
In regards to your example file name (Exam 2 Practice Homework (Solutions).pdf), here are a couple things to note:
the single quotes are not part of the file name, but are a convenience to avoid having to escape special characters in the filename (i.e. the spaces and the parentheses). Without the quotes, the filename would be specified Exam\ 2\ Practice\ Homework\ \(Solutions\).pdf. See the Quoting section of the Bash manual for further details.
filesystems in Unix-like operating systems are case sensitive, so you need to account for the upper case E the filename starts with
Here's a pattern matching expression that would match your example filename as well as other files that start with Exam and end with .pdf.
mv ../downloads/Exam*.pdf ../physics2400/exams
If you have files that start with both Exam and exam, you could account for both with the following:
mv ../downloads/[Ee]xam*.pdf ../physics2400/exams
The bracketed expression is interpreted as "matches any one of the enclosed characters". This allows you to account for both upper and lower case.
Before executing such mv commands, I would test the filename expansion by running ls to verify that the intended files are matched:
ls ../downloads/[Ee]xam*.pdf
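For a quick feel of the difference between the two matching styles, here is a small sketch you can paste into an interactive bash session (the regex is kept in a variable to avoid quoting pitfalls inside [[ ]]):
f='Exam 2 Practice Homework (Solutions).pdf'

# Glob pattern, as used by filename expansion: * matches any string
[[ $f == [Ee]xam*.pdf ]] && echo "glob matches"

# Regular expression, available only via the =~ operator of [[ ]]
re='^[Ee]xam.*\.pdf$'
[[ $f =~ $re ]] && echo "regex matches"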
If you want to use a regular expression instead, how about this?
find ./downloads -regex '.*\.pdf' -exec mv '{}' exams/ \;

Mass renaming of files in folder

I need to rename all the files of the formats below so that the trailing _2.txt stays the same and the apac, emea, mds parts stay the same in all files, but everything before _XXX_2.txt is replaced with logs_date.
ABC_xyz_123_apac_2.txt
POR5_emea_2.txt
qw_1_0_122_mds_2.txt
to
logs_date_apac_2.txt
logs_date_emea_2.txt
logs_date_mds_2.txt
I'm not sure but maybe this is what you want:
#!/bin/bash
for file in *_2.txt; do
    # remove echo to rename the files once you check it does what you expect
    echo mv -v "$file" "$(sed 's/.*\(_.*_2\.txt\)$/logs_date\1/' <<<"$file")"
done
Do you have to use bash?
Bulk Rename Utility is an awesome tool that can easily rename multiple files in an intuitive way.
http://www.bulkrenameutility.co.uk/Main_Intro.php
Using the mmv command should be easy.
mmv '*_*_2.txt' 'logs_date_#2_2.txt' *.txt
You could also use the rename tool:
rename 's/.+(_[a-z]+_[0-9].)/logs_date$1/' files
This will give you the desired output.
If you don't want to or can't use sed, you can also try this, which might even run faster. No matter what solution you use, be sure to back up first if possible.
shopt -s extglob    # turn on the extglob shell option, which enables several extended pattern matching operators
set +H              # turn off ! style history substitution
for file in *_2.txt; do
    # remove echo to rename the files once you check it does what you expect
    echo mv -v "$file" "${file/?(*_)!(*apac*|*emea*|*mds*)_/logs_date_}"
done
${parameter/pattern/string} performs pattern substitution. First, optionally, a run of characters ending with an underscore is matched; then a following run of characters that does not contain apac, emea or mds is matched, followed by an underscore; the whole match is then replaced with "logs_date_" (see the short demo after the list below).
Copied from the bash man page:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
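As a quick sanity check, you can watch the substitution work on two of the sample names before trusting it with real files; this sketch mirrors the shopt/set lines from the answer above:
shopt -s extglob
set +H    # avoid history expansion of the ! when pasting into an interactive shell

f='ABC_xyz_123_apac_2.txt'
echo "${f/?(*_)!(*apac*|*emea*|*mds*)_/logs_date_}"    # logs_date_apac_2.txt

f='POR5_emea_2.txt'
echo "${f/?(*_)!(*apac*|*emea*|*mds*)_/logs_date_}"    # logs_date_emea_2.txt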

Using Awk or Sed to tack on a statement at the end of a specific line

I have a file I named poscar1.cif, and I would like to insert the contents of a variable at a specific line in this file.
For example, line 24, which currently reads:
_cell_length_a
I would like to tack the contents of my variable a (defined in my function as a=5.3827) so that way the line now reads:
_cell_length_a 5.3827
Is there a way to do this using sed or awk? I am using a bash script to accomplish this (the full script is too large to post, unfortunately).
Since the veteran ed utility doesn't get enough attention anymore:
a=5.3827
ed -s poscar1.cif <<EOF
g/^_cell_length_a\$/ s//& $a/
w
EOF
ed truly edits a file in place, unlike sed with its -i option[1].
sed borrowed many features from ed, so there is significant overlap in functionality, but there are also important differences, some of which surface here.
-s suppresses ed's status messages.
poscar1.cif is the input file to edit in place.
<<EOF ... is the here-document that contains the commands for ed - ed requires its commands to come from stdin and each command to be on its own line.
g/^_cell_length_a\$/ ... is a (basic) regex (regular expression) that matches all lines that exactly contain _cell_length_a - the g ensures that no error is reported if there's no match at all.
Note that the $ is \-escaped to protect it from interpretation by the shell inside the here-document (not strictly necessary in this instance, but good practice).
s//& $a/ ... // repeats the search for the most recently used regex on a matching line and replaces the match with itself (&), followed by a space and the value of variable $a.
Note that since the opening delimiter (EOF) of the here-document is unquoted, shell variable expansions DO take place; in essence, the contents are treated by the shell like the contents of a double-quoted string.
w writes the modified buffer back to the input file.
For debugging, use ,p in place of w so as to only print the modified buffer, without writing it back to the file.
[1] Re in-place updating:
More precisely, ed preserves the file's existing inode, which ensures that all the file's attributes are preserved.
However, it does not overwrite individual bytes of the existing file, but reads the entire file into a buffer in memory, and writes the entire buffer to the file when asked to.
This makes ed suitable only for files small enough to be read into memory as a whole.
By contrast, sed -i (GNU and BSD sed), its GNU Awk 4.1+ counterpart awk -i inplace, and also perl -i replace the original file with a newly created one, which implies that they:
destroy symlinks(!) - if the input file was a symlink, it is replaced with a regular file of the same name
A common scenario where that matters: say your shell initialization file ~/.bashrc is a symlink to a file elsewhere you keep under source control; you then install a tool that uses sed -i to modify ~/.bashrc, which results in it being replaced with a regular file, and the link to your source-controlled version is broken.
What's more, BSD sed's behavior even introduces a security risk (see below).
do not preserve the original file-creation date (where supported; e.g., on OSX)
they do, however,
preserve extended attributes (where supported; e.g., on OSX)
preserve file permissions
Caution: BSD sed introduces a security risk with respect to symlinks (behavior still present as of the version that comes with FreeBSD 10):
The symlink's permissions are copied to the replacement file, not the symlink target's. Since symlinks get executable permissions by default, you'll invariably end up with an executable file, whether the input file was executable or not.
Fortunately, GNU sed handles this scenario properly.
sed, gawk, and perl could address the issues above by taking extra steps, but there's one thing that can only be ensured if the original inode is retained, as ed does:
When a file is being monitored for changes by its inode number (e.g., with tail -f), not preserving the inode breaks that monitoring.
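If you want to see this for yourself, here is a small sketch (assuming GNU sed and a throwaway demo file); ls -i prints the inode number, which changes after sed -i but not after ed:
printf 'alpha\nbeta\n' > demo.txt     # throwaway demo file
ls -i demo.txt                        # note the inode number

sed -i 's/alpha/ALPHA/' demo.txt      # GNU sed -i writes a new file and renames it
ls -i demo.txt                        # the inode number has changed

printf '%s\n' 'g/beta/s//BETA/' w q | ed -s demo.txt
ls -i demo.txt                        # same inode as before the ed edit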
You can use sed to do it. Depending on your answer to dawg's question, you can either address the line by its number:
sed -i -e '24s/$/ 5.3827/' poscar1.cif
or, if it's the pattern:
sed -i -e '/_cell_length_a/s/$/ 5.3827/' poscar1.cif
The first goes to the line with the given number; the latter applies to any line that matches the pattern in the first set of slashes. In either case it then "replaces" the end of the line with the value between the final two slashes (note the leading space, so the value doesn't run into the keyword).
Using your example, you could do something like this:
sed -i 's/\(_cell_length_a\)/\1 5.3827/' poscar1.cif
where,
the -i option says to edit the file in place, rather than creating a copy
the funky looking quoted part is a string specifying a regular expression aka regex
poscar1.cif is the file
The regex syntax is hard to read. The basic format to find and replace is:
s/find/replace/
Where find is the text of the line you're looking for and replace is the text to replace that text with.
If we want to use part of the find string in our replacement, we group it by surrounding it with \( and \) and then use \1 to refer to it in the replacement string. The following appends replace to any line consisting of find:
s/\(find\)/\1replace/
Keep in mind that there are special escape characters or meta characters that you have to treat specially if your string contains them.
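Since the value in the original question lives in a shell variable (a=5.3827), note that the sed program then needs to be in double quotes so that the shell expands $a; the \(...\) group and the \1 back-reference are unaffected by the double quotes. A sketch along the lines of the answer above:
a=5.3827
# double quotes let the shell expand $a inside the sed program
sed -i "s/\(_cell_length_a\)/\1 $a/" poscar1.cif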

Bash foreach on cronjob

I am trying to create a "watch" folder into which I will be able to copy 2 sets of files with the same names but different file extensions. I have a program that needs to reference both files; since they have the same name, only differing by extension, I figure I might be able to do something like this with a cron job
cronjob.sh:
#!/bin/bash
ls *.txt > processlist.txt
for filename in `cat processlist.txt`; do
    /usr/local/bin/runcommand -input1=/home/user/process/$filename \
        -input2=/home/user/process/strsub($filename, -4)_2.stl \
        -output /home/user/process/done/strsub($filename, -4)_2.final;
    echo "$filename finished processing"
done
but substr is a PHP function, not bash. What would be the right way of doing this?
strsub($filename, -4) in Bash is ${filename:(-4)}.
See Shell Parameter Expansion.
Your command can look like
/usr/local/bin/runcommand "-input1=/home/user/process/$filename" \
"-input2=/home/user/process/${filename:(-4)}_2.stl" \
"-output /home/user/process/done/${filename:(-4)}_2.final"
Note: Prefer wrapping arguments that contain variables in double quotes to prevent word splitting and possible pathname expansion. This is especially helpful for filenames with spaces.
It would also be better to pass your glob pattern directly to for, so that each filename is produced as a single token rather than being subject to word splitting.
for filename in *.txt; do
So Konsolebox's solution was almost right, but the issue was that ${filename:(-4)} only returns the last 4 characters of the variable instead of trimming the last 4 off. What I did was change it to ${filename%.txt}, where %.txt matches the text I want to find and remove, and then just tagged .mp3 on at the end to change the extension.
His other suggestion of using this for loop also was much better than mine:
for filename in *.txt; do
The only other modification was putting the full command on one line at the end; I divided it up here to make it easier to read.
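Putting the pieces together, a minimal sketch of the corrected loop; /usr/local/bin/runcommand and the /home/user/process paths are the placeholders from the question, not a real tool:
#!/bin/bash
# placeholder tool and paths taken from the question above
cd /home/user/process || exit 1
for filename in *.txt; do
    base=${filename%.txt}            # name without the .txt extension
    /usr/local/bin/runcommand -input1="/home/user/process/$filename" \
        -input2="/home/user/process/${base}_2.stl" \
        -output "/home/user/process/done/${base}_2.final"
    echo "$filename finished processing"
done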
