Rename multiple files with bash for loop, mv, and sed - bash

My goal is to rename a folder of files of the form 'img_MM-DD-YY_XX.jpg' to the form 'newyears_YYYY-MM-DD_XXX.jpg' by iterating through each filename and using sed to perform substitutions based on character positions. Unfortunately I cannot seem to get the position-based swaps to work.
e.g. s/.\{4\}[0-9][0-9]/.\{10\}[0-9][0-9]/ attempts to replace MM with YY
Here is my attempt (neglecting for now the _XX part):
for filename in images/*
do
newname=$(echo $filename | sed 's/.\{4\}[0-9][0-9]/.\{10\}[0-9][0-9]/;
s/.\{7\}[0-9][0-9]/.\{4\}[0-9][0-9]/;
s/.\{10\}[0-9][0-9]/.\{7\}[0-9][0-9]/;
s/img_/newyears_20/')
mv $filename $newname
done
Any ideas how I can fix this?

$ echo 'img_11-22-14_XX.jpg' | sed -r 's/[^_]*_([0-9]{2})-([0-9]{2})-([0-9]{2})/newyears_20\3-\1-\2/'
newyears_2014-11-22_XX.jpg
The above matches everything up to and including the first underscore, followed by a 6-digit date. It replaces the initial part with newyears_ and reformats the date from mm-dd-yy to 20yy-mm-dd.
The two-digit mm, dd, or yy values are matched with ([0-9]{2}). The parentheses indicate that sed should capture the value for later use. The output side of the substitution is newyears_20\3-\1-\2. This writes the new prefix and adds a 20 to the front of the year. The year was the third captured value so it is denoted \3. Likewise, the month was the first captured value so it is denoted \1 and the day the second so it is \2.
To eliminate some backslashes, I used the -r option to invoke extended regular expressions. If you are on a Mac or other non-GNU system, use sed -E in place of sed -r. If your sed supports neither option, use basic regular expressions:
sed 's/[^_]*_\([0-9]\{2\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/newyears_20\3-\1-\2/'
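The answer above only shows the substitution on a single echoed name. A minimal sketch of how it might be wired into the loop from the question follows, assuming the files live in images/ and that a dry run is wanted first. The helper name new_name is hypothetical, and the pattern is anchored to img_ so that the directory part of the path is left alone.

```shell
# Hypothetical helper: map img_MM-DD-YY_XX.jpg to newyears_20YY-MM-DD_XX.jpg.
new_name() {
    printf '%s\n' "$1" |
        sed -E 's/^img_([0-9]{2})-([0-9]{2})-([0-9]{2})/newyears_20\3-\1-\2/'
}

# Dry run: print the mv commands instead of executing them.
for path in images/img_*.jpg; do
    [ -e "$path" ] || continue      # the glob matched nothing
    dir=${path%/*}                  # directory part, e.g. images
    base=${path##*/}                # file name without the directory
    echo mv -- "$path" "$dir/$(new_name "$base")"
done
```

Once the printed commands look right, dropping the echo performs the renames. The _XX to _XXX padding from the question is not handled here.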

This is simple to do with awk
echo "img_MM-DD-YY_XX.jpg" | awk -F"[_-]" '{print "newyears_20"$4"-"$2"-"$3"_0"$5}'
newyears_20YY-MM-DD_0XX.jpg

Related

Ubuntu: Terminal remove part of string

I want to use the Ubuntu Terminal to rename several hundred files in a folder since I am not allowed to install anything.
The name of the files is in the following format:
ER201703_Company_Name_Something_9876543218_90087625374823.csv
Afterwards it should look like this:
ER201703_9876543218_90087625374823.csv
So, I want to remove the middle part (Company_Name_Something) which sometimes has 2, 3 or even 4 _'s. I wanted to create 2 strings; one for the front part and one for the back part. The front part is easy and already working but I am struggling with the back part.
for name in *.csv;
do
charleng=${#name};
start=$(echo "$name" | grep -a '_9');
back=$(echo "$name" | cut -c $start-);
front=$(echo "$name" | cut -c1-9);
mv "$name""$front$back";
done
I am trying to find the position of _9 and keep everything from there to the end of the string.
Best regards
Jan
If rename is installed (I think that's the case for Ubuntu) you can use the following command instead of your loop.
rename -n 's/^(ER\d*)\w*?(_9\w*)/$1$2/' *.csv
Remove the -n (no act) to apply the changes.
Explanation
s/.../.../ substitutes matches of the left regex with the right pattern.
(ER\d*) matches the first part (ER followed by some digits) and stores it inside $1 for later use.
\w*? matches the company part, that is as few (non-greedy) word characters (letters, numbers, underscore, ...) as possible.
(_9\w*) matches the second part and stores it inside $2 for later use.
$1$2 is the substitution of the previously matched parts. We only omit the company part.
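If rename turns out not to be available, the same result can probably be had with plain shell parameter expansion, which needs no extra tools. This is a sketch under the assumption that the part to keep is always the first underscore-separated field plus the last two; strip_middle is a hypothetical helper name.

```shell
# Hypothetical helper: keep the first field and the last two fields.
strip_middle() {
    name=$1
    case $name in
        *_*_*) ;;                              # at least two underscores: go on
        *) printf '%s\n' "$name"; return ;;    # otherwise leave the name alone
    esac
    front=${name%%_*}       # text before the first underscore, e.g. ER201703
    keep=${name%_*_*}       # the name with its last two _fields removed
    back=${name#"$keep"}    # the last two fields, including the leading _
    printf '%s\n' "$front$back"
}

# Dry run: print the mv commands instead of executing them.
for name in *.csv; do
    [ -e "$name" ] || continue
    new=$(strip_middle "$name")
    [ "$name" = "$new" ] || echo mv -- "$name" "$new"
done
```

Remove the echo to apply the changes once the output looks right.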
awk -F'_' '{printf "mv %s %s_%s_%s\n",$0,$1,$(NF-1),$NF}'
Example:
kent$ awk -F'_' '{printf "mv %s %s_%s_%s\n",$0,$1,$(NF-1),$NF}' <<<"ER201703_Company_Name_Something_9876543218_90087625374823.csv"
mv ER201703_Company_Name_Something_9876543218_90087625374823.csv ER201703_9876543218_90087625374823.csv
This one-liner prints the mv old new commands. If the output looks right, pipe it to sh (awk ... | sh) and the renames will be done.
If your filenames can contain spaces, consider wrapping them in double quotes in the printf format.
I can offer an alternative solution that may be more generic.
rename 's/^([^_]+(?=_))(?:\w+(?=_\d+))(_\d+_\d+\.csv)$/$1$2/' *.csv
In case the names of the files change, you want a robust regular expression.
([^_]+(?=_)) - matches everything that is not an underscore up to the first underscore and stores it in $1
(?:\w+(?=_\d+)) - matches characters up to the trailing numbers; (?:...) groups without storing the match in a variable
(_\d+_\d+\.csv) - matches the two sets of numbers and the file extension and stores them in $2

Sed pattern in a date mm/dd/yyyy

My issue is changing a part of a date in mm/dd/yyyy to mm/dd/2016 or for learning purposes let's say mm/dd/yyyy to mm/02/yyyy.
In my file I'm going to cat in:
05/06/1989
05/06/2001
01/03/2015
Using sed to replace that file, I am running commands such as:
sed 's|[0-9][0-9]/[0-9][0-9]/[0-9][0-9]|[0-9][0-9]/[0-9][0-9]/2016|g'
This printed out the exact same thing.
So then I tried maybe changing the year by doing:
sed 's/[0-9][0-9][0-9][0-9]/2016/g'
but this didn't do anything either.
Using sed only:
echo "01/03/2015" | sed -e 's|\([0-9][0-9]\)/\([0-9][0-9]\)/\([0-9]\{4\}\)|year \3 month \1 day \2|'
When you need to skip the first 12 fields, you can use cut -d, -f13- or use
echo "1,2,3,4,5,6,7,8,9,10,11,12,01/03/2015" | sed -e 's|\([^,]*,\)\{12\}\([0-9][0-9]\)/\([0-9][0-9]\)/\([0-9]\{4\}\)|year \4 month \2 day \3|'
Explanation:
You capture part of a match by wrapping it in (something_to_match), and you recall it in the replacement with a number. In basic regular expressions the () and the numbers are all treated as special, so they all need to be escaped with backslashes.
sed 's/\(match1\)......\(match2\)/and now \1 and \2/'
When you write it this way, don't forget that the characters between the matches must match too (the dots are actually wildcards for one character each).
[0-9][0-9] you understand, but you can also say: repeat [0-9] two (or four) times. Give the number in curly brackets; the brackets are special, so escape them.
When you want to use the curly brackets more often, the first line can be changed to
echo "01/03/2015" | sed -e 's|\([0-9]\{2\}\)/\([0-9]\{2\}\)/\([0-9]\{4\}\)|year \3 month \1 day \2|'
Parsing the csv is easy with cut. Using the sed solution is just a challenge for learning sed better. What is that \([^,]*,\)?
Yes, you are right, the \(\) captures the stuff in between. I want to match one field followed by a ,. How can you say you want to match a string without a ,? You use the negation ^ in the character class [,], so [^,] will match any single character except the ,.
Using [^,]* will match a string without a ,.
The second , in \([^,]*,\) is ... just a plain ,.
The complete match is the first field followed by a ,.
Now match the first 12 csv fields with {12}, but do not forget the backslashes.
The input string has slashes, so use another delimiter character like the | (you already found that):
sed 's|from|to|'
# or everything filled in
sed -e 's|\([^,]*,\)\{12\}\([0-9][0-9]\)/\([0-9][0-9]\)/\([0-9]\{4\}\)|year \4 month \2 day \3|'
          ^^^^^^^^^ ^^^^^^ ^^^^^^^^^^^^^ ^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^      ^^       ^^     ^^
          field+,   repeat month         day            year                recall   recall recall
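Applied to the original question, the same capture-and-recall idea might look like the sketch below. It assumes the dates are always two digits, two digits and four digits separated by slashes; the function names set_year and set_day are hypothetical.

```shell
# Keep month (\1) and day (\2), replace the year with 2016.
set_year() {
    sed 's|\([0-9]\{2\}\)/\([0-9]\{2\}\)/[0-9]\{4\}|\1/\2/2016|g'
}

# Keep month (\1) and year (\2), replace the day with 02.
set_day() {
    sed 's|\([0-9]\{2\}\)/[0-9]\{2\}/\([0-9]\{4\}\)|\1/02/\2|g'
}

# The three sample dates from the question.
printf '%s\n' 05/06/1989 05/06/2001 01/03/2015 | set_year
```

The attempt in the question printed its input unchanged because a four-digit year is never followed by a third pair of digits in the way the pattern implied, and because [0-9] in the replacement side is literal text, not a pattern.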
Sed is simple, but I switch over to Perl when the regular expressions are more complicated. In the example above, converting 01/03/2015 to 01/02/2015:
echo 01/03/2015 | perl -pe 's/([0-9][0-9])\/([0-9][0-9])\/([0-9]{4})/$1\/02\/$3/'
There are three capture groups: month, day and year. I put back the month and year, and changed the day to 02. The '{4}' in the regular expression means that the preceding [0-9] must match exactly four times.
If you just want to change the year, it would be:
echo 01/03/2015 | perl -pe 's/([0-9][0-9])\/([0-9][0-9])\/([0-9]{4})/$1\/$2\/2016/'

Dynamic delimiter in Unix

Input:-
echo "1234ABC89,234" # A
echo "0520001DEF78,66" # B
echo "46545455KRJ21,00"
From the above strings, I need to split the characters to get the alphabetic field and the number after that.
From "1234ABC89,234", the output should be:
ABC
89,234
From "0520001DEF78,66", the output should be:
DEF
78,66
I have many strings that I need to split like this.
Here is my script so far:
echo "1234ABC89,234" | cut -d',' -f1
but it gives me 1234ABC89 which isn't what I want.
Assuming that you want to discard leading digits only, and that the letters will be all upper case, the following should work:
echo "1234ABC89,234" | sed 's/^[0-9]*\([A-Z]*\)\([0-9].*\)/\1\n\2/'
This works fine with GNU sed (I have 4.2.2), but other sed implementations might not like the \n, in which case you'll need to substitute something else.
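For sed implementations that do not accept \n in the replacement, one common substitute is a backslash followed by a literal newline, which POSIX sed accepts. A small sketch, with split_fields as a hypothetical wrapper name:

```shell
# Same substitution as above, but the newline in the replacement is written
# as a backslash followed by an actual line break.
split_fields() {
    sed 's/^[0-9]*\([A-Z]*\)\([0-9].*\)/\1\
\2/'
}

echo "1234ABC89,234" | split_fields
```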
Depending on the version of sed you can try:
echo "0520001DEF78,66" | sed -E -e 's/[0-9]*([A-Z]*)([,0-9]*)/\1\n\2/'
or:
echo "0520001DEF78,66" | sed -E -e 's/[0-9]*([A-Z]*)([,0-9]*)/\1$\2/' | tr '$' '\n'
DEF
78,66
Explanation: the regular expression replaces the input with the expected output, except that it puts a "$" sign where the newline should go; the tr command then replaces that "$" with a newline.
Where do the strings come from? Are they read from a file (or other source external to the script), or are they stored in the script? If they're in the script, you should simply reformat the data so it is easier to manage. Therefore, it is sensible to assume they come from an external data source such as a file or being piped to the script.
You could simply feed the data through sed:
sed 's/^[0-9]*\([A-Z]*\)/\1 /' |
while read alpha number
do
…process the two fields…
done
The only trick to watch there is that if you set variables in the loop, they won't necessarily be visible to the script after the done. There are ways around that problem — some of which depend on which shell you use. This much is the same in any derivative of the Bourne shell.
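One way around the lost-variable problem that should work in any Bourne-style shell is to keep the loop out of the pipeline and feed it by redirection. The sketch below assumes a temporary file is acceptable; the variable names and the sample lines (taken from the question) are only for illustration.

```shell
count=0
tmp=$(mktemp)

# Run sed first and store its output, so the loop is not part of a pipeline.
printf '%s\n' 1234ABC89,234 0520001DEF78,66 46545455KRJ21,00 |
    sed 's/^[0-9]*\([A-Z]*\)/\1 /' > "$tmp"

# The loop runs in the current shell, so count survives after done.
while read -r alpha number; do
    count=$((count + 1))
    echo "$alpha -> $number"
done < "$tmp"

rm -f "$tmp"
echo "processed $count lines"
```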
You said you have many strings like this, so I recommend if possible save them to a file such as input.txt:
1234ABC89,234
0520001DEF78,66
46545455KRJ21,00
On your command line, try this sed command reading input.txt as file argument:
$ sed -E 's/([0-9]+)([[:alpha:]]{3})(.+)/\2\t\3/g' input.txt
ABC 89,234
DEF 78,66
KRJ 21,00
How it works
uses -E for extended regular expressions to save on typing, otherwise for example for grouping we would have to escape \(
uses grouping ( and ), searches three groups:
firstly digits, where + specifies one or more of them. The POSIX class [[:digit:]] is an alternative to the [0-9] used in the command and should match the same characters for this input
the next is to search for POSIX alphabetical characters, regardless if lowercase or uppercase, and {3} specifies to search for 3 of them
the last group searches for . meaning any character, + for one or more times
\2\t\3 then returns group 2 and group 3, with a tab separator
Thus you are able to extract two separate fields per line, just separated by tab, for easier manipulation later.

Using BASH, how to increment a number that uniquely only occurs once in most lines of an HTML file?

The target is always going to be between two characters, 'E' and '/' and there will never be but one occurrence of this combination, e.g. 'E01/' in most lines in the HTML file and will always be between '01' and '90'.
So, I need to programmatically read the file and replace each occurrence of 'Enn/' where 'nn' in 'Enn/' will be between '01' and '90' and must maintain the '0' for numbers '01' to '09' in 'Enn/' while incrementing the existing number by 1 throughout the HTML file.
Is this doable and if so how best to go about it?
Edit: Target lines will be in one or the other formats:
<DT>ProgramName
<DT>Program Name
You can use sed inside BASH as a fantastic one-liner, either:
sed -ri 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+(10#\2>=90?0:1)))/ge' FILENAME
or if you are guaranteed the number is lower than 100:
sed -ri 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+1))/ge' FILENAME
Basically, you'll be doing an in-place search and replace. The first command will not add anything after 90 (since you didn't specify the exact nature of the overflow condition). So E89/ -> E90/, E90/ -> E90/, and if by chance you have E91/, it will remain E91/. Note that the e flag runs the command through sh, so the 10# prefix only works where sh is bash. Add this line inside a loop for multiple files.
A small explanation of the above command:
-r enables extended regular expressions (GNU sed)
-i states to write back to the same file (be careful with overwriting!)
s/search/replace/ge this is the regex command you'll be using
s/ starts the substitute command
(.*E) first grouping: all characters up to and including the E (case sensitive)
([0-9]{2}) second grouping of numbers 0 through 9, repeated twice (fixed width)
(\/.*) third grouping: the escaped trailing slash and everything after it
/ (slash separator) denotes end of search pattern and beginning of replacement pattern
printf "format" var this is the expression used for each replacement
\1 place first grouping found here
%02u the replace format for the var
\3 place third grouping found here
$((expression)) BASH arithmetic expression to use in printf format
10#\2 force second grouping as a base 10 number
+(10#\2>=90?0:1) add 0 or 1 to the second grouping based on if it is >= 90 (as used in first command)
+1 add 1 to the second grouping (see second command)
/ge flags: g for global replacement, e to execute the result of the substitution as a shell command and use its output (GNU sed)
GNU sed and awk are very powerful tools to do this sort of thing.
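Since awk is mentioned as an option, here is a sketch of how the increment might be done with awk alone, which avoids executing each line as a shell command. It assumes at most one Enn/ per line and the same cap at 90 as the first sed command; increment_e is a hypothetical name, and the sample line is made up for illustration.

```shell
# Increment the two-digit number in the first Enn/ on each line, capped at 90.
increment_e() {
    awk '{
        if (match($0, /E[0-9][0-9]\//)) {
            n = substr($0, RSTART + 1, 2) + 0    # "08" becomes 8, no octal issue
            if (n < 90) n++
            $0 = substr($0, 1, RSTART) sprintf("%02d", n) substr($0, RSTART + 3)
        }
        print
    }'
}

# Made-up sample line.
echo 'shows/E08/index.html' | increment_e
```

To update a file, write the output to a temporary file and move it over the original after checking it.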
You can use the following perl one-liner to increment the numbers while maintaining the ones with leading 0s.
perl -pe 's/E\K([0-9]+)/sprintf "%02d", 1+$1/e' file
$ cat file
<DT>ProgramName
<DT>Program Name
<DT>Program Name
<DT>Program Name
$ perl -pe 's/E\K([0-9]+)/sprintf "%02d", 1+$1/e' file
<DT>ProgramName
<DT>Program Name
<DT>Program Name
<DT>Program Name
You can add the -i option to make changes in-place. I would recommend creating a backup before doing so.
Not as elegant as one line sed!
Break the commands used into multiple commands and you can debug your bash or grep or sed.
# find the number
# use -o to grep to just return pattern
# use head -n1 for safety to just get 1 number
n=$(grep -o "E[0-9][0-9]\/" file.html |grep -o "[0-9][0-9]"|head -n1)
#octal 08 and 09 are problem so need to do this
n1=10#$n
echo Debug n1=$n1 n=$n
n2=n1
# bash arithmetic done inside (( ))
# as ever with bash bracketing whitespace is needed
(( n2++ ))
echo debug n2=$n2
# use sed with -i -e for inline edit to replace number
sed -i -e "s/E$n\//E$(printf '%02d' $n2)\//" file.html
grep "E[0-9][0-9]" file.html
awk might be better. Maybe could do it in one awk command also.
The sed one-liner in other answer is awesome :-)
This works in bash (the 10# prefix and (( )) are bash features).
http://unixhelp.ed.ac.uk/CGI/man-cgi?grep

How to parse a config file using sed

I've never used sed apart from the few hours trying to solve this. I have a config file with parameters like:
test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value
I need to parse these and output this if REGIONID is US:
test.param=value
prod.param=value
Any help on how to do this (with sed or otherwise) would be great.
This works for me:
sed -n 's/\.us\././p'
i.e. if the ".us." can be replaced by a dot, print the result.
If there are hundreds and hundreds of lines it might be more efficient to first search for lines containing .us. and then do the string replacement... AWK is another good choice or pipe grep into sed
cat INPUT_FILE | grep "\.us\." | sed 's/\.us\./\./g'
Of course if '.us.' can be in the value this isn't sufficient.
You could also do this with the address syntax (technically you can embed the second sed into the first statement as well, I just can't remember the syntax)
sed -n '/\(prod\|test\)\.us\.[^=]*=/p' FILE | sed 's/\.us\./\./g'
We should probably do something cleaner. If the format is always environment.region.param we could restrict the substitution to the text PRIOR to the equal sign.
sed -n 's/^\([^.=]*\)\.us\.\([^=]*\)=/\1.\2=/p'
This only matches lines that start with a run of characters containing no '.' or '=', followed by '.us.', and then any characters up to the first '=' sign. This way we won't modify a '.us.' found within a "value".
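To drive this from the REGIONID mentioned in the question, the region can be passed in as a variable. A sketch, assuming the region consists only of letters and that the match should stay anchored to the key before the = sign; filter_region is a hypothetical name.

```shell
# Print only the lines for one region, with the region removed from the key.
filter_region() {
    # Lower-case the region so that US and us both work.
    region=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    sed -n "s/^\([^.=]*\)\.$region\.\([^=]*\)=/\1.\2=/p"
}

# The four sample lines from the question.
printf '%s\n' test.us.param=value test.eu.param=value \
              prod.us.param=value prod.eu.param=value |
    filter_region US
```

For a real file, redirect it into the function, for example filter_region "$REGIONID" < FILE.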

Resources