Bash to match pattern in filename then add/edit - bash

I'm sure this has been answered before, but I can't seem to use the right search terms to find it.
I'm trying to write a bash script that can recognize, sort, and rename files based on patterns in their names.
Take this filename, for example: BBC Something Something 3 of 5 Blah 2007.avi
I would like the script to recognize that since the filename starts with BBC and contains something that matches the pattern "DIGIT of DIGIT," the script should rename it by removing the BBC at the front, inserting the string "s01e0" in front of the 3, and removing the "of 5," turning it into Something Something s01e03 Blah 2007.avi
In addition, I'd like for the script to recognize and deal differently with a file named, for example, BBC Something Else 2009.mkv . In this case, I need the script to recognize that since the filename starts with BBC and ends with a year, but does not contain that "DIGIT of DIGIT" pattern, it should rename it by inserting the word "documentaries" after BBC and then copying and pasting the year after that, so that the filename would become BBC documentaries 2009 Something Else.mkv
I hope this isn't asking for too much help... I've been working on this myself all day, but this is literally all I've got:
topic1 () {
if [ "$2" = "bbc*[:digit:] of [:digit:]" ]; then
And then nothing. I'd love some help! Thanks!

Use grep to match filenames that need to be changed and then sed to actually change them:
#!/bin/bash
get_name()
{
local FILENAME="${1}"
local NEWNAME=""
# check if input matches our criteria
MATCH_EPISODE=$(echo "${FILENAME}" | grep -c "BBC.*[0-9] of [0-9]")
MATCH_DOCUMENTARY=$(echo "${FILENAME}" | grep -c "BBC.*[0-9]\{4\}")
# if it matches then modify
if [ "${MATCH_EPISODE}" = "1" ]; then
NEWNAME=$(echo "${FILENAME}" | sed -e 's/BBC\(.*\)\([0-9]\) of [0-9]\(.*\)/\1 s01e0\2 \3/')
elif [ "${MATCH_DOCUMENTARY}" = "1" ]; then
NEWNAME=$(echo "${FILENAME}" | sed -e 's/BBC\(.*\)\([0-9]\{4\}\)\(.*\)/BBC documentaries \2 \1 \3/')
fi
# clean up: remove leading spaces, repeated spaces, spaces before dot
echo "${NEWNAME}" | sed -e 's/^ *//' -e 's/  */ /g' -e 's/ \./\./g'
}
FN1="BBC Something Something 3 of 5 Blah 2007.avi"
FN2="BBC Something Else 2009.mkv"
FN3="Something Not From BBC.mkv"
NN1=$(get_name "${FN1}")
NN2=$(get_name "${FN2}")
NN3=$(get_name "${FN3}")
echo "${FN1} -> ${NN1}"
echo "${FN2} -> ${NN2}"
echo "${FN3} -> ${NN3}"
The output is:
BBC Something Something 3 of 5 Blah 2007.avi -> Something Something s01e03 Blah 2007.avi
BBC Something Else 2009.mkv -> BBC documentaries 2009 Something Else.mkv
Something Not From BBC.mkv ->
Let's look at one of the sed invocations:
sed -e 's/BBC\(.*\)\([0-9]\) of [0-9]\(.*\)/\1 s01e0\2 \3/'
We use capture groups to match interesting portions of the filename:
BBC - match literal BBC,
\(.*\) - match everything and remember it in capture group 1, until
\([0-9]\) - a digit, remember it in capture group 2, then
of [0-9] - match literal " of " and digit,
\(.*\) - match rest and remember it in capture group 3
and then put them in positions we want:
\1 - content of capture group 1, i.e. everything between "BBC" and the episode digit
s01e0 - literal " s01e0"
\2 - content of capture group 2, i.e. episode number
\3 - content of capture group 3, i.e. everything else
This may result in many superfluous spaces so at the end there is another sed invocation to clean that up.
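The same two rules can also be written without grep or sed, using bash's own regex matching. This is only a sketch: the function name bbc_new_name is made up for illustration, the patterns assume single spaces between words and a single-digit episode number as in the question, and the function only prints the new name rather than renaming anything.

```shell
#!/bin/bash
# bbc_new_name (hypothetical name): print the new filename, or nothing if no rule applies.
bbc_new_name() {
    local name=$1
    # "BBC <title> <digit> of <digit> <rest>"
    local ep_re='^BBC (.*) ([0-9]) of [0-9] (.*)$'
    # "BBC <title> <4-digit year>.<extension>"
    local doc_re='^BBC (.*) ([0-9]{4})(\.[^.]+)$'
    if [[ $name =~ $ep_re ]]; then
        printf '%s s01e0%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    elif [[ $name =~ $doc_re ]]; then
        printf 'BBC documentaries %s %s%s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}"
    fi
}

bbc_new_name "BBC Something Something 3 of 5 Blah 2007.avi"
bbc_new_name "BBC Something Else 2009.mkv"

# A dry run over real files could look like this (remove "echo" to actually rename):
#   for f in BBC*; do new=$(bbc_new_name "$f"); [ -n "$new" ] && echo mv -n -- "$f" "$new"; done
```

Because the episode rule is tested first, a name that contains both "3 of 5" and a year is treated as an episode, which matches the behaviour of the grep/sed version.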

Related

Double quotes containing variable not working in sed [duplicate]

In my bash script I have an external (received from user) string, which I should use in sed pattern.
REPLACE="<funny characters here>"
sed "s/KEYWORD/$REPLACE/g"
How can I escape the $REPLACE string so it would be safely accepted by sed as a literal replacement?
NOTE: The KEYWORD is a dumb substring with no matches etc. It is not supplied by user.
Warning: This does not consider newlines. For a more in-depth answer, see this SO-question instead. (Thanks, Ed Morton & Niklas Peter)
Note that escaping everything is a bad idea. Sed needs many characters to be escaped to get their special meaning. For example, if you escape a digit in the replacement string, it will turn into a backreference.
As Ben Blank said, there are only three characters that need to be escaped in the replacement string (escapes themselves, forward slash for end of statement and & for replace all):
ESCAPED_REPLACE=$(printf '%s\n' "$REPLACE" | sed -e 's/[\/&]/\\&/g')
# Now you can use ESCAPED_REPLACE in the original sed statement
sed "s/KEYWORD/$ESCAPED_REPLACE/g"
If you ever need to escape the KEYWORD string, the following is the one you need:
sed -e 's/[]\/$*.^[]/\\&/g'
And can be used by:
KEYWORD="The Keyword You Need";
ESCAPED_KEYWORD=$(printf '%s\n' "$KEYWORD" | sed -e 's/[]\/$*.^[]/\\&/g');
# Now you can use it inside the original sed statement to replace text
sed "s/$ESCAPED_KEYWORD/$ESCAPED_REPLACE/g"
Remember, if you use a character other than / as the delimiter, you need to replace the slash in the expressions above with the character you are using. See PeterJCLaw's comment for an explanation.
Edited: Due to some corner cases previously not accounted for, the commands above have changed several times. Check the edit history for details.
The sed command allows you to use other characters instead of / as separator:
sed 's#"http://www\.fubar\.com"#URL_FUBAR#g'
The double quotes are not a problem.
The only three literal characters which are treated specially in the replace clause are / (to close the clause), \ (to escape characters, backreference, &c.), and & (to include the match in the replacement). Therefore, all you need to do is escape those three characters:
sed "s/KEYWORD/$(echo $REPLACE | sed -e 's/\\/\\\\/g; s/\//\\\//g; s/&/\\\&/g')/g"
Example:
$ export REPLACE="'\"|\\/><&!"
$ echo fooKEYWORDbar | sed "s/KEYWORD/$(echo $REPLACE | sed -e 's/\\/\\\\/g; s/\//\\\//g; s/&/\\\&/g')/g"
foo'"|\/><&!bar
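The three substitutions above can be wrapped in a small function so the value is quoted on its way in (the unquoted echo $REPLACE would collapse whitespace and expand globs). This is a sketch; the name escape_sed_replacement is an assumption made here, and like the answer above it does not handle newlines in the replacement.

```shell
# escape_sed_replacement (hypothetical name): make a string safe to use as the
# replacement part of s/.../.../ when "/" is the delimiter.
escape_sed_replacement() {
    # Backslashes first, then the delimiter, then "&".
    printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/\//\\\//g' -e 's/&/\\\&/g'
}

REPLACE='a/b & c\d'
printf '%s\n' 'fooKEYWORDbar' | sed "s/KEYWORD/$(escape_sed_replacement "$REPLACE")/g"
```

The order matters: if backslashes were doubled last, the backslashes added in front of / and & would be doubled too.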
Based on Pianosaurus's regular expressions, I made a bash function that escapes both keyword and replacement.
function sedeasy {
sed -i "s/$(echo $1 | sed -e 's/\([[\/.*]\|\]\)/\\&/g')/$(echo $2 | sed -e 's/[\/&]/\\&/g')/g" $3
}
Here's how you use it:
sedeasy "include /etc/nginx/conf.d/*" "include /apps/*/conf/nginx.conf" /etc/nginx/nginx.conf
It's a bit late to respond... but there IS a much simpler way to do this. Just change the delimiter (i.e., the character that separates fields). So, instead of s/foo/bar/ you write s|foo|bar|.
And, here's the easy way to do this:
sed 's|/\*!50017 DEFINER=`snafu`@`localhost`\*/||g'
The resulting output is devoid of that nasty DEFINER clause.
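A short illustration of what changing the delimiter does and does not buy you (the paths here are made up for the example). Slashes stop being special, but & and \ keep their meaning in the replacement, and the new delimiter itself would now need escaping.

```shell
# With "|" as the delimiter, slashes in pattern and replacement need no escaping.
printf '%s\n' 'include /etc/nginx/conf.d/site.conf' |
    sed 's|/etc/nginx/conf.d|/apps/conf|'

# "&" is still special in the replacement: it stands for the matched text.
printf '%s\n' 'x' | sed 's|x|a&b|'
```

So for a replacement string that comes from a user, a different delimiter alone is not enough; the string still has to be escaped.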
It turns out you're asking the wrong question. I also asked the wrong question. The reason it's wrong is the beginning of the first sentence: "In my bash script...".
I had the same question & made the same mistake. If you're using bash, you don't need to use sed to do string replacements (and it's much cleaner to use the replace feature built into bash).
Instead of something like, for example:
function escape-all-funny-characters() { UNKNOWN_CODE_THAT_ANSWERS_THE_QUESTION_YOU_ASKED; }
INPUT='some long string with KEYWORD that need replacing KEYWORD.'
A="$(escape-all-funny-characters 'KEYWORD')"
B="$(escape-all-funny-characters '<funny characters here>')"
OUTPUT="$(sed "s/$A/$B/g" <<<"$INPUT")"
you can use bash features exclusively:
INPUT='some long string with KEYWORD that need replacing KEYWORD.'
A='KEYWORD'
B='<funny characters here>'
OUTPUT="${INPUT//"$A"/"$B"}"
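To see that the quoted expansion really treats both strings literally, here is a small sketch. The wrapper name replace_all is an assumption for this example, and the behaviour described is that of reasonably recent bash versions.

```shell
#!/bin/bash
# replace_all (hypothetical name): replace every literal occurrence of $2 in $1 with $3.
replace_all() {
    local input=$1 keyword=$2 replacement=$3
    # Quoting "$keyword" disables glob matching in the pattern;
    # quoting "$replacement" keeps characters such as & literal.
    local result=${input//"$keyword"/"$replacement"}
    printf '%s\n' "$result"
}

replace_all 'some KEYWORD and KEYWORD.' 'KEYWORD' 'a/b & c\d'
replace_all 'a*b' '*' '.'
```

No escaping step is needed because nothing is ever parsed as a sed script.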
Use awk - it is cleaner:
$ awk -v R='//addr:\\file' '{ sub("THIS", R, $0); print $0 }' <<< "http://file:\_THIS_/path/to/a/file\\is\\\a\\ nightmare"
http://file:\_//addr:\file_/path/to/a/file\\is\\\a\\ nightmare
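Note that awk -v processes backslash escapes (hence the doubled backslash in R above), and sub() treats & in the replacement as "the matched text", so this is not fully literal either. A sketch of a literal variant, assuming it is acceptable to pass the strings through the environment; the names literal_replace, KEY and REP are made up here:

```shell
# literal_replace (hypothetical name): replace every literal KEYWORD on stdin.
# usage: literal_replace KEYWORD REPLACEMENT < input
literal_replace() {
    KEY=$1 REP=$2 awk '
        BEGIN { key = ENVIRON["KEY"]; rep = ENVIRON["REP"]; klen = length(key) }
        {
            rest = $0; out = ""
            # index() does a plain substring search, so no regex or & handling applies.
            while (klen > 0 && (i = index(rest, key)) > 0) {
                out = out substr(rest, 1, i - 1) rep
                rest = substr(rest, i + klen)
            }
            print out rest
        }'
}

printf '%s\n' 'http://file:\_THIS_/path' | literal_replace 'THIS' '//addr:\file&'
```

Values read from ENVIRON are taken as they are, which is why the single backslash and the & survive unchanged.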
Here is an example of an AWK I used a while ago. It is an AWK that prints new AWKS. AWK and SED being similar it may be a good template.
ls | awk '{ print "awk " "'"'"'" " {print $1,$2,$3} " "'"'"'" " " $1 ".old_ext > " $1 ".new_ext" }' > for_the_birds
It looks excessive, but somehow that combination of quotes works to keep the ' printed as literals. Then, if I remember correctly, the variables are just surrounded with quotes like this: "$1". Try it, let me know how it works with SED.
These are the escape codes that I've found:
* = \x2a
( = \x28
) = \x29
" = \x22
/ = \x2f
\ = \x5c
' = \x27
? = \x3f
% = \x25
^ = \x5e
sed is typically a mess, especially the difference between gnu-sed and bsd-sed
might just be easier to place some sort of sentinel at the sed side, then a quick pipe over to awk, which is far more flexible in accepting any ERE regex, escaped hex, or escaped octals.
e.g. OFS in awk is the true replacement ::
date | sed -E 's/[0-9]+/\xC1\xC0/g' |
mawk NF=NF FS='\xC1\xC0' OFS='\360\237\244\241'
1 Tue Aug 🤡 🤡:🤡:🤡 EDT 🤡
(tested and confirmed working on both BSD-sed and GNU-sed - the emoji isn't a typo that's what those 4 bytes map to in UTF-8 )
There are dozens of answers out there... If you don't mind using a bash function schema, below is a good answer. The objective below was to allow using sed with practically any parameter as a KEYWORD (F_PS_TARGET) or as a REPLACE (F_PS_REPLACE). We tested it in many scenarios and it seems to be pretty safe. The implementation below supports tabs, line breaks and single quotes for both KEYWORD and REPLACE.
NOTES: The idea here is to use sed to escape entries for another sed command.
CODE
F_REVERSE_STRING_R=""
f_reverse_string() {
: 'Do a string reverse.
To undo just use a reversed string as STRING_INPUT.
Args:
STRING_INPUT (str): String input.
Returns:
F_REVERSE_STRING_R (str): The modified string.
'
local STRING_INPUT=$1
F_REVERSE_STRING_R=$(echo "x${STRING_INPUT}x" | tac | rev)
F_REVERSE_STRING_R=${F_REVERSE_STRING_R%?}
F_REVERSE_STRING_R=${F_REVERSE_STRING_R#?}
}
# [Ref(s).: https://stackoverflow.com/a/2705678/3223785 ]
F_POWER_SED_ECP_R=""
f_power_sed_ecp() {
: 'Escape strings for the "sed" command.
Escaped characters will be processed as is (e.g. /n, /t ...).
Args:
F_PSE_VAL_TO_ECP (str): Value to be escaped.
F_PSE_ECP_TYPE (int): 0 - For the TARGET value; 1 - For the REPLACE value.
Returns:
F_POWER_SED_ECP_R (str): Escaped value.
'
local F_PSE_VAL_TO_ECP=$1
local F_PSE_ECP_TYPE=$2
# NOTE: Operational characters of "sed" will be escaped, as well as single quotes.
# By Questor
if [ ${F_PSE_ECP_TYPE} -eq 0 ] ; then
# NOTE: For the TARGET value. By Questor
F_POWER_SED_ECP_R=$(echo "x${F_PSE_VAL_TO_ECP}x" | sed 's/[]\/$*.^[]/\\&/g' | sed "s/'/\\\x27/g" | sed ':a;N;$!ba;s/\n/\\n/g')
else
# NOTE: For the REPLACE value. By Questor
F_POWER_SED_ECP_R=$(echo "x${F_PSE_VAL_TO_ECP}x" | sed 's/[\/&]/\\&/g' | sed "s/'/\\\x27/g" | sed ':a;N;$!ba;s/\n/\\n/g')
fi
F_POWER_SED_ECP_R=${F_POWER_SED_ECP_R%?}
F_POWER_SED_ECP_R=${F_POWER_SED_ECP_R#?}
}
# [Ref(s).: https://stackoverflow.com/a/24134488/3223785 ,
# https://stackoverflow.com/a/21740695/3223785 ,
# https://unix.stackexchange.com/a/655558/61742 ,
# https://stackoverflow.com/a/11461628/3223785 ,
# https://stackoverflow.com/a/45151986/3223785 ,
# https://linuxaria.com/pills/tac-and-rev-to-see-files-in-reverse-order ,
# https://unix.stackexchange.com/a/631355/61742 ]
F_POWER_SED_R=""
f_power_sed() {
: 'Facilitate the use of the "sed" command. Replaces in files and strings.
Args:
F_PS_TARGET (str): Value to be replaced by the value of F_PS_REPLACE.
F_PS_REPLACE (str): Value that will replace F_PS_TARGET.
F_PS_FILE (Optional[str]): File in which the replacement will be made.
F_PS_SOURCE (Optional[str]): String to be manipulated in case "F_PS_FILE" was
not informed.
F_PS_NTH_OCCUR (Optional[int]): [1~n] - Replace the nth match; [n~-1] - Replace
the last nth match; 0 - Replace every match; Default 1.
Returns:
F_POWER_SED_R (str): Return the result if "F_PS_FILE" is not informed.
'
local F_PS_TARGET=$1
local F_PS_REPLACE=$2
local F_PS_FILE=$3
local F_PS_SOURCE=$4
local F_PS_NTH_OCCUR=$5
if [ -z "$F_PS_NTH_OCCUR" ] ; then
F_PS_NTH_OCCUR=1
fi
local F_PS_REVERSE_MODE=0
if [ ${F_PS_NTH_OCCUR} -lt -1 ] ; then
F_PS_REVERSE_MODE=1
f_reverse_string "$F_PS_TARGET"
F_PS_TARGET="$F_REVERSE_STRING_R"
f_reverse_string "$F_PS_REPLACE"
F_PS_REPLACE="$F_REVERSE_STRING_R"
f_reverse_string "$F_PS_SOURCE"
F_PS_SOURCE="$F_REVERSE_STRING_R"
F_PS_NTH_OCCUR=$((-F_PS_NTH_OCCUR))
fi
f_power_sed_ecp "$F_PS_TARGET" 0
F_PS_TARGET=$F_POWER_SED_ECP_R
f_power_sed_ecp "$F_PS_REPLACE" 1
F_PS_REPLACE=$F_POWER_SED_ECP_R
local F_PS_SED_RPL=""
if [ ${F_PS_NTH_OCCUR} -eq -1 ] ; then
# NOTE: We kept this option because it performs better when we only need to replace
# the last occurrence. By Questor
# [Ref(s).: https://linuxhint.com/use-sed-replace-last-occurrence/ ,
# https://unix.stackexchange.com/a/713866/61742 ]
F_PS_SED_RPL="'s/\(.*\)$F_PS_TARGET/\1$F_PS_REPLACE/'"
elif [ ${F_PS_NTH_OCCUR} -gt 0 ] ; then
# [Ref(s).: https://unix.stackexchange.com/a/587924/61742 ]
F_PS_SED_RPL="'s/$F_PS_TARGET/$F_PS_REPLACE/$F_PS_NTH_OCCUR'"
elif [ ${F_PS_NTH_OCCUR} -eq 0 ] ; then
F_PS_SED_RPL="'s/$F_PS_TARGET/$F_PS_REPLACE/g'"
fi
# NOTE: As the "sed" commands below always process literal values for the "F_PS_TARGET"
# so we use the "-z" flag in case it has multiple lines. By Quaestor
# [Ref(s).: https://unix.stackexchange.com/a/525524/61742 ]
if [ -z "$F_PS_FILE" ] ; then
F_POWER_SED_R=$(echo "x${F_PS_SOURCE}x" | eval "sed -z $F_PS_SED_RPL")
F_POWER_SED_R=${F_POWER_SED_R%?}
F_POWER_SED_R=${F_POWER_SED_R#?}
if [ ${F_PS_REVERSE_MODE} -eq 1 ] ; then
f_reverse_string "$F_POWER_SED_R"
F_POWER_SED_R="$F_REVERSE_STRING_R"
fi
else
if [ ${F_PS_REVERSE_MODE} -eq 0 ] ; then
eval "sed -i -z $F_PS_SED_RPL \"$F_PS_FILE\""
else
tac "$F_PS_FILE" | rev | eval "sed -z $F_PS_SED_RPL" | tac | rev > "$F_PS_FILE"
fi
fi
}
MODEL
f_power_sed "F_PS_TARGET" "F_PS_REPLACE" "" "F_PS_SOURCE"
echo "$F_POWER_SED_R"
EXAMPLE
f_power_sed "{ gsub(/,[ ]+|$/,\"\0\"); print }' ./ and eliminate" "[ ]+|$/,\"\0\"" "" "Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,\"\0\"); print }' ./ and eliminate that concatenation of the final \", \" then you don't have to go through the gymnastics on eliminating the final record. So: readarray -td '' a < <(awk '{ gsub(/,[ ]+/,\"\0\"); print; }' <<<\"$string\") on Bash that supports readarray. Note your method is Bash 4.4+ I think because of the -d in readar"
echo "$F_POWER_SED_R"
IF YOU JUST WANT TO ESCAPE THE PARAMETERS TO THE SED COMMAND
MODEL
# "TARGET" value.
f_power_sed_ecp "F_PSE_VAL_TO_ECP" 0
echo "$F_POWER_SED_ECP_R"
# "REPLACE" value.
f_power_sed_ecp "F_PSE_VAL_TO_ECP" 1
echo "$F_POWER_SED_ECP_R"
IMPORTANT: If the strings for KEYWORD and/or replace REPLACE contain tabs or line breaks you will need to use the "-z" flag in your "sed" command. More details here.
EXAMPLE
f_power_sed_ecp "{ gsub(/,[ ]+|$/,\"\0\"); print }' ./ and eliminate" 0
echo "$F_POWER_SED_ECP_R"
f_power_sed_ecp "[ ]+|$/,\"\0\"" 1
echo "$F_POWER_SED_ECP_R"
NOTE: The f_power_sed_ecp and f_power_sed functions above were made available completely free as part of this project: ez_i - Create shell script installers easily!.
Standard recommendation here: use perl :)
echo KEYWORD > /tmp/test
REPLACE="<funny characters here>"
perl -pi.bck -e "s/KEYWORD/${REPLACE}/g" /tmp/test
cat /tmp/test
Don't forget the extra trouble that comes with the shell's own quoting rules around " and '.
So (in ksh):
Var=">New version of \"content' here <"
printf "%s" "${Var}" | sed "s/[&\/\\\\*\"']/\\\\&/g" | read -r EscVar
echo "Here is your \"text\" to change" | sed "s/text/${EscVar}/g"
If you are generating a random password to pass to a sed replacement pattern, you can be careful about which characters the random string may contain. If the password is made by encoding a value as base64, there is only one character that is both possible in base64 and special in a sed replacement pattern. That character is "/", and it is easily removed from the password you are generating:
# password from 32 random bytes, base64-encoded, minus any copies of the "/" character.
pass=`openssl rand -base64 32 | sed -e 's/\///g'`;
If you are just looking to use a variable's value in a sed command, then just remove the single quotes.
Example: change
sed -i 's/dev-/dev-$ENV/g' test
to
sed -i s/dev-/dev-$ENV/g test
I have an improvement over the sedeasy function, which WILL break with special characters like tab.
function sedeasy_improved {
sed -i "s/$(
echo "$1" | sed -e 's/\([[\/.*]\|\]\)/\\&/g' |
sed -e 's:\t:\\t:g'
)/$(
echo "$2" | sed -e 's/[\/&]/\\&/g' |
sed -e 's:\t:\\t:g'
)/g" "$3"
}
So, what's different? $1 and $2 are wrapped in quotes to avoid shell expansions and preserve tabs or double spaces.
Additional piping | sed -e 's:\t:\\t:g' (I like : as token) which transforms a tab in \t.
An easier way to do this is simply building the string before hand and using it as a parameter for sed
rpstring="s/KEYWORD/$REPLACE/g"
sed -i "$rpstring" test.txt

Split a string to print first two characters delimited by "-" In Bash

I am listing the AWS region names.
us-east-1
ap-southeast-1
I want to split the string to print specific first characters delimited by - i.e. 'two characters'-'one character'-'one character'. So us-east-1 should be printed as use1 and ap-southeast-1 should be printed as aps1
I have tried this and it's giving me expected results. I was thinking if there is a shorter way to achieve this.
region=us-east-1
regionlen=$(echo -n $region | wc -m)
echo $region | sed 's/-//' | cut -c 1-3,`expr $regionlen - 2`-`expr $regionlen - 1`
How about using sed:
echo "$region" | sed -E 's/^(.[^-]?)[^-]*-(.)[^-]*-(.).*$/\1\2\3/'
Explanation: the s/pattern/replacement/ command picks out the relevant parts of the region name, replacing the entire name with just the relevant bits. The pattern is:
^ - the beginning of the string
(.[^-]?) - the first character, and another (if it's not a dash)
[^-]* - any more things up to a dash
- - a dash (the first one)
(.) - The first character of the second word
[^-]*- - the rest of the second word, then the dash
(.) - The first character of the third word
.*$ - Anything remaining through the end
The bits in parentheses get captured, so \1\2\3 pulls them out and replaces the whole thing with just those.
IFS influencing field splitting step of parameter expansion:
$ str=us-east-2
$ IFS=- eval 'set -- $str'
$ echo $#
3
$ echo $1
us
$ echo $2
east
$ echo $3
2
No external utilities; just processing in the language.
This is how smartly written build configuration scripts parse version numbers like 1.13.4 and architecture strings like i386-gnu-linux.
The eval can be avoided, if we save and restore IFS.
$ save_ifs=$IFS; IFS=-; set -- $str; IFS=$save_ifs
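Put together for the region names from the question, the save-and-restore variant might look like the sketch below. The function name abbrev_region is made up here, and the substring expansions are bash-specific.

```shell
#!/bin/bash
# abbrev_region (hypothetical name): us-east-1 -> use1, ap-southeast-1 -> aps1
abbrev_region() {
    local old_ifs=$IFS
    IFS=-
    set -f          # no globbing while the unquoted expansion is split
    set -- $1       # the fields become $1, $2 and $3
    set +f
    IFS=$old_ifs
    # two characters of the first field, one of the second, one of the third
    printf '%s%s%s\n' "${1:0:2}" "${2:0:1}" "${3:0:1}"
}

abbrev_region us-east-1
abbrev_region ap-southeast-1
```

Inside a function, set -- only replaces the function's own positional parameters, so the caller's arguments are left alone.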
Using bash, and assuming that you need to distinguish between things like southwest and southeast:
s=ap-southwest-1
a=${s:0:2}
b=${s#*-}
b=${b%-*}
c=${s##*-}
bb=
case "$b" in
south*) bb+=s ;;&
north*) bb+=n ;;&
*east*) bb+=e ;;
*west*) bb+=w ;;
esac
echo "$a$bb$c"
How about:
region="us-east-1"
echo "$region" | (IFS=- read -r a b c; echo "$a${b:0:1}${c:0:1}")
use1
A simple sed -
$: printf "us-east-1\nap-southeast-1\n" |
sed -E 's/-(.)[^-]*/\1/g'
To keep noncardinal specifications like southeast distinct from south at the cost of adding an optional additional character -
$: printf "us-east-1\nap-southeast-1\n" |
sed -E '
s/north/n/;
s/south/s/;
s/east/e/;
s/west/w/;
s/-//g;'
If you could have south-southwest, add g to those directional reductions.
if you MUST have exactly 4 characters of output, I recommend mapping the eight or 16 map directions to specific characters, so that north is N, northeast is maybe O and northwest M... that sort of thing.

sed -r (e.g echo "aa" | sed -r 's/o*/_/g')

I tried the command below on terminal
echo "aa" | sed -r 's/o*/_/g'
The result was like this
_a_a_
What I expected was
__
When the first a is read, it means that there's zero o so that a would be replaced by _
If I use
echo "aoooa" | sed -r 's/o+/_/g'
The result is as what I expected
a_a
But when I use
echo "aoooa" | sed -r 's/o*/_/g'
The result is
_a_a_
I think it should be _a_a and without the last _
I totally had no idea why there were 3 underscores. Could anyone tell me the running process of this specific case?
[From my comment:] o* does not match the "a", it matches the zero-length string before the "a" (and the zero-length string between the "a"s, and the zero-length string after the second "a"). The "a" marks the end of the match, but it is not part of the match, and hence does not get replaced.
To make this clearer, consider what happens when there are runs of "o"s (and I used capital "A"s to make them more visible):
$ echo "ooooAooooAoooo" | sed 's/o*/_/g'
_A_A_
...each group of "o"s gets replaced by a single "_"; the "A"s end the first two groups, but they aren't part of the groups (they're between the groups), so they get left alone.
If there's just one "o" in a group, the same thing happens:
$ echo "oAoAo" | sed 's/o*/_/g'
_A_A_
...and finally, with no "o"s (i.e. with zero-length groups of "o"s):
$ echo "AA" | sed 's/o*/_/g'
_A_A_
...again, the "A"s aren't part of the matches, so they don't get replaced.
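The cases above can be checked side by side. The last two commands are an addition for comparison: they require at least one "o" (oo* in basic regex syntax, the same as o+ with -r or -E), which removes the zero-length matches.

```shell
# o* can match the empty string, so it also matches before, between and after the a's.
printf '%s\n' 'aa'    | sed 's/o*/_/g'
printf '%s\n' 'aoooa' | sed 's/o*/_/g'

# Requiring at least one "o" leaves text without any "o" untouched.
printf '%s\n' 'aa'    | sed 's/oo*/_/g'
printf '%s\n' 'aoooa' | sed 's/oo*/_/g'
```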
If the desired output keeps every o and replaces every character that is not o with an underscore, you could use:
echo "abcdefhijklmnopqrstuvwxyzoo" | sed -r 's/[^o]/_/g'
Output: _____________o___________oo

Print word between two characters by going backward in the line

I am having problems extracting a word from a line. What I want is to pick the first word before the # symbol but after the /, which is the only delimiter that stands out.
A line looks like this:
,["https://picasaweb.google.com/111560558537332305125/Programming#5743548966953176786",1,["https://lh6.googleusercontent.com/-Is8rb8G1sb8/T7UvWtVOTtI/AAAAAAAAG68/Cht3FzfHXNc/s0-d/Geek.jpg",1920,1200]
I want the word Programming.
To get that line I am using this, which narrows it down:
sed -n '/.*picasa.*.jpg/p' 5743548866439293105
So I want it to find the # and then go backward until it hits the first /, then print what lies between. In this case the word should be Programming, but it could be anything.
I want it to be as short as possible and have experimented with
sed -n '/.*picasa.*.jpg/p' 5743548866439293105 | awk '$0=$2' FS="/" RS="[$#]"
You can do that with sed (slightly shortened for formatting but works on your original string as well):
pax> echo ',["https://p.g.com/111/Prog#574' | sed 's/^[^#]*\/\([^#]*\)#.*$/\1/'
Prog
pax>
Explaining in more detail:
   /------+---------------> greedy capture up to '/'.
   |      |
   |      |/-------+-------> capture the stuff between '/' and '#'.
   |      ||       |
   |      ||       |/--+-----> everything from '#' to end of line.
   |      ||       ||  |
's/^[^#]*\/\([^#]*\)#.*$/\1/'
                         ||
                         \+---> replace with captured group.
It basically searches for an entire line that has the pattern you want (first # following a /), whilst capturing (with the \( and \) brackets) just the stuff between / and #.
The substitution then replaces the entire line with just that captured text you're interested in (via \1).
Using grep with some Perl regex extensions:
echo $string | grep -P -o "(?<=/)[^/]+(?=#)"
-P tells grep to use Perl extensions. -o tells grep to display only the matched text. To understand what gets matched, break the regex into three parts: (?<=/), [^/]+, and (?=#). The first part says that the matched text must follow a '/', without including the '/' in the match. The second part matches a string of non-'/' characters. The last part says that the matched text must be immediately followed by a '#', without including the '#' in the match.
Another grep, using the "\K" feature to "throw away" the match up to the last '/' before the '#':
# Match as much as possible up to a '/', but throw it away, then match as much as you can
# up to the first #
echo $string | grep -oP ".*/\K.+(?=#)"
Using cut and awk to get the first field (splitting on #) followed by the last field (splitting on /):
echo $string | cut -d# -f1 | awk -F/ '{print $NF}'
Using some temporary variables and bash's parameter expansion facilities:
$ FOO=["https://picasaweb.google.com/111560558537332305125/Programming#5743548966953176786",1,["https://lh6.googleusercontent.com/-Is8rb8G1sb8/T7UvWtVOTtI/AAAAAAAAG68/Cht3FzfHXNc/s0-d/Geek.jpg",1920,1200]
$ BAR=${FOO%#*} # Strip the last # and everything after
$ echo $BAR
[https://picasaweb.google.com/111560558537332305125/Programming
$ BAZ=${BAR##*/} # Strip everything up to and including the last /
$ echo $BAZ
Programming
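The two expansions can be combined with the line filter from the question so that no sed or awk is needed at all. This is a sketch: the function names are made up, and it cuts at the first # on the line rather than the last, which gives the same result for lines shaped like the sample.

```shell
# word_before_hash (hypothetical name): print the text between the last "/" and the first "#".
word_before_hash() {
    before=${1%%#*}                 # drop the first "#" and everything after it
    printf '%s\n' "${before##*/}"   # drop everything up to and including the last "/"
}

# extract_words (hypothetical name): read lines on stdin, keep only those that
# mention picasa, contain a "#" and a .jpg, and print the word for each.
extract_words() {
    while IFS= read -r line; do
        case $line in
            *picasa*#*.jpg*) word_before_hash "$line" ;;
        esac
    done
}

printf '%s\n' ',["https://p.g.com/picasa/111/Prog#574",1,["https://x/y/Geek.jpg",1920,1200]' | extract_words
```

With the file from the question, the call would be extract_words < 5743548866439293105.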
This might work for you:
sed '/.*\/\([^#]*\)#.*/{s//\1/;q};d' file

How can I cut(1) camelcase words?

Is there an easy way in Bash to split a camelcased word into its constituent words?
For example, I want to split aCertainCamelCasedWord into 'a Certain Camel Cased Word' and be able to select those fields that interest me. This is trivially done with cut(1) when the word separator is the underscore, but how can I do this when the word is camelcased?
sed 's/\([A-Z]\)/ \1/g'
Captures each capital letter and substitutes a leading space with the capture for the whole stream.
$ echo "aCertainCamelCasedWord" | sed 's/\([A-Z]\)/ \1/g'
a Certain Camel Cased Word
This solution works if you need to not split up words that are all caps. For example, using the top answer you'll get:
$ echo 'FAQPage' | sed 's/\([A-Z]\)/ \1/g'
F A Q Page
But instead with my solution, you'll get:
$ echo 'FAQPage' | sed 's/\([A-Z][^A-Z]\)/ \1/g'
FAQ Page
Note: This does not work correctly when there is a second instance of multiple uppercase words, for example:
$ echo 'FAQPageOneReplacedByFAQPageTwo' | sed 's|\([A-Z][^A-Z]\)| \1|g'
FAQ Page One Replaced ByFAQ Page Two
The answer above does not work correctly when there is a second instance of multiple uppercase letters:
echo 'FAQPageOneReplacedByFAQPageTwo' | sed 's|\([A-Z][^A-Z]\)| \1|g'
FAQ Page One Replaced ByFAQ Page Two
So an additional expression is required for that:
echo 'FAQPageOneReplacedByFAQPageTwo' | sed -e 's|\([A-Z][^A-Z]\)| \1|g' -e 's|\([a-z]\)\([A-Z]\)|\1 \2|g'
FAQ Page One Replaced By FAQ Page Two
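Since the question was about selecting fields afterwards, here is a sketch that wraps the two expressions in a function and feeds the result to cut. The name split_camel is made up; LC_ALL=C is added so that [A-Z] and [a-z] mean plain ASCII ranges, and the third expression (an addition here) drops the leading space produced when the word starts with a capital.

```shell
# split_camel (hypothetical name): insert spaces at camel-case word boundaries.
split_camel() {
    printf '%s\n' "$1" |
        LC_ALL=C sed -e 's/\([A-Z][^A-Z]\)/ \1/g' \
                     -e 's/\([a-z]\)\([A-Z]\)/\1 \2/g' \
                     -e 's/^ //'
}

split_camel aCertainCamelCasedWord
split_camel FAQPageOneReplacedByFAQPageTwo
# Select the second and fourth words, as one would with underscore-separated names:
split_camel aCertainCamelCasedWord | cut -d' ' -f2,4
```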
Pure Bash:
name="aCertainCamelCasedWord"
declare -a word # the word array
counter1=0 # count characters
counter2=0 # count words
while [ $counter1 -lt ${#name} ] ; do
nextchar=${name:${counter1}:1}
if [[ $nextchar =~ [[:upper:]] ]] ; then
((counter2++))
word[${counter2}]=$nextchar
else
word[${counter2}]=${word[${counter2}]}$nextchar
fi
((counter1++))
done
echo -e "'${word[@]}'"
