Find 1st Letter of every word in a string - bash

How would I find the first letter of each word contained within a string using bash?
For example
Code:
str="my-custom-string"
I would want to find m,c,s. I know how to find the very first letter, but this is slightly more complicated.
Many thanks,

$ echo 'my-custom-string' | grep -oE '\b\w'
m
c
s

Pure Bash using parameter substitution. Remove minus, select first character of each word:
str="my-custom-string"
for word in ${str//-/ }; do
echo "${word:0:1}"
done
Result
m
c
s

Here's a sed version:
echo 'my-custom-string' | sed 's/\(^\|-\)\(.\)[^-]*/\2\n/g'

This might work for you (GNU sed):
echo 'my-custom-string' | sed 's/\B.//g;y/-/,/'
m,c,s
or:
echo 'my-custom-string' | sed 's/\B.//g;y/-/\n/'
m
c
s
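The pure Bash loop above relies on unquoted word splitting, which also expands wildcard characters in the string. A minimal sketch that splits with read instead is shown here; the function name first_letters and its optional separator argument are hypothetical, not from the answers.

```shell
# first_letters: hypothetical helper, not part of the original answers.
# Prints the first character of each separator-delimited field of $1,
# joined by commas. The separator defaults to "-".
first_letters() {
  local str=$1 sep=${2:--} word out=
  local -a words=()
  # Split on the separator only; no glob expansion happens here.
  IFS=$sep read -ra words <<<"$str"
  for word in "${words[@]}"; do
    out+=${out:+,}${word:0:1}
  done
  printf '%s\n' "$out"
}

first_letters 'my-custom-string'   # prints m,c,s
```

Calling it with a second argument, for example first_letters 'a_b_c' _, splits on that character instead.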

Related

Convert first character to capital along with special character separator

Using bash, I would like to capitalize the first character of the string and every character that comes right after a dash (-).
I can split the string into individual elements on -, run each one through
echo "string" | tr '[:lower:]' '[:upper:]'
and join them all again, but that doesn't seem effective. Is there an easy way to do this in a single line?
Input string:
JASON-CONRAD-983636
Expected string:
Jason-Conrad-983636
I recommend using Python for this:
python3 -c 'import sys; print("-".join(s.capitalize() for s in sys.stdin.read().split("-")))'
Usage:
capitalize() {
python3 -c 'import sys; print("-".join(s.capitalize() for s in sys.stdin.read().split("-")))'
}
echo JASON-CONRAD-983636 | capitalize
Output:
Jason-Conrad-983636
In pure bash (v4+) without any third party utils
str=JASON-CONRAD-983636
IFS=- read -ra raw <<<"$str"
final=()
for str in "${raw[@]}"; do
first=${str:0:1}
rest=${str:1}
final+=( "${first^^}${rest,,}" )
done
and print the result
( IFS=- ; printf '%s\n' "${final[*]}" ; )
This might work for you (GNU sed):
sed 's/.*/\L&/;s/\b./\u&/g' file
Lowercase everything. Uppercase first characters of words.
Alternative:
sed -E 's/\b(.)((\B.)*)/\u\1\L\2/g' file
Could you please try the following (in case you are OK with awk):
var="JASON-CONRAD-983636"
echo "$var" | awk -F'-' '{for(i=1;i<=NF;i++){$i=substr($i,1,1) tolower(substr($i,2))}} 1' OFS="-"
Although the party is mostly over, please let me join with a perl solution:
perl -pe 's/(^|-)([^-]+)/$1 . ucfirst lc $2/ge' <<<"JASON-CONRAD-983636"
It may be cunning to use the ucfirst function :)
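For reuse, the pure bash approach above could be wrapped in a function. This is only a sketch under the same bash 4+ assumption (for ^^ and ,,); the name capitalize_fields and the optional separator argument are hypothetical.

```shell
# capitalize_fields: hypothetical helper, not part of the original answers.
# Uppercases the first character of each separator-delimited field of $1
# and lowercases the rest. Requires bash 4+ for ${var^^} and ${var,,}.
capitalize_fields() {
  local str=$1 sep=${2:--} part first rest
  local -a parts=() out=()
  IFS=$sep read -ra parts <<<"$str"
  for part in "${parts[@]}"; do
    first=${part:0:1}
    rest=${part:1}
    out+=("${first^^}${rest,,}")
  done
  # "${out[*]}" joins with the first character of IFS; the local IFS
  # is restored automatically when the function returns.
  local IFS=$sep
  printf '%s\n' "${out[*]}"
}

capitalize_fields 'JASON-CONRAD-983636'   # prints Jason-Conrad-983636
```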

Bash matching part of string

Say I have a string like
s1="sxfn://xfn.oxbr.ac.uk:8843/xfn/mech2?XFN=/castor/xf.oxbr.ac.uk/prod/oxbr.ac.uk/disk/xf20.m.ac.uk/prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst"
or
s2="sxfn://xfn.gla.ac.uk:8841/xfn/mech2?XFN=/castor/xf.gla.ac.uk/space/disk1/prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst"
and I want in my script to extract the last part starting from prod/ i.e. "prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst". Note that $s1 contains two occurrences of "prod/".
What is the most elegant way to do this in bash?
Using BASH string manipulations you can do:
echo "prod/${s1##*prod/}"
prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst
echo "prod/${s2##*prod/}"
prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst
With awk (which is a little overpowered for this, but it may be helpful if you have a file full of these strings you need to parse):
echo "sxfn://xfn.gla.ac.uk:8841/xfn/mech2?XFN=/castor/xf.gla.ac.uk/space/disk1/prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst" | awk -F"\/prod" '{print "/prod"$NF}'
That's splitting the string by '/prod' then printing out the '/prod' delimiter and the last token in the string ($NF)
sed can do it nicely:
s1="sxfn://xfn.oxbr.ac.uk:8843/xfn/mech2?XFN=/castor/xf.oxbr.ac.uk/prod/oxbr.ac.uk/disk/xf20.m.ac.uk/prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst"
echo "$s1" | sed 's/.*\/prod/\/prod/'
This relies on the greedy matching of the .* part up front, which consumes everything up to the last /prod.
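One caveat with the ${s1##*prod/} expansion: if the string contains no "prod/" at all, nothing is removed and the whole string comes back with "prod/" glued in front. A small sketch that checks for the marker first could look like this; the function name tail_from is hypothetical.

```shell
# tail_from: hypothetical helper, not part of the original answers.
# Prints the part of $1 that starts at the LAST occurrence of marker $2.
# Returns status 1 and prints nothing if the marker does not occur.
tail_from() {
  local s=$1 marker=$2
  case $s in
    *"$marker"*)
      # Quoting $marker inside the pattern makes it match literally.
      printf '%s\n' "${marker}${s##*"$marker"}"
      ;;
    *)
      return 1
      ;;
  esac
}

s1="sxfn://xfn.oxbr.ac.uk:8843/xfn/mech2?XFN=/castor/xf.oxbr.ac.uk/prod/oxbr.ac.uk/disk/xf20.m.ac.uk/prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst"
tail_from "$s1" 'prod/'   # prints prod/v1.8/pienug_ib-2/reco_c21_dr3809_r35057.dst
```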

Bash: replace part of a string if it has 4 occurrences of a character

I have a string that is sometimes
xxx.11_222_33_44_555.yyy
and sometimes
xxx.11_222_33_44.yyy
I would like to:
Check if it has 4 occurrences of _ (figured out how to do it).
If so, remove the string's _33 (the 33 part changes, it can be any number), so I am left with xxx.11_222_44_555.yyy.
Using sed :
sed 's/\(_[0-9]*\)_[0-9]*\(_[0-9]*_[0-9]*\)/\1\2/'
It matches the four underscores and replaces the whole match with the needed parts.
Test run :
$ echo "xxx.11_222_33_44_555.yyy" | sed 's/\(_[0-9]*\)_[0-9]*\(_[0-9]*_[0-9]*\)/\1\2/'
xxx.11_222_44_555.yyy
$ echo "xxx.11_222_33_44.yyy" | sed 's/\(_[0-9]*\)_[0-9]*\(_[0-9]*_[0-9]*\)/\1\2/'
xxx.11_222_33_44.yyy
perhaps something like this
echo "xxx.11_222_33_44.yyy" | sed -e's/\.\([0-9]\+\)_\([0-9]\+\)_\([0-9]\+\)_\([0-9]\+\)\./.\1_\2_\4./'
which checks if there are 4 groups of numbers separated by _ between the two dots and if yes, it leaves out the third group
try this;
echo "xxx.11_222_33_44_555.yyy" | awk -F'_' 'NF>4{print $1"_"$2"_"$4"_"$5};'
Solution using perl and Lookahead and Lookbehind
$ a="xxx.11_222_33_44_555.yyy"
$ perl -pe 's/\.\d+_\d+_\K\d+_(?=\d+_\d+\.)//' <<< "$a"
xxx.11_222_44_555.yyy
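Since the question asks for Bash, the same check-and-remove can also be sketched without external tools, using the [[ =~ ]] regex operator. The function name drop_third is hypothetical, and the sketch assumes the fields between the two dots are purely numeric, as in the examples.

```shell
# drop_third: hypothetical helper, not part of the original answers.
# If $1 looks like prefix.N_N_N_N_N.suffix (five numbers, four underscores),
# print it with the third number removed; otherwise print it unchanged.
drop_third() {
  local s=$1
  # Keep the regex in a variable so bash does not need extra quoting.
  local re='^(.*\.[0-9]+_[0-9]+)_[0-9]+(_[0-9]+_[0-9]+\..*)$'
  if [[ $s =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
  else
    printf '%s\n' "$s"
  fi
}

drop_third 'xxx.11_222_33_44_555.yyy'   # prints xxx.11_222_44_555.yyy
drop_third 'xxx.11_222_33_44.yyy'       # prints xxx.11_222_33_44.yyy
```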

How to chop the last n bytes of a string in bash?

For example, given qa_sharutils-2009-04-22-15-20-39, I want to chop the last 20 bytes and get 'qa_sharutils'.
I know how to do it in sed, but why does $A=${A/.\{20\}$/} not work?
Thanks!
If your string is stored in a variable called $str, then this will give you the substring without the last 20 characters in bash:
${str:0:${#str} - 20}
basically, string slicing can be done using
${[variableName]:[startIndex]:[length]}
and the length of a string is
${#[variableName]}
EDIT:
solution using sed that works on files:
sed 's/.\{20\}$//' < inputFile
similar to substr('abcdefg', 2-1, 3) in php:
echo 'abcdefg'|tail -c +2|head -c 3
using awk:
echo $str | awk '{print substr($0,1,length($0)-20)}'
or using strings manipulation - echo ${string:position:length}:
echo ${str:0:$((${#str}-20))}
In the ${parameter/pattern/string} syntax in bash, pattern is a path wildcard-style (glob) pattern, not a regular expression. In wildcard syntax a dot . is just a literal dot, the backslash-escaped braces \{ and \} are literal braces, and there is no {20} repetition count, so that line only tries to erase the literal text ".{20}$". Note also that an assignment is written A=..., without a leading $ on the left-hand side.
There are several ways to accomplish the basic task.
$ str="qa_sharutils-2009-04-22-15-20-39"
If you want to strip the last 20 characters, use substring selection, which is zero based:
$ echo ${str::${#str}-20}
qa_sharutils
The "%" and "%%" operators strip a matching pattern from the right-hand side of the string. For instance, if you want the basename, minus anything that follows the first "-":
$ echo ${str%%-*}
qa_sharutils
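Because % takes a wildcard pattern, "any 20 characters" can be written as 20 question marks rather than a regex repetition. A minimal sketch that builds such a pattern for any count follows; the function name chop_last is hypothetical.

```shell
# chop_last: hypothetical helper, not part of the original answers.
# Removes the last n characters of a string with ${var%pattern}, where the
# pattern is n "?" wildcards. Strings shorter than n are printed unchanged.
chop_last() {
  local s=$1 n=$2 pat
  printf -v pat '%*s' "$n" ''   # pat = n spaces
  pat=${pat// /?}               # turn every space into the ? wildcard
  # $pat is left unquoted inside the expansion so it acts as a pattern.
  printf '%s\n' "${s%$pat}"
}

chop_last 'qa_sharutils-2009-04-22-15-20-39' 20   # prints qa_sharutils
```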
This works only if the last 20 bytes are always a date:
$ str="qa_sharutils-2009-04-22-15-20-39"
$ IFS="-"
$ set -- $str
$ echo $1
qa_sharutils
$ unset IFS
Or, when the first dash and everything after it are not needed:
$ echo ${str%%-*}
qa_sharutils

How can I cut(1) camelcase words?

Is there an easy way in Bash to split a camelcased word into its constituent words?
For example, I want to split aCertainCamelCasedWord into 'a Certain Camel Cased Word' and be able to select those fields that interest me. This is trivially done with cut(1) when the word separator is the underscore, but how can I do this when the word is camelcased?
sed 's/\([A-Z]\)/ \1/g'
Captures each capital letter and substitutes a leading space with the capture for the whole stream.
$ echo "aCertainCamelCasedWord" | sed 's/\([A-Z]\)/ \1/g'
a Certain Camel Cased Word
This solution works if you need to not split up words that are all caps. For example, using the top answer you'll get:
$ echo 'FAQPage' | sed 's/\([A-Z]\)/ \1/g'
F A Q Page
But instead with my solution, you'll get:
$ echo 'FAQPage' | sed 's/\([A-Z][^A-Z]\)/ \1/g'
FAQ Page
Note: This does not work correctly when there is a second instance of multiple uppercase words, for example:
$ echo 'FAQPageOneReplacedByFAQPageTwo' | sed 's|\([A-Z][^A-Z]\)| \1|g'
FAQ Page One Replaced ByFAQ Page Two
The previous answer does not work correctly when there is a second run of multiple uppercase letters:
echo 'FAQPageOneReplacedByFAQPageTwo' | sed 's|\([A-Z][^A-Z]\)| \1|g'
FAQ Page One Replaced ByFAQ Page Two
So an additional expression is required for that:
echo 'FAQPageOneReplacedByFAQPageTwo' | sed -e 's|\([A-Z][^A-Z]\)| \1|g' -e 's|\([a-z]\)\([A-Z]\)|\1 \2|g'
FAQ Page One Replaced By FAQ Page Two
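To get back to the original goal of selecting fields with cut(1), the two sed expressions above can be piped into cut with a space delimiter. This is a sketch only; the function name camel_field is hypothetical, and LC_ALL=C is an added assumption so that [A-Z] means ASCII uppercase in every locale.

```shell
# camel_field: hypothetical helper, not part of the original answers.
# Splits camel-cased word $1 into space-separated words, then selects the
# fields listed in $2 (cut -f syntax, e.g. 2 or 2,4 or 1-3).
camel_field() {
  local word=$1 fields=$2
  printf '%s\n' "$word" |
    LC_ALL=C sed -e 's|\([A-Z][^A-Z]\)| \1|g' \
                 -e 's|\([a-z]\)\([A-Z]\)|\1 \2|g' |
    cut -d' ' -f"$fields"
}

camel_field aCertainCamelCasedWord 2,4   # prints Certain Cased
```

Note that a word starting with a single capital followed by lowercase (such as CamelCase) gets a leading space from the first expression, so its field 1 is empty and the words start at field 2.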
Pure Bash:
name="aCertainCamelCasedWord"
declare -a word # the word array
counter1=0 # count characters
counter2=0 # count words
while [ $counter1 -lt ${#name} ] ; do
nextchar=${name:${counter1}:1}
if [[ $nextchar =~ [[:upper:]] ]] ; then
((counter2++))
word[${counter2}]=$nextchar
else
word[${counter2}]=${word[${counter2}]}$nextchar
fi
((counter1++))
done
echo -e "'${word[@]}'"

Resources