I'm trying to select just the x in lines like these:
舌ぽう (舌鋒x) ぜっぽう (sharp) tongue
じょう舌 (饒x舌) じょうぜつ garrulity, loquacity
It's always in parentheses. So I want to look behind for a left parenthesis followed by zero or more characters, and look ahead for zero or more characters followed by a right parenthesis.
I thought this would work, but it doesn't: (?<=\(.?)x(?=.?\))
This one will select all the text between the parentheses, but I only want the x: (?<=\().?x.?(?=\))
I also tried this (not sure if you can have two lookbehinds), but it didn't work:
(?<=\()(?<=.?)x(?=.?)(?=\))
I'm out of ideas.
grep -P uses PCRE, which doesn't support variable-length lookbehind (lookaheads may be any length). You could do something like:
# grep for the parenthesized groups that contain an x
$ grep -Po '\([^()]*x[^()]*\)' file
(舌鋒x)
(饒x舌)
# pipe to grep again for just the x
$ grep -Po '\([^()]*x[^()]*\)' file | grep -o x
x
x
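If PCRE isn't available at all, another option is to skip lookarounds entirely and walk each line while tracking parenthesis depth. This is a minimal sketch, assuming the parentheses in each line are balanced; the function name x_inside_parens is hypothetical. It should also work on UTF-8 input in a byte-oriented awk, because the bytes for (, ) and x never occur inside a multibyte character.

```shell
# x_inside_parens (hypothetical helper): print every "x" that sits between
# "(" and ")" by tracking nesting depth one character at a time.
# Assumes the parentheses in each line are balanced.
x_inside_parens() {
  awk '{
    depth = 0
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "(") depth++
      else if (c == ")" && depth > 0) depth--
      else if (c == "x" && depth > 0) print c
    }
  }'
}

# usage: x_inside_parens < file
```

Unlike the lookahead-only patterns, this also copes with an x outside the parentheses that is followed later by another (...) group.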
From the OP's comment, I assume the parentheses are always paired.
The problem here is that a lookbehind needs a fixed length, which we can't know in this case.
But if the parentheses are always paired, we can just check for the closing ). This should give what you want:
grep -Po "x(?=[^)]*\))" file
Let's test it a bit:
kent$ echo "舌ぽう (舌鋒x) ぜっぽう (sharp) tongue
じょう舌 (饒x舌) じょうぜつ garrulity, loquacity"|grep -Po "x(?=[^)]*\))"
x
x
Another test: I added a y inside the parentheses and assume we want that y too:
kent$ echo "舌ぽう (舌y鋒x) ぜっぽう (sharp) tongue
じょう舌 (y饒x舌) じょうぜつ garrulity, loquacity"|grep -Po "[yx](?=[^)]*\))"
y
x
y
x
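EDIT: the lookahead above also matches an x outside the parentheses when a later (...) group follows on the same line, because [^)]* can run across an opening (. Excluding ( as well fixes that:
grep -Po "x(?=[^)(]*\))" file
This should be OK; the x's outside the parentheses are no longer matched: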
kent$ echo "じょうx舌 (饒x舌) じょうぜつ garrxlity, loquacity"|grep -Po "x(?=[^)(]*\))"
x
You can handle it with multiple patterns, one for each specific case:
(?<=\(.)x(?=.\))
(?<=\(.{2})x(?=.{1}\))
(?<=\(.{2})x(?=.{2}\))
(?<=\(.{1})x(?=.{2}\))
(?<=\(.{3})x(?=.{1}\))
etc
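PCRE does accept a single lookbehind whose top-level alternatives each have a different fixed length, so the cases above can be folded into one pattern. This is a sketch assuming GNU grep built with PCRE support and at most three characters between ( and the x (add branches for longer prefixes); x_after_short_prefix is a hypothetical name. [^()] keeps each branch from spanning another parenthesis, and in a non-UTF-8 locale each byte counts as one character.

```shell
# x_after_short_prefix (hypothetical helper): print each "x" preceded by "("
# plus 0-3 non-parenthesis characters, and followed by ")" without crossing "(".
# Each lookbehind branch has its own fixed length, which PCRE allows.
x_after_short_prefix() {
  grep -oP '(?<=\(|\([^()]|\([^()]{2}|\([^()]{3})x(?=[^()]*\))'
}

# usage: x_after_short_prefix < file
```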
"$" should not be immediately followed by digits [0-9]. It should only show the
output where "$" is immediately followed by an alphabetic or alphanumeric character.
Input: dirname $0/../bin/$12JAVA_INV/$FILE12NAME
Output: $FILE12NAME
grep -o '[$][a-zA-z_]*'
Using this, I get the output: $ $ $FILENAME
You're getting $ in the result because * means to match zero or more of the preceding pattern. $0 matches because it has a $ followed by 0 letters.
If you want at least 1 letter, use + instead, it means one or more.
But if you want to be able to match $FILE12NAME, you also need to allow digits after the first character. So use:
grep -i -o '\$[a-z_][a-z_0-9]*'
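This matches $, followed by a letter or underscore, followed by zero or more letters, underscores, or digits. Note also that the A-z range in the question is not letters only: in the C locale it also covers [ \ ] ^ _ and `, so use A-Z.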
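A variant of the same pattern using POSIX character classes, which sidesteps the A-z pitfall and the need for -i. This is a sketch; the function name dollar_names is hypothetical, and in a UTF-8 locale [[:alpha:]] may also accept non-ASCII letters.

```shell
# dollar_names (hypothetical helper): print each "$" that is followed by a
# letter or underscore, plus any letters, digits or underscores after it.
dollar_names() {
  grep -Eo '[$][[:alpha:]_][[:alnum:]_]*'
}

# usage: echo 'dirname $0/../bin/$12JAVA_INV/$FILE12NAME' | dollar_names
```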
It looks like you want:
$ echo 'dirname $0/../bin/$12JAVA_INV/$FILE12NAME' | awk '{print $NF}' FS=/
$FILE12NAME
But if you really want to parse it the way you describe, you could do either of:
$ echo 'dirname $0/../bin/$12JAVA_INV/$FILE12NAME' | sed -e 's/.*\(\$[^0-9]\)/\1/'
$FILE12NAME
$ echo 'dirname $0/../bin/$12JAVA_INV/$FILE12NAME' | sed -E 's/.*(\$[^0-9])/\1/'
$FILE12NAME
How can I reverse four-letter words with sed?
For example:
the year was 1815.
Reverse to:
the raey was 5181.
This is my attempt:
cat filename | sed 's/\([a-z]*\) *\([a-z]*\)/\2, \1/'
But it does not work as I intended.
I'm not sure it's possible to handle all cases with GNU sed. If _ doesn't occur immediately before or after the four-letter words, you can use:
sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
\b is a word boundary, where a word character is any letter, digit or underscore. So \b ensures that only whole words are matched, not parts of words.
$ echo 'the year was 1815.' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
the raey was 5181.
$ echo 'two time five three six good' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
two emit evif three six doog
$ # but won't work if there are underscores around the words
$ echo '_good food' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
_good doof
A tool with lookaround support works for all cases:
$ echo '_good food' | perl -pe 's/(?<![a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])(?![a-z0-9])/$4$3$2$1/gi'
_doog doof
(?<![a-z0-9]) and (?![a-z0-9]) are a negative lookbehind and a negative lookahead, respectively.
This can be shortened to:
perl -pe 's/(?<![a-z0-9])[a-z0-9]{4}(?![a-z0-9])/reverse $&/gie'
which uses the e modifier to evaluate Perl code in the replacement section. This form makes it easy to change the length of the words to be reversed.
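Possibly the shortest sed solution; it also works when a four-character word contains underscores. Note that . matches any character, including a space, so a span such as "a bc" between word boundaries is reversed as well: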
sed -r 's/\<(.)(.)(.)(.)\>/\4\3\2\1/g'
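Restricting each captured character to a word character keeps the match inside a single word. A sketch assuming GNU sed (for -E, \< and \>); the function name rev4_words is hypothetical. Because _ counts as a word character here, a word like a_bc is reversed, while _good is treated as a five-character word and left alone.

```shell
# rev4_words (hypothetical helper): reverse every word of exactly four
# word characters (letters, digits, underscore); GNU sed assumed.
rev4_words() {
  sed -E 's/\<([[:alnum:]_])([[:alnum:]_])([[:alnum:]_])([[:alnum:]_])\>/\4\3\2\1/g'
}

# usage: echo 'the year was 1815.' | rev4_words
```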
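The following awk script may help. I tested it only with GNU awk and the sample input provided.
echo "the year was 1815." |
awk '
# reverse(val): return val reversed, keeping a trailing "." at the end
function reverse(val,    array, num, i, var) {
  num = split(val, array, "")    # an empty separator gives one element per character (gawk)
  i = (array[num] == ".") ? num - 1 : num
  var = ""
  for (; i > 0; i--)
    var = var array[i]
  return (array[num] == ".") ? var "." : var
}
{
  # only field 2 and the last field are reversed, which suits the sample input only
  for (j = 1; j <= NF; j++)
    printf("%s%s", (j == NF || j == 2) ? reverse($j) : $j, (j == NF) ? RS : FS)
}'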
This might work for you (GNU sed):
sed -r '/\<\w{4}\>/!b;s//\n&\n/g;s/^[^\n]/\n&/;:a;/\n\n/!s/(.*\n)([^\n])(.*\n)/\2\1\3/;ta;s/^([^\n]*)(.*)\n\n/\2\1/;ta;s/\n//' file
If there are no strings of the length required to reverse, bail out.
Prepend and append newlines to all required strings.
Insert a newline at the start of the pattern space (PS). The PS is divided into two parts, the first line will contain the current word being reversed. The remainder will contain the original line.
Each character of the word to be reversed is inserted at the front of the first line and removed from the original line. When all the characters in the word have been processed, the original word will have gone and only the bordering newlines will exist. These double newlines are then replaced by the word in the first line and the process is repeated until all words have been processed. Finally the newline introduced to separate the working line and the original is removed and the PS is printed.
N.B. This method can reverse strings of other lengths too: just change the first regexp. Strings within a range of lengths can also be reversed, e.g. /\<\w{2,4}\>/ reverses all words of 2 to 4 characters.
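It's a recurring problem, so there is a standard utility for it: rev (from util-linux on Linux), which reverses each line character by character. Apply it only to the four-letter words:
echo "the $(echo year | rev) was $(echo 1815 | rev)."
OR, to reverse every word in place (this also reverses the three-letter words, moves the period to the front of .5181, and leaves a trailing space with no final newline):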
echo "the year was 1815." | rev | tr ' ' '\n' | tac | tr '\n' ' '
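If rev isn't installed (on some minimal Debian-based images it lives in a separate package), the same idea can be done in pure Bash, reversing only the four-character words and keeping a trailing period in place. This is a sketch; the helper names reverse_str and rev4_line are hypothetical.

```shell
# reverse_str (hypothetical helper): print the characters of $1 in reverse order
reverse_str() {
  local s=$1 r= i
  for (( i = ${#s} - 1; i >= 0; i-- )); do
    r+=${s:i:1}
  done
  printf '%s\n' "$r"
}

# rev4_line (hypothetical helper): reverse only the 4-character words of $1,
# keeping one trailing "." after the reversed word
rev4_line() {
  local -a words out
  local w core tail
  read -ra words <<< "$1"          # split on whitespace, no globbing
  for w in "${words[@]}"; do
    core=${w%.}                    # word without a trailing period
    tail=${w:${#core}}             # "." or empty
    if (( ${#core} == 4 )); then
      core=$(reverse_str "$core")
    fi
    out+=("$core$tail")
  done
  printf '%s\n' "${out[*]}"        # join with single spaces
}

# usage: rev4_line 'the year was 1815.'
```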
I have a string that is sometimes
xxx.11_222_33_44_555.yyy
and sometimes
xxx.11_222_33_44.yyy
I would like to:
Check whether it has 4 occurrences of _ (I figured out how to do that).
If so, remove the string's _33 (the 33 part changes and can be any number), so I am left with xxx.11_222_44.yyy.
Using sed:
sed 's/\(_[0-9]*\)_[0-9]*\(_[0-9]*_[0-9]*\)/\1\2/'
It matches the four underscores and replaces the whole match with the parts to keep.
Test run:
$ echo "xxx.11_222_33_44_555.yyy" | sed 's/\(_[0-9]*\)_[0-9]*\(_[0-9]*_[0-9]*\)/\1\2/'
xxx.11_222_44_555.yyy
$ echo "xxx.11_222_33_44.yyy" | sed 's/\(_[0-9]*\)_[0-9]*\(_[0-9]*_[0-9]*\)/\1\2/'
xxx.11_222_33_44.yyy
Perhaps something like this:
echo "xxx.11_222_33_44.yyy" | sed -e's/\.\([0-9]\+\)_\([0-9]\+\)_\([0-9]\+\)_\([0-9]\+\)\./.\1_\2_\4./'
It checks whether there are 4 groups of digits separated by _ between the two dots and, if so, leaves out the third group.
Try this (strings with fewer than four underscores are printed unchanged):
echo "xxx.11_222_33_44_555.yyy" | awk -F'_' 'NF>4{print $1"_"$2"_"$4"_"$5; next} 1'
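The same field-splitting idea also works in plain Bash, using _ as the separator for read and leaving other strings alone. This is a sketch; the function name drop_third_field is hypothetical, and it assumes the pieces contain no newlines.

```shell
# drop_third_field (hypothetical helper): if $1 has exactly four underscores,
# print it without its third _-separated field; otherwise print it unchanged.
drop_third_field() {
  local -a parts
  IFS=_ read -ra parts <<< "$1"
  if (( ${#parts[@]} == 5 )); then          # exactly four underscores
    printf '%s\n' "${parts[0]}_${parts[1]}_${parts[3]}_${parts[4]}"
  else
    printf '%s\n' "$1"
  fi
}

# usage: drop_third_field 'xxx.11_222_33_44_555.yyy'
```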
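Solution using Perl with a lookahead and \K (which excludes everything matched before it from the replaced text, much like a variable-length lookbehind):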
$ a="xxx.11_222_33_44_555.yyy"
$ perl -pe 's/\.\d+_\d+_\K\d+_(?=\d+_\d+\.)//' <<< "$a"
xxx.11_222_44_555.yyy
I want to get the string between <sometag param=' and '>
I tried to use the method from "Get any string between 2 string and assign a variable in bash" to get the "x":
echo "<sometag param='x'><irrelevant stuff='nonsense'>" | tr "'" _ | sed -n 's/.*<sometag param=_\(.*\)_>.*/\1/p'
The problem (apart from the inefficiency, since I can't manage to escape the apostrophe correctly for sed) is that sed matches greedily, i.e. the longest possible string, so the output is:
x_><irrelevant stuff=_nonsense
but the correct output would be the minimum-match, in this example just "x"
Thanks for your help
You are probably looking for something like this:
sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"
Test:
echo "<sometag param='x'><irrelevant stuff='nonsense'>" | sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"
Results:
x
Explanation:
Instead of a greedy .*, use [^']*: sed has no non-greedy quantifier, but this negated character class matches any run of characters other than ', so it cannot run past the closing quote. To anchor the pattern, it is followed by '>.
You can also wrap the sed script in double quotes so that you don't need to escape the single quotes. If you wanted to escape them instead, you'd do this:
... | sed -n 's/.*<sometag param='\''\([^'\'']*\)'\''>.*/\1/p'
Notice that the single quotes aren't really escaped inside the quoted string: the single-quoted sed expression is closed, an escaped single quote (\') is inserted, and the expression is reopened. Think of '\'' as a four-character escape sequence.
Personally, I'd use GNU grep. It would make for a slightly shorter solution. Run like:
... | grep -oP "(?<=<sometag param=').*?(?='>)"
Test:
echo "<sometag param='x'><irrelevant stuff='nonsense'>" | grep -oP "(?<=<sometag param=').*?(?='>)"
Results:
x
You don't need a regex in this case; you can just use ' as the field separator:
in="<sometag param='x'><irrelevant stuff='nonsense'>"
IFS="'" read x whatiwant y <<< "$in" # bash
echo "$whatiwant"
awk -F\' '{print $2}' <<< "$in" # awk
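Bash parameter expansion can do the shortest/longest distinction directly: # removes the shortest matching prefix and %% the longest matching suffix, which together act like a non-greedy match. This is a sketch; the function name get_param is hypothetical and the markers are hard-coded from the question.

```shell
# get_param (hypothetical helper): print the text between <sometag param='
# and the first following '>; return 1 if the opening marker is missing.
get_param() {
  local in=$1 open="<sometag param='" close="'>" rest
  rest=${in#*"$open"}               # drop the shortest prefix ending in the opening marker
  if [[ $rest == "$in" ]]; then     # opening marker not present
    return 1
  fi
  printf '%s\n' "${rest%%"$close"*}" # drop the longest suffix starting at a closing marker
}

# usage: get_param "$in"
```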
How would I find the first letter of each word in a string using bash?
For example
Code:
str="my-custom-string"
I would want to find m,c,s. I know how to find the very first letter, but this is slightly more complicated.
Many thanks,
$ echo 'my-custom-string' | grep -Eo '\b\w'
m
c
s
Pure Bash, using parameter expansion: replace each hyphen with a space, then print the first character of each word:
str="my-custom-string"
for word in ${str//-/ }; do
echo "${word:0:1}"
done
Result
m
c
s
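The loop above relies on unquoted word splitting, so a word containing * or ? could be glob-expanded. A variant that splits on hyphens with read instead; this is a sketch and the function name initials is hypothetical.

```shell
# initials (hypothetical helper): print the first character of each
# hyphen-separated word in $1, one per line.
initials() {
  local -a words
  local w
  IFS=- read -ra words <<< "$1"     # split on hyphens only, without globbing
  for w in "${words[@]}"; do
    if [[ -n $w ]]; then            # skip empty pieces from doubled hyphens
      printf '%s\n' "${w:0:1}"
    fi
  done
}

# usage: initials "$str"
```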
Here's a sed version (GNU sed):
echo 'my-custom-string' | sed 's/\(^\|-\)\(.\)[^-]*/\2\n/g'
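This might work for you (GNU sed). \B. matches every character that does not start a word, so deleting those leaves only the first letters and the hyphens; y/-/,/ then turns the hyphens into commas: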
echo 'my-custom-string' | sed 's/\B.//g;y/-/,/'
m,c,s
or:
echo 'my-custom-string' | sed 's/\B.//g;y/-/\n/'
m
c
s