Regex not matching new line with parenthesis - ruby

I have this text:
#Heurtebise (Il ramasse son sac)
Vous regretterez de m'avoir fait du mal.
(Silence.) Vous me chassez ?
#Eurydice
Le mystère est mon ennemi. Je suis décidée à le combattre.
oui oui.
I want 2 matches with 2 groups each. The result I want is:
Match 1
1. #Heurtebise (Il ramasse son sac)
2. Vous regretterez de m'avoir fait du mal.
(Silence.) Vous me chassez ?
Match 2
1. #Eurydice
2. Le mystère est mon ennemi. Je suis décidée à le combattre.
oui oui.
I can't understand why my regex /^(\#.+)$([^(\#|\#)]+)/ does not match the 4th line, which begins with a parenthesis. This is the result I get:
Match 1
1. #Heurtebise (Il ramasse son sac)
2. Vous regretterez de m'avoir fait du mal.
Match 2
1. #Eurydice
2. Le mystère est mon ennemi. Je suis décidée à le combattre.
oui oui.
Notice how it skips the line (Silence.) Vous me chassez ? in match 1. I can't understand why!
See the full case here: http://rubular.com/r/RR2eDc4ZBQ
Can someone help? Thanks.

You may use
/^(#.+)((?:\R(?![##]).*)*)$/
See the regex demo. It will match any line starting with #, and then will match all consecutive lines that do not start with # or #.
Details
^ - start of a line
(#.+) - Group 1: # and the rest of the line
((?:\R(?![##]).*)*) - Group 2: 0 or more occurrences of:
\R(?![##]) - a line break sequence not followed by # or #
.* - the rest of the line
$ - end of line (not needed though).
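
To check the pattern against the sample text from the question, here is a minimal Ruby sketch. It assumes a single # in the lookahead is enough (the [##] class above lists the same character twice as displayed here), and the names text, pattern and matches are illustrative only. Group 2 begins with the line break consumed by \R, so it is stripped before printing.

```ruby
# Sample text from the question.
text = <<~TEXT
  #Heurtebise (Il ramasse son sac)
  Vous regretterez de m'avoir fait du mal.
  (Silence.) Vous me chassez ?
  #Eurydice
  Le mystère est mon ennemi. Je suis décidée à le combattre.
  oui oui.
TEXT

# Assumption: one # in the negative lookahead is sufficient for this sample.
pattern = /^(#.+)((?:\R(?!#).*)*)$/

# scan returns one [group1, group2] pair per match;
# strip removes the line breaks surrounding group 2.
matches = text.scan(pattern).map { |header, body| [header, body.strip] }

matches.each_with_index do |(header, body), i|
  puts "Match #{i + 1}"
  puts "1. #{header}"
  puts "2. #{body}"
end
```

With this input, the line starting with a parenthesis stays in group 2 of the first match, because only a leading # ends the block.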

The error is in the character class meant to exclude lines starting with # or #:
[^(\#|\#)] excludes # and #, but it also excludes (, | and ). A character class needs neither alternation nor parentheses. Using [^##] makes your sample code work for me.
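
A quick way to see this is to test single characters against both classes. This is only a sketch: broken and fixed are illustrative names, and fixed uses a single # for brevity.

```ruby
# The class from the question versus a corrected one.
broken = /[^(\#|\#)]/   # negated class: also rejects "(", "|" and ")"
fixed  = /[^#]/         # negated class: rejects only "#"

["(", "|", ")", "#", "V"].each do |ch|
  puts "#{ch.inspect}: broken=#{ch.match?(broken)} fixed=#{ch.match?(fixed)}"
end
```

Inside square brackets, ( | ) are ordinary characters, so the original class stops matching as soon as it reaches the opening parenthesis of (Silence.).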

bash script sed or awk. replace string depending on whether it is odd or even

I'm trying to replace the string "::" in a plain text file with <b> or </b>, depending on whether it is the first or the second match, but I get no result with sed. The closing tag </b> may come at the end of a paragraph, not necessarily at the end of a line, e.g.:
::Habiéndose calmado de distracciones, uno permanece completamente,
y la naturaleza superior es vista con los ojos de la sabiduría.::
must become
<b>Habiéndose calmado de distracciones, uno permanece completamente,
y la naturaleza superior es vista con los ojos de la sabiduría.</b>
I tried this without success:
sed "s|::\(.*\)|\\<b>\1\</b>|g" EntrenamientoProgresivoSamadhi
Thank you in advance.

Assumptions:
:: can occur more than once on a given line of input
:: never shows up as data (ie, we need to replace all occurrences of ::)
a solution using awk is acceptable
Adding some more data to our input:
$ cat file
::Habiéndose calmado de distracciones, uno permanece completamente,
y la naturaleza superior es vista con los ojos de la sabiduría.::
some more ::text1:: and then some more ::text2:: the end
One awk idea:
awk '
BEGIN { tag[0]="<b>"; tag[1]="</b>" }
{ while (sub(/::/,tag[c%2])) c++; print }
' file
This generates:
<b>Habiéndose calmado de distracciones, uno permanece completamente,
y la naturaleza superior es vista con los ojos de la sabiduría.</b>
some more <b>text1</b> and then some more <b>text2</b> the end

Using GNU sed:
$ sed -Ez 's~::(.[^:]*)::~<b>\1</b>~' input_file
<b>Habiéndose calmado de distracciones, uno permanece completamente,
y la naturaleza superior es vista con los ojos de la sabiduría.</b>

Use awk and increment a counter variable. Then you can perform a different substitution depending on whether the counter is odd or even.
awk '/::/ && counter++ % 2 == 0 {sub("::", "<b>") }
/::/ {sub("::", "</b>") }
1'
Note that this will only work correctly if the start and end :: are on different lines.

This might work for you (GNU sed):
sed ':a;/::/{x;s/^/x/;/xx/{s///;x;s/::/<\/b>/;ba};x;s/::/<b>/;ba}' file
If a line contains ::, swap to the hold space and increment a counter by inserting an x at the start of the line.
If the counter is 2, i.e. the hold space contains xx, reset the counter, swap back to the pattern space, replace :: by </b> and go again (in case the line contains 2 or more :: strings).
Otherwise, swap back to the pattern space, replace :: by <b> and go again (for the reason explained above).
All other lines are untouched.

sphinx gettext inserts empty quotes "" in front of previously matching msg

Currently, my workflow when I change things in the original file is this:
make gettext to update *.pot files
sphinx-intl update -p build/gettext -l fr to create *.po files out of it
However, this always results in the following behavior: some longer messages in the *.po files are not updated correctly or, to be more precise, they are updated although they didn't change. sphinx-intl update inserts empty quotes "" in front of every paragraph that spans multiple lines. Here's how that looks:
Before (in some French *.po file):
msgid "Some longer paragraph text that spans multiple lines. This text was just"
"lying here and matched a sequence within the file before gettext inserted"
"unnecessary quotes on top."
msgstr "Un texte de paragraphe plus long qui s'étend sur plusieurs lignes. Ce texte se trouvait juste ici et correspondait à une séquence dans le fichier avant que gettext n'insère des guillemets inutiles par-dessus."
After:
msgid ""
"Some longer paragraph text that spans multiple lines. This text was just"
"lying here and matched a sequence within the file before gettext inserted"
"unnecessary quotes on top."
msgstr ""
"Un texte de paragraphe plus long qui s'étend sur plusieurs lignes "
"Ce texte se trouvait juste ici et correspondait à une séquence dans le "
"fichier avant que gettext n'insère des guillemets inutiles par-dessus."
This is extremely annoying, as the message no longer matches the text it is supposed to! Only when I remove the leading "" do the texts match again. I wondered if this happens because I tend to write my translated msgstr as one line without intermediate quotes (what are they good for anyway?). After sphinx-intl update they are enclosed in quotes...
What is going on and how can I prevent this?

How to check the value of several variables in a th:text (Thymeleaf)

I have found a lot of tutorials on checking whether a single variable is null in a th:text.
But I cannot find how to check several variables and change the text accordingly.
Here is my example:
th:text="|${item?.startDate} ${item?.endDate} ${item?.startTime} ${item?.endTime}|"
Sometimes one or more of these 4 variables is null, so I sometimes get null displayed.
I therefore want to display:
From XX to YY at 10 a.m. until 7 p.m.
And, if possible, I want to use localized strings for 'From', 'to', 'at' and 'until' so that this line is multilingual.
Clarification:
I do use internationalization:
message1 = Starts on (en), Débute le (fr)
message2 = Ends on (en), se termine le (fr)
message3 = begins to (en), commence à (fr)
message4 = finishes at (en), se termine à (fr)
So I want to show:
Fr: Débute le item.startDate se termine le item.endDate commence à item.startTime se termine à item.endTime
en: Starts on item.startDate ends on item.endDate begins to item.startTime finishes at item.endTime
But the problem is that sometimes item.startTime and/or item.endTime is null.
So I want to display the message partially:
Fr: Débute le item.startDate se termine le item.endDate
en: Starts on item.startDate ends on item.endDate
And sometimes item.startTime and item.endTime are not null, but item.startDate and item.endDate are null.
Then I want to display the following message:
Fr: commence à item.startTime se termine à item.endTime
en: begins to item.startTime finishes at item.endTime
I can't find the correct syntax for this.
Thank you.

Ruby remove parts of a string

I have a problem with some regular expressions in Ruby. This is the situation:
Input text:
"NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
Abonează-te
---- Here is some usefull text ---
Abonează-te
× Citeşte mai mult »
Adauga un comentariu"
I need a regular expression which can extract only the useful text between the two "Abonează-te" words.
I tried result = result.gsub(/^[.]{*}\nAbonează-te/, '') to remove the text from the start of the string up to the word 'Abonează-te', but this does not work. I have no idea how to solve this. Can you help me?

Instead of using a regular expression, you can use String#split, then take the second part:
s = "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
Abonează-te
---- Here is some usefull text ---
Abonează-te
× Citeşte mai mult »
Adauga un comentariu"
s.split('Abonează-te', 3)[1].strip # 3: at most 3 parts
# => "---- Here is some usefull text ---"
UPDATE
If you want to get multiple matches:
s = "NU
Abonează-te
-- Here's some
Abonează-te
text --
Abonează-te
comentariu"
s.split('Abonează-te')[1..-2].map(&:strip)
# => ["-- Here's some", "text --"]

You could use String#scan. You don't need String#gsub when you only want to extract a particular piece of text.
> s = "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
" Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
" Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
" Abonează-te
" ---- Here is some usefull text ---
" Abonează-te
" × Citeşte mai mult »
" Adauga un comentariu"
=> "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”\nPublicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35\nAdresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla\nAbonează-te\n---- Here is some usefull text --- \nAbonează-te\n× Citeşte mai mult »\nAdauga un comentariu"
irb(main):010:0> s.scan(/(?<=Abonează-te\n)[\s\S]*?(?=\nAbonează-te)/)
=> ["---- Here is some usefull text --- "]
Remove the newline character \n inside the lookarounds if necessary. [\s\S]*? does a non-greedy match of any character (whitespace or non-whitespace), zero or more times.

Your regex syntax is incorrect: a . inside a character class matches a literal dot, and {*} matches an opening curly brace zero or more times followed by a closing curly brace.
You can match instead of replacing here:
s.match(/Abonează-te(.*?)Abonează-te/m)[1].strip()
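
The /m flag is what lets this pattern span several lines: in Ruby, . does not match a newline unless /m is set. A minimal sketch, using a shortened sample string (an assumption, not the full input from the question) and the illustrative names without_m and with_m:

```ruby
# Shortened version of the input, for illustration only.
s = "NU POSTA\nAbonează-te\n---- Here is some usefull text ---\nAbonează-te\nAdauga un comentariu"

# Without /m, `.` stops at newlines, so the two markers are never bridged.
without_m = s.match(/Abonează-te(.*?)Abonează-te/)

# With /m, `.` also matches newlines, so the lazy group captures the lines in between.
with_m = s.match(/Abonează-te(.*?)Abonează-te/m)

p without_m          # nil
p with_m[1].strip    # "---- Here is some usefull text ---"
```

Because match returns nil when nothing is found, calling [1] directly on the result raises NoMethodError for input without two markers; checking the result first avoids that.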

convert certain special characters to ascii in php

I need a PHP script that converts certain special characters to ASCII codes (, . / - and all accented letters).
eg.
original:
Dingo a accidentellement fait tomber la pièce porte-bonheur de Mickey tout au fond du lac. Le Professeur Von Drake va utiliser son camping-car et le transformer en sous-marin pour explorer les eaux profondes.
result:
Dingo a accidentellement fait tomber la pièce porte-bonheur de Mickey tout au fond du lac. Le Professeur Von Drake va utiliser son camping-car et le transformer en sous-marin pour explorer les eaux profondes.
I've tried htmlspecialchars(), but it doesn't seem to work: it only converts the characters that have special significance in HTML.

If you look at the documentation of htmlspecialchars(), you will see:
If you require all input substrings that have associated named entities to be translated, use htmlentities() instead.
