Ruby remove parts of a string - ruby
I have a problem with some regular expressions in Ruby. This is the situation:
Input text:
"NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
Abonează-te
---- Here is some usefull text ---
Abonează-te
× Citeşte mai mult »
Adauga un comentariu"
I need a regular expression which can extract only the useful text between the two "Abonează-te" lines.
I tried result = result.gsub(/^[.]{*}\nAbonează-te/, '') to remove everything from the start of the string up to the first 'Abonează-te', but it does not work. I have no idea how to solve this. Can you help me?
Instead of using a regular expression, you can use String#split and take the second part:
s = "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
Abonează-te
---- Here is some usefull text ---
Abonează-te
× Citeşte mai mult »
Adauga un comentariu"
s.split('Abonează-te', 3)[1].strip # 3: at most 3 parts
# => "---- Here is some usefull text ---"
UPDATE
If you want to get multiple matches:
s = "NU
Abonează-te
-- Here's some
Abonează-te
text --
Abonează-te
comentariu"
s.split('Abonează-te')[1..-2].map(&:strip)
# => ["-- Here's some", "text --"]
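One edge case worth noting: String#split drops trailing empty strings by default, and [1..-2] returns nil on an empty array. The sketch below wraps the same split idea in a helper that guards against both. The name chunks_between is hypothetical and not part of the answer above.

```ruby
# Hypothetical helper (not from the answer above): returns every chunk
# that sits between two consecutive occurrences of marker.
def chunks_between(text, marker)
  # A limit of -1 keeps trailing empty strings, so a text that ends
  # exactly with the marker still counts its last chunk as enclosed.
  parts = text.split(marker, -1)
  # Drop the text before the first marker and after the last one.
  # The range returns nil for an empty array, hence the fallback to [].
  (parts[1..-2] || []).map(&:strip)
end

sample = "NU\nAbonează-te\n-- Here's some\nAbonează-te\ntext --\nAbonează-te\ncomentariu"
p chunks_between(sample, "Abonează-te")
# => ["-- Here's some", "text --"]
```

With fewer than two markers in the input the helper returns an empty array instead of raising.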
You could use String#scan. There is no need for gsub when you only want to extract a particular piece of text.
> s = "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”
" Publicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35
" Adresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla
" Abonează-te
" ---- Here is some usefull text ---
" Abonează-te
" × Citeşte mai mult »
" Adauga un comentariu"
=> "NU POSTA aşa ceva pe Facebook! „Prostia se plăteşte”\nPublicat la: 10.02.2015 10:20 Ultima actualizare: 10.02.2015 10:35\nAdresa de e-mail la care vrei sa primesti STIREA atunci cand se intampla\nAbonează-te\n---- Here is some usefull text --- \nAbonează-te\n× Citeşte mai mult »\nAdauga un comentariu"
irb(main):010:0> s.scan(/(?<=Abonează-te\n)[\s\S]*?(?=\nAbonează-te)/)
=> ["---- Here is some usefull text --- "]
Remove the newline character \n from inside the lookarounds if necessary. [\s\S]*? does a non-greedy match of any character, whitespace or not (so newlines are included), zero or more times.
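If the marker is not fixed, the same lookaround pattern can be built from a variable. This is a minimal sketch, assuming the marker always sits on its own line; scan_between is a hypothetical name, and Regexp.escape is used so that characters such as the hyphen are treated literally.

```ruby
# Hypothetical helper: the lookaround-based scan from the answer above,
# with the marker passed in instead of hard-coded.
def scan_between(text, marker)
  m = Regexp.escape(marker) # treat the marker as literal text
  # Lookbehind: marker plus newline just before the match.
  # Lookahead: newline plus marker just after the match.
  text.scan(/(?<=#{m}\n)[\s\S]*?(?=\n#{m})/)
end

sample = "NU\nAbonează-te\n-- Here's some\nAbonează-te\ntext --\nAbonează-te\ncomentariu"
p scan_between(sample, "Abonează-te")
# => ["-- Here's some", "text --"]
```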
Your regex syntax is incorrect: a . inside a character class matches a literal dot, and {*} matches an opening curly brace zero or more times, followed by a closing curly brace.
You can match instead of replacing here.
s.match(/Abonează-te(.*?)Abonează-te/m)[1].strip()
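Note that String#match returns nil when the pattern does not match, so the chained [1] would raise NoMethodError on input without two markers. A minimal sketch of a nil-safe variant follows; first_between is a hypothetical name, and the safe navigation operator assumes Ruby 2.3 or later.

```ruby
# Hypothetical helper: first chunk between two markers, or nil if the
# text does not contain the marker twice.
def first_between(text, marker)
  m = Regexp.escape(marker)
  # String#[] with a regex and a group index returns the captured group,
  # or nil when there is no match, so nothing is indexed on nil.
  text[/#{m}(.*?)#{m}/m, 1]&.strip
end

p first_between("a\nAbonează-te\nuseful\nAbonează-te\nb", "Abonează-te")
# => "useful"
p first_between("no marker here", "Abonează-te")
# => nil
```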
Related
bash script sed or awk. replace string depending on whether it is odd or even
I'm trying to replace the string "::" in a plain-text file with <b> or </b>, depending on whether it is the first or the second match, but I have had no success with sed. The closing tag </b> may fall at the end of a paragraph, not necessarily at the end of a line. For example:
::Habiéndose calmado de distracciones, uno permanece completamente, y la naturaleza superior es vista con los ojos de la sabiduría.::
must become:
<b>Habiéndose calmado de distracciones, uno permanece completamente, y la naturaleza superior es vista con los ojos de la sabiduría.</b>
I tried this, without success:
sed "s|::\(.*\)|\\<b>\1\</b>|g" EntrenamientoProgresivoSamadhi
Thank you in advance.
Assumptions:
- :: can occur more than once on a given line of input
- :: never shows up as data (ie, we need to replace all occurrences of ::)
- a solution using awk is acceptable
Adding some more data to our input:
$ cat file
::Habiéndose calmado de distracciones, uno permanece completamente, y la naturaleza superior es vista con los ojos de la sabiduría.::
some more ::text1:: and then some more ::text2:: the end
One awk idea:
awk '
BEGIN { tag[0]="<b>"; tag[1]="</b>" }
      { while (sub(/::/,tag[c%2])) c++; print }
' file
This generates:
<b>Habiéndose calmado de distracciones, uno permanece completamente, y la naturaleza superior es vista con los ojos de la sabiduría.</b>
some more <b>text1</b> and then some more <b>text2</b> the end
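For comparison only, the alternating-counter idea can also be sketched in Ruby, under the same assumption that every :: is a delimiter. The question asks for sed or awk, so this is an illustration of the technique rather than an answer to it, and bold_pairs is a hypothetical name.

```ruby
# Hypothetical helper: replaces the 1st, 3rd, 5th... "::" with <b>
# and the 2nd, 4th, 6th... with </b>, across line boundaries.
def bold_pairs(text)
  count = 0
  text.gsub("::") do
    tag = count.even? ? "<b>" : "</b>"
    count += 1
    tag
  end
end

puts bold_pairs("::first:: and\n::spans\ntwo lines::")
# prints:
# <b>first</b> and
# <b>spans
# two lines</b>
```

Because the whole text is processed as one string, a pair may open on one line and close on a later one.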
Using GNU sed:
$ sed -Ez 's~::(.[^:]*)::~<b>\1</b>~' input_file
<b>Habiéndose calmado de distracciones, uno permanece completamente, y la naturaleza superior es vista con los ojos de la sabiduría.</b>
Use awk and increment a counter variable. Then you can perform a different substitution depending on whether the counter is odd or even.
awk '/::/ && counter++ % 2 == 0 {sub("::", "<b>") } /::/ {sub("::", "</b>") } 1'
Note that this will only work correctly if the start and end :: are on different lines.
This might work for you (GNU sed):
sed ':a;/::/{x;s/^/x/;/xx/{s///;x;s/::/<\/b>/;ba};x;s/::/<b>/;ba}' file
If a line contains ::, swap to the hold space and increment a counter by inserting an x at the start of the line. If the counter is 2, i.e. the hold space contains xx, reset the counter, then swap back to the pattern space, replace :: by </b> and go again (in case the line contains 2 or more :: strings). Otherwise, swap back to the pattern space, replace :: by <b> and go again (for the reason explained above). All other lines are untouched.
sphinx gettext inserts empty quotes "" in front of previously matching msg
Currently my workflow when I change things in the original file is this:
make gettext to update the *.pot files
sphinx-intl update -p build/gettext -l fr to create the *.po files from them
However, this always results in the following behavior: some longer messages in the *.po files are not correctly updated or, to be more precise, they are updated although they didn't change. sphinx-intl update will insert quotes "" in front of every paragraph that spans multiple lines. Here's how that looks:
Before (in some French *.po file):
msgid "Some longer paragraph text that spans multiple lines. This text was just"
"lying here and matched a sequence within the file before gettext inserted"
"unnecessary quotes on top."
msgstr "Un texte de paragraphe plus long qui s'étend sur plusieurs lignes. Ce texte se trouvait juste ici et correspondait à une séquence dans le fichier avant que gettext n'insère des guillemets inutiles par-dessus."
After:
msgid ""
"Some longer paragraph text that spans multiple lines. This text was just"
"lying here and matched a sequence within the file before gettext inserted"
"unnecessary quotes on top."
msgstr ""
"Un texte de paragraphe plus long qui s'étend sur plusieurs lignes "
"Ce texte se trouvait juste ici et correspondait à une séquence dans le "
"fichier avant que gettext n'insère des guillemets inutiles par-dessus."
This is extremely annoying, as the entry no longer matches the text it is supposed to! Only when I remove the leading "" do the texts match again. I wondered if this happens because I tend to write my translated msgstr as one line without intermittent quotes (what are they good for anyway?). After sphinx-intl update they are enclosed in quotes...
What is going on and how can I prevent this?
How to check the value of several variables in a th: text thymeleaf
I have found many tutorials on checking whether a single variable is null in a th:text, but I cannot find how to check several variables and change the text accordingly. Here is my example:
th:text="|${item?.startDate} ${item?.endDate} ${item?.startTime} ${item?.endTime}|"
Sometimes one or more of these 4 variables is null, so "null" is displayed. I therefore want to display something like: From XX to YY at 10 a.m. until 7 p.m. If possible, I would also like the words such as 'From' and 'to' to come from the locale files so that this line is multilingual.
Clarification: I already use internationalization:
message1 = Starts on (en), Débute le (fr)
message2 = Ends on (en), se termine le (fr)
message3 = begins to (en), commence à (fr)
message4 = finishes at (en), se termine à (fr)
So I want to show:
fr: Débute le item.startDate se termine le item.endDate commence à item.startTime se termine à item.endTime
en: Starts on item.startDate ends on item.endDate begins to item.startTime finishes at item.endTime
The problem is that item.startTime and/or item.endTime are sometimes null, in which case I want to display only part of the message:
fr: Débute le item.startDate se termine le item.endDate
en: Starts on item.startDate ends on item.endDate
At other times item.startTime and item.endTime are not null but item.startDate and item.endDate are, in which case I want to display:
fr: commence à item.startTime se termine à item.endTime
en: begins to item.startTime finishes at item.endTime
I can't find the correct syntax for this example. Thank you.
Regex not matching new line with parenthesis
I have this text:
#Heurtebise (Il ramasse son sac) Vous regretterez de m'avoir fait du mal. (Silence.) Vous me chassez ? #Eurydice Le mystère est mon ennemi. Je suis décidée à le combattre. oui oui.
I want 2 matches of 2 groups each. The result I want is:
Match 1
1. #Heurtebise (Il ramasse son sac)
2. Vous regretterez de m'avoir fait du mal. (Silence.) Vous me chassez ?
Match 2
1. #Eurydice
2. Le mystère est mon ennemi. Je suis décidée à le combattre. oui oui.
I can't understand why my regex /^(\#.+)$([^(\#|\#)]+)/ does not match the 4th line, which begins with a parenthesis. This is the result I get:
Match 1
1. #Heurtebise (Il ramasse son sac)
2. Vous regretterez de m'avoir fait du mal.
Match 2
1. #Eurydice
2. Le mystère est mon ennemi. Je suis décidée à le combattre. oui oui.
Notice how it skips the line (Silence.) Vous me chassez ? in match 1, and I can't understand why. See the full case here: http://rubular.com/r/RR2eDc4ZBQ
Can someone help? Thanks.
You may use
/^(#.+)((?:\R(?![##]).*)*)$/
It will match any line starting with #, and then will match all consecutive lines that do not start with # or #.
Details
^ - start of a line
(#.+) - Group 1: # and the rest of the line
((?:\R(?![##]).*)*) - Group 2: 0 or more occurrences of:
  \R(?![##]) - a line break sequence not followed with # or #
  .* - the rest of the line
$ - end of line (not needed though).
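A minimal sketch of that pattern with String#scan follows. Two assumptions are made here: the class [##] is reduced to a plain #, since the second character is not recoverable from the text, and the sample is split into lines in a plausible way that the question does not confirm. SPEECH and speeches are hypothetical names.

```ruby
# Regex from the answer above, with [##] reduced to a single # (assumption).
SPEECH = /^(#.+)((?:\R(?!#).*)*)$/

# Hypothetical helper: returns [speaker_line, speech] pairs.
def speeches(script)
  script.scan(SPEECH).map { |header, body| [header, body.strip] }
end

# Assumed line layout of the sample text.
script = "#Heurtebise (Il ramasse son sac)\n" \
         "Vous regretterez de m'avoir fait du mal.\n" \
         "(Silence.) Vous me chassez ?\n" \
         "#Eurydice\n" \
         "Le mystère est mon ennemi.\n" \
         "Je suis décidée à le combattre.\n" \
         "oui oui."

speeches(script).each { |header, body| puts header, body, "---" }
```

Each pair holds the # line in the first element and all following non-# lines in the second, including the line that starts with a parenthesis.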
The error is in the character class to exclude a line starting with # or #: [^(\#|\#)] avoids # and # but also avoids (, | and ). A character class needs no alternation and parentheses. Using [^##] makes your sample code work for me.
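The effect is easy to check in Ruby. In this small sketch the question's class is written as [^(#|)], which is the same set of excluded characters as [^(\#|\#)] without the duplicate #. The constant names are hypothetical.

```ruby
# Same excluded characters as the question's [^(\#|\#)]: #, (, | and ).
ORIGINAL_CLASS = /\A[^(#|)]+\z/
# Corrected class: only # is excluded.
FIXED_CLASS = /\A[^#]+\z/

line = "(Silence.) Vous me chassez ?"

p ORIGINAL_CLASS.match?(line) # => false, the parentheses are excluded too
p FIXED_CLASS.match?(line)    # => true
```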
How can I resize a label to a smaller one with the LP 2824 Plus
I'm trying to resize automatic labels that comes from MercadoLibre.com.ar (latin american sell/buy page like eBay). They reccomend another zebra printer, but i have a smalller one, LP 2824 Plus. The current code for every label looks like this ^XA ^FX MELI LOGO IMAGE ^FO50,50^GFA,6900,6900,50,,:::::::::::::::::::::::gI0FF,g01LF8,Y03IF00IFChO08,X03FEL07FChM07C,W01FCN03F8hL0FC,W0FCP03FhL0FC,V07ER07EhK0FC,U01FT0F8hJ0FC,U07CT03EhJ0FC,T01FV078hI0FC,T03CV03CO07F81FEI03FFJ07E01FF8003FF8007F8FC03FF8,T0FO0FN0FN01FFE3FF800IFC001FE07FFE00IFE00FFCFC0IFE,S01CM01IF8L038M03IF7FFC01IFE007FE0JF01JF03JFC1JF,S07CM07E07F8K01EM07LFE03JF00FFE1JF83JF83JFC3JF8,S0FF8K01EI03EK03FM07MF07FCFF81FFE3JFC3FC7F87JFC3FE7FC,R01C7FK078J0FCI01FB8L0FE1FF87F07F03FC1FFC7F81FC3F01F8FF07FC7F01FC,R0380FFI0FFK03FC01FE1CL0FC0FF03F8FE01FC3FC07F01FC3F01F8FE03FC7F00FE,R07001FF0FFEL07JF80EL0FC07E01F8FC00FC3F807E00FC3E01F8FC01FCFE00FE,R0EI01JF8M07FFI07K01F807E01F8FC00FC3F00FEM03FDFC01FCFE007E,Q01CJ01F87I0F8O038J01F807E01F8KFC3F00FEL0IF9FC00FCFC007F,Q018M0E003FEO018J01F807E01F9KFE3F00FEK07IF9FC00FCFC007F,Q03N0C00FDFP0CJ01F807E01F9KFE3F00FEJ01JF9FC00FCFC007F,Q07M01C01E03CO0EJ01F807E01F9KFE3F00FEJ03FFDF9FC00FCFC007F,Q06M03803801EO06J01F807E01F9FCI043F00FEJ07FC1F9FC00FCFC007F,Q0CM0700FI07O03J01F807E01F8FCJ03F00FEJ07F01F9FC00FCFC007F,Q0CM0601EI038N03J01F807E01F8FCJ03F00FE00FC7E01F9FC01FCFE007E,P018M07078I01CN018I01F807E01F8FCJ03F007F00FC7E01F8FC01FC7E00FE,P018M07FFK0FN018I01F807E01F8FE00FC3F007F01FC7E03F8FE03FC7F01FE,P018M01FCK078M018I01F807E01F87F03FC3F007F83FC7E07F8FF07F87F83FC,P03V03CN0CI01F807E01F87JF83F003JF87F9FF07JF83JFC,P03W0EN0CI01F807E01F83JF03F001JF07JF03JF01JF8,P038V07N0CI01F807E01F81IFE03F001IFE03IFE03IFE00JF,P03F8U038L07CI01F807E01F80IFC03FI07FFC01IFC00IFC007FFE,P03FF8T01CK03FCJ0F803E00F803FF003EI01FFI0FFEI03FFI01FF8,P031FFU0FJ03FEC,P0601FET038001FE04,P06001FCS01C00FE004,P06I03FT0E07FI0C,P07J07C78Q071F8I0EU01C1F81E,P07J01FFEFCL01803FCJ0CU07C3F83E,P03K07C7FEL01C01FK0CU07C3F83E,P03K038387M0E00EK0CU0FC3F87E,P
038J030303FCK03006J01CU0FC3F87E,P038J03I03BEK03806J01CU0FC1F07E,P038J03K06K01C06J01CU0FCI07E,P03CJ038J07L0E0EJ03CU0FCI07E0FK0E007C,P01CJ01CCI03I030079CJ038U0FC1F87E3FCI07E03FF8,P01EK0FCI03I03803F8J078U0FC3F87F7FF001FE0IFE,P01EK03CI03E001C03EK078U0FC3F87JF807FE1JF,Q0FL0EI07FI0E038K0FV0FC3F87JFC0FFE3JF8,Q0F8K06J03800703K01FV0FC3F87JFC0FFE3JF8,Q07CK03F8001CC038FK03EV0FC3F87FC1FE1FFE7F01FC,Q07CK01F8I0C601FEK03EV0FC3F87F80FE1FC07E00FC,Q03EL07CI0C701FCK07CV0FC3F87F007E3F807E00FE,Q03FM0E300C381CL0FCV0FC3F87F007F3F80FE007E,Q01F8L07F00C1C38K03F8V0FC3F87F007F3F00KFE,R0FEL03F01C0FFL07FW0FC3F87E003F3F00KFE,R07FM0381E0FEL0FEW0FC3F87E003F3F00KFE,R07F8L01E7FFCL01FEW0FC3F87E003F3F00KFE,R03FEM07E3F8L07FCW0FC3F87E003F3F00FE,R01FFM018O0FF8W0FC3F87F007F3F00FC,S0FFCV03FFX0FC3F87F007F3F00FE,S07FFV0FFEX0FC3F83F007F3F007E,S01FFCT03FF8X0FC3F83F80FE3F007F00FC,T0IF8R01IFY0FC3F83FC1FE3F007F83FC,T07FFER07FFEY0FC3F81JFC3F003JFC,T01IFCP03IF8Y0FC3F81JFC3F001JF8,U07IFCN03IFEg0FC3F80JF83FI0JF,U03JFEL07JFCg0FC3F803FFE03FI07FFC,V07TFEgI01FI0FF801EJ0FF,V01TF8,W03RFC,X07PFE,Y0PF,g07LFE,gH03FFC,,:::::::::::::::::::::::::::::::::::::^FS ^FX MOTONORTE LOGO IMAGE 
^FO250,835^GFA,3045,3045,35,,::::::::::::gL078gG03C,gL07CgG07E,gL07CgG07C,::R0E1F01FI01FC01FF001FC01C3FI03F003870FF801FE,07LFI01IFC7FC007FF03FFC07FF03IF800FFC07FF1FFC07FF8J01LFC,07LFI01LFE00IF83FFC0IF83IFC01FFE07FF1FFC0IFCJ01LFC,03KFEI01MF01IFC3FFC1IFC3IFE03IF87FF1FFC1IFEK0LF8,Q01MF83IFE1FF83IFE3JF07IF87FF1FFC3FCFF,Q01FC3FE1FC3F8FE0FC07F8FE3FC7F0FE1FC7F807E03F03F,Q01F81FC0FC7E03F07C07E07F3F03F0FC0FC7E007E03E01F8,Q01F81F80FC7E03F07C07E03F3F01F1F807E7E007E07E01F8,Q01F00F807C7C01F07C07C01F3F01F1F807E7C007E07JF8,Q01F00F807C7C01F87C0FC01F3F01F1F803E7C007E07JF8J08J01,03KFEI01F00F807C7C01F87C0FC01F3F01F1F003E7C007E07JF8J0LF8,:Q01F00F807C7C01F07C07C01F3F01F1F807E7C003E07E,Q01F00F807C7E01F07C07C03F3F01F1F807E7C003E07C,Q01F00F807C7E03F07E07E03F3F01F0FC0FC7C003E03E00F,Q01F00F807C3F07E07F07F07E3F01F0FE1FC7C003F03F03F,Q01F00F807C3IFE07FC3IFE3F01F07IF87C003FC3JF,Q01F00F807C1IFC03FC1IFC3F01F03IF87C003FE1IFE,07LFI01F00F807C0IF801FC0IF83F01F03FFE07C001FE0IFCJ01LFC,07LFI01F00F807C07FFI0FC07FF03F01F00FFC07CI0FC07FF8J01LFC,03KFEJ0F00F807C01FCI03801FC01E01F007F803CI01C01FEL0LF8,,::::::::g06V03T04J03001,g06L018N03T04J030018,g04L01O0301R08M018,J01E600E01E00C03804018078100EI0230303C06038I0701C03E0100F80E,J03FF01F03F01C03C0603C078100FI0230303C0E03CI0F01E07E0301F80F,J03FF03F03F83801E0607E0703807I073030380E01EI0E00E07303033807,J0I383F03381C07E0607E060381FI023030300C03EI0C03E0630303181F,J0I383E03180E07E0607C060383FI037030380C07EI0C07E0730303183F8,J0I383C03380E06E06070060383FI03F030180C06EI0C07E07F0303F83F8,J031100E03101C07C0603C060101FI03F0301C0807EI0C03E07E0301F01F,g0EV02V07,g0EgS06,g08gS02,,::::gN03030302102,gN040244031830018,gN040244031830108,gN0202C3031818108,gN018201008806008,gO0820080C806008,gN01820180C807008,,:::::::::::::^FS ^MMT ^PW799 ^LL1519 ^LS0 ^FT256,928^XG000.GRF,1,1^FS ^FT32,192^XG001.GRF,1,1^FS ^FO14,13^GB772,1100,2^FS ^FO47,943^GB710,0,2^FS ^FO44,807^GB710,0,2^FS ^FO41,626^GB710,0,2^FS ^FO41,393^GB710,0,2^FS ^FO43,205^GB710,0,2^FS ^FT500,144^A0N,25,24^FH\^FDEmisi\A2n^FS 
^FT578,144^A0N,25,24^FD03/02/2017 16:21^FS ^FT670,101^A0N,39,38^FDR2^FS ^FT670,249^A0N,39,38^FDC2^FS ^FT43,249^A0N,28,31^FDDestinatario:^FS ^FT45,280^A0N,25,24^FH\^FDBernardita Franco^FS ^FT510,280^A0N,21,20^FDTel: 0387156057943^FS ^FT45,310^A0N,25,24^FH\^FDAvenida Libertador 1154^FS ^FT45,340^A0N,25,24^FH\^FDpiso 14 A^FS ^FT45,370^A0N,25,24^FH\^FD(1112) autonoma - Capital Federal^FS ^FT41,443^A0N,28,31^FH\^FDNro. de Gu\A1a:^FS ^BY4,3,115^FT120,578^BCN,,Y,N ^FD0191070591^FS ^FT43,670^A0N,28,31^FDRemitente^FS ^FT45,698^A0N,25,24^FH\^FDIMPORTADORA FOTOGRAFICA SOCIEDAD ANONIMA^FS ^FT490,698^A0N,21,20^FDTel: (11)4643-2003 ^FS ^FT45,726^A0N,25,24^FH\^FDAv. Rivadavia 10820^FS ^FT45,752^A0N,25,24^FH\^FDImportadora Fotogr\A0fica S.A.^FS ^FT45,780^A0N,25,24^FH\^FD(1408) Liniers - Capital Federal^FS ^FT47,982^A0N,20,19^FH\^FDImportante: Se deja expresamente^FS ^FT319,982^A0N,20,19^FDaclarado que MercadoLibre^FS ^FT538,982^A0N,20,19^FH\^FDs\A2lo se limita a la^FS ^FT47,1010^A0N,20,19^FH\^FDpublicaci\A2n de anuncios^FS ^FT241,1010^A0N,20,19^FDde sus usuarios^FS ^FT370,1010^A0N,20,19^FDy no es el propietario,^FS ^FT545,1010^A0N,20,19^FDno ha vendido y no^FS ^FT47,1038^A0N,20,19^FH\^FDser\A0 responsable^FS ^FT182,1038^A0N,20,19^FH\^FDpor los art\A1culos^FS ^FT317,1038^A0N,20,19^FDentregados y/o contenidos^FS ^FT526,1038^A0N,20,19^FDen este paquete, ya que^FS ^FT47,1065^A0N,20,19^FDel vendedor es la persona^FS ^FT253,1065^A0N,20,19^FDidentificada en esta etiqueta.^FS ^PQ1,0,1,Y ^XZ What parameters should i change to make it work in a label such as: 2.5x4 inches?
These printers come in CPCL mode by default; you need to send the following command to switch them to ZPL:
! U1 setvar "device.languages" "zpl"