Ruby/Regex: Dealing with strings containing forward slashes and parentheses using gsub and regex - ruby

Hi, I am using Watir to click through some links. I go to a page, click a link based on its text, and then do it again and click a new link. I am locating the links by their text (it is the only way I can, given their HTML) and need to match the text I pulled from the page to the link. The link contains some extra text that is not part of the text I pulled, so I cannot match it exactly. Here is my issue:
String: text = "Nuclear Launch Codes (Levels One/Two)"
Link: Nuclear Launch Codes (Levels One/Two) Blah Blah Blah
Because the links do not always have the exact text I need to locate them like so: /#{text}/
The problem is that this returns "Nuclear Launch Codes (Levels One\/Two)"
I thought I would gsub out the first parenthesis and everything after it, but I need to keep it because I can also have Nuclear Launch Codes (Levels Four/Five)
Is there any way to modify the string so it matches the link while ignoring the rest of the link text?

If I understand you correctly, try:
/#{Regexp.escape(text)}/
Or equivalently, if you prefer:
Regexp.new(Regexp.escape(text))
This will automatically escape parentheses, slashes and so on in the text so they are not treated as special regexp characters.
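As a minimal sketch of the difference, using the string and link text from the question (the variable names are only illustrative):

```ruby
text = "Nuclear Launch Codes (Levels One/Two)"
link_text = "Nuclear Launch Codes (Levels One/Two) Blah Blah Blah"

# Without escaping, the parentheses are read as a capture group,
# so the pattern no longer expects literal "(" and ")" characters.
unescaped = Regexp.new(text)

# With escaping, every special character is matched literally.
escaped = /#{Regexp.escape(text)}/

unescaped_matches = unescaped.match?(link_text)
escaped_matches = escaped.match?(link_text)

puts "unescaped: #{unescaped_matches}"
puts "escaped:   #{escaped_matches}"
```

The \/ shown when the regexp is printed is only how Ruby displays a slash inside a regexp literal; it still matches a plain /. With Watir, the escaped pattern can then be passed as the text locator, for example browser.link(text: escaped).click, assuming browser is your Watir::Browser instance.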

Related

Finding head section of HTML in Emeditor

I'm trying to use EmEditor to do some find and replace on HTML using regex. I know that Regex is not generally suitable for HTML parsing but I believe it will work for my limited requirement. I can't get it to do some fairly simple finds. e.g. find the head section and remove it. I've tried several different syntaxes e.g. <head.*?>(.|\n)*?</head> also simpler ones where there are no attributes e.g. <head>.*?</head>. None work. What am I doing wrong?
If you need to search for multi-line strings, try setting a non-zero value in the Additional Lines to Search for Regular Expressions text box in the Advanced dialog box (click the Advanced button in the Find dialog box).

Why does Sphinx+MyST put spaces inside HTML tags in the generated output?

I'm new to MyST and Sphinx, so perhaps this has an obvious explanation, but it is not obvious to me. Whenever I use any construct in my Markdown source, such as bold facing or italics or even an html element such as <span>, the result in the formatted HTML output contains spaces inside the HTML tags. For example, an input such as
* **Barcode**: the barcode identifying the item
* **Title**: the title of the item.
* **Author**: the author of the item.
produces this:
Note the space before the :. Inspecting the HTML reveals that the bold-faced element contains unexpected space characters:
Note the spaces inside the <strong>...</strong>. Why is this happening? What am I doing wrong? More importantly, how do I make it stop?
I'm using Sphinx 3.4.3 with myst-parser 0.13.3. My conf.py defines extensions as 'myst_parser', 'sphinx.ext.autodoc', 'sphinx.ext.autosectionlabel', and 'sphinx.ext.napoleon'.

How to remove link's text from html table

I want to extract only the plain text in the HTML table (that is, I don't want to grab the information marked by the red arrow).
However, when I get the text with cell.text, it also includes the unnecessary hyperlink text:
"\n central tendency1 \n "
I expected that I can get
"central tendency"
So I tried cell.text.strip.downcase.gsub!(/\d/, ""),
However, the gsub! call also clears the information in the green rectangle.
Is there any way to grab the text in the HTML while excluding the text of the hyperlinks?
here's the html link I need to parse
You can remove all the links before converting to text with nokogiri:
table = doc.css(".page table")[0]
table.css("a").each(&:remove)
Edit: Alternatively, you can use a regexp that only removes a digit at the end of the string, and only if it is preceded by a word character (note that \w matches digits and underscores as well as letters). This seems like it may work in this specific case but cannot be relied upon to work in similar cases:
cell.text.strip.downcase.gsub(/(?<=\w)\d$/, "")
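If the numbers really must follow a letter, a stricter variant is possible. This is only a sketch: the helper name strip_trailing_digits is hypothetical, and the pattern assumes the link text is always a run of digits at the very end of the cell.

```ruby
# Hypothetical helper: removes a run of digits at the very end of the
# text, but only when a letter comes directly before them.
def strip_trailing_digits(cell_text)
  cell_text.strip.downcase.sub(/(?<=[[:alpha:]])\d+\z/, "")
end

puts strip_trailing_digits("\n central tendency1 \n ")
puts strip_trailing_digits("2019")
```

Here a cell such as "central tendency1" loses its trailing digit, while a cell that contains only a number is left untouched. Removing the a elements with Nokogiri is still the more robust option, since it does not depend on what the link text looks like.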

How can I stop Joomla from stripping HTML code from the Contact info?

I've only spent maybe 30 mins searching online for this, and couldn't come up with a decent answer.
But anyway, in Joomla there are normal input fields for the Contacts component, but there's a textarea for the Address.
This would make me assume you can enter multiple lines of address in there, and it would be displayed as separate lines... but it doesn't. Even if I enter line breaks, the output is rendered on one line.
So I try to enter <br> to separate, and upon saving, Joomla strips these tags out.
In the template, the output is being written simply by echoing $this->contact->address
Is there any way to process this input and replace line breaks with <br> tags?
UPDATE:
For now as a temporary measure I'm able to add HTML code into the database values, which saves and outputs on the front end.
On a separate note, I'm now looking to remove the Subject line from the contact form, without hacking the code. and by using overrides as much as possible. Can anyone help?
Have you tried the Sourcerer extension?
Your question is pretty old, but did you get a solution to this Lee?
To create line breaks in Joomla titles, text areas, etc., the easiest way is to use the ReReplacer extension from NoNumber: http://extensions.joomla.org/extensions/edition/replace/4336
I personally use this to add line breaks in, e.g., menu-item titles, where <br /> isn't allowed and gets stripped.
With ReReplacer, you can create a custom tag, e.g. {br}, and then have ReReplacer replace {br} with <br />.
So every time you need a line break anywhere in Joomla where HTML usually gets stripped, you can just add {br} to get a new line.
Very old question but I've fallen into the same issue and tried to find a more user friendly solution.
You can enter multiple lines in the address textarea, and they are correctly outputted to the HTML page source. But as you know, newlines in HTML are not rendered, they have to be transformed to <br>.
For this PHP has a nice function, nl2br, that inserts a <br> each time it encounters a newline in a string.
So in html\com_contact\contact\default_address.php of your template, replace:
echo $this->contact->address;
with
echo nl2br($this->contact->address);
This does the job nicely: the user can enter newlines naturally in the contact address textarea, and they are rendered with the appropriate <br>. I believe this is a more user-friendly solution than requiring the user to insert <br> tags in the address field.

ALT+ENTER, how to detect the newline in NSString?

In my Cocoa app I have a text field where users are supposed to insert a newline by pressing ALT+ENTER. That works fine: when I get its value as an NSString and NSLog it, I can see that the newlines are printed.
But when I append the string as part of an html page with appendString and then with loadHTMLString, the new lines are completely ignored in the resulting web page...
Could you kindly give me a suggestion or a documentation link to read? I have really no idea!
Thanks a lot!!!
Plain newline characters are not rendered by browsers.
Replace the newline characters ( \n ) by line break tags ( <br> ).
NSString *newString = [oldString stringByReplacingOccurrencesOfString:@"\n" withString:@"<br>"];
The <br> tags will be rendered as line breaks in the web page.
The issue is that the new lines that exist in the string are inserted literally into the html document source. Newlines in html source do not get rendered literally when the html document is rendered. (Imagine if they did; you'd have many, many extra new lines displayed because of all those new line characters separating html tags.)
Instead you should process the raw text entered by the user to produce html source which will render as what the user entered. So you have to break the text up into paragraphs and then insert appropriate <p></p> tags.
You can use the NSString method enumerateSubstringsInRange:options:usingBlock: with the option NSStringEnumerationByParagraphs to process each paragraph.
Also, you should sanitize the data entered by the user. For example, you don't want your page to get messed up if the user enters some HTML tags. Sanitizing the data may be as simple as replacing all the restricted characters in the text with the appropriate HTML entities, but it depends on what you're doing.
