I want to extract the plain text from an HTML table; that is, I don't want to grab the information marked by the red arrow.
However, when I try to get the plain text with cell.text, it also includes the text of the unwanted hyperlinks:
"\n central tendency1 \n "
I expected to get
"central tendency"
So I tried cell.text.strip.downcase.gsub!(/\d/, "").
However, the gsub! call also removes the digits in the information in the green rectangle.
Is there any way to grab the text in the HTML except the text of the hyperlinks?
Here's the link to the HTML I need to parse.
With Nokogiri, you can remove all the links before converting the table to text:
table = doc.css(".page table")[0]
table.css("a").each(&:remove)
Edit: Alternatively, you can use a regexp that only removes digits at the end of a string when they are preceded by a letter. That seems to work in this specific case, but it cannot be relied upon in similar cases:
cell.text.strip.downcase.gsub(/(?<=[a-z])\d+\z/, "")
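A small sketch of the regex route and its limits. It uses `(?<=[a-z])\d+\z`: a letter in the lookbehind rather than `\w` (which also matches digits), one or more digits, and `\z` (end of string) rather than `$` (end of line). It assumes, as the answer does, that the footnote digits are glued directly to the last word; the sample strings after the first are made up for illustration.

```ruby
# Digits glued to a trailing letter are treated as a footnote marker.
FOOTNOTE_DIGITS = /(?<=[a-z])\d+\z/

def strip_footnote(raw)
  raw.strip.downcase.gsub(FOOTNOTE_DIGITS, "")
end

p strip_footnote("\n central tendency1 \n ") # => "central tendency"
p strip_footnote("Level 2 measures")         # => "level 2 measures"
p strip_footnote("mp3")                      # => "mp"  (false positive)
```

The last call shows why this is fragile: any cell whose real content ends in letter-plus-digit is damaged, which is why removing the links from the document is the safer option.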
When I try to save an edited node, Drupal just resets the field values and does nothing, without any error message or log entry.
I entered the text word by word and found that the word "having" causes this behavior.
I'm using CKEditor to edit and filter the text value, and I guess this module is the source of the problem: if I save the text as plain text, there are no issues.
Right now I don't know what to do next to track down, dig into, and isolate this issue...
PS. In the CKEditor text format settings I enabled only two options:
Limit allowed HTML tags and correct faulty HTML
Convert line breaks into HTML (i.e. <br> and <p>)
In a web page:
<h3 class="xh-highlight">Units Currently On Bed List</h3>
"[total beds=0]
"
I want to find the XPath of "total beds=0".
How can I do that?
Your question and your comment are a bit contradictory. Do you want to find the text after a heading, or do you want to find the element containing the text [total beds=0]? Also, how precisely do you need to navigate your document?
To find a text after any h3 element you can use this: //h3/following-sibling::text()[1] (see XPath - select text after certain node).
To find a text after an h3 element with the class "xh-highlight", you can use this: //h3[@class='xh-highlight']/following-sibling::text()[1]
To be even more precise, you can also check the heading text: //h3[@class='xh-highlight' and text()='Units Currently On Bed List']/following-sibling::text()[1]
However, this doesn't match the HTML in your first comment, so you might need to adjust the header class and text values. Also, it will return the first following text node even if there are other elements between it and the h3 element.
Now, your second comment makes it seem you actually want to find the element containing the text. //*[text()='[total beds=0]'] doesn't work because of the newline in the text. If you can get rid of that in the source, it should match; otherwise you can "ignore" it in the XPath by using //*[normalize-space(text())='[total beds=0]']. (This assumes the quotes around the text in your question aren't actually in the document.)
Hi, I am using Watir to click through some links. I go to a page, click a link based on its text, and then do it again, clicking a new link. I am locating the links by their text (it is the only way I can, given their HTML), and I need to match the text I pulled from the page to the link. The text that I get contains some extra text that is not part of the link, so I need to gsub it out. Here is my issue:
String: text = "Nuclear Launch Codes (Levels One/Two)"
Link: Nuclear Launch Codes (Levels One/Two) Blah Blah Blah
Because the links do not always have exactly this text, I need to locate them like so: /#{text}/
The problem is that this returns "Nuclear Launch Codes (Levels One\/Two)", which does not match the link.
I thought I would gsub out the first parenthesis and everything after it, but I need to keep that part because I can also have Nuclear Launch Codes (Levels Four/Five).
Is there any way to modify the string so it matches the link while ignoring the rest of the link text?
If I understand you correctly, try:
/#{Regexp.escape(text)}/
Or equivalently, if you prefer:
Regexp.new(Regexp.escape(text))
This will automatically escape parentheses, slashes and so on in the text so they are not treated as special regexp characters.
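A plain-Ruby sketch showing why the unescaped pattern fails and the escaped one works, using the strings from the question. The Watir line at the end is only a comment and assumes a Watir::Browser instance named browser.

```ruby
text       = "Nuclear Launch Codes (Levels One/Two)"
link_text  = "Nuclear Launch Codes (Levels One/Two) Blah Blah Blah"
other_link = "Nuclear Launch Codes (Levels Four/Five) Blah Blah Blah"

# Unescaped: "(" and ")" form a regex group, so the pattern looks for
# "Nuclear Launch Codes Levels One/Two" without the parentheses.
naive = /#{text}/

# Escaped: every metacharacter in text is matched literally.
pattern = /#{Regexp.escape(text)}/

p naive.match?(link_text)    # => false
p pattern.match?(link_text)  # => true (extra trailing link text is ignored)
p pattern.match?(other_link) # => false

# Hypothetical Watir usage:
#   browser.link(text: pattern).click
```

Since the pattern is unanchored, it matches anywhere in the link text, which is what lets the trailing "Blah Blah Blah" be ignored while the level range still has to match exactly.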
I've only spent maybe 30 mins searching online for this, and couldn't come up with a decent answer.
But anyway, in Joomla there are normal input fields for the Contacts component, but there's a textarea for the Address.
This made me assume you can enter multiple lines of address there and they would be displayed as separate lines... but they aren't. Even if I enter line breaks, the output is rendered on one line.
So I tried entering <br> tags to separate the lines, but Joomla strips these tags out on saving.
In the template, the output is written simply by echoing $this->contact->address.
Is there any way to explode this input and replace the line breaks with <br> tags?
UPDATE:
For now as a temporary measure I'm able to add HTML code into the database values, which saves and outputs on the front end.
On a separate note, I'm now looking to remove the Subject line from the contact form without hacking the code, using overrides as much as possible. Can anyone help?
Have you tried the Sourcerer extension?
Your question is pretty old, but did you get a solution to this, Lee?
To create line breaks in Joomla titles, text areas, etc., the easiest way is to use the ReReplacer extension from NoNumber: http://extensions.joomla.org/extensions/edition/replace/4336
I personally use it to add line breaks in, e.g., menu item titles, where <br /> tags aren't allowed and get stripped.
With ReReplacer, you can create a custom tag, e.g. {br}, and then have ReReplacer replace {br} with <br />.
So every time you need to add a line break anywhere in Joomla where HTML tags usually get stripped, you can just add {br} to get a new line.
Very old question but I've fallen into the same issue and tried to find a more user friendly solution.
You can enter multiple lines in the address textarea, and they are correctly output to the HTML page source. But as you know, newlines in HTML source are not rendered; they have to be transformed into <br> tags.
For this, PHP has a handy function, nl2br, which inserts a line break tag (<br /> by default, or <br> if its second argument is false) before each newline in a string.
So in html\com_contact\contact\default_address.php of your template, replace:
echo $this->contact->address;
with
echo nl2br($this->contact->address);
This does the job nicely and lets the user naturally insert newlines in the contact address textarea, which are then rendered with the appropriate <br> tags. I believe this is a much more user-friendly solution than your previous one, where the user has to insert <br> tags in the address field.
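For readers outside PHP, the behavior of nl2br can be sketched in Ruby. This is only an illustration of what the PHP function does (it is not Joomla code), and the method name and keyword argument are hypothetical: the break tag is inserted before each line break, and the original line break characters are kept.

```ruby
# Rough Ruby re-creation of PHP's nl2br: insert a break tag *before* each
# \r\n, \n\r, \n or \r, leaving the line break itself in place.
def nl2br(str, is_xhtml: true)
  tag = is_xhtml ? "<br />" : "<br>"
  str.gsub(/\r\n|\n\r|\n|\r/) { |line_break| tag + line_break }
end

address = "1 Example Street\nSpringfield\n12345"
puts nl2br(address)
# 1 Example Street<br />
# Springfield<br />
# 12345
```

The two-character sequences come first in the alternation so that a Windows-style \r\n gets a single tag rather than two.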
In my Cocoa app I have a text field where users are supposed to insert a newline by pressing Alt+Enter. That works: when I get its value as an NSString and print it with NSLog, I actually see the newlines printed.
But when I append the string to an HTML page with appendString and then load it with loadHTMLString, the newlines are completely ignored in the resulting web page...
Could you kindly give me a suggestion or a documentation link to read? I have really no idea!
Thanks a lot!!!
Plain newline characters are not rendered by browsers.
Replace the newline characters ( \n ) by line break tags ( <br> ).
NSString *newString = [oldString stringByReplacingOccurrencesOfString:@"\n" withString:@"<br>"];
The <br> tags will show as newline characters in the webpage.
The issue is that the newlines in the string are inserted literally into the HTML document source, and newlines in HTML source are not rendered literally when the document is displayed. (Imagine if they were: you'd get many, many extra blank lines from all the newline characters separating HTML tags.)
Instead, you should process the raw text entered by the user to produce HTML source that renders as what the user entered. So you have to break the text up into paragraphs and then insert the appropriate <p></p> tags.
You can use the NSString method enumerateSubstringsInRange:options:usingBlock: with the NSStringEnumerationByParagraphs option to process each paragraph.
Also, you should sanitize the data entered by the user. For example, you don't want your page to get messed up if the user enters some HTML tags. Sanitizing the data may be as simple as replacing all the restricted characters in the text with the appropriate HTML entities, but it depends on what you're doing.
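The processing described above (escape the user's text, then wrap each paragraph in <p> tags) can be sketched in Ruby. This is not Cocoa code: a regex split on line breaks stands in for enumerateSubstringsInRange:options:usingBlock:, ERB::Util.html_escape stands in for whatever sanitizing the app actually needs, and the method name is hypothetical.

```ruby
require "erb"

# Turn user-entered plain text into HTML that renders the same way:
# escape HTML-special characters, then wrap each line in <p>...</p>.
# Blank lines are dropped (an assumption; keep them if empty paragraphs matter).
def plain_text_to_html(text)
  text.split(/\r\n|\r|\n/)
      .reject { |para| para.strip.empty? }
      .map { |para| "<p>#{ERB::Util.html_escape(para)}</p>" }
      .join("\n")
end

input = "First line\nSecond <b>line</b> & more"
puts plain_text_to_html(input)
# <p>First line</p>
# <p>Second &lt;b&gt;line&lt;/b&gt; &amp; more</p>
```

Escaping happens per paragraph before the <p> tags are added, so the only markup in the output is the markup the code itself generates.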