Xpath - joining text within hrefs

<ul><li>
<a href="...">A name of something</a>
by <a href="...">Hello World</a> and <a href="...">Goodbye for now</a> (1975)
</li></ul>
I cannot seem to grab the text from this loop so it returns me:
"Hello World and Goodbye for now"
How do I make xpath grab these bits of text and put them together? I tried concat and also using a pipe between the two queries:
/ul/li/a[2]|/ul/li/a[3]
concat(/ul/li/a[2]|/ul/li/a[3])
But this doesn't seem to provide me with the text I want.

You can get Hello World using /ul/li/a[2], since it is the content of the second a element.
Goodbye for now is at /ul/li/a[3].
The word "and" is not in an a element but directly in the li. It is the third text node (the first one is a line break, the second one is a line break followed by "by"). You can retrieve it with /ul/li/text()[3].
To obtain the sentence you want you can use a simple concat:
concat(/ul/li/a[2],/ul/li/text()[3],/ul/li/a[3])
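The same three pieces can be checked outside an XPath engine. Below is a minimal Python sketch, assuming the list item holds three a elements (the href values are placeholders, since the real ones are not shown). It uses the standard library's xml.etree.ElementTree, which does not implement concat() or text(), so the joining is done in Python rather than in XPath.

```python
import xml.etree.ElementTree as ET

# Assumed markup: three <a> elements inside the <li>; href values are placeholders.
SAMPLE = (
    '<ul><li>\n'
    '<a href="#">A name of something</a>\n'
    'by <a href="#">Hello World</a> and <a href="#">Goodbye for now</a> (1975)\n'
    '</li></ul>'
)

root = ET.fromstring(SAMPLE)          # root is the <ul> element
anchors = root.findall("./li/a")      # the three <a> elements, in document order

second, third = anchors[1], anchors[2]
# ElementTree keeps the text that follows an element in its .tail attribute,
# so second.tail is the " and " text node sitting between the two links.
joined = second.text + second.tail + third.text
print(joined)
```

Here second.tail plays the role of /ul/li/text()[3] in the concat expression above.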

Related

Regex_extract the next (and only the first) line after string

Using a regex (in Ruby), how can I extract only the line that follows a given string, as in this example:
Title:
this is the text I'd like to extract
Not this one
Neither this
I managed to extract the text using "Lookahead and Lookbehind" like this:
puts text.scan(/Title:[^;]*)Not this one/)
but the second part ("Not this one") is not always present.

^(?<=Title:\n)([^\n]+$)
Check it please.
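As a rough illustration of the lookbehind idea in the pattern above, here is a Python sketch (the question is about Ruby, so treat this as an adaptation, not the answerer's code). The helper name line_after is made up for this example. Python requires a fixed-width lookbehind, which a literal label followed by a newline satisfies.

```python
import re

def line_after(label, text):
    """Return the single line that follows `label`, or None if it is absent."""
    # The lookbehind anchors the match right after "label + newline";
    # [^\n]+ then consumes exactly one line and stops at the next newline.
    pattern = r"(?<=" + re.escape(label) + r"\n)[^\n]+"
    match = re.search(pattern, text)
    return match.group(0) if match else None

sample = "Title:\nthis is the text I'd like to extract\nNot this one\nNeither this"
extracted = line_after("Title:", sample)
print(extracted)
```

Because the pattern never mentions the following lines, it behaves the same whether or not "Not this one" is present.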

Regex Markdown Header

I'm trying to create a (Ruby) regular expression that checks for multiple conditions. I use this regex to replace content in my object. My regex is nearly finished, except for two problems I'm facing with Markdown.
First off, headers are giving me trouble. For example, I don't want to replace the word "Hi" with "Hello" if "Hi" is in a header.
Hi John <== # should not change
==================
Text: Hi, how are you? <== # Should be: Hello, how are you? after substitution
Or:
#### Hi Peter <== # should not change
Text: Hi, how are you? <== # Should be: Hello, how are you? after substitution
Question: How can I escape markdown headers within my regex? I've tried negative lookbehind and lookahead assertions, but to no avail.
My second problem should be quite easy, but somehow I'm struggling. If a word is italic (_hi_), I want to find and replace it without changing the underscores. I can find the word with this regex:
\b[_]*hi[_]*\b
Question 2: If I replace the match, I also replace the underscores. Is there a way to detect and replace only the word itself while still using word boundaries?
Code Example
@website.autolinks.all.each do |autolink|
  autolink.name # for example returns "Iphone5"
  autolink.url  # for example returns "http://www.apple.com"
  regex = /\b(?<!##\s)(?<![\d.\[])([_]*)#{autolink.name}([_]*)(?![\d'"<\/a>])\b/
  if @permalink.blog_entry.content.match(regex)
    @permalink.blog_entry.content.gsub!(regex, "[#{autolink.name}](#{autolink.url})")
  end
end
Example text
Iphone5
==============
Iphone5 is the best mobile phone there is, even though the people at Samsung probably think, or perhaps only hope that their Samsung Galaxy S3 is better.
#### Samsung Galaxy S3?
Yes, that's the name of the newest Samsung phone.
This will be rendered as text with HTML tags, but my regex runs on the content while it is still in Markdown syntax (before the Markdown converter is applied).

Regexes work best when they do one clear thing. If you have multiple conditions, your code should usually reflect that by dividing the processing into steps.
In this case, you have two clear steps:
Use a simple regex or other logic to skip over the header portion of the message.
Once you know you are in the content, use another regex to process the content.
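The two steps above could be sketched roughly as follows, in Python instead of Ruby. The function name replace_outside_headers is invented for this example, and the header detection is a simplification: a line counts as a header if it starts with "#" or is followed by a line made up only of "=" or "-".

```python
import re

UNDERLINE = re.compile(r"^(=+|-+)\s*$")  # setext underline such as "=====" or "-----"

def replace_outside_headers(markdown, word, replacement):
    """Replace whole-word occurrences of `word`, skipping Markdown header lines."""
    lines = markdown.split("\n")
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    out = []
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        # Step 1: decide whether this line belongs to a header.
        is_atx = line.lstrip().startswith("#")
        is_setext = bool(line.strip()) and bool(UNDERLINE.match(next_line))
        if is_atx or is_setext or UNDERLINE.match(line):
            out.append(line)
        else:
            # Step 2: only body lines reach the substitution.
            out.append(pattern.sub(replacement, line))
    return "\n".join(out)

doc = "Hi John\n=======\nText: Hi, how are you?\n#### Hi Peter\nText: Hi there"
result = replace_outside_headers(doc, "Hi", "Hello")
print(result)
```

Keeping the header check separate from the substitution means the replacement regex itself can stay simple.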

I've found a solution:
regex = /(?<!##\s)(?<![\d.\[a-z])#{autolink.name}(?![\d'"a-z<\/a>])(?!.*\n(==|--))/i
if @permalink.blog_entry.content.match(regex)
  @permalink.blog_entry.content.gsub!(regex, "[\\0](#{autolink.url})")
end
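To see what that pattern does, here is a Python adaptation run against a small sample. This is a sketch, not the original Ruby: the function name autolink is made up, the sample text is shortened, and the name is passed through re.escape, which the Ruby version does not do. The lookarounds reject letters and digits on either side but not underscores, so an italic _Iphone5_ keeps its underscores, and the last lookahead skips a line that is followed by a "==" or "--" underline.

```python
import re

def autolink(content, name, url):
    """Wrap occurrences of `name` in a Markdown link, except inside headers."""
    pattern = re.compile(
        r"(?<!##\s)(?<![\d.\[a-z])"                   # not after "## ", a digit, ".", "[" or a letter
        + re.escape(name)
        + r"""(?![\d'"a-z</>])(?!.*\n(?:==|--))""",   # not before those characters, nor on an underlined header line
        re.IGNORECASE,
    )
    # m.group(0) is the whole match, like \\0 in the Ruby replacement string.
    return pattern.sub(lambda m: "[" + m.group(0) + "](" + url + ")", content)

sample = (
    "Iphone5\n"
    "==============\n"
    "Iphone5 is the best mobile phone there is.\n"
    "#### Iphone5 rocks\n"
    "_Iphone5_ is italic"
)
linked = autolink(sample, "Iphone5", "http://www.apple.com")
print(linked)
```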

How to identify paragraph and store it in an Array

I have two paragraphs in a buffer which has only simple text in it:
PARAGRAPH 1
PARAGRAPH 2
I need to read each paragraph from its first character to its last word and store it in an array. This should be done for each paragraph. How can I identify paragraphs if there are no extra markup tags?
If this is not possible, and I ask the user to press Enter twice after each paragraph, how can I split my text on those breaks? I tried a regex but it doesn't work.

Here's one way to do it:
(let ((input-text "this is a sample paragraph.
this is another paragraph"))
(apply #'vector (split-string input-text "\n")))
split-string is an easy way to divide text on a regular expression.
To convert the resulting list to an array, I use the function vector, which makes an array from the arguments passed to it. To pass the contents of the list to that function rather than the list itself, I use apply.
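For comparison, here is the same split-then-collect idea as a Python sketch (not elisp). It assumes that paragraphs are separated by at least one blank line, as in the "press Enter twice" case from the question; the function name split_paragraphs is invented for this example.

```python
import re

def split_paragraphs(text):
    """Split text into a list of paragraphs separated by blank lines."""
    # A paragraph break is assumed to be a newline, optional whitespace,
    # and another newline; single line breaks stay inside a paragraph.
    parts = re.split(r"\n\s*\n", text.strip())
    return [part.strip() for part in parts if part.strip()]

paragraphs = split_paragraphs("PARAGRAPH 1\n\nPARAGRAPH 2")
print(paragraphs)
```

Splitting on a single "\n", as in the elisp snippet above, would instead produce one entry per line.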

In case some other elisp newbie has the same problem: I found a way to split the text using a while loop like the one below:
(while (re-search-forward "[ \n][ \n]$" nil t)
..... ..... .......
)
But I'm still not sure how to put the pieces into an array inside the loop.

Get input tag element with a given text value via xpath

How can I select, via XPath, all input elements in a document that have a given value typed into them?
For instance, if I go to Google and type in "hello world", how do I get all input tags that have "hello world" typed into them?
Playing around with things like the expression below hasn't paid off, since the value in the text field isn't really part of the document.
document.evaluate("//input[text() = 'hello world']", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue
Should be pretty simple, but I'm surprisingly stuck.

Your XPath expression should search for inputs whose value attribute is 'hello world'.
This is because the value is stored in that attribute, not in the inner text of the element.
The actual html element would look like:
<input type='text' value='hello world' />
The XPATH expression should look like:
//input[@value = 'hello world']

An alternative without jQuery for getting the input which contains the target text is:
//input[contains(@value, 'hello world')]
This would find the input even if the user types "hello world number 7"

This also seems to work:
//input[@type='text'][@value='hello world']
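The attribute-based expressions above can be tried against static markup with Python's standard library. This is a sketch under assumptions: the form and its name attributes are invented, and xml.etree.ElementTree has no contains(), so the substring case is filtered in Python. Note also that this matches the value attribute as written in the markup; in a live browser, text typed by the user is held in the element's value property and may not be reflected in the attribute.

```python
import xml.etree.ElementTree as ET

# Invented sample form; the name attributes only serve to identify the matches.
page = ET.fromstring("""
<form>
  <input type="text" name="q" value="hello world" />
  <input type="text" name="r" value="hello world number 7" />
  <input type="hidden" name="h" value="hello world" />
</form>
""")

# Exact attribute match, as in //input[@value = 'hello world']
exact = page.findall(".//input[@value='hello world']")

# Two chained predicates, as in //input[@type='text'][@value='hello world']
typed = page.findall(".//input[@type='text'][@value='hello world']")

# Substring match done in Python, standing in for contains(@value, 'hello world')
partial = [el for el in page.iter("input") if "hello world" in el.get("value", "")]

print([el.get("name") for el in exact])
print([el.get("name") for el in typed])
print([el.get("name") for el in partial])
```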

Selenium: How to locate a node using exact text match

I want to locate an element on a web page by its text.
I know there is a function named contains that does this, for example:
tr[contains(.,'hello')]/td
But the problem is that if I have two elements named hello and hello1, this function does not work properly.
Is there any other method like contains for exact string match for locating elements?

tr[.//text()='hello']/td
This selects all td child elements of every tr element that has a descendant text node equal to exactly 'hello'. Such an XPath still sounds odd to me.
I believe that this makes more sense:
tr/td[./text()='hello']
because it selects only the td that contains the text.
Does that help?

It all depends on what your HTML actually contains, but your tr[contains(.,'hello')]/td XPath selector means "the first cell of the first row that contains the string 'hello' anywhere within it" (or, more accurately, "the first TD element in the TR element that contains the string 'hello' anywhere within it", since Selenium has no idea what the elements involved really do). That's why it's getting the wrong result when there are rows containing "hello" and "hello1" - both contain "hello".
The selector tr[. ='hello']/td would be more accurate, but it's a little unusual (because HTML TR elements aren't supposed to contain text - the text is supposed to be in TH or TD elements within the TR), and it probably won't work (because text in any other cells would break the comparison). You probably want tr[td[.='hello']]/td, which means "the first TD element contained in the TR element that contains a TD element that has the string 'hello' as its complete text".
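The difference between a substring test and an exact test can be seen on a tiny table. This is a Python sketch using xml.etree.ElementTree (Python 3.7 or later for the [.='text'] predicate), not Selenium; the table contents are invented. ElementTree does not support nested predicates, so tr[td='hello']/td stands in for tr[td[.='hello']]/td.

```python
import xml.etree.ElementTree as ET

# Invented sample table with one "hello" cell and one "hello1" cell.
table = ET.fromstring(
    "<table>"
    "<tr><td>hello</td><td>first row</td></tr>"
    "<tr><td>hello1</td><td>second row</td></tr>"
    "</table>"
)

# Exact match on the cell's complete text: only the "hello" cell qualifies.
exact_cells = table.findall(".//tr/td[.='hello']")

# All cells of the row that has a cell whose complete text is "hello".
row_cells = table.findall(".//tr[td='hello']/td")

# Substring test done in Python, mimicking contains(., 'hello'): both cells match.
substring_cells = [td for td in table.iter("td") if "hello" in "".join(td.itertext())]

print([td.text for td in exact_cells])
print([td.text for td in row_cells])
print([td.text for td in substring_cells])
```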

Well, your problem is that you are searching for text in the tr (which is not correct anyway), and this causes a problem for the contains function, which cannot accept a list of text nodes. Try this location path instead. It should retrieve what you want.
//tr/td[contains(./text(),"hello")]
This location path retrieves a set of nodes, which you have to iterate over to get the text. You can try appending
/text()
but this produces (at least in my test) a single string that is the concatenation of all the matched strings.

I had the same problem. I had a list of elements, one named "Image" and another named "Text and Image". After reading all the posts here, none of the suggestions worked for me. So I tried the following and it worked:
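// Find every element that has a text node equal to componentName.
List<WebElement> elementList = driver.findElements(By.xpath("//*[text()='" + componentName + "']"));
for (WebElement element : elementList) {
    // getText() returns the element's full visible text, so only elements
    // whose whole text equals componentName are clicked.
    if (element.getText().equals(componentName)) {
        element.click();
    }
}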
