The string
<div id="main">
content (is INT)
<div>some more content (is not INT) other content (also INT)</div>
</div>
I need to get the content which is an INT. A simple strip-all-non-INT function will not work, since "other content" is sometimes also an INT. I cannot use a select-child solution, since the content always sits outside the inner div, and selecting the content of <div id="main"> will also select the other div.
Is there a solution that searches the string from the start for the first < and removes the rest of the string once it is found?
(The structure cannot be altered)
If that is the exact format, you could just use substr and strpos,
something like:
$html = '<div id="main">
12345
<div>foobar6789</div>
</div>
';
// 15 is the length of '<div id="main">'; trim() drops the surrounding newlines
$content_1 = trim(substr($html, 15, strpos($html, '<div>') - 15)); // the first INT content
$subdiv = str_replace("</div>", "", substr($html, strpos($html, '<div>') + 5));
preg_match('/(?P<noint>[^0-9]+)(?P<digit>\d+)/', $subdiv, $matches);
echo $matches['noint']; // the non-INT content
echo $matches['digit']; // the second INT
It's not a good idea to parse HTML with regular expressions, but maybe you could do it using only preg_match.
Good luck!
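The approach the question describes (keep everything up to the first < after the opening tag) can also be written without a hard-coded offset. This is a minimal Python sketch, assuming the markup always contains the literal <div id="main"> tag; the function name leading_int is made up for illustration.

```python
def leading_int(html: str, opening_tag: str = '<div id="main">') -> int:
    """Return the integer that directly follows opening_tag."""
    # Drop everything up to and including the opening tag.
    _, found, rest = html.partition(opening_tag)
    if not found:
        raise ValueError("opening tag not found")
    # Keep only the text before the next '<', i.e. before the nested div.
    text = rest.partition("<")[0]
    return int(text.strip())


html = '<div id="main">\n12345\n<div>foobar6789</div>\n</div>\n'
print(leading_int(html))
```

Because the cut happens at the first <, integers inside the nested div never reach the conversion.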
The example below:
content := "<p>https://github.com/</p>
<div class=\"extract\">
<p>hello1</p>
</div>
<div>hello2</div>
<div class=\"extract\"><p>hello3</p></div>"
I want to remove every div that has class="extract", including all of its child elements.
I want to get below result
content := "<p>https://github.com/</p>
<div>hello2</div>"
I tried to use regex, but it's not working.
You can use goquery to parse and modify your HTML
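goquery is a Go library; the same parse-then-filter idea can be illustrated with nothing but a standard library. The following is a rough Python sketch using html.parser instead of goquery, under the assumption that comments and doctype declarations do not need to survive; the names ExtractRemover and remove_extract_divs are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class ExtractRemover(HTMLParser):
    """Re-emit HTML while skipping every <div class="extract"> subtree."""

    def __init__(self):
        super().__init__()
        self.out = []        # pieces of the filtered document
        self.skip_depth = 0  # > 0 while inside a div that is being removed

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            # Track nested divs so the matching </div> ends the skip.
            if tag == "div":
                self.skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "extract" in classes:
            self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as written.
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on the way out.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def remove_extract_divs(content: str) -> str:
    parser = ExtractRemover()
    parser.feed(content)
    parser.close()
    return "".join(parser.out)


content = (
    '<p>https://github.com/</p>\n'
    '<div class="extract">\n<p>hello1</p>\n</div>\n'
    '<div>hello2</div>\n'
    '<div class="extract"><p>hello3</p></div>'
)
print(remove_extract_divs(content))
```

The depth counter is what a regular expression cannot easily provide: it keeps nested divs inside a removed block from ending the skip too early.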
I have the current HTML code:
<div class="group">
<ul class="smallList">
<li><strong>Date</strong>
13.06.2019
</li>
<li>...</li>
<li>...</li>
</ul>
</div>
and here is my "wrong" XPath:
//div[@class='group']/ul/li[1]
I would like to extract the date with XPath, without the text in the strong tag, but I'm not sure how NOT is used in XPath, or whether it can be used here at all.
Keep in mind that the date is dynamic.
Use substring-after() to get the date value:
substring-after(//div[@class='group']/ul/li[1],'Date')
The easiest way to get the date is by using the XPath-1.0 expression
//div[@class='group']/ul/li[1]/text()[normalize-space(.)][1]
The result does include the spaces.
If you want to get rid of them, too, use the following expression:
normalize-space(//div[@class='group']/ul/li[1]/text()[normalize-space(.)][1])
Unfortunately, this only works for a single result in XPath 1.0.
If you have XPath 2.0 available, you can append normalize-space() to the end of the expression, which also allows multiple results to be processed:
//div[@class='group']/ul/li[1]/text()[normalize-space(.)][1]/normalize-space()
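To see what "the text nodes of li, without the strong child" means in practice, here is a small Python sketch using the standard library's xml.etree.ElementTree. It assumes the fragment is well-formed XML, and because ElementTree's XPath subset has neither text() nor normalize-space(), the hypothetical helper own_text emulates them by reading the element's own text nodes.

```python
import xml.etree.ElementTree as ET

snippet = """
<div class="group">
  <ul class="smallList">
    <li><strong>Date</strong>
    13.06.2019
    </li>
    <li>...</li>
  </ul>
</div>
"""


def own_text(element) -> str:
    """Concatenate the text nodes that are direct children of element."""
    # element.text is the text before the first child; each child's
    # .tail is the text that follows that child inside the parent.
    parts = [element.text or ""]
    parts.extend(child.tail or "" for child in element)
    return "".join(parts).strip()


root = ET.fromstring(snippet.strip())
first_li = root.find("./ul/li[1]")
print(own_text(first_li))
```

The text inside strong belongs to the strong element, so it never appears in the li element's own text nodes.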
Here is a Python helper that reads the text directly from the parent element, excluding its children; in your case the date is a direct text node of ul/li.
Python:
from selenium.webdriver.common.by import By

def get_text_exclude_children(element):
    # `driver` is the already-initialised WebDriver instance
    return driver.execute_script(
        """
        var parent = arguments[0];
        var child = parent.firstChild;
        var textValue = "";
        // collect only the text nodes that are direct children of the parent
        while (child) {
            if (child.nodeType === Node.TEXT_NODE)
                textValue += child.textContent;
            child = child.nextSibling;
        }
        return textValue;""",
        element).strip()
This is how to call it in your case:
liEle = driver.find_element(By.XPATH, "//div[@class='group']/ul/li[1]")
datePart = get_text_exclude_children(liEle)
print(datePart)
Please feel free to convert this to the language you are using, if it's not Python.
Here is an example of html:
<li class="index i1"
<ol id="rem">
<div class="bare">
<h3>
<a class="tlt mhead" href="https://www.myexample.com">
<li class="index i2"
<ol id="rem">
<div class="bare">
<h3>
<a class="tlt mhead" href="https://www.myexample2.com">
I would like to get the value of the href in every a element. What distinguishes the list items is the class of the enclosing li, whose name changes (i1, i2).
So I keep a counter and change it each time I fetch a value.
i <- 1
stablestr <- "index "
myVal <- paste(stablestr , i, sep="")
Even if I just try to access the li itself with the myVal index using this
profile<-remDr$findElement(using = 'xpath', "//*/input[@li = myVal]")
profile$highlightElement()
or the href using this
profile<-remDr$findElement(using = 'xpath', "/li[@class=myVal]/ol[@id='rem']/div[@id='bare']/h3/a[@class='tlt']")
profile$highlightElement()
Is there anything wrong with xpath?
Your HTML structure is invalid. Your <li> tags are not closed properly, and it seems you are confusing <ol> with <li>. But for the sake of the question, I assume the structure is as you write, with properly closed <li> tags.
Then, constructing myVal is not right. It will yield "index 1" while you want "index i1". Use "index i" for stablestr.
Now for the XPath:
//*/input[@li = myVal]
This is obviously wrong since there is no input in your XML. Also, you didn't prefix the variable with $. And finally, the * seems to be unnecessary. Try this:
//li[@class = $myVal]
In your second XPath, there are also some errors:
/li[@class=myVal]/ol[@id='rem']/div[@id='bare']/h3/a[@class='tlt']
1. myVal is missing the $ prefix.
2. div[@id='bare'] should be div[@class='bare'].
3. The class of the a element is actually 'tlt mhead', not 'tlt'.
The first two issues are easy to fix. The third one is not. You could use contains(@class, 'tlt'), but that would also match if the class is, e.g., tltt, which is probably not what you want. Anyway, it might suffice for your use-case. Fixed XPath:
/li[@class=$myVal]/ol[@id='rem']/div[@class='bare']/h3/a[contains(@class, 'tlt')]
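If the expression is handed to a browser driver as a plain string, there may be no variable binding for $myVal, so a common alternative is to interpolate the value into the string (in R, the paste call from the question does the same job). The sketch below is in Python purely for illustration and assumes that situation; href_xpath is a hypothetical helper. It also uses the usual token-match idiom for class attributes, which avoids the tltt false positive mentioned above.

```python
def href_xpath(index: int) -> str:
    """Build an XPath for the link inside the li whose class is 'index i<N>'."""
    li_class = f"index i{index}"
    # Token match: true for class="tlt mhead", false for class="tltt".
    has_tlt = "contains(concat(' ', normalize-space(@class), ' '), ' tlt ')"
    return (
        f"//li[@class='{li_class}']"
        "/ol[@id='rem']/div[@class='bare']/h3"
        f"/a[{has_tlt}]"
    )


for i in (1, 2):
    print(href_xpath(i))
```

Padding the class value with spaces on both sides makes ' tlt ' match only a whole class name, wherever it appears in the list.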
In a case where the same element may get a different id or name depending on many factors, I would like to be able to run an assertion on this element accurately.
Does Nightwatch.js allow an assertion based on relative position, as Sahi does (left of this element, under a div, etc.)?
I want to avoid XPath solutions, since they are based on the element type (div, id, name, etc.), and if I set it to match all types:
//*[contains(text(),'hello world')]
I will get many occurrences and will not be able to tell which one I'm trying to assert.
E.g.: running the same test on the same page, I would like to be able to find this "hello world" even if the div id or another element changes.
<div id="homebutton">
<p>
<a href=#>
<span name="hm">Home</span>
</a>
</p>
</div>
<div id=[0-9]>
<p>
<a href=#>
<span name="hw">hello world</span>
</a>
</p>
</div>
[...]
<div id=[0-9]>
<p>
<a href=#>
<span name="hw">hello world</span>
</a>
</p>
</div>
<div id="logoutbutton">
<p>
<a href=#>
<span name="lo">Logout</span>
</a>
</p>
</div>
Test example: assert the element containing the string "hello world" that is near the home button, not the one near the logout button.
Expanding on my previous answer, you have two options. If the "hello world" you want is always the second to last, appearing just before the Logout button, you could use an XPath selector like this. Note the parentheses: without them, last() is evaluated among the siblings of each match rather than across the whole document:
"(//*[.='hello world'])[last()-1]"
That's right in the Rosetta doc I shared with you, so you should know that by now.
Another option is to get a collection of all matches. For that, I'd write a helper function like so:
module.exports = {
  // counts the elements matching an XPath selector and passes the count to `callback`
  getCountOfElementsUseXpath: function (client, selector, callback) {
    // get a collection of all elements that match the passed selector
    // (getEls is assumed to be a custom command that yields that collection)
    client.getEls(selector, function (collection) {
      // the lookup is asynchronous, so a `return` here would never reach the caller
      var elementCount = collection.length;
      // log the count of elements to the terminal
      console.log("There were " + elementCount + " matching elements");
      // hand the count back through the callback
      callback(elementCount);
    });
  },
};
Then you can use that with some formula for how far your selector is from the last element.
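The "collect every match, then pick by position" idea can be tried outside the browser. This is a minimal Python sketch using the standard library, not Nightwatch; it assumes a hand-written, well-formed version of the page, and the helpers first_after and last_before are hypothetical names.

```python
import xml.etree.ElementTree as ET

page = """
<body>
  <div id="homebutton"><p><a href="#"><span name="hm">Home</span></a></p></div>
  <div id="1"><p><a href="#"><span name="hw">hello world</span></a></p></div>
  <div id="2"><p><a href="#"><span name="hw">hello world</span></a></p></div>
  <div id="logoutbutton"><p><a href="#"><span name="lo">Logout</span></a></p></div>
</body>
"""

root = ET.fromstring(page.strip())
# iter() walks the tree in document order, so list positions mirror page order.
spans = list(root.iter("span"))
texts = [(s.text or "").strip() for s in spans]


def first_after(anchor: str, wanted: str) -> int:
    """Index of the first `wanted` text that appears after `anchor`."""
    start = texts.index(anchor)
    return texts.index(wanted, start + 1)


def last_before(anchor: str, wanted: str) -> int:
    """Index of the last `wanted` text that appears before `anchor`."""
    end = texts.index(anchor)
    return max(i for i in range(end) if texts[i] == wanted)


print(first_after("Home", "hello world"))
print(last_before("Logout", "hello world"))
```

Anchoring on a stable neighbour such as the Home or Logout text keeps the lookup independent of the changing div ids.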
The XPath selector "//div[contains(text(), 'hello world')]"
would match both of the elements you've shown. If the element type itself can change, use a wildcard: "//*[contains(text(), 'hello world')]"
To match any element with exactly that text:
"//*[.='hello world']"
A great source, a "Rosetta stone", for selector construction
To use an XPath selector with Nightwatch:
"some test": function (client) {
  client
    .useXpath()
    .waitForElementPresent("//div[contains(text(), 'hello world')]", this.timeout);
}
The XPath solution is okay, but here is the solution I needed, which is more generic and gives many more options:
Use elements to return an array of child elements.
I chose to return an array of objects with data matching my needs:
[{ id: webElementId, size: {width: 18, height: 35}, ...}, {id: webElementId, ...}, etc.]
With this information, I can do many things:
Find an element with a specific text, attribute or CSS property and perform any action on it, such as an assertion or a click to the right of it, calculated from its size.
Hover the mouse over each matched element (if you want to browse tabs with ul li / ol li submenus).
The more data is filled in, the more assertions you can perform.
HTML Portion:
<div class="abc">
<div style="text-align:left;" itemscope itemtype="xyz">
<h1 itemtype="mno"> I want this text </h1>
</div>
</div>
I am using
$text = $xpath->query('//div[class="abc"]/div/h1');
but I am getting no value. Please help me as I am new to it.
You should try
//div[@class="abc"]/div/h1
The difference is in the @ sign before class, because the attribute axis is accessed this way. When you omit the @ sign, it looks for node names (tag names).
This returns you the whole h1 node (or, rather, a node-set containing all the matching h1 nodes).
If you only want the text from the element, try the evaluate function with an expression that returns a string, since a text() node-set would still come back as a DOMNodeList:
$text = $xpath->evaluate("string(//div[@class='abc']/div/h1)");
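The difference between the two predicates can be checked quickly in any XPath-capable library. Below is a small Python sketch using the standard library's xml.etree.ElementTree rather than PHP's DOMXPath; the <root> wrapper and the cleaned-up attributes are assumptions made so that the fragment parses as XML.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<root><div class="abc"><div itemtype="xyz">'
    '<h1 itemtype="mno"> I want this text </h1>'
    '</div></div></root>'
)

# [@class='abc'] tests the class attribute of the div.
with_at = doc.find(".//div[@class='abc']/div/h1")
# [class='abc'] looks for a child element named <class> whose text is 'abc'.
without_at = doc.find(".//div[class='abc']/div/h1")

print(with_at.text.strip())
print(without_at)
```

Without the @, the predicate is still valid XPath, which is why the original query fails silently instead of raising an error.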