Select all text not in div elements using xpath? - xpath

How can I select only the text that is not inside div tags?
E.g.:
<div>
<div>not this</div>
1<br/>
<div>not this</div>
1<br/>
1<br/>
<div>not this</div>
</div>
<div>
<div>not this</div>
2
<div>not this</div>
2
2
<div>not this</div>
</div>
<div>
<div>not this</div>
3
<div>not this</div>
3
3
<div>not this</div>
</div>
Expected results: {'1\n1\n1\n','2 2 2','3 3 3'}

//text()[normalize-space()][../node()[not(self::text())]]
Meaning: any text node that is not whitespace-only and whose parent has at least one child node that is not a text node.

Use:
div/text()[string-length(normalize-space()) > 0]
This expression, when evaluated with the parent of the provided XML fragment as the context node, selects all non-white-space-only text-node children of any div child of the context node.
Here is a complete verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/*">
<xsl:for-each select=
"div/text()[string-length(normalize-space()) > 0]">
"<xsl:value-of select="."/>"
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML fragment (wrapped in a top element to make it a well-formed XML document):
<t>
<div>
<div>not this</div>
1<br/>
<div>not this</div>
1<br/>
1<br/>
<div>not this</div></div>
<div>
<div>not this</div>
2
<div>not this</div>
2
2
<div>not this</div></div>
<div>
<div>not this</div>
3
<div>not this</div>
3
3
<div>not this</div></div>
</t>
The wanted, correct result is produced:
"
1"
"
1"
"
1"
"
2
"
"
2
2
"
"
3
"
"
3
3
"
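For readers without an XSLT processor at hand, the same selection can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not part of the original answer: `xml.etree.ElementTree` has no `text()` node test, so the helper names `direct_text_nodes` and `normalize_space` are hypothetical, and text-node children are read from an element's `.text` and from each child's `.tail`.

```python
import xml.etree.ElementTree as ET

# The XML fragment from the question, wrapped in a top element <t>.
XML = """<t>
<div>
<div>not this</div>
1<br/>
<div>not this</div>
1<br/>
1<br/>
<div>not this</div></div>
<div>
<div>not this</div>
2
<div>not this</div>
2
2
<div>not this</div></div>
<div>
<div>not this</div>
3
<div>not this</div>
3
3
<div>not this</div></div>
</t>"""


def direct_text_nodes(elem):
    """Yield the text-node children of elem (hypothetical helper).

    ElementTree stores them as elem.text and as the .tail of each child.
    """
    if elem.text is not None:
        yield elem.text
    for child in elem:
        if child.tail is not None:
            yield child.tail


def normalize_space(text):
    """Rough equivalent of XPath normalize-space() (hypothetical helper)."""
    return " ".join(text.split())


root = ET.fromstring(XML)

# Equivalent in spirit to: div/text()[string-length(normalize-space()) > 0]
# evaluated with <t> as the context node, then joined per outer div.
result = []
for div in root.findall("div"):
    pieces = [normalize_space(t) for t in direct_text_nodes(div)]
    result.append(" ".join(p for p in pieces if p))

print(result)  # ['1 1 1', '2 2 2', '3 3 3']
```

Text inside the nested "not this" div elements is never visited, because only the direct text children of each outer div are read.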

Related

How to exclude from a contains query all the information from a child class and after some sibling text?

<root>
<a></a>
<b></b>
<c></c>
<a></a>
<d></d>
<e></e>
<a></a>
<a></a>
</root>
In an XML document, how can I exclude from a contains search all the information from nodes after <d>,
to get results only from:
<a></a>
<b></b>
<c></c>
<a></a>
<d></d>
I can't just ask for the first 2 results for <a>, the first for <b> and the first for <c>, because sometimes a value will exist only after the <d>.
I have this code that is working:
//div[contains(@class,'class searched')]/*[contains(text(), 'Text Searched')] | //div[contains(@class,'class searched')]/*[not(contains(@class,'class excluded'))]/*[contains(text(), 'Text Searched')]
Thanks for your help :)
EDIT for clarity:
<div Class="TopClass">
<div Class="ClassA">
<div Class="ClassB">
<h3> Text Researched</h3>
<u1 Class="ClassC">
<h3> Text Researched</h3>
</u1>
</div>
</div>
<h4>Other Text</h4>
<div Class="ClassA">
<div Class="ClassB">
<h3> Text Researched</h3>
<u1 Class="ClassC">
<h3> Text Researched</h3>
</u1>
</div>
</div>
I would like to get only the "Text Researched" that is between ClassB and ClassC and that is above the "Other Text". Sometimes the "Text Researched" will only appear below the "Other Text", and I don't want that result, so a [1] will not work there. Also, the <h3> and <h4> tags are used elsewhere in the code.
Given this HTML:
<div class="TopClass">
<div class="ClassA">
<div class="ClassB">
<h3> Text Researched 1</h3>
<u1 class="ClassC">
<h3> Text Researched 2</h3>
</u1>
</div>
</div>
<h4>Other Text</h4>
<div class="ClassA">
<div class="ClassB">
<h3> Text Researched 3</h3>
<u1 class="ClassC">
<h3> Text Researched 4</h3>
</u1>
</div>
</div>
</div>
This XPath expression restricts the search to the ClassA div that is followed by the "Other Text" h4 (the div holding the first two h3 tags) and returns the text of the h3 that contains the searched text:
//div[@class="ClassA" and following-sibling::h4[.="Other Text"]]//h3[contains(.,"Text Researched 1")]/text()
Result:
echo -e 'cat //div[@class="ClassA" and following-sibling::h4[.="Other Text"]]//h3[contains(.,"Text Researched 1")]/text()\nbye' | xmllint --shell test.html
/ > cat //div[@class="ClassA" and following-sibling::h4[.="Other Text"]]//h3[contains(.,"Text Researched 1")]/text()
-------
Text Researched 1
/ > bye
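The idea behind the `following-sibling::h4` predicate can also be sketched in Python with the standard library. This is a simplified illustration, not the answer's method: the function name `h3_texts_before_marker` is hypothetical, and unlike `//div` it only inspects the direct children of the given parent element.

```python
import xml.etree.ElementTree as ET

HTML = """<div class="TopClass">
<div class="ClassA">
<div class="ClassB">
<h3> Text Researched 1</h3>
<u1 class="ClassC">
<h3> Text Researched 2</h3>
</u1>
</div>
</div>
<h4>Other Text</h4>
<div class="ClassA">
<div class="ClassB">
<h3> Text Researched 3</h3>
<u1 class="ClassC">
<h3> Text Researched 4</h3>
</u1>
</div>
</div>
</div>"""


def h3_texts_before_marker(parent, marker_text="Other Text"):
    """Collect h3 texts from ClassA divs that have a later h4 marker sibling.

    Hypothetical helper mirroring:
    div[@class="ClassA" and following-sibling::h4[.="Other Text"]]//h3
    """
    children = list(parent)
    marker_positions = [
        i for i, child in enumerate(children)
        if child.tag == "h4" and "".join(child.itertext()) == marker_text
    ]
    if not marker_positions:
        return []  # no marker, so no div has it as a following sibling
    last_marker = marker_positions[-1]
    texts = []
    for child in children[:last_marker]:
        if child.tag == "div" and child.get("class") == "ClassA":
            texts.extend((h3.text or "").strip() for h3 in child.iter("h3"))
    return texts


root = ET.fromstring(HTML)
found = h3_texts_before_marker(root)
print(found)  # ['Text Researched 1', 'Text Researched 2']
```

If the searched text appears only below the "Other Text" marker, the function returns an empty list, which is the behaviour the question asks for.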

How to conditionally apply style color to an active tab

I have tabs that show content based on the tab clicked. How do I conditionally add a background color to the active tab? Here is my code:
const [active, setActive] = useState(0);
const handleClick = e => {
const index = parseInt(e.target.id, 0);
if (index !== active) {
setActive(index);
console.log(index)
}
};
<ul>
<li>
<a onClick={handleClick} active={active ? 0 : 0} id={0}>Tab 1</a>
</li>
<li>
<a onClick={handleClick} active={active ? 1 : 0} id={1}>Tab 2</a>
</li>
<li>
<a onClick={handleClick} active={active ? 2 : 0} id={2}>Tab 3</a>
</li>
</ul>
<Content active={active ===0}>
<h1> Content1</h1>
</Content>
<Content active={active ===1}>
<h1> Content2</h1>
</Content>
<Content active={active ===2}>
<h1> Content2</h1>
</Content>
Method 1: Using class name
Add a className to the active tab based on the active state, and then style that active tab from CSS.
<div>
<ul className={"tabsContainer"}>
<li>
<a
onClick={handleClick}
className={active === 0 ? "active" : "inactive"}
id={0}
>
Tab 1
</a>
</li>
<li>
<a
onClick={handleClick}
className={active === 1 ? "active" : "inactive"}
id={1}
>
Tab 2
</a>
</li>
<li>
<a
onClick={handleClick}
className={active === 2 ? "active" : "inactive"}
id={2}
>
Tab 3
</a>
</li>
</ul>
</div>
CSS
.tabsContainer > li > a{
width: 100px;
height: 60px;
display: flex;
align-items: center;
justify-content: center;
cursor: pointer
}
.tabsContainer > li > a.active{
background-color: green;
color: #fff
}
Here is a working sandbox example: https://codesandbox.io/s/hungry-turing-fx754
Method 2: Using inline style
Another method is to add the inline style conditionally.
<div>
<ul className={"tabsContainer"}>
<li>
<a
onClick={handleClick}
id={0}
style={active === 0 ? {background:'green'} : {background:'#fff'}}
>
Tab 1
</a>
</li>
<li>
<a
onClick={handleClick}
id={1}
style={active === 1 ? {background:'green'} : {background:'#fff'}}
>
Tab 2
</a>
</li>
<li>
<a
onClick={handleClick}
id={2}
style={active === 2 ? {background:'green'} : {background:'#fff'}}
>
Tab 3
</a>
</li>
</ul>
</div>
Using class names is the better option: it lets you use CSS pseudo-selectors (such as :hover) on the active tab, and it keeps the styling in one place instead of repeating it on every element.

XPATH: Select a node whose children do not contain some text

I'm trying to select a node whose children do not contain some specific text.
For example:
<div class="b-margin">
<div class="tag">Pt</div>
<div class="tag">En</div>
</div>
<div class="b-margin">
<div class="tag">Ru</div>
<div class="tag">En</div>
</div>
How would I go about selecting the 'div class="b-margin"' nodes that do not have children with the text "Pt"?
Here is a simple XPath:
//div[@class='b-margin' and not(div[.='Pt'])]
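As a rough cross-check, the same filter can be written in Python with the standard library. This is only a sketch: the `<root>` wrapper is added here to make the fragment well-formed, and the helper name `has_child_div_with_text` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, wrapped in an assumed <root> element.
DOC = """<root>
<div class="b-margin">
<div class="tag">Pt</div>
<div class="tag">En</div>
</div>
<div class="b-margin">
<div class="tag">Ru</div>
<div class="tag">En</div>
</div>
</root>"""


def has_child_div_with_text(elem, text):
    """True if elem has a child div whose string value equals text.

    Mirrors the XPath predicate div[.='Pt'] (hypothetical helper).
    """
    return any(
        child.tag == "div" and "".join(child.itertext()) == text
        for child in elem
    )


root = ET.fromstring(DOC)

# Mirrors: //div[@class='b-margin' and not(div[.='Pt'])]
selected = [
    div for div in root.iter("div")
    if div.get("class") == "b-margin" and not has_child_div_with_text(div, "Pt")
]

tags = [[child.text for child in div] for div in selected]
print(tags)  # [['Ru', 'En']]
```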

How to get text between two strings with special characters in ruby?

I have a string (@description) that contains HTML code and I want to extract the content between two elements. It looks something like this:
<b>Content title</b><br/>
*All the content I want to extract*
<a href="javascript:print()">
I've managed to do something like this:
@want = @description.match(/Content title(.*?)javascript:print()/m)[1].strip
But obviously this solution is far from perfect, as I get some unwanted characters in my @want string.
Thanks for your help
Edit:
As requested in the comments, here is the full code:
I'm already parsing an HTML document, where the following code:
@description = @doc.at_css(".entry-content").to_s
puts @description
returns:
<div class="post-body entry-content">
<img alt="Photo title" height="333" src="http://photourl.com" width="500"><br><br><div style="text-align: justify;">
Some text</div>
<b>More text</b><br><b>More text</b><br><br><ul>
<li>Numered item</li>
<li>Numered item</li>
<li>Numered item</li>
</ul>
<br><b>Content Title</b><br>
Some text<br><br>
Some text(with links and images)<br>
Some text(with links and images)<br>
Some text(with links and images)<br>
<br><br><img src="http://url.com/photo.jpg">
<div style="clear: both;"></div>
</div>
The text can include more paragraphs, links, images, etc. but it always starts with the "Content Title" part and ends with the javascript reference.
This XPath expression selects all (sibling) nodes between the nodes $vStart and $vEnd:
$vStart/following-sibling::node()
[count(.|$vEnd/preceding-sibling::node())
=
count($vEnd/preceding-sibling::node())
]
To obtain the full XPath expression to use in your specific case, simply substitute $vStart with:
/*/b[. = 'Content Title']
and substitute $vEnd with:
/*/a[@href = 'javascript:print()']
The final XPath expression after the substitutions is:
/*/b[. = 'Content Title']/following-sibling::node()
[count(.|/*/a[@href = 'javascript:print()']/preceding-sibling::node())
=
count(/*/a[@href = 'javascript:print()']/preceding-sibling::node())
]
Explanation:
This is a simple corollary of the Kayessian formula for the intersection of two nodesets $ns1 and $ns2:
$ns1[count(.|$ns2) = count($ns2)]
In our case, the set of all nodes between the nodes $vStart and $vEnd is the intersection of two node-sets: all following siblings of $vStart and all preceding siblings of $vEnd.
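The intersection idea can be illustrated in Python with the standard library. This is a sketch under assumptions, not the answer's method: the sample document `DOC` is a shortened, hypothetical version of the question's markup, and because `xml.etree.ElementTree` keeps text in `.tail` rather than as separate nodes, the text between the two markers is read from the tails of the start element and of the elements in between.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the question's HTML (well-formed).
DOC = (
    '<div><b>More text</b><br/><b>Content Title</b><br/>Some text<br/><br/>'
    'More text<br/><a href="javascript:print()"><img src="photo.jpg"/></a>'
    '<div/></div>'
)

root = ET.fromstring(DOC)
children = list(root)

# $vStart and $vEnd from the answer.
start = next(c for c in children if c.tag == "b" and c.text == "Content Title")
end = next(
    c for c in children
    if c.tag == "a" and c.get("href") == "javascript:print()"
)

# The two node-sets: following siblings of start, preceding siblings of end.
following = children[children.index(start) + 1:]
preceding = children[:children.index(end)]

# Intersection by node identity, kept in document order.
between = [node for node in following if any(node is p for p in preceding)]

# Text between the markers: tail of start plus tails of the in-between nodes.
tails = [start.tail] + [node.tail for node in between]
texts = [t.strip() for t in tails if t and t.strip()]

print([node.tag for node in between])  # ['br', 'br', 'br', 'br']
print(texts)                           # ['Some text', 'More text']
```

The intersection is computed by identity (`is`) rather than equality, which matches the node-set semantics the formula relies on: two distinct `<br/>` elements are different nodes even though they look the same.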
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vStart" select="/*/b[. = 'Content Title']"/>
<xsl:variable name="vEnd" select="/*/a[@href = 'javascript:print()']"/>
<xsl:template match="/">
<xsl:copy-of select=
"$vStart/following-sibling::node()
[count(.|$vEnd/preceding-sibling::node())
=
count($vEnd/preceding-sibling::node())
]
"/>
==============
<xsl:copy-of select=
"/*/b[. = 'Content Title']/following-sibling::node()
[count(.|/*/a[@href = 'javascript:print()']/preceding-sibling::node())
=
count(/*/a[@href = 'javascript:print()']/preceding-sibling::node())
]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document (converted to a well-formed XML document):
<div class="post-body entry-content">
<a href="http://www.photourl">
<img alt="Photo title" height="333" src="http://photourl.com" width="500"/>
</a>
<br />
<br />
<div style="text-align: justify;">
Some text</div>
<b>More text</b>
<br />
<b>More text</b>
<br />
<br />
<ul>
<li>Numered item</li>
<li>Numered item</li>
<li>Numered item</li>
</ul>
<br />
<b>Content Title</b>
<br />
Some text
<br />
<br />
Some text(with links and images)
<br />
Some text(with links and images)
<br />
Some text(with links and images)
<br />
<br />
<br />
<a href="javascript:print()">
<img src="http://url.com/photo.jpg"/>
</a>
<div style="clear: both;"></div>
</div>
the two XPath expressions (with and without variable references) are evaluated and the nodes selected in each case, conveniently delimited, are copied to the output:
<br/>
Some text
<br/>
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
<br/>
<br/>
==============
<br/>
Some text
<br/>
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
Some text(with links and images)
<br/>
<br/>
<br/>
To test your HTML, I added <html><body> tags around your code and pasted it into a file:
xmllint --html --xpath '/html/body/div/text()' /tmp/l.html
output :
Some text
Some text
Some text
Some text
Now you can use an XPath module in Ruby and re-use the XPath expression.
You will find many examples by searching Stack Overflow.

XPath to find attributes where the name starts with a given value

With this xml:
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
<div some="r">d</div>
<div thing="t">f</div>
<div name="y">g</div>
we want to find only
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
which are those nodes having an attribute where the attribute name begins with val
You can try this:
//div/@*[starts-with(name(.), 'val')]
if you know that you are looking for the first attribute of the div element.
Edit:
Sorry, I didn't realize you wanted to select the elements themselves. You could use parent::div or what you did, but the proper way of doing this is to select the div elements directly:
//div[@*[starts-with(name(), 'val')]]
Have you tried with .../@val* ?
which are those nodes having an attribute where the attribute name
begins with val
Use:
//div[@*[starts-with(name(), 'val')]]
This selects any div element in the document, that has at least one attribute, whose name starts with the string "val".
XSLT - based verification:
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select="//div[@*[starts-with(name(), 'val')]]"/>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document (produced from the provided XML fragment):
<html>
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
<div some="r">d</div>
<div thing="t">f</div>
<div name="y">g</div>
</html>
selects and outputs the wanted nodes:
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
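The same check can be sketched in Python with the standard library, where each element's attributes are exposed as the `attrib` dictionary. This is an illustrative equivalent, not part of the original answers; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
<div val1="q">a</div>
<div val2="w">b</div>
<div val3="e">c</div>
<div some="r">d</div>
<div thing="t">f</div>
<div name="y">g</div>
</html>"""

root = ET.fromstring(DOC)

# Mirrors: //div[@*[starts-with(name(), 'val')]]
# Keep every div that has at least one attribute whose name starts with "val".
matching = [
    div for div in root.iter("div")
    if any(attr_name.startswith("val") for attr_name in div.attrib)
]

print([div.text for div in matching])  # ['a', 'b', 'c']
```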

Resources