Short circuited or - xpath

Is there a way to do a short circuited or operator in XQuery like you can do in C#, Java, JavaScript, etc?
I have some XML that can be in two formats, either:
<sometag>
<bodytext>
<section>
<bodytext>
<h1>Some html here</h1>
<h2>Some more html</h2>
</bodytext>
</section>
</bodytext>
</sometag>
or
<sometag>
<bodytext>
<h1>Some html here</h1>
<h2>Some more html</h2>
</bodytext>
</sometag>
I'm trying to return the HTML that's inside the inner bodytext tag. If I use
/sometag/bodytext/section/bodytext/*
it returns the content in the first scenario, and
/sometag/bodytext/*
returns it in the second scenario.

You can use /sometag/innermost(.//bodytext)/* if your XQuery processor supports the innermost() function (it was added in XQuery 3.0). Otherwise, use /sometag//bodytext[not(.//bodytext)]/*, which selects every bodytext that has no bodytext descendant.

You may want to use a simple if-then-else condition:
let $v1 := /sometag/bodytext/section/bodytext/*
let $v2 := /sometag/bodytext/*
return
if (fn:exists($v1))
then $v1
else (
$v2
)
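
For comparison, the same "try the first path, fall back to the second" logic can be sketched outside XQuery. This is only an illustration in Python with the standard library's ElementTree, not part of the XQuery answer; the function name inner_body_children is made up for the example. It relies on an empty list being falsy, so Python's short-circuiting `or` evaluates the second path only when the first one matches nothing.

```python
import xml.etree.ElementTree as ET


def inner_body_children(xml_text):
    """Return the children of the inner bodytext, trying the nested path first."""
    root = ET.fromstring(xml_text)  # root is the <sometag> element
    # findall returns a list; an empty list is falsy, so `or` falls through
    # to the second path only when the nested path matched nothing.
    return root.findall("bodytext/section/bodytext/*") or root.findall("bodytext/*")


nested = (
    "<sometag><bodytext><section><bodytext>"
    "<h1>Some html here</h1><h2>Some more html</h2>"
    "</bodytext></section></bodytext></sometag>"
)
flat = (
    "<sometag><bodytext>"
    "<h1>Some html here</h1><h2>Some more html</h2>"
    "</bodytext></sometag>"
)

print([e.tag for e in inner_body_children(nested)])
print([e.tag for e in inner_body_children(flat)])
```

The order of the two paths matters: for the nested format, the shorter path alone would return the section element rather than the headings.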

Related

How to remove HTML element (select by class) of string by golang?

The example below:
content := `<p>https://github.com/</p>
<div class="extract">
<p>hello1</p>
</div>
<div>hello2</div>
<div class="extract"><p>hello3</p></div>`
I want to remove every div that has class="extract", including all of its child elements.
I want to get the result below:
content := `<p>https://github.com/</p>
<div>hello2</div>`
I tried to use a regex, but it's not working.
You can use goquery to parse and modify your HTML.
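
The goquery calls themselves are not shown in the answer, so the following is not goquery code. It is a rough sketch of the same parse-then-drop idea in Python, using only the standard library's html.parser. The names ClassStripper and remove_by_class are invented for this example. It assumes reasonably well-formed markup, and it drops comments and doctype declarations.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassStripper(HTMLParser):
    """Hypothetical helper: re-emit HTML, skipping elements with a given class."""

    def __init__(self, class_name):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.class_name = class_name
        self.out = []
        self.skip_depth = 0  # > 0 while inside a removed element

    def _matches(self, attrs):
        return self.class_name in (dict(attrs).get("class") or "").split()

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or self._matches(attrs):
            if tag not in VOID_TAGS:
                self.skip_depth += 1  # track nesting inside the removed subtree
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skip_depth and not self._matches(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def remove_by_class(html_text, class_name):
    parser = ClassStripper(class_name)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


content = """<p>https://github.com/</p>
<div class="extract">
<p>hello1</p>
</div>
<div>hello2</div>
<div class="extract"><p>hello3</p></div>"""

print(remove_by_class(content, "extract").strip())
```

Tracking a depth counter rather than matching text is what lets nested elements inside the removed div disappear with it, which is the part a regex tends to get wrong.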

Obtaining a partial value from XPath

I have the current HTML code:
<div class="group">
<ul class="smallList">
<li><strong>Date</strong>
13.06.2019
</li>
<li>...</li>
<li>...</li>
</ul>
</div>
and here is my "wrong" XPath:
//div[@class='group']/ul/li[1]
I would like to extract the date with XPath, without the text in the strong tag, but I'm not sure how NOT is used in XPath, or whether it can even be used here.
Keep in mind that the date is dynamic.
Use substring-after() to get the date value:
substring-after(//div[@class='group']/ul/li[1],'Date')
This returns everything that follows "Date" in the first list item, including the surrounding whitespace.
The easiest way to get the date is the following XPath 1.0 expression, which selects the first non-blank text node that is a direct child of the li:
//div[@class='group']/ul/li[1]/text()[normalize-space(.)][1]
The result still includes the surrounding whitespace.
If you want to get rid of that too, use the following expression:
normalize-space(//div[@class='group']/ul/li[1]/text()[normalize-space(.)][1])
Unfortunately, this only works for a single result in XPath 1.0.
If you have XPath 2.0 available, you can append normalize-space() as the last step of the expression, which also allows processing multiple results:
//div[@class='group']/ul/li[1]/text()[normalize-space(.)][1]/normalize-space()
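
To make the "direct text nodes only" idea concrete, here is a small sketch in Python with the standard library's ElementTree. It assumes the snippet is well-formed XML (real-world HTML would need an HTML parser), and the helper name own_text is made up for this example. ElementTree stores an element's direct text in .text (before the first child) and in each child's .tail (after that child's closing tag), so joining those skips the text inside strong.

```python
import xml.etree.ElementTree as ET

snippet = """<div class="group">
<ul class="smallList">
<li><strong>Date</strong>
13.06.2019
</li>
<li>...</li>
<li>...</li>
</ul>
</div>"""


def own_text(element):
    """Join the text nodes that are direct children of element, ignoring child elements."""
    parts = [element.text or ""]
    # Text that follows a child's closing tag is stored in that child's .tail.
    parts.extend(child.tail or "" for child in element)
    return "".join(parts).strip()


root = ET.fromstring(snippet)   # root is the <div class="group">
first_li = root.find("ul/li")   # first <li> under the <ul>
print(own_text(first_li))
```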
Here is a Python method that reads the text directly from the parent element; in your case the text belongs to ul/li.
Python:
def get_text_exclude_children(element):
    # `driver` is assumed to be an existing Selenium WebDriver instance.
    return driver.execute_script(
        """
        var parent = arguments[0];
        var child = parent.firstChild;
        var textValue = "";
        // Concatenate only the text nodes that are direct children of the parent.
        while (child) {
            if (child.nodeType === Node.TEXT_NODE)
                textValue += child.textContent;
            child = child.nextSibling;
        }
        return textValue;""",
        element).strip()
This is how to call it in your case:
ulEle = driver.find_element_by_xpath("//div[@class='group']/ul/li[1]")
datePart = get_text_exclude_children(ulEle)
print(datePart)
Please feel free to convert this to the language you are using, if it's not Python.

PHP possibilities in Slim Template language for Ruby

In PHP I can do this:
<div class="foo <?php if($a) echo "bar"; ?>">
<?php if ($b) echo "</div>"; ?>
It is incredibly convenient. I can break a string in any place, between any quotes, between any HTML symbols, just wherever I want.
I need to implement the same thing in Ruby. I'm porting a PHP project to Ruby and I use the Slim template language. I tried this, but it doesn't work; Slim throws errors:
<div class="foo
- if (x == 1)
= bar
"></div>
For now with Slim I know only one way:
- if (a == true)
  <div class="foo"></div>
- else
  <div class="foo bar"></div>
First, this duplicates markup. Second, the HTML/PHP part of my code is quite complicated: it has two loops (a for loop with a foreach loop inside it), and I use more than one such embed to add div attributes according to conditions. I just cannot imagine how to implement that with Slim. It throws an error even for this, where I only break a long HTML string across lines:
- if(i != 5)
<div class="foo bar"
id="item_#{i}"
style="background-color:red;"
data-im="baz">
</div>
- else
Does Slim allow breaking strings with conditionals between quotes or between element attributes? How can I do it?
If you're using Rails, you can use ActionView::Helpers this way:
= content_tag :div, class: (a == true ? "foo bar" : "foo") do
  | inside div
Otherwise, you can write a helper method to cover this logic for you.
That said, putting much logic in a view is considered bad practice. Consider using a Presenter pattern.
Edit:
Looking into the Slim docs, I found you can achieve your goal this way:
div.foo class="#{'bar' if a == true}"
  | Text inside div

angular filtering decode characters

In Angular, if
$scope.myStr = '&trade;';
then {{myStr}} yields '&trade;' instead of the TM mark. How would I solve this issue using a filter?
In some cases &amp;trade; also appears, so I definitely need a filter to do the decoding, and eventually I want to be able to output the result with {{}} without DOM manipulation.
You can use ngBindHtmlUnsafe (the ng-bind-html-unsafe attribute): http://jsfiddle.net/Xnp3J/
<div ng-app ng-controller="x">
<span ng-bind-html-unsafe="myStr"></span>
</div>
-
function x($scope) {
    $scope.myStr = '&trade;';
}
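
The answer above covers the singly encoded case. For the doubly encoded &amp;trade; case from the question, the decoding step a filter would need is "unescape until nothing changes". The sketch below only illustrates that logic in Python with the standard library's html module; it is not Angular code, and the function name decode_entities and the max_rounds cap are assumptions made for the example.

```python
import html


def decode_entities(text, max_rounds=5):
    """Unescape HTML entities repeatedly until the text stops changing."""
    for _ in range(max_rounds):  # cap the rounds so odd input cannot loop forever
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text


print(decode_entities("&trade;"))
print(decode_entities("&amp;trade;"))
```

Note that decoding repeatedly is only safe when the text is known to be over-escaped; applied to arbitrary user input it can turn harmless text into markup.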

Need php function to pull value from string

The string
<div id="main">
content (is INT)
<div>some more content (is not INT) other content (also INT)</div>
</div>
I need to get the content which is an INT. Simply stripping all non-INT characters will not work, since "other content" is sometimes also an INT. I cannot use a child-selector solution, since the value is always outside the inner div, and selecting the content of <div id="main"> will also select the inner div.
So is there a solution that searches the string from the start for the first < and removes the rest of the string once it is found?
(The structure cannot be altered)
If that's the exact format, you could just use substr and strpos, something like:
$html = '<div id="main">
12345
<div>foobar6789</div>
</div>
';
$content_1 = substr($html,15,strpos($html,'<div>')-15); //the first INT content
$subdiv = str_replace("</div>","",substr($html,strpos($html,'<div>')+5));
preg_match('/(?P<noint>[^0-9]+)(?P<digit>\d+)/', $subdiv, $matches);
echo $matches['noint'];//the NO INT content
echo $matches['digit'];//the second INT
It's not a good idea to parse HTML using regular expressions, but maybe you could do it using only preg_match.
good luck!
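
The "cut at the first < after the opening tag" idea from the question can also be written without the hard-coded offset of 15. Here is a minimal sketch in Python rather than PHP; the function name first_text_in_main is made up for the example, and it assumes the opening tag appears exactly as <div id="main">.

```python
def first_text_in_main(html_text, opening='<div id="main">'):
    """Return the text between the opening tag and the next '<', or None if the tag is absent."""
    _, found, rest = html_text.partition(opening)
    if not found:
        return None
    # Everything up to the next tag is the leading text of the main div.
    return rest.partition("<")[0].strip()


html_text = """<div id="main">
12345
<div>foobar6789</div>
</div>
"""

value = first_text_in_main(html_text)
print(value)
```

Because the function splits on the tag text instead of counting characters, it keeps working if the leading content changes length.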
