I am trying to write an XPath query in my Wix installer to select node with zero child elements.
My Element path looks like this as it's a formatted string and I escape square braces.
ElementPath="a/b[\[]not(node())[\]"
whieh I believe translates to
a/b[not(node())]
My xml is of the form
<a>
<b><x/><y/></b>
<b/>
<b><z/></b>
</a>
and I expect my Element path to select the second element which has no children. Bit when I run light on my installer, it throws an invalid format error. Please suggest where I am going wrong.
Also , if I have to select a node with specific value,
would this be the right way?
ElementPath="a/b[\[]z[\]"
so that I select the third node as child element
I think you need another closing square bracket in the expression :
ElementPath="a/b[\[]not(node())[\]]"
Notice this part for escaping opening square bracket :
[\[]
And another important part for escaping closing square bracket :
[\]]
You can see each of the two important parts I mentioned above as escaped square bracket part \] enclosed within another square brackets []. Put it all together and it become [\]].
Related
There is a page like the following:
<html>
<head></head>
<body>
<p> 5-8 </p>
<p></br>5-8</br></p>
<p> </br>5-8 </br></p>
</body>
</html>
The goal is to abstract the text in each p, the breaks and whitespaces are not wanted.
How to achieve that?
Thanks in advance! Best Wishes!
--The first Updating
Another post suggested using normalize_space(). I tried that, well, It can remove the spaces. However, only one node is left. How can I get all 30 node text without unwanted spaces? Thanks in advance and Best wishes!
enter image description here
It's not possible to achieve what you want entirely in XPath 1.0, but in XPath 2.0 or later it is possible.
You don't say what XPath interpreter you have available but you mention Chrome's XPath Helper which relies on Chrome's built in XPath interpreter which supports XPath 1.0 (as is the norm for web browsers).
But it's possible you are just using Chrome to examine the data, and have another, more modern XPath interpreter such as e.g. Saxon. If so, an XPath 2.0 solution will work for you, though you won't be able to use it in Chrome, obviously.
I've tidied up your XML example:
<html>
<head></head>
<body>
<p> 5-8 </p>
<p><br/>5-8<br/></p>
<p> <br/>5-8 <br/></p>
</body>
</html>
NB those are non-breaking spaces there.
In XPath 2.0:
for $paragraph in //p
return normalize-space(
translate($paragraph, codepoints-to-string(160), ' ')
)
NB this uses the translate function to convert non-breaking spaces (the char with Unicode codepoint 160) to a space, and then uses normalize-space to trim leading and trailing whitespace (I'm not sure what you would want to do if there were whitespace in the middle of the para, instead of just at the start or end; this will convert any such sequence of whitespace to a single space character). You might think normalize-space would be enough, but in fact a non-breaking-space doesn't fall into normalize-space's category of "white space" so it would not be trimmed.
In XPath 1.0 is not exactly possible to do what you want. You could use an XPath expression that would return each p element to your host language, and then iterate over those p elements, executing a second XPath expression for each one, with that p as the context. Essentially this mean moving the for ... in ... return iterator from XPath into your host language. To select the paragraphs:
//p
... and then for each one:
normalize-space(
translate(., ' ', ' ')
)
NB in that expression, the first string literal is a non-breaking-space character, and the second is a space. XPath 1.0 doesn't have the codepoints-to-string function or I'd have used that, for clarity.
The . which is the first parameter to the translate function represents the context node (the current node). When you execute this XPath expression in your host language you need to pass one of the p elements as the context node. You don't say what host language you're using, but in JavaScript, for instance, you could use the document.evaluate function to execute the first XPath, receiving an iterator of p elements. Then for each element, you'd call its evaluate method to execute the second XPath, and that would ensure that the p element was the context node for the XPath (i.e. the . in the expression).
I have a text with many expressions like this <.....>, e.g.:
<..> Text1 <.sdfdsvd> Text 2 <....dgdfg> Text3 <...something> Text4
How can I eliminate now all brackets <...> and all commands/texts between these brackets? But the other "real" text between these (like text1, text2 above) should not be touched.
I tried with the regular expression:
<.*>
But this finds also a block like this, including the inbetween text:
<..> Text1 <.sdfdsvd>
My second try was to search for alle expressions <.> without a third bracket between these two, so I tried:
<.*[^>^<]>
But that does not work either, no change in behavior. How to construct the needed expression correctly?
This works in Notepad++:
Find what: <[^>]+?>
Replace with: nothing
Try it out: http://regex101.com/r/lC9mD4
There are a few problems with your attempt: <.*[^>^<]>
.* matches all characters up through the final possible match. This means that all tags except the last will be bypassed. This is called greedy. In my solution, I have changed it to possessive, which goes up to the first possible match: .*?...although I apply this to the character class itself: [^>]+?.
[^>^<] is incorrect for two reasons, one small, one big. The small reason is that the first caret ^ says "do not match any of the following characters", and the characters following it are >, ^, and <. So you are saying you don't want to match the caret character, which is incorrect (but not harmful). The larger problem is that this is attempting to match exactly one character, when it needs to be one or more, which is signified by the plus sign: [^><]+.
Otherwise, your attempt is not that far off from my solution.
This seems to work:
<[^\s]*>
It looks for a left bracket, then anything that isn't whitespace between the brackets, then a right bracket. It would need some adjusting if there's whitespace between the brackets (<text1 text2>), though, and at that point a modification of one of your attempts would work better:
<[^<^>]*>
This one looks for a left bracket, then anything that isn't a left bracket or right bracket, then a right bracket.
Try <.*?>. If you don't use the "?", regular expressions will try to find the longest string that matches. Using "*?" will force to find the shortest.
I am looking for a regex to replace all terms in parentheses unless the parentheses are within square brackets.
e.g.
(matches) #match
[(do not match)] #should not match
[[does (not match)]] #should not match
I current have:
[^\]]\([^()]*\) #Not a square bracket, an opening bracket, any non-bracket character and a closing bracket.
However this is still matching words within the square brackets.
I have also created a rubular page of my progress so far: http://rubular.com/r/gG22pFk2Ld
A regex is not going to cut it for you if you can nest the square brackets (see this related question).
I think you can only do this with a regex if (a) you only allow one level of square brackets and (b) you assume all square brackets are properly matched. In that case
\([^()]*\)(?![^\[]*])
is sufficient - it matches any parenthesised expression not followed by an unpaired ]. You need (b) because of the limitations of negative lookbehind (only fixed length strings in 1.9, and not allowed at all in 1.8), which mean you are stuck matching (match)] even if you don't want to.
So basically if you need to nest, or to allow unmatched brackets, you should ditch the regex and look at the answer to the question I linked to above.
This is a type of expression you cannot parse using a pure-regex approach, because you need to keep track of the current nesting/state_if_in_square_bracket (so you don't have a type 3 language anymore).
However, depending on the exact circumstances, you can parse it with multiple regexes or simple parsers. Example approaches:
Split into sub-strings, delimited by
[/[[or ]/]], change the state
when such a square bracket is
encountered, replace () in a
sub-string if in
"not_in_square_bracket" state
Parse for square brackets (including content), remove & remember them (these are "comments"), now replace all the content in normal brackets and re-add the square brackets stuff (you can remember stuff by using unique temp strings)
The complexity of your solution also depends on the detail if escaping ] is allowed.
I was writing an XPath expression, and I had a strange error which I fixed, but what is the difference between the following two XPath expressions?
"//td[starts-with(normalize-space()),'Posted Date:')]"
and
"//td[starts-with(normalize-space(text()),'Posted Date:')]"
Mainly, what will the first XPath expression catch? Because I was getting a lot of strange results. So what does the text() make in the matching? Also, is there is a difference if I said normalize-space() & normalize-space(.)?
Well, the real question is: what's the difference between . and text()?
. is the current node. And if you use it where a string is expected (i.e. as the parameter of normalize-space()), the engine automatically converts the node to the string value of the node, which for an element is all the text nodes within the element concatenated. (Because I'm guessing the question is really about elements.)
text() on the other hand only selects text nodes that are the direct children of the current node.
So for example given the XML:
<a>Foo
<b>Bar</b>
lish
</a>
and assuming <a> is your current node, normalize-space(.) will return Foo Bar lish, but normalize-space(text()) will fail, because text() returns a nodeset of two text nodes (Foo and lish), which normalize-space() doesn't accept.
To cut a long story short, if you want to normalize all the text within an element, use .. If you want to select a specific text node, use text(), but always remember that despite its name, text() returns a nodeset, which is only converted to a string automatically if it has a single element.
I need to locate the node within an xml file by its value using XPath.
The problem araises when the node to find contains value with whitespaces inside.
F.e.:
<Root>
<Child>value</Child>
<Child>value with spaces</Child>
</Root>
I can not construct the XPath locating the second Child node.
Simple XPath /Root/Child perfectly works for both children, but /Root[Child=value with spaces] returns an empty collection.
I have already tried masking spaces with %20, & #20;, & nbsp; and using quotes and double quotes.
Still no luck.
Does anybody have an idea?
Depending on your exact situation, there are different XPath expressions that will select the node, whose value contains some whitespace.
First, let us recall that any one of these characters is "whitespace":
-- the Tab
-- newline
-- carriage return
' ' or -- the space
If you know the exact value of the node, say it is "Hello World" with a space, then a most direct XPath expression:
/top/aChild[. = 'Hello World']
will select this node.
The difficulties with specifying a value that contains whitespace, however, come from the fact that we see all whitespace characters just as ... well, whitespace and don't know if a it is a group of spaces or a single tab.
In XPath 2.0 one may use regular expressions and they provide a simple and convenient solution. Thus we can use an XPath 2.0 expression as the one below:
/*/aChild[matches(., "Hello\sWorld")]
to select any child of the top node, whose value is the string "Hello" followed by whitespace followed by the string "World". Note the use of the matches() function and of the "\s" pattern that matches whitespace.
In XPath 1.0 a convenient test if a given string contains any whitespace characters is:
not(string-length(.)= stringlength(translate(., '
','')))
Here we use the translate() function to eliminate any of the four whitespace characters, and compare the length of the resulting string to that of the original string.
So, if in a text editor a node's value is displayed as
"Hello World",
we can safely select this node with the XPath expression:
/*/aChild[translate(., '
','') = 'HelloWorld']
In many cases we can also use the XPath function normalize-space(), which from its string argument produces another string in which the groups of leading and trailing whitespace is cut, and every whitespace within the string is replaced by a single space.
In the above case, we will simply use the following XPath expression:
/*/aChild[normalize-space() = 'Hello World']
Try either this:
/Root/Child[normalize-space(text())=value without spaces]
or
/Root/Child[contains(text(),value without spaces)]
or (since it looks like your test value may be the issue)
/Root/Child[normalize-space(text())=normalize-space(value with spaces)]
Haven't actually executed any of these so the syntax may be wonky.
Locating the Attribute by value containing whitespaces using XPath
I have a input type element with value containing white space.
eg:
<input type="button" value="Import Selected File">
I solved this by using this xpath expression.
//input[contains(#value,'Import') and contains(#value ,'Selected')and contains(#value ,'File')]
Hope this will help you guys.
"x0020" worked for me on a jackrabbit based CQ5/AEM repository in which the property names had spaces. Below would work for a property "Record ID"-
[(jcr:contains(jcr:content/#Record_x0020_ID, 'test'))]
did you try #x20 ?
i've googled this up like on the second link:
try to replace the space using "x0020"
this seems to work for the guy.
All of the above solutions didn't really work for me.
However, there's a much simpler solution.
When you create the XMLDocument, make sure you set PreserveWhiteSpace property to true;
XmlDocument xmldoc = new XmlDocument();
xmldoc.PreserveWhitespace = true;
xmldoc.Load(xmlCollection);