Preserve spaces from attribute values with xpath

Preserve spaces from attribute values with xpath - xpath

Please consider this file: http://www.w3schools.com/dom/books.xml where line:
<book category="children">
is replaced with:
<book category=" children ">
Executing xpath query in vbscript for category attribute value:
For Each n In objXML.selectNodes("//book/#category")
WScript.Echo n.text
Next
returns result where leading and trailing spaces are removed:
children
This doesn't happen with any other xpath evaluator I have tried.
So is it possible MS to return attribute value as it is, without removing spaces?

It's possible through value property instead text.
You'd like to know the differences between those two properties.
Please read the reference, especially Remarks sections.
value Property
text Property
As you can see, in this case main differences between value and text is that text normalizes, value doesn't.
For Each n In objXML.selectNodes("//book/#category")
WScript.Echo n.text
WScript.Echo n.value
Next

Related

Classic ASP InStr() Evaluates True on Empty Comparison String

I ran into an issue with the Classic ASP VbScript InStr() function. As shown below, the second call to InStr() returns 1 when searching for an empty string in a non empty string. I'm curious why this is happening.
' InStr Test
Dim someText : someText = "So say we all"
Dim emptyString : emptyString = ""
'' I expect this to be true
If inStr(1,someText,"so",1) > 0 Then
Response.write ( "I found ""so""<br />" )
End If
'' I expect this to be false
If inStr(1, someText, emptyString, 1) > 0 Then
Response.Write( "I found an empty string<br />" )
End If
EDIT:
Some additional clarification: The reason for the question came up when debugging legacy code and running into a situation like this:
Function Go(value)
If InStr(1, "Option1|Option2|Option3", value, 1) > 0 Then
' Do some stuff
End If
End Function
In some cases function Go() can get called with an empty string. The original developer's intent was not to check whether value was empty, but rather, whether or not value was equal to one of the piped delimited values (Option1,Option2, etc.).
Thinking about this further it makes sense that every string is created from an empty string, and I can understand why a programming language would assume a string with all characters removed still contains the empty string.
What doesn't make sense to me is why programming languages are implementing this. Consider these 2 statements:
InStr("so say we all", "s") '' evaluates to 1
InStr("so say we all", "") '' evaluates to 1
The InStr() function will return the position of the first occurrence of one string within another. In both of the above cases, the result is 1. However, position 1 always contains the character "s", not an empty string. Furthermore, using another string function like Len() or LenB() on an empty string alone will result in 0, indicating a character length of 0.
It seems that there is some inconsistency here. The empty string contained in all strings is not actually a character, but the InStr() function is treating it as one when other string functions are not. I find this to be un-intuitive and un-necessary.

The Empty String is the Identity Element for Strings:
The identity element I (also denoted E, e, or 1) of a group or related
mathematical structure S is the unique element such that Ia=aI=a for
every element a in S. The symbol "E" derives from the German word for
unity, "Einheit." An identity element is also called a unit element.
If you add 0 to a number n the result is n; if you add/concatenate "" to a string s the result is s:
>> WScript.Echo CStr(1 = 1 + 0)
>> WScript.Echo CStr("a" = "a" & "")
>>
True
True
So every String and SubString contains at least one "":
>> s = "abc"
>> For p = 1 To Len(s)
>> WScript.Echo InStr(p, s, "")
>> Next
>>
1
2
3
and Instr() reports that faithfully. The docs even state:
InStr([start, ]string1, string2[, compare])
...
The InStr function returns the following values:
...
string2 is zero-length start
WRT your
However, position 1 always contains the character "s", not an empty
string.
==>
Position 1 always contains the character "s", and therefore an empty
string too.

I'm puzzled why you think this behavior is incorrect. To the extent that asking Does 'abc' contain ''? even makes sense, the answer has to be "yes": All strings contain the empty string as a trivial case. So the answer to your "why is this happening" question is because it's the only sane thing to do.

It is s correct imho. At least it is what I expect that empty string is part of any other string. But maybe this is a philosophical question. ASP does it so, so live with it. Practically speaking, if you need a different behavior write your own Method, InStrNotEmpty or something, which returns false on empty search string.

RegEx to remove new line characters and replace with comma

I scraped a website using Nokogiri and after using xpath I was left with the following string (which is a few td's pushed into one string).
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t"
My goal is to make this into an array that looks like the following(it will be a nested array):
["Total First Downs", "359", "274"]
The issue is creating a regex equation that removes the escaped characters, subs in one "," but does not sub in a "," after the last set of integers. If the comma after the last set of integers is necessary, I could use #compact to get rid of the nil that occurs in the array. If you need the code on how I scraped the website here it is: (please note i saved the webpage for testing in order for my ip address to not get burned during the trial phase)
f = File.open('page')
doc = Nokogiri::HTML:(f)
f.close
number = doc.xpath('//tr[#class="tbdy1"]').count
stats = Array.new(number) {Array.new}
i = 0
doc.xpath('//tr[#class="tbdy1"]').each do |tr|
stats[i] << tr.text
i += 1
end
Thanks for your help

I don't fully understand your problem, but the result can be easily achieved with this:
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t"
.split(/[\n\t]+/)
# => ["Total First Downs", "359", "274"]

Try with gsub
"Total First Downs\n\t\t\t\t\t\t\t\t359\n\t\t\t\t\t\t\t\t274\n\t\t\t\t\t\t\t".gsub("/[\n\t]+/",",")

Xpath with htmlagilitypack

I am try to select the "string b" text node using XPath with the HtmlAgilliyPack.
<div>
string a<br/>
string b<br/>
string c<br/>
</div>
I am not sure how to select the text?
This won't work //div/text(1)
Anybody has some suggestions?

There are two problems with your expression:
XPath starts counting at 1, so you want the second text node
text() is a node filter which does not accept arguments. If you want to limit to the second text node, use the predicate [position() = 2] or the short version [2].
Use this expression:
//div/text()[2]
Selecting text nodes can include some hassles, chopping leading and trailing whitespace and omitting whitespace-only text nodes is implementation-dependent.

Try:
//div/br[1]/following-sibling::text()[1]'
The direct following text after the first br.

XPath 2.0:reference earlier context in another part of the XPath expression

in an XPath I would like to focus on certain elements and analyse them:
...
<field>aaa</field>
...
<field>bbb</field>
...
<field>aaa (1)</field>
...
<field>aaa (2)</field>
...
<field>ccc</field>
...
<field>ddd (7)</field>
I want to find the elements who's text content (apart from a possible enumeration, are unique. In the aboce example that would be bbb, ccc and ddd.
The following XPath gives me the unique values:
distinct-values(//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '(')))
Now I would like to extent that and perform another XPath on all the distinct values, that would be to count how many field start with either of them and retreive the ones who's count is bigger than 1.
These could be a field content that is equal to that particular value, or it starts witrh that value and is followed by " (". The problem is that in the second part of that XPath I would have refer to the context of that part itself and to the former context at the same time.
In the following XPath I will - instead of using "." as the context- use c_outer and c_inner:
distinct-values(//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '(')))[count(//field[(c_inner = c_outer) or starts-with(c_inner, concat(c_outer, ' ('))]) > 1]
I can't use "." for both for obvious reasons. But how could I reference a particular, or the current distinct value from the outer expression within the inner expression?
Would that even be possible?

XQuery can do it e.g.
for $s
in distinct-values(
//field[matches(normalize-space(.), ' \([0-9]\)$')]/substring-before(., '(')))
where count(//field[(. = $s) or starts-with(., concat($s, ' ('))]) > 1
return $s

XPath - find first occurance of string

I'm trying to select an anchor element by first containing the text "To Be Coded", then extracting a number from a string using substring, then using the greater than comparison operator (>0). This is what I have thus far:
/a[number(substring(text(),???,string-length()-1))>0]
An example of the HTML is:
<a class="" href="javascript:submitRequest('getRec','30', '63', 'Z')">
To Be Coded (23)
</a>
My issue right now is I don't know how to find the first occurrence of the open parenthesis. I'm also not sure how to combine what I have with the contains(text(),"To Be Coded") function.
So my criteria for the selection is:
Must be an anchor element
Must include the text "To Be Coded"
Must contain a number greater than 0 in the parentheses
Edit: I suppose I could just "hard code" the starting position for the substring, but I'm not sure what that would be - will XPath count the white space before the text in the element? How would it handle/count the characters?

Here try this :
a[contains(., 'To Be Coded') and number(substring-before(substring-after(., '('), ')')) > 0]

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Preserve spaces from attribute values with xpath - xpath

Related

Classic ASP InStr() Evaluates True on Empty Comparison String

RegEx to remove new line characters and replace with comma

Xpath with htmlagilitypack

XPath 2.0:reference earlier context in another part of the XPath expression

XPath - find first occurance of string

Categories

Resources