Replacement for descendant-or-self - xpath

I have an XPath expression $x/descendant-or-self::*/@y which I have changed to $x//@y, as it improved performance.
Does this change have any other impact?

As explained in the W3C XPath Recommendation, // is short-hand for /descendant-or-self::node()/, so there is a slight difference: node() versus *. But since attributes can only occur on elements, I think this replacement is safe.
That might also explain why you see a performance boost, since MarkLogic needs to worry less about whether there really are elements in between.
HTH!
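For what it's worth, here is a minimal sketch (using Python's lxml rather than MarkLogic, against a made-up document) showing that the two expressions select the same attribute nodes:

    from lxml import etree

    # Hypothetical document; 'y' attributes appear at several depths.
    doc = etree.XML('<root y="1"><a y="2"><b y="3"/></a><c/></root>')

    long_form = doc.xpath('descendant-or-self::*/@y')  # explicit axis, elements only
    short_form = doc.xpath('.//@y')                    # // expands to descendant-or-self::node()

    print(long_form)   # ['1', '2', '3']
    print(short_form)  # ['1', '2', '3'] - identical, since attributes only sit on elements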

Related

How can I optimize an XPath expression?

Is there any way I can shorten the following condition used in an XPath expression?
(../parent::td or ../parent::ol or ../parent::ul)
The version of XPath is 1.0.
The shortest is probably
../..[self::td|self::ol|self::ul]
Whether there is a performance difference between "|" and "or" will depend on the processor, but I suspect that in most cases it won't be noticeable. For performance, the important thing is to put the conditions in the right order (the one most likely to return true should come first). Factoring out the navigation to the grandparent should almost certainly help performance, with the caveats (a) your XPath engine may do this optimization automatically, and (b) the difference will probably be so tiny you will have trouble measuring it.
Use the '|' operator.
(../parent::td|../parent::ol|../parent::ul)
Slightly shorter:
../..[self::td or self::ol or self::ul]
Example usage:
//p[../..[self::td or self::ol or self::ul]]
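As a quick sanity check (a sketch using Python's lxml, an XPath 1.0 engine, against a hypothetical document), the original and shortened predicates select the same elements:

    from lxml import etree

    doc = etree.XML(
        '<root>'
        '<td><div><p>in td</p></div></td>'
        '<ul><li><p>in ul</p></li></ul>'
        '<div><div><p>elsewhere</p></div></div>'
        '</root>'
    )

    original  = doc.xpath('//p[../parent::td or ../parent::ol or ../parent::ul]')
    shortened = doc.xpath('//p[../..[self::td or self::ol or self::ul]]')

    print([p.text for p in original])   # ['in td', 'in ul']
    print([p.text for p in shortened])  # the same two elements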

Is the :last-child selector bad for performance?

I use the :last-child selector plenty of times, mostly when using border-bottom in a list where I use border: none; for the last child, or when using margins. So my question is, is the :last-child selector bad from a performance point of view?
Also I've heard that it was removed from the CSS2 specification because using :first-child is easy for the browser to detect, but for detecting :last-child it needs to loop back.
If it was deferred from CSS2 for performance concerns, but reintroduced in Selectors 3, I suspect that it's because performance is no longer an issue as it previously was.
Remember that :last-child is the definitive and only way to select the last child of a parent (besides :nth-last-child(1), obviously). If browser implementers no longer have performance concerns, neither should we as authors.
The only compelling reason I can think of for overriding a border style with :first-child as opposed to :last-child is to allow compatibility with IE7 and IE8. If that boosts performance, let that be a side effect. If you don't need IE7 and IE8 support, then you shouldn't feel compelled to use :first-child over :last-child. Even if browser performance is absolutely critical, you should be addressing it the proper way by testing and benchmarking, not by premature optimization.
In general, the core CSS selectors perform well enough that most of us should not be worried about using them. Yes, some of them do perform worse than others, but even the worst performing ones are unlikely to be the main bottleneck in your site.
Unless you've already optimised everything else to perfection, I would advise not worrying about this. Use a profiling tool like YSlow to find the real performance issues on your site and fix those.
In any case, even if there is a noticeable performance implication for a given CSS selector (or any other browser feature), I would say that it's the browser makers' responsibility to fix it, not yours to work around it.
I believe it's still the simplest, cheapest way to get your last child.
By that I mean that any other way of getting the last child will be worse for performance, because it won't benefit from the work the W3C community has already put into :last-child.

XPath performance & versions

I have 3 questions:
1) Is XPath string "//table[position()=8 or position()=10]/td[1]/span[2]/text()" faster than the XPath string "//table[8]/td[1]/span[2]/text() | //table[10]/td[1]/span[2]/text()"?
I use XPath with .NET CSharp and HTMLAgilityPack.
2) How can I determine which version of XPath I am using? If I am using XPath 1.0, how do I upgrade to XPath 2.0?
3) Is there any performance optimization or improvement in XPath 2.0, or just new features and new syntax?
XPath 2.0 expands significantly on XPath 1.0 (read here for a summary), though you don't need to switch unless you would benefit from the new functionality.
As for which one would be faster, I believe the first one would be, because you're repeating the node search in the second case. The first case is also more readable, and in general you want to go with the more readable one anyway.
As to the performance question, I'm afraid I don't know. It depends on the optimizer in the particular XPath processor you are using. If it's important to you, measure it. If it's not important enough to measure, then it's not important enough to worry about.
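For what it's worth, "measure it" can look roughly like this (a sketch using Python's lxml rather than HtmlAgilityPack/.NET, against a made-up document, so the numbers won't transfer to your setup):

    import timeit
    from lxml import etree

    page = etree.XML(
        '<body>'
        + '<table><td><span>a</span><span>b</span></td></table>' * 12
        + '</body>'
    )

    single = "//table[position()=8 or position()=10]/td[1]/span[2]/text()"
    union  = "//table[8]/td[1]/span[2]/text() | //table[10]/td[1]/span[2]/text()"

    print(page.xpath(single), timeit.timeit(lambda: page.xpath(single), number=2000))
    print(page.xpath(union),  timeit.timeit(lambda: page.xpath(union),  number=2000))
    # Repeat the same experiment in the XPath processor you actually use.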
As I mentioned in my previous reply, //table[8] smells wrong to me. I think it's much more likely that you want (//table)[8]. (Both are valid XPath expressions, but they produce different answers).
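To make that difference concrete (a sketch in Python's lxml with a hypothetical document, where each table carries its document-order position in an attribute):

    from lxml import etree

    # 3 tables in the first div, 10 in the second.
    first  = ''.join('<table n="%d"/>' % i for i in range(1, 4))
    second = ''.join('<table n="%d"/>' % i for i in range(4, 14))
    doc = etree.XML('<body><div>%s</div><div>%s</div></body>' % (first, second))

    # //table[8]: every table that is the 8th table among its siblings
    print(doc.xpath('//table[8]/@n'))    # ['11'] - the 8th table inside the second div
    # (//table)[8]: the 8th table in the whole document
    print(doc.xpath('(//table)[8]/@n'))  # ['8']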
You can probably assume that a processor is XPath 1.0 unless it says otherwise - if it supports 2.0, they'll want you to know. But you can easily test, for example by seeing what happens when you do //a except //b.
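Here is a sketch of that probe (in Python's lxml, which only implements XPath 1.0 and therefore takes the "reject" branch):

    from lxml import etree

    doc = etree.XML('<r><a/><b/></r>')
    try:
        doc.xpath('//a except //b')    # 'except' only exists in XPath 2.0 and later
        print('XPath 2.0 or later')
    except etree.XPathEvalError:
        print('XPath 1.0 only')        # lxml ends up here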
There's no intrinsic reason why an XPath 2.0 processor should be faster than a 1.0 processor on the same queries. In fact, it might be a bit slower, because it's required to do more careful type-checking. On the other hand it might be a lot faster, because many 1.0 processors were dashed off very quickly and never upgraded. But there are massive improvements in functionality in 2.0, for example regular expression support.

Performance of XPath vs DOM

Could anyone enlighten me with a comprehensive performance comparison between XPath and DOM in different scenarios? I've read some questions on SO, like "xPath vs DOM API, which one has a better performance" and "XPath or querySelector?". None of them mentions specific cases. Here are some things I could start with.
No iteration involved. getElementById(foobar) vs //*[@id='foobar']. Is the former consistently faster than the latter? What if the latter is optimized, e.g. /html/body/div[@id='foo']/div[@id='foobar']?
Iteration involved. getElementByX then traverse through child nodes vs XPath generate snapshot then traverse through snapshot items.
Axis involved. getElementByX then traverse for next siblings vs //following-sibling::foobar.
Different implementations. Different browsers and libraries implement XPath and DOM differently. Which browser's implementation of XPath is better?
As the answer in "xPath vs DOM API, which one has a better performance" says, an average programmer may screw up when implementing complicated tasks (e.g. ones involving multiple axes) the DOM way, while XPath is guaranteed to be optimized. Therefore, my question only cares about the simple selections that can be done in both ways.
Thanks for any comment.
XPath and DOM are both specifications, not implementations. You can't ask questions about the performance of a spec, only about specific implementations. There's at least a ten-to-one difference between a fast XPath engine and a slow one: and they may be optimized for different things, e.g. some spend a lot of time optimizing a query on the assumption it will be executed multiple times, which might be the wrong thing to do for single-shot execution. The one thing one can say is that the performance of XPath depends more on the engine you are using, and the performance of DOM depends more on the competence of the application programmer, because it's a lower-level interface. Of course all programmers consider themselves to be much better than average...
This page has a section where you can run tests to compare the two and see the results in different browsers. For instance, for Chrome, xpath is 100% slower than getElementById.
See getElementById vs QuerySelector for more information.
I agree with Michael that it may depend on the implementation, but I would generally say that DOM is faster. The reason is that I don't see any way you can optimize the parsed document to make XPath faster.
If you're traversing HTML and not XML, a specialized parser is able to index all the ids and classes in the document. This will make getElementById and getElementsByClassName much faster.
With XPath, there's only one way to find an element with a given id: by traversing, either top down or bottom up. You may be able to memoize repeated queries (or partial queries), but I don't see any other optimization that can be done.
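If you want to see how the two approaches stack up outside a browser, here is a sketch of that comparison using Python's lxml.html (the document is made up, and whatever ratio this prints says little about browser engines):

    import timeit
    from lxml import html

    doc = html.fromstring(
        '<html><body>'
        + ''.join('<div id="d%d"><span>x</span></div>' % i for i in range(1000))
        + '</body></html>'
    )

    by_id    = lambda: doc.get_element_by_id('d999')
    by_xpath = lambda: doc.xpath("//*[@id='d999']")[0]

    print(timeit.timeit(by_id, number=1000))
    print(timeit.timeit(by_xpath, number=1000))
    # Measure again in the browsers or libraries you actually target.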

Do many old ColdFusion Performance admonitions still apply in CFMX 8?

I have an old standards document that has gone through several iterations and has its roots back in the ColdFusion 5 days. It contains a number of admonitions, primarily for performance, that I'm not so sure are still valid.
Do any of these still apply in ColdFusion MX 8? Do they really make that much difference in performance?
Use compare() or compareNoCase() instead of is not when comparing strings
Don't use evaluate() unless there is no other way to write your code
Don't use iif()
Always use struct.key or struct[key] instead of structFind(struct,key)
Don't use incrementValue()
I agree with Tomalak's thoughts on premature optimization. Compare is not as readable as "eq."
That being said there is a great article on the Adobe Developer Center about ColdFusion Performance: http://www.adobe.com/devnet/coldfusion/articles/coldfusion_performance.html
Compare()/CompareNoCase(): comparing case-insensitively is more expensive in Java, too. I'd say this still holds true.
Don't use evaluate(): Absolutely - unless there's no way around it. Most of the time, there is.
Don't use Iif(): I can't say much about this one. I don't use it anyway because the whole DE() stuff that comes with it sucks so much.
struct.key over StructFind(struct,key): I'd suspect that internally both use the same Java method to get a struct item. StructFind() is just one more function call on the stack. I've never used it, since I have no idea what benefit it would bring. I guess it's around for backwards compatibility only.
IncrementValue(): I've never used that one. I mean, it's 16 characters and does not even increment the variable in place, which would have been the only excuse for its existence.
Some of the concerns fall in the "premature optimization" corner, IMHO. Personal preference or coding style apart, I would only start to care about some of the subtleties in a heavy inner loop that bogs down the app.
For instance, if you do not need a case-insensitive string compare, it makes no sense using CompareNoCase(). But I'd say 99.9% of the time the actual performance difference is negligible. Sure you can write a loop that times 100000 iterations of different operations and you'd find they perform differently. But in real-world situations these academic differences rarely make any measurable impact.
By all accounts, ColdFusion MX 8 is several times faster than MX 7. When it came out, I read many opinions that simply upgrading for the performance boost, without changing a line of code, was well worth it... It was worth it. With the gains in processing power and memory availability, generally you can do a lot more with less-optimized code.
Does this mean we should stop caring and write whatever? No. Chances are where we take the most shortcuts, we'll have to grow the system the most there.
Finding the line between enough engineering and over-engineering a solution is a fine balance. There's a quote by Knuth, I believe, that says "Premature optimization is the root of all evil".
For me, I try to base it on:
how much it will be used,
how expensive that will be across my expected user base,
how critical/central it is to everything,
how often I may be coming back to the code to extend it into other areas
The more these answers fall into the "probably" or "one way or another, I will" category, the more attention I pay to it. If it needs to be readable and a small performance hit results, that's the better way to go for the sustainability of the code.
Otherwise, I let items fight for my attention while I solve and build things of real(er) value.
The single biggest favour we can do ourselves is to use a framework with any project, no matter how small, and do the small things right from the beginning.
That way there is no sense of dread in going back to work on a system that was originally meant to be a temporary hack but never got re-factored.