I'm working with Scrapy and lxml to sort out HTML trees.
I noticed that there is a difference between these two XPath expressions. I was under the impression that they were interchangeable. Could someone please explain the difference?
response.xpath('/html/body/div/table/tr/td/table/tr/td/table/tr/td/table/tr/td/table/tr/td/a/img/..//text()').extract()
response.xpath('/html/body/div/table/tr/td/table/tr/td/table/tr/td/table/tr/td/table/tr/td/a//text()').extract()
The difference between a/img/..//text() and a//text() is that the first will return you text nodes ONLY from a elements with img elements as children, whereas the second will return text nodes from a elements irrespective of whether they have img elements as children.
Put another way, a/img/..//text() could equally be written a[img]//text(); compare this with a//text().
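A small sketch of the point above, using lxml directly rather than a Scrapy response. The HTML fragment is made up for illustration, and the leading // is only there to keep the paths short:

```python
from lxml import html

# Hypothetical fragment: one link that wraps an image, one that does not.
doc = html.fromstring(
    '<div>'
    '<a href="#1"><img src="x.png"/>with image</a>'
    '<a href="#2">no image</a>'
    '</div>'
)

# Step down to img, then back up to its parent a: only links that have an img child.
via_parent = doc.xpath('//a/img/..//text()')
# The same selection written with a predicate.
via_predicate = doc.xpath('//a[img]//text()')
# Every link, with or without an img child.
all_links = doc.xpath('//a//text()')

print(via_parent, via_predicate, all_links)
```

The first two lists should be identical, while the third also picks up the text of the link that has no image.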
In a standard implementation of the Rope data structure using splay trees, the nodes would be ordered according to a rank statistic measuring the position of each one from the start of the string, so the keys normally found in a binary search tree would be irrelevant, would they not?
I ask because the keys shown in the graphic below (thanks Wikipedia!) are letters, which would presumably become non-unique once the number of nodes exceeded the length of the chosen alphabet. Wouldn't it be better to use integers or avoid using keys altogether?
Separately, can anyone point me to a good implementation of the logic to recompute rank statistics after each operation?
Presumably, if the index for a split falls within the substring attached to a particular node, say, between "Hel" and "llo_" on the node E above, you would remove the substring from E, split it and reattach it as two children of E. Correct?
Finally, after a certain number of such operations, the tree could, I suppose, end up with as many leaves as letters. What would be the best way to keep track of that and prune the tree (by combining substrings) as necessary?
Thanks!
For what it's worth, you can implement a Rope using Splay Trees by attaching a substring to each node of the binary search tree (not just to the leaf nodes as shown above).
The rank of each node is its size plus the size of its left subtree. But when recomputing ranks during splay operations, you need to remember to walk down the node.left.right branch, too.
If each node records a reference to the substring it represents (rather than the actual substring itself), everything runs faster. That way, when a split operation falls within an existing node, you just need to modify the node's attributes to reflect the right part of the substring you want to split, then add another node to represent the left part and merge it with the left subtree.
Done as above, each node records (in addition to its left, right and parent attributes etc.) its rank, its size (in characters) and the location of the first character it represents in the string you're trying to modify. That way, you never actually modify the initial string: you just do your operations on bits of the tree and reproduce the final string when you're ready by walking it in order.
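A minimal sketch of that node layout, assuming each node stores only an offset and a length into the untouched source string. The names Node, split_node, attach_left and render are made up for illustration, and the splay rotations, parent pointers and rank bookkeeping are left out:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    start: int                      # offset of the first character in the source string
    size: int                       # number of characters this node represents
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def split_node(node: Node, k: int) -> Node:
    """Split inside a node after its first k characters (0 < k < node.size).

    The node itself is narrowed to the right part; a new node for the
    left part is returned so the caller can merge it into the left subtree.
    """
    left_part = Node(node.start, k)
    node.start += k
    node.size -= k
    return left_part


def attach_left(node: Node, piece: Node) -> None:
    """Place piece immediately before node in in-order position."""
    if node.left is None:
        node.left = piece
        return
    cur = node.left
    while cur.right is not None:    # rightmost node of the left subtree
        cur = cur.right
    cur.right = piece


def render(node: Optional[Node], source: str) -> str:
    """Rebuild the text by walking the tree in order; source is never modified."""
    if node is None:
        return ""
    piece = source[node.start:node.start + node.size]
    return render(node.left, source) + piece + render(node.right, source)


source = "Hello_my_name"
root = Node(6, 3, left=Node(0, 6), right=Node(9, 4))   # "my_" with "Hello_" / "name"
attach_left(root, split_node(root, 1))                  # split "my_" into "m" + "y_"
print(render(root, source))
```

After the split the tree has one more node, but an in-order walk still yields the same string, which is the invariant every rope operation has to preserve.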
I have to generate a random tree (the data structure, not the graphical one) given some parameters: at least the mean depth and the mean number of children of the nodes (as floats). There is no other constraint (for now at least).
I really don't know this field so maybe there is something obvious I missed when I googled but I couldn't find anything... Maze generation algorithms looked interesting but they don't have these parameters as far as I can tell.
So please, tell me if this is possible at all, and if it is, give me some pointers, or even keywords to search for.
Thanks
You can start by creating a procedure gen that builds a tree of a given height, with a random number of children at each level governed by avgChildCount. The number of children is selected from the range:
[0, (avgChildCount*2).toInt]
Once you have this procedure, you can introduce another one that takes the two averages, avgHeight and avgChildCount, and calls gen:
gen(random(0, (avgHeight*2).toInt), avgChildCount)
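A rough Python sketch of the two procedures described above. The names Tree, gen_random and depth are made up for illustration, and the bounds are treated as inclusive. Since a node may draw zero children before the height limit is reached, the height passed to gen is only an upper bound, so the mean depth you actually get will be somewhat lower than avg_height:

```python
import random
from dataclasses import dataclass, field


@dataclass
class Tree:
    children: list = field(default_factory=list)


def gen(height, avg_child_count, rng=random):
    """Build a tree at most `height` levels deep below the root."""
    node = Tree()
    if height > 0:
        # Uniform draw from [0, 2 * avg], so the expected count is about avg.
        count = rng.randint(0, int(avg_child_count * 2))
        node.children = [gen(height - 1, avg_child_count, rng) for _ in range(count)]
    return node


def gen_random(avg_height, avg_child_count, rng=random):
    """Draw the height limit from [0, 2 * avg_height], then delegate to gen."""
    return gen(rng.randint(0, int(avg_height * 2)), avg_child_count, rng)


def depth(tree):
    """Number of edges on the longest root-to-leaf path."""
    return 0 if not tree.children else 1 + max(depth(c) for c in tree.children)


tree = gen_random(2.0, 1.5, random.Random(0))
print(depth(tree), len(tree.children))
```

Keep avg_child_count small when experimenting: the number of nodes grows exponentially with the height.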
Why was the concept of a node-set replaced by the sequence in XPath 2.0? What problems with node-sets motivated the change, and what advantages does a sequence have over a node-set?
My understanding is that:
A node-set contains zero or more nodes, no node can appear in the node set
more than once (that is, no duplicates are possible), and the nodes are not in any particular order.
and
A sequence, by contrast, allows a node to appear more than once (duplicates are permitted), and the nodes in the sequence are in a particular order; in addition, sequences can
contain nodes, atomic values, or any mixture of the two.
Firstly, the only kind of collection allowed in XPath 1.0 was a collection of nodes. XPath 2.0 also allows collections (sequences) of strings, numbers, and so on. Without this, functions such as tokenize() or string-to-codepoints(), which return sequences of atomic values, would be impossible.
Secondly, having only sets rather than sequences means you can't do things like binding a variable to the result of a sort operation.
I have the following dilemma: I have a list of strings, and I want to find the set of strings that start with a certain prefix. The list is sorted, so the naive solution is this:
Perform binary search on the prefixes of the set, and when you find an element that starts with the prefix, traverse up linearly until you hit the top of the subset.
This runs in linear time, however, and I was wondering if anyone can suggest a more efficient way to do it.
Do a binary search for the top, and do a binary search for the bottom. Once you find the first hit, you know the top is above that point and the bottom is below (or at that point in both cases). Once you have the top and bottom you have the solution.
You can do a similar binary search for the top element, except that the string you should be looking for is the first string that starts with a prefix strictly greater than the prefix in question. This also takes O(lg n) time.
Once you find one element in the set, just keep binary searching until you find the endpoints.
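The two-binary-search idea from the answers above can be sketched as follows. The function name prefix_range is made up, and the sketch assumes the list is sorted in plain lexicographic order. Comparing only the first len(prefix) characters of each string is safe because truncation preserves that ordering:

```python
def prefix_range(words, prefix):
    """Return (start, end) such that words[start:end] are exactly the
    strings beginning with prefix. words must be sorted."""
    n = len(prefix)

    # First binary search: first index whose truncated string is >= prefix.
    lo, hi = 0, len(words)
    while lo < hi:
        mid = (lo + hi) // 2
        if words[mid][:n] < prefix:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Second binary search: first index whose truncated string is > prefix.
    hi = len(words)
    while lo < hi:
        mid = (lo + hi) // 2
        if words[mid][:n] <= prefix:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


words = ["apple", "apply", "apt", "banana", "band", "bandit", "cat"]
start, end = prefix_range(words, "ban")
print(words[start:end])
```

Both loops halve the search interval on every step, so finding the endpoints takes O(lg n) string comparisons; only listing the matches themselves is linear in the size of the result.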
I have the following problem in Scala: I need to implement binary search with the help of actors, with no loops or recursion, preferably with concurrency between the actors. Of course it makes little sense, but that is the problem as given. I think it would be nice to have one coordinator actor that coordinates the work of the others. The input is a sorted array and the key to search for; the output is the index of the key. Do you have any ideas on how this could be implemented?
Thanks in advance.
I'm not sure how you could have concurrency for binary search, as every step of the algorithm needs the result of the last one.
You could do an "n-ary" search: split the array into n parts and let each actor compare the key against the values at the boundaries of its sub-array. You don't even have to wait for all the answers: as soon as you have the two actors with different comparison results, you can start the next round recursively on the sub-array you found.
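To make the n-ary idea concrete, here is a rough sketch in Python rather than Scala. Worker threads stand in for the actors and the calling function plays the coordinator. The names check_chunk and nary_search are made up. For simplicity the sketch waits for every worker in each round and uses an ordinary loop, so it does not honour the "no loops or recursion" constraint from the question:

```python
from concurrent.futures import ThreadPoolExecutor


def check_chunk(arr, key, lo, hi):
    """Worker: look only at the boundaries of arr[lo:hi] and report
    whether key could lie inside that chunk."""
    return lo < hi and arr[lo] <= key <= arr[hi - 1]


def nary_search(arr, key, n=4):
    """Coordinator: narrow the range n ways per round (n >= 2).
    Returns an index of key in the sorted list arr, or -1."""
    lo, hi = 0, len(arr)
    with ThreadPoolExecutor(max_workers=n) as pool:
        while hi - lo > n:
            step = -(-(hi - lo) // n)                      # ceiling division
            bounds = [(s, min(s + step, hi)) for s in range(lo, hi, step)]
            hits = list(pool.map(lambda b: check_chunk(arr, key, *b), bounds))
            if True not in hits:
                return -1                                  # key falls between chunks
            lo, hi = bounds[hits.index(True)]
    # At most n candidates remain: check them directly.
    for i in range(lo, hi):
        if arr[i] == key:
            return i
    return -1


data = list(range(0, 100, 3))
print(nary_search(data, 27), nary_search(data, 28))
```

Each round shrinks the range by a factor of about n, so the number of rounds is roughly log base n of the array length, at the price of n boundary checks per round.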