XPath: descendants, but not by traversing this node - xpath

I have a tree of nodes which are quite frankly a mess.
|-...
|-cat
\-dog
|- dog *
| |- chicken
| | \- cat !
| \- cat !
| \- cat !
| \- dog
| |- cat
| \- ...
|- cat
|- dog
| \- cat
\- ...
Given that I've selected the asterisked 'dog' node, how can I select only those cats for which it is the nearest 'dog' ancestor (i.e. those marked with an exclamation mark)?
Equivalently, how can I get only those cat descendants of the node that can be reached without traversing another dog node?
I'm working in lxml and currently have a bad solution involving disconnecting the graphs by drop_tree()-ing all dog nodes.

You could use EXSLT's set extensions: http://www.exslt.org/set/. They're available in lxml using namespaces={"set": "http://exslt.org/sets"} in your XPath expressions.
You could then do something like
asteriskeddog.xpath("set:difference(.//cat, .//dog//cat)",
                    namespaces={"set": "http://exslt.org/sets"})
meaning "all cat elements under the current node, except those under a dog element under the current node". I've used that trick in some microdata parsing with nested itemscope and itemprop elements.
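If the EXSLT extension is not an option, the same selection can also be sketched as a plain recursive walk that refuses to descend into nested dog elements. This is a different technique from the XPath expression, not a replacement for it. The function name cats_before_next_dog and the sample document are made up for illustration, and the standard-library ElementTree is used so the snippet runs without lxml (lxml elements can be iterated over in the same way).

```python
import xml.etree.ElementTree as ET


def cats_before_next_dog(node):
    """Collect <cat> descendants of node without passing through a nested <dog>."""
    found = []
    for child in node:
        if child.tag == "dog":
            continue  # a nested dog "owns" everything below it, so skip the subtree
        if child.tag == "cat":
            found.append(child)
        # keep looking below non-dog children (including cats nested in cats)
        found.extend(cats_before_next_dog(child))
    return found


# Hypothetical sample document, not the tree from the question.
SAMPLE = """
<dog id="star">
  <chicken><cat id="c1"/></chicken>
  <cat id="c2"><cat id="c3"/></cat>
  <dog><cat id="c4"/><horse><cat id="c5"/></horse></dog>
</dog>
"""

star = ET.fromstring(SAMPLE)
print([cat.get("id") for cat in cats_before_next_dog(star)])  # ['c1', 'c2', 'c3']
```

The cat with id c5 sits two levels below the nested dog, so it is only excluded when every descendant of that dog is excluded, not just its direct children.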


<blockquote> tag inserted when using image in cell of RST table?

When I use the following code:
+-----------------+--------------+------------------------------------------------------------------+
| A               | B            | C                                                                |
+=================+==============+==================================================================+
| Merchant Rating | Ad Extension | Star ratings plus number of reviews for the advertiser/merchant. |
|                 |              |                                                                  |
|                 |              |.. image:: /images/merchant-rating.png                            |
+-----------------+--------------+------------------------------------------------------------------+
The text preceding the image in column C gets wrapped in <blockquote> tags in the HTML output. Is there any way to avoid this?
To avoid the blockquote tag in the first paragraph of the third column, you could try using this:
+-----------------+--------------+------------------------------------------------------------------+
| A               | B            | C                                                                |
+=================+==============+==================================================================+
| Merchant Rating | Ad Extension | Star ratings plus number of reviews for the advertiser/merchant. |
|                 |              |                                                                  |
|                 |              | |img|                                                            |
+-----------------+--------------+------------------------------------------------------------------+
.. |img| image:: /images/merchant-rating.png
Instead, you'll get two paragraphs.
Use a substitution and remove the separating line so that Sphinx interprets the content as a single block of text.
+-----------------+--------------+------------------------------------------------------------------+
| A               | B            | C                                                                |
+=================+==============+==================================================================+
| Merchant Rating | Ad Extension | Star ratings plus number of reviews for the advertiser/merchant. |
|                 |              | |img|                                                            |
+-----------------+--------------+------------------------------------------------------------------+
.. |img| image:: /images/merchant-rating.png

How to get non-striped table in AsciiDoctor?

I have this table:
[width="10%", cols="^", options="header"]
|===
| header
| one
| two
| three
| four
|===
This renders with striped (alternating) row backgrounds.
To get a non-striped table, I do this:
[width="10%", cols="^", options="header"]
|===
| Header
| one
{set:cellbgcolor:white}| two
| three
| four
|===
{set:cellbgcolor!}
But the disadvantages of this are clear (verbosity, forcing a specific color, ...), not to mention that it doesn't work in other AsciiDoctor variants (e.g. PDF).
I am aware of issue #1365, but it's very new and is only implemented in the Ruby variant of AsciiDoctor, not in its JS variant (which most of the WYSIWYG editors use).
Long story short: is there any way to achieve this at present?
Did you try 'stripes=none' (manual)?
[cols="2,4,2,4,2", stripes=none, grid=none, frame=none]
|===
| ^.>| +++_____________________+++ | ^.>| +++_____________________+++ |
| ^.<| Unterschrift | ^.<| Unterschrift |
|===

Merge two trees on equal nodes

Nodes are equal when their IDs are equal. IDs are unique within a tree. In the diagrams below, each node is labeled with its ID.
Consider tree1:
root
 |
 +-- CC
 |    |
 |    \-- ZZZ
 |         |
 |         \-- UU
 |
 \-- A
      |
      \-- HAH
And tree2:
root
 |
 +-- A
      |
      +-- ADD
      |
      \-- HAH
I would like merge(tree1, tree2) to give this:
root
 |
 +-- CC
 |    |
 |    \-- ZZZ
 |         |
 |         \-- UU
 |
 \-- A
      |
      +-- HAH
      |
      \-- ADD
How to do it?
Node has typical methods like getParent(), getChildren().
Order of the children doesn't matter. So, the result could be also:
root
 |
 +-- A
 |    |
 |    +-- ADD
 |    |
 |    \-- HAH
 |
 \-- CC
      |
      \-- ZZZ
           |
           \-- UU
My proposal in pseudocode. Comments are more than welcome.
merge(tree1, tree2) {
    for (node : tree2.bfs()) {                      // for every node in breadth-first traversal order
        found = tree1.find(node.getParent());       // find parent in tree1
        if (found == null)                          // no parent?
            continue;                               // skip it, it's root
        if (!found.getChildren().contains(node))    // no node from tree2 in tree1?
            found.add(node);                        // add it
    }
    return tree1;
}
The basic algorithm is not hard:
def merge_trees(t1, t2):
    return make_tree(map(merge_trees, assign(getChildren(t1), getChildren(t2), tree_similarity)))
make_tree(children): create a tree with the given list of children
map(f,list): calls function f on each element of list and return the list of return values
assign(list1,list2,cost_function): implements the Hungarian algorithm, returning the list of matched pairs
The trick is in defining tree_similarity which would have to call assign recursively.
In fact, the efficient implementation would have to cache the return values of the assign calls.
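For the exact-ID case described in the question, no similarity matching is needed, and a recursive merge keyed on IDs may be enough. The following is a minimal Python sketch under two assumptions that are not in the original posts: a hypothetical Node class that stores its children in a dict keyed by ID, and equal nodes always sitting under equal parents in both trees (as in the example).

```python
class Node:
    """Hypothetical tree node; children are indexed by their ID."""

    def __init__(self, id, children=()):
        self.id = id
        self.children = {child.id: child for child in children}


def merge(a, b):
    """Merge tree b into tree a in place and return a. Roots must share an ID."""
    if a.id != b.id:
        raise ValueError("can only merge nodes with equal IDs")
    for child_id, b_child in b.children.items():
        a_child = a.children.get(child_id)
        if a_child is None:
            # Not present in a: graft the whole subtree (shared with b, not copied).
            a.children[child_id] = b_child
        else:
            # Present in both: merge the two subtrees recursively.
            merge(a_child, b_child)
    return a


def to_dict(node):
    """Render a subtree as nested dicts, children sorted by ID, for inspection."""
    return {cid: to_dict(child) for cid, child in sorted(node.children.items())}


tree1 = Node("root", [Node("CC", [Node("ZZZ", [Node("UU")])]),
                      Node("A", [Node("HAH")])])
tree2 = Node("root", [Node("A", [Node("ADD"), Node("HAH")])])

merged = merge(tree1, tree2)
print(to_dict(merged))
# {'A': {'ADD': {}, 'HAH': {}}, 'CC': {'ZZZ': {'UU': {}}}}
```

Because children are looked up by ID in a dict, each node of tree2 is visited once and child order is irrelevant, which matches the requirement that the order of children doesn't matter.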

XML xpath, get the parent element till a specific element

I'm looking for the right xpath syntax to get a specific parent of an element.
Example:
root
|- div
| |
| |----??? ---|
| | |-a [class=1]
| | |- text[ A TEXT I DON'T WANT]
| |
| |
| |
| |-text[THE TEXT]
|
|-div
| |-text[THE TEXT I DON'T WANT]
|
|-div
| |-text[THE TEXT I DON'T WANT]
I want to get the text "THE TEXT", but only from the div that also contains an a element with class=1. Something like this:
//div//a[@class=1]/text[contains(.,'A TEXT')]/parent::*/parent::*.... <till div element> /text
Given the XML
<?xml version="1.0"?>
<root>
<foo id="id1">
<foo id="i2">
<baz/>
</foo>
</foo>
</root>
You can find the nearest ancestor foo element from baz using the XPath expression:
//baz/ancestor::foo[1]
Which will select the foo element node of id "i2".
So in your example (if I understand right) once you have got the "a" element you want, you can get "back up" the tree to the nearest ancestor div by appending "/ancestor::div[1]" to your expression.
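To make the effect of ancestor::foo[1] concrete, here is a small Python sketch that emulates it on the XML above. It uses the standard-library ElementTree, which has no ancestor axis, so the lookup is done with a child-to-parent map instead of XPath; with lxml the expression itself could be evaluated directly. The helper name nearest_ancestor is made up for this example.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <foo id="id1">
    <foo id="i2">
      <baz/>
    </foo>
  </foo>
</root>"""


def nearest_ancestor(root, node, tag):
    """Return the closest ancestor of node named tag, or None if there is none."""
    # ElementTree elements do not know their parent, so build a lookup first.
    parents = {child: parent for parent in root.iter() for child in parent}
    current = parents.get(node)
    while current is not None and current.tag != tag:
        current = parents.get(current)
    return current


root = ET.fromstring(XML)
baz = root.find(".//baz")
print(nearest_ancestor(root, baz, "foo").get("id"))  # i2
```

Walking upward and stopping at the first match is what the [1] predicate does on a reverse axis such as ancestor: position 1 is the node closest to the context node.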
Use:
/root/div[.//a[@class='1']]/text()
This selects any text node that is a child of a div element, where that div is a child of the top element named root and has a descendant a element whose class attribute has the value '1'.

The "Waiting lists problem"

A number of students want to get into sections for a class. Some are already signed up for one section but want to change sections, so they all get on the wait lists. A student can get into a new section only if someone drops from that section. No student is willing to drop a section they are already in unless they can be sure to get into a section they are waiting for. The wait list for each section is first come, first served.
Get as many students into their desired sections as you can.
The stated problem can quickly devolve into gridlock. My question is: are there known solutions to this problem?
One trivial solution would be to take each section in turn, force the first student from its waiting list into the section, and then check whether someone ends up dropping out once things are resolved (O(n) or more in the number of sections). This would work for some cases, but I think there might be better options involving forcing more than one student into a section (O(n) or more in the student count) and/or operating on more than one section at a time (O(bad) :-)
Well, this just comes down to finding cycles in the directed graph of classes, right? Each link is a student who wants to go from one node to another, and any time you find a cycle, you delete it, because those students can resolve their needs with each other. You're finished when you're out of cycles.
OK, let's try. We have 8 students (1..8) and 4 sections. Each student is in a section and each section has room for 2 students. Most students want to switch, but not all.
The table below shows each student's current section, requested section, and position in the queue (if any).
+------+-----+-----+-----+
| stud | now | req | que |
+------+-----+-----+-----+
| 1    | A   | D   | 2   |
| 2    | A   | D   | 1   |
| 3    | B   | B   | -   |
| 4    | B   | A   | 2   |
| 5    | C   | A   | 1   |
| 6    | C   | C   | -   |
| 7    | D   | C   | 1   |
| 8    | D   | B   | 1   |
+------+-----+-----+-----+
We can present this information in a graph:
+-----+           +-----+           +-----+
|  C  |---[5]--->1|  A  |2<---[4]---|  B  |
+-----+           +-----+           +-----+
   1               |   |               1
   ^               |   |               ^
   |              [1] [2]              |
   |               |   |               |
  [7]              |   |              [8]
   |               V   V               |
   |               2   1               |
   |              +-----+              |
   \--------------|  D  |--------------/
                  +-----+
We try to find a section with a vacancy, but there is none. Because all sections are full, we need a dirty trick: take a random section with a non-empty queue, in this case section A, and assume it has an extra position. This means student 5 can enter section A, leaving a vacancy in section C, which is taken by student 7. This leaves a vacancy in section D, which is taken by student 2. We now have a vacancy in section A. But we only assumed that section A had an extra position, so we can drop that assumption, and we are left with a simpler graph.
If the path never returns to section A, undo the moves and mark A as an invalid starting point. Retry with another section.
If there are no valid sections left, we are finished.
Right now we have the following situation:
+-----+           +-----+           +-----+
|  C  |           |  A  |1<---[4]---|  B  |
+-----+           +-----+           +-----+
                   |                   1
                   |                   ^
                  [1]                  |
                   |                   |
                   |                  [8]
                   V                   |
                   1                   |
                  +-----+              |
                  |  D  |--------------/
                  +-----+
We repeat the trick with another random section, and this solves the graph.
If you start with several students who are not currently assigned, you add an extra dummy section as their starting point. Of course, this means that some sections must have vacancies, or the problem is not solvable.
Note that, because of the order in the queue, there may be no solution.
This is actually a graph problem. You can think of each of these waiting list dependencies as an edge in a directed graph. If this graph has a cycle, then you have one of the situations you described. Once you have identified a cycle, you can choose any point to "break" the cycle by "overfilling" one of the classes, and you know that things will settle correctly because there was a cycle in the graph.
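The cycle-removal idea from these answers can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive solution: only the student at the head of each queue may move (first come, first served), sections start full so vacancies are not modeled, and the names resolve_waitlists, current and queues are invented for this example. From each section the code follows the chain "head of my queue frees a seat in their current section" and, when the chain closes into a loop, moves every student on that loop at once.

```python
from collections import deque


def resolve_waitlists(current, queues):
    """Resolve wait lists by repeatedly moving students along closed cycles.

    current: dict student -> section currently held
    queues:  dict section -> list of waiting students, first in line first
    Returns (final assignment, list of (student, from_section, to_section)).
    """
    current = dict(current)                              # work on copies
    queues = {sec: deque(q) for sec, q in queues.items()}
    moves = []
    progress = True
    while progress:
        progress = False
        for start in list(queues):
            path, seen = [], {}                          # sections visited, and where
            section = start
            # Admitting the head of a queue frees a seat in that student's section.
            while section not in seen and queues.get(section):
                seen[section] = len(path)
                path.append(section)
                section = current.get(queues[section][0])
            if section in seen:                          # the chain closed into a cycle
                for sec in path[seen[section]:]:
                    student = queues[sec].popleft()
                    moves.append((student, current[student], sec))
                    current[student] = sec
                progress = True
                break                                    # rescan with updated queues
    return current, moves


# The eight students and four sections from the table above.
current = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C", 7: "D", 8: "D"}
queues = {"A": [5, 4], "B": [8], "C": [7], "D": [2, 1]}

final, moves = resolve_waitlists(current, queues)
print(moves)
# [(5, 'C', 'A'), (7, 'D', 'C'), (2, 'A', 'D'), (4, 'B', 'A'), (8, 'D', 'B'), (1, 'A', 'D')]
```

On this data the first cycle found is the one walked through in the answer (students 5, 7 and 2), and the second pass resolves students 4, 8 and 1. Every section on a cycle loses one student and gains one, so no section ever exceeds its capacity.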
