Search for nodes having a certain attribute with htmlagilitypack

I have only seen examples on how to search for nodes where attributes have or contain certain values but I cannot find one where you search for nodes where the attribute exists to start with.
How is that done?

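You could just loop over the matches:
HtmlAgilityPack.HtmlDocument doc = htmlWeb.Load("somewebsite.org");
// [@attrX] is an existence test: it matches any element that has attrX, whatever its value.
// Note that SelectNodes returns null, not an empty collection, when nothing matches.
foreach (HtmlNode matchedNode in doc.DocumentNode.SelectNodes("//*[@attrX]")) {
/* ... */
}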
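The existence predicate is plain XPath, so it behaves the same outside HtmlAgilityPack. Here is a minimal sketch in Python's standard library (not HtmlAgilityPack itself); the sample markup and the helper name nodes_with_attribute are made up for illustration, and attrX is the placeholder attribute from the answer above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; attrX is the placeholder attribute name.
SAMPLE = """
<root>
  <a attrX="1">first</a>
  <b>second</b>
  <c attrX="">third</c>
  <d other="x"><e attrX="nested"/></d>
</root>
"""


def nodes_with_attribute(xml_text, attr):
    """Return every descendant element that carries `attr`, whatever its value."""
    root = ET.fromstring(xml_text)
    # [@name] only tests that the attribute exists; an empty value still matches.
    return root.findall(f".//*[@{attr}]")


matched_tags = [node.tag for node in nodes_with_attribute(SAMPLE, "attrX")]
print(matched_tags)
```

The element with an empty attrX is matched as well, because the predicate checks presence rather than value.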

Related

Fetch absolute or relative path using file object in IBM filenet

String mySQLString = "select * from document where documentTitle like '%test%' ";
SearchSQL sql = new SearchSQL(mySQLString);
IndependentObjectSet s = search.fetchObjects(sql, 10, null, true);
Document doc;
PageIterator iterator = s.pageIterator();
iterator.nextPage();
for (Object object : iterator.getCurrentPage()) {
doc = (Document) object;
Properties properties = doc.getProperties();
//I am trying to get an absolute or relative path here for every document.
// for eg: /objectstorename/foldername/filename like this.
}
I have tried searching the properties and class descriptions of the document, but I can't find the path.

To do it all in a single query (as you are trying to do in your code), you can create a join with the ReferentialContainmentRelationship table. The Head property of this table points to the document, the Tail property points to the folder the document is filed in, and the ContainmentName property is the name the document has in that folder. Use the following code to construct the document path:
SearchSQL searchSQL = new SearchSQL("SELECT R.ContainmentName, R.Tail, D.This FROM Document AS D WITH INCLUDESUBCLASSES INNER JOIN ReferentialContainmentRelationship AS R WITH INCLUDESUBCLASSES ON D.This = R.Head WHERE DocumentTitle like '%test%'");
SearchScope searchScope = new SearchScope(objectStore);
RepositoryRowSet objects = searchScope.fetchRows(searchSQL, null, null, null);
Iterator<RepositoryRow> iterator = objects.iterator();
while (iterator.hasNext()) {
RepositoryRow repositoryRow = iterator.next();
Properties properties = repositoryRow.getProperties();
Folder folder = (Folder) properties.get("Tail").getEngineObjectValue();
String containmentName = properties.get("ContainmentName").getStringValue();
System.out.println(folder.get_PathName() + "/" + containmentName);
}
Paths constructed this way can also be used to fetch the object from the object store. The query code can be optimized by passing a property filter as the third argument of the fetchRows() method. I don't know how this behaves if the document is filed in multiple folders.

I suggest you explore the "Creating DynamicReferentialContainmentRelationship Objects" section of FileNet documentation:
https://www.ibm.com/support/knowledgecenter/SSNW2F_5.5.0/com.ibm.p8.ce.dev.ce.doc/containment_procedures.htm#containment_procedures__fldr_creating_a_drcr
A FileNet Document can be filed in multiple Folders, so a given document can have several logical "paths".
In the end, you should get something like "Folder.get_PathName() + DynamicReferentialContainmentRelationship.get_Name()" to display the full pathname.
As the samples in the FileNet documentation show, a relationship object (e.g. DynamicReferentialContainmentRelationship) controls the document/folder relation:
myRelationshipObject.set_Head(myDocument);
myRelationshipObject.set_Tail(myFolder);
Also, keep in mind that a FileNet Document can be an "unfiled" document, in which case there is no "pathname" or folder "relationship" to retrieve.
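To make the multiple-folder and unfiled cases concrete, here is a small sketch in plain Python. It does not call the FileNet API: the tuples are hypothetical stand-ins for the relationship rows (document, Tail folder path, ContainmentName), and the ids, paths and the helper name logical_paths are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for relationship query results:
# (document id, folder path of the Tail, ContainmentName).
rows = [
    ("doc-1", "/Invoices/2020", "test-a.pdf"),
    ("doc-1", "/Archive", "test-a-copy.pdf"),
    ("doc-2", "/", "test-b.pdf"),
]
all_documents = ["doc-1", "doc-2", "doc-3"]  # doc-3 is unfiled: it has no rows


def logical_paths(rows, documents):
    """Map each document id to the list of folder paths it is filed under."""
    paths = defaultdict(list)
    for doc_id, folder_path, containment_name in rows:
        # rstrip avoids a double slash when the folder is the root folder "/".
        paths[doc_id].append(folder_path.rstrip("/") + "/" + containment_name)
    # Unfiled documents get an empty list instead of a made-up path.
    return {doc_id: paths.get(doc_id, []) for doc_id in documents}


result = logical_paths(rows, all_documents)
for doc_id, doc_paths in result.items():
    print(doc_id, doc_paths)
```

A document filed twice ends up with two paths, and an unfiled one with none, so calling code should not assume exactly one path per document.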

tl;dr from FileNet Content Engine - Database Table for Physical path
Documents are stored among the directories at the leaf level using a hashing algorithm to evenly distribute files among these leaf directories.

Creating unique structures in Neo4j with them having nodes that are part of another structure

Let's say that we have n nodes with label :Test and a unique property called type.
UNWIND [{ type:"a" }, { type:"b" }, { type:"c" }, { type:"d" }] AS x
MERGE (t:Test { type: x.type })
RETURN t
Now let's introduce a node with the label :Collection. The purpose of this node is to have a unique relationship pattern with the :Test nodes.
MATCH (a:Test { type:"a" }),(b:Test { type:"b" })
CREATE UNIQUE (x:Collection)-[:HAS]->(a),(x:Collection)-[:HAS]->(b)
RETURN *
The problem that I face starts occurring when I try to make another unique structure, like the previous one, but with some nodes in common.
MATCH (a:Test { type:"a" })
CREATE UNIQUE (x:Collection)-[:HAS]->(a)
RETURN *
The expected result is that another node of label :Collection gets created and linked to :Test {type:"a"} but the actual result is that it matches the previous data structure and returns that instead of creating a new one.
The expected result should have 2 :Collection nodes, one linked to type:"a", the other one linked to type:"a" and type:"b".
Any kind of input will be much appreciated :D

From the neo4j docs on CREATE UNIQUE:
CREATE UNIQUE is in the middle of MATCH and CREATE — it will match
what it can, and create what is missing. CREATE UNIQUE will always
make the least change possible to the graph — if it can use parts of
the existing graph, it will.
You add Collection nodes without any properties. I think if CREATE UNIQUE finds a Collection node, it will use it. This is how CREATE UNIQUE is supposed to work.
So if you want a new Collection that is linked to some Test nodes, you can either add some unique properties to the node:
MATCH (a:Test { type:"a" })
CREATE UNIQUE (x:Collection {key: 'unique value'})-[:HAS]->(a)
RETURN *
Or create it in a separate step:
MATCH (a:Test { type:"a" })
CREATE (x:Collection)
CREATE (x)-[:HAS]->(a)
RETURN *
Or use MERGE instead of CREATE UNIQUE.
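The difference between the two behaviours can be illustrated with a toy model in Python. This is not Neo4j and uses no driver: ToyGraph and its methods are invented for illustration and only mimic the "match what it can, create what is missing" rule quoted above for a single (:Collection)-[:HAS]->(:Test) pattern.

```python
import itertools


class ToyGraph:
    """Toy stand-in for a graph: Collection ids plus (collection, test_type) HAS edges."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.collections = []
        self.has_edges = []

    def create(self, test_type):
        """Like CREATE: always add a new Collection and a new HAS edge."""
        collection = next(self._ids)
        self.collections.append(collection)
        self.has_edges.append((collection, test_type))
        return collection

    def create_unique(self, test_type):
        """Like the quoted rule: reuse an existing matching pattern, else create it."""
        for collection, linked_type in self.has_edges:
            if linked_type == test_type:
                return collection
        return self.create(test_type)


graph = ToyGraph()
first = graph.create_unique("a")      # nothing to match yet, so a Collection is created
graph.has_edges.append((first, "b"))  # the first structure also HAS type "b"
reused = graph.create_unique("a")     # an existing pattern matches, nothing is created
fresh = graph.create("a")             # an explicit create yields the second Collection
print(first, reused, fresh, len(graph.collections))
```

In this model the second create_unique call returns the existing collection, which mirrors the behaviour described in the question, while the explicit create produces the second :Collection the asker expected.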

Is it possible to filter the descendant elements returned from an XPath query?

At the moment, I'm trying to scrape forms from some sites using the following query:
select * from html
where url="http://somedomain.com"
and xpath="//form[@action]"
This returns a result like so:
{
form: {
action: "/some/submit",
id: "someId",
div: {
input: [
... some input elements here
]
},
fieldset: {
div: {
input: [
... some more input elements here
]
}
}
}
}
On some sites this could go many levels deep, so I'm not sure how to begin trying to filter out the unwanted elements in the result. If I could filter them out here, then it would make my back-end code much simpler. Basically, I'd just like the form and any label, input, select (and option) and textarea descendants.
Here's an XPath query I tried, but I realised that the element hierarchy would not be maintained and this might cause a problem if there are multiple forms on the page:
//form[@action]/descendant-or-self::*[self::form or self::input or self::select or self::textarea or self::label]
However, I did notice that the elements returned by this query were no longer returned under divs and other elements beneath the form.

I don't think it will be possible in a plain query as you have tried.
However, it would not be too much work to create a new data table containing some JavaScript that does the filtering you're looking for.
Data table
A quick, little <execute> block might look something like the following.
var elements = y.query("select * from html where url=@u and xpath=@x", {u: url, x: xpath}).results.elements();
var results = <url url={url}></url>;
for each (element in elements) {
var result = element.copy();
result.setChildren("");
result.normalize();
for each (descendant in y.xpath(element, filter)) {
result.node += descendant;
}
results.node += result;
}
response.object = results;
» See the full example data table.
Example query
use "store://VNZVLxovxTLeqYRH6yQQtc" as example;
select * from example where url="http://www.yahoo.com"
» See this query in the YQL console
Example results
Hopefully the above is a step in the right direction, and doesn't look too daunting.
Links
Open Data Tables Reference
Executing JavaScript in Open Data Tables
YQL Editor

This is how I would filter specific nodes but still allow the parent tag with all attributes to show:
//form[@name]/@* | //form[@action]/descendant-or-self::node()[name()='input' or name()='select' or name()='textarea' or name()='label']
If there are multiple form tags on the page, they should be grouped off by this parent tag and not all wedged together and unidentifiable.
You could also reverse the union if it would help how you'd like the nodes to appear:
//form[@action]/descendant-or-self::node()[name()='input' or name()='select' or name()='textarea' or name()='label'] | //form[@name]/@*
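If the filtering ends up in back-end code instead, grouping per form keeps multiple forms apart. The following is a rough sketch in Python's standard library, assuming the page is available as well-formed XML; the sample markup, the WANTED set and the helper name form_fields are illustrative and are not part of YQL.

```python
import xml.etree.ElementTree as ET

# Tags to keep, taken from the list in the question.
WANTED = {"label", "input", "select", "option", "textarea"}

# Hypothetical, well-formed sample page with three forms.
SAMPLE = """
<html><body>
  <form action="/some/submit" id="someId">
    <div><label for="q">Query</label><input name="q"/></div>
    <fieldset><div><select name="s"><option>one</option></select></div></fieldset>
  </form>
  <form id="noAction"><input name="ignored"/></form>
  <form action="/other"><p><textarea name="t"></textarea></p></form>
</body></html>
"""


def form_fields(xml_text):
    """Return one (form attributes, wanted descendants) pair per form that has an action."""
    root = ET.fromstring(xml_text)
    grouped = []
    for form in root.findall(".//form[@action]"):
        # iter() walks the form's subtree in document order, however deep it goes.
        fields = [(el.tag, el.get("name")) for el in form.iter() if el.tag in WANTED]
        grouped.append((dict(form.attrib), fields))
    return grouped


grouped = form_fields(SAMPLE)
for attributes, fields in grouped:
    print(attributes, fields)
```

Wrapper elements such as div and fieldset are dropped, but each field stays attached to the form it came from.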

xPath - How to add a condition to a node's parent?

I'm trying to add a condition to a node's parent, and I can't get it to work.
I only want the nodes that have a certain class and whose parent also has a certain class, something like:
//*[@class='price' and parent@class='special-price']
Does someone have an idea how to add conditions on the parent too?
Thanks

//*[@class='special-price']/*[@class='price']
If you're searching the whole document anyway, then filter the parents en route to the children, rather than selecting the children and then going back up to check the parent.

Use //*[@class = 'special-price']/*[@class = 'price'] or //*[@class = 'price' and ../@class = 'special-price'].
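Both directions can be tried out quickly. The sketch below uses Python's standard library, whose XPath subset handles the top-down expression but not a ../@class predicate, so the bottom-up variant is written as an explicit parent lookup; the sample markup is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the first price sits under a special-price parent.
SAMPLE = """
<div>
  <p class="special-price"><span class="price">9.99</span><span class="label">Now</span></p>
  <p class="old-price"><span class="price">19.99</span></p>
  <span class="price">0.00</span>
</div>
"""

root = ET.fromstring(SAMPLE)

# Top-down: filter the parent first, then step to the matching children.
top_down = root.findall(".//*[@class='special-price']/*[@class='price']")

# Bottom-up: select the candidates, then check each candidate's parent.
parent_of = {child: parent for parent in root.iter() for child in parent}
bottom_up = [
    el
    for el in root.findall(".//*[@class='price']")
    if parent_of[el].get("class") == "special-price"
]

prices = [el.text for el in top_down]
print(prices, top_down == bottom_up)
```

Both approaches select the same element here; the top-down form is shorter and needs no parent map.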

ExtJS Tree same parentNode

I am rendering a tree using a JSON array that I get from a JSP page. The tree has a root node with 3 nodes, each of which has more than 5 children, and some of the children have the same id and the same text. It renders properly, with no display issues.
I am trying to make the user select child nodes of only one type (one of the 3 nodes). If the user checks a node that is not a sibling of the already checked nodes, I just need to uncheck the already checked nodes. This sounds pretty simple, and I coded it: I compare the parent node id (node.parentNode.id) of the newly checked node with that of the already checked nodes (tree.getChecked()).
The problem is that when I select child nodes that have the same id and text, my logic fails: they report the same parentNode.id even though they have different parents. Does the tree panel check for duplicate elements and assign them to the same parent node while loading? What is going on here, and how can I fix it? Thank you.
Ext.onReady(function(){
var tree = new Ext.tree.TreePanel({
id: 'deficiencyTree',
renderTo: 'MyTable',
title: 'Deficiencies',
height: 'auto',
width: 525,
useArrows: true,
autoScroll: true,
animate: true,
enableDD: true,
containerScroll: true,
rootVisible: false,
frame: false,
root:{nodeType: 'async'},
dataUrl: 'jsonFile.jsp',
listeners: {
'checkchange': function (node, checked) {
if (checked) {
selNodes = tree.getChecked();
alert(selNodes);
Ext.each(selNodes, function (nodes) {
alert("id values for node and nodes "+node.parentNode.id+" "+nodes.parentNode.id);
if (nodes.parentNode.id != node.parentNode.id)
{
nodes.getUI().toggleCheck();
}
});
}
list.length = 0;
iii = 0;
selNodess = tree.getChecked();
Ext.each(selNodess, function (nodes) {
list[iii] = nodes.id;
iii++;
});
}
}
});
tree.getRootNode().expand(false);
});

As a semi-aside, ideally you should not be replicating node IDs at all (an ID should never be replicated, otherwise it isn't an ID). If you need the value that you are currently assigning to the ID field, add an additional attribute to the node and place it there; you can refer to this attribute when you need to. ExtJS isn't built to handle duplicate IDs for nodes within the same tree very well at all.

I used hierarchical ids to solve this problem:
if the path to an element is x->y->z, then its id in the tree is x+y+z.
You just need to change the server-side code:
- Get the id in the format x+y+z, find the last + and take z
- Return elements with ids [x+y+z+childId]
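The two server-side steps above might look roughly like this. It is a sketch in Python rather than the asker's JSP code, the function names are made up, and it assumes that the raw ids never contain the "+" separator themselves.

```python
SEPARATOR = "+"  # assumption: raw ids never contain this character


def tree_id(path):
    """Build the tree-wide id for a node from the raw ids along its path."""
    return SEPARATOR.join(path)


def local_id(full_id):
    """Recover the node's own raw id: everything after the last separator."""
    return full_id.rsplit(SEPARATOR, 1)[-1]


def child_ids(parent_full_id, raw_child_ids):
    """Ids to return for the children of the node identified by parent_full_id."""
    return [parent_full_id + SEPARATOR + child for child in raw_child_ids]


node = tree_id(["x", "y", "z"])
children = child_ids(node, ["7", "8"])
print(node, local_id(node), children)
```

Two children that share a raw id under different parents then get different tree ids, so comparing parentNode.id works again.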

Good news: I got it working, and it was pretty simple. As "Xupypr MV" suggested, I shouldn't be using the same ID twice, which goes against the basic idea of an ID. So I gave each node a different id, added a new attribute named id2, assigned it the value I needed, and then accessed it using node.attributes["id2"], and it works perfectly well. Previously I had tried to read the attribute as node.id2, just like node.id or node.text, which did not work. Thanks again for the responses.
