storing links of a site in a tree - algorithm

I am trying to store the links that I scrape from a site in a non-binary tree. The links are laid out hierarchically (obviously). The question is: how do I generate the tree? That is, how do I work my way through the pages that the links point to so that I know which page is whose child?
For now I can get the first and second levels of links, but I have no idea how to go on from here, other than that I have to build the tree recursively and need a way to stop when I reach a leaf (which I have).
What I was thinking of was something like this (code in Python):
def buildTree(root):
    for node in root.children:
        if <end condition here>:
            continue
        else:
            nodes = getNodes(urllib2.urlopen(node.url).read())
            node.addChildren(nodes)
            buildTree(node)
where root and nodes are instances of a user-defined Node class.

Obviously, the links in a site are not a tree, but a graph. You should have a Page object, which is identified by a URL, and a Link object, which points from one page to another (and Page A can point to page B, while page B is pointing to Page A, making it a graph, instead of a tree).
Scanning algorithm pseudo-code:
process_page(current_page):
    for each link on the current_page:
        if target_page is not already in your graph:
            create a Page object to represent target_page
            add it to to_be_scanned set
        add a link from current_page to target_page

scan_website(start_page):
    create Page object for start_page
    to_be_scanned = set(start_page)
    while to_be_scanned is not empty:
        current_page = to_be_scanned.pop()
        process_page(current_page)
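The pseudo-code above can be sketched in Python roughly as follows. This is a minimal sketch, not a full crawler: the graph is a plain dict instead of Page and Link objects, and get_links is a hypothetical caller-supplied function (not part of the original answer) that returns the URLs found on a page. Here it is backed by a small in-memory site so the example runs without network access.

```python
from collections import deque


def scan_website(start_url, get_links):
    """Build a directed link graph of everything reachable from start_url.

    get_links is a hypothetical, caller-supplied function that returns the
    URLs linked from a given page (for example by fetching and parsing it).
    """
    graph = {start_url: set()}          # page URL -> set of target URLs
    to_be_scanned = deque([start_url])

    while to_be_scanned:
        current = to_be_scanned.popleft()
        for target in get_links(current):
            if target not in graph:
                # First time this page is seen: register it and queue it.
                graph[target] = set()
                to_be_scanned.append(target)
            # Record the link whether or not the target was already known.
            graph[current].add(target)
    return graph


# Stand-in for real fetching: a tiny in-memory "site" that contains a cycle.
FAKE_SITE = {
    "/": ["/a", "/b"],
    "/a": ["/", "/c"],
    "/b": ["/c"],
    "/c": [],
}

site_graph = scan_website("/", lambda url: FAKE_SITE.get(url, []))
```

Because a page is queued only the first time it is seen, the loop terminates even though "/" and "/a" link to each other.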

how to add nodes and links to d3-force without enter

I am trying to update nodes and links but would like to avoid d3's enter pattern. The reason is that I want the svelte framework to do this instead, as well as handle all the rendering. I just want to use d3-force for the calculations.
I get the initial render to work just fine, but adding links and nodes has the following issues:
adding links makes the network grow
the links don't seem to have any effect on the graph layout, i.e. they don't seem to be added to the force.simulation
adding nodes seems to work at first, but nodes added after a link has been added don't seem to exert forces on the other nodes.
Here are my functions for adding nodes and links:
function addNode(){
  force.forceSimulation(data.nodes.push({"id": "116", "group": 5, "index":data.nodes.length, "x":0, "y":0, "vx":0, "vy":0}))
  data=data
  graph.alpha(1.0).update()
  graph.restart()
}
function addLink(){
  let n=getRandomInt(0, data.nodes.length-1)
  let tn=getRandomInt(0, n)
  let sn=getRandomInt(n, data.nodes.length-1)
  graph.force("link", force.forceLink(data.links.push({'source':data.nodes[sn], 'target':data.nodes[tn],'index':data.links.length, 'value':getRandomInt(1, 7)})))
  data=data
  graph.alpha(1.0).update()
  graph.restart()
}
I found this answer but they use merge and tie it to DOM elements. I don't understand how I can do this without relating to the DOM, just updating the array of nodes and links in javascript to have d3-force include it in the simulation.
Here I have the current simulation in a svelte REPL, you can fork and edit it.
Luckily disassociating a d3-force layout from the DOM is fairly easy: the force layout itself has no interaction with the DOM, it simply is a physics calculation based on some data's properties. Adding and removing data points (nodes/links) can be a bit tricky though, but is the same regardless of whether D3 renders the DOM, something else does, or you don't render the force at all.
Here's where you add links and nodes:
function addNode(){
  force.forceSimulation(data.nodes.push({"id": "116", "group": 5, "index":data.nodes.length, "x":0, "y":0, "vx":0, "vy":0}))
  data=data
  graph.alpha(1.0).update()
  graph.restart()
}
function addLink(){
  let n=getRandomInt(0, data.nodes.length-1)
  let tn=getRandomInt(0, n)
  let sn=getRandomInt(n, data.nodes.length-1)
  graph.force("link", force.forceLink(data.links.push({'source':data.nodes[sn], 'target':data.nodes[tn],'index':data.links.length, 'value':getRandomInt(1, 7)})))
  data=data
  graph.alpha(1.0).update()
  graph.restart()
}
There are a few issues here:
Array.push() does not return the array. It modifies the array in place and returns its new length. As written, you are therefore passing a number, not an array, to forceSimulation() and forceLink(), so the new nodes and links never reach the force layout, which requires objects rather than primitives to represent nodes. Instead, push the node/link first, then pass the node/link array to .nodes() or .links().
force.forceSimulation() creates a new simulation, which is not what you want. You want to add nodes to the existing simulation, so use graph.nodes() instead.
There is no update() method on a simulation. Calling it throws an error, which is why you are unable to restart the simulation once it has cooled down. Just drop this call.
Let's see what these two functions look like correcting for this:
function addNode(){
  data.nodes.push({"id": "116", "group": 5, "index":data.nodes.length, "x":0, "y":0, "vx":0, "vy":0})
  graph.nodes(data.nodes)
  data=data
  graph.alpha(1.0).restart()
}
function addLink(){
  let n=getRandomInt(0, data.nodes.length-1)
  let tn=getRandomInt(0, n)
  let sn=getRandomInt(n, data.nodes.length-1)
  data.links.push({'source':data.nodes[sn], 'target':data.nodes[tn],'index':data.links.length, 'value':getRandomInt(1, 7)})
  graph.force("link", force.forceLink(data.links))
  data=data
  graph.alpha(1.0).restart()
}
I'm not sure why you have data=data; I see no difference without it, so I'll quietly assume it's a quirk of the framework.
A small alternative for updating links:
You can access the force you've named 'link' and assign it new links with:
graph.force("link").links(data.links)
Rather than:
graph.force("link", force.forceLink(data.links))
The latter recreates the force, whereas the former simply modifies it.

Google Drive API v3, is there a way to get a list of folders that are parents of a fileId?

In v2 it was possible to make a call to /files with the query fileId in children to get a list of DriveFile objects that were parents of the supplied file.
Now, it seems to be required to make a call to /files/:fileId?fields=parents, then make a separate call to /files/:parentId for each returned parent, possibly turning one call into a dozen.
Is this correct, and if so why? This is a huge performance hit to our app, so hopefully there's an undocumented method.
The query "'fileId' in children" doesn't publicly exist (it is not documented or supported) in v2 either, and I don't recall it ever existing. What does exist in v2 is the Parents collection, which effectively answers the same question. In v3, to get the parents of a file you just get the child and ask for the parents field.
As for whether or not that is a performance hit, I don't think it is in practice. The Parents resource in v2 was very light to begin with, and other than the ID the only useful field was the 'isRoot' property. That you can calculate yourself by calling files/root up front to get the ID of the root folder for that user (just once and save it, it won't change for that user.)
If you need to get more information about the parents than just the IDs and are worried about the # of calls you have to make, use batching to fetch them. If you just have one parent, no need to batch (it's just overhead.) If you find that a file has multiple parents, create a batch request. That'll be sent as a single HTTP request/response and is handled very efficiently on the back end.
Point is, if you just need IDs, it's no worse than before. It's one call to get the parents of a file.
If you need more than IDs, it's at most 2 HTTP requests (outside really bizarre edge cases like 1000+ parents which would exceed the batch size :)
In v3 it is possible to list all children of a parent, as explained here: https://developers.google.com/drive/v3/web/search-parameters
Example call:
https://www.googleapis.com/drive/v3/files?q='0Byho0qAdzabmVl8xcDR1S0pNY3c' in parents
Replace the spaces with %20, of course. This lists all the files in the folder whose ID is 0Byho0qAdzabmVl8xcDR1S0pNY3c.
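As a small illustration of the encoding step, the request URL for listing a folder's children can be built with Python's standard library. This is only a sketch of URL construction using the documented `'<folder id>' in parents` query form; the helper name children_query_url is made up for this example, and a real request would additionally need OAuth credentials.

```python
from urllib.parse import quote, urlencode

DRIVE_FILES_ENDPOINT = "https://www.googleapis.com/drive/v3/files"


def children_query_url(folder_id):
    """Return the files.list URL that lists the direct children of folder_id."""
    query = "'{}' in parents".format(folder_id)
    # quote_via=quote encodes spaces as %20 (the default would use '+').
    return DRIVE_FILES_ENDPOINT + "?" + urlencode({"q": query}, quote_via=quote)


url = children_query_url("0Byho0qAdzabmVl8xcDR1S0pNY3c")
```

The single quotes around the ID are percent-encoded as %27 as well, which is a valid encoding of the same query string.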
You just need to specify the query as below:
var request = service.Files.List();
request.Q = "('root' in parents)";
var FileListOfParentOnly = request.Execute();

OPC Foundation Tree structure

I've been searching the web, but I can't figure out how to get a tree view of the items on an OPC server. I used the following code:
using Opc.Da;
using Server=Opc.Da.Server;
using Factory=OpcCom.Factory;
string urlstring = string.Format("opcda://{0}/{1}/{{{2}}}", _hostName, _serverName, serverid);
Server s = new Server(new Factory(), new URL(urlstring));
ItemIdentifier itemId = null;
BrowsePosition position;
BrowseFilters filters = new BrowseFilters() {BrowseFilter = browseFilter.item};
BrowseElement[] elements = s.Browse(itemId, filters, out position);
You do not state what precisely does not work. However, the main problem is probably that you are using BrowseFilter = browseFilter.item. The nodes in the tree are either leaves (sometimes called items) or branches. Your code only asks for the leaves directly under the root of the tree. There may be no items under the root whatsoever, so you need to obtain the branches as well, and then delve deeper into the branches, recursively.
Start by changing your code to use BrowseFilter = browseFilter.all. This should give you all nodes under the root. Then, call the Browse recursively for the branches (just branches, not items) you receive, using the item ID of each branch as the starting point for the new browse.
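The recursion described above has roughly the following shape, shown here as a language-neutral sketch in Python rather than with the OPC .NET API. The browse callable and its (name, item ID, is_branch) tuples are assumptions made for this example; they stand in for the server's Browse call with the filter set to return both branches and items.

```python
def browse_tree(browse, item_id=None, depth=0):
    """Return (depth, name) pairs for every node below item_id, depth-first.

    browse is a hypothetical callable standing in for the server's Browse
    call: given an item ID (None for the root) it returns a list of
    (name, child_item_id, is_branch) tuples covering branches and items.
    """
    result = []
    for name, child_id, is_branch in browse(item_id):
        result.append((depth, name))
        if is_branch:
            # Only branches can have children, so only they are browsed again,
            # using the branch's own item ID as the new starting point.
            result.extend(browse_tree(browse, child_id, depth + 1))
    return result


# In-memory stand-in for a server address space, keyed by parent item ID.
FAKE_ADDRESS_SPACE = {
    None: [("Plant", "Plant", True)],
    "Plant": [("Line1", "Plant.Line1", True), ("Status", "Plant.Status", False)],
    "Plant.Line1": [("Speed", "Plant.Line1.Speed", False)],
}

nodes = browse_tree(lambda item_id: FAKE_ADDRESS_SPACE.get(item_id, []))
```

The depth value recorded for each node is enough to render an indented tree view.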

Getting a level of an ALV tree node?

I created an ALV tree report with 3 levels, using cl_gui_alv_tree. I'm also implementing an event handler for when the user double-clicks a node.
My problem is that I want to take some actions only when the user double-clicks a root node. The event 'node_double_click' gives a node_key, but that's the index of the displayed table. How can I achieve this?
The node ID is not an index, it's the ID you assigned to the node when adding it to the tree.
If possible, I'd suggest switching to CL_SALV_TREE - not only because it is documented and supported by SAP, but also because it comes with some query methods that are quite handy. These methods are documented as well. You can use, for example, GET_NODE to retrieve a node by its ID and then use GET_PARENT to check whether the node in question is a top-level node or has a parent node it is attached to.
I created a pattern for myself, which I use to determine the hierarchy level of a node:
lv_parent1 = node_key.
WHILE lv_parent1 NE go_main_tree->c_virtual_root_node.
  CALL METHOD go_main_tree->get_parent
    EXPORTING
      i_node_key        = lv_parent1
    IMPORTING
      e_parent_node_key = lv_parent1.
  lv_hierlevel = lv_hierlevel + 1.
ENDWHILE.
IF lv_hierlevel > 2.
  " do what you want to do
ENDIF.
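The same parent-walking idea, sketched in Python to make the counting explicit. The parent_of mapping and the VIRTUAL_ROOT marker are stand-ins invented for this example, and the sketch assumes, as the loop above does, that asking for the parent of a top-level node yields the virtual root key.

```python
VIRTUAL_ROOT = None  # stand-in for the tree's virtual root node key


def node_level(node_key, parent_of):
    """Count the steps needed to reach the virtual root from node_key.

    parent_of is a hypothetical mapping from node key to parent key;
    top-level nodes map to VIRTUAL_ROOT.
    """
    level = 0
    current = node_key
    while current != VIRTUAL_ROOT:
        current = parent_of[current]
        level += 1
    return level


PARENTS = {"root": VIRTUAL_ROOT, "child": "root", "leaf": "child"}
levels = {key: node_level(key, PARENTS) for key in PARENTS}
```

Under that assumption a top-level node ends up with level 1, so a check for root nodes only would compare the level with 1 rather than test for a value greater than 2.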

Column Tree Model doesn't expand node after EXPAND_NO_CHILDREN event

I am displaying a list of items using a SAP ABAP column tree model, basically a tree of folders and files, with columns.
I want to load the sub-nodes of folders dynamically, so I'm using the EXPAND_NO_CHILDREN event which is firing correctly.
Unfortunately, after I add the new nodes and items to the tree, the folder is automatically collapsing again, requiring a second click to view the sub-nodes.
Do I need to call a method when handling the event so that the folder stays open, or am I doing something else wrong?
* Set up event handling.
LS_EVENT-EVENTID = CL_ITEM_TREE_CONTROL=>EVENTID_EXPAND_NO_CHILDREN.
LS_EVENT-APPL_EVENT = GC_X.
APPEND LS_EVENT TO LT_EVENTS.
CALL METHOD GO_MODEL->SET_REGISTERED_EVENTS
  EXPORTING
    EVENTS                    = LT_EVENTS
  EXCEPTIONS
    ILLEGAL_EVENT_COMBINATION = 1
    UNKNOWN_EVENT             = 2.
SET HANDLER GO_APPLICATION->HANDLE_EXPAND_NO_CHILDREN
  FOR GO_MODEL.
...
* Add new data to tree.
CALL METHOD GO_MODEL->ADD_NODES
  EXPORTING
    NODE_TABLE          = PTI_NODES[]
  EXCEPTIONS
    ERROR_IN_NODE_TABLE = 1.
CALL METHOD GO_MODEL->ADD_ITEMS
  EXPORTING
    ITEM_TABLE          = PTI_ITEMS[]
  EXCEPTIONS
    NODE_NOT_FOUND      = 1
    ERROR_IN_ITEM_TABLE = 2.
It's been a while since I've played with SAP, but I always found the SAP Library to be particularly helpful when I got stuck...
I managed to come up with this one for you:
http://help.sap.com/saphelp_nw04/helpdata/en/47/aa7a18c80a11d3a6f90000e83dd863/frameset.htm, specifically:
When you add new nodes to the tree model, set the flag ITEMSINCOM to 'X'.
This informs the tree model that you want to load the items for that node on demand.
Hope it helps?
Your code looks fine,
I would use the method ADD_NODES_AND_ITEMS myself if I were to add nodes and items ;)
Beyond that, try calling EXPAND_NODE after you have added the items/nodes and see if that helps.
