What are labels and indices in Neo4j? - ruby

I am using neo4j-core gem (Neo4j::Node API). It is the only MRI-compatible Ruby binding of neo4j that I could find, and hence is valuable, but its documentation is a crap (it has missing links, lots of typographical errors, and is difficult to comprehend). In the Label and Index Support section of the first link, it says:
Create a node with an [sic] label person and one property
Neo4j::Node.create({name: 'kalle'}, :person)
Add index on a label
person = Label.create(:person)
person.create_index(:name)
drop index
person.drop_index(:name)
(whose second code line I believe is a typographical error of the following)
person = Node4j::Label.create(:person)
What is a label, is it the name of a database table, or is it an attribute peculiar to a node?
If it is the name of a node, I don't under the fact that (according to the API in the second link) the method Neo4j::Node.create and Neo4j::Node#add_label can take multiple arguments for the label. What does it mean to have multiple labels on a node?
Furthermore, If I repeat the create command with the same label argument, it creates a different node object each time. What does it mean to have multiple nodes with the same name? Isn't a label something to identify a node?
What is index? How are labels and indices different?

Labels are a way of grouping nodes. You can give the label to many nodes or just one node. Think of it as a collection of nodes that are grouped together. They allow you to assign indexes and other constraints.
An index allows quick lookup of nodes or edges without having to traverse the entire graph to find them. Think of it as a table of direct pointers to the particular nodes/edges indexed.

As I read what you pasted from the docs (and without, admittedly, knowing the slightest thing about neo4j):
It's a graph database, where every piece of data is a node with a certain amount of properties.
Each node can have a label (or more, presumably?). Think of it as a type -- or perhaps more appropriately, in Ruby parlance, a Module.
It's a database, so nodes can be part of an index for quicker access. So can subsets of nodes, and therefor nodes with a certain label.
Put another way: Think of the label as the table in a DB. Nodes as DB rows, which can belong to one or more labels/tables, or no label/table at all for that matter. And indexes as DB indexes on sets of rows.

Related

Best DataStructure to implement Excel Spreadsheet

How we can implement Excel spreadsheet with creation and deletion of rows and creation and deletion if cells, with also can modify data inside any cell.
I was looking for best data structure to implement this.
The problem statement is little vague in my opinion. We do not have any information about the kind of operations that will be very frequent or even the amount of data that this DS is going to hold.
So assuming there can be fair amount of data. Also the operations are addition and deletion of rows and cells.
For excel spreadsheet, If I have to implement it with a custom Data Structure, I would take each row as a node of a linked list. This is helpful because as opposed to an array (n dimensional), the memory can be assigned in non contiguous manner. Also with that benefit, it will make adding and deletion of rows much easy.
Inside each node, we can have array of string to hold cell values and a Id field to hold the Id of the row.
The head node of the DS will have column names as value of its string array. So in a way each column is mapped to an index of the array.
To add a row: It will be an insert into the linked list. Make a new row and append in the end.
To delete a row: Same as deletion of node in a linked list.
To add/update a cell value: You basically know the row Id, you have column name so you can know the index of the column in the array from head node. So once you have the node corresponding to the row, access the index of string array to add/read/update/delete the value of cell.
In order to optimize node access you can keep indexes on the actual linked list to easily locate node by row Id. Some more optimizations would be store row-Id to node pointers mapping some where in auxiliary map or array so that inserting rows in between in also fast.
However I would re-iterate that implementation should be done on the use-case basis. If there are heavy column addition/deletion ops for example, it will be quite slow. There are different kind of trade-offs for each kind of use case.
I think the easy way to go ahead with this is to simply use a JSON structure to hold each row. Column names as keys and the cell values as values. This handles null/empty values quite easily.
A spreadsheet is essentially similar to a table, changes can be made on any cell at any row. Hence going with a simple list structure would not be too bad. The downside to this is that deletion and insertion of in between rows is not performant. But the insertion of rows at end, which is the most common use case and modification of cells can be made quite easy.
To facilitate faster insertion and deletion a linked list structure will help, but it will affect random access adversely, so a simple list of json objects would be the better.

Neo4j optimization: Query for all graphs from selected to selected nodes

I am not so experienced in neo4j and have the requirement of searching for all graphs from a selection A of nodes to a selection B of nodes.
Around 600 nodes in the db with some relationships per node.
Node properties:
riskId
de_DE_description
en_GB_description
en_US_description
impact
Selection:
Selection A is determined by a property match (property: 'riskId')
Selection B is a known constant list of nodes (label: 'Core')
The following query returns the result I want, but it seems a bit slow to me:
match p=(node)-[*]->(:Core)
where node.riskId IN ["R47","R48","R49","R50","R51","R14","R3"]
RETURN extract (n IN nodes(p)| [n.riskId, n.impact, n.en_GB_description] )
as `risks`, length(p)
This query results in 7 rows with between 1 and 4 nodes per row, so not much.
I get around 270ms or more response time in my local environment.
I have not created any indices or done any other performance attempts.
Any hints how I can craft the query in more intelligent way or apply any performance tuning tricks?
Thank you very much,
Manuel
If there is not yet a single label that is shared by all the nodes that have the riskId property, you should add such a label (say, :Risk) to all those nodes. For example:
MATCH (n)
WHERE EXISTS(n.riskId)
SET n:Risk;
A node can have multiple labels. This alone can make your query faster, as long as you specify that node label in your query, since it would restrict scanning to only Risk nodes instead of all nodes.
However, you can do much better by first creating an index, like this:
CREATE INDEX ON :Risk(riskId);
After that, this slightly altered version of your query should be much faster, as it would use the index to quickly get the desired Risk nodes instead of scanning:
MATCH p=(node:Risk)-[*]->(:Core)
WHERE node.riskId IN ["R47","R48","R49","R50","R51","R14","R3"]
RETURN
EXTRACT(n IN nodes(p)| [n.riskId, n.impact, n.en_GB_description]) AS risks,
LENGTH(p);

Elasticsearch Data in Grafana without timestamp

I am wondering if it is possible to have data from elasticsearch indices without timestamp attached to them.
I need a list of two columns as a drop down. This list is cross checked against another index to generate maps but if I zoom into the graph breaks cause the drop down list exists from time a to be but not from c to d. (lol)
My macgyver solution to this is to just add the list every few minutes into the index so on the graph, the data is reasonably dense. This allows the user to zoom in pretty well into different parts of the graph. But overtime this is going to make my index unreasonably large.

Index a graph with ElasticSearch

I can describe my data as a graph, I have nodes and links between them (each of theme have their own data). Each node has a huge number of links connected to him.
My goal: I need to query all nodes with/without a link holding a specific data.
Ideally I would like to create a parent-child relation between the link type with both node types, but this is not possible with Elastic (multiple parents). How would you index it?

how to avoid treeview negative node number problem

I am binding a database table data to treeview.
In documntation it is mentioned nodes count property as integer value which is signed 2 byte.
so if the nodes exceeds this range, nodes count is becoming negative.
Is there any workaround for this?
Yes, this is a documented bug. Fortunately, no one ever encounters it in the real world because it's completely nonsensical for a single TreeView control to ever need to display more than 32,767 nodes.
As mentioned in the linked knowledge base article, the best workaround is to maintain less nodes in your TreeView control. Consider splitting the data up between multiple TreeViews, or using a different control that is better suited for such incredibly large quantities of data.
If you absolutely must use a TreeView, Microsoft recommends that you keep the following in mind:
Performance will become extremely slow as you add more and more nodes.
Do not add more than 65535 nodes. (That's the limit imposed by the native control, which uses an unsigned integer to store the node count.)
Use the SendMessage API function to obtain the true node count. Alternately, you can use a module- or public-level variable to keep track of how many nodes are in the TreeView. Each time you add or remove a node, increment or decrement the variable by one. This is necessary if you need to determine the count of nodes because the Count property of the Nodes collection will not return the correct value.
Don't rely on the Index property of a node object. For example, the Index property is 32767 for node 32767 but is -32768 for node 32768.
You can still refer to a node by using its Key or by passing a number to the Nodes collection.
For example:
TreeView1.Nodes(40000) refers to node 40000.

Resources