Index a graph with ElasticSearch


My data can be described as a graph: I have nodes and links between them, and each node and each link carries its own data. Each node has a very large number of links connected to it.
My goal: I need to query all nodes with or without a link holding specific data.
Ideally I would create a parent-child relation between the link type and both node types, but this is not possible with Elasticsearch, since a child cannot have multiple parents. How would you index it?
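One possible direction, offered only as a sketch and not as a confirmed answer, is to denormalize: store each link inside the documents of both nodes it connects, using a `nested` field, and query with a `nested` query. The field names `links`, `target_id`, and `kind` below are hypothetical, and the mapping uses the typeless format of recent Elasticsearch versions. The functions only build request bodies as plain dictionaries; sending them to a cluster is left out.

```python
# Hypothetical mapping: every node document embeds its links as nested objects.
NODE_MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "keyword"},
            "links": {
                "type": "nested",
                "properties": {
                    "target_id": {"type": "keyword"},
                    "kind": {"type": "keyword"},
                },
            },
        }
    }
}


def link_filter(field, value):
    """Nested clause matching nodes with at least one link where field == value."""
    return {
        "nested": {
            "path": "links",
            "query": {"term": {f"links.{field}": value}},
        }
    }


def nodes_with_link(field, value):
    """Search body for nodes that HAVE such a link."""
    return {"query": link_filter(field, value)}


def nodes_without_link(field, value):
    """Search body for nodes that have NO such link."""
    return {"query": {"bool": {"must_not": [link_filter(field, value)]}}}


if __name__ == "__main__":
    print(nodes_with_link("kind", "friend"))
    print(nodes_without_link("kind", "friend"))
```

The trade-off is that changing one link means reindexing both node documents, and nodes with a very large number of links produce large documents, so this may or may not suit the data described above.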

Related

JQ Grid Search for children as well when searching for Paren

I am fairly new to jqGrid and I am using the adjacency model to display a hierarchical structure. The requirement is that when I search for a text and the matching row has child nodes, those child nodes should be displayed in the search results as well. In the demos I see that when I search for a leaf node, all its parents are listed in the search result, but when I search for a parent, its children are not listed.
I believe this is not available out of the box in the jqGrid plugin; any help or pointers to a solution would be highly appreciated.
Example: say I have data like the following,
ELECTRONICS
    TELEVISIONS
        TUBE
            26 " TV
            30 " TV
So if I search for Electronics, my search result should include all its children, which in this case are Televisions, Tube, and both leaf nodes, and not just Electronics.
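The desired behaviour can be sketched independently of jqGrid, for example as a filter applied to the rows before they are handed to the grid. This is a minimal sketch and not a jqGrid API: the row fields `id`, `name`, and `parent` are assumed names, and the nesting (both TVs under TUBE) is an assumption based on the description above.

```python
from collections import defaultdict

# Hypothetical rows in an adjacency model: each row stores its parent's id.
ROWS = [
    {"id": 1, "name": "ELECTRONICS", "parent": None},
    {"id": 2, "name": "TELEVISIONS", "parent": 1},
    {"id": 3, "name": "TUBE", "parent": 2},
    {"id": 4, "name": '26 " TV', "parent": 3},
    {"id": 5, "name": '30 " TV', "parent": 3},
]


def search_with_descendants(rows, text):
    """Return rows whose name contains `text`, plus all of their descendants."""
    children = defaultdict(list)
    for row in rows:
        children[row["parent"]].append(row["id"])

    needle = text.lower()
    keep = set()
    # Start from every direct match and walk down the tree.
    stack = [row["id"] for row in rows if needle in row["name"].lower()]
    while stack:
        node_id = stack.pop()
        if node_id in keep:
            continue
        keep.add(node_id)
        stack.extend(children[node_id])

    # Preserve the original row order so the result can still render as a tree.
    return [row for row in rows if row["id"] in keep]


if __name__ == "__main__":
    for row in search_with_descendants(ROWS, "electronics"):
        print(row["name"])
```

Searching for "tube" with this sketch keeps TUBE and both TV rows; the ancestors of a match could be added in the same way by walking the `parent` field upward.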

Neo4j optimization: Query for all graphs from selected to selected nodes

I am not very experienced with Neo4j and need to find all graphs (paths) from a selection A of nodes to a selection B of nodes.
The database has around 600 nodes with a few relationships per node.
Node properties:
riskId
de_DE_description
en_GB_description
en_US_description
impact
Selection:
Selection A is determined by a property match (property: 'riskId')
Selection B is a known constant list of nodes (label: 'Core')
The following query returns the result I want, but it seems a bit slow to me:
MATCH p=(node)-[*]->(:Core)
WHERE node.riskId IN ["R47","R48","R49","R50","R51","R14","R3"]
RETURN extract(n IN nodes(p) | [n.riskId, n.impact, n.en_GB_description]) AS `risks`,
       length(p)
This query results in 7 rows with between 1 and 4 nodes per row, so not much.
I get around 270ms or more response time in my local environment.
I have not created any indices or tried any other performance tuning.
Any hints on how I can craft the query more intelligently or apply any performance tuning tricks?
Thank you very much,
Manuel

If there is not yet a single label that is shared by all the nodes that have the riskId property, you should add such a label (say, :Risk) to all those nodes. For example:
MATCH (n)
WHERE EXISTS(n.riskId)
SET n:Risk;
A node can have multiple labels. This alone can make your query faster, as long as you specify that node label in your query, since it would restrict scanning to only Risk nodes instead of all nodes.
However, you can do much better by first creating an index, like this:
CREATE INDEX ON :Risk(riskId);
After that, this slightly altered version of your query should be much faster, as it would use the index to quickly get the desired Risk nodes instead of scanning:
MATCH p=(node:Risk)-[*]->(:Core)
WHERE node.riskId IN ["R47","R48","R49","R50","R51","R14","R3"]
RETURN
EXTRACT(n IN nodes(p)| [n.riskId, n.impact, n.en_GB_description]) AS risks,
LENGTH(p);
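To see why narrowing the starting nodes matters, it can help to picture what a pattern like (node:Risk)-[*]->(:Core) has to compute: from every start node, every outgoing path is expanded until it ends at a Core node. The following is a toy model in plain Python, not Neo4j code; the relationships and the Core node `C1` are made-up sample data.

```python
# Toy graph: directed relationships as (start, end) pairs. All data is made up.
RELATIONSHIPS = [("R47", "R10"), ("R10", "C1"), ("R48", "C1"), ("R3", "R47")]
CORE = {"C1"}


def paths_to_core(start_ids, relationships, core):
    """Enumerate every directed path (length >= 1) from a start node to a Core node."""
    outgoing = {}
    for index, (src, dst) in enumerate(relationships):
        outgoing.setdefault(src, []).append((index, dst))

    results = []

    def walk(node, path, used):
        for index, nxt in outgoing.get(node, []):
            if index in used:  # never reuse a relationship within one path
                continue
            new_path = path + [nxt]
            if nxt in core:
                results.append(new_path)
            # Keep expanding: longer paths may reach a Core node as well.
            walk(nxt, new_path, used | {index})

    for start in start_ids:
        walk(start, [start], frozenset())
    return results


if __name__ == "__main__":
    for path in paths_to_core(["R47", "R48", "R3"], RELATIONSHIPS, CORE):
        print(path, len(path) - 1)
```

The expansion work is done once per start node, so finding the few matching start nodes through an index instead of testing every node in the database keeps the expensive part small.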

Elasticsearch Data in Grafana without timestamp

I am wondering if it is possible to use data from Elasticsearch indices without a timestamp attached to them.
I need a list of two columns as a drop-down. This list is cross-checked against another index to generate maps, but if I zoom in, the graph breaks because the drop-down list exists from time a to b but not from c to d.
My MacGyver solution is to add the list to the index every few minutes so that the data on the graph is reasonably dense. This lets the user zoom in fairly well on different parts of the graph, but over time it is going to make my index unreasonably large.

What are labels and indices in Neo4j?

I am using the neo4j-core gem (Neo4j::Node API). It is the only MRI-compatible Ruby binding for Neo4j that I could find, and hence is valuable, but its documentation is poor (it has missing links, lots of typographical errors, and is difficult to comprehend). In the Label and Index Support section of the first link, it says:
Create a node with an [sic] label person and one property
Neo4j::Node.create({name: 'kalle'}, :person)
Add index on a label
person = Label.create(:person)
person.create_index(:name)
drop index
person.drop_index(:name)
(whose second code line I believe is a typographical error for the following)
person = Neo4j::Label.create(:person)
What is a label? Is it the name of a database table, or is it an attribute peculiar to a node?
If it is the name of a node, I don't understand why (according to the API in the second link) the methods Neo4j::Node.create and Neo4j::Node#add_label can take multiple arguments for the label. What does it mean to have multiple labels on a node?
Furthermore, if I repeat the create command with the same label argument, it creates a different node object each time. What does it mean to have multiple nodes with the same name? Isn't a label something that identifies a node?
What is an index? How are labels and indices different?

Labels are a way of grouping nodes. You can give the label to many nodes or just one node. Think of it as a collection of nodes that are grouped together. They allow you to assign indexes and other constraints.
An index allows quick lookup of nodes or edges without having to traverse the entire graph to find them. Think of it as a table of direct pointers to the particular nodes/edges indexed.

As I read what you pasted from the docs (and without, admittedly, knowing the slightest thing about neo4j):
It's a graph database, where every piece of data is a node with a certain number of properties.
Each node can have a label (or more, presumably?). Think of it as a type -- or perhaps more appropriately, in Ruby parlance, a Module.
It's a database, so nodes can be part of an index for quicker access. So can subsets of nodes, and therefore nodes with a certain label.
Put another way: Think of the label as the table in a DB. Nodes as DB rows, which can belong to one or more labels/tables, or no label/table at all for that matter. And indexes as DB indexes on sets of rows.
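The table analogy can be made concrete with a small in-memory model. This is only an illustration of the concepts, written in plain Python; it is not the neo4j-core API, and the class and method names are invented for the sketch.

```python
from collections import defaultdict


class TinyGraph:
    """In-memory stand-in for a graph store; not the neo4j-core API."""

    def __init__(self):
        self.nodes = []                    # every node, in creation order
        self.by_label = defaultdict(list)  # label -> nodes carrying it
        self.indexes = {}                  # (label, prop) -> {value: [nodes]}

    def create_node(self, props, *labels):
        # Each call creates a distinct node, even with identical arguments.
        node = {"props": dict(props), "labels": set(labels)}
        self.nodes.append(node)
        for label in labels:
            self.by_label[label].append(node)
            # Keep any existing index on this label up to date.
            for (ix_label, prop), table in self.indexes.items():
                if ix_label == label and prop in props:
                    table.setdefault(props[prop], []).append(node)
        return node

    def create_index(self, label, prop):
        table = {}
        for node in self.by_label[label]:
            if prop in node["props"]:
                table.setdefault(node["props"][prop], []).append(node)
        self.indexes[(label, prop)] = table

    def find(self, label, prop, value):
        table = self.indexes.get((label, prop))
        if table is not None:
            return table.get(value, [])    # direct lookup via the index
        # No index: scan only the nodes that carry this label.
        return [n for n in self.by_label[label] if n["props"].get(prop) == value]


if __name__ == "__main__":
    g = TinyGraph()
    g.create_node({"name": "kalle"}, "person")
    g.create_node({"name": "kalle"}, "person", "admin")
    g.create_index("person", "name")
    print(len(g.find("person", "name", "kalle")))
```

In this model a label groups nodes rather than identifying one, a node may carry several labels, and an index is a lookup table from a property value to the nodes of one label.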

neo4j performance with 4M nodes and 29M relations

I have a tree with 80,000 nodes and 4M leaves. The leaves are assigned to the tree nodes by 29M relations. In fact I have around 4 trees where the leaves are assigned to different nodes, but that does not matter.
After about 6 days of work I figured out how to import this amount of data into Neo4j within an acceptable time, after a lot of cases (CSV import in Neo4j 2.1) where the Neo4j process was stuck at 100% and did not seem to do anything. I'm now creating the database with this tool:
https://github.com/jexp/batch-import/tree/20
which is VERY fast!
Now I finally have my database and started with a simple query like "how many leaves does a specific node have":
MATCH (n:Node {id:123})-[:ASSIGNED]-(l:Leaf) RETURN COUNT(l);
I created an index on the "id" property, but this query still takes 52 seconds.
It seems like the relation (without properties) is not indexed at all...
Is there a way to make this faster?

The relationships don't have to be indexed.
Did you create an index like this:
create index on :Node(id);
I recommend that you add a direction to your arrow; otherwise you will follow all relationships up and down the tree.
MATCH (n:Node {id:123})<-[:ASSIGNED]-(l:Leaf) RETURN COUNT(l);
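The effect of the arrow direction can be illustrated with a toy model in plain Python. It assumes, purely for illustration, that the store keeps incoming and outgoing relationships of a node in separate lists and that node `n123` also has one outgoing relationship to another tree node; all identifiers here are made up.

```python
def count_leaves(node_id, incoming, outgoing, labels, directed=True):
    """Count :Leaf neighbours and report how many relationships were examined."""
    candidates = list(incoming.get(node_id, []))
    if not directed:
        # An undirected pattern has to consider both directions.
        candidates += outgoing.get(node_id, [])
    leaves = sum(1 for other in candidates if "Leaf" in labels[other])
    return leaves, len(candidates)


# Hypothetical sample data: two leaves point at n123, and n123 points at n7.
INCOMING = {"n123": ["leaf1", "leaf2"]}
OUTGOING = {"n123": ["n7"]}
LABELS = {"leaf1": {"Leaf"}, "leaf2": {"Leaf"}, "n7": {"Node"}, "n123": {"Node"}}

if __name__ == "__main__":
    print(count_leaves("n123", INCOMING, OUTGOING, LABELS, directed=True))
    print(count_leaves("n123", INCOMING, OUTGOING, LABELS, directed=False))
```

Both calls find the same two leaves, but the undirected variant examines one extra relationship that can never lead to a leaf; on a node with many relationships in the other direction, that extra work grows accordingly.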
