How to create an index for relationships when using apoc.algo.dijkstra in Neo4j

I want to create appropriate indexes on relationships which increase performance when using the apoc.algo.dijkstra algorithm.
My query looks like this
MATCH (a:Waypoint {name: 'nameTwo'}), (b:Waypoint{name: 'nameOne'}) CALL apoc.algo.dijkstra(a, b, 'STREET_A>|STREET_B>', 'distance') yield path as path, weight as distance RETURN path, distance;
I want indexes for the relationship types 'STREET_A' and 'STREET_B'.
I tried creating indexes like this one, but it does not seem to make any performance difference:
CREATE INDEX STREET_A_INDEX IF NOT EXISTS FOR ()-[r:STREET_A]-() ON (r.distance)
This is the result with PROFILE:
[image: query plan]
Is it at all possible to make apoc.algo.dijkstra more performant via an index?

I don't think a relationship index will help in this case. Indexes are used in Neo4j to find the starting point of a traversal - in your case that would be finding the two Waypoint nodes. From the query plan, it looks like you already have an index that is being used to find those nodes.
Neo4j has what's called "index-free adjacency". This means that a traversal from one node to another following a relationship doesn't use an index, rather the operation is more like following pointers - so a relationship index would not be used in your example.
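To illustrate the point about following pointers, here is a minimal Python sketch of Dijkstra over adjacency lists. It is not how Neo4j or APOC is implemented; the `graph` dictionary, the intermediate node `mid` and the distances are hypothetical. Each node carries its own list of outgoing relationships, so once the start node has been found, expanding the search only reads those lists and filters by relationship type.

```python
import heapq

# Hypothetical in-memory graph: each node holds its outgoing relationships
# directly, as (relationship type, target node, distance) tuples.
graph = {
    "nameTwo": [("STREET_A", "mid", 2.0), ("STREET_B", "nameOne", 7.0)],
    "mid": [("STREET_A", "nameOne", 3.0)],
    "nameOne": [],
}


def dijkstra(graph, start, goal, allowed_types):
    """Return (total distance, path) or None if goal is unreachable."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in settled:
            continue
        settled.add(node)
        # Expansion reads the node's own relationship list; there is no
        # lookup by relationship property anywhere in the traversal.
        for rel_type, target, weight in graph[node]:
            if rel_type in allowed_types and target not in settled:
                heapq.heappush(queue, (dist + weight, target, path + [target]))
    return None


print(dijkstra(graph, "nameTwo", "nameOne", {"STREET_A", "STREET_B"}))
```

The only step that resembles an index lookup is locating the start and end nodes by name, which matches what the query plan shows.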

Related

Are composite indexes supported in Memgraph?

Can I create an index on more than one property for a given label? I am running a query like:
MATCH (n:Node) WHERE n.id = xyz AND n.source = abc RETURN n;
and I want to know whether I can create multi-property indexes and, if not, whether there is a good way to query for nodes matching multiple properties.

Memgraph does not support composite indexes, but you can create multiple label-property indexes. For example, run
CREATE INDEX ON Node(id);
CREATE INDEX ON Node(source);
To check that they were created properly, run SHOW INDEX INFO;.
Run the query with EXPLAIN or PROFILE (see the Memgraph documentation on inspecting and profiling queries) to compare the options and test the performance; one label-property index may be good enough.

How do I query nodes that are missing a child of a specific type?

I'm new to GraphQL, and trying to understand how I might handle this use case.
I have thousands of nodes of a specific type/schema.
Some of these nodes have children, some of them don't.
I'd like to query all the nodes, and return only the ones that don't have children.
This might get more specific in the future, where I'd like to query only nodes that don't have children of a specific type.
Is that even possible?
I've seen plenty of query examples that show how to select child nodes, or nested nodes + fields, or nodes with specific values. It's an easy thing with SQL; I'm just having trouble understanding how it's done with GraphQL.
Thoughts?

As Daniel Rearden said, there is no built-in way in GraphQL to filter or sort the results of a query. We have a few filters in our Gentics Mesh GraphQL API, but it is currently not possible to create a filter involving another list of items (children in your case).
I've added your case to the issue on GitHub: https://github.com/gentics/mesh/issues/27

elasticsearch: decide which query should run first

We have a simple web page where the user can provide some input and query the database. We currently use MongoDB but want to migrate to Elasticsearch, since the queries are faster.
There are some required search fields, like start and end date, and some optional ones, like a search string to match an entry, or a parent search string, to match parent entries. Parent-child relations are just described through fields containing each entry's ancestors ids.
The question is the following: If both search and parent search string are provided, is there a way to know before executing the queries, which query should be executed first, in order to provide results faster and to be more performant?
For example, it could be that a specific parent search results in only 2 docs/parent entries, and then we can fetch all children matching the search string. In that case we should execute the parent query first and then the entry query.
One option would be to get the count of both queries and then execute first the one with the smallest count, but isn't this solution worse, since the queries are going to be executed twice? Once for the count and once for the actual query.
Are there any other options to solve this?
PS. We use elasticsearch v1.7
Example
Let's say the user wants to search for all entries matching the following fields.
searchString: type:BLOCK AND name:test
parentSearchString: name:parentTest AND NOT type:BLOCK
This means that we either have to
fetch all entries (parents) matching the parentSearchString and store their ids. Then, we have to fetch all entries that match the searchString and also have to contain any of the parent ids in the ancestors field.
OR
fetch all entries that match the searchString and store all ancestors ids. Then fetch all entries that match the parentSearchString and their id is one of the ancestors ids.
Just to clarify, both parent and child entries have the exact same structure and reside in the same index. We cannot have different indices since the parent-child relation can be nested up to 10 levels deep, so an entry can be both a parent and a child. An entry looks more or less like:
{
id: "e32452365321",
name: "name",
type: "type",
ancestors: "id1 id2 id3" // stored in node as an array of ids
}

First of all, I would advise you to upgrade your Elasticsearch version, if possible. A lot has happened since 1.7 and, to be honest, I can't tell whether everything in the article linked below is valid for such an old version (it probably isn't).
But to your actual question: if I understand you correctly, you are trying to estimate how costly a query is for Elasticsearch? Well, you don't have to. If you provide all 'queries' in one nested query, Elasticsearch will do that for you: https://www.elastic.co/blog/elasticsearch-query-execution-order
Regarding speed, there is one other thing I can mention: calculating the score takes time. So if sorting is not based on the Elasticsearch _score, you want to use boolean filter queries. The same applies if you want to sort only by the _score of parent matches: you could then put the query for children into a filter.
update
Thanks to your example, I now see the problem. Self-referential parent-child relations are unfortunately not supported by Elasticsearch, so your approach is probably right. You might want to check out the short chapter of the documentation about application-side joins.
So yes, in general, you want to send the second query with the fewest possible ids/terms. Getting counts for both queries is not as bad as you might think, because the results are most likely still cached, but does it actually help? If you're going from child to parent, you would have to count the ancestors (field values), not the actual documents.
I would argue that the most expensive operation is very often fetching the result source from disk. So whichever way you go, you should probably fetch only what you need in the first query. Your options are:
Fetch only the id of parent matches, and then use a terms filter on ancestors in the second query.
Or, fetch only the ancestors field of child matches, and use an id filter in your second query.
Unfortunately, I can't help you more than that, since I don't have enough experience comparing the speed of those approaches. My guess would be that an id filter is faster in general, but that's just a guess.
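The two options can be sketched in plain Python to show that they return the same entries and differ only in which set of ids is carried into the second step. This is a minimal in-memory simulation, not Elasticsearch client code: the `docs` list, the `FOLDER` type and the helper names are hypothetical, and `ancestors` is modelled as an array of ids as described in the question.

```python
# Hypothetical in-memory stand-in for the index; a real setup would run
# each step as one Elasticsearch request.
docs = [
    {"id": "p1", "name": "parentTest", "type": "FOLDER", "ancestors": []},
    {"id": "p2", "name": "other", "type": "FOLDER", "ancestors": []},
    {"id": "c1", "name": "test", "type": "BLOCK", "ancestors": ["p1"]},
    {"id": "c2", "name": "test", "type": "BLOCK", "ancestors": ["p2"]},
    {"id": "c3", "name": "misc", "type": "BLOCK", "ancestors": ["p1"]},
]


def matches_search(doc):
    # searchString: type:BLOCK AND name:test
    return doc["type"] == "BLOCK" and doc["name"] == "test"


def matches_parent_search(doc):
    # parentSearchString: name:parentTest AND NOT type:BLOCK
    return doc["name"] == "parentTest" and doc["type"] != "BLOCK"


def parent_first(docs):
    # Step 1: fetch only the ids of parent matches.
    parent_ids = {d["id"] for d in docs if matches_parent_search(d)}
    # Step 2: child query plus a terms-style filter on ancestors.
    return sorted(d["id"] for d in docs
                  if matches_search(d) and parent_ids & set(d["ancestors"]))


def child_first(docs):
    # Step 1: fetch only the ancestors field of child matches.
    children = [d for d in docs if matches_search(d)]
    ancestor_ids = {a for d in children for a in d["ancestors"]}
    # Step 2: parent query plus an ids-style filter.
    parent_ids = {d["id"] for d in docs
                  if d["id"] in ancestor_ids and matches_parent_search(d)}
    # Keep the children that have at least one matching parent.
    return sorted(d["id"] for d in children
                  if parent_ids & set(d["ancestors"]))


print(parent_first(docs), child_first(docs))
```

In this toy data the parent-first route carries one id into the second step and the child-first route carries two, which is the quantity the ordering decision depends on.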

stored procedures search elastic search

I am trying to script a recursive search in Elasticsearch.
I know there are search templates, but I cannot find examples for a scenario like the one below.
Example: father = neo
1. Search the person-index documents for the father attribute.
2. If father = neo, return "direct"; otherwise (here father = ted) continue.
3. Search for ted and check whether father = neo; if so, return "indirect".
Otherwise repeat step 3 until the script finds the ancestor. If it reaches father = some constant (such as "progenitor" or "ancient") without finding it, return "not related".
This would spare me from moving to a graph database, given that I have only one relation.
Another scenario would be finding all descendants of "neo".

There is no facility to do exactly what you are describing inside Elasticsearch at the moment.
If the number of ancestral generations is limited and they can be expressed as 1-to-many relationships, you can use multiple has_parent queries.
Alternatively, if it's possible, you can denormalize the data and store the names of all ancestors for the given record in a single field. The record would then look like this:
{
"father": "neo",
"ancestors": ["neo", "ted", ... ]
}
Otherwise, you need to do these searches outside of elasticsearch.
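Doing the search outside Elasticsearch could look roughly like the following Python sketch. Everything here is hypothetical: the `FATHERS` dictionary stands in for the person index, `lookup_father` stands in for one query per step, and `ROOT` is an assumed constant marking the top of every family tree, as in the question.

```python
# Hypothetical data: person -> father. In a real setup each lookup would be
# one query against the person index.
FATHERS = {"sam": "ted", "ted": "neo", "neo": "ancient", "bob": "ancient"}
ROOT = "ancient"  # assumed sentinel marking the top of every family tree


def lookup_father(person):
    """Stand-in for a single search request; returns None if unknown."""
    return FATHERS.get(person)


def relation_to(person, ancestor, max_depth=100):
    """Return 'direct', 'indirect' or 'not related'."""
    current = lookup_father(person)
    depth = 1
    # max_depth guards against cycles or unexpectedly deep chains.
    while current is not None and current != ROOT and depth <= max_depth:
        if current == ancestor:
            return "direct" if depth == 1 else "indirect"
        current = lookup_father(current)
        depth += 1
    return "not related"


print(relation_to("ted", "neo"))
print(relation_to("sam", "neo"))
print(relation_to("bob", "neo"))
```

Each loop iteration costs one round trip, so this only suits shallow trees; the denormalized `ancestors` field answers the same question with a single query.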

Neo4j Cypher query slow to find common children. What is the best approach?

I'm new to Neo4j - 2.1.6. In my graph any given node can have multiple parents and multiple children that are also parents and children of other nodes. What I need to do is find those parents and children that several searched nodes all have in common. The searched nodes can be one to hundreds of relations away from their common parent or child. All nodes have the same label.
I'm using the following Cypher query, but it is very slow once you add more than a couple of nodes. I'd like to search 20 or more nodes at a time for their common connections. Here I'm searching for common children of 4 nodes:
MATCH (n1)-[*]->(x), (n2)-[*]->(x), (n3)-[*]->(x), (n4)-[*]->(x)
WHERE n1.name = "node1" AND n2.name ="node2" AND n3.name ="node3" AND n4.name ="node4"
RETURN DISTINCT x.name
Is there some other way I should be approaching this?
Thanks!

Add the labels, and make sure you have an index on :Label(name).
E.g. if your label is :Node:
create index on :Node(name);
MATCH (n1:Node)-[*]->(x),(n2:Node)-[*]->(x),(n3:Node)-[*]->(x),(n4:Node)-[*]->(x)
USING INDEX n1:Node(name)
USING INDEX n2:Node(name)
USING INDEX n3:Node(name)
USING INDEX n4:Node(name)
WHERE n1.name = "node1" AND n2.name ="node2" AND n3.name ="node3" AND n4.name ="node4"
RETURN DISTINCT x.name
For very long paths, Cypher can have some issues.
If this is a frequent operation in your graph that must finish in milliseconds, I recommend creating a server extension for Neo4j Server, written in Java.
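The kind of logic such an extension might implement can be sketched in Python (used here only for brevity; the `children` dictionary and node names are hypothetical). Instead of enumerating every path as the `[*]` pattern does, it visits each reachable node once per start node and then intersects the resulting sets.

```python
from collections import deque

# Hypothetical graph: node name -> names of its direct children.
children = {
    "node1": ["a", "b"],
    "node2": ["b", "c"],
    "node3": ["b"],
    "a": ["x"],
    "b": ["x", "y"],
    "c": ["y"],
    "x": [],
    "y": [],
}


def descendants(graph, start):
    """Breadth-first search that visits every reachable node exactly once."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def common_descendants(graph, starts):
    """Nodes reachable from every start node."""
    sets = [descendants(graph, s) for s in starts]
    return set.intersection(*sets) if sets else set()


print(sorted(common_descendants(children, ["node1", "node2", "node3"])))
```

The cost grows with the number of nodes and relationships reachable from each start node, not with the number of distinct paths between them.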
