Cypher query for finding out content and connectedness as security measure - memgraphdb

I have a node that represents a computer infected with malware. I want to see if other computers (based on log files) have had some interaction with the infected computer. I have already transferred and mapped log files into the Memgraph database.
What would a Cypher query for this scenario look like?

A basic Cypher query that you can use for this scenario would be:
MATCH p1=(n:Node1)-[*]->(m:Node2), p2=(n)-[*]->(m), (n)-[r]->(f:FraudulantActivity)
WHERE p1!=p2
RETURN nodes(p1)+nodes(p2)
This Cypher query looks for two different paths, p1 and p2, between node n and node m, where n is also directly connected to a FraudulantActivity node, and returns the nodes on both paths. Those nodes could be involved in malicious activity.
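To illustrate the idea outside of the database, here is a minimal Python sketch of what the query above asks for: enumerate the directed paths between two nodes and, if more than one exists, collect every node that lies on them. The adjacency dict, the node names, and both function names are hypothetical and exist only for this example; it is not how Memgraph evaluates the query.

```python
def trails(graph, start, goal):
    """Enumerate directed paths from start to goal that never reuse an edge."""
    found = []

    def walk(node, path, used):
        if node == goal and len(path) > 1:
            found.append(tuple(path))
        for nxt in graph.get(node, ()):
            edge = (node, nxt)  # parallel edges are not modelled in this sketch
            if edge not in used:
                used.add(edge)
                path.append(nxt)
                walk(nxt, path, used)
                path.pop()
                used.remove(edge)

    walk(start, [start], set())
    return found


def nodes_on_alternative_paths(graph, start, goal):
    """Return all nodes on the paths, but only if at least two paths exist."""
    paths = trails(graph, start, goal)
    if len(paths) < 2:
        return set()
    return {node for path in paths for node in path}


# Hypothetical interactions reconstructed from log files.
graph = {
    "infected": ["a", "b", "c"],
    "a": ["server"],
    "b": ["server"],
    "c": [],  # dead end: not on any path to "server"
}
suspicious = nodes_on_alternative_paths(graph, "infected", "server")
```

In this made-up graph, the dead-end node is left out because it does not lie on any path between the two endpoints.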

Related

Finding an alternate route using Cypher and Memgraph

I have a situation where I have several data centers spread across large distances. I have a main site and a backup site.
There are several routes between the main and backup sites. Depending on conditions (such as connectivity issues), I need to decide which route to activate for data transfer.
I have all of the data on datacenters, routes and server characteristics stored as nodes and relationships in Memgraph.
What would a Cypher query look like for choosing the optimal route in this decision-making scenario?
This can be solved using the weighted shortest path (wShortest) algorithm. The code for this scenario could be something like this:
MATCH p=(n:Node3 {prop:"c"})-[r *wShortest (e,v | 1) sum]->(t:Target), (n2:Node2 {prop:"b"})
WHERE not n2 in nodes(p)
RETURN p
ORDER BY sum ASC;
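For readers who want to see the underlying idea, the following is a rough Python sketch of a weighted shortest path search (Dijkstra's algorithm) that skips an unavailable site. The site names, the weights, and the `avoid` parameter are assumptions made for this example. Excluding the node during the search, rather than filtering the result afterwards, is a choice of this sketch and not a statement about how Memgraph evaluates the query above.

```python
import heapq


def cheapest_route(graph, source, target, avoid=frozenset()):
    """Dijkstra's algorithm: return (total_cost, path), or None if unreachable.

    Nodes listed in `avoid` are never entered.
    """
    queue = [(0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in settled and nxt not in avoid:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None


# Hypothetical routes between a main site and a backup site.
routes = {
    "main": {"hubA": 1, "hubB": 1},
    "hubA": {"backup": 1},
    "hubB": {"relay": 1},
    "relay": {"backup": 1},
}
best = cheapest_route(routes, "main", "backup")
fallback = cheapest_route(routes, "main", "backup", avoid={"hubA"})
```

Here `best` goes through hubA, while `fallback` is the cheapest route that remains once hubA is treated as unavailable.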

In Neo4j, on what basis are paths from apoc.path.spanningTree sorted (default sort)?

I am using apoc.path.spanningTree with some relationship filters and some label filters, with maxLevel:-1.
As a result, I get 5 paths as output in some order, and I cannot work out the basis on which they are sorted.
What I have noticed is that the sorting appears to be based on the Neo4j id of the last node in the path.
But if I update any intermediate node in any of the paths, this order changes.
The procedure is not documented to return the paths in any particular order, so you should not assume a particular ordering is used. And the algorithm can change at any time anyway.
If your query needs the paths in a specific order, it should sort the returned paths itself.

Neo4j cypher query takes an infinite time to execute

I have loaded 147 nodes connected by 1718 relationships into a local Docker instance of Neo4j 3.3.1 Community. These form a highly cyclic graph.
All the nodes have the same label :EClass and two attributes, class and package.
The following query counts the number of classes reachable from the package modelQueryLanguage by following an unbounded number of steps.
MATCH (a:EClass {package: 'modelQueryLanguage'})-[*1..]->(b)
RETURN count(DISTINCT b)
The problem is that this query never finishes.
My instinct tells me that the DISTINCT clause is supposed to define a stop condition for the potentially infinite traversal of the graph.
How can I write an equivalent Cypher query that executes quickly?
Cypher's mode of expansion will attempt to find all possible paths matching the pattern, with the only restriction that a relationship cannot occur more than once per path. With highly connected graphs (and inadequate restrictions on relationship type/direction), this becomes an infeasible means of expansion, as the number of possible unique paths in the graph to every other node in the graph can become huge. This is not ideal for a reachability query.
APOC Procedures has some path expander procedures that are made specifically for use cases like this, where only a single path per node is needed, not all possible paths. And if you just need the nodes and not the paths, there's a procedure for that too.
Here's an example of usage for your query:
MATCH (a:EClass {package: 'modelQueryLanguage'})
CALL apoc.path.subgraphNodes(a, {relationshipFilter:'>'}) YIELD node
RETURN count(DISTINCT node) AS count
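The difference between the two expansion styles can be sketched in Python. The example below is an illustration only, on a small made-up graph in which every node points to every other node: `count_trails` mimics "all paths in which no relationship repeats", and `reachable` mimics "visit each node once". Both function names and the graph are assumptions for this example, not part of Neo4j or APOC.

```python
from collections import deque


def count_trails(graph, start):
    """Count every path from start that uses each edge at most once."""
    def walk(node, used):
        total = 0
        for nxt in graph[node]:
            edge = (node, nxt)
            if edge not in used:
                total += 1 + walk(nxt, used | {edge})
        return total

    return walk(start, frozenset())


def reachable(graph, start):
    """Breadth-first search: each node is visited exactly once."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical highly cyclic graph: 4 nodes, each linked to every other node.
nodes = range(4)
graph = {a: [b for b in nodes if b != a] for a in nodes}

trail_count = count_trails(graph, 0)  # grows explosively with graph density
visited = reachable(graph, 0)         # bounded by the number of nodes
```

Even with only 4 nodes and 12 relationships, the number of relationship-unique paths is far larger than the number of nodes, which is why enumerating them on 147 nodes and 1718 relationships is impractical while a visit-once traversal stays cheap.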

Neo4j optimization: Query for all graphs from selected to selected nodes

I am not very experienced with Neo4j, and I need to search for all graphs from a selection A of nodes to a selection B of nodes.
Around 600 nodes in the db with some relationships per node.
Node properties:
riskId
de_DE_description
en_GB_description
en_US_description
impact
Selection:
Selection A is determined by a property match (property: 'riskId')
Selection B is a known constant list of nodes (label: 'Core')
The following query returns the result I want, but it seems a bit slow to me:
match p=(node)-[*]->(:Core)
where node.riskId IN ["R47","R48","R49","R50","R51","R14","R3"]
RETURN extract (n IN nodes(p)| [n.riskId, n.impact, n.en_GB_description] )
as `risks`, length(p)
This query results in 7 rows with between 1 and 4 nodes per row, so not much.
I get around 270ms or more response time in my local environment.
I have not created any indices or done any other performance attempts.
Any hints on how I can craft the query in a more intelligent way or apply any performance tuning tricks?
Thank you very much,
Manuel
If there is not yet a single label that is shared by all the nodes that have the riskId property, you should add such a label (say, :Risk) to all those nodes. For example:
MATCH (n)
WHERE EXISTS(n.riskId)
SET n:Risk;
A node can have multiple labels. This alone can make your query faster, as long as you specify that node label in your query, since it would restrict scanning to only Risk nodes instead of all nodes.
However, you can do much better by first creating an index, like this:
CREATE INDEX ON :Risk(riskId);
After that, this slightly altered version of your query should be much faster, as it would use the index to quickly get the desired Risk nodes instead of scanning:
MATCH p=(node:Risk)-[*]->(:Core)
WHERE node.riskId IN ["R47","R48","R49","R50","R51","R14","R3"]
RETURN
EXTRACT(n IN nodes(p)| [n.riskId, n.impact, n.en_GB_description]) AS risks,
LENGTH(p);
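As a conceptual analogy only (this is not how Neo4j implements indexes), the following Python sketch contrasts scanning every node with looking nodes up in a prebuilt table keyed by riskId. The node data is invented for the example.

```python
# Hypothetical in-memory stand-in for about 600 nodes with a riskId property.
nodes = [{"riskId": f"R{i}", "impact": i % 5} for i in range(1, 601)]
wanted = ["R47", "R48", "R49", "R50", "R51", "R14", "R3"]

# Without an index: every node is inspected on every query.
scan_hits = [n for n in nodes if n["riskId"] in wanted]

# With an index: build a lookup table once, then jump straight to the matches.
risk_index = {}
for n in nodes:
    risk_index.setdefault(n["riskId"], []).append(n)
index_hits = [n for rid in wanted for n in risk_index.get(rid, [])]
```

Both approaches return the same nodes; the indexed lookup simply avoids touching the nodes that cannot match.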

Neo4j - slow cypher query - big graph with hierarchies

I am using Neo4j 2.1.4. I have a graph with 'IS A' relationships (and other types of relationships) between nodes. The graph contains several hierarchies (built from IS A relationships), and I need to find the descendants (via IS A) of one hierarchy that have a particular, known relationship with some descendant of a second hierarchy. If that relationship exists, I return the matching descendant(s) of the first hierarchy.
INPUTS: 'ID_parentnode_hierarchy_01', 'ID_relationship', 'ID_parentnode_hierarchy_02'.
OUTPUT: Descendants (IS A relationship) of 'ID_parentnode_hierarchy_01' that have 'ID_relationship' with some descendant of 'ID_parentnode_hierarchy_02'.
Note: The graph has 500,000 nodes and 2 million relationships.
I am using this Cypher query, but it is very slow (approx. 40 s on a 64-bit PC with 4 GB RAM and a 3 GHz Pentium Dual Core). Is it possible to build a faster query?
MATCH (parentnode_hierarchy_01: Node{nodeid : {ID_parentnode_hierarchy_01}})
WITH parentnode_hierarchy_01
MATCH (parentnode_hierarchy_01) <- [:REL* {reltype: {isA}}] - (descendants01: Node)
WITH descendants01
MATCH (descendants01) - [:REL {reltype: {ID_relationship}}] -> (descendants02: Node)
WITH descendants02, descendants01
MATCH (parentnode_hierarchy_02: Node {nodeid: {ID_parentnode_hierarchy_02} })
<- [:REL* {reltype: {isA}}] - (descendants02)
RETURN DISTINCT descendants01;
Thank you very much.
Well, I can slightly clean up your query - this might help us understand the issues better. I doubt this one will run faster, but using the cleaned up version we can discuss what's going on: (mostly eliminating unneeded uses of MATCH/WITH)
MATCH (parent:Node {nodeid: {ID_parentnode_hierarchy_01}})<-[:REL* {reltype:{isA}}]-
(descendants01:Node)-[:REL {reltype:{ID_relationship}}]->(descendants02:Node),
(parent2:Node {nodeid: {ID_parentnode_hierarchy_02}})<-[:REL* {reltype:{isA}}]-
(descendants02)
RETURN distinct descendants01;
This looks like you're searching two (probably large) trees, starting from the root, for two nodes somewhere in the tree that are linked by an {ID_relationship}.
Unless you can provide some query hints about which node in the tree might have an ID_relationship or something like that, at worst, this looks like you could end up comparing every two nodes in the two trees. So this looks like it could take n * k time, where n is the number of nodes in the first tree, k the number of nodes in the second tree.
Here are some strategies to think about; which ones you should use depends on your data:
Is there some depth in the tree where these links are likely to be found? Can you put a range on the depth of [:REL* {reltype:{isA}}]?
What other criteria can you add to descendants01 and descendants02? Is there anything that can help make the query more selective so that you're not comparing every node in one tree to every node in the other?
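To make the cost argument concrete, here is a small Python sketch of one way to avoid comparing every pair of nodes: collect the descendants of each root once, then check each linking relationship against the two sets. The hierarchy, the link list, and the function names are hypothetical, and nothing here claims to reflect what the Neo4j planner actually does.

```python
from collections import deque


def descendants(children, root):
    """All nodes below root, following is-a edges downward (breadth-first)."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def linked_descendants(children, links, root1, root2):
    """Descendants of root1 that have a link to some descendant of root2."""
    first = descendants(children, root1)
    second = descendants(children, root2)
    # One pass over the links instead of n * k pairwise comparisons.
    return {src for src, dst in links if src in first and dst in second}


# Hypothetical data: two is-a hierarchies and a few links between nodes.
children = {
    "animal": ["dog", "cat"],
    "dog": ["puppy"],
    "food": ["meat", "plant"],
    "meat": ["beef"],
}
links = [("puppy", "beef"), ("cat", "rock"), ("stone", "meat")]
result = linked_descendants(children, links, "animal", "food")
```

The work in this sketch is proportional to the size of the two hierarchies plus the number of links, rather than to the product of the two hierarchy sizes.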
Another strategy you might try (this might be a horrible idea, but it's worth trying) is to look for a path from one root to the other, over any number of undirected edges of either the isA type or the other relationship type. Your data model has :REL relationships with a reltype attribute. This is probably an antipattern; instead of storing a reltype attribute, why not make that value the relationship type itself? The current model prevents the query that I want to write, below:
MATCH p=shortestPath((p1:Node {nodeid: {first_parent_id}})-[:isA|ID_relationship*]-(p2:Node {nodeid: {second_parent_id}}))
return p;
This would return the path from one "root" to the other, via the bridge you want. You could then use path functions to extract whatever nodes you wanted. Note that this query isn't possible currently because of your data model.

Resources