Match the closest possible node in neo4j - time

I have a time tree granulated to 5-minute intervals. Matching an exact minute is no problem, but in a few cases I might search for a minute that does not exactly match a node. For example, I can match 10:05, but if the input is 10:03 I get no result. I have epoch times on the minute nodes. I want to return the closest minute node available for the given input (if it is 10:03, return 10:05). How do I achieve this?
MATCH (startMinute:Minute {epoch:apoc.date.parse('2018-04-12T16:33', 'ms',"yyyy-MM-dd'T'HH:mm")}) return startMinute

If you want to always round up the time:
MATCH (startMinute:Minute {epoch:
(apoc.date.parse('2018-04-12T16:32', 'ms',"yyyy-MM-dd'T'HH:mm") + 299999)/300000*300000})
RETURN startMinute;
Or, if you want to round to nearest:
MATCH (startMinute:Minute {epoch:
(apoc.date.parse('2018-04-12T16:32', 'ms',"yyyy-MM-dd'T'HH:mm") + 150000)/300000*300000})
RETURN startMinute;
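The integer arithmetic behind both queries can be checked outside Neo4j. Below is a minimal Python sketch, assuming epochs in milliseconds, UTC timestamps, and a 5-minute (300000 ms) grid. The function names are illustrative and not part of APOC.

```python
from datetime import datetime, timezone

STEP_MS = 5 * 60 * 1000  # 5-minute grid, in milliseconds


def to_epoch_ms(text: str) -> int:
    """Parse 'yyyy-MM-ddTHH:mm' as UTC and return epoch milliseconds."""
    dt = datetime.strptime(text, "%Y-%m-%dT%H:%M").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def round_up(epoch_ms: int, step: int = STEP_MS) -> int:
    # Adding step - 1 before integer division moves any value that is
    # not already on the grid up to the next grid point.
    return (epoch_ms + step - 1) // step * step


def round_nearest(epoch_ms: int, step: int = STEP_MS) -> int:
    # Adding half a step before integer division rounds to the closest
    # grid point (exact halves round up).
    return (epoch_ms + step // 2) // step * step
```

With these definitions, 16:33 rounds up to 16:35, while 16:32 rounds to the nearest grid point at 16:30.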

Related

How to find the largest number of times a candlestick pattern appears within 2 hours to 15 minute timeframes

I am trying to figure out how to search for a pattern within a range of timeframes. The pattern will likely occur several times across the timeframes, so I'm particularly interested in the largest number of times it repeats.
To explain further: say I am searching for a pattern from the 2-hour chart down to the 15-minute chart. If I find it on the 2-hour chart, I drill into the next timeframe (1 hour) and might find two occurrences of the pattern there. I then continue to the 30-minute chart (within both 1-hour patterns) and on to 15 minutes, until I have the largest number of times it occurs.
I believe that a method that returns the next lower timeframe would be needed. I’ve been able to write that, see code below. I would really appreciate some help.
ENUM_TIMEFRAMES findLowerTimeframe(ENUM_TIMEFRAMES timePeriod)
{
int timeFrames[5] = {15, 20, 30, 60, 120};
int TFIndex=ArrayBsearch(timeFrames, (int)timePeriod);
return((ENUM_TIMEFRAMES) timeFrames[TFIndex - 1]);
}
EDIT
I didn't add the specific candlestick pattern because I believe it isn't the most important part of my problem. The crux of the question is how to search for a pattern on several consecutive timeframes to find the largest number of times it occurs within the range of times.
const ENUM_TIMEFRAMES DEFAULT_TIMEFRAMES[5] = {PERIOD_M15, PERIOD_M20, PERIOD_M30, PERIOD_H1, PERIOD_H2};
ENUM_TIMEFRAMES findLowerTimeframe(ENUM_TIMEFRAMES timePeriod)
{
int TFIndex=ArrayBsearch(DEFAULT_TIMEFRAMES,timePeriod);
return(TFIndex>0 ? DEFAULT_TIMEFRAMES[TFIndex - 1] : PERIOD_CURRENT);
}
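The bounds-checked lookup can also be sketched outside MQL. Here is a minimal Python version, assuming timeframes are expressed in plain minutes as in the question's first array. The names `TIMEFRAMES_MIN` and `find_lower_timeframe` are hypothetical.

```python
from bisect import bisect_left
from typing import Optional

# Ascending list of timeframes in minutes (values taken from the question).
TIMEFRAMES_MIN = [15, 20, 30, 60, 120]


def find_lower_timeframe(period_min: int) -> Optional[int]:
    """Return the next lower timeframe, or None when there is none."""
    idx = bisect_left(TIMEFRAMES_MIN, period_min)
    # idx == 0 means period_min is already the smallest (or below it),
    # so there is nothing lower to drill into.
    return TIMEFRAMES_MIN[idx - 1] if idx > 0 else None
```

Returning `None` at the bottom of the range gives a drill-down loop a natural stopping condition.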

Sum or Percentage of a word in the results / column

I have a report that in part is providing: Job Date, Job Target Date and Completion date.
I have a column at the end that works out whether the job was completed within or outside the target time, returning true or false.
As mentioned, I have created a column to work out whether a job is completed on time and I have tried googling many different solutions and trying them out.
The expression I've used to work out whether the job was completed on time is:
=IIF(Fields!CompletedDate.Value <= Fields!Target.Value, "True", "False")
Now I need an expression to work out the percentage of jobs that are within the target. So, let's say there are 80 jobs and 67 are completed in time. It would be 'True' (67) / 80 * 100 = 83.75%.
I figured this one out myself, again..
If anyone is interested: I kept the same IIF expression in the original cell. I then created two additional cells underneath, one containing the text 'Percentage of jobs completed on target' and the other a new expression that builds on the original one above:
=SUM(IIF(Fields!CompletedDate.Value <= Fields!Target.Value, 1, 0)) / count(Fields!PropertyCode.Value)
All I had to keep in mind was that I am already showing the results as true and false, so in my second expression I changed the values to numbers so I can calculate: true becomes 1 and false becomes 0. The divisor is the count of rows, so it is never zero as long as the report returns at least one job.
So, I summed the 1s that represent true and divided by the number of results. Rather than multiplying this by 100 (5/19*100 = percentage), I simply left out the multiplication and changed the format of that cell to percentage, so it takes the value in the cell and displays it as a percentage.
Thanks,
Jordan
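The same calculation can be sketched outside the report designer. Below is a minimal Python version, assuming each job is a (completed date, target date) pair. The function name `on_target_ratio` is illustrative, and the empty-input guard is an addition that the report expression does not have.

```python
from datetime import date
from typing import Iterable, Tuple


def on_target_ratio(jobs: Iterable[Tuple[date, date]]) -> float:
    """Fraction of jobs whose completion date is on or before the target."""
    rows = list(jobs)
    if not rows:
        return 0.0  # guard against dividing by zero when there are no rows
    # Map each row to 1 (on time) or 0 (late), then sum, as in SUM(IIF(...)).
    on_time = sum(1 if completed <= target else 0 for completed, target in rows)
    return on_time / len(rows)


sample = [
    (date(2020, 1, 5), date(2020, 1, 10)),   # on time
    (date(2020, 1, 10), date(2020, 1, 10)),  # on time (same day)
    (date(2020, 1, 12), date(2020, 1, 10)),  # late
    (date(2020, 1, 1), date(2020, 1, 3)),    # on time
]
ratio = on_target_ratio(sample)
```

Formatting the ratio as a percentage, for example with `f"{ratio:.0%}"`, plays the role of the cell's percentage format.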

Neo4J: 3-Level Expand node with filter

For a new POC I have the following use case:
For a given node, do a 3-level expand but also apply a filter to all expanded nodes (that is, I want to filter all resulting nodes on certain properties).
Test set:
nodes: ~17 million
edges: ~40 million
properties: ~2,650 million
My first solution looks like this:
MATCH path=(startNode:Entity {id:'RVDJRcV_yfXbG0-syGKp3Q..'})-[*..3]-(endNode:Entity)
WITH path
WHERE ALL (n IN nodes(path)[1..]
WHERE n.key = '1' AND n.domain = 'facebook.com' AND n.investigationID='any')
RETURN path
LIMIT 100
This does the job, but it is not very fast. Avg. query times in my test set are 2-3 seconds but with many timeouts (time > 30 seconds). I assume the problem is the path handling and that my node has lots of properties...
Explain plan:
Variant 1: I removed the "WITH path".
Solution:
Based on the tip that I should avoid [1..] in the query:
MATCH path=(startNode:Entity {id:'v-jXIO7kozAa35gMUpUkvg..'})-[*..3]-(endNode:Entity)
WHERE ALL (n IN nodes(path)
WHERE n=startNode OR (n.key = '1' AND n.domain = 'facebook.com' AND n.investigationID='any'))
RETURN path
LIMIT 100
While you can filter during expansion with variable-length paths, Cypher currently can't apply that filter during expansion when you're working with a slice of the list instead of the whole list. It will fall back to doing the full var-length expansion, and then applying the filter to all results found.
We need to use only ALL (n IN nodes(path) ...; we can't use a slice of the path.
To do this, we need to add one more predicate within the all() function. Since the start node probably doesn't meet the current predicate, we'll create an exception for it:
MATCH path=(startNode:Entity {id:'RVDJRcV_yfXbG0-syGKp3Q..'})-[*..3]-(endNode:Entity)
WHERE ALL (n IN nodes(path)
WHERE n=startNode OR (n.key = '1' AND n.domain = 'facebook.com' AND n.investigationID='any'))
RETURN path
LIMIT 100
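Why filtering during expansion matters can be illustrated with a toy traversal. This Python sketch is not how Cypher is implemented. It assumes a small in-memory adjacency list and, as a simplification, forbids repeated nodes in a path (Cypher forbids repeated relationships). All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Graph = Dict[str, List[str]]


def expand(graph: Graph, start: str, max_depth: int,
           keep: Callable[[str], bool], prune: bool) -> Tuple[List[List[str]], int]:
    """Return (matching paths, number of nodes touched during expansion)."""
    paths, touched = [], 0
    stack = [[start]]
    while stack:
        path = stack.pop()
        # A path matches when every node except the start passes the filter.
        if len(path) > 1 and all(keep(n) for n in path[1:]):
            paths.append(path)
        if len(path) - 1 == max_depth:
            continue
        for nxt in graph.get(path[-1], []):
            if nxt in path:
                continue
            touched += 1
            if prune and not keep(nxt):
                continue  # filter applied during expansion: drop the branch
            stack.append(path + [nxt])
    return paths, touched


graph = {"s": ["a", "x"], "a": ["s", "b"], "b": ["a"],
         "x": ["s", "y", "z"], "y": ["x"], "z": ["x"]}
keep = lambda n: n in {"a", "b"}

full_paths, full_touched = expand(graph, "s", 3, keep, prune=False)
pruned_paths, pruned_touched = expand(graph, "s", 3, keep, prune=True)
```

Both calls return the same paths, but the pruned run never descends below `x`, so it touches fewer nodes. On a graph with millions of edges that difference is what separates a fast query from a timeout.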

Can this Neo4j query be optimized?

I have a rather large dataset (20 million nodes, 200 million edges). The simplest shortestPath queries finish in milliseconds, and everything is great.
But... I need to allow shortestPath to have ZERO or ONE relation of type 999 and it can be only the first from the start node.
So, my query became like this:
MATCH (one:Obj{oid:'startID'})-[r1*0..1]-(b:Obj)
WHERE all(rel in r1 where rel.val = 999)
WITH one, b
MATCH (two:Obj{oid:'endID'}), path=shortestPath((one) -[*0..21]-(two))
WHERE ALL (x IN RELATIONSHIPS(path)
WHERE (x.val > -1 and x.val<101) or (x.val=999 or x.val=998)) return path
It runs in milliseconds when there's a short path (length up to 2-4), but can take 5 or 20 seconds for paths of length 5 or more. Maybe I've composed an inefficient query?
Some of your requirements are a bit unclear to me, so I'll reiterate my understanding and offer a solution.
You want to inspect the shortest paths between a start and end node.
The paths returned should have ZERO or ONE relationship with a val of 999. If it's ONE relationship with that value, it should be the first.
Here's an attempt based on that logic:
MATCH (start:Obj {oid:'startID'}),
(end:Obj {oid:'endID'}),
path=shortestPath((start)-[*1..21]->(end))
WITH path, relationships(path) AS rels
WHERE all(r IN rels WHERE r.val <> 999)
OR (rels[0].val = 999
AND all(r IN rels[1..] WHERE r.val <> 999))
RETURN path
I haven't had a chance to test on actual data, but hopefully this logic and approach at least point you in the right direction.
Also note: it's possible the entire WHERE clause at the end could be reduced to:
WHERE all(r IN rels[1..] WHERE r.val <> 999)
Meaning you don't even need to check the first relationship.
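The claim that the two-branch condition collapses to a single check on the tail can be verified on plain lists. This Python sketch assumes each path is represented by the list of its relationships' `val` values. Both function names are hypothetical.

```python
from typing import List


def allowed_verbose(vals: List[int]) -> bool:
    """No 999 anywhere, or 999 only on the first relationship."""
    return all(v != 999 for v in vals) or (
        vals[0] == 999 and all(v != 999 for v in vals[1:])
    )


def allowed_short(vals: List[int]) -> bool:
    """Reduced form: only relationships after the first are checked."""
    return all(v != 999 for v in vals[1:])


cases = [
    [5, 10, 20],       # no 999 at all
    [999, 10, 20],     # 999 first only
    [5, 999, 20],      # 999 in the middle
    [999, 999],        # 999 first and again later
    [999],             # single relationship with 999
]
results = [(allowed_verbose(c), allowed_short(c)) for c in cases]
```

The two predicates agree on every case because the first relationship may hold any value, so only the tail needs checking.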

number of days in a period that fall within another period

I have 2 independent but contiguous date ranges. The first range is the start and end date for a project. Let's say start = 3/21/10 and end = 5/16/10. The second range is a month boundary (say 3/1/10 to 3/31/10, 4/1/10 to 4/30/10, etc.). I need to figure out how many days in each month fall into the first range.
The answer to my example above, counting both endpoints, is March = 11, April = 30, May = 16.
I am trying to figure out an Excel formula or VBA function that will give me this value.
Any thoughts on an algorithm for this? I feel it should be rather easy but I can't seem to figure it out.
I have a formula which will return TRUE/FALSE if ANY part of the month range is within the project start/end but not the number of days. That function is below.
return month_start <= project_end And month_end >= project_start
I think I figured it out.
=MAX( MIN(project_end, month_end) - MAX(project_start,month_start) + 1 , 0 )
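The same overlap formula can be checked in Python. This is a minimal sketch, assuming both ranges include their endpoints. The function name `overlap_days` is illustrative.

```python
from datetime import date


def overlap_days(project_start: date, project_end: date,
                 month_start: date, month_end: date) -> int:
    """Days shared by two inclusive date ranges, never negative."""
    # The overlap runs from the later start to the earlier end.
    days = (min(project_end, month_end) - max(project_start, month_start)).days + 1
    # Ranges that do not overlap yield zero or a negative number, so clamp at 0.
    return max(days, 0)


project = (date(2010, 3, 21), date(2010, 5, 16))
march = overlap_days(*project, date(2010, 3, 1), date(2010, 3, 31))
april = overlap_days(*project, date(2010, 4, 1), date(2010, 4, 30))
may = overlap_days(*project, date(2010, 5, 1), date(2010, 5, 31))
june = overlap_days(*project, date(2010, 6, 1), date(2010, 6, 30))
```

With inclusive endpoints, 21 to 31 March counts as 11 days, and a month entirely outside the project (June here) yields 0 because of the clamp.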
