Algorithm queries in latest version of Neo4j - GDS syntax updates

I'm working on a project that began on an older version of Neo4j (3.5) and has slightly different syntax, particularly regarding algorithms. I'm trying to 'update' the following query to work with GDS:
CALL algo.labelPropagation.stream(
'MATCH (p:Publication) RETURN id(p) as id',
'MATCH (p1:Publication)-[r1:HAS_WORD]->(w)<-[r2:HAS_WORD]-(p2:Publication)
WHERE r1.occurrence > 5 AND r2.occurrence > 5
RETURN id(p1) as source, id(p2) as target, count(w) as weight',
{graph:'cypher',write:false, weightProperty : "weight"}) yield nodeId, label
with label, collect(algo.asNode(nodeId)) as nodes where size(nodes) > 2
MERGE (c:PublicationLPACommunity {id : label})
FOREACH (n in nodes |
MERGE (n)-[:IN_LPA_COMMUNITY]->(c)
)
return label, nodes
The main issues are likely the first part (algo.labelPropagation) and (algo.asNode) since these have changed in GDS. Here is the error that is returned:
Procedure call provides too many arguments: got 3 expected no more than 2.
Procedure gds.labelPropagation.stream has signature: gds.labelPropagation.stream(graphName :: STRING?, configuration = Map{} :: MAP?) :: nodeId :: INTEGER?, communityId :: INTEGER?
meaning that it expects at least 1 argument of type STRING?
Description: The Label Propagation algorithm is a fast algorithm for finding communities in a graph. (line 1, column 1 (offset: 0))
"CALL gds.labelPropagation.stream("
^
Any help much appreciated!

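GDS no longer accepts Cypher MATCH statements as arguments to the algorithm call, which is why the three-argument form fails. As the error message shows, gds.labelPropagation.stream takes the name of an already projected graph plus a configuration map. The two MATCH queries therefore have to go into a graph projection first (in GDS 2.x, for example, gds.graph.project.cypher), and the algorithm is then called with that graph's name. Note also that the procedure now yields communityId rather than label, that weightProperty has become relationshipWeightProperty, and that algo.asNode has become gds.util.asNode.
The syntax of the stream mode of Label Propagation is: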
CALL gds.labelPropagation.stream(
graphName: String,
configuration: Map
)
YIELD
nodeId: Integer,
communityId: Integer
You can also check the social network graph example on the website.

how to use F expression to first cast a string to int, then add 1 to it then cast to string and update

I have a DB column of a generic type that holds various stats (qualitative and quantitative info).
Some values are strings (type A) and some are numbers stored as strings (type B).
What I want to do is cast the type B values to a number, add one to them, cast them back to string, and store them.
Metadata.objects.filter(key='EVENT', type='COUNT').update(value=CAST(F(CAST('value', IntegerField()) + 1), CharField())
What I want is to avoid race conditions by using an F expression and doing the update in the DB.
https://docs.djangoproject.com/en/4.0/ref/models/expressions/#avoiding-race-conditions-using-f
The post below says that casting and updating in the DB is possible for MySQL:
Mysql Type Casting in Update Query
I also know that arithmetic on F expressions is easy, since they support it and the add behaviour can be overridden as well: How to do arthmetic on Django 'F' types?
How can I achieve cast -> update -> cast -> store in a Django queryset?
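Try using an annotation as follows:
from django.db.models import CharField, F, IntegerField
from django.db.models.functions import Cast
(
    Metadata.objects
    .filter(key='EVENT', type='COUNT')
    .annotate(int_value=Cast('value', IntegerField()))
    .update(value=Cast(F('int_value') + 1, CharField()))
)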
Or maybe switching F and Cast works?
Metadata.objects.filter(key='EVENT', type='COUNT').update(
    value=Cast(              # cast the whole expression below
        Cast(                # cast a value
            F('value'),      # of field "value"
            IntegerField()   # to integer
        ) + 1,               # then add 1
        CharField()          # to char.
    )
)
I've added indentation, it helps sometimes to find the errors.
Also, the docs show Cast being given a field name rather than an F object. Maybe it works without the F object at all?
UPD: switched back to first example, it actually works :)
The answer from @som-1 was informative, but it was not backed by debug output, and assumptions are not always right.
I debugged the mysql queries formed in these two cases -
1 - Metadata.objects.update(value=Cast(Cast(F('value'), output_field=IntegerField()) + 1, output_field=CharField()))
2 - Metadata.objects.update(value=Cast(Cast('value', IntegerField()) + 1, CharField())) and
both give the same output as expected.
UPDATE Metadata SET value = CAST((CAST(value AS signed integer) + 1) AS char) WHERE ( key = 'EVENT' AND type = 'COUNT' )
To debug your own queries, add the logging options for mysqld to my.cnf (see: Location of my.cnf file on macOS).
Enabling the query log: https://tableplus.com/blog/2018/10/how-to-show-queries-log-in-mysql.html
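To see what that generated statement does without setting up MySQL, here is a minimal sketch that runs the equivalent UPDATE against Python's built-in sqlite3 module. It is only a stand-in: the table layout and the sample rows are made up for illustration, and SQLite spells the cast targets INTEGER and TEXT where MySQL uses signed integer and char.

```python
import sqlite3

# In-memory database with a hypothetical metadata table (all values stored as text).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE metadata ("key" TEXT, type TEXT, value TEXT)')
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [("EVENT", "COUNT", "41"), ("EVENT", "NAME", "launch")],
)

# Cast to integer, add one, cast back to text: all inside one UPDATE statement.
conn.execute(
    """
    UPDATE metadata
    SET value = CAST(CAST(value AS INTEGER) + 1 AS TEXT)
    WHERE "key" = 'EVENT' AND type = 'COUNT'
    """
)

rows = conn.execute("SELECT type, value FROM metadata ORDER BY type").fetchall()
print(rows)  # [('COUNT', '42'), ('NAME', 'launch')]
```

Because the read, the increment and the write all happen inside a single UPDATE, the value is never pulled into Python in between, which is the property the F expression is meant to provide.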

Is there a way to use range with Z3ints in z3py?

I'm relatively new to Z3 and experimenting with it in Python. I've coded a program which returns the order in which different actions are performed, represented by a number. Z3 returns an integer representing the second at which the action starts.
Now I want to look at the model and see if there is an instance of time where nothing happens. To do this I made a list with only 0's and I want to change the index at the times where each action is being executed, to 1. For instance, if an action start at the 5th second and takes 8 seconds to be executed, the index 5 to 12 would be set to 1. Doing this with all the actions and then look for 0's in the list would hopefully give me the instances where nothing happens.
The problem is: I would like to write something like this for coding the problem
list_for_check = [0]*total_time
m = s.model()
for action in actions:
    for index in range(m.evaluate(action.number), m.evaluate(action.number) + action.time_it_takes):
        list_for_check[index] = 1
But I get the error:
'IntNumRef' object cannot be interpreted as an integer
I've understood that Z3 isn't returning normal ints or bools in their models, but writing
if m.evaluate(action.boolean):
works, so I'm assuming the if is overwritten in a way, but this doesn't seem to be the case with range. So my question is: Is there a way to use range with Z3 ints? Or is there another way to do this?
The problem might also be that action.time_it_takes is an integer and adding a Z3int with a "normal" int doesn't work. (Done in the second part of the range).
I've also tried using int(m.evaluate(action.number)), but it doesn't work.
Thanks in advance :)
When you call evaluate it returns an IntNumRef, which is z3's internal representation of an integer number. You need to call its as_long() method to convert it to a Python number. Here's an example:
from z3 import *

s = Solver()
a = Int('a')
s.add(a > 4)
s.add(a < 7)

if s.check() == sat:
    m = s.model()
    print("a is %s" % m.evaluate(a))
    print("Iterating from a to a+5:")
    av = m.evaluate(a).as_long()
    for index in range(av, av + 5):
        print(index)
When I run this, I get:
a is 5
Iterating from a to a+5:
5
6
7
8
9
which is exactly what you're trying to achieve.
The method as_long() is defined here. Note that there are similar conversion functions from bit-vectors and rationals as well. You can search the z3py api using the interface at: https://z3prover.github.io/api/html/namespacez3py.html
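Once the start times are plain Python ints, the idle-time check from the question needs no z3 at all. The following is a rough sketch under that assumption: the function name idle_seconds and the (start, duration) pair layout are invented for illustration, and each start is assumed to have been obtained by the caller with m.evaluate(action.number).as_long().

```python
def idle_seconds(actions, total_time):
    """Return the seconds in [0, total_time) during which no action runs.

    `actions` holds (start, duration) pairs of plain Python ints, i.e. the
    z3 values have already been converted with as_long() by the caller.
    """
    busy = [0] * total_time
    for start, duration in actions:
        # An action starting at 5 and lasting 8 seconds occupies indices 5..12.
        for index in range(start, min(start + duration, total_time)):
            busy[index] = 1
    return [second for second, flag in enumerate(busy) if flag == 0]


print(idle_seconds([(0, 3), (5, 8)], total_time=15))  # [3, 4, 13, 14]
```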

I've been trying to match a name received from 2 sources and check whether the two versions are almost a match or not

In the sample data, I've listed the names of employers of a particular person(a prospective customer) which we received from 2 different sources.
I've been trying to find a way to better match the two names and get good results. (Currently, it's being done as a manual job)
I don't think I'm trying to do the impossible...but if it's not achievable, please don't be harsh!
The below is the dataset which is a "match" as per manual verification.
ADDUS==============================================Addus Home Care
Amazon.com, Inc. and its affiliates=====================Amazon.com
Aon========================================Aon Service Corporation
ARAMARK Food & Support Svc.================================Aramark
AT&T Mobility Services LLC===========================AT&T Mobility
CDW, LLC===========================================CDW Corporation
Lurie Children's Hospital of Chicago======Lurie Childrens Hospital
Securitas Security Services USA, Inc============Securitas security
The PNC Financial Services Group, Inc.======================PNC NA
United States Department of Homeland Security====US Homeland Securiti
TCS=========================================Tata Consultancy Services
Although almost obvious, let me state them for the sake of emphasis.
There might be spelling mistakes in names from either of these sources
There might be abbreviations(Ex: TCS in one place and Tata Consultancy in another)
Please suggest me an algorithm or a way to do this with least number of "wrong acceptance cases" - by which I meant cases like this, which have gotten high match ratios from different algorithms.
Please try to suggest a way of doing this.
I see only one option, but it gets progressively better and more accurate over time:
(1) First the caveat: you have your 'manual job' and you will stick with it.
(2) Now the better part: the manual job gets shorter and shorter the more data you have classified over time, a kind of self-learning machine. See the description of the approach below; if you are interested, we can discuss the details later.
1. Your current workflow
   1. Create an initial employer list of triplets.
      1. employer1 (string)
      2. employer2 (string)
      3. equivalence (values {VALID|INVALID}), default: INVALID
      Result: AllEmployersList, unverified.
   2. Process the AllEmployersList manually.
      1. For each AllEmployersList member (triplet)
         1. set the value of the equivalence element to VALID or INVALID respectively.
      Result: VerifiedEmployersList, triplets with verified equivalence value.
   3. Use the VerifiedEmployersList as required for downstream processing.
2. The adapted (advanced) new workflow
   1. Create an initial employer list of triplets.
      1. employer1 (string)
      2. employer2 (string)
      3. equivalence (values {VALID|INVALID}), default: INVALID
      Result: AllEmployersList, unverified.
   2. Feed the unverified AllEmployersList into the matchKnownEmployers process (described later).
      Result: two lists, AllKnownEmployers and AllUnknownEmployers.
   3. Process the AllUnknownEmployers list manually.
      Result: VerifiedEmployersList with verified equivalence value.
   4. Feed the VerifiedEmployersList into the importKnownEmployers process.
   5. Feed (again) the AllEmployersList (result of 2.1) into the matchKnownEmployers process.
      Result: two lists, AllKnownEmployers and AllUnknownEmployers.
   6. Use the AllKnownEmployers as required for downstream processes.
3. Required investments (instances you have to establish)
   1. Create the KnownEmployers database.
      1. Create table knownEmployerNames
         1. columns:
            1. id
            2. employerName
            3. aliasIdValue
      2. Create table lastAliasIdValue
         1. columns:
            1. aliasIdValue
      3. Init table lastAliasIdValue
         1. insert one initial row, aliasIdValue = 0
   2. Create matchKnownEmployersProcess with these characteristics:
      1. Input data: employerList (triplets)
      2. Init empty lists knownEmployers and unknownEmployers
      3. For each member in employerList do:
         1. if employer1 and employer2 are in table knownEmployerNames and employer1::aliasIdValue equals employer2::aliasIdValue
            1. then set member::equivalence to VALID and append the member to the knownEmployers list
            2. else append the member to the unknownEmployers list
      4. Output data: two lists, knownEmployers and unknownEmployers.
   3. Create importKnownEmployersProcess with these characteristics:
      1. Input data: employerList (triplets)
      2. For each element in employerList do:
         1. if the equivalence value is VALID
            1. insert new pattern
               1. if employer1 or employer2 is in table knownEmployerNames
                  1. then
                     1. function isUnknown(employer1, employer2) {
                           retVal = {}
                           retVal['aliasIdValue'] =
                              employer1::aliasIdValue ||
                              employer2::aliasIdValue
                           retVal['newEmployer'] =
                              (!employer1 || !employer2)
                           return retVal
                        }
                     2. aliasIdValue, newEmployer = isUnknown(employer1, employer2)
                     3. insert aliasIdValue, newEmployer into knownEmployerNames table
                  2. else
                     1. fetch and increment aliasIdValue from lastAliasIdValue table
                     2. insert into knownEmployerNames (employer1, aliasIdValue) and (employer2, aliasIdValue)
                     3. update incremented lastAliasIdValue in the lastAliasIdValue table
      3. Output data: none
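The two processes outlined above can be sketched in Python roughly as follows. This is an illustration under simplifying assumptions, not the definitive design: an in-memory dict stands in for the knownEmployerNames table, a counter stands in for lastAliasIdValue, and the class and method names (KnownEmployers, match, import_verified) are invented for the example.

```python
class KnownEmployers:
    """In-memory stand-in for the knownEmployerNames / lastAliasIdValue tables."""

    def __init__(self):
        self.alias_of = {}       # employerName -> aliasIdValue
        self.last_alias_id = 0   # mirrors the single row in lastAliasIdValue

    def match(self, triplets):
        """matchKnownEmployers: split triplets into (known, unknown) lists."""
        known, unknown = [], []
        for employer1, employer2, equivalence in triplets:
            alias1 = self.alias_of.get(employer1)
            alias2 = self.alias_of.get(employer2)
            if alias1 is not None and alias1 == alias2:
                known.append((employer1, employer2, "VALID"))
            else:
                unknown.append((employer1, employer2, equivalence))
        return known, unknown

    def import_verified(self, triplets):
        """importKnownEmployers: store every manually verified VALID pair."""
        for employer1, employer2, equivalence in triplets:
            if equivalence != "VALID":
                continue
            # Reuse an existing alias id if either name is already known.
            alias = self.alias_of.get(employer1, self.alias_of.get(employer2))
            if alias is None:
                self.last_alias_id += 1
                alias = self.last_alias_id
            # Simplification: merging two existing alias groups is not handled.
            self.alias_of[employer1] = alias
            self.alias_of[employer2] = alias


store = KnownEmployers()
pairs = [
    ("TCS", "Tata Consultancy Services", "INVALID"),
    ("Aon", "Aon Service Corporation", "INVALID"),
]
print(store.match(pairs))   # first pass: everything is still unknown
store.import_verified([("TCS", "Tata Consultancy Services", "VALID")])
print(store.match(pairs))   # second pass: the TCS pair is recognised
```

Each manual verification round grows the alias table, so the unknown list that has to be checked by hand shrinks on every subsequent pass.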

General-purpose language to specify value constraints

I am looking for a general-purpose way of defining textual expressions which allow a value to be validated.
For example, I have a value which should only be set to 1, 2, 3, 10, 11, or 12.
Its constraint might be defined as: (value >= 1 && value <= 3) || (value >= 10 && value <= 12)
Or another value which can be 1, 3, 5, 7, 9 etc... would have a constraint like value % 2 == 1 or IsOdd(value).
(To help the user correct invalid values, I'd like to show the constraint - so something descriptive like IsOdd is preferable.)
These constraints would be evaluated both on client-side (after user input) and server-side.
Therefore a multi-platform solution would be ideal (specifically Win C#/Linux C++).
Is there an existing language/project which allows evaluation or parsing of similar simple expressions?
If not, where might I start creating my own?
I realise this question is somewhat vague as I am not entirely sure what I am after. Searching turned up no results, so even some terms as a starting point would be helpful. I can then update/tag the question accordingly.
You may want to investigate dependently typed languages like Idris or Agda.
The type system of such languages allows encoding of value constraints in types. Programs that cannot guarantee the constraints will simply not compile. The usual example is that of matrix multiplication, where the dimensions must match. But this is so to speak the "hello world" of dependently typed languages, the type system can do much more for you.
If you end up starting your own language I'd try to stay implementation-independent as long as possible. Look for the formal expression grammars of a suitable programming language (e.g. C) and add special keywords/functions as required. Once you have a formal definition of your language, implement a parser using your favourite parser generator.
That way, even if your parser is not portable to a certain platform you at least have a formal standard from where to start a separate parser implementation.
You may also want to look at creating a Domain Specific Language (DSL) in Ruby. (Here's a good article on what that means and what it would look like: http://jroller.com/rolsen/entry/building_a_dsl_in_ruby)
This would definitely give you the portability you're looking for, including maybe using IronRuby in your C# environment, and you'd be able to leverage the existing logic and mathematical operations of Ruby. You could then have constraint definition files that looked like this:
constrain 'wakeup_time' do
  6 <= value && value <= 10
end
constrain 'something_else' do
  check(value % 2 == 1, MustBeOdd)
end
# constrain is a method that takes one argument and a code block
# check is a function you've defined that takes two arguments
# MustBeOdd is the name of an exception type you've created in your standard set
But really, the great thing about a DSL is that you have a lot of control over what the constraint files look like.
There are a number of ways to verify a value against a list across multiple languages. My preferred method is to load the permitted values into a dictionary/hashmap/list/vector (depending on the language and your preference) and write a simple isIn() or isValid() function that checks whether the supplied value is present in that data structure. The beauty of this is that the code is trivial and can be implemented in just about any language very easily.
For odd-only or even-only numeric validity, a small library of isOdd() functions in the different languages will suffice: if a number isn't odd it must by definition be even. Zero needs no special handling, since 0 % 2 == 0 means it counts as even.
I normally cart around a set of C++ and C# functions to evaluate isOdd() for reasons similar to the ones you allude to, and the code is as follows:
C++
bool isOdd(int integer) { return integer % 2 != 0; }
You can also add inline and/or fastcall to the function depending on need or preference; I tend to use it as inline and fastcall unless there is a need to do otherwise (huge performance boost on Xeon processors).
C#
Conveniently, the same line works in C#; just add static to the front if it is not going to be part of another class:
static bool isOdd(int integer) { return integer % 2 != 0; }
Hope this helps, in any event let me know if you need any further info:)
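For comparison, the same two ideas (a lookup in a collection of permitted values, plus an is-odd helper) look like this in Python. The names ALLOWED_VALUES, is_valid and is_odd are made up for the example; the permitted values are the ones from the question.

```python
# Permitted values from the question: 1, 2, 3, 10, 11 or 12.
ALLOWED_VALUES = frozenset({1, 2, 3, 10, 11, 12})


def is_valid(value):
    """True if the value is one of the explicitly permitted values."""
    return value in ALLOWED_VALUES


def is_odd(value):
    """True for odd integers, including negative ones; zero counts as even."""
    return value % 2 != 0


print([v for v in range(14) if is_valid(v)])  # [1, 2, 3, 10, 11, 12]
print(is_odd(7), is_odd(0), is_odd(-3))       # True False True
```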
Not sure if it's what you're looking for, but judging from your starting conditions (Win C#/Linux C++) you may not need it to be totally language agnostic. You can implement such a parser yourself in C++ with all the desired features and then just use it in both C++ and C# projects, thus also bypassing the need to add external libraries.
On application design level, it would be (relatively) simple - you create a library which is buildable cross-platform and use it in both projects. The interface may be something simple like:
bool VerifyConstraint_int(int value, const char* constraint);
bool VerifyConstraint_double(double value, const char* constraint);
// etc
Such interface will be usable both in Linux C++ (by static or dynamic linking) and in Windows C# (using P/Invoke). You can have same codebase compiling on both platforms.
The parser (again, judging from what you've described in the question) may be pretty simple - a tree holding elements of types Variable and Expression which can be Evaluated with a given Variable value.
Example class definitions:
class Entity {
public:
    virtual ~Entity() = default;
    virtual VARIANT Evaluate() = 0;  // boost::variant may be used, typedef'd as VARIANT
};

class BinaryOperation : public Entity {
private:
    Entity& left;
    Entity& right;
    enum Operation {PLUS, MINUS, EQUALS, AND, OR, GREATER_OR_EQUALS, LESS_OR_EQUALS};
public:
    VARIANT Evaluate() override;  // Evaluates left and right operands and combines them
};

class Variable : public Entity {
private:
    VARIANT value;
public:
    VARIANT Evaluate() override { return value; }
};
Or, you can just write validation code in C++ and use it both in C# and C++ applications :)
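To make the tree idea concrete, here is a small Python sketch of the same structure. It mirrors the Entity / BinaryOperation / Variable classes above, but it is an assumption-laden illustration rather than a port: the Constant node, the between() helper and the way the input value is passed to evaluate() are additions made for the example, and the tree is built by hand instead of by a parser.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable


class Entity:
    """Base node: evaluate the subtree for a given input value."""

    def evaluate(self, value: Any) -> Any:
        raise NotImplementedError


class Variable(Entity):
    """Stands for the value being validated."""

    def evaluate(self, value: Any) -> Any:
        return value


@dataclass
class Constant(Entity):
    number: Any

    def evaluate(self, value: Any) -> Any:
        return self.number


@dataclass
class BinaryOperation(Entity):
    op: Callable[[Any, Any], Any]
    left: Entity
    right: Entity

    def evaluate(self, value: Any) -> Any:
        # Evaluate both operands, then combine them with the operator.
        return self.op(self.left.evaluate(value), self.right.evaluate(value))


def between(low, high):
    """Build the subtree for: value >= low && value <= high."""
    v = Variable()
    return BinaryOperation(
        lambda a, b: a and b,
        BinaryOperation(operator.ge, v, Constant(low)),
        BinaryOperation(operator.le, v, Constant(high)),
    )


# (value >= 1 && value <= 3) || (value >= 10 && value <= 12)
constraint = BinaryOperation(lambda a, b: a or b, between(1, 3), between(10, 12))
print([n for n in range(14) if constraint.evaluate(n)])  # [1, 2, 3, 10, 11, 12]
```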
My personal choice would be Lua. The downside to any DSL is the learning curve of a new language and how to glue the code with the scripts but I've found Lua has lots of support from the user base and several good books to help you learn.
If you are after making somewhat generic code that a non programmer can inject rules for allowable input it's going to take some upfront work regardless of the route you take. I highly suggest not rolling your own because you'll likely find people wanting more features that an already made DSL will have.
If you are using Java then you can use OGNL, the Object-Graph Navigation Language.
Its library enables you to write Java applications that can parse, compile and evaluate OGNL expressions.
OGNL expressions include basic Java, C, C++ and C# expressions.
You can compile an expression that uses some variables, and then evaluate that expression for some given variables.
An easy way to achieve validation of expressions is to use Python's eval method. It can be used to evaluate expressions just like the one you wrote. Python's syntax is easy enough to learn for simple expressions and english-like. Your expression example is translated to:
(value >= 1 and value <= 3) or (value >= 10 and value <= 12)
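Evaluating code provided by users is a security risk, though, as certain functions could be executed on the host machine (such as the open function, to open a file). The eval function takes extra arguments that restrict the names available to the expression, so you can build a restricted evaluation environment. Keep in mind that removing __builtins__ alone is known to be bypassable through object introspection, so this guards against accidents rather than against a determined attacker.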
# Import math functions, and we'll use a few of them to create
# a list of safe functions from the math module to be used by eval.
from math import *
# A user-defined method won't be reachable in the evaluation, as long
# as we provide the list of allowed functions and vars to eval.
def dangerous_function(filename):
    print open(filename).read()
# We're building the list of safe functions to use by eval:
safe_list = ['math','acos', 'asin', 'atan', 'atan2', 'ceil', 'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'floor', 'fmod', 'frexp', 'hypot', 'ldexp', 'log', 'log10', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh']
safe_dict = dict([ (k, locals().get(k, None)) for k in safe_list ])
# Let's test the eval method with your example:
exp = "(value >= 1 and value <= 3) or (value >= 10 and value <= 12)"
safe_dict['value'] = 2
print "expression evaluation: ", eval(exp, {"__builtins__":None},safe_dict)
-> expression evaluation: True
# Test with a forbidden method, such as 'abs'
exp = raw_input("type an expression: ")
-> type an expression: (abs(-2) >= 1 and abs(-2) <= 3) or (abs(-2) >= 10 and abs(-2) <= 12)
print "expression evaluation: ", eval(exp, {"__builtins__":None},safe_dict)
-> expression evaluation:
-> Traceback (most recent call last):
-> File "<stdin>", line 1, in <module>
-> File "<string>", line 1, in <module>
-> NameError: name 'abs' is not defined
# Let's test it again, without any extra parameters to the eval method
# that would prevent its execution
print "expression evaluation: ", eval(exp)
-> expression evaluation: True
# Works fine without the safe dict! So the restrictions were active
# in the previous example..
# is odd?
def isodd(x): return bool(x & 1)
safe_dict['isodd'] = isodd
print "expression evaluation: ", eval("isodd(7)", {"__builtins__":None},safe_dict)
-> expression evaluation: True
print "expression evaluation: ", eval("isodd(42)", {"__builtins__":None},safe_dict)
-> expression evaluation: False
# A bit more complex this time, let's ask the user a function:
user_func = raw_input("type a function: y = ")
-> type a function: y = exp(x)
# Let's test it:
for x in range(1,10):
    # add x in the safe dict
    safe_dict['x']=x
    print "x = ", x , ", y = ", eval(user_func,{"__builtins__":None},safe_dict)
-> x = 1 , y = 2.71828182846
-> x = 2 , y = 7.38905609893
-> x = 3 , y = 20.0855369232
-> x = 4 , y = 54.5981500331
-> x = 5 , y = 148.413159103
-> x = 6 , y = 403.428793493
-> x = 7 , y = 1096.63315843
-> x = 8 , y = 2980.95798704
-> x = 9 , y = 8103.08392758
So you can control the allowed functions that should be used by the eval method, and have a sandbox environment that can evaluate expressions.
This is what we used in a previous project I worked in. We used Python expressions in custom Eclipse IDE plug-ins, using Jython to run in the JVM. You could do the same with IronPython to run in the CLR.
The examples I used are in part inspired by / copied from the Lybniz project's explanation of how to run a safe Python eval environment. Read it for more details!
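If the expressions come from untrusted users, a stricter variant is to avoid eval altogether. The sketch below, in Python 3, swaps in a different technique from the one shown above: it parses the expression with the standard ast module and only evaluates node types that are explicitly whitelisted. The function name check_constraint, the IsOdd helper and the choice of permitted operators are assumptions made for this example.

```python
import ast
import operator

# Whitelisted operators; any other operator is rejected.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Mod: operator.mod}
_CMP_OPS = {ast.Eq: operator.eq, ast.NotEq: operator.ne,
            ast.Lt: operator.lt, ast.LtE: operator.le,
            ast.Gt: operator.gt, ast.GtE: operator.ge}
# Whitelisted helper functions that constraints may call by name.
_FUNCTIONS = {"IsOdd": lambda x: x % 2 == 1}


def check_constraint(expression, value):
    """Evaluate a constraint such as 'value >= 1 and value <= 3' for `value`."""

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id == "value":
            return value
        if isinstance(node, ast.BoolOp):
            results = [ev(operand) for operand in node.values]
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Compare):
            left = ev(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in _CMP_OPS:
                    raise ValueError("unsupported comparison")
                right = ev(comparator)
                if not _CMP_OPS[type(op)](left, right):
                    return False
                left = right  # supports chains such as 1 <= value <= 3
            return True
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCTIONS and not node.keywords):
            return _FUNCTIONS[node.func.id](*[ev(arg) for arg in node.args])
        raise ValueError("unsupported expression element: " + type(node).__name__)

    return bool(ev(ast.parse(expression, mode="eval")))


rule = "(value >= 1 and value <= 3) or (value >= 10 and value <= 12)"
print(check_constraint(rule, 2))               # True
print(check_constraint(rule, 5))               # False
print(check_constraint("IsOdd(value)", 7))     # True
```

Anything outside the whitelist, for example a call to open or an attribute access, raises ValueError instead of being executed, and the expression text itself can still be shown to the user as the description of the constraint.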
You might want to look at regular expressions (regex). They are proven and have been around for a long time. There's a regex library for all the major programming and scripting languages out there.
Libraries:
C++: what regex library should I use?
C# Regex Class
Usage
Regex Email validation
Regex to validate date format dd/mm/yyyy

XPath :: running counter two levels

Using the count(preceding-sibling::*) XPath expression one can obtain incrementing counters. However, can the same also be accomplished in a sequence that is two levels deep?
example XML instance
<grandfather>
  <father>
    <child>a</child>
  </father>
  <father>
    <child>b</child>
    <child>c</child>
  </father>
</grandfather>
code (with Saxon HE 9.4 jar on the CLASSPATH for XPath 2.0 features)
Trying to get a counter sequence of 1, 2 and 3 for the three child nodes with different kinds of XPath expressions:
XPathExpression expr = xpath.compile("/grandfather/father/child");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
    Node node = nodes.item(i);
    System.out.printf("child's index is: %s %s %s, name is: %s\n",
            xpath.compile("count(preceding-sibling::*)").evaluate(node),
            xpath.compile("count(preceding-sibling::child)").evaluate(node),
            xpath.compile("//child/position()").evaluate(doc),
            xpath.compile(".").evaluate(node));
}
The above code prints:
child's index is: 0 0 1, name is: a
child's index is: 0 0 1, name is: b
child's index is: 1 1 1, name is: c
None of the three XPaths I tried managed to produce the correct sequence: 1, 2, 3. Clearly it can trivially be done using the i loop variable, but I want to accomplish it with XPath if possible. I also need to keep the basic framework of evaluating an XPath expression to get all the nodes to visit and then iterating over that set, since that's the way the real application I work on is structured. Basically I visit each node and then need to evaluate a number of XPath expressions on it (node) or on the document (doc); one of these XPath expressions is supposed to produce this incrementing sequence.
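Use the preceding axis with a name test instead. It counts every earlier child in document order, regardless of its parent; add 1 if the counter should start at 1.
count(preceding::child) + 1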
Using XPath 2.0, there is a much better way to do this. Fetch all <child/> nodes and use the position() function to get the index:
//child/concat("child's index is: ", position(), ", name is: ", text())
You don't say efficiency is important, but I really hate to see this done with O(n^2) code! Jens' solution shows how to do that if you can use the result in the form of a sequence of (position, name) pairs. You could also return an alternating sequence of strings and numbers using //child/(string(.), position()): though you would then want to use the s9api API rather than JAXP, because JAXP can only really handle the data types that arise in XPath 1.0.
If you need to compute the index of each node as part of other processing, it might still be worth computing the index for every node in a single initial pass, and then looking it up in a table. But if you're doing that, the simplest way is surely to iterate over the result of //child and build a map from nodes to the sequence number in the iteration.
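The single-pass lookup table described in the last paragraph can be sketched as follows. Python's standard xml.etree.ElementTree is used here purely as a stand-in for the Saxon/JAXP setup in the question, and the name index_of is invented for the example; the idea carries over to a Java Map keyed by Node.

```python
import xml.etree.ElementTree as ET

XML = """
<grandfather>
  <father>
    <child>a</child>
  </father>
  <father>
    <child>b</child>
    <child>c</child>
  </father>
</grandfather>
"""

doc = ET.fromstring(XML)

# One pass in document order: map each <child> element to its 1-based index.
index_of = {
    child: position
    for position, child in enumerate(doc.iter("child"), start=1)
}

# Later processing can look the index up instead of recounting preceding nodes.
for child in doc.iter("child"):
    print("child's index is: %d, name is: %s" % (index_of[child], child.text))
```

Building the table costs one traversal, after which every lookup is constant time, which avoids the O(n^2) behaviour of counting preceding nodes for each child.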
