How to evaluate a complex expression tree against incremental data?

I have a collection of data and a collection of search filters I want to run against that data. The filters follow the LDAP search filter format and are parsed into an expression tree. The data is read one item at a time and processed through all the filters. Intermediate match results are stored in each leaf node of the tree until all the data has been processed. Then the final results are obtained by traversing the tree and applying the logical operators to each leaf node's intermediate result. For example, if I have the filter (&(a=b)(c=d)) then my tree will look like this:
root = "&"
    left = "a=b"
    right = "c=d"
So if a=b and c=d then both the left and right child nodes are a match and thus the filter is a match.
The data is a collection of different types of objects, each with their own fields. For example, assume the collection represents a class at a school:
class { name = "math" room = "12A" }
teacher { name = "John" age = "35" }
student { name = "Billy" age = "6" grade = "A" }
student { name = "Jane" age = "7" grade = "B" }
So a filter might look like (&(teacher.name=John)(student.age>6)(student.grade=A)) and be parsed like so:
root = "&"
    left = "teacher.name=John"
    right = "&"
        left = "student.age>6"
        right = "student.grade=A"
I run the class object against it; no matches. I run the teacher object against it; root.left is a match. I run the first student node against it; root.right.right is a match. I run the second student node against it; root.right.left is a match. Then I traverse the tree and determine that all nodes matched and thus the final result is a match.
The problem is the intermediate matches need to be constrained based upon commonality: the student.age and student.grade filters need to somehow be tied together in order to store an intermediate match only if they match for the same object. I can't for the life of me figure out how to do this.
My filter node abstract base class:
class FilterNode
{
public:
    virtual void Evaluate(string ObjectName, map<string, string> Attributes) = 0;
    virtual bool IsMatch() = 0;
};
I have a LogicalFilterNode class that handles logical AND, OR, and NOT operations; its implementation is pretty straightforward:
void LogicalFilterNode::Evaluate(string ObjectName, map<string, string> Attributes)
{
    m_Left->Evaluate(ObjectName, Attributes);
    m_Right->Evaluate(ObjectName, Attributes);
}
bool LogicalFilterNode::IsMatch()
{
    switch(m_Operator)
    {
    case AND:
        return m_Left->IsMatch() && m_Right->IsMatch();
    case OR:
        return m_Left->IsMatch() || m_Right->IsMatch();
    case NOT:
        return !m_Left->IsMatch();
    }
    return false;
}
Then I have a ComparisonFilterNode class that handles the leaf nodes:
void ComparisonFilterNode::Evaluate(string ObjectName, map<string, string> Attributes)
{
    if (ObjectName == m_ObjectName) // e.g. "teacher", "student", etc.
    {
        // Each map entry is a (name, value) pair.
        for (const auto& Attribute : Attributes)
        {
            Evaluate(Attribute.first, Attribute.second);
        }
    }
}
void ComparisonFilterNode::Evaluate(string AttributeName, string AttributeValue)
{
    if (AttributeName == m_AttributeName) // e.g. "age", "grade", etc.
    {
        if (Compare(AttributeValue, m_AttributeValue)) // e.g. "6", "A", etc.
        {
            m_IsMatch = true;
        }
    }
}
bool ComparisonFilterNode::IsMatch() { return m_IsMatch; }
How it's used:
FilterNode* Root = Parse(...);
for (const Object& item : Data)
{
    Root->Evaluate(item.Name, item.Attributes);
}
bool Match = Root->IsMatch();
Essentially, when the children of an AND node refer to the same object name, the AND should match only if all of those children match for the same object.

Create a new unary "operator", let's call it thereExists, which:
Does have state, and
Declares that its child subexpression must be satisfied by a single input record.
Specifically, for each instance of a thereExists operator in an expression tree you should store a single bit indicating whether or not the subexpression below this tree node has been satisfied by any of the input records seen so far. These flags will initially be set to false.
To continue processing your dataset efficiently (i.e. input record by input record, without having to load the entire dataset into memory), you should first preprocess the query expression tree to pull out a list of all instances of the thereExists operator. Then as you read in each input record, test it against the child subexpression of each of these operators that still has its satisfied flag set to false. Any subexpression that is now satisfied should toggle its parent thereExists node's satisfied flag to true -- and it would be a good idea to also attach a copy of the satisfying record to the newly-satisfied thereExists node, if you want to actually see more than a "yes" or "no" answer to the overall query.
You only need to evaluate tree nodes above a thereExists node once, after all input records have been processed as described above. Notice that anything referring to properties of an individual record must appear somewhere beneath a thereExists node in the tree. Everything above a thereExists node in the tree is only allowed to test "global" properties of the collection, or combine the results of thereExists nodes using logical operators (AND, OR, XOR, NOT, etc.). Logical operators themselves can appear anywhere in the tree.
Using this, you can now evaluate expressions like
root = "&"
    left = thereExists
        child = "teacher.name=John"
    right = "|"
        left = thereExists
            child = "&"
                left = "student.age>6"
                right = "student.grade=A"
        right = thereExists
            child = "student.name = Billy"
This will report "yes" if the collection of records contains both a teacher whose name is "John" and either a student named "Billy" or an A student aged over 6, or "no" otherwise. If you track satisfying records as I suggested, you'll also be able to dump these out in the case of a "yes" answer.
You could also add a second operator type, forAll, which checks that its subexpression is true for every input record. But this is probably not as useful, and in any case you can simulate forAll(expr) with not(thereExists(not(expr))).
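A minimal sketch of the thereExists idea, written in Java rather than the question's C++. The Rec type, the ThereExists class, the attr helper, and the hard-coded filter are all assumptions made for illustration; they are not part of the original code, and a real implementation would build these nodes from the parsed filter.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class ThereExistsSketch {
    // Hypothetical input record: an object name plus its attribute map.
    static final class Rec {
        final String name;
        final Map<String, String> attrs = new HashMap<>();
        Rec(String name, String... keyValues) {
            this.name = name;
            for (int i = 0; i + 1 < keyValues.length; i += 2) {
                attrs.put(keyValues[i], keyValues[i + 1]);
            }
        }
    }

    // Stateful node: one flag recording whether ANY single record satisfied the child.
    static final class ThereExists {
        final Predicate<Rec> child;
        boolean satisfied = false;
        Rec witness = null; // the satisfying record, kept for reporting
        ThereExists(Predicate<Rec> child) { this.child = child; }
        void offer(Rec r) {
            if (!satisfied && child.test(r)) {
                satisfied = true;
                witness = r;
            }
        }
    }

    // Leaf comparison, always tested against one record at a time.
    static Predicate<Rec> attr(String object, String key, Predicate<String> test) {
        return r -> r.name.equals(object)
                && r.attrs.containsKey(key)
                && test.test(r.attrs.get(key));
    }

    // Evaluates (&(teacher.name=John)(student.age>6)(student.grade=A)),
    // with the two student tests tied to the same record.
    static boolean matches(List<Rec> data) {
        ThereExists teacher = new ThereExists(attr("teacher", "name", "John"::equals));
        Predicate<Rec> olderThanSix = attr("student", "age", v -> Integer.parseInt(v) > 6);
        ThereExists student = new ThereExists(
                olderThanSix.and(attr("student", "grade", "A"::equals)));
        for (Rec r : data) { // single pass, one record at a time
            teacher.offer(r);
            student.offer(r);
        }
        // Nodes above thereExists are evaluated once, after all records are seen.
        return teacher.satisfied && student.satisfied;
    }

    // The class/teacher/student collection from the question.
    static List<Rec> sampleData() {
        return Arrays.asList(
                new Rec("class", "name", "math", "room", "12A"),
                new Rec("teacher", "name", "John", "age", "35"),
                new Rec("student", "name", "Billy", "age", "6", "grade", "A"),
                new Rec("student", "name", "Jane", "age", "7", "grade", "B"));
    }

    public static void main(String[] args) {
        // Prints false: no single student is both older than 6 and graded A.
        System.out.println(matches(sampleData()));
    }
}
```

With the question's data this reports no match, because Billy and Jane each satisfy only one of the two student conditions. Adding a student who is over 6 with grade A makes it a match.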

how to convert forEach to lambda

Iterator<Rate> rateIt = rates.iterator();
int lastRateOBP = 0;
while (rateIt.hasNext())
{
    Rate rate = rateIt.next();
    int currentOBP = rate.getPersonCount();
    if (currentOBP == lastRateOBP)
    {
        rateIt.remove();
        continue;
    }
    lastRateOBP = currentOBP;
}
How can I convert the code above to a lambda using Java 8 streams, for example list.stream().filter()...? I need to operate on the list itself.

The simplest solution is
Set<Integer> seen = new HashSet<>();
rates.removeIf(rate -> !seen.add(rate.getPersonCount()));
It utilizes the fact that Set.add returns false if the value is already in the Set, i.e. has already been encountered. Since these are the elements you want to remove, all you have to do is negate the result.
If keeping an arbitrary Rate instance for each group with the same person count is sufficient, there is no sorting needed for this solution.
Like with your original Iterator-based solution, it relies on the mutability of your original Collection.
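A self-contained sketch of the removeIf approach, for comparison with the iterator loop from the question. The Rate class here is a stand-in with only getPersonCount, and the helper names (ratesOf, countsOf, removeSeenAnywhere, removeConsecutiveRepeats) are made up for this example. The two methods give different results on unsorted input, because the original loop only drops an element that repeats the previously kept person count.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class DedupSketch {
    // Stand-in for the question's Rate class (assumed shape).
    static final class Rate {
        private final int personCount;
        Rate(int personCount) { this.personCount = personCount; }
        int getPersonCount() { return personCount; }
    }

    // Builds a mutable list, since both methods modify the list in place.
    static List<Rate> ratesOf(int... counts) {
        List<Rate> rates = new ArrayList<>();
        for (int c : counts) rates.add(new Rate(c));
        return rates;
    }

    static List<Integer> countsOf(List<Rate> rates) {
        List<Integer> out = new ArrayList<>();
        for (Rate r : rates) out.add(r.getPersonCount());
        return out;
    }

    // Removes every Rate whose person count was already seen anywhere earlier.
    static void removeSeenAnywhere(List<Rate> rates) {
        Set<Integer> seen = new HashSet<>();
        rates.removeIf(rate -> !seen.add(rate.getPersonCount()));
    }

    // Same behavior as the iterator loop in the question:
    // removes a Rate only when it repeats the last kept person count.
    static void removeConsecutiveRepeats(List<Rate> rates) {
        int last = 0; // same starting value as lastRateOBP
        for (Iterator<Rate> it = rates.iterator(); it.hasNext(); ) {
            int current = it.next().getPersonCount();
            if (current == last) {
                it.remove();
            } else {
                last = current;
            }
        }
    }

    public static void main(String[] args) {
        List<Rate> a = ratesOf(1, 1, 2, 1, 3, 3);
        removeSeenAnywhere(a);
        System.out.println(countsOf(a)); // [1, 2, 3]

        List<Rate> b = ratesOf(1, 1, 2, 1, 3, 3);
        removeConsecutiveRepeats(b);
        System.out.println(countsOf(b)); // [1, 2, 1, 3]
    }
}
```

If the list is already sorted by person count, both methods produce the same result except when a leading element has a person count of 0, which the iterator version removes because of its starting value.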

If you really want distinct and sorted as you say in your comments, then it is as simple as:
TreeSet<Rate> sorted = rates.stream()
    .collect(Collectors.toCollection(() ->
        new TreeSet<>(Comparator.comparing(Rate::getPersonCount))));
But notice that in your example with an iterator you are not removing all duplicates, only duplicates that are consecutive (I've exemplified that in the comment to your question).
EDIT
It seems that you want distinct by a Function; or in simpler words you want distinct elements by personCount, but in case of a clash you want to take the max pos.
Such a thing is not yet available in jdk. But it might be, see this.
Since you want them sorted and distinct by key, we can emulate that with:
Collection<Rate> sorted = rates.stream()
    .collect(Collectors.toMap(Rate::getPersonCount,
        Function.identity(),
        (left, right) -> {
            return left.getLos() > right.getLos() ? left : right;
        },
        TreeMap::new))
    .values();
System.out.println(sorted);
On the other hand, if you absolutely need to return a TreeSet to denote that these are unique, sorted elements:
TreeSet<Rate> sorted = rates.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toMap(Rate::getPersonCount,
            Function.identity(),
            (left, right) -> {
                return left.getLos() > right.getLos() ? left : right;
            },
            TreeMap::new),
        map -> {
            TreeSet<Rate> set = new TreeSet<>(Comparator.comparing(Rate::getPersonCount));
            set.addAll(map.values());
            return set;
        }));

This should work if your Rate type has a natural ordering (i.e. implements Comparable). Note that distinct() relies on equals and hashCode, so Rate needs to override those as well:
List<Rate> l = rates.stream()
    .distinct()
    .sorted()
    .collect(Collectors.toList());
If not, use a lambda as a custom comparator:
List<Rate> l = rates.stream()
    .distinct()
    .sorted( (r1,r2) -> ...some code to compare two rates... )
    .collect(Collectors.toList());
It may be possible to remove the call to sorted if you just need to remove duplicates.

CKQuery with NSPredicate fails when using "CONTAINS" operator

According to Apple's CKQuery class reference, CONTAINS is one of the supported operators. However, that doesn't seem to work. I have a record type called myRecord, and a record with a field called name of type String. I try to fetch the record with two different predicates, one with the "==" operator and one with the CONTAINS operator.
func getRecords() {
    let name = "John"
    let Predicate1 = NSPredicate(format: "name == %@", name)
    let Predicate2 = NSPredicate(format: "name CONTAINS %@", name)
    let sort = NSSortDescriptor(key: "Date", ascending: false)
    let query = CKQuery(recordType: "myRecord", predicate: Predicate1)
    // let query = CKQuery(recordType: "myRecord", predicate: Predicate2)
    query.sortDescriptors = [sort]
    let operation = CKQueryOperation(query: query)
    operation.desiredKeys = ["name", "Date"]
    operation.recordFetchedBlock = { (record) in
        print(record["name"])
    }
    operation.queryCompletionBlock = { [unowned self] (cursor, error) in
        dispatch_async(dispatch_get_main_queue()) {
            if error == nil {
                print ("sucess")
            } else {
                print("couldn't fetch record error:\(error?.localizedDescription)")
            }
        }
    }
    CKContainer.defaultContainer().publicCloudDatabase.addOperation(operation)
}
Using Predicate1, output is:
Optional(John)
sucess
Using Predicate2, output is:
couldn't fetch record error:Optional("Field \'name\' has a value type of STRING and cannot be queried using filter type LIST_CONTAINS")
Also, using [c] to ignore case gives a server issue.
How do I use the operator CONTAINS correctly?
EDIT:
I have now looked closer at the documentation and seen that CONTAINS can only be used with SELF, meaning that all String fields will be used for searching. Isn't there a better way?

This is an exception, as the documentation notes:
With one exception, the CONTAINS operator can be used only to test
list membership. The exception is when you use it to perform full-text
searches in conjunction with the self key path. The self key path
causes the server to look in searchable string-based fields for the
specified token string. For example, a predicate string of @"self
contains 'blue'" searches for the word “blue” in all fields marked for
inclusion in full-text searches. You cannot use the self key path to
search in fields whose type is not a string.
So you can use the 'self' key path instead of '%K' to search within string fields.
The full text is in Apple's CKQuery class reference.

Search for a list of words in a paragraph

I have a paragraph written in English.
I have a list of words.
I want to check whether the paragraph contains any one of the words.
What is the best algorithm to do so?
Presently, I have the following but it seems very naive:
private boolean findMatch(List<String> list, String param, ArrayList<String> skipChars) {
    boolean matchResult = false;
    for (String s : list) {
        if (skipChars == null || !skipChars.contains(s)) {
            if (param.indexOf(s) != -1) {
                matchResult = true;
                break;
            }
        }
    }
    return matchResult;
}

Split the paragraph into words and store them in a hash table.
Then, for each word in your list, look it up in the hash table.
For real-life applications this will probably do.
--EDIT--
If you cannot split the paragraph into words, and you only need to tell whether at least one word is in the paragraph, I suggest constructing a trie from your list of words, then going over the paragraph and checking the trie for matches as you go.
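The first suggestion (split into words, then look each list word up in a hash table) could be sketched as follows. The class and method names are made up for this example, and two choices are assumptions rather than anything the question states: words are taken to be runs of letters and digits, and matching ignores case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class WordSearchSketch {
    // Returns true if any word from the list occurs as a whole word in the paragraph.
    static boolean containsAny(String paragraph, List<String> words) {
        // Assumption: anything that is not a letter or digit separates words.
        String[] tokens = paragraph.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        Set<String> inParagraph = new HashSet<>(Arrays.asList(tokens));
        for (String w : words) {
            // Each lookup is a hash probe instead of a scan of the whole paragraph.
            if (inParagraph.contains(w.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsAny("The quick brown fox.", Arrays.asList("dog", "fox")));
        System.out.println(containsAny("Concatenate strings", Arrays.asList("cat")));
    }
}
```

Unlike the indexOf version in the question, this matches whole words only, so "cat" is not found inside "Concatenate". If substring matches are what you want, the trie approach is the better fit.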

In C# I usually use LINQ to query a list and get the result.
This is my code:
private bool findMatch(List<String> list, String param, List<String> skipChars)
{
    if (skipChars == null)
        skipChars = new List<string>();
    var c = (from l in list.Except(skipChars)
             where param.IndexOf(l) != -1
             select l).Count();
    return c != 0;
}

Selecting first items in GroupBy when using custom Class

I have a very basic sql view which joins 3 tables: users, pictures, and tags.
How would one write the query below so that it doesn't list the same picture more than once? In other words, I want to group by picture (I think) and return the first instance of each.
I think this is very similar to the post Linq Query Group By and Selecting First Items, but I cannot figure out how to apply it in this case where the query is instantiating MyImageClass.
validPicSummaries = (from x in db.PicsTagsUsers
                     where x.enabled == 1
                     select new MyImageClass {
                         PicName = x.picname,
                         Username = x.Username,
                         Tag = x.tag }).Take(50);

To exclude duplicates, you can use the Distinct LINQ method:
validPicSummaries =
    (from x in db.PicsTagsUsers
     where x.tag == searchterm && x.enabled == 1
     select new MyImageClass
     {
         PicName = x.picname,
         Username = x.Username,
         Tag = x.tag
     })
    .Distinct()
    .Take(50);
You will need to make sure that the objects are comparable so that two MyImageClass objects that have the same PicName, Username, and Tag are considered equal (or however you wish to consider two of them as being equal).
You can write a small class that implements IEqualityComparer<T> if you would like to have a custom comparer for just this case. Ex:
private class MyImageClassComparer : IEqualityComparer<MyImageClass>
{
    public bool Equals(MyImageClass pMyImage1, MyImageClass pMyImage2)
    {
        // some test of the two objects to determine
        // whether they should be considered equal
        return pMyImage1.PicName == pMyImage2.PicName
            && pMyImage1.Username == pMyImage2.Username
            && pMyImage1.Tag == pMyImage2.Tag;
    }
    public int GetHashCode(MyImageClass pMyImageClass)
    {
        // LINQ uses GetHashCode to bucket candidates and Equals to confirm
        // a match, so equal objects must produce the same hash code.
        // XOR is one simple way to combine the field hash codes:
        return pMyImageClass.PicName.GetHashCode()
            ^ pMyImageClass.Username.GetHashCode()
            ^ pMyImageClass.Tag.GetHashCode();
    }
}
Then when you call distinct:
...
.Distinct(new MyImageClassComparer())
.Take(50);

Item-by-item list comparison, updating each item with its result (no third list)

The solutions I have found so far in my research on comparing lists of objects have usually generated a new list of objects, say of those items existing in one list, but not in the other. In my case, I want to compare two lists to discover the items whose key exists in one list and not the other (comparing both ways), and for those keys found in both lists, checking whether the value is the same or different.
The object being compared has multiple properties that constitute the key, plus a property that constitutes the value, and finally, an enum property that describes the result of the comparison, e.g., {Equal, NotEqual, NoMatch, NotYetCompared}. So my object might look like:
class MyObject
{
    //Key combination
    string columnA;
    string columnB;
    decimal columnC;
    //The Value
    decimal columnD;
    //Enum for comparison, used for styling the item (value hidden from UI)
    //Alternatively...this could be a string type, holding the enum.ToString()
    MyComparisonEnum result;
}
These objects are collected into two ObservableCollection<MyObject> instances to be compared. Both lists are presented in the UI in data grids, with the rows styled based on the comparison result enum, so the user can easily see which keys are in the new dataset but not in the old and vice versa, along with the keys present in both datasets with a different value.
Would LINQ be suitable as a tool to solve this efficiently, or should I use loops to scan the lists and break out when the key is found, etc. (a solution like this comes naturally to me from my procedural programming background)... or some other method?
Thank you!

You can use Except and Intersect:
var list1 = new List<MyObject>();
var list2 = new List<MyObject>();
// initialization code
var notIn2 = list1.Except(list2);
var notIn1 = list2.Except(list1);
var both = list1.Intersect(list2);
To find objects with different values (ColumnD) you can use this (quite efficient) Linq query:
var diffValue = from o1 in list1
                join o2 in list2
                on new { o1.columnA, o1.columnB, o1.columnC } equals new { o2.columnA, o2.columnB, o2.columnC }
                where o1.columnD != o2.columnD
                select new { Object1 = o1, Object2 = o2 };
foreach (var diff in diffValue)
{
    MyObject obj1 = diff.Object1;
    MyObject obj2 = diff.Object2;
    Console.WriteLine("Obj1-Value:{0} Obj2-Value:{1}", obj1.columnD, obj2.columnD);
}
Except and Intersect compare by key here only when you override Equals and GetHashCode appropriately:
class MyObject
{
    //Key combination
    string columnA;
    string columnB;
    decimal columnC;
    //The Value
    decimal columnD;
    //Enum for comparison, used for styling the item (value hidden from UI)
    //Alternatively...this could be a string type, holding the enum.ToString()
    MyComparisonEnum result;
    public override bool Equals(object obj)
    {
        if (obj == null || !(obj is MyObject)) return false;
        MyObject other = (MyObject)obj;
        // == on strings compares by value and tolerates null,
        // matching the null handling in GetHashCode below
        return columnA == other.columnA && columnB == other.columnB && columnC == other.columnC;
    }
    public override int GetHashCode()
    {
        // only the key columns take part, so objects with equal keys hash alike
        int hash = 19;
        hash = hash + (columnA ?? "").GetHashCode();
        hash = hash + (columnB ?? "").GetHashCode();
        hash = hash + columnC.GetHashCode();
        return hash;
    }
}
