Complex conditional filter design - data-structures

I'm stuck implementing some conditional rules for a form in the backend. I need an efficient and scalable way of doing this. I looked into binary trees and decision trees, but I'm still not sure what the best approach is.
Each rule is a single statement that can contain more than one condition, joined by logical AND/OR. What I need to know is which data structure to use to store this information in the database. It will act as a filter: when a user submits the form, the form values go through the filter, and some action is taken as a result.

Your question is a bit generic, but let me see if I can help you get started. In Java, you can set up a class structure as follows:
interface ConditionTree {
    boolean evaluate(...);
}
class OperatorNode implements ConditionTree {
    List<ConditionTree> subTrees = ....;
    @Override
    public boolean evaluate(...) {
        if (operator == AND) {
            // loop through and evaluate each subtree and make sure they are all true,
            // or return false
        }
        else if (operator == OR) {
            // loop through and evaluate each subtree, and make sure at least one is
            // true, or return false
        }
    }
}
class LeafNode implements ConditionTree {
    @Override
    public boolean evaluate(...) {
        // get the LHS, operator, and RHS, and evaluate them
    }
}
Notice it's an N-ary tree, not a binary tree.
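To make the outline above concrete, here is a minimal runnable sketch. Everything in it is an assumption for illustration: the names (Condition, FieldCondition, GroupCondition), the choice to model form values as a Map<String, String>, and the small set of comparison operators. It is one possible shape for the idea, not a definitive design.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; form values are modeled as a plain String map.
interface Condition {
    boolean evaluate(Map<String, String> form);
}

// Leaf node: compares one form field against a constant.
final class FieldCondition implements Condition {
    enum Op { EQ, NE, GT, LT }

    private final String field;
    private final Op op;
    private final String value;

    FieldCondition(String field, Op op, String value) {
        this.field = field;
        this.op = op;
        this.value = value;
    }

    @Override
    public boolean evaluate(Map<String, String> form) {
        String actual = form.get(field);
        if (actual == null) {
            return false; // assumption of this sketch: a missing field never matches
        }
        switch (op) {
            case EQ: return actual.equals(value);
            case NE: return !actual.equals(value);
            case GT: return Double.parseDouble(actual) > Double.parseDouble(value);
            case LT: return Double.parseDouble(actual) < Double.parseDouble(value);
            default: throw new IllegalStateException("unknown operator " + op);
        }
    }
}

// Inner node: AND/OR over any number of children, so the tree is N-ary.
final class GroupCondition implements Condition {
    enum Logic { AND, OR }

    private final Logic logic;
    private final List<Condition> children;

    GroupCondition(Logic logic, Condition... children) {
        this.logic = logic;
        this.children = Arrays.asList(children);
    }

    @Override
    public boolean evaluate(Map<String, String> form) {
        if (logic == Logic.AND) {
            return children.stream().allMatch(c -> c.evaluate(form));
        }
        return children.stream().anyMatch(c -> c.evaluate(form));
    }
}

final class ConditionDemo {
    public static void main(String[] args) {
        // country = US AND (age > 65 OR plan = premium)
        Condition rule = new GroupCondition(GroupCondition.Logic.AND,
            new FieldCondition("country", FieldCondition.Op.EQ, "US"),
            new GroupCondition(GroupCondition.Logic.OR,
                new FieldCondition("age", FieldCondition.Op.GT, "65"),
                new FieldCondition("plan", FieldCondition.Op.EQ, "premium")));

        Map<String, String> form = new HashMap<>();
        form.put("country", "US");
        form.put("age", "30");
        form.put("plan", "premium");
        System.out.println(rule.evaluate(form)); // prints true

        form.put("plan", "basic");
        System.out.println(rule.evaluate(form)); // prints false
    }
}
```

For storage, a tree like this could plausibly be persisted either as rows that reference a parent node or as a serialized document; which fits better depends on how the rules are edited and queried, which the question does not specify.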

Related

Don't know how to treat that as a predicate

I'm trying to run a custom query in my repository, but I'm getting an InvalidDataAccessResourceUsageException: "Don't know how to treat that as a predicate String("n.id = '1234'")".
public void myMethod() {
    myRepository.queryUsingCustomFilters("n.id = '1234'");
}
public interface MyRepository extends Neo4jRepository<MyObject, String> {
    @Query("MATCH (n) WHERE {filter} RETURN n")
    List<MyObject> queryUsingCustomFilters(@Param("filter") String filter);
}
I have a simple example for now, but the string I pass in the future could be a bit more complicated, such as "n.id = '1234' AND (n.name = 'one name' OR n.name = 'another name')".
I don't believe you can pass entire clauses, predicates, or queries as a @Param. Parameters stand in for values, not for fragments of the query text.
If you want to build queries at run time, you might want to look at composing them with the lower-level Neo4j OGM filters (see https://neo4j.com/docs/ogm-manual/current/reference/#reference:filters).
In the case you describe above, you could simply add Filters as required and chain them together to build your WHERE clause.

How do I use a custom comparer with the Linq Distinct method?

I was reading a book about Linq, and saw that the Distinct method has an overload that takes a comparer. This would be a good solution to a problem I have where I want to get the distinct entities from a collection, but want the comparison to be on the entity ID, even if the other properties are different.
According to the book, if I have a Gribulator entity, I should be able to create a comparer like this...
private class GribulatorComparer : IComparer<Gribulator> {
public int Compare(Gribulator g1, Gribulator g2) {
return g1.ID.CompareTo(g2.ID);
}
}
...and then use it like this...
List<Gribulator> distinctGribulators
= myGribulators.Distinct(new GribulatorComparer()).ToList();
However, this gives the following compiler errors...
'System.Collections.Generic.List' does not contain a definition for 'Distinct' and the best extension method overload 'System.Linq.Enumerable.Distinct(System.Collections.Generic.IEnumerable, System.Collections.Generic.IEqualityComparer)' has some invalid arguments
Argument 2: cannot convert from 'LinqPlayground.Program.GribulatorComparer' to 'System.Collections.Generic.IEqualityComparer'
I've searched around a bit, and have seen plenty of examples that use code like this, but no complaints about compiler errors.
What am I doing wrong? Also, is this the best way of doing this? I want a one-off solution here, so don't want to start changing the code for the entity itself. I want the entity to remain as normal, but just in this one place, compare by ID only.
Thanks for any help.
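You're implementing your comparer as an IComparer<T>, but the LINQ Distinct overload requires an IEqualityComparer<T>, which means implementing both Equals and GetHashCode:
private class GribulatorComparer : IEqualityComparer<Gribulator> {
    public bool Equals(Gribulator g1, Gribulator g2) {
        return g1.ID == g2.ID;
    }
    // Must agree with Equals: equal IDs have to produce equal hash codes.
    public int GetHashCode(Gribulator g) {
        return g.ID.GetHashCode();
    }
}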
edit:
For clarification, the IComparer<T> interface is used for sorting, as that's basically what the Compare() method is for.
Like this:
items.OrderBy(x => x, new ItemComparer());
private class ItemComparer : IComparer<Item>
{
    public int Compare(Item x, Item y)
    {
        return x.Id.CompareTo(y.Id);
    }
}
This sorts your collection using that comparer. For simple fields (like an int Id), however, LINQ lets you sort by the key directly:
items.OrderBy(x => x.Id);

String set implementation

I have to implement a set ADT for a pair of strings. The interface I want is (in Java):
public interface StringSet {
void add(String a, String b);
boolean contains(String a, String b);
void remove(String a, String b);
}
The data access pattern has the following properties:
The contains operation is far more frequent than the add and remove operations.
More often than not, contains returns true, i.e. the search is successful.
A simple implementation I can think of is a two-level hash table, i.e. HashMap<String, HashMap<String, Boolean>>. But this data structure makes no use of the two peculiarities of the access pattern. I am wondering whether there is something more efficient than the hash table, perhaps one that exploits these peculiarities.
Personally, I would design this in terms of a standard Set<> interface:
public class StringPair {
    public StringPair(String a, String b) {
        a_ = a;
        b_ = b;
        hash_ = (a_ + b_).hashCode();
    }
    public boolean equals(StringPair pair) {
        return (a_.equals(pair.a_) && b_.equals(pair.b_));
    }
    @Override
    public boolean equals(Object obj) {
        if (obj instanceof StringPair) {
            return equals((StringPair) obj);
        }
        return false;
    }
    @Override
    public int hashCode() {
        return hash_;
    }
    private String a_;
    private String b_;
    private int hash_;
}
public class StringSetImpl implements StringSet {
    // SetFactory is a user-supplied type, not part of the JDK.
    public StringSetImpl(SetFactory factory) {
        pair_set_ = factory.<StringPair>createSet();
    }
    // ...
    private Set<StringPair> pair_set_ = null;
}
Then you could leave it up to the user of StringSetImpl to use the preferred Set type. If you are attempting to optimize access, though, it's hard to do better than a HashSet<> (at least with respect to runtime complexity), given that access is O(1), whereas tree-based sets have O(log N) access times.
That contains() usually returns true may make it worth considering a Bloom filter, although this would require that some number of false positives for contains() are allowed (don't know if that is the case).
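As a rough illustration of the Bloom filter idea, here is a minimal sketch for string pairs. The class name PairBloomFilter, the bit-array size, the number of hash functions, and the double-hashing scheme are all assumptions chosen for brevity, not tuned values. Note that a plain Bloom filter like this one cannot support remove, so it could only stand in for the full StringSet interface if removal were dropped or handled by a different variant.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Minimal Bloom filter over string pairs; sizes and hashing are illustrative assumptions.
final class PairBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    PairBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(String a, String b) {
        int[] h = baseHashes(a, b);
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(h, i));
        }
    }

    // false means "definitely absent"; true means only "probably present".
    boolean mightContain(String a, String b) {
        int[] h = baseHashes(a, b);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(h, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th probe is h1 + i * h2, reduced into [0, size).
    private int index(int[] h, int i) {
        return Math.floorMod(h[0] + i * h[1], size);
    }

    private static int[] baseHashes(String a, String b) {
        // The separator keeps pairs such as ("ab", "c") and ("a", "bc") distinct.
        byte[] key = (a + '\0' + b).getBytes(StandardCharsets.UTF_8);
        int h1 = 0x811C9DC5; // FNV-1a over the bytes
        for (byte x : key) {
            h1 = (h1 ^ (x & 0xFF)) * 0x01000193;
        }
        int h2 = Arrays.hashCode(key) | 1; // forced odd, so it is never zero
        return new int[] { h1, h2 };
    }
}
```

A call such as new PairBloomFilter(1 << 16, 3) would create a filter with 65536 bits and three probes per key; every pair that was added is always reported as present, while a pair that was never added is occasionally reported as present too.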
Edit
To avoid the extra allocation, you can do something like this, which is similar to your two-level approach, except using a set rather than a map for the second level:
public class StringSetImpl implements StringSet {
    public StringSetImpl() {
        elements_ = new HashMap<String, Set<String>>();
    }
    public boolean contains(String a, String b) {
        if (!elements_.containsKey(a)) {
            return false;
        }
        Set<String> set = elements_.get(a);
        if (set == null) {
            return false;
        }
        return set.contains(b);
    }
    public void add(String a, String b) {
        if (!elements_.containsKey(a) || elements_.get(a) == null) {
            elements_.put(a, new HashSet<String>());
        }
        elements_.get(a).add(b);
    }
    public void remove(String a, String b) {
        if (!elements_.containsKey(a)) {
            return;
        }
        Set<String> set = elements_.get(a);
        if (set == null) {
            elements_.remove(a);
            return;
        }
        set.remove(b);
        // Drop the first-level entry once its set is empty.
        if (set.isEmpty()) {
            elements_.remove(a);
        }
    }
    private Map<String, Set<String>> elements_ = null;
}
Since it's 4:20 AM where I'm located, the above is definitely not my best work (too tired to refresh myself on the treatment of null by these different collections types), but it sketches the approach.
Do not use ordinary trees (which back most standard-library sorted sets) for this. They rest on one simple assumption that will hurt you in this case:
The usual O(log(n)) cost of tree operations assumes that comparisons are O(1). This is true for integers and most other keys, but not for strings. For strings, each comparison is O(k), where k is the length of the string. This makes every operation depend on the string length, which will most likely hurt you if you need to be fast and is easily overlooked.
Especially if you most often return true, there will be up to k character comparisons per string at each level, so with this access pattern you will experience the full drawback of strings in trees.
Your access pattern is easily handled by a trie. Testing whether a string is contained is O(k) in the worst case (not just the average case, as in a hash map). Adding a string is also O(k). Since you are storing two strings, I would suggest that you index your trie not by characters but by some larger type, so you can add two special index values: one for the end of the first string, and one for the end of both strings.
In your case, these two extra symbols would also allow for simple removal: just delete the final node containing the end symbol, and your string will no longer be found. You will waste some memory, because the deleted strings are still in your structure. If this is a problem, you could keep track of the number of deleted strings and rebuild your trie when it gets too large.
P.s. A trie can be thought of as a combination of a tree and several hashtables, so this gives you the best of both data structures.
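The trie with two sentinel symbols described above might look roughly like the following sketch. The class name PairTrie and the sentinel values are hypothetical. Child links are kept in a HashMap keyed by int for brevity, which gives expected rather than strict worst-case O(k) lookups; an array-indexed child table would be needed for the strict bound.

```java
import java.util.HashMap;
import java.util.Map;

// Trie over string pairs, keyed by int so two sentinel symbols fit outside the char range.
final class PairTrie {
    private static final int END_FIRST = -1; // marks the end of the first string
    private static final int END_BOTH = -2;  // marks the end of the whole pair

    private static final class Node {
        final Map<Integer, Node> next = new HashMap<>();
    }

    private final Node root = new Node();

    void add(String a, String b) {
        Node n = walk(a, b, true);
        n.next.computeIfAbsent(END_BOTH, k -> new Node());
    }

    boolean contains(String a, String b) {
        Node n = walk(a, b, false);
        return n != null && n.next.containsKey(END_BOTH);
    }

    // Lazy removal: only the terminal marker is dropped; the path stays allocated.
    void remove(String a, String b) {
        Node n = walk(a, b, false);
        if (n != null) {
            n.next.remove(END_BOTH);
        }
    }

    // Follows a, then END_FIRST, then b; creates missing nodes only when create is true.
    private Node walk(String a, String b, boolean create) {
        Node n = step(root, a, create);
        if (n != null) {
            n = child(n, END_FIRST, create);
        }
        if (n != null) {
            n = step(n, b, create);
        }
        return n;
    }

    private static Node step(Node n, String s, boolean create) {
        for (int i = 0; i < s.length() && n != null; i++) {
            n = child(n, s.charAt(i), create);
        }
        return n;
    }

    private static Node child(Node n, int symbol, boolean create) {
        return create ? n.next.computeIfAbsent(symbol, k -> new Node()) : n.next.get(symbol);
    }
}
```

Because END_FIRST sits between the two strings, the pairs ("ab", "c") and ("a", "bc") follow different paths and are never confused with each other.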
I'd second Michael Aaron Safyan's approach of using a StringPair type, perhaps with a more specific name, or as a generic tuple type: Tuple<A,B> instantiated as Tuple<String,String>. But I would strongly suggest using one of the provided set implementations, either a HashSet or a TreeSet.
A red-black tree implementation of the set would be a good option. The C++ STL's std::set is typically implemented as a red-black tree.

IList with an implicit sort order

I'd like to create an IList<Child> that maintains its Child objects in a default/implicit sort order at all times (i.e. regardless of additions/removals to the underlying list).
What I'm specifically trying to avoid is the need for all consumers of said IList<Child> to explicitly invoke IEnumerable<T>.OrderBy() every time they want to enumerate it. Apart from violating DRY, such an approach would also break encapsulation as consumers would have to know that my list is even sorted, which is really none of their business :)
The solution that seemed most logical/efficient was to expose IList<Child> as IEnumerable<Child> (to prevent List mutations) and add explicit Add/Remove methods to the containing Parent. This way, I can intercept changes to the List that necessitate a re-sort, and apply one via Linq:
public class Child {
    public string StringProperty;
    public int IntProperty;
}
public class Parent {
    private IList<Child> _children = new List<Child>();
    public IEnumerable<Child> Children {
        get
        {
            return _children;
        }
    }
    private void ReSortChildren() {
        _children = new List<Child>(_children.OrderBy(c => c.StringProperty));
    }
    public void AddChild(Child c) {
        _children.Add(c);
        ReSortChildren();
    }
    public void RemoveChild(Child c) {
        _children.Remove(c);
        ReSortChildren();
    }
}
Still, this approach doesn't intercept changes made to the underlying Child.StringProperty (which in this case is the property driving the sort). There must be a more elegant solution to such a basic problem, but I haven't been able to find one.
EDIT:
I wasn't clear: I would prefer a LINQ-compatible solution. I'd rather not resort to .NET 2.0 constructs (i.e. SortedList).
What about using a SortedList<>?
One way you could go about it is to have Child publish an event OnStringPropertyChanged which passes along the previous value of StringProperty. Then create a derivation of SortedList that overrides the Add method to hookup a handler to that event. Whenever the event fires, remove the item from the list and re-add it with the new value of StringProperty. If you can't change Child, then I would make a proxy class that either derives from or wraps Child to implement the event.
If you don't want to do that, I would still use a SortedList, but internally manage the above sorting logic anytime the StringProperty needs to be changed. To be DRY, it's preferable to route all updates to StringProperty through a common method that correctly manages the sorting, rather than accessing the list directly from various places within the class and duplicating the sort management logic.
I would also caution against allowing the controller to pass in a reference to Child, which would let it manipulate StringProperty after the Child has been added to the list.
public class Parent {
    private SortedList<string, Child> _children = new SortedList<string, Child>();
    public ReadOnlyCollection<Child> Children {
        get { return new ReadOnlyCollection<Child>(_children.Values); }
    }
    public void AddChild(string stringProperty, int data, Salamandar sal) {
        _children.Add(stringProperty, new Child(stringProperty, data, sal));
    }
    public void RemoveChild(string stringProperty) {
        _children.Remove(stringProperty);
    }
    private void UpdateChildStringProperty(Child c, string newStringProperty) {
        if (c == null) throw new ArgumentNullException("c");
        // Re-key the entry: remove it under the old key, then add it under the new one
        _children.Remove(c.StringProperty);
        c.StringProperty = newStringProperty;
        _children.Add(c.StringProperty, c);
    }
    public void CheckSalamandar(string s) {
        Child c;
        if (_children.TryGetValue(s, out c) && c.Salamandar.IsActive) {
            // update StringProperty through our method
            UpdateChildStringProperty(c, new string(c.StringProperty.Reverse().ToArray()));
            // update other properties directly
            c.Number++;
        }
    }
}
I think that if you derive from KeyedCollection, you'll get what you need. That is only based on reading the documentation, though.
EDIT:
If this works, it won't be easy, unfortunately. Neither the underlying lookup dictionary nor the underlying List in KeyedCollection is sorted, nor are they exposed enough for you to replace them. It might, however, provide a pattern to follow in your own implementation.

Write a linq expression to select a subtree of items

Given a class:
class Control
{
public Control Parent { get; set; }
public List<Control> Children { get; set; }
}
and a list:
List<Control> myControls;
Is it possible to write a linq query that will select all children & grandchildren for a given control? For example if a tree looks like this:
GridA1
    PanelA1
        TextBoxA1
        TextBoxA2
        PanelA2
            ListBoxA1
            ListBoxA2
GridB1
    PanelB1
        TextBoxB1
I'd like a query that, given a list myControls containing all of the above controls with their Parent and Children properties set as appropriate, can be parameterized with PanelA1 and return TextBoxA1, TextBoxA2, PanelA2, ListBoxA1 and ListBoxA2. Is there an efficient way to do this with LINQ? I'm selecting a tree structure out of a database and looking for a better way to pull apart subtrees than a recursive function.
It's hard to do this in a tremendously pretty way with LINQ, since lambda expressions can't be self-recursive before they're defined. A recursive function (perhaps using LINQ) is your best bet.
How I'd implement it:
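// ControlExtensions is an arbitrary name; extension methods must live in a static class.
public static class ControlExtensions
{
    // Returns each control in the sequence followed by all of its descendants, depth-first.
    public static IEnumerable<Control> ChildrenOf(this IEnumerable<Control> controls)
    {
        return controls.SelectMany(c =>
            new Control[] { c }.Concat(ChildrenOf(c.Children)));
    }
}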
