Linq To Sql Where does not call overridden Equals

I'm currently working on a project where I'm going to do a lot of comparison of similar non-database objects (service layer objects, for this discussion) and objects retrieved from the database via LinqToSql. For the sake of this discussion, assume I have a service layer Product object with a string field that is represented in the database. However, in the database there is also a primary key Id that is not represented in the service layer.
Accordingly (as I often do for unit testing etc), I overrode Equals(Object), Equals(Product), and GetHashCode and implemented IEquatable with the expectation that I would be able to write code like this:
myContext.Products.Where(p => p.Equals(passedInProduct)).SingleOrDefault();
And so forth.
The Equals override is tested and works. The objects are mutable so the usual caveats apply to the GetHashCode override. However, for the purposes of this example, the objects are not modified except by LtS and could be made readonly.
Here's the simple test:
Create a test object in memory and commit to the LtS context. By committing, the test object is populated with a few auto-generated fields.
Create another identical test object in memory (separate reference)
Attempt to retrieve the first object from the database using the second object as the criteria. (see code line above).
// Setup
string productDesc = "12A";
Product testProduct1 = _CreateTestProductInDatabase(productDesc);
Product testProduct2 = _CreateTestProduct(productDesc);
// check setup
Product retrievedProduct1 = ProductRepo.Retrieve(testProduct1);
//Assert.IsNotNull(retrievedProduct1);
// execute - try to retrieve the 'equivalent' product object
Product retrievedProduct2 = ProductRepo.Retrieve(testProduct2);
A simplified version of Retrieve (the cruft removed is just parameter checks, etc.):
using (var dbContext = new ProductDataContext()) {
    Product retrievedProduct = dbContext.Products
        .Where(p => p.Equals(product)).SingleOrDefault();
    return retrievedProduct;
}
NB: The overridden Equals method knows not to care about the auto-generated fields from the database and only looks at the string that is represented in the service layer.
Here's what I observed:
Retrieve on testProduct1 succeeds (no surprise, equal by reference)
Retrieve on testProduct2 fails (null)
The overridden Equals method called in the Retrieve method is never hit during either Retrieve call.
However, the overridden Equals method is called multiple times by the context on SubmitChanges (invoked when creating the first test object in the database), where it works as expected.
Statically, the compiler knows the type of the objects being emitted and is able to resolve it.
So my specific questions:
Am I trying to do something ill-advised? Seems like a straightforward use of Equals.
Corollary to first question: alternate suggestions to deal with linq to sql equality checking while keeping comparison details inside the objects rather than the repository
Why might I have observed the Equals method being resolved in SubmitChanges but not in the Where clause?
I'm as much interested in understanding as in making my Equals calls work. But I would also love to learn how to make this 'pattern' work rather than just understand why it appears to be an 'anti-pattern' in the context of LtS and C#.
Please don't suggest I just filter directly on the context with Where statements. Obviously, I can remove the Equals call and do that. However, some of the other objects (not presented here) are large and a bit complicated. For the sake of maintenance and clarity, I want to keep the knowledge of how an object compares itself to another of its own type in one place, ideally as part of the object in question.
Some other things I tried that didn't change the behavior:
Overloaded and used == instead
Casting the lambda variable to the type p => (Product)p
Getting an IQueryable object first and calling Equals in the Where clause
Some other things I tried that didn't work:
Creating a static ProductEquals(Product first, Product second) method: System.NotSupportedException: has no supported translation to SQL.
Thanks StackOverflow contributors!
Re Possible dups: I've read ~10 other questions. I'd love a pointer to an exact duplicate but most don't seem to directly address what seems to be an oddity of LinqToSql.

Am I trying to do something ill-advised?
Absolutely. Consider what LINQ to SQL does: it creates a SQL representation of your query. It doesn't know what your overridden Equals method does, so can't translate that logic into SQL.
Corollary to first question: alternate suggestions to deal with linq to sql equality checking while keeping comparison details inside the objects rather than the repository
You'd need to do something with expression trees to represent the equality that way - and then build those expression trees up into a full query. It won't be fun, but it should be possible. It will affect how you build all your queries though.
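For illustration, here is a minimal sketch of that idea (not from the original answer; the Matches member and the Description field are assumptions standing in for the real service-layer field). The object exposes its comparison as an expression tree, which the provider can translate, so the knowledge of what makes two Products "equal" still lives in Product:

using System;
using System.Linq;
using System.Linq.Expressions;

public partial class Product
{
    // The comparison expressed as an expression tree the provider can
    // translate to SQL. Only the service-layer field is compared;
    // database-generated columns such as Id are deliberately ignored.
    public static Expression<Func<Product, bool>> Matches(Product other)
    {
        return p => p.Description == other.Description;
    }
}

Retrieve would then pass the expression to Where instead of calling Equals, e.g. dbContext.Products.Where(Product.Matches(product)).SingleOrDefault(), so the provider sees a translatable comparison rather than an opaque method call.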
I would have expected most database representations to be ID-based though, so you should be able to just compare IDs for equality. Usually when I've seen attempts to really model data in an OO fashion but store it in a database, the leakiness of the abstraction has caused a lot of pain.
Why might I have observed the Equals method being resolved in SubmitChanges but not in the Where clause?
Presumably SubmitChanges is working against a set of in-memory objects to work out what's changed - it doesn't have to do any conversion to SQL to do that part.

Related

What is the name of the design pattern to avoid chained field access?

There is a pattern or term that is used to avoid code like
myObject.fieldA.fieldB.fieldC
or something like it. I forgot what this term is called. Can anyone let me know?
It violates the Law of Demeter, which states that code should only access its own local variables, parameters, and instance members.
It could be a case of feature envy, where a class calls a lot of getters or accesses a lot of data from another class.
If these are really fields, they are poorly encapsulated (i.e., not behind a function), and any change to these fields forces you to modify all code that's using them.
Testing such code becomes hard, as you will have to mock not only fieldA, but also its fieldB, and in turn that object's fieldC.
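As a quick illustration of the usual remedy (a hedged sketch with invented names), the immediate collaborator exposes what callers need, so they stop reaching through intermediate objects:

public class Address
{
    public string City { get; set; }
}

public class Customer
{
    private readonly Address address;
    public Customer(Address address) { this.address = address; }
    public string ShippingCity => address.City;
}

public class Order
{
    private readonly Customer customer;
    public Order(Customer customer) { this.customer = customer; }

    // Callers write order.ShippingCity instead of
    // order.Customer.Address.City.
    public string ShippingCity => customer.ShippingCity;
}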
I think you are trying to create a new object and add certain properties to that object. If that is the case, then it's the Builder design pattern, where you separate construction from representation.
If you are trying to call a certain field with the code shown above, then your design is very poor. An object should store only its own properties.

Trying to identify if a data injection method has a name already

Let's say we have a class "Car" that has different pieces of data (maker, model, color, fabrication date, registration date, etc.). The class has no method to get the data, but it knows to ask for it from another object (passed in via the constructor; let's call it DS for short), and the same applies when it needs to push updates back.
A method getColor() would be implemented like this:
if (!this->loaded('color')) {
    this->askDS('color'); // this will do the necessary work to generate a request to DS
}
return this->information('color');
Nothing too fancy so far. Now comes the part I want to find out about: whether it has a name, or whether there are libraries/frameworks that do this already.
DS has a list of methods registered dynamically based on the class that needs data. For Car we have:
input: car serial number, output: method to use to read the numbers to extract raw values
input: car raw color value, output: color code
input: car color code, manufacturer, year, model, output: human-readable color (for example navy blue)
Now, neither DS nor any single method has a pre-ordered list of commands to start from the serial number and end up with the color blue, but DS can construct a chain of methods starting from one set of data, run them in order, and get the desired data.
For our example above, DS runs 1, 2, 3 in that order and injects the data produced by all the methods into the class object that needed it.
Now, if the car needs registration info, we have a method (4) that gets that from the police database with an API request.
So, given:
- a type of model (class/object)
- a list of methods that take a fixed list of input(object properties) and give out a fixed list of output (object properties)
- a class DS that can glue the methods together and run the needed ones for a model to get from property A (serial) to property B (human-readable color), without the model or DS having a preconfigured way to get this data, but finding it as needed.
does this have a name, or is it already implemented somewhere?
I've implemented a very basic prototype and it works very nicely; I think this implementation method has useful features (a rough sketch of the chain search follows after this list):
if you have a set of methods that do sql queries and then your app switches to using an api, you only need to change the methods and don't have to touch any other part of the application
when looking for a chain of methods that resolve the 'need' the object has, you can find a method chain, run it, if it fails keep looking for another list of methods based on the currently available data - so if you have multiple sources for a piece of data, it can try multiple versions
starting from the above paragraph, I could start with an app that only has SQL queries for data retrieval; when I find that part of the app overloads the SQL server, I could add a method to retrieve data from a cache with a lower cost than the database one (or multiple layered caches, each with different costs)
I could probably add business logic into the mix the same way as the cache, and present different data based on the user's location/options
this requires less coding overall, and decouples the data source from the object, making each piece easier to mock/test
what is needed to make this fast is a caching solution for the discovered method chains, since matching hundreds of thousands of methods per model type would be time-consuming. I don't think this is very hard to do: just store all found chains in memory as you find them, along with some metadata so a search can be resumed from any point in time; when you update the methods, clear the cache and take a performance hit on the first requests
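Here is a minimal sketch of the chain-discovery idea (entirely hypothetical shapes and names, since the prototype isn't shown): each registered method declares which properties it needs and which it produces, and the resolver keeps applying runnable steps until the requested property becomes available:

using System;
using System.Collections.Generic;
using System.Linq;

// A registered transformation: needs some properties, produces others.
public sealed class Step
{
    public string Name { get; }
    public IReadOnlyCollection<string> Inputs { get; }
    public IReadOnlyCollection<string> Outputs { get; }

    public Step(string name, IEnumerable<string> inputs, IEnumerable<string> outputs)
    {
        Name = name;
        Inputs = inputs.ToArray();
        Outputs = outputs.ToArray();
    }
}

public static class ChainResolver
{
    // Forward search: apply any step whose inputs are all available and
    // which produces something new, until the wanted property appears
    // (or no further progress is possible).
    public static IReadOnlyList<Step> Resolve(
        IEnumerable<string> available, string wanted, IReadOnlyList<Step> steps)
    {
        var have = new HashSet<string>(available);
        var chain = new List<Step>();
        bool progressed = true;

        while (!have.Contains(wanted) && progressed)
        {
            progressed = false;
            foreach (var step in steps)
            {
                bool producesNew = step.Outputs.Any(o => !have.Contains(o));
                bool runnable = step.Inputs.All(have.Contains);
                if (producesNew && runnable)
                {
                    chain.Add(step);
                    foreach (var o in step.Outputs) have.Add(o);
                    progressed = true;
                }
            }
        }
        return have.Contains(wanted) ? chain : Array.Empty<Step>();
    }
}

With steps 1, 2 and 3 from the Car example registered, Resolve(new[] { "serial" }, "humanReadableColor", steps) would return them in a runnable order; caching the returned chain per model type and wanted property is the optimization described above.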
Thank you for your time
What you describe sounds like a somewhat roundabout way of doing Dependency Injection. Quote:
"Passing the service to the client, rather than allowing a client to
build or find the service, is the fundamental requirement of the
pattern."
Depending on what language you're using, there should be several Dependency Injection frameworks/libraries available.
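For reference, constructor injection in that spirit might look like this (a hedged sketch with invented type names, not tied to any particular DI framework):

// The car depends on an abstraction for its data, not on a concrete source.
public interface IDataSource
{
    string Fetch(string propertyName);
}

public class Car
{
    private readonly IDataSource dataSource;

    // The data source is passed in; Car never builds or finds it itself.
    public Car(IDataSource dataSource)
    {
        this.dataSource = dataSource;
    }

    public string GetColor() => dataSource.Fetch("color");
}

Swapping SQL access for an API client or a cache then only means passing a different IDataSource implementation; Car itself stays unchanged.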

How to check whether an instance of an ActiveRecord model is up to date?

For testing reasons, I want to check that one of my methods doesn't update a specific entry in my database. Is there a simple way to ask an instance of an ActiveRecord model if it's in sync with the database? For instance, imagine we had a method foobar? that could do this:
old_post = Post.find(1)
updated_post = Post.find(1)
updated_post.update_attributes(name: "this is a new name not like the old name")
old_post.foobar? #should return true, as its attributes are no longer up to date
updated_post.foobar? #should return false, as its attributes match the database directly
So is there a method that acts like foobar, or something like it? Thanks in advance.
I think your problem lies not in finding a method which tells you whether an attribute has been updated, but in the relationship among the different objects that are instantiated. First, it is important to understand that old_post and updated_post are unrelated Ruby objects. They know how to save their own state to the database, but they do not know about each other.
Therefore your first requirement for foobar? cannot be fulfilled, as old_post will think it is up to date as long as none of its attributes has been updated. In contrast, the changed? method will roughly answer in the way you are trying to achieve for updated_post. However, it does so only because it thinks nothing has happened since it was last saved; this is not verified against the database on each call of changed?, as that would waste a database call in 99.9% of all cases.
This means it is all too easy to generate anomalies between the objects you created, as there is no direct connection between the two (except the implicit connection that they once represented the same database row). If you change an attribute in one object (using e.g. title=), it will change the value of the object and take note of the change in the changed array. Once you save this object, it will save its changed attributes to the database (by creating an individually constructed update statement).
Another object that is already instantiated (as old_post in your example) will not know about this change and might change other attributes if you are not careful (or even the same ones if they have been changed again). Depending on your database adapter you may try to use the lock! method which will synchronize your object with the database before allowing any modifications. This however will not happen automatically as in most controller methods updates do not conflict nearly often enough to merit the synchronization as it will be idempotent in most cases.
That said, Rails cannot save you from thinking about your transaction semantics if you want to guarantee specific ACID properties for your controller methods.

What is the benefit of using IQueryable and LINQ queries?

I have a project in which custom configuration classes were implemented:
IconSizesConfigSection: ConfigurationSection
IconSizesCollection: ConfigurationElementCollection
IconSize: ConfigurationElement
The Config class has this property:
public IQueryable<IconSize> IconSizes
{
get
{
IconSizesConfigSection configInfo = (IconSizesConfigSection)ConfigurationManager.GetSection("iconConfig");
return configInfo.IconSizes.OfType<IconSize>().AsQueryable<IconSize>();
}
}
The IconSizes property returns an IconSizesCollection, which derives from ConfigurationElementCollection. In turn, ConfigurationElementCollection derives from ICollection and IEnumerable.
In another class I have this code:
var previewIconSize = Config.IconSizes.FirstOrDefault(c => c.Name == "AvatarSize");
Why is deferred execution used in such a case?
Why does it first call AsQueryable<IconSize>() on the collection and then use LINQ and deferred execution?
Are there any benefits compared with using a simple List?
In this case, there is no practical benefit. Using IQueryable is helpful when query rewriting/translation will optimize performance. You will actually incur decreased performance in the provided example.
One example of using IQueryable in a helpful way is the significant performance increase gained when lazily translating and evaluating queries against a database or web service. This will perform significantly better than the alternative of pulling massive result sets and applying query logic in active memory with a "simple List".
The way you can tell that using IQueryable is detrimental in your case is that the collection is already loaded into memory when you begin the query.
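To illustrate (a hedged sketch reusing the types from the question), for an already-in-memory configuration section it is simpler to materialize once and query with plain LINQ to Objects:

using System.Collections.Generic;
using System.Configuration;
using System.Linq;

public static class Config
{
    // The section is already in memory, so IQueryable adds nothing here;
    // a materialized read-only list keeps the intent obvious.
    public static IReadOnlyList<IconSize> IconSizes =>
        ((IconSizesConfigSection)ConfigurationManager.GetSection("iconConfig"))
            .IconSizes.OfType<IconSize>()
            .ToList();
}

// Usage is unchanged:
// var previewIconSize = Config.IconSizes.FirstOrDefault(c => c.Name == "AvatarSize");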
Both IEnumerable and IQueryable use deferred execution. The difference is that IQueryable is used to cross boundaries like database queries, entity framework queries or OData queries.
When an IQueryable is iterated over, the query is translated to the remote provider's idiom and executed there. When the response is received from the remote provider, it is translated to a local object representation.
Deferred Execution is good because your user may never use the result set and hence there would have been no point querying the data source.
There may be some LINQ methods your user can't use unless they cast the result to IQueryable which means you might restrict what they can do, or force them to cast/copy the list into something more useful.
If you use a List, then you're hard-coding your solution to a List. Do you care what the implementation of the collection is? Does your user? Probably not, as long as it supports the necessary interfaces.

Do you ToList()?

Do you have a default type that you prefer to use in your dealings with the results of LINQ queries?
By default LINQ will return an IEnumerable<> or maybe an IOrderedEnumerable<>. We have found that a List<> is generally more useful to us, so have adopted a habit of ToList()ing our queries most of the time, and certainly using List<> in our function arguments and return values.
The only exception to this has been in LINQ to SQL where calling .ToList() would enumerate the IEnumerable prematurely.
We are also using WCF extensively, the default collection type of which is System.Array. We always change this to System.Collections.Generic.List in the Service Reference Settings dialog in VS2008 for consistency with the rest of our codebase.
What do you do?
ToList always evaluates the sequence immediately - not just in LINQ to SQL. If you want that, that's fine - but it's not always appropriate.
Personally I would try to avoid declaring that you return List<T> directly - usually IList<T> is more appropriate, and allows you to change to a different implementation later on. Of course, there are some operations which are only specified on List<T> itself... this sort of decision is always tricky.
EDIT: (I would have put this in a comment, but it would be too bulky.) Deferred execution allows you to deal with data sources which are too big to fit in memory. For instance, if you're processing log files - transforming them from one format to another, uploading them into a database, working out some stats, or something like that - you may very well be able to handle arbitrary amounts of data by streaming it, but you really don't want to suck everything into memory. This may not be a concern for your particular application, but it's something to bear in mind.
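To make the streaming point concrete (a hedged sketch; the file name and the filter are invented), deferred execution lets you process a large log file one line at a time:

using System;
using System.IO;
using System.Linq;

class LogStats
{
    static void Main()
    {
        // File.ReadLines streams the file lazily; nothing is buffered up front.
        var errorCount = File.ReadLines("app.log")     // deferred
            .Where(line => line.Contains("ERROR"))     // deferred
            .Count();                                  // executes here, line by line

        Console.WriteLine(errorCount);

        // Inserting .ToList() right after ReadLines would pull the entire
        // file into memory, defeating the point for very large logs.
    }
}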
We have the same scenario - WCF communications to a server, the server uses LINQtoSQL.
We use .ToArray() when requesting objects from the server, because it's "illegal" for the client to change the list. (Meaning, there is no purpose to support ".Add", ".Remove", etc).
While still on the server, however, I would recommend that you leave it as its default (which is not IEnumerable, but rather IQueryable). This way, if you want to filter even more based on some criteria, the filtering is STILL on the SQL side until evaluated.
This is a very important point as it means incredible performance gains or losses depending on what you do.
EXAMPLE:
// This is just an example... imagine this is on the server only. It's the
// basic method that gets the list of clients.
private IEnumerable<Client> GetClients()
{
var result = MyDataContext.Clients;
return result.AsEnumerable();
}
// This method here is actually called by the user...
public Client[] GetClientsForLoggedInUser()
{
var clients = GetClients().Where(client=> client.Owner == currentUser);
return clients.ToArray();
}
Do you see what's happening there? The "GetClients" method is going to force a download of ALL 'clients' from the database... THEN the Where clause will happen in the GetClientsForLoggedInUser method to filter it down.
Now, notice the slight change:
private IQueryable<Client> GetClients()
{
var result = MyDataContext.Clients;
return result.AsQueryable();
}
Now, the actual evaluation won't happen until ".ToArray" is called... and SQL will do the filtering. MUCH better!
In the Linq-to-Objects case, returning List<T> from a function isn't as nice as returning IList<T>, as THE VENERABLE SKEET points out. But often you can still do better than that. If the thing you are returning ought to be immutable, IList is a bad choice because it invites the caller to add or remove things.
For example, sometimes you have a method or property that returns the result of a Linq query or uses yield return to lazily generate a list, and then you realise that it would be better to do that the first time you're called, cache the result in a List<T> and return the cached version thereafter. That's when returning IList may be a bad idea, because the caller may modify the list for their own purposes, which will then corrupt your cache, making their changes visible to all other callers.
Better to return IEnumerable<T>, so all they have is forward iteration. And if the caller wants rapid random access, i.e. they wish they could use [] to access by index, they can use ElementAt, which Linq defines so that it quietly sniffs for IList and uses that if available, and if not it does the dumb linear lookup.
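A small sketch of that caching pattern (invented names; a hedged illustration, not a library API): compute once, cache in a List<T>, and hand callers only an IEnumerable<T>:

using System.Collections.Generic;
using System.Linq;

public class ProductCatalog
{
    private List<string> cachedNames; // backing cache, never exposed as a List

    public IEnumerable<string> ProductNames
    {
        get
        {
            if (cachedNames == null)
            {
                // Stand-in for an expensive query or yield-return pipeline.
                cachedNames = LoadNamesFromSource().ToList();
            }
            // Callers get forward iteration only; handing back IList would
            // invite Add/Remove calls that corrupt the cache for everyone.
            return cachedNames;
        }
    }

    private IEnumerable<string> LoadNamesFromSource()
    {
        yield return "example"; // placeholder for the real source
    }
}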
One thing I've used ToList for is when I've got a complex system of Linq expressions mixed with custom operators that use yield return to filter or transform lists. Stepping through in the debugger can get mighty confusing as it jumps around doing lazy evaluation, so I sometimes temporarily add a ToList() to a few places so that I can more easily follow the execution path. (Although if the things you are executing have side-effects, this can change the meaning of the program.)
It depends if you need to modify the collection. I like to use an Array when I know that no one is going to add/delete items. I use a list when I need to sort/add/delete items. But, usually I just leave it as IEnumerable as long as I can.
If you don't need the added features of List<>, why not just stick with IQueryable<> ?!?!?! Lowest common denominator is the best solution (especially when you see Timothy's answer).
