Performance of IQueryable versus Dictionary - linq

I'm caching a whole bunch of static metadata in my app at startup. It's from a db and there are tons of Foreign Key relationships. I'm looking into the best way of modelling them.
I've just started off with LINQ.
It's easy for me to declare a class
public class AllData {
public static IQueryable<libm_ColPurpose> IQ_libm_ColPurpose = libm_ColPurpose.All();
public static IQueryable<libm_ColType> IQ_libm_ColType = libm_ColType.All();
...
(I'm using SubSonic 3 to generate my classes, but that's beside the point).
Then I can use the IQueryable<T> members to get access to anything I want, for example:
libm_ColType ct = AllData.IQ_libm_ColType.SingleOrDefault(x => x.ColTypeStr == this.DefaultJetColTypeStr);
Prior to using IQueryable I was using Dictionaries to store FK relationships, so to mimic the above code I'd code the following from a preexisting List<libm_ColType> list
Dictionary<string, libm_ColType> colTypeByColTypeStr = new Dictionary<string, libm_ColType>();
foreach (libm_ColType x in list) { colTypeByColTypeStr.Add(x.ColTypeStr, x); }
and then I could use
libm_ColType ct = colTypeByColTypeStr[this.DefaultJetColTypeStr];
OK, so finally we get to the question!
The Dictionary lookup by ID is extremely efficient, however the IQueryable solution is far more flexible and elegant.
I'm wondering how much of a performance hit I'm going to get using IQueryable. I suspect I am doing a linear scan of the list each time I call it, and that's really gonna add up over repeat calls if there are a lot of records involved.
It would be great if I could identify unique-valued columns and have a hashtable generated and cached after the first lookup, but I suspect this is not gonna be part of the offering.
This is a bit of a dealbreaker for me regarding using LINQ.
Note (I'll repeat it again) that I'm NOT pulling data from a database, it's already in memory and I'm querying it there, so I'm only interested in looking up the in-memory IQueryable<T>.

IQueryable represents a collection in a data-store, so you probably don't have the collections in memory. If you explicitly want in-memory collections, then I would go back to your dictionaries. Remember, this doesn't prevent you from using LINQ queries over the data.
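A minimal sketch of that hybrid approach (reusing the question's SubSonic types; the IsNumeric column is invented for illustration): load the data into a plain List<T> once at startup, build a Dictionary for the hot unique-key lookup, and keep LINQ-to-Objects available over the same list for everything else.

// Everything stays in memory; libm_ColType.All() comes from the question's
// SubSonic-generated classes, and IsNumeric is a made-up column.
public static class AllData
{
    // Loaded once at startup.
    public static readonly List<libm_ColType> ColTypes = libm_ColType.All().ToList();

    // O(1) keyed lookups for the unique ColTypeStr column, built from the same list.
    public static readonly Dictionary<string, libm_ColType> ColTypeByColTypeStr =
        ColTypes.ToDictionary(x => x.ColTypeStr);
}

// Fast lookup by key:
// libm_ColType ct = AllData.ColTypeByColTypeStr[this.DefaultJetColTypeStr];

// Ad-hoc LINQ-to-Objects query over the same data (a linear scan, but still available):
// var numericTypes = AllData.ColTypes.Where(x => x.IsNumeric).ToList();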

What is the benefit of using IQueryable and LINQ queries?

I have a project with its own configuration classes:
IconSizesConfigSection: ConfigurationSection
IconSizesCollection: ConfigurationElementCollection
IconSize: ConfigurationElement
The Config class has this property:
public IQueryable<IconSize> IconSizes
{
    get
    {
        IconSizesConfigSection configInfo = (IconSizesConfigSection)ConfigurationManager.GetSection("iconConfig");
        return configInfo.IconSizes.OfType<IconSize>().AsQueryable<IconSize>();
    }
}
The IconSizes property on the config section returns an IconSizesCollection, which derives from ConfigurationElementCollection; ConfigurationElementCollection in turn implements ICollection and IEnumerable.
In another class I have this code:
var previewIconSize = Config.IconSizes.FirstOrDefault(c => c.Name == "AvatarSize");
Why is deferred execution used in this case?
Why does it first call AsQueryable<IconSize>() on the collection and then use LINQ with deferred execution?
Are there any benefits compared with using a simple List?
In this case, there is no practical benefit. Using IQueryable is helpful for cases where query rewriting/translation will optimize performance. You will actually incur decreased performance in the provided example.
One example of using IQueryable in a helpful way is the significant performance increase gained when lazily translating and evaluating queries against a database or web service. This will perform significantly better than the alternative of pulling massive result sets and applying query logic in active memory with a "simple List".
The way you can tell that using the IQueryable in your case is detrimental is that the collection is already loaded into memory, when you begin the query.
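As an aside, if the section is read often, one option is to materialize it once and hand out a plain in-memory sequence. A rough sketch, assuming the config does not change at runtime (the caching field is invented; the section name comes from the question):

private static List<IconSize> _iconSizes;

public static IEnumerable<IconSize> IconSizes
{
    get
    {
        if (_iconSizes == null)
        {
            var configInfo = (IconSizesConfigSection)ConfigurationManager.GetSection("iconConfig");
            _iconSizes = configInfo.IconSizes.OfType<IconSize>().ToList(); // hit the config system once
        }
        return _iconSizes; // already in memory; LINQ-to-Objects still works over it
    }
}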
Both IEnumerable and IQueryable use deferred execution. The difference is that IQueryable is used to cross boundaries like database queries, entity framework queries or OData queries.
When an IQueryable is iterated over, the query is translated to the remote provider's idiom and executed there. When the response is received from the remote provider, it is translated to a local object representation.
Deferred Execution is good because your user may never use the result set and hence there would have been no point querying the data source.
There may be some LINQ methods your user can't use unless they cast the result to IQueryable which means you might restrict what they can do, or force them to cast/copy the list into something more useful.
If you use a List, then you're hard-coding your solution to a List. Do you care what the implementation of the collection is? Does your user? Probably not, as long as it supports the necessary interfaces.

java customize a hashmap values

I am working on using a real time application in java, I have a data structure that looks like this.
HashMap<Integer, Object> myMap;
Now, this works really well for storing the data that I need, but it kills me on getting data out. The underlying problem I run into is that if I call
Collection<Object> myObjects = myMap.values();
Iterator<Object> it = myObjects.iterator();
while (it.hasNext()) { Object o = it.next(); }
I declare the iterator and collection as variables in my class and assign them each iteration, but iterating over the collection is very slow. This is a real time application, so I need to iterate at least 25x per second.
Looking at the profiler I see that there is a new instance of the iterator being created every update.
I was thinking of a few possible ways of changing the hashmap to fix my problems:
1. Cache the iterator somehow, although I'm not sure if that's possible.
2. Change the return type of hashmap.values() to return a List instead of a Collection.
3. Use a different data structure, but I don't know what I could use.
If this is still open, use Google Guava collections. They have things like Multimap for the structures you are defining. OK, these might not be an exact replacement, but close:
From the website here: https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained
Every experienced Java programmer has, at one point or another, implemented a Map<K, List<V>> or Map<K, Set<V>>, and dealt with the awkwardness of that structure. For example, Map<K, Set<V>> is a typical way to represent an unlabeled directed graph. Guava's Multimap framework makes it easy to handle a mapping from keys to multiple values. A Multimap is a general way to associate keys with arbitrarily many values.

Is there an implementation of IQueryable over DbDataReader?

I have a lot of existing code which uses raw ADO.NET (DbConnection, DbDataReader, etc). I would like to transition to using LINQ to SQL for new code, but for now put both the existing and new code behind a unified set of Repository classes.
One issue I have is this: I would like the Repository classes to expose result sets as IQueryable<> which I get for free with LINQ to SQL. How do I wrap my existing DbDataReader result sets in an IQueryable? Do I have to implement IQueryable over DbDataReader from scratch?
Note I am aware of LINQ to DataSet, but I don't use DataSets because of memory scale issues, as the result sets I deal with can be quite large (order of 1000s). This implies that the IQueryable over DbDataReader implementation will need to be efficient as well (i.e. don't cache results in memory).
I can't see any benefit in implementing IQueryable<T> - that suggests more functionality than is actually available - however, you could implement it as an IEnumerable<T> easily enough, with the caveat that it is once-only. An iterator block would be a reasonable choice:
public static IEnumerable<IDataRecord> AsEnumerable(
    this IDataReader reader)
{
    while (reader.Read())
    {
        yield return reader; // a bit dangerous
    }
}
The "a bit dangerous" is because the caller could cast it back and abuse it...

Do you ToList()?

Do you have a default type that you prefer to use in your dealings with the results of LINQ queries?
By default LINQ will return an IEnumerable<> or maybe an IOrderedEnumerable<>. We have found that a List<> is generally more useful to us, so have adopted a habit of ToList()ing our queries most of the time, and certainly using List<> in our function arguments and return values.
The only exception to this has been in LINQ to SQL where calling .ToList() would enumerate the IEnumerable prematurely.
We are also using WCF extensively, the default collection type of which is System.Array. We always change this to System.Collections.Generic.List in the Service Reference Settings dialog in VS2008 for consistency with the rest of our codebase.
What do you do?
ToList always evaluates the sequence immediately - not just in LINQ to SQL. If you want that, that's fine - but it's not always appropriate.
Personally I would try to avoid declaring that you return List<T> directly - usually IList<T> is more appropriate, and allows you to change to a different implementation later on. Of course, there are some operations which are only specified on List<T> itself... this sort of decision is always tricky.
EDIT: (I would have put this in a comment, but it would be too bulky.) Deferred execution allows you to deal with data sources which are too big to fit in memory. For instance, if you're processing log files - transforming them from one format to another, uploading them into a database, working out some stats, or something like that - you may very well be able to handle arbitrary amounts of data by streaming it, but you really don't want to suck everything into memory. This may not be a concern for your particular application, but it's something to bear in mind.
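A small sketch of that streaming idea (the file name and "ERROR" marker are invented): an iterator block lets you walk a log file line by line without ever holding the whole file in memory, and a ToList() in the wrong place would undo that.

// Assumes: using System.Collections.Generic; using System.IO; using System.Linq;
static IEnumerable<string> ReadLines(string path)
{
    using (var reader = new StreamReader(path))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line; // only one line is held in memory at a time
        }
    }
}

// Streams through the file; never buffers the whole thing.
int errors = ReadLines("huge.log").Count(line => line.Contains("ERROR"));

// By contrast, adding ToList() here would pull the entire file into memory first:
// var allLines = ReadLines("huge.log").ToList();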
We have the same scenario - WCF communications to a server, the server uses LINQtoSQL.
We use .ToArray() when requesting objects from the server, because it's "illegal" for the client to change the list. (Meaning, there is no purpose to support ".Add", ".Remove", etc).
While still on the server, however, I would recommend that you leave it as its default (which is not IEnumerable, but rather IQueryable). This way, if you want to filter even more based on some criteria, the filtering is STILL on the SQL side until evaluated.
This is a very important point as it means incredible performance gains or losses depending on what you do.
EXAMPLE:
// This is just an example... imagine this is on the server only. It's the
// basic method that gets the list of clients.
private IEnumerable<Client> GetClients()
{
    var result = MyDataContext.Clients;
    return result.AsEnumerable();
}

// This method here is actually called by the user...
public Client[] GetClientsForLoggedInUser()
{
    var clients = GetClients().Where(client => client.Owner == currentUser);
    return clients.ToArray();
}
Do you see what's happening there? The "GetClients" method is going to force a download of ALL 'clients' from the database... THEN the Where clause will happen in the GetClientsForLoggedInUser method to filter it down.
Now, notice the slight change:
private IQueryable<Client> GetClients()
{
    var result = MyDataContext.Clients;
    return result.AsQueryable();
}
Now, the actual evaluation won't happen until ".ToArray" is called... and SQL will do the filtering. MUCH better!
In the Linq-to-Objects case, returning List<T> from a function isn't as nice as returning IList<T>, as THE VENERABLE SKEET points out. But often you can still do better than that. If the thing you are returning ought to be immutable, IList is a bad choice because it invites the caller to add or remove things.
For example, sometimes you have a method or property that returns the result of a Linq query or uses yield return to lazily generate a list, and then you realise that it would be better to do that the first time you're called, cache the result in a List<T> and return the cached version thereafter. That's when returning IList may be a bad idea, because the caller may modify the list for their own purposes, which will then corrupt your cache, making their changes visible to all other callers.
Better to return IEnumerable<T>, so all they have is forward iteration. And if the caller wants rapid random access, i.e. they wish they could use [] to access by index, they can use ElementAt, which Linq defines so that it quietly sniffs for IList and uses that if available, and if not it does the dumb linear lookup.
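A rough sketch of that cache-and-hide pattern (the names are hypothetical; LoadActiveCustomers() stands in for whatever expensive query or yield-return pipeline produced the data in the first place):

private List<Customer> _cache;

public IEnumerable<Customer> ActiveCustomers
{
    get
    {
        if (_cache == null)
        {
            _cache = LoadActiveCustomers().ToList(); // computed once, cached
        }
        return _cache; // exposed as IEnumerable<Customer>, so callers only get forward iteration
    }
}

// A caller that wants indexed access can still use ElementAt; it checks for
// IList<T> under the covers, so against the cached List<T> this stays O(1):
// var third = repository.ActiveCustomers.ElementAt(2);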
One thing I've used ToList for is when I've got a complex system of Linq expressions mixed with custom operators that use yield return to filter or transform lists. Stepping through in the debugger can get mighty confusing as it jumps around doing lazy evaluation, so I sometimes temporarily add a ToList() to a few places so that I can more easily follow the execution path. (Although if the things you are executing have side-effects, this can change the meaning of the program.)
It depends if you need to modify the collection. I like to use an Array when I know that no one is going to add/delete items. I use a list when I need to sort/add/delete items. But, usually I just leave it as IEnumerable as long as I can.
If you don't need the added features of List<>, why not just stick with IQueryable<> ?!?!?! Lowest common denominator is the best solution (especially when you see Timothy's answer).

How to work around a potential performance issue when using a Grails hasMany relation?

Given the following domain classes:
class Post {
SortedSet tags
static hasMany = [tags: Tag]
}
class Tag {
static belongsTo = Post
static hasMany = [posts: Post]
}
From my understanding so far, using a hasMany will result in hibernate Set mapping.
However, in order to maintain uniqueness/order, Hibernate needs to load the entire set from the database and compare their hashes.
This could lead to a significant performance problem with adding and deleting posts/tags
if their sets get large. What is the best way to work around this issue?
There is no order ensured by Hibernate/GORM in the default mapping. Therefore, it doesn't have to load elements from the database in order to do the sorting. You will have your hands on a bunch of ids, but that's the extent of it.
See 19.5.2:
http://www.hibernate.org/hib_docs/reference/en/html/performance-collections.html
In general, Hibernate/GORM is going to have better performance than you expect. Unless and until you can actually prove a real-world performance issue, trust in the framework and don't worry about it.
The ordering of the set is guaranteed by the Set implementation, ie, the SortedSet. Unless you use a List, which keeps track of indexes on the db, the ordering is server-side only.
If your domain class is in a SortedSet, you have to implement Comparable in order to enable the proper sorting of the set.
The question of performance is not really a question per se. If you want to access a single Tag, you should get it by its Id. If you want the sorted tags, well, the sort only makes sense if you are looking at all Tags, not a particular one, so you end up retrieving all Tags at once. Since the sorting is performed server-side and not db-side, there is really not much difference between a SortedSet and a regular HashSet in regards to Db.
The Grails docs seem to have been updated:
http://grails.org/doc/1.0.x/
In section 5.2.4 they discuss the potential performance issues for the collection types.
Here's the relevant section:
A Note on Collection Types and Performance
The Java Set type is a collection that doesn't allow duplicates. In order to ensure uniqueness when adding an entry to a Set association, Hibernate has to load the entire association from the database. If you have a large number of entries in the association this can be costly in terms of performance.
The same behavior is required for List types, since Hibernate needs to load the entire association in order to maintain order. Therefore it is recommended that if you anticipate a large number of records in the association, you make the association bidirectional so that the link can be created on the inverse side. For example, consider the following code:
def book = new Book(title:"New Grails Book")
def author = Author.get(1)
book.author = author
book.save()
In this example the association link is being created by the child (Book) and hence it is not necessary to manipulate the collection directly resulting in fewer queries and more efficient code. Given an Author with a large number of associated Book instances if you were to write code like the following you would see an impact on performance:
def book = new Book(title:"New Grails Book")
def author = Author.get(1)
author.addToBooks(book)
author.save()
