Linq Pivot Query on Column with Unknown Values - linq

Say I have the following class:
public class Sightings
{
public string CommonName { get; set; }
public string ScientificName { get; set; }
public string TimePeriod { get; set; }
public bool Seen { get; set; }
}
I would like to create a pivot query on TimePeriod. If I knew what the values in TimePeriod were ahead of time I would know how to write the query. However, I don't know. They could be years (e.g., 2007, 2008, 2009 etc.) or they could be months (e.g., Jan-2004, Feb-2004, March-2004) or they could be quarters (e.g., Q1-2004, Q2-2005, Q3-2004 etc.). It really depends on how the user wants to view the data.
Is it possible to write a Linq query that can do this, when the values in the pivot column are not known?
I've considered writing the values to an Excel sheet, creating a pivot sheet and then reading the values out of the sheet. I may have to do that if a Linq query won't work.

Do you know the most granular values that the user can choose?
If for example it's months, then you can simply write a query for months (see Is it possible to Pivot data using LINQ?).
If the user then chooses years, it should be simple to aggregate those values to years.
If they want quarters, then you can aggregate to quarters and so on.
It's maybe not the best/most generic solution, but it should get the job done quite easily.
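If the set of TimePeriod values is only known at run time, one option is to read the distinct values from the data and use them as dictionary keys instead of fixed properties. The following is a minimal LINQ to Objects sketch, not a definitive implementation. PivotRow and SightingPivot are hypothetical names introduced here, and treating a missing record as "not seen" is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Sightings
{
    public string CommonName { get; set; }
    public string ScientificName { get; set; }
    public string TimePeriod { get; set; }
    public bool Seen { get; set; }
}

// Hypothetical result type: one row per species,
// one dictionary entry per TimePeriod value found in the data.
public class PivotRow
{
    public string CommonName { get; set; }
    public Dictionary<string, bool> SeenByPeriod { get; set; }
}

public static class SightingPivot
{
    // The distinct TimePeriod values become the pivot "columns".
    // Ordinal string sorting is a placeholder; real periods may need a custom order.
    public static List<string> Columns(IEnumerable<Sightings> sightings)
    {
        return sightings.Select(s => s.TimePeriod)
                        .Distinct()
                        .OrderBy(p => p, StringComparer.Ordinal)
                        .ToList();
    }

    public static List<PivotRow> Pivot(IEnumerable<Sightings> sightings)
    {
        var data = sightings.ToList();
        var columns = Columns(data);
        return data
            .GroupBy(s => s.CommonName)
            .Select(g => new PivotRow
            {
                CommonName = g.Key,
                // Assumption: a period with no record counts as "not seen".
                SeenByPeriod = columns.ToDictionary(
                    c => c,
                    c => g.Any(s => s.TimePeriod == c && s.Seen))
            })
            .ToList();
    }
}
```

Each row can then be rendered by iterating over Columns(...) and looking up row.SeenByPeriod[column], so the same code works whether the periods are years, months or quarters.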

Related

Return subset of Redis Values matching specific property?

Let's say I have this kind of model:
public class MyModel
{
public long ID { get; set; }
public long ParentModelID { get; set; }
public long ReferenceID1 { get; set; }
public long ReferenceID2 { get; set; }
}
There are more attributes, but for examples sake, it is just this. There are around 5000 - 10000 rows of this model. Currently storing it in a Redis Set.
Is there an efficient way in REDIS to query only a subset of the whole Data Set? For example, in LINQ I can do:
allModels.Where(m => m.ParentModelID == my_id);
or
allModels.Where(m => m.ReferenceID1 == my_referenceid);
Basically, I want to search the dataset without returning all of it and running the LINQ queries against that, because querying and returning 10,000 rows to get only 100 is not efficient.
You can use an OHM (Object-Hash Mapper, like ORM) in your favorite language to achieve the LINQ-like behavior. There are quite a few listed under the "Higher level libraries and tools" section of the [Redis Clients page](https://redis.io/clients).
Alternatively, you can implement it yourself using the patterns described at https://redis.io/topics/indexes.
You can't use something like LINQ in Redis out of the box. Redis is just a key-value store, so it doesn't have the same principles or luxuries as a relational database. It doesn't have queries or relations, so something like LINQ just doesn't translate at all.
As a workaround, you could segment your data using different keys. Each key could reference a set that stores values with a specific range of reference Ids. That way you wouldn't need to retrieve all 10,000 items.
I would also recommend looking at hashes. Depending on your use case, they might be more appropriate than a set, as they're better at storing complex data objects.
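The "segment your data using different keys" idea can be sketched as one set of IDs per indexed value. The code below is an in-memory stand-in for Redis, for illustration only: the dictionaries play the role of Redis hashes and sets, the key names (mymodel:parent:..., mymodel:ref1:...) are made up for this example, and the Redis commands named in the comments (HSET, SADD, SMEMBERS) indicate what a real client would issue instead.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MyModel
{
    public long ID { get; set; }
    public long ParentModelID { get; set; }
    public long ReferenceID1 { get; set; }
    public long ReferenceID2 { get; set; }
}

// In-memory emulation of the index-set pattern; not a Redis client.
public class ModelIndex
{
    // Stands in for one Redis hash per model (HSET mymodel:{ID} ...).
    private readonly Dictionary<long, MyModel> _byId = new Dictionary<long, MyModel>();
    // Stands in for Redis sets holding the IDs that share an indexed value.
    private readonly Dictionary<string, HashSet<long>> _sets = new Dictionary<string, HashSet<long>>();

    public void Save(MyModel m)
    {
        _byId[m.ID] = m;
        AddToSet("mymodel:parent:" + m.ParentModelID, m.ID); // SADD
        AddToSet("mymodel:ref1:" + m.ReferenceID1, m.ID);    // SADD
    }

    public List<MyModel> ByParent(long parentId)
    {
        return Lookup("mymodel:parent:" + parentId);
    }

    public List<MyModel> ByReference1(long referenceId)
    {
        return Lookup("mymodel:ref1:" + referenceId);
    }

    private void AddToSet(string key, long id)
    {
        HashSet<long> ids;
        if (!_sets.TryGetValue(key, out ids))
        {
            ids = new HashSet<long>();
            _sets[key] = ids;
        }
        ids.Add(id);
    }

    // SMEMBERS on the index key, then one fetch per returned ID.
    private List<MyModel> Lookup(string key)
    {
        HashSet<long> ids;
        if (!_sets.TryGetValue(key, out ids)) return new List<MyModel>();
        return ids.Select(id => _byId[id]).ToList();
    }
}
```

Only the IDs in the matching set are read, so a lookup touches about 100 items rather than 10,000. The sketch does not remove stale index entries when a model's ParentModelID or ReferenceID1 changes; a real implementation would have to.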

How can I speedup IEnumerable<T> database access

I am using EF6 code first to execute a query that pulls a large amount of data for parallelized GPU processing. The linq query returns an IEnumerable.
IEnumerable<DatabaseObject> results = ( from items in _myContext.DbSet
select items).Include("Table1").Include("Table2");
Now, I need to perform some statistical analysis on the complete set of data, and present the result to the user.
Unfortunately, because of the sheer size of the returned data, just doing a
results.ToList() is taking an extremely long time to complete... and I haven't even begun the parallelized processing of the data as yet!
Is there anything that I can do to make this more efficient other than reducing the amount of data being pulled? That is not an option, since it is the complete set of data that needs to be processed.
EDIT 1
My current code first is as follows:
public class Orders
{
[Key]
public virtual DateTime ServerTimeId
{
get;
set;
}
public string Seller
{
get;
set;
}
public decimal Price
{
get;
set;
}
public decimal Volume
{
get;
set;
}
public List<Table1> Tables1{ get; set; }
public List<Table2> Tables2{ get; set; }
}
Although my query speeds up significantly without .Include, if I do not use .Include("Tables1").Include("Tables2") these fields are null
in the final result for this query:
var result = ( from items in _context.DbOrders
select items ).Include("Tables1").Include("Tables2");
In my DbContext, I have defined:
public DbSet<Orders> DbOrders { get; set; }
If there is a way to force EF6 to populate these tables without the use of .Include, then I'd be very pleased if someone could instruct me.
You can load the main table, DbOrders and the child tables separately into the context:
_myContext.Configuration.ProxyCreationEnabled = false;
_myContext.DbOrders.Load();
_myContext.Table1.Load();
_myContext.Table2.Load();
Now the context is fully charged with the data you need. I hope you won't run into an out of memory exception (because then the whole approach collapses).
Entity Framework executes relationship fixup, which means that it populates the navigation properties Orders.Tables1 and Orders.Tables2.
There are two reasons for disabling proxy creation:
The materialized objects will be as lightweight as possible.
Lazy loading is disabled; otherwise it would be triggered when you access a navigation property.
Now you can continue working with the data by accessing the Local collection:
from entity in _myContext.DbOrders.Local
...
You can try to speed up the process further by unmapping all database fields that you don't need. This makes the SQL result sets smaller and the materialized objects even lighter. To achieve that, you may have to create a dedicated context.

NHibernate Many-To-Many Performance Issue

My application has the following entities (with a many-to-many relationship between Product and Model):
public class TopProduct {
public virtual int Id { get; set; }
public virtual Product Product { get; set; }
public virtual int Order { get; set; }
}
public class Product {
public virtual int Id { get; set; }
public virtual string Name { get; set; }
public virtual IList<Model> Models { get; set; }
}
public class Model {
public virtual string ModelNumber { get; set; }
public virtual IList<Product> Products { get; set; }
}
Note: A product could have 1000s of models.
I need to display a list of TopProducts and the first 5 models (ordered alphabetically) against each one.
For example say I have the following query:
var topProducts = session.Query<TopProduct>()
.Cacheable()
.Fetch(tp => tp.Product).ThenFetchMany(p => p.Models)
.OrderBy(tp => tp.Order)
.ToList();
If I now say:
foreach (var topProduct in topProducts) {
var models = topProduct.Product.Models.Take(5).ToList();
...
}
This executes extremely slowly as it retrieves an item from the second level cache for each model. Since there could be 1000s of models against a product, it would need to retrieve 1000s of items from the cache the second time it is executed.
I have been racking my brain trying to think of a better way of doing this but so far I am out of ideas. Unfortunately my model and database cannot be modified at this stage.
I'd appreciate the help. Thanks
The key to your problem is understanding how entity and query caching work.
Entity caching stores, essentially, the POID of an entity and its property values.
When you want to get/initialize an instance, NH will first check the cache to see if the values are there, in order to avoid a db query.
Query caching, on the other hand, stores a query as the key (to simplify, let's say it's the command text and the parameter values), and a list of entity ids as the value (this is assuming your result is a list of entities, and not a projection)
When NH executes a cacheable query, it will see if the results are cached. If they are, it will load the proxies from those ids. Then, as you use them, it will initialize them one by one, either from the entity cache or from the db.
Collection cache is similar.
Usually, getting many second-level cache hits for those entity loads is a good thing. Unless, of course, you are using a distributed cache located in a separate machine, in which case this is almost as bad as getting them from the db.
If that is the case, I suggest you skip caching the query.

Linq dynamic queries for user search screens

I have a database with a user search screen that is "dynamic": I can add search criteria on the fly, based on the columns available in the view the search runs against, and users can use them immediately. Previously I used nettiers for this database, but now I am writing a new application against it using RIA, Entity Framework 4 and LINQ.
I currently have 2 tables that are used for this, one that fills the combobox with the available search string patterns:
LastName
LastName, FirstName
Phone
etc....
Then I have another table that splits those criteria out and is used in my nettiers algorithms. It works well, but I want to use LINQ, and it doesn't fit this model very well. Besides, I think I can pare it down to just one table with LINQ,
using a format similar to this or something very close:
ID Criteria WhereClause
1 LastName LastName Like '%{0}%'
Now, I know this won't fit directly into a LINQ query, but I am trying to use a universal syntax for clarity here.
the real where clause would look something like this: a=>a.LastName.Contains("{0}")
My first question is: Is that even possible to do? Feed a lambda in to a string and use it in a Linq Query?
My second question is: at one point when I was researching this before I found a linq syntax that had a prefix like it.LastName{0}
and I appear to have tried using it because vestiges of it are still in my test databases, but I don't recall where I read about it.
Is anyone doing this? I have done some searches and found similar occurrences, but they mostly have static fields that are optional, which is not exactly the way I am doing it.
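As for your first question, you can do this using Dynamic LINQ, as described by Scott Gu. Dynamic LINQ takes string methods rather than SQL's LIKE, so a prefix match looks like this:
var query = Northwind.Products.Where("LastName.StartsWith(@0)", "test");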
I'm not sure how detailed your dynamic query needs to be, but when I need to do dynamic queries, I create a class to represent filter values. Then I pass that class to a search method on my repository. If the value for a field is null then the query ignores it. If it has a value it adds the appropriate filter.
public class CustomerSearchCriteria{
public string LastName { get; set; }
public string FirstName { get; set; }
public string Phone { get; set; }
}
public IEnumerable<Customer> Search(CustomerSearchCriteria criteria){
// Start from the full set; each non-null criterion narrows the query.
IQueryable<Customer> q = db.Customers;
if(criteria.FirstName != null){
q = q.Where(c=>c.FirstName.Contains(criteria.FirstName));
}
if(criteria.LastName != null){
q = q.Where(c=>c.LastName.Contains(criteria.LastName));
}
if(criteria.Phone != null){
q = q.Where(c=>c.Phone.Contains(criteria.Phone));
}
// Nothing is executed until the caller enumerates the result.
return q.AsEnumerable();
}
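When the searchable column is only known at run time (as with the Criteria table in the question), the filter can also be built as an expression tree from the property name, without a string-parsing library. This is a minimal sketch under assumptions: DynamicFilter is a hypothetical helper name, the Customer class is a stand-in for the real entity, and only string properties with a Contains match are handled.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public class Customer
{
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Phone { get; set; }
}

public static class DynamicFilter
{
    // Builds the equivalent of c => c.<propertyName>.Contains(value)
    // for a string property whose name is supplied at run time.
    public static Expression<Func<T, bool>> Contains<T>(string propertyName, string value)
    {
        ParameterExpression parameter = Expression.Parameter(typeof(T), "c");
        // Throws ArgumentException if T has no property with that name.
        MemberExpression property = Expression.Property(parameter, propertyName);
        var containsMethod = typeof(string).GetMethod("Contains", new[] { typeof(string) });
        MethodCallExpression call = Expression.Call(
            property, containsMethod, Expression.Constant(value, typeof(string)));
        return Expression.Lambda<Func<T, bool>>(call, parameter);
    }
}
```

Usage would look like customers.Where(DynamicFilter.Contains<Customer>("LastName", "smi")) on any IQueryable<Customer>. Because the result is an expression rather than a compiled delegate, a query provider can in principle translate it to SQL. Against in-memory data, a null property value would throw, so a null check may be needed there.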

LINQ self referencing query

I have the following SQL query:
select
p1.[id],
p1.[useraccountid],
p1.[subject],
p1.[message],
p1.[views],
p1.[parentid],
max(
case
when p2.[created] is null then p1.[created]
else p2.[created]
end
) as LastUpdate
from forumposts p1
left join
(
select
id, parentid, created
from
forumposts
) p2 on p2.parentid = p1.id
where
p1.[parentid] is null
group by
p1.[id],
p1.[useraccountid],
p1.[subject],
p1.[message],
p1.[views],
p1.[parentid]
order by LastUpdate desc
Using the following class:
public class ForumPost : PersistedObject
{
public int Views { get; set; }
public string Message { get; set; }
public string Subject { get; set; }
public ForumPost Parent { get; set; }
public UserAccount UserAccount { get; set; }
public IList<ForumPost> Replies { get; set; }
}
How would I replicate such a query in LINQ? I've tried several variations, but I seem unable to get the correct join syntax. Is this simply a case of a query that is too complicated for LINQ? Can it be done using nested queries some how?
The purpose of the query is to find the most recently updated posts i.e. replying to a post would bump it to the top of the list. Replies are defined by the ParentID column, which is self-referencing.
The syntax of a left join in LINQ is as follows
(I have written it in VB.NET):
Dim query = From table1 In myTable.AsEnumerable() ' can be any collection of your objects
Group Join table2 In myOtherTable.AsEnumerable()
On table1.Field(Of Type)("myfield") Equals table2.Field(Of Type)("myfield")
Into temp = Group
From table2 In temp.DefaultIfEmpty()
Where table1.Field(Of Type)("Myanotherfield") Is Nothing ' example
Select New With { .firstField = table1.Field(Of Type)("Myanotherfield"),
.secondField = If(table2 Is Nothing, Nothing, table2.Field(Of Type)("Myanotherfield2"))}
Something like that.
I discovered that NHibernate LINQ support doesn't include joins. Given that, and my own inexperience with complex LINQ queries, I resorted to the following workaround:
Add a Modified column to the posts table.
On reply, update parent's Modified column to match reply's Created column
Sort by and retrieve the value of the Modified column for post display.
I think it's a pretty clean workaround, given the limitations of the code. I dreadfully wanted to avoid adding another entity, referencing a view, or using a stored procedure + data table combination for this particular piece of code only. I wanted to keep everything within the entities and use NHibernate only, and this fix allows that with minimal code smell.
Leaving this here to mark as answer later.
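For comparison, the intent of the original SQL (top-level posts ordered by their latest reply, falling back to the post's own creation date) can be expressed without a join by going through the Replies collection. This is a LINQ to Objects sketch only; whether a given NHibernate LINQ provider can translate it has not been verified here. The Created property is assumed (the question's class presumably inherits it from PersistedObject), and ThreadSummary and ForumQueries are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ForumPost
{
    public ForumPost() { Replies = new List<ForumPost>(); }

    public string Subject { get; set; }
    public DateTime Created { get; set; }   // assumed property, not shown in the question's class
    public ForumPost Parent { get; set; }
    public IList<ForumPost> Replies { get; set; }
}

// Hypothetical result type pairing a post with its computed LastUpdate.
public class ThreadSummary
{
    public ForumPost Post { get; set; }
    public DateTime LastUpdate { get; set; }
}

public static class ForumQueries
{
    public static List<ThreadSummary> RecentlyUpdated(IEnumerable<ForumPost> posts)
    {
        return posts
            .Where(p => p.Parent == null)            // where p1.parentid is null
            .Select(p => new ThreadSummary
            {
                Post = p,
                // Latest reply date; the post's own date is used only when there are no replies.
                LastUpdate = p.Replies.Select(r => r.Created)
                                      .DefaultIfEmpty(p.Created)
                                      .Max()
            })
            .OrderByDescending(t => t.LastUpdate)    // order by LastUpdate desc
            .ToList();
    }
}
```

DefaultIfEmpty(p.Created) plays the role of the SQL case expression: it supplies the post's own date only when the reply sequence is empty.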
