I'm pretty new to LINQ, but I'm trying to find a fast way to take a set of data and pull out only rows where particular columns have duplicates in other rows. E.g. in a set of people, pull out only people who share a phone number with another person. Here's a breakdown of what I'm up to:
public class Person
{
    public int Id { get; set; }
    public string AddressLine1 { get; set; }
    public string City { get; set; }
    public string PostalCode { get; set; }
    public int Province { get; set; }
    public string Name { get; set; }
    public string Phone { get; set; }
    public string Email { get; set; }
    public string Fax { get; set; }
    public string Web { get; set; }
}
And then I want to sort them in different ways to look for possible duplicates in my input values, so if I want to find where Address Line 1 and Postal Code match, I can sort it like so:
IOrderedEnumerable<Person> sortedPeople = people.OrderBy(x => x.AddressLine1).ThenByDescending(x => x.PostalCode);
And then just go through and bundle any matches together, but there will be a lot of data that doesn't match anything, so if I can cull it out in the first place, it could potentially save a lot of time.
I have a suspicion it will end up costing me more time than it would save, but I figured I'd ask if there's an efficient way.
Sorting on its own won't find duplicates; it only places potential matches next to each other. If you want to find duplicates, you need to make groups of Persons that share something, for instance Persons that have the same PostalCode, or that live in the same City.
Making groups of items that share something is done with one of the overloads of Enumerable.GroupBy. The most important parameter of GroupBy is keySelector. With this parameter you say which value should be common to all Persons in a group.
The following will give you a sequence of groups of Persons. All Persons in a group have the same City. The group is identified by its Key, which holds the common value. So you get one group with all Parisians, with key "Paris", another group with all Amsterdammers, with key "Amsterdam", and so on.
var result = persons.GroupBy(person => person.City);
However, you don't want to keep all groups, you only want to keep groups that have more than one member: those are the groups that have duplicates.
var duplicateCitizens = persons.GroupBy(person => person.City)
    .Where(group => group.Skip(1).Any());
In words: first make groups of persons that live in the same City. Then from every group, keep only those groups that have more than one element (= if you skip one element, there are still elements left).
I use the Skip(1).Any() method for efficiency reasons. If you already know after the 1st element that there are duplicates, why continue counting all hundred elements?
The result is a sequence of Groups of more than one Person. All Persons in the group live in the same city. The Key of the group is the City that they have in common.
You can group the results by the phone number:
var query = people.GroupBy(p => p.Phone)
    .Where(g => g.Count() > 1)
    .ToList();
After this you have a list of groups. Each group has a phone number as its key and contains the persons who share that number; numbers assigned to only one person are filtered out.
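To tie the two answers back to the address case in the question, here is a minimal sketch. It assumes the data is already in memory, uses named tuples as a stand-in for the Person class, and the sample rows are invented. Grouping on an anonymous type compares AddressLine1 and PostalCode together, and SelectMany flattens the surviving groups back into individual rows.

```csharp
using System;
using System.Linq;

// Invented sample rows; tuples stand in for the Person class above.
var people = new[]
{
    (Id: 1, AddressLine1: "1 Main St", PostalCode: "A1A 1A1"),
    (Id: 2, AddressLine1: "1 Main St", PostalCode: "A1A 1A1"),
    (Id: 3, AddressLine1: "9 Elm Ave", PostalCode: "B2B 2B2"),
    (Id: 4, AddressLine1: "1 Main St", PostalCode: "C3C 3C3"),
};

// Group on both columns at once; anonymous types compare by value.
var duplicateGroups = people
    .GroupBy(p => new { p.AddressLine1, p.PostalCode })
    .Where(g => g.Skip(1).Any())   // keep groups with at least two members
    .ToList();

// Flatten back to rows if only the matching people are needed.
var duplicateIds = duplicateGroups.SelectMany(g => g).Select(p => p.Id).ToList();

foreach (var g in duplicateGroups)
    Console.WriteLine($"{g.Key.AddressLine1}, {g.Key.PostalCode}: {string.Join(", ", g.Select(p => p.Id))}");
```

Person 4 shares a street address with persons 1 and 2 but not a postal code, so only persons 1 and 2 end up in a group. No sorting is needed beforehand.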
Say I have the following class:
public class Sightings
{
    public string CommonName { get; set; }
    public string ScientificName { get; set; }
    public string TimePeriod { get; set; }
    public bool Seen { get; set; }
}
I would like to create a pivot query on TimePeriod. If I knew what the values in TimePeriod were ahead of time I would know how to write the query. However, I don't know. They could be years (e.g., 2007, 2008, 2009 etc.) or they could be months (e.g., Jan-2004, Feb-2004, March-2004) or they could be quarters (e.g., Q1-2004, Q2-2005, Q3-2004 etc.). It really depends on how the user wants to view the data.
Is it possible to write a Linq query that can do this, when the values in the pivot column are not known?
I've considered writing the values to an Excel sheet, creating a pivot sheet and then reading the values out of the sheet. I may have to do that if a Linq query won't work.
Do you know the most granular values that the user can choose?
If, for example, it's months, then you can simply write a query for months (see "Is it possible to Pivot data using LINQ?").
If the user then chooses years, it should be simple to aggregate those values to years.
If they want quarters, then you can aggregate to quarters and so on.
It may not be the best or most generic solution, but it should get the job done quite easily.
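If the data fits in memory, the pivot columns can also be discovered at run time instead of being known in advance. The following is only a sketch under that assumption: the sample rows are invented, named tuples stand in for the Sightings class, and the names periods, rows and SeenByPeriod are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Invented sample rows; tuples stand in for the Sightings class above.
var sightings = new[]
{
    (CommonName: "Robin", TimePeriod: "2007", Seen: true),
    (CommonName: "Robin", TimePeriod: "2008", Seen: false),
    (CommonName: "Wren",  TimePeriod: "2008", Seen: true),
};

// Column headers are taken from the data, not hard-coded.
List<string> periods = sightings.Select(s => s.TimePeriod).Distinct().OrderBy(p => p).ToList();

// One row per species; each row maps a period to whether the species was seen.
var rows = sightings
    .GroupBy(s => s.CommonName)
    .Select(g => new
    {
        CommonName = g.Key,
        SeenByPeriod = periods.ToDictionary(
            period => period,
            period => g.Any(s => s.TimePeriod == period && s.Seen))
    })
    .ToList();

Console.WriteLine("Name\t" + string.Join("\t", periods));
foreach (var row in rows)
    Console.WriteLine(row.CommonName + "\t" + string.Join("\t", periods.Select(p => row.SeenByPeriod[p])));
```

Because every row's dictionary is built from the same periods list, a species with no record for a period simply gets false in that column. The same code works whether TimePeriod holds years, months or quarters.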
My application has the following entities (with a many-to-many relationship between Product and Model):
public class TopProduct {
    public virtual int Id { get; set; }
    public virtual Product Product { get; set; }
    public virtual int Order { get; set; }
}
public class Product {
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual IList<Model> Models { get; set; }
}
public class Model {
    public virtual string ModelNumber { get; set; }
    public virtual IList<Product> Products { get; set; }
}
Note: A product could have 1000s of models.
I need to display a list of TopProducts and the first 5 models (ordered alphabetically) against each one.
For example say I have the following query:
var topProducts = session.Query<TopProduct>()
    .Cacheable()
    .Fetch(tp => tp.Product).ThenFetchMany(p => p.Models)
    .OrderBy(tp => tp.Order)
    .ToList();
If I now say:
foreach (var topProduct in topProducts) {
    var models = topProduct.Product.Models.Take(5).ToList();
    ...
}
This executes extremely slowly as it retrieves an item from the second level cache for each model. Since there could be 1000s of models against a product, it would need to retrieve 1000s of items from the cache the second time it is executed.
I have been racking my brain trying to think of a better way of doing this but so far I am out of ideas. Unfortunately my model and database cannot be modified at this stage.
I'd appreciate the help. Thanks
The key to your problem is understanding how entity and query caching work.
Entity caching stores, essentially, the POID of an entity and its property values.
When you want to get/initialize an instance, NH will first check the cache to see if the values are there, in order to avoid a db query.
Query caching, on the other hand, stores a query as the key (to simplify, let's say it's the command text and the parameter values), and a list of entity ids as the value (this is assuming your result is a list of entities, and not a projection)
When NH executes a cacheable query, it will see if the results are cached. If they are, it will load the proxies from those ids. Then, as you use them, it will initialize them one by one, either from the entity cache or from the db.
Collection cache is similar.
Usually, getting many second-level cache hits for those entity loads is a good thing. Unless, of course, you are using a distributed cache on a separate machine, in which case each hit is a network round trip and is almost as bad as getting the data from the db.
If that is the case, I suggest you skip caching the query.
How would I display data from the following database entity in a format that looks like a table:
public class Attendance
{
    public int AttendanceID { get; set; }
    public int CourseID { get; set; }
    public int StudentID { get; set; }
    public int AttendanceDay { get; set; }
    public bool Present { get; set; }
    public virtual Course Course { get; set; }
    public virtual Student Student { get; set; }
}
I want to find all rows in the Attendance table where CourseID == x, so I would use something like:
AttendanceData = Attendance.Where(s => s.CourseID == x); // I think
Then I need to sort this information in my view so that it is displayed in a way that makes sense. I want a table on screen showing all of the present/not present values, with StudentIDs listed down the left and AttendanceDays listed across the top.
How would I sort and display this information?
UPDATE:
Using the following code (along with Mvc WebGrid) - I can get a grid of some sort to appear in my view.
Controller:
IEnumerable<Attendance> model = db.Attendance.Where(s => s.CourseID == 4);
return View(model);
View:
@model IEnumerable<MyProject.Models.Attendance>
<div>
    @{
        var grid = new WebGrid(Model, defaultSort: "Name");
    }
    @grid.GetHtml()
</div>
However, the grid is not organized in a manner that is useful for my needs.
I want the top of my displayed table to read:
Day 1 | Day 2 | Day 3 | etc until the max value of "Attendance Day" (which is dictated at the creation of a Course that a student signs up for).
I want the left side of the displayed table to read:
Student ID 1
Student ID 5
Student ID 6
Student ID etc . . until all of the students within the data set have been displayed.
I think I need to use something along the lines of this in my controller:
var model = from s in db.Attendance
            where s.CourseID == 4
            group s.AttendanceDay by s.StudentID into t
            select new
            {
                StudentID = t.Key,
                Days = t.OrderBy(x => x)
            };
return View(model);
But I need an IEnumerable<> returned to my view using Mvc WebGrid -- I am getting somewhere, just still a little lost along the way. Can I get a nudge in the right direction?
For a fairly typical set of requirements like this, the ASP.NET WebGrid sounds like a good candidate. It's flexible and allows for paging, sorting, formatting, etc. I've used it for a few projects, and it works just like any of the other HTML helpers that you're probably used to in ASP.NET MVC.
Here's a good starting place.
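Whatever renders the table, the rows first have to be reshaped so that there is one row per student and one cell per day. The following is a rough sketch of that reshaping only, done in memory on rows already filtered to one course. The sample rows are invented, named tuples stand in for the Attendance entity, and the names days, rows and PresentByDay are made up for illustration. It also assumes at most one record per student per day.

```csharp
using System;
using System.Linq;

// Invented sample rows; tuples stand in for the Attendance entity above.
var attendance = new[]
{
    (StudentID: 1, AttendanceDay: 1, Present: true),
    (StudentID: 1, AttendanceDay: 2, Present: false),
    (StudentID: 5, AttendanceDay: 1, Present: true),
    (StudentID: 5, AttendanceDay: 2, Present: true),
};

// Days become the column headers.
var days = attendance.Select(a => a.AttendanceDay).Distinct().OrderBy(d => d).ToList();

// One row per student, with a day -> present lookup for the cells.
// ToDictionary throws if a student has two records for the same day.
var rows = attendance
    .GroupBy(a => a.StudentID)
    .OrderBy(g => g.Key)
    .Select(g => new
    {
        StudentID = g.Key,
        PresentByDay = g.ToDictionary(a => a.AttendanceDay, a => a.Present)
    })
    .ToList();

Console.WriteLine("Student\t" + string.Join("\t", days.Select(d => "Day " + d)));
foreach (var row in rows)
{
    // A missing record for a day is treated as absent here (an assumption).
    var cells = days.Select(d => row.PresentByDay.TryGetValue(d, out var present) && present ? "X" : "-");
    Console.WriteLine(row.StudentID + "\t" + string.Join("\t", cells));
}
```

In an MVC application the anonymous type would probably be replaced by a small view model class, so that the view can be strongly typed and the columns can be generated from the list of days.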
Ok, I am trying to get all subcategories from my database (using query result shaping) that belong to the category that I supply. My class SubCategory includes a List<> of Categories.
The problem is that in the LINQ statement, g refers to a SubCategory (which in turn contains a List<Category>), so g.Name is the subcategory's name rather than the category's. The statement below therefore does not do what I need.
How do I change the LINQ statement so that it generates the correct SQL query and returns all SubCategories that contain the matching Category?
public class SubCategory
{
    public int SubCategoryId { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    public List<Article> Articles { get; set; }
    public List<Category> Categories { get; set; }
}
//incorrect code below:
var SubCategories = storeDB.SubCategories.Include("Categories").Single(g => g.Name == category);
This worked for me (maybe too simple):
var Category = storeDB.Categories.Include("SubCategories").Single(c => c.Name == category);
return Category.SubCategories;
I find it a bit confusing that each SubCategory can belong to more than one Category - have you got this relationship the right way around?
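Regardless, I think it's probably more readable if you change the select to work on Categories first, i.e. something like:
// The second "from" flattens each category's SubCategories collection,
// so the query yields SubCategory items rather than whole collections.
var subCatQuery = from cat in storeDB.Categories
                  where cat.Name == category
                  from sub in cat.SubCategories
                  select sub;
which you can then execute to get your list of SubCategory objects:
var subCategories = subCatQuery.ToList();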
I find that much more readable/understandable.
(I also find the query syntax easier to read here than the fluent style)
My preferred answer would be to use a LINQ join. If there are only two data sets, you can use the Join and GroupJoin operators instead of a from statement. This would be my approach (the type, property and Include names are placeholders):
Dictionary<MainQuery, List<subQuery>> Query =
    dbStore.List<MainQuery>.Include("IncludeAllObjects")
        .GroupJoin(db.List<SubTable>.Include("moreSubQueryTables"),
            mainQueryA => mainQueryA.PropToJoinOn,
            subQueryB => subQueryB.PropToJoinOn,
            (main, sub) => new { main, sub })
        .ToDictionary(x => x.main, x => x.sub.ToList());
I am writing a method that is passed a List<AssetMovements> where AssetMovements looks something like
public class AssetMovements
{
    public string Description { get; set; }
    public List<DateRange> Movements { get; set; }
}
I want to be able to flatten out these objects into a list of all Movements regardless of Description and am trying to figure out the LINQ query I need to do this. I thought that
from l in list select l.Movements
would do it and return IEnumerable<DateRange> but instead it returns IEnumerable<List<DateRange>> and I'm not really sure how to correct this. Any suggestions?
This one's been asked before. You want the SelectMany() method, which flattens out a list of lists. So:
var movements = list.SelectMany(l => l.Movements);
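For completeness, here is a small sketch of the same flattening in both method and query syntax. The sample data is invented, and named tuples stand in for AssetMovements and DateRange (whose definition isn't shown in the question).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Invented sample data; tuples stand in for AssetMovements and DateRange.
var list = new[]
{
    (Description: "Laptop", Movements: new List<(DateTime Start, DateTime End)>
    {
        (new DateTime(2020, 1, 1), new DateTime(2020, 1, 31)),
        (new DateTime(2020, 3, 1), new DateTime(2020, 3, 15)),
    }),
    (Description: "Monitor", Movements: new List<(DateTime Start, DateTime End)>
    {
        (new DateTime(2020, 2, 1), new DateTime(2020, 2, 10)),
    }),
};

// Method syntax: one flat sequence of movements.
var movements = list.SelectMany(l => l.Movements).ToList();

// Query syntax: a second "from" clause is translated to SelectMany.
var sameMovements = (from l in list
                     from m in l.Movements
                     select m).ToList();

Console.WriteLine(movements.Count);   // prints 3
```

The query in the question had a single from clause, which is a plain Select and therefore returns one list per asset. Adding the second from is what flattens the result.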