Trouble with Linq in group by query - linq

Maybe someone knows how to achieve this kind of query in linq (or lambda).
I have this set in a list
Filter: My input will be code 100 and 101, I need to get the "values", in this example = 1, 2.
Problem: If you input 100 and 101, you'll get 3 results, because of 100 from group 1 and group 2. I just need the pair that matches in the same group. (And you don't have group as an input param.)
How can I solve this if the group fully exists?
thanks!

Starting with a simple representation in code of what you have in a picture:
var list = new[]
{
new{code = 100, value = 1, group = 1},
new{code = 101, value = 2, group = 1},
new{code = 100, value = 3, group = 2},
new{code = 103, value = 4, group = 2},
};
var inp = new[]{100, 103};
Then we can do:
list
.GroupBy(el => el.group) // Group by the "group" field.
.Where(grp => !inp.Except(grp.Select(el => el.code)).Any()) // Exclude groups that don't contain all input values
.Single() // Obtain the only such group (with a check that there is only one)
.Select(el => el.value); // Obtain the "value" fields.
If you could perhaps have inputs that were a subset of the "code" fields of some groups, you could also check that you match all of the group completely by excluding groups which have a different size:
list
.GroupBy(el => el.group)
.Where(grp =>
grp.Count() == inp.Count()
&& !inp.Except(grp.Select(el => el.code)).Any())
.Single()
.Select(el => el.value);
There are other variations that match other possible interpretations of your question (e.g. I'm assuming there can be only one matching group, but that wasn't clear).
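For the inputs from the question itself (codes 100 and 101), the approach can be tried end to end with something like the sketch below. The data is the same anonymous-type list as above; the names `matches` and `values` are only for illustration, and flattening with SelectMany instead of calling Single() is my own choice so that zero or several matching groups do not throw.

```csharp
using System;
using System.Linq;

var list = new[]
{
    new { code = 100, value = 1, group = 1 },
    new { code = 101, value = 2, group = 1 },
    new { code = 100, value = 3, group = 2 },
    new { code = 103, value = 4, group = 2 },
};
var inp = new[] { 100, 101 };

// Keep only the groups whose codes cover every requested code.
var matches = list
    .GroupBy(el => el.group)
    .Where(grp => !inp.Except(grp.Select(el => el.code)).Any())
    .ToList();

// Flatten the matching groups into their "value" fields.
var values = matches
    .SelectMany(grp => grp.Select(el => el.value))
    .ToArray();

Console.WriteLine(string.Join(", ", values)); // prints "1, 2"
```

Only group 1 contains both 100 and 101, so the 100 from group 2 is not picked up.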

Related

Bar Chart on Dimension-1 and Stacked by Dimension-2

Summary
I want to display a bar chart whose dimension is days and is stacked by a different category (i.e. x-axis = days and stack = category-1). I can do this "manually" in that I can write if-then's to zero or display the quantity, but I'm wondering if there's a systematic way to do this.
JSFiddle https://jsfiddle.net/wostoj/rum53tn2/
Details
I have data with dates, quantities, and other classifiers. For the purpose of this question I can simplify it to this:
data = [
{day: 1, cat: 'a', quantity: 25},
{day: 1, cat: 'b', quantity: 15},
{day: 1, cat: 'b', quantity: 10},
{day: 2, cat: 'a', quantity: 90},
{day: 2, cat: 'a', quantity: 45},
{day: 2, cat: 'b', quantity: 15},
]
I can set up a bar chart, by day, that shows total units and I can manually add the stacks for 'a' and 'b' as follows.
var dayDim = xf.dimension(_ => _.day);
var bar = dc.barChart("#chart");
bar
.dimension(dayDim)
.group(dayDim.group().reduceSum(
_ => _.cat === 'a' ? _.quantity : 0
))
.stack(dayDim.group().reduceSum(
_ => _.cat === 'b' ? _.quantity : 0
));
This is easy when my data has only 2 categories, but I'm wondering how I'd scale this to 10 or an unknown number of categories. I'd imagine the pseudo-code for what I'm trying to do is something like
dc.barChart("#chart")
.dimension(xf.dimension(_ => _.day))
.stackDim(xf.dimension(_ => _.cat))
.stackGroup(xf.dimension(_ => _.cat).group().reduceSum(_ => _.quantity));
I mentioned this in my answer to your other question, but why not expand on it a little bit here.
In the dc.js FAQ there is a standard pattern for custom reductions to reduce more than one value at once.
Say that you have a field named type which determines which type of value is in the row, and the value is in a field named value (in your case these are cat and quantity). Then
var group = dimension.group().reduce(
function(p, v) { // add
p[v.type] = (p[v.type] || 0) + v.value;
return p;
},
function(p, v) { // remove
p[v.type] -= v.value;
return p;
},
function() { // initial
return {};
});
will reduce all the rows for each bin to an object where the keys are the types and the values are the sum of values with that type.
The way this works is that when crossfilter encounters a new key, it first uses the "initial" function to produce a new value. Here that value is an empty object.
Then for each row it encounters which falls into the bin labelled with that key, it calls the "add" function. p is the previous value of the bin, and v is the current row. Since we started with a blank object, we have to make sure we initialize each value; (p[v.type] || 0) will make sure that we start from 0 instead of undefined, because undefined + 1 is NaN and we hate NaNs.
We don't have to be as careful in the "remove" function, because the only way a row will be removed from a bin is if it was once added to it, so there must be a number in p[v.type].
Now that each bin contains an object with all the reduced values, the stack mixin has helpful extra parameters for .group() and .stack() which allow us to specify the name of the group/stack, and the accessor.
For example, if we want to pull items a and b from the objects for our stacks, we can use:
.group(group, 'a', kv => kv.value.a)
.stack(group, 'b', kv => kv.value.b)
It's not as convenient as it could be, but you can use these techniques to add stacks to a chart programmatically (see source).

How to use OrderBy in Linq

I am using OrderBy, and I have figured out that I have to call OrderBy as the last method, or it will not work. The Distinct operator does not guarantee that it will maintain the original order of values, and if I use Include, it cannot sort the child collection.
Is there any reason why I shouldn't always call OrderBy last and not worry about whether order is preserved?
Edit:
In general, is there any reason, such as a performance impact, why I should not use OrderBy last? It doesn't matter whether I use Entity Framework to query a database or just query some collection.
dbContext.EntityFramework.Distinct().OrderBy(o=> o.Something); // this will give me ordered result
dbContext.EntityFramework.OrderBy(o=> o.Something).Distinct(); // this will not, because Distinct doesn't preserve order.
Let's say that I want to Select only one property.
dbContext.EntityFramework.Select(o=> o.Selected).OrderBy(o=> o.Something);
Will ordering be faster if I order the collection after selecting one property? In that case I should use OrderBy last. I am just asking whether there is any situation where ordering shouldn't be done as the last command.
Is there any reason why I shouldn't do OrderBy always last
There may be reasons to use OrderBy not as the last statement. For example, the sort property may not be in the result:
var result = context.Entities
.OrderBy(e => e.Date)
.Select(e => e.Name);
Or you want a sorted collection as part of the result:
var result = context.Customers
.Select(c => new
{
Customer = c,
Orders = c.Orders.OrderBy(o => o.Date),
Address = c.Address
});
Will order be faster if I order collection after one property selection?
Your examples show that you're working with LINQ to Entities, so the statements will be translated into SQL. You will notice that...
context.Entities
.OrderBy(e => e.Name)
.Select(e => e.Name)
... and ...
context.Entities
.Select(e => e.Name)
.OrderBy(s => s)
... will produce exactly the same SQL. So there is no essential difference between both OrderBy positions.
Doesn't matter if I use Entity Framework to query a database or just querying some collection.
Well, that does matter. For example, if you do...
context.Entities
.OrderBy(e => e.Date)
.Select(e => e.Name)
.Distinct()
... you'll notice that the OrderBy is completely ignored by EF and the order of names is unpredictable.
However, if you do ...
context.Entities
.AsEnumerable() // Continue as LINQ to objects
.OrderBy(e => e.Date)
.Select(e => e.Name)
.Distinct()
... you'll see that the sort order is preserved in the distinct result. LINQ to objects clearly has a different strategy than LINQ to Entities. OrderBy at the end of the statement would have made both results equal.
To sum it up, I'd say that as a rule of thumb, try to order as late as possible in a LINQ query. This will produce the most predictable results.
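As a rough illustration of ordering last, here is a small sketch that runs against an in-memory array instead of an EF context. The tuple data and the names `entities` and `names` are made up for the example, and the ordinal comparer is only there to keep the sort independent of culture settings.

```csharp
using System;
using System.Linq;

var entities = new[]
{
    (Name: "Carol", Date: new DateTime(2020, 1, 3)),
    (Name: "Alice", Date: new DateTime(2020, 1, 2)),
    (Name: "Carol", Date: new DateTime(2020, 1, 1)),
    (Name: "Bob",   Date: new DateTime(2020, 1, 4)),
};

// Project and de-duplicate first, then sort the final shape of the result.
var names = entities
    .Select(e => e.Name)
    .Distinct()
    .OrderBy(n => n, StringComparer.Ordinal)
    .ToList();

Console.WriteLine(string.Join(", ", names)); // prints "Alice, Bob, Carol"
```

Because the sort is the last step, the result does not depend on whether Distinct happens to preserve the incoming order.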
I don't know if you misunderstood the meaning of Distinct. According to its definition:
Returns distinct elements from a sequence by using the default equality comparer to compare values.
So if you have a list of int and you want to remove repeated values, you use Distinct. Distinct uses the default equality comparer and checks each element against all the elements it has already returned, not just the adjacent one, so you do not have to sort first to get the expected result.
OrderBy, in turn, does the sorting. On an in-memory list, the combinations behave as follows:
List<int> myNumbers = new List<int>{ 102, 2817, 82, 2, 1, 2, 1, 9, 4 };
Sorting and then removing duplicated numbers
// returns 1, 2, 4, 9, 82, 102, 2817 (LINQ to Objects keeps the sorted order in practice)
var sortedThenUnique = myNumbers.OrderBy(n => n).Distinct();
Removing duplicated numbers and then sorting
// also returns 1, 2, 4, 9, 82, 102, 2817
var uniqueThenSorted = myNumbers.Distinct().OrderBy(n => n);
Just removing duplicated numbers
// returns 102, 2817, 82, 2, 1, 9, 4
var uniques = myNumbers.Distinct();
Just sorting
// returns 1, 1, 2, 2, 4, 9, 82, 102, 2817
var sorted = myNumbers.OrderBy(n => n);
I hope it helps you \o/

LINQ Get Grouped ID by condition

Hi I have a List so:
A 1
A 2
A 3
A 4
B 1
B 2
C 1
I want to select the letter that contains AT LEAST these 3 numbers: 1,2,3
So in this case would be selected the letter A.
Can you help me to write this as LINQ expression?
Thanks a lot!
First, make a collection of the numbers you require.
var required = new[] { 1, 2, 3 };
Then, group your pairings by letter.
var groupedPairings = pairings.GroupBy(p => p.Letter, p => p.Number);
Then, discard those groups that don't have your three required items. (The logic here is "take the collection of required items, remove anything in the group, and make sure there is nothing left".)
var groupsWithRequired = groupedPairings
.Where(g => !required.Except(g).Any());
Now, if you just want the letters, you can simply do
var lettersWithRequired = groupsWithRequired.Select(g => g.Key);
or if you want a dictionary mapping from the letter to its collection of numbers, you can do
var dictionary = groupsWithRequired.ToDictionary(g => g.Key, g => g.ToArray());
var numbersForA = dictionary["A"]; // = {1, 2, 3, 4}
You could try this, although I don't feel it's the best answer:
var items = new List<Item>{
new Item{Name="A", Value=1},
new Item{Name="A", Value=2},
new Item{Name="A", Value=3},
new Item{Name="A", Value=3},
new Item{Name="A", Value=4},
new Item{Name="B", Value=1},
new Item{Name="B", Value=2},
new Item{Name="C", Value=1},
};
var values = new List<int>{1,2,3};
var query = items.GroupBy (i => i.Name)
.Where (i => i.Select (x => x.Value)
.Intersect(values).Count() == values.Count)
.Select (i => i.Key);
Where
class Item{
public string Name{get;set;}
public int Value{get;set;}
}
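Another way to write the same "contains at least these numbers" check is with `HashSet<int>.IsSupersetOf`. This is a substitute technique rather than something taken from the answers above, and the sketch uses tuples in place of the Item class so that it is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var items = new[]
{
    (Name: "A", Value: 1), (Name: "A", Value: 2), (Name: "A", Value: 3),
    (Name: "A", Value: 3), (Name: "A", Value: 4),
    (Name: "B", Value: 1), (Name: "B", Value: 2),
    (Name: "C", Value: 1),
};
var required = new[] { 1, 2, 3 };

var letters = items
    .GroupBy(i => i.Name, i => i.Value)
    // The set ignores the duplicate (A, 3) row; IsSupersetOf checks
    // that every required number is present in the group.
    .Where(g => new HashSet<int>(g).IsSupersetOf(required))
    .Select(g => g.Key)
    .ToList();

Console.WriteLine(string.Join(", ", letters)); // prints "A"
```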

Entity Framework Linq - how to get groups that contain all your data

Here a sample dataset:
OrderProduct is a table that contains the productIds that were part of a given order.
Note: OrderProduct is a database table and I am using EF.
OrderId, ProductId
1, 1
2, 2
3, 4
3, 5
4, 5
4, 2
5, 2
5, 3
What I want to be able to do is find an order that contains only the productIds that I am searching for. So if my input was productIds 2,3, then I should get back OrderId 5.
I know how I can group data, but I am unsure of how to perform the select on the group.
Here is what I have:
var q = from op in OrderProduct
group op by op.OrderId into orderGroup
select orderGroup;
Not sure how to proceed from here
IEnumerable<int> products = new List<int> {2, 3};
IEnumerable<OrderProduct> orderProducts = new List<OrderProduct>
{
new OrderProduct(1, 1),
new OrderProduct(2, 2),
new OrderProduct(3, 4),
new OrderProduct(3, 5),
new OrderProduct(4, 5),
new OrderProduct(4, 2),
new OrderProduct(5, 2),
new OrderProduct(5, 3),
};
var orders =
(from op in orderProducts
group op by op.OrderId into orderGroup
//magic goes there
where !products.Except(orderGroup.Select(x => x.ProductId)).Any()
select orderGroup);
//outputs 5
orders.Select(x => x.Key).ToList().ForEach(Console.WriteLine);
Or you can use another version, as pointed out in another answer. Just replace
where !products.Except(orderGroup.Select(x => x.ProductId)).Any()
with
where products.All(pid => orderGroup.Any(op => op.ProductId == pid))
The second one had about 15% better performance when I measured it.
Edit
Following the latest requirement change (you need orders that contain exactly the productIds you are searching for and no others, rather than orders that merely include them), here is an updated version:
var orders =
(from op in orderProducts
group op by op.OrderId into orderGroup
//this line was added
where orderGroup.Count() == products.Count()
where !products.Except(orderGroup.Select(x => x.ProductId)).Any()
select orderGroup);
So the only thing you need to add is a precondition ensuring that both collections contain the same number of elements. It works for both previous queries. As a bonus, here is a third version of the main where condition:
where orderGroup.Select(x => x.ProductId).Intersect(products).Count() == orderGroup.Count()
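For in-memory data, the "exactly these productIds and no others" requirement can also be sketched with `HashSet<int>.SetEquals`. This is a different technique from the queries above, it uses tuples instead of the OrderProduct class, and whether a database provider could translate it is not something this sketch addresses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var orderProducts = new[]
{
    (OrderId: 1, ProductId: 1), (OrderId: 2, ProductId: 2),
    (OrderId: 3, ProductId: 4), (OrderId: 3, ProductId: 5),
    (OrderId: 4, ProductId: 5), (OrderId: 4, ProductId: 2),
    (OrderId: 5, ProductId: 2), (OrderId: 5, ProductId: 3),
};
var products = new HashSet<int> { 2, 3 };

var exactOrderIds = orderProducts
    .GroupBy(op => op.OrderId, op => op.ProductId)
    // SetEquals is true only when the group has every searched id and nothing else.
    .Where(g => products.SetEquals(g))
    .Select(g => g.Key)
    .ToList();

Console.WriteLine(string.Join(", ", exactOrderIds)); // prints "5"
```

Order 2 contains only product 2 and order 4 contains an extra product 5, so neither passes the check.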
At first glance, I'd try something like this:
var prodIds = new[] {2, 3};
from o in context.Orders
where prodIds.All(pid => o.OrderProducts.Any(op => op.ProductId == pid))
select o
In plain language: "get the orders that have a product with every ID in the given list."
Update
Since it appears you are using LINQ to SQL rather than LINQ to Entities, here's another approach:
var q = context.Orders.AsQueryable(); // typed as IQueryable so the Where results below can be assigned back to q
foreach(var pid in prodIds)
{
q = q.Where(o => o.OrderProducts.Any(op => op.ProductId == pid));
}
Rather than using a single LINQ statement, you essentially build the query piecemeal.
Thanks to StriplingWarrior's answer I managed to figure this out. Not sure if this is the best way to do this, but it works.
List<int> prodIds = new List<int>{2,3};
var q = from o in Orders
//get all orderproducts that contain products in the ProdId list
where o.OrderProducts.All(op => prodIds.Contains(op.ProductId))
//now group the OrderProducts by the Orders
select from op in o.OrderProducts
group op by op.OrderId into opGroup
//select only those groups that have the same count as the prodId list
where opGroup.Count() == prodIds.Count()
select opGroup;
//get rid of any groups that may be empty
q = q.Where(fi => fi.Count()> 0);
(I am using LinqPad, which is why the query looks a little funky - no context, etc)

How can I Make a LINQ query expression dynamic

I'm trying to implement cascading controls using the following LINQ query expression.
The idea is that I have three option lists represented by the tables OptionA, OptionB and OptionC, and a view called OptionIndex with one column each for OptionA_ID, OptionB_ID and OptionC_ID. That view has all the combinations of tags from the option lists that are in use. Left outer joining the OptionIndex on the option list produces a boolean for the Disabled attribute in the option tag.
How do I make the on clause, which is .Where(...) in the following sample code, allow for any combination of the controls being used?
For example, let's say the user initially selects option value 123 in OptionA. The code to return the Values, Labels and Disabled booleans for OptionC would look like the following:
from t1 in OptionCs
from t2 in OptionIndexes.Where(x => t1.OptionC_ID == x.OptionC_ID && new List<int> { 123 }.Contains(x.OptionA_ID)).DefaultIfEmpty()
group new {t1, t2} by new { t1.OptionC_ID, t1.Label } into g
select new { g.Key.OptionC_ID, g.Key.Label, Disabled = g.Count(t => t.t2.OptionC_ID == null) > 0 }
Then let's say the user selects option values 456 and 789 in OptionB. The code to return the Values, Labels and Disabled booleans for OptionC changes to:
from t1 in OptionCs
from t2 in OptionIndexes.Where(x => t1.OptionC_ID == x.OptionC_ID && new List<int> { 123 }.Contains(x.OptionA_ID) && new List<int> { 456, 789 }.Contains(x.OptionB_ID)).DefaultIfEmpty()
group new {t1, t2} by new { t1.OptionC_ID, t1.Label } into g
select new { g.Key.OptionC_ID, g.Key.Label, Disabled = g.Count(t => t.t2.OptionC_ID == null) > 0 }
To make the example code easier to understand, I used new List<int>. In the actual project, however, I would be passing the integers from the option lists in as integer arrays from the controls themselves.
The trick is somehow making the query expression dynamic so that it can represent any combination of 0 to N multi-select controls being used or passing something that tells the join to accept any value for any given control such as
{x.OptionB_ID.Any}.Contains(x.OptionB_ID)
What is the best way to handle this?
Thanks!
Distilling your issue down to a simple example, consider this list of integers:
List<int> l = new List<int> { 1, 25, 3, 99, -23, 0, 15, 75 };
Say that you want to conditionally filter this list based on external criteria. Sometimes you want positive numbers, sometimes you want numbers smaller than 50, sometimes you want numbers divisible by 5, or any combination of these. Applying all filters with a static expression would look like this:
l.Where(n => n > 0).Where(n => n < 50).Where(n => n % 5 == 0);
To apply any or all of these dynamically, just build the LINQ query in pieces:
// These switches simulate your external conditions.
bool conditionA = true;
bool conditionB = false;
bool conditionC = true;
IEnumerable<int> myList = l;
if (conditionA) { myList = myList.Where(n => n > 0 ); }
if (conditionB) { myList = myList.Where(n => n < 50 ); }
if (conditionC) { myList = myList.Where(n => n % 5 == 0); }
With the switches set as in my example, the output is 25, 15, 75.
Side note: if you are not aware of it, use LINQPad to experiment with things like this. It is a fantastic tool for essentially executing code interactively, be it LINQ code or not. When I built the above sample, I inserted myList.Dump(); calls after each of the last 4 lines so I could see how each filter was applied.
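Applied to the cascading-controls question, the same piecemeal approach might look like the sketch below. It runs against an in-memory array of tuples standing in for the OptionIndex view; the sample rows, the names `selectedA`, `selectedB` and `enabledC`, and the convention that an empty selection means "no filter for that control" are all assumptions made for illustration.

```csharp
using System;
using System.Linq;

var optionIndex = new[]
{
    (OptionA_ID: 123, OptionB_ID: 456, OptionC_ID: 1),
    (OptionA_ID: 123, OptionB_ID: 999, OptionC_ID: 2),
    (OptionA_ID: 321, OptionB_ID: 789, OptionC_ID: 3),
};

int[] selectedA = { 123 };               // user picked 123 in OptionA
int[] selectedB = Array.Empty<int>();    // nothing picked in OptionB yet

// Start unfiltered and add one Where per control that has a selection.
var query = optionIndex.AsEnumerable();
if (selectedA.Length > 0)
    query = query.Where(x => selectedA.Contains(x.OptionA_ID));
if (selectedB.Length > 0)
    query = query.Where(x => selectedB.Contains(x.OptionB_ID));

// OptionC values still reachable under the current selections.
var enabledC = query
    .Select(x => x.OptionC_ID)
    .Distinct()
    .OrderBy(id => id)
    .ToList();

Console.WriteLine(string.Join(", ", enabledC)); // prints "1, 2"
```

Any OptionC value missing from `enabledC` would be the one rendered as Disabled. Adding a further control means adding one more conditional Where.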
