Method to get a summary view from a dataset - linq

I have a dataset that looks like this:
Date
Category
Rate
Quantity
There will be 0 or 1 row for each Category for any given Date.
What is a good way to get this data into a summary type of view?
For example:
Date
Category1_Rate
Category2_Rate
Category3_Rate
Category4_Rate
I have a fixed number of Categories.
I'm using linq.
Here is an example. If I have this data:
Date Category Rate Quantity
1/1/12 toys 15 12
1/1/12 games 20 20
1/1/12 dvds 18 30
1/2/12 toys 19 13
1/2/12 dvds 20 17
I want to produce a summary that looks like this:
Date toys_rate games_rate dvds_rate
1/1/12 15 20 18
1/2/12 19 null 20

Possibly something like this:
var summarydata =
    from r in table
    group r by r.Date into g
    select new
    {
        Date = g.Key,
        // Project to int? so FirstOrDefault() yields null when the category has no row on this date
        ToysRate = g.Where(e => e.Category == "toys").Select(e => (int?)e.Rate).FirstOrDefault(),
        GamesRate = g.Where(e => e.Category == "games").Select(e => (int?)e.Rate).FirstOrDefault(),
        DvdsRate = g.Where(e => e.Category == "dvds").Select(e => (int?)e.Rate).FirstOrDefault()
    };
Note I haven't tested this, as I don't currently have access to a C# environment.
EDIT - Added nullable int casts to properly set the type of the various rate fields in the resulting anonymous type.
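The pivot the LINQ query performs (group by date, then look up each fixed category's rate, falling back to null) is language-neutral, so here is a minimal Python sketch of the same idea on the sample data from the question. The helper name `summarize` and the tuple layout of the rows are assumptions made for the illustration, not part of the original answer.

```python
from collections import defaultdict

# Sample rows from the question: (date, category, rate, quantity)
ROWS = [
    ("1/1/12", "toys", 15, 12),
    ("1/1/12", "games", 20, 20),
    ("1/1/12", "dvds", 18, 30),
    ("1/2/12", "toys", 19, 13),
    ("1/2/12", "dvds", 20, 17),
]

# The fixed set of categories, in the column order of the summary
CATEGORIES = ["toys", "games", "dvds"]


def summarize(rows, categories):
    """Pivot rows into one dict per date with a <category>_rate key per category.

    A category with no row on a given date gets None, mirroring the
    nullable (int?) rate fields in the LINQ answer.
    """
    rates_by_date = defaultdict(dict)
    for date, category, rate, _quantity in rows:
        # At most one row per (date, category), so a plain assignment is enough
        rates_by_date[date][category] = rate
    summary = []
    for date, rates in rates_by_date.items():  # dicts keep first-seen order
        row = {"date": date}
        for category in categories:
            row[f"{category}_rate"] = rates.get(category)
        summary.append(row)
    return summary


for row in summarize(ROWS, CATEGORIES):
    print(row)
```

The second printed row has `'games_rate': None`, matching the `null` in the desired summary for 1/2/12.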

Related

Using Featuretools to aggregate per time of day

I'm wondering if there's any way to calculate the same variables I'm already computing with deep feature synthesis (i.e. counts, sums, means, etc.) for different time segments within a day?
For example, a count of morning events (hours 0-12) as a separate variable from evening events (hours 13-24).
Also, along the same lines, what would be the easiest way to get counts by day of week, day of month, day of year, etc.? Custom aggregation primitives?
Yes, this is possible. First, let's generate some random data, and then I'll walk through how to do it.
import featuretools as ft
import pandas as pd
import numpy as np
# make some random data
n = 100
events_df = pd.DataFrame({
"id" : range(n),
"customer_id": np.random.choice(["a", "b", "c"], n),
"timestamp": pd.date_range("Jan 1, 2019", freq="1h", periods=n),
"amount": np.random.rand(n) * 100
})
events_df
The first thing we want to do is add a new column for the segment we want to calculate features for:
def to_part_of_day(x):
if x < 12:
return "morning"
elif x < 18:
return "afternoon"
else:
return "evening"
events_df["time_of_day"] = events_df["timestamp"].dt.hour.apply(to_part_of_day)
Now we have a dataframe like this:
id customer_id timestamp amount time_of_day
0 0 a 2019-01-01 00:00:00 44.713802 morning
1 1 c 2019-01-01 01:00:00 58.776476 morning
2 2 a 2019-01-01 02:00:00 94.671566 morning
3 3 a 2019-01-01 03:00:00 39.271852 morning
4 4 a 2019-01-01 04:00:00 40.773290 morning
5 5 c 2019-01-01 05:00:00 19.815855 morning
6 6 a 2019-01-01 06:00:00 62.457129 morning
7 7 b 2019-01-01 07:00:00 95.114636 morning
8 8 b 2019-01-01 08:00:00 37.824668 morning
9 9 a 2019-01-01 09:00:00 46.502904 morning
Next, let's load it into an EntitySet:
es = ft.EntitySet()
es.entity_from_dataframe(entity_id="events",
time_index="timestamp",
dataframe=events_df)
es.normalize_entity(new_entity_id="customers", index="customer_id", base_entity_id="events")
es.plot()
Now we are ready to set the segments we want to create aggregations for, using interesting_values:
es["events"]["time_of_day"].interesting_values = ["morning", "afternoon", "evening"]
Then we can run DFS, listing the aggregation primitives we want to apply per segment in the where_primitives parameter:
fm, fl = ft.dfs(target_entity="customers",
entityset=es,
agg_primitives=["count", "mean", "sum"],
trans_primitives=[],
where_primitives=["count", "mean", "sum"])
fm
In the resulting feature matrix, you can now see aggregations per morning, afternoon, and evening:
customer_id                                                   a            b            c
COUNT(events)                                                37           30           33
MEAN(events.amount)                                   49.753630    51.241484    39.563222
SUM(events.amount)                                  1840.884300  1537.244522  1305.586314
COUNT(events WHERE time_of_day = afternoon)                  12            3            9
COUNT(events WHERE time_of_day = evening)                     7           10            7
COUNT(events WHERE time_of_day = morning)                    18           17           17
MEAN(events.amount WHERE time_of_day = afternoon)     35.098923    45.140800    50.129136
MEAN(events.amount WHERE time_of_day = evening)       45.861881    46.170996    34.593936
MEAN(events.amount WHERE time_of_day = morning)       61.036892    55.300715    36.015679
SUM(events.amount WHERE time_of_day = afternoon)     421.187073   135.422399   451.162220
SUM(events.amount WHERE time_of_day = evening)       321.033164   461.709963   242.157549
SUM(events.amount WHERE time_of_day = morning)      1098.664063   940.112160   612.266545
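To make clear what a `... WHERE time_of_day = ...` feature computes, here is a plain-Python sketch (no Featuretools involved) of count, sum and mean per customer and segment. The helper `segmented_aggregates` and the `(customer_id, timestamp, amount)` tuple layout are hypothetical. Swapping the segment function, for example to the weekday name, gives the day-of-week grouping asked about in the question; in Featuretools you would presumably add a corresponding column and set its interesting_values the same way as time_of_day above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def to_part_of_day(hour):
    # Same boundaries as the answer: 0-11 morning, 12-17 afternoon, 18-23 evening
    if hour < 12:
        return "morning"
    elif hour < 18:
        return "afternoon"
    return "evening"


def segmented_aggregates(events, segment_of):
    """Return {(customer_id, segment): (count, sum, mean)}.

    `events` is an iterable of (customer_id, timestamp, amount) tuples and
    `segment_of` maps a timestamp to a segment label.
    """
    amounts = defaultdict(list)
    for customer_id, timestamp, amount in events:
        amounts[(customer_id, segment_of(timestamp))].append(amount)
    return {
        key: (len(vals), sum(vals), sum(vals) / len(vals))
        for key, vals in amounts.items()
    }


# Hypothetical events: customer "a" makes a purchase every 6 hours on 2019-01-01
start = datetime(2019, 1, 1)
events = [("a", start + timedelta(hours=h), float(h)) for h in range(0, 24, 6)]

by_part_of_day = segmented_aggregates(events, lambda ts: to_part_of_day(ts.hour))
by_weekday = segmented_aggregates(events, lambda ts: ts.strftime("%A"))
print(by_part_of_day)
print(by_weekday)
```

Each segment becomes its own key, which is the plain-Python counterpart of DFS emitting one feature column per interesting value.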

Please suggest a linq query for my requirement

Can anyone suggest a LINQ query for the requirement below?
There is a checkbox on the form. When we click it, the data table below has to be grouped by ItemCode, showing Sum(SoldQty), StockInHand, LatestRecordValueOfSales, Amount and Description.
The following columns can't be grouped:
solddate - show the latest sold date
department
category
ItemCode Description UOM SoldQty Stock in Hand SellPrice Amount
---------------------------------------------------------------
100 Paracetamol 200MG UOM1 5 -5 3 8 0 100 1/21/2013 MEAT INDIAN BEAF
100 Paracetamol 200MG UOM1 5 -5 3 8 0 100 1/21/2013 MEAT INDIAN BEAF
200 frozen meat Kilograms 0.005 88.19 4 4.01 0 200 1/21/2013 OTHERS INDIAN BEAF
200 frozen meat Kilograms 0.044 88.19 4 4.04 0 200 1/21/2013 OTHERS INDIAN BEAF
100 Paracetamol 200MG UOM1 5 -5 3 8 0 100 1/22/2013 MEAT INDIAN BEAF
200 frozen meat Kilograms 0.054 88.19 4 4.05 0 200 1/22/2013 OTHERS INDIAN BEAF
200 frozen meat Kilograms 0.055 88.19 4 4.06 0 200 1/22/2013 OTHERS INDIAN BEAF
========================================================================
A general query:
var resQuery = from i in someQueryable
               group i by new { i.groupProperty1, i.groupProperty2 } into g
               select new
               {
                   Property1 = g.Key.groupProperty1,
                   Property2 = g.Key.groupProperty2,
                   Total = g.Sum(p => p.SumProperty),
                   // other properties
               };
For your example data it could look like this:
var resQuery = from i in dbContext.Items
group i by new{ i.ItemCode, i.Description, i.UOM} into g
select new
{
ItemCode = g.Key.ItemCode,
TotalSold = g.Sum(p => p.SoldQty),
Description = g.Key.Description,
UOM =g.Key.UOM
/// other properties
};
Try example on Ideone: http://ideone.com/xXwgoG
Similar questions have been asked on SO many times:
Linq Objects Group By & Sum
LINQ Lambda Group By with Sum
Multiple group by and Sum LINQ
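The same group-and-sum idea, plus the "latest sold date" requirement, can be sketched in Python on the sample table. Because the sample rows have more fields than the header names, treating the fourth field as SoldQty and the date field as SoldDate is an assumption; `group_and_sum` is a hypothetical helper, and Decimal is used so quantities such as 0.005 add up exactly. Note that the sample table contains two identical rows for item 100 on 1/21/2013, and a plain sum counts both.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# (ItemCode, Description, UOM, SoldQty, SoldDate) taken from the sample table
ROWS = [
    ("100", "Paracetamol 200MG", "UOM1", Decimal("5"), "1/21/2013"),
    ("100", "Paracetamol 200MG", "UOM1", Decimal("5"), "1/21/2013"),
    ("200", "frozen meat", "Kilograms", Decimal("0.005"), "1/21/2013"),
    ("200", "frozen meat", "Kilograms", Decimal("0.044"), "1/21/2013"),
    ("100", "Paracetamol 200MG", "UOM1", Decimal("5"), "1/22/2013"),
    ("200", "frozen meat", "Kilograms", Decimal("0.054"), "1/22/2013"),
    ("200", "frozen meat", "Kilograms", Decimal("0.055"), "1/22/2013"),
]


def group_and_sum(rows):
    """Group by (ItemCode, Description, UOM); sum SoldQty and keep the latest SoldDate."""
    totals = defaultdict(Decimal)  # Decimal() starts at 0
    latest = {}
    for code, description, uom, qty, sold in rows:
        key = (code, description, uom)
        totals[key] += qty
        sold_date = datetime.strptime(sold, "%m/%d/%Y").date()
        if key not in latest or sold_date > latest[key]:
            latest[key] = sold_date
    return {key: (totals[key], latest[key]) for key in totals}


for key, (total, last_sold) in group_and_sum(ROWS).items():
    print(key, total, last_sold)
```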
Below is my code. It works fine, except that for the first row the SoldQty and Amount values are doubled, while the other rows' data is fine. I can't understand why only the first row's Sum(SoldQty) is doubled.
decimal? SoldQty, stockinhand,SellPrice,Amount,CostPrice;
string ItemCode, Description,UOM,BarCode,SoldDate,Department,Category,User;
var resQuery = from row in dtFilter.AsEnumerable()
group row by row.Field<string>("Item Code") into g
select dtFilter.LoadDataRow(new object[]
{
ItemCode=g.Key,
Description=g.Select(r=>r.Field<string>("Description")).First<string>(),
UOM=g.Select(r=>r.Field<string>("UOM")).First<string>(),
SoldQty = g.Sum(r => r.Field<decimal?>("Sold Qty")).Value,
stockinhand=g.Select(r=>r.Field<decimal?>("Stock in Hand")).First<decimal?>(),
SellPrice=g.Select(r=>r.Field<decimal?>("Sell Price")).First<decimal?>(),
Amount = g.Sum(r => r.Field<decimal?>("Amount")).Value,
CostPrice = g.Sum(r => r.Field<decimal?>("Cost Price")).Value,
BarCode=g.Select(r=>r.Field<string>("Barcode")).First<string>(),
SoldDate=g.Select(r=>r.Field<string>("SoldDate")).Last<string>(),
Department=g.Select(r=>r.Field<string>("Department")).First<string>(),
Category=g.Select(r=>r.Field<string>("Category")).First<string>(),
User=g.Select(r=>r.Field<string>("User")).First<string>(), }, false);

How to get a LINQ query result into a DataTable?

Can anyone tell me how to get the result of a LINQ query that contains a group by into a DataTable?
var query= from d in dtable.AsEnumerable()
group d by d["Id"];
WId FirstName LastName Age
1 Jass we 23
1 Mady wer 54
3 Servy gr 22
4 Jan fr 11
Expected
WId FirstName LastName Age
1 Jass we 23
3 Servy gr 22
4 Jan fr 11
Thanks
Pradeep
If you just want to take the first person per ID group:
var distinctIdPersons = from p in dtable.AsEnumerable()
group p by p.Field<int>("WId") into IdGroups
select IdGroups.First();
or in method syntax:
distinctIdPersons = dtable.AsEnumerable().GroupBy(r => r.Field<int>("WId"))
.Select( g => g.First());
If you want to see the result (e.g. for testing purposes), you can use string.Join:
var output = string.Join(", ", distinctIdPersons.Select(r =>
r.Field<string>("FirstName") + " " + r.Field<string>("LastName")));
Console.WriteLine(output); // Jass we, Servy gr, Jan fr
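For the DataTable part of the question: in C#, the `IEnumerable<DataRow>` produced above can be turned into a new table with `CopyToDataTable()` (System.Data.DataSetExtensions), which throws if the sequence is empty. The "first row per group" selection itself is language-neutral; a minimal Python sketch follows, where `first_per_key` is a hypothetical helper. It relies on the same ordering LINQ to Objects gives `GroupBy(...).Select(g => g.First())`: groups appear in the order their key is first seen, and rows keep source order within a group.

```python
# Rows of the sample table as (WId, FirstName, LastName, Age)
PEOPLE = [
    (1, "Jass", "we", 23),
    (1, "Mady", "wer", 54),
    (3, "Servy", "gr", 22),
    (4, "Jan", "fr", 11),
]


def first_per_key(rows, key):
    """Keep the first row for each key, in the order keys first appear."""
    seen = {}
    for row in rows:
        seen.setdefault(key(row), row)  # later rows with the same key are ignored
    return list(seen.values())


distinct = first_per_key(PEOPLE, key=lambda row: row[0])
print(", ".join(f"{first} {last}" for _, first, last, _ in distinct))
# Jass we, Servy gr, Jan fr
```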

Linq Query - get current month plus previous months

I need to build a LINQ query that will show the results as follows:
Data:
Sales Month
----------------------
10 January
20 February
30 March
40 April
50 May
60 June
70 July
80 August
90 September
100 October
110 November
120 December
I need to get the results based on this scenario:
result for month x = sales of month x + result for the previous month
that will result in:
Sales Month
--------------------
10 January
30 February (30 = February 20 + January 10)
60 March (60 = March 30 + February 30)
100 April (100 = April 40 + March 60)
.........
Any help how to build this query ?
Thanks a lot!
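Read as a recurrence, the rule above is a running (cumulative) total. Writing $s_x$ for the sales of month $x$ and $S_x$ for the result shown for that month:

$$
S_1 = s_1, \qquad S_x = s_x + S_{x-1} \quad (x > 1), \qquad\text{so}\qquad S_x = \sum_{k=1}^{x} s_k .
$$

For example, $S_3 = 30 + S_2 = 30 + (20 + 10) = 60$, which matches the March row.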
Since you wanted it in LINQ...
void Main()
{
List<SaleCount> sales = new List<SaleCount>() {
new SaleCount() { Sales = 10, Month = 1 },
new SaleCount() { Sales = 20, Month = 2 },
new SaleCount() { Sales = 30, Month = 3 },
new SaleCount() { Sales = 40, Month = 4 },
...
};
var query = sales.Select ((s, i) => new
{
CurrentMonth = s.Month,
CurrentAndPreviousSales = s.Sales + sales.Take(i).Sum(sa => sa.Sales)
});
}
public class SaleCount
{
public int Sales { get; set; }
public int Month { get; set; }
}
...but in my opinion, this is a case where coming up with some fancy LINQ isn't going to be as clear as just writing out the code that the LINQ query is going to generate. It also doesn't scale: each month re-sums every month before it, so the work grows quadratically with the number of months. For example, including multiple years' worth of data gets even more hairy, when it wouldn't have to if it were written out the "old-fashioned way".
If you don't want to add up all of the previous sales for each month, you have to keep track of the running total somehow. The Aggregate function works well enough for this, because we can build a list and use its last element as the current total when calculating the next element.
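var sales = Enumerable.Range(1, 12).Select(x => x * 10).ToList();
// Seed with an empty list; each step appends the previous total (0 for the first month) plus the current sale
var sums = sales.Aggregate(new List<int>(), (list, sale) => { list.Add(list.LastOrDefault() + sale); return list; });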
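The single-pass running total that the Aggregate version builds has a direct counterpart in Python's standard library, `itertools.accumulate`, shown here as a language-neutral illustration on the question's monthly figures (January = 10 up to December = 120).

```python
from itertools import accumulate

# Monthly sales from the question: January = 10, February = 20, ..., December = 120
sales = [month * 10 for month in range(1, 13)]

# accumulate() carries the running total forward, so each month costs O(1)
# instead of re-summing every previous month
running = list(accumulate(sales))
print(running)
# [10, 30, 60, 100, 150, 210, 280, 360, 450, 550, 660, 780]
```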

EF Linq query comparing data from multiple rows

I would like to create a LINQ query that compares data from multiple rows in a single table.
The table holds data from polling a web service for account balances. Unfortunately, the polling interval is not 100% deterministic, which means there can be zero, one or more entries for each account per day.
For the application I need this data reformatted into a specific format (see the expected output below).
I included sample data and descriptions of the table.
Can anybody help me with an EF LINQ query that will produce the required output?
table:
id The account id
balance The available credits in the account at the time of the measurement
create_date The datetime when the data was retrieved
Table name:Balances
Field: id (int)
Field: balance (bigint)
Field: create_date (datetime)
sample data:
id balance create_date
3 40 2012-04-02 07:01:00.627
1 55 2012-04-02 13:41:50.427
2 9 2012-04-02 03:41:50.727
1 40 2012-04-02 16:21:50.027
1 49 2012-04-02 16:55:50.127
1 74 2012-04-02 23:41:50.627
1 90 2012-04-02 23:44:50.427
3 3 2012-04-02 23:51:50.827
3 -10 2012-04-03 07:01:00.627
1 0 2012-04-03 13:41:50.427
2 999 2012-04-03 03:41:50.727
1 50 2012-04-03 15:21:50.027
1 49 2012-04-03 16:55:50.127
1 74 2012-04-03 23:41:50.627
2 -10 2012-04-03 07:41:50.727
1 100 2012-04-03 23:44:50.427
3 0 2012-04-03 23:51:50.827
expected output:
id The account id
date The date component of create_date used to produce the row
balance_last_measurement The balance at the last measurement of the date
difference The difference in balance between the first- and last measurement of the date
On 2012-04-02, id 2 has only one measurement, so the difference equals the last (and only) measurement.
id date balance_last_measurement difference
1 2012-04-02 90 35
1 2012-04-03 100 10
2 2012-04-02 9 9
2 2012-04-03 -10 -19
3 2012-04-02 3 -37
3 2012-04-03 0 37
update 2012-04-10 20:06
The answer from Raphaël Althaus is really good, but I made a small mistake in the original request. The difference field in the 'expected output' should be either:
the difference between the last measurement of the previous day and the last measurement of the day, or
if there is no previous day, the difference between the first and the last measurement of the day.
Is this possible at all? It seems quite complex.
I would try something like this:
var query = db.Balances
    .OrderBy(m => m.Id)
    .ThenBy(m => m.CreationDate)
    .GroupBy(m => new
    {
        id = m.Id,
        year = SqlFunctions.DatePart("yyyy", m.CreationDate),
        month = SqlFunctions.DatePart("mm", m.CreationDate),
        day = SqlFunctions.DatePart("dd", m.CreationDate)
    }).ToList() //enumerate there, this is what we need from db
    .Select(g => new
    {
        id = g.Key.id,
        // DatePart returns int?, so unwrap the parts before building the date
        date = new DateTime(g.Key.year.Value, g.Key.month.Value, g.Key.day.Value),
        last_balance = g.Select(m => m.BalanceValue).LastOrDefault(),
        difference = (g.Count() == 1 ? g.First().BalanceValue : g.Last().BalanceValue - g.First().BalanceValue)
    });
Well, this is probably not an optimized solution, but see if it works.
First, we create a result class:
public class BalanceResult
{
public int Id { get; set; }
public DateTime CreationDate { get; set; }
public IList<int> BalanceResults { get; set; }
public int Difference { get; set; }
public int LastBalanecResultOfDay {get { return BalanceResults.Last(); }}
public bool HasManyResults {get { return BalanceResults != null && BalanceResults.Count > 1; }}
public int DailyDifference { get { return HasManyResults ? BalanceResults.Last() - BalanceResults.First() : BalanceResults.First(); } }
}
Then we change our query a little:
var balanceResults = db.Balances
    .GroupBy(m => new
    {
        id = m.Id,
        year = SqlFunctions.DatePart("yyyy", m.CreationDate),
        month = SqlFunctions.DatePart("mm", m.CreationDate),
        day = SqlFunctions.DatePart("dd", m.CreationDate)
    }).ToList() //enumerate there, this is what we need from db
    .Select(g => new BalanceResult
    {
        Id = g.Key.id,
        // DatePart returns int?, so unwrap the parts before building the date
        CreationDate = new DateTime(g.Key.year.Value, g.Key.month.Value, g.Key.day.Value),
        BalanceResults = g.OrderBy(l => l.CreationDate).Select(l => l.BalanceValue).ToList()
    }).ToList();
And finally:
foreach (var balanceResult in balanceResults.ToList())
{
var previousDayBalanceResult = balanceResults.FirstOrDefault(m => m.Id == balanceResult.Id && m.CreationDate == balanceResult.CreationDate.AddDays(-1));
balanceResult.Difference = previousDayBalanceResult != null ? balanceResult.LastBalanecResultOfDay - previousDayBalanceResult.LastBalanecResultOfDay : balanceResult.DailyDifference;
}
As noted, performance (using dictionaries, for example) and code readability could of course be improved, but... that's the idea!
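The dictionary-based lookup suggested above can be sketched in Python for the updated rule: measurements are grouped per (id, date), and the previous day's last balance is found with a dictionary lookup instead of a scan over all results. Parsing create_date from strings and the helper name `daily_summary` are assumptions for the illustration. On the sample data it reproduces the expected output for every row except id 3 on 2012-04-03, where it gives -3 (0 minus the previous day's last balance of 3) rather than the 37 listed.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (id, balance, create_date) rows from the sample data
ROWS = [
    (3, 40, "2012-04-02 07:01:00.627"), (1, 55, "2012-04-02 13:41:50.427"),
    (2, 9, "2012-04-02 03:41:50.727"), (1, 40, "2012-04-02 16:21:50.027"),
    (1, 49, "2012-04-02 16:55:50.127"), (1, 74, "2012-04-02 23:41:50.627"),
    (1, 90, "2012-04-02 23:44:50.427"), (3, 3, "2012-04-02 23:51:50.827"),
    (3, -10, "2012-04-03 07:01:00.627"), (1, 0, "2012-04-03 13:41:50.427"),
    (2, 999, "2012-04-03 03:41:50.727"), (1, 50, "2012-04-03 15:21:50.027"),
    (1, 49, "2012-04-03 16:55:50.127"), (1, 74, "2012-04-03 23:41:50.627"),
    (2, -10, "2012-04-03 07:41:50.727"), (1, 100, "2012-04-03 23:44:50.427"),
    (3, 0, "2012-04-03 23:51:50.827"),
]


def daily_summary(rows):
    """Return {(id, date): (last_balance, difference)} using the updated rule."""
    per_day = defaultdict(list)
    for account_id, balance, created in rows:
        ts = datetime.strptime(created, "%Y-%m-%d %H:%M:%S.%f")
        per_day[(account_id, ts.date())].append((ts, balance))

    # Sort each day's measurements by time, keeping only the balances
    balances = {key: [b for _, b in sorted(vals)] for key, vals in per_day.items()}

    result = {}
    for (account_id, day), day_balances in balances.items():
        last = day_balances[-1]
        previous = balances.get((account_id, day - timedelta(days=1)))  # O(1) lookup
        if previous is not None:
            difference = last - previous[-1]  # vs. last measurement of previous day
        elif len(day_balances) > 1:
            difference = last - day_balances[0]  # no previous day: last minus first
        else:
            difference = last  # single measurement, as described in the question
        result[(account_id, day)] = (last, difference)
    return result


for (account_id, day), (last, diff) in sorted(daily_summary(ROWS).items()):
    print(account_id, day, last, diff)
```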

Resources