Laravel Eloquent: Count Rows and Group Them by Day

I have a model called "AdInteraction". These interactions can be either a click or a view (they have either the boolean clicked or the boolean viewed set to true).
Along with every interaction I save the created_at date.
This is what I want to end up with in order to have all the data I need to populate a Chart.js chart:
[
    {
        "date": "01-01-2018",
        "clicks": 13,
        "views": 25
    },
    {
        "date": "02-01-2018",
        "clicks": 25,
        "views": 74
    },
    {
        "date": "03-01-2018",
        "clicks": 0,
        "views": 0
    }
]
This is a query I already have on my Ad model, which is related to AdInteraction:
public function getClicksForLastDays()
{
    return $this->clicks()->get()->groupBy(function ($date) {
        return Carbon::parse($date->created_at)->format('y-m-d');
    });
}
However, this returns me an array of arrays rather than counts per day.
What would be the correct and most efficient way to fetch the clicks and count them by day?

Try this and let me know. I assume your column names are date, clicks and views; if they are different, please let me know so I can adjust the answer, or you can do it yourself.
AdInteraction::select([DB::raw('DATE(date)'),DB::raw('count(case when clicks ="true" then 1 end) as "Clicks"'),
DB::raw('count(case when views ="true" then 1 end) as "Views"')])
->groupBy(DB::raw('DATE(date)'))
->get();
Or try this:
AdInteraction::select([
        DB::raw('DATE(date)'),
        DB::raw('count(case when clicks = true then 1 end) as "Clicks"'),
        DB::raw('count(case when views = true then 1 end) as "Views"'),
    ])
    ->groupBy(DB::raw('DATE(date)'))
    ->get();
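If the columns are actually named created_at, clicked and viewed (as the question describes), an untested adaptation of the same idea might look like this; adjust the boolean comparison to however the flags are stored:
AdInteraction::select([
        DB::raw('DATE(created_at) as date'),
        DB::raw('count(case when clicked = 1 then 1 end) as clicks'),
        DB::raw('count(case when viewed = 1 then 1 end) as views'),
    ])
    ->groupBy(DB::raw('DATE(created_at)'))
    ->get();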

You should consider abandoning the idea of grouping by date using the datetime column, since such a query will be very inefficient. When you, for example, GROUP BY DATE(created_at), MySQL has to evaluate that cast for every row and cannot use an index on created_at.
Therefore I recommend denormalizing your table by introducing a separate DATE column, created_date_at, holding the date part of created_at, and creating an index on it. Then you will be able to group your stats efficiently by this new column. Just be sure to register the following code for your model:
AdInteraction::creating(function ($adInteraction) {
    $adInteraction->created_date_at = $adInteraction->created_at->format('Y-m-d');
});
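A matching migration for that new column might look roughly like this (just a sketch; the ad_interactions table name is assumed, so adjust it to your schema):
Schema::table('ad_interactions', function (Blueprint $table) {
    // DATE copy of created_at, indexed so GROUP BY created_date_at can use the index
    $table->date('created_date_at')->nullable()->index();
});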
Or you can consider creating separate int columns for year, month and day. Then you can create a multi-column index and group by these columns. This way you will also be able to easily retrieve stats by day, month and year if needed.

Related

AggregatingMergeTree order by column not in the sorting key

What are some options to have AggregatingMergeTree merge by a column but ordered by a column that's not in the sorting key?
My application is similar to Zendesk tickets. A ticket has a category, status, and ID. The application emits ticket status change events to ClickHouse, and I'm calculating statistics on the time it took to close a ticket after it was created, over some time range R, grouped by some time period P.
For example, events look like this:
{
    "ticket": "A",
    "event_time": "2022-12-08T15:00:00Z",
    "category": "bug",
    "status": "created"
},
{
    "ticket": "A",
    "event_time": "2022-12-08T15:30:00Z",
    "category": "bug",
    "status": "reviewing"
},
{
    "ticket": "A",
    "event_time": "2022-12-08T16:00:00Z",
    "category": "bug",
    "status": "reviewed"
}
My AggregatingMergeTree (more specifically, it's replicated) has a sorting key on the ticket ID to aggregate two states into one.
CREATE TABLE ticket_created_to_reviewed
(
    `ticket` String,
    `created_ticket_event_id` SimpleAggregateFunction(max, String),
    `created_ticket_event_time` SimpleAggregateFunction(max, DateTime64(9)),
    `created_ticket_category` SimpleAggregateFunction(max, String),
    `close_ticket_event_id` SimpleAggregateFunction(max, String),
    `close_ticket_event_time` SimpleAggregateFunction(max, DateTime64(9)),
    `close_ticket_category` SimpleAggregateFunction(max, String)
)
ENGINE = ReplicatedAggregatingMergeTree('<path>', '{replica}')
PARTITION BY toYYYYMM(close_ticket_event_time)
PRIMARY KEY ticket
ORDER BY ticket
TTL date_trunc('second', if(close_ticket_event_time > created_ticket_event_time,
    close_ticket_event_time, created_ticket_event_time)) + toIntervalMonth(12)
SETTINGS index_granularity = 8192
Two MVs SELECT from the raw events and insert into ticket_created_to_reviewed: one for WHERE status = 'created' and another for WHERE status = 'reviewed'.
So far the data populates correctly, although I have to exclude rows that have only one of the status events populated. Getting the hourly p99 of ticket time-to-close over the past day for each category looks something like this:
SELECT
    quantile(0.9)(date_diff('second', created_ticket_event_time, close_ticket_event_time)),
    date_trunc('hour', close_ticket_event_time) AS t,
    close_ticket_category AS category
FROM
(
    SELECT
        ticket,
        max(created_ticket_event_id) AS created_ticket_event_id,
        max(created_ticket_event_time) AS created_ticket_event_time,
        max(created_ticket_category) AS created_ticket_category,
        max(close_ticket_event_id) AS close_ticket_event_id,
        max(close_ticket_event_time) AS close_ticket_event_time,
        max(close_ticket_category) AS close_ticket_category
    FROM ticket_created_to_reviewed
    GROUP BY ticket
)
WHERE close_ticket_event_id != '' AND created_ticket_event_id != ''
    AND close_ticket_event_time > addDays(now(), -1)
GROUP BY t, category
The problem is close_ticket_event_time is not in the sorting key so the query scans the full table, but I can't also include that column in the sorting key because the table wouldn't then aggregate by the ticket ID.
Any suggestions?
Things tried:
Adding an index and/or projection that orders by close_ticket_event_time. However, I think the main problem is that the sorting key is on the ticket ID, so the data is not ordered by time in a way that lets the matching time range be found efficiently; at the same time, adding close_ticket_event_time to the sorting key breaks the aggregation behavior of AggregatingMergeTree.
An MV that joins created and closed ticket events, writing to a different destination table with close_ticket_event_time as the sorting key. The destination table doesn't contain all the data if the right side of the JOIN isn't available at the time the MV is triggered (i.e. by the left side). This can happen if events are ingested out of order.
Ideally, what I'm looking for is something like this in AggregatingMergeTree, but it appears this isn't possible due to the nature of how the data is stored.
PRIMARY KEY ticket
ORDER BY close_ticket_event_time
Thanks in advance

Filter on date in PowerQuery (PowerBI)

I'm currently getting too much data from my Cosmos DB, which I want to reduce to the last 8 weeks.
How can I filter in Power Query to get the last 8 weeks based on my date column?
This is my Power Query to get the data:
let
    Source = DocumentDB.Contents("https://xxx.xxx", "xxx", "xxx"),
    #"Expanded Document" = Table.ExpandRecordColumn(Source, "Document", {"$v"}, {"Document.$v"}),
    #"Expanded Document.$v" = Table.ExpandRecordColumn(#"Expanded Document", "Document.$v", {"date"}, {"Document.$v.date"}),
    #"Expanded Document.$v.date" = Table.ExpandRecordColumn(#"Expanded Document.$v", "Document.$v.date", {"$v"}, {"Document.$v.date.$v"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Document.$v.date",{{"Document.$v.date.$v", type text}})
in
    #"Changed Type"
And this is how the data is in my CosmosDB:
{
    "_id" : ObjectId("5c6144bdf7ce070001acc213"),
    "date" : {
        "$date" : 1549792055030
    },
If you want to do all the work on your end (maybe the server can do some/all of it):
Assuming the 1549792055030 (shown in example) is a Unix timestamp expressed in milliseconds, to convert to a datetime in Power Query, try something like: #datetime(1970, 1, 1, 0, 0, 0) + #duration(0, 0, 0, 1549792055030/1000)
You seem to expand a record field named $v (which itself was nested within a field named date, which itself was nested within a field named $v) in your M code, but $v is not shown as being present in the structure. I mention this as it's confusing to know whether to follow your M code or the structure. I'm going to assume that you have $v field, which contains a date field, which itself contains a $date field. To get at the nested Unix timestamp, you could try something like: someRecord[#"$v"][date][#"$date"]
Since you're interested in only the last 8 weeks, you could test for something like: Date.IsInPreviousNWeeks(DateTime.AddZone(someDatetime, 0), 8). (You could also do it the other way, by converting 8 weeks ago before now to a Unix timestamp and then filter for timestamps >= to the value you've worked out.)
Putting the above together, we might get some M code that looks like:
let
    Source = DocumentDB.Contents("https://xxx.xxx", "xxx", "xxx"),
    filterDates = Table.SelectRows(Source, each
        let
            millisecondsSinceEpoch = Number.From([Document][#"$v"][date][#"$date"]),
            toDatetime = #datetime(1970, 1, 1, 0, 0, 0) + #duration(0, 0, 0, millisecondsSinceEpoch/1000),
            toFilter = Date.IsInPreviousNWeeks(DateTime.AddZone(toDatetime, 0), 8)
        in toFilter
    )
in filterDates
The code above may be functional (hopefully) but, conceptually, it might not be the right way to do it. I am not familiar with the function DocumentDB.Contents, but this link (https://www.powerquery.io/accessing-data/document-db/documentdb.contents) suggests it has these parameters:
function (url as text, optional database as nullable any, optional collection as nullable any, optional options as nullable record) as table
and it goes on to say:
if the field Query is specified in the options record the results of the query being executed on either the specified database and/or collection will be returned.
What I understand this to mean is that if you change your first line to something like:
Source = DocumentDB.Contents("https://xxx.xxx", "xxx", "xxx", [Query = "..."])
and the query you specify in "..." is understood by the server (presumably the query needs to be in Cosmos DB's native query language), only the last 8 weeks' worth of data will be returned to you (meaning less data needs sending and less work for you). As I said, I'm unfamiliar with Azure Cosmos DB, so I can't really comment further. But this seems the better way of doing it.
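For illustration only, pushing the filter to the server might look something like the sketch below. It is untested; the Cosmos DB SQL syntax and the c.date["$date"] path are assumptions based on the document structure shown in the question:
let
    // Unix timestamp in milliseconds for 8 weeks (56 days) before now
    epoch = #datetimezone(1970, 1, 1, 0, 0, 0, 0, 0),
    cutoffMs = (Duration.TotalSeconds(DateTimeZone.UtcNow() - epoch) - 56 * 24 * 60 * 60) * 1000,
    // Ask the server to return only documents newer than the cutoff
    Source = DocumentDB.Contents("https://xxx.xxx", "xxx", "xxx",
        [Query = "SELECT * FROM c WHERE c.date[""$date""] >= " & Number.ToText(Number.Round(cutoffMs))])
in
    Source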

Year over Year Stats from a Crossfilter Dataset

Summary
I want to pull out Year over Year stats in a Crossfilter-DC driven dashboard
Year over Year (YoY) Definition
2017 YoY is the total units in 2017 divided by the total units in 2016.
Details
I'm using DC.js (and therefore D3.js & Crossfilter) to create an interactive Dashboard that can also be used to change the data it's rendering.
I have data that, though wider (it has ~6 other attributes in addition to date and quantity: size, color, etc.; it's sales data), boils down to objects like:
[
{ date: 2017-12-7, quantity: 56, color: blue ...},
{ date: 2017-2-17, quantity: 104, color: red ...},
{ date: 2016-12-7, quantity: 60, color: red ...},
{ date: 2016-4-15, quantity: 6, color: blue ...},
{ date: 2017-2-17, quantity: 10, color: green ...},
{ date: 2016-12-7, quantity: 12, color: green ...}
...
]
I'm displaying one row chart per attribute such that you can see the totals by color, size, etc. People would use each of these charts to see the totals by that attribute and drill into the data by filtering by just a color, or a color and a size, or a size, etc. This setup is all (relatively) straightforward and kind of what DC is made for.
However, now I'd like to add some YoY stats such that I can show a barchart with x-axis as the years, and the y-axis as the YoY values (ex. YoY-2019 = Units-2019 / Units-2018). I'd also like to do the same by quarter and month such that I could see YoY Mar-2019 = Units-Mar-2019 / Units-Mar-2018 (and the same for quarter).
I have a year dimension and a group summing quantity:
var yearDim = crossfilterObject.dimension(_ => _.date.getFullYear());
var quantityGroup = yearDim.group().reduceSum(_ => _.quantity);
I can't figure out how to do the Year over Year calc though in the nice, beautiful DC.js-way.
Attempted Solutions
Year+1
Add another dimension that's year + 1. I didn't really get any further though, because all I get out of it are two dimensions whose year groups I want to divide ... but am not sure how.
var yearPlusOneDim = crossfilterObject.dimension(_ => _.date.getFullYear() + 1);
Visually I can graph the two separately and I know, conceptually, what I want to do: divide the 2017 number in yearDim by the 2017 number in yearPlusOneDim (which, in reality, is the 2016 number). But "as a concept" is as far as I got on this one.
Abandon DC Graphing
I could always use the yearDim's quantity group to get the array of values, which I could then feed into a normal D3.js graph.
var annualValues = quantityGroup.all();
console.log(annualValues);
// output = [{key: 2016, value: 78}, {key: 2017, value: 170}]
// example data from the limited rows listed above
But this feels like a hacky solution that's bound to fail and not benefit from all the rapid and dynamic DC updating.
I'd use a fake group, in order to solve this in one pass.
As @Ethan says, you could also use a value accessor, but then you'd have to look up the previous year each time a value is accessed, so you'd probably have to keep an extra table around. With a fake group, you only need this table in the body of your .all() function.
Here's a quick sketch of what the fake group might look like:
function yoy_group(group) {
    return {
        all: function() {
            // index all values by date
            var bydate = group.all().reduce(function(p, kv) {
                p[kv.key.getTime()] = kv.value;
                return p;
            }, {});
            // for any key/value pair which had a value one year earlier,
            // produce a new pair with the ratio between this year and last
            return group.all().reduce(function(p, kv) {
                var date = d3.timeYear.offset(kv.key, -1);
                if(bydate[date.getTime()])
                    p.push({key: kv.key, value: kv.value / bydate[date.getTime()]});
                return p;
            }, []);
        }
    };
}
The idea is simple: first index all the values by date. Then when producing the array of key/value pairs, look each one up to see if it had a value one year earlier. If so, push a pair to the result (otherwise drop it).
This should work for any date-keyed group where the dates have been rounded.
Note the use of Array.reduce in a couple of places. This is the spiritual ancestor of crossfilter's group.reduce - it takes a function which has the same signature as the reduce-add function, and an initial value (not a function) and produces a single value. Instead of reacting to changes like the crossfilter one does, it just loops over the array once. It's useful when you want to produce an object from an array, or produce an array of different size from the original.
Also, when indexing an object by a date, I use Date.getTime() to fetch the numeric representation of the date. Otherwise the date coerces to a string representation which may not be exact. Probably for this application it would be okay to skip .getTime() but I'm in the habit of always comparing dates exactly.
Demo fiddle of YOY trade volume in the data set used by the stock example on the main dc.js page.
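For what it's worth, wiring the fake group into a chart might look something like this sketch (untested; the '#yoy-chart' selector and the axis domain are just illustrative, and the dimension is keyed by d3.timeYear so the group keys are the Date objects the fake group expects):
// dimension keyed by the start of each year (a Date), not by getFullYear()
var yearDateDim = crossfilterObject.dimension(d => d3.timeYear(d.date));
var yearQuantityGroup = yearDateDim.group().reduceSum(d => d.quantity);
dc.barChart('#yoy-chart')
    .dimension(yearDateDim)
    .group(yoy_group(yearQuantityGroup))
    .x(d3.scaleTime().domain([new Date(2015, 0, 1), new Date(2019, 0, 1)]))
    .xUnits(d3.timeYears)
    .elasticY(true)
    .render();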
I've rewritten @Gordon's code below. All the credit is his for the solution (answered above); I've just written down my own version of the code (far longer, much more verbose, and likely only useful for beginners like me) and the explanation (also much more verbose) to replicate my thinking in bridging my near-nothing starting point up to @Gordon's really clever answer.
yoyGroup = function(group) {
    return { all: function() {
        // For every key-value pair in the group, iterate across it, indexing it by its time-value
        var valuesByDate = group.all().reduce(function(outputArray, thisKeyValuePair) {
            outputArray[thisKeyValuePair.key.getTime()] = thisKeyValuePair.value;
            return outputArray;
        }, []);
        return group.all().reduce(function(newAllArray, thisKeyValuePair) {
            var dateLastYear = d3.timeYear.offset(thisKeyValuePair.key, -1);
            if (valuesByDate[dateLastYear.getTime()]) {
                newAllArray.push({
                    key: thisKeyValuePair.key,
                    value: thisKeyValuePair.value / valuesByDate[dateLastYear.getTime()] - 1
                });
            }
            return newAllArray;
        }, []); // closing reduce() and a function(...)
    }}; // closing the return object & a function
};
Why are we overwriting the all() function?
When DC.js goes to create a graph based on a grouping, the only function from Crossfilter it uses is the all() function. So if we want to do something custom to a grouping to affect a DC graph, we only have to overwrite that one function: all().
What does the all() function need to return?
A group's all function must return an array of objects and each object must have two properties: key & value.
So what exactly are we doing here?
We're starting with an existing group which shows some values over time (Important Assumption: keys are date objects) and then creating a wrapper around it so that we can take advantage of the work that crossfilter has already done to aggregate at a certain level (ex. year, month, etc.).
We start by using reduce to turn the array of objects into a simpler array, where the keys and values that were in the objects are now directly in the array. We do this to make it easier to look up values by key.
before / output structure of group.all()
[ {key: k1, value: v1},
{key: k2, value: v2},
{key: k3, value: v3}
]
after
[ k1: v1,
k2: v2,
k3: v3
]
Then we move on to creating the correct all() structure again: an array of objects each of which has a key & value property. We start with the existing group's all() array (once again), but this time we have the advantage of our valuesByDate array which will make it easy to look up other dates.
So we iterate (via reduce) over the original group.all() output and lookup in the array we generated earlier (valuesByDate), if there's an entry from one year ago (valuesByDate[dateLastYear.getTime()]). (We use getTime() so it's simple integers rather than objects we're indexing off of.) If there is an element of the array from one year ago, then we add a key-value object-pair to our soon-to-be-returned array with the current key (date) and for the value we divide the "now" value (thisKeyValuePair.value) by the value 1 year ago: valuesByDate[dateLastYear.getTime()]. Lastly we subtract 1 so that it's (the most traditional definition of) YoY. Ex. This year = 110 and last year = 100 ... YoY = +10% = 110/100 - 1.

MDX Calculation on SETS

I have an MDX query where I am using a parent-child hierarchy in which a property on any level has one of a set of specific values.
Now I want to create sets, one for each of these specific values, and subtract them from each other.
The query I have looks like this:
WITH
SET [OMS] AS
{
    DESCENDANTS(
        FILTER([ReportHierarchy].[Hierarchy].MEMBERS,
            [ReportHierarchy].[Hierarchy].Properties( "Sum Code" )="OMS")
        ,,SELF)
}
SET [VF] AS
{
    DESCENDANTS(
        FILTER([ReportHierarchy].[Hierarchy].MEMBERS,
            [ReportHierarchy].[Hierarchy].Properties( "Sum Code" )="VF")
        ,,SELF)
}
SELECT
{
[Measures].[Amount],
[Measures].[Budget Amount]
} ON COLUMNS,
{
[OMS],
[VF]
}
on ROWS
FROM
Finance
WHERE
[ReportHierarchy].[Hierarchy Name].&[Income and Balance]
which returns this result:
                  Amount          Budget Amount
Nettoomsætning    -126418831.1    -308192540.75
Vareforbrug       65415924.25     159307880.45
Now I want to do a calculation which subtracts SET [VF] from SET [OMS]...
Anyone have any suggestions?
You need WITH MEMBER to create a new item in your left-most column. This new item can be set up to calculate the value of one item minus another. Here's a similar situation from an old query I wrote years ago:
WITH MEMBER [Actual Time].[MySubtraction]
AS '[Actual Time].[Week].[Week 5] - [Actual Time].[Week].[Week 4]', SOLVE_ORDER=80
SELECT {
[Location].[All Location].[Blah Blah]
} ON ROWS,{
[Actual Time].[Week].members,
[Actual Time].[MySubtraction]
}
ON COLUMNS FROM [CubeName]
WHERE ([Measures].[Whatever])
My MDX is getting rusty these days, so I have not tried to give you the exact query you need, sorry. Have a look at the documentation for WITH MEMBER to learn more.
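For what it's worth, a rough, untested adaptation to the sets in this question might look like the sketch below; the member name [OMS minus VF] is made up, the set definitions are copied from the question, and SUM(set) here simply aggregates the current measure over each named set:
WITH
SET [OMS] AS
{
    DESCENDANTS(
        FILTER([ReportHierarchy].[Hierarchy].MEMBERS,
            [ReportHierarchy].[Hierarchy].Properties( "Sum Code" )="OMS")
        ,,SELF)
}
SET [VF] AS
{
    DESCENDANTS(
        FILTER([ReportHierarchy].[Hierarchy].MEMBERS,
            [ReportHierarchy].[Hierarchy].Properties( "Sum Code" )="VF")
        ,,SELF)
}
MEMBER [ReportHierarchy].[Hierarchy].[OMS minus VF] AS
    SUM([OMS]) - SUM([VF])
SELECT
    { [Measures].[Amount], [Measures].[Budget Amount] } ON COLUMNS,
    { [OMS], [VF], [ReportHierarchy].[Hierarchy].[OMS minus VF] } ON ROWS
FROM Finance
WHERE [ReportHierarchy].[Hierarchy Name].&[Income and Balance]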

CouchDB Group By Values, Date Range

I have a bunch of documents in a CouchDB instance with the following data:
{
"_id": "[string based ID generated by CouchDB]",
"action": "view",
"group": [Integer representing a group number],
"date": [Javascript timestamp]
}
I can use the following to group data by group and action and to get the total number of actions per group:
function(doc) {
    emit([doc.group, doc.action], [1, 1]);
}
(With reduce simply being _sum).
The issue is, this fetches data from all groups, whereas I only want the data from a single group (e.g. from group 1).
Also, I know I can do something like this to filter by date, but how would I combine it with the above to filter by date and by group ID?
function(doc) {
    var then = new Date(Date.parse(doc['Event Date']));
    var fatalities = 0;
    if (doc['Total Fatal Injuries'] != "") {
        fatalities = parseInt(doc['Total Fatal Injuries']);
    }
    emit([then.getFullYear(), then.getMonth()], [1, fatalities]);
}
(From https://cloudant.com/blog/mapreduce-from-the-basics-to-the-actually-useful/)
Thanks!
I think you have your map() and reduce() functions correct already; you simply need to pass reduce=true&group_level=1&startkey=[<group id>]&endkey=[<group id>.toString() + "a"] to the view.
As for the filtering by date, you can use the same strategy. Using the group level, you can see sums per year and per month.
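To make that concrete, one possible sketch (not from the original answer) is to put the group and the date parts into the emitted key, so the group filter and the date range are both expressed through startkey/endkey and group_level:
function (doc) {
    // doc.date is a JavaScript timestamp (milliseconds since epoch)
    var then = new Date(doc.date);
    // key: [group, action, year, month]; the value feeds the _sum reduce
    emit([doc.group, doc.action, then.getFullYear(), then.getMonth()], 1);
}
Querying with, say, startkey=[1, "view", 2018, 0] and endkey=[1, "view", 2018, 11] plus group_level=4 would then return monthly view counts for group 1 only.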
