Funnel through dimension in Google Analytics? - events

Is there a way in GA to detect something like a funnel, but scoped to a custom dimension rather than to a session?
I need to identify whether two events did (or did not) appear in a given order:
- identify the values of the dimension project where the event deleted was not followed by the event created at any later time within a date interval.
It need not happen within one session, but I want to know whether it happened within a given custom dimension value.
Is that possible in GA? How?
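To pin down the check being asked for, here is a minimal sketch in Python. It does not use any GA API; it assumes the events have already been exported as (project, event name, timestamp) rows, and the function name is made up for illustration.

```python
def projects_deleted_without_recreate(events):
    """Return projects that have a 'deleted' event with no later 'created' event.

    events: iterable of (project, event_name, timestamp) tuples, in any order.
    It is enough to compare the latest 'deleted' with the latest 'created':
    if the latest deletion is not followed by a creation, no earlier check matters.
    """
    last_deleted = {}
    last_created = {}
    for project, name, ts in events:
        if name == "deleted":
            last_deleted[project] = max(ts, last_deleted.get(project, ts))
        elif name == "created":
            last_created[project] = max(ts, last_created.get(project, ts))
    return sorted(
        project
        for project, deleted_at in last_deleted.items()
        if project not in last_created or last_created[project] <= deleted_at
    )
```

Restricting the input rows to the date interval of interest before calling the function gives the "within a date interval" variant.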

Related

Recurring event search that supports pagination (in Elasticsearch)

I need to implement a feature where there's an event schedule page. Initially, the first 20 events are shown starting from today. When the user scrolls down to the bottom, the next 20 events are fetched. Additionally, the user can search for events by entering some text in a search field.
The frontend implementation is fairly simple. However, what's hard is the backend side where I need to find a way to handle recurring events with pagination. I have come up with a few solutions, but none of them are cheap enough. I'm using Elasticsearch for this.
Solution 1: Storing each recurring event instance as a separate document
Let’s say we have an event that occurs daily for a year, then we’ll store 365 instances of that event in Elasticsearch.
Advantages: Doing a search with pagination is quite easy this way.
Drawbacks:
First, there’s a problem with infinite projection (recurring events that never end). It can be solved by only storing the recurring event instances for 2 years initially, and then there'll be a scheduled task that will create new required occurrences every day.
Second, creating, updating, and deleting can be costly. Imagine creating/updating/deleting hundreds of documents at once.
This is the solution I believe Google Calendar is currently using. For example, I tried creating an event that repeats daily infinitely in Google Calendar, but I ended up only being able to search for it in two years' time.
Solution 2: Storing only the original recurring event as a single document
Let’s say we have an event that occurs daily for a year, then we’ll only store 1 instance of that event in Elasticsearch with RRULE. And after having the query results from Elasticsearch, I'll do some postprocessing to expand the recurring event into multiple events and do the pagination afterwards.
Advantages: No duplication of recurring event instances. Thus, no worry about endless recurring events, and creating, updating & deleting recurring events should be really cheap.
Drawbacks:
Doing post-pagination can be costly, especially when we have a large number of recurring events. The reason for that is we don't know beforehand if a recurring event should appear in the final search results or not, so we'll have to expand every existing recurring event, and then do the pagination.
We're not utilizing Elasticsearch's built-in pagination.
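One way to soften the first drawback of Solution 2 is to expand the matching events lazily and merge them in date order, so only as many occurrences as the requested page needs are generated. The sketch below assumes a simplified recurrence (a start date plus a fixed interval in days) instead of a full RRULE, and assumes the list of matching events has already come back from the text query; all names are illustrative.

```python
import heapq
from datetime import date, timedelta
from itertools import islice

def occurrences(title, start, interval_days, not_before):
    """Lazily yield (date, title) for one recurring event, in date order."""
    current = start
    if current < not_before:
        # Jump straight to the first occurrence on or after not_before.
        missed_days = (not_before - current).days
        steps = -(-missed_days // interval_days)  # ceiling division
        current += timedelta(days=steps * interval_days)
    while True:
        yield (current, title)
        current += timedelta(days=interval_days)

def page(events, not_before, offset, size):
    """Return one page of occurrences across all events, sorted by date.

    events: list of (title, start_date, interval_days) tuples.
    """
    streams = [occurrences(t, s, i, not_before) for t, s, i in events]
    merged = heapq.merge(*streams)  # lazy k-way merge of sorted streams
    return list(islice(merged, offset, offset + size))
```

An offset still walks past the skipped occurrences, so for deep scrolling it may be cheaper to pass the date of the last item shown as not_before for the next page.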
I'm sure there's a better way to do this. Any idea/suggestion is greatly appreciated. The most important (also the hardest) thing about this is achieving pagination.
Thanks.

Log analytics using Elasticsearch & Kibana - Few queries

I have just started playing around with ELK to develop our log analytics solution.
I had a few questions regarding the best practices so that I don't make any bad choice to begin with.
This tool will analyze various types of logs to find out and correlate any issue. It will run on multiple 'devices' and each device will be uniquely identifiable with a serial number.
Question 1) Is it possible to create a dashboard where the serial number is taken as a user input?
Details: I would like to have one dashboard that analyzes various fields, and I should be able to specify the serial number of the device as an input. From what I see, I could use a filter, but then the visualization would need to be 'edited'. So it appears to me that, right now, if I need to analyze multiple devices, I need to create a dashboard for each device. The problem is that if I need to modify the dashboard, I will have to make the change in all of them. This can be reduced by importing additional dashboards as a JSON file, but it is still inconvenient.
Is there a better way that I am not aware of?
Question 2) On the main dashboard, I want to show a heatmap of various 'services' and their status as a time series. For example, say I am monitoring CPU, memory, network and our service; then I want to see something like below:
Now, the heatmap visualization doesn't provide a way to uniquely specify the condition. I generated the above image by populating dummy data where the values were one of 0, 1, 2, 3. This means that I need to create such data periodically so that the visualization can use it. Is there any built-in mechanism (scheduled jobs, for example) provided by ELK to do such processing? One option could be to run an external program that queries Elasticsearch, fetches all the relevant information, analyzes it and puts it back into Elasticsearch. Is that the only way?
If there are any other suggestions, please feel free to share. Thanks.

Addressing CRUD "tables" in event sourcing

I'm starting down an ES journey and want to know if traditional support tables should be stored in the event log or should those be handled differently? These tables would typically have a CRUD page. In other words, would it be common to have 2 approaches in the same application, one for support tables and one for transactional data?
A support table would be like "Account" in an accounting application or "Product Type" or the actual "Product" table in an ERP application (I'm not writing an ERP application - that's an example of the type of table I'm talking about).
If we store CRUD-type data in the event log, then we might have events:
ProductCreated
ProductUpdated
ProductDeleted (which would just mark it as deleted)
Then, do we attempt to find out what changed (in ProductUpdated event) and just store the change and replay to get the latest image of the Product?
Mostly, I'm after what approach to use for CRUD tables - traditional or store in the event log? Additional information would be great!
Suppose you start purely with an event log, including for events like ProductCreated, etc., and no other data store. What happens then is that every time your application starts up, it has to replay all the events in the log to build its current state.
Now, suppose you create a traditional SQL table to store the current state of your app (say a products table) and the ID of the last event that was processed to get to that state (say a last_event table). What happens then is every time your app starts up, it has to replay only the events with higher IDs than the stored ID and process those to build its new state.
On the flip side, your app now has to be careful to keep these two states synchronised. If you need concurrency, you'll need to be careful to do only atomic operations on your SQL tables, but that should be reasonably easy with transactions.
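As a rough illustration of the products / last_event idea, here is a sketch using Python's built-in sqlite3 module. The table layout, the event tuple shape (event id, event type, product id, name) and the function names are assumptions made for the example; the point is only that the state change and the last-processed ID are written in the same transaction.

```python
import sqlite3

def open_store():
    """Create an in-memory store with a products table and a one-row last_event table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER)")
    conn.execute("CREATE TABLE last_event (id INTEGER)")
    conn.execute("INSERT INTO last_event VALUES (0)")
    conn.commit()
    return conn

def apply_event(conn, event_id, kind, product_id, name):
    """Apply one event and record it as processed, atomically."""
    with conn:  # commits both statements together, or rolls both back
        if kind in ("ProductCreated", "ProductUpdated"):
            conn.execute(
                "INSERT OR REPLACE INTO products (id, name, deleted) VALUES (?, ?, 0)",
                (product_id, name),
            )
        elif kind == "ProductDeleted":
            # Soft delete: the row stays, it is only marked as deleted.
            conn.execute("UPDATE products SET deleted = 1 WHERE id = ?", (product_id,))
        conn.execute("UPDATE last_event SET id = ?", (event_id,))

def catch_up(conn, event_log):
    """Replay only the events newer than the stored last event ID."""
    (last_id,) = conn.execute("SELECT id FROM last_event").fetchone()
    for event in event_log:
        if event[0] > last_id:
            apply_event(conn, *event)
```

Calling catch_up at startup with the full log replays only the tail that the SQL state has not seen yet.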
Your support tables are just a read-model/projection of the event stream. In general you don't create those support models in case you need them. You create a read-model only if you use it somewhere in the UI.
Anyway, one important benefit behind Event sourcing is that you won't need to use join in your queries. That is, you create a table for each read-model that contains all the data it needs - full denormalisation. You keep that table super-optimised for the query.

Text search for microservice architectures

I am investigating how to implement text search on a microservice-based system. We will have to search for data that spans more than one microservice.
E.g. say we have two services for managing Organisations and managing Contacts. We should be able to search for organisations by contact details in one search operation.
Our preferred search solution is Elasticsearch. We already have a working solution based on embedded objects (and/or parent-child) where when a parent domain is updated the indexing payload is enriched with the dependent object data, which is held in a cache (we avoid making calls to the service managing child directly for this purpose).
I am wondering if there is a better solution. Is there a microservice pattern applicable to such scenarios?
It's not specifically a microservice pattern that I would suggest, but it fits microservices perfectly: it's called event sourcing.
Event sourcing describes an architectural pattern in which events are generated by different sources. An event will now trigger 0 or more so called Projections which then use the data contained in the event to aggregate information in the form it is needed.
This is directly applicable to your problem: whenever the organisation service changes its internal state (an organization is added, removed or updated), it can fire an event. If an organization is added, a projection will, for example, aggregate the contacts for this organization and store this aggregate. The search is now trivial: look up the organization's id in the aggregated information (this can be indexed) and get back the contacts associated with this organization. Of course the same works if contacts are added to the contact service: it just fires a message with the contact creation information, and the corresponding projections now alter different aggregates that can again be indexed and searched quickly.
You can have multiple projections responding to a single event - which enables you to aggregate information in many different forms - exactly the way you'd like to query it later. Don't be afraid of duplicated data: event sourcing takes this trade-off intentionally and since this is not the data your business-services rely on and you do not need to alter it manually - this duplication will not hurt you.
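To make the idea of several projections reacting to one event concrete, here is a small in-memory sketch. The event shape ("ContactAdded" with org_id, name and email fields) and all function names are invented for the example; a real system would persist the log and the aggregates.

```python
from collections import defaultdict

contacts_by_org = defaultdict(list)  # projection 1: organization id -> contact names
org_id_by_email = {}                 # projection 2: contact email -> organization id

def project_contacts_by_org(event):
    if event["type"] == "ContactAdded":
        contacts_by_org[event["org_id"]].append(event["name"])

def project_org_by_email(event):
    if event["type"] == "ContactAdded":
        org_id_by_email[event["email"]] = event["org_id"]

PROJECTIONS = [project_contacts_by_org, project_org_by_email]

def publish(event, log):
    """Append the event to the log, then let every projection update its aggregate."""
    log.append(event)
    for projection in PROJECTIONS:
        projection(event)

def replay(log):
    """Rebuild all aggregates from scratch, e.g. after fixing a buggy projection."""
    contacts_by_org.clear()
    org_id_by_email.clear()
    for event in log:
        for projection in PROJECTIONS:
            projection(event)
```

Each aggregate is shaped for exactly one lookup, which is where the intentional duplication comes from.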
If you store the events in the chronological order they happened (which I seriously advise you to do!), you can 'replay' these events over and over again. This helps, for example, if a projection was buggy and has to be fixed!
If you're interested, I suggest you read up on event sourcing and look for some kind of event store:
event sourcing
event store
We use event sourcing to aggregate an array of different searches in our system, and we aggregate millions of records every day into MongoDB. All projections have their own collection and create their own indexes, and until now we have never had to resort to different systems or patterns like Elasticsearch!
Let me know if this helped!
Amendment
use the data contained in the event to aggregate information in the form it is needed
An event should contain all the information necessary to aggregate more information. For example, if you have an organization creation event, you need to at least provide the organization's name, an ID of some kind, the creation date, the parent organization's ID, etc. As a rule of thumb, we send all the information we gather in the service that gets the request (don't take it directly from the request ;-) check it first, then write it to the event and send it off), because we do not know what we're going to need in the future. Just stay cautious: payloads should not get too large!
We can now have multiple projections responding to this event: one that adds the organization to its parent's aggregate (to get an easy lookup for all children of a given organization), one that just adds it to the search set of all organizations, and maybe a third that aggregates all the parents of a given child organization so the lookup for the parent organizations is easy and fast.
We have the same service process these events that also processes client requests. The motivation is that the schema of the data your projections create is tightly coupled to the way it is read by the service the client interacts with. It does not have to be that way, and it could be separated into two services, but you create an almost invisible dependency there, and releasing these two services independently becomes even more challenging. If you do not mind that additional level of complexity, you can separate the two.
We're currently also considering writing a generic service for aggregating information from events for things like searches, where projections could be scripted. That only makes the invisible dependencies problem less conspicuous, it does not solve it.

Designing a Calendar system like Google Calendar [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have to create something similar to Google Calendar, so I created an events table that contains all the events for a user.
The hard part is handling re-occurring events. Each row in the events table has an event_type field that tells you what kind of event it is, since an event can be for a single date only, or a re-occurring event every x days.
The main design challenge is handling re-occurring events.
When a user views the calendar, using the month's view, how can I display all the events for the given month? The query is going to be tricky, so I thought it would be easier to create another table and create a row for each and every event, including the re-occurring events.
What do you guys think?
I'm tackling exactly this problem, and I had completely spaced on iCalendar (RFC 2445) up until reading this thread, so I have no idea how well this will or won't integrate with that. Anyway, the design I've come up with so far looks something like this:
You can't possibly store all the instances of a recurring event, at least not before they occur, so I simply have one table that stores the first instance of the event as an actual date, an optional expiration, and nullable repeat_unit and repeat_increment fields to describe the repetition. For single instances the repetition fields are null; otherwise the units will be 'day', 'week', 'month' or 'year', and the increment is simply the multiple of units to add to the start date for the next occurrence.
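A minimal sketch of how occurrences could be computed in code from such a row. The function names are hypothetical, and clamping the day for short months (for example Jan 31 becoming Feb 29) is an assumption of this sketch, not something the design above specifies.

```python
import calendar
from datetime import date, timedelta

def nth_occurrence(start, unit, increment, n):
    """Return the n-th occurrence (n = 0 is the first instance itself)."""
    if unit == "day":
        return start + timedelta(days=n * increment)
    if unit == "week":
        return start + timedelta(weeks=n * increment)
    # 'month' and 'year': do the arithmetic in whole months from the start date.
    months = n * increment * (12 if unit == "year" else 1)
    year, month_index = divmod(start.year * 12 + (start.month - 1) + months, 12)
    # Assumption: clamp to the last day of a shorter month.
    day = min(start.day, calendar.monthrange(year, month_index + 1)[1])
    return date(year, month_index + 1, day)

def occurrences_between(start, unit, increment, expires, range_start, range_end):
    """List the occurrences of one event row that fall inside [range_start, range_end]."""
    if unit is None:  # single, non-repeating event
        return [start] if range_start <= start <= range_end else []
    result = []
    n = 0
    while True:
        current = nth_occurrence(start, unit, increment, n)
        if current > range_end or (expires is not None and current > expires):
            return result
        if current >= range_start:
            result.append(current)
        n += 1
```

Computing each occurrence from the original start date, rather than from the previous occurrence, keeps a clamped date from drifting in later months.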
Storing past events only seems advantageous if you need to establish relationships with other entities in your model, and even then it's not necessary to have an explicit "event instance" table in every case. If the other entities already have date/time "instance" data then a foreign key to the event (or join table for a many-to-many) would most likely be sufficient.
To do "change this instance"/"change all future instances", I was planning on just duplicating the events and expiring the stale ones. So to change a single instance, you'd expire the old one at its last occurrence, make a copy for the new, unique occurrence with the changes and without any repetition, and another copy of the original at the following occurrence that repeats into the future. Changing all future instances is similar: you would just expire the original and make a new copy with the changes and repetition details.
The two problems I see with this design so far are:
It makes MWF-type events hard to represent. It's possible, but forces the user to create three separate events that repeat weekly on M,W,F individually, and any changes they want to make will have to be done on each one separately as well. These kind of events aren't particularly useful in my app, but it does leave a wart on the model that makes it less universal than I'd like.
By copying the events to make changes, you break the association between them, which could be useful in some scenarios (or, maybe it would just be occasionally problematic.) The event table could theoretically contain a "copied_from" id field to track where an event originated, but I haven't fully thought through how useful something like that would be. For one thing, parent/child hierarchical relationships are a pain to query from SQL, so the benefits would need to be pretty heavy to outweigh the cost for querying that data. You could use a nested-set instead, I suppose.
Lastly, I think it's possible to compute events for a given timespan using straight SQL, but I haven't worked out the exact details and I think the queries usually end up being too cumbersome to be worthwhile. However, for the sake of argument, you can use the following expression to compute the difference in months between the given month and year and an event:
(:month + (:year * 12)) - (MONTH(occursOn) + (YEAR(occursOn) * 12))
Building on the last example, you could use MOD to determine whether difference in months is the correct multiple:
MOD((:month + (:year * 12)) - (MONTH(occursOn) + (YEAR(occursOn) * 12)), repeatIncrement) = 0
Anyway this isn't perfect (it doesn't ignore expired events, doesn't factor in start / end times for the event, etc), so it's only meant as a motivating example. Generally speaking though I think most queries will end up being too complicated. You're probably better off querying for events that occur during a given range, or don't expire before the range, and computing the instances themselves in code rather than SQL. If you really want the database to do the processing then a stored procedure would probably make your life a lot easier.
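The same month arithmetic, done in code rather than SQL, might look like the sketch below. The function name is hypothetical, and the check that the difference is not negative is an addition of this sketch (the SQL expression above does not exclude months before the first occurrence).

```python
from datetime import date

def occurs_in_month(occurs_on, repeat_increment, year, month):
    """True if an event repeating every repeat_increment months lands in (year, month)."""
    diff = (month + year * 12) - (occurs_on.month + occurs_on.year * 12)
    # Added assumption: months before the first occurrence never match.
    return diff >= 0 and diff % repeat_increment == 0
```

Like the SQL version, this ignores expiration and start / end times.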
As previously stated, don't reinvent the wheel, just enhance it.
Check out VCalendar; it is open source, and comes in PHP, ASP, and ASP.Net (C#)!
Also you could check out Day Pilot which offers a calendar written in Asp.Net 2.0. They offer a lite version that you could check out, and if it works for you, you could purchase a license.
Update (9/30/09):
Unless of course the wheel is broken! Also, you can put a shiny new coat of paint if you like (ie: make a better UI). But at least try to find some foundation to build off of, since the calendar system can be tricky (with Repeating events), and it's been done thousands of times.
Attempting to store each instance of every event seems like it would be really problematic and, well, impossible. If someone creates an event that occurs "every Thursday, forever", you clearly cannot store all the future events.
You could try to generate the future events on demand, and populate the future events only when necessary to display them or to send notification about them. However, if you are going to build the "on-demand" generation code anyway, why not use it all the time? Instead of pulling from the event table, and then having to use on-demand event generation to fill in any new events that haven't been added to the table yet, just use the on-demand event generation exclusively. The end result will be the same. With this scheme, you only need to store the start and end dates and the event frequency.
I don't see any way that you can avoid having on-demand event generation, so I can't see the utility in the event table. If you want it for the sake of caching, then I think you're taking the wrong approach. First, it's a poor cache because you can't avoid on-demand event generation anyway. Second, you should probably be caching at a higher level anyway. If you want to cache, then cache generated pages, not events.
As far as making your polling more efficient, if you are only polling every 15 minutes, and your database and/or server can't handle the load, then you are already doomed. There's no way your database will be able to handle users if it can't handle much, much more frequent polling without breaking a sweat.
I would say start with the iCal standard. If you use it as your model, then you'll be able to do everything that Google Calendar, Outlook, and Mac iCal (the program) do, and get virtually instant integration with them.
From there, time to bone up on your ajax and javascript cuz you can't have a flashy web ui with drag drop and multiple calendars without a ton of ajax and javascript.
You should have a start date, end date, and expiration date. Single-day events would have the same start date and end date, which also allows you to do partial-day events. For re-occurring events, the start and end date would be for the same day but have different times, and then you have an enumeration or table that specifies the repeat frequency (daily, weekly, monthly, etc).
This allows you to say "this event appears every day" for daily, "this event appears on the 2nd day of every week" for weekly, "this event appears on the 5th day of every month" for monthly, "this event appears on the 215th day of every year" for yearly as long as the date is less than the expiration date.
Darren,
That is how I have designed my events table actually, but when thinking about it, say I have 100K users who have created events, this table will be hit pretty hard, especially if I am going to be sending out emails to remind people of their events (events can be for specific times of the day also!), so I could be polling this table every 15 minutes potentially!
This is why I wanted to create another table that would expand out all the events/re-occurring events. This way I can simply hit that table and get the user's month view of events without doing any complicated querying and business logic, AND it will make polling much more efficient also.
The question is, should this secondary table be for the next day or month? What makes more sense? Since the maximum a user can view is a month's view, I am leaning towards a table that writes out all the events for a given month.
(Of course I will have to maintain this secondary table for any edits the user might make to the original events table.)
ChanChan,
I have designed it with the same sort of functionality actually, but I am just referring to how I will go about storing events, specifically how to handle re-occurring events.
The brute-force-ish but still reasonable way would be to create a new row in your single events table for every instance of the recurring event, all pointing not to the event preceding it in the series but to the first event in the series. This simplifies selecting and/or deleting all elements in a particular series, since you can select based on parent id. It also allows users to delete individual items from a series without affecting the rest of them.
This query gets you the series that starts on element 3:
SELECT * FROM events WHERE id = 3 OR parentid = 3
To get all items for this month, assuming you'd have a start date and an end date in your events table, all you'd have to do is:
SELECT * FROM events WHERE startdate >= '2008-08-01' AND enddate <= '2008-08-31'
Handling the creation/modification of series programmatically wouldn't be very difficult, but it really would depend on the feature set you want to provide and how you think it'll be used. If you want to differentiate between series and events, you could have a separate series table and a nullable series_id on your events, allowing you the freedom to muck about with individual events while still retaining control over your series.
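To try the two queries above end to end, here is a small sketch using Python's sqlite3 module with an in-memory database. The column set and sample rows are assumptions for illustration; dates are stored as ISO strings so that plain string comparison sorts them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY, parentid INTEGER, title TEXT, startdate TEXT, enddate TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        (3, None, "Standup", "2008-07-28", "2008-07-28"),  # first event of the series
        (4, 3, "Standup", "2008-08-04", "2008-08-04"),     # later instances point at id 3
        (5, 3, "Standup", "2008-08-11", "2008-08-11"),
        (6, None, "Launch", "2008-08-20", "2008-08-21"),   # unrelated one-off event
    ],
)

# The whole series that starts on element 3.
series = conn.execute(
    "SELECT id FROM events WHERE id = 3 OR parentid = 3 ORDER BY id"
).fetchall()

# Everything that starts and ends within August 2008.
august = conn.execute(
    "SELECT id FROM events "
    "WHERE startdate >= '2008-08-01' AND enddate <= '2008-08-31' ORDER BY id"
).fetchall()

# Deleting one occurrence leaves the rest of the series untouched.
conn.execute("DELETE FROM events WHERE id = 4")
remaining = conn.execute(
    "SELECT id FROM events WHERE id = 3 OR parentid = 3 ORDER BY id"
).fetchall()
```

Note that the month query only returns events that lie entirely inside the month; an event that spans a month boundary would need an overlap condition instead.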
From past experience I would create a new record for each occurring event and then have a column which references the previous event so you can track all events in a series.
This has two advantages:
No complicated routines to work out the next event date
Individual occurrences can be edited without affecting the rest
Hope this gives you some food for thought :)
I have to agree with @ChanChan on reading the iCal spec for how to store these things. There is no easy way to handle recurrences, especially ones that have exceptions. I've built and rebuilt and rebuilt a database to handle this, and I keep coming back to iCal.
It's not a bad idea to create a subordinate table, depending on use cases. The algorithm for calculating exactly when occurrences . . . um, occur . . . can indeed be quite complex. There's no getting away from running it, but it's worth considering caching the results.
@GateKiller
I hadn't thought of the case where you edit individual occurrences. It makes sense you would store the occurrences separately in that case.
When you do that, though, how far in the future do you store events? Do you pick an arbitrary date? Do you auto-generate the new occurrences the first time a user browses out into future months/years?
Also, how do you handle the case where the user wants to edit the whole series. "We've had a meeting every Tuesday morning at 10:30 but we're going to start meeting on Wednesday at 8"
I think I understand your second paragraph to mean you are considering a second events table that has a row for each occurrence of an event. I would avoid that.
Re-occurring events should have a start date and a stop date (which could be Null for events that continue every X days "forever") You'll have to decide what kinds of frequency you want to allow -- every X days, the Nth day of each month, every given weekday, etc.
I'd probably tend toward two tables - one for one time events and a second for recurring events. You'll have to query and display the two separately.
If I were going to tackle this (and I'd try as hard as I can to avoid reinventing this wheel) I'd look for open-source libraries or, at the very least, open source projects with Calendars that you can look at. Any recommendations guys?
undefined wrote:
…this table will be hit pretty hard, especially if I am going to be sending out emails to remind people of their events (events can be for specific times of the day also!), so I could be polling this table every 15 minutes potentially!
Create a table for the notifications. Poll only it.
Update the notification table when events (recurring or otherwise) are updated.
EDIT: A database View might not violate normal forms, if that's a concern. But, you'll probably want to track which notifications were sent and which have not yet been sent somewhere anyway.
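A sketch of what such a notifications table and its poll could look like, again with sqlite3. The schema, the sent flag and the function names are assumptions; computing the send times from the (possibly recurring) event is left to the caller.

```python
import sqlite3

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notifications ("
        "id INTEGER PRIMARY KEY, event_id INTEGER, send_at TEXT, sent INTEGER DEFAULT 0)"
    )
    return conn

def reschedule(conn, event_id, send_times):
    """Replace the unsent notifications of an event after it is created or edited."""
    with conn:
        conn.execute(
            "DELETE FROM notifications WHERE event_id = ? AND sent = 0", (event_id,)
        )
        conn.executemany(
            "INSERT INTO notifications (event_id, send_at) VALUES (?, ?)",
            [(event_id, send_at) for send_at in send_times],
        )

def poll_due(conn, now):
    """Return the event ids whose notifications are due, and mark them as sent."""
    with conn:
        due = conn.execute(
            "SELECT id, event_id FROM notifications "
            "WHERE sent = 0 AND send_at <= ? ORDER BY send_at",
            (now,),
        ).fetchall()
        conn.executemany(
            "UPDATE notifications SET sent = 1 WHERE id = ?",
            [(notification_id,) for notification_id, _ in due],
        )
    return [event_id for _, event_id in due]
```

The poller only ever touches this small table, and the sent flag doubles as the record of which notifications have already gone out.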
Derek Park,
I would be creating each and every instance of an event in a table, and this table would be regenerated every month (so any event that was set to reoccur 'forever' would be regenerated one month in advance using a Windows service or maybe at the SQL Server level).
The polling won't only be done every 15 minutes; that might only be for polls related to email notifications. When someone wants to view their events for a month, I will have to fetch all their events and re-occurring events and figure out which events to display (since a re-occurring event might have been created 6 months ago, but relates to a month the user is viewing).
Zack, I'm not too concerned with having a perfectly normalized database; the fact that I'm thinking of creating a secondary table is already breaking one of the rules, hehe. My core database tables are following 'the rules', but I don't mind creating secondary tables/columns at times when it benefits things performance-wise.
That is how I have designed my events table actually, but when thinking about it, say I have 100K users who have created events, this table will be hit pretty hard, especially if I am going to be sending out emails to remind people of their events (events can be for specific times of the day also!), so I could be polling this table every 15 minutes potentially!
Databases do an exceptional job of handling sets of data, so I wouldn't be too worried about that. What you should do is use this as your primary table, and then as events expire, move them into another table (like an archive).
The next thing you want to do is query the db as little as possible, so move the information into a caching tier (like Velocity) and just persist data to the database.
Then you can partition the information across multiple databases for scaling purposes, i.e. the calendars of users 1-10000 exist on server 1, 10001-20000 exist on server 2, etc.
That's how I would scale a solution like this, but I still think the original solution I proposed is the way to go; it's just how you scale it that becomes the question.
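The range-based partitioning described above comes down to one line of arithmetic. A sketch, with the function name and the default of 10,000 users per server taken as assumptions from the example ranges:

```python
def server_for_user(user_id, users_per_server=10_000):
    """Map a 1-based user id to a 1-based server number by fixed-size id ranges."""
    return (user_id - 1) // users_per_server + 1
```

With these ranges, users 1 to 10000 map to server 1 and users 10001 to 20000 map to server 2.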
The Ra-Ajax Calendar starter-kit features a sample of handling the RenderDate event, which can be used to modify how specific dates are rendered. Though "recurring events" is more of an algorithmic problem, and here I doubt many calendars will help you much...
If anyone is doing Ruby there's a great library Runt that does this kind of thing. Worth checking out. http://runt.rubyforge.org/
