I have the following scenario that I was wondering if it's possible/feasible to implement. I apologize if this is considered an overly "broad" question, but I think SO would be the best place to ask this.
Let us suppose I have a website and I want to display a graph to an end-user. For the purposes of this example, let's say we want to show them "Sales per category" in the past hour. The data would be displayed in a graph, and the SQL to run the query might be something like this:
SELECT category, SUM(revenue) AS revenue
FROM sales
WHERE timestamp > NOW() - INTERVAL 1 HOUR
GROUP BY category
As far as I'm aware, there are two general ways to update the data for the end-user:
Do some sort of polling (or a similar technique) at a certain interval to re-fetch the data from the query. However, this can become quite expensive depending on the complexity/duration of the query and how many people are connected simultaneously.
The second method would be to store all the data in-memory and push the update directly to that memory store (which could be either client-side or server-side), and we could send a WebSocket message to the end user whenever there's a data update. An example of this would be using something like https://github.com/jpmorganchase/perspective.
My question then is whether it's possible at all to do real-time data updating (the case I describe in the second method) when the data is too large to store in memory. I think the answer is a "no", but perhaps I'm missing some ways to do this. For example, let's say I have 1TB of data stored in BigQuery and I am streaming updates to it with new product purchases -- is there a way to push updates to the end-client without having to re-run the query every time I want an update? Are there any other technologies that might be used/useful for this scenario?
Again, I don't think it's possible, but I wanted to see how close to real-time a display for an end-client can get on a queried data set.
If your data is unique per client, big and real-time changing, there is no salvation in using any database or cache as an exchange. You have to send the data update directly.
If you can't directly push data to the client from the process doing the database update, you can probably pass the data from the process doing the update to the process doing the pushes through a message broker (I'll use RabbitMQ as an example).
The optimal configuration for this setup is a topic model, where a topic is a client ID or key, with one listener per connected client for that topic - alternatively, one listener for all clients, registering/unregistering topics dynamically.
Have the websocket handler listen to the topic of its client. Set up the process updating the database to also stream updates to the topic ID of the client. The broker will discard all updates not going to a connected client, making the load more manageable on the listener end.
Without any storage or polling, this solution is low latency. And even with a thousand simultaneous clients, I doubt the broker would ever exhaust memory.
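To make that concrete, here is a minimal sketch assuming RabbitMQ and the RabbitMQ.Client 6.x package (the exchange name, routing key and payload are made up for illustration): the process doing the database update publishes a delta to a topic exchange keyed by client, and the websocket handler binds a queue to its client's key and forwards whatever arrives.

using System;
using System.Text;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

class PushDemo
{
    static void Main()
    {
        var factory = new ConnectionFactory { HostName = "localhost" };
        using var connection = factory.CreateConnection();
        using var channel = connection.CreateModel();
        channel.ExchangeDeclare("sales.updates", ExchangeType.Topic, durable: false);

        // Publisher side (normally the process that just wrote to the database):
        string clientKey = "client.42";   // hypothetical routing key = client ID
        byte[] body = Encoding.UTF8.GetBytes("{\"category\":\"books\",\"revenue\":19.99}");
        channel.BasicPublish(exchange: "sales.updates", routingKey: clientKey, basicProperties: null, body: body);

        // Consumer side (normally the process holding the websocket connection):
        string queue = channel.QueueDeclare().QueueName;   // server-named, exclusive queue
        channel.QueueBind(queue, exchange: "sales.updates", routingKey: clientKey);
        var consumer = new EventingBasicConsumer(channel);
        consumer.Received += (_, ea) =>
        {
            string json = Encoding.UTF8.GetString(ea.Body.ToArray());
            Console.WriteLine($"push to websocket: {json}");   // forward to the connected client here
        };
        channel.BasicConsume(queue, autoAck: true, consumer: consumer);

        Console.ReadLine();
    }
}

In a real deployment the publisher and consumer live in separate processes; they are shown together here only to keep the sketch self-contained.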
Since you are interested in this option I decided to extend the comment to an answer. I will use SQL Server and the C# component SqlTableDependency. You can check it out to see if it fits your needs.
You would create a helper table, e.g. sales_data_watch, where you would put any changes from the sales table (you could also store the precalculated aggregations there, as in your example).
You would create an hourly job which would monitor the changes in the sales table and perform inserts/updates on sales_data_watch.
You would have the C# SqlTableDependency connected to sales_data_watch (note: adapted from the library's example to fit your table):
using System;
using TableDependency.SqlClient;
// Depending on the package version you may also need further usings
// (e.g. for ModelToTableMapper and RecordChangedEventArgs).

public class SaleData
{
    public int Revenue { get; set; }
}

public class Program
{
    private static string _con = "data source=.; initial catalog=MyDB; integrated security=True";

    public static void Main()
    {
        // The mapper object is used to map model properties
        // that do not have a corresponding table column name.
        // If all properties of your model have the same names
        // as the table columns, you can skip the mapper.
        var mapper = new ModelToTableMapper<SaleData>();
        mapper.AddMapping(s => s.Revenue, "Aggregated revenue");

        // Here - as second parameter - we pass the table name:
        // this is only necessary when the model name differs from the table name
        // (in our case SaleData vs sales_data_watch).
        // If needed, you can also specify the schema name.
        using (var dep = new SqlTableDependency<SaleData>(_con, "sales_data_watch", mapper: mapper))
        {
            dep.OnChanged += Changed;
            dep.Start();

            Console.WriteLine("Press a key to exit");
            Console.ReadKey();

            dep.Stop();
        }
    }

    public static void Changed(object sender, RecordChangedEventArgs<SaleData> e)
    {
        var changedEntity = e.Entity;
        Console.WriteLine("DML operation: " + e.ChangeType);
        Console.WriteLine("Revenue: " + changedEntity.Revenue);
    }
}
After all the notifications have been distributed, you could truncate sales_data_watch (if you don't want the table to grow too big, which would slow down the whole process eventually).
This uses only SQL Server and the C# component. There are other, probably better, options - for example Detect record table change with MVC, SignalR, jQuery and SqlTableDependency - to do it differently. That will depend on your preferences.
Edit: a complete example link for Building real-time charts with Angular 5, Google Charts, SignalR Core, .NET Core 2, Entity Framework Core 2 and SqlTableDependency (this link is the first page of three). At the top of the page you can see a real-time Google gauges chart. All credits go to anthonygiretti. You can download the example project from GitHub.
Technologies used
Database
SQL Server; the LocalDB shipped with Visual Studio 2017 is enough to make it work
Front End technologies
Angular 5
Google Charts
Visual Studio Code
SignalR Client
BackEnd technologies
.NET Core 2
SignalR Core
EntityFramework Core
EntityFramework Core for Sql Server
SqlTableDependency
First is to install the components needed - Service Broker, the SQL table, Angular CLI, the Angular 5 project, and the SignalR client (with VS 2017 and the .NET Core 2 SDK installed) - the link is the same, part 1
Next comes the backend setup - part 2
To make it work, this project contains:
A DbContext (GaugesContext.cs) for EntityFramework Core
A Hub (GaugeHub.cs) for SignalR that broadcast data
A Model that contains strongly typed data to send (Gauge.cs)
A Repository exposed with Entity Framework and its Interface (GaugeRepository.cs and IGaugeRepository.cs)
A Subscription to Gauge sql table with SqlTableDependency and its Interface (GaugeDatabaseSubscription.cs and IDatabaseSubscription)
Two Extension methods that extends IServiceCollection (AddDbContextFactory.cs) and IApplicationBuilder (UseSqlTableDependency.cs)
And Startup.cs and Program.cs
The last part is to set up the frontend - part 3
We have:
A folder that contains the gauge chart component (gaugeschart.component.html and gaugeschart.component.ts)
A folder that contains a gauge chart service and a Google Charts base service (google-gauges-chart.service.ts and google-charts.base.service.ts)
A folder that contains environments files
A folder that contains a strongly typed model for the gauge chart (gauge.ts)
Finally at the root of src folder the defaults files components and module (app component files and app module file)
In the next step you should test it to see if the data is projected into the graphs correctly when you change the data.
I think the question might be rooted in an issue with the client's graph and its design requirements.
A "sales in the last hour" graph is both lacking information and hard to update.
Updates need to deduct sales as the "latest hour" progresses (1:05pm turns to 1:06pm) as well as add new sales.
In addition, the information might look exciting, but it provides very little that marketing can use to improve sales (e.g., at which hours more ads should be added).
I would consider a 24 hour graph, or a 12 hour graph divided by actual hours.
This could simplify updates and would probably provide more useful metrics.
This way, updates to the graph are always additive, so no in-memory data-store is required (and the information is more actionable).
For example, every new sale could be published to a "new_sale" channel. The published sale data could include its exact time.
This would allow subscribed clients to add new sales to the correct hour in the graph without ever invoking an additional database call and without requiring an in-memory data-store.
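As a rough sketch of that idea (assuming ASP.NET Core SignalR; the hub, the "new_sale" method name and the payload shape are illustrative, not a prescribed API): whatever code records the sale also broadcasts it with its timestamp, and each connected client drops it into the right hour bucket locally.

using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

public class SalesHub : Hub { }   // clients connect to this hub and listen for "new_sale"

public class SalesService
{
    private readonly IHubContext<SalesHub> _hub;

    public SalesService(IHubContext<SalesHub> hub) => _hub = hub;

    // Call this right after the sale has been written to the database.
    public Task PublishSaleAsync(string category, decimal amount) =>
        _hub.Clients.All.SendAsync("new_sale", new
        {
            Category = category,
            Amount = amount,
            Timestamp = DateTime.UtcNow   // clients use this to place the sale in the correct hour bucket
        });
}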
Related
I am working on a SaaS project where users can create various projects. With each project, they can choose from 5 different plans. Each plan has its own cost per month. Hotjar uses a similar concept.
Now I want to arrange the subscription with Stripe. The problem with that was that a user can only have a maximum of x subscriptions, which of course was a shame. Then I decided to use one subscription that has several plans. But now I have a dilemma: to update the subscription you have to change the quantity via a SubscriptionItem. Then you have to keep track yourself of which plan has which SubscriptionItem ID for which user. That is quite a detour and can cause many problems.
Does someone know a better way with Stripe or other payment software?
You don't necessarily need to store the subscription_item IDs; you can look them up via the subscription item list API. All you need to do is store the subscription_id for your customers, and based on that ID you can retrieve the list of subscription items:
\Stripe\Stripe::setApiKey("sk_test_9GavlLpfiKewqeCBXvRvmVgd");
$items = \Stripe\SubscriptionItem::all(["subscription" => "sub_EQlPGjVj4o5luH"]);
Then you can handle the data part of the returned JSON object and update/delete/etc. these subscription items.
If you only have the customer_id handy, then you can use the subscription list API (with status as well on the GET params) to retrieve the list of active subscriptions.
Theoretically when using event sourcing you don't store "state" but events. But I saw in many implementations that you store snapshots of state in a column in a format like JSON or just a BLOB. For example:
Using an RDBMS as event sourcing storage
The events table has a Data column which stores the entire object. To me, it's like storing the state at the time the event occurred.
Also this picture (taken from Streamstone):
It has a Data column with a serialized state. So it stores state too, but inside an Event?
So how do I replay from the initial state then, if I can simply pick some event and access Data to get the state directly?
What exactly is stored inside Data, is it a state of the entire object or it's serialized event?
Let's say I have a person object (in C#)
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}
What should my event store contain when I create a person or change properties like name or age?
When I create a Person, I will most likely send something like a PersonCreatedEvent with the initial state, so the entire object.
But what if I change Name or Age - should they be 2 separate events or just 1? PersonChangedEvent, or PersonChangedAgeEvent and PersonChangedNameEvent?
What should be stored in the event in this case?
What exactly is stored inside Data, is it a state of the entire object or it's serialized event?
That will usually be a serialized representation of the event.
One way of thinking about it: a stream of events is analogous to a stream of patch documents. Current state is calculated by starting from some known default state and then applying each patch in turn -- aka a "fold". Previous states can be recovered by choosing a point somewhere in the stream, and applying the patches up to that point.
The semantics of events, unlike patches, tends to be domain specific. So Checked-In, Checked-Out rather than Patched, Patched.
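To make the fold concrete, here is a minimal sketch (hypothetical event and state types) that rebuilds current state by applying domain events in order:

using System.Collections.Generic;
using System.Linq;

public abstract record AccountEvent;
public record CheckedIn(decimal Amount) : AccountEvent;
public record CheckedOut(decimal Amount) : AccountEvent;

public record AccountState(decimal Balance)
{
    // Each event is applied as a small "patch" to the previous state.
    public AccountState Apply(AccountEvent e) => e switch
    {
        CheckedIn c  => this with { Balance = Balance + c.Amount },
        CheckedOut c => this with { Balance = Balance - c.Amount },
        _            => this
    };
}

public static class Replay
{
    // Fold: start from a known default state and apply every stored event in turn.
    public static AccountState CurrentState(IEnumerable<AccountEvent> stream) =>
        stream.Aggregate(new AccountState(0m), (state, e) => state.Apply(e));
}

Replaying only a prefix of the stream gives you the state at that earlier point in time.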
We normally keep the events compact - you won't normally record state that hasn't changed.
Within your domain specific event language, the semantics of a field in an event may be "replace" -- for example, when recording a change of a Name, we probably just store the entire new name. In other cases, it may make sense to "aggregate" instead -- with something like an account balance, you might record credits and debits leaving the balance to be derived, or you might update the total balance (like a gauge).
In most mature domains (banking, accounting), the domain language has semantics for recording changes, rather than levels. We write new entries into the ledger, we write new entries into the checkbook register, we read completed and pending transactions in our account statement.
But what if I change Name or Age should they be 2 separate events or just 1? PersonChangedEvent or PersonChangedAgeEvent and PersonChangedNameEvent?
It depends.
There's nothing wrong with having more than one event produced by a transaction.
There's nothing wrong with having a single event schema, that can be re-used in a number of ways.
There's nothing wrong with having more than one kind of event that changes the same field(s). NameLegallyChanged and SpellingErrorCorrected may be an interesting distinction to the business.
Many of the same concerns that motivate task based UIs apply to the design of your event schema.
It still seems to me like PersonChangedEvent will contain all person properties that can change. Even if they didn't change
In messaging (and event design takes a lot of lessons from message design), we often design our schema with optional fields. So the event schema can be super flexible, while any individual representation of an event would be compact (limited to only the information of interest).
To answer your question: an event that is stored should be the event data only, not the object's state.
When you need to work on your entity, you read all the events and apply them to get the latest state every time. So events should be stored with the event data only (of course together with AggregateId, Version, etc.).
The "Objects State" will be the computation of all events, but if you have an Eventlistener that listens to all your published events you can populates a separate ReadModel for you. To query against and use as read only from a users perspective.
Hope it helps!
Updated answer to updated question:
It really depends on your model: if you update Age and Name at the same time, then yes, the new age and name values should be stored in one new event.
The event should only contain this data: name and age, together with AggregateId, Version, etc.
The event listener will listen specifically for each event (created, updated, etc.), find the aggregate's read model that you have stored, and only update these 2 properties (in this example).
For the create event, you create the object for the read model.
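A small sketch of that listener side (all names hypothetical, and the read-model store stubbed with a dictionary): the handler finds the aggregate's read model by ID and updates only the properties carried by the event.

using System;
using System.Collections.Generic;

public record PersonUpdated(Guid AggregateId, long Version, string Name, int Age);

public class PersonReadModel
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";
    public int Age { get; set; }
}

public class PersonProjection
{
    private readonly Dictionary<Guid, PersonReadModel> _store = new();

    public void Handle(PersonUpdated e)
    {
        // Find the read model for this aggregate and apply only the fields carried by the event.
        if (_store.TryGetValue(e.AggregateId, out var model))
        {
            model.Name = e.Name;
            model.Age = e.Age;
        }
    }
}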
I want to create a membership-based site in Umbraco 7; following the umbraco.tv videos and reading through the docs has got me quite far.
My members will have custom properties, firstname, lastname, favourite colours, hats owned etc. I have been adding each of these as custom properties and then assigning them to the tab I want. This works fine and I can then access them from code using:
Members.GetCurrentMember().GetProperty("lastname").Value.ToString();
When I looked in my database I noticed that each of these custom properties is a row in the cmsPropertyData table, linked to the cmsMember table by the nodeId column. Is there a way I can set all of this information to be stored in its own table?
Ideally, I want each Member to have a one-to-many relationship with favourite colours, as well as one-to-many relationships with other tables; each member might have 100 hats, for example. What is the best way for me to set this up? Shall I create custom tables in my Umbraco database for HatsOwned and FavouriteColours, then assign each Member a unique ID so I can set my foreign keys up correctly? That way I would only need to store the Member's unique ID in the cmsPropertyData table. Is there a better way to let Umbraco deal with it? Would I have difficulty retrieving Members using either the Umbraco ORM or EF?
Any help or pointers greatly appreciated!
I would store all data in the PROFILE of the member, in the Umbraco membership - e.g. timezone, hair colour, ... This makes it easy for other developers to find the data.
For all other data, you have a few options:
Relationships
If you want to link nodes to members, or nodes to nodes, or... Relations link 2 Umbraco entities and can be one-way or two-way. If you have a color node, you can link all members to this node. Just create a "favoriteColor" relationship in the developer section, linking nodes to members. Do some programming and you are done. Don't forget that a relation is a database record linking 2 Umbraco entities, so think about some caching if you use this on your front end to take some load off the database. Read more on the Relationship API in the Umbraco documentation.
Content
It's pretty easy to create new nodes from code to store e.g. comments on an article. But because the XML cache is republished every time you create (and publish) a node, don't use content nodes for storing your data if you have a lot of updates.
External data
It is perfectly legitimate to store data outside of Umbraco. Just create your own tables (or connect to any service you created). You could use any ORM you want, but I would recommend PetaPoco. The reason is obvious: Umbraco uses it too, and it will make you a better Umbraco developer. There is a detailed post on Stack Overflow on how to work with external data in Umbraco.
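As a rough illustration (assuming the classic PetaPoco API that Umbraco 7 ships with; the MemberHats table and its columns are made up for the example):

using System.Collections.Generic;
using PetaPoco;

public class Hat
{
    public int HatId { get; set; }
    public int MemberId { get; set; }   // foreign key back to the Umbraco member's node ID
    public string Name { get; set; }
}

public class HatRepository
{
    // "umbracoDbDSN" is the connection string name Umbraco itself uses.
    private readonly Database _db = new Database("umbracoDbDSN");

    public List<Hat> GetHatsForMember(int memberId)
    {
        return _db.Fetch<Hat>("SELECT * FROM MemberHats WHERE MemberId = @0", memberId);
    }
}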
I am building my model using ODataModelBuilder. I am trying to create a navigation property; however, in the metadata I don't see any foreign key indication. In my solution I am not using EF, so there is no ForeignKey attribute. Is it possible to add it by code?
As you clarified in your comment, the reason you want to add foreign key information is that your client application is not including related entities when you query the main entity. I don't think foreign keys are the problem here.
As an example, I'll use two entity types: Customer and Order. Every Customer has some number of associated Orders, so I have a navigation property on Customer called Orders that points to a collection of Orders. If I issue a GET request to /MyService.svc/Customers(1), the server will respond with all of the Customer's information as well as URLs that point to the related Order entities*. I won't, by default, get the data of each related Order within the same payload.
If you want a request to Customers(1) to include all of the data of its associated Orders, you would add the $expand query option to the request URI: /MyService.svc/Customers(1)?$expand=Orders. Using the WCF Data Services client (DataServiceContext), you can do this with .Expand():
DataServiceQuery<Customer> query = context.Customers.Expand("Orders");
However, WebAPI OData doesn't currently support $expand (the latest nightly builds do though, so this will change soon).
The other approach would be to make a separate request to fill in the missing Order data. You can use the LoadProperty() method to do this:
context.LoadProperty(customer, "Orders");
The LoadProperty approach should work with WebAPI as it stands today.
I know this doesn't answer your original question, but I hope addresses your intent.
*In JSON, which is the default format for WebAPI OData services, no links will show up on the wire, but they are still there "in spirit". The client is expected to be able to compute them on its own, which the WCF Data Services Client does.
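For reference, here is a sketch of how a navigation property is declared on ODataModelBuilder without any foreign key attribute (Customer/Order are the illustrative types from above; the exact namespace depends on your Web API OData package version):

using System.Collections.Generic;
using System.Web.Http.OData.Builder;   // Web API OData; the namespace may differ by package version

public class Customer
{
    public int Id { get; set; }
    public ICollection<Order> Orders { get; set; }
}

public class Order
{
    public int Id { get; set; }
}

public static class EdmModelConfig
{
    public static void Register()
    {
        var builder = new ODataModelBuilder();

        builder.EntitySet<Order>("Orders").EntityType.HasKey(o => o.Id);

        var customer = builder.EntitySet<Customer>("Customers").EntityType;
        customer.HasKey(c => c.Id);
        customer.HasMany(c => c.Orders);   // navigation property; no foreign key attribute required

        var model = builder.GetEdmModel(); // pass this model to your OData route registration
    }
}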
We use CRM 4.0 at our institution and have no plans to upgrade presently, as we've spent the last year and a half customising and extending the CRM to work with our processes.
A tiny part of the model is a simple hierarchy: we have a group of learning rooms that has a one-to-many relationship with another entity describing the courses available for that learning room.
Another entity has a list of all potential and enrolled students who have expressed an interest in whichever course.
That bit's all straightforward and works pretty well and is modelled into 3 custom entities.
Now, we've got an Admin application that reads the rooms and then wants to show the courses for that room, but only where there are enrolled students.
In SQL this is simplified to:
SELECT DISTINCT r.CourseName, r.OtherInformation
FROM Rooms r
INNER JOIN Students s
    ON s.CourseId = r.CourseId
WHERE r.RoomId = @RoomId
And this indeed is very close to the eventual SQL that CRM generates.
We use a Crm QueryEntity, a Filter and a LinkEntity to represent this same structure.
The problem now is that CRM normalizes a custom entity into a Base table, which holds the standard CRM entity data that all entities share, and an ExtensionBase table, which holds our customisations. To give flattened access to this, it creates a view that merges both tables.
This view is what the generated SQL uses.
Now the base tables have indices but the view doesn't.
The problem we have is that all we want to do is return Courses where the inner join is satisfied; it's enough to prove there are entries, and CRM makes it SELECT DISTINCT, so we only get one item back per Room.
At first this worked perfectly well, but now that we have thousands of queries it takes well over 30 seconds and of course causes a timeout in anything but SSMS.
I'm given to believe that we can create and alter indices on tables in CRM and that's not considered to be an unsupported modification; but what about views?
I know that if we alter an entity then its views are recreated, which would of course make us redo our indices when this happens.
Is there any way to hint to CRM 4.0 that we want a specific index in place?
Another source recommends that where you get problems like this, it's best to bring data closer together, but this isn't something I'd feel comfortable trying to engineer into our solution.
I had considered putting in a new entity that only has RoomId, CourseId and enrolment count in it, but that smacks of being incredibly hacky too; after all, an index would remove the need to duplicate this data and to have some kind of trigger that updates it after every student operation.
Lastly, whilst I know we're stuck on CRM 4 at the moment, is this the kind of thing that we could expect to have resolved in CRM 2011? It would certainly add more weight to the argument for upgrading this 5-year-old product.
Since views are "dynamic" (conceptually, their contents are generated on-the-fly from the base tables every time they are used), they typically can't be indexed. However, SQL Server does support something called an "indexed view". You need to create a unique clustered index on the view, and the query analyzer should be able to use it to speed up your join.
Someone asked a similar question here and I see no conclusive answer. The cited concerns from Microsoft are referential integrity (a non-issue here) and upgrade complications. You mention the unsupported option of indexing the view and managing it over upgrades and entity changes. That is an option; as unsupported and hackish as it is, it should work.
FetchXML does have aggregation, but the query execution plans still use the views. Here is the SQL generated from a simple select count from incident:
select
    top 5000 COUNT(*) as "rowcount"
    , MAX("__AggLimitExceededFlag__") as "__AggregateLimitExceeded__"
from (select top 50001 case when ROW_NUMBER() over(order by (SELECT 1)) > 50000 then 1 else 0 end as "__AggLimitExceededFlag__" from Incident as "incident0" ...
I don't see a supported solution for your problem.
If you are building an outside admin app and you are hosting CRM 4 on-premise, you could go directly to the database for your query, bypassing the CRM API. Not supported, but it would allow you to solve the problem.
I'm going to add this as a potential answer, although I don't believe it's a sustainable or indeed valid long-term solution.
After analysing the indexes that CRM had defined automatically, I realised that selecting more information in my query would be enough to fulfil the column requirements of an index, and now the query runs in less than a second.