How to model data planning - datamodel

I want to build a data model which supports:
Data history - store every change of data. This is not a problem: http://en.wikibooks.org/wiki/Java_Persistence/Advanced_Topics#History
Data planning - user should be able to prepare a record with validity sometime in the future (for example, I know that customer name changes from May so I prepare record with validity of 1st of May).
How can I do point 2?
How can I do these things together (points 1 & 2)

If you really need point 2 - and I would think very hard about this, because in my experience users will never use it, and you will be spending a lot of effort to support something no one will ever use - anyway, if you really need it, then:
Make no changes at all directly in the table. All changes go through history.
Behind the scenes, periodically you will run a batch updater. This goes through history, finds all unapplied changes (set a status flag in the history to be able to rapidly find them), and applies them, and it checks the date to make sure it is time to apply the change.
You are going to have to deal with merges. What if the user says: In one month my name changes. Then goes in a and changes their name effective today. You have a conflict. How do you resolve it? You can either prevent any immediate changes, until past ones are done (or at least all new changes have a date after the last unapplied one). Or you can change it now, and change it again in a month.

I think storing the change of data is handled in the background, Look into data warehousing and slowly changing dimensions http://en.wikipedia.org/wiki/Slowly_changing_dimension in a Stored Procedure to handle new records and predecessors of those new records which will be known as "expired records". Once you allowed for SCD it's quite easy to find those historic expired records that you're after.

Related

Locking records returned by context? Or perhaps a change to my approach

I'm not sure whether I need a way to lock records returned by the context or simply need a new approach.
Here's the story. We currently have a small number of apps that integrate with our CRM. Some of them open a XrmServiceContext and return a few thousand record to perform updates. These scripts are calling SaveChanges along the way but there will still be accounts near the end that will be saved a couple of minutes after the context return them. If a user updates the record during this time, their changes are overwritten by the script.
Is there a way of locking the records until the context has saved the update back or is there a better approach I should be taking?
Kit
In my opinion, this type of database transaction issue is what CRM is currently lacking the most. There is no way to ensure that someone else doesn't monkey with your data, it's always a last-one-in-wins world in CRM.
With that being said, my suggestion would be to only update the attributes you care about. If you're returning all columns for an entity, when you update that entity, you're possibly going to update all the attributes of the entity, even if you only updated one of them.
If you're dealing with a system were you can't tolerate the last-one-in-wins mentality, then you're probably better off not using CRM.
Update 1
CRM 2015 SP1 and above supports Optimistic Updates. Which allows the use of a version number to ensure that no one has updated the record since you retrieved it.
You have a several options here, it just depends on what you want to do. First of all though, if you can move some of these automated processes to off-time hours, then that's the best option.
Another option would be to retrieve each record 1 by 1 instead of by 1000+.
If you are only updating a percentage of the records retrieved, then you would be better off to check before saving if an update occurred (comparing the modified date). If the modified date changed, then you need to do a single retrieve and then save.
At first thought, I would create a field or status that indicates a pending operation and then use JScript in the form OnLoad event to warn/lock the form. When you process completes, it could clear the flag.

Oracle transaction read-consistency?

I have a problem understanding read consistency in database (Oracle).
Suppose I am manager of a bank . A customer has got a lock (which I don't know) and is doing some updating. Now after he has got a lock I am viewing their account information and trying to do some thing on it. But because of read consistency I will see the data as it existed before the customer got the lock. So will not that affect inputs I am getting and the decisions that I am going to make during that period?
The point about read consistency is this: suppose the customer rolls back their changes? Or suppose those changes fail because of a constraint violation or some system failure?
Until the customer has successfully committed their changes those changes do not exist. Any decision you might make on the basis of a phantom read or a dirty read would have no more validity than the scenario you describe. Indeed they have less validity, because the changes are incomplete and hence inconsistent. Concrete example: if the customer's changes include making a deposit and making a withdrawal, how valid would your decision be if you had looked at the account when they had made the deposit but not yet made the withdrawal?
Another example: a long running batch process updates the salary of every employee in the organisation. If you run a query against employees' salaries do you really want a report which shows you half the employees with updated salaries and half with their old salaries?
edit
Read consistency is achieved by using the information in the UNDO tablespace (rollback segments in the older implementation). When a session reads data from a table which is being changed by another session, Oracle retrieves the UNDO information which has been generated by that second session and substitutes it for the changed data in the result set presented to the first session.
If the reading session is a long running query it might fail because due to the notorious ORA-1555: snapshot too old. This means the UNDO extent which contained the information necessary to assemble a read consistent view has been overwritten.
Locks have nothing to do with read consistency. In Oracle writes don't block reads. The purpose of locks is to prevent other processes from attempting to change rows we are interested in.
For systems that have large number of users, where users may "hold" the lock for a long time the Optimistic Offline Lock pattern is usually used, i.e. use the version in the UPDATE ... WHERE statement.
You can use a date, version id or something else as the row version. Also the virtual columm ORA_ROWSCN may be used but you need to read up on it first.
When a record is locked due to changes or an explicit lock statement, an entry is made into the header of that block. This is called an ITL (interested transaction list). When you come along to read that block, your session sees this and knows where to go to get the read consistent copy from the rollback segment.

(ASP.NET) How would you go about creating a real-time counter which tracks database changes?

Here is the issue.
On a site I've recently taken over it tracks "miles" you ran in a day. So a user can log into the site, add that they ran 5 miles. This is then added to the database.
At the end of the day, around 1am, a service runs which calculates all the miles, all the users ran in the day and outputs a text file to App_Data. That text file is then displayed in flash on the home page.
I think this is kind of ridiculous. I was told they had to do this due to massive performance issues. They won't tell me exactly how they were doing it before or what the major performance issue was.
So what approach would you guys take? The first thing that popped into my mind was a web service which gets the data via an AJAX call. Perhaps every time a new "mile" entry is added, a trigger is fired and updates the "GlobalMiles" table.
I'd appreciate any info or tips on this.
Thanks so much!
Answering this question is a bit difficult since there we don't know all of your requirements and something didn't work before. So here are some different ideas.
First, revisit your assumptions. Generating a static report once a day is a perfectly valid solution if all you need is daily reports. Why hit the database multiple times throghout the day if all that's needed is a snapshot (for instance, lots of blog software used to write html files when a blog was posted rather than serving up the entry from the database each time -- many still do as an optimization). Is the "real-time" feature something you are adding?
I wouldn't jump to AJAX right away. Use the same input method, just move the report from static to dynamic. Doing too much at once is a good way to get yourself buried. When changing existing code I try to find areas that I can change in isolation wih the least amount of impact to the rest of the application. Then once you have the dynamic report then you can add AJAX (and please use progressive enhancement).
As for the dynamic report itself you have a few options.
Of course you can just SELECT SUM(), but it sounds like that would cause the performance problems if each user has a large number of entries.
If your database supports it, I would look at using an indexed view (sometimes called a materialized view). It should support allows fast updates to the real-time sum data:
CREATE VIEW vw_Miles WITH SCHEMABINDING AS
SELECT SUM([Count]) AS TotalMiles,
COUNT_BIG(*) AS [EntryCount],
UserId
FROM Miles
GROUP BY UserID
GO
CREATE UNIQUE CLUSTERED INDEX ix_Miles ON vw_Miles(UserId)
If the overhead of that is too much, #jn29098's solution is a good once. Roll it up using a scheduled task. If there are a lot of entries for each user, you could only add the delta from the last time the task was run.
UPDATE GlobalMiles SET [TotalMiles] = [TotalMiles] +
(SELECT SUM([Count])
FROM Miles
WHERE UserId = #id
AND EntryDate > #lastTaskRun
GROUP BY UserId)
WHERE UserId = #id
If you don't care about storing the individual entries but only the total you can update the count on the fly:
UPDATE Miles SET [Count] = [Count] + #newCount WHERE UserId = #id
You could use this method in conjunction with the SPROC that adds the entry and have both worlds.
Finally, your trigger method would work as well. It's an alternative to the indexed view where you do the update yourself on a table instad of SQL doing it automatically. It's also similar to the previous option where you move the global update out of the sproc and into a trigger.
The last three options make it more difficult to handle the situation when an entry is removed, although if that's not a feature of your application then you may not need to worry about that.
Now that you've got materialized, real-time data in your database now you can dynamically generate your report. Then you can add fancy with AJAX.
If they are truely having performance issues due to to many hits on the database then I suggest that you take all the input and cram it into a message queue (MSMQ). Then you can have a service on the other end that picks up the messages and does a bulk insert of the data. This way you have fewer db hits. Then you can output to the text file on the update too.
I would create a summary table that's rolled up once/hour or nightly which calculates total miles run. For individual requests you could pull from the nightly summary table plus any additional logged miles for the period between the last rollup calculation and when the user views the page to get the total for that user.
How many users are you talking about and how many log records per day?

Client-server synchronization pattern / algorithm?

I have a feeling that there must be client-server synchronization patterns out there. But i totally failed to google up one.
Situation is quite simple - server is the central node, that multiple clients connect to and manipulate same data. Data can be split in atoms, in case of conflict, whatever is on server, has priority (to avoid getting user into conflict solving). Partial synchronization is preferred due to potentially large amounts of data.
Are there any patterns / good practices for such situation, or if you don't know of any - what would be your approach?
Below is how i now think to solve it:
Parallel to data, a modification journal will be held, having all transactions timestamped.
When client connects, it receives all changes since last check, in consolidated form (server goes through lists and removes additions that are followed by deletions, merges updates for each atom, etc.).
Et voila, we are up to date.
Alternative would be keeping modification date for each record, and instead of performing data deletes, just mark them as deleted.
Any thoughts?
You should look at how distributed change management works. Look at SVN, CVS and other repositories that manage deltas work.
You have several use cases.
Synchronize changes. Your change-log (or delta history) approach looks good for this. Clients send their deltas to the server; server consolidates and distributes the deltas to the clients. This is the typical case. Databases call this "transaction replication".
Client has lost synchronization. Either through a backup/restore or because of a bug. In this case, the client needs to get the current state from the server without going through the deltas. This is a copy from master to detail, deltas and performance be damned. It's a one-time thing; the client is broken; don't try to optimize this, just implement a reliable copy.
Client is suspicious. In this case, you need to compare client against server to determine if the client is up-to-date and needs any deltas.
You should follow the database (and SVN) design pattern of sequentially numbering every change. That way a client can make a trivial request ("What revision should I have?") before attempting to synchronize. And even then, the query ("All deltas since 2149") is delightfully simple for the client and server to process.
As part of the team, I did quite a lot of projects which involved data syncing, so I should be competent to answer this question.
Data syncing is quite a broad concept and there are way too much to discuss. It covers a range of different approaches with their upsides and downsides. Here is one of the possible classifications based on two perspectives: Synchronous / Asynchronous, Client/Server / Peer-to-Peer. Syncing implementation is severely dependent on these factors, data model complexity, amount of data transferred and stored, and other requirements. So in each particular case the choice should be in favor of the simplest implementation meeting the app requirements.
Based on a review of existing off-the-shelf solutions, we can delineate several major classes of syncing, different in granularity of objects subject to synchronization:
Syncing of a whole document or database is used in cloud-based applications, such as Dropbox, Google Drive or Yandex.Disk. When the user edits and saves a file, the new file version is uploaded to the cloud completely, overwriting the earlier copy. In case of a conflict, both file versions are saved so that the user can choose which version is more relevant.
Syncing of key-value pairs can be used in apps with a simple data structure, where the variables are considered to be atomic, i.e. not divided into logical components. This option is similar to syncing of whole documents, as both the value and the document can be overwritten completely. However, from a user perspective a document is a complex object composed of many parts, but a key-value pair is but a short string or a number. Therefore, in this case we can use a more simple strategy of conflict resolution, considering the value more relevant, if it has been the last to change.
Syncing of data structured as a tree or a graph is used in more sophisticated applications where the amount of data is large enough to send the database in its entirety at every update. In this case, conflicts have to be resolved at the level of individual objects, fields or relationships. We are primarily focused on this option.
So, we grabbed our knowledge into this article which I think might be very useful to everyone interested in the topic => Data Syncing in Core Data Based iOS apps (http://blog.denivip.ru/index.php/2014/04/data-syncing-in-core-data-based-ios-apps/?lang=en)
What you really need is Operational Transform (OT). This can even cater for the conflicts in many cases.
This is still an active area of research, but there are implementations of various OT algorithms around. I've been involved in such research for a number of years now, so let me know if this route interests you and I'll be happy to put you on to relevant resources.
The question is not crystal clear, but I'd look into optimistic locking if I were you.
It can be implemented with a sequence number that the server returns for each record. When a client tries to save the record back, it will include the sequence number it received from the server. If the sequence number matches what's in the database at the time when the update is received, the update is allowed and the sequence number is incremented. If the sequence numbers don't match, the update is disallowed.
I built a system like this for an app about 8 years ago, and I can share a couple ways it has evolved as the app usage has grown.
I started by logging every change (insert, update or delete) from any device into a "history" table. So if, for example, someone changes their phone number in the "contact" table, the system will edit the contact.phone field, and also add a history record with action=update, table=contact, field=phone, record=[contact ID], value=[new phone number]. Then whenever a device syncs, it downloads the history items since the last sync and applies them to its local database. This sounds like the "transaction replication" pattern described above.
One issue is keeping IDs unique when items could be created on different devices. I didn't know about UUIDs when I started this, so I used auto-incrementing IDs and wrote some convoluted code that runs on the central server to check new IDs uploaded from devices, change them to a unique ID if there's a conflict, and tell the source device to change the ID in its local database. Just changing the IDs of new records wasn't that bad, but if I create, for example, a new item in the contact table, then create a new related item in the event table, now I have foreign keys that I also need to check and update.
Eventually I learned that UUIDs could avoid this, but by then my database was getting pretty large and I was afraid a full UUID implementation would create a performance issue. So instead of using full UUIDs, I started using randomly generated, 8 character alphanumeric keys as IDs, and I left my existing code in place to handle conflicts. Somewhere between my current 8-character keys and the 36 characters of a UUID there must be a sweet spot that would eliminate conflicts without unnecessary bloat, but since I already have the conflict resolution code, it hasn't been a priority to experiment with that.
The next problem was that the history table was about 10 times larger than the entire rest of the database. This makes storage expensive, and any maintenance on the history table can be painful. Keeping that entire table allows users to roll back any previous change, but that started to feel like overkill. So I added a routine to the sync process where if the history item that a device last downloaded no longer exists in the history table, the server doesn't give it the recent history items, but instead gives it a file containing all the data for that account. Then I added a cronjob to delete history items older than 90 days. This means users can still roll back changes less than 90 days old, and if they sync at least once every 90 days, the updates will be incremental as before. But if they wait longer than 90 days, the app will replace the entire database.
That change reduced the size of the history table by almost 90%, so now maintaining the history table only makes the database twice as large instead of ten times as large. Another benefit of this system is that syncing could still work without the history table if needed -- like if I needed to do some maintenance that took it offline temporarily. Or I could offer different rollback time periods for accounts at different price points. And if there are more than 90 days of changes to download, the complete file is usually more efficient than the incremental format.
If I were starting over today, I'd skip the ID conflict checking and just aim for a key length that's sufficient to eliminate conflicts, with some kind of error checking just in case. (It looks like YouTube uses 11-character random IDs.) The history table and the combination of incremental downloads for recent updates or a full download when needed has been working well.
For delta (change) sync, you can use pubsub pattern to publish changes back to all subscribed clients, services like pusher can do this.
For database mirror, some web frameworks use a local mini database to sync server side database to local in browser database, partial synchronization is supported. Check meteror.
This page clearly describes mosts scenarios of data synchronization with patterns and example code: Data Synchronization: Patterns, Tools, & Techniques
It is the most comprehensive source I found, considering whole of delta syncs, strategies on how to handle deletions and server-to-client and client-to-server sync. It is a very good starting point, worth a look.

Designing a Calendar system like Google Calendar [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I have to create something similiar to Google Calendar, so I created an events table that contains all the events for a user.
The hard part is handling re-occurring events, the row in the events table has an event_type field that tells you what kind of event it is, since an event can be for a single date only, OR a re-occuring event every x days.
The main design challenge is handling re-occurring events.
When a user views the calendar, using the month's view, how can I display all the events for the given month? The query is going to be tricky, so I thought it would be easier to create another table and create a row for each and every event, including the re-occuring events.
What do you guys think?
I'm tackling exactly this problem, and I had completely spaced iCalendar (rfc 2445) up until reading this thread, so I have no idea how well this will or won't integrate with that. Anyway the design I've come up with so far looks something like this:
You can't possibly store all the instances of a recurring event, at least not before they occur, so I simply have one table that stores the first instance of the event as an actual date, an optional expiration, and nullable repeat_unit and repeat_increment fields to describe the repetition. For single instances the repition fields are null, otherwise the units will be 'day', 'week', 'month', 'year' and increment is simply the multiple of units to add to start date for the next occurrence.
Storing past events only seems advantageous if you need to establish relationships with other entities in your model, and even then it's not necessary to have an explicit "event instance" table in every case. If the other entities already have date/time "instance" data then a foreign key to the event (or join table for a many-to-many) would most likely be sufficient.
To do "change this instance"/"change all future instances", I was planning on just duplicating the events and expiring the stale ones. So to change a single instances, you'd expire the old one at it's last occurrence, make a copy for the new, unique occurrence with the changes and without any repetition, and another copy of the original at the following occurrence that repeats into the future. Changing all future instances is similar, you would just expire the original and make a new copy with the changes and repition details.
The two problems I see with this design so far are:
It makes MWF-type events hard to represent. It's possible, but forces the user to create three separate events that repeat weekly on M,W,F individually, and any changes they want to make will have to be done on each one separately as well. These kind of events aren't particularly useful in my app, but it does leave a wart on the model that makes it less universal than I'd like.
By copying the events to make changes, you break the association between them, which could be useful in some scenarios (or, maybe it would just be occasionally problematic.) The event table could theoretically contain a "copied_from" id field to track where an event originated, but I haven't fully thought through how useful something like that would be. For one thing, parent/child hierarchical relationships are a pain to query from SQL, so the benefits would need to be pretty heavy to outweigh the cost for querying that data. You could use a nested-set instead, I suppose.
Lastly I think it's possible to compute events for a given timespan using straight SQL, but I haven't worked out the exact details and I think the queries usually end up being too cumbersome to be worthwhile. However for the sake of argument, you can use the following expression to compute the difference in months between the given month and year an event:
(:month + (:year * 12)) - (MONTH(occursOn) + (YEAR(occursOn) * 12))
Building on the last example, you could use MOD to determine whether difference in months is the correct multiple:
MOD(:month + (:year * 12)) - (MONTH(occursOn) + (YEAR(occursOn) * 12), repeatIncrement) = 0
Anyway this isn't perfect (it doesn't ignore expired events, doesn't factor in start / end times for the event, etc), so it's only meant as a motivating example. Generally speaking though I think most queries will end up being too complicated. You're probably better off querying for events that occur during a given range, or don't expire before the range, and computing the instances themselves in code rather than SQL. If you really want the database to do the processing then a stored procedure would probably make your life a lot easier.
As previously stated, don't reinvent the wheel, just enhance it.
Checkout VCalendar, it is open source, and comes in PHP, ASP, and ASP.Net (C#)!
Also you could check out Day Pilot which offers a calendar written in Asp.Net 2.0. They offer a lite version that you could check out, and if it works for you, you could purchase a license.
Update (9/30/09):
Unless of course the wheel is broken! Also, you can put a shiny new coat of paint if you like (ie: make a better UI). But at least try to find some foundation to build off of, since the calendar system can be tricky (with Repeating events), and it's been done thousands of times.
Attempting to store each instance of every event seems like it would be really problematic and, well, impossible. If someone creates an event that occurs "every thursday, forever", you clearly cannot store all the future events.
You could try to generate the future events on demand, and populate the future events only when necessary to display them or to send notification about them. However, if you are going to build the "on-demand" generation code anyway, why not use it all the time? Instead of pulling from the event table, and then having to use on-demand event generation to fill in any new events that haven't been added to the table yet, just use the on-demand event generation exclusively. The end result will be the same. With this scheme, you only need to store the start and end dates and the event frequency.
I don't see any way that you can avoid having on-demand event generation, so I can't see the utility in the event table. If you want it for the sake of caching, then I think you're taking the wrong approach. First, it's a poor cache because you can't avoid on-demand event generation anyway. Second, you should probably be caching at a higher level anyway. If you want to cache, then cache generated pages, not events.
As far as making your polling more efficient, if you are only polling every 15 minutes, and your database and/or server can't handle the load, then you are already doomed. There's no way your database will be able to handle users if it can't handle much, much more frequent polling without breaking a sweat.
I would say start with the ical standard. If you use it as your model, then you'll be able to do everything that google calendar, outlook, mac ical (the program), and get virtually instant integration with them.
From there, time to bone up on your ajax and javascript cuz you can't have a flashy web ui with drag drop and multiple calendars without a ton of ajax and javascript.
You should have a start date, end date, and expiration date. Single day events would have the same start date and end date, and allows you to do partial day events as well. As for re-occuring events, then the start and end date would be for the same day, but have different times, then you have an enumeration or table that specifies the repeat frequency (daily, weekly, monthly, etc).
This allows you to say "this event appears every day" for daily, "this event appears on the 2nd day of every week" for weekly, "this event appears on the 5th day of every month" for monthly, "this event appears on the 215th day of every year" for yearly as long as the date is less than the expiration date.
Darren,
That is how I have designed my events table actually, but when thinking about it, say I have 100K users who have create events, this table will be hit pretty hard, especially if I am going to be sending out emails to remind people of their events (events can be for specific times of the day also!), so I could be polling this table every 15 minutes potentially!
This is why I wanted to create another table that would expand out all the events/re-occuring events, this way I can simple hit that table and get the users months view of events without doing any complicated querying and business logic, AND it will make polling much more effecient also.
The question is, should this secondary table be for the next day or month? What makes more sense? Since the maximum a user can view is a months view, I am leaning towards a table that writes out all the events for a given month.
(ofcourse I will have to maintain this secondary table for any edits the user might make to the original events table).
ChanChan,
I have designed it with the same sort of functionality actually, but I am just referring to how I will go about storing events, specifically how to handle re-occurring events.
The brute-force-ish but still reasonable way would be to create a new row in your single events table for every instance of the recurring event, all pointing not to the event preceding it in the series but to the first event in the series. This simplifies selecting and/or deleting all elements in a particular series, since you can select based on parent id. It also allows users to delete individual items from a series without affecting the rest of them.
This query gets you the series that starts on element 3:
SELECT * FROM events WHERE id = 3 OR parentid = 3
To get all items for this month, assuming you'd have a start date and an end date in your events table, all you'd have to do is:
SELECT * FROM events WHERE startdate >= '2008-08-01' AND enddate <= '2008-08-31'
Handling the creation/modification of series programmatically wouldn't be very difficult, but it really would depend on the feature set you want to provide and how you think it'll be used. If you want to differentiate between series and events, you could have a separate series table and a nullable series_id on your events, allowing you the freedom to muck about with individual events while still retaining control over your series.
From past experience I would create a new record for each occurring event and then have a column which references the previous event so you can track all events in a series.
This has two advantages:
No complicated routines to work out the next event date
Individual occurrences can be edited without effecting the rest
Hope this gives you some food for thought :)
I have to agree with #ChanChan on reading the ical spec for how to store these things. There is no easy way to handle recurrences, especially ones that have exceptions. I've built and rebuilt and rebuilt a database to handle this, and I keep coming back to ical.
It's not a bad idea to create a subordinate table, depending on use cases. The algorithm for calculating exactly when occurrences . . . um, occur . . . can indeed be quite complex. There's no getting away from running it, but it's worth considering caching the results.
#GateKiller
I hadn't thought of the case where you edit individual occurrences. It makes sense you would store the occurrences separately in that case.
When you do that, though, how far in the future do you store events? Do you pick an arbitrary date? Do you auto-generate the new occurrences the first time a user browses out into future months/years?
Also, how do you handle the case where the user wants to edit the whole series. "We've had a meeting every Tuesday morning at 10:30 but we're going to start meeting on Wednesday at 8"
I think I understand your second paragraph to mean you are considering a second events table that has a row for each occurrence of an event. I would avoid that.
Re-occurring events should have a start date and a stop date (which could be Null for events that continue every X days "forever") You'll have to decide what kinds of frequency you want to allow -- every X days, the Nth day of each month, every given weekday, etc.
I'd probably tend toward two tables - one for one time events and a second for recurring events. You'll have to query and display the two separately.
If I were going to tackle this (and I'd try as hard as I can to avoid reinventing this wheel) I'd look for open-source libraries or, at the very least, open source projects with Calendars that you can look at. Any recommendations guys?
undefined wrote:
…this table will be hit pretty
hard, especially if I am going to be
sending out emails to remind people of
their events (events can be for
specific times of the day also!), so I could be polling this table every 15 minutes potentially!
Create a table for the notifications. Poll only it.
Update the notification table when events (recurring or otherwise) are updated.
EDIT: A database View might not violate normal forms, if that's a concern. But, you'll probably want to track which notifications were sent and which have not yet been sent somewhere anyway.
Derek Park,
I would be creating each and every instance of an event in a table, and this table would be regenerated every month (so any event that was set to reoccurr 'forever' would be regenerated one month in advance using a windows service or maybe at the sql server level).
The polling won't only be done every 15 minutes, that might only be for polls related to email notifications. When someone wants to view their events for a month, I will have to fetech all their events, and re-occuring events and figure out which events to display (since a re-occuring event might have been created 6 months ago, but relates to a month the user is viewing).
Zack, i'm not too concerned with having a perfectly normalized database, the fact that I'm thinking of creating a secondary table is already breaking one of the rules hehe. My core database tables are following 'the rules', but I don't mind creating secondary tables/columns at times when it benefits things performance wise.
That is how I have designed my events table actually, but when thinking about it, say I have 100K users who have create events, this table will be hit pretty hard, especially if I am going to be sending out emails to remind people of their events (events can be for specific times of the day also!), so I could be polling this table every 15 minutes potentially!
Databases do exception jobs of handling sets of data, so i wouldn't be too worried about that. What you should do is use this as your primary table, and then as events expire then move them into another table (like an archive).
The next thing is you want to try is to query the db as little as possible, so move the information into a caching tier (like velocity) and just persist data to the database.
Then, you can partition the information across multiple databases for scaling purposes. ie users 1-10000 calendars exist on server 1, 10001 - 20000 exist on server 2, etc.
That's how i would scale a solution like this, but i still think the original solution i proposed is the way to go, it's just how you scale it that becomes the question.
The Ra-Ajax Calendar starter-kit features a sample of handling the RenderDate event which can modify the dates of specifically. Though the "recurring events" is more of an algorithmic thing and here I doubt very few calendars will help you much...
If anyone is doing Ruby there's a great library Runt that does this kind of thing. Worth checking out. http://runt.rubyforge.org/

Resources