I have a page that will display all available data of a certain kind to all users. The data will be displayed separated by a number of criteria and I'm pondering certain design questions.
To make matters easier to understand, say I have sales data per month, per category and per location. On the page I will create an accordion for each month, within which I will have one table per category and, in each table, a list of locations.
So I'm wondering which is better:
1) a single controller method that fetches all the data and:
a) does the work of converting the tabular format returned from the database to a hierarchical structure (because this is easier for the front-end to navigate) like:
{ Month, { Category, { Location, Value } } }
b) returns tabular data like
{ Month, Category, Location, Value }
and lets jQuery on the front end loop through it to make it hierarchical
2) many smaller methods that each return distinct data and that the front end has to call? For example, a method that returns the distinct list of months for which there is data would be called once, but jQuery would then need to loop through the results to query for the categories, which would themselves be looped through to get the locations, sort of like this:
// for...of iterates over the returned values (for...in would yield array indices)
for (const m of GetMonths()) {
    for (const c of GetCategories(m)) {
        GetLocations(m, c);
    }
}
As a final note, by "better" I mean both that the system will perform better under a heavy load and that the code is structured in a more maintainable and DRY manner.
Thank you for your consideration.
Without actual performance numbers for the individual queries the answer will have to contain quite a bit of conjecture. But here are some generalizations.
If the data is being presented to "all users", then your most important decision is going to be to cache the data. Once it is cached, the server-side performance of the code becomes much less important. Not unimportant, but if you're going to serve it hundreds, thousands or millions of times for each time it is generated, you can tolerate a little more server-side work.
For scalability, prefer fewer calls to the server over more calls. That points to the single controller method that returns all the data. Unless fetching all the data is dramatically more time-consuming than fetching just a single month of data, that would be your better choice.
As for transforming the data into the hierarchy on the server vs. on the client, I tend to prefer doing it on the client. If there is a lot of this data, your client experience may be better if it is done on the server, but doing it on the client has a couple of advantages. First, client-side code is automatically distributed. If you are trying to get the highest throughput, moving logic, especially presentation logic, to the client will free up the servers to do other work.
Second, by doing the transformation on the client, your server method is not as dependent on the view. If you later decide that a different display would be better, it may just be a view change.
But, back to the beginning of the answer, if you are going to create this data once and cache it, doing the transformation at the server eliminates the need for each client to transform it.
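As a rough illustration of the tabular-to-hierarchical transformation discussed above, here is a minimal Python sketch. The question itself uses jQuery and a server-side controller, so this is only a language-neutral picture of the grouping step; the function name `to_hierarchy` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def to_hierarchy(rows):
    """Group flat (month, category, location, value) rows into
    {month: {category: {location: value}}} in a single pass."""
    tree = defaultdict(lambda: defaultdict(dict))
    for month, category, location, value in rows:
        tree[month][category][location] = value
    # Convert the nested defaultdicts to plain dicts so the result
    # serializes cleanly (e.g. to JSON) and does not auto-create keys.
    return {m: dict(cats) for m, cats in tree.items()}


# Hypothetical sample data in the tabular shape from the question.
rows = [
    ("2012-01", "Books", "London", 120),
    ("2012-01", "Books", "Paris", 80),
    ("2012-01", "Music", "London", 45),
    ("2012-02", "Books", "London", 130),
]
hierarchy = to_hierarchy(rows)
```

The grouping is one linear pass over the rows, so whether it runs on the server or the client is mostly a question of where you want the work (and the cached result) to live, not of algorithmic cost.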
My application has to handle a lot of entities (100,000 or more) with locations and needs to display only those within a given radius. I basically store everything in SQL but use Redis for caching and optimization (mainly GEORADIUS).
I am adding the entities like the following example (not exactly this, I use Laravel framework with the built-in Redis facade but it does the same as here in the background):
GEOADD k 19.059982 47.494338 {\"id\":1,\"name\":\"Foo\",\"address\":\"Budapest, Astoria\",\"lat\":47.494338,\"lon\":19.059982}
Is this bad practice? Will it have a negative impact on performance? Should I store only IDs as members and make a follow-up query to get the corresponding entities?
This is a matter of requirements. There's nothing wrong with storing the raw data as members as long as each one is unique (and it is unique, given the "id" field). In fact, this is both simple and performant, as all data is returned with a single query (assuming that's what's actually needed).
That said, there are at least three considerations in favor of storing the data outside the Geoset and just "referencing" it by having members reflect some form of their key names:
A single data structure, such as a Geoset, is limited by the resources of a single Redis server. Storing a lot of data and members can require more memory than a single server can provide, which would limit the scalability of this approach.
Unless each entry's data is small, it is unlikely that all query types would require all data returned. In such cases, keeping the raw data in the Geoset generates a lot of wasted bandwidth and ultimately degrades performance.
When data needs to be updated, it can become too expensive to update small parts of it (i.e. ZREM and then GEOADD the whole member again). Having everything outside, perhaps in a Hash (or maybe something like RedisJSON), makes more sense then.
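To make the "reference by key name" idea concrete, here is a small Python sketch that only builds the Redis commands as argument tuples, so it runs without a server or client library. The key names `entities:geo` and `entity:<id>` and the helper `commands_for_entity` are assumptions made up for this example, not something from the question.

```python
def commands_for_entity(entity, geo_key="entities:geo"):
    """Return the Redis commands (as argument tuples) that store one entity
    by reference: its key name goes into the Geoset, its fields into a Hash."""
    entity_key = f"entity:{entity['id']}"  # hypothetical key naming scheme
    # GEOADD takes longitude first, then latitude, then the member.
    geoadd = ("GEOADD", geo_key, entity["lon"], entity["lat"], entity_key)
    # HSET key field value [field value ...]
    hset = ["HSET", entity_key]
    for field, value in entity.items():
        hset += [field, value]
    return [geoadd, tuple(hset)]


entity = {"id": 1, "name": "Foo", "address": "Budapest, Astoria",
          "lat": 47.494338, "lon": 19.059982}
commands = commands_for_entity(entity)
```

With this layout a GEORADIUS query returns only the short key names, and the caller then reads just the fields it needs with HMGET or HGETALL. Changing the name of an entity becomes a single HSET that never touches the Geoset.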
My task is to dump entire Azure tables with arbitrary unknown schemas. Standard code to do this resembles the following:
TableQuery<DynamicTableEntity> query = new TableQuery<DynamicTableEntity>();
foreach (DynamicTableEntity entity in table.ExecuteQuery(query))
{
    // Write a dump of the entity (row).
}
Depending on the table, this works at a rate of 1000-3000 rows per second on my system. I'm guessing this (lack of) performance has something to do with separate HTTP requests issued to retrieve the data in chunks. Unfortunately, some of the tables are multi-gigabyte in size, so this takes a rather long time.
Is there a good way to parallelize the above or speed it up some other way? It would seem that those HTTP requests could be sent by multiple threads, as in web crawlers and the like. However, I don't see an immediate method to do so.
Unless you know the PartitionKeys of the entities in the table (or some other query criteria that include PartitionKey), AFAIK you have to take the top-down approach you're using right now. For parallel queries to work efficiently, each query has to include a PartitionKey.
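If the partition keys are known, the parallel approach could look roughly like the Python sketch below (the question uses C#, so treat this as a shape rather than a drop-in). `fetch_rows` is a hypothetical callable standing in for whatever storage client call runs a filtered query, and the fake table exists only so the sketch runs without any Azure dependency.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_filter(partition_key):
    """Build an OData filter restricting a query to one partition."""
    escaped = partition_key.replace("'", "''")  # OData escapes ' by doubling it
    return f"PartitionKey eq '{escaped}'"


def dump_table(partition_keys, fetch_rows, max_workers=8):
    """Run one filtered query per partition in parallel.

    fetch_rows is a hypothetical callable that takes a filter string and
    returns the matching rows; in real code it would wrap the storage client.
    """
    filters = [partition_filter(pk) for pk in partition_keys]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves the input order of the partitions.
        results = pool.map(fetch_rows, filters)
        return [row for rows in results for row in rows]


# Stand-in data source (assumption): maps a filter string to its rows.
fake_table = {"PartitionKey eq 'A'": ["a1", "a2"], "PartitionKey eq 'B'": ["b1"]}
rows = dump_table(["A", "B"], lambda f: fake_table.get(f, []))
```

Threads are a reasonable fit here because the work is dominated by waiting on HTTP responses, not by CPU.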
So I was thinking... Imagine you have to write a program that would represent a schedule of a whole college.
That schedule has several dimensions (e.g.):
time
location
individual(s) attending it
lecturer(s)
subject
You would have to be able to display the schedule from several standpoints:
everything held in one location in a certain timeframe
everything attended by an individual in a certain timeframe
everything taught by a certain lecturer in a certain timeframe
etc.
How would you save such data, and yet keep the ability to view it from different angles?
The only way I could think of was to save it in every form you might need:
E.g. you have a folder "students" in which each student has a file that says when, why and where he has to be. However, you also have a folder "locations" in which each location has a file that says who has to be there, why and when. The more angles you have, the worse the size-per-info ratio gets.
But that seems highly inefficient, space-wise.
Is there any other way?
My knowledge of JavaScript is zero, but I wonder if such a thing would be possible with it, even in this space-inefficient form.
If not, I wonder if it would work in any other standard language (C++, C#, Java, etc.), primarily Java...
EDIT: Could this be done with a MySQL database?
Basically, you are trying to first store data and then present it under different views.
SQL databases were made exactly for that. On one side you build a schema and instantiate it in a database to store your data (the language for this is called Data Definition Language, DDL); on the other you run requests against it with the query language (SQL) to produce what you call "views". SQL databases even have "view" objects that let you define these views inside the database (rather than keeping the code of the request in the application code).
MySQL can do that for sure. Note that it is also possible to compile some SQL engines to JavaScript (SQLite, for example) and use local web storage to hold the data.
There is another aspect to your question: optimization of the queries. While SQL can do most of the request work for your views, it is sometimes preferable to store actual copies of the request results in so-called "data marts" (this is called denormalizing), so that the hard work of selecting rows or computing aggregate/group functions is done once per period of time (imagine that a specific view changes only on Mondays), and requesters then just have to read these results. In this case it is important to separate, at least semantically, primary data from secondary data (and for performance or user-rights reasons, physical separation is often a good idea).
Note that since you mentioned MySQL, I wrote about SQL, but almost any database technology (hierarchical, object-oriented, XML...) can do what you are looking for, as long as the particular implementation you use is flexible enough for your data and requests.
So in short:
I would use a SQL database to store the data
make appropriate views / requests
if I need huge request performance, make appropriate de-normalized data available
the language is not important there, any will do
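To show what "store once, view from several angles" might look like, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names, column names and sample rows are invented for illustration; a real schedule would need more tables (for example for lecturers and rooms).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class_session (
        id INTEGER PRIMARY KEY,
        subject TEXT, location TEXT, lecturer TEXT,
        starts_at TEXT  -- ISO 8601 text, so string comparison follows time order
    );
    CREATE TABLE attendance (
        session_id INTEGER REFERENCES class_session(id),
        student TEXT
    );
    INSERT INTO class_session VALUES
        (1, 'Algebra', 'Room 101', 'Dr. Smith', '2024-03-04T09:00'),
        (2, 'Physics', 'Room 101', 'Dr. Jones', '2024-03-04T11:00'),
        (3, 'Algebra', 'Room 202', 'Dr. Smith', '2024-03-05T09:00');
    INSERT INTO attendance VALUES (1, 'alice'), (2, 'alice'), (3, 'bob');
""")

# View 1: everything held in one location within a time frame.
by_location = conn.execute(
    "SELECT subject FROM class_session "
    "WHERE location = ? AND starts_at BETWEEN ? AND ? ORDER BY starts_at",
    ("Room 101", "2024-03-04T00:00", "2024-03-04T23:59"),
).fetchall()

# View 2: everything attended by one student, from the same stored rows.
by_student = conn.execute(
    "SELECT s.subject, s.location FROM class_session AS s "
    "JOIN attendance AS a ON a.session_id = s.id "
    "WHERE a.student = ? ORDER BY s.starts_at",
    ("alice",),
).fetchall()
```

Each fact is stored exactly once; the different "angles" are just different queries, so nothing has to be duplicated per view unless you later denormalize for performance.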
This question is likely subjective, but a lot of "grid" JavaScript plugins have come out to help paginate and sort tables. They usually work in one of two ways. The first and simplest is to take an existing HTML <table> and convert it into a sortable and searchable table. The second is to pass the request to the server and have the server select the rows to be displayed from the database.
My question is this: At what point (size wise) is it more efficient to use server-side processing vs displaying all the data and have the "grid plugin" convert it to a sortable/searchable table client-side?
Using DataTables as an example, I have to execute at least 3 queries to get the total rows in the table, the total filtered results for pagination, and the filtered results to be displayed for the selected page. Then every time I sort, I am querying again. Every time I move to another page or search in the table, more queries.
If I were to pull the data once when the client visits the page, I would be executing a single query, and then formatting and pushing the results to the client all at once. This increases the page size, and possibly delays loading of the page once it gets too big. The upside is that there will be only one query, and all the sorting, searching, and pagination is handled by the plugin, so no waiting for a response and no more queries.
If I had just a few rows, I imagine just pushing the formatted table data to the client at page load would be the fastest. But with thousands of rows, switching to server-side would be the most efficient way.
Where is the tipping point? Is there a tipping point, or is server-side or client-side the way to go 100% of the time?
The answer to your question can only be subjective. So I will explain how I personally understand the problem and give my recommendation.
In my opinion, data with 2-3 rows and 3-4 columns can be displayed in a plain HTML table without any plugin. The more data you display, the more important it becomes that the user can grasp the information being displayed. So I think the information has to be well formatted and marked with colors and icons, for example. This will help the user grasp information from perhaps 10 rows of data, but not much more. If you just display a table with 100 rows or more, you overtax the user. The user will have to analyse the data to get any helpful information out of the table, and scrolling through the data does not make this easier.
So I think one should give the user a comfortable, or at least convenient, interface to sort and filter the data in the table. The exact interface is mostly a matter of taste. For example, the grid can have an additional filter bar.
For filtering and even for sorting the data, it's important to work not with pure strings but to be able to distinguish data types such as integers (10 should come after 9 and not between 1 and 2), numbers (correctly interpreting '.' and ',' inside numbers), dates (3/20/2012 should be greater than 4/15/2010) and so on. If you just convert an HTML table to some grid, you will have problems with correct filtering or sorting. Even if you use pure local JavaScript data to display in the grid, it is important to have a data source that carries some kind of type information and to create the grid based on that data. In that case you can provide a date as a JavaScript Date or as an ISO 8601 string "2012-03-20", and have the grid display it according to the specified formatter as 3/20/2012 or 20-Mar-2012.
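The difference between sorting raw strings and sorting typed values is easy to demonstrate. The sketch below uses Python rather than a grid plugin, and the helper `parse_us_date` is a made-up name; the sample values are the ones mentioned above.

```python
from datetime import datetime

raw_numbers = ["10", "9", "1", "2"]
raw_dates = ["3/20/2012", "4/15/2010"]

# Sorting the raw strings compares character by character.
numbers_as_text = sorted(raw_numbers)
# Sorting with a typed key compares the actual values.
numbers_typed = sorted(raw_numbers, key=int)


def parse_us_date(text):
    """Parse a month/day/year string such as 3/20/2012."""
    return datetime.strptime(text, "%m/%d/%Y")


dates_as_text = sorted(raw_dates)
dates_typed = sorted(raw_dates, key=parse_us_date)

# Keep the typed value as the data and format only for display.
iso = parse_us_date("3/20/2012").date().isoformat()
```

As text, "10" lands between "1" and "2" and the 2012 date sorts before the 2010 one; with typed keys both orders come out right. This is why a data source with type information beats scraping cell text out of an HTML table.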
Whether you implement filtering, sorting and paging on the server side or on the client side is not really important to the user who opens the page. What matters is only that everything works quickly enough. The exact choice of grid plugin, the filtering (with a filter toolbar or external controls) and the styling of the grid depend on your taste and the project requirements.
I need to load a model consisting of roughly 20 tables from the database with Entity Framework.
So there are probably a few ways of doing this:
1) Use one huge Include call
2) Use many Include calls while manually iterating the model
3) Use many IsLoaded and Load calls
Here's what happens with each option:
1) EF creates a HUGE query, which puts a very heavy load on the DB and then again on mapping the model. So not really an option.
2) The database gets called a lot, again with pretty big queries.
3) The database gets called even more, but this time with small loads.
All of these options weigh heavily on performance. I do need to load all of that data (calculations for drawing).
So what can I do?
a) Heavy operation => heavy load => do nothing :)
b) Review design => but how?
c) A magical option that will make all these problems go away
When you need to load a lot of data from a lot of different tables, there is no "magic" solution which makes all problems go away. But in addition to what you have already discussed, you should consider projection. If you don't need every single property of an entity, it is often cheaper to project only the information you do need, i.e.:
from parent in MyEntities.Parents
select new
{
    // Project only the columns that are actually needed.
    ParentName = parent.Name,
    Children = from child in parent.Children
               select new
               {
                   ChildName = child.Name
               }
}
One other thing to keep in mind is that for very large queries, the cost of compiling the query can often exceed the cost of executing it. Only profiling can tell you if this is the problem. If this turns out to be the problem, consider using CompiledQuery.
You might analyze the ratio of queries to updates. If you mostly upload the model once and everything after that is a query, then maybe you should store an XML representation of the model in the database as a "shadow" of the model. You should be able to either read the entire XML column in at once fairly quickly, or else maybe do your calculations (or at least the fetch of the values necessary for the calculations) using XQuery.
This assumes SQL Server 2005 or above.
You could consider caching your data in memory instead of getting it from the database each time.
I would recommend Enterprise Library Caching Application block: http://msdn.microsoft.com/en-us/library/dd203099.aspx
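For a sense of what in-memory caching buys you, here is a minimal, generic sketch in Python. It is not the Enterprise Library API; the class `TtlCache`, the loader `load_model` and the 300-second lifetime are all assumptions chosen for illustration.

```python
import time


class TtlCache:
    """Tiny in-process cache: values expire ttl_seconds after being loaded."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the database entirely
        value = loader()  # expensive call, e.g. the multi-table query
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical loader standing in for the expensive database read.
calls = []


def load_model():
    calls.append(1)
    return {"tables": 20}


cache = TtlCache(ttl_seconds=300)
first = cache.get_or_load("model", load_model)
second = cache.get_or_load("model", load_model)  # served from memory
```

The second call never reaches the loader, so the heavy query runs once per expiry window instead of once per request. The trade-off is staleness: updates to the model are not visible until the entry expires or is explicitly replaced.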