Pagination and sorting of large amounts of data - performance

I am wondering how to properly implement a sorting and pagination mechanism in an application that displays data in tables. Let's assume we have an entity with an id and a description, and there are many instances of it in the database. I would like to sort alphabetically by description, but I want the result fast. Is it possible to do this without fetching all of the records from the database, sorting them, and then displaying only part of them? What is the best approach to this problem from a performance point of view?
My question is rather hypothetical and does not pertain to any particular language or framework.

It can be done in two passes.
The first pass returns only the ids of the entities, sorted by whatever criteria you need. The list of ids is kept in memory.
The second pass takes one page, i.e. a sublist of ids, and fetches the full entities from the database for presentation.
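The two passes described above could be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the table name `entity`, the index, the demo data and the helper names are assumptions made for the example, not part of the original answer.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical in-memory table with an id and a description."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, description TEXT)")
    # An index on the sort column lets the database return ids in order cheaply.
    conn.execute("CREATE INDEX idx_entity_description ON entity (description)")
    # Descriptions run opposite to ids so that sort order differs from id order.
    rows = [(i, f"item {100 - i:03d}") for i in range(1, 101)]
    conn.executemany("INSERT INTO entity (id, description) VALUES (?, ?)", rows)
    return conn


def fetch_sorted_ids(conn):
    """Pass 1: fetch only the ids, sorted by the database."""
    cur = conn.execute("SELECT id FROM entity ORDER BY description, id")
    return [row[0] for row in cur]


def fetch_page(conn, sorted_ids, page, page_size):
    """Pass 2: fetch the full rows for one page of the cached id list."""
    page_ids = sorted_ids[page * page_size:(page + 1) * page_size]
    if not page_ids:
        return []
    placeholders = ",".join("?" for _ in page_ids)
    cur = conn.execute(
        f"SELECT id, description FROM entity WHERE id IN ({placeholders})",
        page_ids,
    )
    by_id = {row[0]: row for row in cur}
    # IN (...) does not guarantee order, so restore the order of the id list.
    return [by_id[i] for i in page_ids]


conn = build_demo_db()
ids = fetch_sorted_ids(conn)          # kept in memory between page requests
second_page = fetch_page(conn, ids, page=1, page_size=10)
```

The cached id list is much smaller than the full rows, but it still grows with the table and goes stale when rows are inserted, deleted or renamed, so it would need to be refreshed at some point.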

Related

Is it OK to have multiple merge steps in an Excel Power query?

I have data from multiple sources - a combination of Excel (table and non table), csv and, sometimes, even a tsv.
I create a query for each data source and then bring them together one at a time. Actually, each one takes two steps: a merge, and then an expand to bring in the fields I want from that data source.
This doesn't feel very efficient, and I think that maybe I should just be joining everything together in the Data Model. The problem when I tried that was that I couldn't find a way to write a single query to access all the different fields spread across the different data sources.
If it were Access, I'd have no trouble creating a single query once I'd created all the relationships between my tables.
I feel as though I'm missing something: How can I build a single query out of the data model?
Hoping my question is clear. It feels like something that should be easy to do but I can't home in on it with a Google search.
It is never a good idea to push the heavy lifting downstream in Power Query. If you can, work with database views, not full tables, use a modular approach (several smaller queries that you then connect in the data model), filter early, remove unneeded columns etc.
The more work that has to be performed on data you don't really need, the slower the query will be. Please take a look at this article and this one, the latter one having a comprehensive list for Best Practices (you can also just do a search for that term, there are plenty).
In terms of creating a query from the data model, conceptually that makes little sense, as you could conceivably create circular references galore.

How to keep order of records in database

I'm developing an app where records appear in a certain order. Users are allowed to reorder records as they wish, and I need to store that.
I have an order number for each record, but when users reorder records, that affects all the records that come after the moved record, which could be quite an expensive database operation.
Is there a clever way of storing a record's order number so that it doesn't affect many of the other records?
I have written a web application with similar requirements at a high level. I added two fields to a document which contained metadata about the user-sortable list:
SortOrderVersion: integer
SortOrder: array of _id for documents
The SortOrder field simply contained an ordered array of each document's _id. It was that list that was manipulated by the client. The second field, SortOrderVersion, was used to optimistically protect against simultaneous changes by multiple clients. If the version being sent matched what was stored (checked via findAndModify), then the update was allowed, and the number was incremented to prevent further changes by other clients. (And as a bonus, the changes were pushed to the other clients via a web socket connection.)
Done this way, the server sorted the documents based on the list before returning them to the client, since the list was cached and didn't change frequently. I could have pushed the busy work of sorting to the client; I just didn't think it was necessary.
I had considered storing the documents as subdocuments in a sorted array within a single document, but in my case there were too many opportunities for multiple users to be editing the details of the subdocuments, which complicated updates and reordering significantly.
While I didn't need it for this web application, by storing the sort order independently, I could have extended the application to provide sorting easily on a per user basis.
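The version check described in this answer could look roughly like the sketch below. It is an in-memory stand-in written in Python, not the original application's code: the class and function names are hypothetical, and in a real database the compare and the write would have to happen in one atomic operation (the answer used findAndModify for that).

```python
from dataclasses import dataclass, field


@dataclass
class SortOrderDoc:
    """Hypothetical metadata document holding the user-defined order."""
    sort_order_version: int = 0
    sort_order: list = field(default_factory=list)


def try_update_order(doc, new_order, client_version):
    """Accept the new order only if the client saw the current version."""
    if client_version != doc.sort_order_version:
        return False  # someone else reordered first; client must reload
    doc.sort_order = list(new_order)
    doc.sort_order_version += 1
    return True


def apply_order(documents, sort_order):
    """Sort documents by their position in the stored id list."""
    position = {doc_id: i for i, doc_id in enumerate(sort_order)}
    return sorted(documents, key=lambda d: position[d["_id"]])


meta = SortOrderDoc(sort_order=["a", "b", "c"])
accepted = try_update_order(meta, ["c", "a", "b"], client_version=0)
rejected = try_update_order(meta, ["b", "a", "c"], client_version=0)  # stale
ordered = apply_order([{"_id": "a"}, {"_id": "b"}, {"_id": "c"}], meta.sort_order)
```

A reorder touches only the one metadata document, so no other record has to be rewritten when an item moves.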

Selecting items from combined stored procedure

I had two separate stored procedures, each with its own column descriptions and info. Now that I have combined them, the information from both stored procedures is displayed, but with the column descriptions of my first select statement (stored procedure). Is there any way I could distinguish between the two statements? The reason is that I am using a report page to display the information, and because I cannot have two datasets in one list, I thought I could combine all the information into one dataset and then distinguish between the two sets of information to show them in different tables within the same list/dataset.
Perhaps you should return one DataTable for the report, and just use the report grouping features. This is usually much, much easier than trying to manipulate multiple DataSets and/or multiple DataTables within a DataSet. I've rewritten ridiculously complex reports, using 5 or 6 DataSets, into one DataSet with report grouping and it was much, much simpler. I could live with the fact that data is repeated in rows, since creation and maintenance was easier.
That is, depending on your data of course. If you can logically query related data into one result set (e.g. Customers and their Orders), you can query customers and all of their orders in one result set. Naturally, the customer info will repeat for each record many times, but use the grouping feature of your reporting tool to display it once.
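To illustrate what the grouping feature does with such a result set, here is a small Python sketch. The rows, column layout and function name are made up for the example; a reporting tool would do the equivalent work internally.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical flat result set: customer info repeats on every order row.
# Columns: customer_id, customer_name, order_id, order_total
rows = [
    (1, "Alice", 101, 25.0),
    (1, "Alice", 102, 40.0),
    (2, "Bob", 201, 15.5),
]


def group_by_customer(rows):
    """Collapse repeated customer columns into one entry per customer.

    The rows must already be sorted by customer (e.g. via ORDER BY),
    because groupby only merges adjacent rows with the same key.
    """
    grouped = []
    for (customer_id, name), orders in groupby(rows, key=itemgetter(0, 1)):
        grouped.append({
            "customer_id": customer_id,
            "name": name,
            "orders": [(r[2], r[3]) for r in orders],
        })
    return grouped


report = group_by_customer(rows)
```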

Grid - When should you switch from html to server side table processing?

This question is likely subjective, but a lot of "grid" JavaScript plugins have come out to help paginate and sort tables. They usually work in one of two ways. The first and simplest is that the plugin takes an existing HTML <table> and converts it into a sortable and searchable table. The second is that it passes info to the server and has the server select the data to be displayed from the database.
My question is this: At what point (size wise) is it more efficient to use server-side processing vs displaying all the data and have the "grid plugin" convert it to a sortable/searchable table client-side?
Using datatables as an example, I have to execute at least 3 queries to get total rows in the table, total filtered results for pagination, and the filtered results to be displayed for the specific selected page. Then every time I sort, I am querying again. Every time I move to another page, or search in the table, more queries.
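For reference, the three queries mentioned above might look like the following sketch in Python with sqlite3. The table `item`, its columns and the function name are assumptions for illustration only; this is not the DataTables server-side API itself.

```python
import sqlite3


def server_side_page(conn, search, order_desc, start, length):
    """Run the three queries a server-side grid request typically needs."""
    like = f"%{search}%"
    # 1. Total number of rows in the table.
    total = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
    # 2. Number of rows that match the current search filter.
    filtered = conn.execute(
        "SELECT COUNT(*) FROM item WHERE name LIKE ?", (like,)
    ).fetchone()[0]
    # 3. The rows of the requested page, filtered and sorted by the database.
    # The direction is chosen from two fixed strings, never from user input.
    direction = "DESC" if order_desc else "ASC"
    rows = conn.execute(
        f"SELECT id, name FROM item WHERE name LIKE ? "
        f"ORDER BY name {direction}, id LIMIT ? OFFSET ?",
        (like, length, start),
    ).fetchall()
    return {"total": total, "filtered": filtered, "rows": rows}
```

Each sort, search or page change repeats all three queries, which is the cost being weighed against sending the whole table once.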
If I was to pull the data once when the client visits the page, I would be executing a single query, and then formatting and pushing the results to the client all at once. This increases the page size, and possibly delays loading of the page once it gets too big. The upside is that there will be only one query, and all the sorting, searching, and pagination is handled by the plugin, so there is no waiting for a response and there are no more queries.
If I was to have just a few rows, I imagine just pushing the formatted table data to the client at the page load would be the fastest. But with thousands of rows, switching to server-side would be the most efficient way.
Where is the tipping point? Is there a tipping point, or is server-side or client-side the way to go 100% of the time?
The answer to your question can only be subjective, so I will explain how I personally understand the problem and give my recommendation.
In my opinion, data with 2-3 rows and 3-4 columns can be displayed in an HTML table without using any plugin. The more data you display, the more important it becomes that the user can grasp the information being displayed. So I think the information has to be well formatted and marked with colors and icons, for example. This will help the user grasp information from probably 10 rows of data, but not much more. If you just display a table with 100 rows or more, you overtax the user. The user will have to analyse the data to get any helpful information from the table, and scrolling through the data does not make this any easier.
So I think that one should give the user a comfortable, or at least convenient, interface for sorting and filtering the data in the table. The exact interface is mostly a matter of taste. For example, the grid can have an additional filter bar.
For filtering, and even for sorting, it's important not to have pure strings but to be able to distinguish data types such as integers (10 should come after 9 and not between 1 and 2), numbers (correctly interpreting '.' and ',' inside numbers), dates (3/20/2012 should be greater than 4/15/2010) and so on. If you just convert an HTML table to some grid, you will have problems with correct filtering and sorting. Even if you use pure local JavaScript data to display in the grid, it is important to have a data source that carries some kind of type information and then to create the grid based on that data. In that case you can supply a date as a JavaScript Date or as an ISO 8601 string "2012-03-20" and have the grid display it according to the specified formatter, as 3/20/2012 or 20-Mar-2012.
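The difference between sorting raw strings and sorting typed values can be shown with a short Python sketch. The helper names are hypothetical, and the date parser assumes the US month/day/year format used in the examples above.

```python
from datetime import datetime


def parse_us_date(text):
    """Parse a date written as month/day/year, e.g. 3/20/2012."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def sort_typed(values, parse):
    """Sort display strings by their parsed, typed value."""
    return sorted(values, key=parse)


numbers = ["10", "9", "1", "2"]
dates = ["3/20/2012", "4/15/2010"]

as_strings = sorted(numbers)               # "10" lands between "1" and "2"
as_integers = sort_typed(numbers, int)     # 10 comes after 9
dates_as_strings = sorted(dates)           # compares character by character
dates_as_dates = sort_typed(dates, parse_us_date)  # chronological order

# Keeping the typed value separate from its display format:
iso_form = parse_us_date("3/20/2012").isoformat()
```

Here the strings are kept for display and the parsed value is used only as the sort key, which matches the idea of a typed data source plus a formatter.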
Whether you implement filtering, sorting and paging on the server side or on the client side is not really important to the user who opens the page. What matters is only that everything works quickly enough. The exact choice of grid plugin, the filtering (with a filter toolbar or external controls) and the styling of the grid depend on your taste and the project requirements.

Core Data find item sort order index with sort parameters

I have fetched an array of items from my Core Data store sorted by a "Name" property. The user of my app is able to change the name of an item, and the UI is supposed to update to show the results sorted in the new way with a cool animation. The only trouble I'm having is retrieving the new order index of the item after its name is updated. Is it inefficient for me to just fetch the whole result set again (could be quite large... 1000+ records) for the sole purpose of finding the object's new sort order index?
Can anyone think of a better way of accomplishing this task?
No, it is not inefficient; it is the way it is supposed to work. If you have very large fetches, you can make them more efficient by restricting the fetch batch size, explicitly fetching as faults, fetching by attribute, etc.
See the Core Data Programming Guide: Performance for details.

Resources