Joomla getItems running out of memory with big result set - caching

I've got just over 10,000,000 records in the database of my component, and I think getItems/getListQuery is trying to load every single one of them into memory. The search form on the site is extremely slow or comes back saying PHP is out of memory.
phpMyAdmin seems to be able to handle displaying this data - why not Joomla?
The strange thing is that the items are then displayed correctly using the globally set list limit of 5 per page.
I've just looked and Joomla's cache is disabled - is that screwing me up here?
Many thanks in advance!

I fixed it in the end by copying getPagination(), getTotal(), getItems() etc. from the library's list.php into my model (to override them). Then in each method I made sure the results were returned directly instead of being sent to the cache.
The getTotal() function seems to count the number of rows returned instead of doing a separate COUNT(*). That's OK with a few thousand records, but with over half a million it's asking for trouble!
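For illustration, here's a minimal sketch of the kind of getTotal() override described above, assuming a JModelList subclass (the model class name is hypothetical). It counts with COUNT(*) and returns the result directly rather than routing it through the model's cache:

    // Hypothetical model class; the parent is Joomla's JModelList.
    class MyComponentModelItems extends JModelList
    {
        public function getTotal()
        {
            $db    = $this->getDbo();
            $query = clone $this->getListQuery();

            // Swap the column list for a cheap COUNT(*) and drop the
            // ORDER BY, which has no effect on the row count.
            $query->clear('select')->clear('order')->select('COUNT(*)');

            $db->setQuery($query);

            // Return the value directly instead of storing it in the cache.
            return (int) $db->loadResult();
        }
    }

(If the list query uses GROUP BY, count over a subquery instead, since COUNT(*) combined with GROUP BY returns one count per group.)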

Related

Laravel pagination in Data Table

I am using the DataTables plugin in Laravel. I have about 3,000 records in a table.
But when I load that page it sends all 3,000 records to the browser and only then creates the pagination, which slows down the page load.
How do I fix this, or what is the correct way to do it?
Use server-side processing.
You can get help from a Laravel package such as Yajra's: https://yajrabox.com/docs/laravel-datatables/
Generally you can solve pagination either on the front end, the back end (server or database side), or a combination of both.
Server-side processing without a package would mean using TOP/FETCH or LIMIT/OFFSET so that only the rows for the requested page are returned from your server (there's a sketch of this right after this answer).

You could also load a small amount (say 20) and then, when the user scrolls to the bottom of the list, load another 20 or so. I mention front-end processing as well because I'm not sure what your use cases are, but I imagine it's pretty rare that any given user actually needs to see 3,000 rows at once.
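As a concrete example of the server-side option, here's a minimal sketch using Laravel's built-in paginator (the Record model and the route are hypothetical); each request then transfers at most one page of rows:

    // routes/web.php
    use App\Models\Record; // hypothetical Eloquent model
    use Illuminate\Support\Facades\Route;

    Route::get('/records', function () {
        // paginate(20) issues a LIMIT/OFFSET query plus a COUNT(*) query,
        // so only 20 rows reach the browser per request.
        return Record::orderBy('id')->paginate(20);
    });

DataTables' server-side mode can then request each page (via its draw/start/length parameters) instead of asking for all rows up front.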

Given that DataTables has built-in functionality for paginating data, I think that #tersakyan is essentially correct: what you want is some form of back-end filtering or pagination to limit what's being sent to the front end.

I don't know if that package works for you or what your setup looks like, but pagination can also be achieved directly from the database via SQL (using TOP/FETCH, for example), or it could be implemented in a controller or service by tracking pages of data and loading a page at a time from the server into the table. All you would need is a unique key to associate each "set of pages" with a specific request.
But for performance, you want to avoid both large data requests and operations on large sets of data. So the more you limit how much data is being grabbed or processed at any stage of your application using it, the more performant your application will be in principle.
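To make the database-side option concrete, here's a rough sketch of OFFSET/FETCH paging through PDO (SQL Server 2012+ syntax; the table, columns, and connection details are all made up; on MySQL you'd use LIMIT/OFFSET instead):

    $page    = 3;  // 1-based page number from the client
    $perPage = 20;

    $pdo  = new PDO('sqlsrv:Server=localhost;Database=app', 'user', 'pass');
    $stmt = $pdo->prepare(
        'SELECT id, name FROM records
         ORDER BY id
         OFFSET :offset ROWS FETCH NEXT :limit ROWS ONLY'
    );
    $stmt->bindValue(':offset', ($page - 1) * $perPage, PDO::PARAM_INT);
    $stmt->bindValue(':limit', $perPage, PDO::PARAM_INT);
    $stmt->execute();

    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC); // only 20 rows cross the wire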




Form datasource only loads a part of records at a time?

Today I'm wondering why the (AX 2009) LedgerTransVoucher form only seems to load part of the query results at a time. If the results include, say, 35k rows, only 10k are loaded at once. And if the user then prints the results to Excel, they only get 10k rows.
Why is this? 10k is such a round number that I suspect a parameter somewhere, but I have no idea where it could be hidden.
And yes, I know they should be using a report instead :)
Alas, this apparently had nothing to do with AX as such, but was some conflict with Citrix. False alarm, it seems.

Joomla getItems default Pagination

Can anyone tell me whether the getItems() function in the model automatically adds the globally set LIMIT before it runs the query (from getListQuery())? Joomla is really struggling, seemingly trying to cache the entire result set (over 1 million records here!).
After looking in /libraries/legacy/model/list.php and /libraries/legacy/model/legacy.php, it appears that getItems() does apply a LIMIT via setQuery() using $this->getState('list.limit') before it sends the results to the cache. But if that's the case, why is Joomla struggling so much?
So what's going on? How come phpMyAdmin can return the limited results within a second while Joomla just times out?
Many thanks!
If you have one million records, you'll most definitely want to do as Riccardo is suggesting: override and optimize the model.
JModelList runs the query twice, once for the pagination numbers and then for the display query itself. You'll want to carefully inherit from JModelList to avoid the pagination query.
Also, the articles query is notorious for its joins. You can definitely lose some of that slowdown (I doubt you are using the contacts link, for example).
If all articles are visible to public, you can remove the ACL check - that's pretty costly.
There is no DBA from the West or the East who is able to explain why all of those GROUP BYs are needed, either.
Losing those things will help considerably. In fact, building your query from scratch might be best.
It does add the pagination automatically.
Its struggling is most likely due to a large dataset (i.e. 1,000+ items returned in the collection) and many lookup fields: the content modules, for example, join as many as 10 tables to get author names etc.
This can be a real killer; I had queries running for over one second on a dedicated server with only 3,000 content items. One tag-cloud component we found could take as long as 45 seconds to return a keyword list. If this is the situation (a lot of records and many joins), your only way out is to narrow the filters in the options and see if you can get faster results (for example, limiting to articles from the last 3 months can reduce the time needed dramatically).
But if that is not sufficient or not viable, you're left with writing a new optimized query in a new model, which will ultimately bring the best performance gain of any optimization. In writing the query, leverage database-specific optimizations, i.e. indexes and full-text indexes, and only use joins if you really need them.
Also make sure the number of joins never grows with the number of fields, translations and so on.
A constant query is easy for the db engine to optimize and cache, whilst a dynamic query will never be as efficient.
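As an illustration of the "build your query from scratch" advice, here's a minimal sketch of a stripped-down list model (the class name and column list are hypothetical). getItems() in JModelList still applies list.start/list.limit automatically, so the query itself stays constant:

    // Hypothetical custom list model with a lean, join-free query.
    class MyComponentModelArticles extends JModelList
    {
        protected function getListQuery()
        {
            $db    = $this->getDbo();
            $query = $db->getQuery(true);

            // Only the columns the list view actually displays:
            // no joins, no ACL subqueries, no GROUP BY.
            $query->select($db->quoteName(array('id', 'title', 'created')))
                ->from($db->quoteName('#__content'))
                ->order($db->quoteName('created') . ' DESC');

            return $query;
        }
    }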

Caching expensive SQL query in memory or in the database?

Let me start by describing the scenario. I have an MVC 3 application with SQL Server 2008. On one of the pages we display a list of Products that is returned from the database and is UNIQUE per logged-in user.
The SQL query (actually a VIEW) used to return the list of products is VERY expensive.
It is based on very complex business requirements which cannot be changed at this stage.
The database schema cannot be changed or redesigned as it is used by other applications.
There are 50k products and 5k users (each user may have access to 1 up to 50k products).
In order to display the Products page for the logged in user we use:
SELECT TOP X * FROM [VIEW] WHERE UserID = @UserId -- where 'X' is the size of the page
The query above returns a maximum of 50 rows (maximum page size). The WHERE clause restricts the number of rows to a maximum of 50k (products that the user has access to).
The page is taking about 5 to 7 seconds to load and that is exactly the time the SQL query above takes to run in SQL.
Problem:
The user goes to the Products page and very likely uses paging, re-sorts the results, goes to the details page, etc and then goes back to the list. And every time it takes 5-7s to display the results.
That is unacceptable, but at the same time the business team has accepted that the first time the Products page is loaded it can take 5-7s. Therefore, we thought about CACHING.
We now have two options to choose from; the most "obvious" one, at least to me, is using .NET caching (in memory / in proc). (Please note that a distributed cache is not allowed at the moment due to technical constraints with our provider / hosting partner.)
But I'm not very comfortable with this. We could end up with lots of products in memory (when there are 50 or 100 users logged in simultaneously) which could cause other issues on the server, like .Net constantly removing cache items to free up space while our code inserts new items.
The SECOND option:
The main problem here is that it is very EXPENSIVE to generate the User x Product x Access view, so we thought we could create a flat table (or in other words a CACHE of all products x users in the database). This table would be exactly the result of the view.
However the results can change at any time if new products are added, user permissions are changed, etc. So we would need to constantly refresh the table (which could take a few seconds) and this started to get a little bit complex.
Similarly, we thought we could implement some sort of cache provider: upon a request from a user, we would run the original SQL query and select the products from the view (5-7s, acceptable only once), then save that result in a flat table called ProductUserAccessCache in SQL. On the next request we would get the values from this cached table (we can easily tell whether results were cached for that particular user) with a fast query involving no calculations in SQL.
Any time a product was added or a permission changed, we would truncate the cached-table and upon a new request the table would be repopulated for the requested user.
It doesn't seem too complex to me, but what we are doing here basically is creating a NEW cache "provider".
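To make the flow concrete, here's a rough sketch of the cached-table idea. The question's app is ASP.NET MVC, so treat the PHP below as driver-neutral pseudocode for the database-side pattern; the table, view, and column names are all hypothetical:

    function getProductsPage(PDO $db, int $userId, int $pageSize): array
    {
        $top = max(1, $pageSize); // validated before inlining into the SQL

        // 1. Try the flat cache table first: an indexed lookup, no view.
        $stmt = $db->prepare(
            "SELECT TOP ($top) * FROM ProductUserAccessCache WHERE UserID = ?"
        );
        $stmt->execute([$userId]);
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

        if ($rows) {
            return $rows;
        }

        // 2. Cache miss: run the expensive view once (the 5-7s query)
        //    and persist the result for this user.
        $db->prepare(
            'INSERT INTO ProductUserAccessCache
             SELECT * FROM ExpensiveProductView WHERE UserID = ?'
        )->execute([$userId]);

        // 3. Serve the page from the freshly populated cache table.
        $stmt->execute([$userId]);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    // Invalidation, per the question's approach: whenever a product or a
    // permission changes, simply TRUNCATE TABLE ProductUserAccessCache.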
Does any one have any experience with this kind of issue?
Would it be better to use .Net Caching (in proc)?
Any suggestions?
We were facing a similar issue some time ago and were thinking of using EF caching in order to avoid the delay in retrieving the information. Our problem was a 1-2 second delay. Here is some info that might help on how to cache a table by extending EF. One of the drawbacks of caching is how fresh you need the information to be, so you set your cache expiration accordingly. Depending on that expiration, users might have to wait longer than they would like for fresh info, but if your users can accept that they might be seeing outdated info in order to avoid the delay, then the tradeoff is worth it.
In our scenario, we decided it was better to have fresh info than quick info, but as I said before, our waiting period wasn't that long.
Hope it helps

CouchDB view is extremely slow

I have a CouchDB (v0.10.0) database that is 8.2 GB in size and contains 3,890,000 documents.
Now, I have the following as the map function of the view:
function(doc) { emit([doc.Status], doc); }
And it takes forever to load (4 hours and still no result).
Here's some extra information that might help describing the situation:
The view is not a temp view. The view was defined before the 3,890,000 documents were inserted.
There isn't anything else on the server. It is an Ubuntu box with nothing but the defaults installed.
I can see the CPU working hard (it sometimes shoots to 100%). Memory usage fluctuates as well but is not increasing.
So my question is:
What is actually happening in the background?
Is this a "one time" thing where I have to wait once and it will somehow work later?
Don't emit the whole doc; it's unnecessary. Emit a lightweight value instead (even just null) and run your query with include_docs=true, which lets you access the document via each row's doc attribute.
When you emit the whole doc, you make the index as large as (or larger than) your entire database. :)
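For example, here's a rough sketch of querying such a view over CouchDB's HTTP API. The question doesn't mention a client language, so this is illustrative PHP; the database name, design-document path, and "Open" status value are made up:

    // Ask for rows whose key is ["Open"], pulling each full document
    // via include_docs=true instead of storing it in the index.
    $url = 'http://localhost:5984/mydb/_design/app/_view/by_status'
         . '?key=' . urlencode(json_encode(array('Open')))
         . '&include_docs=true&limit=50';

    $result = json_decode(file_get_contents($url), true);

    foreach ($result['rows'] as $row) {
        $doc = $row['doc']; // the full document, fetched at read time
    }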
Views are only updated the next time they are read. Upon reading, CouchDB processes all the documents that have been created, updated, or deleted since the last time the view was read.
So even if your view was defined before inserting the 3,890,000 documents, it will still have to process all 3,890,000 documents the first time it is read.
From http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
Note that by default views are not created and updated when a document is saved, but rather, when they are accessed. As a result, the first access might take some time depending on the size of your data while CouchDB creates the view. If preferable the views can also be updated when a document is saved using an external script that calls the views when updates have been made. An example can be found here: RegeneratingViewsOnUpdate
Also just came across this tip, which might be useful if you're running on Ubuntu:
http://nosql.mypopescu.com/post/1299848121/couchdb-and-ubuntu-configuration-trick-for
