How to improve GWT performance? - performance

I am using GWT 2.4. There are times when I have to show huge amount of records for example: 50,000 records on my screen in a gridtable or flextable. But it takes very long to load that screen say around 30 mins or so; or, ultimately the screen hangs or at times IE displays an error saying that this might take too long and your application will stop working, so do you wish to continue.
Is there any solution to improve gwt performance?

Don't bring all data at once, you should bring it in pages, as the comments suggested here.
However, paging not be trivial , as it might be that during paging your db is filled with more entries, and if you're using some sorting algorithm for the results,
the new entries might ruin your sorting (for example, when trying to fetch page #2, some entries that should have been on the first page are inserted.
You may decided that you create some sort of "cursor" for paging purposes and it will reflect the state of your database at the point you created it, so you will ignore entires that are entered during traversal between pages.
Another option you may consider, as part of paging is providing only a small version for each record - i.e - only the most important details, and let the user double click if he wants to see the whole details for the record - this can also provide you some performance improvement within each page.

Related

Caching expensive SQL query in memory or in the database?

Let me start by describing the scenario. I have an MVC 3 application with SQL Server 2008. In one of the pages we display a list of Products that is returned from the database and is UNIQUE per logged in user.
The SQL query (actually a VIEW) used to return the list of products is VERY expensive.
It is based on very complex business requirements which cannot be changed at this stage.
The database schema cannot be changed or redesigned as it is used by other applications.
There are 50k products and 5k users (each user may have access to 1 up to 50k products).
In order to display the Products page for the logged in user we use:
SELECT TOP X * FROM [VIEW] WHERE UserID = #UserId -- where 'X' is the size of the page
The query above returns a maximum of 50 rows (maximum page size). The WHERE clause restricts the number of rows to a maximum of 50k (products that the user has access to).
The page is taking about 5 to 7 seconds to load and that is exactly the time the SQL query above takes to run in SQL.
Problem:
The user goes to the Products page and very likely uses paging, re-sorts the results, goes to the details page, etc and then goes back to the list. And every time it takes 5-7s to display the results.
That is unacceptable, but at the same time the business team has accepted that the first time the Products page is loaded it can take 5-7s. Therefore, we thought about CACHING.
We now have two options to choose from, the most "obvious" one, at least to me, is using .Net Caching (in memory / in proc). (Please note that Distributed Cache is not allowed at the moment for technical constraints with our provider / hosting partner).
But I'm not very comfortable with this. We could end up with lots of products in memory (when there are 50 or 100 users logged in simultaneously) which could cause other issues on the server, like .Net constantly removing cache items to free up space while our code inserts new items.
The SECOND option:
The main problem here is that it is very EXPENSIVE to generate the User x Product x Access view, so we thought we could create a flat table (or in other words a CACHE of all products x users in the database). This table would be exactly the result of the view.
However the results can change at any time if new products are added, user permissions are changed, etc. So we would need to constantly refresh the table (which could take a few seconds) and this started to get a little bit complex.
Similarly, we though we could implement some sort of Cache Provider and, upon request from a user, we would run the original SQL query and select the products from the view (5-7s, acceptable only once) and save that result in a flat table called ProductUserAccessCache in SQL. Next request, we would get the values from this cached-table (as we could easily identify the results were cached for that particular user) with a fast query without calculations in SQL.
Any time a product was added or a permission changed, we would truncate the cached-table and upon a new request the table would be repopulated for the requested user.
It doesn't seem too complex to me, but what we are doing here basically is creating a NEW cache "provider".
Does any one have any experience with this kind of issue?
Would it be better to use .Net Caching (in proc)?
Any suggestions?
We were facing a similar issue some time ago, and we were thinking of using EF caching in order to avoid the delay on retrieving the information. Our problem was a 1 - 2 secs. delay. Here is some info that might help on how to cache a table extending EF. One of the drawbacks of caching is how fresh you need the information to be, so you set your cache expiration accordingly. Depending on that expiration, users might need to wait to get the fresh info more than they would like to, but if your users can accept that they migth be seing outdated info in order to avoid the delay, then the tradeoff would worth it.
In our scenario, we decided to better have the fresh info than quick, but as I said before, our waiting period wasn't that long.
Hope it helps

How to optimize "text search" for inverted index and relational database? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
Update 2022-08-12
I re-thought about it and realized I was overcomplicating it. I found the best way to enhance this system is by using good old information retrieval techniques ie using 'location' of a word in a sentence and 'ranking' queries to display best hits. The approach is illustrated in this following picture.
Update 2015-10-15
Back in 2012, I was building a personal online application and actually wanted to re-invent the wheel because am curious by nature, for learning purposes and to enhance my algorithm and architecture skills. I could have used apache lucene and others, however as I mentioned I decided to build my own mini search engine.
Question: So is there really no way to enhance this architecture except by using available services like elasticsearch, lucene and others?
Original question
I am developing a web application, in which users search for specific titles (say for example : book x, book y, etc..) , which data is in a relational database (MySQL).
I am following the principle that each record that was fetched from the db, is cached in memory , so that the app has less calls to the database.
I have developed my own mini search engine , with the following architecture:
This is how it works:
a) User searches a record name
b) The system check what character the query starts with, checks if query there : get record. If not there, adds it and get all matching records from database using two ways:
Either query already there in the Table "Queries" (which is a sort of history table) thus get record based on IDs (Fast performance)
Or, otherwise using Mysql LIKE %% statement to get records/ids (Also then keep the used query by the user in history table Queries along with the ids it maps to).
-->Then It adds records and their ids to the cache and Only the ids to the inverted index map.
c) results are returned to the UI
The system works fine, however I have Two main issues, that i couldn't find a good solution for (been trying for the past month):
First issue:
if you check point (b) , case where no query "history" is found and it has to use the Like %% statement : this process becomes time consuming when the query matches numerous records in the database (instead of one or two):
It will take some time to get records from Mysql (this is why i used INDEXES on the specific columns)
Then time to save query history
Then time to add records/ids to cache and inverted index maps
Second issue:
The application allows users to add themselves new records, that can immediately be used by other users logged in the to application.
However to achieve this, inverted index map and table "queries" have to be updated so that in case any old query matches to the new word. For example if a new record "woodX" is being added, still the old query "wood" does map to it. So in order to re-hook query "wood" to this new record, here is what i am doing now:
new record "woodX" gets added to "records" table
then i run a Like %% statement to see which already existing query in table "queries" does map to this record(for example "wood"), then add this query with the new record id as a new row: [ wood, new id].
Then in memory, update inverted index Map's "wood" key's value (ie the list), by adding the new record Id to this list
--> Thus now if a remote user searches "wood" it will get from memory : wood and woodX
The Issue here is also time consumption. Matching all query histories (in table queries) with the newly added word takes a lot of time (the more matching queries, the more time). Then the in memory update also takes a lot of time.
What i am thinking of doing to fix this time issue, is to return the desired results to the user first , then let the application POST an ajax call with the required data to achieve all these UPDATE tasks. But i am not sure if this is a bad practice or an unprofessional way of doing things?
So for the past month ( a bit more) i tried to think of the best optimization/modification/update for this architecture, but I am not an expert in the document retrieval field (actually its my first mini search engine ever built).
I would appreciate any feedback or guidance on what i should do to be able to achieve this kind of architecture.
Thanks in advance.
PS:
Its a j2ee application using servlets.
I am using MySQL innodb (thus i cannot use full-text search option)
I would strongly recommend Sphinx Search Server, wchich is best optimized in full-text searching. Visit http://sphinxsearch.com/.
It's designed to work with MySQL, so it's an addition to Your current workspace.
I do not pretend to have THE solution but here is my ideas.
First, I though like you for time-consuming queries LIKE%% : I would execute a query limited to a few answers in MySQL, like a dozen, return that to user, and wait to see if user wants more matching records, or launch in background the full-query, depending on you indexation needs for future searches.
More generally, I think that storing everything in memory could lead, one day, to too-much memory consumption. And althrough the search-engine becomes faster and faster when it keeps everything in memory, you'll have to keep all these caches up-to-date when data is added or updated and it will certainly take more and more time.
That's why I think the solution I saw a day in an "open-source forum software" (I couldn't remember its name) is not too bad for text searching in posts : each time a data is inserted, a table named "Words" keeps tracks of every existing word, and another table (let's say "WordsLinks") the link between each word and posts it appears in.
This kind of solution has some drawbacks:
Each Insert, Delete, Update in database is a lot slower
Data selection for search engine must be anticipated : if you choose to keep two letter words you never kept, it is too late for already recorded data, unless you launch a complete data re-processing.
You must take care of DELETE as well as UPDATE and INSERT
But I think there are some big advantages:
Computing time is probably the same than the "memory solution" (eventually), but it is divided in each database Create/Update/Delete, rather than at query time.
Looking for a whole word, or words "starting with" is instantaneous : when indexed, searching in "Words" table is dichotomic. And "WordLinks" table query is very fast either with an index.
Looking for multiple words at the same time could be simple : gather a group of "WordLinks" for each found Word, and execute an intersection on them to keep only "Database Ids" common to all these groups. For example with the words "tree" and "leaf", the first one could give Table records {1, 4, 6}, and the second one could give {1, 3, 6, 9}. So with an intersection it is simple to keep only common parts : {1, 6}.
A "Like %%" in a single-column table is probably faster than a lot of "Like %%" in different fields of different tables. And each database engine handles some cache : "Words" table could be little enough to be kept in memory
I think there is a small risk of performance and memory problems if data becomes huge.
As every search is fast, you can even look for synonyms. For example search "network" if user didn't find anything with "ethernet".
You can apply rules, like splitting camel case words to generate for example the 3 words "wood", "X", "woodX" from "woodX". Each "word" is very lightweight to store and find, so you can do a lot of things.
I think the solution you need could be a blend of methods : for example you can keep lightweight UPDATE, INSERT, DELETE, and launch "Words" and "WordsLinks" feeding from a TRIGGER.
Just for anecdote, I saw a software developped by my company in which it was decided to keep "everything" (!) in memory. It leads us to recommend to our customers to buy servers with 64GB RAM. A little bit expensive. It explains why I am very prudent when I see solutions that could lead, eventually, to memory filling.
I have to say, I don't think your design fits the problem very well. The issues that you see now are consequences of that. And apart from that, your current solution doesn't scale.
Here is a possible solution:
Redesign your database to only contain authoritative data, but no derived data. So all cache entries must vanish from MySQL.
Keep data only for the duration of a request in memory within your application. This makes the design of your application much simpler (think race conditions) and enables you to scale to a sensible number of clients.
Introduce a caching layer. I'd strongly recommend to use an established product, rather than building this yourself. This frees you of all the custom built caching logic in your application and even does the job much better.
You can take a look at Redis or Memcached for the caching layer. I think an LRU strategy should fit here. Depending on how complex your queries become, a dedicated indexed search mechanism like Lucene might make sense as well.
I'm sure this can be implemented in MySQL but it would be a lot less effort to just use an existing search-oriented database such as Elasticsearch. It uses Lucene library to implement the inverted index, has extensive documentation, supports horizontal scaling, fairly simple query language and so forth. I guess it has been quite a lot of work to get this far, and it will be even more work to handle caches, race conditions, bugs, performance issues etc. to make the solution "production grade".

Grid - When should you switch from html to server side table processing?

,This question is likely subjective, but a lot of "grid" Javascript plugins have come out to help paginate and sort tables. They usually work in 2 ways, the first and simplest is that it takes an existing HTML <table> and converts it into a sortable and searchable information. The second is that it passes info to the server and has the server select info from the database to be displayed.
My question is this: At what point (size wise) is it more efficient to use server-side processing vs displaying all the data and have the "grid plugin" convert it to a sortable/searchable table client-side?
Using datatables as an example, I have to execute at least 3 queries to get total rows in the table, total filtered results for pagination, and the filtered results to be displayed for the specific selected page. Then every time I sort, I am querying again. Every time I move to another page, or search in the table, more queries.
If I was to pull the data once when the client visits the page, I would be executing a single query, and then formatting and pushing the results to the client all at once. This increases the page size, and possibly delays loading of the page once it gets too big. The upside is there will only one query, and all the sorting, searching, and pagination is handled by the plugin, so no waiting for a response and no more queries.
If I was to have just a few rows, I imagine just pushing the formatted table data to the client at the page load would be the fastest. But with thousands of rows, switching to server-side would be the most efficient way.
Where is the tipping point? Is there a tipping point, or is server-side or client-side the way to go 100% of the time?
The answer on your question can be only subjective. So I explain how I personally understand the problem and give me recommendation.
In my opinion the data with 2-3 row and 3-4 column can be displayed in HTML table without usage any plugin. The data you display for the user the more important will be that the user will be able to grasp the information which will be displayed. So I think that the information for example have to be good formatted and marked with colors and icons for example. This with help to grasp information from probably 10 rows of data, but not much more. If you just display table with 100 rows or more then you overtax the user. The user will have to analyse the data to get any helpful information from the table. Scrolling of the data makes this not easier.
So I think that one should give the user comfortable or at least convenient interface to sort and to filter the data from the table. The exact interface is mostly the matter of taste. For example the grid can have an additional filter bar
For filtering and even for sorting of the data it's important to have not pure strings, but to be able to distinguish the data types like integer (10 should be after 9 and not between 1 and 2), numbers (correct interpret '.' and ',' inside of numbers), dates (3/20/2012 should be grater as 4/15/2010) and so on. If you just convert HTML table to some grid you will have problems with correct filtering or sorting. Even if you use pure local JavaScript data to display in grid it would be important to have datasource which has some kind of type information and then to create the grid based in the data. In the case you can gives date as JavaScript Date or as ISO 8601 string "2012-03-20" and in the grid display the data corresponds the specified formatter as 3/20/2012 or 20-Mar-2012.
Whether you implement filtering, sorting and paging on the server side or on the client side is not really important for the user who open the page. It's important only that all works quickly enough. The exact choose of the grid plugin, the filtering (with filter toolbar or external controls) and styling of the grid depend on your taste and the project requirements.

(ASP.NET) How would you go about creating a real-time counter which tracks database changes?

Here is the issue.
On a site I've recently taken over it tracks "miles" you ran in a day. So a user can log into the site, add that they ran 5 miles. This is then added to the database.
At the end of the day, around 1am, a service runs which calculates all the miles, all the users ran in the day and outputs a text file to App_Data. That text file is then displayed in flash on the home page.
I think this is kind of ridiculous. I was told they had to do this due to massive performance issues. They won't tell me exactly how they were doing it before or what the major performance issue was.
So what approach would you guys take? The first thing that popped into my mind was a web service which gets the data via an AJAX call. Perhaps every time a new "mile" entry is added, a trigger is fired and updates the "GlobalMiles" table.
I'd appreciate any info or tips on this.
Thanks so much!
Answering this question is a bit difficult since there we don't know all of your requirements and something didn't work before. So here are some different ideas.
First, revisit your assumptions. Generating a static report once a day is a perfectly valid solution if all you need is daily reports. Why hit the database multiple times throghout the day if all that's needed is a snapshot (for instance, lots of blog software used to write html files when a blog was posted rather than serving up the entry from the database each time -- many still do as an optimization). Is the "real-time" feature something you are adding?
I wouldn't jump to AJAX right away. Use the same input method, just move the report from static to dynamic. Doing too much at once is a good way to get yourself buried. When changing existing code I try to find areas that I can change in isolation wih the least amount of impact to the rest of the application. Then once you have the dynamic report then you can add AJAX (and please use progressive enhancement).
As for the dynamic report itself you have a few options.
Of course you can just SELECT SUM(), but it sounds like that would cause the performance problems if each user has a large number of entries.
If your database supports it, I would look at using an indexed view (sometimes called a materialized view). It should support allows fast updates to the real-time sum data:
CREATE VIEW vw_Miles WITH SCHEMABINDING AS
SELECT SUM([Count]) AS TotalMiles,
COUNT_BIG(*) AS [EntryCount],
UserId
FROM Miles
GROUP BY UserID
GO
CREATE UNIQUE CLUSTERED INDEX ix_Miles ON vw_Miles(UserId)
If the overhead of that is too much, #jn29098's solution is a good once. Roll it up using a scheduled task. If there are a lot of entries for each user, you could only add the delta from the last time the task was run.
UPDATE GlobalMiles SET [TotalMiles] = [TotalMiles] +
(SELECT SUM([Count])
FROM Miles
WHERE UserId = #id
AND EntryDate > #lastTaskRun
GROUP BY UserId)
WHERE UserId = #id
If you don't care about storing the individual entries but only the total you can update the count on the fly:
UPDATE Miles SET [Count] = [Count] + #newCount WHERE UserId = #id
You could use this method in conjunction with the SPROC that adds the entry and have both worlds.
Finally, your trigger method would work as well. It's an alternative to the indexed view where you do the update yourself on a table instad of SQL doing it automatically. It's also similar to the previous option where you move the global update out of the sproc and into a trigger.
The last three options make it more difficult to handle the situation when an entry is removed, although if that's not a feature of your application then you may not need to worry about that.
Now that you've got materialized, real-time data in your database now you can dynamically generate your report. Then you can add fancy with AJAX.
If they are truely having performance issues due to to many hits on the database then I suggest that you take all the input and cram it into a message queue (MSMQ). Then you can have a service on the other end that picks up the messages and does a bulk insert of the data. This way you have fewer db hits. Then you can output to the text file on the update too.
I would create a summary table that's rolled up once/hour or nightly which calculates total miles run. For individual requests you could pull from the nightly summary table plus any additional logged miles for the period between the last rollup calculation and when the user views the page to get the total for that user.
How many users are you talking about and how many log records per day?

Is there a DBGrid component that can handle large datasets fast?

Large datasets, millions of records, need special programming to maintain speed in DBGrids.
I want to know if there are any ready-made components for Delphi (DBGrids) that do this automatically?
EDIT For Example: Some databases have features such as fetch 1st X records (eg 100 records). When I reach the bottom with scrolling, I want to auto fetch the next 100. Conversely when I reach the beginning, I want to fetch the previous 100. I know I can program this, but it sure is possible to propagate that feature to a DBGrid control where the DBGrid does the buffering. It will save quite a bit of programming - you simply have to set the "buffer size" so to speak.
You might want to take a look at the wonderful (free, open source, dual licensed as MPL 1.1 and GPL thus usable in closed source apps) Virtual TreeView and its user-supplied descendants (scroll down the page to find those.)
Edit to reflect the question's edit: Virtual TreeView not only allows you to handle millions of nodes without keeping them in memory, but that is in fact the preferred way of using it. You supply the data through event callbacks when it's needed, and you can tell the tree to cache that data (or not.)
Oh, and of course it also has a grid / report mode where it can function as a table (just set the GridExtensions property to True.)
I would have a look at Developer Express QuantumGrid Suite. (#birger: you just were a tick faster ;-) ) So I'm not just duplicating the answer, some elaboration:
The DevExpress Grid uses a data controller that has several modes to controll the data bound to the grid. One of these is exactly what you are looking for:
Grid Mode
When using Grid Mode, only a fixed
number of dataset records is loaded
into memory. Because only a limited
set of records are retrieved from the
dataset, automatic sorting, filtering
and summary calculations are disabled
in Grid Mode (must be controlled
manually instead). By default, this
mode is disabled and the
ExpressDataController loads all
records in a dataset.
It does have some drawbacks, which seem pretty obvious: you cannot make a summary, sort, or filter if you do not have all records at hand.
NextGrid is light, fast and nice looking grid for Delphi
http://www.bergsoft.net/component/next-grid/features.htm
HANDLING LARGE AMOUNT OF CELLS WITHOUT LOOSING SPEED
NextGrid can handle very large amount
of cells without losing speed. Speed
of adding, modifying and deleting data
doesn't depend of the amount of cells.
In NextGrid demo you can see how fast
NextGrid work with 100,000 rows and 10
columns = 1,000,000 cells
I think the DevExpress Quantumgrid supports this very good.
sorry, I just saw your comment to NeftalĂ­
if you would to bring 100 record per time, and then fetch the next 100, this work related to database access components, look at devart components, they are offer direct access components to most used database, and they have the feature you are asking about and more:
http://www.devart.com/products-vcl.html

Resources