GWT - Populate Grid asynchronously - performance

We've got a GWT application with a simple search mask that displays the results as a grid.
Server-side processing time and network latency are both fine.
Client rendering time is fine even on low-spec hardware with Internet Explorer 6, as long as the number of results is not too high (max 100 rows in the grid).
We have implemented a navigation scheme that lets the user scroll up and down the grid. That's fast enough, too.
Does anybody know whether it is possible to display the first 100 results immediately and pull the rest in the background? The GWT architecture allows this. However, I'm interested in possible pitfalls, e.g. what happens if the user starts another query while the browser is still fetching the previous results.
Thanks!
Holger

LazyPanel and this blog post might be a good starting point for you :)
The GWT Incubator also has many interesting (albeit not always complete/perfect/stable) tables and other pagination solutions, such as PagingScrollTable.

Assuming your plan is to send the first 100 rows and then fetch the rest, you can fetch the remaining results in batches. Then, if the user starts another search, you just wait for the current batch to finish (i.e. check between batch retrievals whether a query is pending).
Another option is to assign an identifier to each user search. This eliminates the problem of mixed results and also helps with a results history across multiple searches.
We found that users love the live grid look and feel, which solves most of those problems, but that might not always be an option.
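The search-identifier idea can be sketched as follows. This is a minimal illustration in plain JavaScript rather than GWT Java code, and every name in it (SearchSession, startSearch, acceptBatch) is made up for the example. It assumes each response carries the id of the search it belongs to.

```javascript
// Sketch: tag every search with an id and drop batches from superseded searches.
// All names here are illustrative, not part of GWT.
class SearchSession {
  constructor() {
    this.currentId = 0; // id of the most recent search
    this.rows = [];     // rows shown in the grid for that search
  }

  // Called when the user submits a query; returns the id to send with each request.
  startSearch() {
    this.currentId += 1;
    this.rows = []; // a new search replaces the previous results
    return this.currentId;
  }

  // Called when a batch arrives; returns true if the batch was used.
  acceptBatch(searchId, batch) {
    if (searchId !== this.currentId) {
      return false; // late batch from an older search: ignore it
    }
    this.rows.push(...batch);
    return true;
  }
}

// Usage: the first search is overtaken by a second one.
const session = new SearchSession();
const first = session.startSearch();
session.acceptBatch(first, ["a1", "a2"]); // first rows, shown immediately
const second = session.startSearch();     // user starts another query
session.acceptBatch(first, ["a3", "a4"]); // background batch of search 1 arrives late
session.acceptBatch(second, ["b1"]);
console.log(session.rows); // only the row from the second search remains
```

With this check in place the client does not have to wait for an outstanding batch before starting a new search. The late response is simply discarded when it arrives.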

Related

Dynatrace PurePath: what does each yellow bar mean?

I am using Dynatrace to help orient my efforts as I'm optimizing an endpoint of our service.
Looking at the Controller's PurePath, I am currently wondering: what does each individual yellow bar mean exactly?
It seems to be some sort of aggregate, although I don't think we have any kind of batching activated. Yet we see the same statement aggregated into one bar, then right after it the same statement aggregated again into another single bar within the same timeframe (for example, an 89x followed by a 90x).
As per company policy, I had to hide a bunch of things with black rectangles: sorry for that!
We have been using Dynatrace for a long time now. These yellow boxes show the time taken to execute the respective query. The query can be seen at the start of that row.
Looking at your diagram, it seems you are executing a few queries in parallel, e.g. the last 4 queries started at the same time and, depending on complexity, each took a different time to complete.
The multiplier shown as 90x or 89x is the number of times that query was executed. This is what the documentation says.
I truly do not agree with that. Why would a developer or DB server run the same query that many times? Maybe the agent installed on that DB server gets confused because the same query is executed across different requests. This is just my guess.
Regards,
Vikrant Korde

Is Elasticsearch Scroll API not recommended for real-time pagination?

I understand that the Elasticsearch Scroll API is not intended for real-time user requests. But would it be bad to use it for that? I have a requirement to implement paginated results (to be displayed on a web frontend), and the from/size approach is returning duplicates across pages, presumably because I have a sharded setup (with no replicas at all). I've tried setting preference but it did not help.
The Scroll API does not seem to have this issue. Is it really bad to use it for my use case?
Thanks
Results from a scrolling search reflect the state of the index at the time of the initial search request. Subsequent indexing or document changes only affect later search and scroll requests. This means that your pagination is based on the moment you requested the search, so you won't see new documents and may still see deleted ones in your results. Also, the Scroll API is no longer recommended by ES for deep pagination (as of ES 7.x). You can find more information in the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/scroll-api.html
On the question of why you get duplicate results, I think this is caused by intermediate indexing. When you do independent search calls with pagination, each call runs independently (still using some caching). So if you ask for the first 100, you get the first 100 at that time. When you then ask for the 'next' 100 x seconds later, you get results 100 - 199 as they are x seconds later. If a new document that logically belongs in the first 100 was indexed in the meantime, it pushes the rest further down. This way, your result 100 (the first in the second page) might have been #99 in the first call. When you then glue the pages together in the UI, you see the same result twice.
Both scroll and search_after are designed to let you continue from where the previous page ended: scroll refers ES back to the snapshot taken by the original call, while search_after continues after the sort values of the last hit you received.
I have not found a good explanation, though, of why search_after is better than scroll.
I assume that scroll is optimized for the use case where you will go through the entire set anyway (so the pagination is there to avoid overloading the client and the pipe between ES and the client with chunks that are too big), while search_after is optimized for the use case where you are likely to go only a few pages deep. It is known that human users tend to stay on the first page, with a quickly decreasing frequency of going much further, because it forces your eyes to find something in an overwhelming amount of information. Implementing good filters in the user interface is the much better approach.
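The duplicate effect described above can be reproduced without Elasticsearch. The following is a small simulation in plain JavaScript, not ES client code: an in-memory array stands in for the index, pageByOffset mimics from/size, and pageAfter mimics the idea behind search_after. The function names and the sample documents are assumptions made for the illustration.

```javascript
// Simulated index: documents sorted by score descending, id as tie-breaker.
function sortDocs(docs) {
  return [...docs].sort((a, b) => b.score - a.score || a.id - b.id);
}

// Offset paging (like from/size): re-runs the sorted query and slices by position.
function pageByOffset(docs, from, size) {
  return sortDocs(docs).slice(from, from + size);
}

// Cursor paging (the idea behind search_after): continue strictly after the
// sort values of the last hit that was already shown.
function pageAfter(docs, last, size) {
  return sortDocs(docs)
    .filter((d) => d.score < last.score || (d.score === last.score && d.id > last.id))
    .slice(0, size);
}

const index = [
  { id: 1, score: 90 },
  { id: 2, score: 80 },
  { id: 3, score: 70 },
  { id: 4, score: 60 },
];

const page1 = pageByOffset(index, 0, 2);           // ids 1 and 2
index.push({ id: 5, score: 95 });                  // indexed between the two calls
const offsetPage2 = pageByOffset(index, 2, 2);     // ids 2 and 3: id 2 appears twice
const cursorPage2 = pageAfter(index, page1[1], 2); // ids 3 and 4: no duplicate

console.log(offsetPage2.map((d) => d.id), cursorPage2.map((d) => d.id));
```

The cursor variant needs a sort order with a unique tie-breaker (the id here), otherwise documents with equal sort values could be skipped or repeated.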

JMeter and page views

I'm trying to use data from Google Analytics for an existing website to load test a new website. In our busiest month we had 8361 page requests over an hour. Should I get a list of all the URLs for these page requests and feed them to JMeter? Would that be a sensible approach? I'm hoping to compare the page response times against the existing website.
If you need to do this very quickly, say you have less than an hour for scripting, then you can do it this way to check that there are no major differences between the 2 instances.
If you would like to go deeper:
8361 requests per hour == 2.3 requests per second, so it doesn't make any sense to replicate this load pattern, as I'm more than sure that your application will survive such an enormous load.
Performance testing is not only about hitting URLs from a list and measuring response times. Normally the main questions that need to be answered are:
how many concurrent users my application can support while providing acceptable response times (at this point you may also be interested in requests/second)
what happens when the load exceeds the threshold, what types of errors start occurring, and what the impact is
whether the application recovers when the load gets back to normal
what the bottleneck is (e.g. lack of RAM, slow DB queries, low network bandwidth on the server/router, whatever)
So the options are:
If you need a "quick and dirty" solution, you can use the list of URLs from Google Analytics with e.g. CSV Data Set Config or Access Log Sampler, or parse your application logs to replay production traffic with JMeter.
A better approach would be to check Google Analytics to identify which groups of users you have and their behavioral patterns, e.g. X % of unauthenticated users are browsing the site, Y % of authenticated users are searching, Z % of users are checking out, etc. After that you need to properly simulate all these groups using separate JMeter Thread Groups, keeping in mind cookies, headers, cache, think times, etc. Once you have this kind of test, gradually and proportionally increase the number of virtual users and monitor how response time correlates with the number of virtual users until you hit some kind of bottleneck.
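To make the "gradually and proportionally" part concrete, here is a rough sketch of how the number of virtual users per group could be worked out for each load step. It is plain JavaScript for planning purposes, not JMeter configuration. The group names, the 70/20/10 mix and the step sizes are placeholder assumptions standing in for the X/Y/Z percentages you would take from Google Analytics.

```javascript
// Split a total number of virtual users across user groups by percentage share.
function allocateUsers(totalUsers, shares) {
  const result = {};
  for (const [group, percent] of Object.entries(shares)) {
    // Rounding means the parts may not add up exactly to totalUsers for odd totals.
    result[group] = Math.round((totalUsers * percent) / 100);
  }
  return result;
}

// Build a stepped ramp that keeps the same mix at every load level.
function rampPlan(steps, shares) {
  return steps.map((total) => ({ total, groups: allocateUsers(total, shares) }));
}

// Assumed mix and steps, for illustration only.
const shares = { anonymousBrowsing: 70, authenticatedSearch: 20, checkout: 10 };
const plan = rampPlan([50, 100, 200], shares);

for (const step of plan) {
  console.log(step.total, step.groups);
}
```

Each entry in groups would then become the thread count of the corresponding Thread Group for that step of the test.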
The "sensible approach" would be to know the profile, the pattern of your load.
For that, it's excellent that you already have this data.
Yes, you can feed it in as is, but that would be the quick & dirty approach. Analysing the data, distilling patterns out of it and applying them to your test plan seems smarter.

Django Templating vs AJAX to load a small div

I have a Django server. The server loads a webpage with almost all static content but a few numbers must load from the database.
I'm thinking about performance/price. I can host my Django server on a fast server and render the page using Django templates, or I can host the server on a slower machine and make a static page that loads the few numbers using AJAX, hosting the page cheaply somewhere else like github.io.
The latter choice will have most of the page load really quickly and cheaply.
I was wondering what the trade-offs are?
Whichever server you decide to hire, you should always think of reducing the server load - no matter how fast your server is. By reducing server load I mean only make your server do what is really required at the moment.
Let's learn something from the big players, Facebook for instance.
You log into your account and you see that you've got 5 notifications and 3 new messages plus a couple of photos and highly interesting statuses of your friends. Cool! You now click on the notifications icon to find out if that hot girl (forgive me if you're a girl :D) has added you to her friends list or not. As you click a big white <div> pops up AND you see nothing but a loading gif! The notifications do appear, but after a couple of seconds. Try doing it with a slow internet connection, and you get to adore the beauty of the loading gif for a lot more time.
So, what do you make of it?
Facebook only made its server count the number of notifications and new messages, and displayed those numbers to you, thus reducing server load. It only displayed the notifications to you when you wanted to see them. And to load the notifications, all it took was a minimal AJAX call in which only around 10 KB of data was transferred!
Facebook does it all the time and everywhere. Consider this: Robert Downey Jr. posts a photo of himself on his Facebook page. A little while later, you see that it has got 10k+ comments. You decide to read them and click the comments button. An attractive loading gif pops up again for a little while and is soon replaced by comments. But hey, only 10 comments were loaded. What the ... Oh wait! That's how Facebook reduces its server load - read those 10 comments first, if you want to read more, send a request again.
Twitter does it too - the infinite scroll.
Icing on the cake
This approach benefits you in two ways:
It reduces server load, so there is less chance of crashing the website.
It decreases your website's page-load time since you'll be passing less data i.e. the data required at that moment. Thus making your website faster. (Yes, it can outrun Flash, too!)
Food for thought
If you've got some cool technologies around such as AJAX, why not use it? Your server is not a donkey, for God's sake!
P.S. By Facebook and Twitter, I mean the engineers behind them.
Well, it would depend on the following:
A. Do you want to display that number on page load itself, or when the user clicks to see it?
If you want to show the numbers at page load, then it is preferable to get them as part of the template response.
Why would you want your site visitors to wait until those numbers populate (if the intention is to display them)?
If they are to be displayed only on the user's click, then AJAX should be preferred.
B. How much time is this query going to take, and can the query be optimized to take minimal time?
If the query takes a lot of time, then the first effort should be to optimize it to be as fast as possible.
If the query can return its result in minimal time, then it is futile to make another request to the server via AJAX.
But if you know the query will take a lot of time, then AJAX is fine.
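For reference, the AJAX variant discussed in this thread could look roughly like this on the static page. It is only a sketch: the URL, the JSON shape and the element ids are hypothetical, and the text-writing step is passed in as a function so the core logic can also run outside a browser.

```javascript
// Writes each number from the JSON payload into its placeholder.
// `setText(id, text)` is injected so the function also runs outside a browser.
function applyStats(stats, setText) {
  let updated = 0;
  for (const [id, value] of Object.entries(stats)) {
    setText(id, String(value));
    updated += 1;
  }
  return updated;
}

// Browser wiring: fetch the numbers and fill the matching elements.
function loadStats(url) {
  return fetch(url)
    .then((response) => {
      if (!response.ok) throw new Error("HTTP " + response.status);
      return response.json();
    })
    .then((stats) =>
      applyStats(stats, (id, text) => {
        const el = document.getElementById(id);
        if (el) el.textContent = text;
      })
    )
    .catch(() => 0); // on failure the static placeholders stay visible
}

// Only start the request when running in a browser.
if (typeof document !== "undefined") {
  loadStats("https://api.example.com/stats/"); // hypothetical endpoint
}

// Outside a browser the core logic can still be exercised:
const seen = {};
applyStats({ "visitor-count": 1234, "order-count": 56 }, (id, text) => {
  seen[id] = text;
});
console.log(seen);
```

One trade-off worth keeping in mind: if the static page is served from another origin such as github.io, the Django server has to send CORS headers allowing that origin, otherwise the browser will block the response.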

Using onbeforeunload event with Google Analytics to record page exits and therefore more accurately record user time on page / site

I have been trying to research the hack proposed by Avinash Kaushik in his book Web Analytics 2.0. He poses the problem whereby most web analytics tools are unable to record the time a user spent on the last page they visit on a website, or on the only page they visit. In other words, if a user comes to page 1, a timestamp is created showing the time they arrived at the page; when they visit page 2, a second timestamp is created. The time spent on page 1 can be calculated as timestamp 2 - timestamp 1. However, if the user then closes the browser window or navigates away from the website, there is no way to record the time on page 2. Here is a link to this problem on Kaushik.net
standard-metrics-revisited-time-on-page-and-time-on-site
One proposed hack is to use the window.onbeforeunload event to call a method and push the time that the page was unloaded to google analytics. So I tried the following code -
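window.onbeforeunload = capturePageExit;

// Records a virtual "page-exit" pageview whenever the page is unloaded.
function capturePageExit() {
    _gaq.push(['_trackPageview', '/page-exit?page=' + document.location.pathname + document.location.search + '&from=' + document.referrer]);
    // Returning a string asks the browser to show a confirmation dialog before leaving.
    return "You are about to close this page";
}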
Using Firebug I can see that the correct __utm.gif image is requested and the correct params are sent to Google Analytics. But clearly there is a problem: this will be called on every page unload, so each visitor will appear to go from page1 -> page-exit -> page2 -> page-exit -> page3 -> page-exit... But I should get a more accurate time-on-site reading, right?
However, this comes at the expense of accurate navigation-summary data, so it is not a good solution. What would be good is if I could tell whether the user has clicked the close browser/tab button or is navigating away from my site, and only then record the page-exit.
I can't find a great deal of information about how to solve this problem, though there is plenty of discussion about being aware of this inaccuracy when interpreting Google Analytics (and probably most web analytics tools). Another useful link is time_on_page_and_time_on_site_how_confident_are_you
I just wanted to raise this on Stack Overflow, as I can't find a similar question, and start a discussion about it. My interpretation is that there isn't really a way around this problem and it is just better to be aware of it.
Any thoughts?
------------------------------------------------------ UPDATE -----------------------------------------------------
Here is another link that was suggested to me, from a blog called Savio.no. Is this a good method?
how-to-measure-true-time-with-google-analytics
Web Analytics is not an exact science. Data is always approximate and most of the time sampled.
Web Analytics tools strive for precision, not accuracy. This whitepaper describes why it's more important to have precision and less important to have accuracy when working with Web Analytics.
Once you understand the difference between precision and accuracy and why it matters, you will understand that it's not important to get the exact time-on-site metric, but a precise measure that can clearly express trends or changes in that metric.
In other words, forget about absolute numbers; learn to report using trends and changes.
Another piece of advice: don't bother tweaking GA to render every single metric perfectly if you're never going to use it. Bother with metrics that you can use. And by use I mean actionable analysis.
There are, however, a few cases where some code tweaking can help you measure the time on site. A clear example is a weblog. You may want to implement something like that in a weblog, since most of your visitors will look at your homepage, read your posts and then leave, all within the same single pageview. So it may be a good idea to fire an event when the user leaves to get the correct time on site, or to fire an event when the user scrolls past some threshold. In the end you'll be measuring the same thing: if the user scrolls more he reads more, and if the user spends more time he reads more. So it may not make sense to track both metrics to measure the same effect. Just choose one and stick with it, leave it running for a while to build up historical data, and then make use of it.
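The scroll-threshold variant could be sketched like this, using the same legacy _gaq queue as the question and its _trackEvent call. The 75 % threshold, the category and action strings, the label and the helper names are assumptions for the example. The queue is passed in as a parameter so the sketch can run without a browser.

```javascript
// Returns how far down the page the bottom of the viewport is, as a percentage.
function scrollDepthPercent(scrollTop, viewportHeight, documentHeight) {
  if (documentHeight <= 0) return 0;
  return Math.min(100, Math.round(((scrollTop + viewportHeight) / documentHeight) * 100));
}

// Creates a handler that pushes one _trackEvent the first time the threshold is passed.
function createScrollTracker(queue, thresholdPercent, label) {
  let fired = false;
  return function onScroll(scrollTop, viewportHeight, documentHeight) {
    const depth = scrollDepthPercent(scrollTop, viewportHeight, documentHeight);
    if (!fired && depth >= thresholdPercent) {
      fired = true;
      // Arguments after '_trackEvent': category, action, label, integer value.
      queue.push(["_trackEvent", "Reading", "scrolled-past-threshold", label, depth]);
    }
    return fired;
  };
}

// Stand-in for the real _gaq queue so the sketch runs outside a browser.
const fakeGaq = [];
const onScroll = createScrollTracker(fakeGaq, 75, "/blog/some-post");
onScroll(0, 800, 4000);    // 20 %: nothing sent
onScroll(2400, 800, 4000); // 80 %: event sent
onScroll(3200, 800, 4000); // 100 %: already sent, nothing more
console.log(fakeGaq.length); // 1
```

In a page, the handler would be called from a scroll listener with the window's current scroll offset, viewport height and document height, and _gaq would be passed in place of fakeGaq.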

Resources