How does pagination on Reddit's home page work?

Reddit uses a time decay algorithm. That would mean the sort order is subject to change. When a user goes to page 2, is there a mechanism to prevent them from seeing a post that was on page 1 but was bumped down to page 2 before they paged over? Is it just an acceptable flaw of the sort method? Or are the first couple of pages cached for the user so this doesn't happen?
Side note: It's my understanding that Digg cannot suffer from this issue but that HackerNews and Reddit can.

From the next URL you see: http://www.reddit.com/?count=25&after=t3_dj7xt
So clearly the next page starts at the post after t3_dj7xt, whatever that translates to. This could be accomplished using IDs: you'd pass after=188 and the next page would start at 189, ensuring you don't see the same post twice if there was a delay between the two requests.

It might be using the last ID as opposed to limiting from an offset. Take these two examples of SQL:
SELECT * FROM Stories WHERE StoryID > $LastStoryID ORDER BY StoryID LIMIT 10;
rather than:
SELECT * FROM Stories ORDER BY StoryID LIMIT 20, 10;
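To illustrate why paging from the last ID is more stable than paging from an offset, here is a minimal sketch. It uses an in-memory array as a stand-in for the Stories table; the `pageAfter` helper and the sample IDs are made up for illustration, and nothing here is confirmed to be how Reddit actually does it.

```javascript
// Hypothetical in-memory stand-in for the Stories table, kept sorted by storyId.
let stories = [1, 2, 3, 4, 5, 6].map((storyId) => ({ storyId }));

// Same idea as WHERE StoryID > $LastStoryID, with an explicit page size.
function pageAfter(rows, lastStoryId, pageSize) {
  return rows.filter((row) => row.storyId > lastStoryId).slice(0, pageSize);
}

const page1 = pageAfter(stories, 0, 3); // storyIds 1, 2, 3
const cursor = page1[page1.length - 1].storyId; // plays the role of "after=..."

// A row is deleted between the two requests. Offsets would shift; the cursor does not.
stories = stories.filter((row) => row.storyId !== 2);

const page2 = pageAfter(stories, cursor, 3); // storyIds 4, 5, 6
```

With an offset of 3 the second request would have started at story 5 and silently skipped story 4, because the deletion moved every later row up by one position.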

Related

Is Elasticsearch Scroll API not recommended for real-time pagination?

I understand that the Elasticsearch Scroll API is not intended for real-time user requests. But would it be bad if it's used for that? I have a requirement to implement paginated results (to be displayed on a web frontend), and the from/size approach is returning duplicates across pages, presumably because I have a sharded setup (with no replicas at all). I've tried setting preference but it did not help.
Scroll API does not seem to have this issue, I'm wondering if it's really bad to use it for my use case?
Thanks
Results from a scrolling search reflect the state of the index at the time of the initial search request. Subsequent indexing or document changes only affect later search and scroll requests. This means that your pagination is based on the moment you issued the initial search, so you won't see new documents, and you may still see documents that have since been deleted. Also, the Scroll API is no longer recommended by ES for deep pagination (ES 7.x). You can find more info on the Elasticsearch documentation page: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/scroll-api.html
As for why you get duplicate results, I think this is caused by intermediate indexing. When you page with independent search calls, each call runs independently (apart from some caching). So if you ask for the first 100, you get the first 100 as of that moment. When you ask for the 'next' 100 x seconds later, you get hits 100 to 199 as of x seconds later. If in the meantime a new document was indexed that logically belongs in the first 100, it pushes the rest further down. This way, hit 100 (the first of the second page) might have been #99 in the first call. When you then glue the pages together in the UI, you see the same result twice.
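The effect is easy to reproduce outside of Elasticsearch. The following is only a toy simulation with made-up document names and a hypothetical `fetchPage` helper that behaves like from/size on an already ranked list:

```javascript
// Hypothetical ranked result list; index 0 is the top hit.
let ranked = ["a", "b", "c", "d", "e", "f"];

// Offset-based page, like from/size: skip `from` hits, return `size` hits.
function fetchPage(results, from, size) {
  return results.slice(from, from + size);
}

const firstPage = fetchPage(ranked, 0, 3);

// Between the two calls a new document is indexed and ranks at the top.
ranked = ["new", ...ranked];

const secondPage = fetchPage(ranked, 3, 3);

// Hits that show up on both pages once the UI glues them together.
const repeated = secondPage.filter((id) => firstPage.includes(id));
```

Here `repeated` ends up containing "c": it was the last hit of the first page and, after the insert, it is the first hit of the second page.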
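Both scroll and search_after are designed to let ES continue from where the previous call left off instead of counting from the top again: scroll keeps a snapshot of the original search, while search_after continues from the sort values of the last hit you received.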
I have not found a good explanation though why search_after is better than scroll.
I assume that scroll is optimized for the use case where you will go through the entire set anyway (so the pagination is there to avoid overloading the client, and the pipe between ES and the client, with chunks that are too big). search_after, on the other hand, is optimized for the use case where you are likely to go only a few pages deep. It is well known that human users tend to stay on the first page and go further with quickly decreasing frequency, because beyond that they have to pick something out of an overwhelming amount of information. Implementing good filters in the user interface is the much better approach.
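For reference, a search_after request is an ordinary search body plus the sort values of the last hit from the previous page. Below is a rough sketch of building that body; the `nextPageBody` helper and the field names `published_at` and `doc_id` are assumptions for illustration, and the sort needs a unique tiebreaker field for the paging to be stable.

```javascript
// Builds the body for one page of a search_after query.
// The helper name and the field names below are hypothetical.
function nextPageBody(query, sortFields, pageSize, lastHit) {
  const body = { size: pageSize, query, sort: sortFields };
  if (lastHit) {
    // Elasticsearch returns the sort values of each hit in `hit.sort`.
    body.search_after = lastHit.sort;
  }
  return body;
}

// Sort by date, then by a unique field so ties are broken consistently.
const sortFields = [{ published_at: "desc" }, { doc_id: "asc" }];

const first = nextPageBody({ match_all: {} }, sortFields, 10, null);

// Pretend this was the last hit of the first page.
const lastHit = { _id: "42", sort: [1700000000000, "42"] };
const second = nextPageBody({ match_all: {} }, sortFields, 10, lastHit);
```

Note that search_after on its own does not freeze the index the way scroll does; newer Elasticsearch versions combine it with a point in time when a consistent view is needed.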

Discrepancy in content experiment sessions

I noticed a huge discrepancy in the count of sessions for one of our experiments in Google Analytics.
The API says 3,123 sessions for variation 0 and 3,039 for variation 1.
[Screenshot: GA API]
At the same time the report in google.com/analytics reads 5,743 for variation 0 and 5,620 for variation 1.
[Screenshot: GA web report]
The above data is:
- on the exact same dates
- with no filters
- with no segments
- on the same Google Analytics view id
Could you please help me figure this out?
Thanks,
V.
The thing is that ga_sessions in the Query Explorer (API) is not the same as experiment sessions in the Content Experiments interface.
Read this for more information on experiment conversion rate and sessions calculation:
https://support.google.com/analytics/answer/6112437#
Here is a quotation:
Conversion rate is calculated using the same methodology as Analytics:
total converted visits divided by total visits (once a user becomes a
part of an experiment). A user is considered part of an experiment
once he or she has seen the experiment page. For example, if a user
sees the experiment page, then comes back the next day, the second
visit is counted, even if the user does not view the experiment page
again.
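To make the quoted rule concrete, here is a small sketch of how I read it: a session counts toward the experiment from the user's first exposure onwards. The data shape and the `experimentStats` function are made up for illustration and are not Google's implementation.

```javascript
// Each session: { userId, sawExperimentPage, converted }, in chronological order.
function experimentStats(sessions) {
  const exposedUsers = new Set();
  let counted = 0;
  let converted = 0;
  for (const session of sessions) {
    if (session.sawExperimentPage) exposedUsers.add(session.userId);
    // A session counts once the user has seen the experiment page, now or earlier.
    if (exposedUsers.has(session.userId)) {
      counted += 1;
      if (session.converted) converted += 1;
    }
  }
  return { counted, converted, rate: counted === 0 ? 0 : converted / counted };
}

const stats = experimentStats([
  { userId: "u1", sawExperimentPage: false, converted: false }, // before exposure: not counted
  { userId: "u1", sawExperimentPage: true, converted: false }, // counted
  { userId: "u1", sawExperimentPage: false, converted: true }, // return visit: still counted
  { userId: "u2", sawExperimentPage: false, converted: false }, // never exposed: not counted
]);
```

Out of four sessions in total only two are experiment sessions, which is the kind of gap you can get between a plain session count and the experiment report.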

Using onbeforeunload event with Google Analytics to record page exits and therefore more accurately record user time on page / site

I have been trying to research the hack proposed by Avinash Kaushik in his book Web Analytics 2.0. He poses the problem whereby most web analytics tools are unable to record the time a user spent on the last page they visit on a website, or on the only page they visit. In other words if user comes to page 1, a timestamp is created showing the time they arrived at the page, when they visit page 2, a second timestamp is created. The time spent on page 1 can be calculated by timestamp 2 - timestamp 1. However if the user closes the browser window or navigates away from the website there is no way to record time on page 2. Here is a link to this problem on Kaushik.net
standard-metrics-revisited-time-on-page-and-time-on-site
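The calculation described above can be sketched in a few lines. The `timeOnPages` function and the sample timestamps (in seconds) are hypothetical; the point is only that the last pageview has no later hit to subtract from.

```javascript
// pageviews: [{ page, timestamp }] in visit order, timestamps in seconds.
function timeOnPages(pageviews) {
  return pageviews.map((view, i) => {
    const next = pageviews[i + 1];
    return {
      page: view.page,
      // Without a later hit there is nothing to subtract from.
      seconds: next ? next.timestamp - view.timestamp : null,
    };
  });
}

const visit = timeOnPages([
  { page: "/page1", timestamp: 0 },
  { page: "/page2", timestamp: 90 },
]);
```

Here page 1 gets 90 seconds and page 2 gets null, no matter how long the user actually stayed on it.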
One proposed hack is to use the window.onbeforeunload event to call a method and push the time that the page was unloaded to Google Analytics. So I tried the following code:
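// Record a virtual pageview when the page unloads so the exit gets a timestamp.
window.onbeforeunload = capturePageExit;
function capturePageExit() {
    var page = document.location.pathname + document.location.search;
    // Encode both values so their own "?" and "&" do not break the query string.
    _gaq.push(['_trackPageview', '/page-exit?page=' + encodeURIComponent(page) + '&from=' + encodeURIComponent(document.referrer)]);
    // Returning a string makes the browser ask the user to confirm leaving.
    return "You are about to close this page";
}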
Using Firebug I can see that the correct __utm.gif image is requested and the correct params are sent to Google Analytics. But clearly there is a problem now: this will be called on each page unload, so each visitor will appear to go from page1 -> page-exit -> page2 -> page-exit -> page3 -> page-exit... but I should get a more accurate time on site reading, right?
However this is at the expense of accurate navigation-summary data and so not a good solution. What would be good is if I could tell - if user has clicked the close browser/tab button or is navigating away from my site then record the page-exit.
I can't find a great deal of information about how to solve this problem, only plenty of discussion about being aware of this inaccuracy when interpreting Google Analytics (and probably most web analytics tools). Another useful link is time_on_page_and_time_on_site_how_confident_are_you
I just wanted to raise this on Stack Overflow, as I can't find a similar question, and start a discussion about it. My interpretation is that there isn't really a way around this problem and that it is just better to be aware of it.
Any thoughts?
UPDATE:
Here is another link that was suggested to me from a blog called Savio.no, is this a good method?
how-to-measure-true-time-with-google-analytics
Web analytics is not an exact science. Data is always approximate and, most of the time, sampled.
Web analytics tools strive for precision, not accuracy. This whitepaper describes why it's more important to have precision and less important to have accuracy when working with web analytics.
Once you understand the difference between precision and accuracy and why it matters, you will understand that it's not important to get the exact time-on-site metric, but a precise measure that clearly expresses trends or changes in that metric.
In other words, forget about absolute numbers and learn to report using trends and changes.
Another piece of advice: don't bother tweaking GA to render every single metric perfectly if you're never going to use it. Bother with metrics that you can use. And by use I mean actionable analysis.
There are, however, a few cases where some code tweaking can help you measure the time on site. A clear example is a weblog. You may want to implement something like that in a weblog, since most of your visitors will look at your homepage, read your posts and then leave, all within a single pageview. So it may be a good idea to fire an event when the user leaves to get the correct time on site, or to fire an event when the user scrolls past some threshold. In the end you'll be measuring the same thing: if the user scrolls more, they read more, and if the user spends more time, they read more. So it may not make sense to track both metrics to measure the same effect. Just choose one and stick with it, leave it running for a while to build up historical data, and then make use of it.
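If you go with the scroll threshold, the main thing to get right is firing the event only once per pageview. A minimal sketch of that logic follows; the `createScrollDepthTracker` name, the 0.75 threshold and the event names in the comment are my own assumptions, and in a real page you would feed the handler from a scroll listener.

```javascript
// Returns a handler that calls `onReached` once, the first time the reader
// scrolls past `threshold` (a fraction of the page between 0 and 1).
function createScrollDepthTracker(threshold, onReached) {
  let fired = false;
  return function handleScroll(scrolledFraction) {
    if (!fired && scrolledFraction >= threshold) {
      fired = true;
      onReached(scrolledFraction);
    }
  };
}

// In a browser the callback could be something like
// _gaq.push(['_trackEvent', 'Reading', 'ScrollDepth']) (hypothetical event names).
const events = [];
const handleScroll = createScrollDepthTracker(0.75, (depth) => events.push(depth));
[0.2, 0.8, 0.9].forEach((fraction) => handleScroll(fraction));
```

After the three simulated scroll positions, `events` holds a single entry for 0.8; the later 0.9 does not fire a second event.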

GWT - Populate Grid asynchronously

We've got a GWT application with a simple search mask displaying the results as a grid.
Server side processing time is ok as well as network latency.
Client rendering time is ok even on low spec hardware with Internet Explorer 6 as long as the number of results is not too high (max 100 rows in the grid).
We have implemented a navigation scheme allowing the user to scroll up/down the grid. That's fast enough also.
Does anybody have an idea whether it is possible to display the first 100 results immediately and pull the rest in the background? The GWT architecture allows this. However, I'm interested in possible pitfalls, e.g. what happens if the user starts another query while the browser is still fetching previous results.
Thanks!
Holger
LazyPanel and this blog post might be a good starting point for you :)
The GWT Incubator also has many interesting (albeit not always complete/perfect/stable) tables and other pagination solutions, like PagingScrollTable.
Assuming your plan is to send the first 100 and then bring the rest, you can fetch the rest of the results in batches. Then, if a user initiates another search, you just wait for the end of the current batch (i.e., check between batch retrievals whether you have a pending query).
Another way to go is to assign identifiers to the user's searches. This makes the problem of mixed results non-existent, and will also help you with results history for multiple searches.
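The identifier idea could look roughly like this. It is written in plain JavaScript rather than GWT Java to keep it short, and the `SearchSession` class and its method names are made up for illustration; the same bookkeeping would work inside a GWT async callback.

```javascript
// Tracks the most recent search so that late batches from older searches are dropped.
class SearchSession {
  constructor() {
    this.currentSearchId = 0;
    this.rows = [];
  }

  // Call when the user submits a query; returns the id to attach to its requests.
  startSearch() {
    this.currentSearchId += 1;
    this.rows = [];
    return this.currentSearchId;
  }

  // Call when a batch arrives; returns false if it belongs to a superseded search.
  acceptBatch(searchId, batch) {
    if (searchId !== this.currentSearchId) return false;
    this.rows.push(...batch);
    return true;
  }
}

const session = new SearchSession();
const firstSearch = session.startSearch();
session.acceptBatch(firstSearch, ["a1", "a2"]);

// The user starts a new query while the first one is still loading.
const secondSearch = session.startSearch();
const staleAccepted = session.acceptBatch(firstSearch, ["a3"]); // late batch, ignored
session.acceptBatch(secondSearch, ["b1"]);
```

At the end the grid data contains only "b1"; the late batch from the first search is rejected instead of being mixed into the new results.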
We found that users love the live grid look and feel, which solves most of those problems, but that might not always be an option.

What is the reasoning for and the basic concepts behind an interstitial loading page?

I'm interested in finding out why this is used on some Web sites for processing user-initiated search submissions, how it affects the request and response flow, and programmatically why it would be necessary (or beneficial). In an MVC framework it seems difficult to execute since you are injecting another page into the middle of the flow.
EDIT:
Not advertising related. For instance, most travel sites used to do this, and there were no ads... some banking sites do it too, where there is just a loader that says something like "Please wait while we process your transaction...".
It is often used in long running requests to prevent the web server from timing out the request. With an interstitial page, you are able to continuously refresh the page until you get results back.
EDIT:
Also, for long running requests, it is beneficial to have a "Loading.." page in order to show the user that something is happening. Without the interstitial page, the request can appear to have hung up if it takes too long.
To supplement what HVS said, interstitials which appear before, say a homepage loads, are very much there for the purpose of advertising, we've all seen the 'close this ad' link.
One instance where they can be helpful from a user experience point of view is when a user initiates an action which requires feedback from a process which may take some time to respond - either because it's slow, busy or just has a lot of processing to do.
Think of a site where you book a flight online, for example. You often get an interstitial on hitting 'find flights' because the system is having to go off and ask for all relevant flight information and then sort it for you before displaying it on your screen. If this round-trip of 'request, interrogate, return, display' is likely to take longer than a normal transition from one webpage to the next, a UX designer may consider an interstitial screen (or message) to let the user know something is happening, whilst at the same time allowing the system the time it needs to complete the request. Any screen with this sort of face-time is going to get the attention of your marketing department from a 'well while we've got them we might as well show them something' point of view.
As a UX designer myself, I don't always prefer interstitials like this, as I'd love every system to return data immediately. But if it can't, for whatever reason, I'm very much for keeping the user in the loop as much as possible about what is happening, rather than leaving them to stare at the browser status bar until they either try again or get fed up and leave.
One final point when considering this is to have a lower and an upper time limit on a screen like this. If you need to show an interstitial, show it for long enough that people can read and understand it, but not so long that they get fed up with waiting. As a rough guide, leave it open for at least 3-4 seconds (even if the process averages 4 seconds but has finished after 1 on this occasion). Between 4 and 10 seconds, check every second to see if the process has responded (and then take the user to the next page if it has), and after 10 seconds seriously consider telling the user to either try again or telling them you've failed (whilst at the same time getting your tech team to fix what is ultimately a problem which will affect your bottom line).
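The rough guide above boils down to a small decision made at each one-second check. A sketch, assuming the 4 and 10 second limits from the paragraph; the function and constant names are mine:

```javascript
const MIN_DISPLAY_MS = 4000; // keep the page up long enough to be read
const GIVE_UP_MS = 10000; // after this, tell the user to retry or report failure

// Decides what the interstitial should do at each one-second check.
function interstitialAction(elapsedMs, processFinished) {
  if (elapsedMs < MIN_DISPLAY_MS) return "wait";
  if (processFinished) return "proceed";
  if (elapsedMs >= GIVE_UP_MS) return "fail";
  return "wait";
}
```

So a process that finishes after 1 second still shows the screen until the 4 second mark, and a process that has not answered by 10 seconds ends in the failure message rather than an endless spinner.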
I believe the vast majority of interstitial pages are there to run advertising.
