What metrics for GUI usability do you know?

Of course the best metric would be the happiness of your users.
But what metrics do you know for measuring GUI usability?
For example, one common metric is the average click count needed to perform an action.
What other metrics do you know?

Jakob Nielsen has several articles regarding usability metrics, including one that is entitled, well, Usability Metrics:
The most basic measures are based on the definition of usability as a quality metric:
success rate (whether users can perform the task at all),
the time a task requires,
the error rate, and
users' subjective satisfaction.
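As a minimal illustration (the task-log records below are invented, and subjective satisfaction would come from a questionnaire rather than logs), the first three measures fall straight out of simple per-task records:

    # Minimal sketch: three of Nielsen's quality metrics computed from
    # hypothetical per-task usability test records.
    tasks = [
        # (user, succeeded, seconds_taken, errors_made)
        ("u1", True, 42.0, 1),
        ("u2", False, 90.0, 4),
        ("u3", True, 35.5, 0),
    ]

    success_rate = sum(t[1] for t in tasks) / len(tasks)
    avg_task_time = sum(t[2] for t in tasks) / len(tasks)
    errors_per_task = sum(t[3] for t in tasks) / len(tasks)

    print(f"success rate: {success_rate:.0%}")         # 67%
    print(f"mean time on task: {avg_task_time:.1f}s")  # 55.8s
    print(f"errors per task: {errors_per_task:.2f}")   # 1.67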

I just look at where I want users to go and where (physically) they are going on screen; I do this with data from Google Analytics.

Not strictly usability, but we sometimes measure the ratio of the GUI code to the backend code. This is for the managers, to remind them that while functionality is important, the GUI should get a proportional budget for user testing and study too.

Check:
http://www.iqcontent.com/blog/2007/05/a-really-simple-metric-for-measuring-user-interfaces/

Here is a simple pre-launch check you should do on all your web applications. It only takes about 5 seconds and one screenshot.
Q: “What percentage of your interface contains stuff that your customers want to see?”
a) 10%
b) 25%
c) 100%
If you answer a or b then you might do well, but you’ll probably get blown out of the water once someone decides to enter the market with option c.

Related

How to implement personalized feed ranking?

I have an app that aggregates various sports content (news articles, videos, user discussions, tweets), and I'm currently working on having it display relevant content to users. Each post has a like button, so I'm using that to determine what's popular. I'm using the Reddit algorithm to sort by popularity while also factoring in time. However, my problem is that I want to make it more personalized for each user. Each user should see more content based on what they like. I have several factors I'm measuring:
- How much of each content type they watch/click on. Ex: 60% videos and 40% articles
- Which teams/players they like. If a news item is about a team they like, it should be weighted more heavily
- Which sports they like more. Users can follow several sports
What I'm currently doing:
For each of the factors listed above, I'll increase an article's popularity score by X. Ex: the user likes videos 70% more than other content, so I'll increase the score of videos by 70%.
I'm looking to see if there are better ways to do this. I've been told machine learning would be a good way, but I wanted to see if there are any alternatives out there.
It sounds like what you're doing is a great place to start with personalizing your users' feeds.
Ranking based on popularity metrics (likes, comments, etc.), recency, and, in your case, content type is the basis of the EdgeRank algorithm that Facebook used to use.
There are a lot of metrics that you can apply to try and boost engagement. Something like: the user liked posts from team y, x times, so boost a post in the feed by log(x) if it is from team y; boost it if it's newer; boost it if it's popular; etc. You can start to see that these EdgeRank algorithms can get a bit unwieldy rather quickly the more metrics you track. Also, all the hyper-parameters you set tend to be fixed for every user, which won't end up with the ideal ranking algorithm for each user. This is where machine learning techniques can come into play.
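For illustration, here is a minimal sketch in Python of the kind of hand-tuned, EdgeRank-style score described above; the weights, field names, and decay exponent are all invented, not Facebook's actual values:

    import math, time

    # Hand-tuned feed score: popularity x affinity x content-type boost,
    # divided by a time-decay term so newer posts float up.
    def feed_score(post, user, now=None):
        now = now or time.time()
        score = 1.0 + post["likes"]                    # popularity
        if post["team"] in user["liked_teams"]:
            likes_for_team = user["liked_teams"][post["team"]]
            score *= 1.0 + math.log1p(likes_for_team)  # log dampens heavy likers
        if post["type"] == user["preferred_type"]:
            score *= 1.5                               # content-type preference
        age_hours = (now - post["created_at"]) / 3600.0
        return score / (1.0 + age_hours) ** 1.5        # time decay

    user = {"liked_teams": {"Lakers": 12}, "preferred_type": "video"}
    post = {"likes": 40, "team": "Lakers", "type": "video",
            "created_at": time.time() - 2 * 3600}
    print(feed_score(post, user))

Every new metric adds another branch and another magic constant here, which is exactly how these scorers become unwieldy.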
The main class of algorithms that deals with this sort of thing is often called Learning to Rank, and at a high level it can be generalized into three categories: collaborative filtering techniques, content-based techniques, and hybrid techniques (a blend of the first two).
In your case, with a feed that most likely gets updated fairly frequently with new items, I would take a look at content-based methods. Typically these algorithms are optimized around engagement metrics, such as the likelihood that the user will click on, view, comment on, or like an activity within their feed.
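As a toy sketch of that engagement-prediction idea (assuming scikit-learn is available; the features and data are invented), you could train a click-probability model per user and sort the feed by its predictions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # columns: is_video, matches_followed_team, post_age_hours
    X = np.array([[1, 1, 2.0], [0, 1, 5.0], [1, 0, 1.0], [0, 0, 30.0]])
    y = np.array([1, 1, 0, 0])  # did this user click on the item?

    model = LogisticRegression().fit(X, y)

    candidates = np.array([[1, 1, 0.5], [0, 1, 12.0]])
    click_prob = model.predict_proba(candidates)[:, 1]
    ranked = candidates[np.argsort(-click_prob)]  # highest probability first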
A little bit of self-promotion: I wrote a couple blog posts that cover some of this that you may find interesting.
https://getstream.io/blog/instagram-discovery-engine-tutorial/
https://getstream.io/blog/beyond-edgerank-personalized-news-feeds/
This can be a lot to take on, so you could also take a look at using a 3rd-party service like Stream (disclaimer: I work there), which helps developers build scalable, personalized feeds.

How would rating system be done for users in a web application?

I am implementing a web application that has many users, and I would like to give users ratings based on their activities and based on other users liking those activities. How would I implement such an algorithm? I am looking for an elegant and smart algorithm that could help.
You are basically looking for scoring algorithms. These articles might help:
How not to sort by average rating
Rank hotness with Newton's Law of Cooling
How Reddit Ranking Algorithms work
Hope this helps.
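For example, the first article ("How not to sort by average rating") recommends ranking by the lower bound of the Wilson score confidence interval rather than by the raw average; a minimal Python sketch:

    import math

    # Lower bound of the Wilson score interval for a Bernoulli parameter
    # (z = 1.96 is roughly a 95% confidence level).
    def wilson_lower_bound(upvotes, total, z=1.96):
        if total == 0:
            return 0.0
        phat = upvotes / total
        denom = 1 + z * z / total
        centre = phat + z * z / (2 * total)
        spread = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
        return (centre - spread) / denom

    # 60/100 positive outranks 2/2 positive: more evidence, safer estimate.
    print(wilson_lower_bound(60, 100))  # ~0.50
    print(wilson_lower_bound(2, 2))     # ~0.34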
Maybe your answer is staring right at you next to your username on this site :-) Stackoverflow.com's scoring system and badges are here to promote certain behaviors on the site. The algorithm is simple and the feedback is immediate so that everybody can see the consequences of certain actions.
What are the ratings used for? If you want to use the ratings as incentives for your users to encourage a specific behavior, then I believe you need to look at disciplines like behavioral psychology to figure out what behaviors you want to measure and reward.
If you already have a user base that reflects the typical user base you're trying to address, you might want to try simple trial and error. Pick some actions, e.g. receiving a like on a post, and add points to the user's score whenever that happens. Watch the user community's reaction when you introduce the scoring system and see if it helps motivate the behavior you want. If not, try changing some other parameters and repeat.
Depending on your system, some users might try to game the system, so you could find yourself locked into an eternal cat and mouse game once you introduce a rating system (example: Google page ranking).

Resources related to data-mining and gaming on social networks

I'm interested in the problem of pattern mining among players of social networking games, for example detecting cheaters in a game given a company's user database. So far I have been following the usual recipe for a data mining project (a rough code sketch follows the list):
construct a data warehouse that aggregates significant information
select a classifier, and train it with a subsection of records from the warehouse
validate classifier with another test set
lather, rinse, repeat
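As a rough sketch of steps 2 and 3 (scikit-learn assumed; the features and the "cheater" label below are purely synthetic stand-ins for whatever the warehouse provides):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(0)
    X = rng.random((500, 3))         # stand-in features, e.g. plays/day,
                                     # rewards/play, prize transfers/week
    y = (X[:, 2] > 0.9).astype(int)  # stand-in "cheater" label

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))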
Surprisingly, I've found very little in this area in terms of literature, best practices, etc. I am hoping to crowdsource the information-gathering problem here. Specifically, what I'm looking for:
What classifiers have worked well for this type of pattern mining? (It seems highly temporal: users playing games, users receiving rewards, users transferring prizes, etc.)
Are there any highly agreed upon attributes specific to social networking / gaming data?
What is a practical amount of information that should be considered? One problem I've run into is data overload, where queries and data cleansing may take days to complete.
Related to the point above, what hardware resources are required to produce results? I've found it difficult to estimate the amount of computing power I will require for production use. It has become apparent that a white box in the corner does not have enough horsepower for such a project. Are companies generally resorting to cloud solutions? Are they buying clusters?
Basically, any resources (theoretical, academic, or practical) about implementing a social networking / gaming pattern-mining program would be very much appreciated.
Thanks.
I am looking for the same kind of resources. Here are some things I found that I consider pretty interesting; I hope you can take advantage of them, and please let me know if you discover more resources.
Here they are:
http://techcrunch.com/2010/04/06/turiya-media-games/
http://www.kdnuggets.com/2010/08/video-tutorial-christian-thurau-data-mining-in-games.html?k10n21
http://www.gamasutra.com/view/feature/2816/better_game_design_through_data_.php
This one is in Portuguese but is excellent: http://thiagofalcao.info/

How does the Amazon Recommendation feature work?

What technology goes on behind the scenes of Amazon's recommendation technology? I believe that Amazon's recommendations are currently the best in the market, but how do they provide us with such relevant recommendations?
Recently, we have been involved with a similar kind of recommendation project, and we would surely like to know the ins and outs of Amazon's recommendation technology from a technical standpoint.
Any inputs would be highly appreciated.
Update:
This patent explains how personalized recommendations are done but it is not very technical, and so it would be really nice if some insights could be provided.
From Dave's comments, affinity analysis forms the basis for this kind of recommendation engine. Here are some good reads on the topic:
Demystifying Market Basket Analysis
Market Basket Analysis
Affinity Analysis
Suggested Reading:
Data Mining: Concepts and Techniques
It is both an art and a science. Typical fields of study revolve around market basket analysis (also called affinity analysis), which is a subset of the field of data mining. Typical components in such a system include identification of primary driver items and identification of affinity items (accessory upsell, cross-sell).
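As a toy illustration of the market basket idea (the carts and items are invented), plain co-occurrence counts already yield the standard support and confidence scores:

    from collections import Counter
    from itertools import combinations

    carts = [{"camera", "sd_card", "bag"}, {"camera", "sd_card"},
             {"camera", "tripod"}, {"sd_card", "bag"}]

    item_count = Counter(i for cart in carts for i in cart)
    pair_count = Counter(frozenset(p) for cart in carts
                         for p in combinations(sorted(cart), 2))

    n = len(carts)
    for pair, cnt in pair_count.most_common(3):
        a, b = sorted(pair)
        support = cnt / n                 # P(a and b in the same cart)
        confidence = cnt / item_count[a]  # P(b | a)
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")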
Keep in mind the data sources they have to mine...
Purchased shopping carts = real money from real people spent on real items = powerful data and a lot of it.
Items added to carts but abandoned.
Pricing experiments online (A/B testing, etc.) where they offer the same products at different prices and see the results
Packaging experiments (A/B testing, etc.) where they offer different products in different "bundles" or discount various pairings of items
Wishlists - what's on them specifically for you - and in aggregate it can be treated similarly to another stream of basket analysis data
Referral sites (identification of where you came in from can hint other items of interest)
Dwell times (how long before you click back and pick a different item)
Ratings by you or those in your social network/buying circles - if you rate things you like, you get more of what you like, and if you confirm with the "I already own it" button they create a very complete profile of you
Demographic information (your shipping address, etc.) - they know what is popular in your general area for your kids, yourself, your spouse, etc.
User segmentation = did you buy 3 books in separate months for a toddler? You likely have a kid or more, etc.
Direct marketing click through data - did you get an email from them and click through? They know which email it was and what you clicked through on and whether you bought it as a result.
Click paths in session - what did you view regardless of whether it went in your cart
Number of times viewed an item before final purchase
If you're dealing with a brick-and-mortar store, they might have your physical purchase history to go off of as well (e.g. Toys R Us or something that is both online and a physical store)
etc. etc. etc.
Luckily, people behave similarly in aggregate, so the more they know about the buying population at large, the better they know what will and won't sell; and with every transaction and every rating/wishlist add/browse, they know how to tailor recommendations more personally. Keep in mind this is likely only a small sample of the full set of influences on what ends up in recommendations.
Now, I have no inside knowledge of how Amazon does business (I never worked there), and all I'm doing is talking about classical approaches to the problem of online commerce: I used to be the PM who worked on data mining and analytics for the Microsoft product called Commerce Server. We shipped in Commerce Server the tools that allowed people to build sites with similar capabilities... but the bigger the sales volume, the better the data, and the better the data, the better the model; and Amazon is BIG. I can only imagine how much fun it is to play with models with that much data on a commerce-driven site. Many of those algorithms (like the predictor that started out in Commerce Server) have since moved on to live directly within Microsoft SQL Server.
The four big take-aways you should have are:
Amazon (or any retailer) is looking at aggregate data for tons of transactions and tons of people... this allows them to recommend pretty well even for anonymous users on their site.
Amazon (or any sophisticated retailer) is keeping track of behavior and purchases of anyone that is logged in and using that to further refine on top of the mass aggregate data.
Often there is a means of overriding the accumulated data and taking "editorial" control of suggestions, for product managers of specific lines (like someone who owns the 'digital cameras' vertical or the 'romance novels' vertical or similar) where they truly are experts.
There are often promotional deals (e.g. Sony, Panasonic, Nikon, Canon, Sprint, or Verizon pays additional money to the retailer, or gives a better discount at larger quantities, etc.) that will cause certain "suggestions" to rise to the top more often than others. There is always some reasonable business logic and business reason behind this, targeted at making more on each transaction or reducing wholesale costs.
In terms of actual implementation? Just about all large online systems boil down to some set of pipelines (or a filter-pattern implementation, or a workflow; call it what you will) that allow a context to be evaluated by a series of modules that apply some form of business logic.
Typically a different pipeline would be associated with each separate task on the page - you might have one that does recommended "packages/upsells" (i.e. buy this with the item you're looking at) and one that does "alternatives" (i.e. buy this instead of the thing you're looking at) and another that pulls items most closely related from your wish list (by product category or similar).
The results of these pipelines can be placed on various parts of the page (above the fold, below the fold, on the left, on the right, in different fonts, with different-size images, etc.) and tested to see which perform best. Since you're using nice, easy-to-plug-in modules that define the business logic for these pipelines, you end up with the moral equivalent of Lego blocks: it becomes easy to pick and choose the business logic you want applied when you build another pipeline, which allows faster innovation, more experimentation, and in the end higher profits.
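A minimal sketch of that pipeline/filter pattern in Python (the module names and context fields are invented):

    # A context flows through pluggable modules, each applying one
    # piece of business logic; swapping modules builds a new pipeline.
    def drop_out_of_stock(ctx):
        ctx["candidates"] = [i for i in ctx["candidates"] if i["in_stock"]]
        return ctx

    def boost_wishlist_matches(ctx):
        for item in ctx["candidates"]:
            if item["id"] in ctx["user"]["wishlist"]:
                item["score"] *= 2.0
        return ctx

    def run_pipeline(ctx, modules):
        for module in modules:   # each module is one of the "lego blocks"
            ctx = module(ctx)
        return ctx

    ctx = {"user": {"wishlist": {"b"}},
           "candidates": [{"id": "a", "score": 1.0, "in_stock": True},
                          {"id": "b", "score": 1.0, "in_stock": True},
                          {"id": "c", "score": 3.0, "in_stock": False}]}
    result = run_pipeline(ctx, [drop_out_of_stock, boost_wishlist_matches])
    print(sorted(result["candidates"], key=lambda i: -i["score"]))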
Did that help at all? I hope it gives you a little insight into how this works in general for just about any e-commerce site, not just Amazon. Amazon (from talking to friends who have worked there) is very data-driven and continually measures the effectiveness of its user experience, pricing, promotion, packaging, etc. They are a very sophisticated online retailer, likely at the leading edge of a lot of the algorithms they use to optimize profit, and those algorithms are likely proprietary secrets (you know, like the formula for KFC's secret spices) and guarded as such.
This isn't directly related to Amazon's recommendation system, but it might be helpful to study the methods used by people who competed in the Netflix Prize, a contest to develop a better recommendation system using Netflix user data. A lot of good information exists in their community about data mining techniques in general.
The team that won used a blend of the recommendations generated by a lot of different models/techniques. I know that some of the main methods used were principal component analysis, nearest neighbor methods, and neural networks. Here are some papers by the winning team:
R. Bell, Y. Koren, C. Volinsky, "The BellKor 2008 Solution to the Netflix Prize" (2008).
A. Töscher, M. Jahrer, "The BigChaos Solution to the Netflix Prize 2008" (2008).
A. Töscher, M. Jahrer, R. Legenstein, "Improved Neighborhood-Based Algorithms for Large-Scale Recommender Systems", SIGKDD Workshop on Large-Scale Recommender Systems and the Netflix Prize Competition (KDD'08), ACM Press (2008).
Y. Koren, "The BellKor Solution to the Netflix Grand Prize" (2009).
A. Töscher, M. Jahrer, R. Bell, "The BigChaos Solution to the Netflix Grand Prize" (2009).
M. Piotte, M. Chabbert, "The Pragmatic Theory Solution to the Netflix Grand Prize" (2009).
The 2008 papers are from the first year's Progress Prize. I recommend reading the earlier ones first because the later ones build upon the previous work.
I bumped into this paper today:
Amazon.com Recommendations: Item-to-Item Collaborative Filtering
Maybe it provides additional information.
(Disclaimer: I used to work at Amazon, though I didn't work on the recommendations team.)
ewernli's answer should be the correct one -- the paper links to Amazon's original recommendation system, and from what I can tell (both from personal experience as an Amazon shopper and having worked on similar systems at other companies), very little has changed: at its core, Amazon's recommendation feature is still very heavily based on item-to-item collaborative filtering.
Just look at what form the recommendations take: on my front page, they're all either of the form "You viewed X...Customers who also viewed this also viewed...", or else a melange of items similar to things I've bought or viewed before. If I specifically go to my "Recommended for You" page, every item describes why it's recommended for me: "Recommended because you purchased...", "Recommended because you added X to your wishlist...", etc. This is a classic sign of item-to-item collaborative filtering.
So how does item-to-item collaborative filtering work? Basically, for each item, you build a "neighborhood" of related items (e.g., by looking at what items people have viewed together or bought together; to determine similarity, you can use metrics like the Jaccard index -- correlation is another possibility, though I suspect Amazon doesn't use ratings data very heavily). Then, whenever I view an item X or purchase an item Y, Amazon suggests things in the same neighborhood as X or Y.
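A minimal sketch of that neighborhood-building step in Python, using the Jaccard index over invented co-purchase sets:

    purchases = {                  # item -> set of users who bought it
        "X": {"u1", "u2", "u3"},
        "Y": {"u2", "u3", "u4"},
        "Z": {"u5"},
    }

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    def neighborhood(item, k=2):
        others = (o for o in purchases if o != item)
        return sorted(others, reverse=True,
                      key=lambda o: jaccard(purchases[item], purchases[o]))[:k]

    # "Customers who bought X also bought..." = X's nearest neighbors.
    print(neighborhood("X"))  # ['Y', 'Z']: Y shares 2 of 4 users with X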
Some other approaches that Amazon could potentially use, but likely doesn't, are described here: http://blog.echen.me/2011/02/15/an-overview-of-item-to-item-collaborative-filtering-with-amazons-recommendation-system/
A lot of what Dave describes is almost certainly not done at Amazon. (Ratings by those in my social network? Nope, Amazon doesn't have any of my social data. This would be a massive privacy issue in any case, so it'd be tricky for Amazon to do even if they had that data: people don't want their friends to know what books or movies they're buying. Demographic information? Nope, nothing in the recommendations suggests they're looking at this. [Unlike Netflix, who does surface what other people in my area are watching.])
I don't have any knowledge of Amazon's algorithm specifically, but one component of such an algorithm would probably involve tracking groups of items frequently ordered together, and then using that data to recommend other items in the group when a customer purchases some subset of the group.
Another possibility would be to track the frequency of item B being ordered within N days after ordering item A, which could suggest a correlation.
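As a toy sketch of that second idea (the order history, field layout, and 14-day window are invented):

    from collections import Counter
    from datetime import datetime, timedelta

    orders = [  # (user, item, timestamp)
        ("u1", "A", datetime(2024, 1, 1)), ("u1", "B", datetime(2024, 1, 5)),
        ("u2", "A", datetime(2024, 1, 2)), ("u2", "B", datetime(2024, 2, 20)),
    ]

    def followups(orders, first, window=timedelta(days=14)):
        counts = Counter()
        first_buy = {}
        for user, item, ts in sorted(orders, key=lambda o: o[2]):
            if item == first:
                first_buy[user] = ts
            elif user in first_buy and ts - first_buy[user] <= window:
                counts[item] += 1
        return counts

    print(followups(orders, "A"))  # Counter({'B': 1}); u2 falls outside the window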
As far as I know, it uses Case-Based Reasoning as its engine.
You can see these sources: here, here, and here.
There are many sources on Google if you search for Amazon and case-based reasoning.
If you want a hands-on tutorial (using open-source R), then you could do worse than going through this:
https://gist.github.com/yoshiki146/31d4a46c3d8e906c3cd24f425568d34e
It is a run-time optimised version of another piece of work:
http://www.salemmarafi.com/code/collaborative-filtering-r/
However, the variation of the code at the first link runs MUCH faster, so I recommend using that (I found the only slow part of yoshiki146's code was the final routine, which generates the recommendations at user level; it took about an hour with my data on my machine).
I adapted this code to work as a recommendation engine for the retailer I work for.
The algorithm used is, as others have said above, collaborative filtering. This method of CF calculates a cosine similarity matrix and then sorts by that similarity to find the 'nearest neighbour' of each element (a music band in the example given, a retail product in my application).
The resulting table can recommend a band/product based on another chosen band/product.
The next section of the code goes a step further, with USER (or customer) based collaborative filtering. The output of this is a large table with the top 100 bands/products recommended for a given user/customer.
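For the item-item step, here is a minimal Python sketch of the cosine-similarity computation (the linked tutorials use R; the interaction matrix below is invented):

    import numpy as np

    ratings = np.array([[1, 1, 0],   # rows = users, columns = items/bands
                        [1, 0, 1],
                        [1, 1, 0]], dtype=float)

    norms = np.linalg.norm(ratings, axis=0)
    sim = (ratings.T @ ratings) / np.outer(norms, norms)  # cosine similarity

    item = 0
    neighbors = np.argsort(-sim[item])  # items sorted by similarity to item 0
    print(neighbors[1:])                # nearest neighbours, self excluded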
Someone gave a presentation at our university on something similar last week and referenced the Amazon recommendation system. I believe it uses a form of k-means clustering to group people by their different buying habits. Hope this helps :)
Check this out too: Link and as HTML.

A/B testing on a news site to improve relevance

If you were running a news site that created a list of the 10 top news stories, and you wanted to make tweaks to your algorithm and see if people liked the new top-story mix better, how would you approach this?
Simple click logging in the DB associated with the post entry?
A/B testing, where you would show one version of the algorithm to group A and another to group B and measure the clicks?
What sort of characteristics would you base your decision on as to whether the changes were better?
An A/B test seems a good start; randomize the participants, and remember them so they never see both versions.
You could treat it like a behavioral psychology experiment and do a t-test, etc.
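A minimal sketch of such a test with SciPy (the per-user click counts are invented):

    from scipy import stats

    clicks_a = [3, 5, 4, 6, 2, 5, 4]  # clicks per user, algorithm A
    clicks_b = [5, 6, 7, 5, 6, 8, 4]  # clicks per user, algorithm B

    t_stat, p_value = stats.ttest_ind(clicks_a, clicks_b)
    print(f"t={t_stat:.2f}, p={p_value:.3f}")  # small p -> difference unlikely by chance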
In addition to monitoring the number of clicks, it might also be helpful to monitor how long users look at the story they clicked on. It's more complicated data, but it provides another level of information: you would then be seeing not only whether the stories you picked grab users' attention, but also whether the stories are able to keep it.
You could do statistical analysis (i.e. a t-test, as Tim suggested), but you probably won't get a low enough standard deviation on either measure to prove significance. Although it won't really matter: all you need is for one of the algorithms to have a higher average number of clicks and/or time spent. No need to fool around with hypothesis testing, hopefully.
Of course, there is always the option of simply asking the user if the recommendations were relevant, but that may not be feasible for your situation.
