GameAnalytics: days since install / session number filter

In "Look up metrics" I'm trying to find out how my players improve at my game.
I record the score (both as a design event and as a progression event, just to try) and in Look up metrics I try to filter by session number or days since install, but even if I group by dimension this doesn't produce any result.
By contrast, if I do the same with the device filter, it shows the expected histogram with the mean score per device.
What am I doing wrong?

From customer care:
The session filter works only on core metrics at this point (like DAU). We hope to make this filter compatible with custom metrics as well but this might take time as we first need to include this improvement to our roadmap and then evaluate it by comparing it with our other tasks. As a result, there is no ETA on making a release.
I would recommend you to download the raw data (go to "Export data" in the settings of the game) and perform an analysis on your own for this sort of "per user" analysis. You should be able to create stats per user. GA does not do this since your game can reach millions of users and there's no way you can plot this amount of entries in a browser.
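As a rough sketch of the "per user" analysis suggested above, the exported events could be grouped by session number and averaged. This assumes the export is one JSON object per line; the field names `user_id`, `session_num` and `score` are hypothetical, so adjust them to whatever your export actually contains.

```python
import json
from collections import defaultdict
from statistics import mean


def mean_score_by_session(lines):
    """Average score per session number from JSON-lines event records."""
    scores = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        # Hypothetical field names; rename to match your raw export.
        scores[event["session_num"]].append(event["score"])
    return {session: mean(values) for session, values in sorted(scores.items())}


# Tiny made-up sample standing in for the exported file.
sample = [
    '{"user_id": "a", "session_num": 1, "score": 10}',
    '{"user_id": "b", "session_num": 1, "score": 20}',
    '{"user_id": "a", "session_num": 2, "score": 30}',
]
print(mean_score_by_session(sample))
```

If the mean rises with the session number, players are improving; the same grouping works for days since install if that value is present in the export.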

Implementing Trending in Elasticsearch

I'm building a project that indexes celebrity-related content across sites (TMZ, People, etc.). I always thought it would be funny to "bet" on people (and maybe shows, directors, etc.) like horse racing or the stock market, only not with real money, where the value of a person changes day to day, hour to hour, and even minute to minute if we can figure this out together, Stack Overflow denizens.
I assign traffic values to users based on mentions in social media. I have some scrapers (probably violating some ToSes) and access to Twitter's API to get relative counts for search results over a period of time, so I have known "numbers" to associate with users outside of Elasticsearch for periods of time to build the trends. To be clear, I am not looking to implement trending based on the number of documents in the system, which stays pretty consistent; I need to rank documents that already exist based on trends.
So that's what I've got: a few hundred thousand articles with predetermined associations to individual celebrities, plus on-the-minute scores for those celebrities, which are merged and applied to each article so that each article has a few scores associated with it. There's some complexity here that doesn't matter; the bottom line is that I have 10 or so values that I want to assign to each piece of content, and I want to sort on them with a function score or script score when you're on the market page.
So the question: how do I assign these values without making Elasticsearch go crazy with re-indexing? I need to use these values to sort dozens of requests per second coming from feeds on the site, but I am running this on a Raspberry Pi... literally, I've maxed the poor thing out for memory.
We're really write-heavy, but if for some reason celebrity stock markets take off, we'll also be really read-heavy at the same time. I swear I remember a plugin that kept metadata associated with content, but I cannot find it.
I've tried enabled: false and index: false in the mapping, but the updates still seem to thrash read times while they are being written. The best I've managed is increasing the refresh_interval, but that's still pretty expensive and starts to affect the "real-time" nature of the app.
I believe that this is impossible as you've laid it out. Any updates to a field will update _source and fire the full update process.
There are some alternatives that you might consider:
Replication, if another cluster is available
A separate write index on the same cluster, space allowing

User Search Strategy Mobile Dev

I want to implement a search feature on my app that re-filters upon each new character entered into the search bar so users can search for other users. This is a fairly common feature on apps, but as a beginner it would seem like a very computationally complex process. It would seem that one of two things happen:
1) For each new character typed, the frontend queries the backend, which applies the filter and returns the results.
2) The frontend loads all (or many) possible results beforehand and updates the filter on the stored info as new characters are entered.
It would seem that 1) would have time complexity issues, as it makes O(n) queries (where n is the number of characters) per search. This is especially problematic because it's expected that the filtered search results update near instantaneously. Additionally, my average query time is probably slower than most, as I'm using a three-tier architecture (frontend <-> server <-> graph database).
I don't like 2)--at least in its straightforward form--as the number of possible results can get very large. We can reduce the space complexity of this by querying only for a limited set of user attributes (perhaps only uid and name, and fetching details on the fly if needed), but the point remains.
Things get more interesting if we modify 2) to load only a sample of users (and here we can use data like Location as well as ML/AI to select). The problem with this is that the searching user could always be looking for someone we didn't select. It would be a horrible (even if rare) experience for a user to know their friend was on the app but was unable to find them because our algorithm was only accurate for 99% of searches.
I am sure this is possible--other apps seem to pull it off--so what am I missing?
First, you should avoid querying the server for each character typed. Most of the time the user types a bunch of characters very fast without looking at the suggested results, especially because with only a few characters the results wouldn't be specific enough. Autocompletion systems typically adopt both of the following:
query only if the string is at least 2-3 characters long;
query only once the user has stopped typing, i.e. about 300 ms after the last keystroke.
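The two rules above can be sketched as a small gate that sits between the search box and the backend call. This is only an illustration: the class name `SearchGate`, its methods and the explicit millisecond timestamps are assumptions made so the logic can be followed and tested without a UI framework; in a real app you would hook this into your framework's timer or debounce utility.

```python
class SearchGate:
    """Decides when a typed query should actually be sent to the backend."""

    def __init__(self, min_chars=2, quiet_ms=300):
        self.min_chars = min_chars    # rule 1: minimum query length
        self.quiet_ms = quiet_ms      # rule 2: pause required after last keystroke
        self._text = ""
        self._last_key_ms = None
        self._sent = None

    def on_key(self, text, now_ms):
        """Record the current content of the search box after a keystroke."""
        self._text = text
        self._last_key_ms = now_ms

    def poll(self, now_ms):
        """Return the query to send now, or None if nothing should be sent."""
        if self._last_key_ms is None:
            return None
        if len(self._text) < self.min_chars:
            return None               # too short to give useful results
        if now_ms - self._last_key_ms < self.quiet_ms:
            return None               # user is probably still typing
        if self._text == self._sent:
            return None               # this exact query was already sent
        self._sent = self._text
        return self._text


gate = SearchGate()
gate.on_key("s", now_ms=0)
gate.on_key("st", now_ms=100)
gate.on_key("ste", now_ms=180)
print(gate.poll(now_ms=200))  # still inside the quiet period
print(gate.poll(now_ms=500))  # quiet period elapsed, query is released
```

With this in place, typing "ste" quickly results in a single backend query instead of three.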
To get all the pertinent results without huge data transfer you could implement a progressive data load. Just load enough results to fill the page height, then as the user scrolls down load more results. However if you reach a high number of results you should stop retrieving them and ask the user to type a more specific search.
If you want to make your users happy, try to sort the results by relevance. For example, if you know where the users are located you may sort the results by distance: if I live in Italy and I search for "Ste", it is more likely that my friend is Stefano who lives in Rome than Steve who lives in NY.
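A minimal sketch of sorting by distance, assuming each user record carries a latitude and longitude (the `lat` and `lon` keys and the sample coordinates are made up for illustration). It uses the haversine great-circle distance; in practice you would more likely push this ordering into the database query.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def sort_by_distance(searcher, candidates):
    """Order matching users so that the ones nearest the searcher come first."""
    return sorted(
        candidates,
        key=lambda u: haversine_km(searcher["lat"], searcher["lon"], u["lat"], u["lon"]),
    )


# Hypothetical data: a searcher in Milan and two matches for "Ste".
me = {"lat": 45.46, "lon": 9.19}
matches = [
    {"name": "Steve", "lat": 40.71, "lon": -74.01},   # New York
    {"name": "Stefano", "lat": 41.90, "lon": 12.50},  # Rome
]
print([u["name"] for u in sort_by_distance(me, matches)])
```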

Is there any way to analyze the outcome of each step of OptaPlanner?

I'm trying to solve an SKU (Stock Keeping Unit) sequencing problem on the production line at the company I work for.
In this problem, I have on average 2000 SKUs to be sequenced on a single piece of equipment. This equipment is available for production for 600 minutes per day. The time each SKU takes varies (production time + equipment setup time).
I am having difficulty analyzing the setup time of the equipment, since it depends on both the SKU currently being produced and the SKU that will be produced next.
Is there any way to analyze the solver's steps? Or is there any other way to analyze my equipment setup time?
We already tried using a shadow variable, but the performance was very poor, which made the solution unfeasible.
If you turn on DEBUG logging, you'll see every step. If you turn on TRACE logging, you'll see every move evaluation too.
Generally, all this is too verbose, and the optaplanner-benchmark tool is a much better approach, as it has several statistics (most of which need to be turned on explicitly) to give insight as to what's going on.

Designing an algorithm for detecting anomalies and statistical significance for ordinal data using Python

Firstly, I would like to apologise for the detailed problem statement. Being a novice, I couldn't express it in fewer words.
Environment Setup Details:
To give some background, I work at a cloud company where we have multiple servers located on all continents. So, we have a hierarchy like this:
Several partitions
Each partition has 7 PoPs
Each PoP has multiple nodes, all set up with redundancy.
TURN servers connecting traffic to each node depending on the client location
Actual clients: iOS, Android, Mac, Windows, etc.
Now, every time a user uses our product/service, they leave a rating out of 5, 5 being outstanding. This data is stored in our databases, and we mine and analyse it to pinpoint the exact issue on any particular day.
For example, if users from Asia are giving more bad ratings on Tuesday this week than on a usual Tuesday, what factors can cause this: is it something to do with the client app version, a server release, physical factors, loss, increased round-trip delay, etc.?
What we have done:
Until now we have been using visualization tools to track each of these metrics separately per day to see the trends and detect the issues manually.
But with the growing number of microservices, this is becoming more difficult day by day. Now we want to automate it using Python/pandas.
What I want to do:
If the ratings drop on a particular day/hour, I run the script and it should do all the manual work by going through all the combinations of all factors and listing the exact combinations which could have led to the drop.
The second step would be to check whether the drop is statistically significant, given that the number of ratings varies from day to day.
What I know:
I understand that I can do this using pandas by creating a dataframe for each predictor variable and trying to do it per variable.
And then I can apply tests such as the Mann-Whitney U test for ordinal data.
What I need help with:
I just wanted to know if there is a better way to do it. It is perfectly fine if there is a learning curve involved; I can learn and do it. I just want some help in choosing the right approach.
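For reference, the first step described above (trying every combination of factors and listing those with the largest drop) could be sketched as follows. This is only an illustration in plain Python, not a recommended final design: the factor names `region` and `client`, the `rating` key and the sample rows are made up, and the mean is used as a simple summary even though ratings are ordinal. For the second step, each flagged group's baseline and current ratings could then be compared with a rank-based test such as SciPy's `scipy.stats.mannwhitneyu`, which is not shown here.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def rating_drops(baseline, current, factors, min_count=2):
    """Rank factor-value combinations by the drop in mean rating.

    Returns (drop, {factor: value, ...}, current_sample_size) tuples,
    largest drop first. Groups with too few ratings are skipped.
    """
    results = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            base_groups, cur_groups = defaultdict(list), defaultdict(list)
            for row in baseline:
                base_groups[tuple(row[f] for f in combo)].append(row["rating"])
            for row in current:
                cur_groups[tuple(row[f] for f in combo)].append(row["rating"])
            for key, cur in cur_groups.items():
                base = base_groups.get(key, [])
                if len(cur) < min_count or len(base) < min_count:
                    continue  # not enough data to compare this group
                drop = mean(base) - mean(cur)
                results.append((drop, dict(zip(combo, key)), len(cur)))
    return sorted(results, key=lambda r: r[0], reverse=True)


# Made-up ratings: a usual Tuesday versus the Tuesday under investigation.
usual = [
    {"region": "asia", "client": "ios", "rating": 5},
    {"region": "asia", "client": "ios", "rating": 4},
    {"region": "asia", "client": "android", "rating": 5},
    {"region": "asia", "client": "android", "rating": 4},
    {"region": "eu", "client": "ios", "rating": 5},
    {"region": "eu", "client": "ios", "rating": 4},
]
today = [
    {"region": "asia", "client": "ios", "rating": 5},
    {"region": "asia", "client": "ios", "rating": 4},
    {"region": "asia", "client": "android", "rating": 1},
    {"region": "asia", "client": "android", "rating": 2},
    {"region": "eu", "client": "ios", "rating": 5},
    {"region": "eu", "client": "ios", "rating": 4},
]
for drop, group, n in rating_drops(usual, today, ["region", "client"])[:3]:
    print(drop, group, n)
```

Note that the number of combinations grows quickly with the number of factors, so with many factors you would want to cap the combination size.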

How to ensure correctness of data gathered via crowdsourcing?

I have a site where users enter data about products they buy.
How do I ensure correctness of data entered via crowdsourcing (enabling users to vote/edit products) minimizing amount of work that needs to be done by administrator? I'm looking for some how-tos, best practices, etc.
What sort of data are you collecting?
You're talking about crowdsourcing, and thus (I assume) aggregating data across this crowd. As they're entering data about products they buy, I suspect you're going to be gathering product attributes and prices.
Some possible approaches: if your users are entering non-numerical data (e.g. colours), just record the mode (the most commonly entered value).
If they're entering numeric data, discard outliers, i.e. bin the lowest and highest results and average the rest. You could do this for prices, say; this is the approach that electronic exchanges use for resolving closing prices out of many trades.
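Both approaches fit in a few lines. The sketch below assumes the entries for one product attribute have already been collected into a list; the function names and the normalisation step (trimming whitespace and lowercasing) are illustrative choices, not a prescribed method.

```python
from collections import Counter


def most_common_entry(values):
    """Mode of free-text entries, ignoring case and surrounding whitespace."""
    normalized = [v.strip().lower() for v in values]
    return Counter(normalized).most_common(1)[0][0]


def trimmed_mean(values, trim=1):
    """Average after discarding the `trim` lowest and `trim` highest values."""
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim]
    if not kept:
        raise ValueError("not enough values left after trimming")
    return sum(kept) / len(kept)


print(most_common_entry(["Red", "red ", "blue"]))
print(trimmed_mean([1, 9.5, 10, 10.5, 100]))
```

The obviously wrong prices 1 and 100 in the sample are discarded before averaging, so a single careless or malicious entry cannot move the result.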
Depending on your application, you may want to have a historical bias towards the most recent entries.
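One way to express such a bias towards recent entries is an exponentially decaying weight. The sketch below is an assumption about how that could look, not something taken from a particular system: each entry is a (value, day) pair and its weight halves every `half_life_days`.

```python
def recency_weighted_mean(entries, now, half_life_days=30.0):
    """Weighted mean of (value, day) pairs where newer entries count more."""
    if not entries:
        raise ValueError("no entries to average")
    weighted_sum = 0.0
    total_weight = 0.0
    for value, day in entries:
        # An entry that is one half-life old gets half the weight of a new one.
        weight = 0.5 ** ((now - day) / half_life_days)
        weighted_sum += weight * value
        total_weight += weight
    return weighted_sum / total_weight


# A price of 10 entered 30 days ago and a price of 20 entered today.
print(recency_weighted_mean([(10.0, 0), (20.0, 30)], now=30))
```

A shorter half-life makes the aggregate react faster to new entries, at the cost of being easier to skew with a few recent submissions.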
But this all depends on your application, and how much storage and crunching of data you're prepared to do.
Make sure you keep a log of IP addresses with every action made, since malicious users or bots can tamper with session data or cookies. This makes it harder for a single entity to skew results or do anything drastic by appearing to be multiple users.
At a high level, data can be gathered from the 'crowd' with an associated correctness value. Looking at SO, an answer or response from someone with 1000+ rep has more weight than one from a casual user. Look for validation and triangulation: if it's a single voice in the crowd that you're listening to, then it's probably not worth that much. If other voices join, then you know you're onto something; again in SO terms, we all get a chance to upvote questions.
I've recently seen some really good iPhone apps which rely on crowdsourcing for their data, and then validate it by asking other users if it's correct.
