Google Analytics: incorrect number of sessions when grouping by Custom Dimension - session

For a while I have successfully queried the number of sessions for my website, including the number of sessions per 'Lang Code' and per 'Distribution Channel', both Custom Dimensions I created in Analytics, each with its own slot and its Scope set to 'Session'.
Recently the number of sessions has decreased significantly when I group by a Custom Dimension, e.g. Lang Code.
The following query returns a count of, say, 900:
https://ga-dev-tools.appspot.com/query-explorer/?start-date=2015-10-17&end-date=2015-10-17&metrics=ga%3Asessions
Whereas this query returns around a quarter of that, say ~220:
https://ga-dev-tools.appspot.com/query-explorer/?start-date=2015-10-17&end-date=2015-10-17&metrics=ga%3Asessions&dimensions=ga%3Adimension14
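For reference, this is roughly how the same two requests look through the Core Reporting API v3 Python client (a sketch only; the view ID and credential setup are placeholders, and the samplingLevel parameter is just an optional extra to check whether sampling is involved):

from googleapiclient.discovery import build

def compare_session_counts(credentials, view_id='ga:XXXXXXXX'):
    # 'view_id' and the credential setup are placeholders for your own View (Profile).
    analytics = build('analytics', 'v3', credentials=credentials)

    # Total sessions for the day, no custom dimension.
    totals = analytics.data().ga().get(
        ids=view_id,
        start_date='2015-10-17',
        end_date='2015-10-17',
        metrics='ga:sessions',
        samplingLevel='HIGHER_PRECISION').execute()

    # Same metric, grouped by the session-scoped custom dimension in slot 14.
    by_lang_code = analytics.data().ga().get(
        ids=view_id,
        start_date='2015-10-17',
        end_date='2015-10-17',
        metrics='ga:sessions',
        dimensions='ga:dimension14',
        samplingLevel='HIGHER_PRECISION').execute()

    print('Total sessions:', totals['totalsForAllResults']['ga:sessions'])
    print('Sessions with ga:dimension14:',
          by_lang_code['totalsForAllResults']['ga:sessions'])
    print('Contains sampled data:', by_lang_code.get('containsSampledData', False))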
Now, my initial reaction was that 'Lang Code' was not set on all pages, but I checked and this data is guaranteed to be included on all pages of my website.
Also, no changes have been made to the Analytics View I'm querying.
The same issue occurred a couple of weeks ago, and at the time I fixed it by changing the Scope Type of said Custom Dimensions to Session, but now I'm no longer sure whether this was the correct fix or just a temporary glitch, since:
the issue didn't occur before
the issue now reoccurs
Does anyone have any idea what may have caused this data discrepancy?
P.S. To make things stranger, for daily reporting we run this query every night (around 2 a.m.), and then the numbers are actually correct, so apparently it makes a difference at what time the query is executed?

Related

Google Webmaster data quality issues

I am running into a weird error.
We have a standard implementation that gets data from Search Console and stores it in a database. We cross-checked the data during the implementation and it was good.
Lately we have seen huge differences between what is reported in Search Console and the data retrieved from the API. In some cases the API data is only 10% lower than the Search Console data, but in some cases it shows 50% less than what is reported in Search Console.
Is anyone aware of these issues, and has anyone run into this recently?
I had this problem for about a month and finally fixed it.
This was my original request:
from googleapiclient import sample_tools  # provided by google-api-python-client

service, flags = sample_tools.init(
    argv, 'webmasters', 'v3', __doc__, __file__,
    scope='https://www.googleapis.com/auth/webmasters.readonly')
I fixed it by removing the ".readonly" at the end; this was causing me to get sampled data.
My scope now looks like this and returns full results:
service, flags = sample_tools.init(
    argv, 'webmasters', 'v3', __doc__, __file__,
    scope='https://www.googleapis.com/auth/webmasters')
I'm having the same issue reconciling against the console. How are you storing the data, i.e. what is your database table structure?
Have you read about the differences in the aggregation between page and property? These can cause discrepancies.
https://support.google.com/webmasters/answer/6155685?hl=en#urlorsite
For example, a search query for which multiple of your pages appear in the results counts as 1 impression when aggregated by property. When you group by page, the same query shows up once for each of your pages in the results, e.g. 3 or 4 impressions. Therefore, grouping by query and by date will give lower impression totals than aggregating by page.
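One way to see this effect is to pull the same date range twice with the webmasters v3 Python client, once grouped by query and once by page, and compare impression totals. This is only a rough sketch; the site URL, dates, and rowLimit are placeholders, and summing only the returned rows is approximate:

def compare_aggregation(service, site_url='https://www.example.com/'):
    # 'service' is an authorized webmasters v3 client (e.g. from sample_tools.init
    # as in the snippets above); the site URL and dates are placeholders.
    def total_impressions(dimensions):
        body = {
            'startDate': '2016-02-01',
            'endDate': '2016-02-28',
            'dimensions': dimensions,
            'rowLimit': 5000,
        }
        response = service.searchanalytics().query(
            siteUrl=site_url, body=body).execute()
        # Only the returned rows are summed, so treat this as approximate.
        return sum(row['impressions'] for row in response.get('rows', []))

    by_query = total_impressions(['query'])  # property aggregation: 1 impression per query
    by_page = total_impressions(['page'])    # page aggregation: 1 impression per page shown

    print('Impressions grouped by query:', by_query)
    print('Impressions grouped by page: ', by_page)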

API User Usage Report: Inconsistent Reporting

I'm using a JVM to perform API calls to the Google Apps Administrator API.
I've noticed that with the User Usage Reports I'm not getting complete data for a field I'm interested in (num_docs_externally_visible) and the fields that feed into its calculation. I generally request a single day's usage report at a time, across my entire user base (~40k users).
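For reference, a minimal sketch of what such a single-day pull might look like with the Admin SDK Reports API Python client; the credential setup and the exact parameter string ('docs:num_docs_externally_visible') are assumptions here and may need adjusting:

from googleapiclient.discovery import build

def fetch_usage_for_day(credentials, date='2016-02-15'):
    # Admin SDK Reports API client; credential setup is assumed to exist already.
    reports = build('admin', 'reports_v1', credentials=credentials)
    rows, page_token = [], None
    while True:
        response = reports.userUsageReport().get(
            userKey='all',                                  # all ~40k users
            date=date,                                      # one day at a time
            parameters='docs:num_docs_externally_visible',  # field from the question (name assumed)
            maxResults=1000,
            pageToken=page_token).execute()
        # The API can flag days whose data has not been fully processed yet.
        for warning in response.get('warnings', []):
            print(date, 'warning:', warning.get('message'))
        rows.extend(response.get('usageReports', []))
        page_token = response.get('nextPageToken')
        if not page_token:
            break
    return rows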
According to the developer documentation, that data should appear in a report 2-6 days later; however, after running reports for the first 3 weeks of February, I've only gotten it for 60% of the days. The pattern appears to be random (I've had streaks of up to 4 days in a row where the field appears and 3 days in a row where it doesn't, but there is no consistency to it).
Has anyone else experienced this issue, and if so, were you able to resolve it? Or should I expect this behavior to continue, i.e. is it an issue with what the API returns that is outside of my control?
I think it's only natural that the data you get is not yet complete; it takes a few days for the complete data to arrive.
This SO question is not exactly the same as yours, but I think it will help you, especially the part about needing to use your account's time zone.
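Along those lines, a small sketch of computing the report date in the account's time zone and backing off a few days for processing lag; the time zone and the lag are placeholder assumptions:

from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

def report_date(account_tz='America/Los_Angeles', lag_days=3):
    # 'account_tz' and 'lag_days' are placeholders: use your Google account's
    # time zone, and back off enough days for the report to be fully processed.
    now = datetime.now(ZoneInfo(account_tz))
    return (now - timedelta(days=lag_days)).strftime('%Y-%m-%d')

print(report_date())  # e.g. '2016-02-20'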

Caching expensive SQL query in memory or in the database?

Let me start by describing the scenario. I have an MVC 3 application with SQL Server 2008. In one of the pages we display a list of Products that is returned from the database and is UNIQUE per logged in user.
The SQL query (actually a VIEW) used to return the list of products is VERY expensive.
It is based on very complex business requirements which cannot be changed at this stage.
The database schema cannot be changed or redesigned as it is used by other applications.
There are 50k products and 5k users (each user may have access to anywhere from 1 to all 50k products).
In order to display the Products page for the logged in user we use:
SELECT TOP X * FROM [VIEW] WHERE UserID = @UserId -- where 'X' is the page size
The query above returns a maximum of 50 rows (maximum page size). The WHERE clause restricts the number of rows to a maximum of 50k (products that the user has access to).
The page is taking about 5 to 7 seconds to load and that is exactly the time the SQL query above takes to run in SQL.
Problem:
The user goes to the Products page and very likely uses paging, re-sorts the results, goes to the details page, etc and then goes back to the list. And every time it takes 5-7s to display the results.
That is unacceptable, but at the same time the business team has accepted that the first time the Products page is loaded it can take 5-7s. Therefore, we thought about CACHING.
We now have two options to choose from, the most "obvious" one, at least to me, is using .Net Caching (in memory / in proc). (Please note that Distributed Cache is not allowed at the moment for technical constraints with our provider / hosting partner).
But I'm not very comfortable with this. We could end up with lots of products in memory (when there are 50 or 100 users logged in simultaneously) which could cause other issues on the server, like .Net constantly removing cache items to free up space while our code inserts new items.
The SECOND option:
The main problem here is that it is very EXPENSIVE to generate the User x Product x Access view, so we thought we could create a flat table (or in other words a CACHE of all products x users in the database). This table would be exactly the result of the view.
However the results can change at any time if new products are added, user permissions are changed, etc. So we would need to constantly refresh the table (which could take a few seconds) and this started to get a little bit complex.
Similarly, we thought we could implement some sort of cache provider: upon request from a user, we would run the original SQL query and select the products from the view (5-7s, acceptable only once) and save that result in a flat table called ProductUserAccessCache in SQL. On the next request, we would get the values from this cached table (as we could easily identify whether the results were cached for that particular user) with a fast query without calculations in SQL.
Any time a product was added or a permission changed, we would truncate the cached-table and upon a new request the table would be repopulated for the requested user.
It doesn't seem too complex to me, but what we are doing here basically is creating a NEW cache "provider".
Does anyone have any experience with this kind of issue?
Would it be better to use .Net Caching (in proc)?
Any suggestions?
We were facing a similar issue some time ago and were thinking of using EF caching in order to avoid the delay in retrieving the information. Our problem was a 1-2 second delay. Here is some info that might help on how to cache a table by extending EF. One of the drawbacks of caching is how fresh you need the information to be, so you set your cache expiration accordingly. Depending on that expiration, users might need to wait longer than they would like for fresh info, but if your users can accept that they might be seeing outdated info in order to avoid the delay, then the tradeoff would be worth it.
In our scenario, we decided it was better to have fresh info than quick info, but as I said before, our waiting period wasn't that long.
Hope it helps

How to improve GWT performance?

I am using GWT 2.4. There are times when I have to show a huge number of records, for example 50,000, on my screen in a grid table or FlexTable. But it takes very long to load that screen, say around 30 minutes or so; ultimately the screen hangs, or at times IE displays an error saying that the script might take too long and the application will stop working, and asks whether you wish to continue.
Is there any solution to improve gwt performance?
Don't bring all the data at once; bring it in pages, as the comments here suggest.
However, paging may not be trivial, as your DB might be filled with more entries while the user is paging, and if you're sorting the results, the new entries might ruin your ordering (for example, when fetching page #2, some entries that should have been on the first page are inserted).
You may decide to create some sort of "cursor" for paging purposes that reflects the state of your database at the point you created it, so that entries added while the user traverses between pages are ignored.
Another option you may consider as part of paging is providing only a condensed version of each record, i.e. only the most important details, and letting the user double-click to see the full details for a record. This can also give you some performance improvement within each page.

(ASP.NET) How would you go about creating a real-time counter which tracks database changes?

Here is the issue.
The site I've recently taken over tracks "miles" you ran in a day. So a user can log into the site and add that they ran 5 miles, and this is then added to the database.
At the end of the day, around 1 a.m., a service runs which totals all the miles all the users ran that day and outputs a text file to App_Data. That text file is then displayed in Flash on the home page.
I think this is kind of ridiculous. I was told they had to do this due to massive performance issues. They won't tell me exactly how they were doing it before or what the major performance issue was.
So what approach would you guys take? The first thing that popped into my mind was a web service which gets the data via an AJAX call. Perhaps every time a new "mile" entry is added, a trigger is fired and updates the "GlobalMiles" table.
I'd appreciate any info or tips on this.
Thanks so much!
Answering this question is a bit difficult since we don't know all of your requirements and something didn't work before. So here are some different ideas.
First, revisit your assumptions. Generating a static report once a day is a perfectly valid solution if all you need is daily reports. Why hit the database multiple times throughout the day if all that's needed is a snapshot (for instance, lots of blog software used to write HTML files when a blog was posted rather than serving up the entry from the database each time -- many still do as an optimization)? Is the "real-time" feature something you are adding?
I wouldn't jump to AJAX right away. Use the same input method, just move the report from static to dynamic. Doing too much at once is a good way to get yourself buried. When changing existing code I try to find areas that I can change in isolation with the least amount of impact to the rest of the application. Then once you have the dynamic report you can add AJAX (and please use progressive enhancement).
As for the dynamic report itself you have a few options.
Of course you can just SELECT SUM(), but it sounds like that would cause the performance problems if each user has a large number of entries.
If your database supports it, I would look at using an indexed view (sometimes called a materialized view). It allows fast, incremental updates to the real-time sum data:
CREATE VIEW vw_Miles WITH SCHEMABINDING AS
    SELECT SUM([Count]) AS TotalMiles,
           COUNT_BIG(*) AS [EntryCount],
           UserId
    FROM dbo.Miles   -- SCHEMABINDING requires two-part table names
    GROUP BY UserId
GO
CREATE UNIQUE CLUSTERED INDEX ix_Miles ON vw_Miles (UserId)
If the overhead of that is too much, @jn29098's solution is a good one: roll it up using a scheduled task. If there are a lot of entries for each user, you could add only the delta since the last time the task ran.
UPDATE GlobalMiles SET [TotalMiles] = [TotalMiles] +
    (SELECT SUM([Count])
     FROM Miles
     WHERE UserId = @id
       AND EntryDate > @lastTaskRun
     GROUP BY UserId)
WHERE UserId = @id
If you don't care about storing the individual entries but only the total you can update the count on the fly:
UPDATE Miles SET [Count] = [Count] + @newCount WHERE UserId = @id
You could use this method in conjunction with the sproc that adds the entry and have the best of both worlds.
Finally, your trigger method would work as well. It's an alternative to the indexed view where you do the update yourself on a table instead of SQL Server doing it automatically. It's also similar to the previous option, where you move the global update out of the sproc and into a trigger.
The last three options make it more difficult to handle the situation when an entry is removed, although if that's not a feature of your application then you may not need to worry about that.
Now that you've got materialized, real-time data in your database, you can dynamically generate your report. Then you can add the fancy AJAX on top.
If they are truly having performance issues due to too many hits on the database, then I suggest you take all the input and push it into a message queue (MSMQ). Then you can have a service on the other end that picks up the messages and does a bulk insert of the data. This way you have fewer DB hits. Then you can output to the text file on the update too.
I would create a summary table that's rolled up once an hour or nightly, which calculates total miles run. For individual requests you could pull from the nightly summary table plus any additional miles logged between the last rollup and when the user views the page, to get the total for that user.
How many users are you talking about and how many log records per day?
