Is it possible to get "raw" data from Google Analytics, then group the data yourself? - metrics

I need to get data from one query using 13 dimensions and 13 metrics. Because of the per-query limit on dimensions and metrics, I can't do that. Q: Is it possible to get raw data, or how can you combine data from multiple queries with different dimensions?

If you are using Google Analytics Standard, there is no way to fetch 13 dimensions in a single query. If you have Google Analytics 360 (the Premium version), you can link your GA 360 account to BigQuery and fetch such data from BigQuery using SQL.
That said, there is a workaround for Standard as well, but its feasibility depends on the kind of data you want to fetch. Where possible, create a segment based on some of the dimensions and put the remaining dimensions (5 maximum) in a custom report. For example, suppose you want metrics such as sessions, transactions, bounce rate, and time spent for Gender: Male, Device: desktop, Country: India, Age: 25-34, Browser: Chrome, while the other dimensions, such as source, medium, event action, and landing page, are generic with no filters. You can then create a segment from the dimensions that carry filter criteria and use the unfiltered dimensions in the custom report. But remember, the feasibility of this solution depends on your requirement.
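On the second half of the question (combining several queries yourself), here is a minimal sketch of stitching result sets together on the dimensions they share. It assumes each query was run with the same key dimensions (here `date` and `source`) and a different set of metrics. The rows and the helper name `merge_on_shared_dimensions` are made up for illustration. Note that this only joins rows on dimensions present in every query; it cannot recover combinations of dimensions that never appeared together in one query.

```python
from collections import defaultdict


def merge_on_shared_dimensions(result_sets, key_dims):
    """Merge rows from several query results that share the same key dimensions."""
    merged = defaultdict(dict)
    for rows in result_sets:
        for row in rows:
            # Rows with identical values for the key dimensions end up in one record.
            key = tuple(row[d] for d in key_dims)
            merged[key].update(row)
    return [merged[k] for k in sorted(merged)]


# Hypothetical results of two separate queries over the same key dimensions.
query_a = [
    {"date": "2020-01-01", "source": "google", "sessions": 120, "bounces": 40},
    {"date": "2020-01-01", "source": "bing", "sessions": 30, "bounces": 12},
]
query_b = [
    {"date": "2020-01-01", "source": "google", "transactions": 7},
    {"date": "2020-01-01", "source": "bing", "transactions": 1},
]

combined = merge_on_shared_dimensions([query_a, query_b], ["date", "source"])
for row in combined:
    print(row)
```

Each merged row then carries the metrics from both queries for one `(date, source)` pair.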

Related

Where in the stack to best merge analytical data-warehouse data with data scraped+cached from third-party APIs?

Background information
We sell an API to users, that analyzes and presents corporate financial-portfolio data derived from public records.
We have an "analytical data warehouse" that contains all the raw data used to calculate the financial portfolios. This data warehouse is fed by an ETL pipeline, and so isn't "owned" by our API server per se. (E.g. the API server only has read-only permissions to the analytical data warehouse; the schema migrations for the data in the data warehouse live alongside the ETL pipeline rather than alongside the API server; etc.)
We also have a small document store (actually a Redis instance with persistence configured) that is owned by the API layer. The API layer runs various jobs to write into this store, and then queries data back as needed. You can think of this store as a shared persistent cache of various bits of the API layer's in-memory state. The API layer stores things like API-key blacklists in here.
Problem statement
All our input data is denominated in USD, and our calculations occur in USD. However, we give our customers the query-time option to convert the response just-in-time to another currency. We do this by having the API layer run a background job to scrape exchange-rate data, and then cache it in the document store. Individual API-layer nodes then do (in-memory-cached-with-TTL) fetches from this exchange-rates key in the store, whenever a query result needs to be translated into a specific currency.
At first, we thought that this unit conversion wasn't really "about" our data, just about the API's UX, and so we thought this was entirely an API-layer concern, where it made sense to store the exchange-rates data into our document store.
(Also, we noticed that, by not pre-converting our DB results into a specific currency on the DB side, the calculated results of a query for a particular portfolio became more cache-friendly; the way we're doing things, we can cache and reuse the portfolio query results between queries, even if the queries want the results in different currencies.)
But recently we've been expanding into allowing partner clients to execute complex data-science/Business Intelligence queries directly against our analytical data warehouse. And it turns out that they will often need to do final exchange-rate conversions in their BI queries as well, despite there being no API layer involved here.
It seems like, to serve the needs of BI querying, the exchange-rate data "should" actually live in the analytical data warehouse alongside the financial data; and the ETL pipeline "should" be responsible for doing the API scraping required to fetch and feed in the exchange-rate data.
But this feels wrong: the exchange-rate data has a different lifecycle and different integrity constraints than our financial data. The exchange rates are dirty and ephemeral point-in-time samples obtained by scraping, whereas the financial data is a reliable historical event stream. The exchange rates get constantly updated/overwritten, while the financial data is append-only. Etc.
What is the best practice for serving the needs of analytical queries that need to access backend "application state" for "query result presentation" needs like this? Or am I wrong in thinking of this exchange-rate data as "application state" in the first place?
What I find interesting about your scenario is the question of when the exchange rate data is applicable.
In the case of the API, it's all about the realtime value in the other currency and it makes sense to have the most recent value in your API app scope (Redis).
However, I assume your analytical data warehouse has tables with purchases that were made at a certain time. In those cases, the current exchange rate is not really relevant to the value of the transaction.
This might mean that you want to store the exchange rate history in your warehouse or expand the "purchases" table to store the values in all the currencies at that moment.
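As a rough sketch of the first option (keeping an exchange rate history and applying the rate that was in effect at the time of each purchase), the lookup could work like this. The rates, the purchase rows, and the helper name `rate_as_of` are all hypothetical; in the warehouse itself this would be a join on date rather than Python code.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical USD->EUR history, sorted by effective date (values are made up).
rate_history = [
    (date(2020, 1, 1), 0.89),
    (date(2020, 1, 2), 0.90),
    (date(2020, 1, 6), 0.88),
]


def rate_as_of(history, day):
    """Return the most recent rate whose effective date is on or before `day`."""
    dates = [d for d, _ in history]
    i = bisect_right(dates, day)
    if i == 0:
        raise LookupError(f"no exchange rate recorded on or before {day}")
    return history[i - 1][1]


# Hypothetical purchases stored in USD with their transaction date.
purchases = [
    {"id": 1, "date": date(2020, 1, 2), "amount_usd": 100.0},
    {"id": 2, "date": date(2020, 1, 4), "amount_usd": 50.0},
]

converted = [
    {**p, "amount_eur": round(p["amount_usd"] * rate_as_of(rate_history, p["date"]), 2)}
    for p in purchases
]
for row in converted:
    print(row)
```

The purchase on 2020-01-04 has no rate sample for that exact day, so it falls back to the latest earlier sample (2020-01-02). Whether that fallback is acceptable is a business decision.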

Filter a Data Source from a Different Data Source

I have two chart tables both with different data sources. I want one table to act as the filter to the other table.
Here is the problem...
I tried a custom query for my data source which used the email parameter to filter the data source.
The problem is every time a user changes a filter on any page a query is executed in BigQuery, slowing the results and exponentially increasing my BigQuery monthly charges.
I tried blending the two tables.
The problem is the blended data feature only allows for 10 dimensions to be added to the resulting blended data source and is very slow.
I tried creating a control filter using a custom field on the "location" column on each table sharing the same "Field Id".
The problem is that the results table returns all the stores until you click on a location in the control list. And I cannot let a user see other locations.
Here is a link to a Data Studio sample report where you can clearly see what I am trying to do.
https://datastudio.google.com/reporting/dd33be45-ab13-4881-8a3b-cabafa8c0dbb
Thanks
One solution I can recommend to overcome your first challenge, i.e. high cost: you can reduce cost by caching query results in GCP Memorystore, with a cache lifetime chosen according to how frequently the data gets updated.
Moreover, BigQuery also caches the results of a query if you are not using wildcards on tables and time-partitioned tables. So try to tune your solution for analysis cost if that is feasible for your use case. BigQuery partitioning and clustering may also help you reduce BigQuery analysis cost.
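To illustrate the caching idea, here is a minimal sketch of a time-to-live cache placed in front of the expensive query. It uses an in-process dictionary as a stand-in for Memorystore, and `fake_run_query` is a hypothetical placeholder for the real BigQuery call, so none of these names come from an actual client library. Results are keyed by the user's email, so one user never receives another user's rows.

```python
import time


class TTLQueryCache:
    """Cache query results per email for a fixed number of seconds."""

    def __init__(self, run_query, ttl_seconds, clock=time.monotonic):
        self._run_query = run_query  # hypothetical callable: email -> rows
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # email -> (time stored, rows)

    def get(self, email):
        now = self._clock()
        hit = self._entries.get(email)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no query is executed
        rows = self._run_query(email)
        self._entries[email] = (now, rows)
        return rows


calls = []


def fake_run_query(email):
    # Stand-in for the real, billed query; records each invocation.
    calls.append(email)
    return [{"email": email, "location": "Store 1"}]


cache = TTLQueryCache(fake_run_query, ttl_seconds=300)
cache.get("a@example.com")
cache.get("a@example.com")  # served from the cache
print(len(calls))
```

The TTL should be no longer than the interval at which the underlying data is refreshed; otherwise users see stale results.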

Business Data Preparation for Reporting and BI

I am researching the best state for data to be in so that reporting and BI analytics perform well, while still letting business users build reports themselves from a set of data collections that align with a business data glossary I have worked through.
We have not chosen a specific BI tool but have been playing around with Power BI and Sisense.
We have not decided on a data store technology to use for reporting purposes.
Origin Data
Our business application that the data will originate from has a normalised SQL relational database. There are quite a few tables and joins to consider which work fine from an application perspective but I have recommended supplying the output of those queries as a flat denormalised set of data to increase redundancy and remove the joins entirely.
Business Data Glossary
As we go through defining the business data glossary, the number of columns increases but I do not anticipate there being any more than 100 columns per row as a complete reporting set of data. I wanted to ensure that each row of data is at a transactional depth (level 0) and that the roll up through the data would be done through aggregations by distinct key values and dimensional taxonomy.
Architecture
I want some advice around what a modern architecture looks like and what works for business users rather than users who are comfortable with SQL queries and a myriad of joins on a physical data model.
I read an article about setting up data flows for Power BI, which looked like the type of thing I want to do from a data availability perspective, but it doesn't advise on how the data should be stored or what type of database to use.
Data Sets
The data we have that needs to be reported on are transactions where level 0 is trade positions (individual transactions from either a local or counterparty entity), level 1 is reconciliations (relating local and counterparty entities and trade linking identifier) and level 2 would be where it can be rolled up by taxonomy like asset type or status.
The current data set would be a snapshot of positions every business day, so it's duplicated every day with a snapshot date applied. The reports would be able to move across dates and show changes over time.
Any advice would be greatly appreciated on how to tackle reporting and BI in 2020. Oooh, one last thing: there is the possibility that we won't be allowed to process this type of data in the public cloud. We have our own infrastructure, which is on private cloud, so that might need to be a consideration. Thanks

Is there any difference in metrics when Querying the data using Eloqua API vs Getting a report from Eloqua Insights?

I am validating the data from Eloqua Insights against the data I pulled using the Eloqua API. There are some differences in the metrics. So, are there any issues when pulling the data using the API vs. a .csv file from Eloqua Insights?
Absolutely. Besides undocumented data discrepancies that might exist, Insights can aggregate, calculate, and expose various hidden relations between data in Eloqua that are not accessible through an API export definition.
Think of the API as the raw data with the ability to pick and choose fields and apply a general filter on those, and of Insights/OBIEE as a way to calculate on that data, create those relationships across tables of raw data, and then present the result in a consumable manner to the end user. A user has little use for a 1 gigabyte csv of individual unsubscribes for the past year, but present that in several graphs on a dashboard with running totals, averages, and timeseries, and it suddenly becomes actionable.
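To make the distinction concrete, here is a small sketch of the kind of aggregation a reporting layer performs on top of raw export rows. The records and field names are invented for illustration and do not reflect Eloqua's actual export schema.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical raw export: one row per individual unsubscribe.
raw_unsubscribes = [
    {"email": "a@example.com", "date": "2020-01-15"},
    {"email": "b@example.com", "date": "2020-01-20"},
    {"email": "c@example.com", "date": "2020-02-03"},
    {"email": "d@example.com", "date": "2020-03-09"},
    {"email": "e@example.com", "date": "2020-03-30"},
]

# Count unsubscribes per month (the first 7 characters of an ISO date are YYYY-MM).
per_month = Counter(r["date"][:7] for r in raw_unsubscribes)
months = sorted(per_month)
counts = [per_month[m] for m in months]

# Running total across months, as a dashboard might plot it.
running = list(accumulate(counts))

summary = list(zip(months, counts, running))
for month, count, total in summary:
    print(month, count, total)
```

When validating API data against a report, differences often come down to choices like these (grouping period, deduplication, time zone), so it helps to reproduce the report's aggregation explicitly before comparing numbers.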

Loading of data with user conditions in Power BI

I am developing a report application in the Power BI Desktop version. I successfully created a dataset using a query and applying filters to the result data. But now I have to get data from the database in real time with user filters, i.e. the dataset would be created on the basis of some inputs given by users. We need this because the database is quite large, so we cannot load all the data first and then apply filters and create reports.
The same can easily be done in a .NET application, but we have to achieve this in Power BI.
Please suggest if this can be done.
I would use the Query Parameters feature for this. You add them in the Edit Queries window, from Home / Manage Parameters, and then you can use them in calculated columns or in place of a "hard coded" filter.
There's a detailed write up in a recent blog post:
https://powerbi.microsoft.com/de-de/blog/deep-dive-into-query-parameters-and-power-bi-templates/
