Best Practice for a Middle-Tier Data Structure that Combines Multiple Data Sources

I have 4 queries from different sources that load all the needed KPIs that will be displayed on the UI. My format is essentially a simple table:
KPI  | Client 1 | Client 2 | Client 3
kpi1 | val1,1   | val2,1   | val3,1
kpi2 | val1,2   | val2,2   | val3,2
kpi3 | val1,3   | val2,3   | val3,3
But the tricky part is that we have 20 or so separate tables/sections like this, and the KPIs are mixed up, e.g. kpi1 comes from data source 1, kpi2 from data source 2, then back to data source 1 for kpi3, etc. I want to pull down the data once and then populate all the various sections based on each KPI's defined source.
So basically I need a layer of code that maps specific "rows" or "collections" to their source. What is the best practice for this "transformation layer"?
I want to make sure it's easy to update KPI definitions, add new ones, etc. A plus would be the ability to easily attach a display attribute, so that, for instance, kpi1 would be displayed as "Most Important KPI" and kpi2 as "Interesting KPI".
I'm open to creating a Model object where each KPI is an attribute; it's just the mapping back to the source that's throwing me off a bit.
Thanks!
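One common way to build this mapping layer is a declarative KPI registry: each definition records which pre-loaded source holds the value, which field to read, and the display label, and each UI section is just an ordered list of KPI ids. A minimal Python sketch follows; the source names, field names, the {client: {field: value}} shape assumed for each loaded query, and the `KpiDef`/`build_section` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical shape: {source_name: {client: {field: value}}}, loaded once.
SourceData = Dict[str, Dict[str, Dict[str, object]]]


@dataclass(frozen=True)
class KpiDef:
    kpi_id: str        # internal identifier, e.g. "kpi1"
    source: str        # which pre-loaded query result holds the value
    field: str         # field name inside that source's rows
    display_name: str  # label shown in the UI


# Adding a KPI or re-pointing it to another source only means editing this list.
KPI_DEFS: Dict[str, KpiDef] = {
    d.kpi_id: d
    for d in [
        KpiDef("kpi1", "source1", "metric_a", "Most Important KPI"),
        KpiDef("kpi2", "source2", "metric_b", "Interesting KPI"),
        KpiDef("kpi3", "source1", "metric_c", "Another KPI"),
    ]
}


def build_section(kpi_ids: List[str], clients: List[str], data: SourceData) -> List[List[object]]:
    """Return rows of [display_name, value for client 1, value for client 2, ...].

    Missing sources, clients, or fields come back as None.
    """
    rows: List[List[object]] = []
    for kpi_id in kpi_ids:
        d = KPI_DEFS[kpi_id]
        source_rows = data.get(d.source, {})
        values = [source_rows.get(client, {}).get(d.field) for client in clients]
        rows.append([d.display_name, *values])
    return rows
```

Each of the 20 or so sections then becomes a call such as `build_section(["kpi1", "kpi2"], clients, data)` against the same `data` object, so the sources are still queried only once.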

Related

Global filters for different data sources (with common tables)

I am currently working in Tableau with 2 data sources, each built from a join of 2 tables (named A, B, C):
Data source 1: A-B
Data source 2: A-C
Basically, A contains the major information that I need and then I join data from B and C to get the extra information I need for each report I am doing.
I then built a dashboard that contains reports using data sources 1 and 2.
My problem is that I am filtering this dashboard using a dimension in A, and I would like the filter to apply to all worksheets (i.e. both those using data source 1 and those using data source 2).
I thought that because A is the common table in all data sources, using a dimension in A would be enough to filter everything, but it seems that this is not the case.
Is there a way to fix this?
I read some forum posts about creating a parameter. However, the filtering I am doing is basically as follows: I want my users to choose 1 shop name. They can find it either by:
Typing the name in the 'Shop name' quick filter, or
Using a combination of the 'Region' and 'country' quick filters to get a 'Shop name' drop-down with a reduced list of shop names (easier when the user knows where the shop is but does not remember its exact name).
Using a parameter would no longer allow me to do this, since all of this is based on 'filtering the relevant values'.
Does anyone have any recommendations?

Appropriate data structure to read this file

I have the following info in a text file.
Item Rate
pencil 2
eraser 1
laser 3
pencil 1
torch 4
eraser 1
Specifically, I want to know whether any item in the list above has more than one price.
For example, pencil has 2 rates, i.e. 2 and 1.
The price of the eraser is the same in both entries, so that is not a problem.
A further complication: the text file is very large.
Since dicts don't allow duplicate keys, please suggest ways to solve this problem, along with an appropriate data structure.
You can use a hash table with the separate chaining method. Hope it works.
Does the file have to be plain text? I recommend tackling this problem by using XML format and parsing it with SAX (not DOM!). SAX does not load the entire file into memory, so it works well with huge files.
As for the data structure, you could always define your own, or you could just use something like Map<KeyType, List<ValueType>>. That said, it feels counter-intuitive to map different prices to the same product name. You could create a unique ID for every type of product and add a new field: quantity.
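A Python sketch of the map-to-a-collection idea, assuming whitespace-separated "item rate" lines with a single header row (the function names are illustrative). It uses a set rather than a list so that identical repeated rates collapse, and Decimal so that "2" and "2.0" count as the same price. Reading the file line by line avoids loading it all at once, although the dictionary still grows with the number of distinct items.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Iterable, Set


def find_conflicting_prices(lines: Iterable[str]) -> Dict[str, Set[Decimal]]:
    """Return {item: distinct rates} for items seen with more than one rate."""
    rates: Dict[str, Set[Decimal]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        item, raw_rate = parts
        try:
            rate = Decimal(raw_rate)
        except InvalidOperation:
            continue  # skip the "Item Rate" header or non-numeric rates
        # Each item maps to a set, so repeated keys are no problem and
        # identical repeated rates (eraser 1, eraser 1) collapse into one.
        rates.setdefault(item, set()).add(rate)
    return {item: found for item, found in rates.items() if len(found) > 1}


def find_conflicting_prices_in_file(path: str) -> Dict[str, Set[Decimal]]:
    # Iterating over the file object reads one line at a time, so the
    # file itself is never loaded into memory all at once.
    with open(path, encoding="utf-8") as f:
        return find_conflicting_prices(f)
```

For the sample above, only pencil is reported, with the rates 2 and 1.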

TABLEAU: Create global filter from a secondary data source to multiple data sources on dashboard

I have a Tableau dashboard with various visualizations created from 3 data sources (A, B, and C).
Each data source has a relationship (join) with the same secondary data source (D), and the secondary data source provides information to create a filter for each data source. In other words, my data sources are related as follows:
A - D
B - D
C - D
I would like to create a global filter on a dashboard I have created. I would like one filter card from "D" to show up and be applied to "A," "B," and "C" at once rather than having a separate filter card show up for each data source.
I tried to create a global filter via a parameter and calculated field, but the parameter requires layers of connections because data sources "A", "B", and "C" only have "D" in common.
Thoughts?
It's not completely clear from your question, but it sounds like you are using Tableau data blending on your worksheets to include data from multiple data sources, rather than a join to create a data source based on multiple tables. If all your tables are on the same database server or spreadsheet, then traditional joins are usually more efficient than data blending.
The following approach often works well.
Instead of using Tableau's quick filter feature, create a worksheet based solely on D that shows the values you wish to use for filtering. It can be a simple list of names, or a bubble chart or anything you like. Use that worksheet as your filter by creating actions where it is the source and all the other worksheets on your dashboard are the target. Typically, you would want to specify the field names explicitly.
Data blending is useful but can be complex. Depending on details, you may need to make D the primary data source on your other worksheets. Experiment.
The parameter and calculated field you mentioned can be even simpler and faster than using actions, but users are restricted to selecting a single value for a parameter, unlike with the filter action approach. (Of course, one parameter value can represent multiple values in your target data source field, depending entirely on how your calculated field interprets the parameter.)
I can't tell why that didn't work for you or what you mean by "layers of connections". You might consider clarifying that part of your question.

Cassandra DB: is it favorable, or frowned upon, to index multiple criteria per row?

I've been doing a lot of reading lately on Cassandra, and specifically how to structure rows to take advantage of indexing/sorting, but there is one thing I am still unclear on: how many "index" items (or filters, if you will) should you include in a column family (CF) row?
Specifically: I am building an app and will be using Cassandra to archive log data, which I will use for analytics.
Example types of analytic searches will include (by date range):
total visits to specific site section
total visits by Country
traffic source
I plan to store the whole log object in JSON format, but to avoid having to go through each item to get basic data, or to create multiple CFs just to get basic data, I am curious whether it's a good idea to include the above "filters" as columns (compound column segments).
Example:
Row Key | timeUUID:data | timeUUID:country | timeUUID:source |
======================================================
timeUUID:section | JSON Object | USA | example.com |
So as you can see from the structure, the row key would be a compound key of timeUUID (say per day) plus the site section I want to get stats for. This lets me query a date range quite easily.
Next, my dilemma: the columns. A compound column name with timeUUID lets me sort and do a time-based slice, but does the concept make sense?
Is this type of structure acceptable by current "best practice", or would it be frowned upon? Would it be advisable to create a separate "index" CF for each metric I want to query on, even when it's as simple as this?
I would rather get this right the first time instead of having to restructure the data and refactor my application code later.
I think the idea behind this is OK. It's a pretty common way of doing time slicing (assuming I've understood your schema, anyway; a CREATE TABLE snippet would be great). Some minor tweaks:
You don't need a timeUUID as your row key. Given that you suggest partitioning by individual days (which are inherently unique) you don't need a UUID aspect. A timestamp is probably fine, or even simpler a varchar in the format YYYYMMDD (or whatever arrangement you prefer).
You will probably also want to swap your row key composition around to section:time. The reason for this is that if you need to specify an IN clause (i.e. to grab multiple days) you can only do it on the last part of the key. This means you can do WHERE section = 'foo' and time IN (....). I imagine that's a more common use case - but the decision is obviously yours.
If your common case is querying the most recent data don't forget to cluster your timeUUID columns in descending order. This keeps the hot columns at the head.
Double storing content is fine (i.e. once for the JSON payload, and denormalised again for data you need to query). Storage is cheap.
I don't think you need indexes, but it depends on the queries you intend to run. If your queries are simple then you may want to store counters by (date:parameter) instead of values and just increment them as data comes in.
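A client-side sketch of two of these tweaks, under stated assumptions: one partition per section and day, with the day stored as a YYYYMMDD string, and counters keyed by (date, parameter). The table name `site_logs`, its `section` and `day` partition-key columns, and the event field names are hypothetical; the `%s` placeholders follow the style of the DataStax Python driver for non-prepared statements, so adjust them for your driver. `count_by` only models the bookkeeping in memory; in Cassandra itself this would be a counter column incremented with `UPDATE ... SET visits = visits + 1 WHERE ...`.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, List, Mapping, Tuple


def day_buckets(start: date, end: date) -> List[str]:
    """Return one YYYYMMDD partition value per day, start and end inclusive."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return [
        (start + timedelta(days=offset)).strftime("%Y%m%d")
        for offset in range((end - start).days + 1)
    ]


def range_query(section: str, start: date, end: date) -> Tuple[str, List[str]]:
    """Build a CQL query that pins the section and uses IN on the last key part (day)."""
    days = day_buckets(start, end)
    placeholders = ", ".join(["%s"] * len(days))
    # Hypothetical table with PRIMARY KEY ((section, day), ...).
    cql = f"SELECT * FROM site_logs WHERE section = %s AND day IN ({placeholders})"
    return cql, [section, *days]


def count_by(events: Iterable[Mapping[str, str]], dimension: str) -> Counter:
    """Count events per (day, value of `dimension`) pair, e.g. ("20130101", "USA")."""
    counts: Counter = Counter()
    for event in events:
        # Each incoming event bumps exactly one counter, mirroring the
        # "increment as data comes in" approach instead of storing raw values.
        counts[(event["day"], event[dimension])] += 1
    return counts
```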

How to apply parent group for multiple datasets in SSRS VS2008

I have been battling this issue for days without success. I have a very tricky report format to achieve, but the main thing is that all the datasets need to be grouped by 1 parent. I'll attempt to explain...
Say we have dataset1 and dataset2. Both have AccountNumber as a common (parent) field.
I need both datasets to be used in the report layout but grouped together by AccountNumber, something like this:
[Report Header Data]
[AccountNumber Group]
Dataset1
Dataset2
[end AccountNumber Group]
What is the best way to achieve this? The report's format has been a major roadblock for grouping, which forced me to split the data into multiple datasets, group them all by AccountNumber, and then create a custom format per dataset in the report. The flow of the report may be something like this:
[Report Header Data]
[AccountNumber Group]
[tablix1]
Dataset1
[tablix1]
[tablix2]
Dataset2
[tablix2]
[end AccountNumber Group]
Looking forward to the discussion on this!
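For reference, the grouping asked for here can be modeled outside SSRS as a mapping from each AccountNumber to that account's rows from each dataset. This Python sketch is not SSRS code; the row dictionaries and the `combine_by_account` name are hypothetical, and it only illustrates the target data shape that the techniques below produce inside the report.

```python
from typing import Dict, List

Row = Dict[str, object]


def combine_by_account(dataset1: List[Row], dataset2: List[Row]) -> Dict[object, Dict[str, List[Row]]]:
    """Group rows from both datasets under their shared AccountNumber.

    The result mirrors the desired layout: one entry per AccountNumber
    (in first-seen order), holding that account's Dataset1 and Dataset2 rows.
    """
    groups: Dict[object, Dict[str, List[Row]]] = {}
    for name, rows in (("Dataset1", dataset1), ("Dataset2", dataset2)):
        for row in rows:
            bucket = groups.setdefault(row["AccountNumber"], {"Dataset1": [], "Dataset2": []})
            bucket[name].append(row)
    return groups
```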
There are multiple ways to achieve this effect, and the best for your situation depends on the details of your report. So I'll just give some of the techniques I've used in the past:
Join the two datasets into one
Joining the datasets into one in your query is one of the simplest answers, and works across all versions of SSRS. It can make the SQL queries large, but it makes report layout simple.
Use the Lookup(...) function
SSRS 2008R2 added the Lookup(...) function, which can be used to access items in a second dataset. It's a little bit awkward to use, and requires a separate formula for every field to be accessed, but it is very powerful for retrieving a few fields from a different dataset.
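As a rough mental model only (Python, not the SSRS expression language): Lookup(source_expression, destination_expression, result_expression, dataset) matches a value from the current row against a key expression evaluated over another dataset and returns a single result value (modeled here as the first match), while LookupSet returns the results for all matching rows. The field names below are hypothetical; note that each additional field needs its own call, which is the awkwardness mentioned above.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
KeyFn = Callable[[Row], object]


def lookup(source_value: object, dest_key: KeyFn, result: KeyFn, dataset: List[Row]) -> Optional[object]:
    """Model of Lookup: result(row) for the first row whose key matches, else None."""
    for row in dataset:
        if dest_key(row) == source_value:
            return result(row)
    return None


def lookup_set(source_value: object, dest_key: KeyFn, result: KeyFn, dataset: List[Row]) -> List[object]:
    """Model of LookupSet: result(row) for every row whose key matches."""
    return [result(row) for row in dataset if dest_key(row) == source_value]
```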
Sub reports
Similar to the approach described in the original question, this lets you create a parent report with one tablix and then place a subreport within it. The subreport will be called multiple times, with the grouping item as a parameter, and each run should only return the report for that instance of the group. This can be very powerful, but maintenance is difficult: you have two places to change some things, and it can require manual tweaking to make sure columns line up correctly. The subreport will often be the fastest report to run, since it is getting called many times.
[NB: StackOverflow.com isn't the best place for discussions. The site is designed around questions and answers rather than discussion.]
I don't know if there's a perfect solution here.
Based on your description (and it sounds like you're leaning in this direction already), you'll need a Dataset that returns each distinct AccountNumber, with a new list or table based on it.
Once you have this set up you need to embed the different Dataset objects (i.e. tablix1, tablix2) in each row.
The main issue here is that you can't use multiple Datasets when embedding tablixes within tablixes, so this makes me think that you may need a subreport solution - this way the subreports can take an AccountNumber parameter and each use a different Dataset.
So something like:
[Report Header Data]
[AccountNumber Group]
[subreport1]
[tablix1]
Dataset1
[tablix1]
[subreport1]
[subreport2]
[tablix2]
Dataset2
[tablix2]
[subreport2]
[end AccountNumber Group]
This will repeat for each AccountNumber as required.
It's tough to say without knowing exactly what your data looks like, but in 2008R2 and above you can use Lookup and LookupSet to join Datasets; that will be cumbersome for multiple values, though, even if you are running the correct version.
Again, depending on your data, another option is adjacent groups, if you can manage to get the data into one Dataset. This would allow you to have different groupings next to one another under the AccountNumber group, but it's a long shot.
It would help to know what the report data is, e.g. a payslip, or a payslip with a loan balance (i.e. Dataset 1 for the payslip and Dataset 2 for the loan).
Either way, the format will depend on the required output of the report, e.g. whether you plan to produce calculations such as sums in the report, and whether the results should be per dataset or across both datasets.
Assuming you need sum calculations: if the result is per dataset, then option 2 is good; if the result is a total (Dataset 1 + Dataset 2), then option 1 is better.
If no calculations or totals are required, either will do.
