Getting into designing dashboards and need some help identifying each technical layer along the way - business-intelligence

So I will be embarking on designing a dashboard that will display KPIs and other relevant information for my team. Since I am in the early stages of this project and am not very familiar with the technical process behind designing a dashboard, I need some questions vetted first before I go and shop for solutions, to avoid reinventing the wheel.
Here are some of my questions:
We want a dashboard that can provide real-time information from our data sources (or as close to real time as possible). What mechanism allows a dashboard to update itself from multiple data sources concurrently? From a conceptual standpoint, I can understand building a dashboard in Microsoft Excel and having it depend on the values you have set up in your pivot tables.
How do you make a dashboard request information from multiple data sources on its own? In the Excel example, a user may have to go into the pivot tables to refresh the values, but I want to know how a dashboard would request this by itself, and what the exact method is from a programming standpoint. Does the code execute every time you refresh the webpage?
How do you create data sources organically? I know that for some solutions, such as SharePoint BI Center, there are pre-supported data sources like an Excel sheet or SharePoint, and it's as easy as uploading your document and letting the tool handle the rest. However, there are going to be some data sources that I know will need to be fetched. Do I need to understand something else, like an event recorder, in order to navigate this issue?

Introduction
The dashboard (or report) is usually the result of a long chain of steps. Very much simplified, it could look like this:
    src 1 ------\
                |
    src 2 ------+---[DWH]---[BR]---+---- Dashboards
                |     |            \---- Reports etc.
    src n ------/  [Big Data]
Keep in mind, this is only a very, very simple structure of a data backend / frontend.
DWH means Data Warehouse, where data might be stored temporarily (this is roughly what you referred to as fetching). It could be a database, a Big Data engine, or a combination of both...
Afterwards come the Business Rules (BR). These might be specific rules for how different departments calculate and relate data, but also simple things like algebra.
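To make "business rule" a little more tangible: a rule like "net sales excludes cancelled orders" can be defined once in the backend, for example as a view, instead of being re-implemented in every report. All names below are purely illustrative:

    -- Illustrative only: one shared definition of "net sales" that every
    -- dashboard and report reads, instead of each department recalculating it.
    CREATE VIEW net_sales_per_department AS
    SELECT department_id,
           SUM(amount) AS net_sales
    FROM   orders
    WHERE  status <> 'CANCELLED'
    GROUP  BY department_id;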
Questions
So, the main question should not be about the technology:
What software should we choose?
How can we create a dashboard?
but should instead focus on your business processes (think of it as a top-down view):
What does our core process look like? Where would we like to measure data?
How does department A calculate sales differently from department B? Should they all use the same rule?
Where does everyone store the data? Can we access it? Do we need structural data?
And, easy to forget but often one of the biggest parts: is the identifier of a business object (say, a sales ID) built and formatted in the same way everywhere? A quick profiling query like the sketch below often answers this early.
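For example (crm_sales and shop_orders are hypothetical source tables; LENGTH may be LEN depending on your database):

    -- Compare how two source systems format the same business key
    -- before trying to integrate them.
    SELECT 'crm'  AS source, LENGTH(sales_id) AS id_length, COUNT(*) AS cnt
    FROM   crm_sales
    GROUP  BY LENGTH(sales_id)
    UNION ALL
    SELECT 'shop' AS source, LENGTH(sales_id), COUNT(*)
    FROM   shop_orders
    GROUP  BY LENGTH(sales_id);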
Conclusion
When those questions are at least in the back of your head and you keep working in this direction, data will more or less automatically emerge at certain points of that process.
Then it won't matter whether you use Excel, a small-to-medium tool like Tableau, TIBCO Spotfire, QlikView or Power BI, or you want to go full scale with a big Hadoop backend, databases, and JasperReports, Apache Drill, Pentaho or SSIS on top of it... it will come out eventually.
TL;DR
Focus on the processes first. Make sure you understand them. Draft in Excel. Then proceed to get the data and the tools you need to support your use cases. It will work out much better from a "top-down" approach than trying to solve your requirements with tools alone.

Related

Seeking Advice For Oracle Data-Intensive Application

I'm endeavoring to develop an application that uses Oracle as the database back-end. The application will calculate several statistics from the various tables in the database. The front-end will most likely be a web application, and it will display various charts and calculated statistics. Now, I imagine that it would be more efficient to perform the calculations in the database rather than in the service layer, because said calculations would need to be performed for every web request. That being the case, I'm not sure which mechanism to use (e.g. stored procedure, function, view).
To illustrate what I'm going for, suppose I want to keep statistics of grades for many students. I would like to have a web interface that lets me view those statistics on a student-by-student basis and also on an all-inclusive basis. Some of the stats depend on aggregates (e.g. average, min, max) of all of the student grades, and some depend only on an individual student. In this situation, every time a record is added or updated, the aggregates would have to be recalculated.
So I am speculating that if I had a special table holding all of the calculated values I need, and a trigger (or triggers) to recalculate everything when a record is added or updated, then all I would need to do from a web-request point of view is have the service layer pull the desired values from this special table. I'm just not sure if this is the best way to go, so I am asking the community for any input/advice. Note: although I'm using Oracle, I'm open to using PostgreSQL or MySQL.
Thanks in advance
The scenario you are describing would be ideal for materialized views. They can be designed to refresh automatically (and incrementally) every time the source data is updated by your application. The calculations are built into the view definition. No triggers are required, and likely no stored procedures unless your calculations involve multiple steps. Check here: https://oracle-base.com/articles/misc/materialized-views and here: https://medium.com/oracledevs/lightning-fast-sql-with-real-time-materialized-views-12-things-developers-will-love-about-oracle-54bcc9eac358 for more info.
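As a rough sketch of the idea: the grades table with student_id and score columns below is a hypothetical stand-in for the question's schema, and the linked articles cover the full fast-refresh rules.

    -- A materialized view log lets Oracle refresh the view incrementally.
    CREATE MATERIALIZED VIEW LOG ON grades
      WITH ROWID, SEQUENCE (student_id, score)
      INCLUDING NEW VALUES;

    -- Aggregates are maintained on every commit, so the web tier just reads
    -- grade_stats_mv instead of recomputing anything per request.
    CREATE MATERIALIZED VIEW grade_stats_mv
      BUILD IMMEDIATE
      REFRESH FAST ON COMMIT
    AS
    SELECT student_id,
           COUNT(*)     AS n_grades,
           COUNT(score) AS n_scores,   -- required for fast refresh of SUM/AVG
           SUM(score)   AS total_score,
           AVG(score)   AS avg_score
    FROM   grades
    GROUP  BY student_id;

MIN and MAX come with extra fast-refresh restrictions in Oracle, so they are left out of this sketch.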

What to do after extracting my data?

I have a project to deal with, and I'm asking for help.
My project is in the field of business intelligence and involves creating data warehouses.
I have extracted the data I need (ETL); what should I do next?
I am working with MS SQL Server 2014.
How do I create my dimensions and my fact table?
I'm looking for advice.
Please accept my salutations.
This is a big question! Unfortunately, Stack Overflow's Q&A format isn't the best place to answer this. But here are a few pointers:
Everything starts with the requirements. Before you write any code, figure out exactly what your data warehouse will be used for (it can also be helpful to work out what your data warehouse will not be used for).
Analyse the raw data. Make sure you know what is and is not available. Be aware of the source system's shortcomings. Example: if your reports need to split your customers by country, is this data available? If so, is it consistently populated (some records say US, others USA, still others America)? Make a plan for dealing with these issues (see data cleansing below).
Prototype your data model. Excel and Power BI are great places to test the design. Once you start using a database it becomes much harder to change. Get it right at the very beginning and your life will be much easier.
Pick an ETL tool. Make sure you understand it, and it plays to the strengths of you and your team. I like SSIS.
Import the raw data into staging tables. This can help to simplify the analysis phase.
Cleanse the data. In a data warehouse, you have 100% control over every row, column and cell. Make use of this fact. Ensure only quality, useful, well-conformed data makes it into your published tables.
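For instance, the country inconsistency mentioned above could be conformed in a staging table before anything is published (the stg_customers table and values are illustrative):

    -- Conform country spellings in staging before loading the dimension.
    UPDATE stg_customers
    SET    country = 'US'
    WHERE  country IN ('USA', 'America', 'United States');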
Like all projects, planning and administration is the key. Writing code and building tables comes last.
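And when you do get to building tables, a minimal star-schema sketch in T-SQL might look like the following. The table and column names are illustrative, not a prescription:

    -- A dimension: one row per customer, descriptive attributes, surrogate key.
    CREATE TABLE dim_customer (
        customer_key  INT IDENTITY(1,1) PRIMARY KEY,
        customer_id   NVARCHAR(20)  NOT NULL,  -- business key from the source system
        customer_name NVARCHAR(100) NOT NULL,
        country       NVARCHAR(50)  NOT NULL
    );

    CREATE TABLE dim_date (
        date_key  INT PRIMARY KEY,             -- e.g. 20240131
        full_date DATE NOT NULL,
        [year]    INT  NOT NULL,
        [month]   INT  NOT NULL
    );

    -- The fact table: one row per measurable event, foreign keys plus measures.
    CREATE TABLE fact_sales (
        date_key     INT NOT NULL REFERENCES dim_date(date_key),
        customer_key INT NOT NULL REFERENCES dim_customer(customer_key),
        quantity     INT           NOT NULL,
        sales_amount DECIMAL(18,2) NOT NULL
    );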
Here are some resources which should help you:
Kimball Group. Ralph Kimball literally wrote the book on data warehousing (see next tip). His company's website contains a few hints and tips.
If you cannot attend a training course, buy a good book. I'd recommend this one. It's a big subject. Blogs and the internet can only teach you so much.
Download and try out Adventure Works DW. This is a sample data warehouse and ETL package, built by Microsoft. It demonstrates some of the techniques you can use in SSIS.

DB candidate as CouchDB/Schema replacement

The idea is to redesign the data structure and/or change the DB.
I have just started reviewing this project and plan to start optimization here.
Currently I have CouchDB with about 80 GB of document data, around 30M records.
For most documents, properties like id, group_id, location and type can be considered generic, but unfortunately they are currently stored under different property names across the set. There is also a lot of deep nesting.
The structure is hardly defined, which is why a NoSQL DB was selected long before the overall picture was clear.
Data is calculated and loaded into the DB by a separate job on a powerful cluster, and this isn't done very often, so general write/update performance isn't very important. A size decrease would be great but isn't the top priority either. There are only about 1-10 active customers at a time.
Read performance with various filtering/grouping etc. is what matters most.
No heavy summary calculations need to be done at read time; those are already done during population.
This is a data analysis tool for displaying comparison and other reports to quality engineers and data analysts, so they can browse, group and filter the results from the web UI.
Right now, tasks like searching a subset of document properties for text aren't feasible due to performance.
I've done some initial investigation (e.g. http://www.datastax.com/wp-content/themes/datastax-2014-08/files/NoSQL_Benchmarks_EndPoint.pdf), and Cassandra seems to be a good choice among the NoSQL options.
It would also be interesting to try porting this data into a newer PostgreSQL.
Any ideas would be highly appreciated :-)
Hello, please check the following articles:
http://www.enterprisedb.com/nosql-for-enterprise
For me, PostgreSQL's json (and jsonb!) capabilities let you start schema-less while still having transactions, indexes, grouping and aggregate functions with very good performance, right from the start. And when you are ready (and if needed), you can move to a schema, with an internal data migration.
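A minimal sketch of that approach, assuming the CouchDB documents were loaded into a hypothetical documents table (property names taken from the question):

    -- One jsonb column holds the schema-less document.
    CREATE TABLE documents (
        id  bigserial PRIMARY KEY,
        doc jsonb NOT NULL
    );

    -- A GIN index makes containment queries on arbitrary properties fast.
    CREATE INDEX documents_doc_gin ON documents USING gin (doc);

    -- Filtering and grouping directly on JSON properties:
    SELECT doc->>'group_id' AS group_id, count(*)
    FROM   documents
    WHERE  doc @> '{"type": "measurement", "location": "US"}'
    GROUP  BY doc->>'group_id';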
Also check:
https://www.compose.io/articles/is-postgresql-your-next-json-database/
Good luck

Big Data transfer between different systems

We have different sets of data in different systems such as Hadoop, Cassandra and MongoDB, but our analytics team wants to get stitched-together data from all of them. For example, customer information with demographics will be in one system, and their transactions in another. Analysts should be able to run queries like "what was the transaction volume from US users?". We need to develop an application that provides an easy way to interact with the different systems. What is the best way to do this?
Another requirement:
If we want to provide them a custom workspace in a system like MongoDB, they can easily play with it. What is the best strategy to pull data from one system to another on demand?
Any pointer or common architecture used to solve this kind of problem will be really helpful.
I see two questions here:
How can I consolidate data from different systems into one system?
How can I create some data in Mongo for people to experiment with?
Here we go ... =)
I would pick one system and target that for consolidation. In other words, between Hadoop, Cassandra and MongoDB, which one does your team have the most experience with? Which one do you find easiest to query with? Which one do you have set up to scale well?
Each one has pros and cons for scaling, storage and queryability.
I would pick one and then pump all data to that system. At a recent job, that ended up being MongoDB. It was easy to move data to Mongo and it had by far the best query language. It also had a great community and setting up nodes was easier than Hadoop, etc.
Once you have solved (1), you can trim your data set and create a scaled down sandbox for people to run ad-hoc queries against. That would be my approach. You don't want to support the entire data set, because it would likely be too expensive and complicated.
If you were doing this in a relational database, I would say just run a
select top 1000 * from [table]
query on each table and use that data for people to play with.

Data mine a huge amount of data

I store a huge number of reporting elements in a MySQL database. These elements are stored in a simple way:
KindOfEvent;FromCountry;FromGroupOfUser;FromUser;CreationDate
All these reporting elements should make it possible to display graphs from different points of view. I have tried using SQL queries for that, but it is very slow for users. As these graphs will be used by non-technical users, I need a tool to pre-compute the results.
I am very new to all these data-mining, reporting and OLAP concepts. If you know a pragmatic, not-too-time-consuming approach, or a tool for that, it would help!
You could set up OLAP cubes on top of your MySQL data. The multi-dimensional model will help your users navigate through and analyse the data, either via Excel or web dashboards. One thing specific to icCube is its ability to integrate any JavaScript charting library and to embed the dashboards within your own pages.
I am not a DB specialist, but I think MySQL is more than enough for your problem. Well-designed indexes will speed up the query process.
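As a sketch of that idea, combined with the pre-computation the question asks about (the report_events table below just mirrors the column layout given above; names are illustrative):

    -- Hypothetical events table matching the layout in the question.
    CREATE TABLE report_events (
        kind_of_event      VARCHAR(50)  NOT NULL,
        from_country       CHAR(2)      NOT NULL,
        from_group_of_user VARCHAR(50)  NOT NULL,
        from_user          VARCHAR(100) NOT NULL,
        creation_date      DATETIME     NOT NULL
    );

    -- Composite index matching a typical "events per country over time" drill-down.
    CREATE INDEX idx_events_country_date
        ON report_events (from_country, creation_date, kind_of_event);

    -- Pre-aggregated daily summary, rebuilt by a periodic job, so the graphs
    -- read a few thousand rows instead of scanning every event.
    CREATE TABLE report_events_daily AS
    SELECT kind_of_event,
           from_country,
           from_group_of_user,
           DATE(creation_date) AS event_day,
           COUNT(*)            AS event_count
    FROM   report_events
    GROUP  BY kind_of_event, from_country, from_group_of_user, DATE(creation_date);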
I am not a DB expert, but if you want to process graphs you can use Neo4j (a Java-based graph database), SNAP (a C++ graph processing framework), or employ cloud computing if that is possible; there I would recommend either Hadoop (MapReduce) or Giraph (distributed graph processing). For graph display you can use whatever tool suits you. Of course, "the best" technology depends on the data size. If none of the above suits you, try finding something that does on the wiki page: http://en.wikipedia.org/wiki/Graph_database
InfoGrid (http://infogrid.org/trac/) looks like it might suit you.
