Salesforce Table Relationships for Business Analyst - data-structures

I am a business analyst. I use Tableau a lot but have limited knowledge of the Salesforce back end. Most of our company's data is stored in Salesforce, and our data team does not help business users understand such topics.
In many of my projects, I use the Salesforce connector in Tableau to extract Salesforce tables, but that requires knowing the join relationships among tables. Most of the time I can guess the key that links two tables, but I still want to learn the data structure systematically and become self-sufficient with the data.
So, how do I learn the data structure by myself? Or how do I ask the data team specific questions about the structure so I don't trouble them as much?

Do you have a Salesforce account with the "Customize Application" permission? If you don't have it in production, maybe they'll be willing to promote you to sysadmin in one of the sandboxes.
If you do, Setup -> Schema Builder might be the easiest tool for visualising relations. It's a bit old and flash-based, but it's a pretty neat way to model relationships. https://trailhead.salesforce.com/en/content/learn/modules/data_modeling/schema_builder
Another option is Workbench, http://workbench.developerforce.com/ It's not as neat, but it lets you experiment with metadata and queries and learn which object has which child relationships.
For standard objects, if you have a primary key / foreign key value, you can use lookup tables of key prefixes to learn which table it points to. All Account Ids in all SF instances start with 001, Contacts with 003, Users with 005... Combine a blog post like http://www.fishofprey.com/2011/09/obscure-salesforce-object-key-prefixes.html with the object reference at https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/sforce_api_objects_account.htm and it's a good start. It won't help much with custom objects and fields (specific to your company), but it's something.
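A minimal sketch of that prefix lookup, in Python. The 001, 003 and 005 entries come from the paragraph above; the other three are commonly documented standard prefixes. The sample Id in the usage note is made up for illustration.

```python
# Key prefixes for a few standard objects. Ids are case-sensitive,
# so "00Q" and "00q" are not the same prefix.
STANDARD_PREFIXES = {
    "001": "Account",
    "003": "Contact",
    "005": "User",
    "006": "Opportunity",
    "00Q": "Lead",
    "500": "Case",
}


def guess_object(record_id: str) -> str:
    """Guess which object a Salesforce record Id belongs to from its first 3 characters."""
    if len(record_id) not in (15, 18):
        raise ValueError("Salesforce Ids are 15 or 18 characters long")
    return STANDARD_PREFIXES.get(record_id[:3], "unknown (possibly a custom object)")
```

For example, `guess_object("001A000001abcdE")` returns `"Account"`. A column full of values starting with 005 is most likely a lookup to User, which tells you what to join it to.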
It's a bit "meta", but you can also query information about the tables and columns themselves, which may suit you if you are more comfortable working in Tableau ;) The question "Querying Salesforce Object Column Names w/SOQL" might give you some hints.
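As a rough sketch of such a "meta" query: the helper below (a hypothetical name) only builds a SOQL string against the FieldDefinition object, which you could then paste into Workbench. Whether your user is allowed to run it depends on your org's permissions, so treat it as a starting point.

```python
import re


def field_list_soql(object_api_name: str) -> str:
    """Build a SOQL query listing the fields of one object via FieldDefinition."""
    # Only allow plain API names such as Account or My_Object__c.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", object_api_name):
        raise ValueError(f"unexpected object name: {object_api_name!r}")
    return (
        "SELECT QualifiedApiName, Label, DataType "
        "FROM FieldDefinition "
        f"WHERE EntityDefinition.QualifiedApiName = '{object_api_name}'"
    )


print(field_list_soql("Account"))
```

In the results, the DataType column is the one to look at for joins, since lookup fields name the object they point to.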

If your job is to build advanced reports off these data sources, I would imagine you need to understand the data structure to some extent. This would mean you need to have authorization to view and access the database table list to get familiar with it and possibly run raw queries to verify data integrity.
If they are not comfortable with you touching the production system, ask for access to a development system which is a copy of production or even just realistic test data.

Related

Getting into designing dashboards and need some help identifying each technical layer along the way

So I will be embarking on designing a dashboard that will display KPIs and other relevant information for my team. Since I am in the early stages of this project and am not very familiar with the technical process behind designing a dashboard, I need a few questions answered before I go shopping for solutions, to avoid reinventing the wheel.
Here are some of my questions:
We want a dashboard that can provide real-time information from our data sources (or as close to real time as possible). What mechanism allows a dashboard to update itself as its data sources change? Conceptually, I can understand creating a dashboard in Microsoft Excel and having it depend on the values in your pivot table.
How do you make a dashboard request information from multiple data sources on its own? In the Excel example, a user may have to go into the pivot tables to update values, but I want to know how a dashboard would request this by itself, and what the exact method is from a programming standpoint. Does the code execute every time you refresh the webpage?
How do you create data sources organically? I know that some solutions, such as SharePoint BI Center, come with pre-supported data sources like an Excel sheet or SharePoint, and it's as easy as uploading your document and letting the tool handle the rest. However, I know some data sources will need to be fetched. Do I need to understand something else, like an event recorder, in order to handle this?
Introduction
The dashboard (or a report, respectively) is usually the result of a long chain of steps. Very much simplified it could look like this:
    src1
      |------\
    src2     |                  /---- Dashboards
      |------+---[DWH]-[BR]----+
    src n    |     |            \---- Reports etc.
      |------/  [Big Data]
Keep in mind, this is only a very, very simple structure of a data backend / frontend.
DWH means Data Warehouse, where data might be stored temporarily (you referred to this as fetching). This could be a database, could be a Big Data engine, could be a combination of both...
Afterwards, there are Business Rules (BR). Those might be specific rules in how different departments calculate and relate to data, but also simple things like algebra.
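To make the chain a bit more tangible, here is a toy sketch of the DWH and BR steps in Python. The rows, field names and the rule itself ("revenue is the sum of amounts, ignoring cancelled sales") are invented for illustration; a real setup would use a database and an ETL tool.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from two source systems.
crm_rows = [
    {"sales_id": "S-1", "dept": "A", "amount": 100.0, "status": "closed"},
    {"sales_id": "S-2", "dept": "B", "amount": 50.0, "status": "cancelled"},
]
shop_rows = [
    {"sales_id": "S-3", "dept": "A", "amount": 25.0, "status": "closed"},
]


def load_warehouse(*sources):
    """DWH step: collect the rows of every source in one place."""
    warehouse = []
    for rows in sources:
        warehouse.extend(rows)
    return warehouse


def revenue_by_department(warehouse):
    """BR step: one shared business rule applied to all rows."""
    totals = defaultdict(float)
    for row in warehouse:
        if row["status"] != "cancelled":
            totals[row["dept"]] += row["amount"]
    return dict(totals)


warehouse = load_warehouse(crm_rows, shop_rows)
print(revenue_by_department(warehouse))
```

A dashboard or report would then read the output of the rule step rather than the raw sources, so every department sees numbers calculated the same way.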
Questions
So, the main question should not be about the technology:
What software should we choose?
How can we create a dashboard?
but should instead focus on your business processes (think of it as a top-down view):
What does our core process look like? Where would I like to measure data?
How would department A calculate sales differently from department B? Should all departments use the same rule?
Where does everyone store the data? Can we access it? Do we need structured data?
And, very easy to forget but sometimes one of the biggest tasks: Is the identifier of a business object (say, a sales id) built and formatted the same way everywhere?
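The identifier question is the one that most often ends up as code. A small sketch in Python, assuming a made-up canonical format of "S" followed by six digits; your real ids will need their own rule.

```python
import re


def normalize_sales_id(raw: str) -> str:
    """Reduce differently formatted sales ids to one canonical form.

    The canonical form 'S' + six digits is an assumption for this example.
    """
    digits = re.sub(r"\D", "", raw)  # keep only the digits
    if not digits:
        raise ValueError(f"no digits found in sales id: {raw!r}")
    return f"S{int(digits):06d}"


# The same sale as three systems might write it.
for raw in ["S-42", "sales 000042", " 42 "]:
    print(raw, "->", normalize_sales_id(raw))
```

Once every source passes through the same normalization, rows from different systems can be matched on the id without guessing.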
Conclusion
When those questions are at least in the back of your head and you keep working in this direction, more or less automatically data will spill out at certain points of that process.
Then it won't matter whether you use Excel, a small-to-medium app like Tableau, Tibco Spotfire, QlikView or Power BI, or go full scale with a big Hadoop backend, databases and JasperReports, Apache Drill, Pentaho or SSIS on top of it... it will come out eventually.
TL;DR
Focus on the processes first. Make sure you understand them. Draft in Excel. Then go on to get the data and the tools you need to support your use cases. A "top-down" approach will work out much better than trying to solve your requirements with tools alone.

Dynamically generate data based notifications platform

In our project we have a requirement to create dynamic notifications that "pop" up on our site when a relevant rule applies.
Our main database is Oracle Exadata.
This feature is supposed to let users create dynamic rules that are checked from time to time.
These rules may check specific fields of certain types, and may also compare those fields with the fields of other types.
For example, if our program has a table of cars with a location column, and another table of streets with a location column (no direct relation between those two tables), we might need to notify the users if a car is in a certain street.
Is there a good platform that can help us calculate the kind of "rules" that we want to check?
We started looking at Elasticsearch and Neo4j (we have a specific module that involves graph-like relations), but we aren't sure that they would be the right solution.
Any idea would be appreciated :)
Neo4j could help you express your rules, but it sounds as if your disconnected data would rather be queried with SQL-style joins?
So if you want to express and manage your rules as predicates in the graph, you can do that easily, and then get a list of applicable rules that trigger queries in your other databases.
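Independent of the platform, the "rule as predicate over two unrelated tables" idea can be sketched in a few lines of Python. Everything here (the Rule class, the column names, the sample rows) is hypothetical; in practice the predicate would be translated into a join in the database rather than a loop in memory.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    left: str    # name of the first table
    right: str   # name of the second table
    predicate: Callable[[dict, dict], bool]


def evaluate(rule: Rule, tables: dict) -> list:
    """Return every (left_row, right_row) pair for which the rule fires."""
    return [
        (l, r)
        for l in tables[rule.left]
        for r in tables[rule.right]
        if rule.predicate(l, r)
    ]


tables = {
    "cars": [
        {"plate": "AB-1", "location": "Main St"},
        {"plate": "CD-2", "location": "Oak Ave"},
    ],
    "streets": [
        {"name": "Main St", "location": "Main St", "watched": True},
        {"name": "Oak Ave", "location": "Oak Ave", "watched": False},
    ],
}

car_in_watched_street = Rule(
    name="car in watched street",
    left="cars",
    right="streets",
    predicate=lambda car, street: street["watched"]
    and car["location"] == street["location"],
)

hits = evaluate(car_in_watched_street, tables)
for car, street in hits:
    print(f"notify: car {car['plate']} is in {street['name']}")
```

A scheduler would run `evaluate` for each stored rule and push a notification for every hit; the rule definitions themselves are what you would keep in the graph or in a table.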

Multi-tenant database. One collection or one db per tenant?

For a multi-tenancy architecture for a web application using a document-oriented database I can see two conceivable options:
Having one database per tenant, and the collections logically separate different kinds of object.
Having one collection per tenant, and all user data is stored in one database, with some kind of flag or object type identifier on each record.
Have there been any studies or has any documentation been produced regarding these two options and the differences between them?
Is there a particular standard or good reason why someone designing a web application which allows multiple users to store vastly different kinds of data would choose one over the other?
Aside from speed/efficiency issues, are there any other things to be said about this that would influence the decision?
EDIT I'm aware some of the terminology might be database specific, so for all wondering I am specifically referring to MongoDB.
I wouldn't want tenant-specific collections. In my applications, I usually hard-code collection names, in the same way as I'd hard-code table names if I were using SQL tables. There'd be one comments collection that stores all comments for a blog. I would not want to deal with collection names like comments_tenant_1 and comments_tenant_2, because 1) that feels error-prone, 2) it would make the application code more complicated (collection names would have to be replaced with functions that compute the collection name), and 3) the number of collections in a single database could grow huge, which would make a list of all collections look daunting. Also, MongoDB isn't built for having very many collections (see the link in the comment below your question, which David B posted, https://docs.mongohq.com/use-cases/multi-tenant.html).
However, database names aren't coupled to application data structures, and you can grant permissions per database (older MongoDB versions could not do so per collection; since 2.6, user-defined roles can also scope privileges to a single collection). So one database per tenant could be reasonable. So could a per-document tenant_id field in a single database for all tenants (see the above-mentioned link).
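A small sketch of the two reasonable options in Python. The helper names and the "app_" prefix are assumptions for illustration, and no driver is needed to run it; the functions only compute the database name or the query filter.

```python
from typing import Optional


def tenant_db_name(tenant_id: str) -> str:
    """Option 1: one database per tenant; collection names stay hard-coded."""
    if not tenant_id.isalnum():
        raise ValueError("tenant id must be alphanumeric")
    return f"app_{tenant_id}"


def tenant_filter(tenant_id: str, query: Optional[dict] = None) -> dict:
    """Option 2: one shared database; every query is forced to carry tenant_id."""
    scoped = dict(query or {})
    scoped["tenant_id"] = tenant_id  # always overrides whatever the caller passed
    return scoped


print(tenant_db_name("acme"))
print(tenant_filter("acme", {"post_id": 7}))
```

With a driver such as PyMongo, the first helper would pick the database (`client[tenant_db_name(t)]["comments"]`) and the second would wrap the filter passed to `find`. Either way, the collection name `comments` stays fixed in the application code.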

Working with LINQ to Entities against multiple SQL Server databases

I'm building a project made up of a number of sites with a common subject.
The sites rely on one central database that holds the common info for all of them.
In addition, each site has another database that holds its unique info (I will refer to it as unique-db in the next lines so I won't be misunderstood).
For example, the Languages table sits in the central db. That said, I suddenly noticed that I need to reference the Languages table from one of my unique-dbs as a foreign key, and I'd rather not create the same table again in the unique-db.
Do I have to create the same table again, this time in the unique-db? Or is there a way to connect tables from separate databases?
In addition, we decided to use LINQ to Entities, and soon we're going to run some complex queries against the different databases. Will that be a problem?
How should I go about this? Was it wise to split the data into a few databases?
I really appreciate all the help I can get!
One thing that might make your life easier is to create views of the central tables in each unique db. Linq to Entities will pick up views as if they were tables.
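As a sketch of what such a view could look like, the Python helper below just generates the T-SQL. It assumes the central database is called CentralDb (a made-up name), lives on the same SQL Server instance, and uses the dbo schema.

```python
def central_view_ddl(table: str, central_db: str = "CentralDb", schema: str = "dbo") -> str:
    """Generate T-SQL for a view in a unique-db that exposes a central-db table."""
    for name in (table, central_db, schema):
        # Keep the example safe: plain identifiers only.
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected identifier: {name!r}")
    return (
        f"CREATE VIEW [{schema}].[{table}] AS "
        f"SELECT * FROM [{central_db}].[{schema}].[{table}];"
    )


print(central_view_ddl("Languages"))
```

Run the generated statement once in each unique-db. Note that a view gives you something to query and join against, but a foreign key constraint cannot reference it, so referential integrity would still have to be enforced some other way.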

One database or many?

I am developing a website that will manage data for multiple entities. No data is shared between entities, but they may be owned by the same customer. A customer may want to manage all their entities from a single "dashboard". So should I have one database for everything, or keep the data separated into individual databases?
Is there a best-practice? What are the positives/negatives for having a:
database for the entire site (entity has a "customerID", data has "entityID")
database for each customer (data has "entityID")
database for each entity (relation of database to customer is outside of database)
Multiple databases seem like they would have better performance (fewer rows and joins) but may eventually become a maintenance nightmare.
Personally, I prefer separate databases, specifically a database for each entity. I like this approach for the following reasons:
Smaller = faster regarding the queries.
Queries are simpler.
No risk of ever accidentally displaying one customer's data to another.
One database could become a performance bottleneck as it gets large (as the number of entities increases). You get a sort of built-in horizontal scalability with one database per entity.
Easy data clean up as customers or entities are removed.
Sure it'll take more time to upgrade the schema, but in my experience modifications are fairly uncommon once you deploy and additions are trivial.
I think this is hard to answer without more information.
I lean on the side of one database. Properly coded business objects should prevent you from forgetting clientId in your queries.
The type of database you are using and how it scales might help you make your decision.
For schema changes down the road, it seems one database would be easier from a maintenance perspective - you have one place to make them.
What about backup and restore? Could you experience a customer wanting to restore a backup for one of their entities?
This is a fairly normal scenario in multi-tenant SaaS applications. Both approaches have their pros and cons. Search for best practices for multi-tenant SaaS (software as a service) and you will find tons of material to ponder.
Check out this article on Microsoft's site. I think it does a nice job of laying out the different costs and benefits associated with multi-tenant designs. Also look at the Multitenancy article on Wikipedia. There are many trade-offs, and your best match greatly depends on what type of product you are developing.
One good argument for keeping them in separate databases is that it's easier to scale (you can simply have multiple installations of the server, with the client databases distributed across the servers).
Another argument is that once you are logged in, you don't need to add an extra where check (for client ID) in each of your queries.
So, a master DB backed by multiple DBs, one for each client, may be a better approach.
If the client would ever need to restore only a single entity from a backup and leave the others in their current state, then maintenance will be much easier if each entity is in a separate database. If they can be backed up and restored together, then it may be easier to maintain the entities as a single database.
I think you have to go with the most realistic scenario and not necessarily what a customer "may" want to do in the future. If you are going to market that feature (i.e. seeing all your entities in one dashboard), then you have to either find a solution (maybe have the dashboard pull from multiple databases) or use a single database for the whole app.
IMHO, having the data for multiple clients in the same database just seems like a bad idea to me. You'll have to remember to always filter your queries by clientID.
It also depends on your RDBMS, e.g.
With SQL Server, databases are cheap.
With Oracle, it is easy to partition tables by customer ("customerID"), so a single large database can run as fast as a small database for each customer.
Whichever you choose, try to hide it at a low level in your data access code.
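A minimal sketch of hiding the choice in the data access code, using Python's built-in sqlite3 so it runs as-is. The orders table and the OrderRepository class are invented for the example; here the single-database variant is shown, where the repository adds the customer filter so callers can't forget it.

```python
import sqlite3


class OrderRepository:
    """Callers never write the customer filter themselves."""

    def __init__(self, conn: sqlite3.Connection, customer_id: int):
        self._conn = conn
        self._customer_id = customer_id

    def list_orders(self):
        rows = self._conn.execute(
            "SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id",
            (self._customer_id,),
        )
        return rows.fetchall()


# One shared in-memory database holding two customers' data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(1, 10.0), (2, 99.0), (1, 5.5)],
)

repo = OrderRepository(conn, customer_id=1)
print(repo.list_orders())
```

If you later move to one database per customer, only the construction of the repository changes (it would receive a connection to that customer's database and drop the WHERE clause); the code calling `list_orders` stays the same.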
Do you plan to have your code deployed to multiple environments?
If so, then try to keep it within one database and have all table references prefixed with a namespace from a configuration file.
The single database option would make the maintenance much easier.

Resources