Platform for dynamically generating data-based notifications - Elasticsearch

In our project we have a requirement to create dynamic notifications that "pop up" on our site when a relevant rule applies.
Our main database is Oracle Exadata.
This feature is supposed to allow users to create dynamic rules that will be checked periodically.
These rules may check specific fields of certain types, and may also check those fields relative to the fields of other types.
For example, if our application has a table of cars with a location column, and another table of streets with a location column (no direct relation between those two tables), we might need to notify the users when a car is in a certain street.
Is there a good platform that can help us calculate the kind of "rules" that we want to check?
We started looking at Elasticsearch and Neo4j (we have a specific module that involves graph-like relations), but we aren't sure that they would be the right solution.
Any idea would be appreciated :)

Neo4j could help you express your rules, but it sounds as if your disconnected data is better queried with SQL-style joins?
So if you want to express and manage your rules as predicates in the graph, you can do that easily and then fetch the list of applicable rules to trigger queries in other databases, along the lines of the sketch below.
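A minimal Python sketch of that split, assuming a hypothetical Rule/EntityType schema in Neo4j (the APPLIES_TO relationship and the sql_template property are made up for illustration) and the python-oracledb driver on the Exadata side:

    # Rules live in Neo4j as nodes; the SQL they trigger runs against Oracle.
    from neo4j import GraphDatabase
    import oracledb  # assumes python-oracledb for the Exadata side

    graph = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

    def applicable_rules(entity_type):
        # Fetch the rules whose predicate links them to the given entity type.
        with graph.session() as session:
            result = session.run(
                "MATCH (r:Rule)-[:APPLIES_TO]->(t:EntityType {name: $type}) "
                "RETURN r.name AS name, r.sql_template AS sql",
                type=entity_type,
            )
            return [record.data() for record in result]

    def check_rules(connection, entity_type):
        # Run each rule's SQL against Oracle; any returned row means "notify".
        hits = []
        with connection.cursor() as cursor:
            for rule in applicable_rules(entity_type):
                cursor.execute(rule["sql"])
                if cursor.fetchone() is not None:
                    hits.append(rule["name"])
        return hits

The car/street example would then be a single Rule node whose sql_template joins the cars and streets tables on their location columns.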


Adding Fields to Entity for Data Import and Power Query Analysis without Touching Dynamics Solutions

I've landed in one of those situations that tutorials always use as the opposite of an ideal example: a custom-built CRM, and no access to the firm that built it. For that reason, for the moment, we are not touching solutions, because I lack the documentation to make safe decisions about major changes on the Dynamics end of things.
That said, I use Power Query to analyze the data on a daily basis. For some of our needs, could I theoretically add fields to the entities, data-import into those fields, and analyze them through Power Query?
Does this route avoid the risk of messing up the prod environment while giving us the ability to track new data points, add them (without creating a new form to fill out), and access the data for tracking and analysis?
Am I missing any glaring relationship issues between Dynamics and CDS, or does this keep the changes on the CDS side? Thoughts?
I believe your third-party solution is managed?
However, if you wish to create a field in prod, let's say for the account entity:
Create a new unmanaged solution and add the account entity without any components (empty, just account), then create the new fields; as per your requirement, do not add them to any forms or views.
Once you publish, these fields are available for your use.
After all the analysis, if you wish to delete those fields, go to your newly created solution and delete the fields (delete, not just remove).
This should help and will not cause any issues.
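If you'd rather script it than click through the UI, the same kind of field can in principle be created through the Dataverse (CDS) Web API. A rough Python sketch; the org URL, token, and field name are placeholders, and you should try this in a sandbox first:

    import requests

    ORG = "https://yourorg.crm.dynamics.com"  # placeholder org URL
    TOKEN = "..."  # OAuth bearer token, obtained separately

    # Metadata for a simple text attribute on the account entity.
    attribute = {
        "@odata.type": "Microsoft.Dynamics.CRM.StringAttributeMetadata",
        "SchemaName": "new_AnalysisField",  # hypothetical field name
        "AttributeType": "String",
        "AttributeTypeName": {"Value": "StringType"},
        "MaxLength": 100,
        "FormatName": {"Value": "Text"},
        "RequiredLevel": {"Value": "None"},
        "DisplayName": {
            "@odata.type": "Microsoft.Dynamics.CRM.Label",
            "LocalizedLabels": [{
                "@odata.type": "Microsoft.Dynamics.CRM.LocalizedLabel",
                "Label": "Analysis Field",
                "LanguageCode": 1033,
            }],
        },
    }

    resp = requests.post(
        f"{ORG}/api/data/v9.2/EntityDefinitions(LogicalName='account')/Attributes",
        json=attribute,
        headers={"Authorization": f"Bearer {TOKEN}", "OData-Version": "4.0"},
    )
    resp.raise_for_status()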

Salesforce Table Relationships for Business Analyst

I am a business analyst. I use Tableau a lot but have limited knowledge about the back end of Salesforce. The majority of our company's data is stored in Salesforce, and our data team does not support business users on such topics.
In many of my projects, I use the Salesforce connector inside Tableau to extract Salesforce tables, but it requires knowledge of the join relationships among the tables. Most of the time, I can guess the primary keys correctly, but I still want to learn about the data structure systematically and gain some data independence.
So, how do I learn the data structure by myself? Or how do I ask the data team specific structural questions so I don't trouble them as much?
Do you have a Salesforce account with the "Customize Application" permission? If you don't have one in production, maybe they'll be willing to promote you to sysadmin in one of the sandboxes.
If you do: Setup -> Schema Builder might be the easiest tool to visualise relations. It's a bit old and Flash-based, but a pretty neat way to model relationships. https://trailhead.salesforce.com/en/content/learn/modules/data_modeling/schema_builder
Another one might be Workbench, http://workbench.developerforce.com/ It's not as neat, but it lets you experiment with metadata & queries and learn which object has which child relationships...
For standard objects, if you have a primary key / foreign key, you can use some lookup tables to learn more about the target table. All Account Ids in all SF instances start with 001, Contacts with 003, Users with 005... Combine some blogs like http://www.fishofprey.com/2011/09/obscure-salesforce-object-key-prefixes.html with https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/sforce_api_objects_account.htm and it's a good start. It won't help much with custom objects and fields (specific to your company), but still.
It's a bit "meta", but you can query info about tables and columns too. After all, you might be more comfortable in Tableau ;) Searching for "Querying Salesforce Object Column Names w/SOQL" might give you some hints.
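If you end up with API access, a short Python sketch using the simple_salesforce library shows how to pull the same structural information programmatically; the credentials are placeholders:

    from simple_salesforce import Salesforce

    sf = Salesforce(username="you@example.com", password="...", security_token="...")

    # describe() returns an object's metadata, including all of its fields.
    desc = sf.Account.describe()
    for field in desc["fields"]:
        if field["type"] == "reference":
            # A lookup field; referenceTo lists the parent object(s) it points at.
            print(field["name"], "->", field["referenceTo"])
        else:
            print(field["name"], field["type"])

    # The key prefix mentioned above: the first 3 characters of every record Id.
    print(desc["keyPrefix"])  # "001" for Account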
If your job is to build advanced reports off these data sources, I would imagine you need to understand the data structure to some extent. This means you need authorization to view and access the database table list to get familiar with it, and possibly to run raw queries to verify data integrity.
If they are not comfortable with you touching the production system, ask for access to a development system that is a copy of production, or at least one with realistic test data.

Elasticsearch - Modelling video catalogue information into one index vs multiple indexes

I need to model a video catalogue composed of movies, TV shows, episodes, TV channels, and live program information in Elasticsearch. Some of these entities are correlated, some not.
The attributes of these entities are quite different, even if there are some common ones.
Now, since I may need to run cross-entity queries (imagine a customer searching for something that could be a movie, a TV channel, or a live event program), is it better to have a single index containing a generic entity marked with a logical type attribute, or to have multiple indexes, one for each entity (movie, show episode, channel, program)?
In addition, some of these entities, like movies, can have metadata attributes in multiple languages.
Coming from a relational database data model, I would create different indexes, one for every entity, and have a language-variant index for every language. Any suggestion or better approach for great search performance and usability?
Whether or not to use several indexes very much depends on the application, so I cannot provide a definite answer, rather a few thoughts.
From my experience, indexes are more a means of helping maintenance and operations than a data-modeling tool. It is, for example, much easier to delete an index than to delete all documents from one source in a bigger index. Or, if you support totally separate search applications which do not query across each other's data, different indexes are the way to go.
But when you want to query documents across data sources, as you do, it makes sense to keep them in one index, if only to have comparable ranking across all items in your index. Make sure to re-use fields across your data that have similar meaning (title, year of production, artists, etc.). For fields unique to a source, we usually use prefix-marked field names, e.g. movie_... for movie-only metadata.
As for the languages, you need to use language-specific fields, like title_en, title_es, title_de. Ideally, at query time, you know your user's language (from the browser, or because they selected it explicitly, ...) and can then search in the language-specific fields where available. Be sure to use the language-specific analyzers for these fields, at both query and index time, as in the sketch below.
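A minimal mapping sketch for the single-index approach, assuming the elasticsearch Python client; the index and field names are illustrative:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    es.indices.create(
        index="catalog",
        mappings={
            "properties": {
                # The logical type attribute that tells entities apart.
                "entity_type": {"type": "keyword"},  # movie, episode, channel, program
                # Shared fields, re-used across all entities, one per language,
                # each with its language-specific analyzer.
                "title_en": {"type": "text", "analyzer": "english"},
                "title_es": {"type": "text", "analyzer": "spanish"},
                "title_de": {"type": "text", "analyzer": "german"},
                "year_of_production": {"type": "integer"},
                # Source-specific fields, prefix-marked.
                "movie_rating": {"type": "keyword"},
                "channel_number": {"type": "integer"},
            }
        },
    )

    # At query time, search the field that matches the user's language.
    es.search(index="catalog", query={"match": {"title_es": "la casa"}})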
I see a search engine a bit as the dual of a database: a database stores data but can also index it; a search engine indexes data but can also store it. A database tends to normalize the schema to remove redundancy; a search engine works best with denormalized data, for query performance.

Multi-tenant database. One collection or one db per tenant?

For a multi-tenancy architecture for a web application using a document-oriented database, I can see two conceivable options:
Having one database per tenant, and the collections logically separate different kinds of object.
Having one collection per tenant, and all user data is stored in one database, with some kind of flag or object type identifier on each record.
Have there been any studies or has any documentation been produced regarding these two options and the differences between them?
Is there a particular standard or good reason why someone designing a web application which allows multiple users to store vastly different kinds of data would choose one over the other?
Aside from speed/efficiency issues, are there any other things to be said about this that would influence the decision?
EDIT: I'm aware some of the terminology might be database-specific, so for anyone wondering: I am specifically referring to MongoDB.
I wouldn't want tenant-specific collections. In my applications, I usually hard-code collection names, in the same way I'd hard-code table names if I were using SQL tables: there'd be one comments collection that stores all comments for a blog. I would not want to deal with collection names like comments_tenant_1 and comments_tenant_2, because 1) that feels error-prone, 2) it would make the application code more complicated (collection names would have to be replaced with functions that compute the collection name), and 3) the number of collections in a single database could grow huge, which would make a list of all collections look daunting, and MongoDB isn't built for having very many collections anyway (see the link David B posted in the comment below your question, https://docs.mongohq.com/use-cases/multi-tenant.html).
However, database names aren't coupled to application data structures, and you can grant permissions on databases (but not on single collections). So one database per tenant could be reasonable, as could a per-document tenant_id field in a single database for all tenants (see the above-mentioned link). Both are sketched below.
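Both layouts in a short pymongo sketch; the names are illustrative:

    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")

    # Option 1: one database per tenant, hard-coded collection names.
    def comments_per_tenant_db(tenant_id):
        db = client[f"tenant_{tenant_id}"]  # the database name carries the tenant
        return db.comments.find({})         # the collection name stays fixed

    # Option 2: a single database with a per-document tenant_id field.
    shared = client["app"]
    shared.comments.create_index([("tenant_id", ASCENDING)])  # keep tenant queries fast

    def comments_per_tenant_field(tenant_id):
        # Every query must filter on tenant_id; forgetting it leaks data across tenants.
        return shared.comments.find({"tenant_id": tenant_id})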

CouchDB validation based on content from existing documents

QUESTION
Is it possible to query other CouchDB documents as part of a standard CouchDB validation function?
If not, what is the standard approach for including properties of other documents in a validation rule inside a CouchDB validation function?
RATIONALE
Consider a run-of-the-mill address book application where the validation function is intended to prevent two or more entries from having the same value in the 'e-mail' field of an address book entry.
Consider also an address book application where it is possible to specify validation rules in separate documents, based on whether the postal code is a US-based postal code or something else.
No, it is not possible to query other CouchDB documents in a validate_doc_update function. Each one runs in isolation, receiving references only to the new document, the old document, and the user (where applicable).
In my personal experience, there are at least three options for dealing with duplicate checking:
Use Cloudant as your CouchDB provider. They offer a free tier for now if you'd like to experiment, and they guarantee consistency across nodes for a CouchDB database. (See #2.)
I've used a secondary "reserve table" for names, using the type-key as the ID; there's a sketch of this after the list. Then you need to check for conflicts if you're not using a system like Cloudant. Basically, a simple document maintains a key to prevent duplicates. It's not fun code to write, given that you need to watch for conflicts. (Even with Cloudant, you need to deal with failed write requests, but that's easier than dealing with the timing issues surrounding data replication across multiple nodes.)
Use a traditional DB, like MySQL for example, that can maintain a unique and consistent index for specific data values like the ones you're describing, and store the documents away in CouchDB. While it's slightly annoying to need different data providers, it's reliable.
(Optional: decide that CouchDB isn't a great fit for the type of system you're building)
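Here is a sketch of the reserve-document approach from option 2, done in Python against CouchDB's plain HTTP API. The database names are placeholders and authentication is omitted; the point is that the e-mail address itself becomes the document ID, so CouchDB's own conflict detection (HTTP 409) enforces uniqueness:

    import requests

    COUCH = "http://localhost:5984"  # placeholder CouchDB URL

    def reserve_email(email):
        # Deterministic ID: exactly one reservation document per e-mail address.
        doc_id = f"email:{email}"
        resp = requests.put(
            f"{COUCH}/reservations/{doc_id}",
            json={"type": "email-reservation"},
        )
        if resp.status_code == 409:
            return False          # conflict: the address is already taken
        resp.raise_for_status()   # expect 201 Created on success
        return True

    # Only write the actual address book entry once the reservation succeeds.
    if reserve_email("jane@example.com"):
        requests.post(
            f"{COUCH}/addressbook",
            json={"name": "Jane", "email": "jane@example.com"},
        )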
