How to model user-configurable schemas in Elasticsearch 6+? - elasticsearch

I work at a small SaaS startup.
As with most SaaS products, you create an account and invite others.
We have some entities, like "product", but each account can configure the product's fields.
Let's say you work with cars, you can create fields like "Model", "Year", "Weight", etc.
Let's say you work with clothes, you can create fields like "Size", "Gender", etc.
We have this modeled in SQL, but I want a replica in Elasticsearch for general searches and especially for customizable reports.
To model this, I am considering these options:
One Index per Account Entity
When an account is created, I'd create an index named something like "product-" and it would have its own schema.
When the account's admin creates or changes a field, I'd need to use Update By Query or the Reindex API; I'm not sure which.
When an account is deleted, I'd need to delete the indexes.
PRO: each index has a solid schema.
CONS: creating/deleting indexes dynamically sounds scary.
One Index per Entity.
This one seems OK too. I'd put "account_id" on every document and filter on it every time.
There would be only one index per entity: "products", "users", "contacts", "sales", etc.
PRO: way simpler
CONS: each index holds multiple schemas, one per account_id
I'm not sure how to handle the relationships either... Can I create relationships dynamically?
One Index to rule them all.
One index "entities", with fields "account_id", "entity_type".
Maybe I need to do this to map the relationships properly. I'm quite confused; I did not fully understand the join field.
Is there anything I'm missing?
Thanks for reading this far :)
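To make the "One Index per Entity" option more concrete, here is a minimal Python sketch of what "filter on account_id every time" could look like. It only builds the search request body as a plain dict and does not call any client. The helper name account_scoped_query, the sample account id and the sample "Model" field are hypothetical, and the term filter assumes "account_id" is mapped as a keyword.

```python
from typing import Any, Dict


def account_scoped_query(account_id: str, user_query: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap any query in a bool query that is restricted to one account.

    Assumption: "account_id" is mapped as a keyword, so an exact
    term filter matches it.
    """
    return {
        "query": {
            "bool": {
                # The search the user actually asked for (scored).
                "must": [user_query],
                # Tenant isolation: not scored, always applied.
                "filter": [{"term": {"account_id": account_id}}],
            }
        }
    }


# Hypothetical usage: search the shared "products" index for one account.
body = account_scoped_query("acc-42", {"match": {"Model": "Civic"}})
```

Routing every search through a single helper like this is one way to avoid forgetting the account filter in some code path.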

Related

Elasticsearch - Count of associations between indexes?

Coming from a relational database background, I want to know if there is a way to retrieve the number of unique associations between two indexes.
Basic Example (Using relational databases)
I have 3 tables: Person, Cars, Person-Cars
Person-Cars has two columns (person_id, car_id) and holds the associations (ownership) between people and cars.
On Elasticsearch, I have created an index for Person and for Cars.
Main Point
Every time I fetch a Car document, I want to know how many people own that car (in other words, how many unique people it is associated with).
--
To achieve that, would I need another index for Person-Cars, and then have to index all the association records? Is there a simpler way? What would be the best way to do this in ES?
I have looked into aggregations, but I think they can only be done on a single level (person or cars); I'm not sure.
Thanks!
On Elasticsearch, I have created an index for Person and for Cars.
Most of the time it makes sense to store data in a denormalized fashion in Elasticsearch, i.e. defining one-to-many relationships as nested documents, as parent-child relationships, or simply as multi-value fields.
What would be the best way to do this in ES?
It depends on your use case (parent-child, nested, or multi-value). Having a separate index for each type will definitely add overhead. The schema can only be modelled better if you share your other use cases and the types of queries you need.
Considering only the use case you shared, the car document below will solve your problem:
{
"id":1,
"brand":"Hyundai",
"owners":[21,31,51], // <===== Ids of owners. Ids & names both can be stored if required.
"owners_cnt": 3 // <==== OR You can simply maintain the counter as well.
}
Whenever a person buys or sells a car, the car document needs to be updated in this case. If cars are bought and sold frequently and you need to update both the car and the person whenever a person buys a car, this type of modelling makes less sense.
In that case it makes sense to keep the car ids within the person doc:
{
"id":1,
"name":"Raj",
"cars":[1,2,3]
}
In this case, we can use the query below to fetch the number of persons who bought the car with id=3:
GET person/_count
{
"query": {
"match": {
"cars": 3
}
}
}
Again, better modelling can be achieved if more context is shared.
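As a rough illustration of what the count query above computes, here is a small in-memory Python sketch. It does not talk to Elasticsearch; it only mirrors the logic of "count person documents whose cars field contains a given id". The function name count_owners and every sample person except "Raj" are made up for the example.

```python
from typing import Dict, List


def count_owners(people: List[Dict], car_id: int) -> int:
    """Count person documents whose "cars" list contains car_id."""
    return sum(1 for person in people if car_id in person.get("cars", []))


# Person documents shaped like the example above (extra people are hypothetical).
people = [
    {"id": 1, "name": "Raj", "cars": [1, 2, 3]},
    {"id": 2, "name": "Ana", "cars": [3]},
    {"id": 3, "name": "Lee", "cars": []},
]

owners_of_car_3 = count_owners(people, 3)
```

With this shape, buying or selling a car only touches the person document; the owner count is derived at query time instead of being stored on the car.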

JPA entity relationship dependent on other field

My database has a table that represents the common fields between a bunch of other things. So let's say there's one table like Vehicle, and Vehicle has fields like VIN, color, type...
Then there are other tables like Car and Truck and when the "type" on the Vehicle is "car" we want to look at the Car table to find other properties. Maybe stuff about fuelEfficiency and numPassengers. When type is "truck" we want to know things about trucks like maybe loadCapacity or whatever.
How do you model something like this? Do you embed the Vehicle into all of the other types? Is there a way to do something like a @OneToOne between the tables conditionally on the "type" field?
You could use a MappedSuperclass or a joined-table inheritance strategy if you want to follow the real ORM way of doing this. But it comes at a cost: these strategies bring other overheads, such as querying restrictions and more joins per query.
If you denormalize the tables, you could use a data filter strategy (I think only Hibernate supports this). But then you have everything in a single class, and that could be a bigger problem to deal with.
Since you are using REST, you are probably looking to serve something like /api/vehicles/1 (correct me if I'm wrong).
For such scenarios, it is easier to use a single "Vehicle" table that contains the "type" in it. You can use a Jackson serializer to omit all null components. So in the case of a car, the vehicle attributes and car parameters will be included, but the null values for truck, bus, etc. will be omitted.
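The "omit null components" idea can be sketched independently of Jackson. The Python snippet below is not Jackson code; it only shows the same technique on a plain dict that stands in for one row of the single "Vehicle" table. The helper name to_api_payload and the sample values are hypothetical.

```python
from typing import Any, Dict, Optional


def to_api_payload(row: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Return a copy of the row without the columns whose value is None."""
    return {key: value for key, value in row.items() if value is not None}


# One row of the denormalized Vehicle table: truck-only columns are NULL for a car.
car_row = {
    "vin": "VIN-001",        # hypothetical value
    "color": "red",
    "type": "car",
    "fuelEfficiency": 6.5,
    "numPassengers": 5,
    "loadCapacity": None,    # only meaningful when type == "truck"
}

payload = to_api_payload(car_row)
```

The response for /api/vehicles/1 would then carry the shared fields plus only the attributes that apply to that vehicle's type.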

Couchbase Lite - FTS and indexing

I have created an index for FTS to work on say "Cars". But I also have another model called "Bikes".
I have the following structure:
{ "type": "Car", "description": "..."}, {"type": "Bike", "model": "..."}
I have created an index on the property "description".
Now, when indexes are created, I can see there are entries for Car, which is fine. But it also shows indexes being created for Bike, with values NULL.
I have multiple entries of Cars and Bikes, and thus have multiple NULL valued indexes being created.
Is this by design? What approach should I take to keep both Car and Bike models in the same database while implementing FTS only for Cars? Couchbase Lite doesn't allow me to create conditional indexes, where I could specify the "type".
The functionality you are referring to is known as "partial indexes", a feature that is unfortunately not available and not planned for release yet (as of 2.6.0). Couchbase has a tracking ticket for it, so if you like you can comment that you want this feature, and that will be taken into account during prioritization.
You can still have the information in the same database, but you will have excess information in the index. If this causes an issue then you will need to separate them.

How to store different type and number of fields in one database table?

Hello everybody. I'm making a "bulletin board" like this one: http://stena.kg/ad/post. I'm using Laravel 5.0 and don't know how to store different fields in a database table. For example, if I choose the "Cars" category I should fill in Mark, Model, Fuel (and other fields for the cars category); if I choose the "Flats" category I should fill in fields like Area, Number of rooms, etc. How should I organize all of this? I tried some ideas but nothing helped me :(
You could save the data as JSON in the table: serialize the JSON to a string and save it in the database. But that will cause many problems in the future, so I don't recommend that solution. I recommend storing the data in separate tables, one for each category. To optimize the process, you can create a category table and a category_item table with fields like name, description and so on. Different categories demand specific fields, so the best solution is to create a table per category.
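A minimal sketch of the table-per-category layout, using Python's built-in sqlite3 module rather than Laravel migrations. All table and column names (ads, car_ads, flat_ads and their columns) and the sample values are assumptions made up for illustration; the shared fields live in one table and each category gets its own table for its specific fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Fields every ad has, whatever its category.
    CREATE TABLE ads (
        id INTEGER PRIMARY KEY,
        category TEXT NOT NULL,
        name TEXT NOT NULL,
        description TEXT
    );
    -- One table per category for the category-specific fields.
    CREATE TABLE car_ads (
        ad_id INTEGER PRIMARY KEY REFERENCES ads(id),
        mark TEXT, model TEXT, fuel TEXT
    );
    CREATE TABLE flat_ads (
        ad_id INTEGER PRIMARY KEY REFERENCES ads(id),
        area REAL, rooms INTEGER
    );
    """
)

# Insert a car ad: one row in the shared table, one in the category table.
cur = conn.execute(
    "INSERT INTO ads (category, name, description) VALUES (?, ?, ?)",
    ("cars", "Family sedan", "One owner"),
)
ad_id = cur.lastrowid
conn.execute(
    "INSERT INTO car_ads (ad_id, mark, model, fuel) VALUES (?, ?, ?, ?)",
    (ad_id, "Toyota", "Corolla", "petrol"),
)

# Read the ad back together with its car-specific fields.
row = conn.execute(
    "SELECT a.name, c.mark, c.model, c.fuel "
    "FROM ads a JOIN car_ads c ON c.ad_id = a.id WHERE a.id = ?",
    (ad_id,),
).fetchone()
```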

Is there anything wrong with creating Couch DB views with null values?

I've been doing a fair amount of work with Couch DB in my spare time recently and really enjoy using it. I find it to be much more flexible than using a relational database, but it's not without its disadvantages.
One big disadvantage is the lack of dynamic queries / view generation... So you have to do a fair amount of work in planning and justifying your views, as you can't put that logic into your application code as you might do with SQL.
For example, I wrote a login scheme based on a JSON document template that looked a little bit like this:
{
"_id": "blah",
"type": "user",
"name": "Bob",
"email": "bob@theaquarium.com",
"password": "blah"
}
To prevent the creation of duplicate accounts, I wrote a very basic view to generate a list of user names to lookup as keys:
emit(doc.name, null)
This seemed reasonably efficient to me. I think it's way better than dragging out an entire list of documents (or even just a reduced number of fields for each document). So I did exactly the same thing to generate a list of email addresses:
emit(doc.email, null)
Can you see where I'm going with this question?
In a relational database (with SQL) one would simply make two queries against the same table. Would this technique (of equating a view to the product of an SQL query) be in some way analogous?
Then there's the performance / efficiency issue... Should those two views really be just one? Or is the use of a Couch DB view with keys and no associated value an effective practice? Considering the example above, both of those views would have uses outside of a login scheme... If I ever need to generate a list of user names, I can retrieve them without an additional overhead.
What do you think?
First, you certainly can put the view logic into your application code - all you need is an appropriate build or deploy system that extracts the views from the application and adds them to a design document. What is missing is the ability to generate new queries on the fly.
Your emit(doc.field,null) approach certainly isn't surprising or unusual. In fact, it is the usual pattern for "find document by field" queries, where the document is extracted using include_docs=true. There is also no need to mix the two views into one; the only performance-related decision is whether the two views should be placed in the same design document: all views in a design document are updated when any of them is accessed.
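To illustrate the pattern, here is a small in-memory Python simulation (not CouchDB code) of a view that emits a key with a null value and of a lookup that resolves the document by id afterwards, which is roughly what include_docs=true does. The names DOCS, build_view and find_by_key, and the second sample user, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical documents standing in for a CouchDB database.
DOCS: Dict[str, dict] = {
    "u1": {"_id": "u1", "type": "user", "name": "Bob", "email": "bob@theaquarium.com"},
    "u2": {"_id": "u2", "type": "user", "name": "Alice", "email": "alice@example.com"},
}

ViewRow = Tuple[str, str, Optional[object]]  # (key, doc id, value)


def build_view(docs: Dict[str, dict], field: str) -> List[ViewRow]:
    """Mimic emit(doc[field], null): one row per document, sorted by key."""
    rows = [(doc[field], doc_id, None) for doc_id, doc in docs.items() if field in doc]
    return sorted(rows, key=lambda row: (row[0], row[1]))


def find_by_key(view: List[ViewRow], docs: Dict[str, dict], key: str) -> List[dict]:
    """Mimic a key lookup with include_docs: rows hold no value, docs are fetched by id."""
    return [docs[doc_id] for row_key, doc_id, _ in view if row_key == key]


by_email = build_view(DOCS, "email")
matches = find_by_key(by_email, DOCS, "bob@theaquarium.com")
```

The rows stay small because they carry only the key and the document id; the full document is read only for the rows that match.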
Of course, your approach does not actually guarantee that the e-mails are unique, even if your application tries really hard. Imagine the following circumstances with two client applications A and B:
A: queries view, determines that `test@email.com` does not exist.
B: queries view, determines that `test@email.com` does not exist.
A: creates account with `test@email.com`
B: creates account with `test@email.com`
This is a rare occurrence, but nonetheless possible. A better approach is to keep documents that use the email address as the key, because access to single documents is transactional (it's impossible to create two documents with the same key). Typical example:
{
_id: "test@email.com",
type: "email",
user: "000000001"
}
{
_id: "000000001",
type: "user",
email: "test@email.com",
firstname: "Test",
...
}
EDIT: a reservation pattern only works if two clients attempting to create an account for a given e-mail will reliably try to access the same document. If you randomly generate a new identifier, then client A will create and reserve document XXXX while client B will create and reserve document YYYY, and you will end up with two different documents that have the same e-mail.
Again, the only way to perform a transactional "check if it exists, create if it does not" operation is to have all clients alter a single document.
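The reservation pattern can be sketched with an in-memory stand-in for the database. The Python snippet below is a simulation, not CouchDB client code: the DocumentStore class, ConflictError and register are hypothetical names, and the store only imitates the one property that matters here, namely that creating a document whose _id already exists is rejected.

```python
from typing import Dict


class ConflictError(Exception):
    """Raised when a document with the same _id already exists."""


class DocumentStore:
    """Tiny stand-in for a database that rejects duplicate document ids."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def create(self, doc: dict) -> None:
        doc_id = doc["_id"]
        if doc_id in self._docs:
            raise ConflictError(doc_id)
        self._docs[doc_id] = doc


def register(store: DocumentStore, email: str, user_id: str) -> bool:
    """Reserve the e-mail first; create the user only if the reservation succeeds."""
    try:
        # Every client targets the same document id for a given e-mail.
        store.create({"_id": email, "type": "email", "user": user_id})
    except ConflictError:
        return False
    store.create({"_id": user_id, "type": "user", "email": email})
    return True


store = DocumentStore()
first = register(store, "test@email.com", "000000001")
second = register(store, "test@email.com", "000000002")
```

Here the first registration succeeds and the second is refused, because both attempts compete for the same document id instead of each checking a view and then writing a document under a different id.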
