Couchbase Lite - FTS and indexing - full-text-search

I have created a full-text search (FTS) index for one model, "Cars", but my database also contains another model, "Bikes".
I have the following structure:
{ "type": "Car", "description": "..."}, {"type": "Bike", "model": "..."}
I have created an index on the property "description".
When the index is built, I can see entries for Car documents, which is fine. But it also contains entries for Bike documents, with NULL values.
I have multiple Car and Bike documents, so multiple NULL-valued index entries are being created.
Is this by design? What approach should I take if I want to keep both Car and Bike models in the same database but only implement FTS for Cars? Couchbase Lite doesn't allow me to create conditional indexes, where I could specify the "type".

The functionality you are referring to is known as "partial indexes". Unfortunately, this feature is not available and is not yet planned for a release (as of 2.6.0). Couchbase has a tracking ticket for it here, so if you like you can comment that you want this feature, and that will be taken into account during prioritization.
You can still have the information in the same database, but you will have excess information in the index. If this causes an issue then you will need to separate them.

Related

Elasticsearch - Count of associations between indexes?

Coming from a relational database background, I want to know if there is a way to retrieve the number of unique associations between two indexes.
Basic Example (Using relational databases)
I have 3 tables: Person, Cars, Person-Cars
Person-Cars has two columns (person_id, car_id) and holds the number of associations (ownership) between people and cars.
On Elasticsearch, I have created an index for Person and for Cars.
Main Point
Every time I fetch a Car document, I want to know how many people own that car (in other words, how many associations it has to unique people).
--
To achieve that, would I need another index for Person-Cars, and then have to index all the association records? Is there a simpler way? What would be the best way to do this in ES?
I have looked into aggregations, but I think they can only be done on a single level (person or cars); I'm not sure.
Thanks!
On Elasticsearch, I have created an index for Person and for Cars.
Most of the time it makes sense to store data in a denormalized fashion in Elasticsearch, i.e. modelling one-to-many relationships as nested objects, as a parent-child relationship, or simply as multi-value fields.
What would be the best way to do this in ES?
It depends on your use case (parent-child, nested, or multi-value). Having a separate index for each type will definitely add overhead. The schema can only be modelled better once you share the other use cases and the types of queries you need.
Considering only the use case you shared, the car document below will solve it:
{
"id":1,
"brand":"Hyundai",
"owners":[21,31,51], // <===== Ids of owners. Ids & names both can be stored if required.
"owners_cnt": 3 // <==== OR You can simply maintain the counter as well.
}
Whenever a person buys or sells a car, the car document needs to be updated in this case. If cars are bought and sold frequently, and every purchase means updating both the car and the person document, this type of modelling makes less sense.
In that case it makes sense to keep the car ids within the person doc:
{
"id":1,
"name":"Raj",
"cars":[1,2,3]
}
In this case, we can use the query below to fetch the number of persons who bought the car with id=3:
GET person/_count
{
"query": {
"match": {
"cars": 3
}
}
}
Again better modelling can be achieved if more context is shared.
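As a rough illustration of the second model, here is a small Python sketch. It is not Elasticsearch client code: it keeps a few person documents in memory and computes the same number the _count query above is meant to return. The sample documents and the helper names `count_owners` and `count_query` are made up for this example.

```python
# Hypothetical sample data mirroring the person document shown above.
persons = [
    {"id": 1, "name": "Raj", "cars": [1, 2, 3]},
    {"id": 2, "name": "Asha", "cars": [3]},
    {"id": 3, "name": "Lee", "cars": [2]},
]


def count_owners(person_docs, car_id):
    """Count the person documents whose `cars` array contains car_id."""
    return sum(1 for doc in person_docs if car_id in doc.get("cars", []))


def count_query(car_id):
    """Build the request body for GET person/_count as a plain dict."""
    return {"query": {"match": {"cars": car_id}}}


print(count_owners(persons, 3))
print(count_query(3))
```

Because each person document is counted at most once, the result is the number of distinct owners, which is what the question asks for.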

How to model user-configurable schemas in ElasticSearch 6+ ?

I work at a small SaaS startup.
As a SaaS software, you create an account and invite others.
We have some entities, like "product", but you can configure a product's fields.
Let's say you work with cars, you can create fields like "Model", "Year", "Weight", etc.
Let's say you work with clothes, you can create fields like "Size", "Gender", etc.
We have this modeled in SQL, but I want to have a replica in Elasticsearch for general searches and especially for customizable reports.
To model this, I was considering the options:
One Index per Account Entity
When an account is created, I'd create an index named something like "product-" and it would have its own schema.
When the account's admin creates or changes a field, I'd need to use the Update By Query or Reindex API... I don't know.
When an account is deleted, I'd need to delete the indexes.
PRO: each index has a solid schema.
CONS: creating/deleting indexes dynamically sounds scary.
One Index per Entity.
This one seems ok too. I'd put "account_id" on every document, and filter on it every time.
Would have only one index per entity: "products", "users", "contacts", "sales" etc.
PRO: way simpler
CONS: each index has multiple schemas, one per account_id.
I'm not sure how to handle the relationships either... Can I create relationships dynamically?
One Index to rule them all.
One index "entities", with fields "account_id", "entity_type".
Maybe I need to do this to map the relationships properly. I'm quite confused. I did not fully understand the join field.
Anything that I'm missing?
Thanks for reading until here :)

OrientDB creates a Record ID for every record. Many examples include an "ID" field of their own for classes. Which to use?

Is this a recommended practice, or should I be trying to use the #rid as the primary key for my classes?
The sample JSON Import page, for example, uses this record definition:
{
"name": "Joe",
"id": 1, // <---- Surrogate key for this class
"friends": [2,4,5],
"enemies": [6]
}
This makes it easier, I think, to create Edges that will work without having to query for the #rid of a just-inserted object as a load is going on.
Is this the recommended best practice?
When OrientDB generates a record, it auto-assigns a unique unit identifier, called a Record ID, or RID. The syntax for the Record ID is the pound sign with the cluster identifier and the position. The format is like this:
#<cluster>:<position>
Records never lose their identifiers unless they are deleted. When deleted, OrientDB never recycles identifiers, except with local storage. Additionally, you can access records directly through their Record IDs. For this reason, you don't need to create a field to serve as the primary key, as you do in relational databases.
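To make the format concrete, here is a minimal Python sketch that splits a Record ID string into its two parts. It is not part of any OrientDB driver: the names `RecordId` and `parse_rid` are invented for this example, and it assumes both parts are non-negative integers.

```python
import re
from typing import NamedTuple


class RecordId(NamedTuple):
    cluster: int
    position: int


# Matches the "#<cluster>:<position>" form described above.
# Assumption: both parts are non-negative integers.
_RID_PATTERN = re.compile(r"#(\d+):(\d+)")


def parse_rid(text):
    """Split a Record ID string into its cluster and position parts."""
    match = _RID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a Record ID: {text!r}")
    return RecordId(int(match.group(1)), int(match.group(2)))


print(parse_rid("#12:0"))
```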
Hope it helps
Regards

How to implement a graph with customizable nodes with Core Data?

I am trying to create a Core Data application with an underlying directed acyclic graph structure. Thus every node has a "to-many" relationship to its child nodes, and a reverse "to-many" relationship to its parent nodes.
This is all pretty easy with Core Data so far. What I am having trouble with is allowing these nodes to be customizable.
To elaborate, I really like the API associated with Sublime Text and how much functionality users have. I want to allow users to define node types on this graph. A user might create a node called a "Movie Rating" defined using JSON like so:
{
"type": "Movie Rating",
"fields": [{
"name": "Movie",
"type": "str",
"required": "true",
"min-length": "1"
}, {
"name": "Rating",
"type": "int",
"required": "true",
"min": 1,
"max": 5
}]
}
There could be any number of attributes and many different types of nodes. If I wanted to implement this sort of flexible data structure with Core Data, how might I go about doing this?
(note that every node will have parent-child relationships along with any custom fields/attributes).
Thanks
Chet
You can't just modify the Core Data model on the fly. But to a degree, you can have an extensible model by
Creating one or more simple entity types that just store an NSString key and some kind of value object. You may want more than one depending on how many different types of values you want to support. A simple example would have an NSString key and value and a to-one relationship to a node entity.
Adding a to-many relationship from your node entity to this key/value entity.
That much allows each node to have as many key/value pairs as you want. Different nodes can have different combinations of pairs. They're not really different as far as Core Data goes but your app logic can treat them as effectively being different node types.
If you'll use more than one value type (and it looks like you might) you could create an abstract entity type called KeyValuePair with type-specific sub-entities like KeyValuePairString, KeyValuePairNumber, etc.
To use something like your JSON example to manage typing (so the nodes don't contain just any arbitrary key/value collection) you'd need to
Store the node contents information in an NSDictionary (possibly one that you create by parsing the JSON that you include above).
Use a convenience constructor for your nodes that takes this dictionary as an argument and creates both the node and the necessary key/value pair objects described in the dictionary.
Be very careful about allowing modifications to the node description. If you later decide to add a "year" field to the movie rating, what then? It's up to you, both to figure out what should happen and to write the code (if any) to make it happen. You won't be able to rely on Core Data model versioning.
Following your movie rating example, when creating a new instance you might end up with something like
One instance of Node whose attributes include the parent/child node relationship, a reference to the "Movie Rating" node definition (maybe just a name, maybe some kind of object reference) and a to-many relationship to the KeyValuePair entity mentioned above.
One related KeyValuePair where the key is "name" and the value is "Star Wars"
A second related KeyValuePair where the key is "rating" and the value is 5.
But note that you'll have to write any code that's necessary to ensure that you don't violate the node description-- for example by taking a node defined by your "movie rating" description and adding arbitrary other key/value pairs that don't make sense (e.g. adding a numeric "weight" k/v pair to the movie rating probably doesn't make sense, but Core Data isn't going to help you prevent that).
This all may be kind of awkward-- you're essentially implementing a higher-level data schema on top of Core Data. With some care it should be possible, though.
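The validation code mentioned above could look roughly like the following. This is a language-neutral sketch written in Python rather than Core Data code: the node definition is the "Movie Rating" JSON from the question (with "required" and "min-length" kept as strings, as posted), while `validate_node` and its messages are hypothetical. In a real app the same checks would run before the key/value pair objects are created.

```python
# The "Movie Rating" definition from the question, as a Python dict.
MOVIE_RATING = {
    "type": "Movie Rating",
    "fields": [
        {"name": "Movie", "type": "str", "required": "true", "min-length": "1"},
        {"name": "Rating", "type": "int", "required": "true", "min": 1, "max": 5},
    ],
}


def validate_node(definition, values):
    """Return a list of problems found when checking values against a definition."""
    problems = []
    fields = {field["name"]: field for field in definition["fields"]}

    # Reject keys that the definition does not declare.
    for key in values:
        if key not in fields:
            problems.append(f"unexpected field: {key}")

    for name, field in fields.items():
        if name not in values:
            if field.get("required") == "true":
                problems.append(f"missing required field: {name}")
            continue
        value = values[name]
        if field["type"] == "str":
            if not isinstance(value, str):
                problems.append(f"{name} must be a string")
            elif len(value) < int(field.get("min-length", 0)):
                problems.append(f"{name} is too short")
        elif field["type"] == "int":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                problems.append(f"{name} must be an integer")
            elif "min" in field and value < field["min"]:
                problems.append(f"{name} is below the minimum")
            elif "max" in field and value > field["max"]:
                problems.append(f"{name} is above the maximum")
    return problems


print(validate_node(MOVIE_RATING, {"Movie": "Star Wars", "Rating": 5, "weight": 80}))
```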
To change attributes or relationships you need to modify the managedObjectModel of your managed object context with the new/updated/deleted NSEntityDescription objects.
Managed object models are editable until they are used by an object graph manager (a managed object context or a persistent store coordinator). This allows you to create or modify them dynamically. However, once a model is being used, it must not be changed. This is enforced at runtime—when the object manager first fetches data using a model, the whole of that model becomes uneditable. Any attempt to mutate a model or any of its sub-objects after that point causes an exception to be thrown. If you need to modify a model that is in use, create a copy, modify the copy, and then migrate the objects from the old model.
The whole process is detailed in Core Data Model Versioning and Data Migration Programming Guide

Is there anything wrong with creating Couch DB views with null values?

I've been doing a fair amount of work with Couch DB in my spare time recently and really enjoy using it. I find it to be much more flexible than using a relational database, but it's not without its disadvantages.
One big disadvantage is the lack of dynamic queries / view generation... So you have to do a fair amount of work in planning and justifying your views, as you can't put that logic into your application code as you might do with SQL.
For example, I wrote a login scheme based on a JSON document template that looked a little bit like this:
{
"_id": "blah",
"type": "user",
"name": "Bob",
"email": "bob@theaquarium.com",
"password": "blah"
}
To prevent the creation of duplicate accounts, I wrote a very basic view to generate a list of user names to lookup as keys:
emit(doc.name, null)
This seemed reasonably efficient to me. I think it's way better than dragging out an entire list of documents (or even just a reduced number of fields for each document). So I did exactly the same thing to generate a list of email addresses:
emit(doc.email, null)
Can you see where I'm going with this question?
In a relational database (with SQL) one would simply make two queries against the same table. Would this technique (of equating a view to the product of an SQL query) be in some way analogous?
Then there's the performance / efficiency issue... Should those two views really be just one? Or is the use of a Couch DB view with keys and no associated value an effective practice? Considering the example above, both of those views would have uses outside of a login scheme... If I ever need to generate a list of user names, I can retrieve them without an additional overhead.
What do you think?
First, you certainly can put the view logic into your application code - all you need is an appropriate build or deploy system that extracts the views from the application and adds them to a design document. What is missing is the ability to generate new queries on the fly.
Your emit(doc.field,null) approach certainly isn't surprising or unusual. In fact, it is the usual pattern for "find document by field" queries, where the document is extracted using include_docs=true. There is also no need to mix the two views into one, the only performance-related decision is whether the two views should be placed in the same design document: all views in a design document are updated when any of them is accessed.
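For readers who want to see the "find document by field" pattern in isolation, here is an in-memory Python sketch. It only imitates what such a view does and makes no CouchDB calls: `build_view` and `query_view` are invented names, the sample documents are made up, and the sketch skips documents that lack the field, which is a simplification.

```python
# Hypothetical sample documents.
docs = [
    {"_id": "u1", "type": "user", "name": "Bob"},
    {"_id": "u2", "type": "user", "name": "Alice"},
    {"_id": "x1", "type": "other"},
]


def build_view(all_docs, field):
    """Imitate a map function that emits (doc[field], null) for each document."""
    rows = []
    for doc in all_docs:
        if field in doc:  # simplification: skip documents without the field
            rows.append({"key": doc[field], "value": None, "id": doc["_id"]})
    return sorted(rows, key=lambda row: row["key"])


def query_view(rows, all_docs, key, include_docs=False):
    """Return the rows for one key, optionally attaching the full document."""
    by_id = {doc["_id"]: doc for doc in all_docs}
    result = []
    for row in rows:
        if row["key"] == key:
            hit = dict(row)
            if include_docs:
                hit["doc"] = by_id[row["id"]]
            result.append(hit)
    return result


rows = build_view(docs, "name")
print(query_view(rows, docs, "Bob", include_docs=True))
```

The rows carry only the key and the document id, so the view stays small; the document body is attached only when it is asked for.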
Of course, your approach does not actually guarantee that the e-mails are unique, even if your application tries really hard. Imagine the following circumstances with two client applications A and B:
A: queries view, determines that `test@email.com` does not exist.
B: queries view, determines that `test@email.com` does not exist.
A: creates account with `test@email.com`
B: creates account with `test@email.com`
This is a rare occurrence, but nonetheless possible. A better approach is to keep documents that use the email address as the key, because access to single documents is transactional (it's impossible to create two documents with the same key). Typical example:
{
_id: "test@email.com",
type: "email",
user: "000000001"
}
{
_id: "000000001",
type: "user",
email: "test@email.com",
firstname: "Test",
...
}
EDIT: a reservation pattern only works if two clients attempting to create an account for a given e-mail will reliably try to access the same document. If you randomly generate a new identifier, then client A will create and reserve document XXXX while client B will create and reserve document YYYY, and you will end up with two different documents that have the same e-mail.
Again, the only way to perform a transactional "check if it exists, create if it does not" operation is to have all clients alter a single document.
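The reservation idea can be sketched in a few lines of Python. This is an in-memory stand-in, not CouchDB client code: `DocumentStore`, `ConflictError` and `register` are hypothetical names, and the store simply refuses to create a second document with an existing _id, which is the behaviour the pattern relies on.

```python
class ConflictError(Exception):
    """Raised when a document with the same _id already exists."""


class DocumentStore:
    """Minimal in-memory stand-in for a store with unique document ids."""

    def __init__(self):
        self._docs = {}

    def create(self, doc):
        doc_id = doc["_id"]
        if doc_id in self._docs:
            raise ConflictError(doc_id)
        self._docs[doc_id] = doc

    def get(self, doc_id):
        return self._docs.get(doc_id)


def register(store, email, user_id):
    """Reserve the e-mail first; create the user only if the reservation succeeds."""
    try:
        store.create({"_id": email, "type": "email", "user": user_id})
    except ConflictError:
        return False  # someone else already holds this e-mail
    store.create({"_id": user_id, "type": "user", "email": email})
    return True


store = DocumentStore()
print(register(store, "a@example.com", "000000001"))
print(register(store, "a@example.com", "000000002"))
```

Both clients try to create the same reservation document, so only one of them can succeed, regardless of what either saw when it queried the view earlier.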
