MongoDB index: object keys vs array of strings - performance

I'm new to MongoDB and have been researching schema design and indexing. I know you can index a property regardless of its value (ID, array, subdocument, etc.), but what I don't know is whether there is a performance difference between indexing an array of strings and indexing a nested object's keys.
Here's an example of both scenarios that I'm contemplating (in Mongoose):
// schema
mongoose.Schema({
  visibility: {
    usa: Boolean,
    europe: Boolean,
    other: Boolean
  }
});
// query
Model.find({"visibility.usa": true});
OR
// schema
mongoose.Schema({
  visibility: [String] // strings could be "usa", "europe", and/or "other"
});
// query
Model.find({visibility: "usa"});
Documents could have one, two, or all three visibility options.
Furthermore, if I went with the Boolean object design, could I simply index the visibility field, or would I need to put an index on usa, europe, and other?

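In MongoDB, creating an index on an array of strings results in a multikey index: each string in the array becomes an index key, and all of those keys point to the same document. So for your query it would work much the same as indexing the nested object's keys.
If you go with the Boolean design, note that an index on the visibility field itself only supports equality matches on the whole embedded document. To support queries such as {"visibility.usa": true}, you would index the individual paths (visibility.usa, visibility.europe, visibility.other). You can read further in the MongoDB documentation on multikey indexes.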
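To make the multikey idea concrete, here is a toy model in plain JavaScript. It is only a sketch of the concept (one index key per array element, each pointing back at the document), not how MongoDB actually stores its index entries. The sample documents and the buildMultikeyIndex helper are made up for illustration.

```javascript
// Toy model of a multikey index: every array element becomes its own
// index key, and each key points at the ids of the documents containing it.
// Illustration only; MongoDB's real index is a B-tree, not a Map.
const docs = [
  { _id: 1, visibility: ["usa", "europe"] },
  { _id: 2, visibility: ["europe"] },
  { _id: 3, visibility: ["usa", "europe", "other"] },
];

// Hypothetical helper: builds key -> [document ids] for an array field.
function buildMultikeyIndex(documents, field) {
  const index = new Map();
  for (const doc of documents) {
    for (const value of doc[field]) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const visibilityIndex = buildMultikeyIndex(docs, "visibility");

// Similar in spirit to Model.find({ visibility: "usa" }):
// a single key lookup yields every matching document id.
const usaIds = visibilityIndex.get("usa") ?? [];
console.log(usaIds);
```

A document with three visibility values contributes three index keys, which is why the array design and the per-path Boolean design end up doing comparable work at query time.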

Related

Sorting Results efficiently by non mapped/searchable Object properties

In my current index, called items, I previously sorted the results of my queries by some values of a specific object property; let's call this property oprop. After I realized that each new key in oprop increased the total number of fields used on the items index, I had to change the mapping on the index. So I set oprop's mapping to dynamic: false, which means the keys of oprop are no longer searchable (not indexed).
Now, in some of my queries, I need to sort the results on the items index by the values of oprop keys. I don't know whether Elasticsearch can still let me sort on these key values.
Do I need to use scripts for sorting? Do I have access to non-indexed data when using scripts?
Somehow I don't think that this is a good approach, and I think that in the long term I will run into performance issues.
You could use scripts for sorting, since the data will still be stored in the _source field, but that should probably be a last resort. If you know which fields need to be sortable, you could just add those to the mapping explicitly and keep oprop as a non-dynamic field otherwise. Note that the sortable keys should be mapped with a type that supports sorting, such as keyword; text fields cannot be sorted on by default.
"properties": {
  "oprop": {
    "dynamic": false,
    "properties": {
      "sortable_key_1": {
        "type": "keyword"
      },
      "sortable_key_2": {
        "type": "keyword"
      }
    }
  }
}
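Once a key is explicitly mapped, sorting on it is an ordinary field sort using the dotted path. The snippet below only builds the search request body as a JavaScript object. The buildSortedSearch helper is hypothetical, and it assumes the key is mapped with a sortable type such as keyword.

```javascript
// Hypothetical helper: builds a search body that sorts on one of the
// explicitly mapped oprop keys. It assumes the key has a sortable mapping
// (e.g. keyword); unmapped keys under dynamic: false cannot be sorted this way.
function buildSortedSearch(sortKey, order = "asc") {
  return {
    query: { match_all: {} },
    sort: [{ [`oprop.${sortKey}`]: { order } }],
  };
}

const body = buildSortedSearch("sortable_key_1", "desc");
console.log(JSON.stringify(body, null, 2));
```

The resulting object is what you would send as the body of a _search request against the items index.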

How are keyword and numeric data types stored in Elasticsearch? Are they stored in the inverted index?

PUT sana/_mapping/learn
{
  "properties": {
    "name": { "type": "text" },
    "age": { "type": "integer" }
  }
}
POST sana/learn
{
  "name": "rosy",
  "age": 23
}
Quoting the Elasticsearch doc:
Most fields are indexed by default, which makes them searchable. The
inverted index allows queries to look up the search term in unique
sorted list of terms, and from that immediately have access to the
list of documents that contain the term.
Keyword fields are indexed in the inverted index, and numeric fields are indexed as well (from Elasticsearch 5.0 onward, numeric fields use a BKD tree rather than the inverted index), so both are searchable. If you want, you can disable this by setting index to false for the field in your mapping. In addition, doc_values are enabled by default on keyword and numeric fields to support sorting, aggregations, etc., but they are not available on analyzed string (text) fields.
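As a rough mental model, searching goes from term to documents, while sorting and aggregating go from document to value. The toy JavaScript below sketches both structures. It is a simplification for illustration, not Lucene's actual on-disk format, and the sample documents beyond "rosy" are made up.

```javascript
// Toy data: the first document mirrors the example above; the others are invented.
const documents = [
  { id: 0, name: "rosy", age: 23 },
  { id: 1, name: "anna", age: 31 },
  { id: 2, name: "rosy", age: 19 },
];

// Inverted index: term -> list of document ids (used for searching).
function buildInvertedIndex(docs, field) {
  const postings = new Map();
  for (const doc of docs) {
    const term = String(doc[field]);
    if (!postings.has(term)) postings.set(term, []);
    postings.get(term).push(doc.id);
  }
  return postings;
}

// Doc values: document id -> value, stored as a column (used for sorting/aggregations).
function buildDocValues(docs, field) {
  const column = [];
  for (const doc of docs) column[doc.id] = doc[field];
  return column;
}

const nameIndex = buildInvertedIndex(documents, "name");
const ageColumn = buildDocValues(documents, "age");

// Search: look up the term, get the matching document ids.
const matches = nameIndex.get("rosy") ?? [];
// Sort: for each matching id, read its value from the column.
const sortedByAge = [...matches].sort((a, b) => ageColumn[a] - ageColumn[b]);
console.log(matches, sortedByAge);
```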
Hope I answered your question and let me know if you have any doubt.

Elasticsearch type vs index with different schema

https://www.elastic.co/blog/index-vs-type
What is a type?
Fields need to be consistent across types. For instance if two fields have the same name in different types of the same index, they need to be of the same field type (string, date, etc.) and have the same configuration.
And "Which one should I use"
Do your documents have similar mappings? If no, use different indices.
But still, in https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html it mentions that we need to place parent child documents in same index. But how often does parent-child have similar mappings?
And
What I'm trying to investigate is whether I should put my two different documents in the same index, so I can do parent-child search.
But what I do understand from the index vs type is that documents with different schemas should actually be placed in different indices, but how would then parent-child relationship work?
By "similar mapping", they simply mean that fields with the same name but in two different mapping types must have the same data type.
It is perfectly ok to have different parent and child mapping types, the only constraint is that those two mappings must not contain fields with the same name but different data types.
For instance, if the parent mapping has a field named age of type integer, the child mapping type cannot have a field named age of type long or string.
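If you want to check two mappings for this kind of clash before putting them in the same index, a small script can do it. The sketch below is only an illustration: findTypeConflicts is a hypothetical helper, the parent and child mappings are made-up examples, and it only compares top-level field types (not analyzers or other settings, which also have to agree).

```javascript
// Hypothetical helper: reports fields that exist in both mappings
// but are declared with different types.
function findTypeConflicts(mappingA, mappingB) {
  const conflicts = [];
  for (const [field, def] of Object.entries(mappingA.properties)) {
    const shared = Object.prototype.hasOwnProperty.call(mappingB.properties, field);
    if (shared && mappingB.properties[field].type !== def.type) {
      conflicts.push({ field, types: [def.type, mappingB.properties[field].type] });
    }
  }
  return conflicts;
}

// Made-up example mappings: "age" clashes, the other fields do not.
const parentMapping = {
  properties: { age: { type: "integer" }, name: { type: "string" } },
};
const childMapping = {
  properties: { age: { type: "long" }, school: { type: "string" } },
};

const conflicts = findTypeConflicts(parentMapping, childMapping);
console.log(conflicts);
```

An empty result means the two mapping types can live in the same index as far as field types are concerned.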
The problem arises when a field with the same name has different data types in different mapping types of the same index. To handle this, you can prefix the field names. For example:
Suppose both the user type and the company type have a score field, and one of them is an integer while the other is a string. You should give these score fields different names.
POST main_index/user/_mapping
{
  "properties": {
    ....
    "user_score": {
      "type": "integer"
    }
  }
}
POST main_index/company/_mapping
{
  "properties": {
    ....
    "company_score": {
      "type": "string"
    }
  }
}
You need to do the same for all fields that share a name.

ElasticSearch: Metric aggregation and doc values / field-data

How does ES internally implement metric aggregations?
Suppose documents in the index have below structure:
{
category: A,
measure: 20
}
For the query below, which runs a terms aggregation on category and calculates sum(measure), would the 'measure' field values
- be extracted from the document (i.e. _source) and summed, or
- be taken from the doc values / field data of the 'measure' field?
Query:
{
  size: 0,
  aggs: {
    cat_aggs: {
      terms: {
        field: 'category'
      },
      aggs: {
        sumAgg: {
          sum: {field: 'measure'}
        }
      }
    }
  }
}
From the official documentation on metrics aggregations (emphasis added):
The aggregations in this family compute metrics based on values extracted in one way or another from the documents that are being aggregated. The values are typically extracted from the fields of the document (using the field data), but can also be generated using scripts.
If you're using a newer ES 2.x version, then doc_values have become the norm over field data.
All fields which support doc values have them enabled by default. If you are sure that you don’t need to sort or aggregate on a field, or access the field value from a script, you can disable doc values in order to save disk space
So to answer your question clearly: metrics aggregations are computed from either field data or doc values that were stored at indexing time, i.e. they are not computed by parsing the source at query time, unless you're doing it from a script that accesses the _source directly.
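To picture what "computed from doc values" means for the query in the question, here is a toy version in JavaScript. The columns stand in for the doc values of category and measure, and the function walks the matching document ids and accumulates per-bucket sums without ever touching a _source document. The data and the termsWithSum helper are invented for illustration; this is not Elasticsearch's actual implementation.

```javascript
// Columnar "doc values": position i holds the value for document i.
// Sample values are made up; the first document mirrors the question's example.
const categoryColumn = ["A", "B", "A", "B", "A"];
const measureColumn = [20, 5, 10, 7, 1];

// Hypothetical helper: a terms aggregation with a nested sum,
// reading only from the two columns.
function termsWithSum(matchingDocIds, bucketColumn, metricColumn) {
  const buckets = new Map();
  for (const docId of matchingDocIds) {
    const key = bucketColumn[docId];
    const bucket = buckets.get(key) ?? { doc_count: 0, sum: 0 };
    bucket.doc_count += 1;
    bucket.sum += metricColumn[docId];
    buckets.set(key, bucket);
  }
  return buckets;
}

// With no query filter, every document matches.
const allDocIds = [0, 1, 2, 3, 4];
const result = termsWithSum(allDocIds, categoryColumn, measureColumn);
console.log(Object.fromEntries(result));
```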

Indexes for mongodb

I have a mongo db collection for restaurants.
e.g.
{_id: uniquemongoid,
rank: 3,
city: 'Berlin'
}
Restaurants are listed by city and ordered by rank (an integer) - should I create an index on city and rank, or city/rank compound? (I query by city and sort by rank)
Furthermore, there are several Boolean fields, e.g. { hasParking:true, familyFriendly:true }. Should I create indexes to speed up queries for these filters? Compound indexes? It's not clear to me whether I should create compound indexes, as a query can have one Boolean set or several.
The best way to figure out whether you need indexes is to benchmark it with "explain()".
As for your suggested indexes:
You will need the city/rank compound index. Indexes in MongoDB can only be used for left-to-right (at the moment) and hence doing an equality search on "city" and then sorting the result by "rank" will mean that the { city: 1, rank: -1 } index would work best.
Indexes on boolean fields are often not very useful, as on average MongoDB will still need to access half of your documents. After doing a selection by city (and hopefully a limit!) doing an extra filter for hasParking etc will not make MongoDB use both the city/rank and the hasParking index. MongoDB can only use one index per query.
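Here is a toy illustration of why the { city: 1, rank: -1 } compound index fits an equality match on city followed by a sort on rank: the entries for one city sit next to each other, already in rank order, so the query is a short contiguous scan with no separate sort step. This is a plain JavaScript sketch with invented data and a hypothetical helper, not MongoDB's B-tree; a real index would locate the starting entry with a tree search rather than the linear findIndex used here for brevity.

```javascript
// Made-up sample documents in the shape from the question.
const restaurants = [
  { _id: 1, city: "Berlin", rank: 3 },
  { _id: 2, city: "Paris", rank: 9 },
  { _id: 3, city: "Berlin", rank: 8 },
  { _id: 4, city: "Amsterdam", rank: 5 },
  { _id: 5, city: "Berlin", rank: 1 },
];

// Index entries ordered like { city: 1, rank: -1 }:
// city ascending, then rank descending within each city.
const indexEntries = restaurants
  .map(({ _id, city, rank }) => ({ city, rank, _id }))
  .sort((a, b) => (a.city < b.city ? -1 : a.city > b.city ? 1 : b.rank - a.rank));

// Hypothetical helper: equality on city, results already sorted by rank,
// stopping as soon as the limit is reached or the city changes.
function findByCitySortedByRank(entries, city, limit) {
  const start = entries.findIndex((entry) => entry.city === city);
  if (start === -1) return [];
  const ids = [];
  for (let i = start; i < entries.length && entries[i].city === city && ids.length < limit; i++) {
    ids.push(entries[i]._id);
  }
  return ids;
}

const topBerlin = findByCitySortedByRank(indexEntries, "Berlin", 2);
console.log(topBerlin);
```

With an index on { rank: -1, city: 1 } instead, the Berlin entries would be scattered across the whole index, which is the left-to-right limitation in practice.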
1) Create a compound index { city: 1, rank: 1 }, which will serve your purpose.
This way you avoid needing two separate indexes.
2) Create documents in the following format, and you can query for any number of fields you want.
{
  info: [{hasParking:true}, {familyFriendly:true}],
  _id:
  rank:
  city:
}
// createIndex replaces the deprecated ensureIndex alias
db.restaurants.createIndex({info : 1});
db.restaurants.find({ info :{ hasParking:true}})
Note that MongoDB doesn't use two indexes for the same query (except for $or queries). So if you want to add the filter from option (2) on top of the query from option (1), option (2) won't work. I am not sure about your second requirement, so I'm posting this solution anyway.

Resources