I started reading the documentation about Elasticsearch, and I read about _type metadata element, in Elasticsearch documentation:
Elasticsearch exposes a feature called types which allows you to logically partition data inside of an index. Documents in different types may have different fields, but it is best if they are highly similar.
So my question is: In which situations the best practice is to split documents into types? Because in the documentation, they wrote that the documents in different _types should have similar fields.
Let's say you create a new index "WWW" and the types of it would be "http" and "https". Both types have the same mapping and fields. It would be easier to search all the "http" documents like this:
GET /WWW/http/_search?pretty
and the https like this:
GET /WWW/https/_search?pretty
It also gives you a logical separation between your data.
There's a good blog post about type vs index: https://www.elastic.co/blog/index-vs-type
Having the same mappings and fields is a good starting point (since sparsity is an issue). Just be aware that types will be removed in the future, so don't structure your logic around it too heavily. But you will be able to do the same with an enum like field and a filter in your query.
Related
I was wondering for people who have used Elasticsearch at scale if there is a performance benefit while searching if I create an index mapping and then put documents in it compared to not creating a mapping and just directly putting documents in
It is usually preferable to create the explicit mapping for an index, where possible.
For a search case, this is crucial in order to index data with the analysis chains needed to service the search strategy.
For a log use case, it may not be possible to know what the explicit mapping should be for log records that will be ingested, as there may be dynamic fields in the data that is not known ahead of time. Dynamic templates can help here, as can adopting a unified logging structure like Elastic Common Schema (ECS), either converting data to ECS format whilst logging, or converting whilst ingesting into Elasticsearch with ingest pipelines
Yes it is always better to use explicit mapping before putting the documents rather than depending on the dynamic mapping. If at all you are dependent on the dynamic mapping you may not be able to visualize on few data types like text. And also when you maintain mapping your index will always have the same kind of data. Please refer to this blog:
[https://qbox.io/blog/maximize-guide-elasticsearch-indexing-performance-part-1/][1]
Mostly what I do is to assemble the mapping by hand. Choosing the correct types myself.
Is there any tool which facilitates this?
For example which will read a class (c#,java..etc) and choosing the closest ES types accordingly.
I've never seen such a tool, however I know that ElasticSearch has a REST API over HTTP.
So you can create a simple HTTP query with JSON body that will depict your object with your fields: field names + types (Strings, numbers, booleans) - pretty much like a Java/C# class that you've described in the question.
Then you can ask the ES to store the data in the non-existing index (to "index" your document in ES terms). It will index the document, but it will also create an index, and the most importantly for your question, will create a mapping for you "dynamically", so that later you will be able to query the mapping structure (again via REST).
Here is the link to the relevant chapter about dynamically created mappings in the ES documentation
And Here you can find the API for querying the mapping structure
At the end of the day you'd still want to retain some control over how your mapping is generated. I'd recommend:
syncing some sample documents w/o a mapping
investigating what mapping was auto generated and
dropping the index & using dynamic_templates to pseudo-auto-generate / update the mapping as new documents come in.
This GUI could help too.
Currently, there is no such tool available to generate the mapping for elastic.
It is a kind of similar thing as we have to design a database in MySQL.
But if we want such kind of thing then we use Mongo DB which requires no predefined schema.
But Elastic comes with its very dynamic feature, which allows us to play around it. One of the most important features of Elasticsearch is that it tries to get out of your way and let you start exploring your data as quickly as possible like the mongo schema which can be manipulated dynamically.
To index a document, you don’t need to first define a mapping or schema and define your fields along with their data type .
You can just index a document and the index, type, and fields will be created automatically.
For further details you can go through the below documentation:
Elastic Dynamic Mapping
I've been looking all over the place for a good answer to my question but I just can't find any...
I'm using ElasticSearch along with Laravel. I've used ElasticSearch on another project but never used suggestions. I'm following this tutorial as I think it provides a great starting point for using Laravel with ElasticSearch: https://blog.madewithlove.be/post/how-to-integrate-your-laravel-app-with-elasticsearch/
My question is about suggestions; I want my search to be a search-as-you-type just like the one you would find on Spotify. I want my users to type a few letters in the search box and have the results be organized into multiple categories: blogs, authors, tags.
If I index my data into one index, with authors and tags being blog's nested objects, I can easily get suggestions using the completion suggester for blog names, but not for nested objects. I could also split each model and index data separately into different indexes, but that would mean I would have to make 3 queries to get my results back.
Am I doing something wrong? Should I structure my data differently? Is making 3 queries the way to go or is there a way to have a single query output search results from different indexes?
Thanks!
Xavier
Something that I did when I built a search-as-you-type was I used a separate index for suggestions. In your situation, you'd index the name (title, author, whatever) in one field and the type in another. Then you could search on one field and display the grouped results.
The advantage here is speed. This will likely be a heck of a lot faster than trying to do a suggester on your nested data. (Which you can probably do, but I'm not sure how.) And speed is pretty important for this type of feature.
We're in the process of setting up Amazon Elasticsearch Service (running Elasticsearch version 2.3).
We have different types of data (that I'm currently thinking of as different document types within the same index).
We have a generic search in an app where we want an inline autocomplete function, that is, a completion suggester returning hits from all different data (document) types. How can that be set up?
When querying suggesters you have to specify an index, so that's why I wanted to keep all the data in the same index. According to the documentation, the completion suggester considers all documents in the index.
Setting up the completion suggester for the first document type was pretty straight forward and is working great. However, as far as I can see you to specify a suggest field when querying. That would be all good hadn't it been for the error message we get when setting up the mapping for the second document type:
Type: illegal_argument_exception Reason: "[suggest] is defined as an object in mapping [name_of_document_type] but this name is already used for a field in other types"
Writing this question I see that it's possible to specify more than one suggester in a single suggest query. Maybe that is what we have to solve it? (I.e. get X results from Y suggesters where we compare the scores to get the 1 suggestion we want to present to the user.)
One of the core principles of good data design for Elasticsearch (as with many data stores) is to optimise your data storage for ease of reading. Usually, this means embracing duplication.
With this in mind, I'd suggest having a separate autocomplete index with a mapping that's designed specifically for the suggester queries.
Whenever you insert or write one of your other documents, map it to your autocomplete type and add or update it in your autocomplete index at the same time (or, depending on how up-to-date it needs to be, create an offline process to update your autocomplete index e.g. every day).
Then, when you do your suggest query, you can just use your autocomplete index and not worry about dealing with different types of documents with different fields.
I'm looking into elastic search right now and I am having a hard time grasping how index types fit into the data model, I've read examples and documentation but none really goes in depth or the examples seem to use a data model that is composed of several submodels.
I am currently using mongodb to store my data, let's take this example of an Article collection that I want to be indexed for search, my doc looks like this:
Article = {
title: String,
publisher: String,
subject: String,
description: String,
year: Integer,
}
Now I want each of those fields to be searchable, so I would make an elasticsearch index of 'Article'. I will need to define each field and how it should be analysed and whether it is stored or not, that I understand.
Now how does an index type come in here? As far as I am aware, Lucene does not have this concept, this is a layer added by Elasticsearch.
For example, maybe some of you may say that we can logically group the documents by subject or publisher and create index types on those but how is this different from searching by subject or publisher?
Is it more of a performance related aspect that we have index types?
Not a very easy question to answer, but I am going to give it a try. But be warned this just my opinion.
First of all, if you do not want to keep certain documents together in an index, just because it feels they should, create separate indices. There is not really a penalty for using more indices over more types. The only thing I can think of is that you could create analysers and mappings that you can reuse over the different types.
You can use types if you feel documents belong together, they have similar structure but not necessary the same structure. Be warned though, do not create separate mappings for fields with the same name in different types within the same index. Lucene does not like this.
Than there is the final scenario, in parent-child relationships, here you need types. This way the parent and it's children can be places in the same shard which is better for performance.
Hope that helps a bit.
If I'm not mistaken, the catch with using more than one data type in one index is almost identical to using different indices. Say, you can store (as I did) documents of types "simple_address", "delivery_address", "some_strange_but_official_address_info" in the same index "address" to make your code a bit more sane. But if you don't use parent-child links, it's equivalent to just having three indices.
Speaking of your example, you should wrap your head around what would you like to search. If, for instance, you add comments in equation, it's better to use some kind of separation - either as parent-child or different indices with manual mapping by keys. And, obviously, you should have different mappings for "Article" and "Comment" types.