What data structure is used for the Elasticsearch flattened type? - elasticsearch

I was trying to find out how the flattened type in Elasticsearch works under the hood. The documentation specifies that all leaf values are indexed into a single field as keywords, so there will be a dedicated index for all those flattened keywords.
From the documentation:
By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees.
The specific case that I am trying to understand:
If I have a flattened field and index an object with nested objects, I can still query a specific nested key in the flattened object. For example, this queries by labels.release:
PUT bug_reports
{
  "mappings": {
    "properties": {
      "labels": {
        "type": "flattened"
      }
    }
  }
}

POST bug_reports/_doc/1
{
  "labels": {
    "priority": "urgent",
    "release": ["v1.2.5", "v1.3.0"]
  }
}

POST bug_reports/_search
{
  "query": {
    "term": {"labels.release": "v1.3.0"}
  }
}
Would a flattened field have the same index structure as a keyword field, and how is it able to reference a specific child key of the flattened object?

The initial design and implementation of the flattened field type is described in this issue. The leaf keys are indexed along with the leaf values, which is what allows searching on a specific sub-field.
There are some ongoing improvements to the flattened field type and Elastic would also like to support numeric values, but that's not yet released.
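To make the "keys indexed along with values" idea concrete, here is a rough Python sketch. It is a toy model, not Elasticsearch's code: the separator character, the two term sets, and the function names (flatten_leaves, index_terms, matches_key) are all assumptions made for illustration. Key paths are relative to the flattened field, so labels.release corresponds to the key "release".

```python
# Toy model of a flattened field: one set of bare values (keyword-like terms)
# and one set of "key + separator + value" terms for key-specific lookups.
# The separator and the layout are assumptions for illustration only.

SEPARATOR = "\0"  # assumed separator between the key path and the value


def flatten_leaves(obj, prefix=""):
    """Yield (dotted_key_path, leaf_value_as_string) for every leaf in obj."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten_leaves(value, path)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    yield from flatten_leaves(item, path)
                else:
                    yield path, str(item)
        else:
            yield path, str(value)


def index_terms(obj):
    """Return (root_terms, keyed_terms) for one flattened object."""
    root_terms = set()
    keyed_terms = set()
    for path, value in flatten_leaves(obj):
        root_terms.add(value)                       # searchable without a key
        keyed_terms.add(path + SEPARATOR + value)   # searchable by key + value
    return root_terms, keyed_terms


def matches_key(keyed_terms, key, value):
    """Exact-match lookup for a specific key, like a term query on a sub-key."""
    return key + SEPARATOR + value in keyed_terms


labels = {"priority": "urgent", "release": ["v1.2.5", "v1.3.0"]}
root_terms, keyed_terms = index_terms(labels)

print(matches_key(keyed_terms, "release", "v1.3.0"))
print(matches_key(keyed_terms, "priority", "v1.3.0"))
```

In this model a query on the whole field only needs the bare values, while a query on a sub-key becomes an exact lookup of the combined key-and-value term.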

Related

Is there a way to define attribute type as Keyword in ElasticSearch Array data type?

I am indexing a large data set that has multiple names for each entity. I have defined the name field as an array and I am adding around 4 names to it. Some of the names contain spaces and are getting tokenized. Can I avoid that?
I know that for strings Elasticsearch has both the text and keyword types, but how do I define the type as keyword when my data type is an array? By default all the array fields are treated as text. I want them to be treated as keyword so they don't get tokenized while indexing.
Expected: if I store "Hello World" in an array, I should be able to search for "Hello World".
Current behavior: it tokenizes the value and stores hello and world as separate terms.
There is no array data type in Elasticsearch. Whenever you send an array as the value of a property of type x, that property becomes an array that accepts only values of type x.
For example, suppose you create a property like this:
{
  "tagIds": {
    "type": "integer"
  }
}
And you index a document with values like this:
{
  "tagIds": [124, 452, 234]
}
Then tagIds automatically becomes an array of integers.
In your case, all you need to do is create a field, say name, with type keyword. Make sure you always pass an array to this field, even if it holds a single value, so that it is always an array. Below is what you need:
Mapping:
PUT test
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "keyword"
        }
      }
    }
  }
}
Indexing document:
PUT test/_doc/1
{
  "name": ["name one"]
}
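The difference between the two string types can be illustrated with a small Python sketch. It is a deliberately crude model: analyze_as_text only lowercases and splits on whitespace, which is a stand-in for a real text analyzer rather than a faithful copy of one, and all function names are made up for this example.

```python
# Crude model of how an array of strings becomes index terms under a
# text-like mapping versus a keyword-like mapping.

def analyze_as_text(value):
    # Stand-in for a text analyzer: lowercase, then split on whitespace.
    return value.lower().split()


def analyze_as_keyword(value):
    # A keyword field keeps the whole value as a single term.
    return [value]


def build_terms(values, analyzer):
    """Collect the terms produced for every element of an array field."""
    terms = set()
    for value in values:
        terms.update(analyzer(value))
    return terms


names = ["Hello World", "name one"]
text_terms = build_terms(names, analyze_as_text)
keyword_terms = build_terms(names, analyze_as_keyword)

# An exact lookup for "Hello World" only succeeds against the keyword terms.
print("Hello World" in keyword_terms)
print("Hello World" in text_terms)
```

Each array element is analyzed on its own, so with a keyword mapping every name in the array remains one searchable term.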

Setting doc_values for _id field in elasticSearch

I want to set doc_values for the _id field in Elasticsearch because I want to sort on _id.
Calling the API below to update the mapping gives me an error:
PUT my_index/my_type/_mapping
{
  "properties": {
    "_id": {
      "type": "keyword",
      "doc_values": true
    }
  }
}
reason : Mapping definition for [_id] has unsupported parameters: [doc_value : true]
The parameter is “doc_values”; you are using an incorrect parameter name. https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html
Elastic discourages sorting on the _id field. See this
The value of the _id field is also accessible in aggregations or for sorting, but doing so is discouraged as it requires to load a lot of data in memory. In case sorting or aggregating on the _id field is required, it is advised to duplicate the content of the _id field in another field that has doc_values enabled.
EDIT
Create a scripted field for your index pattern with a name, e.g. id, of type string and the script doc['_id'].value. See this link for more information on scripted fields. This will create a new field id that carries the _id field's value for every document in the indices matching your index pattern. You can then sort on the id field.
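The documentation's advice to duplicate _id into another field can be sketched as plain request bodies built in Python. The field name doc_id and the helper names are hypothetical; the sketch relies only on the fact that keyword fields have doc_values enabled by default, and it leaves out the HTTP calls themselves.

```python
import json


def mapping_with_id_copy(field_name="doc_id"):
    """Mapping body for a keyword field that holds a copy of _id.

    keyword fields have doc_values enabled by default, so the copy is
    suitable for sorting and aggregations.
    """
    return {"properties": {field_name: {"type": "keyword"}}}


def with_id_copy(doc_id, source, field_name="doc_id"):
    """Return the document body with the id duplicated into field_name."""
    body = dict(source)        # shallow copy, the caller's dict is untouched
    body[field_name] = doc_id
    return body


def sort_by_id_copy(field_name="doc_id", order="asc"):
    """Search body that sorts on the duplicated id instead of _id."""
    return {"query": {"match_all": {}}, "sort": [{field_name: {"order": order}}]}


print(json.dumps(with_id_copy("1", {"title": "example"}), indent=2))
print(json.dumps(sort_by_id_copy(), indent=2))
```

The copy has to be written by whatever code indexes the documents, since it knows the id at that point.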

Is it possible to retrieve an object in an array that matches my query using elasticsearch?

Given a document like this:
{
  "id": "12345",
  "elements": [
    {
      "type": "configure",
      "time": 3000
    }
  ]
}
Is it possible to query for documents with an object in the elements array that has a type of configure, and then also retrieve that specific object in the array so that I can get the time value associated with that element (in this case 3000)?
You can use nested inner_hits to retrieve details of the nested objects that match for a nested query. Note that elements will need to be mapped as a nested datatype field.
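As a sketch of what that could look like, the mapping and search bodies are written below as Python dictionaries. The sub-field types (keyword for type, integer for time) and the helper name elements_of_type are assumptions; only elements, type, and time come from the question.

```python
import json

# Mapping body: elements must be nested so each array object is matched
# as its own unit. The sub-field types are assumed for this example.
elements_mapping = {
    "properties": {
        "elements": {
            "type": "nested",
            "properties": {
                "type": {"type": "keyword"},
                "time": {"type": "integer"},
            },
        }
    }
}


def elements_of_type(element_type):
    """Search body: match documents by element type and return the matching elements."""
    return {
        "query": {
            "nested": {
                "path": "elements",
                "query": {"term": {"elements.type": element_type}},
                # An empty inner_hits object asks for the matching nested
                # objects to be returned alongside each hit.
                "inner_hits": {},
            }
        }
    }


print(json.dumps(elements_of_type("configure"), indent=2))
```

Each hit in the response then carries an inner_hits section with the matching elements, which is where the time value can be read.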

Can I add a field automatically to an elastic search index when the data is being indexed?

I have 2 loggers from 2 different clusters logging into my elasticsearch. logger1 uses indices mydata-cluster1-YYYY.MM.DD and logger2 uses indices mydata-cluster2-YYYY.MM.DD.
I have no way of touching the loggers, so I would like to add a field on the ES side when the data is indexed to show which cluster the data belongs to. Can I use mappings to do this?
Thanks
What if you use the PUT mapping API, in order to add a field to your index:
PUT mydata-cluster1-YYYY.MM.DD/_mapping/mappingtype <-- change the mapping type according to yours
{
  "properties": {
    "your_field": {
      "type": "text" <--- type of the field
    }
  }
}
This SO could come in handy. Hope it helps!

Change _type of a document in elasticsearch

I have two types in my Elasticsearch index. Both have the same mapping. I am using one for active documents and the other for archived ones.
Now I want to archive a document, i.e. change its _type from active to archived. Both are in the same index, so I cannot reindex them either.
Is there a way to do this in Elasticsearch 5.0?
Changing the type is tricky. You would have to remove the document and then index it again with the new type.
Why not have a field in your document indicating "activeness"? Then you can use a bool query to filter by what you want:
{
  "query": {
    "bool": {
      "filter": [{"term": {"status": "active"}}],
      "must": { /* your query object here */ }
    }
  }
}
Agree with having a field which indicates the activeness of the document.
(Or)
Use two different indices for "active" and "inactive" types.
Use aliases which map to these indices.
Aliases will give you flexibility to change your indices without downtimes.
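A minimal sketch of the alias setup, written as the request body for the _aliases endpoint and built in Python. The index and alias names are hypothetical, and sending the request is left out.

```python
import json


def archive_alias_actions(active_index, archived_index):
    """Body for POST /_aliases: one alias per index plus one spanning both.

    All alias names here are made up for the example.
    """
    return {
        "actions": [
            {"add": {"index": active_index, "alias": "documents-active"}},
            {"add": {"index": archived_index, "alias": "documents-archived"}},
            # A shared alias lets searches cover both indices at once.
            {"add": {"index": active_index, "alias": "documents-all"}},
            {"add": {"index": archived_index, "alias": "documents-all"}},
        ]
    }


body = archive_alias_actions("documents-active-v1", "documents-archived-v1")
print(json.dumps(body, indent=2))
```

Applications then talk only to the alias names, so the indices behind them can be swapped later without changing client code.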

Resources