In ElasticSearch, what is the equivalent for creating custom field types? - elasticsearch

In Solr, my schema is very clean as I've defined 5 common field types used across 20 fields. The field type contains a base class (text, etc.), an index analyzer, and a query analyzer. In Elastic, is there no way to represent this other than to define my fields as a base type and then split up each field type into its two analyzers?
For example, for 20 common fields, I would still need to create two analyzers to represent my field type: customfield_index_analyzer and customfield_query_analyzer, and I would need to reference these two separate analyzers for every field in the Elastic mapping? I just want to make sure I'm doing this the right way, but it seems wrong and makes the configuration file less manageable and more prone to mistakes.
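For illustration, a minimal sketch of the setup being described: both analyzers declared once in the index settings, one field wired to them explicitly, and (as one way to avoid repeating the pair twenty times) a dynamic template keyed on a hypothetical field-name suffix. The analyzer internals, the example field name, and the *_txt naming convention are all placeholders:

```
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "customfield_index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        },
        "customfield_query_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "dynamic_templates": [
      {
        "customfield_like_solr_fieldtype": {
          "match": "*_txt",
          "mapping": {
            "type": "text",
            "analyzer": "customfield_index_analyzer",
            "search_analyzer": "customfield_query_analyzer"
          }
        }
      }
    ],
    "properties": {
      "title_txt": {
        "type": "text",
        "analyzer": "customfield_index_analyzer",
        "search_analyzer": "customfield_query_analyzer"
      }
    }
  }
}
```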

Related

Control which multi fields are queried by default

I have a preexisting index that contains field mappings and is currently being queried by many applications. I would like to add additional ways for the data to be queried, specifically, support full text search via analysis. Multi-fields seemed like the obvious way to do this, but I found that adding new multi-fields actually changes the existing query behavior.
For example, I have an "id" field that is a keyword. Applications are already using this field to query on. After I add a new multi-field, like "txt" (using the standard analyzer), new documents can be found by querying with just a partial value match. Values for "id" look like this: "123-abc" so now a query with just "abc" will match when querying against the "id" field. This is not how it worked previously (the keyword only field would require the entire value "123-abc").
Ideally, the top-level "id" field would be keyword only, and if a "full text" search was required, the query would need to specify "id.txt". So my question is... is there a way to disable multi-fields and require that the query explicitly set a sub field when needed?
My only other thought on how to solve this was to use copy_to so that these fields are completely distinct... but that is a bit more work, and there are many, many fields that would require this.
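For reference, a minimal sketch of the multi-field mapping being described (the index name is a placeholder; the id and txt names come from the question):

```
PUT my-index/_mapping
{
  "properties": {
    "id": {
      "type": "keyword",
      "fields": {
        "txt": {
          "type": "text",
          "analyzer": "standard"
        }
      }
    }
  }
}
```

With this mapping, a term query that names the id field still requires the full "123-abc", and analyzed matching has to target id.txt explicitly; the surprise described above presumably comes from queries that do not name a field at all (for example query_string, which by default searches every field, sub-fields included).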

How to handle the situation of saving to Elasticsearch logs whose structure are very diverse?

My log POCO has several fixed properties, like user id, timestamp, with a flexible data bag property, which is a JSON representation of any kind of extra information I'd like to add to the log. This means the property names could be anything within this data bag, bringing me 2 questions:
How can I configure the mapping so that the data bag property, which is of type string, would be mapped to a JSON object during the indexing, instead of being treated as a normal string?
With the data bag object having arbitrary property names, meaning the overall document type could have a huge number of properties inside, would this hurt the search performance?
For the conversion from string to JSON you can use an ingest pipeline with the JSON processor:
https://www.elastic.co/guide/en/elasticsearch/reference/master/json-processor.html
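For example, a minimal sketch of such a pipeline, assuming the stringified bag is stored in a field called data_bag (the field, pipeline, and index names are placeholders):

```
PUT _ingest/pipeline/parse-data-bag
{
  "description": "Parse the stringified data bag into a structured object",
  "processors": [
    {
      "json": {
        "field": "data_bag",
        "target_field": "data_bag_obj"
      }
    }
  ]
}

# Index a document through the pipeline
PUT logs/_doc/1?pipeline=parse-data-bag
{
  "user_id": 42,
  "timestamp": "2021-10-01T00:00:00Z",
  "data_bag": "{\"action\":\"login\",\"browser\":\"firefox\"}"
}
```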
It depends on your queries. If you use free-text search across all fields, then yes, a huge number of fields will slow the query down. If you query specific fields, like "field":"value", then the number of fields is not a problem for searches. You can find additional information about query optimization here:
https://www.elastic.co/guide/en/elasticsearch/reference/7.15/tune-for-search-speed.html#search-as-few-fields-as-possible
And the question is: what do you mean by "huge number"? 1,000? 10,000? 100,000? As part of the optimization I recommend using dynamic templates with a rule that ingests each string field into the index as "keyword" only, and not text + keyword. This cuts the number of fields roughly in half.
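A minimal sketch of such a dynamic template (the index name is an example):

```
PUT logs
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keyword": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}
```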

elasticsearch - Tag data with lookup table values

I’m trying to tag my data according to a lookup table.
The lookup table has these fields:
• Key - represents the field name in the data I want to tag.
In the real data, the field is a subfield of the “Headers” field.
An example for the “Key” field:
“*Server*” (the * is a wildcard).
• Value - represents the wanted value of the field mentioned above.
The value in the lookup table is only a part of a string in the real data value.
An example for the “Value” field:
“Avtech”.
• Vendor - the value I want to add to the real data if a combination of field and value is found in a document.
An example of a combination in the real data:
“Headers.Server : Linux/2.x UPnP/1.0 Avtech/1.0”
A match with that document in the look up table will be:
Key= Server (with wildcard on both sides).
Value= Avtech(with wildcard on both sides)
Vendor= Avtech
So basically I'll need to add a field to that document with the value “Avtech”.
The subfields in “Headers” are dynamic fields that change from document to document.
If a match is not found, I'll need to add the tag field with the value “Unknown”.
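To make this concrete, one lookup-table entry and one matching data document would look roughly like this (index names and field casing are assumptions):

```
PUT vendors/_doc/1
{
  "Key": "*Server*",
  "Value": "*Avtech*",
  "Vendor": "Avtech"
}

PUT traffic/_doc/1
{
  "Headers": {
    "Server": "Linux/2.x UPnP/1.0 Avtech/1.0"
  }
}
```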
I’ve tried to use the enrich processor, using the lookup table as the source data; the match field would be “Value” and the enrich field would be “Vendor”.
In the enrich processor I didn’t know how to refer to the field, since it’s dynamic, and I wanted to search for the value anywhere in the “Headers” subfields.
Also, I don’t think that there will be a match between the “Value” in the lookup table and the value of the Headers subfield, since “Value” field in the lookup table is a substring with wildcards on both sides.
I could use some help to accomplish what I’m trying to do, and with how to search with wildcards inside an enrich processor.
Or, if you have another idea besides the enrich processor, such as parent-child or a terms lookup mechanism, I’d be happy to hear it.
Thanks!
Adi.
There are two ways to accomplish this:
Using the combination of Logstash & Elasticsearch
Using only the Elasticsearch ingest node
Constraint: you need to know the position of the vendor term occurring in the “Headers” field.
Approach 1
If so, then you can use the grok filter to extract the term and, based on the term found, do a lookup to get the corresponding value.
Reference
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-jdbc_static.html
https://www.elastic.co/guide/en/logstash/current/plugins-filters-jdbc_streaming.html
Approach 2
Create an index consisting of key-value pairs. On the ingest node, create a pipeline that consists of a grok processor followed by an enrich processor (a rough sketch follows below). The grok processor would work the same way as in Approach 1, and you seem to have the enrich part working already.
Reference
https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html
If you are able to isolate the subfield within “Headers” where the term of interest is present, it would make things easier for you.
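A rough sketch under that assumption, with the vendor token pulled out of Headers.Server by grok (the pattern assumes the vendor is the last name/version pair in the header, per the constraint above), the lookup index assumed to be called vendors with a Vendor field holding the plain vendor name, and non-matching documents tagged Unknown. It sidesteps the wildcard problem by matching the extracted token against the vendor name directly, which only works when the two coincide:

```
PUT /_enrich/policy/vendor-policy
{
  "match": {
    "indices": "vendors",
    "match_field": "Vendor",
    "enrich_fields": ["Vendor"]
  }
}

POST /_enrich/policy/vendor-policy/_execute

PUT _ingest/pipeline/tag-vendor
{
  "processors": [
    {
      "grok": {
        "field": "Headers.Server",
        "patterns": ["%{GREEDYDATA} %{WORD:vendor_token}/%{GREEDYDATA}"],
        "ignore_failure": true
      }
    },
    {
      "enrich": {
        "policy_name": "vendor-policy",
        "field": "vendor_token",
        "target_field": "vendor_match",
        "ignore_missing": true
      }
    },
    {
      "set": {
        "if": "ctx.vendor_match != null",
        "field": "Vendor",
        "value": "{{vendor_match.Vendor}}"
      }
    },
    {
      "set": {
        "if": "ctx.vendor_match == null",
        "field": "Vendor",
        "value": "Unknown"
      }
    }
  ]
}
```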

Unable to demonstrate common fields on the same column using Kibana?

I am using two different indices, named cdr_mobile and cdr_volte, that have lots of common fields. In order to show the data in Kibana I have made an alias, cdr_alias, over the two of them, and I am now using cdr_alias to retrieve data into Kibana.
The problem I have is that I cannot display common fields in the same column. Instead I am getting, for example, both cdr_volte.startOfCharge and startOfCharge (the other thing I don't understand is that it uses cdr_mobile as the default).
Do you have any idea how I can put the common fields in the same column?
Thank you
It turned out I had made an error in the mapping. In order to put the same fields in the same column, you must have exactly the same mapping for the field in both indices.
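For illustration, a minimal sketch of a consistent setup (the date type for startOfCharge is an assumption; the index and alias names come from the question). Note that an existing field's type cannot be changed in place, so a conflicting index would need to be reindexed:

```
# The common field must be mapped identically in both indices
PUT cdr_mobile/_mapping
{
  "properties": {
    "startOfCharge": { "type": "date" }
  }
}

PUT cdr_volte/_mapping
{
  "properties": {
    "startOfCharge": { "type": "date" }
  }
}

# Both indices behind the single alias used by Kibana
POST _aliases
{
  "actions": [
    { "add": { "index": "cdr_mobile", "alias": "cdr_alias" } },
    { "add": { "index": "cdr_volte", "alias": "cdr_alias" } }
  ]
}
```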

ElasticSearch: Performance Implications of Multiple Types in the same Index

We are storing a handful of polymorphic document subtypes in a single index (e.g., let's say we store vehicles with subtypes of car, van, motorcycle, and Batmobile).
At the moment, there is >80% commonality in fields across these subtypes (e.g., manufacturer, number of wheels, ranking of awesomeness as a mode of transport).
The standard case is to search across all types, but sometimes users will want to filter the results to a subset of the subtypes (e.g., find only cars with...).
How much overhead (if any) is incurred at search/index time from modelling these subtypes as distinct ElasticSearch types vs. modelling them as a single type using some application-specific field to distinguish between subtypes?
I've looked through several related answers already, but can't find the answer to my exact question.
Thanks very much!
There shouldn't be any noticeable overhead.
If you keep everything under the same type, you can filter results by a subtype by adding a "class" field on your objects and adding a condition on this field in your search.
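For example, a sketch of such a filtered search (the vehicles index name comes from the question's example; the class field is the suggested discriminator, and the manufacturer value is made up):

```
GET vehicles/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "manufacturer": "wayne enterprises" } }
      ],
      "filter": [
        { "term": { "class": "car" } }
      ]
    }
  }
}
```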
A good reason to model your different classes into different ES types is if there can be a conflict between type of fields with the same name.
That is, assume your "car" class has a "color" field that holds an integer, while your "van" class also has a "color" field, but this one is a string. (Stupid example, I know, I didn't have a better one.)
Elasticsearch holds the mapping (the data "schema") for a type. So if you index both "car" and "van" under the same type, you will have a field type conflict. A field in a type can have one specific type. If you set the field as integer and then try to index a string into it, it will fail.
This is one of the main guidelines on how to use Elasticsearch types - treat the type as a specific data schema that can't have conflicts.
