What is the difference between a field and a property in Elasticsearch? - elasticsearch

I'm currently trying to understand the difference between fields (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) and properties (https://www.elastic.co/guide/en/elasticsearch/reference/current/properties.html).
They are both somehow defined as a "subfield/subproperty" of a type/mapping property, both can have separate types and analyzers (as far as I understood it), both are accessed by the dot notation (mappingProperty.subField or mappingProperty.property).
The docs are using the terms "field" and "property" randomly, I have the feeling, for example:
Type mappings, object fields and nested fields contain sub-fields,
called properties.
What is the difference between properties and (sub-)fields? How do I decide if I have a property or a field?
In other words, how do I decide if I use
{
"mappings": {
"_doc": {
"properties": {
"myProperty": {
"properties": {
}
}
}
}
}
}
or
{
"mappings": {
"_doc": {
"properties": {
"myProperty": {
"fields": {
}
}
}
}
}
}

Subfields are indexed from the parent property source. While sub-properties need to have a "real" value in the document's source.
If your source contains a real object, you need to create properties. Each property will correspond to a different value from your source.
If you only want to index the same value but with different analyzers then use subfields.
It is often useful to index the same field in different ways for
different purposes. This is the purpose of multi-fields. For instance,
a string field could be mapped as a text field for full-text search,
and as a keyword field for sorting or aggregations:
(sorry I find its hard to explain =| )

Note: This is an explanation from my current understanding. It may not be 100% accurate.
A property is what we used to call field in a RDBMS (a standard relationship db like MySQL). It stores properties of an object and provides the high-level structure for an index (which we can compare to a table in a relational DB).
A field, which is linked (or included) into the property concept, is a way to index that property using a specific analyzer.
So lets say you have:
One analyzer (A) to uppercase
One analyzer (B) to lowercase
One analyzer (C) to translate to Spanish (this doesn't even exist, just to give you an idea)
What an analyzer does is transform the input (the text on a property) into a series of tokens that will be indexed. When you do a search the same analyzer is used so the text is transformed into those tokens, it gives each one a score and then those tokens are used to grab documents from the index.
(A) Dog = DOG
(B) Dog = dog
(C) Dog = perro
To search using a specific field configuration you call it using a dot:
The text field uses the standard analyzer.
The text.english field uses the English analyzer.
So the fields basically allow you to perform searches using different token generation models.

Related

Type of field for prefix search in Elastic Search

I'm confused on what index type I should apply for my field for prefix search, many show search_as_you_type but I think auto complete is not what I'm going for.
I have a UUID field:
id: 34y72ca1-3739-41ff-bbec-f6d17479384c
The following terms should return the doc above:
3
34
34y72ca1
34y72ca1-3739
34y72ca1-3739-41ff-bbec-f6d17479384c
Using 3739 should not return it as it doesn't start with 3739. Initially this is what I was going for but then the wildcard field is not supported by Amazon AWS, so I compromise for prefix search instead of partial search.
I tried search_as_you_type field but it doesn't return the result when I use the whole ID. Actually, my use case is when user click enter, the results will be shown, instead of real-live when they type, so if speed is compromised its OK, just that I hope for something that will be good for many rows of data.
Thanks
If you have not explicitly defined any index mapping, then you need to use id.keyword field instead of the id field for the prefix query to show the appropriate results. This uses the keyword analyzer instead of the standard analyzer
{
"query": {
"prefix": {
"id.keyword": {
"value": "34y72ca1"
}
}
}
}
Otherwise, you can modify your index mapping, by adding multi fields for id field

How to compare two text fields in elastic search

I have a need to compare two text fields in elastic search, but they are text fields.
For normal fields I can use script to compare using doc['field'].value, but is there a way to do the same for text fields.
See below excerpt from ES DOCS :
By far the fastest most efficient way to access a field value from a script is to use the doc['field_name'] syntax, which retrieves the field value from doc values. Doc values are a columnar field value store, enabled by default on all fields except for analyzed text fields.
There are 2 ways known to me to access a text value from script.
Map a keyword representation of a text field as well and access that field.
{
"mappings": {
"properties": {
"name":{
"type": "text",
"fields": {
"keyword":{ // <======= See this
"type":"keyword"
}
}
}
}
}
}
The keyword representation can be accessed like 'doc[name.keyword].value'
It is recommended to index/store keyword representation of fields for small-size text fields like 'name', 'emailId' but is not recommended for larger fields like 'description', due to memory overhead
Another way is to enable field data on the text field and access that field.
Fielddata is disabled on text fields by default. Set fielddata=true on
[your_field_name] in order to load fielddata in memory by uninverting the
inverted index. Note that this can however use significant memory.
It is not recommended to use the field-data over the text fields however.
Note: Please do add details on why you need comparison and what kind of comparison is required on 'text' fields.

ElasticSearch Search query is not case sensitive

I am trying to search query and it working fine for exact search but if user enter lowercase or uppercase it does not work as ElasticSearch is case insensitive.
example
{
"query" : {
"bool" : {
"should" : {
"match_all" : {}
},
"filter" : {
"term" : {
"city" : "pune"
}
}
}
}
}
it works fine when city is exactly "pune", if we change text to "PUNE" it does not work.
ElasticSearch is case insensitive.
"Elasticsearch" is not case-sensitive. A JSON string property will be mapped as a text datatype by default (with a keyword datatype sub or multi field, which I'll explain shortly).
A text datatype has the notion of analysis associated with it; At index time, the string input is fed through an analysis chain, and the resulting terms are stored in an inverted index data structure for fast full-text search. With a text datatype where you haven't specified an analyzer, the default analyzer will be used, which is the Standard Analyzer. One of the components of the Standard Analyzer is the Lowercase token filter, which lowercases tokens (terms).
When it comes to querying Elasticsearch through the search API, there are a lot of different types of query to use, to fit pretty much any use case. One family of queries such as match, multi_match queries, are full-text queries. These types of queries perform analysis on the query input at search time, with the resulting terms compared to the terms stored in the inverted index. The analyzer used by default will be the Standard Analyzer as well.
Another family of queries such as term, terms, prefix queries, are term-level queries. These types of queries do not analyze the query input, so the query input as-is will be compared to the terms stored in the inverted index.
In your example, your term query on the "city" field does not find any matches when capitalized because it's searching against a text field whose input underwent analysis at index time. With the default mapping, this is where the keyword sub field could help. A keyword datatype does not undergo analysis (well, it has a type of analysis with normalizers), so can be used for exact matching, as well as sorting and aggregations. To use it, you would just need to target the "city.keyword" field. An alternative approach could also be to change the analyzer used by the "city" field to one that does not use the Lowercase token filter; taking this approach would require you to reindex all documents in the index.
Elasticsearch will analyze the text field lowercase unless you define a custom mapping.
Exact values (like numbers, dates, and keywords) have the exact value
specified in the field added to the inverted index in order to make
them searchable.
However, text fields are analyzed. This means that their values are
first passed through an analyzer to produce a list of terms, which are
then added to the inverted index. There are many ways to analyze text:
the default standard analyzer drops most punctuation, breaks up text
into individual words, and lower cases them.
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
So if you want to use a term query — analyze the term on your own before querying. Or just lowercase the term in this case.
To Solve this issue i create custom normalization and update mapping to add,
before we have to delete index and add it again
First Delete the index
DELETE PUT http://localhost:9200/users
now create again index
PUT http://localhost:9200/users
{
"settings": {
"analysis": {
"normalizer": {
"lowercase_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"user": {
"properties": {
"city": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
}
}
}
}
}

How to treat certain field values as null in `Elasticsearch`

I'm parsing log files which for simplicity's sake let's say will have the following format :
{"message": "hello world", "size": 100, "forward-to": 127.0.0.1}
I'm indexing these lines into an Elasticsearch index, where I've defined a custom mapping such that message, size, and forward-to are of type text, integer, and ip respectively. However, some log lines will look like this :
{"message": "hello world", "size": "-", "forward-to": ""}
This leads to parsing errors when Elasticsearch tries to index these documents. For technical reasons, it's very much untrivial for me to pre-process these documents and change "-" and "" to null. Is there anyway to define which values my mapping should treat as null ? Is there perhaps an analyzer I can write which works on any field type whatsoever that I can add to all entries in my mapping ?
Basically I'm looking for somewhat of the opposite of the null_value option. Instead of telling Elasticsearch what to turn a null_value into, I'd like to tell it what it should turn into a null_value. Also acceptable would be a way to tell Elasticsearch to simply ignore fields that look a certain way but still parse the other fields in the document.
So this one's easy apparently. Add the following to your mapping settings :
{
"settings": {
"index": {
"mapping": {
"ignore_malformed": "true"
}
}
}
}
This will still index the field (contrary to what I've understood from the documentation...) but it will be ignored during aggregations (so if you have 3 entries in an integer field that are "1", 3, and "hello world", an averaging aggregation will yield 2).
Keep in mind that because of the way the option was implemented (and I would say this is a bug) this still fails for and object that is entered as a concrete value and vice versa. If you'd like to get around that you can set the field's enabled value to false like this :
{
"mappings": {
"my_mapping_name": {
"properties": {
"my_unpredictable_field": {
"enabled": false
}
}
}
}
}
This comes at a price though, since this means the field won't be indexed, but the values entered will be still be stored so you can still accessing them by searching for that document through another field. This usually shouldn't be an issue as you likely won't be filtering documents based on the value of such an unpredictable field, but that depends on your specific case use. See here for the official discussion of this issue.

Elasticsearch type vs index with different schema

https://www.elastic.co/blog/index-vs-type
What is a type?
Fields need to be consistent across types. For instance if two fields have the same name in different types of the same index, they need to be of the same field type (string, date, etc.) and have the same configuration.
And "Which one should I use"
Do your documents have similar mappings? If no, use different indices.
But still, in https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html it mentions that we need to place parent child documents in same index. But how often does parent-child have similar mappings?
And
What i'm trying to investigate is if I should put my 2 different documents in the same index, so I can do parent-child search.
But what I do understand from the index vs type is that documents with different schemas should actually be placed in different indices, but how would then parent-child relationship work?
By "similar mapping", they simply mean that fields with the same name but in two different mapping types must have the same data type.
It is perfectly ok to have different parent and child mapping types, the only constraint is that those two mappings must not contain fields with the same name but different data types.
For instance, if the parent mapping has a field named age of type int, the child mapping type cannot have a field named age of type longor string
The problem is the same field with different typed when you use similar index and different type. To handle this problem, you can use prefix for these fields. For example:
You have a score field for a type user and company. And one of them is integer and one of them is string. You should give different name for these score fields.
POST main_index/user/_mapping
{
"properties": {
....
"user_score": {
"type": "integer"
}
}
}
POST main_index/company/_mapping
{
"properties": {
....
"company_score": {
"type": "string"
}
}
}
You need to do same thing all your same named fields.

Resources