Elasticsearch "data": { "type": "float" } query returns incorrect results - elasticsearch

I have a query like the one below. When the date_partition field has "type" => "float", queries for 20220109, 20220108, and 20220107 all return the document.
When the field has "type" => "long", only the query for 20220109 returns it, which is what I want.
With the float field, each of the queries below behaves as if the query 20220109 had been sent:
--> 20220109, 20220108, 20220107
PUT date
{
  "mappings": {
    "properties": {
      "date_partition_float": {
        "type": "float"
      },
      "date_partition_long": {
        "type": "long"
      }
    }
  }
}
POST date/_doc
{
  "date_partition_float": "20220109",
  "date_partition_long": "20220109"
}
# this returns the document
GET date/_search
{
  "query": {
    "match": {
      "date_partition_float": "20220108"
    }
  }
}
# nothing is returned
GET date/_search
{
  "query": {
    "match": {
      "date_partition_long": "20220108"
    }
  }
}
Is this a bug, or is this how the float type works?
I have 2 years of data loaded into Elasticsearch (daily indices like day-1, day-2; about 20 GB primary shard size per day, roughly 15 TB in total). What is the best way to change the type of just this field?
I have 5 float fields in my mapping; what is the fastest way to change all of them?
Note: I have the solutions below in mind, but I'm afraid they are slow:
update by query API
reindex API
runtime field in the search request (especially this one)
Thank you!

That date_partition field should have the date type with format yyyyMMdd; that's the only sensible type to use here, not long, and float is even worse.
PUT date
{
  "mappings": {
    "properties": {
      "date_partition": {
        "type": "date",
        "format": "yyyyMMdd"
      }
    }
  }
}
It's not logical to query for 20220108 and have the 20220109 document returned in the results. The reason it happens with float is precision: a float is a 32-bit IEEE-754 number with only about 7 significant decimal digits, so values such as 20220108 and 20220109 round to the same stored value and become indistinguishable.
Using the date type would also allow you to use proper time-based range queries and create date_histogram aggregations on your data.
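For example, once the field is mapped as a date, a time-based range query and a daily date_histogram become straightforward (a sketch based on the mapping above, with made-up date values; the calendar_interval syntax assumes a recent 7.x/8.x cluster):
GET date/_search
{
  "query": {
    "range": {
      "date_partition": {
        "gte": "20220101",
        "lte": "20220109",
        "format": "yyyyMMdd"
      }
    }
  },
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "date_partition",
        "calendar_interval": "day"
      }
    }
  }
}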
You can either recreate the index with the appropriate type and reindex your data, or add a new field to your existing index and populate it with update by query. Both options are valid.
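A minimal sketch of the reindex route, assuming the corrected mapping lives in a new index called date_v2 (the index name is illustrative, not from the thread):
PUT date_v2
{
  "mappings": {
    "properties": {
      "date_partition": {
        "type": "date",
        "format": "yyyyMMdd"
      }
    }
  }
}
POST _reindex?slices=auto&wait_for_completion=false
{
  "source": {
    "index": "date"
  },
  "dest": {
    "index": "date_v2"
  }
}
In the example above the values were indexed as strings, so the date field can parse them straight from _source during the reindex. For a 15 TB dataset it is usually more manageable to reindex the daily indices one at a time and switch an alias at the end, rather than run one huge reindex.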

This may be the answer to my question => https://discuss.elastic.co/t/elasticsearch-data-type-float-returns-incorrect-results/300335

Related

ElasticSearch query returns wrong results

I'm relatively new to Elasticsearch and ran into an issue I can't figure out.
For this particular field, all the values seem to be treated as zero, even though the individual records have non-zero values. This only seems to happen to this number field and not to other similar fields (such as cpu pct, mem pct, etc.).
The records only show up when I query for records that have 'system.filesystem.used.pct == 0', whereas none of them show up when I query for something like 'system.filesystem.used.pct > 0'.
I also ran the query in the Kibana dev tools like this, and I get no results:
GET metricbeat-*/_search
{
  "query": {
    "range": {
      "system.filesystem.used.pct": {
        "gt": 0
      }
    }
  }
}
However, if I run this instead, I get all the non-zero results, just like in Discover:
GET metricbeat-*/_search
{
  "query": {
    "term": {
      "system.filesytem.used.pct": 0
    }
  }
}
As pointed out by @Ron Serruya, there is a mapping issue. The mapping for system.filesytem.used.pct was detected as the integer type. Since you are getting the expected search results for the cpu.pct field, the mapping of cpu.pct must have been detected as the float type.
CASE 1:
If you index the two sample documents in this order:
{
  "count": 0.45
}
{
  "count": 0
}
then the float data type is detected by Elasticsearch (if you are using dynamic mapping). This is because the field type is determined by the first document that contains the field.
CASE 2:
Now, if you index the documents in this order:
{
  "count": 0
}
{
  "count": 0.45
}
Elasticsearch will detect count as the long data type.
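A quick way to reproduce both cases in Dev Tools (the index names case1 and case2 are just examples):
# CASE 1: first value is a decimal, so count is mapped as float
POST case1/_doc
{
  "count": 0.45
}
POST case1/_doc
{
  "count": 0
}
GET case1/_mapping

# CASE 2: first value is a whole number, so count is mapped as long
POST case2/_doc
{
  "count": 0
}
POST case2/_doc
{
  "count": 0.45
}
GET case2/_mapping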
You need to recreate the index with the new mapping, reindex the data, and then run the search query on system.filesytem.used.pct.
Modified index mapping will be
{
  "mappings": {
    "properties": {
      "system": {
        "properties": {
          "filesytem": {
            "properties": {
              "used": {
                "properties": {
                  "pct": {
                    "type": "float"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Elastic query bool must match issue

Below is the query part of an Elasticsearch GET request I run from the command line inside an OpenShift pod. In the fetch of 2000 documents I get non-matching elements as well as the matching ones. How can I limit the results to only the matching elements?
I specifically want to get only documents matching {\"kubernetes.container_name\":\"xyz\"}.
Any suggestions will be appreciated.
-d ' {\"query\": { \"bool\" :{\"must\" :{\"match\" :{\"kubernetes.container_name\":\"xyz\"}},\"filter\" : {\"range\": {\"@timestamp\": {\"gte\": \"now-2m\",\"lt\": \"now-1m\"}}}}},\"_source\":[\"@timestamp\",\"message\",\"kubernetes.container_name\"],\"size\":2000}'"
For exact matches there are two things you need to do:
Make use of term queries
Ensure that the field is of the keyword datatype.
The text datatype goes through an analysis phase.
For example, if your data is This is a beautiful day, during ingestion the text datatype breaks the sentence into tokens, lowercases them [this, is, a, beautiful, day] and then adds them to the inverted index. This happens via the Standard Analyzer, which is the default analyzer applied to text fields.
When you query, the analyzer is applied again at query time and the search checks whether the words are present in the respective documents. As a result you see documents appearing even without an exact match.
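You can see this analysis step directly with the _analyze API (a small sketch using the default standard analyzer):
GET _analyze
{
  "analyzer": "standard",
  "text": "This is a beautiful day"
}
The response lists exactly the lowercased tokens [this, is, a, beautiful, day] that end up in the inverted index.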
To do an exact match, you need to use a keyword field, as it does not go through the analysis phase.
What I'd suggest is to create a keyword sibling field for the text field you have, in the manner below, and then re-ingest all the data:
Mapping:
PUT my_sample_index
{
  "mappings": {
    "properties": {
      "kubernetes": {
        "type": "object",
        "properties": {
          "container_name": {
            "type": "text",
            "fields": {          <--- Note this
              "keyword": {       <--- This is the container_name.keyword field
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}
Note that I'm assuming you are making use of object type.
Request Query:
POST my_sample_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "kubernetes.container_name.keyword": {
              "value": "xyz"
            }
          }
        }
      ]
    }
  }
}
Hope this helps!

Add default value on a field while modifying existing elasticsearch mapping

Let's say I have an Elasticsearch index with around 10M documents in it. Now I need to add a new field with a default value, e.g. is_hotel_type=0, to each and every document. Later I'll update it as per my requirements.
To do that I modified myindex with a PUT request like the one below:
PUT myindex
{
  "mappings": {
    "rp": {
      "properties": {
        "is_hotel_type": {
          "type": "integer"
        }
      }
    }
  }
}
Then I run a Painless script with a POST _update_by_query request to set is_hotel_type=0 on all the existing documents:
POST myindex/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": "ctx._source.is_hotel_type = 0;"
}
But this process is very time consuming for a large index with 10M documents. In SQL we can usually set a default value when creating a new column. So my question is:
Is there any way in Elasticsearch to add a new field with a default value? I've tried the PUT request below with null_value, but it doesn't work.
PUT myindex/_mapping/rp
{
  "properties": {
    "is_hotel_type": {
      "type": "integer",
      "null_value": 0
    }
  }
}
I just want to know whether there is any other way to do this without the script query.

Indexing a multi-field property in Elastic Search

I am trying to re-index my documents so that they are sortable, which requires making the sortable fields multi-field properties with a "raw" version of the string that does not get analyzed.
I am following this article, but I am still getting errors when searching my documents with a sorting query.
I have a question regarding the re-indexing of the data... if I re-index the documents into this new index, do I need some extra logic to set both the analyzed version and the non_analyzed or "raw" version of the string? Or does Elasticsearch fill that in automatically? Here is what my field looks like:
{
  "entityName": {
    "type": "string",
    "fields": {
      "raw": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}
So when I index a document with a _source like:
{
  ...
  "entityName": "Ned Stark"
  ...
}
Will both the analyzed field and the not_analyzed field be populated, or is there something else I have to do to tell the indexing to fill in the "raw" property as well?
No, you don't need to do anything else.
After reindexing your documents, you just need to tell the query which field to use, as in the documentation article you linked.
Raw subfield:
POST /_search
{
  "query": {
    "match": {
      "entityName.raw": "foo-bar"
    }
  }
}
or the original analyzed field:
POST /_search
{
  "query": {
    "match": {
      "entityName": "foo-bar"
    }
  }
}
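Since the original goal was sorting, the raw subfield is also what the sort clause should reference (a sketch assuming the mapping from the question):
POST /_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "entityName.raw": {
        "order": "asc"
      }
    }
  ]
}
Sorting on entityName itself would either fail or sort by individual analyzed tokens, which is why the not_analyzed raw copy is needed.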

How to use mapping in elasticsearch?

After processing logs with Logstash, all my fields have the same type (string), so I want to use mappings in Elasticsearch to change some of the types, such as ip, port, etc. However, I don't know how to do it; I'm a complete beginner with Elasticsearch.
Any help?
The first thing to do would be to install the Marvel plugin in Elasticsearch. It allows you to work with the Elasticsearch REST API very easily - to index documents, modify mappings, etc.
Go to the Elasticsearch folder and run:
bin/plugin -i elasticsearch/marvel/latest
Then go to http://localhost:9200/_plugin/marvel/sense/index.html to access Marvel Sense from which you can send commands. Marvel itself provides you with a dashboard about Elasticsearch indices, performance stats, etc.: http://localhost:9200/_plugin/marvel/
In Sense, you can run:
GET /_cat/indices
to learn what indices exist in your Elasticsearch instance.
Let's say there is an index called logstash.
You can check its mapping by running:
GET /logstash/_mapping
Elasticsearch will return a JSON document that describes the mapping of the index. It could be something like:
{
  "logstash": {
    "mappings": {
      "doc": {
        "properties": {
          "Foo": {
            "properties": {
              "x": {
                "type": "string"
              },
              "y": {
                "type": "string"
              }
            }
          }
        }
      }
    }
  }
}
...in this case doc is the document type (collection) in which you index documents. In Sense, you could index a document as follows:
PUT logstash/doc/1
{
  "Foo": {
    "x": "500",
    "y": "200"
  }
}
... that's a command to index the JSON object under the id 1.
Once a document field such as Foo.x has the type string, it cannot be changed to a number. You have to set the mapping first and then reindex.
First delete the index:
DELETE logstash
Then create the index and set the mapping as follows:
PUT logstash
PUT logstash/doc/_mapping
{
  "doc": {
    "properties": {
      "Foo": {
        "properties": {
          "x": {
            "type": "long"
          },
          "y": {
            "type": "long"
          }
        }
      }
    }
  }
}
Now, even if you index a doc with the properties as JSON strings, Elasticsearch will convert them to numbers:
PUT logstash/doc/1
{
  "Foo": {
    "x": "500",
    "y": "200"
  }
}
Search for the new doc:
GET logstash/_search
Notice that the returned document, in the _source field, looks exactly the way you sent it to Elasticsearch - that's on purpose, Elasticsearch always preserves the original doc this way. The properties are indexed as numbers though. You can run a range query to confirm:
GET logstash/_search
{
  "query": {
    "range": {
      "Foo.x": {
        "gte": 500
      }
    }
  }
}
With respect to Logstash, you might want to set a mapping template for index name logstash-* since Logstash creates new indices automatically: http://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-templates.html
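A minimal template sketch for that, using the legacy template API that matches the 1.x-era syntax above (the template name is just an example):
PUT /_template/logstash_longs
{
  "template": "logstash-*",
  "mappings": {
    "doc": {
      "properties": {
        "Foo": {
          "properties": {
            "x": {
              "type": "long"
            },
            "y": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}
Any new logstash-* index created by Logstash will then pick up this mapping automatically.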
