ElasticSearch query returns wrong results - elasticsearch

I'm relatively new to ElasticSearch and encountered this issue which I can't seem to get why.
So for this particular field, it seems to be treating all the values to be zero, even though the individual records are non-zero values. This only seems to happen to this number field and not other similar fields (such as cpu pct, mem pct etc)
The records only show when I query for records that have 'system.filesystem.used.pct == 0', whereas none of them show when I do something like 'system.filesystem.used.pct > 0'.
I also did the querying in the dev tools in kibana like so, yet I don't get any results:
GET metricbeat-*/_search{
"query": {
"range":{
"system.filesystem.used.pct":{
"gt":0
}
}
}
}
However, if I did this, I will get all non-zero results, just like in discover:
GET metricbeat-*/_search
{
"query": {
"term": {
"system.filesytem.used.pct":0
}
}
}

As pointed out by #Ron Serruya, there is a mapping issue. The mapping for system.filesytem.used.pct is detected as to be of integer type. Since, you are getting the expected search results for cpu.pct field, the mapping of cpu.pct, must have been of float type
CASE 1:
If you index the two sample data as (in the same order)
{
"count": 0.45
}
{
"count": 0
}
Then float data type is detected by elasticsearch (if you are using dynamic mapping). this is because the detection of the field type depends on the first data that you have inserted in the field.
CASE 2:
Now, if you index the data in this order
{
"count": 0
}
{
"count": 0.45
}
Here elasticsearch will detect count to be of long data type.
You need to recreate the index, with the new index mapping, reindex the data and then run the search query on system.filesytem.used.pct
Modified index mapping will be
{
"mappings": {
"properties": {
"system": {
"properties": {
"filesytem": {
"properties": {
"used": {
"properties": {
"pct": {
"type": "float"
}
}
}
}
}
}
}
}
}
}

Related

Elasticsearch “data”: { “type”: “float” } query returns incorrect results

I have a query like below and when date_partition field is "type" => "float" it returns queries like 20220109, 20220108, 20220107.
When field "type" => "long", it only returns 20220109 query. Which is what I want.
Each queries below, the result is returned as if the query 20220119 was sent.
--> 20220109, 20220108, 20220107
PUT date
{
"mappings": {
"properties": {
"date_partition_float": {
"type": "float"
},
"date_partition_long": {
"type": "long"
}
}
}
}
POST date/_doc
{
"date_partition_float": "20220109",
"date_partition_long": "20220109"
}
#its return the query
GET date/_search
{
"query": {
"match": {
"date_partition_float": "20220108"
}
}
}
#nothing return
GET date/_search
{
"query": {
"match": {
"date_partition_long": "20220108"
}
}
}
Is this a bug or is this how float type works ?
2 years of data loaded to Elasticsearch (like day-1, day-2) (20 gb pri shard size per day)(total 15 TB) what is the best way to change the type of just this field ?
I have 5 float type in my mapping, what is the fastest way to change all of them.
Note: In my mind I have below solutions but I'm afraid it's slow
update by query API
reindex API
run time search request (especially this one)
Thank you!
That date_partition field should have the date type with format=yyyyMMdd, that's the only sensible type to use, not long and even worse float.
PUT date
{
"mappings": {
"properties": {
"date_partition": {
"type": "date",
"format": "yyyyMMdd"
}
}
}
}
It's not logical to query for 20220108 and have the 20220109 document returned in the results.
Using the date type would also allow you to use proper time-based range queries and create date_histogram aggregations on your data.
You can either recreate the index with the adequate type and reindex your data, or add a new field to your existing index and update it by query. Both options are valid.
It can be answer of my question => https://discuss.elastic.co/t/elasticsearch-data-type-float-returns-incorrect-results/300335

Elastic search apply boost based on nested field value

Below is my indexed document
{
"defaultBoostValue":1.01,
"boostDetails": [
{
"Type": "Type1",
"value": 1.0001
},
{
"Type": "Type2",
"value": 1.002
},
{
"Type": "Type3",
"value": 1.0005
}
]
}
i want to apply boost based on value passed, so suppose i pass Type 1 then boost applied will be 1.0001 and if that Type1 does not exist then it will use defaultBoostValue
below is my query which works but quite slow, is there any way to optimize it further
Original question
Above query works but is slow as we are using _source
{
"query": {
"function_score": {
"boost_mode": "multiply",
"functions": [
"script_score": {
"script": {
"source": """
double findBoost(Map params_copy) {
for (def group : params_copy._source.boostDetails) {
if (group['Type'] == params_copy.preferredBoostType ) {
return group['value'];
}
}
return params_copy._source['defaultBoostValue'];
}
return findBoost(params)
""",
"params": {
"preferredBoostType": "Type1"
}
}
}
}
]
}
}
}
I have removed the condition of not having dynamic mapping, if changing the structure of boostDetails mapping can help then I am ok but please explain how it can help and be faster to query also please give mapping types and modified structure if answer contains modifying mapping.
Using dynamic mappings (lots of fields)
It looks like you adjusted the doc structure compared to your original question.
The query above was thought for nested fields which cannot be easily iterated in a script for performance reasons. Having said that, the above is an even slower workaround which accesses the docs' _source and iterates its contents. But keep in mind that it's not recommended to access the _source in scripts!
If your docs aren't nested anymore, you can access the so-called doc values which are much more optimized for query-time access:
{
"query": {
"function_score": {
...
"functions": [
{
...
"script_score": {
"script": {
"lang": "painless",
"source": """
try {
if (doc['boost.boostType.keyword'].value == params.preferredBoostType) {
return doc['boost.boostFactor'].value;
} else {
throw new Exception();
}
} catch(Exception e) {
return doc['fallbackBoostFactor'].value;
}
""",
"params": {
"preferredBoostType": "Type1"
}
}
}
}
]
}
}
}
thus speeding up your function score query.
Alternative using an ordered list of values
Since the nested iteration is slow and dynamic mappings are blowing up your index, you could store your boosts in a standardized ordered list in each document:
"boostValues": [1.0001, 1.002, 1.0005, ..., 1.1]
and keep track of the corresponding boost types' order in the backend where you construct the queries:
var boostTypes = ["Type1", "Type2", "Type3", ..., "TypeN"]
So something like n-hot vectors.
Then, as you construct the Elasticsearch query, you'd look up the array index of the boostValues based on the boostType and pass this array index to the script query from above which'd access the corresponding boostValues doc-value.
This is guaranteed to be faster than _source access. But it's required that you always keep your boostTypes and boostValues in sync -- preferably append-only (as you add new boostTypes, the list grows in one dimension).

Add default value on a field while modifying existing elasticsearch mapping

Let's say I've an elasticsearch index with around 10M documents on it. Now I need to add a new filed with a default value e.g is_hotel_type=0 for each and every ES document. Later I'll update as per my requirments.
To do that I've modified myindex with a PUT request like below-
PUT myindex
{
"mappings": {
"rp": {
"properties": {
"is_hotel_type": {
"type": "integer"
}
}
}
}
}
Then run a painless script query with POST to update all the existing documents with the value is_hotel_type=0
POST myindex/_update_by_query
{
"query": {
"match_all": {}
},
"script" : "ctx._source.is_hotel_type = 0;"
}
But this process is very time consuming for a large index with 10M documents. Usually we can set default values on SQL while creating new columns. So my question-
Is there any way in Elasticsearch so I can add a new field with a default value.I've tried below PUT request with null_value but it doesn't work for.
PUT myindex/_mapping/rp
{
"properties": {
"is_hotel_type": {
"type": "integer",
"null_value" : 0
}
}
}
I just want to know is there any other way to do that without the script query?

How to use mapping in elasticsearch?

After treating logs with logstash, All my fields have the same type 'STRING so i want to use mapping in elasticsearch to change some type like ip, port ect.. whereas i don't know how to do it, i'm a super beginner in ElasticSearch..
Any help ?
The first thing to do would be to install the Marvel plugin in Elasticsearch. It allows you to work with the Elasticsearch REST API very easily - to index documents, modify mappings, etc.
Go to the Elasticsearch folder and run:
bin/plugin -i elasticsearch/marvel/latest
Then go to http://localhost:9200/_plugin/marvel/sense/index.html to access Marvel Sense from which you can send commands. Marvel itself provides you with a dashboard about Elasticsearch indices, performance stats, etc.: http://localhost:9200/_plugin/marvel/
In Sense, you can run:
GET /_cat/indices
to learn what indices exist in your Elasticsearch instance.
Let's say there is an index called logstash.
You can check its mapping by running:
GET /logstash/_mapping
Elasticsearch will return a JSON document that describes the mapping of the index. It could be something like:
{
"logstash": {
"mappings": {
"doc": {
"properties": {
"Foo": {
"properties": {
"x": {
"type": "String"
},
"y": {
"type": "String"
}
}
}
}
}
}
}
}
...in this case doc is the document type (collection) in which you index documents. In Sense, you could index a document as follows:
PUT logstash/doc/1
{
"Foo": {
"x":"500",
"y":"200"
}
}
... that's a command to index the JSON object under the id 1.
Once a document field such as Foo.x has a type String, it cannot be changed to a number. You have to set the mapping first and then reindex.
First delete the index:
DELETE logstash
Then create the index and set the mapping as follows:
PUT logstash
PUT logstash/doc/_mapping
{
"doc": {
"properties": {
"Foo": {
"properties": {
"x": {
"type": "long"
},
"y": {
"type": "long"
}
}
}
}
}
}
Now, even if you index a doc with the properties as JSON strings, Elastisearch will convert them to numbers:
PUT logstash/doc/1
{
"Foo": {
"x":"500",
"y":"200"
}
}
Search for the new doc:
GET logstash/_search
Notice that the returned document, in the _source field, looks exactly the way you sent it to Elasticsearch - that's on purpose, Elasticsearch always preserves the original doc this way. The properties are indexed as numbers though. You can run a range query to confirm:
GET logstash/_search
{
"query":{
"range" : {
"Foo.x" : {
"gte" : 500
}
}
}
}
With respect to Logstash, you might want to set a mapping template for index name logstash-* since Logstash creates new indices automatically: http://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-templates.html

Filter facet returns count of all documents and not range

I'm using Elasticsearch and Nest to create a query for documents within a specific time range as well as doing some filter facets. The query looks like this:
{
"facets": {
"notfound": {
"query": {
"term": {
"statusCode": {
"value": 404
}
}
}
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"time": {
"from": "2014-04-05T05:25:37",
"to": "2014-04-07T05:25:37"
}
}
}
]
}
}
}
In the specific case, the total hits of the search is 21 documents, which fits the documents within that time range in Elasticsearch. But the "notfound" facet returns 38, which fits the total number of ErrorDocuments with a StatusCode value of 404.
As I understand the documentation, facets collects data from withing the search. In this case, the "notfound" facet should never be able to return a count higher that 21.
What am I doing wrong here?
There's a distinct difference between filter/query/filtered_query/facet filter which is good to know.
Top level filter
{
filter: {}
}
This acts as a post-filter, meaning it will filter the results after the query phase has ended. Since facets are part of the query phase filters do not influence the documents that are facetted over. Filters do not alter score and are therefor very cacheable.
Top level query
{
query: {}
}
Queries influence the score of a document and are therefor less cacheable than filters. Queries run in the query phase and thus also influence the documents that are facetted over.
Filtered query
{
query: {
filtered: {
filter: {}
query: {}
}
}
}
This allows you to run filters in the query phase taking advantage of their better cacheability and have them influence the documents that are facetted over.
Facet filter
"facets" : {
"<FACET NAME>" : {
"<FACET TYPE>" : {
...
},
"facet_filter" : {
"term" : { "user" : "kimchy"}
}
}
}
this allows you to apply a filter to the documents that the facet is run over. Remember that the it'll be a combination of the queryphase/facetfilter unless you also specify global:true on the facet as well.
Query Facet/Filter Facet
{
"facets" : {
"wow_facet" : {
"query" : {
"term" : { "tag" : "wow" }
}
}
}
}
Which is the one that #thomasardal is using in this case which is perfectly fine, it's a facet type which returns a single value: the query hit count.
The fact that your Query Facet returns 38 and not 21 is because you use a filter for your time range.
You can fix this by either doing the filter in a filtered_query in the query phase or apply a facet filter(not a filter_facet) to your query_facet although because filters are cached better you better use facet filter inside you filter facet.
Confusingly Filter Facets are specified using .FacetFilter() on the search object. I will change this in 1.0 to avoid future confusion.
Sadly: .FacetFilter() and .FacetQuery() in NEST do not allow you to specify a facet filter like you can with other facets:
var results = typedClient.Search<object>(s => s
.FacetTerm(ft=>ft
.OnField("myfield")
.FacetFilter(f=>f.Term("filter_facet_on_this_field", "value"))
)
);
You issue here is that you are performing a Filter Facet and not a normal facet on your query (which will follow the restrictions applied via the query filter). In the JSON, the issue is because of the "query" between the facet name "notfound" and the "terms" entry. This is telling Elasticsearch to run this as a separate query and facet on the results of this separate query and not your main query with the date range filter. So your JSON should look like the following:
{
"facets": {
"notfound": {
"term": {
"statusCode": {
"value": 404
}
}
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"time": {
"from": "2014-04-05T05:25:37",
"to": "2014-04-07T05:25:37"
}
}
}
]
}
}
}
Since I see you have this tagged with NEST as well, in your call using NEST, you are probably using FacetFilter on your search request, switch this to just Facet to get the desired result.

Resources