Elasticsearch search request size limit error

My Elasticsearch 7.9.3 cluster (running on Ubuntu) holds one index per day of logs,
so when a query needs to include, for example, data from 2020-01-01 until 2020-11-20,
the search request looks like this (and returns a 400 error):
http://localhost:9200/log_2020-02-14,log_2020-02-26,log_2020-02-27,log_2020-04-24,log_2020-04-25,log_2020-07-17,log_2020-08-01,log_2020-09-09,log_2020-09-21,log_2020-10-06,log_2020-10-07,log_2020-10-08,log_2020-10-16,log_2020-10-17,log_2020-10-18,log_2020-10-21,log_2020-10-22,log_2020-11-12/_search?pretty
I know I can split the request into two, but I don't see why I should (4096 bytes isn't that big for an HTTP request).
Is there any way to configure around this?
Response:
{
  "error": {
    "root_cause": [
      {
        "type": "too_long_frame_exception",
        "reason": "An HTTP line is larger than 4096 bytes."
      }
    ],
    "type": "too_long_frame_exception",
    "reason": "An HTTP line is larger than 4096 bytes."
  },
  "status": 400
}

URLs cannot exceed a certain size depending on the medium, and Elasticsearch limits the HTTP request line to 4096 bytes by default.
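That limit is governed by the http.max_initial_line_length setting (a static setting in elasticsearch.yml), so you could raise it if you really wanted to, e.g.:
http.max_initial_line_length: 8kb
That said, a URL listing dozens of indexes is usually a sign that a wildcard or an alias is the better tool, as shown below.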
Since you seem to want to query all 2020 indexes from January 1st until today (Nov 20), you can use a wildcard like this:
http://localhost:9200/log_2020*/_search?pretty
Another way is to leverage aliases and put all your 2020 indexes behind a log_2020 alias:
POST /_aliases
{
  "actions": [
    { "add": { "index": "log_2020*", "alias": "log_2020" } }
  ]
}
After running that, you can query the alias directly:
http://localhost:9200/log_2020/_search?pretty
If you want to make sure that all your future daily indexes get the alias upon creation, you can add an index template (templates only apply to indexes created after the template exists, so existing indexes still need the _aliases call above):
PUT _index_template/my-logs
{
  "index_patterns": ["log_2020*"],
  "template": {
    "aliases": {
      "log_2020": {}
    }
  }
}
UPDATE
If you need to query between 2020-03-04 and 2020-09-21, you can query the log_2020 alias with a range query on your date field
POST log_2020/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2020-03-04",
        "lt": "2020-09-22"
      }
    }
  }
}

Related

Elasticsearch "data": { "type": "float" } query returns incorrect results

I have a query like the one below. When the date_partition field is "type" => "float", it matches queries for 20220109, 20220108 and 20220107.
When the field is "type" => "long", it only matches the 20220109 query, which is what I want.
For each of the queries below, the result is returned as if the query 20220109 had been sent:
--> 20220109, 20220108, 20220107
PUT date
{
  "mappings": {
    "properties": {
      "date_partition_float": {
        "type": "float"
      },
      "date_partition_long": {
        "type": "long"
      }
    }
  }
}
POST date/_doc
{
  "date_partition_float": "20220109",
  "date_partition_long": "20220109"
}
# returns the document
GET date/_search
{
  "query": {
    "match": {
      "date_partition_float": "20220108"
    }
  }
}
# returns nothing
GET date/_search
{
  "query": {
    "match": {
      "date_partition_long": "20220108"
    }
  }
}
Is this a bug, or is this how the float type works?
I have 2 years of data loaded into Elasticsearch (daily indices like day-1, day-2, about 20 GB of primary shard size per day, 15 TB in total). What is the best way to change the type of just this field?
I have 5 float fields in my mapping; what is the fastest way to change all of them?
Note: I have the solutions below in mind, but I'm afraid they are slow:
update by query API
reindex API
a runtime field queried at search time (especially this one)
Thank you!
That date_partition field should have the date type with format yyyyMMdd; that's the only sensible type to use here, not long, and even less float.
PUT date
{
  "mappings": {
    "properties": {
      "date_partition": {
        "type": "date",
        "format": "yyyyMMdd"
      }
    }
  }
}
It's not logical to query for 20220108 and get the 20220109 document back in the results. It happens because a float only has about 7 significant decimal digits of precision, so 8-digit values like 20220107, 20220108 and 20220109 all round to the same stored value.
Using the date type would also allow you to use proper time-based range queries and create date_histogram aggregations on your data.
You can either recreate the index with the adequate type and reindex your data, or add a new field to your existing index and populate it with an update by query. Both options are valid.
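For the second option, a minimal sketch against the toy date index from the question (field and index names would have to be adapted for the real daily indices):
# 1) add the new date field to the existing mapping
PUT date/_mapping
{
  "properties": {
    "date_partition": {
      "type": "date",
      "format": "yyyyMMdd"
    }
  }
}
# 2) populate it from the existing long field
POST date/_update_by_query?conflicts=proceed
{
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.date_partition_long != null) { ctx._source.date_partition = ctx._source.date_partition_long.toString() }"
  }
}
On daily indices this can be run index by index, which makes it easier to monitor and resume.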
This may be the answer to my question => https://discuss.elastic.co/t/elasticsearch-data-type-float-returns-incorrect-results/300335

How to update data type of a field in elasticsearch

I am publishing data to Elasticsearch using fluentd. It has a field Data.CPU which is currently mapped as a string. The index name is health_gateway.
I have made some changes in the Python code which generates the data, so this field Data.CPU has now become an integer, but Elasticsearch still shows it as a string. How can I update its data type?
I tried running the command below in Kibana Dev Tools:
PUT health_gateway/doc/_mapping
{
  "doc": {
    "properties": {
      "Data.CPU": { "type": "integer" }
    }
  }
}
But it gave me the error below:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
  },
  "status": 400
}
There is also this document which says we can convert the data type using mutate, but I am not able to understand it properly.
I do not want to delete and recreate the index, as I have created a visualization based on it which would then be deleted as well. Can anyone please help with this?
The short answer is that you can't change the mapping of a field that already exists in a given index, as explained in the official docs.
The specific error you got is because you included /doc/ in your request path (you probably wanted /<index>/_mapping), but fixing this alone won't be sufficient.
Finally, I'm not sure you really have a dot in the field name there. Last I heard it wasn't possible to use dots in field names.
Nevertheless, there are several ways forward in your situation... here are a couple of them:
Use a scripted field
You can add a scripted field to the Kibana index-pattern. It's quick to implement, but has major performance implications. You can read more about them on the Elastic blog here (especially under the heading "Match a number and return that match").
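A minimal sketch of such a scripted field (Kibana field type: number), written in Painless and assuming the string field was dynamically mapped with a Data.CPU.keyword sub-field (adjust the field name if yours differs):
doc['Data.CPU.keyword'].size() == 0 ? 0 : Integer.parseInt(doc['Data.CPU.keyword'].value)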
Add a new multi-field
You could add a new multi-field. The example below assumes that CPU is a sub-field of the Data object, rather than a field literally called Data.CPU with a dot in its name:
PUT health_gateway/_mapping
{
  "properties": {
    "Data": {
      "properties": {
        "CPU": {
          "type": "keyword",
          "fields": {
            "int": {
              "type": "short"
            }
          }
        }
      }
    }
  }
}
Reindex your data within ES
Use the Reindex API. Be sure to set the correct mapping on the target index.
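A sketch of that approach; the target index name and the integer type are assumptions to adapt to your data:
# 1) create the target index with the desired type for Data.CPU
PUT health_gateway_v2
{
  "mappings": {
    "properties": {
      "Data": {
        "properties": {
          "CPU": { "type": "integer" }
        }
      }
    }
  }
}
# 2) copy the documents over; numeric strings like "85" are coerced into integers
POST _reindex
{
  "source": { "index": "health_gateway" },
  "dest": { "index": "health_gateway_v2" }
}
You can then point your Kibana index pattern (or an alias) at the new index.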
Delete and reindex everything from source
If you are able to regenerate the data from source in a timely manner, without disrupting users, you can simply delete the index and reingest all your data with an updated mapping.
You can update the mapping by indexing the same field in multiple ways, i.e. by using multi-fields.
With the mapping below, Data.CPU.raw will be of integer type:
{
  "mappings": {
    "properties": {
      "Data": {
        "properties": {
          "CPU": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "integer"
              }
            }
          }
        }
      }
    }
  }
}
Or you can create a new index with the correct mapping and reindex the data into it using the reindex API.

Add default value on a field while modifying existing elasticsearch mapping

Let's say I have an Elasticsearch index with around 10M documents in it. Now I need to add a new field with a default value, e.g. is_hotel_type=0, for each and every ES document. Later I'll update it as per my requirements.
To do that I've modified myindex with a PUT request like below:
PUT myindex
{
  "mappings": {
    "rp": {
      "properties": {
        "is_hotel_type": {
          "type": "integer"
        }
      }
    }
  }
}
Then I run a painless script with a POST to update all the existing documents with the value is_hotel_type=0:
POST myindex/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": "ctx._source.is_hotel_type = 0;"
}
But this process is very time-consuming for a large index with 10M documents. In SQL we can usually set a default value while creating a new column. So my question:
Is there any way in Elasticsearch to add a new field with a default value? I've tried the PUT request below with null_value but it doesn't work for me.
PUT myindex/_mapping/rp
{
  "properties": {
    "is_hotel_type": {
      "type": "integer",
      "null_value": 0
    }
  }
}
I just want to know whether there is any other way to do that without the script query.

Elasticsearch query returns 10 when expecting > 10,000

I want to retrieve all the JSON objects in Elasticsearch that have a null value for awsKafkaTimestamp. This is the query I have set up:
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "tracer.awsKafkaTimestamp"
        }
      }
    }
  }
}
When I curl my Elasticsearch endpoint with this DSL I only get a few values back. I am expecting all of them (10,000+) because I know for sure that all the awsKafkaTimestamp values are null.
This is the response I get when I use Postman. As you can see, only 10 JSON objects are returned to me.
This is the correct behaviour of Elasticsearch. By default it only returns 10 records and reports the total number of documents matching the search criteria in the hits.total field. To retrieve more than 10 documents you should specify the size field in your query as shown below (you can read more about it here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html):
{
  "from": 0, "size": 10,
  "query": {
    "term": { "user": "kimchy" }
  }
}
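Applied to the query from the question, that would look something like this (10,000 is the default cap set by index.max_result_window):
GET /_search
{
  "size": 10000,
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "tracer.awsKafkaTimestamp"
        }
      }
    }
  }
}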
By default Elasticsearch will give you 10 results, even if the query matches 10,212. You can set the size parameter, but it is limited to 10,000, so beyond that your only option is to use the scroll API.
Example from the Elasticsearch site, Scroll API:
curl -XGET 'localhost:9200/twitter/tweet/_search?scroll=1m' -d '
{
  "query": {
    "match": {
      "title": "elasticsearch"
    }
  }
}
'
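Each response contains a _scroll_id; to page through the rest you pass it back to the _search/scroll endpoint (the id below is just a placeholder):
curl -XGET 'localhost:9200/_search/scroll' -d '
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}
'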

Elasticsearch 2.3 put mapping (Attempting to override date field type) error

I have some birth_dates that I want to store as a string. I don't plan on doing any querying or analysis on the data, I just want to store it.
The input data I have been given comes in lots of different formats, and some entries even include strings like (approximate). Elasticsearch has determined that this should be a date field with a date format, which means that when it receives a date like 1981 (approx) it rejects the input as invalid.
Instead of changing the input dates, I want to change the field type from date to string.
I have looked at the documentation and have been trying to update the mapping with the PUT mapping API, but Elasticsearch keeps returning a parsing error.
Based on the documentation here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
I have tried:
PUT /sanctions_lists/eu_financial_sanctions/_mapping
{
  "mappings": {
    "eu_financial_sanctions": {
      "properties": {
        "birth_date": {
          "type": "string", "index": "not_analyzed"
        }
      }
    }
  }
}
but it returns:
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Root mapping definition has unsupported parameters: [mappings : {eu_financial_sanctions={properties={birth_date={type=string, index=not_analyzed}}}}]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Root mapping definition has unsupported parameters: [mappings : {eu_financial_sanctions={properties={birth_date={type=string, index=not_analyzed}}}}]"
  },
  "status": 400
}
Question Summary
Is it possible to override elasticsearch's automatically determined date field, forcing string as the field type?
NOTE
I'm using the Google Chrome Sense plugin to send the requests.
Elasticsearch version is 2.3.
Just remove the type reference and _mapping from the URL; you already have them inside the request body. Note that PUT /sanctions_lists creates the index, so if sanctions_lists already exists with birth_date auto-mapped as a date, you'll need to delete it first (the type of an existing field can't be changed in place) and then reindex your data. More examples.
PUT /sanctions_lists
{
  "mappings": {
    "eu_financial_sanctions": {
      "properties": {
        "birth_date": {
          "type": "string", "index": "not_analyzed"
        }
      }
    }
  }
}
