I want to retrieve information about all available indices in my Elasticsearch cluster. For that I send a request to "<elasticsearch_endpoint>/logs-cfsyslog-*/_search/?format=json".
The body of the request is irrelevant for this problem; I'm simply filtering for a specific value of one field. I would expect the API to return results from all indices of the last 30 days. However, I only receive some of the available indices. Among the missing ones are: 3rd March, 11th-17th and 26th-27th February.
But when I retrieve all available indices with the "_cat" API via
"<elasticsearch_endpoint>/_cat/indices/logs-cfsyslogs-*"
I can see ALL indices that I expect.
I can even specify the exact date that I'm looking for in the search API via:
"<elasticsearch_endpoint>/logs-cfsyslog-2022.03.03/_search/?format=json"
and the API will return the index that I specified.
So why or how does Elasticsearch not return, for example, the index from 3rd March 2022 when I use the wildcard "*" in the search request?
It may be due to one of the reasons below.
First, the default value of size is 10
Since you are calling "<elasticsearch_endpoint>/logs-cfsyslog-*/_search/?format=json" without passing a size parameter, Elasticsearch returns at most 10 documents in the response. Try the API call below and check how many results you get and from which indices:
<elasticsearch_endpoint>/logs-cfsyslog-*/_search/?format=json&size=10000
Second, due to filtering
"I'm simply filtering for a specific value of one field."
As you mentioned in the question, you are filtering one field on a specific value, so it may be that the filter condition does not match any documents in the other indices.
Please check what value you get for hits.total in your response and set the size parameter accordingly. Please note that Elasticsearch will return at most 10,000 documents.
"hits" : {
  "total" : {
    "value" : 5,
    "relation" : "eq"
  }
}
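As a sketch, the search request with an explicit size could be built like this; the filter field and value are placeholders for whatever the question actually filters on:

```python
import json

# Request body for <elasticsearch_endpoint>/logs-cfsyslog-*/_search
# "some_field"/"some_value" stand in for the question's filter (assumption).
body = {
    "size": 10000,              # default is 10; raise it to see hits from all indices
    "query": {
        "term": {"some_field": "some_value"}
    },
    "track_total_hits": True    # report the exact total instead of capping at 10,000
}

print(json.dumps(body, indent=2))
```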
Related
I'm querying ES to get a list of documents within a specific timestamp range. Now I encountered a scenario where we have multiple versions of a single documentId. By default ES returns all the versions of that documentId. My requirement is to get only the last version of each document.
I also want the ES response sorted in ascending order of an indexed timestamp field (called streamingSegmentStartTime).
My current query looks like the following:
{
  "size": 25,
  "query": {
    "bool": {
      "must": [
        {"terms": {"streamingSegmentId": ["00002933-be25-3b9c-9970-472b41aa53cc"], "boost": 1.0}},
        {"range": {"streamingSegmentStartTime": {"from": 1644480000000, "to": 1647476658447, "include_lower": true, "include_upper": false, "boost": 1.0}}}
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "_source": {
    "includes": ["errorCount", "benefitId", "streamingSegmentStopTime", "fanoutPublishTimestamp", "search.version"],
    "excludes": []
  },
  "sort": [
    {"streamingSegmentStartTime": {"order": "asc"}},
    {"_timestamp": {"order": "desc"}}
  ]
}
Try using the collapse parameter to collapse search results based on field values:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/collapse-search-results.html#collapse-search-results
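A sketch of the question's query extended with collapse, so that only one top hit per streamingSegmentId is returned (field names taken from the question; which version counts as "top" follows the sort order):

```python
import json

# Collapse keeps one hit per distinct streamingSegmentId value.
body = {
    "size": 25,
    "query": {
        "bool": {
            "must": [
                {"terms": {"streamingSegmentId": ["00002933-be25-3b9c-9970-472b41aa53cc"]}},
                {"range": {"streamingSegmentStartTime": {"gte": 1644480000000, "lt": 1647476658447}}}
            ]
        }
    },
    "collapse": {"field": "streamingSegmentId"},
    "sort": [{"streamingSegmentStartTime": {"order": "asc"}}]
}

print(json.dumps(body, indent=2))
```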
Can ElastAlert be triggered when the sum of a field across all documents that match a query exceeds some value? Say each document has a "price" value: can ElastAlert be triggered when the sum of the "price" values over the last day exceeds 200, for example?
Example document:
{
  "type": "transaction",
  "price": 20.32
}
Example rule in english:
The sum of the "price" values of all documents where type = 'transaction' over the past hour exceeds 200.
This is not supported out of the box by ElastAlert.
There's an open issue which is still unresolved, as well as a related pull request which hasn't been merged yet.
However, you may be able to modify ElastAlert yourself by following the steps described in the issue and using the contributed patch. It should be straightforward.
I have two types of documents in my index:
doc1
{
  "category": "15",
  "url": "http://stackoverflow.com/questions/ask"
}
doc2
{
  "url": "http://stackoverflow.com/questions/ask",
  "requestsize": "231",
  "logdate": "22/12/2012",
  "username": "mehmetyeneryilmaz"
}
Now I need a query that filters on the same url field and returns the fields of both documents:
result:
{
  "category": "15",
  "url": "http://stackoverflow.com/questions/ask",
  "requestsize": "231",
  "logdate": "22/12/2012",
  "username": "mehmetyeneryilmaz"
}
The results given by Elasticsearch are always per document: if multiple documents satisfy your query/filter, they will always appear as separate documents in the result and are never merged into a single document. Hence merging them on the client side is one option you can use. To avoid retrieving the complete documents and get only the relevant fields, you can use "fields" in your query.
If this is not what you need and you still want to narrow down the result within the query itself, you can use a top_hits aggregation. It will give you the complete list of documents under a single bucket, though each hit would still contain a source field with the complete document itself.
Try giving this page a read:
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-aggregations-metrics-top-hits-aggregation.html
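A sketch of that approach: bucket documents by their exact url with a terms aggregation and pull the matching documents per bucket with top_hits, then merge their fields on the client side. The "url.raw" not_analyzed sub-field is an assumption; use whatever exact-match field your mapping provides.

```python
import json

body = {
    "size": 0,  # we only care about the aggregation buckets
    "query": {"term": {"url": "http://stackoverflow.com/questions/ask"}},
    "aggs": {
        "by_url": {
            "terms": {"field": "url.raw"},   # one bucket per distinct url (assumed keyword field)
            "aggs": {
                # all documents sharing the url, to be merged client-side
                "docs": {"top_hits": {"size": 10}}
            }
        }
    }
}

print(json.dumps(body, indent=2))
```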
I need to update or delete several documents.
When I update I do this:
I first search for the documents, setting a greater limit for the returned results (let’s say, size: 10000).
For each of the returned documents, I modify certain values.
I resend to Elasticsearch the whole modified list (bulk index).
This operation repeats until step 1 no longer returns results.
When I delete I do this:
I first search for the documents, setting a greater limit for the returned results (let’s say, size: 10000)
I delete every found document, sending Elasticsearch the document's _id (10,000 requests).
This operation repeats until step 1 no longer returns results.
Is this the right way to make an update?
When I delete, is there a way I can send several ids to delete multiple documents at once?
For your massive index/update operation, if you don't use it already, take a look at the bulk API documentation; it is tailored for this kind of job.
If you want to retrieve lots of documents in small batches, you should use scan/scroll search instead of from/size. Related information can be found here.
To sum up:
the scroll API is used to keep the result set alive server-side and iterate over it efficiently
the scan search type disables sorting, which is costly
Give it a try, depending on the data volume, it could improve the performance of your batch operations.
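The client-side loop for scan/scroll can be sketched as follows. The `fetch_first`/`fetch_next` callables stand in for the actual HTTP calls (an initial `_search?scroll=1m` request and follow-up `_search/scroll` requests); the stubbed pages below just illustrate the control flow:

```python
def scroll_all(fetch_first, fetch_next):
    """Yield hits batch by batch until Elasticsearch returns an empty page."""
    page = fetch_first()
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        page = fetch_next(page["_scroll_id"])  # pass the scroll id to get the next page

# Stubbed responses standing in for real HTTP calls:
pages = [
    {"_scroll_id": "s1", "hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}},
    {"_scroll_id": "s2", "hits": {"hits": [{"_id": "3"}]}},
    {"_scroll_id": "s3", "hits": {"hits": []}},   # empty page ends the iteration
]
it = iter(pages)
docs = list(scroll_all(lambda: next(it), lambda sid: next(it)))
print([d["_id"] for d in docs])  # -> ['1', '2', '3']
```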
For the delete operation, you can use the same _bulk API to send multiple delete operations at once.
The format of each line is the following:
{ "delete" : { "_index" : "indexName", "_type" : "typeName", "_id" : "1" } }
{ "delete" : { "_index" : "indexName", "_type" : "typeName", "_id" : "2" } }
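A small sketch of building that newline-delimited payload for a single _bulk request; index and type names are the placeholders from the lines above:

```python
import json

def bulk_delete_payload(index, doc_type, ids):
    """Build the NDJSON body for a _bulk request that deletes the given ids."""
    lines = [json.dumps({"delete": {"_index": index, "_type": doc_type, "_id": i}})
             for i in ids]
    return "\n".join(lines) + "\n"   # _bulk requires a trailing newline

payload = bulk_delete_payload("indexName", "typeName", ["1", "2"])
print(payload)
```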
For deletion and update, if you want to delete or update by id, you can use the bulk API:
Bulk API
The bulk API makes it possible to perform many index/delete operations
in a single API call. This can greatly increase the indexing speed.
The possible actions are index, create, delete and update. index and
create expect a source on the next line, and have the same semantics
as the op_type parameter to the standard index API (i.e. create will
fail if a document with the same index and type exists already,
whereas index will add or replace a document as necessary). delete
does not expect a source on the following line, and has the same
semantics as the standard delete API. update expects that the partial
doc, upsert and script and its options are specified on the next line.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
You can also delete by query instead:
Delete By Query API
The delete by query API allows to delete documents from one or more
indices and one or more types based on a query. The query can either
be provided using a simple query string as a parameter, or using the
Query DSL defined within the request body.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
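A sketch of a delete-by-query body, removing every document whose "type" field is "transaction" (field and value are illustrative):

```python
import json

# Body to send with the delete-by-query request against the target index;
# the field/value pair is an example, not from the question.
body = {
    "query": {
        "term": {"type": "transaction"}
    }
}

print(json.dumps(body, indent=2))
```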
Hi StackOverflow crowd,
I am currently working with Elasticsearch for my website. My question is:
Is it possible to boost a query depending on the value of a field?
Example:
I have a field called premiumlevel. If the document contains a premiumlevel of 4, the boost value shall be 40; if it is 3, the boost value shall be 30, and so on.
Boost = field value multiplied by 10
Is there a way to realize a query like this? I saw a similar query for dates with range, but I never found an example where the actual value of the field is used to calculate the boost value.
Edit:
I think I just found the solution to my question.
If I read the documentation correctly, it should be possible to achieve my goal with custom score queries. I will edit this answer with my code snippet after I have tested it completely.
This is the documentation paragraph:
http://www.elasticsearch.org/guide/reference/query-dsl/custom-score-query.html
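A sketch of what such a custom_score query might look like, scoring each document by premiumlevel × 10 (assuming a numeric premiumlevel field; note that custom_score belongs to legacy Elasticsearch versions, and later versions express the same idea with a function_score query and script_score):

```python
import json

# Score every matching document by its own premiumlevel field times 10.
body = {
    "query": {
        "custom_score": {
            "query": {"match_all": {}},
            "script": "doc['premiumlevel'].value * 10"
        }
    }
}

print(json.dumps(body, indent=2))
```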
P.S.: The rule that new users are not allowed to answer their own questions before 8 hours have passed is not logical/productive, in my honest opinion.
The only way I can think of to do what you want would be to actually index the boost value:
From the doc: http://www.elasticsearch.org/guide/reference/mapping/boost-field.html
{
  "tweet" : {
    "_boost" : {"name" : "_boost", "null_value" : 1.0}
  }
}
The above mapping defines a mapping for a field named _boost. If the _boost field exists within the indexed JSON document, its value will control the boost level of that document.