Is there a limit on the number of filters that can be passed to DynamoDB query - elasticsearch

I’d like to search for something like
Field “Id” has a value in [Long list of IDs]
This long list of Ids can hit over 1000 Ids.
Should I expect a problem with that? Is there a limit on how long the query can be?
I am looking at CloudSearch, which seems to have a limit of 1024 clauses, and I am wondering whether this should just be done in DynamoDB if it has no such limit.
At that point, I guess I should also ask whether Elasticsearch/OpenSearch has such limits.

You can review the various DynamoDB limits at https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ServiceQuotas.html. Here are two that will affect you:
The maximum length of any expression string is 4 KB.
The maximum number of operands for the IN comparator is 100.

Elasticsearch has a limit on the maximum number of clauses a query can have, as explained in the search settings documentation. But if you are using it only to filter the data, as you mentioned in your question, then you can simply use the terms query, which lets you send a long list of IDs to filter on. Elasticsearch recommends this in the same search settings document.
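As a rough sketch of what such a request body could look like, the snippet below builds a terms filter as a plain Python dict. The helper name build_terms_filter and the sample IDs are made up for illustration; the field name "Id" comes from the question. The dict would then be sent as the body of a search request with whatever client you use.

```python
def build_terms_filter(field, ids):
    """Return a search body that keeps documents whose `field` matches any of `ids`."""
    return {
        "query": {
            "bool": {
                # Filter context: documents are only included or excluded, not scored.
                "filter": [{"terms": {field: list(ids)}}]
            }
        }
    }


# Hypothetical sample data: 1500 IDs, more than the 1024-clause limit mentioned above.
ids = [f"id-{n}" for n in range(1500)]
body = build_terms_filter("Id", ids)
term_count = len(body["query"]["bool"]["filter"][0]["terms"]["Id"])
print(term_count)
```

The whole list travels as a single terms clause, so it is not counted as 1500 separate clauses.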

Field “Id” has a value in [Long list of IDs]
This long list of Ids can hit over 1000 Ids.
Is Id your primary key?
If so, no: in DynamoDB you can use the BatchGetItem operation and pass it the IDs. It accepts up to 100 keys per call, so for 1000 IDs you will need 10 concurrent or sequential calls.
If it is not the primary key (for example, it is the key of a secondary index), then you will have to run 1000 concurrent Query operations to check for the presence of each key.
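The batching for the primary-key case can be sketched as follows. This only builds the request payloads and does not call AWS. The table name "MyTable", the string-typed key, and the helper names are assumptions for illustration; each payload has the shape expected by the RequestItems parameter of BatchGetItem in the low-level API.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_get_requests(table_name, key_name, ids, batch_size=100):
    """Build one RequestItems payload per batch of at most `batch_size` keys."""
    requests = []
    for batch in chunked(list(ids), batch_size):
        # Assumption: the key attribute is a string, hence the "S" type tag.
        keys = [{key_name: {"S": str(item_id)}} for item_id in batch]
        requests.append({table_name: {"Keys": keys}})
    return requests


# Hypothetical table and IDs: 1000 IDs become 10 payloads of 100 keys.
requests = build_batch_get_requests("MyTable", "Id", range(1000))
print(len(requests))
```

Each payload would be passed to one BatchGetItem call. A response can contain UnprocessedKeys, so real code should retry those keys.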

Related

Elasticsearch query to return limited amount of result (10) which will contain 2 from each specified keyword

I have articles stored in Elasticsearch and I've been wondering if there is a way I can query by date but the result to contain a specific amount of articles from each publisher. More specifically, I have 5 different publishers and I want to get the 10 latest articles, 2 from each publisher. I'm storing the publishers name as a keyword field in elastic.
The only idea I've come up with is to run a query for each publisher separately, limit each result to the first 2, and then merge the results programmatically. But I think it would be more efficient if there is a way to do this in a single query.
Thanks
This sounds like a case for field collapsing.
You would collapse on the publisher field (as long as it is a keyword or a number) and then request inner_hits, the actual articles.
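A sketch of such a request body is below, built as a Python dict. The field names "publisher" and "published_at", the inner_hits name, and the helper name are assumptions, since the question does not show its mapping.

```python
def build_collapse_query(publisher_field, date_field, per_publisher=2, publishers=5):
    """Build a search body returning the newest articles grouped by publisher."""
    return {
        # One top-level hit per publisher after collapsing.
        "size": publishers,
        "sort": [{date_field: "desc"}],
        "collapse": {
            "field": publisher_field,
            "inner_hits": {
                "name": "latest_articles",
                # The actual articles: newest `per_publisher` for each publisher.
                "size": per_publisher,
                "sort": [{date_field: "desc"}],
            },
        },
    }


body = build_collapse_query("publisher", "published_at")
print(body["collapse"]["inner_hits"]["size"])
```

The articles are then read from the inner_hits section of each top-level hit rather than from the top-level hits themselves.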

Pagination with multi match query

I'm trying to figure out how to accomplish pagination with a multi match query using elasticsearch.
The scroll and search_after APIs seem like they won't work. scroll isn't meant for real time user requests as per documentation. search_after requires some unique field per id and requires you to sort on that field as per documentation but when using a multi-match query you're basically sorting by the score.
So, the only thing I've thought of so far is to do the following:
Send back last document id + score and use the score as the sort field. But, this could potentially return duplicate documents if other documents were added in between two queries.
If you want to paginate, the first option is to use the from and size parameters in your query. From the documentation:
Pagination of results can be done by using the from and size
parameters. The from parameter defines the offset from the first
result you want to fetch. The size parameter allows you to configure
the maximum amount of hits to be returned.
Though from and size can be set as request parameters, they can also
be set within the search body. from defaults to 0, and size defaults
to 10.
Note that from + size can not be more than the index.max_result_window
index setting which defaults to 10,000. See the Scroll or Search After
API for more efficient ways to do deep scrolling.
If you don't need to paginate beyond 10k results, this is your best choice. The max_result_window can be raised, but performance degrades as the requested page number increases.
Of course, if documents are added while a user is paginating, they will show up in the results and the pagination can be slightly inaccurate.
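To make the arithmetic concrete, here is a small sketch that converts a 1-based page number into from and size and rejects pages past the default window of 10,000 quoted above. The helper name page_params is made up for illustration.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def page_params(page, page_size=10, max_window=MAX_RESULT_WINDOW):
    """Translate a 1-based page number into `from`/`size` search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    # from + size must not exceed the result window.
    if offset + page_size > max_window:
        raise ValueError("page is beyond max_result_window; use search_after or scroll")
    return {"from": offset, "size": page_size}


params = page_params(3, page_size=20)
print(params)
```

The returned dict can be merged into the search body next to the multi match query.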

End of search results using search_after parameter from Elastic Search API

For a given date range in the query, and with a search_after parameter, I am able to extract the relevant results successfully. How do I figure out whether I am at the end of the search results for the given date range, so that I don't have to continue querying with the search_after parameter?
There is a pretty cool "trick" that does not involve any additional queries or knowledge of the total number of results:
Say you have a page size of 20. Instead of asking elasticsearch for 20 results, ask it for 21.
If you got 21 results back, only use the first 20 of them. But you now know that the next query will have at least one more result (If you use the sort values of the 20th result for the search_after parameter, not the 21st!).
If you get 20 results or fewer, there will be no additional results.
This github issue gives some more details into why elasticsearch does not have this feature out of the box: https://github.com/elastic/elasticsearch/issues/22364
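The trick can be sketched like this. The search callable and the in-memory fake_search are stand-ins for a real Elasticsearch call, and all names here are hypothetical; only the "ask for one extra hit" logic is the point.

```python
def fetch_page(search, page_size, search_after=None):
    """Request page_size + 1 hits; the extra hit only signals that more exist."""
    hits = search(size=page_size + 1, search_after=search_after)
    has_more = len(hits) > page_size
    page = hits[:page_size]
    # The cursor comes from the last hit that is actually used, not the extra one.
    next_cursor = page[-1]["sort"] if has_more else None
    return page, next_cursor


# Stand-in for the index: 45 documents sorted by a unique numeric value.
DOCS = [{"_id": str(n), "sort": [n]} for n in range(45)]


def fake_search(size, search_after=None):
    """Return up to `size` documents that come after the given sort values."""
    start = 0 if search_after is None else search_after[0] + 1
    return DOCS[start:start + size]


page_lengths = []
cursor = None
while True:
    page, cursor = fetch_page(fake_search, 20, cursor)
    page_lengths.append(len(page))
    if cursor is None:
        break
print(page_lengths)
```

The loop stops as soon as a response contains no extra hit, so no trailing empty query is needed.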
You can either keep querying until it returns zero results, or, since the response does include the total, keep track of how many results you have already retrieved and stop once you reach the total. (I do a combination of both.)

VLV search request: Unavailable Critical Extension: The search results cannot be sorted because the given search request is not indexed

First, it seems this isn't related to the unindexed-search privilege. I tried with the root DN user and got the same problem.
My Case:
I have 5000 entries of user, each entry contains "xxx#XXX.com" in the "mail" attribute.
And I have a VLV with sort order: +uid +cn +mail
I try the filter "(mail=.com)" in VLV, trying to get a paged result with the total count returned. I understand that the returned values will exceed the 4000 limit, and I understand that SSS is a very expensive request (this is for an admin, so the operation won't happen often).
My question is: in this case, should I accept it and tell the user to narrow down the search, or is there any possible solution?
Thanks,
Wayne
No, this is not related to the unindexed-search privilege, but to internal administrative limits.
VLV requests (and sort requests) will work without proper indexing only if they process fewer than 4000 entries.
Otherwise, a proper VLV Index is required, and to be used it must match all parameters of the search query: base, scope, filter and sort parameters.

In Elastic Search how can I get result from each types in an index for result limited to 10 query.?

I have four types in my index. I am searching for a keyword and the result is limited to 10. I need to get records from all types. Is that possible?
If you mean getting the first 10 docs per type, I'd use the multisearch API.
See https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-multi-search.html
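As a sketch, the multi search request body is newline-delimited JSON: one header line and one body line per search. The index name, type names, and the match on the _all field below are assumptions for illustration, and the "type" header key applies to the older versions that still have mapping types, such as the 2.3 release linked above.

```python
import json


def build_msearch_body(index, types, keyword, size=10):
    """Build a newline-delimited multi search body with one search per type."""
    lines = []
    for doc_type in types:
        # Header line: which index and type this search targets.
        lines.append(json.dumps({"index": index, "type": doc_type}))
        # Body line: the query itself, limited to `size` hits for this type.
        lines.append(json.dumps({"query": {"match": {"_all": keyword}}, "size": size}))
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical index and type names.
msearch_body = build_msearch_body("my_index", ["type_a", "type_b", "type_c", "type_d"], "keyword")
print(msearch_body)
```

The response then contains one result set per type, each with up to 10 hits.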
