Is there any way to restrict RediSearch results by a list of document IDs, which would be specified in the request?
e.g. something like FT.SEARCH cars fast #id:{100,200,300} would return only fast cars having ID 100, 200, or 300.
Yes, there's the INKEYS keyword.
> FT.SEARCH cars "fast" INKEYS 3 100 200 300
See https://oss.redislabs.com/redisearch/Commands/#ftsearch
Hope everyone is staying safe!
I am trying to explore the proper way to tackle the following use case in Elasticsearch.
Let's say that I have about 700,000 docs which I would like to bucket on the basis of a field (let's call it primary_id). This primary_id can be the same for more than one doc (usually up to 2-3 docs will share the same primary_id). In all other cases the primary_id is not repeated in any other doc.
So, on average, out of every 10 docs, 8 will have unique primary_ids and one primary_id will be shared between 2 docs.
To ensure uniqueness I tried using the terms aggregation, and I got buckets in response to my initial search request but not for the subsequent scroll requests. Upon googling, I found that scroll queries do not support aggregations.
As a result, I tried finding alternative solutions, including the one in this link: https://lukasmestan.com/learn-how-to-use-scroll-elasticsearch-aggregation/
It suggests using multiple search requests, each specifying the partition number to fetch (depending on how many partitions you divide your result into). But I receive client timeouts even with high timeout settings on the client side.
Ideally, I want to know the best way to handle data where the cardinality of the field that forms the buckets is almost equal to the number of docs. The SQL equivalent would be SELECT DISTINCT(primary_id) FROM ...
But in Elasticsearch, distinct values can only be obtained via bucketing (terms aggregation).
I also use top_hits as a sub-aggregation under the terms aggregation to fetch the _source fields.
Any help would be extremely appreciated!
Thanks!
There are three ways to paginate aggregations:
Composite aggregation
Partition
Bucket sort
Partition you have already tried.
Composite aggregation: combines multiple sources into composite buckets and allows pagination and sorting on them. It can only paginate linearly using after_key, i.e. you cannot jump from page 1 to page 3. You fetch n buckets, then pass the returned after_key to fetch the next n.
GET index22/_search
{
  "size": 0,
  "aggs": {
    "ValueCount": {
      "value_count": {
        "field": "id.keyword"
      }
    },
    "pagination": {
      "composite": {
        "size": 2,
        "sources": [
          {
            "TradeRef": {
              "terms": {
                "field": "id.keyword"
              }
            }
          }
        ]
      }
    }
  }
}
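To fetch the next page, pass the after_key returned in the previous response back via an after clause. A minimal sketch (the TradeRef value here is a placeholder for whatever after_key your previous response returned):

GET index22/_search
{
  "size": 0,
  "aggs": {
    "pagination": {
      "composite": {
        "size": 2,
        "sources": [
          {
            "TradeRef": {
              "terms": {
                "field": "id.keyword"
              }
            }
          }
        ],
        "after": { "TradeRef": "<last TradeRef from previous page>" }
      }
    }
  }
}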
Bucket sort
The bucket_sort aggregation, like all pipeline aggregations, is executed after all other non-pipeline aggregations. This means the sorting only applies to whatever buckets are already returned from the parent aggregation. For example, if the parent aggregation is terms and its size is set to 10, the bucket_sort will only sort over those 10 returned term buckets.
So this isn't suitable for your case.
You can increase the result size beyond 10k by updating the index.max_result_window setting. Setting too big a size can cause out-of-memory issues, so you need to test how much your hardware can support.
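For reference, the setting can be updated like this (50000 is only an illustrative value):

PUT index22/_settings
{
  "index.max_result_window": 50000
}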
A better option is to use the scroll API and perform the distinct operation on the client side.
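A rough sketch of that flow, reusing the id field from above: open a scroll context, then keep passing back the returned _scroll_id until no more hits come back, de-duplicating ids on the client.

POST index22/_search?scroll=1m
{
  "size": 1000,
  "_source": ["id"],
  "query": { "match_all": {} }
}

POST _search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}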
Using Solr 6.4
When running a query over a set of documents I need to be able to return a random set of results that are limited to a number the customer requests.
e.g. The customer running the search wants 100 random documents from the 1,000,000 they have in their index.
Of course I could &fq={!frange incu=false l=0 u=1}mod(random_1927377943, 1)&pageSize=100... problem solved.
Not really, because I also need the facets for the types of document they are searching for in their index. However, the facet counts are based on numFound, which could be anywhere between 0 and 1,000,000.
"response":{"numFound":1000000,"start":0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"documentTypeId":[
"78",500000,
"3",250000,
"2",150000,
"1",100000,
How do I limit the random numFound results to the specific number the customer is asking for (100), AND have the facet counts for each documentTypeId reflect only that random subset?
"response":{"numFound":100,"start":0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"documentTypeId":[
"78",50,
"3",25,
"2",15,
"1",10,
I have an index of goods in Elasticsearch (5.5), where every product has a field "category", like "GLOVES", "COAT", "TOWEL".
With the terms query I can select items belonging to several categories, e.g.
{
  "terms": {
    "div_id": ["COAT", "DRESS", "JACKET"]
  }
}
Now the problem is that I want the response to contain several items of each category, say not less than 3 (given that the total size of the answer is 15 records).
And I have no clear idea how to do this. With the straightforward query above, any number of items may come back from any category. The closest I get is adding random_score, which makes the result "diverse", but the mix then depends on what percentage each category occupies in the index.
I suspect there should be a different approach, but I can't seem to guess the right keywords.
Thanks in advance!
You may want to try the top_hits aggregation documented in the Elasticsearch reference.
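A minimal sketch of what that could look like, assuming div_id holds exact (not analyzed) values; per_category and top_items are placeholder names. A terms bucket per category, with a top_hits sub-aggregation, gives you up to 3 items from each category regardless of how frequent the category is in the index:

{
  "size": 0,
  "query": {
    "terms": { "div_id": ["COAT", "DRESS", "JACKET"] }
  },
  "aggs": {
    "per_category": {
      "terms": { "field": "div_id", "size": 3 },
      "aggs": {
        "top_items": {
          "top_hits": { "size": 3 }
        }
      }
    }
  }
}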
I have a CouchDB database where I store models like this:
"_id": "id",
"_rev": "rev",
"field_1": "test",
"filed_2": 45,
"filed_3": 15,
"object_1": {
"field_1_1": 123,
"filed_1_2": 125
}
}
And I want to search for models by specific parameters in different ranges (filters).
For example, in one situation I need to find all the models with
field_2 from 10 to 50
field_3 from 10 to 20
object_1.field_1_1 from 100 to 150, object_1.field_1_2 from 120 to 130
In another case I need to find just all the models with field_2 from 10 to 50.
At the moment I have written a view like this:
function (doc) {
  emit([doc.field_2, doc.field_3, doc.object_1.field_1_1, doc.object_1.field_1_2], 1);
}
So it generates this result:
{"id":"id","key":[45,15,123,125],"value":1}
I can use this array key to fetch the necessary models, and I can use startkey and endkey to generate ranges.
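For example, the first filter combination could be queried like this (assuming the view is saved as by_fields in a design document named models; both names are placeholders):

GET /db/_design/models/_view/by_fields?startkey=[10,10,100,120]&endkey=[50,20,150,130]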
But is there a more efficient way to search by different filters in CouchDB (some filters can be skipped; the user selects the filters they want to search by)? How can I combine different parameters?
And how can I skip parameters that were not chosen for the search (as in the second case)?
Thank you.
In CouchDB 2.x you can use the /db/_find endpoint with Mango expressions to query the database.
Please check the expression syntax to see whether it covers your needs.
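For instance, your first filter combination could be written as a Mango selector like this (field names taken from your document; filters the user skips are simply omitted from the selector, so the second case would keep only the field_2 clause):

POST /db/_find
{
  "selector": {
    "field_2": { "$gte": 10, "$lte": 50 },
    "field_3": { "$gte": 10, "$lte": 20 },
    "object_1.field_1_1": { "$gte": 100, "$lte": 150 },
    "object_1.field_1_2": { "$gte": 120, "$lte": 130 }
  }
}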
We have an index with the following document structure:
{
  "email": "test@test.com",
  stuff...
},
{
  "email": "test@test.com",
  stuff...
},
{
  "email": "anotherEmail@test.com",
  stuff...
}
We need to get all records where the number of records sharing an email is greater than 2, for example. I know I can use a terms aggregation with a minimum doc count to find all emails that have at least 2 records.
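Roughly like this, assuming email is indexed as an exact-value field (by_email is just a placeholder name):

{
  "size": 0,
  "aggs": {
    "by_email": {
      "terms": {
        "field": "email",
        "min_doc_count": 2
      }
    }
  }
}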
But what we need to do is actually get all the records where the per-email count is > X. So we need our query to constrain the results to only those records that match such an aggregation.
I know that we can use a nested top_hits aggregation, but that is not good enough for us, because we need to be able to page through these results; an email could have 10k records, for example. We need these results in the hits collection so that we can page them.
How would we go about doing something like that?