ElasticSearch - backward pagination with search_after when sorting value is null

I have an application which has a dashboard, basically a table with hundreds of thousands of records.
This table has up to 50 different columns. These columns have different types in mapping: keyword, text, boolean, integer.
As records in the table might have the same values, I sort by an array of 2 attributes:
- The first attribute is whatever the client wants to sort by. It can be a simple sort object or a sort query with a nested filter.
- The second attribute is a default sort by id, needed to break ties between documents that have identical values in the column the customer sorts by.
I checked multiple topics/issues on GitHub and here on the Elastic forum to understand how to implement the search_after mechanism for backward pagination, but it doesn't work for all the cases I need.
Please have a look at the image:
Imagine there is a limit = 3; the customer is currently on the 3rd page of the table, and all the data is sorted by name asc, _id asc.
The names on the image are A, B, C, D, E.
The ids are the numeric parts of the "Doc" labels (Doc1, Doc2, ...).
When the customer wants to go back to the previous page, which is page #2 in my picture, I pass the following to Elastic:
sort: [
  { name: 'desc' },
  { _id: 'desc' }
],
search_after: [null, Doc7._id]
As a result, I get only one document, which is Doc6: null in my image. It seems logical, because I ask Elastic to search descending after null and id 7, and only one doc corresponds to that: Doc6. But it's not what I need.
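For reference, the full request being sent looks roughly like this; a sketch, where the index name (dashboard) and the literal id value are placeholders:

POST /dashboard/_search
{
  "size": 3,
  "sort": [
    { "name": "desc" },
    { "_id": "desc" }
  ],
  "search_after": [null, "Doc7._id"]
}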
I can't come up with a solution that returns the data I need.
Could anyone help, please?

Related

FaunaDB search document and get its ranking based on a score

I have a collection of documents with the following structure:
type Streak struct {
	UserID    string    `fauna:"user_id"`
	Username  string    `fauna:"username"`
	Count     int       `fauna:"count"`
	UpdatedAt time.Time `fauna:"updated_at"`
	CreatedAt time.Time `fauna:"created_at"`
}
This looks like the following in FaunaDB Collections:
{
  "ref": Ref(Collection("streaks"), "288597420809388544"),
  "ts": 1611486798180000,
  "data": {
    "count": 1,
    "updated_at": Time("2021-01-24T11:13:17.859483176Z"),
    "user_id": "276989300",
    "username": "yodanparry"
  }
}
Basically I need a lambda or a function that takes in a user_id and spits out that user's rank within the collection, where rank is simply the position when the collection is sorted by the count field. For example, let's say I have the following documents (other fields omitted for simplicity):
user_id | count
--------|------
abc     | 12
xyz     | 10
fgh     | 999
If I throw in fgh as an input for this lambda function, I want it to spit out 1 (or 0 if you start counting from 0).
I already have an index for user_id, so I can query and match a document reference from this index. I also have an index sorted_count that sorts documents by the count field in ascending order.
My current solution is to query all documents through the sorted_count index, then get the rank by iterating through the array. I think there should be a better solution for this; I'm just not seeing it.
Please help. Thank you!
Counting things in Fauna isn't as easy as one might expect, but you might still be able to do something more efficient than what you describe.
Assuming you have:
CreateIndex({
  name: "sorted_count",
  source: Collection("streaks"),
  values: [
    { field: ["data", "count"] }
  ]
})
Then you can query this index like so:
Count(
  Paginate(
    Match(Index("sorted_count")),
    { after: 10, size: 100000 }
  )
)
Which will return an object like this one:
{
  before: [10],
  data: [123]
}
Which tells you that there are 123 documents with count >= 10, which I think is what you want.
This means that, in order to get a user's rank based on their user_id, you'll need to implement this two-step process:
1. Determine the count of the user in question using your index on user_id.
2. Query sorted_count using that count, as described above.
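Put together, the whole lookup might look something like the following FQL sketch. The user_id index name (streaks_by_user_id) is an assumption; substitute whatever your index is actually called:

Let(
  {
    // look up the user's streak document via the user_id index (name assumed)
    userDoc: Get(Match(Index("streaks_by_user_id"), "276989300")),
    userCount: Select(["data", "count"], Var("userDoc")),
    // count all sorted_count entries at or above the user's count
    counted: Count(
      Paginate(
        Match(Index("sorted_count")),
        { after: Var("userCount"), size: 100000 }
      )
    )
  },
  // Count over a page returns { data: [n] }, so pull out the number
  Select(["data", 0], Var("counted"))
)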
Note that, in case your collection has more than 100,000 documents, you'll need your Go code to iterate through all the pages based on the returned object's after field. 100,000 is Fauna's maximum allowed page size. See the Fauna docs on pagination for details.
Also note that this might not reflect whatever your desired logic is for resolving ties.

elastic search get distinct random field values

We have an Elasticsearch document with the following fields:
{
  "stockId": 1,
  "sellerId": 100
}
Multiple stockIds can be mapped to a single sellerId, but one stock can only be mapped to a single seller. There are around 10K stocks mapped to 1K sellers, but each sellerId might have a different number of stocks, i.e. a few might have 100 while others have only 1.
Problem statement: we want to select 'N' random documents out of all the documents indexed, with the condition that each of these 'N' documents belongs to a different seller, i.e. a distinct "sellerId". (We need to give awards to these sellers.)
What I have tried: I am trying to solve this with an Elastic query that fetches 'N' random distinct 'sellerId' values (and then a query to fetch 1 document for each of these 'N' sellers). One way could be to aggregate on 'sellerId' and then pick 'N' random keys, but this approach is not desirable performance-wise. Can someone help with a better query?
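For reference, the aggregation approach being ruled out here would be something like this (the agg name is arbitrary):

{
  "size": 0,
  "aggs": {
    "distinct_sellers": {
      "terms": { "field": "sellerId", "size": 1000 }
    }
  }
}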
I would rebuild my mapping to create a nested document type, with seller being the parent and stockid being the nested object:
{
  "properties": {
    "sellerid": { "type": "integer" },
    "stock_obj": {
      "type": "nested",
      "properties": {
        "stockid": { "type": "integer" }
      }
    }
  }
}
When you rebuild your index, you would create only one document per seller, and each seller would have all of their stock ids. It seems like there are about 10 stocks per seller, which Elasticsearch can handle fine. (If there were thousands of stocks per seller, I would do this differently.)
Then, I would do a search for N sellers, sorted randomly, and then as a second sort field, you would sort the stock ids randomly. Not the simplest mapping, but the query is easy and should be fast.
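A minimal sketch of the random-seller part, using function_score with random_score over the one-document-per-seller index (the index name and seed are placeholders; picking one random stock id from each returned seller document could also simply be done client-side):

GET /sellers/_search
{
  "size": 3,
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "random_score": { "seed": 42, "field": "_seq_no" }
    }
  }
}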
Also, separately, if you're just dealing with ~10k seller/stock data points that are integers, using elasticsearch is probably overkill. It can do what you want, but its main purpose is for searching large amounts of text.

Filter ES query based on aggregation results

We have an index with the following document structure:
{
  "email": "test#test.com",
  stuff ...
},
{
  "email": "test#test.com",
  stuff ...
},
{
  "email": "anotherEmail#test.com",
  stuff ...
}
We need to get all records where the number of records sharing an email is > 2, for example. I know I can use an aggregation with a minimum doc count to find the counts for all emails that have at least 2 records.
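That aggregation might look something like this (assuming email is mapped as a keyword; the agg name is illustrative):

{
  "size": 0,
  "aggs": {
    "emails_with_dupes": {
      "terms": {
        "field": "email",
        "min_doc_count": 2,
        "size": 10000
      }
    }
  }
}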
But what we actually need is to get all the records where the count per email is > X, so we need our query to constrain the results to only those records that match the aggregation.
I know that we can use a nested top_hits aggregation, but that is not good enough for us, because we need to be able to page through these results; an email could have 10k records, for example. We need these results in the hits collection so that we can page through them.
How would we go about doing something like that?

Automatically indexing by a field name as desc

I have an index of book documents, and every week some books are added.
I always query this index sorted by a field (in this case "price") in descending order, which puts some overhead on ES because of the data volume.
In this service we always show books to the user from maximum to minimum price.
Is it possible, automatically or manually, to keep the book documents in the index always sorted by price descending, so that when I query them they come back already sorted and I don't need to pass:
"sort": { "price": { "order": "desc" } }
No, you cannot keep your data ordered based on a field. Internally, Elasticsearch keeps the data as Lucene segments. Take a look here to better understand the internal structure of ES: https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up

Indexes for mongodb

I have a mongo db collection for restaurants.
e.g.
{
  _id: uniquemongoid,
  rank: 3,
  city: 'Berlin'
}
Restaurants are listed by city and ordered by rank (an integer) - should I create an index on city and rank, or city/rank compound? (I query by city and sort by rank)
Furthermore, there are several fields with booleans, e.g. { hasParking: true, familyFriendly: true }. Should I create indexes to speed up queries for these filters? Compound indexes? It's not clear to me whether I should create compound indexes, since a query can have only one boolean set, or several.
The best way to figure out whether you need indexes is to benchmark it with "explain()".
As for your suggested indexes:
You will need the city/rank compound index. Indexes in MongoDB can (at the moment) only be used left-to-right, and hence doing an equality search on "city" and then sorting the result by "rank" means the { city: 1, rank: -1 } index works best.
Indexes on boolean fields are often not very useful, as on average MongoDB will still need to access half of your documents. After doing a selection by city (and hopefully a limit!) doing an extra filter for hasParking etc will not make MongoDB use both the city/rank and the hasParking index. MongoDB can only use one index per query.
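A minimal sketch of creating that index and verifying it with explain() (createIndex is the current name for the ensureIndex helper used below):

db.restaurants.createIndex({ city: 1, rank: -1 })
db.restaurants.find({ city: 'Berlin' }).sort({ rank: -1 }).explain()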
1) Create the index { city: 1, rank: 1 }, which will serve your purpose and lets you avoid 2 separate indexes.
2) Create your documents in the following format, and you can query on any number of fields you want:
{
  info: [{ hasParking: true }, { familyFriendly: true }],
  _id: ...,
  rank: ...,
  city: ...
}
db.restaurants.ensureIndex({ info: 1 })
db.restaurants.find({ info: { hasParking: true } })
Note that MongoDB doesn't use two indexes for the same query (except for $or queries). So in case (2), if you want to add an additional filter on top of query (1), option (2) won't work. I am not sure about your requirement for (2), so I'm posting this solution anyway.
