How to get total records with pagination in Elasticsearch - elasticsearch

In my product index I have 60K records, and I'm running a match query with from and size params. By default, ES supports retrieving only 10K records per search, so I have increased that limit to 60K.
My issue is that I'm already passing a size param from the API, and I want to search across all 60K records while keeping that same pagination size param.
So how can I match against all 60K records while still using my pagination size param?
Here is my code:
const query = {
  query: {
    match: {
      name: {
        query: req.text
      }
    }
  }
}
const { body: { hits } } = await esclient.search({
  from: req.skip || 0,
  size: req.offset || 50,
  index: productsIndex,
  type: productsType,
  body: query,
});
Here is the code with the 60K size:
const query = {
  query: {
    match: {
      name: {
        query: req.text
      }
    }
  },
  size: 60000
}
In my query, if I use size=60K I get 20 records (without it I get 12 records), but then I can't use my pagination params.

If I get your question correctly:
The size parameter is the number of hits/documents you want in the response. It is not the case that Elasticsearch processes your query against only the first size documents; the query always matches against the whole index. 10K is simply the maximum number of documents you can get back in a single response, and if you specify a size greater than 10K, Elasticsearch will return an error.
So there is nothing to worry about here: just use from and size as the parameters for your pagination.
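Since the query always runs against the whole index, the only thing the API layer needs to compute is from and size. A minimal sketch of that calculation (the pageParams helper is hypothetical, not part of the original code):

```javascript
// Hypothetical helper: convert a 1-based page number into Elasticsearch
// from/size parameters, guarding against the default 10,000-hit window
// (index.max_result_window).
function pageParams(page, pageSize, maxWindow = 10000) {
  const from = (page - 1) * pageSize;
  if (from + pageSize > maxWindow) {
    throw new Error(
      `from + size (${from + pageSize}) exceeds max_result_window (${maxWindow})`
    );
  }
  return { from, size: pageSize };
}

// Page 3 with 50 results per page
const { from, size } = pageParams(3, 50); // from = 100, size = 50
```

For the total record count, read hits.total from the search response; note that on Elasticsearch 7+ that total is capped at 10,000 by default unless the search request sets track_total_hits: true.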

Related

What is the recommended schema for paginated GraphQL results

Let's say I have a list of users to return. What would be the best schema strategy among the following?
Option 1: the users query returns only the user data, and a separate query is used for the pagination details. The downside here is that we need to pass the same filters to both the users and usersCount queries.
query {
  users(skip: 0, limit: 100, filters: someFilter) {
    name
  }
  usersCount(filters: someFilters)
}
Which return following
{
  results: {
    users: [
      { name: "Foo" },
      { name: "Bar" },
    ],
    usersCount: 1000,
  }
}
Option 2: we make the pagination details part of the users query, so we don't need to pass the filters twice. But I feel this query is not as nice to read.
query {
  users(skip: 0, limit: 100, filters: someFilter) {
    items {
      name
    }
    count
  }
}
Which returns the following result
{
  results: {
    users: {
      items: [
        { name: "Foo" },
        { name: "Bar" },
      ],
      count: 1000,
    }
  }
}
I am curious to know which strategy is the recommended way to design paginated results.
I would recommend following the official recommendation of the GraphQL community: switch to cursor-based pagination.
This type of pagination uses a record, or a pointer to a record, in the dataset to paginate results; the cursor refers to a record in the database.
You can follow the examples in the GraphQL Cursor Connections Specification.
Also check out how GitHub does it: https://docs.github.com/en/graphql/reference/interfaces#node
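Applied to the users query from the question, a connection-style schema would look roughly like this (a sketch following the spec; the type names and SomeFilterInput are illustrative placeholders):

```graphql
type UserConnection {
  edges: [UserEdge]
  pageInfo: PageInfo!
}

type UserEdge {
  cursor: String!
  node: User
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

type Query {
  # Fetch the first N users after an opaque cursor, instead of skip/limit
  users(first: Int, after: String, filters: SomeFilterInput): UserConnection
}
```

The filters are passed once, the total is no longer needed for paging (hasNextPage replaces it), and clients resume from endCursor rather than recomputing offsets.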

Size for FaunaDB GraphQL nested query

How do I set the size for a nested query?
For instance, the following will only return a maximum of 50 users for each group.
const query = gql`
  query GetGroups {
    groups(_size: 100) {
      data {
        _id
        name
        users(_size: 500) {
          data {
            _id
            name
          }
        }
      }
    }
  }
`
It looks like you wrote your query correctly. You would get a GraphQL error if you provided the _size argument where it is not allowed.
If only 50 results show up when you provide a size greater than 50, then there are most likely exactly 50 matches.
The GraphQL query you shared will be compiled to FQL that works roughly like the following, returning the result of paginating the nested relationship with the size you provide.
Let(
  {
    v0: Paginate(Match(Index("groups")), { size: 100 })
  },
  {
    groups: {
      data: Map(
        Var("v0"),
        Lambda(
          "ref",
          Let(
            {
              v1: Paginate(
                Match(Index("group_users_by_group"), Var("ref")),
                { size: 500 }
              )
            },
            {
              users: Map(Var("v1"), /* ... select fields for each user */)
              /* ... select other fields for the group */
            }
          )
        )
      )
    }
  }
)
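One way to confirm whether more users exist beyond the page you received is to also select the page cursors that Fauna's generated GraphQL schema exposes alongside data (a sketch; exact field availability depends on your schema):

```graphql
query GetGroups {
  groups(_size: 100) {
    data {
      _id
      name
      users(_size: 500) {
        data {
          _id
          name
        }
        after  # non-null if there is another page of users
      }
    }
  }
}
```

If after comes back null with 50 users in data, those 50 are all the matches there are.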
If you are still concerned that there is an issue, please contact Fauna support at support.fauna.com, or by emailing support@fauna.com using the email you used to sign up. Then we can take a closer look at your account.

How can I respond to the client based on which fields they query in GraphQL?

I am using AWS AppSync for the GraphQL server and have a schema like:
type Order {
  id: ID!
  price: Int
  refundAmount: Int
  period: String!
}

type Query {
  orders(userId: ID!): [Order]
}
It supports querying orders by user id, and responds with an array of orders for different time periods. The response could be:
[{
  id: xxx,
  price: 100,
  refundAmount: 10,
  period: '2021-01-01'
}, {
  id: xxx,
  price: 200,
  refundAmount: 0,
  period: '2021-01-03'
},
...
]
If the price and refundAmount in a period are both 0, I don't return an element for that period in the array. In the above example, there are no price and refundAmount values on 2021-01-02, so there is no element for that date in the array.
My problem is: how can I shape the response data based on what the frontend queries? If the client only queries the refundAmount field, I don't want to include the 2021-01-03 period (where it is 0). How do I know which fields the frontend wants in the response?
e.g.
If clients send this query:
query {
  orders (userId: "someUserId") {
    refundAmount
  }
}
I would respond with the data below, but I don't want the second element to be there, since its value is 0.
[{
  id: xxx,
  refundAmount: 10,
  period: '2021-01-01'
}, {
  id: xxx,
  refundAmount: 0,
  period: '2021-01-03'
}]
My problem is: how can I shape the response data based on what the frontend queries?
GraphQL does that out of the box for you, provided you have resolvers for the fields in the query. Pick the appropriate resolver based on your underlying data source.
How do I know which fields the frontend wants in the response?
That is for the frontend to decide; it sends a different query depending on the fields it is interested in. A few examples below.
If the frontend is interested in only one field i.e. refundAmount, then it would send a query something like this.
query {
  orders (userId: "someUserId") {
    refundAmount
  }
}
If it is interested in more than one field, say price and refundAmount, then the query would look something like this:
query {
  orders (userId: "someUserId") {
    price
    refundAmount
  }
}
Update: filtering the response
Based on the updated question, you need to enhance your resolver to do this additional filtering. There are two options:
1. The resolver always applies the filter (hard-coded, e.g. refundAmount > 0).
2. Support a filter input in the query model, e.g. orders(userId: ID!, filter: OrderFilterInput): [Order], and define the criteria on which you want to filter. Then apply those filter criteria in the resolvers when querying the underlying data source, taking the criteria from the client.
Look at the ModelPostFilterInput generated model in this example.
Edit 2: adds a changed schema with a filter
Let's say you change your schema to support filtering, there are no additional VTL request/response mappers, and you talk directly to a Lambda.
This is how the schema would look (your mutations and subscriptions are omitted here):
input IntFilterInput { # All the filtering you want to support for Int fields
  ne: Int
  eq: Int
  le: Int
  lt: Int
  ge: Int
  gt: Int
}

type Order {
  id: ID!
  price: Int
  refundAmount: Int
  period: String!
}

input OrderFilterInput { # Only supports filtering by refundAmount; add more filters if you need them
  refundAmount: IntFilterInput
}

type Query {
  orders(userId: ID!, filter: OrderFilterInput): [Order] # An optional filter input
}

schema {
  query: Query
}
Let's say you attach the Lambda resolver to the orders query.
In this case, the Lambda needs to return an array/list of Orders.
If you forward this query to some table/API, you need to interpret the filter and build an appropriate query or API call for the downstream system.
Below is a simple Lambda with a hard-coded response; bringing in the filter is what changes:
const getFilterFunction = (operator, key, value) => {
  switch (operator) {
    case "ne":
      return x => x[key] != value;
    case "eq":
      return x => x[key] == value;
    case "le":
      return x => x[key] <= value;
    case "lt":
      return x => x[key] < value;
    case "ge":
      return x => x[key] >= value;
    case "gt":
      return x => x[key] > value;
    default:
      throw Error("Unsupported filter operation");
  }
}

exports.handler = async (event) => {
  let response = [{
    "id": "xxx1",
    "refundAmount": 10,
    "period": '2021-01-01'
  }, {
    "id": "xxx2",
    "refundAmount": 0,
    "period": '2021-01-03'
  }];
  const filter = event.arguments.filter;
  if (filter) { // If possible, send the filter to your downstream system rather than handling it in the Lambda
    if (filter.refundAmount) {
      const refundAmountFilters = Object.keys(filter.refundAmount)
        .map(operator => getFilterFunction(operator, "refundAmount", filter.refundAmount[operator]));
      refundAmountFilters.forEach(filterFunction => { response = response.filter(filterFunction) });
    }
  }
  return response; // You don't have to return only the fields the query asks for; AppSync takes care of that. Just return a list of orders.
};
With the above in place, you can send various queries, like:
query MyQuery {
  orders(userId: "1") { # without any filters
    id
    refundAmount
  }
}

query MyQuery {
  orders(userId: "1", filter: {refundAmount: {ne: 0}}) { # the filter you are interested in
    id
    refundAmount
  }
}

query MyQuery {
  orders(userId: "1", filter: {refundAmount: {ne: 0, gt: 5}}) { # mix and match filters
    id
    refundAmount
  }
}
You don't have to support all the operators for filtering; you can focus only on ne (i.e. !=) and simplify things further. Look at this blog for a simpler version where the filter operation is assumed.
Finally, the other way to filter without modifying the schema is to change only your Lambda, so that it returns an already-filtered set of results, either by doing the filtering itself or by sending an appropriate query/request to the underlying system.

Passing Variables into GraphQL Query in Gatsby

I want to limit the number of posts fetched on my index page. Currently, the number of pages is hard-coded into the GraphQL query.
query {
  allMarkdownRemark(limit: 5, sort: { fields: [frontmatter___date], order: DESC }) {
    totalCount
    edges {
      node {
        ...
      }
    }
  }
}
I want to replace the 5 with the value of a variable. String interpolation will not work with the graphql tag function, so I have to use another method.
Is there a way to achieve this and pass variables into a GraphQL query in GatsbyJS?
You can only pass variables to a GraphQL query via context, since string interpolation doesn't work that way. In page queries (as opposed to static queries) you can pass a variable using the context object argument of the createPage API. So you'll need to move this page creation into your gatsby-node.js and use something like:
const limit = 10;

posts.forEach(({ node }) => { // "posts": the results of your query in gatsby-node.js
  createPage({
    path: node.fields.slug,
    component: path.resolve(`./src/pages/index.js`), // your index path
    // values in the context object are passed in as variables to page queries
    context: {
      limit: limit,
    },
  })
})
Now your context object carries a limit value, with whatever logic you need behind it (here it's a simple number, but you could compute it). In your index.js:
query yourQuery($limit: Int) {
  allMarkdownRemark(limit: $limit, sort: { fields: [frontmatter___date], order: DESC }) {
    totalCount
    edges {
      node {
        ...
      }
    }
  }
}

Get All Pages with an Apollo Query

Suppose I have 500 rows in my database. I want to get 100 pages of 50 rows. Here is an example of a fetchMore request.
const fetchNextPage = (props) => {
  props.Query.fetchMore({
    query: gql(getRows),
    variables: {
      skip: props.Query.rows.length,
    },
    updateQuery: (previousResult, next) => {
      return {
        ...previousResult,
        rows: [...previousResult.rows, ...next.fetchMoreResult.rows],
      };
    },
  });
}
What I'm unsure about is:
1. How can I fetch all pages without additional user action?
2. Since I know the total number of pages needed, how can I request them in parallel?
You can actually do that with a single operation. Using aliases, you can request the same field multiple times with different arguments.
Here is the official explanation of aliases.
In your case it would look something like:
query GetAllPages {
  page1rows: rows(skip: 0, limit: 50) { # "skip" and "limit" are just regular argument names
    #...rowFields
  }
  page2rows: rows(skip: 50, limit: 50) {
    #...rowFields
  }
  #... etc.
}
In this example, page1rows and page2rows are aliases for the rows field. You can choose other aliases.
Note that skip and limit are nothing special; they are plain arguments, and their meaning depends on the schema and resolvers on the server. I see you use skip in your code; if you know you'll always get 50 rows, then limit is redundant.
The response should be something like:
{
  "data": {
    "page1rows": [
      //... rows
    ],
    "page2rows": [
      //... rows
    ]
    //... etc.
  }
}
This approach works without additional user action: you get all pages at once, and there's no need to use fetchMore.
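Since the total row count is known up front, the aliased query can be generated instead of hand-written. A sketch (buildAllPagesQuery is a hypothetical helper, not part of Apollo):

```javascript
// Hypothetical helper: build one aliased query string that fetches
// `pages` pages of `pageSize` rows each in a single round trip.
function buildAllPagesQuery(pages, pageSize, rowFields = "id") {
  const parts = [];
  for (let i = 0; i < pages; i++) {
    parts.push(
      `  page${i + 1}rows: rows(skip: ${i * pageSize}, limit: ${pageSize}) { ${rowFields} }`
    );
  }
  return `query GetAllPages {\n${parts.join("\n")}\n}`;
}

// 500 rows in pages of 50 -> 10 aliased fields in one query
const queryText = buildAllPagesQuery(10, 50);
```

The result can be wrapped in gql and sent with a single client.query call; concatenating the pageNrows arrays back into one list then happens client-side.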