Function local ownership where references are required - elasticsearch

I am currently trying to build a request to ElasticSearch using the elasticsearch dependency.
Below, find the simplified version of the code I have written:
fn test<'a>(client: &'a Elasticsearch) -> BoxFuture<'a, std::result::Result<Bytes, elasticsearch::Error>> {
    let index_parts = ["foo", "bar"]; // Imagine this list being computed and not literal
    let search_response = client
        .search(SearchParts::Index(&index_parts))
        .from(0)
        .size(1000)
        .body(json!({ "query": { "match_all": {} } }))
        .send();
    search_response
        .and_then(|resp| resp.bytes())
        .boxed()
}
The error I get:
cannot return value referencing local variable index_parts
returns a value referencing data owned by the current function
I totally understand why I get this error: I create a new array inside of test, but SearchParts::Index only accepts a &'b [&'b str], so I have no way of giving it ownership. So I am stuck.
Of course there are a couple of simple solutions, first and foremost simply inlining test instead of keeping it a separate function, or somehow returning index_parts together with the Future, but those solutions leak implementation details, which is bad.
So, how do I fix this error without breaking encapsulation?
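The same error can be reproduced without elasticsearch at all: a function can never return something that borrows one of its own locals, and the general cure is to hand the local's ownership to the returned value. A minimal sketch of the pattern, with a closure standing in for the future (names are made up):

```rust
// Compiles: `move` transfers ownership of `parts` into the closure, so the
// returned value owns its data instead of borrowing a function-local.
fn fixed() -> impl Fn() -> usize {
    let parts = vec!["foo", "bar"];
    move || parts.len()
}

// Without `move` (i.e. `|| parts.len()`), the closure would merely borrow
// `parts`, and the compiler would reject returning it with the same
// "returns a value referencing data owned by the current function" error.

fn main() {
    let f = fixed();
    println!("{}", f()); // prints 2
}
```

The async-based solutions below apply the same idea: make the returned future own index_parts (and, later, a clone of the client).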

A colleague of mine suggested the following solution:
async fn test<'a>(client: &'a Elasticsearch) -> std::result::Result<Bytes, elasticsearch::Error> {
    let index_parts = vec!["foo", "bar"]; // Imagine this list being computed and not literal
    let search_response = client
        .search(SearchParts::Index(&index_parts))
        .from(0)
        .size(1000)
        .body(json!({ "query": { "match_all": {} } }))
        .send()
        .await?;
    search_response.bytes().await
}
Although I understand that an async fn body is compiled into a state machine that owns its locals (much like a move closure), I feel like there should be a way of doing this without async as well, but I am not sure.
There is a further problem with this solution though: the returned Future's lifetime is bound to the lifetime of the Elasticsearch reference, which leads to issues down the line.
To fix this, clone the client up front and return the async move block itself from a non-async fn. The returned Future then owns everything it needs and is no longer tied to the reference:
fn test(client: &Elasticsearch) -> impl Future<Output = std::result::Result<Bytes, elasticsearch::Error>> + 'static {
    let client = client.clone(); // Elasticsearch is cheaply cloneable
    async move {
        let index_parts = ["foo", "bar"];
        let search_response = client
            .search(SearchParts::Index(&index_parts))
            .from(0)
            .size(1000)
            .body(json!({ "query": { "match_all": {} } }))
            .send()
            .await?;
        Ok(search_response.bytes().await?)
    }
}
Note that an async fn wrapping the block in an extra .await would not help: the outer future would still borrow the &Elasticsearch argument.

Msearch Elasticsearch API - Rust

By this point, I feel like I am the only other person on earth using multi-search from Rust... other than the person who wrote it.
There is zero documentation on this other than the hyper-confusing https://docs.rs/elasticsearch/7.14.0-alpha.1/elasticsearch/struct.Msearch.html
I figured I had to pass MsearchParts as an argument to client.msearch(...), and luckily for me there is a piece of documentation for how that is supposed to look, but it is so sparse that I have no clue what to do, because I did not write the API.
I have no clue of how to pass my JSON
{"index":"cat_food"}
{"query":{"term":{"name":{"term":"Whiskers"}}}}
{"index":"cat_food"}
{"query":{"term":{"name":{"term":"Chicken"}}}}
{"index":"cat_food"}
{"query":{"term":{"name":{"term":"Turkey"}}}}
(Not shown in the code: an extra empty line is required at the end by Elasticsearch multi-searches.)
and get a 200 response.
As a side note, my JSON is well formatted into a string that can be sent in a normal reqwest; the issue is more about how to turn that JSON string into MsearchParts.
The body needs to conform to the structure specified for the msearch API
The multi search API executes several searches from a single API
request. The format of the request is similar to the bulk API format
and makes use of the newline delimited JSON (NDJSON) format.
The structure is as follows:
header\n
body\n
header\n
body\n
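As a sanity check of the format itself, the header/body interleaving with the required trailing newline can be sketched with plain strings (a hypothetical helper for illustration; the crate's body() builder produces this payload for you from a Vec of bodies):

```rust
// Build an msearch NDJSON payload from (header, body) pairs.
fn to_ndjson(pairs: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for (header, body) in pairs {
        out.push_str(header);
        out.push('\n');
        out.push_str(body);
        out.push('\n'); // every line ends in '\n', including the final one
    }
    out
}

fn main() {
    let payload = to_ndjson(&[(
        r#"{"index":"cat_food"}"#,
        r#"{"query":{"term":{"name":{"term":"Whiskers"}}}}"#,
    )]);
    assert_eq!(payload.lines().count(), 2);
    assert!(payload.ends_with('\n'));
    print!("{payload}");
}
```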
let client = Elasticsearch::default();
let msearch_response = client
    .msearch(MsearchParts::None)
    .body::<JsonBody<Value>>(vec![
        json!({"index":"cat_food"}).into(),
        json!({"query":{"term":{"name":{"term":"Whiskers"}}}}).into(),
        json!({"index":"cat_food"}).into(),
        json!({"query":{"term":{"name":{"term":"Chicken"}}}}).into(),
        json!({"index":"cat_food"}).into(),
        json!({"query":{"term":{"name":{"term":"Turkey"}}}}).into(),
    ])
    .send()
    .await?;
let json: Value = msearch_response.json().await?;
// enumerate over the response objects in the response
for (idx, response) in json["responses"].as_array().unwrap().iter().enumerate() {
    println!();
    println!("response {}", idx);
    println!();
    // print the name of each matching document
    for hit in response["hits"]["hits"].as_array().unwrap() {
        println!("{}", hit["_source"]["name"]);
    }
}
The above example uses MsearchParts::None, but because all search requests target the same index, the index can be specified with MsearchParts::Index(...) and then doesn't need to be repeated in the header for each search request:
let client = Elasticsearch::default();
let msearch_response = client
    .msearch(MsearchParts::Index(&["cat_food"]))
    .body::<JsonBody<Value>>(vec![
        json!({}).into(),
        json!({"query":{"term":{"name":{"term":"Whiskers"}}}}).into(),
        json!({}).into(),
        json!({"query":{"term":{"name":{"term":"Chicken"}}}}).into(),
        json!({}).into(),
        json!({"query":{"term":{"name":{"term":"Turkey"}}}}).into(),
    ])
    .send()
    .await?;
let json: Value = msearch_response.json().await?;
// enumerate over the response objects in the response
for (idx, response) in json["responses"].as_array().unwrap().iter().enumerate() {
    println!();
    println!("response {}", idx);
    println!();
    // print the name of each matching document
    for hit in response["hits"]["hits"].as_array().unwrap() {
        println!("{}", hit["_source"]["name"]);
    }
}
Because msearch's body() fn takes a Vec<T> where T implements the Body trait, and Body is implemented for &str, you can pass a string literal directly
let msearch_response = client
    .msearch(MsearchParts::None)
    .body(vec![r#"{"index":"cat_food"}
{"query":{"term":{"name":{"term":"Whiskers"}}}}
{"index":"cat_food"}
{"query":{"term":{"name":{"term":"Chicken"}}}}
{"index":"cat_food"}
{"query":{"term":{"name":{"term":"Turkey"}}}}
"#])
    .send()
    .await?;
let json: Value = msearch_response.json().await?;
Per the doc of MsearchParts, it looks like an array of &str needs to be used to construct the Index (or IndexType) variant of enum MsearchParts. So, please give the following a try and see if it works.
let parts = MsearchParts::Index(&[
    r#"{"index":"cat_food"}"#,
    r#"{"query":{"term":{"name":{"term":"Whiskers"}}}}"#,
    r#"{"index":"cat_food"}"#,
    r#"{"query":{"term":{"name":{"term":"Chicken"}}}}"#,
    r#"{"index":"cat_food"}"#,
    r#"{"query":{"term":{"name":{"term":"Turkey"}}}}"#
]);
After hours of investigating I took another approach, using a body vector and the msearch API.
I think the JSON doesn't go into MsearchParts but into a vector of bodies.
(see https://docs.rs/elasticsearch/7.14.0-alpha.1/elasticsearch/#request-bodies)
It runs, but the response gives me a 400 error, and I don't know why.
I assume it's the missing empty (JSON) header lines, as required in the Elastic console.
What do you think?
let mut body: Vec<JsonBody<_>> = Vec::with_capacity(4);
body.push(json!({
    "query": {
        "match": { "title": "bee" }
    }
}).into());
body.push(json!({
    "query": {
        "multi_match": {
            "query": "tree",
            "fields": ["title", "info"]
        }
    },
    "from": 0,
    "size": 2
}).into());
let search_response = client
    .msearch(MsearchParts::Index(&["nature_index"]))
    .body(body)
    .send()
    .await?;
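Given the earlier answers, the 400 is most likely those missing header lines: msearch expects one header per search body, even an empty {} when the index is fixed by MsearchParts::Index. A sketch of the interleaving, with plain strings standing in for the crate's JsonBody values:

```rust
// Prepend an empty header line to each search body, as the msearch
// NDJSON format requires a header/body pair per search.
fn with_headers(bodies: Vec<String>) -> Vec<String> {
    let mut out = Vec::with_capacity(bodies.len() * 2);
    for body in bodies {
        out.push("{}".to_string()); // empty header: the index comes from MsearchParts::Index
        out.push(body);
    }
    out
}

fn main() {
    let body = with_headers(vec![
        r#"{"query":{"match":{"title":"bee"}}}"#.to_string(),
        r#"{"query":{"multi_match":{"query":"tree","fields":["title","info"]}}}"#.to_string(),
    ]);
    assert_eq!(body.len(), 4);
    assert_eq!(body[0], "{}");
    println!("{}", body.join("\n"));
}
```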

How to collect and return warnings from services while processing GraphQL query?

What is the best way to collect some specific property from all the leaves of the GraphQL graph, reducing it to a single array? For example, my service functions can "throw" arbitrary string warnings which I want to collect and return to the client besides the main data. Expected output:
type EntityOutput {
  entity: Entity
  warnings: [String!]
}
Resolver:
@Mutation()
async updateEntity(
  @Args('id', ParseUUIDPipe) id: string,
  @Args('data') input: UpdateDto
): Promise<EntityOutputDto> {
  return {
    entity: await this.service.update(id, input),
    warnings: [] // ???
  };
}
Service method:
async update(id: string, input: UpdateDto): Promise<Entity> {
  const entity = await this.repository.findOneOrFail(id, { relations: ['type'] }); // check existence
  if (Object.values(input).some(v => v !== undefined)) {
    const updateData: Partial<Entity & UpdateDto> = Object.assign({ id }, input);
    if (input.isCurrentEntityOfItsType === true) {
      await this.typesService.update(entity.type.id, { currentEntityId: id }); // <-- this can also create its own warnings
    } else if (input.isCurrentEntityOfItsType === false) {
      await this.typesService.update(entity.type.id, { currentEntityId: null as any });
    }
    await this.repository.save(updateData);
  } else {
    console.warn(`No properties to change were provided`); // <-- this is a warning I want to save
  }
  return this.findOne(id);
}
I think my question can be split into two:
1. How to collect warnings from the service, i.e., in the general case, from a call stack of arbitrary depth. This looks more like a general programming problem than a NestJS thing.
2. Even with that implemented, NestJS will walk the GraphQL graph by itself, and nested fields can produce additional warnings.
The solution in its complete general form would probably be over-complicated, but can anyone at least suggest a good design for the case represented by the example code?
I have a couple of thoughts:
- Should every function in the service return its warnings alongside its main response (for example, in a tuple), so we can incrementally "fold" the array of warnings while "unfolding" the call stack?
- Maybe it would be better to implement this with some decorator marking our service methods?
- Maybe RxJS, beloved by NestJS, can offer a solution? (I don't know much about this library or its philosophy.)
- Actually, the default form of the NestJS output already looks similar to what I want: it's a JSON with two root properties, "errors" and "data", and they can be sent together when an error is not fatal enough to abort. Can we somehow override the default response object schema and place warnings there?
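The first thought, returning warnings alongside each result and folding them up the call stack, can be sketched like this (all names are hypothetical, standing in for the real service methods):

```typescript
// Each service method returns its value together with any warnings it
// produced; callers merge the callee's warnings into their own.
interface WithWarnings<T> {
  value: T;
  warnings: string[];
}

function updateType(typeId: string): WithWarnings<string> {
  return { value: typeId, warnings: [`type ${typeId} had no current entity`] };
}

function updateEntity(id: string, hasChanges: boolean): WithWarnings<string> {
  const warnings: string[] = [];
  if (!hasChanges) {
    warnings.push('No properties to change were provided');
  }
  const inner = updateType('type-of-' + id);
  warnings.push(...inner.warnings); // fold the callee's warnings into ours
  return { value: id, warnings };
}

const result = updateEntity('42', false);
console.log(result.warnings.length); // 2
```

The drawback is that every signature in the chain changes, which is exactly what the context-based solution below avoids.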
The whole question is heavily inspired by this SO discussion but it unfortunately says nothing about the actual possible implementation.
So I've implemented a custom context factory which is executed automatically on every GraphQL request and constructs the object of desired format:
app.module.ts:
export interface AppContext {
  warnings: string[];
}

const contextFactory: ContextFunction<any, AppContext> = () => ({
  warnings: []
});
Now we can benefit from our newly created interface to add strong typings whenever we reference the context, e.g.:
some.resolver.ts
@Mutation()
async remove(
  @Args('id', ParseUUIDPipe) id: string,
  @Context() ctx: AppContext
): Promise<FindOneDto> {
  return new FindOneDto(await this.service.remove(id, ctx.warnings));
}
Here the service can add its own warnings to the context.
To collect all of them and return them to the API caller, I override the formatResponse function and append the warnings to extensions (a special GraphQL meta-field meant for extra, non-data information):
app.module.ts:
const graphqlConfig: GqlModuleOptions = {
  context: contextFactory,
  formatResponse: (
    response: GraphQLResponse | null,
    context: GraphQLRequestContext<AppContext>,
  ): GraphQLResponse => {
    const warnings = context.context.warnings;
    if (warnings.length) {
      if (response) {
        const extensions = response.extensions || (response.extensions = {});
        extensions.warnings = warnings;
      } else {
        return { extensions: { warnings } };
      }
    }
    return response || {};
  },
  ...
}
A similar approach is used in the official Apollo extension example: https://github.com/apollographql/apollo-server/blob/main/packages/apollo-tracing/src/index.ts.
The only drawback I see now is that injecting the context into the resolver's arguments breaks compliance with the auto-generated TypeScript interfaces (I use the schema-first approach). In that case, we can switch to request-scoped providers, so our resolver/service class instance is created individually for each request: https://docs.nestjs.com/fundamentals/injection-scopes. Then we can access the context right in the methods without introducing any additional parameters, though this comes with increased latency and, perhaps, memory consumption. Another approach would be to create a standalone Nest interceptor.

Strapi GraphQL search by multiple attributes

I've got a very simple Nuxt app with Strapi GraphQL backend that I'm trying to use and learn more about GraphQL in the process.
One of my last features is to implement a search feature where a user enters a search query, and Strapi/GraphQL performs that search based on attributes such as image name and tag names that are associated with that image. I've been reading the Strapi documentation and there's a segment about performing a search.
So in my schema.graphql, I've added this line:
type Query {
  ...other generated queries
  searchImages(searchQuery: String): [Image]
}
Then in the /api/image/config/schema.graphql.js file, I've added this:
module.exports = {
  query: `
    searchImages(searchQuery: String): [Image]
  `,
  resolver: {
    Query: {
      searchImages: {
        resolverOf: 'Image.find',
        async resolver(_, { searchQuery }) {
          if (searchQuery) {
            const params = {
              name_contains: searchQuery,
              // tags_contains: searchQuery,
              // location_contains: searchQuery,
            }
            const searchResults = await strapi.services.image.search(params);
            console.log('searchResults: ', searchResults);
            return searchResults;
          }
        }
      }
    },
  },
};
At this point I'm just trying to return results in the GraphQL playground, however when I run something simple in the Playground like:
query($searchQuery: String!) {
  searchImages(searchQuery: $searchQuery) {
    id
    name
  }
}
I get the error: "TypeError: Cannot read property 'split' of undefined".
Any ideas what might be going on here?
UPDATE:
For now, I'm using deep filtering instead of the search like so:
query($searchQuery: String) {
  images(
    where: {
      tags: { title_contains: $searchQuery }
      name_contains: $searchQuery
    }
  ) {
    id
    name
    slug
    src {
      url
      formats
    }
  }
}
This is not ideal because the two conditions are combined with AND rather than OR, meaning it's not searching by tag title or image name; it seems to only hit the first where. Ideally I would like to use Strapi's search service.
I actually ran into this problem not too long ago and took a different solution.
The where condition can be combined with either _and or _or, as seen below.
_or
articles(where: {
  _or: [
    { content_contains: $dataContains },
    { description_contains: $dataContains }
  ]})
_and
(where: {
  _and: [
    { slug_contains: $categoriesContains }
  ]})
Additionally, these operators can be combined, given that where in this instance is an object.
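For instance, a hedged sketch nesting _or inside _and (the field names here are made up for illustration):

```graphql
articles(where: {
  _and: [
    { published_at_null: false },
    { _or: [
        { content_contains: $dataContains },
        { description_contains: $dataContains }
      ] }
  ]
})
```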
For your solution I would presume you want an or condition in your where filter predicate like below
images(where: {
  _or: [
    { title_contains: $searchQuery },
    { name_contains: $searchQuery }
  ]})
Lastly, you can perform a query that filters by a predicate by creating an event schema and adding the @search directive, as seen here.

How to implement subscriptions in graphql.js?

This is a pretty simple question.
How do I implement subscriptions in GraphQL?
I'm asking specifically about using graphql.js constructors, like below.
I could not find a clean/simple implementation.
There is another question here, but it deals with relay.js; I don't want to unnecessarily increase the number of external dependencies in my app.
What i have:
module.exports = function (database){
  return new GraphQLSchema(
    { query: RootQuery(database)
    , mutation: RootMutation(database)
    , subscription: RootSubscription(database) // I can see this in GraphiQL - see below
    }
  );
}

function RootSubscription(database){
  return new GraphQLObjectType(
    { name: "RootSubscriptionType"
    , fields:
      { getCounterEvery2Seconds:
        { type: new GraphQLNonNull(GraphQLInt)
        , args: { id: { type: GraphQLString } }
        , subscribe(parent, args, context){
            // this subscribe function is never called .. why?
            const iterator = simpleIterator()
            return iterator
          }
        }
      }
    }
  )
}
I learned from this GitHub issue that I need a subscribe() which must return an iterator.
And here is a simple async iterator. All this iterator does is increase and return the counter every 2 seconds; when it reaches 10 it stops.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms))

function simpleIterator(){
  return {
    [ Symbol.asyncIterator ]: () => {
      let i = 0
      return {
        next: async function(){
          i++
          await delay(2000)
          if(i > 10){
            return { done: true }
          }
          return {
            value: i,
            done: false
          }
        }
      }
    }
  }
}
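You can sanity-check the iterator outside GraphQL entirely by driving it with for await...of; if the loop sees every value, the iterator is fine and the problem lies in how the subscription is executed. A self-contained version with a shortened delay and limit for the demo:

```javascript
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Same shape as simpleIterator above, with a configurable limit.
function countingIterator(limit) {
  let i = 0;
  return {
    [Symbol.asyncIterator]: () => ({
      next: async () => {
        i++;
        await delay(10);
        return i > limit ? { done: true } : { value: i, done: false };
      }
    })
  };
}

async function main() {
  const seen = [];
  for await (const n of countingIterator(3)) {
    seen.push(n);
  }
  console.log(seen.join(',')); // 1,2,3
}

main();
```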
When I run the subscription in GraphiQL, it returns null for some reason.
I'm piecing together code from multiple sources, basically wasting time and hacking at it. Can you help me figure this one out?
Subscriptions are such a big feature; where are they properly documented? Where is that snippet of code which you just copy-paste, like there is for queries, for example - look here.
Also, I can't use an example where the schema is separate, as a string/from a file. I already created my schema with JavaScript constructors, and now that I'm trying to add subscriptions, I can't just move back to a string schema; that would require rewriting the entire project. Or can I actually have both? Thanks :)

Parallel promise execution in resolve functions

I have a question about handling promises in resolve functions for a GraphQL client. Traditionally, resolvers would be implemented on the server, but I am wrapping a REST API on the client.
Background and Motivation
Given resolvers like:
const resolvers = {
  Query: {
    posts: (obj, args, context) => {
      return fetch('/posts').then(res => res.json());
    }
  },
  Post: {
    author: (obj, args, _, context) => {
      return fetch(`/users/${obj.userId}`)
        .then(res => res.json())
        .then(data => cache.users[data.id] = data);
    }
  }
};
If I run the query:
posts {
  author {
    firstName
  }
}
and the /posts API behind Query.posts() returns four post objects:
[
  {
    "id": 1,
    "body": "It's a nice prototyping tool",
    "user_id": 1
  },
  {
    "id": 2,
    "body": "I wonder if he used logo?",
    "user_id": 2
  },
  {
    "id": 3,
    "body": "Is it even worth arguing?",
    "user_id": 1
  },
  {
    "id": 4,
    "body": "Is there a form above all forms? I think so.",
    "user_id": 1
  }
]
the Post.author() resolver will get called four times to resolve the author field.
graphql-js has a very nice feature where each of the promises returned from the Post.author() resolver executes in parallel.
I've further been able to eliminate re-fetching authors with the same userId using Facebook's dataloader library. BUT, I'd like to use a custom cache instead of dataloader.
The Question
Is there a way to prevent the Post.author() resolver from executing in parallel? Inside the Post.author() resolver, I would like to fetch authors one at a time, checking my cache in between to prevent duplicate http requests.
But, right now the promises returned from Post.author() are queued and executed at once, so I cannot check the cache before each request.
Thank you for any tips!
I definitely recommend looking at DataLoader as it's designed to solve exactly this problem. If you don't use it directly, at least you can read its implementation (which is not that many lines) and borrow the techniques atop your custom cache.
GraphQL and the graphql.js libraries themselves are not concerned with loading data - they leave that up to you via resolver functions. Graphql.js is just calling these resolver functions as eagerly as it can to provide for the fastest overall execution of your query. You can absolutely decide to return Promises which resolve sequentially (which I wouldn't recommend), or—as DataLoader implements—deduplicate with memoization (which is what you want for solving this).
For example:
const resolvers = {
  Post: {
    author: (obj, args, _, context) => {
      return fetchAuthor(obj.userId)
    }
  }
};

// Very simple memoization
var authorPromises = {};

function fetchAuthor(id) {
  var author = authorPromises[id];
  if (!author) {
    author = fetch(`/users/${id}`)
      .then(res => res.json())
      .then(data => cache.users[data.id] = data);
    authorPromises[id] = author;
  }
  return author;
}
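To see the deduplication at work without a server, here is the same pattern with a counted stand-in for fetch (names are hypothetical): four resolver calls for two distinct ids trigger only two underlying requests.

```javascript
let calls = 0;
const authorPromises = {};

// Stand-in for fetch(`/users/${id}`): counts how often it is actually invoked.
function fakeFetchAuthor(id) {
  calls++;
  return Promise.resolve({ id, firstName: 'Author' + id });
}

// Memoize by id: the first call per id stores the promise, later calls reuse it.
function fetchAuthor(id) {
  if (!authorPromises[id]) {
    authorPromises[id] = fakeFetchAuthor(id);
  }
  return authorPromises[id];
}

Promise.all([fetchAuthor(1), fetchAuthor(2), fetchAuthor(1), fetchAuthor(1)])
  .then(authors => {
    console.log(authors.length, calls); // 4 2
  });
```

Because the promise itself (not its result) is memoized, concurrent calls for the same id are deduplicated even before the first request resolves.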
Just for people who use a dataSource for REST APIs along with DataLoader (in this case DataLoader doesn't really help, as it's a single request), here is a simple caching solution/example.
export class RetrievePostAPI extends RESTDataSource {
  constructor() {
    super()
    this.baseURL = 'http://localhost:3000/'
  }

  postLoader = new DataLoader(async ids => {
    return await Promise.all(
      ids.map(async id => {
        if (cache.keys().includes(id)) {
          return cache.get(id)
        } else {
          const postPromise = this.get(`posts/${id}`)
          cache.put(id, postPromise, 1000 * 60) // cache the promise for one minute
          return postPromise
        }
      })
    )
  })

  async getPost(id) {
    return this.postLoader.load(id)
  }
}
Note: here I use memory-cache for the caching mechanism.
Hope this helps.
