Msearch Elasticsearch API - Rust - elasticsearch

By this point, I feel like I am the only person on earth using multi-search in Rust, other than the person who wrote it.
There is zero documentation on this other than the hyper-confusing page at https://docs.rs/elasticsearch/7.14.0-alpha.1/elasticsearch/struct.Msearch.html
I figured I had to pass an MsearchParts value as an argument, as in client.msearch(here_goes_msearch_parts), and luckily for me there is a piece of documentation for how that is supposed to look, but it is so sparse that I have no clue what to do, because I did not write the API.
I have no clue how to pass my JSON:
{"index":"cat_food"}
{"query":{"term":{"name":{"term":"Whiskers"}}}}
{"index":"cat_food"}
{"query":{"term":{"name":{"term":"Chicken"}}}}
{"index":"cat_food"}
{"query":{"term":{"name":{"term":"Turkey"}}}}
(not shown in the code: Elasticsearch multi-searches require an extra empty line at the end)
and get a 200 response.
As a side note, my JSON is well formatted into a string that can be sent in a normal reqwest request; the issue is more about how to turn that JSON string into MsearchParts.

The body needs to conform to the structure specified for the msearch API:

The multi search API executes several searches from a single API request. The format of the request is similar to the bulk API format and makes use of the newline delimited JSON (NDJSON) format.

The structure is as follows:

header\n
body\n
header\n
body\n
let client = Elasticsearch::default();
let msearch_response = client
    .msearch(MsearchParts::None)
    .body::<JsonBody<Value>>(vec![
        json!({"index":"cat_food"}).into(),
        json!({"query":{"term":{"name":{"term":"Whiskers"}}}}).into(),
        json!({"index":"cat_food"}).into(),
        json!({"query":{"term":{"name":{"term":"Chicken"}}}}).into(),
        json!({"index":"cat_food"}).into(),
        json!({"query":{"term":{"name":{"term":"Turkey"}}}}).into(),
    ])
    .send()
    .await?;

let json: Value = msearch_response.json().await?;

// enumerate over the response objects in the response
for (idx, response) in json["responses"].as_array().unwrap().iter().enumerate() {
    println!();
    println!("response {}", idx);
    println!();
    // print the name of each matching document
    for hit in response["hits"]["hits"].as_array().unwrap() {
        println!("{}", hit["_source"]["name"]);
    }
}
The above example uses MsearchParts::None, but because all of the search requests target the same index, the index can be specified with MsearchParts::Index(...) instead, and then it doesn't need to be repeated in the header of each search request:
let client = Elasticsearch::default();
let msearch_response = client
    .msearch(MsearchParts::Index(&["cat_food"]))
    .body::<JsonBody<Value>>(vec![
        json!({}).into(),
        json!({"query":{"term":{"name":{"term":"Whiskers"}}}}).into(),
        json!({}).into(),
        json!({"query":{"term":{"name":{"term":"Chicken"}}}}).into(),
        json!({}).into(),
        json!({"query":{"term":{"name":{"term":"Turkey"}}}}).into(),
    ])
    .send()
    .await?;

let json: Value = msearch_response.json().await?;

// enumerate over the response objects in the response
for (idx, response) in json["responses"].as_array().unwrap().iter().enumerate() {
    println!();
    println!("response {}", idx);
    println!();
    // print the name of each matching document
    for hit in response["hits"]["hits"].as_array().unwrap() {
        println!("{}", hit["_source"]["name"]);
    }
}
Because msearch's body() fn takes a Vec<T> where T implements the Body trait, and Body is implemented for &str, you can also pass a string literal directly:
let msearch_response = client
    .msearch(MsearchParts::None)
    .body(vec![r#"{"index":"cat_food"}
{"query":{"term":{"name":{"term":"Whiskers"}}}}
{"index":"cat_food"}
{"query":{"term":{"name":{"term":"Chicken"}}}}
{"index":"cat_food"}
{"query":{"term":{"name":{"term":"Turkey"}}}}
"#])
    .send()
    .await?;

let json: Value = msearch_response.json().await?;
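All of these variants produce the same request body on the wire: one JSON object per line, with a trailing newline at the end (the extra empty line the Elastic console requires). A dependency-free sketch of that NDJSON framing, with the JSON written out by hand purely for illustration; the crate does this serialization for you when you pass a Vec of bodies:

```rust
// Build an msearch request body in NDJSON form: every line, including the
// last, is terminated by '\n', which yields the trailing empty line that
// Elasticsearch multi-search requires.
fn to_ndjson(lines: &[&str]) -> String {
    let mut out = String::new();
    for line in lines {
        out.push_str(line);
        out.push('\n');
    }
    out
}

fn main() {
    let body = to_ndjson(&[
        r#"{"index":"cat_food"}"#,
        r#"{"query":{"term":{"name":{"term":"Whiskers"}}}}"#,
    ]);
    assert!(body.ends_with('\n'));
    assert_eq!(body.lines().count(), 2);
}
```

The sketch only makes the header/body line pairing visible; it is not how the crate is implemented internally.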

Per the doc of MsearchParts, it looks like an array of &str is needed to construct the Index (or IndexType) variant of the MsearchParts enum. So please give the following a try and see if it works.
let parts = MsearchParts::Index(&[
    r#"{"index":"cat_food"}"#,
    r#"{"query":{"term":{"name":{"term":"Whiskers"}}}}"#,
    r#"{"index":"cat_food"}"#,
    r#"{"query":{"term":{"name":{"term":"Chicken"}}}}"#,
    r#"{"index":"cat_food"}"#,
    r#"{"query":{"term":{"name":{"term":"Turkey"}}}}"#
]);

After hours of investigating I took another approach, using a body vector and the msearch API.
I think the JSON doesn't go into MsearchParts but into a vector of bodies.
(see https://docs.rs/elasticsearch/7.14.0-alpha.1/elasticsearch/#request-bodies)
It runs, but the response gives me a 400 error, and I don't know why.
I assume it's the missing empty (JSON) bodies, as required in the Elastic console.
What do you think?
let mut body: Vec<JsonBody<_>> = Vec::with_capacity(4);
body.push(json!({
    "query": {
        "match": { "title": "bee" }
    }
}).into());
body.push(json!({
    "query": {
        "multi_match": {
            "query": "tree",
            "fields": ["title", "info"]
        }
    },
    "from": 0,
    "size": 2
}).into());

let search_response = client
    .msearch(MsearchParts::Index(&["nature_index"]))
    .body(body)
    .send()
    .await?;
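The guess about the missing header lines matches the msearch format quoted earlier: every search body must be preceded by a header line, even when the index is supplied via MsearchParts::Index; in that case the header is just an empty {} object (so the likely fix in the code above is to push json!({}).into() before each query body). A dependency-free sketch of the corrected line framing:

```rust
// Pair every search body with a header line, as msearch requires. With
// MsearchParts::Index the index is in the URL path, so the header can be
// an empty `{}` object, but it must still be present.
fn with_empty_headers(bodies: &[&str]) -> Vec<String> {
    let mut lines = Vec::new();
    for body in bodies {
        lines.push("{}".to_string());    // header line (index comes from the path)
        lines.push((*body).to_string()); // search body line
    }
    lines
}

fn main() {
    let lines = with_empty_headers(&[
        r#"{"query":{"match":{"title":"bee"}}}"#,
        r#"{"query":{"multi_match":{"query":"tree","fields":["title","info"]}}}"#,
    ]);
    // Two searches produce four lines: header, body, header, body.
    assert_eq!(lines.len(), 4);
    assert_eq!(lines[0], "{}");
}
```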

Related

Function local ownership where references are required

I am currently trying to build a request to ElasticSearch using the elasticsearch dependency.
Below is a simplified version of the code I have written:
fn test<'a>(client: &'a Elasticsearch) -> BoxFuture<'a, std::result::Result<Bytes, elasticsearch::Error>> {
    let index_parts = ["foo", "bar"]; // Imagine this list being computed and not literal
    let search_response = client
        .search(SearchParts::Index(&index_parts))
        .from(0)
        .size(1000)
        .body(json!({ "query": { "match_all": {} } }))
        .send();
    search_response
        .and_then(|resp| resp.bytes())
        .boxed()
}
The error I get:
cannot return value referencing local variable index_parts
returns a value referencing data owned by the current function
I totally understand why I get this error: I create a new array inside of test, but SearchParts::Index expects a &'b [&'b str], so I have no way of giving it ownership. So I am stuck.
Of course there are a couple of simple solutions, first and foremost simply inlining test instead of creating a separate function, or somehow returning index_parts along with the Future, but those solutions leak implementation details, and we all know that is bad.
So, how do I fix this error without breaking encapsulation?
A colleague of mine suggested the following solution:
async fn test<'a>(client: &'a Elasticsearch) -> std::result::Result<Bytes, elasticsearch::Error> {
    let index_parts = vec!["foo", "bar"]; // Imagine this list being computed and not literal
    let search_response = client
        .search(SearchParts::Index(&index_parts))
        .from(0)
        .size(1000)
        .body(json!({ "query": { "match_all": {} } }))
        .send()
        .await?;
    search_response.bytes().await
}
Although I understand that with async, everything inside the function body is effectively a closure, I feel like there should be a way of doing this without async as well, but I am not sure.
There is a further problem with this solution, though: the returned Future has a lifetime bound to the lifetime of the Elasticsearch reference, which leads to issues down the line.
To fix this:
async fn test(client: &Elasticsearch) -> std::result::Result<Bytes, elasticsearch::Error> {
    let client = client.clone();
    async move {
        let index_parts = ["foo", "bar"];
        let search_response = client
            .search(SearchParts::Index(&index_parts))
            .from(0)
            .size(1000)
            .body(json!({ "query": { "match_all": {} } }))
            .send()
            .await?;
        Ok(search_response.bytes().await?)
    }.await
}
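The same ownership principle can be seen without the elasticsearch crate at all: a value returned from a function (here a closure, standing in for the future) cannot borrow a function-local, but it can take ownership of one with move. A minimal std-only sketch; the names are made up for illustration:

```rust
// A returned closure, like a returned future, cannot borrow a local that
// dies when the function returns, but `move` lets it take ownership.
fn make_index_list() -> impl Fn() -> String {
    let index_parts = vec!["foo".to_string(), "bar".to_string()];
    move || index_parts.join(",") // the closure now owns index_parts
}

fn main() {
    let f = make_index_list();
    assert_eq!(f(), "foo,bar");
}
```

Without `move`, this fails to compile with the same "returns a value referencing data owned by the current function" family of errors; that is exactly what `async move` buys in the accepted workaround.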

Transcoding from HTTP to gRPC

rpc CreateBook(CreateBookRequest) returns (Book) {
  option (google.api.http) = {
    post: "/v1/{parent=publishers/*}/books"
    body: "book"
  };
}

message CreateBookRequest {
  // The publisher who will publish this book.
  // When using HTTP/JSON, this field is automatically populated based
  // on the URI, because of the `{parent=publishers/*}` syntax.
  string parent = 1 [
    (google.api.field_behavior) = REQUIRED,
    (google.api.resource_reference) = {
      child_type: "library.googleapis.com/Book"
    }];
  Book book = 2 [(google.api.field_behavior) = REQUIRED];
  string book_id = 3;
}
I don't understand post: "/v1/{parent=publishers/*}/books".
I thought publishers was a field in CreateBookRequest whose value gets populated into the HTTP path, so it would be something like:
post: "/v1/parent=publishers_field_value/books"
But publishers is not a field in CreateBookRequest.
No, publishers is part of the expected value of the parent field. So suppose you have a protobuf request like this:
{
  "parent": "publishers/pub1",
  "book_id": "xyz",
  "book": {
    "author": "Stacy"
  }
}
That can be transcoded by a client into an HTTP request with:
Method: POST
URI: /v1/publishers/pub1/books?bookId=xyz (with the appropriate host name)
Body:
{
"author": "Stacy"
}
If you try to specify a request with a parent that doesn't match publishers/*, I'd expect transcoding to fail.
That's in terms of transcoding from protobuf to HTTP, in the request. (That's the direction I'm most familiar with, having been coding it in C# just this week...)
In the server, it should just be the opposite - so given the HTTP request above, the server should come up with the original protobuf request including parent="publishers/pub1".
For a lot more information on all of this, see the proto defining HttpRule.
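As an illustration only (not how a real transcoder is implemented), the mapping described above can be sketched as plain string handling: verify that parent matches publishers/*, splice it into the /v1/.../books path template, and push the leftover non-body field book_id into the query string:

```rust
// Transcode a CreateBookRequest-style message to HTTP following
// `post: "/v1/{parent=publishers/*}/books"` with `body: "book"`:
// `parent` is spliced into the path and `book_id` becomes a query param.
fn transcode(parent: &str, book_id: &str) -> Option<String> {
    // `{parent=publishers/*}` only matches "publishers/<one segment>".
    let rest = parent.strip_prefix("publishers/")?;
    if rest.is_empty() || rest.contains('/') {
        return None;
    }
    Some(format!("/v1/{}/books?bookId={}", parent, book_id))
}

fn main() {
    assert_eq!(
        transcode("publishers/pub1", "xyz").as_deref(),
        Some("/v1/publishers/pub1/books?bookId=xyz")
    );
    // A parent that doesn't match publishers/* fails to transcode.
    assert_eq!(transcode("authors/a1", "xyz"), None);
}
```

The book message itself is not shown here; per body: "book", it travels as the HTTP request body rather than in the path or query string.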

How to run a GraphQL query against FaunaDB from an HTML client

I am totally new to GraphQL and FaunaDB, so please bear with me if it's a silly question.
I see I can run a GraphQL query from the dashboard > GRAPHQL, e.g. by pasting the following code
query FindAllTodos {
  allTodos {
    data {
      _id
      title
      completed
      list {
        title
      }
    }
  }
}
and hitting the Run button. But how can I run this query from my HTML/JS code in the browser?
In JS I can create the client SDK, but I'm not sure how to pass the above query:
import faunadb, { query as q } from 'faunadb';

let adminClient = new faunadb.Client({
  secret: 'my-key'
});
On googling I found examples that used FQL-like structures such as
adminClient.query(
  q.Get(q.Ref(q.Collection('Todo'), '276653641074475527'))
)
.then((ret) => console.log(ret));
but how can I just pass the GraphQL query and get the same result that the GraphQL playground returns in its right-side pane?
You can use a client like curl or any GraphQL client.
With curl you can issue something like:
curl -X POST -H 'Authorization: Bearer <your key>' https://graphql.fauna.com/graphql -d '{ "query": "{ FindAllTodos{ data {_id title completed list { title }} }}"}'
I can get you 90% there, but the code I present to you is written in TypeScript in an Angular app that uses HttpClient and RxJS Observables. With a little effort you can rewrite it in plain JS using vanilla fetch.
By the way here is a video by Brecht De Rooms that helped me a lot:
https://www.youtube.com/watch?v=KlUPiQaTp0I
const SERVER_KEY = 'Your server key goes here';

const executeQuery = (query: string) => {
  const headers = new HttpHeaders().set('Authorization', 'Bearer ' + SERVER_KEY);
  return this.http.post<any>('https://graphql.fauna.com/graphql',
    JSON.stringify({ query }), { headers });
}

const findAllTodos = () => {
  const query = `query FindAllTodos {
    allTodos {
      data {
        _id
        title
        completed
        list {
          title
        }
      }
    }
  }`;
  return executeQuery(query);
}
findAllTodos().subscribe(console.log);
For me the hurdle was learning that the Fauna server expects JSON in this form:
{ "query": "query FindAllTodos {allTodos { ... and so forth and so on ..." }
That same structure applies when you run a mutation:
{ "query": "mutation AddTodo { ...etc... " }
By the way, if your query doesn't work the first time (which it probably won't), I recommend opening the Network tab of your browser's developer tools and inspecting the request that was sent to the Fauna server. Look at the response: there will be error information in it. The response status will be 200 (OK) even when there are errors, so you need to look inside the response to check for them.
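To make the expected envelope concrete: it is a JSON object whose query member holds the entire GraphQL document as a single escaped string (in the browser, JSON.stringify({ query }) produces exactly this). A dependency-free sketch of the escaping involved:

```rust
// Wrap a GraphQL document in the `{"query":"..."}` envelope that the
// Fauna GraphQL endpoint expects, escaping the string so it stays valid JSON.
fn envelope(query: &str) -> String {
    let escaped = query
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n");
    format!("{{\"query\":\"{}\"}}", escaped)
}

fn main() {
    let body = envelope("{ allTodos { data { _id title } } }");
    assert_eq!(body, r#"{"query":"{ allTodos { data { _id title } } }"}"#);
}
```

The same envelope carries mutations: the string just starts with "mutation ..." instead of "query ...".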

Resolve to the same object from two incoherent sources in graphql

I have a problem I don't know how to solve properly.
I'm working on a project where we use a GraphQL server to communicate with different APIs. These APIs are old and very difficult to update, so we decided to use GraphQL to simplify our communication.
For now, two APIs allow me to get user data. I know it's not coherent, but sadly I can't change anything about that, and I need to use both of them for different actions. So for the sake of simplicity, I would like to abstract this away from my front app, so that it only asks for user data, always in the same format, no matter which API the data comes from.
With only one API, the resolver system of GraphQL helped a lot. But now that I access user data from a second API, I find it very difficult to always send back the same object to my front page. The two APIs, even though they hold mostly the same data, have different response formats. So in my resolvers, depending on where the data comes from, I have to do one thing or another.
Example :
API A
type User {
id: string,
communication: Communication
}
type Communication {
mail: string,
}
API B
type User {
id: string,
mail: string,
}
I've heard a bit about apollo-federation, but I can't put a GraphQL server in front of every API of our system, so I'm kind of lost on how to achieve transparency for my front app when data comes from two different sources.
If anyone has already encountered the same problem or has advice on something I can do, I'm all ears :)
You need to decide what "shape" of the User type makes sense for your client app, regardless of what's being returned by the REST APIs. For this example, let's say we go with:
type User {
  id: String
  mail: String
}
Additionally, for the sake of this example, let's assume we have a getUser field that returns a single user. Any arguments are irrelevant to the scenario, so I'm omitting them here.
type Query {
  getUser: User
}
Assuming I don't know which API to query for the user, our resolver for getUser might look something like this:
async () => {
  const [userFromA, userFromB] = await Promise.all([
    fetchUserFromA(),
    fetchUserFromB(),
  ])

  // transform response
  if (userFromA) {
    const { id, communication: { mail } } = userFromA
    return {
      id,
      mail,
    }
  }

  // response from B is already in the correct "shape", so just return it
  if (userFromB) {
    return userFromB
  }
}
Alternatively, we can utilize individual field resolvers to achieve the same effect. For example:
const resolvers = {
  Query: {
    getUser: async () => {
      const [userFromA, userFromB] = await Promise.all([
        fetchUserFromA(),
        fetchUserFromB(),
      ])
      return userFromA || userFromB
    },
  },
  User: {
    mail: (user) => {
      if (user.communication) {
        return user.communication.mail
      }
      return user.mail
    }
  },
}
Note that you don't have to match your schema to either response from your existing REST endpoints. For example, maybe you'd like to return a User like this:
type User {
  id: String
  details: UserDetails
}

type UserDetails {
  email: String
}
In this case, you'd just transform the response from either API to fit your schema.

How to get requested fields inside GraphQL resolver?

I am using graphql-tools. After receiving a GraphQL query, I execute a search using Elasticsearch and return the data.
However, the requested query usually includes only a few of the possible fields, not all of them. I want to pass only the requested fields on to Elasticsearch.
First, I need to get the requested fields.
I can already get the whole query as a string. For example, in the resolver,
const resolvers = {
  Query: {
    async user(p, args, context) {
      // can print the query as follows
      console.log(context.query)
    }
    // ...
  }
}
It prints as
query User { user(id:"111") { id name address } }
Is there any way to get the requested fields in a format like
{ id:"", name:"", address:"" }
In graphql-js, resolvers expose a fourth argument called resolve info. This argument contains more information about the field.
From the GraphQL docs' GraphQLObjectType config parameter type definition:
// See below about resolver functions.
type GraphQLFieldResolveFn = (
source?: any,
args?: {[argName: string]: any},
context?: any,
info?: GraphQLResolveInfo
) => any
type GraphQLResolveInfo = {
fieldName: string,
fieldNodes: Array<Field>,
returnType: GraphQLOutputType,
parentType: GraphQLCompositeType,
schema: GraphQLSchema,
fragments: { [fragmentName: string]: FragmentDefinition },
rootValue: any,
operation: OperationDefinition,
variableValues: { [variableName: string]: any },
}
In the fieldNodes field you can search for your field and get the selectionSet for that particular field. From here it gets tricky, since the selections can be normal field selections, fragments, or inline fragments. You would have to merge all of them to know all the fields that are selected on a field.
There is an info object passed as the 4th argument in the resolver. This argument contains the information you're looking for.
It can be helpful to use a library such as graphql-fields to help you parse the GraphQL query data:
const graphqlFields = require('graphql-fields');

const resolvers = {
  Query: {
    async user(_, args, context, info) {
      const topLevelFields = graphqlFields(info);
      console.log(Object.keys(topLevelFields)); // ['id', 'name', 'address']
    },
  },
};
Similarly, for graphql-java you may do the same by extending the resolver method's parameters with myGetUsersResolverMethod(... DataFetchingEnvironment env).
This DataFetchingEnvironment is injected for you, and you can traverse the DataFetchingEnvironment object for any part of the graph/query.
This object lets you know more about what is being fetched and what arguments have been provided.
Example:
public List<User> getUsers(final UsersFilter filter, DataFetchingEnvironment env) {
    DataFetchingFieldSelectionSet selectionSet = env.getSelectionSet();
    selectionSet.getFields();    // <-- list of selected fields
    selectionSet.getArguments(); // <-- similar, but a Map
    ...
}
In fact, you may be alluding to look-ahead data fetching. The above should give you enough insight into the fields requested, and you can take it from there to tailor your downstream calls manually. You may also look into a more efficient approach using data fetchers; see the graphql-java documentation on building efficient data fetchers by looking ahead.
