Elasticsearch Java API: Wait until document findable in search results? - elasticsearch

I have written a REST API (javax.ws.rs) that uses the high-level Elasticsearch API. This is with ES 7.2.
The client needs to index a record and then execute a search that includes that record, but there is some delay after the index operation before the document actually appears in searches.
Is there any way to block the index operation until the newly indexed record appears in search results?
Failing that, is there any way to get an asynchronous notification that the document is now searchable?
To give an idea of my use case, here is the code from the client side:
const cr = await this.client.dNodeCreate(fixedNode).toPromise();
const fr = await this.client.dNodeGetById(cr._id).toPromise();
await this.client.dNodeCreate(replyRoot).toPromise();
The first line causes an Index request to ES and returns the status object, which includes the ID of the new document.
The second line fetches the record by ID. This always works.
The third line fails. The document it attempts to index is dependent on the first document, which the REST middleware attempts to look up by a search (not by the ID). This is the equivalent of an SQL relation enforced by the REST layer.
I can always make the code work by introducing a delay (say 1500 ms) before the third call, but this is not a robust solution. It might always work in development (all the servers are on my laptop and there are no other users), but there is no way to predict how long the delay needs to be in production.
UPDATE: Solved.
The marked answer below seems to do the trick. For reference, the needed call in the Java API looks like this:
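import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.support.WriteRequest;
IndexRequest req = new IndexRequest(DNode.INDEX);
req.setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL); // respond only once the document is searchable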

The 'refresh' parameter is what you are looking for. From the Elasticsearch documentation:
refresh (Index API): If true, Elasticsearch refreshes the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false do nothing with refreshes. Valid values: true, false, wait_for. Default: false
So your index request should look something like this:
PUT /<index>/_doc/<_id>?refresh=wait_for
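For a client that does not use the high-level REST client, the same request can be sent with the JDK's own java.net.http client. The sketch below assumes Java 11+ and an unsecured cluster at a base URL you pass in (for example http://localhost:9200); the class and method names are hypothetical. The blocking call returns only after a refresh has made the document visible to search, and the sendAsync variant completes its future at that same point. That is not a separate notification mechanism, but it can serve as a simple "now searchable" callback.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

class WaitForRefresh {

    // Builds PUT /<index>/_doc/<id>?refresh=wait_for with a JSON body.
    // baseUrl, index, id and json are placeholders supplied by the caller.
    static HttpRequest buildIndexRequest(String baseUrl, String index, String id, String json) {
        String path = "/" + enc(index) + "/_doc/" + enc(id) + "?refresh=wait_for";
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // URL-encodes a path segment (spaces become %20 rather than '+').
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Blocking variant: returns only after a refresh has made the document searchable.
    static HttpResponse<String> indexAndWait(HttpClient client, HttpRequest req) throws Exception {
        return client.send(req, HttpResponse.BodyHandlers.ofString());
    }

    // Non-blocking variant: the future completes after that same refresh.
    static CompletableFuture<HttpResponse<String>> indexAsync(HttpClient client, HttpRequest req) {
        return client.sendAsync(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

With the high-level Java client, the equivalent is the setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL) call shown in the question's update.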
I do not believe there is a built-in way to get an asynchronous notification that the document is now searchable. That being said, if you already have access to the document ID it might make more sense to use that in the code instead of a search.

Related

Spring WebClient: Call Rest Service which has paginated response

I want to call a service that returns a paginated response using WebClient, i.e. I call the service, check whether it returns hasMoreElements as TRUE, and if so call it again with updated request parameters such as START_ROW, END_ROW and PAGE_NUMBER. What is the best approach to achieve this? Currently I am just looping through the results and calling the service again, but there should be a better approach. Please find my pseudocode below. Are there any libraries I can use?
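boolean hasMoreElements = true;
while (hasMoreElements) {
    // Blocking POST; REQUEST carries the current START_ROW, END_ROW and PAGE_NUMBER
    RESPONSE_TYPE response = webClient.post()
            .headers(h -> h.addAll(HEADERS)) // HEADERS assumed to be an HttpHeaders instance
            .bodyValue(REQUEST)
            .retrieve()
            .bodyToMono(RESPONSE_TYPE.class)
            .block();
    // Get the new START_ROW, END_ROW and PAGE_NUMBER and set them in REQUEST
    hasMoreElements = response.hasMoreElements(); // accessor name depends on RESPONSE_TYPE
}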
Use JpaRepository with paging for this.
You can return a list of objects and check whether its length is less than the limit passed; if so, you can stop fetching.
You could also return a Page or a Slice instead, which gives you a bit more information about the current and next fetch cycle.
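The stopping rule described above (stop when the service reports no more elements, or when a page comes back shorter than the requested size) can be sketched independently of WebClient or Spring Data. In this minimal sketch, Page, PagedFetcher and fetchAll are hypothetical names, and fetchPage stands in for the blocking webClient.post()...block() call; START_ROW and END_ROW would be derived from pageNumber and pageSize inside it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

class PagedFetcher {

    // Hypothetical page shape: the items returned plus the service's hasMoreElements flag.
    static final class Page<T> {
        final List<T> items;
        final boolean hasMoreElements;

        Page(List<T> items, boolean hasMoreElements) {
            this.items = items;
            this.hasMoreElements = hasMoreElements;
        }
    }

    // Calls fetchPage(pageNumber, pageSize) repeatedly, accumulating all items in order.
    static <T> List<T> fetchAll(BiFunction<Integer, Integer, Page<T>> fetchPage, int pageSize) {
        List<T> all = new ArrayList<>();
        int pageNumber = 0;
        while (true) {
            Page<T> page = fetchPage.apply(pageNumber, pageSize);
            all.addAll(page.items);
            // Stop on the explicit flag, or when fewer than pageSize items came back.
            if (!page.hasMoreElements || page.items.size() < pageSize) {
                return all;
            }
            pageNumber++;
        }
    }
}
```

Checking both conditions means a service that forgets to clear its flag on an empty last page still terminates the loop. With WebClient specifically, Reactor's expand operator can express the same recursion without blocking.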

Is there a way to delay cache revalidation in service worker?

I am currently working on performance improvements for a React-based SPA. Most of the basic optimizations are already done, so I started looking into more advanced techniques such as service workers.
The app makes quite a lot of requests on each page (most of the calls are not to REST endpoints but to an endpoint that runs different SQL queries against the database, hence the number of calls). The data in the DB is not updated very often, so we keep a local cache of the responses, but that cache is lost whenever a user refreshes the page. This is where I wanted to use the service worker: to keep the responses either in the cache storage or in IndexedDB (I went with the latter).
A cache-first approach does not fit well here, because there is still a chance that the data becomes stale. So I tried to implement the stale-while-revalidate strategy: fetch the data once; then, if the response for a given request is already in the cache, return it, but also make a real request and update the cache just in case.
I tried the approach from Jake Archibald's offline cookbook, but it seems the app still waits for the real requests to resolve even when there is a cache entry to return (I see those responses in the Network tab).
Basically, the sequence seems to be: request > cache entry found > update the cache > only then show the data. Doing the update immediately is unnecessary in my case, so I was wondering if there is any way to delay it? Or, alternatively, not to wait for the "real" response to resolve?
Here's the code that I currently have (serializeRequest, cachePut and cacheMatch are helper functions that I have to communicate with IndexedDB):
self.addEventListener('fetch', (event) => {
  // some checks to get out of the event handler if certain conditions don't match...
  event.respondWith(
    serializeRequest(request).then((serializedRequest) => {
      return cacheMatch(serializedRequest, db.post_cache).then((response) => {
        const fetchPromise = fetch(request).then((networkResponse) => {
          // Store a copy of the fresh network response (not the cached one, which may be undefined)
          cachePut(serializedRequest, networkResponse.clone(), db.post_cache);
          return networkResponse;
        });
        return response || fetchPromise;
      });
    })
  );
});
Thanks in advance!
EDIT: Can this be due to the fact that I put stuff into IndexedDB instead of cache? I am sort of forced to use IndexedDB instead of the cache because those "magic endpoints" are POST instead of GET (because of the fact they require the body) and POST cannot be inserted into the cache...
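The behaviour being asked for is the stale-while-revalidate pattern itself: on a cache hit, answer immediately and let the refresh finish in its own time. The minimal Java sketch below illustrates that control flow outside the browser (it is not service-worker code; the class name is hypothetical and the loader function stands in for the network fetch). On a hit, get returns the cached value without waiting for the background refresh; only a miss blocks on the loader. The refresh request still happens, so in the JavaScript version it will still appear in the Network tab.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Stale-while-revalidate: serve the cached value immediately and refresh it in the background.
class StaleWhileRevalidateCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the real network request; must not return null
    private final Executor executor;     // where background refreshes run

    StaleWhileRevalidateCache(Function<K, V> loader, Executor executor) {
        this.loader = loader;
        this.executor = executor;
    }

    V get(K key) {
        V cached = cache.get(key);
        if (cached == null) {
            // Cache miss: nothing to serve yet, so the caller has to wait for the load.
            V fresh = loader.apply(key);
            cache.put(key, fresh);
            return fresh;
        }
        // Cache hit: start the refresh but do not wait for it.
        // A failed refresh completes exceptionally and simply leaves the stale value in place.
        CompletableFuture.supplyAsync(() -> loader.apply(key), executor)
                .thenAccept(fresh -> cache.put(key, fresh));
        return cached;
    }
}
```

Concurrent hits on the same key can start overlapping refreshes here; a production version might track in-flight refreshes per key to avoid that.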

GraphQL Playground Keeps forcing server's context function to run? [duplicate]

I was learning GraphQL and was about to finish the tutorial; this had never happened before.
The problem is that the GraphQL server keeps receiving requests after opening GraphQL Playground in the browser even though no query or mutation is made.
I see these sorts of responses being returned by the server:
{
"name":"deprecated",
"description":"Marks an element of a GraphQL schema as no longer supported.",
"locations":[
"FIELD_DEFINITION",
"ENUM_VALUE"
],
"args":[
{
"name":"reason",
"description":"Explains why this element was deprecated, usually also including a suggestion for how to access supported similar data. Formatted using the Markdown syntax (as specified by [CommonMark](https://commonmark.org/).",
"type":{
"kind":"SCALAR",
"name":"String",
"ofType":null
},
"defaultValue":"\"No longer supported\""
}
]
}
This is expected behavior.
GraphQL Playground issues an introspection query to your server. It uses the result of that query to provide validation and autocompletion for your queries. Playground will send that query to your server repeatedly (every 2 seconds by default) so that if your schema changes, these changes can be immediately reflected in the UI (although there's an issue with this feature at the moment).
You can adjust the relevant settings (click on the settings icon in the top right corner of the Playground UI) to either change the polling frequency or turn it off entirely:
'schema.polling.enable': true, // enables automatic schema polling
'schema.polling.endpointFilter': '*localhost*', // endpoint filter for schema polling
'schema.polling.interval': 2000, // schema polling interval in ms
However, the behavior you're seeing is only related to Playground, so it's harmless and won't impact any other clients connecting to your server.

IBM Integration Bus, best practices for calling multiple services

I have a requirement that takes in one document and, from it, needs to create one or more documents in the output.
In the course of this, it needs to determine whether each document already exists, because different operations apply to the create and update scenarios.
In straight code, this would be (conceptually) simple:
InputData in = <something>
if (getItemFromExternalSystem(in.key1) == null) {
createItemSpecificToKey1InExternalSystem(in.key1);
}
if (getItemFromExternalSystem(in.key2) == null) {
createItemSpecificToKey2InExternalSystem(in.key1, in.key2);
}
createItemFromInput(in.key1,in.key2, in.moreData);
In effect a kind of "ensure this data is present".
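For reference, the "ensure this data is present" logic from the pseudocode can be written as a small runnable Java sketch, with an in-memory map standing in for the external system (EnsurePresent, ensure and process are hypothetical names). Each ensure step is called only for its side effect and its result is discarded; the input stays untouched throughout, which is the property the IIB flow needs to preserve.

```java
import java.util.HashMap;
import java.util.Map;

// A toy "external system": an in-memory map standing in for the real service calls.
class EnsurePresent {
    final Map<String, String> store = new HashMap<>();

    // Create the item only if a lookup finds nothing; return whether a create happened.
    boolean ensure(String key, String value) {
        if (store.containsKey(key)) {
            return false;
        }
        store.put(key, value);
        return true;
    }

    // Mirrors the pseudocode: ensure key1, ensure key2 (which depends on key1),
    // then create the final item; the input values are never replaced.
    void process(String key1, String key2, String moreData) {
        ensure(key1, "item for " + key1);
        ensure(key2, "item for " + key1 + "/" + key2);
        store.put(key1 + "/" + key2, moreData);
    }
}
```

Against a real external system, a lookup followed by a separate create is not atomic, so two concurrent flows could both see "missing" and both create the item.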
However, how would I go about achieving this in IIB? If I used a subflow for the get/create cycle, the result of the last operation in the subflow would be returned as the new "message" of the flow. But I don't care about the value from the "ensure data present" subflow: I need to keep working on my original message, while still waiting for the different subflows to finish before I can run my final "createItem".
You can use Aggregation nodes. For example, use three flows:
the first would propagate your original message to the third;
the second would invoke the operations createItemSpecificToKey1InExternalSystem and createItemSpecificToKey2InExternalSystem;
the third would aggregate the results of the first and second and invoke createItemFromInput.
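Outside IIB, the same fan-out/fan-in shape can be sketched with CompletableFuture: one branch just carries the original message, another runs the ensure steps, and an aggregation step waits for both before calling the final create. This is a plain-Java analogy, not IIB's Aggregation node API; the class and parameter names are hypothetical. Since the key2 item depends on key1, the two ensure calls are chained rather than run in parallel.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Fan-out/fan-in analogous to the three-flow aggregation idea: the original message
// is carried alongside the side-effecting "ensure" calls and used again once both finish.
class AggregateThenCreate {

    static <M, R> R run(M original,
                        Runnable ensureKey1,
                        Runnable ensureKey2,
                        Function<M, R> createItemFromInput) {
        // Branch 2: the ensure steps, chained because key2 depends on key1.
        CompletableFuture<Void> ensured = CompletableFuture
                .runAsync(ensureKey1)
                .thenRun(ensureKey2);
        // Branch 1: simply propagates the original message.
        CompletableFuture<M> carried = CompletableFuture.completedFuture(original);
        // Aggregation point: wait for both branches, then continue with the original message.
        return ensured.thenCombine(carried, (ignored, msg) -> createItemFromInput.apply(msg))
                .join();
    }
}
```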
Have you considered using the Collector node? It will collect your records into N 'collections', and then you can iterate over the collections and output one document per collection.

What is "configuredOnly" used for in ConnectionMultiplexer.GetEndPoints()?

I am using the fantastic StackExchange.Redis library to implement ObjectCache. One of the interface methods to implement in ObjectCache is long GetCount(...) which returns the number of keys in the database. It looks like this can be satisfied by the IServer.DatabaseSize(...) method in StackExchange.Redis.
I plan on fetching the server endpoints from ConnectionMultiplexer.GetEndPoints(), getting an IServer for each endpoint, and then querying the database size for each database I am interested in on each server (ignore size discrepancies for the moment).
Now, ConnectionMultiplexer.GetEndPoints() has an optional parameter called "configuredOnly". What is the consequence of not providing it, versus true, versus false?
In the ConnectionMultiplexer.GetEndPoints() implementation, I see that it returns the EndPoints from the multiplexer configuration if configuredOnly is true, or else returns EndPoints from an array called "serverSnapshot".
As best as I can tell, "serverSnapshot" is populated here, which seems to be populated as servers are connected, or at least are attempted to be connected to.
Does GetEndPoints(true) return all EndPoints that were configured on the ConnectionMultiplexer? Does GetEndPoints() and GetEndPoints(false) return EndPoints that actually are connected/valid? The documentation for the GetEndPoints method with respect to the configuredOnly parameter is sparse, and my subsequent use of the returned EndPoints needs one behavior and not the other.
When configuredOnly is set to true, GetEndPoints() only returns endpoints for the Redis servers explicitly specified in the call to ConnectionMultiplexer.Connect(). When configuredOnly is false, endpoints are returned for every Redis server in the cluster, whether or not it was specified in the initial ConnectionMultiplexer.Connect() call.
Somewhat strangely, if you use DNS names in the ConnectionMultiplexer.Connect() call, GetEndPoints(false) will return entries for both the DNS name and the resolved IP address. For example, with a six-node Redis cluster, the following code:
using System;
using StackExchange.Redis;

ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("localhost:6379,localhost:6380");
foreach (var endpoint in redis.GetEndPoints(false))
{
    Console.WriteLine(endpoint.ToString());
}
will output
127.0.0.1:6379
Unspecified/localhost:6379
Unspecified/localhost:6380
127.0.0.1:6380
127.0.0.1:6381
127.0.0.1:6382
127.0.0.1:6383
127.0.0.1:6384
If I had called redis.GetEndPoints(true), only Unspecified/localhost:6379 and Unspecified/localhost:6380 would be returned.
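Because GetEndPoints(false) can list the same server twice (once by DNS name, once by IP), summing IServer.DatabaseSize over its result could count that server twice. One way to guard against this is to resolve host names and deduplicate by address and port before querying. The sketch below shows the idea in plain Java rather than C#/StackExchange.Redis; EndpointDedup and distinctServers are hypothetical names, and only "host:port" strings with IPv4 addresses or host names are handled (bracketed IPv6 literals are not).

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Collapse "host:port" endpoints that point at the same server by resolving host names
// first, so a DNS-name entry and its IP entry are counted once.
class EndpointDedup {

    static Set<InetSocketAddress> distinctServers(List<String> endpoints) {
        Set<InetSocketAddress> servers = new LinkedHashSet<>();
        for (String ep : endpoints) {
            int colon = ep.lastIndexOf(':');
            String host = ep.substring(0, colon);
            int port = Integer.parseInt(ep.substring(colon + 1));
            InetAddress addr;
            try {
                // Performs a DNS lookup for names; IP literals are parsed without one.
                addr = InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                throw new IllegalArgumentException("Cannot resolve " + host, e);
            }
            // InetSocketAddress equality compares the resolved address and port.
            servers.add(new InetSocketAddress(addr, port));
        }
        return servers;
    }
}
```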
