Elasticsearch .NET client completion suggester performance

For some requests using the NEST or Elasticsearch.Net client, the response time is above 500 ms.
When using an HTTP client directly or going through the Kibana interface, the same query takes about 1-2 ms.
This happens even if there are very few documents in the database.
I am using the following settings on localhost:
PUT suggestions
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "suggestionelement": {
      "properties": {
        "suggest": {
          "type": "completion",
          "max_input_length": 100
        }
      }
    }
  }
}
And indexing the following documents:
POST suggestions/suggestionelement
{
  "suggest": {
    "input": "this is just some text for suggestion a",
    "weight": 1
  }
}
POST suggestions/suggestionelement
{
  "suggest": {
    "input": "this is just some text for suggestion b",
    "weight": 2
  }
}
POST suggestions/suggestionelement
{
  "suggest": {
    "input": "this is just some text for suggestion c",
    "weight": 3
  }
}
POST suggestions/suggestionelement
{
  "suggest": {
    "input": "this is just some text for suggestion d",
    "weight": 4
  }
}
POST suggestions/suggestionelement
{
  "suggest": {
    "input": "this is just some text for suggestion e",
    "weight": 5
  }
}
When running a suggest (completion) query for "this is just some text for" through the NEST or Elasticsearch.Net clients, it takes more than 500 ms.
Running the same query from Kibana or directly with HttpClient takes less than 2 ms.
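For reference, the query I run from Kibana (the fast one) is along these lines, using the same suggester name, prefix and size as the client call below:
POST suggestions/_search
{
  "suggest": {
    "sentence-suggest": {
      "prefix": "this is just some text for",
      "completion": {
        "field": "suggest",
        "size": 1000
      }
    }
  }
}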
Been at it for days... any ideas?
C# code I'm using:
var nodes = new Uri[] { new Uri("http://localhost:9200") };
var connectionPool = new StaticConnectionPool(nodes);
var connectionSettings = new ConnectionSettings(connectionPool)
    .DefaultIndex("suggestions")
    .RequestTimeout(TimeSpan.FromSeconds(30));
var searchEngineClient = new ElasticClient(connectionSettings);

for (int i = 0; i < 10; i++)
{
    return await searchEngineClient.SearchAsync<SuggestionElement>(s =>
        s.Suggest(ss => ss
            .Completion("sentence-suggest", c => c
                .Field(f => f.Suggest)
                .Prefix("this is just some text for")
                .Size(1000))));
}

Related

Elasticsearch - Logstash: filter on an aggregation created in Logstash in the Elastic index?

I posted this question on the elastic forums, but I thought I should try it here as well. The problem is as follows:
Hi,
We have Elasticsearch with Logstash (version 8.2). It inserts data into an Elastic index from a JDBC source. In Logstash we use an aggregate filter. The config looks like this:
input {
jdbc {
jdbc_connection_string => "jdbc:oracle:thin:@wntstdb03.izaaksuite.nl:1521:wntstf2"
jdbc_user => "webnext_zaken"
jdbc_password => "webnext_zaken"
jdbc_driver_library => ""
jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
statement_filepath => "/appl/sw/webnext/logstash/config_documenten/queries/documenten.sql"
last_run_metadata_path => "/appl/sw/webnext/logstash/config_documenten/parameters/.jdbc_last_run_doc"
}
}
# The filter part of this file is commented out to indicate that it is
# optional.
filter {
aggregate {
task_id => "%{zaakdoc_id}"
code => "
map['zaak_id'] ||= event.get('zaak_id')
map['result_type'] ||= event.get('result_type')
map['mutatiedatum'] ||= event.get('mutatiedatum')
map['oge_id'] ||= event.get('oge_id')
map['zaakidentificatie'] ||= event.get('zaakidentificatie')
map['zaakomschrijving'] ||= event.get('zaakomschrijving')
map['titel'] ||= event.get('titel')
map['beschrijving'] ||= event.get('beschrijving')
map['zaakdoc_id'] ||= event.get('zaakdoc_id')
map['groepsrollenlijst'] ||= []
map['groepsrollenlijst'] << {'groepsrol' => event.get('rol')}
event.cancel()
"
push_previous_map_as_event => true
timeout => 5
}
}
output {
# stdout { codec => rubydebug }
# file {
# path => ["/appl/sw/webnext/logstash/config_documenten/output/documenten.txt"]
# }
elasticsearch {
hosts => ["localhost:9200"]
index => "documenten"
document_id => "%{zaakdoc_id}"
}
}
The index config looks like this:
{
"documenten": {
"aliases": {
"izaaksuite": {}
},
"mappings": {
"properties": {
"#timestamp": {
"type": "date"
},
"#version": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"beschrijving": {
"type": "text"
},
"groepsrollenlijst": {
"properties": {
"groepsrol": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"mutatiedatum": {
"type": "date"
},
"oge_id": {
"type": "text"
},
"result_type": {
"type": "text"
},
"rol": {
"type": "text"
},
"tags": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"titel": {
"type": "text"
},
"zaak_id": {
"type": "text"
},
"zaakdoc_id": {
"type": "long"
},
"zaakidentificatie": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"zaakomschrijving": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"number_of_shards": "1",
"provided_name": "documenten",
"creation_date": "1654158264412",
"number_of_replicas": "1",
"uuid": "bf4xj4TwQ-mP5K4Orc5HEA",
"version": {
"created": "8010399"
}
}
}
}
}
One document in the index that is eventually built looks like this:
{
"_index": "documenten",
"_id": "25066386",
"_version": 1,
"_seq_no": 33039,
"_primary_term": 6,
"found": true,
"_source": {
"groepsrollenlijst": [
{
"groepsrol": "7710_AFH1"
},
{
"groepsrol": "7710_AFH2"
},
{
"groepsrol": "MR_GRP1"
}
],
"zaak_id": 44973087,
"oge_id": 98,
"#version": "1",
"#timestamp": "2022-07-11T08:24:07.717572Z",
"zaakdoc_id": 25066386,
"zaakomschrijving": "testOSiZaakAOS",
"result_type": "doc",
"titel": "Test4",
"zaakidentificatie": "077215353",
"mutatiedatum": "2022-06-27T09:51:52.078119Z",
"beschrijving": "Test4"
}
}
As you can see, the "groepsrollenlijst" is present. Now our problem: when searching, we need to match one of the values in groepsrollenlijst (Dutch for "group role list"; a group role is basically an authorisation within the application the data comes from) against the group roles of the user doing the search. This is to prevent users from getting data in their search results that they don't have access to.
Our Java code looks like this:
List<SearchResult> searchResults = new ArrayList<>();
SearchRequest searchRequest = new SearchRequest(index);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder fieldsQuery = new BoolQueryBuilder();
/*
* For each index, fetch all fields that can and may be searched. We cannot
* search across all fields, because that would also produce hits on the
* groepsrollenlijst (if you search for e.g. "rutten", you also get hits on
* group role "RUTTENGROEP", which you don't want). For documenten and
* betrokkenen you likewise don't want hits on the zaakomschrijving.
*/
String indexFields = index + "Fields";
indexFields = indexFields.substring(0, 1).toUpperCase() + indexFields.substring(1);
List<String> fields = getFieldsFor(indexFields);
// For each field, add a query for the entered search text
HighlightBuilder highlightBuilder = new HighlightBuilder();
QueryStringQueryBuilder queryStringQueryBuilder = new QueryStringQueryBuilder(autoCompleteText);
for (String field : fields) {
queryStringQueryBuilder.field(field);
highlightBuilder.field(field);
}
fieldsQuery.should(queryStringQueryBuilder);
// Manipulate the roles for testing purposes
roles.clear();
roles.add("7710_AFH1");
roles.add("7710_AFH2");
BoolQueryBuilder rolesQuery = QueryBuilders.boolQuery();
for (String role : roles) {
rolesQuery.should(QueryBuilders.wildcardQuery("groepsrol", "*" + role + "*"));
}
LOG.info("Rollen medewerker: " + roles);
BoolQueryBuilder mainQuery = new BoolQueryBuilder();
mainQuery.must(new TermsQueryBuilder("oge_id", String.valueOf(ogeId)));
mainQuery.must(fieldsQuery);
mainQuery.must(rolesQuery);
searchSourceBuilder.query(mainQuery);
searchSourceBuilder.highlighter(highlightBuilder);
searchRequest.source(searchSourceBuilder);
searchRequest.validate();
// Execute search
LOG.info("Search query: {}", searchRequest.source().toString());
SearchResponse searchResponse = null;
try {
searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
} catch (IOException | ElasticsearchStatusException e) {
// TODO Auto-generated catch block
e.printStackTrace();
return;
}
if (searchResponse == null) {
return;
}
SearchHits hits = searchResponse.getHits();
For the test we hardcoded the user's group roles into the code.
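To make the structure clear, the query this code builds (and logs) looks roughly like this, with the query_string field list abbreviated, boost values omitted, and values taken from the example document above:
{
  "query": {
    "bool": {
      "must": [
        { "terms": { "oge_id": [ "98" ] } },
        {
          "bool": {
            "should": [
              { "query_string": { "query": "testOSiZaakAOS", "fields": [ "..." ] } }
            ]
          }
        },
        {
          "bool": {
            "should": [
              { "wildcard": { "groepsrol": "*7710_AFH1*" } },
              { "wildcard": { "groepsrol": "*7710_AFH2*" } }
            ]
          }
        }
      ]
    }
  }
}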
The issue is that when we search for "testOSiZaakAOS" (one of the values in the document shown previously), which should be a hit, we don't get a result. If we comment out the "mainQuery.must(rolesQuery);" part, we do get a result, but then the roles are not taken into account.
How do we go about fixing this? So: a user has role x, some documents in the index have key-value pairs for roles x, y and z, and some have only y and z.
The search should only return those documents where role x is present.
Basically, at least one of the roles of the user should match one of the roles present in the document in the index.
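For illustration, the kind of role filter we are after would look something like this in query DSL (just a sketch; we are assuming the groepsrollenlijst.groepsrol.keyword subfield from the mapping above is the right field to match the exact role values against):
{
  "query": {
    "bool": {
      "must": [
        { "terms": { "groepsrollenlijst.groepsrol.keyword": [ "7710_AFH1", "7710_AFH2" ] } }
      ]
    }
  }
}
A terms query matches when at least one of the listed values is present in the field, which is the "any of the user's roles matches any of the document's roles" behaviour we need.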
Your help is greatly appreciated! Let me know if you need more info.

How to set "max_result_window" in Elasticsearch 7 NEST 7

By default Elasticsearch only returns 10k results, but I need to go to the last page, which exceeds 10k results.
I did some research and found a solution: setting "max_result_window": 100000.
I executed it in Kibana, and even more than 5000 pages work fine after this setting.
PUT jm-stage-products/_settings
{
"max_result_window" : 100000
}
Now I need to include this setting when I'm creating an index in my source code, but I couldn't find a way to do it.
This is my index create function. How should I set "max_result_window": 100000?
public string InitIndexing()
{
var indexName = string.Format(_config.ElasticIndexName, _config.HostingEnvironment);
//-----------------------------------------------------------
if (!_client.Indices.Exists(indexName).Exists)
{
//----------------------------------------------
var indexSettings = new IndexSettings
{
NumberOfReplicas = 0, // If this is set to 1 or more, then the index becomes yellow.
NumberOfShards = 5,
};
var indexConfig = new IndexState
{
Settings = indexSettings
};
var createIndexResponses = _client.Indices.Create(indexName, c => c
.InitializeUsing(indexConfig)
.Map<ElasticIndexGroupProduct>(m => m.AutoMap())
);
return createIndexResponses.DebugInformation;
}
else
{
return $"{_config.ElasticIndexName} already exists";
}
}
You can create an index with the max_result_window setting using the following code snippet:
var createIndexResponse = await elasticClient.Indices.CreateAsync("index_name", c => c
.Settings(s => s
.Setting(UpdatableIndexSettings.MaxResultWindow, 100000)));
An already existing index can be updated with this fluent syntax:
await elasticClient.Indices.UpdateSettingsAsync("index_name", s => s
.IndexSettings(i => i.Setting(UpdatableIndexSettings.MaxResultWindow, 100000)));
I don't know NEST, but it can easily be done while creating the index, and below is the REST API call to show that.
PUT HTTP://localhost:9200/your-index-name/
{
"settings": {
"max_result_window" : 1000000 // note this
},
"mappings": {
"properties": {
"first_name": {
"type": "text"
},
"last_name": {
"type": "text"
},
"country": {
"type": "text"
},
"state": {
"type": "text"
},
"city": {
"type": "text"
}
}
}
}
Check if it's created successfully in the index settings:
GET HTTP://localhost:9200/your-index-name/_settings
{
"so_1": {
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "so_1",
"max_result_window": "1000000", // note this
"creation_date": "1601273239277",
"number_of_replicas": "1",
"uuid": "eHBxaGf2TBG9GdmG5bvwkQ",
"version": {
"created": "7080099"
}
}
}
}
}
In Kibana it looks like this:
PUT index_name/_settings
{
"max_result_window": 10000
}

Elasticsearch: average time difference aggregate query

I have documents in Elasticsearch, and each document looks something like this:
{
"id": "T12890ADSA12",
"status": "ENDED",
"type": "SAMPLE",
"updatedAt": "2020-05-29T18:18:08.483Z",
"events": [
{
"event": "STARTED",
"version": 1,
"timestamp": "2020-04-30T13:41:25.862Z"
},
{
"event": "INPROGRESS",
"version": 2,
"timestamp": "2020-05-14T17:03:09.137Z"
},
{
"event": "INPROGRESS",
"version": 3,
"timestamp": "2020-05-17T17:03:09.137Z"
},
{
"event": "ENDED",
"version": 4,
"timestamp": "2020-05-29T18:18:08.483Z"
}
],
"createdAt": "2020-04-30T13:41:25.862Z"
}
Now I want to write a query in Elasticsearch to get all the documents of type "SAMPLE" and compute the average time between STARTED and ENDED across those documents, e.g. the average of (2020-05-29T18:18:08.483Z - 2020-04-30T13:41:25.862Z, ...). Assume that the STARTED and ENDED events are each present only once in the events array. Is there any way I can do that?
You can do something like this. The query selects the documents of type SAMPLE and status ENDED (to make sure there is an ENDED event). Then the avg aggregation uses scripting to gather the STARTED and ENDED timestamps and subtracts them to return the number of days:
POST test/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"status.keyword": "ENDED"
}
},
{
"term": {
"type.keyword": "SAMPLE"
}
}
]
}
},
"aggs": {
"duration": {
"avg": {
"script": "Map findEvent(List events, String type) {return events.find(it -> it.event == type);} def started = Instant.parse(findEvent(params._source.events, 'STARTED').timestamp); def ended = Instant.parse(findEvent(params._source.events, 'ENDED').timestamp); return ChronoUnit.DAYS.between(started, ended);"
}
}
}
}
The script looks like this:
Map findEvent(List events, String type) {
return events.find(it -> it.event == type);
}
def started = Instant.parse(findEvent(params._source.events, 'STARTED').timestamp);
def ended = Instant.parse(findEvent(params._source.events, 'ENDED').timestamp);
return ChronoUnit.DAYS.between(started, ended);

Spring Mongo - An aggregation to order by objects in an array

I have the following data:
{
"_id": ObjectID("5e2fa881c3a1a70006c5743c"),
"name": "Some name",
"policies": [
{
"cId": "dasefa-2738-4cf0-90e0d568",
"weight": 12
},
{
"cId": "c640ad67dasd0-92f981583568",
"weight": 50
}
]
}
I'm able to query this with Spring Mongo fine; however, I want to be able to order the policies by weight.
At the moment I get my results fine with:
return mongoTemplate.find(query, CArea::class.java)
However say I make the following aggregations:
val unwind = Aggregation.unwind("policies")
val sort = Aggregation.sort(Sort.Direction.DESC,"policies.weight")
How can I go and actually apply those to the returned results above? I was hoping that the dot notation would do the job in my query, however it didn't do anything, e.g. Query().with(Sort.by(options.sortDirection, "policies.weight"))
Any help appreciated.
Thanks.
I am not familiar with Spring Mongo, but I guess you can convert the following aggregation to Spring code.
db.collection.aggregate([
{
$unwind: "$policies"
},
{
$sort: {
"policies.weight": -1
}
},
{
$group: {
_id: "$_id",
"policies": {
"$push": "$policies"
},
parentFields: {
$first: "$$ROOT"
}
}
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: [
"$parentFields",
{
policies: "$policies"
}
]
}
}
}
])
This will result in:
[
{
"_id": "5e2fa881c3a1a70006c5743c",
"name": "Some name",
"policies": [
{
"cId": "c640ad67dasd0-92f981583568",
"weight": 50
},
{
"cId": "dasefa-2738-4cf0-90e0d568",
"weight": 12
}
]
}
]
Playground

Looking for Elasticsearch updateByQuery syntax example (Node driver)

You have an Elasticsearch index with two docs:
[
{
"_index": "myIndex",
"_type": "myType",
"_id": "es1472002807930",
"_source": {
"animal": "turtle",
"color": "green",
"weight": 20,
}
},
{
"_index": "myIndex",
"_type": "myType",
"_id": "es1472002809463",
"_source": {
"animal": "bear",
"color": "brown"
"weight": 400,
}
}
]
Later, you get this updated data about the bear:
{
"color": "pink",
"weight": 500,
"diet": "omnivore",
}
So, you want to update the "color" and "weight" values of the bear, and add the "diet" key to the "bear" doc. You know there's only one doc with "animal": "bear" (but you don't know the _id):
Using the Nodejs driver, what updateByQuery syntax would update the "bear" doc with these new values?
(NOTE: this question has been entirely edited to be more useful to the SO community!)
The answer was provided by Val in this other SO:
How to update a document based on query using elasticsearch-js (or other means)?
Here is the answer:
var theScript = {
"inline": "ctx._source.color = 'pink'; ctx._source.weight = 500; ctx._source.diet = 'omnivore';"
}
client.updateByQuery({
index: myindex,
type: mytype,
body: {
"query": { "match": { "animal": "bear" } },
"script": theScript
}
}, function(err, res) {
if (err) {
reportError(err)
}
cb(err, res)
}
)
The other answer is missing the point since it doesn't have any script to carry out the update.
You need to do it like this:
POST /myIndex/myType/_update_by_query
{
"query": {
"term": {
"animal": "bear"
}
},
"script": "ctx._source.color = 'green'"
}
Important notes:
- you need to make sure to enable dynamic scripting in order for this to work.
- if you are using ES 2.3 or later, then the update-by-query feature is built-in.
- if you are using ES 1.7.x or an earlier release, you need to install the update-by-query plugin.
- if you are using anything between ES 2.0 and 2.2, then you don't have any way to do this in one shot; you need to do it in two operations.
UPDATE
Your Node.js code should look like this; you're missing the body parameter:
client.updateByQuery({
index: index,
type: type,
body: {
"query": { "match": { "animal": "bear" } },
"script": { "inline": "ctx._source.color = 'pink'"}
}
}, function(err, res) {
if (err) {
reportError(err)
}
cb(err, res)
}
)
For Elasticsearch 7.4 you could use:
await client.updateByQuery({
index: "indexName",
body: {
query: {
match: { fieldName: "valueSearched" }
},
script: {
source: "ctx._source.fieldName = params.newValue",
lang: 'painless',
params: {
newValue: "newValue"
}
}
}
});
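For reference, the same request expressed in Kibana console form would be roughly the following, reusing the placeholder index, field and values from the client call above:
POST indexName/_update_by_query
{
  "query": {
    "match": { "fieldName": "valueSearched" }
  },
  "script": {
    "source": "ctx._source.fieldName = params.newValue",
    "lang": "painless",
    "params": {
      "newValue": "newValue"
    }
  }
}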
