How to set "max_result_window" in Elasticsearch 7 NEST 7 - elasticsearch

By default, Elasticsearch only returns the first 10k results. But I need to go to the last page, which exceeds 10k results.
I did some research and found a solution: setting "max_result_window" : 100000.
I executed it in Kibana, and even more than 5000 pages work fine after this setting.
PUT jm-stage-products/_settings
{
"max_result_window" : 100000
}
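For context, the 10k cap comes from the index setting index.max_result_window: a query is rejected when from + size exceeds it. A quick sketch (plain Python, not NEST) of how the window bounds deep paging:

```python
def last_reachable_page(page_size, max_result_window=10000):
    # Elasticsearch rejects a search when from + size > max_result_window,
    # so the deepest full page you can request is:
    return max_result_window // page_size

print(last_reachable_page(10))          # default window: 1000 pages
print(last_reachable_page(10, 100000))  # raised window: 10000 pages
```

Raising the window makes deep pages reachable, at the cost of more memory per deep query; for very deep paging, search_after is the usual alternative.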
Now I need to include this setting when I'm creating an index in my source code, but I couldn't find a way to do it.
This is my index create function. How should I set "max_result_window" : 100000?
public string InitIndexing()
{
    var indexName = string.Format(_config.ElasticIndexName, _config.HostingEnvironment);
    if (!_client.Indices.Exists(indexName).Exists)
    {
        var indexSettings = new IndexSettings
        {
            NumberOfReplicas = 0, // If this is set to 1 or more, the index becomes yellow.
            NumberOfShards = 5,
        };
        var indexConfig = new IndexState
        {
            Settings = indexSettings
        };
        var createIndexResponses = _client.Indices.Create(indexName, c => c
            .InitializeUsing(indexConfig)
            .Map<ElasticIndexGroupProduct>(m => m.AutoMap())
        );
        return createIndexResponses.DebugInformation;
    }
    else
    {
        return $"{_config.ElasticIndexName} already exists";
    }
}

You can create an index with the max_result_window setting using the following code snippet:
var createIndexResponse = await elasticClient.Indices.CreateAsync("index_name", c => c
    .Settings(s => s
        .Setting(UpdatableIndexSettings.MaxResultWindow, 100000)));
An already existing index can be updated with this fluent syntax:
await elasticClient.Indices.UpdateSettingsAsync("index_name", s => s
    .IndexSettings(i => i.Setting(UpdatableIndexSettings.MaxResultWindow, 100000)));

I don't know NEST, but it can easily be done while creating the index; below is the REST API call to show that.
PUT HTTP://localhost:9200/your-index-name/
{
"settings": {
"max_result_window" : 1000000 // note this
},
"mappings": {
"properties": {
"first_name": {
"type": "text"
},
"last_name": {
"type": "text"
},
"country": {
"type": "text"
},
"state": {
"type": "text"
},
"city": {
"type": "text"
}
}
}
}
Check that it was created successfully in the index settings:
GET HTTP://localhost:9200/your-index-name/_settings
{
"so_1": {
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "so_1",
"max_result_window": "1000000", // note this
"creation_date": "1601273239277",
"number_of_replicas": "1",
"uuid": "eHBxaGf2TBG9GdmG5bvwkQ",
"version": {
"created": "7080099"
}
}
}
}
}

In Kibana it looks like this:
PUT index_name/_settings
{
"max_result_window": 10000
}

Related

Elasticsearch - Logstash: filter on an aggregation created in Logstash in the Elastic index?

I posted this question on the Elastic forums, but I thought I should try it here as well. The problem is as follows:
We have Elasticsearch with Logstash (version 8.2), inserting data into an Elastic index from a JDBC source. In Logstash we use an aggregate filter. The config looks like this:
input {
  jdbc {
    jdbc_connection_string => "jdbc:oracle:thin:@wntstdb03.izaaksuite.nl:1521:wntstf2"
    jdbc_user => "webnext_zaken"
    jdbc_password => "webnext_zaken"
    jdbc_driver_library => ""
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    statement_filepath => "/appl/sw/webnext/logstash/config_documenten/queries/documenten.sql"
    last_run_metadata_path => "/appl/sw/webnext/logstash/config_documenten/parameters/.jdbc_last_run_doc"
  }
}
# The filter part of this file is commented out to indicate that it is
# optional.
filter {
aggregate {
task_id => "%{zaakdoc_id}"
code => "
map['zaak_id'] ||= event.get('zaak_id')
map['result_type'] ||= event.get('result_type')
map['mutatiedatum'] ||= event.get('mutatiedatum')
map['oge_id'] ||= event.get('oge_id')
map['zaakidentificatie'] ||= event.get('zaakidentificatie')
map['zaakomschrijving'] ||= event.get('zaakomschrijving')
map['titel'] ||= event.get('titel')
map['beschrijving'] ||= event.get('beschrijving')
map['zaakdoc_id'] ||= event.get('zaakdoc_id')
map['groepsrollenlijst'] ||= []
map['groepsrollenlijst'] << {'groepsrol' => event.get('rol')}
event.cancel()
"
push_previous_map_as_event => true
timeout => 5
}
}
output {
# stdout { codec => rubydebug }
# file {
# path => ["/appl/sw/webnext/logstash/config_documenten/output/documenten.txt"]
# }
elasticsearch {
hosts => ["localhost:9200"]
index => "documenten"
document_id => "%{zaakdoc_id}"
}
}
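The aggregate filter above collapses the multiple JDBC rows that share a zaakdoc_id into one event, collecting each row's rol into groepsrollenlijst. A rough Python sketch of that grouping (illustration only, not Logstash code):

```python
def aggregate_rows(rows):
    # Group rows by zaakdoc_id; every field except 'rol' is taken from the
    # first row seen, and each 'rol' is appended to groepsrollenlijst
    # (mirrors the ||= and << operations in the aggregate filter above).
    events = {}
    for row in rows:
        ev = events.setdefault(
            row["zaakdoc_id"],
            {k: v for k, v in row.items() if k != "rol"},
        )
        ev.setdefault("groepsrollenlijst", []).append({"groepsrol": row["rol"]})
    return list(events.values())

rows = [
    {"zaakdoc_id": 25066386, "titel": "Test4", "rol": "7710_AFH1"},
    {"zaakdoc_id": 25066386, "titel": "Test4", "rol": "7710_AFH2"},
]
events = aggregate_rows(rows)  # one event, two entries in groepsrollenlijst
```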
The index config looks like this:
{
"documenten": {
"aliases": {
"izaaksuite": {}
},
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"@version": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"beschrijving": {
"type": "text"
},
"groepsrollenlijst": {
"properties": {
"groepsrol": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"mutatiedatum": {
"type": "date"
},
"oge_id": {
"type": "text"
},
"result_type": {
"type": "text"
},
"rol": {
"type": "text"
},
"tags": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"titel": {
"type": "text"
},
"zaak_id": {
"type": "text"
},
"zaakdoc_id": {
"type": "long"
},
"zaakidentificatie": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"zaakomschrijving": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"number_of_shards": "1",
"provided_name": "documenten",
"creation_date": "1654158264412",
"number_of_replicas": "1",
"uuid": "bf4xj4TwQ-mP5K4Orc5HEA",
"version": {
"created": "8010399"
}
}
}
}
}
One document in the index that is eventually built looks like this:
"_index": "documenten",
"_id": "25066386",
"_version": 1,
"_seq_no": 33039,
"_primary_term": 6,
"found": true,
"_source": {
"groepsrollenlijst": [
{
"groepsrol": "7710_AFH1"
},
{
"groepsrol": "7710_AFH2"
},
{
"groepsrol": "MR_GRP1"
}
],
"zaak_id": 44973087,
"oge_id": 98,
"@version": "1",
"@timestamp": "2022-07-11T08:24:07.717572Z",
"zaakdoc_id": 25066386,
"zaakomschrijving": "testOSiZaakAOS",
"result_type": "doc",
"titel": "Test4",
"zaakidentificatie": "077215353",
"mutatiedatum": "2022-06-27T09:51:52.078119Z",
"beschrijving": "Test4"
}
}
As you can see, the "groepsrollenlijst" is present. Now our problem: when searching, we need to match one of the values in groepsrollenlijst (Dutch for "group role", which is basically an authorization within the application the data comes from) with the group roles of the user doing the search. This prevents users from having data in their search results that they don't have access to.
Our Java code looks like this (comments translated from Dutch):
List<SearchResult> searchResults = new ArrayList<>();
SearchRequest searchRequest = new SearchRequest(index);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder fieldsQuery = new BoolQueryBuilder();
/*
 * For each index, fetch the fields that can and may be searched. We cannot
 * search across all fields, because then there would also be hits on the
 * groepsrollenlijst (e.g. if you search for "rutten", hits would also be
 * found on group role "RUTTENGROEP", which you don't want). For documenten
 * and betrokkenen, you also don't want hits on the zaakomschrijving.
 */
String indexFields = index + "Fields";
indexFields = indexFields.substring(0, 1).toUpperCase() + indexFields.substring(1);
List<String> fields = getFieldsFor(indexFields);
// Add a query per field for the entered search text
HighlightBuilder highlightBuilder = new HighlightBuilder();
QueryStringQueryBuilder queryStringQueryBuilder = new QueryStringQueryBuilder(autoCompleteText);
for (String field : fields) {
queryStringQueryBuilder.field(field);
highlightBuilder.field(field);
}
fieldsQuery.should(queryStringQueryBuilder);
// Manipulate the roles for testing purposes
roles.clear();
roles.add("7710_AFH1");
roles.add("7710_AFH2");
BoolQueryBuilder rolesQuery = QueryBuilders.boolQuery();
for (String role : roles) {
rolesQuery.should(QueryBuilders.wildcardQuery("groepsrol", "*" + role + "*"));
}
LOG.info("Employee roles: " + roles);
BoolQueryBuilder mainQuery = new BoolQueryBuilder();
mainQuery.must(new TermsQueryBuilder("oge_id", String.valueOf(ogeId)));
mainQuery.must(fieldsQuery);
mainQuery.must(rolesQuery);
searchSourceBuilder.query(mainQuery);
searchSourceBuilder.highlighter(highlightBuilder);
searchRequest.source(searchSourceBuilder);
searchRequest.validate();
// Execute search
LOG.info("Search query: {}", searchRequest.source().toString());
SearchResponse searchResponse = null;
try {
searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
} catch (IOException | ElasticsearchStatusException e) {
// TODO Auto-generated catch block
e.printStackTrace();
return;
}
if (searchResponse == null) {
return;
}
SearchHits hits = searchResponse.getHits();
For the test we hardcoded the user's group roles into the code.
The issue is that when we search for "testOSiZaakAOS" (one of the values in the document shown earlier), which should be a hit, we don't get a result. If we comment out the "mainQuery.must(rolesQuery);" part, we do get a result, but then the roles are not taken into account.
How do we go about fixing this? A user has role x; some documents in the index have key-value pairs for roles x, y, and z, and some have only y and z.
The search should only show those documents where role x is present.
Basically, at least one of the user's roles should match one of the roles present in the document in the index.
Your help is greatly appreciated! Let me know if you need more info.
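The access rule described here (at least one user role must occur in the document's groepsrollenlijst) amounts to a set intersection. A small Python sketch of the intended semantics, independent of how the Elasticsearch query is built:

```python
def visible_to(user_roles, doc):
    # A document is visible when the user's roles intersect the roles
    # stored in the document's groepsrollenlijst.
    doc_roles = {g["groepsrol"] for g in doc.get("groepsrollenlijst", [])}
    return bool(doc_roles & set(user_roles))

doc = {"groepsrollenlijst": [{"groepsrol": "7710_AFH1"},
                             {"groepsrol": "MR_GRP1"}]}
print(visible_to(["7710_AFH1", "7710_AFH2"], doc))  # True
print(visible_to(["OTHER_ROLE"], doc))              # False
```

In Elasticsearch terms, this is exact-value matching on the role field, which is worth keeping in mind when deciding between wildcard queries and terms queries on a keyword subfield.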

illegal_argument_exception: index.lifecycle.rollover_alias [metricbeat-6.8.4-alias] does not point to index [metricbeat-6.8.4-2020.02.24]

Currently looking for help with setting up ILM. I have set up the template, index alias, and policy as below:
PUT metricbeat-6.8.4-alias-000001
{
"aliases": {
"metricbeat-6.8.4-alias": {
"is_write_index": true
}
}
}
PUT _template/metricbeat-6.8.4-alias
{
"index_patterns": ["metricbeat-6.8.4-*"],
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"index.lifecycle.name": "Delete_Index",
"index.lifecycle.rollover_alias": "metricbeat-6.8.4-alias"
}
}
but the following error still occurs:
illegal_argument_exception: index.lifecycle.rollover_alias [metricbeat-6.8.4-alias] does not point to index [metricbeat-6.8.4-2020.02.24]
Looking for help on how to set up ILM correctly.
Thanks
Creating an alias with a lifecycle policy is a 3-step process.
Elastic provides a great tutorial.
In short:
Create a lifecycle policy that defines the appropriate phases and actions.
Create an index template to apply the policy to each new index.
Bootstrap an index as the initial write index.
I think you are creating the template AFTER you create the first index. You should first create the ILM policy, then the template (where you specify which ILM policy you want to use for the indexes), and finally create the first index (bootstrap).
Example in code:
var indexName = "index_name";
var indexPattern = $"{indexName}-*";
var aliasName = $"{indexName}-alias";
var policyName = $"{indexName}-policy";
var firstIndexName = $"{indexName}-000001";
PUT _ilm/policy/index_name-policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_size": "5gb",
"max_docs": 10000,
"max_age":"2d"
}
}
},
"warm": {
"min_age": "5d",
"actions": {
}
},
"delete": {
"min_age": "10d",
"actions": {
"delete": {}
}
}
}
}
}
PUT _template/index_name-template
{
"index_patterns": ["{{.IndexPattern}}"],
"settings": {
"index.number_of_shards": "1",
"index.number_of_replicas": "1",
"index.lifecycle.name": "{{.PolicyName}}",
"index.lifecycle.rollover_alias": "{{.AliasName}}"
},
"mappings": {
"_source": {
"enabled": true
},
"properties": {
{{Properties}}
}
}
}
PUT index_name-000001
{
"aliases": {
"{{.AliasName}}":{
"is_write_index": true
}
}
}
If you have a rollover action in the lifecycle policy, delete it, and then also delete the alias part, like this:
before
"index_patterns": ["this-is-index-*"],
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.lifecycle.name": "delete-index-policy",
"index.lifecycle.rollover_alias": "this-is-alias-*.*.*"
after
"index_patterns": ["this-is-index-*"],
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"index.lifecycle.name": "delete-index-policy"

DynamoDB DocumentClient returns Set of strings (SS) attribute as an object

I'm new to DynamoDB.
When I read data from the table with the AWS.DynamoDB.DocumentClient class, the query works but I get the result in the wrong format.
Query:
{
TableName: "users",
ExpressionAttributeValues: {
":param": event.pathParameters.cityId,
":date": moment().tz("Europe/London").format()
},
FilterExpression: ":date <= endDate",
KeyConditionExpression: "cityId = :param"
}
Expected:
{
"user": "boris",
"phones": ["+23xxxxx999", "+23xxxxx777"]
}
Actual:
{
"user": "boris",
"phones": {
"type": "String",
"values": ["+23xxxxx999", "+23xxxxx777"],
"wrapperName": "Set"
}
}
Thanks!
The unmarshall function from AWS.DynamoDB.Converter is one solution if your data comes in a form like this:
{
"Attributes": {
"last_names": {
"S": "UPDATED last name"
},
"names": {
"S": "I am the name"
},
"vehicles": {
"NS": [
"877",
"9801",
"104"
]
},
"updatedAt": {
"S": "2018-10-19T01:55:15.240Z"
},
"createdAt": {
"S": "2018-10-17T11:49:34.822Z"
}
}
}
Please notice the object/map {} spec per attribute, holding the attribute type.
This means you are using the dynamodb class and not the DynamoDB.DocumentClient.
unmarshall will convert a DynamoDB record into a JavaScript object, as stated and backed by AWS. Ref. https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB/Converter.html#unmarshall-property
Nonetheless, I faced the exact same use case as yours: having only one attribute of type SET (NS in my case), I had to do it manually. Here is a snippet:
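To make the conversion concrete: unmarshall turns the typed attribute map shown above ({"S": ...}, {"NS": [...]}) into plain values. A toy Python sketch of the idea (not the AWS SDK, and handling only the two types in the sample):

```python
def simple_unmarshall(attr):
    # Each DynamoDB attribute is a one-key map: {type_tag: value}.
    (type_tag, value), = attr.items()
    if type_tag == "S":       # string
        return value
    if type_tag == "NS":      # number set
        return {float(n) for n in value}
    raise ValueError(f"type {type_tag} not handled in this sketch")

record = {
    "names": {"S": "I am the name"},
    "vehicles": {"NS": ["877", "9801", "104"]},
}
plain = {k: simple_unmarshall(v) for k, v in record.items()}
# plain["vehicles"] is now a plain set of numbers
```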
// Please notice the <setName>, which represents your set attribute name
ddbTransHandler.update(params).promise().then((value) =>{
value.Attributes[<setName>] = value.Attributes[<setName>].values;
return value; // or value.Attributes
});
Cheers,
Hamlet

elasticsearch .net client completion suggester performance

For some requests using the NEST or Elasticsearch.Net client, the response time is above 500ms.
When using an HTTP client directly, or going through the Kibana interface, the same query takes about 1-2ms.
This happens even if there are very few documents in the database.
I am using the following settings on localhost:
PUT suggestions
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"suggestionelement": {
"properties": {
"suggest": {
"type": "completion",
"max_input_length": 100
}
}
}
}
}
And indexing the following documents:
POST suggestions/suggestionelement
{
"suggest": {
"input": "this is just some text for suggestion a",
"weight": 1
}
}
POST suggestions/suggestionelement
{
"suggest": {
"input": "this is just some text for suggestion b",
"weight": 2
}
}
POST suggestions/suggestionelement
{
"suggest": {
"input": "this is just some text for suggestion c",
"weight": 3
}
}
POST suggestions/suggestionelement
{
"suggest": {
"input": "this is just some text for suggestion d",
"weight": 4
}
}
POST suggestions/suggestionelement
{
"suggest": {
"input": "this is just some text for suggestion e",
"weight": 5
}
}
When running a suggest (completion) query for "this is just some text for" through the NEST or Elasticsearch.Net clients, it takes more than 500ms.
Running the same from Kibana or directly with HttpClient takes less than 2ms.
Been at it for days... any ideas?
C# code I'm using:
var nodes = new Uri[] { new Uri("http://localhost:9200") };
var connectionPool = new StaticConnectionPool(nodes);
var connectionSettings = new ConnectionSettings(connectionPool)
    .DefaultIndex("suggestions")
    .RequestTimeout(TimeSpan.FromSeconds(30));
var searchEngineClient = new ElasticClient(connectionSettings);
for (int i = 0; i < 10; i++)
{
    return await searchEngineClient.SearchAsync<SuggestionElement>(s =>
        s.Suggest(ss => ss
            .Completion("sentence-suggest", c => c
                .Field(f => f.Suggest)
                .Prefix("this is just some text for")
                .Size(1000))));
}
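One thing worth ruling out (an assumption, not a diagnosis): the first request through a managed client typically pays one-off costs such as connection setup and serializer warm-up, so timing that measures only the first call can overstate steady-state latency. A small Python sketch of timing each call separately so warm-up is visible:

```python
import time

def time_calls(fn, n=10):
    # Time each call individually; compare the first call (which may
    # include one-off warm-up costs) against the later ones.
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings

# Stand-in workload; in the real test, fn would issue the suggest query.
timings = time_calls(lambda: sum(range(1000)), n=5)
```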

Children are not mapping properly in elastic to parents

"chods": {
"mappings": {
"chod": {
"properties": {
"state": {
"type": "text"
}
}
},
"chods": {},
"variant": {
"_parent": {
"type": "chod"
},
"_routing": {
"required": true
},
"properties": {
"percentage": {
"type": "double"
}
}
}
}
},
When I execute:
PUT /chods/variant/565?parent=36442
{ // some data }
It returns:
{
"_index":"chods",
"_type":"variant",
"_id":"565",
"_version":6,
"result":"updated",
"_shards":{
"total":2,
"successful":1,
"failed":0
},
"created":false
}
But when I run this query:
GET /chods/variant/565?parent=36442
It returns the variant with parent=36443:
{
"_index": "chods",
"_type": "variant",
"_id": "565",
"_version": 7,
"_routing": "36443",
"_parent": "36443",
"found": true,
"_source": {
...
}
}
Why does it return parent 36443 and not 36442?
When I tried to reproduce this with your steps, I got the expected result (parent=36442). I noticed that after your PUT of the document with "_parent": "36442" the output is "_version": 6, while in your GET of the document, "_version": 7 is returned. Is it possible that you posted another version of the document?
I also noticed that GET /chods/variant/565?parent=36443 would not actually filter by the parent id - the query parameter is disregarded. If you actually want to filter by parent id, this is the query you're looking for:
GET /chods/_search
{
"query": {
"parent_id": {
"type": "variant",
"id": "36442"
}
}
}
As @fylie pointed out, the main problem is that if you use the same document id, your document gets overridden by the last version - sort of.
Let's say that we have the index /tests and type "a", which is a child of type "test", and we run the following commands:
PUT /tests/a/50?parent=25
{
"item": "C"
}
PUT /tests/a/50?parent=26
{
"item": "D"
}
PUT /tests/a/50?parent=50
{
"item": "E",
"item2": "F"
}
What will the result be? Well, it can result in creating 1 to 3 documents.
If everything routes to the same shard, you will end up with one document with 3 versions.
If it routes to 3 different shards, you will end up with 3 new documents.
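Shard placement is deterministic in the routing value: Elasticsearch sends a document to hash(routing) % number_of_shards (murmur3 in the real implementation). A toy Python sketch, using md5 purely for illustration, of why different parent/routing values can scatter the same id across shards:

```python
import hashlib

def shard_for(routing, num_shards=3):
    # Illustration only: Elasticsearch uses murmur3 on the routing value,
    # not md5, but the principle hash(routing) % num_shards is the same.
    digest = int(hashlib.md5(routing.encode()).hexdigest(), 16)
    return digest % num_shards

# Same document id (50), three different parent routings:
shards = {parent: shard_for(parent) for parent in ("25", "26", "50")}
# Whenever two routings map to different shards, the "same" document
# exists twice; when they collide, the later PUT overwrites the earlier.
```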
