Spring Data Elasticsearch: How to create Completion object with multiple weights? - spring-boot

I have managed to build a working autocomplete service using Elasticsearch with Spring Boot, but I can't assign different weights for my autocomplete sentences.
While building the Completion object (org.springframework.data.elasticsearch.core.completion.Completion) I use the standard constructor and then assign the weight to the object, for example (I am using Kotlin):
val completion = Completion(arrayOf("Sentence one", "Second sentence"))
completion.weight = 10
(...)
myEntity.suggest = completion
which produces the following JSON for Elasticsearch:
{
  "suggest" : {
    "input": [ "Sentence one", "Second sentence" ],
    "weight" : 10
  }
}
But, according to the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html), I would like to achieve something like this:
{
  "suggest" : [
    {
      "input": "Sentence one",
      "weight" : 10
    },
    {
      "input": "Second sentence",
      "weight" : 5
    }
  ]
}
Is it possible with spring-data-elasticsearch? If yes, how can I do this?

No, the second case is currently not supported by Spring Data Elasticsearch.
Both JSON documents you show are valid: the first is for multiple inputs that all share the same weight, the second is for multiple inputs where each input has its own weight.
Please file an issue in the Spring Data Elasticsearch Jira to ask for the Completion object to support this case.

Related

Elastic/Opensearch: HowTo create a new document from an _ingest/pipeline

I am working with Elastic/Opensearch and want to create a new document in a different index out of an _ingest/pipeline.
I found no help on the web...
All my documents (Filebeat) get parsed and modified at the beginning by a pipeline, let's say "StartPipeline".
Triggered by information in a field of the incoming document, let's say "Start", I want to store that value in a special way by creating a new document in a different long-term index - with some more information from the triggering document.
I found ways to do this manually from the console (update_by_query / reindex / painless scripts), but it has to be triggered by an incoming document...
Perhaps this is easier to understand - in my head it looks something like this:
PUT _ingest/pipeline/StartPipeline
{
  "description" : "create a document in/to a different index",
  "processors" : [ {
    "PutNewDoc" : {
      "if": "ctx.FieldThatTriggers == 'start'",
      "index": "DestinationIndex",
      "_id": "123",
      "document": {
        "message": "",
        "script": "start",
        "server": "alpha",
        ...
      }
    }
  } ]
}
Does anyone have an idea?
And sorry, I am not a native speaker - I am from Germany.

Elasticsearch 7 number_format_exception for input value as a String

I have a field in my index with this mapping:
"sequence_number" : {
  "type" : "long",
  "copy_to" : [
    "_custom_all"
  ]
}
and I am using this search query:
POST /my_index/_search
{
  "query": {
    "term": {
      "sequence_number": {
        "value": "we"
      }
    }
  }
}
I am getting this error message:
,"index_uuid":"FTAW8qoYTPeTj-cbC5iTRw","index":"my_index","caused_by":{"type":"number_format_exception","reason":"For input string: \"we\""}}}]},"status":400}
at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:260) ~[elasticsearch-rest-client-7.1.1.jar:7.1.1]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:238) ~[elasticsearch-rest-client-7.1.1.jar:7.1.1]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212) ~[elasticsearch-rest-client-7.1.1.jar:7.1.1]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1433) ~[elasticsearch-rest-high-level-client-7.1.1.jar:7.1.1]
at
How can I ignore number_format_exception errors, so that the query just doesn't return anything or ignores this filter in particular - either is acceptable.
Thanks in advance.
What you are looking for is not possible; ideally, you should have coerce enabled on your numeric fields so that your index doesn't contain dirty data.
The best solution is to handle this in the application that generates the Elasticsearch query: since your index doesn't contain dirty data in the first place, check for a NumberFormatException when searching numeric fields and reject the query before sending it.
Edit: Another interesting approach is to validate the data before inserting it into ES using the Validate API, as suggested by @prakash. The only drawback is that it adds another network call, but if your application is not latency-sensitive, it can be used as a workaround.
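A minimal sketch of such a client-side guard (the class and method names here are only illustrative, not part of any Elasticsearch API): parse the user input before building the term query, and skip the filter or reject the request when parsing fails.

```java
import java.util.Optional;

public class NumericTermGuard {

    // Validate raw user input before building a term query on a long field.
    // Returns an empty Optional instead of letting Elasticsearch answer
    // with a 400 number_format_exception.
    static Optional<Long> parseSequenceNumber(String raw) {
        try {
            return Optional.of(Long.parseLong(raw.trim()));
        } catch (NumberFormatException e) {
            // Input like "we" ends up here; caller decides whether to
            // drop the filter or reject the whole request.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseSequenceNumber("42")); // Optional[42]
        System.out.println(parseSequenceNumber("we")); // Optional.empty
    }
}
```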

Elastic kibana selection of points (geopoints) in complex polygon

I'm having trouble with what seems like a fairly basic use case, but I'm hitting certain limitations in Kibana and problems with certain geo data types. It's starting to feel like I'm just approaching it wrong.
I have a relatively large point data set (locations) of type geo_point, with a map and dashboard built. I now want to add a complex AOI. I took the shapefile, dissolved it so it became one feature instead of many, converted it to geojson and uploaded it (to create an index) via the Kibana Maps functionality. I then made it available as a layer, and wanted to just allow it to be selected, show a tooltip, and then Filter by Feature. Unfortunately I then received an error saying, roughly, that the operation would be too large to be posted to the URL - which I understand, as there are over 2 million characters in the geojson.
Instead I thought I could write the query with a pre-indexed shape, following the guidance at: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-geo-shape-query.html
However, it doesn't seem possible to query a geo_point field against a geo_shape this way.
e.g.
e.g.
GET /locations_index/_search
{
  "query": {
    "geo_point": {
      "geolocation": {
        "relation": "within",
        "indexed_shape": {
          "index": "aoi_index",
          "id": "GYruUnMBfgunZ6kjA8qn",
          "path": "coordinates"
        }
      }
    }
  }
}
Gives an error of:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "no [query] registered for [geo_point]",
        "line" : 3,
        "col" : 18
      }
    ],
    "type" : "parsing_exception",
    "reason" : "no [query] registered for [geo_point]",
    "line" : 3,
    "col" : 18
  },
  "status" : 400
}
Do I need to convert my points index to be geoshape instead of geopoints? Or is there a simpler way?
I note the documentation at: https://www.elastic.co/guide/en/elasticsearch/guide/current/filter-by-geopoint.html suggests that I can query by geo_polygon, but I can't see any way of referencing my pre-indexed shape, instead of having the huge chunk of JSON in the query (as the example suggests).
Can anyone point me (even roughly) in the right direction?
Thanks in advance.
Here's how you can utilize indexed_shape. Let me know if this answer is sufficient to get you started.
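A sketch of such a query: the indexed_shape block from the question looks right, but there is no geo_point query type, so the query has to be geo_shape instead. This assumes a recent Elasticsearch version in which geo_shape queries are also supported on geo_point fields; the index, id and path values are reused from the question.

```json
GET /locations_index/_search
{
  "query": {
    "geo_shape": {
      "geolocation": {
        "indexed_shape": {
          "index": "aoi_index",
          "id": "GYruUnMBfgunZ6kjA8qn",
          "path": "coordinates"
        },
        "relation": "within"
      }
    }
  }
}
```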

How to create a multi-value tag metric gauge?

Already read this, but with no luck.
All examples I've found just show how to create a single value tag like this:
{
  "name" : "jvm.gc.memory.allocated",
  "measurements" : [ {
    "statistic" : "COUNT",
    "value" : 1.98180864E8
  } ],
  "availableTags" : [ {
    "tag" : "stack",
    "values" : [ "prod" ]
  }, {
    "tag" : "region",
    "values" : [ "us-east-1" ]
  } ]
}
But I need to create a multi value tag like this:
availableTags: [
  {
    tag: "method",
    values: [
      "POST",
      "GET"
    ]
  },
My code so far:
List<Tag> tags = new ArrayList<Tag>();
tags.add(Tag.of("test", "John"));
tags.add(Tag.of("test", "Doo"));
tags.add(Tag.of("test", "Foo Bar"));
Metrics.gauge("my.metric", tags, new AtomicLong(3));
As you can see, I thought I could just repeat the key, but this is not the case, and the second parameter of Tag.of is a String, not a String array.
I don't think this was the real intent of the authors of these metering libraries: providing a multi-value tag for a metric.
The whole point of metric tags is to provide a "discriminator" - something that can be used later to retrieve metrics whose tag has a specific, single, value.
Usually, this value is used in metrics storage systems, like Prometheus, DataDog, InfluxDB and so on. On top of this, Grafana can incorporate a single tag value in its queries.
The only use case I see for such a request is that it would show the metric values in the actuator in a somewhat more convenient way, but again, that's not the main point of this capability, so, bottom line, I doubt it's possible at all.
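That said, the multi-value output shown in the question is what the actuator endpoint produces when several meters share a name but differ in one tag's value. A sketch of that with Micrometer's SimpleMeterRegistry (the metric and tag names here are only illustrative):

```java
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.util.concurrent.atomic.AtomicLong;

public class MultiValueTagExample {
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();

        AtomicLong postCount = new AtomicLong(3);
        AtomicLong getCount = new AtomicLong(7);

        // Two gauges with the same name but different values of the
        // "method" tag. The actuator merges these into a single
        // availableTags entry: { "tag": "method", "values": ["POST", "GET"] }
        Gauge.builder("http.requests.active", postCount, AtomicLong::doubleValue)
                .tag("method", "POST")
                .register(registry);
        Gauge.builder("http.requests.active", getCount, AtomicLong::doubleValue)
                .tag("method", "GET")
                .register(registry);

        // Both meters are registered under one name, distinguished by tag.
        registry.get("http.requests.active").meters()
                .forEach(m -> System.out.println(m.getId()));
    }
}
```

Each tag value still maps to exactly one meter, which keeps the tag usable as a discriminator in Prometheus-style queries.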

Elasticsearch: Use match query along with autocomplete

I want to use match query along with autocomplete suggestion in ES5. Basically I want to restrict my autocomplete result based on an attribute, like autocomplete should return result within a city only.
MatchQueryBuilder queryBuilder = QueryBuilders.matchQuery("cityName", city);
SuggestBuilder suggestBuilder = new SuggestBuilder()
.addSuggestion("region", SuggestBuilders.completionSuggestion("region").text(text));
SearchResponse response = client.prepareSearch(index).setTypes(type)
.suggest(suggestBuilder)
.setQuery(queryBuilder)
.execute()
.actionGet();
The above doesn't seem to work correctly. I am getting both results in the response, each independent of the other.
Any suggestion?
It looks like the suggestion builder is creating a completion suggester. Completion suggesters are stored in a specialized structure that is separate from the main index, which means they have no access to your filter fields like cityName. To filter suggestions you need to explicitly define those same filter values when you create the suggestion, separately from the attributes you index on the document to which the suggestion is attached. These suggester filters are called contexts. More information can be found in the docs.
The docs linked to above are going to explain this better than I can, but here is a short example. Using a mapping like the following:
"auto_suggest": {
  "type": "completion",
  "analyzer": "simple",
  "contexts": [
    {
      "name": "cityName",
      "type": "category",
      "path": "cityName"
    }
  ]
}
This section of the index settings defines a completion suggester called auto_suggest with a cityName context that can be used to filter the suggestions. Note that the path value is set, which means this context filter gets its value from the cityName attribute in your main index. You can remove the path value if you want to explicitly set the context to something that isn't already in the main index.
To request suggestions while providing context, something like this in combination with the settings above should work:
"suggest": {
  "auto_complete": {
    "text": "Silv",
    "completion": {
      "field": "auto_suggest",
      "size": 10,
      "fuzzy": {
        "fuzziness": 2
      },
      "contexts": {
        "cityName": [ "Los Angeles" ]
      }
    }
  }
}
Note that this request also allows for fuzziness, to make it a little resilient to spelling mistakes. It also restricts the number of suggestions returned to 10.
It's also worth noting that in ES 5.x completion suggesters are document-centric, so if multiple documents have the same suggestion, you will receive duplicates of that suggestion when it matches the characters entered. ES 6 has an option to de-duplicate suggestions, but there is nothing similar in 5.x. Again, it's best to think of completion suggesters as living in their own index, specifically an FST, which is explained in more detail here.
