Add new suggestion context - ElasticSearch Suggestions - elasticsearch

I have this search context in my index mapping
Index: region
"place_suggest": {
"type" : "completion",
"analyzer" : "simple",
"preserve_separators" : true,
"preserve_position_increments" : true,
"max_input_length" : 50,
"contexts" : [
{
"name" : "place_type",
"type" : "CATEGORY",
"path" : "place_type"
}
]
}
And I want to add a new context to this mapping
{
"name": "restricted",
"type": "CATEGORY",
"path": "restricted"
}
I've tried using Update Mapping API to add this new context like this:
PUT region_test/_mapping/
{
"properties" : {
"place_suggest" : {
"contexts": [
{
"name": "restricted",
"type": "CATEGORY",
"path": "restricted"
}
]
}
}
}
I'm using Kibana dev tools for running this query.

You cannot add a new context to an existing completion field through the update mapping API.
You need to create a new index with the updated mapping and reindex your data into it.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html#change-existing-mapping-parms
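A minimal sketch of that approach, written as plain Python dicts so the request bodies can be inspected without a cluster. The target index name region_v2 is hypothetical, the typeless "mappings" layout assumes Elasticsearch 7 or later, and every other field of the original index is omitted. The field definition is copied from the question with the extra context appended.

```python
import json

# Field definition copied from the question, with the new context appended.
# The context type is written in lowercase, as in the documented request format.
place_suggest = {
    "type": "completion",
    "analyzer": "simple",
    "preserve_separators": True,
    "preserve_position_increments": True,
    "max_input_length": 50,
    "contexts": [
        {"name": "place_type", "type": "category", "path": "place_type"},
        {"name": "restricted", "type": "category", "path": "restricted"},
    ],
}

# Body for: PUT region_v2   (region_v2 is a hypothetical new index name)
create_index_body = {"mappings": {"properties": {"place_suggest": place_suggest}}}

# Body for: POST _reindex   (copies every document from the old index)
reindex_body = {"source": {"index": "region"}, "dest": {"index": "region_v2"}}

print(json.dumps(create_index_body, indent=2))
print(json.dumps(reindex_body, indent=2))
```

Once the reindex has finished, the application can be pointed at the new index, for example through an alias.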

Related

Elasticsearch : how to search multiple words in a copy_to field?

I am currently learning Elasticsearch and stuck on the issue described below:
On an existing index (I don't know if it matters) I added this new mapping:
PUT user-index
{
"mappings": {
"properties": {
"common_criteria": { -- new property which aggregates other properties by copy_to
"type": "text"
},
"name": { -- already existed before this mapping
"type": "text",
"copy_to": "common_criteria"
},
"username": { -- already existed before this mapping
"type": "text",
"copy_to": "common_criteria"
},
"phone": { -- already existed before this mapping
"type": "text",
"copy_to": "common_criteria"
},
"country": { -- already existed before this mapping
"type": "text",
"copy_to": "common_criteria"
}
}
}
}
The goal is to search ONE or MORE values only on common_criteria.
Say that we have:
{
"common_criteria": ["John Smith","johny","USA"]
}
What I would like to achieve is an exact match searching on multiple values of common_criteria:
We should have a result if we search with John Smith or with USA + John Smith or with johny + USA or with USA or with johny and finally with John Smith + USA + johny (the words order does not matter)
If we search with multiple words like John Smith + Germany or johny + England we should not have a result
I am using Spring Data Elastic to build my query:
NativeSearchQueryBuilder nativeSearchQuery = new NativeSearchQueryBuilder();
BoolQueryBuilder booleanQuery = QueryBuilders.boolQuery();
String valueToSearch = "johny";
nativeSearchQuery.withQuery(booleanQuery.must(QueryBuilders.matchQuery("common_criteria", valueToSearch)
.fuzziness(Fuzziness.AUTO)
.operator(Operator.AND)));
Logging the request sent to Elastic I have:
{
"bool" : {
"must" :
{
"match" : {
"common_criteria" : {
"query" : "johny",
"operator" : "AND",
"fuzziness" : "AUTO",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
With that request I get 0 results. I suspect the request is wrong because of the must.match condition, and maybe the field common_criteria is also not well defined.
Thanks in advance for your help and explanations.
EDIT: After trying multi_match query.
Following @rabbitbr's suggestion, I tried the multi_match query, but it does not seem to work. This is an example of a request sent to Elasticsearch (with 0 results):
{
"bool" : {
"must" : {
"multi_match" : {
"query" : "John Smith USA",
"fields" : [
"name^1.0",
"username^1.0",
"phone^1.0",
"country^1.0"
],
"type" : "best_fields",
"operator" : "AND",
"slop" : 0,
"fuzziness" : "AUTO",
"prefix_length" : 0,
"max_expansions" : 50,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
},
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
That request does not return a result.
I would try the multi_match query before creating a field that stores all the others in one place.
The multi_match query builds on the match query to allow multi-field queries.
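One possible variation, offered as a sketch and not as a confirmed fix for the question: with type best_fields and operator AND, all terms have to appear in the same field, which would explain 0 results for "John Smith USA". The cross_fields type treats the listed fields as one combined field, so each term may match in a different field. Fuzziness is left out because it cannot be combined with cross_fields. The helper name cross_fields_query is hypothetical.

```python
import json


def cross_fields_query(text, fields):
    """Build a search body where every term must match in at least one of the fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "type": "cross_fields",
                "operator": "and",
            }
        }
    }


# Body for: GET user-index/_search
body = cross_fields_query("John Smith USA", ["name", "username", "phone", "country"])
print(json.dumps(body, indent=2))
```

Note that this matches on analyzed terms, so it is closer to "all words present" than to a strict exact match on whole values.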

Is it possible to add a new similarity metric to an existing index in Elasticsearch?

Let's say there is an existing index with a customized BM25 similarity metric like this:
{
"settings": {
"index": {
"similarity": {
"BM25_v1": {
"type": "BM25",
"b": 1.0
}
},
"number_of_replicas": 0,
"number_of_shards": 3,
"refresh_interval": "120s"
}
}
}
And this similarity metric is used for two fields:
{
'some_field': {
'type': 'text',
'norms': 'true',
'similarity': 'BM25_v1'
},
'another_field': {
'type': 'text',
'norms': 'true',
'similarity': 'BM25_v1'
},
}
Now, I was wondering if it's possible to add another similarity metric (BM25_v2) to the same index and use this new metric for the another_field, like this:
"index": {
"similarity": {
# The existing metric, not changed.
"BM25_v1": {
"type": "BM25",
"b": 1.0
},
# The new similarity metric for this index.
"BM25_v2": {
"type": "BM25",
"b": 0.0
}
}
}
# ... and use the new metric for one of the fields:
{
'some_field': {
'type': 'text',
'norms': 'true',
'similarity': 'BM25_v1' # This field uses the same old metric.
},
'another_field': {
'type': 'text',
'norms': 'true',
'similarity': 'BM25_v2' # The new metric is used for this field.
},
}
I couldn't find any example for this scenario in the documentation, so I wasn't sure if this is possible at all.
Update: I have already seen this old, still-open issue which concerns the dynamic update of similarity metrics in Elasticsearch. But it is not completely clear from that discussion what is and isn't possible. There have also been some attempts at achieving some level of similarity update, but I think they are not documented (e.g. it is possible to change the parameters of an existing similarity metric, say b or k1 in an existing BM25-based metric).
TLDR;
I believe you can't.
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Mapper for [title] conflicts with existing mapper:\n\tCannot update parameter [similarity] from [my_similarity] to [my_similarity_v2]"
}
],
"type" : "illegal_argument_exception",
"reason" : "Mapper for [title] conflicts with existing mapper:\n\tCannot update parameter [similarity] from [my_similarity] to [my_similarity_v2]"
},
"status" : 400
}
If you want to, I believe you will have to create a new field and re-index the data.
To reproduce
PUT /70973345
{
"settings": {
"index": {
"similarity": {
"my_similarity": {
"type": "BM25",
"b": 1.0
}
}
}
}
}
PUT /70973345/_mapping
{
"properties" : {
"title" : { "type" : "text", "similarity" : "my_similarity" }
}
}
We insert some dummy data, and retrieve it.
POST /70973345/_doc
{
"title": "I love rock'n roll"
}
POST /70973345/_doc
{
"title": "I love pasta al'arabita"
}
POST /70973345/_doc
{
"title": "pasta rock's"
}
GET /70973345/_search?explain=true
{
"query": {
"match": {
"title": "pasta"
}
}
}
If we try to update the settings without closing the index, we get an error.
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Can't update non dynamic settings ...."
}
],
"type" : "illegal_argument_exception",
"reason" : "Can't update non dynamic settings ...."
},
"status" : 400
}
POST /70973345/_close?wait_for_active_shards=0
PUT /70973345/_settings
{
"index": {
"similarity": {
"my_similarity": {
"type": "BM25",
"b": 1.0
},
"my_similarity_v2": {
"type": "BM25",
"b": 0
}
}
}
}
The update works fine, BUT :
PUT /70973345/_mapping
{
"properties": {
"title": {
"type": "text",
"similarity": "my_similarity_v2"
}
}
}
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Mapper for [title] conflicts with existing mapper:\n\tCannot update parameter [similarity] from [my_similarity] to [my_similarity_v2]"
}
],
"type" : "illegal_argument_exception",
"reason" : "Mapper for [title] conflicts with existing mapper:\n\tCannot update parameter [similarity] from [my_similarity] to [my_similarity_v2]"
},
"status" : 400
}
It will not work, regardless of the open/close status of the index.
This makes me believe it is not possible. You might need to re-index the existing data into a new index.
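The "new field" route mentioned earlier could look roughly like the sequence below. This is a sketch under assumptions: the field name title_v2 is hypothetical, the index has to be reopened after the settings change, and the last step uses _update_by_query with a Painless script to copy the old field into the new one. The requests are only printed, not sent.

```python
import json

INDEX = "70973345"  # index name used in the reproduction above

steps = [
    # Similarity settings are not dynamic, so the index is closed first.
    ("POST", f"/{INDEX}/_close", None),
    ("PUT", f"/{INDEX}/_settings", {
        "index": {
            "similarity": {
                "my_similarity": {"type": "BM25", "b": 1.0},
                "my_similarity_v2": {"type": "BM25", "b": 0},
            }
        }
    }),
    ("POST", f"/{INDEX}/_open", None),
    # Adding a brand-new field is allowed; title_v2 is a hypothetical name.
    ("PUT", f"/{INDEX}/_mapping", {
        "properties": {"title_v2": {"type": "text", "similarity": "my_similarity_v2"}}
    }),
    # Copy the existing values into the new field so they get indexed with it.
    ("POST", f"/{INDEX}/_update_by_query", {
        "script": {"lang": "painless", "source": "ctx._source.title_v2 = ctx._source.title"}
    }),
]

for method, path, body in steps:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

Queries that should be scored with the new similarity would then target title_v2 instead of title.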

How do I query a null date inside an array in elasticsearch?

In an elasticsearch query I am trying to search Document objects that have an array of approval notifications. The notifications are considered complete when dateCompleted is populated with a date, and considered pending when either dateCompleted doesn't exist or exists with null. If the document does not contain an array of approval notifications then it is out of the scope of the search.
I am aware of putting null_value for field dateCompleted and setting it to some arbitrary old date but that seems hackish to me.
I've tried to use Bool queries with must exist doc.approvalNotifications and must not exist doc.approvalNotifications.dateCompleted but that does not work if a document contains a mix of complete and pending approvalNotifications. e.g. it only returns document with ID 2 below. I am expecting documents with IDs 1 and 2 to be found.
How can I find pending approval notifications using elasticsearch?
PUT my_index/_mapping/Document
"properties" : {
"doc" : {
"properties" : {
"approvalNotifications" : {
"properties" : {
"approvalBatchId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"approvalTransitionState" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"approvedByUser" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"dateCompleted" : {
"type" : "date"
}
}
}
}
}
}
Documents:
{
"id": 1,
"status": "Pending Notifications",
"approvalNotifications": [
{
"approvalBatchId": "e6c39194-5475-4168-9729-8ddcf46cf9ab",
"dateCompleted": "2018-11-15T16:09:15.346+0000"
},
{
"approvalBatchId": "05eaeb5d-d802-4a28-b699-5e593a59d445",
}
]
}
{
"id": 2,
"status": "Pending Notifications",
"approvalNotifications": [
{
"approvalBatchId": "e6c39194-5475-4168-9729-8ddcf46cf9ab",
}
]
}
{
"id": 3,
"status": "Complete",
"approvalNotifications": [
{
"approvalBatchId": "e6c39194-5475-4168-9729-8ddcf46cf9ab",
"dateCompleted": "2018-11-15T16:09:15.346+0000"
},
{
"approvalBatchId": "05eaeb5d-d802-4a28-b699-5e593a59d445",
"dateCompleted": "2018-11-16T16:09:15.346+0000"
}
]
}
{
"id": 4,
"status": "No Notifications"
}
You are almost there: you can achieve the desired behavior by using the nested datatype for the "approvalNotifications" field.
By default, Elasticsearch flattens your approvalNotifications objects, treating their subfields as subfields of the original document, so it can no longer tell which dateCompleted belongs to which notification. A nested field instead tells ES to index each inner object as an implicit separate document, though related to the original one.
To query nested objects, use the nested query.
Hope that helps!
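A minimal sketch of both pieces, built as Python dicts. It assumes the field path doc.approvalNotifications from the mapping in the question and trims the mapping to two subfields; the helper name pending_query is hypothetical. An existing object field cannot be switched to nested in place, so this implies a new index and a reindex.

```python
import json

PATH = "doc.approvalNotifications"  # path taken from the mapping in the question

# Mapping fragment: same structure as the question, plus "type": "nested".
nested_mapping = {
    "properties": {
        "doc": {
            "properties": {
                "approvalNotifications": {
                    "type": "nested",
                    "properties": {
                        "approvalBatchId": {"type": "text"},
                        "dateCompleted": {"type": "date"},
                    },
                }
            }
        }
    }
}


def pending_query(path=PATH):
    """Match documents with at least one notification lacking dateCompleted."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "must_not": [{"exists": {"field": f"{path}.dateCompleted"}}]
                    }
                },
            }
        }
    }


print(json.dumps(nested_mapping, indent=2))
print(json.dumps(pending_query(), indent=2))
```

Because the exists query treats an explicit null like a missing value, this should cover both pending cases, and documents without any notifications have no nested objects to match.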

Failed to find geo_point field using @Spatial from hibernate search

I need to implement a solution that filters one of my search queries by location. Here is my entity and how I used the @Spatial annotation:
@Entity
@Indexed
@Spatial(spatialMode = SpatialMode.RANGE)
@Table(name = "ORGANIZATION", uniqueConstraints = { @UniqueConstraint(columnNames = { "CODE" }) })
public class Organization implements Serializable, FileEntity {
...
@Latitude
@Column(name = "LATITUDE")
private Double latitude;
@Longitude
@Column(name = "LONGITUDE")
private Double longitude;
...
}
Indexing does not report any errors. Here is what I found when querying Elasticsearch:
GET http://localhost:9201/com.supralog.lexis.model.organization.organization
{
"com.supralog.lexis.model.organization.organization" : {
"aliases" : { },
"mappings" : {
"com.supralog.lexis.model.organization.Organization" : {
"properties" : {
"_hibernate_default_coordinates" : {
"properties" : {
"lat" : {
"type" : "float"
},
"lon" : {
"type" : "float"
}
}
},
...
}
}
}
}
GET http://localhost:9201/com.supralog.lexis.model.organization.organization/_search?from=0&size=1
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 15628,
"max_score" : 1.0,
"hits" : [
{
"_index" : "com.supralog.lexis.model.organization.organization",
"_type" : "com.supralog.lexis.model.organization.Organization",
"_id" : "...",
"_score" : 1.0,
"_source" : {
...
"_hibernate_default_coordinates" : {
"lat" : 49.1886203,
"lon" : -0.38740259999997306
},
...
}
}
]
}
}
After checking that indexing looks OK, I tried to query all Organization objects within a radius of 100 km:
final Coordinates coordinates = Point.fromDegrees(form.getLatitude(), form.getLongitude());
final String search = StringUtils.join(terms, " ");
final FullTextSession fullTextSession = Search.getFullTextSession(sessionFactory.getCurrentSession());
final QueryBuilder queryBuilder = fullTextSession.getSearchFactory().buildQueryBuilder()
.forEntity(Organization.class).get();
final org.apache.lucene.search.Query elasticQuery = queryBuilder.spatial().within(100,Unit.KM).ofCoordinates(coordinates).createQuery();
final FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(elasticQuery, Organization.class);
fullTextQuery.setMaxResults(form.getMaximumNumberOfResult());
fullTextQuery.setProjection(FullTextQuery.THIS, FullTextQuery.SCORE);
This is where my problem is: when I try to execute this query, I get the following response:
Request: POST /com.supralog.lexis.model.organization.organization/_search with parameters {from=0, size=50}
Response: 400 'Bad Request' with body
{
"error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "failed to find geo_point field [_hibernate_default_coordinates]",
"index_uuid": "phOfJTOyRvetHyZrfeUmrA",
"index": "com.supralog.lexis.model.organization.organization"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "com.supralog.lexis.model.organization.organization",
"node": "9DCzSp6kS5KGtMq6tzywzg",
"reason": {
"type": "query_shard_exception",
"reason": "failed to find geo_point field [_hibernate_default_coordinates]",
"index_uuid": "phOfJTOyRvetHyZrfeUmrA",
"index": "com.supralog.lexis.model.organization.organization"
}
}
]
},
"status": 400
}
To fix it, I tried giving the @Spatial annotation a name, making my entity implement Coordinates, etc. However, I always get the same result. It looks like hibernate-search is not indexing my location as a geo_point, which is why the query fails.
Do you have any idea what I missed in the documentation?
Versions used: hibernate 5.3, hibernate-search 5.10, elasticsearch 5.6
The Elasticsearch mapping is wrong:
"com.supralog.lexis.model.organization.Organization" : {
"properties" : {
"_hibernate_default_coordinates" : {
"properties" : {
"lat" : {
"type" : "float"
},
"lon" : {
"type" : "float"
}
}
},
...
}
}
The fact that there's a "properties" attribute under "_hibernate_default_coordinates" means that "_hibernate_default_coordinates" is of type "object", whereas it should be of type "geo_point".
The most likely explanation is that you didn't generate the schema before indexing, and Elasticsearch tried to automatically generate it on the fly based on the documents it received. As you can see it's a very bad idea, since the risk of Elasticsearch guessing the schema wrong is quite high.
You should have a look at the documentation about configuration. In particular, you should pick a suitable index schema management strategy.
In short, put the following into hibernate.properties
In development environment: Hibernate Search will try its best, but may fail, and data won't be reindexed magically, you'll have to do it yourself
hibernate.search.default.elasticsearch.index_schema_management_strategy update
In production environment: you'll have to update the schema carefully yourself, and plan a mass reindexing when updating your application.
hibernate.search.default.elasticsearch.index_schema_management_strategy create
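To check which situation an index is in, the mapping returned by Elasticsearch can be inspected. The sketch below is an illustration only: the helper field_type is hypothetical, and the two dicts are hand-written stand-ins for the per-index body of a GET response on a 5.x cluster (with a mapping type level), one as reported in the question and one as the spatial query needs it.

```python
def field_type(index_body, mapping_type, field):
    """Return the mapped type of a top-level field, or "object" if it only has sub-properties."""
    definition = index_body["mappings"][mapping_type]["properties"][field]
    if "type" in definition:
        return definition["type"]
    return "object" if "properties" in definition else None


TYPE = "com.supralog.lexis.model.organization.Organization"
FIELD = "_hibernate_default_coordinates"

# Mapping as reported in the question (guessed by Elasticsearch from the documents).
actual = {"mappings": {TYPE: {"properties": {FIELD: {"properties": {
    "lat": {"type": "float"}, "lon": {"type": "float"}}}}}}}

# Mapping the spatial query needs.
expected = {"mappings": {TYPE: {"properties": {FIELD: {"type": "geo_point"}}}}}

print(field_type(actual, TYPE, FIELD))    # object
print(field_type(expected, TYPE, FIELD))  # geo_point
```

If the check reports object, the index has to be dropped or recreated with the correct schema and the data reindexed, since the existing field cannot be turned into a geo_point in place.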

Exact phrase match in ElasticSearch

I'm trying to achieve exact search by phrase in Elastic, using my existing index (full-text). When user is searching, say, "Sanity Testing", the result should bring all the docs with "Sanity Testing" (case-insensitive), but not "Sanity tested".
My mapping:
{
"doc": {
"properties": {
"file": {
"type": "attachment",
"path": "full",
"fields": {
"file": {
"type": "string",
"term_vector":"with_positions_offsets",
"analyzer":"o3analyzer",
"store": true
},
"title" : {"store" : "yes"},
"date" : {"store" : "yes"},
"keywords" : {"store" : "yes"},
"content_type" : {"store" : "yes"},
"content_length" : {"store" : "yes"},
"language" : {"store" : "yes"}
}
}
}
}
}
As I understand it, there is a way to add another field with a "raw" analyzer, but I'm not sure this will work because the search needs to be case-insensitive. I also don't want to rebuild the indexes: there are hundreds of machines with tons of documents already indexed, so it may take ages.
Is there a way to run such a query? I'm now trying to search using the following query:
{
"query": {
"match_phrase": {
"file": "Sanity Testing"
}
}
}
and it brings me both "Sanity Testing" and "Sanity Tested".
Any help appreciated!