Failed to find geo_point field using @Spatial from hibernate search - elasticsearch

I need to filter one of my search queries by location. Here is my entity and how I used the @Spatial annotation:
@Entity
@Indexed
@Spatial(spatialMode = SpatialMode.RANGE)
@Table(name = "ORGANIZATION", uniqueConstraints = { @UniqueConstraint(columnNames = { "CODE" }) })
public class Organization implements Serializable, FileEntity {
    ...
    @Latitude
    @Column(name = "LATITUDE")
    private Double latitude;

    @Longitude
    @Column(name = "LONGITUDE")
    private Double longitude;
    ...
}
Indexing does not report any errors; here is what I found when querying Elasticsearch:
GET http://localhost:9201/com.supralog.lexis.model.organization.organization
{
  "com.supralog.lexis.model.organization.organization" : {
    "aliases" : { },
    "mappings" : {
      "com.supralog.lexis.model.organization.Organization" : {
        "properties" : {
          "_hibernate_default_coordinates" : {
            "properties" : {
              "lat" : {
                "type" : "float"
              },
              "lon" : {
                "type" : "float"
              }
            }
          },
          ...
        }
      }
    }
  }
}
GET http://localhost:9201/com.supralog.lexis.model.organization.organization/_search?from=0&size=1
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 15628,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "com.supralog.lexis.model.organization.organization",
        "_type" : "com.supralog.lexis.model.organization.Organization",
        "_id" : "...",
        "_score" : 1.0,
        "_source" : {
          ...
          "_hibernate_default_coordinates" : {
            "lat" : 49.1886203,
            "lon" : -0.38740259999997306
          },
          ...
        }
      }
    ]
  }
}
After checking that the indexed data looks OK, I tried to query all Organization objects within a radius of 100 km:
final Coordinates coordinates = Point.fromDegrees(form.getLatitude(), form.getLongitude());
final String search = StringUtils.join(terms, " ");
final FullTextSession fullTextSession = Search.getFullTextSession(sessionFactory.getCurrentSession());
final QueryBuilder queryBuilder = fullTextSession.getSearchFactory().buildQueryBuilder()
        .forEntity(Organization.class).get();
final org.apache.lucene.search.Query elasticQuery = queryBuilder.spatial()
        .within(100, Unit.KM)
        .ofCoordinates(coordinates)
        .createQuery();
final FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(elasticQuery, Organization.class);
fullTextQuery.setMaxResults(form.getMaximumNumberOfResult());
fullTextQuery.setProjection(FullTextQuery.THIS, FullTextQuery.SCORE);
And here is my problem: when I execute this query, I get the following response:
Request: POST /com.supralog.lexis.model.organization.organization/_search with parameters {from=0, size=50}
Response: 400 'Bad Request' with body
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to find geo_point field [_hibernate_default_coordinates]",
        "index_uuid": "phOfJTOyRvetHyZrfeUmrA",
        "index": "com.supralog.lexis.model.organization.organization"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "com.supralog.lexis.model.organization.organization",
        "node": "9DCzSp6kS5KGtMq6tzywzg",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to find geo_point field [_hibernate_default_coordinates]",
          "index_uuid": "phOfJTOyRvetHyZrfeUmrA",
          "index": "com.supralog.lexis.model.organization.organization"
        }
      }
    ]
  },
  "status": 400
}
To fix it, I tried to set a name on the @Spatial annotation, I tried to make my entity implement Coordinates, etc. However, I always get the same result. It looks like hibernate-search is not indexing my location as a geo_point, which is why querying fails...
Do you have any idea what I missed in the documentation?
Versions used: hibernate 5.3; hibernate-search 5.10; elasticsearch 5.6

The Elasticsearch mapping is wrong:
"com.supralog.lexis.model.organization.Organization" : {
"properties" : {
"_hibernate_default_coordinates" : {
"properties" : {
"lat" : {
"type" : "float"
},
"lon" : {
"type" : "float"
}
}
},
...
}
}
The fact that there's a "properties" attribute under "_hibernate_default_coordinates" means that "_hibernate_default_coordinates" is of type "object", whereas it should be of type "geo_point".
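For comparison, the mapping for this field should look roughly like this (a hand-written sketch, not the exact output Hibernate Search would generate):

"_hibernate_default_coordinates" : {
  "type" : "geo_point"
}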
The most likely explanation is that you didn't generate the schema before indexing, and Elasticsearch tried to automatically generate it on the fly based on the documents it received. As you can see it's a very bad idea, since the risk of Elasticsearch guessing the schema wrong is quite high.
You should have a look at the documentation about configuration. In particular, you should pick a suitable index schema management strategy.
In short, put the following into hibernate.properties:

# In a development environment: Hibernate Search will try its best, but may fail;
# data won't be reindexed magically, you'll have to do it yourself
hibernate.search.default.elasticsearch.index_schema_management_strategy update

# In a production environment: you'll have to update the schema carefully yourself,
# and plan a mass reindexing when updating your application
hibernate.search.default.elasticsearch.index_schema_management_strategy create
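Note that with either strategy, Elasticsearch cannot change an existing field from object to geo_point in place, so in this situation the index most likely has to be dropped and the data reindexed; a sketch (index name taken from the question):

DELETE /com.supralog.lexis.model.organization.organization

After the schema is regenerated (for example on application restart), trigger a mass reindexing, e.g. with Hibernate Search's mass indexer, so the documents are written against the geo_point mapping.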

Related

Getting a timestamp exception when I try to update an unrelated field using painless in elasticsearch

I'm trying to run the following script:
POST /data_hip/_update/1638643727.0
{
  "script": {
    "source": "ctx._source.avgmer=4;"
  }
}
But I am getting the following error.
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse field [@timestamp] of type [date] in document with id '1638643727.0'. Preview of field's value: '1.638642742E12'"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [@timestamp] of type [date] in document with id '1638643727.0'. Preview of field's value: '1.638642742E12'",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "failed to parse date field [1.638642742E12] with format [epoch_millis]",
      "caused_by" : {
        "type" : "date_time_parse_exception",
        "reason" : "Failed to parse with all enclosed parsers"
      }
    }
  },
  "status" : 400
}
This is strange because on queries (not updates) the datetime is parsed fine.
The timestamp field mapping is as follows:
"#timestamp": {
"type":"date",
"format":"epoch_millis"
},
I am running Elasticsearch 7+.
EDIT:
Adding my index settings:
{
  "data_hip" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "data_hip",
        "creation_date" : "1638559533343",
        "number_of_replicas" : "1",
        "uuid" : "CHjkvSdhSgySLioCju9NqQ",
        "version" : {
          "created" : "7150199"
        }
      }
    }
  }
}
I'm not running an ingest pipeline.
The problem is the scientific notation (the 'E12' suffix) in a field that ES expects to be an integer.
Using this reprex:
PUT so_test
{
  "mappings": {
    "properties": {
      "ts": {
        "type": "date",
        "format": "epoch_millis"
      }
    }
  }
}

# this works
POST so_test/_doc/
{
  "ts" : "123456789"
}

# this does not, throws the same error you have IRL
POST so_test/_doc/
{
  "ts" : "123456789E12"
}
I'm not sure how/where those values are creeping in, but they are there in the document you are passing to ES.
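If the bad values are already in the indexed _source, one possible clean-up is to rewrite the field as a plain long via _update_by_query; a hedged sketch, assuming the field is @timestamp as in the mapping above and that the value parses as a double (untested against this exact index):

POST /data_hip/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "ctx._source['@timestamp'] = (long) Double.parseDouble(ctx._source['@timestamp'].toString());"
  }
}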

ElasticSearch, simple two fields comparison with painless

I'm trying to run a query such as SELECT * FROM indexPeople WHERE info.Age > info.AgeExpectancy
Note the two fields are NOT nested; they are just JSON objects.
POST /indexPeople/_search
{
  "from" : 0,
  "size" : 200,
  "query" : {
    "bool" : {
      "filter" : [
        {
          "bool" : {
            "must" : [
              {
                "script" : {
                  "script" : {
                    "source" : "doc['info.Age'].value > doc['info.AgeExpectancy'].value",
                    "lang" : "painless"
                  },
                  "boost" : 1.0
                }
              }
            ],
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : {
    "includes" : [
      "info"
    ],
    "excludes" : [ ]
  }
}
However, this query fails with:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "runtime error",
        "script_stack" : [
          "org.elasticsearch.index.fielddata.ScriptDocValues$Longs.get(ScriptDocValues.java:121)",
          "org.elasticsearch.index.fielddata.ScriptDocValues$Longs.getValue(ScriptDocValues.java:115)",
          "doc['info.Age'].value > doc['info.AgeExpectancy'].value",
          "                      ^---- HERE"
        ],
        "script" : "doc['info.Age'].value > doc['info.AgeExpectancy'].value",
        "lang" : "painless",
        "position" : {
          "offset" : 22,
          "start" : 0,
          "end" : 70
        }
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "indexPeople",
        "node" : "c_Dv3IrlQmyvIVpLoR9qVA",
        "reason" : {
          "type" : "script_exception",
          "reason" : "runtime error",
          "script_stack" : [
            "org.elasticsearch.index.fielddata.ScriptDocValues$Longs.get(ScriptDocValues.java:121)",
            "org.elasticsearch.index.fielddata.ScriptDocValues$Longs.getValue(ScriptDocValues.java:115)",
            "doc['info.Age'].value > doc['info.AgeExpectancy'].value",
            "                      ^---- HERE"
          ],
          "script" : "doc['info.Age'].value > doc['info.AgeExpectancy'].value",
          "lang" : "painless",
          "position" : {
            "offset" : 22,
            "start" : 0,
            "end" : 70
          },
          "caused_by" : {
            "type" : "illegal_state_exception",
            "reason" : "A document doesn't have a value for a field! Use doc[<field>].size()==0 to check if a document is missing a field!"
          }
        }
      }
    ]
  },
  "status" : 400
}
Is there a way to achieve this?
What is the best way to debug it? I wanted to print the objects or look at the logs (which aren't there), but I couldn't find a way to do either.
The mapping is:
{
  "mappings": {
    "_doc": {
      "properties": {
        "info": {
          "properties": {
            "Age": {
              "type": "long"
            },
            "AgeExpectancy": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}
Perhaps you have already solved the issue. The reason why the query failed is clear:
"caused_by" : {
"type" : "illegal_state_exception",
"reason" : "A document doesn't have a value for a field! Use doc[<field>].size()==0 to check if a document is missing a field!"
}
Basically, there are one or more documents that do not have one of the queried fields. You can achieve the result you need by using an if to check whether the fields do indeed exist. If they do not, simply return false as follows:
{
  "script": """
    if (doc['info.Age'].size() > 0 && doc['info.AgeExpectancy'].size() > 0) {
      return doc['info.Age'].value > doc['info.AgeExpectancy'].value;
    }
    return false;
  """
}
I tested it with Elasticsearch 7.10.2 and it works.
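For reference, wired back into a full search request against the question's index, it would look roughly like this:

POST /indexPeople/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "if (doc['info.Age'].size() > 0 && doc['info.AgeExpectancy'].size() > 0) { return doc['info.Age'].value > doc['info.AgeExpectancy'].value; } return false;"
          }
        }
      }
    }
  }
}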
What is the best way to debug it?
That is a tough question; perhaps someone has a better answer. I'll list some options. Obviously, debugging requires reading the error messages carefully.
PAINLESS LAB
If you have a fairly recent version of Kibana, you can try the Painless Lab to simulate your documents and get the errors more quickly, in a more focused environment.
KIBANA SCRIPTED FIELD
You can try to create a boolean scripted field named condition in the index pattern. Before clicking Create, remember to click "preview result".
MINIMAL EXAMPLE
Create a minimal example to reduce the complexity.
For this answer I used a sample index with four documents covering all possible cases (a bulk request to load them is sketched after the list):
No info: { "message": "ok" }
info.Age but no info.AgeExpectancy: { "message": "ok", "info": { "Age": 14 } }
info.AgeExpectancy but no info.Age: { "message": "ok", "info": { "AgeExpectancy": 12 } }
Both info.Age and info.AgeExpectancy: { "message": "ok", "info": { "Age": 14, "AgeExpectancy": 12 } }
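A minimal sketch of loading these with the bulk API (the index name test_index is an assumption, not from the original question):

POST /test_index/_bulk
{ "index": {} }
{ "message": "ok" }
{ "index": {} }
{ "message": "ok", "info": { "Age": 14 } }
{ "index": {} }
{ "message": "ok", "info": { "AgeExpectancy": 12 } }
{ "index": {} }
{ "message": "ok", "info": { "Age": 14, "AgeExpectancy": 12 } }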

How do I query a null date inside an array in elasticsearch?

In an elasticsearch query I am trying to search Document objects that have an array of approval notifications. The notifications are considered complete when dateCompleted is populated with a date, and considered pending when either dateCompleted doesn't exist or exists with null. If the document does not contain an array of approval notifications then it is out of the scope of the search.
I am aware of putting null_value for field dateCompleted and setting it to some arbitrary old date but that seems hackish to me.
I've tried to use bool queries with must exist doc.approvalNotifications and must not exist doc.approvalNotifications.dateCompleted, but that does not work if a document contains a mix of complete and pending approvalNotifications; e.g., it only returns the document with ID 2 below. I expect documents with IDs 1 and 2 to be found.
How can I find pending approval notifications using elasticsearch?
PUT my_index/_mapping/Document
{
  "properties" : {
    "doc" : {
      "properties" : {
        "approvalNotifications" : {
          "properties" : {
            "approvalBatchId" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "approvalTransitionState" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "approvedByUser" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "dateCompleted" : {
              "type" : "date"
            }
          }
        }
      }
    }
  }
}
Documents:
{
  "id": 1,
  "status": "Pending Notifications",
  "approvalNotifications": [
    {
      "approvalBatchId": "e6c39194-5475-4168-9729-8ddcf46cf9ab",
      "dateCompleted": "2018-11-15T16:09:15.346+0000"
    },
    {
      "approvalBatchId": "05eaeb5d-d802-4a28-b699-5e593a59d445"
    }
  ]
}
{
  "id": 2,
  "status": "Pending Notifications",
  "approvalNotifications": [
    {
      "approvalBatchId": "e6c39194-5475-4168-9729-8ddcf46cf9ab"
    }
  ]
}
{
  "id": 3,
  "status": "Complete",
  "approvalNotifications": [
    {
      "approvalBatchId": "e6c39194-5475-4168-9729-8ddcf46cf9ab",
      "dateCompleted": "2018-11-15T16:09:15.346+0000"
    },
    {
      "approvalBatchId": "05eaeb5d-d802-4a28-b699-5e593a59d445",
      "dateCompleted": "2018-11-16T16:09:15.346+0000"
    }
  ]
}
{
  "id": 4,
  "status": "No Notifications"
}
You are almost there: you can achieve the desired behavior by using the nested datatype for the "approvalNotifications" field.
What happens is that Elasticsearch flattens your approvalNotifications objects, treating their subfields as subfields of the original document. A nested field instead tells ES to index each inner object as an implicit separate document, though related to the original one.
To query nested objects, one should use a nested query, as sketched below.
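A minimal sketch under these assumptions (index name my_index as in the question; the field is shown at the top level, as in the sample documents, so adjust the path if it lives under doc; also note an existing object field cannot be changed to nested in place, so the index has to be recreated and the data reindexed):

PUT my_index/_mapping/Document
{
  "properties" : {
    "approvalNotifications" : {
      "type" : "nested"
    }
  }
}

POST my_index/_search
{
  "query": {
    "nested": {
      "path": "approvalNotifications",
      "query": {
        "bool": {
          "must_not": {
            "exists": { "field": "approvalNotifications.dateCompleted" }
          }
        }
      }
    }
  }
}

This matches documents with at least one notification lacking dateCompleted (IDs 1 and 2) and naturally excludes document 4, which has no notifications at all.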
Hope that helps!

Elasticsearch not storing field, what am I doing wrong?

I have something like the following template in my Elasticsearch. I just want a certain part of the data returned, so I turned _source off and explicitly set store for the fields I want.
{
  "template_1" : {
    "order" : 20,
    "template" : "test*",
    "settings" : { },
    "mappings" : {
      "_default_" : {
        "_source" : {
          "enabled" : false
        }
      },
      "type_1" : {
        "mydata" : {
          "store" : "yes",
          "type" : "string"
        }
      }
    }
  }
}
However, when I query the data, I don't get the fields back. The query does work if I enable the _source field. I am just starting with Elasticsearch, so I am not quite sure what I am doing wrong. Any help would be appreciated.
Field definitions should be wrapped in the properties section of your mapping:
"type_1" : {
"properties": {
"mydata" :
"store" : "yes",
"type" : "string"
}
}
}
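Once a field is stored, it has to be requested explicitly at search time; a sketch for the old Elasticsearch versions implied by this string/"store": "yes" mapping (the index name test_index is an assumption):

POST /test_index/_search
{
  "fields" : [ "mydata" ],
  "query" : { "match_all" : { } }
}

In recent Elasticsearch versions the parameter is called stored_fields instead of fields.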

ElasticSearch doesn't seem to support array lookups

I currently have a fairly simple document stored in ElasticSearch that I generated with an integration test:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "unit-test_project600",
      "_type" : "recordDefinition505",
      "_id" : "400",
      "_score" : 1.0,
      "_source" : {
        "field900" : "test string",
        "field901" : "500",
        "field902" : "2050-01-01T00:00:00",
        "field903" : [
          "Open"
        ]
      }
    } ]
  }
}
I would like to filter specifically for field903 with a value of "Open", so I perform the following query:
{
  query: {
    filtered: {
      filter: {
        term: {
          field903: "Open"
        }
      }
    }
  }
}
This returns no results. However, I can use this with other fields and it will return the record:
{
  query: {
    filtered: {
      filter: {
        term: {
          field901: "500"
        }
      }
    }
  }
}
It would appear that I'm unable to search in arrays with ElasticSearch. I have read a few instances of people with a similar problem, but none of them appear to have solved it. Surely this isn't a limitation of ElasticSearch?
I thought that it might be a mapping problem. Here's my mapping:
{
  "unit-test_project600" : {
    "recordDefinition505" : {
      "properties" : {
        "field900" : {
          "type" : "string"
        },
        "field901" : {
          "type" : "string"
        },
        "field902" : {
          "type" : "date",
          "format" : "dateOptionalTime"
        },
        "field903" : {
          "type" : "string"
        }
      }
    }
  }
}
However, the ElasticSearch docs indicate that there is no difference between a string or an array mapping, so I don't think I need to make any changes here.
Try searching for "open" rather than "Open". By default, Elasticsearch uses the standard analyzer when indexing fields, and the standard analyzer applies a lowercase filter, as described in the example here. In my experience, Elasticsearch does search arrays.
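A sketch of the corrected filter, in the same style as the question's snippets (alternatively, field903 could be mapped as not_analyzed to preserve the original casing):

{
  query: {
    filtered: {
      filter: {
        term: {
          field903: "open"
        }
      }
    }
  }
}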
