ElasticSearch _Type with geolocation - elasticsearch

I have set up an Elasticsearch index that has a separate _type mapping for every country, so there is a mapping for "us", "au", "uk", etc.
Each mapping includes a location field of type "geo_point".
Before I added the different _types, my query sort looked like this:
"sort" : [
{
"_geo_distance" : {
"postcode.location" : [' . $mylocation_long . ',' . $mylocation_lat . '],
"order" : "asc",
"unit" : "km"
}
}
],
After adding _types to the data and the mapping, this no longer works. Instead I have to specify it like this:
"sort" : [
{
"_geo_distance" : {
"$country.location" : [' . $mylocation_long . ',' . $mylocation_lat . '],
"order" : "asc",
"unit" : "km"
}
}
],
This works fine.
However, there are times when a query needs to span more than one country, so hard-coding "us.location" isn't correct and won't work.
In that case, how do I make the sort work when I don't know the country but still need to sort by the mapped location?
Or is it simply not possible, and all docs must have the same _type for this to work?

Sorry if I am missing something obvious, but why can't you just sort on "location"? It seems to work just fine:
curl -XDELETE localhost:9200/test-idx/ && echo
curl -XPUT localhost:9200/test-idx/ -d '
{
"settings":{
"number_of_shards":1,
"number_of_replicas":0
},
"mappings": {
"us": {
"properties": {
"location": {
"type": "geo_point"
}
}
},
"uk": {
"properties": {
"location": {
"type": "geo_point"
}
}
},
"au": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}' && echo
curl -XPUT localhost:9200/test-idx/us/1 -d '
{
"location": "42.3606402,-71.0674569"
}
' && echo
curl -XPUT localhost:9200/test-idx/uk/2 -d '
{
"location": "51.5286416,-0.1017943"
}
' && echo
curl -XPUT localhost:9200/test-idx/au/3 -d '
{
"location": "-33.8471226,151.0594183"
}
' && echo
curl -XPOST localhost:9200/test-idx/_refresh && echo
curl "localhost:9200/test-idx/_search?pretty" -d '{
"query": {
"match_all": {}
},
"sort" : [
{
"_geo_distance" : {
"location" : "52.3712989,4.8937347",
"order" : "asc",
"unit" : "km"
}
}
]
}' && echo
output:
{"ok":true,"acknowledged":true}
{"ok":true,"acknowledged":true}
{"ok":true,"_index":"test-idx","_type":"us","_id":"1","_version":1}
{"ok":true,"_index":"test-idx","_type":"uk","_id":"2","_version":1}
{"ok":true,"_index":"test-idx","_type":"au","_id":"3","_version":1}
{"ok":true,"_shards":{"total":1,"successful":1,"failed":0}}
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : null,
"hits" : [ {
"_index" : "test-idx",
"_type" : "uk",
"_id" : "2",
"_score" : null, "_source" : {"location": "51.5286416,-0.1017943"},
"sort" : [ 355.2735714686373 ]
}, {
"_index" : "test-idx",
"_type" : "us",
"_id" : "1",
"_score" : null, "_source" : {"location": "42.3606402,-71.0674569"},
"sort" : [ 5563.606078215864 ]
}, {
"_index" : "test-idx",
"_type" : "au",
"_id" : "3",
"_score" : null, "_source" : {"location": "-33.8471226,151.0594183"},
"sort" : [ 16650.926847312003 ]
} ]
}
}
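To see why a shared field name is enough, here is a minimal Python sketch of what the sort does conceptually: it ignores _type and ranks the documents by great-circle distance from the origin. The haversine formula and the 6371 km Earth radius are assumptions of this sketch, not necessarily what Elasticsearch uses internally, so the distances only approximate the sort values above. Note that the string form of a geo_point is "lat,lon", while the array form used in the question is [lon, lat].

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def parse_lat_lon(value):
    """Parse the "lat,lon" string form of a geo_point into a (lat, lon) tuple."""
    lat, lon = (float(part) for part in value.split(","))
    return lat, lon


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# The three documents from the example, one per _type.
docs = [
    {"_type": "us", "location": "42.3606402,-71.0674569"},
    {"_type": "uk", "location": "51.5286416,-0.1017943"},
    {"_type": "au", "location": "-33.8471226,151.0594183"},
]
origin = parse_lat_lon("52.3712989,4.8937347")

# Sort on the shared field name only; the _type plays no role.
ranked = sorted(docs, key=lambda d: haversine_km(origin, parse_lat_lon(d["location"])))

for doc in ranked:
    km = haversine_km(origin, parse_lat_lon(doc["location"]))
    print(doc["_type"], round(km, 1))
```

The order comes out as uk, us, au, the same order as the hits in the response above.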

What happens when you point the working query at /index/_search instead of /index/type/_search?

Related

How do I create a default mapping for a field on my documents, that will not be made redundant in the next major version of Elasticsearch?

I'm on Elasticsearch 7.14.0 where mapping types have been removed.
Following from this question I have learned that the generic URI to PUT documents is /[index]/_doc/[id].
I want to create a default mapping for my documents on the name field:
curl -X PUT "localhost:9200/products?pretty" -H 'Content-Type: application/json' -d'
{
"mappings":{
"properties":{
"name":{
"analyzer":"edge_ngram_analyzer",
"search_analyzer":"standard",
"type":"text"
}
}
},
"settings":{
"analysis":{
"filter":{
"edge_ngram":{
"type":"edge_ngram",
"min_gram":"2",
"max_gram":"25",
"token_chars":[
"letter",
"digit"
]
}
},
"analyzer":{
"edge_ngram_analyzer":{
"filter":[
"lowercase",
"edge_ngram"
],
"tokenizer":"standard"
}
}
}
}
}
'
However creating a new document doesn't apply the analyzer:
curl -X PUT "localhost:9200/products/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
"name": "Toast"
}
'
curl -X GET "localhost:9200/products/_search?pretty"
{
"took" : 1026,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "products",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "Toast"
}
}
]
}
}
I've tried creating the mapping under the _doc type, but am getting the following error:
curl -X PUT "localhost:9200/products?pretty" -H 'Content-Type: application/json' -d'
{
"mappings":{
"_doc":{
"properties":{
"name":{
"analyzer":"edge_ngram_analyzer",
"search_analyzer":"standard",
"type":"text"
}
}
}
},
"settings":{
"analysis":{
"filter":{
"edge_ngram":{
"type":"edge_ngram",
"min_gram":"2",
"max_gram":"25",
"token_chars":[
"letter",
"digit"
]
}
},
"analyzer":{
"edge_ngram_analyzer":{
"filter":[
"lowercase",
"edge_ngram"
],
"tokenizer":"standard"
}
}
}
}
}
'
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "The mapping definition cannot be nested under a type [_doc] unless include_type_name is set to true."
}
],
"type" : "illegal_argument_exception",
"reason" : "The mapping definition cannot be nested under a type [_doc] unless include_type_name is set to true."
},
"status" : 400
}
However, I've read that:
Elasticsearch 8.x: Specifying types in requests is no longer supported. The include_type_name parameter is removed.
How do I create a default mapping for a field on my documents, that will not be made redundant in the next major version of Elasticsearch?
This question was due to a misunderstanding on my part (I'm new to ES). I thought the result returned from a search would include the underlying analysis of the fields. It doesn't: _source holds the original document exactly as it was sent, and the analyzer only affects the terms stored in the index. When I perform a partially matching search, the document is correctly returned, so the above mapping works as intended:
curl -X GET "localhost:9200/products/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"name": "To"
}
}
}
'
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.41501677,
"hits" : [
{
"_index" : "products",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.41501677,
"_source" : {
"name" : "Toast"
}
}
]
}
}
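The reason "To" matches "Toast" can be illustrated with a rough Python approximation of the two analyzers in the mapping. This is only a sketch: the regular expression is a simplified stand-in for the standard tokenizer, and the function names are made up for illustration. The min_gram and max_gram values are the ones from the edge_ngram filter above.

```python
import re

MIN_GRAM, MAX_GRAM = 2, 25  # values from the edge_ngram filter in the mapping


def standard_tokens(text):
    """Simplified stand-in for the standard tokenizer: split on non-alphanumerics."""
    return re.findall(r"[A-Za-z0-9]+", text)


def index_analyzer(text):
    """Approximates edge_ngram_analyzer: tokenize, lowercase, emit edge n-grams."""
    terms = []
    for token in standard_tokens(text):
        token = token.lower()
        for size in range(MIN_GRAM, min(MAX_GRAM, len(token)) + 1):
            terms.append(token[:size])
    return terms


def search_analyzer(text):
    """Approximates the standard analyzer: tokenize and lowercase, no n-grams."""
    return [token.lower() for token in standard_tokens(text)]


def matches(document_text, query_text):
    """A match query hits when any query term equals an indexed term."""
    indexed = set(index_analyzer(document_text))
    return any(term in indexed for term in search_analyzer(query_text))


print(index_analyzer("Toast"))   # terms stored in the index
print(search_analyzer("To"))     # terms produced at search time
print(matches("Toast", "To"))
```

The indexed terms never appear in _source, which is why the plain search result looked as if no analyzer had been applied. On a live cluster, the _analyze API is the usual way to inspect the real terms.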

cannot resolve symbol[string] when using updateByQuery with ElasticSearch

I have the following set-up:
mapping:
esClient.indices.putMapping({
index: 'tests',
body: {
properties: {
name: {
type: 'text',
},
lastName: {
type: 'text',
},
},
},
});
This is the result when I query the entries:
curl -X GET "localhost:9200/tests/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 1000,
"query" : {
"match_all" : {}
}
}
'
{
"took" : 18,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "tests",
"_type" : "_doc",
"_id" : "KJbtj3kBRlqnip7VJLLI",
"_score" : 1.0,
"_source" : {
"lastName" : 1,
"name" : "lucas"
}
}
]
}
}
I tried to update the entry's last name with the following curl:
curl -X POST "localhost:9200/tests/_update_by_query?pretty" -H 'Content-Type: application/json' -d'
{
"script": {
"source": "ctx._source.lastName='johnson'",
"lang": "painless"
},
"query": {
"term": {
"name": "lucas"
}
}
}
'
And this is the error I'm getting:
{
"error" : {
"root_cause" : [
{
"type" : "script_exception",
"reason" : "compile error",
"script_stack" : [
"ctx._source.lastName=johnson",
" ^---- HERE"
],
"script" : "ctx._source.lastName=johnson",
"lang" : "painless",
"position" : {
"offset" : 21,
"start" : 0,
"end" : 28
}
}
],
"type" : "script_exception",
"reason" : "compile error",
"script_stack" : [
"ctx._source.lastName=johnson",
" ^---- HERE"
],
"script" : "ctx._source.lastName=johnson",
"lang" : "painless",
"position" : {
"offset" : 21,
"start" : 0,
"end" : 28
},
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "cannot resolve symbol [johnson]"
}
},
"status" : 400
}
If I use an integer instead of a string, the update works; otherwise I keep getting that error.
Thanks a lot for your help.
You need to surround the new lastName field value with ' '.
Adding a working example
Index Data:
{
"name":"lucas",
"lastName":"erla"
}
Query:
POST _update_by_query
{
"script": {
"source": "ctx._source.lastName='johnson'",
"lang": "painless"
},
"query": {
"term": {
"name": "lucas"
}
}
}
After calling the update by query API, the document is updated to:
GET 67641538/_doc/1
{
"_index": "67641538",
"_type": "_doc",
"_id": "1",
"_version": 3,
"_seq_no": 2,
"_primary_term": 1,
"found": true,
"_source": {
"lastName": "johnson",
"name": "lucas"
}
}
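Another way to avoid quoting problems altogether is to pass the new value through script params rather than embedding it in the script source. The Python sketch below only builds the request body; the helper name build_update_body is made up for illustration, and sending the body to the _update_by_query endpoint is left to whatever HTTP client you use.

```python
import json


def build_update_body(new_last_name, name):
    """Build an _update_by_query body that passes the new value via script params."""
    return {
        "script": {
            # The script text contains no string literal, so there is nothing to quote.
            "source": "ctx._source.lastName = params.lastName",
            "lang": "painless",
            "params": {"lastName": new_last_name},
        },
        "query": {"term": {"name": name}},
    }


body = build_update_body("johnson", "lucas")
payload = json.dumps(body)
print(payload)
```

Because the serialized payload contains no single quotes, it can be wrapped in single quotes on a shell command line without being altered.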
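You need to surround the new lastName field value with " ".
I faced a similar problem and solved it by using " " instead of ' '. In the curl command the whole request body is already wrapped in single quotes, so the shell removes the inner ' ' and Painless receives lastName=johnson, which is exactly what the script_stack in the error shows. Escaped double quotes survive the shell. My ES version is 8.4.
Query: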
curl -X POST "localhost:9200/tests/_update_by_query?pretty" -H 'Content-Type: application/json' -d'
{
"script": {
"source": "ctx._source.lastName=\"johnson\"",
"lang": "painless"
},
"query": {
"term": {
"name": "lucas"
}
}
}
'
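The effect of the shell on the two variants can be reproduced with Python's shlex module, which follows POSIX shell quoting rules. This is a sketch under the assumption that the commands are run from a POSIX-style shell; the request bodies are shortened to the script part, and script_source is a made-up helper name.

```python
import json
import shlex

# Shortened versions of the two curl invocations (only the script part of the body).
single_quoted = """curl -d'{"script": {"source": "ctx._source.lastName='johnson'"}}'"""
escaped_double = r"""curl -d'{"script": {"source": "ctx._source.lastName=\"johnson\""}}'"""


def script_source(command):
    """Return the script source Elasticsearch would receive after shell parsing."""
    args = shlex.split(command)      # applies POSIX quoting rules
    body = args[1][len("-d"):]       # strip the -d prefix that is glued to the body
    return json.loads(body)["script"]["source"]


print(script_source(single_quoted))   # the inner single quotes are gone
print(script_source(escaped_double))  # the double quotes arrive intact
```

The first variant reaches Painless as ctx._source.lastName=johnson, so johnson is parsed as an unknown symbol; the second arrives with the string literal intact.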

enabled fielddata on text field in ElasticSearch but aggregation is not working

According to the documentation, you can run Elasticsearch aggregations on fields of type keyword, on other non-text fields, or on text fields that have fielddata set to true in the index mapping.
I am trying to count city_name values in an nginx log. The aggregation works fine with the int field result, but it does not work with the field city_name, even after I updated the index mapping for that field to set fielddata=true. That should not have been required, as the field was of type keyword.
By "does not work" I mean that the buckets come back empty:
"aggregations" : {
"cities" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
Here is the field mapping:
"city_name" : {
"type" : "text",
"fielddata" : true
},
And here is the aggregation query:
curl -XGET --user $pwd --header 'Content-Type: application/json' 'https://58571402f5464923883e7be42a037917.eu-central-1.aws.cloud.es.io:9243/logstash/_search?pretty' -d '{
"aggs" : {
"cities": {
"terms" : { "field": "city_name"}
}
}
}'
If you don't get any error when executing your search, this looks more like a problem with the data. Are you sure you have at least one document with the city_name field filled in?
I tried to reproduce your issue with Elasticsearch 6.6.2.
I created an index:
PUT cities
{
"mappings": {
"city": {
"dynamic": "true",
"properties": {
"id": {
"type": "long"
},
"city_name": {
"type": "text",
"fielddata": true
}
}
}
}
}
I added one document without the city_name
PUT cities/city/1
{
"id": "1"
}
When I performed the search:
GET cities/_search
{
"aggs": {
"cities": {
"terms" : { "field": "city_name"}
}
}
}
I got no buckets in the cities aggregation. But when I added one document with the city name filled:
PUT cities/city/2
{
"id": "2",
"city_name": "London"
}
I got the expected result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "cities",
"_type" : "city",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"id" : "2",
"city_name" : "london"
}
},
{
"_index" : "cities",
"_type" : "city",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"id" : "1"
}
}
]
},
"aggregations" : {
"cities" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "london",
"doc_count" : 1
}
]
}
}
}
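The behaviour in this reproduction can be mimicked with a small Python sketch of a terms aggregation: documents that lack the field contribute nothing, so an index where no document has city_name yields an empty bucket list without any error. Lowercasing and splitting on whitespace is only a crude approximation of how an analyzed text field is turned into terms, and terms_aggregation is a made-up helper, not an Elasticsearch API.

```python
from collections import Counter


def terms_aggregation(docs, field):
    """Count, per term, how many documents contain it; skip documents without the field."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # a missing field produces no terms and therefore no bucket
        # Crude stand-in for text analysis: lowercase and split on whitespace.
        for term in set(str(value).lower().split()):
            counts[term] += 1
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


only_missing = [{"id": "1"}]
with_city = [{"id": "1"}, {"id": "2", "city_name": "London"}]

print(terms_aggregation(only_missing, "city_name"))
print(terms_aggregation(with_city, "city_name"))
```

So an empty buckets array is a reason to check whether the field name in the aggregation matches a field that is actually populated in the indexed documents.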

Elasticsearch geo_point doesn't work properly

I want to run geo_point queries on Elasticsearch, but they don't work properly for me. I always get an empty result for geo_polygon queries. Maybe my mapping is wrong, or the way I load the data.
Mapping:
curl -XPUT 'localhost:9200/botanique_localisation/' -d '{
"mappings":{
"botanique_localisation" : {
"_all" : {"enabled" : true},
"_index" : {"enabled" : true},
"_id" : {"index": "not_analyzed", "store" : false},
"properties" : {
"_id" : {"type" : "string", "store" : "no","index": "not_analyzed" } ,
"LOCATION" : { "type" : "geo_point","lat_lon" :true ,"validate":true , "store":"yes" }
}
}
}
}'
Creating the view in Oracle:
create view all_specimens_localisation as select RAWTOHEX( SPECIMENS.occurrenceid ) as "_id" ,
decode(LOCALISATIONS.decimalLatitude ||',' || LOCALISATIONS.decimalLongitude, ',', null ,
'{"lat":' || replace(LOCALISATIONS.decimalLatitude,',' ,'.' ) ||',"lon":' || replace(LOCALISATIONS.decimalLongitude , ',' ,'.' ) || '}'
) as location
from SPECIMENS left outer join ... where rownum < 1000 ;
I build a JSON object in the SQL because sending lat/lon as a string didn't work for me (Elasticsearch doesn't split the string as described here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-geo-point-type.html#_lat_lon_as_string_6).
Creating the river from Oracle to Elasticsearch:
curl -XPUT 'localhost:9200/_river/localisation_river/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"index" : "botanique_localisation",
"bulk_size" : 2000,
"max_bulk_requests" : 10,
"bulk_flush_interval" : "1s",
"type" : "specimens",
"url" : "********",
"user" : "********",
"password" : "********",
"sql" : "select * from all_specimens_localisation"
}
}'
Example of indexed data in Elasticsearch:
{
_index: botanique_localisation
_type: specimens
_id: 38C8F872A449491C881791DE8B501B17
_score: 1.4142135
_source: {
LOCATION: {
lon: 47.05
lat: -19.95
}
}
}
Working range query:
curl -XGET 'localhost:9200/botanique_localisation/specimens/_search?size=10&pretty' -d '
{ "query": { "bool": { "must": [
{ "range": {
"LOCATION.lon": {
"from": 47.04,
"to": 47.08
}
}
},{ "range": {
"LOCATION.lat": {
"from": -20,
"to": -19.90
}
}
}
]}}}'
And the result:
hits:{[
{ "_index": botanique_localisation,
"_type": specimens,
"_id": 38C8F872A449491C881791DE8B501B17,
"_score": 1.4142135,
"_source": {
"LOCATION": { "lon": 47.05, "lat": -19.95 }
}
},...
Now the fun, non-working part: the geo_polygon query:
curl -XGET 'localhost:9200/botanique_localisation/_search?size=10&pretty' -d '{
"query":{
"filtered" : {
"query" : { "match_all" : {}},
"filter" : {
"geo_polygon" : {
"LOCATION" : {
"points" : [
{ "lat": 100, "lon": -100},
{ "lat": 100, "lon": 100},
{ "lat": -100, "lon": 100 },
{ "lat": -100 , "lon": -100 }
]
}
}
}
}
}
}'
This returns no hits!
What am I missing?
Thank you.
This query works:
curl -XGET 'localhost:9200/botanique_localisation/_search?pretty' -d '{
"query" : {
"filtered" : {
"filter" : {
"geo_bounding_box" : {
"type" : "indexed",
"LOCATION" : {
"top_left" : {
"lat" : 50,
"lon" : -50
},
"bottom_right" : {
"lat" :-50,
"lon" : 50
}
}
}
}
}
}
}'
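One thing that can be checked mechanically is whether the coordinates in the two filters are valid at all. Latitude is only defined between -90 and 90 and longitude between -180 and 180. The geo_polygon request above uses latitudes of 100 and -100, whereas the working geo_bounding_box stays within 50 and -50. The Python sketch below performs only that range check. Whether the out-of-range points are what makes the polygon filter return nothing is an assumption, not something this thread confirms, and is_valid_point is a made-up helper name.

```python
def is_valid_point(point):
    """Return True if lat/lon fall inside the ranges a geographic point can take."""
    return -90.0 <= point["lat"] <= 90.0 and -180.0 <= point["lon"] <= 180.0


# Points copied from the geo_polygon filter in the question.
polygon_points = [
    {"lat": 100, "lon": -100},
    {"lat": 100, "lon": 100},
    {"lat": -100, "lon": 100},
    {"lat": -100, "lon": -100},
]

# Corners copied from the working geo_bounding_box filter.
bounding_box_points = [
    {"lat": 50, "lon": -50},
    {"lat": -50, "lon": 50},
]

# The sample document shown in the question.
indexed_point = {"lat": -19.95, "lon": 47.05}

invalid_polygon_points = [p for p in polygon_points if not is_valid_point(p)]

print("invalid polygon points:", invalid_polygon_points)
print("bounding box valid:", all(is_valid_point(p) for p in bounding_box_points))
print("indexed point valid:", is_valid_point(indexed_point))
```

It may also be worth noticing that the mapping above is defined for the type botanique_localisation, while the river indexes the documents under the type specimens, so the geo_point mapping may not be the one applied to the indexed data.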

ElasticSearch - searching different doc_types with the same field name but different analyzers

Let's say I make a simple ElasticSearch index:
curl -XPUT 'http://localhost:9200/test/' -d '{
"settings": {
"analysis": {
"char_filter": {
"de_acronym": {
"type": "mapping",
"mappings": [".=>"]
}
},
"analyzer": {
"analyzer1": {
"type": "custom",
"tokenizer": "keyword",
"char_filter": ["de_acronym"]
}
}
}
}
}'
And I make two doc_types that have the same property name but they are analyzed slightly differently from one another:
curl -XPUT 'http://localhost:9200/test/_mapping/docA' -d '{
"docA": {
"properties": {
"name": {
"type": "string",
"analyzer": "simple"
}
}
}
}'
curl -XPUT 'http://localhost:9200/test/_mapping/docB' -d '{
"docB": {
"properties": {
"name": {
"type": "string",
"analyzer": "analyzer1"
}
}
}
}'
Next, let's say I put a document in each doc_type with the same name:
curl -XPUT 'http://localhost:9200/test/docA/1' -d '{ "name" : "U.S. Army" }'
curl -XPUT 'http://localhost:9200/test/docB/1' -d '{ "name" : "U.S. Army" }'
Let's try to search for "U.S. Army" in both doc types at the same time:
curl -XGET 'http://localhost:9200/test/_search?pretty' -d '{
"query": {
"match_phrase": {
"name": {
"query": "U.S. Army"
}
}
}
}'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.5,
"hits" : [ {
"_index" : "test",
"_type" : "docA",
"_id" : "1",
"_score" : 1.5,
"_source":{ "name" : "U.S. Army" }
} ]
}
}
I only get one result! I get the other result when I specify docB's analyzer:
curl -XGET 'http://localhost:9200/test/_search?pretty' -d '
{
"query": {
"match_phrase": {
"name": {
"query": "U.S. Army",
"analyzer": "analyzer1"
}
}
}
}'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test",
"_type" : "docB",
"_id" : "1",
"_score" : 1.0,
"_source":{ "name" : "U.S. Army" }
} ]
}
}
I was under the impression that ES would search each doc_type with the appropriate analyzer. Is there a way to do this?
The ElasticSearch docs say that precedence for search analyzer goes:
1) The analyzer defined in the query itself, else
2) The analyzer defined in the field mapping, else
...
In this case, is ElasticSearch arbitrarily choosing which field mapping to use?
Take a look at this issue on GitHub, which seems to have started from this post in the ES Google Groups. I believe it answers your question:
if its in a filtered query, we can't infer it, so we simply pick one of those and use its analysis settings
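Why each query finds only one document can be seen by approximating the two analyzers in Python. This is a rough sketch: simple_analyzer mimics the simple analyzer (split on non-letters, lowercase) for ASCII input only, analyzer1 mimics the custom analyzer from the settings (remove ".", then keep the whole input as one token), and phrase_matches is a made-up helper that stands in for match_phrase.

```python
import re


def simple_analyzer(text):
    """Approximates the simple analyzer: split on non-letters and lowercase (ASCII only)."""
    return re.findall(r"[a-z]+", text.lower())


def analyzer1(text):
    """Approximates the custom analyzer: drop '.' characters, keyword tokenizer."""
    return [text.replace(".", "")]


def phrase_matches(indexed_terms, query_terms):
    """True if the query terms occur as a contiguous run in the indexed terms."""
    n = len(query_terms)
    return n > 0 and any(
        indexed_terms[i:i + n] == query_terms
        for i in range(len(indexed_terms) - n + 1)
    )


text = "U.S. Army"
doc_a = simple_analyzer(text)   # how docA stores the name
doc_b = analyzer1(text)         # how docB stores the name

for label, query_terms in (("simple", simple_analyzer(text)), ("analyzer1", analyzer1(text))):
    print(label, "-> docA:", phrase_matches(doc_a, query_terms),
          "docB:", phrase_matches(doc_b, query_terms))
```

Since the query string is analyzed once, with a single analyzer, its terms can line up with the terms of only one of the two mappings.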

Resources