How to convert existing coordinates in ElasticSearch to geopoints - elasticsearch

I am trying to convert latitude and longitude values to geo_points in Elasticsearch. The problem is that the data, including the latitude and longitude values, is already indexed in Elasticsearch, and I am having trouble converting it. I have a feeling there is a solution using Painless, but I haven't quite pinpointed it.
This is what the mapping looks like:
{
"temporary_index" : {
"mappings" : {
"handy" : {
"properties" : {
"CurrentLocationObj" : {
"properties" : {
"lat" : {
"type" : "float"
},
"lon" : {
"type" : "float"
}
}
},
"current_latitude" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"current_longitude" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"location" : {
"type" : "geo_point"
}
}
}
}
}
}
And this is what a sample doc looks like
"hits" : [
{
"_index" : "temporary_index",
"_type" : "handy",
"_id" : "9Q8ijmsBaU9mgS87_blD",
"_score" : 1.0,
"_source" : {
"current_longitude" : "139.7243101",
"current_latitude" : "35.6256271",
"CurrentLocationObj" : {
"lat" : 35.6256271,
"lon" : 139.7243101
}
There are obviously more fields, but I have removed them for the sake of clarity.
This is what I have tried.
POST temporary_index/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"inline": "ctx._source.location = [ctx._source.current_latitude, ctx._source.current_longitude]",
"lang": "painless"
}
}
However I get the following error:
"reason": "failed to parse field [location] of type [geo_point]",
"caused_by": {
"type": "parse_exception",
"reason": "unsupported symbol [.] in geohash [35.4428348]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "unsupported symbol [.] in geohash [35.4428348]"
}
}
I used the following Stack Overflow question as the basis for my attempt, but I am clearly doing something wrong. Any advice is helpful! Thanks.
Swapping coordinates of geopoints in elasticsearch index

Great start! You're almost there.
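Just note that when specifying a geo_point as an array, the longitude must be the first element and the latitude comes next, and both must be numbers. Your script builds an array of strings, and each string is parsed as a separate point, which is why the error complains about a geohash. I suggest you use the object form instead, which will work: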
POST temporary_index/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"inline": "ctx._source.location = ['lat': ctx._source.current_latitude, 'lon': ctx._source.current_longitude]",
"lang": "painless"
}
}
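To make the difference between the two formats concrete, here is a minimal Python sketch of what the update does to each document's _source. It is only an illustration that runs outside Elasticsearch. The helper names to_geo_point and to_geo_point_array are hypothetical, and converting the strings to floats is an assumption on my part, not something the Painless script above does.

```python
def to_geo_point(source):
    """Build a geo_point object from the string latitude/longitude fields."""
    return {
        "lat": float(source["current_latitude"]),
        "lon": float(source["current_longitude"]),
    }


def to_geo_point_array(source):
    """Same point in array form: longitude first, then latitude, as numbers."""
    point = to_geo_point(source)
    return [point["lon"], point["lat"]]


# Values taken from the sample document in the question.
doc = {"current_longitude": "139.7243101", "current_latitude": "35.6256271"}
print(to_geo_point(doc))
print(to_geo_point_array(doc))
```

The object form names each coordinate explicitly, so there is no ordering to get wrong.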

match_only_text fields do not support sorting and aggregations elasticsearch

I would like to count the occurrences of each message in a field of type match_only_text and sort them by count. Using a DSL query, the output needs to look like this:
{" Text message 1":615
" Text message 2":568
....}
So I tried this in Kibana:
GET my_index_name/_search?size=0
{
"aggs": {
"type_promoted_count": {
"cardinality": {
"field": "message"
}
}
}
}
However, I get this error:
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "match_only_text fields do not support sorting and aggregations"
}
I am interested in the field "message". This is its mapping:
"message" : {
"type" : "match_only_text"
}
This is a part of the index mapping:
"mappings" : {
"_meta" : {
"package" : {
"name" : "system"
},
"managed_by" : "ingest-manager",
"managed" : true
},
"_data_stream_timestamp" : {
"enabled" : true
},
"dynamic_templates" : [
{
"strings_as_keyword" : {
"match_mapping_type" : "string",
"mapping" : {
"ignore_above" : 1024,
"type" : "keyword"
}
}
}
],
"date_detection" : false,
"properties" : {
"@timestamp" : {
"type" : "date"
}
.
.
.
"message" : {
"type" : "match_only_text"
},
"process" : {
"properties" : {
"name" : {
"type" : "keyword",
"ignore_above" : 1024
},
"pid" : {
"type" : "long"
}
}
},
"system" : {
"properties" : {
"syslog" : {
"type" : "object"
}
}
}
}
}
}
}
Please help.

Yes, by design, match_only_text belongs to the text field type family, hence you cannot aggregate on it.
You need to:
A. create a message.keyword sub-field in your mapping of type keyword:
PUT my_index_name/_mapping
{
"properties": {
"message" : {
"type" : "match_only_text",
"fields": {
"keyword": {
"type" : "keyword"
}
}
}
}
}
B. update the whole index (using _update_by_query) so the sub-field gets populated:
POST my_index_name/_update_by_query?wait_for_completion=false
Then, depending on the size of your index, call GET _tasks?actions=*byquery&detailed regularly to check the progress of the task.
C. run the aggregation on that sub-field.
POST my_index_name/_search
{
"size": 0,
"aggs": {
"type_promoted_count": {
"cardinality": {
"field": "message.keyword"
}
}
}
}
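Note that cardinality only returns the number of distinct messages. If you want the per-message counts shown in the question, a terms aggregation on the same sub-field is the usual choice. Below is a rough Python sketch that builds such a request body and flattens a response into a dict. The function names, the aggregation name message_counts, the bucket size of 10, and the sample response are made up for illustration, and no request is actually sent.

```python
def build_message_count_body(field="message.keyword", size=10):
    """Request body for a terms aggregation (buckets come back sorted by count, highest first)."""
    return {
        "size": 0,
        "aggs": {"message_counts": {"terms": {"field": field, "size": size}}},
    }


def buckets_to_counts(response):
    """Turn the terms buckets of a search response into {message: count}."""
    buckets = response["aggregations"]["message_counts"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hand-written response fragment in the shape a terms aggregation returns.
sample_response = {
    "aggregations": {
        "message_counts": {
            "buckets": [
                {"key": "Text message 1", "doc_count": 615},
                {"key": "Text message 2", "doc_count": 568},
            ]
        }
    }
}

print(build_message_count_body())
print(buckets_to_counts(sample_response))
```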

How to split object (nested) into multiple columns in Elasticsearch / Kibana data table visualization

I have a nested object indexed in Elasticsearch (7.10) and I need to visualize it in a Kibana data table. The problem is that Kibana puts all values of the nested field that share the same name into a single column.
Part of the index:
{
"index" : {
"mappings" : {
"properties" : {
"data1" : {
"type" : "keyword"
},
"Details" : {
"type" : "nested",
"properties" : {
"Amount" : {
"type" : "float"
},
"Currency" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"DetailType" : {
"type" : "keyword"
},
"Price" : {
"type" : "float"
},
"Quantity" : {
"type" : "float"
},
"TotalAmount" : {
"type" : "float"
.......
The problem in the table:
How can I get three rows named Details, each with one split term (e.g. DetailType: "start_fee")?
Update:
I could query the nested object in the console:
GET _search
{
"query": {
"nested": {
"path": "Details",
"query": {
"bool": {
"must": [
{ "match": { "Details.DetailType": "energybased_fee" }}
]
}
},
"inner_hits": {
}
}}}
But how can I show only the "inner_hits" values in the table?

Elasticsearch: How to calculate the yield (percentage of success)?

My goal is to calculate the yield of each benchId. That is: for each bench, what percentage of teams have isPassed=True the first time they take the test? I would like a visualization of the yield for each bench.
My Elasticsearch mapping is:
"test-logs" : {
"mappings" : {
"log" : {
"properties" : {
"benchGroup" : {
"type" : "keyword"
},
"benchId" : {
"type" : "keyword"
},
"date" : {
"type" : "date",
"format" : "yyyy/MM/dd HH:mm:ss"
},
"duration" : {
"type" : "float"
},
"finalStatus" : {
"type" : "keyword"
},
"isCss" : {
"type" : "boolean"
},
"isPassed" : {
"type" : "boolean"
},
"machine" : {
"type" : "keyword"
},
"sha1" : {
"type" : "keyword"
},
"uuid" : {
"type" : "keyword"
},
"team" : {
"type" : "keyword"
}
I tried to divide this issue into several sub-issues. I think I need to aggregate the documents by benchId, then sub-aggregate them by team, order them by date, and take the first document. Then I think I need to use a script to calculate (first attempts with isPassed=True) / (all first attempts).
I have no idea how to visualize the result in Kibana though.
I managed to create aggregations with this search:
GET _search
{
"size" : 0,
"aggs": {
"benchId": {
"terms": {
"field": "benchId"
},
"aggs": {
"teams": {
"terms": {
"script": "doc['uut'].join(' & ')",
"size": 10
}
}
}
}
}
}
I get the result I want, but I have difficulty adding an ascending sort by date with a limit of one document per uut.
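To pin down the computation described above, here is a small plain-Python sketch of it, independent of Elasticsearch: keep the earliest document per (benchId, team), then divide the passed first attempts by all first attempts for each bench. The sample documents and the function name are invented for illustration, and the field names are taken from the mapping (team rather than uut), which is an assumption.

```python
from collections import defaultdict
from datetime import datetime

DATE_FORMAT = "%Y/%m/%d %H:%M:%S"  # Python equivalent of the mapping's yyyy/MM/dd HH:mm:ss


def first_attempt_yield(docs):
    """Return {benchId: percentage of teams whose earliest test run passed}."""
    first = {}  # (benchId, team) -> (timestamp, earliest document seen so far)
    for doc in docs:
        key = (doc["benchId"], doc["team"])
        when = datetime.strptime(doc["date"], DATE_FORMAT)
        if key not in first or when < first[key][0]:
            first[key] = (when, doc)

    passed = defaultdict(int)
    total = defaultdict(int)
    for (bench, _team), (_when, doc) in first.items():
        total[bench] += 1
        passed[bench] += 1 if doc["isPassed"] else 0
    return {bench: 100.0 * passed[bench] / total[bench] for bench in total}


# Invented sample data: team t1 fails its first run on bench b1, then passes later.
docs = [
    {"benchId": "b1", "team": "t1", "date": "2018/01/02 10:00:00", "isPassed": True},
    {"benchId": "b1", "team": "t1", "date": "2018/01/01 10:00:00", "isPassed": False},
    {"benchId": "b1", "team": "t2", "date": "2018/01/01 11:00:00", "isPassed": True},
    {"benchId": "b2", "team": "t1", "date": "2018/01/03 09:00:00", "isPassed": True},
]
print(first_attempt_yield(docs))
```

This only shows the target numbers. Expressing the same logic as an aggregation is still the open part of the question.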

Is there a way to Search through Elastic Search to get all results that have an ID contained in an array of IDs?

I have been trying to find a way to do this for a couple of days now. I've looked through the 'bool', 'constant_score', and 'filtered' queries, none of which seem to give the result I want.
One that HAS come close is the 'ids' query (it does exactly what I described in the title of this question). The one problem is that the key I'm trying to search on is not the '_id' value of the Elasticsearch index. Instead it is 'posterId' in the document below:
"_index": "activity",
"_type": "activity",
"_id": "<unique string id>",
"_score": null,
"_source": {
...
misc keys
...
"posterId": "<QUERY BASED ON THIS VALUE>",
"time": 20171007173623
}
Query that returns based on the _id value:
ids : {
type : "activity",
values : ["<unique string id>", ...]
}
as seen here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html
How I want my query to work:
posterId : {
type : "activity",
values : [<list of posterIds>]
}
Returning all documents whose posterId is contained in "<list of posterIds>"
< Edit > I'm trying to do this in one query as opposed to looping through each member of my list of posterIds, because I also need to sort based on the time key and be able to page the query.
So, does anyone know of a built in query that does this or a work around?
Side note: if you feel like you're about to downvote this please just comment why, I'm about to be banned and I've read through all the guidelines and I feel like I'm following them but my questions rarely perform well. :( It would be much appreciated
Edit:
{
"activity" : {
"aliases" : { },
"mappings" : {
"activity" : {
"properties" : {
"-Kvp7f3epvW_dXSONzKj" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"actionId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"actionType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"activityType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"attachedId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"attachedType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"cardType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"noteTitleDict" : {
"properties" : {
"noun" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"subject" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"verb" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"posterId" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"segueType" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"time" : {
"type" : "long"
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1507678305995",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "<id>",
"version" : {
"created" : "5010199"
},
"provided_name" : "activity"
}
}
}
}
I think what you are looking for is a Terms Query:
{
"query": {
"constant_score" : {
"filter" : {
"terms" : { "user" : ["kimchy", "elasticsearch"]}
}
}
}
}
This finds documents which contain the exact term kimchy or elasticsearch in the index of the user field. You can read more about this here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html
In your case you need to replace:
user with posterId.keyword
kimchy and elasticsearch with all your posterIds
Keep in mind that a terms query is case sensitive and the keyword field does not use a lowercase analyzer (which means it'll index the value in the same case it was received).
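Put together for your case, the request body could look roughly like the Python sketch below. It only builds the dictionary and sends nothing. Sorting by time in descending order and the from/size paging values are assumptions based on what the question says it needs, and the helper name build_poster_query is made up.

```python
def build_poster_query(poster_ids, page=0, page_size=20):
    """Build a search body matching any of the given posterIds, sorted by time and paged."""
    return {
        "query": {
            "constant_score": {
                "filter": {"terms": {"posterId.keyword": list(poster_ids)}}
            }
        },
        "sort": [{"time": {"order": "desc"}}],
        "from": page * page_size,  # offset of the first hit on this page
        "size": page_size,
    }


# Placeholder IDs; use the exact posterId values, in the same case they were indexed.
body = build_poster_query(["poster-1", "poster-2"], page=2, page_size=10)
print(body)
```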

Elasticsearch on object nested under objects array

Assuming I have the following index structure:
{
"title": "Early snow this year",
"body": "After a year with hardly any snow, this is going to be a serious winter",
"source": [
{
"name":"CNN",
"details": {
"site": "cnn.com"
}
},
{
"name":"BBC",
"details": {
"site": "bbc.com"
}
}
]
}
and I have a bool query to try and retrieve this document here:
{
"query": {
"bool" : {
"must" : {
"query_string" : {
"query" : "snow",
"fields" : ["title", "body"]
}
},
"filter": {
"bool": {
"must" : [
{ "term" : {"source.name" : "bbc"}},
{ "term" : {"source.details.site" : "BBC.COM"}}
]
}
}
}
}
}
But it returns zero hits. How should I modify my query? It only works if I remove the { "term" : {"source.details.site" : "BBC.COM"}}.
Here is the mapping:
{
"news" : {
"mappings" : {
"article" : {
"properties" : {
"body" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"source" : {
"properties" : {
"details" : {
"properties" : {
"site" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
You are doing a term query on "source.details.site". A term query means that the value you provide is not analysed at query time. If you are using the default mapping, the value of source.details.site is lowercased at index time. When you query it with term and "BBC.COM", "BBC.COM" is not analysed, so ES tries to match "BBC.COM" against "bbc.com" (because it was lowercased at index time) and finds no match.
You can use match instead of term to get the value analysed. But it is better to use a term query on your keyword field if you know in advance exactly what was indexed. Term queries benefit from caching on the ES side and are faster than match queries.
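The mismatch can be illustrated with a toy Python model. This is not how Elasticsearch is implemented: the analyze function is a crude stand-in for the default analyzer (it only lowercases and splits on whitespace), and all names are hypothetical.

```python
def analyze(text):
    """Crude stand-in for the default analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def term_query(indexed_tokens, value):
    """term: the query value is compared as-is, without analysis."""
    return value in indexed_tokens


def match_query(indexed_tokens, value):
    """match: the query value is analyzed first, then its tokens are looked up."""
    return any(token in indexed_tokens for token in analyze(value))


indexed = analyze("bbc.com")  # what ends up in the index for source.details.site

print(term_query(indexed, "BBC.COM"))   # no hit: "BBC.COM" is not equal to "bbc.com"
print(term_query(indexed, "bbc.com"))   # hit: same case as the indexed token
print(match_query(indexed, "BBC.COM"))  # hit: the value is lowercased before lookup
```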
You should clean your data at index time, since you write once and read many times. Anything like "/" or "http" should be removed if that does not lose semantics. You can do this in your code while indexing, or you can create custom analysers in your mapping. But remember that custom analysers won't work on a keyword field. So if you try to achieve this on the ES side, you won't be able to run aggregations on that field without enabling fielddata, which should be avoided. There is experimental support for normalisers in the latest release, but as it is experimental, don't use it in production. So in my opinion you should clean the data in your code.
