JSON geo-point mapping with fields not matching Elasticsearch documentation format - elasticsearch

I'm fairly new to Elasticsearch, and I'm trying to visualize some JSON data with Kibana. The trouble I have is with the geo-point mapping format.
The JSON object containing the relevant location fields (lon/lat) looks like this:
"geoNetwork": {
"city": "Test City",
"cityId": "1234567",
"continent": "Americas",
"country": "Canada",
"latitude": "44.1234",
"longitude": "-63.6940",
"metro": "(not set)",
"networkDomain": "bellaliant.net",
"networkLocation": "bell aliant regional communications inc.",
"region": "Nova Scotia",
"subContinent": "Northern America"
},
This does not match the geo-point format in the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html, first example), since longitude and latitude are not the only two keys in the geoNetwork (location) object.
Is there a way to define the geo-point mapping so that I can use the JSON as-is, or would I have to modify the JSON schema to have an object that matches one of the geo-point formats in the documentation?

Sadly, what you are asking is impossible :(
Elasticsearch requires a GeoPoint/GeoLocation/... field for any geographic operation.
So from there, I would recommend updating your JSON with a new GeoPoint field.
If for some technical reason this is not possible, then, while I haven't used them myself, I would look up Elasticsearch ingest pipelines. They act similarly to a SQL trigger: they let you add new fields dynamically before the document is indexed, meaning you can create a new GeoPoint field from your latitude and longitude fields at insertion time.
GeoPoint: https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html
Ingest Pipeline: https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html
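A rough sketch of that second approach (untested; the pipeline name, index name, and the new geoNetwork.location field are all placeholders): a set processor builds a "lat,lon" string from the existing fields, and geoNetwork.location must be mapped as geo_point in the target index:

# Define a pipeline that combines the two string fields into one geo-point value
PUT _ingest/pipeline/combine-geo
{
  "description": "Build a geo_point value from geoNetwork.latitude/longitude",
  "processors": [
    {
      "set": {
        "field": "geoNetwork.location",
        "value": "{{geoNetwork.latitude}},{{geoNetwork.longitude}}"
      }
    }
  ]
}

# Index the document through the pipeline, leaving the source JSON unchanged
PUT my-index/_doc/1?pipeline=combine-geo
{
  "geoNetwork": {
    "city": "Test City",
    "latitude": "44.1234",
    "longitude": "-63.6940"
  }
}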

Related

Is there a standard Elasticsearch solution to load recently changed relational data?

I have the following tables, which have millions of records and change frequently. Is there a way to load that data into Elasticsearch (for eventual consistency) with Spring Boot, initially and then incrementally?
Tables :
Employee
Role
contactmethod (Phone/email/mobile)
channel
department
Status
Address
Here the document will look like below:
{
  "id": 1,
  "name": "tom john",
  "Contacts": [
    {
      "mobile": 123,
      "type": "MOBILE"
    },
    {
      "phone": 223333,
      "type": "PHONE"
    }
  ],
  "Address": [
    {
      "city": "New york",
      "ZIP": 12343,
      "type": "PERMANENT"
    },
    {
      "city": "New york",
      "ZIP": 12343,
      "type": "TEMPORARY"
    }
  ]
}
... with similar data for the ROLE, DEPT, etc. tables.
How do I make sure that, e.g., a changed mobile number for "tom john" in the relational DB will be propagated to Elasticsearch?
You should have a background job in your application which pulls the data from the DB (you know when there is a change in the DB, of course) and, based on what you need (filtering, massaging), reindexes it in your Elasticsearch index.
Alternatively, you can use Logstash with the JDBC input to keep your data in sync; please refer to the Elastic blog on how to do it.
The first one is flexible but not an out-of-the-box solution, while the second one is out of the box. There are pros and cons to both approaches, so choose what fits best in your use case.
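For illustration, the first approach boils down to the job pushing each changed row to Elasticsearch, for example as a partial update (a sketch only; the employee index name, document id, and field layout are assumed from the example document above):

POST employee/_update/1
{
  "doc": {
    "Contacts": [
      { "mobile": 456, "type": "MOBILE" },
      { "phone": 223333, "type": "PHONE" }
    ]
  }
}

Note that a partial update replaces the whole Contacts array, so the job should send the complete current state of the nested list, not just the changed entry.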

How does a JSON object get tokenized and indexed in Elasticsearch

I recently started working with Elasticsearch and could not figure out how a JSON object gets tokenized and stored in the inverted index.
Consider that the JSON below has been inserted:
{
  "city": "Seattle",
  "state": "WA",
  "location": {
    "lat": "47.6062095",
    "lon": "-122.3320708"
  }
}
I can perform a URI search like this:
GET /my_index/_search?q=city:seattle
This search returns the above document, but how is Elasticsearch able to search for 'seattle' only in the 'city' field? If it tokenized the complete JSON, all the keys and values would be separated, so how would the mapping between a key token and its value tokens be maintained?
The JSON is not tokenized as a whole: each field is analyzed separately and gets its own entries in the inverted index, and the indexed tokens point back to the original document, which is also stored.
Have a look at Inverted Index in the Elastic docs.
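You can inspect the per-field analysis yourself with the _analyze API (a sketch, assuming the index above exists with dynamic mapping):

GET my_index/_analyze
{
  "field": "city",
  "text": "Seattle"
}

This returns the single lowercased token "seattle" produced by the city field's analyzer, which is exactly what q=city:seattle is matched against.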

What is the best way to map following unstructured data in elastic search?

I am trying to figure out what the best type and analyzer would be for a field that holds unstructured data.
The request field could be any of the following, among many others:
{"_format":"json","follow":{"followee":27}} //nested objects
[{"q": "madhab"}] //array of objects
?q=madhab //string
I have tried making this field text with the simple analyzer:
"request": {
  "type": "text",
  "analyzer": "simple"
},
Plus: I wonder if there is any online tool that can help visualize how Elasticsearch tokenizes the data with given analyzers and filters.
Elasticsearch gives you an option to see how text is tokenized under various analyzers. You can use Kibana or any REST client to see the response for such a request:
GET /_analyze
{
  "analyzer": "standard",
  "text": "Text to analyze"
}
https://www.elastic.co/guide/en/elasticsearch/guide/master/analysis-intro.html
This will give you a fair idea of what is missing in your schema with respect to your queries.
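For reference, the response lists every emitted token with its offsets and position. For the request above, the standard analyzer lowercases the input and splits it on word boundaries (response abridged):

{
  "tokens": [
    { "token": "text", "start_offset": 0, "end_offset": 4, "position": 0 },
    { "token": "to", "start_offset": 5, "end_offset": 7, "position": 1 },
    { "token": "analyze", "start_offset": 8, "end_offset": 15, "position": 2 }
  ]
}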

Can't create a visualization in Kibana (no compatible fields) - but I have compatible fields

I'd appreciate any help with this, I'm really stuck.
I am trying to create a simple visualization in Kibana, a line graph based on a number value in my data (origin_file_size_bytes). When I try to add a Visualization graph, I get this error:
No Compatible Fields: The "test*" index pattern does not contain any of the following field types: number or date
My actual index does contain a number field, as does my data.
Thank you for any help!
Andrew
Here's a sample entry from the Discover Menu:
{
  "_index": "lambda-index",
  "_type": "lambda-type",
  "_id": "LC08_L1TP_166077.TIF",
  "_version": 1,
  "_score": 2,
  "_source": {
    "metadata_processed": {
      "BOOL": true
    },
    "origin_file_name": {
      "S": "LC08_L1TP_166077.TIF"
    },
    "origin_file_size_bytes": {
      "N": "61667800"
    }
  }
}
My index pattern classifies it as a string, even though it isn't:
origin_file_size_bytes.N string
You cannot aggregate on a string field. As seen in the field listing above, your field has been indexed as a string and NOT as a number. Elasticsearch dynamically determines the mapping type of data if it is not explicitly defined. Since you ingested the field value as a string ("N": "61667800"), ES correctly determined that the field is of type string. See this link.
For example, if you run the request below to index a document with two fields, without an explicit mapping, ES creates the message field as type 'string' and the size field as type 'number' (long):
POST my_index/_doc/1
{
  "message": "100",
  "size": 100
}
Index your field into ES as a number instead, and you should be able to aggregate on it.
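One way to do that is to create a new index with an explicit mapping and reindex into it (a sketch, assuming a current Elasticsearch version without mapping types; lambda-index-v2 is a placeholder name, and the nested N key follows the sample document above):

PUT lambda-index-v2
{
  "mappings": {
    "properties": {
      "origin_file_size_bytes": {
        "properties": {
          "N": { "type": "long" }
        }
      }
    }
  }
}

With this mapping in place, origin_file_size_bytes.N is aggregatable and should show up as a number field in the Kibana index pattern.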

Timestamps and documents are not detected in Kibana

I have been using Elasticsearch for a while and wanted to visualize the data with Kibana. Since I have time-series data, I created a timestamp field in the corresponding index that is, from my point of view, suitable. The relevant part of the index mappings is as follows:
[..]
"properties": {
"#timestamp": {
"enabled" : true,
"type":"date",
"format": "date_hour_minute_second_millis",
"store": true,
"path": "#timestamp"
},
[..]
I have played around with the "format" field value because I want to visualize data with millisecond resolution. Ideally, I would just like to use the raw timestamp from my application (i.e. Unix epoch time, fractional in seconds), but I couldn't get Kibana to detect that format. Currently, I am posting data as follows:
{
  "#timestamp": "2015-03-10T14:37:42.644",
  "name": "some counter",
  "value": 91.76
}
Kibana detects the #timestamp field as a timestamp but then tells me that it cannot find any stored documents having that field (which is not true):
This field is present in your elasticsearch mapping but not in any documents in the search results. You may still be able to visualize or search on it.
I should note that previously I used "dateOptionalTime" as the format for the timestamp, and everything was working fine in Kibana with "simple" timestamps. I need, however, to switch to milliseconds now.
Cheers!
I was struggling with this highly ambiguous problem as well. No matter how I changed the mapping, Kibana simply would not display certain fields.
Then I found out it had something to do with the JVM heap size and Kibana being unable to hold every field 'in-memory'.
Setting "doc_values" to true for those fields during mapping fixed the issue.
In the Java API, I'm building the following mapping, which includes the doc_values option:
XContentFactory.jsonBuilder()
    .startObject()
        .startObject("#memory-available")
            .field("type", "integer")
            .field("index", "not_analyzed")
            .field("doc_values", true)   // store on-disk, column-oriented doc values for this field
        .endObject()
    .endObject();
Read more here: Doc Values in ES
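For comparison, the REST equivalent on a current Elasticsearch version, where doc_values is already enabled by default for date and numeric fields, would look roughly like this (metrics is a placeholder index name):

PUT metrics
{
  "mappings": {
    "properties": {
      "#timestamp": { "type": "date", "format": "date_hour_minute_second_millis" },
      "#memory-available": { "type": "integer", "doc_values": true }
    }
  }
}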
