Elasticsearch - ignore fields when indexing a document

I have a simple question. I'm indexing JSON files that contain several fields into Elasticsearch. How can I tell Elasticsearch to ignore a defined group of fields from the incoming file (neither index nor store them at all), OR to work only with a defined group of fields? Is this possible using the mapping?
For example: I have JSON files like this:
{
  "id": 123456789,
  "name": "Name value",
  "screenName": "Nick name",
  "location": "Location value",
  "description": "Description text",
  "url": "url value",
  "anotherField": 456789,
  "status": null,
  "anotherField2": "9AE4E8",
  "color": "333333",
  .......
}
Now I want Elasticsearch to work only with the fields (for example) "id; name; description; location; status; url" and ignore the other fields.
Any help? Thanks.

When you serialize the data into JSON using a DTO (POJO), you can mark the fields you don't want to index with the @JsonIgnore annotation if you are using the Jackson serializer/deserializer, e.g.:
import com.fasterxml.jackson.annotation.JsonIgnore;

@JsonIgnore
private Date creationDate;
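The same idea works outside of Java as well: drop the unwanted keys before sending the document, and Elasticsearch will neither index nor store them. A minimal sketch with the Python client (the index name, the field whitelist, and the 8.x client API are assumptions for illustration):
from elasticsearch import Elasticsearch

# Hypothetical whitelist of the fields we actually want indexed.
KEEP = {"id", "name", "description", "location", "status", "url"}

es = Elasticsearch("http://localhost:9200")

def index_trimmed(doc: dict) -> None:
    # Keep only the whitelisted keys; fields that are never sent are
    # neither indexed nor stored by Elasticsearch.
    trimmed = {k: v for k, v in doc.items() if k in KEEP}
    es.index(index="my-index", id=doc["id"], document=trimmed)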

Related

Expanding events from the list in a JSON object inside the Elastic Search pipeline and inserting them as a separate document in the elastic index

Hello Elasticsearch experts,
We need to customize our Elasticsearch ingest pipeline.
We are facing a challenge and want to know if there is a way of expanding events from a list in a JSON object inside the ingest pipeline and inserting them as separate documents in the Elasticsearch index.
Given the following input in Elasticsearch:
{"@timestamp": "2022-06-16T04:06:03.064Z", "message": "{\"name\":\"name #1\", \"logEvents\":[{\"key\": \"value #1\"},{\"key\": \"value #2\"}]}"}
{"@timestamp": "2022-06-16T04:06:13.888Z", "message": "{\"name\":\"name #2\", \"logEvents\":[{\"key\": \"value #3\"},{\"key\": \"value #4\"}]}"}
We want to iterate over each event in the logEvents list, customize the data, and insert each event as an individual document in Elasticsearch.
This is the output we want after passing through the pipeline processors:
{"@timestamp": "2022-06-16T04:06:21.105Z", "message": "{\"name\":\"name #1\", \"key\": \"value #1\"}"}
{"@timestamp": "2022-06-16T04:06:27.204Z", "message": "{\"name\":\"name #1\", \"key\": \"value #2\"}"}
{"@timestamp": "2022-06-16T04:06:31.154Z", "message": "{\"name\":\"name #2\", \"key\": \"value #3\"}"}
{"@timestamp": "2022-06-16T04:06:36.189Z", "message": "{\"name\":\"name #2\", \"key\": \"value #4\"}"}
Essentially we want to achieve this functionality inside Elastic without using Lambda.
https://github.com/elastic/elastic-serverless-forwarder/blob/main/docs/README-AWS.md#expanding-events-from-list-in-json-object
Is that possible?
Thank you in advance.
I appreciate your help and dedication.
The Logstash split filter should work for your case (after decoding the JSON message inside the message field):
filter {
  json {
    source => "message"
  }
  split {
    field => "logEvents"
  }
}

Streamsets Data Collector: Replace a Field With Its Child Value

I have a data structure like this:
{
  "id": 926267,
  "updated_sequence": 2304899,
  "published_at": {
    "unix": 1589574240,
    "text": "2020-05-15 21:24:00 +0100",
    "iso_8601": "2020-05-15T20:24:00Z"
  },
  "updated_at": {
    "unix": 1589574438,
    "text": "2020-05-15 21:27:18 +0100",
    "iso_8601": "2020-05-15T20:27:18Z"
  }
}
I want to replace the updated_at field with the value of its unix child field using StreamSets Data Collector. As far as I know, this can be done with the Field Replacer processor, but I still don't understand how to write the mapping expression. How can I achieve that?
In Field Replacer, set Fields to /rec/updated_at and New value to ${record:value('/rec/updated_at/unix')} and it will replace the value.
Cheers,
Dash
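For reference, the effect of that Field Replacer rule, shown as a plain Python illustration rather than Data Collector configuration (field values copied from the record above):
record = {
    "id": 926267,
    "updated_at": {
        "unix": 1589574438,
        "text": "2020-05-15 21:27:18 +0100",
        "iso_8601": "2020-05-15T20:27:18Z",
    },
}

# Replace the parent field with the value of its unix child,
# which is what the expression above does for each record.
record["updated_at"] = record["updated_at"]["unix"]
assert record["updated_at"] == 1589574438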

Use Kafka Connect to update Elasticsearch field on existing document instead of creating new

I have Kafka set-up running with the Elasticsearch connector and I am successfully indexing new documents into an ES index based on the incoming messages on a particular topic.
However, based on incoming messages on another topic, I need to append data to a field on a specific document in the same index.
Pseudo-schema below:
{
  "_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "uuid": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "title": "A title",
  "body": "A body",
  "created_at": 164584548,
  "views": []
}
^ This document is being created fine in ES based on the data in the topic mentioned above.
However, how do I then add items to the views field using messages from another topic? Like so:
article-view topic schema:
{
  "article_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "user_id": 123456,
  "timestamp": 136389734
}
Instead of simply creating a new document in an article-view index (which I don't even want to have), it should append this to the views field of the article document whose _id equals the article_id from the message.
So the end result after one message would be:
{
  "_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "uuid": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "title": "A title",
  "body": "A body",
  "created_at": 164584548,
  "views": [
    {
      "user_id": 123456,
      "timestamp": 136389734
    }
  ]
}
Using the ES API this is possible with a script, like so:
{
  "script": {
    "lang": "painless",
    "params": {
      "newItems": [{
        "timestamp": 136389734,
        "user_id": 123456
      }]
    },
    "source": "ctx._source.views.addAll(params.newItems)"
  }
}
I can generate scripts like above dynamically in bulk, and then use the helpers.bulk function in the ES Python library to bulk update documents this way.
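For concreteness, that bulk approach looks roughly like this with the Python client (the index name, connection details, and the shape of the incoming view messages are assumptions):
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

def view_to_update_action(view: dict) -> dict:
    # One scripted update per article-view message, appending to the views array.
    return {
        "_op_type": "update",
        "_index": "articles",
        "_id": view["article_id"],
        "script": {
            "lang": "painless",
            "source": "ctx._source.views.addAll(params.newItems)",
            "params": {
                "newItems": [
                    {"user_id": view["user_id"], "timestamp": view["timestamp"]}
                ]
            },
        },
    }

views = [{"article_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
          "user_id": 123456, "timestamp": 136389734}]
helpers.bulk(es, (view_to_update_action(v) for v in views))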
Is this possible with Kafka Connect / Elasticsearch? I haven't found any documentation on Confluent's website to explain how to do this.
It seems like a fairly standard requirement and an obvious thing people would need to do with Kafka and a sink connector like the Elasticsearch one.
Thanks!
Edit: Partial updates are possible with write.method=upsert (src)
The Elasticsearch connector doesn't support this. You can update documents in place, but you need to send the full document, not a delta for appending, which I think is what you're after.
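If you do experiment with the write.method=upsert option mentioned in the edit above, registering the sink connector would look roughly like this (the connector name, topic, and URLs are placeholders); note that upsert merges the fields present in the record, it does not append to an array:
import requests

# Hypothetical sink connector definition, registered via the Kafka Connect REST API.
connector = {
    "name": "article-views-es-sink",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "topics": "article-view",
        "connection.url": "http://localhost:9200",
        "key.ignore": "false",        # use the record key as the document _id
        "write.method": "upsert",     # partial update (merge) instead of full overwrite
    },
}

requests.post("http://localhost:8083/connectors", json=connector).raise_for_status()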

How can you sort documents by a nested object field in Elasticsearch using Go?

I'm new to Elasticsearch.
I have documents and each of them has a structure like this:
{
  "created_at": "2018-01-01 01:01:01",
  "student": {
    "first_name": "john",
    "last_name": "doe"
  },
  "parent": {
    "first_name": "susan",
    "last_name": "smile"
  }
}
I just want to sort those documents based on student.first_name using the olivere/elastic package for Go.
This is my query at the moment:
searchSvc = searchSvc.SortBy(elastic.NewFieldSort("student.first_name").Asc())
and I'm getting this error:
elastic: Error 400 (Bad Request): all shards failed
[type=search_phase_execution_exception]
However, when I tried sorting by created_at, it worked:
searchSvc = searchSvc.SortBy(elastic.NewFieldSort("created_at").Asc())
I don't have any mapping in the index. (Is this the problem?)
I tried searching for something like "Elasticsearch sort by nested object", but I always got questions about sorting an array inside a nested object.
It turns out this was a beginner mistake: you can't sort by text fields. I got it from here: elasticsearch-dsl-py Sorting by Text() field
What you can do, though, if you don't specify mappings, is sort by the keyword sub-field of the field:
searchSvc = searchSvc.SortBy(elastic.NewFieldSort("student.first_name.keyword").Asc())
And it works!

JSON geo-point mapping with fields not matching Elasticsearch documentation format

I'm fairly new to Elasticsearch, and I'm trying to visualize some JSON data with Kibana. The trouble I have is with the geo-point mapping format.
The JSON object containing the relevant location fields (lon/lat) looks like this:
"geoNetwork": {
"city": "Test City",
"cityId": "1234567",
"continent": "Americas",
"country": "Canada",
"latitude": "44.1234",
"longitude": "-63.6940",
"metro": "(not set)",
"networkDomain": "bellaliant.net",
"networkLocation": "bell aliant regional communications inc.",
"region": "Nova Scotia",
"subContinent": "Northern America"
},
This does not match the geo-point format in the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html, first example), as longitude and latitude are not the only two keys in the geoNetwork (location) object.
Is there a way to define the geo-point mapping so that I can use the JSON as-is, or would I have to modify the JSON schema to have an object that matches one of the geo-point formats in the documentation?
Sadly, what you are asking is impossible :(
Elasticsearch requires a geo_point field for any geographic operation.
So from there, I would recommend updating your JSON with a new geo_point field.
If for some technical reason this is not possible, then, while I haven't used them myself, I would look up Elasticsearch ingest pipelines. They act similarly to a SQL trigger: you can add new fields dynamically before the indexing step, which means that on insertion you can create a new geo_point field from your latitude and longitude fields.
geo_point: https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html
Ingest pipelines: https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html
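As an illustration of the ingest-pipeline approach, a sketch with the Python client (the index name, pipeline id, and use of the 8.x client are assumptions; the set processor builds a "lat,lon" string, which geo_point accepts):
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Map a dedicated geo_point field next to the existing geoNetwork object.
es.indices.create(
    index="sessions",
    mappings={
        "properties": {
            "geoNetwork": {
                "properties": {"location": {"type": "geo_point"}}
            }
        }
    },
)

# Pipeline that fills the geo_point from the existing latitude/longitude strings.
es.ingest.put_pipeline(
    id="geonetwork-location",
    processors=[
        {
            "set": {
                "field": "geoNetwork.location",
                "value": "{{geoNetwork.latitude}},{{geoNetwork.longitude}}",
            }
        }
    ],
)

# Index documents through the pipeline so the field is added on insertion.
doc = {"geoNetwork": {"latitude": "44.1234", "longitude": "-63.6940"}}
es.index(index="sessions", pipeline="geonetwork-location", document=doc)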
