Elasticsearch custom mapping definition

I have to upload data to elk in the following format:
{
"location":{
"timestamp":1522751098000,
"resources":[
{
"resource":{
"name":"Node1"
},
"probability":0.1
},
{
"resource":{
"name":"Node2"
},
"probability":0.01
}]
}
}
I'm trying to define a mapping for this kind of data, and I produced the following mapping:
{
"mappings": {
"doc": {
"properties": {
"location": {
"properties" : {
"timestamp": {"type": "date"},
"resources": []
}
}
}
}
}
}
I have 2 questions:
How can I define the "resources" array in my mapping?
Is it possible to define a custom type (e.g. resource) and use this type in my mapping (e.g. "resources": [{type:resource}])?

There are a lot of things to know about Elasticsearch mappings. I highly suggest reading through at least some of the documentation.
Short answers first, in case you don't care:
On the "resources" array: Elasticsearch automatically allows any field to hold one or multiple values of its defined type, so there is no need to declare an array. See Marker 1 below or refer to the documentation on array types.
On custom types: I don't think there is a way. Since Elasticsearch 6 only one mapping type per index is allowed. Nested objects are probably the closest thing, but you define them inline in the same mapping. Internally, each nested object is indexed as a separate hidden document.
Long answer and some thoughts
Take a look at the following mapping:
"mappings": {
"doc": {
"properties": {
"location": {
"properties": {
"timestamp": {
"type": "date"
},
"resources": { [1]
"type": "nested", [2]
"properties": {
"resource": {
"properties": {
"name": { [3]
"type": "text"
}
}
},
"probability": {
"type": "float"
}
}
}
}
}
}
}
}
This is what your mapping could look like. It can be done differently, but I think it makes sense this way, maybe except for marker 3. I'll go through the markers now:
Marker 1: If you define a field, you usually give it a type. I defined resources as a nested type, while your timestamp is of type date. Elasticsearch automatically allows storing one or multiple values in any field. timestamp could also contain an array of dates, so there is no need to declare an array.
Marker 2: I defined resources as a nested type, but it could also be an object like resource a little below (where no type is given). Read about nested objects here. In the end I don't know what your queries would look like, so not sure if you really need the nested type.
Marker 3: I want to address two things here. First, I want to mention again that resource is defined as a normal object with property name. You could do that for resources as well.
The second thing is more of a thought-provoking impulse. Don't take it too seriously if it doesn't fit your case; just take it as an opinion.
This mapping structure looks heavily inspired by a relational database approach. With Elasticsearch you usually want to design document structures around the searches you expect to run. Redundancy is not a problem, but nested objects can make your queries complicated. I think I would omit the whole resources part and do something like this:
"mappings": {
"doc": {
"properties": {
"location": {
"properties": {
"timestamp": {
"type": "date"
},
"resource": {
"properties": {
"resourceName": {
"type": "text"
},
"resourceProbability": {
"type": "float"
}
}
}
}
}
}
}
}
Because as I said, in this case resource can contain an array of objects, each having a resourceName and a resourceProbability.
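To make the object-versus-nested choice from marker 2 more concrete, here is a minimal Python sketch that only simulates the behaviour; it does not talk to Elasticsearch. It assumes exact value matching (a real text field would be analyzed), and the helper names flatten, matches_object and matches_nested are made up for illustration.

```python
# Example document from the question.
doc = {
    "location": {
        "timestamp": 1522751098000,
        "resources": [
            {"resource": {"name": "Node1"}, "probability": 0.1},
            {"resource": {"name": "Node2"}, "probability": 0.01},
        ],
    }
}


def flatten(resources):
    """Mimic the plain object type: each leaf field keeps one flat list of values."""
    return {
        "resources.resource.name": [r["resource"]["name"] for r in resources],
        "resources.probability": [r["probability"] for r in resources],
    }


def matches_object(resources, name, probability):
    """Object type: the two conditions may be satisfied by different array items."""
    flat = flatten(resources)
    return (
        name in flat["resources.resource.name"]
        and probability in flat["resources.probability"]
    )


def matches_nested(resources, name, probability):
    """Nested type: both conditions must hold within the same array item."""
    return any(
        r["resource"]["name"] == name and r["probability"] == probability
        for r in resources
    )


resources = doc["location"]["resources"]
# Node1 never has probability 0.01, yet the flattened view still matches.
print(matches_object(resources, "Node1", 0.01))  # True
print(matches_nested(resources, "Node1", 0.01))  # False
```

If your queries never combine name and probability of the same resource, the plain object form is probably enough; if they do, that is the case nested is meant for.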

Related

Can anyone help me - how to use arrays in opensearch?

I indexed an object with some fields, and I want to figure out how to map the index so that it handles and shows the values the way Elasticsearch does. I don't know why OpenSearch splits the values into individual fields. Both applications have the same index mappings, but the display is different for some reason.
I tried setting the object's type to nested in the mapping, but nothing changed:
PUT test
{
"mappings": {
"properties": {
"szemelyek": {
"type": "nested",
"properties": {
"szam": {
"type": "integer"
},
"nev": {
"type": "text"
}
}
}
}
}
}

What's the right way to perform a keyword search?

If I want to perform a keyword search using a TermQuery, what's the proper way to do this? Am I supposed to append ".keyword" to my field name? I would think there is a more first-class citizen way of doing it! 🤷‍♂️
QueryBuilders.termQuery(SOME_FIELD_NAME + ".keyword", someValue)
It all boils down to your mapping. If your field is mapped as a 'straightforward' keyword like so
{
"mappings": {
"properties": {
"some_field": {
"type": "keyword"
}
}
}
}
you won't need to append .keyword -- you'd do just
QueryBuilders.termQuery(SOME_FIELD_NAME, someValue)
It's good practice, though, not to restrict yourself to only keywords, esp. if you'll be doing partial matches, expansions, autocomplete etc down the line.
A typical text field mapping would look like
PUT kwds
{
"mappings": {
"properties": {
"some_field": {
"type": "text",
"fields": {
"keyword": { <---
"type": "keyword"
},
"analyzed": { <---
"type": "text",
"analyzer": "simple"
},
"...": { <---
...
}
}
}
}
}
}
This means you'd be able to access differently-indexed "versions" (fields) of the same "property" (field). The naming is rather confusing but you get the gist.
Long story short, this is where the .keyword convention stems from. You don't need it if your field is already mapped as a keyword.
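As a rough illustration of that rule, the Python sketch below picks the field name for a term query by looking at the mapping. It is only an assumption-laden helper, not part of any client library: term_field and term_query are hypothetical names, and the mapping is passed in as a plain dict of the "properties" section.

```python
def term_field(properties, field):
    """Return the field name to use in a term query for `field`."""
    spec = properties[field]
    if spec.get("type") == "keyword":
        # Already a keyword field: query it directly.
        return field
    # Otherwise look for a keyword sub-field (multi-field), whatever its name.
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return field + "." + sub_name
    raise ValueError("no keyword variant mapped for field: " + field)


def term_query(properties, field, value):
    """Build a term query body as a plain dict."""
    return {"query": {"term": {term_field(properties, field): value}}}


plain_keyword = {"some_field": {"type": "keyword"}}
text_with_keyword = {
    "some_field": {
        "type": "text",
        "fields": {
            "keyword": {"type": "keyword"},
            "analyzed": {"type": "text", "analyzer": "simple"},
        },
    }
}

print(term_query(plain_keyword, "some_field", "abc"))
print(term_query(text_with_keyword, "some_field", "abc"))
```

The first call targets some_field, the second some_field.keyword, which mirrors the two mappings shown above.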

Elasticsearch: Is it possible to reference properties in mappings?

I have a simple mapping in elasticsearch-6, like this.
{
"mappings": {
"_doc": {
"properties": {
"#timestamp": {
"type": "date"
},
"fields": {
"properties": {
"meta": {
"properties": {
"task": {
"properties": {
"field1": {
"type": "keyword"
},
"field2": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
}
}
Now I have to add another property to it - tasks which is just an array of the task property already defined.
Is there a way to reference the properties of task so that I don't have to duplicate all the properties? Something like:
{
"fields": {
"properties": {
"meta": {
"properties": {
"tasks": {
"type": "nested",
"properties": "fields.properties.meta.properties.task"
},
"task": {
...
}
}
}
}
}
}
You can already use your task field as an array of task objects; the only limitation is that you cannot query them independently of each other. If that is your goal (as I assume from your second example), I would set the "nested" data type directly in the mapping of the task field. Then, yes, you'll need to reindex.
I can't imagine a use case where you would need the same array of objects duplicated in two fields, with one nested and the other not.
EDIT
Below, some considerations/suggestions based on the discussion in the comments:
One field can have either one value or an array of values. In your case, your task field can hold either one task object or an array of task objects. You only need to set the "nested" datatype for task if you plan to query its objects independently (and, of course, only if there is more than one).
I would suggest designing your documents so that duplicated information is avoided in the first place. Duplicated information makes your documents bigger and more complex to process, leading to greater storage requirements and slower queries.
If it's not possible to redesign your document mapping, you might check whether alias datatypes can help you avoid some repetition.
If it's not possible to redesign your document mapping, you might check whether dynamic templates can help you avoid some repetition.
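If you do decide to keep both task and tasks, one workaround that lives outside Elasticsearch is to generate the mapping body in client code, so the shared properties are written only once in your source. The Python sketch below is just that idea under stated assumptions: TASK_PROPERTIES and build_mapping are hypothetical names, and the structure is copied from the mapping in the question.

```python
import copy
import json

# Shared definition of a task object, written once.
TASK_PROPERTIES = {
    "field1": {"type": "keyword"},
    "field2": {"type": "keyword"},
}


def build_mapping():
    """Build the index body, expanding the shared properties for both fields."""
    return {
        "mappings": {
            "_doc": {
                "properties": {
                    "#timestamp": {"type": "date"},
                    "fields": {
                        "properties": {
                            "meta": {
                                "properties": {
                                    # Plain object field, as in the original mapping.
                                    "task": {
                                        "properties": copy.deepcopy(TASK_PROPERTIES)
                                    },
                                    # Nested array of the same object shape.
                                    "tasks": {
                                        "type": "nested",
                                        "properties": copy.deepcopy(TASK_PROPERTIES),
                                    },
                                }
                            }
                        }
                    },
                }
            }
        }
    }


# The resulting JSON still contains the properties twice; only the source is deduplicated.
print(json.dumps(build_mapping(), indent=2))
```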

Create new Index Mapping error

When I create an index with a mapping like this one, what does the _template/ part mean? What does the _ mean? I'd like help understanding more about creating an index: are they stored in a kind of folder, like a template/packets folder?
PUT _template/packets
{
"template": "packets-*",
"mappings": {
"pcap_file": {
"dynamic": "false",
"properties": {
"timestamp": {
"type": "date"
},
"layers": {
"properties": {
"frame": {
"properties": {
"frame_frame_len": {
"type": "long"
},
"frame_frame_protocols": {
"type": "keyword"
}
}
},
"ip": {
"properties": {
"ip_ip_src": {
"type": "ip"
},
"ip_ip_dst": {
"type": "ip"
}
}
},
"udp": {
"properties": {
"udp_udp_srcport": {
"type": "integer"
},
"udp_udp_dstport": {
"type": "integer"
}
}
}
}
}
}
}
}
}
I ask this because after typing this, I receive the following warning:
! Deprecation: Deprecated field [template] used, replaced by [index_patterns]
{
"acknowledged": true
}
I copied the pattern from this link:
https://www.elastic.co/blog/analyzing-network-packets-with-wireshark-elasticsearch-and-kibana
I'm trying to do exactly what is taught in the link. I can already capture packets with tshark and parse them into a packets.json file, and I will use Filebeat to transfer the data to Elasticsearch. I already uploaded some data to Elasticsearch, but it wasn't indexed correctly; I just saw a lot of information with a lot of data.
My aim is to understand exactly how to create a new index pattern, and also how to relate what I upload to that index.
Thank you very much.
Just replace the word template with index_patterns (note that the value becomes a list):
PUT _template/packets
{
"index_patterns": ["packets-*"],
"mappings": {
...
Index templates allow you to define templates that will automatically be applied when new indices are created.
After version 5.6 the format of Elasticsearch index templates has changed; the template field, which was used to specify one or more patterns for matching index names that would use the template at create time, was deprecated and superseded by the more appropriately named field index_patterns which works exactly the same way.
To solve the issue and get rid of the deprecation warnings you will have to update all your pre-6.0 index templates, changing the template to index_patterns.
You can list all your index templates by running this command:
curl -XGET 'http://localhost:9200/_template/*?pretty'
Or replace the asterisk with the name of one specific index template.
More about ES templates is here.
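If you have several old templates to update, the rename can be scripted. The Python sketch below only rewrites the template body as a dict and leaves fetching and re-uploading it to you; migrate_template is a hypothetical helper name, and it assumes the legacy template field holds a single pattern string.

```python
def migrate_template(body):
    """Return a copy of a legacy template body using index_patterns."""
    migrated = dict(body)  # shallow copy, the input is left untouched
    if "template" in migrated:
        pattern = migrated.pop("template")
        # index_patterns takes a list; do not overwrite one that already exists.
        migrated.setdefault("index_patterns", [pattern])
    return migrated


legacy = {
    "template": "packets-*",
    "mappings": {"pcap_file": {"dynamic": "false"}},
}

print(migrate_template(legacy))
```

The output body can then be sent with PUT _template/packets as in the answer above.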

How to map dynamic field value in elasticsearch?

I'm mapping a Couchbase gateway document and I'd like to tell Elasticsearch to avoid indexing the internal attributes added by the gateway, such as "_sync". This object contains another object named "channels", which has the following form:
"channels": {
"i7de5558-32ad-48ca-bf91-858c3a1e4588": 12
}
So I guess the mapping of this object would be like:
"channels": {
"type": "object",
"properties": {
"i7de5558-32ad-48ca-bf91-858c3a1e4588": {
"type": "integer",
"index": "not_analyze"
}
}
}
The problem is that the keys are always changing, so I don't know if I should use a wildcard like this "*": {"type": "integer", "index": "not_analyze"} for this property or do something else.
Any advice please?
If the fields are of integer type, you don't have to declare them explicitly in the mapping. You can create an empty mapping and index documents containing these fields; Elasticsearch will infer the field types and update the mapping dynamically. You can also use dynamic templates for this:
{
"mappings": {
"my_type": {
"dynamic_templates": [
{
"analysed_string_template": {
"path_match": "channels.*",
"mapping": {
"type": "integer"
}
}
}
]
}
}
}
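One detail worth checking is that path_match is compared against the full dotted path of the field. The Python sketch below approximates that matching with fnmatchcase from the standard library (an approximation: fnmatch also understands ? and [], which goes beyond the simple * wildcard). template_applies is a hypothetical helper name, and whether channels really sits under _sync in the indexed document is an assumption taken from the question.

```python
from fnmatch import fnmatchcase


def template_applies(path_match, field_path):
    """Approximate whether a dynamic template's path_match covers a field path."""
    return fnmatchcase(field_path, path_match)


key = "i7de5558-32ad-48ca-bf91-858c3a1e4588"

# channels at the top level of the document: the pattern matches.
print(template_applies("channels.*", "channels." + key))

# channels inside _sync: the same pattern no longer covers the full path.
print(template_applies("channels.*", "_sync.channels." + key))

# A pattern that includes the parent object does.
print(template_applies("_sync.channels.*", "_sync.channels." + key))
```

So if the keys live under _sync, the template above would likely need "path_match": "_sync.channels.*" instead.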
There's a dynamic way to do what you need; it's called a dynamic template.
Using templates you are able to create rules like this:
PUT /my_index
{
"mappings": {
"my_type": {
"date_detection": false
}
}
}
In your case you could create a template to set all new fields inside the channels object as not_analyzed.
Hope this helps.

Resources