Nest Aggregation with dynamic fields - elasticsearch

Is it possible to use NEST to create buckets with keywords/fields that are not strongly typed? Due to the nature of this project, I do not have any root objects to pass in. Below is an example:
var result = client.Search<PortalDoc>(s => s
    .Aggregations(a => a
        .Terms("agg_objecttype", t => t.Field(l => "CUSTOM_FIELD_HERE"))
    )
);

A string implicitly converts to Field, so you can pass a string for any field name:
var result = client.Search<PortalDoc>(s => s
    .Aggregations(a => a
        .Terms("agg_objecttype", t => t
            .Field("CUSTOM_FIELD_HERE")
        )
    )
);

Yes, something like that is possible. Look here for my solution using nested fields. It allows you to perform all the usual operations on "dynamic" fields, but with somewhat greater effort (nested fields are harder to work with). The gist includes proofs for search, but I implemented aggregations as well.
curl -XPOST localhost:9200/something -d '{
  "mappings" : {
    "live" : {
      "_source" : { "enabled" : true },
      "dynamic" : false,
      "properties" : {
        "customFields" : {
          "type" : "nested",
          "properties" : {
            "fieldName" : { "type" : "string", "index" : "not_analyzed" },
            "stringValue" : {
              "type" : "string",
              "fields" : {
                "raw" : { "type" : "string", "index" : "not_analyzed" }
              }
            },
            "integerValue" : { "type" : "long" },
            "floatValue" : { "type" : "double" },
            "datetimeValue" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd" },
            "booleanValue" : { "type" : "boolean" }
          }
        }
      }
    }
  }
}'
Search should be done in a single nested query, combining the field-name and field-value clauses with AND, and aggregations should be done inside a nested aggregation.
I built this for dynamic fields, but it can probably be adjusted for other cases. I doubt there can be much more flexibility with searchable/aggregatable fields, given how inverted indices work.
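As a sketch of the aggregation side against the mapping above (the field value "objecttype" and the aggregation names are illustrative assumptions, not from the gist): first step into the nested documents, filter to the desired fieldName, then bucket on the value sub-field.

```json
POST /something/live/_search
{
  "size": 0,
  "aggs": {
    "custom_fields": {
      "nested": { "path": "customFields" },
      "aggs": {
        "object_type": {
          "filter": { "term": { "customFields.fieldName": "objecttype" } },
          "aggs": {
            "values": {
              "terms": { "field": "customFields.stringValue.raw" }
            }
          }
        }
      }
    }
  }
}
```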

Related

How to add default values while adding a new field in existing mapping in elasticsearch

This is my existing mapping in Elasticsearch for one of the child documents:
"sessions" : {
  "_routing" : {
    "required" : true
  },
  "properties" : {
    "operatingSystem" : { "index" : "not_analyzed", "type" : "string" },
    "eventDate" : { "format" : "dateOptionalTime", "type" : "date" },
    "durations" : { "type" : "integer" },
    "manufacturer" : { "index" : "not_analyzed", "type" : "string" },
    "deviceModel" : { "index" : "not_analyzed", "type" : "string" },
    "applicationId" : { "type" : "integer" },
    "deviceId" : { "type" : "string" }
  },
  "_parent" : {
    "type" : "userinfo"
  }
}
In the above mapping, the "durations" field is an integer array. I need to update the existing mapping by adding a new field called "durationCount" whose default value should be the size of the durations array.
PUT sessions/_mapping
{
  "properties" : {
    "durationCount" : {
      "type" : "integer"
    }
  }
}
Using the above request I am able to update the existing mapping, but I cannot figure out how to assign a value (which would vary for each session document; it should be the durations array size) while updating the mapping. Any ideas?
Well, 2 recommendations here:
Instead of adding a default value, you can account for it in the query using a missing filter. Let's say you want to search based on a match query. Instead of just the match query, use a bool query with should clauses containing the match and a missing filter, inside a filtered query. This way, documents that do not have the field are also accounted for.
If you absolutely need the value in that field for existing documents, you need to reindex the whole set of documents, or use the out-of-the-box plugin, update by query.
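A sketch of the first recommendation in the ES 1.x query DSL (the filter values are illustrative assumptions, not taken from the question):

```json
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            { "range": { "durationCount": { "gte": 1 } } },
            { "missing": { "field": "durationCount" } }
          ]
        }
      }
    }
  }
}
```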

How to search on a URL exactly in ElasticSearch / Kibana

I have imported an IIS log file whose data has moved through Logstash (1.4.2) into Elasticsearch (1.3.1), and is then displayed in Kibana.
My filter section is as follows:
filter {
  grok {
    match =>
      ["message" , "%{TIMESTAMP_ISO8601:iisTimestamp} %{IP:serverIP} %{WORD:method} %{URIPATH:uri} - %{NUMBER:port} - %{IP:clientIP} - %{NUMBER:status} %{NUMBER:subStatus} %{NUMBER:win32Status} %{NUMBER:timeTaken}"]
  }
}
When using a Terms panel in Kibana, and using "uri" (one of my captured fields from Logstash), it is matching the tokens within the URI. Therefore it is matching items like:
'Scripts'
'/'
'EN
Q: How do I display the 'Top URLs' in their full form?
Q: How do I inform Elasticsearch that the field should be 'not_analyzed'? I don't mind having 2 fields, for example:
uri - the tokenized URI
uri.raw - the fully formed URL
Can this be done on the Logstash side, or is this a mapping that needs to be set up in Elasticsearch?
The mapping is as follows:
//http://localhost:9200/iislog-2014.10.09/_mapping?pretty
{
  "iislog-2014.10.09" : {
    "mappings" : {
      "iislogs" : {
        "properties" : {
          "#timestamp" : { "type" : "date", "format" : "dateOptionalTime" },
          "#version" : { "type" : "string" },
          "clientIP" : { "type" : "string" },
          "device" : { "type" : "string" },
          "host" : { "type" : "string" },
          "id" : { "type" : "string" },
          "iisTimestamp" : { "type" : "string" },
          "logFilePath" : { "type" : "string" },
          "message" : { "type" : "string" },
          "method" : { "type" : "string" },
          "name" : { "type" : "string" },
          "os" : { "type" : "string" },
          "os_name" : { "type" : "string" },
          "port" : { "type" : "string" },
          "serverIP" : { "type" : "string" },
          "status" : { "type" : "string" },
          "subStatus" : { "type" : "string" },
          "tags" : { "type" : "string" },
          "timeTaken" : { "type" : "string" },
          "type" : { "type" : "string" },
          "uri" : { "type" : "string" },
          "win32Status" : { "type" : "string" }
        }
      }
    }
  }
}
In your Elasticsearch mapping:
"uri" : {
  "type" : "string",
  "index" : "not_analyzed"
}
The problem is that the iislog- index name is not compliant with the logstash- format, and hence did not pick up the default template:
My index format was iislog-YYYY.MM.dd, so it did not use the out-of-the-box mappings created by Logstash. When using the logstash- index format, Logstash creates a pair of fields for each string. For example, uri becomes:
uri (appears in Kibana)
uri.raw (does not appear in Kibana)
Note that uri.raw will not appear in Kibana's field list, but it is queryable.
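The .raw fields come from a multi-field mapping in the default Logstash index template; a hand-written equivalent for uri would look roughly like this (a sketch in ES 1.x syntax, not copied from the actual template):

```json
"uri" : {
  "type" : "string",
  "fields" : {
    "raw" : { "type" : "string", "index" : "not_analyzed" }
  }
}
```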
So, rather than using an alternative index format, the solution is to:
Don't bother! Use the default index format of logstash-%{+YYYY.MM.dd}
Add a "type" to the file input to help you filter the correct logs in Kibana (whilst using the logstash- index format)
input {
  file {
    type => "iislog"
    ....
  }
}
Apply filtering in Kibana based on the type
OR
If you really do want a different index format:
Copy the default template file to a new file, say iislog-template.json
Reference the template file in the elasticsearch output, like this:
output {
  elasticsearch_http {
    host => localhost
    template_name => "iislog-template.json"
    template => "<path to template>"
    index => "iislog-%{+YYYY.MM.dd}"
  }
}

Specify Routing on Index Alias's Term Lookup Filter

I am using Logstash, ElasticSearch and Kibana to allow multiple users to log in and view the log data they have forwarded. I have created index aliases for each user. These restrict their results to contain only their own data.
I'd like to assign users to groups, and allow users to view data for the computers in their group. I created a parent-child relationship between the groups and the users, and I created a term lookup filter on the alias.
My problem is, I receive a RoutingMissingException when I try to apply the alias.
Is there a way to specify the routing for the term lookup filter? How can I lookup terms on a parent document?
I posted the mapping and alias below, but a full gist recreation is available at this link.
curl -XPUT 'http://localhost:9200/accesscontrol/' -d '{
  "mappings" : {
    "group" : {
      "properties" : {
        "name" : { "type" : "string" },
        "hosts" : { "type" : "string" }
      }
    },
    "user" : {
      "_parent" : { "type" : "group" },
      "_routing" : { "required" : true, "path" : "group_id" },
      "properties" : {
        "name" : { "type" : "string" },
        "group_id" : { "type" : "string" }
      }
    }
  }
}'
# Create the logstash alias for cvializ
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "remove" : { "index" : "logstash-2014.04.25", "alias" : "cvializ-logstash-2014.04.25" } },
    {
      "add" : {
        "index" : "logstash-2014.04.25",
        "alias" : "cvializ-logstash-2014.04.25",
        "routing" : "intern",
        "filter" : {
          "terms" : {
            "host" : {
              "index" : "accesscontrol",
              "type" : "user",
              "id" : "cvializ",
              "path" : "group.hosts"
            },
            "_cache_key" : "cvializ_hosts"
          }
        }
      }
    }
  ]
}'
In attempting to find a workaround for this error, I submitted a bug to the ElasticSearch team, and received an answer from them. It was a bug in ElasticSearch where the filter is applied before the dynamic mapping, causing some erroneous output. I've included their workaround below:
PUT /accesscontrol/group/admin
{
  "name" : "admin",
  "hosts" : ["computer1","computer2","computer3"]
}
PUT /_template/admin_group
{
  "template" : "logstash-*",
  "aliases" : {
    "template-admin-{index}" : {
      "filter" : {
        "terms" : {
          "host" : {
            "index" : "accesscontrol",
            "type" : "group",
            "id" : "admin",
            "path" : "hosts"
          }
        }
      }
    }
  },
  "mappings" : {
    "example" : {
      "properties" : {
        "host" : {
          "type" : "string"
        }
      }
    }
  }
}
POST /logstash-2014.05.09/example/1
{
  "message" : "my sample data",
  "#version" : "1",
  "#timestamp" : "2014-05-09T16:25:45.613Z",
  "type" : "example",
  "host" : "computer1"
}
GET /template-admin-logstash-2014.05.09/_search

Elasticsearch not storing field, what am I doing wrong?

I have something like the following template in my Elasticsearch. I only want certain parts of the data returned, so I turned _source off and explicitly set store for the fields I want.
{
  "template_1" : {
    "order" : 20,
    "template" : "test*",
    "settings" : { },
    "mappings" : {
      "_default_" : {
        "_source" : {
          "enabled" : false
        }
      },
      "type_1" : {
        "mydata" : {
          "store" : "yes",
          "type" : "string"
        }
      }
    }
  }
}
However, when I query the data, I don't get the fields back. The query works, however, if I enable the _source field. I am just starting with Elasticsearch, so I am not quite sure what I am doing wrong. Any help would be appreciated.
Field definitions should be wrapped in the properties section of your mapping:
"type_1" : {
  "properties" : {
    "mydata" : {
      "store" : "yes",
      "type" : "string"
    }
  }
}
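With _source disabled and the field stored, the stored value also has to be requested explicitly at search time. A sketch against the ES 1.x search API (the index name test_index is an assumption based on the "test*" template pattern):

```json
POST /test_index/type_1/_search
{
  "fields" : ["mydata"],
  "query" : { "match_all" : {} }
}
```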

In elasticsearch, how important is it to fully define a mapping during mapping-creation?

I am creating a mapping like this:
"institution" : {
  "properties" : {
    "InstitutionCode" : { "type" : "string", "store" : "yes" },
    "InstitutionID" : { "type" : "integer", "store" : "yes" },
    "Name" : { "type" : "string", "store" : "yes" }
  }
}
However, when I perform actual indexing operations for institutions, I am adding an Aliases property (0 or more aliases per institution), so the mapping becomes:
"institution" : {
  "properties" : {
    "Aliases" : {
      "dynamic" : "true",
      "properties" : {
        "InstitutionAlias" : { "type" : "string" },
        "InstitutionAliasTypeID" : { "type" : "long" }
      }
    },
    "InstitutionCode" : { "type" : "string", "store" : "yes" },
    "InstitutionID" : { "type" : "integer", "store" : "yes" },
    "Name" : { "type" : "string", "store" : "yes" }
  }
}
This is a simplified example, as I am actually adding more fields than just Aliases during the indexing of records.
How important is it to fully define a mapping during mapping-creation?
Will I suffer any penalties by having the mapping automatically adjusted during indexing operations due to the indexing of institution records with additional properties? I expect institutions to gain additional properties over time, and I wonder if I need to maintain the mapping-creation code in addition to the institution-indexing code.
I believe the overhead of dynamic mapping is fairly negligible; using it won't hurt indexing speed. However, you can run into unexpected situations where Elasticsearch auto-detects a field type incorrectly.
A common example is detecting an integer because the first occurrence of a field is a number ("25"), when in reality the rest of the data for that field is strings. Or seeing an integer when the rest of the data is actually floats. Etc.
If your data is well standardized that isn't much of a problem.
Alternatively, you can use dynamic templates to apply mappings to new fields based on a regex pattern.
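For example, a dynamic template that maps every new string field as not_analyzed might look like this (a sketch; the template name strings_not_analyzed is arbitrary):

```json
"institution" : {
  "dynamic_templates" : [
    {
      "strings_not_analyzed" : {
        "match" : "*",
        "match_mapping_type" : "string",
        "mapping" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  ]
}
```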