Elasticsearch - List all sources sending messages to ES

Elasticsearch - List all sources sending messages to ES - elasticsearch

I am trying to get a list which shows me all sources ES is receiving messages from. I am pretty new with this topic and trying to get deeper into it. I am searching basically for a solution to see the total amount of sources sending logs to my central logging solution and in best case also provided my a list with the source names.
Does anyone have an idea how to get such information querying Elasticsearch?

Yes, this is possible, though the solution depends on how your data looks.
Users typically index data in Elasticsearch so that it contains more than just the raw log lines. This is done automatically if you're using Filebeat. Otherwise, you'd do something (add a field using Logstash, rely on a host field in syslog, etc) to ensure you have a field that contains your "source" identifier:
{
"message": "my super valuable logline",
"source": "my_kinda_awesome_app"
}
given ^^ you can identify all sources (and record counts!) with a terms aggregation like:
{
"aggs": {
"my_sources": {
"terms": { "field": "source" }
}
}
}
Kibana makes this all easier since you don't need to know/write ES queries and can do stuff visually.

Related

how to copy id field during indexing (elasticsearch)

It's often useful to have the _id as a part of the document. In fact it's advised here: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
But if you do not know the _id prior to document creation, how would you duplicate the _id during indexing? The only way I can think of doing it is using a pipeline but is there a simpler way?
Edit: according to answer below even a pipeline cannot achieve this.

Ingest pipelines (current version 7.9.2) cannot access the _id if the _id is generated. There is a note in the documentation saying:
If you automatically generate document IDs, you cannot use the {{_id}} value in an ingest processor. Elasticsearch assigns auto-generated _id values after ingest.
The copy_to feature also don't work for _id when auto generated. This Information is a little bit hidden here https://github.com/elastic/elasticsearch/issues/6730#issuecomment-103142553
Queries with script_fieldsusing doc['_id'].value is deprecated too.
It seems to me that this is what many of us are looking for, for different reasons, but there is no solution at least I am aware of.
The case is obviously complete different for self generated document id.

In case someone still looking for Solution to this issue
You can do a reindexing with script tag and use the context object to get grab of the _id and matched it the ID in the POCO
POST /_reindex?wait_for_completion=false
{
"source": {
"index": "data.dataitems",
"query": {
"match_all": {}
}
},
"dest": {
"index": "data.dataitems_new_index_with_id"
},"script": {
"source": "ctx._source.id = ctx._id"
}
}

Elasticsearch Dynamic Field Mapping and JSON Dot Notation

I'm trying to write logs to an Elasticsearch index from a Kubernetes cluster. Fluent-bit is being used to read stdout and it enriches the logs with metadata including pod labels. A simplified example log object is
{
"log": "This is a log message.",
"kubernetes": {
"labels": {
"app": "application-1"
}
}
}
The problem is that a few other applications deployed to the cluster have labels of the following format:
{
"log": "This is another log message.",
"kubernetes": {
"labels": {
"app.kubernetes.io/name": "application-2"
}
}
}
These applications are installed via Helm charts and the newer ones are following the label and selector conventions as laid out here. The naming convention for labels and selectors was updated in Dec 2018, seen here, and not all charts have been updated to reflect this.
The end result of this is that depending on which type of label format makes it into an Elastic index first, trying to send the other type in will throw a mapping exception. If I create a new empty index and send in the namespaced label first, attempting to log the simple app label will throw this exception:
object mapping for [kubernetes.labels.app] tried to parse field [kubernetes.labels.app] as object, but found a concrete value
The opposite situation, posting the namespaced label second, results in this exception:
Could not dynamically add mapping for field [kubernetes.labels.app.kubernetes.io/name]. Existing mapping for [kubernetes.labels.app] must be of type object but found [text].
What I suspect is happening is that Elasticsearch sees the periods in the field name as JSON dot notation and is trying to flesh it out as an object. I was able to find this PR from 2015 which explicitly disallows periods in field names however it seems to have been reversed in 2016 with this PR. There is also this multi-year thread from 2015-2017 discussing this issue but I was unable to find anything recent involving the latest versions.
My current thoughts on moving forward is to standardize the Helm charts we are using to have all of the labels use the same convention. This seems like a band-aid on the underlying issue though which is that I feel like I'm missing something obvious in the configuration of Elasticsearch and dynamic field mappings.
Any help here would be appreciated.

I opted to use the Logstash mutate filter with the rename option as described here:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-rename
The end result looked something like this:
filter {
mutate {
'[kubernetes][labels][app]' => '[kubernetes][labels][app.kubernetes.io/name]'
'[kubernetes][labels][chart]' => '[kubernetes][labels][helm.sh/chart]'
}
}

Although personally I've never encountered the exact same issue, I had similar problems when I indexed some test data and afterwards changed the structure of the document that should have been indexed (especially when "unflattening" data structures).
Your interpretation of the error message is correct. When you first index the document
{
"log": "This is another log message.",
"kubernetes": {
"labels": {
"app.kubernetes.io/name": "application-2"
}
}
}
Elasticsearch will recognize the app as an object/structure due to dynamic mapping.
When you then try to index the document
{
"log": "This is a log message.",
"kubernetes": {
"labels": {
"app": "application-1"
}
}
}
the previously, dynamically created mapping defined the field app as an object with sub-fields but elasticsearch encounters a concrete value, namely "application-1".
I suggest that you setup an index template to define the correct mappings. For the 'outdated' logging-versions I suggest to pre-process the particular documents either through an elasticsearch ingest-pipeline or with e.g. Logstash to get the documents in the correct format.
Hope that helps.

Elasticsearch: Fields necessary for map visualization on Kibana

I want to make use of the geoip logstash plugin to get geolocation info about some IP addresses seen in my logs;
I also want to be able to visualize such info on kibana;
I am going through a short overview of the process;
What the tutorial does not mention, is what are the geoip.* fields necessary for producing the map visualizations;
I want to keep only the strictly necessary fields and discard the rest;
Will keeping only geoip.longtitute and geoip.latitude do the job?
edit: At this point in time I am just using
{ geoip { source => "my_incoming_ip" } }
in my logstash filter;

It turns out the following field is necessary for producing the map visualization
geoip.location {
"lat": 38.7163,
"lon": -78.1704
}
The others can be ommited (i.e. mutate/remove)

Elasticsearch NEST: ordering terms aggregation with multiple criteria

Using NEST, I need to be able to order a terms aggregation with multiple criteria (requires ElasticSearch 1.5 or later). For example:
"order": [{"avg_rank": "desc"}, {"avg_score": "desc"}]
This is working great using the raw JSON that I created to verify I was getting the expected behavior. Now, in trying to translate that over to code using the NEST library, I'm not seeing how that would be accomplished.
The OrderDescending() method has only one implementation that takes a string for the key. I need a C# "params" type method that can take a list of OrderDescending() and\or OrderAscending() elements.
Is there a way to do this in NEST that I'm overlooking?
Is there a way in NEST to work around this where I can inject a little raw JSON where I need it?
FWIW, I've been using the "fluent" style to create my queries.
EDIT:
I see that, using "object initializer" syntax, I could manually create the dictionary and add my criteria elements. Problem is, I have large amounts of code written in "fluent" syntax. So,
Is there a way to use an "object initializer" object and convert it to a "fluent" descriptor? In this case, a TermsAggregator to a TermsAggregationDescriptor?
EDIT 2:
I should have mentioned originally that I tried .OrderDescending("avg_rank").OrderDescending("avg_score") already. That simply took that last one in the chain. In looking at the code, I can see why. Each call to OrderDescending blindly news up the dictionary instead of checking to see if one was already newed up and adding a new key to the dictionary if it already exists.
Based on this, I believe this is a bug for which I have entered a report here:
OrderDescending and OrderAscending cannot be chained for multi-criteria ordering
EDIT 3:
I appreciate all the answers (some of which are getting deleted) because they're helping drive this along and are responsible for these edits. I should also have mentioned originally that I discovered that:
"order": { "avg_rank": "desc", "avg_score": "desc" }
does not work. I don't know why for sure but ES will only use the last one in that case. It has be a list of dictionaries as shown in my example above at the top. I've verified that correctly sub-orders the aggregation on the second element. So, the underlying object cannot be typed as a simple dictionary. I've also added this information to the bug report I created (as noted in EDIT 2).

If you're using the fluent syntax you can just chain the sorts together.
Sample:
var esClient = ninjectKernel.Get<IElasticClient>();
var query = esClient.Search<RedemptionES>(s=> s
.SortAscending(a=>a.Date)
.SortDescending(d=>d.Input.User.Name)
);
Response:
{
"sort": [
{
"#timestamp": {
"order": "asc"
}
},
{
"input.user.name": {
"order": "desc"
}
}
]
}

Martijn Laarman of the NEST team was very responsive and kind enough to provide a work around for the bug I reported in EDIT 2 of the description above. The fix can be found in the comments of that same bug report: Work around for NEST library multi-criteria aggregation ordering.
Note that he provided a work around for both object initializer and fluent syntaxes (the one I needed).

Carrot2+ElasticSearch Basic Flow of Information

I am using Carrot2 and ElasticSearch. I has elastic search server running with a lot of data when I installed carrot2 plugin.
Wanted to get answers to a few basic questions:
Will clustering work only on newly indexed documents or even old documents?
How can I specify which fields to look at for clustering?
The curl command is working and giving some results. How can I get the curl command which takes a JSON as input to a REST API url of the form localhost:9200/article-index/article/_search_with_clusters?.....
Appreciate any help.

Yes, if you want to use the plugin straight off the ES installation, you need to make REST calls of your own. I believe you are using Python. Take a look at requests. It is a delightful REST tool for python.
To make POST requests you can do the following :
import json
url = 'localhost:9200/article-index/article/_search_with_clusters'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))
print r.text
Find more information at requests documentation.

Will clustering work only on newly indexed documents or even old
documents?
It will work even on old documments
How can I specify which fields to look at for clustering?
Here's an example using the shakepspeare dataset. The query is which of shakespeare's plays are about war?
$ curl -XPOST http://localhost:9200/shakespeare/_search_with_clusters?pretty -d '
{
"search_request": {
"query": {"match" : { "_all": "war" }},
"size": 100
},
"max_hits": 0,
"query_hint": "war",
"field_mapping": {
"title": ["_source.play_name"],
"content": ["_source.text_entry"]
},
"algorithm": "lingo"
}'
Running this you'll get back plays like Richard, Henry... The title is what carrot2 uses to develop the cluster names and the text entry is what it uses to make the clusters.
The curl command is working and giving some results. How can I get the
curl command which takes a JSON as input to a REST API url of the form
localhost:9200/article-index/article/_search_with_clusters?.....
Typically use the elasticsearch client libraries for your language of choice.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio