How to query arbitrary fields in Kibana? - elasticsearch

We're importing logs that contain the full request/response for a given endpoint, using the EcsLayout C# library. The import is working fine; however, the 'structured' part of the log is stored under metadata, like so:
"metadata": {
"event": {
"controller": "main",
"method": "GetData",
"request": {
"userId": 1,
"clientTypeId": 2
},
"response": {
"marketOpen": true,
"price": 18.56
}
}
}
How do I go about querying this metadata, given that the fields do not appear on the 'Lens' page?
Is it a case of creating an index of some description? There are a lot of different (and occasionally large) data sets, so this seems really impractical.
Is querying 'ad-hoc' data like this not a good use of Kibana? Should I look elsewhere, say Grafana, before I go too far down the Elastic road?
Note: We're on Elastic 8.2.0
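For reference, once such fields are mapped they can usually be searched directly in Discover with KQL (e.g. metadata.event.request.userId : 1) after refreshing the Kibana data view, so checking the mapping is a sensible first step. A minimal sketch in Dev Tools, assuming a hypothetical index name of app-logs:
GET app-logs/_mapping/field/metadata.event.*

GET app-logs/_search
{
  "query": {
    "term": { "metadata.event.request.userId": 1 }
  }
}
If the field-mapping request comes back empty, the metadata object was not mapped as individual fields, which would also explain why Lens does not list them.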

Related

Azure Data Factory REST API paging with Elasticsearch

While developing a pipeline that will use Elasticsearch as a source, I ran into an issue related to paging. I am using the Elasticsearch SQL API. Basically, I started by making the request in Postman, and it works well. The request body looks like this:
{
"query":"SELECT Id,name,ownership,modifiedDate FROM \"core\" ORDER BY Id",
"fetch_size": 20,
"cursor" : ""
}
After the first run, the response body contains a cursor string, which is a pointer to the next page. If I send the request in Postman and provide the cursor value from the previous request, it returns the data for the second page, and so on. I am trying to achieve the same result in Azure Data Factory. For this I am using a copy activity, which stores the response to an Azure blob. The source setup is as follows:
[screenshot: copy activity source configuration]
This is the expression for the body:
{
  "query": "SELECT Id,name,ownership,modifiedDate FROM \"#{variables('TableName')}\" ORDER BY Id",
  "fetch_size": #{variables('Rows')},
  "cursor": ""
}
I have no idea how to correctly set up the pagination rule. The pipeline works properly, but only for the first request. I've tried setting up Headers.cursor with the expression $.cursor, but that setup leads to an infinite loop and the pipeline fails due to an Elasticsearch restriction.
I've also tried to read the documentation at https://learn.microsoft.com/en-us/azure/data-factory/connector-rest#pagination-support, but it seems pretty limited in terms of usage examples and difficult to understand.
Could somebody help me understand how to build a pipeline that makes use of paging?
The response with the cursor looks like:
{
  "columns": [
    {
      "name": "companyId",
      "type": "integer"
    },
    {
      "name": "name",
      "type": "text"
    },
    {
      "name": "ownership",
      "type": "keyword"
    },
    {
      "name": "modifiedDate",
      "type": "datetime"
    }
  ],
  "rows": [
    [
      2,
      "mic Inc.",
      "manufacture",
      "2021-03-31T12:57:51.000Z"
    ]
  ],
"cursor": "g/WuAwFaAXNoRG5GMVpYSjVWR2hsYmtabGRHTm9BZ0FBQUFBRUp6VGxGbUpIZWxWaVMzcGhVWEJITUhkbmJsRlhlUzFtWjNjQUFBQUFCQ2MwNWhaaVIzcFZZa3Q2WVZGd1J6QjNaMjVSVjNrdFptZDP/////DwQBZgljb21wYW55SWQBCWNvbXBhbnlJZAEHaW50ZWdlcgAAAAFmBG5hbWUBBG5hbWUBBHRleHQAAAABZglvd25lcnNoaXABCW93bmVyc2hpcAEHa2V5d29yZAEAAAFmDG1vZGlmaWVkRGF0ZQEMbW9kaWZpZWREYXRlAQhkYXRldGltZQEAAAEP"
}
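For reference, outside of Data Factory the next page is fetched by posting the returned cursor back to the SQL endpoint, and the cursor can be closed once you are done; a minimal sketch (the cursor value is a placeholder for the one returned above):
POST _sql?format=json
{
  "cursor": "<cursor from previous response>"
}

POST _sql/close
{
  "cursor": "<cursor from previous response>"
}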
I finally found the solution; hopefully it will be useful for the community.
Basically, the solution needs to be split into the following steps.
Step 1: Make the first request as described in the question and stage the file to blob storage.
Step 2: Read the blob file, get the cursor value, and set it on a variable.
Step 3: Keep requesting data with a changed body:
{"cursor" : "#{variables('cursor')}" }
The pipeline looks like this:
[screenshot: pipeline]
The pagination configuration looks like this:
[screenshot: pagination rule]. It is a workaround, as the server ignores this header, but we need something that allows sending the request in a loop.

Use Kafka Connect to update Elasticsearch field on existing document instead of creating new

I have Kafka set up and running with the Elasticsearch connector, and I am successfully indexing new documents into an ES index based on the incoming messages on a particular topic.
However, based on incoming messages on another topic, I need to append data to a field on a specific document in the same index.
Pseudo-schema below:
{
"_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
"uuid": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
"title": "A title",
"body": "A body",
"created_at": 164584548,
"views": []
}
^ This document is being created fine in ES based on the data in the topic mentioned above.
However, how do I then add items to the views field using messages from another topic? Like so:
article-view topic schema:
{
  "article_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "user_id": 123456,
  "timestamp": 136389734
}
Instead of simply creating a new document in an article-view index (which I don't even want to have), it should append this to the views field on the article document whose _id equals the article_id from the message.
So the end result after one message would be:
{
  "_id": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "uuid": "6993e0a6-271b-45ef-8cf5-1c0d0f683acc",
  "title": "A title",
  "body": "A body",
  "created_at": 164584548,
  "views": [
    {
      "user_id": 123456,
      "timestamp": 136389734
    }
  ]
}
Using the ES API, this is possible with a script, like so:
{
  "script": {
    "lang": "painless",
    "params": {
      "newItems": [{
        "timestamp": 136389734,
        "user_id": 123456
      }]
    },
    "source": "ctx._source.views.addAll(params.newItems)"
  }
}
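For context, a script body like the one above is sent to the update API of a single document; a minimal sketch, where the index name articles is a placeholder:
POST articles/_update/6993e0a6-271b-45ef-8cf5-1c0d0f683acc
{
  "script": {
    "lang": "painless",
    "params": {
      "newItems": [{ "timestamp": 136389734, "user_id": 123456 }]
    },
    "source": "ctx._source.views.addAll(params.newItems)"
  }
}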
I can generate scripts like above dynamically in bulk, and then use the helpers.bulk function in the ES Python library to bulk update documents this way.
Is this possible with Kafka Connect / Elasticsearch? I haven't found any documentation on Confluent's website to explain how to do this.
It seems like a fairly standard requirement and an obvious thing people would need to do with Kafka and a sink connector like the ES one.
Thanks!
Edit: Partial updates are possible with write.method=upsert (src)
The Elasticsearch connector doesn't support this. You can update documents in place, but you need to send the full document, not a delta for appending, which I think is what you're after.
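For reference, write.method is set in the Elasticsearch sink connector configuration; a minimal sketch, where the connector name, topic, and connection URL are placeholders (and note that, as the answer above points out, upsert merges whole documents rather than appending to an array):
{
  "name": "es-sink-article-views",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "article-view",
    "connection.url": "http://localhost:9200",
    "key.ignore": "false",
    "write.method": "upsert"
  }
}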

How to extract and visualize values from a log entry in OpenShift EFK stack

I have an OKD cluster set up with the EFK stack for logging, as described here. I have never worked with any of the components before.
One deployment logs requests that contain a specific value that I'm interested in. I would like to extract just this value and visualize it with an area map in Kibana that shows the amount of requests and where they come from.
The content of the message field basically looks like this:
[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}
This plz is a German zip code, which I would like to visualize as described.
My problem here is that I have no idea how to extract this value.
A nice first success would be if I could find it with a regexp, but Kibana doesn't seem to work the way I think it does. Following its docs, I expected /\"plz\":\"[0-9]{5}\"/ to deliver the result, but I get 0 hits (the time interval is set correctly). And even if this regexp matched, I would only find the log entry that contains it, not just the specific value. How do I go on from here?
I guess I also need an external geocoding service, but at which point would I include it? Or does Kibana itself know how to map zip codes to geometries?
A beginner-friendly step-by-step guide would be perfect, but I could settle for some hints that guide me there.
It would be possible to parse the message field as the document gets indexed into ES, using an ingest pipeline with a grok processor.
First, create the ingest pipeline like this:
PUT _ingest/pipeline/parse-plz
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{POSINT:plz}"
        ]
      }
    }
  ]
}
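You can check that the pattern extracts what you expect before wiring anything up by simulating the pipeline; a quick sketch using the sample log line from the question:
POST _ingest/pipeline/parse-plz/_simulate
{
  "docs": [
    {
      "_source": {
        "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}"""
      }
    }
  ]
}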
Then, when you index your data, you simply reference that pipeline:
PUT plz/_doc/1?pipeline=parse-plz
{
"message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}"""
}
And you will end up with a document like the one below, which now has a field called plz with the 12345 value in it:
{
"message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
"plz": "12345"
}
When indexing your document from Fluentd, you can specify a pipeline to be used in the configuration. If you can't or don't want to modify your Fluentd configuration, you can also define a default pipeline for your index that will kick in every time a new document is indexed. Simply run this on your index and you won't need to specify ?pipeline=parse-plz when indexing documents:
PUT index/_settings
{
"index.default_pipeline": "parse-plz"
}
If you have several indexes, a better approach might be to define an index template instead, so that whenever a new index called project.foo-something is created, the settings are going to be applied:
PUT _template/project-indexes
{
  "index_patterns": ["project.foo*"],
  "settings": {
    "index.default_pipeline": "parse-plz"
  }
}
Now, in order to map that PLZ on a map, you'll first need to find a data set that provides you with geolocations for each PLZ.
You can then add a second processor in your pipeline in order to do the PLZ/ZIP to lat,lon mapping:
PUT _ingest/pipeline/parse-plz
{
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{POSINT:plz}"
        ]
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx.location = params[ctx.plz];",
        "params": {
          "12345": {"lat": 42.36, "lon": 7.33}
        }
      }
    }
  ]
}
Ultimately, your document will look like this and you'll be able to leverage the location field in a Kibana visualization:
{
  "message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
  "plz": "12345",
  "location": {
    "lat": 42.36,
    "lon": 7.33
  }
}
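One caveat: to use the location field in Kibana's map visualizations, it generally needs to be mapped as geo_point before the documents are indexed; a minimal sketch on the example index used above:
PUT plz/_mapping
{
  "properties": {
    "location": { "type": "geo_point" }
  }
}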
So to sum it all up, it all boils down to only two things:
Create an ingest pipeline to parse documents as they get indexed
Create an index template for all project* indexes whose settings include the pipeline created in step 1

Generating GraphQL GUI from Schema and Schema from GUI

While using GraphiQL works well, my boss has asked me to implement a user interface where users can select elements presented to them via UI elements like checkboxes, map relationships, and get the data; doing this will generate a GraphQL input query for them, call the API, and return the result to the user.
So basically this involves two generation steps: generating a user interface from a GraphQL schema, and generating a GraphQL input query from the user's selection.
I searched and was not able to find any tools that already do this. My server is in Node and I am using Express GraphQL. I converted my Express schema to the GraphQL schema language using https://github.com/graphql-cli/graphql-cli and introspected it using the introspect function at https://github.com/sheerun/graphqlviz/blob/master/cli.js
The object I got back was something like this (only partial schema output given below):
"data": {
  "__schema": {
    "queryType": {
      "name": "Query"
    },
    "mutationType": {
      "name": "Mutation"
    },
    "subscriptionType": null,
    "types": [{
      "kind": "OBJECT",
      "name": "Query",
      "description": null,
      "fields": [{
        "name": "employee",
        "description": null,
        "args": [{
          "name": "ecode",
          "description": null,
          "type": {
            "kind": "SCALAR",
            "name": "String",
            "ofType": null
          },
          "defaultValue": null
        }],
I am looping through the elements trying to generate UI but I am quite stuck.
What is the best way to do this? Thanks in advance.
Well, for the part about generating the UI from the introspection query, I think the response contains enough data for a sufficient UI (the description of each field can be used as a placeholder for that field's input box). If you're asking how you can generate a dynamic form from the introspection response, you can take a look at other projects that created transformers from JSON to HTML forms for inspiration/usage (take a look at https://github.com/brutusin/json-forms/blob/master/README.md#cdn). For complex fields (not primitive types) you may need to do some more work.
For generating the query from the UI, you can use any query-builder tool and build a query according to the user's inputs. Each combobox will be mapped to a specific SCHEMANAME.FIELDNAME, and the value will be the value of its input box.
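For illustration, the query the UI ultimately generates is just a JSON payload posted to the GraphQL endpoint; a sketch based on the employee(ecode) field from the introspection output above, where the selected sub-fields name and department are hypothetical and not taken from the real schema:
{
  "query": "query GetEmployee($ecode: String) { employee(ecode: $ecode) { name department } }",
  "variables": { "ecode": "E123" }
}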
I hope it helped a bit. BTW, it sounds like an interesting tool so let us know if you succeed!

What's the correct way to present paged resources with HAL?

This sounds like a rookie question, but I'm wondering: what's the best way to present paged resources in the HAL format? Right now I'm using the Spring HATEOAS API to convert a Page object into a resource via PagedResourcesAssembler#toResource(Page<T>, ResourceAssembler<T,R>), which results in the following output:
{
  "_links": {
    "self": {
      "href": "http://example.org/api/user?page=3"
    },
    …
  },
  "count": 3,
  "total": 498,
  "_embedded": {
    "users": [
      {
        "_links": {
          "self": {
            "href": "http://example.org/api/user/mwop"
          }
        },
        "id": "mwop",
        "name": "Matthew Weier O'Phinney"
      }
    ]
  }
}
Everything works fine, but the only problem is that the returned collection lives under the _embedded field and is keyed by the class name, so the client has to know this class name as well, right? Would it be better to just return the collection under content, as in the non-HAL format? If yes, how should I achieve it using Spring HATEOAS?
That's not a problem; that's the way _embedded is defined in the HAL specification.
users is not a class name, it's a link relation that allows clients to actually find the collection they were asking for in the first place (e.g. using a JSONPath expression). That's not something coming out of the blue at all; it is usually the same link relation the client used to find that resource in the first place.
Assume an API root exposing this document:
{
"_links": {
"users": {
"href": "…"
},
…
}
}
Seeing that, the client would have to know the semantics of users to find the link it wants to follow. In your case users is basically pointing to a collection resource that supports pagination.
So if the client follows the link named users, it can then find the actual content it's looking for under _embedded.users in the HAL response by combining knowledge about the media type (HAL, _embedded) and the service's application-level semantics (users).
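For example, a client that has followed the users link could pull the items out of the response shown in the question with a JSONPath expression such as $._embedded.users[*], combining the HAL-defined _embedded container with the users relation it already knows, rather than relying on any server-side class name.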
