How to view trace logs from OpenTelemetry in Elastic APM - elasticsearch

I receive logs from the opentelemetry-collector in Elastic APM.
Log structure:
"{Timestamp:HH:mm:ss} {Level:u3} trace.id={TraceId} transaction.id={SpanId}{NewLine}{Message:lj}{NewLine}{Exception}"
Example:
08:27:47 INF trace.id=898a7716358b25408d4f193f1cd17831 transaction.id=4f7590e4ba80b64b SOME MSG
I tried using this pipeline:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "parse multiple patterns",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{TIMESTAMP_ISO8601:logtime} %{LOGLEVEL:loglevel} \\[trace.id=%{TRACE_ID:trace.id}(?: transaction.id=%{SPAN_ID:transaction.id})?\\] %{GREEDYDATA:message}"],
          "pattern_definitions": {
            "TRACE_ID": "[0-9A-Fa-f]{32}",
            "SPAN_ID": "[0-9A-Fa-f]{16}"
          }
        },
        "date": { "field": "logtime", "target_field": "@timestamp", "formats": ["HH:mm:ss"] }
      }
    ]
  }
}
My goal is to see the logs in Elastic APM like this:
{
  "@timestamp": "2021-01-05T10:10:10",
  "message": "Protocol Port MIs-Match",
  "trace": {
    "traceId": "898a7716358b25408d4f193f1cd17831",
    "spanId": "4f7590e4ba80b64b"
  }
}

Good job so far. Your pipeline is almost right; however, the grok pattern needs some fixing and you have some orphan curly braces. Here is a working example:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "parse multiple patterns",
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"""%{TIME:logtime} %{WORD:loglevel} trace.id=%{TRACE_ID:trace.id}(?: transaction.id=%{SPAN_ID:transaction.id})? %{GREEDYDATA:message}"""
],
"pattern_definitions": {
"TRACE_ID": "[0-9A-Fa-f]{32}",
"SPAN_ID": "[0-9A-Fa-f]{16}"
}
}
},
{
"date": {
"field": "logtime",
"target_field": "#timestamp",
"formats": [
"HH:mm:ss"
]
}
}
]
},
"docs": [
{
"_source": {
"message": "08:27:47 INF trace.id=898a7716358b25408d4f193f1cd17831 transaction.id=4f7590e4ba80b64b SOME MSG"
}
}
]
}
Response:
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_doc",
"_id" : "_id",
"_source" : {
"trace" : {
"id" : "898a7716358b25408d4f193f1cd17831"
},
"#timestamp" : "2021-01-01T08:27:47.000Z",
"loglevel" : "INF",
"message" : "SOME MSG",
"logtime" : "08:27:47",
"transaction" : {
"id" : "4f7590e4ba80b64b"
}
},
"_ingest" : {
"timestamp" : "2021-03-30T11:07:52.067275598Z"
}
}
}
]
}
Just note that the exact date is missing from the log line, so the @timestamp field resolves to January 1st of the current year.
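Once the simulation looks right, you can store the pipeline and attach it to your index so incoming logs are parsed automatically. A minimal sketch, assuming hypothetical names otel-logs for the pipeline and logs-otel for the index (adjust both to your setup):
// Store the pipeline (hypothetical name: otel-logs)
PUT _ingest/pipeline/otel-logs
{
  "description": "parse OpenTelemetry trace logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          """%{TIME:logtime} %{WORD:loglevel} trace.id=%{TRACE_ID:trace.id}(?: transaction.id=%{SPAN_ID:transaction.id})? %{GREEDYDATA:message}"""
        ],
        "pattern_definitions": {
          "TRACE_ID": "[0-9A-Fa-f]{32}",
          "SPAN_ID": "[0-9A-Fa-f]{16}"
        }
      }
    },
    {
      "date": {
        "field": "logtime",
        "target_field": "@timestamp",
        "formats": ["HH:mm:ss"]
      }
    }
  ]
}

// Run it on every new document by default (hypothetical index name: logs-otel)
PUT logs-otel/_settings
{
  "index.default_pipeline": "otel-logs"
}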

Related

Is there a way to reference the field 'path.virtual' as part of this split processor?

The field I am interested in from my ES doc below is "virtual":
"path" : {
"root" : "cda42f809526c222ebb54e5887117139",
"virtual" : "/tests/3.pdf",
"real" : "/tmp/es/tests/3.pdf"
}
My simulated ingest pipeline:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "split words on line_number field",
"processors": [
{
"split": {
"field": "path.virtual",
"separator": "/",
"target_field": "temporary_field"
}
},
{
"set": {
"field": "caseno",
"value": "{{temporary_field.1}}"
}
},
{
"set": {
"field": "file",
"value": "{{temporary_field.2}}"
}
},
{
"remove": {
"field": "temporary_field"
}
}
]
},
"docs": [
{
"_source": {
"path.virtual": "/test/3.pdf"
}
}
]
}
If I change the actual field 'path.virtual' to 'path' or 'virtual' I get the desired result, but if I use the actual field name I get the following error:
{
"docs" : [
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "field [[path] not present as part of path [[path.virtual]]"
}
],
"type" : "illegal_argument_exception",
"reason" : "field [[path] not present as part of path [[path.virtual]]"
}
}
]
}
What can I do to avoid this?
Try this in simulate, providing path.virtual as a real object rather than a dotted key, since ingest processors resolve fields by object path:
"docs": [
{
"_source": {
"path": {
"virtual": "/test/3.pdf"
}
}
}
]
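If your real documents actually arrive with a literal dotted key like "path.virtual", you can expand it into an object with the dot_expander processor before splitting. A minimal sketch:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "dot_expander": {
          "field": "path.virtual"
        }
      },
      {
        "split": {
          "field": "path.virtual",
          "separator": "/",
          "target_field": "temporary_field"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "path.virtual": "/test/3.pdf"
      }
    }
  ]
}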

How to convert a particular item in the Filebeat message to lowercase using an Elasticsearch processor

I am simulating the code below in Elasticsearch. How can I convert event.action from "Query" to lowercase "query", as expected in the output?
The simulation below was done in the Elastic Dev Tools console:
POST /_ingest/pipeline/_simulate
{
"pipeline" :
{
"description": "_description",
"processors": [
{
"dissect": {
"field" : "message",
"pattern" : "%{#timestamp}\t%{->} %{process.thread.id} %{event.action}\t%{message}"
},
"set": {
"field": "event.category",
"value": "database"
}
}
]
},
"docs": [
{
"_index": "index",
"_id": "id",
"_source": {
"message": "2020-10-22T20:28:26.267397Z\t 9 Query\tset session"
}
}
]
}
Expected output
{
"docs" : [
{
"doc" : {
"_index" : "index",
"_id" : "id",
"_source" : {
"process" : {
"thread" : {
"id" : "9"
}
},
"#timestamp" : "2020-10-22T20:28:26.267397Z",
"message" : "set session",
"event" : {
"category" : "database",
"action" : "query"
}
},
"_ingest" : {
"timestamp" : "2022-08-17T09:27:34.587465824Z"
}
}
}
]
}
You can use the lowercase processor in the same ingest pipeline, as shown below:
{
"pipeline": {
"description": "_description",
"processors": [
{
"dissect": {
"field": "message",
"pattern": "%{#timestamp}\t%{->} %{process.thread.id} %{event.action}\t%{message}"
}
},
{
"set": {
"field": "event.category",
"value": "database"
}
},
{
"lowercase": {
"field": "event.action"
}
}
]
},
"docs": [
{
"_index": "index",
"_id": "id",
"_source": {
"message": "2020-10-22T20:28:26.267397Z\t 9 Query\tset session"
}
}
]
}
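If you also want to keep the original value, the lowercase processor accepts an optional target_field so the lowered value is written to a separate field. A small sketch (the name event.action_lower is just an illustration):
{
  "lowercase": {
    "field": "event.action",
    "target_field": "event.action_lower"
  }
}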

Elasticsearch DSL: matching all elements of a query in a list of lists of strings

I'm trying to query Elasticsearch to match every document that contains all the requested values within a single element of a list of lists, but I can't seem to find the right query.
Mapping:
"id" : {
"type" : "keyword"
},
"mainlist" : {
"properties" : {
"format" : {
"type" : "keyword"
},
"tags" : {
"type" : "keyword"
}
}
},
...
Documents:
doc1 {
"id" : "abc",
"mainlist" : [
{
"type" : "big",
"tags" : [
"tag1",
"tag2"
]
},
{
"type" : "small",
"tags" : [
"tag1"
]
}
]
},
doc2 {
"id" : "abc",
"mainlist" : [
{
"type" : "big",
"tags" : [
"tag1"
]
},
{
"type" : "small",
"tags" : [
"tag2"
]
}
]
},
doc3 {
"id" : "abc",
"mainlist" : [
{
"type" : "big",
"tags" : [
"tag1"
]
}
]
}
The query I've tried that got me closest to the result is:
GET /index/_doc/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"mainlist.tags": "tag1"
}
},
{
"term": {
"mainlist.tags": "tag2"
}
}
]
}
}
}
but I get doc1 and doc2 as results, while I'd only want doc1, since it contains tag1 and tag2 in a single list element rather than spread across two sublists.
How can I achieve that?
Thanks for any help.
As mentioned by @caster, you need to use the nested data type and query. When queried the normal way, Elasticsearch treats the inner objects as a single flattened structure and the relation between their elements is lost, as explained in the official doc.
You need to change both the mapping and the query to achieve the desired output, as shown below.
Index mapping
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"mainlist" :{
"type" : "nested"
}
}
}
}
Sample index doc according to your example, no change there.
Query
{
"query": {
"nested": {
"path": "mainlist",
"query": {
"bool": {
"must": [
{
"term": {
"mainlist.tags": "tag1"
}
},
{
"match": {
"mainlist.tags": "tag2"
}
}
]
}
}
}
}
}
And the result:
"hits": [
{
"_index": "71519931_new",
"_id": "1",
"_score": 0.9139043,
"_source": {
"id": "abc",
"mainlist": [
{
"type": "big",
"tags": [
"tag1",
"tag2"
]
},
{
"type": "small",
"tags": [
"tag1"
]
}
]
}
}
]
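Optionally, if you also want to see which inner element matched, the nested query supports inner_hits, e.g.:
{
  "query": {
    "nested": {
      "path": "mainlist",
      "query": {
        "bool": {
          "must": [
            { "term": { "mainlist.tags": "tag1" } },
            { "term": { "mainlist.tags": "tag2" } }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}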
Use the nested field type; this will work for it:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/nested.html
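Note that switching mainlist to nested is a breaking mapping change, so existing data has to be reindexed into a new index. A minimal sketch, assuming hypothetical index names my-index (old) and my-index-nested (new):
PUT my-index-nested
{
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "mainlist": { "type": "nested" }
    }
  }
}

POST _reindex
{
  "source": { "index": "my-index" },
  "dest": { "index": "my-index-nested" }
}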

Elasticsearch dynamic field mapping with a range query on a price field

I have two fields in my Elasticsearch index, lowest_local_price and lowest_global_price.
I want to map a value dynamically to a third field, price, at run time based on the local or global country.
If the local country matches, I want to map the lowest_local_price value to the price field.
If the global country matches, I want to map the lowest_global_price value to the price field.
If the local or global country matches, I want to apply a range query on the price field and boost that doc by 2.0.
Note: this is not a compulsory filter or query; if it matches, I just want to boost the doc.
I have tried the solutions below, but they do not work for me.
Query 1:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price]]
],
"boost" => 2.0
]
]
];
Query 2:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price, "boost" => 2.0]]
],
]
]
];
Neither of them works for me, because neither can boost the doc. I know filter does not work with boost, so what is the solution for a dynamic field mapping with a range query and a boost?
Please help me solve this query.
Thank you in advance!
You can (most likely) achieve what you want without runtime_mappings by using a combination of bool queries. Here's how.
Let's define a test mapping
We need to clarify what mapping we are working with, because different field types require different query types.
Let's assume that your mapping looks like this:
PUT my-index-000001
{
"mappings": {
"dynamic": "runtime",
"properties": {
"country_en_name": {
"type": "text"
},
"lowest_local_price": {
"type": "float"
},
"global_rates": {
"properties": {
"UK": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"FR": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"US": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
}
}
}
}
}
}
Note that country_en_name is of type text. In general, such fields should be indexed as keyword, but for the sake of demonstrating the use of runtime_mappings I kept it as text and will show later how to overcome this limitation.
bool is the Elasticsearch equivalent of if
The query without runtime mappings might look like this:
POST my-index-000001/_search
{
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"country_en_name": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
This can be interpreted as the following:
Any document
OR (
(document with country_en_name=UK AND lowest_local_price > X)
OR
(document with global_rates.UK.lowest_global_price > X)
)[boost this part of OR]
The match_all is needed to also return documents that do not match the other queries.
What will the response of the query look like?
Let's put some documents into Elasticsearch:
POST my-index-000001/_doc/1
{
"country_en_name": "UK",
"lowest_local_price": 1500,
"global_rates": {
"FR": {
"lowest_global_price": 1000
},
"US": {
"lowest_global_price": 1200
}
}
}
POST my-index-000001/_doc/2
{
"country_en_name": "FR",
"lowest_local_price": 900,
"global_rates": {
"UK": {
"lowest_global_price": 950
},
"US": {
"lowest_global_price": 1500
}
}
}
POST my-index-000001/_doc/3
{
"country_en_name": "US",
"lowest_local_price": 950,
"global_rates": {
"UK": {
"lowest_global_price": 1100
},
"FR": {
"lowest_global_price": 1000
}
}
}
Now the result of the search query above will be something like:
{
...
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 4.9616585,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 4.9616585,
"_source" : {
"country_en_name" : "UK",
"lowest_local_price" : 1500,
...
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "3",
"_score" : 3.0,
"_source" : {
"country_en_name" : "US",
"lowest_local_price" : 950,
"global_rates" : {
"UK" : {
"lowest_global_price" : 1100
},
...
}
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"country_en_name" : "FR",
"lowest_local_price" : 900,
"global_rates" : {
"UK" : {
"lowest_global_price" : 950
},
...
}
}
}
]
}
}
Note that the document with _id:2 is at the bottom because it didn't match any of the boosted queries.
Will runtime_mappings be of any use?
Runtime mappings are useful when an existing mapping has data types that do not permit a certain type of query. In versions before 7.11 one would have to reindex in such cases, but now it is possible to use runtime mappings instead (at the cost of a more expensive query).
In our case, we have country_en_name indexed as text, which is suited for full-text search rather than exact lookups. We should rather use keyword. This is how the query may look with the help of runtime_mappings:
POST my-index-000001/_search
{
"runtime_mappings": {
"country_en_name_keyword": {
"type": "keyword",
"script": {
"source": "emit(params['_source']['country_en_name'])"
}
}
},
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"country_en_name_keyword": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
Notice how we created a new runtime field country_en_name_keyword with type keyword and used a term query instead of a match query.
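If reindexing is an option, a keyword multi-field on country_en_name avoids paying the runtime-field cost on every search. A sketch of such a mapping, assuming a hypothetical new index my-index-000002:
PUT my-index-000002
{
  "mappings": {
    "properties": {
      "country_en_name": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      }
    }
  }
}
With that mapping, the term query can target country_en_name.keyword directly, and no runtime field is needed.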

How to split a field into words with an ingest pipeline in Kibana

I have created the ingest pipeline below to split a field into words:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "String cutting processing",
"processors": [
{
"split": {
"field": "foo",
"separator": "|"
}
}
]
},
"docs": [
{
"_source": {
"foo": "apple|time"
}
}
]
}
but it splits the field into characters:
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_doc",
"_id" : "_id",
"_source" : {
"foo" : [
"a",
"p",
"p",
"l",
"e",
"|",
"t",
"i",
"m",
"e"
]
}
}
}
]
}
If I replace the separator with a comma, the same pipeline splits the field into words:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "String cutting processing",
"processors": [
{
"split": {
"field": "foo",
"separator": ","
}
}
]
},
"docs": [
{
"_source": {
"foo": "apple,time"
}
}
]
}
then the output would be:
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_doc",
"_id" : "_id",
"_source" : {
"foo" : [
"apple",
"time"
]
}
}
}
]
}
How can I split the field into words when the separator is "|"?
My next question is how I can apply this ingest pipeline to an existing index.
I tried this solution, but it doesn't work for me.
Edit
Here is the whole pipeline with the document; it should assign the two parts to two fields:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": """combined fields are text that contain "|" to separate two fields""",
"processors": [
{
"split": {
"field": "dv_m",
"separator": "|",
"target_field": "dv_m_splited"
}
},
{
"set": {
"field": "dv_metric_prod",
"value": "{{dv_m_splited.1}}",
"override": false
}
},
{
"set": {
"field": "dv_metric_section",
"value": "{{dv_m_splited.2}}",
"override": false
}
}
]
},
"docs": [
{
"_source": {
"dv_m": "amaze_inc|Understanding"
}
}
]
}
That generates this response:
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_doc",
"_id" : "_id",
"_source" : {
"dv_metric_prod" : "m",
"dv_m_splited" : [
"a",
"m",
"a",
"z",
"e",
"_",
"i",
"n",
"c",
"|",
"U",
"n",
"d",
"e",
"r",
"s",
"t",
"a",
"n",
"d",
"i",
"n",
"g"
],
"dv_metric_section" : "a",
"dv_m" : "amaze_inc|Understanding"
},
"_ingest" : {
"timestamp" : "2021-08-02T08:33:58.2234143Z"
}
}
}
]
}
If I set "separator": "\\|", then I will get this error:
{
"docs" : [
{
"error" : {
"root_cause" : [
{
"type" : "general_script_exception",
"reason" : "Error running com.github.mustachejava.codes.DefaultMustache#776f8239"
}
],
"type" : "general_script_exception",
"reason" : "Error running com.github.mustachejava.codes.DefaultMustache#776f8239",
"caused_by" : {
"type" : "mustache_exception",
"reason" : "Failed to get value for dv_m_splited.2 #[query-template:1]",
"caused_by" : {
"type" : "mustache_exception",
"reason" : "2 #[query-template:1]",
"caused_by" : {
"type" : "index_out_of_bounds_exception",
"reason" : "2"
}
}
}
}
}
]
}
The solution is fairly simple: just escape your separator.
Since the separator field of the split processor is a regular expression, you need to escape special characters such as |.
You also need to escape it twice: once for the regex and once more for the JSON string, hence "\\|".
So your code only lacks the double escaping:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "String cutting processing",
"processors": [
{
"split": {
"field": "foo",
"separator": "\\|"
}
}
]
},
"docs": [
{
"_source": {
"foo": "apple|time"
}
}
]
}
UPDATE
You did not mention (or I missed) the part where you wanted to assign the values to two separate fields.
In this case, you should use dissect instead of split. It is shorter, simpler, and cleaner. See the documentation here.
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": """combined fields are text that contain "|" to separate two fields""",
"processors": [
{
"dissect": {
"field": "dv_m",
"pattern": "%{dv_metric_prod}|%{dv_metric_section}"
}
}
]
},
"docs": [
{
"_source": {
"dv_m": "amaze_inc|Understanding"
}
}
]
}
Result
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_doc",
"_id" : "_id",
"_source" : {
"dv_metric_prod" : "amaze_inc",
"dv_metric_section" : "Understanding",
"dv_m" : "amaze_inc|Understanding"
},
"_ingest" : {
"timestamp" : "2021-08-18T07:39:12.84910326Z"
}
}
}
]
}
ADDENDUM
If using split instead of dissect:
You got your array indices wrong. There is no {{dv_m_splited.2}}, since array indices start from 0 and you only have two elements.
This is the correct pipeline when using the split processor:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": """combined fields are text that contain "|" to separate two fields""",
"processors": [
{
"split": {
"field": "dv_m",
"separator": "\\|",
"target_field": "dv_m_splited"
}
},
{
"set": {
"field": "dv_metric_prod",
"value": "{{dv_m_splited.0}}",
"override": false
}
},
{
"set": {
"field": "dv_metric_section",
"value": "{{dv_m_splited.1}}",
"override": false
}
}
]
},
"docs": [
{
"_source": {
"dv_m": "amaze_inc|Understanding"
}
}
]
}
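As for your second question, applying the pipeline to documents that are already indexed: a common approach is to store the pipeline and run it over the existing documents with _update_by_query. A minimal sketch, assuming hypothetical names split-dv-m for the pipeline and my-index for the index:
// Store the pipeline (hypothetical name: split-dv-m)
PUT _ingest/pipeline/split-dv-m
{
  "processors": [
    {
      "dissect": {
        "field": "dv_m",
        "pattern": "%{dv_metric_prod}|%{dv_metric_section}"
      }
    }
  ]
}

// Rewrite existing documents through the pipeline (hypothetical index name: my-index)
POST my-index/_update_by_query?pipeline=split-dv-m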
