How to convert a particular item in the Filebeat message to lowercase using an Elasticsearch processor - elasticsearch

I am simulating the code below in Elasticsearch. How can I convert event.action from Query to lowercase "query", as expected in the output?
The simulation is run in the Elastic Dev Tools console:
POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "_description",
    "processors": [
      {
        "dissect": {
          "field": "message",
          "pattern": "%{@timestamp}\t%{->} %{process.thread.id} %{event.action}\t%{message}"
        }
      },
      {
        "set": {
          "field": "event.category",
          "value": "database"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "message": "2020-10-22T20:28:26.267397Z\t 9 Query\tset session"
      }
    }
  ]
}
Expected output:
{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_id" : "id",
        "_source" : {
          "process" : {
            "thread" : {
              "id" : "9"
            }
          },
          "@timestamp" : "2020-10-22T20:28:26.267397Z",
          "message" : "set session",
          "event" : {
            "category" : "database",
            "action" : "query"
          }
        },
        "_ingest" : {
          "timestamp" : "2022-08-17T09:27:34.587465824Z"
        }
      }
    }
  ]
}

You can add a lowercase processor to the same ingest pipeline. Processors run in the order they appear in the array, so it must come after the dissect processor that creates event.action:
{
  "pipeline": {
    "description": "_description",
    "processors": [
      {
        "dissect": {
          "field": "message",
          "pattern": "%{@timestamp}\t%{->} %{process.thread.id} %{event.action}\t%{message}"
        }
      },
      {
        "set": {
          "field": "event.category",
          "value": "database"
        }
      },
      {
        "lowercase": {
          "field": "event.action"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "message": "2020-10-22T20:28:26.267397Z\t 9 Query\tset session"
      }
    }
  ]
}

Related

Is there a way to reference the field 'path.virtual' as part of this split processor?

The field I am interested in from my ES doc is "virtual", shown below:
"path" : {
"root" : "cda42f809526c222ebb54e5887117139",
"virtual" : "/tests/3.pdf",
"real" : "/tmp/es/tests/3.pdf"
}
My simulated ingest pipeline:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "split words on line_number field",
    "processors": [
      {
        "split": {
          "field": "path.virtual",
          "separator": "/",
          "target_field": "temporary_field"
        }
      },
      {
        "set": {
          "field": "caseno",
          "value": "{{temporary_field.1}}"
        }
      },
      {
        "set": {
          "field": "file",
          "value": "{{temporary_field.2}}"
        }
      },
      {
        "remove": {
          "field": "temporary_field"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "path.virtual": "/test/3.pdf"
      }
    }
  ]
}
If I change the actual field 'path.virtual' to 'path' or 'virtual', I get the desired result, but if I use the actual field name I get the following error:
{
  "docs" : [
    {
      "error" : {
        "root_cause" : [
          {
            "type" : "illegal_argument_exception",
            "reason" : "field [[path] not present as part of path [[path.virtual]]"
          }
        ],
        "type" : "illegal_argument_exception",
        "reason" : "field [[path] not present as part of path [[path.virtual]]"
      }
    }
  ]
}
What can I do to avoid this?
The error occurs because the simulated document defines path.virtual as a single top-level field whose name contains a dot, while the split processor expects a virtual field nested inside a path object. Try this in the simulate request instead:
"docs": [
{
"_source": {
"path": {
"virtual": "/test/3.pdf"
}
}
}
]
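Alternatively, if the real documents genuinely contain a flat field literally named path.virtual, a dot_expander processor at the front of the pipeline should also work; it rewrites the dotted field name into a nested path object before the split processor runs. A minimal sketch:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "dot_expander": {
          "field": "path.virtual" --> expands the flat "path.virtual" key into a nested path object
        }
      },
      {
        "split": {
          "field": "path.virtual",
          "separator": "/",
          "target_field": "temporary_field"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "path.virtual": "/test/3.pdf"
      }
    }
  ]
}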

Should and Filter combination in Elasticsearch

I have this query, which returns the correct result:
GET /person/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "fuzzy": {
            "nameDetails.name.nameValue.surname": {
              "value": "Pibba",
              "fuzziness": "AUTO"
            }
          }
        },
        {
          "fuzzy": {
            "nameDetails.nameValue.firstName": {
              "value": "Fawsu",
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}
and the result is below:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 3.6012557,
    "hits" : [
      {
        "_index" : "person",
        "_type" : "_doc",
        "_id" : "70002",
        "_score" : 3.6012557,
        "_source" : {
          "gender" : "Male",
          "activeStatus" : "Inactive",
          "deceased" : "No",
          "nameDetails" : {
            "name" : [
              {
                "nameValue" : {
                  "firstName" : "Fawsu",
                  "middleName" : "L.",
                  "surname" : "Pibba"
                },
                "nameType" : "Primary Name"
              },
              {
                "nameValue" : {
                  "firstName" : "Fausu",
                  "middleName" : "L.",
                  "surname" : "Pibba"
                },
                "nameType" : "Spelling Variation"
              }
            ]
          }
        }
      }
    ]
  }
}
But when I add a filter for gender, it returns no results:
GET /person/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "fuzzy": {
            "nameDetails.name.nameValue.surname": {
              "value": "Pibba",
              "fuzziness": "AUTO"
            }
          }
        },
        {
          "fuzzy": {
            "nameDetails.nameValue.firstName": {
              "value": "Fawsu",
              "fuzziness": "AUTO"
            }
          }
        }
      ],
      "filter": [
        {
          "term": {
            "gender": "Male"
          }
        }
      ]
    }
  }
}
Even if I just use the filter, it returns no results:
GET /person/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "gender": "Male"
          }
        }
      ]
    }
  }
}
You are not getting any search results because you are using a term query (in the filter clause). A term query returns a document only if it contains an exact match.
When no analyzer is specified, the standard analyzer is used, which tokenizes Male to male. So you can either search for male instead of Male, or use one of the solutions below.
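You can verify this with the _analyze API, which shows how the standard analyzer tokenizes the value:
POST _analyze
{
  "analyzer": "standard",
  "text": "Male"
}
This returns a single token, male, which is the value actually stored in the inverted index and the one a term query has to match exactly.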
If you have not defined any explicit index mapping, you need to add .keyword to the gender field. This targets the unanalyzed keyword sub-field instead of the analyzed text field (notice the ".keyword" after the gender field). Try this query:
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "gender.keyword": "Male"
          }
        }
      ]
    }
  }
}
Search Result:
"hits": [
{
"_index": "66879128",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"gender": "Male",
"activeStatus": "Inactive",
"deceased": "No",
"nameDetails": {
"name": [
{
"nameValue": {
"firstName": "Fawsu",
"middleName": "L.",
"surname": "Pibba"
},
"nameType": "Primary Name"
},
{
"nameValue": {
"firstName": "Fausu",
"middleName": "L.",
"surname": "Pibba"
},
"nameType": "Spelling Variation"
}
]
}
}
}
]
If you have defined an index mapping, then modify the mapping for the gender field as shown below:
{
  "mappings": {
    "properties": {
      "gender": {
        "type": "keyword"
      }
    }
  }
}
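For reference, when Elasticsearch dynamically maps a string value it creates this kind of multi-field by default: an analyzed text field with an unanalyzed keyword sub-field. That default is why gender.keyword works even without an explicit mapping:
"gender": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}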

How to view trace logs from OpenTelemetry in Elastic APM

I receive logs from the opentelemetry-collector in Elastic APM.
The log structure is:
"{Timestamp:HH:mm:ss} {Level:u3} trace.id={TraceId} transaction.id={SpanId}{NewLine}{Message:lj}{NewLine}{Exception}"
example:
08:27:47 INF trace.id=898a7716358b25408d4f193f1cd17831 transaction.id=4f7590e4ba80b64b SOME MSG
I tried to use this pipeline:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description" : "parse multiple patterns",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{TIMESTAMP_ISO8601:logtime} %{LOGLEVEL:loglevel} \\[trace.id=%{TRACE_ID:trace.id}(?: transaction.id=%{SPAN_ID:transaction.id})?\\] %{GREEDYDATA:message}"],
          "pattern_definitions": {
            "TRACE_ID": "[0-9A-Fa-f]{32}",
            "SPAN_ID": "[0-9A-Fa-f]{16}"
          }
        },
        "date": { "field": "logtime", "target_field": "@timestamp", "formats": ["HH:mm:ss"] }
      }
    ]
  }
}
My goal is to see logs like this in Elastic APM:
{
  "@timestamp": "2021-01-05T10:10:10",
  "message": "Protocol Port MIs-Match",
  "trace": {
    "traceId": "898a7716358b25408d4f193f1cd17831",
    "spanId": "4f7590e4ba80b64b"
  }
}
Good job so far. Your pipeline is almost right; however, the grok pattern needs some fixing and you have some orphan curly braces. Here is a working example:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "parse multiple patterns",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            """%{TIME:logtime} %{WORD:loglevel} trace.id=%{TRACE_ID:trace.id}(?: transaction.id=%{SPAN_ID:transaction.id})? %{GREEDYDATA:message}"""
          ],
          "pattern_definitions": {
            "TRACE_ID": "[0-9A-Fa-f]{32}",
            "SPAN_ID": "[0-9A-Fa-f]{16}"
          }
        }
      },
      {
        "date": {
          "field": "logtime",
          "target_field": "@timestamp",
          "formats": [
            "HH:mm:ss"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "08:27:47 INF trace.id=898a7716358b25408d4f193f1cd17831 transaction.id=4f7590e4ba80b64b SOME MSG"
      }
    }
  ]
}
Response:
{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "trace" : {
            "id" : "898a7716358b25408d4f193f1cd17831"
          },
          "@timestamp" : "2021-01-01T08:27:47.000Z",
          "loglevel" : "INF",
          "message" : "SOME MSG",
          "logtime" : "08:27:47",
          "transaction" : {
            "id" : "4f7590e4ba80b64b"
          }
        },
        "_ingest" : {
          "timestamp" : "2021-03-30T11:07:52.067275598Z"
        }
      }
    }
  ]
}
Just note that the exact date is missing from the log line, so the @timestamp field resolves to January 1st of the current year.
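If ingestion time is an acceptable approximation of the event date, you could instead copy the pipeline's own ingest timestamp into @timestamp with a set processor. A minimal sketch:
{
  "set": {
    "field": "@timestamp",
    "value": "{{_ingest.timestamp}}" --> ingest metadata field; replaces the parsed time-of-day entirely
  }
}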

Nested query in Elasticsearch

I have a schema in Elasticsearch of this form:
{
  "index1" : {
    "mappings" : {
      "properties" : {
        "key1" : {
          "type" : "keyword"
        },
        "key2" : {
          "type" : "keyword"
        },
        "key3" : {
          "properties" : {
            "components" : {
              "type" : "nested",
              "properties" : {
                "sub1" : {
                  "type" : "keyword"
                },
                "sub2" : {
                  "type" : "keyword"
                },
                "sub3" : {
                  "type" : "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}
and the data stored in Elasticsearch would be of this format:
{
  "_index" : "index1",
  "_type" : "_doc",
  "_id" : "1",
  "_score" : 1.0,
  "_source" : {
    "key1" : "val1",
    "key2" : "val2",
    "key3" : {
      "components" : [
        {
          "sub1" : "subval11",
          "sub3" : "subval13"
        },
        {
          "sub1" : "subval21",
          "sub2" : "subval22",
          "sub3" : "subval23"
        },
        {
          "sub1" : "subval31",
          "sub2" : "subval32",
          "sub3" : "subval33"
        }
      ]
    }
  }
}
As you can see, sub1, sub2 and sub3 might not all be present in every object under key3.
Now I try to write a query to fetch results where key3.sub2 is subval22, using this query:
GET index1/_search
{
  "query": {
    "nested": {
      "path": "components",
      "query": {
        "bool": {
          "must": [
            {
              "match": { "key3.sub2": "subval22" }
            }
          ]
        }
      }
    }
  }
}
I always get this error:
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: {...}",
        "index_uuid": "1",
        "index": "index1"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "index1",
        "node": "1aK..",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {...}",
          "index_uuid": "1",
          "index": "index1",
          "caused_by": {
            "type": "illegal_state_exception",
            "reason": "[nested] failed to find nested object under path [components]"
          }
        }
      }
    ]
  },
  "status": 400
}
I understand that this error is thrown because sub2 is not present in all the objects under components. I am looking for a way to handle such cases, so that the query matches against all the objects in the array, and if a value matches, the doc is returned.
Can someone help me get this working?
You made a mistake while defining your schema; the schema below works fine. Note that I defined key3 itself as nested, so every object under it (including the components array) is indexed as a nested document, and I changed the nested path to key3.
Index definition:
{
  "mappings": {
    "properties": {
      "key1": {
        "type": "keyword"
      },
      "key2": {
        "type": "keyword"
      },
      "key3": {
        "type": "nested"
      }
    }
  }
}
Index your sample doc without any change:
{
  "key1": "val1",
  "key2": "val2",
  "key3": {
    "components": [ --> this was a diff
      {
        "sub1": "subval11",
        "sub3": "subval13"
      },
      {
        "sub1": "subval21",
        "sub2": "subval22",
        "sub3": "subval23"
      },
      {
        "sub1": "subval31",
        "sub2": "subval32",
        "sub3": "subval33"
      }
    ]
  }
}
Searching with your criteria:
{
  "query": {
    "nested": {
      "path": "key3", --> note this
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "key3.components.sub2": "subval22" --> note this
              }
            }
          ]
        }
      }
    }
  }
}
This returns the proper search result:
"hits": [
{
"_index": "so_nested_61200509",
"_type": "_doc",
"_id": "2",
"_score": 0.2876821,
"_source": {
"key1": "val1",
"key2": "val2",
"key3": {
"components": [ --> note this
{
"sub1": "subval11",
"sub3": "subval13"
},
{
"sub1": "subval21",
"sub2": "subval22",
"sub3": "subval23"
},
{
"sub1": "subval31",
"sub2": "subval32",
"sub3": "subval33"
}
]
Edit: based on the comment from the OP, I updated the sample doc, search query and result.

Elasticsearch parent-child mapping: search in both and highlight

I have the following Elasticsearch 1.6.2 index mappings: parent item and child document. One item can have several documents. The documents are not nested because they contain base64 data (mapper-attachments plugin) and cannot be updated together with an item.
"mappings" : {
"document" : {
"_parent" : {
"type" : "item"
},
"_routing" : {
"required" : true
},
"properties" : {
"extension" : {
"type" : "string",
"term_vector" : "with_positions_offsets",
"include_in_all" : true
}, ...
},
}
"item" : {
"properties" : {
"prop1" : {
"type" : "string",
"include_in_all" : true
}, ...
}
}
I would like to search in both types but always return items: if there is a match in a document, return the corresponding item; if there is a match in an item, return the item; if both match, return the item.
Is it possible to combine has_child and has_parent searches?
This search only searches in documents and returns items:
{
  "query": {
    "has_child": {
      "type": "document",
      "query": {
        "query_string": { "query": "her*" }
      },
      "inner_hits": {
        "highlight": {
          "fields": {
            "*": {}
          }
        }
      }
    }
  }
}
Example:
GET index/item/174
{
  "_type" : "item",
  "_id" : "174",
  "_source" : {
    "prop1" : "Perjeta construction"
  }
}
GET index/document/116
{
  "_type" : "document",
  "_id" : "116",
  "_source" : {
    "extension" : "pdf",
    "item" : { "id" : 174 },
    "fileName" : "construction plan"
  }
}
Possible search result when searching for "constr*":
{
  "hits": {
    "total": 1,
    "hits": [
      {
        "_type": "item",
        "_id": "174",
        "_source": {
          "prop1": "Perjeta construction"
        },
        "highlight": {
          "prop1": [
            "Perjeta <em>construction<\/em>"
          ]
        },
        "inner_hits": {
          "document": {
            "hits": {
              "hits": [
                {
                  "_type": "document",
                  "_id": "116",
                  "_source": {
                    "extension": "pdf",
                    "item": {
                      "id": 174
                    },
                    "fileName": "construction plan"
                  },
                  "highlight": {
                    "fileName": [
                      "<em>construction<\/em> plan"
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
I can answer my own question "Is it possible to combine has_child and has_parent?" with no.
You should only use one of them at a time on one index.
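That said, the desired behaviour (always return items, whether the match is in the item itself or in one of its documents) should be achievable with has_child alone, by combining a direct query on the item fields with a has_child query in one bool should clause. A sketch along those lines, using the mappings and example docs above:
GET index/item/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": { "query": "constr*" } --> matches item fields such as prop1
        },
        {
          "has_child": {
            "type": "document",
            "query": {
              "query_string": { "query": "constr*" } --> same query run against child documents
            },
            "inner_hits": {
              "highlight": {
                "fields": { "*": {} }
              }
            }
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": { "*": {} }
  }
}
An item then matches if either clause matches, and inner_hits carries the highlighted child documents, much like the possible search result sketched in the question.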
