Is there a way to reference the field 'path.virtual' as part of this split processor? - elasticsearch

The field I am interested in from my ES doc below is "virtual":
"path" : {
"root" : "cda42f809526c222ebb54e5887117139",
"virtual" : "/tests/3.pdf",
"real" : "/tmp/es/tests/3.pdf"
}
My simulated ingest pipeline:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "split words on line_number field",
"processors": [
{
"split": {
"field": "path.virtual",
"separator": "/",
"target_field": "temporary_field"
}
},
{
"set": {
"field": "caseno",
"value": "{{temporary_field.1}}"
}
},
{
"set": {
"field": "file",
"value": "{{temporary_field.2}}"
}
},
{
"remove": {
"field": "temporary_field"
}
}
]
},
"docs": [
{
"_source": {
"path.virtual": "/test/3.pdf"
}
}
]
}
If I change the actual field 'path.virtual' to 'path' or 'virtual' I get the desired result, but if I use the actual field name I get the following error:
{
"docs" : [
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "field [[path] not present as part of path [[path.virtual]]"
}
],
"type" : "illegal_argument_exception",
"reason" : "field [[path] not present as part of path [[path.virtual]]"
}
}
]
}
What can I do to avoid this?

Try this in simulate. The split processor resolves "path.virtual" as an object path (a "path" object containing a "virtual" field), so the simulated _source needs the same nested structure as your real document:
"docs": [
{
"_source": {
"path": {
"virtual": "/test/3.pdf"
}
}
}
]
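With the nested _source the pipeline runs cleanly: splitting "/test/3.pdf" on "/" produces ["", "test", "3.pdf"], so temporary_field.1 is "test" and temporary_field.2 is "3.pdf". The simulate response should then look roughly like this (abridged sketch of the output _source):
{
  "docs": [
    {
      "doc": {
        "_source": {
          "path": {
            "virtual": "/test/3.pdf"
          },
          "caseno": "test",
          "file": "3.pdf"
        }
      }
    }
  ]
}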

Related

How to convert the particular item in the filebeat message to lowercase using elastic search processor

I am simulating the code below in Elasticsearch. How do I convert event.action from "Query" to lowercase "query", as expected in the output?
The simulation below is run in the Elastic Dev Tools console:
POST /_ingest/pipeline/_simulate
{
"pipeline" :
{
"description": "_description",
"processors": [
{
"dissect": {
"field": "message",
"pattern": "%{@timestamp}\t%{->} %{process.thread.id} %{event.action}\t%{message}"
}
},
{
"set": {
"field": "event.category",
"value": "database"
}
}
]
},
"docs": [
{
"_index": "index",
"_id": "id",
"_source": {
"message": "2020-10-22T20:28:26.267397Z\t 9 Query\tset session"
}
}
]
}
Expected output
{
"docs" : [
{
"doc" : {
"_index" : "index",
"_id" : "id",
"_source" : {
"process" : {
"thread" : {
"id" : "9"
}
},
"#timestamp" : "2020-10-22T20:28:26.267397Z",
"message" : "set session",
"event" : {
"category" : "database",
"action" : "query"
}
},
"_ingest" : {
"timestamp" : "2022-08-17T09:27:34.587465824Z"
}
}
}
]
}
You can use the lowercase processor in the same ingest pipeline, as shown below:
{
"pipeline": {
"description": "_description",
"processors": [
{
"dissect": {
"field": "message",
"pattern": "%{#timestamp}\t%{->} %{process.thread.id} %{event.action}\t%{message}"
}
},
{
"set": {
"field": "event.category",
"value": "database"
}
},
{
"lowercase": {
"field": "event.action"
}
}
]
},
"docs": [
{
"_index": "index",
"_id": "id",
"_source": {
"message": "2020-10-22T20:28:26.267397Z\t 9 Query\tset session"
}
}
]
}
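With the lowercase processor added, the relevant part of the simulate response should contain event.action as "query" (abridged; the _ingest timestamp will differ):
"event" : {
  "category" : "database",
  "action" : "query"
}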

Aggregation Filter Sort - InternalFilter cannot be cast to class InternalMultiBucketAggregation

I have a query with an aggregation and a filter added, and I'm trying to sort the results like this:
GET stats/_search
{
"size": 0,
"aggs": {
"group_by_names": {
"terms": {
"field": "playerid"
},
"aggs": {
"gender_filter": {
"filter": {
"term": {
"gender": "women"
}
},
"aggs": {
"sum_of_runs": {
"sum": {
"field": "runs"
}
},
"top_runs_by_player": {
"bucket_sort": {
"sort": [
"sum_of_runs"
]
}
}
}
}
}
}
}
}
I'm receiving this error:
{
"error" : {
"root_cause" : [ ],
"type" : "search_phase_execution_exception",
"reason" : "",
"phase" : "fetch",
"grouped" : true,
"failed_shards" : [ ],
"caused_by" : {
"type" : "class_cast_exception",
"reason" : "class org.elasticsearch.search.aggregations.bucket.filter.InternalFilter cannot be cast to class org.elasticsearch.search.aggregations.InternalMultiBucketAggregation (org.elasticsearch.search.aggregations.bucket.filter.InternalFilter and org.elasticsearch.search.aggregations.InternalMultiBucketAggregation are in unnamed module of loader 'app')"
}
},
"status" : 500
}
How do I resolve this issue?
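For context, the cast error happens because bucket_sort is a sibling pipeline aggregation that has to sit directly under a multi-bucket aggregation (the terms agg), not under the single-bucket filter agg. A possible rework, sketched on the assumption that the goal is to order the playerid buckets by the filtered sum, drops bucket_sort and orders the terms aggregation by the aggregation path gender_filter>sum_of_runs:
GET stats/_search
{
  "size": 0,
  "aggs": {
    "group_by_names": {
      "terms": {
        "field": "playerid",
        "order": { "gender_filter>sum_of_runs": "desc" }
      },
      "aggs": {
        "gender_filter": {
          "filter": {
            "term": { "gender": "women" }
          },
          "aggs": {
            "sum_of_runs": {
              "sum": { "field": "runs" }
            }
          }
        }
      }
    }
  }
}
If bucket_sort is still needed (for example for from/size pagination of buckets), it would have to be moved up to be a direct sub-aggregation of group_by_names, sorting on the same gender_filter>sum_of_runs path.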

ElasticSearch DSL Matching all elements of query in list of list of strings

I'm trying to query Elasticsearch to match every document that contains all of the requested values in a list of lists, but I can't seem to find the right query.
Mapping:
"id" : {
"type" : "keyword"
},
"mainlist" : {
"properties" : {
"format" : {
"type" : "keyword"
},
"tags" : {
"type" : "keyword"
}
}
},
...
Documents:
doc1 {
"id" : "abc",
"mainlist" : [
{
"type" : "big",
"tags" : [
"tag1",
"tag2"
]
},
{
"type" : "small",
"tags" : [
"tag1"
]
}
]
},
doc2 {
"id" : "abc",
"mainlist" : [
{
"type" : "big",
"tags" : [
"tag1"
]
},
{
"type" : "small",
"tags" : [
"tag2"
]
}
]
},
doc3 {
"id" : "abc",
"mainlist" : [
{
"type" : "big",
"tags" : [
"tag1"
]
}
]
}
The query I've tried that got me closest to the result is:
GET /index/_doc/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"mainlist.tags": "tag1"
}
},
{
"term": {
"mainlist.tags": "tag2"
}
}
]
}
}
}
This gets me doc1 and doc2 as results, while I only want doc1, since it contains tag1 and tag2 in a single list element rather than spread across both sublists.
How would I be able to achieve that?
Thanks for any help.
As mentioned by @caster, you need to use the nested data type and a nested query; otherwise Elasticsearch treats the array elements as a single flattened object and the relation between the fields of each element is lost, as explained in the official doc.
You need to change both the mapping and the query to achieve the desired output, as shown below.
Index mapping
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"mainlist" :{
"type" : "nested"
}
}
}
}
Sample index doc according to your example (no change there).
Query
{
"query": {
"nested": {
"path": "mainlist",
"query": {
"bool": {
"must": [
{
"term": {
"mainlist.tags": "tag1"
}
},
{
"match": {
"mainlist.tags": "tag2"
}
}
]
}
}
}
}
}
And the result:
"hits": [
{
"_index": "71519931_new",
"_id": "1",
"_score": 0.9139043,
"_source": {
"id": "abc",
"mainlist": [
{
"type": "big",
"tags": [
"tag1",
"tag2"
]
},
{
"type": "small",
"tags": [
"tag1"
]
}
]
}
}
]
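One practical note on the mapping change (standard Elasticsearch behaviour, not specific to this answer): the type of an existing field cannot be changed in place, so the nested mapping has to go into a new index and the data copied over, for example with the _reindex API (index names here are placeholders):
POST _reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index_with_nested_mainlist" }
}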
Use the nested field type; this will work for it:
https://www.elastic.co/guide/en/elasticsearch/reference/8.1/nested.html

How to view trace logs from OpenTelemetry in Elastic APM

I receive logs from the opentelemetry-collector in Elastic APM.
Log structure:
"{Timestamp:HH:mm:ss} {Level:u3} trace.id={TraceId} transaction.id={SpanId}{NewLine}{Message:lj}{NewLine}{Exception}"
example:
08:27:47 INF trace.id=898a7716358b25408d4f193f1cd17831 transaction.id=4f7590e4ba80b64b SOME MSG
I tried using this pipeline:
POST _ingest/pipeline/_simulate { "pipeline": { "description" : "parse multiple patterns", "processors": [
{
"grok": {
"field": "message",
"patterns": ["%{TIMESTAMP_ISO8601:logtime} %{LOGLEVEL:loglevel} \\[trace.id=%{TRACE_ID:trace.id}(?: transaction.id=%{SPAN_ID:transaction.id})?\\] %{GREEDYDATA:message}"],
"pattern_definitions": {
"TRACE_ID": "[0-9A-Fa-f]{32}",
"SPAN_ID": "[0-9A-Fa-f]{16}"
}
},
"date": { "field": "logtime", "target_field": "#timestamp", "formats": ["HH:mm:ss"] }
} ] } }
My goal is to see logs in Elastic APM like this:
{
"#timestamp": 2021-01-05T10:10:10",
"message": "Protocol Port MIs-Match",
"trace": {
"traceId": "898a7716358b25408d4f193f1cd17831",
"spanId": "4f7590e4ba80b64b"
}
}
Good job so far. Your pipeline is almost there; however, the grok pattern needs some fixing and you have some orphan curly braces. Here is a working example:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "parse multiple patterns",
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"""%{TIME:logtime} %{WORD:loglevel} trace.id=%{TRACE_ID:trace.id}(?: transaction.id=%{SPAN_ID:transaction.id})? %{GREEDYDATA:message}"""
],
"pattern_definitions": {
"TRACE_ID": "[0-9A-Fa-f]{32}",
"SPAN_ID": "[0-9A-Fa-f]{16}"
}
}
},
{
"date": {
"field": "logtime",
"target_field": "#timestamp",
"formats": [
"HH:mm:ss"
]
}
}
]
},
"docs": [
{
"_source": {
"message": "08:27:47 INF trace.id=898a7716358b25408d4f193f1cd17831 transaction.id=4f7590e4ba80b64b SOME MSG"
}
}
]
}
Response:
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_doc",
"_id" : "_id",
"_source" : {
"trace" : {
"id" : "898a7716358b25408d4f193f1cd17831"
},
"#timestamp" : "2021-01-01T08:27:47.000Z",
"loglevel" : "INF",
"message" : "SOME MSG",
"logtime" : "08:27:47",
"transaction" : {
"id" : "4f7590e4ba80b64b"
}
},
"_ingest" : {
"timestamp" : "2021-03-30T11:07:52.067275598Z"
}
}
}
]
}
Just note that the exact date is missing from the log line, so the @timestamp field resolves to January 1st of the current year.
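If the incoming log lines never carry a date, one option (a sketch, not part of the original answer) is to drop the date processor and take the ingest timestamp instead, using a set processor:
{
  "set": {
    "field": "@timestamp",
    "value": "{{_ingest.timestamp}}"
  }
}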

Nested Query in Elastic Search

I have a schema in elastic search of this form:
{
"index1" : {
"mappings" : {
"properties" : {
"key1" : {
"type" : "keyword"
},
"key2" : {
"type" : "keyword"
},
"key3" : {
"properties" : {
"components" : {
"type" : "nested",
"properties" : {
"sub1" : {
"type" : "keyword"
},
"sub2" : {
"type" : "keyword"
},
"sub3" : {
"type" : "keyword"
}
}
}
}
}
}
}
}
}
and the data stored in Elasticsearch would be of this format:
{
"_index" : "index1",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"key1" : "val1",
"key2" : "val2",
"key3" : {
components : [
{
"sub1" : "subval11",
"sub3" : "subval13"
},
{
"sub1" : "subval21",
"sub2" : "subval22",
"sub3" : "subval23"
},
{
"sub1" : "subval31",
"sub2" : "subval32",
"sub3" : "subval33"
}
]
}
}
}
As you can see that the sub1, sub2 and sub3 might not be present in few of the objects under key3.
Now if I try to write a query to fetch results where key3.sub2 is subval22, using this query:
GET index1/_search
{
"query": {
"nested": {
"path": "components",
"query": {
"bool": {
"must": [
{
"match": {"key3.sub2": "subval22"}
}
]
}
}
}
}
}
I always get this error:
{
"error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "failed to create query: {...}",
"index_uuid": "1",
"index": "index1"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "index1",
"node": "1aK..",
"reason": {
"type": "query_shard_exception",
"reason": "failed to create query: {...}",
"index_uuid": "1",
"index": "index1",
"caused_by": {
"type": "illegal_state_exception",
"reason": "[nested] failed to find nested object under path [components]"
}
}
}
]
},
"status": 400
}
I understand that this error is thrown because sub2 is not present in all of the objects under components. I am looking for a way to search such scenarios so that it matches against all of the objects in the array; if a value matches, the doc should be returned.
Can someone help me get this working?
You made a mistake while defining your schema. The schema below works fine; note that I just defined key3 as nested and changed the nested path to key3.
Index def
{
"mappings": {
"properties": {
"key1": {
"type": "keyword"
},
"key2": {
"type": "keyword"
},
"key3": {
"type": "nested"
}
}
}
}
Index your sample doc without any change:
{
"key1": "val1",
"key2": "val2",
"key3": {
"components": [ --> this was a diff
{
"sub1": "subval11",
"sub3": "subval13"
},
{
"sub1": "subval21",
"sub2": "subval22",
"sub3": "subval23"
},
{
"sub1": "subval31",
"sub2": "subval32",
"sub3": "subval33"
}
]
}
}
Searching with your criteria
{
"query": {
"nested": {
"path": "key3", --> note this
"query": {
"bool": {
"must": [
{
"match": {
"key3.components.sub2": "subval22" --> note this
}
}
]
}
}
}
}
}
This brings the proper search result
"hits": [
{
"_index": "so_nested_61200509",
"_type": "_doc",
"_id": "2",
"_score": 0.2876821,
"_source": {
"key1": "val1",
"key2": "val2",
"key3": {
"components": [ --> note this
{
"sub1": "subval11",
"sub3": "subval13"
},
{
"sub1": "subval21",
"sub2": "subval22",
"sub3": "subval23"
},
{
"sub1": "subval31",
"sub2": "subval32",
"sub3": "subval33"
}
]
}
}
}
]
Edit: Based on the comment from the OP, updated the sample doc, search query, and result.
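For what it's worth, the original mapping (nested on key3.components) can also be queried as-is; the error in the question comes from using components as the nested path and key3.sub2 as the field instead of the full paths. A sketch against that original mapping:
GET index1/_search
{
  "query": {
    "nested": {
      "path": "key3.components",
      "query": {
        "bool": {
          "must": [
            { "match": { "key3.components.sub2": "subval22" } }
          ]
        }
      }
    }
  }
}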
