Accessing metadata fields within an ingest pipeline's set processor in Elasticsearch

I have to write an ingest pipeline for Elasticsearch within a pipeline.yml file. I was able to retrieve my field with grok and divide it with the split processor. Now I want to assign each value of the resulting array from the split operation to its own field, but I'm not able to access the elements of the split array. The relevant code snippets look like this:
- grok:
    field: message
    patterns:
      - ^TRIGGER OCCURRED. %{GREEDYDATA:pac.log.deo.trigger.path}
    tag: TRIGGER
- split:
    if: ctx.pac.log.tags != null && ctx.pac.log.tags.contains('TRIGGER')
    field: '@metadata.pac.log.deo.trigger.path'
    separator: "/"
- set:
    if: ctx.pac.log.tags != null && ctx.pac.log.tags.contains('TRIGGER')
    field: pac.log.deo.trigger.provider
    value: '{{{@metadata.pac.log.deo.trigger.path[0]}}}'
A log line would look like:
TRIGGER OCCURRED: Timer/Period [seconds]/10 seconds
I would like to store the first value (index 0, assuming Elasticsearch array indexes start at 0 like in other OOP languages) in the field pac.log.deo.trigger.provider.
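For illustration, splitting the captured path on "/" should yield the following field content (a sketch; field name as in the pipeline above), so index 0 would hold "Timer":

"@metadata.pac.log.deo.trigger.path": ["Timer", "Period [seconds]", "10 seconds"]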
I tried various notations:
'{{{@metadata.pac.log.deo.trigger.path[0]}}}'
'{{@metadata.pac.log.deo.trigger.path[0]}}'
'@metadata.pac.log.deo.trigger.path[0]'
'{{{_source.metadata.pac.log.deo.trigger.path[0]}}}'
'{{{_ingest.metadata.pac.log.deo.trigger.path[0]}}}'
Since these are Elasticsearch ingest processors rather than Logstash filter plugins, the "ruby" filter is not available. List of available ingest processors:
"processors": [
{
"type": "append"
},
{
"type": "attachment"
},
{
"type": "bytes"
},
{
"type": "circle"
},
{
"type": "community_id"
},
{
"type": "convert"
},
{
"type": "csv"
},
{
"type": "date"
},
{
"type": "date_index_name"
},
{
"type": "dissect"
},
{
"type": "dot_expander"
},
{
"type": "drop"
},
{
"type": "enrich"
},
{
"type": "fail"
},
{
"type": "fingerprint"
},
{
"type": "foreach"
},
{
"type": "geoip"
},
{
"type": "grok"
},
{
"type": "gsub"
},
{
"type": "html_strip"
},
{
"type": "inference"
},
{
"type": "join"
},
{
"type": "json"
},
{
"type": "kv"
},
{
"type": "lowercase"
},
{
"type": "network_direction"
},
{
"type": "pipeline"
},
{
"type": "registered_domain"
},
{
"type": "remove"
},
{
"type": "rename"
},
{
"type": "script"
},
{
"type": "set"
},
{
"type": "set_security_user"
},
{
"type": "sort"
},
{
"type": "split"
},
{
"type": "trim"
},
{
"type": "uppercase"
},
{
"type": "uri_parts"
},
{
"type": "urldecode"
},
{
"type": "user_agent"
}

Found the solution: Mustache templates address array elements with dot-index notation (.0) instead of brackets:
- set:
    if: ctx.pac.log.tags != null && ctx.pac.log.tags.contains('TRIGGER')
    field: pac.log.deo.trigger.provider
    value: '{{{@metadata.pac.log.deo.trigger.path.0}}}'
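To verify such a pipeline without indexing anything, Elasticsearch's simulate API can be used (a sketch with a trimmed-down pipeline; the document mirrors the example log data from above):

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "split": {
          "field": "@metadata.pac.log.deo.trigger.path",
          "separator": "/"
        }
      },
      {
        "set": {
          "field": "pac.log.deo.trigger.provider",
          "value": "{{{@metadata.pac.log.deo.trigger.path.0}}}"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "@metadata": {
          "pac": {
            "log": {
              "deo": {
                "trigger": {
                  "path": "Timer/Period [seconds]/10 seconds"
                }
              }
            }
          }
        }
      }
    }
  ]
}

The response should show pac.log.deo.trigger.provider set to "Timer".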

Related

Why, after setting the mapping, does the index return nothing?

I am using Elasticsearch 7.12.0, Logstash 7.12.0, and Kibana 7.12.0 on Windows 10 x64. Logstash config file logistics.conf:
input {
  jdbc {
    jdbc_driver_library => "D:\\tools\\postgresql-42.2.16.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5433/ld"
    jdbc_user => "xxxx"
    jdbc_password => "sEcrET"
    schedule => "*/5 * * * *"
    statement => "select * from inventory_item_report();"
  }
}
filter {
  uuid {
    target => "uuid"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "localdist"
    document_id => "%{uuid}"
    doc_as_upsert => "true"
  }
}
Run logstash
logstash -f logistics.conf
If I don't set the mapping explicitly, the query
GET /localdist/_search
{
  "query": {
    "match_all": {}
  }
}
returns many results.
My mappings
POST localdist/_mapping
{
}
DELETE /localdist
PUT /localdist
{
}
POST /localdist
{
}
PUT localdist/_mapping
{
  "properties": {
    "unt_cost": { "type": "double" },
    "ii_typ": { "type": "keyword" },
    "qty_uom_id": { "type": "keyword" },
    "prod_id": { "type": "keyword" },
    "root_cat_id": { "type": "keyword" },
    "uom": { "type": "keyword" },
    "product_name": { "type": "text" },
    "ii_id": { "type": "keyword" },
    "wght_uom_id": { "type": "keyword" },
    "iid_seq_id": { "type": "long" },
    "avai_diff": { "type": "double" },
    "invt_change_typ": { "type": "keyword" },
    "ccy": { "type": "keyword" },
    "exp_date": { "type": "date" },
    "req_amt": { "type": "text" },
    "pur_cost": { "type": "double" },
    "tot_pri": { "type": "long" },
    "own_pid": { "type": "keyword" },
    "doc_type": { "type": "keyword" },
    "ii_date": { "type": "date" },
    "fac_id": { "type": "keyword" },
    "shipment_type_id": { "type": "keyword" },
    "lot_id": { "type": "keyword" },
    "phy_invt_id": { "type": "keyword" },
    "facility_name": { "type": "text" },
    "amt_ohand_diff": { "type": "double" },
    "reason_id": { "type": "keyword" },
    "cat_id": { "type": "keyword" },
    "qty_ohand_diff": { "type": "double" },
    "@timestamp": { "type": "date" }
  }
}
Run the query:
GET /localdist/_search
{
  "query": {
    "match_all": {}
  }
}
It returns nothing.
How do I fix this and make explicit mappings work correctly?
If I got you right, you are indexing via Logstash. Elasticsearch then creates the index if it is missing, indexes the documents, and tries to guess the mapping for your documents based on the very first documents.
TL;DR: You are DELETING your index containing the data by yourself.
With
DELETE /localdist
you are deleting the whole index including all data. After that, by issuing
PUT /localdist
{
}
you are re-creating the previously deleted index, which is empty again. And at the end, you are setting the index mapping with
PUT localdist/_mapping
{
  "properties": {
    "unt_cost": { "type": "double" },
    "ii_typ": { "type": "keyword" },
    ...
Now that you have an empty Elasticsearch index with a mapping set, start the Logstash pipeline again. If your documents match the index mapping, the docs should start to appear very quickly.
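A way to avoid this ordering pitfall altogether is to create the index together with its mapping in a single request before starting Logstash (a sketch showing only two of the fields from above):

PUT /localdist
{
  "mappings": {
    "properties": {
      "unt_cost": { "type": "double" },
      "ii_typ": { "type": "keyword" }
    }
  }
}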

How to exclude fields from being indexed in ElasticSearch?

I am trying to utilize ElasticSearch to store large sets of data. Most of the data will be searchable; however, there are some fields that will be there just so the data is stored and returned upon request.
Here is my mapping:
{
  "mappings": {
    "properties": {
      "amenities": { "type": "completion" },
      "summary": { "type": "text" },
      "street_number": { "type": "text" },
      "street_name": { "type": "text" },
      "street_suffix": { "type": "text" },
      "city": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        },
      "state_or_province": { "type": "text" },
      "postal_code": { "type": "text" },
      "mlsid": { "type": "text" },
      "source_id": { "type": "text" },
      "status": { "type": "keyword" },
      "type": { "type": "keyword" },
      "subtype": { "type": "keyword" },
      "year_built": { "type": "short" },
      "community": { "type": "keyword" },
      "elementary_school": { "type": "keyword" },
      "middle_school": { "type": "keyword" },
      "jr_high_school": { "type": "keyword" },
      "high_school": { "type": "keyword" },
      "area_size": { "type": "double" },
      "lot_size": { "type": "double" },
      "bathrooms": { "type": "double" },
      "bedrooms": { "type": "double" },
      "listed_at": { "type": "date" },
      "price": { "type": "double" },
      "sold_at": { "type": "date" },
      "sold_for": { "type": "double" },
      "total_photos": { "type": "short" },
      "formatted_addressLine": { "type": "text" },
      "formatted_address": { "type": "text" },
      "location": { "type": "geo_point" },
      "price_changes": { "type": "object" },
      "fields": { "type": "object" },
      "deleted_at": { "type": "date" },
      "is_available": { "type": "boolean" },
      "is_unable_to_find_coordinates": { "type": "boolean" },
      "source": { "type": "keyword" }
    }
  }
}
The fields and price_changes properties are there in case the user wants to read that info, but that info should not be searchable or indexed. fields holds a large list of key-value pairs, whereas price_changes holds multiple objects of the same type.
Currently, when I attempt to bulk create records, I get a "Limit of total fields [1000] has been exceeded" error. I am guessing this happens because every key-value pair in the fields collection is considered a field in Elasticsearch.
How can I store the fields and price_changes objects as non-searchable data, without indexing them or counting them toward the field count?
You could use the enabled property at field level to store the fields without indexing them.
Read here https://www.elastic.co/guide/en/elasticsearch/reference/current/enabled.html
"price_changes": {
"type": "object",
"enabled": false
}
NOTE: Are you able to create an index using the mapping you gave in the question? It gives me syntax errors (duplicate key) at the "type" field. I think you are missing a closing bracket for the "city" field.
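For completeness, a sketch of how both catch-all fields from the question could be declared (per the docs linked above, enabled: false can only be set on object fields and the mapping root; the contents are kept in _source but create no index structures and do not count toward the field limit):

"fields": {
  "type": "object",
  "enabled": false
},
"price_changes": {
  "type": "object",
  "enabled": false
}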

How to use a "where" clause with GraphQL in an OpenAPI-to-GraphQL server?

I'm using LoopBack 4 with oasgraph (renamed to OpenAPI-to-GraphQL).
One of my OpenAPI endpoints has a filter parameter with the following schema:
"parameters": [
{
"name": "filter",
"in": "query",
"style": "deepObject",
"explode": true,
"schema": {
"properties": {
"where": {
"type": "object"
},
"fields": {
"type": "object",
"properties": {
"id": {
"type": "boolean"
},
"idOwner": {
"type": "boolean"
},
"createdTimestamp": {
"type": "boolean"
},
"modifiedTimestamp": {
"type": "boolean"
},
"idUserCreated": {
"type": "boolean"
},
"idUserModified": {
"type": "boolean"
},
"value": {
"type": "boolean"
},
"dicContactId": {
"type": "boolean"
},
"counterpartyId": {
"type": "boolean"
}
}
},
"offset": {
"type": "integer",
"minimum": 0
},
"limit": {
"type": "integer",
"minimum": 0
},
"skip": {
"type": "integer",
"minimum": 0
},
"order": {
"type": "array",
"items": {
"type": "string"
}
},
"include": {
"type": "array",
"items": {
"type": "object",
"properties": {
"relation": {
"type": "string"
},
"scope": {
"properties": {
"where": {
"type": "object"
},
"fields": {
"type": "object",
"properties": {}
},
"offset": {
"type": "integer",
"minimum": 0
},
"limit": {
"type": "integer",
"minimum": 0
},
"skip": {
"type": "integer",
"minimum": 0
},
"order": {
"type": "array",
"items": {
"type": "string"
}
}
}
}
}
}
}
},
"type": "object"
}
}
],
As you can see, the where property is of type "object". However, the GraphQL editor expects a String:
[screenshot: graphql editor - expected type String]
The problem is that the string produces an error when I run a query:
[screenshot: graphql editor - where clause is not an object]
As a result, I'm not able to perform a query with a where clause.
You can use the npm qs module to stringify your where-clause object, because LoopBack uses qs under the hood to parse the query string.
import * as qs from 'qs';

// Example where condition; 'idOwner' is a field from the schema above (value is hypothetical)
const query = { filter: { where: { idOwner: 1 } } };

// addQueryPrefix prepends '?'; brackets are URL-encoded by default,
// producing '?filter%5Bwhere%5D%5BidOwner%5D=1', i.e. ?filter[where][idOwner]=1
const queryString = qs.stringify(query, { addQueryPrefix: true });
You can find more info about qs here.
LoopBack 4 query string issue discussion: https://github.com/strongloop/loopback-next/issues/2468

In MS Flow, how do I loop through an array and extract values from the array?

I have an HTTP REST result in this format:
{
  "type": "object",
  "properties": {
    "page": {
      "type": "object",
      "properties": {
        "total": { "type": "integer" }
      }
    },
    "list": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "id": { "type": "string" },
          "type": { "type": "string" },
          "status": { "type": "string" }
        },
        "required": ["id", "type", "status"]
      }
    }
  }
}
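For reference, a concrete response matching this schema might look like the following (all values are made up):

{
  "page": { "total": 2 },
  "list": [
    { "id": "1", "type": "order", "status": "open" },
    { "id": "2", "type": "order", "status": "closed" }
  ]
}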
I am trying to loop through each item in "list" and extract the id, type, and status. How do I do this in MS Flow? Here is what I got:
[screenshot of the Flow steps]
As you can see, the variables are not in the dynamic content picker. How do I get them to show up?

How to debug the Elasticsearch error number_format_exception with reason "empty String" but no field name

I just deployed a small application that loads a few thousand docs into an index, and when working with production data I get an error in my search request.
The HTTP code is 400 and the error is:
{
  "error": {
    "root_cause": [
      {
        "type": "number_format_exception",
        "reason": "empty String"
      }
    ],
    "type": "number_format_exception",
    "reason": "empty String"
  },
  "status": 400
}
Okay, I kind of get that my mapping defines some numeric field which I obviously don't store correctly, but how am I supposed to find that field?
Each doc contains hundreds of fields... I mean, really?
I tried looking in /var/log/elasticsearch but found nothing useful there.
Please help me, I need to get this thing going.
I defined some fields as integer which hold arrays and might be empty. Could that be a problem?
My ES version is 6.6.0.
Update:
The error occurs while searching; during indexing all is fine.
My mapping for that index:
{
  "development-object-1551202425": {
    "mappings": {
      "_doc": {
        "dynamic": "false",
        "properties": {
          "accommodation": {
            "properties": {
              "badges": {
                "properties": {
                  "maskedProp1": { "type": "boolean" },
                  "maskedProp2": { "type": "boolean" },
                  "maskedProp3": { "type": "boolean" },
                  "maskedProp4": { "type": "boolean" },
                  "maskedProp5": { "type": "boolean" },
                  "maskedProp6": { "type": "boolean" }
                }
              },
              "businessTypes": { "type": "integer" },
              "classification": {
                "properties": {
                  "classification": { "type": "keyword" },
                  "classificationValue": { "type": "short" }
                }
              },
              "endowments": { "type": "integer" },
              "hasPrice": { "type": "boolean" },
              "lowestPrice": { "type": "float" },
              "metascore": { "type": "short" },
              "rating": { "type": "short" },
              "regionscore": { "type": "short" }
            }
          },
          "certificates": { "type": "integer" },
          "geoLocation": { "type": "geo_point" },
          "id": { "type": "text" },
          "isAccommodation": { "type": "boolean" },
          "location": {
            "properties": {
              "maskedProp1": { "type": "integer" },
              "maskedProp2": { "type": "integer" },
              "id": { "type": "integer" },
              "name": { "type": "text", "fielddata": true },
              "zipcodes": { "type": "integer" }
            }
          },
          "maskedProp1": { "type": "integer" },
          "maskedProp2": { "type": "integer" },
          "description": { "type": "text" },
          "sortTitle": { "type": "keyword" },
          "title": { "type": "text" }
        }
      }
    }
  }
}
The index name consists of an environment string (development) with a timestamp appended; I work with automatic index switching and query via an alias called {env}-{name}-current.
In my case the error was an empty "size" parameter in the query; I tried to find the error in my filters and did not see that...
A more verbose error message (at least naming the property or setting where the error occurred) could save thousands of hours of debugging all around the world, I guess xD.
For now you would have to take your DSL apart section by section to find the issue.
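To illustrate the failure mode (a sketch; the query body is invented, only the empty size matters): a search whose size is an empty string instead of an integer produces exactly this number_format_exception with reason "empty String":

GET development-object-1551202425/_search
{
  "size": "",
  "query": { "match_all": {} }
}

Replacing it with an integer, e.g. "size": 10, makes the request parse again.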
