How to take duplicate configuration out of filebeat.yml

I have a list of inputs in filebeat, for example
- path: /xxx/xx.log
  enabled: true
  type: log
  fields:
    topic: topic_1
- path: /xxx/ss.log
  enabled: true
  type: log
  fields:
    topic: topic_2
So, can I take the duplicated settings out as a reference variable? For example:
- path: /xxx/xx.log
  ${vars}
  fields:
    topic: topic_1
- path: /xxx/ss.log
  ${vars}
  fields:
    topic: topic_2

You can use YAML's anchor and merge-key syntax: your first input serves as the model, and the others can override its parameters.
- &default-log
  path: /xxx/xx.log
  enabled: true
  type: log
  fields:
    topic: topic_1
- <<: *default-log
  path: /xxx/ss.log
  fields:
    topic: topic_2
AFAIK there is no way to define an "abstract" default, meaning your &default-log should be one of your inputs (not just an abstract model).
(YAML syntax verified with YAMLlint)
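One caveat worth noting: a merge key only overrides keys at the top level of the anchored mapping, so nested mappings such as fields are replaced wholesale rather than deep-merged. Assuming the consuming parser resolves <<, the second input above expands to:

- path: /xxx/ss.log    # overridden
  enabled: true        # inherited from &default-log
  type: log            # inherited from &default-log
  fields:
    topic: topic_2     # replaces the inherited fields mapping entirely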

How to write a schema to constrain some of the properties with one/any of the sub-schemas?

Can I validate both
name: "range_1"
step: 1
start: 0
stop: 10
and
name: "range_2"
step: 1
center: 5
span: 5
with something like
properties:
  name:
    type: "string"
  stop:
    type: number
  oneOf:
    - start:
        type: number
      step:
        type: number
    - center:
        type: number
      span:
        type: number
For now I am using jsonschema in Python, but it complains: jsonschema.exceptions.SchemaError: <the array in oneOf> is not of type 'object', 'boolean'.
Validating against name and step only, or against all possible keys, apparently works, but both options seem sub-optimal to me.
You need to move the oneOf keyword out of the properties object, because everything inside properties is interpreted as an expected key in your data.
Additionally, it makes sense to add a required keyword to make the values mandatory. Finally, if you want to make sure that no other keys are accepted, you can use additionalProperties: false. Note, though, that you then have to repeat the "parent" properties inside the oneOf subschemas.
Putting it all together, you could use the following schema:
---
properties:
  name:
    type: string
  step:
    type: number
oneOf:
  - properties:
      name: true
      step: true
      start:
        type: number
      stop:
        type: number
    required:
      - start
      - stop
    additionalProperties: false
  - properties:
      name: true
      step: true
      center:
        type: number
      span:
        type: number
    required:
      - center
      - span
    additionalProperties: false
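As a quick sanity check (my own example, not from the question), a document that mixes the two branches should now be rejected, because each oneOf branch forbids the other branch's keys via additionalProperties: false:

# Fails validation: "start" matches only the first branch, "span" only
# the second, so neither branch validates and oneOf fails.
name: "range_3"
step: 1
start: 0
span: 5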

Redocly OpenAPI structure error. Property `openapi` is not expected here

I am trying to create API documentation using Redocly.
My openapi.yaml links to a YAML file that contains the API docs, called kpi-documentation.yaml.
The link/$ref in openapi.yaml:
/kpiDocumentation:
  $ref: paths/kpi-documentation.yaml
I get an error in my Visual Studio Code Redocly preview extension that says:
We found structural problems in your definition, please check the files below before running the preview.
File: /Users/xx/Desktop/Projects/api-docs/openapi/paths/kpi-documentation.yaml
Problem: Property `openapi` is not expected here.
File: /Users/xx/Desktop/Projects/api-docs/openapi/paths/kpi-documentation.yaml
Problem: Property `info` is not expected here.
File: /Users/xx/Desktop/Projects/api-docs/openapi/paths/kpi-documentation.yaml
Problem: Property `paths` is not expected here.
File: /Users/xx/Desktop/Projects/api-docs/openapi/paths/kpi-documentation.yaml
Problem: Property `components` is not expected here.
The part of kpi-documentation.yaml that appears to be throwing the error is:
openapi: "3.1"
info:
  title: KPI API
  version: '1.0'
  description: Documentation of API endpoints of KPI
servers:
  - url: https://api.redocly.com
paths:
I have checked the documentation on the Redocly website and my YAML structure looks fine to me.
Note also that kpi-documentation.yaml previews fine on its own, but not when I preview openapi.yaml, the root file, which is the one that needs to work.
https://redocly.com/docs/openapi-visual-reference/openapi/#OAS-3.0
Root file: openapi.yaml
openapi: 3.1.0
info:
  version: 1.0.0
  title: KPI API documentation
  termsOfService: 'https://example.com/terms/'
  contact:
    name: Brendan
    url: 'http://example.com/contact'
  license:
    name: Apache 2.0
    url: 'http://www.apache.org/licenses/LICENSE-2.0.html'
  x-logo:
    url: 'https://www.feedgy.solar/wp-content/uploads/2022/07/Sans-titre-1.png'
tags:
  - name: Insert Service 1
    description: Example echo operations.
  - name: Insert Service 2
    description: Operations about users.
  - name: Insert Service 3
    description: This is a tag description.
  - name: Insert Service 4
    description: This is a tag description.
servers:
  - url: 'https://{tenant}/api/v1'
    variables:
      tenant:
        default: www
        description: Your tenant id
  - url: 'https://example.com/api/v1'
paths:
  '/users/{username}':
    $ref: 'paths/users_{username}.yaml'
  /echo:
    $ref: paths/echo.yaml
  /pathItem:
    $ref: paths/path-item.yaml
  /pathItemWithExamples:
    $ref: paths/path-item-with-examples.yaml
  /kpiDocumentation:
    $ref: 'paths/kpi-documentation.yaml'
Path item file: kpi-documentation.yaml
openapi: "3.1"
info:
  title: KPI API
  version: '1.0'
  description: Documentation of API endpoints of KPI
servers:
  - url: https://api.redocly.com
paths:
  "/api/v1/corrected-performance-ratio/plants/{id}":
    get:
      summary: Same as the Performance Ratio, but the ratio is done using Corrected Reference Yield, so it considers thermal losses in the panels as normal. The WCPR represents the losses in the BoS (balance of system), so everything from the panel DC output to the AC output.
      operationId: corrected_performance_ratio_plants_retrieve
      parameters:
        - in: query
          name: date_end
          schema:
            type: string
            format: date
          required: true
        - in: query
          name: date_start
          schema:
            type: string
            format: date
          required: true
        - in: query
          name: frequency
          schema:
            enum:
              - H
              - D
              - M
              - Y
            type: string
            default: H
            minLength: 1
        - in: path
          name: id
          schema:
            type: integer
          description: A unique integer value identifying this plant.
          required: true
        - in: query
          name: threshold
          schema:
            type: integer
            default: 50
      tags:
        - corrected-performance-ratio
      security:
        - tokenAuth: []
        - cookieAuth: []
        - {}
      responses:
        "200":
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/KPIResponse"
          description: ""
  "/api/v1/corrected-performance-ratio/plants/{id}/inverters":

Can't set document_id for deduplicating docs in Filebeat

What are you trying to do?
I have location data for some sensors, and I want to run geo-spatial queries to find which sensors are in a specific area (query by polygon, bounding box, etc.). The location data (lat/lon) for these sensors may change in the future. I should be able to drop JSON files in ndjson format into the watched folder and overwrite the existing data with the new location data for each sensor.
I also have another filestream input for indexing the logs of these sensors.
I went through the docs for deduplication and the filestream input for ndjson and followed them exactly.
Show me your configs.
# ============================== Filebeat inputs ===============================
filebeat.inputs:
  - type: filestream
    id: "log"
    enabled: true
    paths:
      - D:\EFK\Data\Log\*.json
    parsers:
      - ndjson:
          keys_under_root: true
          add_error_key: true
    fields.doctype: "log"
  - type: filestream
    id: "loc"
    enabled: true
    paths:
      - D:\EFK\Data\Location\*.json
    parsers:
      - ndjson:
          keys_under_root: true
          add_error_key: true
          document_id: "Id" # Not working as expected.
    fields.doctype: "location"
    processors:
      - copy_fields:
          fields:
            - from: "Lat"
              to: "fields.location.lat"
          fail_on_error: false
          ignore_missing: true
      - copy_fields:
          fields:
            - from: "Long"
              to: "fields.location.lon"
          fail_on_error: false
          ignore_missing: true
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "sensor-%{[fields.doctype]}"
setup.ilm.enabled: false
setup.template:
  name: "sensor_template"
  pattern: "sensor-*"
# ------------------------------ Global Processors --------------------------
processors:
  - drop_fields:
      fields: ["agent", "ecs", "input", "log", "host"]
What does your input file look like?
{"Id":1,"Lat":19.000000,"Long":20.00000,"key1":"value1"}
{"Id":2,"Lat":19.000000,"Long":20.00000,"key1":"value1"}
{"Id":3,"Lat":19.000000,"Long":20.00000,"key1":"value1"}
It's the 'Id' field here that I want to use for deduplicating (overwriting with new) documents.
Update 10/05/22:
I have also tried working with:
json.document_id: "Id"
filebeat.inputs:
  - type: filestream
    id: "loc"
    enabled: true
    paths:
      - D:\EFK\Data\Location\*.json
    json.document_id: "Id"
ndjson.document_id: "Id"
filebeat.inputs:
  - type: filestream
    id: "loc"
    enabled: true
    paths:
      - D:\EFK\Data\Location\*.json
    ndjson.document_id: "Id"
Straight up document_id: "Id"
filebeat.inputs:
  - type: filestream
    id: "loc"
    enabled: true
    paths:
      - D:\EFK\Data\Location\*.json
    document_id: "Id"
Trying to overwrite _id using copy_fields
processors:
  - copy_fields:
      fields:
        - from: "Id"
          to: "#metadata_id"
      fail_on_error: false
      ignore_missing: true
Elasticsearch config has nothing special other than disabled security. And it's all running on localhost.
Version used for Elasticsearch, Kibana and Filebeat: 8.1.3
Please do comment if you need more info :)
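For what it's worth, the deduplication guide in the references below also describes generating the document ID with the fingerprint processor rather than a parser option; a minimal sketch for the location input (my own, untested against 8.1.3):

# Sketch: hash the "Id" field into @metadata._id so Elasticsearch
# overwrites the existing document instead of creating a duplicate.
processors:
  - fingerprint:
      fields: ["Id"]
      target_field: "@metadata._id"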
References:
Deduplication in Filebeat: https://www.elastic.co/guide/en/beats/filebeat/8.2/filebeat-deduplication.html#_how_can_i_avoid_duplicates
Filebeat ndjson input: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_ndjson
Copy_fields in Filebeat: https://www.elastic.co/guide/en/beats/filebeat/current/copy-fields.html#copy-fields

Different YAML inputs for CWL scatter

I have a command-line tool in CWL which can take the following input:
fastq:
  class: File
  path: /path/to/fastq_R1.fq.gz
fastq2:
  class: File
  path: /path/to/fastq_R2.fq.gz
sample_name: foo
Now I want to scatter over this command-line tool, and the only way I can think of to do it is with scatterMethod: dotproduct and an input of something like:
fastqs:
  - class: File
    path: /path/to/fastq_1_R1.fq.gz
  - class: File
    path: /path/to/fastq_2_R1.fq.gz
fastq2s:
  - class: File
    path: /path/to/fastq_1_R2.fq.gz
  - class: File
    path: /path/to/fastq_2_R2.fq.gz
sample_names: [foo, bar]
Is there any other way for me to design the workflow and/or input file such that each input group is kept together? Something like:
paired_end_fastqs:
  - fastq:
      class: File
      path: /path/to/fastq_1_R1.fq.gz
    fastq2:
      class: File
      path: /path/to/fastq_1_R2.fq.gz
    sample_name: foo
  - fastq:
      class: File
      path: /path/to/fastq_2_R1.fq.gz
    fastq2:
      class: File
      path: /path/to/fastq_2_R2.fq.gz
    sample_name: bar
You could accomplish this with a nested workflow wrapper that maps a record to each individual field, and then use that workflow to scatter over an array of records. The wrapper would look something like this:
---
class: Workflow
cwlVersion: v1.0
id: workflow
inputs:
  - id: paired_end_fastqs
    type:
      type: record
      name: paired_end_fastqs
      fields:
        - name: fastq
          type: File
        - name: fastq2
          type: File
        - name: sample_name
          type: string
outputs: []
steps:
  - id: tool
    in:
      - id: fastq
        source: paired_end_fastqs
        valueFrom: "$(self.fastq)"
      - id: fastq2
        source: paired_end_fastqs
        valueFrom: "$(self.fastq2)"
      - id: sample_name
        source: paired_end_fastqs
        valueFrom: "$(self.sample_name)"
    out: []
    run: "./tool.cwl"
requirements:
  - class: StepInputExpressionRequirement
Specify a workflow input of type record, with fields for each of the inputs the tool accepts: the ones you want to keep together while scattering. Connect the workflow input to each tool input as its source. Using valueFrom on the step inputs, transform the record (self is the source in this context) so that only the appropriate field is passed to the tool.
More about valueFrom in workflow steps: https://www.commonwl.org/v1.0/Workflow.html#WorkflowStepInput
Then use this wrapper in your actual workflow and scatter over paired_end_fastqs with an array of records.
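A sketch of what that outer workflow might look like, assuming the wrapper above is saved as wrapper.cwl (file and id names here are illustrative, not from the answer):

---
class: Workflow
cwlVersion: v1.0
id: outer_workflow
requirements:
  - class: ScatterFeatureRequirement
  - class: SubworkflowFeatureRequirement
inputs:
  - id: paired_end_fastqs_array
    type:
      type: array
      items:
        type: record
        name: paired_end_fastqs_record
        fields:
          - name: fastq
            type: File
          - name: fastq2
            type: File
          - name: sample_name
            type: string
outputs: []
steps:
  - id: scattered_tool
    run: "./wrapper.cwl"          # the wrapper workflow shown above
    scatter: paired_end_fastqs    # one wrapper invocation per record
    in:
      - id: paired_end_fastqs
        source: paired_end_fastqs_array
    out: []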

How can I store logs in two indices from two JSON-formatted log files using Filebeat with Elasticsearch output?

Below is my Filebeat configuration file, located at /etc/filebeat/filebeat.yml. It throws the following error:
Failed to publish events: temporary bulk send failure
filebeat.prospectors:
  - paths:
      - /var/log/nginx/virus123.log
    input_type: log
    fields:
      type: virus123
    json.keys_under_root: true
  - paths:
      - /var/log/nginx/virus1234.log
    input_type: log
    fields:
      type: virus1234
    json.keys_under_root: true
setup.template.name: "filebeat-%{[beat.version]}"
setup.template.pattern: "filebeat-%{[beat.version]}-*"
setup.template.overwrite: true
processors:
  - drop_fields:
      fields: ["beat","source"]
output.elasticsearch:
  index: index: "filebeat-%{[beat.version]}-%{[fields.type]:other}-%{+yyyy.MM.dd}"
  hosts: ["http://127.0.0.1:9200"]
I think I found your problem, although I'm not sure it is the only one:
index: index: "filebeat-%{[beat.version]}-%{[fields.type]:other}-%{+yyyy.MM.dd}"
should be:
index: "filebeat-%{[beat.version]}-%{[fields.type]:other}-%{+yyyy.MM.dd}"
I have seen a similar problem where a malformed index setting caused the same error you are seeing.
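As a side note, if the goal is simply to route the two inputs to two indices, output.elasticsearch also supports conditional routing via the indices setting, which avoids the format-string fallback entirely; a sketch reusing the field values from the config above (untested on your Filebeat version):

output.elasticsearch:
  hosts: ["http://127.0.0.1:9200"]
  indices:
    # Route each event by its fields.type value.
    - index: "filebeat-%{[beat.version]}-virus123-%{+yyyy.MM.dd}"
      when.equals:
        fields.type: "virus123"
    - index: "filebeat-%{[beat.version]}-virus1234-%{+yyyy.MM.dd}"
      when.equals:
        fields.type: "virus1234"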
