Different YAML inputs for CWL scatter

I have a command-line tool in CWL which can take the following input:
fastq:
  class: File
  path: /path/to/fastq_R1.fq.gz
fastq2:
  class: File
  path: /path/to/fastq_R2.fq.gz
sample_name: foo
Now I want to scatter over this command-line tool, and the only way I can think of doing it is with scatterMethod: dotproduct and an input of something like:
fastqs:
  - class: File
    path: /path/to/fastq_1_R1.fq.gz
  - class: File
    path: /path/to/fastq_2_R1.fq.gz
fastq2s:
  - class: File
    path: /path/to/fastq_1_R2.fq.gz
  - class: File
    path: /path/to/fastq_2_R2.fq.gz
sample_names: [foo, bar]
Is there any other way for me to design the workflow and/or input file such that each input group is kept together? Something like:
paired_end_fastqs:
  - fastq:
      class: File
      path: /path/to/fastq_1_R1.fq.gz
    fastq2:
      class: File
      path: /path/to/fastq_1_R2.fq.gz
    sample_name: foo
  - fastq:
      class: File
      path: /path/to/fastq_2_R1.fq.gz
    fastq2:
      class: File
      path: /path/to/fastq_2_R2.fq.gz
    sample_name: bar

You could accomplish this with a nested workflow wrapper that maps each field of a record to the corresponding tool input, and then scatter that wrapper over an array of records. The workflow would look something like this:
---
class: Workflow
cwlVersion: v1.0
id: workflow
inputs:
  - id: paired_end_fastqs
    type:
      type: record
      name: paired_end_fastqs
      fields:
        - name: fastq
          type: File
        - name: fastq2
          type: File
        - name: sample_name
          type: string
outputs: []
steps:
  - id: tool
    in:
      - id: fastq
        source: paired_end_fastqs
        valueFrom: "$(self.fastq)"
      - id: fastq2
        source: paired_end_fastqs
        valueFrom: "$(self.fastq2)"
      - id: sample_name
        source: paired_end_fastqs
        valueFrom: "$(self.sample_name)"
    out: []
    run: "./tool.cwl"
requirements:
  - class: StepInputExpressionRequirement
Specify a workflow input of type record, with a field for each of the inputs the tool accepts, i.e. the ones you want to keep together while scattering. Connect the workflow input to each tool input as its source. Using valueFrom on the step inputs, transform the record (self is the value of source in this context) so that only the appropriate field is passed to the tool.
More about valueFrom in workflow steps: https://www.commonwl.org/v1.0/Workflow.html#WorkflowStepInput
Then use this wrapper in your actual workflow and scatter over paired_end_fastqs with an array of records.
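The calling workflow could then look roughly like this. This is only a sketch: it assumes the wrapper above is saved as wrapper.cwl next to it and that no outputs need collecting, and note that ScatterFeatureRequirement and SubworkflowFeatureRequirement are needed to scatter over a nested workflow:

class: Workflow
cwlVersion: v1.0
id: outer
requirements:
  - class: ScatterFeatureRequirement
  - class: SubworkflowFeatureRequirement
inputs:
  - id: paired_end_fastqs
    type:
      type: array
      items:
        type: record
        name: paired_end_fastq    # hypothetical schema name
        fields:
          - name: fastq
            type: File
          - name: fastq2
            type: File
          - name: sample_name
            type: string
outputs: []
steps:
  - id: scattered_tool
    run: "./wrapper.cwl"          # the wrapper workflow shown above
    scatter: paired_end_fastqs
    in:
      - id: paired_end_fastqs
        source: paired_end_fastqs
    out: []

The job file can then be exactly the paired_end_fastqs array of records from the question.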

Related

Redocly OpenAPI structure error. Property `openapi` is not expected here

I am trying to create API documentation using Redocly.
My openapi.yaml links to a YAML file that contains the API docs, called kpi-documentation.yaml.
The link/$ref in openapi.yaml:
/kpiDocumentation:
  $ref: paths/kpi-documentation.yaml
I get an error in my Visual Studio Code Redocly preview extension that says:
We found structural problems in your definition, please check the files below before running the preview.
File: /Users/xx/Desktop/Projects/api-docs/openapi/paths/kpi-documentation.yaml
Problem: Property `openapi` is not expected here.
File: /Users/xx/Desktop/Projects/api-docs/openapi/paths/kpi-documentation.yaml
Problem: Property `info` is not expected here.
File: /Users/xx/Desktop/Projects/api-docs/openapi/paths/kpi-documentation.yaml
Problem: Property `paths` is not expected here.
File: /Users/xx/Desktop/Projects/api-docs/openapi/paths/kpi-documentation.yaml
Problem: Property `components` is not expected here.
The part of kpi-documentation.yaml that appears to be throwing the error is:
openapi: "3.1"
info:
title: KPI API
version: '1.0'
description: Documentation of API endpoints of KPI
servers:
- url: https://api.redocly.com
paths:
I have checked the documentation on the Redocly website and it looks like my YAML structure is fine.
Also note that kpi-documentation.yaml previews fine by itself, but not when I preview openapi.yaml, which is the root file that needs to work.
https://redocly.com/docs/openapi-visual-reference/openapi/#OAS-3.0
Root file, openapi.yaml:
openapi: 3.1.0
info:
  version: 1.0.0
  title: KPI API documentation
  termsOfService: 'https://example.com/terms/'
  contact:
    name: Brendan
    url: 'http://example.com/contact'
  license:
    name: Apache 2.0
    url: 'http://www.apache.org/licenses/LICENSE-2.0.html'
  x-logo:
    url: 'https://www.feedgy.solar/wp-content/uploads/2022/07/Sans-titre-1.png'
tags:
  - name: Insert Service 1
    description: Example echo operations.
  - name: Insert Service 2
    description: Operations about users.
  - name: Insert Service 3
    description: This is a tag description.
  - name: Insert Service 4
    description: This is a tag description.
servers:
  - url: 'https://{tenant}/api/v1'
    variables:
      tenant:
        default: www
        description: Your tenant id
  - url: 'https://example.com/api/v1'
paths:
  '/users/{username}':
    $ref: 'paths/users_{username}.yaml'
  /echo:
    $ref: paths/echo.yaml
  /pathItem:
    $ref: paths/path-item.yaml
  /pathItemWithExamples:
    $ref: paths/path-item-with-examples.yaml
  /kpiDocumentation:
    $ref: 'paths/kpi-documentation.yaml'
Path item file, kpi-documentation.yaml:
openapi: "3.1"
info:
title: KPI API
version: '1.0'
description: Documentation of API endpoints of KPI
servers:
- url: https://api.redocly.com
paths:
"/api/v1/corrected-performance-ratio/plants/{id}":
get:
summary: Same as the Performance Ratio, but the ratio is done using Corrected Reference Yield, so it considers thermal losses in the panels as normal. The WCPR represents the losses in the BoS (balance of system), so everything from the panel DC output to the AC output.
operationId: corrected_performance_ratio_plants_retrieve
parameters:
- in: query
name: date_end
schema:
type: string
format: date
required: true
- in: query
name: date_start
schema:
type: string
format: date
required: true
- in: query
name: frequency
schema:
enum:
- H
- D
- M
- Y
type: string
default: H
minLength: 1
- in: path
name: id
schema:
type: integer
description: A unique integer value identifying this plant.
required: true
- in: query
name: threshold
schema:
type: integer
default: 50
tags:
- corrected-performance-ratio
security:
- tokenAuth: []
- cookieAuth: []
- {}
responses:
"200":
content:
application/json:
schema:
$ref: "#/components/schemas/KPIResponse"
description: ""
"/api/v1/corrected-performance-ratio/plants/{id}/inverters":

Syntax troubles with dbt, Incorrect type. Expected "Seed configs"

I am learning dbt via the jaffle_shop project. I'm trying to set up the raw data in a raw database and a jaffle_shop schema, in order to call the relevant sources later on.
I am having a bit of trouble with my seeds config syntax in my dbt_project.yml. What am I doing wrong?
seeds:
  - schema: jaffle_shop
  - database: raw
  - name: customers
    config:
      enabled: true
      column_types:
        id: integer
        first_name: varchar(50)
        last_name: varchar(50)
    columns:
      - name: id
        tests:
          - not_null
          - unique
      - name: first_name
      - name: last_name
  - name: orders
    config:
      enabled: true
      column_types:
        id: integer
        user_id: integer
        order_date: date
        status: varchar(100)
    columns:
      - name: id
        tests:
          - not_null
          - unique
      - name: user_id
      - name: order_date
      - name: status
  - name: payments
    config:
      enabled: true
      column_types:
        id: integer
        order_id: integer
        payment_method: varchar(100)
        amount: integer
    columns:
      - name: id
        tests:
          - not_null
          - unique
      - name: user_id
      - name: order_date
      - name: status
I am working in VSCode with the dbt Power User extension; perhaps it doesn't recognize the underlined config parameters?
I tried writing it this way:
seeds:
  +schema: jaffle_shop
which gives no errors until I add +database.
I searched the docs, but I don't see the discrepancy with what I wrote...
You're mixing together two different ways to provide configuration to dbt. See the docs for configuring seeds, and the more general page on the difference between configs and properties.
You can provide general config that applies to all seeds (or a subset of seeds in a specific directory) in your dbt_project.yml file. These configs use the + syntax and only include a subset of all configuration options. This works because it's a project-level config:
seeds:
  +schema: jaffle_shop
As of v1.3, these are all of the possible seed configs:
seeds:
  <resource-path>:
    +quote_columns: true | false
    +column_types: {column_name: datatype}
    +enabled: true | false
    +tags: <string> | [<string>]
    +pre-hook: <sql-statement> | [<sql-statement>]
    +post-hook: <sql-statement> | [<sql-statement>]
    +database: <string>
    +schema: <string>
    +alias: <string>
    +persist_docs: <dict>
    +full_refresh: <boolean>
    +meta: {<dictionary>}
    +grants: {<dictionary>}
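So the schema and database the question is after can go in dbt_project.yml with the + syntax, scoped under the project name as the resource path. A minimal sketch, assuming the project is named jaffle_shop as in the tutorial:

# dbt_project.yml
seeds:
  jaffle_shop:            # resource path: the `name:` of your project
    +database: raw
    +schema: jaffle_shop

Anything more granular, such as per-column types and tests for a single seed, belongs in a property file instead, as described next.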
You can provide properties or config for an individual seed using what is now called a "property" file (formerly a schema.yml file). These property files can be named anything, as long as they are .yml files. It's usually a good idea to put a single resource's properties in a single file, put that file in the same directory as the resource definition, and name it my_resource.yml. Or you can group them together, with something like seeds.yml. It's these property files that use the syntax that you are trying to place in your dbt_project.yml file:
version: 2

seeds:
  - name: customers
    config:
      enabled: true
      column_types:
        id: integer
        first_name: varchar(50)
        last_name: varchar(50)
    columns:
      - name: id
        tests:
          - not_null
          - unique
      - name: first_name
      - name: last_name

YAML: Can local variables be mixed with group variables AND have their naming simplified?

This is my first time working with YAML, and I am running into an issue: it seems that if I want to include a variable group (e.g., for a signing certificate password) alongside local pipeline-related variables, then I cannot use the simplified syntax where a variable's name and value are both defined and set on the same line.
For example, what I want is something similar to this (I made sure the spacing is correct in the YAML):
variables:
  solutionName: Foo.sln
  projectName: Bar
  buildPlatform: x64
  buildConfiguration: development
  major: '1'
  minor: '0'
  build: '0'
  revision: $[counter('rev', 0)]
  vhdxSize: '200'
  - group: legacy-pipeline
  signingCertPwd: $[variables.SigningCertificatePassword]
But this results in a parsing error. As a result, I have to use the denser, more bloated-looking format:
variables:
  - name: solutionName
    value: Foo.sln
  - name: projectName
    value: Bar
  - name: buildPlatform
    value: x64
  - name: buildConfiguration
    value: development
  - name: major
    value: '1'
  - name: minor
    value: '0'
  - name: build
    value: '0'
  - name: revision
    value: $[counter('rev', 0)]
  - name: vhdxSize
    value: '200'
  - group: legacy-pipeline
  - name: signingCertPwd
    value: $[variables.SigningCertificatePassword]
It seems like the simplified format is only available if I use local variables alone; as soon as I add a variable group, the simplified format goes away. I have tried searching the web for a solution, but I am not able to find anything useful. Is what I am trying to achieve possible? If yes, how can it be done?
Unfortunately mixing the styles is not possible, but you can work around that using templates:
# pipeline.yaml
stages:
  - stage: declare_vars
    variables:
      - template: templates/vars.yaml
      - group: my-group
      - template: templates/inline-vars.yaml
        parameters:
          vars:
            inline_var: yes!
            and_more: why not
    jobs:
      - job:
        steps:
          - pwsh: |
              echo 'foo=$(foo)'
              echo 'bar=$(bar)'
              echo 'var1=$(var1)'
              echo 'inline_var=$(inline_var)'
# templates/vars.yaml
variables:
  foo: bar
  bar: something else
# templates/inline-vars.yaml
parameters:
  - name: vars
    type: object
    default: {}

variables:
  ${{ each var in parameters.vars }}:
    ${{ var.key }}: ${{ var.value }}
templates/vars.yaml simply moves the variables to another file.
templates/inline-vars.yaml lets you define inline variables using the denser syntax together with referenced groups, but there is the additional ceremony of writing template:, parameters:, and vars:.
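To make the mechanism concrete: with the parameters passed from pipeline.yaml above, the each-expression in inline-vars.yaml should expand at template-compile time to the equivalent of:

variables:
  inline_var: yes!
  and_more: why not

So the dense mapping style survives; it has just been moved behind a template boundary.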

How to take duplicate configurations out in filebeat.yaml

I have a list of inputs in filebeat, for example
- path: /xxx/xx.log
  enabled: true
  type: log
  fields:
    topic: topic_1
- path: /xxx/ss.log
  enabled: true
  type: log
  fields:
    topic: topic_2
Can I take the duplicate configs out as a reference variable? For example:
- path: /xxx/xx.log
  ${vars}
  fields:
    topic: topic_1
- path: /xxx/ss.log
  ${vars}
  fields:
    topic: topic_2
You can use YAML inheritance, i.e. an anchor plus a merge key: your first input serves as the model, and the others override its parameters.
- &default-log
path: /xxx/xx.log
enabled: true
type: log
fields:
topic: topic_1
- <<: *default-log
path: /xxx/ss.log
fields:
topic: topic_2
AFAIK there is no way to define an "abstract" default, meaning your &default-log should be one of your inputs (not just an abstract model).
(YAML syntax verified with YAMLlint)
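For clarity, once the merge key is resolved, the second input is equivalent to the following. Note that the overriding fields mapping replaces the anchored one wholesale; merge keys are shallow, not deep merges:

- path: /xxx/ss.log
  enabled: true
  type: log
  fields:
    topic: topic_2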

Use globbed string in YAML

I am looking for a way to dynamically set the key using the path of the file below.
For example, if I have this YAML:
prospectors.config:
  - fields:
      queue_name: <somehow get the globbed string below in here>
    paths:
      - /var/log/casino/*.log
    type: log
output.redis:
  hosts:
    - "producer:6379"
  key: "%{[fields.queue_name]}"
Then, if I had a file called /var/log/casino/test.log, key would become test.
I'm not sure that what you want is possible.
You could use the source field and configure your Redis output using that as the key:
output.redis:
  hosts:
    - "producer:6379"
  key: "%{source}"
This has the disadvantage that the key is the absolute path of the source file, not the basename your question asks for.
If you have a small number of possible basename patterns and want a queue for each, you can get closer. For example, say you have the files:
/common/path/test-1.log
/common/path/foo-0.log
/common/path/01-bar.log
/common/path/test-3.log
...
and want three queues in Redis, test, foo, and bar, you could use the source field and the conditionals available in the keys configuration of the Redis output, something like this:
output.redis:
  hosts:
    - "producer:6379"
  key: "default_key"
  keys:
    - key: "test_key"
      when.contains:
        source: "test"
    - key: "foo_key"
      when.contains:
        source: "foo"
    - key: "bar_key"
      when.contains:
        source: "bar"
