Syntax troubles with dbt: Incorrect type. Expected "Seed configs"

I am learning dbt via the jaffle_shop project. I'm trying to set up the raw data in a raw database and a jaffle_shop schema, in order to call the relevant sources later on.
I am having a bit of trouble with my seeds config syntax within my dbt_project.yml. What am I doing wrong?
seeds:
  - schema: jaffle_shop
  - database: raw
  - name: customers
    config:
      enabled: true
      column_types:
        id: integer
        first_name: varchar(50)
        last_name: varchar(50)
    columns:
      - name: id
        tests:
          - not_null
          - unique
      - name: first_name
      - name: last_name
  - name: orders
    config:
      enabled: true
      column_types:
        id: integer
        user_id: integer
        order_date: date
        status: varchar(100)
    columns:
      - name: id
        tests:
          - not_null
          - unique
      - name: user_id
      - name: order_date
      - name: status
  - name: payments
    config:
      enabled: true
      column_types:
        id: integer
        order_id: integer
        payment_method: varchar(100)
        amount: integer
    columns:
      - name: id
        tests:
          - not_null
          - unique
      - name: user_id
      - name: order_date
      - name: status
I am working in VSCode with the dbt Power User extension; perhaps it doesn't recognize the underlined config parameters?
I tried writing it this way:
seeds:
  +schema: jaffle_shop
which gives no errors until I add +database...
I searched the docs, but I don't see the discrepancy with what I wrote...

You're mixing together two different ways to provide configuration to dbt. See the docs for configuring seeds, and the more general page on the difference between configs and properties.
You can provide general config that applies to all seeds (or a subset of seeds in a specific directory) in your dbt_project.yml file. These configs use the + syntax and only include a subset of all configuration options. This works because it's a project-level config:
seeds:
  +schema: jaffle_shop
As of v1.3, these are all of the possible seed configs:
seeds:
  <resource-path>:
    +quote_columns: true | false
    +column_types: {column_name: datatype}
    +enabled: true | false
    +tags: <string> | [<string>]
    +pre-hook: <sql-statement> | [<sql-statement>]
    +post-hook: <sql-statement> | [<sql-statement>]
    +database: <string>
    +schema: <string>
    +alias: <string>
    +persist_docs: <dict>
    +full_refresh: <boolean>
    +meta: {<dictionary>}
    +grants: {<dictionary>}
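For the original question, that means the raw database and jaffle_shop schema can be set for all seeds at the project level. A minimal sketch (assuming the project is named jaffle_shop in dbt_project.yml; the project-name key is optional but conventional):
seeds:
  jaffle_shop:
    +database: raw
    +schema: jaffle_shop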
You can provide properties or config for an individual seed using what is now called a "property" file (formerly a schema.yml file). These property files can be named anything, as long as they are .yml files. It's usually a good idea to put a single resource's properties in a single file, put that file in the same directory as the resource definition, and name it my_resource.yml. Or you can group them together, with something like seeds.yml. It's these property files that use the syntax that you are trying to place in your dbt_project.yml file:
version: 2
seeds:
  - name: customers
    config:
      enabled: true
      column_types:
        id: integer
        first_name: varchar(50)
        last_name: varchar(50)
    columns:
      - name: id
        tests:
          - not_null
          - unique
      - name: first_name
      - name: last_name
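With the property file saved anywhere under your seeds/ directory (any name works, e.g. seeds/seeds.yml), running dbt seed and then dbt test will load the seed and exercise the not_null and unique tests defined above.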

Related

Redocly OpenAPI structure error. Property `openapi` is not expected here

I am trying to create API documentation using Redocly.
My openapi.yaml links to a YAML file containing the API docs, called kpi-documentation.yaml.
The link/$ref in openapi.yaml:
/kpiDocumentation:
  $ref: paths/kpi-documentation.yaml
I get an error in my Visual Studio Code Redocly preview extension that says:
We found structural problems in your definition, please check the files below before running the preview.
File: /Users/xx/Desktop/Projects/api-docs/openapi/paths/kpi-documentation.yaml
Problem: Property `openapi` is not expected here.
File: /Users/xx/Desktop/Projects/api-docs/openapi/paths/kpi-documentation.yaml
Problem: Property `info` is not expected here.
File: /Users/xx/Desktop/Projects/api-docs/openapi/paths/kpi-documentation.yaml
Problem: Property `paths` is not expected here.
File: /Users/xx/Desktop/Projects/api-docs/openapi/paths/kpi-documentation.yaml
Problem: Property `components` is not expected here.
The part of kpi-documentation.yaml that appears to be throwing the error is:
openapi: "3.1"
info:
  title: KPI API
  version: '1.0'
  description: Documentation of API endpoints of KPI
servers:
  - url: https://api.redocly.com
paths:
I have checked the documentation on the Redocly website, and it looks like my YAML structure is fine.
Also of note: kpi-documentation.yaml previews fine on its own, but not when I preview openapi.yaml, which is the root file that needs to work.
https://redocly.com/docs/openapi-visual-reference/openapi/#OAS-3.0
Root file (openapi.yaml):
openapi: 3.1.0
info:
  version: 1.0.0
  title: KPI API documentation
  termsOfService: 'https://example.com/terms/'
  contact:
    name: Brendan
    url: 'http://example.com/contact'
  license:
    name: Apache 2.0
    url: 'http://www.apache.org/licenses/LICENSE-2.0.html'
  x-logo:
    url: 'https://www.feedgy.solar/wp-content/uploads/2022/07/Sans-titre-1.png'
tags:
  - name: Insert Service 1
    description: Example echo operations.
  - name: Insert Service 2
    description: Operations about users.
  - name: Insert Service 3
    description: This is a tag description.
  - name: Insert Service 4
    description: This is a tag description.
servers:
  - url: 'https://{tenant}/api/v1'
    variables:
      tenant:
        default: www
        description: Your tenant id
  - url: 'https://example.com/api/v1'
paths:
  '/users/{username}':
    $ref: 'paths/users_{username}.yaml'
  /echo:
    $ref: paths/echo.yaml
  /pathItem:
    $ref: paths/path-item.yaml
  /pathItemWithExamples:
    $ref: paths/path-item-with-examples.yaml
  /kpiDocumentation:
    $ref: 'paths/kpi-documentation.yaml'
Path item file (kpi-documentation.yaml):
openapi: "3.1"
info:
  title: KPI API
  version: '1.0'
  description: Documentation of API endpoints of KPI
servers:
  - url: https://api.redocly.com
paths:
  "/api/v1/corrected-performance-ratio/plants/{id}":
    get:
      summary: Same as the Performance Ratio, but the ratio is done using Corrected Reference Yield, so it considers thermal losses in the panels as normal. The WCPR represents the losses in the BoS (balance of system), so everything from the panel DC output to the AC output.
      operationId: corrected_performance_ratio_plants_retrieve
      parameters:
        - in: query
          name: date_end
          schema:
            type: string
            format: date
          required: true
        - in: query
          name: date_start
          schema:
            type: string
            format: date
          required: true
        - in: query
          name: frequency
          schema:
            enum:
              - H
              - D
              - M
              - Y
            type: string
            default: H
            minLength: 1
        - in: path
          name: id
          schema:
            type: integer
          description: A unique integer value identifying this plant.
          required: true
        - in: query
          name: threshold
          schema:
            type: integer
            default: 50
      tags:
        - corrected-performance-ratio
      security:
        - tokenAuth: []
        - cookieAuth: []
        - {}
      responses:
        "200":
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/KPIResponse"
          description: ""
  "/api/v1/corrected-performance-ratio/plants/{id}/inverters":

Ansible Fact - Parsing Ansible Fact Variable to Dictionary

I'm using the Ansible os_project_facts module to gather the admin project ID of OpenStack.
This is the ansible_facts log:
ansible_facts:
  openstack_projects:
    - description: Bootstrap project for initializing the cloud.
      domain_id: default
      enabled: true
      id: <PROJECT_ID>
      is_domain: false
      is_enabled: true
      location:
        cloud: envvars
        project:
          domain_id: default
          domain_name: null
          id: default
          name: null
        region_name: null
        zone: null
      name: admin
      options: {}
      parent_id: default
      properties:
        options: {}
        tags: []
      tags: []
Apparently this is not a dictionary, so I can't get openstack_projects.id. How can I retrieve PROJECT_ID and use it in other tasks?
Since the openstack_projects fact contains a single list element holding a dictionary, you can use array indexing to get the id, i.e. openstack_projects[0]['id'].
You can use it directly, or use something like set_fact:
- name: get the project id
  set_fact:
    project_id: "{{ openstack_projects[0]['id'] }}"
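The new fact can then be referenced in any subsequent task; a minimal sketch, using the built-in debug module purely for illustration:
- name: use the project id in a later task
  ansible.builtin.debug:
    msg: "The admin project id is {{ project_id }}"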

YAML: Can local variables be mixed with group variables AND have their naming simplified?

This is my first time working with YAML, and I am running into an issue: if I want to include a variable group (e.g., a signing certificate password) along with local pipeline-related variables, it seems I cannot use the simplified naming convention where the variable's name and value are both defined and set on the same line.
For example, what I want is something similar to this (I made sure spacing is correct in the YAML):
variables:
  solutionName: Foo.sln
  projectName: Bar
  buildPlatform: x64
  buildConfiguration: development
  major: '1'
  minor: '0'
  build: '0'
  revision: $[counter('rev', 0)]
  vhdxSize: '200'
  - group: legacy-pipeline
  signingCertPwd: $[variables.SigningCertificatePassword]
But this results in a parsing error. As a result, I have to use a denser, more bloated-looking format:
variables:
  - name: solutionName
    value: Foo.sln
  - name: projectName
    value: Bar
  - name: buildPlatform
    value: x64
  - name: buildConfiguration
    value: development
  - name: major
    value: '1'
  - name: minor
    value: '0'
  - name: build
    value: '0'
  - name: revision
    value: $[counter('rev', 0)]
  - name: vhdxSize
    value: '200'
  - group: legacy-pipeline
  - name: signingCertPwd
    value: $[variables.SigningCertificatePassword]
It seems like the simplified naming format is only available if I use it for local variables; if I add a variable group, the simplified format goes away. I have tried searching the web for a solution, but I am not able to find anything useful. Is what I am trying to achieve possible? If yes, how can it be done?
Unfortunately mixing the styles is not possible, but you can work around that using templates:
# pipeline.yaml
stages:
  - stage: declare_vars
    variables:
      - template: templates/vars.yaml
      - group: my-group
      - template: templates/inline-vars.yaml
        parameters:
          vars:
            inline_var: yes!
            and_more: why not
    jobs:
      - job:
        steps:
          - pwsh: |
              echo 'foo=$(foo)'
              echo 'bar=$(bar)'
              echo 'var1=$(var1)'
              echo 'inline_var=$(inline_var)'
# templates/vars.yaml
variables:
  foo: bar
  bar: something else
# templates/inline-vars.yaml
parameters:
  - name: vars
    type: object
    default: {}
variables:
  ${{ each var in parameters.vars }}:
    ${{ var.key }}: ${{ var.value }}
templates/vars.yaml simply moves the variables to another file.
templates/inline-vars.yaml lets you define inline variables using the denser syntax together with referencing groups, but there's the additional ceremony of writing template:, parameters:, and vars:.
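For clarity, once the templates are expanded, the stage's variables resolve to the equivalent of the longhand list. A conceptual sketch of the expansion, not literal Azure DevOps output:
variables:
  - name: foo
    value: bar
  - name: bar
    value: something else
  - group: my-group
  - name: inline_var
    value: yes!
  - name: and_more
    value: why not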

How to take duplicate configurations out in filebeat.yaml

I have a list of inputs in Filebeat, for example:
- path: /xxx/xx.log
  enabled: true
  type: log
  fields:
    topic: topic_1
- path: /xxx/ss.log
  enabled: true
  type: log
  fields:
    topic: topic_2
So can I take the duplicate configs out as a reference variable? For example:
- path: /xxx/xx.log
  ${vars}
  fields:
    topic: topic_1
- path: /xxx/ss.log
  ${vars}
  fields:
    topic: topic_2
You can use YAML's anchors and merge keys, a form of inheritance: your first input is used as a model, and the others can override its parameters.
- &default-log
  path: /xxx/xx.log
  enabled: true
  type: log
  fields:
    topic: topic_1
- <<: *default-log
  path: /xxx/ss.log
  fields:
    topic: topic_2
AFAIK there is no way to define an "abstract" default, meaning your &default-log should be one of your inputs (not just an abstract model).
(YAML syntax verified with YAMLlint)
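For clarity, after the anchor and merge key are resolved, the second input is equivalent to the following. Note the merge is shallow: the explicit fields key replaces the anchored fields mapping wholesale rather than being deep-merged into it.
- path: /xxx/ss.log
  enabled: true
  type: log
  fields:
    topic: topic_2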

Different YAML inputs for CWL scatter

I have a command-line tool in CWL which can take the following input:
fastq:
  class: File
  path: /path/to/fastq_R1.fq.gz
fastq2:
  class: File
  path: /path/to/fastq_R2.fq.gz
sample_name: foo
Now I want to scatter over this command-line tool, and the only way I can think to do it is with scatterMethod: dotproduct and an input of something like:
fastqs:
  - class: File
    path: /path/to/fastq_1_R1.fq.gz
  - class: File
    path: /path/to/fastq_2_R1.fq.gz
fastq2s:
  - class: File
    path: /path/to/fastq_1_R2.fq.gz
  - class: File
    path: /path/to/fastq_2_R2.fq.gz
sample_names: [foo, bar]
Is there any other way for me to design the workflow and/or input file such that each input group is sectioned together? Something like:
paired_end_fastqs:
  - fastq:
      class: File
      path: /path/to/fastq_1_R1.fq.gz
    fastq2:
      class: File
      path: /path/to/fastq_1_R2.fq.gz
    sample_name: foo
  - fastq:
      class: File
      path: /path/to/fastq_2_R1.fq.gz
    fastq2:
      class: File
      path: /path/to/fastq_2_R2.fq.gz
    sample_name: bar
You could accomplish this with a nested workflow wrapper that maps a record to each individual field, and then using that workflow to scatter over an array of records. The workflow would look something like this:
---
class: Workflow
cwlVersion: v1.0
id: workflow
inputs:
  - id: paired_end_fastqs
    type:
      type: record
      name: paired_end_fastqs
      fields:
        - name: fastq
          type: File
        - name: fastq2
          type: File
        - name: sample_name
          type: string
outputs: []
steps:
  - id: tool
    in:
      - id: fastq
        source: paired_end_fastqs
        valueFrom: "$(self.fastq)"
      - id: fastq2
        source: paired_end_fastqs
        valueFrom: "$(self.fastq2)"
      - id: sample_name
        source: paired_end_fastqs
        valueFrom: "$(self.sample_name)"
    out: []
    run: "./tool.cwl"
requirements:
  - class: StepInputExpressionRequirement
Specify a workflow input of type record, with fields for each of the inputs the tool accepts, i.e. the ones you want to keep together while scattering. Connect the workflow input to each tool input as its source. Using valueFrom on the step inputs, transform the record (self is the source in this context) to pass only the appropriate field to the tool.
More about valueFrom in workflow steps: https://www.commonwl.org/v1.0/Workflow.html#WorkflowStepInput
Then use this wrapper in your actual workflow and scatter over paired_end_fastqs with an array of records.
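A minimal sketch of what that outer workflow's scatter step might look like (file and ID names here are assumptions, not from the original answer):
class: Workflow
cwlVersion: v1.0
id: outer_workflow
requirements:
  - class: ScatterFeatureRequirement
  - class: SubworkflowFeatureRequirement
inputs:
  - id: paired_end_fastqs_array
    type:
      type: array
      items:
        type: record
        name: paired_end_fastqs
        fields:
          - name: fastq
            type: File
          - name: fastq2
            type: File
          - name: sample_name
            type: string
outputs: []
steps:
  - id: wrapped_tool
    run: ./wrapper.cwl
    scatter: paired_end_fastqs
    in:
      - id: paired_end_fastqs
        source: paired_end_fastqs_array
    out: []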
