How can we generate multiple output file in benthos? - etl

Input Data:
{ "name": "Coffee", "price": "$1.00" }
{ "name": "Tea", "price": "$2.00" }
{ "name": "Coke", "price": "$3.00" }
{ "name": "Water", "price": "$4.00" }
extension.yaml
input:
label: ""
file:
paths: [./input/*]
codec: lines
max_buffer: 1000000
delete_on_finish: false
pipeline:
processors:
- bloblang: |
root.name = this.name.uppercase()
root.price = this.price
output:
file:
path: "result/file1.log"
codec: lines
I want to create an output file based on the input and defined processor for our need.

extension.yaml
input:
label: ""
file:
paths: [./input/*]
codec: lines
max_buffer: 1000000
delete_on_finish: false
pipeline:
processors:
- bloblang: |
root.name = this.name.uppercase()
root.price = this.price
output:
broker:
pattern: fan_out
outputs:
- file:
path: "result/file1.log"
codec: lines
- file:
path: "result/file2.log"
codec: lines
processors:
- bloblang: |
root.name = this.name.lowercase()
It will generate two output files "file1.log" & file2.log".
file1.log
{"name":"COFFEE","price":"$1.00"}
{"name":"TEA","price":"$2.00"}
{"name":"COKE","price":"$3.00"}
{"name":"WATER","price":"$4.00"}
file2.log
{"name":"coffee"}
{"name":"tea"}
{"name":"coke"}
{"name":"water"}
Here we can use something called 'Broker' in Benthos as shown in the above example.
It allows you to route messages to multiple child outputs using a range of brokering patterns.
You can refer to this link for more information: Broker

Related

ARM Incremental deployment mode

Consider I have an Azure Function App with some app settings and some functions created via the following YAML task:
- task: AzureResourceManagerTemplateDeployment#3
displayName: 'deploy resources'
inputs:
azureResourceManagerConnection: azureResourceManagerConnection
subscriptionId:subscriptionId
resourceGroupName: rg
location:location
csmFile: template.bicep
csmParametersFile:parameters.json
deploymentMode: 'Incremental'
Let’s say I modified app settings.
If I run the above YAML task in Incremental deployment mode, does that recreate the Azure Function App with the modified app settings and functions or does it just update the app settings - Function app and existing functions stay the same?
Here is how my template.bicep looks like:
var appSettings = [
{
name: 'WEBSITE_ADD_SITENAME_BINDINGS_IN_APPHOST_CONFIG'
value: 1
}
{
name: 'WEBSITE_ENABLE_SYNC_UPDATE_SITE'
value: 1
}
{
name: 'SCM_TOUCH_WEBCONFIG_AFTER_DEPLOYMENT'
value: 0
}
{
name: 'APPINSIGHTS_INSTRUMENTATIONKEY'
value: reference(resourceId(appInsightsResourceGroup, 'microsoft.insights/components/', appInsightsName), '2015-05-01').InstrumentationKey
}
{
name: 'AzureWebJobsStorage'
value: 'DefaultEndpointsProtocol=https;AccountName=${storageAccountName};AccountKey=${listKeys(resourceId(storageAccountResourceGroup, 'Microsoft.Storage/storageAccounts', storageAccountName), providers('Microsoft.Storage', 'storageAccounts').apiVersions[0]).keys[0].value}'
}
{
name: 'WEBSITE_CONTENTAZUREFILECONNECTIONSTRING'
value: 'DefaultEndpointsProtocol=https;AccountName=${storageAccountName};AccountKey=${listKeys(resourceId(storageAccountResourceGroup, 'Microsoft.Storage/storageAccounts', storageAccountName), providers('Microsoft.Storage', 'storageAccounts').apiVersions[0]).keys[0].value}'
}
{
name: 'FUNCTIONS_WORKER_RUNTIME'
value: 'dotnet'
}
{
name: 'FUNCTIONS_EXTENSION_VERSION'
value: '~4'
}
{
name: 'WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT'
value: maximumElasticWorkerCount
}
{
name: 'appConfigurationEndpoint'
value: appConfigurationEndpoint
}
{
name: 'membershipStorageAccountName'
value: '#Microsoft.KeyVault(SecretUri=${reference(membershipStorageAccountName, '2019-09-01').secretUriWithVersion})'
}
{
name: 'membershipContainerName'
value: '#Microsoft.KeyVault(SecretUri=${reference(membershipContainerName, '2019-09-01').secretUriWithVersion})'
}
]
var stagingSettings = [
{
name: 'WEBSITE_CONTENTSHARE'
value: toLower('functionApp-staging')
}
{
name: 'AzureFunctionsJobHost__extensions__durableTask__hubName'
value: '${solutionAbbreviation}compute${environmentAbbreviationStaging}'
}
{
name: 'AzureWebJobs.StarterFunction.Disabled'
value: 1
}
]
var productionSettings = [
{
name: 'WEBSITE_CONTENTSHARE'
value: toLower('functionApp')
}
{
name: 'AzureFunctionsJobHost__extensions__durableTask__hubName'
value: '${solutionAbbreviation}compute${environmentAbbreviation}'
}
{
name: 'AzureWebJobs.StarterFunction.Disabled'
value: 0
}
]
module functionAppTemplate 'functionApp.bicep' = {
name: 'functionAppTemplater'
params: {
name: '${functionAppName}'
kind: functionAppKind
location: location
servicePlanName: servicePlanName
dataKeyVaultName: dataKeyVaultName
dataKeyVaultResourceGroup: dataKeyVaultResourceGroup
secretSettings: union(appSettings, productionSettings)
}
dependsOn: [
servicePlanTemplate
]
}
module functionAppSlotTemplate 'functionAppSlot.bicep' = {
name: 'functionAppSlotTemplate'
params: {
name: '${functionAppName}/staging'
kind: functionAppKind
location: location
servicePlanName: servicePlanName
dataKeyVaultName: dataKeyVaultName
dataKeyVaultResourceGroup: dataKeyVaultResourceGroup
secretSettings: union(appSettings, stagingSettings)
}
dependsOn: [
functionAppTemplate
]
}
functionapp.bicep
resource functionApp 'Microsoft.Web/sites#2018-02-01' = {
name: name
location: location
kind: kind
properties: {
serverFarmId: resourceId('Microsoft.Web/serverfarms', servicePlanName)
clientAffinityEnabled: false
httpsOnly: true
siteConfig: {
use32BitWorkerProcess : false
appSettings: secretSettings
}
}
identity: {
type: 'SystemAssigned'
}
}
functionAppSlot.bicep
resource functionAppSlot 'Microsoft.Web/sites/slots#2018-11-01' = {
name: name
kind: kind
location: location
properties: {
clientAffinityEnabled: true
enabled: true
httpsOnly: true
serverFarmId: resourceId('Microsoft.Web/serverfarms', servicePlanName)
siteConfig: {
use32BitWorkerProcess : false
appSettings: secretSettings
}
}
identity: {
type: 'SystemAssigned'
}
}
Your Function App will not be recreated. Only the state or values which does not match the desired state are changed.
This means, if you change a setting e.g. FUNCTIONS_EXTENSION_VERSION in the bicep file, it will be changed in the Function App settings.
https://learn.microsoft.com/en-us/azure/azure-resource-manager/templates/overview
Repeatable results: Repeatedly deploy your infrastructure throughout the development lifecycle and have confidence your resources are deployed in a consistent manner. Templates are idempotent, which means you can deploy the same template many times and get the same resource types in the same state. You can develop one template that represents the desired state, rather than developing lots of separate templates to represent updates.

Is it possible to push an image to AWS ECR using Google CloudBuild?

The following snippet (Node/Typescript) utilizes Google's CloudBuild API (v1) to build a container and push to Google's Container Registry (GCR). If it is possible, what's the right way to have CloudBuild push the image to AWS ECR instead of GCR?
import { cloudbuild_v1 } from "googleapis";
[...]
const manifestLocation = `gs://${manifestFile.bucket}/${manifestFile.fullpath}`;
const buildDestination = `gcr.io/${GOOGLE_PROJECT_ID}/xxx:yyy`;
const result = await builds.create({
projectId: GOOGLE_PROJECT_ID,
requestBody: {
steps: [
{
name: 'gcr.io/cloud-builders/gcs-fetcher',
args: [
'--type=Manifest',
`--location=${manifestLocation}`
]
},
{
name: 'docker',
args: ['build', '-t', buildDestination, '.'],
}
],
images: [buildDestination]
}
})```
Yes you can by setting a custom step where you do that.
for this you can have a step with the docker image that makes the build and pushes it to AWS ECR.
steps:
- name: 'gcr.io/cloud-builders/docker'
args: [ 'build', '-t', '<AWS_ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/<IMAGE_NAME>', '.' ]
Here is a guide on how to use cludbuild which can be useful to you.
BAsically on your usecase you can just change the value of destination to the AWS ECR URL like this:
import { cloudbuild_v1 } from "googleapis";
[...]
const manifestLocation = `gs://${manifestFile.bucket}/${manifestFile.fullpath}`;
const buildDestination = `<AWS_ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/<IMAGE_NAME>`;
const result = await builds.create({
projectId: GOOGLE_PROJECT_ID,
requestBody: {
steps: [
{
name: 'gcr.io/cloud-builders/gcs-fetcher',
args: [
'--type=Manifest',
`--location=${manifestLocation}`
]
},
{
name: 'docker',
args: ['build', '-t', buildDestination, '.'],
}
],
images: [buildDestination]
}
})```

TYPO3: CKEditor removes some html tags (e.g. strong, h4)

When I add html content to CKEditor (source-code mode) and then save the html content, some tags are removed - for example <strong> or <h4>.
I am using the default YAML Konfiguration and add my own one:
# EXT:my_ext/Configuration/RTE/Default.yaml
imports:
# Import default RTE config (for example)
- { resource: "EXT:rte_ckeditor/Configuration/RTE/Processing.yaml" }
- { resource: "EXT:rte_ckeditor/Configuration/RTE/Editor/Base.yaml" }
- { resource: "EXT:rte_ckeditor/Configuration/RTE/Full.yaml" }
# Import the image plugin configuration
- { resource: "EXT:rte_ckeditor_image/Configuration/RTE/Plugin.yaml" }
editor:
config:
# RTE default config removes image plugin - restore it:
removePlugins: null
removeButtons:
- Anchor
extraAllowedContent: 'a[onclick]'
toolbarGroups:
- { name: basicstyles, groups: [ basicstyles, align, cleanup ] }
- { name: styles }
stylesSet:
- { name: "Rote Schrift", element: "span", attributes: { class: "highlighted red"} }
- { name: "Button", element: "a", attributes: { class: "btn"} }
- { name: "Checkliste", element: "ul", attributes: { class: "check-list"} }
toolbarGroups:
- { name: links, groups: ['MenuLink', 'Unlink', 'Anchor'] }
externalPlugins:
typo3image: { allowedExtensions: "jpg,jpeg,png,gif,svg" }
typo3link: { resource: "EXT:rte_ckeditor/Resources/Public/JavaScript/Plugins/typo3link.js", route: "rteckeditor_wizard_browse_links" }
processing:
HTMLparser_db:
denyTags: null
Furthermore I have the following TS page config (not sure if this is used by TYPO3 - it's a setup of the old RTE editor):
RTE.default.enableWordClean.HTMLparser {
allowTags = a,b,blockquote,br,div,em,h2,h3,h4,h5,h6,hr,i,img,li,ol,p,span,strike,strong, ...
I finally found the solution, by comparing my Custom.yaml with typo3\sysext\rte_ckeditor\Configuration\RTE\Full.yaml
To allow more tags I had to add the new tags in the following sections of my Yaml file:
editor:
config:
allowTags:
- link
- strong
- h4
processing:
allowTags:
- link
- strong
- h4

how to speed up docpad rendering?

I am changing only a couple of pages but docpad seems to render everything again. I'm not using any fancy plugins or dynamic components - just the basic ghost template. Are there some techniques to make less pages render?
maybe its something to do with the timestamp format in the docpad.coffee ?
moment = require('moment')
docpadConfig = {
templateData:
vars:
appserver: 'http://xxx'
site:
title: 'Pocket Tutor'
tagline: 'English language chat tutor'
description: 'Learn english by chatting'
logo: '/uploads/images/corpid/comiceng/96/logo-96.png'
url: 'http://app:9005'
cover: '/img/cover.jpg'
navigation: [
{
name: 'Home',
href: '/',
section: 'home'
},
{
name: 'About',
href: '/about.html',
section: 'about'
},
{
name: 'Lessons',
href: '/tags/lessons.html',
section: 'tag-lessons'
},
{
name: 'Grammar',
href: '/tags/grammar.html',
section: 'tag-grammar'
}
{
name: 'Teachers',
href: '/tags/tech.html',
section: 'tag-tech'
},
]
author:
name: 'Rikai Labs'
img: ''
url: 'http://rikai.co'
website: 'http://RIKAI.co'
location: 'space',
bio: 'we build chat apps'
getPreparedTitle: -> if #document.title then "#{#document.title} | #{#site.title}" else #site.title
getDescription: -> if #document.description then "#{#document.description} | #{#site.description}" else #site.description
bodyClass: -> if #document.isPost then "post-template" else "home-template"
masthead: (d) ->
d = d || #document
if d.cover then d.cover else #site.cover
isCurrent: (l) ->
if #document.section is l.section then ' nav-current'
else if #document.url is l.href then ' nav-current'
else ''
excerpt: (p,w) ->
w = w || 26
if p.excerpt then p.excerpt else p.content.replace(/<%.+%>/gi, '').split(' ').slice(0, w).join(' ')
encode: (s) -> encodeURIComponent(s)
slug: (s) -> return s.toLowerCase().replace(' ', '-')
currentYear: -> new Date().getFullYear()
time: (ts, format) ->
format = format || 'MMMM DO, YYYY'
ts = new Date(ts) || new Date()
moment(ts).format(format)
collections:
posts: ->
#getCollection("html").findAllLive({active:true, isPost: true, isPagedAuto: {$ne: true}}, {postDate: -1}).on "add", (model) ->
model.setMetaDefaults({layout:"post"})
plugins:
tags:
extension: '.html'
injectDocumentHelper: (doc) ->
doc.setMeta { layout: 'tag' }
rss:
default:
collection: 'posts'
url: '/rss.xml'
marked:
gfm: true
environments: # default
development: # default
# Always refresh from server
maxAge: false # default
# Only do these if we are running standalone via the `docpad` executable
checkVersion: process.argv.length >= 2 and /docpad$/.test(process.argv[1]) # default
welcome: process.argv.length >= 2 and /docpad$/.test(process.argv[1]) # default
prompts: process.argv.length >= 2 and /docpad$/.test(process.argv[1]) # default
# Listen to port 9005 on the development environment
port: 9005 # example
production:
port: 9005
maxAge: false # default
}
module.exports = docpadConfig
update: stubbing out the date and time methods
time: -> 'time'
currentYear: -> 'year'
gives a little speed up but still making one edit to one file gives info:
Generated 40/150 files in 7.364 seconds
update2: added
standalone: true
to some pages to test, but still takes
info: Generated 40/150 files in 7.252 seconds
so even a single standalone page triggers a bunch of other stuff.
It is possible to handle the Docpad regeneration process manually. To do this you need to turn off Docpad's watch. That is, run Docpad with the docpad server command. What will happen here is that it doesn't matter how many times you edit a document it will not be loaded into the docpad collection. You will then have to load any updates manually. That is, have some code to load the document.
model = #docpad.getCollection('posts').findOne({someValue: someValue})
model.load()
Following that trigger the regeneration.
#docpad.action 'generate', reset: false, (err) ->
if err
#docpad.log "warn", "GENERATE ERROR"
This is what I do in my posteditor plugin
I suspect this is not really what you are asking for but it does give you full control over the regeneration process.

How to correct this YAML Format Error?

I have been working on defining a new YAML document, but when trying to process the file, I receive the following error from yamllint:
>syntax error on line 3, col 10: ` suites: '
and the following error in PyCharm when running tests:
ScannerError: mapping values are not allowed here
in "<string>", line 2, column 11:
name: testFirstNameLower
for the following code:
DataMart\Users:
name: testFirstNameLower
suites:
- suite: dataMart
- suite: userDim
dataset:
source: etlUnitTest
table: users
It looks like it is formatted correctly, but I don't know what I'm doing wrong...
If your DataMart\Users is supposed to contain a sequence of users, with each user having a name, sequence of suites, and a dataset, you're just doing a little too much indenting, and aren't handling each user as a series. (This online parser is typically what I use when handling yaml.)
Try this instead:
DataMart\Users:
- name: testFirstNameLower
suites:
- suite: dataMart
- suite: userDim
dataset:
source: etlUnitTest
table: users
...which corresponds to the following json:
{
"DataMart\\Users": [
{
"name": "testFirstNameLower",
"suites": [
{
"suite": "dataMart"
},
{
"suite": "userDim"
}
],
"dataset": {
"source": "etlUnitTest",
"table": "users"
}
}
]
}
Here's some yaml with a second example user added:
DataMart\Users:
- name: testFirstNameLower
suites:
- suite: dataMart
- suite: userDim
dataset:
source: etlUnitTest
table: users
- name: secondname
suites:
- suite: secondDataMart
- suite: secondUserDim
dataset:
source: secondEtlUnitTest
table: secondUsers

Resources