Dynamic UDF in Apache Drill cluster

I have a Drill cluster with 4 drillbits (Drill 1.14), but I cannot use the Dynamic UDF feature in the cluster for some reason. Every time, I run into trouble.
Let me present 2 scenarios:
Scenario 1
Here is the config (the same for all drillbits):
drill.exec: {
  cluster-id: "drill-test",
  zk: {
    connect: "vm29.local:2181,vm32.local:2181,vm39.local:2181",
    root: "drill"
  },
  sys.store.provider.zk.blobroot: "hdfs://vm29.local:9000/apps/drill/pstore/",
  http: {
    enabled: true,
    ssl_enabled: false,
    port: 8047,
    session_max_idle_secs: 3600, # Default value 1hr
    cors: {
      enabled: true,
      allowedOrigins: ["*"],
      allowedMethods: ["GET", "POST", "HEAD", "OPTIONS"],
      allowedHeaders: ["X-Requested-With", "Content-Type", "Accept", "Origin"],
    }
  }
}
drill.exec.udf: {
  retry-attempts: 5,
  directory: {
    fs: "hdfs://vm29.local:9000/",
    root: "/drill",
    base: "/udf",
    local: ${drill.exec.udf.directory.base}"/local",
    staging: ${drill.exec.udf.directory.base}"/staging",
    registry: ${drill.exec.udf.directory.base}"/registry",
    tmp: ${drill.exec.udf.directory.base}"/tmp"
  }
}
As you can see, I use HDFS for the UDF directories in this scenario.
When I put the jar files into the 'staging' folder and run 'CREATE FUNCTION USING JAR', the function registers successfully. BUT I can then use it only on the drillbit where I registered it.
For example, if I run the command in the web UI on vm29, I can use the function only on vm29.
If, in addition, I try to register the jar on a different drillbit, I get an 'already registered' error, yet still cannot use the function (not-found error).
The jar files are present in hdfs://vm29.local:9000/drill/udf/registry and the metadata is in the ZK registry.
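For context, the registration flow I'm using looks like this (the jar names here are hypothetical; the staging path follows the directory config above):

```shell
# Copy both the binary jar and the sources jar into the staging directory
# (Dynamic UDF expects both; 'myudfs-1.0' is a hypothetical artifact name):
hdfs dfs -put myudfs-1.0.jar hdfs://vm29.local:9000/drill/udf/staging/
hdfs dfs -put myudfs-1.0-sources.jar hdfs://vm29.local:9000/drill/udf/staging/

# Then register (and later unregister) from sqlline or the web UI:
#   CREATE FUNCTION USING JAR 'myudfs-1.0.jar';
#   DROP FUNCTION USING JAR 'myudfs-1.0.jar';
```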
Scenario 2
The config is the same, with one difference: all drillbits use their local filesystem for the UDF directories.
In that case I can register/unregister a function, but I cannot use it on every drillbit (not-found error). The jar files are present in the /UDF/registry folder and the metadata is in the ZK registry, but it does not work.
What am I doing wrong?
I cannot find any step-by-step instructions for using the Dynamic UDF feature in a cluster. Maybe you know of one?
Thanks.
Updated:
It just occurred to me: I use the web console for queries. Does it make a difference whether the function is created through the web console or over a jdbc:zk connection? (I will test.)
Cause & Results
This is a bug in Drill 1.14.
It was reported in the Drill Jira.
The fix, with an explanation, is in the Drill GitHub repository.

This is a regression since 1.13; we have opened a Jira ticket: https://issues.apache.org/jira/browse/DRILL-6762. Meanwhile, you can add custom UDFs manually: https://drill.apache.org/docs/manually-adding-custom-functions-to-drill/.

Related

How to specify pipeline for Filebeat Nginx module?

I have a web server (Ubuntu) with Nginx + PHP.
It runs Filebeat, which sends the Nginx logs directly to an Elasticsearch ingest node (no Logstash or anything else).
When I first installed it, I made some customizations to the pipeline that Filebeat created.
Everything worked great for a month or so.
But I noticed that every Filebeat upgrade results in the creation of a new pipeline. Currently I have these:
filebeat-7.3.1-nginx-error-pipeline: {},
filebeat-7.4.1-nginx-error-pipeline: {},
filebeat-7.2.0-nginx-access-default: {},
filebeat-7.3.2-nginx-error-pipeline: {},
filebeat-7.4.1-nginx-access-default: {},
filebeat-7.3.1-nginx-access-default: {},
filebeat-7.3.2-nginx-access-default: {},
filebeat-7.2.0-nginx-error-pipeline: {}
I can create a new pipeline, but how do I tell (configure) Filebeat to use a specific pipeline?
Here is what I tried, and it doesn't work:
- module: nginx
  # Access logs
  access:
    enabled: true
    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    var.paths: ["/var/log/nginx/*/*access.log"]
    # Convert the timestamp to UTC
    var.convert_timezone: true
    # The Ingest Node pipeline ID associated with this input. If this is set, it
    # overwrites the pipeline option from the Elasticsearch output.
    output.elasticsearch.pipeline: 'filebeat-nginx-access-default'
    pipeline: 'filebeat-nginx-access-default'
It still uses the filebeat-7.4.1-nginx-error-pipeline pipeline.
Here are the Filebeat instructions on how to configure it (but I can't make it work):
https://github.com/elastic/beats/blob/7.4/filebeat/filebeat.reference.yml#L1129-L1130
Question:
How can I configure a Filebeat module to use a specific pipeline?
Update (Nov 2019): I submitted related bug: https://github.com/elastic/beats/issues/14348
In the Beats source code, I found that the pipeline ID is built from the following params:
beats version
module name
module's fileset name
pipeline filename
The source code snippet is as follows:
// formatPipelineID generates the ID to be used for the pipeline ID in Elasticsearch
func formatPipelineID(module, fileset, path, beatVersion string) string {
	return fmt.Sprintf("filebeat-%s-%s-%s-%s", beatVersion, module, fileset, removeExt(filepath.Base(path)))
}
So you cannot assign the pipeline ID yourself; that would need official support from Elastic.
For now, the pipeline ID changes along with those four params, so you MUST update the pipeline ID in Elasticsearch whenever you upgrade Beats.
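Transliterated to Python, the ID construction can be sketched like this (the manifest path 'ingest/default.json' is an assumption based on the module layout):

```python
import os

def format_pipeline_id(module: str, fileset: str, path: str, beat_version: str) -> str:
    # Mirrors the Go snippet above:
    # "filebeat-<beat version>-<module>-<fileset>-<pipeline file without extension>"
    name, _ext = os.path.splitext(os.path.basename(path))
    return "filebeat-%s-%s-%s-%s" % (beat_version, module, fileset, name)

print(format_pipeline_id("nginx", "access", "ingest/default.json", "7.4.1"))
# filebeat-7.4.1-nginx-access-default
```

This reproduces the IDs listed in the question, which is why every Beats upgrade produces a new pipeline.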
Refer to /{filebeat-HOME}/module/nginx/access/manifest.yml; maybe you should set ingest_pipeline in /{filebeat-HOME}/modules.d/nginx.yml.
The value appears to be a local file path.
The pipeline can be configured either in your input or output configuration, not in the modules one.
Your configuration has several sections; the one you show in your question configures the nginx module. You need to open filebeat.yml, look for the output section where you have configured Elasticsearch, and put the pipeline configuration there:
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["elk.slavikf.com:9200"]
  pipeline: filebeat-nginx-access-default
If you need to be able to use different pipelines depending on the nature of data you can definitely do so using pipeline mappings:
output.elasticsearch:
  hosts: ["elk.slavikf.com:9200"]
  pipelines:
    - pipeline: "nginx_pipeline"
      when.contains:
        type: "nginx"
    - pipeline: "apache_pipeline"
      when.contains:
        type: "apache"

Serverless - Lambda Layers "Cannot find module 'request'"

When I deploy my serverless API using:
serverless deploy
the Lambda layer gets created, but when I run the function it gives me this error:
"Cannot find module 'request'"
But if I upload the .zip file manually through the console (the exact same file that's uploaded when I deploy), it works fine.
Does anyone have any idea why this is happening?
environment:
  SLS_DEBUG: "*"
provider:
  name: aws
  runtime: nodejs8.10
  stage: ${opt:api-type, 'uat'}-${opt:api, 'payment'}
  region: ca-central-1
  timeout: 30
  memorySize: 128
  role: ${file(config/prod.env.json):ROLE}
  vpc:
    securityGroupIds:
      - ${file(config/prod.env.json):SECURITY_GROUP}
    subnetIds:
      - ${file(config/prod.env.json):SUBNET}
  apiGateway:
    apiKeySourceType: HEADER
  apiKeys:
    - ${file(config/${opt:api-type, 'uat'}.env.json):${opt:api, "payment"}-APIKEY}
functions:
  - '${file(src/handlers/${opt:api, "payment"}.serverless.yml)}'
package:
  # individually: true
  exclude:
    - node_modules/**
    - nodejs/**
plugins:
  - serverless-offline
  - serverless-plugin-warmup
  - serverless-content-encoding
custom:
  contentEncoding:
    minimumCompressionSize: 0 # Minimum body size required for compression in bytes
layers:
  nodejs:
    package:
      artifact: nodejs.zip
    compatibleRuntimes:
      - nodejs8.10
    allowedAccounts:
      - "*"
That's what my serverless.yml looks like.
I was having a similar error while using the explicit layers keys that you are using to define a Lambda layer.
My error (for the sake of web searches) was this:
Runtime.ImportModuleError: Error: Cannot find module <package name>
I consider this a temporary solution, because I wanted to explicitly define my layers as you were doing, but it wasn't working, so it seemed like a bug.
I created a bug report in Serverless for this issue. If anyone else is having the same issue, they can track it there.
SOLUTION
I followed this post in the Serverless forums, based on these docs from AWS.
I zipped up my node_modules under the folder nodejs so that it looks like this when unzipped: nodejs/node_modules/<various packages>.
Then, instead of using the explicit definition of layers, I used the package and artifact keys like so:
layers:
  test:
    package:
      artifact: test.zip
In the function, the layer is referenced like this:
functions:
  function1:
    handler: index.handler
    layers:
      - { Ref: TestLambdaLayer }
The name TestLambdaLayer follows the <your layer name>LambdaLayer convention, as documented here.
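As a sketch of that convention (the exact normalization Serverless applies is an assumption here: it builds a CloudFormation-safe TitleCase identifier from the layer name and appends 'LambdaLayer'):

```python
import re

def layer_logical_id(layer_name: str) -> str:
    # Strip non-alphanumeric characters, TitleCase the remaining pieces,
    # then append the fixed "LambdaLayer" suffix.
    parts = re.split(r"[^0-9a-zA-Z]+", layer_name)
    return "".join(p[:1].upper() + p[1:] for p in parts if p) + "LambdaLayer"

print(layer_logical_id("test"))    # TestLambdaLayer
print(layer_logical_id("nodejs"))  # NodejsLambdaLayer
```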
Make sure you run npm install inside your layers before deploying, i.e.:
cd ~/repos/repo-name/layers/utilityLayer/nodejs && npm install
Otherwise your layers will be deployed without a node_modules folder. You can download the .zip of your layer from the Lambda UI to confirm its contents.
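That check can also be sketched locally instead of via the Lambda UI (pure stdlib; the sample archive built here is hypothetical):

```python
import io
import zipfile

def layer_has_node_modules(zip_bytes: bytes) -> bool:
    # A Node.js layer must unpack to nodejs/node_modules/... for the
    # runtime to resolve packages from it.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return any(n.startswith("nodejs/node_modules/") for n in zf.namelist())

# Build a tiny in-memory artifact to demonstrate the check:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("nodejs/node_modules/request/package.json", "{}")
print(layer_has_node_modules(buf.getvalue()))  # True
```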
If anyone faces a similar Runtime.ImportModuleError, it is fair to say that another cause could be a package exclude statement in the serverless.yml file.
Be aware that if you have this statement:
package:
  exclude:
    - './**'
    - '!node_modules/**'
    - '!dist/**'
    - '.git/**'
it will cause exactly the same error at runtime once you've deployed your Lambda function (with the Serverless Framework). Just make sure to remove the entries that could create a conflict with your dependencies.
I am using TypeScript with serverless-plugin-typescript and I was getting the same error.
When I switched from
const myModule = require('./src/myModule');
to
import myModule from './src/myModule';
the error disappeared. It seems the files were not included in the zip file by Serverless when I was using require.
PS: Removing serverless-plugin-typescript and switching back to JavaScript also solved the problem.

Field types missing on production

We're having a problem with some field types on our production server. Some field types are missing, causing the admin interface to crash when trying to list all items. The fields we've had problems with so far are Date and CloudinaryImages (note that DateTime and CloudinaryImage work fine).
When inspecting the source on our staging server and comparing to our production server we see the following difference in the compiled js files:
example.com/js/fields.js on staging:
exports.Fields = {
    text: require("types/text/TextField"),
    textarea: require("types/textarea/TextareaField"),
    html: require("types/html/HtmlField"),
    cloudinaryimage: require("types/cloudinaryimage/CloudinaryImageField"),
    select: require("types/select/SelectField"),
    relationship: require("types/relationship/RelationshipField"),
    datetime: require("types/datetime/DatetimeField"),
    boolean: require("types/boolean/BooleanField"),
    embedly: require("types/embedly/EmbedlyField"),
    cloudinaryimages: require("types/cloudinaryimages/CloudinaryImagesField"),
    numberarray: require("types/numberarray/NumberArrayField"),
    code: require("types/code/CodeField"),
    number: require("types/number/NumberField"),
    textarray: require("types/textarray/TextArrayField"),
    url: require("types/url/UrlField"),
    file: require("types/file/FileField"),
    email: require("types/email/EmailField"),
    name: require("types/name/NameField"),
    password: require("types/password/PasswordField")
};
example.com/js/fields.js on production:
exports.Fields = {
    text: require("types/text/TextField"),
    textarea: require("types/textarea/TextareaField"),
    html: require("types/html/HtmlField"),
    cloudinaryimage: require("types/cloudinaryimage/CloudinaryImageField"),
    select: require("types/select/SelectField"),
    relationship: require("types/relationship/RelationshipField"),
    datetime: require("types/datetime/DatetimeField"),
    boolean: require("types/boolean/BooleanField"),
    embedly: require("types/embedly/EmbedlyField"),
    numberarray: require("types/numberarray/NumberArrayField"),
    code: require("types/code/CodeField"),
    number: require("types/number/NumberField"),
    textarray: require("types/textarray/TextArrayField"),
    url: require("types/url/UrlField"),
    file: require("types/file/FileField"),
    email: require("types/email/EmailField"),
    name: require("types/name/NameField"),
    password: require("types/password/PasswordField")
};
An eagle-eyed reader will see that the staging server has cloudinaryimages: require("types/cloudinaryimages/CloudinaryImagesField"), while the production server does not. Date does not appear in either, perhaps because we removed all fields using that type the last time we encountered this problem?
Our site is hosted on Heroku. We've tried disabling the Node cache and rebuilding. We've tried promoting the staging build to production. The problem still persists. Our production server has its environment set to production.
Does the build of the fields.js file depend on which fields we use? And how come our production server doesn't get them?
Any help appreciated.
Keystone version: 4.0.0-beta.8 (forked with a small addition not relevant to this)

Node red not working with https

Following various posts about running Node-RED over https, I've done the following:
Made these changes in settings.js:
var fs = require("fs");
...
https: {
  key: fs.readFileSync('privkey.pem'),
  cert: fs.readFileSync('cert.pem')
},
...
requireHttps: true
Created privkey.pem and cert.pem.
Verified the files exist in ~/node-red (on the Raspberry Pi).
Node.js version v8.9.4
Node-RED version v0.17.5
When I browse to https://raspberrypi:1880 I get "The site cannot be reached", but http://raspberrypi:1880 still works. I even tried rebooting the Pi.
This was a simple operator error. I had two directories: one .node-red and one node-red (FYI: install Node-RED in only one way, not several [duh]).
When Node-RED was loading, I misread the working directory and modified the wrong settings.js.

Scripted editing of Symfony 2 YAML file breaks the formatting and produces errors

I'm trying to script out our entire installation process of new Symfony 2.1 projects including adding and configuring all our bundles and their dependencies. The end result should be one command that sets up everything so the developer is both forced into our best practices setup, and does not have to spend time on this.
So far this is fairly successful since it is now possible to go from 0 to fully installed CMS in about an hour (mostly due to composer installs). You can see the result here: https://github.com/Kunstmaan/KunstmaanSandbox/blob/feature/update-to-2.1/README.md
The next phase of this project is modifying the Symfony config YAML files. But here I'm stuck.
For parameters.yml I did this with a Ruby script; here is the relevant extract. The full script can be found here: https://github.com/Kunstmaan/KunstmaanSandbox/blob/feature/update-to-2.1/app/Resources/docs/scripts/sandboxinstaller.rb
parametersymlpath = ARGV[1]
projectname = ARGV[2]
parametersyml = YAML.load_file(parametersymlpath)
params = parametersyml["parameters"]
params["searchport"] = 9200
params["searchindexname"] = projectname
params["sentry.dsn"] = "https://XXXXXXXX:XXXXXXXX#app.getsentry.com/XXXX"
params["cdnpath"] = ""
params["requiredlocales"] = "nl|fr|de|en"
params["defaultlocale"] = "nl"
params["websitetitle"] = projectname.capitalize
File.open(parametersymlpath, 'w') {|f| f.write(YAML.dump(parametersyml)) }
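For comparison, the same edit can be sketched in Python with PyYAML (assumed installed; same keys and placeholder DSN as the Ruby version):

```python
import yaml  # PyYAML, assumed installed

def update_parameters(parameters_yml_path, project_name):
    # Load parameters.yml, overwrite the same keys as the Ruby script,
    # and write the file back.
    with open(parameters_yml_path) as f:
        data = yaml.safe_load(f)
    params = data["parameters"]
    params["searchport"] = 9200
    params["searchindexname"] = project_name
    params["sentry.dsn"] = "https://XXXXXXXX:XXXXXXXX#app.getsentry.com/XXXX"
    params["cdnpath"] = ""
    params["requiredlocales"] = "nl|fr|de|en"
    params["defaultlocale"] = "nl"
    params["websitetitle"] = project_name.capitalize()
    with open(parameters_yml_path, "w") as f:
        yaml.safe_dump(data, f, default_flow_style=False)

# usage: update_parameters("app/config/parameters.yml", "myproject")
```

Note that this would hit the same reserved-character problem on config.yml; it only works because parameters.yml contains plain scalars.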
So far so good, but the same type of script fails on the config.yml due to these lines:
imports:
    - { resource: #KunstmaanMediaBundle/Resources/config/config.yml }
    - { resource: #KunstmaanAdminBundle/Resources/config/config.yml }
    - { resource: #KunstmaanFormBundle/Resources/config/config.yml }
    - { resource: #KunstmaanSearchBundle/Resources/config/config.yml }
    - { resource: #KunstmaanAdminListBundle/Resources/config/config.yml }
The # is a reserved character according to the YAML spec and Ruby throws an error.
So I switched to PHP and the Symfony YAML component, since at this point in the install there is a full Symfony installation, and I came up with this standalone command: https://gist.github.com/3526251
But when reading and dumping the config.yml file, the lines above, for example, turn into:
imports:
    -
        resource: #KunstmaanMediaBundle/Resources/config/config.yml
    -
        resource: #KunstmaanAdminBundle/Resources/config/config.yml
    -
        resource: #KunstmaanFormBundle/Resources/config/config.yml
    -
        resource: #KunstmaanSearchBundle/Resources/config/config.yml
    -
        resource: #KunstmaanAdminListBundle/Resources/config/config.yml
which looks like crap, and I'm not entirely sure it will even work.
So at this point I'm looking at moving fully modified config.yml files into the install script and just overwriting the originals. I would rather not go there, since it will require constant maintenance whenever something changes in the symfony-standard project.
I'm wondering: is there another way?
These two forms are semantically equivalent. They are called the inline (flow) and indented block forms, respectively.
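You can verify the equivalence directly (a sketch using PyYAML, assumed installed; plain placeholder filenames are used here because an unquoted # would start a YAML comment):

```python
import yaml  # PyYAML, assumed installed

inline = """
imports:
    - { resource: config_a.yml }
    - { resource: config_b.yml }
"""

indented = """
imports:
    -
        resource: config_a.yml
    -
        resource: config_b.yml
"""

# Both styles parse to the identical data structure.
print(yaml.safe_load(inline) == yaml.safe_load(indented))  # True
```

So the dumped file, while uglier, should load exactly as before.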
