Apache Storm Flux Simple KafkaSpout --> KafkaBolt NullPointerException - apache-storm

I'm using Apache Storm 0.10.0-beta1 and have started converting some topologies to Flux. I decided to start with a simple topology that reads from one Kafka queue and writes to a different Kafka queue. I get the error below and am having a hard time figuring out what is wrong. The topology YAML file follows the error.
Parsing file: /Users/frank/src/mapper/mapper.yaml
388 [main] INFO o.a.s.f.p.FluxParser - loading YAML from input stream...
391 [main] INFO o.a.s.f.p.FluxParser - Not performing property substitution.
391 [main] INFO o.a.s.f.p.FluxParser - Not performing environment variable substitution.
466 [main] INFO o.a.s.f.FluxBuilder - Detected DSL topology...
Exception in thread "main" java.lang.NullPointerException
at org.apache.storm.flux.FluxBuilder.canInvokeWithArgs(FluxBuilder.java:561)
at org.apache.storm.flux.FluxBuilder.findCompatibleConstructor(FluxBuilder.java:392)
at org.apache.storm.flux.FluxBuilder.buildObject(FluxBuilder.java:288)
at org.apache.storm.flux.FluxBuilder.buildSpout(FluxBuilder.java:361)
at org.apache.storm.flux.FluxBuilder.buildSpouts(FluxBuilder.java:349)
at org.apache.storm.flux.FluxBuilder.buildTopology(FluxBuilder.java:84)
at org.apache.storm.flux.Flux.runCli(Flux.java:153)
at org.apache.storm.flux.Flux.main(Flux.java:98)
Topology yaml:
name: "mapper-topology"
config:
topology.workers: 1
topology.debug: true
kafka.broker.properties.metadata.broker.list: "localhost:9092"
kafka.broker.properties.request.required.acks: "1"
kafka.broker.properties.serializer.class: "kafka.serializer.StringEncoder"
# component definitions
components:
- id: "topicSelector"
className: "storm.kafka.bolt.selector.DefaultTopicSelector"
constructorArgs:
- "schemaq"
- id: "kafkaMapper"
className: "storm.kafka.bolt.mapper.FieldNameBasedTupleToKafkaMapper"
# spout definitions
spouts:
- id: "kafka-spout"
className: "storm.kafka.SpoutConfig"
parallelism: 1
constructorArgs:
- ref: "zkHosts"
- "mapperq"
- "/mapperq"
- "id-mapperq"
properties:
- name: "forceFromStart"
value: true
- name: "scheme"
ref: "stringMultiScheme"
# bolt definitions
bolts:
- id: "kafka-bolt"
className: "storm.kafka.bolt.KafkaBolt"
parallelism: 1
configMethods:
- name: "withTopicSelector"
args: [ref: "topicSelector"]
- name: "withTupleToKafkaMapper"
args: [ref: "kafkaMapper"]
# streams
streams:
- name: "kafka-spout --> kafka-bolt"
from: "kafka-spout"
to: "kafka-bolt"
grouping:
type: SHUFFLE
And here is the command:
storm jar /Users/frank/src/mapper/target/mapper-0.1.0-SNAPSHOT-standalone.jar org.apache.storm.flux.Flux --local mapper.yaml

The spout className should be storm.kafka.KafkaSpout, not storm.kafka.SpoutConfig. Define the SpoutConfig in the "components" section and have the spout reference it through its constructor args.
You can refer to https://github.com/apache/storm/blob/master/external/flux/flux-examples/src/main/resources/kafka_spout.yaml to see how to set up a KafkaSpout from Flux.
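For example, a minimal sketch of the corrected wiring, modeled on that example (the "spoutConfig" id is just an illustrative name, and zkHosts / stringMultiScheme are assumed to be defined as components in the same way as in the linked file):
components:
  # SpoutConfig becomes a component...
  - id: "spoutConfig"
    className: "storm.kafka.SpoutConfig"
    constructorArgs:
      - ref: "zkHosts"
      - "mapperq"
      - "/mapperq"
      - "id-mapperq"
    properties:
      - name: "forceFromStart"
        value: true
      - name: "scheme"
        ref: "stringMultiScheme"
spouts:
  # ...and the spout itself is a KafkaSpout constructed from that config.
  - id: "kafka-spout"
    className: "storm.kafka.KafkaSpout"
    parallelism: 1
    constructorArgs:
      - ref: "spoutConfig"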

Related

Hyperledger Fabric configtxgen - Error reading config: map merge requires map or sequence of maps as the value

I'm trying to set up a simple Fabric network with the following:
Orderer Organization [abccoinOrderers]
Sample Organization [ABC]
After generating all the necessary files using the cryptogen tool, running the configtxgen command gives the following error:
student#abc:~/Desktop/fabric/network$ configtxgen -profile DefaultBlockOrderingService -outputBlock ./config/genesis.block -configPath $PWD
2019-12-26 12:35:42.131 MST [common.tools.configtxgen] main -> WARN 001 Omitting the channel ID for configtxgen for output operations is deprecated. Explicitly passing the channel ID will be required in the future, defaulting to 'testchainid'.
2019-12-26 12:35:42.136 MST [common.tools.configtxgen] main -> INFO 002 Loading configuration
2019-12-26 12:35:42.137 MST [common.tools.configtxgen.localconfig] Load -> PANI 003 Error reading configuration: While parsing config: yaml: map merge requires map or sequence of maps as the value
2019-12-26 12:35:42.137 MST [common.tools.configtxgen] func1 -> PANI 004 Error reading configuration: While parsing config: yaml: map merge requires map or sequence of maps as the value
panic: Error reading configuration: While parsing config: yaml: map merge requires map or sequence of maps as the value [recovered]
panic: Error reading configuration: While parsing config: yaml: map merge requires map or sequence of maps as the value
Here is the configtx.yaml
Organizations:
  - &abccoinOrderers
    Name: abccoinOrderersMSP
    ID: abccoinOrderersMSP
    MSPDir: crypto-config/ordererOrganizations/abccoin.com/msp
  - &ABC
    Name: ABCMSP
    ID: ABCMSP
    MSPDir: crypto-config/peerOrganizations/ABC.abccoin.com/msp
    AnchorPeers:
      - Host: Andy.ABC.abccoin.com
        Port: 7051
Application: &ApplicationDefaults
Orderer:
  - &DevModeOrdering
    OrdererType: solo
    Addresses:
      - Devorderer.abccoin.com:7050
    BatchTimeout: 2s
    BatchSize:
      MaxMessageCount: 1
Profiles:
  DefaultBlockOrderingService:
    Orderer:
      <<: *DevModeOrdering
      Organizations:
        - *abccoinOrderers
    Consortiums:
      SampleConsortium:
        Organizations:
          - *ABC
  abcMembersOnly:
    Consortium: SampleConsortium
    Application:
      <<: *ApplicationDefaults
      Organizations:
        - *ABC
I've already tried rearranging the code blocks as mentioned in this post. I've also tried placing the "<<" key in quotes, as mentioned in the issue "YML document "<<: value" cannot be parsed #245", but it didn't help.
There are two errors in the configtx.yaml.
Orderer: is a map (object) type, not an array/slice type. When you prefix entries with -, YAML treats them as an array.
Orderer: &DevModeOrdering   # remove the leading "-" so Orderer stays a map
  OrdererType: solo
  Addresses:
    - Devorderer.abccoin.com:7050
  BatchTimeout: 2s
  BatchSize:
    MaxMessageCount: 1
Application: you must declare the Organizations: parameter. It can be empty, but if you don't declare it at all, the file will not parse. To check, you can convert the YAML into JSON with any online converter.
Application: &ApplicationDefaults
Organizations:
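Putting both fixes together, the relevant parts of configtx.yaml would look roughly like this (a sketch assembled from the two corrections above, not a complete Fabric configuration; the <<: merge key only accepts a map, or a sequence of maps, as its value, so the anchors it references must resolve to maps):
Application: &ApplicationDefaults
  Organizations:
Orderer: &DevModeOrdering
  OrdererType: solo
  Addresses:
    - Devorderer.abccoin.com:7050
  BatchTimeout: 2s
  BatchSize:
    MaxMessageCount: 1
Profiles:
  DefaultBlockOrderingService:
    Orderer:
      <<: *DevModeOrdering
      Organizations:
        - *abccoinOrderers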

How can I configure multiple Loggers in a YAML file

I am unable to configure multiple loggers in my YAML file. The last logger is overriding the previous loggers.
Here is my code
Loggers:
  Logger:
    - name: com.example
      additivity: false
      level: info
      AppenderRef:
        - ref: RollingFileAppender_Normal
          level: info
    - name: com.example
      additivity: false
      level: info
      AppenderRef:
        - ref: RollingFileAppender_JSON
          level: info
All logs are being written only to the RollingFileAppender_JSON appender.
I found the answer to my question.
There are two solutions I found to the above problem.
1)
Loggers:
  Logger:
    - name: com.example
      additivity: false
      level: info
      AppenderRef:
        - ref: RollingFileAppender_Normal
        - ref: RollingFileAppender_JSON
          level: info
2) By keeping 'additivity: false' only in the first logger
Loggers:
  Logger:
    - name: com.example
      level: info
      additivity: false
      AppenderRef:
        - ref: RollingFileAppender_Normal
          level: info
    - name: com.example
      level: info
      AppenderRef:
        - ref: RollingFileAppender_JSON
          level: info

Elasticsearch Curator: how to avoid deleting the latest index while deleting indices by filtertype

Using Elasticsearch 5.1 and Curator 4.3 on CentOS 7.
I have some indices in Elasticsearch whose names follow the formats sample.data.YYYY_MM_DD and sample.file.YYYY_MM_DD.
For example:-
sample.data.2019_07_22
sample.data.2019_07_23
sample.data.2019_07_25
sample.data.2019_07_26
sample.data.2019_07_28
sample.file.2019_07_21
sample.file.2019_07_25
sample.file.2019_07_26
sample.file.2019_07_29
I run the action file with the below command on Linux.
curator --config /root/config.yml /root/action_file.yml
I want to delete all indices except the most recently created one for each pattern [sample.data.2019_07_28, sample.file.2019_07_29].
This is what I tried:
---
actions:
  1:
    action: delete_indices
    description: "Delete indices older than 3 days (based on index name), for workflow- prefixed indices. Ignore the error if the filter does not result in an actionable list of indices (ignore_empty_list) and exit cleanly."
    filters:
      -
        exclude: ~
        filtertype: pattern
        kind: prefix
        value: sample.*.
      -
        direction: older
        exclude: ~
        filtertype: age
        source: name
        timestring: "%Y%m%d"
        unit: days
        unit_count: 3
    options:
      continue_if_exception: false
      disable_action: false
      ignore_empty_list: true
      timeout_override: ~
It deletes all of the indices, even though I also tried adding the below filter:
      - filtertype: count
        count: 4
The expected output would be:
sample.data.2019_07_28
sample.file.2019_07_29
I think you should change your timestring from timestring: "%Y%m%d" to timestring: "%Y_%m_%d". When I test with a dry run I get:
2019-08-02 15:02:47,493 INFO Preparing Action ID: 1, "delete_indices"
2019-08-02 15:02:47,513 INFO Trying Action ID: 1, "delete_indices": Delete indices older than 3 days (based on index name), for workflow- prefixed indices. Ignore the error if the filter does not result in an actionable list of indices (ignore_empty_list) and exit cleanly.
2019-08-02 15:02:48,709 INFO DRY-RUN MODE. No changes will be made.
2019-08-02 15:02:48,709 INFO (CLOSED) indices may be shown that may not be acted on by action "delete_indices".
2019-08-02 15:02:48,709 INFO DRY-RUN: delete_indices: sample.file.2019_07_26 with arguments: {}
2019-08-02 15:02:48,709 INFO DRY-RUN: delete_indices: sample.file.2019_07_27 with arguments: {}
2019-08-02 15:02:48,710 INFO DRY-RUN: delete_indices: sample.file.2019_07_28 with arguments: {}
2019-08-02 15:02:48,710 INFO DRY-RUN: delete_indices: sample.file.2019_07_29 with arguments: {}
2019-08-02 15:02:48,710 INFO DRY-RUN: delete_indices: sample.file.2019_07_30 with arguments: {}
2019-08-02 15:02:48,710 INFO Action ID: 1, "delete_indices" completed.
2019-08-02 15:02:48,710 INFO Job completed.
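In other words, only the timestring in the age filter changes, so that it matches the underscore-separated dates in the index names; the adjusted filter would look like this:
      - direction: older
        exclude: ~
        filtertype: age
        source: name
        timestring: "%Y_%m_%d"   # matches names like sample.data.2019_07_22
        unit: days
        unit_count: 3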
Hope that helps.
I think you should upgrade to Curator 5.7, which fully supports Elasticsearch v5, and provides the count filter, which can sort indices by age and keep only n indices. Using the exclude flag, you can exclude the most recent index, and then use the regular age filter.
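A hedged sketch of what that could look like on Curator 5.x, keeping only the newest index for each prefix; the grouping pattern and option values below are assumptions based on the index names in the question, so verify the selection with --dry-run before running it for real:
actions:
  1:
    action: delete_indices
    description: "Keep only the most recent sample.data.* and sample.file.* index, delete the rest."
    options:
      ignore_empty_list: true
      disable_action: false
    filters:
      - filtertype: pattern
        kind: prefix
        value: sample.
      - filtertype: count
        count: 1                           # keep the newest index in each group
        reverse: true                      # sort newest first (by name here)
        pattern: '^(sample\.\w+\.).*$'     # group by "sample.data." / "sample.file." prefix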

MiNiFi - NiFi Connection Failure: UnknownHostException: able to telnet to the host from the machine where MiNiFi is running

I am running MiNiFi on a Linux box (gateway server) which is behind my company's firewall. My NiFi is running on an AWS EC2 cluster (in standalone mode).
I am trying to send data from the gateway to the NiFi running in AWS EC2.
From the gateway, I am able to telnet to the EC2 node using the public DNS name and the remote port that I have configured in the nifi.properties file:
nifi.properties
# Site to Site properties
nifi.remote.input.host=ec2-xxx.us-east-2.compute.amazonaws.com
nifi.remote.input.secure=false
nifi.remote.input.socket.port=1026
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.contents.cache.expiration=30 secs
Telnet connection from Gateway to NiFi
iot1#iothdp02:~/minifi/minifi-0.5.0/conf$ telnet ec2-xxx.us-east-2.compute.amazonaws.com 1026
Trying xx.xx.xx.xxx...
Connected to ec2-xxx.us-east-2.compute.amazonaws.com.
Escape character is '^]'.
The Public DNS is resolving to the correct Public IP of the EC2 node.
From the EC2 node, when I do nslookup on the Public DNS, it gives back the private IP.
From AWS Documentation: "The public IP address is mapped to the primary private IP address through network address translation (NAT). "
Hence, I am not adding the public DNS and the public IP to the /etc/hosts file on the EC2 node.
On the MiNiFi side, I am getting the below error:
minifi-app.log
iot1#iothdp02:~/minifi/minifi-0.5.0/logs$ cat minifi-app.log
2018-11-14 16:00:47,910 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-11-14 16:00:47,911 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 0 milliseconds
2018-11-14 16:01:02,334 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#67207d8a checkpointed with 0 Records and 0 Swap Files in 20 milliseconds (Stop-the-world time = 6 milliseconds, Clear Edit Logs time = 4 millis), max Transaction ID -1
2018-11-14 16:02:47,911 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-11-14 16:02:47,912 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 0 milliseconds
2018-11-14 16:03:02,354 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#67207d8a checkpointed with 0 Records and 0 Swap Files in 18 milliseconds (Stop-the-world time = 3 milliseconds, Clear Edit Logs time = 5 millis), max Transaction ID -1
2018-11-14 16:03:10,636 WARN [Timer-Driven Process Thread-8] o.a.n.r.util.SiteToSiteRestApiClient Failed to get controller from http://ec2-xxx.us-east-2.compute.amazonaws.com:9090/nifi-api due to java.net.UnknownHostException: ec2-xxx.us-east-2.compute.amazonaws.com: unknown error
2018-11-14 16:03:10,636 WARN [Timer-Driven Process Thread-8] o.apache.nifi.controller.FlowController Unable to communicate with remote instance RemoteProcessGroup[http://ec2-xxx.us-east-2.compute.amazonaws.com:9090/nifi] due to org.apache.nifi.controller.exception.CommunicationsException: org.apache.nifi.controller.exception.CommunicationsException: Unable to communicate with Remote NiFi at URI http://ec2-xxx.us-east-2.compute.amazonaws.com:9090/nifi due to: ec2-xxx.us-east-2.compute.amazonaws.com: unknown error
2018-11-14 16:04:47,912 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-11-14 16:04:47,912 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 0 milliseconds
2018-11-14 16:05:02,380 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#67207d8a checkpointed with 0 Records and 0 Swap Files in 25 milliseconds (Stop-the-world time = 8 milliseconds, Clear Edit Logs time = 6 millis), max Transaction ID -1
2018-11-14 16:06:47,912 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2018-11-14 16:06:47,912 INFO [pool-31-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 0 records in 0 milliseconds
2018-11-14 16:07:02,399 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#67207d8a checkpointed with
MiNiFi config.yml
MiNiFi Config Version: 3
Flow Controller:
  name: Gateway-IDS_v0.1
  comment: "1. ConsumeMQTT - MiNiFi will consume mqtt messages in gateway\n2. Remote\
    \ Process Group will send messages to NiFi "
Core Properties:
  flow controller graceful shutdown period: 10 sec
  flow service write delay interval: 500 ms
  administrative yield duration: 30 sec
  bored yield duration: 10 millis
  max concurrent threads: 1
  variable registry properties: ''
FlowFile Repository:
  partitions: 256
  checkpoint interval: 2 mins
  always sync: false
  Swap:
    threshold: 20000
    in period: 5 sec
    in threads: 1
    out period: 5 sec
    out threads: 4
Content Repository:
  content claim max appendable size: 10 MB
  content claim max flow files: 100
  always sync: false
Provenance Repository:
  provenance rollover time: 1 min
  implementation: org.apache.nifi.provenance.MiNiFiPersistentProvenanceRepository
Component Status Repository:
  buffer size: 1440
  snapshot frequency: 1 min
Security Properties:
  keystore: ''
  keystore type: ''
  keystore password: ''
  key password: ''
  truststore: ''
  truststore type: ''
  truststore password: ''
  ssl protocol: ''
  Sensitive Props:
    key:
    algorithm: PBEWITHMD5AND256BITAES-CBC-OPENSSL
    provider: BC
Processors:
- id: 6396f40f-118f-33f4-0000-000000000000
  name: ConsumeMQTT
  class: org.apache.nifi.processors.mqtt.ConsumeMQTT
  max concurrent tasks: 1
  scheduling strategy: TIMER_DRIVEN
  scheduling period: 0 sec
  penalization period: 30 sec
  yield period: 1 sec
  run duration nanos: 0
  auto-terminated relationships list: []
  Properties:
    Broker URI: tcp://localhost:1883
    Client ID: nifi
    Connection Timeout (seconds): '30'
    Keep Alive Interval (seconds): '60'
    Last Will Message:
    Last Will QoS Level:
    Last Will Retain:
    Last Will Topic:
    MQTT Specification Version: '0'
    Max Queue Size: '10'
    Password:
    Quality of Service(QoS): '0'
    SSL Context Service:
    Session state: 'true'
    Topic Filter: MQTT
    Username:
Controller Services: []
Process Groups: []
Input Ports: []
Output Ports: []
Funnels: []
Connections:
- id: f0007aa3-cf32-3593-0000-000000000000
  name: ConsumeMQTT/Message/85ebf198-0166-1000-5592-476a7ba47d2e
  source id: 6396f40f-118f-33f4-0000-000000000000
  source relationship names:
  - Message
  destination id: 85ebf198-0166-1000-5592-476a7ba47d2e
  max work queue size: 10000
  max work queue data size: 1 GB
  flowfile expiration: 0 sec
  queue prioritizer class: ''
Remote Process Groups:
- id: c00d3132-375b-323f-0000-000000000000
  name: ''
  url: http://ec2-xxx.us-east-2.compute.amazonaws.com:9090
  comment: ''
  timeout: 30 sec
  yield period: 10 sec
  transport protocol: RAW
  proxy host: ''
  proxy port: ''
  proxy user: ''
  proxy password: ''
  local network interface: ''
  Input Ports:
  - id: 85ebf198-0166-1000-5592-476a7ba47d2e
    name: From MiNiFi
    comment: ''
    max concurrent tasks: 1
    use compression: false
    Properties:
      Port: 1026
      Host Name: ec2-xxx.us-east-2.compute.amazonaws.com
  Output Ports: []
NiFi Properties Overrides: {}
Any pointers on how to troubleshoot this issue?
In MiNiFi config.yml, I changed the URL under Remote Process Groups from http://ec2-xxx.us-east-2.compute.amazonaws.com:9090 to http://ec2-xxx.us-east-2.compute.amazonaws.com:9090/nifi.
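So the Remote Process Group entry ends up looking like this (only the url line changes from the config above):
Remote Process Groups:
- id: c00d3132-375b-323f-0000-000000000000
  name: ''
  url: http://ec2-xxx.us-east-2.compute.amazonaws.com:9090/nifi   # note the trailing /nifi
  comment: ''
  timeout: 30 sec
  yield period: 10 sec
  transport protocol: RAW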

Storm HiveBolt missing records due to batching of Hive transactions

To store the processed records, I am using HiveBolt in a Storm topology with the following arguments.
- id: "MyHiveOptions"
className: "org.apache.storm.hive.common.HiveOptions"
- "${metastore.uri}" # metaStoreURI
- "${hive.database}" # databaseName
- "${hive.table}" # tableName
configMethods:
- name: "withTxnsPerBatch"
args:
- 2
- name: "withBatchSize"
args:
- 100
- name: "withIdleTimeout"
args:
- 2 #default value 0
- name: "withMaxOpenConnections"
args:
- 200 #default value 500
- name: "withCallTimeout"
args:
- 30000 #default value 10000
- name: "withHeartBeatInterval"
args:
- 240 #default value 240
There are missing records in Hive because the batch is never completed and the remaining records are not flushed. (For example: 1330 records are processed but only 1200 records are in Hive; 130 records are missing.)
How can I overcome this situation? How can I get the batch flushed so that the transaction commits and the records are stored in Hive?
Topology: Kafka-Spout --> DataProcessingBolt
DataProcessingBolt --> HiveBolt (Sink)
DataProcessingBolt --> JdbcBolt (Sink)
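One mitigation to consider (a hedged sketch based on the HiveOptions methods already shown above, not a verified fix for this topology) is to use a smaller batch size together with a non-zero idle timeout, so that a partially filled batch is still flushed and its transaction committed when the stream goes quiet; the values below are purely illustrative:
    configMethods:
      - name: "withTxnsPerBatch"
        args:
          - 2
      - name: "withBatchSize"
        args:
          - 10      # smaller batch, so commits happen more often
      - name: "withIdleTimeout"
        args:
          - 10      # seconds; lets idle writers be flushed and closed instead of holding records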
