I'm looking for some advice on setting up a Spring Cloud Data Flow (SCDF) stream for a specific use case.
My use case:
I have two RDBMS instances, and I need to compare the results of queries run against each. The queries should be run roughly simultaneously. Based on the result of the comparison, I should be able to send an email through a custom email sink app which I have created.
I envision the stream as two jdbc sources feeding a comparison processor, which in turn feeds the custom email sink.
The problem is that SCDF does not, to my knowledge, allow a stream to be composed of two sources. It seems to me that something like this ought to be possible without pushing the limits of the framework too far. I'm looking for answers that provide a good approach to this scenario while working within the SCDF framework.
I am using Kafka as the message broker, and the Data Flow server uses MySQL to persist stream information.
I have considered creating a custom source app which polls the two datasources and sends the messages on its output channel. This would remove the need for two sources, but it looks like it would require a significant amount of customization of the jdbc source application.
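For what it is worth, a rough sketch of what such a custom source might look like with Spring Cloud Stream is below; the datasource bean names, query, and polling interval are placeholders rather than working configuration.

import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Source;
import org.springframework.integration.annotation.InboundChannelAdapter;
import org.springframework.integration.annotation.Poller;
import org.springframework.jdbc.core.JdbcTemplate;

// Hypothetical source that polls both databases in the same tick and emits one message
// containing both result sets, so a downstream processor can compare them.
@EnableBinding(Source.class)
public class DualJdbcSource {

    @Autowired
    @Qualifier("dataSourceA")
    private DataSource dataSourceA;

    @Autowired
    @Qualifier("dataSourceB")
    private DataSource dataSourceB;

    @InboundChannelAdapter(value = Source.OUTPUT, poller = @Poller(fixedDelay = "60000"))
    public Map<String, Object> pollBoth() {
        // Placeholder query; in practice this would be configurable per datasource.
        String query = "SELECT * FROM audit_totals";
        Map<String, Object> payload = new HashMap<>();
        payload.put("left", new JdbcTemplate(dataSourceA).queryForList(query));
        payload.put("right", new JdbcTemplate(dataSourceB).queryForList(query));
        return payload;
    }
}

A downstream processor could then compare the two entries and only emit a message for the email sink when they differ.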
Thanks in advance.
I have not really tried this, but you should be able to use named destinations to achieve that. Take a look here: http://docs.spring.io/spring-cloud-dataflow/docs/current-SNAPSHOT/reference/htmlsingle/#spring-cloud-dataflow-stream-advanced
stream create --name jdbc1 --definition "jdbc > :dbSource"
stream create --name jdbc2 --definition "jdbc > :dbSource"
stream create --name processor --definition ":dbSource > aggregator | sink"
My use case here is to read rows from a database and write them into a BigQuery table.
For this I am trying to use the gRPC API (the BigQuery Storage Write API),
following this example file. Considering I am new to protobuf and Go, I am unable to figure out how to write a DB row into a BigQuery table, and I am specifically confused about this part. I am not able to find any example of building the request as a protobuf byte sequence and streaming it.
Any help is much appreciated.
The Go client provides a managedwriter package that you can use to stream data more easily. You can see how it is used in the integration tests.
Also, if you are new to Go, would you consider using Java instead? There is a JsonStreamWriter available in Java that allows you to append JSONArray objects (as opposed to protobuf rows), and the samples are here: https://github.com/googleapis/java-bigquerystorage/tree/main/samples/snippets/src/main/java/com/example/bigquerystorage
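As a rough sketch of that JsonStreamWriter route (the project, dataset, table, and field names are placeholders, and the exact builder overloads can differ between library versions):

import com.google.api.core.ApiFuture;
import com.google.cloud.bigquery.storage.v1.AppendRowsResponse;
import com.google.cloud.bigquery.storage.v1.BigQueryWriteClient;
import com.google.cloud.bigquery.storage.v1.JsonStreamWriter;
import com.google.cloud.bigquery.storage.v1.TableName;
import org.json.JSONArray;
import org.json.JSONObject;

public class WriteDbRowToBigQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder identifiers for the destination table.
        TableName table = TableName.of("my-project", "my_dataset", "my_table");

        try (BigQueryWriteClient client = BigQueryWriteClient.create();
             // Appends to the table's default stream, so no explicit stream creation is needed.
             JsonStreamWriter writer =
                     JsonStreamWriter.newBuilder(table.toString(), client).build()) {

            // One JSONObject per DB row; keys must match the BigQuery column names.
            JSONObject row = new JSONObject();
            row.put("id", 42);
            row.put("name", "example");

            JSONArray batch = new JSONArray();
            batch.put(row);

            ApiFuture<AppendRowsResponse> future = writer.append(batch);
            future.get(); // wait for the append to be acknowledged
        }
    }
}

The writer converts the JSON rows to protobuf under the covers, so you do not have to build the protobuf byte sequences yourself.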
I am using confluent-kafka-go for my project. When writing tests, because topic creation in Kafka is asynchronous, I may get errors (error code 3: UNKNOWN_TOPIC_OR_PARTITION) when I create a topic and then query it back immediately.
As I understand it, if I can query the controller directly, I can always get the latest metadata. So my question is: how can I get the Kafka controller's IP or ID when using confluent-kafka-go?
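For illustration of the idea only (this uses the Apache Kafka Java AdminClient rather than confluent-kafka-go, and the bootstrap address is a placeholder), the controller can be resolved from the cluster metadata like this:

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

public class ControllerLookup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // describeCluster() exposes futures for the cluster id, the nodes, and the controller.
            Node controller = admin.describeCluster().controller().get();
            System.out.printf("controller id=%d host=%s port=%d%n",
                    controller.id(), controller.host(), controller.port());
        }
    }
}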
I'm deploying a project with IIB (IBM Integration Bus).
The feature I am using is Integration Services, but I don't know how to save a log entry before and after each operation.
Does anyone know how to resolve this?
Thanks!
There are three approaches we use in my project. Refer to the following.
Code Level
1. JavaCompute node (using log4j; see the sketch after this list)
Flow Level
1. Trace node
2. Message flow monitoring
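A minimal sketch of the JavaCompute + log4j approach (the class name, terminal name, and logged fields are placeholders, and the log4j jar plus its configuration must be available to the integration server):

import com.ibm.broker.javacompute.MbJavaComputeNode;
import com.ibm.broker.plugin.MbException;
import com.ibm.broker.plugin.MbMessageAssembly;
import com.ibm.broker.plugin.MbOutputTerminal;
import org.apache.log4j.Logger;

public class AuditLogNode extends MbJavaComputeNode {

    private static final Logger LOG = Logger.getLogger(AuditLogNode.class);

    @Override
    public void evaluate(MbMessageAssembly assembly) throws MbException {
        // Log before the operation, pass the message through unchanged, then log after.
        LOG.info("Before operation: " + assembly.getMessage().getRootElement().getName());

        MbOutputTerminal out = getOutputTerminal("out");
        out.propagate(assembly);

        LOG.info("After operation");
    }
}

You would wire a node like this before and after the operations you want to audit, or call a shared logging helper from your existing JavaCompute nodes.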
In addition to the other answers, there is one more option, which I often use: the IAM3 SupportPac.
It adds a log4j node and also makes it possible to log from ESQL and Java Compute nodes.
There are two ways of doing this:
You can use the Log node to create audit logging. This option only stores to files, and the files are not rotated.
You can use IBM Integration Bus monitoring events to create an external flow that intercepts the messages and stores them in whatever way you prefer.
I am new to Spring Cloud Task and SCDF, so I am asking this.
I want to execute my Spring Cloud Task based on an event (say a message is posted to RabbitMQ), and I think it can be done in two ways:
Create a source which polls messages from RabbitMQ and sends the data to the stream, then create a sink which reads data from the stream; as soon as data arrives at the sink (from the source), the task is launched.
create stream producer --definition "rabbitproducer | streamconsumer (this is #TaskEnabled)"
I am not sure if this is possible?
The other way could be to use the task launcher. Here the task launcher would be configured in a stream, and a listener would poll messages from RabbitMQ; when a message is received, the trigger would initiate the process and the task launcher would launch the task. But here I am not sure how I will get the message data into my task. Do I have to add the data to the TaskLaunchRequest?
create stream mystream --definition "rabbitmsgtrigger --uri:my task | joblauncher"
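For the second approach, a custom processor could wrap the incoming Rabbit payload into a TaskLaunchRequest and hand the data to the task as command-line arguments. A rough sketch (the artifact URI and binding names are placeholders, and the TaskLaunchRequest constructor arguments differ between Spring Cloud Task versions, so treat this only as an illustration):

import java.util.Collections;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.cloud.task.launcher.TaskLaunchRequest;
import org.springframework.integration.annotation.Transformer;

// Hypothetical processor: turns each incoming Rabbit message into a task launch request.
@EnableBinding(Processor.class)
public class MessageToTaskLaunchRequest {

    // Placeholder task artifact; this would be the URI of the task to launch.
    private static final String TASK_URI = "maven://com.example:my-task:jar:0.0.1-SNAPSHOT";

    @Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
    public TaskLaunchRequest toRequest(String payload) {
        // Pass the message body to the task as a command-line argument.
        return new TaskLaunchRequest(
                TASK_URI,
                Collections.singletonList("--payload=" + payload),
                null,   // environment properties
                null);  // deployment properties
    }
}

A task launcher sink downstream of this processor would then launch the task with those arguments, and the task can read --payload from its command line.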
Launching a task from an upstream event is already supported and there are a few approaches to it; please review the reference guide (and the sample) for more details.
Here is the complete explanation of how my issue was resolved. Sabby helped me a lot with this.
Problem: I was not able to trigger my task using the task launcher / task sink. I was also not getting the correct details in the log, and I did not even know how to set the log level correctly.
Solution: With the help of Sabby and the documentation provided on the Spring Cloud Task site, I was able to resolve this and move ahead with my POC work. Below are the detailed steps I followed.
Started my SCDF server with a PostgreSQL database by referring to the property file and setting the log level as:
--logging.level.org.springframework.cloud=DEBUG
--spring.config.location=file://scdf.properties
Imported the apps from the bit.ly link:
app import --uri <stream applications link>
Registered the task sink app:
app register --name task-sink --type sink --uri file://tasksink-1.1.0.BUILD-SNAPSHOT.jar
Created stream as:
stream create mytasklaunchertest --definition "triggertask --triggertask.uri=https://my-archiva/myproject-scdf-task1/0.0.1-SNAPSHOT/myproject-scdf-task1-0.0.1-20160916.143611-1.jar --trigger.fixed-delay=5 | task-sink"
Deployed the stream:
stream deploy mytasklaunchertest --properties "app.triggertask.spring.rabbitmq.host=host,app.triggertask.spring.rabbitmq.username=user,app.triggertask.spring.rabbitmq.password=pass,app.triggertask.spring.rabbitmq.port=5672,app.triggertask.spring.rabbitmq.virtual-host=xxx"
I have a stream as follows:
source (jms-ibmmq) -> processor -> processor -> sink (jdbc-oracle)
Data ingestion works fine, but as part of my stream there is the possibility that my sink (jdbc-oracle) will be down, or that some problem in the network prevents persistence to the Oracle DB.
What I am asking is how to handle this failure and what options Spring XD provides out of the box. Is there a pattern that is commonly used to handle these failures in the processing / sink modules of a stream?
Please see the comments on this JIRA issue; they explain the documentation changes we are adding to describe how to configure dead-lettering in the message bus.
In addition, we have provided mechanisms such that, if all four modules are deployed to the same container (and to every container that matches the deployment criteria), we will directly connect the modules so that an error in the sink is thrown back to the source (causing the JMS message to be rolled back in your case).
This is achieved by setting the module count property to 0, meaning deploy on all containers that match the criteria (if any), or on all containers if there are no criteria.
This feature is available on master (it was added after M7).
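For reference, a deployment along those lines might look like the following; the stream name is a placeholder, and the exact property syntax should be checked against the deployment manifest documentation for your version:
stream deploy --name mystream --properties "module.*.count=0"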