I have two streams like below, and the transport is RabbitMQ:
stream1 source|processor|processor>namedchannel
stream2 namedchannel>processor|processor|sink
I see that a lot of messages are clogged in namedchannel, and I would like to retrieve messages from the named channel faster. For the source and processor modules I increased concurrency, but I am not sure what should be done for the named channel.
Set the concurrency on the first module (the processor right after the named channel) of the consuming stream; for example:
xd:>stream create foo --definition "time > queue:foo" --deploy
Created and deployed new stream 'foo'
xd:>stream create bar --definition "queue:foo > log"
Created new stream 'bar'
xd:>stream deploy bar --properties module.log.consumer.concurrency=2
Deployed stream 'bar'
Note that you currently can't set concurrency on a topic: named channel; that is fixed in 1.3.1, due soon.
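Applied to the streams in the question, that means deploying stream2 with the concurrency property on its first processor (a sketch; "p1" stands in for whatever the first processor module is actually named or labeled):
xd:>stream deploy stream2 --properties module.p1.consumer.concurrency=4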
I am creating a basic setup in SCDF (Local Server 1.7.3) in which I am configuring two streams:
1. HTTP -> Kafka Topic
2. Kafka Topic -> HDFS
Streams:
stream create --name ingest_from_http --definition "http --port=8000 --path-pattern=/test > :streamtest1"
stream deploy --name ingest_from_http --properties "app.http.spring.cloud.stream.bindings.output.producer.headerMode=raw"
stream create --name ingest_to_hdfs --definition ":streamtest1 > hdfs --fs-uri=hdfs://<host>:8020 --directory=/tmp/hive/sensedev/streamdemo/ --file-extension=xml --spring.cloud.stream.bindings.input.consumer.headerMode=raw"
I have created a Hive managed table at location /tmp/hive/sensedev/streamdemo/:
DROP TABLE IF EXISTS gwdemo.xml_test;
CREATE TABLE gwdemo.xml_test(
id int,
name string
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.id"="/body/id/text()",
"column.xpath.name"="/body/name/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/tmp/hive/sensedev/streamdemo'
TBLPROPERTIES (
"xmlinput.start"="<body>",
"xmlinput.end"="</body>")
;
Testing:
To verify that Hive is able to read XML, I put an XML file in the location
/tmp/hive/sensedev/streamdemo.
File Content: <body><id>1</id><name>Test1</name></body>
On running a SELECT command on the table, it showed the above record properly.
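The check was along these lines (a sketch; the table name comes from the DDL above, and exact Hive CLI output may differ):
hive> SELECT * FROM gwdemo.xml_test;
OK
1	Test1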
When posting a record with http post in SCDF, I get the proper data in the
Kafka consumer, but when I check HDFS, the XML files are being created,
yet those files contain raw messages instead of the XML.
Example:
dataflow>http post --target http:///test
--data "<body><id>2</id><name>Test2</name></body>" --contentType application/xml
In Kafka Console Consumer, I am able to read proper XML message: <body><id>2</id><name>Test2</name></body>
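That check was along these lines (a sketch; the broker address is an assumption, and the topic name follows the :streamtest1 destination):
$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic streamtest1 --from-beginning
<body><id>2</id><name>Test2</name></body>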
$ hdfs dfs -cat /tmp/hive/sensedev/streamdemo/hdfs-sink-2.xml
[B@31d94539
Questions:
1. What am I missing? How can I get proper XML records in the newly created XML files in HDFS?
The HDFS sink expects a Java serialized object.
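One possible work-around (a sketch only; spring.cloud.stream.bindings.input.contentType is the standard Spring Cloud Stream content-type setting, and using it here is an assumption, not something the answer above confirms) is to declare the sink's input content type so the raw byte[] payload is converted to a String before it is written:
stream create --name ingest_to_hdfs --definition ":streamtest1 > hdfs --fs-uri=hdfs://<host>:8020 --directory=/tmp/hive/sensedev/streamdemo/ --file-extension=xml --spring.cloud.stream.bindings.input.contentType=text/plain"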
I have created a composite module:
module compose common-module --definition "kafka --topic=topic1 --outputType=text/plain | shell --command='script1.sh' "
I then created a stream using this module:
stream create stream1 --definition "common-module > queue:job:job1"
And I got the following error:
Command failed org.springframework.xd.rest.client.impl.SpringXDException:
Error with option(s) for module common-module of type source:
command: may not be null
command: may not be empty
Does anyone know what's going on? Thanks!
It's a bug; I opened a JIRA issue.
The only work-around I can think of (short of creating a custom shell module - see the JIRA) is to pass in the script again...
stream create stream1 --definition "common-module --shell.script=script1.sh > queue:job:job1"
I am attempting to use aggregated events to allow successful completion of one job to kick off another. The problem is that I get a "Class cannot be created (missing no-arg constructor)" exception for JobExecution.
Here are the attempted taps:
stream create --name trigger_myjob --definition "tap:job:prerequisitejob.job > filter --expression=payload.getExitStatus.equals(T(org.springframework.batch.core.ExitStatus).COMPLETED) > queue:job:myjob" --deploy
stream create --name debug_trigger --definition "tap:job:prerequisitejob.job > log --name=TX.DEBUG --expression=payload.getExitStatus"
stream create --name debug_harder_trigger --definition "tap:job:prerequisitejob.job > log"
In each case, I get a stack trace indicating that the rabbit listener failed to create the message, ultimately caused by:
Caused by: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.springframework.batch.core.JobExecution
at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1050)
at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1062)
at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:228)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:217)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
at org.springframework.xd.dirt.integration.bus.serializer.kryo.PojoCodec.doDeserialize(PojoCodec.java:41)
at org.springframework.xd.dirt.integration.bus.serializer.kryo.AbstractKryoMultiTypeCodec$1.execute(AbstractKryoMultiTypeCodec.java:63)
at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.run(KryoPoolQueueImpl.java:43)
at org.springframework.xd.dirt.integration.bus.serializer.kryo.AbstractKryoMultiTypeCodec.deserialize(AbstractKryoMultiTypeCodec.java:60)
at org.springframework.xd.dirt.integration.bus.serializer.kryo.PojoCodec.deserialize(PojoCodec.java:30)
at org.springframework.xd.dirt.integration.bus.serializer.CompositeCodec.deserialize(CompositeCodec.java:72)
at org.springframework.xd.dirt.integration.bus.serializer.CompositeCodec.deserialize(CompositeCodec.java:78)
at org.springframework.xd.dirt.integration.bus.MessageBusSupport.deserializePayload(MessageBusSupport.java:588)
at org.springframework.xd.dirt.integration.bus.MessageBusSupport.deserializePayload(MessageBusSupport.java:573)
at org.springframework.xd.dirt.integration.bus.MessageBusSupport.deserializePayloadIfNecessary(MessageBusSupport.java:556)
at org.springframework.xd.dirt.integration.rabbit.RabbitMessageBus.access$600(RabbitMessageBus.java:101)
at org.springframework.xd.dirt.integration.rabbit.RabbitMessageBus$ReceivingHandler.handleRequestMessage(RabbitMessageBus.java:748)
at org.springframework.integration.handler.AbstractReplyProducingMessageHandler.handleMessageInternal(AbstractReplyProducingMessageHandler.java:99)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:78)
... 37 more
What is causing this exception, and how do I fix it? Is there any other way to go about having jobs trigger one another in spring-xd?
I found out after asking this that our architect had manually replaced jars, Kryo among them. The behavior observed will not happen in a vanilla Spring XD installation.
I installed spring-xd-1.2.1.RELEASE and started Spring XD in single-node mode (xd-singlenode). When I type the following command:
xd:>stream create --definition "time | log" --name ticktock --deploy
I get the following result:
Command failed org.springframework.xd.rest.client.impl.SpringXDException: Could not find module with name 'log' and type 'sink'
When I type the following command:
xd:> module list
I get the following result:
Source          Processor   Sink                 Job
gemfire                     gemfire-json-server  filejdbc
gemfire-cq                  gemfire-server       hdfsjdbc
jdbc                        jdbc                 jdbchdfs
kafka                       rabbit               sqoop
rabbit                      redis
twittersearch
twitterstream
Are some default modules missing? What happened? Is there any other configuration to set before starting Spring XD?
Check XD_HOME/modules/sink/log - does that folder exist?
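For example (assuming XD_HOME points at your Spring XD installation):
$ ls $XD_HOME/modules/sink/log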
I have installed Spring XD version 1.1.0 on a CentOS machine. Using xd-singlenode, I want to connect it to a SQL Server database via the jdbc source and put the data into a file.
I created some streams as follows:
1) xd:>stream create connectiontest --definition "jdbc --url=jdbc:sqlserver://sqlserverhost:1433/SampleDatabase --username=sample --password=***** --query= 'SELECT * FROM schema.tablename' |file" --deploy
2) xd:>stream create connectiontest --definition "jdbc --connectionProperties=jdbc:sqlserver://sqlserverhost:1433/SampleDatabase --username=sample --password=***** --initSQL= 'SELECT * FROM schema.tablename' |file" --deploy
Every time I deploy the stream, it gives the following error:
Command failed org.springframework.xd.rest.client.impl.SpringXDException: Multiple top level module resources found :file [/opt/pivotal/spring-xd-1.1.0.RELEASE/xd/config/jms-hornetq.properties],file [/opt/pivotal/spring-xd-1.1.0.RELEASE/xd/config/hadoop.properties],file [/opt/pivotal/spring-xd-1.1.0.RELEASE/xd/config/xd-admin-logger.properties],file [/opt/pivotal/spring-xd-1.1.0.RELEASE/xd/config/xd-singlenode-logger.properties],file [/opt/pivotal/spring-xd-1.1.0.RELEASE/xd/config/xd-container-logger.properties],file [/opt/pivotal/spring-xd-1.1.0.RELEASE/xd/config/jms-activemq.properties],file [/opt/pivotal/spring-xd-1.1.0.RELEASE/xd/config/httpSSL.properties]
Earlier I had set springxd_home pointing to my Spring XD directory. After removing that path, it is working fine now.
Thanks for the support.
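In shell terms, the fix was something like this (a sketch, assuming the variable was exported in the environment under the name given above):
$ unset springxd_home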