Kafka Streams Change internal topic name - spring-boot

I'm using kafka-streams:2.7.0 in my Spring Boot application. I have to join two KStreams and change the internal topic name.
KStream messageToRetry = messageStream.join(
        keyToRetryStream,
        documentJoiner,
        JoinWindows.of(Duration.ofMinutes(30)),
        StreamJoined.with(Serdes.String(), docSerde, docSerde).withStoreName("test123"));
Internal topics follow the naming convention: ${applicationId}-<storename>-changelog
So my auto-generated topic name is:
kafka-streams-dev-KSTREAM-JOINTHIS-0000000004-store-changelog
I'm using StreamJoined.withStoreName("test123") to change the store name and I expected something like:
kafka-streams-dev-test123-store-changelog
But I got:
kafka-streams-dev-test123-other-join-store-changelog
Is StreamJoined.withStoreName() the right method to change the store name? Is there any documentation which explains when the other-join suffix is added?

The answer is in the Javadoc: https://kafka.apache.org/27/javadoc/org/apache/kafka/streams/kstream/StreamJoined.html#withStoreName-java.lang.String-
The names for the stores will be ${applicationId}-<storename>-this-join and ${applicationId}-<storename>-other-join.
However, the exact store name depends on which kind of join is used (e.g. other-join for an inner join vs. outer-other-join for an outer join).
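For reference, you can verify the exact generated store and changelog names by printing the topology description before starting the application. A minimal sketch (the topics, serdes and joiner here are placeholders, not taken from the question):

import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.StreamJoined;

public class DescribeJoinStores {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> messages = builder.stream("messages");
        KStream<String, String> keysToRetry = builder.stream("keys-to-retry");

        messages.join(keysToRetry,
                (left, right) -> left + right,               // placeholder joiner
                JoinWindows.of(Duration.ofMinutes(30)),
                StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String())
                        .withStoreName("test123"));

        // Prints every processor and state store, including the generated join stores
        // (e.g. test123-other-join-store), whose changelog topics are then prefixed
        // with the application.id and suffixed with -changelog.
        System.out.println(builder.build().describe());
    }
}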

Related

How to retrieve addresses using the OData endpoint

I'm trying to retrieve an address on a standard entity.
The request is the following:
https://mycrm.api.crm4.dynamics.com/api/data/v9.1/contacts(guid)
In this example, I get all fields, but the one that interests me is address1_addressid, which seems to be a Guid referencing some other record. However, I cannot find it in the "Many to One" relationship list I retrieved using the following command:
https://mycrm.api.crm4.dynamics.com/api/data/v9.1/EntityDefinitions(LogicalName='contact')?$select=LogicalName&$expand=ManyToOneRelationships($select=ReferencingAttribute,ReferencedEntity)
I would like to retrieve those addresses in a generic manner, as I'm working on a generic .NET Standard 2.0 library. I won't know in advance which entity I'll be working with, and thus won't be able to hard-code a list of address field names.
Here's one way to find the Contact to Address relationship:
https://myOrg.api.crm.dynamics.com/api/data/v9.1/EntityDefinitions(LogicalName='customeraddress')?$select=LogicalName&$expand=ManyToOneRelationships($select=ReferencingAttribute,ReferencedEntity)
I got an error with https://myOrg.api.crm.dynamics.com/api/data/v9.1/contacts(guid)?$expand=address1_addressid.
I looked into the metadata and discovered that address1_addressid has a type of Primary Key, while a normal lookup field has a type of Lookup.
Considering the error message that appears when attempting to expand address1_addressid, I think the issue is address1_addressid's data type.
Property 'address1_addressid' on type 'Microsoft.Dynamics.CRM.contact'
is not a navigation property or complex property. Only navigation
properties can be expanded.
It would seem that rather than using $expand to get the address details, you'd have to make a separate call for it:
https://myOrg.api.crm.dynamics.com/api/data/v9.1/customeraddresses(guid)
UPDATE
By reviewing the full Contact-Address relationship via this query: https://myOrg.api.crm.dynamics.com/api/data/v9.1/EntityDefinitions(LogicalName='customeraddress')?$select=LogicalName&$expand=ManyToOneRelationships,
I discovered that the ReferencedEntityNavigationPropertyName was set to Contact_CustomerAddress.
The earlier error message mentioned navigation properties, so I gave it a shot. Using that property name allowed me to expand the Address info from the Contact:
https://myOrg.api.crm.dynamics.com/api/data/v9.1/contacts(guid)?$expand=Contact_CustomerAddress
Interestingly, expanding that navigation property returns all 3 customer addresses.
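For completeness, here is a rough sketch of what that expand call could look like from code, using Java's built-in HttpClient purely for illustration. The org URL, GUID and bearer token are placeholders; the navigation property name is the one discovered above:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ExpandContactAddresses {
    public static void main(String[] args) throws Exception {
        String baseUrl = "https://myOrg.api.crm.dynamics.com/api/data/v9.1"; // placeholder org
        String contactId = "00000000-0000-0000-0000-000000000000";            // placeholder GUID
        String token = System.getenv("DYNAMICS_TOKEN");                       // OAuth bearer token

        HttpClient client = HttpClient.newHttpClient();

        // Expand the customer addresses via the ReferencedEntityNavigationPropertyName
        // discovered from the customeraddress ManyToOneRelationships metadata.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/contacts(" + contactId + ")?$expand=Contact_CustomerAddress"))
                .header("Authorization", "Bearer " + token)
                .header("Accept", "application/json")
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // contains the contact plus its expanded addresses
    }
}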

How to access DSL-created KTable/GlobalKTable using Processor API?

I am using a Processor API (PAPI) topology.
Is it possible to access a KTable (or GlobalKTable) created with DSL from within the Processor API (even if read-only)?
I.e., using:
val builder = new StreamsBuilder()
val KTable = builder.table("topicname")
I get a KTable, but the Topology only allows you to use addStateStore with a StoreBuilder, not the KTable itself.
.addStateStore(myStoreBuilder, MY_PROCESSOR_NAME)
So I could build one by doing this:
def keyValueStoreBuilder[K, V](storeName: String, keySerde: Serde[K], valueSerde: Serde[V]): StoreBuilder[KeyValueStore[K, V]] = {
  Stores.keyValueStoreBuilder(
    Stores.persistentKeyValueStore(storeName),
    keySerde,
    valueSerde)
}
But, how to cleanly obtain the storeName in this case?
When you create a KTable, it will automatically create a store internally with a generated name. (You can get the name via Topology#describe().) You can also assign a name to the store via the table() method using the Materialized parameter.
It's a little unclear to me what you mean by "access a KTable within the Processor API", though. If you mean "access the KTable's store within a Processor", you can use Topology#connectProcessorAndStateStores() to give the processor access to the store. Note that the processor should never write into the KTable store, as the table() operator is responsible for maintaining the table's state. If you do write into the store, there are no guarantees and you might lose data in case of a failure.
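To make that concrete, here is a rough Java sketch of the wiring described above. The topic, store and processor names are made up for illustration, serdes are omitted, and it relies on the detail that DSL tables are materialized as timestamped key-value stores:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.ProcessorSupplier;
import org.apache.kafka.streams.state.TimestampedKeyValueStore;
import org.apache.kafka.streams.state.ValueAndTimestamp;

public class TableLookupWiring {

    // A PAPI processor that reads (but never writes) the DSL table's store.
    static class LookupProcessor extends AbstractProcessor<String, String> {
        private TimestampedKeyValueStore<String, String> tableStore;

        @Override
        @SuppressWarnings("unchecked")
        public void init(final ProcessorContext context) {
            super.init(context);
            // DSL tables are materialized as timestamped key-value stores.
            tableStore = (TimestampedKeyValueStore<String, String>) context.getStateStore("my-table-store");
        }

        @Override
        public void process(final String key, final String value) {
            final ValueAndTimestamp<String> lookup = tableStore.get(key);
            // ...use the lookup result, e.g. enrich the record and forward it downstream
            context().forward(key, value + "|" + (lookup == null ? "?" : lookup.value()));
        }
    }

    public static Topology build() {
        final StreamsBuilder builder = new StreamsBuilder();
        // Name the table's store explicitly so we don't have to dig it out of Topology#describe().
        builder.table("topicname", Materialized.as("my-table-store"));

        final Topology topology = builder.build();
        topology.addSource("my-source", "input-topic");
        final ProcessorSupplier<String, String> supplier = LookupProcessor::new;
        topology.addProcessor("my-processor", supplier, "my-source");
        // Give the processor (read-only) access to the table's store.
        topology.connectProcessorAndStateStores("my-processor", "my-table-store");
        return topology;
    }
}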

Different serde for Kafka Streams KTable state store

As part of our application logic, we use a Kafka Streams state store for range lookups; the data is loaded from a Kafka topic using the builder.table() method.
The problem is that the source topic's key is serialised as JSON and doesn't suit the binary key comparisons used internally in the RocksDB-based state store.
We were hoping to use a separate serde for keys by passing it to Materialized.as(). However, it looks like the Streams implementation resets whatever is passed back to the original serdes used to read the table topic.
This is what I can see in the StreamsBuilder internals:
public synchronized <K, V> KTable<K, V> table(final String topic,
                                              final Consumed<K, V> consumed,
                                              final Materialized<K, V, KeyValueStore<Bytes, byte[]>> materialized) {
    Objects.requireNonNull(topic, "topic can't be null");
    Objects.requireNonNull(consumed, "consumed can't be null");
    Objects.requireNonNull(materialized, "materialized can't be null");
    materialized.withKeySerde(consumed.keySerde).withValueSerde(consumed.valueSerde);
    return internalStreamsBuilder.table(topic,
                                        new ConsumedInternal<>(consumed),
                                        new MaterializedInternal<>(materialized, internalStreamsBuilder, topic + "-"));
}
Does anybody know why it's done this way, and whether it's possible to use a different serde for a DSL state store?
Please don't propose using the Processor API; this route is well explored. I would like to avoid writing a processor and a custom state store every time I need to massage data before saving it into a state store.
After some digging through the Streams sources, I found out that I can pass a custom Materialized.as to a filter with an always-true predicate. But it smells a bit hackish.
This is my code, which unfortunately doesn't work as we hoped, because of the "serde reset" described above.
Serde<Value> valueSerde = new JSONValueSerde();
KTable<Key, Value> table = builder.table(
        tableTopic,
        Consumed.with(new JSONKeySerde(), valueSerde),
        Materialized.<Key, Value, KeyValueStore<Bytes, byte[]>>as(cacheStoreName)
                .withKeySerde(new BinaryComparisonsCompatibleKeySerde())
                .withValueSerde(valueSerde)
);
The code works by design. From a Streams point of view, there is no reason to use a different serde for the store than for reading the data from the topic, because it's known to be the same data. Thus, if one does not use the default serdes from StreamsConfig, it's sufficient to specify the serde once (in Consumed), and it's not required to specify it in Materialized again.
For your special case, you could read the topic as a stream and do a "dummy aggregation" that just returns the latest value per record (instead of computing an actual aggregate). This allows you to specify a different serde for the result type.
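A sketch of that dummy-aggregation approach, reusing the hypothetical types and serdes from the question's snippet (the reduce simply keeps the newest value per key, which reproduces table semantics while letting Materialized control the store serdes):

Serde<Value> valueSerde = new JSONValueSerde();

KTable<Key, Value> table = builder
        .stream(tableTopic, Consumed.with(new JSONKeySerde(), valueSerde))
        // the key is unchanged, so no repartitioning happens here
        .groupByKey(Grouped.with(new JSONKeySerde(), valueSerde))
        .reduce(
                (oldValue, newValue) -> newValue,   // keep the latest value per key, i.e. table semantics
                Materialized.<Key, Value, KeyValueStore<Bytes, byte[]>>as(cacheStoreName)
                        .withKeySerde(new BinaryComparisonsCompatibleKeySerde())
                        .withValueSerde(valueSerde));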

Simple MQ pub sub defining topic and topic string

I am using WebSphere MQ 7.1. I want to set up pub/sub and I need to define a topic
like "DEPARTMENT" with the following structure:
DEPARTMENT
    SUBJECT1
    SUBJECT2
        Minor1
E.g. I define the first one like this:
define TOPIC(DEPARTMENT) TOPICSTR('SUBJECT1')
but I hit an error when I try to define SUBJECT2:
define TOPIC(DEPARTMENT) TOPICSTR('SUBJECT2')
it says "Object already exists". How to remedy. thanks
TOPIC objects are unique, hence the same topic object can't be defined again. Topic objects are meant for administration, while topic strings are used for publishing messages and subscribing to publications. Because you are using the same DEPARTMENT object name to define another topic, you get the error.
You can do it this way:
define TOPIC(DEPSUB1) TOPICSTR('DEPARTMENT/SUBJECT1')
define TOPIC(DEPSUB2) TOPICSTR('DEPARTMENT/SUBJECT2')
define TOPIC(DEPSUB3) TOPICSTR('DEPARTMENT/SUBJECT2/Minor1')
Later, for receiving publications, you can use the following sample topic strings:
"#" -> Receive all publications
"DEPARTMENT/#" -> Every publication under 'DEPARTMENT' topic
"DEPARTMENT/+/Minor1" -> All publications on 'Minor1' irrespective of SUBJECTs.

Hadoop Cascading: CascadeException "no loops allowed in cascade" when cogrouping pipes twice

I'm trying to write a Cascading (v1.2) cascade (http://docs.cascading.org/cascading/1.2/userguide/htmlsingle/#N20844) consisting of two flows:
1) The first flow outputs URLs to a DB table, in which they are automatically assigned IDs via an auto-incrementing id column.
This flow also outputs pairs of URLs into a SequenceFile with the field names "urlTo" and "urlFrom".
2) The second flow reads from both these sources and tries to do a CoGroup on "urlTo" (from the SequenceFile) and "url" (from the DB source) to get the DB record "id" for each "urlTo".
It then does a CoGroup on "urlFrom" and "url" to get the DB record "id" for each "urlFrom".
The two flows work individually if I call flow.complete() on the first before running the second flow. But if I put the two flows in a Cascade object, I get the error
cascading.cascade.CascadeException: no loops allowed in cascade, flow: urlLink*url*url, source: JDBCTap{connectionUrl='jdbc:mysql://localhost:3306/mydb', driverClassName='com.mysql.jdbc.Driver', tableDesc=TableDesc{tableName='urls', columnNames=null, columnDefs=null, primaryKeys=null}}, sink: JDBCTap{connectionUrl='jdbc:mysql://localhost:3306/mydb', driverClassName='com.mysql.jdbc.Driver', tableDesc=TableDesc{tableName='url_link', columnNames=[urlLinkFrom, urlLinkTo], columnDefs=[bigint(20), bigint(20)], primaryKeys=[urlLinkFrom, urlLinkTo]}}
on trying to configure the cascade.
I can see it's coming from the addEdgeFor function of the CascadeConnector but I'm not clear on how to resolve this problem.
I've never used Cascade / CascadeConnector before. Is there something I'm missing?
It seems like some of your source and sink paths are the same.
A Cascade uses a directed graph to build itself, so if you have a flow source and a sink pointing to the same location, that in essence creates a loop, which is disallowed in a directed graph, since
it does not go from:
Source Location A to Sink Location B
but instead goes from:
Source Location A to Sink Location A.
"A Tap is not given an explicit name by design. This is so a given Tap instance can be re-used in different {#link Flow}s that may expect a source or sink by a different logical name, but are the same physical resource."
"In general, two instances of the same Tap class must have differing Identifiers (and different #equals)."
It turns out that JDBCTaps generate their identifier from the connection URL alone (and do not include the table name). So, as I was reading from one table and writing to a different table in the same database, it looked like I was reading from and writing to the same Tap, which caused the loop.
As a work-around, I'm going to subclass the JDBCTap and override the getIdentifier() method to include the table name.
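Something along these lines is what I have in mind. The constructor below is illustrative only: mirror whichever JDBCTap constructor you actually use and pass the table name in explicitly, so no assumptions are made about TableDesc accessors:

import cascading.jdbc.JDBCScheme;
import cascading.jdbc.JDBCTap;
import cascading.jdbc.TableDesc;

public class TableAwareJDBCTap extends JDBCTap {

    private final String tableName;

    // Illustrative constructor only: match the JDBCTap constructor you already use.
    public TableAwareJDBCTap(String connectionUrl, String driverClassName,
                             TableDesc tableDesc, JDBCScheme scheme, String tableName) {
        super(connectionUrl, driverClassName, tableDesc, scheme);
        this.tableName = tableName;
    }

    @Override
    public String getIdentifier() {
        // Include the table name so two taps on the same database but different
        // tables no longer look like the same resource to the CascadeConnector.
        return super.getIdentifier() + "/" + tableName;
    }
}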
