how to send json data stream to multiple topics in kafka based on input fields - spring-boot

I have to consume JSON data coming to a Kafka stream and send it to different topics (one per distinct combination of app id and entity) for further consumption.
Topic names:
app1.entity1
app1.entity2
app2.entity1
app2.entity2
JSON data
[
  {
    "appId": "app1",
    "entity": "entity1",
    "extractType": "txn",
    "status": "success",
    "fileId": "21151235"
  },
  {
    "appId": "app1",
    "entity": "entity2",
    "extractType": "txn",
    "status": "fail",
    "fileId": "2134234123"
  },
  {
    "appId": "app2",
    "entity": "entity3",
    "extractType": "payment",
    "status": "success",
    "fileId": "2312de23e"
  },
  {
    "appId": "app2",
    "entity": "entity3",
    "extractType": "txn",
    "status": "fail",
    "fileId": "asxs3434"
  }
]
TestInput.java
public class TestInput {
    private String appId;
    private String entity;
    private String extractType;
    private String status;
    private String fileId;
    // getters/setters
}
SpringBootConfig.java
@Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
public KafkaStreamsConfiguration kStreamsConfigs(KafkaProperties kafkaProperties) {
    Map<String, Object> config = new HashMap<>();
    config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaProperties.getBootstrapServers());
    config.put(StreamsConfig.APPLICATION_ID_CONFIG, kafkaProperties.getClientId());
    config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, new JsonSerde<>(TestInput.class).getClass());
    config.put(JsonDeserializer.DEFAULT_KEY_TYPE, String.class);
    config.put(JsonDeserializer.DEFAULT_VALUE_TYPE, TestInput.class);
    return new KafkaStreamsConfiguration(config);
}
@Bean
public KStream<String, TestInput> kStream(StreamsBuilder kStreamBuilder) {
    KStream<String, TestInput> stream = kStreamBuilder.stream(inputTopic);
    // how to form key, group records and send to different topics?
    return stream;
}
I searched a lot but didn't find anything close that publishes data to topics dynamically. Please help, experts.

Use stream.branch()
See https://www.confluent.io/blog/putting-events-in-their-place-with-dynamic-routing/
Next, let’s modify the requirement. Instead of processing all events in the stream, each microservice should take action only on a subset of relevant events. One way to handle this requirement is to have a microservice that subscribes to the original stream with all the events, examines each record and then takes action only on the events it cares about while discarding the rest. However, depending on the application, this may be undesirable or resource intensive.
A cleaner way is to provide the service with a separate stream that contains only the relevant subset of events that the microservice cares about. To achieve this, a streaming application can branch the original event stream into different substreams using the method KStream#branch(). This results in new Kafka topics, so then the microservice can subscribe to one of the branched streams directly.
...
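For the dynamic part of the question (deriving the topic name from appId and entity), note that KStream#to() is also overloaded to take a TopicNameExtractor, so each record can choose its destination topic at runtime. A minimal sketch, assuming the TestInput getters shown above and that all target topics already exist on the broker:

@Bean
public KStream<String, TestInput> kStream(StreamsBuilder kStreamBuilder) {
    KStream<String, TestInput> stream = kStreamBuilder.stream(inputTopic);
    // Route each record to "<appId>.<entity>", e.g. app1.entity1.
    // Kafka Streams does not create these topics for you.
    stream.to((key, value, recordContext) -> value.getAppId() + "." + value.getEntity());
    return stream;
}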

Related

GraphQL java: return a partial response and inform a user about it

I have a SpringBoot application that uses GraphQL to return data to a request.
What I have
One of my queries returns a list of responses based on a list of ids supplied. So my .graphqls file is as follows:
type Query {
    texts(ids: [String]): [Response]
}

type Response {
    id: String
    text: String
}
and the following are request & response:
Request
{
  texts(ids: ["id 1", "id 2"]) {
    id
    text
  }
}
Response
{
  "data": [
    {
      "id": "id 1",
      "text": "Text 1"
    },
    {
      "id": "id 2",
      "text": "Text 2"
    }
  ]
}
At the moment, if the id(s) are not in AWS, an exception is thrown and the response is an error block saying that certain id(s) were not found. Unfortunately, the response for the other ids that were found is not displayed - instead the data block returns null. If I check whether the data is present in the code via, say, an if/else statement, then a partial response can be returned, but I will not know that it is a partial response.
What I want to happen
My application fetches the data from AWS, and occasionally some of it may not be present, meaning that for one of the supplied ids there will be no data. Not a problem; I can do checks and simply never process that id. But I would like to inform the user when the response I return is partial (and some info is missing due to the absence of data).
See example of the output I want at the end.
What I tried
While learning about GraphQL, I encountered instrumentation - a great tool for logging. Since it goes through all stages of execution, I thought I could try to change the response midway. The Instrumentation class has a lot of methods, so I tried to find one that works: I tried to make beginExecution(InstrumentationExecutionParameters parameters) and instrumentExecutionResult(ExecutionResult executionResult, InstrumentationExecutionParameters parameters) work, but neither did what I needed.
I think the below may work, but as the comments suggest, there are parts that I failed to figure out:
@Override
public GraphQLSchema instrumentSchema(GraphQLSchema schema, InstrumentationExecutionParameters parameters) {
    String id = ""; // how to extract an id from the passed query (without needing to dissect parameters.getQuery())?
    log.info("The id is " + id);
    if (s3Service.doesExist(id)) {
        return super.instrumentSchema(schema, parameters);
    }
    schema.transform(); // how would I add an extra field?
    return schema;
}
I also found this post, which seems to offer a simpler solution. Unfortunately, the link provided by the author no longer exists, and the link provided by the person who answered the question is very brief. I wonder if anyone knows how to use this annotation and maybe has an example I can look at?
Finally, I know there is DataFetcherResult, which can construct a partial response. The problem here is that some of my other apps use reactive programming, so while it would be great for Spring MVC apps, it would not be so great for Spring WebFlux apps (because, as I understand it, DataFetcherResult waits for all the outputs and as such is blocking). Happy to be corrected on this one.
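For reference, a minimal DataFetcherResult sketch for the blocking case (fetchText() here is a hypothetical lookup that returns null for missing ids); it returns the found entries as data and reports the missing ids in the errors block:

public DataFetcherResult<List<Response>> texts(List<String> ids, DataFetchingEnvironment env) {
    List<Response> found = new ArrayList<>();
    List<String> missing = new ArrayList<>();
    for (String id : ids) {
        Response response = fetchText(id); // hypothetical lookup, null when absent
        if (response != null) {
            found.add(response);
        } else {
            missing.add(id);
        }
    }
    DataFetcherResult.Builder<List<Response>> result = DataFetcherResult.<List<Response>>newResult().data(found);
    if (!missing.isEmpty()) {
        // attaches a GraphQL error alongside the partial data
        result.error(GraphqlErrorBuilder.newError(env)
                .message("There was a problem getting data for these id(s): " + missing)
                .build());
    }
    return result.build();
}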
Desired output
I would like my response to look like so, when some data that was requested is not found.
Either
{
  "data": [
    {
      "id": "id 1",
      "text": "Text 1"
    },
    {
      "id": "id 2",
      "text": "Text 2"
    },
    {
      "id": "Non existant id",
      "msg": "This id was not found"
    }
  ]
}
or
{
  "errors": [
    {
      "message": "There was a problem getting data for this id(s): Bad id 1"
    }
  ],
  "data": [
    {
      "id": "id 1",
      "text": "Text 1"
    },
    {
      "id": "id 2",
      "text": "Text 2"
    }
  ]
}
So I figured out one way of achieving this, using instrumentation and the extensions block (as opposed to the errors block, which is what I wanted to use initially). Big thanks go to fellow user Joe, who answered this question. Combine it with the DataFetchingEnvironment (great video here) and I got a working solution.
My instrumentation class is as follows
public class CustomInstrum extends SimpleInstrumentation {

    @Override
    public CompletableFuture<ExecutionResult> instrumentExecutionResult(
            ExecutionResult executionResult,
            InstrumentationExecutionParameters parameters) {
        if (parameters.getGraphQLContext().hasKey("Faulty ids")) {
            Map<Object, Object> currentExt = executionResult.getExtensions();
            Map<Object, Object> newExtensionMap = new LinkedHashMap<>();
            newExtensionMap.putAll(currentExt == null ? Collections.emptyMap() : currentExt);
            newExtensionMap.put("Warning:", "No data was found for the following ids: "
                    + parameters.getGraphQLContext().get("Faulty ids").toString());
            return CompletableFuture.completedFuture(
                    new ExecutionResultImpl(
                            executionResult.getData(),
                            executionResult.getErrors(),
                            newExtensionMap));
        }
        return CompletableFuture.completedFuture(
                new ExecutionResultImpl(
                        executionResult.getData(),
                        executionResult.getErrors(),
                        executionResult.getExtensions()));
    }
}
and the DataFetchingEnvironment is used in my resolver:
public CompletableFuture<List<Article>> articles(List<String> ids, DataFetchingEnvironment env) {
    List<CompletableFuture<Article>> res = new ArrayList<>();
    // Below's list would contain the bad ids
    List<String> faultyIds = new ArrayList<>();
    for (String id : ids) {
        log.info("Getting article for id {}", id);
        if (s3Service.doesExist(id)) {
            res.add(filterService.gettingArticle(id));
        } else {
            faultyIds.add(id); // if data doesn't exist then id will not be processed
        }
    }
    // if we have any bad ids, then we add the list to the context for instrumentations
    // to pick it up, right before returning a response
    if (!faultyIds.isEmpty()) {
        env.getGraphQlContext().put("Faulty ids", faultyIds);
    }
    return CompletableFuture.allOf(res.toArray(new CompletableFuture[0]))
            .thenApply(item -> res.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList()));
}
You can obviously separate the error-related ids into different context entries, but for my simple case one will suffice. I am, however, still interested in how the same result can be achieved via the errors block, so I will leave this question open for a bit before accepting this as the final answer.
My response looks as follows now:
{
  "extensions": {
    "Warning:": "No data was found for the following ids: [234]"
  },
  "data": { ... }
}
My only concern with this approach is security and "doing the right thing" - is it correct to add something to the context and then use instrumentation to influence the response? Are there any potential security issues? If someone knows anything about this and could share, it would help me greatly!
Update
After further testing, it appears that this still does not work if an exception is thrown: it only works if you know beforehand that something may go wrong and add the appropriate handling. It cannot be used with a try/catch block. So I am half a step back again.

Jackson deserialization with SpringBoot: to get field names present in the request along with the respective field mapping

I have a requirement to throw different errors for different scenarios like the ones below, and there are many such fields, not just one.
e.g.
{
  "id": 1,
  "name": "nameWithSpecialChar$"
}
Here it should throw an error for the special character.
{
  "id": 1,
  "name": null
}
Here it should throw a field-null error.
{
  "id": 1
}
Here it should throw a field-missing error.
Handling the 1st and 2nd scenarios is easy, but for the 3rd one: is there any way to get, at deserialization time with Jackson, a list of the names of the fields that were present in the input JSON?
One way I am able to do it is by mapping the request to a JsonNode, checking whether nodes are present for the required fields, then deserializing the JsonNode manually and validating the rest of the members, as below.
public ResponseEntity myGetRequest(@RequestBody JsonNode requestJsonNode) {
    if (!requestJsonNode.has("name")) {
        // throw the field-missing error here
    }
    MyRequest request = objectMapper.convertValue(requestJsonNode, MyRequest.class);
    validateIfFieldsAreInvalid(request);
    ...
}
But I do not like this approach; is there any other way of doing it?
You can define a JSON schema and validate your object against it. In your example, your schema may look like this:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "id": {
      "description": "The identifier",
      "type": "integer"
    },
    "name": {
      "description": "The item name",
      "type": "string",
      "pattern": "^[a-zA-Z]*$"
    }
  },
  "required": [ "id", "name" ]
}
To validate your object, you could use the json-schema-validator library. This library is built on Jackson. Since you're using Spring Boot anyway, you already have Jackson imported.
The example code looks more or less like this:
String schema = "<define your schema here>";
String data = "<put your data here>";
JsonSchemaFactory factory = JsonSchemaFactory.byDefault();
ObjectMapper m = new ObjectMapper();
JsonSchema jsonSchema = factory.getJsonSchema(m.readTree(schema));
JsonNode json = m.readTree(data);
ProcessingReport report = jsonSchema.validate(json);
System.out.println(report);
The report includes detailed errors for different input cases. For example, with this input
{
  "id": 1,
  "name": "nameWithSpecialChar$"
}
this output is printed out
--- BEGIN MESSAGES ---
error: ECMA 262 regex "^[a-zA-Z]*$" does not match input string "nameWithSpecialChar$"
level: "error"
schema: {"loadingURI":"#","pointer":"/properties/name"}
instance: {"pointer":"/name"}
domain: "validation"
keyword: "pattern"
regex: "^[a-zA-Z]*$"
string: "nameWithSpecialChar$"
--- END MESSAGES ---
Or instead of just printing out the report, you can loop through all errors and have your specific logic
for (ProcessingMessage message : report) {
    // Add your logic here
}
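As a sketch of what that logic might look like, keyed off the keyword field shown in the report above (the two exception types are hypothetical placeholders for your own errors):

for (ProcessingMessage message : report) {
    JsonNode details = message.asJson(); // structured report entry (keyword, pointer, ...)
    String keyword = details.path("keyword").asText();
    if ("required".equals(keyword)) {
        // a required property such as "name" was missing from the input
        throw new FieldMissingException(details.path("missing").toString());
    } else if ("pattern".equals(keyword)) {
        // a value did not match the schema's regex
        throw new InvalidCharacterException(details.path("string").asText());
    }
}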
You could check the example code to gain more information about how to use the library.

Spring cloud contracts with generic api

How do I use Spring Cloud Contract with a generic API? I'm asking about REST contracts on the producer service. Consider an example: I have a service which stores user data in different formats in a database and acts like a proxy between a service and the database. It has parameters required for all consumers, and parameters which depend on the consumer.
class Request<T> {
    Long requestId;
    String documentName;
    T documentContent;
}
And it has two consumers.
Consumer 1:
{
  "requestId": 1,
  "documentName": "login-events",
  "documentContent": {
    "userId": 2,
    "sessionId": 3
  }
}
Consumer 2:
{
  "requestId": 1,
  "documentName": "user-details",
  "documentContent": {
    "userId": 2,
    "name": "Levi Strauss",
    "age": 11
  }
}
As you can see, documentContent depends on the consumer. I want to write contracts which will check the content of this field on the consumer side and ignore it on the producer side. Options like
"documentContent": ["age": $(consumer(11))] //will produce .field(['age']").isEqualTo(11)
and
"documentContent": ["age": $(consumer(11), producer(optional(anInteger())))] //will require field presence
didn't work. Of course I could write "documentContent": [] or even ignore this field in the contracts, but I want them to act like REST API documentation. Does anybody have ideas how to solve this?
Ignore the optional element and define 2 contracts: one with the age value and one without it. The one with the age value should also contain a priority field. You can read about priority here: https://cloud.spring.io/spring-cloud-static/spring-cloud-contract/2.2.0.RELEASE/reference/html/project-features.html#contract-dsl-http-top-level-elements
It would look more or less like this (contract in YAML):
priority: 5 # lower value of priority == higher priority
request:
  ...
  body:
    documentContent:
      age: 11
response:
  ...
and then the less concrete case (in YAML)
priority: 50 # higher value of priority == lower priority
request:
  ...
  body:
    documentContent:
      # no age
response:
  ...
I found a solution that is more applicable to my case (Groovy code):
def documentContent = [
    "userId": 2,
    "sessionId": 3
]

Contract.make {
    response {
        body(
            [
                ............
                "documentContent": $(consumer(documentContent), producer(~/.+/)),
                ............
            ]
        )
    }
}
But please take into consideration that I stubbed the documentContent value with a String ("documentContent") in the producer contract test.

How do I update MongoDB query results using inner query?

BACKGROUND
I have a collection of JSON documents that represent chemical compounds. A compound has an id and a name. An external process generates new compound documents at intervals, and ids may change across iterative generations. Compound documents whose ids have changed need to be updated to point to the most recent iteration's ids, and as such a "lastUpdated" field and a "relatedCompoundIds" field will be added. To demonstrate, consider the following compounds across 3 steps:
Step 1: initial compound document for 'acetone' is generated with id="001".
{
  "id": "001",
  "name": "acetone",
  "lastUpdated": "2000-01-01"
}
Step 2: another iteration generates acetone, but with a different id.
{
  "id": "001",
  "name": "acetone",
  "lastUpdated": "2000-01-01"
}
{
  "id": "002",
  "name": "acetone",
  "lastUpdated": "2000-01-02"
}
Step 3: compound with id of "001" will append a "relatedCompoundIds" array pointing to any other compounds with the same name.
{
  "id": "001",
  "name": "acetone",
  "lastUpdated": "2000-01-02",
  "relatedCompoundIds": ["002"]
}
{
  "id": "002",
  "name": "acetone",
  "lastUpdated": "2000-01-02"
}
I'm using MongoDB to house these records and to resolve the relatedCompoundIds "pointers". I'm accessing Mongo using Spring's ReactiveMongoTemplate. My process is as follows:
1. Upsert newly generated compounds into MongoDB.
2. For each record where "lastUpdated" is before now: get all related compounds (searching by name), and set "relatedCompoundIds".
CODE
public class App {

    private static final ReactiveMongoTemplate mongoOps =
            new ReactiveMongoTemplate(MongoClients.create(), "CompoundStore");

    public static void main(String[] args) {
        Date updatedDate = new Date();
        upsertAll(updatedDate, readPath);
        setRelatedCompounds(updatedDate);
    }

    private static void upsertAll(Date updatedDate, String readPath) {
        // [upsertion code here] <- this is working fine
    }

    private static void setRelatedCompounds(Date updatedDate) {
        mongoOps.find(
                Query.query(Criteria.where("lastUpdated").lt(updatedDate)), Compound.class, "compound")
            .doOnNext(compound -> {
                findRelatedCompounds(updatedDate, compound)
                    .doOnSuccess(rc -> {
                        if (rc.size() > 0) {
                            compound.setRelatedCompoundIDs(rc);
                            mongoOps.save(Mono.just(compound)).subscribe();
                        }
                    })
                    .subscribe();
            })
            .blockLast();
    }

    private static Mono<List<String>> findRelatedCompounds(Date updatedDate, Compound compound) {
        Query query = new Query().addCriteria(new Criteria().andOperator(
                Criteria.where("lastUpdated").gte(updatedDate),
                Criteria.where("name").is(compound.getName())));
        query.fields().include("id");
        return mongoOps.find(query, Compound.class)
                .map(c -> c.getId())
                .filter(cid -> !StringUtils.isEmpty(cid))
                .distinct()
                .collectSortedList();
    }
}
ERROR
Upon running, I get the following error:
17:08:35.957 [Thread-12] ERROR org.mongodb.driver.client - Callback onResult call produced an error
com.mongodb.MongoException: org.springframework.data.mongodb.UncategorizedMongoDbException: Too many operations are already waiting for a connection. Max number of operations (maxWaitQueueSize) of 500 has been exceeded.; nested exception is com.mongodb.MongoWaitQueueFullException: Too many operations are already waiting for a connection. Max number of operations (maxWaitQueueSize) of 500 has been exceeded.
at com.mongodb.MongoException.fromThrowableNonNull(MongoException.java:79)
Is there a better way to accomplish what I'm trying to do?
How do I adjust the backpressure so as not to overload Mongo?
Other advice?
EDIT
The above error can be resolved by adding a limitRate modifier after the find method inside setRelatedCompounds.
private static void setRelatedCompounds(Date updatedDate) {
    mongoOps.find(
            Query.query(Criteria.where("lastUpdated").lt(updatedDate)), Compound.class, "compound")
        .limitRate(500)
        .doOnNext(compound -> {
            // do work here
        })
        .blockLast();
}
Still open to suggestions for alternative solutions.
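One possible restructuring (a sketch, not tested against the original setup) is to drop the nested subscribe() calls and let a single flatMap with a concurrency argument drive the per-compound work, so backpressure reaches the driver instead of flooding the connection pool:

private static void setRelatedCompounds(Date updatedDate) {
    mongoOps.find(
            Query.query(Criteria.where("lastUpdated").lt(updatedDate)), Compound.class, "compound")
        // at most 8 compound updates in flight at any time
        .flatMap(compound ->
            findRelatedCompounds(updatedDate, compound)
                .filter(rc -> !rc.isEmpty())
                .flatMap(rc -> {
                    compound.setRelatedCompoundIDs(rc);
                    return mongoOps.save(compound);
                }), 8)
        .blockLast();
}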

Spring Integration Java DSL: How to loop the paged Rest service?

How do I loop over a paged REST service with the Java DSL Http.outboundGateway method?
The rest URL is for example
http://localhost:8080/people?page=3
and it returns for example
"content": [
{"name": "Mike",
"city": "MyCity"
},
{"name": "Peter",
"city": "MyCity"
},
...
]
"pageable": {
"sort": {
"sorted": false,
"unsorted": true
},
"pageSize": 20,
"pageNumber": 3,
"offset": 60,
"paged": true,
"unpaged": false
},
"last": false,
"totalElements": 250,
"totalPages": 13,
"first": false,
"sort": {
"sorted": false,
"unsorted": true
},
"number": 3,
"numberOfElements": 20,
"size": 20
}
where the variable totalPages gives the total number of pages.
So if the implementation
integrationFlowBuilder
    .handle(Http
        .outboundGateway("http://localhost:8080/people?page=3")
        .httpMethod(HttpMethod.GET)
        .expectedResponseType(String.class))
accesses one page, how do I loop over all the pages?
The easiest way to do this is to wrap the call to this Http.outboundGateway() with a @MessagingGateway and provide the page number as an argument:
@MessagingGateway
public interface HttpPagingGateway {

    @Gateway(requestChannel = "httpPagingGatewayChannel")
    String getPage(int page);

}
Then you get a JSON String as a result, which you can convert into some domain model, or on which you can just perform a JsonPathUtils.evaluate() (based on json-path) to get the value of the last attribute, to decide whether getPage() needs to be called again with page++ or not.
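A minimal sketch of that check, assuming the gateway reply is the raw JSON String (JsonPathUtils ships with Spring Integration and needs json-path on the classpath):

public boolean isLastPage(String json) {
    try {
        // "$.last" points at the boolean shown in the sample response above
        return JsonPathUtils.evaluate(json, "$.last");
    } catch (Exception e) {
        throw new IllegalStateException("Could not read 'last' from the page response", e);
    }
}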
The page argument is going to be a payload of the message to send and that can be used as an uriVariable:
.handle(Http
    .outboundGateway("http://localhost:8080/people?page={page}")
    .httpMethod(HttpMethod.GET)
    .uriVariable("page", Message::getPayload)
    .expectedResponseType(String.class))
Of course, we could do something similar purely with Spring Integration, but that would involve a filter, a router and some other components.
UPDATE
First of all I would suggest creating a domain model (some Java bean), let's say PersonPageResult, to represent that JSON response, and passing this type to the expectedResponseType(PersonPageResult.class) property of the Http.outboundGateway(). The RestTemplate, together with the MappingJackson2HttpMessageConverter, will do the trick out of the box and return such an object as a reply for downstream processing.
Then, as I said before, the looping is better done from some Java code, which you can wrap in a service activator call. For this purpose you should declare a gateway like this:
public interface HttpPagingGateway {

    PersonPageResult getPage(int page);

}
Pay attention: no annotations at all. The trick is done via IntegrationFlow:
@Bean
public IntegrationFlow httpGatewayFlow() {
    return IntegrationFlows.from(HttpPagingGateway.class)
            .handle(Http
                    .outboundGateway("http://localhost:8080/people?page={page}")
                    .httpMethod(HttpMethod.GET)
                    .uriVariable("page", Message::getPayload)
                    .expectedResponseType(PersonPageResult.class))
            .get();
}
See IntegrationFlows.from(Class<?> aClass) JavaDocs.
Such an HttpPagingGateway can then be injected into some service that contains the plain looping logic:
int page = 1;
boolean last = false;
while (!last) {
    PersonPageResult result = this.httpPagingGateway.getPage(page++);
    last = result.getLast();
    List<Person> persons = result.getPersons();
    // Process persons
}
For processing those persons I would suggest a separate IntegrationFlow, which may start from a gateway as well, or you can just send a Message<List<Person>> to its input channel.
This way you separate the concerns of paging and processing, and keep a simple loop in some POJO method.
