Fetching Apache Flink results using a Spring Boot API

I have created a Spring Boot application that exposes an API endpoint so that I can fetch the Flink results when a user hits the endpoint.
static Map<String, Long> finalList = new HashMap<String, Long>();
Above is a class variable declared outside the main method. Within the main method, the Flink DataStream API is used to fetch data from Kafka via the KafkaSource class and perform the aggregations in Flink. The resulting stream is assigned to the data_15min variable, which is of type DataStream<Map<String, Long>>.
I used a map function to read the data from the datastream and put it into the finalList object above.
data_15min.map(new MapFunction<Map<String, Long>, Object>() {
    ObjectMapper objMapperNew = new ObjectMapper();

    @Override
    public Object map(Map<String, Long> value) throws Exception {
        System.out.println("Value in map: " + value);
        System.out.println("KeySet: " + value.keySet());
        value.forEach((key, val) -> {
            if (!dcListNew.contains(key)) {
                dcListNew.add(key);
                finalList.put(key, val);
            } else {
                finalList.replace(key, val);
            }
        });
        System.out.println("DC List NEw inside: " + dcListNew);
        System.out.println("Final List inside: " + objMapperNew.writeValueAsString(finalList));
        return finalList;
    }
});
System.out.println("Final List: " + finalList);
When I print finalList within the above map function, I get the data as:
Final List inside: {"ABC": 2, "AXY": 10}
Outside the map when I print, I still see it as:
Final List: {}
Can someone please help me make this data available outside the map method so that I can send it in the Spring Boot API response?

I think the main reason this does not work is concurrency: I assume you are calling execute() on your environment somewhere, and only after execute() finishes will your finalList have been populated with records.
Note that working like this isn't safe at all and will quite probably result in errors after a short time. Overall, it seems the architecture of the solution is missing something like a database where you would store your results from Flink and then access that store from Spring.
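For illustration only (this is not your code, and resultStore is a stand-in for whichever external store/client you pick), the sink side of that architecture could look roughly like this:
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

// write each aggregated map to an external store instead of a static field
data_15min.addSink(new SinkFunction<Map<String, Long>>() {
    @Override
    public void invoke(Map<String, Long> value, Context context) {
        // resultStore is an assumed, serializable client for your DB/Redis/etc.
        value.forEach((key, val) -> resultStore.put(key, val));
    }
});
env.execute("15min-aggregation");
// the Spring controller then reads from the same store instead of the static finalList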

Related

How to implement a list of DB update queries in one call with SpringBoot Webflux + R2dbc application

The goal of my Spring Boot WebFlux R2DBC application is that the controller accepts a request containing a list of DB UPDATE or INSERT details and responds with a result summary.
I can write a ReactiveCrudRepository-based repository to implement each DB operation, but I don't know how to write the service that groups the execution of the list of DB operations and composes a result summary response.
I am new to Java reactive programming. Thanks for any suggestions and help.
Chen
I got the hint from here: https://www.vinsguru.com/spring-webflux-aggregation/. The ideas are:
From the request, create 3 Monos:
Mono<List<Long>> monoEndDateSet -- DB row ids of the update operations;
Mono<List<Long>> monoCreateList -- DB row ids of the newly inserted rows;
Mono<ChangeSupplyResponse> monoRespFilled -- a response with some known fields already filled in;
Use Mono.zip to aggregate the 3 Monos, then map the Tuple3 into the Mono<ChangeSupplyResponse> to return.
Below are the key parts of the code:
public Mono<ChangeSupplyResponse> ChangeSupplies(ChangeSupplyRequest csr){
ChangeSupplyResponse resp = ChangeSupplyResponse.builder().build();
resp.setEventType(csr.getEventType());
resp.setSupplyOperationId(csr.getSupplyOperationId());
resp.setTeamMemberId(csr.getTeamMemberId());
resp.setRequestTimeStamp(csr.getTimestamp());
resp.setProcessStart(OffsetDateTime.now());
resp.setUserId(csr.getUserId());
Mono<List<Long>> monoEndDateSet = getEndDateIdList(csr);
Mono<List<Long>> monoCreateList = getNewSupplyEntityList(csr);
Mono<ChangeSupplyResponse> monoRespFilled = Mono.just(resp);
return Mono.zip(monoRespFilled, monoEndDateSet, monoCreateList).map(this::combine).as(operator::transactional);
}
private ChangeSupplyResponse combine(Tuple3<ChangeSupplyResponse, List<Long>, List<Long>> tuple){
ChangeSupplyResponse resp = tuple.getT1().toBuilder().build();
List<Long> endDateIds = tuple.getT2();
resp.setEndDatedDemandStreamSupplyIds(endDateIds);
List<Long> newIds = tuple.getT3();
resp.setNewCreatedDemandStreamSupplyIds(newIds);
resp.setSuccess(true);
Duration span = Duration.between(resp.getProcessStart(), OffsetDateTime.now());
resp.setProcessDurationMillis(span.toMillis());
return resp;
}
private Mono<List<Long>> getNewSupplyEntityList(ChangeSupplyRequest csr) {
Flux<DemandStreamSupplyEntity> fluxNewCreated = Flux.empty();
for (SrmOperation so : csr.getOperations()) {
if (so.getType() == SrmOperationType.createSupply) {
DemandStreamSupplyEntity e = buildEntity(so, csr);
fluxNewCreated = fluxNewCreated.mergeWith(this.demandStreamSupplyRepository.save(e));
}
}
return fluxNewCreated.map(e -> e.getDemandStreamSupplyId()).collectList();
}
...

Spring Webflux - R2dbc : How to run a child query and update value while iterating a result set

I am new to reactive repositories and WebFlux. I am fetching a list of data from the DB and iterating it using map() to build a DTO object; in this process I need to run another query to get a count value and set it on the same DTO. When I try the following, the count is set to null.
@Repository
public class CandidateGroupCustomRepo {
public Flux<CandidateGroupListDTO> getList(BigInteger userId){
final String sql = "SELECT gp.CANDIDATE_GROUP_ID,gp.NAME ,gp.GROUP_TYPE \n" +
" ,gp.CREATED_DATE ,cd.DESCRIPTION STATUS ,COUNT(con.CANDIDATE_GROUP_ID)\n" +
" FROM ........" +
" WHERE gp.CREATED_BY_USER_ID = :userId GROUP BY gp.CANDIDATE_GROUP_ID,gp.NAME ,gp.GROUP_TYPE \n" +
" ,gp.CREATED_DATE ,cd.DESCRIPTION";
return dbClient.execute(sql)
.bind("userId", userId)
.map(row ->{
CandidateGroupListDTO info = new CandidateGroupListDTO();
info.setGroupId(row.get(0, BigInteger.class));
info.setGroupName(row.get(1, String.class)) ;
info.setGroupType(row.get(2, String.class));
info.setCreatedDate( row.get(3, LocalDateTime.class));
info.setStatus(row.get(4, String.class));
if(info.getGroupType().equalsIgnoreCase("static")){
info.setContactsCount(row.get(5, BigInteger.class));
}else{
getGroupContactCount(info.getGroupId()).subscribe(count ->{
System.out.println(">>>>>"+count);
info.setContactsCount(count);
});
}
return info;
}
)
.all() ;
}
Mono<BigInteger> getGroupContactCount(BigInteger groupId){
final String sql = "SELECT 3 WHERE :groupId IS NOT NULL;";
return dbClient.execute(sql)
.bind("groupId", groupId)
.map(row -> {
System.out.println(row.get(0, BigInteger.class));
return row.get(0, BigInteger.class);
} ).one();
}
}
When I call getGroupContactCount, I am trying to extract the count from the Mono<BigInteger> and set it on my DTO. The System.out prints the count value correctly, but I still get null for the count in the response.
You are calling subscribe in the middle of the pipeline. Subscribing there fires the inner query off on its own, detached from the outer flow, so the DTO is returned before the count has arrived. The one subscribing should usually be the final consumer, which I'm guessing your Spring application is not; most likely the final consumer is the web page that initiated the call. Your server is the producer.
Call the database, flatMap, and return:
return dbClient.execute(sql)
    .bind("userId", userId)
    .map(row -> {
        CandidateGroupListDTO info = new CandidateGroupListDTO();
        info.setGroupId(row.get(0, BigInteger.class));
        info.setGroupName(row.get(1, String.class));
        info.setGroupType(row.get(2, String.class));
        info.setCreatedDate(row.get(3, LocalDateTime.class));
        info.setStatus(row.get(4, String.class));
        if (info.getGroupType().equalsIgnoreCase("static")) {
            info.setContactsCount(row.get(5, BigInteger.class));
        }
        return info;
    })
    .all()
    .flatMap(info -> {
        if (info.getGroupType().equalsIgnoreCase("static")) {
            return Mono.just(info);
        }
        // resolve the count reactively and emit the DTO only once it is set
        return getGroupContactCount(info.getGroupId()).map(count -> {
            info.setContactsCount(count);
            return info;
        });
    });
Use map for purely synchronous transformations; use flatMap when you need to do async work per element (and concatMap if the emission order matters, since flatMap does not guarantee it).

Save Multiple Records using WebFlux and MongoDB

I'm working on a project which uses Spring WebFlux and MongoDB, and I'm very new to reactive programming and WebFlux.
I have a scenario of saving into 3 collections using one service. For each collection I'm generating an id using a sequence and then saving the document. I have a FieldMaster which has a List of FieldInfo, and every FieldInfo has a List of FieldOption. I need to save FieldMaster, FieldInfo and FieldOption. Below is the code I'm using. The code works only when I'm running in debugging mode; otherwise it gets blocked on the line below:
Integer field_seq_id = Integer.parseInt(sequencesCollection.getNextSequence(FIELDINFO).block().getSeqValue());
Here is the full code
public Mono< FieldMaster > createMasterData(Mono< FieldMaster > fieldmaster)
{
return fieldmaster.flatMap(fm -> {
return sequencesCollection.getNextSequence(FIELDMASTER).flatMap(seqVal -> {
LOGGER.info("Generated Sequence value :" + seqVal.getSeqValue());
fm.setId(Integer.parseInt(seqVal.getSeqValue()));
List<FieldInfo> fieldInfo = fm.getFieldInfo();
fieldInfo.forEach(field -> {
// saving Field Goes Here
Integer field_seq_id = Integer.parseInt(sequencesCollection.getNextSequence(FIELDINFO).block().getSeqValue()); // stops execution at this line
LOGGER.info("Generated Sequence value Field Sequence:" + field_seq_id);
field.setId(field_seq_id);
field.setMasterFieldRefId(fm.getId());
mongoTemplate.save(field).block();
LOGGER.info("Field Details Saved");
List<FieldOption> fieldOption = field.getFieldOptions();
fieldOption.forEach(option -> {
// saving Field Option Goes Here
Integer opt_seq_id = Integer.parseInt(sequencesCollection.getNextSequence(FIELDOPTION).block().getSeqValue());
LOGGER.info("Generated Sequence value Options Sequence:" + opt_seq_id);
option.setId(opt_seq_id);
option.setFieldRefId(field_seq_id);
mongoTemplate.save(option).log().block();
LOGGER.info("Field Option Details Saved");
});
});
return mongoTemplate.save(fm).log();
});
});
}
First, in reactive programming it is not good to use .block(), because you turn non-blocking code into blocking code. If you want to read from one stream and save into 3 streams, you can do it like this.
There are many different ways to do that for performance purposes, but it depends on the amount of data.
Here you have a sample using simple data and the concat operator, but there are also zip and merge; it depends on your needs.
public void run(String... args) throws Exception {
Flux<Integer> dbData = Flux.range(0, 10);
dbData.flatMap(integer -> Flux.concat(saveAllInFirstCollection(integer), saveAllInSecondCollection(integer), saveAllInThirdCollection(integer))).subscribe();
}
Flux<Integer> saveAllInFirstCollection(Integer integer) {
System.out.println(integer);
//process and save in collection
return Flux.just(integer);
}
Flux<Integer> saveAllInSecondCollection(Integer integer) {
System.out.println(integer);
//process and save in collection
return Flux.just(integer);
}
Flux<Integer> saveAllInThirdCollection(Integer integer) {
System.out.println(integer);
//process and save in collection
return Flux.just(integer);
}
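Applied to the code in the question, a fully non-blocking version might look roughly like the sketch below. It assumes getNextSequence(...) returns a Mono and mongoTemplate is a reactive template; the names mirror the question, but the exact types are guesses:
public Mono<FieldMaster> createMasterData(Mono<FieldMaster> fieldmaster) {
    return fieldmaster.flatMap(fm ->
        sequencesCollection.getNextSequence(FIELDMASTER).flatMap(seqVal -> {
            fm.setId(Integer.parseInt(seqVal.getSeqValue()));
            // assign ids and save every FieldInfo reactively instead of calling block() in a loop
            return Flux.fromIterable(fm.getFieldInfo())
                .flatMap(field -> sequencesCollection.getNextSequence(FIELDINFO)
                    .flatMap(fieldSeq -> {
                        field.setId(Integer.parseInt(fieldSeq.getSeqValue()));
                        field.setMasterFieldRefId(fm.getId());
                        // save the field, then its options, as one chained publisher
                        return mongoTemplate.save(field)
                            .thenMany(Flux.fromIterable(field.getFieldOptions())
                                .flatMap(option -> sequencesCollection.getNextSequence(FIELDOPTION)
                                    .flatMap(optSeq -> {
                                        option.setId(Integer.parseInt(optSeq.getSeqValue()));
                                        option.setFieldRefId(field.getId());
                                        return mongoTemplate.save(option);
                                    })))
                            .then();
                    }))
                .then(mongoTemplate.save(fm));
        }));
}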

Hibernate queries getting slower and slower

I'm working on a process that checks and updates data from an Oracle database. I'm using Hibernate and the Spring framework in my application.
The application reads a CSV file, processes the content, then persists entities:
public class Main() {
Input input = ReadCSV(path);
EntityList resultList = Process.process(input);
WriteResult.write(resultList);
...
}
// Process class that loops over input
public class Process{
public EntityList process(Input input) :
EntityList results = ...;
...
for(Line line : input.readLine()){
results.add(ProcessLine.process(line))
...
}
return results;
}
// retrieving and updating entities
class ProcessLine {
@Autowired
DomaineRepository domaineRepository;
@Autowired
CompanyDomaineService companydomaineService;
@Transactional
public MyEntity process(Line line){
// getcompanyByXX is CrudRepository method with #Query that returns an entity object
MyEntity companyToAttach = domaineRepository.getCompanyByCode(line.getCode());
MyEntity companyToDetach = domaineRepository.getCompanyBySiret(line.getSiret());
if(companyToDetach == null || companyToAttach == null){
throw new CustomException("Custom Exception");
}
// AttachCompany retrieves some entity relationEntity, then removes companyToDetach and adds CompanyToAttach. this updates relationEntity.company attribute.
companydomaineService.attachCompany(companyToAttach, companyToDetach);
return companyToAttach;
}
}
public class WriteResult{
@Autowired
DomaineRepository domaineRepository;
@Transactional
public void write(EntityList results) {
for (MyEntity result : results){
domaineRepository.save(result);
}
}
}
The application works well on files with few lines, but when I try to process large files (200,000 lines), the performance slows drastically and I get a SQL timeout.
I suspect cache issues, but I'm wondering if saving all the entities at the end of the processing isn't a bad practice?
The problem is your for loop, which is doing individual saves on the results and thus issues single inserts, slowing it down. Hibernate and Spring support batch inserts, which should be used whenever possible.
Something like domaineRepository.saveAll(results).
Since you are processing a lot of data, it might be better to do things in batches: instead of fetching one company to attach at a time, fetch a list of companies to attach and process those, then fetch a list of companies to detach and process those.
public EntityList process(Input input) :
EntityList results;
List<Code> companiesToAdd = new ArrayList<>();
List<Siret> companiesToRemove = new ArrayList<>();
for(Line line : input.readLine()){
companiesToAdd.add(line.getCode());
companiesToRemove.add(line.getSiret());
...
}
results = process(companiesToAdd, companiesToRemove);
return results;
}
public MyEntity process(List<Code> companiesToAdd, List<Siret> companiesToRemove) {
List<MyEntity> attachList = domaineRepository.getCompanyByCodeIn(companiesToAdd);
List<MyEntity> detachList = domaineRepository.getCompanyBySiretIn(companiesToRemove);
if (attachList.isEmpty() || detachList.isEmpty()) {
throw new CustomException("Custom Exception");
}
companydomaineService.attachCompany(attachList, detachList);
return attachList;
}
The above code is just pseudocode to point you in the right direction; you will need to work out what works for you.
For every line you read, you are doing 2 read operations here:
MyEntity companyToAttach = domaineRepository.getCompanyByCode(line.getCode());
MyEntity companyToDetach = domaineRepository.getCompanyBySiret(line.getSiret());
You can read more than one line, use an IN query, and then process the resulting list of companies.
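As a side note, repository.saveAll(...) only produces real JDBC batches when batching is enabled (for example spring.jpa.properties.hibernate.jdbc.batch_size=50 together with hibernate.order_inserts=true), and with 200,000 lines the ever-growing first-level cache is a common reason Hibernate gets slower and slower. A rough sketch, with an illustrative chunk size and method name, of flushing and clearing the persistence context in chunks:
@PersistenceContext
private EntityManager entityManager;

@Transactional
public void writeInChunks(List<MyEntity> results) {
    final int chunkSize = 50; // keep in line with hibernate.jdbc.batch_size
    for (int i = 0; i < results.size(); i++) {
        entityManager.persist(results.get(i));
        if ((i + 1) % chunkSize == 0) {
            entityManager.flush();  // push the pending inserts as one batch
            entityManager.clear();  // detach them so the session stops growing
        }
    }
}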

Spark RDD to update

I am loading a file from HDFS into a JavaRDD and want to update that RDD. For that I am converting it to an IndexedRDD (https://github.com/amplab/spark-indexedrdd), and I am not able to, as I am getting a ClassCastException.
Basically I will make key-value pairs and update by key. IndexedRDD supports updates. Is there any way to convert?
JavaPairRDD<String, String> mappedRDD = lines.flatMapToPair( new PairFlatMapFunction<String, String, String>()
{
@Override
public Iterable<Tuple2<String, String>> call(String arg0) throws Exception {
String[] arr = arg0.split(" ", 2);
System.out.println("length " + arr.length);
List<Tuple2<String, String>> results = new ArrayList<Tuple2<String, String>>();
results.add(new Tuple2<String, String>(arr[0], arr[1]));
return results;
}
});
IndexedRDD<String,String> test = (IndexedRDD<String,String>) mappedRDD.collectAsMap();
collectAsMap() returns a java.util.Map containing all the entries from your JavaPairRDD, but nothing related to Spark: that function collects the values onto one node so that you can work with them in plain Java. Therefore, you cannot cast it to IndexedRDD or any other RDD type, as it's just a normal Map.
I haven't used IndexedRDD, but from the examples you can see that you need to create it by passing a pair RDD to its constructor:
// Create an RDD of key-value pairs with Long keys.
val rdd = sc.parallelize((1 to 1000000).map(x => (x.toLong, 0)))
// Construct an IndexedRDD from the pairs, hash-partitioning and indexing
// the entries.
val indexed = IndexedRDD(rdd).cache()
So in your code it should be:
IndexedRDD<String,String> test = new IndexedRDD<String,String>(mappedRDD.rdd());
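If calling the IndexedRDD constructor from Java proves awkward, a plain JavaPairRDD can also emulate an update by dropping the keys being replaced and unioning in the new pairs. This is an alternative to IndexedRDD, not part of its API, and updates below is an assumed JavaPairRDD<String, String> holding the replacement values:
// keep only the pairs whose keys are not being updated, then add the new values
JavaPairRDD<String, String> updated = mappedRDD
        .subtractByKey(updates)
        .union(updates);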
