Asynchronous execution of a loop: how to make it synchronous - Spring

spring-boot-starter-parent 3.0.0
It may be important. Anyway, under the hood:
<spring-framework.version>6.0.2</spring-framework.version>
import lombok.Synchronized;
...
@Service
@RequiredArgsConstructor
public class RequestService {

    @Synchronized
    public void scrape() {
        List<SiteEntity> sites = siteRepository.findAllByActiveTrue();
        System.out.println("Method: The Thread name is " + Thread.currentThread().getName());

        for (SiteEntity site : sites) {
            System.out.println("Site: " + site.getUrl());
            System.out.println("Outer: The Thread name is " + Thread.currentThread().getName());

            List<PhraseEntity> phrases = phraseRepository.findAllActivePhrasesBySite(site.getId());
            assertSameSite(phrases, site);

            final int CHUNK = 10;
            List<List<PhraseEntity>> phraseChunks = Lists.partition(phrases, CHUNK);

            for (List<PhraseEntity> chunk : phraseChunks) {
                System.out.println("Inner: The Thread name is " + Thread.currentThread().getName());
                System.out.println(chunk.get(0).getPhrase() + " : " + chunk.get(chunk.size() - 1).getPhrase());

                Document doc = sendRequest(chunk, 0);
                saveManager.save(doc);
            }
        }
    }
}
The service scrapes the positions of web sites in a search engine.
For debugging purposes I gave each site 100 phrases, which totals 200 phrases.
The loop can switch between sites arbitrarily, and the total number of results can be around 380, even though the total number of phrases is 200.
The reason for that is asynchronicity: I print the thread names, and they change.
Lombok's @Synchronized has not helped.
Questions:
Am I right that Spring is asynchronous by default? Documentation: https://docs.spring.io/spring-boot/docs/current/reference/htmlsingle/#features.task-execution-and-scheduling
How can I execute this loop synchronously?
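If the duplicate results come from overlapping invocations of scrape(), one plain-JDK way to serialize them is an explicit ReentrantLock owned by the service. This is only a sketch under that assumption (names are hypothetical, and it only serializes within a single JVM instance); the tryLock variant skips a run instead of queueing it:

```java
import java.util.concurrent.locks.ReentrantLock;

public class ScrapeGuard {
    private final ReentrantLock lock = new ReentrantLock();

    // Runs the task only if no other thread is currently inside it; returns whether it ran.
    public boolean runExclusively(Runnable task) {
        if (!lock.tryLock()) {
            return false; // another scrape is in progress: skip instead of piling up
        }
        try {
            task.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        ScrapeGuard guard = new ScrapeGuard();
        boolean ran = guard.runExclusively(() -> System.out.println("scraping..."));
        System.out.println("ran = " + ran);
    }
}
```

Using `lock.lock()` instead of `tryLock()` would make overlapping runs wait in line rather than be skipped; which is right depends on whether a missed scrape is acceptable.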

Related

How to use "cache" method of Mono

I'm a beginner of spring webflux. While researching I found some code like:
Mono result = someMethodThatReturnMono().cache();
The name "cache" tells me it caches something, but where is the cache, and how do I retrieve cached things? Is it something like Caffeine?
cache() caches the result of the steps preceding it in the Mono/Flux chain, so subsequent subscriptions replay the cached value instead of re-executing those steps. Check the output of this code to see it in action:
import reactor.core.publisher.Mono;

public class CacheExample {
    public static void main(String[] args) {
        var mono = Mono.fromCallable(() -> {
                    System.out.println("Go!");
                    return 5;
                })
                .map(i -> {
                    System.out.println("Double!");
                    return i * 2;
                });
        var cached = mono.cache();

        System.out.println("Using cached");
        System.out.println("1. " + cached.block());
        System.out.println("2. " + cached.block());
        System.out.println("3. " + cached.block());

        System.out.println("Using NOT cached");
        System.out.println("1. " + mono.block());
        System.out.println("2. " + mono.block());
        System.out.println("3. " + mono.block());
    }
}
output:
Using cached
Go!
Double!
1. 10
2. 10
3. 10
Using NOT cached
Go!
Double!
1. 10
Go!
Double!
2. 10
Go!
Double!
3. 10
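The same replay behavior can be sketched in plain Java with a hypothetical memoizing Supplier: the delegate runs once on first access, and every later call gets the stored value, which is essentially what cache() does for the operators upstream of it:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class MemoizeSketch {
    // Wraps a supplier so the delegate runs at most once; later calls replay the stored value.
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T value;
            private boolean computed;

            @Override
            public synchronized T get() {
                if (!computed) {
                    value = delegate.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<Integer> expensive = () -> {
            calls.incrementAndGet(); // side effect, like the "Go!" println above
            return 5 * 2;
        };
        Supplier<Integer> cached = memoize(expensive);
        System.out.println(cached.get() + " " + cached.get() + " " + cached.get()); // 10 10 10
        System.out.println("delegate ran " + calls.get() + " time(s)");             // 1 time
    }
}
```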

Quarkus HTTP calls load test results, 1000 requests - 16 seconds vs 65 seconds

Test 1:
@Path("/performance")
public class PerformanceTestResource {

    @Timeout(20000)
    @GET
    @Path("/resource")
    @Produces(MediaType.APPLICATION_JSON)
    public Response performanceResource() {
        final String name = Thread.currentThread().getName();
        System.out.println(name);

        Single<Data> dataSingle = null;
        try {
            dataSingle = Single.fromCallable(() -> {
                final String name2 = Thread.currentThread().getName();
                System.out.println(name2);
                Thread.sleep(1000);
                return new Data();
            }).subscribeOn(Schedulers.io());
        } catch (Exception ex) {
            int a = 1;
        }
        return Response.ok().entity(dataSingle.blockingGet()).build();
    }
}
The test itself (see elsewhere for the callPeriodically definition):
@QuarkusTest
public class PerformanceTestResourceTest {

    @Tag("load-test")
    @Test
    public void loadTest() throws InterruptedException {
        int CALL_N_TIMES = 1000;
        final long CALL_NIT_EVERY_MILLISECONDS = 10;

        final LoadTestMetricsData loadTestMetricsData = LoadTestUtils.callPeriodically(
                this::callHttpEndpoint,
                CALL_N_TIMES,
                CALL_NIT_EVERY_MILLISECONDS
        );

        assertThat(loadTestMetricsData.responseList.size(), CoreMatchers.is(equalTo(Long.valueOf(CALL_N_TIMES).intValue())));

        long executionTime = loadTestMetricsData.duration.getSeconds();
        System.out.println("executionTime: " + executionTime + " seconds");
        assertThat(executionTime, allOf(greaterThanOrEqualTo(1L), lessThan(20L)));
    }
}
Results test 1:
executionTime: 16 seconds
Test 2: same but without the @Timeout annotation:
executionTime: 65 seconds
Q: Why? I think even 16 seconds is slow.
Q: How to make it faster: say to be 2 seconds for 1000 calls.
I realise that I use .blockingGet() in the resource, but still, I would expect re-use of the blocked threads.
P.S.
I tried to go more 'reactive' by returning Single or CompletionStage from the endpoints, but this seems not ready yet (buggy on the RESTEasy side). So I go with a simple .blockingGet() and Response.
UPDATE: Reactive / RX Java 2 Way
@Path("/performance")
public class PerformanceTestResource {

    //@Timeout(20000)
    @GET
    @Path("/resource")
    @Produces(MediaType.APPLICATION_JSON)
    public Single<Data> performanceResource() {
        final String name = Thread.currentThread().getName();
        System.out.println(name);
        System.out.println("name: " + name);

        return Single.fromCallable(() -> {
            final String name2 = Thread.currentThread().getName();
            System.out.println("name2: " + name2);
            Thread.sleep(1000);
            return new Data();
        });
    }
}
pom.xml:
<dependency>
    <groupId>io.smallrye</groupId>
    <artifactId>smallrye-context-propagation-propagators-rxjava2</artifactId>
</dependency>
<dependency>
    <groupId>org.jboss.resteasy</groupId>
    <artifactId>resteasy-rxjava2</artifactId>
</dependency>
Then when run same test:
executionTime: 64 seconds
The output would be something like:
name: vert.x-worker-thread-5
vert.x-worker-thread-9
name: vert.x-worker-thread-9
name2: vert.x-worker-thread-9
name2: vert.x-worker-thread-5
So we are blocking the worker thread that is used on the REST/resource side. That's why. Then:
If I use Schedulers.io() to run the sleep-1000 call on a separate execution context:
return Single.fromCallable(() -> { ... }).subscribeOn(Schedulers.io());
executionTime: 16 seconds
The output will be something like this (note the new guy, RxCachedThreadScheduler):
name: vert.x-worker-thread-5
name2: RxCachedThreadScheduler-1683
vert.x-worker-thread-0
name: vert.x-worker-thread-0
vert.x-worker-thread-9
name: vert.x-worker-thread-9
name2: RxCachedThreadScheduler-1658
vert.x-worker-thread-8
It seems that regardless of whether I use blockingGet() explicitly or not, I get the same result.
I assume that if I were not blocking, it would be around 2-3 seconds.
Q: Is there a way to fix/tweak this from this point?
I assume the use of Schedulers.io(), which brings in the RxCachedThreadScheduler, is the bottleneck here, so I end up with the 16 seconds. Is 200 io threads the default limit? But those should be re-used, not really blocked. (I don't think it is a good idea to set that limit to 1000.)
Q: Or anyway: how would one make the app as responsive/reactive/performant as it should be with Quarkus? Or what did I miss?
Thanks!
OK. Maybe it is me.
In my callPeriodically() I pass CALL_NIT_EVERY_MILLISECONDS = 10 milliseconds.
10 * 1000 = 10,000 ms, i.e. ten seconds just to submit the requests.
So I set it to 0.
And got 6 seconds for 1000 simultaneous requests to the server.
Still not 2-3 seconds, but 6.
It seems there is no difference between using .blockingGet() and returning Response, and returning Single.
--
But just to mention it: this hello-world app takes 1 second to process 1000 parallel requests, while the Quarkus one takes 6 seconds.
public class Sample2 {

    static final AtomicInteger atomicInteger = new AtomicInteger(0);

    public static void main(String[] args) {
        long start = System.currentTimeMillis();

        final List<Single<Response>> listOfSingles = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 1000; i++) {
            // try {
            //     Thread.sleep(10);
            // } catch (InterruptedException e) {
            //     e.printStackTrace();
            // }
            final Single<Response> responseSingle = longCallFunction();
            listOfSingles.add(responseSingle);
        }

        Single<Response> last = Single.merge(listOfSingles).lastElement().toSingle();
        final Response response = last.blockingGet();

        long end = System.currentTimeMillis();
        System.out.println("Execution time: " + (end - start) / 1000);
        System.out.println(response);
    }

    static Single<Response> longCallFunction() {
        return Single.fromCallable(() -> { // 1 sec
            System.out.println(Thread.currentThread().getName());
            Thread.sleep(1000);
            int code = atomicInteger.incrementAndGet();
            //System.out.println(code);
            return new Response(code);
        }).subscribeOn(Schedulers.io());
    }
}
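For contrast, here is a small stdlib-only sketch (java.util.concurrent, no RxJava) of the fully non-blocking shape: 1000 one-second "calls" that never park a worker thread finish in roughly one second total, because the delay is driven by a timer rather than a sleeping thread. This is the behavior a reactive pipeline approaches when nothing blocks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

public class NonBlockingDelay {
    public static void main(String[] args) {
        // A delayed executor completes tasks after 1 s without parking a thread per task.
        Executor delay = CompletableFuture.delayedExecutor(1, TimeUnit.SECONDS);

        long start = System.currentTimeMillis();
        List<CompletableFuture<Integer>> calls = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            final int id = i;
            calls.add(CompletableFuture.supplyAsync(() -> id, delay));
        }
        // Wait for all 1000 "1-second calls"; total time stays close to 1 s,
        // because no worker thread is blocked while the delays elapse.
        CompletableFuture.allOf(calls.toArray(new CompletableFuture[0])).join();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("1000 delayed calls finished in ~" + elapsed + " ms");
    }
}
```

Thread.sleep(1000) inside Schedulers.io(), by contrast, occupies one pooled thread per in-flight call for the full second, which is where the 16 seconds comes from.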

How to solve slow throughput with a blocking call in Spring Boot 2?

I use Spring Boot 2 and RouterFunction, and everything is fine if there is no blocking code; but with a 2-second block it is awful. My application can't serve a large number of concurrent users, and I can't improve throughput.
The docs have a section "How do I wrap a synchronous, blocking call?", but this approach doesn't solve the problem.
I created a simple Spring Boot 2 application that reproduces the problem.
return serverRequest
        .bodyToMono(ValueDto.class)
        .doOnNext(order -> log.info("get order request " + order))
        .map(i -> {
            log.info("map 1 " + requestId);
            return i;
        })
        .map(i -> {
            log.info("map 2 " + requestId);
            return i;
        })
        .map(i -> {
            log.info("map 3 " + requestId);
            return i;
        })
        .flatMap(i -> Mono.fromCallable(() -> executeLongMethod(i, requestId))
                .subscribeOn(Schedulers.elastic()))
        .map(v -> {
            log.info("map 5 " + requestId);
            return v;
        })
        .flatMap(req -> ServerResponse.ok().build());
private ValueDto executeLongMethod(final ValueDto dto, final String requestId) {
    final long start = System.currentTimeMillis();
    try {
        log.info("start executeLongMethod. requestId:" + requestId);
        TimeUnit.MILLISECONDS.sleep(1500);
        return dto;
    } catch (InterruptedException e) {
        e.printStackTrace();
        return dto;
    } finally {
        log.info("finish executeLongMethod requestId:" + requestId + " executed in " + (System.currentTimeMillis() - start) + "ms.");
    }
}
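A back-of-the-envelope check is useful here: when each request blocks a pooled thread for 1.5 s, a pool of N threads can sustain at most N / 1.5 requests per second (Little's law), no matter how reactive the rest of the chain is. A small sketch with a hypothetical pool size of 30:

```java
public class ThroughputEstimate {
    // Little's law for a fully blocking stage: throughput = threads / time-blocked-per-request.
    static double maxThroughput(int poolThreads, double blockSeconds) {
        return poolThreads / blockSeconds;
    }

    public static void main(String[] args) {
        int poolThreads = 30;      // hypothetical size of the elastic pool actually in use
        double blockSeconds = 1.5; // each request blocks for 1500 ms
        System.out.println("Max sustainable throughput: "
                + maxThroughput(poolThreads, blockSeconds) + " req/s"); // 20.0 req/s
    }
}
```

So wrapping the call in subscribeOn(Schedulers.elastic()) keeps the event loop free, but the throughput ceiling is still set by how many threads the blocking pool provides.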
Automated load testing is performed with JMeter. Its settings:
ThreadGroup:
Number of Threads (number of concurrent threads to run during the test): 30
Ramp-Up Period (linearly increase the load from 0 to the target load over this time): 1
Loop Count: forever
Post request:
{
    "valueA": "fake",
    "valueB": "fake",
    "valueC": "fake"
}
Results:
Code samples can be found over on GitHub.

Is it possible to change the frequency with which Spring Actuator performs a health pulse?

I am looking around to see how I can modify my actuator endpoints (specifically health) to limit their frequency. I want to see if I can set a check to trigger once a minute for a specific dataset (e.g. mail) but leave it as-is for the others.
So far I can't seem to find that logic anywhere. The only known way I can think of is creating your own health service:
@Component
@RefreshScope
public class HealthCheckService implements HealthIndicator, Closeable {

    @Override
    public Health health() {
        // check if things are stale
        if (System.currentTimeMillis() - this.lastUpdate.get() > this.serviceProperties.getMonitorFailedThreshold()) {
            String errMsg = '[' + this.serviceName + "] health status has not been updated in over ["
                    + this.serviceProperties.getMonitorFailedThreshold() + "] milliseconds. Last updated: ["
                    + this.lastUpdate.get() + ']';
            log.error(errMsg);
            return Health.down().withDetail(this.serviceName, errMsg).build();
        }
        // trace level since this could be called a lot.
        if (this.detailMsg != null) {
            Health.status(this.status);
        }
        Health.Builder health = Health.status(this.status);
        return health.build();
    }

    /**
     * Scheduled, low latency health check.
     */
    @Scheduled(fixedDelayString = "${health.update-delay:60000}")
    public void healthUpdate() {
        if (this.isRunning.get()) {
            if (log.isDebugEnabled()) {
                log.debug("Updating Health Status of [" + this.serviceName + "]. Last Status = ["
                        + this.status.getCode() + ']');
            }
            // do some sort of checking and update the value appropriately.
            this.status = Status.UP;
            this.lastUpdate.set(System.currentTimeMillis());
            if (log.isDebugEnabled()) {
                log.debug("Health Status of [" + this.serviceName + "] updated to [" + this.status.getCode() + ']');
            }
        }
    }
}
I am not sure if there is a way to set this specifically in Spring as configuration, or whether the only way around this is to build a custom HealthIndicator.
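One configuration option worth checking: Spring Boot can cache actuator endpoint responses for a time-to-live. This is endpoint-wide, so it throttles all health indicators together rather than one specific dataset, but it does limit how often the checks actually run. A sketch of the relevant property:

```properties
# application.properties
# Cache the /actuator/health response for 60 seconds.
# Note: this applies to the whole endpoint, not to an individual indicator.
management.endpoint.health.cache.time-to-live=60s
```

For per-indicator frequency (e.g. only mail checked once a minute), the scheduled-update pattern shown above remains the usual approach.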

How to get spark job status from program?

I am aware that hadoop REST API provides access to job status via program.
Similarly is there any way to get the spark job status in a program?
It is not similar to a REST API, but you can track the status of jobs from inside the application by registering a SparkListener with SparkContext.addSparkListener. It goes something like this:
sc.addSparkListener(new SparkListener {
    override def onStageCompleted(event: SparkListenerStageCompleted) = {
        if (event.stageInfo.stageId == myStage) {
            println(s"Stage $myStage is done.")
        }
    }
})
Providing the answer for Java. In Scala it would be almost the same, just using SparkContext instead of JavaSparkContext.
Assume you have a JavaSparkContext:
private final JavaSparkContext sc;
The following code allows getting all the info available on the Jobs and Stages tabs:
JavaSparkStatusTracker statusTracker = sc.statusTracker();
for (int jobId : statusTracker.getActiveJobIds()) {
    SparkJobInfo jobInfo = statusTracker.getJobInfo(jobId);
    log.info("Job " + jobId + " status is " + jobInfo.status().name());
    log.info("Stages status:");

    for (int stageId : jobInfo.stageIds()) {
        SparkStageInfo stageInfo = statusTracker.getStageInfo(stageId);
        log.info("Stage id=" + stageId + "; name = " + stageInfo.name()
                + "; completed tasks:" + stageInfo.numCompletedTasks()
                + "; active tasks: " + stageInfo.numActiveTasks()
                + "; all tasks: " + stageInfo.numTasks()
                + "; submission time: " + stageInfo.submissionTime());
    }
}
Unfortunately everything else is accessible only from the Scala SparkContext, so it may be somewhat difficult to work with the provided structures from Java.
Pools list: sc.sc().getAllPools()
Executor Memory Status: sc.sc().getExecutorMemoryStatus()
Executor ids: sc.sc().getExecutorIds()
Storage info: sc.sc().getRddStorageInfo()
... you can try to find there more useful info.
There's an (almost) undocumented REST API feature that delivers almost everything you can see on the Spark UI:
http://<sparkMasterHost>:<uiPort>/api/v1/...
For local installation you can start from here:
http://localhost:8080/api/v1/applications
Possible end points you can find here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/ApiRootResource.scala
There's also an (almost) undocumented REST API feature on the Spark UI that delivers metrics about the job and performance.
You can access it with:
http://<driverHost>:<uiPort>/metrics/json/
(the UI port is 4040 by default)
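As a quick check, that metrics endpoint can be fetched with the JDK's own HTTP client. A minimal sketch, assuming a driver UI reachable on localhost:4040 (the class and helper names here are made up for illustration):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SparkMetricsFetch {
    // Builds the metrics URL for a given driver host and UI port.
    static URI metricsUri(String driverHost, int uiPort) {
        return URI.create("http://" + driverHost + ":" + uiPort + "/metrics/json/");
    }

    public static void main(String[] args) {
        HttpRequest request = HttpRequest.newBuilder(metricsUri("localhost", 4040)).GET().build();
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body()); // JSON with the driver's gauges and counters
        } catch (Exception e) {
            // No Spark driver UI running locally; nothing to fetch.
            System.out.println("Spark UI not reachable: " + e.getMessage());
        }
    }
}
```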
You can also get the Spark job status without using the Spark History Server. You can use SparkLauncher 2.0.1 (even the Spark 1.6 version will work) to launch your Spark job from a Java program:
SparkAppHandle appHandle = sparkLauncher.startApplication();
You can also add listener to startApplication() method:
SparkAppHandle appHandle = sparkLauncher.startApplication(sparkAppListener);
The listener has 2 methods which inform you about job state changes and info changes.
I implemented this using a CountDownLatch, and it works as expected. This is for SparkLauncher version 2.0.1, and it works in yarn-cluster mode too.
...
final CountDownLatch countDownLatch = new CountDownLatch(1);
SparkAppListener sparkAppListener = new SparkAppListener(countDownLatch);
SparkAppHandle appHandle = sparkLauncher.startApplication(sparkAppListener);
Thread sparkAppListenerThread = new Thread(sparkAppListener);
sparkAppListenerThread.start();
long timeout = 120;
countDownLatch.await(timeout, TimeUnit.SECONDS);
...

private static class SparkAppListener implements SparkAppHandle.Listener, Runnable {

    private static final Log log = LogFactory.getLog(SparkAppListener.class);
    private final CountDownLatch countDownLatch;

    public SparkAppListener(CountDownLatch countDownLatch) {
        this.countDownLatch = countDownLatch;
    }

    @Override
    public void stateChanged(SparkAppHandle handle) {
        String sparkAppId = handle.getAppId();
        State appState = handle.getState();
        if (sparkAppId != null) {
            log.info("Spark job with app id: " + sparkAppId + ",\t State changed to: " + appState + " - "
                    + SPARK_STATE_MSG.get(appState));
        } else {
            log.info("Spark job's state changed to: " + appState + " - " + SPARK_STATE_MSG.get(appState));
        }
        if (appState != null && appState.isFinal()) {
            countDownLatch.countDown();
        }
    }

    @Override
    public void infoChanged(SparkAppHandle handle) {}

    @Override
    public void run() {}
}
