Testcontainers: try to load from the local registry before building from a Dockerfile - spring-boot

I'm developing some test cases using Testcontainers with Spring Boot in order to spin up a dockerized MS-SQL database. It's a huge database that takes about 40 minutes to restore during the docker run process.
The steps I follow to work with this image are:
Build the Dockerfile with the schema and data scripts, tagging the image as "db".
Run the container and wait about 40 minutes for the database to be restored.
Commit the container with the "db-ready" tag.
The behavior I expect is that the test case tries to run a container from the "db-ready" image and, if that fails, builds the image directly from the Dockerfile. The code I tried looks like:
public static CustomMSSqlContainer getInstance() {
    if (container == null) {
        try {
            container = new CustomMSSqlContainer("myproject:db-ready");
        } catch (Exception ex) {
            container = new CustomMSSqlContainer(new ImageFromDockerfile()
                    .withFileFromClasspath("Dockerfile", "docker/Dockerfile")
                    .withFileFromClasspath("schema.sql", "docker/schema.sql")
                    .withFileFromClasspath("entrypoint.sh", "docker/entrypoint.sh")
                    .withFileFromClasspath("data.sql", "docker/data.sql")
                    .withFileFromClasspath("data-init.sql", "docker/data-init.sql")
                    .withFileFromClasspath("start.sh", "docker/start.sh"));
        }
        container.waitingFor(Wait.forLogMessage("Database ready\\n", 1)
                .withStartupTimeout(Duration.ofHours(1)))
                .withExposedPorts(1433);
    }
    return (CustomMSSqlContainer) container;
}
Of course, this code doesn't work the way I expect.
Any suggestions?
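Put differently, what I'd like is roughly: check whether the committed image already exists locally and only build from the Dockerfile when it does not. Below is a rough sketch of that idea using the DockerClient that Testcontainers exposes; the localImageExists helper is just illustrative, not something I have working:
    // Illustrative only: ask the Docker daemon whether the committed image is present.
    // DockerClientFactory comes from org.testcontainers.DockerClientFactory.
    private static boolean localImageExists(String imageName) {
        return !DockerClientFactory.instance().client()
                .listImagesCmd().withImageNameFilter(imageName).exec()
                .isEmpty();
    }

    public static CustomMSSqlContainer getInstance() {
        if (container == null) {
            if (localImageExists("myproject:db-ready")) {
                container = new CustomMSSqlContainer("myproject:db-ready");
            } else {
                container = new CustomMSSqlContainer(new ImageFromDockerfile()
                        .withFileFromClasspath("Dockerfile", "docker/Dockerfile"));
                        // ...plus the other withFileFromClasspath(...) calls from above
            }
            container.waitingFor(Wait.forLogMessage("Database ready\\n", 1)
                    .withStartupTimeout(Duration.ofHours(1)))
                    .withExposedPorts(1433);
        }
        return (CustomMSSqlContainer) container;
    }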

How we solved this
The way we do this is by building a custom image only on the Main/Dev branch. That way:
We don't need the try-catch;
We ONLY build a new image when it's actually necessary (after changes have been approved in a merge request and merged into the main branch);
Building this image is done only in the CI pipeline (so people don't have to, and can't randomly push to the container registry).
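With the image published by CI, the test code just references the tag directly and the try-catch disappears; something along these lines (a sketch reusing the class and image names from the question):
    // Tests simply run the already-restored image that CI pushed; no fallback needed.
    container = new CustomMSSqlContainer("myproject:db-ready")
            .waitingFor(Wait.forLogMessage("Database ready\\n", 1))
            .withExposedPorts(1433);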
This is an example using a JUnit test (disabled in this example, but you could use Spring Profiles to enable it):
@Test
@Disabled("Should be run only with certain profiles ;)")
public void pushNewImage() throws InterruptedException {
    // Start the container before this point, using @Container or just container.start().
    // That should run all your scripts and wait for the waiter.
    // Get the DockerClient used by the Testcontainers library (you can also use your own if they ever make that private).
    final DockerClient dockerClient = customMSSqlContainer.getDockerClient();
    // Commit the docker container changes into a new image
    dockerClient.commitCmd(customMSSqlContainer.getContainerId())
            .withRepository("myproject")
            .withTag("db-ready")
            .exec();
    // Push the new image. The logger is used for visual feedback.
    dockerClient.pushImageCmd("myproject:db-ready")
            .exec(new ResultCallback.Adapter<>() {
                @Override
                public void onNext(PushResponseItem object) {
                    log.info(object.toString()); // just log the push to the repo
                }
            }).awaitCompletion();
}
Potential pitfall with this approach
docker commit will not save anything that is saved into a volume.
This is a problem, as most database images will in fact create a volume. I can't see your Dockerfile, but make sure that all data you are saving is not saved into a volume!
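If you want to guard against this, you can inspect the container for mounts right before committing and abort when any are present. A small sketch reusing the dockerClient and container from the test above (treat it as illustrative):
    // Fail fast if the database files live in a volume, because docker commit would not capture them.
    var mounts = dockerClient.inspectContainerCmd(customMSSqlContainer.getContainerId())
            .exec()
            .getMounts();
    if (mounts != null && !mounts.isEmpty()) {
        throw new IllegalStateException("Container uses volumes, commit would lose data: " + mounts);
    }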
Read more
I talked briefly about this at the JFokus conference recently, with a person from the Testcontainers core team in the room: https://youtu.be/pxxMnvu52K8?t=1922
I'm almost done writing a blog post on this topic and will update this answer when it's live.

Related

Multithreaded Use of Spring Pulsar

I am working on a project to read from our existing ElasticSearch instance and produce messages in Pulsar. If I do this in a highly multithreaded way without any explicit synchronization, I get many occurrences of the following log line:
Message with sequence id X might be a duplicate but cannot be determined at this time.
That is produced from this line of code in the Pulsar Java client:
https://github.com/apache/pulsar/blob/a4c3034f52f857ae0f4daf5d366ea9e578133bc2/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L653
When I add a synchronized block to my method, synchronizing on the pulsar template, the error disappears, but my publish rate drops substantially.
Here is the current working implementation of my method that sends Protobuf messages to Pulsar:
public <T extends GeneratedMessageV3> CompletableFuture<MessageId> persist(T o) {
    var descriptor = o.getDescriptorForType();
    PulsarPersistTopicSettings settings = pulsarPersistConfig.getSettings(descriptor);
    MessageBuilder<T> messageBuilder = Optional.ofNullable(pulsarPersistConfig.getMessageBuilder(descriptor))
            .orElse(DefaultMessageBuilder.DEFAULT_MESSAGE_BUILDER);
    Optional<ProducerBuilderCustomizer<T>> producerBuilderCustomizerOpt =
            Optional.ofNullable(pulsarPersistConfig.getProducerBuilder(descriptor));
    PulsarOperations.SendMessageBuilder<T> sendMessageBuilder;
    sendMessageBuilder = pulsarTemplate.newMessage(o)
            .withSchema(Schema.PROTOBUF_NATIVE(o.getClass()))
            .withTopic(settings.getTopic());
    producerBuilderCustomizerOpt.ifPresent(sendMessageBuilder::withProducerCustomizer);
    sendMessageBuilder.withMessageCustomizer(mb -> messageBuilder.applyMessageBuilderKeys(o, mb));
    synchronized (pulsarTemplate) {
        try {
            return sendMessageBuilder.sendAsync();
        } catch (PulsarClientException re) {
            throw new PulsarPersistException(re);
        }
    }
}
The original version of the above method did not have the synchronized(pulsarTemplate) { ... } block. It performed faster, but generated a lot of logs about duplicate messages, which I knew to be incorrect. Adding the synchronized block got rid of the log messages, but slowed down publishing.
What are the best practices for multithreaded access to the PulsarTemplate? Is there a better way to achieve very high throughput message publishing?
Should I look at using the reactive client instead?
EDIT: I've updated the code block to show the minimum synchronization necessary to avoid the log lines, which is just synchronizing during the .sendAsync(...) call.
Your usage without the synchronized block should work. I will look into it, though, to see if anything else is going on. In the meantime, it would be great to give the reactive client a try.
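If you give it a try, the reactive send path could look roughly like the sketch below. This is only a sketch: it assumes the ReactivePulsarTemplate from spring-pulsar-reactive and its send(message, schema) overload, so double-check the signatures against the version you're on.
    // Sketch only: publishing the same Protobuf message through the reactive template.
    // reactivePulsarTemplate is assumed to be an injected ReactivePulsarTemplate bean;
    // Mono comes from reactor.core.publisher.
    @SuppressWarnings("unchecked")
    public <T extends GeneratedMessageV3> Mono<MessageId> persistReactive(T o) {
        Schema<T> schema = (Schema<T>) Schema.PROTOBUF_NATIVE(o.getClass());
        return reactivePulsarTemplate.send(o, schema);
    }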
This issue was initially tracked here, and the final resolution was that it was a bug that has been fixed in Pulsar 2.11.
Please try updating to Pulsar 2.11.

Kafka Streams state store RocksDB file size not decreasing on manual deletion of messages

I am using the processor API to delete messages from a state store. The delete works successfully; I confirmed it with an interactive query on the state store by Kafka key, but it does not reduce the Kafka Streams file size on local disk under the directory tmp/kafka-streams.
@Override
public void init(ProcessorContext processorContext) {
    this.processorContext = processorContext;
    // invoke punctuate every 10 seconds
    processorContext.schedule(Duration.ofSeconds(10), PunctuationType.STREAM_TIME, new Punctuator() {
        @Override
        public void punctuate(long l) {
            processorContext.commit();
        }
    });
    this.statestore = (KeyValueStore<String, GenericRecord>) processorContext.getStateStore(StateStoreEnum.HEADER.getStateStore());
    log.info("Processor initialized");
}

@Override
public void process(String key, GenericRecord value) {
    statestore.all().forEachRemaining(keyValue -> {
        statestore.delete(keyValue.key);
    });
}
kafka streams directory size
2.3M /private/tmp/kafka-streams
3.3M /private/tmp/kafka-streams
Do I need any specific configuration so that it keeps the file size under control? If it doesn't work this way, is it okay to delete the kafka-streams directory? I assume it should be safe, since such a delete would remove the records from both the state store and the changelog topic.
RocksDB does file compaction in the background. Hence, if you need more aggressive compaction, you should pass in a custom RocksDBConfigSetter via the Streams config parameter rocksdb.config.setter. For more details about RocksDB, check out the RocksDB documentation.
https://docs.confluent.io/current/streams/developer-guide/config-streams.html#rocksdb-config-setter
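For illustration only, a custom setter is registered roughly like this (a sketch; the option tweaked here is just a placeholder, not a recommendation):
    // Sketch of a custom RocksDBConfigSetter.
    // RocksDBConfigSetter is org.apache.kafka.streams.state.RocksDBConfigSetter, Options is org.rocksdb.Options.
    public static class CustomRocksDBConfig implements RocksDBConfigSetter {
        @Override
        public void setConfig(String storeName, Options options, Map<String, Object> configs) {
            // Example placeholder: trigger level-0 compaction after fewer files accumulate.
            options.setLevel0FileNumCompactionTrigger(2);
        }

        @Override
        public void close(String storeName, Options options) {
            // nothing created in setConfig, so nothing to clean up here
        }
    }

    // Register it via the Streams configuration:
    Properties props = new Properties();
    props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);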
However, I would not recommend changing RocksDB configs as long as there is no real issue -- you can do more harm than good. Your store size seems quite small, so I don't see a real problem at the moment.
Btw: if you go to production, you should change the state.dir config to an appropriate directory where the state will not be lost even after a machine restart. If you put state into the default /tmp location, the state is most likely gone after the machine restarts and an expensive recovery from the changelog topics would be triggered.
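Setting that is a one-liner in the same Streams Properties you configure the application with (the path below is just an example):
    // Point state.dir at a persistent location instead of the /tmp default.
    props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams");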

Laravel Scheduling in clustered environment

I am working with scheduling in Laravel 5.3. Previously, I was using one server to host the Laravel application. Now that I am using two servers to run the Laravel app, how do I ensure that both servers are not running the same jobs at the same time?
Recently, I saw an Event method called "withoutOverlapping()". See https://laravel.com/docs/5.3/scheduling#preventing-task-overlaps
In my case, withoutOverlapping() cannot help me as I am working in a clustered environment.
Are there any workarounds or suggestions regarding this?
First of all, define whether it is critical or not to avoid running a task multiple times.
For example, if your app uses a task to do some sort of cleanup, there is almost no drawback to running it on every server (who cares if you try to delete messages older than 10 minutes twice?).
If it is absolutely critical to run every task only once, you'll need to define a "main server" that will execute the tasks, and slave servers that will just answer requests but not perform any tasks. This is quite trivial, as you just have to give every env a different name in your .env and test against that when you define the scheduler tasks.
This is the easiest way; seriously, don't bother building a database locking mechanism or whatever so you can synchronise tasks across servers. Even OSes struggle to properly synchronise threads on the same machine, so why would you want to implement the same across different machines?
Here's what I did when I ran into the same problem with load balancing:
abstract class MutexCommand extends Command {
    private $hash = null;

    public function cleanup() {
        if (is_string($this->hash)) {
            Redis::del($this->hash);
            $this->hash = null;
        }
    }

    protected abstract function generateHash();
    protected abstract function handleInternal();

    public final function handle() {
        register_shutdown_function([$this, "cleanup"]);
        try {
            $this->hash = $this->generateHash();
            // Set a value if it does not exist, atomically. Will fail if it does exist.
            // Essentially setnx is the mechanism to acquire the lock.
            if (!Redis::setnx($this->hash, true)) {
                $this->hash = null; // Prevent it from being cleaned up
                throw new Exception("Already running");
            }
            $this->handleInternal();
        } finally {
            $this->cleanup();
        }
    }
}
Then you can write your commands:
class ThisShouldNotOverlap extends MutexCommand {
    public function generateHash() {
        return "Unique key for mutex; you can just use the class name if you want by doing return static::class";
    }

    public function handleInternal() { /* do stuff */ }
}
Then whenever you try to run the same command on multiple instances one would successfully acquire the "lock" and the others should fail.
Of course this assumes that you are using a non-clustered redis cache.
If you are not using Redis, there are probably similar locking mechanisms you can implement in other caches; if you are using a clustered Redis, then you may need to use the RedLock locking mechanism.
Essentially no, there's no natural way in Laravel to know whether another Laravel app has the same job on the job dispatcher.
We have some options to find a solution:
Create an intermediate app that manages the jobs from the other apps.
Allow only one app to dispatch jobs.
Use worker queues; there are some packages for this. I would recommend using Laravel 5 with WebSockets and queueing asynchronously.
First of all, the Laravel scheduler isn't designed to work in a clustered environment. It was never intended to be that way.
I would suggest having a dedicated cron instance which manages your Laravel scheduler jobs.

Spring Statemachine Forks

I have made good progress with the state machines up to now. My most recent problem arose when I wanted to use a fork (I'm using UML). The fork didn't work as it is supposed to, and I think it's because of the persistence. I persist my machine in Redis; refer to the image below.
This is my top-level machine, where Manage-commands is a submachine reference and the top region is as it is.
Now say I persisted some state in Redis from the lower region, and next an ONLINE event comes; then the machine does not accept the event, clearly because I have asked the machine to restore the state from Redis with a given key.
But I want both regions to be persisted so that either one is selected according to the event.
Is there any way to achieve this?
Below is how I persist and restore:
private void feedMachine(StateMachine<String, String> stateMachine, String user, GenericMessage<String> event)
        throws Exception {
    stateMachine.sendEvent(event);
    System.out.println("persist machine --- > state :" + stateMachine.getState().toString());
    redisStateMachinePersister.persist(stateMachine, "testprefixSw:" + user);
}

private StateMachine<String, String> resetStateMachineFromStore(StateMachine<String, String> stateMachine,
        String user) throws Exception {
    StateMachine<String, String> machine = redisStateMachinePersister.restore(stateMachine, "testprefixSw:" + user);
    System.out.println("restore machine --- > state :" + machine.getState().toString());
    return machine;
}
It's a bit weird, as I found some other issues with persistence which I fixed in 1.2.x. They're probably not related to your issues, but I would have expected you to see similar errors. Anyway, could you check RedisPersistTests.java and see if there's something different from what you're doing? I haven't yet tried submachine refs, but they should not make any difference from a persistence point of view.
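For reference, the redisStateMachinePersister used in the question is normally wired following the Redis persist recipe, roughly like this (a sketch; it assumes a RedisConnectionFactory bean is available):
    // Sketch: wiring a RedisStateMachinePersister for <String, String> machines
    // (classes from spring-statemachine-redis and spring-statemachine-core).
    @Bean
    public StateMachinePersist<String, String, String> stateMachinePersist(RedisConnectionFactory connectionFactory) {
        RedisStateMachineContextRepository<String, String> repository =
                new RedisStateMachineContextRepository<>(connectionFactory);
        return new RepositoryStateMachinePersist<>(repository);
    }

    @Bean
    public RedisStateMachinePersister<String, String> redisStateMachinePersister(
            StateMachinePersist<String, String, String> stateMachinePersist) {
        return new RedisStateMachinePersister<>(stateMachinePersist);
    }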

Command pattern, need to create new commands before executing

I use the command pattern in one of my applications, and I have the following problem:
Some commands need other commands to be created before they are executed.
Whether new commands are needed depends on the state of the application, so I cannot resolve whether to create new commands when adding commands to the queue; I need to resolve it just before they are executed.
Specifically, I use commands to control a strategy game. I have a command to upgrade a building, and it costs resources.
When the resource price is higher than my storages' capacity, the program should detect this and insert commands for upgrading the resource storages before the actual upgrade of the building. This is why I cannot resolve the need to upgrade the storages when adding this command to the queue: the command could be executed many days later, and the levels of the storages will change over time.
The only option that came to my mind is to insert new commands before the command that needs more resources than my storages can hold, and restart the execution of the command queue from the beginning, but that is a really ugly solution.
Is there some design pattern for resolving command dependencies only when a command is first in the queue to be executed, and for inserting those dependencies before that command is executed?
I need to add the commands to upgrade the storages to the queue, so they can be persisted for later execution when I currently don't have the resources to upgrade the storages.
My QueueConsumer, where the queue processing logic is, looks like this:
public function processQueue()
{
    $failedCommands = [];
    $success = false;
    $queue = $this->queueManager->getQueue();
    foreach ($queue as $key => $command) {
        foreach ($this->processors as $processor) {
            if ($processor->canProcessCommand($command)) {
                $success = $processor->processCommand($command);
                // in the processCommand method I am able to resolve whether I need new commands (need to upgrade storages) or not
                break;
            }
        }
        if ($success) {
            $this->queueManager->removeFromQueue($command->getUuid());
        } else {
            $failedCommands[] = $command;
            break;
        }
    }
    if (count($failedCommands) > 0) {
        // determine when the failed commands could be processed successfully (enough resources and so on).
    }
}
Could you use an IoC container? It would resolve all the dependencies for you.
