Spark ML broadcast model once - spark-streaming

Background: I have a Spark ML trained random forest model. I am reading data from Kafka (streaming); the data will pass through the model and the predictions will be saved in a database.
What I want: I want to load my model once and broadcast it only once, when my application starts, and reload it only when I choose to redeploy.
My understanding: The model gets loaded and broadcast for each micro-batch.
Question: How do I make my model load only once, rather than once per micro-batch, so that it persists until the end of the application?
Any pointers will be appreciated.
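For illustration, the usual shape of this pattern in PySpark is to load the model once on the driver, before the streaming query starts, and reference it from the foreachBatch function so every micro-batch reuses the same instance; the model travels with the task closures, so an explicit broadcast is usually unnecessary. A minimal sketch, where all paths, topic names, and connection strings are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

spark = SparkSession.builder.appName("rf-scoring").getOrCreate()

# Load the model ONCE, on the driver, at application start -- not inside
# the per-batch function. Hypothetical path; assumes the random forest was
# saved as a fitted pipeline.
model = PipelineModel.load("hdfs:///models/random_forest")

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
          .option("subscribe", "events")                     # hypothetical topic
          .load())

def parse_features(batch_df):
    # Hypothetical placeholder: decode the Kafka `value` column into the
    # feature columns the model actually expects.
    return batch_df.selectExpr("CAST(value AS STRING) AS raw")

def score_and_save(batch_df, batch_id):
    # `model` is captured from the enclosing scope; it is NOT re-loaded here,
    # so each micro-batch reuses the single instance loaded above.
    predictions = model.transform(parse_features(batch_df))
    (predictions.write
        .format("jdbc")  # hypothetical sink; swap in your database writer
        .option("url", "jdbc:postgresql://db:5432/mydb")
        .option("dbtable", "predictions")
        .mode("append")
        .save())

query = stream.writeStream.foreachBatch(score_and_save).start()
query.awaitTermination()
```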

Related

Laravel - Saving PDF into a BLOB, causing job serialization issues

I am storing a raw PDF in a binary/blob column in my database. This all works great in most cases.
I had an issue a while back with Livewire where the blob was causing rendering issues. That was easy to fix by adding the blob column to the model's hidden array:
https://laravel.com/docs/8.x/eloquent-serialization#hiding-attributes-from-json
I am now back with a similar issue but no fix. When pushing one of these models into a job, the serializer fails because it cannot encode this column.
The error we get is:
Unable to JSON encode payload. Error code: 5
My options/ideas so far:
Base64 encode the PDF and save it in a text field.
Maybe the new Laravel job encryption feature could help, but I am not yet on a version where I can test it.
Does anyone have any other ideas?
Remove the SerializesModels trait from your Job class, or pass an Eloquent model into the constructor.
From the Laravel documentation:
If your queued job accepts an Eloquent model in its constructor, only the identifier for the model will be serialized onto the queue. When the job is actually handled, the queue system will automatically re-retrieve the full model instance and its loaded relationships from the database. This approach to model serialization allows for much smaller job payloads to be sent to your queue driver.
Or don't pass the whole model into the constructor; use the id directly.
Or simply save the PDF file to the filesystem and keep only its path in the database. That will save you a lot of database read/write time, plus all the tinkering with big payloads.
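For intuition about the error itself: PHP's JSON error code 5 is JSON_ERROR_UTF8 ("Malformed UTF-8 characters"), and raw PDF bytes are not valid UTF-8 text. A small Python sketch of the same failure mode, and of why the base64 option from the question works:

```python
import base64
import json

pdf_bytes = b"%PDF-1.7\x00\x93\xff..."  # raw binary: not valid UTF-8

# A raw blob cannot go into a JSON payload as text.
try:
    json.dumps({"document": pdf_bytes.decode("utf-8")})
except UnicodeDecodeError as exc:
    print("raw blob fails:", exc)

# Base64 turns the blob into plain ASCII, which JSON encodes happily.
encoded = base64.b64encode(pdf_bytes).decode("ascii")
payload = json.dumps({"document": encoded})
assert base64.b64decode(json.loads(payload)["document"]) == pdf_bytes
```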

Scaled microservice instances need to update 1

I have a unique problem and am trying to work out the best implementation for it.
I have a table with half a million rows. Each row represents a business entity. I need to fetch information about each entity from the internet and update it back on the table asynchronously (this process takes about 2 to 3 minutes).
I cannot get all these rows updated efficiently with one instance of the microservice, so I am planning to scale it up to multiple instances.
Each microservice instance is an async daemon: it fetches one business entity at a time, processes the data, and finally updates the data back to the table.
Here is my problem: with multiple instances, how do I ensure that no two microservice instances work on the same business entity (the same row) during the update process? I want an optimal solution, preferably without having to maintain any state in the application layer.
You have to use an external system (database/cache) to save information about the locks each instance holds.
Example: ShedLock. It creates a table or document in the database where it stores information about the current locks.
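ShedLock itself is a Java library; to illustrate the same idea of an external lock store without it, here is a sketch in Python that claims one unprocessed row per worker using PostgreSQL's SELECT ... FOR UPDATE SKIP LOCKED (table, column, and connection names are hypothetical):

```python
import psycopg2

conn = psycopg2.connect("dbname=entities")  # hypothetical connection string

def claim_next_entity():
    """Atomically claim one row; concurrent workers skip rows we hold."""
    with conn:  # commits on success, rolls back on error
        with conn.cursor() as cur:
            cur.execute("""
                SELECT id FROM business_entity
                WHERE processed = false
                ORDER BY id
                LIMIT 1
                FOR UPDATE SKIP LOCKED
            """)
            row = cur.fetchone()
            if row is None:
                return None  # nothing left to claim
            # A real system might set an in_progress flag plus a timeout
            # instead, so crashed workers release their rows.
            cur.execute(
                "UPDATE business_entity SET processed = true WHERE id = %s",
                (row[0],),
            )
            return row[0]
```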
I would suggest you use a worker queue, which looks like a perfect fit for your problem. Just load all the data (or the ids of the rows) into the queue once, then let the consumers consume them.
You can see a clear explanation here:
https://www.rabbitmq.com/tutorials/tutorial-two-python.html
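A minimal sketch of that worker-queue setup in Python with pika, mirroring the linked tutorial; the queue name and the processing function are hypothetical:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="entity_ids", durable=True)

def load_ids_once(ids):
    # One-time producer: push every row id into the queue.
    for entity_id in ids:
        channel.basic_publish(
            exchange="",
            routing_key="entity_ids",
            body=str(entity_id),
            properties=pika.BasicProperties(delivery_mode=2),  # survive restarts
        )

def process_entity(entity_id):
    print("processing", entity_id)  # hypothetical 2-3 minute fetch-and-update

def handle(ch, method, properties, body):
    process_entity(body.decode())
    ch.basic_ack(delivery_tag=method.delivery_tag)  # remove only after success

# Deliver at most one unacknowledged message per worker, so slow tasks are
# spread evenly across instances.
channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue="entity_ids", on_message_callback=handle)
channel.start_consuming()
```

Because the broker hands each id to exactly one consumer and removes a message only after it is acknowledged, no two instances ever work on the same row, and a crashed worker's message is redelivered to another instance.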

Laravel Queues Doc. - Passing an Eloquent Model

There is a statement I don't understand in the Queues chapter of the Laravel 5.5 documentation. It says:
If your queued job accepts an Eloquent model in its constructor, only
the identifier for the model will be serialized onto the queue.
I want to understand what this means. Thanks in advance.
Well, you should read further:
When the job is actually handled, the queue system will automatically re-retrieve the full model instance from the database. It's all totally transparent to your application and prevents issues that can arise from serializing full Eloquent model instances.
This means that if you pass a user with id 1 to the queue, and something changes before the job executes, those changes will be available when the job runs, because a fresh model will be fetched from the database.

Will the Core Data migration mechanism deal with new data on a pre-populated entity?

I have a Mac application that uses Core Data with basically one entity.
This app creates particles for Quartz and comes with a variety of particle setups ready to use, such as fire, smoke, comet, etc. These particles are saved in that entity and shipped to the user; in other words, the application comes with a pre-populated entity.
This same entity is used to save the particles created by the user (I have a flag I set to know whether a particle was created by the user or by me).
I would like to update this app by including more pre populated particles.
The problem is that every user has already saved their own particles. The new version must not mess with those, while still adding the new particles I create.
I know that the Core Data migration mechanism is better suited to migrating structures, but what about data? I suspect Core Data will not do this, so I will have to check the database to see if the new particles are there and add them in code the first time the user runs the app, right? Or is there a way to do that automatically?
The short answer is no. Migrations are for structural changes only; they will not add new data.
The creation of new data, or the updating of old data, is a business decision and is outside the scope of the migration API.
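So the usual approach is exactly the check-and-insert the asker suspects. This is not Core Data API, just a language-neutral sketch (written in Python over SQLite) of seeding new built-in records once per app version; all table and column names are hypothetical:

```python
import sqlite3

SEED_VERSION = 2  # bump whenever the app ships new built-in particles
NEW_BUILTIN_PARTICLES = ["rain", "snow"]  # hypothetical new presets

def seed_if_needed(conn: sqlite3.Connection) -> None:
    row = conn.execute(
        "SELECT value FROM meta WHERE key = 'seed_version'"
    ).fetchone()
    if row is not None and int(row[0]) >= SEED_VERSION:
        return  # already seeded; user-created rows are never touched
    for name in NEW_BUILTIN_PARTICLES:
        conn.execute(
            "INSERT INTO particle (name, is_builtin) VALUES (?, 1)", (name,)
        )
    conn.execute(
        "INSERT OR REPLACE INTO meta (key, value) VALUES ('seed_version', ?)",
        (str(SEED_VERSION),),
    )
    conn.commit()
```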

UI-centric vs domain-centric data model - pros and cons

How closely does your data model map to your UI and domain model?
The data model can be quite close to the domain model if it has, for example, a Customer table, an Employee table etc.
The UI might not reflect the data model so closely, though. For example, there may be multiple forms, each feeding in bits and pieces of Customer data along with other miscellaneous data. In this case, one could have separate tables to hold the data from each form; the data can then be combined at a future point as required. Alternatively, one could insert the form data directly into a Customer table, so that the data model does not correlate closely with the UI.
What has proven to work better for you?
I find it cleaner to map your domain model to the real-world problem you are trying to solve.
You can then create viewmodels, which act as a bucket for all the data required by your view.
As stated, your UI can change frequently, but this does not usually change the particular domain problem you are tackling.
More information on this pattern can be found here:
http://blogs.msdn.com/dphill/archive/2009/01/31/the-viewmodel-pattern.aspx
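As a toy illustration of the split (not from the linked post; all class names are hypothetical), here is a Python sketch in which the domain model mirrors the real-world entities and the viewmodel is just the bucket one screen needs:

```python
from dataclasses import dataclass

@dataclass
class Customer:  # domain model: mirrors the real-world entity
    id: int
    name: str
    email: str

@dataclass
class Order:  # domain model
    id: int
    customer_id: int
    total: float

@dataclass
class CustomerSummaryViewModel:  # exactly what one view displays
    display_name: str
    open_order_count: int

def build_summary(customer: Customer, orders: list[Order]) -> CustomerSummaryViewModel:
    # The view can change what it shows without the domain model changing.
    return CustomerSummaryViewModel(
        display_name=customer.name,
        open_order_count=len(orders),
    )
```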
UI can change according to many needs, so it's generally better to keep data in a domain model, abstracted away from any one UI.
If I have a RESTful service layer, what it exposes is the domain model. In that case, the UI (any particular screen) calls a number of these services and composes the screen from the domain models it collects. So although domain models bubble all the way up to the UI, the UI layer skims out just the data it needs to build its particular screen. There are also some interesting questions on SO about using (annotated) domain models for persistence.
My point here is that the domain model can be a single source of truth. It can carry data and encapsulate logic fairly well. I have worked on projects that had a lot of boilerplate code translating each domain model into DTOs, VOs, DOs, and what-have-you. A lot of that looked quite unnecessary, and in most cases was more due to habit.
