Fasttext model load time - performance

I have trained a model using AWS SageMaker and downloaded it from SageMaker. The model's .bin file is 1.7 GB.
Now I am loading the model with fastText (https://fasttext.cc/docs/en/unsupervised-tutorial.html) using the code below:
import os
import fasttext

model = fasttext.load_model(os.path.join(model_dir, 'vectors.bin'))
It takes at least 4 seconds to load the model, both on my local machine and on an EC2 instance.
How can I improve the loading time?
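If the load itself can't be made faster, one common mitigation is to pay the cost once per process instead of once per request (the same container-reuse idea discussed in the Lambda thread below). A minimal sketch, assuming a long-lived serving process; the MODEL_DIR environment variable and the neighbor lookup are illustrative, not from the original question:

import os
import fasttext

# Load once at process startup; every subsequent request reuses the in-memory model.
MODEL = fasttext.load_model(os.path.join(os.environ.get('MODEL_DIR', '.'), 'vectors.bin'))

def handle_request(word):
    # Example query against an unsupervised fastText model.
    return MODEL.get_nearest_neighbors(word)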

Related

Best practice to deploy multiple models that will run concurrently at scale (something like map-reduce)

I have a model that consists of 150 sub-models (currently run in a for loop).
To make this performant, I would like to split it into 150 separately deployed models, so that for every request my server receives it sends 150 API requests, one to each model, and then combines the results (so the invocations run in parallel). A map-reduce of sorts.
I thought about AWS SageMaker multi-model endpoints, but the documentation suggests that use case is better suited to serial invocation than to parallel or concurrent runs.
I also thought about creating a Lambda function that reads the model and scales accordingly (serverless), but that sounds odd to me, as if I am missing SageMaker's intended use cases.
Thanks!
Are your models similarly sized? Concurrent requests should not be an issue as long as you choose an instance type to back the endpoint with enough workers to handle them. Check out the SageMaker Real-Time Inference pricing page to see the different instance types you can use; I would suggest tuning the instance type along with the instance count to handle your request volume.
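On the client side, the fan-out itself can be as simple as a thread pool around invoke_endpoint. A rough sketch with boto3; the endpoint names, JSON payload format, and the reduce step are hypothetical placeholders:

import json
from concurrent.futures import ThreadPoolExecutor

import boto3

runtime = boto3.client('sagemaker-runtime')
ENDPOINTS = [f'sub-model-{i}' for i in range(150)]  # hypothetical endpoint names

def invoke(endpoint_name, payload):
    # One synchronous invocation of a single SageMaker real-time endpoint.
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',
        Body=json.dumps(payload),
    )
    return json.loads(response['Body'].read())

def predict(payload):
    # "Map": invoke all 150 models concurrently; "reduce": combine the results.
    with ThreadPoolExecutor(max_workers=50) as pool:
        results = list(pool.map(lambda name: invoke(name, payload), ENDPOINTS))
    return results  # combine however your aggregation logic requires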

Laravel - Setting up right Jobs per function

I have a question about setting up Jobs for my functions.
The case
1. A customer uploads images to an FTP environment
2. A function will check if there are any images in the folder the client uploads to
3. Check if the image is older than 5 minutes (to be sure there is no image still in the "upload" process)
4. Check if the image is larger than 0 kB (yes, it happens that the client uploads 0 kB images)
5. Reduce image with "intervention/image"
6. Copy image to local website
7. Move image to "uploaded" folder as backup
So these are all individual tasks that have to be done.
My question is, do I have to make a single job per function/task or can I put all functions in one job?
Thanks!
You don't need to use cron jobs; you can dispatch a job from the controller:
ProcessImage::dispatch($image);
I would approach this with a class that contains all of those tasks, each task represented by a public method (additional private helper methods can be added according to your needs).
Then create a Laravel Job that receives that class by dependency injection via the Service Container. In the handle method of your Job, call the methods of your class in whatever order you need.

How to load big file model in a lambda function

Say I want a Lambda function to predict an incoming message's category with a trained model. However, the model is oversized (~1 GB).
With the current architecture, I would have to upload the trained model to AWS S3 and then load it every time the Lambda is triggered. This is not desirable, since most of the time is spent loading the model.
Some solutions I have in mind:
Don't use Lambda; have a dedicated EC2 instance do the work.
Keep it warm by periodically sending a dummy request.
Or, I suspect AWS will cache the file, so the next load could be faster?
I think reading about container reuse in Lambda could be helpful here:
https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/
You can keep the model as a cached global variable by declaring and initialising it outside the handler function. If Lambda reuses the same container for subsequent requests, the file won't be re-downloaded.
But it's entirely up to Lambda whether to reuse the container or start a new one. Since this is Lambda's prerogative you can't depend on this behaviour.
If you want to minimise the number of downloads from S3, a managed external caching solution (ElastiCache/Redis) in the same AZ as your function is a possible alternative to look at.
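A minimal sketch of the cached-global pattern in a Python handler; the bucket name, key, and fastText usage are assumptions for illustration (note that Lambda's /tmp is 512 MB by default, so its ephemeral storage would need to be raised for a ~1 GB model):

import os

import boto3
import fasttext  # assumes the package is bundled in the deployment package or a layer

MODEL_PATH = '/tmp/vectors.bin'  # /tmp contents survive across warm invocations

# This top-level code runs once per container; warm invocations skip it entirely.
if not os.path.exists(MODEL_PATH):
    # hypothetical bucket and key, shown for illustration
    boto3.client('s3').download_file('my-model-bucket', 'models/vectors.bin', MODEL_PATH)
model = fasttext.load_model(MODEL_PATH)

def handler(event, context):
    labels, probabilities = model.predict(event['message'])
    return {'category': labels[0], 'confidence': float(probabilities[0])}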

Storing data in Spring state machine?

I am making a multiplayer quiz-like game. I have chosen to use Spring State Machine to model each individual instance of the game on the server, using @EnableStateMachineFactory. But I need every instance of the state machine to hold additional game data/state info, and to initialise that data on state machine startup with custom data (player usernames, for example). Is ExtendedState intended for this kind of thing, and if it is, how do I pass custom initial extended-state data when creating the state machine with the factory?
Yes, ExtendedState is the only way to store data within the machine itself. I've used it like that, so it's fine.
To initialise ExtendedState I'd use the machine's initial action, which is executed when the initial state's entry logic runs. In the UML state machine model, its purpose by definition is to initialise the machine (see the "Initial State" section of the Spring State Machine documentation).

How do you import Big Data public data sets into AWS?

Loading any of Amazon's listed public data sets (http://aws.amazon.com/datasets) would take a lot of resources and bandwidth. What's the best way to import them into AWS so you can start working with them quickly?
You will need to create a new EBS volume from the snapshot ID for the public dataset. That way you won't need to pay for the transfer.
But be careful: some data sets are only available in one region, usually denoted by a note similar to the one below. In that case you should launch your EC2 instance in the same region.
These datasets are hosted in the us-east-1 region. If you process these from other regions you will be charged data transfer fees.
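A sketch of the volume-from-snapshot step with boto3; the snapshot ID, availability zone, instance ID, and device name below are placeholders:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')  # same region as the dataset

# Create an EBS volume from the public dataset's snapshot (hypothetical ID).
volume = ec2.create_volume(
    SnapshotId='snap-0123456789abcdef0',
    AvailabilityZone='us-east-1a',  # must match the target instance's AZ
)
ec2.get_waiter('volume_available').wait(VolumeIds=[volume['VolumeId']])

# Attach it to a running instance, then mount it from the OS as usual.
ec2.attach_volume(
    VolumeId=volume['VolumeId'],
    InstanceId='i-0123456789abcdef0',  # hypothetical instance in us-east-1a
    Device='/dev/sdf',
)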
FYI: SDBExplorer uses multithreaded BatchPutAttributes calls to achieve high write throughput while uploading bulk data to Amazon SimpleDB. SDB Explorer allows multiple parallel uploads; if you have the bandwidth, you can take full advantage of it by running a number of BatchPutAttributes operations at once in a parallel queue, which reduces processing time. SDBExplorer supports importing data from MySQL and CSV into Amazon SimpleDB.
http://www.sdbexplorer.com
Disclosure: I am the developer of SDBExplorer.
