I am fairly new to SLURM: the grid I use has many different users, and when they are submitting or cancelling jobs, it seems that other users are not able to query partition status, etc. This is extremely frustrating, especially when creating jobs that spawn other jobs, since those end up failing because the controller is busy. Does anyone know a workaround?
With the default settings, Slurm can get slow/hang when many users submit/modify/cancel many jobs at the same time, especially with backfill and accounting enabled.
For tips on improving this, see these slides from the 2012 Slurm User Group Meeting.
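In case it helps, there are a few slurm.conf knobs that are commonly tuned for exactly this kind of controller contention. The lines below are only an illustrative sketch (not settings quoted from those slides), and the values need to be adapted to the site:

    # Illustrative slurm.conf tuning for a busy slurmctld (example values only)
    SchedulerParameters=defer,max_rpc_cnt=150   # defer per-submit scheduling when many RPCs are in flight
    SlurmctldPort=6817-6820                     # listen on a small port range to absorb submission bursts
    MaxJobCount=100000                          # raise the number of jobs the controller keeps in memory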
Related
How can I efficiently interrogate jobs on a Laravel queue?
For example, my RosterEmployee job has an Employee property. I have a lot of these jobs queued. Later, I need to terminate a batch of Employees, and in doing so I want to identify any queued jobs that should be deleted. (There is no point in Rostering an employee if they are about to be terminated).
To do this, I need to identify jobs where the EmployeeId is the same as an employee who is in the termination batch and delete the jobs.
(I realize that in the real world there would be better ways to accomplish this Employee termination logic; however, I have contrived the example to demonstrate my need: efficiently interrogating jobs on a Laravel queue.)
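There is no built-in Laravel API for querying pending jobs, but with the Redis driver the pending payloads sit in a plain Redis list, so they can be read and decoded by hand. The sketch below is only an illustration under several assumptions: the default connection and queue (key queues:default), payloads that are neither encrypted nor compressed, a RosterEmployee class in App\Jobs that uses SerializesModels (so its employee property is stored as a ModelIdentifier exposing an id), and a $terminatedIds array from the termination batch.

    <?php

    use Illuminate\Support\Facades\Redis;

    $terminatedIds = [101, 102, 103];   // assumed: IDs from the termination batch
    $queueKey      = 'queues:default';  // assumed: default Redis queue key

    $toDelete = [];

    foreach (Redis::lrange($queueKey, 0, -1) as $raw) {
        $payload = json_decode($raw, true);
        $command = unserialize($payload['data']['command']);

        // With SerializesModels, $command->employee is a ModelIdentifier whose
        // ->id holds the Employee's primary key.
        if ($command instanceof \App\Jobs\RosterEmployee
            && in_array($command->employee->id, $terminatedIds)) {
            $toDelete[] = $raw;         // remember the exact raw payload
        }
    }

    // $toDelete now holds the raw payloads of jobs targeting terminated employees.

Deleting the matched payloads in place is possible but fiddly; an often simpler pattern is to leave the queue alone and have RosterEmployee::handle() bail out early when its employee has already been terminated.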
I need to create several tens of millions of jobs.
I have tried it with for-loops and Bus::batch([]), and unfortunately creating the jobs takes longer than processing them with the 10 servers/workers. That means the workers have to wait until the jobs show up in the backend (Redis, database, etc.). With redis-benchmark I was able to confirm that Redis is not the problem.
Anyway... is there a way to create jobs in BULK (not batch)? I'm just thinking of something like:
INSERT INTO ... () VALUES (), (), (), (), ...
Anyway, creating several million jobs in a for-loop or in a batch seems to be very slow for some reason, probably because it always issues just one query at a time rather than a single bulk insert.
I would be very grateful for any help!
Writing a million records will be slow under any conditions. I'd recommend maximizing your queue performance using several methods:
Create a job that will create all the other jobs, if possible (see the sketch after this list)
Use QUEUE_CONNECTION=redis for your queues, as Redis stores data in RAM, which is the fastest option
Create your jobs after the response has already been processed (i.e. dispatch them once the response has been sent)
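On the first point, here is a minimal fan-out sketch (all class names are assumptions, not from the question): a small "seeder" job per ID range, so the expensive job creation itself runs in parallel on the workers instead of serially in the original process.

    <?php

    namespace App\Jobs;

    use Illuminate\Bus\Queueable;
    use Illuminate\Contracts\Queue\ShouldQueue;
    use Illuminate\Foundation\Bus\Dispatchable;

    // The actual (assumed) work job: one per item.
    class ProcessItem implements ShouldQueue
    {
        use Dispatchable, Queueable;

        public function __construct(public int $itemId) {}

        public function handle(): void
        {
            // ... the real work for one item ...
        }
    }

    // Seeder job: creates one slice of the real jobs from a worker.
    class SeedProcessItemJobs implements ShouldQueue
    {
        use Dispatchable, Queueable;

        public function __construct(public int $fromId, public int $toId) {}

        public function handle(): void
        {
            // Each seeder dispatches its own slice, so job creation is spread
            // across all 10 workers instead of running in one long loop.
            for ($id = $this->fromId; $id <= $this->toId; $id++) {
                ProcessItem::dispatch($id);
            }
        }
    }

    // Kick off e.g. 100 seeders of 100k IDs each instead of 10M dispatches inline:
    // for ($from = 1; $from <= 10_000_000; $from += 100_000) {
    //     SeedProcessItemJobs::dispatch($from, min($from + 99_999, 10_000_000));
    // }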
I have a large set of users in my project, around 50 million.
I need to create a playlist for each user every day. To do this, I'm currently using the following method:
I have a column in my users table that holds the latest time a playlist was created for that user; I named it last_playlist_created_at.
I run a query on the users table that selects the top 1000 users whose last_playlist_created_at is more than one day old, sorted in ascending order by last_playlist_created_at.
After that, I run a foreach over the result and publish a message for each user to my message broker.
Behind the message broker, I run around 64 workers that process the messages (create a playlist for the user) and update last_playlist_created_at in the users table.
When my message broker's queue is empty, I repeat these steps (in a while / do-while loop).
I think the processing method is good enough and can scale well, but the method we use to create the message for each user is not scalable!
What should I do to dispatch such a large set of messages, one for each of my users?
OK, so my answer is based entirely on your comment where you mentioned that you use while(true) to check whether the playlist needs to be updated, which does not seem ideal.
Although this is a design question and there are multiple solutions, here's how I would solve it.
First up, think of updating the playlist for a user as a job.
Now, in your case this is a scheduled job, i.e. it runs once a day.
So, use a scheduler to schedule the next job time.
Write a scheduled-job handler to push this to a message queue. This part is just there to handle multiple jobs at the same time, so you can control the flow.
Generate the playlist for the user based on the job. Create a Schedule event for the next day.
You could persist Scheduled Job data just to avoid race conditions.
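A minimal Laravel-flavoured sketch of that flow, assuming a GeneratePlaylist job class (an assumption, not from the question) and the users.last_playlist_created_at column from the question; after generating, the job schedules its own next run roughly a day later, which plays the role of the "schedule event for the next day" above:

    <?php

    namespace App\Jobs;

    use Illuminate\Bus\Queueable;
    use Illuminate\Contracts\Queue\ShouldQueue;
    use Illuminate\Foundation\Bus\Dispatchable;
    use Illuminate\Support\Facades\DB;

    class GeneratePlaylist implements ShouldQueue
    {
        use Dispatchable, Queueable;

        public function __construct(public int $userId) {}

        public function handle(): void
        {
            // 1. Build the playlist for this user (domain logic omitted).

            // 2. Record when it was generated.
            DB::table('users')
                ->where('id', $this->userId)
                ->update(['last_playlist_created_at' => now()]);

            // 3. Schedule the next run ~24h from now (the "next day" event).
            static::dispatch($this->userId)->delay(now()->addDay());
        }
    }

Whether keeping ~50 million delayed messages in the broker is acceptable depends on the broker; persisting the scheduled-job data outside the queue, as suggested above, also guards against lost messages and race conditions.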
Looking for some guidance on the best architecture to accomplish what I am trying to do. I occasionally get spreadsheets with a column of data that needs to be translated. There could be anywhere from 200 to 10,000 rows in that column. What I want to do is pull all the rows and add them to a Redis queue. I am thinking Redis will be best, as I can throttle the queue, which is necessary because the API I am calling for translation has throttle limits. Once the translation is done, I will put the translations into a new column and return to the user a new spreadsheet with the additional column.
If anyone has ideas for the best setup I am open to them, but I want to stick with Laravel as that is what the application already runs on. I am just not sure whether I should create one queue job whose process opens the file and does all the translations, or add a queued job for each row of text, or, lastly, add all of the rows of text to a table in my database and have a scheduled task run every minute that checks that table for untranslated rows and processes a certain number of them on each check. I am not sure about a cron job running that frequently when this only happens maybe twice a month.
I can see a lot of ways of doing it, but I am looking for an ideal setup, because what I don't want is to hit the throttle limits and lose translations I have already done when the process errors out.
Thanks for any advice
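For what it's worth, here is one hedged sketch of the "one queued job per row" option: a job that takes a throttle slot via Laravel's Redis::throttle and releases itself back onto the queue when no slot is available, so hitting the API limit delays a row rather than losing it. The class name, throttle key, rate, and storage are all assumptions to be adapted.

    <?php

    namespace App\Jobs;

    use Illuminate\Bus\Queueable;
    use Illuminate\Contracts\Queue\ShouldQueue;
    use Illuminate\Foundation\Bus\Dispatchable;
    use Illuminate\Queue\InteractsWithQueue;
    use Illuminate\Support\Facades\Redis;

    class TranslateRow implements ShouldQueue
    {
        use Dispatchable, InteractsWithQueue, Queueable;

        // Releases count as attempts, so keep this high enough for the throttle.
        public $tries = 20;

        public function __construct(public int $rowIndex, public string $text) {}

        public function handle(): void
        {
            Redis::throttle('translation-api')
                ->allow(60)->every(60)          // assumed: 60 API calls per minute
                ->then(function () {
                    // Call the translation API and store the result for this row
                    // (the API client and storage here are placeholders).
                }, function () {
                    // No throttle slot free: put the job back instead of losing it.
                    $this->release(10);
                });
        }
    }

The spreadsheet import itself can stay a single job that reads the file and dispatches one TranslateRow per row; once all rows are stored, a final step writes the new column and returns the rebuilt spreadsheet.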
Every time I execute a query in Google BigQuery, the Explanation tab shows that there is an average wait time involved. Is it possible to know the percentage or the number of seconds of wait time?
Since BigQuery is a managed service, a lot of customers around the globe are using it. It has an internal scheduling system based on the billingTier (explained here: https://cloud.google.com/bigquery/pricing#high-compute) and other internals of your project. Based on this, the query is scheduled for execution depending on cluster availability, so there will be some minimum time until it finds a cluster of machines to execute your job.
I have never seen significant wait times there. If you do have this issue, contact Google support so they can look at your project. If you edit your original question and add a job ID, a Google engineer may check whether there is an issue or not.
It's currently not exposed in the UI.
But you can find a similar concept in the API (search for "wait" on the following page):
https://cloud.google.com/bigquery/docs/reference/v2/jobs#resource
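As a concrete illustration of where that "wait" shows up: each stage of a finished job's queryPlan exposes waitMsAvg / waitRatioAvg (and their Max counterparts). A small sketch using the google/cloud-bigquery PHP client, with the project and job IDs as placeholders:

    <?php

    require 'vendor/autoload.php';

    use Google\Cloud\BigQuery\BigQueryClient;

    $bigQuery = new BigQueryClient(['projectId' => 'my-project']); // placeholder project
    $job      = $bigQuery->job('my_job_id');                       // placeholder job ID

    $stats = $job->info()['statistics']['query'] ?? [];

    foreach ($stats['queryPlan'] ?? [] as $stage) {
        printf(
            "%s: waitMsAvg=%s waitRatioAvg=%s\n",
            $stage['name'],
            $stage['waitMsAvg'] ?? 'n/a',
            $stage['waitRatioAvg'] ?? 'n/a'
        );
    }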
Is it possible to reduce the BigQuery execution wait time to a minimum?
Purchase more BigQuery Slots.
Contact your sales representative or support for more information.