What if I have fewer control tasks than I need to mix into all task suites for workers in Toloka (crowdsourcing)?

For example, I have 5 control tasks and 100 regular tasks, and the mixing settings are 4 regular tasks and 1 control task per suite.
What happens when the worker (aka Toloker) has just seen the last control task while there are still more potential task suites to work on?

The worker will be removed from the pool, so each worker can complete only 5 task suites.
If the worker has seen all of the control tasks in the pool, they will not be able to complete any more task suites and will be notified that the tasks are finished.
However, if you create a separate pool with the exact same control tasks, the system will treat them as new control tasks and can show them to the same Tolokers (though this could affect quality). I suggest creating more control tasks from verified answers from previous runs.
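To put numbers on the example from the question, here is a minimal sketch of the arithmetic. It assumes, as described above, that a worker is never shown the same control task twice within a pool; the function names are made up for illustration and are not part of any Toloka API.

```python
def max_suites_per_worker(control_tasks: int, controls_per_suite: int) -> int:
    """Number of task suites one worker can receive before the
    pool runs out of unseen control tasks for that worker."""
    return control_tasks // controls_per_suite


def max_regular_tasks_per_worker(control_tasks: int,
                                 controls_per_suite: int,
                                 regular_per_suite: int) -> int:
    """Regular (non-control) tasks one worker can label in the pool."""
    suites = max_suites_per_worker(control_tasks, controls_per_suite)
    return suites * regular_per_suite


# The setup from the question: 5 control tasks, mixing 4 regular + 1 control.
suites = max_suites_per_worker(control_tasks=5, controls_per_suite=1)
regular = max_regular_tasks_per_worker(5, 1, 4)
print(suites, regular)
```

With 5 control tasks and 1 control task per suite, each worker gets at most 5 suites, which covers 20 of the 100 regular tasks. The remaining tasks have to be completed by other workers.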

Related

Problem with Quartz and simultaneous tasks

I have some problems with a project that uses Spring Framework and Quartz tasks. I have some tasks that start every minute, and I have a loop over 3 companies (Company objects).
Each company must start one particular task simultaneously, but these 3 tasks generally take longer than one minute.
So I need these 3 tasks to keep executing simultaneously (3 threads at the same time), but each newly triggered task (one minute later) must wait for the previous task of the same company to finish before it starts.
I'm using Spring, and I'm using taskExecutor to start the new thread. I don't know if it's the best way to call the job class.
I hope I was clear.

Specifying process priority in Ansible

Is it possible to specify the process priority for an Ansible task?
The use case is setting a low priority for an expensive and long-running backup task. In a bash script I'd use nice for this. I did not find anything by searching using keywords "process priority" and "nice" combined with "Ansible".
Async tasks allow you to run tasks in the background. This keeps long-running tasks from blocking the remaining tasks. The approach works as long as the remaining tasks are independent of the task marked async, and it can reduce wait time.
For example, suppose one task waits for a huge file to finish downloading, and the next task is a completely independent command that can take some time. Since the async task runs in the background, the rest of the independent commands can be done by the time it completes.
Link to the documentation:
https://docs.ansible.com/ansible/latest/user_guide/playbooks_async.html
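The background idea described in this answer can be sketched in plain Python. This is not Ansible code and says nothing about process priority; it only illustrates a long-running step being started in the background while independent steps proceed. The function names and sleep durations are hypothetical stand-ins.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def long_download() -> str:
    time.sleep(0.2)  # stands in for a slow download
    return "download finished"


def independent_step(name: str) -> str:
    time.sleep(0.05)  # stands in for an unrelated command
    return f"{name} done"


def run_steps() -> list:
    log = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the slow step in the background and move on immediately.
        background = pool.submit(long_download)
        # These steps do not depend on the download, so they run meanwhile.
        for name in ("step A", "step B"):
            log.append(independent_step(name))
        # Only now wait for the background step to finish.
        log.append(background.result())
    return log


print(run_steps())
```

The independent steps are logged first and the download result last, because the main flow only waits on the background step at the end.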

Laravel Queues for multi user environment

I am using Laravel 5.1, and I have a task that takes around 2 minutes to process; specifically, this task generates a report.
Obviously I can't make the user wait for 2 minutes on the same page where I took the user's input. Instead, I should process this task in the background and notify the user later about its completion.
To achieve this, Laravel provides Queues, which run tasks in the background (if I understand correctly). Now, in a multi-user environment, say 4 users request report generation one after another: since the feature is called Queues, does that mean the tasks will be performed one after the other (i.e. the 4th user's report will only be generated once the 3rd user's report is done)?
If Queues complete their tasks one after the other, is there any way to process tasks in the background immediately on a user's request, and notify the user later when the task is completed?
Queue-based architecture is a little more complicated than that. The Queue provides you with an interface to different messaging implementations such as RabbitMQ and beanstalkd.
At any point in your code you can send a message to the Queue, which in this context is called a job. Your queue will then hold multiple jobs, which are ready to be taken out in FIFO order.
As for your question, there are workers that listen to the queue; they take a job and execute it. It's up to you how many workers you want. If you have one worker, your tasks will be executed one after another; the more workers, the more parallel processes.
Worker processes are started with Laravel's command-line interface, Artisan. Each process means one worker. You can start multiple workers with Supervisor.
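The relationship between workers and parallelism can be illustrated with a small language-neutral sketch in Python. It is not Laravel code: the `run_jobs` helper, the job names, and the use of threads in place of separate worker processes are all assumptions made for the example.

```python
import queue
import threading
import time


def run_jobs(jobs, num_workers, job_seconds=0.05):
    """Drain a FIFO queue with `num_workers` workers.
    Returns (jobs in completion order, elapsed seconds)."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    finished = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = q.get_nowait()  # take the oldest waiting job
            except queue.Empty:
                return  # nothing left to do
            time.sleep(job_seconds)  # stands in for generating a report
            with lock:
                finished.append(job)

    start = time.monotonic()
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished, time.monotonic() - start


reports = ["user1", "user2", "user3", "user4"]
one_worker, t1 = run_jobs(reports, num_workers=1)
four_workers, t4 = run_jobs(reports, num_workers=4)
print(one_worker, round(t1, 2))
print(sorted(four_workers), round(t4, 2))
```

With one worker the four reports finish strictly in the order they were queued. With four workers each report is picked up right away, so the total time is roughly that of a single report.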
Since you know for sure that you are going to notify the user after around 2 minutes, I suggest using a cron job that checks every 2 minutes whether there are any reports to generate and, if there are, sends the notification to the user. That check will be a single simple query, so you don't need to worry much about performance.

How can you run more than one simultaneous job in Ansible Tower?

It seems that all jobs are enqueued, and only one will run at a time. How can we run more than one?
Tower is designed to parallelize jobs, but there are a couple of cases where it will not.
If you have your inventory or SCM set up to "update on launch" with no cache, or the cache has expired, then any additional jobs will be stuck pending behind the inventory or SCM update. The inventory and SCM will not update until after the currently running job is done.
If you are trying to run multiple jobs against the same host: Tower will not run multiple jobs against the same host at the same time in order to avoid race conditions. (localhost is a possible exception). If you need multiple jobs to run against the same host at the same time then you need to create two inventories and put that host in both inventories, running the two jobs against different inventories. In this situation, Tower does not know that you are running against the same host.
Jobs which share the same Inventory or SCM source can not run at the same time.
Suppose you have a job comprised of three tasks:
task 1: "do x", task 2: "do y", task 3: "do z"
With ansible "do x" will run on all the servers, then "do y" will run on all the servers, then "do z" will run on all the servers.
Also, I said "all servers", but in fact it maxes out at the Ansible "forks" value, which defaults to 5. In my 100-server environment I set this value to 20. More on this here: http://docs.ansible.com/intro_configuration.html#forks
Remember, the strength of Ansible is doing a job (a collection of tasks) on many machines at the same time. If what you want is to run the same task many times on a single machine, then you want something like fork or parallel.
In fact, Ansible will try to run "do x" as many times as it can across many machines. You can adjust this behavior so that the whole job runs on a portion of the machines before it starts on more machines, using the "serial" keyword (http://docs.ansible.com/playbooks_delegation.html#rolling-update-batch-size).
Note the subtle difference between forks and serial.
forks is "per task"
serial is "per job" ( collection of tasks )
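The difference between forks and serial can be made concrete with a toy model of the execution order. This is a simplified sketch, not Ansible's actual scheduler: it assumes hosts are processed in fixed chunks of `forks` hosts, and the `execution_plan` function and host names are hypothetical.

```python
def execution_plan(hosts, tasks, forks=5, serial=None):
    """Return a list of (task, hosts_run_together) steps in order.

    serial limits how many hosts go through the whole job at once;
    forks limits how many hosts run a single task at the same time.
    """
    batch_size = serial or max(len(hosts), 1)
    plan = []
    for b in range(0, len(hosts), batch_size):
        batch = hosts[b:b + batch_size]       # "per job" grouping (serial)
        for task in tasks:
            for f in range(0, len(batch), forks):
                # "per task" grouping (forks)
                plan.append((task, batch[f:f + forks]))
    return plan


hosts = ["h1", "h2", "h3", "h4"]
tasks = ["do x", "do y"]

# forks only: every task visits all hosts before the next task starts.
print(execution_plan(hosts, tasks, forks=2))
# serial=2: the whole job finishes on h1, h2 before h3, h4 start.
print(execution_plan(hosts, tasks, forks=2, serial=2))
```

In the first plan "do x" reaches all four hosts before "do y" begins anywhere. In the second, both tasks complete on the first two hosts before the last two hosts run anything.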
David Thornton
Edit: I re-read your question. This is about running more than one job at a time, not running more than one task in a job. So I think you are correct for ansible-awx but not for the command line. Via the web interface you can submit a job to the job queue, but you can't make ansible-awx run more than one task at a time, I think. However, via the command line, if you open more than one window you can run multiple ansible-playbooks at the same time. Do you have an Ansible support account? Those guys are great IMHO; they have taken a lot of time to answer my questions (like your question).
Simultaneous jobs can be executed from Tower. Job templates have "Enable Concurrent Jobs" option. See section "15.4. Job Concurrency" at http://docs.ansible.com/ansible-tower/latest/html/userguide/jobs.html.
If I have 3 different tasks running on a single server in synchronous mode, the 3 tasks are assigned to a single job ID, and each task executes one after the other, which consumes a lot of time.
In Ansible versions later than 2.5 we can get 3 job IDs for the 3 different tasks and start executing them at the same time, which saves a lot of time. This is called asynchronous mode.

how to implement custom cloud worker

I am designing a cloud app and need a worker process which scours my database looking for work, and then performs it.
Most of the info I seem to find on the subject of background tasks in the cloud involves some kind of scheduler and/or queuing system.
What I have doesn't quite fit into the "run this task every 5 minutes" or "add this to the queue to be executed later" models. I think the main difference to my problem is that the workers themselves find work to do, rather than being assigned it by a periodic scheduler or an external process that generates work.
What I have is basically a giant table where each entry has three fields:
job: a small task to be performed, lets say it gets the last message from a twitter account and stores it in the database
the interval at which to perform that job: say every 5 minutes, N.B. the interval is arbitrary and different for each entry in the table
the last date when the job was performed
The way I would implement this is to have a worker which runs an infinite loop. When it enters the loop, it scours the database, a) looking for items whose date + interval < currentTime; b) when it finds one, it sets date = currentTime; and c) it then executes the job. If there is no work at the moment, it sleeps for a few seconds, then tries again.
I will have many parallel workers scouring the database simultaneously, which is why I do b) first and then c) in the paragraph above. Since there are parallel workers, actions a) and b) are atomic operations on the database to prevent work being duplicated. If a worker crashes after a) and b) but before it manages to finish the work, it's no big deal; the workers can just do it at the next interval. The reason is that the work is not performed in a time-invariant system, so keeping a backlog of failed jobs has no benefit: the tasks have to be performed at their exact intervals, so it's better to skip one interval than to have uneven intervals between executions.
My question is whether that is a reasonable implementation strategy? If so, how do I bring this process to life on the cloud (I am using Heroku, but may switch to EC2 in the future)? I still haven't written any code so I would welcome other suggestions (maybe I misunderstood the use cases/applications for queue systems).
This sounds so close to using something like a scheduled job that you might as well tread the well beaten path and do it the more conventional way. There's no reason why you can't schedule a job to run once every few seconds.
However, this idea of looking for work sounds dodgy. What happens if two workers find the same task to run at the same time for instance? Also, are there not triggers in the application which can indicate that work needs doing? It seems strange that you have code 'looking for work'.
You can go a very long way with simple periodic background tasks, so I would exhaust all possibilities in that area before rolling your own.

Resources