Queueing batch translations with Laravel: best method

Looking for some guidance on the best architecture to accomplish what I am trying to do. I occasionally get spreadsheets with a column of data that needs to be translated; there could be anywhere from 200 to 10,000 rows in that column. What I want to do is pull all the rows and add them to a Redis queue. I am thinking Redis will be best because I can throttle the queue, which is necessary since the API I am calling for translation has rate limits. Once the translation is done, I will put the translations into a new column and return a new spreadsheet to the user with the additional column.
If anyone has ideas for the best setup I am open, but I want to stick with Laravel since that is what the application is already running. I am just not sure whether I should create one queued job that opens the file and works through the translations itself, or dispatch a queued job for each row of text, or, lastly, add all of the rows of text to a table in my database and have a task scheduler run every minute that checks that table for untranslated rows and processes x amount of them on each check. I'm not sure about a cron job running that frequently when this only happens maybe twice a month.
I can see a lot of ways of doing it, but I'm looking for an ideal setup: what I don't want is to hit the API's rate limits and lose translations I have already done because the process errors out.
Thanks for any advice
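For context, here's a rough sketch of the "one queued job per row" option I'm leaning towards, assuming a recent Laravel version with a Redis queue connection and job batching; TranslateRow, TranslationClient, RowTranslation and BuildTranslatedSpreadsheet are placeholder names, and $rows would be the parsed spreadsheet column:

use Illuminate\Bus\Batch;
use Illuminate\Bus\Batchable;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\Bus;
use Illuminate\Support\Facades\Redis;

// Placeholder job: translates a single cell and stores the result.
class TranslateRow implements ShouldQueue
{
    use Batchable, Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(public int $rowId, public string $text) {}

    public function handle(): void
    {
        // Respect the translation API's limit, e.g. at most 60 calls per minute across all workers.
        Redis::throttle('translation-api')->allow(60)->every(60)->then(function () {
            $translated = app(TranslationClient::class)->translate($this->text); // placeholder API client
            RowTranslation::updateOrCreate(['row_id' => $this->rowId], ['translated' => $translated]); // placeholder model
        }, function () {
            $this->release(30); // no slot available: put the job back on the queue, nothing is lost
        });
    }
}

// Dispatch one job per row as a batch; build the new spreadsheet once every row is translated.
Bus::batch($rows->map(fn ($row) => new TranslateRow($row->id, $row->text))->all())
    ->then(fn (Batch $batch) => BuildTranslatedSpreadsheet::dispatch()) // placeholder follow-up job
    ->dispatch();

The appeal of this shape is that hitting the rate limit just releases the job back onto the queue rather than erroring out, so nothing that has already been translated is lost.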

Related

How to collect data with Laravel properly, then execute in batches?

I'm using Laravel 9 to create an analytics dashboard for the company I work with. Because of certain database limitations, I need to collect hits on the website for 10 seconds and then insert them as a batch into the database.
First I thought I would simply save them into the cache with Laravel's cache. Then I realized that's what queues are for. On the other hand, I also know that we could receive thousands of hits per second, so using queues for this might be overkill and extremely slow.
What is usually the best practice to do in this situation?
Should I simply use the cache and run a scheduled task every 10 seconds to insert the collected records, then clear the cache? Or is there a better/more practical way to do this?
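For illustration, here's roughly what the cache-and-flush idea could look like; I've sketched it with a Redis list rather than a plain cached array, because appending to a cached array isn't safe under concurrent requests, and the 'hits:buffer' key and hits table are placeholders:

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Redis;

// On every hit: push the payload onto a Redis list. RPUSH is atomic, so concurrent
// requests can't clobber each other the way a read-modify-write of a cached array could.
Redis::rpush('hits:buffer', json_encode([
    'path'       => request()->path(),
    'created_at' => now()->toDateTimeString(),
]));

// In the flush command (run by the scheduler): drain the buffer and bulk-insert.
// Popping one item at a time means hits arriving mid-flush simply wait for the next run.
$rows = [];
while (count($rows) < 10000 && ($raw = Redis::lpop('hits:buffer'))) {
    $rows[] = json_decode($raw, true);
}

if ($rows !== []) {
    DB::table('hits')->insert($rows);
}

One caveat: out of the box the Laravel 9 scheduler only fires once a minute, so a 10-second flush would need either a short loop inside a once-a-minute command or a newer Laravel release that supports sub-minute schedules.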

How to process a logic or job periodically for all users in a large scale?

I have a large set of users in my project, around 50 million.
I need to create a playlist for each user every day. To do this, I'm currently using this method:
I have a column in my users table that holds the last time a playlist was created for that user; I call it last_playlist_created_at.
I run a query on the users table that gets the top 1,000 users whose last_playlist_created_at is more than one day old, sorted in ascending order by last_playlist_created_at.
After that, I run a foreach over the result and publish a message for each user to my message broker.
Behind the message broker, I run around 64 workers that process the messages (create a playlist for the user) and update last_playlist_created_at in the users table.
Once the message broker's list of messages is empty, I repeat these steps (a while/do-while loop).
I think the processing side is good enough and can scale as well, but the method we use to create the message for each user is not scalable!
How should I go about dispatching such a large set of messages, one for each of my users?
OK, so my answer is based entirely on your comment where you mentioned that you use while(true) to check whether a playlist needs to be updated, which does not seem so trivial.
Although this is a design question and there are multiple solutions, here's how I would solve it.
First up, think of updating the playlist for a user as a job.
Now, in your case this is a scheduled job, i.e. once a day.
So, use a scheduler to schedule the next job time.
Write a scheduled job handler that pushes these to a message queue. This part is just to handle multiple jobs at the same time, and is where you can control the flow.
Generate the playlist for the user based on the job, then create a schedule event for the next day.
You could persist the scheduled job data just to avoid race conditions.
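Since this thread is Laravel-centric, here's a rough sketch of that scheduled-handler idea using Eloquent and queued jobs; CreatePlaylistForUser is a placeholder job name and last_playlist_created_at comes from the question:

use App\Jobs\CreatePlaylistForUser; // placeholder job
use App\Models\User;

// Run from the scheduler: queue one job per stale user, in chunks of 1,000.
// chunkById pages by primary key, so rows updated by the workers mid-run aren't skipped.
User::where('last_playlist_created_at', '<', now()->subDay())
    ->chunkById(1000, function ($users) {
        foreach ($users as $user) {
            CreatePlaylistForUser::dispatch($user->id);
        }
    });

Each scheduler run simply dispatches whatever is due and exits, so there is no need for a while(true) loop; the workers behind the queue do the heavy lifting.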

How can I check the status of a queue in Laravel?

I need to submit a number of jobs to a Laravel queue to process some uploaded CSV files. These jobs could be finished in one second if the files are small, or a few seconds if they're bigger, or possibly up to a minute if the CSV files are very big. And I can't tell in advance how big the files will be.
When the user goes to the "results" page, I need to display the results - but only if the queue has finished the jobs. If the queue is still processing, I need to display a "try again later" message.
So - is there a way to check, from a controller, whether the queue has finished?
I'm currently using Laravel 5.1 but would happily upgrade if that helps. And I'm currently using the database queue driver. Ideally I'd love to find a general technique that works for all queue drivers, but if the only way to do it is to check a database table then I guess that's what I have to do.
Thanks!
I know this is a year old, but why not create a new queue per upload, with a unique name based on that request?
$job = (new ProcessCSVJob($data))->onQueue($uniqueQueueName); // send this upload's job to its own uniquely named queue
dispatch($job);
You can then simply do a count in the database on the queue name field if you want a DB-only solution.
Alternatively, to work across all queue drivers, you can use the queue connection's size method to return the number of pending jobs.
$queue = App::make('queue.connection'); // resolve the default queue connection from the container
$size = $queue->size($uniqueQueueName); // number of jobs still waiting on that queue
This is in Laravel 5.4. Not sure how backwards compatible this is.
I expect if I was trying to do this today, I'd use a Laravel Job Event to update a status field on the database record, to log when the job has started and when it's finished.
Then I could see whether a record has been fully processed or not by just looking at the status field on the record itself.
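A rough sketch of that status-field approach on a modern Laravel version; the Upload model and its status values are placeholders:

use App\Models\Upload; // placeholder model with a 'status' column
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Throwable;

class ProcessCSVJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(private int $uploadId) {}

    public function handle(): void
    {
        Upload::whereKey($this->uploadId)->update(['status' => 'processing']);

        // ... parse the CSV and store the results ...

        Upload::whereKey($this->uploadId)->update(['status' => 'done']);
    }

    public function failed(Throwable $e): void
    {
        Upload::whereKey($this->uploadId)->update(['status' => 'failed']);
    }
}

// The "results" controller then just reads the status column:
// $ready = Upload::findOrFail($id)->status === 'done';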

(ASP.NET) How would you go about creating a real-time counter which tracks database changes?

Here is the issue.
On a site I've recently taken over, it tracks the "miles" you ran in a day. So a user can log into the site and add that they ran 5 miles, and this is then added to the database.
At the end of the day, around 1 am, a service runs which calculates all the miles all the users ran that day and outputs a text file to App_Data. That text file is then displayed in Flash on the home page.
I think this is kind of ridiculous. I was told they had to do this due to massive performance issues. They won't tell me exactly how they were doing it before or what the major performance issue was.
So what approach would you guys take? The first thing that popped into my mind was a web service which gets the data via an AJAX call. Perhaps every time a new "mile" entry is added, a trigger is fired and updates the "GlobalMiles" table.
I'd appreciate any info or tips on this.
Thanks so much!
Answering this question is a bit difficult since we don't know all of your requirements and something didn't work before. So here are some different ideas.
First, revisit your assumptions. Generating a static report once a day is a perfectly valid solution if all you need is daily reports. Why hit the database multiple times throughout the day if all that's needed is a snapshot? (For instance, lots of blog software used to write HTML files when a post was published rather than serving up the entry from the database each time; many still do, as an optimization.) Is the "real-time" feature something you are adding?
I wouldn't jump to AJAX right away. Use the same input method, just move the report from static to dynamic. Doing too much at once is a good way to get yourself buried. When changing existing code I try to find areas that I can change in isolation with the least amount of impact on the rest of the application. Then once you have the dynamic report you can add AJAX (and please use progressive enhancement).
As for the dynamic report itself you have a few options.
Of course you can just SELECT SUM(), but it sounds like that would cause performance problems if each user has a large number of entries.
If your database supports it, I would look at using an indexed view (sometimes called a materialized view). It keeps the sum up to date as rows change, so you get fast access to the real-time totals:
CREATE VIEW vw_Miles WITH SCHEMABINDING AS
  SELECT SUM([Count]) AS TotalMiles,
         COUNT_BIG(*) AS [EntryCount],
         UserId
  FROM dbo.Miles          -- SCHEMABINDING requires two-part table names
  GROUP BY UserId
GO
CREATE UNIQUE CLUSTERED INDEX ix_Miles ON vw_Miles(UserId)
If the overhead of that is too much, @jn29098's solution is a good one: roll it up using a scheduled task. If there are a lot of entries for each user, you could add only the delta since the last time the task was run.
UPDATE GlobalMiles SET [TotalMiles] = [TotalMiles] +
  (SELECT SUM([Count])
   FROM Miles
   WHERE UserId = @id
     AND EntryDate > @lastTaskRun
   GROUP BY UserId)
WHERE UserId = @id
If you don't care about storing the individual entries but only the total, you can update the count on the fly:
UPDATE Miles SET [Count] = [Count] + @newCount WHERE UserId = @id
You could use this method in conjunction with the sproc that adds the entry and get the best of both worlds.
Finally, your trigger method would work as well. It's an alternative to the indexed view where you do the update yourself on a table instead of SQL Server doing it automatically. It's also similar to the previous option, where you move the global update out of the sproc and into a trigger.
The last three options make it more difficult to handle the situation when an entry is removed, although if that's not a feature of your application then you may not need to worry about that.
Now that you've got materialized, real-time data in your database, you can dynamically generate your report. Then you can get fancy with AJAX.
If they are truly having performance issues due to too many hits on the database, then I suggest that you take all the input and cram it into a message queue (MSMQ). Then you can have a service on the other end that picks up the messages and does a bulk insert of the data. This way you have fewer DB hits. You can output to the text file on the update too.
I would create a summary table, rolled up once an hour or nightly, which calculates total miles run. For individual requests you could pull from the summary table plus any miles logged between the last rollup calculation and when the user views the page to get the total for that user.
How many users are you talking about and how many log records per day?

Can DB2 tell a web app when a table's data is updated?

I have a table of non-trivial size in a DB2 database that is updated X times a day via user input in another application. This table is also read by my web app to display some info to another set of users. I have a large number of users on my web app, and they need to do lots of fuzzy string lookups against data that is up-to-the-minute accurate. So I need a server-side cache to run my fuzzy matching on and to keep the DB from getting hammered.
So, what's the best option? I would hate to pull the entire table every minute when the data changes so rarely. I could set up a trigger that updates a timestamp in a smaller table and poll that to see if I need to refresh my cache, but that seems hacky too.
Ideally I would like to have DB2 tell my web-app when something changes, or at least provide a very lightweight mechanism to detect data level changes.
I think if your web application is running in WebSphere, setting up MQ would be a pretty good solution.
You could write triggers that use the MQ Series routines to add things to a queue, and your web app could subscribe to the queue and listen for updates.
If your web app is not in WebSphere then you could still look at this option but it might be more difficult.
A simple solution could be to have a timestamp (somewhere) for the latest change to the table.
The timestamp could live in a small table/view that is updated either by the application that updates the big table or by an update trigger on the big table.
The update trigger's only task would be to set this "helper" timestamp to CURRENT TIMESTAMP.
Then the web app only checks this timestamp.
If the timestamp is newer than what the web app has, the data is reread from the big table.
A "low-tech" solution that's fairly non-intrusive to the existing system.
Hope this solution fits your setup.
Regards
Sigersted
Having the database push a message to your web app is certainly doable via a variety of mechanisms (like MQSeries, etc.). A similar but easier option is to write a Java stored procedure that gets kicked off by the trigger and hands the data to your cache-maintenance interface. But both of these solutions involve a lot of versioning dependencies, etc. that could be a real PITA.
Another option might be to reconsider the entire approach. Is it possible that instead of maintaining a cache on your app's side you could perform your text searching on the original table?
But my suggestion is to do as you (and the other poster) mention: just update a timestamp in a single-row table purposed for this, then have your web app poll that table. Similarly, you could push the changed rows to this small table and have your cache-maintenance program pull from it. Either of these is very simple to implement and should be very reliable.
