I am working in Laravel with a product table that has 300,000 rows. The table contains many duplicate names, so I used groupBy, but it takes a long time to process. Loading the page and waiting for the data to finish processing takes about 6 to 7 seconds. Is there any way to optimize this? Thanks
$listProduct = Product::all()->groupBy('name'); // It takes about 6 seconds to 7 seconds to process this
When you call groupBy this way, you are calling it on a Collection object, which then tries to group all the models it has loaded. Models are relatively large objects, so this takes a lot of time. A better solution is to group on the database side, which performs this kind of operation much more efficiently.
Product::all()->groupBy('name'); // slow group by on collection returned by all() method
Product::groupBy('name')->get(); // fast group by on database side
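A minimal sketch of the database-side approach, assuming the default App\Models\Product model and that the table has an id column alongside name (note that with MySQL's ONLY_FULL_GROUP_BY mode, a GROUP BY generally needs to be paired with selected or aggregated columns):

use Illuminate\Support\Facades\DB;
use App\Models\Product;

// One row per distinct name with a duplicate count, grouped by the database.
$names = Product::select('name', DB::raw('COUNT(*) AS total'))
    ->groupBy('name')
    ->get();

// If you really need the grouped models themselves, fetch only the columns you
// use so the collection-side groupBy has far less data to shuffle.
$listProduct = Product::select('id', 'name')
    ->get()
    ->groupBy('name');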
Let's say I want to write a Rate limiter.
By rate limiter, I mean that our server should serve only 5 requests per user within the last 5 minutes.
We have to write a function that returns a boolean indicating whether the current request can be accepted or not.
Which data structure should be used for this? How do we evict entries that are older than 5 minutes?
You need a FIFO queue holding a maximum of 5 timestamps.
Upon a request, purge the timestamps older than 5 minutes. Then, if there is room left, accept the request and push the current time.
Update:
To handle multiple users, it is reasonable to hold one queue per user (a ring buffer), given that they are short.
Otherwise, you can store all timestamps in a single array and organize them as doubly linked lists, with per-user pointers to the start of each list.
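A minimal sketch of this sliding-window idea in PHP; the class and method names are illustrative, not taken from any particular library:

// Keeps one small queue of request timestamps per user.
class RateLimiter
{
    private const MAX_REQUESTS = 5;
    private const WINDOW_SECONDS = 300; // 5 minutes

    /** @var array<string, float[]> request timestamps per user, oldest first */
    private array $queues = [];

    public function allow(string $userId): bool
    {
        $now = microtime(true);
        $queue = $this->queues[$userId] ?? [];

        // Purge timestamps that have fallen out of the 5-minute window.
        while ($queue && $now - $queue[0] > self::WINDOW_SECONDS) {
            array_shift($queue);
        }

        if (count($queue) >= self::MAX_REQUESTS) {
            $this->queues[$userId] = $queue;
            return false; // no room left in the window
        }

        $queue[] = $now; // accept and record the current request
        $this->queues[$userId] = $queue;
        return true;
    }
}

// Usage: $limiter = new RateLimiter(); $limiter->allow('user-42') returns true
// until that user has made 5 requests within the last 5 minutes.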
You need a queue of size 5 containing the timestamps of the requests.
With every request, check whether the queue is full and its oldest timestamp is younger than 5 minutes.
If it is, you reject the request.
If it is not, you remove the oldest element and push the new timestamp.
Currently, in our project, we are using Informatica for data loading.
We have a requirement to load 100 tables (this number will grow), each with 100 million records, and we need to perform a delta operation on them. What might be the most efficient way to perform this operation?
If it's possible, try truncate and load. This way after each run you will have a full, fresh dump.
If you can't truncate the targets and need the delta, get some timestamp or counter that lets you read only the modified rows - the new and updated ones - such as an "updated date" column. This limits the amount of data being read. It will not catch deletes, though. So...
Create a separate flow for detecting deleted rows that reads only the IDs, not the full rows. It still has to check all rows, but limited to just one column, so it should be quite efficient. Use it to delete rows in the target - or just to mark them as deleted.
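Outside of Informatica, the same two reads can be sketched as plain SQL run from PHP/PDO; the connection details, the orders table, the id key, and the updated_date column are all placeholders for illustration:

$sourcePdo = new PDO('source-dsn', 'user', 'pass'); // placeholder connection
$targetPdo = new PDO('target-dsn', 'user', 'pass'); // placeholder connection
$lastRunTime = '2024-01-01 00:00:00'; // in practice, taken from the previous successful run

// 1. Delta read: only rows inserted or updated since the last run.
$stmt = $sourcePdo->prepare('SELECT * FROM orders WHERE updated_date > :last_run');
$stmt->execute(['last_run' => $lastRunTime]);
$changedRows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// 2. Delete detection: read IDs only from both sides and compare the sets.
$sourceIds = $sourcePdo->query('SELECT id FROM orders')->fetchAll(PDO::FETCH_COLUMN);
$targetIds = $targetPdo->query('SELECT id FROM orders')->fetchAll(PDO::FETCH_COLUMN);
$deletedIds = array_diff($targetIds, $sourceIds); // delete or flag these in the target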
What is the best practice for managing time intervals in Oracle? For example: I have a room that will be rented from 8:15 to 9:00, so I suppose I need at least two fields: dt_start and dt_end. I must not allow a rental to be entered from 8:45 to 9:20. What would be the best table structure for that? Thanks
There is no clear consensus on the best way to implement this. The answer certainly depends a great deal on your exact situation. The options are:
Table with unique constraint on ROOM_ID and a block of time. This is only realistic if the application allocates a reasonably small amount of time using reasonably large blocks. For example, if a room can only be allocated for at most a week, 5 minutes at a time. But if reservations are to the second, and can span over a year, this would require 31 million rows for one reservation.
Trigger. Avoid this solution if possible. The chance of implementing this logic in a trigger that is both consistent and concurrent is very low.
Materialized view. This is my preferred approach. For example, see my answer here.
Enforced by the application. This only works if the application can serialize access and if no ad hoc SQL is allowed.
Commercial Tool. For example, RuleGen.
A BEFORE INSERT trigger is the best way to accomplish what you need.
In the trigger, check that the new time does not conflict with an existing booking for that particular room; if it does, raise an error, otherwise let the insert happen.
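Whichever approach is chosen, the core check is the same interval-overlap test; a minimal sketch in PHP (the dates are just example values):

// Two bookings [start, end) overlap when each starts before the other ends.
function overlaps(DateTimeInterface $start1, DateTimeInterface $end1,
                  DateTimeInterface $start2, DateTimeInterface $end2): bool
{
    return $start1 < $end2 && $start2 < $end1;
}

// An existing rental 8:15-9:00 conflicts with a requested 8:45-9:20.
$conflict = overlaps(
    new DateTimeImmutable('2024-01-01 08:15'), new DateTimeImmutable('2024-01-01 09:00'),
    new DateTimeImmutable('2024-01-01 08:45'), new DateTimeImmutable('2024-01-01 09:20')
); // true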
I have a collection of 20,000 master articles, and I receive about 400,000 articles of one or two pages every day. I am trying to determine whether each of these 400k articles is a copy or modified version of one of my master articles (a threshold above 60% plagiarism is fine with me).
What algorithms and technologies should I use to tackle this problem in an efficient and timely manner?
Thanks
Fingerprint the articles (i.e. hash them intelligently based on word frequency) and then look for statistical similarity between the fingerprints. If that suggests a likely match within some subset of the data, do a brute-force search for matching strings on just those candidates.
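A minimal sketch of the fingerprint-then-compare idea in PHP, using word-frequency vectors and cosine similarity; a production system would use something like shingling or MinHash at this scale, and the 0.6 cutoff here is only a candidate filter, not a plagiarism verdict:

// Lowercase the text, split it into words, and count word frequencies.
function fingerprint(string $text): array
{
    preg_match_all('/\p{L}+/u', mb_strtolower($text), $matches);
    return array_count_values($matches[0]);
}

// Cosine similarity between two word-frequency vectors.
function similarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $word => $count) {
        $dot += $count * ($b[$word] ?? 0);
    }
    $norm = fn(array $v): float => sqrt(array_sum(array_map(fn($c) => $c * $c, $v)));
    $denominator = $norm($a) * $norm($b);
    return $denominator > 0 ? $dot / $denominator : 0.0;
}

// Placeholder texts; flag candidates above the threshold for a more expensive
// string-level comparison.
$masterText = '...full text of a master article...';
$incomingText = '...full text of an incoming article...';
$isCandidate = similarity(fingerprint($incomingText), fingerprint($masterText)) >= 0.6;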
Which is better: small data size and more server requests, or large data size and fewer requests? Larger data size means longer processing time on the server. How does this scale, though, as the number of users and their activity increases?
It is always ideal to get everything you need in as few calls as possible. If you can get all that you need in one call, definitely do it.
It would also be very helpful if you could give some insight into what kind of application we are talking about here.