I have run into some issues while developing microservice projects for learning purposes.
The current idea is that I have two microservices for now:
User MS (UMS)
Statistic MS (SMS)
But on my UI I have a users page where all users are displayed, and they should be sorted by "Statistic MS" data.
Currently I have 4000 fake users, and sending a new API call for each user is insane. I have pagination with 30 users per page, even though I'm using caching.
So right now 30 requests are sent to get the statistics for the users on a page. This is a working solution, but it is very slow, and it also does not sort users based on statistics, because I'm just getting users sorted DESC (not by statistics) and then sending an API call to the statistics service to gather each user's statistics.
But what I need to have:
When someone opens the /users page, the users are automatically sorted by the "Statistics Service" data, so, for example, users with the best statistics are at the top.
F.Y.I. The statistics are calculated based on user activity; that's why it's a separate MS. The /users page should also have a filter for this, so everyone can filter all the users by best statistics, etc.
This approach can violate high availability and is inefficient. For example, if the statistics MS is down, your user MS will be down too. I think the best approach here is a self-contained, denormalized database. To be more clear, you can add a field to the users table that holds the statistic value for each user.
However, this adds some complexity to your code, as you have to keep both microservices (user, statistics) in sync. In this case you can use a message broker (such as RabbitMQ, Kafka, etc.) between the microservices: for each statistic change, the statistics MS publishes an event on a specific channel, which the user MS listens to in order to update its own database.
Now you don't need to call the statistics MS for every single request, and you can order the users with a simple SQL query. Also, a failure of the statistics MS will not impact the user MS's functionality.
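A minimal sketch of the consumer side of that idea, assuming Kafka via the Confluent.Kafka client, a topic named "user-statistics", and a StatisticScore column on the Users table (the topic, column, and event names are illustrative assumptions, not something from the question):

using System;
using System.Text.Json;
using Confluent.Kafka;
using Microsoft.Data.SqlClient;

// Hypothetical event shape published by the statistics MS.
public record StatisticChanged(Guid UserId, double Score);

public static class StatisticsConsumer
{
    public static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "localhost:9092",
            GroupId = "user-ms",
            AutoOffsetReset = AutoOffsetReset.Earliest
        };

        using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
        consumer.Subscribe("user-statistics"); // assumed topic name

        while (true)
        {
            var result = consumer.Consume();
            var evt = JsonSerializer.Deserialize<StatisticChanged>(result.Message.Value)!;

            // Denormalized write: keep the score next to the user row so the
            // /users page can ORDER BY StatisticScore without calling the statistics MS.
            using var conn = new SqlConnection("<user-ms-connection-string>");
            conn.Open();
            using var cmd = new SqlCommand(
                "UPDATE Users SET StatisticScore = @score WHERE Id = @id", conn);
            cmd.Parameters.AddWithValue("@score", evt.Score);
            cmd.Parameters.AddWithValue("@id", evt.UserId);
            cmd.ExecuteNonQuery();

            consumer.Commit(result);
        }
    }
}

With the score denormalized like this, the /users page becomes a plain SELECT ... ORDER BY StatisticScore DESC inside the user MS, which also covers the "filter by best statistics" requirement.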
We need to design a system that filters (based on user-defined criteria) requests stored in Elasticsearch, modifies these requests, and calls a gRPC service. The corresponding responses should then be stored in a Hadoop database table.
The main complexity of designing the aforementioned system is scalability. The requirement is to filter and execute millions of requests.
Can someone please help me with a high-level design of such a system that can perform these tasks in a reasonable time?
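To make the scalability requirement concrete, here is a rough sketch of one common shape: a streaming pipeline with bounded concurrency and batched writes. The Elasticsearch paging, the gRPC call, and the Hadoop write are passed in as hypothetical delegates rather than real client code:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical request/response shapes; the real ones come from the
// Elasticsearch documents and the gRPC service contract.
public record PipelineRequest(string Id, string Payload);
public record PipelineResponse(string Id, string Result);

public static class Pipeline
{
    public static async Task RunAsync(
        IAsyncEnumerable<PipelineRequest> filteredRequests,        // paged out of Elasticsearch (scroll / search_after)
        Func<PipelineRequest, Task<PipelineResponse>> callGrpc,    // wraps the gRPC client
        Func<IReadOnlyList<PipelineResponse>, Task> writeToHadoop) // buffered writes to the Hadoop table
    {
        var buffer = new List<PipelineResponse>();

        // Bounded concurrency is the main scalability knob: the stream is processed
        // in parallel without ever materializing millions of requests in memory.
        await Parallel.ForEachAsync(
            filteredRequests,
            new ParallelOptions { MaxDegreeOfParallelism = 32 },
            async (request, ct) =>
            {
                var response = await callGrpc(request);

                List<PipelineResponse>? toFlush = null;
                lock (buffer)
                {
                    buffer.Add(response);
                    if (buffer.Count >= 1000) // flush responses in batches
                    {
                        toFlush = new List<PipelineResponse>(buffer);
                        buffer.Clear();
                    }
                }
                if (toFlush != null)
                    await writeToHadoop(toFlush);
            });

        if (buffer.Count > 0)
            await writeToHadoop(buffer); // final partial batch
    }
}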
I'm calling ExecuteMultipleRequest to insert 25 records of a custom entity at a time. Each batch is taking roughly 20 seconds.
Some info about the custom entity:
I did not create its schema and can't have it changed;
It has 124 attributes (columns);
On each CreateRequest the entity has 6 attribute values filled: 2 Lookup and 4 Money.
ExecuteMultipleRequest is being called from a middleware component in a corporate network, which connects to the CRM in the cloud. The CRM instance used is a sandbox, so it may have some restrictions (CPU/bandwidth/IO/etc.) that I'm not aware of.
I can issue concurrent requests, but considering I can only have 2 concurrent requests per organization (https://msdn.microsoft.com/en-au/library/jj863631.aspx#limitations), it would only cut the time in half. That is still not a viable time.
For each new custom CRM process created I need to load at most 5000 entity records, in less than 10 minutes.
What can I do to improve the performance of this load? Where should I be looking?
Would a DataImport (https://msdn.microsoft.com/en-us/library/hh547396.aspx) be faster than ExecuteMultipleRequest?
I only really have suggestions for this; you would probably have to experiment and investigate to see what works for you.
Can you run your middleware application in a physical location closer to your CRM Online site?
ExecuteMultipleRequest supports much larger batch sizes, up to 1000.
Have you compared this to just using single execute requests?
Do you have lots of processes (workflows, plugins) that run in CRM while the data import is running? These can have a big performance impact. Perhaps they can be disabled during data import, e.g. you could pre-process the data before import so a plugin wouldn't need to be executed.
The concurrent request limitation only applies to ExecuteMultipleRequest; have you tried running lots of parallel single execute requests?
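To make the batch-size and parallelism suggestions concrete, here is a hedged sketch using the standard SDK types; the batch size of 1000, the two parallel connections, and the createService factory are illustrative values based on the points above, not tested recommendations:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Messages;

public static class BulkLoader
{
    // Splits the records into batches and runs them over two parallel connections,
    // matching the 2-concurrent-request-per-organization limit mentioned above.
    public static void Load(Func<IOrganizationService> createService, IList<Entity> records)
    {
        const int batchSize = 1000; // ExecuteMultipleRequest accepts up to 1000 requests per batch

        var batches = records
            .Select((entity, index) => new { entity, index })
            .GroupBy(x => x.index / batchSize)
            .Select(g => g.Select(x => x.entity).ToList())
            .ToList();

        Parallel.ForEach(
            batches,
            new ParallelOptions { MaxDegreeOfParallelism = 2 },
            batch =>
            {
                var service = createService(); // one connection per worker thread

                var request = new ExecuteMultipleRequest
                {
                    // Skipping the responses reduces payload size and server work;
                    // ContinueOnError avoids losing a whole batch to one bad record.
                    Settings = new ExecuteMultipleSettings
                    {
                        ContinueOnError = true,
                        ReturnResponses = false
                    },
                    Requests = new OrganizationRequestCollection()
                };

                foreach (var entity in batch)
                    request.Requests.Add(new CreateRequest { Target = entity });

                service.Execute(request);
            });
    }
}

If single requests turn out to parallelize better than ExecuteMultipleRequest in your sandbox, the same Parallel.ForEach shape applies, just with service.Create(entity) per record instead of a batched request.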
For a delivery-service application based on Laravel, I want to keep the customer updated on the current location of the driver. For this purpose, I have lat and long columns in my order table. The driver has the website open and posts his HTML5 geolocation to the server every, let's say, 30 seconds. The row gets updated with the new position, and here comes the question.
Will it be more efficient to
- have an Ajax request from the customer client every 30 seconds that searches all current orders with the customer id as the key and retrieves the current location to update the map,
or to
- create a private channel with Pusher, subscribe to it from the customer client, and fire locationUpdated events once the driver submits his location?
My thought would be to use Pusher, so that I don't have to run two queries (update and retrieve) for each updated location, periodically and for possibly hundreds of users at the same time.
The disadvantage I assume would cause trouble is the number of channels the server has to maintain to make sure every client has access to updated information.
Unfortunately, I have no clue which of the two would put more load on the server. Any argumentation for why either solution is better than the other, or any further improvements, is welcome.
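For what it's worth, the Pusher option boils down to one broadcast per driver update, regardless of language. A sketch of that trigger using Pusher's server library, shown here in C# with the PusherServer package purely for illustration; in Laravel you would fire a broadcast event that ends up making the same call (channel naming and credentials are placeholders):

using System.Threading.Tasks;
using PusherServer;

public class DriverLocationBroadcaster
{
    private readonly Pusher _pusher = new Pusher(
        "app_id", "app_key", "app_secret",                        // placeholder credentials
        new PusherOptions { Cluster = "eu", Encrypted = true });

    // Called right after the driver's POST has updated the lat/long columns
    // on the order row: one write plus one trigger, no polling query per customer.
    public async Task PublishAsync(int orderId, double lat, double lng)
    {
        await _pusher.TriggerAsync(
            "private-order." + orderId, // one private channel per active order (assumed naming)
            "locationUpdated",
            new { lat, lng });
    }
}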
I have an application where some of my users' actions must be retrieved via a 3rd-party API.
For example, let's say I have a user who can receive tons of phone calls. These call records should be updated often because my user wants to see the call history, so I should do this "almost in real time". The way I manage this is to retrieve, every 10 minutes, the list of all my logged-in users and, for each user, enqueue a task that retrieves the call records from the timestamp of the latest saved record to the current timestamp and saves them all to my database.
This doesn't seem to scale well, because the more users I have, the more connected users I'll have and the more tasks I'll enqueue.
Is there any other approach to achieve this?
This seems straightforward with a background queue of jobs. It is unlikely that all users use the system at the same rate, so queue jobs based on their usage, with a fallback to daily.
You will likely at some point need more workers taking jobs from the queue, and then multiple queues, so that if you had a thousand users the ones with a later queue slot are not waiting all the time.
It also depends on how fast you need this updated and on the limit on API calls.
There will be some sort of limit, so I suggest you start by committing to updates with a 4-hour or 1-hour delay, to always give yourself some slack, and work on improving this to a sustainable level.
Make sure your users are seeing your own data and cached API data, not live API call data, in case the API goes away.
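A rough sketch of the per-user job shape being suggested here; the queue, the 3rd-party API client (fetchCallsSince), and the database write (saveCalls) are hypothetical stand-ins for whatever backends you use:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public record SyncJob(string UserId, DateTimeOffset Since);
public record CallRecord(string UserId, DateTimeOffset Timestamp, string Payload);

public class CallSyncWorker
{
    private readonly ConcurrentQueue<SyncJob> _queue;                                             // stand-in for your queue backend
    private readonly Func<string, DateTimeOffset, Task<IReadOnlyList<CallRecord>>> _fetchCallsSince; // 3rd-party API client
    private readonly Func<IReadOnlyList<CallRecord>, Task> _saveCalls;                            // your database

    public CallSyncWorker(
        ConcurrentQueue<SyncJob> queue,
        Func<string, DateTimeOffset, Task<IReadOnlyList<CallRecord>>> fetchCallsSince,
        Func<IReadOnlyList<CallRecord>, Task> saveCalls)
    {
        _queue = queue;
        _fetchCallsSince = fetchCallsSince;
        _saveCalls = saveCalls;
    }

    // Run one of these loops per worker; add workers (and queues) as the user count grows.
    public async Task RunAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            if (_queue.TryDequeue(out var job))
            {
                // Only fetch the window since the last saved record, so active users
                // can be scheduled often and idle users can fall back to daily.
                var calls = await _fetchCallsSince(job.UserId, job.Since);
                if (calls.Count > 0)
                    await _saveCalls(calls);
            }
            else
            {
                await Task.Delay(TimeSpan.FromSeconds(5), ct);
            }
        }
    }
}

The scheduler side then only has to decide how often to enqueue a SyncJob per user (e.g. every 10 minutes for active users, daily for idle ones), which keeps the task count tied to activity rather than to the total user count.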
Let me start by describing the scenario. I have an MVC 3 application with SQL Server 2008. In one of the pages we display a list of Products that is returned from the database and is UNIQUE per logged in user.
The SQL query (actually a VIEW) used to return the list of products is VERY expensive.
It is based on very complex business requirements which cannot be changed at this stage.
The database schema cannot be changed or redesigned as it is used by other applications.
There are 50k products and 5k users (each user may have access to anywhere from 1 to 50k products).
In order to display the Products page for the logged in user we use:
SELECT TOP X * FROM [VIEW] WHERE UserID = @UserId -- where 'X' is the size of the page
The query above returns a maximum of 50 rows (maximum page size). The WHERE clause restricts the number of rows to a maximum of 50k (products that the user has access to).
The page is taking about 5 to 7 seconds to load and that is exactly the time the SQL query above takes to run in SQL.
Problem:
The user goes to the Products page and very likely uses paging, re-sorts the results, goes to the details page, etc., and then goes back to the list. And every time it takes 5-7s to display the results.
That is unacceptable, but at the same time the business team has accepted that the first time the Products page is loaded it can take 5-7s. Therefore, we thought about CACHING.
We now have two options to choose from, and the most "obvious" one, at least to me, is using .NET caching (in memory / in proc). (Please note that a distributed cache is not allowed at the moment due to technical constraints with our provider / hosting partner.)
But I'm not very comfortable with this. We could end up with lots of products in memory (when there are 50 or 100 users logged in simultaneously), which could cause other issues on the server, like .NET constantly evicting cache items to free up space while our code inserts new ones.
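If you do go with in-proc caching despite that, a minimal sketch with System.Runtime.Caching (the key format, the 20-minute sliding expiration, and the loadProductsFromView delegate are illustrative assumptions):

using System;
using System.Collections.Generic;
using System.Runtime.Caching;

public class ProductCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    // loadProductsFromView would run the expensive 5-7s view query for one user.
    public IList<Product> GetProductsForUser(int userId, Func<int, IList<Product>> loadProductsFromView)
    {
        string key = "products:" + userId;

        var cached = Cache.Get(key) as IList<Product>;
        if (cached != null)
            return cached;

        var products = loadProductsFromView(userId); // the 5-7s hit, paid once per user per cache window

        Cache.Set(key, products, new CacheItemPolicy
        {
            // Sliding expiration keeps active users fast while letting idle
            // users' entries drop out, which limits memory pressure.
            SlidingExpiration = TimeSpan.FromMinutes(20)
        });

        return products;
    }
}

public class Product { /* projection of the view's columns */ }

If memory pressure is the main worry, MemoryCache also supports a cacheMemoryLimitMegabytes setting in configuration, so the cache size can be capped rather than growing with the number of logged-in users.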
The SECOND option:
The main problem here is that it is very EXPENSIVE to generate the User x Product x Access view, so we thought we could create a flat table (or in other words a CACHE of all products x users in the database). This table would be exactly the result of the view.
However the results can change at any time if new products are added, user permissions are changed, etc. So we would need to constantly refresh the table (which could take a few seconds) and this started to get a little bit complex.
Similarly, we thought we could implement some sort of cache provider: upon a request from a user, we would run the original SQL query to select the products from the view (5-7s, acceptable only once) and save that result in a flat table called ProductUserAccessCache in SQL. On the next request, we would get the values from this cached table (as we could easily identify that the results were cached for that particular user) with a fast query and no calculations in SQL.
Any time a product was added or a permission changed, we would truncate the cached table, and upon a new request the table would be repopulated for the requesting user.
It doesn't seem too complex to me, but what we are doing here basically is creating a NEW cache "provider".
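A sketch of what that read path could look like, using the ProductUserAccessCache name from above; the column list and the repopulation trigger are simplified placeholders:

using System.Data.SqlClient; // MVC 3 / .NET 4 era

public class ProductUserAccessCacheRepository
{
    private readonly string _connectionString;

    public ProductUserAccessCacheRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Populates the flat table from the expensive view only if this user's rows
    // are missing (first visit, or after the table was truncated because a
    // product or permission changed).
    public void EnsureCachedForUser(int userId)
    {
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(@"
            IF NOT EXISTS (SELECT 1 FROM ProductUserAccessCache WHERE UserID = @UserId)
                INSERT INTO ProductUserAccessCache (UserID, ProductID /*, ...other view columns */)
                SELECT UserID, ProductID /*, ... */
                FROM [VIEW]
                WHERE UserID = @UserId;", conn))
        {
            conn.Open();
            cmd.Parameters.AddWithValue("@UserId", userId);
            cmd.ExecuteNonQuery(); // 5-7s the first time for a user, near-instant afterwards
        }
    }
}

The page query then runs SELECT TOP X against ProductUserAccessCache instead of [VIEW], and the existing truncate-on-change logic keeps the cached rows honest.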
Does anyone have any experience with this kind of issue?
Would it be better to use .Net Caching (in proc)?
Any suggestions?
We were facing a similar issue some time ago, and we were thinking of using EF caching in order to avoid the delay in retrieving the information. Our problem was a 1-2 second delay. Here is some info that might help on how to cache a table by extending EF. One of the drawbacks of caching is how fresh you need the information to be, so you set your cache expiration accordingly. Depending on that expiration, users might need to wait longer than they would like for fresh info, but if your users can accept that they might be seeing outdated info in order to avoid the delay, then the tradeoff is worth it.
In our scenario, we decided it was better to have fresh info than quick info, but as I said before, our waiting period wasn't that long.
Hope it helps