Implementing multi-threading in db calls using TPL

In the web app that I'm currently working on, I have to make multiple calls to the database and combine the results at the end to show in the UI. Right now I'm making the calls one by one and combining the results at the end. Since the web app will be hosted on a multi-core machine (Intel i5), I think I can use the TPL to make parallel DB calls. Is it a good idea? What pitfalls should I consider when making parallel calls to the DB?

There are two things to remember here. Firstly, your DB provider's API may not be thread-safe; for example, ADO.NET explicitly isn't 100% thread-safe. Secondly, by doing this you are moving load from the client to the DB. In other words, if your client creates 5 concurrent connections to the DB at once, it has a larger impact on the DB's load. The latency of an individual request may be reduced, but at the expense of overall throughput in terms of the number of clients an individual DB can support.
It largely depends on your scenario whether you think this is a good tradeoff.
You say "web app"; if you mean a web application then there are similar tradeoffs. I'd recommend this blog post on using the TPL from a web application:
http://blogs.msdn.com/b/pfxteam/archive/2010/02/08/9960003.aspx
It's the same issue: you trade off individual request latency for throughput, or vice versa.
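To make the first point concrete, here is a minimal sketch of the parallel pattern under discussion (not code from the blog post): each query runs in its own task and opens its own connection, since individual ADO.NET connection objects must not be shared across threads. The connection string and table names are placeholders.

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;

public static class DashboardLoader
{
    // Run two independent queries in parallel. Each task opens its OWN
    // connection: ADO.NET connection objects are not thread-safe.
    public static async Task<(int customers, int orders)> LoadAsync(string connectionString)
    {
        var customersTask = CountAsync(connectionString, "SELECT COUNT(*) FROM Customers");
        var ordersTask = CountAsync(connectionString, "SELECT COUNT(*) FROM Orders");

        // Combine the results once both queries have finished.
        await Task.WhenAll(customersTask, ordersTask);
        return (customersTask.Result, ordersTask.Result);
    }

    private static async Task<int> CountAsync(string connectionString, string sql)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            await connection.OpenAsync();
            return (int)await command.ExecuteScalarAsync();
        }
    }
}
```

Note that each concurrent task holds its own connection for the duration of the query, which is exactly the extra DB load the answer warns about.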

Related

Deciding between parallel network calls vs event-driven data synching in microservices

On the product page (as shown in the image) we show and hide options based on microservice responses. All these microservice calls are parallel and don't cause any latency issues (so far).
This is how the microservices are structured.
Make, Model, and Variants shown in the menu are powered by the Product microservice
Product Images, Videos, 360 views, News, and Road Tests are powered by the CMS microservice
Product ratings and reviews are powered by the ProductReviews microservice
Note that each individual item above is a separate network call to the respective microservice. We are making around 9 parallel network calls to these microservices to power the content on the product page.
Here are the questions...
Should we continue with multiple parallel calls, since they are not causing any latency issues?
Should we think about reducing network calls by combining the requests to each microservice into one? E.g. combine the multiple service calls to CMS into one, and do the same for the other two microservices. This way we would reduce the number of network calls from 9 to 3.
Should we sync this data to the Product microservice through an event-driven system? This looks like the most optimised approach considering read throughput, but is implementing an event-driven system worth it?
Please help us decide the right approach in this case.
I think that's too many network calls for this page.
1) Should we continue with multiple parallel calls as they are not causing any latency issues?
I believe these are too many calls.
2) Should we think about reducing network calls by combining the requests to each microservice into one?
I think yes, for a couple of reasons.
Slower connections
If network speed is good then you may not feel it, but for mobile users this could be too much. Bandwidth also matters a lot on slower networks; watch out for that.
Security
The more API calls you have, the more attack surface you expose.
Adding more APIs
If you add more APIs you may have to change both client and server. With a consolidated response you have less work to do.
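For illustration, a consolidated endpoint on the CMS side might look like the sketch below: the product page makes one call, and the fan-out to images/videos/news happens in parallel inside the service, where the hops are cheap. The controller, route, and service interfaces are all invented for this example.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// All names here are invented for illustration.
public interface IImageService { Task<IReadOnlyList<string>> GetAsync(string productId); }
public interface IVideoService { Task<IReadOnlyList<string>> GetAsync(string productId); }
public interface INewsService { Task<IReadOnlyList<string>> GetAsync(string productId); }

[ApiController]
[Route("cms/product-content")]
public class ProductContentController : ControllerBase
{
    private readonly IImageService _images;
    private readonly IVideoService _videos;
    private readonly INewsService _news;

    public ProductContentController(IImageService images, IVideoService videos, INewsService news)
    {
        _images = images;
        _videos = videos;
        _news = news;
    }

    // One call from the product page replaces several separate CMS calls.
    // The fan-out still happens in parallel, but server-side rather than
    // across the public network.
    [HttpGet("{productId}")]
    public async Task<IActionResult> Get(string productId)
    {
        var images = _images.GetAsync(productId);
        var videos = _videos.GetAsync(productId);
        var news = _news.GetAsync(productId);

        await Task.WhenAll(images, videos, news);

        return Ok(new { Images = images.Result, Videos = videos.Result, News = news.Result });
    }
}
```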
3) Should we sync this data to the Product microservice through an event-driven system?
That depends. Materialized views work great when your read volume is very high (which is your case). As your user base grows, you may eventually need to move to this approach.
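As a rough sketch of what that could look like, the Product service might keep a denormalized read model that it updates whenever another service publishes a change event. The event type and store interface below are hypothetical.

```csharp
using System.Threading.Tasks;

// Invented event and store types, purely for illustration.
public record ReviewAdded(string ProductId, double AverageRating, int ReviewCount);

public interface IProductReadStore
{
    Task<ProductView> LoadAsync(string productId);
    Task SaveAsync(ProductView view);
}

public class ProductView
{
    public string ProductId { get; set; }
    public double AverageRating { get; set; }
    public int ReviewCount { get; set; }
}

// Subscribed to the ProductReviews service's events; keeps a local,
// denormalized copy so the product page reads from one place.
public class ProductReadModelUpdater
{
    private readonly IProductReadStore _store;

    public ProductReadModelUpdater(IProductReadStore store) => _store = store;

    public async Task HandleAsync(ReviewAdded evt)
    {
        var view = await _store.LoadAsync(evt.ProductId);
        view.AverageRating = evt.AverageRating;
        view.ReviewCount = evt.ReviewCount;
        await _store.SaveAsync(view);
    }
}
```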

ASP.Net Web API - scaling large number of write operations

I am working on a project using ASP.Net Web API that will be receiving a large number of POST operations where I will need to write many successive / simultaneous records to the DB. I don't have an exact number per second so this is more of a conceptual design question.
I am thinking of having either a standard message queue (RabbitMQ, etc.) or an in-memory data store such as Redis handle the initial intake of the data, and then persisting that data to disk via another process (or even a built-in one, if the queue mechanism has one).
I know I could also use threading to improve performance of the API.
Does anyone have any suggestions as far as which message queues or memory storage to look at or even just architectural recommendations?
Thanks for any and all help everyone.
-c
Using all this middleware will make your web application scale, but it still means the same load on your DB. Your ASP.NET Web API can be pretty fast just using async/await. With async/await you just need to be careful to use it all the way down - from the controller to the database and external requests - and not to mix it with blocking on Tasks, because you will end up with deadlocks.
And don't use manual threading, because you will consume application threads and then the app will not be able to scale - leave the threads to ASP.NET Web API.
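To illustrate "async all the way down", here is a minimal Web API sketch; the record and repository types are placeholders for whatever data access layer you use.

```csharp
using System.Threading.Tasks;
using System.Web.Http;

// Placeholder types for illustration.
public class Record { public string Payload { get; set; } }
public interface IRecordRepository { Task SaveAsync(Record record); }

public class RecordsController : ApiController
{
    private readonly IRecordRepository _repository;

    public RecordsController(IRecordRepository repository)
    {
        _repository = repository;
    }

    public async Task<IHttpActionResult> Post(Record record)
    {
        // Await through every layer. Calling .Result or .Wait() on the
        // task here instead would block a request thread and can deadlock
        // on the captured synchronization context.
        await _repository.SaveAsync(record);
        return Ok();
    }
}
```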

Java EE App Design

I am writing a Java EE application which is supposed to consume SAP BAPIs/RFCs using JCo and expose them as web services to other downstream systems. The application needs to scale to huge volumes, on the order of tens of thousands of simultaneous users.
I would like to have suggestions on how to design this application so that it can meet the required volume.
It's good that you are thinking of scalability right from the design phase. Martin Abbott and Michael Fisher (of PayPal/eBay fame) lay out a framework called the AKF Scale Cube for scaling web apps. The main principle is to scale your app along 3 axes.
X-axis: cloning of services/data so that work can easily be distributed across instances. For a web app, this implies the ability to add more web servers (clustering).
Y-axis: separation of work by responsibility, action, or data. For example, in your case you could serve different API calls from different servers.
Z-axis: separation of work by customer or requester. In your case you could say requesters from region 1 will access Server 1, requesters from region 2 will access Server 2, etc.
Design your system so that you can follow all 3 above if you need to. But when you initially deploy, you may not need to use all three methods.
You can check out the book "The Art of Scalability" by the above authors. http://amzn.to/oSQGHb
A final answer is not possible, but based on the information you provided this does not seem to be a problem, as long as your application is stateless and only forwards requests to SAP and returns the responses. In that case it does not maintain any state at all. If it comes to e.g. asynchronous message handling, temporary database storage, or session state management, it becomes more complex. If there is truly no need to maintain state, you can easily scale out your application to dozens of application servers without changing your application architecture.
In my experience this is not necessarily the case when it comes to SAP integration. Think of a shopping cart you want to fill based on products available in SAP: you may want to maintain this cart in your application and only submit the final cart to SAP. Otherwise you end up building an e-commerce application inside your backend.
Most important is that you reduce CPU utilization in your application to avoid a too-large cluster, and reduce all kinds of I/O wherever possible - e.g. small SOAP messages to reduce network I/O.
Furthermore, I recommend designing a proper abstraction layer on top of JCo, including the JCO.PoolManager for connection pooling. You may also need a well-thought-out authorization concept if you work with a connection pool managed by only one technical user.
Just some (not well structured) thoughts...

Best scaling methodologies for a high-traffic web application?

We have a new project for a web app that will display banner ads on websites (as a network), and our estimate is for it to handle 20 to 40 billion impressions a month.
Our current language is ASP... but we are moving to PHP. Does PHP 5 have limits when scaling a web application? Or should I have our team invest in picking up JSP?
Or, is it a matter of the app server and/or DB? We plan to use Oracle 10g as the database.
No offense, but I strongly suspect you're vastly overestimating how many impressions you'll serve.
That said:
PHP or other languages used in the application tier really have little to do with scalability. Since the application tier delegates its state to the database or equivalent, it's straightforward to add as much capacity as you need behind appropriate load balancing. Choice of language does influence per-server efficiency and hence cost, but that's different from scalability.
It's scaling the state/data storage that gets more complicated.
For your app, you have three basic jobs:
what ad do we show?
serving the ad
logging the impression
Each of these will require thought and likely different tools.
The second, serving the ad, is the simplest: use a CDN. If you actually serve the volume you claim, you should be able to negotiate favorable rates.
Deciding which ad to show is going to be very specific to your network. It may be as simple as reading a few rows from a database that give the ad placements for a given property for a given calendar period, or it may be complex contextual advertising like Google's. Assuming it's more the former, and that the database of placements is small, this is the simple task of scaling database reads. You can use replication trees or, alternatively, a caching layer like memcached.
The last will ultimately be the most difficult: how to scale the writes. A common approach would be to still use databases, but to adopt a sharding strategy. More exotic options might be a key/value store supporting counter instructions, such as Redis, or a scalable OLAP database such as Vertica.
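As a sketch of the counter-instruction idea (shown in C# with the StackExchange.Redis client for concreteness; the key scheme is invented), impressions can be bucketed and incremented atomically, then rolled up offline:

```csharp
using System;
using StackExchange.Redis;

// Assumes the StackExchange.Redis client; the key scheme is invented.
public class ImpressionLogger
{
    private readonly IDatabase _redis;

    public ImpressionLogger(ConnectionMultiplexer connection)
    {
        _redis = connection.GetDatabase();
    }

    public void LogImpression(string adId)
    {
        // INCR is an atomic single-key write, so it shards naturally:
        // bucket by ad and hour here, roll the buckets up offline later.
        var key = $"impressions:{adId}:{DateTime.UtcNow:yyyyMMddHH}";
        _redis.StringIncrement(key);
    }
}
```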
All of the above assumes that you're able to secure data center space and network provisioning capable of serving this load, which is not trivial at the numbers you're talking.
You do realize that 40 billion per month is roughly 15,500 per second, right?
Scaling isn't going to be your problem - infrastructure, period, is going to be your problem. No matter what technology stack you choose, you are going to need an enormous amount of hardware - as others have said, in the form of a farm or cloud.
This question (and the entire subject) is a bit subjective. You can write a dog-slow program in any language and host it on anything.
I think your best bet is to see how your current implementation works under load. Maybe just a few tweaks will make things work for you - but changing your underlying framework seems a bit much.
That being said - your infrastructure team will also have to be involved as it seems you have some serious load requirements.
Good luck!
I think it is not a matter of language; it can be a matter of database speed as much as CPU processing speed. Have you considered a web farm? That way you can have more than one machine serving your application. There are several ways to implement this solution. You can start with two servers and add more as the app requires more processing volume.
On another point, Oracle 10g is a very good database server; in my humble opinion you only need a standalone Oracle server to handle the request volume. Remember that a SQL server is faster when people request more or less the same things each time, which is what happens in web applications if you plan your database schema carefully.
You should also check out the ad-server solutions that already exist - there are some very good ones; just try Googling "open source ad servers".
PHP will be capable of serving your needs. However, as others have said, your first limit will be your network infrastructure.
Your second limit will be writing scalable code. You will need good abstraction and isolation so that resources can easily be added at any level: things like a fast data-object mapper, multiple data-caching mechanisms, separate configuration files, and so on.

Performance problems with external data dependencies

I have an application that talks to several internal and external sources using SOAP, REST services, or plain database stored procedures. Obviously, performance and stability are major issues I am dealing with. Even when the endpoints are performing at their best, for large sets of data I easily see calls that take tens of seconds.
So I am trying to improve the performance of my application by prefetching the data and storing it locally, so that at least the read operations are fast.
While my application is the major consumer and producer of this data, some of it can also be changed from outside my application, which I have no control over. If I used caching, I would never know when to invalidate the cache when such data changes externally.
So I think my only option is to have a job scheduler running that continually updates the local database. I could prioritize users based on how often they log in and use the application.
I am talking about 50 thousand users and at least 10 endpoints that are terribly slow and can sometimes take a minute for a single call. Would something like Quartz give me the scale I need? And how would I get around the scheduler becoming a single point of failure?
I am just looking for something that doesn't require high maintenance and speeds up at least some of the less complicated subsystems, if not most. Any suggestions?
This does sound like you might need a data warehouse. You would update the data warehouse from the various sources, on whatever schedule was necessary. However, all the read-only transactions would come from the data warehouse, and would not require immediate calls to the various external sources.
This assumes you don't need real-time access to the most up-to-date data. Even if you needed data accurate to within the past hour from a particular source, that only means you would need to update from that source every hour.
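In other words, each slow source gets its own refresh schedule matching how fresh its data needs to be. A rough sketch of that loop (in C# for concreteness; the source name and fetch delegate are placeholders for one source's "call endpoint, write to local store" logic):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SourceRefresher
{
    // Invented helper: runs one source's refresh on its own schedule.
    public static async Task RunAsync(
        string sourceName,
        TimeSpan interval,            // e.g. 1 hour for hourly-fresh data
        Func<Task> fetchAndStore,
        CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            try
            {
                // Pull from the slow endpoint and persist locally, so
                // user-facing reads never wait on the external call.
                await fetchAndStore();
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine($"{sourceName} refresh failed: {ex.Message}");
            }

            try { await Task.Delay(interval, token); }
            catch (OperationCanceledException) { break; }
        }
    }
}
```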
You haven't said what platforms you're using. If you were using SQL Server 2005 or later, I would recommend SQL Server Integration Services (SSIS) for updating the data warehouse. It's made for just this sort of thing.
Of course, depending on your platform choices, there may be alternatives that are more appropriate.
Here are some resources on SSIS and data warehouses. I know you've stated you will not be using Microsoft products. I include these links as a point of reference: these are the products I was talking about above.
SSIS Overview
Typical Uses of Integration Services
SSIS Documentation Portal
Best Practices for Data Warehousing with SQL Server 2008
