Concerns about putting a payment/transactional app on GAE - performance

I'm trying to build a shopping-cart-like webapp on GAE. So far I haven't deployed anything to GAE; I've just been doing some proof-of-concept work locally... then I read this:
http://borglin.net/gwt-project/?page_id=688
It surprised me when I read through those "weaknesses", and I'm pretty worried about whether GAE is a good choice for this app. Would someone, especially anyone with experience building a real-world app that handles cash transactions, please give me some ideas / share your thoughts?
The article said GAE has "No https support for your domain". Is that true? I thought all I needed to do was point my domain https://www.abc.com at GAE's https://abc.appspot.com ...
For BigTable, I understand it would be quite hard to build analytic/statistical functionality in my app (e.g. providing a monthly transaction summary). Does anyone have experience handling such a situation? Export the data from BigTable to an RDBMS and run some SQL?
The article also said that BigTable has bad write performance: "I'm lucky to get 100 writes per second in a GAE request." Is that true? I can't find any figures to support or disprove it.
I'm now using Spring MVC + Objectify on the server side. Is that too heavy for GAE? Some say Spring can cause a long cold start... how cold is it? How long would it take to initialize an app with, say, 20+ different pages/controllers and 20+ kinds of entities/DAOs? Any ballpark figure?
P.S. If you know of any real-world apps built on GAE, please share them here; I want to know how far (or how big) my app could grow.
Thanks a lot!

1) That is true: HTTPS is only supported on .appspot.com. A very big shortcoming.
2) That's not really true; you can build any kind of monthly summary report using the remote API if you need complex joins and the like (see the sketch below). You can also export the data and use an offline tool.
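For example, here's a minimal sketch of pulling datastore data into an offline JVM process via the remote API. The host, credentials, entity kind, and property name are placeholders; the RemoteApi classes come from the appengine-remote-api library.

```java
// Connects a plain JVM process to the app's datastore so reports can run
// offline. A date filter for the month in question is omitted for brevity.
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.tools.remoteapi.RemoteApiInstaller;
import com.google.appengine.tools.remoteapi.RemoteApiOptions;

public class MonthlySummary {
    public static void main(String[] args) throws Exception {
        RemoteApiOptions options = new RemoteApiOptions()
                .server("your-app-id.appspot.com", 443)   // placeholder host
                .credentials("admin@example.com", "password");
        RemoteApiInstaller installer = new RemoteApiInstaller();
        installer.install(options);
        try {
            DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
            PreparedQuery pq = ds.prepare(new Query("Transaction"));
            long totalCents = 0;
            for (Entity tx : pq.asIterable()) {
                totalCents += (Long) tx.getProperty("amountCents"); // hypothetical property
            }
            System.out.println("Total (cents): " + totalCents);
        } finally {
            installer.uninstall();
        }
    }
}
```

From here you could also write the aggregates into an RDBMS and run whatever SQL you like.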
3) I haven't seen that kind of write performance bottleneck, but there is eventual consistency to deal with. That said, 100 writes/second is not a small number...
4) I would avoid Spring on App Engine. A lot of people use it happily, but I found that startup time was very slow, and that caused problems.

SSL for custom domains is in a testing phase. Note that HTTPS/SSL was not designed to work with shared-IP hosting (such as GAE), so it relies on SSL extensions with varying browser support (SNI) or on dedicated-IP (VIP) setups.
Queries are quite weak on Datastore. They are also expensive. There are two ways to do analytics:
a. Create a set of sharded counters and update a counter every time an event happens (= a financial transaction). We use this and it works well (see the sketch after this list). The only downside is that this is "online" analytics: you cannot retroactively add new analytics parameters for past data.
b. Upload (anonymized) data to Google BigQuery and do the analytics there.
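A minimal sketch of option (a), written with Objectify since the OP already uses it (assuming Objectify 4+; entity and field names are illustrative, and a real app would also call ObjectifyService.register(CounterShard.class) at startup):

```java
// N shards spread the ~5 writes/s-per-entity-group limit across N
// independent entity groups, so increments rarely contend.
import com.googlecode.objectify.VoidWork;
import com.googlecode.objectify.annotation.Entity;
import com.googlecode.objectify.annotation.Id;
import java.util.Random;
import static com.googlecode.objectify.ObjectifyService.ofy;

@Entity
class CounterShard {
    @Id Long id;   // shard number: 1..NUM_SHARDS
    long count;
}

public class ShardedCounter {
    private static final int NUM_SHARDS = 20;
    private static final Random RND = new Random();

    // Each increment hits a randomly chosen shard, so concurrent
    // transactions almost never touch the same entity group.
    public static void increment() {
        final long shardId = 1 + RND.nextInt(NUM_SHARDS);
        ofy().transact(new VoidWork() {
            @Override
            public void vrun() {
                CounterShard shard =
                        ofy().load().type(CounterShard.class).id(shardId).now();
                if (shard == null) {
                    shard = new CounterShard();
                    shard.id = shardId;
                }
                shard.count++;
                ofy().save().entity(shard).now();
            }
        });
    }

    // Reads sum all shards; a few extra reads are far cheaper than
    // serialized writes to a single hot entity.
    public static long total() {
        long sum = 0;
        for (CounterShard shard : ofy().load().type(CounterShard.class)) {
            sum += shard.count;
        }
        return sum;
    }
}
```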
Datastore has a limit of about 5 writes/updates per second to a SINGLE entity or entity group (some sources say 1 write/s). There is no limit on parallel writes to different entities. Remember, GAE is a distributed system where all apps share ONE BigTable database under the hood, so this scales quite well.
I don't have experience with this, but there are various reports on the net. See this: http://www.listry.com/blog/2010/03/google-app-engine-cold-start-guide-for

I don't know about the other topics, but what I can tell you is that we use a combination of Guice + Jersey as a substitute for Spring :) It's better for GAE if you take startup time into account.
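For reference, a minimal sketch of that combination, assuming Jersey 1.x with the jersey-guice integration (the usual pairing on GAE at the time); OrderResource is a made-up example resource, and the listener would be registered in web.xml:

```java
// No classpath scanning, no XML: bindings are plain code, which is a
// large part of why startup beats a typical Spring app on GAE.
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.servlet.GuiceServletContextListener;
import com.sun.jersey.guice.JerseyServletModule;
import com.sun.jersey.guice.spi.container.servlet.GuiceContainer;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;

public class AppBootstrap extends GuiceServletContextListener {
    @Override
    protected Injector getInjector() {
        return Guice.createInjector(new JerseyServletModule() {
            @Override
            protected void configureServlets() {
                bind(OrderResource.class);              // explicit, eager binding
                serve("/api/*").with(GuiceContainer.class); // Jersey via Guice
            }
        });
    }
}

@Path("/orders")
class OrderResource {
    @GET
    @Produces("application/json")
    public String list() {
        return "[]"; // placeholder payload
    }
}
```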

Related

Slow MEAN Stack Web applications

I have three web applications built on the MEAN stack, hosted on a shared hosting plan. They run really slowly (it takes minutes to log in and minutes to query the database) and I'm not sure how to optimise the performance. I have created three backend servers so that each application can call its backend separately. I have ensured that my files are gzipped and served over HTTP/3. What else should/can I do on top of that? I can't seem to find much related information online. Please give me any suggestions that you may have!
Would implementing lazy loading help? If so, please share some easy examples because I'm still new. Much appreciated!
I'd suggest moving off of shared hosting and using one of the newer generation developer-focused hosting platforms like Render or Adaptable.io. Adaptable includes MongoDB, so it's great for MEAN stack. With Render, you'd probably use MongoDB Atlas. Both provide free tiers that smaller apps can fit within.
With any of the next-gen hosting platforms, you just connect a GitHub repo with your source code and they automatically deploy your app to the cloud. You don't have to deal with keeping servers up to date, optimizing database performance or anything like that.

Should event driven architecture be targeted for all data & analytics platforms?

For example,
You have an IT estate with a mix of batch and real-time data sources from multiple systems, e.g. ERP, project management, asset management, website, monitoring, etc.
The aim is to integrate the data sources into a (provider-agnostic) cloud environment.
There is a need for reporting and analytics on combinations of all data sources.
Inevitably, some source systems are not capable of streaming, hence batch loading is required.
There are potential use cases for triggering functionality/changes/updates based on the ingested data.
Given the steer to create a future-proofed platform, how would you design it architecturally?
It's a very open-ended question, but there are some good principles you can adopt to point you in the right direction:
Avoid point-to-point integration, and get everything going through a few common points - ideally one. Using an API Gateway can be a good place to start; the big players (Azure, AWS, GCP) all have their own options, plus there are plenty of decent independent ones like Tyk or Kong.
Batches and event streams are totally different, but even then you can still potentially route them all through the gateway so that you get centralised observability (reporting, analytics, alerting, etc.).
Use standards-based API specifications where possible. A good REST-based API, built on a proper resource model, is a non-trivial undertaking; I'm not sure it fits what you are doing if you are dealing with lots of disparate legacy integration. If you are going to adopt REST, use OpenAPI to specify the APIs. Using this standard not only makes things easier for consumers, but also gets you better tooling, as many design, build, and test tools support OpenAPI. There's also AsyncAPI for event/async APIs.
Do some architecture. Moving sh*t to the cloud doesn't remove the sh*t - it just moves it to the cloud. Don't recreate old problems in a new place.
Work out the logical components in your new solution: what does each of them do (what's its reason to exist)? Don't forget ancillary components like API catalogues, etc.
Think about layering the integration (usually depending on how the APIs will be consumed and what role they need to play, e.g. system interfaces, orchestration, experience APIs, etc.).
Want to handle data in a consistent way regardless of source (your 'agnostic' comment)? You'll need to think through how data is ingested and processed. This might lead you into more data / ETL centric considerations rather than integration ones.
Co-design. Is the integration mainly data coming in or going out? Is the integration with 3rd parties or strictly internal?
If you are designing for external / 3rd party consumers then a co-design process is advised, since you're essentially designing the API for them.
If the APIs are for internal use, consider designing them as if they were for external use, so that if/when you decide to open them up later it's not so hard.
Take a step back:
Continually ask yourselves "what problem are we trying to solve?". Usually, a technology initiative is successful when there's a well-understood reason for doing it that has solid buy-in from the business (non-IT).
Who wants the reporting, and why - what problem are they trying to solve?
As you mentioned, it's an IT estate (i.e. an enterprise-level mix of batch and real-time systems), so first you have to identify the end goal of this migration. You can consider refactoring applications. If you are trying to make the estate event-driven, then assess the refactoring effort and cost. Separation of responsibility is the key factor for refactoring and migration.
If you are thinking about future-proofing your solution, then consider the cloud for storing and processing your data. It won't necessarily be cheap, but a mix of cloud and on-prem could be the way to go. Cloud providers offer services to move your data at minimal cost, and cloud-native solutions exist for analysing it. A database migration service in AWS or Azure can move the data and then capture ongoing changes, so you can keep using your on-prem DB and apps while performing the reporting analysis in the cloud. That eases the load on your transactional DB, and most on-prem-to-cloud data sync is near real time.

Couchbase and NoSQL DBs for a serious Web-Application?

I've read some posts here, but I didn't find a real answer.
Normally I work (and have worked) with conventional SQL databases (MS SQL, MySQL) when developing applications (ERP, CRM, PPS, web shops, etc.). Real contact with, or experience of, document-oriented databases in an actual business setting wasn't possible.
I have only tested MongoDB and CouchDB privately (hobby and experimental projects). The experience was good, but not enough to say "Yes, let's use it for business!", because I couldn't test them in a real environment.
But now there is a chance to build something from scratch, and it could be a big start for a business.
So my question:
Can I use Couchbase for a big business application that thousands of users would use? Is it fast enough, with good enough performance, to handle thousands of queries, requests/responses, etc.?
What do backup and restore look like?
Where are Couchbase's limits?
Thank you for your answers.
In short, yes.
Your questions are too broad to fully address here. Couchbase has many real-world installations with clients doing production work at large scale. You can see several references with write ups of their uses on the Couchbase site. (Note this is not a complete list of customers, only ones that have agreed to have their use highlighted.) You will definitely recognize some names.

Web Hosting, Web Scaling

I have a simple web application for conducting online exams for college students. All questions are multiple-choice. Around 5000 users will be taking the exam. My backend is MySQL, and the front end is PHP. I want to know what server hardware configuration will be required to host this application and have it work seamlessly for that number of users.
I am also looking at cloud solutions. If I choose Amazon EC2 instances, can somebody advise me on what type of EC2 machine I should pick for this application?
It is impossible to tell the exact specs of the servers required to run your setup, because there are too many variables. However, it is definitely a good question: when I was a university student, a professor tried to do exactly this without testing, and on the exam date the system got overloaded and the exam had to be cancelled!
Start by testing what you already have. You can use something like the ab tool or JMeter; they simulate the requested load for you automatically, so you can check how your actual server performs and act accordingly.
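If you want a really quick smoke test before setting up ab or JMeter, even a hand-rolled harness along these lines will expose gross problems (the URL and the numbers here are made up; scale them to your scenario):

```java
// Fires concurrent GET requests at the exam page and reports average
// latency and failure count - a crude stand-in for ab/JMeter.
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class MiniLoadTest {
    public static void main(String[] args) throws Exception {
        final int concurrency = 100;        // simulated simultaneous students
        final int requestsPerWorker = 50;
        final AtomicLong totalMillis = new AtomicLong();
        final AtomicLong failures = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        for (int i = 0; i < concurrency; i++) {
            pool.submit(() -> {
                for (int r = 0; r < requestsPerWorker; r++) {
                    long start = System.currentTimeMillis();
                    try {
                        HttpURLConnection conn = (HttpURLConnection)
                            new URL("http://your-exam-server/exam.php").openConnection();
                        if (conn.getResponseCode() != 200) {
                            failures.incrementAndGet();
                        } else {
                            conn.getInputStream().close();
                        }
                    } catch (Exception e) {
                        failures.incrementAndGet();
                    }
                    totalMillis.addAndGet(System.currentTimeMillis() - start);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        long requests = (long) concurrency * requestsPerWorker;
        System.out.printf("avg latency: %d ms, failures: %d/%d%n",
                totalMillis.get() / requests, failures.get(), requests);
    }
}
```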
Application design is also important. For example, you can cache all the questions at the web layer to avoid database queries. Make the app client-heavy so that the server payload is minimal (JSON responses), reducing download time and load on the server.
Request multiple questions at once, and batch the user's responses together, to decrease the number of AJAX calls. A caching sketch follows below.
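Here is a sketch of the caching idea, in Java for illustration (the app is PHP, where the same idea applies with e.g. APC or memcached). QuestionDao is a hypothetical stand-in for whatever loads questions from MySQL:

```java
// One database read at startup; every exam request afterwards is served
// from memory, so question reads never hit MySQL during the exam itself.
import java.util.Collections;
import java.util.List;

interface QuestionDao {
    List<String> loadAllQuestionsAsJson(); // hypothetical loader
}

public class QuestionCache {
    private static volatile List<String> questionsJson = Collections.emptyList();

    // Call once at application startup - a single database hit.
    public static void warmUp(QuestionDao dao) {
        questionsJson = Collections.unmodifiableList(dao.loadAllQuestionsAsJson());
    }

    // Pure in-memory read for every request after warm-up.
    public static List<String> allQuestions() {
        return questionsJson;
    }
}
```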
Consider a NoSQL solution to avoid RDBMS constraint overhead.

Best scaling methodologies for a high-traffic web application?

We have a new project for a web app that will display banner ads on websites (as a network), and our estimate is that it will handle 20 to 40 billion impressions a month.
Our current language is ASP... but we are moving to PHP. Does PHP 5 have limits when scaling a web application? Or should I have our team invest in picking up JSP?
Or is it a matter of the app server and/or the DB? We plan to use Oracle 10g as the database.
No offense, but I strongly suspect you're vastly overestimating how many impressions you'll serve.
That said:
PHP and the other languages used in the application tier really have little to do with scalability. Since the application tier delegates its state to the database or an equivalent, it's straightforward to add as much capacity as you need behind appropriate load balancing. Choice of language does influence per-server efficiency, and hence costs, but that's different from scalability.
It's scaling the state/data storage that gets more complicated.
For your app, you have three basic jobs:
deciding which ad to show
serving the ad
logging the impression
Each of these will require thought and likely different tools.
The second, serving the ad, is the simplest: use a CDN. If you actually serve the volume you claim, you should be able to negotiate favorable rates.
Deciding which ad to show is going to be very specific to your network. It may be as simple as reading a few rows from a database that give the ad placements for a given property in a given calendar period, or it may be complex contextual advertising like Google's. Assuming it's more the former, and that the database of placements is small, this is the simple task of scaling database reads. You can use replication trees or, alternatively, a caching layer like memcached (a sketch is below).
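A sketch of that cache-aside pattern using the spymemcached client, where loadPlacementsFromDb is a hypothetical stand-in for your real placement query:

```java
// Try the cache first; on a miss, read from the database and cache the
// result so subsequent requests skip the DB entirely.
import java.io.IOException;
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class PlacementCache {
    private final MemcachedClient cache;

    public PlacementCache(String host) throws IOException {
        this.cache = new MemcachedClient(new InetSocketAddress(host, 11211));
    }

    public String placementsFor(String propertyId) {
        String key = "placements:" + propertyId;
        String cached = (String) cache.get(key);       // cache hit?
        if (cached != null) {
            return cached;
        }
        String fresh = loadPlacementsFromDb(propertyId); // fall back to the DB
        cache.set(key, 300, fresh);                    // cache for 5 minutes
        return fresh;
    }

    private String loadPlacementsFromDb(String propertyId) {
        // hypothetical: read the placement rows for this property
        return "{}";
    }
}
```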
The last will ultimately be the most difficult: how to scale the writes. A common approach would be to still use databases, but adopt a sharding strategy. More exotic options might be a key/value store that supports counter instructions, such as Redis (sketched below), or a scalable OLAP database such as Vertica.
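The Redis counter option could look like this with the Jedis client (the key layout is illustrative, and production code would use a connection pool rather than a single connection):

```java
// Each impression becomes a single INCR, which Redis handles at very
// high throughput without the write contention of an RDBMS row.
import redis.clients.jedis.Jedis;

public class ImpressionLogger {
    private final Jedis redis = new Jedis("localhost", 6379);

    // One counter per ad per hour keeps individual keys small and makes
    // it easy to roll hourly counts up into daily/monthly reports later.
    public void logImpression(long adId, String hourBucket) {
        redis.incr("impressions:" + adId + ":" + hourBucket);
    }
}

// usage: new ImpressionLogger().logImpression(42L, "2010-06-01T13");
```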
All of the above assumes that you're able to secure data center space and network provisioning capable of serving this load, which is not trivial at the numbers you're talking.
You do realize that 40 billion per month is roughly 15,500 per second on average (40 × 10⁹ impressions ÷ ~2.6 × 10⁶ seconds in a 30-day month), right?
Scaling isn't going to be your problem - infrastructure, period, is going to be your problem. No matter what technology stack you choose, you are going to need an enormous amount of hardware - as others have said, in the form of a farm or cloud.
This question (and the entire subject) is a bit subjective. You can write a dog slow program in any language, and host it on anything.
I think your best bet is to see how your current implementation works under load. Maybe just a few tweaks will make things work for you - but changing your underlying framework seems a bit much.
That being said - your infrastructure team will also have to be involved as it seems you have some serious load requirements.
Good luck!
I think it is not a matter of language, but it can be a matter of database speed as much as CPU processing speed. Have you considered a web farm? That way you can have more than one machine serving your application. There are several ways to implement this solution: you can start with two servers and add more as the app needs more processing volume.
On another point, Oracle 10g is a very good database server; in my humble opinion, a standalone Oracle server may be all you need to handle the request volume. Remember that a SQL server is faster when people request more or less the same things each time, which is what happens in a web application if you plan your database schema carefully.
You should also check out the existing ad-server solutions; there are some very good ones. Just search Google for "open source ad servers".
PHP will be capable of serving your needs. However, as others have said, your first limits will be your network infrastructure.
But your second limits will be writing scalable code. You will need good abstraction and isolation so that resources can easily be added at any level. Things like a fast data-object mapper, multiple data caching mechanisms, separate configuration files, and so on.
