OAuth Performance - performance

I'm a newbie to OAuth. I have a high-volume customer using OAuth: a load balancer in front of 12 servers, but only one server stores the OAuth tokens. In testing today, I can only get 1,000 concurrent users on the site, and I need to support an SLA of 10,000.
I'm looking at the following alternatives:
1) Look for a more robust OAuth library (must be Java-based)
2) Store the tokens in a database (slower, but users will have access from any of the servers)
Is there anything else I'm missing? Any recommendations from more experienced OAuth developers/architects?
Much appreciated!
Steve

You're not missing anything; solving this is not OAuth's job. So the spirit of your second alternative (moving token storage off the single server) sounds good to me. However, if you want to reach a given level of scalability easily and at low cost, avoid both COTS clustering solutions and database storage here.
Instead, scale your token repository horizontally using a distributed caching system on its own tier of servers.
If you're using Java, consider investigating spymemcached or an equivalent client.
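A minimal sketch of what a distributed token tier does on the client side: each token key is hashed onto a ring of cache nodes, so any of the 12 web servers can find a token without relying on one central server. spymemcached can distribute keys this way through its Ketama consistent-hashing support; here, plain in-memory maps stand in for the memcached servers, and the names `ShardedTokenStore`, `cache-1`, etc. are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch: spread OAuth tokens over a tier of cache nodes using consistent hashing.
 * Each node is an in-memory map standing in for a memcached server (hypothetical setup).
 */
class ShardedTokenStore {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final Map<String, Map<String, String>> nodes = new HashMap<>();

    ShardedTokenStore(List<String> nodeNames, int virtualNodesPerNode) {
        for (String node : nodeNames) {
            nodes.put(node, new ConcurrentHashMap<>());
            // Several points per node on the ring give a more even spread of tokens.
            for (int i = 0; i < virtualNodesPerNode; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    /** The owning node is the first ring point at or after the token's hash, wrapping around. */
    String nodeFor(String token) {
        SortedMap<Long, String> tail = ring.tailMap(hash(token));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    void put(String token, String principal) {
        nodes.get(nodeFor(token)).put(token, principal);
    }

    /** Returns the principal for a token, or null if the token is unknown. */
    String get(String token) {
        return nodes.get(nodeFor(token)).get(token);
    }

    int sizeOf(String node) {
        return nodes.get(node).size();
    }

    // Ring position: the first 8 bytes of the key's MD5 digest, read as a long.
    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}

class TokenStoreDemo {
    public static void main(String[] args) {
        List<String> cacheNodes = List.of("cache-1", "cache-2", "cache-3");
        ShardedTokenStore store = new ShardedTokenStore(cacheNodes, 100);
        for (int i = 0; i < 3000; i++) {
            store.put("token-" + i, "user-" + i);
        }
        System.out.println(store.get("token-42")); // user-42
        for (String node : cacheNodes) {
            System.out.println(node + " holds " + store.sizeOf(node) + " tokens");
        }
    }
}
```

Adding a node only remaps the tokens that fall between its new ring points, which is why consistent hashing is preferred over `hash % nodeCount`. Note that this partitions tokens but does not replicate them: if a cache node dies, the tokens it held are lost and those users must re-authenticate.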

You can store your OAuth access tokens in any distributed persistent cache (such as MongoDB with replica sets). With this setup, your OAuth access tokens will be available on all 12 boxes, and you will be able to scale horizontally. Tokens created on any box will be replicated automatically, and lookups should be fast compared to a regular database.
More info on MongoDB and replica sets.

Related

Storing user profiles

I would like to store user profile information. After researching a bit online, I am confused between the following options:
Use an LDAP server (for example, OpenDJ): I can write Java clients that interact with the LDAP server through LDAP APIs.
Store the user profile in a database as a JSON document (as in Elastic DB): NoSQL databases can then index the documents to improve lookup time.
What are the factors that I should keep in mind before selecting one of the approaches?

For a start, if you are storing passwords, then using LDAP is a no-brainer IMO. See http://smart421.com/smart-identity-and-fraud/why-bother-with-an-ldap-anyway/ .
Otherwise, I would recommend doing a PoC with each solution (don't forget to add indexes in OpenDJ; you can also use Rest2LDAP) to see how well they fit your needs. Both products are open source, so it's easy to get started.
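For the LDAP side of such a PoC, the JDK's built-in JNDI API is enough to look up a profile. A minimal sketch, assuming a local OpenDJ instance at `ldap://localhost:1389` with user entries under `ou=people,dc=example,dc=com`, bound as `cn=Directory Manager`; all of these values, the attribute list, and the class name are hypothetical placeholders.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

/** Minimal JNDI client for looking up a user profile in an LDAP directory such as OpenDJ. */
class LdapProfileLookup {
    // Hypothetical base DN; adjust to your directory layout.
    static final String BASE_DN = "ou=people,dc=example,dc=com";

    /** Connection settings for a simple (username/password) bind. */
    static Hashtable<String, String> environment(String url, String bindDn, String password) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, bindDn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        return env;
    }

    /** Search the whole subtree and fetch only the profile attributes we need. */
    static SearchControls controls() {
        SearchControls sc = new SearchControls();
        sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
        sc.setReturningAttributes(new String[] {"uid", "cn", "mail"});
        return sc;
    }

    /** Returns the attributes of the entry whose uid matches, or null if there is none. */
    static Attributes findByUid(DirContext ctx, String uid) throws NamingException {
        // {0} is filled in from filterArgs and escaped by JNDI, instead of concatenating user input.
        NamingEnumeration<SearchResult> results =
            ctx.search(BASE_DN, "(uid={0})", new Object[] {uid}, controls());
        try {
            return results.hasMore() ? results.next().getAttributes() : null;
        } finally {
            results.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        // Requires a running directory server; the URL and credentials are placeholders.
        DirContext ctx = new InitialDirContext(
            environment("ldap://localhost:1389", "cn=Directory Manager", "password"));
        try {
            System.out.println(findByUid(ctx, "jdoe"));
        } finally {
            ctx.close();
        }
    }
}
```

The document-store PoC would be the equivalent lookup by a `uid` field; comparing the two under realistic data volumes is the point of the exercise.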

If your user population is a known group that may already have accounts in an existing LDAP repository, or where user account information needs to be shared between systems, then it makes sense to use and add on to the existing LDAP repository.
If you are starting from scratch and mainly have external, unknown users who have no interaction with your infrastructure other than this one application, then LDAP is not a good choice IMO, because of the overhead of setting up and managing the server. In that case, a lightweight JSON approach seems better suited (even though the L in LDAP stands for "lightweight").
The number of expected users is less of a consideration; you need to tread carefully with very large populations in either scenario.
See also this question for additional insights: Reasons to store users' data in LDAP instead of RDBMS

How to handle user roles with Rethinkdb?

In RethinkDB, there does not seem to be built-in support for user roles/access permissions.
This seems to be a common feature in most established databases, including MongoDB. We are worried that this gives processes that have access to the database too much access and us as developers little control over who can access what, leading to potential security issues.
I'm wondering: how big of an issue is this? Is there an alternative way to replicate this functionality without RethinkDB supporting it out of the box?

EDIT:
As of RethinkDB 2.3, which was just released, you can now add users and ACLs!
2.3 Release Blog Post
Users documentation
Original Answer
Access control (sometimes called ACLs) for RethinkDB is on the roadmap, but in the meantime I recommend setting up multiple RethinkDB instances, divided by user permissions, each protected with an auth key:
https://rethinkdb.com/docs/security/#securing-the-driver-port
RethinkDB allows you to set an authentication key by modifying the cluster_config system table. Once you set an authentication key, client drivers will be required to pass the key to the server in order to connect.
Hope that helps!

Rate-Limit an API (Spring MVC)

I'm looking for the most efficient way to implement (or use an existing) rate limiter that would protect all my REST API URLs. The protection I'm looking for is a "calls per second per user" limiter.
I looked around online, and the options that came up were either Redis or Guava's RateLimiter.
To be honest, I have never used Redis and I'm not really familiar with it. But from its docs, it seems to have a fairly robust rate-limiting system.
I have also had a look at Guava's RateLimiter, and it looks a bit easier to use (no Redis installation needed, etc.).
So I would like some suggestions on what would be the best solution in my case. Is using Redis "too much"?
Have any of you already tried RateLimiter? Is it a good solution? Is it scalable?
PS: I am also open to solutions other than the two I mentioned if you think there are better choices.
Thank you!

If you are trying to limit access to your Spring-based REST API, you should use the token-bucket algorithm.
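A minimal, single-JVM sketch of a per-user token bucket: each user gets a bucket holding up to `capacity` tokens that refills at a steady rate, and a request is allowed only if a token is available. `tryConsume()` returns `false` immediately instead of blocking the calling thread. The class names are hypothetical, and the clock is injected so the behaviour can be tested deterministically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Token bucket: holds at most `capacity` tokens and refills continuously at `refillPerSecond`. */
final class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock; // injectable so tests can control time
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full, so a burst of up to `capacity` requests is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Takes one token if available; never blocks, just returns false when the bucket is empty. */
    synchronized boolean tryConsume() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) * refillPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + refill);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}

/** One bucket per user, created on first use. State lives in this JVM only. */
final class PerUserRateLimiter {
    private final Map<String, TokenBucket> buckets = new ConcurrentHashMap<>();
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock;

    PerUserRateLimiter(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
    }

    boolean allow(String user) {
        return buckets
            .computeIfAbsent(user, u -> new TokenBucket(capacity, refillPerSecond, nanoClock))
            .tryConsume();
    }
}
```

In a Spring MVC app you could, for example, call `allow(principalName)` from a `HandlerInterceptor`'s `preHandle` (with `System::nanoTime` as the clock) and respond with HTTP 429 (Too Many Requests) when it returns `false`. Buckets are never evicted in this sketch; a real implementation would expire idle users' buckets.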
There is the bucket4j-spring-boot-starter project, which uses the Bucket4j library to rate-limit access to the REST API. You can configure it via the application properties file, and there is an option to limit access based on IP address or username.
If you are using Netflix Zuul, you could use Spring Cloud Zuul RateLimit, which supports different storage options: Consul, Redis, Spring Data, and Bucket4j.

Guava's RateLimiter blocks the current thread when you call acquire() (tryAcquire() is the non-blocking alternative), so if there's a burst of asynchronous calls against the throttled service, many threads will be blocked, which might exhaust the pool of free threads.
Perhaps the Spring-based library Kite meets your needs. Kite's "rate-limiting throttle" rejects requests once the principal reaches a configurable limit on the number of requests in a given time period. The rate limiter uses Spring Security to determine the principal involved.
But Kite is still a single-JVM approach. If you do need a cluster-aware approach, Redis is the way to go.
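For the cluster-aware case, a common Redis approach (similar to the rate-limiter pattern in the Redis INCR documentation) is a fixed-window counter: increment a per-user, per-second key with INCR, set a short EXPIRE on the first hit, and reject once the count exceeds the limit. A minimal sketch, assuming a hypothetical `CounterStore` interface that a real Redis client would implement; the in-memory version exists only to make the sketch runnable. Note that this is a fixed-window counter, not a token bucket.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** The two Redis commands the pattern needs; a real implementation would delegate to a Redis client. */
interface CounterStore {
    long incr(String key);                 // Redis INCR: increment and return the new value
    void expire(String key, long seconds); // Redis EXPIRE: delete the key after `seconds`
}

/** In-memory stand-in so the sketch runs without Redis. It ignores expiry and is per-JVM only. */
final class InMemoryCounterStore implements CounterStore {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    @Override
    public long incr(String key) {
        return counters.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
    }

    @Override
    public void expire(String key, long seconds) {
        // No-op here; Redis would remove the key once the TTL elapses.
    }
}

/** Fixed-window limiter: at most `limitPerSecond` calls per user in each wall-clock second. */
final class FixedWindowLimiter {
    private final CounterStore store;
    private final long limitPerSecond;
    private final LongSupplier epochSeconds; // e.g. () -> System.currentTimeMillis() / 1000

    FixedWindowLimiter(CounterStore store, long limitPerSecond, LongSupplier epochSeconds) {
        this.store = store;
        this.limitPerSecond = limitPerSecond;
        this.epochSeconds = epochSeconds;
    }

    boolean allow(String user) {
        // One counter per user per second, e.g. "rate:alice:1700000000".
        String key = "rate:" + user + ":" + epochSeconds.getAsLong();
        long count = store.incr(key);
        if (count == 1) {
            // First call in this window: let Redis discard the key shortly after the window ends.
            store.expire(key, 2);
        }
        return count <= limitPerSecond;
    }
}
```

Because every JVM increments the same Redis key, the limit holds across the whole cluster. In production, send INCR and EXPIRE together (in a MULTI/EXEC transaction or a Lua script) so that a crash between the two calls cannot leave a key without a TTL. Fixed windows also let a client make up to twice the limit in a short span straddling a window boundary.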

There is no hard rule; it depends entirely on your specific situation. Given that you have never used Redis, I would recommend Guava's RateLimiter. Compared to Redis, a completely new NoSQL system for you, Guava's RateLimiter is much easier to get started with: with a few lines of code, you can hand out permits at a configurable rate. What's left is adapting it to your needs, such as applying the rate limit on a per-user basis.

Docker for Elasticsearch multi-tenancy SaaS or single instance and proxy?

I am trying to build a prototype of Elasticsearch as a Service. I have thought of two different approaches, and I'd like to get opinions on which one to implement:
One single installation of Elasticsearch, with a proxy layer on top to add user validation (HTTP basic authentication plus a user account to validate usage).
This approach would be relatively straightforward; the main challenges would be configuring the cluster properly to handle the load, and setting up permissions so that there are no data leaks and users don't have access to the cluster management APIs.
Use Docker as a container and have one instance of Elasticsearch for each user. In this case, I would be providing isolation through Linux containers (Docker). I'd still need to manage authentication.
It probably would be good to implement both, play around and see how things behave. Any opinions about pros and cons of each approach?
Thanks!
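For the single-installation approach, the proxy's core job is to authenticate the caller and refuse anything outside their own data. A minimal sketch of that check, assuming a hypothetical naming convention where account `acme` may only use indices named `acme-...` and account names contain no `-`; the class name and the in-memory password map are also hypothetical (store hashed passwords in practice).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;

/**
 * Request check for a proxy in front of a shared Elasticsearch cluster.
 * Hypothetical convention: account "acme" may only use indices named "acme-...".
 */
final class TenantGate {
    private final Map<String, String> passwords; // account -> password (hash these in practice)

    TenantGate(Map<String, String> passwords) {
        for (String account : passwords.keySet()) {
            // The prefix check below is only safe if account names cannot contain the separator.
            if (account.contains("-")) {
                throw new IllegalArgumentException("account names must not contain '-': " + account);
            }
        }
        this.passwords = Map.copyOf(passwords);
    }

    /** Returns the account named in a valid HTTP Basic Authorization header, or empty. */
    Optional<String> authenticate(String authorizationHeader) {
        if (authorizationHeader == null || !authorizationHeader.startsWith("Basic ")) {
            return Optional.empty();
        }
        String decoded;
        try {
            byte[] raw = Base64.getDecoder().decode(authorizationHeader.substring("Basic ".length()));
            decoded = new String(raw, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // not valid Base64
        }
        int colon = decoded.indexOf(':');
        if (colon < 0) {
            return Optional.empty();
        }
        String account = decoded.substring(0, colon);
        String expected = passwords.get(account);
        if (expected == null) {
            return Optional.empty();
        }
        // MessageDigest.isEqual compares in constant time, avoiding a timing side channel.
        boolean ok = MessageDigest.isEqual(
            expected.getBytes(StandardCharsets.UTF_8),
            decoded.substring(colon + 1).getBytes(StandardCharsets.UTF_8));
        return ok ? Optional.of(account) : Optional.empty();
    }

    /** True only if the request path (without query string) targets one of the account's own indices. */
    boolean mayAccess(String account, String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        String target = trimmed.split("/", 2)[0];
        // Top-level endpoints such as _cluster, _nodes, _cat, or _all are reserved for the operator.
        if (target.isEmpty() || target.startsWith("_")) {
            return false;
        }
        // Multi-index, wildcard, or percent-encoded targets could reach other tenants' indices.
        if (target.contains(",") || target.contains("*") || target.contains("%")) {
            return false;
        }
        return target.startsWith(account + "-");
    }
}
```

This only inspects the URL path. Endpoints such as _bulk, _mget, and _msearch can name indices in the request body, so a real proxy would also have to inspect those bodies or block the endpoints.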

Disclaimer: I am the founder of the Elasticsearch service provider Facetflow, which currently offers shared clusters.
I think that both approaches have merit, but they may be suited to different types of customers.
Looking at other SaaS providers, like MongoDB provider MongoLab, they essentially ended up offering both setups (although not using Docker).
So, pros and cons as I see them:
Shared Cluster
Most Elasticsearch as a Service providers operate this way.
Pros:
Far more affordable for the majority of users just looking for good search and analytics.
Simpler maintenance, fewer clusters for you to monitor
Potentially fewer versions of Elasticsearch to integrate with. If you need to communicate with other systems (which you do) or write your own plugins (we did, for authentication, silos, entitlements, stats, etc.), fewer versions will be far easier to maintain.
Cons:
Noisy neighbours have to be monitored and you have to scale and relocate indices to handle this.
Users have to choose from a limited list of versions of Elasticsearch, usually a single version.
Users don't get full cluster admin control.
Private Clusters using Docker
One provider that works this way is Found.
Pros:
Users could potentially be able to deploy a variety of versions of Elasticsearch
Users can have complete cluster admin access
Noisy neighbours don't affect their cluster, so less manual intervention is needed from you
Cons:
Complex monitoring and support. If people can do whatever they want (for example, shut down the cluster over the API), you have to be clear about where your responsibility as a provider ends, and what wakes you up at night.
Complex integration with multiple versions, see shared cluster pros.
More expensive since you have to allocate resources that might not always be used.

DB Server Requirements Advice

I am building a MySQL database with a web front end for a client. The client and their staff will use this webapp on a daily basis, creating anywhere from a few thousand to possibly a few hundred thousand records annually. I just picked up a second client who wants the same product and will probably create the same number of records annually, possibly more.
In the future I hope to pick up a few more clients. In the next few years I could have up to 5 databases & web front ends running for 5 distinct clients, all needing tight security while creating, likely, millions of records annually (cumulatively across all the databases).
I would like to run all of this with Amazon's EC2 service but am having difficulty deciding on what type of instance to run. I am not sure if I should have several distinct Linux instances, one per client, or run one "large" instance which would manage all the clients' databases and web front ends.
I know that hardware configuration is rather specific to the task at hand. The web front ends will be using jQuery to make MySQL queries "pretty", and I will likely be doing some graphing of data (again with jQuery). The front ends will be using SSL for security, which I understand can add some overhead to network speed.
I'm looking for some of your thoughts on this situation.
Thanks

Use the tools that are available. The Amazon RDS service lets you run a MySQL database in the cloud with no extra effort. You can scale it up and down as you need - start small, and then as you hit your limits, add extra capacity (at extra cost).
Next, use Elastic Load Balancing (ELB) with an SSL certificate, so you offload the overhead of SSL decryption to an Amazon service.
If you're using Java for your webapp, you could use Elastic Beanstalk to handle the whole hosting process for you.
Don't be afraid to experiment - you can always resize instances with no data loss (if they boot from an EBS volume) and you can always create and delete instances. Scaling horizontally is often better than scaling vertically, as you can spread your instances across multiple Availability Zones.
Good luck!
