Understanding use case for Kerberos-secured Hadoop - hadoop

I have configured Kerberos with Hadoop. I'm having difficulty mapping the Kerberos architecture and the overall authentication flow onto my application.
The following is my use case:
We have a web application that calls backend services, which communicate with the Hadoop ecosystem internally.
Now I don't have a clear idea of how the Kerberos authentication will take place, or where the tokens will be stored (client side or server side). How would the credential cache be managed when two or more users access the application and Hadoop at the same time, given that when we do kinit the old credential cache is replaced by the new one? What would be the complete flow?
Waiting for response. Thanks
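For context, the usual pattern for this use case is that the backend service authenticates to Hadoop once with its own service keytab and then impersonates each web user via Hadoop's proxy-user ("doAs") mechanism, so there is no per-user kinit and no shared credential cache to clobber. A minimal Java sketch, assuming a hypothetical service principal and keytab path, plus matching hadoop.proxyuser.* settings in core-site.xml:

```java
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class HadoopProxyUserSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // The backend service logs in once from its own keytab. This replaces any
        // interactive kinit, so no user-visible credential cache is touched.
        UserGroupInformation.loginUserFromKeytab(
                "appsvc/host.example.com@EXAMPLE.COM",  // hypothetical service principal
                "/etc/security/keytabs/appsvc.keytab"); // hypothetical keytab path

        // For each authenticated web user, create a proxy UGI and run Hadoop
        // calls as that user. Requires hadoop.proxyuser.appsvc.* in core-site.xml.
        UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
                "alice", UserGroupInformation.getLoginUser());

        FileStatus[] listing = proxyUgi.doAs(
                (PrivilegedExceptionAction<FileStatus[]>) () ->
                        FileSystem.get(conf).listStatus(new Path("/user/alice")));

        for (FileStatus status : listing) {
            System.out.println(status.getPath());
        }
    }
}
```

With this pattern the Kerberos credentials live entirely server side; the web users never hold Hadoop tokens themselves.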

Related

How to make keycloak sessions survive server restarts or upgrades?

Keycloak configuration and data is stored in a relational database, which is usually persisted to disk. This includes data like realm settings, users, group and role memberships, auth flows and so on. But user sessions are only stored in an ephemeral in-memory Infinispan cache, so the session data in this cache is lost when the Keycloak server restarts.
There are many reasons why a restart of the Keycloak server may be required: major OS upgrades, Keycloak server upgrades to new versions, applying changes to Keycloak e-mail templates, or re-scheduling Keycloak pods to other worker nodes in Kubernetes or other cloud-based environments.
How can the session data be persisted so that it survives restarts, ideally without having to maintain a custom Infinispan server or use Keycloak "offline sessions"?
One solution could be to simply use the so-called Keycloak "offline sessions", but these sessions also have significant disadvantages:
they remain valid even if the user logs out
logging out users via the Keycloak admin console is no longer possible
See: https://www.keycloak.org/docs/latest/server_admin/#_offline-access
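For completeness: a client opts into offline sessions per login by adding the offline_access scope to the standard OIDC authorization request, as the docs linked above describe. A minimal Java sketch that builds such an authorization URL (realm, client, and host names are hypothetical):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class OfflineAccessUrlSketch {
    public static void main(String[] args) {
        // Standard OIDC authorization request; including "offline_access" in the
        // scope asks Keycloak to issue an offline token for this session.
        String authUrl = "https://keycloak.example.com/realms/myrealm"
                + "/protocol/openid-connect/auth"
                + "?client_id=" + enc("my-client")  // hypothetical client
                + "&redirect_uri=" + enc("https://app.example.com/callback")
                + "&response_type=code"
                + "&scope=" + enc("openid offline_access");
        System.out.println(authUrl);
    }

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```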
Will this problem still be present when Keycloak > 17 is out and uses the all-new Quarkus distribution? I ask because the following articles claim goals like a container-first approach, zero-downtime upgrades, and a storage re-architecture.
https://www.keycloak.org/2021/10/keycloak-x-update
https://www.keycloak.org/2020/12/first-keycloak-x-release.adoc

Shared HTTP session lifetime

We have several web applications using the same identity provider (which we also manage); most of them (including the identity provider) are using .NET Core.
The requirement is that if a user is logged in to two or more applications at the same time (in one browser) and is actively using one app, the session lifetime is automatically extended in all of the applications.
So while the user is using at least one application, they don't get logged out of any of them. There is also another requirement: auto-logout after a certain period of inactivity (this part is easy, of course).
I thought of using a Redis server to manage this shared session lifetime, using a SessionId that each app would receive from the identity server via claims. Each time the user performs some action, the backend contacts Redis and checks whether the user's session is still active, extends the session lifetime if it is, and logs the user out if it's not (sketched below).
The problem is that the applications are not allowed to access this Redis server directly (for security reasons). So I thought of adding a separate web service for these apps to contact via a standard HTTP endpoint: basically just a middleman between Redis and the web apps.
Is there a better way to do it? I'm not sure how common a requirement this is.
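A minimal Java sketch of the sliding-expiration check described above, using the Jedis client (the key scheme and TTL are hypothetical):

```java
import redis.clients.jedis.Jedis;

public class SessionTouchSketch {
    private static final int SESSION_TTL_SECONDS = 20 * 60; // hypothetical 20-minute idle timeout

    // Called by the middleman service on every user action. Slides the expiration
    // and returns true if the session is still alive; false means it has expired
    // and the caller should log the user out everywhere.
    public static boolean touchSession(Jedis redis, String sessionId) {
        String key = "session:" + sessionId;               // hypothetical key scheme
        long updated = redis.expire(key, SESSION_TTL_SECONDS);
        return updated == 1;                               // EXPIRE returns 0 if the key is gone
    }

    public static void main(String[] args) {
        try (Jedis redis = new Jedis("localhost", 6379)) {
            redis.setex("session:abc123", SESSION_TTL_SECONDS, "alive"); // created at login
            System.out.println(touchSession(redis, "abc123"));          // true: still active
        }
    }
}
```

One nice property of this shape: EXPIRE both checks liveness and extends the TTL in a single atomic command, so two apps touching the same session never race a read against a write.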
Redis usually belongs in the distributed-cache category, which means it is located on another server (or server farm). That is why your application is restricted: it is not allowed to access an external server.
If your application is under development, or if it is still in its growth phase, my recommendation is to use the InMemory cache, or to consider response caching middleware.
Also, these are very small amounts of data; if that is all you are going to store in the cache, then for a start you would definitely consider InMemory.
Of course, I understand your need for Redis, and here is what a distributed cache gives you:
Is coherent (consistent) across requests to multiple servers.
Survives server restarts and app deployments, because the cache is usually in a different location (e.g. Azure).
Doesn't use local memory.
Is scalable.
etc.
For larger applications, consider replacing the InMemory cache with a proper distributed cache (e.g. Redis or SQL Server behind IDistributedCache). The programming model is similar, but the data lives in different places: the InMemory cache keeps its data in the memory of the individual server running the app, so in a web farm it requires sticky sessions, while a distributed cache is shared by all servers and does not.

Identity server communication with DB - Security concerns

I need some quick help regarding IdentityServer.
There is a client requirement that no publicly hosted application may talk directly to the database. In IdentityServer's case, IdentityServer will be hosted publicly for the token endpoint, and it queries the database for operational data (we went with the DB approach and reference tokens because IdentityServer will sit behind an NLB). Is there any workaround for this, or is this standard practice?
Thanks
If you don't want IdentityServer to talk directly to the database, you will need to implement and register custom implementations of ICorsPolicyService, IAuthorizationCodeStore, IClientStore, IConsentStore, IRefreshTokenStore, IScopeStore, and ITokenHandleStore that call off to an external app that can talk to the database.
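As a sketch of that indirection, shown here in Java for consistency with the other examples (IdentityServer itself is .NET, and the internal endpoint is hypothetical): each custom store simply forwards its lookups to an internal, non-public service that owns the database access.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a client store that never touches the database itself:
// it asks an internal, non-public data service for client definitions instead.
public class RemoteClientStore {
    private final HttpClient http = HttpClient.newHttpClient();
    private final String baseUrl;

    public RemoteClientStore(String baseUrl) {
        this.baseUrl = baseUrl; // e.g. "http://internal-ids-data/api" (hypothetical)
    }

    public String findClientById(String clientId) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create(baseUrl + "/clients/" + clientId)).GET().build();
        HttpResponse<String> response =
                http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() == 404) {
            return null; // unknown client
        }
        return response.body(); // JSON client definition, parsed by the caller
    }
}
```

The cost is an extra network hop on every token operation, which is part of why the next paragraph questions the merit of the separation.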
Normal operating procedure is to have IdentityServer talk directly to the database. I don't see much merit in separating the two.
FYI: You don't need to use reference tokens if you are using load balancing. Check out the deployment docs.

Confused by Ranger architecture

After spending a whole day setting up and studying Hortonworks' Ranger, I'm now barely able to use it, but I'm still very confused by its structure. I'm listing my questions below:
What's the relationship between Ranger and Knox? Why does Hortonworks provide two solutions for the same role? If I want to apply them to my Hadoop cluster, what's the best practice?
Why do I have to use UserSync? In other words, Ranger Admin has the ability to talk to LDAP/AD to get users, so why does it still need UserSync? And if UserSync also talks to LDAP/AD (or to a different LDAP server), what would happen? Would it impact Ranger Admin's own LDAP/AD connection?
A similar question applies to the plugins' audit connection: since Ranger Admin has an audit connection, why does each plugin need its own connection to the audit database? Why don't they just push audit information to the Admin and let the Admin decide where to store it? And if they (Admin and plugin) talk to different databases, what happens?
I think I can briefly answer Q1
What's the relationship between Ranger and Knox? Why does Hortonworks provide two solutions for the same role? If I want to apply them to my Hadoop cluster, what's the best practice?
They serve different purposes. Ranger gives you fine-grained ACL control, while Knox is a proxy server (gateway) that provides a centralized web-service security layer. That is, with Ranger you have a central place (a UI) to manage ACLs for Hadoop stack services, e.g. who can access a given table in Hive. With Knox, you can keep all your Hadoop services inside a private network, speaking plain unsecured HTTP, while the Knox server runs on a gateway node that the outside can reach, with HTTPS enabled. It gives users a central HTTP/HTTPS entry point for accessing the web services, and it supports user login (which some Hadoop stack services, e.g. Hadoop itself, don't support yet).
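To make the Knox half concrete, here is a sketch of calling WebHDFS through a Knox gateway from Java. The host, topology name ("default"), and credentials are hypothetical; /gateway/{topology}/webhdfs/v1 is Knox's usual URL layout:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class KnoxWebHdfsSketch {
    public static void main(String[] args) throws Exception {
        // One central HTTPS entry point; the WebHDFS service behind it stays on
        // the private network and is never exposed directly.
        String url = "https://knox.example.com:8443/gateway/default"
                + "/webhdfs/v1/tmp?op=LISTSTATUS";

        String basicAuth = Base64.getEncoder()
                .encodeToString("guest:guest-password".getBytes()); // hypothetical credentials

        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Basic " + basicAuth)
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON directory listing from WebHDFS
    }
}
```

Ranger would then decide, via its HDFS plugin's policies, whether the logged-in user is actually allowed to list that directory.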

Session-based authentication for a RESTful webservice in Glassfish

The Problem
I'm creating an application that runs on Glassfish 3.1.2.2 and exposes a RESTful API in an environment where authentication is expensive. Passing the credentials and authenticating every request isn't feasible - I need a session-based approach.
I'm looking at using BASIC auth and in-memory session replication (the app needs to support deployment in a cluster). In-memory replication seems expensive considering I'm only sharing "logged in" state; every other component of the application is stateless.
My question is: Is there a better alternative?
e.g. Can I configure Glassfish to persist session state to a database, instead of using in-memory replication?
I've considered a "session as a resource" approach, e.g. POST to /session to log in and DELETE /session/{id} to log out. This provides more control; however, it's more difficult for service consumers (versus something like BASIC auth).
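For reference, a minimal JAX-RS sketch of the "session as a resource" approach; the credential check is stubbed out, and the static in-memory map stands in for whatever shared store (database, replicated cache) a clustered deployment would actually need:

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import javax.ws.rs.DELETE;
import javax.ws.rs.FormParam;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

@Path("/session")
public class SessionResource {
    // In a cluster this map would live in a shared store; static is sketch-only.
    private static final ConcurrentHashMap<String, String> SESSIONS = new ConcurrentHashMap<>();

    @POST
    public Response login(@FormParam("username") String username,
                          @FormParam("password") String password) {
        if (!checkCredentials(username, password)) {   // the expensive authentication, done once
            return Response.status(Response.Status.UNAUTHORIZED).build();
        }
        String sessionId = UUID.randomUUID().toString();
        SESSIONS.put(sessionId, username);
        return Response.ok(sessionId).build();         // client presents this on later requests
    }

    @DELETE
    @Path("/{id}")
    public Response logout(@PathParam("id") String sessionId) {
        SESSIONS.remove(sessionId);
        return Response.noContent().build();
    }

    private boolean checkCredentials(String username, String password) {
        return username != null && password != null;   // stub for the sketch
    }
}
```

Persisting the session map to a database instead of replicating it in memory gives exactly the "session state in a database" behaviour asked about, at the cost of a lookup per request.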
