Confused by Ranger architecture - hadoop

After spending a whole day setting up and studying Hortonworks' Ranger, I can just about use it now, but I'm still very confused by its structure. My questions are listed below:
1. What's the relationship between Ranger and Knox, and why does Hortonworks provide two solutions for the same role? If I want to apply them to my Hadoop cluster, what's the best practice?
2. Why do I have to use UserSync? In other words, Ranger-Admin can already talk to LDAP/AD to get users, so why does it still need UserSync? And if UserSync also talks to LDAP/AD (or to a different LDAP server), what happens? Will it affect Ranger-Admin's own LDAP/AD connection?
3. A similar question about the plugins' audit connection: since Ranger-Admin has an audit connection, why does each plugin need its own connection to the audit database? Why don't the plugins just push audit information to Admin and let Admin decide where to store it? What happens if Admin and a plugin talk to different databases?

I think I can briefly answer Q1:
What's the relationship between Ranger and Knox, and why does Hortonworks provide two solutions for the same role? If I want to apply them to my Hadoop cluster, what's the best practice?
They serve different purposes. Ranger gives you fine-grained ACL control; Knox is a proxy server (gateway) that provides a centralized security layer for web services. In other words, with Ranger you have a central place (a UI) to manage ACLs for Hadoop stack services, e.g. who can access a table in Hive. With Knox, you can keep all your Hadoop services on a private network using unsecured HTTP, and run the Knox server on a gateway node that is reachable from outside and has HTTPS enabled. This gives users a central HTTP/HTTPS entry point to the web services, with support for user login (which some Hadoop stack services, e.g. Hadoop itself, don't support yet).
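To make the gateway idea concrete, here is a minimal sketch of how one WebHDFS call is addressed directly versus through Knox. The host names, the topology name "default", and the ports (50070 for the NameNode, 8443 for Knox) are assumptions for illustration; check your own cluster's settings. The `/gateway/{topology}/webhdfs/v1/...` layout follows the URL pattern described in the Knox documentation.

```python
from urllib.parse import quote, urlencode


def direct_webhdfs_url(namenode_host, hdfs_path, op="LISTSTATUS", port=50070):
    """URL for calling WebHDFS directly on the private network (plain HTTP)."""
    path = quote(hdfs_path.lstrip("/"))
    return f"http://{namenode_host}:{port}/webhdfs/v1/{path}?{urlencode({'op': op})}"


def knox_webhdfs_url(gateway_host, topology, hdfs_path, op="LISTSTATUS", port=8443):
    """URL for the same call routed through the Knox gateway (HTTPS)."""
    path = quote(hdfs_path.lstrip("/"))
    return (
        f"https://{gateway_host}:{port}/gateway/{topology}"
        f"/webhdfs/v1/{path}?{urlencode({'op': op})}"
    )


# Hypothetical hosts: the NameNode is only reachable inside the cluster,
# while the Knox gateway node is the single public entry point.
internal = direct_webhdfs_url("namenode.internal", "/tmp/data")
external = knox_webhdfs_url("gw.example.com", "default", "/tmp/data")
```

Clients outside the cluster would only ever use the second form; whether a given user may actually list `/tmp/data` is still decided by the ACLs (for example, Ranger policies) behind the gateway.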

Related

Apache NiFi: is it applicable to this use case?

Can this use case be implemented using NiFi?
I want to develop a connector between two SaaS applications to transfer data from system A to system B. Each application is multi-tenant. The connector works as follows:
The user enters the authorization information for both systems in a form.
Once authenticated, the data moves to the other system on a scheduled basis.
Not all users of the SaaS will use this, only a group of them.
Data belonging to each user must not overlap with other users' data.
Regards,

Akka, AMI - discover remote actors for database access

I am working on a prototype, using Akka, for a client where AWS auto-scaling is used to create new VMs from Amazon Machine Images (AMIs).
I want to have just one actor control access to the database, so it will create new children as needed and queue up requests that go beyond a set limit.
But, I don't know the IP address of the VM, as it may change as Amazon adds/removes VMs based on activity.
How can I discover the actor that will be used to limit access to the database?
I am not certain whether clustering will work (http://doc.akka.io/docs/akka/2.4/scala/cluster-usage.html). The question "Akka remote actor server discovery" and its answers are from 2011. Possibly routing may solve this problem: http://doc.akka.io/docs/akka/2.4.16/scala/routing.html
I have a separate REST service that just goes to the database, so it may be that this service will need to do the control before it goes to the actors.

Authenticating GeoServer REST calls with certificates?

To meet security requirements, our project needs to move our GeoServer credentials (account/password) out of the code base. Is it possible to authenticate REST calls with certificates, or with any other method besides account/password credentials?
The answer is 'yes'!
Exactly how requires working out a few details. If all users are required to provide a certificate, you'll likely want to sort that out at the container level (Tomcat, Wildfly, etc).
Once GeoServer has a certificate, you'll likely want to set up a role service to map users to roles.
The docs for GeoServer's security system are great. I've read them multiple times, and I'd strongly encourage checking them out: http://docs.geoserver.org/latest/en/user/security/index.html#security
Since you mentioned certificates, I'd suggest reading this tutorial: http://docs.geoserver.org/latest/en/user/security/tutorials/cert/index.html.
Since you mentioned securing REST endpoints, I'd point out http://docs.geoserver.org/latest/en/user/security/rest.html. I believe some of that configuration can be done through the GeoServer admin UI.
As a note, GeoServer is highly modular; you may need to install a module or two to connect to an LDAP server, modify the security settings, etc.
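On the client side, a certificate-authenticated REST call could look roughly like the sketch below. It uses only the Python standard library and assumes the container and GeoServer are already configured for X.509 client certificates as described in the tutorial above. The base URL and the certificate, key, and CA file paths are placeholders, not values from any real deployment.

```python
import ssl
import urllib.request


def build_client_cert_context(cert_file, key_file, ca_file=None):
    """Create a TLS context that presents a client certificate to the server."""
    context = ssl.create_default_context(cafile=ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def workspaces_request(base_url):
    """Build a GET request for the workspace list; no credentials in the code."""
    url = base_url.rstrip("/") + "/rest/workspaces.json"
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def list_workspaces(base_url, context):
    """Send the request over TLS; the certificate identifies the caller."""
    request = workspaces_request(base_url)
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8")


# Hypothetical usage (paths and host are placeholders):
# ctx = build_client_cert_context("client.pem", "client.key", "ca.pem")
# print(list_workspaces("https://geoserver.example.org/geoserver", ctx))
```

The certificate and key live outside the code base (on disk or in a secret store), which is the point of the original requirement; which roles the certificate's user receives is still decided by the role service on the GeoServer side.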

Understanding the use case for Kerberos-secured Hadoop

I have configured Kerberos with Hadoop. I'm having difficulty mapping the Kerberos architecture and the whole authentication flow onto my application.
Following is my use case:
We have a web application that calls backend services, which communicate with the Hadoop ecosystem internally.
I don't have a clear idea of how the Kerberos authentication will take place or where the tokens will be stored, i.e. client-side or server-side. How would the credential cache be managed when two or more users use the application and access Hadoop, given that running kinit replaces the old credential cache with the new one? What would the complete flow be?
Waiting for response. Thanks

What is the business benefit for Oracle Weblogic Server over OC4J?

Apart from technology support, what are the business benefits of Oracle Weblogic Server, for example in the areas of security, support, etc.?
What new features does Weblogic support?
TL;DR:
Support is great when you open a ticket with Oracle Support (strictly for Weblogic).
Great admin/read-only user implementation. We authenticate against Windows Active Directory. Developers get read-only accounts, which reduces the time they spend waiting for ops to transfer logs and validate settings.
The Dashboard is useful out of the box for real-time monitoring without additional tools or installs. It is easily accessed by anyone who is authorized to log in. We could give it to our CIO in about 3 minutes, if he wanted it, by adding him to the right authorized group in AD.
Easier to clone environments.
I haven't worked with OC4J but I believe Oracle's roadmap is picking Weblogic as their preferred Java application server. You can see it is the base technology for some of their other products, such as Oracle Service Bus, Oracle Enterprise Manager (OEM), and Oracle Line Planning.
I have opened 3 Oracle tickets in the past month, and I was surprised at how fast they answered. For a Severity 3 ticket (medium), they have usually responded in 2-3 days. I can't say the same for their other services (over 2 weeks for a ticket on OEM).
Security is a pretty broad scope... so you'd have to be a little more specific on some of the topics of security.
One thing that is pretty awesome is the Dashboard: http://docs.oracle.com/cd/E14571_01/web.1111/e13714/dashboard.htm. You can add read-only monitor accounts so other users can get insight into the performance. We add developers to this so that they can validate any settings, or see performance whenever there is a production issue.
We used Microsoft Active Directory authentication in our Weblogic domains. People do not use the default weblogic administrator user, so configuration changes are audited. When someone's account is disabled because they leave the company, their access to Weblogic is disabled as well. You don't have to change the password.
Another useful setting I like is the ability to automatically archive config changes. Each time someone makes a config change, a backup is automatically created. This allows me to go and fix something when developers break their environment, without having to reverse-engineer what they did.
I also like the fact that you can pack and unpack the domains. I've used it to move entire domains from staging to production with some minor changes... i.e. change all stg to prod variables. This should likewise make it easier to 'clone' environments when you want to build out a new one.
Although not related, I should mention Oracle Enterprise Manager. We are an Oracle shop because they seem to have given us a good deal on licencing. So we get to run Oracle Enterprise Manager, a tool that is slowly becoming more and more useful. The agent also reports how our RedHat Linux hosts are behaving: network input/output, CPU utilization, memory utilization, Java heap stacks. We are going to move to defining groups within it that contain all the targets related to an application stack. This will give our operations team the insight to see where the bottleneck might be: the Oracle Weblogic web layer, the network, Oracle Service Bus, or Oracle Database performance.
Supposedly, you can add JBoss and other JMX monitoring to OEM as well. It's on our to-do list for our non-Weblogic instances. We're slowly rolling OEM out.

Resources