My team has data stored on ElasticSearch and have given me an API key, the URL of a remote cluster, and a username/password combination (to what I dont know) to GET data.
How do I use this API key to get data from the ElasticSearch cluster with Python? I've looked through the docs, but none include the use of a raw API key and most involve localhost, not a remote host in my case.
Surely I need to know the names of nodes or indexes at least? For what would I need the username/password combo for? There must be more details I need to connect with than what I've been given?
We're moving from Node.js+couchbase work to ElasticSearch+Python so I'm more than a bit lost.
TYIA
Most probably x-pack basic security is enabled in your Elasticsearch(ES) cluster, which you can check by hitting http::9200, if it ask for username/password then you can provide what you have.
Please refer x-pack page for more info.
In short, its used to secure your cluster and indices and there are various types of authentication and basic auth(which requires username/password) is the one your team might be using.
Related
Hope you are doing well !
We have already developed ETL pipeline using apache NiFi. Which gets trigger only when client uploads source data file from portal.After that, the data present inside source file goes through various layers,gets transformed and stored back to warehouse(i.e. hive).
Goal : To identify sensitive information and mask it so that end user won't see actual data.
Identify Sensitive data & masking strategy : We will make use of open source tool to achieve this goal as follow.
Data steward studio : This tool allow me to identify sensitive information and tag it properly.
Apache Atlas : Once data steward user has confirmed the tag then that tag will be pushed into Apache atlas.
Apache ranger : At the final, we can define tag based-masking policy using Apache ranger which will allow or deny to specific user.
For more details on above solution , please visit link.
https://www.youtube.com/watch?v=RzEfLwJaLsc
Problem : In order to feed the data to DSS tool, it should be loaded first in hive table. That is fine. But we cannot stop the existing ETL flow in-between and then start identification process of sensitive information. The above solution must require some manual process which i want to get rid of and make it automated.that is, it should be plugged in somewhere within NiFi pipeline.But so far, as per my understanding DSS do not allow us to do something like that.
Manual Process :
Create Asset collection
Accept/Reject suggested tags within DSS.
If we cannot plug identification process in pipeline, then client sensitive data will be exposed to everyone and visible to everyone in team. I want something where we can de-identify sensitive data before it actually get loaded into HDFS or hive tables.
Please write your response to me on the same problem, if anyone has already worked into this particular area.
I did not test it, but here are my thoughts on this challenge.
Set up the system such that data is NOT visible to everyone(or anyone) by default
Load the data into hive
Let the profilers run and accept its suggestions
Open up the data to those who should have access (except for the things found by the profiler)
There are still some implementation details to work out (e.g. How to automate step 3/4 and whether you can just solve this with tags or whether the data needs to sit in a staging area first). But I hope this steers you in a good direction.
One idea might be to use EncryptContent of nifi (https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.EncryptContent/). Then the values loaded into Hive will be encrypted in the first place and would not be visible to the stewards. Once the tagging has been done - then in the subsequent part of the pipeline (where I'm assuming you're using nifi as well) - you can decrypt back content as required.
Is there any way to Insert Read-Only document or Key-Value pair in Couchbase using couchbase Go SDK?
There isn't a way to do this at the document level (yet), but one possible workaround with Couchbase Server Enterprise is bucket level permission. You could create a bucket (e.g. "myreadonly") and create a user (e.g "myreadonlyuser") that only has data reader permission. Of course, someone will need write access to put the document in there in the first place, but anyone using the "myreadonlyuser" credentials can only read.
There might be a way to do it in the upcoming "scope" and "collection" levels too, but it would likely be a variation of the above approach. Document-level authentication may be on the roadmap for the future.
I'm looking into ways of replicating databases from On-Premise environments to Azure and one of the options I found was setting up a Read-Scale availability group.
The reason I'm using a Read-Scale and not an Always On availability group, is because I don't won't to use SQL Server Enterprise edition due to the cost.
I followed a tutorial from Microsoft (MS TUTORIAL) to set this all up and in the end, I think I got it working as my database appeared on the Azure environment.
However, the problem is that my replica always stays in the Synchronizing state - which is probably due to the fact I chose Asynchronous Replication by using the AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT command - but even worse is that I can't access the database itself.
Each time I try to fire a query against it, it comes back with an Object is not accessible exception.
After some reading, I found that the cause of this might be because my replica doesn't have a secondary role. Trying to set this via the ... SECONDARY_ROLE({ALLOW_CONNECTIONS = ALL})... command, clearly states that this feature is not available in the Standard version of SQL Server.
My whole confusion comes from the fact that on the Microsoft documentation (MS DOCS), it says that With availability groups, one or more secondary replicas can be configured to support read-only access to secondary databases. which is exactly what I'm not succeeding in.
Did anybody have the same issue, or knows how to configure the Read-Scale availability group on SQL Server Standard so my second replica is accessible and readable as well?
P.S. I did look at the actual SQL Replication with Transaction Replication, but there are quite a bit of moving parts there, so I'm exploring all options before making a decision.
Based on a twitter conversation I found out that you will need to create a snapshot of the database in Secondary Replica in order to read from there.
Please read this tweeter thread.
I also added a suggestion in the feedback channel to fix the documentation.
I'm currently working on a POC with Couchbase, using Spring Data to put & get documents on/off a bucket on a cluster.
As I'm working in a big company, I'm lucky they gave me a bucket, but still I don't have the admin rights on the cluster, so I only have access to the bucket.
But as I'm digging into the Spring Data documentation, I'm not able to find a way to retrieve documents without creating views on the server. (I'm getting errors like "Unknown query param" ). Nevertheless with couchbase java sdk i'm able to, through n1ql queries, but the use of the Spring data layer is mandatory.
The answers I found always point me to the server-side function direction
ex : https://stackoverflow.com/a/30928169/3744307
What I would like to find, is a way to add a repository method like
List findReceiptByAccount(String Account)
without having to specificly declare the function server-side.
Is this possible, or have I to send a request to the administrators to create functions for me everytime I have to add a findByX method?
Thanks for your time,
What version of CB is it ?
I think that prior to 4.5, a n1ql access (which you seems to have) is enough to build your index yourself !
With Spring Data Couchbase 2.x that would use a N1QL index in the background, and it would work with a single primary index (although having 1 index per repository entity class would be best for performance). Maybe you can ask your admin to create that index once?
I am going to use logstash+ES+kibana for my project. I want to know how to use this framework for multi tenants. Can any one explain me how after the authentication Kibana query the elastic search index and load in Kibana's dashboard? Can I restrict kibana to look for a specifix index of Elastic search for a particular user or some-id? Anybody has tried this?
Thnx
You could, but depending on your use case it is probably not a good idea. There are a few gotchas, particularly regarding security and separating the users. First Kibana is just javascript running in the browser. So whatever Kibana is allowed to do so is your user. You can however have a separate index pattern for each "user", but elastic search does not provide you any ways of authenticating a users or authorizing a user access to a specific index. You would have to use some sort of proxy for this.
I recommend http://www.found.no/foundation/elasticsearch-in-production/ and http://www.found.no/foundation/elasticsearch-security/ for a more in depth explanation.
Create an index for each tenant.
In this way you can use a proxy (like the app the hosts kibana) to intercept the request and return a settings that includes the index to use.
The value that specifies the index to use can be the logged in user or you can get that value somewhere else.
To separate even more the data, you can use a prefix in each index name, and then when you specify an index you can use a pattern to take all the index related to only certain kind of data/entities.
Hope this help.
Elasticsearch announced today a plugin they are working on that should provide security features to ES product. Probably, this will contain ways of restricting access based on roles and users setup at cluster and indices level. If this happens I see no way for them not to extend this security layer to Kibana, as well. Also, it seems this plugin will have a commercial version only.