Issue creating/accessing hive external table with s3 location from spark thrift service - hadoop

I have configured the S3 keys (access key and secret key) in a JCEKS file using the Hadoop credential API. The commands used for the same are below:
hadoop credential create fs.s3a.access.key -provider jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks
hadoop credential create fs.s3a.secret.key -provider jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks
Then, I am opening a connection to Spark Thrift Server using beeline and passing the jceks file path in the connection string as below:
beeline -u "jdbc:hive2://hostname:10001/;principal=hive/_HOST#?hadoop.security.credential.provider.path=jceks://hdfs@nn_hostname/tmp/s3creds_test.jceks"
Now, when I try to create an external table with the location in s3, it fails with the below exception:
CREATE EXTERNAL TABLE IF NOT EXISTS test_table_on_s3 (col1 String, col2 String) row format delimited fields terminated by ',' LOCATION 's3a://bucket_name/kalmesh/';
Exception: Error: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://bucket_name/kalmesh: getFileStatus on s3a://bucket_name/kalmesh: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: request_id), S3 Extended Request ID: extended_request_id=) (state=,code=0)

I don't think JCEKS support for the fs.s3a secrets went in until Hadoop 2.8; it's hard to tell from the source. If that is the case, and you are using Hadoop 2.7, then the secret isn't going to be picked up. Afraid you will have to put it in the config.
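If you do have to fall back to the config route, a minimal sketch of the relevant core-site.xml entries would be the two standard s3a properties below (the values are obviously placeholders, and keeping them in a world-readable file defeats the point of the credential provider):
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>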

I had a similar situation, just with Drill instead of Hive. But like in your case:
using Hadoop 2.9 jars (1st version to support AWS KMS)
writing to s3a://
encrypting with SSE-KMS
... and got AmazonS3Exception: Access Denied.
In my case (and perhaps in yours as well) the exception description was a bit ambiguous. The reported AmazonS3Exception: Access Denied did not originate from S3 but from KMS! Access was denied to the key I used for encryption. The user making the API calls was not on the key's users list; once I added that user to the key's list, writing started to work and I could create encrypted tables on s3a://...
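If you suspect the same thing, one way to check (assuming you have the AWS CLI and read access to the key policy; the key ID is a placeholder) is to look at who is allowed to use the key:
aws kms get-key-policy --key-id <your-kms-key-id> --policy-name default --output text
aws kms list-grants --key-id <your-kms-key-id>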

For me the following s3 permissions were required:
s3:ListBucket
s3:GetObject
s3:PutObject
I was receiving the same error and was missing s3:ListBucket.
As for KMS permissions (if applicable), these were needed (an example policy combining both sets is sketched after this list):
kms:Decrypt
kms:Encrypt
kms:GenerateDataKey
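For illustration only, a minimal IAM policy sketch combining those actions might look like the following (bucket name and key ARN are placeholders; scope the resources as tightly as your setup allows):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey"],
      "Resource": "arn:aws:kms:us-east-1:123456789012:key/your-key-id"
    }
  ]
}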

Related

import plugin throws error 400 saying InvalidParameterValue: The specified KMS key is not accessible

Two days back everything was working, but now it has started giving this error. I am able to reproduce the same error in a dev environment. For testing, I created an S3 bucket without encryption and a new KMS key, but I am getting the same error there.
aws ec2 import-image --description "123" --encrypted --kms-key-id arn:aws:kms:us-east-1:123456789:key/abc-efg-hij-klm-nop-xyz --disk-containers Format=ova,UserBucket="{S3Bucket=,S3Key=}"
An error occurred (InvalidParameterValue) when calling the ImportImage operation: The specified KMS key is not accessible. If this is a default EBS CMK, please retry your request without specifying the key explicitly
Any help?

How to run portworx backup to minio server

Trying to configure Portworx volume backups (pxctl cloudsnap) to a localhost Minio server (emulating S3).
The first step is to create cloud credentials using pxctl credentials create,
e.g.
./pxctl credentials create --provider s3 --s3-access-key mybadaccesskey --s3-secret-key mybadsecretkey --s3-region local --s3-endpoint 10.0.0.1:9000
This results in:
Error configuring cloud provider.Make sure the credentials are correct: RequestError: send request failed caused by: Get https://10.0.0.1:9000/: EOF
Disabling SSL (which is not configured, as this is just a localhost test) gives me:
./pxctl credentials create --provider s3 --s3-access-key mybadaccesskey --s3-secret-key mybadsecretkey --s3-region local --s3-endpoint 10.0.0.1:9000 --s3-disable-ssl
Which returns:
Not authenticated with the secrets endpoint
I've tried this with both minio gateway (nas) and minio server - same result.
The Portworx container is running within Rancher.
Any thoughts appreciated.
Resolved via the instructions at https://docs.portworx.com/secrets/portworx-with-kvdb.html,
i.e. set the secret type to kvdb in /etc/pwx/config.json:
"secret": {
"cluster_secret_key": "",
"secret_type": "kvdb"
},
Then log in using ./pxctl secrets kvdb login
After this, credentials create was successful, as was the subsequent cloudsnap backup. The test was using the --s3-disable-ssl switch.
Note - kvdb stores secrets in plain text, so it is obviously not suitable for production.
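For reference, a rough sketch of the working sequence described above (access/secret keys, endpoint and volume name are placeholders, and the cloudsnap syntax may differ between Portworx versions):
./pxctl secrets kvdb login
./pxctl credentials create --provider s3 --s3-access-key myaccesskey --s3-secret-key mysecretkey --s3-region local --s3-endpoint 10.0.0.1:9000 --s3-disable-ssl
./pxctl cloudsnap backup myvolume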

Hadoop Credentials using Password File

I was going through the documentation of Hadoop credentials as provided in
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html
But while using the 3rd option, to provide the password for the keystore using a password file, I am getting a failure every time. An excerpt of the command used is provided below. Can anyone tell me what the error is and how to rectify it?
hadoop credential -Dhadoop.security.credstore.java-keystore-provider.password-file=/home/dir/test.txt create mssql2.password -value 'SomePassword' -provider localjceks://file/home/dir/aws3.jceks
The error is provided below:
java.io.IOException: Password file does not exist
at org.apache.hadoop.security.ProviderUtils.locatePassword(ProviderUtils.java:135)
at org.apache.hadoop.security.alias.AbstractJavaKeyStoreProvider.locateKeystore(AbstractJavaKeyStoreProvider.java:323)
at org.apache.hadoop.security.alias.AbstractJavaKeyStoreProvider.<init>(AbstractJavaKeyStoreProvider.java:86)
at org.apache.hadoop.security.alias.LocalJavaKeyStoreProvider.<init>(LocalJavaKeyStoreProvider.java:58)
at org.apache.hadoop.security.alias.LocalJavaKeyStoreProvider.<init>(LocalJavaKeyStoreProvider.java:50)
at org.apache.hadoop.security.alias.LocalJavaKeyStoreProvider$Factory.createProvider(LocalJavaKeyStoreProvider.java:177)
at org.apache.hadoop.security.alias.CredentialProviderFactory.getProviders(CredentialProviderFactory.java:58)
at org.apache.hadoop.security.alias.CredentialShell$Command.getCredentialProvider(CredentialShell.java:181)
at org.apache.hadoop.security.alias.CredentialShell$CreateCommand.validate(CredentialShell.java:345)
at org.apache.hadoop.security.alias.CredentialShell.run(CredentialShell.java:81)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.security.alias.CredentialShell.main(CredentialShell.java:460)
The problem is that this property expects the file name, not the file path; the Hadoop API then searches for that name on the Hadoop classpath. Since this file contains the password in clear text, another way is to just
export HADOOP_CREDSTORE_PASSWORD=${PASSWORD}
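If you do want to keep the password-file approach, a sketch of the corrected usage, assuming test.txt has been copied to a directory that is already on the Hadoop classpath (e.g. the Hadoop conf directory), is to pass just the file name:
hadoop credential \
  -Dhadoop.security.credstore.java-keystore-provider.password-file=test.txt \
  create mssql2.password -value 'SomePassword' \
  -provider localjceks://file/home/dir/aws3.jceks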

Hdfs to s3 Distcp - Access Keys

For copying a file from HDFS to an S3 bucket I used the command
hadoop distcp -Dfs.s3a.access.key=ACCESS_KEY_HERE \
  -Dfs.s3a.secret.key=SECRET_KEY_HERE /path/in/hdfs s3a://BUCKET_NAME
But the access key and secret key are visible here, which is not secure.
Is there any method to provide the credentials from a file?
I don't want to edit the config file, which is one of the methods I came across.
I also faced the same situation, and resolved it after getting temporary credentials from the instance metadata. (In case you're using an IAM user's credentials, please note that the temporary credentials mentioned here belong to an IAM role, which is attached to the EC2 server, not to a human; refer to http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html)
I found that only specifying the credentials in the hadoop distcp command will not work.
You also have to specify the config fs.s3a.aws.credentials.provider (refer to http://hortonworks.github.io/hdp-aws/s3-security/index.html#using-temporary-session-credentials).
The final command will look like the below:
hadoop distcp -Dfs.s3a.aws.credentials.provider="org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider" -Dfs.s3a.access.key="{AccessKeyId}" -Dfs.s3a.secret.key="{SecretAccessKey}" -Dfs.s3a.session.token="{SessionToken}" s3a://bucket/prefix/file /path/on/hdfs
Recent (2.8+) versions let you hide your credentials in a jceks file; there's some documentation on the Hadoop S3 page. That way: no need to put any secrets on the command line at all; you just share them across the cluster and then, in the distcp command, set hadoop.security.credential.provider.path to the path, like jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks.
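Putting that together, such a distcp invocation would look roughly like this (namenode host, jceks path and the source/target paths are the placeholders from above):
hadoop distcp \
  -Dhadoop.security.credential.provider.path=jceks://hdfs@nn1.example.com:9001/user/backup/s3.jceks \
  /path/in/hdfs s3a://bucket/prefix/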
Fan: if you are running in EC2, the IAM role credentials should be automatically picked up from the default chain of credential providers: after looking for the config options and env vars, it tries a GET of the EC2 HTTP endpoint which serves up the session credentials. If that's not happening, make sure that com.amazonaws.auth.InstanceProfileCredentialsProvider is on the list of credential providers. It's a bit slower than the others (and can get throttled), so it's best to put it near the end.
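As an illustration only (the exact provider class names depend on your hadoop-aws and AWS SDK versions), such a provider list could be passed like this, with the instance-profile provider near the end:
hadoop distcp \
  -Dfs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,com.amazonaws.auth.EnvironmentVariableCredentialsProvider,com.amazonaws.auth.InstanceProfileCredentialsProvider \
  /path/in/hdfs s3a://bucket/prefix/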
Amazon allows you to generate temporary credentials that you can retrieve from http://169.254.169.254/latest/meta-data/iam/security-credentials/
You can read there:
An application on the instance retrieves the security credentials provided by the role from the instance metadata item iam/security-credentials/role-name. The application is granted the permissions for the actions and resources that you've defined for the role through the security credentials associated with the role. These security credentials are temporary and we rotate them automatically. We make new credentials available at least five minutes prior to the expiration of the old credentials.
The following command retrieves the security credentials for an IAM role named s3access.
$ curl http://169.254.169.254/latest/meta-data/iam/security-credentials/s3access
The following is example output.
{
  "Code" : "Success",
  "LastUpdated" : "2012-04-26T16:39:16Z",
  "Type" : "AWS-HMAC",
  "AccessKeyId" : "AKIAIOSFODNN7EXAMPLE",
  "SecretAccessKey" : "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  "Token" : "token",
  "Expiration" : "2012-04-27T22:39:16Z"
}
For applications, AWS CLI, and Tools for Windows PowerShell commands that run on the instance, you do not have to explicitly get the temporary security credentials — the AWS SDKs, AWS CLI, and Tools for Windows PowerShell automatically get the credentials from the EC2 instance metadata service and use them. To make a call outside of the instance using temporary security credentials (for example, to test IAM policies), you must provide the access key, secret key, and the session token. For more information, see Using Temporary Security Credentials to Request Access to AWS Resources in the IAM User Guide.
If you do not want to use the access and secret keys (or show them in your scripts), and your EC2 instance has access to S3, then you can use the instance credentials:
hadoop distcp \
-Dfs.s3a.aws.credentials.provider="com.amazonaws.auth.InstanceProfileCredentialsProvider" \
/hdfs_folder/myfolder \
s3a://bucket/myfolder
Not sure if it is because of a version difference, but to use "secrets from credential providers" the -Dfs form would not work for me; I had to use the -D flag as shown in the Hadoop version 3.1.3 "Using_secrets_from_credential_providers" docs.
First I saved my AWS S3 credentials in a Java Cryptography Extension KeyStore (JCEKS) file.
hadoop credential create fs.s3a.access.key \
-provider jceks://hdfs/user/$USER/s3.jceks \
-value <my_AWS_ACCESS_KEY>
hadoop credential create fs.s3a.secret.key \
-provider jceks://hdfs/user/$USER/s3.jceks \
-value <my_AWS_SECRET_KEY>
Then the following distcp command format worked for me.
hadoop distcp \
-D hadoop.security.credential.provider.path=jceks://hdfs/user/$USER/s3.jceks \
/hdfs_folder/myfolder \
s3a://bucket/myfolder

storage blob upload command failed after setting AZURE_STORAGE_CONNECTION_STRING

I'm trying to upload a file to my Azure storage. I did
$ set AZURE_STORAGE_CONNECTION_STRING=DefaultEndpointsProtocol=https;AccountName=**;AccountKey=**
but when I did
$ azure storage blob upload PATHFILE mycontainer data/book_270.pdf
then I got the following error:
info: Executing command storage blob upload
error: Please set the storage account parameters or one of the following two environment variables to use the storage command.
AZURE_STORAGE_CONNECTION_STRING
AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY
info: Error information has been recorded to /Users/uqtngu83/.azure/azure.err
error: storage blob upload command failed
But I already set AZURE_STORAGE_CONNECTION_STRING! Please help
As suggested in the comments, you are supposed to run the following in your Mac terminal (replace DefaultBlaBlaBla with your Azure Storage connection string); on macOS, set only creates a shell variable, while export makes it visible to the azure CLI.
export AZURE_STORAGE_CONNECTION_STRING="DefaultBlaBlaBla"
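A minimal sketch of the full sequence, using the classic azure CLI from the question (the connection string and local file path are placeholders):
export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=youraccount;AccountKey=yourkey"
azure storage blob upload ./book_270.pdf mycontainer data/book_270.pdf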
