Access Always Encrypted data from Databricks - jdbc

I have a table in an Azure SQL Managed Instance with 'Always Encrypted' columns. I stored the column encryption and column master keys in Azure Key Vault.
My first question is: how do I access the decrypted data in Azure SQL from Databricks? To connect, I used JDBC, passing my username and password manually:
val jdbcHostname = "XXXXXXXXXXX.database.windows.net"
val jdbcPort = 1433
val jdbcDatabase = "ABCD"
val jdbcUsername = "<username>" // supplied manually
val jdbcPassword = "<password>" // supplied manually
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"
// Create a Properties() object to hold the parameters.
import java.util.Properties
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
connectionProperties.setProperty("Driver", driverClass)
import java.sql.DriverManager
val connection = DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
connection.isClosed()
val user = spark.read.jdbc(jdbcUrl, "dbo.bp_mp_user_test", connectionProperties)
display(user)
When I do this I am able to display the data, but it is still encrypted. How do I see the decrypted data?
I am new to the Azure and Databricks combination, so I am still learning the Azure/Microsoft stack. Is there another form of JDBC connection syntax that allows decryption?
I have the keys in Azure Key Vault. How do I make use of those keys and the security associated with them, so that when someone accesses this table from Databricks they see decrypted (or encrypted) data according to that security?
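For reference, decryption of Always Encrypted columns happens in the JDBC driver itself rather than through a different read API: the connection has to request column encryption and be given credentials for the column master key in Key Vault. Below is a minimal sketch (shown in PySpark for brevity; the same connection properties apply from Scala), assuming a recent mssql-jdbc driver with its Azure Key Vault dependencies attached to the cluster and an Azure AD service principal that has key permissions (get, unwrapKey, verify) on the column master key. The property names and all placeholder values are illustrative, so verify them against your driver version's documentation.
connection_properties = {
    "user": "<username>",
    "password": "<password>",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    # Ask the driver to transparently decrypt Always Encrypted columns
    "columnEncryptionSetting": "Enabled",
    # Tell the driver how to reach the column master key in Azure Key Vault
    # (property names vary slightly between mssql-jdbc versions)
    "keyStoreAuthentication": "KeyVaultClientSecret",
    "keyStorePrincipalId": "<service-principal-application-id>",
    "keyStoreSecret": "<service-principal-secret>",
}
jdbc_url = "jdbc:sqlserver://XXXXXXXXXXX.database.windows.net:1433;database=ABCD"
user_df = spark.read.jdbc(jdbc_url, "dbo.bp_mp_user_test", properties=connection_properties)
display(user_df)
Whether a given user then sees decrypted or encrypted data comes down to whether the identity supplied to the key store properties has access to the key in Key Vault and whether column encryption is enabled for that connection.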

Related

How to generate SAS token using python legacy SDK(2.1) without account_key or connection_string

I am using Python 3.6 and azure-storage-blob (version 1.5.0) and am trying to use a user-assigned managed identity to connect to my Azure Storage blob from an Azure VM. The problem I am facing is that I want to generate a SAS token to form a downloadable URL.
I am authenticating with blob_service = BlockBlobService(account_name, token_credential), but I am not able to find any method that lets me generate a SAS token without supplying the account key.
I also don't see any way of using a user delegation key, as is available in the new azure-storage-blob (versions >= 12.0.0). Is there any workaround, or will I need to upgrade the Azure Storage library in the end?
I tried to reproduce this in my environment and was able to generate a SAS token without an account key or connection string.
Code:
import datetime as dt
from azure.identity import DefaultAzureCredential
from azure.storage.blob import (
    BlobSasPermissions,
    BlobServiceClient,
    generate_blob_sas,
)

credential = DefaultAzureCredential(exclude_shared_token_cache_credential=True)

storage_acct_name = "Accountname"
container_name = "containername"
blob_name = "Filename"

url = f"https://{storage_acct_name}.blob.core.windows.net"
blob_service_client = BlobServiceClient(url, credential=credential)

# Request a user delegation key backed by the identity (no account key needed)
udk = blob_service_client.get_user_delegation_key(
    key_start_time=dt.datetime.utcnow() - dt.timedelta(hours=1),
    key_expiry_time=dt.datetime.utcnow() + dt.timedelta(hours=1),
)

# Sign the blob SAS with the user delegation key
sas = generate_blob_sas(
    account_name=storage_acct_name,
    container_name=container_name,
    blob_name=blob_name,
    user_delegation_key=udk,
    permission=BlobSasPermissions(read=True),
    start=dt.datetime.utcnow() - dt.timedelta(minutes=15),
    expiry=dt.datetime.utcnow() + dt.timedelta(hours=2),
)

sas_url = (
    f'https://{storage_acct_name}.blob.core.windows.net/'
    f'{container_name}/{blob_name}?{sas}'
)
print(sas_url)
Output: the script prints a working SAS URL for the blob.
Make sure the identity you authenticate with has the Storage Blob Data Contributor role on the storage account.
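If it helps, the generated SAS URL can be handed straight to BlobClient to download the blob (a small usage sketch; the SAS token is embedded in the URL, so no extra credential is needed):
from azure.storage.blob import BlobClient

# The SAS token travels inside sas_url, so no separate credential is passed
blob_client = BlobClient.from_blob_url(sas_url)
data = blob_client.download_blob().readall()
print(f"Downloaded {len(data)} bytes")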

Azure - Copy LARGE blobs from one container to other using logic apps

I successfully built a logic app where, whenever a blob is added to container-one, it gets copied to container-2. However, it fails when any blob larger than 50 MB (the default size limit) is uploaded.
Could you please guide me?
Blobs are added via the REST API.
Currently, the maximum file size with chunking disabled is 50 MB. One workaround is to use an Azure Function to transfer the files from one container to another.
Below is sample Python code that worked for me when transferring a file from one container to another:
from datetime import datetime, timedelta

from azure.storage.blob import (
    AccountSasPermissions,
    BlobClient,
    BlobServiceClient,
    ResourceTypes,
    generate_account_sas,
)

connection_string = '<Your Connection String>'
account_key = '<Your Account Key>'
source_container_name = 'container1'
blob_name = 'samplepdf.pdf'
destination_container_name = 'container2'

# Create client
client = BlobServiceClient.from_connection_string(connection_string)

# Create SAS token for the source blob
sas_token = generate_account_sas(
    account_name=client.account_name,
    account_key=account_key,
    resource_types=ResourceTypes(object=True),
    permission=AccountSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=4),
)

# Create blob client for the source blob
source_blob = BlobClient(
    client.url,
    container_name=source_container_name,
    blob_name=blob_name,
    credential=sas_token,
)

# Create the destination blob and start the copy operation
new_blob = client.get_blob_client(destination_container_name, blob_name)
new_blob.start_copy_from_url(source_blob.url)
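One hedged addition: start_copy_from_url only kicks off a server-side copy, which can still be in progress for large blobs, so it may be worth polling the destination blob's copy status before treating the transfer as done (the polling interval below is arbitrary):
import time

# Wait for the server-side copy to finish
props = new_blob.get_blob_properties()
while props.copy.status == "pending":
    time.sleep(5)  # arbitrary polling interval
    props = new_blob.get_blob_properties()
print("Copy finished with status:", props.copy.status)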
REFERENCES:
General Limits
How to copy a blob from one container to another container using Azure Blob storage SDK

glue job times out when calling aws boto3 client api

I am using the Glue console, not a dev endpoint. The Glue job is able to access the Glue catalog and a table using the code below:
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="glue-db", table_name="countries")
print("Table Schema:", datasource0.schema())
print("datasource0", datasource0.show())
Now I want to get the metadata for all tables in the Glue database glue-db.
I could not find a function for this in the awsglue.context API, so I am using boto3:
client = boto3.client('glue', 'eu-central-1')
responseGetDatabases = client.get_databases()
databaseList = responseGetDatabases['DatabaseList']
for databaseDict in databaseList:
    databaseName = databaseDict['Name']
    print("databaseName:{}".format(databaseName))
    responseGetTables = client.get_tables(DatabaseName=databaseName,
                                          MaxResults=123)
    print("responseGetDatabases{}".format(responseGetTables))
    tableList = responseGetTables['TableList']
    print("response Object{0}".format(responseGetTables))
    for tableDict in tableList:
        tableName = tableDict['Name']
        print("-- tableName:{}".format(tableName))
The code runs in a Lambda function, but fails within the Glue ETL job with the following error:
botocore.vendored.requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='glue.eu-central-1.amazonaws.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, 'Connection to glue.eu-central-1.amazonaws.com timed out. (connect timeout=60)'))
The problem seems to be in the environment configuration. The Glue VPC has two subnets:
private subnet: has an S3 endpoint for Glue and allows inbound traffic from the RDS security group.
public subnet: in the Glue VPC, with a NAT gateway; the private subnet is reachable through the NAT gateway.
I am not sure what I am missing here.
Try using a proxy while creating the boto3 client:
import boto3
from botocore.config import Config
from pyhocon import ConfigFactory

service_name = 'glue'
region = 'eu-central-1'

default = ConfigFactory.parse_file('glue-default.conf')
override = ConfigFactory.parse_file('glue-override.conf')
host = override.get('proxy.host', default.get('proxy.host'))
port = override.get('proxy.port', default.get('proxy.port'))

config = Config()
if host and port:
    config.proxies = {'https': '{}:{}'.format(host, port)}

client = boto3.Session(region_name=region).client(service_name=service_name, config=config)
glue-default.conf and glue-override.conf are deployed to the cluster by Glue during spark-submit, into the /tmp directory.
I had a similar issue and I did the same by using the public library from glue:
s3://aws-glue-assets-eu-central-1/scripts/lib/utils.py
Can you please try creating the boto3 client as below, specifying the region explicitly?
client = boto3.client('glue',region_name='eu-central-1')
I had a similar problem when running this from a Glue Python Shell job.
I created an endpoint (VPC -> Endpoints) for the Glue service (service name: "com.amazonaws.eu-west-1.glue") and assigned it to the same subnet and security group as the Glue connection used in the Glue Python Shell job.
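For completeness, the same interface endpoint can also be created with boto3; this is only an illustrative sketch, and the VPC, subnet, and security group IDs are placeholders:
import boto3

ec2 = boto3.client('ec2', region_name='eu-west-1')
ec2.create_vpc_endpoint(
    VpcEndpointType='Interface',
    VpcId='vpc-xxxxxxxx',                       # VPC used by the Glue connection
    ServiceName='com.amazonaws.eu-west-1.glue',
    SubnetIds=['subnet-xxxxxxxx'],              # same subnet as the Glue connection
    SecurityGroupIds=['sg-xxxxxxxx'],           # same security group as the Glue connection
    PrivateDnsEnabled=True,
)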

TTL on Ignite 2.5.0 not working

I tried enabling TTL for records in Ignite using two approaches, but neither seems to be working. I need help understanding whether I am missing something.
IgniteCache cache = ignite.getOrCreateCache(IgniteCfg.CACHE_NAME);
cache.query(new SqlFieldsQuery(
        "CREATE TABLE IF NOT EXISTS City (id LONG primary key, name varchar, region varchar)"))
    .getAll();
cache.withExpiryPolicy(new CreatedExpiryPolicy(new Duration(TimeUnit.SECONDS, 10)))
    .query(new SqlFieldsQuery(
        "INSERT INTO City (id, name, region) VALUES (?, ?, ?)").setArgs(1, "Forest Hill1", "GLB"))
    .getAll();
So, as you can see above, I created a table in the cache and inserted a record with an expiry TTL of 10 seconds, but it never seems to expire.
I tried another approach: rather than setting the TTL while inserting the record, I set it on the CacheConfiguration when initializing Ignite. Below is the code sample:
Ignition.setClientMode(true);
IgniteConfiguration cfg = new IgniteConfiguration();
// Disabling peer-class loading feature.
cfg.setPeerClassLoadingEnabled(false);
CacheConfiguration ccfg = createCacheConfiguration();
cfg.setCacheConfiguration(ccfg);
ccfg.setEagerTtl(true);
ccfg.setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.SECONDS, 5)));
TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
cfg.setCommunicationSpi(commSpi);
TcpDiscoveryVmIpFinder tcpDiscoveryFinder = new TcpDiscoveryVmIpFinder();
String[] addresses = { "127.0.0.1" };
tcpDiscoveryFinder.setAddresses(Arrays.asList(addresses));
TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
discoSpi.setIpFinder(tcpDiscoveryFinder);
cfg.setDiscoverySpi(discoSpi);
return Ignition.start(cfg);
I am running Ignite locally (not purely in-memory inside the app), as my final goal is to be able to connect to the same Ignite cluster from multiple instances of the app, or even from multiple apps.
Ignite SQL currently doesn't interact with expiry policies and doesn't update TTL. There is a Feature Request for that: https://issues.apache.org/jira/browse/IGNITE-7687.

Firebird connection to remote server using FbConnectionStringBuilder

We're having trouble connecting to a remote Firebird server using the .NET provider's FbConnectionStringBuilder class. We can connect to a local database file in either embedded or server mode; however, we cannot establish a connection to a remote server.
We assign properties of the FbConnectionStringBuilder class with the following code (server mode). I have omitted the code which assigns properties for embedded mode.
var cs = new FbConnectionStringBuilder
{
    Database = databaseSessionInfo.PathAbsoluteToDatabase,
    Charset = "UTF8",
    Dialect = 3,
};

cs.DataSource = databaseSessionInfo.Hostname;
cs.Port = databaseSessionInfo.Port;
cs.ServerType = (FbServerType)databaseSessionInfo.Mode;
cs.Pooling = true;
cs.ConnectionLifeTime = 30;

if (databaseSessionInfo.UseCustomUserAccount)
{
    cs.UserID = databaseSessionInfo.Username;
    cs.Password = databaseSessionInfo.Password;
}
else
{
    cs.UserID = Constants.DB_DefaultUsername;
    cs.Password = Constants.DB_DefaultPassword;
}
Pretty straightforward. Our software contains a connection configuration screen whereby a user can supply different connection properties. These properties get assigned to the FbConnectionStringBuilder class using the code above.
The connection builder class outputs a connection string in the following format:
initial catalog="P:\Source\database.fdb";character set=UTF8;dialect=3;data source=localhost;port number=3050;server type=Default;pooling=True;connection lifetime=30;user id=USER;password=example
However, the literature on Firebird connection strings, as indicated on this page (Firebird Connection Strings), talks of a different structure. I can only assume the FbConnectionStringBuilder class builds a connection string satisfying Firebird's requirements.
Does the FbConnectionStringBuilder class append the hostname to the connection string correctly?
The Firebird server is running on the server. I assume there is no need to install it on the client?
What libraries need to be installed with the client to support a remote server connection?
Are we doing this right?
Any advice is appreciated.
Answering your questions:
1) Yes it will.
2) Correct, the client library will connect over the network.
3) If you use the ADO library, just FirebirdSql.Data.FirebirdClient.dll
4) Maybe. I don't know if this will help, but this is how I connect:
FbConnectionStringBuilder csb = new FbConnectionStringBuilder();
csb.DataSource = Host;
csb.Port = Port;
csb.Database = Database;
csb.UserID = User;
csb.Password = Password;
csb.Charset = CharacterSet;

// Build the connection from the generated connection string
FbConnection connection = new FbConnection(csb.ToString());
Interestingly, what absolute path are you providing to the string builder? Is it the server's absolute path, or some kind of network-mapped drive?
Also, I assume you've reviewed your firewall settings and are allowing port 3050 inbound.
