Can you mount more than one ADLS2 instance in Databricks? - azure-databricks

What is the best practice for setting up DEV/TEST/PROD environments for a data lakehouse / Delta Lake architecture? Do you have a separate ADLS2 instance for each of DEV/TEST/PROD, or do you host all three in one ADLS2 instance? Can you even mount more than one ADLS2 instance in Databricks?

It's better to have separate storage accounts for each environment. There are multiple reasons for that:
It's simpler to control access to the data - usually in the prod environment only service accounts have access to the data, while dev & test environments have other requirements.
You avoid dev/test influencing prod and vice versa. Each storage account has limits on the number of API calls and bandwidth, which matters especially when you're doing load testing that could affect all environments if they shared a storage account.
You can mount as many storage accounts as you want, but you need to understand that everyone in the workspace will have access to the data in that mount, and that access will depend on the permissions of the service principal or shared access signature used for the mount. A more secure way would be to use credentials scoped to specific cluster(s).
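For illustration, here is a minimal sketch of mounting one ADLS Gen2 storage account per environment from a Databricks notebook with a service principal; the secret scope, application/tenant IDs, and storage account/container names are placeholders:

```python
# Sketch: mount a dev-environment ADLS Gen2 container via a service principal.
# "dev-sp-scope", the IDs, and the account/container names are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="dev-sp-scope", key="sp-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Repeat with a different source and mount point for each storage account.
dbutils.fs.mount(
    source="abfss://data@devlakestorage.dfs.core.windows.net/",
    mount_point="/mnt/dev",
    extra_configs=configs,
)
```

Keep in mind the caveat above: every user in the workspace reads through the mount with the service principal's permissions, regardless of their own.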

Related

Google Service Account - Multiple Servers

I'm using a Google Service Account to push MySQL backups from our webserver to Google Drive using a Google API PHP client script setup as a cron job.
I now want to run the same script across multiple webservers, but I'm not sure how to correctly configure the service account for this. Should I:
Use the same service account and service account key/credentials across all servers?
OR use the same service account, but add a service account key/credentials for each server?
OR set up a separate service account for each server?
Your requirements/needs may vary, but generally it won't matter how you do it.
Single project
Creating a project in the Google developer console with a single service account key file shared across all servers is the same as creating a project with three separate service account credentials, each with its own key file.
You will be bound by the same quota limits because all are under the same project.
Three separate projects
If you instead created three different projects, with a single service account credential for each server, then you would see a difference: since they are different projects, they are bound by different quota limits.
Quota
The Google Drive default quota is so high anyway that I'm not sure it really matters which option you choose.
Security
Even security-wise, if one server were hacked, you wouldn't gain or lose anything depending on whether the other servers use the same or a different key file.
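Since the key file is the only per-server artifact either way, the setup on each server reduces to loading a key file and building a client. A minimal sketch with the Python client libraries (the question uses the PHP client; the file name, scope, and backup path here are assumptions):

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# The same key file can be copied to every server, or each server can have
# its own key for the same service account; quota is tracked per project.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",  # placeholder path to the key file
    scopes=["https://www.googleapis.com/auth/drive.file"],
)
drive = build("drive", "v3", credentials=creds)

# Push a MySQL backup to Drive (file name is a placeholder).
media = MediaFileUpload("backup.sql.gz", mimetype="application/gzip")
drive.files().create(body={"name": "backup.sql.gz"}, media_body=media).execute()
```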

best EC2 instance type for a management console of other instances

I have an application which requires a strong GPU, and it runs on an EC2 instance of type p2.xlarge, which is ideal for that kind of task. Because p2.xlarge instances are quite expensive though, I keep them offline and only start them when necessary.
Sometimes I do multiple calculations on 1 instance, and sometimes I even use multiple instances at the same time.
I've written an application in Angular that can visualize the results of these calculations, which I've only tested in an environment where the Angular application is hosted on that same instance.
But since I have multiple instances, it would be ideal to visualize them all on a single webpage. That leads me to the diagram below, where a single instance acts like a portal or management console that controls the other instances.
Now, to get things moving, I would like to set up this front-end server as soon as possible. But there are so many instance types to choose from. What would be the best instance type for this front-end server, a dashboard / portal that controls other AWS instances? The only requirements are:
of course it should be able to run a Node.js server (and a minimalistic DB for storing logins).
it should be able to start/stop other AWS instances.
it should be able to communicate with other AWS instances using websockets, and as far as I'm concerned, that doesn't even really need to go over the internet; it can stay within the AWS network.
Well,
of course it should be able to run a Node.js server (and a minimalistic DB for storing logins).
Sounds like you need a small machine. I would suggest the T2/T3 family: very cheap, and it can be configured without bursting limits, which gives you all the power you need for a very low price.
it should be able to start/stop other AWS instances.
Not a problem. Create an IAM role which has permissions for EC2, and when you launch your instance, give it that IAM role. It will be able to do whatever you grant it to do via the API.
Pay attention to the image you use: if you take Amazon Linux 2, you get the AWS CLI preinstalled, which is pretty nice.
Read more about IAM roles here.
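With the role attached, no credentials need to be stored on the portal at all. As a hedged sketch (in Python with boto3 rather than the asker's Node.js; region and instance ID are placeholders):

```python
import boto3

# Credentials are picked up automatically from the instance's IAM role.
ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

worker_ids = ["i-0123456789abcdef0"]  # placeholder instance ID

ec2.start_instances(InstanceIds=worker_ids)  # bring a p2.xlarge worker online
# ... run the GPU calculations ...
ec2.stop_instances(InstanceIds=worker_ids)   # stop it again to save money
```

The equivalent calls exist in the AWS SDK for JavaScript if the portal stays in Node.js.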
it should be able to communicate with other AWS instances using websockets, and as far as I'm concerned, that doesn't even really need to go over the internet; it can stay within the AWS network.
Just make sure you launch all instances in the same VPC. When machines are in the same VPC, they can communicate with each other using only internal IPs.
You can create a new VPC like here, or just use the default one. After you launch the instance, you will get its internal IP.
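For example, a sketch of looking up a worker's private IP with boto3 so the websocket connection stays inside the VPC (region and instance ID are again placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

resp = ec2.describe_instances(InstanceIds=["i-0123456789abcdef0"])
private_ip = resp["Reservations"][0]["Instances"][0]["PrivateIpAddress"]

# Connect over the VPC-internal address, e.g. ws://<private_ip>:8080
print(private_ip)
```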

Heroku account sharing

We have several developers, working on the same application (to be deployed on Heroku).
We know they can open separate Heroku accounts, and share the application using "heroku sharing:add".
But is it possible to use a single "team" account? Are there limitations on people logging in simultaneously from different PCs? Or is there any other technical reason to avoid it?
Note we're not worried about them overriding each other's deployments, because it's for development (not production), and it's a small team.
Thanks :)
Although you can grant limited access to multiple Heroku accounts, only one "Owner" account has privileges to modify the account configuration.
If more than one person needs to modify your account/app configuration (i.e. changing/provisioning add-ons, etc.), it's best to create a shared e-mail/password stored in a secure password manager like 1Password. However, this is a hassle and opens up some vulnerability. It may also be against Heroku's TOS, but that isn't likely to be enforced.
I would recommend using multiple accounts for anyone who needs read-only or deploy access. I would limit a shared Owner account to the privileged users who need full access.

Azure cache configs - multiple on one storage account?

While developing an Azure application I got the famous error "Cache referred to does not exist", and after a while I found this solution: datacacheexception Cache referred to does not exist (in short: don't point multiple cache clusters to one storage account via ConfigStoreConnectionString).
Well, I have 3 roles using co-located cache, plus testing and production environments. So I would have to create 6 "dummy" storage accounts just for cache configuration, which doesn't seem very nice to me.
So the question is: is there any way to point multiple cache clusters to one storage account? For example, by specifying different containers for them (they create one named "cacheclusterconfigs" by default)?
Thanks!
Given your setup, I would point each cloud service at its own storage account, which gives two per environment (one for each cloud service). There are other alternatives: you could set up Server AppFabric Cache in an IaaS VM and expose it to both of your cloud services by placing them all within a single Azure Virtual Network. However, this will introduce latency to the connections as well as increase costs (from running the virtual network).
You can also point the cache at the same storage account used by diagnostics or the data storage for your cloud services; just be aware of any scalability limits, as the cache will generate some traffic (mainly from the addition of new items to the cache).
But unfortunately, to my knowledge there's currently no option that allows two caches to share the same storage account.

What are viable ways to develop an Azure app on multiple machines

The scenario is that I am rebuilding an application that is presently SQL and classic ASP. However, I want to update this a bit to leverage Azure Tables. I know that the Azure SDK has the Dev Fabric storage thing available, and I guess it's an option to have that installed on all of my machines.
But I'm wondering if there is a less 'invasive' way to mimic Azure Tables. Do object DBs or document DBs provide a reasonable facsimile that could be used for early prototyping? Or is making the move from them to the Azure SDK tables just more headache than it's worth?
In my opinion you should skip the fake Azure tables completely. Even the MS development storage is not an exact match for how things will actually run in the cloud. You get 1M transactions for $1, 1GB of storage for $0.15, and $0.15 per GB in/out of the data centre. If you're just prototyping, live dangerously and spend $10.
If you're new to working with Azure tables, going straight to the real thing instead of development storage or some other proxy will save you at least that much money in time you'd otherwise spend reworking your code to work against the real thing.
If you're just using tables, and not queues or blobs, $10 will go a long way.
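To show how little ceremony prototyping against the live service needs, here's a minimal sketch with the current azure-data-tables Python package (which postdates this question; the connection string and names are placeholders):

```python
from azure.data.tables import TableServiceClient

# Placeholder connection string for the real storage account.
conn_str = (
    "DefaultEndpointsProtocol=https;AccountName=<account>;"
    "AccountKey=<key>;EndpointSuffix=core.windows.net"
)

service = TableServiceClient.from_connection_string(conn_str)
table = service.create_table_if_not_exists("prototype")

# Entities are plain dicts keyed by PartitionKey / RowKey.
table.create_entity({"PartitionKey": "demo", "RowKey": "1", "payload": "hello"})
print(table.get_entity(partition_key="demo", row_key="1"))
```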
Each Azure "project" (which is like an Azure account) is initially limited to 5 hosted storage accounts. Let's say you have one storage account for your actual content, and one for diagnostics. Now let's say you want a dev and QA storage account for each of those, respectively, so you don't damage production data. You've now run out of your storage accounts (in the above scenario, you'd need 6 hosted accounts). You'd need to call Microsoft and ask for this limit to be increased...
If, instead, you used the Dev Fabric for your tables during development / local testing, you'll free up Azure storage accounts (the ones used for dev - you'd still want QA storage accounts to be in Azure, not Dev Fabric).
One scenario where you can't rely on Dev Fabric for storage: performance/load testing.
