How to use databricks dbx with an Azure VPN? - azure-databricks

I am using dbx to deploy and launch jobs on ephemeral clusters on Databricks.
I have initialized the cicd-sample-project and connected to a fresh empty Databricks Free trial environment and everything works (this means, that I can successfully deploy the python package with this command
python -m dbx deploy cicd-sample-project-sample-etl --assets-only
and execute it through the python -m dbx launch cicd-sample-project-sample-etl --from-assets --trace
When I try to launch the same exact job on my company's Databricks environment the deploy command goes through. The only difference is that my company's Databricks environment connects to Azure through a VPN.
Therefore, I added some rules to my firewall
firewall_rules_img
firewall_rules_2_img
but when I give the dbx launch command I get the following error
error_node_img
and in the log this message appears
WARN MetastoreMonitor: Failed to connect to the metastore InternalMysqlMetastore(DbMetastoreConfig{host=consolidated-westeuropec2-prod-metastore-3.mysql.database.azure.com, port=3306, dbName=organization5367007285973203, user=[REDACTED]}). (timeSinceLastSuccess=0) org.skife.jdbi.v2.exceptions.UnableToObtainConnectionException: java.sql.SQLTransientConnectionException: metastore-monitor - Connection is not available, request timed out after 15090ms. at org.skife.jdbi.v2.DBI.open(DBI.java:230)
I am not even trying to write on the metastore, I am just logging some data:
from cicd_sample_project.common import Task
class SampleETLTask(Task):
def launch(self):
self.logger.info("Launching sample ETL task")
self.logger.info("Sample ETL task finished!")
def entrypoint(): # pragma: no cover
task = SampleETLTask()
task.launch()
if __name__ == "__main__":
entrypoint()
Does someone encountered the same problem? Where you able to use Databricks-dbx with an Azure VPN?
Please let me know and thanks for your help.
PS: If needed I can provide the full log

In your case the egress traffic isn't configured correctly - it's not the DBX problem, but general Databricks networking problem. Just make sure that outgoing traffic is allowed to the ports and destinations described in the documentation.

Related

Run a PowerShell script on Azure AKS nodes,

I have a PowerShell script that I want to run on some Azure AKS nodes (running Windows) to deploy a security tool. There is no daemon set for this by the software vendor. How would I get it done?
Thanks a million
Abdel
Similar question has been asked here. User philipwelz has written:
Hey,
although there could be ways to do this, i would recommend that you dont. The reason is that your AKS setup should not allow execute scripts inside container directly on AKS nodes. This would imply a huge security issue IMO.
I suggest to find a way the execute your script directly on your nodes, for example with PowerShell remoting or any way that suits you.
BR,
Philip
This user is right. You should avoid executing scripts on your AKS nodes. In your situation if you want to deploy Prisma cloud you need to go with the following doc. You are right that install scripts work only on Linux:
Install scripts work on Linux hosts only.
But, for the Windows and Mac software you have specific yaml files:
For macOS and Windows hosts, use twistcli to generate Defender DaemonSet YAML configuration files, and then deploy it with kubectl, as described in the following procedure.
The entire procedure is described in detail in the document I have quoted. Pay attention to step 3 and step 4. As you can see, there is no need to run any powershell script:
STEP 3:
Generate a defender.yaml file, where:
The following command connects to Console (specified in [--address](https://docs.paloaltonetworks.com/prisma/prisma-cloud/prisma-cloud-admin-compute/install/install_kubernetes.html#)) as user <ADMIN> (specified in [--user](https://docs.paloaltonetworks.com/prisma/prisma-cloud/prisma-cloud-admin-compute/install/install_kubernetes.html#)), and generates a Defender DaemonSet YAML config file according to the configuration options passed to [twistcli](https://docs.paloaltonetworks.com/prisma/prisma-cloud/prisma-cloud-admin-compute/install/install_kubernetes.html#). The [--cluster-address](https://docs.paloaltonetworks.com/prisma/prisma-cloud/prisma-cloud-admin-compute/install/install_kubernetes.html#) option specifies the address Defender uses to connect to Console.
$ <PLATFORM>/twistcli defender export kubernetes \
--user <ADMIN_USER> \
--address <PRISMA_CLOUD_COMPUTE_CONSOLE_URL> \
--cluster-address <PRISMA_CLOUD_COMPUTE_HOSTNAME>
- <PLATFORM> can be linux, osx, or windows.
- <ADMIN_USER> is the name of a Prisma Cloud user with the System Admin role.
and then STEP 4:
kubectl create -f ./defender.yaml
I think that the above answer is not completely correct.
The twistcli command, does not export daemonset for Windows Nodes. The "PLATFORM" option, is for choosing the OS of the computer that the command will run.
After testing, I have made the conclusion that there is no Docker Image for Prisma Cloud for Windows Kubernetes Nodes, as it is deployed as a service at Windows OS, and not Container (as in Linux). Wrapping up, the Daemonset is not working at the Windows Hosts
I believe the only solution is this -> Windows
This is the Powershell script that WytrzymaƂy Wiktor has mentioned.
Unfortunately this cannot be automated easily, as you have to deploy an Azure VM per AKS Cluster (at the same network), and RDP to the AKS Windows Node and run the script.
If anyone has another suggestion or solution, feel free to share.

Error syncing pod on starting Beam - Dataflow pipeline from docker

We are constantly getting an error while starting our Beam Golang SDK pipeline (driver program) from a docker image which works when started from local / VM instance. We are using Dataflow runner for our pipeline and Kubernetes to deploy.
LOCAL SETUP:
We have GOOGLE_APPLICATION_CREDENTIALS variable set with service account for our GCP cluster. When running the job from local, job gets submitted to dataflow and completes successfully.
DOCKER SETUP:
Build image used is FROM golang:1.14-alpine. When we pack the same program with Dockerfile and try to run, it fails with error
User program exited: fork/exec /bin/worker: no such file or directory
On checking Stackdriver logs for more details, we see this:
Error syncing pod 00014c7112b5049966a4242e323b7850 ("dataflow-go-job-1-1611314272307727-
01220317-27at-harness-jv3l_default(00014c7112b5049966a4242e323b7850)"),
skipping: failed to "StartContainer" for "sdk" with CrashLoopBackOff:
"back-off 2m40s restarting failed container=sdk pod=dataflow-go-job-1-
1611314272307727-01220317-27at-harness-jv3l_default(00014c7112b5049966a4242e323b7850)"
Found reference to this error in Dataflow common errors doc, but it is too generic to figure out whats failing. After multiple retries, we were able to eliminate any permission / access related issues from pods. Not sure what else could be the problem here.
After multiple attempts, we decided to start the job manually from a new Debian 10 based VM instance and it worked. This brought to our notice that we are using alpine based golang image in Docker which may not have all the required dependencies installed to start the job.
On golang docker hub, we found a golang:1.14-buster where buster is codename for Debian 10. Using that for docker build helped us solve the issue. Self answering here to help anyone else facing the same issues.

How to safely fix an AWOL ambari system user?

I'm a student working on a test cluster, consisting of around 25 hosts. We installed using Ambari and have FreeIpa running on a host as a dns and ldap server. The rest are typical Hadoop
infrastructure. Hive was failing and I wondered whether the db connection parameters used during the Ambari installation were incorrect and I tried to find a way to re-run the db connection process. I didn't get anywhere and it was late so I left it, ambari interface working.
Next morning, ambari webUI seems to be down. I thought that maybe the webserver needed restarted so I tried the following:
[akidd#dw ~]$ sudo ambari-server start
Using python /usr/bin/python
Starting ambari-server
ERROR: Exiting with exit code 1.
REASON: Unable to detect a system user for Ambari Server.
- If this is a new setup, then run the "ambari-server setup" command to create the user
- If this is an upgrade of an existing setup, run the "ambari-server upgrade" command.
Refer to the Ambari documentation for more information on setup and upgrade.
Can anyone help me to understand what could have happened?
If I run ambari-server setup will the existing cluster be ok assuming I create everything like for like with how it was originally?
Thanks for your help!
#user3535074 You should try to start it with the user that installed it.
If you do run ambari-server setup as current user, remember to choose No the following options:
Customize user account for ambari-server daemon [y/n] (n)? n
Do you want to change Oracle JDK [y/n] (n)? n
Enter advanced database configuration [y/n] (n)? n
More info on the following post, including how to backup ambari database before running setup again:
https://community.cloudera.com/t5/Support-Questions/Ambari-server-failed-to-start-after-system-reboot-Below-is/td-p/203806

SSH timeout error on Azure DevOps CD pipeline

I am getting an timeout error when trying to deploy to an VM instance hosted on AWS. Manually I can log ing using
ssh -i myKeyFile.pem myuser#IP
Once I accessed the remote machine I can execute some docker commands and everything works fine. But now that I need to automated that on the CD pipeline is where I am getting the following error:
2020-06-02T21:37:12.6877276Z Trying to establish an SSH connection to ***#IP:port
2020-06-02T21:38:52.4629461Z ##[error]Failed to connect to remote machine. Verify the SSH service connection details. Error: Error: Timed out while waiting for handshake.
2020-06-02T21:38:52.4685976Z ##[section]Finishing: Run shell commands on remote machine
The steps I follow to make the SSH connection are:
I created a SSH service connection on the project settings in Azure DevOps
I created the CD pipeline
I added a SSH task with the following parameters
When I manually trigger it to test if it works, the release start working fine but after 1:43 minutes more or less is when I got the error:
Then, when I review the logs, it is the same error I pasted at the beginning:
[error]Failed to connect to remote machine. Verify the SSH service connection details. Error: Error: Timed out while waiting for handshake
I've increase the handshake timeout settings from the default one (20000) to 90000, but no luck.
Any one has face this problem before?
Seems there is an ongoing error with the default agent pools from Azure DevOps. Lot of people have been reported this and Azure DevOps teams is working on it at the time this post is been written (I couldn't find the post where all that is details. I will add this later on).
The workaround is
To create a self-hosted agent.
After this has been created you will need to re-create your CD pipeline using the new self-hosted agent.
The rest of the SSH task configuration depends on your needs. But if you want to test the SSH connection works, just print something:
echo 'I'm connected'
After this you CD pipeline should be working fine.
More details on how to created the Self-Hosted Agent on Windows. There are also links for Linux and Mac.
I had a similar issue with a VM in Azure. It turned out I had set the security group to only allow SSH in from my local network and Azure Dev-Ops agents obviously run in a Microsoft network and were coming from a different IP Address range. The solution was to open up SSH to all source IP Addresses. You can get the list of IP address ranges Dev-Ops agents use but they appear to change every week which isn't very helpful.
See https://learn.microsoft.com/en-us/azure/devops/organizations/security/allow-list-ip-url?view=azure-devops#microsoft-hosted-agents

Deploy API REST IBM Hyperledger Composer Blockchain

I'm developing a POC over IBM HyperLedger Blockchain. I have a business network developed and deployed in IBM Cloud. I can generate a working local API REST, but cannot make it work on cloud, on the deployed IP.
I'm following this guide:
https://ibm-blockchain.github.io/interacting/
You just have to execute the following command:
./create/create_composer-rest-server.sh --business-network-card MY_BIZNET_CARD_NAME
But it doesn't deploy anything, and get the following (more related to kubernetes than blockchain).
Preparing yaml file for create composer-rest-server
Creating composer-rest-server pod
Running: kubectl create -f /Users/sm/jsblock/ibm-container-service/cs-offerings/scripts/../kube-configs/composer-rest-server.yaml
The connection to the server localhost:8080 was refused - did you specify the right host or port?
the server doesn't have a resource type "svc"
Creating composer-rest-server service
Running: kubectl create -f /Users/sm/jsblock/ibm-container-service/cs-offerings/scripts/../kube-configs/composer-rest-server-services-free.yaml
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Composer rest server created successfully
Any ideas? Thanks too much.
You need to ensure you have a correct kube config setup. Step 10 in https://ibm-blockchain.github.io/setup/ provides the details to set up KUBECONFIG as the error suggests that either it is not configured or not configured correctly.
The document you refer to https://ibm-blockchain.github.io/interacting/ is being updated and should be available soon.
When you run the command ./create/create_composer-rest-server.sh --business-network-card MY_BIZNET_CARD_NAME - should be the name of the Network Admin for the network you deployed, NOT the PeerAdmin card so it will be something like ./create/create_composer-rest-server.sh --business-network-card admin#perishable-network
Look like it's an issue of acceess control. You should make sure again you are running with Local Admin configuration.it will help you to run queries

Resources