Folder Structure for a CI/CD-Compliant Databricks Repo - continuous-integration

Are there any best practices for organizing your project folders so that the CI/CD pipeline remains simple?
Currently, the following structure is used, which seems quite complex:
project
├── README.md
├── azure-pipelines.yml
├── config.json
├── .gitignore
├── package1
│   ├── __init__.py
│   ├── setup.py
│   ├── README.md
│   ├── file.py
│   ├── submodule
│   │   ├── file.py
│   │   └── file_test.py
│   ├── requirements
│   │   ├── common.txt
│   │   └── dev.txt
│   └── notebooks
│       ├── notebook1.txt
│       └── notebook2.txt
├── package2
│   └── ...
└── ci_cd_scripts
    ├── requirements.py
    ├── script1.py
    ├── script2.py
    └── ...
Elsewhere, the following structure is suggested:
.
├── .dbx
│   └── project.json
├── .github
│   └── workflows
│       ├── onpush.yml
│       └── onrelease.yml
├── .gitignore
├── README.md
├── conf
│   ├── deployment.json
│   └── test
│       └── sample.json
├── pytest.ini
├── sample_project
│   ├── __init__.py
│   ├── common.py
│   └── jobs
│       ├── __init__.py
│       └── sample
│           ├── __init__.py
│           └── entrypoint.py
├── setup.py
├── tests
│   ├── integration
│   │   └── sample_test.py
│   └── unit
│       └── sample_test.py
└── unit-requirements.txt
Concretely, I want to know:
Should I use one repo for all packages and notebooks (as suggested in the first approach), or should I create one repo per library (which makes the CI/CD more effortful, as there might be dependencies between the packages)?
With both suggested folder structures, it is unclear to me where to place notebooks that are not related to any specific package (e.g. notebooks that contain my business logic and merely use the packages).
Is there a well-established folder structure?

Databricks used to have a repository with project templates to be used with Databricks (link), but it has since been archived and template creation is now part of the dbx tool. Maybe these two links will be useful for you:
dbx init command - https://dbx.readthedocs.io/en/latest/reference/cli/?h=init#dbx-init
DevOps for Workflows Guide - https://dbx.readthedocs.io/en/latest/concepts/devops/#devops-for-workflows
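For the CI part, the onpush.yml workflow from the suggested layout is typically just a job that installs the package and runs the unit tests on every push. A minimal sketch, assuming the setup.py, unit-requirements.txt and tests/unit names from the tree above (this is not the official dbx template, and the exact steps depend on your project):

# .github/workflows/onpush.yml (sketch)
name: CI

on: [push]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      # install test dependencies and the package itself
      - run: pip install -r unit-requirements.txt
      - run: pip install -e .
      # run only the unit tests; integration tests need a Databricks workspace
      - run: pytest tests/unit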

Related

Why won't it kustomize the node already visited

I am using kubectl kustomize commands to deploy multiple applications (parsers and receivers) with similar configurations, and I'm having problems with the hierarchy of kustomization.yaml files (not understanding what's possible and what's not).
I run the kustomize command as follows from the custom directory:
$ kubectl kustomize overlay/pipeline/parsers/commercial/dev
This works fine; it produces the expected output defined in kustomization.yaml #1. What's not working is that it does NOT automatically execute the #2 kustomization, which is in the (already traversed) directory path two levels above. The #2 kustomization.yaml contains ConfigMap creation that is common to all of the parser environments, and I don't want to repeat it in every env. When I tried to refer to #1 from #2, I got an error about a circular reference, yet it still fails to run the config creation.
I have the following directory structure tree:
custom
├── base
│   ├── kustomization.yaml
│   ├── logstash-config.yaml
│   └── successful-vanilla-ls7.8.yaml
├── install_notes.txt
└── overlay
    └── pipeline
        ├── logstash-config.yaml
        └── parsers
            ├── commercial
            │   ├── dev
            │   │   ├── dev-patches.yaml
            │   │   ├── kustomization.yaml   <====== #1 this works
            │   │   ├── logstash-config.yaml
            │   │   └── parser-config.yaml
            │   ├── prod
            │   └── stage
            ├── kustomization.yaml           <====== #2 why won't this run automatically?
            ├── logstash-config.yaml
            └── parser-config.yaml
Here is my #1 kustomization.yaml:
bases:
  - ../../../../../base
namePrefix: dev-
commonLabels:
  app: "ls-7.8-logstash"
  chart: "logstash"
  heritage: "Helm"
  release: "ls-7.8"
patchesStrategicMerge:
  - dev-patches.yaml
And here is my #2 kustomization.yaml file:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
configMapGenerator:
  # generate a ConfigMap named my-generated-configmap-<some-hash> where each file
  # in the list appears as a data entry (keyed by base filename).
  - name: logstashpipeline-parser
    behavior: create
    files:
      - parser-config.yaml
  - name: logstashconfig
    behavior: create
    files:
      - logstash-config.yaml
The issue lies within your structure. Each entry in base should resolve to a directory containing one kustomization.yaml file. The same goes for overlays. I think it will be easier to explain with an example (I will use $ to show what goes where):
├── base $1
│   ├── deployment.yaml
│   ├── kustomization.yaml $1
│   └── service.yaml
└── overlays
    ├── dev $2
    │   ├── kustomization.yaml $2
    │   └── patch.yaml
    ├── prod $3
    │   ├── kustomization.yaml $3
    │   └── patch.yaml
    └── staging $4
        ├── kustomization.yaml $4
        └── patch.yaml
Every entry resolves to its corresponding kustomization.yaml file: base $1 resolves to kustomization.yaml $1, dev $2 to kustomization.yaml $2, and so on.
However, in your use case:
├── base $1
│   ├── kustomization.yaml $1
│   ├── logstash-config.yaml
│   └── successful-vanilla-ls7.8.yaml
├── install_notes.txt
└── overlay
    └── pipeline
        ├── logstash-config.yaml
        └── parsers
            ├── commercial
            │   ├── dev $2
            │   │   ├── dev-patches.yaml
            │   │   ├── kustomization.yaml $2
            │   │   ├── logstash-config.yaml
            │   │   └── parser-config.yaml
            │   ├── prod $3
            │   └── stage $4
            ├── kustomization.yaml $???
            ├── logstash-config.yaml
            └── parser-config.yaml
Nothing resolves to your second kustomization.yaml.
So to make it work you should put those files separately under each environment.
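For example, the dev overlay could carry both the reference to the base and its own copy of the generators. A sketch based on the files shown above (resources is the newer replacement for the deprecated bases field; the referenced config files must live next to this kustomization.yaml):

# overlay/pipeline/parsers/commercial/dev/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../../../../base
namePrefix: dev-
commonLabels:
  app: "ls-7.8-logstash"
patchesStrategicMerge:
  - dev-patches.yaml
configMapGenerator:
  # generators previously kept in the shared kustomization.yaml (#2)
  - name: logstashpipeline-parser
    files:
      - parser-config.yaml
  - name: logstashconfig
    files:
      - logstash-config.yaml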
Below you can find sources with some more examples showing how the typical directory structure should look:
Components
Directory layout
GitHub
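If you want to avoid repeating the generators in every environment, the Components feature linked above is the usual way out: define the shared generators once as a component and list it from each overlay. A hedged sketch (the components/ location and file placement are assumptions, not part of the original layout):

# components/common-configmaps/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
configMapGenerator:
  - name: logstashconfig
    files:
      - logstash-config.yaml   # the shared file lives next to this component

# referenced from e.g. overlay/pipeline/parsers/commercial/dev/kustomization.yaml
components:
  - ../../../../../components/common-configmaps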

Github actions can't find package within repository

When setting up a GitHub Actions pipeline, I can't get it to find packages that are within my repository, and the test fails because of the missing packages.
What happens is that it clones the repo somewhere but doesn't include the cloned repo's directories when looking for packages. That fails because I am importing packages from within that repo in my code.
I believe my directory structure is sound because I have no trouble testing and building locally:
.
├── extractors
│   ├── fip.go
│   └── fip_test.go
├── fixtures
│   └── fip
│       ├── bad_req.json
│       └── history_response.json
├── .github
│   └── workflows
│       └── go_test.yml
├── main.go
├── Makefile
├── playlist
│   └── playlist.go
├── README.md
└── utils
    ├── logger
    │   └── logger.go
    └── mocks
        └── server.go
View the run here
How do I make Github actions look for the package within the cloned dir as well?
Make sure to run go mod init MODULE_NAME (if the project is outside GOROOT or GOPATH) or simply go mod init (if the project is inside GOROOT or GOPATH). The command should be run in the root folder of your project. This creates a go.mod file that enables Go to resolve your packages.
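With a go.mod in place, a minimal test workflow only needs to check out the repository and run the tests from its root, so the in-repo packages resolve against the module path. A sketch (the Go version and step layout are assumptions, not taken from the linked run):

# .github/workflows/go_test.yml (sketch)
name: go test

on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-go@v4
        with:
          go-version: "1.21"
      # run from the repository root so in-repo imports resolve via go.mod
      - run: go test ./...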

How do you specify environment specific inventory files?

I have a folder structure like so:
.
├── ansible.cfg
├── etc
│   ├── dev
│   │   ├── common
│   │   │   ├── graphite.yml
│   │   │   ├── mongo.yml
│   │   │   ├── mysql.yml
│   │   │   └── rs4.yml
│   │   ├── inventory
│   │   └── products
│   │       ├── a.yml
│   │       ├── b.yml
│   │       └── c.yml
│   └── prod
│       ├── common
│       │   ├── graphite.yml
│       │   ├── mongo.yml
│       │   ├── redis.yml
│       │   └── rs4.yml
│       ├── inventory
│       └── products
│           ├── a.yml
│           ├── b.yml
│           └── c.yml
├── globals.yml
├── startup.yml
├── roles
│   └── [...]
└── requirements.txt
And in my ansible.cfg, I would like to do something like: hostfile=./etc/{{ env }}/inventory, but this doesn't work. Is there a way I can go about specifying environment-specific inventory files in Ansible?
I assume common and products are variable files.
As @Deepali Mittal already mentioned, your inventory should look like inventory/{{ env }}.
In inventory/prod you would define a group prod and in inventory/dev you would define a group dev:
[prod]
host1
host2
hostN
This enables you to define group vars for prod and dev. For this simply create a folder group_vars/prod and place your vars files inside.
Re-ordered your structure would look like this:
.
├── ansible.cfg
├── inventory
│   ├── dev
│   └── prod
├── group_vars
│   ├── dev
│   │   ├── common
│   │   │   ├── graphite.yml
│   │   │   ├── mongo.yml
│   │   │   ├── mysql.yml
│   │   │   └── rs4.yml
│   │   └── products
│   │       ├── a.yml
│   │       ├── b.yml
│   │       └── c.yml
│   └── prod
│       ├── common
│       │   ├── graphite.yml
│       │   ├── mongo.yml
│       │   ├── mysql.yml
│       │   └── rs4.yml
│       └── products
│           ├── a.yml
│           ├── b.yml
│           └── c.yml
├── globals.yml
├── startup.yml
├── roles
│   └── [...]
└── requirements.txt
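For illustration, one of the vars files under group_vars/prod/ might then look like this (a minimal sketch; the variable names and values are hypothetical, not taken from the question):

# group_vars/prod/common/mongo.yml
# loaded automatically for every host in the "prod" group
mongo_port: 27017
mongo_replica_set: rs4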
I'm not sure what globals.yml is. If it is a playbook, it is in the correct location. If it is a variable file with global definitions, it should be saved as group_vars/all.yml and would then automatically be loaded for all hosts.
Now you call ansible-playbook with the correct inventory file:
ansible-playbook -i inventory/prod startup.yml
I don't think it's possible to evaluate the environment inside the ansible.cfg like you asked.
I think that instead of {{ env }}/inventory, inventory/{{ env }} should work. Also, if you can, please share how you use it right now and the error you get when you change the configuration to the env-based one.

JBoss + Maven: Error building POM

I am trying to launch the JBoss application server; my goal is to launch it and deploy a very simple project (I'm trying to do this with "helloworld" from the original quickstarts). The problem is that I have no experience with JBoss or Maven, so I've been having a terrible time for a few days and it still isn't working. I presume that the mistake is in the Maven configuration, but I don't know what I'm supposed to rewrite or replace to repair it.
This is the exact error:
[INFO] Error building POM (may not be this project's POM).
Project ID: org.jboss.component.management:jboss-dependency-management-all
Reason: POM 'org.jboss.component.management:jboss-dependency-management-all' not found in repository: Unable to download the artifact from any repository
org.jboss.component.management:jboss-dependency-management-all:pom:6.0.1-redhat-1
from the specified remote repositories:
central (http://repo1.maven.org/maven2)
for project org.jboss.component.management:jboss-dependency-management-all
I was trying to follow the instructions given at http://www.jboss.org/quickstarts/eap/#build-and-deploy-th%20e-quickstarts , so my only Maven configuration was that I copied settings.xml from the quickstart directory to the .m2 directory. Finally, I tried to build and deploy the quickstart with the command "mvn clean install jboss-as:deploy", but it caused the error :-(
How can I repair this mistake?
P.S.: I use Ubuntu 14.04.
This is what the structure of my JBoss folder looks like:
.
├── InstallationLog.txt
├── InstallSummary.html
├── jboss-eap-6.2
│ ├── appclient
│ ├── bin
│ ├── bundles
│ ├── docs
│ ├── domain
│ ├── icons
│ ├── JBossEULA.txt
│ ├── jboss-modules.jar
│ ├── LICENSE.txt
│ ├── modules
│ ├── standalone
│ ├── version.txt
│ └── welcome-content
├── jboss-eap-6.2.0.GA-quickstarts
│ ├── bean-validation
│ ├── bmt
│ ├── cdi-alternative
│ ├── cdi-decorator
│ ├── cdi-injection
│ ├── cdi-interceptors
│ ├── cdi-portable-extension
│ ├── cdi-stereotype
│ ├── cdi-veto
│ ├── cluster-ha-singleton
│ ├── cmt
│ ├── configure-postgresql.cli
│ ├── CONTRIBUTING.html
│ ├── CONTRIBUTING.md
│ ├── contributor-settings.xml
│ ├── dist
│ ├── ejb-asynchronous
│ ├── ejb-in-ear
│ ├── ejb-in-war
│ ├── ejb-multi-server
│ ├── ejb-remote
│ ├── ejb-security
│ ├── ejb-security-interceptors
│ ├── ejb-throws-exception
│ ├── ejb-timer
│ ├── forge-from-scratch
│ ├── greeter
│ ├── guide
│ ├── helloworld
│ ├── helloworld-jms
│ ├── helloworld-mbean
│ ├── helloworld-mdb
│ ├── helloworld-osgi
│ ├── helloworld-rs
│ ├── helloworld-singleton
│ ├── helloworld-ws
│ ├── hibernate3
│ ├── hibernate4
│ ├── hornetq-clustering
│ ├── h2-console
│ ├── inter-app
│ ├── jax-rs-client
│ ├── jta-crash-rec
│ ├── jts
│ ├── jts-distributed-crash-rec
│ ├── kitchensink
│ ├── kitchensink-ear
│ ├── kitchensink-jsp
│ ├── kitchensink-ml
│ ├── kitchensink-ml-ear
│ ├── LICENSE.txt
│ ├── logging
│ ├── logging-tools
│ ├── log4j
│ ├── mail
│ ├── numberguess
│ ├── payment-cdi-event
│ ├── picketlink-sts
│ ├── pom.xml
│ ├── README.html
│ ├── README.md
│ ├── RELEASE_PROCEDURE.html
│ ├── RELEASE_PROCEDURE.md
│ ├── remove-postgresql.cli
│ ├── servlet-async
│ ├── servlet-filterlistener
│ ├── servlet-security
│ ├── settings.xml
│ ├── shopping-cart
│ ├── tasks
│ ├── tasks-jsf
│ ├── tasks-rs
│ ├── temperature-converter
│ ├── template
│ ├── wicket-ear
│ ├── wicket-war
│ ├── wsat-simple
│ ├── wsba-coordinator-completion-simple
│ ├── wsba-participant-completion-simple
│ ├── xml-dom4j
│ └── xml-jaxp
└── Uninstaller
└── uninstaller.jar
settings.xml: http://hostcode.sourceforge.net/view/1926
pom.xml: http://hostcode.sourceforge.net/view/1927
Eventually I solved this: a thorough deletion of all Maven files (or files which held some outdated information about Maven) and installation of the newest version was the key :-)
This solution is written for Ubuntu 14.04 LTS (but it should work in other versions and even other Debian-based systems too).
STEP-BY-STEP SOLUTION:
1) Completely uninstall Maven
sudo apt-get purge maven
2) Uninstall libaether package
sudo apt-get purge libaether-java
3) Uninstall libplexus packages
sudo apt-get purge libplexus-*
4) Finally, install the current Maven version from the repository
sudo apt-get install maven

Where to find source code of Mozilla NoScript extension?

I read on Wikipedia that NoScript is open source (http://en.wikipedia.org/wiki/NoScript), but on the official site, http://noscript.net/, I can't find any sources. So my question is: where can I find the sources? Or is there something I did not understand, and the source code is not available?
The Firefox XPI format does not prevent you from simply extracting the contents of the plugin to examine the source code.
While I cannot find a canonical public repository, it looks like someone has systematically downloaded and extracted all the available XPIs and created a GitHub repository out of them.
https://github.com/avian2/noscript
If you'd like to do it yourself, XPI files are just standard ZIP files, so you can simply point an extraction program at one.
Here's an example of doing that from the command line:
mkdir noscript_source
cd noscript_source
curl -LO https://addons.mozilla.org/firefox/downloads/file/219550/noscript_security_suite-2.6.6.8-fx+fn+sm.xpi
unzip noscript_security_suite-2.6.6.8-fx+fn+sm.xpi
That yields a directory structure that looks like this:
.
├── GPL.txt
├── META-INF
│   ├── manifest.mf
│   ├── zigbert.rsa
│   └── zigbert.sf
├── NoScript_License.txt
├── chrome
│   └── noscript.jar
├── chrome.manifest
├── components
│   └── noscriptService.js
├── defaults
│   └── preferences
│       └── noscript.js
├── install.rdf
├── mozilla.cfg
└── noscript_security_suite-2.6.6.8-fx+fn+sm.xpi
Then the main code is located inside chrome/noscript.jar. You can extract that to get at the JavaScript that makes up the bulk of the plugin:
cd chrome/
unzip noscript.jar
Which will yield the main source tree:
.
├── content
│   └── noscript
│       ├── ABE.g
│       ├── ABE.js
│       ├── ABELexer.js
│       ├── ABEParser.js
│       ├── ASPIdiocy.js
│       ├── ChannelReplacement.js
│       ├── ClearClickHandler.js
│       ├── ClearClickHandlerLegacy.js
│       ├── Cookie.js
│       ├── DNS.js
│       ├── DOM.js
│       ├── ExternalFilters.js
│       ├── FlashIdiocy.js
│       ├── HTTPS.js
│       ├── Lang.js
│       ├── NoScript_License.txt
│       ├── PlacesPrefs.js
│       ├── Plugins.js
│       ├── Policy.js
│       ├── Profiler.js
│       ├── Removal.js
│       ├── RequestWatchdog.js
│       ├── STS.js
│       ├── ScriptSurrogate.js
│       ├── Strings.js
│       ├── URIValidator.js
│       ├── about.xul
│       ├── antlr.js
│       ├── clearClick.js
│       ├── clearClick.xul
│       ├── frameOptErr.xhtml
│       ├── iaUI.js
│       ├── noscript.js
│       ├── noscript.xbl
│       ├── noscriptBM.js
│       ├── noscriptBMOverlay.xul
│       ├── noscriptOptions.js
│       ├── noscriptOptions.xul
│       ├── noscriptOverlay.js
│       ├── noscriptOverlay.xul
│       ├── options-mobile.xul
│       └── overlay-mobile.xul
├── locale
└── skin
The extension contains the source code: you just need to unzip it. See Giorgio's response here.
The whole source code is publicly available in each and every XPI.
You've got it on your hard disk right now if you're a NoScript user; otherwise you can download it here.
You can examine and/or modify it by unzipping the XPI and the JAR inside, and "building" it back by re-zipping both.
It's been like that forever, since the very first version.

Resources