Modifying conda configuration file does not reflect changes in environment - anaconda

I am trying to change the default installation location for Conda environments because the system I am using (a supercomputing cluster) has a ~20GB user home quota. Under normal circumstances, this could easily be done by editing ~/.condarc and adding an envs_dirs section, which is explained quite well in this question and answer.
However, it seems that the compute environment I am in (i.e., the supercomputer) does not let me modify the priority of the various environment locations. In an ideal world, I would be able to place /work/helikarlab/joshl/.conda/envs, which is on a high-storage partition, at the top of the list so that I can install additional environments if needed.
My ~/.condarc is configured as follows:
env_prompt: ({name})
channels:
- conda-forge
- bioconda
- defaults
auto_activate_base: false
envs_dirs:
- /work/helikarlab/joshl/.conda/envs/
Yet, I observe the following entries with conda config --show envs_dirs
envs_dirs:
- /home/helikarlab/joshl/.conda/envs
- /util/opt/anaconda/deployed-conda-envs/packages/python/envs
- /util/opt/anaconda/deployed-conda-envs/packages/perl/envs
- /util/opt/anaconda/deployed-conda-envs/packages/git/envs
- /util/opt/anaconda/deployed-conda-envs/packages/nano/envs
- /work/helikarlab/joshl/.conda/envs
- /home/helikarlab/joshl/.conda/envs/base_env/envs
Does anyone know why my attempt to set envs_dirs is not working? How can I give /work/helikarlab/joshl/.conda/envs the highest priority?
Additional Info
Here is the result from conda config --show-sources
==> /util/opt/anaconda/4.9.2/.condarc <==
allow_softlinks: False
auto_update_conda: False
auto_activate_base: False
notify_outdated_conda: False
repodata_threads: 4
verify_threads: 4
execute_threads: 2
aggressive_update_packages: []
pkgs_dirs:
- ${WORK}/.conda/pkgs
- ${HOME}/.conda/pkgs
channel_priority: disabled
channels:
- hcc
- https://conda.anaconda.org/t/<TOKEN>/hcc
- conda-forge
- bioconda
- defaults
- file:///util/opt/conda_repo
==> /home/helikarlab/joshl/.condarc <==
auto_activate_base: False
env_prompt: ({name})
envs_dirs:
- /work/helikarlab/joshl/.conda/envs/
channel_priority: disabled
channels:
- conda-forge
- bioconda
- defaults
==> envvars <==
envs_path:
- /home/helikarlab/joshl/.conda/envs
- /util/opt/anaconda/deployed-conda-envs/packages/python/envs
- /util/opt/anaconda/deployed-conda-envs/packages/perl/envs
- /util/opt/anaconda/deployed-conda-envs/packages/git/envs
- /util/opt/anaconda/deployed-conda-envs/packages/nano/envs

Background: Conda's configuration priorities
As documented in the post "The Conda Configuration Engine for Power Users", Conda draws configuration values from four sources, listed from lowest to highest priority:
Default values in the Python code
.condarc configuration files (system < user < environment < working directory)
Environment variables (CONDA_* variables)
Command-line specifications
Problem: Environment variable prioritized
We can observe how this plays out in OP's case, with the --show-sources result. Specifically, there are three places where envs_dirs is defined:
System-level configuration file at /util/opt/anaconda/4.9.2/.condarc
User-level configuration file at /home/helikarlab/joshl/.condarc
Environment variable CONDA_ENVS_PATH[1]
Since the environment variable takes priority and lists /home/helikarlab/joshl/.conda/envs first, that location wins no matter what is set with conda config or in .condarc files.
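You can confirm that the shell is injecting this value by inspecting the variable directly on the cluster; its entries match the envvars section of the --show-sources output above:
echo "$CONDA_ENVS_PATH"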
Workarounds
All the following workarounds involve manipulating the environment variable. It is unclear when the variable is set (probably via a system-level shell configuration file), but it should be reliable to adjust it by appending one of the snippets below to a user-level shell configuration file (e.g., ~/.bashrc, ~/.bash_profile, ~/.zshrc).
Option 1: Unset variable
One could completely remove the variable with
unset CONDA_ENVS_PATH
This would then allow the user-level .condarc to take priority.
However, this variable also appears to provide the locations of several system-level shared environments. It is unclear how integral these shared environments are to normal functionality, so removing the variable altogether could have additional consequences.
Option 2: Replace value
Conveniently, the default and desired locations differ only in the leading /home versus /work. This can be changed directly in the variable with:
export CONDA_ENVS_PATH=${CONDA_ENVS_PATH/\/home/\/work}
Option 3: Prepend desired default
The most general override would be to prepend the desired default path to the environment variable:
export CONDA_ENVS_PATH="/work/helikarlab/joshl/.conda/envs/:${CONDA_ENVS_PATH}"
This is probably the most robust, since it assumes nothing about the inherited value.
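If your shell configuration file can be sourced more than once (e.g., nested login shells), a small guard keeps the path from being prepended repeatedly; a minimal sketch, assuming bash:
# Prepend the work partition only if it is not already in the list.
if [[ ":${CONDA_ENVS_PATH}:" != *":/work/helikarlab/joshl/.conda/envs/:"* ]]; then
  export CONDA_ENVS_PATH="/work/helikarlab/joshl/.conda/envs/:${CONDA_ENVS_PATH}"
fi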
Additional Note
Users with small disk quotas in default locations should also consider moving the package cache (pkgs_dirs) to coordinate with the environments directory. Details in this answer.
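For users whose cluster does not already redirect the cache (the system .condarc above happens to list ${WORK}/.conda/pkgs first), a one-liner like the following adds a work-partition cache to the user-level .condarc; the path mirrors the one used above and should be adjusted:
conda config --add pkgs_dirs /work/helikarlab/joshl/.conda/pkgs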
[1]: CONDA_ENVS_DIRS and CONDA_ENVS_PATH are interchangeable; however, only one can be defined at a time. The former is the contemporary usage, so I believe the latter is supported for backward compatibility.

Related

Bash variable with a name starting with 'DYLD' is not loaded into the environment: bug or feature?

On macOS, try the following:
export AYLD_VARIABLE=Aa
export BYLD_VARIABLE=Bb
export CYLD_VARIABLE=Cc
export DYLD_VARIABLE=Dd
export EYLD_VARIABLE=Ee
export FYLD_VARIABLE=Ff
env | grep VARIABLE
The variable DYLD_VARIABLE is not displayed among the others. It cannot be exported.
However, it may be set and used:
DYLD_VARIABLE=DdDd
echo $DYLD_VARIABLE
It is just not present in env.
I know that macOS uses the 'DYLD' prefix for some of its internal dynamic-linker variables, but that seems no reason to discriminate against the prefix as a whole.
It is not just an academic issue. I failed to do
export DYLD_LIBRARY_PATH=/Users/username/Downloads/instantclient_19_8
which is probably required to install DBD::Oracle for Perl.
How can I set it up?
This is a feature of the hardened runtime environment. Several DYLD_* variables can be used to inject malicious libraries into trusted binaries, so those variables are removed when a binary that uses the hardened runtime environment loads (unless it has the com.apple.security.cs.allow-dyld-environment-variables or com.apple.security.get-task-allow entitlement).
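A quick way to see where the stripping happens (a sketch; it assumes the stock /usr/bin/env, which is a protected binary on modern macOS):
export DYLD_VARIABLE=Dd
echo "$DYLD_VARIABLE"      # prints Dd: the current shell still holds the variable
/usr/bin/env | grep DYLD   # prints nothing: dyld stripped DYLD_* as env loaded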
I'm not familiar with how DBD::Oracle is set up, but if it depends on setting DYLD_* variables, it seriously needs to be rewritten to avoid that.
For more info, see Apple's developer documentation on the hardened runtime, and the WWDC19 presentation "All About Notarization" (starting at 16:19, or page 75 of the slide deck).

Avoid path redundancy in Gitlab CI include

To improve the structure of my Gitlab CI file I include some specific files, like for example
include:
- '/ci/config/linux.yml'
- '/ci/config/windows.yml'
# ... more includes
To avoid the error-prone redundancy of the path I thought to put it into a variable, like:
variables:
  CI_CONFIG_DIR: '/ci/config'
include:
- '${CI_CONFIG_DIR}/linux.yml' # ERROR: "Local file `${CI_CONFIG_DIR}/linux.yml` does not exist!"
- '${CI_CONFIG_DIR}/windows.yml'
# ... more includes
But this does not work. Gitlab CI claims that ${CI_CONFIG_DIR}/linux.yml does not exist, although the documentation says that variables in include paths are allowed, see https://docs.gitlab.com/ee/ci/variables/where_variables_can_be_used.html#gitlab-ciyml-file.
What also didn't work was to include a file /ci/config/main.yml and from that include the specific configurations without paths:
# /ci/config/main.yml
include:
- 'linux.yml' # ERROR: "Local file `linux.yml` does not exist!"
- 'windows.yml'
# ... more includes
How can I make this work or is there an alternative to define the path in only one place without making it too complicated?
This does not seem to be implemented at the moment; there is an open issue for it in the backlog.
Also, although the documentation says you can use variables within include sections, that applies only to predefined variables.
See if GitLab 14.2 (August 2021) can help:
Use CI/CD variables in include statements in .gitlab-ci.yml
You can now use variables as part of include statements in .gitlab-ci.yml files.
These variables can be instance, group, or project CI/CD variables.
This improvement provides you with more flexibility to define pipelines.
You can copy the same .gitlab-ci.yml file to multiple projects and use variables to alter its behavior.
This allows for less duplication in the .gitlab-ci.yml file and reduces the need for complicated per-project configuration.
See Documentation and Issue.

Are there any vars that Ansible can merge?

Today I noticed that Ansible won't merge vars.
For example when I have something like
---
lvm_roles:
  postgresql:
    size: '10g'
    path: '/var/lib/postgresql'
And in another place I have for example
---
lvm_roles:
  sonarqube:
    size: '10g'
    path: '/opt/sonarqube'
Ansible won't merge these facts. I am not sure about the precedence, but I think the first one wins, without errors or warnings. IMHO, a dangerous feature for a configuration management tool.
Are there any vars that Ansible can merge? Lists and hashes won't. Is there a workaround of some sort for this?
This is a significant shortcoming of Ansible, because "facts" can depend on what you are provisioning. The inability to merge "facts" makes it necessary to hard-code and duplicate the stuff that you want to be configurable.
For example when I create one file with
lvm_roles:
  postgresql:
    size: '10g'
    path: '{{ postgresql_home }}'
  sonarqube:
    size: '10g'
    path: '{{ sonar_home }}'
This will not work because sonar_home is not defined on the postgresql node, and on the sonarqube node, postgresql_home is not defined. The ability to use vars flexibly is greatly impacted if merging is not possible.
Extract of a default ansible.cfg file:
# if inventory variables overlap, does the higher precedence one win
# or are hash values merged together? The default is 'replace' but
# this can also be set to 'merge'.
#hash_behaviour = replace
You can therefore change this behavior by setting hash_behaviour = merge.
I would not change that on a system-wide basis, as it might break other projects/roles that rely on the default behavior. You can instead place an ansible.cfg at the root of the specific project that really needs this.
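Alternatively, the same setting can be enabled for a single run via the corresponding environment variable, leaving every ansible.cfg untouched (the playbook name here is illustrative):
ANSIBLE_HASH_BEHAVIOUR=merge ansible-playbook site.yml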
Meanwhile, as @dgw pointed out with a specific example, I've always been able to keep the default behavior by carefully choosing where to place my variables (group or host in inventory, included file, playbook...) and merging them myself when needed.

How to share environment variables across AWS CodeDeploy steps?

I am working on a new deployment strategy that leverages AWS CodeDeploy. The project I work on has many environments (e.g: preproduction, production) and instances (e.g: EMEA, US, APAC).
I have the basic scaffolding working OK, but I noticed that environment variables set in the BeforeInstall hook cannot be retrieved from other steps (for instance, AfterInstall).
Is there a way to share environment variables across AWS CodeDeploy steps?
Content of appspec.yml:
version: 0.0
os: linux
files:
  - source: /
    destination: /tmp/code-deploy
hooks:
  BeforeInstall:
    - location: utils/delivery/aws/CodeDeploy/before_install.sh
      timeout: 300
  AfterInstall:
    - location: utils/delivery/aws/CodeDeploy/after_install.sh
      timeout: 300
  ApplicationStart:
    - location: utils/delivery/aws/CodeDeploy/application_start.sh
      timeout: 300
  ValidateService:
    - location: utils/delivery/aws/CodeDeploy/validate_service.sh
      timeout: 300
I set an environment variable in before_install.sh:
export ENVIRONMENT=preprod
And if I reference it in after_install.sh:
$ echo $ENVIRONMENT
$
Nothing.
Thank you for your help on this one!
You could put the export into a temporary file and then source that file. Within before_install.sh:
ENVIRONMENT="preprod"
echo "export ENVIRONMENT=\"$ENVIRONMENT\"" > "/path/to/file"
Note: With this method, you are no longer exporting the variable in before_install.sh. You are simply writing a file to be sourced in after_install.sh:
source "/path/to/file"
echo "$ENVIRONMENT"
You should consider setting those variables up in the user data phase of the instance launch, instead of at deploy time. This makes them available to all CodeDeploy scripts for the life of the instance.
The type of data you describe (e.g., environment) is associated with the instance itself and would not normally change during a code deployment.
In your user data you would set an instance-level variable like this:
echo "ENVIRONMENT=preprod" >> /etc/environment
(Note: /etc/environment takes plain KEY=value lines; the often-seen export ENVIRONMENT=... >> /etc/environment appends nothing, because export produces no output.)
Another advantage of this approach is that your app itself may want to consult these variables when it launches, to provide environment-specific configuration.
If you use CloudFormation, you can set the environment up as a parameter and pass it on to the user data script. In this way you can launch the stack and its resources with the appropriate parameters, and launch consistent instances for any environment.

How do the various ways of setting GHC options in haskell-stack work together

While setting up a deploy pipeline for optimised builds of a server application, I ran into some trouble getting the GHC options right with stack-1.6.5.
In particular, the docs don't make it clear to me how the various ways to specify GHC options work together, or when and how each is applied.
As far as I can tell, there are several ways of specifying GHC options:
globally as ghc-options: in ~/.stack/config.yaml and/or /etc/stack/config.yaml, per package or with "$locals", "$targets" or "$everything"
in the project stack.yaml file, per package or with "$locals", "$targets" or "$everything"
in the project package.yaml/.cabal file, globally or per target
in a dependency stack.yaml/package.yaml/.cabal files
on the stack command line via --ghc-options
and there is the apply-ghc-options: setting (locals/targets/everything) in stack.yaml and in ~/.stack/config.yaml and/or /etc/stack/config.yaml
I'd like to know which options are applied in the different build phases snapshots/locals/targets in which order and in which cases they are additive or override options given elsewhere.
Good question; this is not adequately documented. These tend to be additive. Most of the logic for this is here: https://github.com/commercialhaskell/stack/blob/657937b0ac5dbef29114b43e9c69e2b57198af85/src/Stack/Build/Source.hs#L131. Here's the order, where later items in the list come later in the options provided to ghc:
Options specified in the package.yaml / cabal file.
$everything in ghc-options in stack.yaml
$locals in ghc-options in stack.yaml
$targets in ghc-options in stack.yaml
Special options like -fhpc (--coverage) / -fprof-auto -fprof-cafs (--profile) / -g (--no-strip).
Options specified via --ghc-options on the CLI
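Because GHC generally lets a later flag override an earlier conflicting one (e.g., a later -O2 beats an earlier -O0), this ordering makes the command line a convenient final override; a sketch, with an illustrative target name:
stack build --ghc-options='-O2 -Wall' my-server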
There is currently an issue where $everything / $locals / $targets ghc-options specified in .stack/config.yaml are not additive; instead, they are shadowed by the project stack.yaml. There is a PR fixing this, which will probably get merged at some point: https://github.com/commercialhaskell/stack/pull/3781
