Install package in separate area for read-only Anaconda Linux install - anaconda

At work we have a central, read-only, Linux Anaconda installation, and several projects need library packages for their individual project members.
Is there a way to conda install packages in a writable area set aside for each project?
Our Linux servers are also not directly connected to the web, but we can transfer data from a Windows machine that is. Is there a way for conda on Windows to download the data for our Linux install, so that I can transfer the downloaded files to Linux and finish the install there, without conda on Linux needing a direct web connection?
Thanks in advance :-)

The best answer to this question is a bit oblique: the Anaconda Distribution is designed for a single user on a single system with unrestricted access to the Internet. Any other use is considered "off label" and YMMV, though there are no license restrictions in place preventing you from trying to use it as you see fit. Anaconda Enterprise is the commercial product that is specifically designed for multi-user, server-deployed Anaconda with firewall restrictions. Security, governance, indemnification, support, collaboration, etc. etc. Check out https://www.continuum.io/ for more details.
But there are "work around" ways to achieve what you want, albeit complicated ones. For it to be reliable, reproducible, and maintainable you're going to end up reimplementing a lot of what is in Anaconda Enterprise. Here are some tips:
Check out the "conda in multi-user environments" documentation
Check out the "Centralized Anaconda installation" documentation
Regular user alice for project foo can do conda create -p /nfs/project/foo/envs/custompython --offline anaconda; conda activate /nfs/project/foo/envs/custompython; conda install pkg1 pkg2 pkg3
You're going to run into ownership/permission issues. If you have sensible umask values then when alice's colleague bob tries to update pkg2 in the foo project he'll discover that he can't unlink the files alice wrote there. There is stuff you can do (as the IT admin) with chown, or alice can do with chmod, but it's all a bit of a bother and there are lots of ways you can paralyze a conda environment because it is expecting "writability" to be binary for a particular environment. There is a long history in the conda GH issue tracker of people (myself included) shooting themselves in the foot by starting a conda env setup with one account and then making mods with another account that bork out halfway through, leaving everything inconsistent.
Be careful about .condarc files. My advice: avoid them everywhere but in the base Anaconda installation (say, inside /opt/anaconda/.condarc). All sorts of weird stuff can happen when multiple overlaying .condarc files come together (the docs referenced above discuss this).
People can create their own environments in an "offline" mode so long as the packages specified in those new environments (and their dependencies) are a subset of the packages available in the base environment (or subsequently added to the package cache), taking into account versions as well, of course.
You can download packages using your online Windows machine by grabbing them from repo.continuum.io and from anaconda.org. Make sure you download them for the right platform. But the challenge: you need to download a set of packages that will satisfy the dependencies of the package you want to install. There isn't a super easy way to get that information when you're offline.
Once you drop new packages into the Linux system's package cache be sure to re-run conda index.
Beware installing packages directly from their tarballs: this will not pick up any dependencies and does what is called a "force" install. So doing conda install /path/to/conda/pkg-ver.tar.bz2 is actually most similar to doing conda install --force --no-deps pkg=ver (though not identical, to be sure). --force means the install will happen NO MATTER WHAT, even if it will break your environment (violate existing package dependencies), and --no-deps means you won't get any of the dependencies of pkg installed.
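The permission problem described above can be checked before anyone runs an install. The following is a minimal sketch, not part of conda: the helper name find_unwritable is hypothetical, and it only reports what os.access says for the current user, which may not match what an NFS server will actually allow.

```python
import os
from pathlib import Path


def find_unwritable(env_prefix):
    """Return paths under env_prefix that the current user cannot write to.

    An empty list suggests the environment is uniformly writable for this
    user; any entry is a place where an update could stop halfway.
    """
    blocked = []
    # The prefix itself must be writable to add or remove top-level entries.
    if not os.access(env_prefix, os.W_OK):
        blocked.append(Path(env_prefix))
    for root, dirs, files in os.walk(env_prefix):
        for name in dirs + files:
            path = Path(root) / name
            # Symlinks are skipped: os.access would follow them and test
            # the target, which may live outside the environment.
            if path.is_symlink():
                continue
            if not os.access(path, os.W_OK):
                blocked.append(path)
    return blocked
```

For example, bob could call find_unwritable("/nfs/project/foo/envs/custompython") before updating pkg2 and ask alice or the admin to fix permissions if anything is listed.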

Related

Why are there multiple copies of conda files?

I installed Miniconda a while ago, and since then I've noticed there seem to be several copies of the same files (or files with very similar names) in different locations on my computer.
For example, almost the exact same files in my folder "C:/ProgramData/Miniconda/pkgs" are also in the folder "C:/Users/me/.conda/pkgs". I should note that the only other things in the ".conda" folder are an "environments.txt" file and an "envs" folder with a file called "conda_envs_dir_test".
I've also noticed that the folder "C:/ProgramData/Miniconda/Lib/site-packages" also contains files with very similar names.
Anyway, I wanted to ask if all this is necessary, and why? Sorry if this seems like a weird question. I'm still relatively new to programming.
Conda Package Caching
Conda downloads and unpacks packages into a package cache, and then uses hardlinking to install those packages into environments. One can freely delete the files in the package caches, though this undermines Conda's ability to minimize redundancy across environments going forward. The safest way to clear the package cache is to use the command
conda clean -tp
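The hardlinking mentioned above is why the "copies" in an environment take no extra space and why removing a cache entry does not break the environment. This is a small illustration with made-up directory and file names (pkgs, envs/demo, module.py), using plain operating-system hard links rather than conda itself; it assumes the filesystem supports hard links.

```python
import os
import tempfile
from pathlib import Path


def demo_hardlink(workdir):
    """Hard-link a file from a fake package cache into a fake environment."""
    cache = Path(workdir) / "pkgs"
    env = Path(workdir) / "envs" / "demo"
    cache.mkdir(parents=True)
    env.mkdir(parents=True)

    cached = cache / "module.py"
    cached.write_text("print('hello')\n")

    linked = env / "module.py"
    os.link(cached, linked)  # a second name for the same data, not a copy

    same_inode = cached.stat().st_ino == linked.stat().st_ino
    link_count = cached.stat().st_nlink

    # Deleting the cache entry removes one name; the data stays reachable
    # through the environment's link.
    cached.unlink()
    survives = linked.read_text() == "print('hello')\n"
    return same_inode, link_count, survives


with tempfile.TemporaryDirectory() as tmp:
    print(demo_hardlink(tmp))
```

The three values report whether both names point at the same file, how many names the file had, and whether the environment's file was still readable after the cache entry was deleted.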
Multiple Package Caches
It should be noted that you appear to have two package caches, a system-level cache at C:/ProgramData/Miniconda/pkgs and a user-level cache at C:/Users/me/.conda/pkgs. This occurs when users install with the "Install for All Users" option. This is typically not recommended for regular end users, but rather more for System Administrators who are managing a multi-user installation. Conda functions perfectly (and arguably with less hassle) without ever needing elevated privileges.
All that to say, you may need to elevate your privileges for the conda clean command to also clear out the system-level cache. Additionally, if you haven't been using it too long, you may consider uninstalling the system-level install and reinstalling at the user level.

What's the purpose of the "base" (for best practices) in Anaconda?

It says it's a default environment but "You don't want to put programs into your base environment, though"
So what exactly should I use it for? Do other environments I create inherit from the base?
The base environment is where conda itself gets installed. It's best to use Miniconda, and install all the things you want into separate environments.
Other environments do not inherit packages from the base environment. BUT the bin/ directory of the base environment is in the search path for executables. So if you call conda from inside any of your environments (which usually don't have conda installed), the one from the base environment is used.
If you install other executables into the base environment, they can be called from your other environments. But you'll have a hell of a tough time distinguishing whether the things you can call are actually in your environment, or in the base environment.
Therefore, it's best to just have conda in the base environment. And maybe other generic tools, like git or make, if you install that kind of tool with conda. But packages that are imported by your Python/R/whatever code do not belong in the base environment.
Don't worry about disk space if you create multiple environments with the same packages. conda does a very good job with hard-linking the same packages into multiple environments to save space.
The full Anaconda installer puts a ton of stuff into the base environment. That might seem convenient at first, but when you start creating new environments, you'll run into the problem I mentioned. You can call stuff from your new environment although it isn't installed there. Using Miniconda avoids this, at the cost of having to create a new environment before actually being able to use stuff. However, there's an anaconda meta-package which you can install to get the "ton of stuff" with one command.
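One way to tell whether an executable comes from the active environment or from somewhere else on the search path (such as base) is to ask where it resolves. This is a rough sketch: executable_origin is a hypothetical helper, and it assumes the script is run with the active environment's Python, so that sys.prefix is that environment's directory.

```python
import os
import shutil
import sys
from pathlib import Path


def executable_origin(name, env_prefix=sys.prefix, search_path=None):
    """Return (path, in_env) for the first match of name on the search path.

    in_env is True when the match lives somewhere under env_prefix.
    search_path defaults to the PATH environment variable.
    """
    found = shutil.which(name, path=search_path)
    if found is None:
        return None, False
    path = Path(os.path.abspath(found))
    prefix = Path(os.path.abspath(env_prefix))
    return path, prefix in path.parents
```

Calling executable_origin("conda") from inside a non-base environment would typically return in_env as False, matching the behaviour described above, while a tool installed into the environment itself returns True.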

Should I be concerned about python 3.6 having no code signature?

I recently downloaded a program that monitors all incoming and outgoing connections and lets me assign firewall rules on the fly. It also conveniently checks the code signature of programs to verify I am not unknowingly running a modified program.
Now whenever I try to run python3.6.6, I get this little warning. Me being paranoid, I deny access and as a consequence am unable to confidently use my anaconda distribution which uses this executable.
which python --> ~/Users/me/anaconda3/bin/python
I already compared the md5 hash of the original tarball file with a new one downloaded directly from the anaconda repository and they matched.
I am not sure exactly how to proceed...
Is there a way to manually reinstall python into anaconda without using conda? Or would I be better off deleting my anaconda distribution and performing a fresh install? OR is there an alternative that is much simpler and preferred :)?
Thanks
So I decided to back up my anaconda3 directory and then remove the entire directory, which included the python3.6 in question.
I reinstalled anaconda3 with the newest release and Little Snitch reported the same suspicion, this time with python 3.7 as that was the newest version available.
This unexpected result drove me to dig deeper and following my research I came across this https://github.com/Homebrew/homebrew-core/issues/20193 and this https://github.com/Homebrew/homebrew-core/issues/18870.
In summary, it is an invalid code signature, but it's also a bug.

Adding Additional packages to Anaconda Installer

I am curious to know if there are methods where I can add additional packages to the anaconda installer. I am basically looking for a solution for creating an anaconda installer which has some extra python packages added along with it. That way the participants to whom I give the installer need not worry about Internet connectivity or run additional commands.
This is meant for an introductory hands-on python session. Hence the objective is to make the whole installation process as unconfusing as possible for the participants of the session.
I am aware of using docker as well as using environments. I am looking for something simpler, say as seamless as the anaconda installation for my participants.
Currently, I am thinking of doing the following.
1) Provide the .tar.gz file of the packages along with the installer
2) After installation and creating environments, install the libraries using pip from the .tar.gz
python -m pip install c:\mymodule\great.tar.gz
Any method which is simpler than the above one is welcome.
From the conda documentation on creating custom channels:
If you do not wish to upload your packages to the Internet, you can
build a custom repository served either through a web server or
locally using a file:// URL.
The instructions on that page tell you how to create a local custom repository from conda packages. They're aimed at people who are building their own packages but as far as I can see you can also use the existing packages that you can download from the repositories at https://repo.continuum.io/pkgs/.
You can then use the file:// URL of that repository in the -c specification of conda create and/or conda install commands to set up the environment for your users to work in.
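As a sketch of that last step, the file:// URL can be derived from the channel directory and passed to -c. The helper name, the directory /srv/conda-channel, the environment name workshop, and the package list are all placeholders, and the directory is assumed to have already been set up as a channel following the documentation quoted above.

```python
from pathlib import Path


def conda_create_command(channel_dir, env_name, packages):
    """Build a conda create command that uses a local file:// channel."""
    # as_uri() needs an absolute path, so make it absolute first.
    channel_url = Path(channel_dir).absolute().as_uri()
    return ["conda", "create", "-n", env_name, "-c", channel_url, *packages]


# Placeholder values for illustration only.
command = conda_create_command("/srv/conda-channel", "workshop", ["numpy", "pandas"])
print(" ".join(command))
```

The resulting list can be handed to subprocess.run on the participants' machines, or the printed line can be pasted into a setup script shipped alongside the installer.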

How to install a Chocolatey package completely offline?

I need to install software on Windows clients that are completely offline. That means they have no Internet access.
An example. Let's say I want to install Paint.Net. I go to a reference machine (with INet) and install Paint.Net with Chocolatey.
choco install paint.net -y
After the install is finished I have the software installed and two artifacts:
The package file "paint.net.nupkg" in %ChocolateyInstall%/lib/paint.net
and
the installer file "paint.net.4.0.6.install.zip" in %Temp%\chocolatey.
I now put these two files on a USB stick. Then I go to the offline machine, plug in the USB stick and want to install the package.
Is it possible to install the software without modifying the package? I am aware that inside the nupkg file there is a tools/chocolateyInstall.ps1 file with a $url variable defined. But I want to install the package without changing the package content or modifying the URL by hand.
I played around with the parameters --cache and --source but with little to no luck.
I have seen that this kind of question has been asked before. But never (to my knowledge) with the intent to run the installer file from the stick too (and not only the package file). So I hope this is not a duplicate.
Caching Downloads - Not Deterministic
While there are ways to set the original nupkg (with the version on it, not the one in the packages directory - use download from left side of package's page on the Chocolatey community package repository) and the cache onto a USB stick somewhere, it's not always deterministic that it will work. You can also override the cache location, so that the folder is somewhere not in TEMP. See choco config, choco config -h and choco config set cacheLocation c:\some\location to do this.
Create Your Own Packages - Better
For packages you need offline, you have the ability to manage your own packages and you can embed software right into the package. This is what you want when managing software offline, since most things on the community repository are subject to copyright law and distribution rights (which is why those packages don't simply embed the software they represent).
Creating and working with your own packages is very secure, reliable, and repeatable (and can be completely offline), but it does tend to take up time. If you are doing this for yourself, then it could override any time-savings you get as a consumer using Chocolatey and the community repository.
Internalized Packages - Best
The best thing you can do here is a process called internalizing, where you download and extract the package, download all of the resources and embed them in the package (or put them somewhere local/UNC share), edit the scripts to use those embedded/local resources and recompile the package.
This allows you to take advantage of existing package logic without the issue of the internet.
For more details see Recompiling Packages and Package Internalizer - Automatically Recompile Packages.
NOTE: As a side note, we are thinking of offering the ability to auto recompile with Chocolatey Pro edition and not just the Business edition.
Organization Use of Chocolatey
Most organizations using Chocolatey are doing some combination of creating packages and recompiling packages, because they need absolute trust and control over those packages when being used in production scenarios.
