My company has a rat's nest of a GitHub repo in which multiple Jupyter notebook tutorials are shared with associated requirements.txt files.
I would like to streamline this so that ideally one would need to only open one of these notebooks, easily create the correct environment from the associated requirements.txt file within the notebook itself, and then just execute the notebook as normal.
An additional wrinkle is that not all of these dependencies can be installed via conda. I will need to use pip as the package installer.
Is there a way to do this entirely within a notebook without creating any additional files (e.g. environment.yml)?
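For what it's worth, one way to approach this from inside a notebook cell is to call pip through the interpreter that is running the kernel. This is only a minimal sketch: it installs into whatever environment the kernel already uses rather than creating a new one, and the function names and the default requirements.txt location are assumptions for illustration.

```python
import subprocess
import sys


def pip_install_command(requirements_path="requirements.txt"):
    """Build a pip command that targets the interpreter running this kernel."""
    # sys.executable is the Python behind the current kernel, so packages
    # land in the environment the notebook is actually using.
    return [sys.executable, "-m", "pip", "install", "-r", requirements_path]


def install_requirements(requirements_path="requirements.txt"):
    """Run the install and raise CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(requirements_path))
```

Calling install_requirements() in the first cell would then pull in the dependencies before the rest of the notebook runs. The kernel may need a restart if a package that was already imported gets upgraded.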
Related
I have a SageMaker notebook instance with two Jupyter notebook (.ipynb) files. When I had one Jupyter notebook, I was able to run it automatically with one Lambda function trigger and a lifecycle configuration.
Now I have two Jupyter notebooks and two corresponding Lambda function triggers. How can I run them based on the trigger by changing the lifecycle configuration script?
The trigger is a file upload to S3. Depending on the location the file is added to, the corresponding Jupyter notebook should run.
It is not possible to change the lifecycle configuration script on the fly. You need to stop the notebook instance and then change the script, which might not be ideal.
I would recommend writing the lifecycle configuration script so that it checks which S3 location received the file and, based on that, runs the command for the corresponding notebook.
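The dispatch logic could look roughly like the sketch below. The key prefixes and notebook names are made up, and how the uploaded object's key reaches the script on the instance is deliberately left open here, since that depends on your setup. The notebook is executed with jupyter nbconvert.

```python
import subprocess

# Hypothetical mapping from S3 key prefix to the notebook that handles it.
NOTEBOOK_FOR_PREFIX = {
    "incoming/sales/": "sales_report.ipynb",
    "incoming/inventory/": "inventory_report.ipynb",
}


def pick_notebook(s3_key, mapping=NOTEBOOK_FOR_PREFIX):
    """Return the notebook for the first matching prefix, or None."""
    for prefix, notebook in mapping.items():
        if s3_key.startswith(prefix):
            return notebook
    return None


def run_notebook(notebook_path):
    """Execute a notebook headlessly with nbconvert."""
    subprocess.check_call(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", notebook_path]
    )
```

The lifecycle script would call pick_notebook with the key of the uploaded file and pass the result to run_notebook, doing nothing when no prefix matches.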
It says it's a default environment but "You don't want to put programs into your base environment, though"
So what exactly should I use it for? Do other environments I create inherit from the base?
The base environment is where conda itself gets installed. It's best to use Miniconda, and install all the things you want into separate environments.
Other environments do not inherit packages from the base environment. BUT the bin/ directory of the base environment is in the search path for executables. So if you call conda from inside any of your environments (which usually don't have conda installed), the one from the base environment is used.
If you install other executables into the base environment, they can be called from your other environments. But you'll have a hell of a tough time distinguishing whether the things you can call are actually in your environment or in the base environment.
Therefore, it's best to just have conda in the base environment. And maybe other generic tools, like git or make, if you install that kind of tool with conda. But packages that are imported by your Python/R/whatever code do not belong in the base environment.
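If you want to check where a given tool is actually coming from, something along these lines can help. It is a rough sketch: the helper names are mine, and it assumes the active environment is identified by the CONDA_PREFIX variable that conda activate sets, falling back to the prefix of the running interpreter.

```python
import os
import shutil
import sys


def is_under(path, prefix):
    """True if path lies inside the directory tree rooted at prefix."""
    path = os.path.normpath(os.path.abspath(path))
    prefix = os.path.normpath(os.path.abspath(prefix))
    return os.path.commonpath([path, prefix]) == prefix


def locate(tool):
    """Return (resolved path, whether it lives in the active environment)."""
    found = shutil.which(tool)
    if found is None:
        return None, False
    # conda activate sets CONDA_PREFIX; fall back to the interpreter's prefix.
    active_prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    return found, is_under(found, active_prefix)
```

Running locate("git") from inside an activated environment returns the resolved path and False if the executable is being picked up from somewhere else on the search path, such as the base environment.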
Don't worry about disk space if you create multiple environments with the same packages. conda does a very good job with hard-linking the same packages into multiple environments to save space.
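If you want to convince yourself that two copies of a file in different environments share the same data on disk, you can compare their device and inode numbers. This is a small sketch with helper names of my own; which files you point it at is up to you.

```python
import os


def are_hard_linked(path_a, path_b):
    """True if both paths refer to the same file on disk.

    os.stat follows symlinks, so a symlink to the same file also compares equal.
    """
    stat_a, stat_b = os.stat(path_a), os.stat(path_b)
    return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)


def link_count(path):
    """Number of directory entries pointing at this file's data."""
    return os.stat(path).st_nlink
```

A link count above 1 on a file inside an environment is a hint that the same data is shared with the package cache or with another environment.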
The full Anaconda installer puts a ton of stuff into the base environment. That might seem convenient at first, but when you start creating new environments, you'll run into the problem I mentioned. You can call stuff from your new environment although it isn't installed there. Using Miniconda avoids this, at the cost of having to create a new environment before actually being able to use stuff. However, there's an anaconda meta-package which you can install to get the "ton of stuff" with one command.
I am curious to know if there are methods where I can add additional packages to the anaconda installer. I am basically looking for a solution for creating an anaconda installer which has some extra python packages added along with it. Thus the participants to whom I give the installer need not be worrying about Internet connectivity or to add additional commands.
This is meant for an introductory hands-on Python session. Hence the objective is to make the whole installation process as unconfusing as possible for the participants of the session.
I am aware of using Docker as well as using environments. I am looking for something simpler, say as seamless as the Anaconda installation, for my participants.
Currently, I am thinking of doing the following.
1) Provide the .tar.gz file of the packages along with the installer
2) After installation and creating environments, install the libraries using pip from the .tar.gz
python -m pip install c:\mymodule\great.tar.gz
Any method which is simpler than the above one is welcome.
From the conda documentation on creating custom channels:
If you do not wish to upload your packages to the Internet, you can
build a custom repository served either through a web server or
locally using a file:// URL.
The instructions on that page tell you how to create a local custom repository from conda packages. They're aimed at people who are building their own packages but as far as I can see you can also use the existing packages that you can download from the repositories at https://repo.continuum.io/pkgs/.
You can then use the file:// URL of that repository in the -c specification of conda create and/or conda install commands to set up the environment for your users to work in.
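As a rough illustration, the command could be assembled like this. The directory, environment name and package names are placeholders, and I am adding --override-channels on the assumption that you want conda to ignore the default channels entirely.

```python
from pathlib import Path


def channel_url(channel_dir):
    """Turn an absolute local directory into a file:// URL."""
    return Path(channel_dir).as_uri()


def conda_create_command(env_name, channel_dir, packages):
    """Build a conda create command that only consults the local channel."""
    return [
        "conda", "create", "--name", env_name,
        "--channel", channel_url(channel_dir),
        "--override-channels",  # ignore defaults and any channels from .condarc
        *packages,
    ]
```

The resulting list can be passed to subprocess.check_call, or joined with spaces and handed to participants as a single command.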
At work we have a central, read-only, Linux Anaconda installation, and several projects need library packages for their individual project members.
Is there a way to conda install packages in a writable area set aside for each project?
Our Linux servers are also not directly web connected, but we can transfer data from a Windows machine that is. Is there a way for conda on Windows to download data for our Linux install, so that I can transfer the downloaded files to Linux and then finish the install there without conda on Linux needing a direct web connection?
Thanks in advance :-)
The best answer to this question is a bit oblique: the Anaconda Distribution is designed for a single user on a single system with unrestricted access to the Internet. Any other use is considered "off label" and YMMV, though there are no license restrictions in place preventing you from trying to use it as you see fit. Anaconda Enterprise is the commercial product that is specifically designed for multi-user, server-deployed Anaconda with firewall restrictions. Security, governance, indemnification, support, collaboration, etc. etc. Check out https://www.continuum.io/ for more details.
But there are workarounds to achieve what you want, albeit complicated ones. For it to be reliable, reproducible, and maintainable you're going to end up reimplementing a lot of what is in Anaconda Enterprise. Here are some tips:
Check out the "conda in multi-user environments" documentation
Check out the "Centralized Anaconda installation" documentation
Regular user alice for project foo can do conda create -p /nfs/project/foo/envs/custompython --offline anaconda; conda activate /nfs/project/foo/envs/custompython; conda install pkg1 pkg2 pkg3
You're going to run into ownership/permission issues. If you have sensible umask values then when alice's colleague bob tries to update pkg2 in the foo project he'll discover that he can't unlink the files alice wrote there. There is stuff you can do (as the IT admin) with chown, or alice can do with chmod, but it's all a bit of a bother and there are lots of ways you can paralyze a conda environment because it is expecting "writability" to be binary for a particular environment. There is a long history in the conda GH issue tracker of people (myself included) shooting themselves in the foot by starting a conda env setup with one account and then making mods with another account that bork out halfway through, leaving everything inconsistent.
Be careful about .condarc files. My advice: avoid them everywhere but in the base Anaconda installation (say, inside /opt/anaconda/.condarc). All sorts of weird stuff can happen when multiple overlaying .condarc files come together (the docs referenced above discuss this).
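Before touching a shared environment with a different account, it can be worth checking whether that account can write everywhere it needs to. A minimal sketch (the function name is mine, and it only checks the write bit as seen by the current user, nothing more):

```python
import os


def find_unwritable(env_prefix):
    """List paths in and under env_prefix that the current user cannot write to."""
    blocked = []
    if not os.access(env_prefix, os.W_OK):
        blocked.append(env_prefix)
    for root, dirs, files in os.walk(env_prefix):
        for name in dirs + files:
            path = os.path.join(root, name)
            # Skip symlinks: os.access follows them and would test the target.
            if os.path.islink(path):
                continue
            if not os.access(path, os.W_OK):
                blocked.append(path)
    return sorted(blocked)
```

If bob runs find_unwritable on the project environment and gets back anything other than an empty list, an update started from his account is likely to fail partway through.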
People can create their own environments in an "offline" mode so long as the packages specified in those new environments (and their dependencies) are a subset of the packages available in the base environment (or subsequently added to the package cache), taking into account versions as well, of course.
You can download packages using your online Windows machine by grabbing them from repo.continuum.io and from anaconda.org. Make sure you download them for the right platform. But the challenge: you need to download a set of packages that will satisfy the dependencies of the package you want to install. There isn't a super easy way to get that information when you're offline.
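If you can get the dependency metadata onto a machine in some form, working out the full download list is a plain graph walk. The sketch below uses made-up package names and ignores version constraints entirely, which a real solve has to honor, so treat its output as a starting list rather than a guaranteed-installable set.

```python
def dependency_closure(package, depends_on):
    """Return the package plus everything it needs, directly or indirectly."""
    needed = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in needed:
            continue
        needed.add(current)
        stack.extend(depends_on.get(current, []))
    return sorted(needed)


# Toy, made-up dependency data; real names and versions will differ.
TOY_DEPENDS = {
    "pkg1": ["libA", "libB"],
    "libA": ["libC"],
    "libB": ["libC"],
    "libC": [],
}
```

Here dependency_closure("pkg1", TOY_DEPENDS) lists pkg1 together with libA, libB and libC, each exactly once even though libC is reached by two routes.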
Once you drop new packages into the Linux system's package cache be sure to re-run conda index.
Beware installing packages directly from their tarballs: this will not pick up any dependencies and does what is called a "force" install. So doing conda install /path/to/conda/pkg-ver.tar.bz2 is actually most similar to doing conda install --force --no-deps pkg=ver (though not identical, to be sure). --force means the install will happen NO MATTER WHAT, even if it will break your environment (violate existing package dependencies), and --no-deps means you won't get any of the dependencies of pkg installed.
I need to install software on Windows clients that are completely offline. That means they have no Internet access.
An example. Let's say I want to install Paint.Net. I go to a reference machine (with INet) and install Paint.Net with Chocolatey.
choco install paint.net -y
After the install is finished I have the software installed and two artifacts:
The package file "paint.net.nupkg" in %ChocolateyInstall%/lib/paint.net
and
the installer file "paint.net.4.0.6.install.zip" in %Temp%\chocolatey.
I now put these two files on a USB stick. Then I go to the offline machine, plug in the USB stick and want to install the package.
Is it possible to install the software without modifying the package? I am aware that inside the nupkg file there is a tools/chocolateyInstall.ps1 file with a $url variable defined. But I want to install the package without changing the package content or modifying the URL by hand.
I played around with the parameters --cache and --source but with little to no luck.
I have seen that this kind of question has been asked before. But never (to my knowledge) with the intent to run the installer file from the stick too (and not only the package file). So I hope this is not a duplicate.
Caching Downloads - Not Deterministic
While there are ways to set the original nupkg (with the version on it, not the one in the packages directory - use download from left side of package's page on the Chocolatey community package repository) and the cache onto a USB stick somewhere, it's not always deterministic that it will work. You can also override the cache location, so that the folder is somewhere not in TEMP. See choco config, choco config -h and choco config set cacheLocation c:\some\location to do this.
Create Your Own Packages - Better
For packages you need offline, you have the ability to manage your own packages and you can embed software right into the package. This is desired when you want to manage software offline, as most things on the community repository are subject to copyright law and distribution rights (which is why they don't simply have the software they represent embedded).
Creating and working with your own packages is very secure, reliable, and repeatable (and can be completely offline), but it does tend to take up time. If you are doing this for yourself, then it could override any time-savings you get as a consumer using Chocolatey and the community repository.
Internalized Packages - Best
The best thing you can do here is a process called internalizing, where you download and extract the package, download all of the resources and embed them in the package (or put them somewhere local/UNC share), edit the scripts to use those embedded/local resources and recompile the package.
This allows you to take advantage of existing package logic without the issue of the internet.
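To give an idea of what the "edit the scripts" step amounts to, here is a rough sketch that rewrites a $url assignment in the text of an install script so it points at a local file. It is only an illustration of the manual edit, not how the Chocolatey tooling does it; the sample paths are invented, and real install scripts may define their URLs differently.

```python
import re

# Matches a line such as:  $url = 'https://...'   (single or double quotes)
URL_ASSIGNMENT = re.compile(r"""^(\s*\$url\s*=\s*)(['"]).*?\2""", re.MULTILINE)


def point_url_at_local(script_text, local_path):
    """Replace the value assigned to $url with a local path."""
    def _swap(match):
        # Build the replacement in a function so backslashes in Windows
        # paths are not interpreted as regex escape sequences.
        return f"{match.group(1)}'{local_path}'"

    return URL_ASSIGNMENT.sub(_swap, script_text)
```

You would read tools/chocolateyInstall.ps1 from the extracted package, pass its text through point_url_at_local with the location of the installer you downloaded, write it back, and then recompile the package.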
For more details see Recompiling Packages and Package Internalizer - Automatically Recompile Packages.
NOTE: As a side note, we are thinking of offering the ability to auto recompile with Chocolatey Pro edition and not just the Business edition.
Organization Use of Chocolatey
Most organizations using Chocolatey are doing some combination of creating packages and recompiling packages, because they need absolute trust and control over those packages when being used in production scenarios.