I am new to Python and starting work on a large project that will be distributed to users. I am also the first in my company to be using it, and I wanted to get recommendations on the best way to install Python and packages, so that I don't head off in the wrong direction.
I require data analysis frameworks (pandas, numpy, scipy, matplotlib, statsmodels, pymongo) and my initial approach was to install Python 3.5 directly, and then use pip install on each package.
I ran into the same problems that others have reported [Unable to find vcvarsall] and resolved them. The next problem was BLAS and LAPACK missing when installing scipy. At this point I decided Anaconda was the way to go, rather than individual pip installs, and was easily able to set everything up.
One problem with Anaconda is that it installs a lot of packages which I will never use, and may not have some which I would like to use in future, e.g. TensorFlow (presumably I can use pip install to get extra ones that are not included?).
An in-between solution seems to be Miniconda, which I believe would have fixed the BLAS/LAPACK problem with scipy.
So my question is: can someone with experience of developing data analysis projects in Python, deployed to users' Windows desktops and with server-side components running on Linux, recommend what they would do if starting from scratch at a new organization?
(I'm currently in favour of heading down the Anaconda route.)
Personally, I think Anaconda (conda) is better. First of all, conda is a cross-platform package manager, and it is easy to install and use. Second, conda has the functionality of virtualenv: you can use conda create to create an environment. Finally, there are Anaconda Cloud and conda-forge, communities that can help you solve conda issues, build packages, and share ideas.
Moreover, Anaconda (conda) does install a lot of packages, but those are all dependencies. For example, when you "conda install scikit-learn", conda will automatically install its dependencies, numpy and scipy.
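One way to keep the Windows desktops and the Linux servers in step is to describe the environment in a single environment.yml file and create it on each machine with conda env create -f environment.yml. Here is a minimal sketch that writes such a file. The environment name, the channel choice and the helper name render_environment_yml are my own assumptions for illustration, and the package list is simply the one from the question, with TensorFlow shown as a pip-only extra.

```python
def render_environment_yml(name, channels, conda_packages, pip_packages=()):
    """Return the text of a conda environment file (environment.yml)."""
    lines = ["name: " + name, "channels:"]
    lines += ["  - " + channel for channel in channels]
    lines.append("dependencies:")
    lines += ["  - " + spec for spec in conda_packages]
    if pip_packages:
        # List pip itself as a conda dependency, then nest the
        # pip-only packages under a "pip:" entry.
        lines.append("  - pip")
        lines.append("  - pip:")
        lines += ["    - " + spec for spec in pip_packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical environment name; packages taken from the question.
    text = render_environment_yml(
        name="analysis",
        channels=["conda-forge"],
        conda_packages=["python=3.5", "pandas", "numpy", "scipy",
                        "matplotlib", "statsmodels", "pymongo"],
        pip_packages=["tensorflow"],
    )
    with open("environment.yml", "w") as handle:
        handle.write(text)
    print(text)
```

Starting from Miniconda and a file like this means only the listed packages and their dependencies get installed, rather than the full Anaconda set.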
I would like to replace as many packages on my computer as possible with the corresponding Chocolatey packages, so they can be upgraded automatically.
Is there a possibility to scan the installed Apps and point out which of them have a chocolatey equivalent?
Thanks a bunch!
Yes, but it's probably not what you want to hear.
You can do this with the Package Synchronization feature, but this feature requires a Chocolatey for Business license (C4B). Automatic Synchronization is a similarly named feature (all paid licenses have it), but this only removes packages for which the related software was uninstalled outside of Chocolatey.
With the free version, you will have to instead synchronize your package state manually.
Note: I don't recommend doing this for packages you don't maintain on the community feed. The likelihood of getting malware is low, but I'd be more concerned with a poor search term causing the wrong package to get installed instead, or accidentally installing a less "official" package maintained by someone who is not as diligent with updates or has abandoned the package.
However, this should be a perfectly safe procedure for packages you develop and maintain (and in reality you'll probably know all the package ids and versions anyways, so you'll skip straight to step 3). Doubly so if you are installing from a private feed you or your organization controls.
1. Query your installed programs from Windows. Take note of the version you have installed so you can install the correct version.
2. Do a package search for each one, recording the package ID for each one.
   choco list --order-by-popularity --version VERSION should help you avoid less official or less maintained packages for the same software, and get you the correct package version. Top of the list is the most popular.
   This is not perfect as some software really only gets installed by a single version of the package, but either self updates or pulls from a latest URL. In these cases the package version is not usually updated or accurate.
3. Install each software per package ID you have. Do this one command at a time so you can specify the correct version.
   choco install -n skips running the installation PowerShell script so it effectively only "imports" the package for management without performing the install.
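Once you have the package IDs and versions written down, the last step is mechanical enough to script. This is only a rough sketch: the helper name build_import_commands and the example IDs and versions are made up for illustration, and it just prints the commands so you can review them and run them one at a time.

```python
def build_import_commands(packages, skip_install_script=True):
    """Return one `choco install` command line per (package_id, version) pair."""
    commands = []
    for package_id, version in packages:
        parts = ["choco", "install", package_id, "--version", version]
        if skip_install_script:
            # -n skips the installation PowerShell script, so the package
            # is only registered with Chocolatey and nothing is reinstalled.
            parts.append("-n")
        commands.append(" ".join(parts))
    return commands


if __name__ == "__main__":
    # Hypothetical IDs and versions; substitute what you recorded in steps 1 and 2.
    matched = [("7zip", "19.0"), ("git", "2.30.0")]
    for command in build_import_commands(matched):
        print(command)
```

Printing instead of executing is deliberate: it leaves room to double check each ID against the search results before anything touches the machine.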
I am wondering if there is an easy way to install all packages that reside in a Conda Channel. To be specific I would like to install all packages from the ESRI channel.
Thank you.
No, and all is likely impossible anyway. The packages there range widely in date since last updated, so I highly doubt one could install every package into the same environment.
I'm new to miniconda and anaconda. I just wanted to get an opinion on anaconda vs miniconda in the hope of finding out what's better for my needs.
Currently I've got Miniconda installed, and every time I want to work on a project I have to create a new environment inside a project folder.
Before I download and install it, I wanted to know whether I would have to create an environment for each project, or whether they would all work at a system level.
If I have to create an environment even in anaconda, then there's no point me installing it as well.
Thanks.
"Currently I've got Miniconda installed, and every time I want to work on a project I have to create a new environment inside a project folder."
Yes, that's basically one of the two functions of Conda, and it's a good thing.
"Before I download and install it, I wanted to know whether I would have to create an environment for each project, or whether they would all work at a system level."
Having everything work at a system level sounds like a bad idea!
"If I have to create an environment even in Anaconda, then there's no point me installing it as well."
Rather: what would be the point of using Conda without environments?
Take a look at some of the questions with the anaconda and conda tags; there are mountains of people encountering issues which stem from using a single environment for everything.
As for the question of Miniconda vs Anaconda, I stick to using the former, and get all my packages from the popular conda-forge channel.
I want to install numpy for Python 2.7 without setting the environment path. I do not know whether that is possible, but my professor wants it done that way, so any advice would be appreciated.
I am not sure I understand your question correctly. You can simply delete Python from your environment path, but normally this is not desirable, since you then cannot call python from any directory. A better option is to create a virtual environment, or better still, use Anaconda. This will allow you to use various versions of Python in separate environments without any confusion or clashes between versions. You then install the respective numpy version within a specific environment. See: https://conda.io/docs/user-guide/tasks/manage-python.html
If you mean that you want to install numpy but do not have the privileges, then your answer can be found here: (Python) Use a library locally instead of installing it
I hope this helps. If not, then please clarify your question.
For me, on OS X, conda update --all often downgrades libraries while updating many others.
Is this usual? Or is it possibly something in my setup?
Earlier this year, it was pillow for many months.
Surprisingly, today it was several of the HDF5 related libraries, numba and llvmlite.
So conda update numba brings numba back to the most recent version, and so on with the other 8 libraries, but why doesn't conda update --all do this anyway?
It's a compatibility issue. Anaconda is a stable set of packages. When you update Anaconda, you update to this stable list.
However, when you update individual packages, they might cause incompatibility issues with the rest of the Anaconda distribution so they aren't considered stable. That's why when you use conda update --all, it gets you to the latest stable Anaconda distribution, which might or might not have the version of the individual package you wanted.
See here: https://github.com/ContinuumIO/anaconda-issues/issues/39
Edit: This behavior has changed. It now tries to increase the version of all packages (except Python across major/minor versions) such that no packages will be incompatible with each other.
See here: http://continuum.io/blog/advanced-conda-part-1#conda-update-all
Some libraries depend on specific lower versions for compatibility purposes. conda update --all will try to update packages as much as possible, but it always maintains compatibility with the version restrictions in each package's metadata. Note that the anaconda package does not come into play here (assuming you have a recent version of conda), because conda update --all ignores it.
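If you want to see exactly which packages went backwards after an update, you can save the output of conda list --export before and after and compare the two. The following is a rough sketch under a few assumptions: it expects the name=version=build lines that conda list --export prints, the function names are mine, and the version comparison only looks at the numeric parts, so it can misjudge versions with suffixes such as rc1.

```python
import re


def parse_export(text):
    """Map package name to version from `name=version=build` lines."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split("=")
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


def version_key(version):
    """Rough ordering key built only from the numeric runs of a version."""
    return tuple(int(number) for number in re.findall(r"\d+", version))


def find_downgrades(before_text, after_text):
    """Return (name, old_version, new_version) for packages that went down."""
    before = parse_export(before_text)
    after = parse_export(after_text)
    downgrades = []
    for name in sorted(before):
        if name in after and version_key(after[name]) < version_key(before[name]):
            downgrades.append((name, before[name], after[name]))
    return downgrades


if __name__ == "__main__":
    # Hypothetical file names: save `conda list --export` to each one
    # before and after running the update.
    with open("before.txt") as old, open("after.txt") as new:
        for name, old_version, new_version in find_downgrades(old.read(), new.read()):
            print(name, old_version, "->", new_version)
```

The resulting list is a reasonable starting point for checking which version restriction is holding each package back.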
Unfortunately, it's not always easy to see what depends on what, but there are some tricks. One way is to pin each package to the version you want and then run conda update --all. It should generate an unsatisfiability hint that will give you an idea of what is causing the problem. Another way is to search through the package metadata.
For numba, I can suggest that the problem is likely related to numbapro. There are a few packages that depend on hdf5. You can use conda info <package> to see the dependencies of a package (like conda info h5py).
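Searching the metadata by hand gets tedious, so here is a small sketch that does the reverse lookup for installed packages. It assumes each installed package has a JSON record in the environment's conda-meta directory with "name" and "depends" fields, which is what I see in my own environments; the function name find_dependents is just illustrative.

```python
import glob
import json
import os
import sys


def find_dependents(env_prefix, target):
    """Return {package: [depends entries naming target]} for an environment."""
    dependents = {}
    pattern = os.path.join(env_prefix, "conda-meta", "*.json")
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            record = json.load(handle)
        # A depends entry looks like "hdf5 1.8.15*": the name comes first,
        # followed by an optional version restriction.
        matches = [
            spec for spec in record.get("depends", [])
            if spec.split()[:1] == [target]
        ]
        if matches:
            name = record.get("name", os.path.basename(path))
            dependents[name] = matches
    return dependents


if __name__ == "__main__":
    # sys.prefix is the active environment when run with that environment's Python.
    for name, specs in find_dependents(sys.prefix, "hdf5").items():
        print(name, "requires", ", ".join(specs))
```

Any package listed with a tight version restriction on the target is a likely candidate for what is forcing the downgrade.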