NPM caching similar to a local Maven cache

Gradle's dependency management system stores downloaded artifacts in a local cache, similar to Maven's local repository. When a build requests the same dependency again, it is simply retrieved from the cache, avoiding any network transfer of the artifact.
I'm trying to replicate this behavior with NPM for building JavaScript projects. I was expecting NPM to support a global node_modules cache, but installing a package "globally" in NPM has a different meaning: the package's binaries are added to the PATH so that it can be used as a CLI tool.
Reading the documentation for npm install, the standard behavior is to install packages into a local node_modules directory. But this would mean many duplicated packages across the system, wasting valuable disk space. It also poses a problem for doing clean production builds, since ideally the node_modules directory should be blown away each time.
Does NPM support something like Gradle's Maven-style caching? The documentation on npm cache doesn't make it any clearer how it is meant to be used. What's more, it's not obvious whether a caching strategy with NPM is safe across multiple parallel builds.
This seems like such a basic requirement for busy CI environments that it must have been solved before. I found the npm-cache tool, which seems to offer this support, but it would be much better if caching were supported natively in npm itself.
Thanks!

IMHO it is a pity that the makers did not learn from tools like Maven that were already there. If you are doing microservices and have many apps on your machine, and perhaps also multiple branches or a local Jenkins, you will have each dependency N*M times on disk, which is an extraordinary waste of disk space and performance. So you have to be aware that Java and .NET/C# are mature ecosystems, while the JavaScript ecosystem is still in its childhood, with lots of flaws and rough edges. But JavaScript is evolving fast, so let's hope for the best. Feel free to discuss your pain with the npm makers (https://github.com/npm/npm/issues/).
However, a partial cure comes if you move away from npm and switch to Yarn: http://yarnpkg.com/
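As an illustration of the kind of cure Yarn offers, here is a minimal sketch (assuming Yarn 1; the mirror path is just an example) of configuring a shared offline mirror so that each tarball is downloaded only once per machine:

```
# Point Yarn 1 at a machine-wide tarball mirror (example path).
yarn config set yarn-offline-mirror ~/.yarn-offline-mirror

# The first install populates the mirror; later installs can be served from it,
# even after node_modules has been deleted.
yarn install

# Prove it: this fails if anything would still need the network.
yarn install --offline
```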

npm cache already comes bundled with npm out of the box (it is listed under the CLI commands), and its main purpose is to avoid transferring the same package over the network over and over.
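For illustration, a rough sketch of pointing all builds on a machine at a shared cache location and checking it (assuming npm 5+, where the cache is content-addressable; the path is just an example):

```
# Use one shared cache directory for every build on this machine (example path).
npm config set cache /var/cache/npm-shared --global

# Installs still produce a local node_modules, but tarballs and metadata are
# served from the cache when possible instead of being re-downloaded.
npm install

# Check the integrity of the cache contents.
npm cache verify
```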
Regarding the duplicate-packages issue, as of npm v3 there has been an effort to deduplicate dependencies, but it still does not work exactly like Gradle, since it is still possible to end up with duplicates of the same package in your node_modules folder.
Per NPM documentation:
Your node_modules directory structure and therefore your dependency tree are dependent on install order
Although a fresh npm install from the same package.json always produces the same dependency tree:
The npm install command, when used exclusively to install packages from a package.json, will always produce the same tree. This is because install order from a package.json is always alphabetical. Same install order means that you will get the same tree.
So at least there is a way to get consistent dependency trees, albeit with no guarantee that it will be the most efficient one. In any case, those differences do not interfere with the correct functioning of npm.
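If duplicates do appear, npm's own commands can show and flatten them (a small sketch; lodash is just an example package name):

```
# Show every copy of a package in the tree and which dependency pulled it in.
npm ls lodash

# Ask npm to move compatible duplicates up the tree to reduce duplication.
npm dedupe
```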
Hope that helps.

Related

Why and when does yarn decide not to hoist a package in a workspace?

I'm working on a large project using yarn workspaces. I know that yarn workspaces essentially do two things:
They automate the symlinking process we had to do manually years ago when we wanted to share private packages.
They hoist all similar packages to the top-level node_modules in order to be more efficient.
However, I have noticed that my packages still contain code in their own node_modules and I'm not sure why. When I make a sample monorepo app and, say, install lodash in one package, it goes straight to the root node_modules.
Why and when does yarn decide to install a package inside a package's own node_modules?
I found the answer on Yarn's Discord: yarn will always hoist unless doing so would conflict with another version.
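A quick way to see which case applies to a given package is Yarn's inspection command (a sketch; lodash is just an example):

```
# Explains which workspaces depend on the package and why each copy exists,
# i.e. whether it was hoisted to the root or nested because of a version conflict.
yarn why lodash
```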

Automate updating outdated dependencies in CI/CD using `yarn outdated`

My team is developing a React component library which relies on MaterialUI components. The customer of our employer wants to automatically signal and/or upgrade outdated dependencies (specifically when the dependency on MaterialUI becomes outdated at least). We are using yarn as dependency manager.
I found that yarn lists all the outdated dependencies (or a specific dependency, if specified) through the yarn outdated command. One can then upgrade said dependencies using the yarn upgrade command, to which the dependency to be updated is supplied as a parameter. To do this with a single command, running yarn upgrade-interactive lists the outdated dependencies, which the user can then select to be updated.
I am wondering if there is/are way(s) to automate this process. I tried piping the results of yarn outdated to yarn upgrade as well as yarn version, but yarn upgrade seems to ignore whatever input it receives and updates every package regardless, and yarn version throws errors saying the versions are not proper semvers.
I realise yarn upgrade-interactive makes this process easy and quick for developers; however, the project is intended to become open-source over time, and the customer prefers a centralised solution rather than relying on every individual contributor to track this themselves. As far as I am aware, yarn upgrade-interactive cannot be automated, as it requires user input in order to select the package(s) to be updated.
Other solutions I found, such as Dependabot or packages like 'yarn-outdated-notifier', seem to only work with GitHub. The project is currently running on Azure DevOps and, when it goes public, will run on GitLab.
Is there any way we could do this in our CI/CD environment or with any (free) solutions? The customer prefers to have as few dependencies as possible.
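For what it's worth, one non-interactive shape this could take in a pipeline, assuming Yarn 1, where `yarn outdated` exits with a non-zero status when something is outdated (the package name is just an example):

```
#!/usr/bin/env bash
# Hypothetical CI step: flag the build when the MaterialUI dependency is outdated.
if ! yarn outdated @material-ui/core; then
  echo "Outdated dependency detected; consider 'yarn upgrade @material-ui/core'" >&2
  exit 1   # or send a notification instead of failing, depending on the desired policy
fi
```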

How does Travis CI cache Gradle dependencies?

In Travis documentation about caching dependencies, it mentions:
The cache’s purpose is to make installing language-specific dependencies easy and fast, so everything related to tools like Bundler, pip, Composer, npm, Gradle, Maven, is what should go into the cache.
Large files that are quick to install but slow to download do not benefit from caching, as they take as long to download from the cache as from the original source:
I am using Gradle in my Java project.
It seems that what Gradle caches are those .jar files, which should fall into the "quick to install" category.
So my question is, why does Travis recommend caching Gradle dependencies if .jar files are quick to install but slow to download?
Where do the benefits (in terms of shorter build time) come from?
It's a good question. I'm not sure about the benefits of cache usage because I never measured the download time from S3, but it's probably faster.
At the end of the linked page they explain:
If you store archives larger than a few hundred megabytes in the cache, it’s unlikely that you’ll see a significant speed improvement.
It seems that they consider it faster to cache a lot of small files than to download them independently.
Gradle's dependency files fit in this category: they are quick to install and FAST to download from the cache, since the cache comes back as a single archive.
They don't recommend using the cache for files that are quick to install but SLOW to download even from the cache, like the ~1 GB system images for Android.
In my opinion, they say this because you would be hurting their S3 quotas (I have no idea about the terms of that service) for a negligible benefit to you in that case.
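To see the "many small files" aspect for yourself, you can inspect Gradle's dependency cache (assuming the default location under ~/.gradle; this is only a rough illustration):

```
# Count the artifacts in Gradle's dependency cache and show their total size.
# Many small jars/poms means many HTTP round trips when fetched individually,
# but only one archive transfer when restored from the Travis cache.
find ~/.gradle/caches/modules-2/files-2.1 -type f \( -name '*.jar' -o -name '*.pom' \) | wc -l
du -sh ~/.gradle/caches/modules-2/files-2.1
```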

How to disable the removal of unused packages in composer?

I have many branches in git with different sets of packages in composer.json.
After each git checkout I need to do composer install, and composer starts to download the missing packages. At that moment, composer also removes packages that are only needed by the other branch. So when I check out the other branch again, I need to download those packages again. When it comes to packages such as PHPUnit, Codeception or other frameworks, this takes a very long time.
Is it possible to disable the removal of unused packages in composer?
(I have seen this feature in bower and npm.)
Thank you.
Right now this is not supported, as install just performs the actions needed to comply with the project requirements. Since technically, in your case, the requirements change, its behavior is correct. While the feature could be implemented in Composer, it's not trivial, as it's 'unnatural' behavior that would require quite a low-level hack.
However, I think the real issue here is that your workflow is not correct. If different branches in Git have wildly different dependencies, it is first of all doubtful that they should really be branches and not entirely different repositories, as they're really different projects then.
If that is not the case, the easiest solution is simply to clone the repository multiple times and keep the different clones on their respective branches. That solves all your problems immediately and lets Composer do its work as it was intended. This is also a very common workflow in bigger projects, as in-place branch switching is really only practical for short-lived branches like PRs and feature branches.
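A minimal sketch of that workflow, using git worktree as a lighter-weight alternative to full extra clones (branch and directory names are just examples); each working copy keeps its own vendor directory:

```
# One working copy per long-lived branch, each with its own vendor/ directory.
git worktree add ../myproject-develop develop
git worktree add ../myproject-release-2.x release/2.x

# Install dependencies once per working copy; "switching branches" becomes a cd,
# so Composer never has to remove and re-download the other branch's packages.
(cd ../myproject-develop && composer install)
(cd ../myproject-release-2.x && composer install)
```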

What is the best way to save composer dependencies across multiple builds

I am currently using the Atlassian Bamboo build server (cloud based, using AWS) and have an initial task that simply does a composer install.
This single task can take quite a bit of time, which can be a pain when developers have committed multiple times, giving the build server four builds that are all downloading dependencies (these do not run in parallel).
I wish to speed this process up but cannot figure out a way to save the dependencies to a common location for use across multiple builds while still allowing the application to run as intended (Laravel).
Answer
Remove composer.lock from your .gitignore
Explanation
When you run composer install for the first time, composer has to check all of your dependencies (and their dependencies, and so on) for compatibility. Running through the whole dependency tree is quite expensive, which is why it takes so long.
After figuring out all of your dependencies, composer then writes the exact versions it uses into the composer.lock file so that subsequent composer install commands will not have to spend that much time running through the whole graph.
If you commit your composer.lock file, it'll come along to your Bamboo server. The composer install command will be waaaayy faster.
Committing composer.lock is a best practice regardless. To quote the docs:
Commit your application's composer.lock (along with composer.json) into version control.
This is important because the install command checks if a lock file is present, and if it is, it downloads the versions specified there (regardless of what composer.json says).
This means that anyone who sets up the project will download the exact same version of the dependencies. Your CI server, production machines, other developers in your team, everything and everyone runs on the same dependencies, which mitigates the potential for bugs affecting only some parts of the deployments.
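A small sketch of applying the fix, assuming composer.lock is currently listed in your .gitignore:

```
# Stop ignoring the lock file (adjust the pattern if your .gitignore entry differs).
sed -i '/^composer\.lock$/d' .gitignore

# Regenerate the lock file if it does not exist yet, then commit both files.
composer install
git add .gitignore composer.lock
git commit -m "Track composer.lock so CI installs pinned dependency versions"
```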
