What is "CacheGroup" for in webpack - caching

It's very hard for me to understand why it is called cacheGroup. What is being cached? What gets grouped? Modules? How does webpack group modules according to the rules a cacheGroup sets?

Well, after three days of reading posts and docs on the internet, I think I understand the design intention behind it.
A doc by sokra on GitHub says:
The optimization assigns modules to cache groups
So it is modules that get grouped.
According to the post by Tobias Koppers on Medium, the new chunk(s) generated by a cacheGroup's rules are related to all the original chunk(s) through a ChunkGroup. Here "original" means the chunk the split-out modules belonged to before splitting.
There is a graph of chunks, from which webpack emits the final assets.
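To make this concrete, here is a minimal sketch of a splitChunks configuration with one custom cache group (the group key vendor, the chunk name 'vendors', and the test pattern are just example values, not anything prescribed by the docs quoted above):

// webpack.config.js (sketch)
module.exports = {
  optimization: {
    splitChunks: {
      cacheGroups: {
        vendor: {
          // which modules get assigned to this cache group
          test: /[\\/]node_modules[\\/]/,
          // name of the new chunk split out of the original chunks
          name: 'vendors',
          // consider both initial and async chunks
          chunks: 'all',
        },
      },
    },
  },
};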

Related

Applying different parsefilters to each domain in the same topology

I am trying to crawl different websites (e-commerce websites) and extract specific information from the pages of each website (e.g. product price, quantity, date of publication).
My question is: how do I configure the parsing, since each website has a different HTML layout, which means I need different XPaths for the same item depending on the website? Can we add multiple parser bolts to the topology, one per website? If yes, how can we assign different parsefilters.json files to each parser bolt?
You need #586. At the moment there is no way to do it other than putting all your XPath expressions, regardless of the site you want to use them on, into parsefilters.json.
You can't assign different parsefilters.json to the various instances of a bolt.
UPDATE: you could, however, have multiple XPathFilter sections within parsefilters.json. Each could cover a specific source; however, there is currently no way of constraining which source a parse filter gets applied to. You could extend XPathFilter so that it takes some extra config, e.g. a regular expression a URL must match in order for the filter to be applied. That would work quite nicely, I think.
I've recently added JsoupFilters, which will be in the next release. These should be useful for your use case, but that still doesn't solve the issue that you need an implementation of the filter that organizes the resources per host. It shouldn't be too hard to implement, taking the URL filter one as an example, and it would also make a very nice contribution to the project.
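To make the suggested extension a bit more concrete, here is a rough, hypothetical sketch of what such a per-site configuration could look like. The SiteScopedXPathFilter class and its urlPattern parameter do not exist in StormCrawler today; they stand in for the extension proposed above, and the exact JSON layout depends on the StormCrawler version you are using:

{
  "com.digitalpebble.stormcrawler.parse.ParseFilters": [
    {
      "class": "com.example.crawler.SiteScopedXPathFilter",
      "name": "shopA",
      "params": {
        "urlPattern": ".*\\.shop-a\\.example/.*",
        "price": "//span[@class='price']/text()",
        "quantity": "//span[@class='stock']/text()"
      }
    },
    {
      "class": "com.example.crawler.SiteScopedXPathFilter",
      "name": "shopB",
      "params": {
        "urlPattern": ".*\\.shop-b\\.example/.*",
        "price": "//div[@id='product-price']/text()"
      }
    }
  ]
}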

d3/cola: Layout configuration for UML-like diagram

I am trying to build a GraphQL schema visualizer using something other than viz.js (that library is too large and adds 1 MB to the bundle). WebCola was recommended to me, and it seems to be a very powerful library.
I have gotten to a point where the necessary elements are being rendered and linked correctly. My next step is to get the layout right. I would like to do something similar to graphql-voyager (which uses viz.js).
Here is a codesandbox of what I have so far:
graphql-diagram
EDIT: My question is, how could I lay out what I have similarly to graphql-voyager? I would like help setting the right constraints and applying whatever algorithm is necessary to position the nodes and route the edges accordingly.
GraphQL Voyager author here :)
Before switching to viz.js we tried lots of other possible solutions for almost a month. Here is the article about our journey: https://medium.freecodecamp.org/how-we-got-1-500-github-stars-by-mixing-time-tested-technology-with-a-fresh-ui-b310551cba22
TL;DR: graph drawing is rocket science.
Moreover, since the Voyager release (two years ago), we have evaluated even more libraries, with exactly the same result.
As a side project, we are working on a Graphviz fork aggressively shrunk down to just meet Voyager's requirements. Our end goal is to rewrite the required parts in pure JS and embed them directly into Voyager.
At the moment it's at an early proof-of-concept stage, and we are not ready to release it yet.

Should Normalize.css be kept as a separate file or compiled (through postcss @import) into the final "styles.css" file?

In terms of performance/speed of the final product, and according to best practices: should Normalize.css be kept as a separate file (linked from the HTML head), or is it better to compile it into the final .css file?
I have searched here and on many other websites but couldn't find an answer. Hopefully you'll understand my dilemma with this:
1. Leave normalize.css in the node_modules folder and link to it from our HTML.
I'm still fresh to coding, but if I understand correctly, with this approach we add one more (maybe unnecessary?) request to the server, in addition to our main .css file. How taxing is that on the performance/loading time of the website?
<link rel="stylesheet" href="../node_modules/normalize.css/normalize.css">
<link rel="stylesheet" href="temp/styles/styles.css">
On the other hand, we can:
2. Use postcss-import to pull normalize.css in with the other modules and compile them all together into one final .css file.
OK, now we have everything in one place, but we have just added 451 lines of code (and comments) before the first line of our actual CSS. In terms of readability it doesn't seem like the best solution to me, but is the website going to load a bit faster now?
Disclaimer: I've been using the second approach so far, but I started asking myself if that is the optimal solution.
Thank you in advance.
You are quite correct in stating that a web page will load faster if it makes fewer requests to the server when loading. You are also correct in stating that the combined file is less readable than the individual files loaded separately.
Which is more important to you in your situation is a question only you can answer. That is why you are having a hard time finding definitive advice.
Personally I use the separate file option in development so that the files are easy to read and debug. Speed of loading isn't as important on a development machine.
In production websites I use the combined file option. In fact, I use combine and minify to reduce the number of files loaded and keep the size of those files as small as possible. Readability is less important in this situation.
Ideally adding normalize.css to your final css would be done in a post processing step that combines all of your source files into one file and minifies the whole thing. That way your source is still readable but you end up only loading one file.
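As a small sketch of that approach (file names are just examples): the source stays split into readable partials, and postcss-import inlines them, including normalize.css resolved from node_modules, into the one file you ship.

/* src/styles.css (sketch): postcss-import inlines these at build time,
   so the browser only ever requests the single compiled styles.css */
@import "normalize.css";   /* resolved from node_modules by postcss-import */
@import "base.css";        /* your own partials, example names */
@import "components.css";

body {
  margin: 0;
}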

How does Hugo maintain site-wide data, like .Site.AllPages?

I'm looking for some bite-sized examples of how Hugo might be managing site-wide data, like .Site.AllPages.
Specifically, Hugo seems too fast to be reading in every file and its metadata before beginning to generate pages and making things like .Site.AllPages available -- but obviously that has to be the case.
Are Ruby (Jekyll) and Python (Pelican) really just that slow, or is there some specific (algorithmic) method that Hugo employs to generate pages before everything is ready?
There is no magic, and Hugo does not start any rendering until the .Site.Pages etc. collections are filled and ready.
Some key points here:
We have a processing pipeline where we do concurrent processing whenever we can, so your CPUs should be pretty busy.
Whenever we do content manipulation (shortcodes, emojis etc.), you will most likely see a hand crafted parser or replacement function that is built for speed.
We really care about the "being fast" part, so we have a solid set of benchmarks to reveal any performance regressions.
Hugo is built with Go -- which is really fast and has a really great set of tools for this (pprof, benchmark support etc.)
Some other points that make the hugo server variant even faster than a regular hugo build:
Hugo uses a virtual file system, and we render directly to memory when in server/development mode.
We have some partial-reloading logic in there. So, even if we render everything every time, we try to reload and rebuild only the content files that have changed, and we don't reload/rebuild templates if it is a content change, etc.
I'm bep on GitHub, the main developer on Hugo.
You can see AllPages in hugolib/page_collections.go.
A git blame shows that it was modified in Sept. 2016 for Hugo v0.18 in commit 698b994, in order to fix PR 2297 Fix Node vs Page.
That PR references the discussion/improvement proposal "Node improvements":
Most of the "problems" with this get much easier once we agree that a page is just a page that is just a ... page...
And that a Node is just a Page with a discriminator.
So:
Today's pages are Page with discriminator "page"
Homepage is Page with discriminator "home" or whatever
Taxonomies are Pages with discriminator "taxonomy"
...
They have some structural differences (pagination etc.), but they are basically just pages.
With that in mind we can put them all in one collection and add query filters on discriminator:
.Site.Pages: filtered by discriminator = 'page'
.Site.All: No filter
where: when the sequence is Pages add discriminator = 'page', but let user override
That key (the discriminator) allows all 'pages' of a given kind to be retrieved quickly.
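In released versions of Hugo that discriminator surfaces as the page's .Kind field, so (as a rough sketch, assuming a reasonably recent Hugo version) a template can filter the single collection itself:

{{/* sketch of a list template: every kind of page lives in one collection */}}
{{ range where .Site.Pages "Kind" "page" }}
  <a href="{{ .RelPermalink }}">{{ .Title }}</a>
{{ end }}

{{/* .Site.RegularPages is essentially the built-in shorthand for that filter */}}
{{ range .Site.RegularPages }}{{ .Title }}{{ end }}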

How do Minify, mod_pagespeed... handle merging javascript files

Say that on page1.html I need mootools.js and main.js... I guess that these tools should generate one minified js file (say min1.js).
Then on page2.html I need mootools.js, main.js AND page2.js... Do those tools serve min1.js (already cached by the browser) plus page2.js? Or do they combine these 3 .js files and serve the resulting minified file, which needs to be fully cached again by the browser?
Thank you
Assuming you are using the Apache module mod_pagespeed because you tagged the question with it but didn't mention if you are or not...
If you turn on ModPagespeedEnableFilters combine_javascript (which is disabled by default), it operates on the whole page. According to the documentation:
This filter generates URLs that are essentially the concatenation of
the URLs of all the JavaScript files being combined.
page1.html would combine mootools.js and main.js into one file, and page2.html would combine mootools.js, main.js, and page2.js into another.
To answer your question then: yes, the browser will end up caching several copies of the repeated JavaScript files.
However,
By default, the filter will combine together script files from
different paths, placing the combined element at the lowest level
common to both origins. In some cases, this may be undesirable. You
can turn off the behavior with: ModPagespeedCombineAcrossPaths off
If you turn across-path combining off and spread the files out across paths according to how you want them combined, common scripts will be combined into one file and page-specific scripts will be combined on their own. This would keep the duplication of large, common libraries down.
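Putting those directives together, a minimal configuration sketch might look like this (only the directive names are taken from the documentation quoted above; adapt it to your own setup):

# pagespeed.conf (sketch)
ModPagespeed on
# combine_javascript is disabled by default, so enable it explicitly
ModPagespeedEnableFilters combine_javascript
# keep scripts from different paths in separate combined files, so a shared
# library combined from its own path stays at one cacheable URL across pages
ModPagespeedCombineAcrossPaths off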
