How can I use Cypress to crawl a website for links? - cypress

I understand that Cypress is designed for e2e testing and is not a generic browser automation tool. However, I'm wondering if it's possible to use Cypress to log into a website and crawl through it pulling all the hrefs, thus building a list of the pages you'd like to test.
For a large website, this seems almost necessary for certain kinds of e2e tests, but I'm stuck trying to implement it. Here's what I have:
describe('Link crawler', () => {
  const linksQueue = ['www.example.com/'];
  const seen = {};
  before(() => {
    cy.login(email, password);
  });
  // Build a queue, typical BFS algorithm.
  // For all links in the queue, pull out the anchor tags and add new hrefs to the queue.
  // Mark links as seen so you don't infinitely loop.
  while (linksQueue.length) {
    let currentLink = linksQueue.pop();
    it(`${currentLink} should have links.`, () => {
      cy.visit(`${currentLink}`);
      cy.window().then(win => {
        let anchorTags = win.document.getElementsByTagName('a');
        for (let idx = 0; idx < anchorTags.length; ++idx) {
          let newLink = anchorTags[idx].href;
          if (!(newLink in seen)) {
            linksQueue.unshift(newLink);
          }
          seen[newLink] = true;
        }
      });
    });
  }
});
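The problem with the above is that Cypress only processes what's in the queue to begin with. The while loop runs once, when the spec file is loaded, and registers one it() per queued link; links discovered later inside the running tests never become new tests. So this runs and extracts links, but only on 'www.example.com/'.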
How can I use Cypress to work over a queue of links that continues to grow? Is there something else I can use besides cy.window?
I've made this work using Puppeteer, but it would be great to use a single library and Cypress is my team's tool of choice for e2e.
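The underlying issue is that the queue has to be drained while the crawl runs, not when the spec registers its tests. A minimal sketch of such a breadth-first crawl in plain JavaScript, with the page-fetching step passed in as an async callback so the loop can be exercised outside a browser (`crawl`, `getLinks` and the `maxPages` cap are hypothetical names, not Cypress APIs):

```javascript
// Breadth-first crawl whose queue keeps growing while the crawl runs.
// getLinks(url) is a caller-supplied (hypothetical) async function that
// returns the raw href strings found on that page.
async function crawl(startUrl, getLinks, maxPages = 500) {
  const start = new URL(startUrl);
  start.hash = '';
  const origin = start.origin;
  const queue = [start.href];
  const seen = new Set(queue); // every URL ever enqueued
  const visited = [];

  while (queue.length > 0 && visited.length < maxPages) {
    const current = queue.shift(); // FIFO order => breadth-first
    visited.push(current);
    for (const href of await getLinks(current)) {
      let url;
      try {
        url = new URL(href, current); // resolve relative hrefs
      } catch {
        continue; // skip hrefs that are not valid URLs
      }
      url.hash = ''; // page#a and page#b are the same page
      // Stay on the starting site and enqueue each page only once.
      if (url.origin === origin && !seen.has(url.href)) {
        seen.add(url.href);
        queue.push(url.href);
      }
    }
  }
  return visited;
}
```

Cypress commands are not promises, so this `await`-based loop cannot call `cy.visit` directly. In a spec, the same shape would be a recursive function inside a single `it()`: visit the next URL, collect the hrefs in `.then()`, push unseen ones onto the queue, and call itself until the queue is empty. That adaptation is an untested assumption.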

Related

How to do proper conditional testing

I just want to ask how to do conditional testing properly. I have this code:
cy.get('[data-slug="add-to-any"] > .plugin-title > strong').then(($slug) => {
  if (expect($slug).not.to.exist) {
    // It will pass
  } else if (expect($slug).to.exist) {
    cy.get('#deactivate-add-to-any').should('not.exist')
  }
})
I assert that the element does not exist (not.to.exist), but it gives me this error:
Expected to find element: [data-slug="add-to-any"] > .plugin-title > strong, but never found it.
I am really lost as to which assertions I need to use.
The ideal way (if it works in your scenario) is to shift the last selector inside the .then():
cy.get('[data-slug="add-to-any"] > .plugin-title')
  .then(($pluginTitle) => {
    const $slug = $pluginTitle.find('strong'); // jQuery .find() won't fail the test if not found
    if ($slug.length === 0) { // not found
    } else {
      cy.get('#deactivate-add-to-any').should('not.exist')
    }
  })
It's not 100% foolproof: if $slug is loaded asynchronously (say, via fetch), it won't be there immediately, and the test might pass when in fact the $slug turns up 100 ms after the check runs.
You need to understand the way the app works to really be sure.
Cypress docs show this pattern, using <body> as the "stable" element (always present after page load):
cy.get('body').then($body => {
  const $slug = $body.find('[data-slug="add-to-any"] > .plugin-title > strong')
  if ($slug.length) {
    ...
It's less than ideal because the page might have <body> but still be fetching elements inside it.
Best practice IMO is to try the immediate parent element of the conditional one. If that is also conditional, move up the element tree until you find an element that is stable/present at that point in your test.
Or add a guard condition that waits for the page's fetch to complete. A cy.intercept() is useful for that, or even just this:
cy.get('[data-slug="add-to-any"] > .plugin-title')
  .should('be.visible') // retries until .plugin-title is showing
  .then(($pluginTitle) => {
    const $slug = $pluginTitle.find('strong')
    if ($slug.length === 0) {
      ...
Simple example
cy.get("body").then($body => {
  if ($body.find('[data-slug="add-to-any"] > .plugin-title').length > 0) {
    cy.get('[data-slug="add-to-any"] > .plugin-title').then($title => {
      if ($title.is(':visible')) {
        // you get here only if it EXISTS and is VISIBLE
      }
    });
  } else {
    // you get here if it DOESN'T EXIST
    cy.get('#deactivate-add-to-any').should('not.exist')
  }
});

How to best implement a Promise semaphore?

I use a semaphore for two processes that share a resource (a REST API endpoint) that can't be called concurrently. I do:
let tokenSemaphore = null;
class restApi {
  async getAccessToken() {
    let tokenResolve;
    if (tokenSemaphore) {
      await tokenSemaphore;
    }
    tokenSemaphore = new Promise((resolve) => tokenResolve = resolve);
    return new Promise(async (resolve, reject) => {
      // ...
      resolve(accessToken);
      tokenResolve();
      tokenSemaphore = null;
    });
  }
}
But this looks too complicated. Is there a simpler way to achieve the same thing?
And how would I do it for more concurrent processes?
Note that this is not a server-side semaphore. If your processes run independently (e.g. in different threads), you need interprocess communication to lock them; in that case the API must support locking on the server side, and this answer is not for you.
As this was the first hit when googling for "JavaScript Promise Semaphore", here is what I came up with:
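function Semaphore(max, fn, ...a1)
{
  let run = 0;       // number of calls currently running
  const waits = [];  // resolvers of calls waiting for a slot
  // Start the next waiting call if a slot is free; pass x through unchanged.
  function next(x)
  {
    if (run<max && waits.length)
      waits.shift()(++run);
    return x;
  }
  // Each call waits for a slot, runs fn, frees the slot, then wakes the next waiter.
  return (...a2) => next(new Promise(ok => waits.push(ok)).then(() => fn(...a1,...a2)).finally(_ => run--).finally(next));
}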
Example use (the above is nearly verbatim from my code; the following was typed in directly and hence is not tested):
// do not execute more than 20 fetches in parallel:
const fetch20 = Semaphore(20, fetch);
async function retry(...a)
{
  for (let retries=0;; retries++)
  {
    if (retries)
      await new Promise(ok => setTimeout(ok, 100*retries)); // wait longer after each failure
    try {
      return await fetch20(...a)
    } catch (e) {
      console.log(`retry ${retries}`, a[0], e); // a[0] is the URL passed to fetch
    }
  }
}
and then
for (let i=0; ++i<10000000; ) retry(`https://example.com/?${i}`);
My browser handles thousands of parallel asynchronous calls to retry very well. However, when using fetch directly, the tab crashes nearly instantly.
For your usage you probably need something like:
async function access_token_api_call()
{
  // assume this takes 10s and must not be called in parallel for setting the Cookie
  return fetch('https://api.example.com/nonce').then(r => r.json());
}
const get_access_token = Semaphore(1, access_token_api_call);
// both processes need to use the same(!) Semaphore, of course
async function process(...args)
{
  const token = await get_access_token();
  // processing args here
  return /* something */;
}
const proc1 = process(1);
const proc2 = process(2);
Promise.all([proc1, proc2]).then(/* etc. */);
YMMV.
Notes:
This assumes that your two processes are just asynchronous functions of the same single JS script (i.e. running in the same tab).
A browser usually opens only a few concurrent connections to a backend (typically around six per host over HTTP/1.1) and queues excess requests. fetch20 is my workaround for a real-world problem where a JS frontend needs to queue, say, 5000 fetches in parallel, which crashes my browser (for unknown reasons). It's 2021; that should not be a problem, right?
But this looks too complicated.
Not complicated enough, I'm afraid. Currently, if multiple code paths call getAccessToken when the semaphore is taken, they'll all block on the same tokenSemaphore instance, and when the semaphore is released, they'll all be released and resolve roughly at the same time, allowing concurrent access to the API.
In order to write an asynchronous lock (or semaphore), you'll need a collection of futures (tokenResolvers). When one is released, it should only remove and resolve a single future from that collection.
I played around with it a bit in TypeScript a few years ago, but never tested or used the code. My Gist is also C#-ish (using "disposables" and whatnot); it needs some updating to use more natural JS patterns.
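A minimal sketch of the lock described above, assuming everything runs in one JavaScript realm: waiters are kept as a FIFO queue of resolver functions, and each release hands the slot to exactly one of them. `AsyncSemaphore`, `acquire` and `use` are hypothetical names, not taken from the Gist.

```javascript
// Counting semaphore: acquire() resolves with a release function once a slot
// is free. Waiters queue up as resolvers; each release wakes exactly one.
class AsyncSemaphore {
  constructor(max = 1) {
    this.max = max;
    this.active = 0;   // slots currently held
    this.waiters = []; // grant functions of pending acquire() calls, oldest first
  }

  acquire() {
    return new Promise((resolve) => {
      const grant = () => {
        this.active++;
        let released = false;
        resolve(() => {
          if (released) return; // ignore a second release of the same slot
          released = true;
          this.active--;
          const nextWaiter = this.waiters.shift();
          if (nextWaiter) nextWaiter(); // hand the slot to one waiter only
        });
      };
      if (this.active < this.max) grant();
      else this.waiters.push(grant);
    });
  }

  // Run fn while holding a slot, releasing it even if fn throws.
  async use(fn) {
    const release = await this.acquire();
    try {
      return await fn();
    } finally {
      release();
    }
  }
}
```

For the question's case, one `new AsyncSemaphore(1)` shared by both callers, with `getAccessToken()` returning `tokenLock.use(() => ...)` around the actual API call, would let only one token request run at a time; a larger `max` allows that many concurrent callers.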

Protractor dealing with promises and arrays in flow control

I'm working on some Jasmine end-to-end testing, using the Protractor test runner. The application I am testing is a simple webpage. I already have a test scenario that works fine.
Now I'd like to improve my code so that I can use the same script to run the testing scenario twice.
The first time: the test would be performed on the English version of the page
The second time: on a translated version of the same page.
Here is my code:
var RandomSentenceInThePage = ["Sentence in English", "Phrase en Francais"];
var i;
var signInButton;
var TranslationButton;
var RandomSentenceInThePageBis;
i = 0;
//Runs the testing scenario twice
while (i < 2) {
  describe('TC1 - The registration Page', function() {
    //the translation is done on the second iteration
    if (i != 0) {
      beforeEach(function() {
        browser.ignoreSynchronization = true;
        browser.get('https://Mywebsite.url.us/');
        //we get the translation button then click on it
        TranslationButton = element(by.css('.TranslationButtonClass'));
        TranslationButton.click();
      });
    }
    //On the first iteration, we run the test on the not translated page…
    else {
      beforeEach(function() {
        browser.ignoreSynchronization = true; //Necessary for the browser.get() method to work inside the it statements.
        browser.get('https://Mywebsite.url.us/');
      });
    }
    it('should display the log in page', function() {
      //Accessing the browser is done in the before each section
      signInButton = element(by.css('.SignInButtonClass'));
      signInButton.click();
      RandomSentenceInThePageBis = element(by.css('.mt-4.text-center.signin-header'));
      /*******************[HERE IS WHERE THE PROBLEM IS]*******************/
      expect(RandomSentenceInThePageBis.getText()).toEqual(RandomSentenceInThePage[i]);
      /*******************************************************************/
    });
  });
  i++;
}
I have highlighted the problematic section. The code keeps running before RandomSentenceInThePage[i] and RandomSentenceInThePageBis are compared, and by the time they are finally compared, the loop is already done.
According to what I have seen on the other related topics, because of the use of expect statements and getText() methods, I am dealing with promises and I have to wait for them to be resolved. After trying for the whole day, I think I could use a hint on how to deal with this promise resolution. Let me know if you need more information.
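Change the while loop to a for loop and declare the variable i with let rather than var.
let declares a variable scoped to its enclosing block (a for loop, an if block, etc.), and a for loop with let creates a fresh binding for each iteration. var is scoped to the whole function (or script), so there is only one i.
Because the Protractor API executes asynchronously, the it() callbacks run after the loop has finished. With var, every callback then reads the same i, which by that time has become 2, not 0 or 1.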
for (let i = 0; i < 2; i++) {
  describe('TC1 - The registration Page', function() {
    ....
  })
}
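The scoping difference can be seen without Protractor. This plain-JavaScript sketch registers callbacks in a loop and runs them only after the loop ends, standing in (as an assumption) for how Jasmine runs it() bodies after the describe() blocks have been set up:

```javascript
// Run every registered callback only after the registering loop has finished,
// the way it() bodies run after all describe() blocks are set up.
function runAfterLoop(makeCallbacks) {
  const callbacks = makeCallbacks();
  return callbacks.map((cb) => cb());
}

// var: one function-scoped binding shared by every callback.
function withVar() {
  const callbacks = [];
  var i = 0;
  while (i < 2) {
    callbacks.push(() => i);
    i++;
  }
  return callbacks;
}

// let in a for loop: a fresh binding per iteration.
function withLet() {
  const callbacks = [];
  for (let i = 0; i < 2; i++) {
    callbacks.push(() => i);
  }
  return callbacks;
}

console.log(runAfterLoop(withVar)); // [ 2, 2 ]
console.log(runAfterLoop(withLet)); // [ 0, 1 ]
```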

How to stub Fluture?

Background
I am trying to convert a code snippet from good old Promises into something using Flutures and Sanctuary:
https://codesandbox.io/embed/q3z3p17rpj?codemirror=1
Problem
Now, usually, using Promises, I can use a library like sinonjs to stub the promises, i.e. to fake their results, force them to resolve or reject, etc.
This is fundamental, as it helps one test several branch directions and make sure everything works as it is supposed to.
With Flutures, however, it is different. One cannot simply stub a Fluture, and I didn't find any sinon-esque libraries that could help either.
Questions
How do you stub Flutures?
Is there any specific recommendation to doing TDD with Flutures/Sanctuary?
I'm not sure, but those Flutures (this name! ... never mind, the API looks cool) are plain objects, just like promises. They just have a more elaborate API and different behavior.
Moreover, you can easily create "mock" Flutures with Future.of and Future.reject instead of making real API calls.
Yes, sinon contains sugar helpers like resolves and rejects, but they are just wrappers that can be implemented with callsFake.
So you can easily create a stub that returns a Fluture like this:
someApi.someFun = sinon.stub().callsFake((arg) => {
  assert.equals(arg, 'spam');
  return Future.of('bar');
});
Then you can test it like any other API.
The only problem is "asynchronicity", but that can be solved as follows:
// with async/await
it('spams with async', async () => {
  const result = await someApi.someFun('spam').promise();
  assert.equals(result, 'bar');
});
// or leveraging mocha's ability to wait for returned thenables
it('spams', () => {
  return someApi.someFun('spam')
    .promise() // a rejected Future becomes a rejected promise, failing the test
    .then((result) => { assert.equals(result, 'bar'); });
});
As Zbigniew suggested, Future.of and Future.reject are great candidates for mocking using plain old JavaScript or whatever tools or frameworks you like.
To answer part 2 of your question (any specific recommendations for doing TDD with Fluture): there is, of course, no one true way it should be done. However, I do recommend you invest a little time in readability and ease of writing tests if you plan on using Futures all across your application.
This applies to anything you frequently include in tests though, not just Futures.
The idea is that when you are skimming over test cases, you will see developer intention, rather than boilerplate to get your tests to do what you need them to.
In my case I use mocha & chai in the BDD style (given when then).
And for readability I created these helper functions.
const {expect} = require('chai');
exports.expectRejection = (f, onReject) =>
  f.fork(
    onReject,
    value => expect.fail(
      `Expected Future to reject, but was ` +
      `resolved with value: ${value}`
    )
  );
exports.expectResolve = (f, onResolve) =>
  f.fork(
    error => expect.fail(
      `Expected Future to resolve, but was ` +
      `rejected with value: ${error}`
    ),
    onResolve
  );
As you can see, nothing magical going on, I simply fail the unexpected result and let you handle the expected path, to do more assertions with that.
Now some tests would look like this:
const Future = require('fluture');
const {expect} = require('chai');
const {expectRejection, expectResolve} = require('../util/futures');
describe('Resolving function', () => {
  it('should resolve with the given value', done => {
    // Given
    const value = 42;
    // When
    const f = Future.of(value);
    // Then
    expectResolve(f, out => {
      expect(out).to.equal(value);
      done();
    });
  });
});
describe('Rejecting function', () => {
  it('should reject with the given value', done => {
    // Given
    const value = 666;
    // When
    const f = Future.of(value);
    // Then
    expectRejection(f, out => {
      expect(out).to.equal(value);
      done();
    });
  });
});
And running should give one pass and one failure.
✓ Resolving function should resolve with the given value: 1ms
1) Rejecting function should reject with the given value
1 passing (6ms)
1 failing
1) Rejecting function
should reject with the given value:
AssertionError: Expected Future to reject, but was resolved with value: 666
Keep in mind that this is asynchronous code, which is why I always accept the done function as an argument in it() and call it at the end of my expected results. Alternatively, you could change the helper functions to return a promise and let mocha handle that.
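A sketch of the promise-returning variant mentioned above. It assumes only that a Future exposes `fork(onReject, onResolve)`, as Fluture's does, and keeps the same failure messages as the callback helpers:

```javascript
// Promise-returning versions of expectResolve/expectRejection, so a mocha
// test can return or await them instead of calling done.
// Only relies on future.fork(onReject, onResolve).
const expectResolve = (future) =>
  new Promise((resolve, reject) => {
    future.fork(
      (error) => reject(new Error(
        `Expected Future to resolve, but was rejected with value: ${error}`)),
      resolve
    );
  });

const expectRejection = (future) =>
  new Promise((resolve, reject) => {
    future.fork(
      resolve,
      (value) => reject(new Error(
        `Expected Future to reject, but was resolved with value: ${value}`))
    );
  });
```

In mocha, a test can then read `it('resolves', async () => { expect(await expectResolve(Future.of(42))).to.equal(42); });`, with no `done` callback.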

React renderToString() Performance and Caching React Components

I've noticed that the reactDOM.renderToString() method starts to slow down significantly when rendering a large component tree on the server.
Background
A bit of background. The system is a fully isomorphic stack. The highest-level App component renders templates, pages, DOM elements, and more components. Looking at the React code, I found it renders ~1500 components (this includes any simple DOM tag that gets treated as a simple component, e.g. <p>this is a react component</p>).
In development, rendering ~1500 components takes ~200-300ms. By removing some components I was able to get ~1200 components to render in ~175-225ms.
In production, renderToString on ~1500 components takes around ~50-200ms.
The time does appear to be linear. No one component is slow, rather it is the sum of many.
Problem
This creates some problems on the server. The lengthy method results in long server response times, and the TTFB is a lot higher than it should be. With API calls and business logic the response should take 250ms, but with a 250ms renderToString it is doubled! Bad for SEO and users. Also, being a synchronous method, renderToString() can block the Node server and back up subsequent requests (this could be solved by using 2 separate Node servers: 1 as a web server, and 1 as a service that only renders React).
Attempts
Ideally, it would take 5-50ms to renderToString in production. I've been working on some ideas, but I'm not exactly sure what the best approach would be.
Idea 1: Caching components
Any component that is marked as 'static' could be cached. By keeping a cache of the rendered markup, renderToString() could check the cache before rendering. If it finds a component, it simply grabs the string. Doing this for a high-level component would save mounting all of its nested child components. You would have to replace the cached component markup's React rootID with the current rootID.
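A minimal sketch of the lookup Idea 1 describes, kept independent of React internals: `renderMarkup` is a hypothetical stand-in for the expensive render, the key is the component name plus its JSON-serialized props, and the size cap is an added assumption to keep memory bounded. It does not perform the rootID replacement step, which depends on React's internal markup format.

```javascript
// Check a markup cache before rendering; render and store on a miss.
// renderMarkup(name, props) is a hypothetical expensive render function.
function createMarkupCache(renderMarkup, maxEntries = 1000) {
  const cache = new Map();
  return function render(name, props) {
    const key = name + ':' + JSON.stringify(props);
    if (cache.has(key)) return cache.get(key); // hit: skip rendering entirely
    const markup = renderMarkup(name, props);
    cache.set(key, markup);
    if (cache.size > maxEntries) {
      // A Map iterates in insertion order, so the first key is the oldest entry.
      cache.delete(cache.keys().next().value);
    }
    return markup;
  };
}
```

`JSON.stringify` is sensitive to property order, so `{a: 1, b: 2}` and `{b: 2, a: 1}` produce different keys; a stable serializer would avoid duplicate entries.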
Idea 2: Marking components as simple/dumb
By defining a component as 'simple', react should be able to skip all the lifecycle methods when rendering. React already does this for the core react dom components (<p/>, <h1/>, etc). Would be nice to extend custom components to use the same optimization.
Idea 3: Skip components on server-side render
Components that do not need to be returned by the server (no SEO value) could simply be skipped on the server. Once the client loads, set a clientLoaded flag to true and pass it down to enforce a re-render.
Closing and other attempts
The only solution I've implemented thus far is to reduce the number of components that are rendered on the server.
Some projects we're looking at include:
React-dom-stream (still working on implementing this for a test)
Babel inline elements (seems like this is along the lines of Idea 2)
Has anybody faced similar issues? What have you been able to do?
Thanks.
Using react-router 1.0 and React 0.14, we were mistakenly serializing our flux object multiple times.
RoutingContext will call createElement for every template in your react-router routes. This allows you to inject whatever props you want. We also use flux. We send down a serialized version of a large object. In our case, we were doing flux.serialize() within createElement. The serialization method could take ~20ms. With 4 templates, that would be an extra 80ms to your renderToString() method!
Old code:
function createElement(Component, props) {
  props = _.extend(props, {
    flux: flux,
    path: path,
    serializedFlux: flux.serialize() // runs on every createElement call
  });
  return <Component {...props} />;
}
var start = Date.now();
markup = renderToString(<RoutingContext {...renderProps} createElement={createElement} />);
console.log(Date.now() - start);
Easily optimized to this:
var serializedFlux = flux.serialize(); // serialize one time only!
function createElement(Component, props) {
  props = _.extend(props, {
    flux: flux,
    path: path,
    serializedFlux: serializedFlux
  });
  return <Component {...props} />;
}
var start = Date.now();
markup = renderToString(<RoutingContext {...renderProps} createElement={createElement} />);
console.log(Date.now() - start);
In my case this helped reduce the renderToString() time from ~120ms to ~30ms. (You still need to add the 1x serialize()'s ~20ms to the total, which happens before the renderToString()) It was a nice quick improvement. -- It's important to remember to always do things correctly, even if you don't know the immediate impact!
Idea 1: Caching components
Update 1: I've added a complete working example at the bottom. It caches components in memory and updates data-reactid.
This can actually be done easily. You should monkey-patch ReactCompositeComponent and check for a cached version:
import ReactCompositeComponent from 'react/lib/ReactCompositeComponent';
const originalMountComponent = ReactCompositeComponent.Mixin.mountComponent;
ReactCompositeComponent.Mixin.mountComponent = function() {
  // hasCachedVersion() and cache are placeholders for your own cache lookup
  if (hasCachedVersion(this)) return cache;
  return originalMountComponent.apply(this, arguments)
}
You should do this before you require('react') anywhere in your app.
Webpack note: If you use something like new webpack.ProvidePlugin({'React': 'react'}) you should change it to new webpack.ProvidePlugin({'React': 'react-override'}) where you do your modifications in react-override.js and export react (i.e. module.exports = require('react'))
A complete example that caches in memory and updates reactid attribute could be this:
import ReactCompositeComponent from 'react/lib/ReactCompositeComponent';
import jsan from 'jsan';
import Logo from './logo.svg';
const cachable = [Logo];
const cache = {};
// Split rendered markup at each reactid="..." value, dropping the old ids.
function splitMarkup(markup) {
  var markupParts = [];
  var reactIdPos = -1;
  var endPos, startPos = 0;
  while ((reactIdPos = markup.indexOf('reactid="', reactIdPos + 1)) != -1) {
    endPos = reactIdPos + 9; // 9 === 'reactid="'.length
    markupParts.push(markup.substring(startPos, endPos))
    startPos = markup.indexOf('"', endPos);
  }
  markupParts.push(markup.substring(startPos))
  return markupParts;
}
// Re-join the cached parts, filling in fresh ids from the current render.
function refreshMarkup(markup, hostContainerInfo) {
  var refreshedMarkup = '';
  var reactid;
  var reactIdSlotCount = markup.length - 1;
  for (var i = 0; i <= reactIdSlotCount; i++) {
    reactid = i != reactIdSlotCount ? hostContainerInfo._idCounter++ : '';
    refreshedMarkup += markup[i] + reactid
  }
  return refreshedMarkup;
}
const originalMountComponent = ReactCompositeComponent.Mixin.mountComponent;
ReactCompositeComponent.Mixin.mountComponent = function (renderedElement, hostParent, hostContainerInfo, transaction, context) {
  var el = this._currentElement;
  var elType = el.type;
  var markup;
  if (cachable.indexOf(elType) > -1) {
    // Cache key: component name plus its serialized props.
    var publicProps = el.props;
    var id = elType.name + ':' + jsan.stringify(publicProps);
    markup = cache[id];
    if (markup) {
      return refreshMarkup(markup, hostContainerInfo)
    } else {
      markup = originalMountComponent.apply(this, arguments);
      cache[id] = splitMarkup(markup);
    }
  } else {
    markup = originalMountComponent.apply(this, arguments)
  }
  return markup;
}
module.exports = require('react');
It's not a complete solution
I had the same issue with my isomorphic React app, and I used a couple of things:
Use Nginx in front of your nodejs server, and cache the rendered response for a short time.
When showing a list of items, I render only a subset of the list. For example, I render only X items to fill up the viewport and load the rest of the list on the client side using WebSocket or XHR.
Some of my components are empty in server-side rendering and only load from client-side code (componentDidMount).
These are usually graphs or profile-related components, which usually bring no benefit from an SEO point of view.
About SEO: in my six months of experience with an isomorphic app, Googlebot can read client-side React pages easily, so I'm not sure why we bother with server-side rendering.
Keep the <Head> and <Footer> as static strings or use a template engine (Reactjs-handlebars), and render only the content of the page (it should save a few rendered components). In the case of a single-page app, you can update the title and description on each navigation inside Router.run.
I think fast-react-render can help you. It makes server rendering about three times faster.
To try it, you only need to install the package and replace ReactDOM.renderToString with FastReactRender.elementToString:
var ReactRender = require('fast-react-render');
var element = React.createElement(Component, {property: 'value'});
console.log(ReactRender.elementToString(element, {context: {}}));
You can also use fast-react-server, in which case rendering will be 14 times as fast as traditional React rendering. But for that, each component you want to render must be declared with it (see the example in fast-react-seed for how to do it with webpack).
