Need list of links and child page paths from a website using Nightwatch - nightwatch.js

Is it possible to get a list of URLs / links available on a website by providing the base URL as input? Using Nightwatch, if I pass the base URL, it should return all hyperlinks, child page links, etc. If it's possible, how can we achieve it?

Not sure Nightwatch is the right choice for what you are trying to achieve. Maybe you can let us know what you want to do with the list of URLs you fetch from the page?
You can set the URL via page object configuration and when the page loads (page.navigate()) you can return all link elements. Something like:
browser
  .elements("css selector", "a", (result) => {
    // result.value will contain all the elements matching "a"
  });
But I recommend you read more about Nightwatch to determine if that is the right tool for your needs.
More about Nightwatch: http://nightwatchjs.org/gettingstarted
More about Page objects: https://martinfowler.com/bliki/PageObject.html
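If all you need is a flat list of the links on a page, rather than browser-level testing, a plain HTTP fetch plus an HTML parser can already do the job. A minimal sketch in Python, assuming requests and BeautifulSoup are available (neither is mentioned in this thread; the base URL is a placeholder):

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE_URL = "http://example.com"  # placeholder: your site's base URL

response = requests.get(BASE_URL)
soup = BeautifulSoup(response.text, "html.parser")

# Collect the href of every <a> tag, resolved against the base URL
links = {urljoin(BASE_URL, a["href"]) for a in soup.find_all("a", href=True)}
print(links)

To also collect child page links, you would repeat the same fetch for each collected link that stays on the same domain.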

Related

How to load a specific number of records per page and add a "more" button

On my page I would like to output all records of a specific folder
but the number should initially be limited to a certain quantity (to reduce the loading times). With a "Load more" button further records should be loaded.
Does anyone have a hint on how I can achieve this?
I have already found several approaches on the web in connection with AJAX, but since I'm not familiar with this yet, more questions than answers have emerged ...
For info: I use my own template extension / distribution under TYPO3 9.5.8.
Thank you in advance for any help!!
The state-of-the-art solution is the AJAX approach, where you load only the required records from the server and modify the page on the fly.
Another option would be a URL parameter that is evaluated by your extension:
with the parameter, the full list is shown;
without it, only the first N records plus a button linking to the same URL including the parameter for the full list.
Make sure the parameter is handled correctly and generates another cached version of the page (keyword: cHash).
As you now have two pages with partially identical content, don't forget to tell the search engines that the short variant should not be indexed.
You could use the Paginate widget as documented here: https://docs.typo3.org/other/typo3/view-helper-reference/9.5/en-us/typo3/fluid/latest/Widget/Paginate.html
By overriding the paginate template file and only rendering the pagination.nextPage link, you could load the next page via AJAX.

I need to scrape that number xpath + aspx

I'm trying to get the total number of pages (or the page value) from this URL:
http://scorelibrary.fabermusic.com/String-Quartet-No-2-300-Weihnachtslieder-23032.aspx
1/58
I think that I can't because the values are inside the ASPX frame.
I've tried a lot of things. This is the line:
<label id="page_count">1/58</label>
using the following XPath
//label[@id='page_count']/text()
How can I use XPath inside the ASPX frame to get the page value?
You are right, you cannot get that value directly because the element is in an <iframe> and therefore lives in a different browsing context. You need to switch to the iframe's context or load its URL directly. There are JavaScript approaches like postMessage, but I think the easiest way is to load the URL of the iframe directly and access the DOM from there.
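A minimal sketch of that "load the iframe URL directly" approach in Python, assuming requests and lxml are installed; the iframe URL below is a placeholder you would replace with the actual src of the <iframe> on the .aspx page:

import requests
from lxml import html

# Placeholder: replace with the actual src of the <iframe> on the .aspx page
IFRAME_URL = "http://example.com/iframe-content"

tree = html.fromstring(requests.get(IFRAME_URL).content)

# Same XPath as in the question, but run against the iframe's own document
page_count = tree.xpath("//label[@id='page_count']/text()")
print(page_count)  # e.g. ['1/58']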

How does the HTML Link Parser preprocessor work in JMeter

I want to know how the HTML Link Parser preprocessor works: how does it retrieve all links and the other elements present in the HTML response? On every blog I have checked it is written that .* will extract all links, but what about other elements? What if I don't want links and instead want to fetch an image source, or work with a drop-down or radio button available in the response? How can I extract those?
Is there going to be another regex for that, or the same .*?
As per the documentation:
This modifier parses HTML response from the server and extracts links and forms
So there are two main use cases for the HTML Link Parser:
site links crawling (spidering)
submitting random data into a form
In both cases you need to provide a Perl 5-compatible regular expression in order to limit crawling to the current domain or to narrow down the selection of options.
If you need to fetch image source(s), the best option would be using the CSS/JQuery Extractor configured like:
Selector: img
Attribute: src
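For comparison only, here is a hedged Python sketch (assuming lxml plus the cssselect package) of what "Selector: img, Attribute: src" extracts; this is an illustration, not part of JMeter itself:

from lxml import html

# A sample response body; in JMeter this would be the sampler's HTML response
body = '<html><body><img src="/a.png"><img src="/b.png"></body></html>'

tree = html.fromstring(body)
# Equivalent of the extractor settings Selector = img, Attribute = src
sources = [img.get("src") for img in tree.cssselect("img")]
print(sources)  # ['/a.png', '/b.png']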

Copying the XPath from Instagram inspect (using Chrome) returns an empty list

So I would go to an Instagram account, say https://www.instagram.com/foodie/, to copy the XPath that gives me the number of posts, the number of followers, and the number of following.
I would then run this command in a Scrapy shell:
response.xpath('//*[@id="react-root"]/section/main/article/header/section/ul')
to grab the elements in that list, but Scrapy keeps returning an empty list. Any thoughts on what I'm doing wrong here? Thanks in advance!
This site is a Single Page Application (SPA), so the DOM is rendered by JavaScript and has not been rendered yet at the time your downloader fetches the page.
When you use view(response), the JavaScript your downloader collected can continue rendering in your browser, so you see the page with the DOM rendered (though it cannot interact with the site's API). You can look at the content you actually downloaded via response.text and confirm this.
In this case, you can use Selenium + PhantomJS to produce a rendered page for your spider.
Another trick: you can use a regular expression to select the JSON part of a <script> tag, parse it into a JSON object, and read the corresponding attribute values (number of posts, followers, ...) from it, as in the sketch below.
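A hedged sketch of that regex/JSON trick as a minimal Scrapy spider; the window._sharedData script variable and the key path used below are assumptions based on Instagram's markup at the time and may have changed:

import json
import re

import scrapy


class ProfileSpider(scrapy.Spider):
    name = "profile"
    # Profile URL taken from the question
    start_urls = ["https://www.instagram.com/foodie/"]

    def parse(self, response):
        # Assumption: the page embeds its data as "window._sharedData = {...};" in a <script>
        raw = response.xpath(
            "//script[contains(., 'window._sharedData')]/text()"
        ).get()
        data = json.loads(
            re.search(r"window\._sharedData\s*=\s*(\{.*\});", raw, re.DOTALL).group(1)
        )
        # Assumption: this key path mirrors Instagram's JSON layout and may need adjusting
        user = data["entry_data"]["ProfilePage"][0]["graphql"]["user"]
        yield {
            "posts": user["edge_owner_to_timeline_media"]["count"],
            "followers": user["edge_followed_by"]["count"],
            "following": user["edge_follow"]["count"],
        }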

Retrieve the content of a section via MediaWiki API

I have a MediaWiki page set up in my company's intranet.
I would like to get the content of a section of a specific page using the MediaWiki API (through AJAX).
I would like to refer to the section by its title, like 'General', and to the page by its title as well, like 'Licenses'.
Is it possible somehow?
The only thing I could achieve is referring to the page by its title and to the section by a number, like this:
http://mywiki.local/wiki/api.php?format=xml&action=parse&prop=text&page=Licenses&section=1
But let's say I create a new section before 'General': I would then have to update all my AJAX URLs that query this page. So this isn't good enough.
I couldn't find any working solution for this. Any ideas?
You can do this by first retrieving prop=sections to get the list of sections and their numbers:
http://en.wikipedia.org/w/api.php?format=xml&action=parse&prop=sections&page=License
Then make your original request, with the section number you figured out based on the previous request.
Keep in mind that two different sections can have the same name.
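A hedged sketch of that two-step lookup in Python with requests (the same flow works for an AJAX call); the api.php URL, page title, and section title 'General' are taken from the question, and format=json is used here instead of the xml format shown above:

import requests

API = "http://mywiki.local/wiki/api.php"
PAGE = "Licenses"
SECTION_TITLE = "General"

# Step 1: list the page's sections and find the index of the wanted title
sections = requests.get(API, params={
    "format": "json", "action": "parse", "prop": "sections", "page": PAGE,
}).json()["parse"]["sections"]
# Takes the first section whose title matches (titles are not guaranteed unique)
section_index = next(s["index"] for s in sections if s["line"] == SECTION_TITLE)

# Step 2: fetch the text of just that section (same request as in the question)
text = requests.get(API, params={
    "format": "json", "action": "parse", "prop": "text",
    "page": PAGE, "section": section_index,
}).json()["parse"]["text"]["*"]
print(text)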

Resources