JMeter: How to record a script for specific domain URLs?

I am trying to capture a script using the HTTP(S) Test Script Recorder in JMeter 3.0. When I start capturing, URLs from other domains, for example download.cdn.mozilla.net, are also captured. I don't want these URLs to be recorded; I want to record URLs for a specific domain only.
So, how can I achieve this in JMeter 3.0?
Note: I tried using URL Patterns to Exclude, but since I cannot predict the other domain URLs, I don't want to use this option.
I also tried URL Patterns to Include by specifying a specific domain, i.e. ^((?!DOMAINNAME).)*$, but it still records the other domain URLs.

I would recommend breaking down your requirement into 2 parts:
Include your domain only
Exclude everything else
So, given I want to record the JMeter Home Page and filter out any external resources, the relevant configuration would be:
URL Patterns to Include: .*jmeter.apache.org.*
URL Patterns to Exclude: .*
Both inputs accept Perl5-compatible regular expressions, so double-check that the values you're providing match (or don't match) the URL patterns captured by JMeter.
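If in doubt whether a pattern will do what you expect, you can sanity-check it outside JMeter. The recorder matches the patterns with Jakarta ORO (Perl5 syntax), but for simple expressions like the ones above java.util.regex behaves the same way. A minimal sketch, assuming hypothetical host:port/path match strings of the kind the recorder builds for each request (dots escaped for strictness):

import java.util.regex.Pattern;

public class RecorderPatternCheck {
    public static void main(String[] args) {
        // Hypothetical match strings in the recorder's host:port/path form
        String ownRequest = "jmeter.apache.org:443/index.html";
        String thirdParty = "download.cdn.mozilla.net:443/some/resource.js";

        Pattern include = Pattern.compile(".*jmeter\\.apache\\.org.*");

        System.out.println(include.matcher(ownRequest).matches());  // true  -> kept by the include pattern
        System.out.println(include.matcher(thirdParty).matches());  // false -> filtered out
    }
}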
References:
JMeter Regular Expressions
Excluding Domains From The Load Test

Related

How to properly match your own domain in "URL Patterns to Include" in the JMeter Script Recorder

Our site makes 200+ requests. About 100 are our own URLs (images, CSS, JS, fonts, etc.); the other 100 are Google Analytics, New Relic, Tealium, and lots of dross.
I want to match all, and only, requests to our site, which is www.mysite.com.
In "URL Patterns to Include" I tried:
.*mysite.com.*
But this also includes many of the marketing requests, which contain the site name in their URL parameters.
Next I tried this:
https:\/\/mysite.com.*
https:\/\/www.mysite.com.*
but got no results back.
What is the proper way to include only, and all, resources loaded from your own domain?
I think this could be the way:
^www.mysite.com.*
It seems to return the right number of requests (when I clear the cache before recording, of course).
Is this the best solution?
If you look at the ProxyControl.generateMatchUrl() function source code, you will see the following:
private String generateMatchUrl(HTTPSamplerBase sampler) {
    StringBuilder buf = new StringBuilder(sampler.getDomain());
    buf.append(':'); // $NON-NLS-1$
    buf.append(sampler.getPort());
    buf.append(sampler.getPath());
    if (sampler.getQueryString().length() > 0) {
        buf.append('?'); // $NON-NLS-1$
        buf.append(sampler.getQueryString());
    }
    return buf.toString();
}
Pay attention to the sampler.getDomain() bit, which returns the DNS hostname or IP address of the URL. The match string therefore never contains a protocol, so if you add one (http or https) to your pattern it will not match anything.
So you will have to provide patterns without the protocol section, just like the patterns in the "Suggested Excludes".
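To see why the protocol-prefixed attempts returned nothing, you can rebuild the match string the same way generateMatchUrl() does and test both styles of pattern against it. A minimal sketch with made-up host, port and path values:

import java.util.regex.Pattern;

public class MatchUrlDemo {
    public static void main(String[] args) {
        // Rebuild the match string the way generateMatchUrl() does: domain:port + path (+ ?query)
        String matchUrl = "www.mysite.com" + ":" + 443 + "/index.html";

        // A pattern with a protocol prefix can never match - the match string has no scheme
        System.out.println(Pattern.matches("https://www\\.mysite\\.com.*", matchUrl)); // false

        // The same pattern without the protocol matches
        System.out.println(Pattern.matches("www\\.mysite\\.com.*", matchUrl));         // true
    }
}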
If you have to include the protocol, I think you will need to reconsider your approach to recording and switch to, for example, the JMeter Chrome Extension, which makes it possible to filter requests by protocol as well.
Moreover, you won't have to worry about proxies, certificates, etc.

JMeter Load Test: Disable/remove duplicate page and -0 -1 in result tree

I exported the .jmx script from the BlazeMeter Chrome plugin.
Question 1:
Will my load test still be accurate if I disable or remove the duplicate URLs and the .php or .json requests (they look like theme or script pages)? The page I actually want to test has no extension.
https://i.stack.imgur.com/85IJs.png
Question 2:
What is the meaning of the numbers -0, -1, ... in the Results Tree, as in the picture below?
https://i.stack.imgur.com/sEajc.png
If you want to exclude some specific URLs or patterns while recording, use the Requests Filtering tab of the HTTP(S) Test Script Recorder.
Please check this Guide for further reference: Excluding Domains from the Load Test.
This is basically the index of your sub-samplers. This sub-result naming policy was introduced in JMeter 5.0.
Before version 5.0, when an HTTP Sampler contained sub-results, they kept their own names. Since 5.0, a new naming policy is applied to sub-samplers:
sub-result name = parent sampler name + index
See Bug 62550 for further reference.
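A minimal sketch of how that convention composes the names, assuming the hyphen-plus-index suffix visible in your Results Tree (-0, -1, ...); the parent sampler name and the number of sub-results are made up:

public class SubResultNaming {
    public static void main(String[] args) {
        String parentSamplerName = "Home Page";  // hypothetical parent sampler
        int subResultCount = 3;                  // e.g. three embedded resources

        // Compose names the way JMeter 5.0+ labels sub-results in the Results Tree
        for (int i = 0; i < subResultCount; i++) {
            System.out.println(parentSamplerName + "-" + i);  // Home Page-0, Home Page-1, Home Page-2
        }
    }
}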

Get the path of an HTTP Request from another component

I am trying to let JMeter crawl my website to ensure a realistic stress test. I was able to extract the URLs from the home page and iterate over them, so I have a regular expression feeding a ForEach loop.
Now I am not able to make an HTTP Request take the output of the loop (defined as a named variable) as its path.
Is there a general approach to setting the path of such a request? JMeter is taking something like:
${MyVar}
set in the path of the request as a literal string and is not replacing it with the actual value.
Given your Regular Expression Extractor and ForEach Controller configurations are correct, everything should work fine. If you need any assistance with this, provide the following screenshots:
Regular Expression Extractor configuration
Debug PostProcessor or Debug Sampler output in the View Results Tree listener showing several generated JMeter Variables
ForEach Controller configuration
HTTP Request sampler configuration (i.e. where do you put the variable)
Be aware that you can mimic crawling the site more easily using the HTML Link Parser; the relevant configuration is quite simple.
See How to Spider a Site with JMeter - A Tutorial to learn more about simulating website crawling.
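For reference, here is roughly what the extract-links-then-iterate approach boils down to, sketched in plain Java rather than JMeter components; the HTML snippet, regex and variable name are made up for illustration:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrawlSketch {
    public static void main(String[] args) {
        // Stand-in for the home page response body (in JMeter this comes from the first HTTP Request)
        String homePageHtml = "<a href=\"/products\">Products</a> <a href=\"/about\">About</a>";

        // Same idea as a Regular Expression Extractor configured to return all matches
        List<String> paths = new ArrayList<>();
        Matcher m = Pattern.compile("href=\"(/[^\"]*)\"").matcher(homePageHtml);
        while (m.find()) {
            paths.add(m.group(1));
        }

        // Same idea as a ForEach Controller feeding the variable into the sampler's Path field
        for (String myVar : paths) {
            System.out.println("GET " + myVar);  // in JMeter: HTTP Request with Path = ${MyVar}
        }
    }
}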

How to force the JMeter proxy server to listen only for a specified page

I am using the JMeter proxy to record the steps I take in the browser. Is there a possibility to set up the proxy server to listen for just one specific page?
I want to record only the steps taken on www.test123.com
Thanks
Use an Include pattern to restrict the requests that are recorded.
As per http://jmeter.apache.org/usermanual/component_reference.html#HTTP_Proxy_Server
The include and exclude patterns are treated as regular expressions (using Jakarta ORO). They will be matched against the host name, port (actual or implied) path and query (if any) of each browser request. If the URL you are browsing is
"http://jmeter.apache.org/jmeter/index.html?username=xxxx" ,
then the regular expression will be tested against the string:
"jmeter.apache.org:80/jmeter/index.html?username=xxxx" .
Thus, if you want to include all .html files, your regular expression might look like:
".*\.html(\?.*)?" - or ".*\.html" if you know that there is no query string or you only want html pages without query strings.
"www.test123.com.*" should record requests only from the given URL.

Detecting URL rewrites (SEO urls)

How could a client detect whether a server is using Search Engine Optimization techniques such as mod_rewrite to implement "SEO friendly URLs"?
For example:
Normal URL:
http://somedomain.com/index.php?type=pic&id=1
SEO friendly URL:
http://somedomain.com/pic/1
Since mod_rewrite runs server side, there is no way a client can detect it for sure.
The only thing you can do client side is to look for some clues:
Is the HTML generated dynamically, and does it change between calls? Then /pic/1 would need to be handled by some script and is most likely not the real URL.
As said before: are there <link rel="canonical"> tags? If so, the website is telling search engines which of several URLs serving the same content it should treat as the preferred one.
Modify parts of the URL and see if you get a 404. In /pic/1 I would modify the "1" (see the sketch after this list).
If there is no mod_rewrite, the server will return a 404. If there is, the error is handled by the server-side scripting language, which can return a 404 but in most cases will return a 200 page that prints an error message.
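A minimal sketch of that probe-for-404 idea, assuming Java 11+ and a made-up target URL; it swaps the numeric segment for a value that should not exist and compares status codes:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RewriteProbe {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Hypothetical URLs: the real one and a variant with a (hopefully) non-existent id
        String realUrl = "http://somedomain.com/pic/1";
        String mutatedUrl = "http://somedomain.com/pic/99999999";

        // A plain 404 on the mutated URL hints at static paths / no rewrite handler;
        // a 200 "error page" hints at a script answering behind a rewrite rule.
        System.out.println(realUrl + " -> " + status(client, realUrl));
        System.out.println(mutatedUrl + " -> " + status(client, mutatedUrl));
    }

    private static int status(HttpClient client, String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}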
You can use a <link rel="canonical" href="..." /> tag.
The SEO aspect is usually in the words in the URL, so you can probably ignore any parts that are numeric. Usually SEO is applied over a group of like content, so that it has a common base URL, for example:
Base www.domain.ext/article, with full URL examples being:
www.domain.ext/article/2011/06/15/man-bites-dog
www.domain.ext/article/2010/12/01/beauty-not-just-skin-deep
The SEO aspect of the URL is therefore the suffix. The algorithm to apply is to typify each "folder" after the common base, assigning it a "datatype" (numeric, text, alphanumeric), and then score as follows:
HTTP response code is 200: should be obvious, but note that you can get a 404 like www.domain.ext/errors/file-not-found that would pass the other checks listed.
Non-numeric, with separators, spell-checked: separators are usually dashes, underscores or spaces. Take each word and run a spell check; score a point if the words are valid, including proper names.
Spell-checked URL text on page: if the text passes the spell check, analyze the page content to see if it appears there.
Spell-checked URL text on page inside a tag: if the prior check is true, score again if the text in its entirety appears inside an HTML tag.
Tag is important: if the prior check is true and the tag is <title> or an <h#> tag.
Usually with this approach you'll have a maximum of 5 points, unless multiple folders in the URL meet the criteria, with higher values being better. You can probably improve this with a Bayesian probability approach that uses the above to featurize URLs (i.e. to detect the occurrence of some phenomenon), plus some other clever featurizations. But then you have to train the algorithm, which may not be worth it.
Now, based on your example, you also want to capture situations where the URL has been designed so that a crawler will index it because the query parameters have become part of the URL. In that case you can still typify the suffix folders to arrive at patterns of data types - in your example's case, a common prefix always trailed by an integer - and score those URLs as SEO friendly as well.
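A rough sketch of the folder-typing idea under the assumptions above; the scoring, the tiny word list standing in for a spell checker and the sample URLs are all made up for illustration:

import java.util.List;
import java.util.Set;

public class SeoUrlScorer {
    // Tiny stand-in for a real spell checker / dictionary (assumption for this sketch)
    private static final Set<String> DICTIONARY =
            Set.of("man", "bites", "dog", "beauty", "not", "just", "skin", "deep", "article");

    public static void main(String[] args) {
        List<String> urls = List.of(
                "www.domain.ext/article/2011/06/15/man-bites-dog",
                "www.domain.ext/article/2010/12/01/beauty-not-just-skin-deep",
                "www.domain.ext/index.php");

        for (String url : urls) {
            System.out.println(url + " -> score " + score(url));
        }
    }

    static int score(String url) {
        int score = 0;
        String[] folders = url.split("/");
        for (int i = 1; i < folders.length; i++) {   // skip the host part
            String folder = folders[i];
            if (folder.matches("\\d+")) {
                continue;                            // numeric folders carry no SEO signal
            }
            String[] words = folder.split("[-_ ]");  // dashes, underscores or spaces as separators
            if (words.length > 1 && allInDictionary(words)) {
                score++;                             // non-numeric, separated, spell-checked words
            }
        }
        // Further points (text found on the page, inside a tag, inside <title>/<h#>)
        // would require fetching and parsing the page and are omitted here.
        return score;
    }

    private static boolean allInDictionary(String[] words) {
        for (String word : words) {
            if (!DICTIONARY.contains(word.toLowerCase())) {
                return false;
            }
        }
        return true;
    }
}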
I presume you would be using one of the curl variants.
You could try sending the same request but with different "User-Agent" values.
I.e. send the request once using user agent "Mozilla/5.0" and a second time using user agent "Googlebot"; if the server is doing something special for web crawlers, there should be a different response.
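A minimal sketch of that comparison, assuming Java 11+ and a made-up target URL; it fetches the same page with two User-Agent values and reports whether the bodies differ:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class UserAgentProbe {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String url = "http://somedomain.com/pic/1";  // hypothetical target

        String asBrowser = fetch(client, url, "Mozilla/5.0");
        String asCrawler = fetch(client, url, "Googlebot");

        // Different bodies suggest the server treats crawlers specially
        System.out.println("Responses differ: " + (!asBrowser.equals(asCrawler)));
    }

    private static String fetch(HttpClient client, String url, String userAgent) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}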
With today's frameworks and the URL routing they provide, I don't even need mod_rewrite to create friendly URLs such as http://somedomain.com/pic/1, so I doubt you can detect anything. I would create such URLs for all visitors, crawlers or not. Maybe you can spoof some bot headers to pretend you're a known crawler and see if there's any change. Not sure how legal that is, to be honest.
For dynamic URL patterns, it's better to use the <link rel="canonical" href="..." /> tag on the other duplicate pages.
