Trying to extract lines that contain string between brackets - python-3.9

Each line contains a URL, a status code, and some other stuff. An example line:
https://3836200.domain.com/ [404]
So I figured out that I can use
__contains__('200')
But that will not work here, because 200 appears in the subdomain while the status code is 404. I'm trying to separate lines by status code. I then figured out how to use re.search to get the string between the brackets, but that doesn't print the whole line. Any help, or a reference to an article on this, would be appreciated. Thanks. Btw, I'm using Python 3.9.

Assuming this format is maintained, here is a solution:
import re
# Raw string: capture the digits inside the trailing "[...]".
regex = r"/ \[(\d+)\]$"
line = "https://3836200.domain.com/ [404]"
search = re.search(regex, line)
if search is not None:
    print(search.group(1))
Output: 404
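Building on the same idea, here is a minimal sketch that keeps the whole line and groups lines by status code, which is what the question asks for. It assumes the status code is the first three-digit number inside square brackets on each line; the function name group_by_status and the sample URLs are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: the status code is the first three-digit number in square brackets.
STATUS_RE = re.compile(r"\[(\d{3})\]")

def group_by_status(lines):
    """Return {status_code: [whole lines]} for lines containing "[NNN]"."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        match = STATUS_RE.search(line)
        if match:
            # Store the full line, not just the captured code.
            groups[match.group(1)].append(line)
    return dict(groups)

# Hypothetical sample data; in practice, pass an open file object instead.
sample = [
    "https://3836200.domain.com/ [404]",
    "https://example.domain.com/ [200]",
    "https://4040.domain.com/ [200]",
]
grouped = group_by_status(sample)
for line in grouped.get("200", []):
    print(line)
```

Because the match is anchored to the brackets, a 200 or 404 that happens to appear in the subdomain is ignored.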

Related

Jmeter - How Can I Replace a string and resend it?

I'm trying to create a script that will take a URL out of a response and send it out again.
Using the regular expression extractor I've succeeded in taking the wanted URL, but it holds "&amp;", so naturally the request fails when it is sent out.
Example:
GET http://[ia-test01.inner-active.mobi:8080/simpleM2M/ClientUpdateStatus?cn=WS2006&amp;v=2_1_0-iOS-2_0_3_7&amp;ci=99999&amp;s=3852719769860497476&amp;cip=113-170-93-111&amp;po=642&amp;re=1&amp;lt=0&amp;cc=VN&amp;acp=&amp;pcp=]/
I'm trying to replace the "&amp;" with a "&".
I've tried: ${__javaScript(${url}.replace("&amp;","&"))}
But it did not work. I've tried the regex function as well- the same.
I'm not sure the IP field in the request supports the use of functions.
I'm currently trying to use the beanshell post-processor. But I'm pretty sure there is a simpler solution I'm missing.
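Not sure what you're trying to achieve by replacing & with &, but I'll try to respond.
First of all: given multiple & instances, every occurrence has to be replaced. JavaScript's replace with a string pattern only replaces the first match, whereas Java's replaceAll (used in Beanshell) replaces every match.
Second: replaceAll takes a RegEx as its first parameter, so regex special characters need escaping. (& is not a special character, so the escape in the code below is harmless but not required.)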
If you're trying to substitute URL Path in realtime, you'll need Beanshell Pre Processor, not the Post Processor
Sample Beanshell Pre-Processor code
import java.net.URL;
URL myURL = sampler.getUrl();
String path = myURL.getPath();
String path_replaced = path.replaceAll("\\&", "&");
vars.put("NEW_PATH", path_replaced);
After that put ${NEW_PATH} to "Path:" section of your HTTP Request.
Hope this helps.
Solution with less code:
Install the Custom JMeter Functions plugin
Use the following syntax
${__strReplace(ImAGoodBoy,Good,Bad,replaceVar)}
‘ImAGoodBoy’ is a string in which replacement will take place
‘Good’ is a substring to be replaced
‘Bad’ is the replacement string
‘replaceVar’ is a variable to save result string
Refer this URL for more info!
Thanks a lot. However, I see from recent experience that to replace a character that is actually a RegExp special character, like \ " ( ) etc., one or two backslashes are not enough: the backslash is escaped once for the regex and once more for the string literal. This is weird.
So you write
var res = str.replaceAll("\\\\u003c", "<");
to replace \u003c with <

What can I do to search for a string in a really big txt file with Ruby?

I've run into a problem that I can't find a good way to solve.
Problem description:
File 1: short_map.txt contains over 2 million lines, each consisting of a short URL like the ones on Twitter and its corresponding full web URL.
(e.g. "http://bit.ly/18sy7Fzhttp://www.london24.com/spurs_star_townsend_deemed_hodgson_joke_a_compliment_1_2903643?utm_source=Daily+News&utm_medium=twitter")
File 2: html_index.txt contains about 50k lines, each holding a full web URL.
(eg."http://www.redbubble.com/people/tipptoggy/works/10898437-rock-of-cashel")
I want to get the corresponding short URL for each web URL in html_index.txt and write it to a new txt file.
My approach is to read each line of html_index.txt and compare it with every line in short_map.txt. This gets me everything I want, but the problem is that it's too slow!
Could anyone help me with a much faster algorithm for this?
Problem solved: using a hash table works, please refer to the first answer! Thanks!
Read the contents of short_map.txt into a hash where the key is the long URL and the value is the corresponding short URL. Retrieving a short URL is then a single hash lookup, which is extremely fast.
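A minimal sketch of this lookup, written in Python to match the first question on this page (in Ruby, a Hash plays the same role as the dict here). It assumes each line of short_map.txt holds the short URL and the long URL separated by whitespace, which the question does not actually show; the function names and sample URLs are made up for illustration.

```python
def build_long_to_short(map_lines):
    """Map each long URL to its short URL.

    Assumption: every line is "<short_url> <long_url>", whitespace-separated.
    """
    long_to_short = {}
    for line in map_lines:
        parts = line.split(None, 1)
        if len(parts) == 2:
            short_url, long_url = parts
            long_to_short[long_url.strip()] = short_url
    return long_to_short

def lookup_short_urls(index_lines, long_to_short):
    """Yield (long_url, short_url or None) for every URL in the index."""
    for line in index_lines:
        long_url = line.strip()
        if long_url:
            yield long_url, long_to_short.get(long_url)

# Hypothetical sample data standing in for the two files.
short_map = [
    "http://bit.ly/aaa111 http://www.example.com/page-one\n",
    "http://bit.ly/bbb222 http://www.example.com/page-two\n",
]
html_index = [
    "http://www.example.com/page-two\n",
    "http://www.example.com/missing\n",
]
table = build_long_to_short(short_map)
results = list(lookup_short_urls(html_index, table))
for long_url, short_url in results:
    print(long_url, short_url)
```

The table is built in one pass over the large file, and each of the 50k lookups then takes roughly constant time on average, instead of scanning 2 million lines per URL.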

How to get the last word from a URL?

Is there a method to extract just the last word from the URL example below? I would like to be able to use this as a heading on a page, i.e. the "Account" page.
I found that by using request.path it will give me the path without the root but I'm not sure how to get just the last path name.
/users/1234/account
Try:
request.path.split('/').last
If you want "Account" (instead of "account"), call the capitalize method on the result.
I am not familiar with Ruby, but you can try this approach.
Try splitting request.path with '/' as the separator and take the last element from the resulting array.
users/1234/account will be split into ['users', '1234', 'account']
Even though this doesn't answer your question directly, I hope it gives you a start.
URLs are a simple string consisting of a scheme showing how to connect to a site, the host where the resource is located, plus a path to that resource. You can use File.basename to get the last part of that path, just like we'd use on a file on our disk:
File.basename('/users/1234/account')
=> "account"
Suppose you have URL like https://www.google.com/user/lastword
If you want to store the last word of the URL (here, lastword) in a variable, use the following, passing the URL as the value of finalVal.
var getLastWordFromUrl = finalVal.split("/").last()

Ruby RegEx issue

I'm having a problem getting my RegEx to work with my Ruby script.
Here is what I'm trying to match:
http://my.test.website.com/{GUID}/{GUID}/
Here is the RegEx that I've tested and should be matching the string as shown above:
/([-a-zA-Z0-9#:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9#:%_\+.~#?&\/\/=]*)([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])*?\/)/
3 capturing groups:
group 1: ([-a-zA-Z0-9#:%_\+.~#?&\/\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9#:%_\+.~#?&\/\/=]*)([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])*?\/)
group 2: (\/[-a-zA-Z0-9#:%_\+.~#?&\/\/=]*)
group 3: ([\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\/\/])
Ruby is giving me an error when trying to validate a match against this regex:
empty range in char class: (My RegEx goes here) (SyntaxError)
I appreciate any thoughts or suggestions on this.
You could simplify things a bit by using URI to deal with parsing the URL, \h in the regex, and scan to pull out the GUIDs:
require 'uri'
uri = URI.parse(your_url)
path = uri.path
guids = path.scan(/\h{8}-\h{4}-\h{4}-\h{4}-\h{12}/)
If you need any of the non-path components of the URL then you can easily pull them out of uri.
You might need to tighten things up a bit depending on your data or it might be sufficient to check that guids has two elements.
You have several errors in your RegEx. I am very sleepy now, so I'll just give you a hint instead of a solution:
...[\/\/[0-9a-fA-F]....
the first [ does not belong there. Also, having \/\/ inside [] is unnecessary - you only need each character once inside []. Also,
...[-a-zA-Z0-9#:%_\+.~#?&\/\/=]{2,256}...
is greedy, and includes a period - indeed, includes all chars (AFAICS) that can come after it, effectively swallowing the whole string (when you get rid of other bugs). Consider {2,256}? instead.

How can multiple trailing slashes be removed from a URL in Ruby

What I'm trying to achieve here is this: let's say we have two example URLs:
url1 = "http://emy.dod.com/kaskaa/dkaiad/amaa//////////"
url2 = "http://www.example.com/"
How can I extract the stripped-down URLs?
url1 = "http://emy.dod.com/kaskaa/dkaiad/amaa"
url2 = "http://www.example.com"
URI.parse in Ruby sanitizes certain types of malformed URL but is ineffective in this case.
If we use regex then /^(.*)\/$/ removes a single slash / from url1 and is ineffective for url2.
Is anybody aware of how to handle this type of URL parsing?
The point here is I don't want my system to have http://www.example.com/ and http://www.example.com being treated as two different URLs. And same goes for http://emy.dod.com/kaskaa/dkaiad/amaa//// and http://emy.dod.com/kaskaa/dkaiad/amaa/.
If you just need to remove all slashes from the end of the url string then you can try the following regex:
"http://emy.dod.com/kaskaa/dkaiad/amaa//////////".sub(/(\/)+$/,'')
"http://www.example.com/".sub(/(\/)+$/,'')
/(\/)+$/ - this regex finds one or more slashes at the end of the string. Then we replace this match with empty string.
Hope this helps.
Although this thread is a bit old and the top answer is quite good, I suggest another way to do this:
/^(.*?)\/*$/
You could see it in action here: https://regex101.com/r/vC6yX1/2
The magic here is *?, which does a lazy match. So the entire expression could be translated as:
Match as few characters as possible and capture them, while matching as many slashes as possible at the end.
In plainer English: the capture group holds the URL with all trailing slashes removed.
def without_trailing_slash path
  path[ %r(.*[^/]) ]
end
path = "http://emy.dod.com/kaskaa/dkaiad/amaa//////////"
puts without_trailing_slash path # "http://emy.dod.com/kaskaa/dkaiad/amaa"
