Extracting the anchor from a URL in ruby - ruby

I looked through the documentation of the URI class in Ruby but couldn't find a way to extract the (HTML) anchor from an instance. For example, in
http://example.com/index.php?q=something#anchor
I would like to get the anchor text. The trivial solution is to manipulate the text with regular expressions, but a built-in method would be much better.

After require 'uri', the object returned by URI(...) has a fragment attribute, e.g.:
>> uri = URI("http://example.com/index.php?q=something#anchor")
>> uri.fragment
=> "anchor"
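
As a slightly fuller sketch of the same idea, the snippet below reads the fragment next to the query string and shows what comes back when a URL has no anchor at all. The second URL is a made-up variation of the one in the question, used only for illustration.

```ruby
require 'uri'

# Both URLs are illustrative; the second is the question's URL without its anchor.
with_anchor    = URI.parse("http://example.com/index.php?q=something#anchor")
without_anchor = URI.parse("http://example.com/index.php?q=something")

puts with_anchor.fragment             # the part after '#'
puts with_anchor.query                # the part between '?' and '#'
puts without_anchor.fragment.inspect  # nil when there is no '#' part
```

Because fragment is nil for a URL without an anchor, it is worth checking for nil before calling string methods on the result.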

Related

Find a url in a document using regex in ruby

I have been trying to find a URL in an HTML document. This has to be done with a regex since the URL is not in any HTML tag, so I can't use Nokogiri for that. To get the HTML I used HTTParty, like this:
require 'httparty'
doc = HTTParty.get("http://127.0.0.1:4040")
puts doc
That outputs the HTML code. To get the URL I used the .split method to reach it. The full code is:
require 'httparty'
doc = HTTParty.get('http://127.0.0.1:4040').split(".ngrok.io")[0].split('https:')[2]
puts "https:#{doc}.ngrok.io"
I want to do this with a regex, since ngrok might update their localhost HTML page and then this code would no longer work. How do I do it?
If I understood correctly, you want to find all hostnames matching "https://(any subdomain).ngrok.io", right?
If so, you can use String#scan with a regexp. Here is an example:
# get your body (replace with your HTTP request)
body = "my doc contains https://subdomain.ngrok.io and https://subdomain-1.subdomain.ngrok.io"
puts body
# Use scan and you're done
urls = body.scan(%r{https://[0-9A-Za-z.-]+\.ngrok\.io})
puts urls
It will result in an array containing ["https://subdomain.ngrok.io", "https://subdomain-1.subdomain.ngrok.io"].
Call .uniq if you want to get rid of duplicates.
This doesn't handle all edge cases, but it's probably enough for what you need.
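
A minimal sketch that combines scan and uniq in one helper might look like the following. The method name ngrok_urls and the sample text are hypothetical, and the character class is written as [0-9A-Za-z.-], which should match the same characters as the pattern in the answer.

```ruby
# Hypothetical helper: returns each distinct ngrok URL found in a string.
def ngrok_urls(html)
  # Letters, digits, dots and hyphens, followed by the literal ".ngrok.io".
  html.scan(%r{https://[0-9A-Za-z.-]+\.ngrok\.io}).uniq
end

# Made-up text standing in for the body returned by the HTTP request.
sample = 'tunnel at https://abc123.ngrok.io, again https://abc123.ngrok.io, and https://a-1.b.ngrok.io'

urls = ngrok_urls(sample)
puts urls
```

With a real page, the HTTParty response body would be passed in place of sample.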

How can I convert a relative link in Mechanize to an absolute one?

Is there a way to convert a Mechanize relative-link object to another one which contains the absolute URL?
Mechanize must know the absolute link, because I can call the click method on relative links too.
You can just merge the page uri (which is always absolute) with the link uri:
page.uri.merge link.uri
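
To illustrate what merge does, here is a small sketch using only the standard library's URI class, without Mechanize. The page URL and the links are made up for the example; in Mechanize, page.uri and link.uri would take their place.

```ruby
require 'uri'

# Hypothetical absolute URL of the page the links were found on.
page_uri = URI("http://example.com/articles/index.html")

root_relative = page_uri.merge("/something")              # starts from the site root
dir_relative  = page_uri.merge("next.html")               # relative to the page's directory
already_abs   = page_uri.merge("http://other.example/x")  # absolute links pass through

puts root_relative
puts dir_relative
puts already_abs
```

The result of merge is itself a URI object, so call to_s on it if a plain string is needed.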
This is not specific to Mechanize, but an easy way would be to take the base URL from the <base> tag and prepend it to the relative URL for whatever purpose you want. This generally works.
But then, I'm not sure whether you could call the click method on the result, since I don't know Mechanize that well.
You can also use resolve
Example:
require 'mechanize'
agent = Mechanize.new
page = agent.get(url) # url: the absolute URL of the page being scraped
some_rel_url = '/something'
absolute_url = agent.resolve(some_rel_url)
Keep in mind that the other answers do not take into account all the possible ways of determining the base URL, as described here.

Obtain XML element's value from REST server response using Ruby

Newbie REST question: I'm making a GET request to an API endpoint and getting the proper XML response. How do I get the value of a particular XML element in the server's REST response using Ruby?
So let's say one of the elements is 'Body' and I want to assign its value, 'Blah blah blah', to a variable.
Part of the XML response:
<Body>Blah blah blah</Body>
How would I do that with the response? Basically, I want to do something like this:
variable = params["Body"]
Thanks in advance!
The best solution is to use RestClient or HTTParty and have it parse the response for you.
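Otherwise, you'll have to parse the response yourself using a library such as Nokogiri:
require 'nokogiri'
doc = Nokogiri::XML(response) # response: the XML body as a string
variable = doc.at("Body").text # element names are case-sensitive in XML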
You'll want to use an XML parser of some kind.
It sounds like you want something like XmlSimple, which will turn an XML document into Ruby arrays and hashes. There are plenty of examples of how to use it on the linked page.
One thing to be aware of is that XML to native container mappings are imperfect. If you're dealing with a complex document, you'll likely want to use a more robust parser, like Nokogiri.
If you want full XML Object Mapping, HappyMapper is a decent library, although it isn't very active anymore. It can work with XML from any source, so you'll still want something like the libraries mentioned by @Fitzsimmons or @MarkThomas to do the HTTP request.
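
For a version that needs no extra gem install on a typical Ruby setup, the following sketch does the same lookup with REXML, which ships with Ruby as a bundled gem. This swaps in REXML for the libraries named in the answers; the wrapping Message element is an assumption, since the question shows only the Body element.

```ruby
require 'rexml/document'

# Hypothetical response body; only <Body> comes from the question.
xml = '<Message><Body>Blah blah blah</Body></Message>'

doc = REXML::Document.new(xml)

# XPath lookups are case-sensitive, so the name must match the document exactly.
body_element = REXML::XPath.first(doc, '//Body')
variable = body_element ? body_element.text : nil

puts variable
```

Guarding against a missing element avoids a NoMethodError when the response does not contain the expected tag.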

How can I get Mechanize objects from Mechanize::Page's search method?

I'm trying to scrape a site where I can only rely on classes and element hierarchy to find the right nodes. But using Mechanize::Page#search returns Nokogiri::XML::Elements which I can't use to fill and submit forms etc.
I'd really like to use pure CSS selectors, although matching on classes seems to be pretty straightforward with the various _with methods too. However, matching things like :not(.class) is pretty verbose compared to simply using CSS selectors, and I have no idea how to match on element hierarchy.
Is there a way to convert Nokogiri elements back to Mechanize objects or even better get them straight from the search method?
As stated in this answer, you can simply construct a new Mechanize::Form object using your Nokogiri::XML::Element retrieved via Mechanize::Page#search or Mechanize::Page#at:
a = Mechanize.new
page = a.get 'https://stackoverflow.com/'
# Get the search form via ID as a Nokogiri::XML::Element
form = page.at '#search'
# Convert it back to a Mechanize::Form object
form = Mechanize::Form.new form, a, page
# Use it!
form.q = 'Foobar'
result = form.submit
Note: You have to provide the Mechanize object and the Mechanize::Page object to the constructor to be able to submit the form. Otherwise it would just be a Mechanize::Form object without context.
There seems to be no central utility function to convert Nokogiri::XML::Elements to Mechanize elements but rather the conversions are implemented where they are needed. Consequently, writing a method that searches the document by CSS or XPath and returns Mechanize elements if applicable would require a pretty big switch-case on the node type. Not exactly what I imagined.

Sinatra and computed view name

I'm completely new to Ruby and Sinatra, so please forgive the trivial question:
I want to compute the view name instead of just passing in a symbol, so that the same action returns different views depending on the current state. There are about 20 different states, so a good naming convention lets me express the view name as a string very easily:
get "/page" do
  erb "page-#{session[:page]}"
end
When I do that, all I get is the string instead of the rendered view. Can anyone explain how I could do that in Sinatra?
I'd say you're probably looking for String#to_sym. I haven't tested this, but all the examples show erb receiving a symbol argument, not a string, so try this:
erb "page-#{session[:page]}".to_sym
or equivalently
erb :"page-#{session[:page]}"
If you pass a string to erb, it renders that string directly as the template instead of looking for a view with the corresponding name. Converting the string to a symbol fixes that.
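
The equivalence of the two forms can be checked in plain Ruby without Sinatra. In this sketch the local variable page stands in for session[:page], and its value is made up.

```ruby
# Hypothetical state value; in the Sinatra action this would be session[:page].
page = 3

from_string = "page-#{page}".to_sym  # build the string first, then convert
literal     = :"page-#{page}"        # interpolated symbol literal

puts from_string.inspect
puts from_string == literal
```

Either value can then be handed to erb, which looks up a view file of that name in the views directory.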
