How to parse a URL using Ruby - ruby

Hi,
How can I print
http://site.tf/home/
from
http://site.tf/home/index.php?id=12
by parsing the URL in Ruby?

Do it like this:
require 'uri'
uri = URI.parse('http://site.tf/home/index.php?id=12')
# uri.path.split('/') is ["", "home", "index.php"], so [1] is the first path segment
"#{uri.scheme}://#{uri.host}/#{uri.path.split('/')[1]}/"
#=> "http://site.tf/home/"
I haven't tested it, but I guess it should work fine.
Update
If you want just site.tf, do this:
require 'uri'
uri = URI.parse('http://site.tf/home/index.php?id=12')
uri.host
#=> "site.tf"
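The snippet above keeps only the first path segment. As a minimal sketch for deeper paths, the file name, query string, and fragment can be dropped by trimming the path back to its last slash. The helper name base_directory is made up for this illustration and is not part of the URI library.

```ruby
require 'uri'

# Hypothetical helper: return the URL up to and including the last "/" of its
# path, dropping the file name, query string, and fragment.
def base_directory(url)
  uri = URI.parse(url)
  uri.path = uri.path[/\A.*\//] || '/' # "/home/index.php" -> "/home/"
  uri.query = nil
  uri.fragment = nil
  uri.to_s
end

puts base_directory('http://site.tf/home/index.php?id=12') # prints http://site.tf/home/
puts base_directory('http://site.tf/a/b/c.php?x=1#top')    # prints http://site.tf/a/b/
```

Because the URI object is modified rather than rebuilt from scheme and host, a port or userinfo in the original URL is preserved.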

Related

Want to get the list of product URLs from taobao's search result page without the taobao API

I want to get the list of product URLs from taobao's search result page without using the taobao API.
I tried the following Ruby script.
require "open-uri"
require "rubygems"
require "nokogiri"
url='https://world.taobao.com/search/search.htm?_ksTS=1517338530524_300&spm=a21bp.7806943.20151106.1&search_type=0&_input_charset=utf-8&navigator=all&json=on&q=%E6%99%BA%E8%83%BD%E6%89%8B%E8%A1%A8&cna=htqfEgp0pnwCATyQWEDB%2FRCE&callback=__jsonp_cb&abtest=_AB-LR517-LR854-LR895-PR517-PR854-PR895'
charset = nil
html = open(url) do |f|
charset = f.charset
f.read
end
doc = Nokogiri::HTML.parse(html, nil, charset)
p doc.xpath('//*[@id="list-itemList"]/div/div/ul/li[1]/div/div[1]/div/a/@href').each { |i| puts i.text }
# => 0
I want to get list of URL like https://click.simba.taobao.com/cc_im?p=%D6%C7%C4%DC%CA%D6%B1%ED&s=328917633&k=525&e=lDs3%2BStGrhmNjUyxd8vQgTvfT37ERKUkJtUYVk0Fu%2FVZc0vyfhbmm9J7EYm6FR5sh%2BLS%2FyzVVWDh7%2FfsE6tfNMMXhI%2B0UDC%2FWUl0TVvvELm1aVClOoSyIIt8ABsLj0Cfp5je%2FwbwaEz8tmCoZFXvwyPz%2F%2ByQnqo1aHsxssXTFVCsSHkx4WMF4kAJ56h9nOp2im5c3WXYS4sLWfJKNVUNrw%2BpEPOoEyjgc%2Fum8LOuDJdaryOqOtghPVQXDFcIJ70E1c5A%2F3bFCO7mlhhsIlyS%2F6JgcI%2BCdFFR%2BwwAwPq4J5149i5fG90xFC36H%2B6u9EBPvn2ws%2F3%2BHHXRqztKxB9a0FyA0nyd%2BlQX%2FeDu0eNS7syyliXsttpfoRv3qrkLwaIIuERgjVDODL9nFyPftrSrn0UKrE5HoJxUtEjsZNeQxqovgnMsw6Jeaosp7zbesM2QBfpp6NMvKM5e5s1buUV%2F1AkICwRxH7wrUN4%2BFn%2FJ0%2FIDJa4fQd4KNO7J5gQRFseQ9Z1SEPDHzgw%3D however I am getting 0
What should I do?
I don't know taobao.com, but the page seems to run a lot of JavaScript, so the content may not be retrievable by a client without JavaScript support. Instead of open-uri, you could try the selenium-webdriver gem:
https://rubygems.org/gems/selenium-webdriver/versions/2.53.4
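A separate thing that may be worth checking: the URL in the question carries json=on and callback=__jsonp_cb, which suggests the endpoint answers with JSONP rather than HTML. That is an assumption, since the actual response is not shown in the thread. If it holds, the wrapper could be stripped and the payload parsed as JSON, roughly as sketched below. The sample payload and the "items" and "href" keys are invented for illustration and are not taobao's real format.

```ruby
require 'json'

# Strip a JSONP wrapper such as "__jsonp_cb({...});" and return the JSON text.
# Falls back to the unchanged body when no wrapper is found.
def unwrap_jsonp(body)
  match = body.match(/\A\s*[\w.$]+\s*\((.*)\)\s*;?\s*\z/m)
  match ? match[1] : body
end

# Hypothetical payload shape: the "items" and "href" keys are assumptions.
def urls_from_jsonp(body)
  JSON.parse(unwrap_jsonp(body)).fetch('items', []).map { |item| item['href'] }
end

sample = '__jsonp_cb({"items":[{"href":"https://example.com/item/1"},' \
         '{"href":"https://example.com/item/2"}]});'
puts urls_from_jsonp(sample)
```

Printing the first few hundred characters of the real response is the quickest way to see whether this applies and which keys to look for.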

Extracting all URLs from a page using Ruby [duplicate]

I am trying to extract all the URLs from the raw output of some Ruby code:
require 'open-uri'
reqt = open("http://www.google.com").read
reqt.each_line { |line|
if line =~/http/ then
puts URI.extract(line)
end }
What am I doing wrong? I am getting extra lines along with URLs.
Remember that a URL doesn't have to start with "http"; it could be a relative URL, that is, a path relative to the current page. IMO it is best to use Nokogiri to parse the HTML:
require 'open-uri'
require 'nokogiri'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs
reqt = URI.open("http://www.google.com")
doc = Nokogiri::HTML(reqt)
# select every <a> element that has an href attribute
doc.xpath('//a[@href]').each do |a|
  puts a.attr('href')
end
But if you really want to find only the absolute URLs, add a simple condition:
puts a.attr('href') if a.attr('href') =~ /^http/i
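If the relative links are wanted too, they can be resolved against the page URL instead of being filtered out. The following is a minimal sketch using URI.join from the standard library. The helper name absolute_links and the sample hrefs are made up for illustration.

```ruby
require 'uri'

# Hypothetical helper: resolve each href against the page URL and keep only
# the http/https results. Hrefs that cannot be parsed are skipped.
def absolute_links(page_url, hrefs)
  hrefs.filter_map do |href|
    absolute = URI.join(page_url, href)
    absolute.to_s if absolute.is_a?(URI::HTTP) # URI::HTTPS is a subclass
  rescue URI::InvalidURIError
    nil
  end
end

hrefs = ['/intl/en/about.html', 'images/logo.png',
         'https://example.com/a', 'mailto:someone@example.com']
puts absolute_links('http://www.google.com/search/index.html', hrefs)
```

With the sample input, the mailto: link is dropped and the two relative links come back as full http URLs.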
You can do this instead:
require 'open-uri'
reqt = URI.open("http://www.google.com").read # URI.open is needed on Ruby 3.0+
urls = reqt.scan(/[[:lower:]]+:\/\/[^\s"]+/)

What is a good way to extract a url within a url in Ruby?

Given url = 'http://www.foo.com/bar?u=http://example.com/yyy/zzz.jpg&aaa=bbb&ccc=ddd'
What is a good way to extract http://example.com/yyy/zzz.jpg?
EDIT:
I would like to extract the second url.
I'd do:
require 'uri'
url = 'http://www.foo.com/bar?u=http://example.com/yyy/zzz.jpg&aaa=bbb&ccc=ddd'
uri = URI(url)
URI.decode_www_form(uri.query).select { |_,b| b[/^http(s)?/] }.map(&:last)
# => ["http://example.com/yyy/zzz.jpg"]
# or something like
Hash[URI.decode_www_form(uri.query)]['u'] # => "http://example.com/yyy/zzz.jpg"
require "uri"
URI.extract("text here http://foo.example.org/bla and here mailto:test@example.com and here also.")
# => ["http://foo.example.org/bla", "mailto:test@example.com"]
http://www.ruby-doc.org/stdlib-2.1.1/libdoc/uri/rdoc/URI.html
Using Ruby 2.0+:
require 'uri'
url = 'http://www.foo.com/bar?u=http://example.com/yyy/zzz.jpg&aaa=bbb&ccc=ddd'
uri = URI.parse(url)
URI.decode_www_form(uri.query).to_h['u'] # => "http://example.com/yyy/zzz.jpg"
For Ruby < 2.0:
require 'uri'
url = 'http://www.foo.com/bar?u=http://example.com/yyy/zzz.jpg&aaa=bbb&ccc=ddd'
uri = URI.parse(url)
Hash[URI.decode_www_form(uri.query)]['u'] # => "http://example.com/yyy/zzz.jpg"
The Addressable gem is very full-featured, and follows the specs better than URI. The same thing can be done using:
require 'addressable/uri'
url = 'http://www.foo.com/bar?u=http://example.com/yyy/zzz.jpg&aaa=bbb&ccc=ddd'
uri = Addressable::URI.parse(url)
uri.query_values['u'] # => "http://example.com/yyy/zzz.jpg"
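One caveat with all of the above: the inner URL in the question is not percent-encoded, so it only survives because it contains no & or = of its own. Here is a minimal sketch of the round trip with proper encoding, using URI.encode_www_form and URI.decode_www_form. The helper names wrap_url and unwrap_url are made up for illustration.

```ruby
require 'uri'

# Hypothetical helper: build the outer URL with the inner URL percent-encoded
# as the "u" parameter, followed by any extra parameters.
def wrap_url(base, inner, extra = {})
  "#{base}?#{URI.encode_www_form({ 'u' => inner }.merge(extra))}"
end

# Hypothetical helper: recover the inner URL from the "u" parameter.
def unwrap_url(outer)
  URI.decode_www_form(URI.parse(outer).query).to_h['u']
end

inner = 'http://example.com/yyy/zzz.jpg?size=large&crop=1'
outer = wrap_url('http://www.foo.com/bar', inner, 'aaa' => 'bbb')
puts outer
puts unwrap_url(outer) # prints the inner URL, including its own query string
```

Without the encoding step, &crop=1 would be read as a parameter of the outer URL and the extracted value would be cut short.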

How to handle a JSON file returned by the server with Ruby?

I have a JSON file returned by a web radio:
require 'open-uri'
require 'json'
songlist=open('http://douban.fm/j/mine/playlist?type=n&channel=0')
##this will return a json file:
##{"r":0,"song":[{"album":"\/subject\/25863639\/","picture":"http:\/\/img5.douban.com\/mpic\/s27256956.jpg","ssid":"7656","artist":"Carousel Kings","url":"http:\/\/mr3.douban.com\/201404122019\/660a1b4494a255e0333dfdc9ffadcf08\/view\/song\/small\/p2055547.mp3","company":"Not On Label","title":"Silence","rating_avg":3.73866,"length":194,"subtype":"","public_time":"2014","sid":"2055547","aid":"25863639","sha256":"ebf027adfaf9882118456941a774eeb509c29c4c278f55f587ba2faaa858a49d","kbps":"64","albumtitle":"Unity","like":false}]}
I want to get information such as song[0]['url'], song[0]['title'], and song[0]['album'], and then use smplayer in the terminal to play the song that the URL points to.
How can I do that with Ruby?
Thanks.
I would use JSON.parse, as below:
require 'open-uri'
require 'json'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs
songlist = URI.open('http://douban.fm/j/mine/playlist?type=n&channel=0').read
parsed_songlist = JSON.parse(songlist)
# JSON.parse turns the escaped \/ sequences back into plain slashes
parsed_songlist["song"][0]["url"] #=> "http://mr3.douban.com/201404122019/660a1b4494a255e0333dfdc9ffadcf08/view/song/small/p2055547.mp3"
parsed_songlist["song"][0]["title"] #=> "Silence"
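For the second half of the question (playing the song with smplayer), here is a minimal offline sketch. It parses a trimmed copy of the response quoted in the question and shows one way the URL could be handed to the player. The helper names first_song and play are made up, and the call to smplayer is left commented out because it assumes smplayer is installed and on the PATH.

```ruby
require 'json'

# Trimmed copy of the response quoted in the question (only a few keys kept).
SAMPLE_RESPONSE = <<~'JSON'
  {"r":0,"song":[{"album":"\/subject\/25863639\/","url":"http:\/\/mr3.douban.com\/201404122019\/660a1b4494a255e0333dfdc9ffadcf08\/view\/song\/small\/p2055547.mp3","title":"Silence","artist":"Carousel Kings"}]}
JSON

# Hypothetical helper: return the first song as a Hash with string keys.
def first_song(json_text)
  JSON.parse(json_text).fetch('song').first
end

# Hypothetical helper: hand the URL to smplayer. Passing the arguments
# separately means the URL is never interpreted by a shell.
def play(url)
  system('smplayer', url)
end

song = first_song(SAMPLE_RESPONSE)
puts song['title']
puts song['url']
# play(song['url']) # uncomment on a machine with smplayer installed
```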

Testing filepicker.io security using Ruby

I'm trying to build a test that will allow me to exercise FilePicker.io security. The code is run as:
ruby test.rb [file handle]
and the result is the query string that I can append to a FilePicker URL. I'm pretty sure my policy is getting read properly, but my signature isn't. Can someone tell me what I'm doing wrong? Here's the code:
require 'rubygems'
require 'base64'
require 'cgi'
require 'openssl'
require 'json'
handle = ARGV[0]
expiry = Time::now.to_i + 3600
policy = {:handle=>handle, :expiry=>expiry, :call=>["pick","read", "stat"]}.to_json
puts policy
puts "\n"
secret = 'SECRET'
encoded_policy = CGI.escape(Base64.encode64(policy))
signature = OpenSSL::HMAC.hexdigest('sha256', secret, encoded_policy)
puts "?signature=#{signature}&policy=#{encoded_policy}"
The trick is to use Base64.urlsafe_encode64 instead of CGI.escape:
require 'rubygems'
require 'base64'
require 'cgi'
require 'openssl'
require 'json'
handle = ARGV[0]
expiry = Time::now.to_i + 3600
policy = {:handle=>handle, :expiry=>expiry}.to_json
puts policy
puts "\n"
secret = 'SECRET'
encoded_policy = Base64.urlsafe_encode64(policy)
signature = OpenSSL::HMAC.hexdigest('sha256', secret, encoded_policy)
puts "?signature=#{signature}&policy=#{encoded_policy}"
When tested with the sample values for expiry, handle, and secret from the Filepicker.io docs, it returns the same values as the Python example.
I resolved this in my Ruby 1.8 environment by removing the CGI.escape and gsubbing out the newline:
Base64.encode64(policy).gsub("\n","")
elevenarms's answer is the best for Ruby 1.9 users, but you have to do something a bit kludgy like the above for Ruby 1.8. I'll accept his answer nonetheless, since most of us are or shortly will be in 1.9 these days.
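To see why the two answers agree, it helps to compare the encoders directly: Base64.encode64 inserts line breaks into its output, while Base64.urlsafe_encode64 does not and also swaps + and / for - and _. The sketch below checks that on a sample policy. The handle, expiry, and secret values are placeholders, and the helper name signed_query is made up for illustration.

```ruby
require 'base64'
require 'json'
require 'openssl'

# Hypothetical helper: build the signed query string for a policy Hash.
def signed_query(policy, secret)
  encoded = Base64.urlsafe_encode64(policy.to_json)
  digest = OpenSSL::Digest.new('sha256')
  signature = OpenSSL::HMAC.hexdigest(digest, secret, encoded)
  "?signature=#{signature}&policy=#{encoded}"
end

# Placeholder values, not real credentials.
policy = { handle: 'HANDLE', expiry: 1_500_000_000, call: %w[pick read stat] }
json = policy.to_json

puts Base64.encode64(json).include?("\n")         # true: line breaks are inserted
puts Base64.urlsafe_encode64(json).include?("\n") # false
# Stripping the newlines and swapping the two characters gives the same string.
puts Base64.encode64(json).delete("\n").tr('+/', '-_') == Base64.urlsafe_encode64(json)
puts signed_query(policy, 'SECRET')
```

In the original script the signature was computed over the CGI-escaped string, newlines included, which is presumably why it did not match what the service expected.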
