How to parse a URL using Ruby


Hi, how do I print http://site.tf/home/ from http://site.tf/home/index.php?id=12 by parsing the URL in Ruby?

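Do it like this:
require 'uri'
uri = URI.parse('http://site.tf/home/index.php?id=12')
# uri.path is "/home/index.php", so split('/') gives ["", "home", "index.php"]
"#{uri.scheme}://#{uri.host}/#{uri.path.split('/')[1]}/"
#=> "http://site.tf/home/"
I haven't tested it extensively, but it should work. Note that it keeps only the first path segment, and the trailing slash has to be added explicitly.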
Update
If you want just site.tf, use the host (it never contains a "/", so there is nothing to split):
require 'uri'
uri = URI.parse('http://site.tf/home/index.php?id=12')
uri.host
#=> "site.tf"
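Because the string-building answer keeps only the first path segment, a URL such as http://site.tf/a/b/page.php would become http://site.tf/a/. A minimal sketch that keeps everything up to the last "/" (and preserves a non-default port) could copy the parsed URI, trim the final path segment with a regular expression, and drop the query and fragment. directory_url is a hypothetical helper name, not part of the URI library.

```ruby
require 'uri'

# Hypothetical helper: return scheme, host, (non-default) port and the path
# up to and including its last "/". Query string and fragment are dropped.
def directory_url(url)
  uri = URI.parse(url)
  base = uri.dup
  # "/home/index.php" -> "/home/"; a path already ending in "/" is unchanged
  base.path = uri.path.sub(%r{[^/]*\z}, '')
  base.query = nil
  base.fragment = nil
  base.to_s
end

puts directory_url('http://site.tf/home/index.php?id=12') # => http://site.tf/home/
puts directory_url('http://site.tf/a/b/page.php#top')     # => http://site.tf/a/b/
```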

Related

Want to get Taobao's list of product URLs on a search result page without the Taobao API

I want to get a list of product URLs from Taobao's search result page without using the Taobao API.
I tried the following Ruby script:
require "open-uri"
require "rubygems"
require "nokogiri"
url='https://world.taobao.com/search/search.htm?_ksTS=1517338530524_300&spm=a21bp.7806943.20151106.1&search_type=0&_input_charset=utf-8&navigator=all&json=on&q=%E6%99%BA%E8%83%BD%E6%89%8B%E8%A1%A8&cna=htqfEgp0pnwCATyQWEDB%2FRCE&callback=__jsonp_cb&abtest=_AB-LR517-LR854-LR895-PR517-PR854-PR895'
charset = nil
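# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs
html = URI.open(url) do |f|
  charset = f.charset
  f.read
end
doc = Nokogiri::HTML.parse(html, nil, charset)
links = doc.xpath('//*[@id="list-itemList"]/div/div/ul/li[1]/div/div[1]/div/a/@href')
links.each { |href| puts href.text }
p links.size
# => 0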
I want to get a list of URLs like https://click.simba.taobao.com/cc_im?p=%D6%C7%C4%DC%CA%D6%B1%ED&s=328917633&k=525&e=lDs3%2BStGrhmNjUyxd8vQgTvfT37ERKUkJtUYVk0Fu%2FVZc0vyfhbmm9J7EYm6FR5sh%2BLS%2FyzVVWDh7%2FfsE6tfNMMXhI%2B0UDC%2FWUl0TVvvELm1aVClOoSyIIt8ABsLj0Cfp5je%2FwbwaEz8tmCoZFXvwyPz%2F%2ByQnqo1aHsxssXTFVCsSHkx4WMF4kAJ56h9nOp2im5c3WXYS4sLWfJKNVUNrw%2BpEPOoEyjgc%2Fum8LOuDJdaryOqOtghPVQXDFcIJ70E1c5A%2F3bFCO7mlhhsIlyS%2F6JgcI%2BCdFFR%2BwwAwPq4J5149i5fG90xFC36H%2B6u9EBPvn2ws%2F3%2BHHXRqztKxB9a0FyA0nyd%2BlQX%2FeDu0eNS7syyliXsttpfoRv3qrkLwaIIuERgjVDODL9nFyPftrSrn0UKrE5HoJxUtEjsZNeQxqovgnMsw6Jeaosp7zbesM2QBfpp6NMvKM5e5s1buUV%2F1AkICwRxH7wrUN4%2BFn%2FJ0%2FIDJa4fQd4KNO7J5gQRFseQ9Z1SEPDHzgw%3D but I am getting 0.
What should I do?

I don't know taobao.com, but the page appears to run a lot of JavaScript, so the product list may not be retrievable with a client that cannot execute JavaScript. Instead of open-uri, you could try the selenium-webdriver gem:
https://rubygems.org/gems/selenium-webdriver/versions/2.53.4
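Separately, the URL in the question contains json=on and callback=__jsonp_cb. That suggests (an assumption, not verified against Taobao) that this endpoint returns JSONP, i.e. JSON wrapped in a JavaScript function call, rather than HTML, in which case an HTML XPath will match nothing. If so, a sketch like the one below could strip the wrapper and parse the payload with the standard json library. parse_jsonp is a hypothetical helper, and the structure of Taobao's payload is unknown here, so it only returns the parsed Ruby object.

```ruby
require 'json'

# Hypothetical helper: strip a JSONP wrapper such as `__jsonp_cb({...});`
# and parse the JSON inside. The default callback name comes from the
# question's URL; the payload's fields are NOT known and are not assumed.
def parse_jsonp(body, callback = '__jsonp_cb')
  pattern = /#{Regexp.escape(callback)}\s*\((.*)\)\s*;?\s*\z/m
  match = body.match(pattern)
  raise ArgumentError, "not a #{callback}(...) response" unless match
  JSON.parse(match[1])
end
```

Usage would be something like data = parse_jsonp(URI.open(url).read), after which data.keys shows where (if anywhere) the product links live.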

Extracting all URLs from a page using Ruby [duplicate]

This question already has answers here: How to extract URLs from text (6 answers)
I am trying to extract all the URLs from the raw output of some Ruby code:
require 'open-uri'
reqt = URI.open("http://www.google.com").read
reqt.each_line do |line|
  puts URI.extract(line) if line =~ /http/
end
What am I doing wrong? I am getting extra lines along with URLs.

Remember that a URL doesn't have to start with "http": it can be a relative URL, i.e. a path relative to the current page. IMO the best approach is to parse the HTML with Nokogiri:
require 'open-uri'
require 'nokogiri'
reqt = URI.open("http://www.google.com")
doc = Nokogiri::HTML(reqt)
# select every <a> element that has an href attribute
doc.xpath('//a[@href]').each do |a|
  puts a.attr('href')
end
But if you really want to find only the absolute URLs, add a simple condition:
puts a.attr('href') if a.attr('href') =~ /^http/i
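If you want the relative links too, one option is to resolve each href against the page's URL with URI.join, which applies the standard relative-reference rules. The sketch below uses only the standard library and takes a plain array of href strings, so it works with whatever you collected from Nokogiri; absolutize is a hypothetical helper name. It skips hrefs that URI cannot parse and keeps only http and https results, so mailto: and javascript: links are dropped.

```ruby
require 'uri'

# Hypothetical helper: resolve each href against the page URL and keep
# only http/https results (URI::HTTPS is a subclass of URI::HTTP).
def absolutize(page_url, hrefs)
  hrefs.map { |href|
    begin
      uri = URI.join(page_url, href.strip)
      uri.to_s if uri.is_a?(URI::HTTP)
    rescue URI::Error
      nil # skip hrefs that URI cannot parse
    end
  }.compact
end
```

For example, absolutize('http://www.google.com/', doc.xpath('//a[@href]').map { |a| a['href'] }) would return absolute URLs only.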

You can do this instead:
require 'open-uri'
reqt = URI.open("http://www.google.com").read
urls = reqt.scan(/[[:lower:]]+:\/\/[^\s"]+/)

What is a good way to extract a url within a url in Ruby?

Given url = 'http://www.foo.com/bar?u=http://example.com/yyy/zzz.jpg&aaa=bbb&ccc=ddd'
What is a good way to extract http://example.com/yyy/zzz.jpg?
EDIT:
I would like to extract the second url.

I'd do:
require 'uri'
url = 'http://www.foo.com/bar?u=http://example.com/yyy/zzz.jpg&aaa=bbb&ccc=ddd'
uri = URI(url)
URI.decode_www_form(uri.query).select { |_,b| b[/^http(s)?/] }.map(&:last)
# => ["http://example.com/yyy/zzz.jpg"]
# or something like
Hash[URI.decode_www_form(uri.query)]['u'] # => "http://example.com/yyy/zzz.jpg"

require "uri"
URI.extract("text here http://foo.example.org/bla and here mailto:test@example.com and here also.")
# => ["http://foo.example.org/bla", "mailto:test@example.com"]
http://www.ruby-doc.org/stdlib-2.1.1/libdoc/uri/rdoc/URI.html

Using Ruby 2.1+ (Array#to_h was added in 2.1):
require 'uri'
url = 'http://www.foo.com/bar?u=http://example.com/yyy/zzz.jpg&aaa=bbb&ccc=ddd'
uri = URI.parse(url)
URI.decode_www_form(uri.query).to_h['u'] # => "http://example.com/yyy/zzz.jpg"
For Ruby < 2.1:
require 'uri'
url = 'http://www.foo.com/bar?u=http://example.com/yyy/zzz.jpg&aaa=bbb&ccc=ddd'
uri = URI.parse(url)
Hash[URI.decode_www_form(uri.query)]['u'] # => "http://example.com/yyy/zzz.jpg"
The Addressable gem is very full-featured and follows the specs more closely than URI. The same thing can be done with:
require 'addressable/uri'
url = 'http://www.foo.com/bar?u=http://example.com/yyy/zzz.jpg&aaa=bbb&ccc=ddd'
uri = Addressable::URI.parse(url)
uri.query_values['u'] # => "http://example.com/yyy/zzz.jpg"
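These answers work because the inner URL in the example has no "?" or "&" of its own. If it did, it would have to be percent-encoded inside the outer query string, otherwise its own parameters get split off as outer parameters. The sketch below illustrates this with a made-up inner URL (the size and v parameters are invented for the example): it builds the outer URL with URI.encode_www_form, then shows what happens when the inner URL is pasted in unencoded.

```ruby
require 'uri'

inner = 'http://example.com/yyy/zzz.jpg?size=large&v=2' # made-up example

# Encoded: encode_www_form percent-encodes ":", "/", "?", "=" and "&" in the value.
encoded = "http://www.foo.com/bar?#{URI.encode_www_form('u' => inner, 'aaa' => 'bbb')}"
params = URI.decode_www_form(URI.parse(encoded).query).to_h
puts params['u'] # => http://example.com/yyy/zzz.jpg?size=large&v=2

# Unencoded: the inner "&v=2" is read as a separate outer parameter.
naive = "http://www.foo.com/bar?u=#{inner}&aaa=bbb"
naive_params = URI.decode_www_form(URI.parse(naive).query).to_h
puts naive_params['u'] # => http://example.com/yyy/zzz.jpg?size=large
puts naive_params['v'] # => 2
```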

How to handle a JSON file returned by the server with Ruby?

I have a JSON response returned by a web radio service:
require 'open-uri'
require 'json'
songlist = URI.open('http://douban.fm/j/mine/playlist?type=n&channel=0')
# This returns JSON like:
# {"r":0,"song":[{"album":"\/subject\/25863639\/","picture":"http:\/\/img5.douban.com\/mpic\/s27256956.jpg","ssid":"7656","artist":"Carousel Kings","url":"http:\/\/mr3.douban.com\/201404122019\/660a1b4494a255e0333dfdc9ffadcf08\/view\/song\/small\/p2055547.mp3","company":"Not On Label","title":"Silence","rating_avg":3.73866,"length":194,"subtype":"","public_time":"2014","sid":"2055547","aid":"25863639","sha256":"ebf027adfaf9882118456941a774eeb509c29c4c278f55f587ba2faaa858a49d","kbps":"64","albumtitle":"Unity","like":false}]}
I want to get information such as song[0]['url'], song[0]['title'] and song[0]['album'], and then use smplayer in the terminal to play the song that the url points to.
How can I do that with Ruby?
Thanks.

I would use JSON.parse:
require 'open-uri'
require 'json'
songlist = URI.open('http://douban.fm/j/mine/playlist?type=n&channel=0').read
parsed_songlist = JSON.parse(songlist)
# JSON.parse turns the escaped "\/" sequences back into plain "/"
parsed_songlist["song"][0]["url"] #=> "http://mr3.douban.com/201404122019/660a1b4494a255e0333dfdc9ffadcf08/view/song/small/p2055547.mp3"
parsed_songlist["song"][0]["title"] #=> "Silence"
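To cover every song rather than just song[0], and to get as far as launching smplayer, a minimal sketch might collect the requested fields for each entry and build an argv-style command. songs_from and smplayer_command are hypothetical helper names; passing the URL to system as its own argument avoids shell quoting problems with characters in the URL.

```ruby
require 'json'

# Hypothetical helper: pull title, album and url for every entry in the
# playlist's "song" array (an empty list if the key is missing).
def songs_from(playlist_json)
  JSON.parse(playlist_json).fetch('song', []).map do |song|
    { :title => song['title'], :album => song['album'], :url => song['url'] }
  end
end

# Hypothetical helper: argv-style command, meant for system(*cmd) so no shell is involved.
def smplayer_command(song)
  ['smplayer', song[:url]]
end
```

Assuming smplayer is installed and on your PATH, something like songs_from(songlist).each { |s| puts s[:title]; system(*smplayer_command(s)) } would print each title and hand its URL to smplayer.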

Testing filepicker.io security using Ruby

I'm trying to build a test that will allow me to exercise FilePicker.io security. The code is run as:
ruby test.rb [file handle]
and the result is the query string that I can append to a FilePicker URL. I'm pretty sure my policy is getting read properly, but my signature isn't. Can someone tell me what I'm doing wrong? Here's the code:
require 'rubygems'
require 'base64'
require 'cgi'
require 'openssl'
require 'json'
handle = ARGV[0]
expiry = Time::now.to_i + 3600
policy = {:handle=>handle, :expiry=>expiry, :call=>["pick","read", "stat"]}.to_json
puts policy
puts "\n"
secret = 'SECRET'
encoded_policy = CGI.escape(Base64.encode64(policy))
signature = OpenSSL::HMAC.hexdigest('sha256', secret, encoded_policy)
puts "?signature=#{signature}&policy=#{encoded_policy}"

The trick is to use Base64.urlsafe_encode64 instead of CGI.escape:
require 'rubygems'
require 'base64'
require 'cgi'
require 'openssl'
require 'json'
handle = ARGV[0]
expiry = Time::now.to_i + 3600
policy = {:handle=>handle, :expiry=>expiry}.to_json
puts policy
puts "\n"
secret = 'SECRET'
encoded_policy = Base64.urlsafe_encode64(policy)
signature = OpenSSL::HMAC.hexdigest('sha256', secret, encoded_policy)
puts "?signature=#{signature}&policy=#{encoded_policy}"
When tested with the sample expiry, handle, and secret values from the Filepicker.io docs, it returns the same values as the Python example.

I resolved this in my Ruby 1.8 environment by removing the CGI.escape and gsubbing out the newline:
Base64.encode64(policy).gsub("\n","")
elevenarms's answer is the best for Ruby 1.9 users, but on Ruby 1.8 you have to do something a bit kludgy like the above. I'll accept his answer nonetheless, since most of us are, or soon will be, on 1.9.
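The two fixes are not quite equivalent. Base64.encode64 inserts a newline every 60 characters (plus one at the end) and uses the standard "+" and "/" characters, while Base64.urlsafe_encode64 emits no newlines and uses "-" and "_" instead. Removing the newlines therefore matches urlsafe_encode64 only when the encoded policy happens to contain no "+" or "/"; adding .tr('+/', '-_') closes that gap. The sketch below assumes, as the accepted answer implies, that Filepicker expects the URL-safe alphabet; sign_policy and urlsafe_encode64_compat are hypothetical helper names, and the handle and expiry values are placeholders.

```ruby
require 'base64'
require 'json'
require 'openssl'

# Hypothetical helper: URL-safe Base64 of the JSON policy, then a hex
# HMAC-SHA256 of that encoded string (the steps from the accepted answer).
def sign_policy(policy, secret)
  encoded_policy = Base64.urlsafe_encode64(policy.to_json)
  signature = OpenSSL::HMAC.hexdigest('sha256', secret, encoded_policy)
  [encoded_policy, signature]
end

# Ruby 1.8-friendly equivalent of urlsafe_encode64: drop the newlines that
# encode64 adds, then switch "+" and "/" to the URL-safe "-" and "_".
def urlsafe_encode64_compat(data)
  Base64.encode64(data).gsub("\n", '').tr('+/', '-_')
end

encoded, signature = sign_policy({ :handle => 'HANDLE', :expiry => 1_400_000_000 }, 'SECRET')
puts "?signature=#{signature}&policy=#{encoded}"
```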
