Get all URLs from JSON with Ruby - ruby

How can I extract all URLs from a JSON response with Ruby?
I have a URL (test.testurl.de/test?p=12) which returns JSON, e.g.
...
images: [
{
path: "http://static.mydomain.de/pics/z.jpg",
format: "image/jpeg",
},
{
path: "http://static.mydomain.de/pics/y.jpg",
format: "image/jpeg",
},
{
path: "http://static.mydomain.de/pics/x.jpg",
format: "image/jpeg",
},
...
If I try to extract via:
test = open("test.testurl.de/test?p=12").read
puts URI.extract(test)
then I just get:
["http:", "http:", "http:"]
Can anybody tell me why I don't get the whole URLs?
Thanks

I would recommend using an HTTP client, such as HTTParty, Typhoeus, or better yet Faraday.
However, if you want to roll your own, use the JSON gem to parse the response with something like:
require 'json'
require 'open-uri'
response = URI.open("http://test.testurl.de/test?p=12").read # open-uri needs the scheme
parsed = JSON.parse(response) rescue {}
parsed.fetch('images', []).map { |image| image['path'] }
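As a self-contained illustration of the parse-then-map approach, the sketch below uses an inline string in place of the HTTP response. It assumes the service returns strict JSON with quoted keys, which the excerpt in the question does not show; the sample document is made up to mirror it.

```ruby
require 'json'

# Stand-in for the HTTP response body; assumes strict JSON (quoted keys).
response = <<~JSON
  {
    "images": [
      { "path": "http://static.mydomain.de/pics/z.jpg", "format": "image/jpeg" },
      { "path": "http://static.mydomain.de/pics/y.jpg", "format": "image/jpeg" }
    ]
  }
JSON

# Fall back to an empty hash if the body is not valid JSON.
parsed = begin
  JSON.parse(response)
rescue JSON::ParserError
  {}
end

# fetch with a default avoids calling map on nil when "images" is missing.
paths = parsed.fetch('images', []).map { |image| image['path'] }
p paths
```

Rescuing only JSON::ParserError, rather than using a bare inline rescue, keeps unrelated errors visible.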

images: [
{
path: "http://static.mydomain.de/pics/z.jpg",
format: "image/jpeg",
},
...
...
Your string is not json, so you can't parse it as json. Is that really what's returned?
If I try to extract via:
test = open("test.testurl.de/test?p=12").read
puts URI.extract(test)
then I just get:
["http:", "http:", "http:"]
I get something different:
require 'uri'
str =<<END_OF_JUNK
images: [
{
path: "http://static.mydomain.de/pics/z.jpg",
format: "image/jpeg",
},
{
path: "http://static.mydomain.de/pics/y.jpg",
format: "image/jpeg",
},
{
path: "http://static.mydomain.de/pics/x.jpg",
format: "image/jpeg",
}
]
END_OF_JUNK
p URI.extract(str)
--output:--
["path:", "http://static.mydomain.de/pics/z.jpg", "format:", "path:", "http://static.mydomain.de/pics/y.jpg", "format:", "path:", "http://static.mydomain.de/pics/x.jpg", "format:"]
With that output, I can do:
results = URI.extract(str).select do |url|
url.start_with? "http"
end
p results
--output:--
["http://static.mydomain.de/pics/z.jpg", "http://static.mydomain.de/pics/y.jpg", "http://static.mydomain.de/pics/x.jpg"]
But if what you posted is, say, part of a ruby hash that gets converted to json:
require 'json'
require 'uri'
hash = {
images: [
{
path: "http://static.mydomain.de/pics/z.jpg",
format: "image/jpeg",
},
{
path: "http://static.mydomain.de/pics/y.jpg",
format: "image/jpeg",
},
{
path: "http://static.mydomain.de/pics/x.jpg",
format: "image/jpeg",
}
]
}
str = JSON.dump(hash)
p str
--output:--
"{\"images\":[{\"path\":\"http://static.mydomain.de/pics/z.jpg\",\"format\":\"image/jpeg\"},{\"path\":\"http://static.mydomain.de/pics/y.jpg\",\"format\":\"image/jpeg\"},{\"path\":\"http://static.mydomain.de/pics/x.jpg\",\"format\":\"image/jpeg\"}]}"
Then you can do this:
results = URI.extract(str)
p results
--output:--
["http://static.mydomain.de/pics/z.jpg", "http://static.mydomain.de/pics/y.jpg", "http://static.mydomain.de/pics/x.jpg"]
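URI.extract also takes an optional list of schemes as its second argument, which should drop the "path:" and "format:" matches without a separate select step. A minimal sketch on a shortened copy of the non-JSON string from above; note that URI.extract is marked obsolete in current Ruby versions and prints a warning in verbose mode.

```ruby
require 'uri'

# Shortened copy of the non-JSON text from the question.
str = <<~TEXT
  images: [
  {
  path: "http://static.mydomain.de/pics/z.jpg",
  format: "image/jpeg",
  },
  {
  path: "http://static.mydomain.de/pics/y.jpg",
  format: "image/jpeg",
  }
  ]
TEXT

# Only matches whose scheme is http or https are returned.
urls = URI.extract(str, %w[http https])
p urls
```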

Related

Request : duplicateSheet with Google Spreadsheet API : badRequest: Must specify at least one request

I'm trying to duplicate a sheet using Google Spreadsheet API.
But I keep getting this error : badRequest: Must specify at least one request
I've tried a lot of things but nothing seems to work so far.
Here is what I have (ruby) :
request_body = Google::Apis::SheetsV4::BatchUpdateSpreadsheetRequest.new {
{
"includeSpreadsheetInResponse": false,
"requests": [
{
"duplicateSheet": {
"sourceSheetId": 1*********,
"insertSheetIndex": 2,
"newSheetId": 10,
"newSheetName": "*********"
}
}
],
"responseIncludeGridData": false,
"responseRanges": [
""
]}
}
response = service.batch_update_spreadsheet(spreadsheet_id, request_body)
I know the code is not finished, but I really can't figure out what is missing.
Does anyone know what I need? Many thanks in advance!
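The argument to new should be enclosed in parentheses, not braces. With braces, Ruby treats the hash as the body of a block, so the constructor receives no arguments and the request list stays empty.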
Your code should look like this:
request_body = Google::Apis::SheetsV4::BatchUpdateSpreadsheetRequest.new(
{
"includeSpreadsheetInResponse": false,
"requests": [
{
"duplicateSheet": {
"sourceSheetId": 1*********,
"insertSheetIndex": 2,
"newSheetId": 10,
"newSheetName": "*********"
}
}
],
"responseIncludeGridData": false,
"responseRanges": [
""
]}
)
Reference:
Ruby Object and Classes
Your script uses camelCase keys. With the Ruby client, please use snake_case keys as follows.
Modified script:
request_body = Google::Apis::SheetsV4::BatchUpdateSpreadsheetRequest.new(
{
include_spreadsheet_in_response: false,
requests: [
{
duplicate_sheet: {
source_sheet_id: 1*********,
insert_sheet_index: 2,
new_sheet_id: 10,
new_sheet_name: "*********",
}
}
],
response_include_grid_data: false,
response_ranges: [""]
})
response = service.batch_update_spreadsheet(spreadsheet_id, request_body)
Note:
Alternatively, you can use either of the following patterns.
Pattern 2
request = Google::Apis::SheetsV4::Request.new
request.duplicate_sheet = {
source_sheet_id: 1*********,
insert_sheet_index: 2,
new_sheet_id: 10,
new_sheet_name: "*********",
}
request_body = Google::Apis::SheetsV4::BatchUpdateSpreadsheetRequest.new
request_body.include_spreadsheet_in_response = false
request_body.response_include_grid_data = false
request_body.response_ranges = [""]
request_body.requests = [request]
response = service.batch_update_spreadsheet(spreadsheet_id, request_body)
Pattern 3
request_body = {
include_spreadsheet_in_response: false,
requests: [{duplicate_sheet: {
source_sheet_id: 1*********,
insert_sheet_index: 2,
new_sheet_id: 10,
new_sheet_name: "*********",
}}],
response_include_grid_data: false,
response_ranges: [""],
}
response = service.batch_update_spreadsheet(spreadsheet_id, request_body, {})
Note:
This answer assumes that your service object is already set up and authorized to call the batchUpdate method.
Reference:
Method: spreadsheets.batchUpdate
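If a request body has been copied from the REST documentation with camelCase keys, the renaming described above can be done mechanically. The sketch below uses a hypothetical helper, snake_case_keys, which is not part of the Google client gems; it only produces a hash with the same key style as Pattern 3 and makes no API call.

```ruby
# Hypothetical helper (not part of the google-apis gems): recursively converts
# camelCase keys, as written in the REST documentation, to snake_case symbols.
def snake_case_keys(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, val), out|
      snake = key.to_s.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
      out[snake.to_sym] = snake_case_keys(val)
    end
  when Array
    value.map { |item| snake_case_keys(item) }
  else
    value
  end
end

# Illustrative REST-style body with made-up values.
rest_style = {
  "includeSpreadsheetInResponse" => false,
  "requests" => [
    { "duplicateSheet" => { "insertSheetIndex" => 2, "newSheetId" => 10 } }
  ]
}

ruby_style = snake_case_keys(rest_style)
p ruby_style
```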

Converting a json string to hashie mash

I have a web service that returns a json in the following format:
[
{
"key": "linux.ubuntu.ip",
"value": "10.10.10.10"
},
{
"key": "linux.ubuntu.hostname",
"value": "stageubuntu"
}
]
I have a ruby code that makes a call to this service and gets the json. Deep in this code, there is a variable configure of type Hashie::Mash.
I want to achieve this:
configure.linux.ubuntu.ip = 10.10.10.10 [Hashie::Mash]
configure.linux.ubuntu.hostname = stageubuntu [Hashie::Mash]
Could anybody tell me whether it is possible to achieve this with the JSON output that I have? If so, what is the best way to do it?
To get a JSON string to a Hashie::Mash object you can simply do:
require 'json'
require 'hashie'
json_str = '{ "foo": "bar" }'
ruby_hash = JSON.parse(json_str)
Hashie::Mash.new(ruby_hash)
For this specific problem, though it is not ideal (we all have our unique use cases), you need to parse the JSON into an array and then expand each 'key' into nested Hashie::Mash objects of unknown depth.
require 'json'
require 'hashie'
json_str = <<-JSON
[
{
"key": "linux.ubuntu.ip",
"value": "10.10.10.10"
},
{
"key": "linux.ubuntu.hostname",
"value": "stageubuntu"
}
]
JSON
parsed_arr = JSON.parse(json_str)
#=> [{"key"=>"linux.ubuntu.ip", "value"=>"10.10.10.10"}, {"key"=>"linux.ubuntu.hostname", "value"=>"stageubuntu"}]
configure = parsed_arr.map do |parsed_hash|
method_chain = parsed_hash['key'].split('.')
init_value = Hashie::Mash.new(method_chain.pop => parsed_hash['value'])
method_chain.reverse.inject(init_value) do |ret_value, method_name|
Hashie::Mash.new(method_name => ret_value)
end
end.inject(:merge) # <-- Hashie allows you to perform a deep merge into a single object here.
# you can now do
configure.linux.ubuntu.ip
#=> "10.10.10.10"
configure.linux.ubuntu.hostname
#=> "stageubuntu"
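The same nesting can also be built with plain hashes, which may be easier to follow than the reverse inject above. This sketch deliberately does not use Hashie; as in the first snippet of this answer, the resulting hash could then be passed to Hashie::Mash.new to get dotted access.

```ruby
require 'json'

json_str = '[{"key":"linux.ubuntu.ip","value":"10.10.10.10"},' \
           '{"key":"linux.ubuntu.hostname","value":"stageubuntu"}]'

configure = {}
JSON.parse(json_str).each do |entry|
  *parents, leaf = entry['key'].split('.')
  # Walk (or create) one nested hash per key segment, then set the leaf.
  node = parents.inject(configure) { |hash, name| hash[name] ||= {} }
  node[leaf] = entry['value']
end

p configure.dig('linux', 'ubuntu', 'ip')
```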

how can I iterate through this json document using ruby?

I have a ruby code block, as follows:
require "elasticsearch"
require "json"
search_term = "big data"
city = "Hong Kong"
client = Elasticsearch::Client.new log: true
r = client.search index: 'candidates', body:
{
query: {
bool: {
must: [
{
match: {
tags: search_term
}
},
{
match: {
city: city
}
}
]
}
}
}
It produces multiple returns like this one:
{"_index":"candidates","_type":"data",
"_id":"AU3DyAmvtewNSFHuYn88",
"_score":3.889237,
"_source":{"first":"Kota","last":"Okayama","city":"Tokyo","designation":"Systems Engineer","email":"user#hotmail.co.jp","phone":"phone","country":"Japan","industry":"Technology","tags":["remarks","virtualization big data"]}}
I want to iterate through it and extract various elements. I have tried
data = JSON.parse(r)
data.each do |row|
puts row["_source"]["first"]
end
and the error is:
no implicit conversion of Hash into String (TypeError)
What's the best way forward on this chaps?
I have the solution, I hope it helps somebody else. It took me hours of fiddling and experimentation. Here it is:
require "elasticsearch"
require "json"
search_term = "big data"
city = "Tokyo"
client = Elasticsearch::Client.new log: true
h = client.search index: 'swiss_candidates', body:
{
query: {
bool: {
must: [
{
match: {
tags: search_term
}
},
{
match: {
city: city
}
}
]
}
}
}
data = JSON.parse(h.to_json)
data["hits"]["hits"].each do |r|
puts r["_id"]
puts r["_source"]["first"]
puts r["_source"]["tags"][1]
puts r["_source"]["screened"][0]
end
The important thing seems to be to convert the elasticsearch result into something ruby friendly.
JSON.parse expects a String containing a JSON document, but you are passing it the Hash which was returned from client.search.
I'm not entirely sure what you are trying to achieve by parsing something that is already a Ruby Hash into a Ruby Hash; you can index into the result directly.
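To illustrate working with the result as a Hash, here is a minimal sketch that needs no Elasticsearch server. The response variable is a hand-built stand-in for what client.search returns, trimmed to two fields from the question; string keys are an assumption based on the lookups in the accepted code.

```ruby
# Hand-built stand-in for the Hash returned by client.search (string keys assumed).
response = {
  'hits' => {
    'hits' => [
      {
        '_id' => 'AU3DyAmvtewNSFHuYn88',
        '_source' => { 'first' => 'Kota', 'tags' => ['remarks', 'virtualization big data'] }
      }
    ]
  }
}

# No JSON.parse or to_json round trip: walk the Hash directly.
first_names = response['hits']['hits'].map do |hit|
  hit['_source']['first']
end
puts first_names
```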

How can I access an array of objects from GData JSON in Ruby?

I am trying to write a Jekyll extension that will embed comments from a Blogger blog.
I am able to fetch the comments feed as JSON, and process it enough to pull out the total number of comments. However, I have not figured out how to process each comment in the feed.
json_url = "http://www.blogger.com/feeds/8505008/593465383646513269/comments/default/?alt=json"
json_rep = Net::HTTP.get_response(json_url)
json_rep = JSON.parse(json_rep.body)
json_rep['feed']['openSearch$totalResults']['$t'] # => "4"
json_rep['feed']['entry'].class # => Array
json_rep['feed']['entry'].length
# => Liquid Exception: undefined method `length' for nil:NilClass in post
This is my first time writing any code in Ruby. What am I doing wrong?
Here are the relevant parts of the JSON I am trying to parse.
{
"feed": {
"openSearch$totalResults": {
"$t": "4"
},
"entry": [
{
"id": {
"$t": "tag:blogger.com,1999:blog-8505008.post-491866073982779922"
},
"published": {
"$t": "2013-01-08T15:23:47.322-04:00"
},
"content": {
"type": "html",
"$t": "Recently, my sister has updated it more than I have. \u00dcber-fail on my part. :p"
}
}
]
}
}
This is what you should look at doing:
require 'rubygems'
require 'json'
require 'net/http'
require 'net/https'
require 'uri'
url = "http://www.blogger.com/feeds/8505008/593465383646513269/comments/default/?alt=json"
uri = URI.parse(url)
http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
json_rep = JSON.parse(response.body)
puts json_rep['feed']['openSearch$totalResults']['$t']
entries = json_rep['feed']['entry']
entries.each do |entry|
puts entry["id"]["$t"]
# add whatever code you like here
end
This outputs:
4
tag:blogger.com,1999:blog-8505008.post-491866073982779922
tag:blogger.com,1999:blog-8505008.post-4792479891671746788
tag:blogger.com,1999:blog-8505008.post-4766604955439002209
tag:blogger.com,1999:blog-8505008.post-5484003770204916000
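The nil error in the question suggests that 'entry' can be absent from a parsed feed. One defensive option, sketched here on an inline document trimmed from the JSON above rather than a live request, is to wrap the lookup in Array() and read nested values with dig.

```ruby
require 'json'

# Trimmed copy of the feed from the question; quoted heredoc keeps "$t" literal.
body = <<~'JSON'
  {
    "feed": {
      "openSearch$totalResults": { "$t": "1" },
      "entry": [
        { "id": { "$t": "tag:blogger.com,1999:blog-8505008.post-491866073982779922" } }
      ]
    }
  }
JSON

feed = JSON.parse(body)['feed']
# Array(nil) is [], so a feed without "entry" yields no iterations instead of an error.
ids = Array(feed['entry']).map { |entry| entry.dig('id', '$t') }
puts ids
```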

MongoDB returns empty array

I have a MongoDB database whose records I serve on a web page:
require 'mongo'
require 'json'
connection = Mongo::Connection.new
db = connection.db("salemDB")
db = Mongo::Connection.new.db("salemDB")
newsCollection = db["news"]
require 'sinatra'
set :port, 2222
get '/' do
redirect 'index.html'
end
get "/checkMail" do
newsCollection.find_one({}, {}).to_a.to_json
end
get "/:id" do
newsCollection.find("_id" => params[:id]).to_a.to_json
end
/checkMail outputs this
(formatted for reading pleasure)
[
[
"_id",
{
"$oid":"50880c8564a15e2631000001"
}
],
[
"date",
"2012-10-24T17:42:54+02:00"
],
[
"subject",
"This is a piece of news"
]
]
/50880c8564a15e2631000001 outputs this
[]
Why won't it give my object back?
That's because the id is not actually a String or an Integer; it's a BSON::ObjectId, so you have to query with one of those.
This should work
newsCollection.find("_id" => BSON::ObjectId(params[:id])).to_a.to_json
