How can I iterate through this JSON document using Ruby?

I have a Ruby code block, as follows:
require "elasticsearch"
require "json"
search_term = "big data"
city = "Hong Kong"
client = Elasticsearch::Client.new log: true
r = client.search index: 'candidates', body:
{
query: {
bool: {
must: [
{
match: {
tags: search_term
}
},
{
match: {
city: city
}
}
]
}
}
}
It returns multiple hits like this one:
{"_index":"candidates","_type":"data",
"_id":"AU3DyAmvtewNSFHuYn88",
"_score":3.889237,
"_source":{"first":"Kota","last":"Okayama","city":"Tokyo","designation":"Systems Engineer","email":"user#hotmail.co.jp","phone":"phone","country":"Japan","industry":"Technology","tags":["remarks","virtualization big data"]}}
I want to iterate through it and extract various elements. I have tried
data = JSON.parse(r)
data.each do |row|
  puts row["_source"]["first"]
end
and the error is:
no implicit conversion of Hash into String (TypeError)
What's the best way forward on this, chaps?

I have the solution, I hope it helps somebody else. It took me hours of fiddling and experimentation. Here it is:
require "elasticsearch"
require "json"
search_term = "big data"
city = "Tokyo"
client = Elasticsearch::Client.new log: true
h = client.search index: 'swiss_candidates', body:
{
query: {
bool: {
must: [
{
match: {
tags: search_term
}
},
{
match: {
city: city
}
}
]
}
}
}
data = JSON.parse(h.to_json)
data["hits"]["hits"].each do |r|
  puts r["_id"]
  puts r["_source"]["first"]
  puts r["_source"]["tags"][1]
  puts r["_source"]["screened"][0]
end
The important thing seems to be to convert the Elasticsearch result into something Ruby-friendly.

JSON.parse expects a String containing a JSON document, but you are passing it the Hash that client.search returned.
I'm not entirely sure what you are trying to achieve with the to_json/JSON.parse round trip: the response is already a Ruby Hash, so there is nothing left to parse.
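A minimal sketch of iterating the response directly, with no round trip, assuming the gem version from the question where client.search returns a plain Ruby Hash:
# r is the Hash returned by client.search in the question
r["hits"]["hits"].each do |hit|
  puts hit["_id"]
  puts hit["_source"]["first"]
end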

Related

How to search date ranges using ElasticSearch and Searchkick?

I'm using Rails 4.1.4 with searchkick (1.3.0) so I can use Elasticsearch.
I have a User model:
class User
  searchkick

  def search_data
    {
      name: username,
      email: email,
      created_at: created_at.strftime("%d-%m-%Y")
    }
  end
end
created_at is in dd-MM-yyyy format. How can I search for a date range, let's say from 01-01-2014 to 01-01-2015?
I tried something like User.search('*', where: { created_at: { gte: '01-01-2014', lte: '01-01-2015' } }) without getting the right results.
Any help?
class User
  searchkick

  def search_data
    {
      name: username,
      email: email,
      created_at: created_at.to_time
    }
  end
end
Instead of
created_at: created_at.strftime("%d-%m-%Y")
Try
created_at: created_at.to_time
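With a real Time value indexed, a range query like the one from the question should then work. A minimal sketch, assuming the index is rebuilt after changing search_data:
User.reindex
User.search "*", where: { created_at: { gte: Date.new(2014, 1, 1), lte: Date.new(2015, 1, 1) } }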
I am not familiar with Searchkick, but if you use the elasticsearch gem, you can easily make a search request to Elasticsearch like:
query = {
  query: {
    bool: {
      must: [
        {
          term: {
            username: "Kevin"
          }
        },
        {
          term: {
            email: "email#gmail.com"
          }
        },
        {
          range: {
            created_at: {
              lte: "2016-06-06",
              gte: "2016-06-06",
              format: "yyyy-MM-dd"
            }
          }
        }
      ]
    }
  }
}
The search request will look like:
client = Elasticsearch::Client.new
client.search(
  index: index_name,
  body: query  # query above already includes the top-level query: key
)
Hope this helps!
Form a proper DateTime object and your code should work. Just use:
created_at.to_datetime
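For example, in search_data (the same idea as the previous answer, just with .to_datetime):
def search_data
  {
    name: username,
    email: email,
    created_at: created_at.to_datetime
  }
end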

Ruby mongoid aggregation return object

I am doing a MongoDB aggregation using Mongoid, via ModelName.collection.aggregate(pipeline). The value returned is an Array, not a Mongoid::Criteria, so if I call first on it I get an element of type BSON::Document instead of ModelName. As a result, I am unable to use it as a model.
Is there a method to return a criteria instead of an array from the aggregation, or to convert a BSON document to a model instance?
Using mongoid (4.0.0)
I've been struggling with this on my own too. I'm afraid you have to build your "models" on your own. Let's take an example from my code:
class Searcher
  # ...
  def results(page: 1, per_page: 50)
    pipeline = []
    pipeline << {
      "$match" => {
        title: /#{@params['query']}/i
      }
    }
    geoNear = {
      "near" => coordinates,
      "distanceField" => "distance",
      "distanceMultiplier" => 3959,
      "num" => 500,
      "spherical" => true,
    }
    pipeline << {
      "$geoNear" => geoNear
    }
    count = aggregate(pipeline).count
    pipeline << { "$skip" => ((page.to_i - 1) * per_page) }
    pipeline << { "$limit" => per_page }
    places_hash = aggregate(pipeline)
    places = places_hash.map { |attrs| Offer.new(attrs) { |o| o.new_record = false } }
    # ...
    places
  end

  def aggregate(pipeline)
    Offer.collection.aggregate(pipeline)
  end
end
I've omitted a lot of code from the original project, just to show the general approach.
The most important thing here was the line:
places_hash.map { |attrs| Offer.new(attrs) { |o| o.new_record = false } }
Here I'm building an array of Offers and also manually setting each one's new_record attribute to false, so they behave like any other documents fetched by a plain Offer.where(...).
It's not beautiful, but it worked for me, and I could take the best of the whole Aggregation Framework!
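A hypothetical usage sketch (the Searcher initializer and coordinates were omitted from the original, so the constructor call below is assumed):
searcher = Searcher.new(params)  # hypothetical constructor
offers = searcher.results(page: 1)
offers.first.class               # => Offer, not BSON::Document
offers.first.persisted?          # => true, because new_record was set to false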
Hope that helps!

Delete nested hash according to key => value

I have this JSON response:
response = '{"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]},{"id":2,"books":[{"id":1,"qty":0},{"id":2,"qty":3}]}]}'
in which I'd like to delete every library where at least one of the book quantities is zero.
For instance, with this given response, I'd expect this return:
'{"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]}]}'
I've tried this:
parsed = JSON.parse(response)
parsed["librairies"].each do |library|
  library["books"].each do |book|
    parsed.delete(library) if book["qty"] == 0
  end
end
but this returns the exact same response hash, without having deleted the second library (the one with id => 2).
You can use Array#delete_if and Enumerable#any? for this
# Move through each array element with delete_if
parsed["librairies"].delete_if do |library|
  # evaluates to true if any book hash in the library
  # has a "qty" value of 0
  library["books"].any? { |book| book["qty"] == 0 }
end
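Since the question starts from a JSON string, you can serialize the result straight back once delete_if has run; reusing parsed from above:
parsed.to_json
# => '{"librairies":[{"id":1,"books":[{"id":1,"qty":1},{"id":2,"qty":3}]}]}'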
Hope this helps
To avoid changing the hash parsed, you could do the following.
Firstly, let's format parsed so we can see what we're dealing with:
parsed = { "libraries"=>[ { "id"=>1,
"books"=>[ { "id"=>1, "qty"=>1 },
{ "id"=>2, "qty"=>3 } ]
},
{ "id"=>2,
"books"=>[ { "id"=>1, "qty"=>0 },
{ "id"=>2, "qty"=>3 } ]
}
]
}
Later I want to show that parsed has not been changed when we create the new hash. An easy way of doing that is to compute a hash code on parsed before and after, and see if it changes. (While it's not 100% certain that different hashes won't have the same hash code, here it's not something to lose sleep over.)
parsed.hash
#=> 852445412783960729
We first need to make a "deep copy" of parsed so that changes to the copy will not affect parsed. One way of doing that is to use the Marshal module:
new_parsed = Marshal.load(Marshal.dump(parsed))
We can now modify the copy as required:
new_parsed["libraries"].reject! { |h| h["books"].any? { |g| g["qty"].zero? } }
#=> [ { "id"=>1,
# "books"=>[ { "id"=>1, "qty"=>1 },
# { "id"=>2, "qty"=>3 }
# ]
# }
# ]
new_parsed # => { "libraries"=>[ { "id"=>1,
"books"=>[ { "id"=>1, "qty"=>1},
{ "id"=>2, "qty"=>3}
]
}
]
}
And we confirm the original hash was not changed:
parsed.hash
#=> 852445412783960729
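An alternative sketch that sidesteps the deep copy entirely: build a new top-level hash with the non-destructive Enumerable#reject, so parsed is never mutated (the surviving inner hashes are shared rather than copied):
new_parsed = parsed.merge(
  "librairies" => parsed["librairies"].reject { |h| h["books"].any? { |g| g["qty"].zero? } }
)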

Get all URLs from JSON with Ruby

How can I extract all URLs from a JSON response with Ruby?
I have a URL (test.testurl.de/test?p=12) which returns JSON, e.g.
...
images: [
  {
    path: "http://static.mydomain.de/pics/z.jpg",
    format: "image/jpeg",
  },
  {
    path: "http://static.mydomain.de/pics/y.jpg",
    format: "image/jpeg",
  },
  {
    path: "http://static.mydomain.de/pics/x.jpg",
    format: "image/jpeg",
  },
...
If I try to extract via:
test = open("test.testurl.de/test?p=12").read
puts URI.extract(test)
then I just get:
["http:", "http:", "http:"]
Can anybody tell me why I don't get the whole URLs?
Thx
I would recommend using an HTTP client, such as HTTParty, Typhoeus, or better yet Faraday.
However, if you want to roll your own, use the JSON gem to parse the response with something like:
require 'open-uri'
require 'json'

response = open("test.testurl.de/test?p=12").read
parsed = JSON.parse(response) rescue {}
(parsed['images'] || []).map { |image| image['path'] }
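For illustration, the same fetch with Faraday, which was recommended above (the URL and response shape are assumed from the question):
require 'faraday'
require 'json'

response = Faraday.get("http://test.testurl.de/test?p=12")
parsed = JSON.parse(response.body)
urls = parsed["images"].map { |image| image["path"] }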
Your string is not JSON, so you can't parse it as JSON. Is that really what's returned?
If I try to extract via:
test = open("test.testurl.de/test?p=12").read
puts URI.extract(test)
then I just get:
["http:", "http:", "http:"]
I get something different:
require 'uri'
str = <<END_OF_JUNK
images: [
{
path: "http://static.mydomain.de/pics/z.jpg",
format: "image/jpeg",
},
{
path: "http://static.mydomain.de/pics/y.jpg",
format: "image/jpeg",
},
{
path: "http://static.mydomain.de/pics/x.jpg",
format: "image/jpeg",
}
]
END_OF_JUNK
p URI.extract(str)
--output:--
["path:", "http://static.mydomain.de/pics/z.jpg", "format:", "path:", "http://static.mydomain.de/pics/y.jpg", "format:", "path:", "http://static.mydomain.de/pics/x.jpg", "format:"]
With that output, I can do:
results = URI.extract(str).select do |url|
  url.start_with? "http"
end
p results
--output:--
["http://static.mydomain.de/pics/z.jpg", "http://static.mydomain.de/pics/y.jpg", "http://static.mydomain.de/pics/x.jpg"]
But if what you posted is, say, part of a ruby hash that gets converted to json:
require 'json'
require 'uri'
hash = {
  images: [
    {
      path: "http://static.mydomain.de/pics/z.jpg",
      format: "image/jpeg",
    },
    {
      path: "http://static.mydomain.de/pics/y.jpg",
      format: "image/jpeg",
    },
    {
      path: "http://static.mydomain.de/pics/x.jpg",
      format: "image/jpeg",
    }
  ]
}
str = JSON.dump(hash)
p str
--output:--
"{\"images\":[{\"path\":\"http://static.mydomain.de/pics/z.jpg\",\"format\":\"image/jpeg\"},{\"path\":\"http://static.mydomain.de/pics/y.jpg\",\"format\":\"image/jpeg\"},{\"path\":\"http://static.mydomain.de/pics/x.jpg\",\"format\":\"image/jpeg\"}]}"
Then you can do this:
results = URI.extract(str)
p results
--output:--
["http://static.mydomain.de/pics/z.jpg", "http://static.mydomain.de/pics/y.jpg", "http://static.mydomain.de/pics/x.jpg"]

Elegantly creating a hash from an array

I currently have some Ruby code that creates output like this (after conversion to JSON):
"days": [
{
"Jul-22": ""
},
{
"Aug-19": ""
}
],
What I want is output like this:
"days": {
"Jul-22": "",
"Aug-19": ""
},
Here is my code:
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).collect do |noteworthy_day|
  { noteworthy_day.date.to_s(:trends_id) => "" }
end
In other words I want a hash instead of an array of hashes. Here's my ugly solution:
days = {}
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).each do |noteworthy_day|
  days[noteworthy_day.date.to_s(:trends_id)] = ""
end
days
That seems very unrubylike, though. Can someone help me do this more efficiently?
Hash[
  CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).collect { |noteworthy_day|
    [noteworthy_day.date.to_s(:trends_id), ""]
  }
]
Or...
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).each_with_object({}) { |noteworthy_day, ndays|
  ndays[noteworthy_day.date.to_s(:trends_id)] = ""
}
This is a problem tailor made for Enumerable#inject
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).inject({}) do |hash, noteworthy_day|
  hash[noteworthy_day.date.to_s(:trends_id)] = ''
  hash
end
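On Ruby 2.1+, the same thing reads cleanly as a map straight into Array#to_h; a minimal sketch, assuming the same CalendarDay API as above:
CalendarDay.in_the_past_30_days(patient).select(&:noteworthy?).map { |noteworthy_day|
  [noteworthy_day.date.to_s(:trends_id), ""]
}.to_h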
