How do I extract values from nested JSON? - ruby

After parsing some JSON:
data = JSON.parse(data)['info']
puts data
I get:
[
{
"title"=>"CEO",
"name"=>"George",
"columns"=>[
{
"display_name"=> "Salary",
"value"=>"3.85",
}
, {
"display_name"=> "Bonus",
"value"=>"994.19",
}
, {
"display_name"=> "Increment",
"value"=>"8.15",
}
]
}
]
columns has nested data in itself.
I want to save the data in a database or CSV file.
title, name, value_Salary, value_Bonus, value_increment
But I'm not concerned about getting display_name, so just the values of first of columns, second of columns data, etc.
I tried data.map after converting to a hash, and also hash.flatten, but couldn't find a way out. I also tried .map{|x| x['columns']}.map {|s| s["value"]} to at least get the values separately, but that didn't work either.

This is a simple problem that comes down to a couple of nested map blocks.
Here's the data retrieved from JSON, plus an extra row to demonstrate how easy it is to handle a more complex JSON response:
data = [
{
"title" => "CEO",
"name" => "George",
"columns" => [
{
"display_name" => "Salary",
"value" => "3.85",
},
{
"display_name" => "Bonus",
"value" => "994.19",
},
{
"display_name" => "Increment",
"value" => "8.15",
}
]
},
{
"title" => "CIO",
"name" => "Fred",
"columns" => [
{
"display_name" => "Salary",
"value" => "3.84",
},
{
"display_name" => "Bonus",
"value" => "994.20",
},
{
"display_name" => "Increment",
"value" => "8.15",
}
]
}
]
Here's the code:
records = data.map { |record|
title, name = record.values_at('title', 'name')
values = record['columns'].map{ |column| column['value'] }
[title, name, *values]
}
Here's the resulting data structure, an array of arrays:
records
# => [["CEO", "George", "3.85", "994.19", "8.15"],
# ["CIO", "Fred", "3.84", "994.20", "8.15"]]
Saving it into a database or CSV is left for you to figure out, but Ruby's CSV class makes it trivial to write a file, and an ORM like Sequel makes it really easy to insert the data into a database.
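As a minimal sketch of the CSV step, assuming rows shaped like the records array above and Ruby's bundled csv library, the data could be serialized with CSV.generate. The header names follow the column list in the question, and the file name in the final comment is hypothetical:

```ruby
require 'csv'

# Sample rows in the same shape as `records` above.
records = [
  ["CEO", "George", "3.85", "994.19", "8.15"],
  ["CIO", "Fred", "3.84", "994.20", "8.15"]
]

# Header names taken from the question's desired column list.
headers = %w[title name value_Salary value_Bonus value_increment]

# Build the CSV text in memory: one header row, then one row per record.
csv_text = CSV.generate do |csv|
  csv << headers
  records.each { |row| csv << row }
end

puts csv_text
# To write a file instead, File.write('people.csv', csv_text) would do
# ('people.csv' is a hypothetical name).
```

Generating the text first keeps the example free of file system side effects; CSV.open with a block is the usual alternative when writing straight to disk.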

Related

I have JSON data and need to check whether the key `unique` exists

I have JSON data and need to check whether the key unique exists.
[
{
"key1" => []
},
{
"key" => []
},
{
"unique" => []
}
]
I can use a loop, but I need an efficient way to check whether unique exists.
You'll need to iterate through the array either way.
# You'll get found item or `nil`
data.find { |item| item.key?('unique') }
# You'll get `true` or `false`
data.any? { |item| item.key?('unique') }
By the way, it's better to use a hash as the input instead of an array:
data = {
"key1" => [],
"key" => [],
"unique" => []
}
data.key?('unique')
=> true
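If the input has to arrive as an array of single-key hashes, one possible sketch is to merge it into a single hash once and run the lookups against that. Building the hash still iterates the array one time, so this only pays off when several keys are checked; the 'missing' key below is just an illustrative value:

```ruby
# Array of single-key hashes, as in the question.
data = [
  { "key1" => [] },
  { "key" => [] },
  { "unique" => [] }
]

# Merge every hash into one; a later duplicate of a key overwrites an earlier one.
merged = data.reduce({}, :merge)

puts merged.key?('unique')   # hash lookup, no further scan of the array
puts merged.key?('missing')  # 'missing' is a made-up key for illustration
```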

Create nested fields from XPath & check for existing documents

I have two questions:
parsing XML data and adding it to an array in a record in an index
checking for an existing record in an index and, if it exists, adding the new data to the array of the existing record
I have a jdbc input that has an XML column,
input {
jdbc {
....
statement => "SELECT event_xml....
}
}
then an xml filter to parse the data.
How do I make the last three xpaths into an array? Do I need a mutate or ruby filter? I can't seem to figure it out.
filter {
xml {
source => "event_xml"
remove_namespaces => true
store_xml => false
force_array => false
xpath => [ "/CaseNumber/text()", "case_number" ]
xpath => [ "/FormName/text()", "[conversations][form_name]" ]
xpath => [ "/EventDate/text()", "[conversations][event_date]" ]
xpath => [ "/CaseNote/text()", "[conversations][case_note]" ]
}
}
so that it would look something like this in Elasticsearch:
{
"case_number" : "12345",
"conversations" :
[
{
"form_name" : "form1",
"event_date" : "2019-01-09T00:00:00Z",
"case_note" : "this is a case note"
}
]
}
The second question: if a record with the unique case_number "12345" already exists, then instead of creating a new record, add the new XML values to its conversations array, so that it would look like this:
{
"case_number" : "12345",
"conversations" : [
{
"form_name" : "form1",
"event_date" : "2019-01-09T00:00:00Z",
"case_note" : "this is a case note"
},
{
"form_name" : "form2",
"event_date" : "2019-05-09T00:00:00Z",
"case_note" : "this is another case note"
}
]
}
My output section:
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "cases"
manage_template => false
}
}
Is this possible? Thanks.
This ruby filter created the array:
ruby {
code => '
event.set("conversations", [Hash[
"publish_event_id", event.get("publish_event_id"),
"form_name", event.get("form_name"),
"event_date", event.get("event_date"),
"case_note", event.get("case_note")
]])
'
}
The output was resolved with:
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "cases"
document_id => "%{case_number}"
action => "update"
doc_as_upsert => true
script => "
boolean recordExists = false;
for (int i = 0; i < ctx._source.conversations.length; i++)
{
if(ctx._source.conversations[i].publish_event_id == params.event.get('conversations')[0].publish_event_id)
{
recordExists = true;
}
}
if(!recordExists){
ctx._source.conversations.add(params.event.get('conversations')[0]);
}
"
manage_template => false
}
}
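The update script above appends the incoming conversation only when no stored conversation has the same publish_event_id. Purely as an illustration of that logic, here is a sketch in plain Ruby. It is not Logstash or Painless code; append_conversation is a hypothetical helper and the id values are made up:

```ruby
# Hypothetical helper: mirrors the script's "append unless already present" logic
# on plain Ruby hashes.
def append_conversation(doc, incoming)
  conversations = (doc["conversations"] ||= [])
  exists = conversations.any? do |c|
    c["publish_event_id"] == incoming["publish_event_id"]
  end
  conversations << incoming unless exists
  doc
end

# Made-up stored document with one conversation.
doc = {
  "case_number" => "12345",
  "conversations" => [
    { "publish_event_id" => 1, "form_name" => "form1" }
  ]
}

append_conversation(doc, { "publish_event_id" => 2, "form_name" => "form2" })
# Same publish_event_id again: treated as a duplicate and ignored.
append_conversation(doc, { "publish_event_id" => 2, "form_name" => "form2" })

puts doc["conversations"].length
```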

Map keys with the same name

A GET to an API endpoint I'm working with returns JSON with an inconsistent order of contacts, either
{"contacts"=>[
{"id"=>$UUID_0, "name"=>nil, "email"=>$EMAIL_0, "phone"=>$PHONE_0, "type"=>"foo"},
{"id"=>$UUID_1, "name"=>nil, "email"=>$EMAIL_1, "phone"=>$PHONE_1, "type"=>"bar"}
]}
or
{"contacts"=>[
{"id"=>$UUID_1, "name"=>nil, "email"=>$EMAIL_1, "phone"=>$PHONE_1, "type"=>"bar"},
{"id"=>$UUID_0, "name"=>nil, "email"=>$EMAIL_0, "phone"=>$PHONE_0, "type"=>"foo"}
]}
The "type" values are the only static objects in these responses, so I'd like to map this so that the contact types are keys containing the other pairs:
{
"foo"=>{"id"=>$UUID_0, "name"=>$NAME_0, "email"=>$EMAIL_0, "phone"=>$PHONE_0},
"bar"=>{"id"=>$UUID_1, "name"=>$NAME_1, "email"=>$EMAIL_1, "phone"=>$PHONE_1}
}
A solution is not obvious to me.
If you use Ruby on Rails, or at least ActiveSupport, you can try index_by instead of group_by: it won't put the values into arrays.
hash['contacts'].index_by {|r| r['type']}
=>
{
"bar" => {
"id" => "asdf",
"name" => nil,
"email" => "EMAIL_1",
"phone" => "PHONE_1",
"type" => "bar"
},
"foo" => {
"id" => "asdf",
"name" => nil,
"email" => "EMAIL_0",
"phone" => "PHONE_0",
"type" => "foo"
}
}
Without ActiveSupport, the same result can be built in plain Ruby:
Hash[data['contacts'].map { |c| [c['type'], c] }]
This can be done with Enumerable#reduce:
hash['contacts'].reduce({}) {|m,c| m[c['type']] = c;m}
How it works:
An empty hash is the starting point.
The block is called once for each element in the contacts list. The block receives the hash that we're building as m and the current contact as c.
In the block, c is assigned to the hash under its type, and the hash so far is returned.
The final result is the last return value of the block.
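The answers above keep the "type" pair inside each value, while the question's desired output omits it. A minimal sketch that also drops it, using each_with_object as a variation on the reduce approach, might look like this. The ids, emails, and phone numbers are placeholders standing in for the $UUID_n, $EMAIL_n, and $PHONE_n values in the question:

```ruby
# Placeholder values stand in for the $UUID_n / $EMAIL_n / $PHONE_n in the question.
hash = {
  "contacts" => [
    { "id" => "uuid-1", "name" => nil, "email" => "one@example.com",
      "phone" => "555-0101", "type" => "bar" },
    { "id" => "uuid-0", "name" => nil, "email" => "zero@example.com",
      "phone" => "555-0100", "type" => "foo" }
  ]
}

by_type = hash["contacts"].each_with_object({}) do |contact, memo|
  # Key by type, and store a copy of the contact without its "type" pair.
  memo[contact["type"]] = contact.reject { |key, _| key == "type" }
end

p by_type["foo"]
p by_type["bar"]
```

Unlike reduce, each_with_object passes the same memo hash to every iteration, so the block does not need to return it.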

Multiple limit condition in mongodb

I have a collection in which one of the fields is "type". I want to get some documents of each type, with the same limit applied to every type: for example, 2 documents for type A, 2 for type B, and so on.
How can I do this in a single query? I am using Ruby Active Record.
Generally, what you are describing is a relatively common question in the MongoDB community, which we could call the "top n results problem": given some input that is likely sorted in some way, how do you get the top n results without relying on arbitrary index values in the data?
MongoDB has the $first operator which is available to the aggregation framework which deals with the "top 1" part of the problem, as this actually takes the "first" item found on a grouping boundary, such as your "type". But getting more than "one" result of course gets a little more involved. There are some JIRA issues on this about modifying other operators to deal with n results or "restrict" or "slice". Notably SERVER-6074. But the problem can be handled in a few ways.
Popular implementations of the Rails Active Record pattern for MongoDB storage are Mongoid and MongoMapper; both allow access to the "native" MongoDB collection functions via a .collection accessor. This is basically what you need in order to use native methods such as .aggregate(), which supports more functionality than general Active Record aggregation.
Here is an aggregation approach with Mongoid, though the general code does not change once you have access to the native collection object:
require "mongoid"
require "pp"
Mongoid.configure.connect_to("test")
class Item
include Mongoid::Document
store_in collection: "item"
field :type, type: String
field :pos, type: String
end
Item.collection.drop
Item.collection.insert( :type => "A", :pos => "First" )
Item.collection.insert( :type => "A", :pos => "Second" )
Item.collection.insert( :type => "A", :pos => "Third" )
Item.collection.insert( :type => "A", :pos => "Fourth" )
Item.collection.insert( :type => "B", :pos => "First" )
Item.collection.insert( :type => "B", :pos => "Second" )
Item.collection.insert( :type => "B", :pos => "Third" )
Item.collection.insert( :type => "B", :pos => "Fourth" )
res = Item.collection.aggregate([
{ "$group" => {
"_id" => "$type",
"docs" => {
"$push" => {
"pos" => "$pos", "type" => "$type"
}
},
"one" => {
"$first" => {
"pos" => "$pos", "type" => "$type"
}
}
}},
{ "$unwind" => "$docs" },
{ "$project" => {
"docs" => {
"pos" => "$docs.pos",
"type" => "$docs.type",
"seen" => {
"$eq" => [ "$one", "$docs" ]
},
},
"one" => 1
}},
{ "$match" => {
"docs.seen" => false
}},
{ "$group" => {
"_id" => "$_id",
"one" => { "$first" => "$one" },
"two" => {
"$first" => {
"pos" => "$docs.pos",
"type" => "$docs.type"
}
},
"splitter" => {
"$first" => {
"$literal" => ["one","two"]
}
}
}},
{ "$unwind" => "$splitter" },
{ "$project" => {
"_id" => 0,
"type" => {
"$cond" => [
{ "$eq" => [ "$splitter", "one" ] },
"$one.type",
"$two.type"
]
},
"pos" => {
"$cond" => [
{ "$eq" => [ "$splitter", "one" ] },
"$one.pos",
"$two.pos"
]
}
}}
])
pp res
The naming in the documents is actually not used by the code, and titles in the data shown for "First", "Second" etc, are really just there to illustrate that you are indeed getting the "top 2" documents from the listing as a result.
So the approach here is essentially to create a "stack" of the documents "grouped" by your key, such as "type". The very first thing here is to take the "first" document from that stack using the $first operator.
The subsequent steps match the "seen" elements from the stack and filter them out, then you take the "next" document off the stack, again using the $first operator. The final steps are really just there to return the documents to the original form as found in the input, which is generally what is expected from such a query.
So the result is of course, just the top 2 documents for each type:
{ "type"=>"A", "pos"=>"First" }
{ "type"=>"A", "pos"=>"Second" }
{ "type"=>"B", "pos"=>"First" }
{ "type"=>"B", "pos"=>"Second" }
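To make the intended result concrete, here is the same "top 2 per type" selection sketched in memory with plain Ruby. This is only an illustration of what the pipeline computes, assuming the documents are already loaded as hashes in their stored order; it is not a substitute for doing the work server-side:

```ruby
# The same sample documents as inserted above, as plain hashes.
items = [
  { type: "A", pos: "First" },
  { type: "A", pos: "Second" },
  { type: "A", pos: "Third" },
  { type: "A", pos: "Fourth" },
  { type: "B", pos: "First" },
  { type: "B", pos: "Second" },
  { type: "B", pos: "Third" },
  { type: "B", pos: "Fourth" }
]

# Group by type (order within each group is preserved), then keep the
# first two documents of every group.
top_two = items.group_by { |item| item[:type] }
               .flat_map { |_type, docs| docs.first(2) }

top_two.each { |doc| p doc }
```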
There was a longer discussion and version of this as well as other solutions in this recent answer:
Mongodb aggregation $group, restrict length of array
It is essentially the same problem despite the title; that case was looking to match the top 10 entries or more. There is some pipeline generation code there as well for dealing with larger matches, along with some alternate approaches that may be worth considering depending on your data.
You will not be able to do this directly with only the type column and the constraint that it must be one query. However there is (as always) a way to accomplish this.
To find documents of different types, you would need some additional value that, on average, distributes the types according to how you want the data back.
db.users.insert({type: 'A', index: 1})
db.users.insert({type: 'B', index: 2})
db.users.insert({type: 'A', index: 3})
db.users.insert({type: 'B', index: 4})
db.users.insert({type: 'A', index: 5})
db.users.insert({type: 'B', index: 6})
Then, when querying for items with db.users.find({index: {$gt: 2, $lt: 7}}), you will have the right distribution of items.
Though I'm not sure this is what you were looking for.

Compare three arrays of hashes and get the result without duplicates in ruby?

I'm using the fql gem to retrieve data from Facebook. When I compare these three arrays of hashes, I want to get the final result in this form:
{
"photo" => [
[0] {
"owner" : "1105762436",
"src_big" : "https://fbcdn-sphotos-b-a.akamaihd.net/hphotos-ak-xap1/t31.0-8/q71/s720x720/10273283_10203050474118531_5420466436365792507_o.jpg",
"caption" : "Rings...!!\n\nView Full Screen.",
"created" : 1398953040,
"modified" : 1398953354,
"like_info" : {
"can_like" : true,
"like_count" : 22,
"user_likes" : true
},
"comment_info" : {
"can_comment" : true,
"comment_count" : 2,
"comment_order" : "chronological"
},
"object_id" : "10203050474118531",
"pid" : "4749213500839034982"
}
],
"comment" => [
[0] {
"text" : "Wow",
"text_tags" : [],
"time" : 1398972853,
"likes" : 1,
"fromid" : "100001012753267",
"object_id" : "10203050474118531"
},
[1] {
"text" : "Woww..",
"text_tags" : [],
"time" : 1399059923,
"likes" : 0,
"fromid" : "100003167704574",
"object_id" : "10203050474118531"
}
],
"users" =>[
[0] {
"id": "1105762436",
"name": "Nilanjan Joshi",
"username": "NilaNJan219"
},
[1] {
"id": "1105762436",
"name": "Ashish Joshi",
"username": "NilaNJan219"
}
]
}
Here is my attempt:
datas = File.read('source2.json')
all_data = JSON.parse(datas)
photos = all_data[0]['fql_result_set'].group_by{|x| x['object_id']}.to_a
comments = all_data[1]['fql_result_set'].group_by{|x| x['object_id']}.to_a
@photos_comments = []
@comments_users = []
@photo_users = []
photos.each do |a|
comments.each do |b|
if a.first == b.first
@photos_comments << {'photo' => a.last, 'comment' => b.last}
else
@comments_users << {'photo' => a.last, 'comment' => ''} unless @photos_comments.include?(a.last)
end
end
end
@photo_users = @photos_comments | @comments_users
@photo_comment_users = {photos_comments: @photo_users }
Here is the final result I'm getting:
There are still duplicates in the final array. I've grouped the arrays by object_id, which is common to the photo and comment arrays. The problem is that it only takes the photos that have comments. I can't work out how to find the photos that don't have comments.
Also, to find the details of the person who commented, I have the users array; the common attribute between comments and users is fromid and id. I don't understand how to get the user details either.
I think this is what you want:
photos = all_data[0]['fql_result_set']
comments = all_data[1]['fql_result_set'].group_by{|x| x['object_id']}
@photo_comment_users = photos.map do |p|
{ 'photo' => p, 'comment' => comments[p['object_id']] || '' }
end
For each photo, it takes all the comments with the same object_id, or returns '' if none exist.
If you want to connect the users too, you can map them by id, and select the relevant ones by the comment:
users = Hash[all_data[2]['fql_result_set'].map {|x| [x['id'], x]}]
@photo_comment_users = photos.map do |p|
{ 'photo' => p, 'comment' => comments[p['object_id']] || '',
'user' => (comments[p['object_id']] || []).map {|c| users[c['fromid']]} }
end
end
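The join described above can be tried out on a small made-up data set. The sketch below assumes the three result sets are already extracted into arrays; all ids, names, and captions are invented for illustration. It deviates from the answer in one respect: a photo without comments gets an empty array rather than '', so "comment" always has the same type:

```ruby
# Made-up sample data in the same shape as the three fql result sets.
photos = [
  { "object_id" => "p1", "caption" => "Rings" },
  { "object_id" => "p2", "caption" => "No comments yet" }
]
comments = [
  { "object_id" => "p1", "text" => "Wow",    "fromid" => "u1" },
  { "object_id" => "p1", "text" => "Woww..", "fromid" => "u2" }
]
users = [
  { "id" => "u1", "name" => "Alice" },
  { "id" => "u2", "name" => "Bob" }
]

# Index comments by photo and users by id so each lookup is a hash access.
comments_by_photo = comments.group_by { |c| c["object_id"] }
users_by_id = users.to_h { |u| [u["id"], u] }

result = photos.map do |photo|
  photo_comments = comments_by_photo.fetch(photo["object_id"], [])
  {
    "photo"   => photo,
    "comment" => photo_comments,
    # Commenters for this photo; unknown ids are dropped, repeats collapsed.
    "user"    => photo_comments.map { |c| users_by_id[c["fromid"]] }.compact.uniq
  }
end

result.each { |entry| p entry }
```

Because every photo is mapped exactly once, the result contains no duplicates and photos without comments are still included.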
