MongoDB Java - how to delete a particular document - mongodb-java

{
"CONTENT1": {
"YDXM": [
{ "name": "1", "MBNH": "1" },
{ "name": "2", "MBNH": "2" }
]
}
}
I want to delete the {"name":"1","MBNH":"1"}. How can I achieve this?

Assuming that the following is your document and you want to delete the ENTIRE document:
{
"CONTENT1": {
"YDXM": [
{
"name": "1",
"MBNH": "1"
},
{
"name": "2",
"MBNH": "2"
}
]
}
}
You could use this:
db.test.remove({"CONTENT1.YDXM.name" : "1", "CONTENT1.YDXM.MBNH" : "1"})
Now, if you want to extract the document {"name" : "1", "MBNH" : "1"} from the CONTENT1.YDXM array, you should use the $pull operator:
db.test.update({"CONTENT1.YDXM.name" : "1", "CONTENT1.YDXM.MBNH" : "1"}, { $pull : { "CONTENT1.YDXM" : {"name" : "1", "MBNH" : "1"} } }, false, true)
This will perform an update on all documents that match the first argument. The second argument, with the $pull operator, tells MongoDB to remove the value {"name" : "1", "MBNH" : "1"} from the CONTENT1.YDXM array.
You can read more about the $pull operator and the update command at these links:
http://docs.mongodb.org/manual/reference/operator/pull/
http://docs.mongodb.org/manual/applications/update/

Related

How to fetch field from array of objects in Elasticsearch Index as CSV file to Google Cloud Storage Using Logstash

I am using Elasticsearch to index data and want to export a few fields from an index created every day to Google Cloud Storage. How do I get fields from an array of objects in the Elasticsearch index and send them as a CSV file to a GCS bucket using Logstash?
I tried the configuration below to fetch nested fields from the index:
input {
elasticsearch {
hosts => "host:443"
user => "user"
ssl => true
connect_timeout_seconds => 600
request_timeout_seconds => 600
password => "pwd"
ca_file => "ca.crt"
index => "test"
query => '
{
"_source": ["obj1.Name","obj1.addr","obj1.obj2.location", "Hierarchy.categoryUrl"],
"query": {
"match_all": {}
}
}
'
}
}
filter {
mutate {
rename => {
"[obj1][Name]" => "col1"
"[obj1][addr]" => "col2"
"[obj1][obj2][location]" => "col3"
"[Hierarchy][0][categoryUrl]" => "col4"
}
}
}
output {
google_cloud_storage {
codec => csv {
include_headers => true
columns => [ "col1", "col2","col3"]
}
bucket => "bucket"
json_key_file => "creds.json"
temp_directory => "/tmp"
log_file_prefix => "log_gcs"
max_file_size_kbytes => 1024
date_pattern => "%Y-%m-%dT%H:00"
flush_interval_secs => 600
gzip => false
uploader_interval_secs => 600
include_uuid => true
include_hostname => true
}
}
How do I get a field from an array of objects populated into the CSV above? In the example below I want to fetch categoryUrl from the first object of the array, populate it into the CSV table, and send it to the GCS bucket.
I have tried the approaches below:
"_source": ["obj1.Name","obj1.addr","obj1.obj2.location", "Hierarchy.categoryUrl"]
and
"_source": ["obj1.Name","obj1.addr","obj1.obj2.location", "Hierarchy[0].categoryUrl"]
with
mutate {
rename => {
"[obj1][Name]" => "col1"
"[obj1][addr]" => "col2"
"[obj1][obj2][location]" => "col3"
"[Hierarchy][0][categoryUrl]" => "col4"
}
for this input sample:
"Hierarchy" : [
{
"level" : "1",
"category" : "test",
"categoryUrl" : "testurl1"
},
{
"level" : "2",
"category" : "test2",
"categoryUrl" : "testurl2"
}
]
Attaching a sample document where I am trying to fetch merchandisingHierarchy[0].categoryUrl and pricingInfo[0].basePrice:
{
"_index" : "amulya-test",
"_type" : "_doc",
"_id" : "ldZPJoYBFi8LOEDK_M2f",
"_score" : 1.0,
"_ignored" : [
"itemDetails.description.keyword"
],
"_source" : {
"itemDetails" : {
"compSku" : "202726",
"compName" : "abc.com",
"compWebsite" : "abc.com",
"title" : "Monteray 38.25 in. x 73.375 in. Frameless Hinged Corner Shower Enclosure in Brushed Nickel",
"description" : "Create the modthroom of your dreams with the clean lines of the VIGO Monteray Frameless Shower Enclosure. Solid 3/8 in. tempered glass combined with stainless steel and solid brass construction makes this enclosure strong and long-lasting. The sleek, reversible, outward-opening door features a convenient towel bar. This versatile enclosure can be installed on a tile floor or with a VIGO Shower Base. With a single water deflector along the bottom seal strip, water is redirected back into the shower to keep your bathroom dry, clean, and pristine.",
"modelNumber" : "VG6011BNCL40",
"upc" : "8137756684",
"hasVariations" : false,
"productDetailsBulletPoints" : [ ],
"itemUrls" : {
"productPageUrl" : "https://.abc.com/p/VIGO-Monteray-38-in-x-73-375-in-Frameless-Hinged-Corner-Shower-Enclosure-in-Brushed-Nickel-VG6011BNCL40/202722616",
"primaryImageUrl" : "https://images.thdstatic.com/productImages/d77d9e8b-1ea1-4811-a470-8364c8e47402/svn/vigo-shower-enclosures-vg6011bncl40-64_600.jpg",
"secondaryImageUrls" : [
"https://images.thdstatic.com/productImages/d77d9e8b-1e1-4811-a470-8364c8e47402/svn/vigo-shower-enclosures-vg6011bncl40-64_1000.jpg",
"https://images.thdstatic.com/productImages/db539ff9-6df-48c2-897a-18dd1e1794e3/svn/vigo-shower-enclosures-vg6011bncl40-e1_1000.jpg",
"https://images.thdstatic.com/productImages/47c5090b-49a-46bc-a36d-921ddae5e1ab/svn/vigo-shower-enclosures-vg6011bncl40-40_1000.jpg",
"https://images.thdstatic.com/productImages/add6691c-a02-466d-9a1a-47200b05685e/svn/vigo-shower-enclosures-vg6011bncl40-a0_1000.jpg",
"https://images.thdstatic.com/productImages/d638230e-0d9-40c9-be93-7f7bf24f0732/svn/vigo-shower-enclosures-vg6011bncl40-1d_1000.jpg"
]
}
},
"merchandisingHierarchy" : [
{
"level" : "1",
"category" : "Home",
"categoryUrl" : "host/"
},
{
"level" : "2",
"category" : "Bath",
"categoryUrl" : "host/b/Bath/N-5yc1vZbzb3"
},
{
"level" : "3",
"category" : "Showers",
"categoryUrl" : "host/b/Bath-Showers/N-5yc1vZbzcd"
},
{
"level" : "4",
"category" : "Shower Doors",
"categoryUrl" : "host/b/Bath-Showers-Shower-Doors/N-5yc1vZbzcg"
},
{
"level" : "5",
"category" : "Shower Enclosures",
"categoryUrl" : "host/b/Bath-Showers-Shower-Doors-Shower-Enclosures/N-5yc1vZcbn2"
}
],
"reviewsAndRatings" : {
"pdtReviewCount" : 105
},
"additionalAttributes" : {
"isAddon" : false
},
"productSpecifications" : {
"Warranties" : { },
"Details" : { },
"Dimensions" : { }
},
"promoDetails" : [
{
"promoName" : "Save $150.00 (15%)",
"promoPrice" : 849.9
}
],
"locationDetails" : { },
"storePickupDetails" : {
"deliveryText" : "Get it by Mon, Feb 20",
"toEddDate" : "Mon, Feb 20",
"isBackordered" : false,
"selectedEddZipcode" : "20147",
"shipToStoreEnabled" : true,
"homeDeliveryEnabled" : true,
"scheduledDeliveryEnabled" : false
},
"recommendedProducts" : [ ],
"pricingInfo" : [
{
"type" : "SAS",
"offerPrice" : 849.9,
"sellerName" : "abc.com",
"onClearance" : false,
"inStock" : true,
"isBuyBoxWinner" : true,
"promo" : [
{
"onPromo" : true,
"promoName" : "Save $150.00 (15%)",
"promoPrice" : 849.9
}
],
"basePrice" : 999.9,
"priceVariants" : [
{
"basePrice" : 999.9,
"offerPrice" : 849.9
}
],
"inventoryDetails" : {
"stockInStore" : false,
"stockOnline" : true
}
}
]
}
}
You can do it like this:
input {
elasticsearch {
...
query => '
{
"_source": ["merchandisingHierarchy.categoryUrl"],
"query": {
"match_all": {}
}
}
'
}
}
filter {
mutate {
add_field => {
"col1" => "%{[merchandisingHierarchy][0][categoryUrl]}"
"col2" => "%{[pricingInfo][0][basePrice]}"
}
}
}
output {
stdout {
codec => csv {
include_headers => true
columns => [ "col1"]
}
}
}
I've tested with your sample document and I get the output below, which looks like it is working per your expectation:
col1,col2
host/,999.9

Add a JSON attribute to flow content with a Jolt Transformation or an alternative way in NiFi

I need to add this attribute, named 'metadata', to the JSON flow content.
The attribute 'metadata' is like:
{"startTime":1451952013663, "endTime":1453680013663, "name":"Npos19", "deleted":false}
The input is like this:
{
"id": 154299613718447,
"values": [
{
"timestamp": 1451977869683,
"value": 13.1
},
{
"timestamp": 1453949805784,
"value": 7.54
}
]
}
My goal is like:
{
"id": 154299613718447,
"values": [ {
"startTime":1451952013663,
"endTime":1453680013663,
"name":"Npos19",
"deleted":false,
"timestamp": 1451977869683,
"value": 13.1
},
{
"startTime":1451952013663,
"endTime":1453680013663,
"name":"Npos19",
"deleted":false,
"timestamp": 1453949805784,
"value": 7.54
}
]
}
I tried to use the Jolt Transformation:
{
"operation": "default",
"spec": {
// extract metadata array from json attribute and put it in a temporary array
"tempArray": "${metadata:jsonPath('$.*')}"
}
}
but it does not work. I need to extract metadata array with $.* because I do not know what keys will be present.
Is there an alternative, fast way with other NiFi processors to merge the attribute with the flow content?
Thanks in advance.
It's possible with a combination of two processors: EvaluateJsonPath -> ScriptedTransformRecord.
EvaluateJsonPath
Destination: flowfile-attribute
Return Type: json
values (dynamic property): $.values
ScriptedTransformRecord
Record Reader: JsonTreeReader
Record Writer: JsonRecordSetWriter
Script Language: Groovy
Script Body:
// Parse the existing 'metadata' flowfile attribute and the 'values'
// attribute extracted by the EvaluateJsonPath processor above.
def mapMetadata = new groovy.json.JsonSlurper().parseText(attributes['metadata'])
def mapValue = new groovy.json.JsonSlurper().parseText(attributes['values'])
// Copy every metadata key/value pair into each element of the values array.
def values = mapValue.each { value ->
    mapMetadata.each { k, v ->
        value."${k}" = v
    }
}
// Clear the original 'values' field and write the enriched array to 'updateValues'.
record.setValue('values', null)
record.setValue('updateValues', values)
record
Output json
[ {
"id" : 154299613718447,
"values" : null,
"updateValues" : [ {
"timestamp" : 1451977869683,
"value" : 13.1,
"startTime" : 1451952013663,
"endTime" : 1453680013663,
"name" : "Npos19",
"deleted" : false
}, {
"timestamp" : 1453949805784,
"value" : 7.54,
"startTime" : 1451952013663,
"endTime" : 1453680013663,
"name" : "Npos19",
"deleted" : false
} ]
} ]

Transform a string to JSON in a Laravel API

Two months ago, I created an API in Laravel and tested it with Postman. Everything worked fine.
Now I want to continue developing it, but I can't access the elements like before.
Postman:
Body:
{
"RFQ" : "123",
"client_id": "2",
"ITEMS": [
{
"material" : "1.234.565",
"description" : "Test material 1",
"quantity" : "2.123",
"Quot. Deadline" : "2018-01-12",
"delivery_date" : "2018-01-12",
},
{
"material" : "9.87564.2",
"description" : "Test material 2",
"quantity" : "4",
"Quot. Deadline" : "2018-01-12",
"delivery_date" : "15.01.2018"
}
]
}
Controller:
public function import(ImportRequestForQuoteRequest $request, $id)
{
return $request->getContent();
}
Before, I was able to get, for example, the client_id with $request->client_id, but now it returns nothing.
If I return $request->getContent() I get a string like the body.
What do I have to do to access the values?
Try to return it like this
public function import(ImportRequestForQuoteRequest $request, $id)
{
return response()->json($request->getContent());
}
Source docs: https://laravel.com/docs/5.6/responses#json-responses
You can try this in your controller...
use Illuminate\Http\Request;
public function myFunction(Request $request, $id)
{
$post_param = $request->post_param;
$default = '';
$post_param = $request->input('client_id', $default);
$route_param = $id;
return response()->json(['params' => $request->all()]);
}
You need to decode the JSON into a PHP array, and then you are able to access it like a normal array in PHP:
$item = json_decode($request->ITEMS);
Your Body:
{
"RFQ" : "123",
"client_id": "2",
"ITEMS": [
{
"material" : "1.234.565",
"description" : "Test material 1",
"quantity" : "2.123",
"Quot. Deadline" : "2018-01-12",
"delivery_date" : "2018-01-12",
},
{
"material" : "9.87564.2",
"description" : "Test material 2",
"quantity" : "4",
"Quot. Deadline" : "2018-01-12",
"delivery_date" : "15.01.2018"
}
]
}
Change it to:
{
"RFQ" : "123",
"client_id": "2",
"ITEMS": [
{
"material" : "1.234.565",
"description" : "Test material 1",
"quantity" : "2.123",
"Quot. Deadline" : "2018-01-12",
"delivery_date" : "2018-01-12"
},
{
"material" : "9.87564.2",
"description" : "Test material 2",
"quantity" : "4",
"Quot. Deadline" : "2018-01-12",
"delivery_date" : "15.01.2018"
}
]
}
The JSON array should be well-formed, so try removing the trailing comma after "delivery_date" in the first item.
Then you will get results using $request->all().
Hope this solves it.
I did the same request with the PhpStorm REST client. With the same body and headers, everything works fine.
Postman adds """ before and after the content; I don't know why.
So the error is somewhere in Postman.

Ruby MongoDB - improving speed when working with multiple collections

I'm using MongoDB from Ruby via the mongo gem.
I have the following scenario:
1. For each document in a collection, say coll1, look at key1 and key2.
2. Search for a document in another collection, say coll2, with matching values for key1 and key2.
3. If there is a match, add to the document fetched in #2 a new key key3, whose value is set to the value of key3 in the document referenced in #1.
4. Insert the updated hash into a new collection coll3.
The general guideline with MongoDB has been to handle cross collection operations in application code.
So I do the following:
client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => some_db,
:server_selection_timeout => 5)
cursor = client[:coll1].find({}, { :projection => {:_id => 0} }) # exclude _id
cursor.each do |doc|
doc_coll2 = client[:coll2].find('$and' => [{:key1 => doc[:key1]}, {:key2 => doc[:key2] }]).limit(1).first # no find_one method
if(doc_coll2 && doc[:key3])
doc_coll2[:key3] = doc[:key3]
doc_coll2.delete(:_id) # remove key :_id
client[:coll3].insert_one(doc_coll2)
end
end
This works, but it takes a lot of time to finish the job: approximately 250 ms per document in collection coll1, or roughly 3600 s (1 hour) for ~15000 records, which seems like a lot. This could be associated with reading the documents one at a time, doing the check in application code, and then writing one document at a time back to a new collection.
Is there a way to get this operation done faster? Is the way I'm doing it even the right way to do it?
Example documents
coll1
{
"_id" : ObjectId("588610ead0ae360cb815e55f"),
"key1" : "115384042",
"key2" : "276209",
"key3" : "10101122317876"
}
coll2
{
"_id" : ObjectId("788610ead0ae360def15e88e"),
"key1" : "115384042",
"key2" : "276209",
"key4" : 10,
"key5" : 4,
"key6" : 0,
"key7" : "false",
"key8" : 0,
"key9" : "false"
}
coll3
{
"_id" : ObjectId("788610ead0ae360def15e88e"),
"key1" : "115384042",
"key2" : "276209",
"key3" : "10101122317876",
"key4" : 10,
"key5" : 4,
"key6" : 0,
"key7" : "false",
"key8" : 0,
"key9" : "false"
}
A solution would be to use aggregation instead, and do this in one single query:
perform a join on key1 field with $lookup
unwind the array with $unwind
keep doc where coll1.key2 == coll2.key2 with $redact
reformat the document with $project
write it to coll3 with $out
so the query would be :
db.coll1.aggregate([
{ "$lookup": {
"from": "coll2",
"localField": "key1",
"foreignField": "key1",
"as": "coll2_doc"
}},
{ "$unwind": "$coll2_doc" },
{ "$redact": {
"$cond": [
{ "$eq": [ "$key2", "$coll2_doc.key2" ] },
"$$KEEP",
"$$PRUNE"
]
}},
{
$project: {
key1: 1,
key2: 1,
key3: 1,
key4: "$coll2_doc.key4",
key5: "$coll2_doc.key5",
key6: "$coll2_doc.key6",
key7: "$coll2_doc.key7",
key8: "$coll2_doc.key8",
key9: "$coll2_doc.key9",
}
},
{$out: "coll3"}
], {allowDiskUse: true} );
and db.coll3.find() would return
{
"_id" : ObjectId("588610ead0ae360cb815e55f"),
"key1" : "115384042",
"key2" : "276209",
"key3" : "10101122317876",
"key4" : 10,
"key5" : 4,
"key6" : 0,
"key7" : "false",
"key8" : 0,
"key9" : "false"
}
Edit: MongoDB 3.4 solution
If you don't want to specify all keys in the $project stage, you can take advantage of $addFields and $replaceRoot, two new operators introduced in MongoDB 3.4
the query would become:
db.coll1.aggregate([
{ "$lookup": {
"from": "coll2",
"localField": "key1",
"foreignField": "key1",
"as": "coll2_doc"
}},
{ "$unwind": "$coll2_doc" },
{ "$redact": {
"$cond": [
{ "$eq": [ "$key2", "$coll2_doc.key2" ] },
"$$KEEP",
"$$PRUNE"
]
}},
{$addFields: {"coll2_doc.key3": "$key3" }},
{$replaceRoot: {newRoot: "$coll2_doc"}},
{$out: "coll3"}
], {allowDiskUse: true})
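Since the question drives everything through the Ruby mongo gem rather than the shell, here is a rough sketch of running the same MongoDB 3.4 pipeline from Ruby. It assumes the 2.x mongo driver API already used in the question (Mongo::Client, Collection#aggregate with allow_disk_use) and reuses some_db and the collection names from above:
require 'mongo'

client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => some_db,
                           :server_selection_timeout => 5)

pipeline = [
  { '$lookup' => { 'from' => 'coll2', 'localField' => 'key1',
                   'foreignField' => 'key1', 'as' => 'coll2_doc' } },
  { '$unwind' => '$coll2_doc' },
  { '$redact' => { '$cond' => [ { '$eq' => [ '$key2', '$coll2_doc.key2' ] },
                                '$$KEEP', '$$PRUNE' ] } },
  { '$addFields'   => { 'coll2_doc.key3' => '$key3' } },
  { '$replaceRoot' => { 'newRoot' => '$coll2_doc' } },
  { '$out' => 'coll3' }
]

# The aggregation view is lazy; iterating it forces the pipeline to run
# on the server and write the results into coll3.
client[:coll1].aggregate(pipeline, :allow_disk_use => true).to_a
Because the join and the write to coll3 happen server-side in a single query, this avoids the per-document round trips of the original loop.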
After toying around with this for some time, I realized that indexes had not been added. Adding indexes reduces the query run time by orders of magnitude.
To add the indexes, do the following:
db.coll1.ensureIndex({"key1": 1, "key2": 1});
db.coll2.ensureIndex({"key1": 1, "key2": 1});
With the indexes in place, the overall query run time came down to a small fraction of what it was earlier.
The lesson is that when working with large data sets, index the fields used for find; that by itself reduces query run time a lot.

Importing a simplified XML file into MongoDB using the crack gem

Here, I'm using the MongoDB driver for Ruby. But once this works, I want to run it as a scheduled task in Ruby on Rails 3 with the Mongoid ODM.
So for now, I'm experimenting in plain Ruby.
I've noticed the crack gem is very convenient when it comes to converting an XML file into a format that can be inserted into MongoDB. When I use the MongoDB driver for Ruby, crack converts to a format close to JSON (it uses "=>" instead of ":" colons), which is a required condition before I insert it into the MongoDB database, as shown here.
The problem is that, the way I'm using crack below, it imports everything that is in the XML file.
Please see below.
sample.xml
<?xml version="1.0" encoding="utf-8"?>
<ShipmentRequest>
<Envelope>
<TransmissionDateTime>05/08/2013 23:06:02</TransmissionDateTime>
</Envelope>
<Message>
<Comment />
<Header>
<MemberId>A00000001</MemberId>
<MemberName>Bruce</MemberName>
<DeliveryId>6377935</DeliveryId>
<ShipToAddress1>123-4567</ShipToAddress1>
<OrderDate>05/08/13</OrderDate>
<Payments>
<PayType>Credit Card</PayType>
<Amount>1000</Amount>
</Payments>
<Payments>
<PayType>Points</PayType>
<Amount>5390</Amount>
</Payments>
</Header>
<Line>
<LineNumber>3.1</LineNumber>
<Item>fruit-004</Item>
<Description>Peach</Description>
<Quantity>1</Quantity>
<UnitCost>1610</UnitCost>
<DeclaredValue>0</DeclaredValue>
<PointValue>13</PointValue>
</Line>
<Line>
<LineNumber>8.1</LineNumber>
<Item>fruit-001</Item>
<Description>Fruit Set</Description>
<Quantity>1</Quantity>
<UnitCost>23550</UnitCost>
<PointValue>105</PointValue>
<PickLine>
<PickLineNumber>8.1..1</PickLineNumber>
<PickItem>fruit-002</PickItem>
<PickDescription>Apple</PickDescription>
<PickQuantity>1</PickQuantity>
</PickLine>
<PickLine>
<PickLineNumber>8.1..2</PickLineNumber>
<PickItem>fruit-003</PickItem>
<PickDescription>Orange</PickDescription>
<PickQuantity>2</PickQuantity>
</PickLine>
</Line>
</Message>
</ShipmentRequest>
sample_crack.rb
#!/usr/bin/ruby
require "crack"
require 'mongo'
include Mongo
mongo_client = MongoClient.new("localhost", 27017)
db = mongo_client.db("somedb")
coll = db.collection("somecoll")
myXML = Crack::XML.parse(File.read("sample.xml"))
coll.insert(myXML)
puts myXML
It prints on console:
{"ShipmentRequest"=>{"Envelope"=>{"TransmissionDateTime"=>"05/08/2013 23:06:02"}, "Message"=>{"Comment"=>nil, "Header"=>{"MemberId"=>"A00000001", "MemberName"=>"Bruce", "DeliveryId"=>"6377935", "ShipToAddress1"=>"123-4567", "OrderDate"=>"05/08/13", "Payments"=>[{"PayType"=>"Credit Card", "Amount"=>"1000"}, {"PayType"=>"Points", "Amount"=>"5390"}]}, "Line"=>[{"LineNumber"=>"3.1", "Item"=>"fruit-004", "Description"=>"Peach", "Quantity"=>"1", "UnitCost"=>"1610", "DeclaredValue"=>"0", "PointValue"=>"13"}, {"LineNumber"=>"8.1", "Item"=>"fruit-001", "Description"=>"Fruit Set", "Quantity"=>"1", "UnitCost"=>"23550", "PointValue"=>"105", "PickLine"=>[{"PickLineNumber"=>"8.1..1", "PickItem"=>"fruit-002", "PickDescription"=>"Apple", "PickQuantity"=>"1"}, {"PickLineNumber"=>"8.1..2", "PickItem"=>"fruit-003", "PickDescription"=>"Orange", "PickQuantity"=>"2"}]}]}}, :_id=>BSON::ObjectId('51ad8d83a3d24b3b9f000001')}
In MongoDB, the converted XML document looks like:
{
"_id" : ObjectId("51ad8d83a3d24b3b9f000001"),
"ShipmentRequest" : {
"Envelope" : {
"TransmissionDateTime" : "05/08/2013 23:06:02"
},
"Message" : {
"Comment" : null,
"Header" : {
"MemberId" : "A00000001",
"MemberName" : "Bruce",
"DeliveryId" : "6377935",
"ShipToAddress1" : "123-4567",
"OrderDate" : "05/08/13",
"Payments" : [
{
"PayType" : "Credit Card",
"Amount" : "1000"
},
{
"PayType" : "Points",
"Amount" : "5390"
}
]
},
"Line" : [
{
"LineNumber" : "3.1",
"Item" : "fruit-004",
"Description" : "Peach",
"Quantity" : "1",
"UnitCost" : "1610",
"DeclaredValue" : "0",
"PointValue" : "13"
},
{
"LineNumber" : "8.1",
"Item" : "fruit-001",
"Description" : "Fruit Set",
"Quantity" : "1",
"UnitCost" : "23550",
"PointValue" : "105",
"PickLine" : [
{
"PickLineNumber" : "8.1..1",
"PickItem" : "fruit-002",
"PickDescription" : "Apple",
"PickQuantity" : "1"
},
{
"PickLineNumber" : "8.1..2",
"PickItem" : "fruit-003",
"PickDescription" : "Orange",
"PickQuantity" : "2"
}
]
}
]
}
}
}
But I'd like to import it in a way that eliminates the not-needed nodes and ignores empty ones:
{
"_id" : ObjectId("51ad8d83a3d24b3b9f000001"),
"MemberId" : "A00000001",
"MemberName" : "Bruce",
"DeliveryId" : "6377935",
"ShipToAddress1" : "123-4567",
"OrderDate" : "05/08/13",
"Payments" : [
{
"PayType" : "Credit Card",
"Amount" : "1000"
},
{
"PayType" : "Points",
"Amount" : "5390"
}
],
"Line" : [
{
"LineNumber" : "3.1",
"Item" : "fruit-004",
"Description" : "Peach",
"Quantity" : "1",
"UnitCost" : "1610",
"DeclaredValue" : "0",
"PointValue" : "13"
},
{
"LineNumber" : "8.1",
"Item" : "fruit-001",
"Description" : "Fruit Set",
"Quantity" : "1",
"UnitCost" : "23550",
"PointValue" : "105",
"PickLine" : [
{
"PickLineNumber" : "8.1..1",
"PickItem" : "fruit-002",
"PickDescription" : "Apple",
"PickQuantity" : "1"
},
{
"PickLineNumber" : "8.1..2",
"PickItem" : "fruit-003",
"PickDescription" : "Orange",
"PickQuantity" : "2"
}
]
}
]
}
Can this be done with crack? Or can this be done better with nokogiri?
Update
Big thanks to @Alex Peachey; here is the updated code.
sample_crack.rb (updated):
#!/usr/bin/ruby
require "crack"
require 'mongo'
include Mongo
mongo_client = MongoClient.new("localhost", 27017)
db = mongo_client.db("somedb")
coll = db.collection("somecoll")
myXML = Crack::XML.parse(File.read("sample.xml"))
myXML.merge!(myXML.delete("ShipmentRequest")) # not needed hash
myXML.merge!(myXML.delete("Message")) # not needed hash
myXML.merge!(myXML.delete("Header")) # not needed hash
myXML.delete("Envelope") # not needed hash
# planning to put here a code to remove hashes with empty values
coll.insert(myXML)
puts myXML
It's hard to say how you define "not-needed" nodes, but empty ones are easy enough to understand. Either way, Crack is very good at what it's doing for you, which is basically turning the XML into a Hash. Once you have the Hash, just prune it as you wish based on whatever rules you have before you insert it into Mongo.
Based on your comment, I better understand what you are asking. My answer still holds true: just manipulate the hash. Specifically you could do this:
myXML.merge!(myXML.delete("ShipmentRequest"))
myXML.delete("Envelope")
myXML.merge!(myXML.delete("Message"))
