Why is an array of all `ids` needed in a normalized state shape? - normalizr

comments : {
byId : {
"comment1" : {
id : "comment1",
author : "user2",
comment : ".....",
},
"comment2" : {
id : "comment2",
author : "user3",
comment : ".....",
},
"comment3" : {
id : "comment3",
author : "user3",
comment : ".....",
},
"comment4" : {
id : "comment4",
author : "user1",
comment : ".....",
},
"comment5" : {
id : "comment5",
author : "user3",
comment : ".....",
},
},
allIds : ["comment1", "comment2", "comment3", "comment4", "comment5"]
}
In the above example, is there any reason my API needs to include the allIds array? I assume this way you can do a count faster, and you can probably sort, but in general I don't understand whether leaving it out causes a performance hit.

This isn't anything that's required by Redux; it's a normalizr thing. To answer your question, JavaScript objects can't be relied upon to retain sort order in certain situations. Putting the ids in an array allows you to retain the sort order that was present before you normalized.
Quote from co-maintainer of Redux and author of "normalizing state shape section" of Redux docs:
As for the ID arrays, while JS engines now have a fairly standardized process for iterating across keys in an object, you shouldn't rely on that to define ordering. Storing arrays of IDs allows you to define an order for items.
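A small sketch of the ordering point, using a trimmed-down version of the state above. The `orderedComments` helper and the numeric-key object are illustrative assumptions, not part of normalizr or Redux.

```javascript
// Trimmed-down normalized state, shaped like the example in the question.
const comments = {
  byId: {
    comment1: { id: "comment1", author: "user2" },
    comment2: { id: "comment2", author: "user3" },
    comment3: { id: "comment3", author: "user3" },
  },
  // The array, not the object, defines the display order.
  allIds: ["comment3", "comment1", "comment2"],
};

// Hypothetical helper: resolve ids to entities in the stored order.
function orderedComments(state) {
  return state.allIds.map((id) => state.byId[id]);
}

// Why key order alone is fragile: integer-like keys are enumerated in
// ascending numeric order, regardless of the order they were inserted in.
const byNumericId = { 10: "ten", 2: "two", 1: "one" };
const numericKeyOrder = Object.keys(byNumericId);

console.log(orderedComments(comments).map((c) => c.id));
console.log(numericKeyOrder);
```

Counting is then just `comments.allIds.length`, and re-sorting means replacing the array without touching the entities in `byId`.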

Related

How to insert an element into already present list in elastic search

Say I have documents stored like below.
document 1
{
id : '1',
title : "This is a test document1",
list : ["value1" , "value2"],
...
}
document 2
{
id : '2',
title : "This is a test document2",
valueList : ["value1" , "value2"],
...
}
I need to add more elements to the valueList of several documents, given a list of document ids, using the bulk API. The result should look like this:
document 1
{
id : '1',
title : "This is a test document1",
list : ["value1" , "value2", "value3"],
...
}
document 2
{
id : '2',
title : "This is a test document2",
valueList : ["value1" , "value2" , "value3"],
...
}
What can I do to achieve this?
I tried using scripts, but that only updates a single document.
Sorry, I'm really new to Elasticsearch, so this may be a basic question. Please bear with me and help me understand.
See Updating Document; it should be straightforward. You need to use _update with the changed fields wrapped in a doc object. Just to give you an idea (the documentation covers this well), it could look like this:
POST /your_index/your_type/document1/_update
{
  "doc" : {
    "list" : ["value1" , "value2", "value3"]
  }
}
This will update document1.
In case of bulk updates you should read Batch Processing and have a look at the Bulk API.
From the docs:
POST /your_index/your_type/_bulk
{ "update" : {"_id" : "document1", "_type" : "your_type", "_index" : "your_index"}}
{ "doc" : {"myfield" : "newvalue"} }
{ "update" : {"_id" : "document2", "_type" : "your_type", "_index" : "your_index"}}
{ "doc" : {"myfield" : "newvalue"} }
Please note that you can just use _update for Partial Updates.
The simplest form of the update request accepts a partial document as
the doc parameter, which just gets merged with the existing document.
Objects are merged together, existing scalar fields are overwritten,
and new fields are added.
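Because a partial doc replaces an array field instead of appending to it, a scripted update per document is one way to add "value3" without resending the whole list. The sketch below only builds the newline-delimited request body. The `buildBulkAppend` helper is hypothetical, and the script payload assumes an Elasticsearch version that supports Painless and the `source` key (older releases used `inline` and a different script language).

```javascript
// Hypothetical helper: build the newline-delimited body for a _bulk request
// that appends `value` to the array field `field` of every document in `ids`.
function buildBulkAppend(ids, field, value) {
  const lines = [];
  for (const id of ids) {
    // Action line: which document to update.
    lines.push(JSON.stringify({ update: { _id: id } }));
    // Payload line: a scripted update, so the list is appended to
    // rather than replaced.
    lines.push(
      JSON.stringify({
        script: {
          lang: "painless", // assumption: Painless is available
          source: "ctx._source." + field + ".add(params.value)",
          params: { value: value },
        },
      })
    );
  }
  // A bulk body must end with a trailing newline.
  return lines.join("\n") + "\n";
}

const bulkBody = buildBulkAppend(["1", "2"], "valueList", "value3");
console.log(bulkBody);
```

The resulting string would be sent as the body of `POST /your_index/your_type/_bulk`, in the same way as the example from the docs above.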

Performance with nested data in a script field

I am wondering if there is a more performant way of performing a calculation on nested data in a script field, or of organizing the data. In the code below, the data will contain values for 50 states and/or other regions. Each user is tied to an area, so the script below checks whether the average_value in their area is above a certain threshold and returns a true/false value for each matching document.
Mapping
{
"mydata" : {
"properties" : {
...some fields,
"related" : {
"type" : "nested",
"properties" : {
"average_value" : {
"type" : "integer"
},
"state" : {
"type" : "string"
}
}
}
}
}
}
Script
"script_fields" : {
"inBudget" : {
"script" : {
"inline" : "_source.related.find { it.state == default_area && it.average_value >= min_amount } != null",
"params" : {
"min_amount" : 100,
"default_area" : "CA"
}
}
}
}
I have a working solution using the above method, but it slows my query down and I am curious whether there is a better solution. I have been toying with the idea of using an inner object with a key, like related_CA, and having each state's data in a separate object. However, for flexibility I would rather not have to pre-define each region in a mapping (as I may not have them all ahead of time). I feel like I might be missing a simpler or better way, and I am open to reorganizing the data/mapping, changing the script, or both.
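For illustration only, the keyed-object idea mentioned above can be sketched in plain JavaScript. The helper names and the sample values are made up, and nothing here is Elasticsearch API; it only shows how the lookup changes from a scan to a direct key access.

```javascript
// Hypothetical helper: turn [{ state, average_value }, ...] into
// { CA: { average_value }, ... } so a region becomes a direct key lookup.
function keyByState(related) {
  const byState = {};
  for (const entry of related) {
    byState[entry.state] = { average_value: entry.average_value };
  }
  return byState;
}

// Same check as the script field, but without scanning the array.
function inBudget(byState, defaultArea, minAmount) {
  const entry = byState[defaultArea];
  return entry !== undefined && entry.average_value >= minAmount;
}

// Made-up sample data in the shape of the "related" nested field.
const related = [
  { state: "CA", average_value: 120 },
  { state: "NY", average_value: 80 },
];
const relatedByState = keyByState(related);

console.log(inBudget(relatedByState, "CA", 100));
console.log(inBudget(relatedByState, "NY", 100));
```

The trade-off raised in the question still applies: with this layout every region becomes its own field in the mapping.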

How can I ignore certain attributes when comparing two json files in ruby?

I'm looking for a parallel approach to solving a problem. One approach I posted in "How can I subract two json files in ruby." Another approach is this.
I'm using this nifty json_diff.rb program to compare two similarly-structured json files. How can I skip certain attributes that can be nested inside other attributes?
For example, I have file1.json
{
"id" : "file1",
"att1" : {
"attA" : {
"free_mem" : "1234",
"buff_mem" : "5678"
},
"attB" : {
"name" : "Joe",
"location" : "Lab"
}
}
}
and file2.json
{
"id" : "file2",
"att1" : {
"attA" : {
"free_mem" : "5555",
"buff_mem" : "6666"
},
"attB" : {
"name" : "John",
"location" : "Lab"
}
}
}
I want to ignore attA. Note that these files are just examples, and real json files I have will have more attributes to ignore, which may be nested deeper inside other attributes.
I've done this in RSpec with json_spec (https://github.com/collectiveidea/json_spec) with good success, but that's specific to RSpec rather than plain Ruby.
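One general approach, independent of the diff tool, is to strip the ignored attributes from both structures before comparing them. The sketch below shows the recursion in JavaScript on the two example files; the `withoutKeys` helper is hypothetical, and the same idea should carry over to nested Ruby hashes.

```javascript
// Hypothetical helper: return a deep copy of `value` without any key
// listed in `ignored`, at any nesting depth.
function withoutKeys(value, ignored) {
  if (Array.isArray(value)) {
    return value.map((item) => withoutKeys(item, ignored));
  }
  if (value !== null && typeof value === "object") {
    const result = {};
    for (const key of Object.keys(value)) {
      if (!ignored.includes(key)) {
        result[key] = withoutKeys(value[key], ignored);
      }
    }
    return result;
  }
  return value; // strings, numbers, booleans, null
}

const file1 = {
  id: "file1",
  att1: {
    attA: { free_mem: "1234", buff_mem: "5678" },
    attB: { name: "Joe", location: "Lab" },
  },
};
const file2 = {
  id: "file2",
  att1: {
    attA: { free_mem: "5555", buff_mem: "6666" },
    attB: { name: "John", location: "Lab" },
  },
};

const left = withoutKeys(file1, ["attA"]);
const right = withoutKeys(file2, ["attA"]);
console.log(JSON.stringify(left));
console.log(JSON.stringify(right));
```

The stripped copies can then be handed to whatever diff is already in use, so only id and attB differences are reported.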

nested field query for mongodb (using ruby)

Sup, good folks of the internet.
Does anyone know how to make nested queries for mongodb? This is probably best explained by an example. To retrieve specific fields, I can use the :fields option to retrieve that field (e.g. suppose it is called "useful_field"):
collection.find({},{:fields => {"useful_field" => 1}})
But suppose that useful_field itself contains an array of many further fields, i.e.
useful_field = [{"value_I_want"=>"useful","value_I_dont_want"=>"not_useful"}]
My aim is to select "value_I_want". Any thoughts?
Here is a specific entry that I am trying to deal with (a reply to a tweet):
{ "_id" : ObjectId("51b6f71b0364718d71e4bca5"),
"annotations" : { },
"resultType" : "Tweet",
"score" : 1,
"groupName" : "TweetsWithConversation",
"results" : [
{
"kind" : "Tweet",
"score" : 1,
"annotations" : { "ConversationRole" : "Ancestor" },
"value" : { "created_at" : "Fri Jun 07 19:47:51 +0000 2013",
"id" : NumberLong("343091955196104704"),
"id_str" : "343091955196104704",
"text" : "THIS_IS_WHAT_I_WANT",
etc. etc. (Apologies for the odd formatting)
I'm trying to use a method of the form that will let me do something like this:
db.collection.find({}, {:fields => { some_way_of_selecting(THIS_IS_WHAT_I_WANT) }})
(I'm querying as part of a ruby script)
Otherwise, I'll have to go back into the dark world of regex. No-one wants that.
Try the following
db.collection.find({},{"useful_field.value_I_want": 1})
Maybe try this:
db.collection.find({"resultType" : "Tweet"}, {"results" : {$elemMatch : {"value.text" : "THIS_IS_WHAT_I_WANT"}}})
What you are trying to do is called "projection" - it's specifying what fields you want returned in the second argument to find.
In your case you simply want:
db.collection.find({}, {"results.value.text":1} )
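A dot-notation projection keeps the nesting: each returned document still has a results array, with only the projected path left inside each element. The sketch below works on a made-up stand-in for one such returned document (no database connection involved), and `tweetTexts` is a hypothetical helper for pulling the strings out.

```javascript
// Made-up stand-in for one document returned by the projection above;
// only _id and the projected path survive.
const projected = {
  _id: "51b6f71b0364718d71e4bca5",
  results: [
    { value: { text: "THIS_IS_WHAT_I_WANT" } },
    { value: { text: "another reply" } },
  ],
};

// Hypothetical helper: collect every results[i].value.text from a document,
// skipping elements that lack the field.
function tweetTexts(doc) {
  return (doc.results || [])
    .filter((r) => r.value && typeof r.value.text === "string")
    .map((r) => r.value.text);
}

console.log(tweetTexts(projected));
```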

Mongo DB MapReduce: Emit key from array based on condition

I am new to MongoDB, so excuse me if this is rather trivial. I would really appreciate the help.
The idea is to generate a histogram over some specific values, in this case the MIME types of some files. For that I am using a map reduce job.
I have a MongoDB collection with documents in the following form:
{
"_id" : ObjectId("4fc5ed3e67960de6794dd21c"),
"name" : "some name",
"uid" : "some app specific uid",
"collection" : "some name",
"metadata" : [
{
"key" : "key1",
"value" : "Plain text",
"status" : "SINGLE_RESULT",
},
{
"key" : "key2",
"value" : "text/plain",
"status" : "SINGLE_RESULT",
},
{
"key" : "key3",
"value" : 3469,
"status" : "OK",
}
]
}
Please note that almost every document has more metadata key/value entries than shown here.
Map Reduce job
I tried doing the following:
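function map() {
  var mime = "";
  // Scan the metadata array for the entry that holds the mime type.
  this.metadata.forEach(function (m) {
    if (m.key === "key2") {
      mime = m.value;
    }
  });
  emit(mime, {count:1});
}
// reduce receives the emitted key and the array of values emitted for it.
function reduce(key, values) {
  var res = {count:0};
  values.forEach(function (v) {res.count += v.count;});
  return res;
}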
db.collection.mapReduce(map, reduce, {out: { inline : 1}})
This seems to work for a small number of documents (~15K), but the problem is that iterating through all metadata key values takes a lot of time during the mapping phase. When running this on more documents (~1 million), the operation takes forever.
So my question is:
Is there some way I can emit the MIME type (the value) directly instead of iterating through all keys and selecting it? Or is there a better way to write the map and reduce functions?
Something like emit (this.metadata.value {$where this.metadata.key:"key2"}) or similar...
Thanks for your help!
Two thoughts ...
First thought: How attached are you to this document schema? Could you instead have the metadata field value as an embedded document rather than an embedded array, like so:
{
"_id" : ObjectId("4fc5ed3e67960de6794dd21c"),
"name" : "some name",
"uid" : "some app specific uid",
"collection" : "some name",
"metadata" : {
"key1" : {
"value" : "Plain text",
"status" : "SINGLE_RESULT"
},
"key2": {
"value" : "text/plain",
"status" : "SINGLE_RESULT"
},
"key3" : {
"value" : 3469,
"status" : "OK"
}
}
}
Then your map step does away with the loop entirely:
function map() {
emit( this.metadata["key2"].value, { count : 1 } );
}
At that point, you might even be able to cast this as a "group" command rather than a "mapReduce".
Second thought: Absent a schema change like that, particularly if "key2" appears early in the metadata array, you could at least exit the loop eagerly once the key is found to save yourself some iterations, like so:
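function map() {
  var mime = "";
  // break is not allowed inside a forEach callback; some() stops
  // iterating as soon as the callback returns true.
  this.metadata.some(function (m) {
    if (m.key === "key2") {
      mime = m.value;
      return true;
    }
    return false;
  });
  emit(mime, {count:1});
}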
Not sure if either path is the key to victory, but hopefully helpful thoughts. Best of luck!
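To make the early-exit idea easy to try outside the database, here is a minimal in-memory sketch in plain JavaScript (ES2015 or later is assumed for `Array.prototype.find`). The sample documents and the `mimeOf` and `histogram` helpers are made up for illustration; this mimics the map and reduce steps and is not MongoDB's mapReduce itself.

```javascript
// Made-up sample documents in the shape from the question.
const docs = [
  { metadata: [{ key: "key1", value: "Plain text" }, { key: "key2", value: "text/plain" }] },
  { metadata: [{ key: "key2", value: "image/png" }, { key: "key3", value: 3469 }] },
  { metadata: [{ key: "key2", value: "text/plain" }] },
];

// Map step: find() stops at the first match, so later entries are not visited.
function mimeOf(doc) {
  const entry = doc.metadata.find((m) => m.key === "key2");
  return entry ? entry.value : "";
}

// Reduce step: sum the counts per emitted key.
function histogram(documents) {
  const counts = {};
  for (const doc of documents) {
    const mime = mimeOf(doc);
    counts[mime] = (counts[mime] || 0) + 1;
  }
  return counts;
}

const mimeCounts = histogram(docs);
console.log(mimeCounts);
```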
