How would you implement these queries efficiently in MongoDB? - ruby

Links have one or more tags, so at first it might seem natural to embed the tags:
link = { title: 'How would you implement these queries efficiently in MongoDB?',
         url: 'http://stackoverflow.com/questions/3720972',
         tags: ['ruby', 'mongodb', 'database-schema', 'database-design', 'nosql'] }
How would these queries be implemented efficiently?
Get links that contain one or more given tags (for searching links with given tags)
Get a list of all tags without repetition (for search box auto-completion)
Get the most popular tags (to display top 10 tags or a tag cloud)
The idea to represent the link as above is based on the MongoNY presentation, slide 38.

Get links that contain "value" tag:
db.col.find({tags: "value"});
Get links that contain "val1", "val2" tags:
db.col.find({tags: { $all : [ "val1", "val2" ] }});
Get list of all tags without repetition:
db.col.distinct("tags");
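To make the expected results concrete, here is a minimal in-memory JavaScript sketch of what the three queries return (the sample documents are invented for illustration; on a real collection, a multikey index on the tags field, e.g. db.col.createIndex({tags: 1}), is what makes the first two queries efficient):

```javascript
// Hypothetical in-memory sketch of what the three queries return.
// The sample documents are invented for illustration only.
const links = [
  { title: "link a", tags: ["ruby", "mongodb"] },
  { title: "link b", tags: ["ruby", "nosql"] },
];

// db.col.find({tags: "ruby"}) - matches docs whose tags array contains the value
const withRuby = links.filter((l) => l.tags.includes("ruby"));

// db.col.find({tags: {$all: ["ruby", "nosql"]}}) - must contain every listed tag
const withBoth = links.filter((l) =>
  ["ruby", "nosql"].every((t) => l.tags.includes(t))
);

// db.col.distinct("tags") - flatten and deduplicate
const distinctTags = [...new Set(links.flatMap((l) => l.tags))];

console.log(withRuby.length, withBoth.length, distinctTags);
// → 2 1 [ 'ruby', 'mongodb', 'nosql' ]
```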
Get the most popular tags - this isn't something that can be queried directly on an existing collection. What you need to do is add a popularity field, update it whenever a query fetches the document, and then run a query sorted on that field.
Update: proposed solution for popularity feature.
Try adding the following collection; let's call it tags.
doc = { tag: String, pop: Integer }
Now, once you run a query, you collect all the tags that were shown (this can be aggregated and done asynchronously). Let's say you end up with the following tags: "tag1", "tag2", "tag3".
You then call the update method and increment the pop field value (note the multi: true option, so that every matching tag document is incremented, not just the first):
db.tags.update({tag: { $in: ["tag1", "tag2", "tag3"] }}, { $inc: { pop: 1 }}, { multi: true });
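As a sanity check, here is a small in-memory sketch of the proposed flow: increment pop for the shown tags, then sort to get the most popular ones (tag names and counts are invented for illustration; on a real collection the last step would be db.tags.find().sort({pop: -1}).limit(10)):

```javascript
// In-memory sketch of the popularity flow; tag names and counts are invented.
let tagDocs = [
  { tag: "tag1", pop: 5 },
  { tag: "tag2", pop: 3 },
  { tag: "tag3", pop: 0 },
  { tag: "tag4", pop: 9 },
];

// Equivalent of update({tag: {$in: shown}}, {$inc: {pop: 1}}, {multi: true})
function incrementShown(docs, shown) {
  for (const doc of docs) {
    if (shown.includes(doc.tag)) doc.pop += 1;
  }
}

incrementShown(tagDocs, ["tag1", "tag2", "tag3"]);

// Equivalent of db.tags.find().sort({pop: -1}).limit(2)
const top = [...tagDocs].sort((a, b) => b.pop - a.pop).slice(0, 2);
console.log(top.map((t) => t.tag)); // → [ 'tag4', 'tag1' ]
```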

You can also use $addToSet instead of $push to change your tag array. $addToSet doesn't modify the document when the tag already exists.
This will be a bit more efficient if you modify your tags frequently (as the documents won't grow that much).
Here is an example:
> db.tst_tags.remove()
> db.tst_tags.update({'name':'test'},{'$addToSet':{'tags':'tag1'}}, true)
> db.tst_tags.update({'name':'test'},{'$addToSet':{'tags':'tag1'}}, true)
> db.tst_tags.update({'name':'test'},{'$addToSet':{'tags':'tag2'}}, true)
> db.tst_tags.update({'name':'test'},{'$addToSet':{'tags':'tag2'}}, true)
> db.tst_tags.update({'name':'test'},{'$addToSet':{'tags':'tag3'}}, true)
> db.tst_tags.find()
{ "_id" : ObjectId("4ce244548736000000003c6f"), "name" : "test",
"tags" : [ "tag1", "tag2", "tag3" ] }

Related

Filtering JSON based on sub array in a Power Automate Flow

I have some json data that I would like to filter in a Power Automate Flow.
A simplified version of the json is as follows:
[
{
"ItemId": "1",
"Blah": "test1",
"CustomFieldArray": [
{
"Name": "Code",
"Value": "A"
},
{
"Name": "Category",
"Value": "Test"
}
]
},
{
"ItemId": "2",
"Blah": "test2",
"CustomFieldArray": [
{
"Name": "Code",
"Value": "B"
},
{
"Name": "Category",
"Value": "Test"
}
]
}
]
For example, I wish to filter items based on Name = "Code" and Value = "A". I should be left with the item with ItemId 1 in that case.
I can't figure out how to do this in Power Automate. It would be nice to change the data structure, but this is the way the data is, and I'm trying to work out if this is possible in Power Automate without changing the data itself.
Firstly, I had to fix your JSON; it wasn't complete.
Secondly, filtering on sub-array information isn't what I'd call easy. However, to get around the limitations, you can perform a bit of trickery.
Prior to the Filter array step, I create a variable of type Array and call it Array.
In the Filter array step, the left-hand expression is ...
string(item()?['CustomFieldArray'])
... and the contains comparison on the right-hand side is simply a string with the appropriate filter value ...
{"Name":"Code","Value":"A"}
... it's not an expression or a proper object, just a string.
If you need it to cater for differences in case, just set everything to lower case using the toLower expression on the left.
That will produce your desired result: the output array is reduced to just the matching item.
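The trick is easier to see outside of Power Automate. This hypothetical JavaScript sketch reproduces the same logic - serialize each item's sub-array and do a plain substring match, just like the contains comparison on string(item()?['CustomFieldArray']) (sample data copied from the question; note that the exact serialization format of the filter string has to match):

```javascript
// Hypothetical JavaScript equivalent of the Filter array trick: stringify the
// sub-array and do a plain substring match, just like the "contains" comparison.
const items = [
  { ItemId: "1", Blah: "test1", CustomFieldArray: [
      { Name: "Code", Value: "A" }, { Name: "Category", Value: "Test" } ] },
  { ItemId: "2", Blah: "test2", CustomFieldArray: [
      { Name: "Code", Value: "B" }, { Name: "Category", Value: "Test" } ] },
];

// The filter value is just a string, not a parsed object.
const needle = '{"Name":"Code","Value":"A"}';

const filtered = items.filter((item) =>
  JSON.stringify(item.CustomFieldArray).includes(needle)
);

console.log(filtered.map((i) => i.ItemId)); // → [ '1' ]
```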

How to create a multi-value tag metric gauge?

I already read this, but with no luck.
All examples I've found just show how to create a single value tag like this:
{
"name" : "jvm.gc.memory.allocated",
"measurements" : [ {
"statistic" : "COUNT",
"value" : 1.98180864E8
} ],
"availableTags" : [ {
"tag" : "stack",
"values" : [ "prod" ]
}, {
"tag" : "region",
"values" : [ "us-east-1" ]
} ]
}
But I need to create a multi value tag like this:
availableTags: [
{
tag: "method",
values: [
"POST",
"GET"
]
},
My code so far:
List<Tag> tags = new ArrayList<Tag>();
tags.add( Tag.of("test", "John") );
tags.add( Tag.of("test", "Doo") );
tags.add( Tag.of("test", "Foo Bar") );
Metrics.gauge("my.metric", tags, new AtomicLong(3) );
As you can see, I thought I could just repeat the key, but this is not the case; also, the second parameter of Tag.of is a String and not a String array.
I don't think this was the real intent of the authors of these metering libraries - providing a multi-value tag for a metric.
The whole point of metric tags is to provide a "discriminator" - something that can be used later to retrieve metrics whose tag has a specific, single value.
Usually this value is used by metrics storage systems like Prometheus, DataDog or InfluxDB, and on top of these, Grafana can incorporate a single tag value in its queries.
The only possible use case for such a request that I can see is making the metric value visible in the actuator in a somewhat more convenient way, but again, that's not the main point of the whole capability here. So, bottom line, I doubt it's possible at all.

Filtering Field with multiple values

How would I approach the following problem:
I want to filter on a field which contains multiple values (e.g. ["value1", "value2", "value3"]).
The filter would also contain multiple values (e.g. ["value1", "value2"]).
I want to get back only the items whose field value matches the filter, e.g. the field is ["value1", "value2"] and the filter is also ["value1", "value2"].
Any help would be greatly appreciated
I think the somewhat-recently added (v6.1) terms_set query (which Val references on the question he linked in his comment) is what you want.
terms_set, unlike a regular terms, has a parameter to specify a minimum number of matches that must exist between the search terms and the terms contained in the field.
Given:
PUT my_index/_doc/1
{
"values": ["living", "in a van", "down by the river"]
}
PUT my_index/_doc/2
{
"values": ["living", "in a house", "down by the river"]
}
A terms query for ["living", "in a van", "down by the river"] will return you both docs: no good. A terms_set configured to require all three matching terms (the script params.num_terms evaluates to 3) can give you just the matching one:
GET my_index/_search
{
"query": {
"terms_set": {
"values": {
"terms": ["living", "in a van", "down by the river"],
"minimum_should_match_script": {
"source": "params.num_terms"
}
}
}
}
}
NOTE: While I used minimum_should_match_script in the above example, it isn't a very efficient pattern. The alternative minimum_should_match_field is the better approach, but using it in the example would have meant a couple more PUTs to add the necessary field to the documents, so I went with brevity.
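For reference, the minimum_should_match_field variant could look roughly like this (an untested sketch; the required_matches field name is an assumption - each document stores how many of the search terms must match it):

```
PUT my_index/_doc/3
{
  "values": ["living", "in a van", "down by the river"],
  "required_matches": 3
}

GET my_index/_search
{
  "query": {
    "terms_set": {
      "values": {
        "terms": ["living", "in a van", "down by the river"],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}
```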

Index main-object, sub-objects, and do a search on sub-objects (that returns sub-objects)

I have an object like this (simplified here). Each strain has many chromosomes, which have many locus, which have many features, which have many products, ... Here I just put one of each.
The structure in json is:
{
"name": "my strain",
"public": false,
"authorized_users": [1, 23, 51],
"chromosomes": [
{
"name": "C1",
"locus": [
{
"name": "locus1",
"features": [
{
"name": "feature1",
"products": [
{
"name": "product1"
//...
}
]
}
]
}
]
}
]
}
I want to add this object to Elasticsearch. For the moment I've added the objects separately: locus, features and products. That works for search (I want to type a keyword and look at the names of locus, features and products), but I need to duplicate data like public and authorized_users in each subobject.
Can I index the whole object in Elasticsearch and just search at each level - locus, features and products - and get them back individually (not return the whole Strain object)?
Yes you can search at any level (ie, with a query like "chromosomes.locus.name").
But as you have arrays at each level, you will have to use nested objects (and nested query) to get exactly what you want, which is a bit more complex:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.3/query-dsl-nested-query.html
For your last question: no, you cannot get subobjects individually; Elasticsearch returns the whole JSON source object.
If you want only data from subobjects, you will have to use nested aggregations.
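A rough, untested sketch of what the nested mapping and query could look like for this structure (the index name is an assumption, and only the chromosomes/locus levels are shown; features and products would be nested the same way):

```
PUT strains
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "chromosomes": {
        "type": "nested",
        "properties": {
          "name": { "type": "text" },
          "locus": {
            "type": "nested",
            "properties": { "name": { "type": "text" } }
          }
        }
      }
    }
  }
}

GET strains/_search
{
  "query": {
    "nested": {
      "path": "chromosomes.locus",
      "query": { "match": { "chromosomes.locus.name": "locus1" } }
    }
  }
}
```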

Elasticsearch Autocomplete on Specific field in Specific Document

I have Documents that contain many fields which are lists of values.
I would like to be able to autocomplete from one specific such field at a time, in one specific document, without data duplication (like Completion Suggesters).
For example, I would like to be able to autocomplete after 3 characters from the values in the category field of the document with id: '7'.
I tried to implement something based on this but this doesn't seem to work on a list of values.
For filtering the suggestions by a field, you can add the fields to filter on in the context mapping.
"category":{
type: "completion",
payloads: false,
context: {
id: {
type: "category",
path: "id"
}
}
}
You can index the document as :
POST /myindex/myitem/1
{
id: 123,
category: {
input: "my category",
context: {
id: 123
}
}
}
The minimum length check has to be applied on the client side. ES suggesters do not provide anything like that.
Now, you can suggest on category field with a filter on id field.
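Under the older context suggester API that the mapping above uses (it predates the 5.x context suggester), the lookup itself would be along these lines (the index and suggester names are illustrative):

```
POST /myindex/_suggest
{
  "category_suggest": {
    "text": "my c",
    "completion": {
      "field": "category",
      "context": { "id": 123 }
    }
  }
}
```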
