Iterating over NEST Buckets - elasticsearch

I am trying NEST out and it seems very nice, but I am having some trouble understanding some things.
The response is deserialized into a hierarchy of objects. I would like to iterate over it and create my own structure.
I am able to do something like this (thanks to @Martijn Laarman, who helped me on the GitHub page):
var buckets = result.Aggs.Terms("level_1");
var term = buckets.Items[0].Terms("level_2");
It works, but I would like to have a generic algorithm that parses the response. To do that, I would like to get content independently of the query (if it used terms, range, etc). So I would like to do things like:
var buckets = result.Aggregations["level_1"];
var term = buckets.Items[0].Aggregations["level_2"];
Unfortunately the Aggregations collection returns Nest.Bucket and I can't do anything from there.
Is there any way that I can iterate over the result independently of how the query was formed?
Thanks!

For the sake of completeness: I was not able to find any way to do so.
I created a parser using a mix of JObjects and Dictionaries that operates directly on the JSON response to generate the output I wanted.
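The approach of walking the raw JSON response can be sketched as follows. This is a minimal illustration in Python rather than C#, and it assumes the usual Elasticsearch response shape, where every bucket-producing aggregation exposes a "buckets" field and sub-aggregations appear as extra keys on each bucket. The function name `walk_aggregations` and the sample response are hypothetical.

```python
def walk_aggregations(aggs, path=()):
    """Yield (path, doc_count) for every bucket, whatever aggregation produced it."""
    for name, agg in aggs.items():
        # Skip scalar fields such as "key", "doc_count", "from" and "to".
        if not isinstance(agg, dict) or "buckets" not in agg:
            continue
        buckets = agg["buckets"]
        if isinstance(buckets, dict):
            # Keyed buckets arrive as a mapping of key -> bucket.
            items = list(buckets.items())
        else:
            # Unkeyed buckets arrive as a list, each carrying its own "key".
            items = [(bucket.get("key"), bucket) for bucket in buckets]
        for key, bucket in items:
            bucket_path = path + ((name, key),)
            yield bucket_path, bucket.get("doc_count")
            # Any dict-valued field with "buckets" inside is a sub-aggregation.
            yield from walk_aggregations(bucket, bucket_path)


# Hypothetical two-level response, mirroring the level_1/level_2 names above.
response = {
    "aggregations": {
        "level_1": {
            "buckets": [
                {
                    "key": "a",
                    "doc_count": 3,
                    "level_2": {
                        "buckets": [
                            {"key": "x", "doc_count": 2},
                            {"key": "y", "doc_count": 1},
                        ]
                    },
                },
                {"key": "b", "doc_count": 1, "level_2": {"buckets": []}},
            ]
        }
    }
}

rows = list(walk_aggregations(response["aggregations"]))
```

Each row pairs the chain of (aggregation name, bucket key) steps with the bucket's document count, so the caller never needs to know whether a level came from a terms or a range aggregation.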

Leveraging spring to reduce DB calls

I have a data piece that is:
foo{
string: one
string: two
list<string>: listOne
list<string>: listTwo
}
such that in the DB, one is associated with multiple entries of listOne.
There's not much background; I'm at a loss as to where to even look for answers. During a code review I received feedback to try to eliminate a jdbcTemplate.query call, with the comment "there may be a way to reduce this using @Autowired".
I have no code to share; I just need a place to start looking for answers. I've been on the Spring website and I don't see anything that looks like I can use it, and I didn't see any Google results that resemble what I'm looking for.
I should probably preface this by saying that I'm a new dev, so even a simple answer is likely something I haven't tried. This came about because my queries for listOne and listTwo each return a column. I first tried using a mapper with jdbcTemplate.query() that returned a string, but JDBC didn't like that, so I ended up returning a list from the mapper. jdbcTemplate then wraps those results into a list of lists, and afterwards I loop through that list of lists to convert it to a flat list and store it in foo. In my mind, an ideal solution lets me combine the two queries, with a mapper that looks like this (pseudocode):
public foo fooMapper implements<RowMapper>(){
foo.one = resultSet.get("thingOne")
foo.two = resultSet.get("thingTwo")
foo.listOne = resultSet.get("[a portion of the column]listThingOne")
foo.listTwo = resultSet.get("[a portion of the column]listThingTwo")
return foo;
}
It should be noted that the result set is mono-directional (forward-only), which I found out when I tried using a string[] instead of a list.
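One way to picture the "combine the queries" idea is a single joined query whose rows are folded into one object per parent row. The sketch below is a language-neutral illustration written in Python with an in-memory SQLite database, not Spring code; the table names, column names and the `load_foos` helper are all hypothetical. In Spring, a ResultSetExtractor is the usual hook for this kind of multi-row folding, since a RowMapper sees only one row at a time.

```python
import sqlite3

# Hypothetical schema: one parent row in foo, several child rows in list_one.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foo (one TEXT PRIMARY KEY, two TEXT);
    CREATE TABLE list_one (one TEXT, item TEXT);
    INSERT INTO foo VALUES ('a', 'b');
    INSERT INTO list_one VALUES ('a', 'x');
    INSERT INTO list_one VALUES ('a', 'y');
    """
)


def load_foos(connection):
    """Run one joined query and fold its rows into one dict per foo."""
    rows = connection.execute(
        "SELECT f.one, f.two, l.item FROM foo f "
        "LEFT JOIN list_one l ON l.one = f.one "
        "ORDER BY f.one, l.rowid"
    )
    foos = {}
    for one, two, item in rows:
        # Create the parent the first time it is seen, then reuse it.
        foo = foos.setdefault(one, {"one": one, "two": two, "list_one": []})
        if item is not None:  # LEFT JOIN yields NULL when there are no children
            foo["list_one"].append(item)
    return list(foos.values())


foos = load_foos(conn)
```

Joining two independent child lists in the same query would multiply the rows, so a second list is probably better served by its own query or by de-duplicating while folding.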

Pig:FLATTEN keyword

I am a little confused with the use of FLATTEN keyword in PIG.
Consider the below dataset:
tuple_record: {details: (firstname: chararray,lastname: chararray,age: int,sex: chararray)}
Without using the FLATTEN I can access a field (suppose firstname) like this:
display_firstname = FOREACH tuple_record GENERATE details.firstname;
Now, using the FLATTEN keyword:
flatten_record = FOREACH tuple_record GENERATE FLATTEN(details);
DESCRIBE gives me this:
flatten_record: {details::firstname: chararray,details::lastname: chararray,details::age: int,details::sex: chararray}
And hence I can access the fields present directly without dereferencing like this:
display_record = FOREACH flatten_record GENERATE firstname;
My questions about the FLATTEN keyword are:
1) Which of the two ways (i.e. with or without FLATTEN) is the more optimized way of achieving the same output?
2) Are there any special scenarios where the desired output can't be achieved without the FLATTEN keyword?
I'm totally confused; please clarify its use and the scenarios in which I should use it.
Sometimes you have data in a bag or a tuple and you want to remove that level of nesting.
For example, when you want to rearrange your data on the fly and group by a particular field, you need a way to pull those entries out of the bag.
As per Pig documentation:
The FLATTEN operator looks like a UDF syntactically, but it is
actually an operator that changes the structure of tuples and bags in
a way that a UDF cannot. Flatten un-nests tuples as well as bags. The
idea is the same, but the operation and result is different for each
type of structure.
For more details, check the Pig documentation, which explains the usage of FLATTEN clearly with examples.
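The difference between flattening a tuple and flattening a bag can be sketched outside Pig. The Python below is only a model of the semantics described above, with Python tuples standing in for Pig tuples and a list of tuples standing in for a bag; the helper names `flatten_tuple` and `flatten_bag` and the sample rows are hypothetical.

```python
def flatten_tuple(record, index):
    """Replace the nested tuple at `index` with its fields: one row in, one row out."""
    return record[:index] + tuple(record[index]) + record[index + 1:]


def flatten_bag(record, index):
    """Emit one output row per tuple in the bag at `index`: one row in, many rows out."""
    return [
        record[:index] + tuple(item) + record[index + 1:]
        for item in record[index]
    ]


# A row shaped like tuple_record: a single field holding a nested tuple.
tuple_row = (("Ann", "Lee", 30, "F"),)
flat_row = flatten_tuple(tuple_row, 0)

# A row shaped like the output of GROUP: a key plus a bag of tuples.
grouped_row = ("F", [("Ann", "Lee"), ("Eve", "Kim")])
bag_rows = flatten_bag(grouped_row, 1)
```

For a tuple, the flattened fields could also be reached by dereferencing (details.firstname), so FLATTEN is mostly a convenience there. For a bag, it changes the number of rows, which dereferencing alone does not do.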

Sorting by counting the intersection of two lists in MongoDB

We have a post-analysis requirement: for a specific post, we need to return a list of the posts most closely related to it. The logic is to compare the number of tags the posts have in common. For example:
postA = {"author":"abc",
"title":"blah blah",
"tags":["japan","japanese style","england"],
}
There may be other posts with tags like:
postB:["japan", "england"]
postC:["japan"]
postD:["joke"]
So basically, postB gets a count of 2 and postC gets a count of 1 when compared with the tags in postA. postD gets 0 and will not be included in the result.
My understanding for now is that I should use map/reduce to produce the result. I understand the basic usage of map/reduce, but I can't figure out a solution for this specific purpose.
Any help? Or is there a better way, such as a custom sorting function, to work it out? I'm currently using pymongo, as I'm a Python developer.
You should create an index on tags:
db.posts.create_index([('tags', 1)])
and search for posts that share at least one tag with postA:
posts = list(db.posts.find({'_id': {'$ne': postA['_id']}, 'tags': {'$in': postA['tags']}}))
and finally, sort by intersection in Python:
key = lambda post: len([tag for tag in post['tags'] if tag in postA['tags']])
posts.sort(key=key, reverse=True)
Note that if postA shares at least one tag with a large number of other posts, this won't perform well, because you'll send a lot of data from Mongo to your application; unfortunately, a plain find() query cannot sort and limit by the size of the intersection. On MongoDB 2.6 and later, an aggregation pipeline using $setIntersection and $size can compute that count on the server instead.
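The client-side ranking step can be tried without a database. In this minimal sketch, the list `candidates` stands in for whatever the find() query would return; the sample documents and the helper name `rank_by_shared_tags` are hypothetical.

```python
post_a = {"_id": 1, "tags": ["japan", "japanese style", "england"]}

# Stand-in for the documents returned by the query.
candidates = [
    {"_id": 2, "tags": ["japan", "england"]},
    {"_id": 3, "tags": ["japan"]},
    {"_id": 4, "tags": ["joke"]},
]


def rank_by_shared_tags(reference, posts):
    """Return (shared_count, post) pairs, highest count first, dropping zero matches."""
    reference_tags = set(reference["tags"])
    scored = []
    for post in posts:
        shared = len(reference_tags & set(post["tags"]))
        if shared and post["_id"] != reference["_id"]:
            scored.append((shared, post))
    # Sort on the count only, so posts with equal counts keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


ranked = rank_by_shared_tags(post_a, candidates)
```

Converting the reference tags to a set once keeps each comparison proportional to the candidate's tag count rather than rescanning the reference list for every tag.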

Query multiple elements without specifying the element name

This may be a silly question, but is it possible to make a query using XPath without specifying the element name?
Normally I would write something like
//ElementName[@id = "some_id"]
But the thing is I have many (about 40) different element types with an id attribute and I want to be able to return any of them if the id fits. But I don't want to make this call for each type individually. Is it possible to search all of them at once, regardless of the name?
I am using this in an XQuery script, if that offers any help.
Use * instead of the element name:
//*[@id = "some_id"]
It might be more efficient to look directly at the @id attributes: //* will work, but it will initially return every element in the document and then filter!
That may not matter in a small document, of course, but here's an alternative:
//@id[.="some_id"]/..
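The wildcard form can be checked quickly with Python's standard library, whose ElementTree module supports a limited XPath subset that includes `*` and `[@attr='value']` predicates. The sample document below is hypothetical. The attribute-first alternative with a parent step is outside that subset, so it is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with several element types that all carry an id attribute.
doc = ET.fromstring(
    "<root>"
    '<book id="a1"/>'
    '<chapter><note id="some_id"/><image id="b2"/></chapter>'
    '<author id="some_id"/>'
    "</root>"
)

# ".//*" matches every descendant element regardless of its name;
# the predicate then keeps only those whose id attribute matches.
matches = doc.findall(".//*[@id='some_id']")
tags = [element.tag for element in matches]
```

The result contains both the note and the author element, even though they have different names.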

Using Linq to select a list of Entities, linked Entities, linked Entities

Apologies for the poor question title - I'm not sure how to describe what I'm doing but that is the best I could come up with, please edit it if what I'm asking for has a real name!
I have Programmes, which can have a group of Projects assigned, which in turn have groups of Outputs assigned.
I would like to get all the outputs for the Programme through its projects as one big list of outputs. I have this:
From pp In Me.ProgrammeProjects Select pp.Project.Outputs
This basically gets me a list of output lists (an IEnumerable Of EntitySet Of Output).
I'm brute-forcing my way through Linq and can't find any examples of this (or can't recognise one when I see it). How can I do this using just Linq, rather than for loops and Linq where I'd go through each EntitySet and add its contents to a bigger list?
Thanks
Or go against the linq context directly:
from o in context.Outputs
where o.Project.ProgrammeProjects.ID == 1
select o
The reverse will work too and query straight from the data context's table.
Are you trying to get a list of outputs for a specific programme?
If so, try something like this:
var result = (from pp in ProgrammeProjects
where pp.Name.Equals("ProjectA")
select pp.Project.Outputs).ToList();
or
once you get your list of outputs, you could use a lambda expression to get a subset.
var result = (from pp in ProgrammeProjects
select pp.Project.Outputs).ToList();
var subResult = result.FindAll(target => target.OutputParameter.Equals("findThisValue"));
Is this what you're trying to do?
If not, give a bit more detail of the data structure and what you're trying to retrieve and I'll do my best to help.
Patrick.
This is the way I've resorted to doing it, but I can tell it's going to be slow when the amount of data increases.
Dim allOutputs As New Generic.List(Of Output)
Dim outputLists = From pp In Me.ProgrammeProjects Select pp.Project.Outputs.ToList
For Each outputList In outputLists
Dim os = From o In outputList Where o.OutputTypeID = Type Select o
allOutputs.AddRange(os)
Next
Return allOutputs
I'm still a bit confused as to what kind of data you're trying to retrieve. Here is what I understand.
You have a list of Programmes
Each Programme can have many Projects
Each Project can have many outputs.
Goal: To find all outputs of a certain type. Is this right?
It doesn't look like you're retrieving any data related to the project or programme so
something like this should work:
Dim allOutputs As Generic.List(Of Output) = (From output In Me.Outputs Where output.OutputType.Equals(Type) Select output).ToList()
Let me know how it goes.
Patrick.
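The underlying operation in this thread is flattening a list of lists into one list and then filtering it. A minimal sketch of that idea in Python, using plain dictionaries as hypothetical stand-ins for the Programme, Project and Output entities (in LINQ, SelectMany is the operator that performs this kind of flattening):

```python
from itertools import chain

# Hypothetical stand-ins for ProgrammeProjects, each linking to a project with outputs.
programme_projects = [
    {"project": {"name": "A", "outputs": [{"id": 1, "type_id": 10},
                                          {"id": 2, "type_id": 20}]}},
    {"project": {"name": "B", "outputs": [{"id": 3, "type_id": 10}]}},
]


def outputs_of_type(links, type_id):
    """Flatten every project's outputs into one sequence, then filter by type."""
    all_outputs = chain.from_iterable(
        link["project"]["outputs"] for link in links
    )
    return [output for output in all_outputs if output["type_id"] == type_id]


matching = outputs_of_type(programme_projects, 10)
```

Because the flattening is lazy, no intermediate list of lists is built before the filter runs.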
