Morphia equivalent of this java-driver code - mongodb-java

I want to do this in Morphia. Can anyone help?
Bson f1 = Filters.gt("score", 80);
Bson f2 = Filters.lt("score", 100);
Bson f3 = Filters.and(f1, f2);
MongoCursor<Document> c = collection
    .find(Filters.elemMatch("grades", f3))
    .iterator();
Please note that the following does not work:
ds.find(Restaurant_M.class)
    .field("grades.score")
    .greaterThan(80)
    .lessThan(100)
    .asList();
My goal is to get all documents in which a score in the grades sub-documents lies between 80 and 100, i.e. a single score must satisfy both $gt: 80 and $lt: 100. The Morphia statement I posted checks the two conditions against the scores independently instead of requiring one score to satisfy both, so I also get documents where a score is, for example, 130 (because it is > 80).
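To make the difference concrete, here is a minimal plain-Java sketch. It does not touch MongoDB or Morphia at all; the class and method names are made up for illustration. It only models the two ways a pair of conditions can be evaluated against an array of scores.

```java
import java.util.List;

final class ElemMatchSketch {
    // Two separate conditions on an array field: each condition may be
    // satisfied by a different element.
    static boolean separateConditions(List<Integer> scores) {
        return scores.stream().anyMatch(s -> s > 80)
            && scores.stream().anyMatch(s -> s < 100);
    }

    // $elemMatch-style: one single element must satisfy both conditions.
    static boolean elemMatch(List<Integer> scores) {
        return scores.stream().anyMatch(s -> s > 80 && s < 100);
    }

    public static void main(String[] args) {
        List<Integer> scores = List.of(130, 20);
        System.out.println(separateConditions(scores)); // true: 130 > 80 and 20 < 100
        System.out.println(elemMatch(scores));          // false: no score is in (80, 100)
    }
}
```

A document with scores 130 and 20 passes the first check but not the second, which matches the unwanted results described above.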

You might want to use the and operator to handle this:
Query<Restaurant_M> query = getDs().find(Restaurant_M.class);
query.and(
    query.criteria("grades.score").greaterThan(80),
    query.criteria("grades.score").lessThan(100));
List<Restaurant_M> orders = query.asList();
This uses Morphia version 1.2.0.

Related

In Pig, how do I project a disambiguated field present in a bag?

I have something like this:
joined = JOIN A BY F1, B BY F1 ;
joinOutput = FOREACH joined GENERATE A::f3 AS f3, A::f4 AS f4, B::f5 AS f5 ;
grouped = GROUP joinOutput BY f3 ;
countOutput = FOREACH grouped GENERATE FLATTEN(joinOutput), COUNT(joinOutput.f5) AS COUNT;
If I run DESCRIBE countOutput, I get the following:
countOutput = { joinOutput::f3 :chararray, joinOutput::f4 :int, COUNT :int }
Now if I try to reference f3 through countOutput, i.e. countOutput.f3, I get an error saying "invalid field projection".
So my question is: how do I project field f3 from countOutput?
I haven't tried it yet, but I could think of the following:
countOutput.joinOutput::f3
I'm not sure whether this is the correct way, though.
Any help is appreciated.
OK, I found a solution after trying out a few things: you can specify the schema explicitly when you FLATTEN.
So this particular step can be rewritten as follows:
countOutput = FOREACH grouped GENERATE FLATTEN(joinOutput) AS (f3: chararray, f4: int), COUNT(joinOutput.f5) AS COUNT;
Now I can reference the flattened fields directly from the outer relation.
I hope this helps if someone runs into the same problem.

Is it possible to break up an element itself before another join?

I have two XML documents, simplified as:
<NumSetA>
<num Operation="+/-">1</num>
<num Operation="+">3</num>
<num Operation="+/*">4</num>
</NumSetA>
<NumSetB>
<num>2</num>
<num>9</num>
</NumSetB>
I want to join NumSetA with NumSetB using the possible operations listed in the Operation attribute, i.e.
1+2, 1-2, 1+9, 1-9, 3+2, 3+9, 4+2, 4+9, 4*2, 4*9
by using string.Split('/').
What I want to do is something like:
var CrossJoin = SetA.Elements("num").join(this.attribute("Operation").value.split('/'),
.join(SetB.Elements("num"))
Sorry for the invented syntax; I hope you understand what I mean.
How can I achieve that?
It's pretty easy to do with the query syntax:
var crossJoin =
    from numA in SetA.Elements("num")
    from op in numA.Attribute("Operation").Value.Split('/')
    from numB in SetB.Elements("num")
    select new {
        a = numA.Value,
        op,
        b = numB.Value
    };
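For readers outside .NET, the same nested-from idea can be sketched with Java streams. This is only an illustration under assumptions: the XML is taken to be already parsed into a map from number to Operation string plus a list of numbers, and CrossJoinSketch and crossJoin are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class CrossJoinSketch {
    // setA maps each number to its Operation attribute, e.g. "1" -> "+/-".
    static List<String> crossJoin(Map<String, String> setA, List<String> setB) {
        return setA.entrySet().stream()
            // one inner stream per operation obtained by splitting on '/'
            .flatMap(a -> Arrays.stream(a.getValue().split("/"))
                // pair every (number, operation) with every number in set B
                .flatMap(op -> setB.stream().map(b -> a.getKey() + op + b)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> setA = new LinkedHashMap<>(); // keeps document order
        setA.put("1", "+/-");
        setA.put("3", "+");
        setA.put("4", "+/*");
        System.out.println(crossJoin(setA, List.of("2", "9")));
    }
}
```

Each flatMap plays the role of one additional from clause in the query above.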

Labeled LDA learn in Stanford Topic Modeling Toolbox

It works when I run example-6-llda-learn.scala as follows:
val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);
val tokenizer = {
  SimpleEnglishTokenizer() ~>            // tokenize on space and punctuation
  CaseFolder() ~>                        // lowercase everything
  WordsAndNumbersOnlyFilter() ~>         // ignore non-words and non-numbers
  MinimumLengthFilter(3)                 // take terms with >=3 characters
}
val text = {
  source ~>                              // read from the source file
  Column(4) ~>                           // select column containing text
  TokenizeWith(tokenizer) ~>             // tokenize with tokenizer above
  TermCounter() ~>                       // collect counts (needed below)
  TermMinimumDocumentCountFilter(4) ~>   // filter terms in <4 docs
  TermDynamicStopListFilter(30) ~>       // filter out 30 most common terms
  DocumentMinimumLengthFilter(5)         // take only docs with >=5 terms
}
// define fields from the dataset we are going to slice against
val labels = {
  source ~>                              // read from the source file
  Column(2) ~>                           // take column two, the year
  TokenizeWith(WhitespaceTokenizer()) ~> // turns label field into an array
  TermCounter() ~>                       // collect label counts
  TermMinimumDocumentCountFilter(10)     // filter labels in < 10 docs
}
val dataset = LabeledLDADataset(text, labels);
// define the model parameters
val modelParams = LabeledLDAModelParams(dataset);
// Name of the output model folder to generate
val modelPath = file("llda-cvb0-"+dataset.signature+"-"+modelParams.signature);
// Trains the model, writing to the given output path
TrainCVB0LabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1000);
// or could use TrainGibbsLabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1500);
But it fails when I change the last line from:
TrainCVB0LabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1000);
to:
TrainGibbsLabeledLDA(modelParams, dataset, output = modelPath, maxIterations = 1500);
Also, the CVB0 method uses a lot of memory: training on a corpus of 10,000 documents with about 10 labels per document takes 30 GB of memory.
I've encountered the same situation, and I believe it is indeed a bug. Check GibbsLabeledLDA.scala in edu.stanford.nlp.tmt.model.llda under the src/main/scala folder, from line 204:
val z = doc.labels(zI);
val pZ = (doc.theta(z)+topicSmoothing(z)) *
(countTopicTerm(z)(term)+termSmooth) /
(countTopic(z)+termSmoothDenom);
doc.labels is self-explanatory, and doc.theta records the distribution (counts, actually) over the document's labels, so it has the same size as doc.labels.
zI is the index variable iterating over doc.labels, while z holds the actual label number. Here is the problem: a document may have only one label, say 1000. Then zI is 0 and z is 1000, so doc.theta(z) is out of range.
I suppose the fix would be to change doc.theta(z) to doc.theta(zI).
(I'm still checking whether the results are meaningful; in any case, this bug has made me less confident in the toolbox.)
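The indexing mistake can be reproduced in isolation. The sketch below is plain Java with hypothetical stand-ins (two arrays named labels and theta); it is not the toolbox's code or types. It only shows why indexing by the label value instead of the position goes out of range.

```java
final class LabelIndexSketch {
    // theta has one entry per label of the document, so it is indexed by position.
    static double thetaByPosition(int[] labels, double[] theta, int zI) {
        return theta[zI];
    }

    // Mirrors the suspected bug: the label value itself is used as the index.
    static double thetaByLabelValue(int[] labels, double[] theta, int zI) {
        int z = labels[zI];
        return theta[z];
    }

    public static void main(String[] args) {
        int[] labels = {1000};  // a document with a single label, number 1000
        double[] theta = {3.0}; // same length as labels
        System.out.println(thetaByPosition(labels, theta, 0));
        try {
            thetaByLabelValue(labels, theta, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of range: " + e.getMessage());
        }
    }
}
```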

Linq to Entities Percentages

Does anyone have any tips for calculating percentages in Linq to Entities?
I'm guessing that there must be a more efficient way than returning 2 results and calculating in memory. Perhaps an inventive use of let or into?
EDIT
Thanks Mark for your comment, here is a code snippet, but I think this will result in 2 database hits:
int passed = (from lpt in this.PushedLearnings.Select(pl => pl.LearningPlanTask)
              where lpt.OnlineCourseScores.Any(score =>
                  score.ActualScore >= ((lpt.LearningResource.PassMarkPercentage != (decimal?)null)
                      ? lpt.LearningResource.PassMarkPercentage
                      : 80))
              select lpt).Count();
int total = (from lpt in this.PushedLearnings.Select(pl => pl.LearningPlanTask)
             select lpt).Count();
// 100.0 forces floating-point division; passed * 100 / total would truncate
double percentage = passed * 100.0 / total;
If you use LINQ to Entities and write something along the lines of select x * 100.0 / y in your query then this expression will be converted to SQL and run in the database. It will be efficient.
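The 100.0 literal matters regardless of where the expression runs: with two integer operands the division truncates. Here is a small Java sketch of the arithmetic only (Java and C# behave the same way for int division; the class and method names are made up):

```java
final class PercentageSketch {
    // Integer arithmetic: the fractional part is discarded.
    static int truncated(int passed, int total) {
        return passed * 100 / total;
    }

    // Multiplying by 100.0 promotes the expression to double before dividing.
    static double exact(int passed, int total) {
        return passed * 100.0 / total;
    }

    public static void main(String[] args) {
        System.out.println(truncated(2, 3)); // 66
        System.out.println(exact(2, 3));     // roughly 66.67
    }
}
```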

OData "where ID in list" query

I have an OData service where I'm trying to filter by a list of IDs; the SQL equivalent would be something like:
SELECT * FROM MyTable WHERE TableId IN (100, 200, 300, 400)
The property I'm trying to filter on is typed as an Int32. I've tried the following, which gives me an error "Operator 'add' incompatible with operand types 'Edm.String' and 'Edm.Int32'":
string ids = ",100,200,300,400,";
from m in provider.Media where ids.Contains("," + m.media_id + ",")
as well as
string ids = ",100,200,300,400,";
from m in provider.Media where ids.Contains("," + m.media_id.ToString() + ",")
and
string ids = ",100,200,300,400,";
from m in provider.Media where ids.Contains("," + Convert.ToString(m.media_id) + ",")
and
string ids = ",100,200,300,400,";
from m in provider.Media where ids.Contains(string.Concat(",", m.media_id, ","))
As you can see, currently I'm using LINQ to query the service.
Is there a way I can do what I'm trying to, or am I stuck constructing a text filter and using AddQueryOption, and iterating through the list and manually adding "or media_id eq 100" clauses?
With OData 4.01, the in operator is supported like this:
http://host/service/Products?$filter=Name in ('Milk', 'Cheese')
See accepted answer, everything below is for OData v < 4.01
Try this:
var ids = new [] { 100, 200, 300 };
var res = from m in provider.Media
          from id in ids
          where m.media_id == id
          select m;
There is a comprehensive description on MSDN of querying Data Services.
Another approach would be:
var results = provider.Media
.AddQueryOption("$filter", "media_id eq 100");
Since OData (before 4.01) doesn't support IN, you end up with a filter condition like this:
.AddQueryOption("$filter", "(media_id eq 100) or (media_id eq 200 ) or ...");
which you can build with a loop, or with LINQ Select and string.Join:
var ids = new [] { 100, 200, 300 };
var filter = string.Join(" or ", ids.Select(i=> $"(media_id eq {i})"));
var results = provider.Media.AddQueryOption("$filter", filter);
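The string-building step is independent of the client library. As a rough Java sketch of the same idea (ODataFilterSketch and orFilter are hypothetical names; only the $filter text is produced, no request is sent):

```java
import java.util.List;
import java.util.stream.Collectors;

final class ODataFilterSketch {
    // Builds "(field eq id1) or (field eq id2) or ..." for the $filter option.
    static String orFilter(String field, List<Integer> ids) {
        return ids.stream()
            .map(id -> "(" + field + " eq " + id + ")")
            .collect(Collectors.joining(" or "));
    }

    public static void main(String[] args) {
        // prints: (media_id eq 100) or (media_id eq 200) or (media_id eq 300)
        System.out.println(orFilter("media_id", List.of(100, 200, 300)));
    }
}
```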
UPDATE: There is a filter operation field=["a","b"], but it means something different.
UPDATE2: OData V4 has the lambda operators any and all; paired with an array literal such as ["a", "b"] they might work like in, but I was not able to come up with a working example using the v4 endpoint at OData.org.
Expanding on vittore's answer (of which the second part is the correct answer), I've written something similar to the following for a demo project:
var filterParams = ids.Select(id => string.Format("(media_id eq {0})", id));
var filter = string.Join(" or ", filterParams);
var results = provider.Media.AddQueryOption("$filter", filter).Execute().ToList();
It's not elegant, and you wouldn't want to use this for a large list of ids (> ~60), but it'll do the trick.
Expanding on MCattle's suggestion: if we need more than 50 or 60 IDs, it is advisable to split the work into two or more parallel calls and add the results to a concurrent dictionary (or something similar) as they arrive from the server. This increases the number of calls to the server, but since we are gradually moving to cloud environments, it shouldn't be a big problem in my opinion.
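Splitting the ID list is the only library-independent part of that suggestion. A minimal Java sketch, assuming a batch size somewhere below the ~60 limit mentioned above (IdBatchSketch and batches are hypothetical names; issuing the calls in parallel and merging the results is left out):

```java
import java.util.ArrayList;
import java.util.List;

final class IdBatchSketch {
    // Splits ids into consecutive batches of at most `size` elements.
    static <T> List<List<T>> batches(List<T> ids, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            // copy the sub-list so each batch is independent of the input list
            out.add(new ArrayList<>(ids.subList(i, Math.min(i + size, ids.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: [[1, 2, 3], [4, 5, 6], [7]]
        System.out.println(batches(List.of(1, 2, 3, 4, 5, 6, 7), 3));
    }
}
```

Each batch can then be turned into its own "or" filter and sent as a separate request.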
