Django haystack narrow with OR operator between fields - filter

I do a search. I narrow by field A. I narrow by field B. I get results that include burlap AND sack. What I want is to get results that include burlap OR sack.
sqs = sqs.narrow(fieldA='burlap')
sqs = sqs.narrow(fieldB='sack')
You can do some level of OR narrowing with the following:
sqs = sqs.narrow(fieldA=('burlap' or 'tweed' or 'plastic'))
sqs = sqs.narrow(fieldB='sack')
But you still end up with results with burlap AND sack. An alternative to this method is the following, but it is not ideal since it seems to be slow on large data sets:
sqs = sqs.filter_or(fieldA='burlap')
sqs = sqs.filter_or(fieldB='sack')
Where is Daniel Lindsay when you need him?

YMMV -- the docs (http://django-haystack.readthedocs.org/en/latest/searchqueryset_api.html#narrow) point out that this method is not portable between backends and that the syntax depends on the backend. The example in that section even has a lucene looking "SearchQuerySet().narrow('title:smoothie')" example.
In the source it looks like haystack pretty trustingly passes whatever you have as your narrow argument to the back end. You didn't say what backend you are using, but maybe something like this would get you the fq you want in solr:
sqs = sqs.narrow('fieldA:burlap OR fieldB:sack')
Filter_or is a different animal than narrow, at least with solr. Filter_or will add that clause to the main query, resulting in a different set of results, different scoring, etc. Narrow will create a filter query. This is instead used to filter your original results (shocking, right?) and it can be cached, which can help performance if you're going to be using that filter a lot.
D'oh, I typed all that stuff and still don't know where Daniel Lindsay is.

Related

What is the convention around derivative information?

I am working on a service that provides information about a few related entities, somewhat like a database. Suppose that there's calls to retrieve information about a school:
service MySchool {
rpc GetClassRoom (ClassRoomRequest) returns (ClassRoom);
rpc GetStudent (StudentRequest) returns (Student);
}
Now, suppose that I want to find out a class room's information, I'd receive a proto that looks like so:
message ClassRoom {
string id = 1;
string address = 2;
string teacher = 3;
}
Sometimes I also want to know all of the students of the classroom. I am struggling to think which is the better design pattern.
Option A) Add an extra rpc like so: rpc GetClassRoomStudents (ClassRoomRequest) returns (ClassRoomStudents), where ClassRoomStudents has a single field repeated Student students. This technique requires more than one call to get all the information that we want (and many if we wanted to know information for more than one classroom).
Option B) Add an extra repeated Student students field to the ClassRoom proto, and B') Fill it up only when necessary, or B") Fill it up whenever the server receives a GetClassRoom call. This may sometimes fetch extra information, or lead to ambiguity according to what fields are filled up.
I am not sure what's the best / most conventional way of dealing with this. How have some of you dealt with this?
There is no simple answer. It's a tradeoff between simplicity (option A) and performance (option B), and it depends on the situation which solution is best.
In general, I'd recommend to go with the simple solution first, unless your measurements demonstrate that it leads to performance issues. At that point, it's easy to add repeated Student students to ClassRoom and a field bool fetch_students [default=false] to ClassRoomRequest. Then clients are free to continue using the simple API, or choose to upgrade to the more performant API if they need to.
Note that this isn't specific to gRPC; the same issue is seen in REST APIs, and basically almost any request/response model.

Elasticsearch: Accessing all terms in TermVectorResponse in plugin

I'm trying to get a list of Terms from a termvectorresponse in an elasticsearch plugin. I want to get access to all of the statistics which are tied to the terms and am having trouble figuring out how to do that.
After making a TermVectorsRequest...
TermVectorsRequest tvReq = new TermVectorsRequest(request.param("index"), request.param("type"), request.param("id"));
tvReq.termStatistics(true);
tvReq.selectedFields(request.param("field"));
and getting a response from the client...
TermVectorsResponse tvResponse = client.termVectors(tvReq).get();
I can get access to the id, index, etc. In the fields I get "contents" which is the field name that I want. From there though it looks like I can run...
tvResponse.getFields().terms("some term here")
in which the Terms object this returns has access to the stats I want.
I have a few issues with this though. One is that only "contents" seems to be non null. In the termvectors endpoint in elastic I get several different terms of which I've tried plugging into here. Two, is I want to get a list of terms rather than having to type in which term I want.
How can I go about doing this?
Thanks
Figured it out. Theres an interator on terms you can use. on .terms you have to pass it the field and you'll get the Terms object back. From that you can use the .iterator to get each individual term and do whatever you want with them.

lock individual documents in elastic search using nest

I am looking for a good example to lock a document in elastic search. The below link explain how to do this in elasticsearch.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/concurrency-solutions.html
How can i achieve this using NEST. can someone suggest an approach to start with.
To start with, read the NEST Quick Start article. In particular, note the following:
NEST is a high level elasticsearch client that still maps very closely to the original elasticsearch API. Requests and Responses have been mapped to CLR objects and NEST also comes with a powerful strongly typed query dsl.
Consider also, that you may not require the added level of abstraction that NEST provides, and may prefer to directly utilise Elasticsearch.NET. Read also, it's own Quick Start article.
With that in mind, implementing the steps in the Solving concurrency issues article using NEST / Elasticsearch.NET is simply a matter of identifying the .NET methods that correspond to the necessary Elasticsearch API methods. The first one for example:
PUT /fs/lock/global/_create
{}
In NEST would look something like:
var createGlobalLockResponse = esClient.Index<object>(new object(), f => f
.Index("fs")
.Type("lock")
.Id("global")
.OpType(global::Elasticsearch.Net.OpType.Create));
Regarding the check of whether "this create request fails with a conflict exception", check the properties on the createGlobalLockResponse object. In particular, Created and if you're intending to get a bit more specific in the handling, ServerError.

'Maximum number of expressions in a list is 1000' error with Grails and Oracle

I'm using Grails with an Oracle database. Most of the data in my application is part of a hierarchy that goes something like this (each item containing the following one):
Direction
Group
Building site
Contract
Inspection
Non-conformity
Data visible to a user is filtered according to his accesses which can be at the Direction, Group or Building Site level depending on user role.
We easily accomplished this by creating a listWithSecurity method for the BuildingSite domain class which we use instead of list across most of the system. We created another listWithSecurity method for Contract. It basically does a Contract.findAllByContractIn(BuildingSite.listWithSecurity). And so on with the other classes. This has the advantage of keeping all the actual access logic in BuildingSite.listWithsecurity.
The problem came when we started getting real data in the system. We quickly hit the "ora-01795 maximum number of expressions in a list is 1000" error. Fair enough, passing a list of over 1000 literals is not the most efficient thing to do so I tried other ways even though it meant I would have to deport the security logic to each controller.
The obvious way seemed to use a criteria such as this (I only put the Direction level access here for simplicity):
def c = NonConformity.createCriteria()
def listToReturn = c.list(max:params.max, offset: params.offset?.toInteger() ?: 0)
{
inspection {
contract {
buildingSite {
group {
'in'("direction",listOfOneOrTwoDirections)
}
}
}
}
}
I was expecting Grails to generate a single query with joins that would avoid the ora-01795 error but it seems to be calling a separate query for each level and passing the result back to Oracle as literal in an 'in' to query the other level. In other words, it does exactly what I was doing so I get the same error.
Actually, it might be optimising a bit. It seems to be solving the problem but only for one level. In the previous example, I wouldn't get an error for 1001 inspections but I would get it for 1001 contracts or building sites.
I also tried to do basically the same thing with findAll and a single HQL where statement to which I passed a single direction to get the nonConformities in one query. Same thing. It solves the first levels but I get the same error for other levels.
I did manage to patch it by splitting my 'in' criteria into many 'in' inside an 'or' so no single list of literals is more than 1000 long but that's profoundly ugly code. A single findAllBy[…]In becomes over 10 lines of code. And in the long run, it will probably cause performance problems since we're stuck doing queries with a very large amount of parameters.
Has anyone encountered and solved this problem in a more elegant and efficient way?
This won't win any efficiency awards but I thought I'd post it as an option if you just plainly need to query a list of more than 1000 items none of the more efficient options are available/appropriate. (This stackoverflow question is at the top of Google search results for "grails oracle 1000")
In a grails criteria you can make use of Groovy's collate() method to break up your list...
Instead of this:
def result = MyDomain.createCriteria().list {
'in'('id', idList)
}
...which throws this exception:
could not execute query
org.hibernate.exception.SQLGrammarException: could not execute query
at grails.orm.HibernateCriteriaBuilder.invokeMethod(HibernateCriteriaBuilder.java:1616)
at TempIntegrationSpec.oracle 1000 expression max in a list(TempIntegrationSpec.groovy:21)
Caused by: java.sql.SQLSyntaxErrorException: ORA-01795: maximum number of expressions in a list is 1000
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440)
You'll end up with something like this:
def result = MyDomain.createCriteria().list {
or { idList.collate(1000).each { 'in'('id', it) } }
}
It's unfortunate that Hibernate or Grails doesn't do this for you behind the scenes when you try to do an inList of > 1000 items and you're using an Oracle dialect.
I agree with the many discussions on this topic of refactoring your design to not end up with 1000+ item lists but regardless, the above code will do the job.
Along the same lines as Juergen's comment, I've approached a similar problem by creating a DB view that flattens out user/role access rules at their most granular level (Building Site in your case?) At a minimum, this view might contain just two columns: a Building Site ID and a user/group name. So, in the case where a user has Direction-level access, he/she would have many rows in the security view - one row for each child Building Site of the Direction(s) that the user is permitted to access.
Then, it would be a matter of creating a read-only GORM class that maps to your security view, joining this to your other domain classes, and filtering using the view's user/role field. With any luck, you'll be able to do this entirely in GORM (a few tips here: http://grails.1312388.n4.nabble.com/Grails-Domain-Class-and-Database-View-td3681188.html)
You might, however, need to have some fun with Hibernate: http://grails.org/doc/latest/guide/hibernate.html

Pagination with MongoDB

I have been using MongoDB and RoR to store logging data. I am pulling out the data and looking to page the results. Has anyone done paging with MongoDB or know of any resources online that might help get me started?
Cheers
Eef
Pagination in MongoDB can be accomplished by using a combination of limit() and skip().
For example, assume we have a collection called users in our active database.
>> db.users.find().limit(3)
This retrieves a list of the first three user documents for us. Note, this is essentially the same as writing:
>> db.users.find().skip(0).limit(3)
For the next three, we can do this:
>> db.users.find().skip(3).limit(3)
This skips over the first three user records, and gives us the next three. If there is only one more user in your database, don't worry; MongoDB is smart enough to only return data that is present, and won't crash.
This can be generalised like so, and would be roughly equivalent to what you would do in a web application. Assuming we have variables called PAGE_SIZE which is set to 3, and an arbitrary PAGE_NUMBER:
>> db.users.find().skip(PAGE_SIZE * (PAGE_NUMBER - 1)).limit(PAGE_SIZE)
I cannot speak directly as to how to employ this method in Ruby on Rails, but I suspect the Ruby MongoDB library exposes these methods.

Resources