Elasticsearch: Accessing all terms in TermVectorResponse in plugin

Elasticsearch: Accessing all terms in TermVectorResponse in plugin - elasticsearch

I'm trying to get a list of Terms from a termvectorresponse in an elasticsearch plugin. I want to get access to all of the statistics which are tied to the terms and am having trouble figuring out how to do that.
After making a TermVectorsRequest...
TermVectorsRequest tvReq = new TermVectorsRequest(request.param("index"), request.param("type"), request.param("id"));
tvReq.termStatistics(true);
tvReq.selectedFields(request.param("field"));
and getting a response from the client...
TermVectorsResponse tvResponse = client.termVectors(tvReq).get();
I can get access to the id, index, etc. In the fields I get "contents" which is the field name that I want. From there though it looks like I can run...
tvResponse.getFields().terms("some term here")
in which the Terms object this returns has access to the stats I want.
I have a few issues with this though. One is that only "contents" seems to be non null. In the termvectors endpoint in elastic I get several different terms of which I've tried plugging into here. Two, is I want to get a list of terms rather than having to type in which term I want.
How can I go about doing this?
Thanks

Figured it out. Theres an interator on terms you can use. on .terms you have to pass it the field and you'll get the Terms object back. From that you can use the .iterator to get each individual term and do whatever you want with them.

Related

Umbraco 8 - Get Children Of Node Using ContentAtXPath() Method

I've been refactoring an existing Umbraco project to use more performant querying when getting back document data as everything was previously being returned using LINQ. I have been using a combination of Umbraco's querying via XPaths and Examine.
I am currently stumped on trying to get child documents using the Umbraco.ContentAtXPath() method. What I would like to do is get child document based on a path I parse to the method. This is what I have currently:
IEnumerable<IPublishedContent> umbracoPages = Umbraco.ContentAtXPath("//* [#isDoc]/descendant::/About [#isDoc]");
Running this returns a "Object reference not set to an instance of an object." error and unable to see exactly where I'm going wrong (new to this form of querying in Umbraco).
Ideally, I'd like to enhance the querying to also carry out sorting using the non-LINQ approach, as demonstrated here.

Up until Umbraco 8, content was cached in an XML file, which made XPath perfect for querying content efficiently. In v8, however, the so called "NuCache" is not file based nor XML based, so the XPath query support is only there for ... well... Old times sake, I guess? Either way it's probably not going to be super efficient and (I'd advise) not something to "aim for". That said I of course don't know what you are changing from (Linq can be a lot of things) :-/

It certainly depends on how big your dataset is.
As Umbraco has moved away from the XML backed cache, you should look into Linq queries against your content models. Make sure you use ModelsBuilder to generate the models.
On a small dataset Linq will be much quicker than examine. On a large dataset Examine/Lucene will be much more steady on performance.

Querying NuCache is pretty fast in Umbraco 8, only beaten by an Examine search.
Assuming you're using Models Builder, and your About page is a child of Home page, you could use:
var homePage = (HomePage) Model.Root();
var aboutPage = homePage?.Children<AboutPage>().FirstOrDefault();
var umbracoPages = aboutPage.Children();
Where HomePage is your home page Document Type Alias and AboutPage is your About page Document Type alias.

Subquery, for lack of a better term, when using an API written in GraphQL

I'm relatively new to GraphQL so please bear with me ...
That said, I'm writing an app in node.js to push/pull data from two disparate systems, one of which has an API written in GraphQL.
For the Graph system, I have, something like, the following types defined for me:
Time {
TimeId: Int
TaskId: Int
ProjectId: Int
Project: [Project]
TimeInSeconds: Int
Timestamp: Date
}
and
Task {
TaskId: Int
TaskName: String
TaskDescription: String
}
Where Project is another type whose definition isn't important, only that it is included in the type definition as a field...
What I would like to know is if there is a way to write a query for Time in such a way that I can include the Task type's values in my results in a similar way as the values for the Project type are included in the definition?
I am using someone else's API and do not have the ability to define my own custom types. I can write my own limited queries, but I don't know if the limits are set by the devs that wrote the API or my limited ability with GraphQL.
My suspicion is that I cannot and that I will have to query both separately and combine them after the fact, but I wanted to check here just in case.

Unfortunately, unless the Time type exposes some kind of field to fetch the relevant Task, you won't be able to query for it within the same request. You can include multiple queries within a single GraphQL request; however, they are ran in parallel, which means you won't be able to use the TaskId value returned by one query as a variable used in another query.
This sort of problem is best solved by modifying the schema, but if that's not an option then unfortunately the only other option is to make each request sequentially and then combine the results client-side.

`r.row` and `r.contains ` in nested queries not working

Trying to add a new field inside of a document using a field from another table. I first tried to use r.row
r.db('client').table('basic_info').filter({name:'Andre Pastr'}).update({teacher_name: r.db('teacher').table('basic_info').contains(r.row('name').eq('Nicholas Derma')).pluck('id')});
and it returned the message:
e: Cannot use r.row in nested queries. Use functions instead in:
r.db("client").table("basic_info").filter({name: 'Andre Pastr'}).update({"teacher_name": r.db("teacher").table("basic_info").contains(r.row("name").eq("Nicholas Derma")).pluck("id")})
^^^^^
so then after finding this [vague entry][1] in the github issue log I just decided to just change the syntax of my query and run:
r.db('client').table('basic_info').filter({name:'Andre Albuquerque'}).update({teacher_name: r.db('teacher').table('basic_info').filter({name: 'Nicholas Chadwick'}).pluck('id')});
That returned the error message:
: Could not prove argument deterministic. Maybe you want to use the non_atomic flag?
Three inter-related questions: How might I use contains to run such a query? What am I doing wrong with this second query? Why doesn't r.row work in the first query?

You can't use r.row in the first query because it's ambiguous whether it should be bound to contains or to update. (We could define which one it's bound to, but it's very confusing for people because most people haven't memorized all the terms that could possibly bind r.row, so it's easier to force people to name their variables in ambiguous cases.)
You have to specify the non_atomic flag because the function you're passing to update can't be applied atomically (since it isn't deterministic). The way you wrote it, it will be executing a table fetch for every row in the original filter.
You probably want to write something like this instead:
r.table('basic_info').filter({name: 'Nicholas Derma'}).pluck('id').do(function(nderma) {
return r.table('basic_info').filter({name: 'Andre Pastr'}).update({teacher_name: nderma);
})
You should consider creating a secondary index on name and using getAll instead of filter if you run this query often.

Django haystack narrow with OR operator between fields

I do a search. I narrow by field A. I narrow by field B. I get results that include burlap AND sack. What I want is to get results that include burlap OR sack.
sqs = sqs.narrow(fieldA='burlap')
sqs = sqs.narrow(fieldB='sack')
You can do some level of OR narrowing with the following:
sqs = sqs.narrow(fieldA=('burlap' or 'tweed' or 'plastic'))
sqs = sqs.narrow(fieldB='sack')
But you still end up with results with burlap AND sack. An alternative to this method is the following, but it is not ideal since it seems to be slow on large data sets:
sqs = sqs.filter_or(fieldA='burlap')
sqs = sqs.filter_or(fieldB='sack')
Where is Daniel Lindsay when you need him?

YMMV -- the docs (http://django-haystack.readthedocs.org/en/latest/searchqueryset_api.html#narrow) point out that this method is not portable between backends and that the syntax depends on the backend. The example in that section even has a lucene looking "SearchQuerySet().narrow('title:smoothie')" example.
In the source it looks like haystack pretty trustingly passes whatever you have as your narrow argument to the back end. You didn't say what backend you are using, but maybe something like this would get you the fq you want in solr:
sqs = sqs.narrow('fieldA:burlap OR fieldB:sack')
Filter_or is a different animal than narrow, at least with solr. Filter_or will add that clause to the main query, resulting in a different set of results, different scoring, etc. Narrow will create a filter query. This is instead used to filter your original results (shocking, right?) and it can be cached, which can help performance if you're going to be using that filter a lot.
D'oh, I typed all that stuff and still don't know where Daniel Lindsay is.

'Maximum number of expressions in a list is 1000' error with Grails and Oracle

I'm using Grails with an Oracle database. Most of the data in my application is part of a hierarchy that goes something like this (each item containing the following one):
Direction
Group
Building site
Contract
Inspection
Non-conformity
Data visible to a user is filtered according to his accesses which can be at the Direction, Group or Building Site level depending on user role.
We easily accomplished this by creating a listWithSecurity method for the BuildingSite domain class which we use instead of list across most of the system. We created another listWithSecurity method for Contract. It basically does a Contract.findAllByContractIn(BuildingSite.listWithSecurity). And so on with the other classes. This has the advantage of keeping all the actual access logic in BuildingSite.listWithsecurity.
The problem came when we started getting real data in the system. We quickly hit the "ora-01795 maximum number of expressions in a list is 1000" error. Fair enough, passing a list of over 1000 literals is not the most efficient thing to do so I tried other ways even though it meant I would have to deport the security logic to each controller.
The obvious way seemed to use a criteria such as this (I only put the Direction level access here for simplicity):
def c = NonConformity.createCriteria()
def listToReturn = c.list(max:params.max, offset: params.offset?.toInteger() ?: 0)
{
inspection {
contract {
buildingSite {
group {
'in'("direction",listOfOneOrTwoDirections)
}
}
}
}
}
I was expecting Grails to generate a single query with joins that would avoid the ora-01795 error but it seems to be calling a separate query for each level and passing the result back to Oracle as literal in an 'in' to query the other level. In other words, it does exactly what I was doing so I get the same error.
Actually, it might be optimising a bit. It seems to be solving the problem but only for one level. In the previous example, I wouldn't get an error for 1001 inspections but I would get it for 1001 contracts or building sites.
I also tried to do basically the same thing with findAll and a single HQL where statement to which I passed a single direction to get the nonConformities in one query. Same thing. It solves the first levels but I get the same error for other levels.
I did manage to patch it by splitting my 'in' criteria into many 'in' inside an 'or' so no single list of literals is more than 1000 long but that's profoundly ugly code. A single findAllBy[…]In becomes over 10 lines of code. And in the long run, it will probably cause performance problems since we're stuck doing queries with a very large amount of parameters.
Has anyone encountered and solved this problem in a more elegant and efficient way?

This won't win any efficiency awards but I thought I'd post it as an option if you just plainly need to query a list of more than 1000 items none of the more efficient options are available/appropriate. (This stackoverflow question is at the top of Google search results for "grails oracle 1000")
In a grails criteria you can make use of Groovy's collate() method to break up your list...
Instead of this:
def result = MyDomain.createCriteria().list {
'in'('id', idList)
}
...which throws this exception:
could not execute query
org.hibernate.exception.SQLGrammarException: could not execute query
at grails.orm.HibernateCriteriaBuilder.invokeMethod(HibernateCriteriaBuilder.java:1616)
at TempIntegrationSpec.oracle 1000 expression max in a list(TempIntegrationSpec.groovy:21)
Caused by: java.sql.SQLSyntaxErrorException: ORA-01795: maximum number of expressions in a list is 1000
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440)
You'll end up with something like this:
def result = MyDomain.createCriteria().list {
or { idList.collate(1000).each { 'in'('id', it) } }
}
It's unfortunate that Hibernate or Grails doesn't do this for you behind the scenes when you try to do an inList of > 1000 items and you're using an Oracle dialect.
I agree with the many discussions on this topic of refactoring your design to not end up with 1000+ item lists but regardless, the above code will do the job.

Along the same lines as Juergen's comment, I've approached a similar problem by creating a DB view that flattens out user/role access rules at their most granular level (Building Site in your case?) At a minimum, this view might contain just two columns: a Building Site ID and a user/group name. So, in the case where a user has Direction-level access, he/she would have many rows in the security view - one row for each child Building Site of the Direction(s) that the user is permitted to access.
Then, it would be a matter of creating a read-only GORM class that maps to your security view, joining this to your other domain classes, and filtering using the view's user/role field. With any luck, you'll be able to do this entirely in GORM (a few tips here: http://grails.1312388.n4.nabble.com/Grails-Domain-Class-and-Database-View-td3681188.html)
You might, however, need to have some fun with Hibernate: http://grails.org/doc/latest/guide/hibernate.html

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Elasticsearch: Accessing all terms in TermVectorResponse in plugin - elasticsearch

Figured it out. Theres an interator on terms you can use. on .terms you have to pass it the field and you'll get the Terms object back. From that you can use the .iterator to get each individual term and do whatever you want with them.

Related

Umbraco 8 - Get Children Of Node Using ContentAtXPath() Method

Subquery, for lack of a better term, when using an API written in GraphQL

`r.row` and `r.contains ` in nested queries not working

Django haystack narrow with OR operator between fields

'Maximum number of expressions in a list is 1000' error with Grails and Oracle

Categories

Resources