"filtered query does not support sort" when using Hibernate Search - sorting

I'm trying to issue a query which includes sorting
from Hibernate Search 5.7.1.Final
to ElasticSearch 2.4.2.
When I'm using curl I get the results:
curl -XPOST 'localhost:9200/com.example.app.model.review/_search?pretty' -d '
{
"query": { "match" : { "authors.name" : "Puczel" } },
"sort": { "title": { "order": "asc" } }
}'
But when I issue the query from code:
protected static Session session;
public static void prepareSession()
{
    SessionFactory sessionFactory = new Configuration().configure()
            .buildSessionFactory();
    session = sessionFactory.openSession();
}
...
protected static void testJSONQueryWithSort()
{
    FullTextSession fullTextSession = Search.getFullTextSession(session);
    QueryDescriptor query = ElasticsearchQueries.fromJson(
            "{ 'query': { 'match' : { 'authors.name' : 'Puczel' } }, 'sort': { 'title': { 'order': 'asc' } } }");
    List<?> result = fullTextSession.createFullTextQuery(query, Review.class).list();
    System.out.println("\n\nSearch results for 'author.name:Puczel':");
    for (Object object : result)
    {
        Review review = (Review) object;
        System.out.println(review.toString());
    }
}
I get an Exception:
"[filtered] query does not support [sort]"
I understand where it comes from: the query
that Hibernate Search issues differs from my curl query,
because the restriction on the type is implemented differently:
{
  "query":
  {
    "filtered":
    {
      "query":
      {
        "match":{"authors.name":"Puczel"}
      },
      "sort":{"title":{"order":"asc"}},
      "filter":{"type":{"value":"com.example.app.model.Review"}}
    }
  }
}
But I don't know how to change it.
I tried using the sort example from Hibernate documentation:
https://docs.jboss.org/hibernate/search/5.7/reference/en-US/html_single/#__a_id_elasticsearch_query_sorting_a_sorting
But the example is incomplete. I don't know:
which imports to use (there are multiple matching),
what are the types of the undeclared variables, like s,
how to initialise the variable luceneQuery.
I will appreciate any remarks on this.

Yes, as mentioned in the javadoc of org.hibernate.search.elasticsearch.ElasticsearchQueries.fromJson(String):
Note that only the 'query' attribute is supported.
So you must use the Hibernate Search API to perform sorts.
which imports to use (there are multiple matching),
Sort is the one from Lucene (org.apache.lucene), List is from java.util, and all the other imports should be from Hibernate Search (org.hibernate.search).
what are the types of the undeclared variables, like s
s is a FullTextSession retrieved through org.hibernate.search.Search.getFullTextSession(Session). It will also work with a FullTextEntityManager retrieved through org.hibernate.search.jpa.Search.getFullTextEntityManager(EntityManager).
how to initialise the variable luceneQuery
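You'll have to use the query builder (qb), which you get from the search factory:
QueryBuilder qb = s.getSearchFactory().buildQueryBuilder().forEntity(Review.class).get();
Query luceneQuery = qb.keyword().onField("authors.name").matching("Puczel").createQuery();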
If you intend to use the Hibernate Search API, and you're not comfortable with it yet, I'd recommend reading the general documentation first (not just the Elasticsearch part, which only mentions Elasticsearch specifics): https://docs.jboss.org/hibernate/search/5.7/reference/en-US/html_single/#search-query

Related

What is the equivalent Query DSL object for the q parameter?

When I _search on elasticsearch, sometimes I just query with a string like q=NEEDLE and let everything happen automagically, but when I want more complex queries I use a
{
query:{ ... }
}
object.
I was wondering,
What would be the equivalent of sending the query string q=NEEDLE inside a Query DSL object?
It is equivalent to a query_string query. You can confirm this via the code.
For the case in the question, q=needle is a query_string query run against the default field.
As per the documentation, this defaults to the index.query.default_field index setting, which in turn defaults to "_all".
Example :
{
"query": {
"query_string": {
"query": "needle",
"analyze_wildcard": false,
"lenient" : false,
"lowercase_expanded_terms" : true
}
}
}
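The mapping from the q parameter to a request body can be sketched in a few lines of Java. This is only an illustration: the class name QueryStringBody and its methods are made up here, the body is built by plain string concatenation rather than with a client library, and every query_string option other than "query" is left at its default.

```java
// Sketch: build the Query DSL body that corresponds to ?q=<text>.
// Class and method names are illustrative, not part of any client library.
final class QueryStringBody {

    // Escape quotes, backslashes and common control characters for a JSON string.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap the raw q value in a query_string query, keeping all other options at their defaults.
    static String fromQ(String q) {
        return "{\"query\":{\"query_string\":{\"query\":\"" + jsonEscape(q) + "\"}}}";
    }

    public static void main(String[] args) {
        // prints {"query":{"query_string":{"query":"NEEDLE"}}}
        System.out.println(fromQ("NEEDLE"));
    }
}
```

The resulting string can be sent as the body of a _search request in place of the q URL parameter.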

Spring Elasticsearch Aggregation Filtering Not Working

I'm trying to query pricing stats on products I am recording in my Elasticsearch Database by product number. The pricing may be for new, used or refurbished products, so I wish to filter on condition. The condition filter works as a JSON query in Marvel returning stats based on two price documents with condition new.
When I try to do similar using the Java API, I am getting stats based on 4 documents that includes 2 new and 2 refurbished.
Could anyone please identify what I am doing wrong in the Java code below?
Thanks.
Here's the working JSON Query:
GET /stats/price/_search
{
"query": {
"match_phrase": {"mpc": "MGTX2LL/A"}
},
"size": 0,
"aggs" : {
"low_price_stats" : {
"filter": {
"term" : { "condition" : "new"}
},
"aggs" : {
"price_stats" : { "extended_stats" : { "field" : "price" } }
}
}
}
}
And the problematic Java:
public Aggregations aggByManufacturerPartNumber(String mpn) {
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withIndices("stats")
            .withTypes("price")
            .withQuery(termQuery("mpn", mpn))
            .withFilter(
                    FilterBuilders.termFilter("condition", "New")
            )
            .addAggregation(AggregationBuilders.extendedStats("stats_agg").field("price"))
            .build();
    Aggregations aggregations = elasticsearchTemplate.query(searchQuery, new ResultsExtractor<Aggregations>() {
        @Override
        public Aggregations extract(SearchResponse response) {
            return response.getAggregations();
        }
    });
    return aggregations;
}
In your Java code you're only building the price_stats sub-aggregation without its parent filter aggregation. The call to withFilter will create a filter at the query level, not at the aggregation level. The correct Java code that matches your JSON query would be like this:
// build top-level filter aggregation
FilterAggregationBuilder lowPriceStatsAgg = AggregationBuilders.filter("low_price_stats")
        .filter(FilterBuilders.termFilter("condition", "new"));
// build extended stats sub-aggregation
lowPriceStatsAgg.subAggregation(AggregationBuilders.extendedStats("stats_agg").field("price"));
// build query
SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withIndices("stats")
        .withTypes("price")
        .withQuery(termQuery("mpn", mpn))
        .addAggregation(lowPriceStatsAgg)
        .build();
// then get the results (response is the SearchResponse handed to your ResultsExtractor)
Aggregations aggs = response.getAggregations();
Filter lowPriceStats = aggs.get("low_price_stats");
ExtendedStats statsAgg = lowPriceStats.get("stats_agg");
Besides, also note that in your JSON query you have a match_phrase on the mpc field while in your Java code you have a term query on the mpn field. So you probably need to fix that, too, but the above code fixes the aggregation part only.
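Why the stats covered 4 documents instead of 2 can be mimicked without Elasticsearch. The following Java sketch is only a model, assuming (as the observed counts suggest) that a filter placed outside the aggregation does not narrow what the aggregation sees. The class FilterAggSketch, its methods and the prices are invented sample data, not values from the question.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;

// Model of "stats over everything the query matched" versus "stats inside a filter aggregation".
final class FilterAggSketch {

    static final class Price {
        final String condition;
        final double price;

        Price(String condition, double price) {
            this.condition = condition;
            this.price = price;
        }
    }

    // Invented sample: the four documents matched by the product-number query.
    static final List<Price> MATCHED = Arrays.asList(
            new Price("new", 999.0),
            new Price("new", 949.0),
            new Price("refurbished", 799.0),
            new Price("refurbished", 749.0));

    // Stats aggregation without a parent filter aggregation: sees every matched document.
    static DoubleSummaryStatistics statsOverQuery(List<Price> matched) {
        return matched.stream().mapToDouble(p -> p.price).summaryStatistics();
    }

    // Stats nested inside a filter aggregation: sees only documents with the given condition.
    static DoubleSummaryStatistics statsInFilterAgg(List<Price> matched, String condition) {
        return matched.stream()
                .filter(p -> p.condition.equals(condition))
                .mapToDouble(p -> p.price)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        System.out.println(statsOverQuery(MATCHED).getCount());          // prints 4
        System.out.println(statsInFilterAgg(MATCHED, "new").getCount()); // prints 2
    }
}
```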

Spring Data Elastic Search with special characters

As part of our project we are using Spring Data on top of Elastic Search.
We found a very interesting issue with findBy queries. If we pass a string that contains a space, the right element is not found unless we wrap the string in quotes. For example, for getByName(String name) we have to call getByName("\"John Do\"").
Is there any way to eliminate such redundant padding?
I'm trying my first steps with Spring (Boot Starter) Data ES and stumbled upon the same issue as you have, only in my case it was a : that 'messed things up'. I've learned that this is part of the reserved characters (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_reserved_characters). The quoting that you mention is exactly the solution I use for now. It results in a query like this:
{
"from": 0,
"query": {
"bool": {
"must": {
"query_string": {
"query": "\"John Do\"",
"fields": ["name"]
}
}
}
}
}
(You can use this in a rest console or in ElasticHQ to check the result.)
A colleague suggested that switching to a 'term' query:
{
"from": 0,
"size": 100,
"query": {
"term" : {
"name": "John Do"
}
}
}
might help to avoid the quoting. I have tried this out using the @Query annotation on the method findByName in the repository. It would go something like this:
@Query(value = "{\"term\" : {\"name\" : \"?0\"}}")
List<Person> findByName(String name);
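If you stay with the quoting workaround, it can at least be kept out of the calling code. The helper below is a minimal sketch, assuming the value ends up in a query_string query as shown above; the class name QueryStringQuoting and the method asPhrase are hypothetical and not part of Spring Data. It escapes backslashes and embedded double quotes before wrapping the value, on the assumption that the query parser honours backslash escapes inside a quoted phrase.

```java
// Sketch: wrap a raw value so that query_string treats it as a single phrase.
// Names are illustrative; this is not a Spring Data API.
final class QueryStringQuoting {

    // Escape backslashes first, then double quotes, then surround with quotes.
    static String asPhrase(String raw) {
        String escaped = raw.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // prints "John Do" (including the surrounding double quotes)
        System.out.println(asPhrase("John Do"));
    }
}
```

A repository call would then look like getByName(QueryStringQuoting.asPhrase(name)) instead of a hand-written "\"John Do\"" literal.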

Count query with PHP Elastica and Symfony2 FosElasticaBundle

I'm on a Symfony 2.5.6 project using FosElasticaBundle (@dev).
In my project, I just need to get the total hits count of a request on Elastic Search. That is, I'm querying Elastic Search with a normal request, but through the special "count" URL:
localhost:9200/_search?search_type=count
Note the "search_type=count" URL param.
Here's the example query:
{
"query": {
"filtered": {
"query": {
"match_all": []
},
"filter": {
"bool": {
"must": [
{
"terms": {
"media.category.id": [
681
]
}
}
]
}
}
}
},
"sort": {
"published_at": {
"order": "desc"
}
},
"size": 1
}
The result contains a normal JSON response but without any documents in the hits part. From this response I easily get the total count:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 81,
"max_score": 0,
"hits": [ ]
}
}
Okay, hits.total == 81.
Now, I couldn't find any way to do the same through FOSElasticaBundle, from a repository.
I tried this:
$query = (...) // building the Elastica query here
$count = $this->finder->findPaginated(
$query,
array(ES::OPTION_SEARCH_TYPE => ES::OPTION_SEARCH_TYPE_COUNT)
)->getNbResults();
But I get an error: Attempted to load class "Pagerfanta". I don't want Pagerfanta.
Then this:
$count = $this->finder->createPaginatorAdapter(
$query,
array(ES::OPTION_SEARCH_TYPE => ES::OPTION_SEARCH_TYPE_COUNT)
)->getTotalHits();
But it would always give me 0.
It would be easy if I had access to the Elastica Finder service from the repository (I could then get a ResultSet from the query search, and this ResultSet has a correct getTotalHits() method). But services from a repository... you know.
Thank you for any help or clue!
I faced the same challenge, getting access to the searchable interface from inside the repo. Here's what I ended up with:
Create AcmeBundle\Elastica\ExtendedTransformedFinder. This just extends the TransformedFinder class and makes the searchable interface accessible.
<?php
namespace AcmeBundle\Elastica;
use FOS\ElasticaBundle\Finder\TransformedFinder;
class ExtendedTransformedFinder extends TransformedFinder
{
    /**
     * @return \Elastica\SearchableInterface
     */
    public function getSearch()
    {
        return $this->searchable;
    }
}
Make the bundle use our new class; in service.yml:
parameters:
fos_elastica.finder.class: AcmeBundle\Elastica\ExtendedTransformedFinder
Then in a repo use the getSearch method of our class and do what you want :)
class SomeSearchRepository extends Repository
{
    public function search(/* ... */)
    {
        // create and set your query as you like
        $query = Query::create();
        // ...
        // then run a count query
        $count = $this->finder->getSearch()->count($query);
    }
}
Heads up: this works for me with version 3.1.x. It should work starting with 3.0.x.
Ok, so, here we go: it is not possible.
You cannot, as of version 3.1.x-dev (2d8903a), get the total matching document count returned by elastic search from FOSElasticaBundle, because this bundle does not expose this value.
The RawPaginatorAdapter::getTotalHits() method contains this code:
return $this->query->hasParam('size')
? min($this->totalHits, (integer) $this->query->getParam('size'))
: $this->totalHits;
which makes it impossible to get the correct $this->totalHits without actually requesting any documents. Indeed, if you set size to 0 to tell Elasticsearch not to return any documents, only meta information, RawPaginatorAdapter::getTotalHits() will return 0.
So FosElasticaBundle doesn't provide a way to get this total hits count; you can only do that through the Elastica library directly. Of course, with the downside that Elastica finders are natively available in \FOS\ElasticaBundle\Repository. You'd have to make a new service, do some injection, and invoke your service instead of the FOSElasticaBundle one for repositories... ouch.
I chose another path: I forked https://github.com/FriendsOfSymfony/FOSElasticaBundle and changed the method code as follows:
/**
 * Returns the number of results.
 *
 * @param boolean $genuineTotal make the function return the `hits.total`
 * value of the search result in all cases, instead of limiting it to the
 * `size` request parameter.
 * @return integer The number of results.
 */
public function getTotalHits($genuineTotal = false)
{
    if ( ! isset($this->totalHits)) {
        $this->totalHits = $this->searchable->search($this->query)->getTotalHits();
    }
    return $this->query->hasParam('size') && !$genuineTotal
        ? min($this->totalHits, (integer) $this->query->getParam('size'))
        : $this->totalHits;
}
The $genuineTotal boolean restores the Elasticsearch behaviour without introducing any BC break. I could also have named it $ignoreSize and used it the opposite way.
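The effect of the size cap and of the $genuineTotal switch can be checked in isolation. Below is a small Java port of just the return expression, as a sketch only: the class name TotalHits and the method name reported are made up for illustration, and a null size stands in for a query that has no size parameter.

```java
// Sketch: Java port of the return expression in the patched getTotalHits().
// Names are illustrative; size == null models a query without a 'size' parameter.
final class TotalHits {

    static int reported(int totalHits, Integer size, boolean genuineTotal) {
        if (size != null && !genuineTotal) {
            // original behaviour: never report more than the requested page size
            return Math.min(totalHits, size);
        }
        return totalHits;
    }

    public static void main(String[] args) {
        System.out.println(reported(81, 0, false));    // prints 0  (size=0 hides the real total)
        System.out.println(reported(81, 0, true));     // prints 81 (genuine total requested)
        System.out.println(reported(81, null, false)); // prints 81 (no size parameter set)
    }
}
```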
I opened a Pull Request: https://github.com/FriendsOfSymfony/FOSElasticaBundle/pull/748
We'll see! If that could help just one person i'd be happy already!
Alternatively, you can get the index instance as a service (fos_elastica.index.INDEX_NAME.TYPE_NAME) and call its count() method.
Joan

Filter facet returns count of all documents and not range

I'm using Elasticsearch and Nest to create a query for documents within a specific time range as well as doing some filter facets. The query looks like this:
{
"facets": {
"notfound": {
"query": {
"term": {
"statusCode": {
"value": 404
}
}
}
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"time": {
"from": "2014-04-05T05:25:37",
"to": "2014-04-07T05:25:37"
}
}
}
]
}
}
}
In the specific case, the total hits of the search is 21 documents, which fits the documents within that time range in Elasticsearch. But the "notfound" facet returns 38, which fits the total number of ErrorDocuments with a StatusCode value of 404.
As I understand the documentation, facets collect data from within the search. In this case, the "notfound" facet should never be able to return a count higher than 21.
What am I doing wrong here?
There's a distinct difference between filter/query/filtered_query/facet filter which is good to know.
Top level filter
{
filter: {}
}
This acts as a post-filter, meaning it filters the results after the query phase has ended. Since facets are part of the query phase, filters do not influence the documents that are faceted over. Filters do not alter the score and are therefore very cacheable.
Top level query
{
query: {}
}
Queries influence the score of a document and are therefore less cacheable than filters. Queries run in the query phase and thus also influence the documents that are faceted over.
Filtered query
{
  query: {
    filtered: {
      filter: {},
      query: {}
    }
  }
}
This allows you to run filters in the query phase, taking advantage of their better cacheability while having them influence the documents that are faceted over.
Facet filter
"facets" : {
"<FACET NAME>" : {
"<FACET TYPE>" : {
...
},
"facet_filter" : {
"term" : { "user" : "kimchy"}
}
}
}
This allows you to apply a filter to the documents that the facet is run over. Remember that it will be a combination of the query phase and the facet filter unless you also specify global: true on the facet.
Query Facet/Filter Facet
{
"facets" : {
"wow_facet" : {
"query" : {
"term" : { "tag" : "wow" }
}
}
}
}
This is the one that @thomasardal is using in this case, which is perfectly fine; it's a facet type which returns a single value: the query hit count.
The fact that your Query Facet returns 38 and not 21 is because you use a filter for your time range.
You can fix this by either doing the filter in a filtered_query in the query phase or applying a facet filter (not a filter_facet) to your query_facet. Because filters are cached better, though, you are better off using a facet filter inside your filter facet.
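The difference in scope can be modelled with plain Java collections. This is a toy model, not Elasticsearch code: the class FacetScope, its methods and the five sample documents are invented, and the time range is reduced to a boolean flag per document.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model: which documents a facet sees with a top-level filter versus a filtered query.
final class FacetScope {

    static final class Doc {
        final int statusCode;
        final boolean inTimeRange;

        Doc(int statusCode, boolean inTimeRange) {
            this.statusCode = statusCode;
            this.inTimeRange = inTimeRange;
        }
    }

    // Invented sample data: three 404s, only one of them inside the time range.
    static final List<Doc> DOCS = Arrays.asList(
            new Doc(404, true), new Doc(404, false), new Doc(404, false),
            new Doc(200, true), new Doc(500, false));

    // Hits: documents selected by the query phase, then narrowed by the top-level (post) filter.
    static long hitCount(List<Doc> docs, Predicate<Doc> queryPhase, Predicate<Doc> postFilter) {
        return docs.stream().filter(queryPhase).filter(postFilter).count();
    }

    // Query facet: counted over the query-phase documents only; the post filter plays no part.
    static long facetCount(List<Doc> docs, Predicate<Doc> queryPhase, Predicate<Doc> facetQuery) {
        return docs.stream().filter(queryPhase).filter(facetQuery).count();
    }

    public static void main(String[] args) {
        Predicate<Doc> matchAll = d -> true;
        Predicate<Doc> inRange = d -> d.inTimeRange;
        Predicate<Doc> notFound = d -> d.statusCode == 404;

        // Top-level filter: hits are restricted to the range, the facet is not.
        System.out.println(hitCount(DOCS, matchAll, inRange));    // prints 2
        System.out.println(facetCount(DOCS, matchAll, notFound)); // prints 3
        // Filtered query: the range is part of the query phase, so the facet sees it too.
        System.out.println(facetCount(DOCS, inRange, notFound));  // prints 1
    }
}
```

The second number being larger than the first is the same effect as the 38 versus 21 in the question.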
Confusingly Filter Facets are specified using .FacetFilter() on the search object. I will change this in 1.0 to avoid future confusion.
Sadly: .FacetFilter() and .FacetQuery() in NEST do not allow you to specify a facet filter like you can with other facets:
var results = typedClient.Search<object>(s => s
.FacetTerm(ft=>ft
.OnField("myfield")
.FacetFilter(f=>f.Term("filter_facet_on_this_field", "value"))
)
);
Your issue here is that you are performing a Filter Facet and not a normal facet on your query (which would follow the restrictions applied via the query filter). In the JSON, the issue is the "query" between the facet name "notfound" and the "term" entry. This tells Elasticsearch to run a separate query and facet on the results of that separate query, not on your main query with the date range filter. So your JSON should look like the following:
{
"facets": {
"notfound": {
"term": {
"statusCode": {
"value": 404
}
}
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"time": {
"from": "2014-04-05T05:25:37",
"to": "2014-04-07T05:25:37"
}
}
}
]
}
}
}
Since I see you have this tagged with NEST as well: in your NEST call you are probably using FacetFilter on your search request. Switch this to just Facet to get the desired result.
