Elasticsearch: why does the Java client use a different query syntax?

I am new to Elasticsearch. I have the following query:
{
"query": {
"filtered" : {
"query" : {
"term" : {
"title" : "crime"
}
},
"filter" : {
"term" : { "year" : 1961 }
}
}
}
}
It runs fine under Windows prompt as follows:
curl -XGET localhost:9200/book/_search?pretty -d @my-query.json
For the same query with Java client, I have the following:
SearchResponse sr = client.prepareSearch("book")
.setTypes("fiction")
.setQuery(query_string)
.setFrom(page)
.setSize(10).execute().actionGet();
However, I have to use the following query string in order to run it without an exception:
{
"filtered" : {
"query" : {
"term" : {
"title" : "crime"
}
},
"filter" : {
"term" : { "year" : 1961 }
}
}
}
Why is there such a difference? How can I retain the removed "query" property? Suppose that I have to use a query string in my Java client.
Thanks and regards!

Strictly speaking, the two variants you show are not the same: you don't specify the type, offset, or size parameters in your URI-based query (even though you could do so there too, according to the docs). You can omit these parameters in the Java query as well:
SearchResponse sr = client.prepareSearch("book")
.setQuery(query_string)
.execute().actionGet();
Regarding the argument for setQuery, it can either be the same raw JSON you used in the URI variant, wrapped so that the client treats it as a ready-made query body:
String theQuery = String.join(System.getProperty("line.separator"),
"{\"filtered\" : {\"query\" : {\"term\" : {\"title\" : \"crime\"}},",
"\"filter\" : {\"term\" : { \"year\" : 1961 }}}}");
SearchResponse sr = client.prepareSearch("book")
.setTypes("fiction")
.setFrom(page)
// wrapperQuery passes the JSON through as-is, instead of treating it
// as Lucene query-string syntax the way queryString(...) would
.setQuery(QueryBuilders.wrapperQuery(theQuery)).execute().actionGet();
Or you can build the equivalent of this query with the Java builder methods:
SearchResponse sr = client.prepareSearch("book")
.setTypes("fiction")
.setFrom(page)
.setQuery(filteredQuery(QueryBuilders.termQuery("title","crime"),
FilterBuilders.termFilter("year","1961")))
.execute().actionGet();
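Note that the filtered query shown here is specific to Elasticsearch 1.x; it was deprecated in 2.0 and removed in 5.0 in favour of a bool query with a filter clause. The same search in the newer syntax would look like this:

```json
{
  "query": {
    "bool": {
      "must":   { "term": { "title": "crime" } },
      "filter": { "term": { "year": 1961 } }
    }
  }
}
```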

Related

Spring Data Elasticsearch @Query

I'm struggling to build this query with the @Query annotation for spring-data:
{
"bool" : {
"must" : [
{
"query_string" : {
"query" : "123"
}
},
{
"bool" : {
"should" : [
{
"term" : {
"username" : {
"value" : "admin"
}
}
},
{
"terms" : {
"groups" : ["abc"]
}
}
]
}
}
]
}
}
I tried this:
@Query("{\"bool\": {\"must\": [{\"query_string\": {\"query\": \"?0\"}}, {\"bool\": { \"should\": [ {\"term\" : {\"username\" : \"?2\"}}, {\"terms\" : {\"groups\" : \"?1\"} } ] } } ] }}")
Page<Device> findByTermAndGroupOrUser(String term, List<String> groups, String username, Pageable pageable);
But it fails with:
ParsingException[[terms] query does not support [groups]]
Tried changing the query to:
@Query("{\"bool\": {\"must\": [{\"query_string\": {\"query\": \"?0\"}}, {\"bool\": { \"should\": [ {\"term\" : {\"username\" : \"?2\"}}, {\"terms\" : {\"groups\" : [\"?1\"]} } ] } } ] }}")
Page<Device> findByTermAndGroupOrUser(String term, List<String> groups, String username, Pageable pageable);
This works but the groups seem not to be evaluated. Documents containing the given group are not found.
The same query built with QueryBuilder works (but I'm missing the spring-data paging in this case):
BoolQueryBuilder qb = QueryBuilders.boolQuery();
QueryBuilder should1 = QueryBuilders.termQuery("username", username);
QueryBuilder should2 = QueryBuilders.termsQuery("groups", groups);
BoolQueryBuilder should = qb.should(should1).should(should2);
QueryStringQueryBuilder termQuery = QueryBuilders.queryStringQuery(term);
BoolQueryBuilder must = QueryBuilders.boolQuery().must(termQuery).must(should);
SearchResponse searchResponse = elasticsearchTemplate.getClient()
.prepareSearch()
.setQuery(must)
.setFrom(pageable.getPageNumber() * pageable.getPageSize())
.setSize(pageable.getPageSize())
.get();
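For reference, the builder code above should produce a request body roughly like the following (a sketch, ignoring default parameters), which is the shape the @Query string needs to match:

```json
{
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "123" } },
        {
          "bool": {
            "should": [
              { "term":  { "username": "admin" } },
              { "terms": { "groups": ["abc"] } }
            ]
          }
        }
      ]
    }
  }
}
```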
What am I doing wrong?

Unknown key for a START_OBJECT in [bool] in elastic search

Elasticsearch is giving me this error: Unknown key for a START_OBJECT in [bool].
My query (updated) is as below:
var searchParams = {
index: 'offers',
body:{
query:{
bool : {
must : {
query: {
multi_match: {
query: query,
fields:['title','subTitle','address','description','tags','shopName'],
fuzziness : 'AUTO'
}
}
},
filter : {
geo_distance : {
distance : radius,
location : {
lat : latitude,
lon : longitude
}
}
}
}}},
filter_path :'hits.hits._source',
pretty:'true'
};
Can anyone tell me how to combine the geo and fuzzy search queries in Elasticsearch?
The body should look like this: the bool query must sit under a top-level query section, and bool clauses take queries directly, so the extra query wrapper around multi_match inside must has to be removed:
body:{
query: { <--- add this
bool : {
must : {
multi_match: {
query: query,
fields:['title','subTitle','address','description','tags','shopName'],
fuzziness : 'AUTO'
}
},
filter : {
geo_distance : {
distance : radius,
location : {
lat : latitude,
lon : longitude
}
}
}
}}},

How to take the shortest distance per person (with multiple addresses) to an origin point and sort on that value

I have People documents in my elastic index and each person has multiple addresses, each address has a lat/long point associated.
I'd like to geo sort all the people by proximity to a specific origin location; however, having multiple locations per person complicates this. The objective is to take the shortest distance per person to the origin point and use that number as the sort value.
Example of my people index roughed out in 'pseudo-JSON' showing a couple of person documents each having multiple addresses:
person {
name: John Smith
addresses [
{ lat: 43.5234, lon: 32.5432, 1 Main St. }
{ lat: 44.983, lon: 37.3432, 2 Queen St. W. }
{ ... more addresses ... }
]
}
person {
name: Jane Doe
addresses [
... she has a bunch of addresses too ...
]
}
... many more people docs each having multiple addresses like above ...
Currently I'm using a script-based sort with an inline Groovy script: it calculates the distance in meters from the origin for each address, collects those distances into an array per person, and picks the minimum from the array as the sort value.
string groovyShortestDistanceMetersSortScript = string.Format("[doc['geo1'].distance({0}, {1}), doc['geo2'].distance({0}, {1})].min()",
origin.Latitude,
origin.Longitude);
var shortestMetersSort = new SortDescriptor<Person>()
.Script(sd => sd
.Type("number")
.Script(script => script
.Inline(groovyShortestDistanceMetersSortScript)
)
.Order(SortOrder.Ascending)
);
Although this works, I wonder if using a scripted field might be too expensive or too complex at query time, and whether there is a better way to achieve the desired sort order by indexing the data differently and/or by using aggregations, maybe doing away with the script field altogether.
Any thoughts and guidance are appreciated. I'm sure somebody else has run into this same requirement (or similar) and has found a different or better solution.
I'm using the Nest API in this code sample but will gladly accept answers in elasticsearch JSON format because I can port those into the NEST API code.
When sorting on distance from a specified origin where the field being sorted on contains a collection of values (in this case geo_point types), we can specify how a value should be collected from the collection using sort_mode. Here, a sort_mode of "min" uses the nearest location to the origin as the sort value. Here's an example:
public class Person
{
public string Name { get; set; }
public IList<Address> Addresses { get; set; }
}
public class Address
{
public string Name { get; set; }
public GeoLocation Location { get; set; }
}
void Main()
{
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var indexName = "people";
var connectionSettings = new ConnectionSettings(pool)
.InferMappingFor<Person>(m => m.IndexName(indexName));
var client = new ElasticClient(connectionSettings);
if (client.IndexExists(indexName).Exists)
client.DeleteIndex(indexName);
client.CreateIndex(indexName, c => c
.Settings(s => s
.NumberOfShards(1)
.NumberOfReplicas(0)
)
.Mappings(m => m
.Map<Person>(mm => mm
.AutoMap()
.Properties(p => p
.Nested<Address>(n => n
.Name(nn => nn.Addresses.First().Location)
.AutoMap()
)
)
)
)
);
var people = new[] {
new Person {
Name = "John Smith",
Addresses = new List<Address>
{
new Address
{
Name = "Buckingham Palace",
Location = new GeoLocation(51.501476, -0.140634)
},
new Address
{
Name = "Empire State Building",
Location = new GeoLocation(40.748817, -73.985428)
}
}
},
new Person {
Name = "Jane Doe",
Addresses = new List<Address>
{
new Address
{
Name = "Eiffel Tower",
Location = new GeoLocation(48.858257, 2.294511)
},
new Address
{
Name = "Uluru",
Location = new GeoLocation(-25.383333, 131.083333)
}
}
}
};
client.IndexMany(people);
// call refresh for testing (avoid in production)
client.Refresh("people");
var towerOfLondon = new GeoLocation(51.507313, -0.074308);
client.Search<Person>(s => s
.MatchAll()
.Sort(so => so
.GeoDistance(g => g
.Field(f => f.Addresses.First().Location)
.PinTo(towerOfLondon)
.Ascending()
.Unit(DistanceUnit.Meters)
// Take the minimum address location distance from
// our target location, The Tower of London
.Mode(SortMode.Min)
)
)
);
}
This creates the following search request:
{
"query": {
"match_all": {}
},
"sort": [
{
"_geo_distance": {
"addresses.location": [
{
"lat": 51.507313,
"lon": -0.074308
}
],
"order": "asc",
"mode": "min",
"unit": "m"
}
}
]
}
which returns
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : null,
"hits" : [ {
"_index" : "people",
"_type" : "person",
"_id" : "AVcxBKuPlWTRBymPa4yT",
"_score" : null,
"_source" : {
"name" : "John Smith",
"addresses" : [ {
"name" : "Buckingham Palace",
"location" : {
"lat" : 51.501476,
"lon" : -0.140634
}
}, {
"name" : "Empire State Building",
"location" : {
"lat" : 40.748817,
"lon" : -73.985428
}
} ]
},
"sort" : [ 4632.035195223564 ]
}, {
"_index" : "people",
"_type" : "person",
"_id" : "AVcxBKuPlWTRBymPa4yU",
"_score" : null,
"_source" : {
"name" : "Jane Doe",
"addresses" : [ {
"name" : "Eiffel Tower",
"location" : {
"lat" : 48.858257,
"lon" : 2.294511
}
}, {
"name" : "Uluru",
"location" : {
"lat" : -25.383333,
"lon" : 131.083333
}
} ]
},
"sort" : [ 339100.6843074794 ]
} ]
}
}
The value returned in the sort array for each hit is the minimum distance in the sort unit specified (in our case, metres) from the specified point (The Tower of London) and the addresses for each person.
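As a sanity check on that sort value: the roughly 4632 m reported for John Smith is simply the great-circle distance from the Tower of London to his nearest address, Buckingham Palace. A plain-Java sketch using the standard haversine formula (with a 6371 km mean earth radius, so it differs from Elasticsearch's figure by a few metres) reproduces it:

```java
public class HaversineSketch {
    // Great-circle distance in metres between two lat/lon points,
    // using the haversine formula and a 6371 km mean earth radius.
    static double distanceMeters(double lat1, double lon1,
                                 double lat2, double lon2) {
        double r = 6_371_000;
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2)
                   * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * r * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Tower of London -> Buckingham Palace (coordinates from the example)
        double d = distanceMeters(51.507313, -0.074308, 51.501476, -0.140634);
        System.out.printf("%.1f m%n", d); // close to the 4632.03 m sort value above
    }
}
```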
Per the guidelines in the Sorting by Distance documentation, it can often make more sense to score by distance, which can be achieved with a function_score query and a decay function.
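A sketch of that approach follows; the scale and decay values here are made-up tuning parameters, and with nested address objects the decay function may additionally need to be wrapped in a nested query:

```json
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "gauss": {
        "addresses.location": {
          "origin": { "lat": 51.507313, "lon": -0.074308 },
          "scale": "2km",
          "decay": 0.5
        }
      }
    }
  }
}
```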

Spring Data Elasticsearch: data is stored in the index even though "store" is set to false in the @Field annotation

I'm making some tests with Spring Data Elasticsearch, Spring Boot, Spring Data Rest and an h2 Database (embedded).
I don't understand why the values are stored in the indexes despite having this configuration:
@Entity
@Document(indexName = "computerindex", type="computers")
public class Computer {
@Id
@GeneratedValue(generator="system-uuid")
@GenericGenerator(name="system-uuid", strategy = "uuid")
private String id;
@Field(type=FieldType.String, store=false) private String name;
@Field(type=FieldType.String, store=false) private String brand;
}
@Configuration
@EnableWebMvc
@EnableJpaRepositories
@EnableElasticsearchRepositories
public class ElasticSearchConfig {
@Bean
public ElasticsearchTemplate elasticsearchTemplate() throws IOException {
return new ElasticsearchTemplate(getNodeClient());
}
private static NodeClient getNodeClient() throws IOException {
String pathHome = new File(".").getCanonicalPath();
NodeBuilder nodeBuilder = new NodeBuilder();
nodeBuilder
.settings()
.put("path.home", pathHome)
.put("path.logs", pathHome+"/logs");
return (NodeClient) nodeBuilder.clusterName("elasticsearch").local(true).node().client();
}
}
Going to the url http://localhost:8080/computers, initially there is an empty "_embedded" result (as expected):
{
"_embedded" : {
"computers" : [ ]
},
"_links" : {
"self" : {
"href" : "http://localhost:8080/computers"
},
"profile" : {
"href" : "http://localhost:8080/profile/computers"
},
"search" : {
"href" : "http://localhost:8080/computers/search"
}
},
"page" : {
"size" : 20,
"totalElements" : 0,
"totalPages" : 0,
"number" : 0
}
}
When I save a new "Computer", I have this result:
{
"_embedded" : {
"computers" : [ {
"name" : "pc",
"brand" : "brand",
"_links" : {
"self" : {
"href" : "http://localhost:8080/computers/AVZZwpMlqIneBcTGH3av"
},
"computer" : {
"href" : "http://localhost:8080/computers/AVZZwpMlqIneBcTGH3av"
}
}
} ]
},
"_links" : {
"self" : {
"href" : "http://localhost:8080/computers"
},
"profile" : {
"href" : "http://localhost:8080/profile/computers"
},
"search" : {
"href" : "http://localhost:8080/computers/search"
}
},
"page" : {
"size" : 20,
"totalElements" : 1,
"totalPages" : 1,
"number" : 0
}
}
Now there is the interesting part.
If I stop and restart the application (without erasing the indexes), then since I have an embedded H2 database that runs and lives together with my application, I would expect my "pc" to be lost once the previous session has terminated.
The indexes are also configured to not store any data.
When I restart the application, I still have this result:
"computers" : [ {
"name" : "pc",
"brand" : "brand",
"_links" : {
"self" : {
"href" : "http://localhost:8080/computers/AVZZwpMlqIneBcTGH3av"
},
"computer" : {
"href" : "http://localhost:8080/computers/AVZZwpMlqIneBcTGH3av"
}
}
The only explanation could be that the values are restored from the indexes: in fact, if I delete the indexes before restarting the application, I have an empty database (as I expect).
But I don't want to save the data in the indexes, and I don't want the data to be pulled from them.
Furthermore, I have explicitly configured store=false, and I didn't set the data(true) property (like this):
return (NodeClient) nodeBuilder.clusterName("elasticsearch").local(true).data(true).node().client();
Any ideas?
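One note worth adding here (my explanation, not from the original thread): store=false only controls whether a field is kept as a separate stored field; the full document is still kept in the _source field, which is enabled by default and is what search results are rebuilt from, and a local node persists its data on disk under path.home. If the goal really were to keep no document data in the index, _source would have to be disabled in the mapping, along these lines:

```json
{
  "computers": {
    "_source": { "enabled": false }
  }
}
```

Doing so breaks features that rely on _source (reindexing, highlighting, retrieving the document body), so it is rarely advisable.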

Elasticsearch: Is it possible to query for a term facet that contains more than a term

Part of my mapping looks like this:
{
...
INFO_NODO: {
properties: {
CODIGO: {
type: string
}
ESTADO: {
type: string
}
IN_HOME: {
type: string
}
TEXTO: {
type: string
}
ID_NODO: {
type: integer
}
...
}
}
}
I need a facet that returns the fields ID_NODO, TEXTO, IN_HOME, ESTADO, CODIGO, and COUNT, so I can parse it and feed it to my application. The key is that all these fields except COUNT are dependent on ID_NODO; that is, if the INFO_NODO field is the same, the rest of the information is the same. With that said, ideally I would like to make my facet depend on the whole INFO_NODO field and not on its sub-fields.
I found several solutions but I keep either failing to implement them properly or they are just not working. Any thoughts on my weird situation?
EDIT: What I'd need to do is:
{
"facets": {
"FACET_X_NODO": {
"terms": {
"field": "INFO_NODO"
}
}
}
}
I just can't find the right syntax in any documentation, since INFO_NODO is a subdocument and not a field.
If I understood you correctly, you should be able to do something like this:
{
"query" : {
"match_all" : { }
},
"facets" : {
"info_node_facet" : {
"terms" : {
"script_field" : "_source.INFO_NODO.CODIGO + _source.INFO_NODO.ESTADO",
"size" : 10
}
}
}
}
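As a side note, facets were deprecated in Elasticsearch 1.0 and removed in 2.0 in favour of aggregations; the same grouping could be sketched as a terms aggregation with a script (field names taken from the question, separator chosen arbitrarily):

```json
{
  "size": 0,
  "aggs": {
    "facet_x_nodo": {
      "terms": {
        "script": "_source.INFO_NODO.CODIGO + '|' + _source.INFO_NODO.ESTADO",
        "size": 10
      }
    }
  }
}
```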
