Elasticsearch: search in array in sequence

I have an index with a field chunks of type keyword, which is just a list of keywords. When I search it I do something like
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "chunks": "chunk1"
          }
        },
        {
          "term": {
            "chunks": "chunk2"
          }
        }
      ]
    }
  }
}
So I can retrieve all documents where both "chunk1" and "chunk2" appear in the chunks field. However, what if I care about the order? My solution is to have a script like
def chunks = doc['chunks'];
int c = 0;
String chunk = params.chunks[0];
for (int i = 0; i < chunks.size(); ++i) {
    if (chunk == chunks[i]) {
        if (++c == params.chunks.size()) {
            return true;
        }
        chunk = params.chunks[c];
    }
}
return false;
where params.chunks is something like ["chunk1", "chunk2"]. The problem is that doc['chunks'] is unordered, while params._source is not allowed in the _search context.
It should be possible somehow, because Elasticsearch itself has similar in-order matching functionality for text fields (phrase search), so I could emulate the same field structure.
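Setting the doc-values ordering problem aside, the in-order matching the script attempts is a plain subsequence check, which can be sketched outside Painless (a Python sketch with illustrative names):

```python
def contains_in_order(chunks, wanted):
    """True if `wanted` occurs in `chunks` as an in-order subsequence."""
    c = 0  # index of the next wanted chunk
    for chunk in chunks:
        if c < len(wanted) and chunk == wanted[c]:
            c += 1
    return c == len(wanted)

contains_in_order(["chunk1", "x", "chunk2"], ["chunk1", "chunk2"])  # True
contains_in_order(["chunk2", "chunk1"], ["chunk1", "chunk2"])       # False
```

The catch, as the question notes, is that this loop has to see the chunks in source order; keyword doc values come back sorted, so the script would need the original order from somewhere else (e.g. a stored field).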

Related

Springdata elastic search not able to support empty spaces in wildcard bool query

I am using org.springframework.data:spring-data-elasticsearch:4.1.0 with elasticsearch 7.8.1.
I have a requirement to support partial (substring) search across multiple attributes. I have implemented wildcard bool queries, which work fine except that they cannot match values containing spaces.
Here is my actual query:
GET /maintenance_logs/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "vinNumber.keyword": "DH34ASD7SDFF84742"
          }
        },
        {
          "term": {
            "organizationId": 1
          }
        }
      ],
      "minimum_should_match": 1,
      "should": [
        {
          "wildcard": {
            "dtcCode": {
              "value": "*Cabin*"
            }
          }
        },
        {
          "wildcard": {
            "subSystem": {
              "value": "*Cabin*"
            }
          }
        },
        {
          "wildcard": {
            "maintenanceActivity": {
              "value": "*Cabin*"
            }
          }
        },
        {
          "wildcard": {
            "description": {
              "value": "*Cabin*"
            }
          }
        }
      ]
    }
  }
}
Here is my SearchRequest:
public static SearchRequest buildSearchRequest(final String indexName,
                                               final SearchRequestDTO dto,
                                               final String vinNumber,
                                               final Integer organizationId, Pageable pageable) {
    try {
        final int page = pageable.getPageNumber();
        final int size = pageable.getPageSize();
        final int from = page <= 0 ? 0 : page * size; // offset is page * size, not the page size itself
        SearchRequest searchRequest = new SearchRequest(indexName);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
        final QueryBuilder vinQuery = QueryBuilders.termQuery("vinNumber.keyword", vinNumber);
        final QueryBuilder orgIdQuery = QueryBuilders.termQuery("organizationId", organizationId);
        boolQueryBuilder.must(vinQuery);
        boolQueryBuilder.must(orgIdQuery);
        boolQueryBuilder.minimumShouldMatch(1);
        boolQueryBuilder.should(QueryBuilders.wildcardQuery("dtcCode", "*" + dto.getSearchTerm() + "*"));
        boolQueryBuilder.should(QueryBuilders.wildcardQuery("subSystem", "*" + dto.getSearchTerm() + "*"));
        boolQueryBuilder.should(QueryBuilders.wildcardQuery("maintenanceActivity", "*" + dto.getSearchTerm() + "*"));
        boolQueryBuilder.should(QueryBuilders.wildcardQuery("description", "*" + dto.getSearchTerm() + "*"));
        searchSourceBuilder.query(boolQueryBuilder);
        searchSourceBuilder
                .from(from)
                .size(size)
                .sort(SortBuilders.fieldSort("statsDate")
                        .order(SortOrder.DESC));
        searchRequest.source(searchSourceBuilder);
        return searchRequest;
    } catch (final Exception e) {
        e.printStackTrace();
        return null;
    }
}
This works fine except that I am unable to search for strings like "Cabin pressure".
If you want a multi-token value like "Cabin pressure" to be findable with a wildcard query, you need to define the field in the mapping as type keyword.
A wildcard query searches for single terms matching the wildcard expression, and "Cabin pressure" is by default split into two terms, "Cabin" and "pressure".
In Spring Data Elasticsearch the way to do this is to use @Field(type = FieldType.Keyword), but you'd either need to delete and recreate the index to have the new mapping applied, or create a new index and reindex the existing one into it. That's because index mappings cannot be updated, and in your existing index the type is by default defined as text.
And if you store "Cabin pressure" as one term (type keyword), don't forget that it is a different thing from "cabin pressure". Keywords are not normalized, so upper- and lower-case differences matter.
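As a sketch of what the query looks like once the mapping change is in place (assuming a keyword-typed subfield named description.keyword; the subfield name is illustrative), the wildcard clause then targets the whole untokenized value:

```python
def wildcard_on_keyword(field, term):
    """Wildcard clause against a keyword subfield of `field`, so a
    multi-token value like "Cabin pressure" is matched as one stored term."""
    return {"wildcard": {field + ".keyword": {"value": "*" + term + "*"}}}

# The whole phrase, spaces included, becomes one pattern against one term:
wildcard_on_keyword("description", "Cabin pressure")
# {'wildcard': {'description.keyword': {'value': '*Cabin pressure*'}}}
```

As noted above, this remains case-sensitive, because keyword values are not normalized at index time.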

Java API not querying as I would expect

I'm using the newer 8.1 Java API for Elastic in Kotlin, and getting behavior that isn't what I would expect, nor what I get when using the manual REST API. Here's my code:
val boolQuery = BoolQuery.of { boolBuilder -> boolBuilder
    .must { mustBuilder ->
        query.forEachIndexed { index, char ->
            val prefix = if (index > 0) {
                "* "
            } else { "" }
            val queryString = QueryStringQuery.of { queryBuilder -> queryBuilder
                .query("${prefix}${char}*")
            }
            mustBuilder.queryString(queryString)
        }
        mustBuilder
    }
}
In practice this only appears to hit on the last string query. With an input of query="rp" I would expect the following request to be made:
{
  "query": {
    "bool": {
      "must": [
        { "query_string": { "query": "r*" } },
        { "query_string": { "query": "* p*" } }
      ]
    }
  }
}
When running that exact request, it does behave as I expect/intend. I can't tell how to pull out the request the Java API is sending over without monitoring the traffic, but if I understand the pattern then I would think these are isomorphic. I've also confirmed that the Kotlin code is calling the Java API as intended based on my input.
What am I doing wrong here?
Apparently asking the question was enough to figure it out: each subquery needs its own builder. Here's the modified code, now working as expected:
val boolQuery = BoolQuery.of { boolBuilder ->
    query.forEachIndexed { index, char ->
        val prefix = if (index > 0) {
            "* "
        } else { "" }
        boolBuilder.must { mustBuilder ->
            val queryString = QueryStringQuery.of { queryBuilder -> queryBuilder
                .query("${prefix}${char}*")
            }
            mustBuilder.queryString(queryString)
        }
    }
    boolBuilder
}
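The failure mode is reproducible with any builder that holds a single slot: calling the setter repeatedly on the same builder keeps only the last value. A toy Python stand-in (not the Elastic client API):

```python
class ToyMustBuilder:
    """Toy builder whose setter stores a single clause, like a
    Query.Builder that can only hold one query variant at a time."""

    def __init__(self):
        self.clause = None

    def query_string(self, q):
        self.clause = q  # overwrites any earlier clause
        return self

b = ToyMustBuilder()
for q in ["r*", "* p*"]:
    b.query_string(q)
# b.clause == "* p*": only the last call survives, which matches the
# observed behavior; a fresh builder per subquery (as in the fix) keeps both.
```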

How could I build this Elasticsearch query?

I'm using Elasticsearch with Spring Data and I have this configuration:
public class Address {
    //...
    @MultiField(
        mainField = @Field(type = FieldType.Text),
        otherFields = {
            @InnerField(suffix = "raw", type = FieldType.Keyword)
        }
    )
    private String locality;
    //...
}
User can filter addresses by locality, so I'm trying to find the proper Elasticsearch query.
Say there are 2 documents:
{ /* ... */, locality: "Granada" }
{ /* ... */, locality: "Las Palmas de Gran Canaria" }
Given user input granada or Granada, I want only the first document to be returned. However, with this query both of them are returned:
{
  "query": {
    "match": {
      "address.locality": "granada"
    }
  }
}
I have also tried with:
{
  "query": {
    "term": {
      "address.locality.raw": "granada"
    }
  }
}
But in that case the query is case sensitive: it only returns the first document when the input is Granada, not granada.
How could I achieve that behaviour?
I wonder why you get both documents with your query; nothing is returned when I try it, because address is not a property of your Document class.
The query should be
{
  "query": {
    "match": {
      "locality": "granada"
    }
  }
}
Then it returns just the one document.
The mapping that is produced using Spring Data Elasticsearch 3.2.0.RC2 when using this class:
@Document(indexName = "address")
public class Address {
    @Id private Long id;

    @MultiField(mainField = @Field(type = FieldType.Text),
        otherFields = { @InnerField(suffix = "raw", type = FieldType.Keyword) })
    private String locality;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getLocality() {
        return locality;
    }

    public void setLocality(String locality) {
        this.locality = locality;
    }
}
is:
{
  "address": {
    "mappings": {
      "address": {
        "properties": {
          "id": {
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            },
            "type": "text"
          },
          "locality": {
            "fields": {
              "raw": {
                "type": "keyword"
              }
            },
            "type": "text"
          }
        }
      }
    }
  }
}
First thing to notice: with match() queries, Elasticsearch analyzes (pre-processes) the query string (tokenization is performed: whitespace is stripped, punctuation removed, and more) in the same way the field was analyzed at index time.
So if your "address.locality" string field is indexed as 'text', the standard analyzer is used both for search (with a match() query) and for indexing.
Term queries are not analyzed before the search is executed, which is why the results can differ.
So in your example, the analysis process will look like:
locality: 'Granada' >> ['granada'], locality.raw: 'Granada' >> ['Granada']
locality: 'Las Palmas de Gran Canaria' >> ['las', 'palmas', 'de', 'gran', 'canaria'], locality.raw: 'Las Palmas de Gran Canaria' >> ['Las Palmas de Gran Canaria']
As for the second case, "address.locality.raw" is indexed as 'keyword', which uses the keyword analyzer; this analyzer indexes the entire value as one token (it does not chop off anything).
Possible solution:
For the first part: it should actually return only one document, if you set your property as P.J mentioned above.
For the second part: index the inner field as type = FieldType.Text, which will analyze 'Granada' down to 'granada'. Then a term() query for 'granada' will match (but any other term() query will not), and match() queries for 'Granada', 'GRANADA', 'granada', etc. will match as well, since they are all analyzed to 'granada' by the standard analyzer. Check this against your future use cases; maybe keyword indexing is relevant elsewhere, in which case change only the query itself.
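The analysis behavior described above can be approximated in a few lines (a rough stand-in for the standard and keyword analyzers; the real standard analyzer does more than this):

```python
import re

def standard_analyze(text):
    # rough approximation of the standard analyzer:
    # split on non-alphanumeric characters, lowercase each token
    return [t.lower() for t in re.findall(r"\w+", text)]

def keyword_analyze(text):
    # the keyword "analyzer" keeps the whole value as one unmodified term
    return [text]

standard_analyze("Las Palmas de Gran Canaria")  # ['las', 'palmas', 'de', 'gran', 'canaria']
keyword_analyze("Granada")                      # ['Granada']
```

This makes the observed behavior concrete: a match query for "granada" is analyzed to ['granada'] and hits the text field's terms, while a term query for "granada" compares the raw string against the keyword term 'Granada' and misses.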

How to nest a scriptedMetric aggregation into a terms aggregation in Elasticsearch's Java API?

I want to use Elasticsearch's aggregation to do OLAP data analysis.
What I want to do is nest a scriptedMetric aggregation inside a terms aggregation, as below (this request is correct):
{
  "from": 0,
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "poi_id": 1
          }
        }
      ]
    }
  },
  "aggregations": {
    "poi_id": {
      "terms": {
        "script": {
          "inline": "doc['poi_id'].value + 1"
        }
      },
      "aggregations": {
        "price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
But I couldn't find how to do this in Elasticsearch's Java API.
I've tried it this way:
SearchResponse response = client.prepareSearch("poi")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setFetchSource(new String[]{"poi_id", "poi_name"}, null)
        .setQuery(QueryBuilders.termQuery("poi_id", 1))
        .addAggregation(AggregationBuilders.terms("poi_id")
                .subAggregation(AggregationBuilders.scriptedMetric("poi_id")
                        .mapScript(new Script("doc['poi_id'].value + 1"))))
        .execute()
        .actionGet();
But got an error
Caused by: NotSerializableExceptionWrapper[: value source config is invalid; must have either a field context or a script or marked as unwrapped]; nested: IllegalStateException[value source config is invalid; must have either a field context or a script or marked as unwrapped];
I've searched a lot, but can't find a demo.
Any help would be appreciated.
Thanks!
@Override
public Map<String, Object> sumPriceAggregation(String field, int page, int size) {
    if (StringUtils.isEmpty(field)) {
        field = "brandName";
    }
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    PageRequest pageRequest = PageRequest.of(page, size);
    queryBuilder.withPageable(pageRequest);
    queryBuilder.withSourceFilter(new FetchSourceFilter(new String[] {""}, null));
    String termStr = field.toUpperCase();
    TermsAggregationBuilder termsAggregationBuilder = AggregationBuilders.terms(termStr)
            .field(field) // the terms aggregation needs a field (or a script)
            .subAggregation(AggregationBuilders.sum("totalPrice").field("price")); // be aware this is a subAggregation
    NativeSearchQueryBuilder nativeSearchQueryBuilder = new NativeSearchQueryBuilder();
    nativeSearchQueryBuilder.addAggregation(termsAggregationBuilder);
    AggregatedPage<GamingLaptop> aggregatedPage = elasticsearchRestTemplate.queryForPage(
            nativeSearchQueryBuilder.build(), GamingLaptop.class);
    Aggregations aggregations = aggregatedPage.getAggregations();
    ParsedStringTerms stringTerms = aggregations.get(termStr);
    List<? extends Terms.Bucket> buckets = stringTerms.getBuckets();
    HashMap<String, Object> map = new HashMap<>();
    buckets.parallelStream().forEach(bucket -> {
        String key = bucket.getKeyAsString();
        long docCount = bucket.getDocCount();
        map.put(key, docCount);
        ParsedSum sum = (ParsedSum) bucket.getAggregations().asMap().get("totalPrice"); // fetch the sub-aggregation result per bucket
        map.putIfAbsent("sum", sum.getValue());
    });
    return map;
}
'value source config is invalid; must have either a field context or a script or marked as unwrapped': I encountered this error as well; the solution is in the code above, so please read its comments. The terms aggregation must be given a field (or a script), and the sub-aggregation result then has to be retrieved from each bucket (via ParsedStringTerms and ParsedSum).
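For comparison, the request body that the builder chain above assembles looks roughly like this (a sketch; field names taken from the answer's code):

```python
def terms_with_sum(term_field, sum_field):
    """Terms aggregation with a sum sub-aggregation, mirroring the
    AggregationBuilders chain in the answer."""
    return {
        "aggregations": {
            term_field.upper(): {  # the answer names the agg with the upper-cased field
                # a terms aggregation must have a field or a script; omitting
                # both is what triggers "value source config is invalid"
                "terms": {"field": term_field},
                "aggregations": {
                    "totalPrice": {"sum": {"field": sum_field}}
                },
            }
        }
    }
```

The question's failing Java snippet built `AggregationBuilders.terms("poi_id")` with neither `.field(...)` nor `.script(...)`, i.e. a terms block with an empty value source, which is exactly what the error message complains about.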

Multiple boolean checks or an invalids counter

What is your preferred way to implement actions based on the invalidity of multiple variables? I.e.:
invalid_get() {
    return a_invalid || b_invalid || c_invalid;
}

a_invalid_set(v) {
    a_invalid = v;
    if (v) {
        validate_add();
    } else {
        validate_remove();
    }
}

function validate_remove() {
    if (!invalid_get()) {
        validate_remove_do();
    }
}
OR:
var invalids_num: Int;

function a_invalid_set(v) {
    a_invalid = v;
    if (v) {
        ++invalids_num;
        validate_add();
    } else {
        --invalids_num;
        validate_remove();
    }
}

validate_remove() {
    if (invalids_num == 0) {
        validate_remove_do();
    }
}
I am guessing that an int check against 0 is faster and is without question the correct pattern, certainly for a large number of properties.
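A minimal sketch of the counter variant (illustrative names): the invariant is that the counter always equals the number of currently invalid flags, so the aggregate check is O(1) instead of scanning every flag. Note the guard against setting the same state twice, which would otherwise corrupt the count:

```python
class Validity:
    """Tracks per-flag validity with a counter for an O(1) aggregate check."""

    def __init__(self, names):
        self.flags = {n: False for n in names}  # False means "valid"
        self.invalids = 0

    def set_invalid(self, name, invalid):
        if self.flags[name] != invalid:  # ignore redundant updates
            self.flags[name] = invalid
            self.invalids += 1 if invalid else -1

    def any_invalid(self):
        return self.invalids != 0
```

The OR-chain version re-evaluates every flag on each check; the counter version pays a constant cost per state change instead, which is the trade-off the question is weighing.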
