How could I build this Elasticsearch query?

I'm using Elasticsearch with Spring Data and I have this configuration:
public class Address {
    //...
    @MultiField(
        mainField = @Field(type = FieldType.Text),
        otherFields = {
            @InnerField(suffix = "raw", type = FieldType.Keyword)
        }
    )
    private String locality;
    //...
}
Users can filter addresses by locality, so I'm trying to find the proper Elasticsearch query.
Say there are 2 documents:
{ /* ... */, locality: "Granada" }
{ /* ... */, locality: "Las Palmas de Gran Canaria" }
Given the user input granada or Granada, I want only the first document to be returned. However, with this query, both of them are returned:
{
"query": {
"match": {
"address.locality": "granada"
}
}
}
I have also tried with:
{
"query": {
"term": {
"address.locality.raw": "granada"
}
}
}
But in that case the query is case sensitive: it returns the first document only when the input is Granada, not granada.
How could I achieve that behaviour?

I wonder why you get both documents with your query; nothing is returned when I try it, because address is not a property of your Document class.
The query should be:
{
"query": {
"match": {
"locality": "granada"
}
}
}
Then it returns just the one document.
The mapping that is produced using Spring Data Elasticsearch 3.2.0.RC2 when using this class:
@Document(indexName = "address")
public class Address {

    @Id
    private Long id;

    @MultiField(mainField = @Field(type = FieldType.Text),
            otherFields = { @InnerField(suffix = "raw", type = FieldType.Keyword) })
    private String locality;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getLocality() {
        return locality;
    }

    public void setLocality(String locality) {
        this.locality = locality;
    }
}
is:
{
"address": {
"mappings": {
"address": {
"properties": {
"id": {
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
},
"type": "text"
},
"locality": {
"fields": {
"raw": {
"type": "keyword"
}
},
"type": "text"
}
}
}
}
}
}

The first thing to notice is that with match queries, Elasticsearch analyzes (pre-processes) the query in the same way the field was indexed: tokenization is performed, spaces are chopped off, punctuation is removed, and more.
So if your "address.locality" string field is indexed as text, the standard analyzer is used both for search (with a match query) and for indexing.
Term queries are not analyzed before the search is executed, and thus different results may appear.
So in your example, the analysis process will look like:
locality: 'Granada' >> ['granada'], locality.raw: 'Granada' >> ['Granada']
locality: 'Las Palmas de Gran Canaria' >> ['las', 'palmas', 'de', 'gran', 'canaria'], locality.raw: 'Las Palmas de Gran Canaria' >> ['Las Palmas de Gran Canaria']
As for the second case: "address.locality.raw" is indexed as keyword, which uses the keyword analyzer; this analyzer indexes the entire value as a single token (it does not chop off anything).
Possible solutions:
For the first part: it should actually return only one document, if you set the property name as P.J mentioned above.
For the second part: index the inner field as type = FieldType.Text, which will break 'Granada' down to 'granada'. A term query for 'granada' will then match, but any other term query will not. Match queries for 'Granada', 'GRANADA', 'granada', etc. will match as well, since they are analyzed to 'granada' by the standard analyzer. Check this against your future use cases; maybe keyword indexing is relevant to your other use cases, and you should just change the query itself.
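If an exact but case-insensitive match on the whole value is wanted, another option is to keep the raw subfield as keyword but attach a lowercase normalizer. A sketch of the index definition (the normalizer name is an assumption; the index and type names are taken from the mapping above):

```json
PUT /address
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "address": {
      "properties": {
        "locality": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword",
              "normalizer": "lowercase_normalizer"
            }
          }
        }
      }
    }
  }
}
```

With this mapping, a term query on locality.raw should normalize the query term as well, so both granada and Granada match only the first document.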

Related

Springdata elastic search not able to support empty spaces in wildcard bool query

I am using org.springframework.data:spring-data-elasticsearch:4.1.0 with elasticsearch 7.8.1.
I have a requirement to do partial-match search across multiple attributes. I have implemented wildcard bool queries, which work fine except that they cannot handle values containing spaces.
Here is my actual query:
GET /maintenance_logs/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"vinNumber.keyword": "DH34ASD7SDFF84742"
}
},
{
"term": {
"organizationId": 1
}
}
],
"minimum_should_match": 1,
"should": [
{
"wildcard": {
"dtcCode": {
"value": "*Cabin*"
}
}
},
{
"wildcard": {
"subSystem": {
"value": "*Cabin*"
}
}
},
{
"wildcard": {
"maintenanceActivity": {
"value": "*Cabin*"
}
}
},
{
"wildcard": {
"description": {
"value": "*Cabin*"
}
}
}
]
}
}
}
Here is my SearchRequest:
public static SearchRequest buildSearchRequest(final String indexName,
final SearchRequestDTO dto,
final String vinNumber,
final Integer organizationId, Pageable pageable) {
try {
final int page = pageable.getPageNumber();
final int size = pageable.getPageSize();
final int from = page <= 0 ? 0 : pageable.getPageSize();
SearchRequest searchRequest = new SearchRequest(indexName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
final QueryBuilder vinQuery = QueryBuilders.termQuery("vinNumber.keyword", vinNumber);
final QueryBuilder orgIdQuery = QueryBuilders.termQuery("organizationId", organizationId);
boolQueryBuilder.must(vinQuery);
boolQueryBuilder.must(orgIdQuery);
boolQueryBuilder.minimumShouldMatch(1);
boolQueryBuilder.should(QueryBuilders.wildcardQuery("dtcCode", "*" + dto.getSearchTerm() + "*"));
boolQueryBuilder.should(QueryBuilders.wildcardQuery("subSystem", "*" + dto.getSearchTerm() + "*"));
boolQueryBuilder.should(QueryBuilders.wildcardQuery("maintenanceActivity", "*" + dto.getSearchTerm() + "*"));
boolQueryBuilder.should(QueryBuilders.wildcardQuery("description", "*" + dto.getSearchTerm() + "*"));
searchSourceBuilder.query(boolQueryBuilder);
searchSourceBuilder
.from(from)
.size(size)
.sort(SortBuilders.fieldSort("statsDate")
.order(SortOrder.DESC));
searchRequest.source(searchSourceBuilder);
return searchRequest;
} catch (final Exception e) {
e.printStackTrace();
return null;
}
}
This works fine except I am unable to search for strings like "Cabin pressure".
If you want a multi-token value like "Cabin pressure" to be findable with a wildcard query, you need to define the field in the mapping as type keyword.
The wildcard search searches for single terms that match the wildcard expression, and "Cabin pressure" by default is split into two terms, "Cabin" and "pressure".
In Spring Data Elasticsearch the way to do this is to use @Field(type = FieldType.Keyword), but you'd either need to delete and recreate the index to have the new mapping applied, or you need to create a new index and reindex the existing data into it. That's because index mappings cannot be updated, and in your existing index the type is by default defined as text.
And if you store "Cabin pressure" as one term (type keyword), don't forget that this will be a different thing than "cabin pressure". Keywords are not normalized, so upper- and lower-case differences matter.
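Since the default dynamic mapping indexes strings as text with a .keyword subfield, one way to try this without remapping is to point the wildcard at that subfield, where the whole value is a single term (the field name below is an assumption based on the query in the question):

```json
{
  "wildcard": {
    "description.keyword": {
      "value": "*Cabin pressure*"
    }
  }
}
```

Note that this stays case sensitive, for the reason given above.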

How to process a CSV file using Reactor Flux and output as JSON

I've got a CSV file which I want to process using Spring Reactor Flux.
Given a CSV file with a header, where the first two columns are fixed and there can be one or more optional data columns:
Id, Name, Group, Status
6EF3C06E-6240-1A4A-17D6-27E73F0CDD31, Harlan Ferguson, xy1, true
6B261437-217C-0FDF-741A-92477EE354EC, Risa Greene, xy2, false
4FADC070-FCD0-C7E8-1963-A7FACDB6D8D1, Samson Blanchard, xy3, false
562C3486-E009-2C2D-9D3E-14355DB7D4D7, Damian Carson, xy4, true
...
...
...
I want to process the input using Flux
So that the output is
[{
"Id": "6EF3C06E-6240-1A4A-17D6-27E73F0CDD31",
"Name": "Harlan Ferguson",
"data": {
"Group": "xyz1",
"Status": true
}
}, {
"Id": "6B261437-217C-0FDF-741A-92477EE354EC",
"Name": "Risa Greene",
"data": {
"Group": "xy2",
"Status": false
}
}, {
"Id": "4FADC070-FCD0-C7E8-1963-A7FACDB6D8D1",
"Name": "Samson Blanchard",
"data": {
"Group": "xy3",
"Status": false
}
}, {
"Id": "562C3486-E009-2C2D-9D3E-14355DB7D4D7",
"Name": "Damian Carson",
"data": {
"Group": "xy4",
"Status": true
}
}]
I'm using CSVReader to stream the file and creating a Flux with:
CSVReader reader = new CSVReader(Files.newBufferedReader(file));
Flux<String[]> fluxOfCsvRecords = Flux.fromIterable(reader);
I'm coming back to Spring Reactor after couple of years, so my understanding is a bit rusty.
Creating a Mono of header using
Mono<String[]> headerMono = fluxOfCsvRecords.next();
And then,
fluxOfCsvRecords.skip(1)
.flatMap(csvRecord -> headerMono.map(header -> header[0] + " : " + csvRecord[0]))
.subscribe(System.out::println);
This is half-way code, just to test that I can combine data from the header with the rest of the flux; I expect to see:
Id : 6EF3C06E-6240-1A4A-17D6-27E73F0CDD31
Id : 6B261437-217C-0FDF-741A-92477EE354EC
Id : 4FADC070-FCD0-C7E8-1963-A7FACDB6D8D1
Id : 562C3486-E009-2C2D-9D3E-14355DB7D4D7
But my output is just
4FADC070-FCD0-C7E8-1963-A7FACDB6D8D1 : 6EF3C06E-6240-1A4A-17D6-27E73F0CDD31
I'll appreciate if anyone can help me understand how to achieve this.
---------------------------Update---------------------
Tried another approach
Flux<String[]> take1 = fluxOfCsvRecords.take(1);
take1.flatMap(header -> fluxOfCsvRecords.map(csvRecord -> header[0] + " : " + csvRecord[0]))
.subscribe(System.out::println);
The output is
Id : 6B261437-217C-0FDF-741A-92477EE354EC
Id : 4FADC070-FCD0-C7E8-1963-A7FACDB6D8D1
Id : 562C3486-E009-2C2D-9D3E-14355DB7D4D7
Missing the row after the header
Add two classes like:
public class TopJson {
    private int Id;
    private String name;
    private InnerJson data;

    public TopJson() {}

    public TopJson(int id, String name, InnerJson data) {
        Id = id;
        this.name = name;
        this.data = data;
    }
}

class InnerJson {
    private String group;
    private String status;

    public InnerJson() {}

    public InnerJson(String group, String status) {
        this.group = group;
        this.status = status;
    }
}
The fields are converted to appropriate types and used to create the objects. Each element of fluxOfCsvRecords is already a String[] parsed by CSVReader, so there is no need to split the line again (note: with the sample data, where Id is a UUID, the id field would need to be a String rather than an int):
fluxOfCsvRecords.skip(1)
        .map(csvRecord -> new TopJson(Integer.parseInt(csvRecord[0]), csvRecord[1],
                new InnerJson(csvRecord[2], csvRecord[3])))
        .collect(Collectors.toList());
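A note on the original symptom: Flux.fromIterable(reader) wraps a single-use iterator, so headerMono and the main pipeline compete for rows from the same underlying CSVReader. One way to avoid that is to read the header eagerly, before building the Flux. A sketch, assuming opencsv's CSVReader (readNext() may throw, which is left unhandled here):

```java
// Consume the header once, outside the reactive pipeline,
// so the Flux only ever emits data rows.
CSVReader reader = new CSVReader(Files.newBufferedReader(file));
String[] header = reader.readNext();

Flux.fromIterable(reader)            // remaining rows only
    .map(csvRecord -> header[0] + " : " + csvRecord[0])
    .subscribe(System.out::println);
```

With the header held in a plain variable, there is no second subscription to the source, so every data row should be printed.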

How to nest a scriptedMetric aggregation into a terms aggregation in Elasticsearch's Java API?

I want to use Elasticsearch's aggregation to do OLAP data analysis.
What I want to do is nest a scriptedMetric aggregation inside a terms aggregation, as below (this query is correct):
{
"from": 0,
"size": 0,
"query":{
"bool":{
"must":[
{
"match":{
"poi_id":1
}
}
]
}
},
"aggregations": {
"poi_id": {
"terms": {
"script": {
"inline": "doc['poi_id'].value + 1"
}
},
"aggregations": {
"price": {
"sum": {
"field": "price"
}
}
}
}
}
}
But I couldn't find out how to do this with Elasticsearch's Java API.
I've tried it this way:
SearchResponse response = client.prepareSearch("poi")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setFetchSource(new String[]{"poi_id","poi_name"}, null)
.setQuery(QueryBuilders.termQuery("poi_id", 1))
.addAggregation(AggregationBuilders.terms("poi_id").subAggregation((AggregationBuilders.scriptedMetric("poi_id").mapScript(new Script("doc['poi_id'].value + 1")))))
.execute()
.actionGet();
But got an error
Caused by: NotSerializableExceptionWrapper[: value source config is invalid; must have either a field context or a script or marked as unwrapped]; nested: IllegalStateException[value source config is invalid; must have either a field context or a script or marked as unwrapped];
I've searched a lot, but can't find a demo.
Any help would be appreciated.
Thanks!
@Override
public Map<String, Object> sumPriceAggregation(String field, int page, int size) {
    if (StringUtils.isEmpty(field)) {
        field = "brandName";
    }
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    PageRequest pageRequest = PageRequest.of(page, size);
    queryBuilder.withPageable(pageRequest);
    queryBuilder.withSourceFilter(new FetchSourceFilter(new String[] {""}, null));

    String termStr = field.toUpperCase();
    TermsAggregationBuilder termsAggregationBuilder = AggregationBuilders.terms(termStr)
            .field(field)
            .subAggregation(AggregationBuilders.sum("totalPrice").field("price")); // be aware this is a subAggregation

    NativeSearchQueryBuilder nativeSearchQueryBuilder = new NativeSearchQueryBuilder();
    nativeSearchQueryBuilder.addAggregation(termsAggregationBuilder);
    AggregatedPage<GamingLaptop> aggregatedPage = elasticsearchRestTemplate.queryForPage(
            nativeSearchQueryBuilder.build(), GamingLaptop.class);

    Aggregations aggregations = aggregatedPage.getAggregations();
    ParsedStringTerms stringTerms = aggregations.get(termStr);
    List<? extends Terms.Bucket> buckets = stringTerms.getBuckets();
    Map<String, Object> map = new HashMap<>();
    // sequential stream: HashMap is not thread-safe, so avoid parallelStream() here
    buckets.stream().forEach(bucket -> {
        String key = bucket.getKeyAsString();
        long docCount = bucket.getDocCount();
        map.put(key, docCount);
        ParsedSum sum = (ParsedSum) bucket.getAggregations().asMap().get("totalPrice"); // make sure you get the sub-aggregation here
        map.putIfAbsent("sum", sum.getValue());
    });
    return map;
}
I encountered the error 'value source config is invalid; must have either a field context or a script or marked as unwrapped' as well; the comments in the code above are my solution. Either the ParsedStringTerms or the TermsAggregationBuilder needs to be retrieved by name.
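To mirror the original JSON more closely (a terms aggregation driven by a script, with a sum sub-aggregation), the transport-client builder chain would look roughly like this; a sketch against the API used in the question, untested:

```java
SearchResponse response = client.prepareSearch("poi")
        .setQuery(QueryBuilders.termQuery("poi_id", 1))
        .setSize(0)
        .addAggregation(
                // a terms aggregation needs a field or a script; the original error
                // came from configuring neither on the terms aggregation itself
                AggregationBuilders.terms("poi_id")
                        .script(new Script("doc['poi_id'].value + 1"))
                        .subAggregation(
                                AggregationBuilders.sum("price").field("price")))
        .get();
```

The key difference from the attempt in the question is that the script goes on the terms aggregation itself (matching the "terms": { "script": ... } JSON), rather than on a scriptedMetric sub-aggregation.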

Convert string fields to enum fields in Swashbuckle

We are using Swashbuckle to document our WebAPI project (using Owin) and are trying to modify the generated Swagger file of Swashbuckle.
With the DescribeAllEnumsAsStrings() and an enum property like below, we get an expected result:
class MyResponseClass {
public Color color;
}
enum Color {
LightBlue,
LightRed,
DarkBlue,
DarkRed
}
Swagger generated result:
"color": {
"enum": [
"LightBlue",
"LightRed",
"DarkBlue",
"DarkRed"
],
"type": "string"
},
The challenge for us is that we have some properties that are of type string but we are actually treating them as enum types. For example:
class MyResponseClass {
public string color;
}
The only possible values for this property are dark-blue, dark-red, light-blue, light-red.
So, we want something like below as the result:
"color": {
"enum": [
"light-blue",
"light-red",
"dark-blue",
"dark-red"
],
"type": "string"
},
We have lots of these properties with different values in different classes. It would be great to have a custom attribute like below to make it generic. I can't figure out how to create such an attribute and use it in Swashbuckle DocumentFilters or OperationFilters:
public MyEndpointResponseClass {
[StringEnum("booked", "confirmed", "reserved")]
public string status;
// Other properties
}
public MyEndpointRequestClass {
[StringEnum("dark-blue", "dark-red", "light-blue", "light-red")]
public string color;
// Other properties
}
Instead of a custom attribute (StringEnum), use an attribute that Swagger already knows about, a little-known one (I've never used it before):
[RegularExpression("^(dark-blue|dark-red|light-blue|light-red)")]
That injects the pattern into parameter.pattern, and we can then read it in a document filter and transform it into an enum. Here is my code:
private class StringEnumDocumentFilter : IDocumentFilter
{
public void Apply(SwaggerDocument swaggerDoc, SchemaRegistry s, IApiExplorer a)
{
if (swaggerDoc.paths != null)
{
foreach (var path in swaggerDoc.paths)
{
ProcessOperation(path.Value.get);
ProcessOperation(path.Value.put);
ProcessOperation(path.Value.post);
ProcessOperation(path.Value.delete);
ProcessOperation(path.Value.options);
ProcessOperation(path.Value.head);
ProcessOperation(path.Value.patch);
}
}
}
private void ProcessOperation(Operation op)
{
if (op != null)
{
foreach (var param in op.parameters)
{
if (param.pattern != null)
{
param.@enum = param.pattern
.Replace("^", "")
.Replace("(", "")
.Replace(")", "")
.Split('|');
}
}
}
}
}
Here is a working example:
http://swashbuckletest.azurewebsites.net/swagger/ui/index?filter=TestStringEnum#/TestStringEnum/TestStringEnum_Post
And the code behind that is on GitHub:
TestStringEnumController.cs
SwaggerConfig.cs#L389

MongoDb query in Spring

In the feed collection, "likeCount" and "commentCount" are two fields, and I want to sum them and show only the top 5 records. For that I have written the query below in MongoDB.
How can I write the same thing in Spring?
db.feed.aggregate(
[
{ $project: { _id: 1, popularityCount: { $add: [ "$likeCount", "$commentCount" ] } } },
{ "$sort": { "popularityCount": -1 } },
{"$limit":5}
]
)
Use Spring Data Mongo and check out the aggregation support.
@Autowired
private MongoTemplate mongo;

public List<TopFeed> findTopFeeds() {
    // { $project: { _id: 1, popularityCount: { $add: [ "$likeCount", "$commentCount" ] } } }
    final ProjectionOperation projection = project("_id")
            .and("likeCount").plus("commentCount").as("popularityCount");
    // { $sort: { popularityCount: -1 } }
    final SortOperation sort = sort(DESC, "popularityCount");
    // { $limit: 5 }
    final LimitOperation limit = limit(5);
    // run against collection "feed" and map each result to a "TopFeed" object
    final AggregationResults<TopFeed> results =
            mongo.aggregate(newAggregation(projection, sort, limit), "feed", TopFeed.class);
    // get the results
    return results.getMappedResults();
}

public class TopFeed {
    private String id;
    private long popularityCount;
}
If you don't want to use Spring Data you'll have to hit the Mongo client itself. Just inject a com.mongodb.Mongo or com.mongodb.MongoClient with Spring and use it directly.
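If you do go through the driver directly, the same pipeline can be built with Document objects. A sketch, assuming the com.mongodb.client API (the database name is illustrative):

```java
MongoCollection<Document> feed = mongoClient.getDatabase("mydb").getCollection("feed");

List<Document> pipeline = Arrays.asList(
        // { $project: { _id: 1, popularityCount: { $add: ["$likeCount", "$commentCount"] } } }
        new Document("$project", new Document("_id", 1)
                .append("popularityCount",
                        new Document("$add", Arrays.asList("$likeCount", "$commentCount")))),
        // { $sort: { popularityCount: -1 } }
        new Document("$sort", new Document("popularityCount", -1)),
        // { $limit: 5 }
        new Document("$limit", 5));

for (Document doc : feed.aggregate(pipeline)) {
    System.out.println(doc.toJson());
}
```

This keeps the pipeline stages in the same shape as the shell query, at the cost of losing the typed mapping to TopFeed that Spring Data provides.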
