Spring Data Elasticsearch setting annotation did not take effect - elasticsearch

I'm trying with spring data elastic search and have a class defined like this:
@Data
@NoArgsConstructor
@AllArgsConstructor
@Document(indexName = "master", type = "master", shards = 1, replicas = 0)
@Setting(settingPath = "/settings/setting.json")
public class Master {
    @Id
    private String id;

    @MultiField(mainField = @Field(type = FieldType.String, store = true),
        otherFields = {
            @InnerField(suffix = "autocomplete", type = FieldType.String, indexAnalyzer = "autocomplete", searchAnalyzer = "standard")
        }
    )
    private String firstName;

    private String lastName;
}
The setting file is under /src/main/settings/setting.json and looks like this:
{
    "index": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                }
            }
        }
    }
}
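To visualize what the autocomplete_filter above emits, here is a small plain-Java sketch of edge n-gram generation with min_gram = 1 and max_gram = 20 (the EdgeNgramDemo class is a hypothetical illustration, not Elasticsearch code):

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNgramDemo {
    // Mimics an edge_ngram filter: emit every prefix of the token whose
    // length lies between minGram and maxGram.
    static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, token.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(token.substring(0, len)); // prefixes only: the "edge"
        }
        return grams;
    }

    public static void main(String[] args) {
        // "blue" with min_gram 1, max_gram 20 -> [b, bl, blu, blue]
        System.out.println(edgeNgrams("blue", 1, 20));
    }
}
```

With these prefixes indexed, a query analyzed with the plain standard analyzer ("searchAnalyzer" in the mapping) for "blu" matches a stored prefix token, which is the point of splitting index and search analyzers.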
I ran my test class by first deleting the index, then recreating it like this:
elasticsearchTemplate.deleteIndex(Master.class);
elasticsearchTemplate.createIndex(Master.class);
elasticsearchTemplate.putMapping(Master.class);
elasticsearchTemplate.refresh(Master.class);
But when I try to save something into the index, I get this MapperParsingException:
2017-10-04 18:56:31.806 ERROR 2942 --- [ main] .d.e.r.s.AbstractElasticsearchRepository : failed to load elasticsearch nodes : org.elasticsearch.index.mapper.MapperParsingException: analyzer [autocomplete] not found for field [autocomplete]
I spent four hours trying to figure this out and looked at the debug-mode logs: nothing.
I tried breaking the JSON format by deleting a comma; it broke, so the JSON file was definitely being read.
I used the REST API to query the master index, but the settings don't seem to contain the autocomplete analyzer, or any analyzer at all.
The weird thing is that my document can be saved and queried even with this error, but I do want this analyzer.
BTW, this is a parent class in a parent-child relationship, if that's relevant.

Finally got it figured out!
I had to put the same setting on all domain classes using the same index (both parent and child), then delete the index and restart the server, and it worked!

Related

How to create subfield keyword for Aggregation in ElasticSearch

I am trying to get aggregation results from an Elasticsearch index.
For example, these are the values in my index:
"_source": {
    "ctry": "abc",
    "totalentry": 1,
    "entrydate": "2022-01-06"
},
"_source": {
    "ctry": "abc",
    "totalentry": 3,
    "entrydate": "2022-01-07"
},
"_source": {
    "ctry": "xyz",
    "totalentry": 1,
    "entrydate": "2022-01-08"
}
The expected result is totalentry summed per country:
ctry : abc
totalentry : 4
ctry : xyz
totalentry : 1
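What is expected here corresponds to grouping by ctry and summing totalentry (in Elasticsearch terms, a terms aggregation with a sum sub-aggregation). A plain-Java sketch of that grouping logic (the TermsAggSketch class and its inline data are hypothetical illustrations, not index code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TermsAggSketch {
    // Group rows of { ctry, totalentry } by country and sum the entries,
    // preserving first-seen order like the sample data above.
    static Map<String, Integer> sumByCountry(String[][] docs) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String[] doc : docs) {
            totals.merge(doc[0], Integer.parseInt(doc[1]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] docs = { { "abc", "1" }, { "abc", "3" }, { "xyz", "1" } };
        System.out.println(sumByCountry(docs)); // prints {abc=4, xyz=1}
    }
}
```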
My aggregation query:
QueryBuilder querybuilder = QueryBuilders.boolQuery()
    .must(QueryBuilders.rangeQuery("entrydate").gte("2022-01-01").lte("2022-01-31"));
TermsAggregationBuilder groupBy = AggregationBuilders.terms("ctry").field("ctry");
SearchQuery searchQuery = new NativeSearchQueryBuilder()
    .withQuery(querybuilder)
    .addAggregation(groupBy)
    .build();
List<Sample> records = elasticsearchRestTemplate.queryForList(searchQuery, Sample.class);
The above aggregation query returns 3 records instead of the 2 aggregated results.
My index properties:
"ctry": {
    "type": "keyword"
}
How can I change it to the mapping below, so that I hopefully get correct aggregation results?
"ctry": {
    "type": "text",
    "fields": {
        "keyword": {
            "ignore_above": 256,
            "type": "keyword"
        }
    }
}
My Java code:
@Document(indexName = "sample", createIndex = true, shards = 4)
public class Sample {
    @Field(type = FieldType.Keyword)
    private String ctry;
You are using an outdated version of Spring Data Elasticsearch. The queryForList variants were deprecated in 4.0 and removed in 4.2.
You need to use one of the search...() methods that return a SearchHits<Sample> object. That will contain both the documents matching your query and the aggregations.

Creating an elasticsearch index from logstash

I am trying to load data from a SQL Server database into Elasticsearch. I am using Logstash with the jdbc input plugin and the elasticsearch output plugin. I can load my data into Elasticsearch but cannot figure out how to set up my index. I am using an index template to try this. Below is what I am using, but whenever I search I do not get any results.
logstash.config
# contents of logstash\bin\logstash.config
input {
    jdbc {
        jdbc_driver_library => ".\Microsoft JDBC Driver 6.2 for SQL Server\sqljdbc_6.2\enu\mssql-jdbc-6.2.1.jre8.jar"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_connection_string => "jdbc:sqlserver://mydbserver;databaseName=mydb;"
        jdbc_user => "******"
        jdbc_password => "******"
        schedule => "* * * * *"
        parameters => { "classification" => "EMPLOYEE" }
        statement => "SELECT Cost_Center, CC_Acct_1, CC_Acct_2, CC_Acct_3 from dbo.Cost_Center where CC_Classification = :classification"
    }
}
output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "allocation-testweb"
        template => "index_template.json"
    }
    #stdout { codec => rubydebug }
}
index_template.json
{
    "template": "allocation-*",
    "order": 1,
    "settings": {
        "number_of_replicas": 0,
        "number_of_shards": 1,
        "analysis": {
            "analyzer": {
                "substring_analyzer": {
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase"]
                }
            },
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"]
                }
            }
        }
    },
    "mappings": {
        "costcenter": {
            "properties": {
                "cc_acct_1": {
                    "type": "string",
                    "analyzer": "substring_analyzer"
                },
                "cc_acct_2": {
                    "type": "string",
                    "analyzer": "substring_analyzer"
                }
            }
        }
    }
}
I have created a similar index in code while doing some initial research. Is my index_template incorrect or is there another way I should be doing this?
Update:
I had mismatched index names between my two files. I'm now able to search using Postman and curl. However, when I try to get data using a NEST client, I never get any data back. Below is the code snippet for the query.
var searchResult = client.Search<CostCenter>(s => s
    .Size(1000)
    .Index("allocation_testweb")
    .MatchAll());
This previously worked with the same data loaded from a file. CostCenter is simply an object with members called Cost_Center, CC_Acct_1, CC_Acct_2, and CC_Acct_3. I'm sure I am again overcomplicating the issue and missing something obvious.
UPDATE II:
I have made the changes suggested by @RussCam below and still do not get any results back. Below is my updated code.
var node = new Uri("http://localhost:9200");
var settings = new ConnectionSettings(node);
    //.InferMappingFor<CostCenter>(m => m.IndexName("allocation_testweb"));
var client = new ElasticClient(settings);
var searchResult = client.Search<CostCenter>(s => s
    .Type("costCenter")
    .Size(1000)
    .Index("allocation_testweb")
    .MatchAll());
I commented out the InferMappingFor<> call since it was not producing a result.
Mapping image requested by @RussCam. I've also included my costcenter class (I have tried naming it with all variations of costcenter).
public class costcenter
{
    public string cost_center { get; set; }
    public string cc_acct_1 { get; set; }
    public string cc_acct_2 { get; set; }
    public string cc_acct_3 { get; set; }
}

Spring data elastic search wild card search

I am trying to search for the word "blue" in the below list of text:
"BlueSaphire","Bluo","alue","blue", "BLUE",
"Blue","Blue Black","Bluo","Saphire Blue",
"black" , "green","bloo" , "Saphireblue"
SearchQuery searchQuery = new NativeSearchQueryBuilder().withIndices("color")
    .withQuery(matchQuery("colorDescriptionCode", "blue")
        .fuzziness(Fuzziness.ONE)
    )
    .build();
This works fine, and the search returns the records below along with their scores:
alue 2.8718023
Bluo 1.7804208
Bluo 1.7804208
BLUE 1.2270637
blue 1.2270637
Blue 1.2270637
Blue Black 1.1082436
Saphire Blue 0.7669148
But I am not able to make the wildcard work. "SaphireBlue" and "BlueSaphire" are also expected to be part of the result.
I tried the setting below, but it does not work:
SearchQuery searchQuery = new NativeSearchQueryBuilder().withIndices("color")
    .withQuery(matchQuery("colorDescriptionCode", "(.*?)blue")
        .fuzziness(Fuzziness.ONE)
    )
    .build();
On Stack Overflow, I saw a solution that specifies analyzeWildcard:
QueryBuilder queryBuilder = boolQuery().should(
    queryString("blue").analyzeWildcard(true)
        .field("colorDescriptionCode", 2.0f));
But I can't find the queryString static method. I am using spring-data-elasticsearch 2.0.0.RELEASE.
Let me know how I can specify the wildcard so that all words containing "blue" will also be returned in the search results.
I know that working examples are always better than theory, but still, I would first like to give a little theory. The heart of Elasticsearch is Lucene, so before a document is written to the Lucene index, it goes through the analysis stage. The analysis stage can be divided into 3 parts:
char filtering;
tokenizing;
token filtering
In the first stage, we can throw away unwanted characters, for example HTML tags. More information about character filters can be found on the official site.
The next stage is far more interesting. Here we split the input text into tokens, which will later be used for searching. A few very useful tokenizers:
standard tokenizer. It is used by default. The tokenizer implements the Unicode Text Segmentation algorithm. In practice, you can use this to split text into words and use these words as tokens.
n-gram tokenizer. This is what you need if you want to search by part of a word. This tokenizer splits text into contiguous sequences of n characters. For example, the text "for example" will be split into tokens such as "fo", "or", "r ", " e", "ex", and so on. The length of the n-gram is variable and can be configured with the min_gram and max_gram parameters.
edge n-gram tokenizer. Works the same as the n-gram tokenizer except for one thing: this tokenizer doesn't increment the offset. For example, the text "for example" will be split into the tokens "fo", "for", "for ", "for e", "for ex", "for exa", etc.
More information about tokenizers can be found on the official site. Unfortunately, I can't post more links because of low reputation.
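The contrast between the two tokenizers can be sketched in a few lines of plain Java (GramTokenizers is a hypothetical illustration, not an Elasticsearch API):

```java
import java.util.ArrayList;
import java.util.List;

public class GramTokenizers {
    // n-gram tokenizer: a sliding window of length n over the whole text.
    static List<String> ngrams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    // edge n-gram tokenizer: windows anchored at the start (the "edge").
    static List<String> edgeNgrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, text.length()); len++) {
            out.add(text.substring(0, len));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("for example", 2));        // fo, or, "r ", " e", ex, ...
        System.out.println(edgeNgrams("for example", 2, 7)); // fo, for, "for ", ...
    }
}
```

Note the trade-off this makes visible: plain n-grams let you match any substring but produce many more tokens than edge n-grams, which only support prefix-style matching.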
The next stage is also very interesting. After we split the text into tokens, we can do a lot of interesting things with them. Again, a few very useful token filters:
lowercase filter. In most cases we want case-insensitive search, so it's good practice to bring tokens to lowercase.
stemmer filter. When we deal with natural language, we have a lot of problems. One of them is that one word can have many forms. The stemmer filter helps us get the root form of a word.
fuzziness filter. Another problem is that users often make typos. This filter adds tokens that cover possible typos.
If you are interested in looking at the output of the analyzer, you can use the _termvectors endpoint:
curl [ELASTIC_URL]:9200/[INDEX_NAME]/[TYPE_NAME]/[DOCUMENT_ID]/_termvectors?pretty
Now let's talk about queries. Queries are divided into 2 large groups, with 2 significant differences:
whether the query goes through the analysis stage or not;
whether the query requires an exact answer (yes or no)
Examples are the match query and the term query. The first passes through the analysis stage, the second does not. The first does not give us a specific answer (but gives us a score); the second does. When creating the mapping for a document, we can specify both the index analyzer and the search analyzer separately per field.
Now some information regarding Spring Data Elasticsearch. Here it makes sense to talk about concrete examples. Suppose we have a document with a title field and we want to search on that field. First, create a file with settings for Elasticsearch:
{
    "analysis": {
        "analyzer": {
            "ngram_analyzer": {
                "tokenizer": "ngram_tokenizer",
                "filter": ["lowercase"]
            },
            "edge_ngram_analyzer": {
                "tokenizer": "edge_ngram_tokenizer",
                "filter": ["lowercase"]
            },
            "english_analyzer": {
                "tokenizer": "standard",
                "filter": [
                    "lowercase",
                    "english_stop",
                    "unique",
                    "english_possessive_stemmer",
                    "english_stemmer"
                ]
            },
            "keyword_analyzer": {
                "tokenizer": "keyword",
                "filter": ["lowercase"]
            }
        },
        "tokenizer": {
            "ngram_tokenizer": {
                "type": "ngram",
                "min_gram": 2,
                "max_gram": 20
            },
            "edge_ngram_tokenizer": {
                "type": "edge_ngram",
                "min_gram": 2,
                "max_gram": 20
            }
        },
        "filter": {
            "english_stop": {
                "type": "stop",
                "stopwords": "_english_"
            },
            "english_stemmer": {
                "type": "stemmer",
                "language": "english"
            },
            "english_possessive_stemmer": {
                "type": "stemmer",
                "language": "possessive_english"
            }
        }
    }
}
You can save these settings in your resources folder. Now let's look at our document class:
@Document(indexName = "document", type = "document")
@Setting(settingPath = "document_index_setting.json")
public class Document {
    @Id
    private String id;

    @MultiField(
        mainField = @Field(type = FieldType.String, index = not_analyzed),
        otherFields = {
            @InnerField(suffix = "edge_ngram",
                type = FieldType.String,
                indexAnalyzer = "edge_ngram_analyzer",
                searchAnalyzer = "keyword_analyzer"),
            @InnerField(suffix = "ngram",
                type = FieldType.String,
                indexAnalyzer = "ngram_analyzer",
                searchAnalyzer = "keyword_analyzer"),
            @InnerField(suffix = "english",
                type = FieldType.String,
                indexAnalyzer = "english_analyzer")
        }
    )
    private String title;

    // getters and setters omitted
}
So here we have the field title with three inner fields:
title.edge_ngram for searching by edge n-grams with the keyword search analyzer. We need this because we don't want the query itself split into edge n-grams;
title.ngram for searching by n-grams;
title.english for searching with the nuances of a natural language
and the main field title. We don't analyze it because sometimes we want to sort by this field.
Let's use a simple multi-match query to search across all these fields:
String searchQuery = "blablabla";
MultiMatchQueryBuilder queryBuilder = multiMatchQuery(searchQuery)
    .field("title.edge_ngram", 2)
    .field("title.ngram")
    .field("title.english");
NativeSearchQueryBuilder searchBuilder = new NativeSearchQueryBuilder()
    .withIndices("document")
    .withTypes("document")
    .withQuery(queryBuilder)
    .withPageable(new PageRequest(page, pageSize));
elasticsearchTemplate.queryForPage(searchBuilder.build(),
    Document.class,
    new SearchResultMapper() {
        // implementation omitted
    });
Search is a very interesting and voluminous topic. I tried to answer as briefly as possible, so some moments may be confusing; do not hesitate to ask.
I could not achieve fuzziness and wildcard search in one query.
This is the closest solution I could get: I had to fire two different queries and merge the results manually.
@Query("{\"wildcard\" : {\"colorDescriptionCode\" : \"?0\" }}")
Page<ColorDescription> findByWildCard(String colorDescriptionCode, Pageable pageable);

@Query("{\"match\": { \"colorDescriptionCode\": { \"query\": \"?0\", \"fuzziness\": 1 }}}")
Page<ColorDescription> findByFuzzy(String colorDescriptionCode, Pageable pageable);
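The manual merge of the two result pages can be sketched like this (MergeResults and the id lists are hypothetical stand-ins for the repository results, not part of the original code):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MergeResults {
    // Union of the wildcard hits and the fuzzy hits, de-duplicated by
    // document id and preserving first-seen order.
    static List<String> mergeById(List<String> wildcardIds, List<String> fuzzyIds) {
        Set<String> seen = new LinkedHashSet<>();
        seen.addAll(wildcardIds);
        seen.addAll(fuzzyIds);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> wildcard = java.util.Arrays.asList("SaphireBlue", "BlueSaphire", "Blue");
        List<String> fuzzy = java.util.Arrays.asList("Blue", "Bluo", "alue");
        System.out.println(mergeById(wildcard, fuzzy));
    }
}
```

One caveat of merging by hand: the relevance scores of the two queries are not comparable, so any combined ordering is a design choice rather than a true ranking.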

Elasticsearch. Can not find custom analyzer

I have a model like this:
@Getter
@Setter
@Document(indexName = "indexName", type = "typeName")
@Setting(settingPath = "/elastic/elastic-setting.json")
public class Model extends BaseModel {
    @Field(type = FieldType.String, index = FieldIndex.analyzed, analyzer = "customAnalyzer")
    private String name;
}
And I have elastic-setting.json inside ../resources/elastic/elastic-setting.json:
{
    "index": {
        "number_of_shards": "1",
        "number_of_replicas": "0",
        "analysis": {
            "analyzer": {
                "customAnalyzer": {
                    "type": "custom",
                    "tokenizer": "uax_url_email"
                }
            }
        }
    }
}
I cleaned my Elasticsearch DB, and when I start my application I get this exception:
MapperParsingException[analyzer [customAnalyzer] not found for field [name]]
What's wrong with my code?
Help me, please!
EDIT
Val, I thought @Setting was an addition to @Document, but it looks like they are used interchangeably.
In my case I also have another model, with:
@Document(indexName = "indexName", type = "anotherTypeName")
So first the index named "indexName" is created for that other model; then, when Elasticsearch prepares Model, it sees that an index named "indexName" has already been created, and it does not apply @Setting.
Now I have another question: how do I add a custom analyzer to an already created index in Java code, for example in an InitializingBean? Something like: is my analyzer created? If not, create it; if yes, do nothing.
Modify your elastic-setting.json file like this:
{
    "index": {
        "number_of_shards": "1",
        "number_of_replicas": "0"
    },
    "analysis": {
        "analyzer": {
            "customAnalyzer": {
                "type": "custom",
                "tokenizer": "uax_url_email"
            }
        }
    }
}
Note that you need to delete your index first and recreate it.
UPDATE
You can certainly add a custom analyzer via Java code; however, you won't be able to change your existing mapping to use that analyzer, so you're really better off wiping your index and recreating it from scratch with a proper elastic-setting.json file.
For Val:
Yeah, I use something like this.
Previously, I had added @Setting to one of my entity classes, but when I started the app, an index with that name had already been created before Spring Data analysed the entity with @Setting, so the index was not modified.
Now I add the annotation @Setting(path = "elastic-setting.json") to the abstract baseModel; the class at the top of the hierarchy is scanned first, and the analyzer is created as well.

Spring data elasticsearch query products with multiple fields

ES newbie here, sorry for the dumb question.
I have been trying to create an Elasticsearch query for a products index. I'm able to query it, but it never returns what I expect.
I'm probably using the query builder the wrong way; I have tried all sorts of query builders and never got it to work as I expected.
My Product class (simplified for the sake of the question):
public class Product {
    private String sku;
    private Boolean soldOut;
    private Boolean freeShipping;
    private Store store;
    private Category category;
    private Set<ProductUrl> urls;
    private Set<ProductName> names;
}
Category has a name and an id, which I use for aggregations.
The boolean fields are used for filters.
ProductName and ProductUrl both have a String locale and a String name or String url, respectively.
I am currently building my query with the following logic:
private SearchQuery buildSearchQuery(String searchTerm, List<Long> categories, Pageable pageable) {
    NativeSearchQueryBuilder builder = new NativeSearchQueryBuilder();
    if (searchTerm != null) {
        builder.withQuery(
            new MultiMatchQueryBuilder(searchTerm, "names.name", "urls.url", "descriptions.description", "sku")
                .operator(Operator.AND)
                .type(MultiMatchQueryBuilder.Type.MOST_FIELDS)
        );
    }
    builder.withPageable(pageable);
    return builder.build();
}
The problem is that lots of products are not being matched. For example, the query "andro" does not return "android" products.
What am I missing? Is this way of building the query right?
UPDATE
Adding the names part of my product mapping:
{
    "mappings": {
        "product": {
            "properties": {
                "names": {
                    "properties": {
                        "id": {
                            "type": "long"
                        },
                        "name": {
                            "type": "string"
                        },
                        "locale": {
                            "type": "string"
                        }
                    }
                }
            }
        }
    }
}
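One likely cause, given this mapping: the name field has no custom analyzer, so the default standard analyzer indexes only whole-word tokens such as "android", and a multi-match for "andro" finds no matching term. An edge n-gram index analyzer (as discussed in the wildcard question above) would index the needed prefixes. A plain-Java sketch of the mismatch (PrefixMatchSketch is a hypothetical illustration, not Elasticsearch code):

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixMatchSketch {
    // Mimics an edge_ngram filter: all prefixes of the token within bounds.
    static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Standard analysis stores the single token "android":
        // no indexed term equals "andro", so the match query misses.
        System.out.println("android".equals("andro"));                      // false
        // With edge n-grams, "andro" is one of the indexed prefix tokens.
        System.out.println(edgeNgrams("android", 2, 20).contains("andro")); // true
    }
}
```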
