I am currently learning Elasticsearch and am stuck on the issue described below:
On an existing index (I don't know if it matters) I added this new mapping:
PUT user-index
{
"mappings": {
"properties": {
"common_criteria": { -- new property which aggregates other properties by copy_to
"type": "text"
},
"name": { -- already existed before this mapping
"type": "text",
"copy_to": "common_criteria"
},
"username": { -- already existed before this mapping
"type": "text",
"copy_to": "common_criteria"
},
"phone": { -- already existed before this mapping
"type": "text",
"copy_to": "common_criteria"
},
"country": { -- already existed before this mapping
"type": "text",
"copy_to": "common_criteria"
}
}
}
}
The goal is to search ONE or MORE values only on common_criteria.
Say that we have:
{
"common_criteria": ["John Smith","johny","USA"]
}
What I would like to achieve is exact matching on one or more values of common_criteria:
We should get a result if we search for John Smith, for USA + John Smith, for johny + USA, for USA alone, for johny alone, and finally for John Smith + USA + johny (the word order does not matter).
If we search with multiple words like John Smith + Germany or johny + England, we should not get a result.
I am using Spring Data Elastic to build my query:
NativeSearchQueryBuilder nativeSearchQuery = new NativeSearchQueryBuilder();
BoolQueryBuilder booleanQuery = QueryBuilders.boolQuery();
String valueToSearch = "johny";
nativeSearchQuery.withQuery(booleanQuery.must(QueryBuilders.matchQuery("common_criteria", valueToSearch)
.fuzziness(Fuzziness.AUTO)
.operator(Operator.AND)));
Logging the request sent to Elasticsearch, I get:
{
"bool" : {
"must" :
{
"match" : {
"common_criteria" : {
"query" : "johny",
"operator" : "AND",
"fuzziness" : "AUTO",
"prefix_length" : 0,
"max_expansions" : 50,
"fuzzy_transpositions" : true,
"lenient" : false,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"boost" : 1.0
}
}
},
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
With that request I get 0 results. I know the request is not correct because of the must/match condition, and maybe the common_criteria field is also not well defined.
Thanks in advance for your help and explanations.
EDIT: after trying the multi_match query.
Following #rabbitbr's suggestion I tried the multi_match query, but it does not seem to work. This is an example of a request sent to Elasticsearch (with 0 results):
{
"bool" : {
"must" : {
"multi_match" : {
"query" : "John Smith USA",
"fields" : [
"name^1.0",
"username^1.0",
"phone^1.0",
"country^1.0",
],
"type" : "best_fields",
"operator" : "AND",
"slop" : 0,
"fuzziness" : "AUTO",
"prefix_length" : 0,
"max_expansions" : 50,
"zero_terms_query" : "NONE",
"auto_generate_synonyms_phrase_query" : true,
"fuzzy_transpositions" : true,
"boost" : 1.0
}
},
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
That request does not return a result.
I would try to use the multi_match query before creating a field to store all the others in one place.
The multi_match query builds on the match query to allow multi-field
queries.
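As a rough sketch of that suggestion, a raw request against the question's fields could look like this (the cross_fields type and the and operator are my assumptions, not part of the original answer):
GET user-index/_search
{
  "query": {
    "multi_match": {
      "query": "John Smith USA",
      "fields": ["name", "username", "phone", "country"],
      "type": "cross_fields",
      "operator": "and"
    }
  }
}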
I have this search context in my index mapping (index: region):
"place_suggest": {
"type" : "completion",
"analyzer" : "simple",
"preserve_separators" : true,
"preserve_position_increments" : true,
"max_input_length" : 50,
"contexts" : [
{
"name" : "place_type",
"type" : "CATEGORY",
"path" : "place_type"
}
]
}
And I want to add a new context to this mapping
{
"name": "restricted",
"type": "CATEGORY",
"path": "restricted"
}
I've tried using the Update Mapping API to add the new context, like this:
PUT region_test/_mapping/
{
"properties" : {
"place_suggest" : {
"contexts": [
"name": "restricted",
"type": "CATEGORY",
"path": "restricted"
]
}
}
}
I'm using Kibana dev tools for running this query.
You will not be able to edit the field by adding the new context.
You need to create a new index with the updated mapping and reindex your data into it.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html#change-existing-mapping-parms
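A minimal sketch of that approach, assuming a new index named region_v2 (the name is hypothetical) that keeps the existing place_type context alongside the new restricted one; any other fields from the original mapping would need to be carried over as well:
PUT region_v2
{
  "mappings": {
    "properties": {
      "place_suggest": {
        "type": "completion",
        "analyzer": "simple",
        "contexts": [
          { "name": "place_type", "type": "CATEGORY", "path": "place_type" },
          { "name": "restricted", "type": "CATEGORY", "path": "restricted" }
        ]
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "region_test" },
  "dest": { "index": "region_v2" }
}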
I have about 15,000 scraped websites with their body text stored in an Elasticsearch index. I need to get the top 100 most-used three-word phrases across all these texts:
Something like this:
Hello there sir: 203
Big bad pony: 92
First come first: 56
[...]
I'm new to this. I looked into term vectors, but they appear to apply to single documents. So I feel the answer will be some combination of term vectors and aggregations with n-gram analysis of sorts, but I have no idea how to go about implementing this. Any pointers would be helpful.
My current mapping and settings:
{
"mappings": {
"items": {
"properties": {
"body": {
"type": "string",
"term_vector": "with_positions_offsets_payloads",
"store" : true,
"analyzer" : "fulltext_analyzer"
}
}
}
},
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
},
"analysis": {
"analyzer": {
"fulltext_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"type_as_payload"
]
}
}
}
}
}
What you're looking for are called Shingles. Shingles are like "word n-grams": serial combinations of more than one term in a string. (E.g. "We all live", "all live in", "live in a", "in a yellow", "a yellow submarine")
Take a look here: https://www.elastic.co/blog/searching-with-shingles
Basically, you need a field with a shingle analyzer producing solely 3-term shingles:
Use the configuration from the Elastic blog post, but with:
"filter_shingle":{
"type":"shingle",
"max_shingle_size":3,
"min_shingle_size":3,
"output_unigrams":"false"
}
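Plugged into index settings, that could look roughly like this (the index name items_shingled is hypothetical, and the text type with fielddata enabled is an assumption for newer Elasticsearch versions; with the question's older string mapping, fielddata is available by default):
PUT items_shingled
{
  "settings": {
    "analysis": {
      "filter": {
        "filter_shingle": {
          "type": "shingle",
          "max_shingle_size": 3,
          "min_shingle_size": 3,
          "output_unigrams": "false"
        }
      },
      "analyzer": {
        "analyzer_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "filter_shingle"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "analyzer_shingle",
        "fielddata": true
      }
    }
  }
}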
Then, after applying the shingle analyzer to the field in question (as in the blog post) and reindexing your data, you should be able to issue a simple terms aggregation on your body field to see the top one hundred 3-word phrases.
{
"size" : 0,
"query" : {
"match_all" : {}
},
"aggs" : {
"three-word-phrases" : {
"terms" : {
"field" : "body",
"size" : 100
}
}
}
}
EDIT: To add on to this, the synonyms seem to be working with basic query_string queries.
"query_string" : {
"default_field" : "location.region.name.raw",
"query" : "nh"
}
This returns all of the results for New Hampshire, but a "match" query for "nh" returns no results.
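For reference, the failing match query looks roughly like this (reconstructed for illustration; it is not shown verbatim in the original edit):
"match" : {
  "location.region.name" : "nh"
}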
I'm trying to add synonyms to the location fields in my Elasticsearch index, so that if I do a location search for "Mass", "Ma", or "Massachusetts" I'll get the same results each time. I added the synonym filter to my settings and changed the mapping for locations. Here are my settings:
analysis":{
"analyzer":{
"synonyms":{
"filter":[
"lowercase",
"synonym_filter"
],
"tokenizer": "standard"
}
},
"filter":{
"synonym_filter":{
"type": "synonym",
"synonyms":[
"United States,US,USA,USA=>usa",
"Alabama,Al,Ala,Ala",
"Alaska,Ak,Alas,Alas",
"Arizona,Az,Ariz",
"Arkansas,Ar,Ark",
"California,Ca,Calif,Cal",
"Colorado,Co,Colo,Col",
"Connecticut,Ct,Conn",
"Deleware,De,Del",
"District of Columbia,Dc,Wash Dc,Washington Dc=>Dc",
"Florida,Fl,Fla,Flor",
"Georgia,Ga",
"Hawaii,Hi",
"Idaho,Id,Ida",
"Illinois,Il,Ill,Ills",
"Indiana,In,Ind",
"Iowa,Ia,Ioa",
"Kansas,Kans,Kan,Ks",
"Kentucky,Ky,Ken,Kent",
"Louisiana,La",
"Maine,Me",
"Maryland,Md",
"Massachusetts,Ma,Mass",
"Michigan,Mi,Mich",
"Minnesota,Mn,Minn",
"Mississippi,Ms,Miss",
"Missouri,Mo",
"Montana,Mt,Mont",
"Nebraska,Ne,Neb,Nebr",
"Nevada,Nv,Nev",
"New Hampshire,Nh=>Nh",
"New Jersey,Nj=>Nj",
"New Mexico,Nm,N Mex,New M=>Nm",
"New York,Ny=>Ny",
"North Carolina,Nc,N Car=>Nc",
"North Dakota,Nd,N Dak, NoDak=>Nd",
"Ohio,Oh,O",
"Oklahoma,Ok,Okla",
"Oregon,Or,Oreg,Ore",
"Pennsylvania,Pa,Penn,Penna",
"Rhode Island,Ri,Ri & PP,R Isl=>Ri",
"South Carolina,Sc,S Car=>Sc",
"South Dakota,Sd,S Dak,SoDak=>Sd",
"Tennessee,Te,Tenn",
"Texas,Tx,Tex",
"Utah,Ut",
"Vermont,Vt",
"Virginia,Va,Virg",
"Washington,Wa,Wash,Wn",
"West Virginia,Wv,W Va, W Virg=>Wv",
"Wisconsin,Wi,Wis,Wisc",
"Wyomin,Wi,Wyo"
]
}
}
}
And the mapping for the location.region field:
"region":{
"properties":{
"id":{"type": "long"},
"name":{
"type": "string",
"analyzer": "synonyms",
"fields":{"raw":{"type": "string", "index": "not_analyzed" }}
}
}
}
But the synonyms analyzer doesn't seem to be doing anything. This query for example:
"match" : {
"location.region.name" : {
"query" : "Massachusetts",
"type" : "phrase",
"analyzer" : "synonyms"
}
}
This returns hundreds of results, but if I replace "Massachusetts" with "Ma" or "Mass" I get 0 results. Why isn't it working?
The order of the filters is
filter":[
"lowercase",
"synonym_filter"
]
So, since Elasticsearch lowercases the tokens first, by the time it executes the second step, synonym_filter, the tokens no longer match any of the entries you have defined (which contain uppercase letters).
To solve the problem, I would define the synonyms in lower case.
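A sketch of what that could look like (only a few entries shown; the remaining entries would be lowercased the same way):
"filter": {
  "synonym_filter": {
    "type": "synonym",
    "synonyms": [
      "united states,us,usa",
      "massachusetts,ma,mass",
      "new hampshire,nh"
    ]
  }
}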
You can also define your synonyms filter as case insensitive:
"filter":{
"synonym_filter":{
"type": "synonym",
"ignore_case" : "true",
"synonyms":[
...
]
}
}