Multi word partial search - elasticsearch

I am very new to Elasticsearch and would like to know how to do a partial multi-word search.
For example, given this document:
{
"title":"harry porter"
}
I need this document to be returned when I search with the following string:
1.) har por
This is the same as the SQL query (select * from books where title like '%har%' or title like '%por%').

Using a completion suggester will provide most of the features you want. It finds words starting with an arbitrary string, like "har" or "por".
Check out this question for a full example of how to set up a completion suggester.
As described in the documentation, you can achieve multi-word search (i.e. returning "harry porter" from a search for "por") by mapping the completion field with the option preserve_position_increments set to false:
PUT books
{
"mappings": {
"book" : {
"properties" : {
"suggest" : {
"type" : "completion",
"preserve_position_increments": false
},
"title" : {
"type": "keyword"
}
}
}
}
}

Refer to the Edge NGram Tokenizer documentation.
It helps with partial multi-word search (similar to autocomplete suggestions). Hope this helps!
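To see why edge n-grams let a query like "har por" find "harry porter", here is a minimal Python sketch that imitates the idea outside Elasticsearch. The helper names (edge_ngrams, index_terms, matches) and the min_gram/max_gram defaults are assumptions made for illustration; in a real index the tokenizer and analyzer do this work.

```python
def edge_ngrams(word, min_gram=1, max_gram=20):
    """Return the prefixes of `word` that are min_gram..max_gram characters long."""
    longest = min(max_gram, len(word))
    return [word[:size] for size in range(min_gram, longest + 1)]


def index_terms(text, min_gram=1, max_gram=20):
    """Lowercase the text, split it on whitespace and collect each word's edge n-grams."""
    terms = set()
    for word in text.lower().split():
        terms.update(edge_ngrams(word, min_gram, max_gram))
    return terms


def matches(query, text, require_all=False):
    """Check the query words against the indexed prefixes.

    By default any query word may match (like the OR in the SQL example);
    require_all=True demands that every query word matches.
    """
    indexed = index_terms(text)
    hits = [word in indexed for word in query.lower().split()]
    return all(hits) if require_all else any(hits)


# Hypothetical usage with the document from the question.
print(matches("har por", "harry porter", require_all=True))
print(matches("arr", "harry porter"))
```

Note that edge n-grams only cover prefixes: "arr" does not match here, whereas the SQL pattern '%arr%' would. Matching in the middle of a word needs a plain ngram rather than an edge ngram.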

Related

Elasticsearch doesn't suggest anything if the exact word is used as text?

I'm using the term suggester of Elasticsearch. My index contains a document which has a field name whose value is crick. This is my request:
{
"suggest": {
"my-suggest" : {
"text" : "crick",
"term" : {
"field" : "name",
"sort": "score"
}
}
}
}
It returns no match; it only returns a value if the text is misspelled.
If I pass the exact text it returns nothing. Any idea?
You are not setting suggest_mode.
The suggest mode controls which suggestions are included, or for which suggest text terms suggestions should be suggested. Three possible values can be specified:
missing: Only provide suggestions for suggest text terms that are not in the index. This is the default.
popular: Only suggest suggestions that occur in more docs than the original suggest text term.
always: Suggest any matching suggestions based on terms in the suggest text.
Since you haven't specified suggest_mode, it defaults to missing.
Use these settings:
{
"suggest": {
"my-suggest" : {
"text" : "crick",
"term" : {
"field" : "name",
"sort": "score",
"suggest_mode": "always"
}
}
}
}

ElasticSearch filter on exact url

Let's say I create this document in my index:
PUT /nursery/rhyme/1
{
"url" : "http://example.com/mary",
"text" : "Mary had a little lamb"
}
Why does this query not return anything?
POST /nursery/rhyme/_search
{
"query" : {
"match_all" : {}
},
"filter" : {
"term" : {
"url" : "http://example.com/mary"
}
}
}
The Term Query finds documents that contain the exact term specified in the inverted index. When you save the document, the url property is analyzed and it will result in the following terms (with the default analyzer) : [http, example, com, mary].
So what you currently have in your inverted index is that set of terms, and none of them is http://example.com/mary.
What you want is either to not analyze the url property, or to use a Match Query, which splits the query into terms just as at index time.
Exact match does not work on an analyzed field. A string is analyzed by default, which means the string http://example.com/mary is split and stored in the inverted index as http, example, com, mary. That's why your query returns nothing.
You can make your field not analyzed:
{
"url": {
"type": "string",
"index": "not_analyzed"
}
}
But for this you will have to reindex your data.
Read up on not_analyzed and the term query.
Hope this helps.
In Elasticsearch 7.x you have to use the type "keyword" in the mapping properties, which is not analyzed: https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html
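The mismatch between the analyzed terms and the term filter can be reproduced with a small Python sketch. The analyze function below is a crude regex stand-in for an analyzer, chosen only so that it produces the token list quoted above; a real analyzer may tokenize the URL differently. Both helper names are hypothetical.

```python
import re


def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_lookup(indexed_terms, term):
    """A term lookup compares the search term as-is against the indexed terms."""
    return term in indexed_terms


url = "http://example.com/mary"

analyzed_terms = analyze(url)   # what an analyzed field would store
keyword_terms = [url]           # what a not_analyzed / keyword field would store

print(analyzed_terms)
print(term_lookup(analyzed_terms, url))   # the full URL is not one of the terms
print(term_lookup(keyword_terms, url))    # the full URL is stored verbatim
```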

Full-text schema in ElasticSearch

I'm (extremely) new to ElasticSearch so forgive my potentially ridiculous question. I currently use MySQL to perform full-text searches, and want to move this to ElasticSearch. Currently my table has a fulltext index spanning three columns:
title,description,tags
In ES, each document would therefore have title, description and tags fields, allowing me to do a fulltext search for a general phrase, or filter on a given tag.
I also want to add further searchable fields such as username (so I can retrieve posts by a given user). So, how do I specify that a fulltext search should match title OR description OR tags but not username?
From the OR filter example, I'd assume I'd have to use something like this:
{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"or" : [
{
"term" : { "title" : "foobar" }
},
{
"term" : { "description" : "foobar" }
},
{
"term" : { "tags" : "foobar" }
}
]
}
}
}
Coming at this new, it doesn't seem like this is very efficient. Is there a better way of doing this, or do I need to move the username field to a separate index?
This is fine.
In general I would suggest getting familiar with ElasticSearch mapping types and options.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html
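As a possible alternative to the OR filter, a multi_match query searches several fields with one piece of text and leaves username out simply by not listing it. The sketch below only builds the request body as a Python dict. The build_search helper is hypothetical, and the username variant assumes an Elasticsearch version whose bool query accepts a filter clause and a username field that is not analyzed.

```python
import json


def build_search(text, username=None):
    """Build a search body matching `text` in title, description or tags.

    If `username` is given, the results are additionally restricted to that
    user with a non-scoring term filter.
    """
    full_text = {
        "multi_match": {
            "query": text,
            "fields": ["title", "description", "tags"],
        }
    }
    if username is None:
        return {"query": full_text}
    return {
        "query": {
            "bool": {
                "must": full_text,
                "filter": {"term": {"username": username}},
            }
        }
    }


# Hypothetical usage: print the body you would send to the _search endpoint.
print(json.dumps(build_search("foobar", username="alice"), indent=2))
```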

ES Search partial word - ngram?

I am using Elastic Search to index entities that contain two fields: agencyName and agencyAddress.
Let's say I have indexed one entity:
{
"agencyName": "Turismo Viajes",
"agencyAddress": "Av. Maipú 500"
}
I would like to be able to search for this entity and get the entity above searching through the agencyName. Different searches could be:
1) urismo
2) Viaje
3) Viajes
4) Turismo
5) uris
The idea is that if I query with those strings I should always get that entity (probably with different score depending on how accurate it is).
For this I thought that nGram would work, so I defined a global analyzer called phrase in my elasticsearch.yml file:
index:
  analysis:
    analyzer:
      phrase:
        type: custom
        tokenizer: nGram
        filter: [nGram, lowercase, asciifolding]
And I created the agency index like this:
{
"possible_clients" : {
"possible_client" : {
"properties" : {
"agencyName" : {
"type" : "string",
"analyzer" : "phrase"
},
"agencyAddress" : {
"type": "string"
}
}
}
}
}
The problem is that when making a call like this:
curl -XPOST 'http://localhost:9200/possible_clients/possible_client/_search' -d '{
"query": { "term": { "agencyName": "uris" }}
}'
I don't get any hits. Any ideas what I am doing wrong?
Thanks in advance.
You are using a term query for searching. A term query is never analyzed, so your search string is compared as-is against the indexed terms and changing the analyzer has no effect on it. You should use, for example, a match query.
According to the docs, the default value of max_gram for your tokenizer is 2. So you index tu, ur, ri, is, sm, mo, etc.
The term query does not analyze your input, so you are searching for uris, and uris was never indexed.
Try setting max_gram explicitly; see the docs for:
ngram tokenizer
ngram tokenfilter
Also, maybe you should not use both the ngram tokenizer and the ngram filter. I have always used just the filter (with the whitespace tokenizer).
Here is an edgengram filter we had to define. Ngrams should work just the same.
"filter" : {
"my_filter" : {
"type" : "edgeNGram",
"min_gram" : "1",
"max_gram" : "20"
}
}
Hope it helps.
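The effect of max_gram on the search for uris can be checked with a short Python sketch. It generates character n-grams per lowercased word, which is a simplification of what the real tokenizer and filters do; the ngrams helper and the value 20 for the larger max_gram are assumptions made for illustration.

```python
def ngrams(word, min_gram, max_gram):
    """Return every substring of `word` that is min_gram..max_gram characters long."""
    word = word.lower()
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(word) - size + 1):
            grams.add(word[start:start + size])
    return grams


# With max_gram = 2 only very short grams are indexed.
short_grams = ngrams("Turismo", 1, 2)
print("ur" in short_grams)     # a 2-character gram is present
print("uris" in short_grams)   # a 4-character gram is not

# With a larger max_gram the 4-character gram exists, so an un-analyzed
# term lookup for "uris" has something to hit.
long_grams = ngrams("Turismo", 1, 20)
print("uris" in long_grams)
```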

Exact (not substring) matching in Elasticsearch

This query:
{"query":{
"match" : {
"content" : "2"
}
}}
matches all the documents whose content contains the number 2. However, I would like the content to be exactly 2, no more and no less; think of my requirement in the spirit of Java's String.equals.
Similarly, for the second query I would like to match only when the document's content is exactly '3 3' and nothing more or less:
{"query":{
"match" : {
"content" : "3 3"
}
}}
How can I do exact (String.equals) matching in Elasticsearch?
Without seeing your index type mapping and sample data, it's hard to answer this directly - but I'll try.
Offhand, I'd say this is similar to this answer here (https://stackoverflow.com/a/12867852/382774), where you simply set the content field's index option to not_analyzed in your mapping:
"url" : {
"type" : "string",
"index" : "not_analyzed"
}
Edit: I wasn't clear enough with my original answer, shown above. I did not mean to imply that you should add the example code to your query, I meant that you need to specify in your index type mapping that the url field is of type string and it is indexed but not analyzed (not_analyzed).
This tells Elasticsearch to not bother analyzing (tokenizing or token filtering) the field when you're indexing your documents - just store it in the index as it exists in the document. For more information on mappings, see http://www.elasticsearch.org/guide/reference/mapping/ for an intro and http://www.elasticsearch.org/guide/reference/mapping/core-types/ for specifics on not_analyzed (tip: search for it on that page).
Update:
The official docs say that in newer versions of Elasticsearch you can't define a field as "not_analyzed"; instead you should use the "keyword" type.
For old versions of Elasticsearch:
{
"foo": {
"type": "string",
"index": "not_analyzed"
}
}
For new versions:
{
"foo": {
"type": "keyword",
"index": true
}
}
Note that the keyword type is available from Elasticsearch 5.0, and the backward compatibility layer is removed in the Elasticsearch 6.0 release.
Official Doc
You should use a filter instead of match:
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"content" : 2
}
}
}
}
}
This returns docs whose content is exactly 2, rather than 20 or 2.1.
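The difference between match on an analyzed field and an exact lookup on a not-analyzed (keyword) field can be imitated in a few lines of Python. This is only a sketch under simplifying assumptions: analysis is reduced to lowercasing and splitting on whitespace, and the sample documents and helper names are made up for illustration.

```python
def analyze(text):
    """Simplified analysis: lowercase and split on whitespace."""
    return text.lower().split()


def match_query(docs, text):
    """Like match on an analyzed field: a doc matches if it shares any token."""
    wanted = set(analyze(text))
    return [doc for doc in docs if wanted & set(analyze(doc))]


def exact_lookup(docs, value):
    """Like term on a keyword field: the whole value must be equal."""
    return [doc for doc in docs if doc == value]


docs = ["2", "2 3", "3 3", "20"]

print(match_query(docs, "2"))      # every doc containing the token 2
print(exact_lookup(docs, "2"))     # only the doc that equals "2"
print(exact_lookup(docs, "3 3"))   # only the doc that equals "3 3"
```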
