How should I query Elastic Search given my mapping and using keywords? - elasticsearch

I have a very simple mapping which looks like this (I streamlined the example a bit):
{
"location" : {
"properties": {
"name": { "type": "string", "boost": 2.0, "analyzer": "snowball" },
"description": { "type": "string", "analyzer": "snowball" }
}
}
}
Now I index a lot of locations using some random values which are based on real English words.
I'd like to be able to search for locations that match any of the given keywords in either the name or the description field (name is more important, hence the boost I gave it). I tried a few different queries and they don't return any results.
{
"fields" : ["name", "description"],
"query" : {
"terms" : {
"name" : ["savage"],
"description" : ["savage"]
},
"from" : 0,
"size" : 500
}
}
Considering there are locations which have the word savaged in the description, it should get me some results (savage is the stem of savaged). It yields 0 results using the above query. I've been using curl to query ES:
curl -XGET -d @query.json http://localhost:9200/myindex/locations/_search
If I use a query string instead:
curl -XGET http://localhost:9200/fieldtripfinder/locations/_search?q=description:savage
I actually get one result (although, of course, it now searches only the description field).
Basically, I am looking for a query that does an OR-style search using multiple keywords and compares them to the values in both the name and the description field.

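Snowball stems "savage" into "savag", which is why the terms query for "savage" didn't return any results: terms queries are not analyzed, so they look for the exact token "savage", which is not in the index. When you pass "savage" in the URL, the query string is analyzed and you get results. Depending on your intention, you can either use the correct stem ("savag") or have your terms analyzed by using a "match" query instead of "terms" (note that "from" and "size" belong at the top level of the request body, not inside "query"):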
{
"fields" : ["name", "description"],
"query" : {
"bool" : {
"should" : [
{"match" : {"name" : "savage"}},
{"match" : {"description" : "savage"}}
]
}
},
"from" : 0,
"size" : 500
}
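The question also asks for several keywords at once. A minimal sketch in Python, assuming the same locations mapping: it only builds the request body (the helper name build_keyword_query is hypothetical) and relies on the fact that a match query analyzes its text and, with its default OR operator, matches documents containing any of the words.

```python
import json


def build_keyword_query(keywords, fields=("name", "description"), size=500):
    """Hypothetical helper: OR-style search for any keyword in any of the given fields."""
    # match analyzes this text with each field's analyzer (snowball here), so
    # "savage" is stemmed the same way as the indexed text. With the default
    # operator (OR), a document matches if it contains any of the keywords.
    text = " ".join(keywords)
    return {
        "query": {
            "bool": {
                # one match clause per field; at least one clause must match
                "should": [{"match": {field: text}} for field in fields]
            }
        },
        # paging options live at the top level of the body, not inside "query"
        "from": 0,
        "size": size,
    }


if __name__ == "__main__":
    body = build_keyword_query(["savage", "forest"])
    print(json.dumps(body, indent=2))
```

The printed JSON can be saved as query.json and sent with the curl command from the question. Joining the keywords into one match string keeps the body short; one clause per keyword and field would work as well.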

Related

How do I get the occurrences and doc_count of every term of an ES index?

I have an ES Index of the form
{
"adminfile" : {
"mappings" : {
"properties" : {
"text" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
The field 'title' holds the title of the text stored in the field 'text'. The titles contain no spaces, while the texts are normal prose (sentences with spaces, periods, etc.).
I want to get all the terms in the index and their doc_count and/or frequency. I found this query in the ES doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
GET /adminfile/_search
{
"size": 10,
"aggs" : {
"text" : {
"terms" : {
"field" : "text.keyword",
"order" : { "_count" : "asc" },
"size": 10
}
}
}
}
This returns all the sources but the aggregation buckets are empty. If I change "text.keyword" to "title.keyword" in that command, it does work and return all the titles as keys.
Why does it not work on the text fields?
Is there a better command to use? I know that this:
GET /adminfile/_search
{
"query" : {
"match" : {"text" : "WordToSearch"}
},
"_source":false,
"aggregations": {
"keywords" : {
"significant_text" : {
"field" : "text",
"filter_duplicate_text": true,
"size": 100
}
}
},
"highlight": {
"fields": {
"text": {}
}
}
}
works to get all occurrences of WordToSearch in every document of the index, with the counts and frequencies. Is there a way to make this command match every word of every document?
EDIT: I have also tried renaming the text field to "contenu", in case ES didn't like having a field named 'text' of type 'text'. No effect.
Another option could be the term vectors API (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html), but _termvectors only works for one specific ID (or _mtermvectors for multiple specific IDs), not for all the documents.
EDIT2: I realised that ignore_above could be the problem. As a test, I cut all my texts to 200 characters. The query now runs, but it returns the entire text as a key instead of splitting it into words.
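When you use the keyword version of the field, the content is kept as a single, large token. You're right in assuming that ignore_above is the source of your problem: these tokens are apparently longer than 256 characters in your data set, and values longer than ignore_above are not indexed in the keyword field at all, so there is nothing to aggregate.
If you aggregate on the tokenized field (the plain text field) instead of its keyword version, you'll get counts for each word (i.e. each token) as produced by the field's analyzer. Note that since Elasticsearch 5.0, aggregating on a text field requires enabling fielddata on it in the mapping, which can use a lot of heap memory.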
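A sketch of the two request bodies in Python, assuming Elasticsearch 5.0 or later (where text fields have fielddata disabled by default) and the adminfile mapping from the question; the aggregation name words is arbitrary.

```python
import json

# 1) PUT /adminfile/_mapping : enable fielddata on the analyzed "text" field.
#    Fielddata is loaded into heap memory, so this can be costly on large indices.
enable_fielddata = {
    "properties": {
        "text": {
            "type": "text",
            "fielddata": True,
            # the existing keyword sub-field, repeated to match the current mapping
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    }
}

# 2) GET /adminfile/_search : one bucket per token produced by the analyzer.
#    Each bucket's doc_count is the number of documents containing the token,
#    not the total number of occurrences.
word_counts = {
    "size": 0,  # only the aggregation is needed, not the hits
    "aggs": {
        "words": {
            "terms": {"field": "text", "size": 100}
        }
    },
}

print(json.dumps(enable_fielddata, indent=2))
print(json.dumps(word_counts, indent=2))
```

If per-term occurrence counts (rather than document counts) are needed, the term vectors API mentioned in the question remains the more direct source, one document at a time.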

Elasticsearch slow results with IN query and Scoring

I have about 500k text documents stored in Elasticsearch, where each document's text is mapped to its corresponding document number.
I am trying to fetch results in batches for "Sample Text" within a particular set of document numbers (approximately 300k), with scoring, and the queries are extremely slow.
Here is the mapping:
PUT my_index
{
"mappings" : {
"doc_repo" : {
"properties" : {
"doc_number" : {
"type" : "integer"
},
"document" : {
"type" : "string",
"term_vector" : "with_positions_offsets_payloads"
}
}
}
}
}
Here is the request query
{
"query" : {
"bool" : {
"must" : [
{
"terms" : {
"document" : [
"sample text"
]
}
},
{
"terms" : {
"doc_number" : [1,2,3....,300K] //ArrayOf_300K_DocNumbers
}
}
]
}
},
"fields" : [
"doc_number"
],
"size" : 500,
"from" : 0
}
I tried fetching results in two other ways:
Results without scoring within the particular set of document numbers (I used filtering for this)
Results with scoring but without any particular set of document numbers (in batches)
Both of these were pretty quick, but the problem appears when I try to achieve both.
Do I need to change the mapping, the search query, or something else to achieve this?
Thanks in advance.
The issue was specific to Elasticsearch 2.x; upgrading Elasticsearch solved it.
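Independently of the upgrade, the request can be restructured so that the document-number restriction does not take part in scoring. This is a sketch, not the answerer's fix, assuming Elasticsearch 2.x or later, where the bool query accepts a filter clause; the helper name is hypothetical. It also swaps the question's terms clause on the analyzed document field for a match query, since terms is not analyzed.

```python
import json


def build_scored_filtered_query(text, doc_numbers, page=0, page_size=500):
    """Hypothetical helper: score on the text, restrict to doc_numbers without scoring."""
    return {
        "query": {
            "bool": {
                # scored part: analyzed full-text match on the document body
                "must": [{"match": {"document": text}}],
                # filter context: narrows the candidate set without affecting
                # the score, and filter results can be cached
                "filter": [{"terms": {"doc_number": list(doc_numbers)}}],
            }
        },
        "_source": ["doc_number"],  # return only the document number
        "from": page * page_size,
        "size": page_size,
    }


if __name__ == "__main__":
    print(json.dumps(build_scored_filtered_query("sample text", [1, 2, 3]), indent=2))
```

Note that in recent versions a terms query accepts at most index.max_terms_count values (65,536 by default), and from + size is limited by index.max_result_window (10,000 by default), so a 300k-value list and deep paging would need a different approach there.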

Asking for significant terms but returns nothing

I am having an issue with Elasticsearch (version 2.0): I am trying to get the significant terms from a set of documents, but it always returns nothing.
Here is the schema of my index:
{
"documents" : {
"warmers" : {},
"mappings" : {
"document" : {
"properties" : {
"text" : {
"index" : "not_analyzed",
"type" : "string"
},
"entities": {
"properties": {
"text": {
"index": "not_analyzed",
"type": "string"
}
}
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1447410095617",
"uuid" : "h2m2J9sJQaCpxvGDI591zg",
"number_of_replicas" : "1",
"version" : {
"created" : "2000099"
},
"number_of_shards" : "5"
}
},
"aliases" : {}
}
}
So it's a simple index that contains the field text, which is not analyzed, and an array entities containing objects with a single field, text, which is not analyzed either.
What I want to do is match some of the documents and extract the most significant terms from their associated entities. For that, I use a wildcard query and then an aggregation.
Here is the request I am sending through curl:
curl -XGET 'http://localhost:9200/documents/_search' -d '{
"query": {
"bool": {
"must": {"wildcard": {"text": "*test*"}}
}
},
"aggregations" : {
"my_significant_terms" : {
"significant_terms" : { "field" : "entities.text" }
}
}
}'
Unfortunately, even if Elasticsearch is hitting on some documents, the buckets of the significant terms aggregation are always empty.
I tried to put analyzed instead of not_analyzed also, but I got the same empty results.
So first, is this the right approach?
I am a complete beginner with Elasticsearch, so can you explain how the significant terms aggregation works?
And finally, if the approach is right, why isn't my query working?
EDIT: I just saw in the Elasticsearch documentation that the significant terms aggregation needs a certain amount of data to become effective, and I only have 163 documents in my index. Could that be the cause?
Not sure if it will help, but try specifying
"min_doc_count" : 1
in the significant_terms aggregation (the default is 3).
As for your edit (whether having only 163 documents could be the cause): using 1 shard instead of 5 will help if you have a small number of documents.
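Both suggestions can be combined. A sketch of the two request bodies in Python, assuming the index is recreated (the mapping from the question is left out for brevity and would go under "mappings"); number_of_shards can only be set when an index is created, so the existing documents would have to be reindexed.

```python
import json

# PUT /documents : a single primary shard for a small index
create_index = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 1}
}

# GET /documents/_search : same query as in the question, with a lower threshold
significant_entities = {
    "query": {"bool": {"must": {"wildcard": {"text": "*test*"}}}},
    "aggregations": {
        "my_significant_terms": {
            "significant_terms": {
                "field": "entities.text",
                # a term must appear in at least this many matching documents;
                # the default of 3 is strict for an index of 163 documents
                "min_doc_count": 1,
            }
        }
    },
}

print(json.dumps(create_index, indent=2))
print(json.dumps(significant_entities, indent=2))
```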

Favor exact matches over nGram in elasticsearch

I am trying to map a field both as nGram and as an 'exact' match, and make the exact matches appear first in the search results. This is an answer to a similar question, but I am struggling to make it work.
No matter what boost value I specify for the 'exact' field, I get the same result order each time. This is what my field mapping looks like:
"name" : {
"type" : "multi_field",
"fields" : {
"name" : {
"type" : "string",
"boost" : 2.0,
"analyzer" : "ngram"
},
"exact" : {
"type" : "string",
"boost" : 4.0,
"analyzer" : "simple",
"include_in_all" : false
}
}
}
And this is what the query looks like:
{
"query": {
"filtered": {
"query": {
"query_string": {
"fields":["name","name.exact"],
"query":"Woods"
}
}
}
}
}
Understanding how the score is calculated
Elasticsearch has an option for producing an explanation with every search result, enabled by setting the explain parameter to true:
POST <Index>/<Type>/_search?explain&format=yaml
{
"query" : " ....."
}
This produces a lot of output for every hit, which can be overwhelming, but it is worth taking some time to understand what it all means.
The explain output can be hard to read as JSON, so adding format=yaml makes it easier to read.
Understanding why a document is matched or not
You can run the query against a specific document, as below, to see an explanation of how the matching is done:
GET <Index>/<type>/<id>/_explain
{
"query": "....."
}
The multi_field mapping is correct, but the search query needs to be changed from query_string to multi_match, with use_dis_max set to false so that the engine computes a "sum of" the field scores instead of a "max of" them (use_dis_max is deprecated in the latest versions but works with 0.x):
{
"query": {
"filtered": {
"query": {
"multi_match": {
"fields": ["name","name.exact"],
"query": "Woods",
"use_dis_max": false
}
}
}
}
}
Now the score takes the 'exact' match into account, which adds to the total.
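In later versions, use_dis_max was superseded by the type parameter of multi_match, and the filtered query was removed in 5.0. A rough modern equivalent, sketched in Python under those assumptions: most_fields sums the scores of all matching fields, and the ^4 suffix is a query-time boost standing in for the mapping-level boost (the helper name is hypothetical).

```python
import json


def build_exact_first_query(text):
    """Hypothetical helper: add up scores from the nGram field and its exact sub-field."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # ^4 boosts the exact sub-field at query time
                "fields": ["name", "name.exact^4"],
                # most_fields combines the score of every matching field,
                # similar to what "use_dis_max": false did in 0.x
                "type": "most_fields",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_exact_first_query("Woods"), indent=2))
```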

Elasticsearch phrase prefix query on multiple fields

I'm new to ES and I'm trying to build a query that uses phrase_prefix on multiple fields, so I don't have to search more than once.
Here's what I've got so far:
{
"query" : {
"text" : {
"first_name" : {
"query" : "Gustavo",
"type" : "phrase_prefix"
}
}
}
}
Does anybody know how to search more than one field, say "last_name"?
The text query that you are using was deprecated (effectively renamed) a while ago in favour of the match query. The match query supports a single field, but you can use the multi_match query, which supports the very same options and allows you to search multiple fields. Here is an example that should be helpful to you:
{
"query" : {
"multi_match" : {
"fields" : ["title", "subtitle"],
"query" : "trying out ela",
"type" : "phrase_prefix"
}
}
}
You can achieve the same using the Java API like this:
QueryBuilders.multiMatchQuery("trying out ela", "title", "subtitle")
.type(MatchQueryBuilder.Type.PHRASE_PREFIX);
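Applied to the fields from the question, the same query could be built as below; a minimal Python sketch with a hypothetical helper name that only produces the request body.

```python
import json


def build_name_prefix_query(text, fields=("first_name", "last_name")):
    """Hypothetical helper: phrase_prefix search over several name fields at once."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                # the last word of the input is treated as a prefix
                "type": "phrase_prefix",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_name_prefix_query("Gustavo"), indent=2))
```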
