I am new to Elasticsearch queries, and I have a problem that took me a long time to research and try to implement.
I have four fields to search on, but not all of them have to be filled in when searching; some fields can be left blank.
For example: employee first name, employee last name, job title, and department. I would like to cover use cases such as: enter just a first name and last name to get all people with that first and last name, or choose last name + first name + job title, or choose only last name and department and leave the other parameters blank.
Does anyone have an idea for the query? I would appreciate any suggestions.
Thank you
One way to achieve your use case is to use a bool query or a query_string query.
Adding a working example with index data, mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"department": {
"type": "text"
},
"fname": {
"type": "text"
},
"lname": {
"type": "text"
},
"title": {
"type": "text"
}
}
}
}
Index data:
{
"fname": "john",
"title": "faculty",
"department": "navy"
}
{
"fname": "smith",
"title": "faculty",
"department": "navy"
}
{
"fname": "Will",
"lname": "Smith",
"title": "Student",
"department": "engineering"
}
Search Query using bool query:
{
"query": {
"bool": {
"must": [
{
"match": {
"fname": "Smith"
}
},
{
"match": {
"title": "faculty"
}
},
{
"match": {
"department": "navy"
}
}
]
}
}
}
Search Result will be:
"hits": [
{
"_index": "67550433",
"_type": "_doc",
"_id": "2",
"_score": 1.9208363,
"_source": {
"fname": "Smith",
"title": "faculty",
"department": "navy"
}
}
]
Or you can even use query_string:
{
"query":{
"query_string":{
"query":"fname:smith AND title:faculty AND department:navy"
}
}
}
You can build your query programmatically as a key-value hash with a 'must' key: loop through your input fields and add a clause to the 'must' array only for the fields that were actually provided (see the sketch below).
Alternatively, you can put all the fields under a 'should' clause, on the assumption that blank values will match nothing and therefore have no impact on the score or results.
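For example, if the user only filled in last name and department, the bool query built by such a loop might look like this sketch (field names taken from the mapping above, values purely illustrative):
{
  "query": {
    "bool": {
      "must": [
        { "match": { "lname": "Smith" } },
        { "match": { "department": "engineering" } }
      ]
    }
  }
}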
Related
I'm implementing a search box in Elasticsearch and I have an Elasticsearch index with the following mappings:
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"brand": {
"type": "text"
}
}
}
}
And I'd like, quite simply, to do a query such as (in SQL):
SELECT * FROM <table> WHERE brand ILIKE '%test%' OR name ILIKE '%test%';
I've tried a query such as:
{
"query": {
"query_string": {
"query": "*test*",
"fields": ["brand", "name"]
}
}
}
and that gives me my desired result; however, I've noticed that the docs recommend not using query_string for a search box, as it can lead to performance issues.
I then tried a multi_match query:
{
"query": {
"multi_match" : {
"query": "test"
}
}
}
But that yielded no results. Further, when I used an ngram tokenizer, it returned all documents all the time.
I've consulted countless resources on this, and even on Stack Overflow there are many unanswered questions regarding this topic. Could somebody explain how this is achieved in the Elasticsearch world, or am I simply using the wrong tool for the job? Thanks.
Since you have not provided sample documents, I have created a complete example. What you are trying to do is very much possible in Elasticsearch with simple boolean should wildcard queries, as shown below:
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"name.keyword": {
"value": "*test*"
}
}
},
{
"wildcard": {
"brand.keyword": {
"value": "*test*"
}
}
}
],
"minimum_should_match": 1,
"boost": 1.0
}
}
}
You can test the above query on the sample documents below:
{
"brand" : "test",
"name" : "name foo according to use"
}
{
"brand" : "barand name is foo",
"name" : "name foo according to use"
}
{
"brand" : "barand name is test",
"name" : "name tested according to use"
}
{
"brand" : "barand name is testing",
"name" : "test the name"
}
On the above 4 sample documents, the query returns the documents below:
"hits": [
{
"_index": "73885469",
"_id": "1",
"_score": 2.0,
"_source": {
"brand": "barand name is testing",
"name": "test the name"
}
},
{
"_index": "73885469",
"_id": "2",
"_score": 2.0,
"_source": {
"brand": "barand name is test",
"name": "name tested according to use"
}
},
{
"_index": "73885469",
"_id": "4",
"_score": 1.0,
"_source": {
"brand": "test",
"name": "name foo according to use"
}
}
]
These are, I believe, your expected documents.
I have an index created in Elasticsearch with a field name where I store the whole name of a person: name and surname. I want to perform full-text search over that field, so I have indexed it with an analyzer.
My issue now is that if I search:
"John Rham Rham"
and the index contains "John Rham Rham Luck", that value gets a higher score than "John Rham Rham".
Is there any possibility to get a better score for the exact value than for the value with more words in the string?
Thanks in advance!
I worked out a small example (assuming you're running ES 5.x, because of the difference in scoring):
DELETE test
PUT test
{
"settings": {
"similarity": {
"my_bm25": {
"type": "BM25",
"b": 0
}
}
},
"mappings": {
"test": {
"properties": {
"name": {
"type": "text",
"similarity": "my_bm25",
"fields": {
"length": {
"type": "token_count",
"analyzer": "standard"
}
}
}
}
}
}
}
POST test/test/1
{
"name": "John Rham Rham"
}
POST test/test/2
{
"name": "John Rham Rham Luck"
}
GET test/_search
{
"query": {
"function_score": {
"query": {
"match": {
"name": {
"query": "John Rham Rham",
"operator": "and"
}
}
},
"functions": [
{
"script_score": {
"script": "_score / doc['name.length'].getValue()"
}
}
]
}
}
}
This code does the following:
Replaces the default BM25 implementation with a custom one, tweaking the b parameter (field-length normalisation)
-- You could also change the similarity to 'classic' to go back to TF/IDF, which doesn't have this normalisation
Creates an inner field for your name field, which counts the number of tokens inside your name field.
Updates the score according to that token count
This will result in:
"hits": {
"total": 2,
"max_score": 0.3596026,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 0.3596026,
"_source": {
"name": "John Rham Rham"
}
},
{
"_index": "test",
"_type": "test",
"_id": "2",
"_score": 0.26970196,
"_source": {
"name": "John Rham Rham Luck"
}
}
]
}
}
Not sure if this is the best way of doing it, but it may point you in the right direction :)
I'm attempting to add an un-analyzed version of an analyzed field, as a 'raw' multi-field, as per the ElasticSearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/multi-fields.html
This seems to be a common, well-supported pattern.
I've created the following index / field:
{
"person": {
"aliases": {},
"mappings": {
"employee": {
"properties": {
"userName": {
"type": "string",
"analyzer": "autocomplete",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
If I query the index directly, i.e. GET /person, I see the mapping as I've posted above, so I'm confident that there wasn't a syntax error, etc.
However, when we're pushing data into the index, a userName.raw field is not being created.
{
"_index": "person",
"_type": "employee",
"_id": "2",
"_version": 1,
"found": true,
"_source": {
"username": "Test Value"
}
}
Anyone see something I'm missing?
Thanks!
EDIT:
This was a novice mistake when creating my index.
PUT person
{
"person": {
"aliases": {},
"mappings": {
"employee": {
"properties": {
"email": {
Notice that the person key is being PUT into the 'person' index. This was creating a nested person object.
The correct syntax is to remove the extra "person":
PUT person
{
"aliases": {},
"mappings": {
"employee": {
"properties": {
"email": {
Please see Linoy.M.K's answer, as he is correct.
The 'raw' field will not appear when retrieving a record by ID. It's only useful as part of a query.
Adding multiple analyzers (multi-fields) will not modify your source document; the source will always contain only username, not username.raw.
The added fields are useful when you search: you can now query username and username.raw to achieve different behavior, as shown below.
GET /person/employee/_search
{
"query": {
"match": {
"username": "Te"
}
}
}
GET /person/employee/_search
{
"query": {
"match": {
"username.raw": "Test Value"
}
}
}
I have the following data set:
{
"_index": "myIndex",
"_type": "myType",
"_id": "220005",
"_score": 1,
"_source": {
"id": "220005",
"name": "Some Name",
"type": "myDataType",
"doc_as_upsert": true
}
}
Doing a direct match query like so:
GET typo3data/destination/_search
{
"query": {
"match": {
"name": "Some Name"
}
},
"size": 500
}
Will return the data just fine:
"hits": {
"total": 1,
"max_score": 3.442347,
"hits": [...
Doing an OR query, however (I am not sure which syntax is correct; the first is taken from the Elasticsearch docs, the second is a working query taken from another project with the same versions):
GET typo3data/destination/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"or": {
"filters": [
{
"term": {
"name": "Some Name"
}
}
]
}
}
}
},
"size": 500
}
or
{
"query":
{
"match_all": {}
},
"filter":
{
"or":
[
{ "term": { "name": "Some Name"} },
{ "term": { "name": "Some Other Name"} }
]
},
"size": 1000
}
Does not return anything.
The mapping for the name field is:
"name": {
"type": "string",
"index": "not_analyzed"
}
Elasticsearch version is 1.4.4.
When indexing "some name" , this is broken into tokens as follows -
"some name" => [ "some" , "name" ]
Now in a normal match query , it also does the same above process before matching result. If either "same" or "name" is present , that document is qualified as result
match query ("some name") => search for term "some" or "name"
The term query does not analyze or tokenize your query. This means that it looks for a exact token or term of "some name" which is not present.
term query ("some name") => search for term "some name"
Hence you wont be seeing any result.
Things should work fine if you make the field not_analyzed , but then make sure the case is also matching,
You can read more about the same here.
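For illustration, here is a minimal sketch of a term query against a not_analyzed name field, assuming the stored value is exactly "Some Name" (including case):
GET typo3data/destination/_search
{
  "query": {
    "term": {
      "name": "Some Name"
    }
  }
}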
After extending our mapping to include every field we have:
PUT typo3data/_mapping/destination
{
"someType": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"parentId": {
"type": "integer"
},
"type": {
"type": "string"
},
"generatedUid": {
"type": "integer"
}
}
}
}
The or filters were working. So the general answer is: if you have such a problem, check your mappings closely, and rather do too much work on them than too little.
If someone has an explanation of why this might be happening, I will gladly pass the answer mark on to it.
I have an index like this
{
"_index": "entities",
"_type": "names",
"_id": "0000230799",
"_score": 1,
"_source": {
"FIRST_NAME": [
"Deborah",
"Debbie"
],
"LAST_NAME": "Jones"
}
}
I attempt to do a match query on the name, but unless the first name is exact, no hits are returned.
I would expect the query below to generate at least one hit and score it; am I wrong about that?
curl -XPOST 'http://localhost:9200/entities/names/_search?pretty=true' -d '
{
"query": {
"match":{
"FIRST_NAME":"Deb"
}
}
}'
My mappings are:
{
"entities": {
"mappings": {
"names": {
"_parent": {
"type": "entity"
},
"_routing": {
"required": true
},
"properties": {
"FIRST_NAME": {
"type": "string"
},
"LAST_NAME": {
"type": "string"
}
}
}
}
}
}
The issue here isn't related to multiple values, but to your assumption that the match query will match anything that starts with your input. It does not.
In the match family of queries there's match_phrase_prefix, which can be worth checking out. It is explained in a bit more detail here: http://www.elasticsearch.org/blog/starts-with-phrase-matching/
There is also the prefix-query, but note that it does not do any text analysis.
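For example, a minimal sketch against the mapping from the question, using match_phrase_prefix so that "Deb" matches any FIRST_NAME term starting with that prefix:
POST /entities/names/_search?pretty=true
{
  "query": {
    "match_phrase_prefix": {
      "FIRST_NAME": "Deb"
    }
  }
}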
For a general introduction to text analysis, I can recommend these two articles:
https://www.found.no/foundation/text-analysis-part-1/
https://www.found.no/foundation/text-analysis-part-2/