Elasticsearch match on property that has multiple values (array)

I have an index containing documents like this:
{
"_index": "entities",
"_type": "names",
"_id": "0000230799",
"_score": 1,
"_source": {
"FIRST_NAME": [
"Deborah",
"Debbie"
],
"LAST_NAME": "Jones"
}
}
I attempt to do a match query on the name, but unless the first name is an exact match, no hits are returned.
I would expect the query below to generate at least one hit and score it; am I wrong about that?
curl -XPOST 'http://localhost:9200/entities/names/_search?pretty=true' -d '
{
"query": {
"match":{
"FIRST_NAME":"Deb"
}
}
}'
My mappings are:
{
"entities": {
"mappings": {
"names": {
"_parent": {
"type": "entity"
},
"_routing": {
"required": true
},
"properties": {
"FIRST_NAME": {
"type": "string"
},
"LAST_NAME": {
"type": "string"
}
}
}
}
}
}

The issue here isn't related to multiple values, but to your assumption that the match query will match anything that starts with your input. It does not.
In the match family of queries there's match_phrase_prefix, which can be worth checking out. It is explained in a bit more detail here: http://www.elasticsearch.org/blog/starts-with-phrase-matching/
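For example, against the index from the question, a match_phrase_prefix query should produce a hit for the partial first name (a minimal sketch, assuming the same index, type, and field names as above):
curl -XPOST 'http://localhost:9200/entities/names/_search?pretty=true' -d '
{
  "query": {
    "match_phrase_prefix": {
      "FIRST_NAME": "Deb"
    }
  }
}'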
There is also the prefix-query, but note that it does not do any text analysis.
For a general introduction to text analysis, I can recommend these two articles:
https://www.found.no/foundation/text-analysis-part-1/
https://www.found.no/foundation/text-analysis-part-2/

Related

Username search in Elasticsearch

I want to implement a simple username search within Elasticsearch. I don't want weighted username searches yet, so I would expect it wouldn't be too hard to find resources on how to do this. But in the end, I came across NGrams and a lot of outdated Elasticsearch tutorials, and I completely lost track of the best practice for doing this.
This is my current setup, but it is really bad because it matches so many unrelated usernames:
{
"settings": {
"index" : {
"max_ngram_diff": "11"
},
"analysis": {
"analyzer": {
"username_analyzer": {
"tokenizer": "username_tokenizer",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"username_tokenizer": {
"type": "ngram",
"min_gram": "1",
"max_gram": "12"
}
}
}
},
"mappings": {
"properties": {
"_all" : { "enabled" : false },
"username": {
"type": "text",
"analyzer": "username_analyzer"
}
}
}
}
I am using the newest Elasticsearch and I just want to query similar/exact usernames. I have a user db and users should be able to search for each other, nothing too fancy.
If you want to search for exact usernames, then you can use the term query.
The term query returns documents that contain an exact term in a provided field. If you have not defined any explicit index mapping, then you need to add .keyword to the field name; this uses the keyword analyzer instead of the standard analyzer.
There is no need for an n-gram tokenizer if you only want to search for the exact term.
Below is a working example with index data, index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"username": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Index Data:
{
"username": "Jack"
}
{
"username": "John"
}
Search Query:
{
"query": {
"term": {
"username.keyword": "Jack"
}
}
}
Search Result:
"hits": [
{
"_index": "68844541",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"username": "Jack"
}
}
]
Edit 1:
To match similar terms, you can use the fuzziness parameter along with the match query:
{
"query": {
"match": {
"username": {
"query": "someting",
"fuzziness":"auto"
}
}
}
}
Search Result will be
"hits": [
{
"_index": "68844541",
"_type": "_doc",
"_id": "3",
"_score": 0.6065038,
"_source": {
"username": "something"
}
}
]

Handle Optional field search in Elasticsearch

I am new to writing queries for Elasticsearch. I have a problem that took me a long time to research and implement.
I have 4 fields for searching, but not all fields have to be chosen when searching; some fields can be left blank.
For example: employee first name, employee last name, job title, and department. I would perform searches for use cases such as: enter just first name and last name to get all people with that first and last name, or choose last name + first name + job title, or choose last name and department only and leave the other params blank.
Does anyone have an idea for the query? I would appreciate any suggestions.
Thank you
One way to achieve your use case is to use a bool query or query_string.
Below is a working example with index data, mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"department": {
"type": "text"
},
"fname": {
"type": "text"
},
"lname": {
"type": "text"
},
"title": {
"type": "text"
}
}
}
}
Index data:
{
"fname": "john",
"title": "faculty",
"department": "navy"
}
{
"fname": "smith",
"title": "faculty",
"department": "navy"
}
{
"fname": "Will",
"lname": "Smith",
"title": "Student",
"department": "engineering"
}
Search Query using bool query:
{
"query": {
"bool": {
"must": [
{
"match": {
"fname": "Smith"
}
},
{
"match": {
"title": "faculty"
}
},
{
"match": {
"department": "navy"
}
}
]
}
}
}
Search Result will be
"hits": [
{
"_index": "67550433",
"_type": "_doc",
"_id": "2",
"_score": 1.9208363,
"_source": {
"fname": "Smith",
"title": "faculty",
"department": "navy"
}
}
]
Or you can even use query_string:
{
"query":{
"query_string":{
"query":"fname:smith AND title:faculty AND department:navy"
}
}
}
You can build your query programmatically, as a key-value hash with a 'must' key: loop through your input fields and add a match clause to the 'must' array only for the fields that were actually provided, as sketched below.
Alternatively, you can put all the fields under a 'should' clause, on the assumption that no match will be found for the blank values and therefore they will have no impact on the score or the results.
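For illustration, if the user fills in only the last name and the department, the programmatically built query would end up with just those two clauses under 'must' (a minimal sketch; the values come from the example documents above):
{
  "query": {
    "bool": {
      "must": [
        { "match": { "lname": "Smith" } },
        { "match": { "department": "engineering" } }
      ]
    }
  }
}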

Why is Elasticsearch filter showing all records?

I am using Elasticsearch 5.5 and trying to run a filter query on some metrics data. For example:
{
"_index": "zabbix_test-us-east-2-node2-2017.10.29",
"_type": "jmx",
"_id": "AV9lcbNtvbkfeNFaDYH2",
"_score": 0.00015684571,
"_source": {
"metric_value_number": 95721248,
"path": "/home/ubuntu/etc_logstash/jmx/zabbix_test",
"#timestamp": "2017-10-29T00:04:31.014Z",
"#version": "1",
"host": "18.221.245.150",
"index": "zabbix_test-us-east-2-node2",
"metric_path": "zabbix_test-us-east-2-node2.Memory.NonHeapMemoryUsage.used",
"type": "jmx"
}
},
{
"_index": "zabbix_test-us-east-2-node2-2017.10.29",
"_type": "jmx",
"_id": "AV9lcbNtvbkfeNFaDYIU",
"_score": 0.00015684571,
"_source": {
"metric_value_number": 0,
"path": "/home/ubuntu/etc_logstash/jmx/zabbix_test",
"#timestamp": "2017-10-29T00:04:31.030Z",
"#version": "1",
"host": "18.221.245.150",
"index": "zabbix_test-us-east-2-node2",
"metric_path": "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count",
"type": "jmx"
}
}
I am running the following query:
GET /zabbix_test-us-east-2-node2-2017.10.29/jmx/_search
{
"query": {
"bool": {
"must": {
"match": {
"metric_path" : "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count"
}
}
}
}
}
Even then, it is displaying all records. However, if I use the following text, it works and shows exact matches:
GET /zabbix_test-us-east-2-node2-2017.10.29/jmx/_search
{
"query": {
"bool": {
"must": {
"match": {
"metric_path" : "zabbix_test-us-east-2-node2.Memory.NonHeapMemoryUsage.used"
}
}
}
}
}
Can anyone please tell me what I am doing wrong here?
Thanks.
You didn't mention anything about mappings, so I suppose you're using dynamic mapping: you've just indexed documents like these two into your Elasticsearch.
Once you visit
{yourhost}/zabbix_test-us-east-2-node2-2017.10.29/_mapping
you will see that the metric_path field probably has type text, which is the default for strings. As the documentation states:
A field to index full-text values, such as the body of an email or the description of a product. These fields are analyzed, that is they are passed through an analyzer to convert the string into a list of individual terms before being indexed
So your field is processed by an analyzer, and in the end you're not running the match against something like zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count but rather against its analyzed form, probably split on periods and other special characters.
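If you want to see exactly which tokens were produced, the _analyze API shows what the match query is actually working against (a sketch, assuming the dynamically created text mapping described above):
GET /zabbix_test-us-east-2-node2-2017.10.29/_analyze
{
  "field": "metric_path",
  "text": "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count"
}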
So if you want to perform filtering like you posted, you should define your index mapping explicitly before indexing any documents. You don't have to do it for every property, but at least metric_path should be defined as keyword. You can start with:
PUT {yourhost}/zabbix_test-us-east-2-node2-2017.10.29
{
"mappings": {
"jmx": {
"properties": {
"metric_path": {
"type": "keyword"
}
}
}
}
}
Then you should index your documents. Mappings for the other fields will be created dynamically by ES, but both of the queries you attached will return exactly one result, just as you expect.
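With metric_path mapped as keyword, you can also make the exact-match intent explicit by filtering with a term query instead of match (a sketch using the metric path from your example):
GET /zabbix_test-us-east-2-node2-2017.10.29/jmx/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "metric_path": "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count"
        }
      }
    }
  }
}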

Elasticsearch Multi-Field With 'Raw' Value Not Being Created

I'm attempting to add an un-analyzed version of an analyzed field, as a 'raw' multi-field, as per the ElasticSearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/multi-fields.html
This seems to be a common, well-supported pattern.
I've created the following index / field:
{
"person": {
"aliases": {},
"mappings": {
"employee": {
"properties": {
"userName": {
"type": "string",
"analyzer": "autocomplete",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
}
If I query the index directly, i.e. GET /person, I see the mapping as I've posted above, so I'm confident that there wasn't a syntax error, etc.
However, when we're pushing data into the index, a userName.raw field is not being created.
{
"_index": "person",
"_type": "employee",
"_id": "2",
"_version": 1,
"found": true,
"_source": {
"username": "Test Value"
}
}
Anyone see something I'm missing?
Thanks!
EDIT:
This was a novice mistake when creating my index.
PUT person
{
"person": {
"aliases": {},
"mappings": {
"employee": {
"properties": {
"email": {
Notice that the person key is being PUT into the 'person' index. This was creating a nested person object.
The correct syntax is to remove the extra "person":
PUT person
{
"aliases": {},
"mappings": {
"employee": {
"properties": {
"email": {
Please see Linoy.M.K's answer, as he is correct.
The 'raw' field will not appear when retrieving a record by ID. It's only useful as part of a query.
Adding multiple analyzers will not modify your source document; your source document will always contain username only, not username.raw.
The added analyzers are useful when searching: you can now search against username and username.raw to achieve different behavior, like below.
GET /person/employee/_search
{
"query": {
"match": {
"username": "Te"
}
}
}
GET /person/employee/_search
{
"query": {
"match": {
"username.raw": "Test Value"
}
}
}

Elasticsearch OR filtered query does not return results

I have the following data set:
{
"_index": "myIndex",
"_type": "myType",
"_id": "220005",
"_score": 1,
"_source": {
"id": "220005",
"name": "Some Name",
"type": "myDataType",
"doc_as_upsert": true
}
}
Doing a direct match query like so:
GET typo3data/destination/_search
{
"query": {
"match": {
"name": "Some Name"
}
},
"size": 500
}
Will return the data just fine:
"hits": {
"total": 1,
"max_score": 3.442347,
"hits": [...
Doing an OR query, however (I am not sure which syntax is correct; the first is taken from the Elasticsearch docs, the second is a working query taken from another project with the same versions):
GET typo3data/destination/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"or": {
"filters": [
{
"term": {
"name": "Some Name"
}
}
]
}
}
}
},
"size": 500
}
or
{
"query":
{
"match_all": {}
},
"filter":
{
"or":
[
{ "term": { "name": "Some Name"} },
{ "term": { "name": "Some Other Name"} }
]
},
"size": 1000
}
Does not return anything.
The mapping for the name field is:
"name": {
"type": "string",
"index": "not_analyzed"
}
Elasticsearch version is 1.4.4.
When indexing "some name", it is broken into tokens as follows:
"some name" => [ "some" , "name" ]
A normal match query goes through the same process before matching results. If either "some" or "name" is present, that document qualifies as a result:
match query ("some name") => search for term "some" or "name"
The term query does not analyze or tokenize your query. This means that it looks for an exact token or term of "some name", which is not present:
term query ("some name") => search for term "some name"
Hence you won't see any results.
Things should work fine if you make the field not_analyzed, but then make sure the case also matches.
You can read more about the same here.
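As an illustration, once name is really not_analyzed, the OR can be expressed with a single terms filter inside the filtered query, which behaves like several term filters combined with OR (a sketch against the index and values from the question; the case of the values must match exactly what was indexed):
GET typo3data/destination/_search
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "terms": {
          "name": [ "Some Name", "Some Other Name" ]
        }
      }
    }
  },
  "size": 500
}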
After extending our mapping to include every field we have:
PUT typo3data/_mapping/destination
{
"someType": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"parentId": {
"type": "integer"
},
"type": {
"type": "string"
},
"generatedUid": {
"type": "integer"
}
}
}
}
The or filters were working. So the general answer is: if you have such a problem, check your mappings closely, and rather do too much work on them than too little.
If someone has an explanation for why this might be happening, I will gladly pass the answer mark on to it.
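As a quick sanity check in cases like this, you can fetch the mapping that is actually applied (the GET counterpart of the PUT above) and compare it with what you expect:
GET typo3data/_mapping/destination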
