I'm attempting to add an un-analyzed version of an analyzed field, as a 'raw' multi-field, as per the ElasticSearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/multi-fields.html
This seems to be a common, well-supported pattern.
I've created the following index / field :
{
"person": {
"aliases": {},
"mappings": {
"employee": {
"properties": {
"userName": {
"type": "string",
"analyzer": "autocomplete",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
If I query the index directly, i.e. GET /person, I see the mapping as I've posted above, so I'm confident that there wasn't a syntax error, etc.
However, when we're pushing data into the index, a userName.raw field is not being created.
{
"_index": "person",
"_type": "employee",
"_id": "2",
"_version": 1,
"found": true,
"_source": {
"username": "Test Value"
}
}
Anyone see something I'm missing?
Thanks!
EDIT:
This was a novice mistake when creating my index.
PUT person
{
"person": {
"aliases": {},
"mappings": {
"employee": {
"properties": {
"email": {
Notice the person key is being PUT in the 'person' index. This was creating a nested person.
Correct syntax is to remove the extra "person"
PUT person
{
"aliases": {},
"mappings": {
"employee": {
"properties": {
"email": {
Please see Linoy.M.K's answer, as he is correct.
The 'raw' field will not appear when retrieving a record by ID. Its only useful as part of a query.
Adding multiple analyzers will not modify your source document means your source document will always have username only not username.raw
Added analyzers are useful when you do searching, means you can now search with username and username.raw to achieve different behavior like below.
GET /person/employee/_search
{
"query": {
"match": {
"username": "Te"
}
}
}
GET /person/employee/_search
{
"query": {
"match": {
"username.raw": "Test Value"
}
}
}
Related
I have an object in Elasticsearch which may contain different fields. In my app this object is Enum so it can't actually contain more than one field at the same time. But when i do an update in Elasticsearch - it appends the fields instead of overwriting the whole object.
For example - the document may be public or accessible only to a group of users:
POST _template/test_template
{
"index_patterns": [
"test*"
],
"template": {
"settings": {
"number_of_shards": 1
},
"mappings": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "keyword"
},
"users": {
"type": "object",
"properties": {
"permitted": {
"type": "keyword"
},
"public": {
"type": "boolean"
}
}
}
}
}
},
"aliases" : {
"test-alias" : { }
}
}
POST test_doc/_doc/1
{
"id": "1",
"users": {
"permitted": [
"1", "2"
]
}
}
POST _bulk
{"update":{"_index":"test_doc","_type":"_doc","_id":1}}
{"doc":{"id":"1","users":{"public": true}},"doc_as_upsert":true}
GET test-alias/_search
I am expecting this result:
{
"id": "1",
"users": {
"public": true
}
}
But the actual result is:
{
"id": "1",
"users": {
"permitted": [
"1",
"2"
],
"public": true
}
}
At the same time it overwrites the fields with the same name perfectly (i can change the permitted array or public field to false). How do you disable object fields appending?
You need to change the action in bulk request to index from update, correct request would be
{"index":{"_index":"71908768","_id":1}}
{"doc":{"id":"1","users":{"public": true}}}
refer actions and what they do in details in the official Elasticsearch docs. In short, update partially updates the document, while index action Indexes the specified document. If the document exists, replaces the document and increments the version.
I want to implement a simple username search within Elasticsearch. I don't want weighted username searches yet, so I would expect it wouldn't be to hard to find resources on how do this. But in the end, I came across NGrams and lot of outdated Elasticsearch tutorials and I completely lost track on the best practice on how to do this.
This is now my setup, but it is really bad because it matches so much unrelated usernames:
{
"settings": {
"index" : {
"max_ngram_diff": "11"
},
"analysis": {
"analyzer": {
"username_analyzer": {
"tokenizer": "username_tokenizer",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"username_tokenizer": {
"type": "ngram",
"min_gram": "1",
"max_gram": "12"
}
}
}
},
"mappings": {
"properties": {
"_all" : { "enabled" : false },
"username": {
"type": "text",
"analyzer": "username_analyzer"
}
}
}
}
I am using the newest Elasticsearch and I just want to query similar/exact usernames. I have a user db and users should be able to search for eachother, nothing to fancy.
If you want to search for exact usernames, then you can use the term query
Term query returns documents that contain an exact term in a provided field. If you have not defined any explicit index mapping, then you need to add .keyword to the field. This uses the keyword analyzer instead of the standard analyzer.
There is no need to use an n-gram tokenizer if you want to search for the exact term.
Adding a working example with index data, index mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"username": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Index Data:
{
"username": "Jack"
}
{
"username": "John"
}
Search Query:
{
"query": {
"term": {
"username.keyword": "Jack"
}
}
}
Search Result:
"hits": [
{
"_index": "68844541",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"username": "Jack"
}
}
]
Edit 1:
To match for similar terms, you can use the fuzziness parameter along with the match query
{
"query": {
"match": {
"username": {
"query": "someting",
"fuzziness":"auto"
}
}
}
}
Search Result will be
"hits": [
{
"_index": "68844541",
"_type": "_doc",
"_id": "3",
"_score": 0.6065038,
"_source": {
"username": "something"
}
}
]
In ES 2.3.3, many queries in the system I'm working on use the _all field. Sometimes these are registered to a percolate index, and when running percolator on the doc, _all is generated automatically.
In converting to ES 5.X _all is being deprecated and so _all has been replaced with a copy_to field that contains the components that we actually care about, and it works great for those searches.
Registering the same query to a percolate index with the same document mapping including copy_to fields works fine. Sending a percolate query with the document never results in a hit for a copy_to field however.
Manually building the copy_to field via simple string concatenation seems to work, it's just that I'd expect to be able to Query -> DocIndex and get the same result as Doc -> PercolateQuery... So I'm just looking for a way to have ES generate the copy_to fields automatically on a document being percolated.
Ended up there was nothing wrong with ES of course, posting here in case it helps someone else. Figured it out while attempting to generate a simpler example to post here with details... Basically the issue came down to the fact that attempting to percolate a document of a type that doesn't exist in the percolate index doesn't give any errors back, but seems to apply all percolate queries without applying any mappings which was just confusing as it worked for simple test cases, but not complex ones. Here's an example:
From the copy_to docs, generate an index with a copy_to mapping. See that a query to the copy_to field works.
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
}
PUT my_index/my_type/1
{
"first_name": "John",
"last_name": "Smith"
}
GET my_index/_search
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}
Create a percolate index with the same type
PUT /my_percolate_index
{
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
},
"queries": {
"properties": {
"query": {
"type": "percolator"
}
}
}
}
}
Create a percolate query that matches our other percolate query on the copy_to field, and a second query that just queries on a basic unmodified field
PUT /my_percolate_index/queries/1?refresh
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}
PUT /my_percolate_index/queries/2?refresh
{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}
Search, but with the wrong type... there will be a hit on the basic field (first_name: John) even though no document mappings match the request
GET /my_percolate_index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"document_type" : "non_type",
"document" : {
"first_name": "John",
"last_name": "Smith"
}
}
}
}
{"took":7,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"my_percolate_index","_type":"queries","_id":"2","_score":0.2876821,"_source":{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}}]}}
Send in the correct document_type and see both matches as expected
GET /my_percolate_index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"document_type" : "my_type",
"document" : {
"first_name": "John",
"last_name": "Smith"
}
}
}
}
{"took":7,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":2,"max_score":0.51623213,"hits":[{"_index":"my_percolate_index","_type":"queries","_id":"1","_score":0.51623213,"_source":{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}},{"_index":"my_percolate_index","_type":"queries","_id":"2","_score":0.2876821,"_source":{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}}]}}
Having noticed my sort on an indexed string field doesn't work properly, I've discovered that it's sorting analyzed strings so "bags of words" and if I want it to work properly I have to sort on the non-analyzed string. My plan was to just change the string field to a multi-field, using information I found in those two articles:
https://www.elastic.co/blog/changing-mapping-with-zero-downtime (Upgrade to a multi-field part)
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
Using Sense I've created this field mapping
PUT myindex/_mapping/type
{
"properties": {
"Title": {
"type": "string",
"fields": {
"Raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
And then I try to sort my search results using the newly made field. I've put all of the name variations I could think of after reading the articles:
POST myindex/_search
{
"_source" : ["Title","titlemap.Raw","titlemap.Title","titlemap.Title.Raw","Title.Title","Raw","Title.Raw"],
"size": 6,
"query": {
"multi_match": {
"query": "title",
"fields": ["Title^5"
],
"fuzziness": "auto",
"type": "best_fields"
}
},
"sort": {
"Title.Raw": "asc"
}
}
And that's what I get in response:
{
"_index": "myindex_2015_11_26_12_22_38",
"_type": "type",
"_id": "1205",
"_score": null,
"_source": {
"Title": "The title of the item"
},
"sort": [
null
]
}
Only the Title field's value is shown in the response and the sort criterium is null for every result.
Am I doing something wrong, or is there another way to do that?
The index name is not the same after re-indexing and thus the default mapping gets installed... that's probably why.
I suggest using an index template instead, so you don't have to care when to create the index and ES will do it for you. The idea is to create a template with the proper mapping you need and then ES will create every new index whenever it deems necessary, add the myindex alias and apply the proper mapping to it.
curl -XPUT localhost:9200/_template/myindex_template -d '{
"template": "myindex_*",
"settings": {
"number_of_shards": 1
},
"aliases": {
"myindex": {}
},
"mappings": {
"type": {
"properties": {
"Title": {
"type": "string",
"fields": {
"Raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
Then whenever you launch your re-indexing process a new index with a new name will be created BUT with the proper mapping and the proper alias.
I have an index like this
{
"_index": "entities",
"_type": "names",
"_id": "0000230799",
"_score": 1,
"_source": {
"FIRST_NAME": [
"Deborah",
"Debbie"
],
"LAST_NAME": "Jones"
}
}
I attempt to do a match query on the name, but unless the first name is exact, no hits return
I would expect the below query to generate at least one hit and score it, am i wrong about that?
curl -XPOST 'http://localhost:9200/entities/names/_search?pretty=true' -d '
{
"query": {
"match":{
"FIRST_NAME":"Deb"
}
}
}'
my mappings are
{
"entities": {
"mappings": {
"names": {
"_parent": {
"type": "entity"
},
"_routing": {
"required": true
},
"properties": {
"FIRST_NAME": {
"type": "string"
},
"LAST_NAME": {
"type": "string"
}
}
}
}
}
}
The issue here isn't related to multiple values, but your assumption that the match-query will match anything that starts with your input. It does not.
In the match-family of queries there's the match_phrase_prefix that can be worth checking out. It is explained in a bit more detail here: http://www.elasticsearch.org/blog/starts-with-phrase-matching/
There is also the prefix-query, but note that it does not do any text analysis.
For a general introduction to text analysis, I can recommend these two articles:
https://www.found.no/foundation/text-analysis-part-1/
https://www.found.no/foundation/text-analysis-part-2/