What's the right way to perform a keyword search? - elasticsearch

If I want to perform a keyword search using a TermQuery, what's the proper way to do this? Am I supposed to prepend ".keyword" to my field name? I would think there is a more first-class citizen way of doing it! 🤷‍♂️
QueryBuilders.termQuery(SOME_FIELD_NAME + ".keyword", someValue)

It all boils down to your mapping. If your field is mapped as a 'straightforward' keyword like so
{
"mappings": {
"properties": {
"some_field": {
"type": "keyword"
}
}
}
}
you won't need to append .keyword -- you'd do just
QueryBuilders.termQuery(SOME_FIELD_NAME, someValue)
It's good practice, though, not to restrict yourself to only keywords, esp. if you'll be doing partial matches, expansions, autocomplete etc down the line.
A typical text field mapping would look like
PUT kwds
{
"mappings": {
"properties": {
"some_field": {
"type": "text",
"fields": {
"keyword": { <---
"type": "keyword"
},
"analyzed": { <---
"type": "text",
"analyzer": "simple"
},
"...": { <---
...
}
}
}
}
}
}
This means you'd be able to access differently-indexed "versions" (fields) of the same "property" (field). The naming is rather confusing but you get the gist.
Long story short, this is where the .keyword convention stems from. You don't need it if your field is already mapped as a keyword.

Related

Can anyone help me - how to use arrays in opensearch?

I put an object with some field and i wanna figure out how to mapping the index to handle and show the values like elasticsearch. I dunno why opensearch separate to individual fields the values. Both app has the same index mappings but the display is different for something.
I tried to map the object type set to nested but nothing changes
PUT test
{
"mappings": {
"properties": {
"szemelyek": {
"type": "nested",
"properties": {
"szam": {
"type": "integer"
},
"nev": {
"type": "text"
}
}
}
}
}
}

Elasticsearch custom mapping definition

I have to upload data to elk in the following format:
{
"location":{
"timestamp":1522751098000,
"resources":[
{
"resource":{
"name":"Node1"
},
"probability":0.1
},
{
"resource":{
"name":"Node2"
},
"probability":0.01
}]
}
}
I'm trying to define a mapping this kind of data and I produced he following mapping:
{
"mappings": {
"doc": {
"properties": {
"location": {
"properties" : {
"timestamp": {"type": "date"},
"resources": []
}
}
}
}
}
I have 2 questions:
how can I define the "resources" array in my mapping?
is it possible to define a custom type (e.g. resource) and use this type in my mapping (e.g "resources": [{type:resource}]) ?
There is a lot of things to know about the Elasticsearch mapping. I really highly suggest to read through at least some of their documentation.
Short answers first, in case you don't care:
Elasticsearch automatically allows storing one or multiple values of defined objects, there is no need to specify an array. See Marker 1 or refer to their documentation on array types.
I don't think there is. Since Elasticsearch 6 only 1 type per index is allowed. Nested objects is probably the closest, but you define them in the same file. Nested objects are stored in a separate index (internally).
Long answer and some thoughts
Take a look at the following mapping:
"mappings": {
"doc": {
"properties": {
"location": {
"properties": {
"timestamp": {
"type": "date"
},
"resources": { [1]
"type": "nested", [2]
"properties": {
"resource": {
"properties": {
"name": { [3]
"type": "text"
}
}
},
"probability": {
"type": "float"
}
}
}
}
}
}
}
}
This is how your mapping could look like. It can be done differently, but I think it makes sense this way - maybe except marker 3. I'll come to these right now:
Marker 1: If you define a field, you usually give it a type. I defined resources as a nested type, but your timestamp is of type date. Elasticsearch automatically allows storing one or multiple values of these objects. timestamp could actually also contain an array of dates, there is no need to specify an array.
Marker 2: I defined resources as a nested type, but it could also be an object like resource a little below (where no type is given). Read about nested objects here. In the end I don't know what your queries would look like, so not sure if you really need the nested type.
Marker 3: I want to address two things here. First, I want to mention again that resource is defined as a normal object with property name. You could do that for resources as well.
Second thing is more a thought-provoking impulse: Don't take it too seriously if something absolutely doesn't fit your case. Just take it as an opinion.
This mapping structure looks very inspired by a relational database approach. I think you usually want to define document structures for elasticsearch more for the expected searches. Redundancy is not a problem, but nested objects can make your queries complicated. I think I would omit the whole resources part and do it something like this:
"mappings": {
"doc": {
"properties": {
"location": {
"properties": {
"timestamp": {
"type": "date"
},
"resource": {
"properties": {
"resourceName": {
"type": "text"
}
"resourceProbability": {
"type": "float"
}
}
}
}
}
}
}
}
Because as I said, in this case resource can contain an array of objects, each having a resourceName and a resourceProbability.

Using both term and match query on same text field?

I have an index with a text field.
"state": {
"type": "text"
}
Now suppose there are two data.
"state": "vail"
and
"state": "eagle vail"
For one of my requirements,
- I need to do a term level query, such that if I type "vail", the search results should only return states with "vail" and not "eagle vail".
But another requirement for different search on the same index,
- I need to do a match query for full text search, such that if I type "vail", "eagle vail" should display as well.
So my question is, how do I do both term level and full text search in this field, as for doing a term level query, I would have to set it as "keyword" type such that it wont be analyzed.
You can use "multi-field" feature to achieve this. Here is a mapping:
{
"mappings": {
"my_type": {
"properties": {
"state": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
In this case state will act as text field (tokenized) whereas state.raw will be keyword (single-token). When indexing a document you should only set state. state.raw will be created automatically.

How to specify or target a field from a specific document type in queries or filters in Elasticsearch?

Given:
Documents of two different types, let's say 'product' and 'category', are indexed to the same Elasticsearch index.
Both document types have a field 'tags'.
Problem:
I want to build a query that returns results of both types, but the documents of type 'product' are allowed to have tags 'X' and 'Y', and the documents of type 'category' are only allowed to have tag 'Z'. How can I achieve this? It appears I can't use product.tags and category.tags since then ES will look for documents' product/category field, which is not what I intend.
Note:
While for the example above there might be some kind of workaround, I'm looking for a general way to target or specify fields of a specific document type when writing queries. I basically want to 'namespace' the field names used in my query so only documents of the type I want to work with are considered.
I think field aliasing would be the best answer for you, but it's not possible.
Instead you can use "copy_to" but I it probably affects index size:
DELETE /test
PUT /test
{
"mappings": {
"product" : {
"properties": {
"tags": { "type": "string", "copy_to": "ptags" },
"ptags": { "type": "string" }
}
},
"category" : {
"properties": {
"tags": { "type": "string", "copy_to": "ctags" },
"ctags": { "type": "string" }
}
}
}
}
PUT /test/product/1
{ "tags":"X" }
PUT /test/product/2
{ "tags":"Y" }
PUT /test/category/1
{ "tags":"Z" }
And you can query one of fields or many of them:
GET /test/product,category/_search
{
"query": {
"term": {
"ptags": {
"value": "x"
}
}
}
}
GET /test/product,category/_search
{
"query": {
"multi_match": {
"query": "x",
"fields": [ "ctags", "ptags" ]
}
}
}

How to not-analyze in ElasticSearch?

I've got a field in an ElasticSearch field which I do not want to have analyzed, i. e. it should be stored and compared verbatim. The values will contain letters, numbers, whitespace, dashes, slashes and maybe other characters.
If I do not give an analyzer in my mapping for this field, the default still uses a tokenizer which hacks my verbatim string into chunks of words. I don't want that.
Is there a super simple analyzer which, basically, does not analyze? Or is there a different way of denoting that this field shall not be analyzed?
I only create the index, I don't do anything else. I can use analyzers like "english" for other fields which seems to be built-in names for pre-configured analyzers. Is there a list of other names? Maybe there's one fitting my needs (namely doing nothing with the input).
This is my mapping currently:
{
"my_type": {
"properties": {
"my_field1": { "type": "string", "analyzer": "english" },
"my_field2": { "type": "string" }
}
}
}
my_field1 is language-dependent; this seems to work. my_field2 shall be verbatim. I'd like to give an analyzer there which simply does not do anything.
A sample value for my_field2 would be "B45c 14/04".
"my_field2": {
"properties": {
"title": {
"type": "string",
"index": "not_analyzed"
}
}
}
Check you here, https://www.elastic.co/guide/en/elasticsearch/reference/1.4/mapping-core-types.html, for further info.
This is no longer true due to the removal of the string (replaced by keyword and text) type as described here. Instead you should use keyword type with "index": true | false.
For Example OLD:
{
"foo": {
"type" "string",
"index": "not_analyzed"
}
}
becomes NEW:
{
"foo": {
"type" "keyword",
"index": true
}
}
This means the field is indexed but as it is typed as keyword not analyzed implicitly. If you would like to have the field analyzed, you need to use text type.
keyword analyser can be also used.
// don't actually use this, use "index": "not_analyzed" instead
{
"my_type": {
"properties": {
"my_field1": { "type": "string", "analyzer": "english" },
"my_field2": { "type": "string", "analyzer": "keyword" }
}
}
}
As noted here: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-keyword-analyzer.html, it makes more sense to mark those fields as not_analyzed.
But keyword analyzer can be useful when it is set by default for whole index.
UPDATE: As it said in comments, string is no longer supported in 5.X

Resources