Find uppercase strings with wildcard - elasticsearch

I have a field my_field that is defined like this:
"properties" : {
...
"my_field" : { "type" : "string", "store" : "no", "index" : "not_analyzed" },
...
}
All lowercase strings that are stored in that field can be found with wildcard:
e.g. kindergarten can be found with my_field:kinder*
but all uppercase strings cannot be found with wildcard:
e.g. KINDERGARTEN can be found neither with my_field:KINDER* nor with my_field:kinder*
Is that the expected behaviour or am I doing something wrong?

You must set lowercase_expanded_terms to false in order to do case-sensitive search with wildcards. Like this: http://localhost:9200/test/_search?lowercase_expanded_terms=false&q=my_field:KINDER*
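Alternatively, a wildcard query sent in the request body is a term-level query and is not lowercased by the query-string parser, so it stays case-sensitive by default. A sketch against the same index:
POST /test/_search
{
"query" : {
"wildcard" : { "my_field" : "KINDER*" }
}
}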

I did a quick test and everything looks correct to me.
I would try to test the analysis on that field using the /_analyze API, to verify that values really aren't lowercased. In older versions the text to analyze is passed as a plain-text body:
curl -XPOST 'http://localhost:9200/test/_analyze?field=my_field' -d 'This Should Be Single Token'
Or try the Index Termlist Plugin to see which tokens are actually stored in that field.

Related

Wildcard alias for Elastic index

I've got an Elastic index transactions-internal and would like to point all names like transactions-([a-z]+)-internal to this index using an alias, so that all requests like
GET /transactions-a-internal/_search
GET /transactions-b-internal/_search
GET /transactions-c-internal/_search
...
etc
should give the same result as
GET /transactions-internal/_search
I've tried
POST /transactions-internal/_alias/transactions-*-internal
but it returned
Invalid alias name [...] must not contain the following characters [ , \", *, \\, <, |, ,, >, /, ?]
Is there any "smart" solution for that? I would strongly prefer to configure it on the Elastic side, not anywhere else.
You're almost there. It's the other way around, i.e. POST /<index>/_alias/<alias>
POST /transactions-*-internal/_alias/transactions-internal
UPDATE:
If you want the other way around, then you can use the following (note that an alias name cannot contain wildcard characters):
POST /_aliases
{
"actions" : [
{ "add" : { "index" : "transactions-internal", "alias" : "transactions-a-internal" } },
{ "add" : { "index" : "transactions-internal", "alias" : "transactions-b-internal" } },
{ "add" : { "index" : "transactions-internal", "alias" : "transactions-c-internal" } }
]
}
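To verify that the aliases resolve as expected, you can look one of them up directly (a sketch using the alias names from the example above):
GET /_alias/transactions-a-internal
This returns the indices the alias points to, i.e. transactions-internal.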
Not quite sure if this is applicable to your situation, but if you're starting from scratch a possible solution might be to use an index template.
PUT _index_template/transactions-internal
{
"priority": 1,
"template": {
"aliases": {
"transactions-internal": {}
}
},
"index_patterns": [
"transactions-*-internal"
],
"composed_of": []
}
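Once the template is in place, any newly created index whose name matches the pattern picks up the alias automatically. As a sketch:
PUT /transactions-a-internal
GET /transactions-internal/_search
The search via the alias now includes the newly created index.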
As I'm quite new to Elastic, I don't know if this template can be applied to an existing index, but this approach will work for new indices in v7.12.1.

Wildcard on key in graylog

Hi, I am trying to use a wildcard on a key inside my query. Because I have arrays in my data, I am saving my data in flat form, like obj_0_id, obj_1_ID and so on. Is there any way to write something like obj_*_ID:123?
Thank you
You can use the query_string query, which lets you specify wildcards on field names. Example from the docs:
{
"query_string" : {
"fields" : ["city.*"],
"query" : "this AND that OR thus",
"use_dis_max" : true
}
}
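Applied to the flattened keys from the question, the same idea would look roughly like this (a sketch, assuming the documents are indexed with those field names):
{
"query_string" : {
"fields" : ["obj_*_ID"],
"query" : "123"
}
}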

ElasticSearch filter on exact url

Let's say I create this document in my index:
PUT /nursery/rhyme/1
{
"url" : "http://example.com/mary",
"text" : "Mary had a little lamb"
}
Why does this query not return anything?
POST /nursery/rhyme/_search
{
"query" : {
"match_all" : {}
},
"filter" : {
"term" : {
"url" : "http://example.com/mary"
}
}
}
The Term Query finds documents that contain the exact term specified in the inverted index. When you save the document, the url property is analyzed, and with the default analyzer it results in the following terms: [http, example, com, mary].
So what you currently have in your inverted index is that bunch of terms, none of which is http://example.com/mary.
What you want is either to not analyze the url property, or to use a Match Query that splits the query into terms just like at indexing time.
Exact match does not work for an analyzed field. A string is analyzed by default, which means the string http://example.com/mary will be split and stored in the inverted index as http, example, com, mary. That's why your query returns no results.
You can make your field not_analyzed:
{
"url": {
"type": "string",
"index": "not_analyzed"
}
}
but for this you will have to reindex.
Read up on not_analyzed and the term query here.
Hope this helps
In Elasticsearch 7.x you have to use the type "keyword" in the mapping properties, which is not analyzed: https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html
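As a sketch for 7.x, using the index and field from the question, the keyword mapping and a matching term query would look like this (hypothetical index recreation; 7.x indices no longer use the rhyme type):
PUT /nursery
{
"mappings" : {
"properties" : {
"url" : { "type" : "keyword" },
"text" : { "type" : "text" }
}
}
}
PUT /nursery/_doc/1
{
"url" : "http://example.com/mary",
"text" : "Mary had a little lamb"
}
POST /nursery/_search
{
"query" : {
"term" : { "url" : "http://example.com/mary" }
}
}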

Special character "-" in Elasticsearch acting like an operator

"-" is acting like an OR operator. For example, when I search for "t-link", it shows results containing "t-link" as well as "t". Why is it producing two terms? I am only interested in "t-link". Why is this happening, and how can I fix it?
By default, Elasticsearch uses the standard analyzer for strings.
Basically, your string is tokenized into two tokens, lowercased:
t
link
If you need to know what Elasticsearch does with your fields, use the _analyze API.
$ curl -XGET 'localhost:9200/_analyze?analyzer=standard' -d 't-link'
$ curl -XGET 'localhost:9200/_analyze?analyzer=simple' -d 't-link'
If you don't want that, make sure you define the right mapping for that field and use either the simple analyzer, the keyword analyzer, or no analyzer at all, depending on your requirements. See also the String core type documentation.
$ curl -XPUT 'http://localhost:9200/twitter/tweet/_mapping' -d '
{
"tweet" : {
"properties" : {
"message" : {"type" : "string", "analyzer" : "simple"},
"other" : {"type" : "string", "index" : "not_analyzed"}
}
}
}
'
With this mapping, the message field will be analyzed with the simple analyzer and the other field won't be analyzed at all.
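After applying this mapping, you can confirm the difference with the _analyze API pointed at a field (a sketch, reusing the field names above):
$ curl -XGET 'localhost:9200/twitter/_analyze?field=message' -d 't-link'
$ curl -XGET 'localhost:9200/twitter/_analyze?field=other' -d 't-link'
The first call should return the tokens t and link, while the second should return the single token t-link.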

Exact (not substring) matching in Elasticsearch

{"query":{
"match" : {
"content" : "2"
}
}}
matches all documents whose content contains the number 2; however, I would like the content to be exactly 2, no more, no less - think of my requirement in the spirit of Java's String.equals.
Similarly, for the second query I would like to match only when the document's content is exactly '3 3', nothing more and nothing less:
{"query":{
"match" : {
"content" : "3 3"
}
}}
How could I do exact (String.equals) matching in Elasticsearch?
Without seeing your index type mapping and sample data, it's hard to answer this directly - but I'll try.
Offhand, I'd say this is similar to this answer here (https://stackoverflow.com/a/12867852/382774), where you simply set the content field's index option to not_analyzed in your mapping:
"url" : {
"type" : "string",
"index" : "not_analyzed"
}
Edit: I wasn't clear enough with my original answer, shown above. I did not mean to imply that you should add the example code to your query, I meant that you need to specify in your index type mapping that the url field is of type string and it is indexed but not analyzed (not_analyzed).
This tells Elasticsearch to not bother analyzing (tokenizing or token filtering) the field when you're indexing your documents - just store it in the index as it exists in the document. For more information on mappings, see http://www.elasticsearch.org/guide/reference/mapping/ for an intro and http://www.elasticsearch.org/guide/reference/mapping/core-types/ for specifics on not_analyzed (tip: search for it on that page).
Update:
The official docs tell us that in newer versions of Elasticsearch you can't define a field as "not_analyzed"; instead you should use "keyword".
For old Elasticsearch versions:
{
"foo": {
"type": "string",
"index": "not_analyzed"
}
}
For new versions:
{
"foo": {
"type": "keyword",
"index": true
}
}
Note that this functionality (the keyword type) is available from Elasticsearch 5.0, and the backward compatibility layer was removed in the Elasticsearch 6.0 release.
Official Doc
You should use filter instead of match.
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"content" : 2
}
}
}
}
And you get docs whose content is exactly 2, not 20 or 2.1.
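In recent Elasticsearch versions, the usual way to get String.equals-style behaviour is to map content as keyword and use a term query. A sketch (my-index is a placeholder name):
POST /my-index/_search
{
"query" : {
"term" : { "content" : "3 3" }
}
}
With a keyword-mapped field, only documents whose content is exactly "3 3" will match.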