Elasticsearch-6.x norms false not working - elasticsearch

This is what I have done:
First:
curl -X PUT "localhost:9200/log_20180419"
Second:
curl -X PUT "localhost:9200/log_20180419/_mapping/_doc" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "title": {
      "type": "text",
      "norms": false
    }
  }
}
'
Third:
# Insert data with the Python client: elasticsearch-py
from elasticsearch import Elasticsearch

es_conn = Elasticsearch()
content_tmp = "acxzcasiuchxzuicbhasuicgzyugas%s"
for i in range(10000):
    result = content_tmp % i
    es_conn.index(index="log_20180419", body={"title": result}, doc_type="_doc")
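As an aside, indexing 10,000 documents with one HTTP request per document is slow; elasticsearch-py ships a bulk helper that batches them. A minimal sketch (client usage commented out so it stays self-contained):

```python
# Sketch: bulk-index the same 10,000 documents with elasticsearch-py's
# bulk helper instead of one request per document.
# (Requires: from elasticsearch import Elasticsearch, helpers)

def generate_actions():
    content_tmp = "acxzcasiuchxzuicbhasuicgzyugas%s"
    for i in range(10000):
        yield {
            "_index": "log_20180419",
            "_type": "_doc",
            "_source": {"title": content_tmp % i},
        }

# helpers.bulk(Elasticsearch(), generate_actions())
```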
Fourth:
I query it:
curl -X GET "localhost:9200/cdn_log_20180419/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "title": "dasuioczxuivcaduciqanbcaiushcauinhauincsaincdjkxzcbyquiwbjkfcznkajsbcjkzxhcuiasbcjkzxchjdsfasckjbjak9999"
    }
  }
}
'
The result is:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 7.2293553,
    "hits" : [
      {
        "_index" : "cdn_log_20180419",
        "_type" : "_doc",
        "_id" : "oDR99mIBBZEcRu0i7LlO",
        "_score" : 7.2293553,
        "_source" : {
          "title" : "dasuioczxuivcaduciqanbcaiushcauinhauincsaincdjkxzcbyquiwbjkfcznkajsbcjkzxhcuiasbcjkzxchjdsfasckjbjak9999"
        }
      }
    ]
  }
}
As you can see, the result still has a _score field, which confuses me.
The docs are here: https://www.elastic.co/guide/en/elasticsearch/reference/current/norms.html

Norms are only one part of scoring. They cover the field-length norm and index-time boosting (if you are using that), but term frequency and inverse document frequency (TF/IDF) are independent of them.
If you don't need or want scoring for your query, look into boolean filters or constant_score.
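To make that concrete, here is a sketch (using the field and index names from the question) of two request bodies that run the same match without computing relevance scores:

```python
# Option 1: bool filter -- clauses in "filter" are cached and not scored.
bool_filter_query = {
    "query": {
        "bool": {
            "filter": [
                {"match": {"title": "some title text"}}
            ]
        }
    }
}

# Option 2: constant_score -- every matching document gets the same score.
constant_score_query = {
    "query": {
        "constant_score": {
            "filter": {"match": {"title": "some title text"}},
            "boost": 1.0
        }
    }
}

# Either body can be passed to elasticsearch-py, e.g.:
# es_conn.search(index="log_20180419", body=bool_filter_query)
```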

Related

Elasticsearch: how to find a document by number in logs

I get an error in Kibana:
"The length [2658823] of field [message] in doc[235892]/index[mylog-2023.02.10] exceeds the [index.highlight.max_analyzed_offset] limit [1000000]. To avoid this error, set the query parameter [max_analyzed_offset] to a value less than index setting [1000000] and this will tolerate long field values by truncating them."
I know how to deal with it (change "index.highlight.max_analyzed_offset" for the index, or set the query parameter), but I want to find the document with the long field and examine it.
If I try to find it by ID, I get this:
q:
GET mylog-2023.02.10/_search
{
  "query": {
    "terms": {
      "_id": [ "235892" ]
    }
  }
}
a:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
q:
GET mylog-2023.02.10/_doc/235892
a:
{
  "_index" : "mylog-2023.02.10",
  "_type" : "_doc",
  "_id" : "235892",
  "found" : false
}
Maybe this number (doc[235892]) is not an ID? How can I find this document?
Try the IDs query:
GET /_search
{
  "query": {
    "ids" : {
      "values" : ["1", "4", "100"]
    }
  }
}
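The same IDs query expressed as a Python dict for elasticsearch-py might look like this (the client call is commented out because it needs a running cluster):

```python
# IDs query body: fetch documents by their _id values.
ids_query = {
    "query": {
        "ids": {"values": ["1", "4", "100"]}
    }
}

# es.search(index="mylog-2023.02.10", body=ids_query)
```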

Elasticsearch completion : strange behavior when multiple matches per document

When I use the completion type inside a suggest as described in the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-suggesters-completion.html), I cannot get all the matching words: I get only one matching word per document.
I tested the following commands on Elasticsearch 6.7.2 (the latest available on AWS at the moment):
Deleting the index in case it exists
curl http://localhost:9200/test -H 'Content-Type: application/json' -X DELETE
Creating the index
curl http://localhost:9200/test -H 'Content-Type: application/json' -X PUT -d '
{
  "mappings": {
    "page": {
      "properties": {
        "completion_terms": {
          "type": "completion"
        }
      }
    }
  }
}
'
Indexing a document
curl http://localhost:9200/test/_doc/1 -H 'Content-Type: application/json' -X PUT -d '
{
"completion_terms": ["restaurant", "restauration", "réseau"]
}'
Checking the document exists
curl http://localhost:9200/test/_doc/1 -H 'Content-Type: application/json'
Using the completion
curl -X GET "localhost:9200/test/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
  "_source": ["suggestExact"],
  "suggest": {
    "suggestExact" : {
      "prefix" : "res",
      "completion" : {
        "field" : "completion_terms"
      }
    }
  }
}
'
The result is:
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "suggest" : {
    "suggestExact" : [
      {
        "text" : "res",
        "offset" : 0,
        "length" : 3,
        "options" : [
          {
            "text" : "restaurant",
            "_index" : "test",
            "_type" : "page",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : { }
          }
        ]
      }
    ]
  }
}
I'd like to get ALL the matching words (here, I get at most one result per document).
In the example, "restauration" and "réseau" are missing.
Am I doing something wrong?
After much searching, I found that this is the intended behavior: the completion suggester "suggests documents", not "suggests terms".
See in particular https://github.com/elastic/elasticsearch/issues/31738
However, I still have not managed to achieve "suggest terms", even with the term suggester, which seems to be the intended way (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-suggesters-term.html)
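For reference, a term-suggester request body might look like the sketch below. Note the assumptions: the words would also have to be indexed into a plain text field (here hypothetically named terms_text, since the term suggester cannot run against a completion field), and the term suggester proposes spelling corrections for the given text rather than expanding a prefix, which is likely why it does not behave like completion:

```python
# Hypothetical term-suggester body (Elasticsearch 6.x syntax).
# "terms_text" is an assumed plain "text" field holding the same words.
term_suggest_body = {
    "suggest": {
        "my_term_suggestion": {
            "text": "restauran",          # a misspelling to correct
            "term": {"field": "terms_text"}
        }
    }
}

# es.search(index="test", body=term_suggest_body)
```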

Unable to search attachment type field in an ElasticSearch indexed document

Search does not return any results although I do have a document that should match the query.
I do have the ElasticSearch mapper-attachments plugin installed per https://github.com/elasticsearch/elasticsearch-mapper-attachments. I have also googled the topic as well as browsed similar questions in stack overflow, but have not found an answer.
Here's what I typed into a Windows 7 command prompt:
c:\Java\elasticsearch-1.3.4>curl -XDELETE localhost:9200/tce
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce/contact/_mapping -d{\"
contact\":{\"properties\":{\"my_attachment\":{\"type\":\"attachment\"}}}}
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce/contact/1 -d{\"my_atta
chment\":\"SGVsbG8=\"}
{"_index":"tce","_type":"contact","_id":"1","_version":1,"created":true}
c:\Java\elasticsearch-1.3.4>curl localhost:9200/tce/contact/_search?pretty
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "tce",
      "_type" : "contact",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {"my_attachment":"SGVsbG8="}
    } ]
  }
}
c:\Java\elasticsearch-1.3.4>curl localhost:9200/tce/contact/_search?pretty -d{\"
query\":{\"term\":{\"my_attachment\":\"Hello\"}}}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
Note that the base64 encoded value of "Hello" is "SGVsbG8=", which is the value I have inserted into the "my_attachment" field of the document.
I am assuming that the mapper-attachments plugin has been deployed correctly because I don't get an error executing the mapping command above.
Any help would be greatly appreciated.
What analyzer is running against the my_attachment field?
If it's the standard analyzer (I can't see any listed), then the "Hello" in the text will have been lowercased in the index.
That is, since a term search is not analyzed, try searching for hello instead:
curl localhost:9200/tce/contact/_search?pretty -d'
{"query":
{"term":
{"my_attachment":"hello"
}}}'
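A tiny Python sketch of why the original term query missed: the standard analyzer lowercases tokens at index time, while a term query is not analyzed and must match the stored token exactly:

```python
import base64

# The attachment payload from the question decodes to "Hello"...
decoded = base64.b64decode("SGVsbG8=").decode("utf-8")
assert decoded == "Hello"

# ...but the standard analyzer lowercases tokens at index time,
# so the inverted index contains "hello", not "Hello".
indexed_token = decoded.lower()

# A term query is not analyzed, so it must match the stored token exactly:
assert indexed_token == "hello"   # term "hello" -> match
assert indexed_token != decoded   # term "Hello" -> no match
```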
You can also see which terms have been added to the index:
curl 'http://localhost:9200/tce/contact/_search?pretty=true' -d '{
  "query" : {
    "match_all" : { }
  },
  "script_fields": {
    "terms" : {
      "script": "doc[field].values",
      "params": {
        "field": "my_attachment"
      }
    }
  }
}'

Elasticsearch index last update time

Is there a way to retrieve from ElasticSearch information on when a specific index was last updated?
My goal is to be able to tell when it was the last time that any documents were inserted/updated/deleted in the index. If this is not possible, is there something I can add in my index modification requests that will provide this information later on?
You can get the modification time from the _timestamp field.
To make it easier to return the timestamp, you can set up Elasticsearch to store it:
curl -XPUT "http://localhost:9200/myindex/mytype/_mapping" -d'
{
  "mytype": {
    "_timestamp": {
      "enabled": "true",
      "store": "yes"
    }
  }
}'
If I insert a document and then query on it I get the timestamp:
curl -XGET 'http://localhost:9200/myindex/mytype/_search?pretty' -d '{
  "fields" : ["_timestamp"],
  "query": {
    "query_string": { "query": "*" }
  }
}'
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "myindex",
      "_type" : "mytype",
      "_id" : "1",
      "_score" : 1.0,
      "fields" : {
        "_timestamp" : 1417599223918
      }
    } ]
  }
}
Updating the existing document:
curl -XPOST "http://localhost:9200/myindex/mytype/1/_update" -d'
{
  "doc" : {
    "field1": "data",
    "field2": "more data"
  },
  "doc_as_upsert" : true
}'
Re-running the previous query shows me an updated timestamp:
"fields" : {
"_timestamp" : 1417599620167
}
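Worth noting: the _timestamp metadata field was deprecated and later removed in newer Elasticsearch versions. A forward-compatible alternative is to store your own update time on every write; a sketch (last_updated is a hypothetical field name):

```python
from datetime import datetime, timezone

# Store an explicit update time on every index/update call instead of
# relying on the deprecated _timestamp metadata field.
doc = {
    "field1": "data",
    "field2": "more data",
    "last_updated": datetime.now(timezone.utc).isoformat(),
}

# es.index(index="myindex", doc_type="mytype", id="1", body=doc)
```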
I don't know if people are still looking for an equivalent, but here is a workaround using shard stats for Elasticsearch 5+ users:
curl -XGET http://localhost:9200/_stats?level=shards
As you'll see, you get some information per index about commits and/or flushes that you can use to tell whether the index changed (or not).
I hope it helps someone.
I just looked into a solution for this problem. Recent Elasticsearch versions have an <index>/_recovery API.
It returns a list of shards with a field called stop_time_in_millis, which looks like a timestamp of the last write to that shard.
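As a sketch, the newest stop_time_in_millis across shards could be extracted like this (the response dict below is a hypothetical, heavily abbreviated shape, not the full _recovery output):

```python
# Abbreviated, hypothetical <index>/_recovery response.
recovery_response = {
    "myindex": {
        "shards": [
            {"stop_time_in_millis": 1417599223918},
            {"stop_time_in_millis": 1417599620167},
        ]
    }
}

# The latest write to the index is the max across its shards.
last_write_millis = max(
    shard["stop_time_in_millis"]
    for shard in recovery_response["myindex"]["shards"]
)
```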

elasticsearch: get only the elements that has a certain non-empty key from the url

I'd like to filter the results of an Elasticsearch query and retrieve only those that have a non-empty field.
For example, given the following data:
{
  "total": 4912,
  "max_score": 1,
  "hits": [
    {
      "_index": "gcba",
      "_type": "bafici",
      "_id": "5a93472b-5db4-4ff9-8c8a-d13158e72d5f-62",
      "_score": 1,
      "_source": {
        "id_film": "23",
        "title": "great film"
      }
    },
    {
      "_index": "gcba",
      "_type": "bafici",
      "_id": "2732fbf4-4e55-4794-8e98-e5d5fa6a0419-40",
      "_score": 1,
      "_source": {
        "name": "conference",
        [...]
      }
    }
  ]
}
I'd like to issue something like
.../_search?from=1&size=100&q=id_film:'*'
to get only those elements with an id_film value
ES will only return documents that have that particular field by default when doing a wildcard query:
% curl -XPUT http://localhost:9200/foo/bar/1 -d '{"id":1,"id_film":23}'
{"ok":true,"_index":"foo","_type":"bar","_id":"1","_version":1}%
% curl -XPUT http://localhost:9200/foo/bar/2 -d '{"id":2,"foo":23}'
{"ok":true,"_index":"foo","_type":"bar","_id":"2","_version":1}%
% curl "http://localhost:9200/foo/_search?q=id_film:*&pretty=true"
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "foo",
      "_type" : "bar",
      "_id" : "1",
      "_score" : 1.0, "_source" : {"id":1,"id_film":23}
    } ]
  }
}%
You can also use the exists (or missing) filters. See here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-exists-filter.html
The only thing is, it's a filter, not a query. To get it working with the search method, you need a match_all query with exists as the filter (or use a constant_score query with the filter specified within it).
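Both shapes described above can be sketched as request bodies (using the id_film field from the question):

```python
# Variant 1: match_all query with exists as the (bool) filter.
exists_filtered = {
    "query": {
        "bool": {
            "must": {"match_all": {}},
            "filter": {"exists": {"field": "id_film"}}
        }
    }
}

# Variant 2: constant_score with the exists filter inside it.
constant_score_version = {
    "query": {
        "constant_score": {
            "filter": {"exists": {"field": "id_film"}}
        }
    }
}

# Either body can be sent to the _search endpoint.
```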
The docs for the new Exists Query have good illustrations.
