Check if a document exists or not in Elasticsearch

I want to check whether a document with a specific field value exists in Elasticsearch.
I have searched the internet, but only found how to check whether a field exists.
My index/type is
/twitter/user
username is one of the fields in the document.
I want to check whether a document with username="xyz" exists in this type.

You can query with size 0; the hits.total value will tell you whether the document exists or not.
GET /twitter/user/_search
{
  "size": 0,
  "query": {
    "match": {
      "username": "xyz"
    }
  }
}
Edit:
The _count API can be used as well.
GET /twitter/user/_count
{
  "query": {
    "match": {
      "username": "xyz"
    }
  }
}

From the documentation:
If all you want to do is to check whether a document exists—you’re not interested in the content at all—then use the HEAD method instead of the GET method. HEAD requests don’t return a body, just HTTP headers:
curl -i -XHEAD http://localhost:9200/twitter/user/userid
Elasticsearch will return a 200 OK status code if the document exists ... and a 404 Not Found if it doesn’t exist
Note: userid is the value of the _id field.
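If you are calling this from a script, one hedged shell sketch (not part of the original answer) is to have curl print only the status code, which you can then test for 200 vs 404:
curl -s -o /dev/null -w "%{http_code}\n" --head "http://localhost:9200/twitter/user/userid"
# prints 200 if the document exists, 404 if it does not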

Simply search for the document; if it exists the search will return results, otherwise it won't:
http://127.0.0.1:9200/twitter/user/_search?q=username:xyz
And the exact thing you are looking for is
http://127.0.0.1:9200/twitter/user/_search/exists?q=username:xyz
It will return exists as true or false:
{
  "exists": false
}

You can use a term query with size 0. See the query below for reference:
POST twitter/user/_search
{
  "query": {
    "term": {
      "username": "xyz"
    }
  },
  "size": 0
}
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  }
}
You will get the total matching document count in hits.total, and you can then check whether the count is greater than 0.

Related

elastic range query giving only top 10 records

I am using an Elasticsearch range query to get the records from one date to another using Python, but I am only getting 10 records.
Below is the query
{"query": {"range": {"date": {"gte":"2022-01-01 01:00:00", "lte":"2022-10-10 01:00:00"}}}}
Sample Output:
{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 8,
    "successful": 8,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": 1.0,
    "hits": [{"_source": {}}]
  }
}
The "hits" list consists of only 10 records. When I checked in my database there are more than 10 records.
Can anyone tell me how to modify the query to get all the records for the mentioned date ranges?
You need to use the size param as by default Elasticsearch returns only 10 results.
{
  "size": 100, // return up to 100 results
  "query": {
    "range": {
      "date": {
        "gte": "2022-01-01 01:00:00",
        "lte": "2022-10-10 01:00:00"
      }
    }
  }
}
Refer to the Elasticsearch documentation for more info.
Update 1: Since the OP wants to know the exact count of records matching the search query (refer to the comments section), you can use &track_total_hits=true in the URL (note: this can cause performance issues), and then in the search response, under hits, you will get the exact number of records matching your search, as shown below.
POST /_search?track_total_hits=true
"hits": {
"total": {
"value": 24981859, // note count with relation equal.
"relation": "eq"
},
"max_score": null,
"hits": []
}
Update 2: The OP mentioned in the comments that he is getting only 10K records in one query. As mentioned in the chat, this is restricted by Elasticsearch for performance reasons, but if you still want to change it, it can be done by changing the below index setting.
index.max_result_window: 20000 // your desired count
However, as suggested in the documentation:
index.max_result_window: The maximum value of from + size for searches to this index. Defaults to 10000. Search requests take heap memory and time proportional to from + size and this limits that memory. See Scroll or Search After for a more efficient alternative to raising this.
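If pagination rather than a one-shot dump is acceptable, here is a hedged sketch of the search_after alternative the documentation mentions (the query and date field are taken from the question; the need for a unique tiebreaker sort field is noted as an assumption):
POST /_search
{
  "size": 100,
  "query": {
    "range": {
      "date": {
        "gte": "2022-01-01 01:00:00",
        "lte": "2022-10-10 01:00:00"
      }
    }
  },
  "sort": [
    { "date": "asc" } // ideally add a unique tiebreaker field here as well
  ]
}
For each subsequent page, repeat the same request and add "search_after": [...] filled with the "sort" values of the last hit from the previous page; this pages through all matches without hitting the from + size limit of index.max_result_window.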

Significant terms buckets always empty

I have a collection of posts with their tags imported into Elasticsearch. The indexed fields are:
language - type: keyword
tags (array) - type: keyword
created_at - type: date
A single document looks like this:
{ "language": "en", "tags": ["foo", "bar"], "created_at": "..." }
I'm trying to run the significant terms aggregation on my data set using:
GET _search
{
  "aggregations": {
    "significant_tags": {
      "significant_terms": {
        "field": "tags"
      }
    }
  }
}
But the result buckets are always empty:
{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  },
  "aggregations": {
    "significant_tags": {
      "doc_count": 2945,
      "bg_count": 2945,
      "buckets": []
    }
  }
}
I can confirm the data is properly imported, as I'm able to run any other aggregation on this dataset and it works fine. Just the significant terms don't want to cooperate. Any ideas on what I am possibly doing wrong here?
Elasticsearch 6.2.4
Significant terms needs a foreground query or aggregation in order to calculate the difference in term frequencies and produce statistically significant results, so you will need to add an initial query and then your aggregation. See the docs for details: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html
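For example, a minimal sketch using the fields from the question (the choice of language: "en" as the foreground filter is purely illustrative):
GET _search
{
  "query": {
    "term": { "language": "en" } // foreground set: English posts only
  },
  "aggregations": {
    "significant_tags": {
      "significant_terms": { "field": "tags" }
    }
  }
}
With a foreground set like this that differs from the background set (all posts), the buckets should contain the tags that are unusually frequent in the foreground.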

Converting from SQL to elasticsearch query

Elasticsearch noob and need help with a query. I have the following SQL query that I need to convert to a query to Elasticsearch
SELECT COUNT(*)
FROM table
WHERE Message LIKE '%Communication has failed.%'
AND [Date] > CONVERT( CHAR(8), GetDate(), 112) + ' 07:40:00'
AND [Date] < CONVERT( CHAR(8), GetDate(), 112) + ' 22:15:00'
I want to run the query against Elasticsearch using curl and need help composing the query.
[Date] is equal to #timestamp in the Elasticsearch document. It would also be nice if the Elasticsearch query syntax had an equivalent of getdate() for the current date.
The easiest way to get a count of all results is by using the hits field in the result set. If you want to do that, your query would be:
POST /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "message": "*Communication has failed*"
          }
        },
        {
          "range": {
            "my_date_field": {
              "gte": "01/01/2018",
              "lt": "01/02/2018",
              "format": "dd/MM/yyyy"
            }
          }
        }
      ]
    }
  },
  "size": 0
}
Notice I am doing a range query on the date and a match query on the message.
I also set a size of zero because I only want to get back the hits values. The result would look like this:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 346,
    "max_score": 0,
    "hits": []
  }
}
You could also use an aggregate query, but in this case you would want to aggregate on one field, and the result of the aggregate would be the same as the hits field value. Think of an aggregate like the GROUP BY function in SQL. If you were only to GROUP BY one group, then your group would be equal to the value of COUNT(*). If you are interested in learning more about aggregates, the documentation is here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-valuecount-aggregation.html
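As for a getdate() equivalent: range queries on date fields accept date math relative to now, so the fixed dates in the range clause above could be expressed relative to the current day. A hedged sketch of just that clause (my_date_field is the assumed field name carried over from the answer; the offsets mirror the 07:40 / 22:15 window in the SQL):
{
  "range": {
    "my_date_field": {
      "gte": "now/d+7h+40m",
      "lt": "now/d+22h+15m"
    }
  }
}
Here now/d rounds down to the start of the current day before the hour and minute offsets are added.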

Scan And Scroll Query In ElasticSearch

I am using ES version 1.5. I used the scroll query and it is working fine and returning the scroll id:
GET /ecommerce_parts/_search?search_type=scan&scroll=3m
{
  "query": { "match_all": {} },
  "size": 1000
}
{
  "scrollid": "c2Nhbjs1OzE4Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MTc6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxNjpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzIwOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MTk6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw==",
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 184610,
    "max_score": 0,
    "hits": []
  }
}
Now when I pass the scroll id to retrieve the set of documents, it's throwing an error:
GET /_search/scroll?scroll=1m
c2Nhbjs1OzE4Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MTc6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxNjpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzIwOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MTk6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw==
{
"error": "ElasticsearchIllegalArgumentException[Malformed scrollId []]",
"status": 400
}
GET
/_search/scroll?scroll=3mc2Nhbjs1OzIzOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MjE6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsyNTpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzI0Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MjI6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw==
Both of the above queries give me an error. How do I retrieve the documents based on the scroll id? If I'm doing it wrong, please suggest a query.
For subsequent requests, you're supposed to send the scroll ID in a JSON payload like this:
POST /_search/scroll
{
  "scroll" : "1m",
  "scroll_id" : "c2Nhbjs1OzE4Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MTc6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxNjpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzIwOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MTk6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw=="
}
In earlier versions of ES you could do it like this, too:
POST /_search/scroll?scroll=1m&scroll_id=c2Nhbjs1OzE4Okk4ckVkSld0UXdDU212UU1kNWZBU1E7MTc6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxNjpJOHJFZEpXdFF3Q1NtdlFNZDVmQVNROzIwOkk4ckVkSld0UXdDU212UU1kNWZBU1E7MTk6SThyRWRKV3RRd0NTbXZRTWQ1ZkFTUTsxO3RvdGFsX2hpdHM6MTg0NjEwOw==
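When you have finished iterating, the search context can be released before the scroll timeout expires. A hedged sketch using the clear scroll API (syntax as in recent Elasticsearch versions; the placeholder stands for the scroll id from your most recent scroll response):
DELETE /_search/scroll
{
  "scroll_id" : "<the scroll id returned by the most recent scroll response>"
}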

Elasticsearch term facet not showing negative terms

I am using Elasticsearch term facets. My field contains some negative values, but the facet is ignoring the negative sign.
The following is the facet query:
http://myserver.com:9200/index/type/_search
GET/POST body:
{
  "facets" : {
    "school.id" : {
      "terms" : {
        "field" : "school.id",
        "size" : 10
      }
    }
  }
}
Response
{
  "took": 281,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "facets": {
    "school.id": {
      "_type": "terms",
      "missing": 302,
      "total": 4390,
      "other": 0,
      "terms": [
        {
          "term": "1113515007867355135",
          "count": 4390
        }
      ]
    }
  }
}
The actual value of the id is -1113515007867355135. Am I doing something wrong, or do I need to pass anything to include the negative sign (a stemming/analysis issue)?
The negative sign is a special character in Lucene (and ElasticSearch).
While indexing and searching you need to escape it.
Try adding a \ before the - character in your index; that should bring it up in the facet as well.
Got the answer from the Elasticsearch Google Group. You need to update the mapping of the field.
Possible Solution:
Update the mapping and use
"index":"analyzed","analyzer" : "keyword"
or
"index": "not_analyzed"
