Cannot get only number of hits in elastic search - elasticsearch

I'm using the _msearch API to send multiple queries to Elasticsearch.
I only need to know how many hits each query generates.
As I understand it, you can set the size parameter to 0 in order to get only the count. However, I still get results with all the found documents. Here is my query:
{"index":"myindex","type":"things","from":0,,"size":0}
{"query":{"bool":{"must":[{"match_all":{}}],"must_not":[],{"match":
{"firstSearch":true}}]}}}, "size" : 0}
{"index":"myindex","type":"things","from":0,,"size":0}
{"query":{"bool":{"must":[{"match_all":{}}],"must_not":[],{"match":
{"secondSearch":true}}]}}}, "size" : 0}
I'm using curl to get the results, like this:
curl -H "Content-Type: application/x-ndjson" -XGET localhost:9200/_msearch?pretty=1 --data-binary "#requests"; echo

Setting size to 0 tells Elasticsearch not to return any documents, only the total number of hits for the query.
If you still receive documents, you can additionally let Elasticsearch know that you do not need the document bodies by sending "_source" as false.
Example:
{
"query": {},
"_source": false
}

You can use the count API:
GET /indexname/type/_count
{ "query":
{ "match_all": {} }
}
For more details, see the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-count.html
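As a rough illustration of the count-only approach for _msearch, the sketch below builds the NDJSON body and reads the totals out of a response. It assumes that "size": 0 belongs in each body line next to the query rather than in the header line, and that the total is either a plain integer or an object with a "value" key, depending on the Elasticsearch version. The function names build_msearch_body and extract_totals are made up for this example.

```python
import json


def build_msearch_body(index, queries):
    """Build an NDJSON _msearch body that asks only for hit counts."""
    lines = []
    for query in queries:
        # Header line: only says which index the next search targets.
        lines.append(json.dumps({"index": index}))
        # Body line: "size": 0 sits next to the query (assumption, see above).
        lines.append(json.dumps({"query": query, "size": 0}))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


def extract_totals(msearch_response):
    """Return the hit count of every response in an _msearch result."""
    totals = []
    for response in msearch_response["responses"]:
        total = response["hits"]["total"]
        # Newer versions report {"value": n, ...}; older ones a plain integer.
        totals.append(total["value"] if isinstance(total, dict) else total)
    return totals


body = build_msearch_body(
    "myindex",
    [{"match": {"firstSearch": True}}, {"match": {"secondSearch": True}}],
)
print(body, end="")
```

The printed body can be saved to a file and passed to curl with --data-binary, as in the question.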

Related

Trino/presto with elastic : how to search nested objects?

I'm new to trino and I'm trying to use it to query nested objects in elastic search.
This is my mapping in elasticsearch:
{
"product_index": {
"mappings": {
"properties" :{
"id" : { "type" : "keyword"},
"name" : { "type" : "keyword"},
"linked_products" :{
"type": "nested",
"properties" :{
"id" : { "type" : "keyword"}
}
}
}
}
}
}
I need to perform a query on the id field under linked_products.
What is the syntax in Trino to query the id field?
Do I need special definitions in the target index mapping in Elasticsearch to map the nested section for Trino?
=========================================================
Hi,
I will try to add some clarifications to my question.
We are trying to query the data according to the id field.
This is the query in Elastic:
GET product_index/_search
{
"query": {
"nested" : {
"path" : "linked_products",
"query": {
"bool": {
"should" : [
{ "match" : {"linked_products.id" :123}}
]
}
}
}
}
}
We tried to query the id field in 2 ways:
Trino query -
select count(*)
from es_table aaa
where any_match(aaa.linked_products, x-> x.id=123)
When we query by the id field this way, the predicate is not pushed down to Elasticsearch and the connector retrieves all the documents into Trino (this only happens with queries on nested documents).
Sending an es-query from Trino to Elasticsearch:
SELECT * FROM es.default."$query:"
This works, but when we try to retrieve ids that match many documents, we get a timeout from the Elasticsearch client.
I can't tell from the documentation whether it is possible to use scrolling with es-query to avoid the timeout problem.
Trino maps nested object type to a ROW the same way that it maps a standard object type during a read. The nested designation itself serves no purpose to Trino since it only determines how the object is stored in Elasticsearch.
Assume we push the following document to your index.
curl -X POST "localhost:9200/product_index/_doc?pretty" \
-H 'Content-Type: application/json' -d'
{
"id": "1",
"name": "foo",
"linked_products": {
"id": "123"
}
}
'
The way you would read this out in Trino would just be to use the standard ROW syntax.
SELECT
id,
name,
linked_products.id
FROM elasticsearch.default.product_index;
Result:
|id |name|id |
|---|----|---|
|1 |foo |123|
This is fine and well, but judging from the fact that the name of your nested object is plural, I'll assume you want to store an array of objects like so.
curl -X POST "localhost:9200/product_index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
"id": "2",
"name": "bar",
"linked_products": [
{
"id": "123"
},
{
"id": "456"
}
]
}
'
If you run the same query as above, with the second document inserted, you'll get the following error.
SQL Error [58]: Query failed (#20210604_202723_00009_nskc4): Expected object for field 'linked_products' of type ROW: [{id=123}, {id=456}] [ArrayList]
This is because Trino has no way of knowing which fields are arrays from the default Elasticsearch mapping. To enable querying over this array, you'll need to follow the instructions in the docs and explicitly identify that field as an array type for Trino using the _meta field. Here is the command that would be used in this example to identify linked_products as an ARRAY.
curl --request PUT \
--url localhost:9200/product_index/_mapping \
--header 'content-type: application/json' \
--data '
{
"_meta": {
"presto":{
"linked_products":{
"isArray":true
}
}
}
}'
Now you need to account in the SELECT statement for the fact that linked_products is an ARRAY of type ROW. Not every position in the array will have a value, so you should use the index-safe element_at function to avoid errors.
SELECT
id,
name,
element_at(linked_products, 1).id AS id1,
element_at(linked_products, 2).id AS id2
FROM elasticsearch.default.product_index;
Result:
|id |name|id1|id2 |
|---|----|---|----|
|1 |foo |123|NULL|
|2 |bar |123|456 |
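If several fields have to be flagged as arrays, the _meta payload from the curl command earlier in this answer can be generated instead of written by hand. This is only a sketch: build_array_meta is a hypothetical helper, and the "presto" key is taken from the command above (the key may differ depending on your connector version, so it is exposed as a parameter).

```python
import json


def build_array_meta(array_fields, meta_key="presto"):
    """Build a mapping update that flags the given fields as arrays."""
    return {
        "_meta": {
            # One {"isArray": true} entry per field, as in the curl example.
            meta_key: {field: {"isArray": True} for field in array_fields}
        }
    }


payload = build_array_meta(["linked_products"])
print(json.dumps(payload, indent=2))
```

The printed JSON is what would be sent with PUT to product_index/_mapping.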
=========================================================
Update to answer @gil bob's updated question.
There is currently no support for aggregate pushdown in the Elasticsearch connector, but this is being added in PR 7131.
As a workaround until pushdown is available, you can set the elasticsearch.request-timeout property in your elasticsearch.properties file to increase the request timeout. If it takes Elasticsearch this long to return the data, this will need to be set whether you run the aggregation in Trino or in Elasticsearch.

Elastic search query not returning results

I have an Elastic Search query that is not returning data. Here are 2 examples of the query - the first one works and returns a few records but the second one returns nothing - what am I missing?
Example 1 works:
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"data.case.field1": "ABC123"
}
}
}
'
Example 2 not working:
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": {
"term" : { "data.case.field1" : "ABC123" }
}
}
}
}
'
This happens because of the difference between match and term queries. Match queries are analyzed: the same analyzer that was used on the field at index time is applied to the search term. Term queries are not analyzed: the search term does not go through the analysis process and is used for exact matching.
Official doc of term query
Returns documents that contain an exact term in a provided field.
Official doc of match query
Returns documents that match a provided text, number, date or boolean
value. The provided text is analyzed before matching.
If you are using a text field for data.case.field1 without an explicit analyzer, then the default analyzer for text fields (standard) is applied, which lowercases the text and stores the resulting token.
For your text, the standard analyzer produces the token below; see the Analyze API for more details.
{
"text" : "ABC123",
"analyzer" : "standard"
}
And the generated token:
{
"tokens": [
{
"token": "abc123",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
}
]
}
Now, when you use a term query, the search term is not analyzed and is used as is. Because it is in capital letters (ABC123), it doesn't match the token in the index (abc123), hence no result is returned.
PS: see my other SO answer for more details on term and match queries.
What is your mapping for data.case.field1? If it is of type text, you should use a match query instead of term.
See the warning at the top of this page: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html#query-dsl-term-query
Without knowing whether the field is mapped as text or keyword, any answer is a guess. If the field is mapped as keyword, you can try a term query inside a filter clause:
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"filter": {
"term" : { "data.case.field1" : "ABC123" }
}
}
}
}
'
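To make the match versus term difference concrete, here is a toy model in Python. It is not how Elasticsearch is implemented: toy_standard_analyze is a very rough stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase), and toy_match and toy_term are hypothetical helpers that only mimic the "analyze first" versus "compare as is" behaviour described above.

```python
import re


def toy_standard_analyze(text):
    """Rough stand-in for the standard analyzer: split, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def toy_match(indexed_tokens, search_text):
    """match: analyze the search text first, then look for any token."""
    return any(t in indexed_tokens for t in toy_standard_analyze(search_text))


def toy_term(indexed_tokens, search_term):
    """term: compare the search term as is, with no analysis."""
    return search_term in indexed_tokens


# What a text field would hold after indexing the value "ABC123".
indexed = set(toy_standard_analyze("ABC123"))

print(toy_match(indexed, "ABC123"))  # True: the search text is lowercased too
print(toy_term(indexed, "ABC123"))   # False: "ABC123" != "abc123"
print(toy_term(indexed, "abc123"))   # True: exact match on the stored token
```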

How to use kibana with elasticSearch

I would like some help with putting a query into Kibana and displaying the result there. My search query is the following:
curl -XPOST "http://localhost:9200/_search" -d'
{
"from": 0,
"size": 10,
"query": {
"term": {
"service": "http"
}
}
}'
Thank You
You can't directly build a Kibana visualization from an Elasticsearch DSL query.
You might want to try:
Create a new visualization.
Write in the query bar: service:http
Add a bucket on the X axis, select the terms aggregation, and choose your host field.

Time out on querying large elastic search index

I queried a large index using a very large size, as I want to retrieve every matching document in a large index, but I got a timeout after a long time. No result is returned. Is there any other way to get all data without timing out?
My query:
{
"size": 90000000,
"query": {
"filtered": {"query": {"match_all":{}},"filter":{"term": {"isbn": 475869}}
}
}
}
You should use scrolling if you need to retrieve a large amount of data.
First, initiate the scroll with your query:
curl -XGET 'localhost:9200/your_index/your_type/_search?scroll=1m' -d '{
"size": 5000,
"query": {
"term" : {
"isbn" : "475869"
}
}
}'
Then you'll get the first 5000 documents as well as a _scroll_id token in the response, which you can use to perform the subsequent requests.
Then you can repeatedly perform the next requests using the scroll_id token from the previous response in order to get the next batch of 5000 documents, until you get no results anymore.
curl -XGET 'localhost:9200/_search/scroll' -d '{
"scroll" : "1m",
"scroll_id" : "c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1"
}'
Since you're using Jest, there's a SearchScroll class you can use. See in test cases how that class is used.
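The two requests above can be wrapped in a loop. The following Python sketch shows the shape of that loop, under these assumptions: post is a caller-supplied function (not a real library call) that sends a JSON body to the given path and returns the parsed JSON response, and fake_post is a stand-in so the example runs without a cluster.

```python
def scroll_all(post, index, query, size=5000, keep_alive="1m"):
    """Yield every hit for `query`, fetching one batch per request."""
    response = post(
        f"/{index}/_search?scroll={keep_alive}",
        {"size": size, "query": query},
    )
    # An empty batch means there are no results left.
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        # Always pass on the scroll id from the most recent response.
        response = post(
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": response["_scroll_id"]},
        )


def fake_post(path, body):
    """Stand-in for an HTTP client: serves three canned batches."""
    batches = {
        "first": ([{"_id": "1"}, {"_id": "2"}], "a"),
        "a": ([{"_id": "3"}], "b"),
        "b": ([], "b"),
    }
    hits, next_id = batches[body.get("scroll_id", "first")]
    return {"_scroll_id": next_id, "hits": {"hits": hits}}


query = {"term": {"isbn": "475869"}}
ids = [hit["_id"] for hit in scroll_all(fake_post, "your_index", query)]
print(ids)  # ['1', '2', '3']
```

Because scroll_all is a generator, the caller can process each document as it arrives instead of holding the whole result set in memory.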

Delete all documents from index/type without deleting type

I know one can delete all documents from a certain type via deleteByQuery.
Example:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"term" : { "user" : "kimchy" }
}
}'
But I have NO term and simply want to delete all documents from that type, no matter what term. What is the best practice to achieve this? An empty term does not work.
Link to deleteByQuery
I believe if you combine the delete by query with a match all it should do what you are looking for, something like this (using your example):
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"match_all" : {}
}
}'
Or you could just delete the type:
curl -XDELETE http://localhost:9200/twitter/tweet
Note: both DELETE requests above (the _query endpoint and deleting a type) were removed in Elasticsearch 2.0 and do not work on later versions.
The Delete-By-Query plugin has been removed in favor of a new Delete By Query API implementation in core. Read here
curl -XPOST 'localhost:9200/twitter/tweet/_delete_by_query?conflicts=proceed&pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}'
From ElasticSearch 5.x, delete_by_query API is there by default
POST: http://localhost:9200/index/type/_delete_by_query
{
"query": {
"match_all": {}
}
}
You can delete documents from type with following query:
POST /index/type/_delete_by_query
{
"query" : {
"match_all" : {}
}
}
I tested this query in Kibana and Elastic 5.5.2
Torsten Engelbrecht's comment in John Petrones answer expanded:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query":
{
"match_all": {}
}
}'
(I did not want to edit John's reply, since it got upvotes and is set as answer, and I might have introduced an error)
Starting with Elasticsearch 2.x, deleting a type is no longer allowed, since the documents would remain in the index and cause index corruption.
Since Elasticsearch 5.x, the delete-by-query plugin has been replaced by the new Delete By Query API; in 7.x it is called on the index, without a type.
The curl option:
curl -X POST "localhost:9200/my-index/_delete_by_query" -H 'Content-Type: application/json' -d' { "query": { "match_all":{} } } '
Or in Kibana
POST /my-index/_delete_by_query
{
"query": {
"match_all":{}
}
}
The above answers no longer work with ES 6.2.2 because of Strict Content-Type Checking for Elasticsearch REST Requests. The curl command which I ended up using is this:
curl -H'Content-Type: application/json' -XPOST 'localhost:9200/yourindex/_doc/_delete_by_query?conflicts=proceed' -d' { "query": { "match_all": {} }}'
In Kibana Console:
POST calls-xin-test-2/_delete_by_query
{
"query": {
"match_all": {}
}
}
(Reputation not high enough to comment)
The second part of John Petrone's answer works - no query needed. It will delete the type and all documents contained in that type, but that can just be re-created whenever you index a new document to that type.
Just to clarify:
$ curl -XDELETE 'http://localhost:9200/twitter/tweet'
Note: this does delete the mapping! But as mentioned before, it can be easily re-mapped by creating a new document.
Note for ES2+
Starting with ES 1.5.3 the delete-by-query API is deprecated, and is completely removed since ES 2.0
Instead of the API, the Delete By Query is now a plugin.
In order to use the Delete By Query plugin you must install the plugin on all nodes of the cluster:
sudo bin/plugin install delete-by-query
All of the nodes must be restarted after the installation.
The usage of the plugin is the same as the old API. You don't need to change anything in your queries - this plugin will just make them work.
*For complete information regarding WHY the API was removed you can read more here.
You have these alternatives:
1) Delete a whole index:
curl -XDELETE 'http://localhost:9200/indexName'
example:
curl -XDELETE 'http://localhost:9200/mentorz'
For more details you can find here -https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html
2) Delete by Query to those that match:
curl -XDELETE 'http://localhost:9200/mentorz/users/_query' -d '{
"query":
{
"match_all": {}
}
}'
*Here mentorz is an index name and users is a type
I'm using elasticsearch 7.5 and when I use
curl -XPOST 'localhost:9200/materials/_delete_by_query?conflicts=proceed&pretty' -d'
{
"query": {
"match_all": {}
}
}'
it throws the error below.
{
"error" : "Content-Type header [application/x-www-form-urlencoded] is not supported",
"status" : 406
}
I also needed to add the -H 'Content-Type: application/json' header to the request to make it work.
curl -XPOST 'localhost:9200/materials/_delete_by_query?conflicts=proceed&pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}'
{
"took" : 465,
"timed_out" : false,
"total" : 2275,
"deleted" : 2275,
"batches" : 3,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
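The same request can be prepared from Python with only the standard library. This is a sketch rather than a recommended client: build_delete_by_query is a hypothetical helper, the host and index name are the ones used in the curl example, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_delete_by_query(base_url, index, query=None):
    """Prepare (but do not send) a Delete By Query request."""
    # Default to match_all, i.e. delete every document in the index.
    body = {"query": query if query is not None else {"match_all": {}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_delete_by_query?conflicts=proceed",
        data=json.dumps(body).encode("utf-8"),
        # Without this header the request is rejected, as shown above.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_delete_by_query("http://localhost:9200", "materials")
print(request.get_method(), request.full_url)
# To actually run it against a cluster: urllib.request.urlopen(request)
```

Passing a different query, for example a range on a date field, restricts the delete to the matching documents.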
Just to add a couple of cents to this.
The "delete_by_query" mentioned at the top is still available as a plugin in Elasticsearch 2.x.
In the upcoming version 5.x, however, it will be replaced by the
"delete by query api"
In Elasticsearch 2.3, the option
action.destructive_requires_name: true
in elasticsearch.yml does the trick
curl -XDELETE http://localhost:9200/twitter/tweet
For future readers:
in Elasticsearch 7.x there's effectively one type per index - types are hidden
you can delete by query, but if you want to remove everything you'll be much better off removing and re-creating the index. That's because deletes are only soft deletes under the hood, until they trigger Lucene segment merges*, which can be expensive if the index is large. Meanwhile, removing an index is almost instant: remove some files on disk and a reference in the cluster state.
* The video/slides are about Solr, but things work exactly the same in Elasticsearch, this is Lucene-level functionality.
If you want to delete documents according to a date, you can use the Kibana console (v6.1.2):
POST index_name/_delete_by_query
{
"query" : {
"range" : {
"sendDate" : {
"lte" : "2018-03-06"
}
}
}
}
