Convert SQL to ES query - Elasticsearch

I want to get all the records that have the latest "created_on" time from Elasticsearch documents.
In SQL, what I need is:
select * from table1
where created_on = (select max(created_on) from table1)
But I'm new to ES and don't know how to do it.
I can first get the Max(created_on) date from ES and then query again to get all the records that have Max(created_on).
Is there a way to get this with single query?

{
"query" : {
"match_all" : { }
},
"sort": [
{
"created_on": {
"order": "desc"
}
}
],
"size": 1
}
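You can try this query and let me know if it works. Note that with "size": 1 it returns a single document, so if several documents share the latest created_on, only one of them comes back.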
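If every document sharing the latest created_on is needed in one request, one option (a sketch, not the answer above) is a terms aggregation on created_on ordered by key descending with size 1, so only the bucket for the maximum date survives, plus a top_hits sub-aggregation that returns that bucket's documents. The Python below only builds the request body and parses a response; the aggregation names latest and docs are arbitrary choices. top_hits returns at most index.max_inner_result_window documents (100 by default), so larger ties would still need the two-query approach from the question.

```python
import json
from typing import Any


def latest_docs_query(field: str = "created_on", max_docs: int = 100) -> dict[str, Any]:
    """Search body returning the docs that have the maximum value of `field`."""
    return {
        "size": 0,  # only the aggregation is needed, not the regular hits
        "aggs": {
            "latest": {
                "terms": {
                    "field": field,
                    "order": {"_key": "desc"},  # newest date first
                    "size": 1,  # keep only the bucket holding the max value
                },
                "aggs": {
                    # capped by index.max_inner_result_window (100 by default)
                    "docs": {"top_hits": {"size": max_docs}}
                },
            }
        },
    }


def extract_latest_docs(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the _source of each hit in the single bucket ([] for an empty index)."""
    buckets = response["aggregations"]["latest"]["buckets"]
    if not buckets:
        return []
    return [hit["_source"] for hit in buckets[0]["docs"]["hits"]["hits"]]


if __name__ == "__main__":
    print(json.dumps(latest_docs_query(), indent=2))
```

The printed body can be sent as-is to the _search endpoint of the index.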

You can try (this is LINQ against the SQL table rather than an Elasticsearch query):
// descending order
var entity= ctx.Table1.OrderByDescending(s => s.Created_on).FirstOrDefault();

Related

Use query result as parameter for another query in Elasticsearch DSL

I'm using Elasticsearch DSL, and I'm trying to use a query result as a parameter for another query, like below:
{
"query": {
"bool": {
"must_not": {
"terms": {
"request_id": {
"query": {
"match": {
"processing.message": "OUT Followup Synthesis"
}
},
"fields": [
"request_id"
],
"_source": false
}
}
}
}
}
}
As you can see above, I'm trying to search for documents whose request_id is not one of the request_ids with processing.message equal to OUT Followup Synthesis.
I'm getting an error with this query:
Error loading data [x_content_parse_exception] [1:1660] [terms_lookup] unknown field [query]
How can I achieve my goal using Elasticsearch DSL?
Original question, extracted from the comments:
I'm trying to fetch data with processing.message equal to 'IN Followup Sythesis' whose request_id doesn't appear in the data with processing.message equal to 'OUT Followup Sythesis'. In SQL:
SELECT d FROM data d
WHERE d.processing.message = 'IN Followup Sythesis'
AND d.request_id NOT IN (SELECT request_id FROM data WHERE processing.message = 'OUT Followup Sythesis');
Answer: generally speaking, Elasticsearch supports neither subqueries nor SQL-style joins; joins have to be done on the application side.
So you'll have to run your first query, take the retrieved IDs and put them into a second query, ideally a terms query.
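The two-query approach can be sketched in Python roughly as follows. Assumptions: search is a hypothetical callable that sends a body to reqs/_search and returns the parsed JSON; request_id is mapped as keyword (with dynamic mapping you might query request_id.keyword instead); and the first result set fits in one page (index.max_result_window, 10,000 by default). The sketch uses match_phrase rather than match because a plain match on "OUT Followup Synthesis" ORs its terms and would also hit the IN documents through "followup".

```python
from typing import Any, Callable, Iterable

# Hypothetical: any callable taking a search body and returning the parsed response.
SearchFn = Callable[[dict[str, Any]], dict[str, Any]]


def ids_query(message: str, size: int = 10_000) -> dict[str, Any]:
    """First query: fetch only the request_id of docs carrying `message`."""
    return {
        "size": size,
        "_source": ["request_id"],
        "query": {"match_phrase": {"processing.message": message}},
    }


def not_in_query(message: str, excluded_ids: Iterable[str]) -> dict[str, Any]:
    """Second query: docs with `message` whose request_id is NOT IN excluded_ids."""
    return {
        "query": {
            "bool": {
                "must": {"match_phrase": {"processing.message": message}},
                "must_not": {"terms": {"request_id": sorted(set(excluded_ids))}},
            }
        }
    }


def in_not_out(search: SearchFn, in_msg: str, out_msg: str) -> list[dict[str, Any]]:
    """Run both queries and return the _source of the surviving documents."""
    first = search(ids_query(out_msg))
    out_ids = [hit["_source"]["request_id"] for hit in first["hits"]["hits"]]
    second = search(not_in_query(in_msg, out_ids))
    return [hit["_source"] for hit in second["hits"]["hits"]]
```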
Of course, this limitation can be overcome by "hijacking" a scripted metric aggregation.
Taking these 3 documents as examples:
POST reqs/_doc
{"request_id":"abc","processing":{"message":"OUT Followup Synthesis"}}
POST reqs/_doc
{"request_id":"abc","processing":{"message":"IN Followup Sythesis"}}
POST reqs/_doc
{"request_id":"xyz","processing":{"message":"IN Followup Sythesis"}}
you could run
POST reqs/_search
{
"size": 0,
"query": {
"match": {
"processing.message": "IN Followup Sythesis"
}
},
"aggs": {
"subquery_mock": {
"scripted_metric": {
"params": {
"disallowed_msg": "OUT Followup Synthesis"
},
"init_script": "state.by_request_ids = [:]; state.disallowed_request_ids = [];",
"map_script": """
def req_id = params._source.request_id;
def msg = params._source.processing.message;
if (msg.contains(params.disallowed_msg)) {
state.disallowed_request_ids.add(req_id);
// won't need this particular doc so continue looping
return;
}
if (state.by_request_ids.containsKey(req_id)) {
// there may be multiple docs under the same ID
// so concatenate them
state.by_request_ids[req_id].add(params._source);
} else {
// initialize an appendable arraylist
state.by_request_ids[req_id] = [params._source];
}
""",
"combine_script": """
state.by_request_ids.entrySet()
.removeIf(entry -> state.disallowed_request_ids.contains(entry.getKey()));
return state.by_request_ids
""",
"reduce_script": "return states"
}
}
}
}
which would return only the matching request:
"aggregations" : {
"subquery_mock" : {
"value" : [
{
"xyz" : [
{
"processing" : { "message" : "IN Followup Sythesis" },
"request_id" : "xyz"
}
]
}
]
}
}
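⚠️ This is almost guaranteed to be slow and goes against the suggested guidance of not accessing the _source field. But it also goes to show that subqueries can be "emulated". Two further caveats: the OUT documents reach the map_script only because the match query ORs its terms (both messages contain "Followup"), and combine_script runs once per shard, so an ID is only filtered out when its OUT and IN documents sit on the same shard; a cross-shard version would have to do the filtering in reduce_script.
💡 I'd recommend testing this script on a smaller set of documents before letting it target your whole index, e.g. by restricting it with a date range query.
FYI, Elasticsearch also exposes an SQL API. It used to be offered only through X-Pack as a paid feature, but in more recent versions it is part of the free Basic license.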

ElasticSearch - Use results of one query as a filter for another

I'm trying to convert a T-SQL query to Elasticsearch DSL.
I have 2 tables, and I'm trying to fetch data from the second table by filtering on the guids selected by the first query.
SELECT distinct guid INTO #temp
FROM table1
WHERE code = 13
AND DateDiff(day, log_time, '2019-08-24') = 0
SELECT details,CreatedDate,guid
FROM table2
WHERE guid IN (SELECT guid FROM #temp)
ORDER BY CreatedDate desc,guid
I have tried to convert the first query on index1 into Elasticsearch DSL:
{
"query": {
"bool": {
"filter": [
{ "term": { "code": 13 }},
{ "range": { "log_time": { "gte" : "2019-08-24",
"lt" : "2019-08-25"}}}
]
}
},
"_source": {
"includes": ["guid"],
"excludes": []
}
}
Now I want to use the guids returned by this query as a filter in a second query on index2.
Is this possible in elasticsearch?
I know a join is not supported in elasticsearch. But is there any way to perform a subquery in elasticsearch?
I have looked into terms lookup, but it seems to only work on the _id of the document.
But in my case, the guid, which is the unique identifier in SQL Server, would just be a document property in Elasticsearch.
Is there a way to tweak terms lookup to work on a field other than _id?
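For what it's worth, a terms lookup only addresses the *lookup document* by _id; the field being filtered can be any field, such as guid. So the #temp table can be mimicked by a document holding the list of guids. The Python sketch below only builds the bodies and rests on assumptions: guid is a keyword field, the distinct guids fit in one terms aggregation (max_guids), the lookup index name, document id and path "guids" are hypothetical, the syntax is 7.x-style (no mapping types), and the lookup is bounded by index.max_terms_count (65,536 by default).

```python
from datetime import date, timedelta
from typing import Any


def distinct_guids_query(day: date, max_guids: int = 10_000) -> dict[str, Any]:
    """index1: SELECT DISTINCT guid ... WHERE code = 13 AND log_time falls on `day`."""
    return {
        "size": 0,  # only the aggregation is needed
        "query": {
            "bool": {
                "filter": [
                    {"term": {"code": 13}},
                    # DateDiff(day, log_time, day) = 0  <=>  day <= log_time < day + 1
                    {"range": {"log_time": {
                        "gte": day.isoformat(),
                        "lt": (day + timedelta(days=1)).isoformat(),
                    }}},
                ]
            }
        },
        # a terms aggregation yields each guid once, like DISTINCT
        "aggs": {"guids": {"terms": {"field": "guid", "size": max_guids}}},
    }


def lookup_doc(agg_response: dict[str, Any]) -> dict[str, list[str]]:
    """Body of the document standing in for #temp; index it before the second query."""
    buckets = agg_response["aggregations"]["guids"]["buckets"]
    return {"guids": [b["key"] for b in buckets]}


def details_query(lookup_index: str, lookup_id: str) -> dict[str, Any]:
    """index2: WHERE guid IN (SELECT guid FROM #temp) ORDER BY CreatedDate DESC, guid."""
    return {
        "_source": ["details", "CreatedDate", "guid"],
        "query": {
            "bool": {
                "filter": {
                    "terms": {
                        # the filtered field is guid; only the lookup doc is fetched by _id
                        "guid": {"index": lookup_index, "id": lookup_id, "path": "guids"}
                    }
                }
            }
        },
        "sort": [{"CreatedDate": "desc"}, {"guid": "asc"}],
    }
```

Usage would be: run distinct_guids_query(date(2019, 8, 24)) against index1, index lookup_doc(response) into a (hypothetical) guid-lookup index under some id, then run details_query("guid-lookup", that_id) against index2.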

How to achieve the following SQL query in Elasticsearch?

Want to know the equivalent elasticsearch query for the below sql query?
SELECT * FROM table1 where val1 in (SELECT val1 FROM table1 WHERE val2 = "123");
How can I achieve this efficiently?
One way is to fetch all val1 values with a first Elasticsearch query and then fetch all matching documents with those val1 values in a second query. Is there any other way to get the results with a single Elasticsearch query instead of two?
Assuming you're heading towards an HTTP POST request, you could write your query like this:
Request:
http://localhost:9200/yourindex/_search
Request Body:
{
"query": {
"query_string": {
"query": "val1:(val2:\"123\")"
}
}
}
Instead of the IN keyword, you could use the : syntax in ES, or you could still use the terms query. This SO answer and this thread could be helpful.
EDIT
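Using the terms query (note that the val1 values have to be listed explicitly: a terms query cannot evaluate the val2 = "123" condition itself, so the values would come from a first query):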
{
"query" : {
"bool" : {
"filter" : {
"terms" : {
"val1" : [ "val1-value-from-first-query", "another-val1-value" ]
}
}
}
}
}
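When val2 = "123" may match many documents, one way to run the two-query route is to collect the distinct val1 values with a composite aggregation, paging with after_key, and then feed them into a terms query. This is a sketch under assumptions: search is a hypothetical callable that posts a body to the index's _search endpoint and returns parsed JSON, val1 and val2 are keyword fields, and the number of collected values stays under index.max_terms_count (65,536 by default); beyond that the terms query would have to be split into batches.

```python
from typing import Any, Callable, Optional

# Hypothetical: any callable taking a search body and returning the parsed response.
SearchFn = Callable[[dict[str, Any]], dict[str, Any]]


def collect_val1(search: SearchFn, val2: str, page_size: int = 1000) -> list[Any]:
    """Page through a composite aggregation to get every distinct val1 where val2 matches."""
    values: list[Any] = []
    after: Optional[dict[str, Any]] = None
    while True:
        composite: dict[str, Any] = {
            "size": page_size,
            "sources": [{"val1": {"terms": {"field": "val1"}}}],
        }
        if after is not None:
            composite["after"] = after  # resume after the last key of the previous page
        body = {
            "size": 0,
            "query": {"term": {"val2": val2}},
            "aggs": {"vals": {"composite": composite}},
        }
        agg = search(body)["aggregations"]["vals"]
        if not agg["buckets"]:  # an empty page means everything was collected
            return values
        values.extend(b["key"]["val1"] for b in agg["buckets"])
        after = agg["after_key"]


def in_query(values: list[Any]) -> dict[str, Any]:
    """SELECT * FROM table1 WHERE val1 IN (values), as a non-scoring filter."""
    return {"query": {"bool": {"filter": {"terms": {"val1": values}}}}}
```

This is still two round trips (more if paging is needed); Elasticsearch has no way to chain the second query onto the first server-side.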

Elasticsearch records from one type which DO NOT exists in other type

I have 1 index with 2 types in Elasticsearch. I want to query for all the records in type1 that DO NOT exists in type2.
SQL equivalent would be something like;
SELECT * FROM index/type1 AS t1 WHERE t1.uid NOT IN (SELECT t2.uid FROM index/type2 AS t2)
Any suggestions on how I can go about this? I'm using elasticsearch-2.2.0 (Java API). Thank you!
What you are trying to do is not possible in a single query with Elasticsearch, since it's not a relational store.
You can avoid passing the terms explicitly, but they must then be stored in a field of some document (terms lookup):
{
"bool": {
"must_not": {
"terms": {
"uid": {
"index": "document-index",
"type": "document-type",
"id": "document-id",
"path": "path-to-the-array-property-containing-the-terms"
}
}
}
}
}

Filters with AND on nested resources

My problem is: the Elasticsearch count is not the same as in my database.
I indexed the "users" table; each user can have one or many apps_events:
curl localhost:9200/users/_count
{"count":190291,"_shards":{"total":5,"successful":5,"failed":0}}
SELECT COUNT(*) FROM users
count : 190291
=> Same count, everything is OK!
But when I search with 2 filters on the nested resource, one terms filter and one term filter:
curl -X GET 'http://localhost:9200/users/user/_search?load=&size=10&pretty' -d '
{
"query": {
"match_all": {
}
},
"filter": {
"and": [
{
"terms": {
"apps_events.type": [
"sale"
]
}
},
{
"term": {
"apps_events.status": "active"
}
}
]
},
"size": 10
}
total : 63756
And in my database :
SELECT
COUNT(DISTINCT(users_id))
FROM
apps_event
WHERE
apps_event_state_id = 1 AND apps_event_project_id = 2;
count : 63340
That's because the SQL equivalent of the Elasticsearch query is in fact:
SELECT
COUNT(DISTINCT(users_id))
FROM apps_event
WHERE apps_event_state_id = 1
AND users_id IN
(SELECT DISTINCT(users_id) FROM apps_event WHERE apps_event_project_id = 2)
count : 63756
===> How can I do a simple "AND" on each resource?
Thanks
You've probably checked this, but is apps_event_project_id the right counterpart to apps_events.type? They don't seem the same on the surface, but you would know that for sure. Also, does users_id map directly to the ES _id? It could be that you've got duplicates in your index, which would inflate its count.
Best resource ever on nested resources:
http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
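One likely explanation, offered as an assumption because the mapping isn't shown: if apps_events is a plain object field, Elasticsearch flattens the array, so apps_events.type and apps_events.status become independent value lists and each filter can be satisfied by a different event. That matches the IN-subquery SQL in the question. Mapping apps_events as nested and putting both conditions inside one nested query makes them apply to the same event. The sketch uses 7.x+ syntax rather than the 1.x and-filter syntax above, and switching a field to nested requires reindexing. The two pure-Python helpers just illustrate the difference on an in-memory user.

```python
from typing import Any

# Mapping fragment (7.x+ syntax, no mapping types): each apps_event becomes a
# hidden nested document, so conditions can be checked per event.
USERS_MAPPING: dict[str, Any] = {
    "properties": {
        "apps_events": {
            "type": "nested",
            "properties": {
                "type": {"type": "keyword"},
                "status": {"type": "keyword"},
            },
        }
    }
}


def users_with_event(event_type: str, status: str) -> dict[str, Any]:
    """Users with at least one apps_event matching BOTH conditions."""
    return {
        "query": {
            "nested": {
                "path": "apps_events",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"apps_events.type": event_type}},
                            {"term": {"apps_events.status": status}},
                        ]
                    }
                },
            }
        }
    }


def matches_flattened(user: dict[str, Any], event_type: str, status: str) -> bool:
    """Object mapping: the array is flattened into independent value lists."""
    types = {e["type"] for e in user["apps_events"]}
    statuses = {e["status"] for e in user["apps_events"]}
    return event_type in types and status in statuses


def matches_nested(user: dict[str, Any], event_type: str, status: str) -> bool:
    """Nested mapping: both conditions must hold on the same event."""
    return any(e["type"] == event_type and e["status"] == status
               for e in user["apps_events"])
```

A user whose "sale" event is inactive but who has some other active event is counted under the flattened semantics and not under the nested one, which is the kind of gap between 63756 and 63340 described above.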
