Elasticsearch SQL how to bucket time - elasticsearch

elasticsearch = 7.16.1
Using this in Python, this Elasticsearch SQL query seems to work to get the data I want within the time range:
es.sql.query(body={'query': "select * from \"index-*\" where \"@timestamp\" >= CAST('2022-06-30T08:00:00.000Z' AS DATETIME) and \"@timestamp\" <= CAST('2022-07-10T08:00:00.000Z' AS DATETIME) order by \"@timestamp\" desc"})
But it's returning every row within that time range.
I want to be able to bucket by minute, hour, or day, so it returns fewer rows but I can still get the right totals.
Couldn't find where to do that here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-datetime.html
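Bucketing is not on that page; it is done with the HISTOGRAM grouping function, documented under Elasticsearch SQL's grouping functions. A minimal sketch, assuming COUNT(*) stands in for whatever totals you actually need; pass it as the query string in the same es.sql.query call:

-- Swap INTERVAL 1 HOUR for INTERVAL 1 MINUTE or INTERVAL 1 DAY as needed.
SELECT HISTOGRAM("@timestamp", INTERVAL 1 HOUR) AS bucket, COUNT(*) AS total
FROM "index-*"
WHERE "@timestamp" >= CAST('2022-06-30T08:00:00.000Z' AS DATETIME)
  AND "@timestamp" <= CAST('2022-07-10T08:00:00.000Z' AS DATETIME)
GROUP BY bucket

Each returned row is then one time bucket with its aggregate, rather than one raw document.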

Related

Subtotal plus limiting data set

I'm brand-spankin' new to SQL and was asked to help write a query for a report. I need to limit the data to the last 10 services done by a clinician, and then subtotal the difference between the two times (time in and time out) for each clinician.
I'm guessing I need a "LIMIT" clause to limit the data, but I'm not sure how or where to put it. I am also thinking I need to use "GROUP BY", but I'm not positive on that either. Any help would be appreciated.
I tried simplifying the existing query that my boss started but I'm getting error messages about the GROUP BY clause because I don't have an aggregate.
Select CV.emp_name,
CV.Visittype,
CV.clientvisit_id,
CV.client_id,
CV.rev_timein,
CV.rev_timeout,
Convert(varchar(25),Cast(CV.rev_timein As Time),8) As Start_Time,
CV.program_id,
CV.cptcode
From ClientVisit CV
Where CV.visittype = 'Mobile Therapy' And CV.program_id = 31
And CV.cptcode <> 'NB' And CV.rev_timein <=
Convert(datetime,IsNull(@param2, GetDate())) And CV.rev_timein >=
Convert(datetime,IsNull(@param1, GetDate())) And
Cast(CV.rev_timein As time) > '15:59'
Group By CV.emp_name,
CV.rev_timein
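A hedged sketch of one common approach, not from the original thread: use ROW_NUMBER() to number each clinician's visits from most recent, keep the first 10 per clinician (a plain LIMIT/TOP would cut off globally, not per clinician), and only then GROUP BY with an aggregate for the subtotal. Column names follow the query above; the @param date filters belong in the inner WHERE as well.

Select T.emp_name,
       Sum(DateDiff(minute, T.rev_timein, T.rev_timeout)) As Total_Minutes -- subtotal of time in vs. time out
From (
    Select CV.emp_name, CV.rev_timein, CV.rev_timeout,
           Row_Number() Over (Partition By CV.emp_name
                              Order By CV.rev_timein Desc) As rn -- 1 = most recent visit
    From ClientVisit CV
    Where CV.visittype = 'Mobile Therapy' And CV.program_id = 31
      And CV.cptcode <> 'NB'
) T
Where T.rn <= 10 -- last 10 services per clinician
Group By T.emp_name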

Retrieve data according to timestamp value from Room Database

I'm trying to retrieve entries that were written into the database in the last 2 minutes.
I use the following query:
#Query("SELECT * FROM Contacts where ('now()' - gatt_server_connection_timestamp) <= 120000")
List<Contact> getContactsByGattServerConnectionTimestamp();
However, the result I get is the whole database.
What is wrong with this query?
The SQLite date-time functions are described here.
If gatt_server_connection_timestamp is in milliseconds since epoch, this query should work:
#Query("SELECT * FROM Contacts where gatt_server_connection_timestamp >= (1000 * strftime('%s', datetime('now', '-2 minutes'))))
List<Contact> getContactsByGattServerConnectionTimestamp();
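As a sanity check on the arithmetic (assumption: the column stores epoch milliseconds), the cutoff expression can be run on its own in a SQLite shell; strftime('%s', ...) returns epoch seconds, so the factor of 1000 converts it to milliseconds:

-- Epoch milliseconds for 'two minutes ago':
SELECT 1000 * strftime('%s', datetime('now', '-2 minutes')) AS cutoff_millis;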

Power Query (M language) 50 day moving Average

I have a list of products and would like to get a 50-day simple moving average of each product's volume using Power Query (M).
The table is sorted by product name and date. I added a custom column and applied the code below.
if [date] >= #date(2018,1,29)
then List.Average(List.Range(Source[Volume],[Volume]-1,-50))
else ""
Since it is already sorted by date and name, an if statement was applied with a date as criteria/filter. However, an error occurs that says
'Volume' column not found in the table.
I expect to have an added column in Power Query with the 50-day moving average of volume per product, with the calculation done only if the date is greater than or equal to Jan 29, 2018.
We don't know what your columns are, but assuming you have [product], [date] and [volume] in Source, this would average the last 50 days of [volume] for the identical [product] based on each [date], and place it in a new column:
AvgAmountAdded = Table.AddColumn(Source, "AverageAmount", (i) => List.Average(Table.SelectRows(Source, each ([product] = i[product] and [date]<=i[date] and [date]>=Date.AddDays(i[date],-50)))[volume]), type number)
Finally! Found a solution.
First, apply an index by product (see this post for further details).
Then index again without criteria (index all rows).
Then, apply the code below:
= Table.AddColumn(#"Previous Step", "Volume SMA(50)", each if [Index_byProduct] >= 50 then List.Average(List.Range(#"Previous Step"[Volume], ([Index_All]-50), 50)) else 0)
For large datasets, the Table.Buffer function is recommended after the index-expand step to improve PQ calculation speed.
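For reference, and not part of the original answers: the same per-product, 50-row moving average can be written as a SQL window function, assuming a table named volumes with product, date, and volume columns. Unlike the M code's else 0, the first 49 rows of each product average over a shorter window here.

Select product, date, volume,
       Avg(volume) Over (Partition By product
                         Order By date
                         Rows Between 49 Preceding And Current Row) As volume_sma_50 -- current row + 49 before it
From volumes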

LINQ Dynamic Query: tried to search data within a year, got no results

I am building a dynamic query to search a collection of documents using LINQ (please refer to Scott Gu's blog: http://weblogs.asp.net/scottgu/dynamic-linq-part-1-using-the-linq-dynamic-query-library).
I can return the documents modified since last week or since last month without any problem! However, if I try to return the documents last modified from last year to now, I get no results. And when I try to get any documents modified since a year ago, I get only one week of results from 3 weeks ago.
Does anyone know why? Below is my code:
// tried to get last year till now: (no result)
( LastModifiedStr >= \"4/27/2014\" and LastModifiedStr <= \"4/28/2015\" )
// tried to get documents older than one year. (results from 4/3/2015-4/9/2015)
( LastModifiedStr >= \"4/27/2014\" )
You cannot compare the dates as strings. The LastModified field should be a DateTime, and inside your query builder you probably want to convert the string to a DateTime as well. Something like:
( LastModified >= Convert.ToDateTime(\"4/27/2014\") and LastModified <= Convert.ToDateTime(\"4/28/2015\") )
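To see why the original filter misbehaves: 'M/d/yyyy' strings compare lexicographically, character by character, not chronologically. This standalone snippet (an illustration, not from the thread; it runs on most SQL engines) shows a date that is chronologically inside the range being rejected by the string comparison:

-- As strings, '4/3/2015' > '4/28/2015' because '3' sorts after '2' in position 3:
SELECT CASE WHEN '4/3/2015' >= '4/27/2014' AND '4/3/2015' <= '4/28/2015'
            THEN 'kept' ELSE 'dropped' END AS outcome; -- returns 'dropped'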

Cassandra slow get_indexed_slices speed

We are using Cassandra for log collecting.
About 150,000 - 250,000 new records per hour.
Our column family has several columns like 'host', 'errorlevel', 'message', etc and special indexed column 'indexTimestamp'.
This column contains time rounded to hours.
So, when we want to get some records, we use get_indexed_slices() with a first IndexExpression by indexTimestamp (with the EQ operator) and then some other IndexExpressions - by host, errorlevel, etc.
When getting records just by indexTimestamp everything works fine.
But when getting records by indexTimestamp and, for example, host, Cassandra takes a long time (more than 15-20 seconds) and throws a timeout exception.
As I understand it, when getting records by an indexed column and a non-indexed column, Cassandra first gets all records by the indexed column and then filters them by the non-indexed columns.
So why is Cassandra so slow at this? There are no more than 250,000 records per indexTimestamp. Shouldn't it be possible to filter them within 10 seconds?
Our Cassandra cluster is running on one machine (Windows 7) with 4 CPUs and 4 GB of memory.
You have to bear in mind that Cassandra is very bad at this kind of query. Indexed-column queries are not meant for big tables. If you want to search your data with this type of query, you have to tailor your data model around it.
In fact, Cassandra is not a DB you can query arbitrarily. It is a key-value storage system. To understand that, have a quick look here: http://howfuckedismydatabase.com/
The most basic pattern to help you is bucketed rows and ranged slice queries.
Let's say you have the object
user : {
name : "XXXXX"
country : "UK"
city : "London"
postal_code :"N1 2AC"
age : "24"
}
and of course you want to query by city OR by age (AND queries require yet another data model).
Then you would have to save your data like this, assuming the name is a unique id:
write(row = "UK", column_name = "city_XXXX", value = {...})
AND
write(row = "bucket_20_to_25", column_name = "24_XXXX", value = {...})
Note that I bucketed by country for the city search and by age bracket for age search.
The range query for age EQ 24 would be:
get_range_slice(row = "bucket_20_to_25", from = "24^", to = "24`")
As a note, '^' is the character just before '_' in ASCII and '`' is the one just after it, giving you effectively all the columns that start with "24_".
This also allows you to query for ages between 21 and 24, for example.
Hope it was useful.
