RavenDB time based query - performance

I have a document structure, JobData, that stores time-based data starting from time 0 to time t, in ticks. Usually the data amounts to one document per second.
public class JobData
{
    public long Ticks { get; set; }
    public double JobValue { get; set; }
}
For simplicity I am showing only one property, JobValue, but in reality it is a complex graph of data. My question is: given an input time in ticks, what kind of query would be best for finding the last JobData at or before that tick?
So if the database has a document at 1000 ticks and then the next one at 2000 ticks, and the user wants to find the state at 1500 ticks, he/she should get the JobData at 1000 ticks as the answer.
The query I am using now is:
var jobData = documentSession.Query<JobData>().Where(t => t.Ticks <= 1500).OrderByDescending(t => t.Ticks).FirstOrDefault();
Is this the right and most efficient query? I have thousands of these JobData documents and just want to get to the closest one.
Thanks!

Ahmad,
Yes, that is the way to go about it, and it will be very fast.
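If you ever want to make the index explicit rather than relying on the auto-index RavenDB creates for this query, a minimal static index is sketched below. This is a sketch only, assuming the RavenDB .NET client; the JobData_ByTicks name is made up for illustration.
using System.Linq;
using Raven.Client.Indexes;

// Hypothetical static index over the Ticks property.
public class JobData_ByTicks : AbstractIndexCreationTask<JobData>
{
    public JobData_ByTicks()
    {
        Map = jobs => from job in jobs
                      select new { job.Ticks };
    }
}

// Querying against the static index instead of the dynamic one:
var jobData = documentSession.Query<JobData, JobData_ByTicks>()
    .Where(t => t.Ticks <= 1500)
    .OrderByDescending(t => t.Ticks)
    .FirstOrDefault();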


LINQ AsNoTracking running slow

I need a read-only (non-edited) list of items from my DB.
It was running slowly, so I was trying to squeeze out some speed increases.
So I added AsNoTracking to the LINQ query and it ran slower!
The following code took 7.43 seconds on average; AsNoTracking() is after the Where:
var result = await _context.SalesOrderItems.Where(x => x.SalesOrderId == SalesOrderId ).AsNoTracking().ToListAsync();
The following code took 8.62 seconds on average; AsNoTracking() is before the Where:
var result = await _context.SalesOrderItems.AsNoTracking().Where(x => x.SalesOrderId == SalesOrderId ).ToListAsync();
The following code took 6.95 seconds on average; there is no AsNoTracking():
var result = await _context.SalesOrderItems.Where(x => x.SalesOrderId == SalesOrderId ).ToListAsync();
So am I missing something? I always thought AsNoTracking() should run faster and be ideal for read-only lists.
Also, this table has two child tables.
The first time a query is run it must be compiled. If entities are
already tracked by the context, then a tracking query will return
those instances rather than creating new ones.
That is the reason why a tracking query might execute faster than one with AsNoTracking().
https://github.com/aspnet/EntityFrameworkCore/issues/14366
But an execution time of about 7 seconds indicates that the problem is not tracking vs. no-tracking; it points to a database issue (an unindexed column, for example). About 1.5 billion records can be fetched in roughly 30 ms, not seconds, if the data is set up properly.
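If the filtered column really is unindexed, here is a minimal sketch of declaring the index, assuming EF Core (the SalesOrderItem entity and SalesOrderId property come from the question; the context name is hypothetical):
using Microsoft.EntityFrameworkCore;

// Sketch only: an index on the column used in the Where clause usually
// matters far more for read performance than tracking vs. no-tracking.
public class SalesContext : DbContext // hypothetical context name
{
    public DbSet<SalesOrderItem> SalesOrderItems { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<SalesOrderItem>()
            .HasIndex(x => x.SalesOrderId); // backs Where(x => x.SalesOrderId == ...)
    }
}
With that index in place (applied via a migration), the filtered read should drop to milliseconds with or without AsNoTracking().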

What is the difference between TYPE_STEP_COUNT_DELTA and AGGREGATE_STEP_COUNT_DELTA data type in Google Fit Android Api?

The Google Fit documentation describes these two data types of the Sensors API. However, both seem to return the same data. Can anyone explain the difference?
TYPE_STEP_COUNT_DELTA:
In the com.google.step_count.delta data type, each data point represents the number of steps taken since the last reading.
AGGREGATE_STEP_COUNT_DELTA:
Aggregate number of steps during a time interval.
In short, TYPE_STEP_COUNT_DELTA is the raw input type (each point holds the steps since the last reading), while AGGREGATE_STEP_COUNT_DELTA is the output type produced when those deltas are bucketed over a time interval; that is why the aggregate() call below pairs them as input and output. You can see more here:
https://developers.google.com/android/reference/com/google/android/gms/fitness/data/DataType
// Setting a start and end date using a range of 1 week before this moment.
Calendar cal = Calendar.getInstance();
Date now = new Date();
cal.setTime(now);
long endTime = cal.getTimeInMillis();
cal.add(Calendar.WEEK_OF_YEAR, -1);
long startTime = cal.getTimeInMillis();
java.text.DateFormat dateFormat = java.text.DateFormat.getDateInstance();
Log.i(TAG, "Range Start: " + dateFormat.format(startTime));
Log.i(TAG, "Range End: " + dateFormat.format(endTime));
DataReadRequest readRequest = new DataReadRequest.Builder()
// The data request can specify multiple data types to return, effectively
// combining multiple data queries into one call.
// In this example, it's very unlikely that the request is for several hundred
// datapoints each consisting of a few steps and a timestamp. The more likely
// scenario is wanting to see how many steps were walked per day, for 7 days.
.aggregate(DataType.TYPE_STEP_COUNT_DELTA, DataType.AGGREGATE_STEP_COUNT_DELTA)
// Analogous to a "Group By" in SQL, defines how data should be aggregated.
// bucketByTime allows for a time span, whereas bucketBySession would allow
// bucketing by "sessions", which would need to be defined in code.
.bucketByTime(1, TimeUnit.DAYS)
.setTimeRange(startTime, endTime, TimeUnit.MILLISECONDS)
.build();

How to achieve dimensional charting on large dataset?

I have successfully used a combination of crossfilter, dc, and d3 to build multivariate charts for smaller datasets.
My current system handles 1.5 million txns a day, and I want to use the above combination to show dimensional charts on this big dataset (spanning 6 months). I cannot push data of this size to the frontend for obvious reasons.
The txn data has second-level granularity, but this level of granularity is not required in the visualization. If the txn data can be rolled up to day granularity at the backend, and only the day-based aggregation pushed to the frontend, that would drastically reduce the IO traffic and the size of the data handed to crossfilter and dc, and dc could then work its visualization magic.
Taking the above idea forward, I decided to reduce the size of the data by coarsening the granularity of the time series from seconds to days, pre-aggregating the data along various dimensions using the GROUP BY query below (this is similar to what crossfilter does at the frontend):
SELECT TRUNC(DATELOGGED) AS DTLOGGED, CODE, ACTION, COUNT(*) AS TXNCOUNT,
       GROUPING_ID(TRUNC(DATELOGGED), CODE, ACTION) AS GROUPING_ID
FROM AAAA
GROUP BY GROUPING SETS(TRUNC(DATELOGGED),
                       (TRUNC(DATELOGGED), CODE),
                       (TRUNC(DATELOGGED), ACTION));
Sample output of these rows:
Rows in which aggregation is done by (TRUNC(DATELOGGED), CODE) share a common GROUPING_ID of 1, and rows aggregated by (TRUNC(DATELOGGED), ACTION) share a common GROUPING_ID of 2.
//group by DTLOGGED, CODE
{"DTLOGGED":"2013-08-03T07:00:00.000Z","CODE":"144","ACTION":"", "TXNCOUNT":69,"GROUPING_ID":1},
{"DTLOGGED":"2013-08-03T07:00:00.000Z","CODE":"376","ACTION":"", "TXNCOUNT":20,"GROUPING_ID":1},
{"DTLOGGED":"2013-08-04T07:00:00.000Z","CODE":"144","ACTION":"", "TXNCOUNT":254,"GROUPING_ID":1},
{"DTLOGGED":"2013-08-04T07:00:00.000Z","CODE":"376","ACTION":"", "TXNCOUNT":961,"GROUPING_ID":1},
//group by DTLOGGED, ACTION
{"DTLOGGED":"2013-08-03T07:00:00.000Z","CODE":"","ACTION":"ENROLLED_PURCHASE", "TXNCOUNT":373600,"GROUPING_ID":2},
{"DTLOGGED":"2013-08-03T07:00:00.000Z","CODE":"","ACTION":"UNENROLLED_PURCHASE", "TXNCOUNT":48978,"GROUPING_ID":2},
{"DTLOGGED":"2013-08-04T07:00:00.000Z","CODE":"","ACTION":"ENROLLED_PURCHASE", "TXNCOUNT":402311,"GROUPING_ID":2},
{"DTLOGGED":"2013-08-04T07:00:00.000Z","CODE":"","ACTION":"UNENROLLED_PURCHASE", "TXNCOUNT":54910,"GROUPING_ID":2},
//group by DTLOGGED
{"DTLOGGED":"2013-08-03T07:00:00.000Z","CODE":"","ACTION":"", "TXNCOUNT":460732,"GROUPING_ID":3},
{"DTLOGGED":"2013-08-04T07:00:00.000Z","CODE":"","ACTION":"", "TXNCOUNT":496060,"GROUPING_ID":3}];
Questions:
These rows are disjoint, i.e., unlike usual rows, a single row does not carry valid values for both CODE and ACTION.
After a selection is made in one of the graphs, the redrawing effect either removes the other graphs or shows no data on them.
Can anyone give me troubleshooting help, or suggest a better way to solve this?
http://jsfiddle.net/universallocalhost/5qJjT/3/
There are a couple of things going on in this question, so I'll try to separate them:
Crossfilter works with tidy data
http://vita.had.co.nz/papers/tidy-data.pdf
This means that you will need to come up with a method of filling in the nulls you're seeing (or, if need be, omit the nulled values in your initial query of the data). If you want to get really fancy, you could even infer the null values from other data. Whatever your solution, you need to make your data tidy before putting it into crossfilter.
Groups and Filtering Operations
txnVolByCurrcode = txnByCurrcode.group().reduceSum(function(d) {
    if (d.GROUPING_ID === 1) {
        return d.TXNCOUNT;
    } else {
        return 0;
    }
});
This is a filtering operation done inside the reduction, and it is something you should separate out. Let that filtering happen elsewhere (in the visual, in crossfilter itself, or in the query on the data).
This means your reduceSum becomes:
var txnVolByCurrcode = txnByCurrcode.group().reduceSum(function(d) {
    return d.TXNCOUNT;
});
And if you would like the user to select which group to display:
var groupId = cfdata.dimension(function(d) { return d.GROUPING_ID; });
var groupIdGroup = groupId.group(); // this is an interesting name
dc.pieChart("#group-chart")
.width(250)
.height(250)
.radius(125)
.innerRadius(50)
.transitionDuration(750)
.dimension(groupId)
.group(groupIdGroup)
.renderLabel(true);
For an example of this working:
http://jsfiddle.net/b67pX/

SOQL - single row per each group

I have the following SOQL query to display a list of ABCs in my page block table.
public List<ABC__c> getABC() {
    List<ABC__c> ListABC = [SELECT WB1__c, WB2__c, WB3__c, Number, tentative__c, Actual__c, PrepTime__c, Forecast__c FROM ABC__c ORDER BY WB3__c];
    return ListABC;
}
As you can see in the image above, WB3 has a number of records for each of A, B, and C. But I want to display only one record for each WB3 group, based on Actual__c: only the latest Actual__c must be displayed for each WB3 group.
That is, ideally I want to display only 3 rows (one each for A, B, C) in this example.
For this, I used GROUP BY and displayed the result using AggregateResult. Here is the result.
I got the latest Actual Date for each WB3 as shown above, but the Tentative Date does not correspond to it; it is simply the MAX Tentative Date in the list.
Here is the code I used
public List<SiteMonitoringOverview> getSPM() {
    AggregateResult[] AgR = [SELECT WBS_3__c, MAX(Tentative_Date__c) dtTentativeDate, MAX(Actual_Date__c) LatestCDate FROM Site_progress_Monitoring__c GROUP BY WBS_3__c];
    if (AgR.size() > 0) {
        for (AggregateResult SalesList : AgR) {
            CustSumList.add(new SiteMonitoringOverview(String.valueOf(SalesList.get('WBS_3__c')), String.valueOf(SalesList.get('dtTentativeDate')), String.valueOf(SalesList.get('LatestCDate'))));
        }
    }
    return CustSumList;
}
I am forced to use MAX() for the tentative date, but I want the Tentative Date corresponding to the MAX Actual Date, not the MAX Tentative Date.
For group A, the Tentative Date of the MAX Actual Date is 12/09/2012, but the query displays the MAX Tentative Date, 27/02/2013. It should display 12/09/2012. This happens because I am using MAX(Tentative_Date__c) in my code, and every column in a grouped SOQL query must be either grouped or aggregated. That's the constraint I'm fighting.
How do I get the required 3 rows in this example?
Any suggestions? Is there a different approach (looping within groups)? How?
Just ran into this issue myself. The solution I came up with only works if you want the oldest or newest record from each grouping, so unfortunately it probably won't work in your case. I'll still leave this here in case it happens to help someone searching for a solution to this issue.
AggregateResult[] groupedResults = [Select Max(Id), WBS_3__c FROM Site_progress_Monitoring__c GROUP BY WBS_3__c];
Calling MAX() or MIN() on the Id lets you get one record per group. You can then run a second query filtered on those Ids to fetch the other fields of each record. In my case I just needed one record from each group and didn't really care which one it was.

Entity Framework 4 really slow on Nerd dinner FindByLocation modification

I have modified the Nerd Dinner example to find locations in the vicinity of a specified position. When selecting from a flat table performance is good, but I wanted to split up the tables so I have a generic coordinates table (SDB_Geography) and can also join in a table with specific data for what I call the entity type (HB_Entity).
I have made a new model called HbEntityModel which stores entity, hb and geography "sub models". Now the problem is that this query takes around 5 seconds to execute. I figured I would get a slight performance decrease by doing this, but 5 seconds is just ridiculous. Any ideas on how to improve the performance with the current table setup, or do I have to go back to a monstrous flat table?
public IEnumerable<HbEntityModel> FindByLocation(float latitude, float longitude)
{
return (from entity in db.SDB_Entity.AsEnumerable()
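// NOTE: AsEnumerable() here switches to LINQ to Objects, so all of the
// joins below run in memory over the full tables instead of being
// translated to SQL.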
join nearest in NearestEntities(latitude, longitude, 2)
on entity.EntityId equals nearest.EntityId
join hb in db.HB_Entity
on entity.EntityId equals hb.EntityId
join geo in db.SDB_Geography
on entity.GeographyId equals geo.GeographyId
select new HbEntityModel(entity, hb, geo)).AsEnumerable();
}
UPDATE
All tables contain around 14,000 records.
SDB_Entity 1:0/1 SDB_Geography
SDB_Entity 1:0/1 HB_Entity
The search yields around 70 HbEntityModels.
When selecting from a single table, the same query takes 0.3 s, using IQueryable instead of IEnumerable.
I found out how to do it with some help from Robban. See this post.
I rewrote the function to use a parameterless constructor (LINQ to Entities cannot translate a constructor that takes parameters) and could then keep the query as IQueryable, so the joins execute in the database rather than in memory.
public IQueryable<HbEntityModel> FindByLocation(float latitude, float longitude)
{
return (from entity in db.SDB_Entity
join nearest in NearestEntities(latitude, longitude, 2)
on entity.EntityId equals nearest.EntityId
join hb in db.HB_Entity
on entity.EntityId equals hb.EntityId
join geo in db.SDB_Geography
on entity.GeographyId equals geo.GeographyId
select new HbEntityModel() { Shared=entity, Specific=hb, Geography=geo }).AsQueryable();
}
The query now takes around 0.4 seconds to execute, which is somewhat acceptable. Hopefully things will be faster when my mean machine arrives. If someone could give me hints on how to improve the query, use a stored procedure, or set up an index, I would be more than grateful.
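One hedged suggestion for finding the right index: in EF4 you can cast the LINQ query to an ObjectQuery and dump the SQL it generates, then run that statement with the actual execution plan to see which columns need indexes. A sketch only (the query shape is from the post, with the NearestEntities join omitted for brevity; everything else is an assumption):
using System;
using System.Data.Objects;
using System.Linq;

// Sketch: inspect the SQL that the IQueryable version produces.
var query = from entity in db.SDB_Entity
            join hb in db.HB_Entity on entity.EntityId equals hb.EntityId
            join geo in db.SDB_Geography on entity.GeographyId equals geo.GeographyId
            select new { entity, hb, geo };

var objectQuery = query as ObjectQuery;
if (objectQuery != null)
{
    // Prints the generated SQL; the candidate index columns are typically
    // the join keys (EntityId on HB_Entity, GeographyId on SDB_Entity).
    Console.WriteLine(objectQuery.ToTraceString());
}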
