Querying RavenDb with max 30 requests error - session

Just want to get some ideas from anyone who has encountered a similar problem, and how you solved it.
Basically, we have around 10K documents stored in RavenDB, and we need to let users filter and search across those documents. I am aware that there is a maximum page size of 1024 in RavenDB, so for the filter and search to work I need to do my own paging. But my solution gives me the following error:
The maximum number of requests (30) allowed for this session has been reached.
I have tried many different ways of disposing the session, both by wrapping it in a using block and by explicitly calling Dispose after every call to RavenDB, with no success.
Does anyone know how to get around this issue? What's the best practice for this kind of scenario?
var pageSize = 1024;
var skipSize = 0;
var maxSize = 0;

using (_documentSession)
{
    maxSize = _documentSession.Query<LogEvent>().Count();
}

while (skipSize < maxSize)
{
    using (_documentSession)
    {
        var events = _documentSession.Query<LogEvent>().Skip(skipSize).Take(pageSize).ToList();
        _documentSession.Dispose();

        // building finalPredicate code... which I am not providing here...
        results.AddRange(events.Where(finalPredicate.Compile()).ToList());
        skipSize += pageSize;
    }
}

RavenDB limits the number of requests (Load, Query, ...) to 30 per session. This behavior is documented.
I can see that you dispose the session in your code, but I don't see where you recreate it. In any case, loading data the way you intend to is not a good idea.
We're using indexes and paging, and we never load more than 1024 documents at once.
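A minimal sketch of that per-page pattern, assuming an IDocumentStore field named _documentStore (that field name and the loop shape are assumptions; finalPredicate is the compiled filter from the question). Each page gets its own short-lived session, so no session ever comes near the 30-request limit:

var results = new List<LogEvent>();
var filter = finalPredicate.Compile(); // compile the question's predicate once
const int pageSize = 1024;
var page = 0;

while (true)
{
    // One query per short-lived session
    using (var session = _documentStore.OpenSession())
    {
        var events = session.Query<LogEvent>()
                            .Skip(page * pageSize)
                            .Take(pageSize)
                            .ToList();

        if (events.Count == 0)
            break; // no more documents

        results.AddRange(events.Where(filter));
    }
    page++;
}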
If you're expecting thousands of documents, or your predicate logic can't be expressed as an index, and you don't care how long your query will take, use the unbounded results API (streaming):
var results = new List<LogEvent>();
var query = session.Query<LogEvent>();

// Streaming is a single request, bypasses the per-session request limit,
// and does not track the returned entities
using (var enumerator = session.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
    {
        if (predicate(enumerator.Current.Document))
        {
            results.Add(enumerator.Current.Document);
        }
    }
}
Depending on the number of documents, this can use a lot of RAM.

Related

Max number of classroom id retrieved

I have approx. 520 classrooms archived in my account. If I try to select them with
var courseList = Classroom.Courses.list({"courseStates":["ARCHIVED"]}).courses;
I get only 300 of them. Is this normal?
How can I select them all? I'm actually writing a script to delete the oldest, but if I can't retrieve them, I can't delete them.
I understand that you have so many courses that the Courses.list() response is split into separate pages. In that case you can navigate them very easily by using tokens. First of all, make sure that you specify pageSize in your request; that sets the desired number of results per page. Keep in mind that the server may return fewer than the specified number of results, as stated in the docs. If your response got divided into pages, it will include the nextPageToken field. Then, to obtain the rest of the courses, repeat your request with that nextPageToken value in the pageToken property. Please don't hesitate to ask if anything about this approach is unclear.
Thanks a lot Jaques, I found the solution:
var parametri = {"courseStates": "ARCHIVED"};
var page = Classroom.Courses.list(parametri);
var listaClassi = page.courses;
// Loop for as long as a nextPageToken comes back; the token is absent
// (undefined) on the last page, so comparing against '' would not stop correctly.
while (page.nextPageToken) {
  parametri.pageToken = page.nextPageToken;
  page = Classroom.Courses.list(parametri);
  listaClassi = listaClassi.concat(page.courses);
}
Anyway, I didn't need to change the pageSize, nor did I find any tutorial about it.

Slow query over large collection

I'm working on an audit log which saves sessions in RavenDB. Initially, the website for querying the audit logs was responsive enough, but as the amount of logged data has increased, the search page has become unusable (it times out before returning when using default settings, regardless of the query). Right now we have about 45 million sessions in the collection that gets queried, but the steady state is expected to be around 150 million documents.
The problem is that with this much live data, playing around to test things has become impractical. I hope someone can give me some ideas about the most productive areas to investigate.
The index looks like this:
public AuditSessions_WithSearchParameters()
{
    Map = sessions => from session in sessions
                      select new Result
                      {
                          ApplicationName = session.ApplicationName,
                          SessionId = session.SessionId,
                          StartedUtc = session.StartedUtc,
                          User_Cpr = session.User.Cpr,
                          User_CprPersonId = session.User.CprPersonId,
                          User_ApplicationUserId = session.User.ApplicationUserId
                      };

    Store(r => r.ApplicationName, FieldStorage.Yes);
    Store(r => r.StartedUtc, FieldStorage.Yes);
    Store(r => r.User_Cpr, FieldStorage.Yes);
    Store(r => r.User_CprPersonId, FieldStorage.Yes);
    Store(r => r.User_ApplicationUserId, FieldStorage.Yes);
}
The essence of the query is this bit:
// Query input parameters
var fromDateUtc = fromDate.ToUniversalTime();
var toDateUtc = toDate.ToUniversalTime();

sessionQuery = sessionQuery
    .Where(s =>
        s.ApplicationName == applicationName &&
        s.StartedUtc >= fromDateUtc &&
        s.StartedUtc <= toDateUtc
    );

var totalItems = Count(sessionQuery);

var sessionData =
    sessionQuery
        .OrderByDescending(s => s.StartedUtc)
        .Skip((page - 1) * PageSize)
        .Take(PageSize)
        .ProjectFromIndexFieldsInto<AuditSessions_WithSearchParameters.ResultWithAuditSession>()
        .Select(s => new
        {
            s.SessionId,
            s.SessionGroupId,
            s.ApplicationName,
            s.StartedUtc,
            s.Type,
            s.ResourceUri,
            s.User,
            s.ImpersonatingUser
        })
        .ToList();
First, to determine the number of pages of results, I count the number of results in my query using this method:
private static int Count<T>(IRavenQueryable<T> results)
{
    RavenQueryStatistics stats;
    results.Statistics(out stats).Take(0).ToArray();
    return stats.TotalResults;
}
This turns out to be very expensive in itself, so optimizations are relevant both here and in the rest of the query.
The query time is not related to the number of result items in any relevant way; if I use a value for the applicationName parameter that matches none of the results, the query is just as slow.
One area of improvement could be to use sequential IDs for the sessions. For reasons not relevant to this post, I found it most practical to use GUID-based IDs. I'm not sure whether I can easily change the IDs of the existing documents (with this much data), and I would prefer not to drop the data (but might if the expected impact is large enough). I understand that sequential IDs result in better-behaved B-trees for the indexes, but I have no idea how significant the impact is.
Another approach could be to include a timestamp in the ID and query for documents whose IDs start with a string matching enough of the time to filter the results. An example ID could be AuditSessions/2017-12-31-24-31-42/bc835d6c-2fba-4591-af92-7aab96339d84. This also requires me to update or drop all the existing data, but it has the added benefit of mostly sequential IDs.
A third approach could be to move old data into a different collection over time, recognizing that you most often look at the most recent data. This requires a background job and support for querying across collection time boundaries, and it has the issue that the collection with the old sessions is still slow if you do need to access it.
I'm hoping there is something simpler than these solutions, such as modifying the query or the indexed fields in a way that avoids a lot of work.
At a glance, it is probably related to the range query on StartedUtc.
I'm assuming that you are indexing exact timestamps, so you have a LOT of distinct values there.
If you can, you can dramatically reduce the cost by changing the index to second or minute granularity (which is usually what you actually query on), and then use Ticks, which allows a numeric range query:
StartedUtcTicks = new DateTime(session.StartedUtc.Year, session.StartedUtc.Month, session.StartedUtc.Day, session.StartedUtc.Hour, session.StartedUtc.Minute, session.StartedUtc.Second).Ticks,
And then query by the date ticks.
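In sketch form (the StartedUtcTicks field name, the Result shape, and the truncation helper here are assumptions, not the original code):

// Inside the index constructor: emit ticks truncated to whole seconds,
// so the range query scans far fewer distinct terms.
Map = sessions => from session in sessions
                  select new Result
                  {
                      ApplicationName = session.ApplicationName,
                      StartedUtcTicks = new DateTime(
                          session.StartedUtc.Year, session.StartedUtc.Month,
                          session.StartedUtc.Day, session.StartedUtc.Hour,
                          session.StartedUtc.Minute, session.StartedUtc.Second).Ticks,
                      // ...the remaining fields as in the original index
                  };

// On the query side: truncate the inputs the same way, then range over
// the ticks, which becomes a numeric range query.
Func<DateTime, long> toSecondTicks = dt =>
    new DateTime(dt.Year, dt.Month, dt.Day, dt.Hour, dt.Minute, dt.Second).Ticks;

var fromTicks = toSecondTicks(fromDate.ToUniversalTime());
var toTicks = toSecondTicks(toDate.ToUniversalTime());

sessionQuery = sessionQuery.Where(s =>
    s.ApplicationName == applicationName &&
    s.StartedUtcTicks >= fromTicks &&
    s.StartedUtcTicks <= toTicks);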

How to get a previous page from elasticsearch using search_after?

I'm trying to implement fast pagination with Elasticsearch. I have read the docs on the search_after parameter, and I understand how to build "forward" pagination, but I can't figure out how to move to the previous page with this approach.
In a project we are working on, we simply reverse the sort direction and then use search_after as if it were a search_before.
This is a late answer, but it's a little better than having to keep track of results in the application. For that specific scenario, the Scroll API (which I don't know whether it was available at the time) may be better suited.
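A hedged sketch of that reverse-sort trick against the raw REST API (the "logs" index, the timestamp/id sort fields, and port are all assumptions): flip the sort to ascending, search_after the first hit of the current page, then reverse the returned hits before rendering.

using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class SearchBeforeSketch
{
    // Forward pages sort by timestamp desc; to go one page back, sort asc,
    // seek past the current page's FIRST hit, and reverse the hits client-side.
    static async Task<string> PreviousPage(HttpClient http, long firstHitTimestamp, string firstHitId)
    {
        var body = $@"{{
          ""size"": 10,
          ""sort"": [ {{ ""timestamp"": ""asc"" }}, {{ ""id"": ""asc"" }} ],
          ""search_after"": [{firstHitTimestamp}, ""{firstHitId}""],
          ""query"": {{ ""match_all"": {{}} }}
        }}";

        var response = await http.PostAsync(
            "http://localhost:9200/logs/_search",
            new StringContent(body, Encoding.UTF8, "application/json"));
        return await response.Content.ReadAsStringAsync();
    }
}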
Although the API doesn't have a "search previous", you have this workaround.
It's easy to move backward, and I had to do it as well: just keep track of the sort values in variables in whatever language you're using.
I keep an object alongside the search state with a pointer to where I am in the data. For example:
$scope.search.display = 0;
$scope.searchIndex = {};
var data = getElasticSearchQuery(); // this is your data from the Elasticsearch query
// Remember the sort values of the page's last hit; they become the
// search_after for the following page
if (!$scope.searchIndex[$scope.search.display + 10] && data.hits.length > 0) {
    $scope.searchIndex[$scope.search.display + 10] = data.hits[data.hits.length - 1].sort;
}
If you have 'next' and 'previous' buttons, then in your POST request to Elasticsearch just assign the search_after parameter from the correct index:
$scope.prevButton = function() {
    $scope.search.display -= 10;
    if ($scope.search.display < 10) {
        $scope.search.searchAfter = null;
    }
    if ($scope.searchIndex[$scope.search.display]) {
        $scope.search.searchAfter = $scope.searchIndex[$scope.search.display];
    }
    $scope.sendResults(); // send the POST in an Elasticsearch query
};

$scope.nextButton = function() {
    $scope.search.display += 10;
    if ($scope.searchIndex[$scope.search.display]) {
        $scope.search.searchAfter = $scope.searchIndex[$scope.search.display];
    }
    $scope.sendResults(); // send the POST in an Elasticsearch query
};
That should get you on your feet. The 10 is my page size, meaning I paginate 10 results at a time.

How to cache IQueryable result for paging

What is the best way to cache an IQueryable result when every call needs to calculate a lot of things before returning to the client?
Code Sample
[Queryable]
public IQueryable<Car> Get()
{
    var result = GetCarList();
    // GetCarList() takes around 1 min to calculate
    return result.AsQueryable();
}

List<Car> GetCarList()
{
    var query = from car in db.CarDetail
                where car.color == "white" // white is mandatory
                select car;

    // ~10k records of white cars are selected, without considering makers
    var cars = new List<Car>();
    foreach (var car in query)
    {
        // processing each record on every call
        cars.Add(car);
    }
    return cars;
}
Query sample
First page:
localhost/api/Car?$filter=(make eq 'ford')&$orderby=carid desc&$top=10
Second page:
localhost/api/Car?$filter=(make eq 'ford')&$orderby=carid desc&$top=10&$skip=10
Third page:
localhost/api/Car?$filter=(make eq 'ford')&$orderby=carid desc&$top=10&$skip=20
Every call takes about 1 min, even though the calculation is the same for the current filter. What is the best way to cache this kind of API call?
As the OP explains in his comment, the object to cache is the list returned by the call to GetCarList(), and the result is always the same.
You can simply store this in the cache; see the docs for the Cache class.
When you need it, check if it's in the cache. If not, create it and store it in the cache before using it (anywhere you want to use it). As the Cache is thread-safe, you will not have concurrency problems when accessing it from different requests.
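A minimal sketch of that approach using HttpRuntime.Cache from the question's Web API action (the "WhiteCarList" key and the 30-minute expiry are assumptions):

[Queryable]
public IQueryable<Car> Get()
{
    // Try the cache first; build and store the list only on a miss
    var cars = HttpRuntime.Cache["WhiteCarList"] as List<Car>;
    if (cars == null)
    {
        cars = GetCarList(); // the expensive ~1 min calculation
        HttpRuntime.Cache.Insert(
            "WhiteCarList", cars,
            null,                                // no cache dependency
            DateTime.UtcNow.AddMinutes(30),      // absolute expiry: an assumption
            System.Web.Caching.Cache.NoSlidingExpiration);
    }
    // OData options ($filter/$orderby/$top/$skip) now run against the cached list
    return cars.AsQueryable();
}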

How to combine collection of linq queries into a single sql request

Thanks for checking this out.
My situation is that I have a system where users can create custom filtered views, which I build into a LINQ query on each request. On the interface they want to see the counts of all the views they have created; pretty straightforward. I'm familiar with combining multiple queries into a single call, but in this case I don't know up front how many queries I'll have.
Does anyone know of a technique where this loop combines the count queries into a single query that I can then execute with a ToList() or FirstOrDefault()?
// TODO Performance: this isn't good...
foreach (IMeetingViewDetail view in currentViews)
{
    view.RecordCount = GetViewSpecificQuery(view.CustomFilters).Count();
}
Here is an example of combining multiple queries in the way I'm referring to: two queries combined into an anonymous projection, resulting in a single request to the SQL server.
IQueryable<EventType> eventTypes = _eventTypeService.GetRecords().AreActive<EventType>();
IQueryable<EventPreferredSetup> preferredSetupTypes = _eventPreferredSetupService.GetRecords().AreActive<EventPreferredSetup>();

var options = someBaseQuery.Select(x => new
{
    EventTypes = eventTypes.AsEnumerable(),
    PreferredSetupTypes = preferredSetupTypes.AsEnumerable()
}).FirstOrDefault();
Well, for performance reasons, I would change the interface from IEnumerable<T> to a collection type that has a Count property; both IList<T> and ICollection<T> have one. This way, the collection object keeps track of its size and you just need to read it.
If you really want to avoid the loop, you could redefine RecordCount as a lazily loaded integer that calls GetViewSpecificQuery once to get the count:
// On the view class itself, so CustomFilters is the instance's own property
private int? _recordCount = null;

public int RecordCount
{
    get
    {
        if (_recordCount == null)
            _recordCount = GetViewSpecificQuery(CustomFilters).Count();
        return _recordCount.Value;
    }
}
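If the set of views were fixed and known up front, the counts could be folded into one projection in the same spirit as the combined-query example in the question. A hedged sketch (query1/query2 stand in for GetViewSpecificQuery results, and whether this translates into a single SQL statement with scalar subqueries depends on your LINQ provider):

// Hedged sketch: scalar subquery counts in a single anonymous projection
var counts = someBaseQuery
    .Select(x => new
    {
        View1Count = query1.Count(),
        View2Count = query2.Count()
    })
    .FirstOrDefault();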
