Using LINQ, is there a more efficient way to do this?
var seats = new List<int>();
using (IDataReader reader = qSeats.ExecuteReader())
{
    while (reader.Read())
    {
        seats.Add(reader.GetInt32(0));
    }
}
I saw: How do I load data to a list from a dataReader?
However, that is the same code I have above, and it seems like there could be faster ways, like using LINQ, seats.AddRange(), or some kind of ToList().
A DataReader is a read-only, forward-only stream of data from a database. The way you are doing it is probably already the fastest. From the documentation:
Using the DataReader can increase application performance both by
retrieving data as soon as it is available, rather than waiting for
the entire results of the query to be returned, and (by default)
storing only one row at a time in memory, reducing system overhead.
Retrieving Data Using the DataReader
I need to query a big dataset from the DB. I'm going to use pagination parameters (LIMIT and OFFSET) to avoid loading the whole dataset into the heap. For that purpose I'm trying to fetch rows with the RowCallbackHandler interface, because the docs say "An interface used by JdbcTemplate for processing rows of a ResultSet on a per-row basis," and I've also read advice to use that interface to deal with rows one by one.
But something goes wrong every time I try to fetch the data. Here is my code, along with a screenshot from VisualVM of the heap space graph, which indicates that all rows were loaded into memory. The query I'm trying to execute returns around 1.5M rows from the DB.
// here just the sql query, a map with parameters for the query, and a pretty simple RowCallbackHandler
jdbcTemplate.query(queryForExecute, params, new RowCallbackHandler() {
    @Override
    public void processRow(ResultSet rs) throws SQLException {
        while (rs.next()) {
            System.out.println("test");
        }
    }
});
heap via visualVM:
Update: I made a mistake in calling rs.next() in a loop, but removing that line didn't change the situation: the rows are still all loaded into memory.
The main problem was with my understanding of the documentation. The doc says:
An interface used by JdbcTemplate for processing rows of a ResultSet on a per-row basis.
Actually, my code was doing things the right way: it returned a ResultSet containing all the rows, because no limit was defined. I wasn't confident that adding LIMIT to an arbitrary SQL query would work well, so I decided to implement the limiting via the RowCallbackHandler instead, and that was a bad idea: LIMIT works great with all types of SQL queries (complex and simple).
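The LIMIT/OFFSET pagination approach can be sketched as follows. This is a minimal illustration, not actual JdbcTemplate code: fetchPage stands in for a query with LIMIT/OFFSET appended (simulated here against an in-memory list), and the body of the inner loop is where the per-row callback would go. All the names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class PaginatedFetch {

    // Stand-in for "SELECT ... LIMIT :limit OFFSET :offset".
    // Here the "database" is simulated with an in-memory list.
    static List<Integer> fetchPage(List<Integer> table, int offset, int limit) {
        int from = Math.min(offset, table.size());
        int to = Math.min(offset + limit, table.size());
        return new ArrayList<>(table.subList(from, to));
    }

    // Pages through the whole "table" with a bounded page size, so only
    // one page of rows is ever held in memory at a time.
    static int processAll(List<Integer> table, int pageSize) {
        int processed = 0;
        int offset = 0;
        while (true) {
            List<Integer> page = fetchPage(table, offset, pageSize);
            if (page.isEmpty()) {
                break;
            }
            for (Integer row : page) {
                processed++; // processRow(row) would go here
            }
            offset += pageSize;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 1500; i++) {
            table.add(i);
        }
        System.out.println(processAll(table, 100)); // prints 1500
    }
}
```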
What do developers commonly use as the key and value to cache the result from a SQL query into Redis? For example, if I have a Users table, and I want to cache the results from the query:
SELECT name, age FROM Users
1) Which Redis data structure should I use? Should I just have a single Key for the query and store the entire object returned by the database as the Value as such:
{ key: { object returned by database } }
Or should I use Redis' List data structure and loop through the rows individually and push them into the List as such:
{ key: [ ... ]}
Wouldn't this add computation time of O(N)? How is this more effective than just simply storing the object returned by the database?
Or should I use Redis' Hash Map data structure and loop through the rows individually and set a unique Key for each row with its corresponding attributes as such:
{ key1: {name: 'Bob', age: 25} }, { key2: {name: 'Sally', age: 15} }, ...
2) What would be a good rule of thumb with regards to the Key? From my understanding, some people just use the SQL query as the Key? But if you do so, does that mean you would have to store the entire object returned by the database as the Value (as per question 1)? Is this the best way to do it? If you are using an ORM, do you still use the SQL query as the key?
This is nicely analyzed in the Database Caching Strategies Using Redis whitepaper, by AWS.
Here are the options discussed in the document. Which is best is really a design decision based on the tradeoffs you have to make for your specific use case.
Cache the Database SQL ResultSet
Cache a serialized ResultSet object that contains the fetched database
row.
Pro: When data retrieval logic is abstracted (e.g., as in a Data Access Object or DAO layer), the consuming code expects only a
ResultSet object and does not need to be made aware of its
origination. A ResultSet object can be iterated over, regardless of
whether it originated from the database or was deserialized from the
cache, which greatly reduces integration logic. This pattern can be
applied to any relational database.
Con: Data retrieval still requires extracting values from the ResultSet object cursor and does not further simplify data access; it
only reduces data retrieval latency.
Cache Select Fields and Values in a Custom Format
Cache a subset of a fetched database row into a custom structure that
can be consumed by your applications.
Pro: This approach is easy to implement. You essentially store specific retrieved fields and values into a structure such as JSON or
XML and then SET that structure into a Redis string. The format you
choose should be something that conforms to your application’s data
access pattern.
Con: Your application is using different types of objects when querying for particular data (e.g., Redis string and database
results). In addition, you are required to parse through the entire
structure to retrieve the individual attributes associated with it.
Cache Select Fields and Values into an Aggregate Redis Data Structure
Cache the fetched database row into a specific data structure that can
simplify the application’s data access.
Pro: When converting the ResultSet object into a format that simplifies access, such as a Redis Hash, your application is able to
use that data more effectively. This technique simplifies your data
access pattern by reducing the need to iterate over a ResultSet object
or by parsing a structure like a JSON object stored in a string. In
addition, working with aggregate data structures, such as Redis Lists,
Sets, and Hashes provide various attribute level commands associated
with setting and getting data, eliminating the overhead associated
with processing the data before being able to leverage it.
Con: Your application is using different types of objects when querying for particular data (e.g., Redis Hash and database results).
Cache Serialized Application Object Entities
Cache application object entities in a serialized form that your
applications can consume directly.
Pro: Use application objects in their native application state with simple serializing and deserializing techniques. This can
rapidly accelerate application performance by minimizing data
transformation logic.
Con: This is an advanced application development use case.
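To make the third option above concrete, mapping a row into a per-row Redis Hash might look like the sketch below. The key scheme ("users:&lt;id&gt;") and the helper names are assumptions for the example, not something prescribed by Redis; a real client would pass the resulting key and field map to HSET.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowToHash {

    // Converts one result row into the field-value map you would pass to
    // HSET. Every value becomes a string, since Redis hash fields are strings.
    static Map<String, String> toHashFields(String name, int age) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", name);
        fields.put("age", Integer.toString(age));
        return fields;
    }

    // Per-row key, e.g. "users:1", so each row is addressable on its own.
    static String rowKey(String table, long id) {
        return table.toLowerCase() + ":" + id;
    }

    public static void main(String[] args) {
        // Conceptually: HSET users:1 name Bob age 25
        System.out.println(rowKey("Users", 1) + " " + toHashFields("Bob", 25));
    }
}
```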
Regarding 2)
What would be a good rule of thumb with regards to the Key?
Using the SQL query as the Key is OK as long as you are sure it is unique. Add prefixes if there is a risk of non-uniqueness; you may have other databases with the same table names, leading to identical queries. Also make the keys invariant: all lower case or all upper case, since Redis keys are case-sensitive.
But if you do so, does that mean you would have to store the entire object returned by the database as the Value (as per question 1)?
Not necessarily; it comes down to what processing you are doing with the query. Chances are some results are best stored as the raw entire object for processing, some as a JSON-stringified object to return quickly to the client, some as rows, etc. The best approach is to adapt accordingly.
Is this the best way to do it?
Not necessarily.
If you are using an ORM, do you still use the SQL query as the key?
You may if your ORM easily exposes the SQL Query programmatically, and it is consistent.
I wouldn't get fixated on the idea of using the SQL query as the key. Use something you can be sure is consistent, that optimizes your processing, and that gives you clear rules for invalidation. It could be the method call with its parameters, the web API call, etc.
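One way to get a consistent key, sketched below under the assumptions from the answer: normalize the case and whitespace, prepend a namespace prefix, and hash the query text plus its bound parameters so the key stays short while uniqueness is preserved. The names (buildCacheKey, the prefix scheme) are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class CacheKeys {

    // Builds a Redis key from a namespace prefix, the normalized SQL text,
    // and the bound parameters. Hashing keeps keys short and makes trivially
    // different spellings of the same query (case, whitespace) collide on
    // purpose, since Redis keys are case-sensitive.
    static String buildCacheKey(String prefix, String sql, Object... params) {
        StringBuilder raw = new StringBuilder(sql.trim().toLowerCase());
        for (Object p : params) {
            raw.append('|').append(p);
        }
        return prefix + ":" + sha256Hex(raw.toString());
    }

    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Case and surrounding whitespace no longer produce distinct keys.
        String a = buildCacheKey("users", "SELECT name, age FROM Users", 25);
        String b = buildCacheKey("users", "  select name, age from users  ", 25);
        System.out.println(a.equals(b)); // prints true
    }
}
```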
Is there a way to write data and meta data atomically in azure storage for Page Blobs?
Consider a page blob which has multiple writers.
I see recommendations to use the metadata for things like record count, sequence number, and the general structure of the blob's data. However, if two writers write data and then have to update the metadata, isn't there a race where each writer reads the current record count and then updates it? Both read 0 and write 1, but there are actually 2 records.
Same applies to any scenario where the meta data write is not keyed by something particular to that write (eg, each write then writes a new name-value pair into meta data).
The below suggestion does not seem to work for me.
// 512 byte aligned stream with my data
Stream toWrite = PageAlignedStreamManager.Write(data);
long whereToWrite = this.MetaData.TotalWrittenSizeInBytes;
this.MetaData.TotalWrittenSizeInBytes += toWrite.Length;
await this.Blob.FetchAttributesAsync();
if (this.MetaData.TotalWrittenSizeInBytes > this.Blob.Properties.Length)
{
    await this.Blob.ResizeAsync(PageAlignedMemoryStreamManager.PagesRequired(this.MetaData.TotalWrittenSizeInBytes) * PageAlignedMemoryStreamManager.PageSizeBytes * 2);
}
this.MetaData.RevisionNumber++;
this.Blob.Metadata[STREAM_METADATA_KEY] = JsonConvert.SerializeObject(this.MetaData);
// TODO: the below two lines should happen atomically
await this.Blob.WritePagesAsync(toWrite, whereToWrite, null, AccessCondition.GenerateLeaseCondition(this.BlobLeaseId), null, null);
await this.Blob.SetMetadataAsync(AccessCondition.GenerateLeaseCondition(this.BlobLeaseId), null, null);
toWrite.Dispose();
If I do not explicitly call SetMetadataAsync as the next action, the metadata does not get set :(
Is there a way to write data and meta data atomically in azure storage?
Yes, you can update the data and metadata atomically this way. When we set/update blob metadata using the following code snippet, it is only stored in the local blob object; no network call is made at that point.
blockBlob.Metadata["docType"] = "textDocuments";
When we use the following code to upload the blob, it makes a single call that sets both the blob content and the metadata. If the upload fails, neither the blob content nor the metadata is updated.
blockBlob.UploadText("new content");
However, if two writers write data and then have to update the meta data, isn't there a race where each writes and tries to update the record count by reading the current count and then updating. Both read 0 and write 1, but there are actually 2.
Azure Storage supports three data concurrency strategies (optimistic concurrency, pessimistic concurrency, and last writer wins). We can use optimistic concurrency control through the ETag property, or pessimistic concurrency control through a lease, either of which helps guarantee data consistency.
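The optimistic-concurrency idea can be illustrated with a plain compare-and-swap loop. This is not the Azure SDK API; the in-memory store below merely mimics ETag behavior: a write succeeds only if the caller presents the version it last read (as with an If-Match condition), otherwise it re-reads and retries, so the "both read 0, both write 1" race cannot lose an update.

```java
import java.util.concurrent.atomic.AtomicReference;

public class EtagCounter {

    // A value paired with a version token, mimicking a blob's ETag.
    static final class Versioned {
        final int value;
        final int etag;
        Versioned(int value, int etag) { this.value = value; this.etag = etag; }
    }

    private final AtomicReference<Versioned> store =
            new AtomicReference<>(new Versioned(0, 0));

    // Read-modify-write guarded by an "If-Match: etag" style precondition.
    // On a conflict (another writer got in first), re-read and retry.
    int incrementRecordCount() {
        while (true) {
            Versioned current = store.get();
            Versioned updated = new Versioned(current.value + 1, current.etag + 1);
            if (store.compareAndSet(current, updated)) {
                return updated.value;
            }
            // ETag mismatch: loop and retry with the fresh value.
        }
    }

    int recordCount() { return store.get().value; }

    public static void main(String[] args) throws InterruptedException {
        EtagCounter counter = new EtagCounter();
        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) counter.incrementRecordCount();
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Without the precondition both writers could read 0 and write 1;
        // with it, no update is lost.
        System.out.println(counter.recordCount()); // prints 2000
    }
}
```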
My task is to dump entire Azure tables with arbitrary unknown schemas. Standard code to do this resembles the following:
TableQuery<DynamicTableEntity> query = new TableQuery<DynamicTableEntity>();
foreach (DynamicTableEntity entity in table.ExecuteQuery(query))
{
    // Write a dump of the entity (row).
}
Depending on the table, this works at a rate of 1000-3000 rows per second on my system. I'm guessing this (lack of) performance has something to do with separate HTTP requests issued to retrieve the data in chunks. Unfortunately, some of the tables are multi-gigabyte in size, so this takes a rather long time.
Is there a good way to parallelize the above or speed it up some other way? It would seem that those HTTP requests could be sent by multiple threads, as in web crawlers and the like. However, I don't see an immediate method to do so.
Unless you know the PartitionKeys of the entities in the table (or some other query criterion that includes the PartitionKey), AFAIK you would need to take the top-down approach you're using right now. To fire queries in parallel efficiently, you have to include the PartitionKey in your queries.
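If you can enumerate the partition keys, the fan-out looks roughly like the sketch below. fetchPartition is a placeholder for a query filtered to one PartitionKey (simulated here with generated strings); the point is only that each partition's scan is an independent task you can run on a thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScan {

    // Placeholder for a query filtered to one PartitionKey. A real
    // implementation would page through that partition's entities.
    static List<String> fetchPartition(String partitionKey, int rows) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            result.add(partitionKey + "/row" + i);
        }
        return result;
    }

    // Scans each known partition as its own task and merges the results.
    static List<String> scanAll(List<String> partitionKeys, int rowsPerPartition) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String pk : partitionKeys) {
                futures.add(pool.submit(() -> fetchPartition(pk, rowsPerPartition)));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> rows = scanAll(List.of("p1", "p2", "p3"), 100);
        System.out.println(rows.size()); // prints 300
    }
}
```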
I need to load a model, consisting of roughly 20 tables, from the database with Entity Framework.
So there are probably a few ways of doing this:
Use one huge Include call
Use many Includes calls while manually iterating the model
Use many IsLoaded and Load calls
Here's what happens with each of the three options:
1) EF creates a HUGE query, which puts a very heavy load on the DB, and then again on the application when mapping the model. So not really an option.
2) The database gets called a lot, again with pretty big queries.
3) The database gets called even more, but this time with small loads.
All of these options weigh heavily on performance, but I do need to load all of that data (for drawing calculations).
So what can I do?
a) Heavy operation => heavy load => do nothing :)
b) Review design => but how?
c) A magical option that will make all these problems go away
When you need to load a lot of data from a lot of different tables, there is no "magic" solution that makes all the problems go away. But in addition to what you have already discussed, you should consider projection. If you don't need every single property of an entity, it is often cheaper to project only the information you do need, i.e.:
from parent in MyEntities.Parents
select new
{
    ParentName = parent.Name,
    Children = from child in parent.Children
               select new
               {
                   ChildName = child.Name
               }
}
One other thing to keep in mind is that for very large queries, the cost of compiling the query can often exceed the cost of executing it. Only profiling can tell you if this is the problem. If this turns out to be the problem, consider using CompiledQuery.
You might analyze the ratio of queries to updates. If you mostly upload the model once and everything else is a query, then maybe you should store an XML representation of the model in the database as a "shadow" of the model. You should be able to either read the entire XML column in at once fairly quickly, or do your calculations (or at least fetch the values necessary for the calculations) using XQuery.
This assumes SQL Server 2005 or above.
You could consider caching your data in memory instead of getting it from the database each time.
I would recommend Enterprise Library Caching Application block: http://msdn.microsoft.com/en-us/library/dd203099.aspx