Managing large data using caching in C# - performance

Suppose I have a table Tab1 which contains a huge amount of data, let's say 500,000 rows (half a million). I want to cache that data every half an hour from Tab1 into HttpContext.Current.Cache["BookData"]. The problem I am facing is that whenever my C# code has to fetch that data, it takes a long time because the data is huge. So I want my C# code to keep serving the older cached data until the new data has been fetched from the database. Once the new data has been fetched completely, it should remove the older cached data and cache the new data.
What I've tried works fine as long as the data is small:
[HttpGet]
public IEnumerable<OG_Books> GetBooks()
{
    List<OG_Books> BookList = null;
    // Read the cache entry once, so it cannot vanish between the check and the use
    DataSet ds = HttpContext.Current.Cache["BookData"] as DataSet;
    if (ds != null)
    {
        BookList = EnumerableExtension.ToList<OG_Books>(ds.Tables[0]);
    }
    else
    {
        // Cache miss: reload everything from the database (slow for a large table)
        using (SqlConnection con = new SqlConnection(ConnectionString))
        using (var command = new SqlCommand("Select_OG_Book", con))
        using (SqlDataAdapter da = new SqlDataAdapter(command))
        {
            command.CommandType = CommandType.StoredProcedure;
            ds = new DataSet();
            da.Fill(ds);
        }
        // Sliding expiration: the entry expires 30 minutes after its last access
        HttpContext.Current.Cache.Insert("BookData", ds, null, Cache.NoAbsoluteExpiration, new TimeSpan(0, 30, 0));
        BookList = EnumerableExtension.ToList<OG_Books>(ds.Tables[0]);
    }
    return BookList;
}
The above code caches new data every half an hour, which works fine. But whenever the data is huge, my code gets stuck for a long time on this line:
da.Fill(ds);
So my question is: how do I return the older cached data while the new data is being fetched from the database, and then, once the new data has been fetched completely, remove the older cached data and cache the new one?

You can use a grace period for the object in the cache.
You can implement it with two cache entries that have different TTLs (time to live).
The first one is "BookData" (your DataSet), with a TTL of 1 hour for example (instead of 30 minutes).
The second one (it can be any object) controls how long "BookData" stays in the cache before it is refreshed. You can call it "BookDataInCache" and give it a TTL of 30 minutes for example.
When you insert "BookData", insert "BookDataInCache" at the same time.
When you get "BookData", check whether "BookDataInCache" is still in the cache.
If "BookDataInCache" is in the cache, return "BookData".
Otherwise, still return the old "BookData", and start an async request that loads the new data and inserts the new "BookData" together with a new "BookDataInCache".
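
The grace-period idea can be sketched without any cache API at all. The class below is a minimal illustration, not the answerer's code: GraceCache, its members and the injectable clock are hypothetical names. Instead of two cache entries it keeps one value plus a "fresh until" timestamp that plays the role of "BookDataInCache". It has no hard expiry, so the old value is served until a reload succeeds.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: serves the last loaded value and reloads it in the
// background once the "fresh" window has passed.
public sealed class GraceCache<T> where T : class
{
    private readonly Func<T> _load;        // slow loader, e.g. the code around da.Fill(ds)
    private readonly TimeSpan _freshFor;   // plays the role of the "BookDataInCache" TTL
    private readonly Func<DateTime> _now;  // injectable clock (assumption, eases testing)
    private readonly object _gate = new object();
    private T _value;                      // plays the role of "BookData"
    private DateTime _freshUntil;
    private bool _refreshing;

    public GraceCache(Func<T> load, TimeSpan freshFor, Func<DateTime> now = null)
    {
        _load = load;
        _freshFor = freshFor;
        _now = now ?? (() => DateTime.UtcNow);
    }

    // The most recent background reload, exposed so callers can wait for it.
    public Task LastRefresh { get; private set; } = Task.CompletedTask;

    public T Get()
    {
        bool startRefresh = false;
        T current;
        lock (_gate)
        {
            if (_value == null)
            {
                // Very first call: nothing to serve yet, so the caller has to wait.
                _value = _load();
                _freshUntil = _now() + _freshFor;
                return _value;
            }
            current = _value;
            if (_now() >= _freshUntil && !_refreshing)
            {
                _refreshing = true;   // only one reload at a time
                startRefresh = true;
            }
        }
        if (startRefresh)
        {
            LastRefresh = Task.Run(() => Refresh());
        }
        return current;               // the old value is returned immediately
    }

    private void Refresh()
    {
        try
        {
            T fresh = _load();        // slow part runs outside the lock
            lock (_gate)
            {
                _value = fresh;
                _freshUntil = _now() + _freshFor;
            }
        }
        finally
        {
            lock (_gate) { _refreshing = false; }
        }
    }
}
```

In the question's controller, a single static instance such as new GraceCache<DataSet>(LoadBooks, TimeSpan.FromMinutes(30)) could stand in for the cache lookup, where LoadBooks is an assumed method wrapping the stored procedure call. If a reload throws, the old value stays in place and the next call tries again.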

Related

How do I update one column of all rows in a large table in my Spring Boot application?

I have a Spring Boot 2.x project with a big table in my Cassandra database. In my Liquibase migration class, I need to replace the value of one column in all rows.
For me it's a big performance hit when I try to solve this with
SELECT * FROM BOOKING
forEach Row
Update Row
because of the total number of rows, even when I select only one column.
Is it possible to do something like a "partwise/pagination" loop?
Pseudocode:
Take first 1000 rows
do Update
Take next 1000 rows
do Update
loop.
I'm also happy about any other approaches you have.
Things you should know:
Make sure there is a way to group the updates by partition. If you try a batchUpdate on 1000 rows that are not in the same partition, the coordinator of the request will suffer: you are moving the load from your client to the coordinator, when you want to parallelize the writes instead. A batchUpdate in Cassandra has nothing to do with the one in relational databases.
For fine-grained operations like this you want to go back to using the drivers, with CassandraOperations and CqlSession, for maximum control.
There is a way to paginate with Spring Data Cassandra using Slice, but you do not have control over how the operations are implemented.
Spring Data Cassandra core:
Slice<MyEntity> slice = myEntityRepo.findAll(CassandraPageRequest.first(size));
int currpage = 0;
// Walk forward page by page until the requested page index is reached
while (slice.hasNext() && currpage < page) {
    slice = myEntityRepo.findAll(slice.nextPageable());
    currpage++;
}
slice.getContent();
Drivers:
// Prepare the statements once to speed up repeated queries
PreparedStatement selectPS = session.prepare(QueryBuilder
        .selectFrom("mytable").all()
        .build()
        .setPageSize(1000)                       // 1000 rows per page
        .setTimeout(Duration.ofSeconds(10)));    // 10s timeout
PreparedStatement updatePS = session.prepare(QueryBuilder
        .update("mytable")
        .setColumn("myColumn", QueryBuilder.bindMarker())
        .whereColumn("myPK").isEqualTo(QueryBuilder.bindMarker())
        .build()
        .setConsistencyLevel(ConsistencyLevel.ONE)); // Fast writes
// Paginate: consume only the rows of the current page
BoundStatement select = selectPS.bind();
ResultSet page1 = session.execute(select);
Iterator<Row> page1Iter = page1.iterator();
while (0 < page1.getAvailableWithoutFetching()) {
    Row row = page1Iter.next();
    session.executeAsync(updatePS.bind(...));
}
ByteBuffer pagingStateAsBytes =
        page1.getExecutionInfo().getPagingState();
// Statements are immutable: setPagingState returns a new statement
ResultSet page2 = session.execute(select.setPagingState(pagingStateAsBytes));
You could of course include this pagination in a loop and track progress.

ADO.NET - Data Adapter Fill Method - Fill Dataset with rows modified in SQL

I am using ADO.NET with a DataAdapter to fill a DataSet in my .NET Core 3.1 project.
The Fill method first runs when my program starts, so that I have an in-memory cache to use in my business/program logic. When I then change the tables using EF Core, I run the DataAdapter Fill method again once the changes have been saved, to re-populate the DataSet with the rows that were modified in SQL through EF Core.
I have been reading various docs for a number of days now, and what I'm unclear about is whether the DataAdapter Fill method overwrites all of the existing table rows in the DataSet each time it is called. That is, if I'm loading a DataSet with a table from SQL that has 10k rows, is it going to overwrite all 10k rows that exist in the DataSet, even if 99% of the rows have not changed?
The reason I am going down the DataSet route is that I want to keep an in-memory cache of the various tables from SQL, so I can query the data as fast as possible without sending queries to SQL all the time.
The solution I want is something along the lines of the DataAdapter Fill method, but I don't want rows in the DataSet to be overwritten if they have not been modified in SQL since the last run.
Is this how things work already, or do I have to look for another solution?
Below is just an example of the DataAdapter Fill method.
public async Task<AdoNetResult> FillAlarmsDataSet()
{
    string connectionString = _config.GetConnectionString("DefaultConnection");
    try
    {
        string cmdText1 = "SELECT * FROM [dbo].[Alarm] ORDER BY Id;" +
                          "SELECT * FROM [dbo].[AlarmApplicationRole] ORDER BY Id;";
        dataAdapter = new SqlDataAdapter(cmdText1, connectionString);
        // Create table mappings: Fill names the result sets "Table", "Table1", ...
        dataAdapter.TableMappings.Add("Table", "Alarm");
        dataAdapter.TableMappings.Add("Table1", "AlarmApplicationRole");
        // A new DataSet is created on every call, so every row is loaded again
        alarmDataSet = new DataSet
        {
            Locale = CultureInfo.InvariantCulture
        };
        // Fill the DataSet on a thread-pool thread (Fill itself is synchronous)
        await Task.Run(() => dataAdapter.Fill(alarmDataSet));
        return AdoNetResult.Success;
    }
    catch (Exception ex)
    {
        // Return the task with details of the exception
        return AdoNetResult.Failed(ex);
    }
}

Caching is working for one hour while it should be for days

I have created an API using .NET Core 2.0. This API connects to an Oracle database to retrieve the data it needs. One of the functions takes too much time, so I decided to use caching in order to retrieve the data faster.
Function description: Get ranking
Caching period: Data should be renewed in cache memory each Monday
I am using IMemoryCache, but the problem is that the data is not kept in the cache for multiple days. It lasts only for one hour; after that the data is retrieved from the database again, which takes too long (10 s). Below is my code:
var dateNow = DateTime.Now;
int diff = 7; // if today is Monday, add 7 days to get the next Monday
if (dateNow.DayOfWeek != DayOfWeek.Monday) {
    var daysToStartWeek = dateNow.DayOfWeek - DayOfWeek.Monday;
    diff = (7 - (daysToStartWeek)) % 7;
}
var nextMonday = dateNow.AddDays(diff).Date;
var totalDays = (nextMonday - dateNow).TotalDays; // time left until next Monday, 00:00
// Return the cached ranking if it is still there
if (_cache.TryGetValue("GetRanking", out IEnumerable<GetRankingStruct> objRanking))
{
    return Ok(objRanking);
}
// Cache miss: query the database (slow) and keep the result until next Monday
var dp = new DataProvider(Configuration);
var response = dp.GetRanking(userName, asAtDate);
_cache.Set("GetRanking", response, TimeSpan.FromDays(totalDays));
return Ok(response);
Could this be related to the token lifetime, since it's only 1 hour?
Firstly - have you checked whether your worker process is being restarted? You don't specify how you are hosting your application but, obviously, if the application (worker process) is restarted, your memory cache will be empty.
If your worker process is restarting, you could load the cache on startup.
Secondly - I believe that the implementation may choose to empty the cache due to inactivity or memory constraints. You can set the priority to never remove - https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.caching.memory.cacheitempriority?view=dotnet-plat-ext-3.1
The priority is a per-entry setting (the Priority property of MemoryCacheEntryOptions, passed when you set the entry). Cache-wide settings such as the size limit go in the MemoryCacheOptions object passed to the constructor of the memory cache https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.caching.memory.memorycache.-ctor?view=dotnet-plat-ext-3.1#Microsoft_Extensions_Caching_Memory_MemoryCache__ctor_Microsoft_Extensions_Options_IOptions_Microsoft_Extensions_Caching_Memory_MemoryCacheOptions__.
Finally - I assume you've made your _cache object static so that it is shared by all instances of your class (or made the controller, if that's what it is, a singleton).
These are my suggestions.
Good luck.
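
The expiry and priority suggestions above can be put together in a short sketch. WeeklyCache and its two methods are hypothetical names; the code assumes the Microsoft.Extensions.Caching.Memory package that the question already uses. It sets an absolute expiration at the next Monday, 00:00, and marks the entry as NeverRemove so that memory-pressure compaction should skip it. It cannot help if the process itself restarts.

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

// Hypothetical helper for "cache until next Monday".
public static class WeeklyCache
{
    // Midnight at the start of the next Monday strictly after 'now'.
    public static DateTime NextMonday(DateTime now)
    {
        int daysAhead = ((int)DayOfWeek.Monday - (int)now.DayOfWeek + 7) % 7;
        if (daysAhead == 0)
        {
            daysAhead = 7; // on a Monday, keep the data for a full week
        }
        return now.Date.AddDays(daysAhead);
    }

    public static void SetUntilNextMonday<T>(IMemoryCache cache, string key, T value, DateTime now)
    {
        var options = new MemoryCacheEntryOptions
        {
            // Absolute expiry: does not depend on when the entry was last read
            AbsoluteExpiration = new DateTimeOffset(NextMonday(now)),
            // Ask the cache not to evict this entry under memory pressure
            Priority = CacheItemPriority.NeverRemove
        };
        cache.Set(key, value, options);
    }
}
```

In the controller from the question, the call would look like WeeklyCache.SetUntilNextMonday(_cache, "GetRanking", response, DateTime.Now). An absolute DateTimeOffset avoids having to convert the remaining time into a TimeSpan by hand.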

EF 6.2 code first, simple query takes very long

In an old DB application I'd like to start moving towards a code-first approach.
There are a lot of SPs, triggers, functions, etc. in the database which make things error prone.
As a starter, I'd like to have a proof of concept, therefore I started with a new solution, where I imported the entire database (Add new item -> ADO.NET entity data model -> Code First from database)
As a simple first shot I wanted to query 1 column of 1 table. The table contains about 5k rows and the result delivers 3k strings. This takes over 90 seconds now!
Here's the code of the query:
static void Main(string[] args)
{
    using (var db = new Model1())
    {
        var theList = db.T_MyTable.AsNoTracking()
            .Where(t => t.SOME_UID != null)
            .OrderBy(t => t.SOMENAME)
            .Select(t => t.SOMENAME)
            .ToList();
        foreach (var item in theList)
        {
            Console.WriteLine(item);
        }
        Console.WriteLine("Number of names: " + theList.Count);
    }
    Console.ReadKey();
}
In the generated table code I added the column type "VARCHAR" to all of the string fields/column properties:
[Column(TypeName = "VARCHAR")] // this I added to all of the string properties
[StringLength(50)]
public string SOME_UID { get; set; }
I assume I am missing an important step; I can't believe a code-first query is this slow.
I figured out that the root cause is the huge context that needs to be built, consisting of over 1000 tables/files.
How I found the problem: using the profiler, I observed that the expected query hits the database after about 90 seconds, which told me that the query itself is fast. Then I tried the same code in a new project, where I only imported the single table I access in the code.
Another proof that it's context related is executing the query twice in the same session: the second run took only milliseconds.
Key point: if you have a legacy database with a lot of tables, don't use one single DbContext that contains all the tables (except for initializing the database), but several smaller domain-specific ones with the tables you need for the given domain context. Entities can exist in multiple DbContexts; tailor the relationships (e.g. by "Ignore"-ing where not required) and use lazy loading where appropriate. These things help to boost performance.
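
As an illustration of the key point, a small domain-specific context for the query above might look like the sketch below. It assumes EF 6 (the EntityFramework package). NamesContext, MyTableRow and the ID key column are hypothetical, and "name=Model1" assumes the connection string keeps the name of the generated context.

```csharp
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
using System.Data.Entity;

// Only the table this part of the application needs is mapped.
[Table("T_MyTable")]
public class MyTableRow
{
    [Key]
    public int ID { get; set; } // hypothetical key column

    [Column(TypeName = "VARCHAR")]
    [StringLength(50)]
    public string SOME_UID { get; set; }

    [Column(TypeName = "VARCHAR")]
    public string SOMENAME { get; set; }
}

public class NamesContext : DbContext
{
    static NamesContext()
    {
        // Existing database: EF should never try to create or migrate it
        Database.SetInitializer<NamesContext>(null);
    }

    // Assumed connection string name, taken from the generated Model1 context
    public NamesContext() : base("name=Model1") { }

    // A single DbSet keeps the model that EF builds on first use small
    public DbSet<MyTableRow> MyTable { get; set; }
}
```

The query from the question would then run against new NamesContext() instead of new Model1(), so the model built on first use covers one table rather than the whole database.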

Get Crystal Report data in session

I have noticed that Crystal Reports runs the LINQ query again when the page index changes, that is, when we go from the first page to the second.
So I wanted to know if we can find out which page is loaded, so that we can keep the values in session.
Just a hint is required, as I am not getting the desired results from Google.
Update:
I am sorry, in a hurry I clicked on the wrong tag.
So the problem is this:
This is the code I use for running my Crystal report:
var rpt = new Result();
List<class> lst1 = new DALMethod().Get();
rpt.SetDataSource(lst1);
CRReportViewer.ReportSource = rpt;
When I switch from page one to page two or beyond, this DAL method is called again and takes the same time it took to load the first time. So I want to keep the data in session when the query runs the first time, and the next time, when I get the page index, show the data from session.
Is there a way to get the page index in this C# code?
I have found the solution; I hope it helps someone else.
I was using a generic list as the data source.
As soon as we know that the page is loading for the first time (that is, it is not a postback), we can initialize a list to be kept in session.
After showing the report, we can store the data source (which is a list) in session.
When the report page changes, the data is taken from session.
if (!IsPostBack)
{
    // First load of the page: clear any list left over from a previous report
    Session["ReportGenericList"] = null;
}
List<class> datasourceLst = null;
if (Session["ReportGenericList"] != null)
{
    // Page change (postback): reuse the list stored in session
    datasourceLst = (List<class>)Session["ReportGenericList"];
}
else
{
    // First load: run the query once and keep the result in session
    datasourceLst = new DALMethod().Get();
    Session["ReportGenericList"] = datasourceLst;
}
