Save millions of rows from CSV to Oracle DB using Spring Boot JPA

On a regular basis another application dumps a CSV that contains more than 7-8 million rows. I have a cron job that loads the data from the CSV and saves it into my Oracle DB. Here's my code snippet:
String line = "";
int count = 0;
LocalDate localDateTime;
Instant from = Instant.now();
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("dd-MMM-yy");
List<ItemizedBill> itemizedBills = new ArrayList<>();
try {
    BufferedReader br = new BufferedReader(new FileReader("/u01/CDR_20210325.csv"));
    while ((line = br.readLine()) != null) {
        if (count >= 1) {
            String[] data = line.split("\\|");
            ItemizedBill customer = new ItemizedBill();
            customer.setEventType(data[0]);
            String date = data[1].substring(0, 2);
            String month = data[1].substring(3, 6);
            String year = data[1].substring(7, 9);
            month = WordUtils.capitalizeFully(month);
            String modifiedDate = date + "-" + month + "-" + year;
            localDateTime = LocalDate.parse(modifiedDate, formatter);
            customer.setEventDate(localDateTime.atStartOfDay(ZoneId.systemDefault()).toInstant());
            customer.setaPartyNumber(data[2]);
            customer.setbPartyNumber(data[3]);
            customer.setVolume(Long.valueOf(data[4]));
            customer.setMode(data[5]);
            if (data[6].contains("0")) { customer.setFnfNum("Other"); }
            else { customer.setFnfNum("FNF Number"); }
            itemizedBills.add(customer);
        }
        count++;
    }
    itemizedBillRepository.saveAll(itemizedBills);
} catch (IOException e) {
    e.printStackTrace();
}
}
This works, but it takes a long time to process. How can I make it more efficient and speed the process up?

There are a couple of things you should do to your code.
String.split, while convenient, is relatively slow because it recompiles the regular expression on every call. It is better to compile a Pattern once and call split on that to reduce the overhead.
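For example, compiling the pattern once and reusing it for every line:
// compile once (e.g. as a field, or a local before the read loop) and reuse for every line
Pattern splitter = Pattern.compile("\\|");
String[] data = splitter.split(line); // instead of line.split("\\|")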
Use proper JPA batching strategies as explained in this blog.
First enable batch processing in your Spring application.properties. We will use a batch size of 50 (you will need to experiment to find a proper batch size for your case).
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
Then save the entities directly to the database and, every 50 items, do a flush and clear. This flushes the pending state to the database and clears the first-level cache (which prevents excessive dirty checking).
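The flush/clear calls below assume the import runs in a Spring-managed bean with an injected EntityManager and a transaction around the whole import; a minimal sketch of that wiring (the method name is illustrative) could look like this:
@PersistenceContext
private EntityManager entityManager;

@Autowired
private ItemizedBillRepository itemizedBillRepository;

@Transactional
public void importCsv() {
    // read the CSV and save/flush/clear in batches as shown below
}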
With all the above your code should look something like this.
String line;
int count = 0;
Instant from = Instant.now();
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("dd-MMM-yy");
Pattern splitter = Pattern.compile("\\|");
try {
    BufferedReader br = new BufferedReader(new FileReader("/u01/CDR_20210325.csv"));
    while ((line = br.readLine()) != null) {
        if (count >= 1) {
            String[] data = splitter.split(line);
            ItemizedBill customer = new ItemizedBill();
            customer.setEventType(data[0]);
            String date = data[1].substring(0, 2);
            String month = data[1].substring(3, 6);
            String year = data[1].substring(7, 9);
            month = WordUtils.capitalizeFully(month);
            String modifiedDate = date + "-" + month + "-" + year;
            LocalDate localDate = LocalDate.parse(modifiedDate, formatter);
            customer.setEventDate(localDate.atStartOfDay(ZoneId.systemDefault()).toInstant());
            customer.setaPartyNumber(data[2]);
            customer.setbPartyNumber(data[3]);
            customer.setVolume(Long.valueOf(data[4]));
            customer.setMode(data[5]);
            if (data[6].contains("0")) {
                customer.setFnfNum("Other");
            } else {
                customer.setFnfNum("FNF Number");
            }
            itemizedBillRepository.save(customer);
        }
        count++;
        if ((count % 50) == 0) {
            this.entityManager.flush(); // sync with database
            this.entityManager.clear(); // clear 1st level cache
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
Two other optimizations you could apply:
If your volume property is a long rather than a Long, use Long.parseLong(data[4]) instead. It saves the Long creation and unboxing. With just 10 rows this might not be an issue, but over millions of rows those milliseconds add up.
If the raw date field already uses the dd-MMM-yy layout, build a case-insensitive DateTimeFormatter and remove the substring handling from your code. You can then do LocalDate.parse(data[1], formatter) directly and achieve the same result without the overhead of five additional String objects per row.
String line;
int count = 0;
Instant from = Instant.now();
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
        .parseCaseInsensitive()
        .appendPattern("dd-MMM-yy")
        .toFormatter(Locale.ENGLISH);
Pattern splitter = Pattern.compile("\\|");
try {
    BufferedReader br = new BufferedReader(new FileReader("/u01/CDR_20210325.csv"));
    while ((line = br.readLine()) != null) {
        if (count >= 1) {
            String[] data = splitter.split(line);
            ItemizedBill customer = new ItemizedBill();
            customer.setEventType(data[0]);
            LocalDate localDate = LocalDate.parse(data[1], formatter);
            customer.setEventDate(localDate.atStartOfDay(ZoneId.systemDefault()).toInstant());
            customer.setaPartyNumber(data[2]);
            customer.setbPartyNumber(data[3]);
            customer.setVolume(Long.parseLong(data[4]));
            customer.setMode(data[5]);
            if (data[6].contains("0")) {
                customer.setFnfNum("Other");
            } else {
                customer.setFnfNum("FNF Number");
            }
            itemizedBillRepository.save(customer);
        }
        count++;
        if ((count % 50) == 0) {
            this.entityManager.flush(); // sync with database
            this.entityManager.clear(); // clear 1st level cache
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}

You can use Spring Data JPA batch inserts. This link explains how to do it: https://www.baeldung.com/spring-data-jpa-batch-inserts
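A rough sketch of that idea, assuming the hibernate.jdbc.batch_size property from the answer above is set and using an arbitrary chunk size of 1000:
int chunkSize = 1000; // arbitrary; tune to your data and memory budget
for (int start = 0; start < itemizedBills.size(); start += chunkSize) {
    int end = Math.min(start + chunkSize, itemizedBills.size());
    itemizedBillRepository.saveAll(itemizedBills.subList(start, end));
    itemizedBillRepository.flush(); // push the chunk to the database (requires a JpaRepository)
}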

You can try streaming query results using Java 8 Streams and Spring Data JPA. The link below explains it in detail (its example uses MySQL, but the approach is not database-specific):
http://knes1.github.io/blog/2015/2015-10-19-streaming-mysql-results-using-java8-streams-and-spring-data.html
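A minimal sketch of what such a streaming repository method might look like (repository, entity, and method names are illustrative; the stream must be consumed inside a transaction and closed when done):
public interface ItemizedBillRepository extends JpaRepository<ItemizedBill, Long> {

    @Query("select b from ItemizedBill b")
    Stream<ItemizedBill> streamAll();
}

// in a service class
@Transactional(readOnly = true)
public void processAll() {
    try (Stream<ItemizedBill> bills = itemizedBillRepository.streamAll()) {
        bills.forEach(bill -> {
            // process each row without loading the whole result set into memory
        });
    }
}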

Related

How to obtain a specific person's tweets in a specific time interval (Twitter4J)?

I want to collect some specific people's tweets from the most recent year. I'm using Twitter4J, like this:
Paging paging = new Paging(i, 200);
try {
    statuses = twitter.getUserTimeline("martinsuchan", paging);
} catch (TwitterException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
But how can I filter the Tweets of that user for a certain time interval?
Any answer appreciated
You can filter the statuses locally, based on status.getCreatedAt(). Example:
try {
    int statusesPerPage = 200;
    int page = 1;
    String username = "username";
    Calendar cal = Calendar.getInstance();
    cal.add(Calendar.YEAR, -1);
    Twitter twitter = new TwitterFactory().getInstance();
    Paging paging = new Paging(page, statusesPerPage);
    List<Status> statuses = twitter.getUserTimeline(username, paging);
    page_loop:
    while (statuses.size() > 0) {
        System.out.println("Showing #" + username + "'s home timeline, page " + page);
        for (Status status : statuses) {
            if (status.getCreatedAt().before(cal.getTime())) {
                break page_loop;
            }
            System.out.println(status.getCreatedAt() + " - " + status.getText());
        }
        paging = new Paging(++page, statusesPerPage);
        statuses = twitter.getUserTimeline(username, paging);
    }
} catch (TwitterException te) {
    te.printStackTrace();
}

Spring Data Neo4j Ridiculously Slow Over Rest

public List<Errand> interestFeed(Person person, int skip, int limit)
        throws ControllerException {
    person = validatePerson(person);
    String query = String
            .format("START n=node:ErrandLocation('withinDistance:[%.2f, %.2f, %.2f]') RETURN n ORDER BY n.added DESC SKIP %s LIMIT %s",
                    person.getLongitude(), person.getLatitude(),
                    person.getWidth(), skip, limit);
    String queryFast = String
            .format("START n=node:ErrandLocation('withinDistance:[%.2f, %.2f, %.2f]') RETURN n SKIP %s LIMIT %s",
                    person.getLongitude(), person.getLatitude(),
                    person.getWidth(), skip, limit);
    Set<Errand> errands = new TreeSet<Errand>();
    System.out.println(queryFast);
    Result<Map<String, Object>> results = template.query(queryFast, null);
    Iterator<Errand> objects = results.to(Errand.class).iterator();
    return copyIterator(objects);
}

public List<Errand> copyIterator(Iterator<Errand> iter) {
    Long start = System.currentTimeMillis();
    Double startD = start.doubleValue();
    List<Errand> copy = new ArrayList<Errand>();
    while (iter.hasNext()) {
        Errand e = iter.next();
        copy.add(e);
        System.out.println(e.getType());
    }
    Long end = System.currentTimeMillis();
    Double endD = end.doubleValue();
    System.out.println((endD - startD) / 1000);
    return copy;
}
When I profile the copyIterator function it takes about 6 seconds to fetch just 10 results. I use Spring Data Neo4j REST to connect with a Neo4j server running on my local machine. I even put in a print statement to see how fast the iterator is converted to a list, and it does appear slow. Does each iterator.next() make a new HTTP call?
If Errand is a node entity then yes, spring-data-neo4j will make an HTTP call for each entity to fetch all of its labels (this is a Neo4j limitation: it doesn't return labels when you return a whole node from Cypher).
You can enable debug-level logging on org.springframework.data.neo4j.rest.SpringRestCypherQueryEngine to log all Cypher statements going to Neo4j.
To avoid these per-entity calls, use @QueryResult: http://docs.spring.io/spring-data/data-neo4j/docs/current/reference/html/#reference_programming-model_mapresult
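A rough sketch of that approach (assuming Spring Data Neo4j 3.x annotations; the projection name and columns are illustrative), returning only the columns you need instead of whole nodes:
@QueryResult
public interface ErrandSummary {

    @ResultColumn("type")
    String getType();

    @ResultColumn("added")
    Long getAdded();
}
The corresponding Cypher query would then RETURN n.type AS type, n.added AS added so that the column aliases match the @ResultColumn values.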

NEST Elasticsearch Reindex examples

My objective is to reindex an index with 10 million documents for the purpose of changing field mappings to facilitate significant-terms analysis.
My problem is that I am having trouble using the NEST library to perform a reindex, and the documentation is (very) limited. If possible I need an example of the following in use:
http://nest.azurewebsites.net/nest/search/scroll.html
http://nest.azurewebsites.net/nest/core/bulk.html
NEST provides a nice Reindex method you can use, although the documentation is lacking. I've used it in a very rough-and-ready fashion with this ad-hoc WinForms code.
private ElasticClient client;
private double count;

private void reindex_Completed()
{
    MessageBox.Show("Done!");
}

private void reindex_Next(IReindexResponse<object> obj)
{
    count += obj.BulkResponse.Items.Count();
    var progress = 100 * count / (double)obj.SearchResponse.Total;
    progressBar1.Value = (int)progress;
}

private void reindex_Error(Exception ex)
{
    MessageBox.Show(ex.ToString());
}

private void button1_Click(object sender, EventArgs e)
{
    count = 0;
    var reindex = client.Reindex<object>(r => r.FromIndex(fromIndex.Text).NewIndexName(toIndex.Text).Scroll("10s"));
    var o = new ReindexObserver<object>(onError: reindex_Error, onNext: reindex_Next, completed: reindex_Completed);
    reindex.Subscribe(o);
}
And I've just found the blog post that showed me how to do it: http://thomasardal.com/elasticsearch-migrations-with-c-and-nest/
Unfortunately the NEST implementation is not quite what I expected. In my opinion it is a bit over-engineered for possibly the most common use case.
A lot of people just want to update their mappings with zero downtime.
In my case I had already taken care of creating the index with all its settings and mappings, but NEST insists that it must create a new index when reindexing. That, among many other things. Too many other things.
I found it much less complicated to just implement it directly, since NEST already has Search, Scroll, and Bulk methods (this is adapted from NEST's implementation):
// Assuming you have already created and setup the index yourself
public void Reindex(ElasticClient client, string aliasName, string currentIndexName, string nextIndexName)
{
    Console.WriteLine("Reindexing documents to new index...");
    var searchResult = client.Search<object>(s => s.Index(currentIndexName).AllTypes().From(0).Size(100).Query(q => q.MatchAll()).SearchType(SearchType.Scan).Scroll("2m"));
    if (searchResult.Total <= 0)
    {
        Console.WriteLine("Existing index has no documents, nothing to reindex.");
    }
    else
    {
        var page = 0;
        IBulkResponse bulkResponse = null;
        do
        {
            var result = searchResult;
            searchResult = client.Scroll<object>(s => s.Scroll("2m").ScrollId(result.ScrollId));
            if (searchResult.Documents != null && searchResult.Documents.Any())
            {
                searchResult.ThrowOnError("reindex scroll " + page);
                bulkResponse = client.Bulk(b =>
                {
                    foreach (var hit in searchResult.Hits)
                    {
                        b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
                    }
                    return b;
                }).ThrowOnError("reindex page " + page);
                Console.WriteLine("Reindexing progress: " + (page + 1) * 100);
            }
            ++page;
        }
        while (searchResult.IsValid && bulkResponse != null && bulkResponse.IsValid && searchResult.Documents != null && searchResult.Documents.Any());
        Console.WriteLine("Reindexing complete!");
    }

    Console.WriteLine("Updating alias to point to new index...");
    client.Alias(a => a
        .Add(aa => aa.Alias(aliasName).Index(nextIndexName))
        .Remove(aa => aa.Alias(aliasName).Index(currentIndexName)));

    // TODO: Don't forget to delete the old index if you want
}
And the ThrowOnError extension method in case you want it:
public static T ThrowOnError<T>(this T response, string actionDescription = null) where T : IResponse
{
    if (!response.IsValid)
    {
        throw new CustomExceptionOfYourChoice(actionDescription == null ? string.Empty : "Failed to " + actionDescription + ": " + response.ServerError.Error);
    }
    return response;
}
I second Ben Wilde's answer above. Better to have full control over index creation and the re-index process.
What's missing from Ben's code is support for parent/child relationship. Here is my code to fix that:
Replace the following lines:
foreach (var hit in searchResult.Hits)
{
    b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
}
With this:
foreach (var hit in searchResult.Hits)
{
    var jo = hit.Source as JObject;
    JToken jt;
    if (jo != null && jo.TryGetValue("parentId", out jt))
    {
        // Document is child-document => add parent reference
        string parentId = (string)jt;
        b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id).Parent(parentId));
    }
    else
    {
        b.Index<object>(bi => bi.Document(hit.Source).Type(hit.Type).Index(nextIndexName).Id(hit.Id));
    }
}

Custom Plugin Does Not Execute -- Need Assistance With Some Basic Understanding

I am new to CRM and I have created a new AutoNumber plugin (actually modified an existing plugin).
I am having issues getting this plugin to work on the CRM side.
I have created the plugin, and I have created a CREATE step. I am confused about the IMAGE creation and how I need to go about it.
Also, I am using localContext.Trace and I am not sure where to view this information.
Can anyone help me understand the exact steps I need to follow to implement this plugin? I will include my code here in case I am doing something wrong. Again, this was a working plugin that I just modified; I tried to follow the pattern used by the previous developer.
FYI: I have purchased the CRM Solution Manager utility to help with the deployment process, but I am still not having any luck.
Thanks in advance for your time.
Here is the code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using IccPlugin.ProxyClasses;
using IccPlugin.ProxyClasses.ProxyClasses;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Query;
namespace IccPlugin {
public class ProgramReportAutoNumber : PluginBase {
private readonly string imageAlias = "ProgramReport";
private new_programreport preImage { get; set; }
private new_programreport postImage { get; set; }
private new_programreport targetEntity { get; set; }
private readonly string imageAlias_program = "Program";
private new_program preImage_program { get; set; }
private new_program postImage_program { get; set; }
private new_program targetEntity_program { get; set; }
public ProgramReportAutoNumber(string unsecure, string secure)
: base(typeof(ProgramReportAutoNumber), unsecure, secure) {
base.RegisteredEvents.Add(new Tuple<int, string, string, Action<LocalPluginContext>>((int)CrmPluginStepStage.PreOperation, "Create", "new_programreport", new Action<LocalPluginContext>(Execute)));
//base.RegisteredEvents.Add(new Tuple<int, string, string, Action<LocalPluginContext>>((int)CrmPluginStepStage.PostOperation, "Update", "new_programreport", new Action<LocalPluginContext>(Execute)));
//base.RegisteredEvents.Add(new Tuple<int, string, string, Action<LocalPluginContext>>((int)CrmPluginStepStage.PostOperation, "Delete", "new_programreport", new Action<LocalPluginContext>(Execute)));
}
protected void Execute(LocalPluginContext localContext) {
if (localContext == null) {
throw new ArgumentNullException("localContext");
}
IPluginExecutionContext context = localContext.PluginExecutionContext;
if (context.PreEntityImages.Contains(imageAlias) && (context.PreEntityImages[imageAlias] is Entity)) {
preImage = new new_programreport((Entity)context.PreEntityImages[imageAlias]);
}
if (context.PostEntityImages.Contains(imageAlias) && (context.PostEntityImages[imageAlias] is Entity)) {
postImage = new new_programreport((Entity)context.PostEntityImages[imageAlias]);
}
if (context.PreEntityImages.Contains(imageAlias_program) && (context.PreEntityImages[imageAlias_program] is Entity)) {
preImage_program = new new_program((Entity)context.PreEntityImages[imageAlias_program]);
}
if (context.PostEntityImages.Contains(imageAlias_program) && (context.PostEntityImages[imageAlias_program] is Entity)) {
postImage_program = new new_program((Entity)context.PostEntityImages[imageAlias_program]);
}
if (context.InputParameters.Contains("Target") && (context.InputParameters["Target"] is Entity)) {
targetEntity = new new_programreport((Entity)context.InputParameters["Target"]);
}
switch (context.MessageName) {
case "Create":
HandleCreate(localContext);
break;
case "Update":
HandleUpdate(localContext);
break;
case "Delete":
HandleDelete(localContext);
break;
default:
throw new ArgumentException("Invalid Message Name: " + context.MessageName);
}
}
private void HandleDelete(LocalPluginContext localContext) {
localContext.Trace("START - IccPlugin.ProgramReport.AutoNumber.HandleDelete");
try {
if (preImage == null) {
throw new Exception("IccPlugin.ProgramReport.AutoNumber.HandleDelete: preImage is null, unable to process the delete message.");
}
// TODO: Add code here to implement delete message.
} catch (Exception ex) {
localContext.Trace(String.Format("IccPlugin.ProgramReport.AutoNumber.HandleDelete: Exception while processing the delete message, Error Message: {0}", ex.Message), ex);
throw ex;
} finally {
localContext.Trace("END - IccPlugin.ProgramReport.AutoNumber.HandleDelete");
}
return;
}
private void HandleUpdate(LocalPluginContext localContext) {
localContext.Trace("START - IccPlugin.ProgramReport.AutoNumber.HandleUpdate");
if (preImage == null) {
string msg = "IccPlugin.ProgramReport.AutoNumber.HandleUpdate : The Update step is not registered correctly. Unable to retrieve the pre-operation image using alias" + imageAlias;
localContext.Trace(msg);
throw new Exception(msg);
}
if (postImage == null) {
string msg = "IccPlugin.ProgramReport.AutoNumber.HandleUpdate : The Update step is not registered correctly. Unable to retrieve the post-operation image using alias" + imageAlias;
localContext.Trace(msg);
throw new Exception(msg);
}
if (preImage_program == null) {
string msg = "IccPlugin.Program.AutoNumber.HandleUpdate : The Update step is not registered correctly. Unable to retrieve the pre-operation image using alias" + imageAlias_program;
localContext.Trace(msg);
throw new Exception(msg);
}
if (postImage_program == null) {
string msg = "IccPlugin.Program.AutoNumber.HandleUpdate : The Update step is not registered correctly. Unable to retrieve the post-operation image using alias" + imageAlias_program;
localContext.Trace(msg);
throw new Exception(msg);
}
try {
// TODO: Add code here to implement update message.
} catch (Exception ex) {
localContext.Trace(String.Format("IccPlugin.ProgramReport.AutoNumber.HandleUpdate: Exception while processing the update message, Error Message: {0}", ex.Message), ex);
throw ex;
} finally {
localContext.Trace("END - IccPlugin.ProgramReport.AutoNumber.HandleUpdate");
}
return;
}
private void HandleCreate(LocalPluginContext localContext) {
localContext.Trace("START - IccPlugin.ProgramReport.AutoNumber.HandleCreate");
if (targetEntity == null) {
string msg = "IccPlugin.ProgramReport.AutoNumber.HandleCreate : The Create step is not registered correctly. Unable to retrieve the target entity using alias Target.";
localContext.Trace(msg);
throw new Exception(msg);
}
try {
// if the target entity does not have the new_filenumber attribute set we will set it now.
if (targetEntity.new_filenumber != null && targetEntity.new_filenumber != "") {
// log a warning message and do not change this value.
localContext.Trace("The Program Report being created already has a value for File Number, skipping the auto number assignment for this field.");
} else {
SetFileNumber(localContext);
}
if (targetEntity.new_name != null && targetEntity.new_name != "") {
localContext.Trace("The Program Report being created already has a value for Report Number, skipping the auto number assignment for this field.");
} else {
SetReportNumber(localContext);
}
} catch (Exception ex) {
localContext.Trace(String.Format("IccPlugin.ProgramReport.AutoNumber.HandleCreate: Exception while processing the create message, Error Message: {0}", ex.Message), ex);
throw ex;
} finally {
localContext.Trace("END - IccPlugin.ProgramReport.AutoNumber.HandleCreate");
}
return;
}
private void SetFileNumber(LocalPluginContext localContext) {
localContext.Trace("START - IccPlugin.ProgramReport.AutoNumber.SetFileNumber");
string s_new_filenumberformat = string.Empty;
string s_new_reportnumberprefix = string.Empty;
string s_new_filenumbercode = string.Empty;
try {
IOrganizationService service = localContext.OrganizationService;
string fileNumberValue = "";
emds_autonumbersequence fileNumberSequence = null;
// ##################################################################################################
// 05/02/2013 -- BEP -- Code added for the following change to the auto-number for file numbering
// ##################################################################################################
// 1 - Year/Month/Sequence
// [Year]-[Month]-[Sequence] = [Year] is the current year / [Month] is the current month / [Sequence] is a number series for each Year & Month and resets to 1 when the Month changes
// 2 - Year/PMG/Sequence - PMG
// [Year]-[PMGProductType][Sequence] = [Year] is the current year / [PMGProductType] is 1st letter from the PMG Product Type on the Program Report / [Sequence] is a single number series for this Format
// 3 - Year/Letter/Sequence - ESL,VAR
// [Year]-[FileCode][Sequence] = [Year] is the current year / [FileCode] is from a new field on the Program entity / [Sequence] is a number series for each Format & File Code
// ##################################################################################################
localContext.Trace("Look at the File Number Format to determine which format to use for the Auto-Number, will default to 1 if not set");
if (targetEntity_program.new_filenumberformat.ToString() != "") {
localContext.Trace("A value was set for the new_filenumberformat field, so we will be using this value.");
s_new_filenumberformat = targetEntity_program.new_filenumberformat.ToString();
} else {
localContext.Trace("A value was NOT set for the new_filenumberformat field, so we will be using 1 as the default.");
s_new_filenumberformat = "1";
}
localContext.Trace("File Number Format Being Used = " + s_new_filenumberformat);
switch (s_new_filenumberformat) {
case "1":
#region File Format #1
fileNumberValue = String.Format("{0}-{1}", DateTime.Now.ToString("yy"), DateTime.Now.ToString("MM"));
localContext.Trace("Building QueryExpression to retrieve FileNumber Sequence record.");
QueryExpression qeFileNumberSequence_1 = new QueryExpression(BaseProxyClass.GetLogicalName<emds_autonumbersequence>());
qeFileNumberSequence_1.ColumnSet = new ColumnSet(true);
qeFileNumberSequence_1.Criteria.AddCondition(emds_autonumbersequence.Properties.emds_entitylogicalname, ConditionOperator.Equal, BaseProxyClass.GetLogicalName<new_programreport>());
qeFileNumberSequence_1.Criteria.AddCondition(emds_autonumbersequence.Properties.emds_attributelogicalname, ConditionOperator.Equal, new_programreport.Properties.new_filenumber);
qeFileNumberSequence_1.Criteria.AddCondition(emds_autonumbersequence.Properties.emds_prefix, ConditionOperator.Equal, fileNumberValue);
localContext.Trace("Getting FileNumber sequence record.");
List<emds_autonumbersequence> lFileNumberSequences_1 = service.RetrieveProxies<emds_autonumbersequence>(qeFileNumberSequence_1);
if (lFileNumberSequences_1 == null || lFileNumberSequences_1.Count == 0) {
localContext.Trace("No FileNumber sequence record was returned, creatign a new one.");
// no matching sequence records. Lets start a new sequence index record for this month and year.
fileNumberSequence = new emds_autonumbersequence();
fileNumberSequence.emds_attributelogicalname = new_programreport.Properties.new_filenumber;
fileNumberSequence.emds_entitylogicalname = BaseProxyClass.GetLogicalName<new_programreport>();
fileNumberSequence.emds_index = 1;
fileNumberSequence.emds_prefix = fileNumberValue;
fileNumberSequence.emds_name = String.Format("File Number Sequence For: {0}", fileNumberValue);
fileNumberSequence.Create(service);
} else {
localContext.Trace("A FileNumber sequence record was found, using it.");
// a file number sequence record was returned. Even if there happen to be multiple we are going to just use the first one returned.
fileNumberSequence = lFileNumberSequences_1[0];
}
// ###############################################################################
// 05/02/2013 -- BEP -- Changed the format from "###" to be "##" for seq number
// ###############################################################################
fileNumberValue = String.Format("{0}-{1:00}", fileNumberValue, fileNumberSequence.emds_index);
fileNumberSequence.emds_index++;
fileNumberSequence.Update(service);
#endregion
break;
case "2":
#region File Format #2
if (targetEntity_program.new_reportnumberprefix != null && targetEntity_program.new_reportnumberprefix != "") {
localContext.Trace("A value was set for the new_reportnumberprefix field, so we will be using this value.");
s_new_reportnumberprefix = targetEntity_program.new_reportnumberprefix;
} else {
localContext.Trace("A value was NOT set for the new_reportnumberprefix field, so we will be using P as the default.");
s_new_reportnumberprefix = "P";
}
fileNumberValue = String.Format("{0}-{1}", DateTime.Now.ToString("yy"), s_new_reportnumberprefix);
localContext.Trace("Building QueryExpression to retrieve FileNumber Sequence record.");
QueryExpression qeFileNumberSequence_2 = new QueryExpression(BaseProxyClass.GetLogicalName<emds_autonumbersequence>());
qeFileNumberSequence_2.ColumnSet = new ColumnSet(true);
qeFileNumberSequence_2.Criteria.AddCondition(emds_autonumbersequence.Properties.emds_entitylogicalname, ConditionOperator.Equal, BaseProxyClass.GetLogicalName<new_programreport>());
qeFileNumberSequence_2.Criteria.AddCondition(emds_autonumbersequence.Properties.emds_attributelogicalname, ConditionOperator.Equal, new_programreport.Properties.new_filenumber);
qeFileNumberSequence_2.Criteria.AddCondition(emds_autonumbersequence.Properties.emds_prefix, ConditionOperator.Equal, "PMG");
localContext.Trace("Getting FileNumber sequence record.");
List<emds_autonumbersequence> lFileNumberSequences_2 = service.RetrieveProxies<emds_autonumbersequence>(qeFileNumberSequence_2);
if (lFileNumberSequences_2 == null || lFileNumberSequences_2.Count == 0) {
localContext.Trace("No FileNumber sequence record was returned, creatign a new one.");
// no matching sequence records. Lets start a new sequence index record for this month and year.
fileNumberSequence = new emds_autonumbersequence();
fileNumberSequence.emds_attributelogicalname = new_programreport.Properties.new_filenumber;
fileNumberSequence.emds_entitylogicalname = BaseProxyClass.GetLogicalName<new_programreport>();
fileNumberSequence.emds_index = 1;
fileNumberSequence.emds_prefix = "PMG";
fileNumberSequence.emds_name = String.Format("File Number Sequence For: {0}", fileNumberValue);
fileNumberSequence.Create(service);
} else {
localContext.Trace("A FileNumber sequence record was found, using it.");
// a file number sequence record was returned. Even if there happen to be multiple we are going to just use the first one returned.
fileNumberSequence = lFileNumberSequences_2[0];
}
fileNumberValue = String.Format("{0}-{1:0000}", fileNumberValue, fileNumberValue + fileNumberSequence.emds_index.ToString());
fileNumberSequence.emds_index++;
fileNumberSequence.Update(service);
#endregion
break;
case "3":
#region File Format #3
if (targetEntity_program.new_filenumbercode != null && targetEntity_program.new_filenumbercode != "") {
localContext.Trace("A value was set for the new_filenumbercode field, so we will be using this value.");
s_new_filenumbercode = targetEntity_program.new_filenumbercode;
} else {
localContext.Trace("A value was NOT set for the new_filenumbercode field, so we will be using L as the default.");
s_new_filenumbercode = "l";
}
fileNumberValue = String.Format("{0}-{1}", DateTime.Now.ToString("yy"), s_new_filenumbercode);
localContext.Trace("Building QueryExpression to retrieve FileNumber Sequence record.");
QueryExpression qeFileNumberSequence_3 = new QueryExpression(BaseProxyClass.GetLogicalName<emds_autonumbersequence>());
qeFileNumberSequence_3.ColumnSet = new ColumnSet(true);
qeFileNumberSequence_3.Criteria.AddCondition(emds_autonumbersequence.Properties.emds_entitylogicalname, ConditionOperator.Equal, BaseProxyClass.GetLogicalName<new_programreport>());
qeFileNumberSequence_3.Criteria.AddCondition(emds_autonumbersequence.Properties.emds_attributelogicalname, ConditionOperator.Equal, new_programreport.Properties.new_filenumber);
qeFileNumberSequence_3.Criteria.AddCondition(emds_autonumbersequence.Properties.emds_prefix, ConditionOperator.Equal, fileNumberValue);
localContext.Trace("Getting FileNumber sequence record.");
List<emds_autonumbersequence> lFileNumberSequences_3 = service.RetrieveProxies<emds_autonumbersequence>(qeFileNumberSequence_3);
if (lFileNumberSequences_3 == null || lFileNumberSequences_3.Count == 0) {
localContext.Trace("No FileNumber sequence record was returned, creatign a new one.");
// no matching sequence records. Lets start a new sequence index record for this month and year.
fileNumberSequence = new emds_autonumbersequence();
fileNumberSequence.emds_attributelogicalname = new_programreport.Properties.new_filenumber;
fileNumberSequence.emds_entitylogicalname = BaseProxyClass.GetLogicalName<new_programreport>();
fileNumberSequence.emds_index = 1;
fileNumberSequence.emds_prefix = fileNumberValue;
fileNumberSequence.emds_name = String.Format("File Number Sequence For: {0}", fileNumberValue);
fileNumberSequence.Create(service);
} else {
localContext.Trace("A FileNumber sequence record was found, using it.");
// a file number sequence record was returned. Even if there happen to be multiple we are going to just use the first one returned.
fileNumberSequence = lFileNumberSequences_3[0];
}
fileNumberValue = String.Format("{0}-{1:0000}", fileNumberValue, fileNumberValue + fileNumberSequence.emds_index.ToString());
fileNumberSequence.emds_index++;
fileNumberSequence.Update(service);
#endregion
break;
default:
break;
}
targetEntity.new_filenumber = fileNumberValue;
} catch (Exception ex) {
localContext.Trace(String.Format("IccPlugin.ProgramReport.AutoNumber.SetFileNumber: Exception while setting the File Number value, Error Message: {0}", ex.Message), ex);
throw ex;
} finally {
localContext.Trace("END - IccPlugin.ProgramReport.AutoNumber.SetFileNumber");
}
}
private void SetReportNumber(LocalPluginContext localContext) {
localContext.Trace("START - IccPlugin.ProgramReport.AutoNumber.SetReportNumber");
string s_new_reportnumberprefix = string.Empty;
try {
IOrganizationService service = localContext.OrganizationService;
string reportNumberValue = "";
emds_autonumbersequence reportNumberSequence = null;
// ##################################################################################################
// 05/02/2013 -- BEP -- Code added for the following change to the auto-number for file numbering
// ##################################################################################################
// Currently the plugin uses the GP Class Id as the prefix for the Report Number.
// It now needs to use the Report Number Prefix field.
// ##################################################################################################
if (targetEntity_program.new_reportnumberprefix != null && targetEntity_program.new_reportnumberprefix != "") {
localContext.Trace("A value was set for the new_reportnumberprefix field, so we will be using this value.");
s_new_reportnumberprefix = targetEntity_program.new_reportnumberprefix;
} else {
localContext.Trace("A value was NOT set for the new_reportnumberprefix field, so we will be using P as the default.");
s_new_reportnumberprefix = "P";
}
localContext.Trace("Building QueryExpression to retrieve parent new_program record.");
// #################################################################################
// 05/02/2013 -- BEP -- The above code replaces the need to pull the GP Class ID
// #################################################################################
//new_program program = targetEntity.new_programid.RetrieveProxy<new_program>(service, new ColumnSet(true));
// going to assume that we were able to get the parent program record. If not an exception will be thrown.
// could add a check here and throw our own detailed exception if needed.
reportNumberValue = String.Format("{0}", s_new_reportnumberprefix); // using Trim just to be safe.
// now lets get the sequence record for this Report Number Prefix
QueryExpression qeReportNumberSequence = new QueryExpression(BaseProxyClass.GetLogicalName<emds_autonumbersequence>());
qeReportNumberSequence.ColumnSet = new ColumnSet(true);
qeReportNumberSequence.Criteria.AddCondition(emds_autonumbersequence.Properties.emds_entitylogicalname, ConditionOperator.Equal, BaseProxyClass.GetLogicalName<new_programreport>());
qeReportNumberSequence.Criteria.AddCondition(emds_autonumbersequence.Properties.emds_attributelogicalname, ConditionOperator.Equal, new_programreport.Properties.new_name);
qeReportNumberSequence.Criteria.AddCondition(emds_autonumbersequence.Properties.emds_prefix, ConditionOperator.Equal, reportNumberValue);
localContext.Trace("Getting Report Number sequence record.");
List<emds_autonumbersequence> lReportNumberSequences = service.RetrieveProxies<emds_autonumbersequence>(qeReportNumberSequence);
if (lReportNumberSequences == null || lReportNumberSequences.Count == 0) {
localContext.Trace("No Report Number sequence record was returned, creatign a new one.");
// no matching sequence records. Lets start a new sequence index record for this month and year.
reportNumberSequence = new emds_autonumbersequence();
reportNumberSequence.emds_attributelogicalname = new_programreport.Properties.new_name;
reportNumberSequence.emds_entitylogicalname = BaseProxyClass.GetLogicalName<new_programreport>();
reportNumberSequence.emds_index = 1;
reportNumberSequence.emds_prefix = reportNumberValue;
reportNumberSequence.emds_name = String.Format("Report Number Sequence For Report Number Prefix: {0}", reportNumberValue);
reportNumberSequence.Create(service);
} else {
localContext.Trace("A Report Number sequence record was found, using it.");
// a file number sequence record was returned. Even if there happen to be multiple we are going to just use the first one returned.
reportNumberSequence = lReportNumberSequences[0];
}
reportNumberValue = String.Format("{0}-{1}", reportNumberValue, reportNumberSequence.emds_index);
reportNumberSequence.emds_index++;
reportNumberSequence.Update(service);
targetEntity.new_name = reportNumberValue;
} catch (Exception ex) {
localContext.Trace(String.Format("IccPlugin.ProgramReport.AutoNumber.SetReportNumber: Exception while setting the File Number value, Error Message: {0}", ex.Message), ex);
throw ex;
} finally {
localContext.Trace("END - IccPlugin.ProgramReport.AutoNumber.SetReportNumber");
}
}
}
}
This is specifically in response to:
I am using the localContext.Trace and I am not sure where to view this information
From MSDN: Debug a Plug-In - Logging and Tracing
During execution and only when that plug-in passes an exception back to the platform at run-time, tracing information is displayed to the user. For a synchronous registered plug-in, the tracing information is displayed in a dialog box of the Microsoft Dynamics CRM web application. For an asynchronous registered plug-in, the tracing information is shown in the Details area of the System Job form in the web application.

Nhibernate paging performance

I have a table that contains more than 12 million rows.
I need to index these rows using Lucene.NET (I need to perform the initial indexing).
So I try to index in a batch manner, by reading batches from SQL (1000 rows per batch).
Here is how it looks:
public void BuildInitialBookSearchIndex()
{
    FSDirectory directory = null;
    IndexWriter writer = null;
    var type = typeof(Book);
    var info = new DirectoryInfo(GetIndexDirectory());
    //if (info.Exists)
    //{
    //    info.Delete(true);
    //}
    try
    {
        directory = FSDirectory.GetDirectory(Path.Combine(info.FullName, type.Name), true);
        writer = new IndexWriter(directory, new StandardAnalyzer(), true);
    }
    finally
    {
        if (directory != null)
        {
            directory.Close();
        }
        if (writer != null)
        {
            writer.Close();
        }
    }
    var fullTextSession = Search.CreateFullTextSession(Session);
    var currentIndex = 0;
    const int batchSize = 1000;
    while (true)
    {
        var entities = Session
            .CreateCriteria<BookAdditionalInfo>()
            .CreateAlias("Book", "b")
            .SetFirstResult(currentIndex)
            .SetMaxResults(batchSize)
            .List();
        using (var tx = Session.BeginTransaction())
        {
            foreach (var entity in entities)
            {
                fullTextSession.Index(entity);
            }
            currentIndex += batchSize;
            Session.Flush();
            tx.Commit();
            Session.Clear();
        }
        if (entities.Count < batchSize)
            break;
    }
}
But the operation times out when the current index gets bigger than 6-7 million; the NHibernate paging query throws a timeout.
Any suggestions, or any other way in NHibernate to index these 12 million rows?
EDIT:
Probably I will implement the simplest solution.
Because BookId is the clustered index in my table and selecting by BookId is very fast, I am going to find the max BookId, go through all the records, and index each of them:
for (long bookId = 0; bookId < maxBookId; bookId++)
{
    // get the book by bookId
    // if the book exists, index it
}
If you have any other suggestion, please reply to this question.
Instead of paging your whole data set, you could try to divide and conquer it. You said you have an index on BookId; just change your criteria to return batches of books according to bounds on BookId:
var entities = Session
    .CreateCriteria<BookAdditionalInfo>()
    .CreateAlias("Book", "b")
    .Add(Restrictions.Gte("BookId", low))
    .Add(Restrictions.Lt("BookId", high))
    .List();
Where low and high are set like 0-1000, 1001-2000, etc.
