On a regular basis, another application dumps a CSV that contains 7-8 million rows or more. I have a cron job that loads the data from the CSV and saves it into my Oracle DB. Here's my code snippet:
String line = "";
int count = 0;
LocalDate localDateTime;
Instant from = Instant.now();
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("dd-MMM-yy");
List<ItemizedBill> itemizedBills = new ArrayList<>();
try {
BufferedReader br=new BufferedReader(new FileReader("/u01/CDR_20210325.csv"));
while((line=br.readLine())!=null) {
if (count >= 1) {
String [] data= line.split("\\|");
ItemizedBill customer = new ItemizedBill();
customer.setEventType(data[0]);
String date = data[1].substring(0,2);
String month = data[1].substring(3,6);
String year = data[1].substring(7,9);
month = WordUtils.capitalizeFully(month);
String modifiedDate = date + "-" + month + "-" + year;
localDateTime = LocalDate.parse(modifiedDate, formatter);
customer.setEventDate(localDateTime.atStartOfDay(ZoneId.systemDefault()).toInstant());
customer.setaPartyNumber(data[2]);
customer.setbPartyNumber(data[3]);
customer.setVolume(Long.valueOf(data[4]));
customer.setMode(data[5]);
if(data[6].contains("0")) { customer.setFnfNum("Other"); }
else{ customer.setFnfNum("FNF Number"); }
itemizedBills.add(customer);
}
count++;
}
itemizedBillRepository.saveAll(itemizedBills);
} catch (IOException e) {
e.printStackTrace();
}
}
This feature works but takes a lot of time to process. How can I make it more efficient and speed up this process?
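There are a couple of things you should change in your code.
String.split, while convenient, can be relatively slow because in the general case it compiles the regular expression on every call. Compiling a Pattern once and calling its split method avoids that overhead. (OpenJDK has a fast path in String.split for simple delimiters such as an escaped |, so measure before expecting a large gain from this change alone.)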
Use proper JPA batching strategies as explained in this blog.
First enable batch processing in your Spring application.properties. We will use a batch size of 50 (you will need to experiment on what is a proper batch-size for your case).
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
Then save entities directly to the database and, every 50 items, do a flush and a clear. This flushes the pending state to the database and clears the first-level cache (which prevents excessive dirty checks). The loop has to run inside a transaction for the flush to work.
With all the above your code should look something like this.
String line;
int count = 0;
Instant from = Instant.now();
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("dd-MMM-yy");
Pattern splitter = Pattern.compile("\\|"); // compiled once, reused for every row
// try-with-resources closes the reader even if parsing fails
try (BufferedReader br = new BufferedReader(new FileReader("/u01/CDR_20210325.csv"))) {
while ((line = br.readLine()) != null) {
if (count >= 1) { // skip the first line
String[] data = splitter.split(line);
ItemizedBill customer = new ItemizedBill();
customer.setEventType(data[0]);
String date = data[1].substring(0,2);
String month = data[1].substring(3,6);
String year = data[1].substring(7,9);
month = WordUtils.capitalizeFully(month);
String modifiedDate = date + "-" + month + "-" + year;
LocalDate localDate = LocalDate.parse(modifiedDate, formatter);
customer.setEventDate(localDate.atStartOfDay(ZoneId.systemDefault()).toInstant());
customer.setaPartyNumber(data[2]);
customer.setbPartyNumber(data[3]);
customer.setVolume(Long.valueOf(data[4]));
customer.setMode(data[5]);
if(data[6].contains("0")) {
customer.setFnfNum("Other");
} else {
customer.setFnfNum("FNF Number");
}
itemizedBillRepository.save(customer);
}
count++;
if ( (count % 50) == 0) {
this.entityManager.flush(); // sync with database
this.entityManager.clear(); // clear 1st level cache
}
}
} catch (IOException e) {
e.printStackTrace();
}
Two other optimizations you could make:
If your volume property is a long rather than a Long, use Long.parseLong(data[4]) instead. It saves the creation of a Long and the subsequent unboxing. With just 10 rows this might not be an issue, but with millions of rows those milliseconds add up.
Let the DateTimeFormatter parse the date column directly and remove the substring parts from your code. The pattern has to match the raw value (ddMMMyy for 25MAR21; keep the separators in the pattern if the file has them). Pattern-based formatters are case-sensitive by default, so build the formatter with parseCaseInsensitive() rather than changing the case of the input. LocalDate.parse(data[1], formatter) then achieves the same result without the overhead of 5 additional String objects.
String line;
int count = 0;
Instant from = Instant.now();
// Assumption: the raw column looks like 25-MAR-21; adjust the pattern to the actual separators.
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
.parseCaseInsensitive() // accepts MAR, Mar and mar
.appendPattern("dd-MMM-yy")
.toFormatter(Locale.ENGLISH);
Pattern splitter = Pattern.compile("\\|");
try (BufferedReader br = new BufferedReader(new FileReader("/u01/CDR_20210325.csv"))) {
while ((line = br.readLine()) != null) {
if (count >= 1) {
String[] data = splitter.split(line);
ItemizedBill customer = new ItemizedBill();
customer.setEventType(data[0]);
LocalDate localDate = LocalDate.parse(data[1], formatter);
customer.setEventDate(localDate.atStartOfDay(ZoneId.systemDefault()).toInstant());
customer.setaPartyNumber(data[2]);
customer.setbPartyNumber(data[3]);
customer.setVolume(Long.parseLong(data[4]));
customer.setMode(data[5]);
if(data[6].contains("0")) {
customer.setFnfNum("Other");
} else {
customer.setFnfNum("FNF Number");
}
itemizedBillRepository.save(customer);
}
count++;
if ( (count % 50) == 0) {
this.entityManager.flush(); // sync with database
this.entityManager.clear(); // clear 1st level cache
}
}
} catch (IOException e) {
e.printStackTrace();
}
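The parsing part of the loop can be tried on its own, without Spring or a database. The following is only a sketch: the class name CdrRowParser, the sample row and the assumption that the date column looks like 25-MAR-21 are mine, not taken from the question.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper: parses the costly columns of one pipe-delimited row.
final class CdrRowParser {
    // Compiled once and reused for every row.
    private static final Pattern PIPE = Pattern.compile("\\|");

    // Assumption: the date column looks like 25-MAR-21, in any letter case.
    private static final DateTimeFormatter DATE = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("dd-MMM-yy")
            .toFormatter(Locale.ENGLISH);

    static String[] split(String line) {
        return PIPE.split(line);
    }

    static LocalDate eventDate(String[] columns) {
        return LocalDate.parse(columns[1], DATE);
    }

    static long volume(String[] columns) {
        return Long.parseLong(columns[4]); // primitive long, no boxing
    }

    public static void main(String[] args) {
        // Made-up sample row with the same column layout as the question.
        String[] columns = split("VOICE|25-MAR-21|0171|0191|120|PREPAID|0");
        System.out.println(eventDate(columns) + " " + volume(columns)); // prints 2021-03-25 120
    }
}
```

Timing this helper against the substring version on a few hundred thousand rows should show whether parsing or the database round trips dominate in your case.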
You can use Spring Data batch inserts. This link explains how to do it: https://www.baeldung.com/spring-data-jpa-batch-inserts
You can try streaming MySQL results using Java 8 Streams and Spring Data JPA. The link below explains it in detail:
http://knes1.github.io/blog/2015/2015-10-19-streaming-mysql-results-using-java8-streams-and-spring-data.html
I have a Java method as below:
private static boolean isDateBetweenRange(DataSet obj, MyClass dataSource, ConditionContext context) {
FilterContext fc = dataSource.getData();
LocalDate dateFieldToCheck = obj.getDate(fc.getDateField()).toInstant()
.atZone(ZoneId.systemDefault()).toLocalDate();
LocalDate minDate = fc.getMinDateValue();
LocalDate maxDate = fc.getMaxDateValue();
if (minDate == null || maxDate == null) {
minDate = context.getStartDate().toInstant().atZone(ZoneId.systemDefault())
.toLocalDate();
maxDate = context.getEndDate().toInstant().atZone(ZoneId.systemDefault())
.toLocalDate();
}
boolean result = (dateFieldToCheck.isAfter(minDate) || dateFieldToCheck.isEqual(minDate))
&& (dateFieldToCheck.isBefore(maxDate) || dateFieldToCheck.isEqual(maxDate));
return result;
}
I want to make the same logic for LocalDateTime also. It's gonna be the exact same code for LocalDateTime if I overload the method.
How can I make the method generic so that it works with both LocalDate and LocalDateTime, using generics or any other mechanism?
How can I make context.getXXXDate()... toLocalDate() or toLocalDateTime() common code based on the type I have?
The TemporalAccessor interface can be used to do this. However, be aware that TemporalAccessor is an advanced interface that should not be used outside low-level utility code.
boolean api(TemporalAccessor temporal1, TemporalAccessor temporal2) {
LocalDate date1 = LocalDate.from(temporal1);
LocalDate date2 = LocalDate.from(temporal2);
return ...;
}
This code will now accept LocalDate, LocalDateTime, OffsetDateTime and ZonedDateTime.
As mentioned in the comments, it is vital to only call ZoneId.systemDefault() once within a piece of business logic, as the value can change.
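As a rough sketch of both points, the range check from the question could look like this. The class and method names are made up for illustration, and the comparison is by date only, so any time-of-day part is ignored.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.temporal.TemporalAccessor;
import java.util.Date;

// Hypothetical utility, not part of the question's code base.
final class DateRangeCheck {
    // Accepts any temporal that carries a date; the time part, if any, is dropped.
    static boolean isDateBetween(TemporalAccessor value, TemporalAccessor min, TemporalAccessor max) {
        LocalDate date = LocalDate.from(value);
        return !date.isBefore(LocalDate.from(min)) && !date.isAfter(LocalDate.from(max));
    }

    // Converts legacy java.util.Date values using one zone supplied by the caller,
    // so ZoneId.systemDefault() is read once rather than once per conversion.
    static boolean isDateBetween(Date value, Date min, Date max, ZoneId zone) {
        return isDateBetween(
                value.toInstant().atZone(zone),
                min.toInstant().atZone(zone),
                max.toInstant().atZone(zone));
    }
}
```

A caller would read the zone once, for example ZoneId zone = ZoneId.systemDefault();, and pass it to every check made within the same piece of business logic.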
Well... LocalDateTime is an immutable composition of LocalDate and LocalTime. It has methods toLocalDate() and toLocalTime() with which you can "decompose it". The opposite is also true - you can also pretty easily create a composition - see LocalDate.atTime(LocalTime).
It seems natural to me that if you consider time in your logic, your extracted APIs should accept LocalDateTime. In the suspicious cases :) in which you want, for some reason, to feed this API with LocalDates, you may explicitly use, for example, LocalDate.atStartOfDay. Something like this:
boolean api(LocalDate aDate, LocalDate anotherDate)
{
return this.api(aDate.atStartOfDay(), anotherDate.atStartOfDay());
}
boolean api(LocalDateTime aDateTime, LocalDateTime anotherDateTime)
{
return ...;
}
That seems a bit ugly, but it will still work pretty well, won't it?
You can make your method receive a java.time.temporal.Temporal as parameter (or a TemporalAccessor as suggested by @JodaStephen). This code works for both.
I must admit that I'd prefer to do it the way @Lachezar Balev's answer suggests, but anyway, here's an alternative.
As a Temporal can be of any type (LocalDate, LocalDateTime, ZonedDateTime, etc), but you want just LocalDate and LocalDateTime, I'm doing the following:
try to create a LocalDateTime first: it requires more fields than a LocalDate (hour/minute/second), so if it can be created, it's the preferred type
if it can't be created, I catch the respective exception and try to create a LocalDate
Another detail is that I also simplified your comparison:
dateFieldToCheck.isAfter(minDate) || dateFieldToCheck.isEqual(minDate)
Is equivalent to:
! dateFieldToCheck.isBefore(minDate)
And a similar simplification was done when comparing with maxDate:
public boolean check(Temporal dateFieldToCheck, Temporal minDate, Temporal maxDate) {
// try to get as LocalDateTime
try {
LocalDateTime dtCheck = LocalDateTime.from(dateFieldToCheck);
LocalDateTime dtMin = LocalDateTime.from(minDate);
LocalDateTime dtMax = LocalDateTime.from(maxDate);
return (!dtCheck.isBefore(dtMin)) && (!dtCheck.isAfter(dtMax));
} catch (DateTimeException e) {
// exception: one of the Temporal objects above doesn't have all fields required by a LocalDateTime
// trying a LocalDate instead
LocalDate dCheck = LocalDate.from(dateFieldToCheck);
LocalDate dMin = LocalDate.from(minDate);
LocalDate dMax = LocalDate.from(maxDate);
return (!dCheck.isBefore(dMin)) && (!dCheck.isAfter(dMax));
}
}
Now you can call this method with both a LocalDate and a LocalDateTime:
check(LocalDate.of(2017, 6, 19), LocalDate.of(2017, 6, 18), LocalDate.of(2017, 6, 29)); // true
check(LocalDate.of(2017, 6, 17), LocalDate.of(2017, 6, 18), LocalDate.of(2017, 6, 29)); // false
check(LocalDateTime.of(2017, 6, 29, 9, 0), LocalDateTime.of(2017, 6, 29, 8, 0), LocalDateTime.of(2017, 6, 29, 11, 0)); // true
check(LocalDateTime.of(2017, 6, 29, 7, 0), LocalDateTime.of(2017, 6, 29, 8, 0), LocalDateTime.of(2017, 6, 29, 11, 0)); // false
You can extend the code to recognize other types, if you need (such as LocalTime).
You could also add another parameter indicating what type will be used, like this:
public boolean check(Temporal dateFieldToCheck, Temporal minDate, Temporal maxDate, int type) {
switch (type) {
case 1:
// try to get as LocalDateTime
LocalDateTime dtCheck = LocalDateTime.from(dateFieldToCheck);
LocalDateTime dtMin = LocalDateTime.from(minDate);
LocalDateTime dtMax = LocalDateTime.from(maxDate);
return (!dtCheck.isBefore(dtMin)) && (!dtCheck.isAfter(dtMax));
case 2:
// trying a LocalDate instead
LocalDate dCheck = LocalDate.from(dateFieldToCheck);
LocalDate dMin = LocalDate.from(minDate);
LocalDate dMax = LocalDate.from(maxDate);
return (!dCheck.isBefore(dMin)) && (!dCheck.isAfter(dMax));
// ... and so on
}
// invalid type, return false?
return false;
}
So you could call it with a ZonedDateTime and force it to use a LocalDate (then you could adapt your code and call this method with the objects you've got from atZone(ZoneId.systemDefault())):
ZonedDateTime z1 = obj.getDate(fc.getDateField()).toInstant().atZone(ZoneId.systemDefault());
ZonedDateTime z2 = context.getStartDate().toInstant().atZone(ZoneId.systemDefault());
ZonedDateTime z3 = context.getEndDate().toInstant().atZone(ZoneId.systemDefault());
check(z1, z2, z3, 2);
This code also works with different types:
ZonedDateTime z1 = ...
LocalDate d = ...
LocalDateTime dt = ...
// converts all to LocalDate
System.out.println(check(z1, dt, d, 2));
PS: of course you could create an Enum instead of an int to define the type. Or use a Class<? extends Temporal> to directly indicate the type:
public boolean check(Temporal dateFieldToCheck, Temporal minDate, Temporal maxDate, Class<? extends Temporal> type) {
if (type == LocalDate.class) {
// code for LocalDate (use LocalDate.from() as above)
}
if (type == LocalDateTime.class) {
// code for LocalDateTime (use LocalDateTime.from() as above)
}
// unsupported type: return false, as in the int-based version
return false;
}
// convert all objects to LocalDate
check(z1, dt, d, LocalDate.class);
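For the enum variant mentioned in the PS, a minimal sketch could look like the following. The enum Precision and the class name are hypothetical, and throwing on an unknown value is my choice rather than something the code above prescribes.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.TemporalAccessor;

// Hypothetical variant of check(...) that takes an enum instead of an int.
final class TypedRangeCheck {
    enum Precision { DATE, DATE_TIME }

    static boolean check(TemporalAccessor value, TemporalAccessor min, TemporalAccessor max,
                         Precision precision) {
        switch (precision) {
            case DATE_TIME: {
                // Throws DateTimeException if an argument has no time part (e.g. a LocalDate).
                LocalDateTime v = LocalDateTime.from(value);
                return !v.isBefore(LocalDateTime.from(min)) && !v.isAfter(LocalDateTime.from(max));
            }
            case DATE: {
                // Works for LocalDate, LocalDateTime, ZonedDateTime and OffsetDateTime.
                LocalDate v = LocalDate.from(value);
                return !v.isBefore(LocalDate.from(min)) && !v.isAfter(LocalDate.from(max));
            }
            default:
                throw new IllegalArgumentException("Unsupported precision: " + precision);
        }
    }
}
```

Compared with the int version, an invalid value cannot be passed silently, and the call site reads as check(z1, dt, d, Precision.DATE).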
Let's say I want to know how many days there are until Christmas, with a method that works on any day of any year. The next Christmas may fall in this year or the next, and I don't know in advance whether that year is a leap year.
I might calculate the next Christmas date and then calculate the days from now until then. I can represent Christmas Day as MonthDay.of(12, 25) but I can't find how that helps.
I found it is easy to calculate the date of next Monday this way:
ZonedDateTime nextMonday = ZonedDateTime.now()
.with(TemporalAdjusters.next(DayOfWeek.MONDAY))
.truncatedTo(ChronoUnit.DAYS);
But I can't find any TemporalAdjuster to do the same with MonthDay.
Is there an easy way I didn't find?
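I don't think there is a built-in temporal adjuster to go to the next "MonthDay", but you can build it yourself (the snippet assumes static imports of Month.DECEMBER, ChronoField.DAY_OF_MONTH, ChronoField.MONTH_OF_YEAR and ChronoUnit.YEARS):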
public static void main(String[] args) {
MonthDay XMas = MonthDay.of(DECEMBER, 25);
System.out.println(LocalDate.of(2014, DECEMBER, 5).with(nextMonthDay(XMas)));
System.out.println(LocalDate.of(2014, DECEMBER, 26).with(nextMonthDay(XMas)));
}
public static TemporalAdjuster nextMonthDay(MonthDay monthDay) {
return (temporal) -> {
int day = temporal.get(DAY_OF_MONTH);
int month = temporal.get(MONTH_OF_YEAR);
int targetDay = monthDay.getDayOfMonth();
int targetMonth = monthDay.getMonthValue();
return MonthDay.of(month, day).isBefore(monthDay)
? temporal.with(MONTH_OF_YEAR, targetMonth).with(DAY_OF_MONTH, targetDay)
: temporal.with(MONTH_OF_YEAR, targetMonth).with(DAY_OF_MONTH, targetDay).plus(1, YEARS);
};
}
I am using the following temporal adjusters:
public static TemporalAdjuster nextOrSame(MonthDay monthDay) {
return temporal -> monthDay.adjustInto(temporal).plus(MonthDay.from(temporal).compareTo(monthDay) > 0 ? 1 : 0, YEARS);
}
public static TemporalAdjuster previousOrSame(MonthDay monthDay) {
return temporal -> monthDay.adjustInto(temporal).minus(MonthDay.from(temporal).compareTo(monthDay) < 0 ? 1 : 0, YEARS);
}
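To get from such an adjuster to the number of days asked about in the question, ChronoUnit.DAYS.between can be applied to the adjusted date. This is a small sketch: the class DaysUntil and the method daysUntil are names I made up, and a target of 29 February is not handled specially.

```java
import java.time.LocalDate;
import java.time.Month;
import java.time.MonthDay;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjuster;

// Hypothetical wrapper around the nextOrSame adjuster shown above.
final class DaysUntil {
    static TemporalAdjuster nextOrSame(MonthDay monthDay) {
        return temporal -> monthDay.adjustInto(temporal)
                .plus(MonthDay.from(temporal).compareTo(monthDay) > 0 ? 1 : 0, ChronoUnit.YEARS);
    }

    // Whole days from 'from' to the next occurrence of 'target' (0 if it is today).
    static long daysUntil(LocalDate from, MonthDay target) {
        return ChronoUnit.DAYS.between(from, from.with(nextOrSame(target)));
    }

    public static void main(String[] args) {
        MonthDay christmas = MonthDay.of(Month.DECEMBER, 25);
        System.out.println(daysUntil(LocalDate.now(), christmas) + " days until Christmas");
    }
}
```

Because the arithmetic is done on real dates, leap years are accounted for without any special casing.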
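Here is a method that creates a temporal adjuster for a given MonthDay, just like the one in assylias's answer, but the code is different. I think both work. Note that this one measures the difference in ChronoUnit.NANOS, which date-time types such as ZonedDateTime support but a plain LocalDate does not.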
private static TemporalAdjuster nextMonthDayAdjuster(final MonthDay md) {
return (Temporal d) -> {
Function<Integer, Temporal> dateOnYear = year -> md.atYear(year).adjustInto(d);
int year = d.get(ChronoField.YEAR);
Temporal dateThatYear = dateOnYear.apply(year);
if (d.until(dateThatYear, ChronoUnit.NANOS) > 0L) {
return dateThatYear;
} else {
return dateOnYear.apply(year + 1);
}
};
}
I am looking for a pattern, algorithm, or library that will take a set of dates and return a description of the recurrence if one exists, i.e. the set [11-01-2010, 11-08-2010, 11-15-2010, 11-22-2010, 11-29-2010] would yield something like "Every Monday in November".
Has anyone seen anything like this before or have any suggestions on the best way to implement it?
Grammatical Evolution (GE) is suitable for this kind of problem, because you are searching for an answer that adheres to a certain language. Grammatical Evolution is also used for program generation, composing music, designing, etcetera.
I'd approach the task like this:
Structure the problem space with a grammar.
Construct a Context-free Grammar that can represent all desired recurrence patterns. Consider production rules like these:
datepattern -> datepattern 'and' datepattern
datepattern -> frequency bounds
frequency -> 'every' ordinal weekday 'of the month'
frequency -> 'every' weekday
ordinal -> ordinal 'and' ordinal
ordinal -> 'first' | 'second' | 'third'
bounds -> 'in the year' year
An example of a pattern generated by these rules is: 'every second and third wednesday of the month in the year 2010 and every tuesday in the year 2011'
One way to implement such a grammar would be through a class hierarchy that you will later operate on through reflection, as I've done in the example below.
Map this language to a set of dates
You should create a function that takes a clause from your language and recursively returns the set of all dates covered by it. This allows you to compare your answers to the input.
Guided by the grammar, search for potential solutions
You could use a Genetic algorithm or Simulated Annealing to match the dates to the grammar, try your luck with Dynamic Programming or start simple with a brute force enumeration of all possible clauses.
Should you go with a Genetic Algorithm, your mutation concept should consist of substituting an expression for another one based on the application of one of your production rules.
Have a look at the following GE-related sites for code and information:
http://www.bangor.ac.uk/~eep201/jge/
http://nohejl.name/age/
http://www.geneticprogramming.us/Home_Page.html
Evaluate each solution
The fitness function could take into account the textual length of the solution, the number of dates generated more than once, the number of dates missed, as well as the number of wrong dates generated.
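As a rough illustration of the "map to dates" and "evaluate" steps, here is a small Java sketch. It handles a single hypothetical clause shape ("every <ordinal> <weekday> of the month in the year <year>"), and the penalty weights are arbitrary assumptions that would need tuning.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: expands one clause to dates and scores it against the input.
final class PatternFitness {
    // ordinal is 1-based: 1 = first, 2 = second, and so on.
    static Set<LocalDate> expand(int ordinal, DayOfWeek weekday, int year) {
        Set<LocalDate> dates = new TreeSet<>();
        for (LocalDate d = LocalDate.of(year, 1, 1); d.getYear() == year; d = d.plusDays(1)) {
            if (d.getDayOfWeek() == weekday && (d.getDayOfMonth() - 1) / 7 + 1 == ordinal) {
                dates.add(d);
            }
        }
        return dates;
    }

    // Higher is better. Penalises long descriptions, missed dates and wrong dates.
    static int fitness(Set<LocalDate> input, Set<LocalDate> generated, String description) {
        Set<LocalDate> missed = new HashSet<>(input);
        missed.removeAll(generated);
        Set<LocalDate> spurious = new HashSet<>(generated);
        spurious.removeAll(input);
        return -description.length() - 300 * missed.size() - 3000 * spurious.size();
    }
}
```

A search procedure, whether brute force or evolutionary, would call expand for each candidate clause and keep the candidate with the highest fitness.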
Example code
By request, and because it's such an interesting challenge, I've written a rudimentary implementation of the algorithm to get you started. Although it works, it is by no means finished: the design should definitely get some more thought, and once you have gleaned the fundamental take-aways from this example I recommend you consider using one of the libraries I've mentioned above.
/// <summary>
/// This is a very basic example implementation of a grammatical evolution algorithm for formulating a recurrence pattern in a set of dates.
/// It needs significant extensions and optimizations to be useful in a production setting.
/// </summary>
static class Program
{
#region "Class hierarchy that codifies the grammar"
class DatePattern
{
public Frequency frequency;
public Bounds bounds;
public override string ToString() { return "" + frequency + " " + bounds; }
public IEnumerable<DateTime> Dates()
{
return frequency == null ? new DateTime[] { } : frequency.FilterDates(bounds.GetDates());
}
}
abstract class Bounds
{
public abstract IEnumerable<DateTime> GetDates();
}
class YearBounds : Bounds
{
/* in the year .. */
public int year;
public override string ToString() { return "in the year " + year; }
public override IEnumerable<DateTime> GetDates()
{
var firstDayOfYear = new DateTime(year, 1, 1);
return Enumerable.Range(0, new DateTime(year, 12, 31).DayOfYear)
.Select(dayOfYear => firstDayOfYear.AddDays(dayOfYear));
}
}
abstract class Frequency
{
public abstract IEnumerable<DateTime> FilterDates(IEnumerable<DateTime> Dates);
}
class WeeklyFrequency : Frequency
{
/* every .. */
public DayOfWeek dayOfWeek;
public override string ToString() { return "every " + dayOfWeek; }
public override IEnumerable<DateTime> FilterDates(IEnumerable<DateTime> Dates)
{
return Dates.Where(date => (date.DayOfWeek == dayOfWeek));
}
}
class MonthlyFrequency : Frequency
{
/* every .. */
public Ordinal ordinal;
public DayOfWeek dayOfWeek;
/* .. of the month */
public override string ToString() { return "every " + ordinal + " " + dayOfWeek + " of the month"; }
public override IEnumerable<DateTime> FilterDates(IEnumerable<DateTime> Dates)
{
return Dates.Where(date => (date.DayOfWeek == dayOfWeek) && (int)ordinal == (date.Day - 1) / 7);
}
}
enum Ordinal { First, Second, Third, Fourth, Fifth }
#endregion
static Random random = new Random();
const double MUTATION_RATE = 0.3;
static Dictionary<Type, Type[]> subtypes = new Dictionary<Type, Type[]>();
static void Main()
{
// The input signifies the recurrence 'every first thursday of the month in 2010':
var input = new DateTime[] {new DateTime(2010,12,2), new DateTime(2010,11,4),new DateTime(2010,10,7),new DateTime(2010,9,2),
new DateTime(2010,8,5),new DateTime(2010,7,1),new DateTime(2010,6,3),new DateTime(2010,5,6),
new DateTime(2010,4,1),new DateTime(2010,3,4),new DateTime(2010,2,4),new DateTime(2010,1,7) };
for (int cTests = 0; cTests < 20; cTests++)
{
// Initialize with a random population
int treesize = 0;
var population = new DatePattern[] { (DatePattern)Generate(typeof(DatePattern), ref treesize), (DatePattern)Generate(typeof(DatePattern), ref treesize), (DatePattern)Generate(typeof(DatePattern), ref treesize) };
Run(input, new List<DatePattern>(population));
}
}
private static void Run(DateTime[] input, List<DatePattern> population)
{
var strongest = population[0];
int strongestFitness = int.MinValue;
int bestTry = int.MaxValue;
for (int cGenerations = 0; cGenerations < 300 && strongestFitness < -100; cGenerations++)
{
// Select the best individuals to survive:
var survivers = population
.Select(individual => new { Fitness = Fitness(input, individual), individual })
.OrderByDescending(pair => pair.Fitness)
.Take(5)
.Select(pair => pair.individual)
.ToArray();
population.Clear();
// The survivers are the foundation for the next generation:
foreach (var parent in survivers)
{
for (int cChildren = 0; cChildren < 3; cChildren++)
{
int treeSize = 1;
DatePattern child = (DatePattern)Mutate(parent, ref treeSize); // NB: procreation may also be done through crossover.
population.Add((DatePattern)child);
var childFitness = Fitness(input, child);
if (childFitness > strongestFitness)
{
bestTry = cGenerations;
strongestFitness = childFitness;
strongest = child;
}
}
}
}
Trace.WriteLine("Found best match with fitness " + Fitness(input, strongest) + " after " + bestTry + " generations: " + strongest);
}
private static object Mutate(object original, ref int treeSize)
{
treeSize = 0;
object replacement = Construct(original.GetType());
foreach (var field in original.GetType().GetFields())
{
object newFieldValue = field.GetValue(original);
int subtreeSize;
if (field.FieldType.IsEnum)
{
subtreeSize = 1;
if (random.NextDouble() <= MUTATION_RATE)
newFieldValue = ConstructRandomEnumValue(field.FieldType);
}
else if (field.FieldType == typeof(int))
{
subtreeSize = 1;
if (random.NextDouble() <= MUTATION_RATE)
newFieldValue = (random.Next(2) == 0
? Math.Min(int.MaxValue - 1, (int)newFieldValue) + 1
: Math.Max(int.MinValue + 1, (int)newFieldValue) - 1);
}
else
{
subtreeSize = 0;
newFieldValue = Mutate(field.GetValue(original), ref subtreeSize); // mutate pre-maturely to find out subtreeSize
if (random.NextDouble() <= MUTATION_RATE / subtreeSize) // makes high-level nodes mutate less.
{
subtreeSize = 0; // init so we can track the size of the subtree soon to be made.
newFieldValue = Generate(field.FieldType, ref subtreeSize);
}
}
field.SetValue(replacement, newFieldValue);
treeSize += subtreeSize;
}
return replacement;
}
private static object ConstructRandomEnumValue(Type type)
{
var vals = type.GetEnumValues();
return vals.GetValue(random.Next(vals.Length));
}
private static object Construct(Type type)
{
return type.GetConstructor(new Type[] { }).Invoke(new object[] { });
}
private static object Generate(Type type, ref int treesize)
{
if (type.IsEnum)
{
return ConstructRandomEnumValue(type);
}
else if (typeof(int) == type)
{
return random.Next(10) + 2005;
}
else
{
if (type.IsAbstract)
{
// pick one of the concrete subtypes:
var subtypes = GetConcreteSubtypes(type);
type = subtypes[random.Next(subtypes.Length)];
}
object newobj = Construct(type);
foreach (var field in type.GetFields())
{
treesize++;
field.SetValue(newobj, Generate(field.FieldType, ref treesize));
}
return newobj;
}
}
private static int Fitness(DateTime[] input, DatePattern individual)
{
var output = individual.Dates().ToArray();
var avgDateDiff = Math.Abs((output.Average(d => d.Ticks / (24.0 * 60 * 60 * 10000000)) - input.Average(d => d.Ticks / (24.0 * 60 * 60 * 10000000))));
return
-individual.ToString().Length // succinct patterns are preferred.
- input.Except(output).Count() * 300 // Forgetting some of the dates is bad.
- output.Except(input).Count() * 3000 // Spurious dates cause even more confusion to the user.
- (int)(avgDateDiff) * 30000; // The difference in average date is the most important guide.
}
private static Type[] GetConcreteSubtypes(Type supertype)
{
if (subtypes.ContainsKey(supertype))
{
return subtypes[supertype];
}
else
{
var types = AppDomain.CurrentDomain.GetAssemblies().ToList()
.SelectMany(s => s.GetTypes())
.Where(p => supertype.IsAssignableFrom(p) && !p.IsAbstract).ToArray();
subtypes.Add(supertype, types);
return types;
}
}
}
Hope this gets you on track. Be sure to share your actual solution somewhere; I think it will be quite useful in lots of scenarios.
If your purpose is to generate human-readable descriptions of the pattern, as in your "Every Monday in November", then you probably want to start by enumerating the possible descriptions. Descriptions can be broken down into frequency and bounds, for example,
Frequency:
Every day ...
Every other/third/fourth day ...
Weekdays/weekends ...
Every Monday ...
Alternate Mondays ...
The first/second/last Monday ...
...
Bounds:
... in January
... between 25 March and 25 October
...
There won't be all that many of each, and you can check for them one by one.
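Checking one candidate description could look like the sketch below. It only knows the single candidate "Every <weekday> in <month>", and the class name is made up. Each further frequency/bounds pair would get its own generate-and-compare check.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.TextStyle;
import java.util.Collection;
import java.util.Locale;
import java.util.Optional;
import java.util.TreeSet;

// Hypothetical checker for a single candidate description.
final class WeekdayInMonthDescriber {
    static Optional<String> describe(Collection<LocalDate> dates) {
        if (dates.isEmpty()) {
            return Optional.empty();
        }
        TreeSet<LocalDate> input = new TreeSet<>(dates);
        DayOfWeek weekday = input.first().getDayOfWeek();
        YearMonth month = YearMonth.from(input.first());

        // Generate what "every <weekday> in <month>" would produce...
        TreeSet<LocalDate> expected = new TreeSet<>();
        for (LocalDate d = month.atDay(1); !d.isAfter(month.atEndOfMonth()); d = d.plusDays(1)) {
            if (d.getDayOfWeek() == weekday) {
                expected.add(d);
            }
        }
        // ...and accept the description only if it reproduces the input exactly.
        if (!expected.equals(input)) {
            return Optional.empty();
        }
        return Optional.of("Every " + weekday.getDisplayName(TextStyle.FULL, Locale.ENGLISH)
                + " in " + month.getMonth().getDisplayName(TextStyle.FULL, Locale.ENGLISH));
    }
}
```

For the five dates in the question this yields "Every Monday in November"; dropping any one of them makes the candidate fail, and the next candidate would be tried.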
What I would do:
Create samples of the data
Use a clustering algorithm
Generate samples using the algorithm
Create a fitness function to measure how well it correlates to the full data set. The clustering algorithm will come up with either 0 or 1 suggestions, and you can measure each against how well it fits the full set.
Eliminate/merge the occurrence with the already found sets and rerun this algorithm.
Looking at that, you may want to use either Simulated Annealing or a Genetic Algorithm. Also, if you have the descriptions, you may want to compare the descriptions to generate a sample.
You could read the system date (or date and time) and construct crude calendar points in memory from the date and the day of the week that the call returns. Then sum the number of days in the relevant months, add the day number from the input, and/or look up the calendar point for the relevant week (starting on Sunday or Monday) and step forward to the correct day. Build the text string from fixed characters and insert the relevant variable, such as the full name of the day of the week, as required. Multiple traversals may be needed to find all the events whose occurrences are to be displayed or counted.
First, find a sequence, if it exists:
step = {day,month,year}
period=0
' Compute the interval between consecutive dates in every unit
for s = day to year
for d = 1 to dates.count-1
interval(d,s)=datedifference(s,date(d),date(d+1))
next
next
' Find frequency with largest interval
for s = year downto day
found=true
for d = 1 to dates.count-2
if interval(d,s)<>interval(d+1,s) then
found=false
exit for
end if
next
' A constant interval of 0 (e.g. all dates in the same year) is not a recurrence
if found and interval(1,s)>0 then
period=s
frequency=interval(1,s)
exit for
end if
next
if period>0 then
select case period
case day
if frequency mod 7 = 0 then
say "every" dayname(date(1))
else
say "every" frequency "days"
end if
case month
say "every" frequency "months on day" daynumber(date(1))
case year
say "every" frequency "years on" daynumber(date(1)) monthname(date(1))
end select
end if
Finally, deal with "in November", "from 2007 to 2010" etc., should be obvious.
HTH
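A Java sketch of the day-based part of this idea follows. The class name is hypothetical, the input is assumed to be sorted and free of duplicates, and the month and year branches are left out.

```java
import java.time.LocalDate;
import java.time.format.TextStyle;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: describes a constant gap, measured in days, between dates.
final class IntervalDescriber {
    // Returns null when there are fewer than two dates or the gaps are not constant.
    static String describe(List<LocalDate> sortedDates) {
        if (sortedDates.size() < 2) {
            return null;
        }
        long step = ChronoUnit.DAYS.between(sortedDates.get(0), sortedDates.get(1));
        if (step <= 0) {
            return null; // unsorted or duplicate input
        }
        for (int i = 1; i < sortedDates.size() - 1; i++) {
            if (ChronoUnit.DAYS.between(sortedDates.get(i), sortedDates.get(i + 1)) != step) {
                return null;
            }
        }
        if (step == 1) {
            return "every day";
        }
        if (step == 7) {
            return "every " + sortedDates.get(0).getDayOfWeek()
                    .getDisplayName(TextStyle.FULL, Locale.ENGLISH);
        }
        return "every " + step + " days";
    }
}
```

The bounds ("in November", "from 2007 to 2010") can then be read off the first and last date of the list.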
I like @arjen's answer, but I don't think there is any need for a complex algorithm. This is so, so simple. If there is a pattern, there is a pattern... therefore a simple algorithm would work. First we need to think of the types of patterns we are looking for: daily, weekly, monthly and yearly.
How to recognize?
Daily: there is a record every day
Weekly: there is a record every week
Monthly: there is a record every month
Yearly: there is a record every year
Difficult? No. Just count how many repetitions you have and then classify.
Here is my implementation
RecurrencePatternAnalyser.java
public class RecurrencePatternAnalyser {
// Local copy of calendars by add() method
private ArrayList<Calendar> mCalendars = new ArrayList<Calendar>();
// Used to count the uniqueness of each year/month/day
private HashMap<Integer, Integer> year_count = new HashMap<Integer,Integer>();
private HashMap<Integer, Integer> month_count = new HashMap<Integer,Integer>();
private HashMap<Integer, Integer> day_count = new HashMap<Integer,Integer>();
private HashMap<Integer, Integer> busday_count = new HashMap<Integer,Integer>();
// Used for counting payments before due date on weekends
private int day_goodpayer_ocurrences = 0;
private int day_goodPayer = 0;
// Add a new calendar to the analysis
public void add(Calendar date)
{
mCalendars.add(date);
addYear( date.get(Calendar.YEAR) );
addMonth( date.get(Calendar.MONTH) );
addDay( date.get(Calendar.DAY_OF_MONTH) );
addWeekendDays( date );
}
public void printCounts()
{
System.out.println("Year: " + getYearCount() +
" month: " + getMonthCount() + " day: " + getDayCount());
}
public RecurrencePattern getPattern()
{
int records = mCalendars.size();
if (records==1)
return null;
RecurrencePattern rp = null;
if (getYearCount()==records)
{
rp = new RecurrencePatternYearly();
if (records>=3)
rp.setConfidence(1);
else if (records==2)
rp.setConfidence(0.9f);
}
else if (getMonthCount()==records)
{
rp = new RecurrencePatternMonthly();
if (records>=12)
rp.setConfidence(1);
else
rp.setConfidence(1-(-0.0168f * records + 0.2f));
}
else
{
calcDaysRepetitionWithWeekends();
if (day_goodpayer_ocurrences==records)
{
rp = new RecurrencePatternMonthly();
rp.setPattern(RecurrencePattern.PatternType.MONTHLY_GOOD_PAYER);
if (records>=12)
rp.setConfidence(0.95f);
else
rp.setConfidence(1-(-0.0168f * records + 0.25f));
}
}
return rp;
}
// Increment one more year/month/day on each count variable
private void addYear(int key_year) { incrementHash(year_count, key_year); }
private void addMonth(int key_month) { incrementHash(month_count, key_month); }
private void addDay(int key_day) { incrementHash(day_count, key_day); }
// Retrieve number of unique entries for the records
private int getYearCount() { return year_count.size(); }
private int getMonthCount() { return month_count.size(); }
private int getDayCount() { return day_count.size(); }
// Generic function to increment the hash by 1
private void incrementHash(HashMap<Integer, Integer> var, Integer key)
{
Integer oldCount = var.get(key);
Integer newCount = 0;
if ( oldCount != null ) {
newCount = oldCount;
}
newCount++;
var.put(key, newCount);
}
// As banks are closed on weekends, a payment due on a Saturday or Sunday
// may be made early, on the Friday. Such dates would otherwise break the
// recurrence pattern, so when a date falls on a Friday this function also
// counts the following Saturday and Sunday.
private void addWeekendDays(Calendar c)
{
int key_day = c.get(Calendar.DAY_OF_MONTH);
incrementHash(busday_count, key_day);
if (c.get(Calendar.DAY_OF_WEEK) == Calendar.FRIDAY)
{
// Adds Saturday
c.add(Calendar.DATE, 1);
key_day = c.get(Calendar.DAY_OF_MONTH);
incrementHash(busday_count, key_day);
// Adds Sunday
c.add(Calendar.DATE, 1);
key_day = c.get(Calendar.DAY_OF_MONTH);
incrementHash(busday_count, key_day);
}
}
private void calcDaysRepetitionWithWeekends()
{
Iterator<Entry<Integer, Integer>> it =
busday_count.entrySet().iterator();
while (it.hasNext()) {
@SuppressWarnings("rawtypes")
Map.Entry pair = (Map.Entry)it.next();
if ((int)pair.getValue() > day_goodpayer_ocurrences)
{
day_goodpayer_ocurrences = (int) pair.getValue();
day_goodPayer = (int) pair.getKey();
}
//it.remove(); // avoids a ConcurrentModificationException
}
}
}
RecurrencePattern.java
public abstract class RecurrencePattern {
public enum PatternType {
YEARLY, MONTHLY, WEEKLY, DAILY, MONTHLY_GOOD_PAYER
}
public enum OrdinalType {
FIRST, SECOND, THIRD, FOURTH, FIFTH
}
protected PatternType pattern;
private float confidence;
private int frequency;
public PatternType getPattern() {
return pattern;
}
public void setPattern(PatternType pattern) {
this.pattern = pattern;
}
public float getConfidence() {
return confidence;
}
public void setConfidence(float confidence) {
this.confidence = confidence;
}
public int getFrequency() {
return frequency;
}
public void setFrequency(int frequency) {
this.frequency = frequency;
}
}
RecurrencePatternMonthly.java
public class RecurrencePatternMonthly extends RecurrencePattern {
private boolean isDayFixed;
private boolean isDayOrdinal;
private OrdinalType ordinaltype;
public RecurrencePatternMonthly()
{
this.pattern = PatternType.MONTHLY;
}
}
RecurrencePatternYearly.java
public class RecurrencePatternYearly extends RecurrencePattern {
private boolean isDayFixed;
private boolean isMonthFixed;
private boolean isDayOrdinal;
private OrdinalType ordinaltype;
public RecurrencePatternYearly()
{
this.pattern = PatternType.YEARLY;
}
}
Main.java
public class Algofin {
static Connection c = null;
public static void main(String[] args) {
//openConnection();
//readSqlFile();
RecurrencePatternAnalyser r = new RecurrencePatternAnalyser();
//System.out.println(new GregorianCalendar(2015,1,30).get(Calendar.MONTH));
r.add(new GregorianCalendar(2015,0,1));
r.add(new GregorianCalendar(2015,0,30));
r.add(new GregorianCalendar(2015,1,27));
r.add(new GregorianCalendar(2015,3,1));
r.add(new GregorianCalendar(2015,4,1));
r.printCounts();
RecurrencePattern rp;
rp=r.getPattern();
System.out.println("Pattern: " + rp.getPattern() + " confidence: " + rp.getConfidence());
}
}
I think you'll have to build it, and I think it will be a devil-in-the-details kind of project. Start by getting much more thorough requirements. Which date patterns do you want to recognize? Come up with a list of examples that you want your algorithm to successfully identify. Write your algorithm to meet your examples. Put your examples in a test suite so that when you get different requirements later you can make sure you didn't break the old ones.
I predict you will write 200 if-then-else statements.
OK, I do have one idea. Get familiar with the concepts of sets, unions, coverage, intersection and so on. Have a list of short patterns that you search for, say, "Every day in October", "Every day in November", and "Every day in December." If these short patterns are contained within the set of dates, then define a union function that can combine shorter patterns in intelligent ways. For example, let's say you matched the three patterns I mention above. If you Union them together you get, "Every day in October through December." You could aim to return the most succinct set of unions that cover your set of dates or something like that.
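To make the union idea concrete, here is a sketch restricted to one kind of short pattern, "Every day in <month>", within a single year. The class and method names are mine, and a real version would need many more pattern types.

```java
import java.time.LocalDate;
import java.time.Month;
import java.time.YearMonth;
import java.time.format.TextStyle;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: matches "Every day in <month>" and unions consecutive months.
final class MonthUnion {
    // True when every day of the given month is present in the set.
    static boolean coversWholeMonth(Set<LocalDate> dates, YearMonth month) {
        for (int day = 1; day <= month.lengthOfMonth(); day++) {
            if (!dates.contains(month.atDay(day))) {
                return false;
            }
        }
        return true;
    }

    // Merges runs of consecutive fully covered months into "X through Y".
    static List<String> describe(Set<LocalDate> dates, int year) {
        List<String> result = new ArrayList<>();
        Month runStart = null;
        Month runEnd = null;
        for (Month m : Month.values()) {
            if (coversWholeMonth(dates, YearMonth.of(year, m))) {
                if (runStart == null) {
                    runStart = m;
                }
                runEnd = m;
            } else if (runStart != null) {
                result.add(label(runStart, runEnd));
                runStart = null;
            }
        }
        if (runStart != null) {
            result.add(label(runStart, runEnd));
        }
        return result;
    }

    private static String label(Month start, Month end) {
        String first = start.getDisplayName(TextStyle.FULL, Locale.ENGLISH);
        if (start == end) {
            return "Every day in " + first;
        }
        return "Every day in " + first + " through " + end.getDisplayName(TextStyle.FULL, Locale.ENGLISH);
    }
}
```

Dates that no short pattern covers are simply ignored here; a fuller version would report them as leftovers so that other pattern types can be tried on them.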
Have a look at your favourite calendar program. See what patterns of event recurrence it can generate. Reverse engineer them.