Object Oriented Design for payments & bills? - ruby

I'm struggling a little bit with how to design a system to keep track of bills and payments. I currently have two functioning objects (Bill and Payment), but can't settle on a way to keep track of the accounting between them.
Fundamentally, I just need to know which specific bills have been paid off and the total balance after all of the accounting. I figure that I can do this two ways:
1) Build a separate accounting table where I keep track of each transaction, mapping a specific Bill to a specific payment. This would let me easily look up in the database how much is remaining on a particular bill. The downside is that it seems like a lot of added complexity, as I need to create a new record in this table whenever a new object is created.
2) Try to just write logic to calculate on-the-fly how much is remaining on a particular Bill by looking through the whole transaction history and doing the accounting. On the plus side, this is guaranteed to always be correct, but it seems kind of wrong to continue to do the same calculation over and over to get to what should be a static value.
Has anyone faced a challenge like this in the past, and if so, how did you solve it? Is there some best practice that I'm just missing?

One table: transactions. Bills have a positive value, payments have a negative value. You can give it a column for transaction_type if you want (Invoice, Payment, Credit, Refund), and you can even use Rails STI on that column if you really feel like it. Other useful columns - number, payment_type (credit/cash/check/eft), date.
The remaining balance is simply the sum of all the values. If the balance is negative, a credit is owed.
If you really need to apply payments to particular bills (a practice I'm not entirely sure is correct accounting), you can have a secondary table (paid_bills) that maps payments to bills, with an amount; presumably the sum of all paid_bills amounts for a given payment_id could not be more than the payment itself.
When displaying things for users, you can always flip the sign: show a payment as a positive number, and when a payment form submits a positive number, flip it back to negative.
This is the best way I have found over the years to do this while maintaining best accounting practices.
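A minimal sketch of that single-table convention in plain Ruby (the Transaction struct and the sample figures are invented for illustration):

Transaction = Struct.new(:transaction_type, :amount, :date)

ledger = [
  Transaction.new("Invoice", 100.00, "2015-01-01"),  # a bill: positive
  Transaction.new("Payment", -60.00, "2015-01-10"),  # a payment: negative
  Transaction.new("Payment", -50.00, "2015-02-01"),
]

balance = ledger.sum(&:amount)  # => -10.0, so a 10.00 credit is owed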

If you can use a database, create just one table, for bills, and add a boolean field 'paid'. When you want to know whether a bill has been paid, check this field; when you want the global balance, add up the amounts of the paid bills and subtract the unpaid ones.
If not, you could use a static variable in one of those classes to keep the global balance, plus either a 'paid' field on Bill or a reference to the Payment object (initialized to nil and set to point at the Payment once it is created).

A Bill has_many Payments, and thus a Payment belongs_to a Bill (your payments table will have a bill_id field). This is the only sensible way to model this, I'd argue.
Don't worry about "continuing to do the same calculation over and over". Get your object model right, and then worry about optimization later on. CPU time is cheap; human brainpower and the ability to manage complexity are not! If you really get to the point where this repeated calculation is a concern (which is, frankly, unlikely), there are plenty of options for speeding it up without violating the fundamental relationship these models have.
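As a sketch, that relationship plus the on-the-fly balance might look like this in Rails (assuming bills and payments tables that each have an amount column; the names are illustrative, not from the question):

class Bill < ActiveRecord::Base
  has_many :payments

  # Recompute the outstanding amount on demand, as argued above.
  def remaining_balance
    amount - payments.sum(:amount)
  end
end

class Payment < ActiveRecord::Base
  belongs_to :bill  # the payments table carries the bill_id foreign key
end

If this sum ever becomes a measured bottleneck, it can be cached (e.g. in a column updated on each payment) without changing the relationship itself.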

Related

Which is the best approach for a dimension (SCD-2 or a SCD-1 + a whole new dimension)

Let's say I have the following situation:
A dimension Product with some attributes that aren't volatile (Description and Diameter; they can only be changed by an SCD-1 change, for correction) and an attribute that can be volatile (Selling Group; it can change over time for the same product).
So, when a change occurs in these volatile attributes of one product, I need to somehow track them.
I have come up with these two approaches:
For both: keep using SCD-1 for non-volatile attributes.
Approach #1: Use SCD-2 in product_dim only for volatile attributes.
Approach #2: Make Selling Group a whole new dimension, and every sale will record the current value at the moment of ETL. No need for SCD-2 here.
I am new to Data Warehousing and I'm trying to understand which is better and why. One of my aims is to use an OLAP tool to read all of this.
It all comes down to the business needs of your model. I don't know the business well enough from your question, but as a rule of thumb, if you want to do analysis by Selling Group (e.g. total quantity of all products sold by Selling Group X), then you should create it as a separate dimension. So in this case approach #2 is correct.
Considering general concepts, and assuming a selling group is some kind of group of products, it doesn't make sense to have it as an attribute of a product.
If you want to learn more about Dimensional Modelling, I'd suggest looking into Ralph Kimball's work if you haven't already. An excellent resource is his book The Data Warehouse Toolkit, which covers your question and many more techniques. It's a nice book to have on your desk when questions like this pop up; most experienced data modellers keep a copy to consult every now and then.
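For reference, the row-versioning mechanics behind approach #1 (SCD-2) can be sketched in plain Ruby; the field names here are invented. Each change to the volatile attribute expires the current row and opens a new one, so history is preserved:

ProductRow = Struct.new(:product_id, :selling_group, :valid_from, :valid_to, :current)

def scd2_update(rows, product_id, new_group, today)
  current = rows.find { |r| r.product_id == product_id && r.current }
  return rows if current.nil? || current.selling_group == new_group
  current.valid_to = today   # close the old version
  current.current  = false
  rows << ProductRow.new(product_id, new_group, today, nil, true)
end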

Recommender: Log user actions & datamine it – good solution [closed]

I am planning to log all user actions like viewed page, tag etc.
What would be a good lean solution to data-mine this data to get recommendations?
Say, for example:
Figure out all the interests from the viewed URLs (assuming I know the associated tags)
Find out people who have similar interests, e.g. John & Jane viewed URLs related to cars
Edit:
It’s really my lack of knowledge in this domain that’s a limiting factor to get started.
Let me rephrase.
Let's say a site like stackoverflow or Quora. All my browsing history going through different questions is recorded, and Quora does a data-mining job of looking through it and populating my stream with related questions. I go through questions related to parenting, and the next time I log in I see streams of questions about parenting. Ditto with Amazon shopping: I browse watches & mixers, and two days later they send me a mail of related shopping items that I am interested in.
My question is: how do they efficiently store this data, and then mine it to show the next relevant set of data?
Data mining is a method that needs truly enormous amounts of storage space and equally enormous amounts of computing power.
I give you an example:
Imagine, you are the boss of a big chain of supermarkets like Wal-Mart, and you want to find out how to place your products in your market so that consumers spend lots of money when they enter your shops.
First of all, you need an idea. Your idea is to find products of different product groups that are often bought together. If you have such a pair of products, you should place them as far apart as possible. If a customer wants to buy both, he/she has to walk through your whole shop, and along this way you place other products that might fit well with one of the pair but are not sold as often. Some of the customers will see this product and buy it, and the revenue of this additional product is the payoff of your data-mining process.
So you need lots of data. You have to store all data that you get from all purchases of all your customers in all your shops. When a person buys a bottle of milk, a sausage and some bread, then you need to store what goods have been sold, in what amount, and at what price. Every purchase needs its own ID if you want to be able to tell that the milk and the sausage were bought together.
So you have a huge amount of purchase data. And you have a lot of different products. Let's say you are selling 10,000 different products in your shops. Every product can be paired with every other. This makes 10,000 * 10,000 / 2 = 50,000,000 (50 million) pairs. And for each of these possible pairs you have to find out whether it is contained in a purchase. But maybe you think that you have different customers on a Saturday afternoon than on a Wednesday late morning. So you have to store the time of purchase too. Maybe you define 20 time slices along a week. This makes 50M * 20 = 1 billion records. And because people in Memphis might buy different things than people in Beverly Hills, you need the place in your data too. Let's say you define 50 regions, so you get 50 billion records in your database.
And then you process all your data. If a customer bought 20 products in one purchase, you have 20 * 19 / 2 = 190 pairs. For each of these pairs you increase the counter for the time and the place of this purchase in your database. But by what should you increase the counter? Just by 1? Or by the amount of the bought products? But you have a pair of two products. Should you take the sum of both? Or the maximum? Better to use more than one counter, to be able to count it in all the ways you can think of.
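As a sketch of that counting step in Ruby (dropping the time and region keys for brevity; the baskets and their contents are invented sample data):

baskets = [
  ["milk", "bread", "sausage"],
  ["milk", "bread"],
  ["champagne", "caviar"],
]

pair_counts = Hash.new(0)
baskets.each do |basket|
  basket.combination(2).each do |a, b|
    pair_counts[[a, b].sort] += 1  # sort so (a, b) and (b, a) share a counter
  end
end

pair_counts[["bread", "milk"]]  # => 2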
And you have to do something else: customers buy much more milk and bread than champagne and caviar. So if they choose arbitrary products, the pair milk-bread will of course have a higher count than the pair champagne-caviar. So when you analyze your data, you must take care of some of those effects too.
Then, when you have done all this, you run your data-mining query. You select the pair with the highest ratio of factual count against estimated count. You select it from a database table with many billions of records. This might take some hours to process. So think carefully about whether your query is really what you want to know before you submit it!
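That "ratio of factual count against estimated count" is what association mining calls lift: how much more often a pair occurs than it would if the two products were bought independently. Building on pair_counts and baskets from the previous sketch:

item_counts = Hash.new(0)
baskets.each { |b| b.each { |item| item_counts[item] += 1 } }

n = baskets.size.to_f
lift = pair_counts.map do |(a, b), count|
  expected = (item_counts[a] / n) * (item_counts[b] / n) * n  # count under independence
  [[a, b], count / expected]
end.to_h

best_pair, best_ratio = lift.max_by { |_, ratio| ratio }

Note that this automatically corrects for the milk-bread vs. champagne-caviar effect described above, because frequent items get a proportionally higher expected count.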
You might find out that in a rural environment, people on a Saturday afternoon buy much more beer together with diapers than you expected. So you just place beer at one end of the shop and diapers at the other end, and this makes lots of people walk through your whole shop, where they see (and hopefully buy) many other things they wouldn't have seen (and bought) if beer and diapers were placed close together.
And remember: the costs of your data-mining process are covered only by the additional purchases of your customers!
Conclusion:
You must store pairs, triples, or even bigger tuples of items, which will need a lot of space. Because you don't know what you will find at the end, you have to store every possible combination!
You must count those tuples.
You must compare counted values with estimated values.
Store each transaction as a vector of tags (i.e. visited pages containing these tags). Then do association analysis (I can recommend Weka) on this data to find associations using the available "Associate" algorithms. Effectiveness depends on a lot of different things, of course.
One thing a guy at my uni told me was that often you can simply create a vector of all the products one person has bought, compare it with other people's vectors, and get decent recommendations. That is, represent users as the products they buy or the pages they visit and do e.g. Jaccard similarity calculations. If the "people" are similar, then look at products they bought that this person didn't (probably those that are the most common in the population of similar people).
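A sketch of that idea in Ruby, treating each user as a set of the products (or tagged pages) they touched; the sample users are invented:

require 'set'

# Jaccard similarity: |intersection| / |union|, between 0.0 and 1.0.
def jaccard(a, b)
  return 0.0 if a.empty? && b.empty?
  (a & b).size.to_f / (a | b).size
end

john = Set["watch", "mixer", "toaster"]
jane = Set["watch", "mixer", "kettle"]
jaccard(john, jane)  # => 0.5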
Storage is a whole different ballgame; there are many good indices for vector data, such as KD-trees, implemented in different RDBMSs.
Take a course in data mining :) or just read one of the excellent textbooks available (I have read Introduction to Data Mining by Pang-Ning Tan et al., and it's good).
And regarding storing all the pairs of products etc.: of course this is not done in practice; more efficient algorithms based on support and confidence are used to prune the search space.
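For instance, a minimum-support filter, the core pruning idea behind Apriori-style algorithms, is a one-liner over the pair counts from the earlier answer (the threshold here is arbitrary):

min_support = 10
frequent_pairs = pair_counts.select { |_pair, count| count >= min_support }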
I should say recommendation is a machine-learning problem.
How to store the data depends on which algorithm you choose.

Efficient searching in huge multi-dimensional matrix

I am looking for a way to search in an efficient way for data in a huge multi-dimensional matrix.
My application contains data that is characterized by multiple dimensions. Imagine keeping data about all sales in a company (my application is totally different, but this is just to demonstrate the problem). Every sale is characterized by:
the product that is being sold
the customer that bought the product
the day on which it has been sold
the employee that sold the product
the payment method
the quantity sold
I have millions of sales, done on thousands of products, by hundreds of employees, on lots of days.
I need a fast way to calculate e.g.:
the total quantity sold by an employee on a certain day
the total quantity bought by a customer
the total quantity of a product paid by credit card
...
I need to store the data in the most detailed way, and I could use a map where the key is the combination of all dimensions, like this:
#include <map>
#include <tuple>
class Combination
{
public:
    Product *product;
    Customer *customer;
    Day *day;
    Employee *employee;
    Payment *payment;
    // std::map needs a strict weak ordering on its key type
    bool operator<(const Combination &other) const
    {
        return std::tie(product, customer, day, employee, payment)
             < std::tie(other.product, other.customer, other.day, other.employee, other.payment);
    }
};
std::map<Combination, quantity> data;
But since I don't know beforehand which queries are performed, I need multiple combination classes (where the data members are in different order) or maps with different comparison functions (using a different sequence to sort on).
Possibly, the problem could be simplified by giving each product, customer, ... a number instead of a pointer to it, but even then I end up with lots of memory.
Are there any data structures that could help in handling this kind of efficient searches?
EDIT:
Just to clarify some things: On disk my data is stored in a database, so I'm not looking for ways to change this.
The problem is that to perform my complex mathematical calculations, I have all this data in memory, and I need an efficient way to search this data in memory.
Could an in-memory database help? Maybe, but I fear that an in-memory database might have a serious impact on memory consumption and on performance, so I'm looking for better alternatives.
EDIT (2):
Some more clarifications: my application will perform simulations on the data, and in the end the user is free to save this data or not into my database. So the data itself changes the whole time. While performing these simulations, and the data changes, I need to query the data as explained before.
So again, simply querying the database is not an option. I really need (complex?) in-memory data structures.
Can you imagine any other possible choice besides running qsort() on that giant array of structs? There's just no other way that I can see. Maybe you can sort it just once at time zero and keep it sorted as you do dynamic insertions/deletions of entries.
Using a database (in-memory or not) to work with your data seems like the right way to do this.
If you don't want to do that, you don't have to implement lots of combination classes; just use a collection that can hold any of the objects.
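The language-agnostic idea behind keeping one structure per query pattern can be sketched briefly (in Ruby here purely for compactness, since the question's code is C++; the dimension names follow the question's example). Every insert updates one running-total map per aggregation, so each query becomes a single lookup:

by_employee_day    = Hash.new(0)
by_customer        = Hash.new(0)
by_product_payment = Hash.new(0)

def record_sale(sale, by_employee_day, by_customer, by_product_payment)
  qty = sale[:quantity]
  by_employee_day[[sale[:employee], sale[:day]]]       += qty
  by_customer[sale[:customer]]                         += qty
  by_product_payment[[sale[:product], sale[:payment]]] += qty
end

record_sale({ product: "P1", customer: "C7", day: "2013-05-01",
              employee: "E3", payment: :credit_card, quantity: 4 },
            by_employee_day, by_customer, by_product_payment)

by_employee_day[["E3", "2013-05-01"]]  # => 4, an O(1) lookup

The trade-off is memory (one map per query pattern) against lookup speed, which matches the question's concern.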

How to manage transactions, debt, interest and penalty?

I am making a BI system for a bank-like institution. This system should manage credit contracts, invoices, payments, penalties and interest.
Now, I need to make a method that builds an invoice. I have to calculate how much the customer has to pay right now. He has a debt, which he has to pay off, and he also has to pay interest. If he was ever late with a due payment, penalties are applied for each day he's late.
I thought there were 2 ways of doing this:
Keep only one original state, the contract's original state, and each time the monthly payment the customer has to make is computed, take the actual payments made into account.
Constantly create intermediary states, going from the last intermediary state and considering only the events that took place between these two intermediary states. This means having a job that runs periodically (daily, monthly), takes the last saved state, applies the changes (due payments, actual payments, changes in global constants like the penalty rate, which is controlled by the Central Bank), and saves the resulting state.
The benefits of the first variant:
Always current. If changes were made with a date in the past (a guy shows up with a paid invoice 5 days after he made the payment to the bank), they will be correctly reflected in the results.
The flaws of the first variant:
Takes a long time to compute.
Documents printed with the current results may no longer match if the data changes due to operations entered with a back date.
The benefits of the second variant:
Works fast, and aggregated data is always available for search and reports.
Simpler to compute
The flaws of the second variant:
Vulnerable to failed jobs.
Errors in the past propagate until the end, to the final results.
An intermediary result cannot be changed if new data from past transactions arrives (it can, but it's hard and has many implications, so I'd rather mark it as taboo).
Jobs cannot run successfully and cleanly if an unfinished transaction exists (an issued invoice that hasn't been paid yet).
Is there any other way? Can I combine the benefits from these two? Which one is used in other similar systems you've encountered? Please share any experience.
Problems of this nature are always more complicated than they first appear. This is a consequence of what I like to call the Rumsfeldian problem of the unknown unknown. Basically, whatever you do now, be prepared to make adjustments for arbitrary future rules. This is a tough proposition. Some future possibilities that may have a significant impact on your calculation model are back-dated payments, adjustments, and charges. Forgiven interest periods may also become an issue (particularly if back dated). You may also face requirements to provide various point-in-time (PIT) calculations, based either on what was "known" at that PIT (past view of the past) or taking into account transactions occurring after the reference PIT that were back dated to a PIT before the reference (current view of the past). Calculations of this nature can be a real pain.
My advice would be to calculate from "scratch" (i.e. the first variant). Implement optimizations (e.g. the second variant) only when necessary to meet performance constraints. Doing calculations from the beginning is a compute-intensive model, but it is generally more flexible with respect to accommodating unexpected left turns.
If performance is a problem but the frequency of complicating factors (e.g. back-dated transactions) is relatively low, you could explore a hybrid model employing the best of both variants. Here you store the current state and calculate forward using only those transactions that posted since the last stored state to create a new current state. If you hit a "complication", re-do the entire account from the beginning to re-establish the current state.
Being able to accommodate the unexpected without triggering a rewrite is probably more important in the long run than shaving calculation time right now. Do not place restrictions on your computation model until you have to. Saving current state often brings with it a number of built-in assumptions and restrictions that reduce the wiggle room for accommodating future requirements.
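A toy sketch of the "from scratch" model in Ruby: replay the whole ledger in date order against the original state, accruing a penalty on any positive balance between events. The daily rate and the event shape are invented for illustration; real interest and penalty rules will be more involved:

require 'date'

def balance_on(events, daily_penalty_rate, as_of)
  owed, last_date = 0.0, nil
  events.sort_by { |e| e[:date] }.each do |e|
    next if e[:date] > as_of
    # accrue the daily penalty on any outstanding amount since the last event
    owed += owed * daily_penalty_rate * (e[:date] - last_date).to_i if last_date && owed > 0
    owed += e[:amount]  # invoices positive, payments negative
    last_date = e[:date]
  end
  owed += owed * daily_penalty_rate * (as_of - last_date).to_i if last_date && owed > 0
  owed
end

events = [
  { date: Date.new(2013, 1, 1), amount: 1000.0 },  # invoice issued
  { date: Date.new(2013, 2, 1), amount: -400.0 },  # payment received
]
balance_on(events, 0.001, Date.new(2013, 3, 1))

Because the ledger is re-sorted and replayed on every call, a back-dated payment is picked up automatically the next time the balance is computed, which is exactly the flexibility argued for above.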

Amount to show on a bill form

My company is currently setting up an online billing portal for our customers. I was curious as this question went back and forth a bit between developers and testers: When showing the input form for the amount a customer wishes to pay, do you set the default to be the max amount owed by the customer? Taking a look around at sites when I pay my own bills I tend to see three different setups:
Max amount owed is in the input
Nothing is put in
Button options to pay off max, minimum, or your own input
In general we agree that your max and min amounts should be shown on the screen somewhere (it's annoying to go look for your bill when the site can show the amount owed). Is there a standard, or what seems most friendly? Option 1 is nice because it's all there, but it might annoy a customer a bit, or a customer might accidentally pay off a large amount without realizing it (sounds dumb, but you know it'll happen to someone). Option 2 gives the customer a feeling of payment control but annoys them with having to input an amount every time. Option 3 looks to be a middle ground but seems like a bit more unneeded work and upkeep when 1 and 2 are simpler and cleaner to look at.
I'd instinctively go for (1) - default to full amount. However, I grew up in an environment where debt wasn't taken lightly.
You should have a confirmation page with the amount payable anyway, since I might enter a wrong amount and press enter. So the "paying too much" argument doesn't really cut it.
Using the full amount as the default can be a slight nudge towards paying all of it. With a major volume of payments, this might be noticeable.
I would not default to a smaller amount. A customer might overlook that it's not the full amount, consider the deal done, and miss the further payments. With a good layout ("Amount Remaining"), that can be avoided in almost all cases; but with a large trade volume, you might still create a few annoyed customers.
Can you query your own payment system to see what kind of payments your customers are making most often? Then set that as your default. I'd give them all options, though, including max, min, and custom.
