optimal algorithm for adding chosen table rows to the database

I am trying to implement a save method in a backing Java bean which will take the table rows that are selected and save them in the database. However, suppose the user changes his choices a little (say, 1 out of his 5 choices). I am wondering whether the algorithm I apply matters for efficiency in the long term. Here are my two options:
Option 1: every time the user clicks Save, delete all of his previous choices and insert all of the current choices into the database.
Option 2: when the button is clicked, work out which rows the user de-selected, delete those rows from the database, and insert only the newly selected ones.
Is option 2 better than option 1, or does it not really matter for a number of choices that will not exceed 15?
Thanks

I would definitely go for option 2: try to figure out the minimum number of operations you need to perform.
It is, however, fairly normal to fall back to option 1 when deadlines loom, since it is a bit easier to implement.
It shouldn't be that much harder to figure out what the changes are, though, since it doesn't seem that you're changing the rows themselves: either you delete the rows whose checkmarks were cleared, or you insert the ones whose checkmarks were set.
Simply store a list of the primary key values of whatever is in the database, then compare against that list as you iterate through the new selection when the user wants to persist the changes.
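A minimal sketch of that diff (Python for illustration, though the question concerns a Java bean; fetch_selected_ids, selected_ids, db and the choices table are all hypothetical names):

old_ids = set(fetch_selected_ids())    # hypothetical: keys currently persisted
new_ids = set(selected_ids)            # hypothetical: keys checked in the UI now

for pk in old_ids - new_ids:           # de-selected rows: delete them
    db.execute("DELETE FROM choices WHERE id = ?", (pk,))
for pk in new_ids - old_ids:           # newly selected rows: insert them
    db.execute("INSERT INTO choices (id) VALUES (?)", (pk,))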
A minimal-work solution here also leaves you a bit more future-proof in terms of refactoring, changes, or additions. For instance, what if, in the future, there is data attached to any of those rows? You would need to keep that data as well. Generally I'm a bit opposed to writing code just for the sake of "what if", but here it feels more like "why wouldn't you" than that.
So my advice is go for option 2. Not much more work.

Related

Is it wise to leave holes in a table?

Caution: May be hard to understand
So let's say I have a table which needs to hold indexes that correspond to a different table holding objects. For the sake of this question, I'm going to refer to these indexes as "idxs". I would like it to be viable both to remove a specific idx quickly and to loop over all the idxs quickly.
I want this to go as fast as possible, and I have a few options to choose from:
Note: I would say there may be anywhere from 500 to 3000 objects at maximum, which means up to that many idxs to refer to them. However, there probably won't be more than 200 idxs stored in my table at a time.
I can store each idx by pushing it onto the end of the table, which makes it easy to loop over them, but it creates a problem with removal, since I have to loop through the table to find the right one and then remove it.
I can make the table into a sort of set as described here (Search for an item in a Lua list), which makes it extremely fast to remove idxs but not so fast to loop over them, since I may have to loop over all the empty gaps in the table, which could be thousands of entries long. Also, I'm actually going to have lots of these tables holding idxs, so I don't think having that many tables of that length is a great idea.
There is possibly a third solution or maybe more but I'm not sure. What would be the best thing to do in this situation? Or is this a sign that I should redesign my program entirely?
As a second note, I would like to mention that removing idxs probably happens more often each frame than looping over them does.
I can give more context if needed.
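One plausible third option, assuming iteration order doesn't matter: combine the two approaches by keeping a dense array for gap-free looping plus a reverse-lookup table for O(1) removal, moving the last element into the hole. A minimal sketch (Python for illustration; the same shape works with two Lua tables):

items = []   # dense array of idxs: loop over this, no gaps
pos = {}     # idx -> its position in items

def add(idx):
    pos[idx] = len(items)
    items.append(idx)

def remove(idx):
    i = pos.pop(idx)         # where the idx lives
    last = items.pop()       # take the last element off the end
    if i < len(items):       # unless we just removed the last element,
        items[i] = last      # move it into the hole
        pos[last] = i        # and record its new position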

Falling Objects Corona

I am trying to create a falling-objects game. However, I am having a hard time creating levels for it. The game does not change scene or anything, but I want it to get harder along the way. There are bad and good items falling, and the character should eat the good items and avoid the bad ones.
I am creating those bad and good items with a createItem function, called from two timer.performWithDelay timers (one for the good items, one for the bad items), and the items fall at random positions. However, sometimes a bad item ends up directly under a good item, making the good item impossible to catch. How can I stop that? I added a collision filter to let those items pass through each other, which is why they can end up one directly under the other.
Here is how I call createItem with two timers:
goodTimer = timer.performWithDelay(1000, function() createItem(goodItem[math.random(1,#goodItem)],1) end, 0 )
badTimer = timer.performWithDelay(5000, function() createItem(badItem[math.random(1,#badItem)],0) end, 0 )
If you want deterministic behaviour (e.g., the player always having the option to catch a good item), you have to constrain the random layout so that it cannot break that behaviour in any case. From your description, you are only taking collisions into account, which is not an accurate correction for pure randomness.
In this situation, I would:
set up a "security area" surrounding each item. For example: 2
times height/width of the given item.
make the generation of each item depend on the other (existing) items. For example, before creating a good item, your algorithm should check whether there is already an existing bad one which might interfere with it: that is, whether, taking the aforementioned security area into account, there is any chance that both items end up in a situation you don't want. This check can be done by calculating the expected collision point of the corresponding security areas under the current velocities. If a collision might occur, the generation of the item can be delayed, skipped entirely, or its type changed (from bad to good or vice versa); see the sketch after this list.
set the collision rules, but only as an on-the-safe-side post-correction, since most situations should already be covered by the previous points.
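A hedged sketch of that pre-spawn check (Python pseudocode; createItem is from the question, though its exact signature here is assumed, and ITEM_W, random_x and the falling list are hypothetical):

SAFE_W = 2 * ITEM_W                    # assumed: the 2x security area above

def can_spawn(x, falling):
    # refuse the spawn if any existing item's security area overlaps this lane
    return all(abs(item.x - x) >= SAFE_W for item in falling)

def spawn(kind, falling):
    x = random_x()                     # hypothetical: pick a random column
    if can_spawn(x, falling):
        createItem(kind, x)
    # else: skip this tick, delay, or flip the item's type, as suggested above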

n! combinations, how to find best one without killing computer?

I'll get straight to it. I'm working on a web or mobile app that is responsible for scheduling. I want students to input the courses they have taken, and I give them possible combinations of courses they should take that fit their requirements.
However, let's say there are 150 courses that fit their requirements and they're looking for 3 courses. That would be 150C3 combinations, right?
Would it be feasible to run something like this in browser or a mobile device?
First of all, you need a smarter algorithm which can prune the search tree. Also, if you are doing this for the same set of courses over and over again, doing the computation on the server would be better, and precomputing a suitable data structure could reduce query time. For example, you can create a tree where each sub-tree under a node contains nodes that are 'compatible'.
Sounds to me like you're viewing this completely wrong. At most institutions there are 1) curriculum requirements for graduation, and 2) prerequisites for many requirements and electives. This isn't a pure combinatorial problem, it's a dependency tree. For instance, if Course 201, Course 301, and Course 401 are all required for the student's major, higher numbers have the lower numbered ones as prereqs, and the student is a Junior, you should be strongly recommending that Course 201 be taken ASAP.
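A hedged sketch of that dependency view (hypothetical course codes and prerequisites):

# prerequisites as a graph: recommend courses whose prereqs are all satisfied
prereqs = {"Course 201": set(), "Course 301": {"Course 201"}, "Course 401": {"Course 301"}}
taken = {"Course 201"}

eligible = [c for c, reqs in prereqs.items() if c not in taken and reqs <= taken]
print(eligible)   # -> ['Course 301']: take 301 ASAP, it unlocks 401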
Yay, mathematics I think I can handle!
If there are 150 courses and you have to choose 3, then the number of possibilities is (150*149*148)/(3*2*1) = 551,300 (correction per jerry), which is certainly better than 150 factorial, which is a whole lot more zeros ;)
Now, you really don't want to build an array that size, and you don't have to! All web languages have a way of randomly choosing an element from an array, so you keep the courses in an array and request 3 random, unique entries from it.
While the number of potential course combinations is very large, based on your post I see no reason to even attempt to calculate them. Randomly selecting k items from an n-sized list is delightfully trivial, even for old, slow devices!
Is there any particular reason you'd need to calculate all the potential course combinations, instead of just grab-bagging one random selection as a suggestion? If not, problem solved!
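A minimal sketch of that grab-bag (Python; matching_courses stands in for whatever query produced the 150 qualifying courses):

import random

# pick 3 distinct courses uniformly at random; there is no need to
# enumerate all C(150, 3) = 551,300 combinations first
suggestion = random.sample(matching_courses, 3)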
Option 1 (time/space costly): let the user on a mobile phone browse the list of (150*149*148) possible choices, page by page, with the processing done at the server side.
Option 2 (simple): instead of the (150*149*148)-item decision tree, provide a 150-item bag; whenever he chooses an item from the bag, remove it from the bag.
Option 3 (complex): expand your decision tree (possible choices) using a dependency tree (a parent course requires its child courses), the list of courses the student has already taken, and his track/level.
As far as I know, most educational systems use the third option, which requires having a profile for the student.

how to recycle images, but not show anyone the same image twice?

I'm writing a web app similar to wtfimages.com in that one visitor should never (or rarely) see the same thing twice, but different visitors can see the same thing. Ideally, this would span visits, so that when Bob comes back tomorrow he doesn't see today's things again either.
Three first guesses:
have enough unique things that it's unlikely any user will draw enough items to repeat
actually track each user somehow and log what he has seen
have client-side Javascript request things by id according to a pseudorandom sequence seeded with something unique to the visitor and session (e.g., IP and time)
Edit: So the question is, which of these three is the best solution? Is there a better one?
Note: I suspect this question is the web 2.0 equivalent of "how do I implement strcpy?", where everybody worth his salt knows K&R's idiomatic while(*s++ = *t++); solution. If that's the case, please point me to the web 2.0 K&R, because this specific question is immaterial. I just wanted a "join the 21st century" project to learn CGI scripting with Python and AJAX with jQuery.
The simplest implementation I can think of would be to make a circular linked list, and then start individual users at random offsets in the linked list. You are guaranteed that they will see every image there is to see before they will see any image twice.
Technically, it only needs to be a linked list in a conceptual sense. For example, you could just use the database identifiers of the various items and wrap around once you've hit the last one.
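A hedged sketch of that wrap-around (Python; image ids assumed dense, 0 to total_images - 1, for simplicity):

def nth_image_id(start_offset, images_seen, total_images):
    # per visitor, store only a random start_offset and a seen counter;
    # an id repeats only after a full cycle of total_images
    return (start_offset + images_seen) % total_images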
There are complexity problems with other solutions. For example, if you want the order to be different for each person, that requires permuting the elements in some way. But then you have to store that permutation to guarantee that people see things in different orders, which takes up a lot of space. It also requires you to update everybody's permutation whenever you add an image to or remove one from the list of things to see, which is yet more work.
A compromise solution that still allows you to guarantee a person sees every image before they see any image twice while still varying things among people might be something like this:
Using some hash function H (say, MD5), take the hash of each image, and store the image with a filename equal to the digest (e.g. 194db8c5[...].jpg).
Decide on a number N. This will be the number of different paths that a randomly selected person could take to traverse all the images. For example, if you pick N = 10, each person will take one of 10 possible distinct journeys through the images. Don't pick an N larger than the length of the digest-based filenames (an MD5 hex digest is 32 characters; SHA-1 is 40).
Make N different permutations of the image list, with the ith such permutation being generated by rotating the characters in each file name i characters to the left, and then sorting all the entries. (For example, a file originally named abcdef with i == 4 will become efabcd. Now sort all the files that have been transformed in this way, and you have a distinct list.)
Randomly assign to each user a number r from 0 .. N - 1 inclusive. They now see the images in the ordering specified by r.
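A hedged sketch of steps 3 and 4 (Python; digest_names stands in for the hash-derived filenames from step 1):

import random

def ordering(digest_names, r):
    # permutation r: sort the filenames by their left-rotation by r characters
    return sorted(digest_names, key=lambda name: name[r:] + name[:r])

N = 10                                  # number of distinct journeys
r = random.randrange(N)                 # assigned to the user once
journey = ordering(digest_names, r)     # the order this user sees images in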
Ultimately, this seems like a lot of work. I'd say just suck it up and make it random, accept that people will occasionally see the same image again, and move on.
Personally, I would just store a cookie on the user's machine which holds all the IDs of what he's seen. That way you can keep the 'randomness' and don't have to show the items in sequential order, as John Feminella's otherwise great solution suggests.
Applying the cookie data in an SQL query is also trivial: say you have comma-separated IDs in the cookie, you can just do this (in PHP):
"SELECT image FROM images WHERE id NOT IN(".$_COOKIE['myData'].") ORDER BY RAND() LIMIT 1"
Note that this is just a simple example; you should of course escape the cookie data properly, and there might be more efficient ways to select a random entry from a table.
Using a cookie also makes it possible to start off where the user left off the previous time. And cookie size probably won't be an issue: you can hold a lot of IDs in 4KB, which is (usually) the maximum size of a cookie.
EDIT
If your cookie data looks like this:
$_COOKIE['myData'] == '1,6,19,200,70,16';
You can safely use that data in a SQL query with:
$ids = array_map('mysql_real_escape_string', explode(',', $_COOKIE['myData']));
$query = "SELECT image FROM images WHERE id NOT IN('".implode("', '", $ids)."') ORDER BY RAND() LIMIT 1"
What this does is split the ID string into individual IDs, run mysql_real_escape_string on each of them, and then implode them with quotes so that the query becomes:
$query == "SELECT image FROM images WHERE id NOT IN('1', '6', '19', '200', '70', '16') ORDER BY RAND() LIMIT 1"
So $_COOKIE[] variables are just like any other external input, and you must take the same precautions with them as with any other data.
You have two classes of solutions:
stateless
stateful
You need to pick one: (#1) of course offers no guarantee (i.e., the probability of showing the same image to a user is variable), whilst (#2) allows you to give guarantees (depending on the implementation, of course).
Here is another suggestion you might want to consider:
maintain state on the client side through HTML5 localStorage (when available); the value of this option will only increase as browser support for HTML5 grows.

Algorithm for most recently/often contacts for auto-complete?

We have an auto-complete list that's populated when you send an email to someone, which is all well and good until the list gets really big and you need to type more and more of an address to get to the one you want, which defeats the purpose of auto-complete.
I was thinking that some logic should be added so that the auto-complete results are sorted by some function of most recently contacted or most often contacted, rather than just alphabetical order.
What I want to know is whether there are any known good algorithms for this kind of search, or if anyone has any suggestions.
I was thinking of just a point system, with something like: same day is 5 points, last three days is 4 points, last week is 3 points, last month is 2 points and last 6 months is 1 point. Then for most often: 25+ is 5 points, 15+ is 4, 10+ is 3, 5+ is 2, 2+ is 1. No real logic other than that those numbers "feel" about right.
Other than just arbitrarily picked numbers, does anyone have any input? Other numbers are also welcome if you can give a reason why you think they're better than mine.
Edit: This would be primarily in a business environment where recentness (yay for making up words) is often just as important as frequency. Also, past a certain point there really isn't much difference between, say, someone you talked to 80 times versus 30 times.
Take a look at Self organizing lists.
A quick and dirty look:
Move to Front Heuristic:
A linked list such that whenever a node is selected, it is moved to the front of the list.
Frequency Heuristic:
A linked list such that whenever a node is selected, its frequency count is incremented and the node is bubbled towards the front of the list, so that the most frequently accessed nodes are at the head.
It looks like the move to front implementation would best suit your needs.
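A minimal sketch of move-to-front (Python list for clarity; a real implementation would use a doubly linked list so the move is O(1)):

def on_select(addresses, addr):
    addresses.remove(addr)       # O(n) on a plain list; O(1) with a linked list
    addresses.insert(0, addr)    # the selected address moves to the head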
EDIT: When an address is selected, add one to its frequency and move it to the front of the group of nodes with the same weight (or the same (weight div x), for coarser groupings). I see aging as a real problem with your proposed implementation, in that it requires calculating a weight on each and every item. A self-organizing list is a good way to go, but the algorithm needs a bit of tweaking to do what you want.
Further Edit:
Aging refers to the fact that weights decrease over time, which means you need to know each and every time an address was used. That means you have to have the entire email history available when you construct your list.
The issue is that we want to perform calculations (other than search) on a node only when it is actually accessed -- this is what gives us the statistically good performance.
This kind of thing seems similar to what Firefox does when suggesting the site you are typing.
Unfortunately I don't know exactly how Firefox does it. A point system seems good as well; maybe you'll just need to balance your points :)
I'd go for something similar to:
NoM = Number of Mail
(NoM sent to X today) + 1/2 * (NoM sent to X during the last week)/7 + 1/3 * (NoM sent to X during the last month)/30
Contacts you have not written to during the last month (the window could be changed) will have 0 points. You could sort those by total NoM sent (since they are on the contact list :). They will be shown after the contacts with points > 0.
It's just an idea; the point is to weight contacts by both how recently and how much you have mailed them.
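That score as a minimal sketch (Python; the sent_* counts are whatever your mail store reports):

def score(sent_today, sent_last_week, sent_last_month):
    # today's mails count in full; weekly and monthly rates count 1/2 and 1/3
    return sent_today + (sent_last_week / 7) / 2 + (sent_last_month / 30) / 3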
If you want to get crazy, mark the most 'active' emails in one of several ways:
Last access
Frequency of use
Contacts with pending sales
Direct bosses
Etc
Then, present the active emails at the top of the list. Pay attention to which "group" your user uses most. Switch to that sorting strategy exclusively after enough data is collected.
It's a lot of work but kind of fun...
Maybe count the number of emails sent to each address. Then:
ORDER BY EmailCount DESC, LastName, FirstName
That way, your most-often-used addresses come first, even if they haven't been used in a few days.
I like the idea of a point-based system, with points for recent use, frequency of use, and potentially other factors (prefer contacts in the local domain?).
I've worked on a few systems like this, and neither "most recently used" nor "most commonly used" works very well. "Most recent" can be a real pain if you accidentally mistype something once. "Most used", on the other hand, doesn't evolve much over time: if you had a lot of contact with somebody last year but your job has since changed, for example, they stay at the top anyway.
Once you have the set of measurements you want to use, you could create an interactive application to test out different weights and see which ones give you the best results for some sample data.
This paper describes a single-parameter family of cache eviction policies that includes least recently used and least frequently used policies as special cases.
The parameter, lambda, ranges from 0 to 1. When lambda is 0 the cache performs exactly like LFU; when lambda is 1 it performs exactly like LRU. In between, it combines recency and frequency information in a natural way.
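A hedged sketch of a score with that shape (assuming the paper's usual exponential-decay formulation; use_ages are the times elapsed since each past use):

def combined_score(use_ages, lam):
    # lam = 0: every past use contributes 1, i.e. a pure frequency count (LFU)
    # lam = 1: recent uses dominate and old ones decay fast (LRU-like)
    return sum(0.5 ** (lam * age) for age in use_ages)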
In spite of an answer having been chosen, I want to submit my approach for consideration, and feedback.
I would account for frequency by incrementing a counter on each use, but by some larger-than-one value, like 10 (to add precision to the second point).
I would account for recency by multiplying all counters at regular intervals (say, every 24 hours) by some diminisher (say, 0.9).
Each use:
UPDATE `addresslist` SET `favor` = `favor` + 10 WHERE `address` = 'foo@bar.com'
Each interval:
UPDATE `addresslist` SET `favor` = FLOOR(`favor` * 0.9)
In this way I collapse both frequency and recency into one field, avoid the need to keep a detailed history for deriving {last day, last week, last month}, and keep the math (mostly) integer.
The increment and diminisher would have to be adjusted to preference, of course.
