Frequently and most recently used items ranking algorithm
I'm searching for a ranking algorithm that will sort items (files, applications, visited websites...) by how often they are used and also by how recently they were used.
For example, in an application launcher the user types a short prefix of an application name, and the apps that match are ranked. App A was the user's favorite and was launched very often, but lately he mostly uses app B and only occasionally launches app A. App A has been launched far more times than app B overall, but B has been used more than A recently.
So app B is ranked before app A.
Furthermore, if app C wants to overtake app B, it has to be used more often (in recent time), but for app A to reach first place it doesn't need as many uses, because it is the user's favorite and has accumulated many more uses in the past than the other apps.
I don't know if this is a good explanation of what I want, but I hope someone will understand.
I think you can achieve this using an implementation of a cache replacement policy.
These algorithms help computer processors (CPUs) determine which sections ("pages") of main memory (RAM) to keep in a cache (L1, L2 etc) -- which the CPU can access much faster than RAM. But they can be adapted to your problem very easily.
An algorithm which sorts items by most recent usage would be similar to the cache policy LRU which expires/replaces the Least-Recently-Used page when the cache is full.
An algorithm which sorts items by most frequent usage (here "frequent" really means "number of uses") would be similar to the cache policy LFU which expires/replaces the Least-Frequently-Used page when the cache is full.
There are several policies that explicitly or implicitly combine both concepts in the manner you are requesting. Some also involve "time" (either in terms of actual computer clock time or simply an incrementing counter of page requests etc) to obtain a better sense of the "age" or "frequency" of page access (as opposed to simply number of uses).
These become slightly more complicated and harder to implement, especially if you need a very efficient algorithm. But for most user-interface uses, even an inefficient algorithm should be plenty fast, because the number of items is small and the user modifies the list very infrequently.
One example of an algorithm that might work for you is the LRFU (Least Recently/Frequently Used) policy, which directly seeks to combine LRU and LFU and expires pages based on a formula that takes both recency and frequency of use into account. You can find a reference to it alongside the other cache replacement policies mentioned above, and you can also read the scholarly article where it is reported.
The article implements it using a combination of a heap and a linked-list data structure, and it contains some pseudo-code for the implementation.
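If I remember the paper correctly (so treat the notation below as my paraphrase rather than a quotation), LRFU gives each block a Combined Recency and Frequency value that sums a decaying contribution from every past reference:
CRF_t(b) = sum over all past reference times t_i of F(t - t_i), where F(x) = (1/2)^(lambda * x)
Here lambda is a parameter between 0 and 1: lambda = 0 makes F constant, so CRF is just a reference count (pure LFU), while lambda = 1 makes the most recent reference outweigh the sum of all older ones (pure LRU). The USAGE_WEIGHT constant in the simplified scheme below plays the same role -- roughly USAGE_WEIGHT = (1/2)^lambda, so 1.0 corresponds to LFU and 0.5 to LRU.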
For your uses, you could probably write a much simpler, but less efficient algorithm fairly easily.
For example you could simply store an array of objects with 2 properties:
Value (the thing you care about -- e.g. website or file name etc)
Score (discussed below -- called "CRF" in the article)
Whenever a Value is selected by the user, you modify the list as follows:
1. Update the Scores of all existing Items in the list by multiplying each Score by a constant weighting factor, USAGE_WEIGHT (a number between 0.5 and 1.0, discussed more below).
2. Search the list to see if the recently selected item already exists. If it does, simply add 1.0 to its existing Score. If it doesn't, create a new Item with an initial Score of 1.0 and add it to the list. If there is no more room in the list (meaning it already contains the maximum number of MRU items you want to display), first remove the item with the lowest score, which will always be at the end of the list due to the next step.
3. Re-sort the list by Score in descending order. (Highest scores should appear first in the MRU list.)
The USAGE_WEIGHT factor determines how important "recency" is compared to "frequency" (i.e. usage count). A value of 0.5 gives the list fully-LRU behavior (only recency matters), while a value of 1.0 gives it fully-LFU behavior (only usage count matters). An intermediate value, for example 0.9, produces a mixture of LRU and LFU behavior, as shown by the example output below.
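To get a feel for how fast old usage fades with different weights, note that an item that goes unselected for k steps keeps a fraction USAGE_WEIGHT^k of its Score (this is just arithmetic on the rule above, not something from the article). With USAGE_WEIGHT = 0.9, ten idle steps leave about 0.9^10 ≈ 0.35 of the Score, so a heavily used item can survive a fairly long quiet spell; with USAGE_WEIGHT = 0.5, the same ten steps leave only 0.5^10 ≈ 0.001, so past usage is effectively forgotten.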
In each scenario below, the "Values" are letters, and they are added in this order:
A B C B A A D A C D A B D E C B A
After each addition, I list the letter that was added along with the current MRU list in quotes (e.g. "DBA"). The maximum MRU size is 3. I also list a more detailed representation of the list, showing the Value (Letter) and Score of each item in the form { Letter, Score }.
USAGE_WEIGHT = 1.0 (Fully LFU -- only usage-count matters)
1. (Added A) "A" [ { A, 1.0 } ]
2. (Added B) "AB" [ { A, 1.0 } { B, 1.0 } ]
3. (Added C) "ABC" [ { A, 1.0 } { B, 1.0 } { C, 1.0 } ]
4. (Added B) "BAC" [ { B, 2.0 } { A, 1.0 } { C, 1.0 } ]
5. (Added A) "BAC" [ { B, 2.0 } { A, 2.0 } { C, 1.0 } ]
6. (Added A) "ABC" [ { A, 3.0 } { B, 2.0 } { C, 1.0 } ]
7. (Added D) "ABD" [ { A, 3.0 } { B, 2.0 } { D, 1.0 } ]
8. (Added A) "ABD" [ { A, 4.0 } { B, 2.0 } { D, 1.0 } ]
9. (Added C) "ABC" [ { A, 4.0 } { B, 2.0 } { C, 1.0 } ]
10. (Added D) "ABD" [ { A, 4.0 } { B, 2.0 } { D, 1.0 } ]
11. (Added A) "ABD" [ { A, 5.0 } { B, 2.0 } { D, 1.0 } ]
12. (Added B) "ABD" [ { A, 5.0 } { B, 3.0 } { D, 1.0 } ]
13. (Added D) "ABD" [ { A, 5.0 } { B, 3.0 } { D, 2.0 } ]
14. (Added E) "ABE" [ { A, 5.0 } { B, 3.0 } { E, 1.0 } ]
15. (Added C) "ABC" [ { A, 5.0 } { B, 3.0 } { C, 1.0 } ]
16. (Added B) "ABC" [ { A, 5.0 } { B, 4.0 } { C, 1.0 } ]
17. (Added A) "ABC" [ { A, 6.0 } { B, 4.0 } { C, 1.0 } ]
USAGE_WEIGHT = 0.5 (Fully LRU -- only recency matters)
1. (Added A) "A" [ { A, 1.0 } ]
2. (Added B) "BA" [ { B, 1.0 } { A, 0.5 } ]
3. (Added C) "CBA" [ { C, 1.0 } { B, 0.5 } { A, 0.25 } ]
4. (Added B) "BCA" [ { B, 1.25 } { C, 0.5 } { A, 0.125 } ]
5. (Added A) "ABC" [ { A, 1.0625 } { B, 0.625 } { C, 0.25 } ]
6. (Added A) "ABC" [ { A, 1.5313 } { B, 0.3125 } { C, 0.125 } ]
7. (Added D) "DAB" [ { D, 1.0 } { A, 0.7656 } { B, 0.1563 } ]
8. (Added A) "ADB" [ { A, 1.3828 } { D, 0.5 } { B, 0.0781 } ]
9. (Added C) "CAD" [ { C, 1.0 } { A, 0.6914 } { D, 0.25 } ]
10. (Added D) "DCA" [ { D, 1.125 } { C, 0.5 } { A, 0.3457 } ]
11. (Added A) "ADC" [ { A, 1.1729 } { D, 0.5625 } { C, 0.25 } ]
12. (Added B) "BAD" [ { B, 1.0 } { A, 0.5864 } { D, 0.2813 } ]
13. (Added D) "DBA" [ { D, 1.1406 } { B, 0.5 } { A, 0.2932 } ]
14. (Added E) "EDB" [ { E, 1.0 } { D, 0.5703 } { B, 0.25 } ]
15. (Added C) "CED" [ { C, 1.0 } { E, 0.5 } { D, 0.2852 } ]
16. (Added B) "BCE" [ { B, 1.0 } { C, 0.5 } { E, 0.25 } ]
17. (Added A) "ABC" [ { A, 1.0 } { B, 0.5 } { C, 0.25 } ]
USAGE_WEIGHT = 0.9 (LRFU -- Mixture of LRU and LFU)
1. (Added A) "A" [ { A, 1.0 } ]
2. (Added B) "BA" [ { B, 1.0 } { A, 0.9 } ]
3. (Added C) "CBA" [ { C, 1.0 } { B, 0.9 } { A, 0.81 } ]
4. (Added B) "BCA" [ { B, 1.81 } { C, 0.9 } { A, 0.729 } ]
5. (Added A) "ABC" [ { A, 1.6561 } { B, 1.629 } { C, 0.81 } ]
6. (Added A) "ABC" [ { A, 2.4905 } { B, 1.4661 } { C, 0.729 } ]
7. (Added D) "ABD" [ { A, 2.2414 } { B, 1.3195 } { D, 1.0 } ]
8. (Added A) "ABD" [ { A, 3.0173 } { B, 1.1875 } { D, 0.9 } ]
9. (Added C) "ABC" [ { A, 2.7156 } { B, 1.0688 } { C, 1.0 } ]
10. (Added D) "ADB" [ { A, 2.444 } { D, 1.0 } { B, 0.9619 } ]
11. (Added A) "ADB" [ { A, 3.1996 } { D, 0.9 } { B, 0.8657 } ]
12. (Added B) "ABD" [ { A, 2.8796 } { B, 1.7791 } { D, 0.81 } ]
13. (Added D) "ADB" [ { A, 2.5917 } { D, 1.729 } { B, 1.6012 } ]
14. (Added E) "ADE" [ { A, 2.3325 } { D, 1.5561 } { E, 1.0 } ]
15. (Added C) "ADC" [ { A, 2.0993 } { D, 1.4005 } { C, 1.0 } ]
16. (Added B) "ADB" [ { A, 1.8893 } { D, 1.2604 } { B, 1.0 } ]
17. (Added A) "ADB" [ { A, 2.7004 } { D, 1.1344 } { B, 0.9 } ]
In the first example (USAGE_WEIGHT = 1.0), the Scores of existing items do NOT change when new items are added (because we are multiplying by 1.0 on each step). Each usage simply increments the Score by 1.0, so the Score directly represents the usage count. Notice that items are always listed in order of decreasing usage count.
In the second example (USAGE_WEIGHT = 0.5), the Scores of existing items are halved each time an item is added (because we are multiplying by 0.5 on each step). This gives the list the property that every Score is lower than that of the most recently added item (which receives a Score of 1.0). Moreover, an item added at step N always has a greater Score than one whose last addition came before step N, regardless of how many times the older item was re-added: even if it had been selected at every previous step, its Score would be at most 0.5 + 0.25 + 0.125 + ..., which never reaches 1.0. This is exactly the property that produces the LRU policy.
Finally, with USAGE_WEIGHT = 0.9 we can see how the two policies mix. This third example starts out looking like LRU (i.e. "recency" is important), but as the usage counts of certain items grow, they begin to shift the behavior. This can be seen by step 7, where LRU lists "DAB" but example 3 shows "ABD" due to the higher usage of A and B. Example 3 then looks more like LFU for a few steps, but an interesting thing happens at step 10, and this is where LRFU starts to shine. By this point A has been added 4 times, while "B", "C", and "D" have each been added twice. LRU shows "DCA" because "D" and "C" were added more recently, but it ignores the fact that the user is twice as likely to choose "A" over "D" or "C". LFU shows "ABD", which is okay except that the user selected "D" twice after last choosing "B", which suggests that usage of "D" is "heating up" and is therefore more likely than "B". LRFU gets it right by showing "ADB".
Of course this is all somewhat subjective, and other readers might disagree that this is the better choice. After all, we're trying to predict the user's future choices based on previous history, so there is no perfect solution. But with LRFU you can at least "tune" the USAGE_WEIGHT parameter to find the right balance of LRU vs LFU for a given situation.
In fact, for some applications it might even be preferable to dynamically change the USAGE_WEIGHT as the program progresses to improve prediction based on historical data. This probably wouldn't make sense for user-interface MRU lists, but more for predicting high-volume or high-frequency events.
FYI, in the LRFU algorithm discussed in the paper, the Score is called "CRF". The algorithm they describe also stores the "time" (or step number) at which each Score was last calculated. By storing both the Score and the time, it is possible to update only the Score of the item being added plus a small subset of items -- not the entire list. In addition, the sort order is maintained by the heap and linked-list combination, so the algorithm is much more efficient than what I've described here, which uses a simple array and recalculates and re-sorts the whole list after each addition. But this implementation is straightforward, easy to understand, and will work fine for user-interface MRU lists.
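If you ever want most of that efficiency without implementing the full heap/linked-list machinery, here is a rough sketch of the lazy-aging idea in Java (my own adaptation of the trick, not code from the article; the class and field names are just for illustration). The key observation is that k steps without any update simply multiply an item's Score by USAGE_WEIGHT^k, so you can store the step at which each Score was last refreshed and only bring a Score up to date when you actually need to read or compare it.
// Sketch only: a lazily-aged item, equivalent in results to aging every item on every step.
class LazyMRUItem<T> {
    final T value;
    private double score;     // Score as of lastUpdated
    private long lastUpdated; // step counter value when the score was last refreshed

    LazyMRUItem(T value, long now) {
        this.value = value;
        this.score = 1.0;     // a newly selected item starts at 1.0, as above
        this.lastUpdated = now;
    }

    // Decay the stored score for the steps that have passed since the last refresh.
    double currentScore(long now, double usageWeight) {
        if (now > lastUpdated) {
            score *= Math.pow(usageWeight, now - lastUpdated);
            lastUpdated = now;
        }
        return score;
    }

    // Call this when the user selects the item at step "now".
    void touch(long now, double usageWeight) {
        score = currentScore(now, usageWeight) + 1.0;
    }
}
The surrounding list would keep a single step counter, increment it once per selection, and pass it in. The scores you read this way are identical to the ones produced by aging every item on every step; you just pay the cost only for the items you actually touch.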
Here is a very basic implementation of a naive LRFU list in Java. There is a lot that can be done to improve it in terms of performance, but it sufficiently demonstrates the results of LRFU:
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class MRUDemo {

    public static void main(String[] args) {
        double[] weights = { 1.0, 0.5, 0.9 };
        for (double weight : weights) {
            System.out.println("USAGE_WEIGHT = " + weight);
            testMRU(weight);
            System.out.println();
        }
    }

    private static void testMRU(double weight) {
        PrintStream p = System.out;
        MRUList<String> list = new MRUList<>(3, weight);
        String[] lettersAdded = "A B C B A A D A C D A B D E C B A".split(" ");
        for (int i = 0; i < lettersAdded.length; i++) {
            String value = lettersAdded[i];
            list.add(value);
            // Steps are numbered from 1 to match the example output above.
            p.printf("%3s. (Added %s) \"", i + 1, value);
            for (MRUItem<String> item : list.list)
                p.print(item.Value);
            p.print("\"\t[ ");
            for (MRUItem<String> item : list.list) {
                // %.5s truncates the score's string form to at most 5 characters.
                p.printf("{ %s, %.5s } ", item.Value, item.Score);
            }
            p.println("]");
        }
    }

    private static class MRUList<T> {
        public static final double SCORE_INIT = 1.0;

        private final double usageWeight; // factor that balances LRU vs LFU
        private final int maxSize;        // maximum number of MRU items
        public ArrayList<MRUItem<T>> list;

        public MRUList(int maxItemCount) { this(maxItemCount, 0.9); }

        public MRUList(int maxItemCount, double usageWeightFactor) {
            maxSize = maxItemCount;
            usageWeight = usageWeightFactor;
            list = new ArrayList<>(maxSize);
        }

        // Add an item each time the user chooses it.
        public void add(T value) {
            // Age all existing items (this does not affect their relative order).
            for (MRUItem<T> item : list)
                item.Score *= usageWeight;

            // Search for the item in the list.
            MRUItem<T> existing = find(value);
            if (existing == null) {
                existing = new MRUItem<>(value, SCORE_INIT);
                if (list.size() < maxSize) {
                    // We have room -- add the item.
                    list.add(existing);
                } else {
                    // No more room -- replace the last item, which has the lowest
                    // score because the list is kept sorted in descending order.
                    list.set(list.size() - 1, existing);
                }
            } else {
                // Increment the score of the item if it already existed in the list.
                existing.Score += SCORE_INIT;
            }

            // Sort the items for display.
            // Collections.sort uses the Comparable interface of MRUItem.
            Collections.sort(list);
        }

        // Get a copy of the list of values, in the correct display order.
        public List<T> getItems() {
            ArrayList<T> copy = new ArrayList<>();
            for (MRUItem<T> item : list)
                copy.add(item.Value);
            return copy;
        }

        // Return the item whose Value is already present in the list, or null.
        private MRUItem<T> find(T value) {
            for (MRUItem<T> item : list)
                if (Objects.equals(item.Value, value))
                    return item;
            return null;
        }
    }

    private static class MRUItem<T> implements Comparable<MRUItem<T>> {
        public T Value;
        public double Score;

        public MRUItem(final T value, final double score) {
            Score = score;
            Value = value;
        }

        // Sorts by Score in descending order (due to the - sign).
        @Override
        public int compareTo(final MRUItem<T> other) {
            return -Double.compare(Score, other.Score);
        }
    }
}
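And to connect this back to your launcher scenario, here is a hypothetical usage sketch (the app names, the list size, and the prefix variable are made up for illustration, and it assumes the MRUList class above is visible to the calling code): keep one MRUList of application names, call add() whenever the user launches an app, and filter getItems() by the typed prefix when building the suggestion list.
MRUList<String> appHistory = new MRUList<>(50, 0.9); // remember up to 50 apps, USAGE_WEIGHT = 0.9

// Whenever the user launches an app:
appHistory.add("firefox");

// When the user has typed a prefix, rank the matching apps:
String prefix = "fi";
List<String> suggestions = new ArrayList<>();
for (String app : appHistory.getItems())  // getItems() already returns best-first order
    if (app.startsWith(prefix))
        suggestions.add(app);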