Jira's Lexorank algorithm for new stories - algorithm

I am looking to create a large list of items that allows for easy insertion of new items and for easily changing the position of items within that list. When updating the position of an item, I want to change as few fields as possible regarding the order of items.
After some research, I found that Jira's Lexorank algorithm fulfills all of these needs. Each story in Jira has a 'rank-field' containing a string which is built up of 3 parts: <bucket>|<rank>:<sub-rank>. (I don't know whether these parts have actual names, this is what I will call them for ease of reference)
Examples of valid rank-fields:
0|vmis7l:hl4
0|i000w8:
0|003fhy:zzzzzzzzzzzw68bj
When dragging a card above 0|vmis7l:hl4, the new card will receive rank 0|vmis7l:hl2, which means that only the rank-field for this new card needs to be updated while the entire list can always be sorted on this rank-field. This is rather clever, and I can't imagine that Lexorank is the only algorithm to use this.
Is there a name for this method of sorting used in the sub-rank?
My question is related to the creation of new cards in Jira. Each new card starts with an empty sub-rank, and the rank is always chosen such that the new card is located at the bottom of the list. I've created a bunch of new stories just to see how the rank would change, and it seems that the rank is always incremented by 8 (in base-36).
Does anyone know more specifically how the rank for new cards is generated? Why is it incremented by 8?
I can only imagine that after some time (270 million cards) there are no more ranks to generate, and the system needs to recalculate the rank-field of all cards to make room for additional ranks.
Are there other triggers that require recalculation of all rank-fields?
I suppose the bucket plays a role in this recalculation. I would like to know how?

We are talking about a special kind of indexing here. This is not sorting; it is just preparing items to end up in a certain order in case someone happens to sort them (by whatever sorting algorithm). I know that variants of this kind of indexing have been used in libraries for decades, maybe centuries, to ensure that books belonging together but lacking a common title end up next to each other in the shelves, but I have never heard of a name for it.
The 8 is probably chosen wisely as a compromise, maybe even by analyzing typical use cases. Consider this: If you choose a small increment, e. g. 1, then all tickets will have ranks like [a, b, c, …]. This will be great if you create a lot of tickets (up to 26) in the correct order because then your rank fields keep small (one letter). But as soon as you move a ticket between two other tickets, you will have to add a letter: [a, b] plus a new ticket between them: [a, an, b]. If you expect to have this a lot, you better leave gaps between the ranks: [a, i, q, …], then an additional ticket can get a single letter as well: [a, e, i, q, …]. But of course if you now create lots of tickets in the correct order right in the beginning, you quickly run out of letters: [a, i, q, y, z, za, zi, zq, …]. The 8 probably is a good value which allows for enough gaps between the tickets without increasing the need for many letters too soon. Keep in mind that other scenarios (maybe not Jira tickets which are created manually) might make other values more reasonable.
You are right, the rank fields get recalculated now and then, Lexorank calls this "balancing". Basically, balancing takes place in one of three occasions: ① The ranks are exhausted (largest value reached), ② the ranks are due to user-reranking of tickets too close together ([a, b, i] and something is supposed to go in between a and b), and ③ a balancing is triggered manually in the management page. (Actually, according to the presentation, Lexorank allows for up to three letter ranks, so "too close together" can be something like aaa and aab but the idea is the same.)
The <bucket> part of the rank is increased during balancing, so a messy [0|a, 0|an, 0|b] can become a nice and clean [1|a, 1|i, 1|q] again. The brownbag presentation about Lexorank (as linked by #dandoen in the comments) mentions a round-robin use of <buckets>, so instead of a constant increment (0→1→2→3→…) a 2 is increased modulo 3, so it will turn back to 0 after the 2 (0→1→2→0→…). When comparing the ranks, the sorting algorithm can consider a 0 "greater" than a 2 (it will not be purely lexicographical then, admitted). If now the balancing algorithm works backwards (reorder the last ticket first), this will keep the sorting order intact all the time. (This is just a side aspect, that's why I keep the explanation small, but if this is interesting, ask, and I will elaborate on this.)
Sidenote: Lexorank also keeps track of minimum and maximum values of the ranks. For the functioning of the algorithm itself, this is not necessary.

Related

Using Softmax Explorer (--cb_explore_adf) for Ranking in VowpalWabbit

I am trying to use VW to perform ranking using the contextual bandit framework, specifically using --cb_explore_adf --softmax --lambda X. The choice of softmax is because, according to VW's docs: "This is a different explorer, which uses the policy not only to predict an action but also predict a score indicating the quality of each action." This quality-related score is what I would like to use for ranking.
The scenario is this: I have a list of items [A, B, C, D], and I would like to sort it in an order that maximizes a pre-defined metric (e.g., CTR). One of the problems, as I see, is that we cannot evaluate the items individually because we can't know for sure which item made the user click or not.
To test some approaches, I've created a dummy dataset. As a way to try and solve the above problem, I am using the entire ordered list as a way to evaluate if a click happens or not (e.g., given the context for user X, he will click if the items are [C, A, B, D]). Then, I reward the items individually according to their position on the list, i.e., reward = 1/P for 0 < P < len(list). Here, the reward for C, A, B, D is 1, 0.5, and 0.25, 0.125, respectively. If there's no click, the reward is zero for all items. The reasoning behind this is that more important items will stabilize on top and less important on the bottom.
Also, one of the difficulties I found was defining a sampling function for this approach. Typically, we're interested in selecting only one option, but here I have to sample multiple times (4 in the example). Because of that, it's not very clear how I should incorporate exploration when sampling items. I have a few ideas:
Copy the probability mass function and assign it to copy_pmf. Draw a random number between 0 and max(copy_pmf) and for each probability value in copy_pmf, increment the sum_prob variable (very similar to the tutorial here:https://vowpalwabbit.org/tutorials/cb_simulation.html). When sum_prob > draw, we add the current item/prob to a list. Then, we remove this probability from copy_pmf, set sum_prob = 0, and draw a new number again between 0 and max(copy_pmf) (which might change or not).
Another option is drawing a random number and, if the maximum probability, i.e., max(pmf) is greater than this number, we exploit. If it isn't, we shuffle the list and return this (explore). This approach requires tuning the lambda parameter, which controls the output pmf (I have seen cases where the max prob is > 0.99, which would mean around a 1% chance of exploring. I have also seen instances where max prob is ~0.5, which is around 50% exploration.
I would like to know if there are any suggestions regarding this problem, specifically sampling and the reward function. Also, if there are any things I might be missing here.
Thank you!
That sounds like something that can be solved by conditional contextual bandits
For demo scenario that you are mentioning each example should have 4 slots.
You can use any exploration algorithm in this case and it is going to be done independently per each slot. Learning objective is average loss over all slots, but decisions are made sequentially from the first slot to the last, so you'll effectively learn the ranking even in case of binary reward here.

Efficiently and dynamically rank many users in memory?

I run a Java game server where I need to efficiently rank players in various ways. For example, by score, money, games won, and other achievements. This is so I can recognize the top 25 players in a given category to apply medals to those players, and dynamically update them as the rankings change. Performance is a high priority.
Note that this cannot easily be done in the database only, as the ranks will come from different sources of data and different database tables, so my hope is to handle this all in memory, and call methods on the ranked list when a value needs to be updated. Also, potentially many users can tie for the same rank.
For example, let's say I have a million players in the database. A given player might earn some extra points and instantly move from 21,305th place to 23rd place, and then later drop back off the top 25 list. I need a way to handle this efficiently. I imagine that some kind of doubly-linked list would be used, but am unsure of how to handle quickly jumping many spots in the list without traversing it one at a time to find the correct new ranking. The fact that players can tie complicates things a little bit, as each element in the ranked list can have multiple users.
How would you handle this in Java?
I don't know whether there is library that may help you, but I think you can maintain a minimum heap in the memory. When a player's point updates, you can compare this to the root of the heap, if less than,do nothing.else adjust the heap.
That means, you can maintain a minimum heap that has 25 nodes which are the highest 25 of all the players in one category.
Forget linked list. It allows fast insertions, but no efficient searching, so it's of no use.
Use the following data
double threshold
ArrayList<Player> top;
ArrayList<Player> others; (3)
and manage the following properties
each player in top has a score greater or equal to threshold
each player in others has a score lower than threshold
top is sorted
top.size() >= 25
top.size() < 25 + N where N is some arbitrary limit (e.g., 50)
Whenever some player raises it score, do the following:
if they're in top, sort top (*)
if they're in others, check if their score promotes them to top
if so, remove them from others, insert in top, and sort top
if top grew too big, move the n/2 worst players from top to others and update threshold
Whenever some player lowers it score, do the following:
- if they're in others, do nothing
- if they're in top, check if their new score allows them to stay in top
- if so, sort top (1)
- otherwise, demote them to bottom, and check if top got too small
- if so, determine an appropriate new threshold and move all corresponding players to top. (2)
(1) Sorting top is cheap as it's small. Moreover, TimSort (i.e., the algorithm behind Arrays.sort(Object[])) works very well on partially sorted sequences. Instead of sorting, you can simply remember that top is unsorted and sort it later when needed.
(2) Determining a proper threshold can be expensive and so can be moving the players. That's why only N/2 player get moved away from it when it grows too big. This leaves some spare players and makes this case pretty improbable assuming that players rarely lose score.
EDIT
For managing the objects, you also need to be able to find them in the lists. Either add a corresponding field to Player or use a TObjectIntHashMap.
EDIT 2
(3) When removing an element from the middle of others, simply replace the element by the last one and shorten the list by one. You can do it as the order doesn't matter and you must do it because of speed. (4)
(4) The whole others list needn't be actually stored anywhere. All you need is a possibility to iterate all the players not contained in top. This can be done by using an additional Set or by simply iterating though all the players and skipping those scoring above threshold.
FINAL RECOMMENDATIONS
Forget the others list (unless I'm overlooking something, you won't need it).
I guess you will need no TObjectIntHashMap either.
Use a list top and a boolean isTopSorted, which gets cleared whenever a top score changes or a player gets promoted to top (simple condition: oldScore >= threshold | newScore >= threshold).
For handling ties, make top contain at least 25 differently scored players. You can check this condition easily when printing the top players.
I assume you may use plenty of memory to do that or memory is not a concern for you. Now as you want only the top 25 entries for any category, I would suggest the following:
Have a HashSet of Player objects. Player objects have the info like name, games won, money etc.
Now have a HashMap of category name vs TreeSet of top 25 player objects in that category. The category name may be a checksum of some columns say gamewon, money, achievent etc.
HashMap<String /*for category name */, TreeSet /*Sort based on the criteria */> ......
Whenever you update a player object, you update the common HashSet first and then check if the player object is a candidate for top 25 entries in any of the categories. If it is a candidate, some player object unfortunately may lose their ranking and hence may get kicked out of the corresponding treeset.
>> if you make the TreeSet sorted by the score, it'll break whenever the score changes (and the player will not be found in it
Correct. Now I got the point :) . So, I will do the following to mitigate the problem. The player object will have a field that indicate whether it is already in some categories, basically a set of categories it is already in. While updating a player object, we have to check if the player is already in some categories, if it is in some categories already, we will rearrange the corresponding treeset first; i.e. remove the player object and readjust the score and add it back to the treeset. Whenever a player object is kicked out of a category, we remove the category from the field which is holding a set of categories the player is in
Now, what you do if the look-up is done with a brand new search criteria (means the top 25 is not computed for this criteria already) ? ------
Traverse the HashMap and build the top entries for this category from "scratch". This will be expensive operation, just like indexing something afresh.

n! combinations, how to find best one without killing computer?

I'll get straight to it. I'm working on an web or phone app that is responsible for scheduling. I want students to input courses they took, and I give them possible combinations of courses they should take that fits their requirements.
However, let's say there's 150 courses that fits their requirements and they're looking for 3 courses. That would be 150C3 combinations, right?.
Would it be feasible to run something like this in browser or a mobile device?
First of all you need a smarter algorithm which can prune the search tree. Also, if you are doing this for the same set of courses over and over again, doing the computation on the server would be better, and perhaps precomputing a feasible data structure can reduce the execution time of the queries. For example, you can create a tree where each sub-tree under a node contains nodes that are 'compatible'.
Sounds to me like you're viewing this completely wrong. At most institutions there are 1) curriculum requirements for graduation, and 2) prerequisites for many requirements and electives. This isn't a pure combinatorial problem, it's a dependency tree. For instance, if Course 201, Course 301, and Course 401 are all required for the student's major, higher numbers have the lower numbered ones as prereqs, and the student is a Junior, you should be strongly recommending that Course 201 be taken ASAP.
Yay, mathematics I think I can handle!
If there are 150 courses, and you have to choose 3, then the amount of possibilities are (150*149*148)/(3*2) (correction per jerry), which is certainly better than 150 factorial which is a whole lot more zeros ;)
Now, you really don't want to build an array that size, and you don't have to! All web languages have the idea of randomly choosing an element in an array, so you get an element in an array and request 3 random unique entries from it.
While the potential course combinations is very large, based on your post I see no reason to even attempt to calculate them. This task of random selection of k items from n-sized list is delightfully trivial even for old, slow devices!
Is there any particular reason you'd need to calculate all the potential course combinations, instead of just grab-bagging one random selection as a suggestion? If not, problem solved!
Option 1 (Time\Space costly): let the user on mobile phone browse the list of (150*149*148) possible choices, page by page, the processing is done at the server-side.
Option 2 (Simple): instead of the (150*149*148)-item decision tree, provide a 150-item bag, if he choose one item from the bag, remove it from the bag.
Option 3 (Complex): expand your decision tree (possible choices) using a dependency tree (parent course requires child courses) and the list of course already taken by the student, and his track\level.
As far as I know, most educational systems use the third option, which requires having a profile for the student.

Coming up with factors for a weighted algorithm?

I'm trying to come up with a weighted algorithm for an application. In the application, there is a limited amount of space available for different elements. Once all the space is occupied, the algorithm should choose the best element(s) to remove in order to make space for new elements.
There are different attributes which should affect this decision. For example:
T: Time since last accessed. (It's best to replace something that hasn't been accessed in a while.)
N: Number of times accessed. (It's best to replace something which hasn't been accessed many times.)
R: Number of elements which need to be removed in order to make space for the new element. (It's best to replace the least amount of elements. Ideally this should also take into consideration the T and N attributes of each element being replaced.)
I have 2 problems:
Figuring out how much weight to give each of these attributes.
Figuring out how to calculate the weight for an element.
(1) I realize that coming up with the weight for something like this is very subjective, but I was hoping that there's a standard method or something that can help me in deciding how much weight to give each attribute. For example, I was thinking that one method might be to come up with a set of two sample elements and then manually compare the two and decide which one should ultimately be chosen. Here's an example:
Element A: N = 5, T = 2 hours ago.
Element B: N = 4, T = 10 minutes ago.
In this example, I would probably want A to be the element that is chosen to be replaced since although it was accessed one more time, it hasn't been accessed in a lot of time compared with B. This method seems like it would take a lot of time, and would involve making a lot of tough, subjective decisions. Additionally, it may not be trivial to come up with the resulting weights at the end.
Another method I came up with was to just arbitrarily choose weights for the different attributes and then use the application for a while. If I notice anything obviously wrong with the algorithm, I could then go in and slightly modify the weights. This is basically a "guess and check" method.
Both of these methods don't seem that great and I'm hoping there's a better solution.
(2) Once I do figure out the weight, I'm not sure which way is best to calculate the weight. Should I just add everything? (In these examples, I'm assuming that whichever element has the highest replacementWeight should be the one that's going to be replaced.)
replacementWeight = .4*T - .1*N - 2*R
or multiply everything?
replacementWeight = (T) * (.5*N) * (.1*R)
What about not using constants for the weights? For example, sure "Time" (T) may be important, but once a specific amount of time has passed, it starts not making that much of a difference. Essentially I would lump it all in an "a lot of time has passed" bin. (e.g. even though 8 hours and 7 hours have an hour difference between the two, this difference might not be as significant as the difference between 1 minute and 5 minutes since these two are much more recent.) (Or another example: replacing (R) 1 or 2 elements is fine, but when I start needing to replace 5 or 6, that should be heavily weighted down... therefore it shouldn't be linear.)
replacementWeight = 1/T + sqrt(N) - R*R
Obviously (1) and (2) are closely related, which is why I'm hoping that there's a better way to come up with this sort of algorithm.
What you are describing is the classic problem of choosing a cache replacement policy. Which policy is best for you, depends on your data, but the following usually works well:
First, always store a new object in the cache, evicting the R worst one(s). There is no way to know a priori if an object should be stored or not. If the object is not useful, it will fall out of the cache again soon.
The popular squid cache implements the following cache replacement algorithms:
Least Recently Used (LRU):
replacementKey = -T
Least Frequently Used with Dynamic Aging (LFUDA):
replacementKey = N + C
Greedy-Dual-Size-Frequency (GDSF):
replacementKey = (N/R) + C
C refers to a cache age factor here. C is basically the replacementKey of the item that was evicted last (or zero).
NOTE: The replacementKey is calculated when an object is inserted or accessed, and stored alongside the object. The object with the smallest replacementKey is evicted.
LRU is simple and often good enough. The bigger your cache, the better it performs.
LFUDA and GDSF both are tradeoffs. LFUDA prefers to keep large objects even if they are less popular, under the assumption that one hit to a large object makes up lots of hits for smaller objects. GDSF basically makes the opposite tradeoff, keeping many smaller objects over fewer large objects. From what you write, the latter might be a good fit.
If none of these meet your needs, you can calculate optimal values for T, N and R (and compare different formulas for combining them) by minimizing regret, the difference in performance between your formula and the optimal algorithm, using, for example, Linear regression.
This is a completely subjective issue -- as you yourself point out. And a distinct possibility is that if your test cases consist of pairs (A,B) where you prefer A to B, then you might find that you prefer A to B , B to C but also C over A -- i.e. its not an ordering.
If you are not careful, your function might not exist !
If you can define a scalar function of your input variables, with various parameters for coefficients and exponents, you might be able to estimate said parameters by using regression, but you will need an awful lot of data if you have many parameters.
This is the classical statistician's approach of first reviewing the data to IDENTIFY a model, and then using that model to ESTIMATE a particular realisation of the model. There are large books on this subject.

Have/Want List Matching Algorithm

Have/Want List Matching Algorithm
I am implementing an item trading system on a high-traffic site. I have a large number of users that each maintain a HAVE list and a WANT list for a number of specific items. I am looking for an algorithm that will allow me to efficiently suggest trading partners based on your HAVEs and WANTs matched with theirs. Ideally I want to find partners with the highest mutual trading potential (i.e. I have a ton of things you want, you have a ton of things I want). I don't need to find the global highest-potential pair (which sounds hard), just find the highest-potential pairs for a given user (or even just some high-potential pairs, not the global max).
Example:
User 1 HAS A,C WANTS B,D
User 2 HAS D WANTS A
User 3 HAS A,B,D WANTS C
User 1 goes to the site and clicks a button that says
"Find Trading Partners" and the top-ranked result is
User 3, followed by User 2.
An additional source of complexity is that the items have different values, and I want to match on the highest valued trade possible, rather than on the most number of matches between two traders. So in the example above, if all items are worth 1, but A and D are both worth 10, User 1 now gets matched with User 2 above User 3.
A naive way to do this would to compute the max trade value between the user looking for partners vs. all other users in the database. I'm thinking with some lookup tables on the right things I might be able to do better. I've tried googling around, since this seems like a classical problem, but I don't know the name for it.
Can anyone recommend a good approach to solving this problem? I've seen sites like the Magic Online Trading League that seem to solve it in realtime.
You could do this in O(n*k^2) (n is the number of people, k is the average number of items they have/want) by keeping hash tables (or, in a database, indexes) of all the people who have and want given items, then giving scores for all the people who have items the current user wants, and want items the current user has. Display the top 10 or 20 scores.
[Edit] Example of how this would be implemented in SQL:
-- Get score for #userid wants
SELECT UserHas.UserID, SUM(Items.Weight) AS Score
FROM UserWants
INNER JOIN UserHas ON UserWants.ItemID = UserHas.ItemID
INNER JOIN Items ON Items.ItemID = UserWants.ItemID
WHERE UserWants.UserID = #userid
GROUP BY UserWants.UserID, UserHas.UserID
This gives you a list of other users and their score, based on what items they have that the current user wants. Do the same for items the current user has the others want, then combine them somehow (add the scores or whatever you want) and grab the top 10.
This problem looks pretty similar to stable roomamates problem. I don't see any thing wrong with the SQL implementation that got highest votes but as some else suggested this is like a dating/match making problem similar to the lines of stable marriage problem but here all the participants are in one pool.
The second wikipedia entry also has a link to a practical solution in javascript which could be useful
You could maintain a per-item list (as a complement to per-user list). Item search is then spot on. Now you can allow your self brute force search for most valuable pair by checking most valuable items first. If you want more complex (arguably faster) search you could introduce set of items that often come together as meta-items, and look for them first.
Okay, what about this:
There are basically giant "Pools"
Each "pool" contains "sections." Each "Pool" is dedicated to people who own a specific item. Each section is for people who own that item, and want another.
What I mean:
Pool A (For those requesting A)
--Section B (For those requesting A that have B)
--Section C (For those requesting A that have C, even if they also have B)
Pool B
--Section A
--Section B
Pool C
--Section A
--Section C
Each section is filled with people.
"Deals" would consist of one "Requested" item, and a "Pack," you're willing to give any or all of the items up to get the item you requested.
Every "Deal" is calculated per-pool.... if you want a given item, you go to the pools of the items you'd be willing to give, and it find the Section which belongs to the item you are requesting.
Likewise, your deal is placed in the pools. So you can immediately find all of the applicable people, because you know EXACTLY which pools, and EXACTLY which sections to search in, no sorting necessary once they've entered the system.
And, then, age would have priority, older deals would be picked, rather than new ones.
Let's assume you can hash your items, or at least sort them. Assume your goal is to find the best result for a given user, on request, as in your original example. (Optimizing trading partners to maximize overall trade value is a different question.)
This would be fast. O(log n) for each insertion operation. Worst case O(n) for suggesting trading partners, but you bound this by processing time.
You're already maintaining a list of items per user.
Give each user a score equal to the sum of the values of the items they have.
Maintain a list of user-HAVES and user-WANTS per item (#Dialecticus), sorted by user score. (You can sort on demand, or keep the lists sorted dynamically every time a user changes their HAVE list.)
When a user user1 requests suggested trade partners
Iterate over their items item in order by value.
Iterate over the user-HAVES user2 for each item, in order by user score.
Compute trade value for user1 trades-with user2.
Remember best trade so far.
Keep hash of users processed so far to avoid recomputing value for a user multiple times.
Terminate when you run out of processing time (your real-time guarantee).
Sorting by item value and user score is the approximation that makes this fast. I'm not sure how sub-optimal it would be, though. There are certainly easy examples where this would fail to find the best trade if you don't run it to completion. In practice, it seems like it might be good enough. In the limit, you can make it optimal by letting it run until it exhausts the lists in step 4.1 and 4.2. There's extra memory cost associated with the inverted lists, but you didn't say you were memory constrained. And generally, if you want speed, it's not uncommon to trade-off space to get it.
I mark item by letter and user by number.
m - number of items in all have/want lists (have or want, not have and want)
x - number of users.
For each user you have list of his wants and haves. Left line is want list, right is have list (both will be sorted so we can use binary search).
1 - ABBCDE FFFGH
2 - CFGGH BE
3 - AEEGH BBDF
For each pair of users you generate two values and store them somewhere, you'd only generate it once and than actualize. Sorting first table and generating second, is O(m*x*log(m/x)) + O(log(m)) and will require O(x^2) extra memory. These values are: how many would first user get and how many another (if you want you can modify these values by multiplying them by value of particular item).
1-2 : 1 - 3 (user 1 gets 1) - (user 2 gets 3)
1-3 : 3 - 2
2-3 : 1 - 1
You also compute and store best trader for each user. After you've generated this helpful data you can quickly query.
Adding/Removing item - O(m*log(m/x)) (You loop through user's have/want list and do binary search on have/want list of every other user and actualize data)
Finding best connection - O(1) or O(x) (Depends on whether result stored in cache is correct or needs to be updated. You loop through user's pairs and do whatever you want with data to return to user the best connection)
By m/x I estimate number of items in single user's want/have list.
In this algorithm I'm assuming that all data isn't stored in Database (I don't know if binary search is possible with Databases) and that inserting/removing item into list is O(1).
PS. Sorry for bad english and I hope I've computed it all correctly and that it is working because I also need it.
Of course you could always seperate the system into three categories; "Wants," "Haves," and "Open Offers." So lets say User1 has Item A, User2 has Item B & C and is trading those for item A, but User1 still wants Item D, and User2 wants Item E. So User1 (assuming he's the trade "owner") puts a request, or want for Item D and Item E, thus the offer stands, and goes on the "Open Offers" list. If it isn't accepted or edited within two or so days, it's automatically cancelled. So User3 is looking for Item F and Item G, and searches on the "Have list" for Items F & G, which are split between User1 & User2. He realizes that User1 and User2's open offer includes requests for Items D & E, which he has. So he chooses to "join" the operation, and it's accepted on their terms, trading and swaping they items among them.
Lets say User1 now wants Item H. He simply searches on the "Have" list for the item, and among the results, he finds that User4 will trade Item H for Item I, which User1 happens to have. They trade, all is well.
Just make it BC only. That solves all problems.

Resources