Sort shopping list based on previous shopping trips - algorithm

I want to sort a shopping list based on the order in which items were checked off on previous shopping trips. For example, I go to the store and buy Apples, Bananas and Eggs.
The next time I go to the store, I buy Avocados, Tomatoes and Apples. For my next trip, the application should sort Avocados, Tomatoes and Apples all before Eggs, for example.
I found this post talking about topological sorting: How to sort a shopping list based on previous pick order?
However I am not sure how this should work since in theory I could have cycles (A user could theoretically check off apples and then bananas and the next time bananas are checked off before apples).
Could you guide me on how to tackle this problem?
I assume:
Past item orderings should guide ordering the current order.
Any new items appear after any items ordered historically.
Orderings further back in time should have less impact than more recent orderings.
My idea is to assign weights to items seen in past orders based on:
Their position in a historic ordering.
How old that ordering is.
The weightings might need adjusting, but, using data from that other question you link to, the Python code below does create orderings based on historic orderings:
from collections import defaultdict

shop = [['Toothpaste', 'Bread', 'Meat', 'Vegetables', 'Milk', 'Ice cream'],  # last
        ['CDs', 'Bread', 'Fruit', 'Vegetables', 'Juice', 'Sugar', 'Chocolates'],  # last-but-1
        ['Meat', 'Juice', 'Milk', 'Sugar']]  # last-but-2

def weight_from_index(idx: int) -> float | int:
    "Items to the left are of lower weight and will sort first."
    return idx + 1

def historic_multiplier(idy: int) -> float:
    "Older rows have larger multipliers and so are of lower overall weight."
    return (idy + 1)**1

def shopping_weights(history: list[list[str]]) -> dict[str, int | float]:
    "Weight for items from historic shops."
    item2weight = defaultdict(float)
    for y, hist in enumerate(history):
        for x, item in enumerate(hist):
            item2weight[item] += historic_multiplier(y) * weight_from_index(x)
    return dict(item2weight)

def order_items(items: list[str], weights) -> list[str]:
    wts = weights.copy()
    new_items = set(items) - set(wts)
    # New items last, but in given order otherwise
    max_wt = max(wts.values(), default=0)  # default covers an empty history
    for itm in new_items:
        wts[itm] = max_wt + 1 + items.index(itm)
    return sorted(items, key=lambda i: wts[i])

item_weights = shopping_weights(shop)
new_shop = ['Curry', 'Vegetables', 'Eggs', 'Milk', 'CDs', 'Meat']
new_order = order_items(new_shop, item_weights)
# ['CDs', 'Meat', 'Vegetables', 'Milk', 'Curry', 'Eggs']

# Update the historic item orders
shop.insert(0, new_order)
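On the cycle worry from the question: with the summed weights above, the two-trip case (most recent trip: Bananas then Apples; the one before: Apples then Bananas) gives Bananas a total of 5 and Apples a total of 4, so the older order wins. One possible adjustment, offered only as a rough sketch and not as part of the code above, is to take a weighted average of each item's relative position and let the weight shrink for older trips. The names `decayed_positions`, `order_by_history` and the `decay` parameter are my own and purely illustrative.

```python
from collections import defaultdict

def decayed_positions(history, decay=0.5):
    """history[0] is the most recent trip.

    Returns item -> weighted mean of its relative position (0 = first, 1 = last),
    where a trip that is `age` trips old counts with weight decay**age.
    """
    total = defaultdict(float)
    weight_sum = defaultdict(float)
    for age, trip in enumerate(history):
        w = decay ** age
        for idx, item in enumerate(trip):
            rel = idx / max(len(trip) - 1, 1)  # normalise so trip length does not matter
            total[item] += w * rel
            weight_sum[item] += w
    return {item: total[item] / weight_sum[item] for item in total}

def order_by_history(items, history, decay=0.5):
    """Known items sorted by decayed position; unseen items keep their given order, last."""
    scores = decayed_positions(history, decay)
    known = [i for i in items if i in scores]
    new = [i for i in items if i not in scores]
    return sorted(known, key=lambda i: scores[i]) + new

history = [['Bananas', 'Apples'],   # most recent trip
           ['Apples', 'Bananas']]   # older trip
print(order_by_history(['Apples', 'Eggs', 'Bananas'], history))
```

Because every item always gets a single numeric score, contradictory trips cannot produce a cycle; they only pull the score towards the middle, and the `decay` value decides how quickly old trips stop mattering.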


DAX - RANKX Using Two Calculations

I have a data table that contains transactions by supplier. Each row of data represents one transaction. Each transaction contains a "QTY" column as well as a "Supplier" column.
I need to rank these suppliers by the count of transactions (Count of rows per unique supplier) then by the SUM of the "QTY" for all of each supplier's transactions. This needs to be in 1 rank formula, not two separate rankings. This will help in breaking any ties in my ranking.
I have tried dozens of formulas and approaches and can't seem to get it right.
See below example:
Suppliers ABC and EFG each have 4 transactions so they would effectively tie for Rank 1, however ABC has a Quantity of 30 and EFG has a QTY of 25 so ABC should rank 1 and EFG should rank 2.
Can anyone assist?
Welcome to SO. You can create a new calculated column -
Rank =
var SumTable = SUMMARIZE(tbl, tbl[Supplier], "CountTransactions", COUNT(tbl[Transaction Number]), "SumQuantity", SUM(tbl[Quantity]))
var ThisSupplier = tbl[Supplier]
var ThisTransactions = SUMX(FILTER(SumTable, [Supplier] = ThisSupplier), [CountTransactions])
var ThisQuantity = SUMX(FILTER(SumTable, [Supplier] = ThisSupplier), [SumQuantity])
var ThisRank =
    FILTER(SumTable,
        [CountTransactions] >= ThisTransactions &&
        [SumQuantity] >= ThisQuantity)
RETURN
    COUNTROWS(ThisRank)
Here's the final result -
I'm curious to see if anyone posts an alternative solution. In the meantime, give mine a try and let me know if it works as expected.
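For checking the expected ranks outside of DAX, here is a small Python sketch of the ordering the question describes: transaction count first, with total quantity used only to break ties. It is not a translation of the formula above; the function name `rank_suppliers` and the `(supplier, qty)` tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

def rank_suppliers(transactions):
    """transactions: iterable of (supplier, qty) pairs, one per transaction row.

    Returns supplier -> rank, where rank 1 has the most transactions and
    ties on transaction count are broken by the larger total quantity.
    """
    count = defaultdict(int)
    qty = defaultdict(float)
    for supplier, q in transactions:
        count[supplier] += 1
        qty[supplier] += q
    # Sort descending on count, then descending on summed quantity
    ordered = sorted(count, key=lambda s: (-count[s], -qty[s]))
    return {s: r for r, s in enumerate(ordered, start=1)}

rows = [("ABC", 10), ("ABC", 10), ("ABC", 5), ("ABC", 5),   # 4 rows, quantity 30
        ("EFG", 10), ("EFG", 5), ("EFG", 5), ("EFG", 5),    # 4 rows, quantity 25
        ("XYZ", 100), ("XYZ", 50)]                          # 2 rows, quantity 150
print(rank_suppliers(rows))
```

Note that XYZ ranks last here despite having the largest quantity, because quantity is only consulted when the transaction counts are equal. Suppliers that tie on both values still get distinct ranks in this sketch, in first-seen order.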

How to calculate original loan amount without year terms?
I want to get the same result as the calculator above when I select:
I want to save: 6000
I want to spend it: As soon as possible
Starting balance: 0
Interest rate : 10%
Regular savings: 1000 Monthly
But I am not getting the correct result with this code:
loan = 6000.0
interest = 10.0
monthly_payment = 1000.0
i = 0.0
record = []
count = 1
add_interst = 0.0
while loan >= 0
  i = interest / (100 * 12) * loan
  loan = loan + i - monthly_payment
  add_interst = add_interst + i
end
puts add_interst
I am getting 181.42163384701658, but it should be 168. I don't know where I went wrong.
The code doesn't work because you are doing the opposite of what the link you reference is doing. What they are calculating is savings interest; what you are calculating is loan interest.
Basically, this is how you should define the variables. Also, as others have pointed out, it is good to use BigDecimal when calculating money:
require 'bigdecimal'
require 'bigdecimal/util' # provides Integer#to_d
balance = 0.to_d
interest = 10.to_d / 1200.to_d # monthly rate: 10% per year over 12 months
regular_saving = 1000
goal = 6000
i = 0
added_interest = 0
So, to correct things, you have to start from the starting balance (i.e. 0) and start incrementing. Something like this:
while balance < goal
  balance += regular_saving
  i = balance * interest
  balance += i
  added_interest += i
end
Note also that in the last month you don't need to pay the full savings amount, only enough to reach the goal. To handle that, add a conditional statement that checks whether goal - balance < regular_saving. If so, the interest should be calculated on the balance after that smaller final payment (slightly less than the goal).
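To make the last-payment adjustment concrete, here is a rough Python sketch under my own assumptions: each deposit is made at the start of the month, interest is added at the end of the month, and the final deposit is sized so that deposit plus that month's interest lands exactly on the goal. The function name `save_until_goal` and its return values are illustrative only, and the sketch does not claim to reproduce the calculator's figure of 168, since the post does not say which compounding convention the calculator uses.

```python
from decimal import Decimal

def save_until_goal(goal, regular_saving, annual_rate_pct, balance=0):
    """Return (months, last_deposit, total_interest) for monthly saving towards goal."""
    goal = Decimal(goal)
    regular_saving = Decimal(regular_saving)
    balance = Decimal(balance)
    rate = Decimal(annual_rate_pct) / Decimal(1200)  # monthly rate from a yearly percentage
    months = 0
    total_interest = Decimal(0)
    last_deposit = Decimal(0)
    while balance < goal:
        months += 1
        # Deposit that would reach the goal once this month's interest is added
        needed = goal / (1 + rate) - balance
        if needed <= regular_saving:
            # Final month: pay only what is needed, interest covers the rest
            last_deposit = max(needed, Decimal(0))
            total_interest += goal - (balance + last_deposit)
            balance = goal
            break
        balance += regular_saving
        interest = balance * rate
        balance += interest
        total_interest += interest
        last_deposit = regular_saving
    return months, last_deposit, total_interest

months, last_deposit, total_interest = save_until_goal(6000, 1000, 10)
print(months, round(last_deposit, 2), round(total_interest, 2))
```

With the figures from the question this takes six months, with a smaller sixth deposit. A different convention (for example, interest credited before the deposit) would give a different interest total, so it is worth checking which one the calculator follows.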

How should I change this script so that it works for collections and not just one product in Shopify?

The script is a template that comes with the script editor app in Shopify. I need to make it work so that if you buy one product from a collection, you get another one free from that collection. This script works only for buying the same product. Here is the script:
# Returns the integer amount of items that must be discounted next
# given the amount of items seen
def discounted_items_to_find(total_items_seen, discounted_items_seen)
  Integer(total_items_seen / (PAID_ITEM_COUNT + DISCOUNTED_ITEM_COUNT) * DISCOUNTED_ITEM_COUNT) - discounted_items_seen
end

# Partitions the items and returns the items that are to be discounted.
# Arguments
# ---------
# * cart
#   The cart to which split items will be added (typically Input.cart).
# * line_items
#   The selected items that are applicable for the campaign.
def partition(cart, line_items)
  # Sort the items by price from high to low
  sorted_items = line_items.sort_by{|line_item| line_item.variant.price}.reverse
  # Create an array of items to return
  discounted_items = []
  # Keep counters of items seen and discounted, to avoid having to recalculate on each iteration
  total_items_seen = 0
  discounted_items_seen = 0

  # Loop over all the items and find those to be discounted
  sorted_items.each do |line_item|
    total_items_seen += line_item.quantity
    # After incrementing total_items_seen, see if any items must be discounted
    count = discounted_items_to_find(total_items_seen, discounted_items_seen)
    # If there are none, skip to the next item
    next if count <= 0

    if count >= line_item.quantity
      # If the full item quantity must be discounted, add it to the items to return
      # and increment the count of discounted items
      discounted_items.push(line_item)
      discounted_items_seen += line_item.quantity
    else
      # If only part of the item must be discounted, split the item
      discounted_item = line_item.split(take: count)
      # Insert the newly-created item in the cart, right after the original item
      position = cart.line_items.find_index(line_item)
      cart.line_items.insert(position + 1, discounted_item)
      # Add it to the list of items to return
      discounted_items.push(discounted_item)
      discounted_items_seen += discounted_item.quantity
    end
  end

  # Return the items to be discounted
  discounted_items
end

eligible_items = Input.cart.line_items.select do |line_item|
  product = line_item.variant.product
  !product.gift_card? && product.id == 11380899340
end

discounted_line_items = partition(Input.cart, eligible_items)
discounted_line_items.each do |line_item|
  line_item.change_line_price(Money.zero, message: "Buy one Bolur, get one Bolur free")
end

Output.cart = Input.cart
I tried changing what seems to be the relevant code:
eligible_items = Input.cart.line_items.select do |line_item|
  product = line_item.variant.product
  !product.gift_card? && product.id == 11380899340
end
to this:
eligible_items = Input.cart.line_items.select do |line_item|
  product = line_item.variant.product
  !product.gift_card? && **** == 123
end
but I get an error:
undefined method 'collection' for main (Your Cart)
undefined method 'collection' for main (No Customer)
Two things here:
line_item.variant.product does not have the property collections. For that, you want to use line_item.product (docs), which (should... see point two) exposes all of the methods and properties of the product object.
However, in my attempt to do something similar to you (discount based on product) I tried iterating over line_item.variant – and am always hitting the error of undefined method 'product' for #<LineItem:0x7f9c97f9fff0>. Which I interpret as "line_items accessed in cart scripts can only be at the variant level".
So, I wonder if this is because the cart only contains variants (product/color/size) – so we aren't actually able to access the line_items by product, and only by variant.
I tried iterating over line_item.product_id, which also throws a similar error. I think we just have to try to do some hacky thing at the variant level.
I am going to see if I can access the product by the variant ID...back to the docs!
You actually can't filter on a collection, so you'd need to modify the script to work with a product type or tags. That script will need to be heavily modified to work across a number of products rather than multiples of the same one.
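The Script Editor itself runs Ruby, so the following Python is only a language-neutral sketch of the selection logic, not something to paste into Shopify. It assumes each line item carries a list of tags and treats every item with a given tag as part of the same "buy one, get one free" pool. The dict layout (`name`, `price`, `quantity`, `tags`, `gift_card`) and the function name `free_units_by_tag` are hypothetical.

```python
def free_units_by_tag(line_items, tag, paid=1, free=1):
    """Return {item name: number of units to make free} among items carrying `tag`.

    Mirrors the counting in the template: walk the eligible items from most to
    least expensive and, for every (paid + free) units seen, discount `free` units.
    """
    eligible = [li for li in line_items
                if tag in li["tags"] and not li.get("gift_card", False)]
    eligible.sort(key=lambda li: li["price"], reverse=True)

    seen = 0        # units seen so far
    discounted = 0  # units already marked free
    result = {}
    for li in eligible:
        seen += li["quantity"]
        count = seen // (paid + free) * free - discounted
        if count <= 0:
            continue
        count = min(count, li["quantity"])  # cannot free more units than the line holds
        result[li["name"]] = count
        discounted += count
    return result

cart = [
    {"name": "A", "price": 30, "quantity": 1, "tags": ["bolur"]},
    {"name": "B", "price": 20, "quantity": 1, "tags": ["bolur"]},
    {"name": "C", "price": 50, "quantity": 1, "tags": ["other"]},
]
print(free_units_by_tag(cart, "bolur"))
```

The only real change from the single-product template is the eligibility test: instead of comparing one product ID, it checks membership in a tag, so different products can pair up with each other.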

How can I increase my python-code speed?

I have a dataframe, df1, that reports courses students have taken, where ID is the student’s id, COURSES is a list of courses taken by the student, and TYPE and MAJOR are student attributes. The dataframe looks like this:
ID  COURSES                                            TYPE      MAJOR
1   ['Intr To Archaeology', 'Statics', 'Circuits I…]   Freshman  EEEL
2   ['Signals & Systems I', 'Instrumentation'…]        Transfer  EEEL
3   ['Keyboard Competence', 'Elementary … ]            Freshman  EEEL
4   ['Cultural Anthro', 'Vector Analysis' … ]          Freshman  EEEL
I created a new dataframe, df2, that reports a dissimilarity measure for each pair of students based on the courses they’ve taken. df2 looks like this:
I created it using the following script, but it runs very slowly (there are thousands of students). Can someone suggest a more efficient way to create df2?
One major problem is that the script below calculates the distance between (student 1 and student 2) and (student 2 and student 1), which is redundant since the distances are the same. However, the condition I created to prevent this:
if (id1 >= id2):
doesn't work.
Entire script:
import ast

for id1, student1 in df.iterrows():
    for id2, student2 in df.iterrows():
        if (id1 >= id2):
            continue
        ID_1 = student1["ID"]
        ID_2 = student2["ID"]
        # courses as list strings
        s1 = student1["COURSES"]
        s2 = student2["COURSES"]
        try:
            # courses as sets
            courses1 = set(ast.literal_eval(s1))
            courses2 = set(ast.literal_eval(s2))
            distance = float(len(courses1.symmetric_difference(courses2)))/(len(courses1) + len(courses2))
        except Exception:
            # Some strings seem to have a different format
            distance = -1
        ID_1_Transfer = 1 if student1["TYPE"] == "Transfer" else 0
        ID_2_Transfer = 1 if student2["TYPE"] == "Transfer" else 0
        df2 = df2.append({'ID_1': ID_1, 'ID_2': ID_2, 'Distance': distance, 'ID_1_Transfer': ID_1_Transfer, 'ID_2_Transfer': ID_2_Transfer}, ignore_index=True)
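One direction that usually helps, sketched here in plain Python so it runs without pandas: parse each COURSES string once instead of once per pair, visit each unordered pair exactly once with `itertools.combinations`, and collect the rows in a list rather than appending to a DataFrame inside the loop. The function name `pairwise_distances` and the list-of-dicts input are assumptions for illustration; the column names come from the question.

```python
import ast
from itertools import combinations

def pairwise_distances(students):
    """students: list of dicts with keys "ID", "COURSES" (string form of a list), "TYPE".

    Returns one row (dict) per unordered pair of students.
    """
    parsed = []
    for s in students:
        try:
            courses = set(ast.literal_eval(s["COURSES"]))  # parse once per student
        except (ValueError, SyntaxError, TypeError):
            courses = None  # string in an unexpected format
        parsed.append((s["ID"], courses, int(s["TYPE"] == "Transfer")))

    rows = []
    # combinations(..., 2) yields each pair once, so (1, 2) is never repeated as (2, 1)
    for (id1, c1, t1), (id2, c2, t2) in combinations(parsed, 2):
        if c1 is None or c2 is None or not (c1 or c2):
            distance = -1.0  # same sentinel as the original script
        else:
            distance = len(c1 ^ c2) / (len(c1) + len(c2))
        rows.append({"ID_1": id1, "ID_2": id2, "Distance": distance,
                     "ID_1_Transfer": t1, "ID_2_Transfer": t2})
    return rows

students = [
    {"ID": 1, "COURSES": "['Statics', 'Circuits I']", "TYPE": "Freshman"},
    {"ID": 2, "COURSES": "['Circuits I', 'Signals & Systems I']", "TYPE": "Transfer"},
]
print(pairwise_distances(students))
```

With pandas available, the input could come from `df1.to_dict("records")` and the output could be turned into df2 with a single `pd.DataFrame(rows)` call. The number of pairs still grows quadratically with the number of students, but the per-pair cost becomes a cheap set operation.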

Regroup By in PigLatin

In PigLatin, I want to group twice, so as to select rows using two different rules.
I'm having trouble explaining the problem, so here is an example. Let's say I want to get the details of the people whose age is nearest to mine ($my_age) and who have a lot of money.
Relation A has five columns (name, address, zipcode, age, money)
B = GROUP A BY (address, zipcode); -- group by the address
-- generate the address, the person's age ...
C = FOREACH B GENERATE group, MIN($my_age - age) AS min_age, FLATTEN(A);
D = FILTER C BY min_age == age;
-- Then group again to select the richest; this GROUP BY fails:
E = GROUP D BY group; -- or: E = GROUP D BY (address, zipcode);
-- The rest would work
F = FOREACH E GENERATE group, MAX(money) AS max_money, FLATTEN(A);
G = FILTER F BY max_money == money;
I've tried to filter for the nearest and the richest at the same time, but it doesn't work, because the richest people may be much older than me.
Another, more realistic example is:
You have a demands file with columns: iddem, idopedem, datedem
You have an operations file with columns: idope, labelope, dateope, idoftheday, infope
I want to return the operations that match the demands, such that:
idopedem matches idope.
The dateope must be the nearest to datedem.
If datedem - dateope > 0, then I must select the operation with the max(idoftheday), else I must select the operation with the min(idoftheday).
Relation A is 5 columns (idope,labelope,dateope,idoftheday,infope)
Relation B is 3 columns (iddem, idopedem, datedem)
C = JOIN A BY idope, B BY idopedem;
D = FOREACH C GENERATE iddem, idope, datedem, dateope, ABS(datedem - dateope) AS datedelta, idoftheday, infope;
E = GROUP D BY iddem;
F = FOREACH E GENERATE group, MIN(D.datedelta) AS deltamin, FLATTEN(D);
G = FILTER F BY deltamin == datedelta;
-- Then I must group a second time to select the min or max idoftheday
H = GROUP G BY group; -- Does not work when dumped
H = GROUP G BY iddem; -- Does not work when dumped
I = FOREACH H GENERATE group, (datedem - dateope >= 0 ? max(idoftheday) as idofdaysel : min(idoftheday) as idofdaysel), FLATTEN(G);
J = FILTER I BY idofdaysel == idoftheday;
Data for the 2nd example (note that the dates are already in Unix time format):
You have a demands file like:
1, 'ctr1', 1359460800000
2, 'ctr2', 1354363200000
You have an operations file like:
The result must be like:
1, 'ctr1', 'tata',1359460800000,2,'blabla3'
2, 'ctr2', 'toto',1359460800000,1,'blabla4'
Sample input and output would help greatly, but from what you have posted it appears to me that the problem is not so much in writing the Pig script but in specifying what exactly it is you hope to accomplish. It's not clear to me why you're grouping at all. What is the purpose of grouping by address, for example?
Here's how I would solve your problem:
First, design an optimization function that will induce an ordering on your dataset that reflects your own prioritization of money vs. age. For example, to severely penalize large age differences but prefer more money with small ones, you could try:
scored = FOREACH A GENERATE *, money / POW(1+ABS($my_age-age)/10, 2) AS score;
ordered = ORDER scored BY score DESC;
top10 = LIMIT ordered 10;
That gives you the 10 best people according to your optimization function.
Then the only work is to design a function that matches your own judgments. For example, in the function I chose, a person with $100,000 who is your age would be preferred to someone with $350,000 who is 10 years older (or younger). But someone with $500,000 who is 20 years older or younger is preferred to someone your age with just $50,000. If either of those doesn't fit your intuition, then modify the formula. Likely a simple quadratic factor won't be sufficient. But with a little experimentation you can hit upon something that works for you.
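If it helps to experiment with the formula before putting it in a Pig script, here is the same scoring function as a small Python sketch, with the four cases from the paragraph above as a check. The names `score` and `top_n` and the dict layout of a person are just illustrative.

```python
def score(money, age, my_age):
    """Same shape as the Pig expression: money / (1 + |my_age - age| / 10)^2."""
    return money / (1 + abs(my_age - age) / 10) ** 2

def top_n(people, my_age, n=10):
    """Return the n best-scoring people, best first."""
    return sorted(people,
                  key=lambda p: score(p["money"], p["age"], my_age),
                  reverse=True)[:n]

my_age = 30
# $100,000 at my age vs. $350,000 ten years older
print(score(100_000, 30, my_age), score(350_000, 40, my_age))
# $500,000 twenty years older vs. $50,000 at my age
print(score(500_000, 50, my_age), score(50_000, 30, my_age))
```

Changing the divisor 10 or the exponent 2 and re-running the comparisons is a quick way to see whether the ranking still matches your intuition.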
