Determine if two tables have the same set of keys - performance

I'm using Lua tables as sets by placing the value of the set in the table key and 1 as the table value, e.g.
function addToSet(s,...) for _,e in ipairs{...} do s[e]=1 end end
function removeFromSet(s,...) for _,e in ipairs{...} do s[e]=nil end end
local logics = {}
addToSet(logics,true,false,"maybe")
To test if two sets are equal I need to ensure that they have exactly the same keys. What's an efficient way to do this?

Since you asked about efficiency, I'll provide an alternative implementation. Depending on your expected input tables, you might want to avoid the second loop's lookups. It is more efficient if the tables are expected to be the same, it is less efficient if there are differences.
function sameKeys(t1,t2)
local count=0
for k,_ in pairs(t1) do
if t2[k]==nil then return false end
count = count + 1
end
for _ in pairs(t2) do
count = count - 1
end
return count == 0
end
Another version avoids lookups unless they are necessary. This might perform faster in yet another set of use-cases.
function sameKeys(t1,t2)
local count=0
for _ in pairs(t1) do count = count + 1 end
for _ in pairs(t2) do count = count - 1 end
if count ~= 0 then return false end
for k,_ in pairs(t1) do if t2[k]==nil then return false end end
return true
end
EDIT: After some more research and testing, I came to the conclusion that you need to distinguish between Lua and LuaJIT. With Lua, the performance characteristics are dominated by Lua's Parser and therefore by the number of source code tokens. For Lua, this means that Phrogz's version is most likely the faster alternative. For LuaJIT, the picture changes dramatically, as the parser is no longer an issue. For almost all cases the first version I showed is an improvement, the second version is probably best when the tables are very big. I would advise everyone to run their own benchmarks and check which version works best in their environment.

Loop through both tables and make sure that the key has a value in the other. Fail as soon as you find a mismatch, return true if you got through both. For sets of size M and N this is O(M+N) complexity.
function sameKeys(t1,t2)
for k,_ in pairs(t1) do if t2[k]==nil then return false end end
for k,_ in pairs(t2) do if t1[k]==nil then return false end end
return true
end
Seen in action:
local a,b,c,d = {},{},{},{}
addToSet(a,1,2,3)
addToSet(b,3,1,2,3,3,1)
addToSet(c,1,2)
addToSet(d,2,1)
print(sameKeys(a,b)) --> true
print(sameKeys(a,c)) --> false
print(sameKeys(d,c)) --> true
Note testing for t[k]==nil is better than just not t[k] to handle the (unlikely) case that you have set a value of false for the table entry and you want the key to be present in the set.

Related

Lua Compare two Tables values No Order

I wonder if there is a faster way to check if two tables are equal (without order in the keys).
This is my solution
function TableCompareNoOrder(table1,table2)
if #table1 ~= #table2 then return false end
local equal = false
for key, item in pairs(table1) do
for key2, item2 in pairs(table2) do
if item == item2 then equal = true end
end
if equal == false then return false end
end
local equal = false
for key, item in pairs(table2) do
for key2, item2 in pairs(table1) do
if item == item2 then equal = true end
end
if equal == false then return false end
end
return equal
end
a = {1,2,3,0}
b = {2,3,1,0}
print(TableCompareNoOrder(a,b))
Yes, there are faster ways. Your approach is currently O(n²) as it has to compare each element with each other element. One way is to sort the arrays first. Another one would be to build a lookup table ("set") using the hash part. If hash access is assumed to take amortized constant time, this should run in O(n) time - significantly faster. It will take O(n) additional memory but leaves the passed tables intact; to do the same using sorting you'd have to first create a copy, then sort it.
The hash approach looks like the following (and also works for values like tables which are compared by reference and can't be sorted unless a metatable is set or a custom comparator function is supplied):
function TableCompareNoOrder(table1, table2)
if #table1 ~= #table2 then return false end
-- Consider an early "return true" if table1 == table2 here
local t1_counts = {}
-- Check if the same elements occur the same number of times
for _, v1 in ipairs(table1) do
t1_counts[v1] = (t1_counts[v1] or 0) + 1
end
for _, v2 in ipairs(table2) do
local count = t1_counts[v2] or 0
if count == 0 then return false end
t1_counts[v2] = count - 1
end
return true
end
For the sake of completeness, here's a simple implementation using sorting:
function TableCompareNoOrder(table1, table2)
if #table1 ~= #table2 then return false end
-- Lazy implementation: Sort copies of both tables instead of using a binary search. Takes twice as much memory.
local t1_sorted = {table.unpack(table1)} -- simple way to copy the table, limited by stack size
table.sort(t1_sorted)
local t2_sorted = {table.unpack(table2)}
table.sort(t2_sorted)
for i, v1 in ipairs(t1_sorted) do
if t2_sorted[i] ~= v1 then return false end
end
return true
end
This should roughly run in O(n log n) (performance is determined by the sorting algorithm, usually quicksort which has this as it's average running time).

Lua pathfinding code needs optimization

After working on my code for a while, optimizing the most obvious things, I've resulted in this:
function FindPath(start, finish, path)
--Define a table to hold the paths
local paths = {}
--Make a default argument
path = path or {start}
--Loop through connected nodes
for i,v in ipairs(start:GetConnectedParts()) do
--Determine if backtracking
local loop = false
for i,vv in ipairs(path) do
if v == vv then
loop = true
end
end
if not loop then
--Make a path clone
local npath = {unpack(path)}
npath[#npath+1] = v
if v == finish then
--If we reach the end add the path
return npath
else
--Otherwise add the shortest part extending from this node
paths[#paths+1] = FindPath(v, finish, npath) --NOTED HERE
end
end
end
--Find and return the shortest path
if #paths > 0 then
local lengths = {}
for i,v in ipairs(paths) do
lengths[#lengths+1] = #v
end
local least = math.min(unpack(lengths))
for i,v in ipairs(paths) do
if #v == least then
return v
end
end
end
end
The problem being, the line noted gets some sort of game script timeout error (which I believe is a because of mass recursion with no yielding). I also feel like once that problem is fixed, it'll probably be rather slow even on the scale of a pacman board. Is there a way I can further optimize it, or perhaps a better method I can look into similar to this?
UPDATE: I finally decided to trash my algorithm due to inefficiency, and implemented a Dijkstra algorithm for pathfinding. For anybody interested in the source code it can be found here: http://pastebin.com/Xivf9mwv
You know that Roblox provides you with the PathfindingService? It uses C-side A* pathing to calculate quite quickly. I'd recommend using it
http://wiki.roblox.com/index.php?title=API:Class/PathfindingService
Try to remodel your algorithm to make use of tail calls. This is a great mechanism available in Lua.
A tail call is a type of recursion where your function returns a function call as the last thing it does. Lua has proper tail calls implementation and it will dress this recursion as a 'goto' under the scenes, so your stack will never blow.
Passing 'paths' as one of the arguments of FindPath might help with that.
I saw your edit about ditching the code, but just to help others stumbling on this question:
ipairs is slower than pairs, which is slower than a numeric for-loop.
If performance matters, never use ipairs, but use a for i=1,#tab loop
If you want to clone a table, use a for-loop. Sometimes, you have to use unpack (returning dynamic amount of trailing nils), but this is not such a case. Unpack is also a slow function.
Replacing ipairs with pairs or numeric for-loops and using loops instead of unpack will increase the performance a lot.
If you want to get the lowest value in a table, use this code snippet:
local lowestValue = values[1]
for k,v in pairs(values) do
if v < lowestValue then
lowestValue = k,v
end
end
This could be rewritten for your path example as so:
local least = #path[1]
for k,v in pairs(path) do
if #v < least then
least = v
end
end
I have to admit, you're very inventive. Not a lot of people would use math.min(unpack(tab)) (not counting the fact it's bad)

Ruby // Random number between range, ensure uniqueness against others existing stored ones

Currently trying to generate a random number in a specific range;
and ensure that it would be unique against others stored records.
Using Mysql. Could be like an id, incremented; but can't be it.
Currently testing other existing records in an 'expensive' manner;
but I'm pretty sure that there would be a clean 1/2 lines of code to use
Currently using :
test = 0
Order.all.each do |ord|
test = (0..899999).to_a.sample.to_s.rjust(6, '0')
if Order.find_by_number(test).nil? then
break
end
end
return test
Thanks for any help
Here your are my one-line solution. It is also the quicker one since calls .pluck to retrieve the numbers from the Order table. .select instantiates an "Order" object for every record (that is very costly and unnecessary) while .pluck does not. It also avoids to iterate again each object with a .map to get the "number" field. We can avoid the second .map as well if we convert, using CAST in this case, to a numeric value from the database.
(Array(0...899999) - Order.pluck("CAST('number' AS UNSIGNED)")).sample.to_s.rjust(6, '0')
I would do something like this:
# gets all existing IDs
existing_ids = Order.all.select(:number).map(&:number).map(&:to_i)
# removes them from the acceptable range
available_numbers = (0..899999).to_a - existing_ids
# choose one (which is not in the DB)
available_numbers.sample.to_s.rjust(6, '0')
I think, you can do something like below :
def uniq_num_add(arr)
loop do
rndm = rand(1..15) # I took this range as an example
# random number will be added to the array, when the number will
# not be present
break arr<< "%02d" % rndm unless arr.include?(rndm)
end
end
array = []
3.times do
uniq_num_add(array)
end
array # => ["02", "15", "04"]

refactor this ruby database/sequel gem lookup

I know this code is not optimal, any ideas on how to improve it?
job_and_cost_code_found = false
timberline_db['SELECT Job, Cost_Code FROM [JCM_MASTER__COST_CODE] WHERE [Job] = ? AND [Cost_Code] = ?', job, clean_cost_code].each do |row|
job_and_cost_code_found = true
end
if job_and_cost_code_found == false then
info = linenum + "," + id + ",,Employees default job and cost code do not exist in timberline. job:#{job} cost code:#{clean_cost_code}"
add_to_exception_output_file(info)
end
You're breaking a lot of simple rules here.
Don't select what you don't use.
You select a number of columns, then completely ignore the result data. What you probably want is a count:
SELECT COUNT(*) AS cost_code_count FROM [JCM_MASTER__COST_CODE] WHERE [Job] = ? AND [Cost_Code] = ?'
Then you'll get one row that will have either a zero or non-zero value in it. Save this into a variable like:
job_and_cost_codes_found = timberline_db[...][0]['cost_code_count']
Don't compare against false unless you need to differentiate between that and nil
In Ruby only two things evaluate as false, nil and false. Most of the time you will not be concerned about the difference. On rare occasions you might want to have different logic for set true, set false or not set (nil), and only then would you test so specifically.
However, keep in mind that 0 is not a false value, so you will need to compare against that.
Taking into account the previous optimization, your if could be:
if job_and_cost_codes_found == 0
# ...
end
Don't use then or other bits of redundant syntax
Most Ruby style-guides spurn useless syntax like then, just as they recommend avoiding for and instead use the Enumerable class which is far more flexible.
Manipulate data, not strings
You're assembling some kind of CSV-like line in the end there. Ideally you'd be using the built-in CSV library to do the correct encoding, and libraries like that want data, not a string they'd have to parse.
One step closer to that is this:
line = [
linenum,
id,
nil,
"Employees default job and cost code do not exist in timberline. job:#{job} cost code:#{clean_cost_code}"
].join(',')
add_to_exception_output_file(line)
You'd presumably replace join(',') with the proper CSV encoding method that applies here. The library is more efficient when you can compile all of the data ahead of time into an array-of-arrays, so I'd recommend doing that if this is the end goal.
For example:
lines = [ ]
# ...
if (...)
# Append an array to the lines to write to the CSV file.
lines << [ ... ]
end
Keep your data in a standard structure like an Array, a Hash, or a custom object, until you're prepared to commit it to its final formatted or encoded form. That way you can perform additional operations on it if you need to do things like filtering.
It's hard to refactor this when I'm not exactly sure what it's supposed to be doing, but assuming that you want to log an error when there's no entry matching a job & code pair, here's what I've come up with:
def fetch_by_job_and_cost_code(job, cost_code)
timberline_db['SELECT Job, Cost_Code FROM [JCM_MASTER__COST_CODE] WHERE [Job] = ? AND [Cost_Code] = ?', job, cost_code]
end
if fetch_by_job_and_cost_code(job, clean_cost_code).none?
add_to_exception_output_file "#{linenum},#{id},,Employees default job and cost code do not exist in timberline. job:#{job} cost code:#{clean_cost_code}"
end

Lua - why for loop limit is not calculated dynamically?

Ok here's a basic for loop
local a = {"first","second","third","fourth"}
for i=1,#a do
print(i.."th iteration")
a = {"first"}
end
As it is now, the loop executes all 4 iterations.
Shouldn't the for-loop-limit be calculated on the go? If it is calculated dynamically, #a would be 1 at the end of the first iteration and the for loop would break....
Surely that would make more sense?
Or is there any particular reason as to why that is not the case?
The main reason why numerical for loops limits are computed only once is most certainly for performance.
With the current behavior, you can place arbitrary complex expressions in for loops limits without a performance penalty, including function calls. For example:
local prod = 1
for i = computeStartLoop(), computeEndLoop(), computeStep() do
prod = prod * i
end
The above code would be really slow if computeEndLoop and computeStep required to be called at each iteration.
If the standard Lua interpreter and most notably LuaJIT are so fast compared to other scripting languages, it is because a number of Lua features have been designed with performance in mind.
In the rare cases where the single evaluation behavior is undesirable, it is easy to replace the for loop with a generic loop using while end or repeat until.
local prod = 1
local i = computeStartLoop()
while i <= computeEndLoop() do
prod = prod * i
i = i + computeStep()
end
The length is computed once, at the time the for loop is initialized. It is not re-computed each time through the loop - a for loop is for iterating from a starting value to an ending value. If you want the 'loop' to terminate early if the array is re-assigned to, you could write your own looping code:
local a = {"first", "second", "third", "fourth"}
function process_array (fn)
local inner_fn
inner_fn =
function (ii)
if ii <= #a then
fn(ii,a)
inner_fn(1 + ii)
end
end
inner_fn(1, a)
end
process_array(function (ii)
print(ii.."th iteration: "..a[ii])
a = {"first"}
end)
Performance is a good answer but I think it also makes the code easier to understand and less error-prone. Also, that way you can (almost) be sure that a for loop always terminates.
Think about what would happen if you wrote that instead:
local a = {"first","second","third","fourth"}
for i=1,#a do
print(i.."th iteration")
if i > 1 then a = {"first"} end
end
How do you understand for i=1,#a? Is it an equality comparison (stop when i==#a) or an inequality comparison (stop when i>=#a). What would be the result in each case?
You should see the Lua for loop as iteration over a sequence, like the Python idiom using (x)range:
a = ["first", "second", "third", "fourth"]
for i in range(1,len(a)+1):
print(str(i) + "th iteration")
a = ["first"]
If you want to evaluate the condition every time you just use while:
local a = {"first","second","third","fourth"}
local i = 1
while i <= #a do
print(i.."th iteration")
a = {"first"}
i = i + 1
end

Resources