Lua: custom compare function in table sort - sorting

I have a table which stores file names, such as:
1.jpg
5.jpg
4.jpg
10.jpg
2.jpg
Now I want to sort it. I used the following code:
table.sort(myTable)
The outcome was
1.jpg
10.jpg
2.jpg
4.jpg
5.jpg
However, I would like to sort it like this:
1.jpg
2.jpg
4.jpg
5.jpg
10.jpg
so I wrote a custom compare function:
function compare(a, b)
return tonumber(a) < tonumber(b)
end
But it failed with the error "attempt to compare two nil values". How can I achieve this?

You need to extract a number from the filenames you are comparing first. Assuming the number is unique, something like this should work:
function compare(a, b)
return tonumber(a:match("%d+")) < tonumber(b:match("%d+"))
end
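Here is the same idea as a Python sketch, for comparison. The helper name `sort_by_first_number` is hypothetical. The integer value of the first run of digits is used as the sort key, so "10.jpg" lands after "5.jpg". Like the Lua version, it assumes every name contains at least one digit.

```python
import re


def sort_by_first_number(names):
    """Return names sorted by the integer value of their first digit run.

    Assumes every name contains a digit: otherwise re.search returns None
    and the key function fails on .group().
    """
    return sorted(names, key=lambda s: int(re.search(r"\d+", s).group()))


files = ["1.jpg", "5.jpg", "4.jpg", "10.jpg", "2.jpg"]
print(sort_by_first_number(files))  # ['1.jpg', '2.jpg', '4.jpg', '5.jpg', '10.jpg']
```

Python's `sorted` is stable, so names that share the same number keep their input order.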
You may also want to check my post on Alphanum sorting for humans in Lua, which covers this and other cases.
[Updated to address the question in comments] To sort by the combination of strings and numbers, you just need to follow one of the options in the linked blog post. For example, to sort the file names you listed in the comments, you can use the following:
local t = {"file001_abc_10.txt", "file001_abc_2.txt", "file001_bcd_4.txt", "file001_bcd_12.txt"}
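function compare(a, b)
  -- pad each number that is followed by a dot to three digits, then compare as strings
  local function padnum(n, rest) return ("%03d"..rest):format(tonumber(n)) end
  return tostring(a):gsub("(%d+)(%.)", padnum) < tostring(b):gsub("(%d+)(%.)", padnum)
end
table.sort(t, compare)
print((table.unpack or unpack)(t)) -- unpack became table.unpack in Lua 5.2+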
This prints: file001_abc_2.txt file001_abc_10.txt file001_bcd_4.txt file001_bcd_12.txt. You can adjust the number length in the padnum function.
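The same zero-padding trick can be sketched in Python as a sort key. The names `padded_key` and `width` are hypothetical. Like the Lua pattern `(%d+)(%.)`, it pads only digit runs followed by a literal dot, so `file001` is left alone while the trailing number is padded to three digits.

```python
import re


def padded_key(name, width=3):
    """Zero-pad every digit run that is followed by a literal dot,
    mirroring the Lua pattern "(%d+)(%.)" with padnum's "%03d" format."""
    return re.sub(
        r"(\d+)(\.)",
        lambda m: f"{int(m.group(1)):0{width}d}{m.group(2)}",
        name,
    )


t = ["file001_abc_10.txt", "file001_abc_2.txt",
     "file001_bcd_4.txt", "file001_bcd_12.txt"]
t.sort(key=padded_key)
print(" ".join(t))
# file001_abc_2.txt file001_abc_10.txt file001_bcd_4.txt file001_bcd_12.txt
```

Numbers longer than `width` are not truncated, so `width` should be at least the length of the longest number you expect.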

Related

How to sort an array in Ruby

Persoane = []
Nume = gets
Persoane.push Nume.split(",")
puts Persoane.sort
I am trying to get a user to input characters that get split into substrings, which get inserted into an array; then the program should output the strings in alphabetical order. It doesn't seem to work, and I just get the array's contents, like so:
PS C:\Users\Lenovo\Desktop\Ruby> ruby "c:\Users\Lenovo\Desktop\Ruby\ruby-test.rb"
Scrie numele la persoane
Andrei,Codrin,Bradea
Andrei
Codrin
Bradea
PS C:\Users\Lenovo\Desktop\Ruby>
You can do this:
Nume = gets
puts Nume.split(",").sort
or, in one line:
array = gets.chomp.split(",").sort
The problem is caused by your use of push. Let's assume that you define the constant Nume by
Nume='Andrei,Codrin,Bradea'
Then, Nume.split(',') would return the Array ['Andrei', 'Codrin', 'Bradea']. When you call Persoane.push, the whole array is added to your array Persoane as a single element. Therefore, Persoane contains only one element, as you can verify with
p Persoane
If you sort a one-element array, the result will also be just that one element - there is nothing to sort.
What you can do is use concat instead of push. Persoane then becomes a 3-element array, which can be sorted.
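Ruby's push/concat distinction maps onto Python's `list.append` and `list.extend`. This small Python sketch, with a hard-coded string standing in for the `gets` input, shows why sorting after an append has nothing to reorder.

```python
names = "Andrei,Codrin,Bradea"

appended = []
appended.append(names.split(","))  # like push: adds ONE element (a nested list)

extended = []
extended.extend(names.split(","))  # like concat: adds each name as its own element

print(len(appended), sorted(appended))  # 1 [['Andrei', 'Codrin', 'Bradea']]
print(len(extended), sorted(extended))  # 3 ['Andrei', 'Bradea', 'Codrin']
```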
I'm not sure you need to use constants here.
If you don't need to keep the user input and use it somewhere else, you can just chain methods like this:
persons = gets.chomp.split(",").sort
For something a little different, let's not split at all.
people = gets.scan(/[^,]+/).map(&:strip).sort
This will avoid problems like multiple commas in a row yielding empty strings. Of course, you could also avoid that with:
people = gets.split(/\,+/).map(&:strip).sort
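Here is a rough Python analogue that compares a plain split on "," with the scan-style approach above. `re.findall` stands in for `scan` and `str.split` for `split`. The sample input with a doubled comma is made up for illustration.

```python
import re

raw = "Andrei,,Codrin, Bradea\n"  # doubled comma, stray space, trailing newline

# Plain split: the doubled comma yields an empty string, which sorts first.
split_names = sorted(part.strip() for part in raw.split(","))

# Scan-style: match runs of non-comma characters, so no empty pieces appear.
scan_names = sorted(part.strip() for part in re.findall(r"[^,]+", raw))

print(split_names)  # ['', 'Andrei', 'Bradea', 'Codrin']
print(scan_names)   # ['Andrei', 'Bradea', 'Codrin']
```

Neither version filters out a chunk made only of spaces (as in `a, ,b`), which still strips down to an empty string.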

ruby > sort images inside directory with multiple conditions

I need to sort the images present inside some directory with the following order:
00a.jpg
00b.jpg
00c.jpg
...
00x.jpg
00y.jpg
00z.jpg
0aa.jpg
0bb.jpg
0cc.jpg
...
0xx.jpg
0yy.jpg
0zz.jpg
001.jpg
002.jpg
003.jpg
...
097.jpg
098.jpg
099.jpg
100.jpg
101.jpg
102.jpg
But I can't work out what logic to put inside my sort_by. Does anyone have an idea what logic would be best suited for sorting all the images in the order above?
I am expecting something like this :
Dir.entries('.').sort_by { |x| ?? }
Thanks,
Dean
Your requested sort order is not apparent, so I'm going to assume that you want all the images which contain a letter to be before those with numbers only.
For this logic, you can return an array from sort_by, which will be evaluated in order: the first item first, the second one if the first is tied, and so on.
In this example this would be something like:
jpgs.sort_by { |j| [j[/.*[a-z].*\.jpg/] ? 0 : 1, j] }
The first item in the array returned answers the question of whether the image name contains a letter before the extension, and if it does returns a smaller number than if it doesn't. This assures us that images with letters in their names will be before images with only numbers in their names.
This results in the following order:
[
"00a.jpg",
"00b.jpg",
"00c.jpg",
"00x.jpg",
"00y.jpg",
"00z.jpg",
"0aa.jpg",
"0bb.jpg",
"0cc.jpg",
"0xx.jpg",
"0yy.jpg",
"0zz.jpg",
...,
"001.jpg",
"002.jpg",
"003.jpg",
"097.jpg",
"098.jpg",
"099.jpg",
"100.jpg",
"101.jpg",
"102.jpg"
]
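The same two-level key can be sketched in Python, where a tuple plays the role of Ruby's array. The name `image_key` is hypothetical. `re.search` mirrors the unanchored `j[/.../]` lookup.

```python
import re


def image_key(name):
    """(0, name) for names with a lowercase letter before ".jpg", (1, name)
    otherwise; tuples compare element by element, like Ruby arrays."""
    has_letter = re.search(r".*[a-z].*\.jpg", name) is not None
    return (0 if has_letter else 1, name)


names = ["101.jpg", "0zz.jpg", "002.jpg", "00a.jpg",
         "0aa.jpg", "099.jpg", "00z.jpg", "100.jpg"]
print(sorted(names, key=image_key))
# ['00a.jpg', '00z.jpg', '0aa.jpg', '0zz.jpg', '002.jpg', '099.jpg', '100.jpg', '101.jpg']
```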
I would use:
Dir.entries('.').sort { |a,b| a.split('.').first <=> b.split('.').first }
I think it may be faster than the regexp option. Also, it's simpler and easier to customize (due to using two iterators and a comparator).

indexing and comparing string index or hash

I want to clean up my music library by focusing on the songs that have the most duplicates on my system. I could just list them all, sort them and do it manually, but that would take too long. I want the list sorted by the number of possible duplicates. So if a song had 10 duplicates, there would be 10 song names that resemble each other, and I would focus on that song first to keep just the best version.
I could compare two song names using the Levenshtein string-comparison technique and the levenshtein gem:
require 'levenshtein'
Levenshtein.distance("string1", "string2") # => 1
But let's say I have x songs: I would have to compare each song against all x songs, because I can't rely on normal file sorting; I would miss some duplicates that way. For example:
The Beatles - Hey Jude
Beatles, The - hey jude
Beatles_-_Hey_Judy_(remastered)
should give beatles - hey judy (x3)
Is there a way to produce an index based on the filename that can then be sorted and would give all the duplicates in descending order? A kind of hash that can be compared?
I know of other music comparing methods but they have their flaws, and this would be usable to compare other type of files also.
Try this code.
Here files is an array of filenames, and max_distance is the distance below which two names are considered similar.
hash = {}
files.each do |file|
  similar = hash.keys.select { |f| Levenshtein.distance(f, file) < max_distance }
  if similar.any?
    hash[similar.first] += 1
  else
    hash.merge!({file => 0})
  end
end
After that, hash has filenames as keys and "duplicate" counts as values, and you can sort it however you want.
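Python has no built-in Levenshtein function, so this sketch includes a small dynamic-programming version as a stand-in for the levenshtein gem. It then follows the same greedy grouping as the Ruby code. `count_near_duplicates` is a hypothetical name, and the sample names are made up. Real filenames like the Beatles examples would likely need normalizing first (case, punctuation, word order), which this sketch does not do.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions turning a into b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def count_near_duplicates(files, max_distance):
    """Greedy grouping as in the Ruby answer: a name joins the first existing
    key closer than max_distance, otherwise it becomes a new key with count 0."""
    counts = {}  # dicts keep insertion order, so "first" matches the Ruby hash
    for name in files:
        similar = [key for key in counts if levenshtein(key, name) < max_distance]
        if similar:
            counts[similar[0]] += 1
        else:
            counts[name] = 0
    # Most-duplicated names first; ties keep their insertion order.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


songs = ["hey jude", "hey judy", "hey jude!", "let it be", "let it be (live)"]
print(count_near_duplicates(songs, max_distance=3))
```

Each new name is compared against every key collected so far, so the work still grows roughly quadratically with the number of files. The grouping also depends on the order in which files are visited.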

String Algorithm Question - Word Beginnings

I have a problem, and I'm not too sure how to solve it without going down the route of inefficiency. Say I have a list of words:
Apple
Ape
Arc
Abraid
Bridge
Braide
Bray
Boolean
What I want to do is process this list and get what each word starts with up to a certain depth, e.g.
a - Apple, Ape, Arc, Abraid
ab - Abraid
ar - Arc
ap - Apple, Ape
b - Bridge, Braide, Bray, Boolean
br - Bridge, Braide, Bray
bo - Boolean
Any ideas?
You can use a Trie structure.
(root)
  a
    b - r - a - i - d
    p
      p - l - e
      e
    r - c
Just find the node that you want and get all its descendants, e.g., if I want ap-:
(root)
  a
    b - r - a - i - d
    [p]
      p - l - e
      e
    r - c
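This is a minimal Python sketch of the idea, using nested dictionaries as trie nodes. The names `build_trie` and `words_with_prefix`, and the `"$"` end-of-word marker, are assumptions for illustration and are not part of the answer. Words are stored lowercased so that the lookup `ap` matches `Apple`, as in the question's listing.

```python
def build_trie(words):
    """Nested-dict trie: each node maps a character to a child node, and the
    "$" key (assumed never to occur in the words) stores a word ending there."""
    root = {}
    for word in words:
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node["$"] = word  # keep the original spelling for output
    return root


def words_with_prefix(trie, prefix):
    """Find the node for prefix, then collect every word in its subtree."""
    node = trie
    for ch in prefix.lower():
        if ch not in node:
            return []  # no word starts with this prefix
        node = node[ch]
    found, stack = [], [node]
    while stack:  # iterative depth-first walk over the descendants
        current = stack.pop()
        for key, child in current.items():
            if key == "$":
                found.append(child)
            else:
                stack.append(child)
    return sorted(found)


trie = build_trie(["Apple", "Ape", "Arc", "Abraid",
                   "Bridge", "Braide", "Bray", "Boolean"])
print(words_with_prefix(trie, "ap"))  # ['Ape', 'Apple']
```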
Perhaps you're looking for something like:
#!/usr/bin/env python
def match_prefix(pfx, seq):
    '''return subset of seq that starts with pfx'''
    results = list()
    for i in seq:
        if i.startswith(pfx):
            results.append(i)
    return results
def extract_prefixes(lngth, seq):
    '''return all prefixes in seq of the length specified'''
    results = dict()
    lngth += 1  # callers pass 0-based values from range(), so 0 means length 1
    for i in seq:
        if i[0:lngth] not in results:
            results[i[0:lngth]] = True
    return sorted(results.keys())
def gen_prefix_indexed_list(depth, seq):
    '''return a dictionary of all words matching each prefix
    up to depth keyed on these prefixes'''
    results = dict()
    for each in range(depth):
        for prefix in extract_prefixes(each, seq):
            results[prefix] = match_prefix(prefix, seq)
    return results
if __name__ == '__main__':
    words = '''Apple Ape Arc Abraid Bridge Braide Bray Boolean'''.split()
    test = gen_prefix_indexed_list(2, words)
    for each in sorted(test.keys()):
        print("%s:\t\t" % each, ' '.join(test[each]))
That is, you want to generate all the prefixes present in a list of words, with lengths from one up to some number you specify (2 in this example). Then you want to produce an index of all words matching each of these prefixes.
I'm sure there are more elegant ways to do this. But for a quick and easily explained approach, I've just built this from a simple bottom-up functional decomposition of the apparent spec. If the end result's values are lists of words matching a given prefix, then we start with a function to filter out such matches from our inputs. If the end result's keys are all prefixes of length 1 to some N that appear in our input, then we have a function to extract those. Then our spec is an extremely straightforward nested loop around those two functions.
Of course, this nested loop might be a problem. Such things usually equate to O(n^2) efficiency. As shown, this will iterate over the original list C * N * N times (C is the constant number of prefix lengths, 1, 2, etc., while N is the length of the list).
If this decomposition provides the desired semantics, then we can look at improving the efficiency. The obvious approach would be to lazily generate the dictionary keys as we iterate once over the list: for each word, for each prefix length, generate the key, append this word to the list stored at that key, and continue to the next word.
There's still a nested loop, but it's only the short loop over prefix lengths for each word. That alternative design has the advantage of allowing us to iterate over words from any iterable, not just an in-memory list. So we could iterate over lines of a file, results generated from a database query, etc., without incurring the memory overhead of keeping the entire original word list in memory.
Of course, we're still storing the dictionary in memory. However, we can also change that by decoupling the logic from the input and the storage. When we append each input to the various prefix/key values, we don't care whether they're lists in a dictionary, lines in a set of files, or values being pulled out of (and pushed back into) a DBM or other key/value store (for example CouchDB or some other "NoSQL" clustered database).
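The single-pass alternative can be sketched in Python as follows. `prefix_index` is a hypothetical name. Keys are lowercased to match the question's listing, whereas the code above keys on the original case. Because it consumes `words` only once, any iterable works, including a generator or an open file.

```python
from collections import defaultdict


def prefix_index(words, depth):
    """Single pass over any iterable: for each word and each prefix length
    1..depth, append the word to the list stored under that prefix."""
    index = defaultdict(list)
    for word in words:
        # min() stops short words from being filed twice under the same key
        for length in range(1, min(depth, len(word)) + 1):
            index[word[:length].lower()].append(word)
    return dict(index)


# A generator is accepted just as well as a list.
words = (w for w in "Apple Ape Arc Abraid Bridge Braide Bray Boolean".split())
for prefix, matches in sorted(prefix_index(words, 2).items()):
    print(f"{prefix}: {', '.join(matches)}")
```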
The implementation of that is left as an exercise to the reader.
I don't know what you have in mind when you say "route of inefficiency", but a pretty obvious solution (possibly the one you are thinking of) comes to mind. A trie looks like the right structure for this kind of problem, but it's costly in terms of memory (there is a lot of duplication), and I'm not sure it makes things faster in your case. The memory usage might pay off if the information were retrieved many times, but your question suggests you want to generate the output file once and store it. So in your case the trie would be built just to be traversed once. I don't think that makes sense.
My suggestion is to just sort the list of words in lexical order and then traverse the sorted list once for each prefix length, up to the maximum length.
create a dictionary with keys being strings and values being lists of strings
for (i = 1 to maxBeginningLength)
{
    for (every word in your sorted list)
    {
        if (the word's length is no less than i)
        {
            add the word to the list in the dictionary at the key
            given by the first i characters of the word
        }
    }
}
store contents of the dictionary to the file
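One way to take advantage of the sorting step is `itertools.groupby`. In a sorted list, words that share a prefix are adjacent, so each pass can collect whole groups without searching. This is a Python sketch of the pseudocode under that approach. `prefix_groups` is a hypothetical name, and sorting and keys are made case-insensitive so that "Apple" falls under "a".

```python
from itertools import groupby


def prefix_groups(words, max_beginning_length):
    """One pass over the sorted list per prefix length, as in the pseudocode;
    groupby works because sorting puts words with equal prefixes side by side."""
    sorted_words = sorted(words, key=str.lower)
    result = {}
    for i in range(1, max_beginning_length + 1):
        # Words shorter than i have no prefix of length i.
        long_enough = (w for w in sorted_words if len(w) >= i)
        for prefix, group in groupby(long_enough, key=lambda w: w[:i].lower()):
            result[prefix] = list(group)
    return result


groups = prefix_groups(["Apple", "Ape", "Arc", "Abraid",
                        "Bridge", "Braide", "Bray", "Boolean"], 2)
print(groups["ap"])  # ['Ape', 'Apple']
```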
Using this PHP trie implementation will get you about 50% there. It's got some stuff you don't need and it doesn't have a "search by prefix" method, but you can write one yourself easily enough.
$trie = new Trie();
$trie->add('Apple', 'Apple');
$trie->add('Ape', 'Ape');
$trie->add('Arc', 'Arc');
$trie->add('Abraid', 'Abraid');
$trie->add('Bridge', 'Bridge');
$trie->add('Braide', 'Braide');
$trie->add('Bray', 'Bray');
$trie->add('Boolean', 'Boolean');
It builds up a structure like this:
Trie Object
(
    [A] => Trie Object
        (
            [p] => Trie Object
                (
                    [ple] => Trie Object
                    [e] => Trie Object
                )
            [rc] => Trie Object
            [braid] => Trie Object
        )
    [B] => Trie Object
        (
            [r] => Trie Object
                (
                    [idge] => Trie Object
                    [a] => Trie Object
                        (
                            [ide] => Trie Object
                            [y] => Trie Object
                        )
                )
            [oolean] => Trie Object
        )
)
If the words were in a database (Access, SQL), and you wanted to retrieve all words starting with 'br', you could use:
Table Name: mytable
Field Name: mywords
"Select * from mytable where mywords like 'br*'" - For Access - or
"Select * from mytable where mywords like 'br%'" - For SQL

Sorting strings containing numbers in a user friendly way

Being used to the standard way of sorting strings, I was surprised when I noticed that Windows sorts files by their names in a kind of advanced way. Let me give you an example:
Track1.mp3
Track2.mp3
Track10.mp3
Track20.mp3
I think those names are compared (during sorting) by their letters and by their numbers separately.
On the other hand, the following is the same list sorted in a standard way:
Track1.mp3
Track10.mp3
Track2.mp3
Track20.mp3
I would like to create a comparison algorithm in Delphi that would let me sort strings in the same way. At first I thought it would be enough to compare consecutive characters of two strings while they are letters. When a digit was found at the same position in both strings, I would read all the digits following it to form a number and then compare the numbers.
To give you an example, I'll compare "Track10" and "Track2" strings this way:
1) read characters while they are equal and while they are letters: "Track", "Track"
2) if a digit is found, read all following digits: "10", "2"
2a) if they are equal, go to 1 or else finish
Ten is greater than two, so "Track10" is greater than "Track2"
It seemed that everything would be all right until I noticed, during my tests, that Windows considered "Track010" lower than "Track10", while I thought the first one was greater because it was longer (not to mention that according to my algorithm both strings would be equal, which is wrong).
Could you give me an idea of how exactly Windows sorts files by name, or maybe you have a ready-to-use algorithm (in any programming language) that I could base mine on?
Thanks a lot!
Mariusz
Jeff wrote up an article about this on Coding Horror. This is called natural sorting, where you effectively treat a group of digits as a single "character". There are implementations out there in every language under the sun, but strangely it's not usually built into most languages' standard libraries.
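As a rough illustration of natural sorting, here is a Python sketch of a sort key. `natural_key` is a hypothetical name. Digit runs are compared as integers, and the original string is appended as a tie-breaker so that "Track010" and "Track10" never compare as equal. For that particular pair the tie-breaker happens to give the order the question observed in Windows, but this is not a claim about Windows' actual rules. Case folding is not handled either.

```python
import re


def natural_key(s):
    """Alternate text and digit runs, comparing digit runs as integers.
    The original string is a final tie-breaker so keys are never equal
    for different strings (e.g. "Track010" vs "Track10")."""
    parts = re.split(r"([0-9]+)", s)
    # With a capturing group, re.split puts the digit runs at odd indices,
    # so str and int chunks always line up position by position.
    chunks = [int(p) if i % 2 else p for i, p in enumerate(parts)]
    return (chunks, s)


tracks = ["Track20.mp3", "Track10.mp3", "Track2.mp3", "Track1.mp3"]
print(sorted(tracks, key=natural_key))
# ['Track1.mp3', 'Track2.mp3', 'Track10.mp3', 'Track20.mp3']
print(sorted(["Track10", "Track010"], key=natural_key))  # ['Track010', 'Track10']
```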
The mother of all sorts:
ls *.mp3 | sort --version-sort
The absolute easiest way, I found, was to isolate the string you want (in the OP's case, with Path.GetFileNameWithoutExtension()), remove the non-digits, convert to int, and sort. Using LINQ and some extension methods, it's a one-liner. In my case, I was working on directories:
Directory.GetDirectories(@"a:\b\c").OrderBy(x => x.RemoveNonDigits().ToIntOrZero())
Where RemoveNonDigits and ToIntOrZero are extension methods:
public static string RemoveNonDigits(this string value) {
    return Regex.Replace(value, "[^0-9]", string.Empty);
}
public static int ToIntOrZero(this string toConvert) {
    try {
        if (toConvert == null || toConvert.Trim() == string.Empty) return 0;
        return int.Parse(toConvert);
    } catch (Exception) {
        return 0;
    }
}
The extension methods are common tools I use everywhere. YMMV.
Here's a Python approach:
import re
def tryint(s):
    """
    Return an int if possible, or `s` unchanged.
    """
    try:
        return int(s)
    except ValueError:
        return s
def alphanum_key(s):
    """
    Turn a string into a list of string and number chunks.
    >>> alphanum_key("z23a")
    ['z', 23, 'a']
    """
    return [ tryint(c) for c in re.split('([0-9]+)', s) ]
def human_sort(l):
    """
    Sort a list in the way that humans expect.
    """
    l.sort(key=alphanum_key)
And a blog post with more detail: https://nedbatchelder.com/blog/200712/human_sorting.html
