Sort images inside directory with multiple conditions - ruby

I need to sort the images present inside some directory with the following order:
00a.jpg
00b.jpg
00c.jpg
...
00x.jpg
00y.jpg
00z.jpg
0aa.jpg
0bb.jpg
0cc.jpg
...
0xx.jpg
0yy.jpg
0zz.jpg
001.jpg
002.jpg
003.jpg
...
097.jpg
098.jpg
099.jpg
100.jpg
101.jpg
102.jpg
But I can't work out what logic to put inside my sort_by. Does anyone have an idea what logic would be best suited for sorting all the images in the order shown above?
I am expecting something like this:
Dir.entries('.').sort_by { |x| ?? }
Thanks,
Dean

Your requested sort order is not apparent, so I'm going to assume that you want all the images whose names contain a letter to come before those with numbers only.
For this logic, you can return an array from sort_by. Its elements are compared in order: the first item first, the second one if the first is tied, and so on.
In this example this would be something like:
jpgs.sort_by { |j| [j[/.*[a-z].*\.jpg/] ? 0 : 1, j] }
The first item in the returned array answers the question of whether the image name contains a letter before the extension: if it does, the item is a smaller number than if it doesn't. This ensures that images with letters in their names come before images with only numbers in their names.
This results in the following order:
[
"00a.jpg",
"00b.jpg",
"00c.jpg",
"00x.jpg",
"00y.jpg",
"00z.jpg",
"0aa.jpg",
"0bb.jpg",
"0cc.jpg",
"0xx.jpg",
"0yy.jpg",
"0zz.jpg",
...,
"001.jpg",
"002.jpg",
"003.jpg",
"097.jpg",
"098.jpg",
"099.jpg",
"100.jpg",
"101.jpg",
"102.jpg"
]
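As a self-contained illustration of the array-key idea, here is a minimal sketch. The file names are made-up sample data rather than a real directory listing, and it assumes, like the regex above, that only lowercase letters occur in the names.

```ruby
# Sample names standing in for a directory listing (hypothetical data).
jpgs = %w[100.jpg 0bb.jpg 002.jpg 00b.jpg 0aa.jpg 001.jpg 00a.jpg]

sorted = jpgs.sort_by do |name|
  base = File.basename(name, ".jpg")
  # First key: 0 when the base name contains a letter, 1 when it is digits only.
  # Second key: the name itself, used to break ties within each group.
  [base.match?(/[a-z]/) ? 0 : 1, name]
end

puts sorted
```

For a real directory, Dir.glob("*.jpg") could supply the list; unlike Dir.entries, it does not include the "." and ".." entries.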

I would use:
Dir.entries('.').sort { |a,b| a.split('.').first <=> b.split('.').first }
I think it may be faster than the regexp option. Also, it's simpler and easier to customize (because the block receives two elements and acts as the comparator).

Related

Loop through elements of XML file to see if they include any value within an array?

I've read through tons of questions and solutions to determine whether this was already answered elsewhere, but it seems that none of the things I found were exactly what I was trying to get at.
I have an XML document that has hundreds of entries of text, and each entry also lists a URL. Each URL is a string (within tags), ending with a unique 4-digit number. The XML file is basically formatted like so:
<entry>
[other content]
<id>http://www.URL.com/blahblahblah-1234</id>
[other content]
</entry>
I want to essentially single out only the URLs that have a particular number at the end, out of a list of numbers. I put all of the numbers in an array, with the values set as strings ( numbers = ["1234", "8649", etc.]). I've been using nokogiri for other parts of my script, and when I am only looking for a particular string, I just use include?, which works perfectly. However, I'm not sure how to automate this when I have hundreds of strings within the "numbers" array. This is essentially what I logistically need to happen:
id = nokodoc.css("id")
id.each { |id|
hyperlink = id.text
if hyperlink.include?(numbers)
puts "yes!"
else
puts "no :("
end
}
Obviously this doesn't work, because include? expects a string, whereas I'm passing an entire array. (For instance, if I do include?(numbers[0]), it works.) I've tried this with any? but it doesn't seem to work in this case.
Is there a Ruby method that I'm not aware of, that can tell me whether any of the values within an array is present in any of the nodes that I'm looping through? Let me know if any of this needs to be clarified—phrasing the proper question is often the hardest part!
Edit: As a sidenote, ultimately I'd like to remove all entries that correspond to any links that do not end with one of the numbers in the array, i.e.
if hyperlink.include? (any number from the array)
puts "this one is good"
else
id.parent.remove
So I would somehow need the final product to remain parsable with nokogiri.
Thank you so much in advance, for any and all insight!
You can do this:
numbers = ['1234', '8649', ..]
urls = nokodoc.css('id').map(&:text)
urls = urls.select { |url| numbers.any? { |n| url.include? n } }
But it's not efficient. If you know the pattern, extract the number and then check whether it's in the array. For example, if it's always the last 4 digits:
numbers = ['1234', '8649', ..]
urls = nokodoc.css('id').map(&:text)
urls = urls.select { |url| numbers.include? url[-4..-1] }
UPDATE
For the change in the question:
numbers = ['1234', '8649', ..]
nodes = nokodoc.css('id')
nodes.each do |node|
url = node.text
if numbers.any? { |n| url.include? n }
puts 'this one is good'
else
node.parent.remove
end
end
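If the list of numbers grows large, a Set lookup may be cheaper than scanning an array for every URL. The following is a minimal sketch of that variation, not part of the answer above: the URLs are made-up sample strings standing in for the id texts, and it assumes every URL ends in exactly four digits.

```ruby
require "set"

# Hypothetical sample data standing in for the <id> texts of the document.
numbers = Set["1234", "8649"]
urls = [
  "http://www.URL.com/blahblahblah-1234",
  "http://www.URL.com/something-else-5555",
  "http://www.URL.com/another-8649"
]

# Take the last four characters of each URL and look them up in the set.
# partition returns the matching URLs first and the rest second.
wanted, unwanted = urls.partition { |url| numbers.include?(url[-4, 4]) }

puts "keep:   #{wanted.inspect}"
puts "remove: #{unwanted.inspect}"
```

The same include? check could drive the node.parent.remove branch in the loop above.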

Ruby: Array each loop, save other elements of original array in new array

I am currently trying to compare every element of an array with the others (in Ruby). Those elements are objects of a class. I need to find similarities between them. My idea was to loop through the original array and in this loop creating a new array containing the other elements (not the one of the outer loop) and then loop through this second array and compare every item with the one in the outer each loop.
Here is some pseudocode:
originalArray.each{
|origElement|
tempArray = createNewArray from original array without origElement
tempArray.each{
|differentElement|
Compare origElement with differentElement
}
}
How can I create that tempArray?
I think you should use Array#permutation for this
original_array.permutation(2) { |elements| Compare elements[0] with elements[1] }
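To make the behaviour concrete, here is a small sketch with placeholder strings instead of real objects. It also shows Array#combination, which may be the better fit if the comparison is symmetric, since it yields each pair only once.

```ruby
items = %w[a b c]

# permutation(2) yields every ordered pair, so each pairing is seen twice
# (once as [x, y] and once as [y, x]).
ordered = items.permutation(2).to_a

# combination(2) yields each unordered pair exactly once.
unordered = items.combination(2).to_a

puts "permutation(2): #{ordered.inspect}"
puts "combination(2): #{unordered.inspect}"
```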
First, I want to say bjhaid's answer is beautiful and for your specific instance, it is the one that should be used.
However, I wanted to provide a more general answer that answers the direct question you asked: "How can I create that tempArray?"
If you wanted to delete all values that are equal to the element in the original array, you could simply do:
tempArray = originalArray - [origElement]
However, if you only want to delete that element, you could do:
originalArray.each_with_index {
|origElement, index|
tempArray = originalArray.dup
tempArray.delete_at(index)
tempArray.each{
|differentElement|
Compare origElement with differentElement
}
}
Also, a note on styling. You probably want to use underscores instead of CamelCase for all methods/variables. In the Ruby community, CamelCase is typically reserved for class / module names. You also probably want to keep the "piped-in" variables (called block arguments) on the same line as the beginning of the block. It is certainly not a requirement, but it is an almost universal convention in the Ruby community.
This code snippet would be much more familiar and readable to your typical Ruby dev:
original_array.each_with_index do |orig_element, index|
temp_array = original_array.dup
temp_array.delete_at(index)
temp_array.each do |different_element|
Compare orig_element with different_element
end
end
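The difference between the two approaches only shows up when the array contains equal elements. A minimal sketch with integers as stand-in data:

```ruby
original_array = [1, 2, 2, 3]

# Array#- removes every element equal to 2, not just one of them.
without_value = original_array - [2]

# dup + delete_at removes only the element at the given position
# and leaves the original array untouched.
without_index = original_array.dup
without_index.delete_at(1)

puts "minus:     #{without_value.inspect}"
puts "delete_at: #{without_index.inspect}"
puts "original:  #{original_array.inspect}"
```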

Understanding Ruby array sorting syntax

I really don't understand the following sorting method:
books = ["Charlie and the Chocolate Factory", "War and Peace", "Utopia", "A Brief History of Time", "A Wrinkle in Time"]
books.sort! { |firstBook, secondBook| firstBook <=> secondBook }
How does this work? In the Ruby books, there was one parameter, for example |x|, representing each of the values in the array. If there is more than one parameter (firstBook and secondBook in this example), what do they represent?
Thank you!
The <=> operator returns the result of a comparison.
So "a" <=> "b" returns -1, "b" <=> "a" returns 1, and "a" <=> "a" returns 0.
That's how sort is able to determine the order of elements.
Array#sort (and sort!) called without a block will do comparisons with <=>, so the block is redundant. These all accomplish the same thing:
books.sort!
books.sort_by!{|x| x}
books.sort!{|firstBook, secondBook| firstBook <=> secondBook}
Since you are not overriding the default behavior, the second and third forms are needlessly complicated.
So how does this all work?
The first form sorts the array by using some sorting algorithm -- it's not relevant which one -- which needs to be able to compare two elements to decide which comes first. (More on this below.) It automatically, behind the scenes, follows the same logic as the third line above.
The middle form lets you choose what to sort on. For example: instead of, for each item, just sorting on that item (which is the default), you can sort on that item's length:
books.sort_by!{|title| title.length}
Then books is sorted from shortest title to longest title. If all you are doing is calling a method on each item, there's another shortcut available. This does the same thing:
books.sort_by!(&:length)
In the final form, you have control over the comparison itself. For example, you could sort backwards:
books.sort!{|first, second| second <=> first}
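The three forms can be compared side by side in a short sketch. It uses a shortened book list and the non-mutating sort and sort_by so that each result can be inspected separately.

```ruby
books = ["War and Peace", "Utopia", "A Wrinkle in Time"]

# Default comparison with <=>.
default_order = books.sort

# Sort on a derived key (title length, shortest first).
by_length = books.sort_by(&:length)

# Custom comparison: swapping the operands reverses the order.
reverse_order = books.sort { |first, second| second <=> first }

puts default_order.inspect
puts by_length.inspect
puts reverse_order.inspect
```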
Why does sort need two items passed into the block, and what do they represent?
Array#sort (and sort!) with a block is how you override the comparison step of sorting. Comparison has to happen at some point during a sort in order to figure out what order to put things in. You don't need to override the comparison in most cases, but if you do, this is the form that allows that, so it needs two items passed into the block: the two items that need to be compared right now. Let's look at an example in action:
[4, 3, 2, 1].sort{|x, y| puts "#{x}, #{y}"; x <=> y}
This outputs:
4, 2
2, 1
3, 2
3, 4
This shows us that in this case, sort compared 4 and 2, then 2 and 1, then 3 and 2, and then finally 3 and 4, in order to sort the array. The precise details are irrelevant to this discussion and depend on the sorting algorithm being used, but again, all sorting algorithms need to be able to compare items in order to sort.
The block given inside {} is passed to the sort method as its comparison function. |a, b| tells us that this function takes 2 parameters (the expected number of arguments, since a comparison needs two items).
The block is not executed once per element. Instead, sort calls it each time it needs to compare two elements, and passes that pair in as a and b.
See http://ruby-doc.org/core-2.0/Array.html#method-i-sort for an explanation. As for a single-parameter method referred to in your books, I can only guess you were looking at sort_by. Can you give an example?

indexing and comparing string index or hash

I want to clean up my music library by giving attention to the songs that have the most doubles on my system. I could just list them all, sort them and do it manually, but that would take too long. I want the list sorted by the most possible duplicates. So if a song has 10 duplicates, it means there are 10 song names that resemble each other, and thus I would focus my attention on that song first to keep just the best version.
I could compare two song names using the Levenshtein string-comparison technique and gem
require 'levenshtein'
Levenshtein.distance("string1", "string2") => 1
But let's say I have x songs: I would have to compare each song x times, because I can't rely on normal file sorting; I would miss some duplicates then. E.g.
The Beatles - Hey Jude
Beatles, The - hey jude
Beatles_-_Hey_Judy_(remastered)
should give beatles - hey judy (x3)
Is there a way to produce an index based on the filename that can then be sorted and would give all the duplicates in descending order? A kind of hash that can be compared?
I know of other music comparing methods but they have their flaws, and this would be usable to compare other type of files also.
Try this code.
files is an array of filenames, and max_distance is the distance below which two names are considered similar.
hash = {}
files.each do |file|
similar = hash.keys.select { |f| Levenshtein.distance(f, file) < max_distance }
if similar.any?
hash[similar.first] += 1
else
hash.merge!({file => 0})
end
end
After that you will have a hash with filenames as keys and "duplicate" counts as values, and you can sort it as you want.
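Here is a runnable sketch of the same counting idea, ending with the descending sort. To keep it free of external gems it uses a small plain-Ruby edit_distance method as a stand-in for Levenshtein.distance; the song names and the threshold of 3 are made-up sample values.

```ruby
# Plain-Ruby edit distance, used here only so the sketch runs without the gem.
def edit_distance(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      # Cheapest of: deletion, insertion, substitution (or match).
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Hypothetical sample data and threshold.
files = ["hey jude", "hey judy", "hey jude!", "yesterday", "let it be", "let it bee"]
max_distance = 3

counts = {}
files.each do |file|
  # Reuse the first already-seen name that is close enough, if any.
  match = counts.keys.find { |seen| edit_distance(seen, file) < max_distance }
  if match
    counts[match] += 1
  else
    counts[file] = 0
  end
end

# Names with the most near-duplicates first.
ranked = counts.sort_by { |_name, count| -count }
ranked.each { |name, count| puts "#{name} (#{count} similar)" }
```

Each new name is still compared against every name seen so far, so this remains quadratic in the worst case; it only avoids doing the bookkeeping by hand.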

How do you modify array mapping data structure resultant from Ruby map?

I believe that I may be missing something here, so please bear with me as I explain two scenarios in hopes to reconcile my misunderstanding:
My end goal is to create a dataset that's acceptable by Highcharts via lazy_high_charts, however in this quest, I'm finding that it is rather particular about the format of data that it receives.
A) I have found that when data is formatted like this going into it, it draws the points just fine:
[0.0000001240,0.0000000267,0.0000000722, ..., 0.0000000512]
I'm able to generate an array like this simply with:
array = Array.new
data.each do |row|
array.push row[:datapoint1].to_f
end
B) Yet, if I attempt to use the map function, I end up with a result like this, and Highcharts fails to render the data:
[[6.67e-09],[4.39e-09],[2.1e-09],[2.52e-09], ..., [3.79e-09]]
From code like:
array = data.map{|row| [(row.datapoint1.to_f)] }
Is there a way to coax the map function to produce results in B that are more akin to the data structure from scenario A?
This gets more involved as I also have to add datetime into this; however, that's another topic, and I just want to understand this first and what can be done to perhaps further control where I'm going.
Ultimately, EVEN SCENARIO B SHOULD WORK according to the data in the example here: http://www.highcharts.com/demo/spline-irregular-time (press the "View options" button at bottom)
Heck, I'll send you a sucker in the mail if you can fill me in on that part! ;)
You can fix arrays like this
[[6.67e-09],[4.39e-09],[2.1e-09],[2.52e-09], ..., [3.79e-09]]
that have nested arrays inside them by using the flatten method on the array.
But you should be able to avoid generating nested arrays in the first place. Just remove the square brackets from your map line:
array = data.map{|row| row.datapoint1.to_f }
Code
a = [[6.67e-09],[4.39e-09],[2.1e-09],[2.52e-09], [3.79e-09]]
b = a.flatten.map{|el| "%.10f" % el }
puts b.inspect
Output
["0.0000000067", "0.0000000044", "0.0000000021", "0.0000000025", "0.0000000038"]
Unless I, too, am missing something, your problem is that you're returning a single-element array from your block (thereby creating an array of arrays) instead of just the value. This should do you:
array = data.map {|row| row.datapoint1.to_f }
# => [ 6.67e-09, 4.39e-09, 2.1e-09, 2.52e-09, ..., 3.79e-09 ]
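The difference is easy to see in a self-contained sketch. The rows here are plain hashes with string values, which is an assumption made only so the example runs on its own; real rows would come from the query.

```ruby
# Hypothetical rows standing in for the query result.
data = [{ datapoint1: "6.67e-09" }, { datapoint1: "4.39e-09" }]

# Wrapping the value in [] makes every block result a one-element array.
nested = data.map { |row| [row[:datapoint1].to_f] }

# Returning the bare value gives a flat array of floats.
flat = data.map { |row| row[:datapoint1].to_f }

puts nested.inspect
puts flat.inspect
puts nested.flatten == flat
```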
