Undo wrong sorting after missing leading zeros - algorithm

Here is a little riddle for anyone who wants to spend some time on it:
I have around 200 files that got badly sorted and renamed because their numbers lacked leading zeros. I have to undo this sorting and assign the original values again, so that I get a mapping like this:
current file name -> original file name
TimePoint1 -> TimePoint1
TimePoint2 -> TimePoint10
TimePoint3 -> TimePoint100
TimePoint4 -> TimePoint101
TimePoint5 -> TimePoint102
...
TimePoint250 -> TimePoint250
I will work on an answer, but I didn't want to miss any of the elegant solutions you might provide.
Thanks and have fun!

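What you have here is simply alphabetical order, so the easiest way to undo the sorting is to apply that order again: the file currently numbered i + 1 originally carried the i-th number in the alphabetically sorted list. I am not sure which language you want to use, so here is an example in Ruby:
# The original numbers are 1..250 sorted as *strings*, i.e. alphabetically.
originals = (1..250).map(&:to_s).sort
# Pass 1: move every file to a temporary name so that no rename overwrites a file that is still waiting.
originals.each_index { |i| File.rename("TimePoint#{i + 1}", "TimePoint#{i + 1}.tmp") }
# Pass 2: give each file its original number back.
originals.each_with_index do |original, i|
  File.rename("TimePoint#{i + 1}.tmp", "TimePoint#{original}")
end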
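A dry run can make the mapping easier to check before any file is touched. The sketch below only builds and prints the table from the question; the helper name original_numbers is made up for illustration, and 250 is the count taken from the example above.

```ruby
# Hypothetical helper: maps each current number to the original number it replaced.
def original_numbers(count)
  sorted = (1..count).map(&:to_s).sort # alphabetical order of the original numbers
  sorted.each_with_index.map { |original, i| [i + 1, original.to_i] }.to_h
end

mapping = original_numbers(250)

# Print the first few rows so they can be compared with the table in the question.
mapping.first(5).each do |current, original|
  puts "TimePoint#{current} -> TimePoint#{original}"
end
```

If the printed rows match what you expect, the same mapping can drive the actual renames.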

Related

How to efficiently generate a unique combination of first and last name given an array of full names

I'm having a bit of difficulty with a problem. I have an array of names like so:
[Brutananadilewski, Carl]
[Crews, Xander]
[Cartman, Eric]
[Rubio, Daniel]
[Daniels, Julie]
etc. etc.
What I need to do is to create a list of unique names from this list without having first and last names repeated. So I would have the following as a result:
[Brutananadilewski, Daniel]
[Crews, Eric]
[Cartman, Xander]
[Rubio, Carl]
[Jill, Daniels]
The problem I'm having is doing this efficiently. My first instinct was to use permutation, and here is a snippet from the Ruby docs:
a.permutation(2).to_a #=> [[1,2],[1,3],[2,1],[2,3],[3,1],[3,2]]
The problem is that this example produces pairs such as [[1,2],[1,3]], where the first element repeats.
If these were first and last names, that wouldn't work. I couldn't have this:
[Rubio Daniel, Rubio Julie, Rubio Eric]
Has anyone dealt with this before? I'm having an awfully hard time with efficiency and just getting it to work. Help would be appreciated thank you.
You can use transpose as mentioned by 3limin4tor, then shuffle and zip as mentioned by Dave Newton:
surnames, forenames = names.transpose
shuffled_forenames = forenames.shuffle
shuffled_names = shuffled_forenames.zip(surnames)
The desired outcome isn't entirely clear from the question, but if you're trying to create all combinations of first and last names and then take a subset of those, you could also use product and uniq to get all the unique combinations:
names = [
%w(Brutananadilewski Carl),
%w(Crews Xander),
%w(Cartman Eric),
%w(Rubio Daniel),
%w(Daniels Julie)
]
surnames, forenames = names.transpose
all_name_combos = forenames.product(surnames).uniq
You could then use shuffle and sample(n), where n is an integer, to get a random subset of those name combinations:
all_name_combos.shuffle.sample(5)
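One caveat with a plain shuffle is that a person can end up with their own first name again. A minimal sketch of one way around that, assuming all forenames are distinct and there are at least two names, is to rotate the forenames by a non-zero offset instead of shuffling. This swaps in rotate for shuffle and is not what the answers above propose; the variable names are illustrative.

```ruby
names = [
  %w[Brutananadilewski Carl],
  %w[Crews Xander],
  %w[Cartman Eric],
  %w[Rubio Daniel],
  %w[Daniels Julie]
]

surnames, forenames = names.transpose

# A non-zero offset means nobody is paired with their own forename
# (assumption: forenames are distinct and names.size > 1).
offset = rand(1...names.size)
remixed = surnames.zip(forenames.rotate(offset))

remixed.each { |surname, forename| puts "[#{surname}, #{forename}]" }
```

Each surname and each forename still appears exactly once, which matches the "no repeats" requirement in the question. The trade-off is that a rotation is far less random than a shuffle.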

In Pylint, how do I disable "Exactly one space after comma" for multidimensional array indices?

I like having PyLint check that commas are generally followed by spaces, except in one case: multidimensional indices. For example, I get the following warning from Pylint:
C: 31, 0: Exactly one space required after comma
num_features = len(X_train[0,:])
^ (bad-whitespace)
Is there a way to get rid of the warnings requiring spaces after commas in multidimensional array indices, but keep the space-checking logic the same for all other uses of commas?
Thanks!
I am sure you have figured this out by now, but for anyone who, like me, happened upon this looking for an answer:
Use # pylint: disable=C0326 on the offending line. For instance:
num_features = len(X_train[0,:])  # pylint: disable=C0326
This applies to several kinds of whitespace errors. See the Pylint wiki.
For anything larger than a handful of lines, you'll almost certainly want to disable this in the .pylintrc file instead.
For example, say I have:
x111 = thing.abc(asdf)
x112_b = thing1.abc(asdf)
x112_b224 = thing.abc(asdf)
x112_f = thing1.abc(asdf)
... lots more
Now suppose I want to align the assignments so that the structure is easier to see:
x111   = thing.abc(asdf)
x112_b = thing1.abc(asdf)
... lots more
so I add the following line to the [MESSAGES CONTROL] section of .pylintrc:
disable=C0326,C0115,C0116
(Note that only the first one, C0326, matters here. I'm leaving the two docstring ones in so you can see that you simply list the messages you want to ignore.)

What is the general format of Ruby "diff-lcs" diff output?

The Ruby diff-lcs library does a great job of generating the changeset you need to get from one sequence to another but the format of the output is somewhat confusing to me. I would expect a list of changes but instead the output is always a list containing one or two lists of changes. What is the meaning/intent of having multiple lists of changes?
Consider the following simple example:
> Diff::LCS.diff('abc', 'a-c')
# => [[#<Diff::LCS::Change:0x01 #action="-", #position=1, #element="b">,
# #<Diff::LCS::Change:0x02 #action="+", #position=1, #element="-">],
# [#<Diff::LCS::Change:0x03 #action="-", #position=3, #element="">]]
Ignoring the fact that the last change is blank, why are there two lists of changes instead of just one?
You might have better luck with a better example. If you do this:
Diff::LCS.diff('ab cd', 'a- c_')
Then the output looks like this (with the noise removed):
[
[
<#action="-", #position=1, #element="b">,
<#action="+", #position=1, #element="-">
], [
<#action="-", #position=4, #element="d">,
<#action="+", #position=4, #element="_">
]
]
If we look at Diff::LCS.diff('ab cd ef', 'a- c_ e+'), then we'd get three inner arrays instead of two.
What possible reason could there be for this? There are three operations in a diff:
Add a string.
Remove a string.
Change a string.
A change is really just a combination of a remove and an add, so we're left with remove and add as the fundamental operations; these line up with the #action values quite nicely. However, when humans look at diffs, we want to see a change as a distinct operation: we want to see that b has become -. The "remove b, add -" version is an implementation detail.
If all we had was this:
[
<#action="-", #position=1, #element="b">,
<#action="+", #position=1, #element="-">,
<#action="-", #position=4, #element="d">,
<#action="+", #position=4, #element="_">
]
then you'd have to figure out which +/- pairs were really changes and which were separate additions and removals.
So the inner arrays map the two fundamental operations (add, remove) to the three operations (add, remove, change) that humans want to see.
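The grouping described above can be sketched without the gem. In the snippet below, Change is a hypothetical stand-in for Diff::LCS::Change (only the three fields shown in the output above), and classify_hunk is an illustrative helper, not part of diff-lcs.

```ruby
# Hypothetical stand-in for Diff::LCS::Change so the sketch runs without the gem.
Change = Struct.new(:action, :position, :element)

# Label one inner array: a mix of "-" and "+" reads as a change,
# otherwise it is a pure addition or a pure removal.
def classify_hunk(hunk)
  actions = hunk.map(&:action).uniq
  return :change if actions.include?("-") && actions.include?("+")

  actions == ["+"] ? :add : :remove
end

hunks = [
  [Change.new("-", 1, "b"), Change.new("+", 1, "-")], # b became -
  [Change.new("+", 2, "x")],                          # x was added
  [Change.new("-", 4, "d")]                           # d was removed
]

kinds = hunks.map { |hunk| classify_hunk(hunk) }
puts kinds.inspect
```

With a flat list of changes this labelling would need guesswork about which pairs belong together; with the nested arrays it is a simple per-array check.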
You might want to examine the structure of the outputs from these as well:
Diff::LCS.diff('ab cd', 'a- x c_')
Diff::LCS.diff('ab', 'abx')
Diff::LCS.diff('ab', 'xbx')
I think an explicit change #action for Diff::LCS::Change would be better, but at least the inner arrays let you group the individual additions and removals into higher-level edits.

Ruby regular expression for asterisks/underscore to strong/em?

As part of a chat app I'm writing, I need to use regular expressions to match asterisks and underscores in chat messages and turn them into <strong> and <em> tags. Since I'm terrible with regex, I'm really stuck here. Ideally, we would have it set up such that:
One to three words, but not more, can be marked for strong/em.
Patterns such as "un*believ*able" would be matched.
Only one or the other (strong OR em) work within one line.
The above parameters are in order of importance, with only #1 being utterly necessary - the others are just prettiness. The closest I came to anything that worked was:
text = text.sub(/\*([(0-9a-zA-Z).*])\*/,'<b>\1<\/b>')
text = text.sub(/_([(0-9a-zA-Z).*])_/,'<i>\1<\/i>')
But it obviously doesn't work with any of our params.
It's odd that there's not an example of something similar already out there, given the popularity of using asterisks for bold and whatnot. If there is, I couldn't find it outside of plugins/gems (which won't work in this instance, as I really only need it in one place in my model). Any help would be appreciated.
This should help you finish what you are doing:
sub(/\*(.*)\*/,'<b>\1</b>')
sub(/_(.*)_/,'<i>\1</i>')
Firstly, your criteria are a little strange, but, okay...
It seems that a possible algorithm for this would be to find the matches in a message, count them to see whether there are fewer than four, and then perform one kind of substitution.
# Constants rather than local variables, so they are visible inside the method.
STRONG_REGEXP = /\*([^*]*)\*/
EM_REGEXP = /_([^_]*)_/
def process(input)
  # scan returns every match, so its size is the number of marked spans
  if input =~ STRONG_REGEXP && input.scan(STRONG_REGEXP).size < 4
    input.sub(STRONG_REGEXP, '<b>\1</b>') # single quotes keep \1 as a backreference
  elsif input =~ EM_REGEXP && input.scan(EM_REGEXP).size < 4
    input.sub(EM_REGEXP, '<i>\1</i>')
  else
    input # nothing to mark up
  end
end
Your specifications aren't entirely clear, but if you understand this, you can tweak it yourself.
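For the first two requirements in the question (one to three words, and matches inside a word such as "un*believ*able"), a pattern that counts words directly may be closer to what was asked. This is a minimal sketch under that reading of the requirements; the constant and method names are made up, and it does not enforce the third requirement (only strong or only em per line).

```ruby
# One to three whitespace-separated words between the markers, no marker inside.
WORDS_STRONG = /\*([^\s*]+(?:\s[^\s*]+){0,2})\*/
WORDS_EM     = /_([^\s_]+(?:\s[^\s_]+){0,2})_/

# Hypothetical helper: replaces every qualifying span in the message.
def markup(text)
  text.gsub(WORDS_STRONG, '<strong>\1</strong>')
      .gsub(WORDS_EM, '<em>\1</em>')
end

puts markup("un*believ*able")
puts markup("this is *really quite good* stuff")
puts markup("*far too many words here*")
```

Spans with more than three words are left untouched because the pattern cannot reach the closing marker. Note that identifiers like snake_case_name would also be matched, which may or may not be wanted in a chat app.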

Fastest way to skip lines while parsing files in Ruby?

I tried searching for this, but couldn't find much. It seems like something that's probably been asked before (many times?), so I apologize if that's the case.
I was wondering what the fastest way to parse certain parts of a file in Ruby would be. For example, suppose I know the information I want for a particular function is between lines 500 and 600 of, say, a 1000-line file. (Obviously this kind of question is geared toward much larger files; I'm just using these smaller numbers for the sake of example.) Since I know it won't be in the first half, is there a quick way of disregarding that part of the file?
Currently I'm using something along the lines of:
while buffer = file_in.gets and file_in.lineno < 600
  next unless file_in.lineno > 500
  # chomp rather than chomp!, which returns nil when there is nothing to remove
  if buffer.chomp.include? some_string
    do_func_whatever
  end
end
It works, but I just can't help but think it could work better.
I'm very new to Ruby and am interested in learning new ways of doing things in it.
file.each_line.drop(500).take(100) # will get you lines 501-600
Generally, you can't avoid reading the file from the start up to the line you are interested in, as each line can be of a different length. The one thing you can avoid, though, is loading the whole file into a big array. Just read line by line, counting, and discard lines until you reach what you are looking for, pretty much like your own example. You can just make it more Rubyish.
PS. The Tin Man's comment made me do some experimenting. While I didn't find any reason why drop would load the whole file, there is indeed a problem: drop returns the rest of the file in an array. Here's a way this could be avoided:
file.each_line.select.with_index{|l,i| (500...600) === i} # indices are zero-based, so this is lines 501-600
PS2: Doh, the above code, while not making a huge array, iterates through the whole file, even the lines after 600. :( Here's a third version:
enum = file.each_line
500.times{enum.next} # skip 500
enum.take(100) # take the next 100
or, if you prefer a more functional style:
file.each_line.tap{|enum| 500.times{enum.next}}.take(100)
Anyway, the good point of this monologue is that you can learn multiple ways to iterate a file. ;)
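Another way to express "skip, then take, then stop" is a lazy enumerator, which is not mentioned in the answer above. The sketch below assumes Ruby 2.0 or later; line_range is a made-up helper name, and the temporary file exists only so the example can run on its own.

```ruby
require "tempfile"

# Hypothetical helper: returns lines first_line..last_line (1-based, inclusive).
# Lines are read on demand and reading stops once the last wanted line is seen.
def line_range(path, first_line, last_line)
  File.foreach(path).lazy.drop(first_line - 1).first(last_line - first_line + 1)
end

# Build a throwaway 1000-line file to try it on.
sample = Tempfile.new("sample")
(1..1000).each { |n| sample.puts "line #{n}" }
sample.close

slice = line_range(sample.path, 501, 600)
sample.unlink

puts slice.size
puts slice.first
```

The first 500 lines are still read and discarded, as the answer explains, but nothing after line 600 is read and no array larger than the requested slice is built.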
I don't know if there is an equivalent way of doing this for lines, but you can use seek or the offset argument on an IO object to "skip" bytes.
See IO#seek, or see IO#open for information on the offset argument.
Sounds like rio might be of help here. It provides you with a lines() method.
You can use IO.readlines, which returns an array with all the lines (so the whole file is read into memory):
IO.readlines(file_in)[500..600].each do |line|
#line is each line in the file (including the last \n)
#stuff
end
or
f = File.new(file_in)
f.readlines[500..600].each do |line|
#line is each line in the file (including the last \n)
#stuff
end

Resources