In Pylint, how do I disable "Exactly one space after comma" for multidimensional array indices? - pylint

I like having PyLint check that commas are generally followed by spaces, except in one case: multidimensional indices. For example, I get the following warning from Pylint:
C: 31, 0: Exactly one space required after comma
num_features = len(X_train[0,:])
^ (bad-whitespace)
Is there a way to get rid of the warnings requiring spaces after commas for the case multidimensional arrays, but keep the space-checking logic the same for all other comma uses?
Thanks!

I am sure you figured this out by now but for anyone, like me, who happened upon this looking for an answer...
use # pylint: disable=C0326 on the line that is guilty of this. for instance:
num_features = len(X_train[0,:]) #pylint: disable=C0326
This applies to multiple kinds of space errors. See pylint wiki

You'll almost certainly want to disable this via the .pylintrc file for larger situations.
Example, say I have:
x111 = thing.abc(asdf)
x112_b = thing1.abc(asdf)
x112_b224 = thing.abc(asdf)
x112_f = thing1.abc(asdf)
... lots more
Now, presume I want to visually see the situation:
x111 = thing.abc(asdf)
x112_b = thing1.abc(asdf)
... lots more
so I add the following line to .pylintrc
disable=C0326,C0115,C0116
(note only the first one, c0326, counts, but I'm leaving two other docstring ones there so you can see you just add err messages you want to ignore.)

Related

SPSS Output modify - deleting footnotes

When generating tables in SPSS with the calculation of the mode it will show a footnote when multiple modes are present ("a. Multiple modes exist. The smallest value is shown").
There is a way to hide this footnote, but the hidden footnote will still take up an empty row in the output. For processing reasons I would like to delete the footnote entirely.
I have been trying to do this with the Output Modify command in the syntax, but can't get it to work. There are commands for selecting the footnotes:
/TABLECELLS
SELECT = ["expression", COUNT, MEAN, MEDIAN, RESIDUAL,
PERCENT, SIGNIFICANCE, POSITION(integer),
BODY, HEADERS, FOOTNOTES, TITLE, CAPTION, …]
And for deleting objects:
/DELETEOBJECT DELETE={NO**}
{YES }
But trying to combine these does not yield the wanted result. Is what I am trying to do possible? Or maybe a suggestion for another solution?
Thanks in advance!
try adding "NOTESCAPTIONS = NO" at the end of your OUTPUT MODIFY syntax.
that should remove all notes and captions in the output.
no need to use "DELETEOBJECT" subcommand.

Regex causing high CPU load, causing Rails to not respond

I have a Ruby 1.8.7 script to parse iOS localization files:
singleline_comment = /\/\/(.*)$/
multiline_comment = /\/\*(.*?)\*\//m
string_line = /\s*"(.*?)"\s*=\s*"(.*?)"\s*\;\s*/xm
out = decoded_src.scan(/(?:#{singleline_comment}|#{multiline_comment})?\s*?#{string_line}/)
It used to work fine, but today we tested it with a file that is 800Kb, and that doesn't have ; at the end of each line. The result was a high CPU load and no response from the Rails server. My assumption is that it took the whole file as a single string in the capturing group and that blocked the server.
The solution was to add ? (regex quantificator, 0 or 1 time) to the ; literal character:
/\s*"(.*?)"\s*=\s*"(.*?)"\s*\;?\s*/xm
Now it works fine again even with those files in the old iOS format, but my fear now is, what if a user submits a malformed file, like one with no ending ". Will my server get blocked again?
And how do I prevent this? Is there any way to try to run this only for five seconds? What I can I do to avoid halting my whole Rails application?
It looks like you're trying to parse an entire configuration as if it was a string. While that is doable, it's error-prone. Regular expression engines have to do a lot of looking forward and backward, and poorly written patterns can end up wasting a huge amount of CPU time. Sometimes a minor tweak will fix the problem, but the more text being processed, and the more complex the expression, the higher the chance of something happening that will mess you up.
From benchmarking different ways of getting at data for my own work, I've learned that anchoring regexp patterns can make a huge difference in speed. If you can't anchor a pattern somehow, then you are going to suffer from the backtracking and greediness of patterns unless you can limit what the engine wants to do by default.
I have to parse a lot of device configurations, but instead of trying to treat them as a single string, I break them down into logical blocks consisting of arrays of lines, and then I can provide logic to extract data from those blocks based on knowledge that blocks contain certain types of information. Small blocks are faster to search, and it's a lot easier to write patterns that can be anchored, providing huge speedups.
Also, don't hesitate to use Ruby's String methods, like split to tear apart lines, and sub-string matching to find lines containing what you want. They're very fast and less likely to induce slowdowns.
If I had a string like:
config = "name:\n foo\ntype:\n thingie\nlast update:\n tomorrow\n"
chunks = config.split("\n").slice_before(/^\w/).to_a
# => [["name:", " foo"], ["type:", " thingie"], ["last update:", " tomorrow"]]
command_blocks = chunks.map{ |k, v| [k[0..-2], v.strip] }.to_h
command_blocks['name'] # => "foo"
command_blocks['last update'] # => "tomorrow"
slice_before is a very useful method for this sort of task as it lets us define a pattern that is then used to test for breaks in the master array, and group by those. The Enumerable module has lots of useful methods in it, so be sure to look through it.
The same data could be parsed.
Of course, without sample data for what you're trying to do it's difficult to suggest something that works better, but the idea is, break down your input into small manageable chunks and go from there.
As a comment on how you're defining your patterns.
Instead of using /\/.../ (which is known as "leaning-toothpicks syndrome") use %r which allows you to define a different delimiter:
singleline_comment = /\/\/(.*)$/ # => /\/\/(.*)$/
singleline_comment = %r#//(.*)$# # => /\/\/(.*)$/
multiline_comment = /\/\*(.*?)\*\//m # => /\/\*(.*?)\*\//m
multiline_comment = %r#/\*(.*?)\*/#m # => /\/\*(.*?)\*\//m
The first line in each sample above is how you're doing it, and the second is how I'd do it. They result in identical regexp objects, but the second ones are easier to understand.
You can even have Regexp help you by escaping things for you:
NONGREEDY_CAPTURE_NONE_TO_ALL_CHARS = '(.*?)'
GREEDY_CAPTURE_NONE_TO_ALL_CHARS = '(.*)'
EOL = '$'
Regexp.new(Regexp.escape('//') + GREEDY_CAPTURE_NONE_TO_ALL_CHARS + EOL) # => /\/\/(.*)$/
Regexp.new(Regexp.escape('/*') + NONGREEDY_CAPTURE_NONE_TO_ALL_CHARS + Regexp.escape('*/'), Regexp::MULTILINE) # => /\/\*(.*?)\*\//m
Doing this you can iteratively build up extremely complex expressions while keeping them relatively easy to maintain.
As far as halting your Rails app, don't try to process the files in the same Ruby process. Run a separate job that watches for the files and process them and store whatever you're looking for to be accessed as needed later. That way your server will continue to respond rather than lock up. I wouldn't do it in a thread, but would write a separate Ruby script that looks for incoming data, and if nothing is found, sleeps for some interval of time then looks again. Ruby's sleep method will help with that, or you could use the cron capability of your OS.

Ruby regular expression for asterisks/underscore to strong/em?

As part of a chat app I'm writing, I need to use regular expressions to match asterisks and underscores in chat messages and turn them into <strong> and <em> tags. Since I'm terrible with regex, I'm really stuck here. Ideally, we would have it set up such that:
One to three words, but not more, can be marked for strong/em.
Patterns such as "un*believ*able" would be matched.
Only one or the other (strong OR em) work within one line.
The above parameters are in order of importance, with only #1 being utterly necessary - the others are just prettiness. The closest I came to anything that worked was:
text = text.sub(/\*([(0-9a-zA-Z).*])\*/,'<b>\1<\/b>')
text = text.sub(/_([(0-9a-zA-Z).*])_/,'<i>\1<\/i>')
But it obviously doesn't work with any of our params.
It's odd that there's not an example of something similar already out there, given the popularity of using asterisks for bold and whatnot. If there is, I couldn't find it outside of plugins/gems (which won't work for this instance, as I really only need it in in one place in my model). Any help would be appreciated.
This should help you finish what you are doing:
sub(/\*(.*)\*/,'<b>\1</b>')
sub(/_(.*)_/,'<i>\1</i>')
Firstly, your criteria are a little strange, but, okay...
It seems that a possible algorithm for this would be to find the number of matches in a message, count them to see if there are less than 4, and then try to perform one set of substitutions.
strong_regexp = /\*([^\*]*)\*/
em_regexp = /_([^_]*)_/
def process(input)
if input ~= strong_regexp && input.match(strong_regexp).size < 4
input.sub strong_regexp, "<b>\1<\b>"
elsif input ~= em_regexp && intput.match(em_regexp).size < 4
input.sub em_regexp, "<i>\1<\i>"
end
end
Your specifications aren't entirely clear, but if you understand this, you can tweak it yourself.

Fastest way to skip lines while parsing files in Ruby?

I tried searching for this, but couldn't find much. It seems like something that's probably been asked before (many times?), so I apologize if that's the case.
I was wondering what the fastest way to parse certain parts of a file in Ruby would be. For example, suppose I know the information I want for a particular function is between lines 500 and 600 of, say, a 1000 line file. (obviously this kind of question is geared toward much large files, I'm just using those smaller numbers for the sake of example), since I know it won't be in the first half, is there a quick way of disregarding that information?
Currently I'm using something along the lines of:
while buffer = file_in.gets and file_in.lineno <600
next unless file_in.lineno > 500
if buffer.chomp!.include? some_string
do_func_whatever
end
end
It works, but I just can't help but think it could work better.
I'm very new to Ruby and am interested in learning new ways of doing things in it.
file.lines.drop(500).take(100) # will get you lines 501-600
Generally, you can't avoid reading file from the start until the line you are interested in, as each line can be of different length. The one thing you can avoid, though, is loading whole file into a big array. Just read line by line, counting, and discard them until you reach what you look for. Pretty much like your own example. You can just make it more Rubyish.
PS. the Tin Man's comment made me do some experimenting. While I didn't find any reason why would drop load whole file, there is indeed a problem: drop returns the rest of the file in an array. Here's a way this could be avoided:
file.lines.select.with_index{|l,i| (501..600) === i}
PS2: Doh, above code, while not making a huge array, iterates through the whole file, even the lines below 600. :( Here's a third version:
enum = file.lines
500.times{enum.next} # skip 500
enum.take(100) # take the next 100
or, if you prefer FP:
file.lines.tap{|enum| 500.times{enum.next}}.take(100)
Anyway, the good point of this monologue is that you can learn multiple ways to iterate a file. ;)
I don't know if there is an equivalent way of doing this for lines, but you can use seek or the offset argument on an IO object to "skip" bytes.
See IO#seek, or see IO#open for information on the offset argument.
Sounds like rio might be of help here. It provides you with a lines() method.
You can use IO#readlines, that returns an array with all the lines
IO.readlines(file_in)[500..600].each do |line|
#line is each line in the file (including the last \n)
#stuff
end
or
f = File.new(file_in)
f.readlines[500..600].each do |line|
#line is each line in the file (including the last \n)
#stuff
end

Parsing text files in Ruby when the content isn't well formed

I'm trying to read files and create a hashmap of the contents, but I'm having trouble at the parsing step. An example of the text file is
put 3
returns 3
between
3
pargraphs 1
4
3
#foo 18
****** 2
The word becomes the key and the number is the value. Notice that the spacing is fairly erratic. The word isn't always a word (which doesn't get picked up by /\w+/) and the number associated with that word isn't always on the same line. This is why I'm calling it not well-formed. If there were one word and one number on one line, I could just split it, but unfortunately, this isn't the case. I'm trying to create a hashmap like this.
{"put"=>3, "#foo"=>18, "returns"=>3, "paragraphs"=>1, "******"=>2, "4"=>3, "between"=>3}
Coming from Java, it's fairly easy. Using Scanner I could just use scanner.next() for the next key and scanner.nextInt() for the number associated with it. I'm not quite sure how to do this in Ruby when it seems I have to use regular expressions for everything.
I'd recommend just using split, as in:
h = Hash[*s.split]
where s is your text (eg s = open('filename').read. Believe it or not, this will give you precisely what you're after.
EDIT: I realized you wanted the values as integers. You can add that as follows:
h.each{|k,v| h[k] = v.to_i}

Resources