Auto-correct line lengths in a library - ruby

I am working on a huge project, and we have decided to make all code conform to 80 characters per line. It is a Ruby-based project with wrappers for C. For Ruby, I decided to use Rubocop, and with the command:
rubocop --only LineLength
I got 1,714 offenses where the line was longer than 80 characters. Aside from that, Rubocop detected many other errors, which I want to ignore for now.
I am looking for the easiest way to auto-correct all the line length violations only, to satisfy the 80 character limit both in C and Ruby.

Please don't change line length automatically.
Line length is a metric, not a style. Styles can often be exchanged, like double vs. single quotes, or hashrockets vs. the newer hash syntax introduced in Ruby 1.9. Choosing a style is usually a matter of taste and has little (if any) impact on the program itself. That's why there are auto-corrections for styles: changing them does not change semantics.
Metrics include line length, class length and ABC size, among others. Checking metrics using static code analysis is something completely different from checking styles. Setting a maximum line length, for instance, is not a matter of taste; rather, you use it to enforce a certain style of programming. A programmer has to avoid long variable names and deep nesting to keep lines under the limit.
Metrics can also indicate problems in the code. For example, too high ABC-size indicates a method might be doing too much.
Aside from that, it would be very difficult, if not impossible, to shorten all lines of code automatically, since Ruby is a very complex language.
Instead of reducing line length automatically, here are some alternatives:
See if you can reduce the number of violations by enabling the AllowHeredoc and AllowURI options of the LineLength cop. Both are described in the Rubocop documentation for that cop.
Run rubocop --only LineLength --auto-gen-config and use Rubocop's configuration to stop checking line length.
Ask yourself: what value do I gain by reducing line length?
Don't think of overly long lines as a style violation, but rather as a possible indicator of an underlying problem. Try to find that problem and solve it.
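To make the last suggestion concrete, here is a minimal sketch of a long line that hides a small design problem: one expression that filters, flattens, multiplies and sums at once. All names (Order, Customer, Item, long_total, short_total) are hypothetical and only serve the illustration; they are not taken from the project in the question.

```ruby
# Hypothetical data model, for illustration only.
Order = Struct.new(:customer, :items)
Customer = Struct.new(:name, :country)
Item = Struct.new(:price, :quantity)

# Before: a single line far beyond 80 characters that does four things at once.
def long_total(orders)
  orders.select { |order| order.customer.country == 'DE' }.flat_map(&:items).map { |item| item.price * item.quantity }.sum
end

# After: each step gets a name, and every line fits without wrapping tricks.
def german_orders(orders)
  orders.select { |order| order.customer.country == 'DE' }
end

def item_total(item)
  item.price * item.quantity
end

def short_total(orders)
  german_orders(orders).flat_map(&:items).sum { |item| item_total(item) }
end
```

An automatic formatter could only wrap the long line; it could not decide that the filtering and the price calculation deserve their own methods.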


Autofix order of selectors

We use SonarQube against our application. One of the SonarQube rules says:
Selectors of lower specificity should come before overriding selectors of higher specificity
The details are here. As my application has many violations, changing the order by hand isn't really feasible. I'm wondering if there's a way to use scss-lint, stylelint or something else in a "fix" mode that could change the order of the selectors. I looked but couldn't find anything in stylelint. Maybe it can't safely be done automatically, as changing the order could affect specificity and therefore change the application behaviour...
As far as I know, there is no linter that provides this. (I am curious about it myself.) But here are some thoughts on whether that 'rule' needs to be followed:
Indeed, writing Sass/CSS so that selectors with lower specificity come first is good practice. The CSS structure becomes more readable, and it is easier to build up your code structure because there is a clearer system in your head (and in the code).
But given how CSS works, there is REALLY NO NEED to do it this way. The code does not become safer or less safe by doing so, and pages do not load slower if you don't. That is exactly what specificity is for: specificity, not the order of the selectors, decides which rule wins, so you can write your code in the order you need. Only when the specificity is the same does the order count.
So maybe this rule leads to 'better' code. But NOT ALL RULES NEED TO BE FULFILLED. Neither all the best-practice rules Google tries to establish through the audits in its browser nor all the rules other analysis tools provide need to be followed.
And if it is not done in this project because correcting it costs resources ... it could be, but does not have to be, a goal for the next project ;-)

Adding keyword commands and functions to Textmate 2 bundle syntax

I want to add some additional syntax highlighting definitions to an existing bundle, but I need some general advice on how to do this. I'm not building a syntax from scratch, and I think my request is pretty simple, but I think it involves some subtleties for which I find the manual somewhat impenetrable in finding the answer.
Basically, I'm trying to fill out the syntax definitions for the Stata bundle. It's great, but there is no built in support for automatically highlighting the base commands and the installed functions, only a handful of basic control statements. Stata is a language which is primarily used by calling lots of different high level pre-defined command calls, like command foo bar, options(). The convention is that these command calls be highlighted.
There are a ton of these commands, and stubs which are used for convenience. Just the base install has almost 3500. Even optimizing them using the bundle helper, which obviously gets rid of the stub issue, still yields a massive regex list. I can easily cut this down to fewer than 1000 important ones, but it's still a lot. There are also 350 "functions" which I would like to match with the syntax function().
I essentially have 3 questions:
Am I creating a serious problem by including a very comprehensive list of matching definitions?
How do I restrict a command to only highlight when it either begins a line or there is only whitespace between the beginning of the line and the command?
What is the preferred way of restricting the list of functions() to only highlight when they have attached parentheses?

Rubocop error 'Class definition is too long ruby'

I am getting rubocop error 'Class definition is too long. [236/100]'.
My class looks like below:
class SomeClassName
include HelperModule
attr_accessor :aaa, :bbb, :ccc
.... methods .....
What might go wrong?
The rubocop docs for ClassLength say "length of a class exceeds some maximum value".
What does it mean?
Yes, this is because rubocop considers the class to have too many lines overall. I agree classes shouldn't get too long, but I think that should ultimately be determined by questions like: does the class have a single responsibility, are the methods concise enough, are there methods that could be shared via a module, etc. The number/alert is a great warning, though. If things in the class look OK, you can add # rubocop:disable ClassLength immediately above the class definition.
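As a minimal sketch of where such a directive goes, the class from the question might look like this. The fully qualified cop name Metrics/ClassLength is used here, and a matching enable comment is added so the exemption stays limited to this one class; the constructor is a hypothetical stand-in for the real methods, and the HelperModule include is left out so the snippet runs on its own.

```ruby
# rubocop:disable Metrics/ClassLength
class SomeClassName
  attr_accessor :aaa, :bbb, :ccc

  # Hypothetical constructor standing in for the many real methods.
  def initialize(aaa, bbb, ccc)
    @aaa = aaa
    @bbb = bbb
    @ccc = ccc
  end

  # ... the remaining methods of the long class go here ...
end
# rubocop:enable Metrics/ClassLength
```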
This likely means that your class definition takes more than 100 lines of code.
(There's lots of good info here already, but I came to this answer looking for the syntax for specifying the max lines per class in Rubocop, and I figure others might come here for that as well.)
In .rubocop.yml:
# Allow classes longer than 100 lines of code
Metrics/ClassLength:
  Max: 250 # or whatever ends up being appropriate
General answer
Do I need to disable this cop if I want my class to be longer than 100 here, or do I need to configure the maximum length? What would you suggest?
I use rubocop with this workflow, assuming I encounter a pre-existing code base with loads of warnings:
Run rubocop --auto-gen-config to create a "TODO" file. Include that file in your primary rubocop configuration file (inherit_from: .rubocop_todo.yml). See their documentation for details. In your example, it would generate a configuration that allows long classes (at least 236 lines, or more if you have larger classes).
Now, if you run rubocop, it will ignore all offences that you put in your TODO file, i.e., everything will look fine now. Only if you introduce more errors/warnings (like a class with 237 lines) will it fire up again. So, at this point, rubocop will not do anything for you except prevent you from making things worse.
Once in a while, when I have some time to kill, I pick a rule from the TODO file and work on it. There are three possibilities:
Remove the rule from the TODO file. This will restore the original behaviour of rubocop.
Tighten the number associated with it a little. Say the class length is limited to 250 right now; I would like to set it to (say) 100 lines, but I know that I do not have time to refactor many classes right now. So I set it to 240. This will flag all classes that are between 240 and 250 lines; likely only a few that I can handle easily. I fix them and move on. Another day, I may return to it and go from 240 to 230, etc.
Sometimes I decide that I do not care about a particular warning. Then I move the configuration from the TODO file to the proper .rubocop.yml file, allowing it permanently.
So, there are no hard and fast rules about all of this. You are supposed to find your own values. Some things that rubocop objects to are totally fine with me since they are more down to coding style than correctness or whatever.
Specific answer
Do I need to disable this cop if I want my class to be longer than 100 here, or do I need to configure the maximum length? What would you suggest?
I most certainly do have a maximum amount of lines configured for my class files (as well as for methods). The length of a unit of code (be it a class or a method) is a very simple yet effective "code smell" that points to grown code and candidates for refactoring/splitting.
I have picked some numbers for me, and I stick with them. I.e., I don't constantly move them up or down to suit the particular code around, but if a piece of code grows "1 line too large", I take action. Most times my target is to split it roughly in half, which will, in the long run, lead to the minimum required effort.
If a class is very long, it usually breaks the "one responsibility per class" rule. It is usually beneficial to break it down into parts. Not just random subclasses, but actually OO-justifiable, patterned constructs.
If a method is very long, it can sometimes point to OO measures to be taken (i.e., classes introduced or the method being split over existing classes; especially if the method consists of large if/else constructs or especially case statements), but more often it calls for simple old-fashioned refactoring into smaller (likely private) methods.
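A minimal sketch of the case-statement situation mentioned above, using hypothetical names (area_of, Circle, Rectangle, total_area) that are not from the question: the "before" method grows with every new branch, while the "after" version moves each branch into a small class of its own.

```ruby
# Before: one method that gains a branch (and more lines) for every new shape.
def area_of(shape)
  case shape[:type]
  when :circle
    Math::PI * shape[:radius]**2
  when :rectangle
    shape[:width] * shape[:height]
  else
    raise ArgumentError, "unknown shape: #{shape[:type]}"
  end
end

# After: each branch becomes a small class with a single responsibility.
class Circle
  def initialize(radius)
    @radius = radius
  end

  def area
    Math::PI * @radius**2
  end
end

class Rectangle
  def initialize(width, height)
    @width = width
    @height = height
  end

  def area
    @width * @height
  end
end

# Callers no longer need a case statement; they simply ask each object.
def total_area(shapes)
  shapes.sum(&:area)
end
```

Adding a new shape now means adding a new short class rather than lengthening an existing method, which is one way to stay under both the method and the class limits.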
Have fun finding your favourite settings in rubocop, it's awesome.

Using strings instead of symbols: good or evil?

Often enough, I find myself dealing with lists of function options (or more general replacement lists) of the form {foo->value,...}. This leads to bugs when foo already has a value in $Context. One obvious way to prevent this is using a string "foo" instead of the symbol: {"foo"->value,...}. This works, but seems to draw ire of some seasoned LISPers I know, who chastise me for conflating symbols and strings and tell me to use built-in quoting constructs.
While it is certainly possible to write code that avoids collisions without using strings, it often seems more trouble than it is worth. On the other hand, I haven't seen too many examples of {"string"->value} type replacement rules. So the question to you is -- is this an acceptable usage pattern?.. Are there cases where it is particularly appropriate?.. Where should it be avoided?..
In my opinion (disclaimer - it is only my opinion), it is best to avoid using strings as option names, at least for "main" options in your function. Strings OTOH are totally fine as settings (r.h.s. of options). This is not to say that you cannot use strings, just as you noted. Perhaps they could be more appropriate for sub-options, and they are used in this way by many system functions (usually "superfunctions" like NDSolve, which may have sub-options within options). The main problem I see with using strings is that they reduce the introspection capabilities, both for the system and for the user. In other words, it is harder to discover an option that has a string name than one with a symbol name - for the latter I can just inspect the names of the symbols in a package, and symbolic option names also have usage messages. You may also want to automate some things, such as writing a utility that finds all option names in the package etc. This is easier to do when option names are symbols, since they all belong to the same context. It is also easy to discover that some options do not have usage messages; one can do that automatically by writing a utility function.
Finally, you may have better protection against accidental collisions of similar option names. It may be that many option sequences are passed to your function, and occasionally they may contain options with the same name. If option names were symbols, the full symbol names would be different. Then you would both get a shadowing warning and, at the same time, protection - only the correct (full) option name would be used. For strings, you don't get any warning, and may end up using an incorrect option setting if the duplicate string option name with a wrong setting (intended for a different function, say) happens to be first in the list. This scenario is more likely to occur in larger projects, but bugs like this are probably very hard to catch (this is a guess, I have never had such a situation).
As for possible collisions, if you follow some naming conventions, such as always starting option names with a capital letter, putting most of your code in packages, and not starting your variable or function names (for functions in the interactive session) with a capital letter, then you will greatly reduce the chance of such collisions. Additionally, you should Protect option names when you define them, or at the end of the package. Then the collisions will be detected as cases of shadowing. Avoiding shadowing, OTOH, is a general necessity, so the case of options is no more special in this respect than that of function names etc.

Cross version line matching

I'm considering how to do automatic bug tracking and as part of that I'm wondering what is available to match source code line numbers (or more accurate numbers mapped from instruction pointers via something like addr2line) in one version of a program to the same line in another. (Assume everything is in some kind of source control and is available to my code)
The simplest approach would be to use a diff tool/lib on the files and do some math on the line number spans, however this has some limitations:
It doesn't handle cross file motion.
It might not play well with lines that get changed.
It doesn't look at the information available in the intermediate versions.
It provides no way to manually patch up lines when the diff tool gets things wrong.
It's kinda clunky
Before I start diving into developing something better:
What already exists to do this?
What features do similar systems have that I've not thought of?
Why do you need to do this? If you use decent source version control, you should have access to old versions of the code, you can simply provide a link to that so people can see the bug in its original place. In fact the main problem I see with this system is that the bug may have already been fixed, but your automatic line tracking code will point to a line and say there's a bug there. Seems this system would be a pain to build, and not provide a whole lot of help in practice.
My suggestion is: instead of trying to track line numbers, which as you observed can quickly get out of sync as software changes, you should decorate each assertion (or other line of interest) with a unique identifier.
Assuming you're using C, in the case of assertions, this could be as simple as changing something like assert(x == 42); to assert(("check_x", x == 42)); -- this is functionally identical, because the comma operator in C evaluates and discards the string literal and yields the value of x == 42, while the label still appears in the expression text that a failing assert prints.
Of course this means that you need to identify a priori those items that you wish to track. But given that there's no generally reliable way to match up source line numbers across versions (by which I mean that for any mechanism you could propose, I believe I could propose a situation in which that mechanism does the wrong thing) I would argue that this is the best you can do.
Another idea: If you're using C++, you can make use of RAII to track dynamic scopes very elegantly. Basically, you have a Track class whose constructor takes a string describing the scope and adds this to a global stack of currently active scopes. The Track destructor pops the top element off the stack. The final ingredient is a static function Track::getState(), which simply returns a list of all currently active scopes -- this can be called from an exception handler or other error-handling mechanism.
