how to match function code block with regex - visual-studio

What I like to do is remove all functions which has specific word for example, if the word is 'apple':
void eatapple()
{
// blah
// blah
}
I'd like to delete all code from 'void' to '}'.
What I tried is:
^void.*apple(.|\n)*}
But it took very long time I think something is wrong here.
I'm using Visual Studio. Thank you.

To clarify jeong's eventual solution, which I think is pretty clever: it works, but it depends on the code being formatted in a very particular way. But that's OK, because most IDE's can enforce a particular formatting standard anyway. If I may give a related example - the problem that brought me here - I was looking to find expressions of the form (in Java)
if (DEBUG) {
// possibly arbitrary statements or blocks
}
which, yes, isn't technically regular, but I ran the Eclispe code formatter on the files to make sure they all necessarily looked like this (our company's usual preferred code style):
if (DEBUG) {
statement();
while (whatever) {
blahblahblah(etc);
}
// ...
}
and then looking for this (this is Java regex syntax, of course)
^(\s*)if \(DEBUG.*(?:\n\1 .*)*\n\1\}
did the trick.

Finally did it.
^void.*(a|A)pple\(\)\n\{\n((\t.*\n)|(^$\n))*^\}

Function blocks aren't regular, so using a regular expression in this situation is a bad idea.
If you really have a huge number of functions that you need to delete (more than you can delete by hand (suggesting there's something wrong with your codebase — but I digress)) then you should write a quick brace-counting parser instead of trying to use regular expressions in this situation.
It should be pretty easy, especially if you can assume the braces are already balanced. Read in tokens, find one that matches "apple", then keep going until you reach the brace that matches with the one immediately after the "apple" token. Delete everything between.

In theory, regular language is not able to express a sentence described by context free grammar. If it is a one time job, why don't you just do it manually.

Switch to VI. Then you can select the opening brace and press d% to delete the section.

Related

empty lines in functions/methods

At our company's Xmas party we had almost a physical fight over one issue:
whether or not to allow empty lines in a function/method in our code (c/c++/c#). This is all about just single blank line, for example imagine code like:
private void LoadData()
{
if (!loaded)
return;
DataStore coll;
SqlData s = new SqlData();
if (!s.CallStoredProcedureToStore(out coll, "xy.usp_zzz",...))
return;
dataViewXy.BeginUpdate();
dataViewXy.Items.Clear();
for(int i = 0; i < coll.RowCount; i++)
{
dataViewXy.Items.Add(coll[i]["ID"]);
}
dataViewXy.EndUpdate();
}
this is a nonsensical mix, just to illustrate situation - first block of function loads data, second one fills some kind of data control and the blank line separates them.
One should also have written a comment before each 'block', but the important thing is that the for guys would place a blank line before that comment and the against guys would not.
For: Allows programmer to logically separate pieces of function so it improves readability as some kind visual cues for the eyes to follow
Against: Decreases readability and programmer should use comment for separating pieces of the function instead.
My personal opinion is to allow blank lines in functions, because without them long pieces of code look to me like a never ending stream of lines without any cues about where to find the stuff I'm looking for)
I think that an empty line in method body is a code smell. Read this blog post about this very subject: An Empty Line is a Code Smell. In short, empty lines in method body is a bad practice because a method should not contain "parts". A method should always do one thing and its functional decomposition should be done by language constructs (for example, new methods), and never by empty lines.
That will be awesome duel to watch. Just like the final scene in The Good, The Bad, and the Ugly but over method whitespace instead of gold.
Honestly, this shouldn't even matter unless you need to constantly mull over someone else's code in your team on a pretty frequent basis. Personally, I am all for whitespace as it gives a logical grouping. But it might not be the same for someone else. Heck, even I might change the way I group things over time. Grouped items that previously used to make sense might not anymore.
Regardless of how we group items, I think grouping is very important. We can argue over why empty lines are unnecessary, but the fact is that the brain can only handle a limited amount of information at a time. So if I can understand the function as composed of three subgroups for instance, instead of ten statements, that makes a lot of difference. And grouping by comments is subjective to your IDE. Most IDE's color them lightly than the rest of the code which gives that feel of separation, but that becomes IDE specific.
So to argue for the point, it's all about grouping. If a class becomes too big, we have to break it for the sake of understandability and maintenance. Similarly, if the sequential lines of code inside a method becomes too much too grasp, it must be broken down more methods or logically separated with a blank newline.
Also, if the argument is going nowhere, throw in some random stuff like how these empty lines relate to the Buddhist concept of nothingness bla bla..
"One size fits all" rules like this do not work.
There are valid arguments for and against each point of view e.g. I could counter your "long lines of code" argument by saying such functions should be broken into sub routines.
However, this is beside the point.
Saying one "MUST" insert blank lines is as nonsensical as saying one CANNOT insert blank lines.
BW's law in these situations is "use what works for you"
If this must be resolved, take a vote and have everyone agree to abide by the vote before the vote is taken (even you, if you loose).
That's my two cents.
P.S. My personal opinion is for blank lines.
Definitely for, although this question may be subjective.
Besides, I've learned to split up a method if it becomes too large rather can "compress" it by removing enters.
A physical fight? At a Christmas party? Sounds like a fun spot.
I'm in favor of whitespace as well when it improves readability. That's part of the reason why I'm also in favor of lining up curly braces: the extra whitespace sets off the block of code visually.
Most comments are clutter and don't help readability. If they don't add information to help understand code they're only getting in the way.
Sounds like I would have been fighting against the "no whitespace/add comments" crowd.
My rule of thumb is to place all of the variable declarations at the beginning of a function, followed by the processing and then the return statement (if any), each separated by blank lines.
int DoSomething(string robot)
{
int result = 0;
if (robot == "robot")
result = 42;
return result;
}
If it's a moderately complex method with logically distinct blocks of code that don't require encapsulating in their own methods, I'll separate them with spaces as well. Anything that might equate to a 'paragraph' of code is separated with a blank line.
Most of the commenting I leave in the /// comment blocks at the top of the method.
I'm all for whitespace - it's an insignificant amount of memory usage that vastly improves the readability of code by cutting the program up into logical steps. This isn't a replacement for good comments, however. My commenting rule is 'try not to do the things that are irritating in other peoples code (which usually turns out to be not commenting - A simple '//do X and Y if Z says it's necessary' is a good level of detail for an individual comment I reckon)
We have plenty of storage space and good sized monitors, so why not use it to make the job easier?
Huge chunks of whitespaces can push code down off the screen however, which hinders the ability to look at the code and get a good overview, so it's best not to overdo it.
Like all things in programming it's down to case-by-case judgement, personal preference and a hint of moderation.

Is coding style mostly for readability or are there other factors?

I remember reading in Douglas Crockford's "Javascript the Good Parts" book that there is potential for error with blockless statements because of automatic semicolon insertion.
if (condition)
foo = true;
vs
if (condition)
{
foo = true;
}
In the second the example it will work consistently, in the first example a semicolon will be automatically inserted by the interpreter and can lead to ambiguity in the code. As Douglas points out this is potentially bad and hard to debug, which I agree. But it got me thinking are there examples where coding "style" actually has syntax implications? In other words, examples where failing to follow a certain indentation or apparent style actually results in a bug or error. I suppose Python with its significant whitespace is an example, YML with its requirement for no tabs is another.
Feel free to respond in a wide variety of languages and idioms. I am curious to hear about the paradigm cases. In your answer I would like to know the WHAT and the WHY of the coding style or syntax behavior. I don't want to start any coding style flame wars, just matter of fact scenarios where the uninitiated would get tripped up.
Javascript treats these two cases seperately. You have to use the first
return {
// code
}
return
{
// code
}
If you do not the interpreter adds semi colons in the wrong places. I think it puts one after the condition. So the second would be read wrongly as.
return;
{
// code
}
Which is not invalid syntax.
No one has mentioned it before, but one point of style is to write
if (NULL==p) // ...
instead of
if (p==NULL) // ...
The two are functionally equivalent so it's a matter of style. Many prefer the style at the top, because if you type "=" by mistake instead of "==", the first won't compile whereas the second will, and creates a bug that is hard to find (though some compilers now give you a warning for if (p=NULL)).
A coding style is not only for readability. There are some other factors like
Avoid common language pitfalls (silent fails, exceptions while casting etc.)
Ease of maintenance
Increase the clarity of code (e.g. private functions always starting loweCase and public beeing PascalCase)
Enforcing conventions (e.g. not using multiple inheritance or always inherit public in c++)
an example for ease of maintenacne is below:
if(x) return true;
vs.
if(x)
{
return true;
}
it is clear the second is easier to maintain, because i can simply go ahead and add a new line and add a call to bla() without having to add brackets before.
In Python, the whitespace indentation, rather than curly braces or keywords, delimits the statement blocks. An increase in indentation comes after certain statements; a decrease in indentation signifies the end of the current block.
Whitespace is not significant in any of the C family of languages, except to separate certain language tokens. The layout of the source code has no effect on the executable produced by the compiler.
In C++, there's a difference between
vector<pair<int,int>>
and
vector<pair<int,int> >
because >> is treated as a single token by the parser.
Every time I open a parentheses, brace, single or double quote I always close it and then go back to write the statement, condition, etc... This saves me from quite some possibly big mistakes!
It depends on the language. In the curly family C/C++/Java/C#, whitespace is not the restriction as long as your braces are opened and closed properly.
In languages like VB where paired keywords are used, just because keywords delimit functions you can't have
Private Sub someFunction() End Sub
But in C, C++ and other curly languages, you can have
void someFunction() { }
In python, I guess its a completely different matter.
It all depends on the particular language. As per your specific example, I don't think there is a syntactical or semantic difference between the two.
It is language dependant.
For instance in Java
void someFunction() { }
and
void someFunction() {
}
and
void someFunction()
{
}
will have no implication whatsoever. However Sun has enlisted a list of coding conventions if you are interested you could read them here.
These are mainly for maintainability and readability of code but need not be strictly followed.
The title wasn't specific to conditional blocks, so this came to mind:
I don't see it mentioned yet, but one thing to consider is how do you find things with simple tools (like grep and its inferior implementations from windowsland).
For example consider this (admittedly a bit contrived) example
class Foo
// vs.
class
Foo
You can find former with regex "class\s+Foo", but for latter you'll have to have a specialized tool that can parse C++/C#/java.
This applies also in C for function prototypes, some weird people prefer
void
bar (void)
// to
void bar(void)
While you can generally assume that if in front of function name are some A-Z characters, then it's very likely definiton/declaration.
Actually when it comes to ifblocks, then placing the brace has very big impact on debugging code in Visual Studio 200x. When it steps into function/block, it will place text/execution line cursor on opening bracket. So if the bracket happens to be waaaaay on the right side, the code window has to scroll there and buggeritall stays there. Extremely annoying.

What is the most obfuscated code you've had to fix?

Most programmers will have had the experience of debugging/fixing someone else's code. Sometimes that "someone else's code" is so obfuscated it's bad enough trying to understand what it's doing.
What's the worst (most obfuscated) code you've had to debug/fix?
If you didn't throw it away and recode it from scratch, well why didn't you?
PHP OSCommerce is enough to say, it is obfuscated code...
a Java class
only static methods that manipulates DOM
8000 LOCs
long chain of methods that return null on "error": a.b().c().d().e()
very long methods (400/500 LOC each)
nested if, while, like:
if (...) {
for (...) {
if (...) {
if (...) {
while (...) {
if (...) {
cut-and-paste oriented programming
no exceptions, all exceptions are catched and "handled" using printStackTrace()
no unit tests
no documentation
I was tempted to throw away and recode... but, after 3 days of hard debugging,
I've added the magic if :-)
Spaghetti code PHP CMS system.
by default, programmers think someone else's code is obfuscated.
The worse I probably had to do was interpreting what variables i1, i2 j, k, t were in a simple method and they were not counters in 'for' loops.
In all other circumstances I guess the problem area was difficult which made the code look difficult.
I found this line in our codebase today and thought it was a nice example of sneaky obfuscation:
if (MULTICLICK_ENABLED.equals(propService.getProperty(PropertyNames.MULTICLICK_ENABLED))) {} else {
return false;
}
Just making sure I read the whole line. NO SKIMREADING.
When working on a GWT project, I would reach parts of GWT-compiled obfuscated JS code which wasn't mine.
Now good luck debugging real obfuscated code.
I can't remember the full code, but a single part of it remains burned into my memory as something I spend hours trying to understand:
do{
$tmp = shift unless shift;
$tmp;
}while($tmp);
I couldn't understand it at first, it looks so useless, then I printed out #_ for a list of arguments, a series of alternating boolean and function names, the code was used in conjunction with a library detection module that changed behaviour if a function was broken, but the code was so badly documented and made of things like that which made no sense without a complete understanding of the full code I gave up and rewrote the whole thing.
UPDATE from DVK:
And, lest someone claims this was because Perl is unreadable as opposed to coder being a golf master instead of good software developer, here's the same code in a slightly less obfuscated form (the really correct code wouldn't even HAVE alternating sub names and booleans in the first place :)
# This subroutine take a list of alternating true/false flags
# and subroutine names; and executes the named subroutines for which flag is true.
# I am also weird, otherwise I'd have simply have passed list of subroutines to execute :)
my #flags_and_sub_names_list = #_;
while ( #flags_and_sub_names_list ) {
my $flag = shift #flags_and_sub_names_list;
my $subName = shift #flags_and_sub_names_list;
next unless $flag && $subName;
&{ $subName }; # Call the named subroutine
}
I've had a case of a 300lines function performing input sanitization which missed a certain corner case. It was parsing certain situations manually using IndexOf and Substring plus a lot of inlined variables and constants (looks like the original coder didn't know anything about good practices), and no comment was provided. Throwing it away wasn't feasible due to time constraints and the fact that I didn't have the specification required so rewriting it would've meant understanding the original, but after understanding it fixing it was just quicker. I also added lots of comments, so whoever shall come after me won't feel the same pain taking a look at it...
The Perl statement:
select((select(s),$|=1)[0])
which, at the suggestion of the original author (Randal Schwartz himself, who said he disliked it but nothing else was available at the time), was replaced with something a little more understandable:
IO::Handle->autoflush
Beyond that one-liner, some of the Java JDBC libraries from IBM are obfuscated and all variables and functions are either combinations of the letter 'l' and '1' or single/double characters - very hard to track anything down until you get them all renamed. Needed to do this to track down why they worked fine in IBM's JRE but not Sun's.
If you're talking about HLL codes, once I was updating project written by a chinese and all comments were chinese (stored in ansii) and it was a horror to understand some code fragments, if you're talking about low level code there were MANY of them (obfuscated, mutated, vm-ed...).
I once had to reverse engineer a Java 1.1 framework that:
Extended event-driven SAX parser classes for every class, even those that didn't parse XML (the overridden methods were simply invoked ad hoc by other code)
Custom runtime exceptions were thrown in lieu of method invocations wherever possible. As a result, most of the business logic landed in a nested series of catch blocks.
If I had to guess, it was probably someone's "smart" idea that method invocations were expensive in Java 1.1, so throwing exceptions for non-exceptional flow control was somehow considered an optimization.
Went through about three bottles of eye drops.
I once found a time bomb that had been intentionally obfuscated.
When I had finally decoded what it was doing I mentioned it to the manager who said they knew about the time bomb but had left it in place because it was so ineffective and was interwoven with other code.
The time bomb was (presumably) supposed to go off after a certain date.
Instead, it had a bug in it so it only activated if someone was working after lunchtime on Dec 31st.
It had taken three years for that circumstance to occur since the guy who wrote the time bomb left the company.

Is there a way to check if a label is already defined in LaTeX?

I edited the question after David Hanak's answer (thanks btw!). He helped with the syntax, but it appears that I wasn't using the right function to begin with.
Basically what I want is to let the compiler ignore multiple definitions of a certain label and just use the first. In order to do that, I thought I'd just do something like this:
\makeatletter
\newcommand{\mylabel}[1]{
\#ifundefined{#1}{\label{#1}}{X}
}
\makeatother
This does not work though, because the first option is always chosen (it doesn't matter if the label is defined or not). I think the \#ifundefined (and the suggested \ifundefined) only work for commands and not for labels, but I don't really know much about LaTeX. Any help with this would be great! Thanks!
Much later update:
I marked David Hanak's response as the correct answer to my question, but it isn't a complete solution, although it really helped me.
The problem is, I think but I'm no specialist, that even though David's code checks to see if a label is defined, it only works when the label was defined in a previous run (i.e. is in the .aux file). If two \mylabels with the same name are defined in the same run, the second will still be defined. Also, even if you manage to work around this, it will make LaTeX use the first label that you defined chronologically, and not necessarily the first in the text.
Anyway, below is my quick and dirty solution. It uses the fact that counters do seem to be defined right away.
\newcommand{\mylabel}[1]{%
\#ifundefined{c##1}{%
\newcounter{#1}%
\setcounter{#1}{0}%
}{}%
\ifthenelse{\value{#1} > 0}{}{%
\label{#1}%
\addtocounter{#1}{1}%
}%
}
I'm not sure if it is necessary to initialize the counter to 0, as it seems like a likely default, but I couldn't find if that was the case, so I'm just being safe.
Also, this uses the 'ifthen' package, which I'm not sure is necessary.
I am also not a LaTeX expert, however after one day of trying and searching the internet the following worked for me. I have used a dummy counter to solve the problem. Hopefully this helps, apparently not many people are looking for this.
\newcommand{\mylabel}[1]{
\ifcsname c##1\endcsname%
\else%
\newcounter{#1}\label{#1}%
\fi%
}
# is a special character in LaTeX. To make your declaration syntactically correct, you'll have to add two more lines:
\makeatletter
\newcommand{\mylabel}[1]{
\#ifundefined{#1}{\label{#1}}{X}
}
\makeatother
The first line turns # into a normal letter, the last line reverses its effect.
Update: You might also want to take a look at the "plain" \ifundefined LaTeX macro.
Update 2
Okay, I did some research to figure out the answer to the real problem. The thing is that defining a label does not create a macro by that name; it prepends a "r#" to it. So try the following:
\makeatletter
\newcommand{\mylabel}[1]{
\#ifundefined{r##1}{\label{#1}}{X}
}
\makeatother
For more technical details, refer to line 3863 of latex.ltx in your LaTeX distribution (where it says \def\newlabel{\#newl#bel r}).
By Victor Eijkhout, "TeX by Topic", p.143:
\def\ifUnDefinedCs#1{\expandafter\ifx\csname#1\endcsname\relax}
This can be used to check whether a label is defined; if not, the label is printed:
\newcommand{\myautoref}[1]{\ifUnDefinedCs{r##1}{\color{magenta}\IDontKnow\{#1\}}\else\autoref{#1}\fi}

What are your language "hangups"? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I've read some of the recent language vs. language questions with interest... Perl vs. Python, Python vs. Java, Can one language be better than another?
One thing I've noticed is that a lot of us have very superficial reasons for disliking languages. We notice these things at first glance and they turn us off. We shun what are probably perfectly good languages as a result of features that we'd probably learn to love or ignore in 2 seconds if we bothered.
Well, I'm as guilty as the next guy, if not more. Here goes:
Ruby: All the Ruby example code I see uses the puts command, and that's a sort of childish Yiddish anatomical term. So as a result, I can't take Ruby code seriously even though I should.
Python: The first time I saw it, I smirked at the whole significant whitespace thing. I avoided it for the next several years. Now I hardly use anything else.
Java: I don't like identifiersThatLookLikeThis. I'm not sure why exactly.
Lisp: I have trouble with all the parentheses. Things of different importance and purpose (function declarations, variable assignments, etc.) are not syntactically differentiated and I'm too lazy to learn what's what.
Fortran: uppercase everything hurts my eyes. I know modern code doesn't have to be written like that, but most example code is...
Visual Basic: it bugs me that Dim is used to declare variables, since I remember the good ol' days of GW-BASIC when it was only used to dimension arrays.
What languages did look right to me at first glance? Perl, C, QBasic, JavaScript, assembly language, BASH shell, FORTH.
Okay, now that I've aired my dirty laundry... I want to hear yours. What are your language hangups? What superficial features bother you? How have you gotten over them?
I hate Hate HATE "End Function" and "End IF" and "If... Then" parts of VB. I would much rather see a curly bracket instead.
PHP's function name inconsistencies.
// common parameters back-to-front
in_array(needle, haystack);
strpos(haystack, needle);
// _ to separate words, or not?
filesize();
file_exists;
// super globals prefix?
$GLOBALS;
$_POST;
I never really liked the keywords spelled backwards in some scripting shells
if-then-fi is bad enough, but case-in-esac is just getting silly
I just thought of another... I hate the mostly-meaningless URLs used in XML to define namespaces, e.g. xmlns="http://purl.org/rss/1.0/"
Pascal's Begin and End. Too verbose, not subject to bracket matching, and worse, there isn't a Begin for every End, eg.
Type foo = Record
// ...
end;
Although I'm mainly a PHP developer, I dislike languages that don't let me do enough things inline. E.g.:
$x = returnsArray();
$x[1];
instead of
returnsArray()[1];
or
function sort($a, $b) {
return $a < $b;
}
usort($array, 'sort');
instead of
usort($array, function($a, $b) { return $a < $b; });
I like object-oriented style. So it bugs me in Python to see len(str) to get the length of a string, or splitting strings like split(str, "|") in another language. That is fine in C; it doesn't have objects. But Python, D, etc. do have objects and use obj.method() other places. (I still think Python is a great language.)
Inconsistency is another big one for me. I do not like inconsistent naming in the same library: length(), size(), getLength(), getlength(), toUTFindex() (why not toUtfIndex?), Constant, CONSTANT, etc.
The long names in .NET bother me sometimes. Can't they shorten DataGridViewCellContextMenuStripNeededEventArgs somehow? What about ListViewVirtualItemsSelectionRangeChangedEventArgs?
And I hate deep directory trees. If a library/project has a 5 level deep directory tree, I'm going to have trouble with it.
C and C++'s syntax is a bit quirky. They reuse operators for different things. You're probably so used to it that you don't think about it (nor do I), but consider how many meanings parentheses have:
int main() // function declaration / definition
printf("hello") // function call
(int)x // type cast
2*(7+8) // override precedence
int (*)(int) // function pointer
int x(3) // initializer
if (condition) // special part of syntax of if, while, for, switch
And if in C++ you saw
foo<bar>(baz(),baaz)
you couldn't know the meaning without the definition of foo and bar.
the < and > might be a template instantiation, or might be less-than and greater-than (unusual but legal)
the () might be a function call, or might be just surrounding the comma operator (ie. perform baz() for size-effects, then return baaz).
The silly thing is that other languages have copied some of these characteristics!
Java, and its checked exceptions. I left Java for a while, dwelling in the .NET world, then recently came back.
It feels like, sometimes, my throws clause is more voluminous than my method content.
There's nothing in the world I hate more than php.
Variables with $, that's one extra odd character for every variable.
Members are accessed with -> for no apparent reason, one extra character for every member access.
A freakshow of language really.
No namespaces.
Strings are concatenated with ..
A freakshow of language.
All the []s and #s in Objective C. Their use is so different from the underlying C's native syntax that the first time I saw them it gave the impression that all the object-orientation had been clumsily bolted on as an afterthought.
I abhor the boiler plate verbosity of Java.
writing getters and setters for properties
checked exception handling and all the verbiage that implies
long lists of imports
Those, in connection with the Java convention of using veryLongVariableNames, sometimes have me thinking I'm back in the 80's, writing IDENTIFICATION DIVISION. at the top of my programs.
Hint: If you can automate the generation of part of your code in your IDE, that's a good hint that you're producing boilerplate code. With automated tools, it's not a problem to write, but it's a hindrance every time someone has to read that code - which is more often.
While I think it goes a bit overboard on type bureaucracy, Scala has successfully addressed some of these concerns.
Coding Style inconsistencies in team projects.
I'm working on a large team project where some contributors have used 4 spaces instead of the tab character.
Working with their code can be very annoying - I like to keep my code clean and with a consistent style.
It's bad enough when you use different standards for different languages, but in a web project with HTML, CSS, Javascript, PHP and MySQL, that's 5 languages, 5 different styles, and multiplied by the number of people working on the project.
I'd love to re-format my co-workers code when I need to fix something, but then the repository would think I changed every line of their code.
It irritates me sometimes how people expect there to be one language for all jobs. Depending on the task you are doing, each language has its advantages and disadvantages. I like the C-based syntax languages because it's what I'm most used to and I like the flexibility they tend to bestow on the developer. Of course, with great power comes great responsibility, and having the power to write 150 line LINQ statements doesn't mean you should.
I love the inline XML in the latest version of VB.NET although I don't like working with VB mainly because I find the IDE less helpful than the IDE for C#.
If Microsoft had to invent yet another C++-like language in C# why didn't they correct Java's mistake and implement support for RAII?
Case sensitivity.
What kinda hangover do you need to think that differentiating two identifiers solely by caSE is a great idea?
I hate semi-colons. I find they add a lot of noise and you rarely need to put two statements on a line. I prefer the style of Python and other languages... end of line is end of a statement.
Any language that can't fully decide if Arrays/Loop/string character indexes are zero based or one based.
I personally prefer zero based, but any language that mixes the two, or lets you "configure" which is used can drive you bonkers. (Apache Velocity - I'm looking in your direction!)
snip from the VTL reference (default is 1, but you can set it to 0):
# Default starting value of the loop
# counter variable reference.
directive.foreach.counter.initial.value = 1
(try merging 2 projects that used different counter schemes - ugh!)
In no particular order...
OCaml
Tuples definitions use * to separate items rather than ,. So, ("Juliet", 23, true) has the type (string * int * bool).
For being such an awesome language, the documentation has this haunting comment on threads: "The threads library is implemented by time-sharing on a single processor. It will not take advantage of multi-processor machines. Using this library will therefore never make programs run faster." JoCaml doesn't fix this problem.
^^^ I've heard the Jane Street guys were working to add concurrent GC and multi-core threads to OCaml, but I don't know how successful they've been. I can't imagine a language without multi-core threads and GC surviving very long.
No easy way to explore modules in the toplevel. Sure, you can write module q = List;; and the toplevel will happily print out the module definition, but that just seems hacky.
C#
Lousy type inference. Beyond the most trivial expressions, I have to give types to generic functions.
All the LINQ code I ever read uses method syntax, x.Where(item => ...).OrderBy(item => ...). No one ever uses expression syntax, from item in x where ... orderby ... select. Between you and me, I think expression syntax is silly, if for no other reason than that it looks "foreign" against the backdrop of all other C# and VB.NET code.
LINQ
Every other language uses the industry standard names are Map, Fold/Reduce/Inject, and Filter. LINQ has to be different and uses Select, Aggregate, and Where.
Functional Programming
Monads are mystifying. Having seen the Parser monad, Maybe monad, State, and List monads, I can understand perfectly how the code works; however, as a general design pattern, I can't seem to look at problems and say "hey, I bet a monad would fit perfect here".
Ruby
GRRRRAAAAAAAH!!!!! I mean... seriously.
VB
Module Hangups
Dim _juliet as String = "Too Wordy!"
Public Property Juliet() as String
Get
Return _juliet
End Get
Set (ByVal value as String)
_juliet = value
End Set
End Property
End Module
And setter declarations are the bane of my existence. Alright, so I change the data type of my property -- now I need to change the data type in my setter too? Why doesn't VB borrow from C# and simply incorporate an implicit variable called value?
.NET Framework
I personally like Java casing convention: classes are PascalCase, methods and properties are camelCase.
In C/C++, it annoys me how there are different ways of writing the same code.
e.g.
if (condition)
{
callSomeConditionalMethod();
}
callSomeOtherMethod();
vs.
if (condition)
callSomeConditionalMethod();
callSomeOtherMethod();
equate to the same thing, but different people have different styles. I wish the original standard was more strict about making a decision about this, so we wouldn't have this ambiguity. It leads to arguments and disagreements in code reviews!
I found Perl's use of "defined" and "undefined" values to be so useful that I have trouble using scripting languages without it.
Perl:
($lastname, $firstname, $rest) = split(' ', $fullname);
This statement performs well no matter how many words are in $fullname. Try it in Python, and it explodes if $fullname doesn't contain exactly three words.
SQL, they say you should not use cursors and when you do, you really understand why...
its so heavy going!
DECLARE mycurse CURSOR LOCAL FAST_FORWARD READ_ONLY
FOR
SELECT field1, field2, fieldN FROM atable
OPEN mycurse
FETCH NEXT FROM mycurse INTO #Var1, #Var2, #VarN
WHILE ##fetch_status = 0
BEGIN
-- do something really clever...
FETCH NEXT FROM mycurse INTO #Var1, #Var2, #VarN
END
CLOSE mycurse
DEALLOCATE mycurse
Although I program primarily in python, It irks me endlessly that lambda body's must be expressions.
I'm still wrapping my brain around JavaScript, and as a whole, Its mostly acceptable. Why is it so hard to create a namespace. In TCL they're just ugly, but in JavaScript, it's actually a rigmarole AND completely unreadable.
In SQL how come everything is just one, huge freekin SELECT statement.
In Ruby, I very strongly dislike how methods do not require self. to be called on current instance, but properties do (otherwise they will clash with locals); i.e.:
def foo()
123
end
def foo=(x)
end
def bar()
x = foo() # okay, same as self.foo()
x = foo # not okay, reads unassigned local variable foo
foo = 123 # not okay, assigns local variable foo
end
To my mind, it's very inconsistent. I'd rather prefer to either always require self. in all cases, or to have a sigil for locals.
Java's packages. I find them complex, more so because I am not a corporation.
I vastly prefer namespaces. I'll get over it, of course - I'm playing with the Android SDK, and Eclipse removes a lot of the pain. I've never had a machine that could run it interactively before, and now I do I'm very impressed.
Prolog's if-then-else syntax.
x -> y ; z
The problem is that ";" is the "or" operator, so the above looks like "x implies y or z".
Java
Generics (Java version of templates) are limited. I can not call methods of the class and I can not create instances of the class. Generics are used by containers, but I can use containers of instances of Object.
No multiple inheritance. If a multiple inheritance use does not lead to diamond problem, it should be allowed. It should allow to write a default implementation of interface methods, a example of problem: the interface MouseListener has 5 methods, one for each event. If I want to handle just one of them, I have to implement the 4 other methods as an empty method.
It does not allow to choose to manually manage memory of some objects.
Java API uses complex combination of classes to do simple tasks. Example, if I want to read from a file, I have to use many classes (FileReader, FileInputStream).
Python
Indentation is part of syntax, I prefer to use the word "end" to indicate end of block and the word "pass" would not be needed.
In classes, the word "self" should not be needed as argument of functions.
C++
Headers are the worst problem. I have to list the functions in a header file and implement them in a cpp file. It can not hide dependencies of a class. If a class A uses the class B privately as a field, if I include the header of A, the header of B will be included too.
Strings and arrays came from C, they do not provide a length field. It is difficult to control if std::string and std::vector will use stack or heap. I have to use pointers with std::string and std::vector if I want to use assignment, pass as argument to a function or return it, because its "=" operator will copy entire structure.
I can not control the constructor and destructor. It is difficult to create an array of objects without a default constructor or choose what constructor to use with if and switch statements.
In most languages, file access. VB.NET is the only language so far where file access makes any sense to me. I do not understand why if I want to check if a file exists, I should use File.exists("") or something similar instead of creating a file object (actually FileInfo in VB.NET) and asking if it exists. And then if I want to open it, I ask it to open: (assuming a FileInfo object called fi) fi.OpenRead, for example. Returns a stream. Nice. Exactly what I wanted. If I want to move a file, fi.MoveTo. I can also do fi.CopyTo. What is this nonsense about not making files full-fledged objects in most languages? Also, if I want to iterate through the files in a directory, I can just create the directory object and call .GetFiles. Or I can do .GetDirectories, and I get a whole new set of DirectoryInfo objects to play with.
Admittedly, Java has some of this file stuff, but this nonsense of having to have a whole object to tell it how to list files is just silly.
Also, I hate ::, ->, => and all other multi-character operators except for <= and >= (and maybe -- and ++).
[Disclaimer: i only have a passing familiarity with VB, so take my comments with a grain of salt]
I Hate How Every Keyword In VB Is Capitalized Like This. I saw a blog post the other week (month?) about someone who tried writing VB code without any capital letters (they did something to a compiler that would let them compile VB code like that), and the language looked much nicer!
My big hangup is MATLAB's syntax. I use it, and there are things I like about it, but it has so many annoying quirks. Let's see.
Matrices are indexed with parentheses. So if you see something like Image(350,260), you have no clue from that whether we're getting an element from the Image matrix, or if we're calling some function called Image and passing arguments to it.
Scope is insane. I seem to recall that for loop index variables stay in scope after the loop ends.
If you forget to stick a semicolon after an assignment, the value will be dumped to standard output.
You may have one function per file. This proves to be very annoying for organizing one's work.
I'm sure I could come up with more if I thought about it.

Resources