Why does string.IndexOf behave differently between netcore3.1 and net 5/6 - .net-6.0

I know this sounds like a replica of IndexOf method returns 0 when it should had return -1 in C# / Java, but it isn't quite.
If you run this test:
var i = "Hello world".IndexOf("\0");
i will be 0 in net5 and net6, whereas in netcore3.1 i will be -1.
I get it that the compiler may think that "\0" equals "" which by the discussion in the referenced question might be right, but the interesting thing here is the difference between netcore3.1 and the later versions.
My solution to this was to change the test to:
var i = "Hello world".IndexOf('\0');
You may find it odd to look for a zero terminator inside a string, but the sample is taken out of a complex context. This simply nails down the reason why a library stopped working when moved from dotnetcore3.1 to net5/net6.

Related

Implementing comparison operators in Bytecode using ASM

I'm working on a personal project of mine creating a simple language which is compiled to Java Bytecode. I'm using the ASM library version 7.3.1 but I've hit a problem with Frames that I can't quite figure out.
This is actually two questions rolled into one. I'm trying to implement simple comparison operators e.g. >, <, >= etc. These operators should return a boolean result obviously. I can't see a way of implementing this directly in Bytecode so I'm using using FCMPG to compare two floats that are already on the stack and then using IFxx to push either a 1 or 0 to the stack depending on which operator I'm generating code for.
For example here is my code for >:
val label = new Label()
mv.visitInsn(FCMPG) // mv is my MethodVisitor, there are 2 Floats on the stack
mv.visitJumpInsn(IFGT, label)
mv.visitInsn(ICONST_1)
mv.visitLabel(label)
mv.visitInsn(ICONST_0)
Question 1: Is this the correct approach for implementing comparison operators or am I missing a simpler method?
Question 2: Running this code generates this error:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
at org.objectweb.asm.Frame.merge(Frame.java:1268)
at org.objectweb.asm.Frame.merge(Frame.java:1244)
at org.objectweb.asm.MethodWriter.computeAllFrames(MethodWriter.java:1610)
at org.objectweb.asm.MethodWriter.visitMaxs(MethodWriter.java:1546)
at compiler.codegen.default$$anon$1.generateConstructor(default.scala:138)
at compiler.codegen.default$$anon$1.generateCode(default.scala:157)
at compiler.codegen.default$$anon$1.generateCode(default.scala:21)
at compiler.codegen.package$.generateCode(package.scala:21)
at compiler.codegen.package$CodeGeneratorOp.generateCode(package.scala:17)
at Main$.main(main.scala:27)
at Main.main(main.scala)
I know this is to do with Frames but I don't really understand frames enough to know what I'm doing wrong. I've tried adding mv.visitFrame(F_SAME, 0, null, 0, null) after visitLabel but I get the same error.
1) Yes, this is the correct way to do it. I believe the actual Java compiler does something very similar.
2) You get a verification error because you forgot to add a jump to the end of the if block. If you look closely at your code, you'll see that when the jump isn't taken both branches are executed and you end up with both 0 and 1 on the stack, which leads to a verification error. You need to insect a second jump so only the constant you wish gets pushed to the stack in this case. It should be something like this:
val then_label = new Label()
val end_label = new Label()
mv.visitInsn(FCMPG) // mv is my MethodVisitor, there are 2 Floats on the stack
mv.visitJumpInsn(IFGT, then_label)
mv.visitInsn(ICONST_1)
mv.visitGoto(end_label)
mv.visitLabel(then_label)
mv.visitInsn(ICONST_0)
mv.visitLabel(end_label)

Where in the V8 source does the automatic cast for BinaryOperation occour?

I stumbled again in the good old '12' + 2 = '122'
I wanted to deeply understand what happens here, so my first thesis was that
Maybe Javascript casts the right operand to the type of the first one and
then operates, like so: '12' + String(2) = '122' all good...
But no, because 12 + '2' = '122' too; So the engine's magic is clearly favoring to concat over casting to number.
My second thesis was then
Maybe the engine enumerates all operands and looks for an "operator override", similar to C#? And then favor executing that over doing the self-magic thing?
My confusion got even weirder when I realized that also '5' * '8' = 40, it casts both operands to Number and does the operation.
The only way I could possibly really understand that was to read the V8 code directly from GitHub
The farther I could track down was at v8/src/parsing/parser-base.h line 2865
// We have a "normal" binary operation.
x = factory()->NewBinaryOperation(op, x, y, pos);
if (op == Token::OR || op == Token::AND) {
impl()->RecordBinaryOperationSourceRange(x, right_range);
}
From here I got lost, because I couldn't find where this factory() is coming from.
Long story short, where does the JavaScript "type Magic" come from in the V8 Engine Source code?
V8 developer here.
There are several fast paths for various cases of addition and other operations in V8. If you want to study a canonical (slow, but complete) version, you can look for Object::Add in src/objects.cc.
That said, the source of truth here is not any given engine's implementation, but the JavaScript specification. What the + operator is supposed to do is defined here: https://tc39.github.io/ecma262/#sec-addition-operator-plus.
Any engine's implementation either does precisely that, or something that from the outside is indistinguishable from that -- otherwise it's a bug. It's not a coincidence that the implementation of Object::Add reads almost exactly like the spec ;-)

Can a finite list be lazy in Perl 6?

Suppose I have a sequence where I know the start and end points and the generator is straightforward. Can I make it lazy?
my #b = 0 ... 3;
say 'Is #b lazy? ' ~ #b.is-lazy; # Not lazy
I'd like to combination that known list with itself an unknown number of times but not generate the entire list immediately. I want #cross to be lazy:
my #cross = [X] #b xx $n;
I know I can do this through other simple matters or programming (and my Perl 5 Set::CrossProduct does just that), but I'm curious if there is some simple and intended way that I'm missing. Some way that doesn't involve me counting on my own.
As a side question, what's the feature of a sequence that makes it lazy? Is it just the endpoint? Is there a sequence with a known endpoint that can still be lazy if the generator can make infinite values between? I wondered how much information I have to remove and tried something like this:
my $m = #*ARGS[0];
my #b = 0, * + $m ... ^ * > 3;
say 'Is #b lazy? ' ~ #b.is-lazy;
say '#b: ' ~ #b;
Still, this is not lazy.
The "is-lazy" method, in my opinion, is really a misnomer. The only thing it says, as cuonglm pointed out, is that just passes on that the underlying iterator claims to be lazy. But that still doesn't mean anything, really. An iterator can technically produce values on demand, but still claim it isn't lazy. And vice-versa.
The "is-lazy" check is used internally to prevent cases where the number of elements needs to be known in advance (like .roll or .pick): if the Seq / iterator claims to be lazy, it won't actually try, but fail or throw instead.
The only thing that .lazy does, is wrap the iterator of the Seq into another iterator that does claim to be lazy (if the given iterator claims it isn't lazy, of course). And make sure it won't pull any values unless really needed. So, adding .lazy doesn't tell anything about when values will be produced, only when they will be delivered. And it helps in testing iterator based logic to see how they would work with an iterator that claims to be lazy.
So, getting back to the question: if you want to be sure that something is lazy, you will have to write the iterator yourself. Having said that, the past months I've spent quite a lot of effort in making things as lazy as possible in the core. Notably xx N still isn't lazy, although it does produce a Seq nowadays. Making it lazy broke some spectests at some deep level I haven't been able to figure out just yet. Looking forward, I think you can be sure that things will be as lazy as makes sense generally, with maybe a possibility of indicating favouring memory over CPU. But you will never have complete control over builtins: if you want complete control, you will have to write it yourself.
Yes, you can, using lazy method:
> my #b = (0 ... 3).lazy;
[...]
> say 'Is #b lazy? ' ~ #b.is-lazy;
Is #b lazy? True
There's nothing special Seq that make it be lazy, just its underlying Iterator does.
As already pointed out, is-lazy does not really mean anything.
But I just wanted to mention that there is a page on rakudo wiki that gives a couple of clues on what you can expect from is-lazy.
You can make:
my #b = 0 ... 3;
lazy just by adding the lazy keyword in the right place, like this:
my #b = lazy 0 ... 3;
Or, as cuonglm pointed out, you can call the range's method, like this:
my #b = (0 ... 3).lazy;

Perform operations within string format

I am sure that this has been asked, but I can't find it through my rudimentary searches.
Is it discouraged to perform operations within string initializations?
> increment = 4
=> 4
> "Incremented from #{increment} to #{increment += 1}"
=> "Incremented from 4 to 5"
I sure wouldn't, because that's not where you look for things-that-change-things when reading code.
It obfuscates intent, it obscures meaning.
Compare:
url = "#{BASE_URL}/#{++request_sequence}"
with:
request_sequence += 1
url = "#{BASE_URL}/#{request_sequence}"
If you're looking to see where the sequence number is coming from, which is more obvious?
I can almost live with the first version, but I'd be likely to opt for the latter. I might also do this instead:
url = build_url(++request_sequence)
In your particular case, it might be okay, but the problem is that the position where the operation on the variable should happen must be the last instance of the same variable in the string, and you cannot always be sure about that. For example, suppose (for some stylistic reason), you want to write
"Incremented to #{...} from #{...}"
Then, all of a sudden, you cannot do what you did. So doing operation during interpolation is highly dependent on the particular phrasing in the string, and that decreases maintainability of the code.

What obscure syntax ruined your day? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
When have you run into syntax that might be dated, never used or just plain obfuscated that you couldn't understand for the life of you.
For example, I never knew that comma is an actual operator in C. So when I saw the code
if(Foo(), Bar())
I just about blew a gasket trying to figure out what was going on there.
I'm curious what little never-dusted corners might exist in other languages.
C++'s syntax for a default constructor on a local variable. At first I wrote the following.
Student student(); // error
Student student("foo"); // compiles
This lead me to about an hour of reading through a cryptic C++ error message. Eventually a non-C++ newbie dropped by, laughed and pointed out my mistake.
Student student;
This is always jarring:
std::vector <std::vector <int> >
^
mandatory space.
When using the System.DirectoryServices name space to bind to an ADAM (Active Directory Application Mode; now called AD LDS, I think), I lost an entire day trying to debug this simple code:
DirectoryEntry rootDSE = new DirectoryEntry(
"ldap://192.168.10.78:50000/RootDSE",
login,
password,
AuthenticationTypes.None);
When I ran the code, I kept getting a COMException with error 0x80005000, which helpfully mapped to "Unknown error."
I could use the login and password and bind to the port via ADSI Edit. But this simple line of code didn't work. Bizarre firewall permission? Something screwed in configuration? Some COM object not registered correctly? Why on earth wasn't it working?
The answer? It's LDAP://, not ldap://.
And this is why we drink.
C++
class Foo
{
// Lots of stuff here.
} bar;
The declaration of bar is VERY difficult to see. More commonly found in C, but especially annoying in C++.
Perl's syntax caused me a bad day a while ago:
%table = {
foo => 1,
bar => 2
};
Without proper warnings (which are unavailable on the platform I was using), this creates a one-element hash with a key as the given hash reference and value undef. Note the subtle use of {}, which creates a new hash reference, and not (), which is an array used to populate the %table hash.
I was shocked Python's quasi-ternary operator wasn't a syntax error the first time I saw it:
X if Y else Z
This is stupid and common, but this syntax:
if ( x = y ) {
// do something
}
Has caught me about three times in the past year in a couple of different languages. I really like the R language's convention of using <- for assignment, like this:
x <- y
If the x = y syntax were made to mean x == y, and x <- y to mean assignment, my brain would make a smoother transition to and from math and programming.
C/C++'s bitvector syntax. The worst part about this is trying to google for it simply based on the syntax.
struct C {
unsigned int v1 : 12;
unsigned int v2 : 1;
};
C#'s ?? operator threw me for a loop the first time I saw it. Essentially it will return the LHS if it's non-null and the RHS if the LHS is null.
object bar = null;
object foo = bar ?? new Student(); // gets new Student()
Powershell's function calling semantics
function foo() {
params ($count, $name);
...
}
foo (5, "name")
For the non powershellers out there. This will work but not how you expect it to. It actually creates an array and passes it as the first argument. The second argument has no explicit value. The correct version is
foo 5 "name"
The first time I saw a function pointer in C++ I was confused. Worse, because the syntax has no key words, it was really hard to look up. What exactly does one type into a search engine for this?
int (*Foo)(float, char, char);
I ended up having to ask the local C++ guru what it was.
VB's (yeah yeah, I have to use it) "And" keyword - as in:
If Object IsNot Nothing And Object.Property Then
See that Object.Property reference, after I've made sure the object isn't NULL? Well, VB's "And" keyword * does * not * block * further * evaluation and so the code will fail.
VB does have, however, another keyword - AndAlso:
If Object IsNot Nothing AndAlso Object.Property Then
That will work as you'd expect and not explode when run.
I was once very confused by some C++ code that declared a reference to a local variable, but never used it. Something like
MyLock &foo;
(Cut me some slack on the syntax, I haven't done C++ in nearly 8 years)
Taking that seemingly unused variable out made the program start dying in obscure ways seemingly unrelated to this "unused" variable. So I did some digging, and found out that the default ctor for that class grabbed a thread lock, and the dtor released it. This variable was guarding the code against simultaneous updates without seemingly doing anything.
Javascript: This syntax ...
for(i in someArray)
... is for looping through arrays, or so I thought. Everything worked fine until another team member dropped in MooTools, and then all my loops were broken because the for(i in ...) syntax also goes over extra methods that have been added to the array object.
Had to translate some scientific code from old FORTRAN to C. A few things that ruined my day(s):
Punch-card indentation. The first 6 characters of every line were reserved for control characters, goto labels, comments, etc:
^^^^^^[code starts here]
c [commented line]
Goto-style numbering for loops (coupled with 6 space indentation):
do 20, i=0,10
do 10, j=0,10
do_stuff(i,j)
10 continue
20 continue
Now imagine there are multiple nested loops (i.e., do 20 to do 30) which have no differentiating indentation to know what context you are in. Oh, and the terminating statements are hundreds of lines away.
Format statement, again using goto labels. The code wrote to files (helpfully referred to by numbers 1,2,etc). To write the values of a,b,c to file we had:
write (1,51) a,b,c
So this writes a,b,c to file 1 using a format statement at the line marked with label 51:
51 format (f10.3,f10.3,f10.3)
These format lines were hundreds of lines away from where they were called. This was complicated by the author's decision to print newlines using:
write (1,51) [nothing here]
I am reliably informed by a lecturer in the group that I got off easy.
C's comma operator doesn't seem very obscure to me: I see it all the time, and if I hadn't, I could just look up "comma" in the index of K&R.
Now, trigraphs are another matter...
void main() { printf("wat??!\n"); } // doesn't print "wat??!"
Wikipedia has some great examples, from the genuinely confusing:
// Will the next line be executed????????????????/
a++;
to the bizarrely valid:
/??/
* A comment *??/
/
And don't even get me started on digraphs. I would be surprised if there's somebody here who can fully explain C's digraphs from memory. Quick, what digraphs does C have, and how do they differ from trigraphs in parsing?
Syntax like this in C++ with /clr enabled. Trying to create a Managed Dictionary object in C++.
gcroot<Dictionary<System::String^, MyObj^>^> m_myObjs;
An oldie:
In PL/1 there are no reserved words, so you can define variables, methods, etc. with the same name as the language keywords.
This can be a valid line of code:
IF ELSE THEN IF ELSE THEN
(Where ELSE is a boolean, and IF and THEN are functions, obviously.)
Iif(condition, expression, expression) is a function call, not an operator.
Both sides of the conditional are ALWAYS evaluated.
It always ruines my day if I have to read/write some kind of Polish notation as used in a lot of HP calculators...
PHP's ternary operator associates left to right. This caused me much anguish one day when I was learning PHP. For the previous 10 years I had been programming in C/C++ in which the ternary operator associates right to left.
I am still a little curious as to why the designers of PHP chose to do that when, in many other respects, the syntax of PHP matches that C/C++ fairly closely.
EDIT: nowadays I only work with PHP under duress.
Not really obscure, but whenever I code too much in one language, and go back to another, I start messing up the syntax of the latter. I always chuckle at myself when I realize that "#if" in C is not a comment (but rather something far more deadly), and that lines in Python do not need to end in a semicolon.
While performing maintentnace on a bit of C++ code I once spotted that someone had done something like this:
for (i=0; i<10; i++)
{
MyNumber += 1;
}
Yes, they had a loop to add 1 to a number 10 times.
Why did it ruin my day? The perpetrator had long since left, and I was having to bug fix their module. I thought that if they were doing something like this, goodness knows what else I was going to encounter!
AT&T assembler syntax >:(
This counter-intuitive, obscure syntax has ruined many of my days, for example, the simple Intel syntax assembly instruction:
mov dword es:[ebp-5], 1 /* Cool, put the value 1 into the
* location of ebp minus five.
* this is so obvious and readable, and hard to mistake
* for anything else */
translates into this in AT&T syntax
movl $1, %es:-4(%ebp) /* huh? what's "l"? 4 bytes? 8 bytes? arch specific??
* wait, why are we moving 1 into -4 times ebp?
* or is this moving -4 * ebp into memory at address 0x01?
* oh wait, YES, I magically know that this is
* really setting 4 bytes at ebp-5 to 1!
More...
mov dword [foo + eax*4], 123 /* Intel */
mov $123, foo(, %eax, 4) /* AT&T, looks like a function call...
* there's no way in hell I'd know what this does
* without reading a full manual on this syntax */
And one of my favorites.
It's as if they took the opcode encoding scheme and tried to incorporate it into the programming syntax (read: scale/index/base), but also tried to add a layer of abstraction on the data types, and merge that abstraction into the opcode names to cause even more confusion. I don't see how anyone can program seriously with this.
In a scripting language (Concordance Programming Language) for stand alone database software (Concordance) used for litigation document review, arrays were 0 indexed while (some) string functions were 1 indexed. I haven't touched it since.
This. I had my run in with it more then once.
GNU extensions are often fun:
my_label:
unsigned char *ptr = (unsigned char *)&&my_label;
*ptr = 5; // Will it segfault? Finding out is half the fun...
The syntax for member pointers also causes me grief, more because I don't use it often enough than because there's anything really tricky about it:
template<typename T, int T::* P>
function(T& t)
{
t.*P = 5;
}
But, really, who needs to discuss the obscure syntax in C++? With operator overloading, you can invent your own!

Resources