I want to get the last element of a lazy but finite Seq in Raku, e.g.:
my $s = lazy gather for ^10 { take $_ };
The following don't work:
say $s[* - 1];
say $s.tail;
These work, but they don't seem very idiomatic:
say (for $s<> { $_ }).tail;
say (for $s<> { $_ })[* - 1];
What is the most idiomatic way of doing this while keeping the original Seq lazy?
What you're asking about ("get[ing] the last element of a lazy but finite Seq … while keeping the original Seq lazy") isn't possible. I don't mean that it's not possible with Raku – I mean that, in principle, it's not possible for any language that defines "laziness" the way Raku does with, for example, the is-lazy method.
In particular, when a Seq is lazy in Raku, that "means that [the Seq's] values are computed on demand and stored for later use." Additionally, one of the defining features of a lazy iterable is that it cannot know its own length while remaining lazy – that's why calling .elems on a lazy iterable throws an error:
my $s = lazy gather for ^10 { take $_ };
say $s.is-lazy; # OUTPUT: «True»
$s.elems; # THROWS: «Cannot .elems a lazy list onto a Seq»
Now, at this point, you might reasonably be thinking "well, maybe Raku doesn't know how long $s is, but I can tell that it has exactly 10 elements in it." And you're not wrong – with that code, $s is indeed guaranteed to have 10 elements. This means that, if you want to get the tenth (last) element of $s, you can do so with $s[9]. And accessing $s's tenth element like that won't change the fact that $s.is-lazy.
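A quick sketch of that claim, with the output the preceding paragraph describes:
my $s = lazy gather for ^10 { take $_ };
say $s[9];      # OUTPUT: «9»
say $s.is-lazy; # OUTPUT: «True» (indexing an element we know exists kept it lazy)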
But, importantly, you can only do so because you know something "extra" about $s, and that extra info undoes a good chunk of the reason you might want a list to be lazy in practice.
To see what I mean, consider a very similar Seq:
my $s2 = lazy gather for ^10 { last if rand > .95; take $_ };
say $s2.is-lazy; # OUTPUT: «True»
Now, $s2 probably has 10 elements, but it might not – the only way to know is to iterate through it and find out. In turn, this means $s2[9] does not jump to the tenth element the way $s[9] did; it iterates through $s2 just like you'd need to. And, as a result, if you run $s2[9], then $s2 will no longer be lazy (i.e., $s2.is-lazy will return False).
And this is, in effect, what you did in the code in your question:
my $s = lazy gather for ^10 { take $_ };
say $s.is-lazy; # OUTPUT: «True»
say (for $s<> { $_ }).tail; # OUTPUT: «9»
say $s.is-lazy; # OUTPUT: «False»
Because Raku cannot ever know that it has reached the tail of a lazy Seq, the only way it could tell you the .tail is to fully iterate $s. And that necessarily means that $s is no longer lazy.
Two complications
It's worth mentioning two topics that aren't actually related but are close enough that they trip some people up.
First, nothing I've said about lazy iterables not knowing their length precludes some non-lazy iterables from knowing their length. Indeed, a decent number of Raku types do both the Iterator role and the PredictiveIterator role – and the main point of a PredictiveIterator is that it does know how many elements it can produce without needing to produce/iterate them. But PredictiveIterators cannot be lazy.
The second potentially confusing topic is closely related to the first: while no PredictiveIterator can be lazy (that is, none will ever have an .is-lazy method that returns True), some PredictiveIterators have behavior that is very similar to laziness – and, in fact, may even be colloquially referred to as "lazy".
I can't do a great job explaining this distinction because, quite honestly, I don't fully understand it myself. But I can give you an example: the .lines method on an IO::Handle. Reading the lines of a huge file certainly behaves a lot like dealing with a lazy iterable: most obviously, you can process each line without ever having the whole file in memory. And the docs even say that "lines are read lazily" with the .lines method.
On the other hand:
my $l = 'some-file-with-100_000-lines.txt'.IO.lines;
say $l.is-lazy; # OUTPUT: «False»
say $l.iterator ~~ PredictiveIterator; # OUTPUT: «True»
say $l.elems; # OUTPUT: «100000»
So I'm not quite sure whether it's fair to say that $l "is a lazy iterable", but if it is, it's "lazy" in a different way than $s was.
I realize that was a lot, but I hope it is helpful. If you have a more specific use case in mind for laziness (I bet it wasn't gathering the numbers from zero to nine!), I'd be happy to address that more specifically. And if anyone else can fill in some of the details with .lines and other lazy-not-lazy PredictiveIterators, I'd really appreciate it!
Drop the lazy
Lazy sequences in Raku are designed to work well as is. You don't need to emphasize they're lazy by adding an explicit lazy.
If you add an explicit lazy, Raku interprets that as a request to block operations such as .tail, because they will almost certainly render laziness moot immediately and, if called on an infinite sequence (or even just a sufficiently large one), hang or OOM the program.
So, either drop the lazy, or don't invoke operations like .tail that will be blocked if you do.
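A small sketch of those two options (assuming current Rakudo behaviour):
say (gather for ^10 { take $_ }).tail;      # OUTPUT: «9» (is-lazy is False, so .tail is allowed)
say (lazy gather for ^10 { take $_ }).tail; # fails with a cannot-be-lazy error: the explicit lazy blocks .tail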
Expanded version of my original answer
As noted by @ugexe, the idiomatic solution is to drop the lazy.
Quoting my answer to the SO About Laziness:
if a gather is asked if it's lazy, it returns False.
Aiui, something like the following applies:
Some lazy sequence producers may be actually or effectively infinite. If so, calling .tail etc on them will hang the calling program. Conversely, other lazy sequences perform fine when all their values are consumed in one go. How should Raku distinguish between these two scenarios?
A decision was made in 2015 to let value producing datatypes emphasize or deemphasize their laziness via their response to an .is-lazy call.
Returning True signals that a sequence is not only lazy but wants to be known to be lazy by consuming code that calls .is-lazy. (Not so much end-user code but instead built-in consuming features such as @-sigilled variables handling an assignment, trying to determine whether or not to assign eagerly.) Built-in consuming features take a True as a signal that they ought to block calls like .tail. If a dev knows this is overly conservative, they can add an eager (or remove an unneeded lazy).
Conversely, a datatype, or even a particular object instance, may return False to signal that it does not want to be considered lazy. This may be because the actual behaviour of a particular datatype or instance is eager, but it might instead be that it is lazy technically, but doesn't want a consumer to block operations such as .tail because it knows they will not be harmful, or at least prefers to have that be the default presumption. If a dev knows better (because, say, it hangs the program), or at least does not want to block potentially problematic operations, they can add a lazy (or remove an unneeded eager).
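To make the assignment point above concrete, here is a sketch (assuming current Rakudo behaviour) of how an @-sigilled assignment reacts to each signal:
my @eagerly = gather for ^10 { take $_ };     # gather answers .is-lazy with False
say @eagerly.elems;                           # OUTPUT: «10» (the assignment was eager)
my @lazily = lazy gather for ^10 { take $_ }; # .is-lazy is True
say @lazily.is-lazy;                          # OUTPUT: «True» (the assignment stayed lazy)
# calling @lazily.elems here would throw the "Cannot .elems a lazy list" error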
I think this approach works well, but the doc and error messages mentioning "lazy" may not have caught up with the shift made in 2015. So:
If you've been confused by some doc about laziness, please search for doc issues with "lazy" in them, or "laziness", and add comments to existing issues, or file a new doc issue (perhaps linking to this SO answer).
If you've been confused by a Rakudo error message mentioning laziness, please search for Rakudo issues with "lazy" in them, and tagged [LTA] (which means "Less Than Awesome"), and add comments, or file a new Rakudo issue (with an [LTA] tag, and perhaps a link to this SO answer).
Further discussion
the docs ... say “If you want to force lazy evaluation use the lazy subroutine or method. Binding to a scalar or sigilless container will also force laziness.”
Yes. Aiui this is correct.
[which] sounds like it implies “my $x := lazy gather { ... } is the same as my $x := gather { ... }”.
No.
An explicit lazy statement prefix or method adds emphasis to laziness, and Raku interprets that to mean it ought block operations like .tail in case they hang the program.
In contrast, binding to a variable alters neither emphasis nor deemphasis of laziness, merely relaying onward whatever the bound producer datatype/instance has chosen to convey via .is-lazy.
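A sketch of that, reusing the gather from the question (assuming current Rakudo behaviour):
my $bound-plain := gather for ^10 { take $_ };
say $bound-plain.is-lazy; # OUTPUT: «False» (binding just relays the gather's own answer)
my $bound-lazy := lazy gather for ^10 { take $_ };
say $bound-lazy.is-lazy;  # OUTPUT: «True» (the explicit lazy forced the answer)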
not only in connection with gather but elsewhere as well
Yes. It's about the result of .is-lazy:
my $x = (1, { .say; $_ + 1 } ... 1000);
my $y = lazy (1, { .say; $_ + 1 } ... 1000);
both act lazily ... but $x.tail is possible while $y.tail is not.
Yes.
An explicit lazy statement prefix or method forces the answer to .is-lazy to be True. This signals to a consumer that cares about the dangers of laziness that it should become cautious (eg rejecting .tail etc.).
(Conversely, an eager statement prefix or method can be used to force the answer to .is-lazy to be False, making timid consumers accept .tail etc calls.)
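For example, a sketch of the eager counterpart to $y above (assuming current Rakudo behaviour):
my $z = eager (1, { $_ + 1 } ... 1000);
say $z.is-lazy; # OUTPUT: «False»
say $z.tail;    # OUTPUT: «1000» (a timid consumer now accepts .tail)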
I take from this that there are two kinds of laziness in Raku, and one has to be careful to see which one is being used where.
It's two kinds of what I'll call consumption guidance:
Don't-tail-me: if an object returns True from an .is-lazy call, then it is treated as if it might be infinite, and operations like .tail are blocked.
You-can-tail-me: if an object returns False from an .is-lazy call, then operations like .tail are accepted.
It's not so much that there's a need to be careful about which of these two kinds is in play, but if one wants to call operations like tail, then one may need to enable that by inserting an eager or removing a lazy, and one must take responsibility for the consequences:
If the program hangs due to use of .tail, well, DIHWIDT.
If you suddenly consume all of a lazy sequence and haven't cached it, well, maybe you should cache it.
Etc.
What I would say is that the error messages and/or doc may well need to be improved.
This is a basic question on the use of Trace Tables to assist in a dry run of a simple algorithm.
What I find most tricky is knowing when to take a new line in the trace table. For example, take the following question:
Here is the array of integers which it applies to:
The following trace table is presented as one completes a dry run. Here is the solution:
I understand that the initial values of the variables Number, Lower & Upper appear on the first line, but when I get to the While loop I am tempted to put the value 5 on the first line as well, for the variable Current. In essence, this is what I am tempted to do:
Why does this solution require that the value for Current, which is 5, appear on the second line? I suppose the question could be rephrased to 'When do I take a new line in a trace table?'
Thanks.
I think there is no single standard way to lay out a trace table, which means you have to set up your own rules before you start and then stick to them.
Consider this example:
And this one as well:
Did you notice the difference in where the loop iterator goes in each one? In the first example they put the initial value of the iterator on the first line, and in the second example they put the initialisation of the loop iterator on the second line.
Also have a look at the wiki example; they too put the loop initialisation on the second line.
This video also has an example similar to the ones I posted here, and it always starts the loop iterator on the second line.
This example, on the other hand, takes a totally different approach: each executed line of code gets its own row in the trace table.
You can also find yet another approach to trace tables here.
Finally:
In my opinion, choose the rules that make sense for you. For example:
1. The first line contains the initial (default) values of the variables.
2. For loop iterations, put the loop iterator on the same line as the variables affected by that iteration, like the second example I posted above.
Regarding your question, I think it's clearer to put Current's first value, 5, on the second line, so you can clearly track how each loop iteration affects your variables.
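For instance, tracing a small made-up loop under those two rules (Total starts at 0; then Total = Total + Count for Count going from 1 to 3) would look something like this:
Count | Total
      |   0
  1   |   1
  2   |   3
  3   |   6
The first row holds only the initial value, and each later row holds one iteration, with the iterator on the same line as the variable it changed.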
I am getting a rather strange behaviour when invoking the Oracle instr function, or probably I'm just blind enough not to see my stupid mistake.
Actually I have written a procedure to split a string. For example, if we have the string
a,e,i,o,u
then my split method will look like
string_split('a,e,i,o,u',',',5);
where the first parameter is the string to split, the second is the separator, and the third is the number of elements I know will be there after splitting.
Now, among other things, one thing my procedure does is invoke
start_index := instr(temp_string_to_split,',',1,(total_element-i));
But the moment it is invoked I get
ORA-06512: numeric or value error
But if I invoke
start_index := instr(temp_string_to_split,1,(total_element-i));
the procedure runs, though not in a desirable manner. Note that in the second invocation the separator parameter is missing and the number is passed directly as the second parameter, which I would have guessed should cause a big exception. But surprisingly it runs fine.
Can somebody explain this anomaly...or help me see if I'm missing something.
Thanks,
Mawia
I'm assuming that in your call to instr, temp_string_to_split is the string that was passed to string_split, and (total_element-i) is meant to be an iterator over the number of splits to make. (As an aside, it seems odd that you have ',' hardcoded in this call when you appear to be passing it as a parameter to string_split.)
I tried emulating this with the following SQL, which worked fine:
SELECT LEVEL,instr('a,e,i,o,u',',',1,LEVEL)
from dual connect by level < 5;
Do you know the exact values of temp_string_to_split, total_element, and i on the call to instr that caused the error?
Thanks a lot all for responding.
Actually as I told earlier, I was calling
start_index := instr(temp_string_to_split,',',1,(total_element-i));
in a loop. Now, at the final value of the loop,
(total_element-i)
was going negative. And this was the root of the malady.
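For anyone hitting the same thing, here is a minimal sketch of the failure (the values are hypothetical; the fourth, occurrence, argument of instr must be a positive integer, so a zero or negative value raises a runtime error):
declare
    start_index pls_integer;
begin
    -- the occurrence argument is negative here, mimicking total_element - i going below 1
    start_index := instr('a,e,i,o,u', ',', 1, -1);
    dbms_output.put_line(start_index);
end;
/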
One thing I'm still puzzled about, though: since this was a condition generated at run time, everything before the final invocation was legal. So why don't I see on the console the output of the few DBMS_OUTPUT.PUT_LINE statements that I had put in to trace the execution?
Thanks,
Mawia
I have a public method which uses a variable (only within the scope of the public method) that I pass in as a parameter; we will call it a. This method calls a private method multiple times, and that private method also requires the parameter.
At present I am passing the parameter every time, but it looks weird. Is it bad practice to make this a member variable of the class, or would the uncertainty about whether it is initialized outweigh the advantage of not having to pass it?
Simplified pseudo code:
public_method(parameter a)
    do something with a
    private_method(string_a, a)
    private_method(string_b, a)
    private_method(string_c, a)

private_method(String, parameter a)
    do something with String and a
Additional information: parameter a is a read-only map with over 100 entries, and in reality I will be calling private_method about 50 times.
I had this same problem myself.
I implemented it differently in 3 different contexts to see hands-on what the results are using 3 different strategies; see below.
Note that I am the type of programmer who makes many changes to the code, always trying to improve it. Thus I settle only for code that is amenable to change and readable; you could call this "flexible" code. I settle only for very clear code.
After experimentation, I came to these results:
1. Passing a as a parameter is perfectly OK if you have only one or two (a small number) of such values. Passing parameters gives very good visibility and clarity: clear passing lines, a clearly visible lifetime (initialization points, destruction points), amenability to change, and easy tracking.
If the number of such values grows to 5-6 or more, I switch to approach #3 below.
2. Passing values through class members did not help the clarity of my code, and eventually I got rid of it. It makes for less clear, muddled code. I did not like it; it had no advantages.
3. As an alternative to (1) and (2), I adopted the inner-class approach in cases where the number of such values is greater than 5 (which makes for too long an argument list).
I pack those values into a small inner class and pass that object by reference as an argument to all the internal methods.
The public function of the class usually creates an object of the inner class (I call it Impl, Ctx, or Args) and passes it down to the private functions.
This combines the clarity of argument passing with brevity. It's perfect; see the sketch below.
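Since the question's pseudocode isn't tied to a language, here is a rough Java-flavoured sketch of that context-object idea; all the names (Processor, Ctx, publicMethod, privateMethod) are made up for illustration:
import java.util.Map;

public class Processor {

    // Small context object holding everything the private helpers need.
    private static final class Ctx {
        final Map<String, String> a; // the read-only map from the question
        Ctx(Map<String, String> a) { this.a = a; }
    }

    public void publicMethod(Map<String, String> a) {
        Ctx ctx = new Ctx(a); // built once in the public entry point
        privateMethod("string_a", ctx);
        privateMethod("string_b", ctx);
        privateMethod("string_c", ctx);
    }

    private void privateMethod(String s, Ctx ctx) {
        // do something with s and ctx.a
        System.out.println(s + " -> " + ctx.a.getOrDefault(s, "<missing>"));
    }
}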
Good luck
Edit
Consider preparing an array of strings and using a loop rather than writing 50 almost-identical calls. Something like char *strings[] = {...} (C/C++).
This really depends on your use case. Does a represent state that your application/object cares about? Then you might want to make it a member of your object. Evaluate the big picture; think about maintenance and extensibility when designing structures.
If your parameter a is an instance of a class of your own, you might consider making private_method a public method of a's class instead.
Otherwise, I do not think this looks weird. If you only need a in just 1 function, making it a private variable of your class would be silly (at least to me). However, if you'd need it like 20 times I would do so :P Or even better, just make 'a' an object of your own that has that certain function you need.
A method should ideally not take more than 7 parameters. Needing more than 6-7 parameters usually indicates a problem with the design (do the 7 parameters represent an object of a nested class?).
As for your question: if you want to make the parameter a private member only for the sake of passing it between private methods, without the parameter having anything to do with the current state of the object (or some information about the object), then it is not recommended that you do so.
From a performance point of view (memory consumption), reference parameters can be passed around as method parameters without any significant impact on memory consumption, as they are passed by reference rather than by value (i.e. a copy of the data is not created). For a small number of parameters that can be grouped together, you can use a struct. For example, if the parameters represent the x and y coordinates of a point, pass them in a single Point structure.
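A minimal sketch of that grouping, in Java for the sake of illustration (assuming a JDK with records; the names are made up):
// Instead of distance(double x1, double y1, double x2, double y2), group the coordinates:
record Point(double x, double y) {}

class Geometry {
    static double distance(Point p, Point q) {
        return Math.hypot(p.x() - q.x(), p.y() - q.y());
    }
}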
Bottomline
Ask yourself this question: does the parameter that you are making a member represent any information (data) about the object? (Data can be state or unique identification information.) If the answer to this question is a clear no, then do not include the parameter as a member of the class.
More information
Limit number of parameters per method?
Parameter passing in C#
Which (if any) of the following will give the smallest performance hit? Or is the difference so small that I should use the most readable?
On the same page I've noted 3 styles used by previous maintainers:
Method 1:
If (strRqMethod = "Forum" or strRqMethod = "URL" or strRqMethod = "EditURL" or strRqMethod = "EditForum") Then
...
End If
Method 2:
Select Case strRqMethod
Case "Reply", "ReplyQuote", "TopicQuote"
'This is the only case in this statement...'
...
End Select
Method 3:
If InArray("Edit,EditTopic,Reply,ReplyQuote,Topic,TopicQuote",strRqMethod) Then
...
End If
.
.
.
'Elsewhere in the code'
function InArray(strArray,strValue)
if strArray <> "" and strArray <> "0" then
if (instr("," & strArray & "," ,"," & strValue & ",") > 0) then
InArray = True
else
InArray = False
end if
else
InArray = False
end if
end function
Moving away from Classic ASP/VBScript is not an option, so please don't bother posting comments suggesting that.
You can benchmark this yourself to get the best answer, as the performance will differ depending on the size of the input string.
However, I will say from a maintenance perspective, the second one is a bit easier to read/understand.
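A rough sketch of such a timing harness in Classic ASP (the sample value of strRqMethod and the loop count are assumptions; wrap each of the three methods the same way and compare):
Dim i, t, hits, strRqMethod
strRqMethod = "EditForum" ' sample value
hits = 0
t = Timer
For i = 1 To 100000
    If (strRqMethod = "Forum" Or strRqMethod = "URL" Or strRqMethod = "EditURL" Or strRqMethod = "EditForum") Then
        hits = hits + 1
    End If
Next
Response.Write "Method 1: " & FormatNumber(Timer - t, 4) & " seconds (" & hits & " hits)<br />"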
Well Method 3 is clearly going to perform worse than the other two.
Between Method 1 and Method 2 the difference is going to be marginal. It's worth remembering that VBScript doesn't do boolean expression short-circuiting, hence in Method 1 strRqMethod will be compared with all the strings even if it matches the first one. The Case statement in Method 2 at least has the option not to do that and will likely stop comparing when the first match is found in the set.
Ultimately I would choose Method 2, not because I think it might be faster but because it expresses the intent of the code in the clearest way.
Educated guess:
Performance-wise, the first two approaches are roughly equivalent; the third method is very likely slower, even if it gets inlined.
Furthermore, the differential between the first two is likely in the microseconds range, so you can safely consider this to be a bona fide case of premature optimization...
Since we're on the topic of OR-ed boolean evaluation, a few things to know:
Most compilers/interpreters will evaluate boolean expressions with "short-circuit optimization", which means that at the first true condition found, the subsequent OR-ed conditions are NOT evaluated (since they wouldn't change the outcome). It is therefore a good idea to list the conditions in [rough] decreasing order of probability, i.e. listing the most common cases first. (Also note that short-circuit evaluation is used with AND-ed expressions too, but of course in reverse, i.e. the evaluation stops at the first false condition, which suggests writing the expression with the conditions most likely to fail first.)
Comparing strings is such a common task that most languages do it in a very optimized fashion, at a very low level of the language. Almost any trick we can think of to improve this particular task is typically less efficient than the native operator.
As long as this is not done 100,000 times (in other words: a lot of times) in a loop, it makes no difference. Although it is parsed code, we may still assume that the parsing is done swiftly enough not to make a difference.
I found severe performance problems only when concatenating a lot of strings, as I once found out when running a page where I added debug code to a global string so I could display the debug output at the bottom of the page. The longer the page was, the more code it ran, the more debug code I added, and the longer it took to display the page. Since this page was doing some database access, I presumed the delay occurred somewhere in that code, only to find out that it was just the debug statements (to be honest, I had a lot of debug strings concatenated).