Are there sequence-operator implementations in .NET 4.0? - linq

With that I mean similar to the Linq join, group, distinct, etc. only working on sequences of values, not collections.
The difference between a sequence and a collection is that a sequence might be infinite in length, whereas a collection is finite.
Let me give you an example:
var c1 = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
var c2 = FunctionThatYieldsFibonacciNumbers();
var c3 = c1.Except(c2);
This does not work. The implementation of Except does not work on the basis that the numbers in either collection will be strictly ascending or descending, so it first tries to gather all the values from the second collection into a set (or similar), and only after that will it start enumerating the first collection.
Assuming that the function above is just a While-loop that doesn't terminate unless you explicitly stop enumerating it, the above code will fail with a out-of-memory exception.
But, given that I have collections that are considered to be strictly ascending, or descending, are there any implementations already in .NET 4.0 that can do:
Give me all values common to both (inner join)
Give me all values of both (union/outer join)
Give me all values in sequence #1 that isn't in sequence #2
I need this type of functionality related to a scheduling system I need to build, where I need to do things like:
c1 = the 1st and 15th of every month from january 2010 and onwards
c2 = weekdays from 2010 and onwards
c3 = all days in 2010-2012
c4 = c1 and c2 and c3
This would basically give me every 1st and 15th of every month in 2010 through 2012, but only when those dates fall on weekdays.
With such functions it would be much easier to generate the values in question without explicitly having to build collections out of them. In the above example, building the first two collections would need to know the constraint of the third collection, and the examples can become much more complex than the above.

I'd say that the LINQ operators already work on general sequences - but they're not designed to work specifically for monotonic sequences, which is what you've got here.
It wouldn't be too hard to write such things, I suspect - but I don't believe anything's built-in; there isn't even anything for this scenario in System.Interactive, as far as I can see.

You may onsider the Seq module of F#, which is automatically invoked by using special F# language constructs like 1 .. 10 which generates a sequence. It supports infinite sequences the way you describe, because it allows for lazy evaluation. Using F# may or may not be trivial in your situation. However, it shouldn't be too hard to use the Seq module directly from C# (but I haven't tried it myself).
Following this Mandelbrot example shows a way to use infinite sequences with C# by hiding yield. Not sure it brings you closer to what you want, but it might help.
EDIT
While you already commented that it isn't worthwhile in your current project and accepted an answer to your question, I was intrigued by the idea and conjured up a little example.
It appeared to be rather trivial and works well in C# with .NET 3.5 and .NET 4.0, by simple including FSharp.Core.dll (download it for .NET 3.5) to your references. Here's an out-of-the box example of an infinite sequence implementing your first use-case:
// place in your using-section:
using Microsoft.FSharp.Collections;
using Microsoft.FSharp.Core;
// [...]
// trivial 1st and 15th of the month filter, starting Jan 1, 2010.
Func<int, DateTime> firstAndFifteenth = (int i) =>
{
int year = i / 24 + 2010;
int day = i % 2 != 0 ? 15 : 1;
int month = ((int)i / 2) % 12 + 1;
return new DateTime(year, month, day);
};
// convert func to keep F# happy
var fsharpFunc = FSharpFunc<int, DateTime>.FromConverter(
new Converter<int, DateTime>(firstAndFifteenth));
// infinite sequence, returns IEnumerable
var infSeq = SeqModule.InitializeInfinite<DateTime>(fsharpFunc);
// first 100 dates
foreach (var dt in infSeq.Take(100))
Debug.WriteLine("Date is now: {0:MM-dd-yyy}", dt);
Output is as can be expected, first few lines like so:
Date is now: 01-01-2010
Date is now: 01-15-2010
Date is now: 02-01-2010
Date is now: 02-15-2010
Date is now: 03-01-2010

Related

Is there a way to use range with Z3ints in z3py?

I'm relatively new to Z3 and experimenting with it in python. I've coded a program which returns the order in which different actions is performed, represented with a number. Z3 returns an integer representing the second the action starts.
Now I want to look at the model and see if there is an instance of time where nothing happens. To do this I made a list with only 0's and I want to change the index at the times where each action is being executed, to 1. For instance, if an action start at the 5th second and takes 8 seconds to be executed, the index 5 to 12 would be set to 1. Doing this with all the actions and then look for 0's in the list would hopefully give me the instances where nothing happens.
The problem is: I would like to write something like this for coding the problem
list_for_check = [0]*total_time
m = s.model()
for action in actions:
for index in range(m.evaluate(action.number) , m.evaluate(action.number) + action.time_it_takes):
list_for_check[index] = 1
But I get the error:
'IntNumRef' object cannot be interpreted as an integer
I've understood that Z3 isn't returning normal ints or bools in their models, but writing
if m.evaluate(action.boolean):
works, so I'm assuming the if is overwritten in a way, but this doesn't seem to be the case with range. So my question is: Is there a way to use range with Z3 ints? Or is there another way to do this?
The problem might also be that action.time_it_takes is an integer and adding a Z3int with a "normal" int doesn't work. (Done in the second part of the range).
I've also tried using int(m.evaluate(action.number)), but it doesn't work.
Thanks in advance :)
When you call evaluate it returns an IntNumRef, which is an internal z3 representation of an integer number inside z3. You need to call as_long() method of it to convert it to a Python number. Here's an example:
from z3 import *
s = Solver()
a = Int('a')
s.add(a > 4);
s.add(a < 7);
if s.check() == sat:
m = s.model()
print("a is %s" % m.evaluate(a))
print("Iterating from a to a+5:")
av = m.evaluate(a).as_long()
for index in range(av, av + 5):
print(index)
When I run this, I get:
a is 5
Iterating from a to a+5:
5
6
7
8
9
which is exactly what you're trying to achieve.
The method as_long() is defined here. Note that there are similar conversion functions from bit-vectors and rationals as well. You can search the z3py api using the interface at: https://z3prover.github.io/api/html/namespacez3py.html

Calculating year dummy variables using a for loop (foreach) in Stata

I am attempting to generate a dummy variable for each year from 1996 to 2012 (inclusive) such that the 1996 dummy should equal 1 if it is 1996 and 0 if else using the foreach command in Stata to cut down on time (at least for future projects). What is currently happening is that the dummy for 1996 is being produced, but no others are generated.
I think that it has to do with how I am defining j, but I cannot quite figure out the formatting to achieve the results that I want. I have looked online and in the Stata help files and cannot find anything on this specific topic.
Here is what I have thus far:
local var year
local j = 1996
foreach j of var year {
gen d`j' = 1 if year==`j'
local ++j
}
I will continue to try and figure this out on my own, but if anyone has a suggestion I would be greatly appreciative.
Let us look at this line by line.
local var year
You defined a local macro var with content "year". This is legal but you never refer to that local macro in this code, so the definition is pointless.
local j = 1996
You defined a local macro j with content "1996". This is legal.
foreach j of var year {
You open a loop and define the loop index to be j. That means that within the loop any reference to local macro j will be interpreted in terms of the list of arguments you provide. (The previous definition of j is irrelevant within the loop, and so has no effect in the rest of your code.)
... of var year
You specify that the loop is over a variable list here. Note that the keyword var here is short for varlist and has absolutely nothing to do the local macro name var you just defined. The variable list consists of the single variable name year.
gen d`j' = 1 if year==`j'
This statement will be interpreted, the one and only time the loop is executed, as
gen dyear = 1 if year==year
as references to the local macro j are replaced with its contents, the variable name year. year==year is true for every observation. The effect is a new variable dyear which is 1 in every observation. That is not an indicator or dummy variable as you want it. If you look at your dataset carefully, you will see that is not a dummy variable for year being 1996.
local ++j
You are trying to increment the local macro j by 1. But you just set local macro j to contain the string "year", which is a variable name. But you can't add 1 to a string, and so the error message will be type mismatch. You don't report that error, which is a surprise. It is a little subtle, as in the previous command the context of generate allows interpretation of the reference to year as an instruction to calculate with the variable year, which is naturally numeric. But local commands are all about string manipulation, which may or may not have numeric interpretation, and your command is equivalent, first of all, to instructing Stata to add
"year" + 1
which triggers a type mismatch error.
Turning away from your code: Consider a loop
forval y = 1996/2012 {
gen d`y' = 1 if year == `y'
}
This is closer to what you want but makes clearer another bug in your code. This would create variables d1996 to d2012 but each will be 1 in the year specified but missing otherwise, which is not what you want.
You could fix that by adding a further line in the loop
replace d`y' = 0 if year != `y'
but a much cleaner way to do it is the single line
gen d`y' = year == `y'
The expression
year == `y'
is evaluated as 1 when true and 0 when false, which is what you want.
All this is standard technique documented in [U] or [P].
As #Roberto Ferrer pointed out, however, experienced Stata users would not define dummies this way, as tabulate offers an option to do it without a loop.
A tutorial that brings together comments on local macros, foreach and forvalues loops is within http://www.stata-journal.com/sjpdf.html?articlenum=pr0005
search foreach
within Stata would have pointed to that as one of various pieces you can read.
Looping is not necessary. Try the tabulate command with the gen() option. See help tabulate oneway.
See also help xi and help factor variables.
You are trying to loop through the distinct values of year but the syntax is not correct. You are actually looping through a list of variables with only one element: year. The command levelsof gives you the distinct values, but like I said, looping is not necessary.
Maybe this might help.
/*assuming the data is from 1970-2012*/
/*assuming your year variable name is fyear*/
forvalues x=1970/2012 {
gen fyear `x'=0
replace fyear `x'=1 if fyear==`x'
}
However, I do agree with Roberto Ferrer that loop may not be necessary.

Creating a SPSS loop function

I need to create a syntax loop that runs a series of transformation
This is a simplified example of what I need to do
I would like to create five fruit variables
apple_variable
banana_variable
mango_variable
papaya_variable
orange_variable
in V1
apple=1
banana=2
mango=3
papaya=4
orange=5
First loop
IF (V1={number}) {fruit}_variable = VX.
IF (V2={number}) {fruit}_variable = VY.
IF (V3={number}) {fruit}_variable = VZ.
Run loop for next fruit
So what I would like is the scripte to check if V1, V2 or V3 contains the fruit number. If one of them does (only one can) The new {fruit}_variable should get the value from VX, VY or VZ.
Is this possible? The script need to create over 200 variables so a bit to time consuming to do manually
The first loop can be put within a DO REPEAT command. Essentially you define your two lists of variables and you can loop over the set of if statements.
DO REPEAT V# = V1 V2 V3
/VA = VX VY VZ.
if V# = 1 apple_variable = VA.
END REPEAT.
Now 1 and apple_variable are hard coded in the example above, but we can roll this up into a simple macro statement to take arbitrary parameters.
DEFINE !fruit (!POSITIONAL = !TOKENS(1)
/!POSITIONAL = !TOKENS(1)).
DO REPEAT V# = V1 V2 V3
/VA = VX VY VZ.
if V# = !1 !2 = VA.
END REPEAT.
!ENDDEFINE.
!fruit 1 apple_variable.
Now this will still be a bit tedious for over 200 variables, but should greatly simplify the task. After I get this far I typically just do text editing to my list to call the macro 200 times, which in this instance all that it would take is inserting !fruit before the number and the resulting variable name. This works well especially if the list is static.
Other approaches using the in-built SPSS facilities (mainly looping within the defined MACRO) IMO tend to be ugly, can greatly complicate the code and frequently not worth the time (although certainly doable). Although that is somewhat mitigated if you are willing to accept a solution utilizing python commands.
DO REPEAT is a good solution here, but I'm wondering what the ultimate goal is. This smells like a problem that might be solved by using the multiple response facilities in Statistics without the need to go through these transformations. Multiple response functionality is available in the old MULTIPLE RESPONSE procedure and in the newer CTABLES and Chart Builder facilities.
HTH,
Jon Peck
combination of loop statements: for,while, do while with nested if..else and switch case will do the trick. just make sure you have your initial value and final value for the loop to go
let's say:
for (initial; final; increment)
{
if (x == value) {
statements;
}else{
...
}

When are numbers NOT Magic?

I have a function like this:
float_as_thousands_str_with_precision(value, precision)
If I use it like this:
float_as_thousands_str_with_precision(volts, 1)
float_as_thousands_str_with_precision(amps, 2)
float_as_thousands_str_with_precision(watts, 2)
Are those 1/2s magic numbers?
Yes, they are magic numbers. It's obvious that the numbers 1 and 2 specify precision in the code sample but not why. Why do you need amps and watts to be more precise than volts at that point?
Also, avoiding magic numbers allows you to centralize code changes rather than having to scour the code when for the literal number 2 when your precision needs to change.
I would propose something like:
HIGH_PRECISION = 3;
MED_PRECISION = 2;
LOW_PRECISION = 1;
And your client code would look like:
float_as_thousands_str_with_precision(volts, LOW_PRECISION )
float_as_thousands_str_with_precision(amps, MED_PRECISION )
float_as_thousands_str_with_precision(watts, MED_PRECISION )
Then, if in the future you do something like this:
HIGH_PRECISION = 6;
MED_PRECISION = 4;
LOW_PRECISION = 2;
All you do is change the constants...
But to try and answer the question in the OP title:
IMO the only numbers that can truly be used and not be considered "magic" are -1, 0 and 1 when used in iteration, testing lengths and sizes and many mathematical operations. Some examples where using constants would actually obfuscate code:
for (int i=0; i<someCollection.Length; i++) {...}
if (someCollection.Length == 0) {...}
if (someCollection.Length < 1) {...}
int MyRidiculousSignReversalFunction(int i) {return i * -1;}
Those are all pretty obvious examples. E.g. start and the first element and increment by one, testing to see whether a collection is empty and sign reversal... ridiculous but works as an example. Now replace all of the -1, 0 and 1 values with 2:
for (int i=2; i<50; i+=2) {...}
if (someCollection.Length == 2) {...}
if (someCollection.Length < 2) {...}
int MyRidiculousDoublinglFunction(int i) {return i * 2;}
Now you have start asking yourself: Why am I starting iteration on the 3rd element and checking every other? And what's so special about the number 50? What's so special about a collection with two elements? the doubler example actually makes sense here but you can see that the non -1, 0, 1 values of 2 and 50 immediately become magic because there's obviously something special in what they're doing and we have no idea why.
No, they aren't.
A magic number in that context would be a number that has an unexplained meaning. In your case, it specifies the precision, which clearly visible.
A magic number would be something like:
int calculateFoo(int input)
{
return 0x3557 * input;
}
You should be aware that the phrase "magic number" has multiple meanings. In this case, it specifies a number in source code, that is unexplainable by the surroundings. There are other cases where the phrase is used, for example in a file header, identifying it as a file of a certain type.
A literal numeral IS NOT a magic number when:
it is used one time, in one place, with very clear purpose based on its context
it is used with such common frequency and within such a limited context as to be widely accepted as not magic (e.g. the +1 or -1 in loops that people so frequently accept as being not magic).
some people accept the +1 of a zero offset as not magic. I do not. When I see variable + 1 I still want to know why, and ZERO_OFFSET cannot be mistaken.
As for the example scenario of:
float_as_thousands_str_with_precision(volts, 1)
And the proposed
float_as_thousands_str_with_precision(volts, HIGH_PRECISION)
The 1 is magic if that function for volts with 1 is going to be used repeatedly for the same purpose. Then sure, it's "magic" but not because the meaning is unclear, but because you simply have multiple occurences.
Paul's answer focused on the "unexplained meaning" part thinking HIGH_PRECISION = 3 explained the purpose. IMO, HIGH_PRECISION offers no more explanation or value than something like PRECISION_THREE or THREE or 3. Of course 3 is higher than 1, but it still doesn't explain WHY higher precision was needed, or why there's a difference in precision. The numerals offer every bit as much intent and clarity as the proposed labels.
Why is there a need for varying precision in the first place? As an engineering guy, I can assume there's three possible reasons: (a) a true engineering justification that the measurement itself is only valid to X precision, so therefore the display shoulld reflect that, or (b) there's only enough display space for X precision, or (c) the viewer won't care about anything higher that X precision even if its available.
Those are complex reasons difficult to capture in a constant label, and are probbaly better served by a comment (to explain why something is beng done).
IF the use of those functions were in one place, and one place only, I would not consider the numerals magic. The intent is clear.
For reference:
A literal numeral IS magic when
"Unique values with unexplained meaning or multiple occurrences which
could (preferably) be replaced with named constants." http://en.wikipedia.org/wiki/Magic_number_%28programming%29 (3rd bullet)

Mux & Demux w/ LINQ

I'm playing around with using LINQ to Objects for multiplexing and demultiplexing but it seems to me that this is a pretty tricky problem.
See this demuxer signature:
public static IEnumerable<IEnumerable<TSource>> Demux<TSource>(this IEnumerable<TSource> source, int multiplexity)
On an abstract level this is easy but ideally one would want to
remain lazy for the source stream
remain lazy for each multiplexed stream
not reiterate over the same elements
How would you do this?
I'm a bit tired so it could be my concentration failing me here...
Assuming that you want (0, 1, 2, 3) to end up as (0, 2) and (1, 3) when demuxing to two streams, you basically can't do it without buffering. You could buffer only when necessary, but that would be difficult. Basically you need to be able to cope with two contradictory ways of using the call...
Getting both iterators, and reading one item from each of them:
// Ignoring disposing of iterators etc
var query = source.Demux(2);
var demuxIterator = query.GetEnumerator();
demuxIterator.MoveNext();
var first = demuxIterator.Current;
demuxIterator.MoveNext();
var second = demuxIterator.Current;
first.MoveNext();
Console.WriteLine(first.Current); // Prints 0
second.MoveNext();
Console.WriteLine(second.Current); // Prints 1
Or getting one iterator, then reading both items:
// Ignoring disposing of iterators etc
var query = source.Demux(2);
var demuxIterator = query.GetEnumerator();
demuxIterator.MoveNext();
var first = demuxIterator.Current;
first.MoveNext();
Console.WriteLine(first.Current); // Prints 0
first.MoveNext();
Console.WriteLine(first.Current); // Prints 2
In the second case, it has to either remember the 1, or be able to reread it.
Any chance you could deal with IList<T> instead of IEnumerable<T>? That would "break" the rest of LINQ to Objects, admittedly - lazy projections etc would be a thing of the past.
Note that this is quite similar to the problem that operations like GroupBy have - they're deferred, but not lazy: as soon as you start reading from a GroupBy result, it reads the whole of the input data.

Resources