Binary Search Tree (wrapper) [closed] - data-structures

Closed 9 years ago.
I missed class last week so I am watching the online lecture (my professor records his lectures) and he keeps referring to a "wrapper".
I don't recall him explaining what a wrapper is. What does he mean by that? I googled it but to no avail. The language we are learning data structures in is C. Thank you!

A wrapper is usually some object or function that hides a full implementation behind an easier-to-use interface. Typically, a wrapper object around a binary search tree would be an object that exports nice functions like "insert," "delete," and "lookup" without any of those functions taking in explicit node pointers. That way, the binary search tree can be used without leaking the details of the representation to the client.
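As a rough C sketch (the names here are invented for illustration, not necessarily what your professor uses), the wrapper can be as small as a header whose functions hide the node type completely:

/* bst.h - a wrapper interface around a binary search tree.
   Clients see only an opaque handle and simple operations; the node
   struct and all of the pointer juggling live inside bst.c. */
#ifndef BST_H
#define BST_H

typedef struct bst BST;   /* opaque: callers never see or touch nodes */

BST  *bst_create(void);
void  bst_destroy(BST *tree);
void  bst_insert(BST *tree, int value);
int   bst_contains(const BST *tree, int value);
void  bst_remove(BST *tree, int value);

#endif /* BST_H */

The client includes bst.h and calls these functions; only bst.c knows what a node looks like.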
You sometimes also see the term "wrapper" used to represent any piece of software that sits atop some other software and simplifies it. For example, some libraries for networking might be wrappers around the sockets API - they use sockets as an underlying representation, but don't expose that to clients. That way, clients can use the easier library rather than concerning themselves with all the low-level details of the sockets API. You also sometimes see C++ wrappers around C code that use C++ objects, which have constructors, destructors, encapsulation, etc., to simplify the C code.
Wrapper functions are sometimes used to make recursive functions easier to write. In some cases, you might have a recursive function that takes in extra parameters in order to operate properly. A wrapper function might just call the recursive function with the appropriate parameters. That way, you can directly call the wrapper function rather than calling the recursive function, passing in a bunch of other parameters.
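A hedged C sketch of that last idea (the field and function names are made up): the recursive worker needs the extra node-pointer parameter, and the wrapper supplies it so callers never have to.

#include <stdlib.h>

struct node {
    int value;
    struct node *left, *right;
};

struct bst {
    struct node *root;   /* the only state the wrapper object keeps */
};

/* Recursive worker: needs a node pointer so it can walk the tree. */
static struct node *insert_rec(struct node *root, int value)
{
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n != NULL) {
            n->value = value;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (value < root->value)
        root->left = insert_rec(root->left, value);
    else if (value > root->value)
        root->right = insert_rec(root->right, value);
    return root;
}

/* Wrapper: callers just say "insert this value into this tree" and
   never pass node pointers themselves. */
void bst_insert(struct bst *tree, int value)
{
    tree->root = insert_rec(tree->root, value);
}

Lookup, traversal, and deletion follow the same pattern: a thin public wrapper that forwards the root pointer to a recursive helper.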
Hope this helps!

Related

How to maximize performance of native R script (that will be run thousands of times)? [closed]

Closed 9 years ago.
I'm trying to do a brute-force head-to-head comparison of several statistical tests on the same simulated datasets. I'm going to generate several thousand 'control' and several thousand 'experimental' populations and run the same tests on each set. The wrapper that calls the tests is going to be called thousands of times.
My questions are:
Is the below plan for doing this a good one?
Can you think of any way to squeeze more speed out of it (without rewriting native R functions, though I am okay with copying a subset of their internal code and running just that instead of the whole function)?
The Plan
I already have the simulated populations, and will use the appropriate apply function to pass the control and corresponding experimental observations to the wrapper.
The wrapper will have no arguments other than the control and experimental observations (let's call them xx and yy). Everything else will be hardcoded within the wrapper in order to avoid as much as possible the overhead of flow control logic and copying data between environments.
Each function to be called will be on a separate line, in a consistent format, in order of dependency (in the sense that, for example, cox.zph depends on there already being a coxph object, so coxph() will be called before cox.zph()). The calls will be wrapped in try(), and if a function fails, the output step and any functions that depend on it will first test whether the returned object has try-error as its first class and, if it does, substitute some kind of placeholder value.
The block of called functions will be followed by a long c() statement, with each item extracted from the respective fit objects on a separate line. Here too, if the source object turns out to be a try-error or a placeholder, an NA goes in that output slot.
This way, the whole run isn't aborted if some of the functions fail, and the output from each simulation is a numeric vector of the same length, suitable for capturing to a matrix.
Depending on the goals of a given set of simulations, I can comment out or insert additional tests and results as needed.
A Few More Specific Followup Questions
If I'm already using compilePKGS(T) and enableJIT(3) (from the built-in compiler library), is there anything further to be gained by manually running compile() or cmpfun() on my wrapper function and the interpreted functions it calls?
Does anybody have any guidance on choosing the best enableJIT() value, or if I don't care about startup time, is "the more, the better"?
If each simulation is a new random variable, I have nothing to gain from memoizing, right?
For long-running tasks I like to have the inner function check whether a file of a given name exists and, if so, source it into its environment. This allows me to regain control of the session on the fly to fix problems, run browser(), save out internal objects, etc., without having to abort the whole run. But I imagine that pinging the file system that often will start to add up. Is there a consensus on the most efficient way to communicate a boolean value (i.e., source the debug script or don't) to a running R process (under Linux)?
Thanks.
This will likely only address parts of your questions. I've had luck speeding up processes by avoiding the apply function as well; apply is not vectorized and actually takes quite a bit of time. I saw gains from using nested ifelse() statements instead.
Have you tried Rprof()? It was useful in my case in identifying slow elements of my code. Not a solution per se but a useful diagnostic.

Questions about encryption [closed]

Closed 10 years ago.
It is my intention to reinvent the wheel. I learn many things by example. That's important information you should consider when answering.
This isn't homework, it's purely a side project.
Now, I've heard of modulus math. I'm trying to use it to make a cipher that is uncrackable by today's technology standards.
I have more than one question:
How can you make (asymmetric) encryption harder to crack?
Random numbers or patterns (i.e., a set standard rather than randomness): which should be used, and why or why not?
Is this formula vulnerable:
a = (unknown prime)
b = (unknown prime)
c = (unknown prime)
d = (a ^ b) mod c
Can you get a, b, and c if d = 9? Don't brute force it but actually make a formula to reverse it. If you can do so, post it.
Is this a good way to make a key, seed, or something of that nature? Why or why not?
By all means, answer what you understand, the best answer gets marked as so, and fairly!
Also, if you can, give me references to cryptology texts (Free).
This is a broad question. As templatetypedef has mentioned, you shouldn't be designing any cryptography-related algorithm you plan to use in the real world, so you may want to forget about "trying to use it to make an uncrackable [cipher]" and leave that to the experts.
Answering your questions:
1- The general understanding is that if you want to make a message harder to "crack", you increase the key size. There's no point in developing an entirely new (and most certainly weak) cipher when there are tried and true algorithms for that.
2- It's difficult to tell what's being asked here, but I'll assume you're referring to the randomness of a given ciphertext. The general understanding is that any kind of pattern (non-randomness) in the ciphertext is a very bad sign. Ciphertext should be indistinguishable from pure random data, and there are several batteries of tests to check that (see ENT, Diehard and many others).
3- The formula you give is close to the one used in RSA, although a,b,c are not the primes directly. Also, it's not clear which one of your variables is the plaintext (hint: a cipher that can only encrypt prime numbers is not particularly useful). AFAICT, as you state, it's also not reversible (which is not a good thing at all, unless you don't plan on decrypting the messages ever...)
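To make point 3 concrete, here is a minimal C sketch (toy-sized numbers only, nothing like the thousand-plus-bit values real cryptosystems use) of computing d = (a ^ b) mod c by square-and-multiply, plus a quick illustration of why d = 9 cannot simply be reversed: several different prime triples produce the same d.

#include <stdio.h>

/* Square-and-multiply modular exponentiation: (base^exp) mod m.
   Adequate for small demo values; RSA-sized numbers need a bignum
   library. */
static unsigned long long pow_mod(unsigned long long base,
                                  unsigned long long exp,
                                  unsigned long long m)
{
    unsigned long long result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1)                    /* low bit set: multiply it in */
            result = (result * base) % m;
        base = (base * base) % m;       /* square for the next bit */
        exp >>= 1;
    }
    return result;
}

int main(void)
{
    /* Distinct prime triples (a, b, c) that all give d = 9, so knowing
       d alone cannot tell you which a, b, c were used. */
    printf("%llu\n", pow_mod(3, 2, 11));   /* 9 */
    printf("%llu\n", pow_mod(3, 2, 13));   /* 9 */
    printf("%llu\n", pow_mod(5, 3, 29));   /* 125 mod 29 = 9 */
    return 0;
}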
As a final note, you're looking for a cryptography reference, not a cryptology one. Many people associate cryptology with classic (and easy to break) ciphers, like Caesar's.
Bruce Schneier's Applied Cryptography is a standard textbook on the subject. It's not free, but you can always try university libraries.

F# seems slower than other languages... what can I do to speed it up? [closed]

Closed 11 years ago.
I like F#; I really, really do. Having been bitten by the functional-programming bug, I force myself to use it whenever I have the opportunity. In fact, I recently used it (during a one-week vacation) to code a nice AI algorithm.
However, my attempts so far (see a SO question related to my first attempt here) seem to indicate that, though undoubtedly beautiful... F# has the slowest execution speed of all the languages I've used.
Am I doing something wrong in my code?
I verbosely explain what I did in my blog post, and in my experiments, I see OCaml and the rest of the group running anywhere from 5x to 35x faster than F#.
Am I the only one with such experiences? I find it disheartening that the language I like the most, is also the slowest one - sometimes by far...
EDIT: Direct GitHub link, where the code lives in various language forms...
EDIT2: Thanks to Thomas and Daniel, speed improved considerably:
Greatest speed boost: moving from "ref" to "mutable" gave a whopping 30%.
Removing exceptions and using while loops with flag checks gave another 16%.
Switching from discriminated unions to enums gave another 5%.
"inline" gave 0.5-1%
EDIT3: Dr Jon Harrop joined the fight: 60% speedup, by making ScoreBoard operate directly on the "enumerated" version of the data. The imperative version of F# now runs 3-4 times slower than C++, which is a good result for a VM-based runtime. I consider the problem solved - thanks guys!
EDIT4: After merging all optimizations, these are the results (F# reached C# in imperative style - now if only I could do something about functional style, too!)
real 0m0.221s: That was C++
real 0m0.676s: That was C# (imperative, C++ mirror)
real 0m0.704s: That was F# (imperative, C++ mirror)
real 0m0.753s: That was OCaml (imperative, C++ mirror)
real 0m0.989s: That was OCaml (functional)
real 0m1.064s: That was Java (imperative)
real 0m1.955s: That was F# (functional)
Unless you can give a reasonably sized code sample, it's difficult to tell. Anyway, the imperative F# version should be as efficient as the imperative C# version. I think one approach is to benchmark the two to see what is causing the difference (then someone can help with making that bit faster).
I briefly looked at your code and here are some assorted (untested) suggestions.
You can replace discriminated union Cell with an enum (this means you'll use value types and integer comparison instead of reference types and runtime type tests):
type Cell =
    | Orange = 1
    | Yellow = 2
    | Barren = 3
You can mark some trivial functions as inline. For example:
let inline myincr (arr: int array) idx =
    arr.[idx] <- arr.[idx] + 1
Don't use exceptions for control flow. This is often done in OCaml, but .NET exceptions are slow and should only be used for truly exceptional situations. You can replace the for loop in your sample with a while loop and a mutable flag, or with a tail-recursive function (a tail-recursive function is compiled into a loop, so it will be efficient even in an imperative solution).
This isn't an answer, per se, but have you tried writing the exact same code in F# and C#, i.e., imperative F# code? The speed should be similar. If you're comparing terse functional code (heavy use of higher-order functions, sequence expressions, lazy values, complex pattern matching, and other things that allow for shorter, clearer, more maintainable code) with imperative code, well, there is frequently a trade-off. Generally, development/maintenance time is much greater than execution time, so it's usually considered a desirable trade-off.
Some references:
F# and C# 's CLR is same then why is F# faster than C#
C# / F# Performance comparison
https://stackoverflow.com/questions/142985/is-a-program-f-any-more-efficient-execution-wise-than-c
Another point to consider: in a functional language you're working at a higher level and it becomes very easy to overlook the costs of operations. For example, Seq.sort seems innocent enough, but naive use of it can doom performance. I'd recommend poring over your code, asking yourself along the way if you understand the cost of each operation. If you're not feeling reflective, a faster way to do this is, of course, with a profiler.

Dividing 1 by a huge integer [closed]

Closed 11 years ago.
I have to divide 1 by a number X of more than 4,000 digits that I have stored in a string, and obviously this is going to produce a floating-point result. I'm looking for algorithms to perform this division efficiently, but I could not find anything that convinces me.
As a side note, I would like to implement the algorithm on my own without using a third-party library.
Anyone have any idea?
Thanks!
EDIT:
The reason I do not want to use a third-party library is that I want to do this operation using OpenCL, but without losing too much accuracy in the process. Therefore, using one of those libraries is not actually possible in this case.
You are describing a special case of division, known as inverting a number. Here's a paper which gives a description of Picarte's Iteration method of inverting a large integer: http://www.dcc.uchile.cl/~cgutierr/ftp/picarte.pdf
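If that paper is heavy going, a more widely known route (not necessarily the same as Picarte's method) is Newton's iteration for the reciprocal, which uses only multiplications and subtractions:

x_{k+1} = x_k (2 - X x_k)

Starting from a rough floating-point approximation of 1/X, it converges quadratically, roughly doubling the number of correct digits per step.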
You should take a look at the GNU Multiple Precision Arithmetic Library (GMP); it has no limits on the size of the numbers handled, and its number-crunching algorithms are insanely well optimized.
As for implementing it yourself, if it's not for educational purposes, I'd say don't fall prey to the NIH syndrome! And a Web search on binary arithmetic should provide a wealth of documents to start with…
You should use the System.Numerics.BigInteger structure; it allows you to do arbitrary-precision integer calculations, but it's only available in .NET 4.0 and later.
If your number X is an integer you may well not be able to do what you want. float and double are pretty much out; you'll have to use a long double. On some platforms a long double is just a double.
If you don't want to use a third-party bignum package (why?), you will have to implement the division algorithm on your own (and that is pretty much going to require you to develop a good chunk of a bignum package).
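If you do end up rolling your own, a minimal C sketch of the schoolbook approach follows. It keeps the divisor and the running remainder as arrays of decimal digits and produces one digit of 1/X per step by repeated subtraction. The function name, buffer size, and digit count are illustrative; it assumes X > 1 and a well-formed digit string, does no rounding, and makes no attempt at the speed or OpenCL-friendliness you're after (a serious implementation would work in binary limbs and use a faster division or a reciprocal iteration).

#include <stdio.h>
#include <string.h>

#define MAXD 8192   /* room for a few thousand digits */

/* Print the first ndigits decimal digits of 1/X, where X > 1 is a
   positive integer given as a string of decimal digits. */
static void print_reciprocal(const char *x_str, int ndigits)
{
    int n = (int)strlen(x_str);
    static int x[MAXD], r[MAXD + 1];   /* divisor and remainder, one digit per cell */
    if (n <= 0 || n >= MAXD)
        return;

    for (int i = 0; i < n; i++)
        x[i] = x_str[i] - '0';
    memset(r, 0, sizeof r);
    r[n] = 1;                          /* the remainder starts out as the integer 1 */

    printf("0.");
    for (int d = 0; d < ndigits; d++) {
        /* remainder *= 10: shift every digit one place to the left */
        for (int i = 0; i < n; i++)
            r[i] = r[i + 1];
        r[n] = 0;

        /* count how many times X fits into the remainder (at most 9) */
        int q = 0;
        for (;;) {
            int cmp = (r[0] != 0) ? 1 : 0;            /* compare remainder with X */
            for (int i = 0; cmp == 0 && i < n; i++)
                if (r[i + 1] != x[i])
                    cmp = (r[i + 1] > x[i]) ? 1 : -1;
            if (cmp < 0)
                break;
            int borrow = 0;                           /* remainder -= X */
            for (int i = n - 1; i >= 0; i--) {
                int t = r[i + 1] - x[i] - borrow;
                borrow = (t < 0);
                r[i + 1] = t + (borrow ? 10 : 0);
            }
            r[0] -= borrow;
            q++;
        }
        putchar('0' + q);
    }
    putchar('\n');
}

int main(void)
{
    print_reciprocal("7", 20);    /* prints 0.14285714285714285714 */
    return 0;
}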

Time to understand a program by LOC

Are there any broad, overgeneralized and mostly useless rules about how long it will take to understand a program based on the number of LOC (lines of code)?
(I understand any rules will be broad, overgeneralized and mostly useless. That's fine.)
(The language in question is Delphi, but that shouldn't matter because I'm looking for broad, overgeneralized and mostly useless rules.)
It's not the number of LOC that determines how long it takes to understand a program, it's more the complexity.
If my program had 100,000 lines of print statements, I think it would be pretty easy to understand. However, if I had a program with for-loops nested ten deep, I think that would take far longer to understand.
Cyclomatic complexity can give a ROUGH indication of how hard the code is to understand, and can signal some other warning flags as well about your code.
Some papers concerning peer code review say that it should be somewhere between 100 and 400 lines of code per hour.
I have the theory that it's O(n^2) (because you have to understand each line in conjunction with every other line).
But, as usual when using big-o notation to get an actual numeric value, this answer is broad, overgeneralized and mostly useless.
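For what it's worth, the counting behind that guess: n lines give

\binom{n}{2} = \frac{n(n-1)}{2} = O(n^2)

pairwise combinations, so if understanding each interaction between two lines costs roughly constant effort, the total grows quadratically.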
Code review metrics (which is not the same thing, but nearly comparable) put the number in the range of approximately 50-100 LoC per hour, for an experienced code reviewer.
This of course also depends on what they're looking for in the review, language, complexity, familiarity, etc.... But that might give you a general overgeneralization anyway.
You cannot google this because there will be a different approximate number for each individual person programming in a specific language.
You are trying to write the Drake equation for program writing.
This is what I mean.
About program writers.
each person has a different style of writing and commenting code
every programming language has different nuances and readability
algorithms can be implemented in many ways even in the same language
data structures used by different people tend to be quite varied
the decision of how code is distributed over source files also changes with personal taste
Moving to the person reading the code.
the familiarity of the person with the language matters
familiarity to the algorithms and data structure patterns used matters
amount of information context that the person can retain at a time matters
Shifting focus to the environment, things that matter would be.
the amount of distraction (both for the programmer and the person trying to read the program)
nearness to code release time for the programmer
pending activities and motivation on the part of the reader
proximity of popular events (vacations, sports events, movie release dates!)
I'm looking for broad, overgeneralized and mostly useless rules.
Sounds to me like you're just trying to find a way to estimate, for management or something, the time it will take to learn a new codebase. In that case, find a code snippet online and time how long it takes you to understand it. Divide that by the number of lines in the snippet. Add some padding. Bam! There's your rule.
Look at the COCOMO equations. They contain broad, overgeneralized and mostly useless rules based on Source Lines of Code.
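For reference, the Basic COCOMO formulas for an "organic" (small team, familiar domain) project are, if I remember the coefficients correctly,

E = 2.4 \cdot (\mathrm{KLOC})^{1.05} \ \text{person-months}, \qquad D = 2.5 \cdot E^{0.38} \ \text{months}

They estimate development effort and schedule rather than reading time, but they are exactly the kind of broad, overgeneralized rule based on lines of code that you asked for.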
Apart from "how complicated is the program?", other variables include things like "how well do you understand it?" and "how well do you understand other things, such as the program's functional specification?"
When I start to work with a new program, I try to understand as little of it as possible! Specifically I try to:
Understand the functional specification of the change that someone wants me to make (if nobody wanted me to change the program then I wouldn't need to understand it at all)
Find and understand the smallest possible subset of the existing program that will let me make that change without breaking any other existing functionality.
