I've been working with Rust for the past few days to build a new library (related to abstract algebra) and I'm struggling with some of the best practices of the language. For example, I implemented a longest-common-subsequence function taking &[&T] for the sequences. I figured this was Rust convention, as it avoids copying the data (T may be large, or may not be easily copy-able). When I changed my algorithm to work with simpler &[T]'s, which I needed elsewhere in my code, I was forced to add a Copy bound, since it now needed to copy the T's themselves and not just a reference.
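To make the trade-off concrete, here's a minimal sketch of the two signatures (the function names are my own, not from my library):

```rust
// Working over references: no Copy bound needed, since we only copy the &T.
fn first_by_ref<'a, T>(items: &[&'a T]) -> Option<&'a T> {
    items.first().copied()
}

// Working over values: handing back a T by value forces a Copy bound.
fn first_by_value<T: Copy>(items: &[T]) -> Option<T> {
    items.first().copied()
}
```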
So my higher-level question is: what are the best-practices for passing data between threads and structures in long-running processes, such as a server that responds to queries requiring big data crunching? Any specificity at all would be extremely helpful as I've found very little. Do you generally want to pass parameters by reference? Do you generally want to avoid returning references as I read in the Rust book? Is it better to work with &[&T] or &[T] or Vec<T> or Vec<&T>, and why? Is it better to return a Box<T> or a T? I realize the word "better" here is considerably ill-defined, but hope you'll understand my meaning -- what pitfalls should I consider when defining functions and structures to avoid realizing my stupidity later and having to refactor everything?
Perhaps another way to put it is, what "algorithm" should my brain follow to determine where I should use references vs. boxes vs. plain types, as well as slices vs. arrays vs. vectors? I hesitate to start using references and Box<T> returns everywhere, as I think that'd get me a sort of "Java in Rust" effect, and that's not what I'm going for!
I'm trying to figure out the "best" way to fill a std::map in C++11 with values read from a text file. The solution for that problem should work with STL only.
There is actually a near-duplicate of my question on SO, but its answers drifted off-topic by targeting std::vector instead of std::map, so I'm rephrasing it following this advice from meta SO in the hope of getting answers to the actual question. I've spotted two approaches/ideas so far:
Use std::fill:
As far as I understand, the STL does offer (experimental) parallel versions of fill, but apparently those do not work with maps, since the parallel std::fill expects its arguments to meet the ForwardIterator requirements.
std::async/std::future:
I'm actually unsure whether it is a good idea, or for that matter even possible, to use futures with maps. A map is supposed to have a unique key set, and multiple futures could theoretically be added before an actual key value is available. In that case, will std::map stall the concurrent inserts until the future with the undetermined key value has been resolved?
Is there any practice of asynchronously loading maps (with values from a text file)? Which solution is considered safe?
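For reference, the kind of approach I've been considering looks roughly like this: parse chunks of the file in parallel with std::async, but do all the actual inserts on one thread, since std::map itself is not safe for concurrent modification (this is an untested sketch; parse_chunk and load_map are my own placeholder names):

```cpp
#include <future>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parse one chunk of lines into key/value pairs (pure, so thread-safe).
std::vector<std::pair<std::string, int>> parse_chunk(
        const std::vector<std::string>& lines) {
    std::vector<std::pair<std::string, int>> out;
    for (const auto& line : lines) {
        std::istringstream iss(line);
        std::string key;
        int value;
        if (iss >> key >> value) out.emplace_back(std::move(key), value);
    }
    return out;
}

// Launch the parsing in parallel, then merge serially into the map.
std::map<std::string, int> load_map(
        const std::vector<std::vector<std::string>>& chunks) {
    std::vector<std::future<std::vector<std::pair<std::string, int>>>> futures;
    for (const auto& chunk : chunks)
        futures.push_back(std::async(std::launch::async, parse_chunk,
                                     std::cref(chunk)));
    std::map<std::string, int> result;
    for (auto& f : futures)
        for (auto& kv : f.get())   // get() blocks until the chunk is parsed
            result.insert(std::move(kv));
    return result;
}
```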
I am wondering, why do people use recursion? In most of my learning experience, I've found it to be much more inefficient than iterative methods, so why do people use it? Is it because you can simply write a shorter method? Is it used in real-world programming outside the classroom setting (or learning purposes)? If it is, please provide a good example if you can, I'm very curious.
Thanks in advance for your help! I really appreciate it!
If you have a tree data structure and you want to walk over it in depth-first order, recursion is the most natural way to do it: an iterative version has to maintain an explicit stack to accomplish the same thing.
If you want to write a parser for a typical language having context-free rules, like every programming language in existence, a recursive-descent parser is a simple and natural way to do it.
There is no iterative way to do it with constant storage; parsing nested structure inherently requires a stack, and recursion gives you one for free.
Well, for one thing, it's used in functional programming languages (like Haskell), which don't really have iteration, and they are optimized for recursion. Also, for some problems (like working with binary trees), recursion is a very natural and clean solution.
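As a tiny illustration of the "natural and clean" point, here's an in-order walk of a binary tree written recursively (a minimal sketch; the tuple encoding of the tree is my own):

```python
# A tree node is (left_subtree, value, right_subtree); None is an empty tree.
def inorder(node):
    """Return the values of a binary tree in left-to-right (in-order) order."""
    if node is None:
        return []
    left, value, right = node
    return inorder(left) + [value] + inorder(right)
```

The iterative equivalent needs an explicit stack and careful bookkeeping; the recursive version reads almost like the definition of the traversal itself.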
So sometimes I need to write a data structure I can't find on Hackage, or what I find isn't tested well enough or of high enough quality for me to trust, or it's just something I don't want as a dependency. I am reading Okasaki's book right now, and it's quite good at explaining how to design asymptotically fast data structures.
However, I am working specifically with GHC. Constant factors are a big deal for my applications. Memory usage is also a big deal for me. So I have questions specifically about GHC.
In particular
How to maximize sharing of nodes
How to reduce memory footprint
How to avoid space leaks due to improper strictness/laziness
How to get GHC to produce tight inner loops for important sections of code
I've looked around various places on the web, and I have a vague idea of how to work with GHC, for example, looking at core output, using UNPACK pragmas, and the like. But I'm not sure I get it.
So I popped open my favorite data structures library, containers, and looked at the Data.Sequence module. I can't say I understand a lot of what they're doing to make Seq fast.
The first thing that catches my eye is the definition of FingerTree a. I assume that's just me being unfamiliar with finger trees though. The second thing that catches my eye is all the SPECIALIZE pragmas. I have no idea what's going on here, and I'm very curious, as these are littered all over the code.
Many functions also have an INLINE pragma associated with them. I can guess what that means, but how do I make a judgement call on when to INLINE functions?
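For context, my current understanding of those pragmas amounts to something like this toy example (my own code, not from containers):

```haskell
-- A strict pair whose Int fields GHC can store inline (unboxed) in the
-- constructor, rather than as pointers to separately boxed Ints.
data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int

-- A small wrapper worth inlining: INLINE lets GHC see through the call
-- at each use site and work on the unpacked fields directly.
sumP :: P -> Int
sumP (P a b) = a + b
{-# INLINE sumP #-}
```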
Things get really interesting around line ~475, in a section headed 'Applicative Construction'. They define a newtype wrapper to represent the Identity monad, write their own copy of the strict state monad, and define a function called applicativeTree which is apparently specialized to the Identity monad, and this increases sharing of the function's output. I have no idea what's going on here. What sorcery is being used to increase sharing?
Anyway, I'm not sure there's much to learn from Data.Sequence. Are there other 'model programs' I can read to gain wisdom? I'd really like to know how to soup up my data structures when I really need them to go faster. One thing in particular is writing data structures that make fusion easy, and how to go about writing good fusion rules.
That's a big topic! Most has been explained elsewhere, so I won't try to write a book chapter right here. Instead:
Real World Haskell, ch 25, "Performance" - discusses profiling, simple specialization and unpacking, reading Core, and some optimizations.
Johan Tibell is writing a lot on this topic:
Computing the size of a data structure
Memory footprints of common data types
Faster persistent structures through hashing
Reasoning about laziness
And some things from here:
Reading GHC Core
How GHC does optimization
Profiling for performance
Tweaking GC settings
General improvements
More on unpacking
Unboxing and strictness
And some other things:
Intro to specialization of code and data
Code improvement flags
applicativeTree is quite fancy, but mainly in a way which has to do with FingerTrees in particular, which are quite a fancy data structure themselves. We had some discussion of the intricacies over at cstheory. Note that applicativeTree is written to work over any Applicative. It just so happens that when it is specialized to Id then it can share nodes in a manner that it otherwise couldn't. You can work through the specialization yourself by inlining the Id methods and seeing what happens. Note that this specialization is used in only one place -- the O(log n) replicate function. The fact that the more general function specializes neatly to the constant case is a very clever bit of code reuse, but that's really all.
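As a toy illustration of the kind of node sharing involved (my own code, not Data.Sequence's): when both children of a node are the very same value, the heap holds a DAG of O(n) distinct nodes even though the tree logically has 2^n leaves.

```haskell
data T a = Leaf a | Node (T a) (T a)

-- Both children are the *same* thunk `sub`, so building a tree with
-- 2^n logical leaves allocates only n+1 distinct nodes on the heap.
replicateT :: Int -> a -> T a
replicateT 0 x = Leaf x
replicateT n x = let sub = replicateT (n - 1) x in Node sub sub

-- Depth follows one spine only, so it runs in O(n) despite 2^n leaves.
depth :: T a -> Int
depth (Leaf _)   = 0
depth (Node l _) = 1 + depth l
```

Specializing a general Applicative-polymorphic builder to Identity enables exactly this trick: the two recursive results are known to be the same pure value, so they can be shared.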
In general, Sequence teaches more about designing persistent data structures than about all the tricks for eking out performance, I think. Dons' suggestions are of course excellent. I'd also just browse through the source of the really canonical and tuned libs -- Map, IntMap, Set, and IntSet in particular. Along with those, it's worth taking a look at Milan's paper on his improvements to containers.
As I do my coding I sometimes wonder if I'm doing things the best way or just the way it's always been done. Does what I'm doing make sense anymore?
For example, declaring all your variables at the top of the function. If I try to declare a variable twice, or below where I start using it, my IDE will bark at me at design time - so what's the big deal? It seems like it would make more sense to declare the variables right above the block where they'd be used.
Another one would be hungarian notation. I hate that all my variables related to a particular object are scattered throughout my intellisense.
With modern advancements in frameworks and IDE's, are there some coding practices that don't really apply anymore and others that may be just plain wrong now?
Don't declare variables above the block where they'll be used - declare them in the narrowest scope available, at the point of first use, assuming that's feasible in your language.
Hungarian notation will depend on the conventions for your language/platform. It also depends on which variety of Hungarian you're using - the sensible one (which I'm still not fond of) or the version which only restates the type information already available.
One thing to watch out for: when you take up a new language, make sure you take up the idioms for it at the same time, particularly the naming conventions. This will help your code fit in with the new language, rather than with your old (probably unrelated) code. I find it also helps me to think in tune with the new language as well, rather than fighting against it.
But yes, it's certainly worth revisiting coding practices periodically. If you can't decide why something's a good idea, try doing without it for a while...
Accidental assignment protection:
Putting the lvalue on the right hand side is not needed in some newer languages like C#.
In C# the following won't compile:
if (variable = 0)
So in C# there is no need to do:
if (0 == variable)
This practice is very common in C/C++ programs to avoid accidental assignments that were meant to be comparisons.
Multiple return points:
Disallowing multiple return points was enforced mainly because you don't want to forget to release your resources before each return.
Instead if you just use RAII you don't need to worry about it.
Disclaimer: There are still good reasons to minimize multiple return points, and sometimes it is useful to have only one.
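A small sketch of why RAII makes early returns safe (the file-handling code here is illustrative, not from any particular codebase):

```cpp
#include <cstdio>
#include <memory>

// The unique_ptr owns the FILE* and calls fclose on it automatically
// when it goes out of scope -- on *every* return path.
bool process(const char* path) {
    std::unique_ptr<std::FILE, int (*)(std::FILE*)> f(
        std::fopen(path, "r"), std::fclose);
    if (!f) return false;   // early return: nothing to clean up manually
    // ... read from f.get() ...
    return true;            // normal return: the file is closed for us
}
```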
Header files
In most modern languages, you do not separate your code into declaration and definition.
C++ defines for multiple header file includes
In C++ you used to often do:
#ifndef _MYFILE_H_
#define _MYFILE_H_
//code here
#endif
This sometimes would lead to something like the following though:
#ifndef _MYFILE_H_
#define _WRONGNAME_H_
//code here
#endif
A better way to do this if your compiler supports it:
#pragma once
C variable declarations
With C you had to declare all variables at the top of your block of code. Later versions of C (C99 onward) no longer require this, but people still do it.
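For example, in C99 or later you can declare variables right where they're needed (a small sketch; sum_to is my own example name):

```c
#include <assert.h>

/* Old C89 style would force every declaration to the top of the block;
   C99 lets you introduce variables at the point of first use. */
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {  /* loop variable declared here, C99 */
        total += i;
    }
    return total;
}
```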
Hungarian notation (read on -- this contains some unique info):
Hungarian notation can still be good. But I don't mean that kind of hungarian notation.
Before it was very important in C to have things like:
int iX = 5;
char szX[1024];
strcpy(szX, "5");
Because you could have completely type unsafe functions like:
printf("%i", iX);
Now, if I had called the string just x, my program would have crashed.
Of course the fix to this is to use only typesafe functions. So as long as you do that you don't need hungarian notation in this sense.
But it is still a good idea in the sense Joel discusses.
I used to separate all my line numbers by 10, starting each logically separate piece of code at intervals of 100 or 1000 i.e.
10 Print "Hello"
20 Gosub 100
30 'Peeks and Pokes
For obvious reasons, I no longer code like this.
Short identifiers: many old-school coders use short, cryptic identifiers. Brevity is a useful virtue but considering that a good IDE has auto-complete, a descriptive name is far better than something easy to type.
Short lines: Some people insist on 80-column text. The rest of us have real monitors and don't mind if a line is longer than 80 chars. It can improve readability to have longer lines.
Aligning in columns (e.g. variables in declarations or = in assignments).
It is a pain to maintain manually, automatic renaming will mess it up anyway, and some lines get very long, with related items so far apart that you struggle to see the relation.
As has been said before, don't try to adapt one language's idioms to another. This is especially true between drastically different languages, such as going from C++ to Python. Also (this might just be a question of personal style), I used to declare a variable and then assign it a value later. I find it much faster and more space-efficient to declare and define it at the same time.
As far as variable declaration, the best place to declare them is just before they are used. If your function/procedure is so large that there are tons of variables declared at the top, consider refactoring the function into multiple, smaller ones.
As far as Hungarian Notation goes, the same answer applies. If the function is so large that you can't quickly spot the definition of the variable (even though it should be declared just before being used), then consider refactoring.
In most cases, a well written, well refactored function should make variable declaration and data type obvious with a quick glance at the code page.
Although it is in Java, this is the book I recommend for people who want to optimize/modernize their coding style: http://www.amazon.com/Implementation-Patterns-Addison-Wesley-Signature-Kent/dp/0321413091
With modern advancements in frameworks and IDE's, are there some coding practices that don't really apply anymore and others that may be just plain wrong now?
Depends on the language to a large extent.
W.r.t C:
Using the register keyword
W.r.t C++:
Abusing static; nowadays you are supposed to use namespaces instead, even anonymous ones
Or, did I misunderstand your question?
Manual ref counting of a pointer is an old practice that drives me absolutely crazy. I fix around 1-2 bugs a month because someone tried to be smart and manually ref count a pointer. Just use a smart pointer. It will save you time.
Declaring the variables at the top makes sense in a language like JavaScript. It doesn't have block scope, so doing so simplifies the reading.
Consider a function body that contains:
//some code
if(something)
{
var c = 123;
}
alert(c); // gives 123 when the if was executed, and undefined when it wasn't.
That is a reminder that each language is different, and that this can definitely affect what is and isn't appropriate. Also consider that the code in the related framework usually uses a certain coding style; if you go with something radically different, you will inevitably end up with mixed styles.
Update: The above is changing in JavaScript (as mentioned in a comment). It doesn't seem to be broadly supported yet (I didn't find a good link on it, though :(), which is also a reminder that we can't rush into new features without considering the context in which we use them.
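With block-scoped declarations (let, from ES2015), the leak in the earlier example goes away. A small sketch (blockScopeDemo is my own name):

```javascript
// With `let`, the variable really is confined to the block.
function blockScopeDemo(something) {
    if (something) {
        let c = 123;
    }
    return typeof c; // "undefined" either way: c does not leak out of the if
}
```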