In my project I have many collections (slices) of different data types. Each collection should define the fields it can be sorted by. I want to write one sort function and call it with user input (sort field and order) whenever a collection needs to be sorted. I came up with the following boilerplate code (it describes only one type of collection, but the others look the same): https://gist.github.com/abonec/f1ee23a38e78ea48d470c39885de47ba
I have an interface Sortable for collections. A Sortable is passed to sortStats along with the user-supplied sort field. If the concrete implementation supports sorting by that field, it returns a sort.Interface with the corresponding Less implementation.
The problem is the many repeated implementations of sort.Interface, where the Len() and Swap() methods are identical and only Less() differs.
Is there any way to get rid of the Len() and Swap() methods in this case, or perhaps another approach to writing a generic sort function with a dynamic sort field?
You're looking for generics, which Go doesn't currently support. See this FAQ entry.
The Go team is working to add generics to the language - it's a work in progress, and everyone is free to participate in the discussion. Once generics exist, they will provide the solution you seek here.
In the meantime, you could use code generation or think of a slightly different design for your problem. Some code duplication is OK too; Go doesn't frown upon it as badly as some other languages do.
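For illustration, one "slightly different design" that avoids the repeated Len()/Swap() implementations is the standard library's sort.Slice (Go 1.8+), which supplies Len and Swap via reflection so each collection only provides the per-field Less logic. A minimal sketch, with a hypothetical Stat type and field names standing in for the types in the gist:

```go
package main

import (
	"fmt"
	"sort"
)

// Stat is a stand-in for one element type; the fields are hypothetical.
type Stat struct {
	Name  string
	Count int
}

// lessFuncs maps a user-supplied sort field to a comparison function,
// so only the Less logic varies per field.
var lessFuncs = map[string]func(a, b Stat) bool{
	"name":  func(a, b Stat) bool { return a.Name < b.Name },
	"count": func(a, b Stat) bool { return a.Count < b.Count },
}

// sortStats sorts in place by the given field and order.
// sort.Slice supplies Len and Swap for us via reflection.
func sortStats(stats []Stat, field string, desc bool) error {
	less, ok := lessFuncs[field]
	if !ok {
		return fmt.Errorf("unsupported sort field: %q", field)
	}
	sort.Slice(stats, func(i, j int) bool {
		if desc {
			return less(stats[j], stats[i])
		}
		return less(stats[i], stats[j])
	})
	return nil
}

func main() {
	stats := []Stat{{"b", 2}, {"a", 3}, {"c", 1}}
	_ = sortStats(stats, "count", false)
	fmt.Println(stats) // [{c 1} {b 2} {a 3}]
}
```

Once generics did land (Go 1.18+), slices.SortFunc offers the same shape with a typed comparison function instead of reflection.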
Related
What's the usual best practice to split up a really long match on an enum with dozens of variants to handle, each with dozens or hundreds of lines of code?
I've started to create helper functions for each case and just call those functions passing in the enum's fields (or whatever they're called). But it seems a bit redundant to have MyEnum::MyCase{a,b,c} => handle_mycase(a,b,c) many times.
And if that is the best practice, is it possible to destructure MyEnum::MyCase directly in that helper function's parameters, despite the fact that technically it's refutable, since realistically I already know I'm calling it with the right case?
Maybe the enum_dispatch crate can help you.
IIRC, at a high level it assumes that all your enum variants implement a trait with a function handle_mycase. Then handle_mycase can be called on the enum directly and will be dispatched to the concrete struct.
Is there any built-in sort function to sort an array (of strings or integers) in Free Pascal?
In C/C++ there is std::sort(any_arr) or qsort(any_arr) that can do this.
In JavaScript or Ruby there is a .sort method.
What is there in Object Pascal?
Generic functions are a topic currently in the planning stage. For now, most list types (as benibela said) have their own sort and customsort options.
Some of the generic container types (like tfpsmap in unit fgl) may have generic sort routines, but they usually work with a separate compare function pointer/variable to determine the order.
I'm in the middle of reading Code Complete, and towards the end of the book, in the chapter about refactoring, the author lists a bunch of things you should do to improve the quality of your code while refactoring.
One of his points was to always return as specific types of data as possible, especially when returning collections, iterators etc. So, as I've understood it, instead of returning, say, Collection<String>, you should return HashSet<String>, if you use that data type inside the method.
This confuses me, because it sounds like he's encouraging people to break the rule of information hiding. I understand this when talking about accessors; that's a clear-cut case. But when calculating and mangling data, and the level of abstraction of the method implies no particular data structure, I find it best to return as abstract a datatype as possible, as long as the data doesn't fall apart (I wouldn't return Object instead of Iterable<String>, for example).
So, my question is: is there a deeper philosophy behind Code Complete's advice of always returning as specific a data type as possible and allowing downcasting, instead of keeping things on a need-to-know basis, that I've just not understood?
I think it is simply wrong in most cases. It should be:
Be as lenient as possible, be as specific as needed.
In my opinion, you should always return List rather than LinkedList or ArrayList, because the difference is more an implementation detail than a semantic one. The guys behind the Google collections API for Java take this one step further: they return (and expect) iterators where that's enough. But they also recommend returning ImmutableList, -Set, -Map, etc. where possible, to show the caller that he doesn't have to make a defensive copy.
Besides that, I think the performance of the different list implementations isn't the bottleneck for most applications.
Most of the time one should return an interface or perhaps an abstract type that represents the value being returned. If you are returning a list of X, then use List. This ultimately provides maximum flexibility if the need arises to change the returned list type.
Maybe later you realise that you want to return a linked list or a read-only list, etc. If you commit to a concrete type you're stuck, and it's a pain to change. Using the interface solves this problem.
@Gishu
If your API requires that clients cast straight away most of the time, your design is suspect. Why bother returning X if clients need to cast it to Y?
Can't find any evidence to substantiate my claim but the idea/guideline seems to be:
Be as lenient as possible when accepting input. Choose a generalized type over a specialized type. This means clients can use your method with different specialized types. So an IEnumerable or an IList as an input parameter would mean that the method can run off an ArrayList or a ListItemCollection. It maximizes the chance that your method is useful.
Be as strict as possible when returning values. Prefer a specialized type if possible. This means clients do not have to second-guess or jump through hoops to process the return value, and specialized types also offer greater functionality. If you choose to return an IList or an IEnumerable, the number of things the caller can do with your return value drops drastically. For example, if you return an IEnumerable rather than an ArrayList, the client must downcast just to get the number of elements via the Count property. But such downcasting defeats the purpose: it works today and breaks tomorrow (if you change the type of the returned object). So for all practical purposes the client can't get a count of elements easily, leading him to write mundane boilerplate code (in multiple places or as a helper method).
The summary here is that it depends on the context (there are exceptions to most rules). For example, if the most probable use of your return value is that clients will search the returned list for some element, it makes sense to return a list implementation (type) that supports some kind of search method. Make it as easy as possible for the client to consume the return value.
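The thread's examples are in Java/C#; as a rough sketch of the same "lenient in, strict out" guideline in Go terms (Report and BuildReport are made-up names), the function below accepts the most general input it can work with and returns a concrete type so callers keep its full capabilities:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Report is a hypothetical concrete result type. Returning the concrete
// type keeps Count, Lines, and any future methods available to callers
// without downcasting.
type Report struct {
	Lines []string
}

func (r *Report) Count() int { return len(r.Lines) }

// BuildReport is lenient about its input: io.Reader lets callers pass a
// file, a network stream, or an in-memory buffer. It is strict about its
// output: a concrete *Report, not an interface.
func BuildReport(src io.Reader) (*Report, error) {
	r := &Report{}
	sc := bufio.NewScanner(src)
	for sc.Scan() {
		r.Lines = append(r.Lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return r, nil
}

func main() {
	rep, err := BuildReport(strings.NewReader("a\nb\nc"))
	if err != nil {
		panic(err)
	}
	fmt.Println(rep.Count()) // 3
}
```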
I could see how, in some cases, having a more specific return type could be useful. For example, knowing that the return value is a LinkedList rather than just a List allows you to delete from the list knowing that it will be efficient.
I think that, while designing interfaces, you should design a method to return as abstract a data type as possible. Returning a specific type makes it clearer what the method returns.
Also, I would understand it in this way:
Return as abstract a data type as possible = return as specific a data type as possible
i.e. when your method is supposed to return any collection data type, return Collection rather than Object.
Tell me if I'm wrong.
A specific return type is much more valuable because it:
reduces possible performance issues from discovering functionality via casting or reflection
increases code readability
does NOT, in fact, expose more than is necessary.
The return type of a function is specifically chosen to cater to ALL of its callers. It is the calling function that should USE the return variable as abstractly as possible, since the calling function knows how the data will be used.
Is it only necessary to traverse the structure? Is it necessary to sort it? Transform it? Clone it? These are questions only the caller can answer, and thus the caller can use an abstracted type. The called function MUST provide for all of these cases.
If, in fact, the most specific use case you have right now is Iterable<String>, then that's fine. But more often than not your callers will eventually need more detail, so start with a specific return type; it doesn't cost anything.
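To make the "caller decides the abstraction" point concrete, here is a small Go sketch (again with made-up names): the callee returns a concrete type, and a caller that only needs a count consumes it through a narrow interface of its own:

```go
package main

import "fmt"

// Inventory is a hypothetical concrete type a library might return;
// it exposes everything it knows how to do.
type Inventory struct{ items []string }

func (inv *Inventory) Count() int      { return len(inv.items) }
func (inv *Inventory) Items() []string { return inv.items }

// counted is the caller's own, deliberately narrow view of the result:
// this caller only needs a count.
type counted interface{ Count() int }

func report(c counted) { fmt.Println("items:", c.Count()) }

func main() {
	inv := &Inventory{items: []string{"a", "b"}}
	report(inv) // the concrete *Inventory satisfies the caller's narrow view
}
```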
Update: Please read this question in the context of design principles, elegance, expression of intent, and especially the "signals" sent to other programmers by design choices.
I have two "views" of a set of objects. One is a dictionary/map indexing the objects by a string value. The other is a dictionary/map indexing the objects by an ordinal (ordering integer). There is no "master" collection of the objects by themselves that can serve as the authoritative source for the number of objects, but the two dictionaries should always both contain references to all the objects.
When a new item is added to the set a reference is added to both dictionaries, and then some processing needs to be done which is affected by the new total number of objects.
What should I use as the authoritative source to reference for the current size of the set of objects? It seems that all my options are flawed in one dimension or another. I can just consistently reference one of the dictionaries, but that would codify an implication of that dictionary's superiority over the other. I could add a 3rd collection, a simple list of the objects to serve as the authoritative list, but that increases redundancy. Storing a running count seems simplest, but also increases redundancy and is more brittle than referencing a collection's self-tracked count on the fly.
Is there another option that will allow me to avoid choosing the lesser evil, or will I have to accept a compromise on elegance?
I would create a class that has (at least) two collections.
A version of the collection that is sorted by string
A version of the collection that is sorted by ordinal
(Optional) A master collection
The class would handle the nitty-gritty management:
Syncing the contents of the collections
Standard collection actions (e.g. letting users get the size, add or retrieve items)
Letting users look up items by string or ordinal
That way you can use the same collection wherever you need either behavior, but still abstract away the "indexing" behavior you are going for.
The separate class gives you a single interface with which to explain your intent regarding how this class is to be used.
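A minimal sketch of that wrapper in Go (the ItemSet type and its method names are made up); the point is that Add updates both views in one place and Size has one obvious home:

```go
package main

import "fmt"

type Item struct {
	Key     string
	Ordinal int
}

// ItemSet is a hypothetical wrapper that keeps the two "views" in sync
// and owns the authoritative count, so callers never have to pick one
// dictionary over the other.
type ItemSet struct {
	byKey     map[string]*Item
	byOrdinal map[int]*Item
}

func NewItemSet() *ItemSet {
	return &ItemSet{
		byKey:     make(map[string]*Item),
		byOrdinal: make(map[int]*Item),
	}
}

// Add inserts into both views in one place, so they cannot drift apart.
func (s *ItemSet) Add(it *Item) {
	s.byKey[it.Key] = it
	s.byOrdinal[it.Ordinal] = it
}

// Size can report either map's length; which one is now an internal
// detail rather than part of the public design.
func (s *ItemSet) Size() int { return len(s.byKey) }

func (s *ItemSet) ByKey(k string) (*Item, bool)  { it, ok := s.byKey[k]; return it, ok }
func (s *ItemSet) ByOrdinal(n int) (*Item, bool) { it, ok := s.byOrdinal[n]; return it, ok }

func main() {
	set := NewItemSet()
	set.Add(&Item{Key: "alpha", Ordinal: 1})
	set.Add(&Item{Key: "beta", Ordinal: 2})
	fmt.Println(set.Size()) // 2
}
```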
I'd suggest encapsulation: create a class that hides the "management" details (such as the current count) and use it to expose immutable "views" of the two collections.
Clients will ask the "manglement" object for an appropriate reference to one of the collections.
Clients adding a "term" (for lack of a better word) to the collections will do so through the "manglement" object.
This way your assumptions and implementation choices are "hidden" from clients of the service and you can document that the choice of collection for size/count was arbitrary. Future maintainers can change how the count is managed without breaking clients.
BTW, yes, I meant "manglement" - my favorite malapropism for management (in any context!)
If both dictionaries contain references to every object, the count should be the same for both of them, correct? If so, just pick one and be consistent.
I don't think it is a big deal at all. Just reference the sets in the same order each time you need to get access to them.
If you really are concerned about it you could encapsulate the collections with a wrapper that exposes the public interfaces - like
Add(item)
Count()
This way it will always be consistent and atomic - or at least you could implement it that way.
But, I don't think it is a big deal.
I was thinking about making something like LINQ for Lua. I have a general idea of how LINQ works, but was wondering if there is a good article, or if someone could explain, how C# makes LINQ possible.
Note: I mean behind the scenes, like how it generates code bindings and all that, not end-user syntax.
It's hard to answer the question because LINQ is so many different things. For instance, sticking to C#, the following things are involved:
Query expressions are "pre-processed" into "C# without query expressions" which is then compiled normally. The query expression part of the spec is really short - it's basically a mechanical translation which doesn't assume anything about the real meaning of the query, beyond "order by is translated into OrderBy/ThenBy/etc".
Delegates are used to represent arbitrary actions with a particular signature, as executable code.
Expression trees are used to represent the same thing, but as data (which can be examined and translated into a different form, e.g. SQL)
Lambda expressions are used to convert source code into either delegates or expression trees.
Extension methods are used by most LINQ providers to chain together static method calls. This allows a simple interface (e.g. IEnumerable<T>) to effectively gain a lot more power.
Anonymous types are used for projections - where you have some disparate collection of data, and you want bits of each of the aspects of that data, an anonymous type allows you to gather them together.
Implicitly typed local variables (var) are used primarily when working with anonymous types, to maintain a statically typed language where you may not be able to "speak" the name of the type explicitly.
Iterator blocks are usually used to implement in-process querying, e.g. for LINQ to Objects.
Type inference is used to make the whole thing a lot smoother - there are a lot of generic methods in LINQ, and without type inference it would be really painful.
Code generation is used to turn a model (e.g. DBML) into code
Partial types are used to provide extensibility to generated code
Attributes are used to provide metadata to LINQ providers
Obviously a lot of these aren't only used by LINQ, but different LINQ technologies will depend on them.
If you can give more indication of what aspects you're interested in, we may be able to provide more detail.
If you're interested in effectively implementing LINQ to Objects, you might be interested in a talk I gave at DDD in Reading a couple of weeks ago - basically implementing as much of LINQ to Objects as possible in an hour. We were far from complete by the end of it, but it should give a pretty good idea of the kind of thing you need to do (and buffering/streaming, iterator blocks, query expression translation etc). The videos aren't up yet (and I haven't put the code up for download yet) but if you're interested, drop me a mail at skeet#pobox.com and I'll let you know when they're up. (I'll probably blog about it too.)
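Not C#, but to make the combination of extension-method chaining and iterator blocks concrete: below is a deliberately tiny, LINQ-to-Objects-flavoured sketch in Go (the Seq type and the operator names are invented for illustration). Go has no extension methods, so these are plain generic functions rather than dot-chained methods, but they share the key property of C# iterator blocks: nothing executes until the final loop consumes the sequence.

```go
package main

import (
	"fmt"
	"strings"
)

// Seq is a push-style lazy sequence: the rough analogue of what a C#
// iterator block produces. Nothing runs until something consumes it.
type Seq[T any] func(yield func(T) bool)

// FromSlice wraps a slice as a lazy sequence.
func FromSlice[T any](xs []T) Seq[T] {
	return func(yield func(T) bool) {
		for _, x := range xs {
			if !yield(x) {
				return
			}
		}
	}
}

// Where is the analogue of LINQ's Where: a lazy filter.
func Where[T any](src Seq[T], pred func(T) bool) Seq[T] {
	return func(yield func(T) bool) {
		src(func(x T) bool {
			if pred(x) {
				return yield(x)
			}
			return true
		})
	}
}

// Select is the analogue of LINQ's Select: a lazy projection.
func Select[T, U any](src Seq[T], f func(T) U) Seq[U] {
	return func(yield func(U) bool) {
		src(func(x T) bool { return yield(f(x)) })
	}
}

func main() {
	words := FromSlice([]string{"linq", "lua", "lambda", "tree"})
	q := Select(Where(words, func(s string) bool { return strings.HasPrefix(s, "l") }),
		strings.ToUpper)
	q(func(s string) bool { fmt.Println(s); return true }) // LINQ, LUA, LAMBDA
}
```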
Mono (partially?) implements LINQ, and is open source. Maybe you could look into their implementation?
Read this article:
Learn how to create custom LINQ providers
Perhaps my LINQ for R6RS Scheme will provide some insights.
It is semantically 100% the same as LINQ, and almost 100% the same syntactically, with the noted exception that additional sort parameters use 'then' instead of ','.
Some rules/assumptions:
Only dealing with lists, no query providers.
Not lazy, but eager comprehension.
No static types, as Scheme does not use them.
My implementation depends on a few core procedures:
map - used for 'Select'
filter - used for 'Where'
flatten - used for 'SelectMany'
sort - a multi-key sorting procedure
groupby - for grouping constructs
The rest of the structure is all built up using a macro.
Bindings are stored in a list that is tagged with bound identifiers to ensure hygiene. The bindings are extracted and rebound locally wherever an expression occurs.
I did track the progress on my blog; that may provide some insight into possible issues.
For design ideas, take a look at Cω (C omega), the research project that birthed LINQ. LINQ is a more pragmatic or watered-down version of Cω, depending on your perspective.
Matt Warren's blog has all the answers (and a sample IQueryable provider implementation to give you a headstart):
http://blogs.msdn.com/mattwar/