Is accessing Generic Objects bad compared to Strict Data-Type classes in AS3? - performance

I'm having a debate with a friend regarding Generic Objects vs. Strict Data-Type instances access.
If I have a fairly large JSON file to convert into objects & arrays of data in Flash, is it best to then convert those objects into strict AS3 classes dedicated to each object?
Is there a significant loss of performance depending on the quantity of objects?
What's the technical reason behind this? Do generic objects leave a bigger footprint in memory than strictly typed instances of a custom class?

It's hard to answer this question in general because in the end "it all depends". What it depends on is what type of objects you are working with, how you expose those objects to the rest of the program, and what requirements you have on your runtime environment.
Generally speaking, generic objects are bad since you no longer have "type safety".
Generally speaking, converting to typed objects leaves a bigger memory footprint, since the class has to be present at runtime, and it also forces you to convert each untyped object "again" into another type of object, costing some extra CPU cycles.
In the end it boils down to this: if the data you received is exposed to the rest of the system, it's generally a good idea to convert it into some kind of typed object.
Converting it to a typed object and then working on that object improves code readability, since you don't have to remember whether the data/key table used "image" or "Image" or "MapImage" as the accessor to retrieve the image info of something.
Also, if you ever change the backend system to provide other/renamed keys, you only have to make the change in one place, instead of having the change scattered all over the system.
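To make that single-place-to-change point concrete, here is a minimal sketch of the idea. It's shown in Java rather than AS3, purely as an illustration, and the class and key names are invented:
import java.util.Map;
// A strictly typed wrapper built from the untyped, decoded data.
// If the backend ever renames the "MapImage" key, only fromRaw() changes.
public final class MapInfo {
    private final String imageUrl;
    private final int width;
    private MapInfo(String imageUrl, int width) {
        this.imageUrl = imageUrl;
        this.width = width;
    }
    // The single conversion point from the generic object to the typed one.
    public static MapInfo fromRaw(Map<String, Object> raw) {
        return new MapInfo(
            (String) raw.get("MapImage"),            // the key name lives here only
            ((Number) raw.get("width")).intValue()
        );
    }
    public String imageUrl() { return imageUrl; }
    public int width() { return width; }
}
The rest of the program then works against the typed accessors and never touches the raw keys directly.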
Hope this answer helps :)

Is it possible to use CompUnit modules for collected data?

I have developed a module for processing a collection of documents.
One run of the software collects information about them. The data is stored in two structures called %processed and %symbols. The data needs to be cached for subsequent runs of the software on the same set of documents, some of which can change. (The documents are themselves cached using CompUnit modules).
Currently the data structures are stored / restored as follows:
# storing
'processed.raku'.IO.spurt: %processed.raku;
'symbols.raku'.IO.spurt: %symbols.raku;
# restoring
my %processed = EVALFILE 'processed.raku';
my %symbols = EVALFILE 'symbols.raku';
Outputting these structures into files, which can be quite large, can be slow because the hashes have to be walked to create the stringified forms, and slow on input because the files have to be recompiled.
It is not intended for the cached files to be inspected, only to save state between software runs.
In addition, although this is not a problem for my use case, this technique cannot be used in general because stringification (serialisation) does not work for Raku closures, as far as I know.
I was wondering whether the CompUnit modules could be used because they are used to store compiled versions of modules. So perhaps, they could be used to store a 'compiled' or 'internal' version of the data structures?
Is there already a way to do this?
If there isn't, is there any technical reason it might NOT be possible?
(There's a good chance that you've already tried this and/or it isn't a good fit for your usecase, but I thought I'd mention it just in case it's helpful either to you or to anyone else who finds this question.)
Have you considered serializing the data to/from JSON with JSON::Fast? It has been optimized for (de)serialization speed in a way that basic stringification hasn't been / can't be. That doesn't allow for storing Blocks or other closures – as you mentioned, Raku doesn't currently have a good way to serialize them. But, since you mentioned that isn't an issue, it's possible that JSON would fit your usecase.
[EDIT: as you pointed out below, this can make support for some Raku datastructures more difficult. There are typically (but not always) ways to work around the issue by specifying the datatype as part of the serialization step:
use JSON::Fast;
my $a = <a a a b>.BagHash;
my $json = $a.&to-json;
my BagHash() $b = from-json($json);
say $a eqv $b # OUTPUT: «True»
This gets more complicated for datastructures that are harder to represent in JSON (such as those with non-string keys). The JSON::Class module could also be helpful, but I haven't tested its speed.]
After looking at other answers and looking at the code for Precompilation, I realised my original question was based on a misconception.
The Rakudo compiler generates an intermediate "byte code", which is then used at run time. Since modules are self-contained units for compilation purposes, they can be precompiled. This intermediate result can be cached, thus significantly speeding up Raku programs.
When a Raku program uses code that has already been compiled, the compiler does not compile it again.
I had thought of the precompilation cache as a sort of storage of the internal state of a program, but it is not quite that. That is why, I think, @ralph was confused by the question: I was not asking the right sort of question.
My question is about the storage (and restoration) of data. JSON::Fast, as discussed by @codesections, is very fast because it is used by the Rakudo compiler at a low level and so is highly optimised. Consequently, restructuring the data upon restoration will be faster than restoring native data types, because the slow, rate-determining step is storage to and restoration from "disk", which JSON handles very quickly.
Interestingly, the CompUnit modules I mentioned use the low-level JSON functions that make JSON::Fast so quick.
I am now considering other ways of storing data using optimised routines, perhaps using a compression/archiving module. It will come down to testing which is fastest. It may be that the JSON route is the fastest.
So this question does not have a clear answer because the question itself is "incorrect".
Update As @RichardHainsworth notes, I was confused by their question, though I felt it would be helpful to answer as I did. Based on his reaction, and his decision not to accept @codesections's answer, which at that point was the only other answer, I concluded it was best to delete this answer to encourage others to answer. But now that Richard has provided an answer with a good resolution, I'm undeleting mine in the hope it's more useful.
TL;DR Instead of using EVALFILE, store your data in a module which you then use. There are simple ways to do this that would be minimal but useful improvements over EVALFILE. There are more complex ways that might be better.
A small improvement over EVALFILE
I've decided to first present a small improvement so you can solidify your shift in thinking from EVALFILE. It's small in two respects:
It should only take a few minutes to implement.
It only gives you a small improvement over EVALFILE.
I recommend you properly consider the rest of this answer (which describes more complex improvements with potentially bigger payoffs instead of this small one) before bothering to actually implement what I describe in this first section. I think this small improvement is likely to turn out to be redundant beyond serving as a mental bridge to later sections.
Write a program, say store.raku, that creates a module, say data.rakumod:
use lib '.';
my %hash-to-store = :a, :b;
my $hash-as-raku-code = %hash-to-store.raku;
my $raku-code-to-store = "unit module data; our %hash = $hash-as-raku-code";
spurt 'data.rakumod', $raku-code-to-store;
(My use of .raku is of course overly simplistic. The above is just a proof of concept.)
This form of writing your data will have essentially the same performance as your current solution, so there's no gain in that respect.
Next, write another program, say, using.raku, that uses it:
use lib '.';
use data;
say %data::hash; # {a => True, b => True}
use-ing the module will entail compiling it. So the first time you use this approach for reading your data instead of EVALFILE it'll be no faster, just as it was no faster to write it. But it should be much faster for subsequent reads. (Until you next change the data and have to rebuild the data module.)
This section also doesn't deal with closure stringification, and means you're still doing a data writing stage that may not be necessary.
Stringifying closures; a hack
One can extend the approach of the previous section to include stringifications of closures.
You "just" need to access the source code containing the closures; use a regex/parse to extract the closures; and then write the matches to the data module. Easy! ;)
For now I'll skip filling in details, partly because I again think this is just a mental bridge and suggest you read on rather than try to do as I just described.
Using CompUnits
Now we arrive at:
I was wondering whether the CompUnit modules could be used because they are used to store compiled versions of modules. So perhaps, they could be used to store a 'compiled' or 'internal' version of the data structures?
I'm a bit confused by what you're asking here for two reasons. First, I think you mean the documents ("The documents are themselves cached using CompUnit modules"), and that documents are stored as modules. Second, if you do mean the documents are stored as modules, then why wouldn't you be able to store the data you want stored in them? Are you concerned about hiding the data?
Anyhow, I will presume that you are asking about storing the data in the document modules, and that you're interested in ways to "hide" that data.
One simple option would be to write the data as I did in the first section, but insert the our %hash = $hash-as-raku-code; etc. code at the end, after the actual document, rather than at the start.
But perhaps that's too ugly / not "hidden" enough?
Another option might be to add Pod blocks with Pod block configuration data at the end of your document modules.
For example, putting all the code into a document module and throwing in a say just as a proof-of-concept:
# ...
# Document goes here
# ...
# At end of document:
=begin data :array<foo bar> :hash{k1=>v1, k2=>v2} :baz :qux(3.14)
=end data
say $=pod[0].config<array>; # foo bar
That said, that's just code being executed within the module; I don't know if the compiled form of the module retains the config data. Also, you need to use a "Pod loader" (cf Access pod from another Raku file). But my guess is you know all about such things.
Again, this might not be hidden enough, and there are constraints:
The data can only be literal scalars of type Str, Int, Num, or Bool, or aggregations of them in Arrays or Hashes.
Data can't have actual newlines in it. (You could presumably have double quoted strings with \ns in them.)
Modifying Rakudo
As I understand it, presuming RakuAST lands, it'll be relatively easy to write Rakudo plugins that can do arbitrary work with a Raku module. And it seems like a short hop from RakuAST macros to basic is parsed macros, which in turn seem like a short hop from extracting source code (e.g. the source of closures) as it goes through the compiler and then spitting it back out into the compiled code as data, possibly attached to Pod declarator blocks that are in turn attached to code as code.
So, perhaps just wait a year or two to see if RakuAST lands and gets the hooks you need to do what you need to do via Rakudo?

Java: Are Instance methods Prohibited for Domain Objects in Functional Programming?

Since Functional Programming treats Data and Behavior separately, and behavior is not supposed to mutate the state of an Instance, does FP recommend not having instance methods at all for Domain Objects? Or should I always declare all the fields final?
I am asking more in the context of Object oriented languages like Java.
Since Functional Programming treats Data and Behavior separately,
I've heard that said a lot, but it is not necessarily true. Yes, syntactically they are different, but encapsulation is a thing in FP too. You don't really want your data structures exposed, for the same reason you don't want that in OOP: you want to be able to evolve them later. You want to add features, or optimize them. Once you give direct access to the data, you've essentially lost control of that data.
For example, in Haskell there are modules, which are effectively data + behavior in a single unit. Normally the "constructors" of a data type (i.e. direct access to its "fields") are not made available to outside functions. (There are exceptions, as always.)
does FP recommend not having instance methods at all for Domain Objects
FP is a paradigm which says that software should be built using a (mathematical) composition of (mathematical) functions. That is essentially it. Now if you squint enough, you could call a method a function with just one additional parameter, this, provided everything is immutable.
So I would say no, "FP" does not explicitly define syntax and it can be compatible with objects under certain conditions.
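As a rough illustration of that "method = function plus a this parameter" idea, here is a small sketch in Java; the Money type is invented for the example:
// An immutable value: the behavior can live on the instance...
record Money(long cents) {
    Money plus(Money other) {                 // instance method
        return new Money(cents + other.cents());
    }
}
class MoneyOps {
    // ...or as a plain function that takes the receiver explicitly.
    static Money plus(Money self, Money other) {
        return new Money(self.cents() + other.cents());
    }
}
Because Money is immutable, the two forms are interchangeable, which is exactly the condition stated above.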
I am asking more in the context of Object oriented languages like Java.
This is where it gets a bit hazy. Java is not well suited to functional programming. Keep in mind that it may have borrowed certain syntax from traditional FP languages, but that doesn't make it suitable for FP.
For example, immutability, pure functions, and function composition are all things you need in order to do FP, and Java has none of those built in. I mean, you can write code to "pretend", but you would be swimming against the tide.
does FP recommend not having instance methods at all for Domain Objects?
In the Domain Driven Design book, Eric Evans discusses modeling your domain with Entities, and Value Objects.
The Value Object pattern calls for immutable private data; once you initialize the value object, it does not change. In the language of Command Query Separation, we would say that the interface of a Value Object supports queries, but not commands.
So an instance method on a value object is very similar to a closure, with the immutable private state of the object playing the role of the captured variables.
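For concreteness, here is a minimal sketch of what such a value object might look like in Java; DateRange is an invented example, not something taken from the book:
import java.time.LocalDate;
import java.util.Objects;
// A Value Object: immutable private data behind a query-only interface.
public final class DateRange {
    private final LocalDate start;
    private final LocalDate end;
    public DateRange(LocalDate start, LocalDate end) {
        this.start = Objects.requireNonNull(start);
        this.end = Objects.requireNonNull(end);
    }
    // A query: it reads the captured state and mutates nothing.
    public boolean contains(LocalDate day) {
        return !day.isBefore(start) && !day.isAfter(end);
    }
    // Instead of a command, return a new value.
    public DateRange extendedTo(LocalDate newEnd) {
        return new DateRange(start, newEnd);
    }
}
The contains method behaves like a closure over start and end: it can read that captured state but never changes it.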
Your fields should be final, but functional code and instance methods are not mutually exclusive.
Take a look at the BigDecimal class for example:
BigDecimal x = new BigDecimal(1);
BigDecimal y = new BigDecimal(2);
BigDecimal z = x.add(y);
x and y are immutable and the add method leaves them unchanged and creates a new BigDecimal.

Is there such a thing as ‘class bloat’ - i.e. too many classes causing inefficiencies?

E.g. let’s consider I have the following classes:
Item
ItemProperty which would include objects such as Colour and Size. There's a relation-property of the Item class which lists all of the ItemProperty objects applicable to this Item (i.e. for one item you might need to specify the Colour and for another you might want to specify the Size).
ItemPropertyOption would include objects such as Red, Green (for Colour) and Big, Small (for Size).
Then an Item Object would relate to an ItemProperty, whereas an ItemChoice Object would relate to an ItemPropertyOption (and the ItemProperty which the ItemPropertyOption refers to could be inferred).
The reason for this is so I could then make use of queries much more effectively. i.e. give me all item-choices which are Red. It would also allow me to use the Parse Dashboard to quickly add elements to the site as I could easily specify more ItemPropertys and ItemPropertyOptions, rather than having to add them in the codebase.
This is just a small example and there's many more instances where I'd like to use classes so that 'options' for various drop-downs in forms are in the database and can easily be added and edited by me, rather than hard-coded.
1) I’ll probably be doing this in a similar way for 5+ more similar kinds of class-structures
2) there could be hundreds of nested properties that I want to access via ‘inverse querying’
So, I can think of 2 potential causes of inefficiency and wanted to know if they’re founded:
Is having lots of classes inefficient?
Is back-querying against nested classes inefficient?
The other option I can think of (if ‘class-bloat’ really is a problem) is to add fields on the parent classes that, instead of being nested across other classes representing further properties as above, just represent them as a nested JSON property directly.
The job of designing is to render, in object descriptions, truths about the world that are relevant to the system's requirements. In the world of the OP's "items", it's a fact that items have color, and it's a relevant fact because users care about an item's color. You'd only call a system inefficient if it consumes computing resources that it doesn't need to consume.
So, for something like a configurator, the fact that we have items, and that those items have properties, and those properties have an enumerable set of possible values sounds like a perfectly rational design.
Is it inefficient or "bloated"? The only place I'd raise doubt is in the explicit assertion that items have properties. Of course they do, but that's natively true of JavaScript objects and Parse entities.
In other words, you might be able to get along with just item and several flavors of propertyOptions: e.g. Item has an attribute called "colorProperty" that is a pointer to an instance of "ColorProperty" (whose instances have a name property like 'red', 'green', etc. and maybe describe other pertinent facts, like a more precise description in RGB form).
There's nothing wrong with lots of classes if they represent relevant truth. Do that first. You might discover empirically that your design is too resource consumptive (I doubt you will in this case), at which point we'd start looking for cheats to be somehow skinnier. But do it the right way first, cheat later only if you must.
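As a rough sketch of that flattened shape, written here as plain Java classes rather than Parse classes (the field names are illustrative only):
// Each instance of a property class is one option, e.g. name = "red",
// plus any other pertinent facts about that option.
class ColorProperty {
    String name;   // "red", "green", ...
    String rgb;    // a more precise description, e.g. "#FF0000"
}
class SizeProperty {
    String name;   // "big", "small", ...
}
// Item points directly at the option instances it uses, instead of going
// through generic ItemProperty / ItemPropertyOption layers.
class Item {
    String title;
    ColorProperty color;  // null if colour doesn't apply to this item
    SizeProperty size;    // null if size doesn't apply to this item
}
A query like "give me all items that are Red" then becomes a lookup against Item's color field rather than a traversal through several intermediate classes.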
Is having lots of classes inefficient?
It's certainly inefficient for poor humans who have to remember what all those classes do and how they're related to each other. It takes time to write all those classes in the first place, and every line that you write is a line that has to be maintained.
Beyond that, there's certainly some cost for each class in any OOP language, and creating more classes than you really need will mean that you're paying more than you need to for the work that you're doing, which is pretty much the definition of inefficient.
I’ll probably be doing this in a similar way for 5+ more similar kinds of class-structures
Maybe you could spend some time thinking about the similarity between these cases and come up with a single set of more flexible classes that you can use in all those cases. Writing general code is harder than writing very specific code, but if you do a good job you'll recoup the extra effort many times over through reuse.

Abstracting away from data structure implementation details in Clojure

I am developing a complex data structure in Clojure with multiple sub-structures.
I know that I will want to extend this structure over time, and may at times want to change the internal structure without breaking different users of the data structure (for example I may want to change a vector into a hashmap, add some kind of indexing structure for performance reasons, or incorporate a Java type)
My current thinking is:
Define a protocol for the overall structure with various accessor methods
Create a mini-library of functions that navigate the data structure e.g. (query-substructure-abc param1 param2)
Implement the data structure using defrecord or deftype, with the protocol methods defined to use the mini-library
I think this will work, though I'm worried it is starting to look like rather a lot of "glue" code. It probably also reflects my greater familiarity with object-oriented approaches.
What is the recommended way to do this in Clojure?
I think that deftype might be the way to go, however I'd take a pass on the accessor methods. Instead, look into clojure.lang.ILookup and clojure.lang.Associative; these are interfaces which, if you implement them for your type, will let you use get / get-in and assoc / assoc-in, making for a far more versatile solution (not only will you be able to change the underlying implementation, but perhaps also to use functions built on top of Clojure's standard collections library to manipulate your structures).
A couple of things to note:
You should probably start with defrecord, using get, assoc & Co. with the standard defrecord implementations of ILookup, Associative, IPersistentMap and java.util.Map. You might be able to go a pretty long way with it.
If/when these are no longer enough, have a look at the sources for emit-defrecord (a private function defined in core_deftype.clj in Clojure's sources). It's pretty complex, but it will give you an idea of what you may need to implement.
Neither deftype nor defrecord currently defines any factory functions for you, but you should probably write them yourself. Sanity checking goes inside those functions (and/or the corresponding tests).
The more conceptually complex operations are of course a perfect fit for protocol functions built on the foundation of get & Co.
Oh, and have a look at gvec.clj in Clojure's sources for an example of what some serious data structure code written using deftype might look like. The complexity here is of a different kind from what you describe in the question, but still, it's one of the few examples of custom data structure programming in Clojure currently available for public consumption (and it is of course excellent quality code).
Of course this is just what my intuition tells me at this time. I'm not sure that there is much in the way of established idioms at this stage, what with deftype not actually having been released and all. :-)

What is an elegant way to track the size of a set of objects without a single authoritative collection to reference?

Update: Please read this question in the context of design principles, elegance, expression of intent, and especially the "signals" sent to other programmers by design choices.
I have two "views" of a set of objects. One is a dictionary/map indexing the objects by a string value. The other is a dictionary/map indexing the objects by an ordinal (ordering integer). There is no "master" collection of the objects by themselves that can serve as the authoritative source for the number of objects, but the two dictionaries should always both contain references to all the objects.
When a new item is added to the set a reference is added to both dictionaries, and then some processing needs to be done which is affected by the new total number of objects.
What should I use as the authoritative source to reference for the current size of the set of objects? It seems that all my options are flawed in one dimension or another. I can just consistently reference one of the dictionaries, but that would codify an implication of that dictionary's superiority over the other. I could add a 3rd collection, a simple list of the objects to serve as the authoritative list, but that increases redundancy. Storing a running count seems simplest, but also increases redundancy and is more brittle than referencing a collection's self-tracked count on the fly.
Is there another option that will allow me to avoid choosing the lesser evil, or will I have to accept a compromise on elegance?
I would create a class that has (at least) two collections.
A version of the collection that is sorted by string
A version of the collection that is sorted by ordinal
(Optional) A master collection
The class would handle the nitty-gritty management:
Syncing the contents of the two collections
Standard collection actions (e.g. letting users get the size, add or retrieve items)
Letting users get items by string or ordinal
That way you can use the same collection wherever you need either behavior, but still abstract away the "indexing" behavior you are going for.
The separate class gives you a single interface with which to explain your intent regarding how this class is to be used.
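A rough Java sketch of that kind of wrapper, with invented names, might look like this:
import java.util.HashMap;
import java.util.Map;
// Wraps the two "views" so callers never have to pick one as authoritative.
public final class IndexedSet<T> {
    private final Map<String, T> byName = new HashMap<>();
    private final Map<Integer, T> byOrdinal = new HashMap<>();
    // All additions go through one place, so the views can never drift apart.
    public void add(String name, int ordinal, T item) {
        byName.put(name, item);
        byOrdinal.put(ordinal, item);
    }
    public T getByName(String name) { return byName.get(name); }
    public T getByOrdinal(int ordinal) { return byOrdinal.get(ordinal); }
    // One size, defined by the class itself rather than by an arbitrary choice of map.
    public int size() { return byName.size(); }
}
Which map size() happens to read from becomes an internal detail that clients never see.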
I'd suggest encapsulation: create a class that hides the "management" details (such as the current count) and use it to expose immutable "views" of the two collections.
Clients will ask the "manglement" object for an appropriate reference to one of the collections.
Clients adding a "term" (for lack of a better word) to the collections will do so through the "manglement" object.
This way your assumptions and implementation choices are "hidden" from clients of the service and you can document that the choice of collection for size/count was arbitrary. Future maintainers can change how the count is managed without breaking clients.
BTW, yes, I meant "manglement" - my favorite malapropism for management (in any context!)
If both dictionaries contain references to every object, the count should be the same for both of them, correct? If so, just pick one and be consistent.
I don't think it is a big deal at all. Just reference the sets in the same order each time you need to get access to them.
If you really are concerned about it you could encapsulate the collections with a wrapper that exposes the public interfaces - like
Add(item)
Count()
This way it will always be consistent and atomic - or at least you could implement it that way.
But, I don't think it is a big deal.
