How To Empty a Dynamic Array [closed] - performance

I need to reuse a dynamic array many times, because I believe this gives better performance.
That way, I don't need to create a new dynamic array every time I need one.
Can using the same array for several operations, then clearing it and reusing it, lead to bugs or inefficiency? And how can I correct my procedure so that it does what I need?
My code:
procedure Empty(var local_array: TArray<Integer>);
var
  i: Integer;
begin
  for i := 0 to High(local_array) do
    local_array[i] := 0;       // Integer elements cannot be set to nil; 0 is their default value
  SetLength(local_array, 0);   // requires a dynamic array passed by var, not an open array parameter
end;

If you want to reuse your array, don't mess with its size. Changing the size of an array (more specifically, increasing it) is what can lead to the need for data reallocation.
What is array data reallocation?
In Delphi, all arrays need to be stored in a contiguous memory block. This means that if you try to increase the size of your array and there is already other data right after the memory block currently assigned to it, the whole array needs to be moved to another memory location with enough free space to store the enlarged array in one contiguous block.
So instead of resizing your array, leave its size alone and just set the array items to some default value. Yes, this means the array will still occupy all of its allocated memory, but that is the whole point of reusing it: you avoid the overhead of allocating and deallocating memory for the array.
If you go this way, don't forget to store your own count of used items, since the array's length may be larger than the number of items actually in use.
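The pattern described above (allocate once, keep your own count of used items, and reset the count instead of resizing) is language-agnostic. Below is a minimal sketch of the idea in Python, purely for illustration; the ReusableIntBuffer class and its method names are made up for this example, not part of any library.

class ReusableIntBuffer:
    """Illustrative sketch: preallocate once, track a logical count, never resize."""
    def __init__(self, capacity):
        self.items = [0] * capacity   # allocated once, never resized
        self.count = 0                # number of slots actually in use
    def add(self, value):
        self.items[self.count] = value
        self.count += 1
    def clear(self):
        # "Emptying" only resets the logical count; the memory stays allocated,
        # so the next round of work causes no reallocation.
        self.count = 0

buf = ReusableIntBuffer(1024)
buf.add(42)
buf.clear()   # ready for reuse with no memory churn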

Related

Distinction between a data structure's members being stored by hash value and by index [closed]

From Picking the right data structure in Swift:
Like we took a look at in “The power of sets in Swift”, one of the big advantages that sets have over arrays is that both inserts and removals can always be performed in constant (O(1)) time, since members are stored by hash value, rather than by index.
What does it mean if a data structure's members are stored by hash value rather than by index?
Arrays are allocated as single, large blocks of memory and entries are accessed by their indexes. The order of entries is fixed and they need have no particular identity apart from their position in the array.
Other, more complex data structures allow one to store objects that are identified and accessed using some sort of key (hash tables, sets, dictionaries, ...). Let's call these "keyed collections". Some objects have a natural key, e.g. "SocialSecurityNumber", but what should one do if a key is needed and there is no obvious candidate field in the data object?
Hashing is a technique which sets out to derive a "fairly unique identity" to associate with an object. Think of it as mapping numbers to (arbitrary) data.
Although there are some "standard hashing techniques", this is still a field that is evolving - involving some interesting mathematics.
Hashes have purposes including secure hashing (to detect and prevent deliberate tampering with data), error detection and - in this case - keyed (or hashed) data access.
A non-secure hash algorithm should be as fast as possible, but optimising for speed can involve a trade-off against the "fairly unique" part of the mapping requirement (while secure hashing is unavoidably - and sometimes deliberately - slower and more expensive).
Hashing cannot (ever) guarantee that a given hash value is unique to an object, so attention has to be given to minimising the occurrence of "collisions" and to handling them efficiently when they do occur. This is a difficult subject in its own right, when you consider that the data has to be treated as arbitrary: it may appear random, contain sequences or patterns, and/or include duplicates.
With that said, assuming we have a "good" hash function, we can - in principle at least - store arbitrary objects in keyed collections.
Important considerations
Arrays offer extremely fast sequential and random access (by index), while insert, delete and growth operations are slow.
Keyed collections have the advantage you quote of offering extremely fast inserts and deletes, but they are granular in nature and introduce complexities such as memory fragmentation (memory management is an overhead, added complexity means added cost).
Performance degrades rapidly when collisions start occurring.
There is no such thing as a free lunch and calculating hashes is relatively expensive (compared to simply using an index value or stored key).
There is a specific downside to hashes that "natural keys" and indexes do not have, which is they do not offer a natural ordering/sequence. (Processing objects sequentially according to their hash values is tantamount to processing them randomly.)
It is always important to choose data structures appropriate to their intended use (but that's what the link you quote is all about;-)
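One of the points above, that hashes offer no natural ordering while still giving fast keyed access, is easy to see in a few lines of Python (a sketch for illustration only; the string values are arbitrary):

words = {"delta", "alpha", "charlie", "bravo"}   # a hash-based set
print("alpha" in words)    # membership is resolved via the hash of "alpha"
print(list(words))         # iteration order has nothing to do with alphabetical order
print(sorted(words))       # an explicit sort is needed to recover a sequence
# Any hashable object can act as its own key: a tuple works, for example.
seen = {(3, 4)}
print((3, 4) in seen)      # True: located by hash, then confirmed by equality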
You are actually asking what the difference is between an array and a hash map/table/set. This is part of any computer science "Data Structures" course, and I am sure you can google a high-level overview of each. Highly recommended :)
In short:
You can imagine an array as a long shelf with cells, where each cell has a sequence number (a.k.a. index):
Array: [ dog ][ cat ][ mouse ][ fox ]...
where dog is at cell #0, cat is at #1, and so on.
Now, in an array you can retrieve objects using the cell index, like "Give me the content of cell #1". But in order to find out whether you have a "mouse" in your array, you have to iterate over all the cells. (Inefficient.)
Sets (like hash maps) store objects using another kind of index - a "hash code", computed by a function that produces a pseudo-unique number for a given object (without going into details). So cat and mouse will have their own hash codes, and now it is very efficient for the Set to find out whether it contains a "mouse".
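The difference is easy to demonstrate; here is a short Python sketch using the same animals (illustration only):

animals_list = ["dog", "cat", "mouse", "fox"]   # the "shelf with numbered cells"
animals_set = {"dog", "cat", "mouse", "fox"}    # hash-based storage

print(animals_list[1])            # "cat" - direct access by cell number
print("mouse" in animals_list)    # True, but found by checking cell after cell
print("mouse" in animals_set)     # True, located via the hash code of "mouse"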

From Log value to Exponential value, huge Distortion for prediction of machine learning algorithm [closed]

I built a machine learning algorithm to predict a value Y'. For this, I used the log value of Y for data scaling.
Since the predicted Y' and the actual Y are on the log scale, I have to convert the log values of Y and Y' back with the exponential function.
BUT there is huge distortion for values above exp(7) (≈ 1097), and it produces a large MSE (error).
How can I avoid this huge distortion? (In general, I need to handle values over 1,000.)
Thanks!!
For this, I used the log value of Y for data scaling.
Not for scaling, but to make the target variable's distribution closer to normal.
If your MSE grows as the real target value grows, it means the model simply can't fit large values well enough. Usually this can be solved by cleaning the data (removing outliers) or by choosing another ML model.
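To make the workflow concrete, here is a minimal sketch with scikit-learn that fits on the log of the target and converts predictions back before computing the MSE. The synthetic data, the RandomForestRegressor choice, and the use of log1p/expm1 (which behave like log/exp but are safe near zero) are all assumptions for illustration, not taken from the question.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative data only: X (features) and y (a positive, skewed target with values above 1000).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(2000, 3))
y = np.exp(0.5 * X[:, 0] + 0.2 * X[:, 1]) * 50

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, np.log1p(y_train))          # fit on the log scale

y_pred = np.expm1(model.predict(X_test))       # convert predictions back to the original scale
print("MSE on the original scale:", mean_squared_error(y_test, y_pred))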
UPDATE
You can run KFold and, for each fold, calculate the MSE/MAE between predicted and real values. Then take the cases with big errors and look at which parameters/features those cases have.
You can eliminate cases with big errors, but it's usually dangerous.
In general, a bad fit on big values means that you did not remove outliers from your original dataset. Plot histograms and scatter plots and make sure you don't have any.
Check categorical variables: maybe some categories are rare (<= 5% of the data). If so, group them.
Or you need to create 2 models: one for small values, one for big ones.
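A sketch of the KFold inspection suggested above, using scikit-learn; X, y and the LinearRegression estimator are placeholders standing in for your own data and model:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

# X, y and LinearRegression are placeholders for your own data and estimator.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = np.exp(X[:, 0]) * 100 + rng.normal(scale=5, size=1000)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    errors = np.abs(pred - y[test_idx])
    print(f"fold {fold}: MAE = {mean_absolute_error(y[test_idx], pred):.2f}")
    # Look at the worst cases: which feature values / target sizes do they share?
    worst = test_idx[np.argsort(errors)[-5:]]
    print("  worst rows:", worst, "targets:", np.round(y[worst], 1))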

How to store set of numbers [closed]

I have a set of 1,000,000 unique numbers. The numbers are in the interval between 0 and 50,000,000. Assume the numbers are random. I need a data structure that can hold them all. The data structure should require as little memory as possible, and it should be possible to check quickly, with no errors, whether a number is in the set.
I found a solution with a Bloom filter. Yes, a Bloom filter has a probability of false positives, but since there are "just" 50,000,000 possible numbers, I can find all of the false positives in advance and keep them in a std::set. With this method, I'm able to store all the numbers in 2.3 MB of memory.
Can you find a better method?
Rather than one range of 0 to 50,000,000, how about 1,024 separate ranges of 65,536 values each? That would cover a range of 67,108,864 values. I suppose you can make it 763 ranges rather than 1,024, which gives you 50,003,968.
Something like ushort[763][];
Now you're storing 1,000,000 16-bit values rather than 32-bit values.
The values in the rows are sorted. So to determine whether a number is in the set, you divide the number by 65,536 to figure out which array to look in, and then do a binary search for number % 65,536 in that array.
Storage for the numbers themselves is 2,000,000 bytes. Plus a small amount of overhead for the arrays.
This will be faster in execution, smaller than your Bloom filter approach, no false positives, and a whole lot easier to implement.
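Here is a sketch of that bucketing scheme in Python; the array module's unsigned 16-bit type stands in for ushort, and the 65,536-wide buckets are exactly as described above:

from array import array
from bisect import bisect_left

BUCKET_WIDTH = 65536                      # each bucket covers 65,536 consecutive values
NUM_BUCKETS = 763                         # 763 * 65,536 = 50,003,968 >= 50,000,000

def build(numbers):
    # One sorted array of 16-bit offsets per bucket ('H' = unsigned 16-bit).
    buckets = [array('H') for _ in range(NUM_BUCKETS)]
    for n in sorted(numbers):
        buckets[n // BUCKET_WIDTH].append(n % BUCKET_WIDTH)
    return buckets

def contains(buckets, n):
    bucket = buckets[n // BUCKET_WIDTH]   # pick the row by dividing by 65,536
    offset = n % BUCKET_WIDTH
    i = bisect_left(bucket, offset)       # binary search inside the row
    return i < len(bucket) and bucket[i] == offset

buckets = build([3, 70000, 49999999])
print(contains(buckets, 70000), contains(buckets, 70001))   # True False

With 1,000,000 numbers spread over 763 rows, each binary search touches a row of roughly 1,300 16-bit values on average.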
The minimum space to store such a vector in general is 884,002 bytes. That stores an integer index (a very large integer) into the list of all possible choices of 1,000,000 out of 50,000,000.
You can get close to that with a simple, fast byte encoding. Given the sorted list of numbers, replace each number with the difference from the previous number. (Assume that -1 precedes the first number.) The differences are all at least one, so subtract one. If the result is 254 or less, code it as a single byte. Otherwise, write 255 and follow it with two bytes holding the (larger) difference minus 255. If it doesn't fit in that, then write three 255s and follow with three bytes holding the difference. This will almost always code the vector in less than 1,012,000 bytes.
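For illustration, here is a simplified variant of that idea in Python: delta encoding plus a standard base-128 varint, rather than the exact 255-escape byte layout described above.

def encode(sorted_numbers):
    # Delta-encode a sorted list of distinct non-negative ints into bytes.
    # Each gap-minus-one is stored as a little-endian base-128 varint
    # (7 payload bits per byte; the high bit means "more bytes follow").
    out = bytearray()
    prev = -1
    for n in sorted_numbers:
        delta = n - prev - 1          # gaps are >= 1, so this is >= 0
        while delta >= 0x80:
            out.append(0x80 | (delta & 0x7F))
            delta >>= 7
        out.append(delta)
        prev = n
    return bytes(out)

def decode(data):
    numbers, prev, i = [], -1, 0
    while i < len(data):
        delta, shift = 0, 0
        while data[i] & 0x80:         # continuation bit set: more payload bytes
            delta |= (data[i] & 0x7F) << shift
            shift += 7
            i += 1
        delta |= data[i] << shift
        i += 1
        prev = prev + delta + 1
        numbers.append(prev)
    return numbers

nums = [0, 5, 6, 300, 49999999]
assert decode(encode(nums)) == nums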

Difference between hash tables and random access tables [closed]

What is the difference between hash tables and random access tables? I feel that they are similar, but I wanted to find out the exact differences; googling did not help me much.
In general, hash tables are there to map entities of various kinds to other entities. Depending on the programming language, that may mean mapping tuples to strings, strings to objects, strings to strings, and so on - infinite possibilities.
Regular arrays let you address entities using integer index:
array[index] ==> string for example
In contrast, hash maps (a.k.a. hash tables, dictionaries, associative arrays, hashes, etc.) let you - among other possibilities - map a string to an integer, for example:
hash_map['Bill'] => 23 etc
For basic understanding go to:
wiki hash tables
Python dicts
PHP arrays
For more advanced understanding I recommend these 2 books:
'Algorithms' by Sedgewick
'Data Structures and Algorithms' by Drozdek
A hash table (aka hash map, or associative array or dictionary or just a hash) is a specific type of random access data structure.
A Hash Table is "random access" because it allows direct, "indexed" access to individual members in constant time. An Array may also be considered a random access data structure, because you can fetch an individual element by its index.
In contrast, a linked list is not a random access data structure, because you need to iterate through its members to find a specific element.
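A short Python sketch of the contrast, for illustration only (the Node class is made up to stand in for a linked list):

arr = ["a", "b", "c", "d"]            # array: random access by integer index
table = {"Bill": 23, "Ann": 31}       # hash table: random access by key
print(arr[2])                         # "c" - constant time, index-based
print(table["Bill"])                  # 23  - constant time, hash-based

# A linked list, by contrast, offers no random access: reaching the n-th
# element means following links one node at a time.
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

head = Node("a", Node("b", Node("c", Node("d"))))
node = head
for _ in range(2):                    # walk two links to reach the third element
    node = node.next
print(node.value)                     # "c" - found only after traversal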

Separate objects from binary volume [closed]

I'm using MATLAB.
I have a three-dimensional array filled with logicals. This array represents data from a cylinder with N uniformly shaped but arbitrarily oriented staples in it. The volume is discretized into voxels (3-dimensional pixels); a logical '1' means 'at this point in the cylinder there IS part of a staple', while a '0' means 'at this point in the cylinder there is air'.
The following picture contains ONE two-dimensional slice of the full volume. Imagine the complete volume composed of such slices. White means '1' and black means '0'.
Now to my problem: I have to separate the staples from each other as well as possible.
The output should be N three-dimensional arrays, each with only the voxels belonging to a certain staple set to '1' and everything else set to '0', so that each array contains the data of exactly one staple.
The biggest problem is that '1's of different staples can lie next to each other (touching and entangled), making it difficult to decide which staple they belong to.
A simplifying factor is that boundary voxels of a staple may be cut away; I can work with any output array that preserves the approximate shape of the original staple.
Maybe somebody can provide an idea of how such a problem could be solved, or point me to algorithms I could take a look at.
Thanks in advance.
Since the staples are objects made of many voxels, you can start by reducing noise with 3D median filtering or bwareaopen. Then bwlabeln can be used to label the connected components in the binary array. After that, you can use regionprops to further analyze each connected object and decide whether it is a single staple or several merged ones. This can be done using features such as 'Perimeter' to distinguish the different cases, but you'll have to investigate these and other regionprops features yourself.
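For readers working in Python instead, roughly the same pipeline can be expressed with SciPy's ndimage module. This is an equivalent sketch, not the MATLAB functions named above; the volume below is a made-up placeholder for the real data, and the per-object loop only approximates what regionprops provides.

import numpy as np
from scipy import ndimage

# Made-up placeholder volume: a 3-D boolean array where True marks staple voxels.
volume = np.zeros((64, 64, 64), dtype=bool)
volume[10:40, 20:26, 20:26] = True        # one fake "staple"
volume[45:51, 5:45, 40:46] = True         # a second, separate one

# 1. Reduce noise (counterpart of 3-D median filtering / bwareaopen).
cleaned = ndimage.median_filter(volume.astype(np.uint8), size=3).astype(bool)

# 2. Label connected components in 3-D (counterpart of bwlabeln).
structure = np.ones((3, 3, 3))            # 26-connectivity
labels, num_objects = ndimage.label(cleaned, structure=structure)

# 3. Per-object analysis (rough counterpart of regionprops): one boolean
#    volume per labelled object, plus a simple feature such as voxel count.
for obj_id in range(1, num_objects + 1):
    mask = labels == obj_id               # array containing only this object
    print(f"object {obj_id}: {int(mask.sum())} voxels")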
