Using a slice instead of a list when working with large data volumes in Go

I have a question on the utility of slices in Go. I have just seen Why are lists used infrequently in Go? and Why use arrays instead of slices? but had some questions which I did not see answered there.
In my application:
I read a CSV file containing approx 10 million records, with 23 columns per record.
For each record, I create a struct and put it into a linked list.
Once all records have been read, the rest of the application logic works with this linked list (the processing logic itself is not relevant for this question).
The reason I prefer a list and not a slice is due to the large amount of contiguous memory an array/slice would need. Also, since I don't know the exact number of records in the file upfront, I can't specify the array size upfront (I know Go can dynamically re-dimension the slice/array as needed, but this seems terribly inefficient for such a large set of data).
Every Go tutorial or article I read seems to suggest that I should use slices and not lists (as a slice can do everything a list can, but do it better somehow). However, I don't see how or why a slice would be more helpful for what I need. Any ideas from anyone?

... approx 10 million records, with 23 columns per record ... The reason I prefer a list and not a slice is due to the large amount of contiguous memory an array/slice would need.
This contiguous memory is its own benefit as well as its own drawback. Let's consider both parts.
(Note that it is also possible to use a hybrid approach: a list of chunks. This seems unlikely to be very worthwhile here though.)
Also, since I don't know the exact number of records in the file upfront, I can't specify the array size upfront (I know Go can dynamically re-dimension the slice/array as needed, but this seems terribly inefficient for such a large set of data).
Clearly, if there are n records, and you allocate and fill in each one once (using a list), this is O(n).
If you use a slice, and allocate a single extra slice entry every time, you start with none, grow it to size 1, then copy that one entry to a new array of size 2 and fill in item #2, grow it to size 3 and fill in item #3, and so on. The first of the n entries is copied n times, the second is copied n-1 times, and so on, for n(n+1)/2 = O(n²) copies. But if you use a multiplicative expansion technique—which Go's append implementation does—this drops to O(log n) reallocations. Each reallocation copies more bytes, but the total work ends up being O(n), amortized (see Why do dynamic arrays have to geometrically increase their capacity to gain O(1) amortized push_back time complexity?).
The space used with the slice is obviously O(n). The space used for the linked list approach is O(n) as well (though the records now require at least one forward pointer so you need some extra space per record).
So in terms of the time needed to construct the data, and the space needed to hold the data, it's O(n) either way. You end up with the same total memory requirement. The main difference, at first glance anyway, is that the linked-list approach doesn't require contiguous memory.
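For concreteness, here is a minimal sketch of the slice approach for this workload (the file name, the 23-column array type, and the capacity guess are all assumptions; append works fine without the pre-sizing, it just reallocates O(log n) times):

package main

import (
	"encoding/csv"
	"log"
	"os"
)

// record stands in for the 23-column struct described in the question.
type record struct {
	fields [23]string
}

func main() {
	f, err := os.Open("data.csv") // hypothetical input file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r := csv.NewReader(f)
	// A rough capacity guess avoids most of the reallocations; it is
	// optional, and append will grow the slice on its own otherwise.
	records := make([]record, 0, 10_000_000)
	for {
		row, err := r.Read()
		if err != nil {
			break // io.EOF ends the loop; real code should inspect the error
		}
		var rec record
		copy(rec.fields[:], row)
		records = append(records, rec)
	}
	log.Printf("read %d records", len(records))
}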
So: What do we lose when using contiguous memory, and what do we gain?
What we lose
The thing we lose is obvious. If we already have fragmented memory regions, we might not be able to get a contiguous block of the right size. That is, given:
used: 1 MB (starting at base, ending at base+1M)
free: 1 MB (starting at +1M, ending at +2M)
used: 1 MB (etc)
free: 1 MB
used: 1 MB
free: 1 MB
we have a total of 6 MB, 3 used and 3 free. We can allocate 3 1 MB blocks, but we can't allocate one 3 MB block unless we can somehow compact the three "used" regions.
Since Go programs tend to run in virtual memory on large-memory-space machines (virtual sizes of 64 GB or more), this tends not to be a big problem. Of course everyone's situation differs, so if you really are VM-constrained, that's a real concern. (Other languages have compacting GC to deal with this, and a future Go implementation could at least in theory use a compacting GC.)
What we gain
The first gain is also obvious: we don't need pointers in each record. This saves some space—the exact amount depends on the size of the pointers, whether we're using singly linked lists, and so on. Let's just assume two 8-byte pointers, or 16 bytes per record. Multiply by 10 million records and we're looking pretty good here: we've saved 160 MB. (Go's container/list implementation uses a doubly linked list, and on a 64-bit machine, this is the size of the per-element threading needed.)
We gain something less obvious at first, though, and it's huge. Because Go is a garbage-collected language, every pointer is something the GC must examine at various times. The slice approach has zero extra pointers per record; the linked-list approach has two. That means that the GC system can avoid examining the nonexistent 20 million pointers (in the 10 million records).
Conclusion
There are times to use container/list. If your algorithm really calls for a list and is significantly clearer that way, do it that way, unless and until it proves to be a problem in practice. Or, if you have items that can be on some collection of lists—items that are actually shared, but some of them are on the X list and some are on the Y list and some are on both—this calls for a list-style container. But if there's an easy way to express something as either a list or a slice, go for the slice version first. Because slices are built into Go, you also get the type safety / clarity mentioned in the first link (Why are lists used infrequently in Go?).
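For completeness, here is a minimal sketch of that "same item on several lists" case, where container/list is the natural fit (the Job type and the list names are just illustrations):

package main

import (
	"container/list"
	"fmt"
)

type Job struct{ name string }

func main() {
	pending := list.New()
	highPriority := list.New()

	// The same *Job value sits on both lists; keeping the returned
	// *list.Element handles would also allow O(1) removal from either list.
	j := &Job{name: "reindex"}
	pending.PushBack(j)
	highPriority.PushBack(j)

	fmt.Println(pending.Front().Value.(*Job).name)      // reindex
	fmt.Println(highPriority.Front().Value.(*Job).name) // reindex
}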

Related

Calculating size class

High-performance malloc implementations often implement segregated free lists, that is, each of the more common (smaller) sizes gets its own separate free list.
A first attempt at this could say, below a certain threshold, the size class is just the size divided by 8, rounded up. But actual implementations have more nuance, arranging the recognized size classes on something like an exponential curve (but gentler than simply doubling at each step), e.g. http://jemalloc.net/jemalloc.3.html
I'm trying to figure out how to convert a size to a size class on some such curve. Now, in principle this is not difficult; there are several ways to do it. But to achieve the desired goal of speeding up the common case, it really needs to be fast, preferably only a few instructions.
What's the fastest way to do this conversion?
In the dark ages, when I used to worry about those sorts of things, I just iterated through all the possible sizes starting at the smallest one.
This actually makes a lot of sense, since allocating memory strongly implies work outside of the actual allocation -- like initializing and using that memory -- that is proportional to the allocation size. In all but the smallest allocations, that overhead will swamp whatever you spend to pick a size class.
Only the small ones really need to be fast.
Let's assume you want to use all the power-of-two sizes plus the halfway point between each, i.e. 8, 12, 16, 24, 32, 48, 64, ..., 4096.
Check that the value is less than or equal to 4096; I have arbitrarily chosen that as the highest allocatable size for this example.
Take the size, multiply the position of its highest set bit by two, and add 1 if the next bit is also set; that gives you an index into the size list. Add one more if the value is higher than the value those two bits alone would give. This should be 5-6 assembly instructions.
So for 26 = 16+8+2, the set bits are 4, 3, and 1; the index is 4*2 + 1 + 1 = 10, so the 10th index (the 32-byte list) is chosen.
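A rough Go sketch of that calculation, using math/bits for the highest-bit lookup (the function name is my own, and it simply assumes 8 <= size <= 4096 rather than checking; a real allocator would also clamp to its minimum allocation size):

package main

import (
	"fmt"
	"math/bits"
)

// sizeClassIndex maps a size to an index in the size list
// 8, 12, 16, 24, 32, 48, 64, ..., 4096 as described above:
// twice the position of the highest set bit, plus one if the next bit
// is also set, plus one more if any lower bits remain (round up).
// Assumes 8 <= size <= 4096.
func sizeClassIndex(size uint) int {
	high := bits.Len(size) - 1 // position of the highest set bit
	idx := 2 * high
	if size&(1<<(high-1)) != 0 { // second-highest bit also set?
		idx++
	}
	if size&^(3<<(high-1)) != 0 { // any lower bits set? round up
		idx++
	}
	return idx
}

func main() {
	fmt.Println(sizeClassIndex(26)) // 10, i.e. the 32-byte size class
}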
Your system might require some minimum allocation size.
Also, if you're doing a lot of allocations, consider using a pool allocator that is private to your program, backed by the system allocator.

Is there an efficient way to store a lookup structure that uses random integer keys?

I need to implement a lookup structure with the following requirements:
Keys are random 128-bit integers
Values are 64-bit
It will be stored on disk
It must be searchable without the entire structure being resident in memory (I intend to memory map the file)
It must be mutable, but writes to disk must be incremental (must not require overwriting the entire structure)
Is there an efficient way to achieve all of this?
Please do not answer, "Don't use UUIDs." I am asking a specific question; changing the requirements changes the question.
Since your keys and values each are a fixed number of bytes, you could implement a hashtable as a file. The first few bytes contain the current number of elements and the current capacity, and then the entries each take up 16 + 8 bytes (if 0 is forbidden as a key) or 1 + 16 + 8 bytes if you need a flag to indicate whether an entry exists or not.
You can hash the key, then use arithmetic to seek to the correct position in the file, then read or write just the entries you need to. To resolve hash collisions, linear probing is probably best, to minimize the number of seeks. Since the keys are random, catastrophic collision pileups shouldn't happen, and the hash can simply take the lowest k bits of the key, where the current capacity is 2^k.
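Here is a minimal Go sketch of the lookup side, assuming a 16-byte header, 1 + 16 + 8-byte entries with an occupancy flag, little-endian encoding, and a power-of-two capacity (the exact layout is illustrative, not a fixed format):

package hashfile

import "encoding/binary"

const (
	headerSize = 16         // element count (8 bytes) + capacity (8 bytes)
	entrySize  = 1 + 16 + 8 // occupied flag + 128-bit key + 64-bit value
)

// lookup searches a memory-mapped hashtable file for the 128-bit key
// (keyHi, keyLo). The hash is simply the lowest k bits of the key, and
// collisions are resolved by linear probing.
func lookup(mmap []byte, keyHi, keyLo, capacity uint64) (uint64, bool) {
	home := keyLo & (capacity - 1) // capacity is 2^k, so this is "mod capacity"
	for i := uint64(0); i < capacity; i++ {
		slot := (home + i) & (capacity - 1)
		off := headerSize + slot*entrySize
		e := mmap[off : off+entrySize]
		if e[0] == 0 { // empty slot: the key is not present
			return 0, false
		}
		hi := binary.LittleEndian.Uint64(e[1:9])
		lo := binary.LittleEndian.Uint64(e[9:17])
		if hi == keyHi && lo == keyLo {
			return binary.LittleEndian.Uint64(e[17:25]), true
		}
	}
	return 0, false
}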
This takes O(n) space, and allows lookups in O(1) average time, and writes in O(1) amortized time. Occasionally, you have to resize the hashtable to increase the capacity on a write; this takes O(n) time on those occasions.
If you need O(1) writes in the worst case, you could maintain both the old and new hashtables, do lookups in both, and then on each write operation, copy across two entries from the old to the new. If the capacity is always increased by a factor of 2, then this gives non-amortized constant time writes, except for the cost of allocating an empty hashtable of size O(n). If creating an empty file of a particular size is also too slow for a single write operation, then you can amortize empty-file-creation across many writes too.

How to fix Run Time Error '7' Out of memory in visual basic 6?

I am trying to ZIP a folder with sub folders and files in VB6. For that I read each file and store them one by one in a byte array using ReDim Preserve. But large folders with a size over 130 MB throw an Out of Memory error. I have 8 GB of RAM in my PC so it shouldn't be a problem. So, is this some limitation of Visual Basic 6 that we can't use more than 150 MB of memory?
'Length of a particular File is determined
lngFileLen = FileLen(a_strFilePath)
DoEvents
If lngFileLen <> 0 Then
m_lngPtr = m_lngPtr + lngFileLen
'Next line Throws error once m_lngPtr reaches around 150 MB
ReDim Preserve arrFileBuffer(1 To m_lngPtr)
First of all, VB6 arrays can only be resized to a maximum of 2,147,483,647 elements. Since that's also the upper limit of a Long in VB6, it's unlikely to be the problem here. However, even though you may be allowed to make an array that big, the program is running in a 32-bit process, so it's still subject to the limit of 2 GB of addressable memory for the whole process. The VB6 run-time has some overhead, so it's using some of that memory for other things, and since your program is likely doing other things too, that will use up some memory as well.
In addition to that, when you create an array, the system has to find that number of bytes of contiguous memory. So, even when there is enough memory available, within the 2GB limit, if it's sufficiently fragmented, you can still get out of memory errors. For that reason, creating gigantic arrays is always a concern.
Next, you are using ReDim Preserve, which requires twice the memory. When you resize the array like that, what it actually has to do, under the hood, is create a second array of the new size and then copy all of the data from the old array into the new one. Once it's done copying the data out of the source array, it can delete it, but while it's performing the copy, it needs to hold both the old array and the new array in memory simultaneously. That means that, in a best case scenario, even if there were no other allocated memory or fragmentation, the maximum size of an array that you could resize would be 1 GB.
Finally, in your example, you never showed what the data type of the array was. If it's an array of bytes, you should be good, I think (the memory size of the array would only be slightly more than its length in elements). However, if, for instance, it's an array of strings or variants, then I believe that's going to require a minimum of 4 bytes per element, thereby more-than-quadrupling the memory size of the array.

What's the maximum number of elements a scheme list can have?

I'm working with Chicken Scheme, I wonder how many elements a list can have.
There is no hard limit – it can have as many as there's room for in memory.
The documentation, under the option -:hmNUMBER, mentions that there is a default maximum heap size of 2 GB, which gives you about 45 million pairs. You can increase this with several options, but the simplest way to set a default memory limit is -heap-size. Here is how to double the default:
csc -heap-size 4000M <file>
The documentation for -heap-size says that only half of the allocated memory is used at any given time. It might be using a copying (semispace) garbage collection algorithm, where, when memory is full, live data is moved to the unused half, making the old half the unused one.

Redis 10x more memory usage than data

I am trying to store a wordlist in redis. The performance is great.
My approach is of making a set called "words" and adding each new word via 'sadd'.
When adding a file that's 15.9 MB and contains about a million words, the redis-server process consumes 160 MB of RAM. How come I am using 10x the memory? Is there a better way of approaching this problem?
Well, this is expected of any efficient data store: the words have to be indexed in memory in a dynamic data structure of cells linked by pointers. The structure metadata, the pointers, and the memory allocator's internal fragmentation are the reasons why the data takes much more memory than a corresponding flat file.
A Redis set is implemented as a hash table. This includes:
an array of pointers growing geometrically (powers of two)
a second array may be required when incremental rehashing is active
singly linked list cells representing the entries in the hash table (3 pointers, 24 bytes per entry)
Redis object wrappers (one per value) (16 bytes per entry)
actual data themselves (each of them prefixed by 8 bytes for size and capacity)
All the above sizes are given for the 64-bit implementation. Accounting for the memory allocator overhead, Redis takes at least 64 bytes per set item (on top of the data) for a recent version of Redis using the jemalloc allocator (>= 2.4).
Redis provides memory optimizations for some data types, but they do not cover sets of strings. If you really need to optimize memory consumption of sets, there are tricks you can use though. I would not do this for just 160 MB of RAM, but should you have larger data, here is what you can do.
If you do not need the union, intersection, difference capabilities of sets, then you may store your words in hash objects. The benefit is hash objects can be optimized automatically by Redis using zipmap if they are small enough. The zipmap mechanism has been replaced by ziplist in Redis >= 2.6, but the idea is the same: using a serialized data structure which can fit in the CPU caches to get both performance and a compact memory footprint.
To guarantee the hash objects are small enough, the data could be distributed according to some hashing mechanism. Assuming you need to store 1M items, adding a word could be implemented in the following way:
hash it modulo 10000 (done on client side)
HMSET words:[hashnum] [word] 1
Instead of storing:
words => set{ hi, hello, greetings, howdy, bonjour, salut, ... }
you can store:
words:H1 => map{ hi:1, greetings:1, bonjour:1, ... }
words:H2 => map{ hello:1, howdy:1, salut:1, ... }
...
To retrieve or check the existence of a word, it is the same (hash it and use HGET or HEXISTS).
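A minimal client-side sketch of the bucketing in Go (the FNV-1a hash and the helper name are illustrative assumptions; any stable hash works):

package main

import (
	"fmt"
	"hash/fnv"
)

// bucketKey maps a word to the hash object that should hold it,
// following the "hash it modulo 10000 on the client side" scheme above.
func bucketKey(word string) string {
	h := fnv.New32a()
	h.Write([]byte(word))
	return fmt.Sprintf("words:%d", h.Sum32()%10000)
}

func main() {
	// The Redis client would then issue, for example:
	//   HMSET   words:<n> <word> 1   to add a word
	//   HEXISTS words:<n> <word>     to check membership
	fmt.Println(bucketKey("bonjour"))
}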
With this strategy, significant memory savings can be achieved, provided the modulo of the hash is chosen according to the zipmap configuration (or ziplist for Redis >= 2.6):
# Hashes are encoded in a special way (much more memory efficient) when they
# have at max a given number of elements, and the biggest element does not
# exceed a given threshold. You can configure this limits with the following
# configuration directives.
hash-max-zipmap-entries 512
hash-max-zipmap-value 64
Beware: the names of these parameters have changed with Redis >= 2.6.
Here, modulo 10000 for 1M items means 100 items per hash object, which guarantees that all of them are stored as zipmaps/ziplists.
In my experiments, it is better to store your data inside hash objects. The best case I reached after a lot of benchmarking was keeping hash objects that do not exceed 500 keys.
I tried the standard string SET/GET: for 1 million keys/values, the size was 79 MB. That gets very large if you have big numbers, like 100 million keys, which would use around 8 GB.
I tried hashes to store the same data: for the same million keys/values, the size was a much smaller 16 MB.
Give it a try; in case anybody needs the benchmarking code, drop me a mail.
Did you try persisting the database (BGSAVE for example), shutting the server down and getting it back up? Due to fragmentation behavior, when it comes back up and populates its data from the saved RDB file, it might take less memory.
Also: what version of Redis do you work with? Have a look at this blog post - it says that fragmentation has been partially solved as of version 2.4.
