Are there any algorithms to sort data from a serial input using a buffer that is smaller than the data length?
For example, I have 100 bytes of serial data, which can be read only once, and a 40-byte buffer. I need to print out the bytes in sorted order.
I need it in Javascript, but any general ideas are appreciated.
This kind of sorting is not possible in a single pass.
Using your example: suppose you have filled your 40 byte buffer, so you need to start printing out bytes in order to make room for the next one. In order to print out sorted data, you must print the smallest byte first. However, if the smallest byte has not been read, you can't possibly print it out yet!
The closest relevant fit to your question may be external sorting algorithms, which take multiple passes in order to sort data that can't fit into memory. That is, if you have peripherals that can store the output of a processing pass, you can sort data larger than your memory in O(log(N/M)) passes, where N is the size of the problem, and M is the size of your memory.
The classic storage peripheral for external sorting is the tape drive; however, the same algorithms work for disk drives (of whatever kind). Also, as cache hierarchies grow in depth, the principles of external sorting become more relevant even for in-memory sorts -- try taking a look at cache-oblivious algorithms.
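As a rough illustration of the multi-pass idea, here is a minimal Python sketch of an external merge sort. It assumes the sorted runs can be written somewhere outside the sort buffer; plain lists stand in for that external storage, and the names external_sort and memory_limit are made up for this example. It also assumes the number of runs is small enough to merge in a single pass.

```python
import heapq

def external_sort(stream, memory_limit):
    """Sort an iterable while holding at most memory_limit items in the sort buffer.

    The runs are kept in Python lists here; they stand in for the external
    storage (tape, disk, flash) that a real implementation would write to.
    """
    runs, buffer = [], []
    for item in stream:
        buffer.append(item)
        if len(buffer) == memory_limit:
            runs.append(sorted(buffer))  # pass 1: one sorted run per full buffer
            buffer = []
    if buffer:
        runs.append(sorted(buffer))      # last, partially filled run
    # Pass 2: k-way merge; heapq.merge keeps only one pending item per run.
    return heapq.merge(*runs)

data = [(i * 37) % 101 for i in range(100)]  # 100 values, read once
result = list(external_sort(iter(data), 40))  # 40-item buffer, 3 runs
```

With 100 items and a 40-item buffer this produces three runs, so one merge pass is enough; with many more runs than the buffer can track, the merge itself would have to be repeated over several passes.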
I have a question on the utility of slices in Go. I have just seen Why are lists used infrequently in Go? and Why use arrays instead of slices? but had some question which I did not see answered there.
In my application:
I read a CSV file containing approx 10 million records, with 23 columns per record.
For each record, I create a struct and put it into a linked list.
Once all records have been read, the rest of the application logic works with this linked list (the processing logic itself is not relevant for this question).
The reason I prefer a list and not a slice is due to the large amount of contiguous memory an array/slice would need. Also, since I don't know the size of the exact number of records in the file upfront, I can't specify the array size upfront (I know Go can dynamically re-dimension the slice/array as needed, but this seems terribly inefficient for such a large set of data).
Every Go tutorial or article I read seems to suggest that I should use slices and not lists (as a slice can do everything a list can, but do it better somehow). However, I don't see how or why a slice would be more helpful for what I need? Any ideas from anyone?
... approx 10 million records, with 23 columns per record ... The reason I prefer a list and not a slice is due to the large amount of contiguous memory an array/slice would need.
This contiguous memory is its own benefit as well as its own drawback. Let's consider both parts.
(Note that it is also possible to use a hybrid approach: a list of chunks. This seems unlikely to be very worthwhile here though.)
Also, since I don't know the size of the exact number of records in the file upfront, I can't specify the array size upfront (I know Go can dynamically re-dimension the slice/array as needed, but this seems terribly inefficient for such a large set of data).
Clearly, if there are n records, and you allocate and fill in each one once (using a list), this is O(n).
If you use a slice, and allocate a single extra slice entry every time, you start with none, grow it to size 1, then copy the 1 to a new array of size 2 and fill in item #2, grow it to size 3 and fill in item #3, and so on. The first of the n entities is copied n times, the second is copied n-1 times, and so on, for n(n+1)/2 = O(n^2) copies. But if you use a multiplicative expansion technique (which Go's append implementation does), this drops to O(log n) reallocations. Each reallocation copies more bytes, though. It ends up being O(n) in total, that is, O(1) amortized per append (see Why do dynamic arrays have to geometrically increase their capacity to gain O(1) amortized push_back time complexity?).
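To make the difference concrete, here is a small Python simulation that counts element copies and reallocations for the two growth strategies. It is only a model: the helper name count_copies is invented for this sketch, and plain doubling is an assumption, since Go's real growth policy is more nuanced than a fixed factor of 2.

```python
def count_copies(n, grow):
    """Append n items to a simulated dynamic array.

    grow maps the old capacity to the new one. Returns a tuple
    (element copies, reallocations).
    """
    capacity = length = copies = reallocs = 0
    for _ in range(n):
        if length == capacity:
            capacity = grow(capacity)
            copies += length      # existing items move to the new array
            reallocs += 1
        length += 1
    return copies, reallocs

# Grow by one slot each time: quadratic number of copies.
additive = count_copies(1000, lambda c: c + 1)          # (499500, 1000)
# Double the capacity each time: linear copies, logarithmic reallocations.
doubling = count_copies(1000, lambda c: max(1, 2 * c))  # (1023, 11)
```

For 1000 appends, the one-at-a-time strategy moves about half a million elements, while doubling moves roughly as many elements as it stores.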
The space used with the slice is obviously O(n). The space used for the linked list approach is O(n) as well (though the records now require at least one forward pointer so you need some extra space per record).
So in terms of the time needed to construct the data, and the space needed to hold the data, it's O(n) either way. You end up with the same total memory requirement. The main difference, at first glance anyway, is that the linked-list approach doesn't require contiguous memory.
So: What do we lose when using contiguous memory, and what do we gain?
What we lose
The thing we lose is obvious. If we already have fragmented memory regions, we might not be able to get a contiguous block of the right size. That is, given:
used: 1 MB (starting at base, ending at base+1M)
free: 1 MB (starting at +1M, ending at +2M)
used: 1 MB (etc)
free: 1 MB
used: 1 MB
free: 1 MB
we have a total of 6 MB, 3 used and 3 free. We can allocate three 1 MB blocks, but we can't allocate one 3 MB block unless we can somehow compact the three "used" regions.
Since Go programs tend to run in virtual memory on large-memory-space machines (virtual sizes of 64 GB or more), this tends not to be a big problem. Of course everyone's situation differs, so if you really are VM-constrained, that's a real concern. (Other languages have compacting GC to deal with this, and a future Go implementation could at least in theory use a compacting GC.)
What we gain
The first gain is also obvious: we don't need pointers in each record. This saves some space; the exact amount depends on the size of the pointers, whether we're using singly linked lists, and so on. Let's just assume two 8-byte pointers, or 16 bytes per record. Multiply by 10 million records and we're looking pretty good here: we've saved 160 MB. (Go's container/list implementation uses a doubly linked list, and on a 64-bit machine, this is the size of the per-element threading needed.)
We gain something less obvious at first, though, and it's huge. Because Go is a garbage-collected language, every pointer is something the GC must examine at various times. The slice approach has zero extra pointers per record; the linked-list approach has two. That means that the GC system can avoid examining the nonexistent 20 million pointers (in the 10 million records).
Conclusion
There are times to use container/list. If your algorithm really calls for a list and is significantly clearer that way, do it that way, unless and until it proves to be a problem in practice. Or, if you have items that can be on some collection of lists—items that are actually shared, but some of them are on the X list and some are on the Y list and some are on both—this calls for a list-style container. But if there's an easy way to express something as either a list or a slice, go for the slice version first. Because slices are built into Go, you also get the type safety / clarity mentioned in the first link (Why are lists used infrequently in Go?).
I need to implement a lookup structure with the following requirements:
Keys are random 128-bit integers
Values are 64-bit
It will be stored on disk
It must be searchable without the entire structure being resident in memory (I intend to memory map the file)
It must be mutable, but writes to disk must be incremental (must not require overwriting the entire structure)
Is there an efficient way to achieve all of this?
Please do not answer, "Don't use UUIDs." I am asking a specific question; changing the requirements changes the question.
Since your keys and values each are a fixed number of bytes, you could implement a hashtable as a file. The first few bytes contain the current number of elements and the current capacity, and then the entries each take up 16 + 8 bytes (if 0 is forbidden as a key) or 1 + 16 + 8 bytes if you need a flag to indicate whether an entry exists or not.
You can hash the key, then use arithmetic to seek to the correct position in the file, then read or write just the entries you need to. To resolve hash collisions, linear probing is probably best, since it minimizes the number of seeks. Since the keys are random, catastrophic collision pileups shouldn't happen, and the hash can simply be to take the lowest k bits of the key, where the current capacity is 2^k.
This takes O(n) space, and allows lookups in O(1) average time, and writes in O(1) amortized time. Occasionally, you have to resize the hashtable to increase the capacity on a write; this takes O(n) time on those occasions.
If you need O(1) writes in the worst-case, you could maintain both the old and new hashtables, do lookups in both, and then on each write operation, copy across two entries from the old to the new. If the capacity is always increased by a factor of 2, then this gives non-amortized constant time writes, except for the cost of allocating an empty hashtable of size O(n). If creating an empty file of a particular size is also too slow for a single write operation, then you can amortize empty-file-creation across many writes too.
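The file layout described above could look roughly like the following Python sketch. It is a simplified illustration under stated assumptions, not a complete implementation: the class name FileHashTable and the exact header layout are made up here, resizing and deletion are left out, and any seekable binary file object (a real file opened in "r+b" mode, or io.BytesIO for experiments) is assumed to be usable as the backing store.

```python
import io
import struct

HEADER = struct.Struct("<QQ")    # element count, capacity (assumed layout)
ENTRY = struct.Struct("<B16sQ")  # used flag, 128-bit key, 64-bit value = 25 bytes

class FileHashTable:
    def __init__(self, f, k=None):
        """Open an existing table in f, or create one with capacity 2**k."""
        self.f = f
        if k is not None:
            self.count, self.capacity = 0, 1 << k
            f.seek(0)
            f.write(HEADER.pack(self.count, self.capacity))
            f.write(bytes(ENTRY.size * self.capacity))  # all slots unused
        else:
            f.seek(0)
            self.count, self.capacity = HEADER.unpack(f.read(HEADER.size))

    def _probe(self, key):
        """Linear probing: return (slot, used, value) for key or the first free slot."""
        mask = self.capacity - 1
        i = key & mask                    # hash = lowest k bits of the key
        kb = key.to_bytes(16, "little")
        for _ in range(self.capacity):
            self.f.seek(HEADER.size + i * ENTRY.size)
            used, stored, value = ENTRY.unpack(self.f.read(ENTRY.size))
            if not used or stored == kb:
                return i, bool(used), value
            i = (i + 1) & mask
        raise RuntimeError("table is full; a real implementation would resize")

    def get(self, key, default=None):
        _, used, value = self._probe(key)
        return value if used else default

    def put(self, key, value):
        i, used, _ = self._probe(key)
        self.f.seek(HEADER.size + i * ENTRY.size)
        self.f.write(ENTRY.pack(1, key.to_bytes(16, "little"), value))
        if not used:                      # only the header and one slot are rewritten
            self.count += 1
            self.f.seek(0)
            self.f.write(HEADER.pack(self.count, self.capacity))

table = FileHashTable(io.BytesIO(), k=4)  # 16 slots, in memory for the demo
table.put(0x10, 111)
table.put(0x20, 222)                      # same low 4 bits as 0x10, so it probes onward
```

Each write touches one 25-byte slot plus the 16-byte header, which is what makes the updates incremental. With a memory-mapped file the seeks and reads would become slice operations on the mapping, but the arithmetic stays the same.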
I need to search for a specific record in a large file. The search will be performed on a microprocessor (ESP8266), so I'm working with limited storage and RAM.
The list looks like this:
BSSID,data1,data2
001122334455,float,float
001122334466,float,float
...
I was thinking of using an index to speed up the search. The data are static, and the index will be built on a computer and then loaded onto the microcontroller.
What I've done so far is very simplistic.
I created an index keyed on the first byte of the BSSID; each entry points at the first and last records with that BSSID prefix.
The performance is terrible, but the index file is very small and uses very little RAM. I thought about going further with this method and indexing the first two bytes, but the index table would be 256 times larger, resulting in a table 1/3 the size of the data file.
This is the index with the first method:
00,0000000000,0000139984
02,0000139984,0000150388
04,0000150388,0000158812
06,0000158812,0000160900
08,0000160900,0000171160
What indexing algorithm do you suggest that I use?
EDIT: Sorry, I didn't include enough background before. I'm storing the data and index file in the chip's flash memory. At the moment I have 30000 records, but this number could grow until the chip's memory limit is hit. The set is static while it is stored on the microcontroller, but it could be updated later with the help of a computer. The data isn't spread evenly between index entries. My goal is to find a good compromise between search speed, index size and RAM used.
I'm not sure where you're stuck, but I can comment on what you've done so far.
Most of all, the way to determine the "best" method is to
define "best" for your purposes;
research indexing algorithms (basic ones have been published for over 50 years);
choose a handful to implement;
evaluate those implementations according to your definition of "best".
Keep in mind your basic resource restriction: you have limited RAM. If a method requires more RAM than you have, it doesn't work, and is therefore infinitely slower than any method that does work.
You've come close to a critical idea, however: you want your index table to expand to consume any free RAM, using that space as effectively as possible. If you can index 16 bits instead of 8 and still fit the table comfortably into your available space, then you've cut down your linear search time by roughly a factor of 256.
Indexing considerations
Don't put the ending value in each row: it's identical to the starting value in the next row. Omit that, and you save one word in each row of the table, giving you twice the table room.
Will you get better performance if you slice the file into equal parts (the same number of BSSIDs for each row of your table), and then store the entire starting BSSID with its record number? If your BSSIDs are heavily clumped, this might improve your overall processing, even though your table has fewer rows. You can't use a direct index in this case; you have to search the first column to get the proper starting point.
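The equal-parts idea can be sketched in Python as follows. This is only an illustration on in-memory lists: build_sparse_index, lookup and the step parameter are names invented here, and on the microcontroller the final scan would read records from flash instead of indexing a list.

```python
import bisect

def build_sparse_index(sorted_keys, step):
    """Keep every step-th key together with its record number."""
    positions = list(range(0, len(sorted_keys), step))
    return [sorted_keys[p] for p in positions], positions

def lookup(sorted_keys, index_keys, index_positions, step, key):
    """Return the record number of key, or None if it is absent."""
    # Search the first column of the index for the last entry <= key.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None                   # key sorts before the first record
    first = index_positions[slot]
    # Scan at most step records starting from that position.
    for rec in range(first, min(first + step, len(sorted_keys))):
        if sorted_keys[rec] == key:
            return rec
        if sorted_keys[rec] > key:
            break
    return None

keys = [3, 8, 15, 16, 23, 42, 57, 61, 77, 90]    # stand-ins for sorted BSSIDs
index_keys, index_positions = build_sparse_index(keys, step=4)
found = lookup(keys, index_keys, index_positions, 4, 42)   # record number 5
```

Because every index row covers the same number of records, the worst-case scan length is fixed at step, no matter how unevenly the keys are distributed.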
Does that move you toward a good solution?
I'm not sure how much memory you have (I am not familiar with that MCU), but don't forget that these tables are static/constant, so they can be stored in EEPROM instead of RAM. Some chips have quite a lot of EEPROM, usually far more than RAM.
Assume your file is sorted by the index. With a 32-bit address, each index entry currently holds:
BYTE ix, DWORD beg,DWORD end
Why not this:
struct entry { DWORD beg,end };
entry ix0[256];
where the first BYTE also serves as the position in the index array. This saves 1 byte per entry.
Now, as Prune suggested, you can ignore the end address, because you will scan the following entries in the file anyway until you hit the correct index or an index with a different first BYTE. So you can use:
DWORD ix[256];
where you have only the start address beg.
We do not know how many entries you actually have, nor how many entries share the same second BYTE of the index, so we cannot make any further assumptions to improve on this.
You wanted to do something like:
DWORD ix[65536];
But have not enough memory for it ... how about doing something like this instead:
const int N=1024; // number of entries you can store
const DWORD dix=(max_index_value+1)/N; // number of index values covered by each entry
const DWORD ix[N]={.....};
so each entry ix[i] will cover all the indexes from i*dix to ((i+1)*dix)-1. So to find index you do this:
i = ix[index/dix];
for (;i<file_size;)
{
read entry from file at i-th position;
update position i;
if (file_index==index) { do your stuff; break; }
if (file_index> index) { index not found; break; }
}
To improve performance you can rewrite this linear scan into binary search between address of ix[index/dix] and ix[(index/dix)+1] or file size for the last index ... assuming each entry in file has the same size ...
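A possible shape for that binary search, sketched in Python. Everything about the record layout is an assumption made for illustration: each record is taken to be a 6-byte binary BSSID followed by two 4-byte floats, and the table is keyed on the first BSSID byte (the 256-entry variant above, plus one end marker). The names REC, build_table and find are hypothetical.

```python
import io
import struct

REC = struct.Struct(">6sff")   # assumed fixed-size record: BSSID, data1, data2

def build_table(keys):
    """ix[b] = number of records whose first BSSID byte is < b (257 entries)."""
    ix = [0] * 257
    for k in keys:
        ix[k[0] + 1] += 1
    for b in range(256):
        ix[b + 1] += ix[b]
    return ix

def find(f, ix, key):
    """Binary search the records between ix[key[0]] and ix[key[0] + 1]."""
    lo, hi = ix[key[0]], ix[key[0] + 1]
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * REC.size)           # fixed size makes the offset a multiplication
        bssid, d1, d2 = REC.unpack(f.read(REC.size))
        if bssid == key:
            return d1, d2
        if bssid < key:
            lo = mid + 1
        else:
            hi = mid
    return None

rows = [(bytes.fromhex("001122334455"), 1.5, 2.5),
        (bytes.fromhex("001122334466"), 3.5, 4.5),
        (bytes.fromhex("02aabbccddee"), 5.5, 6.5)]
data_file = io.BytesIO(b"".join(REC.pack(*r) for r in rows))
table = build_table([r[0] for r in rows])
hit = find(data_file, table, bytes.fromhex("001122334466"))   # (3.5, 4.5)
```

Each lookup reads about log2 of the bucket size in records, so a bucket of 4000 records needs around 12 reads instead of a scan through all of them.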
Suppose I have an int array of 10 elements. With a 64-byte cache line and 4-byte ints, a single line could hold up to 16 such elements, so the whole array fits in one line.
I would like to know what happens when you fetch, for example, arr[5] from the L1 cache into a register. How does this operation take place? Can the cpu pick an offset into a cacheline and read the next n bytes?
The cache will usually provide the full line (64B in this case), and a separate component in the MMU would rotate and cut the result (usually some barrel shifter), according to the requested offset and size. You would usually also get some error checks (if the cache supports ECC mechanisms) along the way.
Note that caches are often organized in banks, so a read may have to fetch bytes from multiple locations. By providing a full line, the cache can construct the bytes in proper order first (and perform the checks), before letting the MMU pick the relevant part.
Some designs focusing on power saving may decide to implement lower granularity, but this is often only adding complexity as you may have to deal with more cases of line segments being split.
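As a software analogy for the "provide the full line, then rotate and cut" step, here is a small Python sketch. It models only the address arithmetic, not real hardware; the base address 0x1000, the 4-byte little-endian ints and the function names are assumptions made for the example.

```python
import struct

LINE_SIZE = 64  # bytes per cache line

def split_address(addr):
    """Split an address into (line base address, offset within the line)."""
    return addr // LINE_SIZE * LINE_SIZE, addr % LINE_SIZE

def load(line, offset, size):
    """Cut size bytes out of a full line and interpret them as a signed integer."""
    return int.from_bytes(line[offset:offset + size], "little", signed=True)

arr = list(range(100, 116))            # 16 four-byte ints fill one line exactly
line = struct.pack("<16i", *arr)       # the 64 bytes the cache would hand over
base, offset = split_address(0x1000 + 5 * 4)   # address of arr[5], array at 0x1000
value = load(line, offset, 4)          # 105, with base == 0x1000 and offset == 20
```

The low 6 bits of the address select the bytes within the line, and the remaining bits identify which line to look up.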
How can I calculate storage when FTPing to MainFrame? I was told LRECL will always remain '80'. Not sure how I can calculate PRI and SEC dynamically based on the file size...
QUOTE SITE LRECL=80 RECFM=FB CY PRI=100 SEC=100
If the site has SMS, you shouldn't need to. If you do need to calculate it, the number of tracks is the size of the file in bytes divided by 56,664, and the number of cylinders is the size of the file in bytes divided by 849,960. In either case, round up.
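The division and rounding can be done on the client before building the SITE command. A minimal Python sketch follows; the function names are made up, and the constants are the per-track and per-cylinder figures quoted above (15 tracks per cylinder).

```python
BYTES_PER_TRACK = 56_664
BYTES_PER_CYLINDER = 849_960   # 15 tracks of 56,664 bytes

def ceil_div(a, b):
    """Integer division rounded up, without going through floats."""
    return -(-a // b)

def tracks_needed(file_size):
    return ceil_div(file_size, BYTES_PER_TRACK)

def cylinders_needed(file_size):
    return ceil_div(file_size, BYTES_PER_CYLINDER)

size = 1_000_000                    # example file size in bytes
tracks = tracks_needed(size)        # 18
cylinders = cylinders_needed(size)  # 2
```

The resulting number would go into PRI, with SEC chosen as whatever safety margin suits the site.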
Unfortunately IBM's FTP server does not support the newer space allocation specifications in number of records (the JCL parameter AVGREC=U/M/K plus the record length as the first specification in the SPACE parameter).
However, there is an alternative, and that is to fall back on one of the lesser-used SPACE parameters - the blocksize specification. I will assume 3390 disk types for simplicity, and standard data sets.
For fixed-length records, you want to calculate the largest multiple of the record length that will fit in half a track (27998 bytes); a full-track block is not an option, because z/OS only supports block sizes up to 32760. Since you are dealing with 80-byte records, that number is 27920. Divide your file size by that number, rounding up, and that will give you the number of blocks. Then in a SITE server command, specify
SITE BLKSIZE=27920 LRECL=80 RECFM=FB BLOCKS=27920 PRI=calculated# SEC=a_little_extra
This is equivalent to SPACE=(27920,(calculated#,a_little_extra)).
z/OS space allocation calculates the number of tracks required and rounds up to the nearest track boundary.
For variable-length records, if your reading application can handle it, always use BLKSIZE=27998. The reason I have the warning about the reading application is that even today there are applications from ISVs that still have strange hard-coded maximum variable length blocks such as 12K.
If you are dealing with PDSEs, always use BLKSIZE=32760 for variable-length and the closest-to-32760 for fixed-length in your specification (32720 for FB/80), but calculate requirements based on BLKSIZE=4096. PDSEs are strange in their underlying layout; the physical records are 4096 bytes, which is because there is some linear data set VSAM code that handles the physical I/O.
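For the fixed-length case, the block count that goes into PRI can be computed as in this small Python sketch. The function name is invented for the example; the defaults are the LRECL=80 and BLKSIZE=27920 values from the SITE command above.

```python
def blocks_needed(file_size, lrecl=80, blksize=27920):
    """Number of blocks needed for file_size bytes of fixed-length records."""
    if blksize % lrecl != 0:
        raise ValueError("for RECFM=FB the block size must be a multiple of LRECL")
    return -(-file_size // blksize)   # integer division, rounded up

primary = blocks_needed(1_000_000)    # 36 blocks for a 1,000,000-byte file
```

The secondary quantity is a judgment call, so it is left out of the calculation here.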