References: 3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253
You are asked to optimize a cache design for the given references. There are
three direct-mapped cache designs possible, all with a total
of 8 words of data: C1 has 1-word blocks, C2 has 2-word blocks, and C3
has 4-word blocks. In terms of miss rate, which cache design is the
best? If the miss stall time is 25 cycles, and C1 has an access time
of 2 cycles, C2 takes 3 cycles, and C3 takes 5 cycles, which is the
best cache design?
Okay, so that's the question I need to answer, and I am kind of confused. I understand how a cache works, and I understand how to determine a hit or a miss from the tag and index and so on. But my question is: how many blocks am I using for these caches? I know we're using 3 different caches with different block sizes, so we can place more addresses into a block; for C2, for example, we can place 2 words in a block, so 2 addresses. But what does it mean when it says "8 words of data"? I am having trouble understanding this question.
I assume that the larger the blocks are, the better the hit rate, since we're able to store more addresses per block. But what does 8 words of data mean exactly? I guess that's my question.
Caches do not contain only data; they also hold bookkeeping information that has to be kept but isn't data in the sense of something that will be retrieved and operated on. With that in mind, "words of data" means 4-byte-long contiguous cache storage segments meant for storing actual data.
The index and tag fields, for example, are not data.
See www.csbio.unc.edu/mcmillan/Media/Comp411S12PS6Sol.pdf and https://cseweb.ucsd.edu/classes/su07/cse141/cache-handout.pdf
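To make the geometry concrete, here is a small simulation sketch (my own, not part of the answer above), assuming the references are word addresses and that "8 words of data" means 8 one-word slots split into blocks of 1, 2 or 4 words. It counts the misses for each design:

    def count_misses(refs, block_words, total_words=8):
        """Simulate a direct-mapped cache with total_words words of data,
        split into blocks of block_words words, over word-address refs."""
        num_blocks = total_words // block_words
        tags = [None] * num_blocks               # one tag per cache block
        misses = 0
        for addr in refs:
            block_addr = addr // block_words     # which memory block holds the word
            index = block_addr % num_blocks      # which cache slot it maps to
            tag = block_addr // num_blocks
            if tags[index] != tag:               # miss: fetch the block, replace the tag
                misses += 1
                tags[index] = tag
        return misses

    refs = [3, 180, 43, 2, 191, 88, 190, 14, 181, 44, 186, 253]
    for name, bw in (("C1", 1), ("C2", 2), ("C3", 4)):
        print(name, count_misses(refs, bw), "misses out of", len(refs))

From the miss counts you can then weigh miss rate against the given access times (hit time + miss rate × 25-cycle stall) to decide which design is best overall.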
I want to assign numbers to people as their ID. As these numbers will have to be remembered, entered manually, and written on paper often, I would like to make sure that the numbers are selected to decrease the risk of misremembering or mistyping one person's ID and by accident entering some other person's ID. IDs must be numerical because they also have to be encoded as a barcode.
Smaller example: For 10 people, using the numbers 1 to 10 makes it easy to mistype, say, 5 instead of 4. Instead, if people were assigned 16, 29, 31, 48, 57, 62, 75, 83, 94, this risk should be reduced: no number has a valid direct neighbor, every digit is unique in its position, and misremembering one ID as another valid one is less likely.
In reality I need to assign numbers to 1000 people using numbers with six digits.
I am looking for an algorithm that can select these numbers. Preferably it would take into consideration both memorability and the risk of writing or entering another valid number by mistake. Unfortunately I cannot immediately describe how to measure these two factors. I was hoping that there exists some standard solution that I am just unable to find for lack of the right keywords.
I have also thought about checksums but they do not work on paper. Assigning a second number to participants so a wrong input can be caught by the mismatch is not feasible and faces similar difficulties on paper.
You are in the realm of error correction and error detection codes. Choose your poison.
The simplest way would be to dedicate one digit as a check digit, which lets you easily detect any single-digit typo. For 1000 people and 6 digits (1,000,000 possible numbers) you can afford 2 such check digits and easily detect 2 typos.
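As one concrete possibility, here is a minimal sketch of a single-check-digit scheme using the well-known Luhn (mod 10) algorithm; the function names and the 5-digit-payload layout are my own assumptions, not something prescribed by the answer. Luhn catches every single-digit typo and most transpositions of adjacent digits:

    def luhn_check_digit(payload: str) -> str:
        """Compute a Luhn (mod-10) check digit for a numeric string."""
        total = 0
        # Walk the payload right to left; the check digit will sit to its
        # right, so the rightmost payload digit gets doubled.
        for i, ch in enumerate(reversed(payload)):
            d = int(ch)
            if i % 2 == 0:            # every second digit, starting with the rightmost
                d *= 2
                if d > 9:
                    d -= 9
            total += d
        return str((10 - total % 10) % 10)

    def make_id(person_number: int) -> str:
        """Turn a sequential number (0..99999) into a 6-digit ID whose
        last digit is a Luhn check digit (hypothetical layout)."""
        payload = f"{person_number:05d}"
        return payload + luhn_check_digit(payload)

    def is_valid(candidate: str) -> bool:
        return (len(candidate) == 6 and candidate.isdigit()
                and luhn_check_digit(candidate[:-1]) == candidate[-1])

    print(make_id(42))          # "000422"
    print(is_valid("000422"))   # True
    print(is_valid("000423"))   # False: single-digit typo detected

If you want two check digits, as suggested above, you would compute a second, independent checksum over the payload; schemes of that kind are covered in the error-detection-code literature.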
We have a whole bunch of machines which use a whole bunch of data stores. We want to transfer all the machines' data to new data stores. These new stores vary in the amount of storage space available to the machines. Furthermore each machine varies in the amount of data it needs stored. All the data of a single machine must be stored on a single data store; it cannot be split. Other than that, it doesn't matter how the data is apportioned.
We currently have more data than we have space, so it is inevitable that some machines will need to have their data left where it is, until we find some more. In the meantime, does anyone know an algorithm (relatively simple: I'm not that smart) that will provide optimal or near-optimal allocation for the storage we have (i.e. the least amount of space left over on the new stores, after allocation)?
I realise this sounds like a homework problem, but I assure you it's real!
At first glance this may appear to be the multiple knapsack problem (http://www.or.deis.unibo.it/knapsack.html, chapter 6.6 "Multiple knapsack problem - Approximate algorithms"), but actually it is a scheduling problem because it involves a time element. Needless to say it is complicated to solve these types of problems. One way is to model them as network flow and use a network flow library like GOBLIN.
In your case, note that you actually do not want to fill the stores optimally, because if you do that, smaller data packages will be more likely to be stored because it will lead to tighter packings. This is bad because if large packages get left on the machines then your future packings will get worse and worse. What you want to do is prioritize storing larger packages, even if that means leaving more extra space on the stores, because then you will gain more flexibility in the future.
Here is how to solve this problem with a simple algorithm:
(1) Determine the bin sizes and sort them. For example, if you have 3 stores with space 20 GB, 45 GB and 70 GB, then your targets are { 20, 45, 70 }.
(2) Sort all the data packages by size. For example, you might have data packages: { 2, 2, 4, 6, 7, 7, 8, 11, 13, 14, 17, 23, 29, 37 }.
(3) If any single package fills at least 95% of a store, put it in that store and go back to step (1). That is not the case here.
(4) Generate all the combinations of two packages.
(5) If any of the combinations sums to at least 95% of a store, put it in that store. If there is a tie, prefer the combination with the bigger package. In my example there are two such pairs: { 37, 8 } = 45 and { 17, 2 } = 19. (Notice that using { 17, 2 } trumps using { 13, 7 }.) If you find one or more matches, go back to step (1).
Okay, now we just have one store left: 70 and the following packages: { 2, 4, 6, 7, 7, 11, 13, 14, 23, 29 }.
(6) Increase the combination size by 1 and go back to step (5). For example, in our case no 3-package combination sums to at least 95% of 70, but the 4-package combination { 29, 23, 14, 4 } = 70 does. At the end we are left with the packages { 2, 6, 7, 7, 11, 13 } still on the machines. Notice these are mostly the smaller packages.
Notice that combinations are tested in reverse lexical order (biggest first). For example, if you have "abcde" where e is the biggest, then the reverse lexical order for 3-element combinations is:
cde
bde
ade
bce
ace
etc.
This algorithm is very simple and will yield a good result for your situation.
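If it helps, here is a rough Python sketch of those steps (my own code, not a library; it assumes a store may also be filled to exactly its capacity). With the example numbers above it reproduces the { 37, 8 }, { 17, 2 } and { 29, 23, 14, 4 } placements:

    from itertools import combinations

    def pack_stores(store_sizes, package_sizes, fill_threshold=0.95):
        """Repeatedly look for the smallest group of packages that fills some
        store to at least fill_threshold of its capacity, preferring groups
        that contain the biggest packages. Returns (assignments, leftover)."""
        stores = sorted(enumerate(store_sizes), key=lambda s: s[1])   # step 1
        packages = sorted(package_sizes, reverse=True)                # step 2, biggest first
        assignments = {i: [] for i, _ in stores}
        remaining_stores = list(stores)

        while remaining_stores and packages:
            placed = False
            # Try groups of 1, 2, 3, ... packages (steps 3-6).
            for group_size in range(1, len(packages) + 1):
                # combinations() over a descending list yields the groups with
                # the biggest packages first ("reverse lexical order").
                for group in combinations(packages, group_size):
                    total = sum(group)
                    for store in remaining_stores:
                        store_idx, capacity = store
                        if fill_threshold * capacity <= total <= capacity:
                            assignments[store_idx].extend(group)
                            remaining_stores.remove(store)
                            for p in group:
                                packages.remove(p)
                            placed = True
                            break
                    if placed:
                        break
                if placed:
                    break
            if not placed:
                break   # nothing fills any remaining store well enough
        return assignments, packages

    stores = [20, 45, 70]
    data = [2, 2, 4, 6, 7, 7, 8, 11, 13, 14, 17, 23, 29, 37]
    print(pack_stores(stores, data))

Be aware that trying every combination is exponential in the worst case, so for a large number of packages you would want to cap the group size or switch to a proper solver.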
Since I am unsure how to phrase the question I will illustrate it with an example that is very similar to what I am trying to achieve.
I am looking for a way to optimize the amount of time it takes to perform the following task.
Suppose I have three sets of numbers labeled "A", "B", and "C", each set containing an arbitrary number of integers.
I receive a stack of orders that ask for a "package" of numbers, each order asking for a particular combination of integers, one from each set. So an order might look like "A3, B8, C1", which means I will need to grab a 3 from set A, an 8 from set B, and a 1 from set C.
The task is simple: grab an order, look at the numbers, then go collect them and put them together into a "package".
It takes a while for me to collect the numbers, and oftentimes an order comes in asking for the same numbers as a previous order, so I decided to store all of the packages for later retrieval; this way, the time it takes to process a duplicate order is dramatically reduced, rather than having to go and collect the same numbers again.
Collecting a number takes quite a long time, but not as long as examining each stored package one by one if I have a lot of orders that day.
So for example if I have the following sets of numbers and orders
set A: [1, 2, 3]
set B: [4, 5, 6, 12, 18]
set C: [7, 8]
Order 1: A1, B6, C7
Order 2: A3, B5, C8
Order 3: A1, B6, C7
I would put together packages for orders 1 and 2, but then I notice that order 3 is a duplicate order so I can choose to just take the package I put together for the first order and finish this last order quickly.
The goal is to optimize the amount of time taken to process a stack of orders. Currently I have come up with two methods, but perhaps there are more ways to do things:
1. Gather the numbers for each order, regardless of whether it's a duplicate or not. I end up with a lot of packages in the end, and in extreme cases where someone places a bulk order for 50 identical packages, it's clearly a waste of time.
2. Check whether the package already exists in a cache, perhaps using some sort of hashing method on the orders.
Any ideas?
There is not much detail given about how you fetch the data to compose packages etc. This makes it hard to come up with different solutions to your problem. For example, maybe existing packages could lead you to the data you need to compose new packages, although they differ in one way or another. For this there are actually dedicated hashing methods available like Locality Sensitive Hashing.
Given the two approaches you came up with, it sounds very natural to go for route 2. Hashing the indices sounds trivial (the first order is easily identified by the number 167, or the string "167", right?), so you would have no real drawback from using a hash, except perhaps memory constraints, since you need to keep old packages around. There are also common methods out there for deciding which packages to keep in the (hashed) cache and which ones to throw away.
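For what it's worth, here is a minimal sketch of route 2 (my own illustration; collect() is a hypothetical stand-in for the slow fetching work, and the eviction policy is just least-recently-used):

    from collections import OrderedDict

    def collect(part):
        # Hypothetical placeholder for the slow step of fetching one number.
        return part

    class PackageCache:
        """Key each order by its combination of parts and keep already
        assembled packages around, evicting the least recently used ones."""

        def __init__(self, max_packages=1000):
            self.max_packages = max_packages
            self._cache = OrderedDict()          # order key -> assembled package

        def get_package(self, order):
            key = tuple(order)                   # e.g. ("A1", "B6", "C7")
            if key in self._cache:
                self._cache.move_to_end(key)     # mark as recently used
                return self._cache[key]
            package = [collect(part) for part in order]   # the slow path
            self._cache[key] = package
            if len(self._cache) > self.max_packages:
                self._cache.popitem(last=False)  # evict the least recently used
            return package

    cache = PackageCache()
    cache.get_package(("A1", "B6", "C7"))   # slow path: assembles the package
    cache.get_package(("A1", "B6", "C7"))   # fast path: served from the cache

Bounding the cache and evicting old entries is one simple answer to the memory concern mentioned above.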
Without knowing the exact timings it is not possible to be definitive, but it looks to me as if your idea 2, using some sort of hash table to store previous orders, is the way to go.
I want to search a text document for occurrences of keyphrases from a database of keyphrases (extracted from Wikipedia article titles). (I.e., given a document, I want to find whether any of the phrases in it have a corresponding Wikipedia article.) I found out about the Aho-Corasick algorithm. I want to know whether building an Aho-Corasick automaton for a dictionary of millions of entries is efficient and scalable.
Let's just make a simple calculation:
Assume that you have 1 million patterns (strings, phrases) with an average length of 10 characters, and a value (label, token, pointer, etc.) of 1 word (4 bytes) assigned to each pattern.
Then you will need an array of (10 + 4) × 1 million = 14 million bytes (14 MB) just to hold the list of patterns.
From 1 million patterns of 10 bytes (letters, chars) each, you could build an AC trie with no more than 10 million nodes. How big this trie is in practice depends on the size of each node.
Each node should keep at least 1 byte for a label (letter), one word (4 bytes) for a pointer to the next node in the trie (or to a pattern for a terminal node), plus 1 bit (boolean) to mark a terminal node:
about 5 bytes in total.
So, for a minimal trie over 1 million patterns of 10 chars each, you will need at least 10 million × 5 = 50 million bytes, or about 50 MB of memory.
In practice it might be 3-10 times more, but even that is very manageable, as 500 MB of memory is quite moderate today. (Compare it with Windows applications like Word or Outlook.)
In terms of speed the Aho-Corasick (AC) algorithm is almost unbeatable; it still remains the best multiple-pattern-matching algorithm ever. That's my strong personal, educated opinion, academic claims aside.
All reports of "new" latest and greatest algorithms that might outperform AC are highly exaggerated (except maybe for some special cases with short patterns like DNA)
The only practical improvement to AC lies along the lines of more and faster hardware (multiple cores, faster CPUs, clusters, etc.).
Don't take my word for it, test it for yourself. But remember that the real speed of AC strongly depends on the implementation (language and quality of coding).
In theory, it should maintain linear speed subject only to the effects of the memory hierarchy - it will slow down as it gets too big to fit in cache, and when it gets really big, you'll have problems if it starts getting paged out.
OTOH the big win with Aho-Corasick is when searching for decent sized substrings that may occur at any possible location within the string being fed in. If your text document is already cut up into words, and your search phrases are no more than e.g. 6 words long, then you could build a hash table of K-word phrases, and then look up every K-word contiguous section of words from the input text in it, for K = 1..6.
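A rough sketch of that hash-table alternative (my own illustration; the tokenisation and the phrase set are assumptions):

    def find_known_phrases(document_words, phrase_set, max_phrase_words=6):
        """Slide a window of 1..max_phrase_words words over the document and
        look each candidate phrase up in a hash set of known phrases."""
        found = set()
        n = len(document_words)
        for k in range(1, max_phrase_words + 1):
            for i in range(n - k + 1):
                candidate = " ".join(document_words[i:i + k])
                if candidate in phrase_set:      # O(1) average hash lookup
                    found.add(candidate)
        return found

    titles = {"binary search", "aho-corasick algorithm", "hash table"}
    doc = "we could use a hash table instead of the aho-corasick algorithm".split()
    print(find_known_phrases(doc, titles))       # finds "hash table" and "aho-corasick algorithm"

This does O(K × number of words) lookups, which is fine when phrases are short and the document is already tokenised.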
(Answer to comment)
Aho-Corasick needs to live in memory, because you will be following pointers all over the place. If you have to work outside memory, it's probably easiest to go back to old-fashioned sort/merge. Create a file of K-word records from the input data, where K is the maximum number of words in any phrase you are interested in. Sort it, and then merge it against a file of sorted Wikipedia phrases. You can probably do this almost by hand on Unix/Linux, using utilities such as sort and join, and a bit of shell/awk/perl/whatever. See also http://en.wikipedia.org/wiki/Key_Word_in_Context (I'm old enough to have actually used one of these indexes, provided as bound pages of computer printout).
Well, there is a workaround: write the built AC trie of the dictionary into a text file in an XML-like format, make an index file for the first 6 levels of that trie, and so on. In my tests I search for all partial matches of a sentence in the dictionary (500,000 entries), and I get ~150 ms for ~100 results for a sentence of 150-200 symbols.
For more details, check out this paper : http://212.34.233.26/aram/IJITA17v2A.Avetisyan.doc
There are other ways to get performance:
- condense state transitions: you can get them down to 32 bits.
- ditch the pointers; write the state transitions to a flat vector.
- pack nodes near the tree root together: they will be in cache.
The implementation takes about 3 bytes per char of the original pattern set, and for 32-bit nodes can handle a pattern space of about 10M chars. For 64-bit nodes, I have yet to hit (or figure out) the limit. (A toy sketch of the flat-vector idea follows the links below.)
Doc: https://docs.google.com/document/d/1e9Qbn22__togYgQ7PNyCz3YzIIVPKvrf8PCrFa74IFM/view
Src: https://github.com/mischasan/aho-corasick
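To make the flat-vector idea concrete, here is a toy sketch (my own, unrelated to the library linked above, and covering only the trie/goto part, not the Aho-Corasick failure links): nodes live in one contiguous list of rows, and a transition is just an integer index into that list, so early-allocated nodes near the root naturally sit together.

    def build_flat_trie(patterns):
        """Build a byte-labelled trie whose transitions are rows in one flat
        list; an index into the list replaces a node pointer."""
        ALPHABET = 256
        goto = [[-1] * ALPHABET]                 # row 0 is the root
        terminal = [False]
        for pat in patterns:
            state = 0
            for ch in pat.encode():
                if goto[state][ch] == -1:
                    goto.append([-1] * ALPHABET) # allocate the next free row
                    terminal.append(False)
                    goto[state][ch] = len(goto) - 1
                state = goto[state][ch]
            terminal[state] = True
        return goto, terminal

    def matches_prefix(goto, terminal, text):
        """Follow transitions from the root; report a pattern anchored at offset 0."""
        state = 0
        for ch in text.encode():
            state = goto[state][ch]
            if state == -1:
                return False
            if terminal[state]:
                return True
        return False

    goto, term = build_flat_trie(["he", "she", "his", "hers"])
    print(matches_prefix(goto, term, "hers"))    # True ("he" is found)

A real implementation would compress the 256-wide rows and add the failure transitions, but the storage idea is the same.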
Which data structure(s) are used in the implementation of editors like Notepad? The data structure should be extensible, and should support various features like editing, deletion, scrolling, selection of a range of text, etc.
We wrote an editor for an old machine (keep in mind that this was a while ago, about 1986, so this is from memory, and the state of the art may have advanced somewhat since then) which we managed to get to scream along, performance wise, by using fixed memory blocks from self-managed pools.
It had two pools, each containing a fixed number of specific-sized blocks (one pool was for line structures, the other for line-segment structures). It was basically a linked list of linked lists.
Memory was pre-allocated (for each region) from a 'malloc()'-like call, and we used 65,535 blocks (0 through 65,534 inclusive, block number 65,535 was considered the null block, an end-of-list indicator).
This allowed for 65,535 lines (384K or 512K for the padded version) and about 1.6G of file size (taking 2G of allocated space), which was pretty big back then. That was the theoretical file size limit - I don't think we ever approached that in reality, since we never allocated the full set of line-segment structures.
Not having to call malloc() for every little block of memory gave us a huge speed increase, especially as we could optimise our own memory allocation routines for fixed size blocks (including inlining the calls in the final optimised version).
The structures in the two pools were as follows (with each line in the diagram representing a single byte):
Line structure (6/8 bytes) Line-segment structure (32 bytes)
+--------+ +--------+
|NNNNNNNN| |nnnnnnnn|
|NNNNNNNN| |nnnnnnnn|
|PPPPPPPP| |pppppppp|
|PPPPPPPP| |pppppppp|
|bbbbbbbb| |LLLLLLLL|
|bbbbbbbb| |LLLLLLLL|
|........| |xxxxxxxx|
|........| :25 more :
+--------+ : x lines:
+--------+
where:
Lower-case letters other than x point to the line segment pool.
Upper-case letters point to the line pool.
N was a block number for the next line (null meaning this was the last line in the file).
P was the block number for the previous line (null meaning this was the first line in the file).
b was the block number for the first line segment in that line (null meaning the line was empty).
. was reserved padding (to bump the structure out to 8 bytes).
n was the block number for the next line segment (null meaning this was the last segment in the line).
p was the block number for the previous line segment (null meaning this was the first segment in the line).
L was the block number for the segment's line block.
x was the 26 characters in that line segment.
The reason the line structure was padded was to speed up the conversion of block numbers into actual memory locations (shifting left by 3 bits was much faster than multiplying by 6 on that particular architecture, and the extra memory used was only 128K, minimal compared to the total storage used), although we did provide the slower version for those who cared more about memory.
We also had an array of 100 16-bit values which contained the line segment (and line number so we could quickly go to specific lines) at roughly that percentage (so that array[7] was the line that was roughly 7% into the file) and two free pointers to maintain the free list in each pool (this was a very simple one way list where N or n in the structure indicated the next free block and free blocks were allocated from, and put back to, the front of these lists).
There was no need to keep a count of the characters in each line segment since 0-bytes were not valid in files. Each line segment was allowed to have 0-bytes at the end that were totally ignored. Lines were compressed (i.e., line segments were combined) whenever they were modified. This kept block usage low (without infrequent and lengthy garbage collection) and also greatly sped up search-and-replace operations.
The use of these structures allowed very fast editing, insertion, deletion, searching and navigation around the text, which is where you're likely to get most of your performance problems in a simple text editor.
The use of selections (we didn't implement this as it was a text mode editor that used vi-like commands such as 3d to delete 3 lines or 6x to delete 6 characters) could be implemented by having a {line#/block, char-pos} tuple to mark positions in the text, and use two of those tuples for a selection range.
Check out Ropes. Handles fast insert/delete/edit of strings. Ranges are usually supported in Rope implementations, and scrolling can be done with an inverted index into the rope.
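A very small rope sketch (my own illustration), showing the core trick: each internal node stores the length of its left subtree, so indexing walks left or right by comparing against that weight:

    class Leaf:
        def __init__(self, text):
            self.text = text
            self.length = len(text)

        def index(self, i):
            return self.text[i]

    class Node:
        def __init__(self, left, right):
            self.left, self.right = left, right
            self.weight = left.length            # length of the left subtree
            self.length = left.length + right.length

        def index(self, i):
            if i < self.weight:
                return self.left.index(i)
            return self.right.index(i - self.weight)

    rope = Node(Leaf("hello "), Node(Leaf("wor"), Leaf("ld")))
    print("".join(rope.index(i) for i in range(rope.length)))   # "hello world"

Insert and delete then become cheap split/concatenate operations on the tree instead of copying one big array.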
Wikipedia says many editors use a Gap Buffer. It is basically an array with an unused space in the middle. The cursor sits just before the gap, so deletion and insertion at the cursor is O(1). It should be pretty easy to implement.
Looking at the source code of Notepad++ (as Chris Ballance suggested in this thread here) shows that they also use a gap buffer. You could get some implementation ideas from that.
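Here is a minimal gap-buffer sketch (my own illustration): the text sits in one list with an unused gap at the cursor, so typing and deleting at the cursor are O(1), and moving the cursor just shifts characters across the gap.

    class GapBuffer:
        def __init__(self, text="", capacity=64):
            self.buf = list(text) + [None] * capacity
            self.gap_start = len(text)           # cursor position
            self.gap_end = len(self.buf)         # one past the end of the gap

        def move_cursor(self, pos):
            """Slide the gap so that it starts at pos."""
            while self.gap_start > pos:          # move gap left
                self.gap_start -= 1
                self.gap_end -= 1
                self.buf[self.gap_end] = self.buf[self.gap_start]
            while self.gap_start < pos:          # move gap right
                self.buf[self.gap_start] = self.buf[self.gap_end]
                self.gap_start += 1
                self.gap_end += 1

        def insert(self, ch):
            if self.gap_start == self.gap_end:   # gap exhausted: grow it
                grow = max(64, len(self.buf))
                self.buf[self.gap_end:self.gap_end] = [None] * grow
                self.gap_end += grow
            self.buf[self.gap_start] = ch
            self.gap_start += 1

        def delete(self):
            """Delete the character just before the cursor (backspace)."""
            if self.gap_start > 0:
                self.gap_start -= 1

        def text(self):
            return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])

    gb = GapBuffer("hello world")
    gb.move_cursor(5)
    gb.insert(",")
    print(gb.text())                             # "hello, world"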
There is an excellent article about Piece Chains by James Brown, author of HexEdit.
In a nutshell: Piece chains allow you to record the changes made to the text. After loading, you have a piece chain that spans the entire text. Now you insert somewhere in the middle.
Instead of allocating a new buffer, copying the text around, etc., you create two new pieces and modify the existing one: The existing one now contains the text up to the insertion point (i.e. you just change the length of the piece), then you have a piece with the new text and after that a new piece with all the text after the insertion. The original text is left unchanged.
For undo/redo, you simply remember which pieces you added/removed/changed.
The most complex area when using piece chains is that there is no longer a 1:1 mapping between an offset in the visible text and the memory structure. You either have to search the chain or you must maintain a binary tree structure of some kind.
Check out the implementation of Notepad++, you can view the source on SourceForge
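For illustration, a tiny piece-table sketch (my own, much simplified compared to the article): the original text is never modified, and an insert only splits one piece and adds a new piece pointing into an append-only buffer.

    class PieceTable:
        def __init__(self, original):
            self.original = original             # read-only buffer
            self.added = ""                      # append-only buffer for new text
            # Each piece: (buffer name, start offset, length).
            self.pieces = [("original", 0, len(original))]

        def insert(self, offset, text):
            new_piece = ("added", len(self.added), len(text))
            self.added += text
            pos = 0
            for i, (buf, start, length) in enumerate(self.pieces):
                if pos <= offset <= pos + length:
                    split = offset - pos
                    before = (buf, start, split)
                    after = (buf, start + split, length - split)
                    # Replace one piece with up to three; drop empty pieces.
                    self.pieces[i:i + 1] = [p for p in (before, new_piece, after) if p[2] > 0]
                    return
                pos += length

        def text(self):
            bufs = {"original": self.original, "added": self.added}
            return "".join(bufs[b][s:s + l] for b, s, l in self.pieces)

    pt = PieceTable("the quick fox")
    pt.insert(10, "brown ")
    print(pt.text())                             # "the quick brown fox"

Undo then amounts to remembering which pieces an edit added or replaced, exactly as described above.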
The usual thing is to have something like a list or array of arrays of characters. There has been a lot of work done on this over the years; you might have a look around with a Google search.