How many blocks in this cache?


Using the sequence of references from Exercise 5.2, show the final cache contents for a three-way set associative cache with two-word blocks and a total size of 24 words.
Here is the problem: how many "blocks" are in this cache?
I think a block has 2 words, so there should be 12 blocks, and in a three-way cache each way contains 4 blocks.
But the solution says there are 24/3 = 8 blocks per way.
Am I wrong, or is the solution incorrect?

Here is the problem, how many "blocks" in this cache.
24 words / 2 words-per-block = 12 blocks.
Blocks per way
One way of one set is one cache line, aka 1 block, which we're told is 2 words for this cache. One of the 3 ways across all sets is 24 words / 3 ways = 8 words = 4 blocks. You're correct.
That's an odd way to describe the total number of sets, but it is the same thing. You could also calculate it as 12 blocks / 3 blocks-per-set = 4 sets. Each way of a set is by definition 1 block, so 3-way set associative means 3 blocks per set. So 24 words / (2 words/block * 3 blocks/set) = 24 words / (6 words/set) = 4 sets.
What textbook is this? It's not CS:APP 3e global edition, is it? The practice problems in that edition of the textbook were replaced by incompetent people hired by the publisher, see CS:APP example uses idivq with two operands?
8 words per way (the size of one way, i.e. 4 sets * 2 words per block) is the distance between word addresses that index the same set, so accesses that far apart will alias each other and potentially generate a lot more conflict misses.
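The arithmetic above can be checked with a small sketch. The function names are made up for illustration, and the set index assumes the usual mapping of block number modulo the number of sets, which the exercise does not spell out here.

```python
def cache_geometry(total_words, words_per_block, ways):
    """Return (blocks, sets, words_per_way) for a set-associative cache."""
    blocks = total_words // words_per_block   # 24 / 2 = 12 blocks
    sets = blocks // ways                     # 12 / 3 = 4 sets
    words_per_way = total_words // ways       # 24 / 3 = 8 words
    return blocks, sets, words_per_way


def set_index(word_address, words_per_block, sets):
    """Set a word address maps to: block number modulo the set count (assumed mapping)."""
    return (word_address // words_per_block) % sets


blocks, sets, words_per_way = cache_geometry(24, 2, 3)
print(blocks, sets, words_per_way)   # 12 4 8

# Two word addresses exactly one way-size (8 words) apart land in the same set.
print(set_index(3, 2, sets), set_index(3 + words_per_way, 2, sets))   # 1 1
```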

Related

How to get powerset from a set with 3600 elements using as little memory as possible

I have been looking for a language and code to help me calculate all possible subsets of a set of 3600 elements. My search started with Python, then went through JavaScript, and finally came to Perl. I know that using Perl to calculate all subsets as shown in https://rosettacode.org/wiki/Power_set consumes a significant amount of memory even with 16GB of RAM, but I'm not sure whether there is anything better than Perl or the script below:
MY MWE:
use ntheory "forcomb";
my @S = qw/1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30/;
forcomb { print "[@S[@_]] " } scalar(@S);
print "\n";

No calculator can hold that many elements in memory.
The number of possible subsets of a set of 3600 elements is 2^3600.
This number is very big. Consider that
2^10 is close to 1.000
2^20 is close to 1.000.000
2^30 is close to 1.000.000.000
Basically, every 10 you add to the exponent adds three zeros, so 2^3600 is a number with roughly 360 * 3 = 1080 digits, which is an unimaginably big number of combinations.
You can't solve this problem even by saving the data to disk and using every computer on Earth.
Take all the computers on Earth (a number close to 2.000.000.000, so about 2^31 computers) and give each of them a terabyte of disk space (2^40 bytes): that is 2^71 bytes in total, enough to store the subsets of a set of 71 elements (71, not 3600) using a single byte per subset, and that ignores the extra space needed to store the set information itself. Draw your own conclusions from that.
You could instead define a sort order over all the possible subsets and write an algorithm that gives you the nth subset in that order. This works because you don't need to calculate and store all possible subsets; you calculate just one of them by rule. If you are interested, we can try to evaluate such a solution.
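A minimal sketch of that "nth subset" idea, assuming the simplest possible ordering: read n as a bit mask, where bit i decides whether the i-th element is included. The function name `nth_subset` is made up for illustration, and other orderings (for example by subset size) would need a different rule.

```python
def nth_subset(elements, n):
    """Return the n-th subset of `elements`; bit i of n selects elements[i]."""
    if not 0 <= n < 2 ** len(elements):
        raise ValueError("n must be in the range [0, 2**len(elements))")
    return [x for i, x in enumerate(elements) if (n >> i) & 1]


print(nth_subset([1, 2, 3], 5))   # [1, 3]  (5 is binary 101)

# Works for 3600 elements too, because only one subset is ever materialised.
elements = list(range(1, 3601))
print(len(nth_subset(elements, 2 ** 3600 - 1)))   # 3600
```

Each call costs time proportional to the number of elements, so any single subset is cheap to produce; it is only the enumeration of all of them that is out of reach.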

For a set s (with size |s|), the size of its power set P(s) is |P(s)| = 2^|s|.
Never mind the memory. You'd need 2^3600 iterations just to visit every subset once.
This is totally computationally intractable in this universe.
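To put a number on "intractable", here is a quick check using Python's arbitrary-precision integers. The rate of one billion subsets per second is a made-up figure, chosen only to show the scale.

```python
subset_count = 2 ** 3600            # |P(s)| for |s| = 3600
digits = len(str(subset_count))
print(digits)                       # 1084 decimal digits

# Hypothetical machine visiting 10**9 subsets per second.
rate = 10 ** 9
seconds_per_year = 365 * 24 * 3600
years = subset_count // (rate * seconds_per_year)
print(len(str(years)))              # number of digits in the year count
```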

Take Java (or another compiled language like Pascal with some bit support).
It has BitSet, so 3600 elements are represented with approximately 3600/8 = 450 bytes. All possibilities would be 2^3600: too many to iterate. One could iterate with a BigInteger, with every ith bit representing an element.
Simply iterating with a BigInteger up to 2^3600 - 1 should make it your (descendants') life's work. Be aware that this kind of problem is something for quantum computing.
But I assume you have a very smart algorithm pruning most possibilities.
It would be nice to have dependencies like in sudoku. Then maybe a logic language or some rule engine might do.
Should 3600 be the seconds in an hour which you have to combine, please consider spending that hour otherwise. 😉

Wikipedia pages co-edit graph extraction using Hadoop

I am trying to build the graph of Wikipedia co-edited pages using Hadoop. The raw data contains the list of edits, i.e. it has one row per edit telling who edited what:
# revisionId pageId userId
1 1 10
2 1 11
3 2 10
4 3 10
5 4 11
I want to extract a graph in which each node is a page, and there is a link between two pages if at least one editor edited both pages. For the above example, the output would be:
# edges: pageId1,pageId2
1,2
1,3
1,4
2,3
I am far from being an expert in Map/Reduce, but I think this has to be done in two jobs:
The first job extracts the list of edited pages for each user.
# userId pageId1,pageId2,...
10 1,2,3
11 1,4
The second job takes the output above, and simply generates all pairs of pages that each user edited (these pages have thus been edited by the same user, and will therefore be linked in the graph). As a bonus, we can actually count how many users co-edited each page, to get the weight of each edge.
# pageId1,pageID2 weight
1,2 1
1,3 1
1,4 1
2,3 1
I implemented this using Hadoop, and it works. The problem is that the map phase of the second job is really slow (actually, the first 30% are OK, but then it slows down quite a lot). The reason I came up with is that because some users have edited many pages, the mapper has to generate a lot of these pairs as outputs. Hadoop thus has to spill to disk, rendering the whole thing pretty slow.
My questions are thus the following:
For those of you who have more experience than I with Hadoop: am I doing it wrong? Is there a simpler way to extract this graph?
Can disk spills be the reason why the map phase of the second job is pretty slow? How can I avoid this?
As a side note, this runs fine with a small sample of the edits. It only gets slow with GBs of data.

Apparently, this is a common problem known as combinations/cross-correlation/co-occurrences, and there are two patterns to solve it using Map/Reduce, Pairs or Stripes:
Map Reduce Design Patterns :- Pairs & Stripes
MapReduce Patterns, Algorithms, and Use Cases (Cross-correlation section)
Pairs and Stripes
The way I presented in my question is the pairs approach, which usually generates much more data. The stripes approach benefits more from a combiner, and gave better results in my case.
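A rough, single-process sketch of the stripes idea, using the example data from the question. This is not Hadoop code: `map_stripes` and `co_edit_graph` are hypothetical names, and the merge step only imitates what a combiner or reducer would do with the stripes emitted for the same page.

```python
from collections import Counter, defaultdict


def map_stripes(user_pages):
    """Mapper: for one user's pages, emit (page, stripe) with the co-edited pages."""
    pages = sorted(set(user_pages))
    for i, page in enumerate(pages):
        stripe = Counter(pages[i + 1:])   # only higher ids, so each edge appears once
        if stripe:
            yield page, stripe


def co_edit_graph(pages_by_user):
    """Merge all stripes per page (the combiner/reducer step) into weighted edges."""
    merged = defaultdict(Counter)
    for pages in pages_by_user.values():
        for page, stripe in map_stripes(pages):
            merged[page].update(stripe)   # element-wise sum of counts
    return {(a, b): w for a, stripe in merged.items() for b, w in stripe.items()}


edges = co_edit_graph({10: [1, 2, 3], 11: [1, 4]})
print(sorted(edges.items()))
# [((1, 2), 1), ((1, 3), 1), ((1, 4), 1), ((2, 3), 1)]
```

The mapper emits one record per page instead of one per pair, and stripes for the same page can be summed before they leave the map side, which is why a combiner helps more here than with pairs.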

sorting cards with wildcards

i am programming a card game and i need to sort a stack of cards by their rank. so that they form a gapless sequence.
in this special game the card with value 2 could be used as a wild card, so for example the cards
2 3 5
should be sorted like this
3 2 5
because the 2 replaces the 4, otherwise it would not be a valid sequence.
however the cards
2 3 4
should stay like they are.
restriction: there can be only one '2' used as a wildcard.
2 2 3 4
would also stay like it is, because the first 2 would replace the ACE (or 1, whatever you call it).
the following would not be a valid input sequence, since one of the 2s must be used as a wildcard and one not. it is not possible to make up a gapless sequence then.
2 4 2 6
now i have a difficulty to figure out if a 2 is used as a wildcard or not. once i got that, i think i can do the rest of the sorting
thanks for any algorithmic help on this problem!

EDIT in response to your clarification to your new requirement:
You're implying that you'll never get data for which a gapless sequence cannot be formed. (If only I could have such guarantees in the real world.) So:
Do you have a 2?
No: your sequence must already be gapless.
Yes: You need to figure out where to put it.
Sort your input. Do you see a gap? Since you can only use one 2 as a wildcard, there can be at most one gap.
No: treat the 2 as a legitimate number two.
Yes: move the 2 to the gap to fill it in.
EDIT in response to your new requirement:
In this case, just look for the highest single gap, and plug it with a 2 if you have a 2 available.
Original answer:
Since your sequence must be gapless, you could count the number of 2s you have and the sizes of all the gaps that are present. Then just fill in the highest gap for which you have a sufficient number of 2s.
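One possible reading of the steps above, as a sketch: set one 2 aside, sort the rest, and use the spare 2 only if exactly one rank is missing. The function name `arrange` and the choice to return None for input that cannot form a run are assumptions, not part of the question.

```python
def arrange(cards):
    """Order cards into a gapless run, using at most one 2 as a wildcard.

    Returns the ordered list, or None if no gapless run exists.
    """
    rest = sorted(cards)
    has_two = 2 in rest
    if has_two:
        rest.remove(2)   # set one 2 aside as the candidate wildcard

    # Positions i where rest[i] does not directly follow rest[i - 1].
    breaks = [i for i in range(1, len(rest)) if rest[i] - rest[i - 1] != 1]

    if not breaks:
        # No gap: the spare 2 goes in front, either as a real 2
        # or standing in for the card just below the run.
        return ([2] if has_two else []) + rest

    if has_two and len(breaks) == 1 and rest[breaks[0]] - rest[breaks[0] - 1] == 2:
        i = breaks[0]
        return rest[:i] + [2] + rest[i:]   # the wildcard fills the single missing rank

    return None


print(arrange([2, 3, 5]))      # [3, 2, 5]
print(arrange([2, 3, 4]))      # [2, 3, 4]
print(arrange([2, 2, 3, 4]))   # [2, 2, 3, 4]
print(arrange([2, 4, 2, 6]))   # None
```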

Scalability of aho corasick

I want to search a text document for occurrences of keyphrases from a database of keyphrases (extracted from wikipedia article titles). (ie. given a document i want to find whether any of the phrases have a corresponding wikipedia article) I found out about the Aho-Corasick algorithm. I want to know if building an Aho-Corasick automaton for dictionary of millions of entries is efficient and scalable.

Let's do a simple calculation:
Assume that you have 1 million patterns (strings, phrases) with an average length of 10 characters, and a value (label, token, pointer, etc.) of 1 word (4 bytes) assigned to each pattern.
Then you will need an array of 10+4=14 million bytes (14 MB) just to hold the list of patterns.
From 1 million patterns of 10 bytes (letters, chars) each you could build an AC trie with no more than 10 million nodes. How big this trie is in practice depends on the size of each node.
Each node should keep at least 1 byte for a label (letter) and a word (4 bytes) for a pointer to the next node in the trie (or to a pattern, for a terminal node), plus 1 bit (boolean) to mark a terminal node,
for a total of about 5 bytes.
So, for a trie of 1 million patterns of 10 chars each, you will need a minimum of 50 million bytes, or about 50 MB of memory.
In practice it might be 3-10 times more, but that is still very manageable, as even 500 MB of memory is very moderate today. (Compare it with Windows applications like Word or Outlook.)
Given that the Aho-Corasick (AC) algorithm is almost unbeatable in terms of speed, it remains the best algorithm for multiple-pattern matching. That's my strong, personal, educated opinion, apart from academic garbage.
All reports of "new" latest and greatest algorithms that might outperform AC are highly exaggerated (except maybe for some special cases with short patterns, like DNA).
In practice, the only improvement to AC would come from more and faster hardware (multiple cores, faster CPUs, clusters, etc.).
Don't take my word for it; test it for yourself. But remember that the real speed of AC strongly depends on the implementation (language and quality of coding).

In theory, it should maintain linear speed subject only to the effects of the memory hierarchy - it will slow down as it gets too big to fit in cache, and when it gets really big, you'll have problems if it starts getting paged out.
OTOH the big win with Aho-Corasick is when searching for decent sized substrings that may occur at any possible location within the string being fed in. If your text document is already cut up into words, and your search phrases are no more than e.g. 6 words long, then you could build a hash table of K-word phrases, and then look up every K-word contiguous section of words from the input text in it, for K = 1..6.
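A small sketch of that hash-table alternative, assuming naive tokenisation (lower-casing and splitting on whitespace, no punctuation handling). The function name `find_phrases` and the sample titles are made up for illustration.

```python
def find_phrases(text, phrases, max_words=6):
    """Return (start_word_index, phrase) for every dictionary phrase found in text."""
    # Normalise the dictionary once; a set gives O(1) average-case lookups.
    phrase_set = {" ".join(p.lower().split()) for p in phrases}
    words = text.lower().split()
    hits = []
    for start in range(len(words)):
        for k in range(1, max_words + 1):      # try every K-word window, K = 1..max_words
            if start + k > len(words):
                break
            candidate = " ".join(words[start:start + k])
            if candidate in phrase_set:
                hits.append((start, candidate))
    return hits


titles = ["Aho-Corasick algorithm", "hash table", "trie"]
text = "a trie or a hash table can back the Aho-Corasick algorithm"
print(find_phrases(text, titles))
# [(1, 'trie'), (4, 'hash table'), (9, 'aho-corasick algorithm')]
```

The work per document is at most max_words lookups per word position, independent of how many phrases are in the dictionary.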
(Answer to comment)
Aho-Corasick needs to live in memory, because you will be following pointers all over the place. If you have to work outside memory, it's probably easiest to go back to old-fashioned sort/merge. Create a file of K-word records from the input data, where K is the maximum number of words in any phrase you are interested in. Sort it, and then merge it against a file of sorted Wikipedia phrases. You can probably do this almost by hand on Unix/Linux, using utilities such as sort and join, and a bit of shell/awk/perl/whatever. See also http://en.wikipedia.org/wiki/Key_Word_in_Context (I'm old enough to have actually used one of these indexes, provided as bound pages of computer printout).

Well, there is a workaround: write the built AC trie of the dictionary to a text file in an XML-like format, make an index file for the first 6 levels of that trie, etc. In my tests I search for all partial matches of a sentence in the dictionary (500'000 entries), and I get ~150ms for ~100 results for a sentence of 150-200 symbols.
For more details, check out this paper : http://212.34.233.26/aram/IJITA17v2A.Avetisyan.doc

There are other ways to get performance:
- condense state transitions: you can get them down to 32 bits.
- ditch the pointers; write the state transitions to a flat vector.
- pack nodes near the tree root together: they will be in cache.
The implementation takes about 3 bytes per char of the original pattern set,
and for 32-bit nodes, can take a pattern space of about 10M chars.
For 64-bit nodes, have yet to hit (or figure) the limit.
Doc: https://docs.google.com/document/d/1e9Qbn22__togYgQ7PNyCz3YzIIVPKvrf8PCrFa74IFM/view
Src: https://github.com/mischasan/aho-corasick

Finding sets that have specific subsets

I am a graduate student of physics and I am working on writing some code to sort several hundred gigabytes of data and return slices of that data when I ask for it. Here is the trick, I know of no good method for sorting and searching data of this kind.
My data essentially consists of a large number of sets of numbers. These sets can contain anywhere from 1 to n numbers within them (though in 99.9% of the sets, n is less than 15) and there are approximately 1.5 ~ 2 billion of these sets (unfortunately this size precludes a brute force search).
I need to be able to specify a set with k elements and have every set with k+1 elements or more that contains the specified subset returned to me.
Simple Example:
Suppose I have the following sets for my data:
(1,2,3)
(1,2,3,4,5)
(4,5,6,7)
(1,3,8,9)
(5,8,11)
If I were to give the request (1,3) I would have the sets: (1,2,3),
(1,2,3,4,5), and (1,3,8,9).
The request (11) would return the set: (5,8,11).
The request (1,2,3) would return the sets: (1,2,3) and (1,2,3,4,5)
The request (50) would return no sets.
By now the pattern should be clear. The major difference between this example and my data is that the sets within my data are larger, the numbers used for each element of the sets run from 0 to 16383 (14 bits), and there are many, many more sets.
If it matters I am writing this program in C++ though I also know java, c, some assembly, some fortran, and some perl.
Does anyone have any clues as to how to pull this off?
edit:
To answer a couple questions and add a few points:
1.) The data does not change. It was all taken in one long set of runs (each broken into 2 gig files).
2.) As for storage space: the raw data takes up approximately 250 gigabytes. I estimate that after processing and stripping off a lot of extraneous metadata that I am not interested in, I could knock that down to anywhere from 36 to 48 gigabytes, depending on how much metadata I decide to keep (without indices). Additionally, if in my initial processing of the data I encounter enough sets that are the same, I might be able to compress the data yet further by adding counters for repeat events rather than simply repeating the events over and over again.
3.) Each number within a processed set actually contains at LEAST two numbers 14 bits for the data itself (detected energy) and 7 bits for metadata (detector number). So I will need at LEAST three bytes per number.
4.) My "though in 99.9% of the sets, n is less than 15" comment was misleading. In a preliminary glance through some of the chunks of the data I find that I have sets that contain as many as 22 numbers but the median is 5 numbers per set and the average is 6 numbers per set.
5.) While I like the idea of building an index of pointers into files I am a bit leery because for requests involving more than one number I am left with the semi slow task (at least I think it is slow) of finding the set of all pointers common to the lists, ie finding the greatest common subset for a given number of sets.
6.) In terms of resources available to me, I can muster approximately 300 gigs of space after I have the raw data on the system (The remainder of my quota on that system). The system is a dual processor server with 2 quad core amd opterons and 16 gigabytes of ram.
7.) Yes 0 can occur, it is an artifact of the data acquisition system when it does but it can occur.

Your problem is the same as that faced by search engines. "I have a bajillion documents. I need the ones which contain this set of words." You just have (very conveniently), integers instead of words, and smallish documents. The solution is an inverted index. Introduction to Information Retrieval by Manning et al is (at that link) available free online, is very readable, and will go into a lot of detail about how to do this.
You're going to have to pay a price in disk space, but it can be parallelized, and should be more than fast enough to meet your timing requirements, once the index is constructed.
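A toy, in-memory sketch of an inverted index over the example sets from the question. The real data would need on-disk posting lists; `build_index` and `query` are illustrative names, not taken from the book mentioned above.

```python
from collections import defaultdict


def build_index(sets):
    """Map each value to the list of ids of the sets that contain it."""
    index = defaultdict(list)
    for set_id, members in enumerate(sets):
        for value in set(members):
            index[value].append(set_id)   # ids are appended in increasing order
    return index


def query(index, sets, wanted):
    """Return every stored set that contains all values in `wanted`."""
    postings = sorted((index.get(v, []) for v in wanted), key=len)
    if not postings:
        return []
    result = set(postings[0])             # start from the rarest value
    for plist in postings[1:]:
        result &= set(plist)
    return [sets[i] for i in sorted(result)]


data = [(1, 2, 3), (1, 2, 3, 4, 5), (4, 5, 6, 7), (1, 3, 8, 9), (5, 8, 11)]
index = build_index(data)
print(query(index, data, (1, 3)))   # [(1, 2, 3), (1, 2, 3, 4, 5), (1, 3, 8, 9)]
print(query(index, data, (11,)))    # [(5, 8, 11)]
print(query(index, data, (50,)))    # []
```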

Assuming a random distribution of 0-16383, with a consistent 15 elements per set, and two billion sets, each element would appear in approximately 1.8M sets. Have you considered (and do you have the capacity for) building a 16384x~1.8M (30B entries, 4 bytes each) lookup table? Given such a table, you could query which sets contain (1) and (17) and (5555) and then find the intersections of those three ~1.8M-element lists.

My guess is as follows.
Assume that each set has a name or ID or address (a 4-byte number will do if there are only 2 billion of them).
Now walk through all the sets once, and create the following output files:
A file which contains the IDs of all the sets which contain '1'
A file which contains the IDs of all the sets which contain '2'
A file which contains the IDs of all the sets which contain '3'
... etc ...
If there are 16 entries per set, then on average each of these 2^16 files will contain the IDs of 2^20 sets; with each ID being 4 bytes, this would require 2^38 bytes (256 GB) of storage.
You'll do the above once, before you process requests.
When you receive requests, use these files as follows:
Look at a couple of numbers in the request
Open up a couple of the corresponding index files
Get the list of all sets which exist in both these files (there's only a million IDs in each file, so this shouldn't be difficult)
See which of these few sets satisfy the remainder of the request
My guess is that if you do the above, creating the indexes will be (very) slow and handling requests will be (very) quick.

I have recently discovered methods that use Space Filling curves to map the multi-dimensional data down to a single dimension. One can then index the data based on its 1D index. Range queries can be easily carried out by finding the segments of the curve that intersect the box that represents the curve and then retrieving those segments.
I believe that this method is far superior to making the insane indexes as suggested because after looking at it, the index would be as large as the data I wished to store, hardly a good thing. A somewhat more detailed explanation of this can be found at:
http://www.ddj.com/184410998
and
http://www.dcs.bbk.ac.uk/~jkl/publications.html

Make 16383 index files, one for each possible search value. For each value in your input set, write the file position of the start of the set into the corresponding index file. It is important that each of the index files contains the same number for the same set. Now each index file will consist of ascending indexes into the master file.
To search, start reading the index files corresponding to each search value. If you read an index that's lower than the index you read from another file, discard it and read another one. When you get the same index from all of the files, that's a match - obtain the set from the master file, and read a new index from each of the index files. Once you reach the end of any of the index files, you're done.
If your values are evenly distributed, each index file will contain 1/16383 of the input sets. If your average search set consists of 6 values, you will be doing a linear pass over 6/16383 of your original input. It's still an O(n) solution, but your n is a bit smaller now.
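The search loop described above is a multi-way merge over ascending lists. A sketch with in-memory lists standing in for the index files; the name `intersect_sorted` and the sample offsets are made up.

```python
def intersect_sorted(lists):
    """Yield values present in every ascending list, advancing the cursors that lag."""
    if not lists:
        return
    iters = [iter(lst) for lst in lists]
    try:
        current = [next(it) for it in iters]
        while True:
            highest = max(current)
            if all(c == highest for c in current):
                yield highest                      # same offset in every file: a match
                current = [next(it) for it in iters]
            else:
                # Discard entries below the largest one seen and read replacements.
                current = [c if c == highest else next(it)
                           for c, it in zip(current, iters)]
    except StopIteration:
        return                                     # any list exhausted: no more matches


# Hypothetical file offsets from three index files.
offsets = [[0, 40, 80, 120], [40, 60, 120, 200], [8, 40, 120]]
print(list(intersect_sorted(offsets)))   # [40, 120]
```

Because each cursor only moves forward, every index entry is read at most once, which matches the single linear pass described above.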
P.S. Is zero an impossible result value, or do you really have 16384 possibilities?

Just playing devil's advocate for an approach which includes brute force + index lookup :
Create an index with the min, max and number of elements of each set.
Then apply brute force, excluding sets where max < max(set being searched) or min > min(set being searched).
In the brute force pass, also exclude sets whose element count is less than that of the set being searched.
95% of your searches would really be brute forcing a much smaller subset. Just a thought.
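A sketch of that prefilter, assuming the (min, max, count) summaries are computed once up front. The names `summarise` and `search` are illustrative, and how many sets the filter actually skips depends on the data.

```python
def summarise(s):
    """Cheap per-set summary: (smallest value, largest value, element count)."""
    return (min(s), max(s), len(s))


def search(sets, summaries, wanted):
    """Brute-force search that skips sets whose summary rules them out."""
    lo, hi, k = min(wanted), max(wanted), len(wanted)
    wanted = set(wanted)
    hits = []
    for s, (s_min, s_max, s_len) in zip(sets, summaries):
        if s_min > lo or s_max < hi or s_len < k:
            continue                     # cannot contain the request; skip the full check
        if wanted.issubset(s):
            hits.append(s)
    return hits


data = [(1, 2, 3), (1, 2, 3, 4, 5), (4, 5, 6, 7), (1, 3, 8, 9), (5, 8, 11)]
summaries = [summarise(s) for s in data]
print(search(data, summaries, (1, 3)))   # [(1, 2, 3), (1, 2, 3, 4, 5), (1, 3, 8, 9)]
```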
