setting container size in multiset - C++11

Why is there no constructor to initialize the container size of a multiset in C++?
For example, for vector we can initialize the container size as:
vector<int> a(n);

Creating a multiset with N identical elements was not viewed as a common use case.
For a vector, on the other hand, creating one filled with identical elements is common, such as a vector of an integral type initialized to zero or -1.
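If you do need a multiset that starts with n equal elements, the usual workaround is to build it from a range or insert in a loop; a minimal C++ sketch:

#include <cstddef>
#include <iostream>
#include <set>
#include <vector>

int main() {
    const std::size_t n = 5;

    // Build a multiset holding n copies of 0 via the range constructor.
    std::vector<int> filler(n, 0);
    std::multiset<int> ms(filler.begin(), filler.end());

    // Or simply insert in a loop.
    std::multiset<int> ms2;
    for (std::size_t i = 0; i < n; ++i)
        ms2.insert(-1);

    std::cout << ms.size() << ' ' << ms2.size() << '\n';  // prints: 5 5
}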

Related

How to turn a matrix of nested vectors into a vector of nested vectors

I am producing a matrix of pairs (= nested vectors) like so:
x←⍳10
y←⍳10
x∘.,y
But for further processing I need to have the pairs in one vector of pairs. If I apply ravel:
,/x∘.,y
then the pairs are no longer nested.
How do I transform the matrix into a vector while retaining the nested pairs?
You just need to ravel the matrix:
,x∘.,y
The monadic comma (ravel) reshapes the matrix into a vector containing the same elements in ravel (row-major) order.

Initializing an Array in the Context of Studying Data Structures

I am reading CLRS's Introduction to Algorithms, and there is Exercise 11.1-4 in the book under the section Direct-Address Tables:
We wish to implement a dictionary by using direct addressing on a huge array. At the start, the array entries may contain garbage, and **initializing** the entire array is impractical because of its size. Describe a scheme for implementing a direct-address dictionary on a huge array. Each stored object should use O(1) space; the operations SEARCH, INSERT, and DELETE should take O(1) time each; and initializing the data structure should take O(1) time. (Hint: Use an additional array, treated somewhat like a stack whose size is the number of keys actually stored in the dictionary, to help determine whether a given entry in the huge array is valid or not.)
I understand the solution is just to create another array, and have it store pointers to this array for elements that exist.
But I'm slightly confused as to the meaning of "initialize" in this context.
If the array is not initialized, how can we even access the data (i.e. get the value at the i-th position with A[i])?
I'm also not sure why the question states this memory constraint. Suppose we could initialize the array; how would the answer change?
The problem is that initializing an array of length N -- setting all the elements to a known value like NULL -- takes O(N) time.
If you have an array that is initialized to NULL, then implementing a direct access table is super easy -- A[i] == NULL means there is no value for i, and if there is a value for i, then it's stored in A[i].
The question is about how to avoid the O(N) initialization cost. If the array is not initialized, then the initial values for all A[i] could be anything at all... so how do you tell if it's a real value or just the initial garbage?
The solution is not just to create another array that stores pointers to the original -- you would have to initialize that other array and then you've wasted O(N) time again.
To avoid that cost altogether, you have to be more clever.
Make 3 arrays A, B, and C, and keep a count N of the total number of values in the dictionary.
Then, if the value for i is v:
(1) A[i] = v;
(2) 0 <= B[i] < N; and
(3) C[B[i]] = i.
This way, the B and C arrays let you keep track of which indexes in A have been set to a real value, without initializing any of the arrays. When you add a new item, you check conditions (2) and (3) to see if the index is valid, and if it isn't, then you do:
A[i] = NULL
B[i] = N
C[N++] = i
This marks index i as valid, and conditions (2) and (3) will then pass for all future checks.
Because of the amount of memory it takes, this technique isn't often used in practice, but it does mean that, theoretically, you never have to count the cost of array initialization when calculating run-time complexity.
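A minimal C++ sketch of the three-array scheme described above (the class name and the fixed capacity are mine, and DELETE is omitted). Note that reading uninitialized memory is formally undefined behaviour in real C++; the point is the RAM-model idea that the validity check tolerates arbitrary garbage in B:

#include <cstddef>
#include <cstdio>
#include <cstdlib>

struct LazyDict {
    std::size_t cap;   // size of the huge array (the key universe)
    int* A;            // A[i] holds the value stored for key i, if valid
    std::size_t* B;    // B[i] is the position of i in C, if valid
    std::size_t* C;    // C[0..n-1] lists the keys that have been marked valid
    std::size_t n;     // number of keys marked valid so far

    explicit LazyDict(std::size_t capacity)
        : cap(capacity),
          A(static_cast<int*>(std::malloc(capacity * sizeof(int)))),
          B(static_cast<std::size_t*>(std::malloc(capacity * sizeof(std::size_t)))),
          C(static_cast<std::size_t*>(std::malloc(capacity * sizeof(std::size_t)))),
          n(0) {}  // O(1): no element of A, B or C is ever initialized

    ~LazyDict() { std::free(A); std::free(B); std::free(C); }

    // Conditions (2) and (3) from the answer above: garbage in B cannot
    // pass this check, because C[0..n-1] only ever holds inserted keys.
    bool valid(std::size_t i) const { return B[i] < n && C[B[i]] == i; }

    void insert(std::size_t i, int v) {
        if (!valid(i)) {   // first touch of key i: mark it valid
            B[i] = n;
            C[n++] = i;
        }
        A[i] = v;
    }

    bool search(std::size_t i, int& out) const {
        if (!valid(i)) return false;
        out = A[i];
        return true;
    }
};

int main() {
    LazyDict d(1000000);  // O(1) "initialization": no element is touched
    d.insert(42, 7);
    int v;
    std::printf("%d %d\n", d.search(42, v) ? v : -1, d.search(7, v) ? 1 : 0);  // 7 0
}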
In that context, initializing means setting the values inside the array to NULL, 0, or the empty value for the stored type. The idea is that when the memory for the array is allocated, its contents are whatever happened to be in that memory before, so the array ends up containing random values. In this situation, initializing the values means setting them all to the "empty" value.

resize! for 2-dimensional arrays (matrices) in Julia

The function resize!() in Base takes care of carefully allocating memory to accommodate a given vector size of a given type:
v = Vector{Float64}(undef, 3)
resize!(v, 5) # allocates two extra indices
Since Julia is column-major, I was wondering if it would be possible to define a resizecols! function for matrices that would allocate extra columns in an efficient way:
A = Matrix{Float64}(undef, 3, 3)
resizecols!(A, 5) # allocates two extra columns
This is useful in many statistical methods where the number of training examples is not known a priori. One can start by allocating a design matrix X with n columns and then expand it in the loop if necessary.
The package ElasticArrays.jl defines an array type that can be resized in its last dimension, which is perfect for resizing the columns of a matrix efficiently.
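This works because in column-major storage a new column is just extra memory appended at the end of the existing buffer. A rough C++ illustration of that idea (not ElasticArrays code, just the underlying layout argument):

#include <cstddef>
#include <iostream>
#include <vector>

// Column-major matrix backed by one contiguous buffer: element (r, c)
// lives at index c * rows + r, so adding columns only appends to the tail.
struct ColMajorMatrix {
    std::size_t rows;
    std::vector<double> data;  // rows * cols elements

    ColMajorMatrix(std::size_t r, std::size_t c) : rows(r), data(r * c, 0.0) {}

    std::size_t cols() const { return data.size() / rows; }

    double& operator()(std::size_t r, std::size_t c) { return data[c * rows + r]; }

    void resizecols(std::size_t newCols) {
        data.resize(rows * newCols, 0.0);  // existing elements keep their positions
    }
};

int main() {
    ColMajorMatrix A(3, 3);
    A.resizecols(5);  // "allocates two extra columns"
    std::cout << A.rows << " x " << A.cols() << '\n';  // 3 x 5
}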

The difference between size and capacity of matrix in igraph

In igraph, we can use igraph_matrix_size to get the size of a matrix and igraph_matrix_capacity to get the capacity of a matrix, but what is the difference between them? Thank you.
igraph_matrix_capacity returns the number of elements the matrix can potentially hold without a reallocation.
More details in the documentation:
3.11.3. igraph_matrix_size — The number of elements in a matrix.
long int igraph_matrix_size(const igraph_matrix_t *m);
http://igraph.org/c/doc/ch07.html#igraph_matrix_size
Compared to:
3.11.4. igraph_matrix_capacity — Returns the number of elements allocated for a matrix.
long int igraph_matrix_capacity(const igraph_matrix_t *m);
Note that this might be different from the size of the matrix (as queried by igraph_matrix_size()) and specifies how many elements the matrix can hold without reallocation.
http://igraph.org/c/doc/ch07.html#igraph_matrix_capacity
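In other words, it is the same distinction std::vector makes between size() and capacity(). A small C++ illustration of the general idea (std::vector here, not igraph code):

#include <iostream>
#include <vector>

int main() {
    std::vector<double> v;
    v.reserve(100);    // capacity: room for 100 elements, none stored yet
    v.push_back(1.0);  // size: 1 element actually stored
    std::cout << v.size() << ' ' << v.capacity() << '\n';  // 1 100 (capacity may be >= 100)
    // Pushing within the reserved capacity needs no reallocation;
    // exceeding it forces a larger allocation and a copy/move of the elements.
}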
Hope that answers your question.

Best data structure to store lots of one-bit data

I want to store lots of data so that:
they can be accessed by an index, and
each datum is just yes or no (so one bit is probably enough for each).
I am looking for the data structure with the highest performance that occupies the least space.
Storing the data in flat memory, one bit per datum, is probably not a good choice; on the other hand, using various tree structures still uses lots of memory (e.g. the pointers in each node are needed to build the tree, even though each node holds just one bit of data).
Does anyone have any idea?
What's wrong with using a single block of memory and either storing 1 bit per byte (easy indexing, but wastes 7 bits per byte) or packing the data (slightly trickier indexing, but more memory efficient)?
Well, in Java, BitSet might be a good choice: http://download.oracle.com/javase/6/docs/api/java/util/BitSet.html
If I understand your question correctly, you should store them in an unsigned integer where each value is assigned to one bit of the integer (a flag).
Say you represent 3 values that can each be on or off. Then you assign the first to 1, the second to 2, and the third to 4. Your unsigned int can then be 0 through 7 depending on which values are on or off, and you check the values using bitwise operations.
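A tiny C++ illustration of the flag idea (the names OPT_A/OPT_B/OPT_C are made up):

#include <cstdio>

// Each value gets one bit of an unsigned integer.
enum : unsigned { OPT_A = 1u << 0, OPT_B = 1u << 1, OPT_C = 1u << 2 };

int main() {
    unsigned flags = 0;
    flags |= OPT_A | OPT_C;                      // turn the first and third value on
    std::printf("%d\n", (flags & OPT_B) != 0);   // 0: the second value is off
    std::printf("%u\n", flags);                  // 5 = 1 + 4
}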
It depends on the language and how you define 'index'. If you mean that the index operator must work, then your language will need to be able to overload the index operator. If you don't mind using an index macro or function, you can access the nth element by dividing the given index by the number of bits in your type (8 for char, 32 for uint32_t and variants) and then returning the result of arr[n / n_bits] & (1 << (n % n_bits)).
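A minimal C++ sketch of the packed-bits approach from the answers above (std::vector<bool> and std::bitset already do this packing for you; the manual version just makes the indexing explicit):

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

class BitArray {
    std::vector<std::uint32_t> words_;  // 32 flags packed per word
public:
    explicit BitArray(std::size_t n) : words_((n + 31) / 32, 0) {}

    void set(std::size_t i, bool value) {
        if (value) words_[i / 32] |=  (std::uint32_t(1) << (i % 32));
        else       words_[i / 32] &= ~(std::uint32_t(1) << (i % 32));
    }
    bool get(std::size_t i) const { return (words_[i / 32] >> (i % 32)) & 1u; }
};

int main() {
    BitArray bits(1000);   // 1000 yes/no values in 32 words (128 bytes)
    bits.set(42, true);
    std::cout << bits.get(42) << ' ' << bits.get(43) << '\n';  // 1 0
}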
Have a look at a Bloom filter: http://en.wikipedia.org/wiki/Bloom_filter
It performs very well and is space-efficient, but make sure you read the fine print below ;-). Quote from the above wiki page:
An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps or hashes some set element to one of the m array positions with a uniform random distribution. To add an element, feed it to each of the k hash functions to get k array positions. Set the bits at all these positions to 1. To query for an element (test whether it is in the set), feed it to each of the k hash functions to get k array positions. If any of the bits at these positions are 0, the element is not in the set – if it were, then all the bits would have been set to 1 when it was inserted. If all are 1, then either the element is in the set, or the bits have been set to 1 during the insertion of other elements.

The requirement of designing k different independent hash functions can be prohibitive for large k. For a good hash function with a wide output, there should be little if any correlation between different bit-fields of such a hash, so this type of hash can be used to generate multiple "different" hash functions by slicing its output into multiple bit fields. Alternatively, one can pass k different initial values (such as 0, 1, ..., k − 1) to a hash function that takes an initial value; or add (or append) these values to the key. For larger m and/or k, independence among the hash functions can be relaxed with negligible increase in false positive rate (Dillinger & Manolios (2004a), Kirsch & Mitzenmacher (2006)). Specifically, Dillinger & Manolios (2004b) show the effectiveness of using enhanced double hashing or triple hashing, variants of double hashing, to derive the k indices using simple arithmetic on two or three indices computed with independent hash functions.

Removing an element from this simple Bloom filter is impossible. The element maps to k bits, and although setting any one of these k bits to zero suffices to remove it, this has the side effect of removing any other elements that map onto that bit, and we have no way of determining whether any such elements have been added. Such removal would introduce a possibility for false negatives, which are not allowed. One-time removal of an element from a Bloom filter can be simulated by having a second Bloom filter that contains items that have been removed. However, false positives in the second filter become false negatives in the composite filter, which are not permitted. In this approach re-adding a previously removed item is not possible, as one would have to remove it from the "removed" filter. However, it is often the case that all the keys are available but are expensive to enumerate (for example, requiring many disk reads). When the false positive rate gets too high, the filter can be regenerated; this should be a relatively rare event.
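A toy C++ sketch of the add/query logic described in the quote, deriving the k indices by double hashing two std::hash values (the parameters m = 10000 and k = 4 are arbitrary, and appending "#" to the key to obtain a second hash is just an assumption for illustration):

#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

class BloomFilter {
    std::vector<bool> bits_;  // the m-bit array, all initially 0
    std::size_t k_;           // number of derived hash functions

    // i-th index via double hashing: h1 + i * h2 (mod m).
    std::size_t index(const std::string& key, std::size_t i) const {
        std::size_t h1 = std::hash<std::string>{}(key);
        std::size_t h2 = std::hash<std::string>{}(key + "#");  // second, differently salted hash
        return (h1 + i * h2) % bits_.size();
    }

public:
    BloomFilter(std::size_t m, std::size_t k) : bits_(m, false), k_(k) {}

    void add(const std::string& key) {
        for (std::size_t i = 0; i < k_; ++i) bits_[index(key, i)] = true;
    }

    // false means "definitely not in the set"; true means "possibly in the set".
    bool possiblyContains(const std::string& key) const {
        for (std::size_t i = 0; i < k_; ++i)
            if (!bits_[index(key, i)]) return false;
        return true;
    }
};

int main() {
    BloomFilter bf(10000, 4);
    bf.add("apple");
    std::cout << bf.possiblyContains("apple") << ' '   // 1
              << bf.possiblyContains("pear") << '\n';  // almost certainly 0
}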
