implementing a dictionary - data-structures

Hi,
I ran across an interview question about implementing a dictionary that supports auto-completion, auto-correction, spell checking, etc.
I would like to know which data structure is best for implementing a dictionary and how one approaches the features above.
Any links that could guide me on this are welcome.

The answer to this kind of problem is almost always the same: a trie. Take a look here.
Suffix trees and Patricia trees (compressed tries) can also be useful for these purposes.

Tries are a common structure for this. They are a special case of finite-state automata, which have also been used for dictionary construction and spell checking.
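To make the trie suggestion concrete, here is a minimal sketch of one that supports exact lookup (the basis of a simple spell check) and prefix completion. The class name Trie and its members insert, contains and complete are made up for illustration and do not come from any library. It is a rough starting point under those assumptions, not a tuned implementation.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal trie; all names here are illustrative.
class Trie {
public:
    // Insert a word, creating one node per character as needed.
    void insert(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) child.reset(new Node());
            node = child.get();
        }
        node->is_word = true;
    }

    // Exact lookup: true only if the whole word was inserted earlier.
    bool contains(const std::string& word) const {
        const Node* node = find_node(word);
        return node != nullptr && node->is_word;
    }

    // Auto-completion: every stored word that starts with prefix.
    std::vector<std::string> complete(const std::string& prefix) const {
        std::vector<std::string> result;
        const Node* node = find_node(prefix);
        if (node != nullptr) {
            std::string current = prefix;
            collect(*node, current, result);
        }
        return result;
    }

private:
    struct Node {
        bool is_word = false;
        // std::map keeps children ordered, so completions come out sorted.
        std::map<char, std::unique_ptr<Node>> children;
    };

    // Walk down the trie along key; nullptr if the path does not exist.
    const Node* find_node(const std::string& key) const {
        const Node* node = &root_;
        for (char c : key) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return nullptr;
            node = it->second.get();
        }
        return node;
    }

    // Depth-first walk that records every complete word below node.
    static void collect(const Node& node, std::string& current,
                        std::vector<std::string>& out) {
        if (node.is_word) out.push_back(current);
        for (const auto& entry : node.children) {
            current.push_back(entry.first);
            collect(*entry.second, current, out);
            current.pop_back();
        }
    }

    Node root_;
};
```

Lookup cost depends on the length of the key rather than on the number of stored words, which is the usual argument for a trie here.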

You can get auto-completion with any sorted container, for example a set of strings:
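#include <limits>
#include <set>
#include <string>
#include <vector>
int main()
{
    // std::set keeps its elements sorted, so words sharing a prefix are adjacent.
    std::set<std::string> words = {"foo", "bar", "barber", "baz", "quux"};
    std::string starting_with = "ba";
    // First word that is not less than the prefix.
    auto lower = words.lower_bound(starting_with);
    // First word greater than the prefix followed by the largest char value.
    auto upper = words.upper_bound(starting_with + std::numeric_limits<char>::max());
    // Holds "bar", "barber", "baz".
    std::vector<std::string> suggested_words(lower, upper);
}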

Related

String Array on ESP32

I need to implement a string array, like:
String[] txt = {"some text1", "some text2", "some text3", "some text4"};
The standard char[] doesn't suit me. How can I use string Array or List in Arduino IDE for ESP32?
You're free to use all STL facilities, including an array of strings. Standard caveats apply (dynamic RAM allocation from heap, STL uses lots of Flash for code, etc)
#include <string>
#include <array>
std::array<std::string, 2> my_array = {"text1", "text2"};
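If a growable list is closer to what you need than a fixed-size array, std::vector<std::string> is the usual choice. The sketch below uses only standard C++ and no Arduino-specific calls. The function names make_texts and total_length are made up for illustration, and the same heap and flash caveats apply.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: build a list of strings and grow it afterwards.
std::vector<std::string> make_texts() {
    std::vector<std::string> txt = {"some text1", "some text2"};
    txt.push_back("some text3");  // the vector grows on the heap as needed
    return txt;
}

// Hypothetical helper: total number of characters across all entries.
std::size_t total_length(const std::vector<std::string>& items) {
    std::size_t total = 0;
    for (const std::string& s : items) total += s.size();
    return total;
}
```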

When to use ostream_iterator

As far as I know, we can use ostream_iterator in C++11 to print a container.
For example,
std::vector<int> myvector;
for (int i=1; i<10; ++i) myvector.push_back(i*10);
std::copy ( myvector.begin(), myvector.end(), std::ostream_iterator<int>{std::cout, " "} );
I don't know when and why we would use the code above instead of the traditional way, such as:
for(const auto & i : myvector) std::cout<<i<<" ";
In my opinion, the traditional way is faster because there is no copy, am I right?
std::ostream_iterator is a single-pass OutputIterator, so it can be used with any algorithm that accepts such an iterator. Using it to output a vector of ints merely demonstrates its capabilities.
"In my opinion, the traditional way is faster because there is no copy, am I right?"
You can see at http://en.cppreference.com/w/cpp/algorithm/copy that copy is implemented much like your range-based for loop. It is also specialized for various types to work as efficiently as possible. On the other hand, writing to std::ostream_iterator is done by assigning to it, and http://en.cppreference.com/w/cpp/iterator/ostream_iterator/operator%3D shows that this resolves to a *out_stream << value; operation (ignoring the delimiter).
This iterator also suffers from an extra trailing delimiter inserted after the last element. To fix this, a new single-pass OutputIterator, std::experimental::ostream_joiner, is expected (possibly in C++17).
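To illustrate the trailing-delimiter issue using only the standard library, here is a small sketch. The helper names write_joined and with_ostream_iterator are made up for this example. The first writes the delimiter between elements only, and the second shows what std::ostream_iterator produces for comparison.

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: write the elements separated by delim,
// with no delimiter after the last one.
template <typename T>
void write_joined(std::ostream& out, const std::vector<T>& values,
                  const std::string& delim) {
    bool first = true;
    for (const T& value : values) {
        if (!first) out << delim;  // delimiter goes between elements only
        out << value;
        first = false;
    }
}

// For contrast: std::ostream_iterator writes the delimiter after every element.
std::string with_ostream_iterator(const std::vector<int>& values) {
    std::ostringstream out;
    std::copy(values.begin(), values.end(),
              std::ostream_iterator<int>(out, " "));
    return out.str();
}
```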
A short (and maybe silly) example where using an iterator is useful. The point is that you can direct your data to any sink: a file, console output, or a memory buffer. Whatever output you choose, MyData::serialize needs no changes; you only need to provide an OutputIterator.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
struct MyData {
std::vector<int> data = {1,2,3,4};
template<typename T>
void serialize(T iterator) {
std::copy(data.begin(), data.end(), iterator);
}
};
int main()
{
MyData data;
// Output to stream
data.serialize(std::ostream_iterator<int>(std::cout, ","));
// Output to memory
std::vector<int> copy;
data.serialize(std::back_inserter(copy));
// Other uses with different iterator adaptors:
// std::front_insert_iterator
// other, maybe custom ones
}
The difference is polymorphism versus a hardcoded stream.
std::ostream_iterator can be constructed from any class that inherits from std::ostream, so at run time you can wire the iterator to a different output stream type depending on the context in which the function runs.
The second snippet uses a hardcoded std::cout, which cannot change at run time.
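A minimal sketch of that idea, with the stream passed in by the caller. The function names print_values and to_text are made up for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical function: the caller decides at run time where the output goes.
void print_values(const std::vector<int>& values, std::ostream& out) {
    std::copy(values.begin(), values.end(),
              std::ostream_iterator<int>(out, " "));
}

// Same function, different sink: here the output lands in a string buffer.
// Passing std::cout or a std::ofstream instead needs no change to print_values.
std::string to_text(const std::vector<int>& values) {
    std::ostringstream buffer;
    print_values(values, buffer);
    return buffer.str();
}
```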

How to define common enum in C++11?

So I've got an enum that is defined in one part of the program, and I need to use it in several other parts. As far as I can tell, there are no extern enums in C++11. So how do I use the same enum definition in different translation units? Sorry if this is a duplicate or a misunderstanding.
This seems to be exactly what header files are for:
enum_def.H:
enum class my_enum_type { /* .... */ };
file1.C:
#include "enum_def.H"
file2.C:
#include "enum_def.H"
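Here is the same layout condensed into one listing so it can be compiled as a single file. The include guard name ENUM_DEF_H, the enumerators, and the functions describe and next are all made up for illustration. In a real project the guarded part would live in enum_def.H, and each function would sit in its own source file that includes it.

```cpp
#include <string>

// --- contents of the hypothetical header enum_def.H ---
#ifndef ENUM_DEF_H
#define ENUM_DEF_H
enum class my_enum_type { red, green, blue };  // enumerators invented for the example
#endif
// --- end of header ---

// Would live in file1.C, after including the header.
std::string describe(my_enum_type value) {
    switch (value) {
        case my_enum_type::red:   return "red";
        case my_enum_type::green: return "green";
        case my_enum_type::blue:  return "blue";
    }
    return "unknown";
}

// Would live in file2.C, after including the same header.
my_enum_type next(my_enum_type value) {
    if (value == my_enum_type::blue) return my_enum_type::red;
    return static_cast<my_enum_type>(static_cast<int>(value) + 1);
}
```

The include guard keeps the enum from being defined twice if the header is pulled into one translation unit more than once.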

Copy a std::vector to boost::unordered_map<string, std::vector<foo>>

Mapping a record type to a vector of field values:
unordered_map<string, vector<string>> input_records;
string rec_type {"name"};
vector<string> fields { "field 1", "field 2", "field 3" };
I want to copy fields to input_records[rec_type]. The following doesn't work:
_input_records[rec_type].insert(rec_deq.begin(), rec_deq.end());
Yet the boost documentation for unordered_map contains this:
template<typename InputIterator>
void insert(InputIterator first, InputIterator last);
Inserts a range of elements into the container. Elements are inserted if and only if there is no element in the container with an equivalent key.
Throws:
When inserting a single element, if an exception is thrown by an operation other than a call to hasher the function has no effect.
Notes:
Can invalidate iterators, but only if the insert causes the load factor to be greater to or equal to the maximum load factor.
Pointers and references to elements are never invalidated.
(Writing this, I realize that probably the InputIterator points to a sequence of key/value pairs, each of which is to be inserted into the Map. So, wrong method.)
How best can one instantiate and populate the vector at input_records[rec_type]? Once populated, the vector won't be modified. Is it as simple as input_records[rec_type] = fields? Do, or can, "move semantics" apply in this situation?
In standard associative containers, what you push in is a std::pair, most of the time using std::make_pair().
#include <vector>
#include <string>
#include <unordered_map>
#include <iostream>
using namespace std;
int main()
{
unordered_map<string, vector<string>> input_records;
string rec_type {"name"};
vector<string> fields { "field 1", "field 2", "field 3" };
input_records.insert( make_pair( rec_type, fields ) );
for( const auto& text : input_records[rec_type] )
cout << text << '\n';
}
This is true for Boost containers too as they are based on the standard ones.
Also, since C++11 there is another function, emplace(), which allows you to create "in place" a pair instead of having to build it first then pass it to the container:
#include <vector>
#include <string>
#include <unordered_map>
#include <iostream>
using namespace std;
int main()
{
unordered_map<string, vector<string>> input_records;
string rec_type {"name"};
vector<string> fields { "field 1", "field 2", "field 3" };
input_records.emplace( rec_type, fields );
for( const auto& text : input_records[rec_type] )
cout << text << '\n';
}
Depending on how you pass the data (rvalue reference, move, etc.), there will be more or fewer copies. However, a simple rule of thumb that works with all standard containers is to use emplace() until you are in a situation where you have to use insert().
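On the copy versus move point, here is a small sketch using plain assignment through operator[]. The alias Records and the functions store_copy and store_move are made up for illustration. After the move, the source vector is left in a valid but unspecified state, so it should not be relied on afterwards.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Records = std::unordered_map<std::string, std::vector<std::string>>;

// Copies fields into the map; the caller's vector is left untouched.
void store_copy(Records& records, const std::string& rec_type,
                const std::vector<std::string>& fields) {
    records[rec_type] = fields;
}

// Moves fields into the map; the strings themselves are not copied.
void store_move(Records& records, const std::string& rec_type,
                std::vector<std::string>&& fields) {
    records[rec_type] = std::move(fields);
}
```

A call such as store_move(input_records, rec_type, std::move(fields)) fits the case where the vector is not needed again after being stored.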
Using input_records[rec_type] works as you would expect. However, I would suggest using find() instead, because the [] operator adds a new element if the key is not found, while find() returns an iterator that equals container.end() when the element is not found, so you just compare the iterator to end() to know whether the element was found.
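A minimal sketch of the find() approach. The alias Records and the function lookup are made up for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

using Records = std::unordered_map<std::string, std::vector<std::string>>;

// Returns a pointer to the stored fields, or nullptr if rec_type is absent.
// Unlike operator[], find() never inserts an element into the map.
const std::vector<std::string>* lookup(const Records& records,
                                       const std::string& rec_type) {
    auto it = records.find(rec_type);
    return it == records.end() ? nullptr : &it->second;
}
```

Because find() is const, it also works on a const reference to the map, which operator[] does not.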
This is true for all standard associative containers and boost ones.
Most of the time I wrap associative containers in a class that represents the concept I want, such as an InputRecord class. Then I provide a find function (or whatever name suits the domain) that does exactly what I want, depending on the case: sometimes I want it to throw an exception if the object is not found, sometimes I want to create a new record when the one I am looking for isn't there, sometimes I want to return a pointer, and sometimes I prefer to return a boost::optional.
I'd recommend doing the same: encapsulate your concept in a class, expose the services that are needed, then use whatever you want, however you want, inside the class. You can even switch to another container later without changing the interface, just by changing the code inside the class that manipulates the container.
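A rough sketch of such a wrapper, assuming the record layout from the question. The class name InputRecords and its members set, find and get are made up for illustration. It offers one lookup that returns a pointer and one that throws.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical wrapper; the class and member names are illustrative.
class InputRecords {
public:
    using Fields = std::vector<std::string>;

    // Stores (or replaces) the fields for a record type.
    void set(const std::string& rec_type, Fields fields) {
        records_[rec_type] = std::move(fields);
    }

    // Returns nullptr when the record type is unknown.
    const Fields* find(const std::string& rec_type) const {
        auto it = records_.find(rec_type);
        return it == records_.end() ? nullptr : &it->second;
    }

    // Throws std::out_of_range when the record type is unknown.
    const Fields& get(const std::string& rec_type) const {
        return records_.at(rec_type);
    }

private:
    // The container choice is hidden from callers and can change later.
    std::unordered_map<std::string, Fields> records_;
};
```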

Is boost::property_map operator [] O(1) time complexity in this code?

I am new to Boost and to the Boost Graph Library. Could anyone explain the implementation behind property_map and the time complexity of operator[] in the following code?
Thanks!
#include <cstdlib>
#include <string>
#include <boost/graph/adjacency_list.hpp>
int main()
{
using namespace boost;
typedef adjacency_list<listS, listS, directedS,
property<vertex_name_t, std::string> > graph_t;
graph_t g;
graph_traits<graph_t>::vertex_descriptor u = add_vertex(g);
property_map<graph_t, vertex_name_t>::type name_map = get(vertex_name, g);
name_map[u] = "Joe";
return EXIT_SUCCESS;
}
You can create a property map by giving an std::map to it. So the time and space complexity is probably the same as the underlying std::map. You might want to look deeper into the STL documentation of a Sorted Associative Container.
I've wondered this myself and find it odd that boost::graph documentation doesn't make this clear as questions such as this are highly relevant to performance critical algorithms/applications.
In summary I believe the answer is yes, it is O(1) time complexity. My reasoning follows.
Since the property map concepts are just concepts, they make no guarantees about complexity. So we have to look at adjacency_list's implementation of a property map to know its complexity. I believe the relevant code is found in boost/graph/detail/adjacency_list.hpp; search for "Vertex Property Maps".
template <class Graph, class ValueType, class Reference, class Tag>
struct adj_list_vertex_property_map
: public boost::put_get_helper<
Reference,
adj_list_vertex_property_map<Graph, ValueType, Reference, Tag>
>
{
typedef typename Graph::stored_vertex StoredVertex;
typedef ValueType value_type;
typedef Reference reference;
typedef typename Graph::vertex_descriptor key_type;
typedef boost::lvalue_property_map_tag category;
inline adj_list_vertex_property_map(const Graph* = 0, Tag tag = Tag()): m_tag(tag) { }
inline Reference operator[](key_type v) const {
StoredVertex* sv = (StoredVertex*)v;
return get_property_value(sv->m_property, m_tag);
}
inline Reference operator()(key_type v) const {
return this->operator[](v);
}
Tag m_tag;
};
I believe this is the property map that is used for internal properties of an adjacency_list that is instantiated with a listS VertexList type, as in your example. You can see that operator[] takes a Graph::vertex_descriptor, which appears to be some handle, maybe an iterator, and accesses the property structure directly without a lookup, via sv->m_property, thus in constant time. The call to get_property_value() appears to be for property tag resolution when you have multiple properties associated with each vertex; in your case you only have one. Tag lookup is typically also constant time.
Instantiating an adjacency_list with properties and a vecS VertexList type also appears to give O(1) time complexity for property map lookup. The type used there appears to be vec_adj_list_vertex_property_map, and its operator[] uses the Graph::vertex_descriptor as what appears to be an index into a vector of per-vertex properties, thus O(1).
In retrospect, I suppose I would expect that because the library works so hard to be performant, it would ensure that this is also performant.

Resources