Parsing morse code - algorithm

I am trying to solve this problem.
The goal is to determine the number of ways a morse string can be interpreted, given a dictionary of words.
What I did is that I first "translated" words from my dictionary into morse. Then, I used a naive algorithm, searching for all the ways it can be interpreted recursively.
#include <iostream>
#include <vector>
#include <map>
#include <string>
#include <iterator>
using namespace std;
string morse_string;
int morse_string_size;
map<char, string> morse_table;
unsigned int sol;
void matches(int i, int factor, vector<string> &dictionary) {
int suffix_length = morse_string_size-i;
if (suffix_length <= 0) {
sol += factor;
return;
}
map<int, int> c;
for (vector<string>::iterator it = dictionary.begin() ; it != dictionary.end() ; it++) {
if (((*it).size() <= suffix_length) && (morse_string.substr(i, (*it).size()) == *it)) {
c[(*it).size()]++; // operator[] starts a missing count at 0, so the first match counts as 1
}
}
for (map<int, int>::iterator it = c.begin() ; it != c.end() ; it++) {
matches(i+it->first, factor*(it->second), dictionary);
}
}
string encode_morse(string s) {
string ret = "";
for (unsigned int i = 0 ; i < s.length() ; ++i) {
ret += morse_table[s[i]];
}
return ret;
}
int main() {
morse_table['A'] = ".-"; morse_table['B'] = "-..."; morse_table['C'] = "-.-."; morse_table['D'] = "-.."; morse_table['E'] = "."; morse_table['F'] = "..-."; morse_table['G'] = "--."; morse_table['H'] = "...."; morse_table['I'] = ".."; morse_table['J'] = ".---"; morse_table['K'] = "-.-"; morse_table['L'] = ".-.."; morse_table['M'] = "--"; morse_table['N'] = "-."; morse_table['O'] = "---"; morse_table['P'] = ".--."; morse_table['Q'] = "--.-"; morse_table['R'] = ".-."; morse_table['S'] = "..."; morse_table['T'] = "-"; morse_table['U'] = "..-"; morse_table['V'] = "...-"; morse_table['W'] = ".--"; morse_table['X'] = "-..-"; morse_table['Y'] = "-.--"; morse_table['Z'] = "--..";
int T, N;
string tmp;
vector<string> dictionary;
cin >> T;
while (T--) {
morse_string = "";
cin >> morse_string;
morse_string_size = morse_string.size();
cin >> N;
for (int j = 0 ; j < N ; j++) {
cin >> tmp;
dictionary.push_back(encode_morse(tmp));
}
sol = 0;
matches(0, 1, dictionary);
cout << sol;
if (T)
cout << endl << endl;
}
return 0;
}
The problem is that I am only allowed 3 seconds of execution time, and my algorithm does not finish within that limit.
Is this the right approach, and if so, what am I missing? If not, can you give some hints about a better strategy?
EDIT :
There can be at most 10 000 words in the dictionary and at most 1000 characters in the morse string.

A solution that combines dynamic programming with a rolling hash should work for this problem.
Let's start with a simple dynamic programming solution. We allocate a vector dp in which dp[i] stores the number of ways to build the prefix of morse_string of length i. We then iterate over the positions of morse_string, and at each position i we go through all the words and look back to see whether each one matches the characters ending at i. If a word fits, we add dp[i-dictionaryWord.size()], the number of ways we could have built the prefix up to the point where that word starts.
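vector<long> dp;
dp.push_back(1); // dp[0]: one way to build the empty prefix
for (int i = 1; i <= (int)morse_string.size(); i++) {
    long count = 0;
    for (int j = 0; j < (int)dictionary.size(); j++) {
        int len = dictionary[j].size();
        if (len > i) continue; // word is longer than the prefix
        // does dictionary[j] match the len characters ending at position i?
        if (dictionary[j] == morse_string.substr(i - len, len)) {
            count += dp[i - len];
        }
    }
    dp.push_back(count); // this is dp[i]
}
long result = dp[morse_string.size()];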
The problem with this solution is that it is too slow. Let N be the length of morse_string, M the size of the dictionary and K the length of the longest word in the dictionary. The loops do O(N*M*K) operations. With N=1000 and M=10000, and if we assume K=1000, this is about 10^10 operations, which is too slow on most machines.
The K factor comes from comparing dictionary[j] character by character with the substring of morse_string that ends at position i.
If we could speed up this string matching to constant or logarithmic time, we would be okay. This is where rolling hashing comes in. If you build a rolling hash array of morse_string, you can compute the hash of any substring of morse_string in O(1). The check then becomes a comparison between hash(dictionary[j]) and the hash of that substring, with the dictionary hashes computed once up front.
This is good but in the presence of imperfect hashing you could have multiple words from the dictionary with the same hash. That would mean that after getting a hash match you would still need to match the strings as well as the hashes. In programming contests, people often assume perfect hashing and skip the string matching. This is often a safe bet especially on a small dictionary. In case it doesn't produce a perfect hashing (which you can check in code) you can always adjust your hash function slightly and maybe the adjusted hash function will produce a perfect hashing.
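Here is a rough sketch of how the dynamic programming and the rolling hash could fit together. The function name countInterpretations, the multiplier 131 and the idea of grouping the dictionary by word length are my own choices for illustration, not part of the answer above. It also makes the contest-style assumption discussed above: a hash match is trusted without comparing the strings, and the count is allowed to wrap around on overflow.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Counts the ways `morse` can be split into dictionary words.
// `encoded` holds the dictionary words already translated to morse.
// Assumption: hash collisions are ignored (no final string comparison).
std::uint64_t countInterpretations(const std::string& morse,
                                   const std::vector<std::string>& encoded) {
    const std::uint64_t BASE = 131;  // arbitrary multiplier; arithmetic wraps mod 2^64
    const std::size_t n = morse.size();

    // prefix[i] = hash of morse[0..i), power[i] = BASE^i
    std::vector<std::uint64_t> prefix(n + 1, 0), power(n + 1, 1);
    for (std::size_t i = 0; i < n; ++i) {
        prefix[i + 1] = prefix[i] * BASE + static_cast<unsigned char>(morse[i]);
        power[i + 1] = power[i] * BASE;
    }

    // byLength[len][hash] = number of dictionary words with that length and hash
    std::unordered_map<std::size_t,
                       std::unordered_map<std::uint64_t, std::uint64_t>> byLength;
    for (const std::string& w : encoded) {
        if (w.empty() || w.size() > n) continue;
        std::uint64_t h = 0;
        for (char c : w) h = h * BASE + static_cast<unsigned char>(c);
        ++byLength[w.size()][h];
    }

    // dp[i] = number of ways to build the prefix of length i
    std::vector<std::uint64_t> dp(n + 1, 0);
    dp[0] = 1;
    for (std::size_t i = 1; i <= n; ++i) {
        for (const auto& bucket : byLength) {
            const std::size_t len = bucket.first;
            if (len > i || dp[i - len] == 0) continue;
            // O(1) hash of the substring morse[i-len..i)
            const std::uint64_t h = prefix[i] - prefix[i - len] * power[len];
            const auto hit = bucket.second.find(h);
            if (hit != bucket.second.end()) dp[i] += dp[i - len] * hit->second;
        }
    }
    return dp[n];
}
```

For example, with the encoded words ".", "-" and ".-" (E, T and A), the string ".-" can be read as "ET" or as "A", so the function returns 2. Grouping by length means each position costs one lookup per distinct word length rather than one per word.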


Need to find highest non repeating number in custom vector

I'm creating a program, where you input n amount of mushroom pickers, they are in a shroom picking contest, they can find shroomA (worth 5 points), shroomB (worth 3 points) and shroomC (worth 15 points). I need to find the contest winner and print his/her name, but if two or more contestants have the same amount of points they are disqualified, meaning I need to find the highest non repeating result.
#include <iostream>
#include <vector>
#include <string>
using namespace std;
class ShroomPicker {
private:
string name;
long long int shroomA, shroomB, shroomC;
public:
void Input() {
char Name[100];
long long int shrooma, shroomb, shroomc;
cin >> Name >> shrooma >> shroomb >> shroomc;
name = Name;
shroomA = shrooma; shroomB = shroomb; shroomC = shroomc;
}
long long int calcPoints() {
return shroomA * 5 + shroomB * 3 + shroomC * 15;
}
string winnersName() {
return name;
}
};
int main() {
int n;
cin >> n;
vector<ShroomPicker> shr;
for (int i = 0; i < n; i++) {
ShroomPicker s;
s.Input();
shr.push_back(s);
}
long long int hiscore = 0;
int num = 0;
for (int i = 0; i < n; i++) {
long long int temp = 0;
temp = shr[i].calcPoints();
if (temp > hiscore) {
hiscore = temp;
num = i;
}
}
cout << shr[num].winnersName();
}
This program finds the highest score even if it occurs more than once. Could someone suggest how I can find the highest non-repeating score?
edit:
for (int i = 0; i < n; i++) {
long long int temp = 0;
temp = shr[i].calcPoints();
if (scoreMap.find(temp) == scoreMap.end()) {
scoreMap[temp] = Info{ i, false };
}
else {
scoreMap[temp] = Info{ i, true };
}
}
I would suggest sorting the list of participants in decreasing order of points (O(n log n)) and then scanning the list from start to finish (O(n) at most). The first participant whose score differs from those of the adjacent participants in the sorted list is the winner.
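A minimal sketch of the sort-and-scan idea, working on a plain vector of scores rather than on the ShroomPicker objects. The function name highestUniqueScoreSorted and the -1 "no winner" return value are assumptions made for this example, and it relies on scores never being negative.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the highest score that occurs exactly once, or -1 if every score repeats.
// Assumption: scores are non-negative, so -1 is free to mean "no winner".
long long highestUniqueScoreSorted(std::vector<long long> scores) {
    // Sort in decreasing order so that equal scores sit next to each other.
    std::sort(scores.begin(), scores.end(), std::greater<long long>());
    std::size_t i = 0;
    while (i < scores.size()) {
        std::size_t j = i;
        while (j < scores.size() && scores[j] == scores[i]) ++j;  // end of the run of equal scores
        if (j - i == 1) return scores[i];  // a run of length 1 is a unique score
        i = j;                             // skip the whole disqualified run
    }
    return -1;
}
```

With the scores 30, 30, 20, 10 this returns 20, because the two 30s cancel each other out. To print a name you would sort (score, index) pairs instead of bare scores.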
The fastest (O(N)) way I can think of is to have:
struct Info
{
    int picker_index;
    bool disqualified;
};
// map from score to the Info object above
// (the key is long long because calcPoints() returns long long)
std::unordered_map<long long, Info> scoreMap;
Iterate through pickers and update the map as follows:
-- If no item in the map, just add scoreMap[score] = Info {picker_index, false};
-- else, set disqualified = true on the existing item;
Once the map is constructed, find the max key in the map for which disqualified = false; similar to what you are doing now.
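Put together, the steps above might look like the sketch below. It takes the scores as a plain vector (index i being picker i) instead of the ShroomPicker class. The function name findWinner and the -1 "nobody wins" return value are assumptions for this example.

```cpp
#include <unordered_map>
#include <vector>

struct Info {
    int picker_index;   // first picker seen with this score
    bool disqualified;  // true once a second picker has the same score
};

// Returns the index of the picker holding the highest score that nobody else has,
// or -1 if every score is shared.
int findWinner(const std::vector<long long>& scores) {
    std::unordered_map<long long, Info> scoreMap;
    for (int i = 0; i < static_cast<int>(scores.size()); ++i) {
        auto it = scoreMap.find(scores[i]);
        if (it == scoreMap.end()) {
            scoreMap[scores[i]] = Info{i, false};  // first time this score appears
        } else {
            it->second.disqualified = true;        // repeated score: disqualify it
        }
    }
    int winner = -1;
    long long best = 0;
    for (const auto& entry : scoreMap) {
        if (entry.second.disqualified) continue;
        if (winner == -1 || entry.first > best) {
            best = entry.first;
            winner = entry.second.picker_index;
        }
    }
    return winner;
}
```

In the asker's main, one would fill the vector with shr[i].calcPoints() and, if the result is not -1, print shr[result].winnersName().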

Recursive algorithm to find all possible solutions in a nonogram row

I am trying to write a simple nonogram solver, in a kind of bruteforce way, but I am stuck on a relatively easy task. Let's say I have a row with clues [2,3] that has a length of 10
so the solutions are:
$$-$$$----
$$--$$$---
$$---$$$--
$$----$$$-
$$-----$$$
-$$----$$$
--$$---$$$
---$$--$$$
----$$-$$$
-$$---$$$-
--$$-$$$--
I want to find all the possible solutions for a row.
I know that I have to consider each block separately, and that each block has an available space of n-(sum of remaining blocks' lengths + number of remaining blocks), but I do not know how to progress from here.
Well, this question already has a good answer, so think of this one more as an advertisement of Python's prowess.
def place(blocks,total):
    if not blocks: return ["-"*total]
    if blocks[0]>total: return []
    starts = total-blocks[0] #starts = 2 means possible starting indexes are [0,1,2]
    if len(blocks)==1: #this is special case
        return [("-"*i+"$"*blocks[0]+"-"*(starts-i)) for i in range(starts+1)]
    ans = []
    for i in range(total-blocks[0]): #append current solutions
        for sol in place(blocks[1:],starts-i-1): #with all possible other solutions
            ans.append("-"*i+"$"*blocks[0]+"-"+sol)
    return ans
To test it:
for i in place([2,3,2],12):
    print(i)
Which produces output like:
$$-$$$-$$---
$$-$$$--$$--
$$-$$$---$$-
$$-$$$----$$
$$--$$$-$$--
$$--$$$--$$-
$$--$$$---$$
$$---$$$-$$-
$$---$$$--$$
$$----$$$-$$
-$$-$$$-$$--
-$$-$$$--$$-
-$$-$$$---$$
-$$--$$$-$$-
-$$--$$$--$$
-$$---$$$-$$
--$$-$$$-$$-
--$$-$$$--$$
--$$--$$$-$$
---$$-$$$-$$
This is what I got:
#include <iostream>
#include <vector>
#include <string>
using namespace std;
typedef std::vector<bool> tRow;
void printRow(tRow row){
for (bool i : row){
std::cout << ((i) ? '$' : '-');
}
std::cout << std::endl;
}
int requiredCells(const std::vector<int> nums){
int sum = 0;
for (int i : nums){
sum += (i + 1); // The number + the at-least-one-cell gap at its right
}
return (sum == 0) ? 0 : sum - 1; // The right-most number doesn't need any gap
}
bool appendRow(tRow init, const std::vector<int> pendingNums, unsigned int rowSize, std::vector<tRow> &comb){
if (pendingNums.size() <= 0){
comb.push_back(init);
return false;
}
int cellsRequired = requiredCells(pendingNums);
if (cellsRequired > rowSize){
return false; // There are no combinations
}
tRow prefix;
int gapSize = 0;
std::vector<int> pNumsAux = pendingNums;
pNumsAux.erase(pNumsAux.begin());
unsigned int space = rowSize;
while ((gapSize + cellsRequired) <= rowSize){
space = rowSize;
space -= gapSize;
prefix.clear();
prefix = init;
for (int i = 0; i < gapSize; ++i){
prefix.push_back(false);
}
for (int i = 0; i < pendingNums[0]; ++i){
prefix.push_back(true);
space--;
}
if (space > 0){
prefix.push_back(false);
space--;
}
appendRow(prefix, pNumsAux, space, comb);
++gapSize;
}
return true;
}
std::vector<tRow> getCombinations(const std::vector<int> row, unsigned int rowSize) {
std::vector<tRow> comb;
tRow init;
appendRow(init, row, rowSize, comb);
return comb;
}
int main(){
std::vector<int> row = { 2, 3 };
auto ret = getCombinations(row, 10);
for (tRow r : ret){
while (r.size() < 10)
r.push_back(false);
printRow(r);
}
return 0;
}
And my output is:
$$-$$$----
$$--$$$---
$$---$$$--
$$----$$$-
$$-----$$$
-$$-$$$---
-$$--$$$--
-$$---$$$-
-$$----$$$
--$$-$$$--
--$$--$$$-
--$$---$$$
---$$-$$$-
---$$--$$$
----$$-$$$
There is surely room for improvement.
Note: I didn't test it beyond the case shown above.
Hope it works for you.
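One cheap way to sanity-check any enumerator like the ones in this thread is to compare the number of rows it produces with a closed-form count. With k blocks, the row has f = rowSize - (sum of blocks + k - 1) free cells to distribute over the k + 1 gaps, which gives C(f + k, k) placements. The sketch below is my own addition (the name countPlacements is made up) and only counts solutions, it does not list them.

```cpp
#include <cstdint>
#include <vector>

// Number of ways to place the blocks, in order and separated by at least
// one empty cell, in a row of rowSize cells.
std::uint64_t countPlacements(const std::vector<int>& blocks, int rowSize) {
    const int k = static_cast<int>(blocks.size());
    int required = 0;
    for (int b : blocks) required += b;
    if (k > 0) required += k - 1;          // one mandatory gap between neighbours
    if (required > rowSize) return 0;      // the blocks do not fit at all
    const int freeCells = rowSize - required;
    // Build C(freeCells + k, k) step by step; after step i the value is
    // C(freeCells + i, i), so every division is exact.
    std::uint64_t result = 1;
    for (int i = 1; i <= k; ++i) {
        result = result * static_cast<std::uint64_t>(freeCells + i) / i;
    }
    return result;
}
```

For clues [2,3] in a row of 10 this gives 15, and for [2,3,2] in a row of 12 it gives 20, which matches the number of rows printed in the listings above.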

Find word in string buffer/paragraph/text

This was asked in an Amazon telephone interview: "Can you write a program (in your preferred language, C/C++/etc.) to find a given word in a large string buffer, i.e. count its occurrences?"
I am still looking for the ideal answer that I should have given the interviewer. I tried to write a linear search (char-by-char comparison) and, obviously, I was rejected.
Given 40-45 minutes for a telephone interview, what algorithm was he/she looking for?
The KMP Algorithm is a popular string matching algorithm.
KMP Algorithm
Checking char by char is inefficient. If the string has 1000 characters and the keyword has 100 characters, you don't want to perform unnecessary comparisons. The KMP Algorithm handles many cases which can occur, but I imagine the interviewer was looking for the case where: When you begin (pass 1), the first 99 characters match, but the 100th character doesn't match. Now, for pass 2, instead of performing the entire comparison from character 2, you have enough information to deduce where the next possible match can begin.
// C program for implementation of KMP pattern searching
// algorithm
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
void computeLPSArray(char *pat, int M, int *lps);
void KMPSearch(char *pat, char *txt)
{
int M = strlen(pat);
int N = strlen(txt);
// create lps[] that will hold the longest prefix suffix
// values for pattern
int *lps = (int *)malloc(sizeof(int)*M);
int j = 0; // index for pat[]
// Preprocess the pattern (calculate lps[] array)
computeLPSArray(pat, M, lps);
int i = 0; // index for txt[]
while (i < N)
{
if (pat[j] == txt[i])
{
j++;
i++;
}
if (j == M)
{
printf("Found pattern at index %d \n", i-j);
j = lps[j-1];
}
// mismatch after j matches
else if (i < N && pat[j] != txt[i])
{
// Do not match lps[0..lps[j-1]] characters,
// they will match anyway
if (j != 0)
j = lps[j-1];
else
i = i+1;
}
}
free(lps); // to avoid memory leak
}
void computeLPSArray(char *pat, int M, int *lps)
{
int len = 0; // length of the previous longest prefix suffix
int i;
lps[0] = 0; // lps[0] is always 0
i = 1;
// the loop calculates lps[i] for i = 1 to M-1
while (i < M)
{
if (pat[i] == pat[len])
{
len++;
lps[i] = len;
i++;
}
else // (pat[i] != pat[len])
{
if (len != 0)
{
// This is tricky. Consider the example
// AAACAAAA and i = 7.
len = lps[len-1];
// Also, note that we do not increment i here
}
else // if (len == 0)
{
lps[i] = 0;
i++;
}
}
}
}
// Driver program to test above function
int main()
{
char *txt = "ABABDABACDABABCABAB";
char *pat = "ABABCABAB";
KMPSearch(pat, txt);
return 0;
}
This code is taken from a really good site that teaches algorithms:
Geeks for Geeks KMP
Amazon and similar companies expect knowledge of the Boyer–Moore string search and/or Knuth–Morris–Pratt algorithms.
Those are good if you want to show thorough knowledge. Otherwise, try to be creative and write something relatively elegant and efficient.
Did you ask about delimiters before you wrote anything? Extra information about the string buffer may simplify your task.
Even the code below could be acceptable (though it is far from efficient) if you state your assumptions in advance and properly explain the runtime, the space requirements and the choice of data containers.
int find( const std::string & the_word, const std::string & text )
{
    std::stringstream ss( text ); // !!! could be a really bad idea if 'text' is really big
    std::string word;
    std::unordered_map< std::string, int > umap;
    while( ss >> word ) ++umap[word]; // assumes that words are separated by white-space
    return umap[the_word];
}
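If the buffer is too big to copy into a stream and a map, a lighter sketch is to scan it in place with std::string::find and count the hits. The function name countOccurrences is made up for this example. Note the assumptions: it counts substring matches (so "the" is also found inside "other"), it counts overlapping matches, and the C++ standard does not promise any particular complexity for find, so this is not a replacement for KMP or Boyer–Moore when worst-case guarantees matter.

```cpp
#include <cstddef>
#include <string>

// Counts how many times `word` occurs in `text`, overlapping matches included.
// An empty word is treated as having no occurrences.
std::size_t countOccurrences(const std::string& text, const std::string& word) {
    if (word.empty()) return 0;
    std::size_t count = 0;
    std::size_t pos = text.find(word);
    while (pos != std::string::npos) {
        ++count;
        pos = text.find(word, pos + 1);  // resume one character after the last hit
    }
    return count;
}
```

To count only non-overlapping matches, resume the search at pos + word.size() instead of pos + 1.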

All of the option to replace an unknown number of characters

I am trying to find an algorithm that for an unknown number of characters in a string, produces all of the options for replacing some characters with stars.
For example, for the string "abc", the output should be:
*bc
a*c
ab*
**c
*b*
a**
***
It is simple enough with a known number of stars: just run through all of the options with for loops. But I'm having difficulty generating all of the options when the number of stars is not fixed.
Every star combination corresponds to a binary number, so you can use a simple loop
for i = 1 to 2^n-1
where n is the string length,
and put stars at the positions of the 1-bits in the binary representation of i.
For example: i=5=101b => * b *
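A minimal sketch of that loop, assuming the string is shorter than 64 characters so that the mask fits in an unsigned 64-bit integer. The function name starVariants is made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns every variant of s in which at least one character is replaced by '*'.
// Assumption: s.size() < 64, so that 2^n fits in an unsigned long long.
std::vector<std::string> starVariants(const std::string& s) {
    std::vector<std::string> out;
    const std::size_t n = s.size();
    for (unsigned long long mask = 1; mask < (1ULL << n); ++mask) {
        std::string t = s;
        for (std::size_t pos = 0; pos < n; ++pos) {
            // The leftmost character is the highest bit, so mask 5 = 101b
            // turns "abc" into "*b*".
            if (mask & (1ULL << (n - 1 - pos))) t[pos] = '*';
        }
        out.push_back(t);
    }
    return out;
}
```

For "abc" this yields 7 strings, starting with "ab*" (mask 1) and ending with "***" (mask 7). The order differs from the list in the question, which is grouped by number of stars.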
This is basically a binary increment problem.
You can create a vector of integer variables to represent a binary array isStar and for each iteration you "add one" to the vector.
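// Adds one to the binary number stored in isStar (most significant digit first).
// Returns true when the number overflows, i.e. after the all-stars combination.
bool AddOne (int* isStar, int size) {
    isStar[size - 1] += 1;
    for (int i = size - 1; i >= 0; i--) {
        if (isStar[i] > 1) {
            if (i == 0) { return true; }
            isStar[i] = 0;
            isStar[i - 1] += 1; // carry into the next digit
        }
    }
    return false;
}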
That way you still have the original string while replacing the characters
This is a simple binary counting problem, where * corresponds to a 1 and the original letter to a 0. So you could do it with a counter, applying a bit mask to the string, but it's just as easy to do the "counting" in place.
Here's a simple implementation in C++:
(Edit: The original question seems to imply that at least one character must be replaced with a star, so the count should start at 1 instead of 0. Or, in the following, the post-test do should be replaced with a pre-test for.)
#include <iostream>
#include <string>
// A cleverer implementation would implement C++'s iterator protocol.
// But that would cloud the simple logic of the algorithm.
class StarReplacer {
public:
StarReplacer(const std::string& s): original_(s), current_(s) {}
const std::string& current() const { return current_; }
// returns true unless we're at the last possibility (all stars),
// in which case it returns false but still resets current to the
// original configuration.
bool advance() {
for (int i = current_.size()-1; i >= 0; --i) {
if (current_[i] == '*') current_[i] = original_[i];
else {
current_[i] = '*';
return true;
}
}
return false;
}
private:
std::string original_;
std::string current_;
};
int main(int argc, const char** argv) {
for (int a = 1; a < argc; ++a) {
StarReplacer r(argv[a]);
do {
std::cout << r.current() << std::endl;
} while (r.advance());
std::cout << std::endl;
}
return 0;
}

random writing Markov Model efficiency

Here is my implementation.
However, it is a bit slow when analyzing the text file.
Does anyone have a better idea or a better data structure for implementing random writing?
I'm not using the STL, so don't worry about the syntax:
instead of push_back, Vector here uses .add, and
randomInteger generates a random integer within the given range.
I would like to produce 2000 characters if possible.
I think the slowest part is reading the file char by char?
void generateText(int order, string initSeed, string filename){
Map<string , Vector<char> > model;
char ch;
string key;
ifstream input(filename.c_str());
for(int i = 0; i < order; i++){
input.get(ch);
key+=ch;
}
while(input.get(ch)){
model[key].add(ch);
key = key.substr(1,key.length()-1) + ch;
}
string result;
string seed = initSeed;
for(int i = 0;i<2000;i++){
if (model[seed].size() >0) {
ch = model[seed][randomInteger(0, model[seed].size()-1)];
cout << ch;
seed = seed.substr(1,seed.length()-1) + ch;
}
else
return;
}
}
You need to determine that it is taking too long. (How is this code not running in less than a second on an average laptop?)
If it is, you need to profile.
For example, a likely candidate is the cost of generating random numbers...
You'll only disprove me by profiling ;)
I think it is a bit slow because it creates lots of temporary strings during the analysis phase.
for(int i = 0; i < order; i++){
input.get(ch);
key+=ch; // key = key + ch, at least one new string created
}
while(input.get(ch)){
model[key].add(ch); // key copied to hash table
key = key.substr(1,key.length()-1) + ch; // a couple of temp strings created
}
You could do instead like this:
char key[order + 1]; // pseudo code, won't work because order is not constant
key[order] = 0; /* NUL terminate */
for (int i = 0; i < order; i++) {
    input.get(key[i]);
}
while (input.get(ch)) {
    model[key].add(ch); // ch follows the current key
    for (int j = 0; j < order - 1; j++) {
        key[j] = key[j + 1]; // shift the window left by one character
    }
    key[order - 1] = ch; // the new character becomes the end of the key
}
Here the only string that is actually created is the one that ends up as a key in the hash table. The key is shifted in place in a simple character array, avoiding string temporaries.
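For comparison, here is a sketch of the same model using the standard library instead of the asker's Map and Vector classes. That is a deliberate substitution, and the names Model, buildModel and generate are made up for this example. It assumes the whole text has already been read into one std::string, and it looks each seed up once with find rather than calling operator[] three times per generated character. It still creates one key string per position while building the model.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

typedef std::unordered_map<std::string, std::vector<char>> Model;

// Maps every run of `order` characters in text to the characters that follow it.
Model buildModel(const std::string& text, std::size_t order) {
    Model model;
    if (order == 0 || text.size() <= order) return model;
    for (std::size_t i = 0; i + order < text.size(); ++i) {
        model[text.substr(i, order)].push_back(text[i + order]);
    }
    return model;
}

// Generates up to `length` characters, stopping early if the seed has no successor.
std::string generate(const Model& model, std::string seed,
                     std::size_t length, std::mt19937& rng) {
    std::string out;
    while (out.size() < length) {
        const auto it = model.find(seed);  // single lookup, never inserts
        if (it == model.end()) break;
        const std::vector<char>& next = it->second;
        std::uniform_int_distribution<std::size_t> pick(0, next.size() - 1);
        const char ch = next[pick(rng)];
        out += ch;
        seed.erase(0, 1);  // slide the seed window forward by one character
        seed += ch;
    }
    return out;
}
```

A call such as generate(buildModel(text, order), initSeed, 2000, rng) with a std::mt19937 rng corresponds to the 2000-character loop in the question. Whether this is actually faster than the original is something only profiling will show.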
