random writing Markov Model efficiency - algorithm

Here is my implementation
However, it is a bit slow when analyzing the textfile,
Anyone have a better idea or better data structure to implement Random writing?
Im not using the STL library so dun worry about the syntax.
instead of using push_back, vector here is using .add
randomInteger will generate randome integer between ranges
I would like to produce 2000 character if possible;
I think the slowest part is reading the file char by char?
void generateText(int order, string initSeed, string filename){
Map<string , Vector<char> > model;
char ch;
string key;
ifstream input(filename.c_str());
for(int i = 0; i < order; i++){
input.get(ch);
key+=ch;
}
while(input.get(ch)){
model[key].add(ch);
key = key.substr(1,key.length()-1) + ch;
}
string result;
string seed = initSeed;
for(int i = 0;i<2000;i++){
if (model[seed].size() >0) {
ch = model[seed][randomInteger(0, model[seed].size()-1)];
cout << ch;
seed = seed.substr(1,seed.length()-1) + ch;
}
else
return;
}
}

You need to determine that it is taking too long. (How is this code not running in less than a second on an average laptop?)
If it is, you need to profile.
For example, a likely candidate is the cost of generating random numbers...
You'll only disprove me by profiling ;)

I think it is a bit slow because it creates lots of temporary strings during the analysis phase.
for(int i = 0; i < order; i++){
input.get(ch);
key+=ch; // key = key + ch, at least one new string created
}
while(input.get(ch)){
model[key].add(ch); // key copied to hash table
key = key.substr(1,key.length()-1) + ch; // a couple of temp strings created
}
You could do instead like this:
char key[order + 1]; // pseudo code, won't work because order is not constant
key[order] = 0; /* NUL terminate */
for (int i = 0; i < order; i++) {
input.get(key[i]);
}
while (!(input.eof())) {
for (int j = 0; j < order - 1; k++) {
key[j] = key[j + 1];
}
input.get(key[order]);
model[key].add(ch);
}
Here the only string that is actually created is the string that ends up as a key in the hash table. The key is rotated in a simple character array, avoiding string temporaries.

Related

Find word in string buffer/paragraph/text

This was asked in Amazon telephonic interview - "Can you write a program (in your preferred language C/C++/etc.) to find a given word in a string buffer of big size ? i.e. number of occurrences "
I am still looking for perfect answer which I should have given to the interviewer.. I tried to write a linear search (char by char comparison) and obviously I was rejected.
Given a 40-45 min time for a telephonic interview, what was the perfect algorithm he/she was looking for ???
The KMP Algorithm is a popular string matching algorithm.
KMP Algorithm
Checking char by char is inefficient. If the string has 1000 characters and the keyword has 100 characters, you don't want to perform unnecessary comparisons. The KMP Algorithm handles many cases which can occur, but I imagine the interviewer was looking for the case where: When you begin (pass 1), the first 99 characters match, but the 100th character doesn't match. Now, for pass 2, instead of performing the entire comparison from character 2, you have enough information to deduce where the next possible match can begin.
// C program for implementation of KMP pattern searching
// algorithm
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
void computeLPSArray(char *pat, int M, int *lps);
void KMPSearch(char *pat, char *txt)
{
int M = strlen(pat);
int N = strlen(txt);
// create lps[] that will hold the longest prefix suffix
// values for pattern
int *lps = (int *)malloc(sizeof(int)*M);
int j = 0; // index for pat[]
// Preprocess the pattern (calculate lps[] array)
computeLPSArray(pat, M, lps);
int i = 0; // index for txt[]
while (i < N)
{
if (pat[j] == txt[i])
{
j++;
i++;
}
if (j == M)
{
printf("Found pattern at index %d \n", i-j);
j = lps[j-1];
}
// mismatch after j matches
else if (i < N && pat[j] != txt[i])
{
// Do not match lps[0..lps[j-1]] characters,
// they will match anyway
if (j != 0)
j = lps[j-1];
else
i = i+1;
}
}
free(lps); // to avoid memory leak
}
void computeLPSArray(char *pat, int M, int *lps)
{
int len = 0; // length of the previous longest prefix suffix
int i;
lps[0] = 0; // lps[0] is always 0
i = 1;
// the loop calculates lps[i] for i = 1 to M-1
while (i < M)
{
if (pat[i] == pat[len])
{
len++;
lps[i] = len;
i++;
}
else // (pat[i] != pat[len])
{
if (len != 0)
{
// This is tricky. Consider the example
// AAACAAAA and i = 7.
len = lps[len-1];
// Also, note that we do not increment i here
}
else // if (len == 0)
{
lps[i] = 0;
i++;
}
}
}
}
// Driver program to test above function
int main()
{
char *txt = "ABABDABACDABABCABAB";
char *pat = "ABABCABAB";
KMPSearch(pat, txt);
return 0;
}
This code is taken from a really good site that teaches algorithms:
Geeks for Geeks KMP
Amazon and companies alike expect knowledge of Boyer–Moore string search or / and Knuth–Morris–Pratt algorithms.
Those are good if you want to show perfect knowledge. Otherwise, try to be creative and write something relatively elegant and efficient.
Did you ask about delimiters before you wrote anything? It could be that they may simplify your task to provide some extra information about a string buffer.
Even code below could be ok (it's really not) if you provide enough information in advance, properly explain runtime, space requirements, choice of data containers.
int find( std::string & the_word, std::string & text )
{
std::stringstream ss( text ); // !!! could be really bad idea if 'text' is really big
std::string word;
std::unordered_map< std::string, int > umap;
while( ss >> text ) ++umap[text]; // you have to assume that each word separated by white-spaces.
return umap[the_word];
}

Sorting Structures

Basically I need to sort arrays of structs by value from highest to lowest.
I must read from file into structure and then sort it.
Initial information:
6
m k 250
f k 280
m p 240
f p 290
m s 63
f s 45
My attempt: (sorting part might be incorrect)
using namespace std;
struct clothes
{
char gender;
char type;
int price;
};
int main()
{
ifstream file("duomenys.txt");
int amount;
int end;
file >> amount;
end = amount;
clothes robe[amount];
for(int x = 0; x<amount; x++)
{
file >> robe[x].gender >> robe[x].type >> robe[x].price;
}
for(int x = amount - 1; x>0; x--)
{
for(int i = 0; i < end; i++)
{
if(robe[i].price > robe[i+1].price)
{
???
}
}
end--;
}
return 0;
}
I'm pretty new in programming so please keep your answer as beginner-friendly as possible as I don't know much.
How do I swap information between struct robe[0] and robe[1] and sort them after checking if price is higher ?
#Beta
So I came up with something that worked, I declared additional struct which I used for holding values while swapping, and then swapped each value (gender, type, price) seperately into holder seperately.
robeH.genderH = robe[i].gender;
robe[i].gender = robe[i+1].gender;
robe[i+1].gender = robeH.genderH;
robeH.typeH = robe[i].type;
robe[i].type = robe[i+1].type;
robe[i+1].type = robeH.typeH;
robeH.priceH = robe[i].price;
robe[i].price = robe[i+1].price;
robe[i+1].price = robeH.priceH;
Thanks for help
P.S. Is it good practice to do it like I did or is there better way

Parsing morse code

I am trying to solve this problem.
The goal is to determine the number of ways a morse string can be interpreted, given a dictionary of word.
What I did is that I first "translated" words from my dictionary into morse. Then, I used a naive algorithm, searching for all the ways it can be interpreted recursively.
#include <iostream>
#include <vector>
#include <map>
#include <string>
#include <iterator>
using namespace std;
string morse_string;
int morse_string_size;
map<char, string> morse_table;
unsigned int sol;
void matches(int i, int factor, vector<string> &dictionary) {
int suffix_length = morse_string_size-i;
if (suffix_length <= 0) {
sol += factor;
return;
}
map<int, int> c;
for (vector<string>::iterator it = dictionary.begin() ; it != dictionary.end() ; it++) {
if (((*it).size() <= suffix_length) && (morse_string.substr(i, (*it).size()) == *it)) {
if (c.find((*it).size()) == c.end())
c[(*it).size()] = 0;
else
c[(*it).size()]++;
}
}
for (map<int, int>::iterator it = c.begin() ; it != c.end() ; it++) {
matches(i+it->first, factor*(it->second), dictionary);
}
}
string encode_morse(string s) {
string ret = "";
for (unsigned int i = 0 ; i < s.length() ; ++i) {
ret += morse_table[s[i]];
}
return ret;
}
int main() {
morse_table['A'] = ".-"; morse_table['B'] = "-..."; morse_table['C'] = "-.-."; morse_table['D'] = "-.."; morse_table['E'] = "."; morse_table['F'] = "..-."; morse_table['G'] = "--."; morse_table['H'] = "...."; morse_table['I'] = ".."; morse_table['J'] = ".---"; morse_table['K'] = "-.-"; morse_table['L'] = ".-.."; morse_table['M'] = "--"; morse_table['N'] = "-."; morse_table['O'] = "---"; morse_table['P'] = ".--."; morse_table['Q'] = "--.-"; morse_table['R'] = ".-."; morse_table['S'] = "..."; morse_table['T'] = "-"; morse_table['U'] = "..-"; morse_table['V'] = "...-"; morse_table['W'] = ".--"; morse_table['X'] = "-..-"; morse_table['Y'] = "-.--"; morse_table['Z'] = "--..";
int T, N;
string tmp;
vector<string> dictionary;
cin >> T;
while (T--) {
morse_string = "";
cin >> morse_string;
morse_string_size = morse_string.size();
cin >> N;
for (int j = 0 ; j < N ; j++) {
cin >> tmp;
dictionary.push_back(encode_morse(tmp));
}
sol = 0;
matches(0, 1, dictionary);
cout << sol;
if (T)
cout << endl << endl;
}
return 0;
}
Now the thing is that I only have 3 seconds of execution time allowed, and my algorithm won't work under this limit of time.
Is this the good way to do this and if so, what am I missing ? Otherwise, can you give some hints about what is a good strategy ?
EDIT :
There can be at most 10 000 words in the dictionary and at most 1000 characters in the morse string.
A solution that combines dynamic programming with a rolling hash should work for this problem.
Let's start with a simple dynamic programming solution. We allocate an vector which we will use to store known counts for prefixes of morse_string. We then iterate through morse_string and at each position we iterate through all words and we look back to see if they can fit into morse_string. If they can fit then we use the dynamic programming vector to determine how many ways we could have build the prefix of morse_string up to i-dictionaryWord.size()
vector<long>dp;
dp.push_back(1);
for (int i=0;i<morse_string.size();i++) {
long count = 0;
for (int j=1;j<dictionary.size();j++) {
if (dictionary[j].size() > i) continue;
if (dictionary[j] == morse_string.substring(i-dictionary[j].size(),i)) {
count += dp[i-dictionary[j].size()];
}
}
dp.push_back(count);
}
result = dp[morse_code.size()]
The problem with this solution is that it is too slow. Let's say that N is the length of morse_string and M is the size of the dictionary and K is the size of the largest word in the dictionary. It will do O(N*M*K) operations. If we assume K=1000 this is about 10^10 operations which is too slow on most machines.
The K cost came from the line dictionary[j] == morse_string.substring(i-dictionary[j].size(),i)
If we could speed up this string matching to constant or log complexity we would be okay. This is where rolling hashing comes in. If you build a rolling hash array of morse_string then the idea is that you can compute the hash of any substring of morse_string in O(1). So you could then do hash(dictionary[j]) == hash(morse_string.substring(i-dictionary[j].size(),i))
This is good but in the presence of imperfect hashing you could have multiple words from the dictionary with the same hash. That would mean that after getting a hash match you would still need to match the strings as well as the hashes. In programming contests, people often assume perfect hashing and skip the string matching. This is often a safe bet especially on a small dictionary. In case it doesn't produce a perfect hashing (which you can check in code) you can always adjust your hash function slightly and maybe the adjusted hash function will produce a perfect hashing.

how to avoid number repeation by using random class in c#?

hi i am using Random class for getting random numbers but my requirement is once it generate one no that should not be repeate again pls help me.
Keep a list of the generated numbers and check this list before returning the next random.
Since you have not specified a language, I'll use C#
List<int> generated = new List<int>;
public int Next()
{
int r;
do { r = Random.Next() } while generated.Contains(r);
generated.Add(r);
return r;
}
The following C# code shows how to obtain 7 random cards with no duplicates. It is the most efficient method to use when your random number range is between 1 and 64 and are integers:
ulong Card, SevenCardHand;
int CardLoop;
const int CardsInDeck = 52;
Random RandObj = new Random(Seed);
for (CardLoop = 0; CardLoop < 7; CardLoop++)
{
do
{
Card = (1UL << RandObj.Next(CardsInDeck));
} while ((SevenCardHand & Card) != 0);
SevenCardHand |= Card;
}
If the random number range is greater than 64, then the next most efficient way to get random numbers without any duplicates is as follows from this C# code:
const int MaxNums = 1000;
int[] OutBuf = new int[MaxNums];
int MaxInt = 250000; // Reps the largest random number that should be returned.
int Loop, Val;
// Init the OutBuf with random numbers between 1 and MaxInt, which is 250,000.
BitArray BA = new BitArray(MaxInt + 1);
for (Loop = 0; Loop < MaxNums; Loop++)
{
// Avoid duplicate numbers.
for (; ; )
{
Val = RandObj.Next(MaxInt + 1);
if (BA.Get(Val))
continue;
OutBuf[Loop] = Val;
BA.Set(Val, true);
break;
}
}
The drawback with this technique is that it tends to use more memory, but it should be significantly faster than other approaches since it does not have to look through a large container each time a random number is obtained.

Most efficient way to sort parallel arrays in a restricted-feature language

The environment: I am working in a proprietary scripting language where there is no such thing as a user-defined function. I have various loops and local variables of primitive types that I can create and use.
I have two related arrays, "times" and "values". They both contain floating point values. I want to numerically sort the "times" array but have to be sure that the same operations are applied on the "values" array. What's the most efficient way I can do this without the benefit of things like recursion?
You could maintain an index table and sort the index table instead.
This way you will not have to worry about times and values being consistent.
And whenever you need a sorted value, you can lookup on the sorted index.
And if in the future you decided there was going to be a third value, the sorting code will not need any changes.
Here's a sample in C#, but it shouldn't be hard to adapt to your scripting language:
static void Main() {
var r = new Random();
// initialize random data
var index = new int[10]; // the index table
var times = new double[10]; // times
var values = new double[10]; // values
for (int i = 0; i < 10; i++) {
index[i] = i;
times[i] = r.NextDouble();
values[i] = r.NextDouble();
}
// a naive bubble sort
for (int i = 0; i < 10; i++)
for (int j = 0; j < 10; j++)
// compare time value at current index
if (times[index[i]] < times[index[j]]) {
// swap index value (times and values remain unchanged)
var temp = index[i];
index[i] = index[j];
index[j] = temp;
}
// check if the result is correct
for (int i = 0; i < 10; i++)
Console.WriteLine(times[index[i]]);
Console.ReadKey();
}
Note: I used a naive bubble sort there, watchout. In your case, an insertion sort is probably a good candidate. Since you don't want complex recursions.
Just take your favourite sorting algorithm (e.g. Quicksort or Mergesort) and use it to sort the "values" array. Whenever two values are swapped in "values", also swap the values with the same indices in the "times" array.
So basically you can take any fast sorting algorithm and modify the swap() operation so that elements in both arrays are swapped.
Take a look at the Bottom-Up mergesort at Algorithmist. It's a non-recursive way of performing a mergesort. The version presented there uses function calls, but that can be inlined easily enough.
Like martinus said, every time you change a value in one array, do the exact same thing in the parallel array.
Here's a C-like version of a stable-non-recursive mergesort that makes no function calls, and uses no recursion.
const int arrayLength = 40;
float times_array[arrayLength];
float values_array[arrayLength];
// Fill the two arrays....
// Allocate two buffers
float times_buffer[arrayLength];
float values_buffer[arrayLength];
int blockSize = 1;
while (blockSize <= arrayLength)
{
int i = 0;
while (i < arrayLength-blockSize)
{
int begin1 = i;
int end1 = begin1 + blockSize;
int begin2 = end1;
int end2 = begin2 + blockSize;
int bufferIndex = begin1;
while (begin1 < end1 && begin2 < end2)
{
if ( values_array[begin1] > times_array[begin2] )
{
times_buffer[bufferIndex] = times_array[begin2];
values_buffer[bufferIndex++] = values_array[begin2++];
}
else
{
times_buffer[bufferIndex] = times_array[begin1];
values_buffer[bufferIndex++] = values_array[begin1++];
}
}
while ( begin1 < end1 )
{
times_buffer[bufferIndex] = times_array[begin1];
values_buffer[bufferIndex++] = values_array[begin1++];
}
while ( begin2 < end2 )
{
times_buffer[bufferIndex] = times_array[begin2];
values_buffer[bufferIndex++] = values_array[begin2++];
}
for (int k = i; k < i + 2 * blockSize; ++k)
{
times_array[k] = times_buffer[k];
values_array[k] = values_buffer[k];
}
i += 2 * blockSize;
}
blockSize *= 2;
}
I wouldn't suggest writing your own sorting routine, as the sorting routines provided as part of the Java language are well optimized.
The way I'd solve this is to copy the code in the java.util.Arrays class into your own class i.e. org.mydomain.util.Arrays. And add some comments telling yourself not to use the class except when you must have the additional functionality that you're going to add. The Arrays class is quite stable so this is less, less ideal than it would seem, but it's still less than ideal. However, the methods you need to change are private, so you've no real choice.
You then want to create an interface along the lines of:
public static interface SwapHook {
void swap(int a, int b);
}
You then need to add this to the sort method you're going to use, and to every subordinate method called in the sorting procedure, which swaps elements in your primary array. You arrange for the hook to get called by your modified sorting routine, and you can then implement the SortHook interface to achieve the behaviour you want in any secondary (e.g. parallel) arrays.
HTH.

Resources