How to find the most frequent character in a string, written in pseudocode

Most Frequent Character
Design a program that prompts the user to enter a string, and displays the character that appears most frequently in the string.
It is a homework question, but my teacher wasn't helpful and it's driving me crazy; I can't figure this out.
Thank you in advance.
This is what I have so far:
Declare String str
Declare Integer maxChar
Declare Integer index
Set maxChar = 0
Display “Enter anything you want.”
Input str
For index = 0 To length(str) - 1
If str[index] =
And now I'm stuck. I don't think it's right and I don't know where to go with it!

It seems to me that the way you want to do it is:
"Go through every character in the string and remember the character we've seen most times".
However, that won't work. If we only remember the count for a single character, such as "the character we've seen most often is 'a' with 5 occurrences", we can't tell whether the character in second place overtakes it later in the string.
So, what you have to do is this:
Go through every character of the string.
For every character, increase the occurrence count for that character. Yes, you have to save this count for every single character you encounter. Simple variables like string or int are not going to be enough here.
When you're done, you're left with a bunch of data looking like "a"=5, "b"=2, "e"=7,... You have to go through that and find the highest number (I'm sure you can find examples for finding the highest number in a sequence), then return the letter it corresponds to.
Not a complete answer, I know, but that's all I'm going to say.
If you're stuck, I suggest getting a pen and a piece of paper and trying to calculate it manually. Try to think - how would you do it without a computer? If your answer is "look and see", what if the text is 10 pages? I know it can be pretty confusing, but the point of all this is to get you used to a different way of thinking. If you figure this one out, the next time will be easier because the basic principles are always the same.
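The two steps described above (count every character, then find the highest count) can be sketched in Java as follows. This is only one possible shape, not the expected homework answer: the class name MostFrequentChar, the method name mostFrequent, and the tie-breaking rule (the character seen first wins) are assumptions made for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MostFrequentChar {
    // Returns the character that occurs most often in text.
    // Assumption: on a tie, the character that appeared first wins.
    static char mostFrequent(String text) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("text must not be empty");
        }
        // Step 1: keep one counter per distinct character.
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : text.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        // Step 2: scan the counters for the highest value.
        char best = text.charAt(0);
        int bestCount = 0;
        for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(mostFrequent("aabcabccc")); // prints c
    }
}
```

The map plays the role of the "count for every single character" that plain string or integer variables cannot hold.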

This is the code I wrote to count all occurrences of each character in a string.
String abc = "aabcabccc";
char[] x = abc.toCharArray();
String _array = "";
for (int i = 0; i < x.length; i++) // copy each distinct character to a new string
{
    if (_array.indexOf(x[i]) == -1)
        _array = _array + x[i];
}
char[] y = _array.toCharArray();
int[] count1 = new int[_array.length()];
for (int j = 0; j < x.length; j++) // count occurrences
{
    // the position of a character in _array is also the index of its counter
    count1[_array.indexOf(x[j])]++;
}
for (int i = 0; i < y.length; i++) // display
{
    System.out.println(y[i] + " = " + count1[i]);
}

Related

Replace a part of string in big array with replace method and Java 8

I have an array String[] values = new String[100], and I need to check all strings from index 10 to 35 using Java 8, because I don't want to do it with if and else.
For example:
for (int i = 10; i <= 35; i++){
if (values[i].contains("something")) {
values[i].replace("something", "something else");
}
}
How can I do this with Java 8 and a small amount of code?
Help me please.
Here's an alternative approach that might be a bit cleaner:
Arrays.asList(values).subList(10, 35 + 1)
.replaceAll(s -> s.replace("something", "something else"));
NOTES:
subList() takes a half-open interval, hence the second argument has +1.
The result of String.replace() has to be assigned or returned, not thrown away, since of course it can't modify the original string.
There's no point in calling String.contains() since that's implemented in terms of String.indexOf(). One of the first things String.replace() does is to call String.indexOf() and bail out if the string isn't found.
I can think of this, but it's not any more readable or effective than what you already have in place:
IntStream.rangeClosed(10, 35)
.forEach(ix -> values[ix] = values[ix].replace("something", "something2"));
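For anyone who wants to try the subList approach end to end, here is a small self-contained sketch. The class name ReplaceRange, the helper replaceInRange, and the sample data are made up for the demonstration, and it assumes no element in the range is null (a null would throw a NullPointerException inside the lambda).

```java
import java.util.Arrays;

class ReplaceRange {
    // Replaces target with replacement in values[from..to], both ends inclusive.
    static void replaceInRange(String[] values, int from, int to,
                               String target, String replacement) {
        // Arrays.asList returns a view backed by the array, so writes go through.
        Arrays.asList(values)
              .subList(from, to + 1) // subList's end index is exclusive
              .replaceAll(s -> s.replace(target, replacement));
    }

    public static void main(String[] args) {
        String[] values = new String[100];
        Arrays.fill(values, "has something");
        replaceInRange(values, 10, 35, "something", "something else");
        System.out.println(values[9]);  // prints has something
        System.out.println(values[10]); // prints has something else
        System.out.println(values[35]); // prints has something else
        System.out.println(values[36]); // prints has something
    }
}
```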

Storing pointers wrong/not using Unordered_map.find correctly

So the title essentially says it all. I am writing a symbol table in C++ for a compiler project I am working on, and all is going well except for looking up identifiers in the table.
So this is how I store into the table (pseudo like):
vector<symbolTable*>* symbolStack = new symbolTable();
//where a symbolStack is a vector of unordered_maps (symbolTables),
//each iteration in vector referencing a new block of code.
string* check = new string(root->children[0]->lexicode->c_str());
symbol* sym = new symbol();
...... //setting sym info
symbol_entry pair = make_pair(check, test)
//the unordered_map has keys of (string*, symbol*)
symbolStack[tableNumber]->insert(pair);
I am pretty sure that this works, as I have tested printing the size and info from the map and it all seems to be stored as expected. Here is where the problem is happening for me (this takes place in a different function later):
for(int i = 0; i <= tableNumber; i++){
    auto finder = symbolStack[i]->find(checkS); //checkS == check from above
    if(finder == symbolStack[i]->end()) cout<<"not found";
    else cout<<"we did it!!!!";
}
My else is never reached. However, if I do this assuming the string*->c_str() == "test":
cout<<string->c_str(); // prints out "test"
cout<<finder->second->c_str() //prints out "test".
So the question: why is it finding the key, and knowing it found the key, but at the same time reporting that it has reached the end of the symbol stack without finding it? I have been trying to figure this out for a good 4 days solid now. Is it that my pointers are somehow off? Any insight is greatly appreciated.
So, somewhat of an answer to my own question.
First I will say this: I have concluded that comparison with find() or similar methods does not work because for some reason the pointers do not match up. I still have no clue why this is, or what I am doing wrong.
What I did to solve my issue and complete my code is this:
for(int k = 0; k<= tableNumber; k++){
unordered_map<string*,symbol*>::iterator it;
for(it = symbolStack[k]->begin(); it != symbolStack[k]->end(); it++)
{
string a = targetString->c_str();
string b = it->first->c_str();
if(a.compare(b) == 0) cout<<"You have found the match! \n";
}
}
}
So this shows how to get it working pragmatically if somebody else is in a similar boat; however, it does not really answer why my other attempt failed, other than noticing that the pointer values were different.
In symbolTable you store pointers to strings as keys, not strings themselves. Therefore unordered_map compares pointers, not strings, and cannot find matching items. When you reconstruct the key string (as in your answer, using string b = it->first->c_str()), the comparison on strings works again. So, either you need to store string instead of string * in symbolTable, or you need to provide your own comparison function that will compare keys of type string *.
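The difference between comparing keys by address and comparing them by content can be shown with a rough Java analogy. This is not the C++ code from the question: IdentityHashMap stands in for a map keyed on string* (it compares keys with ==), and HashMap stands in for a map keyed on string (it compares keys by content). The class and method names are invented for the illustration.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class KeyComparisonDemo {
    // Looks up an equal-but-distinct key in a map that compares keys by identity.
    static boolean foundByIdentity() {
        String stored = new String("test");
        String lookup = new String("test"); // same characters, different object
        Map<String, Integer> table = new IdentityHashMap<>();
        table.put(stored, 1);
        return table.containsKey(lookup);
    }

    // Same lookup in a map that compares keys by content (equals/hashCode).
    static boolean foundByValue() {
        String stored = new String("test");
        String lookup = new String("test");
        Map<String, Integer> table = new HashMap<>();
        table.put(stored, 1);
        return table.containsKey(lookup);
    }

    public static void main(String[] args) {
        System.out.println("identity keys: " + foundByIdentity()); // prints identity keys: false
        System.out.println("value keys:    " + foundByValue());    // prints value keys:    true
    }
}
```

Both strings print as "test", which is why printing the key looks fine while the lookup still fails.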

Converting an if code into forloop statement

Right now I have to write code that prints "*" for each collected dot; if none are collected (collectedDots == 0), it should print "". Using too many if statements looks messy, and I was wondering how you would implement a for loop in this case.
As a general principle the kind of rearrangement you've done here is good. You have found a way to express the rule in a general way rather than as a sequence of special cases. This is much easier to reason about and to check, and it's obviously extensible to cases where you have more than 3 dots.
You have probably confused your target number with the iteration value. I assume that collectedDots contains the number of dots you have (as per your if statement), so you need to introduce a separate variable that counts up to that value:
for (int i = 0; i < collectedDots; i++)
{
    stars = "*";
    System.out.print(stars);
}
Ok, so you already have a variable called collectedDots that is a number which tells you how many stars to print?
So your loop would be something like
for every collected dot
print *
But you can't just print it out, you need to return a string that will be printed out. So it's more like
for every collected dot
add a * to our string
return the string
The key difference between this and your attempt so far is that you were assigning a star to be your string each time through the loop, then at the end of it, you return that string. No matter how many times you assign a star to the string, the string will always just be one star.
You also need a separate variable to keep track of your loop, this should do the trick:
String stars = "";
for(int i = 0; i < collectedDots; i++)
{
stars = stars + "*";
}
return stars;
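Wrapped in a method, the loop above might look like the sketch below. The class name Stars and method name stars are placeholders, and StringBuilder is swapped in for repeated string concatenation; the behaviour is otherwise the same.

```java
class Stars {
    // Builds a string with one '*' per collected dot; zero dots gives "".
    static String stars(int collectedDots) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < collectedDots; i++) {
            sb.append('*');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + stars(0) + "]"); // prints []
        System.out.println("[" + stars(3) + "]"); // prints [***]
    }
}
```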
You are almost correct. You just need to change the loop's range and append to the string instead of overwriting it. Here the loop counter starts at 1, so when collectedDots = 0 the loop body never runs and the method returns "", because stars is initialized to "" before the loop.
String stars = "";
for (int i = 1; i <= collectedDots; i++)
{
    stars = stars + "*";
}
return stars;

Making a list of integers more human friendly

This is a bit of a side project I have taken on to solve a no-fix issue for work. Our system outputs a code to represent a combination of things on another thing. Some example codes are:
9-9-0-4-4-5-4-0-2-0-0-0-2-0-0-0-0-0-2-1-2-1-2-2-2-4
9-5-0-7-4-3-5-7-4-0-5-1-4-2-1-5-5-4-6-3-7-9-72
9-15-0-9-1-6-2-1-2-0-0-1-6-0-7
The max number in one of the slots I've seen so far is about 150 but they will likely go higher.
When the system was designed there was no requirement for what this code would look like. But now the client wants to be able to type it in by hand from a sheet of paper, something the code above isn't suited for. We've said we won't do anything about it, but it seems like a fun challenge to take on.
My question is where is a good place to start loss-less compressing this code? Obvious solutions such as store this code with a shorter key are not an option; our database is read only. I need to build a two way method to make this code more human friendly.
1) I agree that you definitely need a checksum - data entry errors are very common, unless you have really well trained staff and independent duplicate keying with automatic cross-checking.
2) I suggest http://en.wikipedia.org/wiki/Huffman_coding to turn your list of numbers into a stream of bits. To get the probabilities required for this, you need a decent sized sample of real data, so you can make a count, setting Ni to the number of times number i appears in the data. Then I suggest setting Pi = (Ni + 1) / (Sum_i (Ni + 1)) - which smooths the probabilities a bit. Also, with this method, if you see e.g. numbers 0-150 you could add a bit of slack by entering numbers 151-255 and setting them to Ni = 0. Another way round rare large numbers would be to add some sort of escape sequence.
3) Finding a way for people to type the resulting sequence of bits is really an applied psychology problem but here are some suggestions of ideas to pinch.
3a) Software licences - just encode six bits per character in some 64-character alphabet, but group characters in a way that makes it easier for people to keep place e.g. BC017-06777-14871-160C4
3b) UK car license plates. Use a change of alphabet to show people how to group characters e.g. ABCD0123EFGH4567IJKL...
3c) A really large alphabet - get yourself a list of 2^n words for some decent sized n and encode n bits as a word e.g. GREEN ENCHANTED LOGICIAN... -
I worried about this problem a while back. It turns out that you can't do much better than base64: trying to squeeze a few more bits per character isn't really worth the effort (once you get into "strange" numbers of bits, encoding and decoding become more complex). At the same time, you end up with something that's likely to have errors when entered (confusing a 0 with an O, etc.). One option is to choose a modified set of characters and letters (so it's still base 64, but, say, you substitute ">" for "0"). Another is to add a checksum. Again, for simplicity of implementation, I felt the checksum approach was better.
Unfortunately I never got any further (things changed direction), so I can't offer code or a particular checksum choice.
PS: I realised there's a missing step I didn't explain: I was going to compress the text into some binary form before encoding (using some standard compression algorithm). So to summarize: compress, add checksum, base64 encode; base64 decode, check checksum, decompress.
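The pipeline summarized above (pack to binary, add checksum, base64 encode, and the reverse) could be sketched in Java roughly like this. Several choices here are assumptions, since the answer names none: a simple variable-length integer packing stands in for "some standard compression algorithm", the checksum is the low byte of a CRC-32, and the class name FriendlyCode is invented. It also assumes every slot is a non-negative integer without leading zeros. The plain base64 alphabet still contains look-alike characters, so a modified alphabet may be worth swapping in.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.zip.CRC32;

class FriendlyCode {
    // Packs a dash-separated code into bytes, appends a checksum byte, base64-encodes.
    static String encode(String dashed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String part : dashed.split("-")) {
            int n = Integer.parseInt(part);
            // Varint: 7 data bits per byte, high bit set when more bytes follow.
            while (n >= 0x80) {
                out.write((n & 0x7F) | 0x80);
                n >>>= 7;
            }
            out.write(n);
        }
        byte[] payload = out.toByteArray();
        byte[] framed = Arrays.copyOf(payload, payload.length + 1);
        framed[payload.length] = checksum(payload, payload.length);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(framed);
    }

    // Reverses encode; throws IllegalArgumentException if the checksum does not match.
    static String decode(String code) {
        byte[] framed = Base64.getUrlDecoder().decode(code);
        int len = framed.length - 1;
        if (len < 1 || framed[len] != checksum(framed, len)) {
            throw new IllegalArgumentException("checksum mismatch");
        }
        List<String> parts = new ArrayList<>();
        int n = 0;
        int shift = 0;
        for (int i = 0; i < len; i++) {
            n |= (framed[i] & 0x7F) << shift;
            if ((framed[i] & 0x80) != 0) {
                shift += 7; // more bytes belong to this number
            } else {
                parts.add(Integer.toString(n));
                n = 0;
                shift = 0;
            }
        }
        return String.join("-", parts);
    }

    // Hypothetical checksum choice: low 8 bits of CRC-32 over the first length bytes.
    static byte checksum(byte[] data, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, length);
        return (byte) crc.getValue();
    }

    public static void main(String[] args) {
        String original = "9-15-0-9-1-6-2-1-2-0-0-1-6-0-7";
        String code = encode(original);
        System.out.println(code);
        System.out.println(decode(code).equals(original)); // prints true
    }
}
```

A single checksum byte only catches most typing errors, not all of them; a longer checksum trades code length for safety.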
This is similar to what I have used in the past. There are certainly better ways of doing this, but I used this method because it was easy to mirror in Transact-SQL, which was a requirement at the time. You could certainly modify this to incorporate Huffman encoding if the distribution of your IDs is non-random, but it's probably unnecessary.
You didn't specify a language, so this is in C#, but it should be very easy to port to any language. In the lookup table you'll see that commonly confused characters are omitted. This should speed up entry. I also had the requirement to have a fixed length, but it would be easy for you to modify this.
static public class CodeGenerator
{
static Dictionary<int, char> _lookupTable = new Dictionary<int, char>();
static CodeGenerator()
{
PrepLookupTable();
}
private static void PrepLookupTable()
{
_lookupTable.Add(0,'3');
_lookupTable.Add(1,'2');
_lookupTable.Add(2,'5');
_lookupTable.Add(3,'4');
_lookupTable.Add(4,'7');
_lookupTable.Add(5,'6');
_lookupTable.Add(6,'9');
_lookupTable.Add(7,'8');
_lookupTable.Add(8,'W');
_lookupTable.Add(9,'Q');
_lookupTable.Add(10,'E');
_lookupTable.Add(11,'T');
_lookupTable.Add(12,'R');
_lookupTable.Add(13,'Y');
_lookupTable.Add(14,'U');
_lookupTable.Add(15,'A');
_lookupTable.Add(16,'P');
_lookupTable.Add(17,'D');
_lookupTable.Add(18,'S');
_lookupTable.Add(19,'G');
_lookupTable.Add(20,'F');
_lookupTable.Add(21,'J');
_lookupTable.Add(22,'H');
_lookupTable.Add(23,'K');
_lookupTable.Add(24,'L');
_lookupTable.Add(25,'Z');
_lookupTable.Add(26,'X');
_lookupTable.Add(27,'V');
_lookupTable.Add(28,'C');
_lookupTable.Add(29,'N');
_lookupTable.Add(30,'B');
}
public static bool TryPCodeDecrypt(string iPCode, out Int64 oDecryptedInt)
{
//Prep the result so we can exit without having to fiddle with it if we hit an error.
oDecryptedInt = 0;
if (iPCode.Length > 3)
{
Char[] Bits = iPCode.ToCharArray(0,iPCode.Length-2);
int CheckInt7 = 0;
int CheckInt3 = 0;
if (!int.TryParse(iPCode[iPCode.Length-1].ToString(),out CheckInt7) ||
!int.TryParse(iPCode[iPCode.Length-2].ToString(),out CheckInt3))
{
//Unsuccessful -- the last check ints are not integers.
return false;
}
//Adjust the CheckInts to the right values.
CheckInt3 -= 2;
CheckInt7 -= 2;
int COffset = iPCode.LastIndexOf('M')+1;
Int64 tempResult = 0;
int cBPos = 0;
while ((cBPos + COffset) < Bits.Length)
{
//Calculate the current position.
int cNum = 0;
foreach (int cKey in _lookupTable.Keys)
{
if (_lookupTable[cKey] == Bits[cBPos + COffset])
{
cNum = cKey;
}
}
tempResult += cNum * (Int64)Math.Pow((double)31, (double)(Bits.Length - (cBPos + COffset + 1)));
cBPos += 1;
}
if (tempResult % 7 == CheckInt7 && tempResult % 3 == CheckInt3)
{
oDecryptedInt = tempResult;
return true;
}
return false;
}
else
{
//Unsuccessful -- too short.
return false;
}
}
public static string PCodeEncrypt(int iIntToEncrypt, int iMinLength)
{
int Check7 = (iIntToEncrypt % 7) + 2;
int Check3 = (iIntToEncrypt % 3) + 2;
StringBuilder result = new StringBuilder();
result.Insert(0, Check7);
result.Insert(0, Check3);
int workingNum = iIntToEncrypt;
while (workingNum > 0)
{
result.Insert(0, _lookupTable[workingNum % 31]);
workingNum /= 31;
}
if (result.Length < iMinLength)
{
for (int i = result.Length + 1; i <= iMinLength; i++)
{
result.Insert(0, 'M');
}
}
return result.ToString();
}
}

How to split a string into words. Ex: "stringintowords" -> "String Into Words"?

What is the right way to split a string into words ?
(string doesn't contain any spaces or punctuation marks)
For example: "stringintowords" -> "String Into Words"
Could you please advise what algorithm should be used here ?
Update: For those who think this question is just for curiosity: this algorithm could be used to camelcase domain names ("sportandfishing .com" -> "SportAndFishing .com"), and it is currently used by aboutus dot org to do this conversion dynamically.
Let's assume that you have a function isWord(w), which checks if w is a word using a dictionary. Let's for simplicity also assume for now that you only want to know whether for some word w such a splitting is possible. This can be easily done with dynamic programming.
Let S[1..length(w)] be a table with Boolean entries. S[i] is true if the prefix w[1..i] can be split into words. Then set S[1] = isWord(w[1]) and, for i = 2 to length(w), calculate
S[i] = isWord(w[1..i]) or (for some j in {2..i}: S[j-1] and isWord(w[j..i])).
This takes O(length(w)^2) time if dictionary queries are constant time. To actually find the splitting, just store the winning split in each S[i] that is set to true. This can also be adapted to enumerate all solutions by storing all such splits.
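A minimal Java sketch of this dynamic program follows, using 0-based indices instead of the 1-based ones above. The names WordSplitter, split and prev are made up for the example, and a HashSet stands in for isWord. Instead of a Boolean table it stores, for each prefix, where its last word starts, which is one way to "store the winning split".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class WordSplitter {
    // Returns one split of text into dictionary words, or null if none exists.
    static List<String> split(String text, Set<String> dictionary) {
        int n = text.length();
        // prev[i] is the start index of the last word in some valid split of the
        // first i characters, or -1 if that prefix cannot be split.
        int[] prev = new int[n + 1];
        Arrays.fill(prev, -1);
        prev[0] = 0; // the empty prefix is trivially splittable
        for (int i = 1; i <= n; i++) {
            for (int j = 0; j < i; j++) {
                if (prev[j] != -1 && dictionary.contains(text.substring(j, i))) {
                    prev[i] = j;
                    break;
                }
            }
        }
        if (prev[n] == -1) {
            return null;
        }
        // Walk the stored split points backwards to recover the words.
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = prev[end]) {
            words.add(0, text.substring(prev[end], end));
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("string", "into", "words"));
        System.out.println(split("stringintowords", dictionary)); // prints [string, into, words]
    }
}
```

The double loop performs O(n^2) dictionary lookups; building each substring adds to the real cost, which a trie-based isWord could reduce.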
As mentioned by many people here, this is a standard, easy dynamic programming problem: the best solution is given by Falk Hüffner. Additional info though:
(a) you should consider implementing isWord with a trie, which will save you a lot of time if you use properly (that is by incrementally testing for words).
(b) typing "segmentation dynamic programming" yields a score of more detailed answers, from university-level lectures with pseudocode algorithms, such as this lecture at Duke (which even goes so far as to provide a simple probabilistic approach to deal with words that won't be contained in any dictionary).
There should be a fair bit in the academic literature on this. The key words you want to search for are word segmentation. This paper looks promising, for example.
In general, you'll probably want to learn about markov models and the viterbi algorithm. The latter is a dynamic programming algorithm that may allow you to find plausible segmentations for a string without exhaustively testing every possible segmentation. The essential insight here is that if you have n possible segmentations for the first m characters, and you only want to find the most likely segmentation, you don't need to evaluate every one of these against subsequent characters - you only need to continue evaluating the most likely one.
If you want to ensure that you get this right, you'll have to use a dictionary based approach and it'll be horrendously inefficient. You'll also have to expect to receive multiple results from your algorithm.
For example: windowsteamblog (of http://windowsteamblog.com/ fame)
windows team blog
window steam blog
Consider the sheer number of possible splittings for a given string. If you have n characters in the string, there are n-1 possible places to split. For example, for the string cat, you can split before the a and you can split before the t. This results in 4 possible splittings.
You could look at this problem as choosing where you need to split the string. You also need to choose how many splits there will be. So there are Sum(i = 0 to n - 1, n - 1 choose i) possible splittings. By the binomial theorem, with x and y both being 1, this is equal to pow(2, n-1).
Granted, a lot of this computation rests on common subproblems, so dynamic programming might speed up your algorithm. Off the top of my head, computing a Boolean matrix M such that M[i,j] is true if and only if the substring of your given string from i to j is a word would help out quite a bit. You still have an exponential number of possible segmentations, but you would quickly be able to eliminate a segmentation if an early split did not form a word. A solution would then be a sequence of integers (i0, j0, i1, j1, ...) with the condition that j sub k = i sub (k + 1).
If your goal is to correctly camel-case URLs, I would sidestep the problem and go for something a little more direct: get the homepage for the URL, remove any spaces and capitalization from the source HTML, and search for your string. If there is a match, find that section in the original HTML and return it. You'd need an array NumSpaces that records how much whitespace occurs in the original string before each character, like so:
Needle: isashort
Haystack: This is a short phrase
Preprocessed: thisisashortphrase
NumSpaces : 000011233333444444
And your answer would come from:
location = Preprocessed.Search(Needle)
locationInOriginal = location + NumSpaces[location]
originalLength = Needle.length() + NumSpaces[location + Needle.length() - 1] - NumSpaces[location]
Haystack.substring(locationInOriginal, originalLength)
Of course, this would break if madduckets.com did not have "Mad Duckets" somewhere on the home page. Alas, that is the price you pay for avoiding an exponential problem.
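The NumSpaces bookkeeping can be sketched in Java as below. The class and method names are invented, only the plain space character is treated as whitespace, and the needle is assumed to be lowercase without spaces. Note that the space count for the end of the match is read at the last matched character, so a trailing space is not included.

```java
class OriginalPhraseFinder {
    // Finds needle in haystack while ignoring spaces and case, and returns
    // the matching slice of the original haystack, or null if there is none.
    static String find(String haystack, String needle) {
        StringBuilder stripped = new StringBuilder();
        // numSpaces[k] = number of spaces before the k-th non-space character.
        int[] numSpaces = new int[haystack.length()];
        int spaces = 0;
        for (char c : haystack.toCharArray()) {
            if (c == ' ') {
                spaces++;
            } else {
                numSpaces[stripped.length()] = spaces;
                stripped.append(Character.toLowerCase(c));
            }
        }
        int location = stripped.indexOf(needle);
        if (location < 0 || needle.isEmpty()) {
            return null;
        }
        int start = location + numSpaces[location];
        int last = location + needle.length() - 1; // last matched character
        int end = last + numSpaces[last] + 1;      // exclusive end in the original
        return haystack.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(find("This is a short phrase", "isashort")); // prints is a short
    }
}
```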
This can be actually done (to a certain degree) without dictionary. Essentially, this is an unsupervised word segmentation problem. You need to collect a large list of domain names, apply an unsupervised segmentation learning algorithm (e.g. Morfessor) and apply the learned model for new domain names. I'm not sure how well it would work, though (but it would be interesting).
This is basically a variation of a knapsack problem, so what you need is a comprehensive list of words and any of the solutions covered in Wiki.
With fairly-sized dictionary this is going to be insanely resource-intensive and lengthy operation, and you cannot even be sure that this problem will be solved.
Create a list of possible words and sort it from long words to short words.
Check each entry in the list against the first part of the string. If it matches, remove that part from the string and append it to your sentence with a space. Repeat this.
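A simple Java solution that needs O(n^2) dictionary lookups thanks to memoization (substring creation and hashing add to the real cost).
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

public class Solution {
    // Should contain the list of all words; any other lookup structure (e.g. a trie) works too.
    private final Set<String> dictionary;

    public Solution(Set<String> dictionary) {
        this.dictionary = new HashSet<>(dictionary);
    }

    // Returns the words separated by spaces, or null if s cannot be split.
    public String parse(String s) {
        return parse(s, new HashMap<String, String>());
    }

    // map caches the result for every suffix already tried, including failures (null).
    public String parse(String s, HashMap<String, String> map) {
        if (map.containsKey(s)) {
            return map.get(s);
        }
        if (dictionary.contains(s)) {
            return s;
        }
        for (int left = 1; left < s.length(); left++) {
            String leftSub = s.substring(0, left);
            if (!dictionary.contains(leftSub)) {
                continue;
            }
            String rightSub = s.substring(left);
            String rightParsed = parse(rightSub, map);
            if (rightParsed != null) {
                String parsed = leftSub + " " + rightParsed;
                map.put(s, parsed);
                return parsed;
            }
        }
        map.put(s, null);
        return null;
    }
}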
I was looking at the problem and thought maybe I could share how I did it.
It's a little too hard to explain my algorithm in words so maybe I could share my optimized solution in pseudocode:
string mainword = "stringintowords";
array substrings = get_all_substrings(mainword);
/** this way, one does not check the dictionary for word validity
 * on every substring; it is queried once and for all,
 * eliminating multiple trips to the data storage
 */
string query = "select word from dictionary where word in " + substrings;
array validwords = execute(query).getArray();
validwords = validwords.sort(length, desc);
array segments = [];
while(mainword != ""){
    matched = false;
    for(x = 0; x < validwords.length; x++){
        if(mainword.startswith(validwords[x])) {
            segments.push(validwords[x]);
            // drop the matched word from the front and rescan from the longest word
            mainword = mainword.substring(validwords[x].length);
            matched = true;
            break;
        }
    }
    /**
     * remove the first character if none of the valid words match, then start again;
     * you may need to add the first character to the result if you want to
     */
    if(!matched) mainword = mainword.substring(1);
}
string result = segments.join(" ");
