I'm trying to add a value to individual characters

I'm working on a Scrabble assignment and I'm trying to assign values to letters. Like in Scrabble, A, E, I, O, U, L, N, S, T, R are all equal to 1. I had some help in figuring out how to add the score up once I assign values, but now I'm trying to figure out how to assign values. Is there a way to create one variable for all the values? That doesn't really make sense to me.
I was also thinking I could do an if-else statement. Like if the letter equals any of those letters, value = 1, else if the letter equals D or G, value = 2 and so on. There are 7 different scores so it's kind of annoying and not efficient, but I'm not really sure what a better way might be. I'm new to programming, a novice, so I'm looking for advice that takes my level into account.
I have started my program by reading words from a text file into an arraylist. I successfully printed the arraylist, so I know that part worked. Next I'm working on how to read each character of each word and assign a value. Last, I will figure out how to sort it.

It's me from the other question again. You can definitely use an if-statement, but standard English Scrabble has 7 distinct letter values (8 if you count the zero-point blank tile), so you would need a branch for each value. Since there are 26 letters in English (the number depends on the language), you would also have to handle every one of them somewhere in those branches, which would be quite clunky in my opinion.
I think the best option is to use a hash table. A hash table is basically like a dictionary where you look up a key and get a value. So I would add each letter as a key and keep its score as the value. It would look like this:
// initialize an empty hash table
Hashtable<String, Integer> letterScores = new Hashtable<>();
// now we can add values with "put"
letterScores.put("A", 1);
letterScores.put("B", 3);
letterScores.put("X", 8);
// etc.
To access an element from the hash table we can use the "get"-method.
// returns 1
letterScores.get("A");
So when looping through our word we would essentially get something like this to calculate the value of the word:
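int sumValue = 0;
for (int i = 0; i < word.length(); i++) {
    // charAt returns a char, but the keys are Strings, so convert before the lookup
    // (the letter must exist in the table in the same case, otherwise get returns null)
    sumValue += letterScores.get(String.valueOf(word.charAt(i)));
}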
For each character we grab the value entry from the letterScores hash table where we have saved all our letter's corresponding values.
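The same lookup-table idea can be sketched in a few lines of Python, purely as an illustration. The grouping below uses the standard English Scrabble letter values; the names SCORE_GROUPS, LETTER_SCORES and word_score are made up for this sketch, and treating unknown characters as 0 points is an assumption.

```python
# Standard English Scrabble letter values, grouped by score.
SCORE_GROUPS = {
    1: "AEIOULNSTR",
    2: "DG",
    3: "BCMP",
    4: "FHVWY",
    5: "K",
    8: "JX",
    10: "QZ",
}

# Invert the groups into a letter -> score lookup table.
LETTER_SCORES = {
    letter: score
    for score, letters in SCORE_GROUPS.items()
    for letter in letters
}


def word_score(word):
    """Sum the letter values of a word; characters not in the table count as 0."""
    return sum(LETTER_SCORES.get(ch, 0) for ch in word.upper())
```

Writing the table by score group keeps it to 7 short lines instead of 26 separate entries, which is close to the "one variable for all the values" the question asks about.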

Related

How to call Lua table value explicitly when using integer counter (i,j,k) in a for loop to make the table name/address?

I have to be honest that I don't quite understand Lua that well yet. I am trying to overwrite a local numeric value assigned to a set table address (is this the right term?).
The addresses are of the type:
project.models.stor1.inputs.T_in.default, project.models.stor2.inputs.T_in.default and so on with the stor number increasing.
I would like to do this in a for loop but cannot find the right expression to make the entire string be accepted by Lua as a table address (again, I hope this is the right term).
So far, I tried the following to concatenate the strings but without success in calling and then overwriting the value:
for k = 1,10,1 do
project.models.["stor"..k].inputs.T_in.default = 25
end
for k = 1,10,1 do
"project.models.stor"..j..".T_in.default" = 25
end
EDIT:
I think I found the solution as per https://www.lua.org/pil/2.5.html:
A common mistake for beginners is to confuse a.x with a[x]. The first form represents a["x"], that is, a table indexed by the string "x". The second form is a table indexed by the value of the variable x. See the difference:
for k = 1,10,1 do
project["models"]["stor"..k]["inputs"]["T_in"]["default"] = 25
end

You were close.
Lua supports this representation by providing a.name as syntactic sugar for a["name"].
Read more: https://www.lua.org/pil/2.5.html
You can use only one syntax at a time for each key:
either tbl.key or tbl["key"].
The limitation of . is that the key must be a constant string that is also a valid identifier.
In square brackets [] you can evaluate runtime expressions.
Correct way to do it:
project.models["stor"..k].inputs.T_in.default = 25
The . in models.["stor"..k] is unnecessary and causes an error. The correct syntax is just models["stor"..k].

Which container to use for given situation?

I am working on a problem and need to do the following task.
I want to add pairs (p1,q1), (p2,q2), ..., (pn,qn) in such a way that:
(i) A duplicate pair is stored only once (like in a set).
(ii) I store a count of how many times each pair was added. For example, the pair (7,2)
will be present in the set only once, but if I add it 3 times its count will be 3.
Which container is efficient for this problem in C++?
A small example would be great!
Please ask if you can't understand my problem, and sorry for my bad English.

How about a std::map<Key, Value> to map your pairs (Key) to their count and as you insert, increment a counter (Value).
using pairs_to_count = std::map<std::pair<T1, T2>, size_t>;
std::pair<T1, T2> p1 = /* some value */;
std::pair<T1, T2> p2 = /* some other value */;
pairs_to_count[p1]++;
pairs_to_count[p1]++;
pairs_to_count[p2]++;
pairs_to_count[p2]++;
pairs_to_count[p2]++;
In this code, operator[] automatically adds the key to the map if it does not exist yet and value-initializes the corresponding count to zero. Because the increment is applied right away, the count is already 1 after the first insertion.
So after every insertion the count correctly reflects the number of insertions so far.
Later, retrieving the count is a matter of calling operator[] again to get the value associated with a given key. Keep in mind that operator[] also inserts a zero count for a key that was never added; use find() or at() if a lookup must not modify the map.
size_t const p2_count = pairs_to_count[p2]; // equals 3
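For comparison only, here is the same "map from pair to count" idea as a small Python sketch (the question is about C++, so this just illustrates the data structure). It uses collections.Counter with tuples as keys; pair_counts and add_pair are names invented for the sketch.

```python
from collections import Counter

# Maps each (p, q) pair to the number of times it has been added.
pair_counts = Counter()


def add_pair(p, q):
    """Record one more occurrence of the pair (p, q)."""
    pair_counts[(p, q)] += 1


# Add (7, 2) three times and (1, 5) once.
for pair in [(7, 2), (7, 2), (1, 5), (7, 2)]:
    add_pair(*pair)
```

Each distinct pair is stored once as a key, and its value is the number of insertions. Unlike std::map::operator[], reading the count of a pair that was never added returns 0 without creating an entry.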

Array List looping for a duplicate value

I am looking for an "easy" or simple way to make an array of something, let's say ice creams. This would be an ice cream class with various attributes (ID, flavour, size, scoops). I would like to build an array that gathers every ice cream ordered and then search through this list for any duplicate values (2+ with the same size).
My first idea was a for loop that creates the array, then grabs the ID of the first ice cream and checks its "flavour" against the array. If no duplicate is found, the ID is increased by 1 (ID++) and the next ice cream's flavour is checked against the array. If a match is found, I would set a Boolean to true.
Every approach I take seems rather long-winded and I haven't got one working yet. I'm hoping some fresh, more experienced eyes can help with this.
In answer to below;
The XML would hold something like below
<iceCream id="1">
    <flavour>chocolate</flavour>
    <scoops>5</scoops>
</iceCream>
<iceCream id="2">
    <flavour>banana</flavour>
    <scoops>2</scoops>
</iceCream>
I would want to use Drools (probably with an array list?) to gather each iceCream tag and check whether any of the ice creams have the same flavour, and output something (set a boolean to true) if a match is found. My understanding was to make an array and then run each ice cream through the array, using its ID to identify it: start with int ID = 1 and do ID++ inside each loop iteration, while also searching through the flavour child tag.
int ID = 0;
boolean match = false;
ArrayList iceCreams = new ArrayList($cont.getIceCreams());
for(iceCream $Flavour: (ArrayList<iceCream>)iceCreams)
{
ID++
if($Flavour.getFlavour().equals(icecream with id of (ID variable).getFlavour)
{
match = true;
}
}
if(match)
{etc etc etc}
Something along these lines if this helps?

1) If you have control over the creation of the first array, why don't you make sure that you only insert ice creams that are unique? So, when you are about to insert, say, ID=1, first iterate through the array and check whether it already contains an ice cream with ID 1. If not, put it into the array and do the other stuff.
2) Searching part: while inserting, make sure that you do so in ascending order of IDs, so that you can use binary search for the lookup.
Note: I don't know Drools; I have just posted the logic as per my understanding of the problem.

I don't know Drools either, but I'll post some pseudocode for what I think you are trying to accomplish:
match = false
for(i = 0; i < len(ice_cream_array) and not match; i++)
{
    for(j = (i + 1); j < len(ice_cream_array); j++)
    {
        if (ice_cream_array[i].flavour == ice_cream_array[j].flavour)
        {
            match = true
            break from inner loop
        }
    }
}
// match is still false here only if no two ice creams share a flavour
You may also want to look up bubble sorts and binary searches.
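If a hash-based set is available, the nested loop can be avoided altogether. The following is a minimal Python sketch rather than Drools code; it assumes each ice cream is a plain dict with a "flavour" entry, and has_duplicate_flavour is a name invented here.

```python
def has_duplicate_flavour(ice_creams):
    """Return True as soon as two ice creams share a flavour."""
    seen = set()
    for ice_cream in ice_creams:
        flavour = ice_cream["flavour"]
        if flavour in seen:
            return True  # this flavour was already ordered earlier
        seen.add(flavour)
    return False


# Hypothetical order data mirroring the XML from the question.
order = [
    {"id": 1, "flavour": "chocolate", "scoops": 5},
    {"id": 2, "flavour": "banana", "scoops": 2},
]
```

This makes a single pass over the list, so the work grows linearly with the number of ice creams instead of quadratically.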

How to split a string into words. Ex: "stringintowords" -> "String Into Words"?

What is the right way to split a string into words ?
(string doesn't contain any spaces or punctuation marks)
For example: "stringintowords" -> "String Into Words"
Could you please advise what algorithm should be used here ?
Update: For those who think this question is just for curiosity: this algorithm could be used to camelcase domain names ("sportandfishing .com" -> "SportAndFishing .com"), and it is currently used by aboutus dot org to do this conversion dynamically.

Let's assume that you have a function isWord(w), which checks if w is a word using a dictionary. Let's for simplicity also assume for now that you only want to know whether for some word w such a splitting is possible. This can be easily done with dynamic programming.
Let S[1..length(w)] be a table with Boolean entries. S[i] is true if the prefix w[1..i] can be split into words. Then set S[1] = isWord(w[1]) and for i = 2 to length(w) calculate
S[i] = isWord(w[1..i]) or (there is some j in {2..i} with S[j-1] and isWord(w[j..i])).
This takes O(length(w)^2) time if dictionary queries take constant time. To actually find the splitting, just store the winning split in each S[i] that is set to true. This can also be adapted to enumerate all solutions by storing all such splits.
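The recurrence can be sketched in Python as follows. This is an illustration under assumptions: indices are 0-based rather than 1-based, is_word is any caller-supplied predicate (here the membership test of a small made-up word set), and the function name segment is hypothetical.

```python
def segment(text, is_word):
    """Split text into dictionary words, or return None if that is impossible."""
    n = len(text)
    # split_at[i] is the start index of the last word in one valid split of
    # text[:i], or None if text[:i] cannot be split at all.
    split_at = [None] * (n + 1)
    split_at[0] = 0  # the empty prefix is trivially splittable
    for i in range(1, n + 1):
        for j in range(i):
            if split_at[j] is not None and is_word(text[j:i]):
                split_at[i] = j  # remember the winning split
                break
    if split_at[n] is None:
        return None
    # Walk the stored split points backwards to recover the words.
    words = []
    i = n
    while i > 0:
        j = split_at[i]
        words.append(text[j:i])
        i = j
    return words[::-1]


# Tiny made-up dictionary for demonstration purposes.
WORDS = {"string", "into", "words"}
```

Calling segment("stringintowords", WORDS.__contains__) returns the list of words, which can then be capitalised and joined with spaces. When several splits exist, only one of them is returned.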

As mentioned by many people here, this is a standard, easy dynamic programming problem: the best solution is given by Falk Hüffner. Some additional info, though:
(a) you should consider implementing isWord with a trie, which will save you a lot of time if you use it properly (that is, by incrementally testing for words).
(b) searching for "segmentation dynamic programming" yields a score of more detailed answers, including university-level lectures with pseudocode, such as this lecture at Duke (which even goes so far as to provide a simple probabilistic approach for dealing with words that are not contained in any dictionary).

There should be a fair bit in the academic literature on this. The key words you want to search for are word segmentation. This paper looks promising, for example.
In general, you'll probably want to learn about Markov models and the Viterbi algorithm. The latter is a dynamic programming algorithm that may allow you to find plausible segmentations for a string without exhaustively testing every possible segmentation. The essential insight here is that if you have n possible segmentations for the first m characters, and you only want to find the most likely segmentation, you don't need to evaluate every one of these against subsequent characters: you only need to continue evaluating the most likely one.

If you want to ensure that you get this right, you'll have to use a dictionary based approach and it'll be horrendously inefficient. You'll also have to expect to receive multiple results from your algorithm.
For example: windowsteamblog (of http://windowsteamblog.com/ fame)
windows team blog
window steam blog

Consider the sheer number of possible splittings for a given string. If you have n characters in the string, there are n-1 possible places to split. For example, for the string cat, you can split before the a and you can split before the t. This results in 4 possible splittings.
You could look at this problem as choosing where you need to split the string. You also need to choose how many splits there will be. So there are Sum(i = 0 to n - 1, n - 1 choose i) possible splittings. By the binomial theorem, with x and y both being 1, this is equal to pow(2, n-1).
Granted, a lot of this computation rests on common subproblems, so dynamic programming might speed up your algorithm. Off the top of my head, computing a boolean matrix M such that M[i,j] is true if and only if the substring of your given string from i to j is a word would help out quite a bit. You still have an exponential number of possible segmentations, but you would quickly be able to eliminate a segmentation if an early split did not form a word. A solution would then be a sequence of integers (i0, j0, i1, j1, ...) with the condition that j sub k = i sub (k + 1).
If your goal is to correctly camel-case URLs, I would sidestep the problem and go for something a little more direct: get the homepage for the URL, remove any spaces and capitalization from the source HTML, and search for your string. If there is a match, find that section in the original HTML and return it. You'd need an array NumSpaces that records how many whitespace characters precede each position of the preprocessed string, like so:
Needle: isashort
Haystack: This is a short phrase
Preprocessed: thisisashortphrase
NumSpaces : 000011233333444444
And your answer would come from:
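location = Preprocessed.Search(Needle)
locationInOriginal = location + NumSpaces[location]
// use the index of the last matched character, otherwise a space that follows the match is counted too
originalLength = Needle.length() + NumSpaces[location + Needle.length() - 1] - NumSpaces[location]
Haystack.substring(locationInOriginal, originalLength)  // arguments are (start, length)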
Of course, this would break if madduckets.com did not have "Mad Duckets" somewhere on the home page. Alas, that is the price you pay for avoiding an exponential problem.
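A runnable version of the NumSpaces bookkeeping might look like the Python sketch below. It is only an illustration: the function name find_original_phrase is invented, only whitespace is stripped (no HTML handling or page download), and it assumes lower-casing never changes the length of the text, which holds for ASCII.

```python
def find_original_phrase(needle, haystack):
    """Find needle in haystack ignoring whitespace and case, and return the
    matching slice of the original haystack (or None if there is no match)."""
    preprocessed = []
    num_spaces = []  # num_spaces[i]: whitespace characters before preprocessed[i]
    spaces_seen = 0
    for ch in haystack:
        if ch.isspace():
            spaces_seen += 1
        else:
            preprocessed.append(ch.lower())
            num_spaces.append(spaces_seen)
    flat = "".join(preprocessed)

    needle = needle.lower()
    if not needle:
        return None
    location = flat.find(needle)
    if location == -1:
        return None

    last = location + len(needle) - 1       # last matched character
    start = location + num_spaces[location]  # shift back into the original text
    end = last + num_spaces[last] + 1
    return haystack[start:end]
```

With the example above, find_original_phrase("isashort", "This is a short phrase") maps location 4 in the preprocessed text back to offset 5 in the original and returns a 10-character slice.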

This can be actually done (to a certain degree) without dictionary. Essentially, this is an unsupervised word segmentation problem. You need to collect a large list of domain names, apply an unsupervised segmentation learning algorithm (e.g. Morfessor) and apply the learned model for new domain names. I'm not sure how well it would work, though (but it would be interesting).

This is basically a variation of a knapsack problem, so what you need is a comprehensive list of words and any of the solutions covered in Wiki.
With a fair-sized dictionary this is going to be an insanely resource-intensive and lengthy operation, and you cannot even be sure that the problem can be solved.

Create a list of possible words, sort it from long words to short words.
Check each entry in the list against the start of the string. If it matches, remove that prefix from the string and append it to your sentence followed by a space. Repeat this until the string is empty.

A simple Java solution that makes O(n^2) dictionary lookups thanks to memoization.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

public class Solution {
    // should contain the list of all words, or you can use any other data structure (e.g. a Trie)
    private final Set<String> dictionary;

    public Solution(Set<String> dictionary) {
        this.dictionary = new HashSet<>(dictionary);
    }

    // returns the words separated by spaces, or null if s cannot be split
    public String parse(String s) {
        return parse(s, new HashMap<String, String>());
    }

    // map caches the result for every string that has already been parsed
    public String parse(String s, HashMap<String, String> map) {
        if (map.containsKey(s)) {
            return map.get(s);
        }
        if (dictionary.contains(s)) {
            return s;
        }
        for (int left = 1; left < s.length(); left++) {
            String leftSub = s.substring(0, left);
            if (!dictionary.contains(leftSub)) {
                continue;
            }
            String rightSub = s.substring(left);
            String rightParsed = parse(rightSub, map);
            if (rightParsed != null) {
                String parsed = leftSub + " " + rightParsed;
                map.put(s, parsed);
                return parsed;
            }
        }
        // remember the failure so this suffix is never parsed again
        map.put(s, null);
        return null;
    }
}

I was looking at the problem and thought maybe I could share how I did it.
It's a little too hard to explain my algorithm in words so maybe I could share my optimized solution in pseudocode:
string mainword = "stringintowords";
array substrings = get_all_substrings(mainword);
/** this way, one does not check the dictionary to check for word validity
* on every substring; It would only be queried once and for all,
* eliminating multiple travels to the data storage
*/
string query = "select word from dictionary where word in " + substrings;
array validwords = execute(query).getArray();
validwords = validwords.sort(length, desc);
array segments = [];
while(mainword != ""){
    matched = false;
    for(x = 0; x < validwords.length; x++){
        if(mainword.startswith(validwords[x])) {
            segments.push(validwords[x]);
            // drop the matched prefix and rescan from the longest word
            mainword = mainword.substring(validwords[x].length);
            matched = true;
            break;
        }
    }
    /**
    * remove the first character if none of the valid words match, then start again
    * you may need to add the first character to the result if you want to
    */
    if(!matched){
        mainword = mainword.substring(1);
    }
}
string result = segments.join(" ");

How to find all brotherhood strings?

I have a string, and another text file which contains a list of strings.
We call 2 strings "brotherhood strings" when they're exactly the same after sorting alphabetically.
For example, "abc" and "cba" will be sorted into "abc" and "abc", so the original two are brotherhood. But "abc" and "aaa" are not.
So, is there an efficient way to pick out all brotherhood strings from the text file, according to the one string provided?
For example, we have "abc" and a text file that looks like this:
abc
cba
acb
lalala
then "abc", "cba", "acb" are the answers.
Of course, "sort & compare" is a reasonable approach, but by "efficient" I mean: is there a way to determine whether a candidate string is a brotherhood string of the original after a single pass over it?
That would be the most efficient way, I think. After all, you cannot give the answer without at least reading the candidate strings. Sorting usually needs more than one pass over the candidate string. So a hash table might be a good solution, but I have no idea what hash function to choose.

Most efficient algorithm I can think of:
1. Set up a hash table for the original string. Let each letter be the key, and the number of times the letter appears in the string be the value. Call this hash table inputStringTable.
2. Parse the input string, and each time you see a character, increment the value of its hash entry by one.
3. For each string in the file:
   a. Create a new hash table. Call this one brotherStringTable.
   b. For each character in the string, add one to its entry in brotherStringTable. If brotherStringTable[character] > inputStringTable[character], this string is not a brother (one character shows up too many times).
   c. Once the string is parsed, compare each inputStringTable value with the corresponding brotherStringTable value. If one is different, then this string is not a brother string. If all match, then the string is a brother string.
This will be O(nk), where n is the length of the input string (any strings longer than the input string can be discarded immediately) and k is the number of strings in the file. Any sort based algorithm will be O(nk lg n), so in certain cases, this algorithm is faster than a sort based algorithm.
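A compact Python sketch of the counting idea is shown below. It uses collections.Counter as the hash table and, for brevity, compares whole tables instead of bailing out at the first surplus character, so it is a simplification of the steps above. The name find_brothers is invented for the sketch.

```python
from collections import Counter


def find_brothers(target, candidates):
    """Return the candidates that contain exactly the same letters as target."""
    target_counts = Counter(target)  # letter -> number of occurrences
    return [
        s for s in candidates
        # The cheap length check discards most non-matches before counting.
        if len(s) == len(target) and Counter(s) == target_counts
    ]
```

For the example file from the question, find_brothers("abc", ["abc", "cba", "acb", "lalala"]) keeps the first three strings and drops "lalala" on the length check alone.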

Sorting each string, then comparing it, works out to something like O(k log k + N*S*log S), where N is the number of strings, k is the search key length, and S is the average string length.
It seems like counting the occurrences of each character might be a possible way to go here (assuming the strings are of a reasonable length). That gives you O(k+N*S). Whether that's actually faster than the sort & compare is obviously going to depend on the values of k, N, and S.
I think that in practice, the cache-thrashing effect of re-writing all the strings in the sorting case will kill performance, compared to any algorithm that doesn't modify the strings...

iterate, sort, compare. that shouldn't be too hard, right?

Let's assume your alphabet is from 'a' to 'z' and you can index an array based on the characters. Then, for each element in a 26 element array, you store the number of times that letter appears in the input string.
Then you go through the set of strings you're searching, and iterate through the characters in each string. You can decrement the count associated with each letter in (a copy of) the array of counts from the key string.
If you finish your loop through the candidate string without having to stop, and you have seen the same number of characters as there were in the input string, it's a match.
This allows you to skip the sorts in favor of a constant-time array copy and a single iteration through each string.
EDIT: Upon further reflection, this is effectively sorting the characters of the first string using a bucket sort.

I think what will help you is a test for whether two strings are anagrams. Here is how you can do it. I am assuming for now that every character fits in one of 256 possible char values (extended ASCII).
#include <stdbool.h>
#include <string.h>

#define NUM_ALPHABETS 256

static int alphabets[NUM_ALPHABETS];

bool isAnagram(const char *src, const char *dest) {
    size_t len1 = strlen(src);
    size_t len2 = strlen(dest);
    size_t i;

    if (len1 != len2)
        return false;

    memset(alphabets, 0, sizeof(alphabets));
    /* count every character of src */
    for (i = 0; i < len1; i++)
        alphabets[(unsigned char)src[i]]++;
    /* consume the counts with dest; a negative count means a mismatch */
    for (i = 0; i < len2; i++) {
        if (--alphabets[(unsigned char)dest[i]] < 0)
            return false;
    }
    return true;
}
This will run in O(mn) if you have 'm' strings in the file of average length 'n'.

Sort your query string
Iterate through the Collection, doing the following:
Sort current string
Compare against query string
If it matches, this is a "brotherhood" match, save it/index/whatever you want
That's pretty much it. If you're doing lots of searching, presorting all of your collection will make the routine a lot faster (at the cost of extra memory). If you are doing this even more, you could pre-sort and save a dictionary (or some hashed collection) based off the first character, etc, to find matches much faster.
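One way to read the "presort and index" suggestion is to key a dictionary on each string's sorted letters, so that all strings with the same letters land in the same bucket. The Python sketch below does that; note that it keys on the whole sorted string rather than on the first character, which is a variation chosen for this sketch, and build_index and lookup are made-up names.

```python
from collections import defaultdict


def build_index(words):
    """Group words by their sorted letters, e.g. "cba" is filed under "abc"."""
    index = defaultdict(list)
    for word in words:
        index["".join(sorted(word))].append(word)
    return index


def lookup(index, query):
    """Return every indexed word that has the same letters as query."""
    return index.get("".join(sorted(query)), [])
```

Building the index costs one sort per stored string, but afterwards each query needs only one sort of the query string plus a single dictionary lookup, which pays off when many searches run against the same file.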

It's fairly obvious that each brotherhood string will have the same histogram of letters as the original. It is trivial to construct such a histogram, and fairly efficient to test whether an input string has the same histogram as the test string (you have to increment or decrement counters at most twice the length of the input string).
The steps would be:
construct the histogram of the test string (zero an array int histogram[128] and increment the position for each character in the test string)
for each input string
skip it if its length differs from the length of the test string
for each character c in the input string, test whether histogram[c] is zero. If it is, the string is a non-match: restore the histogram and move on.
otherwise decrement histogram[c]
to restore the histogram (after a match or a non-match), traverse the input string back to its start, incrementing rather than decrementing
At most, this requires two increments/decrements of an array element for each character in the input.
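The decrement-then-restore procedure can be sketched in Python like this. It is an illustration with a few assumptions: a dict stands in for the int histogram[128] array, the explicit length check is added so that shorter candidates cannot pass, and the function names are invented.

```python
def build_histogram(text):
    """Count how often each character occurs in text."""
    histogram = {}
    for ch in text:
        histogram[ch] = histogram.get(ch, 0) + 1
    return histogram


def matches_histogram(histogram, candidate, target_length):
    """Check candidate against histogram, leaving histogram unchanged afterwards."""
    if len(candidate) != target_length:
        return False
    consumed = 0
    matched = True
    for ch in candidate:
        if histogram.get(ch, 0) == 0:
            matched = False  # letter missing or used too often
            break
        histogram[ch] -= 1
        consumed += 1
    # Restore: undo exactly the decrements that were applied.
    for ch in candidate[:consumed]:
        histogram[ch] += 1
    return matched
```

Because the histogram is restored in place after every candidate, no copy of it is needed for each string in the file.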

The most efficient answer will depend on the contents of the file. Any algorithm we come up with will have complexity proportional to N (number of words in file) and L (average length of the strings) and possibly V (variety in the length of strings)
If this were a real world situation, I would start with KISS and not try to overcomplicate it. Checking the length of the target string is simple but could help avoid lots of nlogn sort operations.
target = sort_characters("target string")
count = 0
foreach (word in inputfile){
if target.len == word.len && target == sort_characters(word){
count++
}
}

I would recommend:
for each string in text file :
compare size with "source string" (size of brotherhood strings should be equal)
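compare hashes (the hash must be order-independent, e.g. a sum of the character codes; a CRC or the default framework hash of the raw strings differs between anagrams)
in case of equality, do a finer comparison with the strings sorted.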
It's not the fastest algorithm but it will work for any alphabet/encoding.

Here's another method, which works if you have a relatively small set of possible "letters" in the strings, or good support for large integers. Basically consists of writing a position-independent hash function...
Assign a different prime number for each letter:
prime['a']=2;
prime['b']=3;
prime['c']=5;
Write a function that runs through a string, repeatedly multiplying the prime associated with each letter into a running product
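long long key(const char *string)
{
    long long product = 1;
    /* test the current character first, then advance, so that the first
       letter is included and the terminating '\0' is not */
    for (; *string; string++) {
        product *= prime[(unsigned char)*string];
    }
    return product;
}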
This function will return a guaranteed-unique integer for any multiset of letters, independent of the order in which they appear in the string, as long as the product does not overflow. Once you've got the value for the "key", you can go through the list of strings to match, and perform the same operation.
Time complexity of this is O(N), of course. You can even re-generate the (sorted) search string by factoring the key. The disadvantage, of course, is that the keys do get large pretty quickly if you have a large alphabet or long strings.
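As a rough illustration of the prime-product key, here is a Python sketch. Python integers have arbitrary precision, so the overflow concern does not arise here (large keys just get slower to multiply). It assumes lowercase a-z input only, and first_primes, PRIME_FOR and anagram_key are names invented for the sketch.

```python
def first_primes(count):
    """Return the first `count` primes using simple trial division."""
    primes = []
    n = 2
    while len(primes) < count:
        if all(n % p for p in primes):  # n is not divisible by any smaller prime
            primes.append(n)
        n += 1
    return primes


# Assign one prime per lowercase letter: a -> 2, b -> 3, c -> 5, ...
PRIME_FOR = dict(zip("abcdefghijklmnopqrstuvwxyz", first_primes(26)))


def anagram_key(word):
    """Multiply the primes of all letters; the order of the letters does not matter."""
    product = 1
    for ch in word:
        product *= PRIME_FOR[ch]  # raises KeyError for characters outside a-z
    return product
```

Two strings are brotherhood strings exactly when their keys are equal, because every integer has a unique prime factorisation.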

Here's an implementation. It creates a dict of the letters of the master, and a string version of the same, as string comparisons will be done at C speed. When creating a dict of the letters in a trial string, it checks against the master dict in order to fail at the first possible moment: if it finds a letter not in the original, or more of that letter than the original, it will fail. You could replace the strings with integer-based hashes (as per one answer regarding base 26) if that proves quicker. Currently the hash for comparison looks like a3b1c2 for abacca.
This should work out O(N log( min(M,K) )) for N strings of length M and a reference string of length K, and requires the minimum number of lookups of the trial string.
master = "abc"
wordset = "def cba accb aepojpaohge abd bac ajghe aegage abc".split()


def dictmaster(text):
    """Count how often each letter occurs in the master string."""
    charmap = {}
    for char in text:
        if char not in charmap:
            charmap[char] = 1
        else:
            charmap[char] += 1
    return charmap


def dicttrial(text, mastermap):
    """Count the letters of a trial string, failing at the first possible moment."""
    trialmap = {}
    for char in text:
        if char not in mastermap:
            return False
        trialmap[char] = trialmap.get(char, 0) + 1
        # fail if there are more incidences than in the master
        if trialmap[char] > mastermap[char]:
            return False
    return trialmap


def dicttostring(charmap):
    """Serialise a letter count as e.g. "a3b1c2", with the letters in sorted order."""
    if charmap is False:
        return False
    return "".join(char + str(charmap[char]) for char in sorted(charmap))


def testtrial(text, master, mastermap, masterhashstring):
    if len(master) != len(text):
        return False
    trialhashstring = dicttostring(dicttrial(text, mastermap))
    return trialhashstring is not False and trialhashstring == masterhashstring


mastermap = dictmaster(master)
masterhashstring = dicttostring(mastermap)
for word in wordset:
    if testtrial(word, master, mastermap, masterhashstring):
        print(word)
