Loop sentences using return - for-loop

I'm trying to loop through sentences. tags_to_label contains an array of the positions of words in a sentence. For example, for the sentence "Adam lemmington ate an apple", tags_to_label = [1 2], the positions of the nouns in the sentence. I want to return 1 if the length of tags_to_label is greater than 1, else 0, but my code only loops through one sentence. Is there a way to loop through all the sentences?
for i, item in enumerate(tags_to_label):
    if len(tags_to_label) > 1:
        return 1
    else:
        return 0
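One reading of the problem is that return ends the function the first time it runs, so the loop never reaches a second sentence. A minimal sketch of collecting one result per sentence instead, assuming a hypothetical all_tags list that holds one tags_to_label list per sentence (the function name and the sample data are illustrative, not from the question):

```python
def label_sentences(all_tags):
    """Return 1 for each sentence with more than one tagged position, else 0."""
    labels = []
    for tags_to_label in all_tags:
        # Append instead of returning, so the loop keeps going.
        labels.append(1 if len(tags_to_label) > 1 else 0)
    return labels


# Hypothetical input: one list of noun positions per sentence.
all_tags = [[1, 2], [3], [], [0, 4, 5]]
print(label_sentences(all_tags))
```

If the results are only needed one at a time, replacing the list with yield inside the loop would be another option.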

Related

Pseudocode or C# algorithm that returns all possible combinations sets for a number of variables

I have 3 variables with some possible values.
For example:
Var1 - possible values: 1,2,3
Var2 - possible values: a, b, c
var3 - possible values: false, true
Can you please help with an approach that returns all possible combinations?
The result should look like:
 1,a,false
 1,a,true
 1,b,false
 1,b,true
 1,c,false
 1,c,true
 2,a,false
 2,a,true
 2,b,false
 Etc..
I would like the algorithm to work for any number of variables, for example 4 or 5 variables with other possible values.
It looks like you're trying to enumerate Cartesian products. Assuming your items are in list_of_lists, this recursive function in pseudo-code will do it:
enumerate_cartesian_products(list_of_lists):
    if list_of_lists is empty:
        return [[]]
    this_list = list_of_lists[0]
    other_lists = list_of_lists[1: ]
    other_cartesian_products = enumerate_cartesian_products(other_lists)
    return [([e] + other_cartesian_product) \
        for e in this_list and other_cartesian_product in other_cartesian_products]
Note how the last line would probably be a double loop in most languages: it iterates over all the elements in the first list, all the lists in the cartesian products of the rest, and creates a list of all the appended results.
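The recursion described above can be sketched in Python roughly as follows. The function name cartesian_products is only illustrative; the standard library's itertools.product computes the same result (as tuples) and is usually the better choice in real code.

```python
def cartesian_products(list_of_lists):
    """Recursively build every combination, one element from each list."""
    if not list_of_lists:
        return [[]]  # exactly one way to choose from zero lists
    first, rest = list_of_lists[0], list_of_lists[1:]
    tails = cartesian_products(rest)
    # The "double loop": every element of the first list
    # is prepended to every combination of the remaining lists.
    return [[e] + tail for e in first for tail in tails]


combos = cartesian_products([[1, 2, 3], ["a", "b", "c"], [False, True]])
print(len(combos))
print(combos[:3])
```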
The simplest solution is to have n nested loops:
for each possible value v1 in var1
    for each possible value v2 in var2
        for each possible value v3 in var3
            print(v1,v2,v3);
        end for v3
    end for v2
end for v1
In the more general case, assume you have a list of lists that contains n lists (one for every variable), each holding the possible values for that variable. You can solve the problem with the following recursive function all_combinations.
list_of_lists=[[1...][a...][false...]];
current_comb=[];
all_combinations(list_of_lists,current_comb);
function all_combinations(list_of_lists,current_comb)
    if (list_of_lists=[])
        print(current_comb);
        return;
    end if
    current_list=list_of_lists[0];
    remaining_lists=list_of_lists[1:end];
    for each v in current_list
        tmp=copy(current_comb);tmp.Append(v);  // copy, so sibling branches do not share one list
        all_combinations(remaining_lists,tmp);
    end for v
end function
Of course when adding variables, soon you will need to deal with combinatorial explosion.
The only clean solution is:
have a function mix( A, B ) which takes two lists and returns a list. That's trivial.
Your final code just looks like this:
result = null
result = mix( result, one of your lists );
result = mix( result, another of your lists );
result = mix( result, yet another of your lists );
result = mix( result, yet another list );
result = mix( result, one more list );
Example of mix(A,B):
mix(A,B)
    result = null
    for each a in A
        for each b in B
            result += ab
    return result
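A rough Python sketch of the mix idea follows. It assumes that the running result starts as a list holding one empty combination (rather than null), so that the first call to mix has something to extend; that starting value is an assumption, not part of the answer above.

```python
def mix(partials, values):
    """Extend every partial combination with every value from the next list."""
    return [partial + [value] for partial in partials for value in values]


result = [[]]  # assumption: start from a single empty combination
for values in ([1, 2, 3], ["a", "b", "c"], [False, True]):
    result = mix(result, values)

print(len(result))
print(result[:2])
```

Each extra variable costs one more call to mix, so the driver code stays the same for 4 or 5 variables.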
Assume that each variable has a set or vector associated with it. That is:
set1 = [1, 2, 3]
set2 = [a, b, c]
set3 = [F, T]
Then, one way is to loop over these sets in nested "for" loops. Assume that your output structure is a list of 3-element lists. That is, your desired output looks like this:
[[1,a,F], [1,a,T], [1,b,F],......]
Also assume that (like in Python) you can use a function like "append" to append a 2-element list to your big list. Then try this:
myList = [] #empty list
for i in set1:
    for j in set2:
        for k in set3:
            myList.append([i, j, k]) #Appends 3-element list to big list
You may need to do a deepcopy in the append statement so that all the i's, j's, and k's aren't updated in your master list each time you run through an iteration. This may not be the most efficient, but I think it's relatively straightforward.
Here's something in JavaScript that's pseudocode-like. (I've never coded in C#; maybe I'll try to convert it.)
var sets = [[1,2,3],["a","b","c"],[false,true]],
    result = [];
function f(arr,i){
  if (i == sets.length){
    result.push(arr);
    return;
  }
  for (var j=0; j<sets[i].length; j++){
    var _arr = arr.slice(); // make a copy of arr
    _arr.push(sets[i][j]);
    f(_arr,i+1);
  }
}
f([],0)
Output:
console.log(result);
[[1,"a",false]
,[1,"a",true]
,[1,"b",false]
,[1,"b",true]
,[1,"c",false]
,[1,"c",true]
,[2,"a",false]
,[2,"a",true]
,[2,"b",false]
,[2,"b",true]
,[2,"c",false]
,[2,"c",true]
,[3,"a",false]
,[3,"a",true]
,[3,"b",false]
,[3,"b",true]
,[3,"c",false]
,[3,"c",true]]
You really ought to look for this elsewhere, and it's not a good stackoverflow question. It's homework and there is an algorithm for this already if you search more using the proper terms.
It's quite simple in fact, if you generalize the algorithm for generating all combinations of digits in a binary string, you should be able to get it:
0 0 0 0
0 0 0 1
0 0 1 0
0 0 1 1
0 1 0 0
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 0
1 0 0 1
1 0 1 0
1 0 1 1
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 1
Notice that the right-most column alternates its values every cell, while the second-from-right column alternates every 2 cells, the next column over from that alternates every 4 cells, and the final digit alternates every 8 cells.
For your case, think of the above as what happens when your sets are:
Var1 - possible values: 0,1
Var2 - possible values: 0,1
Var3 - possible values: 0,1
Var4 - possible values: 0,1
Start a counter that keeps track of your position in each set, and start by cycling through the "rightmost" set a full time before bumping the position of the "next-from-right" set by 1. Continue cycling the sets in this way, bumping a set when the one to its "right" cycles over, until you've finished cycling the set in the "most significant position". You will have generated all possible combinations in the sets.
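The counter described above behaves like a car odometer whose wheels can have different sizes. A minimal Python sketch under that reading (the name odometer and the generator form are illustrative choices):

```python
def odometer(sets):
    """Yield every combination by counting through the sets like an odometer."""
    if any(len(s) == 0 for s in sets):
        return  # an empty set means there are no combinations at all
    positions = [0] * len(sets)  # current index into each set
    while True:
        yield [s[p] for s, p in zip(sets, positions)]
        # Bump the rightmost position; on overflow reset it and carry left.
        i = len(sets) - 1
        while i >= 0:
            positions[i] += 1
            if positions[i] < len(sets[i]):
                break
            positions[i] = 0
            i -= 1
        if i < 0:
            return  # the most significant position rolled over: done


binary = list(odometer([[0, 1]] * 4))
mixed = list(odometer([[1, 2, 3], ["a", "b", "c"], [False, True]]))
print(len(binary), len(mixed))
```

Because it is a generator, only one combination is held in memory at a time, which helps once the number of variables grows.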
The other answers have focused on "give the codez", which really just rewards you for posting your homework question here... so I thought I would at least explain a little.

how to find two sentences which have the largest number of common words?

Given a list of sentences, find two sentences which have the largest number of common words.
The common words do not need to be located in the same position in the sentences (order does not matter).
Thanks !
update:
Does a non-pairwise algorithm for this problem exist? The pairwise approach is very straightforward.
My idea is to use an inverted index that stores which sentences each word appears in. This requires traversing every word in each sentence. Then create an n*n 2D array that counts how many times two sentences appear in the same bucket of the inverted index.
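A rough Python sketch of the inverted-index idea follows. It makes a few assumptions that are not in the question: words are split on whitespace and lowercased, a word repeated within one sentence counts once, and a dictionary keyed by sentence pairs stands in for the n*n array. The function name is illustrative.

```python
from collections import defaultdict
from itertools import combinations


def most_common_pair(sentences):
    """Return (i, j, shared_count) for the pair sharing the most distinct words."""
    # Inverted index: word -> set of sentence ids containing it.
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for word in set(sentence.lower().split()):
            index[word].add(sid)

    # Every pair of sentences in the same bucket shares that word.
    shared = defaultdict(int)
    for ids in index.values():
        for pair in combinations(sorted(ids), 2):
            shared[pair] += 1

    if not shared:
        return None  # no two sentences share any word
    (i, j), count = max(shared.items(), key=lambda item: item[1])
    return i, j, count


sentences = ["the cat sat on the mat", "a dog sat on a log", "the cat and the dog sat"]
print(most_common_pair(sentences))
```

This still degrades to pairwise work for very common words, since a word found in k sentences contributes k*(k-1)/2 pairs.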
Assume you have an array of sentences:
    String[] sentences
Create some variables with default values to keep track of the two sentences with the most common words:
    sentence1Index = -1
    sentence2Index = -1
    maxCount = -1
Do a nested loop over the sentences array:
    for i : 0 -> sentences.length
        for j : 0 -> sentences.length
Make sure you aren't comparing a sentence with itself:
            if i != j
Split the strings on spaces (which will usually give you each word, assuming you count some symbols as words):
                String[] words1 = sentences[i].splitAt(" ")
                String[] words2 = sentences[j].splitAt(" ")
Create a temporary count value for this run:
                tempCount = 0
Loop over the two word arrays (obtained from the two sentences you are comparing):
                for a : 0 -> words1.length
                    for b : 0 -> words2.length
If the words are the same, increment the temporary count:
                        if words1[a] equal-to-ignore-case words2[b]
                            tempCount++
After comparing the words, if tempCount is greater than the current maxCount, update all the values that keep track of what you are looking for:
                if tempCount > maxCount
                    sentence1Index = i
                    sentence2Index = j
                    maxCount = tempCount
Return a newly created array which contains the two sentences:
    if sentence1Index != -1 and sentence2Index != -1
        String[] retArray = sentences[sentence1Index], sentences[sentence2Index]
        return retArray
    return null
All pseudo code:
String[] sentences
sentence1Index = -1
sentence2Index = -1
maxCount = -1
for i : 0 -> sentences.length
    for j : 0 -> sentences.length
        if i != j
            String[] words1 = sentences[i].splitAt(" ")
            String[] words2 = sentences[j].splitAt(" ")
            tempCount = 0
            for a : 0 -> words1.length
                for b : 0 -> words2.length
                    if words1[a] equal-to-ignore-case words2[b]
                        tempCount++
            if tempCount > maxCount
                sentence1Index = i
                sentence2Index = j
                maxCount = tempCount
if sentence1Index != -1 and sentence2Index != -1
    String[] retArray = sentences[sentence1Index], sentences[sentence2Index]
    return retArray
return null
First you need a method that will take two of the sentences and determine how many words they have in common. This could work by taking the two sentences given as input, and creating from it two arrays containing the words in alphabetical order. Then you can examine the two arrays, advancing forward whichever array comes alphabetically earlier (so if the current match is "abacus" and "book", you would move "abacus" to the next word). If you have a match ("book" and "book"), then you increment a count of matched words, and move both arrays to the next word. You continue to do this until you reach the end of one of the arrays (since the remainder of words in the other array won't have any matches).
Once you have this algorithm implemented, you will need a loop that will look as follows:
for (i = 0; i < sentenceCount - 1; i++) {
    for (j = i+1; j < sentenceCount; j++) {
    }
}
Inside the loop you will call your function that calculates the number of words in common using the sentences at indexes i and j. You will keep track of the most number of words in common seen up to that point, and the two sentences where it was found. If a new sentence has a higher number of words in common, you will store that count and the two sentences that yielded that count. At the end, you'll have the two sentences that you want.
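The two pieces described in this answer, the sorted-array comparison and the loop over pairs, could be sketched in Python as below. The function names are illustrative, and lowercasing plus whitespace splitting are assumptions about what counts as a word. Note that with this merge a word repeated in both sentences is counted once per matching occurrence.

```python
def count_common_words(sentence_a, sentence_b):
    """Count matching words by walking two alphabetically sorted word lists."""
    a = sorted(sentence_a.lower().split())
    b = sorted(sentence_b.lower().split())
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1  # match: advance both lists
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1      # a's word comes earlier alphabetically
        else:
            j += 1
    return count


def best_pair(sentences):
    """Return (count, i, j) for the pair of sentences with the most common words."""
    best = (-1, None, None)
    for i in range(len(sentences) - 1):
        for j in range(i + 1, len(sentences)):
            count = count_common_words(sentences[i], sentences[j])
            if count > best[0]:
                best = (count, i, j)
    return best


print(best_pair(["red green blue", "green blue yellow pink", "blue pink"]))
```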

Find the combinations of a given encoded string using Ruby

I was asked this question during an interview and I couldn't come up with a satisfactory solution for it. Would appreciate if anybody could give some pointers.
Given a mapping like
mapping = {"A" => 1, "B" => 2, "C" => 3..... "Z" => 26}
encode("A") == "1"
encode("BA") == "21"
encode("ABC") == "123"
encode("") == ""
decode("1") == ["A"] -> 1
decode("21") == ["BA", "V"] -> 2
decode("123") == ["ABC", "JC", "AX"] -> 3
decode("012") == [] -> 0
decode("") == [""] -> 1
decode("102") == ["JB"] -> 1
numDecode(X) == len(decode(X))
numDecode("1") == 1
numDecode("21") == 2
numDecode("123") == 3
numDecode("") == 1
numDecode("102") == 1
numDecode("012") == 0
We need a numDecode method which gives the length of unique solution array.
Updated :
Given a mapping like :
mapping = {"A" => 1, "B" => 2, "C" => 3..... "Z" => 26}
Suppose we are given the string "A"; then it can be encoded as "1".
encode("A") should return "1"
encode("BA") should return "21", since in the mapping hash B has a value of 2 and A has a value of 1.
encode("ABC") should return "123" as mapping["A"] is 1, mapping["B"] is 2, and mapping["C"] is 3.
encode("") should return "" as it is not in mapping.
Now if decode("1") is called then it should return an array with one element i.e. ["A"] as key matching with 1 as value in mapping is "A".
decode("") should return an array with empty string i.e. [""].
decode("21") should return an array ["BA", "U"] as 2 is "B", 1 is "A" and "U" is 21 in mapping.
decode("012") should return an empty array as string starts with "0" which is not in mapping keys.
decode("102") should return an array as ["JB"] as "10" is J and "2" is B.
And finally numDecode should return the count of unique decoded strings in array. So,
numDecode(X) == len(decode(X))
numDecode("1") == 1
numDecode("21") == 2
numDecode("123") == 3
numDecode("") == 1
numDecode("102") == 1
numDecode("012") == 0
This is an interesting question, and the interview technique that goes with it is most likely to see how far the critical thinking goes. A good interviewer would probably not expect a single canonically correct answer.
If you get as far as a recursive decode solution that you then enumerate, then you are doing well IMO (at least I'd hire most candidates who could demonstrate clearly thinking through a piece of recursive code at interview!)
Having said that, one key hint is that the question asks for a num_decode function, not necessarily for implementations of encode and decode.
There is a deeper understanding and structure accessible here, that can be gained from analysing the permutations and combinations. It allows you to write a num_decode function that can handle long strings with millions of possible answers, without filling memory or taking hours to enumerate all possibilities.
First note that separate ambiguous encodings multiply the number of possibilities for the whole string:
1920 -> 19 is ambiguous 'AI' or 'S' -> 'AIT' or 'ST'
192011 -> 11 is also ambiguous 'AA' or 'K' -> 'AITAA', 'AITK', 'STAA', 'STK'
Here 19 has two possible interpretations, and 11 also has two. A string with both of these separate instances of ambiguous codings has 2 * 2 == 4 valid combinations.
Each independent section of ambiguous coding multiplies the size of the whole set of decode values by the number of possibilities that it represents.
Next, how to deal with longer ambiguous sections: what happens when you add an ambiguous digit to an ambiguous sequence?
11 -> 'AA' or 'K' -> 2
111 -> 'AAA', 'AK', 'KA' -> 3
1111 -> 'AAAA', 'AAK', 'AKA', 'KAA', 'KK' -> 5
11111 -> 'AAAAA', 'AAAK', 'AAKA', 'AKAA', 'AKK', 'KAAA', 'KAK', 'KKA' -> 8
2, 3, 5, 8 should look familiar: it is the Fibonacci sequence. What's going on? The answer is that adding one digit to the sequence allows all the previous combinations plus those of the sub-sequence before that. By adding a digit 1 to the sequence 1111 you can either interpret it as 1111(1) or 111(11), so you can add together the number of possibilities in 1111 and 111 to get the number of possibilities in 11111. That is, N(i) = N(i-1) + N(i-2), which is how the Fibonacci series is constructed.
So, if we can detect ambiguous coding sequences, and get their length, we can now calculate the number of possible decodes, without actually doing the decode:
# A caching Fibonacci sequence generator
def fib n
  @fibcache ||= []
  return @fibcache[n] if @fibcache[n]
  a = b = 1
  n.times do |i|
    a, b = b, a + b
    @fibcache[i+1] = a
  end
  @fibcache[n]
end
def num_decode encoded
  # Check that we don't have invalid sequences, raising here, but you
  # could technically return 0 and be correct according to question
  if encoded.match(/[^0-9]/) || encoded.match(/(?<![12])0/)
    raise ArgumentError, "Not a valid encoded sequence"
  end
  # The look-ahead assertion ensures we don't match
  # a '1' or '2' that is needed by a '10' or '20'
  ambiguous = encoded.scan(/[12]*1[789]|[12]+[123456](?![0])/)
  ambiguous.inject(1) { |n,s| n * fib(s.length) }
end
# A few examples:
num_decode('') # => 1
num_decode('1') # => 1
num_decode('12') # => 2
num_decode('120') # => 1
num_decode('12121212') # => 34
num_decode('1212121212121212121212121211212121212121') # => 165580141
It is relatively short strings like the last one which foil attempts to enumerate the possibilities directly by decoding.
The regex in the scan took a little experimentation to get right. Adding 7,8 or 9 is ambiguous after a 1, but not after a 2. You also want to avoid counting a 1 or 2 directly before a 0 as part of an ambiguous sequence because 10 or 20 have no other interpretations. I think I made about a dozen attempts at the regex before settling on the current version (which I believe to be correct, but I did keep finding exceptions to correct values most times I tested the first versions).
Finally, as an exercise, it should be possible to use this code as the basis from which to write a decoder that directly outputs the Nth possible decoding (or even one that enumerates them lazily from any starting point, without requiring excessive memory or CPU time).
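The recurrence N(i) = N(i-1) + N(i-2) can also be applied one digit at a time instead of per ambiguous run, which avoids the regex. The Python sketch below is that variant, not the method used in the code above: it adds N(i-1) only when the current digit is a valid single letter, and N(i-2) only when the last two digits form a number from 10 to 26. The name num_decode_dp is illustrative, and it returns 0 for invalid input rather than raising.

```python
def num_decode_dp(encoded):
    """Count decodings left to right, keeping only the last two counts."""
    if any(ch not in "0123456789" for ch in encoded):
        return 0  # assumption: non-digit input has no decodings
    prev2, prev1 = 0, 1  # the empty prefix has exactly one decoding
    for i, ch in enumerate(encoded):
        ways = 0
        if ch != "0":
            ways += prev1  # current digit stands alone as A..I
        if i >= 1 and 10 <= int(encoded[i - 1:i + 1]) <= 26:
            ways += prev2  # last two digits together form J..Z
        prev2, prev1 = prev1, ways
    return prev1


for sample in ["", "1", "21", "123", "102", "012", "12121212"]:
    print(repr(sample), num_decode_dp(sample))
```

It runs in time linear in the length of the string and uses constant memory, so long inputs are not a problem.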
Here's a recursive solution:
$mapping = Hash[(0..25).map { |i| [('A'.ord+i).chr,i+1] }]
$itoa = Hash[$mapping.to_a.map { |pair| pair.reverse.map(&:to_s) }]
def decode( str )
  return [''] if str.empty?
  return $itoa.key?(str) ? [$itoa[str]] : nil if str.length == 1
  retval = []
  0.upto(str.length-1) do |i|
    word = $itoa[str[0..i]] or next
    tails = decode(str[i+1..-1]) or next
    retval.concat tails.map { |tail| word + tail }
  end
  return retval
end
Some sample output:
p decode('1') # ["A"]
p decode('21') # ["BA", "U"]
p decode('123') # ["ABC", "AW", "LC"]
p decode('012') # []
p decode('') # [""]
p decode('102') # ["JB"]
p decode('12345') # ["ABCDE", "AWDE", "LCDE"]
Note differences between this output and the question. E.g. The 21st letter of the alphabet is "U", not "V". etc.
@he = Hash[("A".."Z").to_a.zip((1..26).to_a.map(&:to_s))]
# => {"A"=>"1", "B"=>"2",...,"Z"=>"26"}
@hd = @he.invert # => {"1"=>"A", "2"=>"B",.., "26"=>"Z"}
def decode(str, comb='', arr=[])
  return arr << comb if str.empty?
  # Return if the first character of str is not a key of @hd
  return arr unless (c = @hd[str[0]])
  # Recurse with str less the first char, comb with c appended and arr
  arr = decode(str[1..-1], comb+c, arr)
  # If the first two chars of str are a key of @hd (with value c),
  # recurse with str less the first two chars, comb with c appended and arr
  arr = decode(str[2..-1], comb+c, arr) if str.size > 1 && (c = @hd[str[0..1]])
  arr
end
def num_decode(str) decode(str).size end
decode('1') # => ["A"]
decode('') # => [""]
decode('21') # => ["BA", "U"]
decode('012') # => []
decode('102') # => ["JB"]
decode('123') # => ["ABC", "AW", "LC"]
decode('12345') # => ["ABCDE", "AWDE", "LCDE"]
decode('120345') # => ["ATCDE"]
decode('12720132') # => ["ABGTACB", "ABGTMB", "LGTACB", "LGTMB"]
Any more? Yes, I see a hand back there. The gentleman with the red hat wants to see '12121212':
decode('12121212')
# => ["ABABABAB", "ABABABL", "ABABAUB", "ABABLAB", "ABABLL", "ABAUBAB",
"ABAUBL", "ABAUUB", "ABLABAB", "ABLABL", "ABLAUB", "ABLLAB",
"ABLLL", "AUBABAB", "AUBABL", "AUBAUB", "AUBLAB", "AUBLL",
"AUUBAB", "AUUBL", "AUUUB", "LABABAB", "LABABL", "LABAUB",
"LABLAB", "LABLL", "LAUBAB", "LAUBL", "LAUUB", "LLABAB",
"LLABL", "LLAUB", "LLLAB", "LLLL"]
num_decode('1') # => 1
num_decode('21') # => 2
num_decode('12121212') # => 34
num_decode('12912912') # => 8
This looks like a combinatorics problem, but it's also a parsing problem.
(You asked for pointers, so I'm doing this in English rather than dusting off my Ruby.)
I would do something like this:
1. If X is an empty string, return 1
2. If X is not a string composed of digits starting with a nonzero digit, return 0
3. If X contains no 1's or 2's, return 1 (there's only one possible parsing)
4. If X contains 1's or 2's, it gets a bit more complicated:
   - Every 1 that is not the last character in X matches both "A" and the first digit of one of the letters "J" through "S".
   - Every 2 that is not the last character in X and is followed by a digit less than 7 matches both "B" and the first digit of one of the letters.
Count up your 1's and 2's that meet those criteria. Let that number be Y. You have 2^Y combinations of those, so the answer should be 2^Y but you have to subtract 1 for every time you have a 1 and 2 next to each other.
So, if you haven't returned by Step 4 above, count up your 1's that aren't the last character in X, and the 2's that both aren't the last character in X and aren't followed by a 7,8,9, or 10. Let the sum of those counts be called Y.
Now count every instance that those 1's and 2's are neighbors; let that sum be called Z.
The number of possible parsings is (2^Y) - Z.
In the spirit of giving “some pointers”, instead of writing an actual implementation for numDecode let me say that the most logically straightforward way to tackle this problem is with recursion. If the string passed to numDecode is longer than one character then look at the beginning of the string and based on what you see use one or two (or zero) recursive calls to find the correct value.
At the risk of revealing too much, numDecode("1122") should make recursive calls to numDecode("122") and numDecode("22").
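For readers who do want to see the recursion spelled out, here is one possible Python sketch. It assumes the input consists only of digits, and the caching via functools.lru_cache is an addition so that repeated suffixes are not recomputed; neither detail comes from the answer above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def num_decode(s):
    """Count decodings of a digit string by peeling one or two leading digits."""
    if s == "":
        return 1  # nothing left to decode: one complete parsing
    if s[0] == "0":
        return 0  # no letter maps to a code starting with 0
    total = num_decode(s[1:])  # first digit alone (A..I)
    if len(s) >= 2 and int(s[:2]) <= 26:
        total += num_decode(s[2:])  # first two digits together (J..Z)
    return total


print(num_decode("1122"))
```

For "1122" the two recursive calls are on "122" and "22", as described above. Very long inputs would hit Python's recursion limit, so an iterative version is safer there.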
# just look for all singles and doubles as you go down and keep repeating this.. if you get to the end where the string would be 1 or 2 digits long you count 1
# IE
# 121
# 1 that's good 2 that's good 1 that's good if all good then count + 1
# 12 that's good 1 that's good ... no more doubles if all good then count + 1
# 1 that's good 21 that's good if all good then count + 1
# test this on other cases
$str = "2022"
$strlength = $str.length
$count = 0
def decode(str)
  if str[0].to_i >= 1 and str[0].to_i <= 9
    $count += 1 if str.length == 1
    decode(str[1..-1])
  end
  if str[0..1].to_i >= 10 and str[0..1].to_i <= 26
    $count += 1 if str.length == 2
    p str.length
    decode(str[2..-1])
  end
end
decode($str)
p " count is #{$count}"

How do I multiply code text dynamically?

I have a code snippet in my Ruby code:
while words.size >= 3
  frequency["#{words[0]} #{words[1]} #{words[2]}"] += 1
  words.shift
end
I would like to set a variable i and, depending on the value of i, have the appropriate code run. For example, if i=2:
while words.size >= 2
  frequency["#{words[0]} #{words[1]}"] += 1
  words.shift
end
Changing the value in the while condition is easy, but how do I replicate the body of the loop based on the variable i?
You can do this trivially with an if..else statement in your loop:
while words.size > 0
  if words.size >= 3
    frequency["#{words[0]} #{words[1]} #{words[2]}"] += 1
  elsif words.size >= 2
    frequency["#{words[0]} #{words[1]}"] += 1
  else
    break
  end
  words.shift
end
However, this being Ruby, based on how you're creating those keys, it's easy enough to just say:
while words.size >= 2
  frequency[words.first(3).join(" ")] += 1
  words.shift
end
This will take up to the first three elements from the words array and concatenate them with spaces, giving you your frequency key. In the case that there are only two words left, it'll use both of them.
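If the goal is to count only keys of exactly i words, as in the question's two loops, the window size can be passed in as a parameter. A small Python sketch of that idea follows (the function name ngram_frequency and the parameter n, standing in for the question's i, are illustrative):

```python
from collections import Counter


def ngram_frequency(words, n):
    """Count every run of exactly n consecutive words."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the list; slicing leaves `words` intact.
    windows = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(windows)


words = "a b a b a".split()
print(ngram_frequency(words, 2))
print(ngram_frequency(words, 3))
```

In Ruby, each_cons(n) yields the same sliding windows, so the same shape carries over without mutating the array with shift.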

Trying to write a sorting program with ruby - stack level too deep (system stack error)

I'm reading Chris Pine's book "Learn to Program" (it's about Ruby). Right now I'm trying to write a program that sorts words. Unfortunately I'm stuck with "stack level too deep (system stack error)" in line 16, which, if I Googled correctly, means that there is an infinite loop, but I don't know why.
Here's the code:
words = []
wordss = []
word = 'word'
i = 0
k = 0
def sortw array
  i = 0
  if (array.length == 1) || (array.length == 0)
  else sort array, [], [], i
  end
  return array
end
def sort unsorted, unsort, sorted, i
  k = 0 # The error should be here, according to command prompt
  while i < unsorted.length
    while (unsorted[i] < unsorted[k])
      if k < unsorted.length
        k = k + 1
      elsif k == unsorted.length
        sorted.push unsorted[i]
      else unsort.push unsorted[i]
      end
    end
    i = i + 1
    sort unsorted, unsort, sorted, i
  end
  if unsort.length != 1
    i = 0
    sort unsort, [], sorted, i
  else sorted.push unsort[0]
  end
  return sorted
end
puts 'type one word per line...'
puts 'typing enter on an empty line sorts the inputted words'
while word != ''
  word = gets.chomp
  words = words.push word
end
wordss = (sortw words)
puts 'your words'
puts words
puts 'sorted here'
puts wordss
You are getting the error because recursion does not stop due to a problem with the sorting algorithm. In the sort method, k is always less than unsorted.length. This causes the other arrays, sorted and unsort to never populate.
For example try these for input:
dog
zebra
cat
Additionally, I think you want to not include the blank line so I would change the code from:
words = words.push word to words = words.push word if word != ''
This creates the unsorted array:
[0] dog
[1] zebra
[2] cat
Numbered below are the iterations of the recursive sort method.
#initial variable state:
i = 0
k = 1
dog = dog
skip second while loop
i = 1
zebra > dog
skip second while loop
i = 2
cat < dog
enter second while loop
k = 1, now cat < zebra, so keep looping
k = 2, now cat = cat, so exit while
i = 3
Since i is now equal to the unsorted array length, the two while loops never get entered anymore.
Therefore, the following code results in an infinite loop since nothing was pushed to the unsort array:
if unsort.length != 1
  i = 0
  sort unsort, [], sorted, i #Problem is that `unsort` and `sorted` are empty
elsif
  ...
end
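For comparison, a recursive sort terminates as long as every recursive call receives a strictly shorter list and there is a base case for the empty list. The Python sketch below shows one such shape (a selection-style sort); it is not a repaired version of the code in the question, and the names are illustrative.

```python
def recursive_sort(unsorted, sorted_so_far=None):
    """Move the smallest word to the sorted list, then recurse on the rest."""
    if sorted_so_far is None:
        sorted_so_far = []
    if not unsorted:
        return sorted_so_far  # base case: nothing left, so recursion stops
    # Find the smallest remaining word without using a built-in sort.
    smallest = unsorted[0]
    for word in unsorted[1:]:
        if word < smallest:
            smallest = word
    remaining = list(unsorted)
    remaining.remove(smallest)  # one element shorter on every call
    return recursive_sort(remaining, sorted_so_far + [smallest])


print(recursive_sort(["dog", "zebra", "cat"]))
```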
