How can I count the number of equal words between two strings? - xpath

How can I count the number of words that appear in two strings?
I'm thinking of something like this:
let $nequalwords := count($item[text() eq $speech])
What is the best way to do this?
I thought of using two nested for loops that compare word by word, but I don't know if there is a better way to do this.

How about splitting each string on whitespace to get its words, reducing each list to its distinct values, and then concatenating the two lists into one sequence? Any word that appears in both strings occurs twice in that sequence, so subtracting the number of distinct values from the total count gives the number of words that appear in both strings. For example:
let $distinct-words1 := distinct-values(tokenize($string1, "\s+"))
let $distinct-words2 := distinct-values(tokenize($string2, "\s+"))
let $all-words := ($distinct-words1, $distinct-words2)
return
count($all-words) - count(distinct-values($all-words))

How about
count(tokenize($string1, "\s+")[. = tokenize($string2, "\s+")])
This is the number of words in the first string that also appear in the second string, which might or might not be what you actually want: a word repeated in the first string is counted each time it occurs. For example, if the two strings are "the more the merrier" and "the rite of spring", the answer will be 2.

Related

Ruby. Split string in separate decimal numbers

I have a long string which contains only decimal numbers with two digits after the comma:
str = "123,457568,22321,5484123,77"
The string contains only decimals with two digits after the comma. How can I separate them into individual numbers like this?
arr = ["123,45" , "7568,22" , "321,54" , "84123,77"]
You could try a regex split here:
str = "123,457568,22321,5484123,77"
nums = str.split(/(?<=,\d{2})/)
puts nums
This prints:
123,45
7568,22
321,54
84123,77
The regex splits the string at every position that is immediately preceded by a comma and two digits.
Scan String for Commas Followed by Two Digits
This is a case where you really need to know your data. If you always have floats with two decimal places, and commas are decimals in your locale, then you can use String#scan as follows:
str.scan(/\d+,\d{2}/)
#=> ["123,45", "7568,22", "321,54", "84123,77"]
Since your input data isn't consistent (which can be assumed by the lack of a reliable separator between items), you may not be able to guarantee that each item has a fractional component at all, or that the component has exactly two digits. If that's the case, you'll need to find a common pattern that is reliable for your given inputs or make changes to the way you assign data from your data source into str.
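As a minimal sketch of how one might guard against inconsistent input, the pieces returned by String#scan can be joined back together and compared with the original string. The error message and the decision to raise are assumptions for illustration, not part of the original answer:

```ruby
str = "123,457568,22321,5484123,77"

# Extract every run of digits followed by a comma and exactly two digits.
parts = str.scan(/\d+,\d{2}/)

# If the pieces do not reassemble into the original string, some characters
# were skipped, which means the input did not follow the assumed format.
raise ArgumentError, "unexpected input format" unless parts.join == str

p parts
```

If the check fails, the pattern silently dropped part of the input, which is usually a sign that the data source needs a real separator.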

Ruby regex count matched elements in the array of digits

I have a string:
'my_array1: ["1445","374","1449","378"], my_array2: ["1445","374", "1449","378"]'
I need to match all sets of digits from my_array2: [...] and count how many there are.
I need to do something like this with a regex and Ruby's MatchData:
string = 'my_array1: ["1445","374", "1449","378"], my_array2: ["1445","374", "1449","378"]'
matches = string.match(/my_array2\:\s[\[,]\"(\d+)\"/)
count_matches = matches.size
Expected result should be 4.
What is the correct way of doing it?
If you are guaranteed that the content of my_array2 is always numeric, you could simply use split twice. First you split by my_array2: [" and then split the remainder by ,. This should give you the number of items you are after.
If you are not guaranteed that, you could still split by my_array2 and, instead of splitting again, use a pattern such as "\d+" (or "\d+(\.\d+)?" if you have floating point values) and count the matches.
An example of the expression is available here.
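A rough sketch of the second suggestion in Ruby, assuming the array contents never contain a closing bracket. It isolates the text between the brackets after my_array2: and then counts the digit runs; the variable name inner is made up for this example:

```ruby
string = 'my_array1: ["1445","374", "1449","378"], my_array2: ["1445","374", "1449","378"]'

# String#[] with a regex and a capture index returns the captured text,
# or nil when the pattern does not match.
inner = string[/my_array2:\s*\[([^\]]*)\]/, 1]

# to_s turns nil into "" so a missing my_array2 simply yields a count of 0.
count_matches = inner.to_s.scan(/\d+/).size

puts count_matches
```

String#match returns only the first match, which is why counting with scan is a better fit here than MatchData#size.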

String of words - DP

I have a string of words and I must determine the longest substring such that the last two letters of each word are the first two letters of the word after it.
For example, for the words:
star, artifact, book, ctenophore, list, reply
Edit: So the longest substring would be star, artifact, ctenophore, reply
I'm looking for an idea to solve this problem in O(n). No code needed; I'd appreciate any suggestions on how to solve it.
The closest thing to O(n) I have is this:
You should mark every word with an ID. Let's take your example:
star => the first possible substring. Since you're looking for the longest substring, a substring that starts with ar is not the longest, because you can add star in front of it.
Let's set star's ID to 1; its comparison string is ar.
artifact => its first two characters match the first possible substring. Let's set artifact's ID to 1 as well and change the comparison string to ct.
book => its first two characters don't match any of the comparison strings (there's only ct), so we set book's ID to 2 and add a new comparison string: ok.
...
list => its first two characters don't match any of the comparison strings (re from ID == 1 and ok from ID == 2), so we create another ID = 3 and another comparison string.
In the end, you just need to go through the IDs and see which one has the most elements. You can probably count them as you go as well.
The main idea of this algorithm is to memorize every substring we're looking for. If we find a match, we just update the right substring with the two new last characters, and if we don't, we add it to the "memory list".
Repeating this procedure makes it O(n*m), where m is the number of different IDs.
First, read in all words into a structure. (You don't really need to, but it's easier to work that way. You could also read them in as you go.)
The idea is to have a lookup table (such as a Dictionary in .NET) keyed by the last two letters of a word. The value stored for each key is always the longest 'substring' found so far that ends in those two letters.
Time complexity is O(n) - you only go through the list once.
Logic:
maxWord <- ""
repeat for each word:
    word <- read next word
    initial <- get first two letters of word
    end <- get last two letters of word
    if lookup contains key initial //that is the longest string so far... add to it
        newWord <- lookup [initial] value + ", " + word
        if lookup doesn't contain key end //nothing ends with these two letters so far
            lookup add (end, newWord) pair
        else if lookup [end] value length < newWord length //if this will be the longest string ending in these two letters, we replace the previous one
            lookup [end] <- newWord
        if maxWord length < newWord length //put this code here so you don't have to run through the lookup table again and find it when you finish
            maxWord <- newWord
    else //lookup doesn't contain initial, we use only the word, and similar to above, check if it's the longest that ends with these two letters
        if lookup doesn't contain key end
            lookup add (end, word) pair
        else if lookup [end] value length < word length
            lookup [end] <- word
        if maxWord length < word length
            maxWord <- word
The maxWord variable will contain the longest string.
Here is the actual working code in C#, if you want it: http://pastebin.com/7wzdW9Es
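For readers who prefer something runnable, here is a small Ruby sketch of the lookup-table idea. It is not the linked C# code; the method name longest_chain and the choice to measure length in number of words (rather than characters, as the pseudocode does) are assumptions made for this example:

```ruby
# Returns the longest chain of words in which every word starts with the
# last two letters of the previous word in the chain.
def longest_chain(words)
  best_by_ending = {} # last two letters => longest chain found so far ending in them
  longest = []

  words.each do |word|
    next if word.length < 2 # assumption: words too short to link are skipped

    prefix = word[0, 2]
    suffix = word[-2, 2]

    # Extend the best chain that ends with this word's first two letters,
    # or start a new chain if there is none.
    chain = (best_by_ending[prefix] || []) + [word]

    current = best_by_ending[suffix]
    best_by_ending[suffix] = chain if current.nil? || current.length < chain.length

    longest = chain if chain.length > longest.length
  end

  longest
end

p longest_chain(%w[star artifact book ctenophore list reply])
```

Each word triggers a constant number of hash lookups, so the list is scanned only once. Copying the chain arrays adds extra work, though; storing a count and a back-pointer per key would avoid that at the cost of a reconstruction step at the end.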

Parse many numbers containing commas from string

I have a series of strings that all include 1 or many numbers (a number in this case would be 123,123,123) in the following format
"This is a number 123,124,123"
"These are some more numbers 123,345,123; 231,123,123; 124,152,123"
"This one is an odd situation 123,124,125; 123,123,123; more text"
What is the cleanest way to parse these numbers into either an array or a string that I can split that looks like this?
"123,124,123"
"123,345,123;231,123,123;124,152,123"
"123,124,125;123,123,123;"
Ultimately I want to be able to separate out the numbers like this.
"123,124,123"
"123,345,123" "231,123,123" "124,152,123"
"123,124,125" "123,123,123"
Currently attempting to use
"string".scan( /\d/ )
but obviously this is only giving me the numbers without the commas and also not separated properly.
Do it like this
string.scan(/[\d,]+/)
Another way would be to remove the unwanted characters.
arr = ["This is a number 123,124,123",
"These are some more numbers 123,345,123; 231,123,123; 124,152,123",
"This one is an odd situation 123,124,125; 123,123,123; more text"]
arr.map { |str| str.gsub(/[^\s\d,]+/,'').split }
#=> [["123,124,123"],
# ["123,345,123", "231,123,123", "124,152,123"],
# ["123,124,125", "123,123,123"]]
A regex that matches your numbers is \d{1,3}(,\d{3})*
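One way this stricter pattern might be used with String#scan is sketched below. The group is written as non-capturing, (?:...), because scan returns only the captured groups when the pattern contains capturing groups. The names NUMBER, lines and parsed are made up for the example:

```ruby
lines = [
  "This is a number 123,124,123",
  "These are some more numbers 123,345,123; 231,123,123; 124,152,123",
  "This one is an odd situation 123,124,125; 123,123,123; more text"
]

# One to three digits followed by any number of ",ddd" groups.
# The non-capturing group makes scan return whole matches.
NUMBER = /\d{1,3}(?:,\d{3})*/

parsed = lines.map { |line| line.scan(NUMBER) }

p parsed
```

Unlike [\d,]+, this pattern does not pick up stray commas in the surrounding text.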

String that can contain multiple numbers - how do I extract the longest number?

I have a string that
contains at least one number
can contain multiple numbers
Some examples are:
https://www.facebook.com/permalink.php?story_fbid=53199604568&id=218700384
https://www.facebook.com/username_13/posts/101505775425651120
https://www.facebook.com/username/posts/101505775425699820
I need a way to extract the longest number from the string. So for the 3 strings above, it would extract
53199604568
101505775425651120
101505775425699820
How can I do this?
#get the lines first
text = <<ENDTEXT
https://www.facebook.com/permalink.php?story_fbid=53199604568&id=218700384
https://www.facebook.com/username_13/posts/101505775425651120
https://www.facebook.com/username/posts/101505775425699820
ENDTEXT
lines = text.split("\n")
#this bit is the actual answer to your question
lines.collect{|line| line.scan(/\d+/).sort_by(&:length).last}
Note that I'm returning the numbers as strings here. You could convert them to numbers with to_i.
Parse the list to get an array of integers, then take the maximum (array.max in Ruby). This works because, without leading zeros, the number with the most digits is also the largest.
s = "https://www.facebook.com/permalink.php?story_fbid=53199604568&id=218700384"
s.scan(/\d+/).max{|a,b| a.length <=> b.length}.to_i
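A slightly more compact variant, shown here as a sketch over the three example URLs, uses Enumerable#max_by to pick the longest digit run directly. The names urls and longest_numbers are chosen for illustration:

```ruby
urls = [
  "https://www.facebook.com/permalink.php?story_fbid=53199604568&id=218700384",
  "https://www.facebook.com/username_13/posts/101505775425651120",
  "https://www.facebook.com/username/posts/101505775425699820"
]

# For each URL, collect every run of digits and keep the one with the most characters.
longest_numbers = urls.map { |url| url.scan(/\d+/).max_by(&:length) }

puts longest_numbers
```

The results are strings; call to_i on them if integers are needed.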
