Naive String Search Algorithm's best time complexity [closed] - algorithm

While studying this algorithm, I found that the best-case time complexity is O(n), as the book says. But why is it not O(m)? I think the best case is when the pattern string matches at the first position of the main string, so only m comparisons are needed.
P.S. n is the length of the main string and m is the length of the pattern string.

String search algorithms are most often understood as having to find all occurrences. For example, Wikipedia has in its String-searching algorithm article:
The goal is to find one or more occurrences of the needle within the haystack.
This is confirmed in Wikipedia's description of the Boyer-Moore string search algorithm, where it states:
The comparisons continue until either the beginning of P is reached (which means there is a match) or a mismatch occurs upon which the alignment is shifted forward (to the right) according to the maximum value permitted by a number of rules. The comparisons are performed again at the new alignment, and the process repeats until the alignment is shifted past the end of T, which means no further matches will be found.
And again, for the Knuth–Morris–Pratt algorithm we find the same:
the Knuth–Morris–Pratt string-searching algorithm (or KMP algorithm) searches for occurrences of a "word" W within a main "text string" S [...]
input:
an array of characters, S (the text to be searched)
an array of characters, W (the word sought)
output:
an array of integers, P (positions in S at which W is found)
an integer, nP (number of positions)
So even in your best case scenario the algorithm must continue the search after the initial match.
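To make this concrete, here is a minimal sketch of a naive search that reports all occurrences and counts character comparisons. The function name `naive_search_all` and the comparison counter are illustrative assumptions, not taken from the book or from Wikipedia. Even when the pattern matches at position 0, every later alignment still costs at least one comparison, so the total grows with n.

```python
def naive_search_all(text, pattern):
    """Return (positions, comparisons) for a naive all-occurrences search.

    Hypothetical helper for illustration only.
    """
    n, m = len(text), len(pattern)
    positions = []
    comparisons = 0
    # Try every alignment of the pattern against the text.
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break  # mismatch: move on to the next alignment
            j += 1
        if j == m:
            positions.append(start)
    return positions, comparisons


positions, comparisons = naive_search_all("abcxxxxx", "abc")
print(positions, comparisons)
```

Here the match at position 0 costs m = 3 comparisons, but the remaining alignments each cost one more, so the count is tied to n rather than m.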

Yes, with a bit-based (approximate) method you can get O(n), but how could you expect to find the pattern in O(m)? Suppose the main string has length 10^10 and consists entirely of 'A', and the pattern string is "B". How could you find "B" in that string in O(m), where m = 1?


Powers of a half that sum to one [closed]

Call every subunitary fraction whose denominator is a power of 2 a perplex.
The number 1 can be written in many ways as a sum of perplexes.
Call every sum of perplexes a zeta.
Two zetas are distinct if and only if one of the zetas has at least one perplex that the other does not have. In the image shown above, the last two zetas are considered to be the same.
Find the number of ways 1 can be written as a zeta with N perplexes. Because this number can be big, calculate it modulo 100003.
Please don't post the code, but rather the algorithm. Be as precise as you can.
This problem was given at a contest and the official solution, written in the Romanian language, has been uploaded at https://www.dropbox.com/s/ulvp9of5b3bfgm0/1112_descr_P2_fractii2.docx?dl=0 , as a docx file. (you can use google translate)
I do not understand what the author of the solution meant to say there.
Well, this reminds me of BFS (breadth-first search) algorithms, where you radiate out from a single point to find multiple solutions with different permutations.
Here you can use recursion, with the base case being that N perplexes have been reached in the current chain of recursive calls.
So you can say:
function(int N /* perplexes left to add */, ArrayList<Double> currentNumbers, double dividedNum)
    if N == 0, then you're done: enter the currentNumbers array into a hashtable and return
    clone the currentNumbers ArrayList as cloneNumbers
    remove dividedNum from cloneNumbers and add dividedNum/2 twice
    for every number x in cloneNumbers, call function(N - 1, cloneNumbers, x)
This is a rough, very inefficient, but short way to do it. There are obviously many ways to prune the algorithm (reduce the number of duplicates going into the hashtable, avoid cloning as much as possible, etc.). But because this generates every permutation of the numbers and then enters each sequence into a hashtable, the hashtable will use its equals() comparison to see that a sequence already exists (such as your last 2 zetas) and reject the duplicate. That way, you'll be left with the answer you want.
The efficiency of the current algorithm is O(|E|^N), where |E| is the number of values in the array at the end of all insertions and N is the number of insertions (or, as you said, the number of perplexes). Obviously this isn't the optimal speed, but it does work.
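As a rough sketch of the splitting idea above, the following Python version works level by level instead of recursively. It makes two assumptions that are not in the original answer: each perplex 1/2^e is stored as its exponent e (to avoid floating-point keys), and a zeta is identified by the sorted tuple of its exponents, so reorderings count as the same zeta. The name `count_zetas` is hypothetical. This is still brute force and is not the contest's official solution.

```python
def count_zetas(n, mod=100003):
    """Count multisets of n perplexes (1/2^e, e >= 1) that sum to 1, modulo mod.

    Brute-force sketch: start from 1/2 + 1/2 and repeatedly split one
    perplex 1/2^e into two copies of 1/2^(e+1).
    """
    if n < 2:
        return 0  # 1 itself is not subunitary, so at least two perplexes are needed
    level = {(1, 1)}  # exponents of 1/2 + 1/2
    for _ in range(n - 2):
        next_level = set()
        for zeta in level:
            for e in set(zeta):  # splitting equal perplexes gives the same result
                rest = list(zeta)
                rest.remove(e)
                rest += [e + 1, e + 1]
                next_level.add(tuple(sorted(rest)))  # the set rejects duplicates
        level = next_level
    return len(level) % mod


print([count_zetas(k) for k in range(2, 7)])
```

The set plays the role of the hashtable in the description: two split sequences that end in the same multiset collapse into one entry.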
Hope this helps!

Find a number by the decimal part of its square root [closed]

I have a math problem consisting of two questions:
Can we find a number N knowing only the decimal part of its square root up to a given precision (only an approximation of the decimal part, because the decimal part never ends)?
Is the answer unique? That is, can we be sure we won't find two integers whose square roots have the same decimal digits (the first 50, for example)?
Example:
If we have 0,4142135623730950488016887242097, can we find that it is the decimal part of the square root of 2,
or 0,418286444621616658231167581 for 1234567890?
The answer to the second question is pretty easy: with, say, 50 decimals, there are far more integers to take square roots of than the 10^50-1 possible values of the decimal part, so there will be more than one answer.
I am very grateful for your help or any research track.
You answered the second question yourself already. No, there is no unique solution.
For the first question I don't know a quick mathematical solution, but here are some non-performant programming solutions:
Option A: The brute force method:
Iterate over all integers, and compare the square root of each with your number.
Option B: A trickier brute force method, which is more performant, but still slow:
Iterate the integers from 1 to M
Add your decimal part to each of them
Square the sum and see how close it is to the nearest integer
If the nearest integer is very close, take its square root to cross-check the result
Stop as soon as you have found the correct integer
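Option B could be sketched as follows, using Python's `decimal` module for the extra precision. The function name `find_by_fraction`, the search bound `max_int_part`, and the tolerance of one unit in the last given digit are all assumptions made for this illustration. The loop starts at 0 rather than 1 so that square roots below 1 are also covered, and it returns only the first match, which need not be the only one.

```python
from decimal import Decimal, localcontext


def find_by_fraction(frac_digits, max_int_part=10**6):
    """Find the smallest integer whose square root has the given decimal digits.

    frac_digits: the digits after the decimal separator, as a string.
    Returns None if no integer part up to max_int_part (assumed bound) works.
    """
    digits = len(frac_digits)
    with localcontext() as ctx:
        ctx.prec = 2 * digits + 30  # enough working precision for squaring
        frac = Decimal("0." + frac_digits)
        tolerance = Decimal(10) ** -digits
        for k in range(max_int_part + 1):
            guess = k + frac
            candidate = (guess * guess).to_integral_value()  # nearest integer
            # Cross-check: the candidate's square root must reproduce the digits.
            if abs(candidate.sqrt() - guess) < tolerance:
                return int(candidate)
    return None


print(find_by_fraction("4142135623730950488016887242097"))
```

The running time is linear in the integer part of the square root, so this is only practical when that integer part is small.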
Option C: caching:
precalculate your decimal parts for all integers and store them in a HashMap.
use the HashMap to find the results quickly
Consider: since you have a very big amount of data, different decimal parts could result in the same hash value, which would break this option.

Finding number of anagrams [closed]

The following question was asked of me in an interview.
We know anagram of eat is: tea and ate
The question is:
We have a program. We feed a list of 10 thousand alphabets to this program.
We run the program.
Now at run-time, we provide a word to this program, e.g. "eat".
Now the program should return the number of anagrams that exist in the list of 10 thousand alphabets. Hence for an input of "eat", it should return 2.
What would be a good strategy to store those 10 thousand alphabets so that finding the number of anagrams becomes easy?
Sort the letters of each word into a canonical order, e.g. alphabetically, so that tea becomes aet.
Then simply put these in a (hash) map from sorted words to counts (both tea and ate map to aet, so we'll have (aet, 2) in the map).
Then, when you get a word, reorder its letters as above and do a lookup for the count.
Running time:
Assuming n words in the list, with an average word length of m...
Expected O(nm log m) preprocessing, expected O(m log m) per query.
It's m log m on the assumption we just do a simple sort of the letters of a word.
The time taken per query is expected to be unaffected by the numbers of words in the list (i.e. hash maps give expected O(1) lookup time).
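A minimal sketch of this approach, assuming the list holds words and that the sorted letters of a word serve as the map key. The names `build_anagram_index` and `count_anagrams` are hypothetical.

```python
from collections import Counter


def build_anagram_index(words):
    """Preprocess: map each word's sorted letters to the number of such words."""
    return Counter("".join(sorted(word)) for word in words)


def count_anagrams(index, word):
    """Query: sort the letters of the word and look up the count."""
    return index["".join(sorted(word))]


index = build_anagram_index(["tea", "ate", "tan", "nat", "bat"])
print(count_anagrams(index, "eat"))
```

Note that if the queried word itself is in the list, it is counted as well; subtract one in that case if only other words should count.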

Proposing an O(logm) algorithm for the following [closed]

I need to propose an algorithm for the following: assume we have an array consisting of zeros and ones. The array is filled with zeros from the beginning up to index m, and all remaining indexes are filled with ones. I need to find this index m in O(logm) time. Here is what I thought: this is like binary search. First I look at the middle element of the array. If it is zero, I forget about the left part of the array and do the same for the right part, continuing like this until I encounter a one. If the middle element is one, I forget about the right part and do the same for the left part of the array. Is this a correct O(logm) solution? Thanks
It is not "like" a binary search - it is a binary search. Unfortunately, it is O(logN), not O(logM).
To find the boundary in O(logM), start from the other end: try positions 1, 2, 4, 8, 16, ..., doubling each time, until you hit a 1. Then do a binary search on the interval between 2^i and 2^(i+1), where 2^(i+1) is the first position where you discovered a 1.
Finding the first 1 takes O(logM), because the index is doubled on each iteration. After that, the binary search takes another O(logM), because the length of the interval 2^i..2^(i+1) is 2^i, which is at most M since position 2^i still holds a 0.
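The doubling step followed by a binary search can be sketched as below. The function name `first_one_index` and the convention of returning the index of the first 1 (or the array length if there is none) are assumptions for this example; the array bound check is added because the doubled index may run past the end.

```python
def first_one_index(bits):
    """Return the index of the first 1 in a 0...01...1 array, or len(bits) if none."""
    n = len(bits)
    if n == 0 or bits[0] == 1:
        return 0
    # Doubling phase: probe positions 1, 2, 4, 8, ... until a 1 or the end.
    hi = 1
    while hi < n and bits[hi] == 0:
        hi *= 2
    lo = hi // 2      # bits[lo] is known to be 0
    hi = min(hi, n)   # bits[hi] is 1, or hi == n (no 1 found yet)
    # Binary search: the first 1, if any, lies in (lo, hi].
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if bits[mid] == 0:
            lo = mid
        else:
            hi = mid
    return hi


print(first_one_index([0] * 1000 + [1] * 5))
```

Both phases take a number of steps logarithmic in the position of the first 1, independent of how many ones follow it.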

challenging string algorithm on pattern matching from bioinformatics [closed]

A friend told me the following challenge problem.
Given {A, T, G, C} as our alphabet, we want to know the number of valid phrases with a specified length n with the following recursive pattern definition:
pat=pat1pat2, i.e. concatenate two patterns together to form a new pattern pat.
pat=(pat1|pat2), i.e. choosing either one of the patterns pat1 or pat2 to form a new pattern pat.
pat=(pat1*), i.e. repeating pattern pat1 any number of times (can be 0) to form a new pattern pat.
A phrase formed from the alphabet {A, T, G, C} is said to satisfy a pattern if it can be formed by the above pattern definition; its length is the number of letters.
A few examples:
Given a pattern ((A|T|G)*) and n=2, the number of valid phrases is 9, since there are AA, AT, AG, TA, TT, TG, GA, GT, GG.
Given a pattern (((A|T)*)|((G|C)*)) and n=2, the number of valid phrases is 8, since there are AA, AT, TA, TT, GG, GC, CG, CC.
Given a pattern ((A*)C(G*)) and n=3, the number of valid phrases is 3, since there are AAC, ACG, CGG.
Please point me to the source of this problem if you have ever seen it, and share your ideas for tackling it.
The choice of letters A, C, G, and T makes me think of DNA base pair sequences. But as thiton wrote, clearly this problem was lifted from the study of regular languages. Google "regular language enumeration" and you should find plenty of research papers and code to get you started. I'd be surprised if computing the number of matching strings for these patterns were not a #P-complete problem, so expect run-times exponential in n.
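For small n, the counts in the question's examples can be checked by brute force. The sketch below assumes the pattern notation can be passed unchanged to Python's `re` module, which holds for the three examples given, and simply tries all 4^n strings. The name `count_phrases` is hypothetical, and this is only a checker that is itself exponential in n, not an efficient enumeration method.

```python
import itertools
import re


def count_phrases(pattern, n, alphabet="ATGC"):
    """Count strings of length n over the alphabet that fully match the pattern."""
    rx = re.compile(pattern)
    return sum(
        1
        for letters in itertools.product(alphabet, repeat=n)
        if rx.fullmatch("".join(letters))  # the whole phrase must match
    )


print(count_phrases("((A|T|G)*)", 2))
print(count_phrases("(((A|T)*)|((G|C)*))", 2))
print(count_phrases("((A*)C(G*))", 3))
```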
