Time and space complexity of Ruby Anagram algorithm - ruby

I have written this algorithm here and I am trying to evaluate its time and space complexity in terms of Big-O notation. The algorithm determines if two given strings are anagrams.
def anagram(str1, str2)
  str1.each_char do |char|
    selected_index = str2.index(char)
    return false if !selected_index # to handle nil index
    str2.slice!(selected_index)
  end
  str2.empty?
end
Is the time complexity of this function O(n^2) and the space complexity O(1)? I suspect I may be wrong about the space complexity (it could be O(n)), because the selected_index variable is repeatedly re-assigned, which might take up memory in proportion to how long the each_char loop runs.
If someone could please throw some guidance that would be great :)

Gathering up all those comments into an answer, here is my analysis:
Time
The algorithm as presented does indeed have O(n^2) running time.
The body of the loop is executed n times and takes linear time for index, linear time for slice, and constant time for the rest, requiring a total of O(n^2) time.
Space
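The algorithm as presented requires linear space if it works on a copy of str2. Note that slice! itself mutates the string in place, so when it is called directly on the caller's string it needs only constant extra space, but it destroys that string.
The rest of the algorithm only takes constant space, unless you include the storage for the inputs themselves, which is also linear.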
Faster algorithm: sort str1 and str2
A faster algorithm would be to sort the characters of str1 and of str2 and then compare the two sorted sequences. The sort takes O(n log n) time and O(n) space, and the comparison takes linear time and constant space, for an overall O(n log n) time and O(n) space.
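The sort-based approach described above could be sketched roughly as follows. The method name anagram_sorted? is made up for illustration, and the early length check is an added shortcut rather than something stated in the answer.

```ruby
# Sort-based anagram check: O(n log n) time, O(n) extra space for the sorted copies.
# The name anagram_sorted? is hypothetical, not from the original post.
def anagram_sorted?(str1, str2)
  # Strings of different lengths can never be anagrams.
  return false unless str1.length == str2.length

  # chars builds a new array, so neither input string is modified.
  str1.chars.sort == str2.chars.sort
end

puts anagram_sorted?("listen", "silent")
puts anagram_sorted?("abc", "abd")
```

Unlike the original method, this version leaves both arguments untouched, at the cost of the O(n) arrays created by chars.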
Even faster algorithm: use a hash (proposed by OP in the comments)
Using hash tables to store character counts and then comparing those counts can reduce the running time to O(n), assuming standard O(1) insert and lookup hash operations. The space in this case is the space required for the hash tables, which is O(k) for a character alphabet of size k, and can be considered constant if k is fixed. Of course, the input parameters still consume their initial O(n) space as they are passed in or where they are originally stored; the O(k) reflects only the additional space required to run this algorithm.
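One possible sketch of the counting idea is shown below. It is a variant that uses a single hash (counting up for str1 and down for str2) instead of two tables that are compared afterwards; the method name anagram_counts? is hypothetical.

```ruby
# Counting-based anagram check: O(n) time, O(k) extra space for k distinct characters.
# The name anagram_counts? is hypothetical, not from the original post.
def anagram_counts?(str1, str2)
  return false unless str1.length == str2.length

  # Hash.new(0) returns 0 for characters that have not been seen yet.
  counts = Hash.new(0)
  str1.each_char { |char| counts[char] += 1 }

  str2.each_char do |char|
    # str2 uses this character more often than str1 does.
    return false if counts[char].zero?
    counts[char] -= 1
  end

  # Equal lengths and no shortfall mean every count is back at zero.
  true
end

puts anagram_counts?("listen", "silent")
puts anagram_counts?("aab", "abb")
```

Because the lengths are checked first, the method never needs a final pass over the hash to confirm that all counts are zero.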

Related

Big O notation of a preprocessed static data structure

From what I understand, O(n) grows linearly with the size of the input data set.
I'm getting confused because I have a querying structure that maps keys to a list of preprocessed values that never changes after the structure is initialised.
Suppose I define n as the size of the input, an array of keys:
def (arrOfKeys):
    for key in arrOfKeys: # O(n) Iterating through the input.
        preprocessedList = getPreprocessedListDifferentForEachKey(key) # O(1) this list could have any number of elements.
        for anotherPreprocessedList in preprocessedList: # * <- O(n) or O(1)?
            for element in anotherPreprocessedList: # * <- O(n) or O(1)?
                ...
I'm unsure whether this is O(1) because it is preprocessed, or O(n) because the size of the list depends on what the input is.
Does this end up being O(n^3) in the worst case, or is it possible to argue O(n)?
It depends. If preprocessedList (and each of its sub-arrays) always has a constant length, your two inner loops have time complexity O(1). If, however, their lengths depend on the input argument arrOfKeys, each of them is O(n), and together they are O(n) * O(n) = O(n^2).
You then multiply that by the time complexity of the outer loop, which is O(n).
So if the inner loops are each O(n), the total is O(n^3).
If the length of preprocessedList is variable but does not depend on the length of arrOfKeys, you can call it m and say that each inner loop is O(m). The total time complexity is then O(n*m^2).
It's usually fine to introduce another symbol to describe the time complexity, as long as you explain what each symbol means and how it relates to the input.
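To make the O(n*m^2) case concrete, here is a small Ruby sketch that simply counts how often the innermost loop body runs. The helper build_preprocessed is a hypothetical stand-in for the asker's preprocessed structure, and it assumes every key maps to m sublists of m elements each.

```ruby
# Hypothetical stand-in for the preprocessed structure:
# every key maps to m sublists, each holding m elements.
def build_preprocessed(keys, m)
  keys.map { |key| [key, Array.new(m) { Array.new(m, 0) }] }.to_h
end

# Counts how many times the innermost loop body executes.
def count_steps(keys, preprocessed)
  steps = 0
  keys.each do |key|                      # n iterations
    preprocessed[key].each do |sublist|   # m iterations per key
      sublist.each { steps += 1 }         # m iterations per sublist
    end
  end
  steps
end

keys = (1..10).to_a
puts count_steps(keys, build_preprocessed(keys, 3)) # n = 10, m = 3
puts count_steps(keys, build_preprocessed(keys, 6)) # n = 10, m = 6
```

With n fixed at 10, doubling m from 3 to 6 multiplies the step count by four (90 versus 360), which is the m^2 factor; doubling n instead would only double it.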

Space complexity of algorithm to copy a list into a HashSet

What is the space complexity for an algorithm which places each element from a list into a HashSet? Is it O(n), where n is the size of the list or is it O(k), where k is the number of unique elements in the list. Since the HashSet only grows when we add unique elements to it it seems to me that the latter is correct.
The space complexity of any algorithm takes into account the size of the input. It is a measure of the maximum working memory needed during the execution of the algorithm. So for an input of size O(n), the space complexity has to be at least O(n).
The algorithm already uses O(n) for the input alone. Unless the implementation is really bad, it uses only a constant amount of extra space as it iterates over the list, and the set holds k elements with k ≤ n, so the input size is always the dominating factor. Overall, the space complexity is O(n).
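The relationship between n and k can be illustrated with Ruby's Set, used here as a stand-in for a HashSet. The method name unique_elements is hypothetical.

```ruby
require "set"

# Copies every element of the list into a set.
# The set ends up with k entries, where k is the number of distinct elements.
def unique_elements(list)
  unique = Set.new
  list.each { |item| unique.add(item) }
  unique
end

list = [3, 1, 3, 2, 1, 3, 3, 2]
puts list.size                  # n, the size of the input
puts unique_elements(list).size # k, the number of distinct elements
```

The set alone grows to k entries, so the extra memory is O(k); counting the list as well gives O(n + k), which is O(n) because k can never exceed n.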

basic complexity confusion

I have an algorithm which takes in a 2D array and uses no extra space. Is the space complexity of the algorithm O(n^2) (because I am processing the entire input array) or O(1) (since the algorithm does not use any extra space apart from the input)?
In particular, in this question http://www.careercup.com/question?id=4959773472587776 , it does not matter if we use 2 extra 1-dimensional arrays, right? The input space complexity is O(n^2) anyway.
Thanks!
Auxiliary space complexity does not include the input space whereas the space complexity does.
For auxiliary space complexity analysis, only consider the extra memory consumption. If your algorithm does not use any extra space then the auxiliary space complexity is O(1).
If the input has size m (= n x n), and you use 2 arrays of size n, then the auxiliary space complexity will be O(n), which is O(sqrt(m)) in terms of the input size m.
For space complexity, since you count the input size, you are right, using 2 arrays will not change the space complexity.

Differences between time complexity and space complexity?

I have seen that in most cases the time complexity is related to the space complexity and vice versa. For example in an array traversal:
for i=1 to length(v)
  print (v[i])
endfor
Here it is easy to see that the algorithm complexity in terms of time is O(n), but it looks to me like the space complexity is also n (also represented as O(n)?).
My question: is it possible that an algorithm has different time complexity than space complexity?
The time and space complexities are not related to each other. They are used to describe how much space/time your algorithm takes based on the input.
For example when the algorithm has space complexity of:
O(1) - constant - the algorithm uses a fixed (small) amount of space which doesn't depend on the input. For every size of the input the algorithm will take the same (constant) amount of space. This is the case in your example as the input is not taken into account and what matters is the time/space of the print command.
O(n), O(n^2), O(log(n))... - these indicate that you create additional objects based on the length of your input. For example creating a copy of each object of v storing it in an array and printing it after that takes O(n) space as you create n additional objects.
In contrast the time complexity describes how much time your algorithm consumes based on the length of the input. Again:
O(1) - no matter how big the input is, it always takes constant time, for example a single instruction:
function(list l) {
  print("i got a list");
}
O(n), O(n^2), O(log(n)) - again it's based on the length of the input. For example
function(list l) {
  for (node in l) {
    print(node);
  }
}
Note that both last examples take O(1) space as you don't create anything. Compare them to
function(list l) {
  list c;
  for (node in l) {
    c.add(node);
  }
}
which takes O(n) space because you create a new list whose size depends on the size of the input in linear way.
Your example shows that time and space complexity might be different. It takes v.length * print.time to print all the elements. But the space is always the same - O(1) because you don't create additional objects. So, yes, it is possible that an algorithm has different time and space complexity, as they are not dependent on each other.
Time and space complexity are different aspects of calculating the efficiency of an algorithm.
Time complexity deals with how the computational time of an algorithm changes as the size of the input changes.
Space complexity, on the other hand, deals with how much (extra) space the algorithm requires as the input size changes.
To estimate the time complexity of an algorithm, check whether the number of comparisons (or computational steps) increases when the size of the input increases. To estimate the space complexity, check whether the additional memory requirement of the algorithm changes with the size of the input.
A good example is bubble sort.
Let's say you sort an array of 5 elements.
In the first pass you make 4 comparisons of adjacent elements, which moves the largest element to the end. In the second pass you make 3 comparisons, and you continue this way until the whole list is sorted.
Now what happens if you sort 10 elements? In this case you start with 9 comparisons, then 8, and so on. In other words, for an N-element array you start with N-1 comparisons, then N-2, and so on. This results in O(N^2) time complexity.
But what about space? When you sorted the 5-element or 10-element array, did you use any additional buffer or memory? You might say: yes, I used a temporary variable to make the swap. But did the number of variables change when you increased the size of the array from 5 to 10? No. Irrespective of the size of the input, you always use a single variable for the swap. This means the size of the input has nothing to do with the additional space you require, which gives O(1), or constant, space complexity.
Now, as an exercise for you, research the time and space complexity of merge sort.
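The bubble sort counts discussed above can be checked with a short Ruby sketch. The method name bubble_sort_with_count is hypothetical, and the dup call exists only so the demonstration does not modify the caller's array; a sort done in place needs no such copy and stays at O(1) extra space.

```ruby
# Bubble sort that also reports how many comparisons it made.
# Returns [sorted_array, comparison_count].
def bubble_sort_with_count(values)
  sorted = values.dup # copy for the demo only; sorting in place needs O(1) extra space
  comparisons = 0

  # After each pass the largest remaining element sits at index `last`.
  (sorted.length - 1).downto(1) do |last|
    (0...last).each do |i|
      comparisons += 1
      if sorted[i] > sorted[i + 1]
        sorted[i], sorted[i + 1] = sorted[i + 1], sorted[i]
      end
    end
  end

  [sorted, comparisons]
end

sorted, comparisons = bubble_sort_with_count([5, 2, 4, 1, 3])
puts sorted.inspect
puts comparisons
```

For 5 elements this makes 4 + 3 + 2 + 1 = 10 comparisons, and for 10 elements it makes 45, in line with the N(N-1)/2 growth behind the O(N^2) bound.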
First of all, the space complexity of this loop is O(1) (the input is customarily not included when calculating how much storage is required by an algorithm).
So the question that I have is if its possible that an algorithm has different time complexity from space complexity?
Yes, it is. In general, the time and the space complexity of an algorithm are not related to each other.
Sometimes one can be increased at the expense of the other. This is called space-time tradeoff.
There is a well-known relation between time and space complexity.
First of all, time is an obvious bound on space consumption: in time t you cannot reach more than O(t) memory cells. This is usually expressed by the inclusion
DTime(f) ⊆ DSpace(f)
where DTime(f) and DSpace(f) are the sets of languages recognizable by a deterministic Turing machine in time (respectively, space) O(f). That is to say, if a problem can be solved in time O(f), then it can also be solved in space O(f).
Less evident is the fact that space provides a bound on time. Suppose that, on an input of size n, you have at your disposal f(n) memory cells, comprising registers, caches and everything. After having written these cells in all possible ways you may as well stop your computation, since otherwise you would re-enter a configuration you already went through and start to loop. Now, on a binary alphabet, f(n) cells can be written in 2^f(n) different ways, which gives our time upper bound: either the computation stops within this bound, or you may force termination, since the computation will never stop.
This is usually expressed by the inclusion
DSpace(f) ⊆ DTime(2^(cf))
for some constant c. The reason for the constant c is that if L is in DSpace(f) you only know that it will be recognized in space O(f), while in the previous reasoning f was an actual bound.
The above relations are subsumed by stronger versions involving nondeterministic models of computation, which is how they are frequently stated in textbooks (see e.g. Theorem 7.4 in Computational Complexity by Papadimitriou).
Yes, this is definitely possible. For example, sorting n real numbers requires O(n) space, but O(n log n) time. It is true that space complexity is always a lower bound on time complexity, as the time to initialize the space is included in the running time.
Sometimes they are related and sometimes they are not.
In fact, we sometimes use more space to get faster algorithms, as in dynamic programming https://www.codechef.com/wiki/tutorial-dynamic-programming
Dynamic programming uses either memoization or a bottom-up approach. Memoization uses memory to remember the solutions of repeated subproblems, so the algorithm does not need to recompute them and can just look them up in a list of solutions. The bottom-up approach starts with the small solutions and builds on them to reach the final solution.
Here are two simple examples: one shows a relation between time and space, and the other shows no relation.
Suppose we want to find the sum of all integers from 1 to a given integer n:
code1:
sum=0
for i=1 to n
  sum=sum+i
print sum
This code uses only 6 bytes of memory (2 bytes each for i, n and sum, assuming 2-byte integers),
so the time complexity is O(n), while the space complexity is O(1).
code2:
array a[n]
a[1]=1
for i=2 to n
  a[i]=a[i-1]+i
print a[n]
This code uses at least n*2 bytes of memory for the array,
so the space complexity is O(n) and the time complexity is also O(n).
Space complexity is the way in which the amount of storage space required by an algorithm varies with the size of the problem it is solving. It is normally expressed as an order of magnitude, e.g. O(N^2) means that if the size of the problem (N) doubles then four times as much working storage will be needed.
Space complexity is the total amount of memory used by an algorithm or program, including the space for the input values during execution, whereas time complexity is the number of operations an algorithm performs to complete its task. These are two different concepts: a single algorithm can have low time complexity but still take up a lot of memory. For example, hash maps take more memory than arrays but make lookups faster.

How to find the time complexity? [duplicate]

Am I correct in my explanation when calculating the time complexity of the following algorithm?
A HashSet, moduleMarksheetFiles, is being used to add the files that contain the moduleName specified.
for (File file: marksheetFiles){
    while(csvReader.readRecord()){
        String moduleName = csvReader.get(ModuleName);
        if (moduleName.equals(module)){
            moduleMarksheetFiles.add(file);
        }
    }
}
Let m be the number of files.
Let k be the average number of records per file.
Each file is added only once, because HashSet does not allow duplicates. HashSet.add() is O(1) on average and O(n) in the worst case.
Searching for a record with the specified moduleName involves comparing every record in the file to the moduleName, which takes O(n) steps.
Therefore, the average time complexity would be O((m*k)^2).
Is this correct?
Also, how would you calculate the worst case?
Thanks.
PS. It is not homework, just analysing my system's algorithm to evaluate performance.
No, it's not squared; this is O(mk). (Technically, that means it's also O((mk)²), but we don't care.)
Your misconception is that the worst-case performance of HashSet is what counts here. However, even though a hash table may have worst-case O(n) insertion time (if it needs to rehash every element), its amortized insertion time is O(1) (assuming your hash function is well behaved; File.hashCode presumably is). In other words, if you insert multiple things, so many of the insertions will be O(1) that the occasional O(n) insertion does not matter.
Therefore, we can treat insertions as constant-time operations, so performance is purely dictated by the number of iterations through the inner loop body, which is O(mk).
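The O(mk) count can be made visible with a Ruby sketch that mirrors the Java loop. It assumes the files are represented as a hash from a file name to an array of module names; that representation and the method name files_containing are hypothetical simplifications, not the asker's CSV reader.

```ruby
require "set"

# files maps a file name to the list of module names found in its records
# (a hypothetical stand-in for reading each CSV file).
# Returns [set_of_matching_file_names, number_of_record_comparisons].
def files_containing(module_name, files)
  matching = Set.new
  comparisons = 0

  files.each do |file_name, records|   # m files
    records.each do |record|           # about k records per file
      comparisons += 1
      matching.add(file_name) if record == module_name # amortized O(1)
    end
  end

  [matching, comparisons]
end

files = {
  "a.csv" => ["maths", "physics", "maths"],
  "b.csv" => ["art", "history", "music"],
  "c.csv" => ["physics", "maths", "art"]
}
matching, comparisons = files_containing("maths", files)
puts matching.to_a.inspect
puts comparisons
```

With m = 3 files of k = 3 records each, the inner body runs 9 times, and the duplicate match in a.csv does not add a second entry to the set.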
