This question already has answers here:
Algorithm for autocomplete?
(9 answers)
Closed 9 years ago.
If a word is typed in Google, it will show a list of words as suggestions in a drop-down list.
For example, if you type what, it will show what is your name, what is your father's name, what is your college name, etc., as a list of about 8 suggestions.
What is a suitable data structure for this, and what is the best way to list those suggestions?
I think the best method is to use a trie where each edge is weighted according to the probability that the next letter corresponds to that edge, so that the first suggestions have the highest probabilities.
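To make the trie idea concrete, here is a minimal sketch (all class and method names are made up for illustration): phrases are inserted with a popularity count, and suggestions for a prefix are the most popular phrases found below the prefix node.

```java
import java.util.*;

// Sketch of a weighted autocomplete trie. Phrases are inserted with a
// popularity count; suggestions for a prefix are the most popular
// phrases below the prefix node. Names are illustrative, not from any library.
class SuggestionTrie {
    private final Map<Character, SuggestionTrie> children = new HashMap<>();
    private int count = 0;          // popularity of the phrase ending here
    private String phrase = null;   // non-null only at terminal nodes

    void insert(String p, int popularity) {
        SuggestionTrie node = this;
        for (char c : p.toCharArray())
            node = node.children.computeIfAbsent(c, k -> new SuggestionTrie());
        node.phrase = p;
        node.count += popularity;
    }

    List<String> suggest(String prefix, int k) {
        SuggestionTrie node = this;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return Collections.emptyList();
        }
        List<SuggestionTrie> terminals = new ArrayList<>();
        collect(node, terminals);
        terminals.sort((a, b) -> Integer.compare(b.count, a.count)); // most popular first
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, terminals.size()); i++)
            result.add(terminals.get(i).phrase);
        return result;
    }

    private static void collect(SuggestionTrie node, List<SuggestionTrie> out) {
        if (node.phrase != null) out.add(node);
        for (SuggestionTrie child : node.children.values()) collect(child, out);
    }
}
```

For a production autocomplete you would precompute the top-k list at each node instead of sorting on every query, but the sketch shows the shape of the structure.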
Related
I've been tasked with finding the most efficient solution to the following problem: every time a movie is streamed, I'm given the name of the movie and the genre. I need to print out the k most streamed movies of genre g in the given year y; I can assume it takes O(1) to retrieve the current year. An example of this is:
"What are the top 5 most streamed romance movies in the year 2014?"
The returned answer might be something like
MovieName1 (romance) 3409 streams
MovieName2 (romance) 4000 streams
MovieName3 (romance) 5340 streams
MovieName4 (romance) 9000 streams
MovieName5 (romance) 10000 streams
So my thought process is to use 3 nested hashtables.
One where I use the name (key) to map to the frequency (value)
One where I use the genre (key) to map to a map(name, frequency) (value)
And the final one where I use the year (key) to map to a map(genre, map(name, frequency)) (value)
Does that make any sense? I think I confused myself just by writing this.
Is it possible to just use a single hashtable that uses the year as a key and maps to a linked list of nodes, where every node contains the movie name, frequency, and genre? Would this be more efficient?
So if I wanted to update Batman's frequency, I could just do map.get(2008), which would give the head of the linked list, and then do

    while (tmp != null) {
        if (tmp.name.equals("the dark knight")) { // == would compare references, not contents
            tmp.frequency++;
            break;
        }
        tmp = tmp.next; // advance, otherwise the loop never terminates
    }
So you thought about using a hashmap of years to hashmaps of genres to hashmaps of names to frequencies. Does it make sense? Sure. Is it a good way to solve your problem? Most likely not. It is also possible, as you say, to use a single main hashmap of years to collections of genre-name-frequency tuples (or structures, in a lot of languages - like C, C++, Java, and so on). By collections, you thought of linked lists, but you could very well use vectors or something else (linked lists are very often the worst kind of data structures). But this would not be more efficient and is not necessarily a better way to solve your problem, even though it may be more readable and maintainable.
I won't be talking about performance improvements that don't impact time complexity, since you've made it clear in the comments that it's for an exam that only cares about time complexity (which is sad, but whatever). Also, I'm assuming this is the only problem you need to solve.
Let's see how to improve your ideas. It just so happens that a mix of both your solutions, along with one improvement, gives the best possible solution in terms of time complexity, that is O(k). First, note that each movie name is associated with only one frequency, and each frequency is (likely) associated with only one movie name. And since you want to retrieve movie names based on frequencies, a hashmap has no advantage over a linear collection of name-frequency pairs, such as a vector or a linked list. Then, note that each year and each genre is (independently) associated with multiple movies (each with both a name and a frequency). So a hashmap has a place here: with a given year and a given genre, a hashmap-like structure would give you the relevant movies in expected constant time.
Combine these two results and you get a hashmap with years and genres as keys and collections of name-frequency pairs as values. The one improvement that makes it possible to retrieve the k most streamed movies for given year and genre in O(k) time complexity is a sort: if your name-frequency pairs are sorted by frequency in every collection, you can simply return the first (or last, depending on the order) k names.
One detail that you may find weird is that the hashmap uses both years and genres as keys. That's an implementation detail. You can do it by using two nested hashmaps, one using years as keys and the other using genres, or you can combine years and genres and directly use these pairs as keys. It's actually straightforward to hash year-genre pairs.
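Putting the pieces together, a sketch of the combined solution might look like this. The `"year|genre"` string key is just one easy way to hash the pair, and all names are illustrative; the list for each key is kept sorted by decreasing frequency so that the top-k query is a simple O(k) prefix copy.

```java
import java.util.*;

// Sketch: one map keyed on (year, genre); values are lists of
// (name, frequency) pairs kept sorted by frequency in decreasing order.
class StreamCounter {
    static final class Movie {
        final String name;
        int frequency;
        Movie(String name) { this.name = name; }
    }

    private final Map<String, List<Movie>> byYearGenre = new HashMap<>();

    void recordStream(int year, String genre, String name) {
        List<Movie> movies = byYearGenre.computeIfAbsent(year + "|" + genre,
                                                         k -> new ArrayList<>());
        int i = indexOf(movies, name);
        if (i < 0) { movies.add(new Movie(name)); i = movies.size() - 1; }
        movies.get(i).frequency++;
        // Bubble the movie up while its frequency beats its left neighbour,
        // keeping the list sorted in decreasing order of frequency.
        while (i > 0 && movies.get(i - 1).frequency < movies.get(i).frequency) {
            Collections.swap(movies, i - 1, i);
            i--;
        }
    }

    // O(k): just copy the first k names of the already-sorted list.
    List<String> topK(int year, String genre, int k) {
        List<Movie> movies = byYearGenre.getOrDefault(year + "|" + genre,
                                                      Collections.emptyList());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, movies.size()); i++)
            result.add(movies.get(i).name);
        return result;
    }

    private static int indexOf(List<Movie> movies, String name) {
        for (int i = 0; i < movies.size(); i++)
            if (movies.get(i).name.equals(name)) return i;
        return -1;
    }
}
```

The linear `indexOf` keeps the sketch short; an auxiliary name-to-index hashmap per list would make each update expected O(1) amortized without changing the O(k) query.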
This question already has answers here:
Set vs Array , difference
(3 answers)
Closed 5 years ago.
I have used arrays in Ruby a lot, but I never got a chance to use Set. My question is: when is a Set useful, and when is it better than an array?
From the documentation, the initial definitions go as follows:
Array: An integer-indexed collection of objects.
Set: A collection of unordered values with no duplicates.
In a nutshell, you should use Set when you want to make sure that each element in the collection is unique, when you want to test whether a given element is present in the collection, and when you won't require random access to the objects.
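A quick illustration of the difference:

```ruby
require 'set'

# A Set silently drops duplicates and offers fast membership tests,
# while an Array keeps everything, in insertion order, with index access.
tags_array = ["ruby", "rails", "ruby", "set"]
tags_set   = Set.new(tags_array)

puts tags_array.length        # 4 -- duplicates kept
puts tags_set.length          # 3 -- duplicates dropped
puts tags_set.include?("set") # true -- hash-based, O(1) on average
# tags_set[0]                 # NoMethodError: a Set has no index access
```

`Array#include?` also works, but it scans the whole array, so for large collections with frequent membership tests a Set is the better fit.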
This question already has answers here:
Get stopover towns using the Google Maps API (directions)
(2 answers)
Closed 8 years ago.
I want to get a list of cities that lie between cities A and B. I am using Google Maps API. On searching, I found a similar question on SO,
Is it possible to get list of City between City A and City B in Google Maps
Here, the solution that was suggested is reverse geocoding. There is another similar question:
How to get places between give two city using Google Map API
Neither answer makes it clear how the solution can be applied to the problem stated. Can anyone explain the approach step by step, or point to an existing example on the web?
Thanks a lot.
What those solutions say is that there is no way to do it directly using Google Maps API. What you can do instead is:
Define what between means exactly (something like a line on a sphere connecting cities A and B).
Choose a bunch of points between these cities (if "between" means a line segment on a sphere, you can divide it into several small segments and pick their end points).
Use reverse geocoding for each of the points selected in step 2 to check whether it lies inside any city. If it does, add that city to the result set.
Return the result set.
Another approach (if you are interested in a fixed subset of all the cities in the world) is to pick each city from the "interesting" set, get its geolocation, and check whether it lies between A and B (again, you need to define exactly what between means).
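As a sketch of step 2, assuming "between" means the straight segment between the two coordinates, you could sample evenly spaced points with plain linear interpolation. This is a rough approximation that ignores the curvature of the Earth (fine for short distances; for long routes you would interpolate along the great circle). Each sampled point would then be passed to the reverse-geocoding service, which is not shown here; the class name is made up for illustration.

```java
// Sample n evenly spaced points strictly between A and B by linear
// interpolation of latitude and longitude. Each returned {lat, lng}
// pair would be fed to a reverse-geocoding call (not shown).
final class GeoSampler {
    static double[][] samplePoints(double latA, double lngA,
                                   double latB, double lngB, int n) {
        double[][] points = new double[n][2];
        for (int i = 0; i < n; i++) {
            double t = (i + 1) / (double) (n + 1); // skip the endpoints themselves
            points[i][0] = latA + t * (latB - latA);
            points[i][1] = lngA + t * (lngB - lngA);
        }
        return points;
    }
}
```

The number of samples controls the trade-off between missing small cities and making too many geocoding requests.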
This question already has answers here:
How to rank a million images with a crowdsourced sort
(12 answers)
Closed 8 years ago.
Imagine I have a very long list of images and I want to rank them in order of how 'good' people think they are.
I don't want to get users to assign scores to images outright (1 - 10 etc) and order by that, I'd like to try something new.
What I was thinking would be an interesting way of doing it is:
Show a user two random images, they pick the better one
Collect lots of 'comparisons'
Use all the comparisons to come up with some ordering
Turns out this is used regularly, for example (using features, not images), this appears to be the way Uservoice's Smartvote works.
My question is whether there's a good, known way to take this long list of comparisons and build a relative ranking of all the images from them, without the level of complexity found in the research papers.
I've read a bunch of lectures and research papers but I was wondering if there was any sample code out there people might recommend?
Seems like you could just get some kind of numerical ranking system and then just sort based on that. Just borrow the algorithm from a win/loss sport, or chess, and treat each image comparison as a bout.
I did some looking; here's some sample code showing what an algorithm like that looks like in Java.
And here's a library you can borrow in python
If you search ELO you'll find a version of it in just about any language. Once you get your numerical image rankings, you can sort them any way you like. There are probably other ranking algorithms you could look into for win/loss competition, that was just the first that came up when I googled chess ranking.
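For reference, the core of the Elo update is only a few lines; this is the standard chess formula with a fixed K-factor, not tied to any particular library. Each comparison is treated as a game where the picked image is the winner.

```java
// Minimal Elo update, treating each image comparison as a chess game.
// K controls how fast ratings move; 32 is a common default, and new
// images typically start at some fixed rating such as 1500.
final class Elo {
    static final double K = 32.0;

    // Returns the new ratings {winner, loser} after one duel.
    static double[] update(double winner, double loser) {
        // Probability the winner was expected to win, given current ratings.
        double expectedWin = 1.0 / (1.0 + Math.pow(10.0, (loser - winner) / 400.0));
        return new double[] {
            winner + K * (1.0 - expectedWin), // winner gains what it wasn't expected to get
            loser  - K * (1.0 - expectedWin)  // loser loses the same amount
        };
    }
}
```

Note the nice property for your use case: an upset (a low-rated image beating a high-rated one) moves both ratings much more than an expected result does.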
For every image, count the number of times it won a duel, and divide by the number of duels it took part in. This ratio is your ranking score.
Example (winner listed first in each duel):
B A, C A, A D, B C, D B
yields
B: 67%, C: 50%, D: 50%, A: 33%
Unless you perform a huge number of comparisons, there will be many ties.
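A sketch of that counting, with each duel given as a (winner, loser) pair; the class name and input convention are illustrative.

```java
import java.util.*;

// Win-ratio scoring: each duel is {winner, loser}; the score of an
// image is the number of duels it won divided by the number it played.
final class WinRatio {
    static Map<String, Double> scores(String[][] duels) {
        Map<String, Integer> wins = new HashMap<>(), games = new HashMap<>();
        for (String[] duel : duels) {
            wins.merge(duel[0], 1, Integer::sum);   // winner gets a win
            games.merge(duel[0], 1, Integer::sum);  // both get a game played
            games.merge(duel[1], 1, Integer::sum);
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : games.entrySet())
            scores.put(e.getKey(),
                       wins.getOrDefault(e.getKey(), 0) / (double) e.getValue());
        return scores;
    }
}
```

Sorting the images by this score gives the final ranking; ties can be broken by number of duels played or left as ties.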
This question already has answers here:
Algebra equation parser for java
(5 answers)
Closed 9 years ago.
My client wants to save an equation formula in a database (Oracle). In the formula they want to use abbreviations of the variable names (a field in the table containing the variables) as a descriptive way to see what the formula uses to calculate the result, but they also want to be able to calculate the result of the formula once all the variables have values.
This means that if they change the formula later, the result has to reflect those changes. They have short and long formulas, e.g.:
C=(A+B)/100
D=(E+F)/100
G=(3*C)+(4*D)/7
Do you know any reference to something similar to this?
I'm using jsp and Oracle as stated before.
You are on your own here: Oracle will not help you much with parsing equations. For simple cases, you can iterate over the variables and their values using the SQL REPLACE function and see if that is good enough for you.
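If you need to evaluate the stored formula on the Java side, one option is a small recursive-descent evaluator like the sketch below. It supports only `+ - * /`, parentheses, numbers, and variable names; all names here are illustrative. In practice, an expression library such as exp4j may be a better fit, since it handles functions and error reporting for you.

```java
import java.util.*;

// A tiny recursive-descent evaluator for formulas like "(A+B)/100",
// with variable values supplied in a map. Sketch only: no error handling.
final class FormulaEvaluator {
    private final String src;
    private final Map<String, Double> vars;
    private int pos = 0;

    FormulaEvaluator(String src, Map<String, Double> vars) {
        this.src = src.replaceAll("\\s+", ""); // formulas are short; strip spaces up front
        this.vars = vars;
    }

    double evaluate() { return expression(); }

    // expression := term (('+' | '-') term)*
    private double expression() {
        double value = term();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            value = (op == '+') ? value + term() : value - term();
        }
        return value;
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double value = factor();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            value = (op == '*') ? value * factor() : value / factor();
        }
        return value;
    }

    // factor := number | variable | '(' expression ')'
    private double factor() {
        if (src.charAt(pos) == '(') {
            pos++;                      // consume '('
            double value = expression();
            pos++;                      // consume ')'
            return value;
        }
        int start = pos;
        while (pos < src.length()
               && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '.'))
            pos++;
        String token = src.substring(start, pos);
        return vars.containsKey(token) ? vars.get(token) : Double.parseDouble(token);
    }
}
```

With this shape, the formula text stays in Oracle as plain data, and changing the stored formula automatically changes the computed result, which is exactly what your client asked for.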