Which data structure should be used for storing a large amount of data without an RDBMS? [closed] - algorithm

Closed 11 years ago.
This question was asked in an interview. First, I came up with a B-tree. The interviewer asked me to be more specific and to describe how I would store the data so that it would be easy to retrieve.
Can you please shed some light on this? Thanks in advance.

Your question isn't really clear.
"Good" ways to store the data depend on what you want to do with it.
If you want to access parts of your data, a list of offsets suffices. If you want to search in text, an additional inverted index in combination with docIds -> offsets works well. If you have frequent updates to your data and reading is rare, none of those make sense. So it really depends.
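To make the inverted-index idea concrete, here is a minimal sketch in Python; the sample documents and IDs are made up for illustration:

from collections import defaultdict

# Records live in one big data file; we keep docId -> byte offset for direct
# access, plus an inverted index term -> docIds for text search.
documents = {1: "the quick brown fox", 2: "lazy dogs sleep", 3: "quick dogs"}

offsets = {}                  # docId -> offset of the record in the data file
inverted = defaultdict(set)   # term -> set of docIds containing that term

offset = 0
for doc_id, text in documents.items():
    offsets[doc_id] = offset
    offset += len(text.encode("utf-8")) + 1   # +1 for a newline separator
    for term in text.split():
        inverted[term].add(doc_id)

# Search: look up the term, then follow docId -> offset to fetch each record.
for doc_id in sorted(inverted["quick"]):
    print(doc_id, offsets[doc_id], documents[doc_id])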

Sounds like an open question, so you can demonstrate your vast experience of ... well, http://en.wikipedia.org/wiki/NoSQL would be my guess, but you could argue that http://en.wikipedia.org/wiki/Dbm answers the question.
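If you go the dbm route, the underlying idea is just an on-disk key-value store. A minimal sketch using Python's standard dbm module (the file name and keys are arbitrary):

import dbm

# Create or open an on-disk key-value store; keys and values are bytes.
with dbm.open("records.db", "c") as db:
    db[b"user:42"] = b"Alice,alice@example.com"
    db[b"user:43"] = b"Bob,bob@example.com"

# Reopen later and fetch a record by key without loading everything into memory.
with dbm.open("records.db", "r") as db:
    print(db[b"user:42"].decode())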

Related

What are the best sites to learn about formal languages, automata, algorithms and data structures? [closed]

Closed 9 years ago.
I'd like to know the best sites to learn about formal languages, automata, algorithms, and data structures. Preferably with many solved questions...
Thanks in advance
What I recommend is an excellent book on automata theory, Introduction to Automata Theory, Languages, and Computation: http://www.amazon.com/Introduction-Automata-Languages-Computation-Edition/dp/0321455363
I have read this book; it is superb.
Visit http://rosettacode.org/wiki/Rosetta_Code
You can also compare the structure of programs across example tasks there.
You didn't mention what kind of algorithms you want to learn. Anyway, for basic algorithms and data structures, TopCoder's algorithm tutorials page is a good place to start. Visit http://www.topcoder.com/tc?d1=tutorials&d2=alg_index&module=Static

Best way to prepare for Design and Architecture questions related to big data [closed]

Closed 10 years ago.
Recently, I attended an onsite interview at a company and was asked design questions related to big data, e.g.: get the list of users who accessed a website (say Google) between time t1 and t2. What data structures to use, how to handle concurrency and stale data, how many servers are needed to store the data, the requirements (software, hardware) of each server, etc.
Please point me to some books/web references to increase my knowledge in this area. Also, please give me some insight into how to answer this type of design question.
This book (free download; on Amazon: Mining of Massive Datasets) was just posted to HN (that thread also has some useful comments). From a first skim it looks really good; you could read that.
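For the concrete example in the question (users who accessed a site between t1 and t2), one common building block is to keep access events sorted by timestamp so both boundaries can be found by binary search; real systems shard such a log across servers by time or by user. A minimal single-machine sketch in Python, with made-up data:

import bisect

# Access log kept sorted by timestamp: (timestamp, user_id).
log = [(1000, "u1"), (1005, "u2"), (1010, "u1"), (1020, "u3"), (1050, "u2")]
timestamps = [t for t, _ in log]

def users_between(t1, t2):
    """Distinct users with at least one access in [t1, t2]."""
    lo = bisect.bisect_left(timestamps, t1)
    hi = bisect.bisect_right(timestamps, t2)
    return {user for _, user in log[lo:hi]}

print(users_between(1005, 1020))   # {'u1', 'u2', 'u3'}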

What's the best way to determine availability or uptime of my systems [closed]

Closed 11 years ago.
By this I mean, what's the best way to show the uptime of systems? Ideally I'd like to show some sort of percentage figure, like web hosts do, i.e. 99.5% uptime.
Is there a standard way to determine this?
We use Pingdom to monitor our servers, and they generate the sort of numbers you're looking for (we just use the free account). They also seem to have an API that will let you get your info programmatically - no guarantees that'll work with a free account, though.
Hope this helps!
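Whatever tool you use, the percentage itself is just the fraction of checks (or of elapsed time) during which the system was reachable. A tiny Python sketch, assuming you already collect periodic health-check results:

# Each entry is one periodic health check (True = the system responded).
checks = [True, True, True, False, True, True, True, True, True, True]

uptime_pct = 100.0 * sum(checks) / len(checks)
print(f"Uptime: {uptime_pct:.1f}%")   # Uptime: 90.0%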

Hash stable to small changes in text [closed]

Closed 10 years ago.
Is there a hash function that is stable to small changes in text? I'm looking for the opposite of a cryptographic hash, where small changes in the source lead to huge changes in the result.
Something like a perceptual hash for text. Is there such a thing?
Edited: by "small changes in text" I mean changes in punctuation, correction of orthographic/grammatical mistakes, etc. The text itself is an article, like a Wikipedia entry (but it can be much smaller, like 2 or 3 paragraphs).
Bonus points if somebody can point to a Python implementation.
You're looking for locality-sensitive hashing.
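One concrete locality-sensitive scheme for text is SimHash: hash every token, sum the bit positions with +1/-1 weights, and keep the sign of each position, so near-duplicate texts end up with fingerprints that differ in only a few bits. A minimal Python sketch (not a tuned implementation):

import hashlib

def simhash(text, bits=64):
    """Fingerprint where similar texts get similar (low Hamming distance) values."""
    weights = [0] * bits
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)

def hamming(a, b):
    return bin(a ^ b).count("1")

a = simhash("The quick brown fox jumps over the lazy dog.")
b = simhash("The quick brown fox jumps over the lazy dog!")
print(hamming(a, b))   # small distance => near-duplicates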

Best Fuzzy Matching Algorithm? [closed]

Closed 11 years ago.
What is the best fuzzy matching algorithm (fuzzy logic, n-gram, Levenshtein, Soundex, ...) for processing more than 100,000 records in a reasonable time?
I suggest you read the articles by Navarro mentioned in the References section of the Wikipedia article titled Approximate string matching. Making your decision based on actual research is always better than relying on suggestions from random strangers, especially if performance on a known set of records is important to you.
It massively depends on your data. Certain records can be matched better than others. For example, a postcode has a defined format, so it can be compared differently from normal strings. People can be matched on initials and date of birth, or other combinations, etc.
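As an illustration of combining a cheap blocking key with a more expensive similarity measure, here is a minimal sketch using Python's standard difflib; the sample records, the blocking key (first letter of the surname), and the 0.8 threshold are all arbitrary choices for illustration:

from collections import defaultdict
from difflib import SequenceMatcher

records = ["Jon Smith", "John Smith", "Johnny Smyth", "Jane Doe", "J. Doe"]

# Blocking: only compare records that share a crude key, so we avoid the
# full all-pairs comparison over 100,000+ records.
blocks = defaultdict(list)
for r in records:
    blocks[r.split()[-1][0].lower()].append(r)

def similar(a, b, threshold=0.8):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

for block in blocks.values():
    for i in range(len(block)):
        for j in range(i + 1, len(block)):
            if similar(block[i], block[j]):
                print("possible match:", block[i], "~", block[j])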
