As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
There are rumors that current Node.js (or, more precisely, the V8 GC) performs badly when there are lots of JS objects and a lot of memory in use.
Can you please explain what exactly the problem is - lots of objects, or lots of properties on one object (or array)?
Maybe there are some benchmarks; it would be interesting to see actual code and numbers.
As far as I know, the main problem is lots of properties on one object, not lots of objects as such (although I'm not sure).
If so, would an in-memory graph database (a couple of hundred properties on each node at most) be a good use case?
I have also heard that the latest versions of V8 have an improved GC that solves some of these problems - is this true, and when will it be available in Node.js?
Closed 10 years ago.
Recently, I attended an onsite interview for a company and was asked design questions related to big data, e.g. get the list of users who accessed a website (say Google) between time t1 and t2: what data structures to use, how to handle concurrency and stale data, how many servers are needed to store the data, the requirements (software, hardware) of each server, etc.
Please point me to some books or web references to increase my knowledge in this new area. Any insights on how to answer this type of design question would also be welcome.
This book (free download) (Amazon: Mining of Massive Datasets) was just posted to HN (that thread also has some useful comments). From a first skim it looks really good, so you could read that.
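For the example query from the question (users who accessed a site between t1 and t2), a common starting point in such interviews is to bucket events by time so that a range query only touches the relevant buckets. The following is only a single-machine sketch under stated assumptions: timestamps are integer seconds, and `AccessIndex`, `BUCKET_SECONDS`, `record` and `users_between` are hypothetical names, not taken from the book or the thread.

```python
from collections import defaultdict

BUCKET_SECONDS = 3600  # hypothetical bucket width: one hour


class AccessIndex:
    """Maps each time bucket to the (timestamp, user_id) events that fall in it."""

    def __init__(self, bucket_seconds=BUCKET_SECONDS):
        self.bucket_seconds = bucket_seconds
        self.buckets = defaultdict(list)

    def record(self, user_id, timestamp):
        # Integer division assigns the event to its bucket.
        self.buckets[timestamp // self.bucket_seconds].append((timestamp, user_id))

    def users_between(self, t1, t2):
        """Return the set of users with at least one access in [t1, t2]."""
        first = t1 // self.bucket_seconds
        last = t2 // self.bucket_seconds
        users = set()
        for bucket in range(first, last + 1):
            # .get avoids creating empty buckets while reading.
            for timestamp, user_id in self.buckets.get(bucket, ()):
                if t1 <= timestamp <= t2:
                    users.add(user_id)
        return users
```

In a distributed setting the bucket number is a natural candidate for a shard key, which is one way to lead into the "how many servers" part of the question.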
Closed 11 years ago.
We are about to move a project on Apache Cassandra from test to pilot and, as an RDBMS team, we are probably missing something.
Basic rules (or lessons learned):
be sure you have big data or almost no data (nothing in between)
do not believe in extremely cheap storage (cheap or not expensive might be better)
think of your primary key as if it were a reverse index
think of time (or another data creation order) as if it were a row/clustering key
forget about 100% foreign keys whenever you can
sample if you can
do not care about dups
JSON and asynchronous time aggregation on the client can take load off the CPUs
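The two key-related rules above (primary key as a reverse index, time as the clustering key) can be pictured with a toy in-memory model. This is only an illustrative sketch, not Cassandra code: `PartitionedTable`, `insert` and `time_slice` are hypothetical names, and values are assumed to be mutually comparable (plain strings here) because rows are stored as sorted tuples.

```python
import bisect
from collections import defaultdict


class PartitionedTable:
    """Toy model: partition key -> rows kept sorted by a time clustering key."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, partition_key, event_time, value):
        # insort keeps each partition ordered by (event_time, value).
        bisect.insort(self.partitions[partition_key], (event_time, value))

    def time_slice(self, partition_key, start, end):
        """Return rows of one partition with start <= event_time < end."""
        rows = self.partitions.get(partition_key, [])
        # A 1-tuple (t,) sorts before every (t, value), so these are lower bounds.
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]
```

The point of the model is that a query names exactly one partition and then reads a contiguous time range, which is the access pattern those two rules are steering towards.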
ETL:
sample history if you can (or sample it just for reporting usage on separate reporting cluster)
single-threaded data streams spread over a couple of servers will come in handy
if you can afford asynchronous processing you can profit from knowledge of data patterns
throw scrap data away (horizontally and vertically) - or it will mislead BI people or, in the worst case, even board members
do not care about dups
The question is: am I still missing something?
Are there other ways to achieve even better performance?
Closed 11 years ago.
This question was asked in an interview. First, I came up with a B-tree. The interviewer asked me to be more specific and to describe how I would store the data so that it would be easier to retrieve.
Can you please shed some light on this? Thanks in advance.
Your question isn't really clear.
"Good" ways to store the data depend on what you want to do with it.
If you want to access parts of your data, a list of offsets suffices. If you want to search in text, an additional inverted index in combination with docIds->offsets is great. If you have frequent updates to your data and reading is rare, none of those make sense. So it really depends.
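To make the inverted index plus docIds->offsets idea concrete, here is a minimal sketch. It assumes all documents are concatenated into one in-memory string and that terms are simply lower-cased runs of word characters; `build_indexes` and `search` are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict


def build_indexes(documents):
    """Concatenate documents into one blob; return (blob, offsets, inverted)."""
    parts = []
    offsets = {}                 # doc_id -> (start, length) within the blob
    inverted = defaultdict(set)  # term -> set of doc_ids containing it
    position = 0
    for doc_id, text in documents.items():
        offsets[doc_id] = (position, len(text))
        parts.append(text)
        position += len(text)
        for term in re.findall(r"\w+", text.lower()):
            inverted[term].add(doc_id)
    return "".join(parts), offsets, inverted


def search(term, blob, offsets, inverted):
    """Return the text of every document containing the term, in doc_id order."""
    results = []
    for doc_id in sorted(inverted.get(term.lower(), ())):
        start, length = offsets[doc_id]
        results.append(blob[start:start + length])
    return results
```

A lookup goes term -> docIds -> offsets -> text, so only the matching documents are read. As the answer says, this layout suits read-heavy workloads; frequent updates would invalidate the offsets.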
Sounds like an open question, so you can demonstrate your vast experience of ... well, http://en.wikipedia.org/wiki/NoSQL would be my guess, but you could argue that http://en.wikipedia.org/wiki/Dbm answers the question.
Closed 11 years ago.
Please help me find a working application that uses OpenMPI. I need the name of any application that is widely used worldwide and is based on OpenMPI. Even just the name of such an application will be enough.
Thanks
OpenMPI is an implementation of MPI. Applications are written using MPI (i.e. the code calls MPI routines), and they can be compiled/run using any MPI implementation (e.g. MPICH2, OpenMPI, LAM-MPI, etc).
So, to answer your question, strictly speaking there is no such thing as an "OpenMPI application".
As for what applications use MPI, there are many. Here are a few:
AMBER (Molecular Dynamics)
Gromacs (Molecular Dynamics)
DL-POLY (Molecular Dynamics)
FFTW (for parallel Fourier transform)
MATLAB Parallel Computing Toolbox
FLAME (Agent-based modelling)
CASTEP (Materials Science)
POLCOMS (Marine Ecosystem)
WRF (Weather Forecast)
NWCHEM (Computational Chemistry)
... and the list goes on and on.
Well, you could search for MPI benchmarks. There are several popular ones such as NAS, PALLAS, SPEC, etc.
Closed 11 years ago.
By this I mean, what's the best way to show the uptime of systems? Ideally I'd like to show some sort of percentage figure, like the web hosts do, i.e. 99.5% uptime.
Is there a standard way to determine this?
We use Pingdom to monitor our servers, and they generate the sort of numbers you're looking for (we just use the free account). They also seem to have an API which will let you get your info programmatically - no guarantees that'll work with a free account, though.
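If you would rather compute the figure yourself from recorded outages, one common definition is the share of the reporting period during which the system was not down. The sketch below assumes that definition; `uptime_percent` is a hypothetical helper, not part of Pingdom's API, and it treats outages as (down_at, up_at) pairs in any consistent time unit.

```python
def uptime_percent(outages, period_start, period_end):
    """Percentage of [period_start, period_end) not covered by any outage.

    outages: iterable of (down_at, up_at) pairs; overlapping pairs are allowed.
    """
    total = period_end - period_start
    if total <= 0:
        raise ValueError("period must have positive length")
    downtime = 0
    covered_until = period_start
    for down_at, up_at in sorted(outages):
        # Clip each outage to the period and to what was already counted.
        start = max(down_at, covered_until)
        end = min(up_at, period_end)
        if end > start:
            downtime += end - start
            covered_until = end
    return 100.0 * (total - downtime) / total
```

For example, with times in minutes, a 30-day month has 43200 minutes, so a single 216-minute outage corresponds to the 99.5% figure mentioned in the question.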
Hope this helps!