Data structure for dealing with millions of records [closed] - data-structures

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Which data structure is appropriate for operating on millions of records that I later need to iterate over?

While a simple linked list might be sufficient for your needs, if you also need to maintain records in sorted order and to efficiently access records or begin iteration at an arbitrary point, I would recommend looking into a B-tree.
If you want to persist the data to disk, you should use a key-value store. These often use B-trees (or LSM trees) "under the hood", and many also provide ACID guarantees. Examples include LMDB, BerkeleyDB, and LevelDB.
In short, use a database.
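As a minimal sketch of the "use a database" advice, the snippet below uses Python's built-in sqlite3 module, which stores tables and indexes as B-trees. The table name `records`, its columns, and the helper `iterate_from` are hypothetical names chosen for illustration; the point is only that a keyed table gives sorted iteration starting at an arbitrary key without scanning from the beginning.

```python
import sqlite3

# In-memory database for the sketch; pass a file path instead to persist to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: the integer primary key is the key of the table's B-tree.
conn.execute("CREATE TABLE records (key INTEGER PRIMARY KEY, value TEXT)")

# Bulk-insert inside a single transaction; rows are generated lazily.
with conn:
    conn.executemany(
        "INSERT INTO records (key, value) VALUES (?, ?)",
        ((k, f"record-{k}") for k in range(100_000)),
    )


def iterate_from(conn, start_key, limit):
    """Yield (key, value) pairs in key order, starting at start_key."""
    cursor = conn.execute(
        "SELECT key, value FROM records WHERE key >= ? ORDER BY key LIMIT ?",
        (start_key, limit),
    )
    yield from cursor


# Begin iteration in the middle of the data set.
first_three = list(iterate_from(conn, 50_000, 3))
print(first_three)
```

The same pattern applies to the key-value stores named above, which expose cursor or iterator APIs for seeking to a key and walking forward in order.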


Is it necessary to memorize the codes of data structures? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
Is it necessary to memorize the code of data structures like linked lists, dynamic arrays, circular linked lists, queues, stacks, graphs, etc., or is basic knowledge of the code enough? What kind of questions can be asked in a job interview regarding data structures?
I don't know what your (future) employer may ask, but generally, I'd say no. You have to know how they work and what they're used for, especially which data structure serves which purpose, with its advantages and disadvantages. If you know that, you'll be able to write the code for such a structure without having memorized it, because you know how it works.

Which way is more efficient to learn data structures? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
My programming knowledge is up to OOP since that was the last thing we covered in the university. However, I am taking 2 courses this summer and I am constantly under pressure, but I am planning to learn data structures along the way too, to be prepared for it next semester.
I had two plans to learn it but I am not sure which one will be more efficient:
- The first one is to skim through and learn about all the types of data structures and how they are implemented.
- The second one is, instead of just reading about a data structure, to try to implement it myself. However, the drawback is that it's slow and time-consuming, so I might not be able to learn all of the data structures in time.
1. Practice using the data structures in your code.
2. Code those data structures from scratch.
3. Repeat steps 1 and 2.
There is really no shortcut for that.

What's the best way to load huge volume tables using Informatica? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Currently, in our project, we are using Informatica for data loading.
We have a requirement to load 100 tables (the number will increase in the future), each with 100 million records, and we need to perform delta operations on them. What is the most efficient way to do this?
If it's possible, try truncate and load. This way after each run you will have a full, fresh dump.
If you can't truncate the targets and need the delta, get some timestamp or counter that lets you read only the modified rows, that is, the new and updated ones. An "updated date" column is one example. This limits the amount of data being read. It will not let you handle deletes, though. So...
Create a separate flow for finding deleted rows that reads only the IDs rather than the full rows. This still needs to check all rows, but it is limited to just one column, so it should be quite efficient. Use it to delete rows in the target, or just to mark them as deleted.
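The two flows can be sketched outside Informatica in plain Python. This is only an illustration of the logic, not Informatica configuration: the field names `id` and `updated_at`, the `is_deleted` flag, and both helper functions are assumptions made for the example, and the target is modeled as a dictionary keyed by ID.

```python
from datetime import datetime


def load_delta(source_rows, target, last_run):
    """Upsert rows modified after last_run into target (a dict keyed by id).

    In a real load the timestamp filter belongs in the source query,
    so that unmodified rows are never read at all.
    """
    changed = 0
    for row in source_rows:
        if row["updated_at"] > last_run:
            target[row["id"]] = dict(row)
            changed += 1
    return changed


def find_deleted_ids(source_ids, target):
    """Return ids present in the target but missing from the source.

    Only the id column of the source is needed for this pass.
    """
    return set(target) - set(source_ids)


# Hypothetical sample data.
last_run = datetime(2024, 1, 1)
source = [
    {"id": 1, "name": "a", "updated_at": datetime(2023, 12, 1)},   # unchanged
    {"id": 2, "name": "b2", "updated_at": datetime(2024, 1, 5)},   # updated
    {"id": 4, "name": "d", "updated_at": datetime(2024, 1, 6)},    # new
]
target = {
    1: {"id": 1, "name": "a", "updated_at": datetime(2023, 12, 1)},
    2: {"id": 2, "name": "b", "updated_at": datetime(2023, 12, 2)},
    3: {"id": 3, "name": "c", "updated_at": datetime(2023, 12, 3)},  # gone from source
}

changed = load_delta(source, target, last_run)
deleted = find_deleted_ids((row["id"] for row in source), target)

# Soft delete: mark the rows instead of removing them.
for row_id in deleted:
    target[row_id]["is_deleted"] = True
```

Here the delta pass touches two rows and the ID-only pass flags row 3 as deleted, while the unchanged row 1 is never rewritten.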

Searching through a list [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I'm reading about AI and in the notes it is mentioned
A lookup table in chess would have roughly 35^100 entries.
But what does this mean? Is there any way we could find out how long it would take the computer to search through the table and find its entry? Would we assume there is some order or that there is no order?
The number of atoms in the known universe is estimated to be around 10^80, which is much less than 35^100 (roughly 10^154). With current technology, at least a few thousand atoms are required to store a single bit. I assume that each entry of your table would have multiple bits. You would need some really advanced technology to implement the memory of your computer.
So the answer is: With current technology it is not a matter of time, it is simply impossible.
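The size comparison can be checked directly with Python's arbitrary-precision integers. This is just the arithmetic behind the argument above; the variable names are made up for the example, and 10^80 is the rough estimate quoted in the answer.

```python
import math

entries = 35 ** 100   # table size quoted in the notes
atoms = 10 ** 80      # rough estimate of atoms in the known universe

digits = len(str(entries))           # number of decimal digits in 35^100
exponent = 100 * math.log10(35)      # 35^100 == 10^exponent
per_atom = entries // atoms          # entries that would have to fit in each atom

print(f"35^100 has {digits} digits, i.e. about 10^{exponent:.1f}")
print(f"entries per atom: about 10^{len(str(per_atom)) - 1}")
```

Even if every atom could hold one entry, the table would still be too large by a factor of about 10^74.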

Find plagiarism in bulk articles [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have a collection of 20,000 master articles, and I will receive about 400,000 articles of one or two pages every day. I am trying to determine whether each of these 400k articles is a copy or a modified version of one of my master articles (a threshold of above 60% plagiarism is fine with me).
What algorithms and technologies should I use to tackle the problem efficiently and in a timely manner?
Fingerprint the articles (i.e., intelligently hash them based on word frequency) and then look for statistical connections between the fingerprints. Then, if some of the data set looks suspicious, do a brute-force search for matching strings on those articles.
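One possible reading of "fingerprint based on word frequency" is a bag-of-words vector compared with cosine similarity, sketched below. The choice of cosine similarity, the function names, and the use of 0.6 as the cutoff are assumptions for illustration. A similarity score of 0.6 is not the same thing as "60% plagiarism", so the threshold would need tuning on real data.

```python
import math
import re
from collections import Counter


def fingerprint(text):
    """Word-frequency fingerprint: maps each lowercase word to its count."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_candidates(article, masters, threshold=0.6):
    """Return ids of master fingerprints similar enough to warrant a closer look."""
    fp = fingerprint(article)
    return [
        master_id
        for master_id, master_fp in masters.items()
        if cosine_similarity(fp, master_fp) >= threshold
    ]


# Hypothetical master collection, fingerprinted once up front.
masters = {
    "m1": fingerprint("The quick brown fox jumps over the lazy dog"),
    "m2": fingerprint("Stock markets fell sharply after the announcement"),
}

incoming = "The quick brown fox leaps over the lazy dog"
candidates = find_candidates(incoming, masters)
print(candidates)
```

Only the candidates returned here would go on to the more expensive string-matching step. The linear scan over all masters is shown for clarity; at 400,000 articles a day against 20,000 masters, some form of indexing over the fingerprints would likely be needed.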
