I want to make an English word-meaning dictionary as a file/folder; what is the most efficient way to access it?

So I want to make a dictionary, and there are two ways I can think of accomplishing this:
1. a dictionary directory, where each word is stored as a filename with its meaning as the file content
2. a single file containing all the words
First question: can anyone suggest a better, more efficient way to store a dictionary? (I have to store this as a file.)
Second question: which of the two methods I suggested is faster to access?
Please note I am not going to search through this directory, since it will be huge; rather, I will only check whether a word exists (by checking whether the file exists) and then print its content if it does.
Edit: added how I am going to use the dictionary, as suggested by @Vlad Feinstein.
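For illustration, here is a minimal sketch of the lookup I have in mind for option 1, one file per word (the dict/ directory name is just a placeholder):

    package main

    import (
    	"fmt"
    	"os"
    	"path/filepath"
    )

    // lookup prints the meaning of word if a file named after it exists.
    func lookup(dictDir, word string) {
    	path := filepath.Join(dictDir, word)
    	data, err := os.ReadFile(path) // fails fast if the file does not exist
    	if err != nil {
    		fmt.Println("word not found:", word)
    		return
    	}
    	fmt.Printf("%s: %s\n", word, data)
    }

    func main() {
    	lookup("dict", "example")
    }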

The most important part is missing from your question: how are you going to use that dictionary?
However, in order to access words stored in individual files you would have to use an OS-provided API, which is most likely less efficient than going through the data in a single file.
Please note that many words have multiple meanings. Have you considered a popular data format, like JSON, to store all of that?
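For example, a single JSON file could hold every word with a list of meanings (this layout is only one possibility):

    {
      "bank": [
        "the land alongside a river or lake",
        "a financial institution that accepts deposits"
      ],
      "run": [
        "to move at a speed faster than a walk",
        "to operate or manage something"
      ]
    }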

Related

Parsing STDF Files to Compare results

I am new to this site and I would like to get some input regarding parsing STDF files. Generally speaking, I am trying to parse an STDF file to gather only the results (numbers) and not the rest of each line. If I am able to achieve this, I would then like to compare all the numbers together, through a bubble sort or insertion sort, and see if any numbers are equal to each other. I am capable of doing this in C/C++ and Java, but I have no experience parsing documents using scripts.
Could anyone push me in the right direction? What should I be reading to learn my way around this?
Are you already using an STDF library?
You did not mention one, so I assume not.
You should find a library you are comfortable with (the list changes over time, but you can find some by Googling or looking at the STDF page on Wikipedia) rather than attempting to parse STDF yourself, unless you have a good reason to recreate the STDF parser wheel.
An STDF file contains many tests. It generally does not make sense to compare the results for different tests, so I assume you are looking for matching values within the set of results for each test.
I would use your chosen STDF parser to read the value of each test for each part. Keep a set of the results for each test. As you read each new result, check the set to see whether the value already exists. If it does, you have found the case you were looking for; otherwise, add the result to the set.
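A minimal sketch of that bookkeeping, with the actual STDF parsing abstracted behind a hypothetical list of (test, value) results:

    package main

    import "fmt"

    // result is a stand-in for what an STDF parser would hand you:
    // a test name and the measured value for one part.
    type result struct {
    	test  string
    	value float64
    }

    func main() {
    	// Hypothetical parsed results; a real program would get these
    	// from its STDF library of choice.
    	results := []result{
    		{"vdd_leakage", 1.2},
    		{"vdd_leakage", 3.4},
    		{"vdd_leakage", 1.2}, // duplicate within the same test
    	}

    	seen := make(map[string]map[float64]bool) // per-test set of values
    	for _, r := range results {
    		if seen[r.test] == nil {
    			seen[r.test] = make(map[float64]bool)
    		}
    		if seen[r.test][r.value] {
    			fmt.Printf("duplicate in %s: %v\n", r.test, r.value)
    			continue
    		}
    		seen[r.test][r.value] = true
    	}
    }

Using a set this way also makes the bubble/insertion sort pass unnecessary; the membership check does the comparison for you.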

Is it possible to get items from DynamoDB where the primary key ends with a given string?

Is it possible, using the AWS Ruby SDK (or just DynamoDB in general), to get an item or items from a table that uses a primary key only, and where that primary key ends with a certain string?
I haven't come across anything in the docs that explicitly answers this question, either in the ruby ddb docs or the general docs for ddb. I'm not saying the question is not answered, but if it is, I can't find it.
If it is possible, could someone provide an example for ruby or link to the docs where an example exists?
Although @Ryan is correct and this can be done with query, just bear in mind that you're doing a full-table scan here. That might be OK for a one-time job, but it's probably not the best practice for a routine task (and certainly not as part of your API calls).
If your use-case involves quickly finding objects based on their suffix in a specific field, consider extracting this suffix (assuming it's a fixed-size suffix) as another field, as sketched below, and adding a secondary index on it. If you want to query arbitrary-length suffixes, I would create a lookup table and update it with the possible suffixes (or some of them, to save some calls, and then filter when querying).
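For example, the fixed-size suffix could be computed at write time and stored as its own attribute (the helper name and sizes here are made up for illustration):

    package main

    import "fmt"

    // suffixKey returns the last n characters of key (byte-based, which is
    // fine for ASCII keys), suitable for storing as a separate indexed attribute.
    func suffixKey(key string, n int) string {
    	if len(key) <= n {
    		return key
    	}
    	return key[len(key)-n:]
    }

    func main() {
    	fmt.Println(suffixKey("user-3141592", 4)) // "1592"
    }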
It looks like you would want to use the Query method in the SDK to find the items you're looking for. It seems that "EndsWith" is not available as a comparison operator in the SDK, though, so you would need to use CONTAINS and then check your results locally.
This should lead to the best performance, letting DynamoDB do the initial heavy lifting and then further pruning the results once you receive them.
http://docs.aws.amazon.com/sdkforruby/api/Aws/DynamoDB/Client.html#query-instance_method
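The local pruning step then amounts to a suffix check over the returned keys; a sketch, with a plain slice standing in for the SDK's results:

    package main

    import (
    	"fmt"
    	"strings"
    )

    func main() {
    	// Stand-ins for primary-key values returned by the query/scan call.
    	candidates := []string{"invoice-001-final", "invoice-002-draft", "invoice-003-final"}

    	for _, key := range candidates {
    		if strings.HasSuffix(key, "-final") { // the "EndsWith" the SDK lacks
    			fmt.Println("match:", key)
    		}
    	}
    }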

How to split a large CSV file into multiple files in Go?

I am a novice Go programmer, trying to learn the language's features. I want to split a large CSV file into multiple files in Go, each file containing the header. How do I do this? I have searched everywhere but couldn't find the right solution. Any help in this regard will be greatly appreciated.
Also, please suggest a good reference book.
Thank you.
Depending on your shell-fu, this problem might be better suited to common shell utilities, but you specifically mentioned Go.
Let's think through the problem.
How big is this CSV file? Are we talking 100 lines, or is it 5 GB?
If it's smallish I typically use this:
http://golang.org/pkg/io/ioutil/#ReadFile
However, this package also exists:
http://golang.org/pkg/encoding/csv/
Regardless - let's return to the abstraction of the problem. You have a header (which is the first line) and then the rest of the document.
So what we probably want to do (ignoring CSV for the moment) is read in our file.
Then we want to split the file body by all the newlines in it.
You can use this to do so:
http://golang.org/pkg/strings/#Split
You didn't mention it, but do you know how many files you want to split into, or would you rather split by line count or byte count? What's the actual limitation here?
Generally it's not going to be file count, but if we pretend it is, we simply want to divide our line count by our expected file count to get lines per file.
Now we can take slices of the appropriate size and write each one back out as a file via:
http://golang.org/pkg/io/ioutil/#WriteFile
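Putting those pieces together, here is a minimal sketch. The file names and chunk count are assumptions, it uses the modern os equivalents of the ioutil helpers linked above, and the naive newline split means a CSV with quoted embedded newlines would need the encoding/csv package instead:

    package main

    import (
    	"fmt"
    	"os"
    	"strings"
    )

    func main() {
    	data, err := os.ReadFile("big.csv")
    	if err != nil {
    		panic(err)
    	}

    	// Split into header (first line) and body (everything else).
    	lines := strings.Split(strings.TrimRight(string(data), "\n"), "\n")
    	header, body := lines[0], lines[1:]

    	const fileCount = 4
    	perFile := (len(body) + fileCount - 1) / fileCount // ceiling division

    	for i := 0; i < fileCount; i++ {
    		start := i * perFile
    		if start >= len(body) {
    			break // fewer body lines than files
    		}
    		end := start + perFile
    		if end > len(body) {
    			end = len(body)
    		}
    		// Every output file gets the header prepended.
    		chunk := header + "\n" + strings.Join(body[start:end], "\n") + "\n"
    		name := fmt.Sprintf("part_%d.csv", i+1)
    		if err := os.WriteFile(name, []byte(chunk), 0644); err != nil {
    			panic(err)
    		}
    	}
    }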
A trick I sometimes use to help think through these things is to write down the mission statement.
"I want to split a large csv file into multiple files in go"
Then I start breaking that up into pieces, taking the divide-and-conquer approach - don't try to solve the entire problem in one go, just break it up to where you can think about it.
Also - make gratuitous use of pseudo-code until you can comfortably write the real code itself. Sometimes it helps to just write a short comment inline describing how you think the code should flow, then reduce it to the smallest portion you can code and work from there.
By the way - many of the golang.org packages have example links where you can literally run the example code in your browser and cut/paste it into your own local environment.
Also, I know I'll catch some haters with this - but as for books - imo - you are going to learn a lot faster just by trying to get things working rather than reading. Action trumps passivity always. Don't be afraid to fail.
Here is a package that might help. You can set the required chunk size in bytes, and the file will be split into the appropriate number of chunks.

How to uniquely identify two objects in the same page having the same URL

I have two objects in the same page but in different locations (tabs). I want to verify each of those objects separately ...
I can't uniquely identify either object because they have the same properties.
These objects clearly are unique to a point, because they have completely different text; this means that you will be able to create an object that matches only one of them. My suggestion would be to look for each object by its text property: one of them will always have "Top Ranking", while for the other you will need to turn the text into a regular expression, something like "Participants (\d+)".
I am assuming this next suggestion is unlikely to be possible, so I saved it for after the answer you are likely to use: the best solution would of course be to get someone with access to give these elements IDs for you to search for. This will be much easier for you to maintain in the long term, and not relying on text will allow this test to run in any language.
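As a quick illustration of that regular expression (note the literal parentheses need escaping; the exact property syntax depends on your tool):

    package main

    import (
    	"fmt"
    	"regexp"
    )

    func main() {
    	// Matches "Participants (<digits>)" with literal parentheses.
    	re := regexp.MustCompile(`Participants \((\d+)\)`)
    	fmt.Println(re.MatchString("Participants (42)")) // true
    	fmt.Println(re.MatchString("Top Ranking"))       // false
    }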
Manaysah, do these objects have different indexes? Use the object spy to determine which index they have; the ordinal identifier index may be a solution to your problem. You could also try adding an innertext object property if possible, using a wildcard for the number inside the parentheses, as it appears to be dynamic.
Try using XPath for the objects... the XPath will definitely be different.

Is it possible to use the Windows API (Win7) to mass rename files?

While in a folder with lots of files, one can select many and rename only one. That one will get the name NewName (1) and the rest will follow as NewName (2), etc.
Is there a way to use this algorithm?
I am mostly interested in using WinAPI methods in general; it is easy to implement this specific algorithm by hand. I don't know how to dig into explorer.exe and see what method it uses, but it is probably something reusable.
I mostly use C#, but an example in any language would be accepted.
Not with a single function call, no. But you can loop through the files one at a time using SHFileOperation() with the FOF_RENAMEONCOLLISION flag, renaming each file to the same target filename so that Windows will generate its own unique filenames.
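For comparison, the numbering scheme itself is easy to emulate without the shell API; here is a portable sketch of the same "NewName (n)" logic (not the WinAPI route, just the collision-numbering idea):

    package main

    import (
    	"fmt"
    	"os"
    	"path/filepath"
    )

    // uniqueName returns base+ext, or "base (n)"+ext with the smallest n
    // for which no file with that name exists yet in dir.
    func uniqueName(dir, base, ext string) string {
    	candidate := base + ext
    	for n := 1; ; n++ {
    		if _, err := os.Stat(filepath.Join(dir, candidate)); os.IsNotExist(err) {
    			return candidate
    		}
    		candidate = fmt.Sprintf("%s (%d)%s", base, n, ext)
    	}
    }

    func main() {
    	fmt.Println(uniqueName(".", "NewName", ".txt"))
    }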
As pointed out by Remy Lebeau, I found the official way to do it:
IFileOperation::RenameItems
Declares a set of items that are to be given a new display name.
All items are given the same name.
...
If more than one of the items in the collection at pUnkItems is in the same folder, the renamed files are appended with a number in parentheses to differentiate them, for instance newfile(1).txt, newfile(2).txt, and newfile(3).txt.
Here is the referenced link.
This also answers my question about where to start with using the Windows Shell API to do things like this. The answer is here.
Yes, it is possible; see here: http://blog.gadodia.net/stupid-windows-trick-mass-renaming/ I used this to organize and number files in bulk.
