How to create an index for a text file? - text-files

Can anyone point me to a good resource on how to build an index for a large text file?
I am using an OLE Driver on Windows to execute SQL queries against the text file and I don't want to have it read through the entire file.
I've tried googling the topic but I can't find a good resource.
Thank you!

In case anyone else comes across this post looking for a resource on how to index a text file, I found a great tutuorial at:
Indexing Random Access Files in Visual Basic
Hat-tip to Rick Meyer for writing a clear and educational article on the subject.
Originally, my plan was to use an OLE driver to execute SQL queries on text files. After reading Rick's article above along with this tutorial: Open for Random in Visual Basic, I decided to abandon the SQL approach and perform binary searches on random access files instead.
Choosing Random Access instead of a SQL solution (like the ones suggested in the comments above) is certainly more involved, however, I am very happy with the speed and lightweight nature of the Random Access based application.

Related

Present content like power point or impress

I want to build an application that displays the content that user types on the command prompt to the display like a presentation.
I am writing this application in golang. If there are existing libraries that I can use to do this great and if not would need direction how to approach solving this.
I did search on the internet for pointers but found none.
Have a look at the present tool, it does a similar thing using flat files and might even be useful for you.
https://godoc.org/golang.org/x/tools/present

Is there a common solution to generate msoffice, libreoffice and pdf documents on the fly?

I would like to generate office documents (msoffice, oo) and pdf on the fly from one source document. Currently i think about opendocument as templates files and libreoffice-headless as converter.
Does anybody have experience on this topic and is there a (commercial?) ready to use solution?
A commercial solution is Docmosis which has a downloadable and cloud-service solutions using MSWord/OpenOffice documents as templates and providing template-population features, load balancing, doc/docx/odt/pdf/rtf/html production and quite a few other features. One of it's key features is to generate point-in-time output in multiple formats (from the same template and data) as you mentioned. It has at least one Ruby example to show the population features. Please note I work for the company that created Docmosis.
Another option is the open source JOD Reports.
I hope that helps.

Getting an IVsTextLines from file path

I've written a basic LanguageService extension for Visual Studio 2008 for my studio's proprietary scripting language. It works perfectly fine, and I've implemented a basic symbol table to keep track of script definitions and calls allowing for goto definition functionality.
The problem I've run into is that I only know how to parse the current active view, and I'd like to scan the entire solution's contents so that the user can goto the definition of a script defined in a file they have yet to open and have parsed. I've figured out how to generate a list of all files in the solution, but now I need to create a new Microsoft.VisualStudio.Package.Source which requires a Microsoft.VisualStudio.TextManager.Interop.IVsTextLines and I have no idea how to create a new one based off of the file I have.
Maybe I'm going about the problem the wrong way and someone can point me towards a better way to cause a file to be parsed by the LanguageService.
Regards,
Colin
Poking around I found that the reason Visual Studio needs a new Source is that it's keeping an internal list of them, and they're like the view into the text file held by the editor.
I came to the conclusion that files that are closed do not need IVsTextLines or to be entered into the VS internal list of Source files because I'm not doing any operations directly on them, all I care about in this case is to build a table of symbols and their corresponding TextSpan. So instead I created a new API for my parser that just took in a string and built my AST instead of grabbing the text from a ParseRequest, and only worried about specific types of symbols I needed to record. I then pushed this into a BackgroundWorker.
So I guess I was going about the problem in the wrong way. Although it does seem weird I can't just trigger a file to be opened into the Source list.
Interestingly I asked this question to Microsoft on their support forums and they advised me I had to purchase some service and support plan for them to answer my question.

Libraries/Tools for Website Parsing

I would like to start working with parsing large numbers of raw HTML pages into semantic data structures.
Just interested in the community opinion on various available tools for such a task, particularly various useful libraries in any language.
So far, planning on using Hadoop to manage a lot of the processing, but curious about alternatives.
First you need to download your page source and then create a DOM tree.
if you are coding in C# you can user the following tools to create your DOM tree.
1) http://htmlagilitypack.codeplex.com/
2) http://www.majestic12.co.uk/projects/html_parser.php
the first one is easy to use but second one is much faster and memory friendly and I suggest you to use the second one if you want to create a robust application
then you can extract usefull content from web page using:
http://www.chrisspen.com/blog/how-to-extract-a-webpages-main-article-content.html
and many other articles you can find to extract content from web page by Googling (extract main content from web page)
Hope it helps

MS Excel automation without macros in the generated reports. Any thoughts?

I know that the web is full of questions like this one, but I still haven't been able to apply the answers I can find to my situation.
I realize there is VBA, but I always disliked having the program/macro living inside the Excel file, with the resulting bloat, security warnings, etc. I'm thinking along the lines of a VBScript that works on a set of Excel files while leaving them macro-free. Now, I've been able to "paint the first column blue" for all files in a directory following this approach, but I need to do more complex operations (charts, pivot tables, etc.), which would be much harder (impossible?) with VBScript than with VBA.
For this specific example knowing how to remove all macros from all files after processing would be enough, but all suggestions are welcome. Any good references? Any advice on how to best approach external batch processing of Excel files will be appreciated.
Thanks!
PS: I eagerly tried Mark Hammond's great PyWin32 package, but the lack of documentation and interpreter feedback discouraged me.
You could put your macros in a separate excel file.
Almost anything you can do in VBA to automate excel you can do in VBScript (or any other script/language that supports COM).
Once you have created an instance of Excel.Application you can pretty much drop your VBA into a VBS and go from there.
If it's the Excel/VBA capability that you're looking to use then you could always start by creating all of the code that will interact with the Excel files you're wanting to work on within an Excel file - a kind of master file that is separated from the regular files, as suggested by Karsten W.
This gives you the freedom to write Excel/VBA.
Then you can call your master workbook (which can be configured to run your code when the book is opened, for example) from a VB script, batch file, Task Scheduler, etc.
If you want to get fancy, you can even use VBA in your master file to create/modify/delete custom macros/VBA modules in any of the target files that you're processing.
The info for just about all of the techniques I'm describing I got from the Excel VBA built-in reference docs, but it certainly helps to be familiar with the specific programming tasks that you're tackling. I'd advise that the best approach is to put together your tasks (eg, make column blue, update/sort data etc) one by one and then worry about the automation at the end.

Resources