Determining whether mdworker (Spotlight) has completed first scan - macos

How do I determine that mdworker (Spotlight) has completed its first scan? I'm basically looking for the point at which the little "." in the Spotlight search icon goes away and you can perform searches. (Obviously the OS has a way to determine this, since it displays a dot until it's ready...) I'm not seeing anything from mdutil and I can't find anything in the Spotlight APIs.
I'm currently forcing my own scan synchronously using mdimport, but this introduces a long delay (from minutes to hours depending on how aggressive I'm being about where to search) and duplicates work that mdworker is already doing.
Any solution (programmatic, scripted, documented, or undocumented) is fair game here.

I opened a DTS incident for this with Apple. The answer is that there is no supported way to do this as of 10.7. The "little dot" that the Spotlight search icon uses is controlled through a private interface.
My goal has been to get an inventory of installed applications.
My current solution is to gather a list of all the apps in /Applications using fts, looking for directories whose names end in ".app" and pruning as I go so I don't pick up sub-applications. (This would be easier to do with NSDirectoryEnumerator, but this particular piece of code is in C++ with Core Foundation. It would be easier to do with CFURLEnumerator, but I need to support 10.4. So fts is fine.)
Scanning for this list is very fast. Once I know the minimum number of apps on the box, I compare that to what system_profiler outputs. If system_profiler tells me that there are fewer apps than I know are in /Applications, then I scan all the bundles myself. Otherwise, I use the output from system_profiler.
This isn't ideal, but it's a decent heuristic, is "mostly" right, and prevents drastic underreporting of applications.
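As a rough illustration of the scan-and-prune step and the count comparison described above, here is a minimal sketch in Python rather than the C++/fts code the answer actually uses. The function names `find_app_bundles` and `should_rescan` are hypothetical, and `os.walk` stands in for fts.

```python
import os

def find_app_bundles(root):
    """Return paths of .app bundles under root without descending into them."""
    bundles = []
    for dirpath, dirnames, _filenames in os.walk(root):
        apps = [d for d in dirnames if d.endswith(".app")]
        bundles.extend(os.path.join(dirpath, d) for d in apps)
        # Prune: dropping bundles from dirnames stops os.walk from entering
        # them, so helper apps nested inside a bundle are not counted.
        dirnames[:] = [d for d in dirnames if not d.endswith(".app")]
    return sorted(bundles)

def should_rescan(min_known_apps, profiler_app_count):
    """Heuristic from the answer: distrust system_profiler when it reports
    fewer apps than the quick disk scan already found."""
    return profiler_app_count < min_known_apps
```

Calling `find_app_bundles("/Applications")` gives the minimum app count; how the system_profiler count is obtained is left out here, since the answer does not say how its output is parsed.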

Related

Popup window in Turbo Pascal 7

In Turbo Pascal 7 for DOS you can use the Crt unit to define a window. If you define a second window on top of the first one, like a popup, I don’t see a way to get rid of the second one except for redrawing the first one on top again.
Is there a window close technique I’m overlooking?
I’m considering keeping an array of screens in memory to make it work, but the TP IDE does popups like I want to do, so maybe it’s easy and I’m just looking in the wrong place?
I don't think there's a window-closing technique you're missing, if you mean one provided by the CRT unit.
The library Borland used for the TP7 IDE was called Turbo Vision (see https://en.wikipedia.org/wiki/Turbo_Vision) and it was eventually released to the public domain, but well before that, a number of 3rd-party screen handling/windowing libraries had become available and these were much more powerful than what could be achieved with the CRT unit. Probably the best known was TurboPower Software's Object Professional (aka OPro).
Afaik, these libraries (and, fairly obviously, Turbo Vision) were all based on an in-memory representation of a framed window which could be rapidly copied in and out of the PC's video memory and, as in Windows with a capital W, they were treated as a stack with a z-order. So the process of closing/erasing the top-level window was one of getting the window(s) that it had been covering to re-draw itself/themselves. Otoh, CRT had basically evolved from very primitive origins similar to, if not based on, the old DEC VT100 display protocol and wasn't really up to the job of supporting independent, stackable window objects.
Although you may still be able to track down the PD release of Turbo Vision, it never really caught on as a library for developers. In an ideal world, a better place to start would be with OPro. It was apparently on SourceForge for a while, but seems to have been taken down sometime since about 2007, and these days even if you could get hold of a copy, there is a bit of a question mark over licensing. However ...
There was also a very popular freeware library for TP called TechnoJock's Toolkit, which had a large functionality overlap (including screen handling) with OPro, and it is still available on GitHub - see https://github.com/lallousx86/TurboPascal/tree/master/TotLib/TOTSRC11. Unlike OPro, I never used TechnoJock's myself, but devotees swore by it. Take a look.
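To make the "array of screens in memory" idea concrete, here is a minimal sketch of the save-and-restore approach, written in Python rather than Pascal purely for illustration. It is not how Turbo Vision, OPro, or TechnoJock's actually implement windows; the `Screen` class and its methods are hypothetical, the grid stands in for video memory, and popups are assumed to fit inside the screen.

```python
class Screen:
    """A text screen modelled as a grid of characters (stand-in for video memory)."""

    def __init__(self, width, height, fill=" "):
        self.cells = [[fill] * width for _ in range(height)]
        self._saved = []  # stack of (x, y, saved_rows); topmost popup is last

    def open_popup(self, x, y, lines):
        """Save the region the popup will cover, then draw the popup over it.
        Assumes the popup lies entirely within the screen bounds."""
        w = max(len(line) for line in lines)
        saved = [row[x:x + w] for row in self.cells[y:y + len(lines)]]
        self._saved.append((x, y, saved))
        for dy, line in enumerate(lines):
            self.cells[y + dy][x:x + w] = list(line.ljust(w))

    def close_popup(self):
        """Restore whatever the topmost popup had been covering."""
        x, y, saved = self._saved.pop()
        for dy, row in enumerate(saved):
            self.cells[y + dy][x:x + len(row)] = row

    def render(self):
        return "\n".join("".join(row) for row in self.cells)
```

Because each popup saves only the rectangle it covers and popups close in reverse order of opening, nested popups restore correctly without redrawing the whole first window.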

Automatically take list of terms, import into Windows search function (for content), and export lists of results. (AutoIT?)

My next big challenge is to write a script (I assume it would be in AutoIT, an area I have little experience with) to automate the Windows search function.
The end goal is to take a list of search terms from a .txt file (one string per line), and search the contents of every document on the computer for said search terms (one at a time).
I can make this happen by hand - turn on the search by content function, index all files on all attached drives, search the terms one by one, and highlight all > shift-click > Copy as path > paste in notepad, and save as [searchterm].txt.
However, I need to automate that whole process. I understand that I might need to write a separate script for each version of Windows it would be used with (XP, Vista, 7, 8).
Is this an easy enough task to accomplish, or would it take a lot of programming hours? Can anyone point me in the right direction? All help is appreciated.
Well, assuming your text file of queries is large enough, and you don't want to actually iterate the entire file system for each, you are describing a classic information retrieval problem.
1. Index the data from your file system (this is a preprocessing step that is done only once).
2. For each query, search for it in the index and get the relevant documents.
The field of Information Retrieval is a huge area of research, and I really don't encourage you to try implementing it from scratch.
I do encourage using existing libraries that are already developed and tested for this. For example, in Java a popular choice is Lucene, which is very widely used for search.
If you are not familiar with Java, I am also aware of Python (PyLucene) and .NET (Lucene.NET) versions of this library.
To learn more about Information Retrieval, I recommend Manning's Introduction to Information Retrieval.
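To show what "index once, then query the index" means in practice, here is a toy inverted index in Python. It is only a sketch of the idea, not a substitute for Lucene: the function names are hypothetical, documents are assumed to be already loaded as text, and matching is on whole lowercase words only.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def build_index(documents):
    """documents: dict mapping path -> text. Returns term -> set of paths.
    This is the one-time preprocessing step."""
    index = defaultdict(set)
    for path, text in documents.items():
        for term in tokenize(text):
            index[term].add(path)
    return index

def search(index, query):
    """Return the paths that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Each query then costs a few dictionary lookups instead of a pass over every file, which is the point of building the index up front.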

Is it possible for a native OS X app to read and copy the Spotlight search index?

I don't want to alter the index in any way, simply read it, monitor it for changes, and replicate it. It would be with a native app/service that would run in the background. I'm assuming I'd be targeting 10.6+, but that's not set in stone.
Where is the actual index? Can I read it in any semantically useful way?
Googling around, I haven't found any references to the actual Spotlight index location, or an API to read the whole thing. I did find the Search Kit Reference, which seems to explain how the underlying technology works and might be helpful, but doesn't explain how one might retrieve the entire index, or monitor the index over time.
I also noticed an app called Houdah that purports to provide an improved frontend to Spotlight, which may be of interest, though I don't know how they achieved their effect - if it's literally just a frontend that calls the same Search Kit APIs as Spotlight against the same index, that's not quite what I'm after...
Edit: Can't believe I hadn't read the Wikipedia article on Spotlight - good reference, but I think my question stands.
(I'm a front-end web guy, apologies for noobishness.)
UPDATE: An OS X developer friend thought it would be stored in an SQLite database in a hidden file, but couldn't locate the actual file in the few minutes he spent looking. He did find a hidden .spotlight directory, but this was empty.
On Mac OS X 10.7 -- previous versions are significantly different -- the Spotlight index is stored in /.Spotlight-V100/Store-V2. The storage format is undocumented, but is definitely not SQLite.
I doubt that there's any useful way to extract data from the Spotlight index without an impractical amount of reverse engineering. Even if you did, it'd be likely to break with new releases of Mac OS X.

How can I quickly search my code using Windows?

I've got the same problem as in this question, except in Windows. Our product has a 100+ MB code base, and searching for stuff in there takes an awful amount of time (several minutes). It's nice when you can narrow your search to a specific subfolder, but that isn't always possible.
I was wondering if there is some tool that would make it faster, probably by indexing. Accuracy is paramount, if a substring exists somewhere, it must be found, even if the file is not indexed or the index is out of date. Also it would be ideal if .svn folders would be ignored when searching.
Failing that, I was wondering if I could make something like that myself. Is there maybe a ready made indexing engine available for such tasks? I was wondering about Windows Indexing Service (or whatever it is called these days), but so far my experience with it (the Windows standard file search facility) has been rather dismal, with it often missing files that were right in front of its nose.
Yes, I have seen Windows Indexing Service miss files too, but I haven't checked KBs or user forums for explanations. I'm glad to see it confirmed that it's not just me ;-)!
There look to be a lot of file index programs available; I would be surprised if you can't find one that meets your needs (although, see later).
Here are some things to consider:
If your team is using an IDE, isn't there an index feature/plug-in? (Do none of the SVN clients provide indexing capabilities?) Also, add some tags to your question so it will be seen by other Windows developers using the same dev environment that you are using.
The SO link you provided mentions several options: slocate, rlocate, and I found mlocate. The Wikipedia page for slocate says
Locate32 for Windows Windows analog of GNU locate with GUI, released under GNU license
which seems to meet your main requirement. Looking at the screen shots with the multi-tab interface (one labeled advanced) would give me hope that you can exclude svn (at least from results, possibly from what is indexed).
Your requirement for
"if a substring exists somewhere, it must be found, even if the file is not indexed or the index is out of date"
seems contradictory. For the substring requirement, I can see many indexing programs ignoring C-language syntax elements ( {([])}, etc.), and a word like 'then' might either be removed because it is considered a noise word, or be stemmed down to 'the' and then removed because that is a noise word.
To get to 'must be found', and really be sure, you would have to develop a test suite to see what the index program is doing for anything that is corner case. (For a 100 MB code base, not out of the question, especially since you are considering rolling your own).
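One way to get a ground truth for such a test suite is a brute-force scan that reads every file and skips .svn folders; an index tool's results can then be compared against it. The following is a minimal sketch in Python, assuming UTF-8 for the search string and files small enough to read whole. The name `grep_tree` is hypothetical.

```python
import os

def grep_tree(root, needle, skip_dirs=(".svn",)):
    """Return sorted paths of files under root whose bytes contain needle.
    Slow but exhaustive: every file is read, so nothing depends on an index."""
    hits = []
    needle_bytes = needle.encode("utf-8")
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control folders so os.walk never enters them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    if needle_bytes in fh.read():
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
    return sorted(hits)
```

Any file this scan finds that the index tool misses is a corner case worth investigating.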
Finally, 'even if the file is not indexed ...'. Well, you either use an index or you don't (obviously). Unfortunately, for your requirement, while rlocate is looking for changes all the time, slocate (on Unix) doesn't seem to. Probably if you read the docs or check the user forums for Locate32 you'll get the answers you need.
rlocate would give you what you need, but according to an rlocate page, 'rlocate will work only on Linux with version 2.6.'. mlocate doesn't seem to have a Windows port either.
Finally, here is an interesting link I found about mlocate: mlocate vs rlocate. This is the Google cache, because redhat.com said 'not available'.

Use NSSpeechRecognizer or alternative with audio file instead of microphone input?

Is it possible to use NSSpeechRecognizer with a pre-recorded audio file instead of direct microphone input?
Or is there any other speech-to-text framework for Objective-C/Cocoa available?
Added:
Rather than speaking at the machine that is running the application, an external device (e.g. an iPhone) could be used to send a recorded audio stream to that desktop application. The desktop Cocoa app would then process it and do whatever it's supposed to do using the assigned commands.
Thanks.
I don't see any obvious way to switch the input programmatically, though the "Speech" companion guide's first paragraph in the "Recognizing Speech" section seems to imply other inputs can be used. I think this is meant to be set via System Preferences, though. I'm guessing it uses the primary audio input device selected there.
I suspect, though, you're looking for open-ended speech recognition, which NSSpeechRecognizer is not. If you're looking to transform any pre-recorded audio into text (i.e., make a transcript of a recording), you're completely out of luck with NSSpeechRecognizer, as you must give it an array of "commands" to listen for.
Theoretically, you could feed it the whole dictionary, but I don't think that would work since you usually have to give it clear, distinct commands. Its performance would suffer, I would guess, if you gave it a bunch of stuff to analyze for (in real time).
Your best bet is to look at third-party open source solutions. There are a few generalized packages out there (none specifically for Cocoa/Objective-C), but this poses another question: What kind of recognition are you looking for? There are two main forms of speech recognition: 'trained' is more accurate but less flexible across different voices and recording environments, whereas 'open' is generally much less accurate.
It'd probably be best if you stated exactly what you're trying to accomplish.

Resources