I use Microsoft Document Explorer version 9.0.21022.8.RTM. Why does searching this MS documentation feel so slow and so poor? What am I missing, or what am I doing wrong?
Quite slow: When I search for the same thing in Google, Yahoo or Bing, I get the same results much faster. In dexplorer.exe I configured 20 results per page, local only. Each time I get 500 total results, and of course I always find what I am looking for on the first page of 20 results. In the 'Tools' menu I can't find a way to cap the total results at the first 50 or 100.
Quite bad: When I press F1 on a keyword in Visual Studio, I often get a very poor result in dexplorer: some object that is quite far from what I am working on.
And in the end Google becomes one of my best friends. I don't believe Microsoft is stupid, but they are fighting very hard to prove it. It's not a big problem, but I would still prefer a desktop app with more flexible and advanced search (filtering by language, platform, content, etc.) to a browser.

Your itty-bitty machine is no match for the hardware and smarts that Google throws at the search problem. Use Google to search; you can use the site: selector to narrow the search to MSDN Library pages. The Microsoft search page is not entirely nonfunctional either, but lately it tends to show too many unhelpful MSDN forum questions. You find topics in DE by using the index, which is very well done, although it does take a bit of time to learn how to use it.
The Contents tab is only useful once you have found a topic and want to see associated topics or introductory material. Use the "Sync with Table of Contents" button on the toolbar. It doesn't always work for older topics.
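If you use the site: trick a lot, it is easy to script. Here is a minimal sketch that builds such a search URL. The function name and the default domain msdn.microsoft.com are my own assumptions for illustration, so adjust them to whatever site you want to restrict to.

```python
from urllib.parse import urlencode


def site_search_url(terms, site="msdn.microsoft.com"):
    """Build a Google search URL restricted to one site via the site: operator.

    The default domain is an assumption; pass any other domain as `site`.
    """
    query = f"site:{site} {terms}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("Dictionary TryGetValue"))
```

You could bind something like this to a browser keyword or a small launcher script and skip the built-in search entirely.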


LinkedIn bot: search profiles for keywords and export to Word

I was thinking about a way to partly automate my job, or at least to have to look through fewer LinkedIn profiles.
So here is my question. Would it be possible to write a program that searches LinkedIn for you with your "keywords" and automatically clicks through the profiles? When it opens a profile, it would search for each individual keyword and keep count, export the number of times the keywords are mentioned to a Word document, then go back and do the same for each profile. I have no idea what language could do this, though, and I only have a high school class's worth of JavaScript, so I would be teaching myself how to do this. I could run this program at night, come back in the morning, look through the best profiles, and waste less time looking through ones where people do not have the experience they say they do.
Basically it would go:
Execute search
click first profile
find total number of keywords
export to word
click back or return to results button
next profile
repeat for, say, 300 profiles.
I don't know how feasible this would be to write, or if it's even really possible. Thanks for the helpful replies!
I got some help on reddit, and the replier said that it would probably be easiest in Ruby/RubyGems?
Your best option would probably be to use a process called "scraping": you extract the HTML from the page and sort through it for useful information.
Programming languages are like religions; different people say different languages are the best. For parsing HTML, most people (not all) would agree that a high-level language like Ruby or Python would be best. However, you did specify Ruby, so start by installing it.
After installing Ruby (see here), run gem install nokogiri
You can look for general guides on nokogiri here. Start by looking at the page source and seeing where the interesting information is (e.g. links to the profiles on the search page). 300 profiles should be no problem. However, when you are testing, make sure you only try 3 or 4 profiles at a time. A program that requests 300 pages and is run many times may get noticed, but a one-time run should be fine (no guarantees).
Also, I would not recommend exporting to Word. You can scan the raw text for keywords and it will be much faster.
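To give an idea of what "scan the raw text for keywords" looks like, here is a rough sketch. It is written in Python rather than Ruby/nokogiri (only because it needs nothing beyond the standard library), and it works on an HTML string you already have; fetching the pages is left out. The names TextExtractor and count_keywords and the sample HTML are made up for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def count_keywords(html, keywords):
    """Count case-insensitive, whole-word occurrences of each keyword."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    counts = Counter()
    for kw in keywords:
        # Lookarounds instead of \b so keywords like "C#" still match.
        pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical profile snippet, just to show the call.
sample = (
    "<html><body><h1>Ruby developer</h1>"
    "<p>Ruby, Rails and SQL. More ruby.</p>"
    "<script>var ruby = 1;</script></body></html>"
)
print(count_keywords(sample, ["ruby", "sql", "java"]))
```

Writing the counts to a plain text or CSV file per profile is then a one-liner, and you can still open that in Word or Excel afterwards.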
As a final note, this will take a long time. It sounds like you haven't programmed much before (although previous experience in JavaScript will help). A lot of your time will most likely be spent reading through tutorials and searching for your problem on Google. Feel free to come back here when you have specific problems, and good luck!

Why do search APIs return different results from the main search engine?

I've been playing around with the Google and Bing search APIs, and I've noticed that even when both are configured to search the entire web, the APIs return different results from conducting searches on the actual search engines.
I've also noticed that for very long queries, the APIs tend to return very few results, and sometimes no results when a normal search on their website would return many results.
Why is this?
Search engines tend to weight the results based on your own usage patterns online. Let's say you search for "Fluffy kitten". If you regularly spend time browsing bar/restaurant directory sites, you might get that new hipster bar "Fluffy Kitten" at the top of your search results, while a person who spends more time reading pet fanatic sites might get the cute and cuddly search results.
This often catches people out who think their cool new site is number one in Google, when in fact, nobody else has the result at #1 and Google is just favouring it for them based on their online activity.
The APIs don't have the same knowledge about the user, so your results will vary.
There may be other reasons, but this usage-based personalization is very real.

How can I quickly search my code using Windows?

I've got the same problem as in this question, except in Windows. Our product has a 100+ MB code base, and searching for stuff in there takes an awful amount of time (several minutes). It's nice when you can narrow your search to a specific subfolder, but that isn't always possible.
I was wondering if there is some tool that would make it faster, probably by indexing. Accuracy is paramount: if a substring exists somewhere, it must be found, even if the file is not indexed or the index is out of date. It would also be ideal if .svn folders were ignored when searching.
Failing that, I was wondering if I could make something like that myself. Is there maybe a ready-made indexing engine available for such tasks? I was wondering about Windows Indexing Service (or whatever it is called these days), but so far my experience with it (the Windows standard file search facility) has been rather dismal, with it often missing files that were right in front of its nose.
Yes, I have seen Windows Indexing Service miss files too, but I haven't checked KBs or user forums for explanations. I'm glad to see it confirmed that it's not just me ;-)!
There look to be a lot of file index programs available; I would be surprised if you can't find one that meets your needs (although, see later).
Here are some things to consider:
If your team is using an IDE, isn't there an index feature/plug-in? (Don't any of the SVN clients provide indexing capabilities?) Also, add some tags to your question so this will be seen by other Windows developers using the same dev environment that you are using.
The SO link you provided mentions several options: slocate, rlocate, and I found mlocate. The Wikipedia page for slocate says
"Locate32 for Windows: Windows analog of GNU locate with GUI, released under GNU license"
which seems to meet your main requirement. Looking at the screenshots with the multi-tab interface (one labeled advanced) would give me hope that you can exclude svn (at least from results, possibly from what is indexed).
Your requirement for
"if a substring exists somewhere, it must be found, even if the file is not indexed or the index is out of date."
seems contradictory. For the substring requirement, I can see many indexing programs ignoring C language syntax elements ( {([])}, etc.), and a word like 'then' might be dropped, either because it is considered a noise word itself or because it gets stemmed down to 'the', which is THEN removed as a noise word.
To get to 'must be found', and really be sure, you would have to develop a test suite to see what the index program does with every corner case. (For a 100 MB code base, that is not out of the question, especially since you are considering rolling your own.)
Finally, 'even if the file is not indexed ...'. Well, you either use an index or you don't (obviously). Unfortunately for your requirement, while rlocate watches for changes all the time, slocate (on Unix) doesn't seem to. If you check the docs or user forums for Locate32, you'll probably get the answers you need.
rlocate would give you what you need, but according to an rlocate page, 'rlocate will work only on Linux with version 2.6.'. mlocate doesn't seem to have a Windows port either.
Finally, here is a link I found that is interesting about mlocate: mlocate vs rlocate. This is the Google cache, because the original page said 'not available'.
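If you do end up rolling your own, one way to reconcile "use an index" with "must be found even if the index is stale" is to index character trigrams rather than words, use the index only to rule files out, and fall back to a plain scan for any file that is new or changed. The sketch below is only an illustration of that idea under my own assumptions (staleness is detected by modification time, files are read as UTF-8, search is case-sensitive); it is not how Locate32 or any of the locate variants work, and all names in it are made up.

```python
import os


def iter_source_files(root, skip_dirs=(".svn",)):
    """Yield file paths under root, pruning directories such as .svn."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            yield os.path.join(dirpath, name)


def read_text(path):
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return fh.read()


def trigrams(text):
    """Every 3-character sequence in text (no stemming, no noise words)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Maps each trigram to the set of files that contained it at build time."""

    def __init__(self):
        self.postings = {}  # trigram -> set of paths
        self.mtimes = {}    # path -> modification time when indexed

    def build(self, root):
        for path in iter_source_files(root):
            self.mtimes[path] = os.path.getmtime(path)
            for gram in trigrams(read_text(path)):
                self.postings.setdefault(gram, set()).add(path)

    def search(self, root, needle):
        """Return every file containing needle, even if new or changed since build()."""
        grams = trigrams(needle)
        hits = []
        for path in iter_source_files(root):
            fresh = self.mtimes.get(path) == os.path.getmtime(path)
            if fresh and grams:
                # The index is trusted for this file: skip it unless it has every trigram.
                if not all(path in self.postings.get(g, ()) for g in grams):
                    continue
            # Unindexed or changed files land here too; always confirm by scanning.
            if needle in read_text(path):
                hits.append(path)
        return sorted(hits)
```

Each search still walks the tree and stats every file, but it only reads the files the index cannot rule out, which is where the minutes go on a 100 MB code base. Needles shorter than three characters fall back to a full scan.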

Google crawling indexing algorithms

I am looking for some documents on how Google crawls and indexes content. I have read many "light" papers and articles on what you need to do to improve your ranking and make sure your content is properly indexed, but I am looking for more advanced technical documents on how Google crawls and indexes content.
The things I would like to know more about:
What elements Google looks for when it crawls: page content, URL format, keywords, description, etc.
How the index is updated.
Basically, I am trying to understand why some pages are indexed but not others, even if the formats are similar. Why do only 10% of my site's pages appear when I search the entire domain, even though I can see in my server logs that Google crawled every single link?
The answers to both things are closely-guarded trade secrets, ostensibly to prevent gaming the system.
Also keep in mind that Google makes over 400 algorithmic changes per year, making it close to impossible for an outsider to be accurate and up-to-date. Short of working for Google, you're likely not going to find an in-depth and accurate answer.
However, Matt Cutts, head of the web spam team, frequently provides the most accurate insights into how Google handles content, both on his blog and on the GoogleWebmasterHelp YouTube channel. It's worth going through his content to get a much better understanding of Google's methodology.
To give you a technical picture of how a web crawler works, I suggest you take a deep look into solution.
A typical web crawler has the following components: a fetcher, a parser, an indexer and a searcher. Briefly, a web crawler fetches all the URLs available on a website and creates segments in which it stores up to 101 KB per page. Those pages are parsed; common words such as 'and', 'or' and 'the' are not stored, while the other words are analyzed using Bayesian calculations to produce a rank.
Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. This is mainly done by storing a list of occurrences of each search criterion in an inverted index, typically implemented as a hash table or binary tree.
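To make the inverted index idea concrete, here is a toy sketch backed by a hash table (a Python dict). It drops the three stop words mentioned above and answers "which documents contain all of these terms"; it does no ranking at all, and it is in no way a description of what Google actually does. The function names and sample documents are made up.

```python
import re
from collections import defaultdict

# Only the stop words named above; real stop-word lists are much longer.
STOP_WORDS = {"and", "or", "the"}


def tokenize(text):
    """Lowercase, split into alphanumeric words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_inverted_index(docs):
    """docs: mapping of document id -> text.

    Returns term -> {document id: [positions among the kept tokens]}.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def search(index, query):
    """Return the ids of documents containing every non-stop-word query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return result


docs = {
    "a": "The crawler fetches the page",
    "b": "The indexer parses the page and stores terms",
}
index = build_inverted_index(docs)
print(search(index, "crawler page"))
```

The point of the structure is that a query only touches the posting lists of its own terms, so lookup cost does not depend on how many documents were indexed.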
As Mark stated, Google's calculations are mainly trade secrets, but patents issued to Google could be a good start. PageRank mainly analyses backlinks and how important the websites pointing to your site are in people's preferences. In my experience it's important to offer an XML sitemap listing all the pages on your site. In that sitemap you can suggest a crawl frequency for each page. is an interesting possibility.
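For reference, the per-page crawl frequency in a sitemap is the changefreq element of the sitemaps.org protocol (values such as daily, weekly, monthly), and crawlers treat it as a hint rather than a command. A minimal sketch of generating such a file follows; the function name and the example.com URLs are placeholders of mine.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: iterable of (url, changefreq) pairs. Returns the sitemap XML as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # A hint to the crawler about how often this page changes.
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs; replace with the real pages of your site.
print(build_sitemap([
    ("https://example.com/", "daily"),
    ("https://example.com/about", "monthly"),
]))
```

The string returned here has no XML declaration, so add one when you save it as sitemap.xml.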
Google Webmaster Tools will let you see what Google is finding on your site. Logs are OK, but the robot may be running into problems, and the best way to find out is to have Google's own tool display the errors.
Finally, most of your concerns are things that SEO specialists live for. I suggest you check sites like and their tools... You will learn how to position your website better in the organic results of search engines.
Hope it helps! Sebastian.
Yes, Google likes fresh and unique content.
Follow the Google webmaster guidelines: put your keyword in an H1 or H2 heading in your HTML. Anchor text and H1/H2 headings should use keywords related to your business; it can help your site in search engines.
Also use for rich snippets in this tag.
It scans your web page very precisely and sensitively. Factors such as whether your JavaScript is embedded or in a separate file, whether you use frames in your design, and whether you use heavy graphics can reduce the ranking of your page. Keywords obviously affect ranking. Broken links also bring your website's ranking down.
Basically you can refer to to go through all the important points of google's crawler. This will take a maximum of 40 mins.
MapReduce: Simplified Data Processing on Large Clusters
I analysed the latest algorithm and found that Google now gives more importance to CONTENT than to LINKS.
So if your content is good enough and uses proper tags, Google will automatically index it for you. I would suggest using all of H1 to H6 properly.

Ways to enhance a trial user's first time experience

I am looking for some ideas on enhancing a trial user's experience when he uses a product for the first time. The product is aimed at a particular domain and has various features/workflows. Experienced users of the product naturally find interesting ways to combine features to get the results they want (somewhat like using an IDE from a programmer's perspective). Trial users get to use all features of the product in a limited fashion (for example, if there is a search feature, the trial user might see only the top 20 results, or might be allowed to search only 100 times). My question is: what are the best ways to help a trial user explore and understand the possibilities of the product in the trial period, especially in the first 20 to 60 minutes, before the user gives up on the product?
Edit 1: The product is a desktop app (served via JNLP, so no install required) and as pointed out in the comments, the expectations can be different in this case. That said, many webapps do take a virtual desktop form and so, all suggestions are welcome.
Check out how handles this. It's an invoicing app, but to prevent it from looking too empty for a new account, they show static images in places where you'd actually have content if you used the app. Makes it look less barren at first until you get your own data in.
If you can, avoid feature-limiting a trial. It stops the user from experiencing what the product is ACTUALLY like. It also prevents a user from finding out whether a feature actually works the way they want/expect/need it to.
If you have a trial version, and you can, optimise it for first-time use. Focus on and highlight the features that allow the user to quickly and easily get benefits or useful output from the system.
Allow users to export any data they enter into a trial system, and indicate that this is possible and easy. You don't want them to be put off from trying something because of the potential for wasted effort.
Avoid requiring users to do lots of configuration before using a trial. Prepopulate settings based on typical/common/popular settings. You may also want to consider having default settings for different types of usage, e.g. "If you want to see what the system is like for scenario X, use configuration J. If you want to see what the system is like for use case Y, use configuration K.", where J and K are collections of settings best suited to a particular type of usage.
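As a rough sketch of the "configuration J / configuration K" idea, the presets can simply be named bundles of settings layered over sensible defaults. Everything here (the setting names, the preset keys, the values) is hypothetical and only meant to show the shape.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrialSettings:
    # Hypothetical settings with "typical/common/popular" defaults.
    results_per_page: int = 20
    show_sample_data: bool = True
    language: str = "en"


# Hypothetical presets standing in for "configuration J" and "configuration K".
PRESETS = {
    "scenario_x": TrialSettings(results_per_page=50),
    "use_case_y": TrialSettings(show_sample_data=False),
}


def settings_for(scenario=None, **overrides):
    """Start from a preset (or the defaults) and apply any user overrides."""
    base = PRESETS.get(scenario, TrialSettings())
    return replace(base, **overrides)


print(settings_for("scenario_x"))
print(settings_for("use_case_y", language="de"))
```

A new user who picks nothing still gets a working setup, and picking a scenario is one click instead of a configuration wizard.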
I'll speak from personal experience while evaluating trial applications.
The most annoying trial applications are those which keep popping up nag screens or constantly reminding me that I'm using a trial. Trials which act exactly like the real product from the beginning till the end of the trial period are just awesome. Limited features are annoying; the only exception I can think of is a rarely used feature that would let people exploit the trial (by using this "once-in-a-lifetime" feature and then uninstalling). If you have, for example, a video editing software trial which puts a "trial" watermark on the output, I'd uninstall it as soon as I noticed it. In my opinion a trial should integrate seamlessly into the user's workflow, so that once the trial ends they think, "Hey, I have been using this awesome program almost every day since I got the trial, I absolutely have to buy it." Sure, some people will exploit it, but in the end you should target the group which will use your product in their daily workflow rather than one-time users. Even if a user "trials" it twice a year, he will keep coming back to your product and might even buy it after the 2nd or 3rd "one-time use".
(Sorry for the wall of text and the rant.)
As for how to improve the first session: I usually find my way around programs easily, but a one-time pop-up or screen (or one with a check-box to never show it again) with videos showing off the best features and the intended workflow is quite helpful. Links to sample documents might also help. If your application can present itself (for example, a slide show about your slide-show program), you could include such a document. People don't like to read long and boring help files, but if you have a designer on your team, you could ask him to make a short, colourful intro PDF. Also, don't throw all the features at the user at the same time. Split the information into simple categories, and if the user is interested in one specific category, keep feeding him more specific information. That's why videos are so good: with 3 to 6 videos of about 3 to 5 minutes each you can tell a lot. Also, depending on how complex your program is, you could include a picture showing where specific things are located on the screen.
Just my personal opinion, I have never made a trial myself. Hope it helps.
An interactive walkthrough/lab exercise that really highlights the major and exciting offerings of your application.
Example: Yahoo Mail does this when users opt in to the new mail interface.
There are so many ways you can go with this. I still can't claim to have found the best approach.
However, my plan from the beginning with my online (Silverlight) software was to give away something thousands of people will find useful and can use for free. The free version is pretty well representative of the professional product, with only a few features missing that enhance productivity (I'm working on those professional features now). And then I do have a nag popup that comes up every 5 minutes suggesting that you should buy it. That popup can be dismissed as many times as you want. I know that popup will annoy some people but I suppose that's the trade off. There is no perfect plan. But I don't think the occasional nag popup scares that many people away, especially when it can be dismissed with a single click.
I was inspired by Balsamiq Mockups, which has been hugely successful over the past couple years. My trial/nag popup way of doing things was copied almost exactly from Balsamiq. I honestly don't know if this is the ideal plan, but it has obviously worked for them. By the way, I think another reason for Balsamiq's success is that the demo doesn't have to be downloaded & installed. Since the demo is in Flash, there's a very high conversion rate of users actually trying it and becoming addicted to it.
