Firefox 3 address bar auto-complete dependency

Background:
Those of you who use FF3 may be familiar with an interesting new feature of the address bar: it lets you do sub-string auto-completion to filter through URLs that you have viewed previously.
Therefore, if you want to open the following URL:
http://longservernamehere.thatyou.nevercanremember.com/support/asdf1235234/kbid?1245
You can simply type any sub-strings of that URL that are sufficient to uniquely distinguish the URL:
long<space>never<space>support<ENTER>
This changes the way users can think about URLs: now all they have to remember are the keywords (sub-strings) that narrow down to the link they want.
Problem: This feature is great, but there is a downside. Users have a decreased incentive to bookmark and memorize URLs. This obviously becomes a problem if a User needs to type in a URL at a remote site (for example during a sales call) and they fumble around because they cannot remember the URL of the snazzy product catalogue that they want to show during a meeting.
Obviously, there are ways around this problem: bookmark your URLs and copy your bookmarks to your laptop before you go to a meeting; use a third-party solution or an online bookmarking portal; use a social bookmarking site; and so on.
Question
The question is, for those users who do not want to use any of the above workarounds: is there actually a way to dig directly into the FF3 internals so I can write a script that extracts the components necessary to replicate a user's auto-complete behavior on any machine?

Firefox stores all this information in SQLite databases. You can query it directly if you have SQLite installed. You can also browse it using the SQLite Manager Firefox plugin.
In summary, the URL history is stored in moz_places, and the various "phrases" that you have typed in the address bar are associated with places via moz_inputhistory, which is a child table.
Their algorithm seems to be: as you type each character into the address bar, query moz_inputhistory for matching entries and display them in descending order by use_count.
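If you want to pull that out with a script, here is a minimal Python sketch; it assumes you are working on a copy of places.sqlite taken from the user's profile and that the tables look the way described above:

import sqlite3

# Work on a copy of places.sqlite from the Firefox profile directory --
# Firefox keeps the live file locked while it is running.
DB_PATH = "places.sqlite"

def typed_phrases(db_path=DB_PATH):
    """Return (phrase, url, use_count) rows, most-used first."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            """
            SELECT i.input, p.url, i.use_count
            FROM moz_inputhistory AS i
            JOIN moz_places AS p ON p.id = i.place_id
            ORDER BY i.use_count DESC
            """
        ).fetchall()
    finally:
        conn.close()

for phrase, url, count in typed_phrases():
    print(count, repr(phrase), "->", url)

Dumping those rows (together with the matching moz_places entries) on one machine and replaying them into the same tables on another is essentially what "replicating the auto-complete behavior" boils down to.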
Hope that helps.
EDIT: This site has a bunch of good information about the Firefox databases: firefoxforensic.com

Related

Scraping pages that do not seem to have URLs

I'm trying to scrape these listings and provide more exposure for these job listings on a site that belongs to a client of mine. The issue is that I need to be able to link to the specific job listing in order for the job seeker to apply. This is the page I'm trying to save listing links from.
It would be ideal if I could save an address for the job seeker to click on to see the original listing and then apply.
What is this website doing to not expose a URL for these pages?
Is it possible to provide a listing-specific address?
If that's possible, how could I generate that address?
If I can't get a specific address, I think I could have the user click a link that triggers an internal script on my client's site, which takes the listing ID, searches the site I found that listing on, and then redirects the user to that specific listing.
The downside to this is that the user will have to wait a little while, depending on how far back the listing is in the directory. I could put up some kind of progress bar with a pleasant "Searching for your listing! Thanks for being patient" message.
If I can avoid having to do this, though, that'd be great!
I'm using Nokogiri and Mechanize.
The page you refer to appears to be generated by an Oracle product, so one would think they'd be willing to construct a web form properly (and with reference to accessibility concerns). They haven't, so it occurs to me that either their engineer was having a bad day, or they are deliberately making it (slightly) harder to scrape.
The reason your browser shows no href when you hover over those links is that there isn't one. What the page does instead is to use JavaScript to capture the click event, populate a POST form with some hidden values, and call the submit method programmatically. This can cause problems with screen-readers and other accessibility devices, as well as causing problems with the way in which back buttons have to re-submit the page.
The good news is that constructions of this kind can usually be scraped by creating a form yourself, either using a real one on a third party page, or via a crawler library. If you post the right values to the target URI, reverse-engineered from examining the page's script, the resulting document should be the "linked" page you expect.
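As a sketch of that idea (the question mentions Mechanize in Ruby; this uses Python purely for illustration, and the form action and field names are hypothetical stand-ins for whatever the page's script actually posts):

import requests
from bs4 import BeautifulSoup

LISTING_PAGE = "https://example.com/jobs/search"   # hypothetical search page
FORM_ACTION = "https://example.com/jobs/detail"    # hypothetical POST target

def fetch_listing(listing_id):
    """Replicate the JavaScript click handler: POST the hidden form values for one listing."""
    session = requests.Session()

    # Load the listing page first so cookies and any hidden tokens get set.
    soup = BeautifulSoup(session.get(LISTING_PAGE).text, "html.parser")

    # Gather the hidden inputs the page's script would have submitted...
    payload = {
        tag.get("name"): tag.get("value", "")
        for tag in soup.select("form input[type=hidden]")
        if tag.get("name")
    }
    # ...plus the value the click handler sets for the chosen listing (field name is a guess).
    payload["listingId"] = listing_id

    response = session.post(FORM_ACTION, data=payload)
    response.raise_for_status()
    return response.text   # the HTML of the "linked" detail page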

Joomla - filter content by IP addresses (intranet / extranet)

We are developing a site (unfortunately on Joomla) where we need to restrict access to some content: articles (and also categories, if possible).
Ideally, this content should be hidden even from the menus. It would be enough if we were able to specify three access levels for our articles:
public visibility
visible only for intranet
visible only for extranet
Unfortunately we found no extension that could meet our requirements.
Do you have any suggestions on where we should implement this IP filter? (Detecting the IP address and checking whether it is from the intranet or the extranet is a simple task, but we are quite new to the Joomla API.)
Approach 1 would be a System plugin, as @Lodder suggested. It would pick up $_SERVER['REMOTE_ADDR'] (check this: if you're behind a proxy, another variable such as X-FORWARDED-FOR might need checking instead). Then you can check it against the conditions set for the article or category; you still have to decide how exactly you would mark a particular article as 'intranet only'. In the 'access denied' case, just redirect visitors to the home page. All articles would be open to everyone by default, and could be marked either 'intranet' or 'extranet'.
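The plugin itself would be written in PHP against the Joomla API, but the intranet/extranet decision it has to make is tiny; here is that logic sketched in Python, with the private ranges below as placeholder assumptions for whatever your network really uses:

import ipaddress

# Placeholder intranet ranges -- substitute the ranges your network actually uses.
INTRANET_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def zone_for(remote_addr):
    """Classify a visitor's IP (e.g. REMOTE_ADDR) as 'intranet' or 'extranet'."""
    ip = ipaddress.ip_address(remote_addr)
    return "intranet" if any(ip in net for net in INTRANET_RANGES) else "extranet"

print(zone_for("192.168.1.20"))   # intranet
print(zone_for("203.0.113.5"))    # extranet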
Approach 2 would be to have two sites instead of one, sharing the same database. They can use individual template files and pick different module positions for the menus, so there would be two sets of menus in the system: one for the intranet, one for the extranet. Of course, in this case anyone with a correct link would be able to access any article, no matter what IP they come from, so it's really just a decoration.

Get URI fragment (hash) to affect SEO? Get indexed by SEs?

I am building a forum site where the post is retrieved on the same page as the listing via AJAX. When a new post is shown, the URI fragment is changed (ex: .php#1_This-is-the-first-post). Also the title and meta tags are changed.
My question is this: I have read that search engines aren't able to use #these-words, so my entire site won't be able to be indexed (it will look like one page).
What can I do to get around this, or at least make my sub-pages indexable?
NOTE: I have built almost all of the site, so radical changes would be hard. SEO is my weakest geek skill.
Add non-AJAX versions of every page, and link to them from your popups as "permalinks" (or whatever you want to call them). Not only are your pages unavailable to search engines, they also can't be bookmarked or emailed to friends. I recently worked with some designers on a site and talked them out of using an AJAX-only design. They ended up putting article "teasers" in popups and making users go to a page with a bookmarkable URL to read the complete texts.
As difficult as it may be, the "best" answer may be to re-architect your site to use the hash-tag URL scheme more sparingly.
Short of that, I'd suggest the following:
Create an alternative, non-hash based URL scheme. This is a must.
Create a site-map that allows search engines to find your existing pages through the new URL scheme (see the sketch after this list).
Slowly port your site over. You might consider adding these deeper links on the page, or encouraging users to share those links instead of the hash-based ones, etc.
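As a sketch of point 2, and assuming you end up with one crawlable URL per post (the /post/<id> scheme below is hypothetical), a basic sitemap is only a few lines to generate:

from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Render a minimal sitemap.xml for the crawlable (non-hash) URLs."""
    entries = "\n".join("  <url><loc>%s</loc></url>" % escape(u) for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + entries + "\n</urlset>\n")

# Hypothetical: one crawlable URL per post instead of page.php#1_This-is-the-first-post
post_urls = ["https://example.com/post/%d" % post_id for post_id in (1, 2, 3)]
print(build_sitemap(post_urls))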
Hope this helps!

Content Watermarking

We have members-only paid content that is frequently copied and republished without our permission.
We are trying to "watermark" our content by including each customer's user ID in a fake CSS class, for example <p class='userid_1234'> (except not so obvious, of course :), and placing that class somewhere in the article body so it can help us track down the source of the copying.
The problem is, by including user-specific information into an article, it makes it so that the article content is ineligible for caching because it is now unique to each user.
This bumps the page load time from ~0.8 ms to ~2.5 s for each article page view.
Does anyone know of any watermarking strategies that can still be used with caching?
Alternatively, what can be done to speed up database access? (Ha ha, that's just a tiny topic, I'm sure...)
We're using the CMS Expression Engine, but I'd like to hear about any strategies. They don't have to be EE-specific.
If you're talking about images then you could use PHP to add a watermark to the images.
How can I add an image onto an image in PHP like a watermark
It's a tool to help track down the lazy copiers who just copy the source code as-is. This is not preventative, nor is it a deterrent. – Ian
Going by your comment above, you are happy for users to copy your content, just not without the formatting etc. So what you could do is provide users with an embed-style snippet for that particular content, just like YouTube does with videos. Into that embed code you could add your own links back to your site, use your own CSS, and so on.
That way you can still allow members to use the content, but it will always come out the way you intended, with links back to your site.
Thanks
You could always cache a version that uses a special string, like #!username!#, and then fill it in later with PHP based on which user is viewing it.
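The answer has PHP in mind; purely as an illustration of the caching idea (the render_article function here is a hypothetical stand-in for whatever produces the article HTML), the token approach looks roughly like this:

CACHE = {}              # stand-in for your real page cache (files, memcached, ...)
TOKEN = "#!username!#"  # placeholder baked into the single cached copy

def get_cached_article(article_id, render_article):
    """Render the article once, with the placeholder instead of a real user ID."""
    if article_id not in CACHE:
        CACHE[article_id] = render_article(article_id)   # HTML containing TOKEN
    return CACHE[article_id]

def serve_article(article_id, render_article, user_id):
    """Swap in the per-user watermark at request time, after the cache hit."""
    return get_cached_article(article_id, render_article).replace(TOKEN, str(user_id))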
Another way, I believe, is to switch from caching on the server to letting the browser cache it locally for a while. That way it is only cached per user, and it reduces the calls to your database. Because an article is pretty static, you could just let the local machine cache it and pull in comments via JavaScript.
This last one is probably not what you are really looking for, but I'm going to come out and say it anyway. You could stop treating your users like thieves and instead treat the thieves as thieves. Go to whoever hosts the servers your content is republished on and send them an email telling them that copyrighted premium content is being hosted on their servers without your permission. You can even automate that process.
How do you find out what sites are posting your content? Put a link to your site in the body content, and do a Google Search/Blog Search for articles linking to that site. To automate it, use Google Blog Search, because it offers RSS feeds. Anything with a link back to your site could go into a database with a link to the page; someone could look at it, and if it is the entire article, do a Whois lookup and send them an email.
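A rough sketch of that automation, assuming you have the RSS feed URL for such a link search (the feed URL and the notify callback below are placeholders):

import feedparser

FEED_URL = "https://example.com/blogsearch-feed?q=link:yoursite.com"  # placeholder
seen = set()  # persist this between runs in practice (e.g. a database table)

def check_for_copies(notify):
    """Poll the search feed and hand any new entries to a notify callback."""
    for entry in feedparser.parse(FEED_URL).entries:
        if entry.link not in seen:
            seen.add(entry.link)
            notify(entry.title, entry.link)   # e.g. queue for review / Whois lookup

check_for_copies(lambda title, link: print("New page linking back:", title, link))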
What makes you think adding CSS to something is going to stop people from copying it without that CSS? It's more likely that they are just copying the source of the content you are showing them and ignoring all the styling around it. For example, I use Tamper Data to look at all HTTP requests made by Firefox: if I can see it on the page, I can see it in the logs. Even with all the "protection" some sites try to put in place, it generally never works. I can grab what I want without using any screen capture/recording.
If you were serving FLVs, for example, I would easily be able to grab their source even if you overlaid them with some CSS. I think the best approach would be to contact the sites publishing your premium content and ask them to remove it. It's either that or watermark the actual content on the fly while sending it to the browser.

How do I extend BlogEngine.NET to collect visitor statistics?

I love BlogEngine, but from what I can see it does not collect the standard information about visitors that I would like to see (referrer, browser type, and so on).
When I log in as Admin I have a menu item named "Referrer". I can choose a weekday and then I'll be presented with one or two rows such as "google.com 4 hits", "itmaskinen.se 6 hits" and so on. But that's not what I want to see; I want to see where my visitors come from (country, IP if possible), how many visitors there are, and so on.
If any of you are familiar with BlogEngine.NET and can point me in the right direction to where I should put my own logging code, or if you know of a visitor-statistics extension that can do it for me, I would be really happy to know. I prefer an extension, because if I make changes to BlogEngine myself, they may break later updates I install.
BlogEngine.NET is blog software written in .NET, found here: http://www.dotnetblogengine.net/
And yes, I prefer to ask this question here rather than in the BlogEngine.NET forum; you know why. ;)
This isn't an extension, but it's what I use to collect all my BlogEngine.NET data, and it should be upgrade-safe.
When you log into the BlogEngine.NET admin screens you can go to Settings > Custom Code > Tracking Script; here you can put your http://www.google.com/analytics/ logging script. Google Analytics provides all the referrer, browser type, etc. information you were wanting. And what's nice is that you can then create additional accounts for other sites if you choose.
I use both Google Analytics and StatCounter to track visitor stats. I find that each one provides useful information that the other doesn't. And they're both free to a certain extent.
I place their JavaScript code in the site.master file of my custom BE.NET skin.
For Google Analytics I go a step further and pass the username of authenticated users as a custom variable. That way I can match user names up with the stats. To do this you can use the _setVar JavaScript method on the GA pageTracker, like so:
<script type="text/javascript">
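// Assumes the standard ga.js loader snippet has already been included on the page
// (for example via the Tracking Script setting mentioned in an earlier answer).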
var pageTracker = _gat._getTracker("UA-129049-25");
var userDefinedValue = '<%= System.Web.Security.Membership.GetUser() != null ? System.Web.Security.Membership.GetUser().UserName : "" %>';
pageTracker._setVar(userDefinedValue);
pageTracker._trackPageview();
</script>
Has anyone noticed that we miss all the hits coming from RSS readers? Syndication.axd does not run the analytics JavaScript, so we miss the vast majority of readers in the statistics and end up happily analyzing what is just not that important: the ad-hoc visitors.
For the vast majority of cases, Google Analytics does just fine. It all depends on how much data you want. For example, if you want to keep note of IP addresses and resolve them to get domain names, and also highlight all visits to your blog from, say, your coworkers at the company where you work, you'd have to write some custom code yourself. However, it's all fairly primitive - these sorts of things are easily achievable using ASP.NET.
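That custom code would live in your ASP.NET site; just to illustrate the reverse-DNS part of the idea (the sample IP and the example-company.com domain are placeholders), it is only a lookup and a suffix check:

import socket

def hostname_for(ip):
    """Reverse-resolve a visitor's IP to a host name, falling back to the raw IP."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip

visitor_ip = "203.0.113.5"                             # placeholder log entry
host = hostname_for(visitor_ip)
is_coworker = host.endswith(".example-company.com")    # substitute your own domain
print(host, "coworker" if is_coworker else "external")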
I set up statistics gathering on the IIS web site of my BlogEngine instance and then analyze the logs using WebLog Expert - http://www.weblogexpert.com.
It is more reliable than Google Analytics, since I see really ALL requests that come to my IIS, no matter whether it is a request to an .axd handler or to some static content. And once I found out that Google was fooling me on the number of visits, I started trusting my IIS statistics much more than Google's.
There is a widget which can be used to display visits and online-user statistics.
You can find it at the following links:
http://www.nuget.org/packages/Statistics/
http://www.itnerd.ir/post/2013/07/25/Visits-and-Online-Users-Statistics-widget-for-BlogEngine-2
but for the instructions, go to the second link.
