Content Watermarking and Caching

We have members-only paid content that is frequently copied and republished without our permission.
We are trying to 'watermark' our content by including each customer's user ID in a fake CSS class, for example <p class='userid_1234'> (except not so obvious, of course), and placing that class somewhere in the article body so it can help us track the source of the copying.
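For illustration, the tagging amounts to something like this simplified sketch (the real class name is derived less obviously, e.g. from a keyed hash of the user ID rather than the raw ID):

    <?php
    // Simplified sketch of the watermark idea. In practice the class name is
    // derived less obviously, e.g. an HMAC of the user ID with a server-side
    // secret, so copiers can't spot or strip it easily.
    define('SITE_SECRET', 'replace-with-a-real-secret');

    function watermark_class(int $userId): string {
        return 'c' . substr(hash_hmac('sha256', (string) $userId, SITE_SECRET), 0, 12);
    }

    $userId    = 1234;  // the logged-in member's ID
    $paragraph = 'Members-only article text...';
    echo '<p class="' . watermark_class($userId) . '">' . htmlspecialchars($paragraph) . '</p>';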
The problem is that including user-specific information in an article makes it ineligible for caching, because the content is now unique to each user.
This bumps the page load time from ~0.8 ms to ~2.5 seconds for each article page view.
Does anyone know of any watermarking strategies that can still be used with caching?
Alternatively, what can be done to speed up database access? (Ha, that's just a tiny topic, I'm sure...)
We're using the ExpressionEngine CMS, but I'd like to hear about any strategies; they don't have to be EE-specific.

If you're talking about images, you could use PHP to add a watermark to them.
See: How can I add an image onto an image in PHP like a watermark
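A minimal GD sketch, assuming a JPEG photo and a transparent PNG watermark (the file names are placeholders):

    <?php
    // Stamp watermark.png onto photo.jpg and send the result to the browser.
    $photo     = imagecreatefromjpeg('photo.jpg');
    $watermark = imagecreatefrompng('watermark.png');

    // Place the watermark in the bottom-right corner with a small margin.
    $margin = 10;
    $dstX = imagesx($photo) - imagesx($watermark) - $margin;
    $dstY = imagesy($photo) - imagesy($watermark) - $margin;

    // imagecopy() blends the PNG's alpha channel onto the truecolor photo.
    imagecopy($photo, $watermark, $dstX, $dstY, 0, 0, imagesx($watermark), imagesy($watermark));

    header('Content-Type: image/jpeg');
    imagejpeg($photo, null, 90);

    imagedestroy($photo);
    imagedestroy($watermark);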
It's a tool to help track down the lazy copiers who just copy the source code as-is; this is not preventative, nor is it a deterrent. – Ian
Going by your comment above, you're happy for users to copy your content, just not stripped of your formatting and links. So what you could do is provide users with an embed-style snippet for that particular piece of content, much like YouTube does with videos. Into that embed code you can add your own links back to your site, use your own CSS, and so on.
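For example, the embed generator could be as simple as this sketch (the URLs and function name are made up):

    <?php
    // Hypothetical embed-code generator: members paste this snippet elsewhere,
    // and the content always arrives with your styling and a link back.
    function embed_snippet(int $articleId): string {
        $embedUrl   = 'https://example.com/embed/article/' . $articleId;   // hypothetical endpoint
        $articleUrl = 'https://example.com/articles/' . $articleId;
        return '<iframe src="' . htmlspecialchars($embedUrl) . '" width="600" height="400" frameborder="0"></iframe>'
             . '<p><a href="' . htmlspecialchars($articleUrl) . '">Read the full article on example.com</a></p>';
    }

    // Show the snippet escaped so members can copy and paste it.
    echo htmlspecialchars(embed_snippet(42));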
That way you can still allow the members to use the content but it will always come out the way you intended it with links back to your site.
Thanks

You could always cache a version that uses a special string, like #!username!#, and then later fill it in with PHP based on which user is viewing it.
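A rough sketch of the fill-in step (the cache helper below is a stand-in for whatever your CMS or cache layer actually provides):

    <?php
    session_start();

    // The cached article is identical for every user and contains the
    // placeholder; the per-user value is swapped in at output time, so the
    // expensive, database-heavy part stays cacheable.
    function get_cached_article(int $articleId): string {
        // Stand-in for a read from Memcached/APCu/file cache keyed by article ID.
        return '<p class="#!username!#">Members-only article text...</p>';
    }

    $html = get_cached_article(42);
    echo str_replace('#!username!#', htmlspecialchars($_SESSION['username'] ?? 'guest'), $html);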
Another option is to switch from caching on the server to letting the browser cache the page locally for a short while. That way it is only cached per user, and it still reduces the calls to your database. Because an article is fairly static, you can let the visitor's browser cache it and pull in the comments via JavaScript.
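The browser-cache route mostly comes down to sending the right headers from PHP (the values here are arbitrary):

    <?php
    // Let each visitor's browser cache the article privately for ten minutes,
    // so repeat views don't hit PHP or the database at all. 'private' keeps
    // shared proxies from storing the per-user version.
    header('Cache-Control: private, max-age=600');
    header('Vary: Cookie');  // the response differs per logged-in user

    echo '<p class="userid_1234">Members-only article text...</p>';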
This last one is probably not what you are really looking for, but I'm going to come out and say it anyway: you could stop treating your users like thieves and instead treat the thieves as thieves. Go to the company hosting the servers your content is republished on and send them an email telling them that copyrighted premium content is being hosted on their servers without your permission. You can even automate that process.
How do you find out which sites are reposting your content? Put a link back to your site in the body of each article and do a Google Search/Blog Search for pages linking to it. To automate it, use Google Blog Search, since it offers RSS feeds: anything with a link back to your site goes into a database along with a link to the offending page, someone reviews it, and if it is the entire article, you do a Whois lookup and send them an email.
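A rough automation sketch (the feed URL is a placeholder for whatever link-search feed you end up using):

    <?php
    // Poll a search feed for pages linking back to your site and log candidates
    // for manual review. The feed URL below is a placeholder.
    $feedUrl = 'https://example.com/blogsearch/feeds?q=link:yoursite.com&output=rss';

    $feed = simplexml_load_file($feedUrl);
    if ($feed === false) {
        exit("Could not load feed\n");
    }

    foreach ($feed->channel->item as $item) {
        // In practice: insert these into a review table instead of printing them.
        echo (string) $item->title . "\n" . (string) $item->link . "\n\n";
    }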

What makes you think adding CSS to something is going to stop people from copying it without that CSS? More likely they are just copying the source of the content you are showing them and ignoring all the styling around it. For example, I use Tamper Data to look at all the HTTP requests made by Firefox; if I can see it on the page, I can see it in the logs. Even with all the "protection" some sites try to put in place, it generally never works. I can grab what I want without using any screen capture or recording.
If you were serving FLVs, for example, I could easily grab the source even if you overlaid it with some CSS. I think the best approach would be to contact the sites publishing your premium content and ask them to remove it. It's either that or watermark the actual content on the fly while sending it to the browser.

Related

AJAX website, problems with history and SEO

I have a few problems I could use some input on.
I have a website where all the content is loaded with AJAX, and it works quite well. There are a few issues with that approach, though, mostly UX ones:
Users cannot copy the URL of the loaded content, since the address bar only ever shows the default URL.
SEO takes a hit, since the content cannot be crawled; the sitemap is only about two pages, even though a normal user browsing the site will see a lot more.
Browser history (back and forward) does not work; hitting the back button goes to the main page.
Now, I have searched and read a lot.
Google has a workaround that seems to allow the site to be crawled if you use # in your URLs, but it does not work with an empty URL, which leads me to...
Manipulating the browser history with pushState/popstate.
I have tried to get it working, but I just can't get my head around which approach is best. Should I redo all my AJAX?
Right now I have two div boxes, and I switch between them with loaded content to get a nice, smooth transition between pages. My front page is basically just two empty divs, nothing else. It works, but I get the feeling it is a pretty bad way to do it. Thoughts?
If anyone knows some good guides, feel free to share them; as I said, I have read a lot, but I might have missed some golden ones.
Google does execute some JavaScript when indexing and ranking pages. However, text that is not immediately visible to users is demoted when establishing content relevancy.
As for manipulating the browser history with pushState/popstate: it is very unlikely Google will trust your content if you need to use those tricks, and content that is not trusted is not ranked.
UPDATE: Manipulating browser history with pushState is ok.
Moreover, if your URLs change all the time, Google won't appreciate it, unless you manage to set canonical links.
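For those canonical links, it's a single tag in the page head; emitted from PHP it could look like this sketch (the URL is a placeholder):

    <?php
    // Point Google at the crawlable, non-hash URL as the authoritative one,
    // even if the address the visitor sees changes client-side.
    $canonical = 'https://example.com/articles/42';
    echo '<link rel="canonical" href="' . htmlspecialchars($canonical) . '">';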

How can I check which pages an image is used on?

Is there a way to check which pages on a website use a specific image?
Say I have an image which I no longer use on a certain page, so I'd like to delete it from my server, but I'm not entirely sure whether it's being used on other pages. Is there a way to check whether it's still being shown elsewhere?
You can hook your website up to Google Webmaster Tools and wait a little; after a while, 404 errors will appear there. This way you can track down unused resources and dead ends, including images.
There is a better way if you have direct access to the web server.
Visit every page on your website, or let Google crawl it.
You can then sort the image files by their last-accessed time; the ones that have not been accessed recently are not in use.
You have to make sure the pages actually request the images, so I would crawl with a history-less, cache-less session.
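A rough PHP sketch of that check (it assumes the filesystem records access times; many servers mount with noatime, in which case this won't tell you much):

    <?php
    // List images that haven't been accessed since the cutoff, oldest first.
    // Run this after crawling every page, so images that are still in use
    // carry fresh access times. Path and cutoff are placeholders.
    $cutoff     = strtotime('-30 days');
    $candidates = [];

    foreach (glob('/var/www/site/images/*.{jpg,jpeg,png,gif}', GLOB_BRACE) as $file) {
        $atime = fileatime($file);
        if ($atime !== false && $atime < $cutoff) {
            $candidates[] = ['file' => $file, 'atime' => $atime];
        }
    }

    usort($candidates, fn($a, $b) => $a['atime'] <=> $b['atime']);

    foreach ($candidates as $c) {
        echo date('Y-m-d', $c['atime']) . '  ' . $c['file'] . "\n";
    }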
How to sort the files according to the time stamp in unix?

How can I set up custom ImageResizer urls?

I'm just getting started with ImageResizer and I'm stuck on what seem like totally basic questions:
I have an uploader that I use to put images into a directory that's not directly accessible over HTTP. (If I just put an image at, say, /images/myimage.jpg, then anyone could access it by simply asking for it, whereas I want to limit access via thumbnails, watermarks, etc.) So I want to put it at /offlimits/myimage.jpg but be able to serve it up at /public/images/myimage.jpg.
I don't really want to dump all the images into the same offlimits folder, because putting lots of files in one folder makes Windows unhappy. But I don't want to expose the details of that subdirectory structure either, so where do I put the mapping between the public-facing URL and the actual image location?
Most generally, I don't necessarily want an image extension at all, so I'd like to say /public/image_id?width=100... and have this map to /offlimits/sub1/sub2/sub3/image_id.jpg.
Can anyone advise about how to set this up?
Three part questions are generally frowned upon here at SO, but I'll bite anyhow :)
If you're allowing access to images based on authentication, then you need to use ASP.NET's URL Authorization feature. ImageResizer supports URL Authorization rules. If you just don't want the source files available, and want to force them resized or watermarked, read the docs on how to implement arbitrary rules like this.
You can rewrite image paths to your heart's content with Config.Current.Rewrite, which works just like the PostRewrite event mentioned earlier. Just remember you'll have to keep it all straight in your head later.
Image extensions are good things; don't fight them. They let the server figure out the right MIME type to send and help errant browsers recover from related bugs. They prevent issues on several platforms and make the Save As dialog work. They also significantly improve server efficiency, since the handling logic doesn't have to wait as long. This is particularly relevant because of the design of the IIS/ASP.NET module system.

How does Facebook grab the text of the article when pasting the url?

I'm a bit curious about this useful Facebook functionality. When I paste a URL into the 'What's on your mind?' box, it almost perfectly grabs the body of the article. How does Facebook do this?
Thanks!
It's part of how Facebook Share works.
The URL Linter is pretty helpful as well. For example, if we test it with this very question, you can scroll down and see where it's getting the data from
"Hello, Im a bit curious about this
Facebook's useful functionality. When
I paste a URL on the 'What's on your
mind?' box, it almost perfectly gets
the body of the article. How does
Facebook do this?" extracted from
<description> or first <p>
I can't speak for Facebook specifically, but there are entire companies dedicated to providing that kind of service. For example, Reddit recently outsourced preview generation to a 3rd party.
So, essentially, there's a certain amount of automation and a large amount of manual tweaking and configuration.
You might also look at the Readability tool, which extracts the main content of a web page - that might provide some insight into the processes involved.
You can put your own entries into the shared content, by using the things described in the OpenGraph protocol on Facebook developer website.
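For example, the basic Open Graph tags look like this (emitted from PHP here, but plain HTML in the page head works just as well; the values are placeholders):

    <?php
    // Basic Open Graph tags: Facebook reads these instead of guessing at the
    // page structure when someone shares the URL.
    $og = [
        'og:title'       => 'My article title',
        'og:type'        => 'article',
        'og:url'         => 'https://example.com/articles/42',
        'og:image'       => 'https://example.com/images/preview.png',
        'og:description' => 'A short summary shown under the shared link.',
    ];

    foreach ($og as $property => $content) {
        printf("<meta property=\"%s\" content=\"%s\">\n",
               htmlspecialchars($property), htmlspecialchars($content));
    }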
It basically goes to the page and starts sniffing for IDs in the HTML marked as "content" or "main" (and probably a few other common terms people use when building a site to mark where the menus, the main body, the main article, and so on are placed, whether the page is built dynamically or not).
For example, look at the source of this page itself: you'll see an area that begins with div id="content".
Bingo, that's where the Facebook sniffer starts. It then grabs what is probably the first picture it finds within that area, as well as the first bit of text.

Get URI fragment (hash) to affect SEO? Get indexed by SEs?

I am building a forum site where each post is retrieved onto the same page as the listing via AJAX. When a new post is shown, the URI fragment is changed (e.g. .php#1_This-is-the-first-post), and the title and meta tags are changed as well.
My question is this: I have read that search engines aren't able to use #these-words, so my entire site won't be indexable (it will look like a single page).
What can I do to get around this, or at least make my sub-pages indexable?
NOTE: I have built almost all of the site, so radical changes would be hard. SEO is my weakest geek skill.
Add non-AJAX versions of every page and link to them from your popups as "permalinks" (or whatever you want to call them). As it stands, your pages are not only unavailable to search engines, they also can't be bookmarked or emailed to friends. I recently worked with some designers on a site and talked them out of an AJAX-only design; they ended up putting article "teasers" in popups and making users go to a page with a bookmarkable URL to read the complete texts.
As difficult as it may be, the "best" answer may be to re-architect your site to use the hash-tag URL scheme more sparingly.
Short of that, I'd suggest the following:
Create an alternative, non-hash-based URL scheme. This is a must.
Create a sitemap that allows search engines to find your existing pages through the new URL scheme (see the sketch after this list).
Slowly port your site over. You might consider adding these deeper links on the page, or encouraging users to share those links instead of the hash-based ones, etc.
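A minimal sitemap generator, assuming you can enumerate the non-hash URLs (hard-coded here; in practice they would come from your forum's database):

    <?php
    // Emit a minimal sitemap.xml for the new, crawlable URL scheme.
    $urls = [
        'https://example.com/forum/1/this-is-the-first-post',
        'https://example.com/forum/2/another-post',
    ];

    header('Content-Type: application/xml; charset=utf-8');
    echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($urls as $url) {
        echo '  <url><loc>' . htmlspecialchars($url) . "</loc></url>\n";
    }
    echo "</urlset>\n";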
Hope this helps!

Resources