Parsing data: is an img link slower than an image from my own server?

I'm parsing data from another website, and I'm wondering whether it's better to download the images and serve them myself or to just link to the images on the site I parsed. Is linking to a remote image slower by default than serving the image from my own server?
I couldn't find an answer to this simple question. If the question is too open to discussion and doesn't belong here, please comment below so I can delete it.

Some rules of thumb:
Don't display content on your page that you 'source' from another site without the other site's permission. ('Share this' links provided by YouTube are okay; directly linking to the .flv file of someone's video to display it on your site is not.)
Don't copy content from other domains onto your domain without their permission first (doing so would be a copyright violation).
So to answer your question: you should copy the content onto your own domain/host, but only if they have given permission for this kind of use.
Edit: I am interpreting your question as "I am taking content from another website [and putting it on my own] and I am wondering if I should link directly to their content (<img> tags pointing to the other domain) or if I should download/copy the content to my website and have my server handle everything?"
The "technical" answer is "it depends on how good your host is compared to the other host when serving content to the average visitor". Compare a page run by Google vs. the same thing run on a home server behind a 56k modem. It matters if you have broadband, but if you're on a 33.3k modem it doesn't.

Related

Check on which pages an image is used?

Is there a certain way to check which pages on a website use a specific image?
Say I have an image which I no longer use on a page, so I'd like to delete it from my server. But I'm not entirely sure whether it's still being shown on other pages. Is there a way to check?
You can hook your website into Google Webmaster Tools and wait a little; after a while, 404 errors will appear there. This way you can track unused resources and dead ends.
This includes images.
There is a better way if you have direct access to the web server.
Visit every page on your website, or let Google crawl it.
You can then sort the image files by last-access time; the ones that haven't been accessed recently are not in use.
You have to make sure the images are actually requested by the pages, so I would use a history-less, cache-less session.
How do I sort the files by timestamp in Unix?
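On Unix, ls -ltu lists files by last-access time. For a reusable version, here is a minimal sketch in Python; the image directory is a hypothetical stand-in, and note that filesystems mounted with the noatime option never update access times, in which case this approach won't work:

    from pathlib import Path

    # Hypothetical document root containing the served images.
    WEB_ROOT = Path("/var/www/html/images")

    # Sort oldest-accessed first: files at the top of the listing have
    # not been requested since the site was fully crawled.
    files = sorted(
        (p for p in WEB_ROOT.rglob("*") if p.is_file()),
        key=lambda p: p.stat().st_atime,
    )

    for path in files:
        print(path.stat().st_atime, path)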

How to host a SWF file from any page

Is there a way to host an SWF game, like Bloons Tower Defense, on my ASP.NET page? Can you somehow cache a page containing the .SWF file and then play it on that page?
Also, is it legal to take any SWF file you want and play it on a different page if you have credited the author?
I'm very new to this topic and want to create a website that will search for all available online games and then play them on my website's pages.
How would I go about this? I have thought of fetching foreign pages as strings through the WebRequest and WebResponse classes, searching for the .swf, then downloading it to my server and rehosting it on a page, but that seems like too much of a hassle and like it won't work.
What approach should I use, and how should I go about it?
If not crazy, this is at least illegal. You can't host stolen content from other sites without the written approval of the copyright owner, or without conforming to its licence.
Talk to the site owner and ask him for his content, and read the licence terms if there are any. This applies to any content you want from any other source.
To answer your question: don't do it! And don't do it to yourself!

IMDB Poster URL Returns Referral Denied

In my Ruby on Rails app, I use the imdb gem (https://rubygems.org/gems/imdb) to search for a movie by title, grab the poster URL, and add it to the movie model I have in my database. Then in my view, I put that URL in an image source tag and display the image to the user.
I don't have any problems when I'm running my application locally, but when I deploy it to Heroku, sometimes a few images render successfully, but for the most part they aren't displayed properly. I've tried multiple browsers, and as it turns out, when I try to load the image I get a "Referral Denied" message saying:
You don't have permission to access "[poster url here]" on this server. Reference #[some ref. number here]
How would I go about fixing this? I'm guessing the IMDB server is denying access either because I'm making too many requests from my application or because my application doesn't have the necessary credentials to get the data, or maybe some combination of both. Is there a way around this at all?
IMDB blocks direct linking (hotlinking) of images from their site on other sites; I think this previous question covers the topic.
The easiest way around this is to download the image and host it yourself rather than linking to IMDB's copy. Alternatively, you could investigate alternative movie DBs to see if they can offer what you want; the answers to this question on IMDB APIs list a few. The Movie DB API looks like a good bet.
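If you have permission to store the posters, a minimal sketch of the download-and-rehost idea looks like this; it's in Python for illustration (the app in the question is Rails), and the URL and paths are hypothetical:

    import urllib.request
    from pathlib import Path

    # Hypothetical poster URL returned by the movie lookup.
    poster_url = "https://images.example/posters/some-movie.jpg"

    # Save a local copy under the app's public assets directory.
    dest = Path("public/posters") / Path(poster_url).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(poster_url) as resp:
        dest.write_bytes(resp.read())

    # The view then references /posters/some-movie.jpg on your own host,
    # so the visitor's browser never sends a request (or Referer) to IMDB.
    print(f"saved to {dest}")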

Google Traffic and CDN

I have moved all the images on a website to a content delivery network on another domain and, as a result, lost all my Google image search traffic. Is the damage permanent, or will the traffic return? Will images on another domain still allow my site to appear in image search results? Maybe I should have moved the images gradually? Any advice?
It's gone because Google can no longer find the images it had previously spidered. Google will of course find the new locations, but there is no guarantee it will rank your images as highly as before.
The best way to recover is to implement 301 redirects in your .htaccess file. Depending on how you've moved the images, and on whether they sit in the same folder structure or a different one, it may take a bit of work to fix.
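If only the host changed and the folder structure is intact, one rewrite rule can cover every image. A sketch for an Apache .htaccess, with a hypothetical CDN domain and path:

    RewriteEngine On
    # Permanently redirect old image URLs to the (hypothetical) CDN domain,
    # preserving the path so Google can transfer the old URLs' standing.
    RewriteRule ^images/(.*)$ https://cdn.example.com/images/$1 [R=301,L]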

Content Watermarking

We have members-only paid content that is frequently copied and republished without our permission.
We are trying to 'watermark' our content by including each customer's user id in a fake CSS class, for example <p class='userid_1234'> (except not so obvious, of course :), which would help us track down the source of the copying; we place that class somewhere in the article body.
The problem is that by including user-specific information in an article, the article content becomes ineligible for caching, because it is now unique to each user.
This bumps the page load time from ~0.8 ms to ~2.5 s for each article page view.
Does anyone know of any watermarking strategies that can still be used with caching?
Alternatively, what can be done to speed up database access? (Ha, ha, that's just a tiny topic, I'm sure...)
We're using the ExpressionEngine CMS, but I'd like to hear about any strategies; they don't have to be EE-specific.
If you're talking about images, then you could use PHP to add a watermark to them:
How can I add an image onto an image in PHP like a watermark
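The linked question uses PHP's GD library; here is the same idea sketched in Python with the Pillow library (a stand-in for illustration), with hypothetical file names:

    from PIL import Image

    # Hypothetical inputs: the photo to protect and a semi-transparent mark.
    base = Image.open("photo.jpg").convert("RGBA")
    mark = Image.open("watermark.png").convert("RGBA")

    # Composite the mark into the bottom-right corner, 10px from the edges.
    pos = (base.width - mark.width - 10, base.height - mark.height - 10)
    base.alpha_composite(mark, dest=pos)

    base.convert("RGB").save("photo_watermarked.jpg")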
It's a tool to help track down the lazy copiers who just copy the source code as-is. This is not preventative, nor is it a deterrent. – Ian
Going by your comment above, you are happy for users to copy your content, just not without the formatting etc. So what you could do is provide users with an embed-style piece of source code for that particular content, just like YouTube does with videos. Into that embed code you could add your own links back to your site, utilize your own CSS, and so on.
That way you can still allow members to use the content, but it will always come out the way you intended, with links back to your site.
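For illustration, the snippet you hand out might look something like this (all URLs hypothetical), loading the article from your server so the markup and back-links stay under your control:

    <!-- Hypothetical embed code given to members -->
    <iframe src="https://your-site.example/embed/article/1234"
            width="600" height="400" frameborder="0"></iframe>
    <p>Source: <a href="https://your-site.example/article/1234">Your Site</a></p>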
You could always cache a version that uses a special placeholder string, like #!username!#, and then fill it in later with PHP based on which user is viewing it.
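A minimal sketch of that idea (in Python rather than PHP, but the technique is language-agnostic); the cache here is a plain dict standing in for whatever cache layer you use:

    # The cached copy is identical for every user and carries a placeholder.
    article_cache = {
        "article:42": "<p class='userid_#!username!#'>Premium content...</p>",
    }

    def render_article(article_id: int, user_id: int) -> str:
        cached_html = article_cache[f"article:{article_id}"]
        # One string replacement per request; the cached copy stays shared.
        return cached_html.replace("#!username!#", str(user_id))

    print(render_article(42, 1234))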
Another way, I believe, is to switch from caching on the server to letting the browser cache the page locally for a while. That way it is only cached per user, and it still reduces the calls to your database. Because an article is pretty static, you could let the local machine cache it and pull in the comments via JavaScript.
This last one is probably not what you are really looking for, but I'm going to come out and say it anyway. You could stop treating your users like thieves and instead treat the thieves as thieves: go to the person hosting the servers your content is on and send them an email telling them that copyrighted premium content is being hosted on their servers without your permission. You can even automate that process.
How do you find out which sites are posting your content? Put a link to your site in the body content and do a Google Search/Blog Search for articles linking to it. To automate it, use Google Blog Search, because it offers RSS feeds. Anything with a link back to your site could go into a database along with a link to the offending page; someone could review it, and if it is the entire article, do a Whois lookup and send the host an email.
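A rough sketch of that automation in Python, using the feedparser library and SQLite; the feed URL is a placeholder for whatever blog-search RSS feed you subscribe to:

    import sqlite3
    import feedparser

    # Placeholder: an RSS feed of results for pages linking to your site.
    FEED_URL = "https://blogsearch.example/rss?q=link:your-site.example"

    conn = sqlite3.connect("suspects.db")
    conn.execute("CREATE TABLE IF NOT EXISTS hits (url TEXT PRIMARY KEY, title TEXT)")

    for entry in feedparser.parse(FEED_URL).entries:
        # Record each page for a human to review; full copies get a
        # Whois lookup and a takedown email.
        conn.execute(
            "INSERT OR IGNORE INTO hits (url, title) VALUES (?, ?)",
            (entry.link, entry.title),
        )
    conn.commit()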
What makes you think adding CSS to something is going to stop people from copying it without that CSS? It's more likely that they are just copying the source of the content you are showing them and ignoring all the styling around it. For example, I use Tamper Data to look at all the HTTP requests made by Firefox; if I can see it on the page, I can see it in the logs. Even with all the "protection" some sites try to put in place, it generally never works: I can grab what I want without using any screen capture/recording.
If you were serving .flv files, for example, I would easily be able to grab the source even if you overlaid it with some CSS. I think the best approach would be to contact the sites publishing your premium content and ask them to remove it. It's either that or watermark the actual content on the fly while sending it to the browser.
