Email body images - base64 or link to source? - image

I got a question, since i know that mail providers look up on many different particulars in emails and then decides if email is spam or not,
do you think it's safe to send if i got:
inside the email body, is this assumed for spam?
Or is it safe to use base64 encoded images in email body , instead of using link to the image source?
thanks

Nope, I send a ton of emails with images in the body which regularly have a spamassassin score of 0.3 out of 4. Use your regular <img src="x.jpg"/>. Just make sure you have a good amount of real text.
If you're a 'trusted sender', meaning you send a high volume of emails that people don't mark as spam, you can get away with all images. Examples of companies that do this off the top of my head are Seamless Web and Lego.
I could see base64 code either being stripped or setting off the spam alarm though, based on how many hack scripts there are that use base64 to obfuscate code and then run it.

Related

Decoding a picture from a gps tracker

I'am developing a server for a GPS Tracker that can send pictures taken by a camera connected to it, inside a vehicle.
The problem is that I follow every step in the manual and I can't still decode the bytes sent by the tracker into a picture:
I receive the picture in packages separated by the headers and "tails", each one. When I receive the bytes I convert them into hexadecimals as the manual expecifies, then I have to remove the headers and "tails" and apparently after joinned the remain data and saved as a .jpeg, the image should appear, but it doesn't.
the company name is "Toplovo" from China. Have anyone else solve something similar?
Are the line feeds part of your actual data? Because if so I doubt that's supposed to happen.
Otherwise, make sure you're writing the file in binary mode. In some languages this matters. You didn't really specify, but make sure you're not in text mode. Also make sure you're not using any datatypes unsuited for hexidecimal values (again, we don't even know what language you're using, so, it's kind of hard to give specific suggestions.)

irrelevant messages

i am thinking of making a website..
bt how can i make sure that when a user who is asking some question is nt using any abusive language or the message is totally subject oriented..
i m nt talking about spams..i know about captcha and all..
what i am asking is how can i keep an eye on human activity[in this case the messages sent] and at the same time providing the user his complete privacy!
One word... manually.
They're on the web, they already don't have complete privacy.
Offer the community the means to police themselves, whether by explicitly appointing moderators (like most bulletin boards), allowing them to decide who they can and cannot see (like social media sites), or collaborative moderation (like here).
You can set up a system where comments/posts must be approved by a moderator before being allowed to be posted. I believe Wordpress can do this.
There are curse-word filtering libraries available in most languages, usually complete with the ability to customize the words that are filtered out.
In order to filter spam, there are things like bayesian spam filters which attempt to determine whether a message is spam based on keywords in the response. This really isn't something you would want to attempt to do yourself.
Another thing to look at is Markov Chains. They are designed to generate strings of seemingly valid text based on the probability that any given word is followed by any other particular word. Using a reverse process you can attempt to determine if a string of text is valid by checking whether the words used are following by other "on-topic" words.
This would be very difficult as well.
In order to keep the privacy of the users, you could use combinations of these three tests to create a threshold. That is, you will examine no messages unless they reach a high curse/spam/off-topic score. At that point, those messages will be manually checked to see if they are appropriate.
There currently is no way to have a 100% automated process that won't block valid messages and let invalid ones through.
how can i keep an eye on human activity
Your answer lies here. I don't quite understand what you're getting at about privacy though.

How would you start automating my job?

At my new job, we sell imported stuff. In order to be able to sell said stuff, currently the following things need to happen for every incoming shipment:
Invoice arrives, in the form of an email attachment, Excel spreadsheet
Monkey opens invoice, copy-pastes the relevant part of three columns into the relevant parts of a spreadsheet template, where extremely complex calculations happen, like =B2*550
Monkey sends this new spreadsheet to boss (email if lucky, printer otherwise), who sets the retail price
Monkey opens the reply, then proceeds to input the data into the production database using a client program that is unusable on so many levels it's not even worth detailing
Monkey fires up HyperTerminal, types in "AT", disconnect
Monkey sends text messages and emails to customers using another part of the horrible client program, one at a time
I want to change Monkey from myself to software wherever possible. I've never written anything that interfaces with email, Excel, databases or SMS before, but I'd be more than happy to learn if it saves me from this.
Here's my uneducated wishlist:
Monkey asks Thunderbird (mail server perhaps?) for the attachment
Monkey tells Excel to dump the spreadsheet into a more Jurily-friendly format, like CSV or something
Monkey parses the output, does the complex calculations
Monkey sends a link to the boss with a web form, where he can set the prices
Monkey connects to the database, inserts data
Monkey spams costumers
Is all this feasible? If yes, where do I start reading? How would you improve it? What language/framework do you think would be ideal for this? What would you do about the boss?
There are lots of tools that you could apply here, including Python, Excel macros, VB Script, etc.
In this case, PowerShell seems like an excellent choice, as it naturally combines COM access to Office, .NET, and scripting, and is all-around-awesome. If you already know a suitable technology, you'll get the job done fastest with what you know. Othewise, PowerShell.
(C# 4.0 is also reasonable, although earlier versions suck when interacting with Office's COM interfaces.)
Don't get carried away trying to solve the whole problem at once. Start by picking a small, easy part that gets you a lot of value right away. You are more likely to succeed this way. (To get your boss to agree, you need success fast. If you aren't telling your boss, you need success even faster!). Once you have that done, you can use your new-found free time (maybe only a few minutes per day) to extend your tools and skills to the next bite-sized piece. Success will accelerate success.
In time you will replace monkey with code, and either get a promotion or quit in disgust and get a better job.
The big parts are Excel and email. Excel can be handled with either COM or some sort of interaction with OpenOffice.org. Email, well, there's dozens of ways to do it. My hammer of choice is Python, along with pywin32 or PyUNO, and poplib and smtplib.
Boss... will always be boss. Not always very much you can do about the icky wetware stuff.
I'd start by asking myself the following questions
Does the invoice have to come via email or can there be a web form where the users can enter the data? There is a easy way to put a form on google docs so you can download the response in excel format in a common format set by you. I'm sure there are better ways too.
Does the boss need to create a new spreadsheet of can you provide him with a database app where he can view your form, enter the price, check "approved" and have that fire off the process that puts it in the production db?
Can the interface to the client program be worked around? Can you have some other app call the client
Can the text to the end user be sent by you and not the client app? If so, ca you automate that part
Just some thoughts.
One solution to #1 is to send email to a Unix server (instead of Exchange) and use procmail to dump out attachments (see http://gimpel.ath.cx/howto_fetch_proc_metamail.html for an example of how)
As for boss, have a nice web page which you can email him a link to. And send him a short email (3 lines or less) telling him that using that page will save him 30 mins of work over the course of a month and you 2 hours of work in a month. Just be prepared to back up the #s.
However, very high level, un less you're prepared to do the whole automation thing on your own time, you better be able to sell your boss that overall time savings x6 months are less than time to develop this. 'cause may be monkey salary in his eyes is low enough that the cost of software is just not worth it - and sadly he just may be right depending on how complicated a bulletproof robust solution is.
As I noted above, your last question is probably the most salient. It is probably best approached as a personal skunkwork project where you show the boss a completed product one day, collect your innovation bonus and then get fired because a stupider monkey can now do your job instead of you.

Techniques to reduce data harvesting from AJAX/JSON services

I was wondering if anyone had come across any techniques to reduce the chances of data exposed through JSON type services on the server (intended to supply AJAX functions) from being harvested by external agents.
It seems to me that the problem is not so difficult if you had say a Flash client consuming the data. Then you could send encrypted data to the client, which would know how to decrypt it. The same method seems impossible with AJAX though, due to the open nature of the Javascript source.
Has anybody implemented a clever technique here?
Whatever the method, it should still allow a genuine AJAX function to consume the data.
Note that I'm not really talking about protecting 'sensitive' information here, the odd record leaking out is not a problem. Rather I am thinking about stopping a situation where the whole DB is hoovered up by bots (either in one go, or gradually over time).
Thanks.
First, I would like to clear on this:
It seems to me that the problem is not
so difficult if you had say a Flash
client consuming the data. Then you
could send encrypted data to the
client, which would know how to
decrypt it. The same method seems
impossible with AJAX though, due to
the open nature of the Javascrip
source.
It will be pretty obvious the information is being sent encrypted to the flash client & it won't be that hard for the attacker to find out from your flash compiled program what's being used for this - replicate & get all that data.
If the data does happens to have the value you are thinking, you can count on the above.
If this is public information, embrace that & don't combat it - instead find ways to capitalize on it.
If this is information that you are only exposing to a set of users, make sure you have the corresponding authentication / secure communication. Track usage as others have said, and have measures that act on it,
The first thing to prevent bots from stealing your data is not technological, it's legal. First, make sure you have the right language in your site's Terms of Use that what you're trying to prevent is actually disallowed and defensible from a legal standpoint. Second, make sure you design your technical strategy with legal issues in mind. For example, in the US, if you put data behind an authentication barrier and an attacker steals it, it's likely a violation of the DMCA law. Third, find a lawyer who can advise you on IP and DMCA issues... nice folks on StackOverflow aren't enough. :-)
Now, about the technology:
A reasonable solution is to require that users be authenticated before they can get access to your sensitive Ajax calls. This allows you to simply monitor per-user usage of your Ajax calls and (manually or automatically) cancel the account of any user who makes too many requests in a particular time period. (or too many total requests, if you're trying to defend against a trickle approach).
This approach of course is vulnerable to sophisticated bots who automatically sign up new "users", but with a reasonably good CAPTCHA implementation, it's quite hard to build this kind of bot. (see "circumvention" section at http://en.wikipedia.org/wiki/CAPTCHA)
If you are trying to protect public data (no authentication) then your options are much more limited. As other answers noted, you can try IP-address-based limits (and run afoul of large corporate proxy users) but sophisticated attackers can get around this by distributing the load. There's also likley sophisticated software which watches things like request timing, request patterns, etc. and tries to spot bots. Poker sites, for example, spend a lot of time on this. But don't expect these kinds of systems to be cheap. One easy thing you can do is to mine your web logs (e.g. using Splunk) and find the top N IP addresses hitting your site, and then do a reverse-IP lookup on them. Some will be legitimate corporate or ISP proxies. But if you recognize a compeitor's domain name among the list, you can block their domain or follow up with your lawyers.
In addition to pre-theft defense, you might also want to think about inserting a "honey pot": deliberately fake information that you can track later. This is how, for example, maps manufacturers catch plaigarism: they insert a fake street in their maps and see which other maps show the same fake street. While this doesn't prevent determined folks from sucking out all your data, it does let you find out later who's re-using your data. This can be done by embedding unique text strings in your text output, and then searching for those strings on Google later (assuming your data is re-usable on another public website). If your data is HTML or images, you can include an image which points back to your site, and you can track who is downloading it, and look for patterns you can use to bust the freeloaders.
Note that the javascript encryption approach noted in one of the other answers won't work for non-authenticated sessions-- an attacker can simply download the javascript and run it just like a regular browser would. Moral of the story: public data is essentially indefensible. If you want to keep data protected, put it behind an authentication barrier.
This is obvious, but if your data is publicly searchable by search engines, you'll both need a non-AJAX solution for them (Google won't read your ajax data!) and you'll want to mark those pages NOARCHIVE so your data doesn't show up in Google's cache. You'll also probably want a white list of search engine crawler IP addreses which you allow into your search-engine-crawlable pages (you can work with Google, Bing, Yahoo, etc. to get these), otherwise malicious bots could simply impersonate Google and get your data.
In conclusion, I want to echo #kdgregory above: make sure that the threat is real enough that it's worth the effort required. Many companies overestimate the interest that other people (both legitimate customers and nefarious actors) have in their business. It might be that yours is an oddball case where you have particularly important data, it's particularly valuable to obtain, it must be publicly accessible without authentication, and your legal recourses will be limited if someone steals your data. But all those together is admittedly an unusual case.
P.S. - another way to think about this problem which may or may not apply in your case. Sometimes it's easier to change how your data works which obviates securing it. For example, can you tie your data in some way to a service on your site so that the data isn't very useful unless it's being used in conjunction with your code. Or can you embed advertising in it, so that wherever it's shown you get paid? And so on. I don't know if any of these mitigations apply to your case, but many businesses have found ways to give stuff away for free on the Internet (and encourage rather than prevent wide re-distribution) and still make money, so a hybrid free/pay strategy may (or may not) be possible in your case.
If you have an internal Memcached box, you could consider using a technique where you create an entry for each IP that hits your server with an hour expiration. Then increment that value each time the IP hits your AJAX endpoint. If the value gets over a particular threshold, fry the connection. If the value expires in Memcached, you know it isn't getting "hoovered away".
This isn't a concrete answer with a proof of concept, but maybe a starting point for you. You could create a javascript function that provides encryption/decryption functions. The javascript would need to be built dynamically, and you would include an encryption key that is unique to the session. On the server side, you'd have an encryption service that uses the key from the session to encrypt your JSON before delivering it.
This would at least prevent someone from listening to your web traffic, pulling information out of your database.
I'm with kdgergory though, it sounds like your data is too open.
Some techniques are listed in Further thoughts on hindering screen scraping.
If you use PHP, Bad behavior is a nice tool to help. If you don't use PHP, it can give some ideas on how to filter (see How it works page).
Incredibill's blog is giving nice tips, lists of User-agents/IP ranges to block, etc...
Here are a variety of suggestions:
Issue tokens required for redemption along with each AJAX request. Expire the tokens.
Track how many queries are coming from each client, and throttle excessive usage based on expected normal usage of your site.
Look for patterns in usage such as sequential queries, spikes in requests, or queries that occur faster than a human could conduct.
Check user-agents. Many bots don't completely replicate the user agent info of a browser, and you can eliminate programatic scraping of your data using this method.
Change the front-end component of your website to redirect to a captcha (or some other human verifying mechanism) once a request threshold is exceeded.
Modify your logic so the respsonse data is returned in a few different ways to complicate the code required to parse.
Obsfucate your client-side javascript.
Block IPs of offending clients.
Bots usually doesn't parse Javascript, so your ajax code won't be instantly executed. And if they even do, bots usually doesn't maintain sessions/cookies as well. Knowing that, you could reject the request if it is invoked without a valid session/cookie (which is obviously set on the server side beforehand by the request on the parent page).
This does not protect you from human hazard though. The safest way is to restrict access to users with a login/password. If that is not your intent, well, then you have to live with the fact that it's a public application. You could of course scan logs and maintian blacklists with IP addresses and useragents, but that goes extreme.

What data formats can AJAX transfer?

I'm new to AJAX, but as an overview I'd like to know what formats you can upload and download. Is it limited to JSON or XML or can you even send binary types like MP3 or UTF-8 HTML. And finally, do you have full control over the data, byte for byte in something like a byte array, or is only a string sent/received.
If we are talking about ajax we are talking about javascript? And about XMLHTTPRequest?
The XMLHttpRequest which is only a http request can transfer everything. But there is no byte array in javascript. Only strings, numbers and such. Every thing you get from an ajax call is a piece of text (responseText). That might be parsed into XML (which gives you reponseXML). Special encodings should be more a matter of the http transport.
The binary stuff is not ajax dependent but javascript dependent. There are some weird encodings for strings to deliver byte data inside in javascript (especially for images) but it is not a general solution.
HTML is not a problem and that is the most prominent use case. From this type of request you get an HTML string delivered and that is added to some node in the DOM per innerHTML that parses the HTML.
Since data is transported via HTTP you will have to make sure that you use some kind of encoding. One of the most popular is base64 encoding. You can find more information at: http://www.webtoolkit.info/javascript-base64.html
The methodology is to base64-encode the data you would like to send and then base64-decode the data at the server(or the client) and use the original data as you intended.
You can transfer any type of data either string or bytes
You can send anything you like, the problem may be how to handle it once you get it ;)
Standard HTML is probably the most common type of ajax content in use out there - you can choose character encoding too, although it's always best to stick with one type of encoding.
AJAX simply means you're transferring data asynchronously over HTTP with a JavaScript call. So your script makes a "normal" HTTP request using the XmlHttpRequest() object. However, as the name implies, it's really only suited for text-based data formats since you generally want to perform some action on the client side with the data you got back from the server (not always though, sometimes people just send XmlHttpRequests only to update something on the server).
On a side note, I have never seen an application where sending binary data would have been appropriate anyway.
Most often, people choose to send data over to the server with POST or GET (which is basically a method to transfer name-value pairs inherent to HTTP). For sending more complex data, for example hierarchical structures, they need to be encoded somehow. XML documents can be made natively per JavaScript, sent over to the server and get parsed into whatever data types necessary. But since XML can be a bit of a pain, many devs use JSON encoded data instead because it's easy to generate and easy to parse.
What the server sends back is equally as arbitrary. Usually, you specify a callback function in your Javascript that handles the incoming data. Again, the popular choices are XML and JSON, they parse easily into a document object or an array structure respectively. You could also send plain text or some other packaging but remember that you then have to take care of extracting the usable data from it yourself. Sometimes, it can also be beneficial to send actual HTML fragments to the client to update something on the page directly.
For starters, I suggest you have a look at JQuery. It's a very lightweight framework that abstracts many of evil compatibility stuff and lets you write AJAX requests very nicely.
You can move anything that can be sent over HTTP. There are restrictions about the call being made to the same domain as the page loaded from, but not on the content of the transfer. You can do either GET or POST transactions too.
There is a Digg the Blog entry titled DUI.Stream and MXHR that shows off what they call "Multipart XMLHttpRequests." It is alpha code now, but there is a demo that handles images.

Resources