Decoding quoted-printable messages in Swift

I have a quoted-printable string such as "The cost would be =C2=A31,000". How do I convert this to "The cost would be £1,000"?
I'm just converting text manually at the moment and this doesn't cover all cases. I'm sure there is just one line of code that will help with this.
Here is my code:
func decodeUTF8(message: String) -> String
{
    var newMessage = message.stringByReplacingOccurrencesOfString("=2E", withString: ".", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A2", withString: "•", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=C2=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9C", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A6", withString: "…", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9D", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=92", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=3D", withString: "=", options: NSStringCompareOptions.LiteralSearch, range: nil)
    // "=20" is an encoded space, so it must become " " rather than ""
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=20", withString: " ", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=99", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)
    return newMessage
}
Thanks

An easy way is to use the (NS)String method
stringByRemovingPercentEncoding for this purpose.
This was observed in the thread
"decoding quoted-printables",
so the first solution is mainly a translation of the answers in
that thread to Swift.
The idea is to replace the quoted-printable "=NN" encoding by the
percent encoding "%NN" and then use the existing method to remove
the percent encoding.
Continuation lines are handled separately.
Also, percent characters in the input string must be encoded first,
otherwise they would be treated as the leading character in a percent
encoding.
func decodeQuotedPrintable(message : String) -> String? {
    return message
        .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
        .stringByReplacingOccurrencesOfString("=\n", withString: "")
        .stringByReplacingOccurrencesOfString("%", withString: "%25")
        .stringByReplacingOccurrencesOfString("=", withString: "%")
        .stringByRemovingPercentEncoding
}
The function returns an optional string which is nil for invalid input.
Invalid input can be:
- A "=" character which is not followed by two hexadecimal digits, e.g. "=XX".
- A "=NN" sequence which does not decode to a valid UTF-8 sequence, e.g. "=E2=64".
Examples:
if let decoded = decodeQuotedPrintable("=C2=A31,000") {
    print(decoded) // £1,000
}
if let decoded = decodeQuotedPrintable("=E2=80=9CHello =E2=80=A6 world!=E2=80=9D") {
    print(decoded) // “Hello … world!”
}
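The same percent-encoding idea in current Swift (5.x) syntax might look like the sketch below. It keeps the function name from above and only swaps in the renamed Foundation APIs (`replacingOccurrences(of:with:)` and `removingPercentEncoding`), so it still assumes the escapes encode UTF-8.

```swift
import Foundation

/// Decodes a quoted-printable string whose escapes encode UTF-8 bytes.
/// Returns nil if an "=" is not followed by two hex digits or the bytes are not valid UTF-8.
func decodeQuotedPrintable(_ message: String) -> String? {
    return message
        .replacingOccurrences(of: "=\r\n", with: "") // soft line break (CRLF)
        .replacingOccurrences(of: "=\n", with: "")   // soft line break (LF)
        .replacingOccurrences(of: "%", with: "%25")  // keep a literal "%" from starting an escape
        .replacingOccurrences(of: "=", with: "%")    // "=HH" becomes "%HH"
        .removingPercentEncoding                      // "%HH" sequences decoded as UTF-8
}

if let decoded = decodeQuotedPrintable("The cost would be =C2=A31,000") {
    print(decoded)
}
```

The order of the replacements matters: the "%" characters have to be escaped before "=" is turned into "%", otherwise a literal "%" in the message would be read as the start of an escape.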
Update 1: The above code assumes that the message uses the UTF-8
encoding for quoting non-ASCII characters, as in most of your examples: C2 A3 is the UTF-8 encoding for "£", and E2 80 A6 is the UTF-8 encoding for "…".
If the input is "Rub=E9n" then the message is using the
Windows-1252 encoding.
To decode that correctly, you have to replace
.stringByRemovingPercentEncoding
with
.stringByReplacingPercentEscapesUsingEncoding(NSWindowsCP1252StringEncoding)
There are also ways to detect the encoding from a "Content-Type"
header field, see e.g. https://stackoverflow.com/a/32051684/1187415.
Update 2: The stringByReplacingPercentEscapesUsingEncoding
method is marked as deprecated, so the above code will always generate
a compiler warning. Unfortunately, it seems that no alternative method
has been provided by Apple.
So here is a new, completely self-contained decoding method which
does not cause any compiler warning. This time I have written it
as an extension method for String. Explanatory comments are in the
code.
extension String {
    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding: A string encoding. The default is UTF-8.
    /// - returns: The decoded string, or `nil` for invalid input.
    func decodeQuotedPrintable(encoding enc : NSStringEncoding = NSUTF8StringEncoding) -> String? {
        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
            .stringByReplacingOccurrencesOfString("=\n", withString: "")
            .decodeQuotedPrintableSequences(enc)
    }
    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.
    private func decodeQuotedPrintableSequences(enc : NSStringEncoding) -> String? {
        var result = ""
        var position = startIndex
        // Find the next "=" and copy characters preceding it to the result:
        while let range = rangeOfString("=", range: position ..< endIndex) {
            result.appendContentsOf(self[position ..< range.startIndex])
            position = range.startIndex
            // Decode one or more successive "=HH" sequences to a byte array:
            let bytes = NSMutableData()
            repeat {
                let hexCode = self[position.advancedBy(1) ..< position.advancedBy(3, limit: endIndex)]
                if hexCode.characters.count < 2 {
                    return nil // Incomplete hex code
                }
                guard var byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.appendBytes(&byte, length: 1)
                position = position.advancedBy(3)
            } while position != endIndex && self[position] == "="
            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.appendContentsOf(dec)
        }
        // Copy remaining characters to the result:
        result.appendContentsOf(self[position ..< endIndex])
        return result
    }
}
Example usage:
if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}
if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}
if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: NSWindowsCP1252StringEncoding) {
    print(decoded) // Rubén
}
Update for Swift 4 (and later):
extension String {
    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding: A string encoding. The default is UTF-8.
    /// - returns: The decoded string, or `nil` for invalid input.
    func decodeQuotedPrintable(encoding enc : String.Encoding = .utf8) -> String? {
        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .replacingOccurrences(of: "=\r\n", with: "")
            .replacingOccurrences(of: "=\n", with: "")
            .decodeQuotedPrintableSequences(encoding: enc)
    }
    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.
    private func decodeQuotedPrintableSequences(encoding enc : String.Encoding) -> String? {
        var result = ""
        var position = startIndex
        // Find the next "=" and copy characters preceding it to the result:
        while let range = range(of: "=", range: position..<endIndex) {
            result.append(contentsOf: self[position ..< range.lowerBound])
            position = range.lowerBound
            // Decode one or more successive "=HH" sequences to a byte array:
            var bytes = Data()
            repeat {
                let hexCode = self[position...].dropFirst().prefix(2)
                if hexCode.count < 2 {
                    return nil // Incomplete hex code
                }
                guard let byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.append(byte)
                position = index(position, offsetBy: 3)
            } while position != endIndex && self[position] == "="
            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.append(contentsOf: dec)
        }
        // Copy remaining characters to the result:
        result.append(contentsOf: self[position ..< endIndex])
        return result
    }
}
Example usage:
if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}
if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}
if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: .windowsCP1252) {
    print(decoded) // Rubén
}

This encoding is called 'quoted-printable'. What you need to do is convert the string to NSData using ASCII encoding, then iterate over the data, replacing every three-character sequence like '=A3' with the single byte 0xA3, and finally convert the resulting data to a string using NSUTF8StringEncoding.
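A sketch of that byte-level approach in Swift 5 follows. It assumes, as in the question, that the escapes encode UTF-8. The names `hexValue` and `decodeQuotedPrintableBytes` are made up for this example, and unlike the description above it also drops soft line breaks.

```swift
import Foundation

/// Value of one ASCII hex digit, or nil if `c` is not 0-9, A-F or a-f.
func hexValue(_ c: UInt8) -> UInt8? {
    switch c {
    case UInt8(ascii: "0")...UInt8(ascii: "9"): return c - UInt8(ascii: "0")
    case UInt8(ascii: "A")...UInt8(ascii: "F"): return c - UInt8(ascii: "A") + 10
    case UInt8(ascii: "a")...UInt8(ascii: "f"): return c - UInt8(ascii: "a") + 10
    default: return nil
    }
}

/// Turns every "=HH" into the byte 0xHH, drops soft line breaks ("=" + LF or CRLF),
/// and reads the resulting bytes as UTF-8.
func decodeQuotedPrintableBytes(_ input: String) -> String? {
    guard let ascii = input.data(using: .ascii) else { return nil } // QP text must be pure ASCII
    let bytes = [UInt8](ascii)
    let equals = UInt8(ascii: "="), cr = UInt8(ascii: "\r"), lf = UInt8(ascii: "\n")
    var output = Data()
    var i = 0
    while i < bytes.count {
        guard bytes[i] == equals else {
            output.append(bytes[i]) // ordinary character, copied unchanged
            i += 1
            continue
        }
        if i + 1 < bytes.count, bytes[i + 1] == lf { i += 2; continue }                    // "=\n"
        if i + 2 < bytes.count, bytes[i + 1] == cr, bytes[i + 2] == lf { i += 3; continue } // "=\r\n"
        guard i + 2 < bytes.count,
              let high = hexValue(bytes[i + 1]),
              let low = hexValue(bytes[i + 2]) else {
            return nil // "=" not followed by two hex digits
        }
        output.append(high << 4 | low)
        i += 3
    }
    return String(data: output, encoding: .utf8) // nil if the bytes are not valid UTF-8
}
```

Working on raw bytes means a multi-byte character such as "£" (C2 A3) is only assembled once both escapes have been collected, which is exactly why the conversion to a string happens at the very end.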

Unfortunately, I'm a bit late with my answer. It might be helpful for the others though.
let string = "The cost would be =C2=A31,000"
var finalString: String? = nil
if let regEx = try? NSRegularExpression(pattern: "=([a-f0-9]{2})", options: NSRegularExpressionOptions.CaseInsensitive)
{
    // Escape literal "%" first so it isn't mistaken for the start of a percent escape
    let escapedString = string.stringByReplacingOccurrencesOfString("%", withString: "%25")
    // NSRange is measured in UTF-16 code units, not in Characters
    let intermediatePercentEscapedString = regEx.stringByReplacingMatchesInString(escapedString, options: [], range: NSMakeRange(0, escapedString.utf16.count), withTemplate: "%$1")
    print(intermediatePercentEscapedString)
    finalString = intermediatePercentEscapedString.stringByRemovingPercentEncoding
    print(finalString)
}
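For current Swift, the same regex idea might be written as the sketch below (the function name is hypothetical). Two details matter: literal "%" characters are escaped before the rewrite, as in the accepted answer, and the NSRange is derived from the string itself because NSRegularExpression counts UTF-16 code units rather than Characters.

```swift
import Foundation

/// Rewrites each "=HH" as "%HH" with a regex and lets Foundation decode the UTF-8 bytes.
/// A stray "=" that is not followed by two hex digits is left untouched.
func decodeQuotedPrintableWithRegex(_ input: String) -> String? {
    // Escape literal "%" first so it cannot be mistaken for a percent escape
    let escaped = input.replacingOccurrences(of: "%", with: "%25")
    guard let regex = try? NSRegularExpression(pattern: "=([0-9A-Fa-f]{2})") else { return nil }
    // NSRange is measured in UTF-16 code units, so derive it from the string
    let range = NSRange(escaped.startIndex..., in: escaped)
    let percentEncoded = regex.stringByReplacingMatches(in: escaped, options: [], range: range,
                                                        withTemplate: "%$1")
    return percentEncoded.removingPercentEncoding // nil if the bytes are not valid UTF-8
}
```

Soft line breaks ("=" at the end of a line) are not handled here; they would need to be removed before the regex runs.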

In order to give an applicable solution, some more information is required, so I will make some assumptions.
In an HTML or mail message, for example, you can apply one or more encodings to some kind of source data. For example, you could encode a binary file, e.g. a PNG file, with base64 and then zip it. The order is important.
In your example, as you say, the source data is a String and has been encoded via UTF-8.
In an HTTP message, your Content-Type is thus text/plain; charset=UTF-8. In your example there also seems to be an additional encoding applied,
a "Content-Transfer-Encoding". Judging from the "=C2=A3" sequences, this is quoted-printable (in base64, "=" only appears as padding at the end).
In order to revert it, you need to apply the corresponding decodings in reverse order.
Hint:
You can view the headers (Content-Type and Content-Transfer-Encoding) of a mail message when viewing the raw source of the mail.
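A sketch of "decode in reverse order" for a single MIME part follows. It assumes you have already read the Content-Transfer-Encoding value and mapped the Content-Type charset to a `String.Encoding`; the function and parameter names are hypothetical, and only base64 and quoted-printable are handled. The transfer encoding is undone first, and only then are the resulting bytes interpreted in the charset.

```swift
import Foundation

/// Undo the Content-Transfer-Encoding first, then read the bytes in the charset
/// taken from Content-Type.
func decodeBody(_ body: String, transferEncoding: String, charset: String.Encoding) -> String? {
    let data: Data?
    switch transferEncoding.lowercased() {
    case "base64":
        data = Data(base64Encoded: body, options: .ignoreUnknownCharacters)
    case "quoted-printable":
        data = quotedPrintableBytes(body)
    default: // e.g. "7bit" or "8bit": no transfer encoding to undo
        data = body.data(using: charset)
    }
    guard let bytes = data else { return nil }
    return String(data: bytes, encoding: charset) // step 2: interpret bytes in the charset
}

/// Minimal quoted-printable step: soft line breaks removed, each "=HH" turned into one byte.
func quotedPrintableBytes(_ text: String) -> Data? {
    let unfolded = text.replacingOccurrences(of: "=\r\n", with: "")
                       .replacingOccurrences(of: "=\n", with: "")
    let bytes = Array(unfolded.utf8)
    var data = Data()
    var i = 0
    while i < bytes.count {
        if bytes[i] == UInt8(ascii: "=") {
            guard i + 2 < bytes.count else { return nil } // truncated escape
            let hex = String(decoding: bytes[i + 1 ... i + 2], as: UTF8.self)
            guard hex.allSatisfy({ $0.isHexDigit }), let byte = UInt8(hex, radix: 16) else {
                return nil // "=" must be followed by two hex digits
            }
            data.append(byte)
            i += 3
        } else {
            data.append(bytes[i])
            i += 1
        }
    }
    return data
}
```

Keeping the two steps separate means the same charset handling works whether the part was sent as base64 or as quoted-printable.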

You can also look at this working solution - https://github.com/dunkelstern/QuotedPrintable
let result = QuotedPrintable.decode(string: quoted)

Related

Regular expression

I want the user to enter text in a text field, and if the user types "<", a space should be automatically appended to the text in the field.
I tried removing the special character but I need the user to input that as well.
let RISTRICTED_CHARACTERS = "<"
func textField(_ textField: UITextField, shouldChangeCharactersIn range: NSRange, replacementString string: String) -> Bool {
    let set = CharacterSet(charactersIn: RISTRICTED_CHARACTERS)
    let inverted = set.inverted
    let filtered = string.components(separatedBy: inverted).joined(separator: "")
    if filtered == string && string != "" {
        return false
    } else {
        let maxLength = maxLenghtOfTextField
        let currentString: NSString = textField.text! as NSString
        let newString: NSString = currentString.replacingCharacters(in: range, with: string) as NSString
        return newString.length <= maxLength
    }
}
In this code I'm not allowing the "<" character at all, but I want the text field to work like this:
My output should be: hello <(space) world.
The space should be appended automatically when I type the "<" sign.

Instead of .replacingCharacters, maybe try .replacingOccurrences:
let updatedString: String? = textField.text?.replacingOccurrences(of: "<", with: "< ")
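To do this while the user types, the delegate can compute the new text itself. The sketch below puts that logic into a small helper so it can be checked without UIKit; `textAddingSpaceAfterLessThan` is a hypothetical name, not a UIKit API, and the delegate method is shown only as comments.

```swift
import Foundation

/// Returns the field's new contents when `replacement` is typed over `range`
/// of `currentText`, with a space inserted after every "<".
func textAddingSpaceAfterLessThan(_ currentText: String, range: NSRange, replacement: String) -> String {
    let spaced = replacement.replacingOccurrences(of: "<", with: "< ")
    return (currentText as NSString).replacingCharacters(in: range, with: spaced)
}

// Possible use inside the delegate method (UIKit, shown as comments):
// func textField(_ textField: UITextField, shouldChangeCharactersIn range: NSRange,
//                replacementString string: String) -> Bool {
//     guard string.contains("<") else { return true } // normal typing is left to UIKit
//     textField.text = textAddingSpaceAfterLessThan(textField.text ?? "", range: range,
//                                                   replacement: string)
//     return false // the text was already updated above
// }
```

Because the delegate sets `text` directly and returns false, UIKit does not insert the "<" a second time; note that assigning `text` this way usually moves the cursor to the end of the field.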

What does "Stream did not contain valid UTF-8" mean?

I'm creating a simple HTTP server. I need to read the requested image and send it to browser. I'm using this code:
use std::fs::File;
use std::io::Read;
use std::path::Path;
fn read_file(mut file_name: String) -> String {
    file_name = file_name.replace("/", "");
    if file_name.is_empty() {
        file_name = String::from("index.html");
    }
    let path = Path::new(&file_name);
    if !path.exists() {
        return String::from("Not Found!");
    }
    let mut file_content = String::new();
    let mut file = File::open(&file_name).expect("Unable to open file");
    let res = match file.read_to_string(&mut file_content) {
        Ok(content) => content,
        Err(why) => panic!("{}",why),
    };
    return file_content;
}
This works if the requested file is text based, but when I want to read an image I get the following message:
stream did not contain valid UTF-8
What does it mean and how to fix it?

The documentation for String describes it as:
A UTF-8 encoded, growable string.
The Wikipedia definition of UTF-8 will give you a great deal of background on what that is. The short version is that computers use a unit called a byte to represent data. Unfortunately, these blobs of data represented with bytes have no intrinsic meaning; that has to be provided from outside. UTF-8 is one way of interpreting a sequence of bytes, as are file formats like JPEG.
UTF-8, like most text encodings, has specific requirements and sequences of bytes that are valid and invalid. Whatever image you have tried to load contains a sequence of bytes that cannot be interpreted as a UTF-8 string; this is what the error message is telling you.
To fix it, you should not use a String to hold arbitrary collections of bytes. In Rust, those are better represented by a Vec<u8>:
use std::fs::File;
use std::io::Read;
use std::path::Path;
fn read_file(mut file_name: String) -> Vec<u8> {
    file_name = file_name.replace("/", "");
    if file_name.is_empty() {
        file_name = String::from("index.html");
    }
    let path = Path::new(&file_name);
    if !path.exists() {
        return String::from("Not Found!").into();
    }
    let mut file_content = Vec::new();
    let mut file = File::open(&file_name).expect("Unable to open file");
    file.read_to_end(&mut file_content).expect("Unable to read");
    file_content
}
To evangelize a bit, this is a great aspect of why Rust is a nice language. Because there is a type that represents "a set of bytes that is guaranteed to be a valid UTF-8 string", we can write safer programs since we know that this invariant will always be true. We don't have to keep checking throughout our program to "make sure" it's still a string.

How to replace emoji characters with their descriptions in a Swift string

I'm looking for a way to replace emoji characters with their description in a Swift string.
Example:
Input "This is my string 😄"
I'd like to replace the 😄 to get:
Output "This is my string {SMILING FACE WITH OPEN MOUTH AND SMILING EYES}"
To date I'm using this code modified from the original code of this answer by MartinR, but it works only if I deal with a single character.
let myCharacter : Character = "😄"
let cfstr = NSMutableString(string: String(myCharacter)) as CFMutableString
var range = CFRangeMake(0, CFStringGetLength(cfstr))
CFStringTransform(cfstr, &range, kCFStringTransformToUnicodeName, false)
var newStr = "\(cfstr)"
// removing "\N" from the result: \N{SMILING FACE WITH OPEN MOUTH AND SMILING EYES}
newStr = newStr.stringByReplacingOccurrencesOfString("\\N", withString:"")
print("\(newStr)") // {SMILING FACE WITH OPEN MOUTH AND SMILING EYES}
How can I achieve this?

Simply don't use a Character in the first place; use a String as input instead:
let cfstr = NSMutableString(string: "This 😄 is my string 😄") as CFMutableString
which will output
This {SMILING FACE WITH OPEN MOUTH AND SMILING EYES} is my string {SMILING FACE WITH OPEN MOUTH AND SMILING EYES}
Put together:
func transformUnicode(input : String) -> String {
    let cfstr = NSMutableString(string: input) as CFMutableString
    var range = CFRangeMake(0, CFStringGetLength(cfstr))
    CFStringTransform(cfstr, &range, kCFStringTransformToUnicodeName, false)
    let newStr = "\(cfstr)"
    // Strip the "\N" prefix that the transform puts before each "{NAME}"
    return newStr.stringByReplacingOccurrencesOfString("\\N", withString:"")
}
transformUnicode("This 😄 is my string 😄")

Here is a complete implementation.
It avoids converting non-emoji characters to their descriptions (e.g. it does not convert “ to {LEFT DOUBLE QUOTATION MARK}). To accomplish this, it uses an extension based on this answer by Arnold that returns whether a string contains an emoji.
The other part of the code is based on this answer by MartinR and on the answer and comments to this answer by luk2302.
let str = "Hello World 😄 …" // our string (with an emoji and a horizontal ellipsis)
let newStr = str.characters.reduce("") { // loop through str individual characters
    var item = "\($1)" // string with the current char
    let isEmoji = item.containsEmoji // true or false
    if isEmoji {
        item = item.stringByApplyingTransform(String(kCFStringTransformToUnicodeName), reverse: false)!
    }
    return $0 + item
}.stringByReplacingOccurrencesOfString("\\N", withString:"") // strips "\N"
extension String {
    var containsEmoji: Bool {
        for scalar in unicodeScalars {
            switch scalar.value {
            case 0x1F600...0x1F64F, // Emoticons
                 0x1F300...0x1F5FF, // Misc Symbols and Pictographs
                 0x1F680...0x1F6FF, // Transport and Map
                 0x2600...0x26FF,   // Misc symbols
                 0x2700...0x27BF,   // Dingbats
                 0xFE00...0xFE0F,   // Variation Selectors
                 0x1F900...0x1F9FF: // Various (e.g. 🤖)
                return true
            default:
                continue
            }
        }
        return false
    }
}
print (newStr) // Hello World {SMILING FACE WITH OPEN MOUTH AND SMILING EYES} …
Note that some emoji may not be covered by the ranges in this code, so check that all the emoji you need are converted before relying on it.
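With Swift 5 and later, the standard library exposes Unicode properties for each scalar, so the hand-maintained ranges can be avoided. The sketch below assumes that "emoji" means scalars whose default presentation is emoji (`isEmojiPresentation`); text-style symbols such as ☺ (U+263A) are therefore left alone. The function name is made up for this example.

```swift
/// Replaces every scalar with emoji presentation by "{UNICODE NAME}",
/// using Unicode.Scalar.Properties instead of hard-coded ranges.
func replacingEmojiWithNames(_ input: String) -> String {
    var output = ""
    for scalar in input.unicodeScalars {
        // properties.name is nil only for scalars that have no Unicode name
        if scalar.properties.isEmojiPresentation, let name = scalar.properties.name {
            output += "{\(name)}"
        } else {
            output.unicodeScalars.append(scalar) // non-emoji scalars are copied unchanged
        }
    }
    return output
}

print(replacingEmojiWithNames("Hello World 😄 …"))
```

Since it works per scalar, multi-scalar emoji such as flags or skin-tone variants come out as several names, one for each emoji scalar they contain.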

How can I find the substrings from the NSTextCheckingResult objects in swift?

I wonder how it is possible to find substrings from a NSTextCheckingResult object. I have tried this so far:
import Foundation
let input = "My name Swift is Taylor Swift "
let regex = try NSRegularExpression(pattern: "Swift|Taylor", options:NSRegularExpressionOptions.CaseInsensitive)
let matches = regex.matchesInString(input, options: [], range: NSMakeRange(0, input.characters.count))
for match in matches {
    // what will be the code here?
}

Try this:
import Foundation
let input = "My name Swift is Taylor Swift " // the input string to search for the pattern
let nsString = input as NSString
let regex = try NSRegularExpression(pattern: "Swift|Taylor", options: NSRegularExpressionOptions.CaseInsensitive)
// matches stores every match as an NSTextCheckingResult; the range is in UTF-16 units
let matches = regex.matchesInString(input, options: [], range: NSMakeRange(0, nsString.length)) as Array<NSTextCheckingResult>
for match in matches {
    // match.range is an NSRange into the NSString, so extract the substring via NSString
    let range = match.range
    let matchString = nsString.substringWithRange(match.range) as String
    print("match is \(range) \(matchString)")
}

Here is code that works for Swift 4 and later. It returns an array of String:
results.map {
    String(text[Range($0.range, in: text)!])
}
So the overall example could look like this:
import Foundation
func matches(for regex: String, in text: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: regex)
    let results = regex.matches(in: text,
                                range: NSRange(text.startIndex..., in: text))
    return results.map {
        String(text[Range($0.range, in: text)!])
    }
}

You can put this code inside the for loop; str will contain the matched string.
let range = match.range
let str = (input as NSString).substringWithRange(range)
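For Swift 4 and later, the loop from the question might be filled in as in this sketch. `Range(_:in:)` converts each match's NSRange back into a range of String indices, so no bridging to NSString is needed; the `found` array exists only so the result can be checked.

```swift
import Foundation

let input = "My name Swift is Taylor Swift "
let regex = try! NSRegularExpression(pattern: "Swift|Taylor", options: .caseInsensitive)
// Build the NSRange from the String so UTF-16 lengths are handled correctly
let matches = regex.matches(in: input, options: [], range: NSRange(input.startIndex..., in: input))
var found: [String] = []
for match in matches {
    // Range(_:in:) returns nil only if the NSRange does not fit the string
    if let range = Range(match.range, in: input) {
        let matchString = String(input[range])
        found.append(matchString)
        print("match is \(match.range) \(matchString)")
    }
}
```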

Format Numbers in Textfields using Swift

I am trying to format a number from a UITextField, as it's being typed, to a decimal with commas.
I have done so with the following code:
@IBAction func editingDidBegin(sender : AnyObject)
{
    costField.addTarget(self, action: Selector("textFieldDidChange:"), forControlEvents: UIControlEvents.EditingChanged)
}
func textFieldDidChange(theTextField:UITextField) -> Void
{
    var textFieldText = theTextField.text.stringByReplacingOccurrencesOfString(",", withString: " ", options: NSStringCompareOptions.RegularExpressionSearch, range: Range(start: theTextField.text.startIndex, end: theTextField.text.endIndex))
    var formatter:NSNumberFormatter = NSNumberFormatter()
    formatter.numberStyle = NSNumberFormatterStyle.DecimalStyle
    var formattedOutput = formatter.stringFromNumber(textFieldText.bridgeToObjectiveC().integerValue)
    costField.text = formattedOutput
}
The problem with this is that after four digits are entered, everything after the comma is deleted. For example, if I enter 4000 it formats to 4,000; then if I type another number like 8, it reformats to 48.
Is there another way I can format this, maybe through IB or how can I fix the code?

Replace the line with:
var textFieldText = theTextField.text.stringByReplacingOccurrencesOfString(",", withString: "", options: NSStringCompareOptions.RegularExpressionSearch, range: Range(start: theTextField.text.startIndex, end: theTextField.text.endIndex))
(I only removed the space between the double quotes.)
The reason is that integerValue stops reading at the first non-digit, so with a space in place of the comma, the digits after it never reach NSNumberFormatter.
It works fine afterwards.

I know I am late to the party, but this worked well for me.
var phoneNumber = " 1 (888) 555-5551 "
var strippedPhoneNumber = "".join(phoneNumber.componentsSeparatedByCharactersInSet(NSCharacterSet.decimalDigitCharacterSet().invertedSet))
It strips out the spaces and all other non-digit characters.
The end result is "18885555551".
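In Swift 4 and later, `"".join(...)` no longer exists. A sketch of the same digit filter using Foundation's `CharacterSet.decimalDigits`:

```swift
import Foundation

let phoneNumber = " 1 (888) 555-5551 "
// Keep only the Unicode scalars in the decimal-digit set; spaces, parentheses and dashes are dropped
let strippedPhoneNumber = String(phoneNumber.unicodeScalars.filter { CharacterSet.decimalDigits.contains($0) })
print(strippedPhoneNumber)
```

`CharacterSet.decimalDigits` also contains non-ASCII digits (for example Arabic-Indic digits), which matches the behavior of `decimalDigitCharacterSet()` in the original snippet.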

I've updated this answer for Swift 2. It borrows 90% from the two answers above, but it also handles a nil text value and an empty field when the text field is cleared.
func textFieldDidChangeCommas(theTextField:UITextField) -> Void
{
    if let text = theTextField.text {
        let textFieldText = text.stringByReplacingOccurrencesOfString(",", withString: "", options: NSStringCompareOptions.RegularExpressionSearch, range: Range(start: text.startIndex, end: text.endIndex))
        let formatter = NSNumberFormatter()
        formatter.numberStyle = NSNumberFormatterStyle.DecimalStyle
        // Int(_:) returns nil for empty or non-numeric text instead of crashing
        if let number = Int(textFieldText) {
            costField.text = formatter.stringFromNumber(number)
        }
    }
}
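A Swift 5 sketch of the same reformatting step, written as a free function so it can be tried outside a view controller. The name `formattedWithGrouping` is hypothetical, and the locale defaults to `en_US` only to make the output reproducible; in an app you would likely pass `Locale.current`.

```swift
import Foundation

/// Re-formats the digits typed so far with grouping separators, e.g. "4,0008" -> "40,008".
/// Returns nil when the text (separators removed) is not a whole number.
func formattedWithGrouping(_ text: String, locale: Locale = Locale(identifier: "en_US")) -> String? {
    let formatter = NumberFormatter()
    formatter.numberStyle = .decimal
    formatter.locale = locale
    // Strip the separators inserted by the previous formatting pass
    let digits = text.replacingOccurrences(of: formatter.groupingSeparator ?? ",", with: "")
    guard let number = Int(digits) else { return nil } // empty or non-numeric text
    return formatter.string(from: NSNumber(value: number))
}
```

Inside `textFieldDidChange` this could be used as `costField.text = formattedWithGrouping(text, locale: .current) ?? text`, which leaves the field unchanged when the input is not a number instead of force-unwrapping.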
