I was wondering how to print unicode characters, such as Japanese or fun characters like ๐ฆ.
I can print hearts with:
hearts = "\u2665"
puts hearts.encode('utf-8')
How can I print more unicode charaters with Ruby in Command Prompt?
My method works with some characters but not all.
Code examples would be greatly appreciated.
You need to enclose the unicode character in { and } if the number of hex digits isn't 4 (credit : /u/Stefan) e.g.:
heart = "\u2665"
package = "\u{1F4E6}"
fire_and_one_hundred = "\u{1F525 1F4AF}"
puts heart
puts package
puts fire_and_one_hundred
Alternatively you could also just put the unicode character directly in your source, which is quite easy at least on macOS with the Emoji & Symbols menu accessed by Ctrl + Command + Space by default (a similar menu can be accessed on Windows 10 by Win + ; ) in most applications including your text editor/Ruby IDE most likely:
heart = "โฅ"
package = "๐ฆ"
fire_and_one_hundred = "๐ฅ๐ฏ"
puts heart
puts package
puts fire_and_one_hundred
Output:
โฅ
๐ฆ
๐ฅ๐ฏ
How it looks in the macOS terminal:
Related
I am writing a simple program for windows using Python 2.7. It opens an email, take some words from it and puts them in a form on web. Problem starts when the email contains polish letters like ร, ลน, ล etc. Whenever I try to print it I get something like: ['\xc4\x84', '\xc5\xbb', '\xc3\x93', '\xc4\x86', '\xc5\xb9'].
I already know it is because of encoding and that Python 3 has no such problem. Here is what I tried already:
mail = " ฤ ลป ร ฤ ลน"
mail = mail.split()
mail = mail.decode("UTF-8")
print mail
or
mail = " ฤ ลป ร ฤ ลน"
mail = mail.split()
[x.encode('UTF8') for x in mail]
print mail
Can anyone please show me how to make the list print properly ?
Python 2.x uses ASCII as a default encoding. To force it to use Unicode, add this line to the top of your program.
# -*- coding: utf-8 -*-
Also you should prefix any string literals with 'u'. e.g.
polishLetters = u'ฤ ลป ร ฤ ลน'
I'm having trouble reading an IP from a text file and properly writing it to another text file. It shows the written IP in the file as: "รฟรพ1 9 2 . 1 6 8 . 1 1 0 . 4"
#Read the first line for the IP
def get_server_ip
File.open("d:\\ip_addr.txt") do |line|
a = line.readline()
b = a.to_s
end
end
#append the ip to file2
def append_ip
FileUtils.cp('file1.txt', 'file2.txt')
file_names = ['file2.txt']
file_names.each do |file_name|
text = File.read(file_name)
b = get_server_ip
new_contents = text.gsub('ip_here', b)
File.open(file_name, "w") {|file| file.puts new_contents }
end
end
I've tried .strip and .delete(' ') with no luck. Can anyone see the issue?
Thank you
The file was generated with Notepad on Windows. It is encoded as UTF-16LE.
The first two bytes in the file have the codes 0xFF and 0xFE; this is the Bytes Order Mark of UTF-16LE.
Each character is encoded on 2 bytes (16 bits), the least significant byte first (Less Endian order).
The spaces between the printable characters in the output are, in fact NUL characters (characters with code 0).
What you can do (apart from converting the file to a more decent format like UTF-8 or even ISO-8859-1) is to pass 'rb:BOM|UTF-16LE' as the second argument of File#open.
r tells File#open to open the file in read-only mode (which is also does by default);
b means "binary mode"; it is required by BOM|UTF-16;
:BOM|UTF-16LE tells Ruby to read and ignore the BOM if it is present in the file and to expect the rest of the file being encoded as UTF16-LE.
If you can, I recommend you to convert the file encoding using a decent editor (even Notepad can be used) to UTF-8 or ISO-8859-1 and all these problems vanish.
In Ruby, I'm reading an .ifc file to get some information, but I can't decode it. For example, the file content:
"'S\X2\00E9\X0\jour/Cuisine'"
should be:
"'Sรฉjour/Cuisine'"
I'm trying to encode it with:
puts ifcFileLine.encode("Windows-1252")
puts ifcFileLine.encode("ISO-8859-1")
puts ifcFileLine.encode("ISO-8859-5")
puts ifcFileLine.encode("iso-8859-1").force_encoding("utf-8")'
But nothing gives me what I need.
I don't know anything about IFC, but based solely on the page Denis linked to and your example input, this works:
ESCAPE_SEQUENCE_EXPR = /\\X2\\(.*?)\\X0\\/
def decode_ifc(str)
str.gsub(ESCAPE_SEQUENCE_EXPR) do
$1.gsub(/..../) { $&.to_i(16).chr(Encoding::UTF_8) }
end
end
str = 'S\X2\00E9\X0\jour/Cuisine'
puts "Input:", str
puts "Output:", decode_ifc(str)
All this code does is replace every sequence of four characters (/..../) between the delimiters, which will each be a Unicode code point in hexadecimal, with the corresponding Unicode character.
Note that this code handles only this specific encoding. A quick glance at the implementation guide shows other encodings, including an \X4 directive for Unicode characters outside the Basic Multilingual Plane. This ought to get you started, though.
See it on eval.in: https://eval.in/776980
If someone is interested, I wrote here a Python Code that decode 3 of the IFC encodings : \X, \X2\ and \S\
import re
def decodeIfc(txt):
# In regex "\" is hard to manage in Python... I use this workaround
txt = txt.replace('\\', 'ยตยตยต')
txt = re.sub('ยตยตยตX2ยตยตยต([0-9A-F]{4,})+ยตยตยตX0ยตยตยต', decodeIfcX2, txt)
txt = re.sub('ยตยตยตSยตยตยต(.)', decodeIfcS, txt)
txt = re.sub('ยตยตยตXยตยตยต([0-9A-F]{2})', decodeIfcX, txt)
txt = txt.replace('ยตยตยต','\\')
return txt
def decodeIfcX2(match):
# X2 encodes characters with multiple of 4 hexadecimal numbers.
return ''.join(list(map(lambda x : chr(int(x,16)), re.findall('([0-9A-F]{4})',match.group(1)))))
def decodeIfcS(match):
return chr(ord(match.group(1))+128)
def decodeIfcX(match):
# Sometimes, IFC files were made with old Mac... wich use MacRoman encoding.
num = int(match.group(1), 16)
if (num <= 127) | (num >= 160):
return chr(num)
else:
return bytes.fromhex(match.group(1)).decode("macroman")
Is there a way to programmatically get a list of characters a .ttf file supports using Ruby and/or Bash. I am trying to pipe the supported character codes into a text file for later processing.
(I would prefer not to use Font Forge.)
Found a Ruby gem called ttfunk which can be found here.
After a gem install ttfunk, you can get all unicode characters by running the following script:
require 'ttfunk'
file = TTFunk::File.open("path/to/font.ttf")
cmap = file.cmap
chars = {}
unicode_chars = []
cmap.tables.each do |subtable|
next if !subtable.unicode?
chars = chars.merge( subtable.code_map )
end
unicode_chars = chars.keys.map{ |dec| dec.to_s(16) }
puts "\n -- Found #{unicode_chars.length} characters in this font \n\n"
p unicode_chars
Which will output something like:
- Found 2815 characters in this font
["20", "21", "22", "23", ... , "fef8", "fef9", "fefa", "fefb", "fefc", "fffc", "ffff"]
Consider this Ruby code:
puts "*****"
puts " *"
puts " "
puts "*****"
puts " *"
My Output is like this:
*****
*
*****
*
Why the heck a whitespace doesn't fill the same space as * character in Scite?
I've tried it in Eclypse with Java and it works just fine.
Proportional fonts have characters of varying widths, ruining space-based alignment.
Switch to a monospace font (e.g., Courier) so all characters are the same size and it'll work.
In order to make it work in Scite You should add
style.errorlist.32=$(font.monospace) in
SciteUser.properties file