It appears that Go doesn't support all Unicode characters in its rune literals:
package main

import "fmt"

func main() {
	standardSuits := []rune{'♠️', '♣️', '♥️', '♦️'}
	fmt.Println(standardSuits)
}
Generates the following error:
./main.go:6: missing '
./main.go:6: invalid identifier character U+FE0F '️'
./main.go:6: syntax error: unexpected ️, expecting comma or }
./main.go:6: missing '
./main.go:6: invalid identifier character U+FE0F '️'
./main.go:6: missing '
./main.go:6: invalid identifier character U+FE0F '️'
./main.go:6: missing '
./main.go:6: invalid identifier character U+FE0F '️'
./main.go:6: missing '
./main.go:6: too many errors
Is there a way to get around this, or should I just live with this limitation and use something else?
It looks to me like a parsing issue. You could use the Unicode code points to produce the runes, which should give the same result as using the characters themselves.
package main

import "fmt"

func main() {
	standardSuits := []rune{'\u2660', '\u2663', '\u2665', '\u2666', '⌘'}
	fmt.Println(standardSuits)
}
Generates
[9824 9827 9829 9830 8984]
Playground link: https://play.golang.org/p/jTLsbs7DM1
I added a fifth rune to check whether a code point escape and a literal character give the same result. It looks like they do.
Edit:
Not sure what is wrong with your chars (did not view them in a hex editor, have none around), but something is strange about them.
I also got this to run by copy pasting the chars from Wikipedia:
package main

import "fmt"

func main() {
	standardSuits := []rune{'♠', '♣', '♥', '♦'}
	fmt.Println(standardSuits)
}
https://play.golang.org/p/CKR0u2_IIB
Each Unicode string you use in your source code consists of more than one "character" (code point), but a character constant '...' (a rune literal in Go) may not contain more than one. In more detail:
If I copy&paste your source code and print a hexdump, I can see the exact bytes in your source code:
>>> hexdump -C x.go
00000000 70 61 63 6b 61 67 65 20 6d 61 69 6e 0a 0a 69 6d |package main..im|
00000010 70 6f 72 74 20 22 66 6d 74 22 0a 0a 66 75 6e 63 |port "fmt"..func|
00000020 20 6d 61 69 6e 28 29 20 7b 0a 20 20 73 74 61 6e | main() {. stan|
00000030 64 61 72 64 53 75 69 74 73 20 3a 3d 20 5b 5d 72 |dardSuits := []r|
00000040 75 6e 65 7b 27 e2 99 a0 ef b8 8f 27 2c 20 27 e2 |une{'......', '.|
00000050 99 a3 ef b8 8f 27 2c 20 27 e2 99 a5 ef b8 8f 27 |.....', '......'|
00000060 2c 20 27 e2 99 a6 ef b8 8f 27 7d 0a 20 20 66 6d |, '......'}. fm|
00000070 74 2e 50 72 69 6e 74 6c 6e 28 73 74 61 6e 64 61 |t.Println(standa|
00000080 72 64 53 75 69 74 73 29 0a 7d 0a |rdSuits).}.|
This shows, for example, that your '♠️' is encoded as the hex bytes e2 99 a0 ef b8 8f. In UTF-8 this corresponds to two(!) code points, \u2660 and \uFE0F. This is not obvious from looking at the code, since \uFE0F (VARIATION SELECTOR-16, which requests emoji-style presentation of the preceding character) is not a printable character, but Go complains because there is more than one code point in a character constant. Using '♠' or '\u2660' instead works as expected.
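The same check can be done without a hex dump. The following is a minimal Python sketch (Python rather than Go, purely for inspection; the helper names describe and strip_variation_selectors are made up for this illustration) that lists the code points hiding in a pasted symbol and drops any variation selectors:

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def strip_variation_selectors(text):
    """Drop U+FE00..U+FE0F so that each symbol is a single code point."""
    return "".join(ch for ch in text if not 0xFE00 <= ord(ch) <= 0xFE0F)


# The spade exactly as it appears in the question: U+2660 followed by U+FE0F.
suit = "\u2660\ufe0f"
print(describe(suit))
print(describe(strip_variation_selectors(suit)))
```

The first line printed shows two entries (the spade and the variation selector); the second shows only the spade, which is the form that fits in a rune literal.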
Related
I've got some code to create labels in Gmail, which usually works fine. But now the requirement is to create a label with Japanese characters, specifically "アーカイブ". I am encoding the json like this:
7B 0D 0A 22 6E 61 6D 65 22 3A 22 E3 82 A2 E3 83 {.."name":".....
BC E3 82 AB E3 82 A4 E3 83 96 22 2C 0D 0A 22 6D ..........",.."m
65 73 73 61 67 65 4C 69 73 74 56 69 73 69 62 69 essageListVisibi
6C 69 74 79 22 3A 22 73 68 6F 77 22 2C 0D 0A 22 lity":"show",.."
6C 61 62 65 6C 4C 69 73 74 56 69 73 69 62 69 6C labelListVisibil
69 74 79 22 3A 22 6C 61 62 65 6C 53 68 6F 77 22 ity":"labelShow"
0D 0A 7D 0D 0A 00 00 00 00 00 00 00 00 00 00 00 ..}.............
As you can see, the first character is the UTF8 sequence E3 82 A2, which if you look at this table (https://www.utf8-chartable.de/unicode-utf8-table.pl?start=12352&names=-) seems to be correct for that first character. The others look OK also.
As a test, I created a Japanese folder with that name in the UI, then got a dump of the json that Gmail produces when I get a list of existing folders. What Gmail produces is exactly the same as what I'm trying to import. So I don't see what I could be doing wrong here. Any help appreciated.
Never mind this - turns out my Japanese characters translate to "Archive" which is apparently a reserved folder name.
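For reference, the byte-level check described in the question can also be done in a few lines instead of reading a chart. This is only a sketch that decodes the fifteen bytes of the label name copied from the dump above; it says nothing about how Gmail treats the name:

```python
# The bytes between the quotes of "name" in the dump, copied verbatim.
label_bytes = bytes.fromhex("E3 82 A2 E3 83 BC E3 82 AB E3 82 A4 E3 83 96")

# Decoding raises UnicodeDecodeError if the sequence is not valid UTF-8.
label = label_bytes.decode("utf-8")

print(label)
print([f"U+{ord(ch):04X}" for ch in label])
```

If the decode succeeds and the printed text matches the intended label, the encoding is fine and the problem lies elsewhere, as turned out to be the case here.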
I have a csv file with text and numbers.
If a number is bigger than 1000, formatted like this: 1 000,
so it appears to have a space as the thousands separator, but the character is not a regular space. I tried to remove it with sed, which worked wherever there was a real space, but not for this character.
It is not a TAB either; I removed all the TABs with "expand -t 1".
The following is a line that demonstrates the issue:
x17_Provident_GDN_REMARKETING_provident.hu_listák;Display_Hálózat;Szeged;2021-03-09;Kedd;Mobil;HUF;1 736;9;130.83;0.00
In penultimate row, in column 8: 1 736
is the problem.
And running this: grep -E -m 1 -e '[;]1[^;]+736[;]' <yourfile.csv | hexdump -C
gives:
00000000 78 31 37 5f 50 72 6f 76 69 64 65 6e 74 5f 47 44 |x17_Provident_GD|
00000010 4e 5f 52 45 4d 41 52 4b 45 54 49 4e 47 5f 70 72 |N_REMARKETING_pr|
00000020 6f 76 69 64 65 6e 74 2e 68 75 5f 6c 69 73 74 c3 |ovident.hu_list.|
00000030 a1 6b 3b 44 69 73 70 6c 61 79 5f 48 c3 a1 6c c3 |.k;Display_H..l.|
00000040 b3 7a 61 74 3b 53 7a 65 67 65 64 3b 32 30 32 31 |.zat;Szeged;2021|
00000050 2d 30 33 2d 30 39 3b 4b 65 64 64 3b 4d 6f 62 69 |-03-09;Kedd;Mobi|
00000060 6c 3b 48 55 46 3b 31 c2 a0 37 33 36 3b 39 3b 31 |l;HUF;1..736;9;1|
00000070 33 30 2e 38 33 3b 30 2e 30 30 0a |30.83;0.00.|
0000007b
It's a no-break space (U+00A0), which UTF-8 encodes as the two bytes c2 a0.
You can use perl to safely remove it.
perl -pe 's/\xc2\xa0//g' dirty.csv > clean.csv
Once we knew it was a no-break space, I simply removed it with sed on the Mac, typing the character with this key combination:
opt+space
cat test4.csv | sed 's/ //g'
Similar to perl, you can use GNU sed with LC_ALL=C:
LC_ALL=C sed 's/\xc2\xa0//g'
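If a scripting language is preferred over perl or sed, the same removal can be sketched in Python. The helper name strip_nbsp and the shortened sample row are assumptions made for this illustration; the bytes c2 a0 are the ones shown in the hexdump above:

```python
NBSP = "\u00a0"  # NO-BREAK SPACE; its UTF-8 encoding is the two bytes c2 a0


def strip_nbsp(text: str) -> str:
    """Remove every no-break space, leaving ordinary spaces untouched."""
    return text.replace(NBSP, "")


# A shortened version of the problem row, as raw bytes from the file.
raw = b"Mobil;HUF;1\xc2\xa0736;9;130.83;0.00\n"
cleaned = strip_nbsp(raw.decode("utf-8"))
print(cleaned, end="")
```

For a whole file, the same function could be applied line by line after opening the file with encoding="utf-8", so that the two bytes are decoded into the single character being replaced.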
I have a 240MB logfile from a PuTTY session. It was mistakenly logged in the "SSH packets and raw data" format instead of "All session output". If I open the file in a text editor, I can see the data I require (the plain text).
The problem is extracting that from the raw data.
For example:
Incoming raw data at 2016-01-06 15:47:42
00000000 e8 fd c2 d2 88 a9 39 b9 2a 77 2a 7b 4a 60 fc 21 ......9.*w*{J`.!
00000010 1d f5 fc d4 b1 58 1f 4d 68 a4 ef 83 03 39 59 b7 .....X.Mh....9Y.
00000020 41 be 36 7b b5 3c 10 fa 65 27 77 30 77 97 02 39 A.6{.<..e'w0w..9
00000030 46 4c 28 da 5c c6 2c 1e ae 33 db e1 a8 09 ea 4a FL(.\.,..3.....J
00000040 06 94 c6 eb 38 8e d3 d3 33 13 78 08 7c 5f 41 56 ....8...3.x.|_AV
00000050 f1 13 9e e1 ....
Incoming packet #0x31, type 94 / 0x5e (SSH2_MSG_CHANNEL_DATA)
00000000 00 00 01 00 00 00 00 20 64 69 73 61 62 6c 69 6e ....... disablin
00000010 67 20 61 20 72 75 6e 6e 69 6e 67 20 77 61 74 63 g a running watc
00000020 68 64 6f 67 2e 2e 0d 0a hdog....
Incoming raw data at 2016-01-06 15:47:42
00000000 dc 96 f3 54 f8 a8 5c 83 80 7b a8 07 da 79 95 50 ...T..\..{...y.P
00000010 3f 19 2f 0c f0 03 a1 01 a3 33 2f 97 75 9d 47 15 ?./......3/.u.G.
00000020 b9 95 df c6 66 e0 50 32 88 1e db 5b 73 1b 7b ad ....f.P2...[s.{.
I think what I need to do is read only the sections of the file labelled "Incoming packet". Then I can read the ascii character codes and convert to readable text (this will recover the tabs, linefeeds and carriage returns).
I'm not familiar with awk or sed, but I know a bit of grep. How can I go about firstly extracting the sections (of variable size) that I need to translate from ASCII codes to text?
sed -n '/^Incoming packet/,/^Incoming raw data/{//!p}'
This prints the lines between each "Incoming packet" line and the following "Incoming raw data" line, excluding the two matching lines themselves. Process this output further to get your desired output.
To print only the ASCII column (the last 17 characters) of each of those lines:
sed -n '/Incoming packet/,/Incoming raw data/{//!{s/^.*\(.\{17\}\)/\1/;p}}'
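Because the ASCII column replaces tabs, carriage returns and linefeeds with dots, recovering the original text means decoding the hex bytes instead. Below is a rough Python sketch of that idea. It assumes the layout shown in the question (an 8-digit offset, up to 16 hex bytes, then the ASCII column) and assumes each SSH2_MSG_CHANNEL_DATA dump starts with a 4-byte channel number and a 4-byte data length, which is what the sample packet suggests. The function names are made up for this illustration, and a short final line whose ASCII column happens to begin with a two-hex-digit word would be misread.

```python
import re

HEX_LINE = re.compile(r"^\s*[0-9a-f]{8}\s+(.*)$")
HEX_PAIR = re.compile(r"^[0-9a-f]{2}$")


def parse_hex_line(line):
    """Return the bytes of one dump line, or None if it is not a dump line."""
    match = HEX_LINE.match(line)
    if match is None:
        return None
    data = bytearray()
    for token in match.group(1).split():
        # Stop at the ASCII column: after 16 bytes or the first non-hex token.
        if len(data) == 16 or not HEX_PAIR.match(token):
            break
        data.append(int(token, 16))
    return bytes(data)


def extract_session_text(lines):
    """Concatenate the data of every incoming SSH2_MSG_CHANNEL_DATA packet."""
    chunks = []
    packet = None  # bytes of the packet being collected, or None

    def flush():
        nonlocal packet
        if packet is not None and len(packet) >= 8:
            # Skip the channel number (4 bytes), then read the data length.
            length = int.from_bytes(packet[4:8], "big")
            chunks.append(bytes(packet[8:8 + length]))
        packet = None

    for line in lines:
        data = parse_hex_line(line)
        if data is not None:
            if packet is not None:
                packet.extend(data)
            continue
        flush()  # any non-dump line ends the current section
        if line.startswith("Incoming packet") and "SSH2_MSG_CHANNEL_DATA" in line:
            packet = bytearray()
    flush()
    return b"".join(chunks)


sample = """\
Incoming raw data at 2016-01-06 15:47:42
00000050 f1 13 9e e1 ....
Incoming packet #0x31, type 94 / 0x5e (SSH2_MSG_CHANNEL_DATA)
00000000 00 00 01 00 00 00 00 20 64 69 73 61 62 6c 69 6e ....... disablin
00000010 67 20 61 20 72 75 6e 6e 69 6e 67 20 77 61 74 63 g a running watc
00000020 68 64 6f 67 2e 2e 0d 0a hdog....
Incoming raw data at 2016-01-06 15:47:42
"""

text = extract_session_text(sample.splitlines())
print(repr(text))
```

For the real 240MB log, passing the open file object instead of sample.splitlines() keeps memory use low, since the function only iterates over lines.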
I have a script that collects a bunch of file system object information (hashes, dates, etc) and stores it in a MySQL database (one row per object).
The script is running in Bash in Mac OS X 10.10.4 (MBP).
I would like to store the HFS+ extended attributes in the database as well. xattr gives output as shown below. I would like to drop the hex and formatting text, leaving just the attribute name and the ASCII value. This means not just removing the offsets, hex bytes, and | formatting characters, but also concatenating the value onto one line per attribute, with the attribute name prepended.
Note that each object (file/folder) may have multiple attributes and the attribute names are not defined.
Take this input:
$ xattr -l wordpress-3.9.6.zip
com.apple.metadata:kMDItemWhereFroms:
00000000 62 70 6C 69 73 74 30 30 A2 01 02 5F 10 29 68 74 |bplist00..._.)ht|
00000010 74 70 73 3A 2F 2F 77 6F 72 64 70 72 65 73 73 2E |tps://wordpress.|
00000020 6F 72 67 2F 77 6F 72 64 70 72 65 73 73 2D 33 2E |org/wordpress-3.|
00000030 39 2E 36 2E 7A 69 70 5F 10 2F 68 74 74 70 73 3A |9.6.zip_./https:|
00000040 2F 2F 77 6F 72 64 70 72 65 73 73 2E 6F 72 67 2F |//wordpress.org/|
00000050 64 6F 77 6E 6C 6F 61 64 2F 72 65 6C 65 61 73 65 |download/release|
00000060 2D 61 72 63 68 69 76 65 2F 08 0B 37 00 00 00 00 |-archive/..7....|
00000070 00 00 01 01 00 00 00 00 00 00 00 03 00 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 69 |...........i|
0000008c
com.apple.quarantine: 0001;55701556;Google Chrome.app;8AD80928-CB48-48EA-8A1B-EC4B0BE656A9
And make it look like this:
com.apple.metadata:kMDItemWhereFroms: bplist00..._.)https://wordpress.org/wordpress-3.9.6.zip_./https://wordpress.org/download/release-archive/..7...............................i
com.apple.quarantine: 0001;55701556;Google Chrome.app;8AD80928-CB48-48EA-8A1B-EC4B0BE656A9
Thanks for any help
MC
xattr is not very customizable; it's meant more for human browsing than scripted use. You're better off using another language. Here's an example in Python:
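import xattr  # the third-party "xattr" Python package, not the command-line tool

x = xattr.xattr('wordpress-3.9.6.zip')
# The xattr object behaves like a dict keyed by attribute name.
for name in x:
    print(name, repr(x[name]))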
You may want to drop the call to repr (or use a different wrapper around x[name]), depending on the desired output.
Note that you almost certainly do not want the . characters from the ASCII column of the xattr program's output, since each one stands in for a non-printable byte rather than a literal period.
I'm trying to write a script to extract the original download URL from disk images downloaded with Safari on OS X using xattr, so that I can rename them but still easily obtain their original names for reference.
This command prints the hex representation of the URL that the given file was downloaded from, as an example:
xattr -p com.apple.metadata:kMDItemWhereFroms *.dmg
gives
62 70 6C 69 73 74 30 30 A1 01 5F 10 4F 68 74 74
70 3A 2F 2F 61 64 63 64 6F 77 6E 6C 6F 61 64 2E
61 70 70 6C 65 2E 63 6F 6D 2F 4D 61 63 5F 4F 53
5F 58 2F 6D 61 63 5F 6F 73 5F 78 5F 31 30 2E 36
2E 31 5F 62 75 69 6C 64 5F 31 30 62 35 30 34 2F
30 34 31 35 30 37 33 61 2E 64 6D 67 08 0A 00 00
00 00 00 00 01 01 00 00 00 00 00 00 00 02 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 5C
The URL starts at the 14th byte (if I counted correctly) and is NULL terminated. How can I format this string so that I get a string output as follows:
http://adcdownload.apple.com/Mac_OS_X/mac_os_x_10.6.1_build_10b504/0415073a.dmg
(don't worry, this link doesn't work unless you're logged in to ADC)
...essentially, the same thing Finder will display in Get Info. I tried piping xattr's output to xxd but I'm not sure how to specify the offset so the string starts at the right place.
So, after looking at the binary data returned by xattr -p, I realized that it was actually a binary plist... hence "bplist" at the front of the data. For some reason I didn't notice this before, but in light of this, here's a proper solution that should work on every OS X from 10.5 to 10.8.
To avoid duplication, I'll link to the source instead of pasting it: https://github.com/jakepetroules/wherefrom
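As an illustration of the binary plist idea (a sketch only, not the code from the linked repository), Python's standard plistlib module can decode the bytes from the question directly. The function name decode_where_froms is made up for this example, and the hex text is copied from the dump in the question:

```python
import plistlib

# Hex dump of com.apple.metadata:kMDItemWhereFroms, as shown in the question.
XATTR_HEX = """
62 70 6C 69 73 74 30 30 A1 01 5F 10 4F 68 74 74
70 3A 2F 2F 61 64 63 64 6F 77 6E 6C 6F 61 64 2E
61 70 70 6C 65 2E 63 6F 6D 2F 4D 61 63 5F 4F 53
5F 58 2F 6D 61 63 5F 6F 73 5F 78 5F 31 30 2E 36
2E 31 5F 62 75 69 6C 64 5F 31 30 62 35 30 34 2F
30 34 31 35 30 37 33 61 2E 64 6D 67 08 0A 00 00
00 00 00 00 01 01 00 00 00 00 00 00 00 02 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 5C
"""


def decode_where_froms(hex_dump):
    """Turn the hex text into bytes and parse them as a binary plist."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    return plistlib.loads(raw)  # the format is detected from the "bplist00" header


urls = decode_where_froms(XATTR_HEX)
print(urls[0])
```

The plist holds an array, so files downloaded via a referring page may yield more than one entry; the first is the download URL in this sample.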