Creating Gmail labels with Japanese characters - utf-8

I've got some code to create labels in Gmail, which usually works fine. But now the requirement is to create a label with Japanese characters, specifically "アーカイブ". I am encoding the json like this:
7B 0D 0A 22 6E 61 6D 65 22 3A 22 E3 82 A2 E3 83 {.."name":".....
BC E3 82 AB E3 82 A4 E3 83 96 22 2C 0D 0A 22 6D ..........",.."m
65 73 73 61 67 65 4C 69 73 74 56 69 73 69 62 69 essageListVisibi
6C 69 74 79 22 3A 22 73 68 6F 77 22 2C 0D 0A 22 lity":"show",.."
6C 61 62 65 6C 4C 69 73 74 56 69 73 69 62 69 6C labelListVisibil
69 74 79 22 3A 22 6C 61 62 65 6C 53 68 6F 77 22 ity":"labelShow"
0D 0A 7D 0D 0A 00 00 00 00 00 00 00 00 00 00 00 ..}.............
As you can see, the first character is the UTF8 sequence E3 82 A2, which if you look at this table (https://www.utf8-chartable.de/unicode-utf8-table.pl?start=12352&names=-) seems to be correct for that first character. The others look OK also.
As a test, I created a Japanese folder with that name in the UI, then got a dump of the json that Gmail produces when I get a list of existing folders. What Gmail produces is exactly the same as what I'm trying to import. So I don't see what I could be doing wrong here. Any help appreciated.

Never mind this - turns out my Japanese characters translate to "Archive" which is apparently a reserved folder name.

Related

Get more informative data set produced from Flume+Kafka

Kafka, I have configured my Flume Job with Kafka as the source I'm getting Data in my folder which is not informative is there something needs to be changed in my configuration flume
FLume JOb Config
#source
MY_AGENT.sources.my-source.type = org.apache.flume.source.kafka.KafkaSource
MY_AGENT.sources.my-source.channels = my-channel
MY_AGENT.sources.my-source.batchSize = 10000
MY_AGENT.sources.my-source.useFlumeEventFormat = false
MY_AGENT.sources.my-source.batchDurationMillis = 5000
MY_AGENT.sources.my-source.kafka.bootstrap.servers =${BOOTSTRAP_SERVERS}
MY_AGENT.sources.my-source.kafka.topics = my-topic
MY_AGENT.sources.my-source.kafka.consumer.group.id = my-topic_grp
MY_AGENT.sources.my-source.kafka.consumer.client.id = my-topic_clnt
MY_AGENT.sources.my-source.kafka.compressed.topics = my-topic
MY_AGENT.sources.my-source.kafka.auto.commit.enable = false
MY_AGENT.sources.my-source.kafka.consumer.session.timeout.ms=100000
MY_AGENT.sources.my-source.kafka.consumer.request.timeout.ms=120000
MY_AGENT.sources.my-source.kafka.consumer.max.partition.fetch.bytes=704857
MY_AGENT.sources.my-source.kafka.consumer.auto.offset.reset=latest
#channel
MY_AGENT.channels.my-channel.type = memory
MY_AGENT.channels.my-channel.capacity = 100000000
MY_AGENT.channels.my-channel.transactionCapacity = 100000
MY_AGENT.channels.my-channel.parseAsFlumeEvent = false
#Sink
MY_AGENT.sinks.my-sink.channel = my-channel
MY_AGENT.sinks.my-sink.type = hdfs
MY_AGENT.sinks.my-sink.hdfs.writeFormat= Text
MY_AGENT.sinks.my-sink.hdfs.fileType = DataStream
MY_AGENT.sinks.my-sink.hdfs.kerberosPrincipal =${user}
MY_AGENT.sinks.my-sink.hdfs.kerberosKeytab =${keytab}
MY_AGENT.sinks.my-sink.hdfs.useLocalTimeStamp = true
MY_AGENT.sinks.my-sink.hdfs.path = hdfs://nameservice1/my_hdfs/my_table1/timestamp=%Y%m%d
MY_AGENT.sinks.my-sink.hdfs.rollCount=0
MY_AGENT.sinks.my-sink.hdfs.rollSize=0
MY_AGENT.sinks.my-sink.hdfs.batchSize=100000
MY_AGENT.sinks.my-sink.hdfs.maxOpenFiles=2000
MY_AGENT.sinks.my-sink.hdfs.callTimeout=50000
MY_AGENT.sinks.my-sink.hdfs.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
MY_AGENT.sinks.my-sink.hdfs.schema.registry.urr=l = ${SCHEMA_URL}
O/P of Data
0000000: 53 45 51 06 21 6f 72 67 2e 61 70 61 63 68 65 2e SEQ.!org.apache.
0000010: 68 61 64 6f 6f 70 2e 69 6f 2e 4c 6f 6e 67 57 72 hadoop.io.LongWr
0000020: 69 74 61 62 6c 65 22 6f 72 67 2e 61 70 61 63 68 itable"org.apach
0000030: 65 2e 68 61 64 6f 6f 70 2e 69 6f 2e 42 79 74 65 e.hadoop.io.Byte
0000040: 73 57 72 69 74 61 62 6c 65 00 00 00 00 00 00 85 sWritable.......
0000050: a6 6f 46 0c f4 16 33 a6 eb 43 c2 21 5c 1b 4f 00 .oF...3..C.!\.O.
0000060: 00 00 18 00 00 00 08 00 00 01 4d c6 1b 01 1f 00 ..........M.....
0000070: 00 00 0c 48 65 6c 6c 6f 20 48 44 46 53 21 0d ...Hello HDFS!.
This kind of Output I'm getting I was Expecting json kind of result can there is something needs to be changed in my config flume file

How can I remove non-breaking spaces from a text file in bash?

I have a csv file with text and numbers.
If a number is bigger than 1000, formatted like this: 1 000,
so it has a space as thousand separator, but it is not space. I tried to sed it, and it worked where real space was, but not in this format.
It is also not TAB, I removed all the TABs with "expand -t 1".
The following is a line that demonstrates the issue:
x17_Provident_GDN_REMARKETING_provident.hu_listák;Display_Hálózat;Szeged;2021-03-09;Kedd;Mobil;HUF;1 736;9;130.83;0.00
In penultimate row, in column 8: 1 736
is the problem.
And running this: grep -E -m 1 -e '[;]1[^;]+736[;]' <yourfile.csv | hexdump -C
gives:
00000000 78 31 37 5f 50 72 6f 76 69 64 65 6e 74 5f 47 44 |x17_Provident_GD|
00000010 4e 5f 52 45 4d 41 52 4b 45 54 49 4e 47 5f 70 72 |N_REMARKETING_pr|
00000020 6f 76 69 64 65 6e 74 2e 68 75 5f 6c 69 73 74 c3 |ovident.hu_list.|
00000030 a1 6b 3b 44 69 73 70 6c 61 79 5f 48 c3 a1 6c c3 |.k;Display_H..l.|
00000040 b3 7a 61 74 3b 53 7a 65 67 65 64 3b 32 30 32 31 |.zat;Szeged;2021|
00000050 2d 30 33 2d 30 39 3b 4b 65 64 64 3b 4d 6f 62 69 |-03-09;Kedd;Mobi|
00000060 6c 3b 48 55 46 3b 31 c2 a0 37 33 36 3b 39 3b 31 |l;HUF;1..736;9;1|
00000070 33 30 2e 38 33 3b 30 2e 30 30 0a |30.83;0.00.|
0000007b
It's a 2 byte, UTF-8 encoded non breaking space - c2 a0.
You can use perl to safely remove it.
perl -pe 's/\xc2\xa0//g' dirty.csv > clean.csv
After we know it is No break space, I simply sed it on mac with entry method:
opt+space
cat test4.csv | sed 's/ //g'
Similar to perl, you can use GNU sed with LC_ALL=C:
LC_ALL=C sed 's/\xc2\xa0//g'

Chunk size appears on Browser page

I'm implementing a small web server into a wifi micro. To aid in development and test, I have ported it to Windows console program.
I use chunked transfer processing. The following is what shows up on the browser:
0059
Hello World
0
The 59 is the hex size of the chunk and the 0 is the chunked terminating size
This is the data captured via wireshark:
This is the first message I send which are the headers
0000 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d HTTP/1.1 200 OK.
0010 0a 53 65 72 76 65 72 3a 20 54 72 61 6e 73 66 65 .Server: Transfe
0020 72 2d 45 6e 63 6f 64 69 6e 67 3a 20 63 68 75 6e r-Encoding: chun
0030 6b 65 64 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 70 ked..Content-Typ
0040 65 3a 20 74 65 78 74 2f 68 74 6d 6c 0d 0a 43 61 e: text/html..Ca
0050 63 68 65 2d 43 6f 6e 74 72 6f 6c 3a 20 6d 61 78 che-Control: max
0060 2d 61 67 65 3d 33 36 30 30 2c 20 6d 75 73 74 2d -age=3600, must-
0070 72 65 76 61 6c 69 64 61 74 65 0d 0a 0d 0a revalidate....
The next block is the chunked data
0000 30 30 35 39 0d 0a 3c 68 74 6d 6c 3e 0a 3c 68 65 0059..<html>.<he
0010 61 64 3e 3c 74 69 74 6c 65 3e 57 65 62 20 53 65 ad><title>Web Se
0020 72 76 65 72 3c 2f 74 69 74 6c 65 3e 0a 3c 2f 68 rver</title>.</h
0030 65 61 64 3e 0a 3c 62 6f 64 79 3e 0a 3c 68 31 3e ead>.<body>.<h1>
0040 48 65 6c 6c 6f 20 57 6f 72 6c 64 3c 2f 68 31 3e Hello World</h1>
0050 0a 3c 2f 62 6f 64 79 3e 3c 2f 68 74 6d 6c 3e 0d .</body></html>.
0060 0a 30 0d 0a 0d 0a .0....
The chunked values are being displayed on both Chrome and IE.
Can anyone see an issue with my data that would cause the issue.
Thanks
Solved:
I mistakenly remove the server name so now the browser is taking the transfer encoding as the server name and does not understand the chunked message size -- it thinks its just data to display.

Reformat xattr output and store it in MySQL using a BASH script

I have a script that collects a bunch of file system object information (hashes, dates, etc) and stores it in a MySQL database (one row per object).
The script is running in Bash in Mac OS X 10.10.4 (MBP).
I would like to store the HFS+ Extended Attributes in the database as well. xattr gives output as shown below, I would like to dump the hex and formatting text leaving just the attribute name and the ASCII value. This means not just dumping the line numbers, hex, and | formatting characters but also concatenate the value onto one line per attribute name with the attribute name prepended.
Note that each object (file/folder) may have multiple attributes and the attribute names are not defined.
Take this input:
$xattr -l wordpress-3.9.6.zip
com.apple.metadata:kMDItemWhereFroms:
00000000 62 70 6C 69 73 74 30 30 A2 01 02 5F 10 29 68 74 |bplist00..._.)ht|
00000010 74 70 73 3A 2F 2F 77 6F 72 64 70 72 65 73 73 2E |tps://wordpress.|
00000020 6F 72 67 2F 77 6F 72 64 70 72 65 73 73 2D 33 2E |org/wordpress-3.|
00000030 39 2E 36 2E 7A 69 70 5F 10 2F 68 74 74 70 73 3A |9.6.zip_./https:|
00000040 2F 2F 77 6F 72 64 70 72 65 73 73 2E 6F 72 67 2F |//wordpress.org/|
00000050 64 6F 77 6E 6C 6F 61 64 2F 72 65 6C 65 61 73 65 |download/release|
00000060 2D 61 72 63 68 69 76 65 2F 08 0B 37 00 00 00 00 |-archive/..7....|
00000070 00 00 01 01 00 00 00 00 00 00 00 03 00 00 00 00 |................|
00000080 00 00 00 00 00 00 00 00 00 00 00 69 |...........i|
0000008c
com.apple.quarantine: 0001;55701556;Google Chrome.app;8AD80928-CB48-48EA-8A1B-EC4B0BE656A9
And make it look like this:
com.apple.metadata:kMDItemWhereFroms: bplist00..._.)https://wordpress.org/wordpress-3.9.6.zip_./https://wordpress.org/download/release-archive/..7...............................i
com.apple.quarantine: 0001;55701556;Google Chrome.app;8AD80928-CB48-48EA-8A1B-EC4B0BE656A9
Thanks for any help
MC
xattr is not very customizable; it's meant more for human browsing than scripted use. You're better off using another language. Here's an example in Python:
import xattr
x = xattr.xattr('wordpress-3.9.6.zip')
for name, value in x:
print name, repr(x[name])
You may want to drop the call to repr (or use a different wrapper around x[name]), depending on the desired output.
Note that you almost certainly do not want the . from the ASCII output of the xattr program, since they represent any non-printable ASCII character.

Printing string representations of xattr hex output

I'm trying to write a script to extract the original download URL from disk images downloaded with Safari on OS X using xattr, so that I can rename them but still easily obtain their original names for reference.
This command prints the hex representation of the URL that the given file was downloaded from, as an example:
xattr -p com.apple.metadata:kMDItemWhereFroms *.dmg
gives
62 70 6C 69 73 74 30 30 A1 01 5F 10 4F 68 74 74
70 3A 2F 2F 61 64 63 64 6F 77 6E 6C 6F 61 64 2E
61 70 70 6C 65 2E 63 6F 6D 2F 4D 61 63 5F 4F 53
5F 58 2F 6D 61 63 5F 6F 73 5F 78 5F 31 30 2E 36
2E 31 5F 62 75 69 6C 64 5F 31 30 62 35 30 34 2F
30 34 31 35 30 37 33 61 2E 64 6D 67 08 0A 00 00
00 00 00 00 01 01 00 00 00 00 00 00 00 02 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 5C
The URL starts at the 14th byte (if I counted correctly) and is NULL terminated. How can I format this string so that I get a string output as follows:
http://adcdownload.apple.com/Mac_OS_X/mac_os_x_10.6.1_build_10b504/0415073a.dmg
(don't worry, this link doesn't work unless you're logged in to ADC)
...essentially, the same thing Finder will display in Get Info. I tried piping xattr's output to xxd but I'm not sure how to specify the offset so the string starts at the right place.
So, after looking at the binary data returned by xattr -p, I realized that it was actually a binary plist... hence "bplist" at the front of the data. For some reason I didn't notice this before, but in light of this, here's a proper solution that should work on every OS X from 10.5 to 10.8.
To avoid duplication, I'll link to the source instead of pasting it: https://github.com/jakepetroules/wherefrom

Resources