I wrote a data encryption tool. It works on Mac OS, but not on Ubuntu.
The following code shows the difference:
var crypto = require('crypto');
var k = '1234567890123456';
var v = '1234567890123456';
var alg = 'AES-128-CBC';
var buf = new Buffer('Hello world!');
console.log(buf);
var cipher = crypto.createCipheriv(alg, k, v);
var result = cipher.update(buf);
result += cipher.final();
buf = new Buffer(result, 'binary');
console.log(buf);
var decipher = crypto.createDecipheriv(alg, k, v);
decipher.setAutoPadding(auto_padding=false);
result = decipher.update(buf);
result += decipher.final();
buf = new Buffer(result, 'binary');
console.log(buf);
console.log(buf.toString());
Output on Mac:
<Buffer 48 65 6c 6c 6f 20 77 6f 72 6c 64 21>
<Buffer 17 0e 2d 73 94 bf d4 24 95 b3 a7 49 73 58 5e 3f>
<Buffer 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 04 04 04 04>
Hello world!
Output on Ubuntu:
<Buffer 48 65 6c 6c 6f 20 77 6f 72 6c 64 21>
<Buffer 17 0e 2d 73 fd fd fd 24 fd fd fd 49 73 58 5e 3f>
<Buffer 05 6d 69 fd fd 1b 49 62 60 39 fd 68 fd fd fd>
mi��Ib`9�h���
Any ideas? Thanks.
Node 0.10.0 introduced some internal changes to the crypto module which might break existing code.
With the following fix (as suggested by http://nodejs.org/api/crypto.html#crypto_recent_api_changes) it works on my Debian machine:
var crypto = require('crypto');
crypto.DEFAULT_ENCODING = 'binary';
...
(Thanks to @user568109 for making me read the page!)
The aforementioned page also makes suggestions to permanently fix your code, as setting crypto.DEFAULT_ENCODING is considered to be a temporary measure.
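To see where the fd bytes in the Ubuntu output come from, here is a minimal Python sketch of the round trip. It assumes that `result += cipher.final()` turns the Buffers into UTF-8 strings, and that `new Buffer(result, 'binary')` then keeps only the low byte of each character; the function names are made up for this illustration.

```python
def lossy_roundtrip(data: bytes) -> bytes:
    """Mimic bytes -> UTF-8 string -> 'binary' bytes (assumed Node 0.10 behaviour)."""
    # Bytes that are not valid UTF-8 are replaced by U+FFFD.
    text = data.decode("utf-8", errors="replace")
    # 'binary' keeps only the low byte of each character: U+FFFD -> 0xFD.
    return bytes(ord(ch) & 0xFF for ch in text)


def binary_roundtrip(data: bytes) -> bytes:
    """Mimic bytes -> 'binary' (Latin-1) string -> bytes, which loses nothing."""
    return data.decode("latin-1").encode("latin-1")


# Ciphertext taken from the Mac output in the question.
CIPHERTEXT = bytes.fromhex("170e2d7394bfd42495b3a74973585e3f")

if __name__ == "__main__":
    print(lossy_roundtrip(CIPHERTEXT).hex())
    print(binary_roundtrip(CIPHERTEXT).hex())
```

The lossy variant reproduces the second Buffer from the Ubuntu output, which suggests the ciphertext itself was fine and was only damaged by the string conversion.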
I have configured my Flume job with Kafka as the source. The data arriving in my folder is not readable. Is there something that needs to be changed in my Flume configuration?
Flume job config:
#source
MY_AGENT.sources.my-source.type = org.apache.flume.source.kafka.KafkaSource
MY_AGENT.sources.my-source.channels = my-channel
MY_AGENT.sources.my-source.batchSize = 10000
MY_AGENT.sources.my-source.useFlumeEventFormat = false
MY_AGENT.sources.my-source.batchDurationMillis = 5000
MY_AGENT.sources.my-source.kafka.bootstrap.servers =${BOOTSTRAP_SERVERS}
MY_AGENT.sources.my-source.kafka.topics = my-topic
MY_AGENT.sources.my-source.kafka.consumer.group.id = my-topic_grp
MY_AGENT.sources.my-source.kafka.consumer.client.id = my-topic_clnt
MY_AGENT.sources.my-source.kafka.compressed.topics = my-topic
MY_AGENT.sources.my-source.kafka.auto.commit.enable = false
MY_AGENT.sources.my-source.kafka.consumer.session.timeout.ms=100000
MY_AGENT.sources.my-source.kafka.consumer.request.timeout.ms=120000
MY_AGENT.sources.my-source.kafka.consumer.max.partition.fetch.bytes=704857
MY_AGENT.sources.my-source.kafka.consumer.auto.offset.reset=latest
#channel
MY_AGENT.channels.my-channel.type = memory
MY_AGENT.channels.my-channel.capacity = 100000000
MY_AGENT.channels.my-channel.transactionCapacity = 100000
MY_AGENT.channels.my-channel.parseAsFlumeEvent = false
#Sink
MY_AGENT.sinks.my-sink.channel = my-channel
MY_AGENT.sinks.my-sink.type = hdfs
MY_AGENT.sinks.my-sink.hdfs.writeFormat= Text
MY_AGENT.sinks.my-sink.hdfs.fileType = DataStream
MY_AGENT.sinks.my-sink.hdfs.kerberosPrincipal =${user}
MY_AGENT.sinks.my-sink.hdfs.kerberosKeytab =${keytab}
MY_AGENT.sinks.my-sink.hdfs.useLocalTimeStamp = true
MY_AGENT.sinks.my-sink.hdfs.path = hdfs://nameservice1/my_hdfs/my_table1/timestamp=%Y%m%d
MY_AGENT.sinks.my-sink.hdfs.rollCount=0
MY_AGENT.sinks.my-sink.hdfs.rollSize=0
MY_AGENT.sinks.my-sink.hdfs.batchSize=100000
MY_AGENT.sinks.my-sink.hdfs.maxOpenFiles=2000
MY_AGENT.sinks.my-sink.hdfs.callTimeout=50000
MY_AGENT.sinks.my-sink.hdfs.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
MY_AGENT.sinks.my-sink.hdfs.schema.registry.url = ${SCHEMA_URL}
Output data:
0000000: 53 45 51 06 21 6f 72 67 2e 61 70 61 63 68 65 2e SEQ.!org.apache.
0000010: 68 61 64 6f 6f 70 2e 69 6f 2e 4c 6f 6e 67 57 72 hadoop.io.LongWr
0000020: 69 74 61 62 6c 65 22 6f 72 67 2e 61 70 61 63 68 itable"org.apach
0000030: 65 2e 68 61 64 6f 6f 70 2e 69 6f 2e 42 79 74 65 e.hadoop.io.Byte
0000040: 73 57 72 69 74 61 62 6c 65 00 00 00 00 00 00 85 sWritable.......
0000050: a6 6f 46 0c f4 16 33 a6 eb 43 c2 21 5c 1b 4f 00 .oF...3..C.!\.O.
0000060: 00 00 18 00 00 00 08 00 00 01 4d c6 1b 01 1f 00 ..........M.....
0000070: 00 00 0c 48 65 6c 6c 6f 20 48 44 46 53 21 0d ...Hello HDFS!.
This is the kind of output I'm getting. I was expecting JSON-like results. Is there something that needs to be changed in my Flume config file?
I have a CSV file with text and numbers.
Numbers bigger than 1000 are formatted like this: 1 000,
so they appear to have a space as the thousands separator, but it is not a real space. I tried to remove it with sed, which worked wherever there was a real space, but not for this character.
It is also not a TAB; I removed all the TABs with "expand -t 1".
The following is a line that demonstrates the issue:
x17_Provident_GDN_REMARKETING_provident.hu_listák;Display_Hálózat;Szeged;2021-03-09;Kedd;Mobil;HUF;1 736;9;130.83;0.00
In the penultimate row, the value in column 8 (1 736) is the problem.
And running this: grep -E -m 1 -e '[;]1[^;]+736[;]' <yourfile.csv | hexdump -C
gives:
00000000 78 31 37 5f 50 72 6f 76 69 64 65 6e 74 5f 47 44 |x17_Provident_GD|
00000010 4e 5f 52 45 4d 41 52 4b 45 54 49 4e 47 5f 70 72 |N_REMARKETING_pr|
00000020 6f 76 69 64 65 6e 74 2e 68 75 5f 6c 69 73 74 c3 |ovident.hu_list.|
00000030 a1 6b 3b 44 69 73 70 6c 61 79 5f 48 c3 a1 6c c3 |.k;Display_H..l.|
00000040 b3 7a 61 74 3b 53 7a 65 67 65 64 3b 32 30 32 31 |.zat;Szeged;2021|
00000050 2d 30 33 2d 30 39 3b 4b 65 64 64 3b 4d 6f 62 69 |-03-09;Kedd;Mobi|
00000060 6c 3b 48 55 46 3b 31 c2 a0 37 33 36 3b 39 3b 31 |l;HUF;1..736;9;1|
00000070 33 30 2e 38 33 3b 30 2e 30 30 0a |30.83;0.00.|
0000007b
It's a non-breaking space (U+00A0), which UTF-8 encodes as the two bytes c2 a0.
You can use perl to safely remove it:
perl -pe 's/\xc2\xa0//g' dirty.csv > clean.csv
Once we knew it was a no-break space, I simply removed it with sed on a Mac, typing the character with Option+Space:
cat test4.csv | sed 's/ //g'
Similar to perl, you can use GNU sed with LC_ALL=C:
LC_ALL=C sed 's/\xc2\xa0//g'
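For comparison, the same byte-level removal can be sketched in Python. This is only an illustration; the function name `strip_nbsp` and the sample line are made up here, and the sample is a shortened version of the row from the question.

```python
NBSP_UTF8 = b"\xc2\xa0"  # UTF-8 encoding of U+00A0 (no-break space)


def strip_nbsp(raw: bytes) -> bytes:
    """Remove every UTF-8 encoded no-break space from raw bytes."""
    return raw.replace(NBSP_UTF8, b"")


# Shortened sample row; \u00a0 is the invisible separator inside "1 736".
SAMPLE = "Mobil;HUF;1\u00a0736;9;130.83;0.00\n".encode("utf-8")

if __name__ == "__main__":
    print(strip_nbsp(SAMPLE))
```

Working on bytes, like the perl and LC_ALL=C sed commands above, avoids any dependence on the current locale.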
I have a 240MB logfile from a PuTTY session. It was mistakenly logged in the "SSH packets and raw data" format instead of "All session output". If I open the file in a text editor, I can see that the data I require (the plain text) is there.
The problem is extracting that from the raw data.
For example:
Incoming raw data at 2016-01-06 15:47:42
00000000 e8 fd c2 d2 88 a9 39 b9 2a 77 2a 7b 4a 60 fc 21 ......9.*w*{J`.!
00000010 1d f5 fc d4 b1 58 1f 4d 68 a4 ef 83 03 39 59 b7 .....X.Mh....9Y.
00000020 41 be 36 7b b5 3c 10 fa 65 27 77 30 77 97 02 39 A.6{.<..e'w0w..9
00000030 46 4c 28 da 5c c6 2c 1e ae 33 db e1 a8 09 ea 4a FL(.\.,..3.....J
00000040 06 94 c6 eb 38 8e d3 d3 33 13 78 08 7c 5f 41 56 ....8...3.x.|_AV
00000050 f1 13 9e e1 ....
Incoming packet #0x31, type 94 / 0x5e (SSH2_MSG_CHANNEL_DATA)
00000000 00 00 01 00 00 00 00 20 64 69 73 61 62 6c 69 6e ....... disablin
00000010 67 20 61 20 72 75 6e 6e 69 6e 67 20 77 61 74 63 g a running watc
00000020 68 64 6f 67 2e 2e 0d 0a hdog....
Incoming raw data at 2016-01-06 15:47:42
00000000 dc 96 f3 54 f8 a8 5c 83 80 7b a8 07 da 79 95 50 ...T..\..{...y.P
00000010 3f 19 2f 0c f0 03 a1 01 a3 33 2f 97 75 9d 47 15 ?./......3/.u.G.
00000020 b9 95 df c6 66 e0 50 32 88 1e db 5b 73 1b 7b ad ....f.P2...[s.{.
I think what I need to do is read only the sections of the file labelled "Incoming packet". Then I can read the ASCII character codes and convert them to readable text (this will recover the tabs, linefeeds and carriage returns).
I'm not familiar with awk or sed, but I know a bit of grep. How can I first extract the sections (of variable size) that I need to translate from ASCII codes to text?
sed -n '/^Incoming packet/,/^Incoming raw data/{//!p}'
This prints the lines between a line matching Incoming packet and the next line matching Incoming raw data, excluding the matching lines themselves. Process this output further to get your desired result.
To print only the ASCII column (the last 17 characters) of each of those lines:
sed -n '/Incoming packet/,/Incoming raw data/{//!{s/^.*\(.\{17\}\)/\1/;p}}'
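The ASCII column shows unprintable bytes such as tabs and line endings as dots, so recovering them means decoding the hex column instead. Below is a rough Python sketch of that idea. It assumes each SSH2_MSG_CHANNEL_DATA dump begins with a 4-byte channel number and a 4-byte length (this matches the sample, where 00 00 00 20 is followed by 32 bytes of text), and it parses the dump by whitespace-separated tokens. The function name is hypothetical. A short final line whose ASCII column happens to start with a token of two hex digits could be misread, so treat this as a starting point only.

```python
import re
import struct

OFFSET = re.compile(r"^[0-9a-fA-F]{8}$")    # e.g. 00000010
HEX_PAIR = re.compile(r"^[0-9a-fA-F]{2}$")  # e.g. 6e


def extract_channel_data(lines):
    """Return the concatenated data bytes of all incoming CHANNEL_DATA packets."""
    packets = []
    collecting = False
    for line in lines:
        tokens = line.split()
        if line.startswith("Incoming packet"):
            collecting = "SSH2_MSG_CHANNEL_DATA" in line
            if collecting:
                packets.append(bytearray())
        elif collecting and tokens and OFFSET.match(tokens[0]):
            # At most 16 hex pairs follow the offset; stop at the ASCII column.
            for token in tokens[1:17]:
                if not HEX_PAIR.match(token):
                    break
                packets[-1].append(int(token, 16))
        else:
            collecting = False  # any other line ends the current dump

    out = bytearray()
    for packet in packets:
        if len(packet) >= 8:
            # Assumed layout: uint32 channel, uint32 length, then the data.
            (length,) = struct.unpack(">I", packet[4:8])
            out += packet[8:8 + length]
    return bytes(out)


SAMPLE = """Incoming packet #0x31, type 94 / 0x5e (SSH2_MSG_CHANNEL_DATA)
00000000 00 00 01 00 00 00 00 20 64 69 73 61 62 6c 69 6e ....... disablin
00000010 67 20 61 20 72 75 6e 6e 69 6e 67 20 77 61 74 63 g a running watc
00000020 68 64 6f 67 2e 2e 0d 0a hdog....
Incoming raw data at 2016-01-06 15:47:42
00000000 dc 96 f3 54 f8 a8 5c 83 80 7b a8 07 da 79 95 50 ...T..\\..{...y.P
""".splitlines()

if __name__ == "__main__":
    print(extract_channel_data(SAMPLE))
```

For a 240MB log, passing the open file object instead of a list lets the function read it line by line.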
I found that wcslen() in VC++ 2010 returns the correct number of letters, whereas Xcode does not.
For example, the code below correctly returns 11 in VC++ 2010, but incorrectly returns 17 in Xcode 4.2.
const wchar_t *p = L"123abc가1나1다";
size_t plen = wcslen(p);
I guess Xcode stores the wchar_t string as UTF-8 in memory, which is another strange thing.
How can I get 11 in Xcode too, just like in VC++?
I ran this program on a Mac Mini running MacOS X 10.7.2 (Xcode 4.2):
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    const wchar_t p[] = L"123abc가1나1다";
    size_t plen = wcslen(p);
    if (fwide(stdout, 1) <= 0)
    {
        fprintf(stderr, "Failed to make stdout wide-oriented\n");
        return -1;
    }
    wprintf(L"String <<%ls>>\n", p);
    putwc(L'\n', stdout);
    wprintf(L"Length = %zu\n", plen);
    for (size_t i = 0; i < sizeof(p)/sizeof(*p); i++)
        wprintf(L"Character %zu = 0x%X\n", i, p[i]);
    return 0;
}
When I do a hex dump of the source file, I see:
0x0000: 23 69 6E 63 6C 75 64 65 20 3C 73 74 64 69 6F 2E #include <stdio.
0x0010: 68 3E 0A 23 69 6E 63 6C 75 64 65 20 3C 77 63 68 h>.#include <wch
0x0020: 61 72 2E 68 3E 0A 0A 69 6E 74 20 6D 61 69 6E 28 ar.h>..int main(
0x0030: 76 6F 69 64 29 0A 7B 0A 20 20 20 20 63 6F 6E 73 void).{. cons
0x0040: 74 20 77 63 68 61 72 5F 74 20 70 5B 5D 20 3D 20 t wchar_t p[] =
0x0050: 4C 22 31 32 33 61 62 63 EA B0 80 31 EB 82 98 31 L"123abc...1...1
0x0060: EB 8B A4 22 3B 0A 20 20 20 20 73 69 7A 65 5F 74 ...";. size_t
0x0070: 20 70 6C 65 6E 20 3D 20 77 63 73 6C 65 6E 28 70 plen = wcslen(p
0x0080: 29 3B 0A 20 20 20 20 69 66 20 28 66 77 69 64 65 );. if (fwide
0x0090: 28 73 74 64 6F 75 74 2C 20 31 29 20 3C 3D 20 30 (stdout, 1) <= 0
0x00A0: 29 0A 20 20 20 20 7B 0A 20 20 20 20 20 20 20 20 ). {.
0x00B0: 66 70 72 69 6E 74 66 28 73 74 64 65 72 72 2C 20 fprintf(stderr,
0x00C0: 22 46 61 69 6C 65 64 20 74 6F 20 6D 61 6B 65 20 "Failed to make
0x00D0: 73 74 64 6F 75 74 20 77 69 64 65 2D 6F 72 69 65 stdout wide-orie
0x00E0: 6E 74 65 64 5C 6E 22 29 3B 0A 20 20 20 20 20 20 nted\n");.
0x00F0: 20 20 72 65 74 75 72 6E 20 2D 31 3B 0A 20 20 20 return -1;.
0x0100: 20 7D 0A 20 20 20 20 77 70 72 69 6E 74 66 28 4C }. wprintf(L
0x0110: 22 53 74 72 69 6E 67 20 3C 3C 25 6C 73 3E 3E 5C "String <<%ls>>\
0x0120: 6E 22 2C 20 70 29 3B 0A 20 20 20 20 70 75 74 77 n", p);. putw
0x0130: 63 28 4C 27 5C 6E 27 2C 20 73 74 64 6F 75 74 29 c(L'\n', stdout)
0x0140: 3B 0A 20 20 20 20 77 70 72 69 6E 74 66 28 4C 22 ;. wprintf(L"
0x0150: 4C 65 6E 67 74 68 20 3D 20 25 7A 75 5C 6E 22 2C Length = %zu\n",
0x0160: 20 70 6C 65 6E 29 3B 0A 20 20 20 20 66 6F 72 20 plen);. for
0x0170: 28 73 69 7A 65 5F 74 20 69 20 3D 20 30 3B 20 69 (size_t i = 0; i
0x0180: 20 3C 20 73 69 7A 65 6F 66 28 70 29 2F 73 69 7A < sizeof(p)/siz
0x0190: 65 6F 66 28 2A 70 29 3B 20 69 2B 2B 29 0A 20 20 eof(*p); i++).
0x01A0: 20 20 20 20 20 20 77 70 72 69 6E 74 66 28 4C 22 wprintf(L"
0x01B0: 43 68 61 72 61 63 74 65 72 20 25 7A 75 20 3D 20 Character %zu =
0x01C0: 30 78 25 58 5C 6E 22 2C 20 69 2C 20 70 5B 69 5D 0x%X\n", i, p[i]
0x01D0: 29 3B 0A 20 20 20 20 72 65 74 75 72 6E 20 30 3B );. return 0;
0x01E0: 0A 7D 0A .}.
0x01E3:
The output when compiled with GCC is:
String <<123abc
Length = 11
Character 0 = 0x31
Character 1 = 0x32
Character 2 = 0x33
Character 3 = 0x61
Character 4 = 0x62
Character 5 = 0x63
Character 6 = 0xAC00
Character 7 = 0x31
Character 8 = 0xB098
Character 9 = 0x31
Character 10 = 0xB2E4
Character 11 = 0x0
Note that the string is truncated at the zero byte - I think that is probably a bug in the system, but it seems a little unlikely that I'd manage to find one on my first attempt at using wprintf(), so it is more likely I'm doing something wrong.
You're right: in the UTF-8 source code, the string occupies 17 bytes (8 one-byte ASCII characters, plus 3 characters encoded in 3 bytes each). So a raw strlen() on the source string would return 17.
GCC version is:
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Just for giggles, I tried clang, and I get a different result. Compiled using:
clang -o row row.c -Wall -std=c99
using:
Apple clang version 2.1 (tags/Apple/clang-163.7.1) (based on LLVM 3.0svn)
Target: x86_64-apple-darwin11.3.0
Thread model: posix
The output when compiled with clang is:
String <<123abc가1나1다>>
Length = 17
Character 0 = 0x31
Character 1 = 0x32
Character 2 = 0x33
Character 3 = 0x61
Character 4 = 0x62
Character 5 = 0x63
Character 6 = 0xEA
Character 7 = 0xB0
Character 8 = 0x80
Character 9 = 0x31
Character 10 = 0xEB
Character 11 = 0x82
Character 12 = 0x98
Character 13 = 0x31
Character 14 = 0xEB
Character 15 = 0x8B
Character 16 = 0xA4
Character 17 = 0x0
So, now the string appears correctly, but the length is given as 17 instead of 11. Superficially, you can take your choice of bugs - string looks OK (in a terminal - /Applications/Utilities/Terminal - acclimatized to UTF8) but length is wrong, or length is right but string does not appear correctly.
I note that sizeof(wchar_t) in both gcc and clang is 4.
The left hand does not understand what the right hand is doing. I think there's a case for claiming both are broken, in different ways.
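The two lengths reported above correspond to two ways of counting the same string. Here is a short Python sketch of the arithmetic, for illustration only; it says nothing about what either compiler does internally.

```python
TEXT = "123abc가1나1다"

# Counting Unicode code points, as in the GCC output.
CODE_POINTS = [ord(ch) for ch in TEXT]

# Counting UTF-8 bytes, as in the clang output and the hex dump of the source.
UTF8_BYTES = TEXT.encode("utf-8")

if __name__ == "__main__":
    print(len(CODE_POINTS), [hex(cp) for cp in CODE_POINTS])
    print(len(UTF8_BYTES), UTF8_BYTES.hex())
```

Eight ASCII characters at one byte each plus three Hangul syllables at three bytes each give 8 + 9 = 17 bytes, against 11 code points.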
I hope you can give me an idea about what's going wrong.
The scenario:
I run gitweb (CGI) with a script in FastCGI mode:
#!/bin/sh
export FCGI_SOCKET_PATH=127.0.0.1:7001
su git -c "/var/www/vh_[vhost]/htdocs/gitweb.cgi --fastcgi &"
Then I use nginx to serve that content:
...
fastcgi_pass 127.0.0.1:7001;
...
Everything works as expected, but here's the problem:
$ wget "http://git.[host].de/?p=[repo].git;a=summary" -O /tmp/test.txt && file --mime-encoding /tmp/test.txt
> /tmp/test.txt: iso-8859-1
$ su git -c "./gitweb.cgi \"?p=[repo].git;a=summary\" > ./test" && file --mime-encoding ./test
> ./test: utf-8
Which obviously means that the FastCGI output is UTF-8 while the content served by nginx is ISO-8859-1.
Firebug's response headers:
Server nginx
Date Fri, 02 Sep 2011 14:14:08 GMT
Content-Type application/xhtml+xml; charset=utf-8
Transfer-Encoding chunked
Connection close
It looks like the transfer using the socket leads to an encoding problem.
I've tested a lot but can't figure out how to solve this.
Although you aren't using PHP, I found the fix for my issue by wrapping the pieces that were being output as ISO-8859-1 with utf8_encode(): http://php.net/manual/en/function.utf8-encode.php
If your CGI is in Perl, maybe http://perldoc.perl.org/utf8.html will solve your problem. It solved mine ... Zürich
Another option could be to add the following to the http { } block in your nginx.conf:
charset utf-8;
I got it to work by using fcgiwrap.
I thought some environment variables were different between the two methods, so I added the following code to the gitweb.cgi dispatch() sub:
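# Dump the environment seen by gitweb to a temporary file.
open my $tmplogfile, ">", "/tmp/gitweb-env.txt"
    or die "Cannot open /tmp/gitweb-env.txt: $!";
foreach my $varkey (sort keys %ENV) {
    print $tmplogfile "$varkey = $ENV{$varkey}\n";
}
close $tmplogfile;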
but the environments were the same.
fcgiwrap must be doing something differently, but I have not yet found what.
Here are the commands I use and the differences I found using tcpdump on the fcgi socket:
# gitweb spawned by fcgiwrap outputs utf-8
/usr/bin/spawn-fcgi -d /usr/share/gitweb -a 127.0.0.1 -p 3000 -u www-data -g gitolite -P /run/gitweb/gitweb.cgi.pid -- /usr/sbin/fcgiwrap
# Require the following nginx gitweb_fastcgi_params
# fastcgi_param QUERY_STRING $query_string;
# fastcgi_param REQUEST_METHOD $request_method;
# fastcgi_param SCRIPT_NAME $fastcgi_script_name;
# fastcgi_param DOCUMENT_ROOT $document_root;
# With the following nginx configuration
# upstream gitweb {
#     server 127.0.0.1:3000;
# }
#
# server {
#     listen 80;
#
#     server_name git.example.net;
#
#     root /usr/share/gitweb;
#
#     access_log /var/log/nginx/gitweb-access.log;
#     error_log /var/log/nginx/gitweb-errors.log;
#
#     location / {
#         alias /usr/share/gitweb/gitweb.cgi;
#         include gitweb_fastcgi_params;
#         fastcgi_pass gitweb;
#     }
#
#     location /static {
#         alias /usr/share/gitweb/static;
#         expires 31d;
#     }
# }
# STDOUT captured on lo
# Begin of the FCGI answer
# 00000000 01 06 00 01 1f f8 00 00 53 74 61 74 75 73 3a 20 ........ Status:
# 00000010 32 30 30 20 4f 4b 0d 0a 43 6f 6e 74 65 6e 74 2d 200 OK.. Content-
# 00000020 54 79 70 65 3a 20 61 70 70 6c 69 63 61 74 69 6f Type: ap plicatio
# 00000030 6e 2f 78 68 74 6d 6c 2b 78 6d 6c 3b 20 63 68 61 n/xhtml+ xml; cha
# 00000040 72 73 65 74 3d 75 74 66 2d 38 0d 0a 0d 0a 3c 3f rset=utf -8....<?
# 00000050 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 2e 30 xml vers ion="1.0
# [...]
#
# "Guido Günther" as UTF-8
# 00000FA0 6c 65 3d 22 53 65 61 72 63 68 20 66 6f 72 20 63 le="Sear ch for c
# 00000FB0 6f 6d 6d 69 74 73 20 61 75 74 68 6f 72 65 64 20 ommits a uthored
# 00000FC0 62 79 20 47 75 69 64 6f 20 47 c3 bc 6e 74 68 65 by Guido G..nthe
# 00000FD0 72 22 20 63 6c 61 73 73 3d 22 6c 69 73 74 22 20 r" class ="list"
Before, gitweb --fastcgi was directly spawned by spawn-fcgi:
# gitweb spawned by spawn-fcgi outputs iso-8859-1
/usr/bin/spawn-fcgi -d /usr/share/gitweb -a 127.0.0.1 -p 3000 -u www-data -g gitolite -P /run/gitweb/gitweb.cgi.pid -- /usr/share/gitweb/gitweb.cgi --fastcgi
# STDOUT captured on lo
# Begin of the FCGI answer with "00 46 02" in place of "1f f8 00" for utf-8 output
# 00000000 01 06 00 01 00 46 02 00 53 74 61 74 75 73 3a 20 .....F.. Status:
# 00000010 32 30 30 20 4f 4b 0d 0a 43 6f 6e 74 65 6e 74 2d 200 OK.. Content-
# 00000020 54 79 70 65 3a 20 61 70 70 6c 69 63 61 74 69 6f Type: ap plicatio
# 00000030 6e 2f 78 68 74 6d 6c 2b 78 6d 6c 3b 20 63 68 61 n/xhtml+ xml; cha
# 00000040 72 73 65 74 3d 75 74 66 2d 38 0d 0a 0d 0a 00 00 rset=utf -8......
# 00000050 01 06 00 01 02 88 00 00 3c 3f 78 6d 6c 20 76 65 ........ <?xml ve
# 00000060 72 73 69 6f 6e 3d 22 31 2e 30 22 20 65 6e 63 6f rsion="1 .0" enco
# 00000070 64 69 6e 67 3d 22 75 74 66 2d 38 22 3f 3e 0a 3c ding="ut f-8"?>.<
# [...]
#
# "Guido Günther" as ISO-8859-1
# 00001128 74 6c 65 3d 22 53 65 61 72 63 68 20 66 6f 72 20 tle="Sea rch for
# 00001138 63 6f 6d 6d 69 74 73 20 61 75 74 68 6f 72 65 64 commits authored
# 00001148 20 62 79 20 47 75 69 64 6f 20 47 fc 6e 74 68 65 by Guid o G.nthe
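The difference between the two captures comes down to how the ü in "Guido Günther" is encoded. A small Python sketch, for illustration only (the helper name `looks_like_utf8` is made up here), reproduces both byte sequences and shows a simple validity check similar in spirit to what file --mime-encoding reports:

```python
NAME = "Guido Günther"

AS_UTF8 = NAME.encode("utf-8")         # ü becomes the two bytes c3 bc
AS_LATIN1 = NAME.encode("iso-8859-1")  # ü becomes the single byte fc


def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(AS_UTF8.hex(), looks_like_utf8(AS_UTF8))
    print(AS_LATIN1.hex(), looks_like_utf8(AS_LATIN1))
```

A lone fc byte is never valid UTF-8, so the second capture cannot be UTF-8 even though its Content-Type header still declares charset=utf-8.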