UTF-8 bytes to string - elisp

I'm trying to convert a list like '(110 111 101 204 136 108) to a string like "noël".
I tried using (mapcar (lambda (c) (decode-char 'unicode c)) '(110 111 101 204 136 108)), but it resulted in (110 111 101 204 136 108), the same as the input. (Also, I recognize that there's no way to decode a Unicode character from a single byte of UTF-8, so that's definitely the wrong function.)

A few options...
(with-temp-buffer
  (set-buffer-multibyte nil)
  (apply #'insert '(110 111 101 204 136 108))
  (decode-coding-region (point-min) (point-max) 'utf-8 t))
or:
(decode-coding-string
 (mapconcat #'byte-to-string '(110 111 101 204 136 108) "")
 'utf-8)
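or more directly (note that string-as-multibyte is marked obsolete in recent Emacs releases, so prefer decode-coding-string in new code):
(string-as-multibyte
 (apply #'unibyte-string '(110 111 101 204 136 108)))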
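For comparison outside Emacs, the same idea (treat the numbers as raw bytes, then decode the whole sequence as UTF-8 in one step) can be sketched in Python. The helper name decode_utf8_bytes is made up for this illustration.

```python
import unicodedata


def decode_utf8_bytes(values):
    """Decode a list of integers (each 0-255) as one UTF-8 byte sequence."""
    return bytes(values).decode("utf-8")


text = decode_utf8_bytes([110, 111, 101, 204, 136, 108])

# 204 136 (0xCC 0x88) encodes U+0308 COMBINING DIAERESIS, so the decoded
# string holds "e" followed by a combining mark. NFC normalization composes
# the pair into the single character U+00EB.
composed = unicodedata.normalize("NFC", text)

print(text, len(text), len(composed))
```

The decoded string has five code points, because the byte list spells the "ë" as a base letter plus a combining mark. Normalizing is only needed if a single precomposed character is wanted.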


zsh: no such file or directory error but file exist

I'm trying to run a compiler, but I get an error saying it cannot be found, even though the file exists and the path is correct. I even tried a different shell in case zsh was misconfigured, but got the same error. I'm lost as to what to do; any suggestions?
6909077c228a% ls -l toolchain/bin/armv7l-timesys-linux-gnueabi-gcc
-rwxr-xr-x 2 root root 2287465 Sep 11 13:19 toolchain/bin/armv7l-timesys-linux-gnueabi-gcc
6909077c228a% ./toolchain/bin/armv7l-timesys-linux-gnueabi-gcc
zsh: no such file or directory: ./toolchain/bin/armv7l-timesys-linux-gnueabi-gcc
#switch to bash
6909077c228a:~$ ./toolchain/bin/armv7l-timesys-linux-gnueabi-gcc
bash: ./toolchain/bin/armv7l-timesys-linux-gnueabi-gcc: No such file or directory
Edit:
Here is the output of the suggested check; I don't see any odd characters in the file name.
6909077c228a% ls -l toolchain/bin/armv7l-timesys-linux-gnueabi-gcc | od -xcb
0000000    722d    7877    2d72    7278    782d    3220    7220    6f6f
          -   r   w   x   r   -   x   r   -   x       2       r   o   o
        055 162 167 170 162 055 170 162 055 170 040 062 040 162 157 157
0000020    2074    6f72    746f    3220    3832    3437    3536    5320
          t       r   o   o   t       2   2   8   7   4   6   5       S
        164 040 162 157 157 164 040 062 062 070 067 064 066 065 040 123
0000040    7065    3120    2031    3331    313a    2039    6f74    6c6f
          e   p       1   1       1   3   :   1   9       t   o   o   l
        145 160 040 061 061 040 061 063 072 061 071 040 164 157 157 154
0000060    6863    6961    2f6e    6962    2f6e    7261    766d    6c37
          c   h   a   i   n   /   b   i   n   /   a   r   m   v   7   l
        143 150 141 151 156 057 142 151 156 057 141 162 155 166 067 154
0000100    742d    6d69    7365    7379    6c2d    6e69    7875    672d
          -   t   i   m   e   s   y   s   -   l   i   n   u   x   -   g
        055 164 151 155 145 163 171 163 055 154 151 156 165 170 055 147
0000120    756e    6165    6962    672d    6363    000a
          n   u   e   a   b   i   -   g   c   c  \n
        156 165 145 141 142 151 055 147 143 143 012
Depending on how you typed in your initial ls -l line, there may be funny characters in the file name. If you use auto completion, it may have put those funny characters in for you so, if you subsequently attempt to type in the file name without auto completion, that could result in a file not found situation.
The first thing you should do is to check the filename completely, with something like:
ls -l toolchain/bin/armv7l-timesys-linux-gnueabi-gcc | od -xcb
and check the output to ensure there's no funny characters in the name.
If the file does exist in that form (no funny characters), one other possibility is that you're trying to run a 32-bit ELF program on a system that's not correctly set up to run them (i.e., a 64-bit system without the libraries and support infrastructure for 32-bit).
That results in an unhelpful error message since it really should be complaining about not being able to find the loader for your 32-bit executable, rather than the executable itself.
If this is the case, you will need to identify those missing items and install them.
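One way to check whether the binary is a 32-bit or 64-bit ELF file is to look at the fifth byte of its header (EI_CLASS). The following Python sketch does only that; the function names are made up for this illustration, and it does not look for the missing loader or libraries.

```python
def elf_word_size(header: bytes) -> int:
    """Return 32 or 64 for an ELF file, based on the EI_CLASS byte."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = header[4]  # 1 = ELFCLASS32, 2 = ELFCLASS64
    if ei_class == 1:
        return 32
    if ei_class == 2:
        return 64
    raise ValueError(f"unknown EI_CLASS value {ei_class}")


def elf_word_size_of(path: str) -> int:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as handle:
        return elf_word_size(handle.read(5))
```

Calling elf_word_size_of("toolchain/bin/armv7l-timesys-linux-gnueabi-gcc") on the affected machine would show whether the compiler is a 32-bit executable. If it reports 32 on a 64-bit system, the missing 32-bit loader is a plausible cause of the error.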

How to list all non ascii bytes with awk?

Here is the test file on google drive.
sample :test file
I want to use awk to list every non-ASCII byte in the test file, that is, every byte outside the range \x00-\x7f.
There are 12 such bytes.
Here is my attempt.
awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)print i,$i}' test
146 “
148 ”
181 “
184 ”
awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)printf("%d %x \n", i,$i)}' test
146 0
148 0
181 0
184 0
That failed. How can I list all 12 bytes of the file in the format below?
146 e2
147 80
148 9c
150 e2
151 80
152 9d
185 e2
186 80
187 9c
190 e2
191 80
192 9d
export LC_ALL=C
awk 'BEGIN{FS=""}{for(i=1;i<=NF;++i)if($i~/[^\x00-\x7f]/)printf("%d %c\n",i,$i)}' test
146
147 �
148 �
150
151 �
152 �
185
186 �
187 �
190
191 �
192 �
How to fix my code?
I'm in a UTF8 shell:
$ locale
LANG=en_US.UTF-8
...
so first:
$ export LC_ALL=C
Then:
$ awk -F '' '                        # split record in fields
BEGIN { for(n=0;n<256;n++)           # iterate all values
          ord[sprintf("%c",n)]=n }   # make a hash ord[char]=n
      { for(i=1;i<=NF;i++)           # iterate all fields
          if(ord[$i]>127)            # beyond 7f
            print ord[$i] }          # print n (value)
' test
Outputs:
226
128
156
226
128
157
226
128
156
226
128
157
which in hex would be:
e2
80
9c
...
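For cross-checking the awk output, here is a minimal Python sketch that prints both the position and the hex value of each byte above 0x7f. The function name and the sample data are made up for this illustration, and positions are counted from the start of the data, not per line as awk's field index is.

```python
def non_ascii_bytes(data: bytes):
    """Yield (position, hex) for every byte above 0x7f; positions are 1-based."""
    for position, byte in enumerate(data, start=1):
        if byte > 0x7F:
            yield position, format(byte, "02x")


# Made-up sample: an "a", a left curly quote, a "b", a right curly quote.
sample = "a\u201cb\u201d".encode("utf-8")

for position, hex_value in non_ascii_bytes(sample):
    print(position, hex_value)
```

Reading the real file would be a matter of passing open("test", "rb").read() instead of the sample. Opening in binary mode avoids the locale issue altogether.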

How to read all data into one line?

For example:
12 711
112 011 111 61 070 401 2216 515
4 14 516 3
How can this be read as
127111120111116107040122165154145163?
I have been reading about STDIN, but I'm not sure how to use it.
"12 711 112 011 111 61 070 401 2216 515 4 14 516 3".delete("\s")
or
"12 711 112 011 111 61 070 401 2216 515 4 14 516 3".gsub(/\s/,'')
Read from standard input, then delete all newlines and whitespace.
STDIN.read.delete "\n\s"
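The same whitespace-stripping idea can be sketched in Python for comparison. The function name is made up for this illustration, and the sample text stands in for whatever would arrive on standard input.

```python
def squeeze_whitespace(text: str) -> str:
    """Remove every run of whitespace (spaces, tabs, newlines) from text."""
    return "".join(text.split())


# Stand-in for the data that would normally be read from standard input.
sample = "12 711\n112 011 111 61 070 401 2216 515\n4 14 516 3\n"

print(squeeze_whitespace(sample))
```

With real input, sys.stdin.read() would take the place of the sample string.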

awk find the closest match of a list in a matrix

I am looking for the common elements in two files, or more precisely for the row of a matrix that shares the most elements with a given row. So far I have only worked out how to compare fields at the same position: I get the lines that hold the same value in the same field number.
How can I extend the search to the other field numbers?
awk 'NR==FNR{a[$1];next}$1 in a{print $1" "FNR}' file1 file2
104 3
Expected output:
104 3 111 4 117 2 134 2 148 - 156 4 166 4 176 3 186 - 198 1 221 6 236 -
Best match: row 4, with 3 elements in common.
file 1
104 111 117 134 148 156 166 176 186 198 221 236
file 2
102 108 116 124 132 141 151 162 173 185 198 211
103 109 117 125 134 143 153 163 175 187 200 213
104 110 118 126 135 144 154 165 176 188 201 215
105 111 119 127 136 145 156 166 178 190 203 217
106 112 120 128 137 147 157 168 179 192 205 219
107 113 121 130 139 148 158 169 181 193 207 221
108 114 122 131 140 150 160 171 183 195 208 200
This solution assumes 1) that file1 contains unique values as shown in the provided example and 2) there is only one top ranked line in file2.
awk -v string=$(cat file1 | tr " " ",") \
'{split(string,array,","); cnt=0;
for(i in array) {for(j=1;j<=NF;j++) if(array[i]==$j) cnt++};
if(cnt>cntmax) {cntmax=cnt; NRmax=NR}} END{print NRmax}' file2
4
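The counting logic of that awk command can be sketched in Python under the same two assumptions (unique values in file1, a single top-ranked row). The function name is made up for this illustration, and the data is copied from the question.

```python
def best_matching_row(reference, rows):
    """Return (row_number, common_count) for the row sharing the most values.

    Row numbers are 1-based; ties keep the earliest row, like the awk version.
    """
    wanted = set(reference)
    best_row, best_count = None, 0
    for number, row in enumerate(rows, start=1):
        count = sum(1 for value in row if value in wanted)
        if count > best_count:
            best_row, best_count = number, count
    return best_row, best_count


file1 = "104 111 117 134 148 156 166 176 186 198 221 236".split()
file2 = [line.split() for line in """\
102 108 116 124 132 141 151 162 173 185 198 211
103 109 117 125 134 143 153 163 175 187 200 213
104 110 118 126 135 144 154 165 176 188 201 215
105 111 119 127 136 145 156 166 178 190 203 217
106 112 120 128 137 147 157 168 179 192 205 219
107 113 121 130 139 148 158 169 181 193 207 221
108 114 122 131 140 150 160 171 183 195 208 200
""".splitlines()]

print(best_matching_row(file1, file2))
```

Using a set for the reference values keeps each membership check cheap, which matters only if the files grow much larger than the example.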

Golang: "compress/flate" module can't decompress valid deflate compressed HTTP body

This question continues the discussion started here. I found out that the HTTP response body can't be unmarshaled into a JSON object because the body is deflate-compressed. Now I wonder how I can decompress it with Golang. I would appreciate it if anyone could point out the errors in my code.
Input data
I've dumped the HTTP response body into the 'test' file. Here it is:
$ cat test
x��PAN�0�
;��NtJ�FӮdU�|"oVR�C%�f�����Z.�^Hs�dW뮑�'��DH�S�SFVC����r)G,�����<���z}�x_g�+�2��sl�r/�Oy>��J3\�G�9���N���#[5M�^v/�2Ҕ��|�h��[�~7�_崛<D*���/��i
Let's make sure that this file can be decompressed and even contains valid JSON:
$ zlib-flate -uncompress < test
{"timestamp":{"tv_sec":1428488670,"tv_usec":197041},"string_timestamp":"2015-04-08 10:24:30.197041","monitor_status":"enabled","commands":{"REVERSE_LOOKUP":{"cache":{"outside":{"successes":0,"failures":0,"size":0,"time":0},"internal":{"successes":0,"failures":0,"size":0,"time":0}},"disk":{"outside":{"successes":0,"failures":0,"size":0,"time":0},"internal":{"successes":13366,"failures":0,"size":0,"time":501808}},"total":{"storage":{"successes":0,"failures":0},"proxy":{"successes":13366,"failures":0}}},"clients":{}}}
$ zlib-flate -uncompress < test | python -m json.tool
{
"commands": {
"REVERSE_LOOKUP": {
"cache": {
....
Source code
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io/ioutil"
)

func main() {
	fname := "./test"
	content, err := ioutil.ReadFile(fname)
	if err != nil {
		panic(err)
	}
	fmt.Println("File content:\n", content)
	enflated, err := ioutil.ReadAll(flate.NewReader(bytes.NewReader(content)))
	if err != nil {
		panic(err)
	}
	fmt.Println("Enflated:\n", enflated)
}
Error
$ go run uncompress.go
File content:
[120 156 181 80 65 78 195 48 16 252 10 242 57 69 118 226 166 38 247 156 64 42 42 130 107 100 156 165 88 196 118 149 93 35 160 234 223 89 183 61 112 42 226 192 109 118 118 102 103 180 123 65 62 0 146 13 59 209 237 5 189 15 8 78 116 74 215 70 27 211 174 100 85 184 124 34 111 86 82 171 67 37 144 102 31 183 195 15 167 168 165 90 46 164 94 72 115 165 100 87 235 174 145 215 39 189 168 68 72 209 83 154 7 22 83 70 86 67 180 207 19 140 188 114 41 4 27 71 44 225 155 254 169 223 60 244 195 221 122 125 251 120 95 24 103 221 43 20 144 50 161 31 143 16 179 115 128 8 108 225 114 47 214 79 121 62 15 232 191 224 8 74 51 6 92 213 71 130 57 218 233 175 78 182 142 30 223 254 35 91 53 77 219 94 118 47 165 50 210 148 18 148 232 124 128 31 104 183 151 91 176 126 55 167 143 207 95 3 15 229 180 155 60 68 42 159 231 241 27 47 165 167 25]
panic: flate: corrupt input before offset 5
goroutine 1 [running]:
runtime.panic(0x4a7180, 0x5)
/usr/lib/go/src/pkg/runtime/panic.c:266 +0xb6
main.main()
/home/isaev/side-projects/elliptics-manager/uncompress.go:20 +0x2a3
exit status 2
PS Ubuntu 14.10, Go 1.2.1
Your input is not a simple deflated block, it's a zlib stream.
According to the ZLIB Compressed Data Format Specification 3.3 the first 2 bytes are:
-------------
| CMF | FLG |
-------------
These hold the compression method and flags. Your input starts with [120, 156], which is 78 9C in hexadecimal. This is the usual header for the default compression level. No preset dictionary follows, so the compressed data starts right after these two bytes.
In CMF, bits 0 to 3 are CM (Compression Method) and bits 4 to 7 are CINFO (Compression Info). Here CM=8 denotes the "deflate" compression method and CINFO=7 indicates a 32K window size. FLG bit 5 (FDICT) tells whether a preset dictionary is used, which is not the case here. Details of the FLG byte are also in the linked RFC 1950.
So your input is deflate data wrapped in a zlib header, but Go's flate package expects a raw deflate stream and does not parse that header.
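The bit layout described by RFC 1950 can be checked with a few lines of Python. This is only an illustrative sketch: the function name and the dictionary keys are made up here, and the two input bytes are the ones from the question.

```python
def parse_zlib_header(cmf: int, flg: int) -> dict:
    """Split the two zlib header bytes into the fields defined by RFC 1950."""
    cinfo = cmf >> 4
    return {
        "cm": cmf & 0x0F,                # compression method, 8 = deflate
        "cinfo": cinfo,                  # log2(window size) - 8
        "window_size": 1 << (cinfo + 8),
        "fdict": (flg >> 5) & 1,         # 1 if a preset dictionary follows
        "flevel": flg >> 6,              # compression level hint, 2 = default
        # CMF and FLG, read as a 16-bit big-endian number, must be a multiple of 31.
        "check_ok": (cmf * 256 + flg) % 31 == 0,
    }


print(parse_zlib_header(120, 156))
```

For 78 9C this gives CM=8, CINFO=7 (a 32768-byte window), FDICT=0 and FLEVEL=2, and the header check passes.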
Change your decompression to omit the first 2 bytes like this and it will work:
enflated, err := ioutil.ReadAll(flate.NewReader(bytes.NewReader(content[2:])))
Try it on the Go Playground. But...
Use Proper ZLib decompression!
We got lucky this time because no dictionary was preset, so the header is exactly 2 bytes long, and the Adler-32 checksum at the end of the stream simply goes unverified. With a preset dictionary, skipping 2 bytes would not be enough to decode the data with the flate package. Since the input is a zlib stream, you should use the compress/zlib package to properly decode it and not rely on luck:
r, err := zlib.NewReader(bytes.NewReader(content))
if err != nil {
	panic(err)
}
enflated, err := ioutil.ReadAll(r)
if err != nil {
	panic(err)
}
fmt.Println(string(enflated))
Try the zlib variant on the Go Playground.
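The difference between the two approaches can also be seen with Python's zlib module, used here only as a neutral illustration. The payload is a made-up sample, and the sketch assumes a stream without a preset dictionary, as in the question.

```python
import zlib

payload = b'{"monitor_status": "enabled"}'  # made-up sample data
stream = zlib.compress(payload)             # zlib stream: header + deflate + Adler-32

# Proper zlib decoding parses the header and verifies the checksum.
full = zlib.decompress(stream)

# Raw deflate decoding after skipping the 2-byte header also recovers the
# data, but the trailing checksum is left over and never verified.
raw = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
body = raw.decompress(stream[2:])
trailer = raw.unused_data

print(full == body, len(trailer))
```

The four leftover bytes are the Adler-32 checksum of the uncompressed data, which is exactly the integrity check that the skip-two-bytes shortcut throws away.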
