How can I translate this IDNA URL to Unicode?

I want to translate an IDNA ASCII URL to Unicode.
package main
import (
"golang.org/x/net/idna"
"log"
)
func main() {
input := "https://xn---36-mddtcafmzdgfgpbxs0h7c.xn--p1ai"
idnaProfile := idna.New()
output, err := idnaProfile.ToUnicode(input)
if err != nil {
log.Fatal(err)
}
log.Printf("%s", output)
}
The output is: https://xn---36-mddtcafmzdgfgpbxs0h7c.рф
It seems the IDNA package only converts the TLD. Is there some option that can convert the full URL?
I need to get the same result as when I paste the ASCII URL into Chrome:
https://природный-источник36.рф

You simply need to parse the URL first:
package main
import (
"golang.org/x/net/idna"
"net/url"
)
func main() {
p, e := url.Parse("https://xn---36-mddtcafmzdgfgpbxs0h7c.xn--p1ai")
if e != nil {
panic(e)
}
s, e := idna.ToUnicode(p.Host)
if e != nil {
panic(e)
}
println(s == "природный-источник36.рф")
}
https://golang.org/pkg/net/url#Parse

An IDNA string consists of "labels" separated by dots ".". Each label may be encoded (if it starts with "xn--") or not (if it doesn't). Your string consists of two labels, https://xn---36-mddtcafmzdgfgpbxs0h7c and xn--p1ai. Only the second one is IDNA encoded.
Just process those parts of the URL which are IDNA encoded (i.e. the hostname). Anything else is just nonsensical and cannot work.
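To make the label explanation concrete, here is a small standard-library sketch (it deliberately does not call the idna package). The helper name encodedLabels is made up for illustration. It splits a string on dots and reports which labels carry the "xn--" prefix, first for the whole URL and then for the parsed hostname only.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// encodedLabels (hypothetical helper) splits s on dots and reports, for each
// label, whether it starts with the "xn--" prefix of an encoded label.
func encodedLabels(s string) map[string]bool {
	out := make(map[string]bool)
	for _, label := range strings.Split(s, ".") {
		out[label] = strings.HasPrefix(strings.ToLower(label), "xn--")
	}
	return out
}

func main() {
	raw := "https://xn---36-mddtcafmzdgfgpbxs0h7c.xn--p1ai"

	// Whole URL: the first "label" begins with "https://", not "xn--".
	fmt.Println(encodedLabels(raw))

	// Hostname only: both labels now begin with "xn--".
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(encodedLabels(u.Hostname()))
}
```

With the scheme still attached, only xn--p1ai is recognised as encoded, which matches the output in the question. Once the hostname is isolated, both labels qualify and can be handed to idna.ToUnicode.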

Related

How to parse Prometheus data

I have been able to obtain the metrics by sending an HTTP GET as follows:
# TYPE net_conntrack_dialer_conn_attempted_total untyped net_conntrack_dialer_conn_attempted_total{dialer_name="federate",instance="localhost:9090",job="prometheus"} 1 1608520832877
Now I need to parse this data and get at every piece of it so that I can convert it to a format like JSON.
I have been looking into the ebnf package in Go:
ebnf package
Can somebody point me in the right direction to parse the above data?

There's a nice package already available for this, written by the Prometheus authors themselves.
They have written a bunch of Go libraries that are shared across Prometheus components and libraries. They are considered internal to Prometheus, but you can use them.
See the github.com/prometheus/common documentation. It includes a package called expfmt that can decode and encode the Prometheus exposition format. Yes, the format follows an EBNF syntax, so the ebnf package could also be used, but expfmt works right out of the box.
Package used: expfmt
Sample Input:
# HELP net_conntrack_dialer_conn_attempted_total
# TYPE net_conntrack_dialer_conn_attempted_total untyped
net_conntrack_dialer_conn_attempted_total{dialer_name="federate",instance="localhost:9090",job="prometheus"} 1 1608520832877
Sample Program:
package main
import (
"flag"
"fmt"
"log"
"os"
dto "github.com/prometheus/client_model/go"
"github.com/prometheus/common/expfmt"
)
func fatal(err error) {
if err != nil {
log.Fatalln(err)
}
}
func parseMF(path string) (map[string]*dto.MetricFamily, error) {
reader, err := os.Open(path)
if err != nil {
return nil, err
}
var parser expfmt.TextParser
mf, err := parser.TextToMetricFamilies(reader)
if err != nil {
return nil, err
}
return mf, nil
}
func main() {
f := flag.String("f", "", "set filepath")
flag.Parse()
mf, err := parseMF(*f)
fatal(err)
for k, v := range mf {
fmt.Println("KEY: ", k)
fmt.Println("VAL: ", v)
}
}
Sample Output:
KEY: net_conntrack_dialer_conn_attempted_total
VAL: name:"net_conntrack_dialer_conn_attempted_total" type:UNTYPED metric:<label:<name:"dialer_name" value:"federate" > label:<name:"instance" value:"localhost:9090" > label:<name:"job" value:"prometheus" > untyped:<value:1 > timestamp_ms:1608520832877 >
So, expfmt is a good choice for your use-case.
Update: Formatting problem in OP's posted input:
Refer:
https://github.com/prometheus/pushgateway/issues/147#issuecomment-368215305
https://github.com/prometheus/pushgateway#command-line
Note that in the text protocol, each line has to end with a line-feed
character (aka 'LF' or '\n'). Ending a line in other ways, e.g. with
'CR' aka '\r', 'CRLF' aka '\r\n', or just the end of the packet, will
result in a protocol error.
But from the error message, I can see that a \r character is present in the input, which is not accepted by design. So use \n for line endings.
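If you cannot control how the input was produced, one option is to normalise the line endings before handing the text to the parser. This is only a sketch using the standard library. The helper name normalizeLineEndings and the sample metric "up" are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLineEndings (hypothetical helper) rewrites CRLF and bare CR
// line endings as LF, the only ending the text format accepts.
func normalizeLineEndings(s string) string {
	s = strings.ReplaceAll(s, "\r\n", "\n")
	return strings.ReplaceAll(s, "\r", "\n")
}

func main() {
	// Sample input with Windows-style line endings (assumed for illustration).
	in := "# TYPE up untyped\r\nup 1\r\n"
	fmt.Printf("%q\n", normalizeLineEndings(in))
}
```

The cleaned string can then be wrapped in strings.NewReader and passed to the parser in place of the file reader.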

Illegal base64 data at input byte for seemingly valid png

I'm attempting to decode a data URL that was generated from a javascript canvas' toDataURL function.
The following golang application fails with the error "illegal base64 data at input byte 129":
package main
import (
"encoding/base64"
"fmt"
"net/url"
"strings"
)
func main() {
pngData := "iVBORw0KGgoAAAANSUhEUgAAAF0AAAABCAYAAAC8PaJPAAAABHNCSVQICAgIfAhkiAAAALVJREFUGFdt0MsKQVEYhuG9CeU0VgamihBl6hqMXYoLchduQFuKicyFARPn0/J+9Q2tevrba639H1YcQhhHUTTBACloFZHFw3FJbODj7xdxjSqmqGCHms+3vv8m3nDwWSA+vVcipjFHHlcMcUTBtcuueSHqfgzlVJ7Nn/o99s5Qf3v0sUAHK///JbahmZQ3gy4S96OZc9A90fkdepM6tNTHDOqz6T3NofeTluuqV82oHOrphNEPw3UwfBVmbU4AAAAASUVORK5CYII="
pngData, err := url.PathUnescape(pngData)
if err != nil {
fmt.Printf("Failed to unescape: %s\n", err.Error())
return
}
pngData = strings.Replace(pngData, "+", "", -1)
_, err = base64.URLEncoding.WithPadding(base64.NoPadding).DecodeString(pngData)
if err != nil {
fmt.Printf("Failed to decode: %s\n", err.Error())
}
}
If I pass the value from pngData into a web-based base64 to png converter, it has no problem generating the image. (a horizontal line of white-ish values)
I have tried StdEncoding, RawURLEncoding, and their Raw counterparts. I've also tried with or without padding and I've tried the same pngData string with an additional = and without the trailing =.
Any thoughts on why Golang is refusing to decode this data?
Some of the images I get from the canvas decode just fine. But some, like this one, do not.

Steven Penny's answer shows a way to do this, but I have to ask:
Why do you call url.PathUnescape? The data contain no path escape characters (no %-encoding). The call is harmless but unnecessary.
Why did you use the alternate encoding (URLEncoding)? As we see in the base64 package documentation, the difference between the standard encoding and the alternate encoding is that the alternate encoding uses - and _ in place of + and /. But if we look at the data string, it contains plus signs and slashes, and has no dashes or underscores, so it has clearly been encoded with the standard encoding.
Why did you call for base64.NoPadding? The input data ends with =, which is a padding character.
Why did you call for base64.NoPadding via base64.URLEncoding.WithPadding(base64.NoPadding)? The documentation shows us that this can be spelled base64.RawURLEncoding.
Why did you explicitly ask to strip out + characters (not a good idea) but not / characters?
If we drop all of those (and split up a long input line for posting purposes) we get this (playground link):
package main
import (
"encoding/base64"
"fmt"
)
func main() {
data := "iVBORw0KGgoAAAANSUhEUgAAAF0AAAABCAYAAAC8PaJPAAAABH" +
"NCSVQICAgIfAhkiAAAALVJREFUGFdt0MsKQVEYhuG9CeU0Vgam" +
"ihBl6hqMXYoLchduQFuKicyFARPn0/J+9Q2tevrba639H1YcQh" +
"hHUTTBACloFZHFw3FJbODj7xdxjSqmqGCHms+3vv8m3nDwWSA+" +
"vVcipjFHHlcMcUTBtcuueSHqfgzlVJ7Nn/o99s5Qf3v0sUAHK/" +
"//JbahmZQ3gy4S96OZc9A90fkdepM6tNTHDOqz6T3NofeTluuq" +
"V82oHOrphNEPw3UwfBVmbU4AAAAASUVORK5CYII="
b, err := base64.StdEncoding.DecodeString(data)
if err != nil {
fmt.Printf("Failed to decode: %s\n", err)
} else {
fmt.Printf("bytes begin with: %q\n", b[0:4])
}
}
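To see the alphabet difference in isolation, here is a minimal sketch. The helper names are made up, and the two input bytes are chosen only because their encoding happens to use both special characters.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// encodeBoth returns raw encoded with the standard and the URL-safe alphabet.
func encodeBoth(raw []byte) (std, alt string) {
	return base64.StdEncoding.EncodeToString(raw), base64.URLEncoding.EncodeToString(raw)
}

// standardDecodes reports whether s is valid for base64.StdEncoding.
func standardDecodes(s string) bool {
	_, err := base64.StdEncoding.DecodeString(s)
	return err == nil
}

// urlSafeDecodes reports whether s is valid for base64.URLEncoding.
func urlSafeDecodes(s string) bool {
	_, err := base64.URLEncoding.DecodeString(s)
	return err == nil
}

func main() {
	std, alt := encodeBoth([]byte{0xfb, 0xff})
	fmt.Println("standard:", std)
	fmt.Println("URL-safe:", alt)

	// Standard-alphabet text is rejected by the URL-safe decoder.
	fmt.Println("URL-safe decoder accepts standard text:", urlSafeDecodes(std))

	// Stripping "+" removes data, so even the right decoder now fails.
	stripped := strings.ReplaceAll(std, "+", "")
	fmt.Println("standard decoder accepts stripped text:", standardDecodes(stripped))
}
```

The same mismatch explains the error in the question: the data URL uses + and /, so it needs StdEncoding, and the + characters must be left alone.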

This seems to work fine:
package main
import (
"encoding/base64"
"image"
"image/png"
"os"
"strings"
)
func main() {
s := `iVBORw0KGgoAAAANSUhEUgAAAF0AAAABCAYAAAC8PaJPAAAABHNCSVQICAgIfAhkiAAAALVJ
REFUGFdt0MsKQVEYhuG9CeU0VgamihBl6hqMXYoLchduQFuKicyFARPn0/J+9Q2tevrba639H1YcQhhHU
TTBACloFZHFw3FJbODj7xdxjSqmqGCHms+3vv8m3nDwWSA+vVcipjFHHlcMcUTBtcuueSHqfgzlVJ7Nn/
o99s5Qf3v0sUAHK///JbahmZQ3gy4S96OZc9A90fkdepM6tNTHDOqz6T3NofeTluuqV82oHOrphNEPw3U
wfBVmbU4AAAAASUVORK5CYII=`
d := base64.NewDecoder(base64.StdEncoding, strings.NewReader(s))
p, e := png.Decode(d)
if e != nil {
panic(e)
}
c, e := os.Create("a.png")
if e != nil {
panic(e)
}
png.Encode(c, p.(*image.NRGBA))
}

Go regExRepl-Script does not change the text file

My Go script should add one newline before each match of the regex search string ^(.+[,]+\n).
I tested the prototype in an editor first:
I want to add newlines before these lines, using the replacement \n$1.
This works when I try it in the text editor.
When I try it in my script (see the regExRepl call in visit), it changes nothing and reports no error.
Any ideas what I am doing wrong?
Example
I would like to use PCRE as it works in this example: https://regex101.com/r/sB9wW6/17
Same Example here:
Example source
Dear sir,
Thanks for your interest.
expected result
#### here is a newline ####
Dear sir,
Thanks for your interest.
result is (produced by the script below)
Dear sir,
Thanks for your interest.
go script:
// replace in files and store the new copy of it.
package main
import (
"fmt"
"io/ioutil"
"os"
"path/filepath"
"regexp"
"strings"
"time"
)
func visit(path string, fi os.FileInfo, err error) error {
matched, err := filepath.Match("*.csv", fi.Name())
if err != nil {
panic(err)
return err
}
if matched {
read, err := ioutil.ReadFile(path)
if err != nil {
panic(err)
}
newContents := string(read)
newContents = regExRepl(`^(.+[,]+\n)`, newContents, `\n\n\n$1`)
var re = regexp.MustCompile(`[\W]+`)
t_yymmdd := regexp.MustCompile(`[\W]+`).ReplaceAllString(time.Now().Format(time.RFC3339), `-`)[:10]
t_hhss := re.ReplaceAllString(time.Now().Format(time.RFC3339), `-`)[11:19]
t_yymmddhhss := t_yymmdd + "_" + t_hhss
fmt.Println(t_yymmddhhss)
filePath := fileNameWithoutExtension(path) + t_yymmddhhss + ".csv"
err = ioutil.WriteFile(filePath, []byte(newContents), 0)
if err != nil {
panic(err)
}
}
return nil
}
func regExRepl(regExPatt string, newContents string, regExRepl string) string {
return regexp.MustCompile(regExPatt).ReplaceAllString(newContents, regExRepl)
}
func main() {
err := filepath.Walk("./november2020messages.csv", visit) // <== read all files in current folder 20:12:06 22:44:42
if err != nil {
panic(err)
}
}
func fileNameWithoutExtension(fileName string) string {
return strings.TrimSuffix(fileName, filepath.Ext(fileName))
}

To have \n interpreted as a newline, don't use the raw string `\n`; use the interpreted string "\n".
You may also use ^(.+[,]+) instead of ^(.+[,]+\n), and add (?m) at the start for multi-line replacements.
You can test this suggestion here: https://play.golang.org/p/25_0GJ93oCT
The following example illustrates the difference (on the Go playground: https://play.golang.org/p/FkPwElhx-Xu ):
// example from:
package main
import (
"fmt"
"regexp"
)
func main() {
newContents := `line 1,
line 2
line a,
line b`
newContents1 := regexp.MustCompile(`^(.+[,]+\n)`).ReplaceAllString(newContents, `\n$1`)
fmt.Println("hi\n" + newContents1)
newContents1 = regexp.MustCompile(`(?m)^(.+[,]+\n)`).ReplaceAllString(newContents, "\n$1")
fmt.Println("ho\n" + newContents1)
}
Result:
hi
\nline 1,
line 2
line a,
line b
ho

line 1,
line 2

line a,
line b
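Applied to the sample from the question, the two fixes could look like the sketch below. The helper name addBlankLineBefore is made up, and I anchor the comma with $ so that only lines ending in a comma match, which is my reading of the original pattern and not something the answer prescribes.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineEndingInComma matches, in multi-line mode, each line that ends with a comma.
var lineEndingInComma = regexp.MustCompile(`(?m)^(.+,)$`)

// addBlankLineBefore (hypothetical helper) inserts a newline before every
// line that ends with a comma. The replacement is an interpreted string,
// so "\n" is a real newline and ${1} is the matched line.
func addBlankLineBefore(s string) string {
	return lineEndingInComma.ReplaceAllString(s, "\n${1}")
}

func main() {
	in := "Dear sir,\nThanks for your interest."
	fmt.Printf("%q\n", addBlankLineBefore(in))
}
```

In the script, this function could stand in for the regExRepl call inside visit. Files with \r\n line endings would need the \r handled first, since $ only matches directly before \n.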

How to decompress a []byte content in gzip format that gives an error when unmarshaling

I'm making a request to an API and get a []byte out of the response (ioutil.ReadAll(resp.Body)). I'm trying to unmarshal this content, but it seems not to be encoded as UTF-8, because unmarshal returns an error. This is what I'm trying:
package main
import (
"encoding/json"
"fmt"
"some/api"
)
func main() {
content := api.SomeAPI.SomeRequest() // []byte variable
var data interface{}
err := json.Unmarshal(content, &data)
if err != nil {
panic(err.Error())
}
fmt.Println("Data from response", data)
}
The error I get is invalid character '\x1f' looking for beginning of value. For the record, the response header includes Content-Type:[application/json; charset=utf-8].
How can I decode content to avoid these invalid characters when unmarshaling?
Edit
This is the hexdump of content: play.golang.org/p/oJ5mqERAmj

Judging by your hex dump you are receiving gzip encoded data so you'll need to use compress/gzip to decode it first.
Try something like this
package main
import (
"bytes"
"compress/gzip"
"encoding/json"
"fmt"
"io"
"some/api"
)
func main() {
content := api.SomeAPI.SomeRequest() // []byte variable
// decompress the content into an io.Reader
buf := bytes.NewBuffer(content)
reader, err := gzip.NewReader(buf)
if err != nil {
panic(err)
}
// Use the stream interface to decode json from the io.Reader
var data interface{}
dec := json.NewDecoder(reader)
err = dec.Decode(&data)
if err != nil && err != io.EOF {
panic(err)
}
fmt.Println("Data from response", data)
}
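If you are not sure whether a given response is compressed, you could check for the gzip magic bytes (0x1f 0x8b) before decompressing. This is a self-contained sketch. The helpers maybeGunzip and gzipBytes are made up for illustration, and gzipBytes only builds sample input in place of the API call.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// maybeGunzip (hypothetical helper) decompresses b if it starts with the
// gzip magic bytes 0x1f 0x8b, and returns b unchanged otherwise.
func maybeGunzip(b []byte) ([]byte, error) {
	if len(b) < 2 || b[0] != 0x1f || b[1] != 0x8b {
		return b, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// gzipBytes compresses b. It exists only to build sample input.
func gzipBytes(b []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(b)
	zw.Close()
	return buf.Bytes()
}

func main() {
	content := gzipBytes([]byte(`{"ok":true}`)) // stands in for the API response
	plain, err := maybeGunzip(content)
	if err != nil {
		panic(err)
	}
	var data interface{}
	if err := json.Unmarshal(plain, &data); err != nil {
		panic(err)
	}
	fmt.Println("Data from response", data)
}
```

The leading 0x1f is the same byte that json.Unmarshal complained about in the question.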
Previous
Character \x1f is the unit separator control character in ASCII and UTF-8. It never occurs inside a multi-byte UTF-8 sequence, but it can be used to mark off different pieces of text. A string containing \x1f can be valid UTF-8 but is not valid JSON, as far as I know.
I think you need to read the API specification closely to find out what they are using the \x1f markers for, but in the meantime you could try removing them and see what happens, eg
package main
import (
"bytes"
"fmt"
)
func main() {
b := []byte("hello\x1fGoodbye")
fmt.Printf("b was %q\n", b)
b = bytes.Replace(b, []byte{0x1f}, []byte{' '}, -1)
fmt.Printf("b is now %q\n", b)
}
Prints
b was "hello\x1fGoodbye"
b is now "hello Goodbye"
Playground link

How to properly output a string in a Windows console with go?

I have an exe written in Go which prints UTF-8 encoded strings containing special characters.
Since that exe is meant to be used from a console window, its output is mangled because Windows uses ibm850 encoding (aka code page 850).
How would you make sure the Go exe prints correctly encoded strings in a console window, i.e. prints for instance:
éèïöîôùòèìë
instead of (without any translation to the right charset)
├®├¿├»├Â├«├┤├╣├▓├¿├¼├½

// Alert: This is Windows-specific, uses undocumented methods, does not
// handle stdout redirection, does not check for errors, etc.
// Use at your own risk.
// Tested with Go 1.0.2-windows-amd64.
package main
import "unicode/utf16"
import "syscall"
import "unsafe"
var modkernel32 = syscall.NewLazyDLL("kernel32.dll")
var procWriteConsoleW = modkernel32.NewProc("WriteConsoleW")
func consolePrintString(strUtf8 string) {
var strUtf16 []uint16
var charsWritten *uint32
strUtf16 = utf16.Encode([]rune(strUtf8))
if len(strUtf16) < 1 {
return
}
syscall.Syscall6(procWriteConsoleW.Addr(), 5,
uintptr(syscall.Stdout),
uintptr(unsafe.Pointer(&strUtf16[0])),
uintptr(len(strUtf16)),
uintptr(unsafe.Pointer(charsWritten)),
uintptr(0),
0)
}
func main() {
consolePrintString("Hello ☺\n")
consolePrintString("éèïöîôùòèìë\n")
}
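For anyone wondering why the plain output is mangled, this portable sketch compares the UTF-8 bytes Go writes by default with the UTF-16 code units that the function above passes to WriteConsoleW. The helper names are made up. It runs on any platform and does not touch the console API.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf8Bytes returns the bytes of s as stored in a Go string (UTF-8 here).
func utf8Bytes(s string) []byte {
	return []byte(s)
}

// utf16Units returns the UTF-16 code units of s, the form WriteConsoleW expects.
func utf16Units(s string) []uint16 {
	return utf16.Encode([]rune(s))
}

func main() {
	s := "é"
	fmt.Printf("UTF-8 bytes:  % x\n", utf8Bytes(s))
	fmt.Printf("UTF-16 units: %04x\n", utf16Units(s))
}
```

The letter é is two bytes in UTF-8 (c3 a9). Code page 850 shows those two bytes as the two characters ├ and ®, which is the start of the mangled output in the question. As a single UTF-16 unit (00e9) it reaches the console as one character.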

The online book "Network programming with Go" (CC BY-NC-SA 3.0) has a chapter on Charsets (Managing character sets and encodings), in which Jan Newmarch details the conversion of one charset to another. But it seems cumbersome.
Here is a solution (I might have missed a much simpler one), using the library go-charset (from Roger Peppe).
I translate a UTF-8 string to an ibm850-encoded one, which lets me print in a DOS window:
éèïöîôùòèìë
The translation function is detailed below:
package main
import (
"bytes"
"code.google.com/p/go-charset/charset"
_ "code.google.com/p/go-charset/data"
"fmt"
"io"
"log"
"strings"
)
func translate(tr charset.Translator, in string) (string, error) {
var buf bytes.Buffer
r := charset.NewTranslatingReader(strings.NewReader(in), tr)
_, err := io.Copy(&buf, r)
if err != nil {
return "", err
}
return buf.String(), nil
}
func Utf2dos(in string) string {
dosCharset := "ibm850"
cs := charset.Info(dosCharset)
if cs == nil {
log.Fatalf("no info found for %q", dosCharset)
}
fromtr, err := charset.TranslatorTo(dosCharset)
if err != nil {
log.Fatalf("error making translator from %q: %v", dosCharset, err)
}
out, err := translate(fromtr, in)
if err != nil {
log.Fatalf("error translating from %q: %v", dosCharset, err)
}
return out
}
func main() {
test := "éèïöîôùòèìë"
fmt.Println("utf-8:\n", test)
fmt.Println("ibm850:\n", Utf2dos(test))
}

Update 2016/2017: you can now consider golang.org/x/text, which comes with an encoding/charmap package covering the ISO-8859 family as well as the Windows 1252 character set.
See "Go Quickly - Converting Character Encodings In Golang"
r := charmap.ISO8859_1.NewDecoder().Reader(f)
io.Copy(out, r)
That is an extract of an example that opens an ISO-8859-1 source text (my_isotext.txt), creates a destination file (my_utf.txt), and copies the first to the second.
To decode from ISO-8859-1 to UTF-8, the original file reader (f) is wrapped with a decoder.
I just tested (pseudo-code for illustration):
package main
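import (
"fmt"
"golang.org/x/text/encoding/charmap"
)
func main() {
// t stands in for bytes that are encoded in code page 850.
t := "string composed of character in cp 850"
// NewDecoder converts CP850 bytes to UTF-8; NewEncoder goes the other way.
d := charmap.CodePage850.NewDecoder()
st, err := d.String(t)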
if err != nil {
panic(err)
}
fmt.Println(st)
}
The result is a string readable in a Windows CMD.
See more in this Nov. 2018 reddit thread.

It is something that Go still can't do out of the box - see http://code.google.com/p/go/issues/detail?id=3376#c6.
Alex
