How to access string as character value - go

http://play.golang.org/p/ZsALO8oF3W
I want to traverse a string and print each character. How do I get the actual characters instead of the numeric value of each letter?
Now I am getting this
0 72 72
1 101 101
2 108 108
3 108 108
4 111 111
My desired output would be
0 H H
1 e e
2 l l
3 l l
4 o o
package main
import "fmt"
func main() {
str := "Hello"
for i, elem := range str {
fmt.Println(i, str[i], elem)
}
for elem := range str {
fmt.Println(elem)
}
}
Thanks,

For statements
For a string value, the "range" clause iterates over the Unicode code
points in the string starting at byte index 0. On successive
iterations, the index value will be the index of the first byte of
successive UTF-8-encoded code points in the string, and the second
value, of type rune, will be the value of the corresponding code
point. If the iteration encounters an invalid UTF-8 sequence, the
second value will be 0xFFFD, the Unicode replacement character, and
the next iteration will advance a single byte in the string.
For example,
package main
import "fmt"
func main() {
str := "Hello"
for _, r := range str {
c := string(r)
fmt.Println(c)
}
fmt.Println()
for i, r := range str {
fmt.Println(i, r, string(r))
}
}
Output:
H
e
l
l
o
0 72 H
1 101 e
2 108 l
3 108 l
4 111 o
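The quoted rule about byte indices and invalid UTF-8 is easier to see with input that is not pure ASCII. The following is a minimal sketch; the helper name runeStarts and the sample strings are made up for illustration.

```go
package main

import "fmt"

// runeStarts returns the byte index at which each code point of s begins,
// as reported by a range loop. (Hypothetical helper for illustration.)
func runeStarts(s string) []int {
	var idx []int
	for i := range s {
		idx = append(idx, i)
	}
	return idx
}

func main() {
	// "é" occupies two bytes in UTF-8, so the index jumps from 1 to 3.
	for i, r := range "héllo" {
		fmt.Println(i, r, string(r))
	}
	// A lone 0xff byte is not valid UTF-8; range yields U+FFFD for it
	// and advances by a single byte.
	for i, r := range "a\xffb" {
		fmt.Printf("%d %U\n", i, r)
	}
	fmt.Println(runeStarts("héllo")) // [0 1 3 4 5]
}
```

Because the index is a byte offset, it should not be used as a character count when the string may contain non-ASCII text.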

Use Printf to indicate you want to print characters.
package main
import "fmt"
func main() {
str := "Hello"
for i, elem := range str {
fmt.Printf("%d %c %c\n", i, str[i], elem)
}
}

The way you are iterating over the characters in the string is workable (although for an ASCII string like this one, str[i] and elem duplicate each other). You have the right data.
To display it correctly, you just need to output it with the right formatting (i.e. interpreted as a Unicode character rather than an int).
Change:
fmt.Println(i, str[i], elem)
to:
fmt.Printf("%d %c %c\n", i, str[i], elem)
%c is the character represented by the corresponding Unicode code point per the Printf doc: http://golang.org/pkg/fmt/
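For reference, here is a small sketch of how %c compares with two related verbs, %q and %U, and of how str[i] (a byte) differs from the range value (a rune) once the string is not pure ASCII. The helper name describe and the sample string are assumptions for illustration.

```go
package main

import "fmt"

// describe formats a rune with three fmt verbs: %c (the character itself),
// %q (a quoted character literal) and %U (Unicode notation).
// (Hypothetical helper for illustration.)
func describe(r rune) string {
	return fmt.Sprintf("%c %q %U", r, r, r)
}

func main() {
	str := "héllo"
	for i, r := range str {
		// str[i] is a single byte; r is the decoded code point.
		// For "é" the byte is only the first half of the encoding.
		fmt.Printf("%d byte=%d rune=%s\n", i, str[i], describe(r))
	}
}
```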

Related

How to remove Unicode characters from byte buffer in Go?

I have a bytes.Buffer type variable which I filled with Unicode characters:
var mbuff bytes.Buffer
unicodeSource := "کیا حال ھے؟"
for _, r := range unicodeSource {
mbuff.WriteRune(r)
}
Note: I iterate over a Unicode string literal here, but the real source is an endless stream of user input characters.
Now, I want to remove a Unicode character from any position in the buffer mbuff. The problem is that characters may be of variable byte sizes. So I cannot just pick out the ith byte from mbuff.String() as it might be the beginning, middle, or end of a character. This is my trivial (and horrendous) solution:
// removing Unicode character at position n
var tempString string
currChar := 0
for _, ch := range(mbuff.String()) { // iterate over Unicode chars
if currChar != n { // skip concatenating nth char
tempString += string(ch)
}
currChar++
}
mbuff.Reset() // empty buffer
mbuff.WriteString(tempString) // write new string
This is bad in many ways. For one, I convert buffer to string, remove ith element, and write a new string back into the buffer. Too many operations. Second, I use the += operator in the loop to concatenate Unicode characters into a new string. I am using buffers in the first place exactly to avoid concatenation using += which is slow as this answer points out.
What is an efficient method to remove the ith Unicode character in a bytes.Buffer?
Also what is an efficient way to insert a Unicode character after i-1 Unicode characters (i.e. in the ith place)?
To remove the ith rune from a slice of bytes, loop through the slice counting runes. When the ith rune is found, copy the bytes following the rune down to the position of the ith rune:
func removeAtBytes(p []byte, i int) []byte {
j := 0
k := 0
for k < len(p) {
_, n := utf8.DecodeRune(p[k:])
if i == j {
p = p[:k+copy(p[k:], p[k+n:])]
}
j++
k += n
}
return p
}
This function modifies the backing array of the argument slice, but it does not allocate memory.
Use this function to remove a rune from a bytes.Buffer.
p := removeAtBytes(mbuff.Bytes(), i)
mbuff.Truncate(len(p)) // backing bytes were updated, adjust length
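Put together as a complete program, this could look like the sketch below. The wrapper removeViaBuffer is a hypothetical name used only to make the example self-contained, and removeAtBytes is restated with an early return once the rune has been removed.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// removeAtBytes removes the ith rune from p in place and returns the
// shortened slice. It returns p unchanged if p has fewer than i+1 runes.
func removeAtBytes(p []byte, i int) []byte {
	j := 0 // count of runes seen so far
	for k := 0; k < len(p); j++ {
		_, n := utf8.DecodeRune(p[k:])
		if i == j {
			// Shift the tail down over the removed rune.
			return p[:k+copy(p[k:], p[k+n:])]
		}
		k += n
	}
	return p
}

// removeViaBuffer fills a bytes.Buffer with s, removes the ith rune
// in place, and returns the buffer contents. (Hypothetical wrapper.)
func removeViaBuffer(s string, i int) string {
	var buf bytes.Buffer
	buf.WriteString(s)
	p := removeAtBytes(buf.Bytes(), i)
	buf.Truncate(len(p)) // backing bytes were updated, adjust length
	return buf.String()
}

func main() {
	fmt.Println(removeViaBuffer("کیا حال ھے؟", 0))
	fmt.Println(removeViaBuffer("héllo", 1)) // hllo
}
```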
To remove the ith rune from a string, loop through the string counting runes. When the ith rune is found, create a string by concatenating the segment of the string before the rune with the segment of the string after the rune.
func removeAt(s string, i int) string {
j := 0 // count of runes
k := 0 // index in string of current rune
for k < len(s) {
_, n := utf8.DecodeRuneInString(s[k:])
if i == j {
return s[:k] + s[k+n:]
}
j++
k += n
}
return s
}
This function allocates a single string, the result. DecodeRuneInString is a function in the standard library unicode/utf8 package.
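The question also asks about inserting a rune at the ith position. A sketch along the same lines, assuming that an index past the end should append and that a negative index should prepend (insertAt is a hypothetical name, not a standard library function):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// insertAt returns s with r inserted so that r becomes the ith rune.
// If s has fewer than i runes, r is appended. (Hypothetical helper.)
func insertAt(s string, i int, r rune) string {
	j := 0 // count of runes
	k := 0 // byte index of current rune
	for k < len(s) && j < i {
		_, n := utf8.DecodeRuneInString(s[k:])
		j++
		k += n
	}
	return s[:k] + string(r) + s[k:]
}

func main() {
	fmt.Println(insertAt("hllo", 1, 'é')) // héllo
	fmt.Println(insertAt("ab", 5, 'c'))   // abc
}
```

Like removeAt, this walks the string once and builds a single new string for the result.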
Taking a step back, Go often works with Readers and Writers, so an alternative solution would be to use the golang.org/x/text/transform package. You create a Transformer, attach it to a Reader and use the new Reader to produce a transformed string. For example, here's a skipper:
func main() {
src := strings.NewReader("کیا حال ھے؟")
skipped := transform.NewReader(src, NewSkipper(5))
var buf bytes.Buffer
io.Copy(&buf, skipped)
fmt.Println("RESULT:", buf.String())
}
And here's the implementation:
package main
import (
"bytes"
"fmt"
"io"
"strings"
"unicode/utf8"
"golang.org/x/text/transform"
)
type skipper struct {
pos int
cnt int
}
// NewSkipper creates a text transformer which will remove the rune at pos
func NewSkipper(pos int) transform.Transformer {
return &skipper{pos: pos}
}
func (s *skipper) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
for utf8.FullRune(src) {
_, sz := utf8.DecodeRune(src)
// not enough space in the dst
if len(dst) < sz {
return nDst, nSrc, transform.ErrShortDst
}
if s.pos != s.cnt {
copy(dst[:sz], src[:sz])
// track that we stored in dst
dst = dst[sz:]
nDst += sz
}
// track that we read from src
src = src[sz:]
nSrc += sz
// on to the next rune
s.cnt++
}
if len(src) > 0 && !atEOF {
return nDst, nSrc, transform.ErrShortSrc
}
return nDst, nSrc, nil
}
func (s *skipper) Reset() {
s.cnt = 0
}
There may be bugs with this code, but hopefully you can see the idea.
The benefit of this approach is it could work on a potentially infinite amount of data without having to store all of it in memory. For example you could transform a file this way.
Edit:
Remove the ith rune in the buffer:
A: Shift all runes one location to the left (Here A is faster than B), try it on The Go Playground:
func removeRuneAt(s string, runePosition int) string {
if runePosition < 0 {
return s
}
r := []rune(s)
if runePosition >= len(r) {
return s
}
copy(r[runePosition:], r[runePosition+1:])
return string(r[:len(r)-1])
}
B: Copy to new buffer, try it on The Go Playground
func removeRuneAt(s string, runePosition int) string {
if runePosition < 0 {
return s // avoid allocation
}
r := []rune(s)
if runePosition >= len(r) {
return s // avoid allocation
}
t := make([]rune, len(r)-1) // Apply replacements to buffer.
w := copy(t, r[:runePosition])
w += copy(t[w:], r[runePosition+1:])
return string(t[:w])
}
C: Try it on The Go Playground:
package main
import (
"bytes"
"fmt"
)
func main() {
str := "hello"
fmt.Println(str)
fmt.Println(removeRuneAt(str, 1))
buf := bytes.NewBuffer([]byte(str))
fmt.Println(buf.Bytes())
buf = bytes.NewBuffer([]byte(removeRuneAt(buf.String(), 1)))
fmt.Println(buf.Bytes())
}
func removeRuneAt(s string, runePosition int) string {
if runePosition < 0 {
return s // avoid allocation
}
r := []rune(s)
if runePosition >= len(r) {
return s // avoid allocation
}
t := make([]rune, len(r)-1) // Apply replacements to buffer.
w := copy(t, r[0:runePosition])
w += copy(t[w:], r[runePosition+1:])
return string(t[0:w])
}
D: Benchmark:
A: 745.0426ms
B: 1.0160581s
for 2000000 iterations
1- Short Answer: to replace all (n) instances of a character (or even a string):
n := -1
newR := ""
old := "µ"
buf = bytes.NewBuffer([]byte(strings.Replace(buf.String(), old, newR, n)))
2- To replace only the ith instance of a character (or string) in the buffer, you may use:
buf = bytes.NewBuffer([]byte(Replace(buf.String(), oldString, newOrEmptyString, ith)))
See:
// Replace returns a copy of the string s with the ith
// non-overlapping instance of old replaced by new.
func Replace(s, old, new string, ith int) string {
if len(old) == 0 || old == new || ith < 0 {
return s // avoid allocation
}
i, j := 0, 0
for ; ith >= 0; ith-- {
j = strings.Index(s[i:], old)
if j < 0 {
return s // avoid allocation
}
j += i
i = j + len(old)
}
t := make([]byte, len(s)+(len(new)-len(old))) // Apply replacements to buffer.
w := copy(t, s[0:j])
w += copy(t[w:], new)
w += copy(t[w:], s[j+len(old):])
return string(t[0:w])
}
Try it on The Go Playground:
package main
import (
"bytes"
"fmt"
"strings"
)
func main() {
str := `How are you?µ`
fmt.Println(str)
fmt.Println(Replace(str, "µ", "", 0))
buf := bytes.NewBuffer([]byte(str))
fmt.Println(buf.Bytes())
buf = bytes.NewBuffer([]byte(Replace(buf.String(), "µ", "", 0)))
fmt.Println(buf.Bytes())
}
func Replace(s, old, new string, ith int) string {
if len(old) == 0 || old == new || ith < 0 {
return s // avoid allocation
}
i, j := 0, 0
for ; ith >= 0; ith-- {
j = strings.Index(s[i:], old)
if j < 0 {
return s // avoid allocation
}
j += i
i = j + len(old)
}
t := make([]byte, len(s)+(len(new)-len(old))) // Apply replacements to buffer.
w := copy(t, s[0:j])
w += copy(t[w:], new)
w += copy(t[w:], s[j+len(old):])
return string(t[0:w])
}
3- If you want to remove all instances of Unicode character (old string) from any position in the string, you may use:
strings.Replace(str, old, "", -1)
4- The same call also works on the contents of a bytes.Buffer:
strings.Replace(buf.String(), old, newR, -1)
Like so:
buf = bytes.NewBuffer([]byte(strings.Replace(buf.String(), old, newR, -1)))
Here is the complete working code (try it on The Go Playground):
package main
import (
"bytes"
"fmt"
"strings"
)
func main() {
str := `کیا حال ھے؟` //How are you?
old := `ک`
newR := ""
fmt.Println(strings.Replace(str, old, newR, -1))
buf := bytes.NewBuffer([]byte(str))
// for _, r := range str {
// buf.WriteRune(r)
// }
fmt.Println(buf.Bytes())
bs := []byte(strings.Replace(buf.String(), old, newR, -1))
buf = bytes.NewBuffer(bs)
fmt.Println(" ", buf.Bytes())
}
output:
یا حال ھے؟
[218 169 219 140 216 167 32 216 173 216 167 217 132 32 218 190 219 146 216 159]
[219 140 216 167 32 216 173 216 167 217 132 32 218 190 219 146 216 159]
5- strings.Replace is very efficient, see inside:
// Replace returns a copy of the string s with the first n
// non-overlapping instances of old replaced by new.
// If old is empty, it matches at the beginning of the string
// and after each UTF-8 sequence, yielding up to k+1 replacements
// for a k-rune string.
// If n < 0, there is no limit on the number of replacements.
func Replace(s, old, new string, n int) string {
if old == new || n == 0 {
return s // avoid allocation
}
// Compute number of replacements.
if m := Count(s, old); m == 0 {
return s // avoid allocation
} else if n < 0 || m < n {
n = m
}
// Apply replacements to buffer.
t := make([]byte, len(s)+n*(len(new)-len(old)))
w := 0
start := 0
for i := 0; i < n; i++ {
j := start
if len(old) == 0 {
if i > 0 {
_, wid := utf8.DecodeRuneInString(s[start:])
j += wid
}
} else {
j += Index(s[start:], old)
}
w += copy(t[w:], s[start:j])
w += copy(t[w:], new)
start = j + len(old)
}
w += copy(t[w:], s[start:])
return string(t[0:w])
}

Read input from console in Unicode instead of UTF-8 (hex) in golang

I am trying to read a user input with bufio in console. The text can have some special characters (é, à, ♫, ╬,...).
The code looks like this:
reader := bufio.NewReader(os.Stdin)
input, _ := reader.ReadString('\n')
If I type "é", for example, ReadString reads it as "c3 a9" instead of "00e9". How can I read the text input as Unicode code points instead of UTF-8 bytes? I need to use this value as a hash table key.
Thanks
Go strings are conceptually a read-only slice of bytes. The encoding of those bytes is not specified, but string constants will be UTF-8 and using UTF-8 in other strings is the recommended approach.
Go provides convenience functions for accessing the UTF-8 as Unicode code points (or runes in Go-speak). A range loop over a string will do the UTF-8 decoding for you. Converting to []rune will give you a rune slice, i.e. the Unicode code points in order. These goodies only work on UTF-8 encoded strings and byte slices. I would strongly suggest using UTF-8 internally.
An example:
package main
import (
"bufio"
"fmt"
"os"
)
func main() {
reader := bufio.NewReader(os.Stdin)
input, _ := reader.ReadString('\n')
println("non-range loop - bytes")
for i := 0; i < len(input); i++ {
fmt.Printf("%d %d %[2]x\n", i, input[i])
}
println("range-loop - runes")
for idx, r := range input {
fmt.Printf("%d %d %[2]c\n", idx, r)
}
println("converted to rune slice")
rs := []rune(input)
fmt.Printf("%#v\n", rs)
}
With the input: X é X
non-range loop - bytes
0 88 58
1 32 20
2 195 c3
3 169 a9
4 32 20
5 88 58
6 10 a
range-loop - runes
0 88 X
1 32
2 233 é
4 32
5 88 X
6 10
converted to rune slice
[]int32{88, 32, 233, 32, 88, 10}
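Since the goal is a hash table key, it may help to see that a Go string already works as a map key as it is, and that the code points are available whenever needed. This is a sketch under assumptions: the literal "é\n" stands in for what reader.ReadString('\n') returns, and codePoints is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// codePoints returns the Unicode code points of s in order.
// (Hypothetical helper for illustration.)
func codePoints(s string) []rune {
	return []rune(s)
}

func main() {
	input := "é\n" // stands in for the result of reader.ReadString('\n')

	// Strip the trailing line ending before using the text as a key.
	key := strings.TrimRight(input, "\r\n")

	counts := map[string]int{}
	counts[key]++ // the UTF-8 string itself is a valid map key

	fmt.Printf("%q seen %d time(s), code points %U\n", key, counts[key], codePoints(key))
}
```

If a single character is the key, a map[rune]int keyed by codePoints(key)[0] would work as well.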
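Unicode and UTF-8 are not alternatives to each other: Unicode assigns a code point to each character, and UTF-8 is one way of encoding those code points as bytes. A string can be both Unicode and UTF-8. I learned a lot about this by reading Strings, bytes, runes and characters in Go.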
To answer your question,
You can use DecodeRuneInString from unicode/utf8 package.
s := "é"
r, _ := utf8.DecodeRuneInString(s)
fmt.Printf("%x", r)
DecodeRuneInString(s) returns the first UTF-8 encoded character (rune) in s along with that character's width in bytes. So if you want the Unicode code point of each rune in a string, here's how to do it. This is the example given in the linked documentation, only slightly modified.
str := "Hello, 世界"
for len(str) > 0 {
r, size := utf8.DecodeRuneInString(str)
fmt.Printf("%x %v\n", r, size)
str = str[size:]
}
Alternatively as Juergen points out you can use a range loop on the string to get runes contained in the string.
str := "Hello, 世界"
for _, r := range str {
fmt.Printf("%x \n", r)
}

How can I convert from int to hex

I want to convert from int to hex in Golang.
In strconv, there is a method that converts strings to hex. Is there a similar method to get a hex string from an int?
Since hexadecimal is just another way of writing an integer, you can ask the fmt package for a string representation of that integer, using fmt.Sprintf() and the %x or %X format.
i := 255
h := fmt.Sprintf("%x", i)
fmt.Printf("Hex conv of '%d' is '%s'\n", i, h)
h = fmt.Sprintf("%X", i)
fmt.Printf("HEX conv of '%d' is '%s'\n", i, h)
Output:
Hex conv of '255' is 'ff'
HEX conv of '255' is 'FF'
"Hex" isn't a real thing. You can use a hexadecimal representation of a number, but there's no difference between 0xFF and 255. More info on that can be found in the docs, which point out that you can use 0xff to define the integer constant 255. As you mention, if you're trying to find the hexadecimal representation of an integer, you could use strconv:
package main
import (
"fmt"
"strconv"
)
func main() {
fmt.Println(strconv.FormatInt(255, 16))
// gives "ff"
}
When formatting bytes, hex needs a two-digit representation with a leading 0.
For example: 1 => '01', 15 => '0f', etc.
It is possible to force Sprintf to respect this:
h := fmt.Sprintf("%02x", 14)
fmt.Println(h) // 0e
h2 := fmt.Sprintf("%02x", 231)
fmt.Println(h2) // e7
The pattern "%02x" means:
'0' pads with zeros
'2' sets the minimum output width to two characters
'x' converts to hexadecimal
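Applied to a whole byte slice, the same pattern can be sketched as follows. The helper name hexBytes is made up for illustration; note that fmt's %x verb applied directly to a []byte produces the same two-digits-per-byte output.

```go
package main

import (
	"fmt"
	"strings"
)

// hexBytes formats each byte as two lowercase hex digits and joins them.
// (Hypothetical helper for illustration.)
func hexBytes(b []byte) string {
	var sb strings.Builder
	for _, v := range b {
		fmt.Fprintf(&sb, "%02x", v)
	}
	return sb.String()
}

func main() {
	b := []byte{1, 15, 231}
	fmt.Println(hexBytes(b))         // 010fe7
	fmt.Println(fmt.Sprintf("%x", b)) // 010fe7
}
```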
i := 4357640193405743614
h := fmt.Sprintf("%016x", i)
fmt.Printf("Decimal: %d,\nHexa: %s", i, h)
Output:
Decimal: 4357640193405743614,
Hexa: 3c7972ab0ae9f1fe
Playground: https://play.golang.org/p/ndlMyBdQjmT
Sprintf is more versatile but FormatInt is faster. Choose whichever suits you better.
func Benchmark_sprintf(b *testing.B) { // 83.8 ns/op
for n := 0; n < b.N; n++ {
_ = fmt.Sprintf("%x", n)
}
}
func Benchmark_formatint(b *testing.B) { // 28.5 ns/op
bn := int64(b.N)
for n := int64(0); n < bn; n++ {
_ = strconv.FormatInt(n, 16)
}
}
For example, if the value is a uint32, you can pull out its individual bytes with shifts and masks, as seen below:
var p uint32
p = 4278190335
r := p >> 24 & 0xFF
g := p >> 16 & 0xFF
b := p >> 8 & 0xFF
fmt.Println(r, g, b)//255 0 0
You can also check this online tool for reference: https://cryptii.com/pipes/integer-encoder
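To tie the shifts back to the hex string, here is a small sketch using the same value. The helper names hex32 and splitBytes are assumptions for illustration, not part of any library.

```go
package main

import "fmt"

// hex32 formats p as eight zero-padded lowercase hex digits.
// (Hypothetical helper for illustration.)
func hex32(p uint32) string {
	return fmt.Sprintf("%08x", p)
}

// splitBytes returns the four bytes of p, most significant first.
// (Hypothetical helper for illustration.)
func splitBytes(p uint32) [4]uint8 {
	return [4]uint8{uint8(p >> 24), uint8(p >> 16), uint8(p >> 8), uint8(p)}
}

func main() {
	var p uint32 = 4278190335
	fmt.Println(hex32(p))      // ff0000ff
	fmt.Println(splitBytes(p)) // [255 0 0 255]
}
```

Each pair of hex digits in the string corresponds to one of the extracted bytes.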

How to transform a string into an ASCII string like in C?

I have to do a cryptography project for school and I chose Go for it!
I read the docs, but I only know C, so it's kind of hard for me right now.
First, I needed to collect the program arguments, which I did. I stored the arguments in string variables like this:
var text, base string = os.Args[1], os.Args[6]
Now I need to store the ASCII numbers in an array of int. For example, in C I would have done something like this:
int arr[18];
char str[18] = "Hi Stack OverFlow";
arr[i] = str[i] - 96;
So how could I do that in Go?
Thanks !
Here's an example that is similar to the other answer but avoids importing additional packages.
Create a slice of int with the length equal to the string's length. Then iterate over the string to extract each character as int and assign it to the corresponding index in the int slice. Here's code (also on the Go Playground):
package main
import "fmt"
func main() {
s := "Hi Stack OverFlow"
fmt.Println(StringToInts(s))
}
// makes a slice of int and stores each char from string
// as int in the slice
func StringToInts(s string) (intSlice []int) {
intSlice = make([]int, len(s))
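// Index byte by byte; a range loop would skip the continuation
// bytes of any multi-byte character and leave zeros in the slice.
for i := 0; i < len(s); i++ {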
intSlice[i] = int(s[i])
}
return
}
Output of the above program is:
[72 105 32 83 116 97 99 107 32 79 118 101 114 70 108 111 119]
The StringToInts function above should do what you want. Though it returns a slice (not an array) of int, it should satisfy your use case.
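The C snippet in the question also subtracts 96 from every byte. A sketch of that variant, assuming the subtraction should be applied to every byte exactly as in the C code (offsets is a hypothetical name):

```go
package main

import "fmt"

// offsets returns each byte of s minus base, mirroring the C expression
// arr[i] = str[i] - 96 when base is 96. (Hypothetical helper.)
func offsets(s string, base int) []int {
	out := make([]int, len(s))
	for i := 0; i < len(s); i++ {
		out[i] = int(s[i]) - base
	}
	return out
}

func main() {
	fmt.Println(offsets("abc", 96)) // [1 2 3]
	fmt.Println(offsets("Hi Stack OverFlow", 96))
}
```

As in C, upper-case letters and spaces come out negative with this offset, because their byte values are below 96.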
My guess is that you want something like this:
package main
import (
"fmt"
"strings"
)
// transform transforms ASCII letters to numbers.
// Letters in the English (basic Latin) alphabet, both upper and lower case,
// are represented by a number between one and twenty-six. All other characters,
// including space, are represented by the number zero.
func transform(s string) []int {
n := make([]int, 0, len(s))
other := 'a' - 1
for _, r := range strings.ToLower(s) {
if 'a' > r || r > 'z' {
r = other
}
n = append(n, int(r-other))
}
return n
}
func main() {
s := "Hi Stack OverFlow"
fmt.Println(s)
n := transform(s)
fmt.Println(n)
}
Output:
Hi Stack OverFlow
[8 9 0 19 20 1 3 11 0 15 22 5 18 6 12 15 23]
Take A Tour of Go and see if you can understand what the program does.

print a value of particular byte in Array of string in golang

I am new to Go and I want to print the individual bytes of the strings in a slice.
In the code below I want to print the values 'h', 'e', 'l', 'l', 'o' one at a time, but I am not able to do so.
func main() {
strslice := make([]string, 4, 5)
strslice[0] = "hello"
strslice[1] = "go"
strslice[2] = "lang"
strslice[3] = "whatsup"
for i := 0; i < len(strslice[i]); i++ {
fmt.Printf("slice is %c \n", strslice[i])
}
}
In Go, characters are stored in a string as a variable-width sequence of UTF-8 encoded bytes. The ASCII code points (0x00..0x7F) occupy one byte. Other code points occupy two to four bytes. To print code points (characters) separately:
package main
import "fmt"
func main() {
strslice := make([]string, 5, 5)
strslice[0] = "hello"
strslice[1] = "go"
strslice[2] = "lang"
strslice[3] = "whatsup"
strslice[4] = "Hello, 世界"
for _, s := range strslice {
for _, c := range s {
fmt.Printf("%c ", c)
}
fmt.Printf("\n")
}
}
Output:
h e l l o
g o
l a n g
w h a t s u p
H e l l o , 世 界
Here's an illustration of the difference between UTF-8 encoded bytes and characters,
package main
import "fmt"
func main() {
str := "Hello, 世界"
fmt.Println("Bytes:")
for i := 0; i < len(str); i++ {
fmt.Printf("'%c' ", str[i])
}
fmt.Printf("\n")
fmt.Println("Characters:")
for _, c := range str {
fmt.Printf("'%c' ", c)
}
fmt.Printf("\n")
}
Output:
Bytes:
'H' 'e' 'l' 'l' 'o' ',' ' ' 'ä' '¸' '' 'ç' '' ''
Characters:
'H' 'e' 'l' 'l' 'o' ',' ' ' '世' '界'
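The same difference shows up when counting: len reports bytes, while utf8.RuneCountInString from the standard unicode/utf8 package reports characters. A minimal sketch (sizes is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes returns the length of s in bytes and in runes.
// (Hypothetical helper for illustration.)
func sizes(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	nBytes, nRunes := sizes("Hello, 世界")
	// Seven ASCII bytes plus two three-byte characters.
	fmt.Println(nBytes, nRunes) // 13 9
}
```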
References:
Unicode UTF-8 FAQ
For statements, The Go Programming Language Specification
One possible approach:
func main() {
strslice := make([]string, 4, 5)
strslice[0] = "hello"
strslice[1] = "go"
strslice[2] = "lang"
strslice[3] = "whatsup"
for i := 0; i < len(strslice); i++ {
for j := 0; j < len(strslice[i]); j++ {
fmt.Printf("slice[%d] is %c \n", i, strslice[i][j])
}
}
}
As you can see, each strslice element is iterated in a nested for loop, using its own loop variable (j).
In strslice[i][j], i is used to access an element of slice (a string), and j is used to access a specific byte of this string.
Note that it's a byte, not a character, because that's exactly what was asked. But check @peterSO's wonderful answer if you actually want to print each character of the string, as there's a big chance you do.
