Go rune
last modified April 11, 2024
In this article we show how to work with runes in Golang.
$ go version
go version go1.22.2 linux/amd64
We use Go version 1.22.2.
A rune is an alias for the int32 data type. It represents a Unicode code point. A Unicode code point, or code position, is a numerical value that usually represents a single Unicode character. An int32 is large enough to hold any code point, including the almost 150,000 characters that Unicode currently defines.
ASCII defines 128 characters, identified by the code points 0–127. Unicode, a superset of ASCII, defines a codespace of 1,114,112 code points.
The original rune word is a letter belonging to the written language of various ancient Germanic peoples, especially the Scandinavians and the Anglo-Saxons.
A string is a sequence of bytes; more precisely, a read-only slice of arbitrary bytes. Go source code is always UTF-8. Strings usually contain Unicode text encoded in UTF-8, which encodes every Unicode code point using one to four bytes.
Go rune constants
A Go rune constant is delimited with a pair of single quote ' characters.
package main

import (
	"fmt"
	"reflect"
)

func main() {

	a1 := '🦍'
	var a2 = 'k'
	var a3 rune = '🦡'

	fmt.Printf("%c - %s\n", a1, reflect.TypeOf(a1))
	fmt.Printf("%c - %s\n", a2, reflect.TypeOf(a2))
	fmt.Printf("%c - %s\n", a3, reflect.TypeOf(a3))
}
We define three rune constants.
a1 := '🦍'
var a2 = 'k'
var a3 rune = '🦡'
We have two emojis and an ASCII character. When no type is specified explicitly, Go infers the rune type from the constant.
fmt.Printf("%c - %s\n", a1, reflect.TypeOf(a1))
fmt.Printf("%c - %s\n", a2, reflect.TypeOf(a2))
fmt.Printf("%c - %s\n", a3, reflect.TypeOf(a3))
We print the characters and their types.
$ go run rune_constant.go
🦍 - int32
k - int32
🦡 - int32
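Because the reported type is int32, a rune can take part in ordinary integer arithmetic. The following is a minimal sketch of that idea; the function name alphabet is hypothetical and only serves the illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// alphabet (a hypothetical helper) builds the string "a".."z"
// by incrementing a rune like any other integer.
func alphabet() string {
	var sb strings.Builder
	for r := 'a'; r <= 'z'; r++ {
		sb.WriteRune(r)
	}
	return sb.String()
}

func main() {
	fmt.Println(alphabet())

	// Subtracting two runes gives the distance between their code points.
	fmt.Println('z' - 'a') // 25
}
```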
We can use escapes to define rune constants.
package main

import (
	"fmt"
)

func main() {

	a1 := '🧺'
	a2 := '\u2665'
	a3 := '\U0001F3A8'

	fmt.Printf("%c\n", a1)
	fmt.Printf("%c\n", a2)
	fmt.Printf("%c\n", a3)
}
We define three constants; two of them use escapes.
a2 := '\u2665'
In the first case, the \u is followed by exactly four hexadecimal digits.
a3 := '\U0001F3A8'
In the second case, the \U is followed by exactly eight hexadecimal digits.
$ go run rune_escapes.go
🧺
♥
🎨
Go rune Unicode code points
The Unicode code points refer to the characters in the Unicode table.
package main

import "fmt"

func main() {

	s1 := "falcon"
	r1 := []rune(s1)
	fmt.Printf("%U\n", r1)

	s2 := "🐧🐧🐧"
	r2 := []rune(s2)
	fmt.Printf("%U\n", r2)
}
With the %U format verb, we get the Unicode code point.
$ go run code_points.go
[U+0066 U+0061 U+006C U+0063 U+006F U+006E]
[U+1F427 U+1F427 U+1F427]
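The conversion also works in the other direction: a numeric code point can be converted to a rune and printed as a character. The following sketch assumes a made-up helper called describe; the %c, %U and %d verbs are standard fmt verbs.

```go
package main

import "fmt"

// describe (a hypothetical helper) formats a numeric code point
// as a character, in U+ notation, and as a decimal number.
func describe(cp int32) string {
	r := rune(cp)
	return fmt.Sprintf("%c %U %d", r, r, r)
}

func main() {
	fmt.Println(describe(0x66))    // f U+0066 102
	fmt.Println(describe(0x1F427)) // 🐧 U+1F427 128039
}
```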
Go counting runes
In the following example, we count the number of runes in a string.
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {

	msg := "one 🐘"

	n1 := len(msg)
	n2 := utf8.RuneCountInString(msg)

	fmt.Println(n1)
	fmt.Println(n2)
}
With the len function, we get the number of bytes. To count the number of runes, we use the utf8.RuneCountInString function.
$ go run count.go
8
5
Go runes and bytes
A byte in Go is an alias for uint8. A single byte can hold one ASCII character, but only one part of a multi-byte UTF-8 sequence.
package main

import (
	"fmt"
)

func main() {

	msg := "🐘 🦥 🐋"

	data := []rune(msg)
	fmt.Println(data)

	data2 := []byte(msg)
	fmt.Println(data2)
}
We have a string consisting of three emojis and two spaces. We print the slice of runes and bytes for comparison.
$ go run rune_bytes.go
[128024 32 129445 32 128011]
[240 159 144 152 32 240 159 166 165 32 240 159 144 139]
Go loop over runes
The for/range form applied to a string iterates over its runes, not its bytes.
package main

import (
	"fmt"
)

func main() {

	msg := "one 🐘 and three 🐋"

	for idx, e := range msg {
		fmt.Printf("Char:%s Byte pos: %d \n", string(e), idx)
	}
}
The example iterates over runes. It shows the character and its byte position in the string.
$ go run loop.go
Char:o Byte pos: 0
Char:n Byte pos: 1
Char:e Byte pos: 2
Char:  Byte pos: 3
Char:🐘 Byte pos: 4
Char:  Byte pos: 8
Char:a Byte pos: 9
Char:n Byte pos: 10
Char:d Byte pos: 11
Char:  Byte pos: 12
Char:t Byte pos: 13
Char:h Byte pos: 14
Char:r Byte pos: 15
Char:e Byte pos: 16
Char:e Byte pos: 17
Char:  Byte pos: 18
Char:🐋 Byte pos: 19
In the next example, we have another way of traversing runes.
package main

import (
	"fmt"
)

func main() {

	msg := "one 🐘 and three 🐋"

	data := []rune(msg)

	for i := 0; i < len(data); i++ {
		fmt.Printf("Char %c Unicode: %U, Rune pos: %d\n", data[i], data[i], i)
	}

	fmt.Println()
}
We convert the string to a rune slice and then we loop over the slice with a for loop.
$ go run loop2.go
Char o Unicode: U+006F, Rune pos: 0
Char n Unicode: U+006E, Rune pos: 1
Char e Unicode: U+0065, Rune pos: 2
Char   Unicode: U+0020, Rune pos: 3
Char 🐘 Unicode: U+1F418, Rune pos: 4
Char   Unicode: U+0020, Rune pos: 5
Char a Unicode: U+0061, Rune pos: 6
Char n Unicode: U+006E, Rune pos: 7
Char d Unicode: U+0064, Rune pos: 8
Char   Unicode: U+0020, Rune pos: 9
Char t Unicode: U+0074, Rune pos: 10
Char h Unicode: U+0068, Rune pos: 11
Char r Unicode: U+0072, Rune pos: 12
Char e Unicode: U+0065, Rune pos: 13
Char e Unicode: U+0065, Rune pos: 14
Char   Unicode: U+0020, Rune pos: 15
Char 🐋 Unicode: U+1F40B, Rune pos: 16
Source
Strings, bytes, runes and characters in Go
In this article we have worked with Go runes.