Converting Between []byte, string, and []rune in Golang: Underlying Principles and Analysis

In Go, converting between []byte and string comes up in many situations, especially around json.Marshal and json.Unmarshal.
This article covers:
Ways to convert between these types, with a performance analysis
How these types are stored under the hood
Code gist
Conversions

Converting between []byte and string

string -> []byte

```go
// Needs the reflect, testing, and unsafe imports.
// genString is a helper that is not shown here.

// Standard conversion: allocates a new []byte and copies the string into it.
func BenchmarkStringToByteSlice(b *testing.B) {
	s := genString(10000)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		bs := []byte(s)
		if len(bs) != len(s) {
			b.Error("error")
		}
	}
}

// Unsafe conversion: builds a slice header that points at the string's bytes.
func BenchmarkStringToByteSliceUnsafe(b *testing.B) {
	s := genString(10000)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		l := len(s)
		bs := *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
			Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
			Len:  l,
			Cap:  l,
		}))
		if len(bs) != len(s) {
			b.Error("error")
		}
	}
}
```
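The first benchmark uses the direct []byte(s) conversion, which is what we normally write; the second uses unsafe. The difference is that the former allocates new memory and copies the data, while the latter reuses the string's existing memory.
The benchmark results confirm this: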
```
go test -run=BenchmarkStringToByteSlice -bench=StringToByteSlice
# go-demo.test
goos: darwin
goarch: amd64
pkg: go-demo
BenchmarkStringToByteSlice-12          1164224       964 ns/op     10285 B/op    1 allocs/op
BenchmarkStringToByteSliceUnsafe-12    1000000000    0.380 ns/op   0 B/op        0 allocs/op
PASS
ok  	go-demo	2.089s
```
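The copy-versus-reuse distinction can also be checked directly rather than inferred from allocation counts. The following is a minimal sketch, not the article's benchmark code: it assumes Go 1.20 or later, where unsafe.StringData, unsafe.Slice, and unsafe.SliceData are available as replacements for the reflect header types, and the helper names copyToBytes, aliasToBytes, and aliasesString are made up for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// copyToBytes is the ordinary conversion: the result owns its own memory.
func copyToBytes(s string) []byte {
	return []byte(s)
}

// aliasToBytes returns a []byte that shares the string's memory (Go 1.20+).
// The result must be treated as read-only because strings are immutable.
func aliasToBytes(s string) []byte {
	if s == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// aliasesString reports whether b starts at the same address as s.
func aliasesString(s string, b []byte) bool {
	return len(b) > 0 && unsafe.StringData(s) == unsafe.SliceData(b)
}

func main() {
	s := "hello, gopher"

	c := copyToBytes(s)
	c[0] = 'H'                // safe: c is an independent copy
	fmt.Println(s, string(c)) // hello, gopher Hello, gopher

	a := aliasToBytes(s)
	fmt.Println(aliasesString(s, a)) // true
}
```

Writing through the aliased slice is not allowed: the bytes may live in read-only memory, and other code assumes the string never changes.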
[]byte -> string

```go
// Needs the testing and unsafe imports.
// genSliceByte is a helper that is not shown here.

// Standard conversion: allocates a new string and copies the bytes into it.
func BenchmarkSliceByteToString(b *testing.B) {
	bs := genSliceByte(100)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		s := string(bs)
		if len(s) != len(bs) {
			b.Error("error")
		}
	}
}

// Unsafe conversion: reinterprets the slice header as a string header.
func BenchmarkSliceByteToStringUnsafe(b *testing.B) {
	bs := genSliceByte(100)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		s := *(*string)(unsafe.Pointer(&bs))
		if len(s) != len(bs) {
			b.Log("slice: ", len(bs), " string: ", len(s))
			b.Error("error: ")
		}
	}
}
```
Benchmark results:

```
go test -run=BenchmarkSliceByteToString -bench=SliceByteToString
# go-demo.test
goos: darwin
goarch: amd64
pkg: go-demo
BenchmarkSliceByteToString-12          35913873      32.4 ns/op    112 B/op    1 allocs/op
BenchmarkSliceByteToStringUnsafe-12    1000000000    0.253 ns/op   0 B/op      0 allocs/op
PASS
ok  	go-demo	3.796s
```
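The zero-allocation result comes at a price: the string and the slice now share one backing array. The sketch below illustrates the hazard, assuming Go 1.20 or later and the standard gc toolchain; bytesToStringAlias is a hypothetical helper name. The documentation for unsafe.String says the bytes must not be modified afterwards, and the example deliberately breaks that rule only to show what goes wrong.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToStringAlias returns a string that shares b's backing array
// (Go 1.20+). No bytes are copied.
func bytesToStringAlias(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := []byte("hello")

	copied := string(b)              // independent copy
	aliased := bytesToStringAlias(b) // shares memory with b

	b[0] = 'j' // violates the immutability that string users rely on

	fmt.Println(copied)  // hello
	fmt.Println(aliased) // jello
}
```

The unsafe form is therefore only appropriate when the byte slice is never written to again for as long as the string is in use.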
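Converting between string and []rune

Converting between string and []rune is similar to the above. The main difference is that the length of the byte sequence corresponding to the []rune has to be computed, because each rune occupies 4 bytes in the slice but 1 to 4 bytes in UTF-8. Only a []rune to string conversion is shown here.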
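```go
// Needs the reflect, testing, unicode/utf8, and unsafe imports.
// genSliceRune is a helper that is not shown here.
//
// Caveat: this benchmark only compares lengths. The resulting string views
// the raw 4-byte rune values, not their UTF-8 encoding, so its contents
// differ from string(bs).
func BenchmarkSliceRuneToStringUnsafe(b *testing.B) {
	bs := genSliceRune(100)
	s1 := string(bs)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		// Sum the UTF-8 length of every rune.
		var l int
		for _, r := range bs {
			l += utf8.RuneLen(r)
		}
		s := *(*string)(unsafe.Pointer(&reflect.StringHeader{
			Data: (*(*reflect.SliceHeader)(unsafe.Pointer(&bs))).Data,
			Len:  l,
		}))
		if len(s1) != len(s) {
			b.Error("error")
		}
	}
}
```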
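Because a []rune holds 4-byte code points while a string holds UTF-8 bytes, a []rune to string conversion whose contents are correct has to re-encode each rune. The sketch below shows one way to do that by hand, using the same utf8.RuneLen length calculation to size the buffer. It assumes Go 1.18 or later for utf8.AppendRune, and encodeRunes is a made-up name; in ordinary code string(rs) does the same job.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeRunes converts runes to a UTF-8 string by hand. It first sums the
// encoded length of every rune so the buffer is allocated exactly once.
func encodeRunes(rs []rune) string {
	n := 0
	for _, r := range rs {
		if l := utf8.RuneLen(r); l > 0 {
			n += l
		} else {
			// Invalid runes are encoded as U+FFFD, like string(rs) does.
			n += utf8.RuneLen(utf8.RuneError)
		}
	}
	buf := make([]byte, 0, n)
	for _, r := range rs {
		buf = utf8.AppendRune(buf, r)
	}
	return string(buf)
}

func main() {
	rs := []rune("go, 世界")
	s := encodeRunes(rs)
	fmt.Println(s == string(rs)) // true
	fmt.Println(len(rs), len(s)) // 6 10
}
```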
Underlying storage of String and Slice

```go
type StringHeader struct {
	Data uintptr
	Len  int
}

type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}
```
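The two types are nearly identical; SliceHeader just has an extra Cap field. This is what makes it possible to cast a []byte directly to a string through a pointer: the first two fields of the slice header line up with the string header. The reverse cast does not work, because it would read a Cap field that the string header does not have.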
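The size difference between the two headers can be confirmed with unsafe.Sizeof. This is a small sketch for the standard gc toolchain; headerWords is a hypothetical helper that expresses each size in machine words so the result does not depend on whether the platform is 32-bit or 64-bit.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords returns how many machine words the string header and the
// slice header occupy on the current platform.
func headerWords() (str, slice uintptr) {
	word := unsafe.Sizeof(uintptr(0)) // 8 on amd64
	return unsafe.Sizeof("") / word, unsafe.Sizeof([]byte(nil)) / word
}

func main() {
	str, slice := headerWords()
	fmt.Println(str, slice) // 2 3
}
```

On amd64 that is 16 bytes for a string and 24 bytes for a slice.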
Underlying storage of slice

```go
type slice struct {
	array unsafe.Pointer
	len   int
	cap   int
}
```
Let's look at the underlying structure of a slice in the compiler's assembly output:

```go
package pkg

var data = []int{1, 2}
```
```
go tool compile -S pkg.go
go.cuinfo.packagename. SDWARFINFO dupok size=0
	0x0000 70 6b 67                                         pkg
"".data SDATA size=24
	0x0000 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00  ................
	0x0010 02 00 00 00 00 00 00 00                          ........
	rel 0+8 t=1 ""..stmp_0+0
""..stmp_0 SNOPTRDATA size=16
	0x0000 01 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00  ................
...
```
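We can see that "".data has size 24 (an 8-byte pointer, plus 8 bytes each for len and cap). The slice's contents, two ints, are stored in ""..stmp_0.
Looking more closely at the bytes of "".data:
data+0 is all zeros in the dump; the line rel 0+8 t=1 ""..stmp_0+0 is a relocation that fills these 8 bytes with the address of ""..stmp_0
data+8 is 02 00 ..., which is len
data+16 is 02 00 ..., which is cap
The whole slice struct is laid out contiguously in memory, so it can be reinterpreted through pointer casts, much like reinterpret_cast in C++.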
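The same offsets can be read at run time. The sketch below walks the slice header with unsafe.Add (Go 1.17 or later) and reads the words that sit one and two machine words past the start, which correspond to data+8 and data+16 on a 64-bit platform. lenCapAt is a hypothetical helper, and the layout it relies on is that of the standard gc toolchain.

```go
package main

import (
	"fmt"
	"unsafe"
)

// lenCapAt reads len and cap from the slice header by offset instead of
// calling the len and cap builtins.
func lenCapAt(s *[]int) (length, capacity int) {
	word := unsafe.Sizeof(uintptr(0)) // 8 on amd64
	base := unsafe.Pointer(s)
	length = *(*int)(unsafe.Add(base, word))
	capacity = *(*int)(unsafe.Add(base, 2*word))
	return length, capacity
}

func main() {
	data := []int{1, 2}
	l, c := lenCapAt(&data)
	fmt.Println(l, c) // 2 2

	buf := make([]int, 3, 10)
	l, c = lenCapAt(&buf)
	fmt.Println(l, c) // 3 10
}
```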
Underlying structure of string

```go
package pkg

var testStr = "abc"
```
```
go.cuinfo.packagename. SDWARFINFO dupok size=0
	0x0000 70 6b 67                                         pkg
go.string."abc" SRODATA dupok size=3
	0x0000 61 62 63                                         abc
"".testStr SDATA size=16
	0x0000 00 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00  ................
	rel 0+8 t=1 go.string."abc"+0
```
This is very similar to the slice above; the size is simply 16 instead of 24, since there is no cap field.
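The two-word string layout can likewise be inspected from Go code by overlaying a struct with the same shape. The stringHeader type and the helpers rawLen and dataMatches below are assumptions made for this sketch, modeled on the slice struct shown earlier; unsafe.StringData requires Go 1.20 or later.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringHeader mirrors the in-memory layout of a string in the gc
// toolchain: a data pointer followed by a length.
type stringHeader struct {
	data unsafe.Pointer
	len  int
}

// rawLen reads the length word straight from the string header.
func rawLen(s string) int {
	return (*stringHeader)(unsafe.Pointer(&s)).len
}

// dataMatches reports whether the first header word equals the pointer
// returned by unsafe.StringData. Only meaningful for non-empty strings.
func dataMatches(s string) bool {
	h := (*stringHeader)(unsafe.Pointer(&s))
	return h.data == unsafe.Pointer(unsafe.StringData(s))
}

func main() {
	testStr := "abc"
	fmt.Println(rawLen(testStr))      // 3
	fmt.Println(dataMatches(testStr)) // true
}
```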
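Fat Pointer

A structure like slice, a pointer bundled with length information, is commonly called a fat pointer in C. Interested readers can refer to Go Slices are Fat Pointers.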
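Summary
This article covered converting between string, []byte, and []rune in Go, with a brief performance analysis
It also looked at how slices are stored under the hood in Go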