How It Works
A Method for Auditing the Local Variables of a Binary
2017-03-10 04:11:24
ochapman
# Why audit local variables

Our coroutine stacks are limited to 128 KB, so overly large local variables are a risk.

# The existing approach

function_read is a tool a colleague provided for checking risky local variables, shipped as function_read.tgz:

```
chapmanou@dev:~/handle/var$ tar -tf function_read.tgz
function_read
libelf.so
libdwarf.so
```

To run it:

```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD
./function_read YOUR_BINARY
```

Sample output:

```
[FILE NAME] ./iconv.c(line 537)
[FUNC NAME] libiconvlist
[WARNING] var[aliasbuf] offset[11996]
```

In other words, it found a local variable aliasbuf in libiconvlist at offset 11996. The offset is roughly equal to the variable's own size.

## Limitations of function_read

It cannot find variables defined inside a nested block, as shown below:

```
#include <stdio.h>
#include <stdarg.h>

void test() {
    {
        char buf[65536] = {0};
        snprintf(buf, sizeof(buf), "%s", "hello");
        printf("buf: %s", buf);
    }
    char buf2[65536] = {0};
    snprintf(buf2, sizeof(buf2), "%s", "hello2");
    printf("buf2: %s", buf2);
}

int main() {
    test();
}
```

Compile it:

```
gcc -m32 -g3 block_scope.c -o block_scope
```

Run the check:

```
$ ./function_read block_scope
[FILE NAME] /home/chapmanou/handle/block_scope.c(line 4)
[FUNC NAME] test
[WARNING] var[buf2] offset[65544]
```

Only buf2 is detected. As a result, the tool cannot catch a macro like the following:

```
#define _printf(fmt, args...) \
{\
    if(bRedirect)\
    {\
        char szTmp[1024]={0};\
        snprintf(szTmp, sizeof(szTmp), fmt, ##args);\
        strOutPut += (string)szTmp;\
    }\
    else \
    {\
        printf(fmt, ##args);\
    }\
}
```

Each expansion declares its own 1 KB szTmp buffer inside a nested block. A function that uses this macro heavily can therefore carry stack usage that function_read never reports:

```
void PrintJsonBegin(string sCallbackName, string sTopName)
{
    stackIsFirst.push(true);
    if (sTopName == "error")
    {
        //printf("%s_Errback(\n", sCallbackName.c_str());
        _printf("%s_Callback(\n", sCallbackName.c_str());
    }
    else
    {
        _printf("%s_Callback(\n", sCallbackName.c_str());
    }
    _printf("{\"%s\":{\n", sTopName.c_str());
    status = 1;
}
```

# How does function_read work?

function_read is only available as a binary:

```
$ file ./function_read
./function_read: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.4, dynamically linked (uses shared libs), for GNU/Linux 2.6.4, not stripped
```

It is not stripped, though, so we can get a peek at how it works from its symbols:

```
objdump -C -t function_read
```

Some of the symbols it prints (excerpt):

```
00000000 F *UND* 00000297 dwarf_dealloc
00000000 F *UND* 00000128 dwarf_formudata
00000000 F *UND* 00000271 dwarf_child
00000000 F *UND* 00000498 dwarf_loclist_n
0804a164 g O .data 00000000 .hidden __dso_handle
080496b0 g F .text 00000005 __libc_csu_fini
08048b83 g F .text 00000198 do_read_varoffset
00000000 F *UND* 000001b2 puts@@GLIBC_2.0
0804867c g F .init 00000000 _init
08048a9e g F .text 000000e5 get_die_line_num
08048800 g F .text 00000000 _start
00000000 F *UND* 0000005a dwarf_next_cu_header
00000000 F *UND* 00000030 dwarf_siblingof
080488a4 g F .text 00000064 check_die_tag_type
080496c0 g F .text 0000005c __libc_csu_init
00000000 F *UND* 00000030 dwarf_finish
0804a16c g *ABS* 00000000 __bss_start
080494e9 g F .text 000001c2 main
```

The dwarf_* functions are undefined (*UND*) here and are resolved from the libdwarf.so bundled in function_read.tgz. It is a fair guess that the DWARF library does most of the work.

# What is DWARF?

Simply put, DWARF is a debugging information format. For details, see [here](https://www.ibm.com/developerworks/library/os-debugging/); I won't go into it here.

The key point is that DWARF records information about each local variable: its name, file, function, and offset.

Since DWARF is such a widely used format, there ought to be existing tools for analyzing it.

# dwarfdump

dwarfdump is similar to objdump. To install it on Ubuntu:

```
sudo apt-get install dwarfdump
```

Use it to inspect the earlier example:

```
dwarfdump block_var/block_scope
```

Output (excerpt):

```
< 2><0x00000092>      DW_TAG_variable
                        DW_AT_name                  "buf"
                        DW_AT_decl_file             0x00000001 /home/chapmanou/handle/var/block_var/block_scope.c
                        DW_AT_decl_line             0x00000005
                        DW_AT_type                  <0x000000b3>
                        DW_AT_location              len 0x0004: 91ecff77: DW_OP_fbreg -131092
< 2><0x000000a2>      DW_TAG_variable
                        DW_AT_name                  "buf2"
                        DW_AT_decl_file             0x00000001 /home/chapmanou/handle/var/block_var/block_scope.c
                        DW_AT_decl_line             0x00000006
                        DW_AT_type                  <0x000000b3>
                        DW_AT_location              len 0x0004: 91ecff7b: DW_OP_fbreg -65556
```

The block-scoped variable is included this time. The output carries the variable's name (DW_AT_name), its file (DW_AT_decl_file), and the rest of the details. Parsing this output directly is therefore a quick way to audit a binary's local variables. The same goal can also be reached through a DWARF API.

# Parsing the dwarfdump output

The dwarfdump output above is fairly "structured", so parsing it is not hard. The approach:

1. Split the output into blocks. Each block starts at a "< " depth marker and ends where the next one begins.
2. Each block describes one DW_TAG entry. Since we are after variables, keep only the DW_TAG_variable blocks.
3. Parse each DW_TAG_variable block, keeping only variables that have a DW_AT_location.

The Go implementation is below. It reads a file containing the saved dwarfdump text, not the binary itself (for example `dwarfdump block_var/block_scope > block_scope.txt`).

```
/*
 * File Name: dwarfdump_var.go
 * Description: list local variables (name, frame offset, file:line)
 *              from a saved dwarfdump text dump
 * Created: 2017-03-09
 */
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

type tagBlock []byte
type tagBlocks []tagBlock

// SplitBlocks splits dwarfdump output into one block per DIE. A block
// starts at a "< " depth marker such as "< 2>" and ends where the next
// marker begins; the last block runs to the end of the file.
func SplitBlocks(file string) (tagBlocks, error) {
	buf, err := os.ReadFile(file)
	if err != nil {
		return nil, err
	}
	sep := []byte("< ")
	var tags tagBlocks
	start := bytes.Index(buf, sep)
	for start != -1 {
		next := bytes.Index(buf[start+1:], sep)
		if next == -1 {
			tags = append(tags, buf[start:])
			break
		}
		end := start + 1 + next
		tags = append(tags, buf[start:end])
		start = end
	}
	return tags, nil
}

type Tag struct {
	name   string
	file   string
	line   int64
	offset int
}

// attrValue returns the rest of the line after the first occurrence of
// attr in blk, with surrounding whitespace trimmed.
func attrValue(blk tagBlock, attr string) (string, error) {
	idx := bytes.Index(blk, []byte(attr))
	if idx == -1 {
		return "", fmt.Errorf("%s not found", attr)
	}
	rest := blk[idx+len(attr):]
	if end := bytes.IndexByte(rest, '\n'); end != -1 {
		rest = rest[:end]
	}
	return strings.TrimSpace(string(rest)), nil
}

// ParseBlock extracts name, file, line and frame offset from a
// DW_TAG_variable block whose location is DW_OP_fbreg.
func ParseBlock(blk tagBlock) (Tag, error) {
	var t Tag
	if !bytes.Contains(blk, []byte("DW_TAG_variable")) {
		return t, errors.New("not a variable")
	}

	// DW_AT_name    "buf"
	name, err := attrValue(blk, "DW_AT_name")
	if err != nil {
		return t, err
	}
	t.name = strings.Trim(name, "\"")

	// DW_AT_decl_file    0x00000001 /path/to/block_scope.c
	file, err := attrValue(blk, "DW_AT_decl_file")
	if err != nil {
		return t, err
	}
	fields := strings.Fields(file)
	if len(fields) == 0 {
		return t, errors.New("empty DW_AT_decl_file")
	}
	t.file = fields[len(fields)-1]

	// DW_AT_decl_line    0x00000005
	line, err := attrValue(blk, "DW_AT_decl_line")
	if err != nil {
		return t, err
	}
	if t.line, err = strconv.ParseInt(line, 0, 64); err != nil {
		return t, fmt.Errorf("line error: %s", err)
	}

	// DW_AT_location    len 0x0004: 91ecff77: DW_OP_fbreg -131092
	locIdx := bytes.Index(blk, []byte("DW_AT_location"))
	if locIdx == -1 {
		return t, errors.New("location not found")
	}
	operand, err := attrValue(blk[locIdx:], "DW_OP_fbreg")
	if err != nil {
		return t, err // not frame-relative, e.g. a static local
	}
	fbreg, err := strconv.Atoi(operand)
	if err != nil {
		return t, fmt.Errorf("locNum: %s", err)
	}
	// Locals sit below the frame base; report the distance as a positive number.
	if fbreg < 0 {
		fbreg = -fbreg
	}
	t.offset = fbreg
	return t, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: dwarfdump_var DWARFDUMP_OUTPUT_FILE")
		os.Exit(2)
	}
	blks, err := SplitBlocks(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, b := range blks {
		t, err := ParseBlock(b)
		if err != nil {
			continue // not a variable, or no frame-relative location
		}
		// CSV: name,offset,file:line
		fmt.Printf("%s,%d,%s:%d\n", t.name, t.offset, t.file, t.line)
	}
}
```

Reading the offset from DW_AT_location has an obvious drawback. The offset is measured from the frame base, so a variable's offset also counts everything placed between it and the frame base. In the output above, buf is a 64 KB array, yet its offset is 131092 because buf2 also occupies the frame.

# Parsing via the DWARF API

Go's standard library includes a DWARF implementation: https://golang.org/pkg/debug/dwarf/

I wrote a parsing example with it. Tying all the information together, however, requires a deeper understanding of DWARF internals, so I won't go into it here.
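The author's own debug/dwarf example is not shown. As a rough illustration of the API route, here is a minimal sketch (not the author's code). It opens an ELF binary with debug/elf and walks every DIE with a debug/dwarf `Reader`. For each DW_TAG_variable whose location is a single DW_OP_fbreg expression, it prints the enclosing function, the declaration line, the frame offset, and the size of the variable's type obtained from `Data.Type`. The enclosing function is tracked through the null entries that close each list of children.

Because the reader visits DIEs nested in lexical blocks, block-scoped variables such as buf should be reported. The type size also sidesteps the accumulated-offset drawback noted above. The sketch rests on several assumptions:

* It assumes an unoptimized build, where locations are plain expressions. Location lists are skipped.
* `fbregOffset` is a hypothetical helper that decodes the SLEB128 operand by hand.
* The CSV layout is an illustrative choice.

```go
// dwarf_vars.go: sketch listing frame-relative locals via debug/elf + debug/dwarf.
package main

import (
	"debug/dwarf"
	"debug/elf"
	"fmt"
	"log"
	"os"
)

// fbregOffset decodes a location expression of the exact form
// DW_OP_fbreg (0x91) <SLEB128>; any other expression yields ok == false.
func fbregOffset(loc []byte) (int64, bool) {
	if len(loc) < 2 || loc[0] != 0x91 {
		return 0, false
	}
	var result int64
	var shift uint
	for i := 1; i < len(loc); i++ {
		b := loc[i]
		result |= int64(b&0x7f) << shift
		shift += 7
		if b&0x80 == 0 {
			if i != len(loc)-1 {
				return 0, false // further operations follow
			}
			if shift < 64 && b&0x40 != 0 {
				result |= -1 << shift // sign-extend
			}
			return result, true
		}
	}
	return 0, false // truncated SLEB128
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: dwarf_vars BINARY")
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	d, err := f.DWARF()
	if err != nil {
		log.Fatal(err)
	}
	// One element per open DIE with children: the enclosing function's
	// name, or "" outside any function.
	var scopes []string
	r := d.Reader()
	for {
		e, err := r.Next()
		if err != nil {
			log.Fatal(err)
		}
		if e == nil {
			break // end of .debug_info
		}
		if e.Tag == 0 { // null entry closes the innermost scope
			if len(scopes) > 0 {
				scopes = scopes[:len(scopes)-1]
			}
			continue
		}
		fn := ""
		if len(scopes) > 0 {
			fn = scopes[len(scopes)-1]
		}
		// Only expression locations ([]byte) are handled; location lists are skipped.
		loc, _ := e.Val(dwarf.AttrLocation).([]byte)
		if off, ok := fbregOffset(loc); ok && e.Tag == dwarf.TagVariable && fn != "" {
			name, _ := e.Val(dwarf.AttrName).(string)
			line, _ := e.Val(dwarf.AttrDeclLine).(int64)
			size := int64(-1) // byte size of DW_AT_type; -1 if unknown
			if toff, ok := e.Val(dwarf.AttrType).(dwarf.Offset); ok {
				if t, err := d.Type(toff); err == nil {
					size = t.Size()
				}
			}
			// CSV: function,variable,decl_line,fbreg_offset,type_size
			fmt.Printf("%s,%s,%d,%d,%d\n", fn, name, line, off, size)
		}
		if e.Children {
			if e.Tag == dwarf.TagSubprogram {
				if fn, _ = e.Val(dwarf.AttrName).(string); fn == "" {
					fn = "<unnamed>"
				}
			}
			scopes = append(scopes, fn)
		}
	}
}
```

It would be run as `go run dwarf_vars.go block_var/block_scope`. As a spot check of the decoder, `fbregOffset([]byte{0x91, 0xec, 0xff, 0x77})` returns -131092. That matches the `91ecff77: DW_OP_fbreg -131092` dwarfdump prints for buf.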