You're a helpful assistant and your goal is to come up with high-level user queries in English that a radare2 user would use radare2 for, and the commands that would satisfy those queries. I'll provide you with some documentation and you need to respond in a certain way explained later.
------
<documentation>
Command Syntax
Command Format:
['][.][N][cmd[,?*j]][~filter][@[@[@]]addr!size][|>pipe] ; another command
Components Explained:
' (quote) - Ignore special characters (like in "?e hi > ho")
. (dot) - Interpret output of command or run script '.?'
N - Repeat prefix operator, runs command N times
cmd - The actual command to run
Output Modifiers:
, - CSV format
j - JSON format
* - r2cmds format
~filter - Output filter modifier
@ - Temporal seek
@@ - Foreach operator
@@@ - Advanced foreach operator (with addr+size on items)
addr!size - Address and size specification
| - Pipe to system shell
> - Redirect to file
; - Command separator
When you see text like this:
[0x00000000]>
this is radare2's shell-like prompt. User enters commands after that.
People who use Vim daily and are familiar with its commands will find themselves at home. You will see this format used throughout the book. Commands are identified by a single case-sensitive character [a-zA-Z].
As an exercise, read the following lines and work out what each piece of syntax does:
ds ; call the debugger's 'step' command
px 200 @ esp ; show 200 hex bytes at esp
pc > file.c ; dump buffer as a C byte array to file.c
wx 90 @@ sym.* ; write a nop on every symbol
pd 2000 | grep eax ; grep opcodes that use the 'eax' register
px 20 ; pd 3 ; px 40 ; multiple commands in a single line
Repetitions
To repeatedly execute a command, prefix the command with a number:
px # run px
3px # run px 3 times
A useful way to use this is to draw the classic donut animation with 100?3d, or to perform a specific number of steps when debugging, as in 10ds (which does the same as ds 10).
Shell Execution
The ! prefix is used to execute a command in shell context. If you want to use the cmd callback from the I/O plugin you must prefix with :.
Note that a single exclamation mark will run the command and print the output through the RCons API. This means that the execution will be blocking and not interactive. Use double exclamation marks -- !! -- to run a standard system call.
All the socket, filesystem and execution APIs can be restricted with the cfg.sandbox configuration variable.
Environment
When executing system commands from radare2, we will get some special environment variables that can be used to script radare2 from shellscripts without the need to depend on r2pipe.
The environment variables can be listed and modified with the % command.
Note that the environment variables will be different depending on how we execute code with radare2:
runtime environment (R2CORE tells where the instance is in memory)
debugger environment (as clean as described in a rarun2 profile)
spawning processes with ! (get some context details, like offset, file, ..)
r2pipe environment (R2PIPE_IN and R2PIPE_OUT with the pipe descriptors)
[0x00000000]> !export | grep R2_
export R2_ARCH="arm"
export R2_BITS="64"
export R2_BSIZE="256"
export R2_COLOR="0"
export R2_DEBUG="0"
export R2_ENDIAN="little"
export R2_FILE="malloc://512"
export R2_IOVA="1"
export R2_OFFSET="0"
export R2_SIZE="512"
export R2_UTF8="1"
export R2_XOFFSET="0x00000000"
[0x00000000]>
We can also find the location in memory of the RCore instance in the current process. This can be useful when injecting code inside radare2 (for example when injecting r2 via r2frida, or when using native API calls on live runtimes without having to pass pointers or depend on RLang setups). You can learn more about this in the scripting chapter.
[0x00000000]> %~R2
R2CORE=0x140018000
[0x00000000]>
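A script spawned with ! could read these variables directly instead of going through r2pipe. The following is a minimal sketch, assuming the R2_ variable names shown in the listing above; the helper name r2_context is hypothetical and not part of radare2.

```python
import os

def r2_context(environ=None):
    """Collect R2_* variables into a dict, converting numeric values to int.

    r2_context is a hypothetical helper name; only the R2_ prefix and the
    sample values come from the listing above.
    """
    env = os.environ if environ is None else environ
    ctx = {}
    for key, value in env.items():
        if not key.startswith("R2_"):
            continue
        name = key[3:].lower()
        try:
            # base 0 accepts both decimal ("256") and hex ("0x00000000")
            ctx[name] = int(value, 0)
        except ValueError:
            # non-numeric values such as "arm" stay as strings
            ctx[name] = value
    return ctx

# Sample environment copied from the listing above, plus one unrelated variable.
sample_env = {
    "R2_ARCH": "arm",
    "R2_BITS": "64",
    "R2_FILE": "malloc://512",
    "R2_XOFFSET": "0x00000000",
    "HOME": "/root",
}
print(r2_context(sample_env))
```

When run from radare2 with !, calling r2_context() without arguments would read the real process environment instead of the sample mapping.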
Pipes
The standard UNIX pipe | is also available in the radare2 shell. You can use it to filter the output of an r2 command with any shell program that reads from stdin, such as grep, less, wc. If you do not want to spawn anything, or you can't, or the target system does not have the basic UNIX tools you need (Windows or embedded users), you can also use the built-in grep (~).
Filtering
The ~ is a special character used by the console filtering features. It can be chained multiple times to apply several filters, such as grepping, XML or JSON indentation, head/tail operations, column selection, etc.
You may find that ~ is very similar to using the UNIX | pipe, but this filter is built into radare2, so it works without spawning any external program.
As you may expect appending a question mark will display the help message.
[0x00000000]> ~?
Usage: [command]~[modifier][word,word][endmodifier][[column]][:line]
modifier:
| & all words must match to grep the line
| $[n] sort numerically / alphabetically the Nth column
| $ sort in alphabetic order
| $$ sort + uniq
| $! inverse alphabetical sort
| $!! reverse the lines (like the `tac` tool)
| , token to define another keyword
| + case insensitive grep (grep -i)
| * zoom level
| ^ words must be placed at the beginning of line
| ! negate grep
| ? count number of matching lines
| ?. count number chars
| ?? show this help message
| ?ea convert text into seven segment style ascii art
| :s..e show lines s-e
| .. internal 'less'
| ... internal 'hud' (like V_)
| .... internal 'hud' in one line
| :) parse C-like output from decompiler
| :)) code syntax highlight
| <50 perform zoom to the given text width on the buffer
| <> xml indentation
| {: human friendly indentation (yes, it's a smiley)
| {:.. less the output of {:
| {:... hud the output of {:
| {} json indentation
| {}.. less json indentation
| {}... hud json indentation
| {=} gron-like output (key=value)
| {path} json path grep
endmodifier:
| $ words must be placed at the end of line
column:
| [n] show only column n
| [n-m] show column n to m
| [n-] show all columns starting from column n
| [i,j,k] show the columns i, j and k
Examples:
| i~:0 show first line of 'i' output
| i~:-2 show the second to last line of 'i' output
| i~:0..3 show first three lines of 'i' output
| pd~mov disasm and grep for mov
| pi~[0] show only opcode
| i~0x400$ show lines ending with 0x400
The ~ character enables an internal grep-like function that can filter the output of any command:
pd 20~call ; disassemble 20 instructions and grep output for 'call'
Additionally, you can grep either for columns or for rows:
pd 20~call:0 ; get first row
pd 20~call:1 ; get second row
pd 20~call[0] ; get first column
pd 20~call[1] ; get second column
Or even combine them:
pd 20~call:0[0] ; grep the first column of the first row matching 'call'
This internal grep function is a key feature for scripting radare2, because it can be used to iterate over a list of offsets or data generated by disassembler, ranges, or any other command. Refer to the loops section (iterators) for more information.
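To make the row and column selection concrete, here is a rough Python approximation of what ~word:row[col] does to a command's output. It is only a sketch: r2_grep is a hypothetical name, columns are assumed to be whitespace-separated, only non-negative rows are handled, and the sample text is a simplified disassembly rather than real pd output.

```python
def r2_grep(text, word="", row=None, col=None):
    """Approximate `cmd~word:row[col]` on plain text (hypothetical helper)."""
    # ~word : keep only the lines containing the word
    lines = [line for line in text.splitlines() if word in line]
    # :row  : keep a single matching row (non-negative indexes only)
    if row is not None:
        lines = lines[row:row + 1]
    # [col] : keep a single whitespace-separated column
    if col is not None:
        picked = []
        for line in lines:
            fields = line.split()
            if col < len(fields):
                picked.append(fields[col])
        lines = picked
    return lines

# Simplified sample output (address, mnemonic, operands).
sample = """\
0x08048685 call sym..imp.ptrace
0x0804868a test eax, eax
0x08048695 call sym..imp.puts
0x0804869a call sym..imp.abort
"""

print(r2_grep(sample, "call"))                # like ~call
print(r2_grep(sample, "call", row=0))         # like ~call:0
print(r2_grep(sample, "call", col=0))         # like ~call[0]
print(r2_grep(sample, "call", row=0, col=0))  # like ~call:0[0]
```

The last call returns a one-element list holding the address of the first call, which is the kind of value a script would feed back into another command.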
Output Evaluation
The . character at the beginning of a command interprets (evaluates) the output of that command as radare2 commands.
This is what makes the * suffix of r2 commands and the -r flag of the r2 shell tools useful: both print radare2 commands, which . can then execute.
For example, we can load the symbols from a binary on disk by running the following line:
> .!rabin2 -rs $R2_FILE
Temporal Seek
The @ character is used to specify a temporary offset at which the command to its left will be executed. The original seek position in a file is then restored.
For example, pd 5 @ 0x100000fce disassembles 5 instructions at address 0x100000fce.
Most of the commands offer autocompletion support using <TAB> key, for example seek or flags commands.
It offers autocompletion using all possible values, taking flag names in this case.
The command history can be interactively inspected with !~....
To extend autocompletion to more commands, or to enable it for your own commands defined in core or I/O plugins, use the !!! command.
Expressions
Expressions are mathematical representations of 64-bit numerical values. They are handled anywhere the RNum API is used: the API takes a string that can contain multiple math operations in different numeric bases and computes the resulting value.
They can be displayed in different formats, be compared or used with all commands accepting numeric arguments. Expressions can use traditional arithmetic operations, as well as binary and boolean ones.
To evaluate mathematical expressions prepend them with command ?:
[0xb7f9d810]> ?vi 0x8048000
134512640
[0xb7f9d810]> ?vi 0x8048000+34
134512674
[0xb7f9d810]> ?vi 0x8048000+0x34
134512692
[0xb7f9d810]> ? 1+2+3-4*3
hex 0xfffffffffffffffa
octal 01777777777777777777772
unit 17179869184.0G
segment fffff000:0ffa
int64 -6
string "\xfa\xff\xff\xff\xff\xff\xff\xff"
binary 0b1111111111111111111111111111111111111111111111111111111111111010
fvalue: -6.0
float: nanf
double: nan
trits 0t11112220022122120101211020120210210211201
Supported arithmetic operations are:
+ addition
- subtraction
* multiplication
/ division
% modulus
& binary and
| binary or
^ binary xor
>> shift right
<< shift left
For example, the ?vi command prints the integer (base 10) value that results from evaluating the given math expression:
[0x00000000]> ?vi 1+2+3
6
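The values printed above can be cross-checked outside radare2. This sketch uses plain Python integers and a 64-bit mask to reproduce the ?vi results and the hex/int64 pair shown for 1+2+3-4*3; the helper names as_u64 and as_i64 are assumptions made for this illustration, not radare2 APIs.

```python
MASK64 = (1 << 64) - 1

def as_u64(value):
    """Wrap a Python integer into an unsigned 64-bit value."""
    return value & MASK64

def as_i64(value):
    """Interpret the low 64 bits of a value as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

# Mixed bases: 34 is decimal, 0x34 is hexadecimal (52).
print(0x8048000 + 34)
print(0x8048000 + 0x34)

# A negative result is shown as a wrapped 64-bit value in the hex field.
result = 1 + 2 + 3 - 4 * 3
print(hex(as_u64(result)), as_i64(as_u64(result)))
```

The first two lines print 134512674 and 134512692, matching the ?vi session, and the last prints 0xfffffffffffffffa together with -6.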
To use the binary OR operator, quote the whole command so that | is not interpreted as a pipe:
[0x00000000]> "? 1 | 2"
hex 0x3
octal 03
unit 3
segment 0000:0003
int32 3
string "\x03"
binary 0b00000011
fvalue: 2.0
float: 0.000000f
double: 0.000000
trits 0t10
Note that on modern r2 versions you can use a single quote at the beginning of the command to avoid evaluating the rest of the line:
'? 1 | 2 is equivalent to "? 1 | 2"
Numbers can be written in several formats:
0x033 : hexadecimal
3334 : decimal
sym.fo : resolve flag offset
10K : KBytes 10*1024
10M : MBytes 10*1024*1024
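As an illustration of the literal formats listed above, the sketch below converts hexadecimal, decimal and K/M-suffixed strings into integers. The function parse_r2_number is a hypothetical name; it covers only the literals from the list and does not resolve flags such as sym.fo, which need a running radare2 session.

```python
def parse_r2_number(text):
    """Convert a numeric literal such as 0x033, 3334, 10K or 10M to an int.

    Hypothetical helper: flag names (e.g. sym.fo) are not supported here.
    """
    suffixes = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in suffixes:
        # 10K means 10*1024, 10M means 10*1024*1024
        return int(text[:-1], 0) * suffixes[text[-1]]
    # base 0 lets int() pick hexadecimal for the 0x prefix, decimal otherwise
    return int(text, 0)

for literal in ("0x033", "3334", "10K", "10M"):
    print(literal, "=", parse_r2_number(literal))
```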
You can also use variables and seek positions to build complex expressions.
Use the ?$? command to list all the available variables, or read the refcard chapter of this book.
$$ here (the current virtual seek)
$l opcode length
$s file size
$j jump address (e.g. jmp 0x10, jz 0x10 => 0x10)
$f jump fail address (e.g. jz 0x10 => next instruction)
$m opcode memory reference (e.g. mov eax,[0x10] => 0x10)
$b block size
Some more examples:
[0x4A13B8C0]> ? $m + $l
140293837812900 0x7f98b45df4a4 03771426427372244 130658.0G 8b45d000:04a4 140293837812900 10100100 140293837812900.0 -0.000000
Disassembling the instruction right after the current one:
[0x4A13B8C0]> pd 1 @ +$l
0x4A13B8C2 call 0x4a13c000
Code Analysis
Code analysis is the process of finding patterns, combining information from different sources and processing the disassembly of the program in multiple ways in order to understand and extract more details of the logic behind the code.
Radare2 implements many different code analysis techniques under different commands and configuration options. It is important to understand what they do and how they affect the final results before reaching for the default aaaaa approach, because in some cases that can be too slow or produce false positives.
All of r2's functionality is available through the API as well as through commands. This gives you the ability to implement your own analysis loops using any programming language, even with r2 oneliners, shell scripts, or native analysis or core plugins.
The analysis exposes the internal data structures used to identify basic blocks and function trees and to extract opcode-level information.
The most common radare2 analysis command is aa, which stands for "analyze all". "All" here refers to all symbols and entry points. If your binary is stripped you will need other commands such as aaa, aab, aar or aac.
Take some time to understand what each command does and the results after running them to find the best one for your needs.
[0x08048440]> aa
[0x08048440]> pdf @ main
; DATA XREF from 0x08048457 (entry0)
/ (fcn) fcn.08048648 141
| ;-- main:
| 0x08048648 8d4c2404 lea ecx, [esp+0x4]
| 0x0804864c 83e4f0 and esp, 0xfffffff0
| 0x0804864f ff71fc push dword [ecx-0x4]
| 0x08048652 55 push ebp
| ; CODE (CALL) XREF from 0x08048734 (fcn.080486e5)
| 0x08048653 89e5 mov ebp, esp
| 0x08048655 83ec28 sub esp, 0x28
| 0x08048658 894df4 mov [ebp-0xc], ecx
| 0x0804865b 895df8 mov [ebp-0x8], ebx
| 0x0804865e 8975fc mov [ebp-0x4], esi
| 0x08048661 8b19 mov ebx, [ecx]
| 0x08048663 8b7104 mov esi, [ecx+0x4]
| 0x08048666 c744240c000. mov dword [esp+0xc], 0x0
| 0x0804866e c7442408010. mov dword [esp+0x8], 0x1 ; 0x00000001
| 0x08048676 c7442404000. mov dword [esp+0x4], 0x0
| 0x0804867e c7042400000. mov dword [esp], 0x0
| 0x08048685 e852fdffff call sym..imp.ptrace
| sym..imp.ptrace(unk, unk)
| 0x0804868a 85c0 test eax, eax
| ,=< 0x0804868c 7911 jns 0x804869f
| | 0x0804868e c70424cf870. mov dword [esp], str.Don_tuseadebuguer_ ; 0x080487cf
| | 0x08048695 e882fdffff call sym..imp.puts
| | sym..imp.puts()
| | 0x0804869a e80dfdffff call sym..imp.abort
| | sym..imp.abort()
| `-> 0x0804869f 83fb02 cmp ebx, 0x2
|,==< 0x080486a2 7411 je 0x80486b5
|| 0x080486a4 c704240c880. mov dword [esp], str.Youmustgiveapasswordforusethisprogram_ ; 0x0804880c
|| 0x080486ab e86cfdffff call sym..imp.puts
|| sym..imp.puts()
|| 0x080486b0 e8f7fcffff call sym..imp.abort
|| sym..imp.abort()
|`--> 0x080486b5 8b4604 mov eax, [esi+0x4]
| 0x080486b8 890424 mov [esp], eax
| 0x080486bb e8e5feffff call fcn.080485a5
| fcn.080485a5() ; fcn.080484c6+223
| 0x080486c0 b800000000 mov eax, 0x0
| 0x080486c5 8b4df4 mov ecx, [ebp-0xc]
| 0x080486c8 8b5df8 mov ebx, [ebp-0x8]
| 0x080486cb 8b75fc mov esi, [ebp-0x4]
| 0x080486ce 89ec mov esp, ebp
| 0x080486d0 5d pop ebp
| 0x080486d1 8d61fc lea esp, [ecx-0x4]
\ 0x080486d4 c3 ret
In this example, we analyze the whole file (aa) and then print the disassembly of the main() function (pdf). The aa command belongs to the family of auto-analysis commands and performs only the most basic auto-analysis steps. Radare2 has many auto-analysis commands with different analysis depths, including partial emulation: aa, aaa, aab, aaaa, ... There is also a mapping of those commands to the r2 CLI options: r2 -A, r2 -AA, and so on.
It is well known that completely automated analysis can produce nonsensical results, so radare2 provides separate commands for particular stages of the analysis, allowing fine-grained control of the process. Moreover, there is a treasure trove of configuration variables for controlling the analysis outcomes. You can find them in the anal.* and emu.* configuration namespaces.
Analyze functions
One of the most important "basic" analysis commands is the set of af subcommands. af means "analyze function". With it you can either let radare2 analyze a particular function automatically or perform the analysis completely by hand.
[0x00000000]> af?
Usage: af
| af ([name]) ([addr]) analyze functions (start at addr or $$)
| afr ([name]) ([addr]) analyze functions recursively
| af+ addr name [type] [diff] hand craft a function (requires afb+)
| af- [addr] clean all function analysis data (or function at addr)
| afa analyze function arguments in a call (afal honors dbg.funcarg)
| afb+ fcnA bbA sz [j] [f] ([t]( [d])) add bb to function @ fcnaddr
| afb[?] [addr] List basic blocks of given function
| afbF([0|1]) Toggle the basic-block 'folded' attribute
| afB 16 set current function as thumb (change asm.bits)
| afC[lc] ([addr])@[addr] calculate the Cycles (afC) or Cyclomatic Complexity (afCc)
| afc[?] type @[addr] set calling convention for function
| afd[addr] show function + delta for given offset
| afF[1|0|] fold/unfold/toggle
| afi [addr|fcn.name] show function(s) information (verbose afl)
| afj [tableaddr] [count] analyze function jumptable
| afl[?] [ls*] [fcn name] list functions (addr, size, bbs, name) (see afll)
| afm name merge two functions
| afM name print functions map
| afn[?] name [addr] rename name for function at address (change flag too)
| afna suggest automatic name for current offset
| afo[?j] [fcn.name] show address for the function name or current offset
| afs[!] ([fcnsign]) get/set function signature at current address (afs! uses cfg.editor)
| afS[stack_size] set stack frame size for function at current address
| afsr [function_name] [new_type] change type for given function
| aft[?] type matching, type propagation
| afu addr resize and analyze function from current address until addr
| afv[absrx]? manipulate args, registers and variables in function
| afx list function references
You can use afl to list the functions found by the analysis.
There are a lot of useful commands under afl, such as aflj, which lists the functions in JSON format, and aflm, which lists the functions in the syntax found in makefiles.
There's also afl=, which displays ASCII-art bars with function ranges.
You can find the rest of them under afl?.
Some of the most challenging tasks while performing a function analysis are merge, crop and resize. As with other analysis commands you have two modes: semi-automatic and manual. In semi-automatic mode, you can use afm <function name> to merge the current function with the one named in the argument, aff to readjust the function after analysis changes or function edits, and afu <address> to resize and analyze the current function up to the specified address.
Apart from those semi-automatic ways to edit/analyze the function, you can hand craft it in manual mode with the af+ command and edit its basic blocks using the afb commands. Before changing the basic blocks of a function it is recommended to check the existing ones:
[0x00003ac0]> afb
0x00003ac0 0x00003b7f 01:001A 191 f 0x00003b7f
0x00003b7f 0x00003b84 00:0000 5 j 0x00003b92 f 0x00003b84
0x00003b84 0x00003b8d 00:0000 9 f 0x00003b8d
0x00003b8d 0x00003b92 00:0000 5
0x00003b92 0x00003ba8 01:0030 22 j 0x00003ba8
0x00003ba8 0x00003bf9 00:0000 81
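When scripting, a listing like the one above can be turned into structured data. The sketch below parses it under the assumption, inferred only from this sample, that the columns are start address, end address, an unexplained third field, the size in bytes, and optional "j <addr>" (jump) and "f <addr>" (fail) pairs; parse_afb is a hypothetical helper name.

```python
def parse_afb(text):
    """Parse afb-style lines into dicts (column meaning inferred from the sample)."""
    blocks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        block = {
            "start": int(fields[0], 16),
            "end": int(fields[1], 16),
            "size": int(fields[3]),  # fields[2] is left uninterpreted
            "jump": None,
            "fail": None,
        }
        # remaining fields come in "j <addr>" / "f <addr>" pairs
        rest = fields[4:]
        for flag, addr in zip(rest[0::2], rest[1::2]):
            if flag == "j":
                block["jump"] = int(addr, 16)
            elif flag == "f":
                block["fail"] = int(addr, 16)
        blocks.append(block)
    return blocks

AFB_SAMPLE = """\
0x00003ac0 0x00003b7f 01:001A 191 f 0x00003b7f
0x00003b7f 0x00003b84 00:0000 5 j 0x00003b92 f 0x00003b84
0x00003b84 0x00003b8d 00:0000 9 f 0x00003b8d
0x00003b8d 0x00003b92 00:0000 5
0x00003b92 0x00003ba8 01:0030 22 j 0x00003ba8
0x00003ba8 0x00003bf9 00:0000 81
"""

for b in parse_afb(AFB_SAMPLE):
    # in this sample the size column equals end - start for every block
    print(hex(b["start"]), b["size"], b["end"] - b["start"] == b["size"])
```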
Hand craft function
Before we start, let's prepare a binary file first. Write in example.c:
int code_block()
{
    int result = 0;
    for(int i = 0; i < 10; ++i)
        result += 1;
    return result;
}
then compile with gcc -c example.c -m32 -O0 -fno-pie, and open the object file example.o with radare2.
Since we haven't analyzed it yet, the pdf command will not print out the disassembly here:
$ r2 example.o
[0x08000034]> pdf
p: Cannot find function at 0x08000034
[0x08000034]> pd
;-- section..text:
;-- .text:
;-- code_block:
;-- eip:
0x08000034 55 push ebp ; [01] -r-x section size 41 named .text
0x08000035 89e5 mov ebp, esp
0x08000037 83ec10 sub esp, 0x10
0x0800003a c745f8000000. mov dword [ebp - 8], 0
0x08000041 c745fc000000. mov dword [ebp - 4], 0
,=< 0x08000048 eb08 jmp 0x8000052
.--> 0x0800004a 8345f801 add dword [ebp - 8], 1
:| 0x0800004e 8345fc01 add dword [ebp - 4], 1
:`-> 0x08000052 837dfc09 cmp dword [ebp - 4], 9
`==< 0x08000056 7ef2 jle 0x800004a
0x08000058 8b45f8 mov eax, dword [ebp - 8]
0x0800005b c9 leave
0x0800005c c3 ret
Our goal is to handcraft a function with the following structure:
[figure: analyze_one]
Create a function at 0x8000034 named code_block:
[0x8000034]> af+ 0x8000034 code_block
In most cases, we use jump or call instructions as code block boundaries, so the first block ranges from 0x08000034 push ebp to 0x08000048 jmp 0x8000052. Use the afb+ command to add it.
[0x08000034]> afb+ code_block 0x8000034 0x800004a-0x8000034 0x8000052
Note that the basic syntax of afb+ is afb+ function_address block_address block_size [jump] [fail]. The final instruction of this block points to a new address (jmp 0x8000052), so we add the address of the jump target (0x8000052) to reflect the jump info.
The next block (0x08000052 ~ 0x08000056) is most likely an if-style conditional with two branches. It jumps to 0x800004a if the jle (less or equal) condition holds; otherwise (the fail condition) execution continues at the next instruction, 0x08000058:
[0x08000034]> afb+ code_block 0x8000052 0x8000058-0x8000052 0x800004a 0x8000058
Follow the control flow and create the remaining two blocks (two branches):
[0x08000034]> afb+ code_block 0x800004a 0x8000052-0x800004a 0x8000052
[0x08000034]> afb+ code_block 0x8000058 0x800005d-0x8000058
Check our work:
[0x08000034]> afb
0x08000034 0x0800004a 00:0000 22 j 0x08000052
0x0800004a 0x08000052 00:0000 8 j 0x08000052
0x08000052 0x08000058 00:0000 6 j 0x0800004a f 0x08000058
0x08000058 0x0800005d 00:0000 5
[0x08000034]> VV
[figure: handcraft_one]
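The four afb+ commands above follow a regular pattern, so they can be generated from a table of block boundaries. This is a sketch under stated assumptions: afb_plus is a hypothetical helper, and it writes the block size as a hex literal instead of the end-start expression used in the session (both evaluate to the same number).

```python
def afb_plus(fcn, start, end, jump=None, fail=None):
    """Build an 'afb+ fcn start size [jump] [fail]' command string."""
    if fail is not None and jump is None:
        # the arguments are positional, so fail cannot be given without jump
        raise ValueError("fail address requires a jump address")
    parts = ["afb+", fcn, hex(start), hex(end - start)]
    if jump is not None:
        parts.append(hex(jump))
    if fail is not None:
        parts.append(hex(fail))
    return " ".join(parts)

# (start, end, jump, fail) for the four blocks of code_block
BLOCKS = [
    (0x8000034, 0x800004a, 0x8000052, None),
    (0x8000052, 0x8000058, 0x800004a, 0x8000058),
    (0x800004a, 0x8000052, 0x8000052, None),
    (0x8000058, 0x800005d, None, None),
]

for start, end, jump, fail in BLOCKS:
    print(afb_plus("code_block", start, end, jump, fail))
```

The computed sizes are 22, 6, 8 and 5 bytes, the same values that afb reports in its fourth column for these blocks.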
There are two very important commands for this: afc and afB. The latter is a must-know command for some platforms like ARM. It provides a way to change the "bitness" of a particular function, which basically lets you select between ARM and Thumb modes.
afc, on the other hand, lets you manually specify the function's calling convention. You can find more information on its usage in calling_conventions.
Recursive analysis
There are 5 important program-wide semi-automated analysis commands:
aab - perform basic-block analysis ("Nucleus" algorithm)
aac - analyze function calls from one (selected or current function)
aaf - analyze all function calls
aar - analyze data references
aad - analyze pointers to pointers references
Those are only generic semi-automated reference-searching algorithms. Radare2 also lets you create references of any kind manually. For this fine-grained control you can use the ax commands.
Usage: ax[?d-l*] # see also 'afx?'
| ax list refs
| ax* output radare commands
| ax addr [at] add code ref pointing to addr (from curseek)
| ax- [at] clean all refs/refs from addr
| ax-* clean all refs/refs
| axc addr [at] add generic code ref
| axC addr [at] add code call ref
| axg [addr] show xrefs graph to reach current function
| axg* [addr] show xrefs graph to given address, use .axg*;aggv
| axgj [addr] show xrefs graph to reach current function in json format
| axd addr [at] add data ref
| axq list refs in quiet/human-readable format
| axj list refs in json format
| axF [flg-glob] find data/code references of flags
| axm addr [at] copy data/code references pointing to addr to also point to curseek (or at)
| axt [addr] find data/code references to this address
| axf [addr] find data/code references from this address
| axv [addr] list local variables read-write-exec references
| ax. [addr] find data/code references from and to this address
| axff[j] [addr] find data/code references from this function
| axs addr [at] add string ref
The most commonly used ax commands are axt and axf, especially as part of various r2pipe scripts. Let's say we see a string in a data or code section and want to find all the places it is referenced from; for that we use axt:
[0x0001783a]> pd 2
;-- str.02x:
; STRING XREF from 0x00005de0 (sub.strlen_d50)
; CODE XREF from 0x00017838 (str.._s_s_s + 7)
0x0001783a .string "%%%02x" ; len=7
;-- str.src_ls.c:
; STRING XREF from 0x0000541b (sub.free_b04)
; STRING XREF from 0x0000543a (sub.__assert_fail_41f + 27)
; STRING XREF from 0x00005459 (sub.__assert_fail_41f + 58)
; STRING XREF from 0x00005f9e (sub._setjmp_e30)
; CODE XREF from 0x0001783f (str.02x + 5)
0x00017841 .string "src/ls.c" ; len=9
[0x0001783a]> axt
sub.strlen_d50 0x5de0 [STRING] lea rcx, str.02x
(nofunc) 0x17838 [CODE] jae str.02x
There are also some useful commands under axt. Use axtg to generate radare2 commands which will help you to create graphs according to the XREFs.
[0x08048320]> s main
[0x080483e0]> axtg
agn 0x8048337 "entry0 + 23"
agn 0x80483e0 "main"
age 0x8048337 0x80483e0
Use axt* to split the radare2 commands and set flags on those corresponding XREFs.
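To show how the agn/age lines in the axtg output relate to the cross-references, here is a small sketch that builds the same kind of commands from a list of xrefs. The function xref_graph_cmds is hypothetical, and the ordering used for more than one xref (all source nodes, then the target node, then the edges) is an assumption; only the single-xref case is taken from the output above.

```python
def xref_graph_cmds(target_addr, target_name, xrefs):
    """Emit agn (node) and age (edge) commands for xrefs pointing to a target.

    xrefs is a list of (source_address, source_label) tuples.
    """
    cmds = []
    for addr, label in xrefs:
        cmds.append(f'agn {addr:#x} "{label}"')   # one node per xref source
    cmds.append(f'agn {target_addr:#x} "{target_name}"')  # the target node
    for addr, _ in xrefs:
        cmds.append(f"age {addr:#x} {target_addr:#x}")    # edge source -> target
    return cmds

# The single xref to main from the session above.
for cmd in xref_graph_cmds(0x80483e0, "main", [(0x8048337, "entry0 + 23")]):
    print(cmd)
```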
Also under ax is axg, which finds the path between two points in the file by showing an XREFs graph to reach the location or function. For example:
:> axg sym.imp.printf
- 0x08048a5c fcn 0x08048a5c sym.imp.printf
- 0x080483e5 fcn 0x080483e0 main
- 0x080483e0 fcn 0x080483e0 main
- 0x08048337 fcn 0x08048320 entry0
- 0x08048425 fcn 0x080483e0 main
Use axg* to generate radare2 commands which will help you to create graphs using agn and age commands, according to the XREFs.
Apart from the predefined algorithms to identify functions, there is a way to specify a function prelude with the anal.prelude configuration option. For example, e anal.prelude = 0x554889e5, which means
push rbp
mov rbp, rsp
on the x86_64 platform. It should be set before running any analysis commands.
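The idea behind a prelude is simply a byte pattern that marks likely function starts. The sketch below searches a byte buffer for the 55 48 89 e5 pattern from the example; it only illustrates the concept and is not how radare2 implements anal.prelude. The name find_preludes and the sample buffer are made up for this illustration.

```python
def find_preludes(data, prelude_hex="554889e5"):
    """Return every offset in data where the prelude byte pattern starts."""
    prelude = bytes.fromhex(prelude_hex)
    hits = []
    pos = data.find(prelude)
    while pos != -1:
        hits.append(pos)
        pos = data.find(prelude, pos + 1)
    return hits

# Made-up buffer: two nops, a prelude, a ret, and a second prelude.
PRELUDE = bytes.fromhex("554889e5")  # push rbp ; mov rbp, rsp
sample_code = b"\x90\x90" + PRELUDE + b"\xc3" + PRELUDE

print([hex(offset) for offset in find_preludes(sample_code)])
```

Each reported offset is a candidate function start; a real analysis would still need to confirm that the bytes at that offset are reachable code.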
Configuration
Radare2 lets you change the behavior of almost any analysis stage or command. There are different kinds of configuration options:
Flow control
Basic blocks control
References control
IO/Ranges
Jump tables analysis control
Platform/target specific options
Control flow configuration
The two most commonly used options for changing the behavior of control flow analysis in radare2 are anal.hasnext and anal.jmp.after. The first forces radare2 to continue the analysis after the end of a function, even if the next chunk of code is not called from anywhere, so that all available functions get analyzed. The second forces radare2 to continue the analysis even after unconditional jumps.
In addition to those we can also set anal.jmp.indir to follow the indirect jumps, continuing analysis; anal.pushret to analyze push ...; ret sequence as a jump; anal.nopskip to skip the NOP sequences at a function beginning.
For now, radare2 also allows you to change the maximum basic block size with the anal.bb.maxsize option. The default value works in most use cases, but increasing it is useful, for example, when dealing with obfuscated code. Beware that some of the basic-block control options may disappear in the future in favor of more automated ways to set them.
For some unusual binaries or targets there is the anal.in option (see anal.in=? for its values). By default only executable regions are analyzed, but you can force a different section or specify different boundaries. Radare2 doesn't try to analyze data sections as code by default, yet with malware, packed binaries or binaries for embedded systems, code often lives in such sections, which is what this option is for.
Reference control
These are the most crucial options; they change the analysis results drastically. Some of them can be disabled to save time and memory when analyzing big binaries.
anal.jmp.ref - to allow references creation for unconditional jumps
anal.jmp.cref - same, but for conditional jumps
anal.datarefs - to follow the data references in code
anal.refstr - search for strings in data references
anal.strings - search for strings and creating references
Note that strings references control is disabled by default because it increases the analysis time.
Analysis ranges
There are a few options for this:
anal.limits - enables the range limits for analysis operations
anal.from - starting address of the limit range
anal.to - the corresponding end of the limit range
anal.in - specify search boundaries for analysis. You can set it to io.maps, io.sections.exec, dbg.maps and many more. For example:
To analyze a specific memory map with anal.from and anal.to, set anal.in = dbg.maps.
To analyze in the boundaries set by anal.from and anal.to, set anal.in=range.
To analyze in the current mapped segment or section, you can put anal.in=bin.segment or anal.in=bin.section, respectively.
To analyze in the current memory map, specify anal.in=dbg.map.
To analyze in the stack or heap, you can set anal.in=dbg.stack or anal.in=dbg.heap.
To analyze in the current function or basic block, you can specify anal.in=anal.fcn or anal.in=anal.bb.
Please see e anal.in=?? for the complete list.
Jump tables
Jump tables are one of the trickiest targets in binary reverse engineering. There are hundreds of different types, and the end result depends on the compiler/linker and on LTO optimization stages. Thus radare2 allows enabling some experimental jump table detection algorithms with the anal.jmp.tbl option. Eventually, algorithms are moved into the default analysis loops once they work on every supported platform/target/testcase. Two more options can affect the jump table analysis results too:
anal.jmp.indir - follow the indirect jumps, some jump tables rely on them
anal.datarefs - follow the data references, some jump tables use those
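To make the problem concrete, here is a minimal sketch of one common jump table layout: a table of signed 32-bit offsets relative to the table's own address, as is often emitted for position-independent switch statements. The addresses and table contents are made up for illustration, and this is not radare2's detection algorithm; it only shows what the analysis has to recover.

```python
import struct

def decode_relative_jump_table(table_bytes, table_addr, entries):
    """Decode `entries` little-endian signed 32-bit offsets relative to table_addr."""
    offsets = struct.unpack_from("<%di" % entries, table_bytes)
    # Each case target is the table address plus the stored (signed) offset.
    return [(table_addr + off) & 0xFFFFFFFFFFFFFFFF for off in offsets]

# Hypothetical table at 0x2000 with three cases pointing back into code.
TABLE_ADDR = 0x2000
table = struct.pack("<3i", -0x1000, -0x0FF0, -0x0FD8)

targets = decode_relative_jump_table(table, TABLE_ADDR, 3)
print([hex(t) for t in targets])
```

Note that the number of entries is passed in explicitly: in this layout the table does not store its own size, which is one reason automatic detection is hard.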
Platform specific controls
There are two common problems when analyzing embedded targets: ARM/Thumb detection and the MIPS GP value. For ARM binaries radare2 supports some auto-detection of ARM/Thumb mode switches, but beware that it uses partial ESIL emulation, which slows down the analysis. If you do not like the results, the mode of particular functions can be overridden with the afB command.
The MIPS GP problem is even trickier. The GP value can differ not only from one program to another, but also between functions of the same program. To partially solve that there are the anal.gp and anal.gpfixed options. The first one sets the GP value for the whole program or a particular function. The latter "constantifies" the GP value: if some code tries to change it, it is always reset. Both are heavily experimental and might change in the future in favor of more automated analysis.
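As a rough illustration of why the GP value matters: on MIPS, globals are commonly addressed as a signed 16-bit offset from the gp register, so the resolved address depends entirely on which GP value the analysis assumes. The GP values and the offset below are hypothetical, and the helper names are not part of radare2.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def resolve_gp_relative(gp, imm16, bits=32):
    """Resolve the address of a gp-relative access such as `lw t9, imm16(gp)`."""
    return (gp + sign_extend16(imm16)) & ((1 << bits) - 1)

# An offset of -0x7fd0 is encoded as 0x8030 in the instruction's immediate.
imm = 0x8030
print(hex(resolve_gp_relative(0x00418A20, imm)))  # with one assumed GP value
print(hex(resolve_gp_relative(0x00428A20, imm)))  # same code, different GP value
```

The same instruction resolves to two different addresses, which is why a wrong GP value yields wrong references.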
Visuals
One of the easiest ways to see and check the effect of analysis commands and variables is to scroll through the special Vv visual mode, which shows a preview of each function.
When we want to check how analysis changes affect the result for big functions, we can use the minimap instead, which fits a bigger flow graph on the same screen size. To get into the minimap mode, type VV and then press p twice.
This mode allows you to see the disassembly of each node separately; just navigate between them using the Tab key.
Analysis hints
It is not uncommon for analysis results to be imperfect even after you have tried every single configuration option. This is where radare2's "analysis hints" mechanism comes in. It lets you override some basic opcode or meta-information properties, or even rewrite the whole opcode string. These commands are located under the ah namespace:
Usage: ah[lba-] Analysis Hints
| ah? show this help
| ah? offset show hint of given offset
| ah list hints in human-readable format
| ah. list hints in human-readable format from current offset
| ah- remove all hints
| ah- offset [size] remove hints at given offset
| ah* offset list hints in radare commands format
| aha ppc @ 0x42 force arch ppc for all addrs >= 0x42 or until the next hint
| aha 0 @ 0x84 disable the effect of arch hints for all addrs >= 0x84 or until the next hint
| ahb 16 @ 0x42 force 16bit for all addrs >= 0x42 or until the next hint
| ahb 0 @ 0x84 disable the effect of bits hints for all addrs >= 0x84 or until the next hint
| ahc 0x804804 override call/jump address
| ahd foo a0,33 replace opcode string
| ahe 3,eax,+= set vm analysis string
| ahf 0x804840 override fallback address for call
| ahF 0x10 set stackframe size at current offset
| ahh 0x804840 highlight this address offset in disasm
| ahi[?] 10 define numeric base for immediates (2, 8, 10, 10u, 16, i, p, S, s)
| ahj list hints in JSON
| aho call change opcode type (see aho?) (deprecated, moved to "ahd")
| ahp addr set pointer hint
| ahr val set hint for return value of a function
| ahs 4 set opcode size=4
| ahS jz set asm.syntax=jz for this opcode
| aht [?] <type> Mark immediate as a type offset (deprecated, moved to "aho")
| ahv val change opcode's val field (useful to set jmptbl sizes in jmp rax)
One of the most common cases is to set a particular numeric base for immediates:
[0x00003d54]> ahi?
Usage: ahi [2|8|10|10u|16|bodhipSs] [@ offset] Define numeric base
| ahi <base> set numeric base (2, 8, 10, 16)
| ahi 10|d set base to signed decimal (10), sign bit should depend on receiver size
| ahi 10u|du set base to unsigned decimal (11)
| ahi b set base to binary (2)
| ahi o set base to octal (8)
| ahi h set base to hexadecimal (16)
| ahi i set base to IP address (32)
| ahi p set base to htons(port) (3)
| ahi S set base to syscall (80)
| ahi s set base to string (1)
[0x00003d54]> pd 2
0x00003d54 0583000000 add eax, 0x83
0x00003d59 3d13010000 cmp eax, 0x113
[0x00003d54]> ahi d
[0x00003d54]> pd 2
0x00003d54 0583000000 add eax, 131
0x00003d59 3d13010000 cmp eax, 0x113
[0x00003d54]> ahi b
[0x00003d54]> pd 2
0x00003d54 0583000000 add eax, 10000011b
0x00003d59 3d13010000 cmp eax, 0x113
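The effect of the base hint in the transcript above can be reproduced with a short sketch. The function name and the set of supported bases are assumptions made for illustration; only the hexadecimal, decimal, and binary renderings shown in the transcript are covered, and the signed interpretation simply follows the "sign bit should depend on receiver size" remark in the help text.

```python
def render_immediate(value, base, bits=32):
    """Render an immediate the way a numeric-base hint would display it."""
    value &= (1 << bits) - 1
    if base == "h":   # hexadecimal
        return hex(value)
    if base == "b":   # binary, with a trailing 'b' as in the disassembly
        return format(value, "b") + "b"
    if base == "du":  # unsigned decimal
        return str(value)
    if base == "d":   # signed decimal: the top bit of the operand is the sign
        signed = value - (1 << bits) if value >> (bits - 1) else value
        return str(signed)
    raise ValueError("unsupported base: %r" % base)

for base in ("h", "d", "b"):
    print(base, render_immediate(0x83, base))
```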
Note that some analysis stages or commands add internal analysis hints, which can be checked with the ah command:
[0x00003d54]> ah
0x00003d54 - 0x00003d54 => immbase=2
[0x00003d54]> ah*
ahi 2 @ 0x3d54
Sometimes we need to override a jump or call address, for example in the case of a tricky relocation that radare2 does not know about, so we change the value manually. The current analysis information about a particular opcode can be checked with the ao command, and the ahc command performs the override:
[0x00003cee]> pd 2
0x00003cee e83d080100 call sub.__errno_location_530
0x00003cf3 85c0 test eax, eax
[0x00003cee]> ao
address: 0x3cee
opcode: call 0x14530
mnemonic: call
prefix: 0
id: 56
bytes: e83d080100
refptr: 0
size: 5
sign: false
type: call
cycles: 3
esil: 83248,rip,8,rsp,-=,rsp,=[],rip,=
jump: 0x00014530
direction: exec
fail: 0x00003cf3
stack: null
family: cpu
stackop: null
[0x00003cee]> ahc 0x5382
[0x00003cee]> pd 2
0x00003cee e83d080100 call sub.__errno_location_530
0x00003cf3 85c0 test eax, eax
[0x00003cee]> ao
address: 0x3cee
opcode: call 0x14530
mnemonic: call
prefix: 0
id: 56
bytes: e83d080100
refptr: 0
size: 5
sign: false
type: call
cycles: 3
esil: 83248,rip,8,rsp,-=,rsp,=[],rip,=
jump: 0x00005382
direction: exec
fail: 0x00003cf3
stack: null
family: cpu
stackop: null
[0x00003cee]> ah
0x00003cee - 0x00003cee => jump: 0x5382
As you can see, the disassembly view is unchanged, but the jump address recorded for the opcode has changed (the jump field in the ao output).
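For reference, the original jump and fail values in the ao output follow directly from the instruction bytes. The sketch below decodes the x86 near call (opcode E8 followed by a 32-bit relative displacement) from the transcript; the function name is made up for illustration. The ahc hint replaces only the analysis result, not these bytes, which is why the disassembly text stays the same.

```python
import struct

def decode_call_rel32(code, addr):
    """Return (jump, fail) for an x86 `call rel32` located at addr."""
    if len(code) != 5 or code[0] != 0xE8:
        raise ValueError("not a 5-byte call rel32 instruction")
    (rel,) = struct.unpack("<i", code[1:5])    # signed little-endian displacement
    fail = addr + len(code)                    # address of the next instruction
    jump = (fail + rel) & 0xFFFFFFFFFFFFFFFF   # displacement is relative to it
    return jump, fail

jump, fail = decode_call_rel32(bytes.fromhex("e83d080100"), 0x3CEE)
print(hex(jump), hex(fail), jump)
```

The decimal value of the computed target is the same constant that appears at the start of the esil string.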
If none of the above helps, you can simply override the displayed disassembly with anything you like:
[0x00003d54]> pd 2
0x00003d54 0583000000 add eax, 10000011b
0x00003d59 3d13010000 cmp eax, 0x113
[0x00003d54]> "ahd myopcode bla, foo"
[0x00003d54]> pd 2
0x00003d54 myopcode bla, foo
0x00003d55 830000 add dword [rax], 0
Managing variables
Radare2 allows managing local variables regardless of their location, stack or registers. Automatic variable analysis is enabled by default but can be disabled with the anal.vars configuration option.
The main variable commands are located in the afv namespace:
Usage: afv [rbs]
| afv* output r2 command to add args/locals to flagspace
| afv-([name]) remove all or given var
| afv= list function variables and arguments with disasm refs
| afva analyze function arguments/locals
| afvb[?] manipulate bp based arguments/locals
| afvd name output r2 command for displaying the value of args/locals in the debugger
| afvf show BP relative stackframe variables
| afvn [new_name] ([old_name]) rename argument/local
| afvr[?] manipulate register based arguments
| afvR [varname] list addresses where vars are accessed (READ)
| afvs[?] manipulate sp based arguments/locals
| afvt [name] [new_type] change type for given argument/local
| afvW [varname] list addresses where vars are accessed (WRITE)
| afvx show function variable xrefs (same as afvR+afvW)
The afvr, afvb and afvs commands are uniform, but they manipulate register-based, BP/FP-based, and SP-based arguments and variables respectively. The help for afvr therefore also shows how the other two commands work:
|Usage: afvr [reg] [type] [name]
| afvr list register based arguments
| afvr* same as afvr but in r2 commands
| afvr [reg] [name] ([type]) define register arguments
| afvrj return list of register arguments in JSON format
| afvr- [name] delete register arguments at the given index
| afvrg [reg] [addr] define argument get reference
| afvrs [reg] [addr] define argument set reference
Like many other things, variable detection is performed by radare2 automatically, but the results can be changed with these argument/variable control commands. This kind of analysis relies heavily on preloaded function prototypes and the calling convention, so loading symbols can improve it. After changing something, we can rerun variable analysis with the afva command. Variable analysis is quite often accompanied by type analysis; see the afta command.
The most important aspect of reverse engineering is naming things. You can of course rename variables too, which affects every place they are referenced. This can be achieved with afvn for any type of argument or variable. You can also simply remove a variable or argument with the afv- command.
As mentioned before, the analysis loop relies heavily on type information while performing the variable analysis stages. This brings us to the next very important command, afvt, which allows you to change the type of a variable:
[0x00003b92]> afvs
var int local_8h @ rsp+0x8
var int local_10h @ rsp+0x10
var int local_28h @ rsp+0x28
var int local_30h @ rsp+0x30
var int local_32h @ rsp+0x32
var int local_38h @ rsp+0x38
var int local_45h @ rsp+0x45
var int local_46h @ rsp+0x46
var int local_47h @ rsp+0x47
var int local_48h @ rsp+0x48
[0x00003b92]> afvt local_10h char*
[0x00003b92]> afvs
var int local_8h @ rsp+0x8
var char* local_10h @ rsp+0x10
var int local_28h @ rsp+0x28
var int local_30h @ rsp+0x30
var int local_32h @ rsp+0x32
var int local_38h @ rsp+0x38
var int local_45h @ rsp+0x45
var int local_46h @ rsp+0x46
var int local_47h @ rsp+0x47
var int local_48h @ rsp+0x48
A less commonly used feature, still under heavy development, is the distinction between variables being read and written. You can list those being read with the afvR command and those being written with the afvW command. Both commands list the addresses where those operations are performed:
[0x00003b92]> afvR
local_48h 0x48ee
local_30h 0x3c93,0x520b,0x52ea,0x532c,0x5400,0x3cfb
local_10h 0x4b53,0x5225,0x53bd,0x50cc
local_8h 0x4d40,0x4d99,0x5221,0x53b9,0x50c8,0x4620
local_28h 0x503a,0x51d8,0x51fa,0x52d3,0x531b
local_38h
local_45h 0x50a1
local_47h
local_46h
local_32h 0x3cb1
[0x00003b92]> afvW
local_48h 0x3adf
local_30h 0x3d3e,0x4868,0x5030
local_10h 0x3d0e,0x5035
local_8h 0x3d13,0x4d39,0x5025
local_28h 0x4d00,0x52dc,0x53af,0x5060,0x507a,0x508b
local_38h 0x486d
local_45h 0x5014,0x5068
local_47h 0x501b
local_46h 0x5083
local_32h
[0x00003b92]>
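One practical use of these two lists is to spot variables that are written but never read, or read but never written. The sketch below parses the output shown above, assuming each line is a variable name followed by an optional comma-separated list of hexadecimal addresses; the parsing helper is hypothetical and not part of radare2.

```python
def parse_var_accesses(text):
    """Map each variable name to the list of addresses where it is accessed."""
    accesses = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        addrs = [int(a, 16) for a in parts[1].split(",")] if len(parts) > 1 else []
        accesses[parts[0]] = addrs
    return accesses

AFVR = """
local_48h 0x48ee
local_30h 0x3c93,0x520b,0x52ea,0x532c,0x5400,0x3cfb
local_10h 0x4b53,0x5225,0x53bd,0x50cc
local_8h 0x4d40,0x4d99,0x5221,0x53b9,0x50c8,0x4620
local_28h 0x503a,0x51d8,0x51fa,0x52d3,0x531b
local_38h
local_45h 0x50a1
local_47h
local_46h
local_32h 0x3cb1
"""
AFVW = """
local_48h 0x3adf
local_30h 0x3d3e,0x4868,0x5030
local_10h 0x3d0e,0x5035
local_8h 0x3d13,0x4d39,0x5025
local_28h 0x4d00,0x52dc,0x53af,0x5060,0x507a,0x508b
local_38h 0x486d
local_45h 0x5014,0x5068
local_47h 0x501b
local_46h 0x5083
local_32h
"""

reads = parse_var_accesses(AFVR)
writes = parse_var_accesses(AFVW)
write_only = sorted(n for n in writes if writes[n] and not reads.get(n))
read_only = sorted(n for n in reads if reads[n] and not writes.get(n))
print("written but never read:", write_only)
print("read but never written:", read_only)
```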
Type inference
Type inference for local variables and arguments is integrated into the afta command.
Let's see an example of this with a simple hello_world binary:
[0x000007aa]> pdf
| ;-- main:
/ (fcn) sym.main 157
| sym.main ();
| ; var int local_20h @ rbp-0x20
| ; var int local_1ch @ rbp-0x1c
| ; var int local_18h @ rbp-0x18
| ; var int local_10h @ rbp-0x10
| ; var int local_8h @ rbp-0x8
| ; DATA XREF from entry0 (0x6bd)
| 0x000007aa push rbp
| 0x000007ab mov rbp, rsp
| 0x000007ae sub rsp, 0x20
| 0x000007b2 lea rax, str.Hello ; 0x8d4 ; "Hello"
| 0x000007b9 mov qword [local_18h], rax
| 0x000007bd lea rax, str.r2_folks ; 0x8da ; " r2-folks"
| 0x000007c4 mov qword [local_10h], rax
| 0x000007c8 mov rax, qword [local_18h]
| 0x000007cc mov rdi, rax
| 0x000007cf call sym.imp.strlen ; size_t strlen(const char *s)
After applying afta:
[0x000007aa]> afta
[0x000007aa]> pdf
| ;-- main:
| ;-- rip:
/ (fcn) sym.main 157
| sym.main ();
| ; var size_t local_20h @ rbp-0x20
| ; var size_t size @ rbp-0x1c
| ; var char *src @ rbp-0x18
| ; var char *s2 @ rbp-0x10
| ; var char *dest @ rbp-0x8
| ; DATA XREF from entry0 (0x6bd)
| 0x000007aa push rbp
| 0x000007ab mov rbp, rsp
| 0x000007ae sub rsp, 0x20
| 0x000007b2 lea rax, str.Hello ; 0x8d4 ; "Hello"
| 0x000007b9 mov qword [src], rax
| 0x000007bd lea rax, str.r2_folks ; 0x8da ; " r2-folks"
| 0x000007c4 mov qword [s2], rax
| 0x000007c8 mov rax, qword [src]
| 0x000007cc mov rdi, rax ; const char *s
| 0x000007cf call sym.imp.strlen ; size_t strlen(const char *s)
It also extracts type information from format strings such as printf ("fmt : %s , %u , %d", ...). The format specifications are taken from anal/d/spec.sdb.
You can create a new profile that specifies a set of format characters for a different library, operating system, or programming language, like this:
win=spec
spec.win.u32=unsigned int
Then change the default specification to the newly created one using this config variable: e anal.spec = win
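The idea behind format-string type extraction can be sketched as a lookup from conversion characters to types. The table below is a hypothetical stand-in for a profile in spec.sdb (its real contents may differ), and the parser deliberately ignores flags, widths, and length modifiers.

```python
import re

# Hypothetical profile: conversion character -> C type (not the real spec.sdb).
SPEC = {"s": "char *", "u": "unsigned int", "d": "int"}

def types_from_format(fmt, spec=SPEC):
    """Return the argument types implied by simple %x conversions in fmt."""
    types = []
    for match in re.finditer(r"%(%|[a-zA-Z])", fmt):
        conv = match.group(1)
        if conv == "%":  # "%%" is a literal percent sign, not an argument
            continue
        types.append(spec.get(conv, "unknown"))
    return types

print(types_from_format("fmt : %s , %u , %d"))
```

Swapping in a different table models what selecting another profile with anal.spec is meant to achieve.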
For more information about primitive and user-defined types support in radare2, refer to the types chapter.
Types
Radare2 supports C-syntax data type descriptions. These types are parsed by a C11-compatible parser and stored in the internal SDB, so they can be inspected with the k command.
Most of the related commands are located in the t namespace:
[0x00000000]> t?
| Usage: t # cparse types commands
| t List all loaded types
| tj List all loaded types as json
| t <type> Show type in 'pf' syntax
| t* List types info in r2 commands
| t- <name> Delete types by its name
| t-* Remove all types
| tail [filename] Output the last part of files
| tc [type.name] List all/given types in C output format
| te[?] List all loaded enums
| td[?] <string> Load types from string
| tf List all loaded functions signatures
| tk <sdb-query> Perform sdb query
| tl[?] Show/Link type to an address
| tn[?] [-][addr] manage noreturn function attributes and marks
| to - Open cfg.editor to load types
| to <path> Load types from C header file
| toe [type.name] Open cfg.editor to edit types
| tos <path> Load types from parsed Sdb database
| tp <type> [addr|varname] cast data at <address> to <type> and print it (XXX: type can contain spaces)
| tpv <type> @ [value] Show offset formatted for given type
| tpx <type> <hexpairs> Show value for type with specified byte sequence (XXX: type can contain spaces)
| ts[?] Print loaded struct types
| tu[?] Print loaded union types
| tx[f?] Type xrefs
| tt[?] List all loaded typedefs
Note that the basic (atomic) types are not those from the C standard: not char, _Bool, or short. Because those types can differ from one platform to another, radare2 uses fixed-size types such as int8_t or uint64_t, and converts int to int32_t or int64_t depending on the binary or debuggee platform/compiler.
Basic types can be listed using the t command. For structured types use ts, for unions use tu, and for enums use te.
[0x00000000]> t
char
char *
double
float
gid_t
int
int16_t
int32_t
int64_t
int8_t
long
long long
pid_t
short
size_t
uid_t
uint16_t
uint32_t
uint64_t
uint8_t
unsigned char
unsigned int
unsigned short
void *
Loading types
There are three easy ways to define a new type:
Directly from the string using td command
From the file using to <filename> command
Open an $EDITOR to type the definitions in place using to -
[0x00000000]> "td struct foo {char* a; int b;}"
[0x00000000]> cat ~/radare2-regressions/bins/headers/s3.h
struct S1 {
int x[3];
int y[4];
int z;
};
[0x00000000]> to ~/radare2-regressions/bins/headers/s3.h
[0x00000000]> ts
foo
S1
Also note that there is a config option to specify include directories for type parsing:
[0x00000000]> e? dir.types
dir.types: Default path to look for cparse type files
[0x00000000]> e dir.types
/usr/include
Printing types
Notice that below we use the ts command, which converts the C type description (or, to be precise, its SDB representation) into a pf format string. See more about print format.
The tp command uses the pf string to print all the members of a type at the current offset or a given address:
[0x00000000]> "td struct foo {char* a; int b;}"
[0x00000000]> wx 68656c6c6f000c000000
[0x00000000]> wz world @ 0x00000010 ; wx 17 @ 0x00000016
[0x00000000]> px
[0x00000000]> ts foo
pf zd a b
[0x00000000]> tp foo
a : 0x00000000 = "hello"
b : 0x00000006 = 12
[0x00000000]> tp foo @ 0x00000010
a : 0x00000010 = "world"
b : 0x00000016 = 23
You can also fill your own data into the struct and print it using the tpx command:
[0x00000000]> tpx foo 414243440010000000
a : 0x00000000 = "ABCD"
b : 0x00000005 = 16
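The member offsets in these examples follow from the pf string zd: a null-terminated string immediately followed by a 4-byte integer. The sketch below decodes the same byte sequences under that reading, assuming little-endian byte order; the function name is made up for illustration.

```python
import struct

def unpack_zd(buf, offset=0):
    """Decode a null-terminated string followed by a 32-bit little-endian int."""
    end = buf.index(b"\x00", offset)         # terminator of member a
    a = buf[offset:end].decode("ascii")
    b_off = end + 1                          # member b starts right after it
    (b,) = struct.unpack_from("<i", buf, b_off)
    return [("a", offset, a), ("b", b_off, b)]

for hexpairs in ("68656c6c6f000c000000", "414243440010000000"):
    for name, off, value in unpack_zd(bytes.fromhex(hexpairs)):
        print("%s : 0x%08x = %r" % (name, off, value))
```

Because the string length varies, the offset of b depends on the data, which matches the different offsets (0x6 and 0x5) in the two transcripts.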
Linking Types
The tp command just performs a temporary cast. If we want to link an address or variable to the chosen type, we can use the tl command to store the relationship in SDB.
[0x000051c0]> tl S1 = 0x51cf
[0x000051c0]> tll
(S1)
x : 0x000051cf = [ 2315619660, 1207959810, 34803085 ]
y : 0x000051db = [ 2370306049, 4293315645, 3860201471, 4093649307 ]
z : 0x000051eb = 4464399
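The member addresses printed for the link can be checked against the layout of S1. This sketch mirrors the struct with ctypes, assuming int is 32 bits wide, and adds each member offset to the linked address.

```python
import ctypes

class S1(ctypes.Structure):
    # Mirrors: struct S1 { int x[3]; int y[4]; int z; };
    _fields_ = [
        ("x", ctypes.c_int32 * 3),
        ("y", ctypes.c_int32 * 4),
        ("z", ctypes.c_int32),
    ]

BASE = 0x51CF  # address the type was linked to with tl
layout = {name: BASE + getattr(S1, name).offset for name, _ in S1._fields_}

for name, addr in layout.items():
    print("%s : 0x%08x" % (name, addr))
print("size:", ctypes.sizeof(S1))
```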
Moreover, the link will be shown in the disassembly output or visual mode:
[0x000051c0 15% 300 /bin/ls]> pd $r @ entry0
;-- entry0:
0x000051c0 xor ebp, ebp
0x000051c2 mov r9, rdx
0x000051c5 pop rsi
0x000051c6 mov rdx, rsp
0x000051c9 and rsp, 0xfffffffffffffff0
0x000051cd push rax
0x000051ce push rsp
(S1)
x : 0x000051cf = [ 2315619660, 1207959810, 34803085 ]
y : 0x000051db = [ 2370306049, 4293315645, 3860201471, 4093649307 ]
z : 0x000051eb = 4464399
0x000051f0 lea rdi, loc._edata ; 0x21f248