Obfuscation (software)
Obfuscated code is source code that is (usually intentionally) very hard to read and understand. Some languages lend themselves to obfuscation more than others; C, C++ and Perl are the ones most often cited. Macro preprocessors are often used to create hard-to-read code by hiding the standard language syntax and grammar from the main body of code. The term shrouded code has also been used.
There are also programs known as obfuscators that may operate on source code, object code, or both, for the purpose of deterring reverse engineering.
Recreational obfuscation
Code is sometimes obfuscated deliberately for recreational purposes. There are programming contests which reward the most creatively obfuscated code: the International Obfuscated C Code Contest, Obfuscated Perl Contest, International Obfuscated Ruby Code Contest and Obfuscated PostScript Contest.
There are many varieties of interesting obfuscations ranging from simple keyword substitution, use/non-use of whitespace to create artistic effects, to clever self-generating or heavily compressed programs.
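Simple keyword substitution can be illustrated with the C preprocessor. The following is a minimal sketch, not taken from any contest entry; the macro names (BEGIN, END, UNLESS, UNTIL, GIVE) and the function name masked_strlen are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical macros that hide ordinary C syntax behind invented keywords. */
#define BEGIN     {
#define END       }
#define UNLESS(c) if (!(c))
#define UNTIL(c)  while (!(c))
#define GIVE      return

/* Counts the characters of s. After preprocessing this is an ordinary
   function body with an if, a while loop and two return statements. */
size_t masked_strlen(const char *s)
BEGIN
    size_t n = 0;
    UNLESS (s) GIVE 0;
    UNTIL (s[n] == '\0') n++;
    GIVE n;
END
```

A reader who has not seen the macro definitions cannot tell that UNTIL is a negated while loop. Real entries typically go much further and redefine keywords in misleading ways.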
Short obfuscated Perl programs that print "Just another Perl hacker" or a similar phrase are often found in the signatures of Perl programmers.
Examples
Take this infamous example from Internet lore:
#include <stdio.h>
main(t,_,a)char *a;{return!0<t?t<3?main(-79,-13,a+main(-87,1-_,
main(-86,0,a+1)+a)):1,t<_?main(t+1,_,a):3,main(-94,-27+t,a)&&t==2?_<13?
main(2,_+1,"%s %d %d\n"):9:16:t<0?t<-72?main(_,t,
"@n'+,#'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l,+,/n{n+,/+#n+,/#\
;#q#n+,/+k#;*+,/'r :'d*'3,}{w+K w'K:'+}e#';dq#'l \
q#'+d'K#!/+k#;q#'r}eKK#}w'r}eKK{nl]'/#;#q#n'){)#}w'){){nl]'/+#n';d}rw' i;# \
){nl]!/n{n#'; r{#w'r nc{nl]'/#{l,+'K {rw' iK{;[{nl]'/w#q#n'wk nw' \
iwk{KK{nl]!/w{%'l##w#' i; :{nl]'/*{q#'ld;r'}{nlwb!/*de}'c \
;;{nl'-{}rw]'/+,}##'*}#nc,',#nw]'/+kd'+e}+;#'rdq#w! nr'/ ') }+}{rl#'{n' ')# \
}'+}##(!!/")
:t<-50?_==*a?putchar(31[a]):main(-65,_,a+1):main((*a=='/')+t,_,a+1)
:0<t?main(2,2,"%s"):*a=='/'||main(0,main(-61,*a,
"!ek;dc i@bK'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m .vpbks,fxntdCeghiry"),a+1);}
Although unintelligible at first glance, it is a legal C program which, when compiled and run, generates the twelve verses of The Twelve Days of Christmas. It contains all the strings required for the poem in encoded form, inlined in the code, and iterates through the twelve days, printing what each verse needs.
Another example is a program whose source listing was formatted to resemble an empty tic-tac-toe board. Each pass through the program modified the source code to show a turn in the game; the modified source was then executed for the next move.
Yet another example is this short program that generates mazes of arbitrary length:
char*M,A,Z,E=40,J[40],T[40];main(C){for(*J=A=scanf(M="%d",&C); -- E; J[ E] =T [E ]= E) printf("._"); for(;(A-=Z=!Z) || (printf("\n|" ) , A = 39 ,C -- ) ; Z || printf (M ))M[Z]=Z[A-(E =A[J-Z])&&!C & A == T[ A] |6<<27<rand()||!C&!Z?J[T[E]=T[A]]=E,J[T[A]=A-Z]=A,"_.":" |"];}
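Note the shape of the corridors in the program. Modern C compilers do not allow string constants to be overwritten. This can be worked around by changing the first line to
char M[3],A,Z,E=40,J[40],T[40];main(C){for(*J=A=scanf("%d",&C);
(three bytes, so that the two-character string written into M keeps its terminating NUL), or, with older versions of GCC (the GNU Compiler Collection), by compiling with the -fwritable-strings flag, which was removed in GCC 4.0.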
Obfuscation by code morphing
Code morphing differs from other kinds of obfuscation in that it transforms the code at the level of CPU instructions rather than at the source level. The x86 instruction set is redundant: the same effect can be achieved with many different instruction sequences. A code-morphing protector breaks the protected code into individual instructions or short instruction snippets and replaces them with others that produce the same end result.
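Real code-morphing tools operate on machine instructions, but the underlying idea of replacing an operation with a less recognizable sequence that computes the same result can be sketched in C. The function names below are hypothetical, and the identities used hold for unsigned 32-bit wraparound arithmetic.

```c
#include <stdint.h>

/* Plain version: what the programmer wrote. */
uint32_t add_plain(uint32_t a, uint32_t b) {
    return a + b;
}

/* Substituted version, using a + b == (a ^ b) + 2 * (a & b):
   the XOR adds each bit pair without carry, the AND finds the carries. */
uint32_t add_morphed(uint32_t a, uint32_t b) {
    uint32_t without_carry = a ^ b;
    uint32_t carry = (a & b) << 1;
    return without_carry + carry;
}

/* Subtraction rewritten as a + ~b + 1 (two's complement negation),
   built on top of the substituted addition. */
uint32_t sub_morphed(uint32_t a, uint32_t b) {
    return add_morphed(add_morphed(a, ~b), 1u);
}
```

For example, add_morphed(7, 9) returns 16 and sub_morphed(10, 3) returns 7, the same results as the plain operators. A protector would apply many such substitutions, often recursively, to each instruction it transforms.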
Code morphing is a multilevel technology that can contain hundreds of distinct code transformation patterns. It may also include a layer that translates some instructions into instructions for a virtual machine (similar to P-code). The result is binary code that no longer resembles normal compiler output and that hides much of the execution logic of the protected code.
This approach involves no code decryption. Protected code blocks are always in an executable state and are executed in their transformed form. The original code is not kept anywhere in the protected program, and restoring it is claimed to be an NP-hard problem.
The weak point of this scheme is that it significantly increases the size of a program and reduces its speed. An author protecting an application, however, usually does not need to transform all of its code. It is enough to protect only the critical parts, such as those responsible for serial number verification, trial expiration dates, and other evaluation restrictions. The rest of the application code remains intact and runs at its original speed.
Below is a code sample generated by Delphi and a partial (the full listing contains over 500 instructions) listing of the transformed code.
Source code:
writeln('Test OK');
After compilation:
mov eax, [$004092ec]
mov edx, $00408db4
call @WriteOLString
call @WriteLn
call @_IOTest
After the code transformation (partial):
db 3
add al, $30
xlat
call +$000025b2
jmp +$00000eec
call +$00000941
or al, $4a
scasd
call -$304ffbe9
rol eax, $14
mov edi, [ebx]
jmp +$00001738
mov ebx, eax
shr ebx, $03
push ebx
jmp +$0001b5e
call -$000001eb
jmp +$00003203
jmp +$00005df8
call +$00000910
adc dh, ah
fmul st(7)
adc [eax], al
les eax, [ecx+$0118bfc0]
stosb
Obfuscation tools
A wide variety of tools exist to perform or assist with code obfuscation. These include experimental research tools created by academics, hobbyist tools, commercial products written by professionals, and open-source software.
Software obfuscation tools include specialized obfuscators to demonstrate a relatively limited technique, more general obfuscators which attempt a more thorough obfuscation, and combined-function tools which obfuscate code as part of a larger goal such as software licensing enforcement.
Obfuscation and information-hiding
One definition of "code obfuscation" is a set of transformations on a program that preserve its black-box specification while making its internals difficult to reverse-engineer. There turn out to be many such transformations.
For example, languages such as Java, C#, and Lisp keep a program's symbol names within the compiled output. One common obfuscation is to rename every class from something descriptive, like "Encryption_Index", to a meaningless sequence such as "rb". The class methods can be renamed to a(), b(), and so on.
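How a renaming obfuscator might hand out such meaningless names can be sketched as follows. This is an illustrative assumption about one possible naming scheme, not the method of any particular tool; the function name short_name is hypothetical.

```c
#include <stddef.h>

/* Writes the index-th short identifier into out:
   0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", and so on.
   Returns the length written, or 0 if cap is too small.
   Assumes a character set in which 'a'..'z' are contiguous (e.g. ASCII). */
size_t short_name(unsigned index, char *out, size_t cap) {
    char reversed[16];
    size_t len = 0;
    unsigned long long n = (unsigned long long)index + 1;

    /* Bijective base-26 conversion, least significant letter first. */
    while (n > 0) {
        n--;
        reversed[len++] = (char)('a' + (int)(n % 26));
        n /= 26;
    }
    if (len + 1 > cap) return 0;   /* no room for the name plus NUL */
    for (size_t i = 0; i < len; i++) out[i] = reversed[len - 1 - i];
    out[len] = '\0';
    return len;
}
```

An obfuscator built this way would keep a table from each original name to its index, so that every reference to "Encryption_Index" is rewritten to the same short name throughout the program.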
When writing source code, programmers generally create a great deal of structure, following the rules of structured programming, object-oriented programming, and other methodologies. Compilers tend to propagate this structure into the compiled code. The job of a good obfuscator is to destroy as much as possible of the structure that makes a program human-readable.
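One well-known way of destroying structure is control-flow flattening, named here as an example because the text above does not single out a specific technique. The sketch below, with hypothetical function names, shows a structured loop and a flattened equivalent in which a state variable and a dispatcher replace the visible loop.

```c
/* Structured version: the loop is visible at a glance. */
unsigned sum_to(unsigned n) {
    unsigned total = 0;
    for (unsigned i = 1; i <= n; i++) total += i;
    return total;
}

/* Flattened version: every basic block becomes a case of one switch,
   and the order of execution is encoded in the state variable. */
unsigned sum_to_flat(unsigned n) {
    unsigned total = 0, i = 0;
    int state = 0;
    for (;;) {
        switch (state) {
        case 0: i = 1; state = 1; break;                /* loop initialisation */
        case 1: state = (i <= n) ? 2 : 3; break;        /* loop condition */
        case 2: total += i; i++; state = 1; break;      /* loop body and step */
        case 3: return total;                           /* loop exit */
        }
    }
}
```

Both functions return 55 for an argument of 10. A real obfuscator would additionally scramble the state numbers and the order of the cases, so that the original loop is harder to recognize.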
Uses for obfuscation
Makes reverse engineering more difficult
Even when a language is compiled to an executable or bytecode file, someone may choose to run a decompiler which converts these files back into human-readable form (generally without comments). This could help them understand whatever lies hidden within the source code, against the wishes of the code's creator. Obfuscation serves to increase the difficulty of decompilation, usually forcing someone who wants that information to use more costly forms of reverse engineering.
However, some forms of obfuscation are easily defeated. For example, some websites obfuscate their JavaScript to prevent copying or modification of the code. This can often be defeated quickly by viewing the DOM of the page, which lets one see the JavaScript code and removes some of the confusion. Scrambled variable names, however, can still make the code extremely hard to understand.
Minimizes code size
Obfuscation usually breaks down the structures that make programs modular and maintainable. In many cases this has the pleasant side effect of reducing code size. For example, in languages that keep a symbol table with the executable code, simple variable renaming can save a great deal of space in the resulting code footprint. This is a crucial consideration if code size must be kept to a minimum, as with code that must be sent over a network or embedded in a small device.
Concealment of evidence
Spammers frequently use obfuscated JavaScript or HTML code in spam messages. The obfuscated message, when displayed by an HTML-capable e-mail client, appears as a reasonably normal message, albeit with obnoxious JavaScript behaviors such as spawning pop-up windows. However, when the source is viewed, the obfuscations make it far more difficult for investigators to discern where the links go or what the JavaScript code does.
Dealers in spamming software have sold JavaScript obfuscators for the purpose of confounding investigators. Some of the techniques exploit JavaScript's dynamic nature: a piece of code is stored as an encrypted string, which is decrypted and evaluated at run time. This may be done several times in succession. Other techniques include the insertion of dummy code, as well as dummy HTML links to legitimate pages.
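The store-encoded, decode-at-run-time pattern can be sketched in C, although C has no counterpart to JavaScript's evaluation of a decoded string, so only the decoding step is shown. The single-byte XOR key, the sample text and the names ENCODED and xor_decode are assumptions made for illustration; real obfuscators use a variety of encodings.

```c
#include <stddef.h>

/* "Test OK" with every byte XORed with 0x5A (key and text are illustrative). */
static const unsigned char ENCODED[7] = {0x0E, 0x3F, 0x29, 0x2E, 0x7A, 0x15, 0x11};

/* Decodes len bytes of src into dst and NUL-terminates the result.
   dst must have room for len + 1 bytes. */
void xor_decode(const unsigned char *src, size_t len, unsigned char key, char *dst) {
    for (size_t i = 0; i < len; i++) dst[i] = (char)(src[i] ^ key);
    dst[len] = '\0';
}
```

Calling xor_decode(ENCODED, 7, 0x5A, out) with an 8-byte buffer leaves "Test OK" in out. Someone reading only the stored bytes sees no recognizable text, which is also why an investigator can undo such a layer once the key and scheme are known.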
Disadvantages of obfuscation
One layer of security
No obfuscator known today provides any guarantee about the difficulty of reverse engineering, and this appears to be an inherent limitation (see, for example, the impossibility result of Barak et al. listed under External links). Obfuscators therefore do not provide security comparable to that of modern encryption schemes, and where security is of high importance they should be used in tandem with other measures.
Debugging
Obfuscated code is extremely difficult to debug. Variable names no longer make sense, and the structure of the code itself is likely to have been modified beyond recognition. This generally forces developers to maintain two builds: one with the original, unobfuscated source code that can be easily debugged, and another for release. While both builds should be tested to make sure they behave identically, the second build can generally be constructed easily and reliably from the first by an obfuscation tool.
This limitation does not apply to intermediate-language obfuscators (for Java, C#, and so on), which generally work on compiled assemblies rather than on source code.
Portability
Obfuscated code often depends on particular characteristics of the platform and compiler, making it difficult to manage if either changes.
Defective obfuscators
Occasionally an obfuscator is buggy in a way that is difficult to reproduce. For binary obfuscators, there is little one can do except find or create a newer version, or adjust the inputs to the obfuscator until it works. Source code obfuscators are often buggy because most are built with simple string-munging tools that fail to account for all the complexities of the source language's syntax. Reliable source code obfuscators tend to use true language parsers to ensure that all of the syntax is handled properly.
Conflicts with Reflection APIs
Reflection is a set of APIs in various languages that allow an object to be examined or created just by knowing its class name at run time. Many obfuscators allow specified classes to be exempt from renaming; it is also possible to let a class be renamed and to call it by its new name. However, the former option places limits on the dynamism of the code, while the latter adds a great deal of complexity and inconvenience to the system.
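C has no reflection API, so the following sketch substitutes a hand-written name table to show why lookup by name conflicts with renaming. All names in it (entry, lookup, ORIGINAL, RENAMED, encryption_index) are hypothetical; the class name and its replacement are taken from the renaming example earlier in the article.

```c
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn)(int);

struct entry {
    const char *name;   /* name under which the code can be found at run time */
    handler_fn fn;
};

static int encryption_index(int x) { return x + 1; }

/* Table as built before renaming: the string matches the source-level name. */
static const struct entry ORIGINAL[] = { {"Encryption_Index", encryption_index} };

/* Table after a renaming obfuscator has rewritten the symbol to "rb". */
static const struct entry RENAMED[] = { {"rb", encryption_index} };

/* Returns the function registered under name, or NULL if there is none. */
handler_fn lookup(const struct entry *table, size_t count, const char *name) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) return table[i].fn;
    }
    return NULL;
}
```

Code that still asks for "Encryption_Index" finds it in ORIGINAL but gets NULL from RENAMED. It must either be updated to ask for "rb", or the obfuscator must be told to leave that name alone, which are the two options described above.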
See also
- Copy protection
- International Obfuscated C Code Contest
- Obfuscated Perl contest
- International Obfuscated Ruby Code Contest
External links
- International Obfuscated C Code Contest
- International Obfuscated Ruby Code Contest
- Protecting Java Code Via Code Obfuscation
- How-To-Select an Obfuscation Tool for .NET -- A guide on obfuscation tools, published by Xtras.net in Aug 2005 (needs some updates).
- Obfuscation tools for .NET, on MSDN — Obfuscation resources for .NET, on the Microsoft Developer Center.
- List of code obfuscators and protectors, on Sharptoolbox
- List of code obfuscators and protectors, on Javatoolbox
- Can we obfuscate programs?
- Yury Lifshits. Obfuscation and Cryptography. Intensive course at Tartu University (Spring'2006)
- Yury Lifshits. Lecture Notes on Program Obfuscation (Spring'2005)
- B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan and K. Yang. "On the (Im)possibility of Obfuscating Programs". 21st Annual International Cryptology Conference, Santa Barbara, California, USA. Springer Verlag LNCS Volume 2139, 2001.
- Java obfuscators at Curlie
- Analysis of the 12 days program
- Analysis of the obfuscated maze generating program
- The free perl obfuscation service
- Obfuscated Perl program with explanation
- x86 assembler obfuscator