Monday, April 13, 2015

A Tale of Two Exploits

Posted by Natalie Silvanovich, Collision Investigator and (Object) Field Examiner

CVE-2015-0336 is a type confusion vulnerability in the AS2 NetConnection class. I reported this issue in January and soon wrote a proof-of-concept exploit for the bug. The issue was patched by Adobe in March and less than a week later, in what was likely a case of bug collision, it was found in two exploit kits in the wild. This created an interesting opportunity to compare a real exploit to a theoretical one and better understand how attackers exploit Flash vulnerabilities.


The Bug

CVE-2015-0336 is caused by a faulty check in the ActionScript 2 NetConnection class. To understand the bug, it is important to understand the structure of AS2 objects.

ActionScript 2 is a legacy scripting language supported by the Adobe Flash player. While it shares some classes with the more modern ActionScript 3, it is a unique scripting language with different opcodes and typing rules, implemented in a separate VM. SWF files cannot combine AS2 and AS3 code: they must either be built for an early version of Flash and use only AS2, or be built for a more recent version and use only AS3.

The diagram below shows an AS2 object:
WARNING: This is a dramatization. Code is pseudocode that represents member widths but not exact types. Your AS2 object may be slightly different than pictured. 

AS2 objects are backed by a number of native structures. Typically, an AS2 object is referenced by a pointer-width structure called a ScriptAtom, which contains a pointer (or other data for primitives that are not objects). The pointer points to a native ScriptObject that is common to all AS2 classes, and that object contains several properties (with some indirection), including the native data and the type of the object. Like an atom, the native data can contain object data (for example, the value of a boolean object) or a pointer to a native class that backs the specific object type. The interpretation of the native data is based on the type property of the ScriptObject. The diagram above shows a NetConnection object, where the native data is a pointer to a native NetConnection object.

When a native function uses an AS2 object, it must do a type check before it uses the native data. For example, the Number class checks that the ScriptObject is of type Number before casting its native data to the native Number type. CVE-2015-0336 occurs because this check in the NetConnection class is faulty. The check passes if the this object is of type NetConnection, or if the object is of type Object and has a NetConnection object in its __proto__ chain (i.e. the object is a NetConnection, or a subclass of NetConnection, or a subsubclass of NetConnection and so on). Since objects of type Object should have a null native data property, this check should work. Except there’s one way to get an object of type Object with a non-null native data.

Native functions themselves are represented by an object of type Object in AS2. These objects can be created by calling:


    ASnative(x, y);

The x parameter determines what native function to call, and generally corresponds to a class, such as Number or NetConnection, and is stored as a property in the ScriptObject. The y parameter is used by the native function, generally in a switch statement to further direct the call, and is stored as a Number (int for 32-bit build, double for 64-bit build) cast to a DWORD in the object’s native data. This means that if a NetConnection method is called with a parameter that is a native function object, its native data can be specified as a Number by the caller, but be interpreted as a pointer (unfortunately only a 32-bit one due to the DWORD casting). My proof-of-concept of this issue when I reported it was as follows:

   var b = ASnative(2100, 0x77777777);
   var n = new NetConnection();
   b.__proto__ = n;
   var f = ASnative(2100, 0); //NetConnection.connect
   f.call(b, 1);


b is a native function object with a y parameter and therefore a native data of 0x77777777. It has its __proto__ set to NetConnection n so it passes the NetConnection class’s type check. When NetConnection.connect is called, the function attempts to access address 0x77777777, which it thinks is the location of the NetConnection object and crashes.
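The logic of the faulty check can be modelled in a few lines. This is only a sketch in Python, not Flash's native code: the `ScriptObject` fields, `asnative` and `passes_netconnection_check` are hypothetical names chosen to mirror the description above, and the pointer value given to the real NetConnection is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScriptObject:
    type_name: str                          # "Object", "NetConnection", ...
    native_data: int = 0                    # raw value or pointer, read per type_name
    native_function: Optional[int] = None   # set only for native function objects
    proto: Optional["ScriptObject"] = None  # stands in for __proto__


def asnative(x: int, y: int) -> ScriptObject:
    """Model of ASnative(x, y): type Object, y truncated to a DWORD as native data."""
    return ScriptObject("Object", native_data=y & 0xFFFFFFFF, native_function=x)


def passes_netconnection_check(this: ScriptObject) -> bool:
    """Model of the faulty check: a NetConnection, or an Object with one in its chain."""
    if this.type_name == "NetConnection":
        return True
    if this.type_name != "Object":
        return False
    link = this.proto
    while link is not None:
        if link.type_name == "NetConnection":
            return True
        link = link.proto
    return False


b = asnative(2100, 0x77777777)
b.proto = ScriptObject("NetConnection", native_data=0x0804A000)  # made-up pointer
```

In this model `b` passes the check even though its native data is the caller-chosen 0x77777777 rather than a pointer to a native NetConnection, which is the confusion the proof-of-concept triggers.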


Exploiting the Bug

Compared to other Flash type confusion bugs reported recently, I would characterize CVE-2015-0336 as medium quality. It’s not obviously reliably exploitable on all platforms, but there’s no doubt it can be exploited in some situations. I wrote a proof of concept for Firefox on 32-bit Linux, as this seemed like an environment where the bug was likely to work reasonably reliably. In addition, I exploited the bug using AS2 only. This was for two reasons. First, bridging from AS2 to AS3 in the absence of existing code is time consuming and error prone. Second, I was somewhat concerned that since the bug is in the NetConnection class, exploitation would interfere with AS2/AS3 communication, as all connections are in a global linked list in Flash.

Based on the bug, I thought there were a few steps that would be needed to exploit the bug:

  • Create a ‘fake’ NetConnection in memory that the confused NetConnection pointer points to 
  • Use this object to perform reads to bypass ASLR 
  • Use this object to move IP to ROP gadget addresses found by reading 

To start making a ‘fake’ NetConnection, we need to be able to control the contents of a buffer at a known location, and then point the NetConnection pointer at that location. The lack of contiguous, mutable data types in AS2 makes this difficult. While AS3 contains the ByteArray and Vector classes which can be used to allocate memory on the heap with arbitrary size and contents, AS2 lacks such types. There are a few classes that are close runners up, though. String objects are allocated in contiguous memory and can be of any size, but are not mutable, and terminate as soon as two zero bytes occur. BitmapData objects can be up to 31 MB (8191x8191), are allocated contiguously and are mutable, but their memory can only be set to valid ARGB pixel values (more details later). A few classes also have members that are backed by heap allocated arrays in memory: the ConvolutionFilter matrix property, the AsBroadcaster _listeners property and the filters property of all objects that support filters. These properties are immutable though, and are limited to the values permitted by the type of the array contents.

I decided to use a BitmapData object, as I noticed that they have an interesting property. The native data of a BitmapData object is a pointer to a native BitmapData object, which contains many members including a pointer to a pixel buffer. How this buffer is allocated is highly platform dependent, and also varies based on whether the device has a GPU on some platforms. I wrote this exploit for Firefox on 32-bit Ubuntu, in which case the pixel buffer is a GDK buffer. This means that the pixels are allocated using g_malloc, which uses mmap to allocate larger buffer sizes. Allocating 256 1 MB (2880x91) pixel BitmapData objects, the objects are consistently 1 MB aligned, and if enough are allocated, it’s fairly easy to guess a location that is the beginning of one of these buffers, though you don’t know which one. You can then set the pointer to the type-confused NetConnection to this address, and create a fake NetConnection object that is backed by the BitmapData pixels.
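The 2880x91 dimensions are easier to appreciate with the arithmetic written out. The sketch below assumes four bytes per pixel (one each for A, R, G and B) and treats the whole 32-bit address space as the denominator, which overstates what a user process can actually map; the variable names are only for illustration.

```python
BYTES_PER_PIXEL = 4          # assumption: one byte each for A, R, G, B
WIDTH, HEIGHT = 2880, 91
MIB = 1 << 20

buffer_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL   # 1,048,320 bytes
slack_bytes = MIB - buffer_bytes                  # 256 bytes short of 1 MiB

spray_bytes = 256 * MIB                           # 256 buffers at a 1 MiB stride
covered_fraction = spray_bytes / 2**32            # 0.0625 of a 32-bit address space
```

Each pixel buffer is just under 1 MiB, which fits the observation that the allocations end up 1 MB aligned, and 256 of them span 256 MiB of contiguous-looking candidates for a guessed address.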

From here, it is pretty easy to control the instruction pointer, as the NetConnection object starts with a vtable and NetConnection.addHeader calls an object method immediately-- so long as you’re happy with it pointing somewhere ARGB valid.

ARGB is a scheme for specifying the color of transparent pixels. The A is for alpha and is a one byte value that represents how transparent the bitmap is (255 means it’s opaque and 0 means it’s transparent). The RGB represents standard one byte values containing the intensity of the colours red, green and blue. Any value is theoretically ARGB valid, but when Flash stores pixel values, it corrects for transparency. So if a pixel is set to 0xbbff0000 (alpha=0xbb, red=0xff), Flash calculates:

   red = red * alpha / 255

And the red value is stored as 0xbb (alpha is never altered). So the subsequent bytes in each four-byte pixel can never be larger than the MSB. It is possible to specify an arbitrary four-byte value if it’s not aligned (the alpha of one pixel is the LSB, and the RGB of the next pixel, with an alpha of 0xff are the remaining bytes), but it is not possible to specify two arbitrary four-byte values in a row, nor any arbitrary aligned values.
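The constraint and the unaligned workaround can be checked with a small model. This is a sketch under two assumptions that are not stated above: the division rounds down, and each pixel is stored in memory as a little-endian 32-bit ARGB word. All function names are hypothetical.

```python
import struct


def premultiply(argb: int) -> int:
    """Scale R, G and B by alpha, as in red = red * alpha / 255 (rounding down)."""
    a = (argb >> 24) & 0xFF
    r, g, b = ((argb >> shift) & 0xFF for shift in (16, 8, 0))
    r, g, b = (c * a // 255 for c in (r, g, b))
    return (a << 24) | (r << 16) | (g << 8) | b


def is_argb_valid(value: int) -> bool:
    """True if no colour byte exceeds the alpha byte, so the value can be stored."""
    a = (value >> 24) & 0xFF
    return all(((value >> shift) & 0xFF) <= a for shift in (16, 8, 0))


def unaligned_pixels(target: int):
    """Two pixels whose stored bytes contain `target` at byte offset 3."""
    first = (target & 0xFF) << 24          # alpha = LSB of target, colours zero
    second = 0xFF000000 | (target >> 8)    # alpha 0xff, colours = upper three bytes
    return first, second


def read_u32(pixels, offset: int) -> int:
    """Read a little-endian 32-bit value from the stored pixel bytes."""
    raw = b"".join(struct.pack("<I", premultiply(p)) for p in pixels)
    return struct.unpack_from("<I", raw, offset)[0]


stored = premultiply(0xBBFF0000)           # 0xbbbb0000, matching the example above
pair = unaligned_pixels(0x12345678)
recovered = read_u32(pair, 3)              # 0x12345678, although it is not ARGB valid
```

Because the second pixel must keep an alpha of 0xff, the four bytes that follow the unaligned value are not free, which is why two arbitrary values in a row are out of reach.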

Since the instruction pointer does not need to be aligned, it is easy enough to move it to an arbitrary location by making a fake vtable in the BitmapData object that points to the desired IP location. But where to point it?

There are a few methods in the NetConnection object that can be used as info leaks. I started by trying to use the nearNonce getter, but this method checks that the NetConnection is connected with certain properties before leaking memory, so it is only useful for reading memory locations where a value roughly 200 bytes above the value is a valid pointer that the method can read. Instead, I used the nearID getter. This method returns a string at an address that can be specified in the BitmapData buffer, so long as a few other values in the buffer are set so that connectivity checks succeed. The problem with this info leak is that it treats the memory it reads as UTF-8 values when it creates the string it returns. If a character it reads is not a valid UTF-8 value, it ignores it and goes to the next value; if it is an unknown glyph, it appends 0xfffd to the string; and if it is a valid character, it is converted to its UTF-16 equivalent. Practically, this means that the info leak can only reliably return the value at an address if it is an ASCII string.

Even worse, the location of the pointer the info leak reads needs to be aligned, meaning that it can only read locations with addresses that are valid ARGB pixels and point to valid ASCII strings. For a library at a higher address, say around 0xb0000000, that means only about 70% of the address space is addressable, and this goes down for lower addresses. But probabilistically speaking, there’s usually going to be something like that every time you load libc, right?

Running strings on libc, there are about 200 strings that are at least 15 characters long and appear only once in libc and no other library. Creating a table of these with their corresponding offsets from the base of libc, and using the info leak to look for ASCII strings that match these, it’s possible to find the address of libc fairly reliably, though sometimes it takes a long time to run. There’s a bit of a tradeoff here. The lower the address you start with, the more likely you are to find libc, but it takes longer, and the browser eventually prompts the user to stop execution. If you start with a higher address it runs faster (most of the good strings are near the end of the library), but it’s more likely to not find libc and crash when it hits the end of mapped memory.
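The table lookup itself is simple. The sketch below is a Python stand-in for the idea, not the code from the exploit: `unique_strings` imitates `strings` on a byte image and keeps only strings that occur once, and `base_from_leak` turns a matched string and the address it was read from into a base address. Both names are hypothetical, and the sketch only checks uniqueness within one image, not against other libraries.

```python
import re
from collections import Counter
from typing import Dict, Optional


def unique_strings(image: bytes, min_len: int = 15) -> Dict[bytes, int]:
    """Map each printable ASCII run of at least min_len bytes that occurs once to its offset."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    matches = [(m.group(), m.start()) for m in pattern.finditer(image)]
    counts = Counter(text for text, _ in matches)
    return {text: offset for text, offset in matches if counts[text] == 1}


def base_from_leak(leaked: bytes, address: int, table: Dict[bytes, int]) -> Optional[int]:
    """Return the image base if the leaked string is in the table, else None."""
    offset = table.get(leaked)
    return None if offset is None else address - offset
```

A string that appears more than once is dropped from the table because a match on it would not identify a single offset.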

With the address of libc, it’s easy to point the instruction pointer to a ROP gadget and use it to call system. I picked one that allows both the pointer to system and the pointer to its parameter to be unaligned so that they don’t have to be ARGB valid. The contents of the string do have to be ARGB valid though, which is why my exploit spawns ghex, which ends with a letter with higher ASCII value than all the others. With a clever use of spacing, it should be possible to run a reasonable set of commands though, and worst case, you could put the command in a string on the heap, and use the info leak again to search for it.

This exploit works, but suffers from a few problems:

  • It’s not 100% reliable. While it works fairly consistently, there are two chances for a pointer to “miss”: when the BitmapData objects are allocated, and when setting the pointer to scan for libc.
  • It’s limited to 32-bit platforms. I think this would work on 32-bit Windows with a few changes (in particular, scanning for libc wouldn’t work, but you could probably scan the heap for a relevant value instead), but it definitely wouldn’t work on 64-bit, as the controllable portion of the native data pointer is only 32 bits.
  • The swf is bulky and takes a long time to run.

The Exploit Kits

CVE-2015-0336 was patched by Adobe on March 12, 2015 and an exploit for the bug was discovered in the Nuclear Exploit Kit by March 19. A day later, identical exploit code turned up in another exploit kit, Angler, and a few other exploit kits added the code later on (you can see more details here). The exploit initially surfaced less than a week after the bug was fixed, and before the details of the bug were made public in the Project Zero tracker, meaning there are two likely ways the authors of the exploit kit could have gained knowledge of the bug. The first is through reverse engineering the patch, which is possible, but seems unlikely for this specific bug in this time frame. The second, more likely option is that the exploit kit authors had previously discovered the issue.

This bug was fixed by updating the “normal check”, the function that is used to verify that an object is of type Object. The check was updated to verify that the native function pointer of the object is null, meaning that the object is not a native function. The fix affected all calls to the check, not just the one in the NetConnection class. Moreover, the March 12 update added several more normal checks to address CVE-2015-0334. So, in order to determine the issue via reverse engineering, the authors would have had to realize that the change in the normal check function was not related to the additional calls to the function that were added in the patch. They would have also had to identify NetConnection as the class containing the vulnerability the patch intended to fix, even though this class was not modified in the patch, and even though many other classes contain similar checks. In addition, the method for creating an object that violates the condition of the check (calling ASnative to create a native function object and setting its __proto__ to a new NetConnection) isn’t particularly intuitive, and would take substantial time to figure out.
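The change to the normal check can be pictured as one extra condition. This is a sketch based only on the description above, with hypothetical names; the real check is native code inside Flash.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScriptObject:
    type_name: str
    native_function: Optional[int] = None   # non-null only for native function objects


def normal_check_before(obj: ScriptObject) -> bool:
    """Pre-patch model: any object of type Object is accepted."""
    return obj.type_name == "Object"


def normal_check_after(obj: ScriptObject) -> bool:
    """Post-patch model: native function objects are no longer accepted."""
    return obj.type_name == "Object" and obj.native_function is None


native_fn = ScriptObject("Object", native_function=2100)
plain = ScriptObject("Object")
```

Here `native_fn` satisfies the old check but not the new one, while `plain` satisfies both, so every caller of the check is protected without any change to NetConnection itself.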

What seems more likely is that the exploit authors already had knowledge of the bug through their own independent research, and its being patched caused them to alter their deployment strategy to include exploit kits. In this case, patching the bug likely prevented a lot of attacks.

Decompiling the sample swf, the exploit uses both AS2 and AS3, and AS2 is limited to a single class:

   var _loc2_ = _global.ASnative(2100,438181888);
   var _loc3_ = new Object();
   _loc2_.__proto__ = _loc3_;
   _global.ASnative(2100,200)(_loc3_); //NetConnection constructor
   _global.ASnative(2100,8).apply(_loc2_,[1]); //NetConnection.farID


While the code and the order of the calls is slightly different, this is the same type confusion bug, but they used a different NetConnection method to exploit it. My exploit called the nearID getter to read, and then the call method to set IP, but this exploit only calls the farID getter, which has an identical native implementation to the nearID getter. So how did they manage to perform an exploit with only one call?

It turns out that the near/farID getters set an internal property of the NetConnection to a pointer to a string before returning the value. So calling near/farID on a type-confused pointer will, most of the time, cause a value near that pointer to be overwritten with a large value. A few broad conditions about the surrounding memory, such as certain values not being zero, need to hold for this to work, though.

This is sufficient to use a common exploitation method involving corrupting the length of a vector. I won’t go into a lot of detail about this, as there is already a great blog entry describing it. The basic idea is that a large number of Vectors are allocated on the heap, and then the memory corruption is used to increase the length of one Vector, which is then used to increase the length of another even further to 0x7fffffff, which then makes the entire memory space readable and writable to the attacker. They then read an object value stored in the Vector to determine the location of a library to bypass ASLR, and then overwrite the vtable of a different object (which happens to be a FileReference in this case) to set IP.
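The reason a single corrupted length is enough can be seen in a toy model. The sketch below is Python, not AS3, and its layout is an assumption made for illustration: each vector is a 32-bit length field followed directly by its 32-bit elements, and the vectors sit back to back in one buffer. `ToyHeap` and `corrupt_length` are hypothetical names, and the pointer-sized value is made up.

```python
import struct

HEADER = 4  # assumption: one 32-bit length field in front of the elements


class ToyHeap:
    """Fixed-size vectors of 32-bit values laid out back to back in one buffer."""

    def __init__(self, count: int, length: int):
        self.mem = bytearray()
        self.bases = []
        for _ in range(count):
            self.bases.append(len(self.mem))
            self.mem += struct.pack("<I", length) + bytes(4 * length)

    def length(self, vec: int) -> int:
        return struct.unpack_from("<I", self.mem, self.bases[vec])[0]

    def _address(self, vec: int, index: int) -> int:
        # The only safety check: the index is compared with the stored length.
        if not 0 <= index < self.length(vec):
            raise IndexError(index)
        return self.bases[vec] + HEADER + 4 * index

    def read(self, vec: int, index: int) -> int:
        return struct.unpack_from("<I", self.mem, self._address(vec, index))[0]

    def write(self, vec: int, index: int, value: int) -> None:
        struct.pack_into("<I", self.mem, self._address(vec, index), value)


def corrupt_length(heap: ToyHeap, vec: int, value: int) -> None:
    """Stands in for the stray write performed by the type-confused getter."""
    struct.pack_into("<I", heap.mem, heap.bases[vec], value)


heap = ToyHeap(count=3, length=4)
heap.write(2, 0, 0x1234)              # a value that vector 1 should never see
corrupt_length(heap, 0, 0x08A3C5D0)   # made-up pointer-sized value
heap.write(0, 4, 0x7FFFFFFF)          # one past the end: vector 1's length field
leaked = heap.read(1, 5)              # now reaches into vector 2
```

Once the bounds check trusts a corrupted length, ordinary indexed reads and writes do the rest, which is why this technique needs only one memory corruption to get started.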

This exploit is more compact and runs faster than my exploit. Though there is no good way to test this, I suspect it is more reliable as well, though it is still reported to suffer from reliability problems. There are likely four sources of unreliability in this exploit:

  • The type confused pointer might not ‘hit’ the heap spray because of unexpected heap entropy.
  • The exploit relies on every 80th vector being different and containing structures for the next stage of the exploit, which the author called the lucky Vector. If the memory corruption hits the (un)lucky Vector, the exploit does not work. This happens one in every eighty times.
  • Calling the farID getter might not succeed because a value on the heap is not correct.
  • There could be a crash during AS2 to AS3 communication due to the connection chain containing an invalid NetConnection. This is fairly unlikely to happen, as it would require Flash to idle during exploitation.

My exploit is probably similarly likely to ‘hit’ allocated memory as this one, and they both suffer similar unreliability from this source. Scanning through libc also adds a lot of unreliability to my exploit, and I suspect this greatly outweighs the other sources of unreliability in the exploit kit, as they all seem fairly unlikely to happen.

The exploit kit is for 32-bit Windows only, but I suspect this reflects the attackers’ motives rather than an inability to get it to work on Linux. There’s no reason this wouldn’t work on 32-bit Linux if the pointers were updated to have the correct values.


Conclusion

Unsurprisingly, the exploit kits, which were intended for malicious use on a broad scale, contained a more reliable method of exploiting CVE-2015-0336 than my proof-of-concept. It’s especially interesting how they used the bug to corrupt memory when it could also be used to read memory and set IP, which counterintuitively led to a more reliable exploit. While there are a lot of sources of unreliability in the exploit kit, it’s likely that only two major ones cause the majority of failures. Avoiding the actions that are most likely to cause failure is what makes the exploit kit more reliable.

Bug collision is one way that Project Zero measures its success because collision disrupts vulnerabilities that are used by sophisticated attackers. Due to the speed at which the exploit kits released an exploit for CVE-2015-0336 and the difficulty of determining this bug through reverse engineering, we believe it was a case of bug collision. In this case, fixing this bug likely prevented its continued zero-day use by attackers.