Apple's M4 Neural Engine Unveiled: Engineers Reverse‑Engineer Core AI Chip Part 1
Photo by BoliviaInteligente (unsplash.com/@boliviainteligente) on Unsplash
While analysts expected Apple's M4 to be a modest upgrade, Maderix reports that its reverse-engineering team has already bypassed CoreML to talk directly to the chip, exposing the Neural Engine's inner workings.
Key Facts
- Key company: Apple
Maderix's reverse-engineering team has mapped the full software stack that sits between Apple's CoreML framework and the silicon-level driver for the M4 Neural Engine (codenamed H16G). By interrogating the AppleNeuralEngine.framework binary with dyld_info and its Objective-C metadata dump (-objc), the researchers identified the hidden ANEClient API and reconstructed the in-memory MIL (Machine Learning Intermediate Language) compilation pipeline that Apple uses to translate a CoreML model into a hardware-ready graph. According to the Maderix report, this pipeline was previously reachable only through the opaque CoreML runtime, which adds multiple optimization passes and substantial latency overhead before the graph reaches the ANE [Maderix].
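As a rough illustration of this kind of binary interrogation, the snippet below pulls Objective-C class names out of a dyld_info-style metadata dump. The sample text and the parsing logic are this article's illustration, not Maderix's tooling: real `dyld_info -objc` formatting varies by macOS version, and the `_ANE`-prefixed class names shown come from public community notes on the framework rather than the Maderix report.

```python
import re

# Illustrative only: the sample mimics the general shape of an Objective-C
# metadata listing; it was not captured from a real M4 machine.
SAMPLE = """\
@class _ANEClient
@class _ANEModel
@class _ANEDeviceInfo
"""

def objc_classes(text):
    """Return every @class name found in a metadata dump."""
    return re.findall(r"@class\s+(\w+)", text)

# Filter for the Neural Engine's private classes.
ane_classes = [c for c in objc_classes(SAMPLE) if c.startswith("_ANE")]
```

On a macOS machine, the same filter could be applied to the live output of `dyld_info -objc` against the framework binary.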
The team then leveraged the IOKit kernel driver to inject compiled graphs directly into the ANE's execution queue, bypassing CoreML entirely. Their experiments show that the M4's ANE comprises 16 fixed-function cores with a queue depth of 127 concurrent evaluation requests and independent dynamic voltage/frequency scaling (DVFS) per core. When idle, the engine's hard power-gating reduces consumption to zero milliwatts, a detail absent from Apple's public documentation. By measuring raw throughput on a synthetic 1×1 convolution workload, Maderix found that Apple's advertised "38 TOPS" figure is inflated by roughly 30% once CoreML's overhead is stripped away, confirming earlier community observations that the ANE's real peak performance falls short of the marketing spec [Maderix].
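A back-of-envelope calculation makes the throughput claim concrete. The sketch below assumes "inflated by roughly 30%" means the advertised figure is 1.3× the measured peak (one plausible reading of the report), and uses hypothetical dimensions for the synthetic 1×1 convolution; neither the benchmark shape nor the latency figure comes from Maderix.

```python
# Back-of-envelope check on the "38 TOPS" claim.
# Assumption (ours, not the report's): advertised = measured * 1.30.
ADVERTISED_TOPS = 38.0
INFLATION = 0.30

measured_tops = ADVERTISED_TOPS / (1 + INFLATION)

def conv1x1_ops(h, w, c_in, c_out):
    """Total ops for a 1x1 convolution: 2 ops (multiply + add) per MAC."""
    return 2 * h * w * c_in * c_out

# Hypothetical benchmark dimensions, chosen only for illustration.
ops = conv1x1_ops(256, 256, 512, 512)
ideal_latency_s = ops / (measured_tops * 1e12)

print(f"measured peak ≈ {measured_tops:.1f} TOPS, "
      f"ideal 1x1-conv latency ≈ {ideal_latency_s * 1e3:.2f} ms")
```

Under this reading, the hardware's real peak would sit near 29 TOPS rather than 38; the alternative reading (measured = advertised × 0.70) would put it closer to 26.6.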
Beyond benchmarking, the researchers achieved a first-of-its-kind training run on the ANE. Historically, the accelerator has been marketed as inference-only, with Apple's own reference implementation for transformers emphasizing channel-first data layouts and 1×1 convolutions to suit the engine's fixed-function datapaths [Maderix]. By feeding a small back-propagation loop into the direct-access API, the team demonstrated that weight updates can be performed on-chip, albeit more slowly than inference. This breakthrough suggests that future iOS and macOS updates could expose a training-capable API, a possibility hinted at in Apple's iOS 26.4 beta release notes that mention "AI-enhanced on-device learning" [TechCrunch].
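The report does not publish the training code, but the general shape of such a loop is straightforward. The sketch below runs a plain-NumPy SGD update on a channel-first 1×1 "convolution" (mathematically a matmul over channels) purely to illustrate the kind of workload involved; it runs on the CPU and makes no claim about Maderix's actual direct-access API, which is not public.

```python
import numpy as np

# Toy back-propagation loop in the channel-first, 1x1-conv style the ANE
# favors. All sizes are illustrative.
rng = np.random.default_rng(0)
C_IN, C_OUT, N = 8, 4, 16                    # toy channel/pixel counts
X = rng.standard_normal((C_IN, N))           # activations, channels-first
W_true = rng.standard_normal((C_OUT, C_IN))  # target weights
Y = W_true @ X                               # "labels" from the true layer

W = np.zeros((C_OUT, C_IN))                  # weights being trained
lr = 0.1
for _ in range(3000):
    pred = W @ X                             # forward: 1x1 conv as matmul
    grad = (pred - Y) @ X.T / N              # backward: MSE gradient
    W -= lr * grad                           # SGD weight update

max_err = np.abs(W - W_true).max()           # should approach zero
```

The interesting part of Maderix's result is not the math, which is ordinary SGD, but that the forward and backward matmuls reportedly ran on the ANE's fixed-function cores.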
Maderix's work builds on a foundation laid by earlier community efforts. Matthijs Hollemans' hollance/neural-engine repository catalogued ANE behavior and supported operations, while the mdaiter/ane project provided Python and Objective-C samples that interacted with the ANECompiler framework. The Asahi Linux project's reverse-engineered Linux driver also shed light on the kernel-level interface [Maderix]. However, none of these prior efforts achieved the three milestones Maderix claims: (a) direct ANEClient API access on the M4, (b) full reconstruction of the MIL compilation path, and (c) empirical measurement of peak throughput without CoreML's abstraction layer [Maderix].
The implications for developers are immediate. With the ability to compile and dispatch neural-graph binaries directly, third-party toolchains could bypass CoreML's opaque optimizer, enabling custom quantization schemes or operator mixes that Apple's public SDK does not support. Moreover, the disclosed queue depth of 127 suggests that high-throughput batch processing—previously thought impractical on a mobile accelerator—may be viable for edge-AI workloads such as real-time video analytics or on-device recommendation engines. Apple's upcoming iOS 26.4 beta already introduces AI-driven features like music playlists and video podcasts, underscoring the company's push to embed more sophisticated models on-device [TechCrunch].
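For example, a toolchain with direct graph dispatch could apply its own quantization before emitting the hardware graph. The sketch below shows a generic symmetric per-tensor int8 scheme; it is this article's illustration of the concept, not a scheme Apple or Maderix describes.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: one scale for the tensor."""
    scale = max(float(np.abs(w).max()), 1e-12) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return q.astype(np.float32) * scale

# Round-trip a small weight vector and measure the quantization error.
w = np.linspace(-1.0, 1.0, 9, dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = np.abs(w - w_hat).max()  # bounded by scale / 2
```

A per-channel variant, or an asymmetric scheme with a zero point, would follow the same pattern; the point is that direct dispatch would let the toolchain, not CoreML's optimizer, decide.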
While Maderix’s findings are technically robust, they also raise security considerations. Direct access to the ANE circumvents Apple’s sandboxed execution model, potentially exposing a new attack surface for malicious code that could hijack the accelerator’s power‑gating or DVFS controls. Apple has not publicly responded to the reverse‑engineering effort, but the company’s history of restricting low‑level hardware access suggests that future firmware updates may close the uncovered pathways. For now, the M4 Neural Engine’s inner workings are no longer a black box, and the community can begin to explore the true capabilities—and limits—of Apple’s most advanced on‑device AI accelerator.
Sources
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.