Someone just bypassed Apple's guardrails to train models on the Neural Engine.

The Neural Engine (ANE) inside every M-series Mac was designed for inference. Run models, don't train them. No public API, no documentation, and certainly no backpropagation. A researcher reverse-engineered the private APIs anyway and built a transformer training loop that runs forward and backward passes directly on the ANE hardware.

The method bypasses CoreML entirely. Instead of using Apple's official tools, the project constructs programs in MIL (Model Intermediate Language), compiles them in-memory using undocumented `_ANEClient` APIs, and feeds data through IOSurface shared memory buffers. Weights get baked into the compiled programs as constants.

Each training step dispatches six custom kernels: attention forward, feedforward forward, then four backward passes that compute gradients with respect to inputs. Weight gradients still run on the CPU using Accelerate's matrix libraries, but the heavy lifting (matrix multiplies, softmax, activation functions) happens on the ANE.

This makes three things possible that weren't before:

1. Training small models locally without burning through your battery
2. Fine-tuning on-device without sending data to a server or spinning up the GPU
3. Research into what the ANE hardware can actually do when you ignore Apple's guardrails

If this approach scales, the next wave of on-device AI stops being about running someone else's frozen model.
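
To make the split between input gradients and weight gradients concrete, here is a minimal sketch for a single linear layer `y = x @ w`. It is not the project's code: plain Python stands in for both the ANE kernels and Accelerate, and every function name (`forward`, `input_grad`, `weight_grad`, `sgd_step`) is a hypothetical label chosen for illustration. The only thing it shows is the standard math: the backward pass needs `dx = dy @ w^T` to keep propagating gradients, while `dw = x^T @ dy` is a separate matrix product that can be computed elsewhere.

```python
# Illustrative sketch only: plain Python stands in for the ANE and for Accelerate.
# All function names here are hypothetical, not taken from the project.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def forward(x, w):
    # Forward pass: y = x @ w (the kind of matmul the post places on the ANE).
    return matmul(x, w)

def input_grad(dy, w):
    # Gradient with respect to the input: dx = dy @ w^T.
    # This is what a backward kernel must produce to continue backpropagation.
    return matmul(dy, transpose(w))

def weight_grad(x, dy):
    # Gradient with respect to the weights: dw = x^T @ dy.
    # The post says this part runs on the CPU.
    return matmul(transpose(x), dy)

def sgd_step(w, dw, lr):
    # Plain gradient-descent update; the optimizer choice is an assumption.
    return [[wij - lr * gij for wij, gij in zip(w_row, g_row)]
            for w_row, g_row in zip(w, dw)]

x = [[1.0, 2.0]]                   # one input row, two features
w = [[0.5, -1.0], [0.25, 2.0]]     # 2x2 weight matrix
dy = [[1.0, 1.0]]                  # upstream gradient arriving from the next layer

y = forward(x, w)
dx = input_grad(dy, w)
dw = weight_grad(x, dy)
w_new = sgd_step(w, dw, lr=0.5)
```

Because the post says weights are baked into the compiled programs as constants, an updated `w_new` would presumably have to reach the ANE through a fresh compile; how the project handles that is not described here.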