Claude Code Sparks Debate Over Best Programming Language for AI Development
Photo by Ling App (unsplash.com/@lingapp) on Unsplash
Claude Code implemented a simplified Git in 13 languages, and Ruby, Python, and JavaScript proved the fastest, cheapest, and most stable, while statically typed languages ran 1.4–2.6× slower and cost more, according to a recent report.
Key Facts
- Key product: Claude Code
Claude Code’s “mini‑git” benchmark was designed to strip away the hype surrounding language‑level typing and focus on raw productivity for AI‑driven code generation. The experiment, posted by Yusuke Endoh on March 5, asked Claude Opus 4.6 to implement a stripped‑down version of Git in 13 languages, then run two test suites (v1 and v2) that cover core commands such as init, add, commit, log, status, diff, checkout, and reset. Each language was invoked 20 times, and the model’s output was measured for execution time, token cost, lines of code (LOC), and test‑pass rate. The data show a clear hierarchy: the dynamic languages (Ruby, Python, and JavaScript) consistently outperformed their statically typed counterparts on both speed and cost while maintaining a 100% pass rate across all runs.
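The methodology described above (repeated timed invocations per language, recording wall time and pass rate) can be sketched with a minimal harness. This is an illustrative reconstruction, not Endoh's actual tooling: the command, trial count, and "exit code 0 means all tests passed" convention are all assumptions for the sketch.

```python
import statistics
import subprocess
import sys
import time

def benchmark(cmd, trials=20):
    """Time `cmd` over `trials` runs and record its pass rate.

    `cmd` is a stand-in for launching one language's mini-git
    implementation against a test suite (hypothetical interface);
    exit code 0 is treated here as "all tests passed".
    """
    times, passes = [], 0
    for _ in range(trials):
        start = time.perf_counter()
        result = subprocess.run(cmd, capture_output=True)
        times.append(time.perf_counter() - start)
        passes += result.returncode == 0
    return {
        "mean_s": statistics.mean(times),
        "stdev_s": statistics.stdev(times),   # run-to-run stability
        "pass_rate": passes / trials,
    }

# A no-op subprocess stands in for a real implementation here.
stats = benchmark([sys.executable, "-c", "pass"], trials=5)
print(stats["pass_rate"])  # 1.0
```

Reporting the standard deviation alongside the mean matters: as the results below show, the spread of runtimes differed between languages as much as the averages did.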
Ruby topped the list with an average total runtime of 73.1 seconds and a token cost of $0.36 for the full two‑phase implementation. Python followed closely at 74.6 seconds and $0.38, and JavaScript lagged only slightly behind at 81.1 seconds and $0.39. All three languages produced compact code (≈219–248 LOC) and exhibited low standard deviations, indicating stable performance across trials. By contrast, the fastest statically typed language, Go, required 101.6 seconds on average and cost $0.50, and its run‑to‑run spread ballooned to ±37 seconds, suggesting that the model’s output quality is more sensitive to the stricter type system. Java and Rust fell further behind, with runtimes exceeding 110 seconds and costs approaching $0.54–$0.55, while Rust even produced two failed test runs out of 20, one of which the model dismissed as “the tests are wrong,” a classic hallucination.
The “dynamic + type‑checker” category—Python with mypy and Ruby with Steep—revealed the hidden overhead of explicit type annotation. Python/mypy’s average time rose to 125.3 seconds and cost $0.57, while Ruby/Steep slowed dramatically to 186.6 seconds and $0.84. These figures underscore the trade‑off between type safety and generation efficiency: the extra tokens required to emit and verify type signatures translate directly into higher compute bills and longer turnaround. Even languages that are traditionally praised for concise syntax, such as OCaml and Haskell, did not escape this penalty. OCaml’s runtime sat at 128.1 seconds with $0.58 cost, and Haskell, despite producing a relatively small codebase (224 LOC), took 174 seconds and $0.74, with one test failure out of 40 runs.
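The headline 1.4–2.6× slowdown range follows directly from these reported means, taking Ruby's 73.1 seconds as the baseline, Go as the fastest statically typed entry, and Ruby/Steep as the slowest entry overall:

```python
# Reported mean runtimes (seconds) from the mini-git benchmark.
ruby = 73.1         # fastest entry overall
go = 101.6          # fastest statically typed entry
ruby_steep = 186.6  # slowest entry (Ruby checked with Steep)

print(round(go / ruby, 2), round(ruby_steep / ruby, 2))  # 1.39 2.55
```

Rounded to one decimal place, those ratios are the 1.4× and 2.6× endpoints quoted in the headline.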
The correlation between cost and time is striking. Across the board, higher token consumption maps directly to longer execution windows, a relationship that mirrors the pricing model of Claude’s API. The experiment’s cost metric, derived from the number of tokens processed, ranges from $0.36 for plain Ruby to $0.84 for Ruby/Steep, a more than two‑fold difference. This suggests that developers leveraging AI coding assistants should weigh the marginal safety gains of static typing against the tangible expense of longer, more token‑heavy sessions. Moreover, the LOC metric proves a poor predictor of efficiency: OCaml and Haskell generate the fewest lines yet sit in the mid‑range of time and cost, while C, with a sprawling 517 LOC, is among the most expensive at $0.74 and among the slowest at 155.8 seconds.
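That relationship can be checked numerically from the figures quoted in this article. A Pearson correlation over the reported (time, cost) pairs (Java and Rust omitted, since only ranges are given for them) comes out close to 1:

```python
import math

# Per-language means as reported above: Ruby, Python, JavaScript, Go,
# Python/mypy, OCaml, C, Haskell, Ruby/Steep.
times = [73.1, 74.6, 81.1, 101.6, 125.3, 128.1, 155.8, 174.0, 186.6]  # seconds
costs = [0.36, 0.38, 0.39, 0.50, 0.57, 0.58, 0.74, 0.74, 0.84]        # USD

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs)
                     * sum((y - my) ** 2 for y in ys))
    return cov / norm

print(round(pearson(times, costs), 2))  # close to 1: cost tracks time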
Endoh’s findings arrive amid a broader conversation about AI‑augmented software development. The Register has highlighted Claude’s role as a co‑creator of the experimental language Elo, and VentureBeat notes that Claude Code can cost up to $200 per month—prices that quickly add up when scaling to large codebases. The benchmark therefore provides a data‑driven counterpoint to the qualitative arguments that dominate the debate: “Static typing prevents AI hallucination bugs” versus “Skipping type annotations saves tokens.” In practice, the numbers show that dynamic languages not only avoid hallucinations (zero failures across 600 runs) but also deliver the fastest, cheapest, and most reliable output when paired with a large‑scale LLM like Claude Opus 4.6.
Sources
No primary source found (coverage-based)
- Dev.to AI Tag
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.