Moore Threads GPU: The China Domestic GPU That Now Powers Pony.ai and Offers 30-Day Free Trials for AI Founders

You are an AI founder building the next generation of autonomous systems, large language models, or computer vision applications. Your biggest bottleneck is not code. It is compute.

NVIDIA GPUs are expensive, supply-constrained, and increasingly caught in geopolitical crosscurrents. The search for alternatives is no longer theoretical—it is a business imperative.

In February 2026, Pony.ai, the world’s first publicly traded Robotaxi company, made a decision that signals a fundamental shift in the AI hardware landscape. It partnered with Moore Threads to power its L4 autonomous driving training and simulation workloads using domestic Chinese GPUs.

This is not a pilot project. Pony.ai’s world model generates over 10 billion kilometers of test data weekly. That scale of compute demand is now running, in part, on Moore Threads hardware.

For AI founders, startup builders, and entrepreneurs, this is the moment to evaluate whether the Moore Threads GPU ecosystem has reached production readiness. This article gives you the facts: what works, how to access it for free, what it costs, and why other founders are already switching.

Why AI Founders Are Looking at Moore Threads GPU in 2026

The supply chain for AI accelerators has become a strategic vulnerability. Founders who relied on a single source now face unpredictable lead times, price spikes, and geopolitical uncertainty.

Moore Threads offers a domestic alternative that has already passed the most important test: real-world adoption. Pony.ai did not sign a marketing agreement. It deployed MTT S5000 cards into its world model training pipeline, the same pipeline that generates 10 billion kilometers of simulated driving data every week.

Zhang Jianzhong, Moore Threads founder and CEO, explained the partnership in a February 2026 statement: “This marks the first deep collaboration between domestic full-function GPU computing power and leading autonomous driving algorithms”.

For AI founders, this means the hardware has been vetted by a company that cannot afford to experiment. Pony.ai’s L4 autonomous driving stack runs on these cards. Your inference pipeline can too.

The financial reality also matters. Moore Threads reported 1.45 billion to 1.52 billion yuan in revenue for 2025, representing 230.7% to 246.7% year-over-year growth. The company listed on Shanghai’s STAR Market in December 2025, with its share price peaking above 900 yuan, establishing itself as the first publicly traded domestic GPU company in China.

Moore Threads GPU: The China Domestic GPU That Now Powers Pony.ai and Offers 30-Day Free Trials for AI Founders. InfoPinky.com

The Hardware That AI Founders Can Actually Buy

Moore Threads has shipped multiple GPU generations, with products spanning consumer, professional, and datacenter markets.

Product | Market | Key specs | Price (as of March 2026) | Availability
MTT S80 | Consumer | 14.4 TFLOPS FP32, 16GB GDDR6, PCIe 5.0 | 1,349 yuan (~$185) | JD.com, in stock
MTT S5000 | Datacenter | 1000 TFLOPS FP8, 80GB memory, 1.6 TB/s bandwidth | Contact sales | Enterprise channel
Kua’e Cluster | Datacenter | 10 EFLOPS, up to 1024 cards | Contact sales | Enterprise channel

MTT S80: The Consumer Card That Runs AI Locally

The MTT S80 is the entry point. Priced at 1,349 yuan (approximately $185) during current JD.com promotions, it runs on any standard PC with PCIe 5.0 support. It features:

  • 4,096 MUSA cores
  • 16GB GDDR6 memory
  • 448 GB/s memory bandwidth
  • 1.8 GHz core clock
  • 14.4 TFLOPS single-precision compute
  • PCIe 5.0 x16 interface (first in the industry)

The PCIe 5.0 x16 interface offers roughly 128 GB/s of combined bidirectional bandwidth, double that of PCIe 4.0 x16. For AI inference workloads, the main benefit is faster transfers between CPU and GPU, such as loading model weights or streaming large batches, rather than lower per-request latency.
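The 128 GB/s figure can be sanity-checked from the PCIe 5.0 signaling rate. The sketch below is back-of-envelope arithmetic under stated assumptions (32 GT/s per lane, 128b/130b line encoding, no other protocol overhead), so real transfers will be somewhat slower. The 14 GB payload is a hypothetical example, roughly a 7B-parameter model in FP16.

```python
# Back-of-envelope PCIe bandwidth from the signaling rate.
# Assumptions: 32 GT/s per lane for PCIe 5.0 (16 GT/s for 4.0), 128b/130b
# encoding, and no protocol overhead beyond encoding (real throughput is lower).

def pcie_gbytes_per_sec(gt_per_s: float, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s."""
    encoding_efficiency = 128 / 130
    bits_per_s = gt_per_s * 1e9 * lanes * encoding_efficiency
    return bits_per_s / 8 / 1e9

def transfer_seconds(gigabytes: float, gbytes_per_sec: float) -> float:
    """Time to move a payload at a given sustained bandwidth."""
    return gigabytes / gbytes_per_sec

one_way = pcie_gbytes_per_sec(32.0, 16)  # PCIe 5.0 x16, one direction
both_ways = 2 * one_way                  # combined figure quoted as ~128 GB/s
pcie4 = pcie_gbytes_per_sec(16.0, 16)    # PCIe 4.0 x16, for comparison

print(f"PCIe 5.0 x16: {one_way:.1f} GB/s per direction, {both_ways:.1f} GB/s total")
# Hypothetical payload: 14 GB of weights copied from host memory to the card.
print(f"14 GB upload: {transfer_seconds(14, one_way):.2f} s on PCIe 5.0, "
      f"{transfer_seconds(14, pcie4):.2f} s on PCIe 4.0")
```

Under these assumptions the faster link mainly shortens one-off copies such as weight loading; small per-token transfers during generation gain comparatively little.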

Independent developers have used the MTT S80 to run distilled DeepSeek-R1 models (such as the 7B Qwen-based distill) locally; the full 671B model is far too large for a single 16GB card. The 16GB GDDR6 memory is sufficient for many inference workloads, while the 14.4 TFLOPS compute provides adequate performance for development and testing.
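Whether 16GB is "sufficient" depends mostly on parameter count and precision. Here is a rough sizing sketch, assuming weights dominate memory, an arbitrary 20% reserve for KV cache and activations, and a memory-bandwidth-bound decode loop using the S80's 448 GB/s. The model sizes are generic examples, not a list of models tested on the card.

```python
# Rough sizing for local inference on a 16 GB card such as the MTT S80.
# Assumptions (illustrative, not measured): weights dominate memory, 20% of
# VRAM is reserved for KV cache and activations, and decoding is
# memory-bandwidth bound, so each generated token reads every weight once.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight footprint in GB."""
    return params_billions * BYTES_PER_PARAM[dtype]

def fits(params_billions: float, dtype: str, vram_gb: float = 16.0,
         headroom: float = 0.2) -> bool:
    """True if the weights fit while leaving `headroom` of VRAM free."""
    return weight_gb(params_billions, dtype) <= vram_gb * (1 - headroom)

def decode_ceiling_tok_s(params_billions: float, dtype: str,
                         bandwidth_gb_s: float = 448.0) -> float:
    """Upper bound on single-stream decode speed: bandwidth / bytes per token."""
    return bandwidth_gb_s / weight_gb(params_billions, dtype)

for size, dtype in [(7, "fp16"), (7, "int8"), (14, "int4"), (32, "int4")]:
    print(f"{size}B {dtype}: {weight_gb(size, dtype):.1f} GB, "
          f"fits={fits(size, dtype)}, "
          f"<= {decode_ceiling_tok_s(size, dtype):.0f} tok/s")
```

By this estimate a 7B model in FP16 is tight on 16GB, while 8-bit or 4-bit quantized variants fit comfortably. The tokens-per-second values are theoretical ceilings; measured speeds will be lower.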

MTT S5000: The Datacenter Card Powering Autonomous Driving

For production workloads, the MTT S5000 is the workhorse. Built on the 4th-generation Pinghu architecture, it is the card Pony.ai deployed. Key capabilities:

  • 1000 TFLOPS FP8 compute performance
  • 80GB memory capacity
  • 1.6 TB/s memory bandwidth
  • Full support for FP8, BF16, FP16, FP32, FP64, and INT8 compute modes
  • MTTLINK 2.0 interconnect for multi-GPU scaling

The FP8 compute capability is critical. In a DeepSeek V3 671B deployment announced with SiliconFlow in January 2026, the MTT S5000 reached 4000 tokens per second in prefill and 1000 tokens per second in decode, performance the companies describe as approaching international high-end AI accelerators.
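Much of FP8's value for a model this size is memory. The arithmetic below is a weights-only estimate (it ignores KV cache, activations, and framework overhead, and assumes perfectly even sharding) of how many 80GB cards are needed just to hold DeepSeek V3's 671B parameters at each precision.

```python
import math

# Minimum number of 80 GB cards needed just to hold DeepSeek V3's weights.
# Assumptions: 671B parameters, weights only (no KV cache, activations, or
# framework overhead), and perfectly even sharding across cards.

PARAMS = 671e9
CARD_MEMORY_GB = 80

def min_cards(bytes_per_param: float) -> int:
    """Ceiling of total weight size divided by per-card memory."""
    weight_gb = PARAMS * bytes_per_param / 1e9
    return math.ceil(weight_gb / CARD_MEMORY_GB)

for name, nbytes in [("FP8", 1), ("BF16", 2)]:
    print(f"{name}: {PARAMS * nbytes / 1e9:.0f} GB of weights -> "
          f"at least {min_cards(nbytes)} cards")
```

These are floors: real deployments need extra capacity for KV cache and parallelism, but halving bytes per parameter roughly halves the hardware needed to serve the model.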

The MTT S5000 is not meant as a standalone card. It is designed for the Kua’e intelligent computing cluster, which Moore Threads says can scale to over 100,000 GPUs per cluster to support massive AI training workloads.

Kua’e Cluster: The AI Factory Infrastructure

For founders scaling beyond single-card inference, Moore Threads offers the Kua’e cluster architecture. Each node contains 8 MTT S5000 GPUs connected via MTTLINK 2.0, with RDMA FC8 fabric interconnects across racks. The cluster supports:

  • 10 EFLOPS total compute capacity
  • Up to 1024 cards per cluster
  • 95% linear scaling efficiency from 64 to 1024 cards
  • Integration with the MUSA software stack
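The node and scaling figures above translate into a quick capacity estimate. This sketch assumes 8 GPUs per node and reads "95% linear scaling efficiency from 64 to 1024 cards" as actual speedup = 0.95 × ideal speedup at 1024 cards; intermediate cluster sizes are not covered by the published figure.

```python
# Back-of-envelope Kua'e sizing using the figures listed above.
# Assumptions: 8 GPUs per node, and "95% scaling efficiency from 64 to 1024
# cards" read as: actual speedup = 0.95 * ideal speedup at 1024 cards.

GPUS_PER_NODE = 8

def nodes_needed(cards: int) -> int:
    """Ceiling division: a partially filled node still counts as a node."""
    return -(-cards // GPUS_PER_NODE)

def effective_speedup(cards: int, baseline_cards: int = 64,
                      efficiency: float = 0.95) -> float:
    """Speedup over the baseline, where efficiency = actual / ideal speedup."""
    ideal = cards / baseline_cards
    return ideal * efficiency

cards = 1024
print(f"{cards} cards -> {nodes_needed(cards)} nodes")
print(f"ideal speedup over 64 cards: {cards / 64:.0f}x, "
      f"at 95% efficiency: {effective_speedup(cards):.1f}x")
print(f"capacity lost to scaling overhead: ~{cards * 0.05:.0f} cards' worth")
```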

The Software Stack That Makes CUDA Code Portable

The biggest fear for any founder considering a GPU switch is the cost of rewriting code. Moore Threads has addressed this head-on with its MUSA platform.

MUSA (Meta-computing Unified System Architecture) is a parallel computing platform that includes a CUDA-compatible programming model, a compiler, and a set of optimized libraries. The key components for developers are:

  • MUSA Toolkit: Compiler, runtime, and GPU-accelerated libraries
  • Musify: A tool that translates existing CUDA code to MUSA format
  • MUSA Compute Libraries: cuBLAS and cuDNN equivalents
  • Neural network acceleration libraries: PyTorch and TensorFlow integrations
  • MUSA Driver: Runtime and device management

You do not rewrite your kernels by hand. You run them through Musify, then compile for Moore Threads hardware, reviewing any code the tool cannot translate automatically.
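To make "source-to-source translation" concrete, here is a deliberately tiny toy that renames CUDA runtime identifiers. It is not Musify and does not reflect how Musify works internally. It assumes the MUSA runtime mirrors CUDA runtime names with a `musa` prefix (for example `cudaMalloc` to `musaMalloc`) and a `musa_runtime.h` header; verify both against the MUSA Toolkit documentation.

```python
import re

# Toy CUDA -> MUSA renaming, for illustration only (NOT Musify).
# Assumption: MUSA runtime names mirror CUDA's with a "musa" prefix and the
# header is musa_runtime.h. A real tool must also handle libraries, headers,
# macros, and anything without a one-to-one equivalent.

RULES = [
    (re.compile(r"\bcuda_runtime\.h\b"), "musa_runtime.h"),
    (re.compile(r"\bcuda(?=[A-Z])"), "musa"),  # cudaMalloc, cudaMemcpyHostToDevice
]

def translate(source: str) -> str:
    """Apply each rename rule in order and return the rewritten source."""
    for pattern, replacement in RULES:
        source = pattern.sub(replacement, source)
    return source

cuda_src = """#include <cuda_runtime.h>
float *d_x;
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
"""
print(translate(cuda_src))
```

Simple renames like these are the easy part; the practical value of a dedicated tool lies in the cases a regex cannot handle.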

The MUSA SDK is open source and available for free download. According to Moore Threads documentation, the software stack includes:

  • Tensor engine for AI acceleration
  • Optimized libraries such as muDNN (deep-learning primitives) and MCCL (multi-GPU communication)
  • Full PyTorch compatibility

For developers, this means existing PyTorch models that use standard operations (convolutions, attention, matrix multiplies) will run through the MUSA-compatible libraries without modification. Custom CUDA kernels require translation via Musify, but the translation is automated.
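In practice, "without modification" usually means only the device string changes. The sketch below picks a device at runtime. It assumes Moore Threads' PyTorch plugin is installed as `torch_musa`, that importing it registers a `"musa"` device, and that it exposes `torch.musa.is_available()` the way PyTorch exposes `torch.cuda.is_available()`; check the torch_musa documentation for your version. `MyModel` in the comments is a hypothetical placeholder.

```python
import importlib.util

# Pick a PyTorch device string, preferring a Moore Threads GPU when present.
# Assumptions: the plugin module is named torch_musa, importing it registers
# a "musa" device, and it provides torch.musa.is_available().

def pick_device() -> str:
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch not installed; nothing to accelerate
    import torch
    if importlib.util.find_spec("torch_musa") is not None:
        import torch_musa  # noqa: F401  (side effect: registers the "musa" backend)
        musa = getattr(torch, "musa", None)
        if musa is not None and musa.is_available():
            return "musa"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

device = pick_device()
print(f"Using device: {device}")
# Typical usage with existing model code (MyModel is a hypothetical placeholder):
#   model = MyModel().to(device)
#   batch = batch.to(device)
```

Falling back to CUDA and then CPU lets the same script run unchanged on NVIDIA machines and laptops during migration.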

How to Access Moore Threads GPU for Free: A Step-by-Step Guide for AI Founders

Moore Threads runs multiple programs that give AI founders zero-cost access to hardware and software. Here is exactly how to claim them.

Step 1: Download the MUSA SDK for Free

The MUSA SDK is free. No credit card required. No time limit.

What to do:

  • Visit the Moore Threads developer portal
  • Register for a developer account
  • Download MUSA SDK 4.2.0 for Ubuntu
  • Install and run the sample examples to verify installation
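Before running the samples, a quick environment check can catch a broken install. The paths and tool names below are assumptions to verify against the SDK documentation for your version: a `MUSA_HOME` environment variable or a default install under `/usr/local/musa`, an `mcc` compiler, and an `mthreads-gmi` device monitor.

```python
import os
import shutil

# Post-install sanity check for a MUSA SDK setup.
# Assumptions (verify against your SDK docs): install root from $MUSA_HOME or
# /usr/local/musa, compiler named `mcc`, device monitor named `mthreads-gmi`.

ASSUMED_MUSA_HOME = os.environ.get("MUSA_HOME", "/usr/local/musa")
ASSUMED_TOOLS = ["mcc", "mthreads-gmi"]

def check_musa_install(home: str = ASSUMED_MUSA_HOME) -> dict:
    """Return {check_name: bool} for the install root and each expected tool."""
    report = {"musa_home_exists": os.path.isdir(home)}
    for tool in ASSUMED_TOOLS:
        # Look on PATH first, then in <home>/bin.
        on_path = shutil.which(tool) is not None
        in_home = os.path.isfile(os.path.join(home, "bin", tool))
        report[tool] = on_path or in_home
    return report

for item, ok in check_musa_install().items():
    print(f"{'OK     ' if ok else 'MISSING'} {item}")
```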

What you get:

  • Full access to the compiler, runtime, and libraries
  • The Musify migration tool
  • Sample code and documentation
  • Compute libraries (muDNN, MCCL) optimized for MTT S5000

You can evaluate the software stack, test Musify on your own CUDA code, and confirm compatibility without spending a dollar.

Step 2: Claim the 30-Day Free Trial for AI Coding Plan

On February 3, 2026, Moore Threads launched AI Coding Plan, a developer-facing suite compatible with tools like Cursor, Claude Code, and OpenCode.

The plan includes a 30-day free trial (Free Trial tier) that gives you access to:

  • Code completion and generation powered by Moore Threads backend
  • GLM-4.7 code model integration
  • SiliconFlow inference acceleration engine
  • Integration with existing development workflows
  • Performance optimized for MTT S5000 and S80 cards

How to claim:

  • Visit the AI Coding Plan product page on Moore Threads’ website
  • Click on “申请免费体验” (Apply for Free Trial)
  • Register or log in with your developer account
  • The 30-day trial activates immediately upon approval

Pricing for after the trial:

  • Lite Plan: 120 yuan/quarter (approximately $16.50) for light usage
  • Pro Plan: 600 yuan/quarter (approximately $82.50) for regular usage
  • Max Plan: 1200 yuan/quarter (approximately $165) for enterprise-level high-frequency calls
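Because the tiers are billed quarterly, a monthly view makes them easier to compare with other subscriptions. This sketch assumes an exchange rate of 7.27 yuan per US dollar, which roughly matches the dollar figures quoted above; substitute the current rate.

```python
# Convert quarterly AI Coding Plan prices into monthly figures.
# Assumption: 7.27 yuan per USD (roughly matching the quoted dollar amounts).

YUAN_PER_USD = 7.27
PLANS_YUAN_PER_QUARTER = {"Lite": 120, "Pro": 600, "Max": 1200}

def monthly(yuan_per_quarter: float) -> tuple[float, float]:
    """Return (yuan per month, USD per month)."""
    yuan = yuan_per_quarter / 3
    return yuan, yuan / YUAN_PER_USD

for plan, price in PLANS_YUAN_PER_QUARTER.items():
    yuan, usd = monthly(price)
    print(f"{plan}: {yuan:.0f} yuan/month (~${usd:.2f})")
```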

Step 3: Apply for the Lighthouse Program (Free Compute for Research)

Moore Threads operates a Lighthouse Program (灯塔计划) that provides free compute resources for research teams and qualified startups. The program is designed to accelerate development on domestic GPU platforms.

How to apply:

  • Submit a proposal through the developer portal
  • Describe your project, team, and expected compute needs
  • Highlight your research or startup status
  • If accepted, you receive access to a remote MTT S5000 cluster for development and testing

The program has supported over 200 research groups as of early 2026. Academic affiliations and open-source contributions are viewed favorably in applications.

Step 4: Join the Developer Community for Support

The Moore Threads developer community has grown substantially since the company’s December 2025 IPO. Forums, sample code repositories, and direct technical support channels are available.

You can ask questions about:

  • Porting specific models to MUSA
  • Performance optimization for MTT S5000
  • Integration with PyTorch or TensorFlow
  • Compatibility with existing CUDA codebases

The community is active and, importantly, English-friendly for international developers.

Where Moore Threads GPU Is Already Running Production Workloads

The question founders ask is not whether the hardware works in a lab. It is whether it works at scale.

Pony.ai World Model Training

Pony.ai’s partnership with Moore Threads was announced on February 6, 2026. The scope includes:

  • Training and optimization of Pony.ai’s world model (PonyWorld)
  • Deployment on Moore Threads MTT S5000 cards
  • Use of the Kua’e intelligent computing cluster
  • Validation of vehicle-end models on domestic GPU hardware

Peng Jun, Pony.ai founder and CEO, stated: “Our cooperation with Moore Threads is a true integration of ‘AI algorithms’ and ‘AI computing power.’ What we are truly building together is the infrastructure for the next generation of smart mobility and logistics”.

Pony.ai currently operates 1,159 Robotaxi vehicles and targets more than 3,000 vehicles by the end of 2026.

DeepSeek V3 671B Inference

On January 21, 2026, Moore Threads and SiliconFlow announced a successful DeepSeek V3 671B deployment on the MTT S5000.

Performance metrics:

  • Prefill throughput: 4000 tokens per second
  • Decode throughput: 1000 tokens per second
  • FP8 low-precision inference technology enabled the breakthrough
  • Performance approaches international high-end AI accelerators
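To turn these throughput figures into something closer to user-facing latency, here is a rough estimate. It assumes the 4000 tok/s prefill and 1000 tok/s decode rates apply to a single request stream in isolation. Published figures may instead be aggregate across concurrent requests, in which case per-request latency will be higher. The prompt and answer lengths are hypothetical.

```python
# Rough single-request latency from the reported throughput figures.
# Assumption: both rates apply to one request stream in isolation; if they
# are aggregate numbers, treat this as an optimistic lower bound.

PREFILL_TOK_S = 4000
DECODE_TOK_S = 1000

def request_seconds(prompt_tokens: int, output_tokens: int) -> float:
    prefill = prompt_tokens / PREFILL_TOK_S  # time to process the prompt
    decode = output_tokens / DECODE_TOK_S    # time to generate the answer
    return prefill + decode

# Hypothetical request: a 2,000-token prompt with a 500-token answer.
print(f"{request_seconds(2000, 500):.2f} s")  # prints "1.00 s"
```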

This deployment used the full, undistilled 671-billion-parameter version of DeepSeek V3, a model typically served on NVIDIA H100- or A100-class hardware. The MTT S5000 achieved this through full-stack optimization, from the driver to the operator library to the inference engine.

Wan2.1 Video Generation

A 16-card MTT S5000 cluster was shown to generate video samples at 61.8 samples per second for the Wan2.1 model—a performance level that makes real-time video generation feasible for creative applications.

GLM-4.7 Code Model Integration

Moore Threads has fully integrated GLM-4.7, the top-ranked code model on Code Arena (ahead of Gemini-3-Flash and GPT-5.2), into its AI Coding Plan. This gives developers access to state-of-the-art code generation on domestic hardware.

The One Question Every AI Founder Asks

“Can I run my existing PyTorch models without rewriting everything?”

In most cases, yes. Models built from standard PyTorch operations (convolutions, attention, matrix multiplies) run through the MUSA-compatible libraries without modification; only custom CUDA kernels need to be translated with Musify.
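A quick way to see which side of that line your codebase falls on is to look for custom CUDA code before porting. This heuristic sketch flags `.cu`/`.cuh` files and Python files that mention `CUDAExtension` or `load_inline` (both real `torch.utils.cpp_extension` APIs for building CUDA extensions). It will miss other build setups, so treat its output as a starting list, not a verdict.

```python
import os
import tempfile

# Heuristic triage: which files likely contain custom CUDA code that would
# need translation, as opposed to plain PyTorch code?

CUDA_SUFFIXES = (".cu", ".cuh")
PY_MARKERS = ("CUDAExtension", "load_inline")

def find_custom_cuda(repo_root: str) -> list[str]:
    """Return sorted paths of files that likely contain custom CUDA code."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(CUDA_SUFFIXES):
                hits.append(path)  # raw CUDA source or header
            elif name.endswith(".py"):
                with open(path, encoding="utf-8", errors="ignore") as f:
                    if any(marker in f.read() for marker in PY_MARKERS):
                        hits.append(path)  # builds a CUDA extension
    return sorted(hits)

# Demo on a throwaway directory: one raw kernel, one plain PyTorch file.
with tempfile.TemporaryDirectory() as demo:
    open(os.path.join(demo, "kernel.cu"), "w").close()
    with open(os.path.join(demo, "model.py"), "w") as f:
        f.write("import torch\n")
    print([os.path.basename(p) for p in find_custom_cuda(demo)])  # ['kernel.cu']
```

Point `find_custom_cuda` at your own repository root; an empty result suggests the model relies only on standard operations.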

The Moore Threads developer portal maintains a list of verified models, including:

  • DeepSeek-V3 (full 671B version)
  • GLM-4 series
  • LLaMA family
  • Stable Diffusion variants
  • Whisper and other speech models

If your model is not on the list, you can request verification through the support forum or apply for Lighthouse Program access to test compatibility.

“How does Moore Threads compare to NVIDIA H100?”

According to figures published in January 2026:

  • DeepSeek-V3 inference: MTT S5000 reaches 1000 tokens/sec in decode, reported as comparable to NVIDIA H20
  • SPONGE molecular simulation: 1.7x H100 performance
  • DSDP molecular docking: 8.1x H100 performance
  • Training on a 236B-parameter DeepSeek model: loss within 0.6% relative error of an H100 baseline
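You can run a similar parity check on your own port by logging losses at the same steps on both platforms and comparing them. The loss values below are hypothetical, and the 0.6% threshold simply echoes the figure above; how that figure was computed (final loss or worst step) is not stated, so this sketch uses the worst step.

```python
# Compare per-step training losses from a reference run (e.g., NVIDIA) and a
# ported run (e.g., MTT S5000). Values are hypothetical; the threshold is a
# choice, not a standard.

def max_relative_error(reference: list[float], candidate: list[float]) -> float:
    """Largest |candidate - reference| / |reference| across logged steps."""
    if len(reference) != len(candidate):
        raise ValueError("runs must have the same number of logged steps")
    return max(abs(c - r) / abs(r) for r, c in zip(reference, candidate))

reference_loss = [2.50, 2.10, 1.80, 1.60]   # hypothetical baseline losses
candidate_loss = [2.51, 2.11, 1.79, 1.605]  # hypothetical ported-run losses

err = max_relative_error(reference_loss, candidate_loss)
print(f"max relative error: {err:.2%}")
print("within 0.6%" if err <= 0.006 else "investigate before trusting the port")
```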

The MTT S5000 is not a drop-in H100 replacement across all workloads, but in specific high-value AI inference tasks, it is competitive and available.

“What happens after the 30-day free trial?”

After the AI Coding Plan free trial ends, you can continue on paid tiers starting at 120 yuan/quarter (approximately $16.50). The MUSA SDK itself remains free. For compute access, you can either purchase hardware (an MTT S80 for about $185) or apply for extended research support through the Lighthouse Program.

How to Get Started Today

Step | Action | Time | Cost
1 | Download MUSA SDK | 15 minutes | Free
2 | Test Musify on your CUDA code | 1-2 hours | Free
3 | Sign up for 30-day AI Coding Plan trial | 5 minutes | Free
4 | Apply for Lighthouse Program compute access | 1-2 weeks | Free (approved applicants)
5 | Purchase MTT S80 for development (optional) | 6-8 weeks delivery | 1,349 yuan (~$185)

The SDK is free. The trial is free. The Lighthouse Program is free. You do not need to commit hardware dollars to determine whether Moore Threads GPUs work for your stack.
