So-called AI reasoning models are becoming easier — and cheaper — to develop.
On Friday, NovaSky, a team of researchers at UC Berkeley’s Sky Computing Lab, released Sky-T1-32B-Preview, a reasoning model that rivals an earlier version of OpenAI’s o1 on a number of key benchmarks. Sky-T1 appears to be the first truly open reasoning model in the sense that it can be replicated from scratch; the team has released the dataset it used for training, as well as the required training code.
“Amazingly, Sky-T1-32B-Preview was trained for less than $450,” the team wrote in a blog post, “demonstrating that it is possible to replicate high-level reasoning capabilities affordably and efficiently.”
$450 might not sound that affordable. But it wasn’t so long ago that training a model with comparable performance often cost millions of dollars. Synthetic training data, that is, training data generated by other models, has helped drive costs down. Palmyra X 004, a model recently released by AI company Writer and trained almost entirely on synthetic data, reportedly cost just $700,000 to develop.
Unlike most AI, reasoning models effectively fact-check themselves, which helps them avoid some of the pitfalls that normally trip up models. They take slightly longer, typically seconds to minutes longer, to arrive at a solution than a typical non-reasoning model. The upside is that they tend to be more reliable in domains such as physics, science, and math.
The NovaSky team says it used another reasoning model, Alibaba’s QwQ-32B-Preview, to generate the initial training data for Sky-T1, then “curated” the data mixture and leveraged OpenAI’s GPT-4o-mini to restructure the data into a more workable format. Training the 32-billion-parameter Sky-T1 took about 19 hours on a rack of eight Nvidia H100 GPUs. (Parameters roughly correspond to a model’s problem-solving skills.)
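For readers curious what such a distillation-style pipeline might look like in practice, below is a minimal Python sketch: it asks the QwQ-32B-Preview teacher model (loaded via Hugging Face Transformers) for a step-by-step reasoning trace, then calls GPT-4o-mini through the OpenAI SDK to rewrite that trace into a cleaner, uniform format. The prompts, the output format, and the example question are illustrative assumptions; this is not the NovaSky team’s actual code or data recipe.

# Minimal sketch of a distillation-style data pipeline (illustrative only).
# Assumes access to the Qwen/QwQ-32B-Preview weights and an OpenAI API key.
from transformers import AutoModelForCausalLM, AutoTokenizer
from openai import OpenAI

TEACHER = "Qwen/QwQ-32B-Preview"

tokenizer = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(
    TEACHER, torch_dtype="auto", device_map="auto"
)
client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_trace(problem: str) -> str:
    """Ask the teacher model for a step-by-step reasoning trace."""
    chat = [{"role": "user", "content": problem}]
    prompt = tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(teacher.device)
    output = teacher.generate(**inputs, max_new_tokens=2048)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
    )


def reformat_trace(problem: str, trace: str) -> str:
    """Use GPT-4o-mini to rewrite the raw trace into a tidier, uniform format."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Rewrite the reasoning so each step is numbered "
                           "and the final answer appears on the last line.",
            },
            {"role": "user", "content": f"Problem: {problem}\n\nReasoning: {trace}"},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # Hypothetical example question, purely for illustration.
    question = "A train travels 120 km in 1.5 hours. What is its average speed in km/h?"
    raw = generate_trace(question)
    cleaned = reformat_trace(question, raw)
    print(cleaned)  # one (problem, cleaned reasoning) pair for the fine-tuning set

In a real pipeline, pairs like these would then be filtered, the “curation” step the team describes, before being used to fine-tune the 32-billion-parameter student model.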
According to the NovaSky team, Sky-T1 performs better than an early preview version of o1 on MATH500, a collection of competition-level math challenges. The model also beats the o1 preview on a set of difficult problems from LiveCodeBench, a coding evaluation.
However, Sky-T1 falls short of the o1 preview on GPQA-Diamond, which contains physics-, biology-, and chemistry-related questions a PhD graduate would be expected to know.
It is also important to note that OpenAI’s GA release of o1 is a stronger model than the o1 preview, and that OpenAI is expected to release an even better reasoning model, o3, in the weeks ahead.
But the NovaSky team says Sky-T1 marks just the beginning of its journey toward developing open-source models with advanced reasoning capabilities.
“Going forward, we will focus on developing more efficient models that maintain strong reasoning performance and exploring advanced techniques that further improve the models’ efficiency and accuracy at test time,” the team wrote in the post. “Stay tuned as we make progress on these exciting initiatives.”