

AI benchmarking organization criticized for waiting to disclose funding from OpenAI


The organization that develops mathematical benchmarks for artificial intelligence did not disclose that it received funding from OpenAI until relatively recently, drawing accusations of impropriety from some in the artificial intelligence community.

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grant-making foundation, revealed on December 20 that OpenAI supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure the math skills of artificial intelligence, was one of the benchmarks OpenAI used to demonstrate its upcoming cutting-edge AI, o3.

In a post published on the LessWrong forum, an Epoch AI contractor writing under the username “Meemi” says that many contributors to the FrontierMath benchmark were not informed of OpenAI’s involvement until it was made public.

“Communication on this has been non-transparent,” Meemi wrote. “In my opinion, Epoch AI should have disclosed OpenAI’s funding, and contractors should have transparent information about the potential of their work being used for capabilities when choosing whether to work on the benchmark.”

On social media, some users raised concerns that the secrecy could tarnish FrontierMath’s reputation as an objective benchmark. In addition to funding FrontierMath, OpenAI had access to many of the problems and solutions in the benchmark, a fact Epoch AI didn’t reveal until December 20, when o3 was announced.

In response to Meemi’s post, Tamay Besiroglu, Epoch AI’s associate director and one of the organization’s co-founders, asserted that FrontierMath’s integrity had not been compromised, but acknowledged that Epoch AI “made a mistake” by not being more transparent.

“We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight, we should have negotiated harder to be transparent with benchmark contributors as soon as possible,” Besiroglu wrote. “Our mathematicians deserved to know who might have access to their work. Although we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI.”

Besiroglu added that while OpenAI has access to FrontierMath, it has a “verbal agreement” with Epoch AI not to use FrontierMath’s problem set to train its AI. (Training AI on FrontierMath would be akin to teaching to the test.) Epoch AI also maintains a “separate holdout set” that serves as an additional safeguard for independently verifying FrontierMath benchmark results, Besiroglu said.

“OpenAI … fully supported our decision to maintain a separate, unseen-by-OpenAI holdout set,” Besiroglu wrote.

Muddying the waters, however, Epoch AI lead mathematician Elliot Glazer stated in a post on Reddit that Epoch AI has not yet independently verified OpenAI’s FrontierMath o3 results.

“My personal opinion is that [OpenAI’s] result is legitimate (i.e., they didn’t train on the dataset) and that they have no incentive to lie about internal benchmark performance,” Glazer said. “However, we can’t vouch for them until our independent evaluation is complete.”

The saga is yet another example of the challenge of developing empirical benchmarks to evaluate AI, and of securing the resources needed to build those benchmarks without creating the perception of a conflict of interest.
