OpenAI has formally accused Chinese AI startup DeepSeek of using sophisticated techniques to extract data from U.S. AI models in order to train its own systems, according to memos shared with U.S. lawmakers and reported by Bloomberg News.
In documents submitted to the House Select Committee on Strategic Competition Between the United States and the Chinese Communist Party, OpenAI raised concerns that DeepSeek may be using outputs from American AI systems, including those generated by OpenAI's own models, to develop competitive products such as its next-generation R1 chatbot.
The allegations center on the use of “distillation,” a machine-learning method in which a smaller model learns from the outputs of a larger, more advanced one. Bloomberg reported that OpenAI believes DeepSeek’s use of distillation could effectively allow the company to “free-ride” on intellectual property developed by U.S. labs. The memo also suggests that DeepSeek is deploying new, obfuscated techniques designed to evade existing safeguards aimed at preventing misuse of AI model outputs.
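To make the technique concrete: in its standard academic form, distillation trains a small “student” model to match the temperature-softened output distribution of a larger “teacher,” typically by minimizing the KL divergence between the two. The sketch below illustrates only that textbook loss, in plain Python; it is not a description of DeepSeek's methods or OpenAI's systems, and all function names and example logits are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: a higher temperature softens the
    # distribution, exposing the teacher's relative preferences
    # across classes rather than just its top choice.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence from the teacher's soft targets to the student's
    # predictions; training drives the student to minimize this.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student whose logits track the teacher's incurs a much lower
# loss than one that diverges, which is what makes teacher outputs
# valuable training signal.
teacher = [4.0, 1.0, 0.2]
close_student = [3.8, 1.1, 0.3]
far_student = [0.2, 1.0, 4.0]
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

The competitive concern described in the memo arises when the “teacher” side of this setup is a proprietary commercial model queried through its public interface, rather than a model the distilling party owns.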
Escalating Competition and Security Concerns
The allegations underscore mounting global competition in AI technology, particularly between U.S. firms and Chinese companies seeking to accelerate their capabilities. OpenAI’s memo reportedly flagged the data extraction issue as a strategic concern, highlighting potential risks to American technological leadership if safeguards are ineffective.
A separate report by Reuters confirmed that the memo was circulated to members of the House Select Committee, which has been hearing testimony on national security and China-U.S. technological competition. According to Reuters, OpenAI did not make the memo public but shared it with lawmakers who have oversight responsibilities for emerging technologies.
What OpenAI Says
While neither OpenAI nor DeepSeek has publicly released the detailed contents of the memo, people familiar with the matter told Bloomberg that the memo outlines specific examples of alleged model output extraction. OpenAI declined to comment publicly on the allegations when contacted by reporters.
Analysts say the dispute highlights broader challenges in regulating AI development and protecting innovations in an environment where digital outputs can be copied and reused in opaque ways.
Broader Industry Context
The issue of training AI systems on the outputs of other models is not new. Researchers and companies have debated for years how AI-generated content, web data, and third-party information should be used for training purposes. Distillation itself is a legitimate academic technique, but when applied to proprietary model outputs without authorization, it can raise intellectual property and competitive concerns.
For policymakers, the DeepSeek allegation arrives at a time when Congress is increasingly focused on AI governance, data protection, and the strategic implications of artificial intelligence deployment. Lawmakers have also moved to restrict exports of semiconductor technology and have expressed concern over foreign access to advanced AI capabilities.
What Happens Next
At present, there is no public legal action against DeepSeek tied to the memo. The accusations, as reported, have yet to be litigated or independently verified beyond OpenAI’s submission to lawmakers.
Still, the episode illustrates how disputes over data usage and training practices are becoming central to debates over AI policy and international competition. As AI systems grow more powerful and pervasive, industry observers say governance frameworks will need to evolve to address complex questions of ownership, security and fair competition.