Roblox deploys open-source AI, Sentinel, for child safety in chats
Roblox, the immensely popular online gaming platform frequented by millions of children and teenagers, has unveiled an open-source artificial intelligence system designed to proactively identify predatory language within its game chats. This significant move comes amidst mounting legal challenges and public criticism, with lawsuits alleging the company has not done enough to safeguard its younger users from online predators. One recent lawsuit, filed in Iowa, claims a 13-year-old girl was introduced to an adult predator on Roblox, subsequently kidnapped, trafficked across multiple states, and raped. The suit specifically contends that Roblox’s platform design makes children particularly vulnerable.
Roblox maintains that it strives to make its systems as safe as possible by default, while acknowledging that “no system is perfect” and that detecting critical harms like potential child endangerment remains one of the industry’s most formidable challenges. The new AI system, named Sentinel, is specifically engineered to detect early indicators of possible child endangerment, including sexually exploitative language. According to the company, Sentinel’s insights led to 1,200 reports of potential child exploitation attempts being submitted to the National Center for Missing and Exploited Children in the first half of 2025 alone. By open-sourcing the technology, Roblox aims to extend its protective capabilities to other platforms facing similar online safety concerns.
Detecting potential dangers to children through AI can be exceptionally complex, mirroring the difficulties faced by human moderators. Initial conversational exchanges, such as seemingly innocuous questions like “how old are you?” or “where are you from?”, might not trigger immediate red flags. Yet, when analyzed within the broader context of an extended conversation, these phrases can reveal a sinister underlying intent. Roblox, which boasts over 111 million monthly users, already prohibits sharing videos or images in chats and attempts to block personal information like phone numbers, though users frequently find ways to circumvent such safeguards. Additionally, children under 13 are restricted from chatting with other users outside of games unless explicit parental permission is granted. Unlike many other platforms, Roblox does not encrypt private chat conversations, enabling it to monitor and moderate interactions.
Matt Kaufman, Roblox’s chief safety officer, explained the limitations of previous filtering methods. He noted that while older filters were effective at blocking profanity and various forms of abusive language based on single lines or short text snippets, behaviors related to child endangerment or grooming typically unfold over much longer periods. Sentinel addresses this by capturing one-minute snapshots of chats across Roblox – processing approximately 6 billion messages daily – and analyzing them for potential harm. To achieve this, Roblox developed two distinct reference models: one comprising benign messages and another containing chats definitively identified as violating child endangerment policies.
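Roblox has not published Sentinel’s internals, but the two-reference-model idea can be illustrated with a minimal sketch: embed each one-minute chat snapshot and compare it against centroids built from labeled benign and violating examples. Everything below, from the toy hashed embedding to the example phrases and function names, is an assumption for illustration rather than Roblox’s actual code.

```python
# Illustrative sketch only: Roblox has not released Sentinel's implementation.
# The embedding, example phrases, and all names here are assumptions.
import hashlib
import numpy as np

DIM = 256  # size of the toy embedding space

def embed(text: str) -> np.ndarray:
    """Toy hashed bag-of-words embedding standing in for a real text encoder."""
    vec = np.zeros(DIM)
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def centroid(examples: list[str]) -> np.ndarray:
    """Mean embedding of a labeled reference set (benign or policy-violating)."""
    return np.mean([embed(t) for t in examples], axis=0)

# Two reference models, built offline from labeled chat history (toy examples).
benign_centroid = centroid([
    "good round lets play again",
    "trade you my pet for the sword",
])
violating_centroid = centroid([
    "how old are you",
    "where do you live lets talk somewhere private",
])

def score_snapshot(messages: list[str]) -> float:
    """Score a one-minute chat snapshot; higher means closer to the violating set."""
    snap = np.mean([embed(m) for m in messages], axis=0)
    return float(snap @ violating_centroid - snap @ benign_centroid)
```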
This innovative approach allows Sentinel to recognize harmful patterns that extend beyond merely flagging specific words or phrases, instead considering the entire conversational context. Naren Koneru, Roblox’s vice president of engineering for trust and safety, elaborated on this, stating that the “negative” reference model continuously improves as more malicious actors are detected, while a “positive” model represents typical, normal user behavior. As users chat, the system continuously assesses their interactions, scoring whether their behavior aligns more closely with the benign or harmful reference model. Koneru emphasized that this assessment isn’t based on a single message but rather on the cumulative pattern of interactions over days. If a user’s score indicates a lean towards the “negative” cluster, human reviewers are prompted to conduct a much deeper investigation, examining all related conversations, connections, and games played by that user. Ultimately, any risky interactions identified through this process are reviewed by human safety experts and, when appropriate, reported to law enforcement agencies.
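The cumulative, per-user scoring Koneru describes can likewise be sketched in a few lines, assuming a running score that decays over time and a threshold that triggers human review; the decay factor, the threshold, and the class itself are hypothetical stand-ins, not Roblox’s published pipeline.

```python
# Hypothetical sketch of cumulative per-user risk scoring; the decay rule,
# threshold, and names are assumptions, not Roblox's actual design.
from collections import defaultdict

DECAY = 0.95           # assumed: older snapshots fade rather than count forever
FLAG_THRESHOLD = 3.0   # assumed cutoff for escalating to a human reviewer

class UserRiskTracker:
    """Tracks a running risk score per user across many one-minute snapshots."""

    def __init__(self) -> None:
        self.scores: dict[str, float] = defaultdict(float)

    def update(self, user_id: str, snapshot_score: float) -> bool:
        """Fold one snapshot's score into the user's total; True means escalate."""
        self.scores[user_id] = self.scores[user_id] * DECAY + snapshot_score
        return self.scores[user_id] > FLAG_THRESHOLD

# Usage: snapshot scores would come from a model like the sketch above.
tracker = UserRiskTracker()
for score in [0.4, 1.1, 0.9, 1.3]:  # a few days of mildly suspicious snapshots
    if tracker.update("user_123", score):
        print("queue user_123 and their conversations for human review")
```

The point of the accumulation step is the one Koneru makes: no single message decides the outcome, but a pattern of snapshots drifting toward the violating reference set eventually puts the account in front of human safety experts.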