NSFW AI is one such example, where the failure modes can be profound for both platforms and users. In 2022, a global social media site saw complaints rise by 20% when its NSFW AI began erroneously flagging non-explicit content as adult-themed, leading to the suspension of user accounts and removal of their content. These failures typically originate in the image recognition models themselves, usually convolutional neural networks (CNNs), which can misread visual cues and mistake artistic or culturally sensitive material for nudity, producing false positives.
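To make that failure mode concrete, here is a minimal sketch of how a CNN-based moderation pipeline typically turns an image into a block/allow decision via a single probability threshold. The backbone, class labels, and 0.85 threshold are illustrative assumptions rather than any platform's real system; a borderline artistic image whose "explicit" score drifts above the threshold becomes exactly the kind of false positive described above.

```python
# Minimal sketch of a threshold-based NSFW image classifier (illustrative only).
# The backbone, labels, and 0.85 threshold are assumptions for demonstration,
# not a real production moderation model.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Stand-in backbone: an untrained ResNet-18 with a 2-class head
# ("safe" vs. "explicit"). A real system would load trained weights.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 2)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

BLOCK_THRESHOLD = 0.85  # assumed operating point

def moderate(image_path: str) -> dict:
    """Return the explicit-content score and a block/allow decision."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(img), dim=1)[0]
    explicit_score = probs[1].item()
    return {
        "explicit_score": explicit_score,
        # A nude painting scoring just above the threshold is flagged
        # exactly like genuinely explicit content -- a false positive.
        "action": "block" if explicit_score >= BLOCK_THRESHOLD else "allow",
    }
```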
The financial consequences are just as real. A site that relies heavily on automated moderation can see user activity dip when people feel unfairly targeted, translating into ad-revenue losses of up to 15%. There is also a clear operational cost: legal challenges over erroneously deleted content and accusations of censorship can easily run past $100k per year. The picture worsens if the AI cannot distinguish harmful material from legitimate but disturbing news coverage, exposing the platform to fines under laws such as the EU's Digital Services Act, which imposes strict content moderation requirements.
Inaccurate NSFW AI also erodes the trust users place in a platform. In a 2021 year-end YouGov poll, only 35% of users said they were satisfied with AI-driven moderation, citing trust and accuracy as their main concerns. These failures can escalate into public controversy: in one case, a well-known artist was wrongly flagged for stealing his own work, triggering a wave of bad PR. Human-in-the-loop (HITL) systems, in which human reviewers (paid staff or volunteers) check the AI's decisions, help catch these mistakes, though they add roughly 20% to moderation costs.
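A common way to implement HITL review is to route only the AI's low-confidence decisions to human moderators, which keeps the extra cost bounded. The 0.4-0.9 uncertainty band below is an assumed operating range, not a measured value; the sketch simply shows the routing logic.

```python
# Sketch of human-in-the-loop (HITL) routing: decisions the model is unsure
# about go to a human review queue instead of being auto-enforced.
# The 0.4-0.9 "uncertainty band" is an assumed operating range.
from dataclasses import dataclass

REVIEW_LOW, REVIEW_HIGH = 0.4, 0.9

@dataclass
class Decision:
    item_id: str
    explicit_score: float
    action: str  # "allow", "block", or "human_review"

def route(item_id: str, explicit_score: float) -> Decision:
    if explicit_score >= REVIEW_HIGH:
        action = "block"           # model is confident the content is explicit
    elif explicit_score <= REVIEW_LOW:
        action = "allow"           # model is confident the content is safe
    else:
        action = "human_review"    # uncertain: escalate to a moderator
    return Decision(item_id, explicit_score, action)

# Example: estimate how much of a batch ends up in front of humans.
scores = [0.05, 0.95, 0.55, 0.72, 0.12, 0.88, 0.30, 0.61]
decisions = [route(f"post-{i}", s) for i, s in enumerate(scores)]
review_share = sum(d.action == "human_review" for d in decisions) / len(decisions)
print(f"Sent to human review: {review_share:.0%}")  # the source of the ~20% cost overhead
```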
Over-filtering is not the only failure mode. Under-filtering, where explicit posts slip past the AI's detection, creates PR nightmares of its own. If only 80% of explicit content is identified and blocked, the remaining 20% may be images or videos the platform inadvertently serves to unintended audiences, including children. That is not just a violation of platform policy; it damages reputation and creates legal exposure.
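Put in numbers, the leak scales directly with upload volume. The upload and prevalence figures below are hypothetical illustration values; the point is how quickly a 20% miss rate turns into large absolute numbers.

```python
# Back-of-the-envelope estimate of how much explicit content leaks through
# at a given recall. All inputs are hypothetical illustration values.
daily_uploads = 10_000_000       # assumed total uploads per day
explicit_rate = 0.01             # assumed share of uploads that are explicit
recall = 0.80                    # share of explicit content the AI catches

explicit_items = daily_uploads * explicit_rate
missed_items = explicit_items * (1 - recall)
print(f"Explicit uploads per day: {explicit_items:,.0f}")   # 100,000
print(f"Slipping past the filter: {missed_items:,.0f}")     # 20,000 items/day at these rates
```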
For developers, this means models must be continuously retrained to address these issues, a process that demands significant computational resources and time. With training datasets already exceeding 100 terabytes, a retraining cycle can take weeks, slowing how quickly improved models reach production. The result is a lagging loop in which the AI never quite catches up with new forms and styles of content.
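That cadence can be sketched as a simple loop: gather the misclassified and appealed items since the last cycle, fine-tune, evaluate against a held-out set, and only promote the new model if it clears a quality bar. Every function below is a hypothetical placeholder standing in for real data pipelines and training jobs; only the structure of the cycle, and why it introduces a weeks-long lag, is the point.

```python
# Schematic retraining loop for a moderation model. The helper functions are
# hypothetical placeholders for real data pipelines and training infrastructure.

def collect_flagged_samples():
    """Gather false positives/negatives surfaced by appeals and HITL review."""
    return []  # placeholder: would pull from an appeals/review database

def fine_tune(model, samples):
    """Fine-tune the current model on newly labeled samples (can take weeks)."""
    return model  # placeholder: would launch a long-running training job

def evaluate(model):
    """Score the candidate on a held-out benchmark of hard moderation cases."""
    return {"precision": 0.0, "recall": 0.0}  # placeholder metrics

def deploy(model):
    """Promote the candidate model to production behind a gradual rollout."""
    pass  # placeholder

def retraining_cycle(current_model, min_precision=0.95, min_recall=0.90):
    samples = collect_flagged_samples()
    candidate = fine_tune(current_model, samples)
    metrics = evaluate(candidate)
    # Only ship if the candidate clears the quality bar; otherwise keep iterating,
    # which is where the weeks-long lag described above comes from.
    if metrics["precision"] >= min_precision and metrics["recall"] >= min_recall:
        deploy(candidate)
        return candidate
    return current_model
```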
In the worst-case scenario, NSFW AI failures can lead to entire platforms being shut down, or to a mass exodus to competitors if users find the moderation too stringent or simply overwhelmed. One major content-sharing site lost 25% of its active users within months of a significant moderation failure, underscoring the need for robust and reliable systems for moderating content online.
The potential for such cascading effects underscores why accuracy in NSFW AI matters so much. Perhaps nowhere is the line between effective content moderation and the pitfalls of AI failure as thin as it is in this thorny realm.