Bumble Releases Open-Source Version of Private Detector A.I. Feature to Help Tech Community Combat Cyberflashing
At Bumble, safety has always been at the heart of our mission. Since 2018, we’ve worked to help pass legislation in both the U.S. and U.K. to combat the sending of unsolicited nudes online, known as cyberflashing. In 2019, we harnessed technology to better shield our community from unwanted lewd images, launching our Private Detector™ A.I. feature in the Bumble app.
Private Detector™ works by automatically blurring a potential nude image shared within a chat on Bumble. You’ll be notified, and it’s up to you to decide whether to view or block the image. (You can also easily report it to Bumble. We don’t tolerate any bad behavior at all on our app.)
Now, Bumble’s Data Science team has written a white paper explaining the technology behind Private Detector™ and has made an open-source version of it available on GitHub. We hope the feature will be adopted by the wider tech community as we work together to make the internet a safer place.
At Bumble Inc., the parent company of Bumble, Badoo, and Fruitz, safety has been a central part of our mission and a core value that informs the company’s product innovations and roadmap. We’ve leveraged the latest advancements in technology and artificial intelligence (AI) to help provide our community of users with the tools and resources they need to have a safe experience on our platforms. In 2019, we launched Private Detector™ across the Bumble and Badoo apps: an AI-powered feature that detects and blurs lewd images and warns users about the photo before they open it.
As just one of many players in the world of dating apps and social media at large, we also recognize the need to tackle this issue beyond Bumble’s product ecosystem and to engage in a broader conversation about unsolicited lewd photos, also known as cyberflashing, to make the internet a safer and kinder place for everyone.
To help address this broader issue of cyberflashing, Bumble teamed up with legislators from both sides of the aisle in Texas in 2019 to pass HB 2789, a bill that effectively made sending unsolicited lewd photos a punishable offense. Since then, Bumble has continued to successfully advocate for similar laws across the United States and globally.
In 2022, Bumble reached another milestone in public policy by helping to pass SB 493 in Virginia and most recently SB 53 in California, adding another layer of online safety in one of the most populous states in the United States.
These new laws are the first step to creating accountability and consequences for this everyday form of harassment that causes victims—predominantly women—to feel distressed, violated, and vulnerable online.
As Bumble continues to help curb cyberflashing through legislative efforts and to provide safety tools such as Private Detector™ that help keep our community safe from unsolicited nudes within our apps, we hope to create a ripple effect of change across the internet and social media at large. That is why today we are extremely proud to release a version of Private Detector™ to the wider tech community, in the hope of democratizing access to our technology and helping scientists and engineers around the world who face the same challenges to improve their approach to online safety.
How does it work?
Since the early days of Badoo, we have always been pioneers in leveraging technology and advanced processes to improve both our matchmaking experience and our integrity and safety capabilities. Behind the scenes, we have been designing and implementing machine learning solutions for lewd image detection for almost a decade, drawing on both our best-in-class expertise in the tech space and the insights collected by our apps, thanks to our dominant position in the dating industry.
Machine learning (ML) is a field devoted to understanding and building methods that learn (or, more precisely, mimic) how to reach human-level performance on specific tasks, using data to improve their accuracy. The development cycle requires you to carefully design and develop a neural network’s architecture and to iteratively feed it a curated set of samples (a dataset) drawn from the problem: in our case, detecting whether a picture contains lewd content.
Even though users sending lewd images on our apps are, fortunately, a tiny minority (just 0.1%), our scale allows us to collect a best-in-the-industry dataset of both lewd and non-lewd images, tailored to achieve the best possible performance on the task. Private Detector™ is trained on very large datasets, with the negative samples (those not containing any lewd content) carefully selected to reflect edge cases and other parts of the human body (e.g., legs, arms) so that the model does not flag them as abusive. Iteratively adding samples to the training dataset to reflect actual user behavior or misclassifications found during testing proved to be a successful practice, one we have applied over the years in all our machine learning endeavors. Even when the downstream task is framed as a binary classification problem (as in our case!), nothing prevents data scientists from defining more fine-grained concepts (or labels) and merging them back into the two classes right before the actual training epochs.
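The label-merging and feedback steps described above can be sketched in a few lines of plain Python. Everything here is hypothetical: the fine-grained label names, the `FINE_TO_BINARY` mapping, the `merge_labels` and `add_misclassified` helpers, and the 0.5 threshold are illustrative assumptions, not the labels or tooling behind Private Detector™.

```python
from collections import Counter

# Hypothetical fine-grained labels; the label set actually used is not described in this paper.
FINE_TO_BINARY = {
    "explicit_nudity": 1,
    "partial_nudity": 1,
    "swimwear": 0,   # edge case: lots of visible skin, but not lewd
    "bare_legs": 0,  # other body parts the model must not flag
    "bare_arms": 0,
    "other": 0,
}


def merge_labels(annotations):
    """Collapse (image_id, fine_label) pairs into (image_id, 0/1) pairs right before training.

    An unknown label raises KeyError instead of silently becoming a negative.
    """
    return [(image_id, FINE_TO_BINARY[label]) for image_id, label in annotations]


def add_misclassified(train_set, predictions, threshold=0.5):
    """Append evaluation samples the model got wrong to the training set, skipping duplicates.

    predictions: iterable of (image_id, true_label, predicted_probability) tuples.
    """
    known = {image_id for image_id, _ in train_set}
    for image_id, true_label, prob in predictions:
        predicted = 1 if prob >= threshold else 0
        if predicted != true_label and image_id not in known:
            train_set.append((image_id, true_label))
            known.add(image_id)
    return train_set


annotations = [("img1", "explicit_nudity"), ("img2", "swimwear"), ("img3", "bare_legs")]
train_set = merge_labels(annotations)
print(Counter(label for _, label in train_set))  # Counter({0: 2, 1: 1})

# A beach photo scored as lewd is a false positive, so it is fed back as a labelled negative.
add_misclassified(train_set, [("img4", 0, 0.91), ("img5", 1, 0.97)])
print(train_set[-1])  # ('img4', 0)
```

Keeping the finer labels until the last moment means the binary mapping can be revised (say, moving `partial_nudity` to the negative class) without re-annotating any images.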
Balancing state-of-the-art performance against the ability to serve our user base at scale, we implemented (in its latest iteration) an EfficientNetV2-based binary classifier: a convolutional network with faster training speed and better overall parameter efficiency. It combines a better-designed architecture with scaling, using layers such as MBConv (which uses 1×1 convolutions to expand the channel space and depthwise convolutions to reduce the overall number of parameters) and FusedMBConv (which replaces MBConv’s 1×1 expansion and depthwise convolution with a single regular convolution for faster execution), to jointly optimize training speed and parameter efficiency. The model was trained in our GPU-powered data centers through a continuous cycle of dataset, network, and hyperparameter optimization (hyperparameters being settings, such as the learning rate or batch size, that are chosen before training and control how it proceeds).
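One rough way to see why MBConv saves parameters, and what FusedMBConv gives up in exchange for speed, is to count convolution weights. The sketch below assumes 64 input and output channels, an expansion ratio of 4, and 3×3 kernels, and it ignores biases, batch normalization, and the squeeze-and-excitation module that EfficientNetV2’s MBConv blocks also contain. The helper names are hypothetical, and the numbers describe these illustrative blocks, not Private Detector™’s actual layers.

```python
def conv_params(k, c_in, c_out):
    """Weights in a regular k x k convolution (biases and batch-norm ignored)."""
    return k * k * c_in * c_out


def depthwise_params(k, channels):
    """Weights in a k x k depthwise convolution: one k x k filter per channel."""
    return k * k * channels


def mbconv_params(c_in, c_out, expand=4, k=3):
    """MBConv without squeeze-and-excitation: 1x1 expand, k x k depthwise, 1x1 project."""
    hidden = c_in * expand
    return (conv_params(1, c_in, hidden)      # widen the channel space cheaply
            + depthwise_params(k, hidden)     # spatial filtering, one filter per channel
            + conv_params(1, hidden, c_out))  # project back down


def fused_mbconv_params(c_in, c_out, expand=4, k=3):
    """FusedMBConv (expand > 1): one regular k x k conv replaces expansion + depthwise."""
    hidden = c_in * expand
    return conv_params(k, c_in, hidden) + conv_params(1, hidden, c_out)


print(conv_params(3, 64, 64))       # 36864: plain 3x3 conv, 64 -> 64 channels
print(mbconv_params(64, 64))        # 35072: MBConv with a 256-channel hidden space
print(fused_mbconv_params(64, 64))  # 163840: FusedMBConv with the same hidden space
```

Even with a hidden space four times wider, MBConv uses slightly fewer weights than a plain 3×3 convolution, because the spatial filtering happens depthwise. FusedMBConv uses far more weights, but a single dense convolution tends to run faster on GPUs and TPUs than a depthwise one, which is why EfficientNetV2 uses it mainly in its early stages, where channel counts are small.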
We are proud to report that, when evaluated under different conditions (both offline and online), the model achieves world-class performance: over 98% accuracy in both upsampled and production-like settings, with no clear trade-off between precision and recall.
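For a problem as imbalanced as this one, accuracy alone can be misleading, which is why precision and recall matter. The sketch below computes all three from confusion-matrix counts. The function name is hypothetical, and the counts are invented for illustration (the first assumes the 0.1% prevalence mentioned above, applied to 100,000 images); they are not Private Detector™’s results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts (positive = lewd)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of the images flagged, how many were lewd
    recall = tp / (tp + fn) if tp + fn else 0.0     # of the lewd images, how many were flagged
    return accuracy, precision, recall


# Hypothetical "never flag anything" baseline on 100,000 images, 100 of them lewd.
print(classification_metrics(tp=0, fp=0, fn=100, tn=99_900))  # (0.999, 0.0, 0.0)

# Hypothetical balanced ("upsampled") evaluation set: 100 lewd and 100 non-lewd images.
print(classification_metrics(tp=98, fp=1, fn=2, tn=99))
```

The do-nothing baseline scores 99.9% accuracy while catching nothing, so an accuracy figure measured at production-like prevalence is only meaningful alongside precision and recall, or on a rebalanced set like the second example.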
What are we releasing today?
Alongside this white paper, we are releasing on GitHub the source code we used to train the machine learning engine powering Private Detector™, together with a ready-to-use SavedModel for deploying the model as is (using TensorFlow Serving) and a checkpoint that can be used to fine-tune it with additional images, improving its performance on samples that matter for specific use cases. In both scenarios, the repository comes with extensive documentation and a guide on how to perform those steps, to make the experience as smooth as possible for scientists, engineers, and product folks around the world.
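Once a SavedModel is loaded into TensorFlow Serving, clients can query it over the server’s REST API. The sketch below only builds such a request with the Python standard library. The model name `private_detector`, the host, the port (8501 is TensorFlow Serving’s default REST port), the helper name `build_predict_request`, and especially the input format (one base64-encoded image per instance) are assumptions. Check the released model’s real serving signature, for example with `saved_model_cli show --dir <model_dir> --all`, and the repository’s documentation before relying on it.

```python
import base64
import json
import urllib.request


def build_predict_request(image_bytes, host="localhost", port=8501,
                          model_name="private_detector"):
    """Build (but do not send) a TensorFlow Serving REST :predict request.

    ASSUMPTION: the served signature takes one base64-encoded image string per instance;
    TensorFlow Serving's JSON API wraps binary values as {"b64": "..."}.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    payload = {"instances": [{"b64": base64.b64encode(image_bytes).decode("ascii")}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(b"placeholder image bytes")  # not a real image
print(request.full_url)  # http://localhost:8501/v1/models/private_detector:predict
```

Sending the request with `urllib.request.urlopen(request)` against a running server returns JSON whose `predictions` field holds the model outputs. How to turn those outputs into a blur decision (which value is the lewd score, and what threshold to use) depends on the released model and is left to its documentation.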
This version of Private Detector™ is released under the Apache License, so that anyone can adopt it as a standard for blurring lewd images, either as is or after fine-tuning it with additional training samples. Contributions that improve the architecture or the overall code quality and structure are welcome.
Check out bumble-tech for any other exciting projects happening at Bumble.