
2023-07-22 Letter to the Community

Published: July 22, 2023

To read the incident report with a timeline of the events, please click here.

Letter to the Community

by Jay Graber

Last week, users on Bluesky reported an account that used a racial slur as its handle. Within an hour, Bluesky’s moderation team took down the account, as we consider racial slurs to be unacceptable hate speech. We’re sorry that this handle slipped through — we should have had automated filters on user handles. Our Black community deserved better. If we could go back in time, we would have added a filter before bringing on any users. The next best thing we can do is to fix the issue and share a plan for moving forward.

After the account was taken down Wednesday night, the team began working on an automated slur filter. With the help of open-source contributors who had proactively submitted some basic word lists, we implemented a simple filter that night, based on a noncomprehensive word list, to prevent slurs in handles. Merging the two incomplete word lists was done through a confusing PR that appeared to remove some ableist slurs, which caused controversy. Open-source development sometimes means that incomplete work is assumed to be final by onlookers, and this was such a case. All of the terms removed in that PR were added to the final backend filtering system. The next day, we deployed a more comprehensive technical fix that flags fuzzy-matched handles and allows human moderators to take context into account when making decisions. A few days later, we expanded that system to include profile display names, custom feed names, and list names.
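To make the fuzzy-matching idea concrete, here is a minimal sketch of how a "fuzzy match, then flag for human review" check on new handles might work. This is not Bluesky's actual code; the normalization rules, placeholder word list, and function names are illustrative assumptions.

```python
# A minimal sketch (not Bluesky's actual code) of a "fuzzy match, then flag
# for human review" check on new handles. The substitutions, word list, and
# function names here are illustrative assumptions.

LEET_SUBSTITUTIONS = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

def normalize(handle: str) -> str:
    """Lowercase, undo common character substitutions, and strip separators."""
    handle = handle.lower().translate(LEET_SUBSTITUTIONS)
    return "".join(ch for ch in handle if ch.isalnum())

def flag_for_review(handle: str, flagged_terms: set[str]) -> list[str]:
    """Return any flagged terms that appear inside the normalized handle.

    A non-empty result queues the handle for a human moderator instead of
    rejecting it outright, so context (such as reclaimed terms) can be weighed.
    """
    normalized = normalize(handle)
    return [term for term in flagged_terms if term in normalized]

# Example with a placeholder term; a real list would be curated separately.
print(flag_for_review("some-b4dword-fan.bsky.social", {"badword"}))  # ['badword']
```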

People have asked why we didn’t already have a technical solution in place that would prevent this problem. We had a list of handles that were reserved for technical reasons, pulled from a list of commonly banned subdomain prefixes, but didn’t set up automated moderation to catch slurs. This was a mistake. Our team initially relied on human systems not out of malice, but because the automated filtering of slurs is a notoriously tricky problem. The right technical solution was more complex than throwing in a list of banned slurs along with the subdomain prefixes. Slurs need to be caught when they appear within handles, not just when the word is the handle itself. But lots of innocent words or names contain substrings that an automated system would filter out as a slur. The use of slurs is also dependent on context — there are users on Bluesky right now whose handles include reclaimed slurs for groups to which they belong. This reclamation of language is important, but the harm of seeing certain words in a hateful context means that some handles should be forbidden outright. We failed to take into account how important it is to implement some automated systems up-front, even imperfect ones, when leaning on the high-touch human approach to handle moderation.
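To illustrate the substring problem with a deliberately mild, made-up example (not drawn from any real word list), a naive filter that rejects any handle containing a listed term will catch plenty of innocent names:

```python
# A toy illustration (made-up list, not Bluesky's) of why naive substring
# filtering over-blocks: a listed term can appear inside an innocent name.

blocked_terms = {"ass"}  # deliberately mild placeholder for a sensitive list

for handle in ["cassandra.bsky.social", "classact.bsky.social"]:
    hits = [term for term in blocked_terms if term in handle]
    print(handle, "->", "blocked" if hits else "ok")

# Both innocent handles would be rejected, which is one reason matches are
# better flagged for human review than blocked automatically.
```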

We’re always trying to balance human-led Trust & Safety with automated labeling and filtering. In general, we prefer to have humans involved in moderation decisions. The more comprehensive solution for slur filtering we’ve arrived at in the past week is a two-step system for automated filtering and flagging of user handles. This solution combines an automated filter for some handles with a system that automatically flags others for a human moderator to review when they’re created. Keeping a human in the loop will help us take into account context that a purely automated system would miss. We’ll continue to improve this system over time.
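As a rough sketch of that two-step structure, under our own naming assumptions rather than a description of Bluesky's implementation, the decision for a new handle might look like this:

```python
# A minimal sketch, under our own naming assumptions, of the two-step idea
# described above: some terms block a handle outright, others only flag it
# for a human moderator who can weigh context. Not Bluesky's implementation.

from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    FLAG_FOR_REVIEW = "flag_for_review"
    BLOCK = "block"

def evaluate_handle(handle: str, block_terms: set[str], flag_terms: set[str]) -> Decision:
    # In practice this would reuse the same normalization as the matcher above.
    normalized = handle.lower()
    if any(term in normalized for term in block_terms):
        return Decision.BLOCK            # forbidden in any context
    if any(term in normalized for term in flag_terms):
        return Decision.FLAG_FOR_REVIEW  # a human weighs context before acting
    return Decision.ALLOW

# Placeholder terms for illustration only.
print(evaluate_handle("example.bsky.social", {"hateterm"}, {"ambiguousterm"}))
# Decision.ALLOW
```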

On the community & communications side, we’ve been accelerating some changes we were already working on. We’ve been consulting with Black community builders to advise us on how to create a space that is safe for marginalized communities, and to help us detect and prevent these sorts of issues in the future. We’ve been growing the Trust & Safety team. And we’ve been working with communications consultants, since we realize that communicating with the community is another area where we fell short.

Many users expressed dismay at the team’s silence in the wake of this incident. We realize that the silence from our personal accounts could come off as if we didn’t care or were okay with the usage of slurs in handles. This is absolutely not the case, and we apologize for this failure in communication. We’re a very small team that juggles a lot, and though getting an intense volume of feedback and some harassment from users through the app is part of running a social site, it makes doing the rest of our jobs difficult. We’ve started leaning on the support of communications and T&S professionals, who helped us post updates about last week’s incident from the main Bluesky account at @bsky.app. However, this change in tone was a marked contrast to the way team members had casually interacted with the community from individual accounts in the past.

We’ve now created a new account, @safety.bsky.app, to communicate about Trust & Safety with the community. We plan to be more transparent and communicative from the Safety account, especially in high-profile cases that require either code or policy changes — starting with an in-depth incident report about what happened last week.

Since we reached 100k users, we’ve been encouraging people to submit moderation reports through the in-app flows, and to file support requests through the in-app form. We’ve been moving in this direction because it’s become increasingly infeasible for individual team members to address moderation and safety issues directly with the community from their personal accounts. Every moderation report that you submit in-app is read and handled by a human moderator, generally within 24 hours. We’re developing more structured feedback channels as we grow, but in the meantime, if you have something that you’d like to share with the team, you can do so here.

Bluesky is a small team trying to do something big. We’ve made mistakes in the past, and despite our best efforts, we will probably make mistakes in the future. But it’s always our goal to correct those wrongs, and as we work to improve over time, we hope you’ll see that our actions support our intentions. We ask that you give us grace as we figure it out, and trust that we’re working as hard as we can to build a better, safer, more resilient social web.

To read the incident report with a timeline of the events, please click here.