9 Sneaky Ways People Bypass Auto Moderation

Written by | October 26, 2018

Moderators face thousands of pieces of user-generated content (UGC) to filter every day. Truth be told, UGC outnumbers moderation manpower more often than one might expect. It is for this reason that automated moderation (also called AI moderation) was developed. Automated moderation is said to help human moderators double the speed of their work by reducing the amount of content they need to review.

What is auto moderation?

Auto moderation is powered by technology and designed to boost an online community’s moderation processes. It is a faster alternative to having human moderators check every piece of end-user content. This type of moderation can be programmed to meet various moderation needs by setting up algorithms and base models that define the content the software should monitor on a regular basis.

How exactly does auto moderation work?

As the word ‘automated’ suggests, the scanning, filtering, and removal of text, images, and videos containing graphic or offensive content are done based on a set of rules or keywords. Teaching a machine-operated moderator what to look for often involves a process called machine learning, in which a model learns patterns from a collection of examples (known as a data set) in order to solve a particular problem or, in this case, moderate a specific type of content.
AI moderation can also be trained through the use of natural language processing. Some of the most common features of machine-powered moderation include whitelisting, blacklisting, and blocking specific keywords. Indeed, a lot of work can be accomplished on speed and workload capacity alone. But can auto moderation really be considered the ‘future of moderating content’?

AI moderation may be big on speed, but it does have limitations that make it susceptible to instances of being bypassed by end-users

The quick detection of malicious and disturbing content may be one of the strengths of AI moderation, but it does not possess the kind of accuracy and flexibility that human moderators have. AI moderators require a base model in order to function. The base model contains references to a context or group of keywords that the auto moderator is assigned to monitor and filter, along with the outcomes that a brand or client wishes to predict.
The base model needs to be clearly defined, or else the moderator will not function as well as needed. If an AI moderator is created to detect and moderate profanities, its base model should contain an extensive list of swear words and other information related to vulgarity.
Let’s put the above explanation into a scenario:
An auto moderator is designed to filter profanity. The AI moderator is programmed to either approve or reject a user post based on whether it contains or promotes offensive content. In this case, let’s assume that it approves and rejects member posts using a score between 0 and 1, where 0 implies the content is decent and 1 indicates that the content is inappropriate or profane. For AI moderation, the closer a score is to either extreme, the easier and more precise the decision to approve or reject will be.
However, if the score falls near the middle of the range, the program has a harder time determining whether the content should be allowed. This is the blind spot where human moderation excels over automated moderation. Human moderators are able to judge user intent without relying on extremes.
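The scoring scenario above can be sketched in a few lines of Python. This is only an illustration, not how any particular moderation product works: the threshold values and the `route` function are hypothetical, and the score is assumed to come from some upstream classifier that is not shown here.

```python
# Hypothetical cut-offs; a real system would tune these per community.
APPROVE_BELOW = 0.2
REJECT_ABOVE = 0.8


def route(score: float) -> str:
    """Turn a profanity score between 0 and 1 into a moderation decision."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score <= APPROVE_BELOW:
        return "approve"        # clearly decent
    if score >= REJECT_ABOVE:
        return "reject"         # clearly profane
    return "human_review"       # ambiguous: hand over to a person


for score in (0.05, 0.5, 0.95):
    print(score, route(score))
```

Scores near either extreme are decided automatically, while anything in the uncertain middle band is passed to a human moderator. Narrowing or widening that band is the trade-off between speed and accuracy.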

Human intent can be masked to trick set AI moderation algorithms and guidelines

Artificial intelligence algorithms do not weigh context the way people do. Moderation operated through technology only functions according to how it was programmed and trained. AI moderation tools that lack sufficient semantic analysis are less vigilant and accurate in detecting the subtle ways end-users insinuate disturbing topics in conversations.
Such a tool cannot venture into unfamiliar territory, or recognize the various human intentions behind posting a particular type of content. Unless the base model accounts for the whys and hows behind a piece of user-generated content, opportunistic community members will continue to deceive AI moderators with covert techniques. At the same time, posts that are not intentionally offensive or malicious, but that contain terms related to inappropriate content, may end up being repeatedly rejected.
A good example of this is an initiative against the spread of extremist ideals. Clearly, the intention is good, but since machine-operated moderation relies heavily on keywords and base models, it will not be able to look past the terms ‘extremist’ or ‘extremism’. The end result is that the content will be banned or flagged as inappropriate even though the user’s intention is not to offend anyone.
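The false positive described above is easy to reproduce with a bare keyword filter. The sketch below is a simplified illustration under stated assumptions: the `BLOCKED_KEYWORDS` set, the `keyword_filter` function, and the sample post are all made up for this example.

```python
import re

# Hypothetical blocked terms, taken from the example in the text.
BLOCKED_KEYWORDS = {"extremist", "extremism"}


def keyword_filter(post: str) -> str:
    """Reject a post if any blocked keyword appears in it, ignoring intent."""
    words = set(re.findall(r"[a-z]+", post.lower()))
    return "reject" if words & BLOCKED_KEYWORDS else "approve"


campaign = "Join our workshop on protecting teens from extremist recruiters"
print(keyword_filter(campaign))  # prints "reject"
```

The post campaigns against extremism, yet the filter rejects it because it only sees the keyword. A human moderator reading the same sentence would approve it.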


Bypassing auto moderation filters can come in various forms, such as:

1. Using symbols and numbers in place of letters when spelling out curse words. If a post blatantly contains swear words or cursing, auto moderators will automatically delete it. However, if offensive language is spelled with characters such as “$”, “@”, or “#”, chances are it will not be filtered by the AI moderator.

2. Employing a method called ‘cloaking’ on social media, where links to malicious content, including pornography, bogus miracle fitness supplements, and get-rich-quick scams, are disguised as safe, non-violating, and end-user-friendly ads. These links are so carefully cloaked that when Facebook’s moderation system follows the link, it is redirected to a safe, informative website that adheres to its ad policies. Meanwhile, unsuspecting Facebook users are taken to pages containing inappropriate content.

3. Adding text to images or blurring explicit details in them. This method keeps AI moderation from detecting inappropriate depictions in images posted by users who are violating the brand’s community guidelines.

4. Live streaming videos that depict offensive content. These are not moderated immediately if the moderation process in use does not support real-time monitoring of user-generated content.

5. Modifying the metadata of images to confuse or trick moderation bots into interpreting the image as something appropriate or compliant with the website’s posting guidelines.

6. Creating an account with a fake birth date, so that minors appear old enough to view disturbing videos on a website. Age restrictions are easily bypassed in this manner, leaving adolescents exposed to highly graphic videos online.

7. Using clever product placement to conceal content or products that may not be appropriate for a general audience from auto moderators. While the product may be carefully hidden from a moderation algorithm, it is still made visible enough for viewers to notice it. Briefly flashing or showing the product onscreen is meant to imprint the product’s name on viewers’ minds, driving more attention to the product without the computerized moderation system detecting it.

8. Posting under the guise of a prominent organization, social media personality, or widely known provider of a particular service, which instantly wins the attention and trust of fellow community members. Some auto moderators may not be programmed to detect fraudulent profiles or scammers, so other members easily fall prey to email phishing and other forms of identity theft.

9. Joining a brand’s online community for the purpose of defaming it and its supporters through online trolling. The more vigilant an online community is in monitoring online trolls, the smarter and more creative trolls become in executing their objectives. There are trolls that create multiple fake profiles or accounts to avoid being detected. Meanwhile, some online trolls may conceal their text through a code or filter their images to prevent their posts from being detected and deleted by the site’s AI moderator.
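To make the first technique in the list concrete, the sketch below shows why symbol substitution slips past an exact-match filter, and how normalizing the text first can catch it. Everything here is an assumption made for illustration: the substitution table, the mild stand-in words in `BLOCKED`, and both function names are hypothetical.

```python
# Hypothetical look-alike table: symbol or digit -> the letter it imitates.
SUBSTITUTIONS = str.maketrans(
    {"@": "a", "$": "s", "#": "h", "0": "o", "1": "i", "3": "e"}
)

# Mild stand-ins for a real profanity list.
BLOCKED = {"darn", "heck"}


def naive_match(post: str) -> bool:
    """Exact keyword match, with no normalization."""
    return any(word in BLOCKED for word in post.lower().split())


def normalized_match(post: str) -> bool:
    """Map look-alike symbols back to letters before matching."""
    cleaned = post.lower().translate(SUBSTITUTIONS)
    return any(word in BLOCKED for word in cleaned.split())


post = "what the #eck"
print(naive_match(post))       # prints False: the disguised word slips through
print(normalized_match(post))  # prints True: normalization exposes it
```

Normalization is not a complete fix. A table like this also rewrites legitimate digits and symbols, which can create new false positives, and users can simply move on to substitutions the table does not cover.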


Computer-run moderation and social media

There are instances where auto moderation filters may not be compatible with certain social media channels. For instance, Instagram and Twitter’s default posting settings do not include a ‘Hide’ or ‘Remove’ feature for user posts. Likewise, mentions and messages on Twitter cannot be removed. This makes automated UGC filtering and hiding useless on these channels.
Despite the limitations of AI in social media moderation, its role in monitoring millions of user posts remains a relevant part of improving user experience and protecting users from disturbing text, images, and videos. Facebook, Instagram, Twitter, and YouTube can use data collected by AI to take a closer look at user posts and comments and create a solid basis for how moderation guidelines can be enhanced.

Is AI moderation a waste of time and money?

The answer to that is no. AI moderation may have its limitations, but its contributions still greatly benefit brands online. When moderation is automated, or done with the aid of AI technology, it lessens the workload of human content moderators. Why this matters is illustrated by how Facebook and YouTube faced controversy and backlash after failing to provide ample support and care for their human moderators.
Due to the massive amount of disturbing content they had to monitor on a daily basis, the stress and trauma eventually took a toll on the moderators’ well-being and on the overall brand protection strategy of the two social media giants. At present, violent content is rampant on Facebook and on YouTube. But when auto moderation and human moderation work hand in hand, the complementary roles of both types of moderation can cover areas where the brand may be lacking in keeping track of user-generated content.

An efficient content moderation service is a harmonized combination of auto moderation and human moderation services

Truth be told, a brand cannot have a foolproof defense against unwanted user-generated content by depending on a single type of moderation. The secret to a well-oiled moderation process is finding the right balance between AI-powered monitoring of user-generated content and human content moderation.
Moderation powered by technology enables quick and continuous content moderation, especially for brands that regularly handle large volumes of end-user content. With the aid of human moderation, the moderation done by AI can be made more accurate and adjusted based on the user’s intent and purpose for sharing a text, image, or video.