9 Sneaky Ways People Bypass Auto-Moderation

Updated

August 20, 2022

Written by

Stephanie Walker

Moderators are tasked with filtering thousands of pieces of user-generated content (UGC) every day. In practice, the volume of UGC outstrips moderator resources more often than anticipated. Automated moderation (also known as AI moderation) was created for this reason: it helps human moderators get more work done by reducing the amount of content they need to review themselves.

What is Auto-Moderation?

Auto-moderation is powered by technology and designed to boost an online community’s moderation processes. It is a faster alternative to how human moderators check user content. This type of moderation can be programmed to meet various moderation needs and demands by setting up algorithms and base models that define the content that the software should monitor on a regular basis.

How Exactly Does Auto-Moderation Work?

With "automated" being the operative word here, the process of scanning, filtering, and deleting text, images, and videos that contain graphic or offensive content is carried out based on a predetermined set of guidelines or keywords. Teaching a machine-operated moderator which content to act on often involves a process called machine learning. In machine learning, a model learns patterns from a collection of labeled examples (otherwise known as a data set) so that it can solve a particular problem, or, in this case, moderate a specific type of content.
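As a rough illustration of how a moderation model might learn from a data set, the sketch below trains a very small naive Bayes text classifier in plain Python. The technique, the training sentences, and the labels are all assumptions chosen for illustration; the article does not name a specific algorithm, and a production system would rely on far more data and a more capable model.

```python
import math
from collections import Counter

# Tiny made-up training set of (text, label) pairs. Purely illustrative.
TRAINING_DATA = [
    ("you are an idiot", "offensive"),
    ("shut up you idiot", "offensive"),
    ("what a stupid idiot", "offensive"),
    ("thanks for the helpful guide", "ok"),
    ("great post thanks", "ok"),
    ("you are very helpful", "ok"),
]

def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING_DATA)
print(classify("you idiot", *model))             # offensive
print(classify("thanks for the guide", *model))  # ok
```

The model never receives a list of banned words. It infers which words signal each label from the examples, which is also why its quality depends heavily on the data set it is given.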

Natural language processing, also known as NLP, is another method that can be used to train AI to moderate user-generated content. This method enables the AI to interpret the meaning and context of a piece of text rather than simply matching individual words. Whitelisting, blacklisting, and the blocking of specific keywords are three of the most common functions found in machine-powered moderation.
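The whitelisting and blacklisting functions mentioned above can be sketched in a few lines of Python. The word lists and function names here are hypothetical placeholders, and the matching rule (a blocked term appearing anywhere inside a word, unless the whole word is allowlisted) is just one possible design.

```python
import re

# Hypothetical word lists; a real deployment would maintain far larger ones.
BLOCKLIST = {"darn", "heck"}   # mild placeholder "banned" terms
ALLOWLIST = {"heckler"}        # words accepted even though they contain a banned term

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())

def find_blocked_words(text: str) -> list[str]:
    """Return words containing a blocked term, skipping allowlisted words."""
    hits = []
    for word in tokenize(text):
        if word in ALLOWLIST:
            continue
        if any(term in word for term in BLOCKLIST):
            hits.append(word)
    return hits

def moderate(text: str) -> str:
    """Return 'reject' if any blocked word appears, otherwise 'approve'."""
    return "reject" if find_blocked_words(text) else "approve"

print(moderate("What the heck"))     # reject
print(moderate("The heckler left"))  # approve
```

Without the allowlist entry, "heckler" would be rejected because it contains "heck". That kind of accidental match is one reason keyword lists need ongoing maintenance.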

Beyond the general workings of auto-moderation, it also pays to understand what profanity filters are.

How does a profanity filter work?

Simply put, a profanity filter is a software program that screens user-generated content and removes profanity on online platforms such as social media, online communities, forums, and online marketplaces. Employing profanity filters allows administrators or brand owners to establish a set of rules and policies that define which words, or which types of content in general, the filter should scan for and monitor. While most profanity filters are driven by smart algorithms, some are still backed by human operators, making them an ideal addition to a brand’s initiative to reduce its users’ exposure to profanity and other repulsive remarks online.

The Future of Content Moderation

Indeed, auto-moderation can accomplish a great deal in terms of speed and workload alone. Even so, should we really believe that auto-moderation represents the "future of moderating content"?

Detecting malicious and disturbing content in a jiffy may be among the strengths of AI moderation, but it still falls short of the complexity, accuracy, and flexibility of comprehension that only human operators possess. As opposed to human-powered content moderation, AI moderators require a base model to function. The base model contains references to a context or group of keywords that the auto moderator is assigned to monitor and filter, along with the outcomes that a brand or client wishes to predict.

The base model needs to be clearly defined; otherwise, the AI moderator will not function as well as needed. If an AI moderator is created to detect and moderate profanities, its base model should contain an extensive list of swear words and other information related to vulgarity.

AI moderation may be big on speed, but it does have its limitations.

These limitations of AI make it susceptible to instances of being bypassed by end-users.

Specifically, users can mask their intent to slip past profanity filters, moderation algorithms, and guidelines. AI algorithms lack the multi-dimensional understanding that people bring to a conversation: technology-driven moderation functions only according to how it was programmed and trained. In particular, AI moderation tools that lack sufficient semantic analysis are less vigilant and less accurate in detecting the subtle ways end-users insinuate disturbing topics in conversations.

An AI moderator cannot venture into unfamiliar territory or recognize the various human intentions behind a specific piece of content. Unless the base model contains answers to the whys and hows behind the user’s content, opportunistic community members will continue to deceive AI moderators with covert techniques. At the same time, posts that are not intentionally offensive or malicious but contain terminology related to sensitive topics may end up being rejected repeatedly.

A good example of this is an initiative against the spread of extremist ideas. Clearly, the intention is good, but since machine-operated moderation focuses greatly on keywords and base models, it will not be able to look past the terms "extremist" or "extremism". The end result is that the content will be banned or flagged as inappropriate even though the user’s intention is not to offend anyone.
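The false positive described above is easy to reproduce with a keyword-only check. In this Python sketch, the term list and both example posts are invented for illustration. The filter looks only at which words appear, so it treats a post opposing extremism the same way as one promoting it.

```python
import re

# Hypothetical list of sensitive terms a keyword-only filter might watch for.
SENSITIVE_TERMS = {"extremist", "extremism"}

def keyword_flag(text: str) -> bool:
    """Flag the text if any sensitive term appears, ignoring context."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & SENSITIVE_TERMS)

# Invented examples: one awareness campaign, one genuinely harmful post.
campaign_post = "Join our workshop on protecting teens from extremist recruitment."
recruitment_post = "Extremist group seeking new members, message me."

print(keyword_flag(campaign_post))     # True, although the intent is benign
print(keyword_flag(recruitment_post))  # True
```

Telling the two posts apart requires some notion of intent or context, which is where semantic analysis or a human reviewer comes in.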

Clever Ways to Avoid Auto-Moderation and Profanity Filters

Users are becoming increasingly savvy as a result of today's technology, and they are learning how to bypass swear filters to carry out their malicious intentions online. Some go further and learn how to turn off profanity filters entirely to turn the tables to their advantage. Protecting a community against trolls and hostile users starts with knowing how these filters get bypassed. Bypassing auto-moderation filters can take various forms, such as:

1. Using symbols and numbers in place of letters when spelling out curse words. If a post blatantly contains swear words or cursing, the auto-moderator will automatically delete it. However, if offensive language is spelled with characters such as "$," "@," or "#," a filter that matches exact keywords may not catch it.

2. Employing a method called "cloaking" on social media, where links to malicious content, including pornography, bogus miracle fitness supplements, and earn-money-quick scams, are disguised as safe, non-violating, and user-friendly ads. These links are carefully cloaked so that when Facebook’s moderation system follows the link, it is redirected to a safe, informative website that adheres to its ad policies. Meanwhile, unsuspecting Facebook users are taken to pages containing inappropriate content.

3. Adding text to images or blurring explicit details. This can keep AI moderation from detecting inappropriate depictions in images posted by users who violate community guidelines.

4. Live streaming offensive content. Such videos are not moderated immediately if the moderation process in use does not support real-time monitoring of user-generated content.

5. Modifying the metadata of images to confuse or trick moderation bots into interpreting the image as something appropriate or compliant with the website’s posting guidelines.

6. Creating a fake account with a false birthdate, which lets minors conceal their real age and view disturbing videos on adults-only (R-18) websites. Age restrictions are easily circumvented in this manner, exposing adolescents to highly graphic videos online.

7. Using clever product placement to conceal content or products that may not be appropriate to the general audience from auto moderators. While the said product may be hidden from moderation algorithms, it is still made visible enough for the viewers to take notice. Briefly flashing or showing the product onscreen imprints the product’s name into the subconscious minds of the viewers, thus driving higher attention to the product without the computerized moderation system detecting it.

8. Posting under the guise of a prominent organization, social media personality, or widely known service provider. Doing so instantly earns the attention and trust of fellow community members. Some auto moderators may not be programmed to detect fraudulent profiles or scammers, leaving other members easy prey for phishing emails and identity theft.

9. Joining a brand’s online community for the purpose of defaming it and its supporters through online trolling. The more vigilant an online community is in monitoring online trolls, the smarter and more creative trolls become in executing their schemes. There are trolls that create multiple fake profiles or accounts to avoid being detected. Meanwhile, some online trolls may conceal their text through a code or filter their images to prevent their posts from being detected and deleted by the site’s AI moderator.
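The first technique in the list, swapping letters for symbols and numbers, is one that a filter can partly counter by normalizing text before matching. The Python sketch below is a minimal illustration under assumed inputs: the substitution map and blocklist are hypothetical and deliberately tiny, and masking characters such as "#" are not handled at all.

```python
import re

# Hypothetical substitution map; real evasion attempts use many more variants.
SUBSTITUTIONS = str.maketrans({"@": "a", "$": "s", "0": "o", "1": "i", "3": "e"})
BLOCKLIST = {"stupid", "loser"}   # mild placeholder terms

def normalize(text: str) -> str:
    """Lowercase the text and undo a few common symbol-for-letter swaps."""
    return text.lower().translate(SUBSTITUTIONS)

def naive_filter(text: str) -> bool:
    """Match blocklisted words in the raw, lowercased text only."""
    return any(w in BLOCKLIST for w in re.findall(r"[a-z]+", text.lower()))

def normalized_filter(text: str) -> bool:
    """Match blocklisted words after normalizing obfuscated characters."""
    return any(w in BLOCKLIST for w in re.findall(r"[a-z]+", normalize(text)))

print(naive_filter("You are $tup1d"))       # False: the obfuscation slips through
print(normalized_filter("You are $tup1d"))  # True: caught after normalization
```

Normalization narrows this one gap, but it is an arms race. Every new substitution has to be added to the map, and the other techniques in the list (cloaked links, altered images, fake accounts) need entirely different defenses.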

Computer-Powered Moderation and Social Media

There are instances where auto-moderation filters may not be compatible with certain social media channels. For instance, Instagram and Twitter’s default posting settings do not include a "Hide" or "Remove" feature for user posts. Likewise, mentions and messages on Twitter cannot be removed. This makes automated UGC filtering and hiding ineffective on these channels.

Despite the limitations of AI in social media moderation, its role in monitoring millions of user posts remains a relevant part of improving the user experience and protecting users from disturbing text, images, and videos. Facebook, Instagram, Twitter, and YouTube can use data collected by AI to take a closer look at user posts and comments and create a solid basis for how moderation guidelines can be enhanced.

Is AI moderation a waste of time and money?

The answer to that is no.

While AI moderation has its limitations, its contributions still greatly benefit many brands online. Combining existing moderation processes with AI technology lessens the workload of human content moderators. When auto-moderation and human moderation work hand in hand, their complementary roles can fill the gaps where an online community, platform, or website falls short in keeping track of user-generated content.
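One common way to combine the two is to let the automated system decide only the cases it is confident about and send everything else to a person. The Python sketch below assumes a model that outputs a violation score between 0 and 1. The thresholds and names are hypothetical, and each platform would tune them to its own risk tolerance.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
AUTO_REJECT_AT = 0.90      # scores at or above this are rejected automatically
AUTO_APPROVE_BELOW = 0.20  # scores below this are approved automatically

@dataclass
class Decision:
    action: str   # "approve", "reject", or "human_review"
    score: float

def route(score: float) -> Decision:
    """Route a post based on a model's violation score between 0 and 1."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= AUTO_REJECT_AT:
        return Decision("reject", score)
    if score < AUTO_APPROVE_BELOW:
        return Decision("approve", score)
    # Uncertain cases go to a human moderator.
    return Decision("human_review", score)

for s in (0.05, 0.50, 0.95):
    print(s, route(s).action)
```

Widening the band between the two thresholds sends more content to human moderators and reduces automated mistakes; narrowing it does the opposite.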

An efficient content moderation service is a harmonized combination of auto-moderation and human moderation services.

Truth be told, a brand cannot build a foolproof defense against unwanted user-generated content by depending on a single type of moderation. The secret to a well-oiled moderation process is finding the right balance between AI and human-powered moderation.

Moderation powered by technology enables quick and continuous content moderation, especially for brands that regularly handle large volumes of end-user content. With the aid of human moderators, AI’s content checking process becomes more accurate and efficient.