Add perspectiveapi for moderation #406

Closed
19 commits
aeabd88
refactor(language detection): rename FORCE_ENGLISH to FORCE_LANGUAGE,…
Paillat-dev Nov 14, 2023
b5ef501
feat(moderation config): add /mod perspective_config command
Paillat-dev Nov 14, 2023
b47f38a
chore(dependencies): add aiolimiter for handling perspective api rate…
Paillat-dev Nov 14, 2023
49478db
feat(perspectiveapi): add Model class for perspectiveapi
Paillat-dev Nov 14, 2023
4e62487
feat(moderation): add get_moderation_service, get_language_detect_ser…
Paillat-dev Nov 14, 2023
b0956c3
feat(moderation): adapt code to handle both perspectiveapi and openai…
Paillat-dev Nov 14, 2023
7215cf1
Merge branch 'Kav-K:main' into google-perspective
Paillat-dev Nov 14, 2023
f2ae881
fix(moderations_service.py): change send_moderations_request method s…
Paillat-dev Nov 14, 2023
dbafafc
Merge branch 'google-perspective' of https://github.com/Paillat-dev/G…
Paillat-dev Nov 14, 2023
2a18e80
fix(moderations_service.py): add spoiler tags to moderated message co…
Paillat-dev Nov 14, 2023
e1ab81e
Remove spoiler tags that's not supposed to be in this branch
Paillat-dev Nov 14, 2023
722e381
fix(moderations_service.py): fix import formatting and add missing li…
Paillat-dev Nov 14, 2023
5178320
fix(perspective_model.py): remove "languages" from ANALYZE_REQUEST an…
Paillat-dev Nov 17, 2023
1209916
chore(perspective_model.py): remove unused code
Paillat-dev Nov 17, 2023
d517ba4
refactor(moderations_service.py): add docstring to ModerationModel cl…
Paillat-dev Nov 17, 2023
5bfe52a
style: Format code with black
Paillat-dev Nov 17, 2023
067863c
chore(AI-MODERATION.md): reorganize and improve readability of the AI…
Paillat-dev Nov 17, 2023
4e79f18
Merge branch 'Kav-K:main' into google-perspective
Paillat-dev Dec 31, 2023
41f846c
Merge branch 'main' into google-perspective
Paillat-dev Mar 13, 2024
86 changes: 86 additions & 0 deletions cogs/commands.py
@@ -320,6 +320,92 @@ async def config(
violence_graphic,
)

@add_to_group("mod")
@discord.slash_command(
name="perspective_config",
description="Configure the moderations service for the current guild. Lower # = more strict",
guild_ids=ALLOWED_GUILDS,
)
@discord.option(
name="type",
description="The type of moderation to configure",
required=True,
autocomplete=Settings_autocompleter.get_value_moderations,
)
@discord.option(
name="toxicity",
description="The threshold for toxicity",
required=False,
min_value=0,
max_value=1,
)
@discord.option(
name="severe_toxicity",
description="The threshold for severe toxicity",
required=False,
min_value=0,
max_value=1,
)
@discord.option(
name="identity_attack",
description="The threshold for identity attack",
required=False,
min_value=0,
max_value=1,
)
@discord.option(
name="insult",
description="The threshold for insult",
required=False,
min_value=0,
max_value=1,
)
@discord.option(
name="profanity",
description="The threshold for profanity",
required=False,
min_value=0,
max_value=1,
)
@discord.option(
name="threat",
description="The threshold for threat",
required=False,
min_value=0,
max_value=1,
)
@discord.option(
name="sexually_explicit",
description="The threshold for sexually explicit",
required=False,
min_value=0,
max_value=1,
)
@discord.guild_only()
async def perspective_config(
self,
ctx: discord.ApplicationContext,
type: str,
toxicity: float,
severe_toxicity: float,
identity_attack: float,
insult: float,
profanity: float,
threat: float,
sexually_explicit: float,
):
await self.moderations_cog.perspective_config_command(
ctx,
type,
toxicity,
severe_toxicity,
identity_attack,
insult,
profanity,
threat,
sexually_explicit,
)

#
# GPT commands
#
123 changes: 113 additions & 10 deletions cogs/moderations_service_cog.py
@@ -4,14 +4,19 @@
from sqlitedict import SqliteDict

from services.environment_service import EnvService
from services.moderations_service import Moderation, ThresholdSet
from services.moderations_service import (
Moderation,
ThresholdSet,
PerspectiveThresholdSet,
)

moderation_service = EnvService.get_moderation_service()
MOD_DB = None
try:
print("Attempting to retrieve the General and Moderations DB")
MOD_DB = SqliteDict(
EnvService.find_shared_file("main_db.sqlite"),
tablename="moderations",
tablename="moderations" if moderation_service == "openai" else "perspective",
autocommit=True,
)
except Exception as e:
@@ -29,7 +34,7 @@ def __init__(
model,
):
super().__init__()
self.bot = bot
self.bot: discord.Bot = bot
self.usage_service = usage_service
self.model = model

@@ -41,8 +46,18 @@ def __init__(
self.moderations_launched = []

# Defaults
self.default_warn_set = ThresholdSet(0.01, 0.05, 0.05, 0.91, 0.1, 0.45, 0.1)
self.default_delete_set = ThresholdSet(0.26, 0.26, 0.1, 0.95, 0.03, 0.85, 0.4)
if moderation_service == "openai":
self.default_warn_set = ThresholdSet(0.01, 0.05, 0.05, 0.91, 0.1, 0.45, 0.1)
self.default_delete_set = ThresholdSet(
0.26, 0.26, 0.1, 0.95, 0.03, 0.85, 0.4
)
else:
self.default_warn_set = PerspectiveThresholdSet(
0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6
)
self.default_delete_set = PerspectiveThresholdSet(
0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8
)

@discord.Cog.listener()
async def on_ready(self):
@@ -51,7 +66,7 @@ async def on_ready(self):
self.get_or_set_warn_set(guild.id)
self.get_or_set_delete_set(guild.id)
await self.check_and_launch_moderations(guild.id)
print("The moderation service is ready.")
print(f"The moderation service is ready, using {moderation_service}")

def check_guild_moderated(self, guild_id):
"""Given guild id, return bool of moderation status"""
@@ -126,9 +141,12 @@ async def check_and_launch_moderations(self, guild_id, alert_channel_override=No
)
warn_set_nums = self.get_or_set_warn_set(guild_id).values()
delete_set_nums = self.get_or_set_delete_set(guild_id).values()
warn_set = ThresholdSet(*warn_set_nums)
delete_set = ThresholdSet(*delete_set_nums)

if moderation_service == "openai":
warn_set = ThresholdSet(*warn_set_nums)
delete_set = ThresholdSet(*delete_set_nums)
else:
warn_set = PerspectiveThresholdSet(*warn_set_nums)
delete_set = PerspectiveThresholdSet(*delete_set_nums)
Moderation.moderation_tasks[guild_id] = asyncio.ensure_future(
Moderation.process_moderation_queue(
Moderation.moderation_queues[guild_id],
@@ -264,7 +282,11 @@ async def config_command(
violence_graphic,
]
await ctx.defer(ephemeral=True)

if moderation_service != "openai":
return await ctx.respond(
"This command is not available for the perspective moderation service, please use /mod perspective_config instead",
ephemeral=True,
)
# Case for printing the current config
if not any(all_args) and config_type != "reset":
await ctx.respond(
@@ -324,6 +346,87 @@ async def config_command(
self.set_warn_set(ctx.guild_id, self.default_warn_set)
await self.restart_moderations_service(ctx)

async def perspective_config_command(
self,
ctx: discord.ApplicationContext,
config_type: str,
toxicity,
severe_toxicity,
identity_attack,
insult,
profanity,
threat,
sexually_explicit,
):
"""command handler for assigning threshold values for warn or delete"""
all_args = [
toxicity,
severe_toxicity,
identity_attack,
insult,
profanity,
threat,
sexually_explicit,
]
await ctx.defer(ephemeral=True)
if moderation_service != "perspective":
return await ctx.respond(
"This command is not available for the openai moderation service, please use /mod config instead",
ephemeral=True,
)
# Case for printing the current config
if not any(all_args) and config_type != "reset":
await ctx.respond(
ephemeral=True,
embed=await self.build_moderation_settings_embed(
config_type,
self.get_or_set_warn_set(ctx.guild_id)
if config_type == "warn"
else self.get_or_set_delete_set(ctx.guild_id),
),
)
return

if config_type == "warn":
# Fall back to the stored value for any option that was not provided
warn_set = self.get_or_set_warn_set(ctx.guild_id)

new_warn_set = PerspectiveThresholdSet(
toxicity if toxicity else warn_set["toxicity"],
severe_toxicity if severe_toxicity else warn_set["severe_toxicity"],
identity_attack if identity_attack else warn_set["identity_attack"],
insult if insult else warn_set["insult"],
profanity if profanity else warn_set["profanity"],
threat if threat else warn_set["threat"],
sexually_explicit
if sexually_explicit
else warn_set["sexually_explicit"],
)
self.set_warn_set(ctx.guild_id, new_warn_set)
await self.restart_moderations_service(ctx)

elif config_type == "delete":
delete_set = self.get_or_set_delete_set(ctx.guild_id)

new_delete_set = PerspectiveThresholdSet(
toxicity if toxicity else delete_set["toxicity"],
severe_toxicity if severe_toxicity else delete_set["severe_toxicity"],
identity_attack if identity_attack else delete_set["identity_attack"],
insult if insult else delete_set["insult"],
profanity if profanity else delete_set["profanity"],
threat if threat else delete_set["threat"],
sexually_explicit
if sexually_explicit
else delete_set["sexually_explicit"],
)
self.set_delete_set(ctx.guild_id, new_delete_set)
await self.restart_moderations_service(ctx)

elif config_type == "reset":
self.set_delete_set(ctx.guild_id, self.default_delete_set)
self.set_warn_set(ctx.guild_id, self.default_warn_set)
await self.restart_moderations_service(ctx)

async def moderations_test_command(
self, ctx: discord.ApplicationContext, prompt: str
):
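The `perspective_config_command` handler above falls back to the stored value with `x if x else stored[...]`, which also treats an explicit `0` as "not provided". A minimal sketch of a merge that tells the two cases apart, assuming omitted options arrive as `None`; `merge_thresholds` and `PERSPECTIVE_KEYS` are hypothetical names for illustration, not part of this PR:

```python
from typing import Dict, Optional

# Attribute names taken from the /mod perspective_config options in this PR.
PERSPECTIVE_KEYS = (
    "toxicity", "severe_toxicity", "identity_attack", "insult",
    "profanity", "threat", "sexually_explicit",
)


def merge_thresholds(
    stored: Dict[str, float], **updates: Optional[float]
) -> Dict[str, float]:
    """Return a copy of `stored` with every non-None update applied."""
    unknown = set(updates) - set(PERSPECTIVE_KEYS)
    if unknown:
        raise KeyError(f"unknown threshold(s): {sorted(unknown)}")
    merged = dict(stored)
    for key, value in updates.items():
        # 0.0 is a valid threshold, so test for None instead of truthiness.
        if value is not None:
            merged[key] = value
    return merged


stored = {key: 0.6 for key in PERSPECTIVE_KEYS}
merged = merge_thresholds(stored, toxicity=0.0, insult=None)
```

Here `merged["toxicity"]` becomes `0.0` while `merged["insult"]` keeps the stored `0.6`, and `stored` itself is left unchanged.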
21 changes: 12 additions & 9 deletions cogs/text_service_cog.py
@@ -43,7 +43,7 @@
USER_KEY_DB = EnvService.get_api_db()
CHAT_BYPASS_ROLES = EnvService.get_bypass_roles()
PRE_MODERATE = EnvService.get_premoderate()
FORCE_ENGLISH = EnvService.get_force_english()
FORCE_LANGUAGE = EnvService.get_force_language()
BOT_TAGGABLE = EnvService.get_bot_is_taggable()
CHANNEL_CHAT_ROLES = EnvService.get_channel_chat_roles()
BOT_TAGGABLE_ROLES = EnvService.get_gpt_roles()
@@ -701,6 +701,17 @@ async def on_message(self, message: discord.Message):
if message.type != discord.MessageType.default:
return

# Language check
if FORCE_LANGUAGE and len(message.content.split(" ")) > 3:
if not await Moderation.force_language_and_respond(
message.content,
self.LANGUAGE_DETECT_STARTER_TEXT,
message,
language=FORCE_LANGUAGE,
):
await message.delete()
return

# Moderations service is done here.
if (
hasattr(message, "guild")
@@ -723,14 +734,6 @@ async def on_message(self, message: discord.Message):
Moderation(message, timestamp)
)

# Language check
if FORCE_ENGLISH and len(message.content.split(" ")) > 3:
if not await Moderation.force_english_and_respond(
message.content, self.LANGUAGE_DETECT_STARTER_TEXT, message
):
await message.delete()
return

amended_message = message.content

# Retain only image attachments if in a regular conversation
66 changes: 43 additions & 23 deletions detailed_guides/AI-MODERATION.md
@@ -1,23 +1,43 @@
### Automatic AI Moderation
***
`/mod set status:on` - Turn on automatic chat moderations.

`/mod set status:off` - Turn off automatic chat moderations

`/mod set status:on alert_channel_id:<CHANNEL ID>` - Turn on moderations and set the alert channel to the channel ID you specify in the command.

`/mod config type:<warn/delete> hate:# hate_threatening:# self_harm:# sexual:# sexual_minors:# violence:# violence_graphic:#`
- Set the moderation thresholds of the bot for the specific type of moderation (`warn` or `delete`). You can view the thresholds by typing just `/mod config type:<warn/delete>` without any other parameters. You don't have to set all of them, you can just set one or two items if you want. For example, to set the hate threshold for warns, you can type `/mod config type:warn hate:0.2`
- Lower values are more strict, higher values are more lenient. There are default values that I've fine tuned the service with for a general server.

The bot needs Administrative permissions for this, and you need to set `MODERATIONS_ALERT_CHANNEL` to the channel ID of a desired channel in your .env file if you want to receive alerts about moderated messages.

This uses the OpenAI Moderations endpoint to check for messages; requests are only sent to the moderations endpoint at a MINIMUM request gap of 0.5 seconds, to ensure you don't get blocked and to ensure reliability.

The bot uses numerical thresholds to determine whether a message is toxic or not, and I have manually tested and fine tuned these thresholds to a point that I think is good, please open an issue if you have any suggestions for the thresholds!

There are two thresholds for the bot, there are instances in which the bot will outright delete a message and an instance where the bot will send a message to the alert channel notifying admins and giving them quick options to delete and timeout the user.

To set a certain role immune to moderations, add the line `CHAT_BYPASS_ROLES="Role1,Role2,etc"` to your `.env` file.

If you want to have the bot pre-moderate things sent to commands like /gpt ask, /gpt edit, /dalle draw, etc, you can set `PRE_MODERATE="True"` in the `.env` file.
# Automatic AI Moderation

`/mod set status:on` - Turn on automatic chat moderations.

`/mod set status:off` - Turn off automatic chat moderations.

`/mod set status:on alert_channel_id:<CHANNEL ID>` - Turn on moderations and set the alert channel to the channel ID you specify in the command.

## Moderation Service Configuration

You can choose between two moderation services: `OpenAI` and `PerspectiveAPI`. Each service has its own set of commands and thresholds for moderation.

**OpenAI Service:**
- `/mod config type:<warn/delete> hate:# hate_threatening:# self_harm:# sexual:# sexual_minors:# violence:# violence_graphic:#`
- Configure the moderation thresholds using OpenAI's Moderations endpoint.
- Example: `/mod config type:warn hate:0.2` sets the hate threshold for warnings.
- Thresholds: Lower values are more strict, higher values are more lenient.

**PerspectiveAPI Service:**
- `/mod perspective_config type:<warn/delete> toxicity:# severe_toxicity:# identity_attack:# insult:# profanity:# threat:# sexually_explicit:#`
- Use this command to set thresholds using PerspectiveAPI's language analysis tools.
- Example: `/mod perspective_config type:warn toxicity:0.7` sets the toxicity threshold for warnings.
- Thresholds: Lower values are more strict, higher values are more lenient.
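
As a rough illustration of how a warn set and a delete set could interact, the sketch below classifies one message from its per-attribute scores. The `classify` helper and the `>=` comparison are assumptions made for the example; only the default values (0.6 for warn, 0.8 for delete) come from this PR.

```python
from typing import Dict


def classify(
    scores: Dict[str, float], warn: Dict[str, float], delete: Dict[str, float]
) -> str:
    """Return "delete", "warn", or "ok" for one message's attribute scores."""
    # Check the stricter action first so a message is never only warned
    # about when it also crosses a delete threshold.
    if any(scores.get(key, 0.0) >= limit for key, limit in delete.items()):
        return "delete"
    if any(scores.get(key, 0.0) >= limit for key, limit in warn.items()):
        return "warn"
    return "ok"


warn_set = {"toxicity": 0.6, "insult": 0.6}
delete_set = {"toxicity": 0.8, "insult": 0.8}
result = classify({"toxicity": 0.7, "insult": 0.1}, warn_set, delete_set)
```

With these values `result` is `"warn"`: the toxicity score of 0.7 crosses the warn threshold but not the delete threshold. Lowering a threshold makes the corresponding action trigger on more messages, which is why lower values are stricter.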

**Choosing the Moderation Service:**
- `MODERATION_SERVICE`: Set to either `openai` or `perspective`. Defaults to `openai`.
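
A minimal sketch of how such a setting could be read, assuming the value comes from an environment mapping. `resolve_moderation_service` is a hypothetical stand-in and not the PR's `EnvService.get_moderation_service`, whose implementation is not shown here; the case-insensitive matching and the error on unknown values are also assumptions.

```python
import os
from typing import Mapping

VALID_SERVICES = ("openai", "perspective")


def resolve_moderation_service(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured moderation service, defaulting to "openai"."""
    raw = env.get("MODERATION_SERVICE", "openai").strip().lower()
    if raw not in VALID_SERVICES:
        raise ValueError(
            f"MODERATION_SERVICE must be one of {VALID_SERVICES}, got {raw!r}"
        )
    return raw


default_service = resolve_moderation_service({})
chosen_service = resolve_moderation_service({"MODERATION_SERVICE": "perspective"})
```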

## Language Detection and Force Language Feature

Language detection is managed separately from the moderation service.
- `FORCE_LANGUAGE`: Set this to force the chat to speak in a specific language. Any messages that are not in the specified language will be deleted. Use a language code from the list below.
Supported languages include Arabic (ar), Chinese (zh), Czech (cs), Dutch (nl), English (en), French (fr), German (de), Hindi (hi), Hinglish (hi-Latn), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Polish (pl), Portuguese (pt), Russian (ru), Spanish (es), Swedish (sv).
- `LANGUAGE_DETECT_SERVICE`: This overrides the default language detection service. It can be set to either the `MODERATION_SERVICE` or a different one. Choose from `openai`, `perspective`. **Please note that `openai` only supports English and `perspective` supports all languages listed above.**
- `FORCE_ENGLISH`: An alias for setting `FORCE_LANGUAGE="en"`.
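
The relationship between `FORCE_LANGUAGE` and its `FORCE_ENGLISH` alias can be sketched as follows. `resolve_force_language` is a hypothetical helper; letting `FORCE_LANGUAGE` take precedence and accepting `"True"` as the truthy value for `FORCE_ENGLISH` are assumptions, since the PR's environment parsing is not shown here.

```python
from typing import Mapping, Optional

# Language codes from the supported-languages list above.
SUPPORTED_LANGUAGES = {
    "ar", "zh", "cs", "nl", "en", "fr", "de", "hi", "hi-Latn",
    "id", "it", "ja", "ko", "pl", "pt", "ru", "es", "sv",
}


def resolve_force_language(env: Mapping[str, str]) -> Optional[str]:
    """Return the language code to enforce, or None if nothing is enforced."""
    language = env.get("FORCE_LANGUAGE", "").strip()
    if language:
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported FORCE_LANGUAGE: {language!r}")
        return language
    # Assumed alias handling: FORCE_ENGLISH="True" behaves like FORCE_LANGUAGE="en".
    if env.get("FORCE_ENGLISH", "").strip().lower() == "true":
        return "en"
    return None


forced = resolve_force_language({"FORCE_ENGLISH": "True"})
```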

## Additional Configuration

- The bot requires Administrative permissions for full functionality.
- Set `MODERATIONS_ALERT_CHANNEL` in your `.env` file to the channel ID where you want to receive alerts about moderated messages.
- Requests to the moderation endpoint are sent at a MINIMUM gap of 0.5 seconds for reliability and to avoid blocking.
- To exempt certain roles from moderation, add `CHAT_BYPASS_ROLES="Role1,Role2,etc"` to your `.env` file.
- Enable pre-moderation for commands like /gpt ask, /gpt edit, /dalle draw, etc., with `PRE_MODERATE="True"` in the `.env` file. Pre-moderation always uses `openai`, regardless of the `MODERATION_SERVICE` setting.
- `MAX_PERSPECTIVE_REQUESTS_PER_SECOND`: Adjust only if you receive a rate limit increase from Google.
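
The minimum request gap mentioned above can be sketched with the standard library alone. This is not the PR's implementation (which adds `aiolimiter` for Perspective rate limiting); `MinGapLimiter` and `demo` are hypothetical names, and the demo uses a short 0.05 s gap so it finishes quickly.

```python
import asyncio
import time
from typing import List, Optional


class MinGapLimiter:
    """Ensure at least `min_gap` seconds pass between consecutive requests."""

    def __init__(self, min_gap: float) -> None:
        self.min_gap = min_gap
        self._lock = asyncio.Lock()
        self._last: Optional[float] = None

    async def wait(self) -> None:
        # The lock serializes callers so each one measures its gap
        # against the request that actually went out before it.
        async with self._lock:
            if self._last is not None:
                remaining = self.min_gap - (time.monotonic() - self._last)
                if remaining > 0:
                    await asyncio.sleep(remaining)
            self._last = time.monotonic()


async def demo(min_gap: float = 0.05, count: int = 3) -> List[float]:
    limiter = MinGapLimiter(min_gap)
    stamps: List[float] = []

    async def send() -> None:
        await limiter.wait()
        stamps.append(time.monotonic())  # stand-in for the real API request

    await asyncio.gather(*(send() for _ in range(count)))
    return stamps


stamps = asyncio.run(demo())
gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

Each entry in `gaps` should be roughly `min_gap` or larger, even though all three requests were started at the same time.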