-
Notifications
You must be signed in to change notification settings - Fork 6
Description
Is your feature request related to a problem? Please describe.
When running MasterKey locally, the "Attacker" model (the one generating the jailbreaks) often needs to be an uncensored model (e.g., dolphin-llama3, wizardlm), whereas the "Target" model might be a standard aligned model (e.g., llama3, mistral).
Currently, it is difficult to configure the framework to use two different local model names that run on the same local server or different local ports.
Describe the solution you'd like
Please add a configuration option to explicitly set the model name for the attacker and the target separately, which is passed directly to the API call.
Example config structure:
{
"attacker": {
"model_name": "dolphin-llama3",
"api_base": "http://localhost:11434/v1"
},
"target": {
"model_name": "llama3",
"api_base": "http://localhost:11434/v1"
}
}
Additional context
Using standard models (like GPT-4 or standard Llama-3) as the 'Attacker' often fails locally because they refuse to generate the jailbreak prompts due to their own safety alignment. Supporting uncensored local models as attackers is essential for the framework's effectiveness in a local setup.