Human Contribution Measurement

Overview

With the growing prevalence of generative artificial intelligence (AI), an increasing amount of content is no longer exclusively generated by humans but by generative AI models with human guidance. This shift presents notable challenges for the delineation of originality due to the varying degrees of human contribution in AI-assisted works. This study raises the research question of measuring human contribution in AI-assisted content generation and introduces a framework to address this question that is grounded in information theory. By calculating mutual information between human input and AI-assisted output relative to self-information of AI-assisted output, we quantify the proportional information contribution of humans in content generation. Our experimental results demonstrate that the proposed measure effectively discriminates between varying degrees of human contribution across multiple creative domains. We hope that this work lays a foundation for measuring human contributions in AI-assisted content generation in the era of generative AI.
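
As a rough formalization (notation ours, not quoted verbatim from the manuscript): let X denote the human input and Y the AI-assisted output. The reported proportion is the mutual information between X and Y relative to the self-information of Y, which for a single generation can be approximated pointwise under the generative model's token probabilities as

$$\frac{I(X;Y)}{H(Y)} \approx \frac{\log p(Y \mid X) - \log p(Y)}{-\log p(Y)}$$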

Repo Contents

  • src: source code to reproduce results in the manuscript.
  • script: scripts to run the experiments.
  • data_code: source code to prepare the dataset.
  • data_new: directory to store the dataset.

System Requirements

Hardware Requirements

To run this package, the following hardware specifications are recommended:

A standard computer with a stable and reliable internet connection is required to access the OpenAI API.

The package has been tested on a machine with the following specifications:

  • Memory: 216GB
  • Processor: AMD EPYC 7V13 64-Core Processor

Software Requirements

The package has been tested and verified to work on

  • Linux: Ubuntu 22.04.

It is recommended to use this operating system for optimal compatibility.

Before installing the required Python dependencies and running the source code, ensure that you have the following software installed:

  • Docker.

Installation Guide

Requirements

We use Docker to manage the experimental environments. Pull the following Docker image to your local device.

docker pull yjw1029/torch:2.0-llm-v9

For experiments with Meta-Llama-3-8B-Instruct, fastchat needs to be reinstalled:

pip uninstall -y -q fschat
pip install --upgrade git+https://github.com/lm-sys/FastChat

For experiments with Mixtral-8x7B-Instruct-v0.1, vllm needs to be reinstalled:

pip install vllm==0.2.7

Since this package requires access to the OpenAI API, you will need to register an account and obtain your OPENAI_API_KEY. Please follow the instructions provided in the OpenAI documentation for registration and obtaining API keys: OpenAI Documentation. The code has been tested with OpenAI services. Set up your OpenAI API key in src/config/gpt35.yaml.

Anthropic Claude and Google Gemini API keys are also supported. They can be set up in src/config/claude.yaml and src/config/gemini.yaml, respectively.

We also conduct experiments with Meta-Llama-3-8B-Instruct and Mixtral-8x7B-Instruct-v0.1. Please apply for Llama-3 access on the official Meta website and the Hugging Face repo, then set your Hugging Face access token before running experiments.

export HUGGING_FACE_HUB_TOKEN=[Your Hugging Face Token]

Download Dataset

Due to copyright issues, we cannot provide the dataset. Please obtain access to the original datasets and download them into the raw_data_new folder. The sources of the original datasets are:

Then run the following scripts to sample our experimental datasets.

mkdir data_new

python data_code/process_news.py
python data_code/process_patent.py
python data_code/process_poem.py
python data_code/process_paper.py

Finally, generate the summary and subject of the text content.

python data_code/generate_summary_news.py
python data_code/generate_summary_patent.py
python data_code/generate_summary_poem.py
python data_code/generate_summary_paper.py

Demo

Response Generation

bash script/generate.sh {data} {model} {time}

Parameters:

  • model: The model used for generating responses. The options include: ["claude", "gemini", "gpt35", "llama3_8b", "mixtral_8x7b"]
  • data: The dataset used for generating responses. The options include: ["news", "paper", "patent", "poem"]
  • time: The index for repeated experiments. The options include: [1, 2, 3, 4, 5]
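
For example, the following invocation generates responses on the news dataset with GPT-3.5 for the first run (parameter values are illustrative choices from the options above):

bash script/generate.sh news gpt35 1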

Human Contribution Evaluation (Generalization of Our Method)

bash script/evaluate.sh {data} {eval_model} {model} {time}

Parameters:

  • data: The dataset on which the evaluation is performed. The options include: ["news", "paper", "patent", "poem"]
  • eval_model: The evaluation model used to measure human contribution. The options include: ["llama3_8b", "mixtral_8x7b"]
  • model: The model whose responses are being evaluated. The options include: ["claude", "gemini", "gpt35", "llama3_8b", "mixtral_8x7b"]
  • time: The index for repeated experiments. This is used to distinguish between different runs of the same experiment. The options include: [1, 2, 3, 4, 5]
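
For example, to evaluate the GPT-3.5 responses on the news dataset with Llama-3-8B as the evaluation model (illustrative choices from the options above):

bash script/evaluate.sh news llama3_8b gpt35 1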

Human Contribution Evaluation - Controlled Experiments

Generate responses with varying lengths.

bash script/very_lens.sh {data} {model} {time}

Parameters:

  • model: The model used for generating responses. The options include: ["llama3_8b"]
  • data: The dataset used for generating responses. The options include: ["news", "paper", "patent", "poem"]
  • time: The index for repeated experiments. The options include: [1, 2, 3, 4, 5]
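
An illustrative invocation, using options from the list above:

bash script/very_lens.sh news llama3_8b 1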

Measure human contribution.

bash script/eval_lens.sh {data} {eval_model} {model} {time}

Parameters:

  • data: The dataset on which the evaluation is performed. The options include: ["news", "paper", "patent", "poem"]
  • eval_model: The evaluation model used to measure human contribution. The options include: ["llama3_8b"]
  • model: The model whose responses are being evaluated. The options include: ["llama3_8b"]
  • time: The index for repeated experiments. This is used to distinguish between different runs of the same experiment. The options include: [1, 2, 3, 4, 5]
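
For example, matching the generation command above:

bash script/eval_lens.sh news llama3_8b llama3_8b 1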

Human Annotation

Analyze human annotation and measured results.

bash script/annotation.sh

The distribution figure will be generated in ./figures.

Impact of Generative Model Temperature

Generate responses with varying temperatures.

bash script/temperature.sh {data} {model} {time} {temperature}

Parameters:

  • model: The model used for generating responses. The options include: ["llama3_8b"]
  • data: The dataset used for generating responses. The options include: ["news", "paper", "patent", "poem"]
  • time: The index for repeated experiments. The options include: [1, 2, 3, 4, 5]
  • temperature: The temperature used for generation
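
An illustrative invocation (the temperature value 0.7 is only an example):

bash script/temperature.sh news llama3_8b 1 0.7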

Measure human contribution.

bash script/eval_temperature.sh {data} {eval_model} {model} {time} {temperature}

Parameters:

  • data: The dataset on which the evaluation is performed. The options include: ["news", "paper", "patent", "poem"]
  • eval_model: The evaluation model used to measure human contribution. The options include: ["llama3_8b"]
  • model: The model whose responses are being evaluated. The options include: ["llama3_8b"]
  • time: The index for repeated experiments. This is used to distinguish between different runs of the same experiment. The options include: [1, 2, 3, 4, 5]
  • temperature: The temperature used for generation
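
For example, matching the generation command above:

bash script/eval_temperature.sh news llama3_8b llama3_8b 1 0.7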

Impact of Writing Style

Generate responses with varying writing styles.

bash script/style.sh {data} {model} {time}

Parameters:

  • model: The model used for generating responses. The options include: ["llama3_8b"]
  • data: The dataset used for generating responses. The options include: ["news"]
  • time: The index for repeated experiments. The options include: [1, 2, 3, 4, 5]
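
An illustrative invocation with the options above:

bash script/style.sh news llama3_8b 1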

Measure human contribution.

bash script/eval_style.sh {data} {eval_model} {model} {time}

Parameters:

  • data: The dataset on which the evaluation is performed. The options include: ["news"]
  • eval_model: The evaluation model used to measure human contribution. The options include: ["llama3_8b"]
  • model: The model whose responses are being evaluated. The options include: ["llama3_8b"]
  • time: The index for repeated experiments. This is used to distinguish between different runs of the same experiment. The options include: [1, 2, 3, 4, 5]
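
For example, matching the generation command above:

bash script/eval_style.sh news llama3_8b llama3_8b 1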

Resilience to Adaptive Attacks

Generate responses with adaptive attacks.

bash script/ada.sh {data} {model} {time}

Parameters:

  • model: The model used for generating responses. The options include: ["llama3_8b"]
  • data: The dataset used for generating responses. The options include: ["news", "paper", "patent", "poem"]
  • time: The index for repeated experiments. The options include: [1, 2, 3, 4, 5]
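
An illustrative invocation with the options above:

bash script/ada.sh poem llama3_8b 1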

Measure human contribution.

bash script/eval_ada.sh {data} {eval_model} {model} {time}

Parameters:

  • data: The dataset on which the evaluation is performed. The options include: ["news", "paper", "patent", "poem"]
  • eval_model: The evaluation model used to measure human contribution. The options include: ["llama3_8b"]
  • model: The model whose responses are being evaluated. The options include: ["llama3_8b"]
  • time: The index for repeated experiments. This is used to distinguish between different runs of the same experiment. The options include: [1, 2, 3, 4, 5]
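
For example, matching the generation command above:

bash script/eval_ada.sh poem llama3_8b llama3_8b 1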

Applications to Real-World AI-Assisted Generation

Generate responses using real-world AI-assisted prompts collected from the WildChat dataset.

bash script/app.sh {data} {model} {time}

Parameters:

  • model: The model used for generating responses. The options include: ["llama3_8b", "mixtral_8x7b"]
  • data: The dataset used for generating responses. The options include: ["assisting_creative", "editing_rewriting"]
  • time: The index for repeated experiments. The options include: [1, 2, 3, 4, 5]
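
An illustrative invocation with the options above:

bash script/app.sh assisting_creative mixtral_8x7b 1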

Measure human contribution.

bash script/eval_app.sh {data} {eval_model} {model} {time}

Parameters:

  • data: The dataset on which the evaluation is performed. The options include: ["assisting_creative", "editing_rewriting"]
  • eval_model: The evaluation model used to measure human contribution. The options include: ["llama3_8b", "mixtral_8x7b"]
  • model: The model whose responses are being evaluated. The options include: ["llama3_8b", "mixtral_8x7b"]
  • time: The index for repeated experiments. This is used to distinguish between different runs of the same experiment. The options include: [1, 2, 3, 4, 5]
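
For example, matching the generation command above:

bash script/eval_app.sh assisting_creative mixtral_8x7b mixtral_8x7b 1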

Calculate Threshold Based on the Results

bash script/cal_threshold.sh {eval_model}

Parameters:

  • eval_model: The evaluation model used to measure human contribution. The options include: ["llama3_8b", "mixtral_8x7b"]
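
For example:

bash script/cal_threshold.sh llama3_8b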

Human Contribution Estimation Without Human Input

bash script/estimate.sh {data} {eval_model} {model} {time}

Parameters:

  • data: The dataset on which the evaluation is performed. The options include: ["news", "paper", "patent", "poem"]
  • eval_model: The evaluation model used to measure human contribution. The options include: ["llama3_8b", "mixtral_8x7b"]
  • model: The model whose responses are being evaluated. The options include: ["claude", "gemini", "gpt35", "llama3_8b", "mixtral_8x7b"]
  • time: The index for repeated experiments. This is used to distinguish between different runs of the same experiment. The options include: [1, 2, 3, 4, 5]
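
An illustrative invocation with the options above:

bash script/estimate.sh patent mixtral_8x7b gpt35 2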

Multi-round Generation

Generate responses in different multi-round scenarios.

bash script/multi.sh {data} {model} {scenario}

Parameters:

  • data: The dataset used for generating responses. The options include: ["news"]
  • model: The model used for generating responses. The options include: ["llama3_8b"]
  • scenario: The scenario used for generating responses. The options include: [1, 2, 3, 4]
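
An illustrative invocation with the options above:

bash script/multi.sh news llama3_8b 2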

Measure human contribution.

bash script/eval_multi.sh {data} {eval_model} {scenario}

Parameters:

  • data: The dataset on which the evaluation is performed. The options include: ["news"]
  • eval_model: The evaluation model used to measure human contribution. The options include: ["llama3_8b"]
  • scenario: The scenario used for generating responses. The options include: [1, 2, 3, 4]
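
For example, matching the generation command above:

bash script/eval_multi.sh news llama3_8b 2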
