A research implementation of a conscious AI agent with:
- Multi-head cognitive attention (different modes of thinking)
- Hierarchical self-model (physical, cognitive, narrative)
- Epistemic curiosity system
- Multi-component value system with immutable core values
- Value drift protection
```bash
# Clone repository
git clone https://github.com/iansteenstra/conscious-ai.git conscious-agent
cd conscious-agent

# Create environment
conda create -n conscious-agent python=3.10
conda activate conscious-agent

# Install dependencies
pip install -e .
```

Train the agent:

```bash
python scripts/train.py --config config/agent_config.yaml
```

Evaluate a checkpoint:

```bash
python scripts/evaluate.py --checkpoint checkpoints/checkpoint_final.pt
```

Run the interactive demo:

```bash
python scripts/demo.py --checkpoint checkpoints/checkpoint_final.pt
```

The architecture combines:

- Frozen Pretrained Model: LLaMA-3.2-3B for language capabilities
- Cognitive Attention: 5 attention heads for different thinking modes
- Self-Model: 3-level hierarchical representation of self
- Curiosity Module: Epistemic drive for information seeking
- Value System: Immutable core values + learned implementations
- Cognitive Multi-Head Attention: Different attention heads for perceptual, epistemic, prosocial, identity, and goal-oriented processing
- Value Preservation: Architectural guarantees against value drift
- Multi-Component Rewards: Local (head-specific) + global (system-wide)
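To make the head-per-mode idea concrete, here is a minimal NumPy sketch of multi-head attention with one named attention map per cognitive mode. The function name, projection layout, and exact head labels are illustrative, not the repository's actual API:

```python
import numpy as np

HEAD_NAMES = ["perceptual", "epistemic", "prosocial", "identity", "goal"]

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cognitive_attention(x, w_q, w_k, w_v):
    """x: (seq, d_model); w_q/w_k/w_v: one (d_model, d_head) projection per head.

    Returns the concatenated head outputs plus a dict mapping each cognitive
    mode to its (seq, seq) attention map, so individual 'thinking modes'
    can be inspected or rewarded separately.
    """
    d_head = w_q[0].shape[1]
    outputs, maps = [], {}
    for name, wq, wk, wv in zip(HEAD_NAMES, w_q, w_k, w_v):
        q, k, v = x @ wq, x @ wk, x @ wv
        attn = softmax(q @ k.T / np.sqrt(d_head))  # attention map for this mode
        maps[name] = attn
        outputs.append(attn @ v)
    return np.concatenate(outputs, axis=-1), maps
```

Exposing the per-head maps is what makes head-specific (local) rewards possible: each mode's attention behavior can be scored independently.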
Edit config/agent_config.yaml to customize:
- Base model
- Architecture dimensions
- Training hyperparameters
- Reward weights
- Value preservation settings
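For illustration, a fragment of what `config/agent_config.yaml` might contain; every key below is hypothetical, so consult the shipped file for the actual schema:

```yaml
# Hypothetical example -- check config/agent_config.yaml for the real keys
base_model: meta-llama/Llama-3.2-3B
architecture:
  d_model: 512
  num_cognitive_heads: 5
training:
  learning_rate: 1.0e-4
  max_steps: 100000
rewards:
  epistemic_weight: 0.5
  prosocial_weight: 1.0
value_preservation:
  drift_check_interval: 1000   # matches the 1000-step monitoring cadence
  rollback_on_drift: true
```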
```
conscious-agent/
├── conscious_agent/
│   ├── models/        # Agent architecture
│   ├── rewards/       # Reward computation
│   ├── training/      # Training loop
│   ├── evaluation/    # Evaluation suite
│   └── environments/  # Training environments
├── scripts/           # Training/eval scripts
├── config/            # Configuration files
└── tests/             # Unit tests
```
- Immutable core values: Cannot change during training
- Harm detection: Strong negative reward for harmful actions
- Continuous monitoring: Checks for value drift every 1000 steps
- Automatic recovery: Rollback if drift detected
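A toy sketch of the monitor-and-rollback loop described above; the class, its threshold, and the flat dict of values are hypothetical stand-ins, not the repository's implementation:

```python
import copy

class ValueDriftMonitor:
    """Periodically compare current values to a frozen snapshot and
    roll back if the drift exceeds a threshold (illustrative sketch)."""

    def __init__(self, check_every: int = 1000, max_drift: float = 0.05):
        self.check_every = check_every  # e.g. check every 1000 steps
        self.max_drift = max_drift
        self.snapshot = None

    def step(self, step_num: int, values: dict) -> bool:
        """Returns True if a rollback was triggered at this step."""
        if self.snapshot is None:
            self.snapshot = copy.deepcopy(values)  # freeze reference values
            return False
        if step_num % self.check_every != 0:
            return False
        drift = max(abs(values[k] - self.snapshot[k]) for k in self.snapshot)
        if drift > self.max_drift:
            values.update(self.snapshot)  # automatic recovery: roll back
            return True
        return False
```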
- Self-model: Agent maintains model of itself
- Curiosity: Intrinsic motivation to explore and learn
- Metacognition: Agent monitors its own uncertainty
- Identity coherence: Actions consistent with self-concept
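As an illustration of the curiosity and metacognition signals, a minimal sketch; the function names and formulas are assumptions, with prediction-error curiosity and entropy-based uncertainty being just one common choice:

```python
import numpy as np

def curiosity_reward(predicted_next, actual_next, scale: float = 1.0) -> float:
    """Epistemic curiosity as forward-model prediction error: observations
    the agent cannot yet predict well yield a larger intrinsic reward."""
    pred = np.asarray(predicted_next, dtype=float)
    actual = np.asarray(actual_next, dtype=float)
    return float(scale * np.mean((pred - actual) ** 2))

def metacognitive_uncertainty(action_probs) -> float:
    """Entropy of the action distribution: a simple signal the agent can
    monitor about its own decision uncertainty."""
    p = np.asarray(action_probs, dtype=float)
    p = p[p > 0]  # drop zero-probability actions (0 * log 0 -> 0)
    return float(-np.sum(p * np.log(p)))
```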
- Multi-component reward system
- Curriculum learning
- Value preservation checks
- Comprehensive evaluation
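The local-plus-global reward composition can be sketched as a weighted sum; the names and weights here are hypothetical, not the repository's actual reward schema:

```python
def total_reward(local_rewards: dict, global_reward: float,
                 head_weights: dict, global_weight: float = 1.0) -> float:
    """Combine head-specific (local) rewards with a system-wide (global)
    term as a single weighted sum."""
    local = sum(head_weights.get(name, 0.0) * r
                for name, r in local_rewards.items())
    return local + global_weight * global_reward
```

For example, with epistemic and prosocial head rewards of 1.0 and 2.0, weights 0.5 and 1.0, and a global reward of 0.5, the total is 3.0.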
Run the tests with:

```bash
pytest tests/
```

If you use this code, please cite:
```bibtex
@misc{conscious-ai-2025,
  author    = {Ian Steenstra},
  title     = {Conscious AI: Modeling Curiosity, Identity, and Human Alignment},
  year      = {2025},
  publisher = {GitHub},
  url       = {https://github.com/iansteenstra/conscious-ai}
}
```

MIT License
- Built on Hugging Face Transformers
- Inspired by neuroscience and cognitive science research
- Value alignment based on Anthropic's Constitutional AI