A comprehensive Python library for multi-touch attribution modeling in marketing analytics. This library implements various attribution models to help marketers understand the contribution of different touchpoints in the customer journey.
- First Touch: 100% credit to the first interaction
- Last Touch: 100% credit to the last interaction before conversion
- Linear: Equal credit distribution across all touchpoints
- Position-Based (U-Shaped): Customizable weights for first/last touch with remaining credit distributed to middle touches
- Time Decay: Higher credit to more recent touchpoints
- Markov Chain: Probabilistic model using transition matrices
- Shapley Value: Game-theoretic fair allocation based on marginal contributions
- Shao's Model: Probabilistic Shapley-equivalent approach
- Logistic Regression: Machine learning-based ensemble attribution
- Additive Hazard: Survival analysis-based attribution
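To make the heuristic rules above concrete, here is a minimal sketch in plain Python. It does not use or reproduce the library's internals: the function names (`linear_credit`, `position_based_credit`, `time_decay_credit`), the 40/40 default weights, and the position-proportional decay weights are all assumptions chosen for illustration.

```python
from collections import defaultdict


def linear_credit(path, conversions):
    """Split conversions equally across every touchpoint in the path."""
    share = conversions / len(path)
    credit = defaultdict(float)
    for channel in path:
        credit[channel] += share
    return dict(credit)


def position_based_credit(path, conversions, first_weight=0.4, last_weight=0.4):
    """Give fixed shares to the first and last touch, split the rest over the middle."""
    credit = defaultdict(float)
    if len(path) == 1:
        credit[path[0]] += conversions
    elif len(path) == 2:
        # No middle touches: rescale the two end weights so they sum to 1.
        total = first_weight + last_weight
        credit[path[0]] += conversions * first_weight / total
        credit[path[-1]] += conversions * last_weight / total
    else:
        middle_share = (1.0 - first_weight - last_weight) / (len(path) - 2)
        credit[path[0]] += conversions * first_weight
        credit[path[-1]] += conversions * last_weight
        for channel in path[1:-1]:
            credit[channel] += conversions * middle_share
    return dict(credit)


def time_decay_credit(path, conversions):
    """Weight the i-th touchpoint (1-based) by i, so later touches earn more."""
    weights = range(1, len(path) + 1)
    total = sum(weights)
    credit = defaultdict(float)
    for channel, weight in zip(path, weights):
        credit[channel] += conversions * weight / total
    return dict(credit)


path = ["alpha", "beta", "gamma"]
linear = linear_credit(path, 10)
u_shaped = position_based_credit(path, 10)
decayed = time_decay_credit(path, 10)
```

In all three functions the credits for a path sum to its conversion count, so only the distribution across channels differs between the rules.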
pip install mta

Or install from source:
git clone https://github.com/eeghor/mta.git
cd mta
pip install -e .

from mta import MTA
# Initialize with your data
mta = MTA(data="your_data.csv", allow_loops=False, add_timepoints=True)
# Run a single attribution model
mta.linear(share="proportional", normalize=True)
mta.show()
# Chain multiple models
(mta.linear(share="proportional")
.time_decay(count_direction="right")
.markov(sim=False)
.shapley()
.show())

from mta import MTA, MTAConfig
# Create custom configuration
config = MTAConfig(
    allow_loops=False,
    add_timepoints=True,
    sep=" > ",
    normalize_by_default=True
)
mta = MTA(data="data.csv", config=config)

import pandas as pd
from mta import MTA
# Load your data
df = pd.read_csv("customer_journeys.csv")
# Initialize MTA with DataFrame
mta = MTA(data=df, allow_loops=False)
# Run attribution models
mta.first_touch().last_touch().linear().show()

Your input data should be a CSV file or pandas DataFrame with the following columns:
path,total_conversions,total_null,exposure_times
alpha > beta > gamma,10,5,2023-01-01 10:00:00 > 2023-01-01 11:00:00 > 2023-01-01 12:00:00
beta > gamma,5,3,2023-01-02 09:00:00 > 2023-01-02 10:00:00
Required Columns:
- path: Customer journey as channel names separated by > (or a custom separator)
- total_conversions: Number of conversions for this path
- total_null: Number of non-conversions for this path
- exposure_times: Timestamps of channel exposures (optional, can be auto-generated)
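As a rough illustration of the format described above, the following sketch parses the sample rows with the standard library only. The helper `parse_journeys` and the `SAMPLE` constant are hypothetical names, not part of the library, and the library's own loader may validate or normalize the data differently.

```python
import csv
import io
from datetime import datetime

SAMPLE = """path,total_conversions,total_null,exposure_times
alpha > beta > gamma,10,5,2023-01-01 10:00:00 > 2023-01-01 11:00:00 > 2023-01-01 12:00:00
beta > gamma,5,3,2023-01-02 09:00:00 > 2023-01-02 10:00:00
"""


def parse_journeys(text, sep=">"):
    """Turn CSV text in the documented layout into a list of journey dicts."""
    journeys = []
    for record in csv.DictReader(io.StringIO(text)):
        channels = [name.strip() for name in record["path"].split(sep)]
        # exposure_times is optional, so tolerate a missing or empty column.
        raw_times = record.get("exposure_times") or ""
        times = [
            datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S")
            for stamp in raw_times.split(sep)
            if stamp.strip()
        ]
        if times and len(times) != len(channels):
            raise ValueError(f"path and exposure_times differ in length: {record['path']}")
        journeys.append(
            {
                "channels": channels,
                "conversions": int(record["total_conversions"]),
                "nulls": int(record["total_null"]),
                "times": times,
            }
        )
    return journeys


journeys = parse_journeys(SAMPLE)
```

The length check reflects the assumption that each channel in a path has exactly one matching timestamp when timestamps are supplied.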
# Give 30% to first touch, 30% to last touch, 40% distributed to middle
mta.position_based(first_weight=30, last_weight=30, normalize=True)

# Count from left (earliest gets lowest credit)
mta.time_decay(count_direction="left")
# Count from right (latest gets highest credit - more common)
mta.time_decay(count_direction="right")

# Analytical calculation (faster)
mta.markov(sim=False, normalize=True)
# Simulation-based (more flexible, handles complex scenarios)
mta.markov(sim=True, normalize=True)

# With custom coalition size
mta.shapley(max_coalition_size=3, normalize=True)

# Custom sampling and iteration parameters
mta.logistic_regression(
    test_size=0.25,
    sample_rows=0.5,
    sample_features=0.5,
    n_iterations=1000,
    normalize=True
)

# Compare all models
results_df = mta.compare_models()
# Export to various formats
mta.export_results("attribution_results.csv", format="csv")
mta.export_results("attribution_results.json", format="json")
mta.export_results("attribution_results.xlsx", format="excel")

from mta import MTA
import pandas as pd
# Load data
mta = MTA(
    data="customer_journeys.csv",
    allow_loops=False,  # Remove consecutive duplicate channels
    add_timepoints=True  # Auto-generate timestamps if missing
)
# Run all heuristic models
(mta
.first_touch()
.last_touch()
.linear(share="proportional")
.position_based(first_weight=40, last_weight=40)
.time_decay(count_direction="right"))
# Run algorithmic models
(mta
.markov(sim=False)
.shapley(max_coalition_size=2)
.shao()
.logistic_regression(n_iterations=2000)
.additive_hazard(epochs=20))
# Display and export results
results = mta.compare_models()
mta.export_results("full_attribution_analysis.csv")
# Access specific model results
print(f"Markov Attribution: {mta.attribution['markov']}")
print(f"Shapley Attribution: {mta.attribution['shapley']}")

| Model | Type | Strengths | Use Case |
|---|---|---|---|
| First/Last Touch | Heuristic | Simple, fast | Quick baseline |
| Linear | Heuristic | Fair, interpretable | Equal value assumption |
| Position-Based | Heuristic | Balances first/last | Awareness + conversion focus |
| Time Decay | Heuristic | Recency-weighted | When recent matters more |
| Markov Chain | Algorithmic | Considers path structure | Sequential dependency |
| Shapley Value | Algorithmic | Game-theoretic fairness | Complex interactions |
| Logistic Regression | Machine Learning | Data-driven | Large datasets |
| Additive Hazard | Statistical | Time-to-event modeling | Survival analysis fans |
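To show what "considers path structure" can mean for the Markov chain row, here is a small sketch of the common removal-effect formulation: estimate first-order transition probabilities, then measure how much the conversion probability drops when a channel is removed. This is an assumption about the general technique, not a description of how `mta.markov` is implemented; the state labels and function names are hypothetical.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(journeys):
    """Estimate first-order transition probabilities from (path, conversions, nulls) rows."""
    counts = defaultdict(lambda: defaultdict(float))
    for path, conversions, nulls in journeys:
        visits = conversions + nulls
        states = [START] + list(path)
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += visits
        counts[path[-1]][CONV] += conversions
        counts[path[-1]][NULL] += nulls
    return {
        src: {dst: c / sum(nxt.values()) for dst, c in nxt.items()}
        for src, nxt in counts.items()
    }


def conversion_probability(probs, removed=None, sweeps=200):
    """Probability of reaching CONV from START; a removed channel acts like NULL."""
    p = {state: 0.0 for state in probs}
    p[CONV], p[NULL] = 1.0, 0.0
    for _ in range(sweeps):  # repeated sweeps converge for absorbing chains
        for state, nxt in probs.items():
            if state == removed:
                continue  # value stays 0.0, i.e. traffic into it is lost
            p[state] = sum(weight * p[dst] for dst, weight in nxt.items())
    return p[START]


def removal_effect_attribution(journeys):
    """Share total conversions in proportion to each channel's removal effect."""
    probs = transition_probs(journeys)
    base = conversion_probability(probs)
    channels = [state for state in probs if state != START]
    effects = {
        ch: 1.0 - conversion_probability(probs, removed=ch) / base for ch in channels
    }
    total_effect = sum(effects.values())
    total_conversions = sum(conv for _, conv, _ in journeys)
    return {ch: total_conversions * e / total_effect for ch, e in effects.items()}


journeys = [(["alpha", "beta", "gamma"], 10, 5), (["beta", "gamma"], 5, 3)]
attribution = removal_effect_attribution(journeys)
```

In this toy data every converting journey passes through beta and gamma, so removing either one blocks all conversions and they receive equal credit, while alpha receives less because journeys can start at beta instead.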
- Python >= 3.8
- pandas >= 1.3.0
- numpy >= 1.20.0
- scikit-learn >= 0.24.0
- arrow >= 1.0.0
If you use this library in your research, please cite:
@software{mta2024,
author = {Igor Korostil},
title = {MTA: Multi-Touch Attribution Library},
year = {2024},
url = {https://github.com/eeghor/mta}
}

This library implements models and techniques from the following research papers:
- Nisar, T. M., & Yeung, M. (2015). Purchase Conversions and Attribution Modeling in Online Advertising: An Empirical Investigation.
- Shao, X., & Li, L. (2011). Data-driven Multi-touch Attribution Models. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
- Dalessandro, B., Perlich, C., Stitelman, O., & Provost, F. (2012). Causally Motivated Attribution for Online Advertising. Proceedings of the Sixth International Workshop on Data Mining for Online Advertising.
- Cano-Berlanga, S., Giménez-Gómez, J. M., & Vilella, C. (2017). Attribution Models and the Cooperative Game Theory. Expert Systems with Applications, 87, 277-286.
- Ren, K., Fang, Y., Zhang, W., Liu, S., Li, J., Zhang, Y., Yu, Y., & Wang, J. (2018). Learning Multi-touch Conversion Attribution with Dual-attention Mechanisms for Online Advertising. Proceedings of the 27th ACM International Conference on Information and Knowledge Management.
- Zhang, Y., Wei, Y., & Ren, J. (2014). Multi-Touch Attribution in Online Advertising with Survival Theory. 2014 IEEE International Conference on Data Mining.
- Geyik, S. C., Saxena, A., & Dasdan, A. (2014). Multi-Touch Attribution Based Budget Allocation in Online Advertising. Proceedings of the 8th International Workshop on Data Mining for Online Advertising.

- Linear & Position-Based: Baseline models referenced across multiple papers
- Time Decay: Nisar & Yeung (2015), Zhang et al. (2014)
- Markov Chain: Shao & Li (2011), Dalessandro et al. (2012)
- Shapley Value: Cano-Berlanga et al. (2017)
- Logistic Regression: Dalessandro et al. (2012), Ren et al. (2018)
- Additive Hazard: Zhang et al. (2014)
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Inspired by various academic papers on marketing attribution
- Built with pandas, numpy, and scikit-learn
- Special thanks to the open-source community
Igor Korostil - eeghor@gmail.com
Project Link: https://github.com/eeghor/mta
- Shapley value computation can be slow for large numbers of channels
- Additive hazard model requires evenly-spaced time points for best results
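The Shapley limitation follows from the definition: an exact computation evaluates every coalition of the other channels, so the work roughly doubles with each added channel. The sketch below shows this with a simple, assumed value function (conversions from journeys whose channels all lie inside the coalition); the library's own value function and its `max_coalition_size` cutoff may work differently.

```python
from itertools import combinations
from math import factorial


def coalition_value(coalition, journeys):
    """Conversions from journeys that only used channels inside the coalition."""
    return sum(conv for path, conv in journeys if set(path) <= coalition)


def shapley_values(channels, journeys):
    """Exact Shapley values; evaluates 2**(n-1) coalitions per channel."""
    n = len(channels)
    values = {}
    for channel in channels:
        others = [c for c in channels if c != channel]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = frozenset(subset)
                with_channel = without | {channel}
                gain = coalition_value(with_channel, journeys) - coalition_value(without, journeys)
                total += weight * gain
        values[channel] = total
    return values


journeys = [(["alpha", "beta", "gamma"], 10), (["beta", "gamma"], 5)]
channels = ["alpha", "beta", "gamma"]
shapley = shapley_values(channels, journeys)

# Coalitions evaluated per channel for a few channel counts.
coalitions_per_channel = {n: 2 ** (n - 1) for n in (3, 10, 20)}
```

With 20 channels each channel already needs 524,288 coalition evaluations, which is why capping the coalition size is a common way to trade exactness for speed.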