Abstract
This is the Multi-Armed Bandit Outreach project, built to provide outreach to students who are interested in machine learning and are between the 11th grade and college sophomore education levels.
The application uses a real-world scenario of selecting a restaurant to eat at. Each restaurant is given a reward distribution, and the goal of the system is to find which restaurant is the optimal choice at each iteration. The participant is shown a simulation of the problem both without context and with it. This helps them build an understanding of the purpose of adding context to multi-armed bandits, as well as see how context affects everyday scenarios. The participant can also change the scale of the system by changing the number of restaurants, changing the number of iterations, or even selecting a different bandit model.
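To make the scenario concrete, here is a minimal sketch of how restaurants can be modeled as reward distributions whose optimal choice depends on context. The restaurant names, contexts, and numbers below are hypothetical illustrations, not taken from the application:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean rewards per restaurant under two contexts.
# (Names and numbers are illustrative only, not from the application.)
mean_reward = {
    "lunch":  {"Taco Stand": 0.8, "Steakhouse": 0.4, "Diner": 0.6},
    "dinner": {"Taco Stand": 0.5, "Steakhouse": 0.9, "Diner": 0.6},
}

def pull(context: str, restaurant: str) -> float:
    """Sample a noisy reward for eating at `restaurant` given `context`."""
    return rng.normal(loc=mean_reward[context][restaurant], scale=0.1)

# Without context, no single restaurant dominates; with context, the
# optimal choice flips between lunch and dinner.
for context in ("lunch", "dinner"):
    best = max(mean_reward[context], key=mean_reward[context].get)
    print(context, "-> optimal:", best, "sampled reward:", round(pull(context, best), 2))
```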
The user can select from Epsilon Greedy, Thompson Sampling, Upper Confidence Bound, and Random Selection models. The participant may also choose to run the application in a mode that compares these models and shows the results of each bandit model. This functionality helps the participant understand that not all multi-armed bandits operate exactly the same and that there are different solutions to the same problem: even though they all fall under the category of multi-armed bandits, each model approaches the problem differently.
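For readers who want to see how one of these strategies works in code, here is a minimal epsilon-greedy sketch. It is an illustration only, not the application's implementation; the arm count, epsilon value, and reward distributions are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

n_arms, n_iters, epsilon = 4, 1000, 0.1
true_means = rng.uniform(0, 1, n_arms)   # hidden mean reward per restaurant

counts = np.zeros(n_arms)                # pulls per arm
estimates = np.zeros(n_arms)             # running mean reward per arm

for _ in range(n_iters):
    # Explore with probability epsilon, otherwise exploit the best estimate.
    if rng.random() < epsilon:
        arm = int(rng.integers(n_arms))
    else:
        arm = int(np.argmax(estimates))
    reward = rng.normal(true_means[arm], 0.1)
    counts[arm] += 1
    # Incremental mean update: new_mean = old_mean + (reward - old_mean) / n
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print("true best arm:", int(np.argmax(true_means)))
print("arm pulled most:", int(np.argmax(counts)))
```

Thompson Sampling and Upper Confidence Bound replace the explore/exploit branch above with their own selection rules, which is exactly the difference the comparison mode is meant to surface.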
Download
This project can be accessed at [OutreachMAB GitHub](https://github.com/iCMAB/OutreachMAB).
Setup

Python 3.10 is required for this application. You can then clone the repository to access the project.

Repository Cloning

- Clone:

  ```
  git clone https://github.com/iCMAB/OutreachMAB.git
  ```

- Install dependencies:

  ```
  pip3 install -r requirements.txt
  ```

- Run:

  ```
  python3 main.py
  ```
Application Usage
When running the program, there are three important screens to pay attention to:
1. Settings Selection

   Here you can change the bandit model, the number of arms (restaurants in the context of the problem), and the number of iterations.
2. Simulation

   The simulation consists of three major parts: the control center in the top left, where the participant can step through the iterations of the simulation; the reward and regret graphs, one cumulative and one per iteration; and the graphs along the right side of the screen, which show the current distribution of rewards that the bandit has collected from each restaurant.
3. Results

   At the conclusion of the simulation, the final graphs are shown. The two larger graphs show reward and regret in two different ways: the first is cumulative reward and regret, and the second is average reward and regret per iteration (a sketch of how these quantities are typically computed follows this list). Each graph has a description beneath it explaining what it represents. On the right, the final distribution found by the bandit for each restaurant is shown.
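For reference, the quantities plotted in these graphs can be computed as sketched below, assuming the standard definitions of reward and regret; this is an illustration, not the application's exact code:

```python
import numpy as np

rng = np.random.default_rng(2)

true_means = np.array([0.2, 0.5, 0.8])              # hidden mean reward per arm
choices = rng.integers(len(true_means), size=1000)  # arm picked at each iteration
rewards = rng.normal(true_means[choices], 0.1)      # sampled rewards

# Regret at each step: expected reward of the best arm minus the arm chosen.
regret = true_means.max() - true_means[choices]

cumulative_reward = np.cumsum(rewards)
cumulative_regret = np.cumsum(regret)

# Running averages per iteration (what the "average" graphs typically show).
steps = np.arange(1, len(rewards) + 1)
avg_reward = cumulative_reward / steps
avg_regret = cumulative_regret / steps

print("final cumulative regret:", round(float(cumulative_regret[-1]), 2))
```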
The Team
| Carter Vail | Dante Falardeau |
|---|---|
| Fourth-year Software Engineering student. Interested in software development. LinkedIn | Fifth-year Software Engineer interested in integrating automation into existing workflows. LinkedIn |
| Devroop Kar | Dr. Daniel Krutz |
| Incoming PhD student in Computing and Information Sciences. Data engineer and AI enthusiast. LinkedIn | Director of the AWARE LAB and assistant professor. Interested in Self-Adaptive Systems, Strategic Reasoning, and Computing Education. |