diff --git a/images/NSF.png b/images/NSF.png
new file mode 100644
index 0000000..25b1089
Binary files /dev/null and b/images/NSF.png differ
diff --git a/images/aware-lab-logo.png b/images/aware-lab-logo.png
new file mode 100644
index 0000000..e5f1556
Binary files /dev/null and b/images/aware-lab-logo.png differ
diff --git a/images/results.png b/images/results.png
new file mode 100644
index 0000000..e61a4cc
Binary files /dev/null and b/images/results.png differ
diff --git a/images/rit.png b/images/rit.png
new file mode 100644
index 0000000..0a5466c
Binary files /dev/null and b/images/rit.png differ
diff --git a/images/select-settings.png b/images/select-settings.png
new file mode 100644
index 0000000..2135b18
Binary files /dev/null and b/images/select-settings.png differ
diff --git a/images/simulation-screen.png b/images/simulation-screen.png
new file mode 100644
index 0000000..3e690aa
Binary files /dev/null and b/images/simulation-screen.png differ
diff --git a/index.html b/index.html
new file mode 100644
index 0000000..e16fb09
--- /dev/null
+++ b/index.html
@@ -0,0 +1,384 @@

OutreachMAB
Multi-Armed Bandit Outreach
Abstract

The Multi-Armed Bandit Outreach project was built to provide outreach to students who are interested in machine learning and are roughly between the 11th-grade and college-sophomore levels.
The application uses a real-world scenario: choosing a restaurant to eat at. Each restaurant is given its own reward distribution, and the goal of the system is to find which restaurant is the optimal choice at each iteration. The participant is shown a simulation of the problem both without context and with it, which helps them build an understanding of why context is added to multi-armed bandits and of how context affects everyday scenarios. The participant can also change the scale of the system by changing the number of restaurants, changing the number of iterations, or selecting a different bandit model.
The user can select from Epsilon Greedy, Thompson Sampling, Upper Confidence Bound, and Random Selection models. The participant may also choose to run the application in a mode that compares these models and shows the results of each one. This functionality helps the participant understand that not all multi-armed bandit algorithms operate the same way and that there are different solutions to the same problem: even though they all fall under the category of multi-armed bandits, each model approaches the problem differently.
Download

This project can be accessed on GitHub at https://github.com/iCMAB/OutreachMAB.
Setup

Python 3.10 is required for this application. Clone the repository to access the project.
Repository Cloning

1. Clone
   git clone https://github.com/iCMAB/OutreachMAB.git
2. Install dependencies
   pip3 install -r requirements.txt
3. Run
   python3 main.py
Application Usage

When running the program, there are three important screens to pay attention to:
1. Settings Selection

A screen prompting the user to select a multi-armed bandit model, the number of arms, and the number of iterations.

Here you can change the bandit model, the number of arms (restaurants, in the context of the problem), and the number of iterations.
2. Simulation

The simulation screen, including graphs for cumulative reward and regret, a distribution of rewards received from each restaurant, and a control center for stepping through the iterations.

The simulation consists of three major parts. The control center in the top left lets the participant step through the iterations of the simulation. Next are the reward and regret graphs: one is cumulative and one is per iteration. The last part is the set of graphs along the right side of the screen, which show the current distribution of rewards that the bandit has collected from each restaurant.
3. Results

The final results of the simulation. In the example shown: Total Reward: 627.8, Total Regret: 194.6, and the restaurants were selected 11, 18, 6, 59, and 6 times respectively.

At the conclusion of the simulation, the final graphs are shown.
The two larger graphs show reward and regret in two different ways: the first graph shows cumulative reward and regret, while the second shows the average reward and regret per iteration. Each of these graphs has a description beneath it explaining what it represents. On the right, the final reward distribution found by the bandit for each restaurant is shown.
The Team

Carter Vail
Fourth-year Software Engineering student. Interested in software development.
LinkedIn

Dante Falardeau
Fifth-year Software Engineer interested in integrating automation into existing workflows.
LinkedIn

Devroop Kar
Incoming PhD student in Computing and Information Sciences. Data Engineer and AI enthusiast.
LinkedIn

Dr. Daniel Krutz
Director of the AWARE LAB and assistant professor. Interested in Self-Adaptive Systems, Strategic Reasoning, and Computing Education.
\ No newline at end of file