Math Concepts for Developers - SoftUni 2023 course project
This project investigates how the choice of distance metric affects the performance of Eigenfaces for face recognition on the Labeled Faces in the Wild (LFW) database. The Eigenfaces algorithm is implemented following the technique originally proposed by Turk and Pentland, involving data preprocessing, covariance matrix computation, eigenvalue decomposition, and projection. To keep the results accurate and comparable, the images are preprocessed before analysis. The primary focus is on three commonly used distance metrics: City block, Euclidean, and Cosine. The model is trained on 75% of the images, with the remaining 25% held out for testing, and recognition performance is assessed with each metric.
The evaluation results indicate that the Euclidean distance metric performed slightly better than the City block and Cosine metrics in terms of accuracy, recall, and F1 score. Notably, all three distance metrics achieved perfect precision, indicating no false positive predictions in the face recognition task. Some publications have reported on the influence of different distance metrics and suggest that supporting several of them is important for achieving optimal performance. Our results suggest that the Euclidean distance metric may be more suitable for the given face recognition task. However, metric selection should also take into account other factors and the specific requirements of the face recognition system.
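To make the three metrics concrete, here is a minimal pure-Python sketch of how each one could be computed and used for nearest-neighbour matching on projected face vectors. The function names, the `nearest_label` helper, and the toy vectors are illustrative assumptions, not the project's actual code, which may rely on a numerical library instead.

```python
import math

def cityblock(u, v):
    """City block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    """Cosine distance: 1 minus the cosine of the angle between u and v.

    Assumes neither vector is all zeros (the norm would be 0 otherwise).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def nearest_label(query, gallery, labels, metric):
    """Return the label of the gallery vector closest to query under metric."""
    best = min(range(len(gallery)), key=lambda i: metric(query, gallery[i]))
    return labels[best]

# Toy 2-D "projections" (hypothetical values, not taken from the LFW experiment).
gallery = [[3.0, 4.0], [0.0, 1.0]]
labels = ["person_a", "person_b"]
query = [1.0, 1.0]

for metric in (cityblock, euclidean, cosine):
    print(metric.__name__, nearest_label(query, gallery, labels, metric))
```

On this toy data the City block and Euclidean metrics pick `person_b`, while the Cosine metric picks `person_a`, because it compares direction only and ignores vector length. That is the kind of disagreement the evaluation is meant to measure.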
The facial image database Labeled Faces in the Wild (LFW) is used in our study. It contains 13,233 facial images collected from the web, covering 5,749 individuals, of whom 1,680 have two or more distinct images. Each image is a 250x250 JPEG, detected and centered using the OpenCV implementation of the Viola-Jones face detector. The cropping region returned by the detector was then automatically enlarged by a factor of 2.2 in each dimension to capture more of the head and then scaled to a uniform size. We extracted 744 samples, each of which is a face image. These images correspond to 58 different individuals. Each image is available as "selected_faces_3/name/name_xxxx.jpg", where "xxxx" is the image number padded to four characters with leading zeroes. For example, the 10th George_W_Bush image can be found as "selected_faces_3/George_W_Bush/George_W_Bush_0010.jpg".
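The naming scheme above can be expressed as a small helper. This is a sketch only: the function name `lfw_image_path` and its `root` parameter are hypothetical and are not part of the project code.

```python
from pathlib import PurePosixPath

def lfw_image_path(name, number, root="selected_faces_3"):
    """Build the relative path of an image following the naming scheme above.

    `number` is zero-padded to four digits, e.g. 10 -> "0010".
    """
    filename = f"{name}_{number:04d}.jpg"
    return str(PurePosixPath(root) / name / filename)

print(lfw_image_path("George_W_Bush", 10))
# selected_faces_3/George_W_Bush/George_W_Bush_0010.jpg
```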
The results presented in this study may vary when the code is rerun, because the data split is random. The random.shuffle() function shuffles the data during splitting, which can produce different train and test splits in each run. To ensure reproducibility, it is recommended to set a fixed random seed (e.g., 42*):
random_seed = 42 # Set the random seed to a fixed value (e.g., 42)
X_train, X_test, y_train, y_test = split_data(X, y, test_size = 0.25, random_seed = random_seed)
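The body of split_data is not shown in this section. As a rough sketch, a seeded shuffle-and-split with the same call signature could look like the following. The implementation details are assumptions: in particular, it uses a local `random.Random` instance rather than the module-level `random.shuffle()`, so that the seed does not affect other code.

```python
import random

def split_data(X, y, test_size=0.25, random_seed=None):
    """Shuffle sample indices with a seeded generator and split them.

    Hypothetical re-implementation for illustration; returns
    X_train, X_test, y_train, y_test in that order.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")

    indices = list(range(len(X)))
    rng = random.Random(random_seed)  # same seed -> same shuffle order
    rng.shuffle(indices)

    n_test = int(round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

Shuffling indices rather than the data itself keeps each sample paired with its label, and calling the function twice with the same seed yields identical splits.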
However, it is important to note that even with a fixed random seed, minor variations in the results may still occur due to other random processes or dependencies within the code. Therefore, readers are encouraged to consider the overall trends and patterns observed across multiple runs rather than relying solely on individual run results. Additionally, conducting cross-validation or running the analysis multiple times can provide a more comprehensive understanding of the model's performance and help mitigate the impact of random variations.
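One simple way to look at trends across runs is to repeat the evaluation with several seeds and report the mean and standard deviation of the score. The sketch below assumes a hypothetical `evaluate(seed)` callable that splits, trains, and returns a single score. The stand-in used here returns made-up placeholder values, not results from this project.

```python
import statistics

def summarize_runs(evaluate, seeds):
    """Run evaluate(seed) for each seed; return (mean, sample std) of the scores."""
    scores = [evaluate(seed) for seed in seeds]
    mean = statistics.mean(scores)
    # The sample standard deviation needs at least two runs.
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, spread

def placeholder_evaluate(seed):
    # Placeholder for "split with this seed, train Eigenfaces, return accuracy".
    # The numbers are arbitrary and only make the example runnable.
    return 0.80 + 0.01 * (seed % 3)

mean_score, std_score = summarize_runs(placeholder_evaluate, seeds=range(5))
print(f"mean={mean_score:.3f} std={std_score:.3f}")
```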
*The value 42 is a reference to the science fiction novel "The Hitchhiker's Guide to the Galaxy" by Douglas Adams, where it is humorously given as the "Answer to the Ultimate Question of Life, the Universe, and Everything." The choice of 42 is arbitrary and can be replaced with any other integer value.