In Partial Fulfillment of the Requirements for the Degree of
Master of Science
Will defend his thesis
We propose two pre-processing steps for classification that apply convex hull-based algorithms to the training set to improve both the accuracy and the speed of classification. The Class Reconstruction algorithm combines a clustering algorithm with a convex hull-based approach to re-label the dataset with a new, expanded class structure. We demonstrate that this algorithm improves the accuracy of Naive Bayes on some, but not all, real-world datasets. The Class Size Reduction approach also uses a clustering algorithm, then collects the convex hulls of all the clusters to create a new, smaller dataset. Training a Support Vector Machine on this reduced dataset is much faster. We demonstrate the resulting speed improvement on several real-world datasets; the improvement in this case is larger and more consistent, with accuracy dropping in only a few cases. Both approaches are especially applicable to datasets that are characterized by a high number of clusters.
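The idea behind Class Size Reduction can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes 2-D points stored as tuples, uses a plain k-means as a stand-in for the clustering step (the abstract does not name the clustering algorithm), and all function names (`convex_hull`, `kmeans`, `reduce_dataset`) are hypothetical. Each class is clustered separately, and only the convex hull vertices of each cluster are kept as training points.

```python
import random


def convex_hull(points):
    """Return the hull vertices of 2-D points (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        # Pop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; an assumed stand-in for the clustering step."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2
                + (p[1] - centers[c][1]) ** 2,
            )
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return clusters


def reduce_dataset(X, y, k=2):
    """Keep only the convex hull vertices of each per-class cluster."""
    reduced_X, reduced_y = [], []
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        for cluster in kmeans(members, min(k, len(members))):
            for vertex in convex_hull(cluster):
                reduced_X.append(vertex)
                reduced_y.append(label)
    return reduced_X, reduced_y


# Toy data (made up for illustration): two classes, each a 5x5 grid.
X = [(i, j) for i in range(5) for j in range(5)]
X += [(i, j) for i in range(10, 15) for j in range(10, 15)]
y = [0] * 25 + [1] * 25

# With one cluster per class, only the four grid corners of each class remain.
reduced_X, reduced_y = reduce_dataset(X, y, k=1)
```

The reduced set `reduced_X`, `reduced_y` would then be passed to a classifier such as a Support Vector Machine in place of the full training set. Hull computation in higher dimensions is considerably more expensive than this 2-D case, so the sketch only conveys the data flow.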
Date: Wednesday, April 23, 2014
Time: 9:00 AM
Place: PGH 362
Faculty, students, and the general public are invited.
Advisor: Prof. Ricardo Vilalta