Abstract
E-mail is a common form of communication in regular use today, and its analysis is therefore a routine part of investigating a person or a crime. Many existing tools perform bulk analysis and basic searching; our research advances the state of the art by applying text mining and unsupervised learning techniques to automate the e-mail analysis process. Our key goals are to group similar e-mails together and to identify the concepts (subjects of discussion) of those e-mail groups. We present two new methods to increase grouping accuracy: e-mail domain analysis and word pair analysis. We also present a technique for concept analysis. We achieve these goals by integrating our research with the capabilities of Weka, an open-source machine learning suite, and WordNet, a lexical database of the English language. We apply this research to the publicly available Enron e-mail dataset and verify the results by examining the comparative advantage of each new technique.