TOPIC MODELING TOOL
The “Topic Modeling Tool” (TMT), available through the Google Code repository, provides quick and easy topic model generation and navigation. The TMT is easy to install: with a working Java installation on any platform, download the TMT program and double-click it to run it. Java will open the program.
The advantages of the TMT are its availability and ease of use. With the tutorials by Shawn Graham, Ian Milligan, and Scott Weingart, it is easy to perform topic modeling, that is, text analysis of large corpora of textual information. Topic modeling is an attempt to inject semantic meaning into vocabulary. Topics are collections of words that co-occur in documents across a corpus. You can name topics or assign meaning to them to see how they are arranged in the corpus. Because topic modeling creates models, researchers should consider the entire model as they analyze their results. Herein lies the main limitation of the tool for digital humanities work: focusing too much on a single topic without considering the others may invalidate the results. Before you begin with topic modeling, you should seriously consider your research question and whether this type of distant reading is useful to your project. Matthew Kirschenbaum’s Distant Reading is a good place to start if you are interested in understanding when this type of work is warranted and when it is not.
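To make the idea of words co-occurring in documents a little more concrete, here is a minimal Python sketch. It is not how the TMT works internally; it only counts how many documents contain a given pair of words, which is the raw signal a topic model builds on. The function name `cooccurrence_counts` and the three-document corpus are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered word pair, how many documents contain both words."""
    counts = Counter()
    for text in documents:
        # Lowercase and split on whitespace. A real pipeline would also
        # strip punctuation and remove stop words before counting.
        words = sorted(set(text.lower().split()))
        # Sorting makes every pair appear in alphabetical order,
        # so ("cold", "snow") and ("snow", "cold") are the same key.
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy corpus, made up for this example.
corpus = [
    "snow cold wind",
    "cold snow birth",
    "birth delivered child",
]

pairs = cooccurrence_counts(corpus)
print(pairs[("cold", "snow")])    # number of documents containing both words
print(pairs[("birth", "child")])
```

Pairs that appear together in many documents, such as "cold" and "snow" here, are the kind of words a topic model would tend to group into one topic. Looking at one pair in isolation says little, which is the same reason to read the whole model rather than a single topic.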
A great example of the use of the TMT for scholarly research can be found in Cameron Blevins' “Topic Modeling Martha Ballard's Diary.” Blevins highlights topics found in the texts of the diaries, then visualizes and analyzes these findings while maintaining a critical awareness of the limitations of the TMT as a research resource.
This is the DH Blog of Britt Paris, a 2nd-year IS PhD student at UCLA.