Ever wondered how your research connects to work across different departments? Curious about unexplored areas in your field? We're excited to introduce a new interactive visualization tool that maps the entire landscape of graduate research at CMU!
Imagine a star map, but instead of stars, you're looking at every graduate thesis in KiltHub. Each point represents a thesis, and the space between them shows how related their topics are – the closer together, the more similar their research themes. This creates a fascinating "galaxy" of CMU graduate research where you can:
- Discover unexpected connections between different fields
- Find potential collaborators across departments
- Identify unique research opportunities
- Explore the evolution of research themes across colleges
To navigate the map:
- Zoom: Like Google Maps, but for research! Zoom in to explore specific research clusters, or zoom out to see the big picture
- Hover: Mouse over any point to see thesis details, including:
  - Title
  - College
  - Research topics
- Filter: Click on college names to show or hide each college's theses
Use the map to:
- Find clusters of related work across different colleges
- Spot gaps between fields that might inspire new research directions
- Identify potential interdisciplinary collaboration opportunities
- See how your research interests connect to other fields
This visualization was created with natural language processing techniques that transform thesis abstracts into a semantic map. Think of it as a GPS for research topics. It captures the semantic meaning of each thesis, not just its keywords, so the map can show conceptual relationships between different works.
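Embeddings from sentence-transformers models are usually compared with cosine similarity. A higher value means two texts are closer in meaning. The minimal sketch below illustrates "closer means more similar" and is not the project's code. The helper name `cosine_similarity` is hypothetical. The invented 4-dimensional vectors stand in for the 384-dimensional embeddings that `paraphrase-MiniLM-L6-v2` produces.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for real 384-dimensional thesis embeddings.
thesis_a = [0.9, 0.1, 0.0, 0.2]  # e.g. a protein-folding thesis
thesis_b = [0.8, 0.2, 0.1, 0.1]  # a related structural-biology thesis
thesis_c = [0.0, 0.1, 0.9, 0.3]  # an unrelated robotics thesis

print(round(cosine_similarity(thesis_a, thesis_b), 3))
print(round(cosine_similarity(thesis_a, thesis_c), 3))
```

With real embeddings, two abstracts can score high even when they share few words. This is what lets the map go beyond keyword matching.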
We used BERTopic to develop the workflow for this project, with the following components:
- Embedding model: `paraphrase-MiniLM-L6-v2` from sentence-transformers, which generates a 384-dimensional embedding of each thesis description
- Clustering method: HDBSCAN, which assigns points to clusters in the embedding space
- Feature extraction: a count vectorizer followed by a class-based TF-IDF (c-TF-IDF) transformation, which in BERTopic extracts the keywords that describe each cluster
- Dimensionality reduction: UMAP, which reduces the embeddings from 384 to 2 dimensions for visualization
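In BERTopic, the count vectorizer and class-based TF-IDF step turns each cluster into a ranked list of descriptive terms. The BERTopic documentation gives the weight of term t in class c as `W(t, c) = tf(t, c) * log(1 + A / f(t))`. Here `tf(t, c)` is the term's L1-normalized frequency within the class, `f(t)` is its frequency across all classes, and `A` is the average number of words per class. The pure-Python sketch below follows that formula under simplifying assumptions. A whitespace tokenizer stands in for scikit-learn's `CountVectorizer`, and the helpers `c_tf_idf` and `top_terms` are hypothetical. The two toy clusters are invented, and BERTopic options such as BM25 weighting are omitted. It is not the project's code.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_class):
    """Return {label: {term: weight}} using a c-TF-IDF-style weighting."""
    # Join each class's documents into one "class document" and count its terms
    # (a plain whitespace tokenizer stands in for a real count vectorizer).
    counts = {
        label: Counter(" ".join(docs).lower().split())
        for label, docs in docs_by_class.items()
    }
    # f(t): frequency of each term across all classes.
    total = Counter()
    for class_counts in counts.values():
        total.update(class_counts)
    # A: average number of words per class.
    avg_words = sum(sum(c.values()) for c in counts.values()) / len(counts)

    weights = {}
    for label, class_counts in counts.items():
        n_words = sum(class_counts.values())
        weights[label] = {
            # L1-normalized term frequency times log(1 + A / f(t)).
            term: (tf / n_words) * math.log(1 + avg_words / total[term])
            for term, tf in class_counts.items()
        }
    return weights


def top_terms(weights, n=3):
    """Highest-weighted terms per class; ties are broken alphabetically."""
    return {
        label: [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for label, w in weights.items()
    }


# Hypothetical clusters, as HDBSCAN might produce them.
clusters = {
    0: ["protein folding thesis", "protein structure thesis"],
    1: ["robot planning thesis", "robot grasping thesis"],
}
weights = c_tf_idf(clusters)
print(top_terms(weights, n=1))
```

In this toy example, "thesis" appears as often as "protein" within cluster 0. Because it also appears in every other cluster, its weight falls below even the once-used "folding". This is how c-TF-IDF keeps generic words out of cluster descriptions.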
This project is a collaboration between the University Libraries and the Tartan Research Data Alliance. Built with love for Love Data Week 2025 and the CMU community.
TRDA members: Chehak Arora and Alfredo González-Espinoza