cmu-lib/lovedata2025-theses-map

Explore the Universe of Graduate Research at CMU! 🚀

Introducing the KiltHub Thesis Explorer 🔭

Ever wondered how your research connects to work across different departments? Curious about unexplored areas in your field? We're excited to introduce a new interactive visualization tool that maps the entire landscape of graduate research at CMU!

What You'll See 👀

Imagine a star map, but instead of stars, you're looking at every graduate thesis in KiltHub. Each point represents a thesis, and the distance between points shows how related their topics are: the closer together, the more similar their research themes. This creates a fascinating "galaxy" of CMU graduate research where you can:

  • Discover unexpected connections between different fields
  • Find potential collaborators across departments
  • Identify unique research opportunities
  • Explore the evolution of research themes across colleges
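
To make "closer together means more similar" concrete, here is a minimal sketch of how similarity between two thesis embeddings can be measured. It assumes cosine similarity, a common choice for sentence embeddings; this README does not state which metric the project uses. The three-dimensional vectors are made-up toy values, not real thesis embeddings, and the 2-D map only approximates relationships like these.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed values for illustration only).
thesis_a = [0.9, 0.1, 0.0]  # e.g. a robotics thesis
thesis_b = [0.8, 0.2, 0.1]  # a closely related thesis
thesis_c = [0.0, 0.1, 0.9]  # a thesis on an unrelated topic

sim_ab = cosine_similarity(thesis_a, thesis_b)
sim_ac = cosine_similarity(thesis_a, thesis_c)

# The related pair scores much higher than the unrelated pair.
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

A score near 1 means two theses point in nearly the same direction in embedding space, while a score near 0 means they share little.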

How to Explore 🔍

Interactive Features

  • Zoom: Like Google Maps, but for research! Zoom in to explore specific research clusters or out to see the big picture
  • Hover: Mouse over any point to see thesis details, including:
    • Title
    • College
    • Research topics
  • Filter: Click a college name to show or hide the theses from that college

Research Insights ⚛️

  • Find clusters of related work across different colleges
  • Spot gaps between fields that might inspire new research directions
  • Identify potential interdisciplinary collaboration opportunities
  • See how your research interests connect to other fields

Behind the Scenes 🛠️

This visualization was created using natural language processing techniques that transform thesis abstracts into a semantic map. Think of it as a GPS for research topics: it captures the meaning of each abstract, not just its keywords, so conceptually related works appear near each other.

Technical Details

Feature Extraction: Count vectorizer followed by a class-based TF-IDF (c-TF-IDF) transformation
Embedding Model: `paraphrase-MiniLM-L6-v2` from sentence-transformers to generate 384-dimensional embeddings of thesis descriptions
Clustering Method: HDBSCAN to assign points to clusters in the embedding space
Dimensionality Reduction: UMAP to reduce the embeddings from 384 to 2 dimensions for visualization
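
The components listed above could be wired together roughly as follows. This is a minimal sketch, not the project's actual code. It assumes the `bertopic`, `sentence-transformers`, `umap-learn`, `hdbscan`, and `scikit-learn` packages are installed. Every hyperparameter (`n_neighbors`, `min_cluster_size`, the English stop-word list, and the intermediate 5-dimensional UMAP step that BERTopic applies by default before clustering) is an illustrative assumption rather than a value taken from this repository. The helper names `build_topic_model` and `project_to_2d` are hypothetical.

```python
# Sketch only: hyperparameters below are assumptions, not the project's settings.
EMBEDDING_MODEL_NAME = "paraphrase-MiniLM-L6-v2"  # model named in this README
EMBEDDING_DIM = 384                               # its output dimensionality
MAP_DIM = 2                                       # dimensions of the final map


def build_topic_model(min_cluster_size=10, random_state=42):
    """Assemble a BERTopic model from the components named above."""
    # Local imports keep the heavy dependencies out of module import time.
    from bertopic import BERTopic
    from bertopic.vectorizers import ClassTfidfTransformer
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    return BERTopic(
        # Sentence embeddings of each thesis description.
        embedding_model=SentenceTransformer(EMBEDDING_MODEL_NAME),
        # Intermediate reduction before clustering (BERTopic's usual setup).
        umap_model=UMAP(
            n_neighbors=15,
            n_components=5,
            min_dist=0.0,
            metric="cosine",
            random_state=random_state,
        ),
        # Density-based clustering; unclustered points get the label -1.
        hdbscan_model=HDBSCAN(
            min_cluster_size=min_cluster_size,
            metric="euclidean",
            cluster_selection_method="eom",
            prediction_data=True,
        ),
        # Count vectorizer followed by class-based TF-IDF for topic keywords.
        vectorizer_model=CountVectorizer(stop_words="english"),
        ctfidf_model=ClassTfidfTransformer(),
    )


def project_to_2d(embeddings, random_state=42):
    """Reduce an (n_theses, 384) embedding array to 2-D map coordinates."""
    from umap import UMAP

    reducer = UMAP(n_components=MAP_DIM, metric="cosine", random_state=random_state)
    return reducer.fit_transform(embeddings)
```

Given a list of abstract strings (here called `abstracts`, a hypothetical name), `build_topic_model().fit_transform(abstracts)` returns a topic label for each thesis along with probabilities. Passing the same model's embeddings to `project_to_2d` yields the x/y coordinates for a scatter plot.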

Acknowledgements

We used BERTopic to develop the workflow for this project.


This project is a collaboration between the University Libraries and the Tartan Research Data Alliance. Built with love for Love Data Week 2025 and the CMU community.

TRDA members: Chehak Arora and Alfredo González-Espinoza
