Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 87c060f, commit 365b41c
Showing 7 changed files with 100 additions and 30 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,23 @@ | ||
.Rproj.user | ||
.Rhistory | ||
.RData | ||
.Ruserdata | ||
doc | ||
Meta | ||
|
||
# History files | ||
.Rhistory | ||
.Rapp.history | ||
|
||
# Session Data files | ||
.RData | ||
|
||
# Example code in package build process | ||
*-Ex.R | ||
|
||
# Output files from R CMD check | ||
/*.Rcheck/ | ||
|
||
# RStudio files | ||
.Rproj.user/ | ||
|
||
# Mac OS | ||
.DS_Store |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,15 @@ | ||
# cui2vec | ||
# cui2vec | ||
|
||
This repo contains the code associated with the following paper (under review): | ||
|
||
> Kompa, B., Schmaltz, A., Fried, I., Griffin, W., Palmer, N.P., Shi, X., Cai, T., Kohane, I.S., and Beam, A.L., 2019. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data. arXiv preprint arXiv:1804.01486. | ||
# Overview | ||
|
||
This repo contains the R package `cui2vec`, which provides code for fitting embeddings to your own co-occurrence data in the manner presented in the above paper. The package can be installed locally from source. An overview of usage is provided in the following HTML vignette, which can be viewed in your browser: | ||
|
||
[vignettes/rendered/2019_07_31/cui2vecWorkflow.html](vignettes/rendered/2019_07_31/cui2vecWorkflow.html). | ||
|
||
Additional information on each of the public functions can be accessed in the standard way (e.g., ```?cui2vec::construct_word2vec_embedding```). | ||
|
||
Data agreements prevent us from releasing all of our original source data, but upon acceptance, we will release our embeddings at the following URL: TBD. |
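
Reviewer note: the README above refers to fitting embeddings from co-occurrence data. As a rough illustration of what such input can look like and how counts are commonly reweighted before factorization, here is a minimal Python sketch of shifted positive pointwise mutual information (SPPMI). It is an assumption about the general technique, not the package's R API or the paper's exact procedure. The function name `sppmi`, the `shift` parameter, and the placeholder concept labels are all hypothetical.

```python
import math
from collections import defaultdict


def sppmi(cooc, shift=1.0):
    """Shifted positive PMI from pair counts (illustrative sketch only).

    cooc:  dict mapping (concept_a, concept_b) -> co-occurrence count.
    shift: the k in max(PMI - log(k), 0); k = 1 gives plain positive PMI.
    Returns a sparse dict containing only the strictly positive entries.
    """
    total = sum(cooc.values())
    row = defaultdict(float)  # marginal count of the first element of each pair
    col = defaultdict(float)  # marginal count of the second element of each pair
    for (a, b), n in cooc.items():
        row[a] += n
        col[b] += n

    weights = {}
    for (a, b), n in cooc.items():
        if n <= 0:
            continue  # the log is undefined for zero counts; treat as absent
        pmi = math.log(n * total / (row[a] * col[b]))
        value = pmi - math.log(shift)
        if value > 0:
            weights[(a, b)] = value
    return weights


# Hypothetical toy counts; "A".."D" stand in for concept identifiers.
pairs = {("A", "B"): 8, ("A", "C"): 1, ("B", "C"): 1, ("C", "D"): 8}
counts = {}
for (a, b), n in pairs.items():
    counts[(a, b)] = n  # store both orders so the matrix is symmetric
    counts[(b, a)] = n

weights = sppmi(counts)
for pair in sorted(weights):
    print(pair, round(weights[pair], 3))
```

Pairs that co-occur less often than their marginals would predict (here A with C, and B with C) drop out of the result. Factorizing the resulting sparse matrix into low-dimensional vectors would be a separate step and is not shown.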
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
% Generated by Paperpile. Check out http://paperpile.com for more information. | ||
% BibTeX export options can be customized via Settings -> BibTeX. | ||
@ARTICLE{Beam2018-vl, | ||
title = "Clinical Concept Embeddings Learned from Massive Sources of | ||
Multimodal Medical Data", | ||
author = "Beam, Andrew L and Kompa, Benjamin and Fried, Inbar and | ||
Palmer, Nathan P and Shi, Xu and Cai, Tianxi and Kohane, | ||
Isaac S", | ||
abstract = "Word embeddings are a popular approach to unsupervised | ||
learning of word relationships that are widely used in | ||
natural language processing. In this article, we present a | ||
new set of embeddings for medical concepts learned using an | ||
extremely large collection of multimodal medical data. | ||
Leaning on recent theoretical insights, we demonstrate how | ||
an insurance claims database of 60 million members, a | ||
collection of 20 million clinical notes, and 1.7 million | ||
full text biomedical journal articles can be combined to | ||
embed concepts into a common space, resulting in the largest | ||
ever set of embeddings for 108,477 medical concepts. To | ||
evaluate our approach, we present a new benchmark | ||
methodology based on statistical power specifically designed | ||
to test embeddings of medical concepts. Our approach, called | ||
cui2vec, attains state of the art performance relative to | ||
previous methods in most instances. Finally, we provide a | ||
downloadable set of pre-trained embeddings for other | ||
researchers to use, as well as an online tool for | ||
interactive exploration of the cui2vec embeddings.", | ||
month = apr, | ||
year = 2018, | ||
keywords = "cui2vec", | ||
archivePrefix = "arXiv", | ||
primaryClass = "cs.CL", | ||
eprint = "1804.01486" | ||
} |