This repository extracts and structures information from breast cancer progress notes and pathology reports.
The given dataset is separated by the | and || symbols. We created report2csv.py to convert the report into CSV format:
python report2csv.py -i input_report.txt -o output_report.csv
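The conversion can be pictured with a minimal sketch. This is not the code in report2csv.py. It assumes that "||" ends a record and "|" separates fields within a record, which the description above does not spell out. The function names `report_to_rows` and `rows_to_csv` and the sample text are hypothetical, made up for illustration.

```python
import csv
import io


def report_to_rows(text, record_sep="||", field_sep="|"):
    """Split raw report text into rows of fields.

    Assumption: "||" ends a record and "|" separates fields within it.
    """
    rows = []
    for record in text.split(record_sep):
        record = record.strip()
        if not record:
            continue  # skip empty chunks, e.g. after a trailing "||"
        rows.append([field.strip() for field in record.split(field_sep)])
    return rows


def rows_to_csv(rows):
    """Serialize rows to a CSV string, quoting fields where needed."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample text, not taken from the real dataset.
sample = "001|2015-03-02|ER positive, PR negative||002|2015-04-11|HER2 negative||"
rows = report_to_rows(sample)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Using the csv module rather than joining fields with commas means a field that itself contains a comma is quoted correctly.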
Install using setup.py by running the following:
$ python setup.py install
Here are a few of the implemented functions for extracting information from breast cancer reports or progress notes:
split() - split a report into a list of sentences
extract_time() - return a list of datetimes for the given string
extract_age_report() - return the approximate age of the patient
extract_dob_report() - return the date of birth from the report, if present
extract_estrogen() - return a list of estrogen receptor mentions and their values from the report
extract_progesterone() - return a list of progesterone receptor mentions and their values from the report
extract_her2() - return a list of HER2 receptor mentions and their values from the report
extract_dcis() - return a list of DCIS-related sentences and their values
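To give a feel for what receptor extraction involves, here is a rough standalone sketch that finds a receptor name followed by a status word in the same sentence. It is not how the library's functions are implemented. The pattern, the helper name `find_receptor_status` and the sample note are all assumptions made for illustration, and real reports vary far more than this pattern handles.

```python
import re

# Hypothetical pattern: a receptor name followed, within the same sentence,
# by a status word. This is an illustration, not the library's logic.
RECEPTOR_PATTERN = re.compile(
    r"\b(estrogen|progesterone)\s+receptor\b[^.]*?\b(positive|negative)\b",
    re.IGNORECASE,
)


def find_receptor_status(text):
    """Return (receptor, status) pairs found in text, lowercased."""
    return [
        (match.group(1).lower(), match.group(2).lower())
        for match in RECEPTOR_PATTERN.finditer(text)
    ]


# Made-up sample sentence, not taken from a real report.
note = "Estrogen receptor: positive (90%). Progesterone receptor is Negative."
pairs = find_receptor_status(note)
print(pairs)
```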
Run StanfordCoreNLP backend
To use extractor, a pyner backend is also needed, since we incorporate pyner to help with named entity recognition. See this page to run pyner on the backend.
Here is an example of how to use the extractor library:
import extractor
dob = extractor.extract_dob_report(report)