Breast cancer pathology and progress notes information extraction (WIP)

yejunbin/pathology_extractor

Breast Cancer Pathology Extractor

This repository extracts and structures information from breast cancer progress notes and pathology reports.

Report text to csv file

The given dataset uses the | and || symbols as separators. The report2csv.py script converts such a report into CSV format.

python report2csv.py -i input_report.txt -o output_report.csv
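The conversion can be sketched roughly as follows. This is not the actual report2csv.py. It assumes that || ends a record and | separates the fields within a record, which the description above does not confirm. The function names delimited_to_rows and rows_to_csv and the sample text are hypothetical.

```python
import csv
import io


def delimited_to_rows(text):
    """Split text into records on '||', then into fields on '|' (assumed layout)."""
    rows = []
    for record in text.split("||"):
        record = record.strip()
        if not record:  # skip the empty chunk left by a trailing '||'
            continue
        rows.append([field.strip() for field in record.split("|")])
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text; the csv module quotes fields containing commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample input, for illustration only.
sample = "1|2015-01-02|Invasive ductal carcinoma, grade 2||2|2015-02-03|DCIS present||"
print(rows_to_csv(delimited_to_rows(sample)))
```

Using the csv module rather than joining fields with commas matters here, because free-text pathology fields often contain commas themselves.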

Install and use extractors

Install the package with setup.py by running the following:

$ python setup.py install

The following functions are implemented for extracting information from breast cancer reports or progress notes:

  • split() - split a report into a list of sentences
  • extract_time() - return a list of datetime values found in a given string
  • extract_age_report() - return the approximate age of the patient
  • extract_dob_report() - return the date of birth from the report, if present
  • extract_estrogen() - return a list of estrogen receptor mentions and their values from the report
  • extract_progesterone() - return a list of progesterone receptor mentions and their values from the report
  • extract_her2() - return a list of HER2 receptor mentions and their values from the report
  • extract_dcis() - return a list of DCIS-related sentences and their values
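To give a feel for what the receptor extractors do, here is a minimal regex-based sketch. It is not the library's implementation. The names split_sentences and find_receptors, the patterns, and the returned tuple shape are all assumptions made for illustration, and the real functions may behave differently.

```python
import re

# Hypothetical patterns: a receptor name, up to 40 non-period characters,
# then a status word. Real reports vary far more than this covers.
STATUS = r"(positive|negative|equivocal)"
PATTERNS = {
    "ER": re.compile(r"\b(?:estrogen receptor|ER)\b[^.]{0,40}?\b" + STATUS + r"\b", re.I),
    "PR": re.compile(r"\b(?:progesterone receptor|PR)\b[^.]{0,40}?\b" + STATUS + r"\b", re.I),
    "HER2": re.compile(r"\bHER-?2(?:/neu)?\b[^.]{0,40}?\b" + STATUS + r"\b", re.I),
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_receptors(text):
    """Return (receptor, status, sentence) tuples found in the text."""
    hits = []
    for sentence in split_sentences(text):
        for name, pattern in PATTERNS.items():
            match = pattern.search(sentence)
            if match:
                hits.append((name, match.group(1).lower(), sentence))
    return hits


# Made-up report text, for illustration only.
report = (
    "Estrogen receptor: positive (90%). "
    "Progesterone receptor is negative. "
    "HER2/neu: equivocal by IHC."
)
for receptor, status, sentence in find_receptors(report):
    print(receptor, status)
```

Keeping the matched sentence alongside the value makes it easier to review an extraction against the source text.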

Run StanfordCoreNLP backend

The extractor also relies on pyner for named entity recognition. See this page to run pyner on the backend.

Examples

Here is an example of how to use the extractor library:

import extractor

# `report` is assumed to hold the text of one pathology report or progress note
dob = extractor.extract_dob_report(report)
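For readers curious how a date-of-birth lookup and an approximate age calculation might work, the following is a standalone sketch. It does not use or reproduce the extractor library. The names find_dob and approximate_age, the "DOB:" and "Date of Birth:" labels, and the MM/DD/YYYY layout are all assumptions.

```python
import re
from datetime import datetime

# Assumed label and date layout: "DOB: MM/DD/YYYY" or "Date of Birth: MM/DD/YYYY".
DOB_PATTERN = re.compile(
    r"\b(?:DOB|date of birth)\b\s*[:\-]?\s*(\d{1,2}/\d{1,2}/\d{4})", re.I
)


def find_dob(report):
    """Return the first labelled date of birth as a datetime, or None."""
    match = DOB_PATTERN.search(report)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%m/%d/%Y")
    except ValueError:  # digits matched the regex but are not a real date
        return None


def approximate_age(dob, report_date):
    """Whole years between dob and report_date."""
    years = report_date.year - dob.year
    # Subtract one if the birthday has not yet occurred in the report year.
    if (report_date.month, report_date.day) < (dob.month, dob.day):
        years -= 1
    return years


# Made-up report text, for illustration only.
text = "Patient DOB: 03/07/1960. Specimen received 05/01/2015."
dob = find_dob(text)
print(dob, approximate_age(dob, datetime(2015, 5, 1)))
```

Returning None when no labelled date is found mirrors the "if present" behaviour described for extract_dob_report(), though the library's actual return value for a missing date is not documented here.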

Dependencies
