Replies: 6 comments 3 replies
-
The "DICOM Metadata Digest (CSV)" for this dataset is an outlier: most TCIA collections don't provide this file as a CSV attachment on the wiki. However, all of the information in that CSV is available via our API, which is likely preferable to relying on a CSV wiki attachment.
Option #1: https://services.cancerimagingarchive.net/services/v4/TCIA/query/getSeries?Collection=ISPY1 might be sufficient for your needs. This returns a subset of image metadata for each DICOM scan/series in the ISPY1 collection, which could then be merged with the clinical spreadsheet. This should be more generalizable and easier to work with than manually tracking down a spreadsheet from the wiki. Note that you can add "&format=JSON" to choose the output format (JSON, CSV, or XML).
Option #2: If you want an even more robust set of DICOM metadata, you could look into this NBIA API endpoint: https://wiki.cancerimagingarchive.net/display/Public/NBIA+Search+REST+API+Guide#NBIASearchRESTAPIGuide-SeriesMetadataAPI. This endpoint lets a user specify a list of Series UIDs and provides a longer list of metadata fields. It requires some extra steps to set up an authorization token before you can use it.
Once you have your list of Series UIDs from the previous query, you can use https://services.cancerimagingarchive.net/services/v4/TCIA/query/getImage?SeriesInstanceUID=1.3.6.1.4.1.14519.5.2.1.7695.1700.250955243295773832626617549482 to iteratively download each one. The downside is that it provides a zip file for each scan, so you have to unpack the data and organize it into a logical directory hierarchy yourself.
We are also very close (weeks, I think, not months) to releasing a "command line interface" version of the NBIA Data Retriever, which might also be of interest. This will let you specify a ".TCIA" manifest file (which you can obtain using the "Image Download" button on the ISPY page) and will download all the data from the collection/manifest into a well-organized hierarchy (Collection / Patient / Study / Series / Image).
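As a rough illustration of the getSeries and getImage queries described above, here is a minimal Python sketch using only the standard library. It builds the two query URLs and unpacks one downloaded series zip into a Collection / Patient / Study / Series layout. The helper names are hypothetical, and the metadata keys (`Collection`, `PatientID`, `StudyInstanceUID`, `SeriesInstanceUID`) are assumptions about what a getSeries record contains, so check them against an actual response.

```python
import zipfile
from pathlib import Path
from urllib.parse import urlencode

BASE_URL = "https://services.cancerimagingarchive.net/services/v4/TCIA/query"


def get_series_url(collection: str, fmt: str = "JSON") -> str:
    """Build the getSeries query URL for one collection."""
    query = urlencode({"Collection": collection, "format": fmt})
    return f"{BASE_URL}/getSeries?{query}"


def get_image_url(series_uid: str) -> str:
    """Build the getImage URL that returns one series as a zip file."""
    query = urlencode({"SeriesInstanceUID": series_uid})
    return f"{BASE_URL}/getImage?{query}"


def unpack_series(zip_path: Path, root: Path, series: dict) -> Path:
    """Extract a series zip into root/Collection/Patient/Study/Series.

    The dictionary keys below are assumed field names, not confirmed ones.
    """
    target = (
        root
        / series["Collection"]
        / series["PatientID"]
        / series["StudyInstanceUID"]
        / series["SeriesInstanceUID"]
    )
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

Each zip could be fetched with something like `urllib.request.urlretrieve(get_image_url(uid), zip_path)` before calling `unpack_series`; that step is left out here so the sketch makes no network calls on its own.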
-
Regarding DICOM SR, you might also want to take a look at these datasets as examples: QIN-PROSTATE-Repeatability - https://doi.org/10.7937/K9/TCIA.2018.MR1CKGND
-
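We discussed a dev task in #2877, and @Nic-Ma has kindly come up with an initial solution: https://github.com/Project-MONAI/tutorials/blob/82e1e623c2cfaad3b3dd94db537bb743dce523a6/modules/tcia_csv_processing.ipynb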
-
Nice! I had not thought of downloading the CSV file. As long as we also have the option of specifying only a filename (and not a URL), so that we can load a local CSV file, your solution looks good.
Is it assumed that the first row of the CSV gives the name of each column (i.e., col_name)? That seems fine, but it should be made explicit in the description.
Should it be possible to specify rows for training, testing, and validation, much like the decathlon data?
Should there be an MD5 checksum for the CSV, to ensure its data hasn't been modified?
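To make these questions concrete, here is a minimal sketch of a loader that accepts either a URL or a local filename, treats the first row as column names, and optionally verifies an MD5 checksum. It is not the implementation from the linked notebook; the function names and the `md5` parameter are hypothetical, and only the Python standard library is used.

```python
import csv
import hashlib
import io
from typing import Dict, List, Optional
from urllib.parse import urlparse
from urllib.request import urlopen


def read_bytes(source: str) -> bytes:
    """Read raw bytes from an http(s) URL or, otherwise, a local file path."""
    if urlparse(source).scheme in ("http", "https"):
        with urlopen(source) as response:
            return response.read()
    with open(source, "rb") as handle:
        return handle.read()


def load_csv(source: str, md5: Optional[str] = None) -> List[Dict[str, str]]:
    """Load a CSV whose first row holds the column names.

    If md5 is given, the raw bytes are checked before parsing and a
    ValueError is raised on mismatch.
    """
    raw = read_bytes(source)
    if md5 is not None:
        digest = hashlib.md5(raw).hexdigest()
        if digest != md5.lower():
            raise ValueError(f"MD5 mismatch: expected {md5}, got {digest}")
    text = raw.decode("utf-8-sig")  # tolerate a byte-order mark from Excel
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)
```

Checking the digest on the raw bytes, before decoding, means that any change to the file (including line endings) is detected.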
On Mon, Sep 6, 2021 at 9:16 AM Wenqi Li wrote:
> We discussed a dev task in #2877, and @Nic-Ma has kindly come up with an initial solution:
> https://github.com/Project-MONAI/tutorials/blob/82e1e623c2cfaad3b3dd94db537bb743dce523a6/modules/tcia_csv_processing.ipynb
--
Stephen R. Aylward, Ph.D.
Senior Director of Strategic Initiatives
---
Kitware: *Delivering innovative, open source, scientific software.*
-
For specifying training, testing, and validation data, perhaps a column can
be designated to contain that info. It would greatly reduce the
complexity of the command line and allow for a full experiment to be
defined and reproducible.
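A small sketch of that idea, assuming a hypothetical column named `split` whose values are `training`, `validation`, or `testing` (the column name, the labels, and the sample rows are illustrative only):

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Illustrative data only; not taken from the ISPY1 spreadsheets.
EXAMPLE_CSV = """PatientID,age,RFS,split
1001,45,812,training
1002,51,640,training
1003,38,935,validation
1004,60,402,testing
"""


def group_by_split(
    rows: List[Dict[str, str]], split_column: str = "split"
) -> Dict[str, List[Dict[str, str]]]:
    """Group rows by the value of the designated split column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[split_column].strip().lower()].append(row)
    return dict(groups)


rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
splits = group_by_split(rows)
training_ids = [row["PatientID"] for row in splits["training"]]
```

Because the assignment lives in the CSV itself, the same file fully defines which cases belong to which subset, with no extra command-line arguments.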
-
This is a proposal from the MONAI Developers and the I/O, Data, and Deploy Working Groups.
A. Overview
Premise:
A potential growth area for MONAI is the incorporation of adjunct data (patient demographics, lab results, image acquisition parameters, and other non-image data) with images for diagnosis and outcome prediction. See the MIDL 2020 keynote talk by Prof. Nikos Paragios: https://2020.midl.io/keynotes.html
Goal:
Provide a reference implementation of a Dataset loader in MONAI to help guide challenge organizers and researchers in organizing adjunct data for input into MONAI.
Proposed solution:
Create a CSV_TCIA_Dataset loader for image and adjunct data, where the images are stored on The Cancer Imaging Archive (TCIA).
B. Data Details
Proposed data:
ISPY1: curated breast cancer MRI cases, with DICOM images and adjunct CSV data, available on TCIA.
Data Location:
https://wiki.cancerimagingarchive.net/display/Public/ISPY1
Data Description:
ACRIN 6657 was designed as a prospective study to test MRI for its ability to predict response to treatment and risk of recurrence in patients with stage 2 or 3 breast cancer receiving neoadjuvant chemotherapy (NACT). ACRIN 6657 was conducted as a companion study to CALGB 150007, a correlative science study evaluating tissue-based biomarkers in the setting of neoadjuvant treatment of breast cancer. Collectively, CALGB 150007 and ACRIN 6657 formed the basis of the multicenter Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and moLecular Analysis (I-SPY TRIAL) breast cancer trial, a study of imaging and tissue-based biomarkers for predicting pathologic complete response (pCR) and recurrence-free survival (RFS).
Participant Eligibility and Enrollment: Criteria for inclusion were patients enrolling on CALGB 150007 with T3 tumors measuring at least 3 cm in diameter by clinical exam or imaging and receiving neoadjuvant chemotherapy with an anthracycline-cyclophosphamide regimen alone or followed by a taxane. Pregnant patients and those with ferromagnetic prostheses were excluded from the study. The study was open to enrollment from May 2002 to March 2006. 237 patients were enrolled, of whom 230 met eligibility criteria.
C. Accessing the Data
1) MONAI users will have a local CSV file
The local CSV file points to the TCIA data via a URL and includes adjunct data and outcomes. This will be the input to the Dataset loader along with lists specifying columns for inputs and outcomes.
The local CSV file is based on info currently spread across multiple CSV files, but we should consolidate to one CSV for this demo. For example, see the two source CSV files at:
(a) https://drive.google.com/file/d/1DGAz4MVjupAiai3bOaYEerM6LImXIJg5/view?usp=sharing - Provides all adjunct medical data and the URL that points to each patient's collection on TCIA.
(b) https://drive.google.com/file/d/1D2zfyWCLfFHPwfDKIfNFEgeBqiS1YeNR/view?usp=sharing - Lists the MRI scans available for each patient on TCIA.
Both (a) and (b) should be combined into a single CSV file. That combination is partially completed in this file - https://drive.google.com/file/d/1HQ7BZvBr1edmi8HIwdG5KBweXWms5Uzk/view?usp=sharing
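One way the combination of (a) and (b) could be done is sketched below with the standard library: each scan row from (b) is extended with the clinical columns of its patient from (a). The join key `PatientID` and the other column names are assumptions for illustration; the actual headers in the two files may differ.

```python
from typing import Dict, List

Row = Dict[str, str]


def combine_tables(
    clinical_rows: List[Row], scan_rows: List[Row], key: str = "PatientID"
) -> List[Row]:
    """Return one row per scan, carrying that patient's clinical columns.

    Scans whose patient has no clinical row are skipped in this sketch.
    """
    clinical_by_patient = {row[key]: row for row in clinical_rows}
    combined = []
    for scan in scan_rows:
        clinical = clinical_by_patient.get(scan[key])
        if clinical is None:
            continue
        # Scan columns win if both tables share a column name.
        combined.append({**clinical, **scan})
    return combined


# Illustrative rows only; not taken from the actual spreadsheets.
clinical = [{"PatientID": "1001", "age": "45", "RFS": "812"}]
scans = [
    {"PatientID": "1001", "SeriesDescription": "Dynamic-3dfgre"},
    {"PatientID": "1002", "SeriesDescription": "Dynamic-3dfgre"},
]
merged = combine_tables(clinical, scans)
```

The result can be written back out with `csv.DictWriter` to produce the single consolidated file.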
2) The CSV file will be passed to the CSV_TCIA_Dataset command
Let's assume we want to load the adjunct data in columns M (age), N (ERpos), and P (PfRpos); the URL of the data on TCIA is given in column AM (https); and the outcome to be predicted is in column AE (RFS). The command may look like:
monai.apps.CsvTciaDataset("ISPY1_Combined.csv",["M","N","P"],["AM"],["AE"],"/tmp","training",transforms)
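Since the call above refers to columns by spreadsheet letter, a loader would need to translate letters into positions. The following is only a sketch of that translation, not part of any existing MONAI API; `column_index` and `select_columns` are hypothetical helper names.

```python
from typing import List, Sequence


def column_index(letters: str) -> int:
    """Convert a spreadsheet column label (A, B, ..., Z, AA, ...) to a 0-based index."""
    if not letters:
        raise ValueError("empty column label")
    index = 0
    for char in letters.upper():
        if not "A" <= char <= "Z":
            raise ValueError(f"invalid column label: {letters!r}")
        # Bijective base-26: A=1 ... Z=26, so AA=27.
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def select_columns(row: Sequence[str], labels: List[str]) -> List[str]:
    """Pick the cells of one CSV row that sit under the given column labels."""
    return [row[column_index(label)] for label in labels]
```

With this mapping, columns M, N, and P correspond to indices 12, 13, and 15, column AE to 30, and column AM to 38. Referring to columns by header name instead of letter would be less fragile if columns are ever reordered.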
3) Image data can be loaded from TCIA using its REST API
The API is documented at https://wiki.cancerimagingarchive.net/display/Public/TCIA+Programmatic+Interface+REST+API+Guides
Via that API, we can access individual cases. In this study, each case has studies from 4 different time points, and each time point has DICOM images and segmentations. For our example, the Dynamic-3dfgre series may be the most informative for predicting outcome, and we should use that scan from the last time point for each patient.
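The selection rule (the Dynamic-3dfgre scan from the last time point of each patient) could be sketched as follows. This assumes each series record is a dictionary with `PatientID`, `SeriesDescription`, and a `SeriesDate` string in a sortable format such as YYYY-MM-DD; those field names and the date format are assumptions to verify against real getSeries output.

```python
from typing import Dict, List


def last_dynamic_series(
    series: List[Dict[str, str]], keyword: str = "Dynamic-3dfgre"
) -> Dict[str, Dict[str, str]]:
    """Map each patient to their most recent series matching the keyword."""
    latest: Dict[str, Dict[str, str]] = {}
    for record in series:
        description = record.get("SeriesDescription") or ""
        if keyword.lower() not in description.lower():
            continue
        patient = record["PatientID"]
        current = latest.get(patient)
        # String comparison is valid only for sortable date formats.
        if current is None or record["SeriesDate"] > current["SeriesDate"]:
            latest[patient] = record
    return latest
```

If the time point is better identified by study description or study UID than by date, the comparison on `SeriesDate` is the one line that would need to change.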
D. Open Issues
E. Future Opportunities