Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single SeqInfo contains files from multiple DICOM series #813

Open
WinderAJ opened this issue Feb 13, 2025 · 0 comments
Open

Single SeqInfo contains files from multiple DICOM series #813

WinderAJ opened this issue Feb 13, 2025 · 0 comments

Comments

@WinderAJ
Copy link

WinderAJ commented Feb 13, 2025

The gist of the issue is that Heudiconv is grouping files into SeqInfo objects incorrectly for certain (albeit fringe) inputs. This results in a silent error in Heudiconv that is very counterintuitive for users to identify without going into Heudiconv's source and digging through the DICOM headers of the source data. In more detail:

For some data I was given, derived images have inherited their SeriesNumber and ProtocolName attributes from the original images (exactly as they were in the original images, with no modifications), but were assigned their own unique SeriesInstanceUID, ImageType, etc.

Image

In Heudiconv's dicoms.py, because files are grouped into SeqInfo objects based on the SeriesID NamedTuple, which tracks only SeriesNumber and ProtocolName, the derived images are being added to the file list of the same SeqInfo object as the original files despite the fact that the dicomwrapper for that SeqInfo reports the dicom header attributes for the original (and not the derived) images.

The consequence of this behavior is that I can write a heuristic file that will put original images in output file A and derived images in output file B, but then observe that derived dicom files are erroneously included in A (due to a collision of their SeriesID NamedTuple with the original image's SeriesID). No errors or warnings are given (at least with outtype=("dicom",) in create_key() of my heuristics file) and if I were to place a debug statement inside my heuristic's infotodict method that prints the image_type and series_uid of each SeqInfo, I would see that the SeqInfo for output file A reports the image type as ORIGINAL as well as the expected SeriesInstanceUID. However, some of the DICOM included in output file A would report an image type of DERIVED as well as an unexpected SeriesInstanceUID if examined individually. This makes the source of the unexpected behavior so difficult for a user to identify that I feel it could use some attention in Heudiconv's code.

Because I'm not familiar with the whole of Heudiconv's code, I'm wary about changing anything in the source without consulting the pros. That said, I've done some digging and I think I can identify exactly where the issue occurs. If that's potentially helpful, read on:


The original and derived images would have been correctly identified as belonging to different series on line 369 of dicoms.py, resulting in distinct values being appended to group_values. The issue (I think) only occurs because the original and derived images append the same SeriesID object to the group_keys list, which is then used to map each file to a dicomwrapper through group_map (on line 389) and to group files (on line 402).

ingrp = False
# check if same series was already converted
for idx in range(len(mwgroup)):
    if mw.is_same_series(mwgroup[idx]): # This is line 369
        if grouping != "all":
            assert (
                mwgroup[idx].dcm_data.get("StudyInstanceUID") == file_studyUID
            ), "Same series found for multiple different studies"
        ingrp = True
        series_id = SeriesID(
            mwgroup[idx].dcm_data.SeriesNumber,
            mwgroup[idx].dcm_data.ProtocolName,
        )
        if per_studyUID:
            series_id = series_id._replace(file_studyUID=file_studyUID)
        group_keys.append(series_id)
        group_values.append(idx)

if not ingrp:
    mwgroup.append(mw)
    group_keys.append(series_id)
    group_values.append(len(mwgroup) - 1)

group_map = dict(zip(group_keys, group_values))

...

for series_id, mwidx in sorted(group_map.items()):
    mw = mwgroup[mwidx]
    series_files = [files[i] for i, s in enumerate(group_keys) if s == series_id] # This is line 402

It doesn't look to me like it is necessary to use group_keys for either of those tasks. The list group_values already maps each file to the dicomwrapper of the first file in its series (as determined by dicomwrapper.is_same_series on line 369) which should be sufficent to group the files by series and retrieve the relevant dicomwrapper to pass to create_seqinfo. The value of series_id is derived from the dicomwrapper stored in mwgroup anyways, so it could be computed immediately before calling create_seqinfo without needing to track it at all through group_keys. I feel like I must be overlooking something very important, though, so I will leave it to more experienced Heudiconv contributors to determine what, if anything, should be changed.

Best,
Anthony

@WinderAJ WinderAJ changed the title SeriesID(NamedTuple) does not uniquely identify series Single SeqInfo contains files from multiple DICOM series Feb 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant