You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The gist of the issue is that Heudiconv is grouping files into SeqInfo objects incorrectly for certain (albeit fringe) inputs. This results in a silent error in Heudiconv that is very counterintuitive for users to identify without going into Heudiconv's source and digging through the DICOM headers of the source data. In more detail:
For some data I was given, derived images have inherited their SeriesNumber and ProtocolName attributes from the original images (exactly as they were in the original images, with no modifications), but were assigned their own unique SeriesInstanceUID, ImageType, etc.
In Heudiconv's dicoms.py, because files are grouped into SeqInfo objects based on the SeriesID NamedTuple, which tracks only SeriesNumber and ProtocolName, the derived images are being added to the file list of the same SeqInfo object as the original files despite the fact that the dicomwrapper for that SeqInfo reports the dicom header attributes for the original (and not the derived) images.
The consequence of this behavior is that I can write a heuristic file that will put original images in output file A and derived images in output file B, but then observe that derived dicom files are erroneously included in A (due to a collision of their SeriesID NamedTuple with the original image's SeriesID). No errors or warnings are given (at least with outtype=("dicom",) in create_key() of my heuristics file) and if I were to place a debug statement inside my heuristic's infotodict method that prints the image_type and series_uid of each SeqInfo, I would see that the SeqInfo for output file A reports the image type as ORIGINAL as well as the expected SeriesInstanceUID. However, some of the DICOM included in output file A would report an image type of DERIVED as well as an unexpected SeriesInstanceUID if examined individually. This makes the source of the unexpected behavior so difficult for a user to identify that I feel it could use some attention in Heudiconv's code.
Because I'm not familiar with the whole of Heudiconv's code, I'm wary about changing anything in the source without consulting the pros. That said, I've done some digging and I think I can identify exactly where the issue occurs. If that's potentially helpful, read on:
The original and derived images would have been correctly identified as belonging to different series on line 369 of dicoms.py, resulting in distinct values being appended to group_values. The issue (I think) only occurs because the original and derived images append the same SeriesID object to the group_keys list, which is then used to map each file to a dicomwrapper through group_map (on line 389) and to group files (on line 402).
ingrp = False
# check if same series was already converted
for idx in range(len(mwgroup)):
if mw.is_same_series(mwgroup[idx]): # This is line 369
if grouping != "all":
assert (
mwgroup[idx].dcm_data.get("StudyInstanceUID") == file_studyUID
), "Same series found for multiple different studies"
ingrp = True
series_id = SeriesID(
mwgroup[idx].dcm_data.SeriesNumber,
mwgroup[idx].dcm_data.ProtocolName,
)
if per_studyUID:
series_id = series_id._replace(file_studyUID=file_studyUID)
group_keys.append(series_id)
group_values.append(idx)
if not ingrp:
mwgroup.append(mw)
group_keys.append(series_id)
group_values.append(len(mwgroup) - 1)
group_map = dict(zip(group_keys, group_values))
...
for series_id, mwidx in sorted(group_map.items()):
mw = mwgroup[mwidx]
series_files = [files[i] for i, s in enumerate(group_keys) if s == series_id] # This is line 402
It doesn't look to me like it is necessary to use group_keys for either of those tasks. The list group_values already maps each file to the dicomwrapper of the first file in its series (as determined by dicomwrapper.is_same_series on line 369) which should be sufficent to group the files by series and retrieve the relevant dicomwrapper to pass to create_seqinfo. The value of series_id is derived from the dicomwrapper stored in mwgroup anyways, so it could be computed immediately before calling create_seqinfo without needing to track it at all through group_keys. I feel like I must be overlooking something very important, though, so I will leave it to more experienced Heudiconv contributors to determine what, if anything, should be changed.
Best,
Anthony
The text was updated successfully, but these errors were encountered:
WinderAJ
changed the title
SeriesID(NamedTuple) does not uniquely identify series
Single SeqInfo contains files from multiple DICOM series
Feb 13, 2025
The gist of the issue is that Heudiconv is grouping files into SeqInfo objects incorrectly for certain (albeit fringe) inputs. This results in a silent error in Heudiconv that is very counterintuitive for users to identify without going into Heudiconv's source and digging through the DICOM headers of the source data. In more detail:
For some data I was given, derived images have inherited their SeriesNumber and ProtocolName attributes from the original images (exactly as they were in the original images, with no modifications), but were assigned their own unique SeriesInstanceUID, ImageType, etc.
In Heudiconv's dicoms.py, because files are grouped into SeqInfo objects based on the SeriesID NamedTuple, which tracks only SeriesNumber and ProtocolName, the derived images are being added to the file list of the same SeqInfo object as the original files despite the fact that the dicomwrapper for that SeqInfo reports the dicom header attributes for the original (and not the derived) images.
The consequence of this behavior is that I can write a heuristic file that will put original images in output file A and derived images in output file B, but then observe that derived dicom files are erroneously included in A (due to a collision of their SeriesID NamedTuple with the original image's SeriesID). No errors or warnings are given (at least with outtype=("dicom",) in create_key() of my heuristics file) and if I were to place a debug statement inside my heuristic's infotodict method that prints the image_type and series_uid of each SeqInfo, I would see that the SeqInfo for output file A reports the image type as ORIGINAL as well as the expected SeriesInstanceUID. However, some of the DICOM included in output file A would report an image type of DERIVED as well as an unexpected SeriesInstanceUID if examined individually. This makes the source of the unexpected behavior so difficult for a user to identify that I feel it could use some attention in Heudiconv's code.
Because I'm not familiar with the whole of Heudiconv's code, I'm wary about changing anything in the source without consulting the pros. That said, I've done some digging and I think I can identify exactly where the issue occurs. If that's potentially helpful, read on:
The original and derived images would have been correctly identified as belonging to different series on line 369 of dicoms.py, resulting in distinct values being appended to group_values. The issue (I think) only occurs because the original and derived images append the same SeriesID object to the group_keys list, which is then used to map each file to a dicomwrapper through group_map (on line 389) and to group files (on line 402).
It doesn't look to me like it is necessary to use group_keys for either of those tasks. The list group_values already maps each file to the dicomwrapper of the first file in its series (as determined by dicomwrapper.is_same_series on line 369) which should be sufficent to group the files by series and retrieve the relevant dicomwrapper to pass to create_seqinfo. The value of series_id is derived from the dicomwrapper stored in mwgroup anyways, so it could be computed immediately before calling create_seqinfo without needing to track it at all through group_keys. I feel like I must be overlooking something very important, though, so I will leave it to more experienced Heudiconv contributors to determine what, if anything, should be changed.
Best,
Anthony
The text was updated successfully, but these errors were encountered: