Merge pull request #78 from yarikoptic/enh-codespell

Add codespell support: config, workflow + make it fix all typos
PeerHerholz · Jun 10, 2024 · dc0153e
2 parents 23317f6 + f91be07
commit dc0153e

Showing 13 changed files with 41 additions and 11 deletions.
diff --git a/.github/workflows/codespell.yml b/.github/workflows/codespell.yml
@@ -0,0 +1,23 @@
+# Codespell configuration is within setup.cfg
+---
+name: Codespell
+
+on:
+  push:
+    branches: [master]
+  pull_request:
+    branches: [master]
+
+permissions:
+  contents: read
+
+jobs:
+  codespell:
+    name: Check for spelling errors
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Codespell
+        uses: codespell-project/actions-codespell@v2
diff --git a/.github/workflows/container_build_publish.yml b/.github/workflows/container_build_publish.yml
@@ -19,7 +19,7 @@ jobs:
       - name: Checkout code
         uses: actions/checkout@v3
 
-      # setup Docker buld action
+      # setup Docker build action
       - name: Set up Docker Buildx
         id: buildx
         uses: docker/setup-buildx-action@v2

diff --git a/CODE_OF_CONDUCT.rst b/CODE_OF_CONDUCT.rst
@@ -44,7 +44,7 @@ Project maintainers have the right and responsibility to remove, edit, or reject
 
 ## Enforcement
 
-Members of the community who violate these rules - no matter how much they have contributed to the BIDS Starter Kit, or how specialised their skill set - will be approached by Peer or Rita. If inappropriate behaviour persists after this discussion, the contributer will be asked to discontinue their participation in the BIDSonym project.
+Members of the community who violate these rules - no matter how much they have contributed to the BIDS Starter Kit, or how specialised their skill set - will be approached by Peer or Rita. If inappropriate behaviour persists after this discussion, the contributor will be asked to discontinue their participation in the BIDSonym project.
 
 **To report an issue you have with community interactions** please contact [Peer](https://github.com/peerherholz) or [Michael](https://github.com/M-earnest). All communication will be treated as confidential.
 

diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
@@ -55,7 +55,7 @@ If you find one that's similar but there are subtle differences, please referenc
 *These pull requests have been closed for inactivity.*
 
 Before proposing a new pull request, browse through the "orphaned" pull requests.
-You may find that someone has already made significant progress toward your goal, and you can re-use their
+You may find that someone has already made significant progress toward your goal, and you can reuse their
 unfinished work.
 An adopted PR should be updated to merge or rebase the current master, and a new PR should be created (see
 below) that references the original PR.

diff --git a/bidsonym/_version.py b/bidsonym/_version.py
@@ -268,7 +268,7 @@ def git_pieces_from_vcs(tag_prefix, root, verbose, run_command=run_command):
         # TAG-NUM-gHEX
         mo = re.search(r'^(.+)-(\d+)-g([0-9a-f]+)$', git_describe)
         if not mo:
-            # unparseable. Maybe git-describe is misbehaving?
+            # unparsable. Maybe git-describe is misbehaving?
             pieces["error"] = ("unable to parse git-describe output: '%s'"
                                % describe_out)
             return pieces

diff --git a/bidsonym/run_deeid.py b/bidsonym/run_deeid.py
@@ -87,7 +87,7 @@ def run_deeid():
     if args.brainextraction is None:
         raise Exception("For post defacing quality it is required to run a form of brainextraction"
                         "on the non-deindentified data. Thus please either indicate bet "
-                        "(--brainextration bet) or nobrainer (--brainextraction nobrainer).")
+                        "(--brainextraction bet) or nobrainer (--brainextraction nobrainer).")
 
     if args.skip_bids_validation:
         print("Input data will not be checked for BIDS compliance.")

diff --git a/bidsonym/utils.py b/bidsonym/utils.py
@@ -212,7 +212,7 @@ def del_meta_data(bids_dir, subject_label, fields_del):
 
 def rename_non_deid(bids_dir, subject_label):
     """
-    Rename orginal non-defaced images and meta-data json files
+    Rename original non-defaced images and meta-data json files
     to add respective identifier ('desc-nondeid').
 
     Parameters

diff --git a/docs/reference.bib b/docs/reference.bib
@@ -20,7 +20,7 @@ @article{bischoff-grethe_technique_2007
 	pages = {892--903},
 	number = {9},
 	journaltitle = {Human brain mapping},
-	shortjournal = {Hum Brain Mapp},
+	shortjournal = {Hum Brain Map},
 	author = {Bischoff-Grethe, Amanda and Ozyurt, I. Burak and Busa, Evelina and Quinn, Brian T. and Fennema-Notestine, Christine and Clark, Camellia P. and Morris, Shaunna and Bondi, Mark W. and Jernigan, Terry L. and Dale, Anders M. and Brown, Gregory G. and Fischl, Bruce},
 	urldate = {2019-10-08},
 	date = {2007-09},

diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -30,7 +30,7 @@ to employ the ``latest``/most up to date ``version`` you can either run
 
     docker pull peerherholz/bidsonym:latest
 
-or the same command withouth the ``:latest`` tag, as ``Docker`` searches for the ``latest`` tag by default.
+or the same command without the ``:latest`` tag, as ``Docker`` searches for the ``latest`` tag by default.
 However, as the ``latest`` version is subject to changes and not necessarily in synch with the most recent ``numbered version``, it 
 is recommend to utilize the latter to ensure reproducibility. For example, if you want to employ ``BIDSonym v0.0.4`` the command would look as follows:
 

diff --git a/docs/source/processing_details.rst b/docs/source/processing_details.rst
@@ -47,7 +47,7 @@ When running ``BIDSonym``, the following processing steps are executed:
     back from to the ``bids_dataset`` directory without the necessity to run the corresponding DICOM to Nifti in
     BIDS conversion again.
 
-  4. **evalution of metadata**:
+  4. **evaluation of metadata**:
 
     The metadata found in both, the ``header of the images`` and ``sidecar JSON files`` will gathered
     and saved in a tabular data file (.tsv) of the form ``metadata field : value`` to the

diff --git a/paper/BIDSonym.bib b/paper/BIDSonym.bib
@@ -323,7 +323,7 @@ @misc{brett_nibabel_2020
 	title = {nibabel},
 	shorttitle = {nipy/nibabel},
 	url = {https://zenodo.org/record/3757992#.X-Tef-lKjUI},
-	abstract = {3.1.0 (Monday 20 April 2020) New feature release in the 3.1.x series. New features Conformation function (processing.conform) and CLI tool (nib-conform) to apply shape, orientation and zooms (pr/853) (Jakub Kaczmarzyk, reviewed by CM, YOH) Affine rescaling function (affines.rescale\_affine) to update dimensions and voxel sizes (pr/853) (CM, reviewed by Jakub Kaczmarzyk) Bug fixes Delay import of h5py until neded (pr/889) (YOH, reviewed by CM) Maintenance Fix typo in documentation (pr/893) (Zvi Baratz, reviewed by CM) Tests converted from nose to pytest (pr/865 + many sub-PRs) (Dorota Jarecka, Krzyzstof Gorgolewski, Roberto Guidotti, Anibal Solon, Or Duek, CM) API changes and deprecations kw\_only\_meth/kw\_only\_func decorators are deprecated (pr/848) (RM, reviewed by CM)},
+	abstract = {3.1.0 (Monday 20 April 2020) New feature release in the 3.1.x series. New features Conformation function (processing.conform) and CLI tool (nib-conform) to apply shape, orientation and zooms (pr/853) (Jakub Kaczmarzyk, reviewed by CM, YOH) Affine rescaling function (affines.rescale\_affine) to update dimensions and voxel sizes (pr/853) (CM, reviewed by Jakub Kaczmarzyk) Bug fixes Delay import of h5py until needed (pr/889) (YOH, reviewed by CM) Maintenance Fix typo in documentation (pr/893) (Zvi Baratz, reviewed by CM) Tests converted from nose to pytest (pr/865 + many sub-PRs) (Dorota Jarecka, Krzyzstof Gorgolewski, Roberto Guidotti, Anibal Solon, Or Duek, CM) API changes and deprecations kw\_only\_meth/kw\_only\_func decorators are deprecated (pr/848) (RM, reviewed by CM)},
 	urldate = {2020-12-24},
 	publisher = {Zenodo},
 	author = {Brett, Matthew and Markiewicz, Christopher J. and Hanke, Michael and Côté, Marc-Alexandre and Cipollini, Ben and McCarthy, Paul and Jarecka, Dorota and Cheng, Christopher P. and Halchenko, Yaroslav O. and Cottaar, Michiel and Ghosh, Satrajit and Larson, Eric and Wassermann, Demian and Gerhard, Stephan and Lee, Gregory R. and Wang, Hao-Ting and Kastman, Erik and Kaczmarzyk, Jakub and Guidotti, Roberto and Duek, Or and Rokem, Ariel and Madison, Cindee and Morency, Félix C. and Moloney, Brendan and Goncalves, Mathias and Markello, Ross and Riddell, Cameron and Burns, Christopher and Millman, Jarrod and Gramfort, Alexandre and Leppäkangas, Jaakko and Sólon, Anibal and van den Bosch, Jasper J.F. and Vincent, Robert D. and Braun, Henry and Subramaniam, Krish and Gorgolewski, Krzysztof J. and Raamana, Pradeep Reddy and Nichols, B. Nolan and Baker, Eric M. and Hayashi, Soichi and Pinsard, Basile and Haselgrove, Christian and Hymers, Mark and Esteban, Oscar and Koudoro, Serge and Oosterhof, Nikolaas N. and Amirbekian, Bago and Nimmo-Smith, Ian and Nguyen, Ly and Reddigari, Samir and St-Jean, Samuel and Panfilov, Egor and Garyfallidis, Eleftherios and Varoquaux, Gael and Legarreta, Jon Haitz and Hahn, Kevin S. and Hinds, Oliver P. and Fauber, Bennet and Poline, Jean-Baptiste and Stutters, Jon and Jordan, Kesshi and Cieslak, Matthew and Moreno, Miguel Estevan and Haenel, Valentin and Schwartz, Yannick and Baratz, Zvi and Darwin, Benjamin C and Thirion, Bertrand and Papadopoulos Orfanos, Dimitri and Pérez-García, Fernando and Solovey, Igor and Gonzalez, Ivan and Palasubramaniam, Jath and Lecher, Justin and Leinweber, Katrin and Raktivan, Konstantinos and Fischer, Peter and Gervais, Philippe and Gadde, Syam and Ballinger, Thomas and Roos, Thomas and Reddam, Venkateswara Reddy and freec84},

diff --git a/paper/paper.md b/paper/paper.md
@@ -27,7 +27,7 @@ bibliography: BIDSonym.bib
 ---
 
 ## Statement of Need
-Due to the evolution of research incentives, technical advancements, and the development of new standards [@eickhoff_sharing_2016; @gorgolewski_brain_2016; @nichols_best_2017; @poldrack_toward_2013; @poldrack_making_2014; @poldrack_openfmri_2017], increasingly greater amounts of neuroimaging data are being shared either publicly or made available through data user agreements. These datasets originate from small samples of participants collected by individual research groups, as well as from “Big Data” samples including thousands of participants collected by large research consortia (UK Biobank [@sudlow_uk_2015], HCP [@van_essen_wu-minn_2013], ABIDE [@di_martino_autism_2014], ADNI [@mueller_alzheimers_2005], etc.) While data sharing is important and beneficial [@eickhoff_sharing_2016; @nichols_best_2017; @poldrack_making_2014; @poline_data_2012], the privacy of participant data must be protected [@bannier_open_2020; @brakewood_ethics_2013]. To that end, Ethic Review Boards and data sharing platforms typically require that uploaded datasets are provided in anonymized or pseudo-anonymized form, limiting participant reidentification.  However, the (pseudo-)anonymization process is deceptively complex; attempts at ensuring data privacy must take into consideration all dataset components, including imaging modalities, as well as national legal and ethical frameworks. Several algorithms have been developed to (pseudo-)anonymize imaging datasets but they offer limited solutions. Some are attached to specific software and some are limited to specific computing environments; most miss an in-depth assessment and treatment of the metadata attached to the dataset or lack the capacity to automatize (pseudo-)anonymization across large datasets. BIDSonym was created to address these points in one simple, flexible, and general tool that offers users an array of automated (pseudo-)anonymization options to augment participant privacy in neuroimaging datasets. 
There are two components of neuroimaging datasets that arguably pose the largest risk to maintaining participant privacy: the structural images and accompanying metadata (e.g., metadata text files or information embedded in image file headers). Structural images contain visible identifiable participant information via facial features like the eyes, nose, and mouth, and privacy is usually addressed through a process called “defacing”, within which all or a subset of these features are removed from the final structural data files. The metadata text files may additionally contain identifiable participant data through the recording of acquisition time and location, and personal details such as date of birth, height, and weight. Here, privacy is maintained by removing or blurring this information from the final dataset. BIDSonym addresses both vulnerabilities in neuroimaging datasets, obviating the need for multiple steps within a data sharing pipeline to ensure participant privacy.
+Due to the evolution of research incentives, technical advancements, and the development of new standards [@eickhoff_sharing_2016; @gorgolewski_brain_2016; @nichols_best_2017; @poldrack_toward_2013; @poldrack_making_2014; @poldrack_openfmri_2017], increasingly greater amounts of neuroimaging data are being shared either publicly or made available through data user agreements. These datasets originate from small samples of participants collected by individual research groups, as well as from “Big Data” samples including thousands of participants collected by large research consortia (UK Biobank [@sudlow_uk_2015], HCP [@van_essen_wu-minn_2013], ABIDE [@di_martino_autism_2014], ADNI [@mueller_alzheimers_2005], etc.) While data sharing is important and beneficial [@eickhoff_sharing_2016; @nichols_best_2017; @poldrack_making_2014; @poline_data_2012], the privacy of participant data must be protected [@bannier_open_2020; @brakewood_ethics_2013]. To that end, Ethic Review Boards and data sharing platforms typically require that uploaded datasets are provided in anonymized or pseudo-anonymized form, limiting participant reidentification.  However, the (pseudo-)anonymization process is deceptively complex; attempts at ensuring data privacy must take into consideration all dataset components, including imaging modalities, as well as national legal and ethical frameworks. Several algorithms have been developed to (pseudo-)anonymize imaging datasets but they offer limited solutions. Some are attached to specific software and some are limited to specific computing environments; most miss an in-depth assessment and treatment of the metadata attached to the dataset or lack the capacity to automate (pseudo-)anonymization across large datasets. BIDSonym was created to address these points in one simple, flexible, and general tool that offers users an array of automated (pseudo-)anonymization options to augment participant privacy in neuroimaging datasets. 
There are two components of neuroimaging datasets that arguably pose the largest risk to maintaining participant privacy: the structural images and accompanying metadata (e.g., metadata text files or information embedded in image file headers). Structural images contain visible identifiable participant information via facial features like the eyes, nose, and mouth, and privacy is usually addressed through a process called “defacing”, within which all or a subset of these features are removed from the final structural data files. The metadata text files may additionally contain identifiable participant data through the recording of acquisition time and location, and personal details such as date of birth, height, and weight. Here, privacy is maintained by removing or blurring this information from the final dataset. BIDSonym addresses both vulnerabilities in neuroimaging datasets, obviating the need for multiple steps within a data sharing pipeline to ensure participant privacy.
 
 ## Summary
 In concordance with the BIDS-App template [@gorgolewski_bids_2017], BIDSonym operates as a command line tool written in Python [@rossum_python_1995] and

diff --git a/setup.cfg b/setup.cfg
@@ -4,3 +4,10 @@ style = pep440
 versionfile_source = bidsonym/_version.py
 versionfile_build = bidsonym/_version.py
 tag_prefix = v
+
+[codespell]
+# Ref: https://github.com/codespell-project/codespell#using-a-config-file
+skip = .git,versioneer.py
+check-hidden = true
+# ignore-regex = 
+# ignore-words-list =
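
To make the new `[codespell]` section easier to review, here is a minimal sketch that parses the same settings with Python's standard `configparser` and prints the values they resolve to. It only approximates how codespell reads its configuration and does not reproduce codespell's own parsing. The `SETUP_CFG` string and the `read_codespell_options` helper are hypothetical names introduced for illustration.

```python
import configparser

# Copy of the section added to setup.cfg in this commit (diff markers removed).
SETUP_CFG = """\
[codespell]
# Ref: https://github.com/codespell-project/codespell#using-a-config-file
skip = .git,versioneer.py
check-hidden = true
# ignore-regex =
# ignore-words-list =
"""


def split_list(value):
    """Split a comma-separated option value into a list of non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def read_codespell_options(text):
    """Hypothetical helper: return the [codespell] options as plain Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["codespell"]
    return {
        # Paths that should be excluded from spell checking.
        "skip": split_list(section.get("skip", "")),
        # Whether hidden files and directories (such as .github/) are checked.
        "check_hidden": section.getboolean("check-hidden", fallback=False),
        # Commented out in setup.cfg, so this resolves to an empty list.
        "ignore_words": split_list(section.get("ignore-words-list", "")),
    }


options = read_codespell_options(SETUP_CFG)
print(options)
```

With `check-hidden = true`, files under hidden directories such as `.github/workflows/` are included in the check, which is consistent with the comment fix in `container_build_publish.yml` above. The two commented-out options are left in place for later use, for example to list words that should never be rewritten.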