Cumulative Commit: Pruning + Artefact Functionality & HowTo

First version, to be continued
ibs-lab · Feb 16, 2024 · 961ee89 · 961ee89
1 parent 51c9344
commit 961ee89
Show file tree

Hide file tree

Showing 8 changed files with 207 additions and 1 deletion.
diff --git a/_doc/contributing_code.md b/_doc/contributing_code.md
@@ -0,0 +1,86 @@
+# A brief beginners guide for contributing functionality
+
+The aim of this document is to help you get started with contributing your own code/functionality to cedalion. Since the toolbox is in its early stage the documentation is only beginning to be set up and information will be updated continuously.
+
+## General Rules
+When contributing code, try to incorporate it into the existing file and folder structure, which also represents cedalions package and subpackages. Python files (=modules) contain functions of the same category and folders (subpackages) contain python files of the same family. Examples:
+- "SplineSG" and "tPCA" functions should be part of the same python file (module) "artefactcorr", which contains all functions of the "artefact correction" category
+- "artefactcorr.py" and "quality.py" belong into the folder (subpackage) "sigproc", which contains all categories(modules) that belong to the"signal processing" family 
+
+Only if your new code does semantically or functionally not belong to any existing module or subpackage should you create new ones. But please do so in that case. To commit new code, please create a pull-request to  @emidell on github. The code will be merged after a code review.
+
+## File and folder structure
+When you write code, you will have to integrate it into the existing structure. The cedalion directory is structured into 4 sections: 
+
+![parent_directories](img/contributing_code/dirs_parent.png)
+1) **doc**: documentation.
+2) **examples**: a directory with jupyter notebooks that exemplify an assembled pipeline or functionality. Should be executable in a standalone fashion and annotated to explain the code to a naive user. If you add significantly complex new functionality, you generate an example notebook here. For contributing new functions, you might want to start with a jupyter notebook, from which you then migrate matured code into functions in the source folder.
+3) **src**: the source folder that is the codebase of the toolbox. More details below.
+4) **tests**: unit tests for cedalion code, are run upon commit on github to check code consistency.
+
+The following gives a brief overview of the **src** directory structure:
+
+![src_directories](img/contributing_code/dirs_src.png)
+
+Widely used general functionality is part of the files on the top level (e.g. nirs.py ).
+
+| **directory** | **explanation** |
+| ----------- | ----------- |
+| **data** | Look up tables and other small datasets often required for ecexuting functions. |
+| **dataclasses** | Dataclass definitions that are used in cedalion. Example: in xrschemas.py you can find that we work with xarray objects (see more detail in the next section). For time series data these have to have at least two dimensions: "time" and "channel".  |
+| **geometry** | Functions for geometric manipulations, e.g. for optode registration, building landmarks on a 3D head, head segmentation, etc. |
+| **imagereco** | Functions for DOT image reconstruction |
+| **io** | Functions for reading and writing data to and from cedalion. This includes for instance fnirs data in snirf format, probe geometries or reading anatomies (e.g. segmentation masks). |
+| **models** | Functions for data modelling, for instance the General Linear Model (GLM).|
+| **sigdecomp** | Functions for signal decomposition methods that are not part of a standard python distribution, e.g. advanced ICA methods.|
+| **sigproc** | Functions for time series signal processing, i.e. filtering, artefact rejection, signal quality assessment.|
+| **sim** | Functionality for simulation and data augmentation, for instance adding synthetic HRFs or creating toydata.|
+
+
+## Data Classes and X-Arrays 
+Cedalion is using XArrays as main data structures for processing. There are currently two jupyter notebooks that explain these data structures:
+
+[Using xarray-based data structures for calculating the Beer-Lambert transformation](https://github.com/ibs-lab/cedalion/blob/main/examples/new_conference_example1.ipynb)
+
+[An example finger tapping analysis](https://github.com/ibs-lab/cedalion/blob/main/examples/new_conference_example2.ipynb)
+
+## Example for contributing new functionality to Cedalion
+In this example we will incorporate two new functions into the toolbox: A function that identifies motion artifacts in fNIRS data and a function to prune bad channels. Both functions are replicating the Homer3 functions "hmR_MotionArtifact" and "hmR_PruneChannels". 
+
+### Where do these functions belong?
+The function to prune channels belongs to /siproc/quality.py and will simply be called "prune". It will then be available with 
+
+`import cedalion.sigproc.quality` and calling `quality.prune()`.
+
+The function to identify motion artifacts belongs to a new module in the sigproc subpackage, since at the time of this example sigproc only has a module "frequency" and "quality". We will call this module "artifact" and the function "id_motion". It will then be available with 
+
+`import cedalion.sigproc.artifact` and calling `artifact.id_motion()`.
+
+### Adding a new function
+The barebone structure of a function in cedalion is the following:
+
+```
+import cedalion.dataclasses as cdc
+import cedalion.typing as cdt
+from cedalion import Quantity, units
+
+
+@cdc.validate_schemas
+def function_name(inputVar1: cdt.NDTimeSeries, inputVar2: Quantity):
+    
+    """What does this function do?.
+
+    [1] Authors et al., "title", Journal vol., year, doi:
+    """
+
+    #
+    # YOUR CODE
+    #
+
+    return something
+
+```
+
+This generates a new function *function_name* with two input arguments *inputVar1* and *inputVar2* that returns what it did as *something*.
+The input arguments should have expected data types assigned to them. Here, *inputVar1* is expected to be of the data type "NDTimeSeriesSchema", which is an xarray that contains at least two dimensions "channel" and "time". *inputVar2* is expected to be of the data type *Quantity* ==[TBD? @emiddell]==.   
+The function is wrapped by putting `@cdc.validate_schemas`in front, which will check these data types and assert an error at runtime, if the inputs do not match the expected type.
diff --git a/_doc/img/contributing_code/dirs_parent.png b/_doc/img/contributing_code/dirs_parent.png
diff --git a/_doc/img/contributing_code/dirs_src.png b/_doc/img/contributing_code/dirs_src.png
diff --git a/examples/pruning_and_motion_artifacts.ipynb b/examples/pruning_and_motion_artifacts.ipynb
diff --git a/src/cedalion/dataclasses/xrschemas.py b/src/cedalion/dataclasses/xrschemas.py
@@ -83,6 +83,12 @@ def wrapper(*args, **kwargs):
 )
 
 
+NDDataSetSchema = DataArraySchema(
+    dims=["channel", "time"],
+    coords={"time": ["time", "samples"], "channel": ["channel"], "geo3d": ["label", "pos"] },
+)
+
+
 # FIXME better location?
 def build_timeseries(
     data: np.ndarray,

diff --git a/src/cedalion/sigproc/artifact.py b/src/cedalion/sigproc/artifact.py
@@ -0,0 +1,55 @@
+import numpy as np
+
+import cedalion.dataclasses as cdc
+import cedalion.typing as cdt
+from cedalion import Quantity, units
+
+
+@cdc.validate_schemas
+def id_motion(fNIRSdata: cdt.NDTimeSeries, 
+              tMotion: Quantity, tMask: Quantity, STDEVthresh: Quantity, AMPthresh: Quantity):
+    """Identify motion artifacts in an fNIRS input data array. If any active
+    data channel exhibits a signal change greater than std_thresh or amp_thresh, 
+    then a segment of data around that time point is marked as a motion artifact.
+
+    Based on Homer3 [1] v1.80.2 "hmR_MotionArtifact.m"
+    Boston University Neurophotonics Center
+    https://github.com/BUNPC/Homer3
+
+    [1] Huppert, T. J., Diamond, S. G., Franceschini, M. A., & Boas, D. A. (2009).
+     "HomER: a review of time-series analysis methods for near-infrared spectroscopy of the brain". 
+     Appl Opt, 48(10), D280–D298. https://doi.org/10.1364/ao.48.00d280
+    
+    
+    INPUTS:
+    amplitudes:     NDTimeSeries, input fNIRS data array with at least time and channel dimensions. 
+                    Expectation is data in optical density units.
+    tMotion:        Quantity, time in seconds for motion artifact detection. Checks for signal change indicative
+                    of a motion artifact over time range tMotion. 
+    tMask:          Quantity, time in seconds to mask around motion artifacts. Mark data over +/- tMask seconds 
+                    around the identified motion artifact as a motion artifact. 
+    STDEVthresh:    Quantity, threshold for std deviation of signal change. If the signal d for any given active 
+                    channel changes by more that stdev_thresh * stdev(d) over the time interval tMotion, then
+                    this time point is marked as a motion artifact. The standard deviation is determined for 
+                    each channel independently. Typical value ranges from 5 to 20. Use a value of 100 or greater 
+                    if you wish for this condition to not find motion artifacts.
+    AMPthresh:      Quantity, threshold for amplitude of signal change. If the signal d for any given active channel 
+                    changes by more that amp_thresh over the time interval tMotion, then this time point is marked 
+                    as a motion artifact. Typical value ranges from 0.01 to 0.3 for optical density units. 
+                    Use a value greater than 100 if you wish for this condition to not find motion artifacts.
+
+    OUTPUS: 
+    fNIRSdata:      NDTimeSeries, input fNIRS data array with an additional dimension "artefact_mask" that is a boolean
+                    array with "true" for data included and "false" to indicate motion artifacts.
+
+    PARAMETERS:
+    tMotion: 0.5
+    tMask: 1.0
+    STDEVthresh: 50.0
+    AMPthresh: 5.0
+
+    """
+
+
+
+    return sci
diff --git a/src/cedalion/sigproc/quality.py b/src/cedalion/sigproc/quality.py
@@ -2,6 +2,7 @@
 
 import cedalion.dataclasses as cdc
 import cedalion.typing as cdt
+import cedalion.sigproc
 from cedalion import Quantity, units
 
 from .frequency import freq_filter, sampling_rate
@@ -41,3 +42,60 @@ def sci(amplitudes: cdt.NDTimeSeries, window_length: Quantity):
     sci /= windows.std("window").prod("wavelength")
 
     return sci
+
+
+@cdc.validate_schemas
+def snr(amplitudes: cdt.NDTimeSeries):
+    """Calculates channel SNR of each channel and wavelength as the ratio betwean mean and std 
+    of the amplitude signal    
+    """
+    snr = amplitudes.mean("time") / amplitudes.std("time")
+
+    return snr
+
+
+#@cdc.validate_schemas
+def prune(data: cdt.NDTimeSeries, SNRThresh: Quantity, dRange: Quantity, SDrange: Quantity):
+    """Prune channels from the measurement list if their signal is too weak, too strong, or their 
+    standard deviation is too great. This function updates MeasListAct based on whether data 'd' 
+    meets these conditions as specified by'dRange' and 'SNRthresh'.
+
+    Based on Homer3 [1] v1.80.2 "hmR_PruneChannels.m"
+    Boston University Neurophotonics Center
+    https://github.com/BUNPC/Homer3
+
+    [1] Huppert, T. J., Diamond, S. G., Franceschini, M. A., & Boas, D. A. (2009).
+     "HomER: a review of time-series analysis methods for near-infrared spectroscopy of the brain". 
+     Appl Opt, 48(10), D280–D298. https://doi.org/10.1364/ao.48.00d280
+    
+    INPUTS:
+    amplitues:  NDDataSetSchema, input fNIRS data array with at least time and channel dimensions and geo3D coordinates.
+    SNRthresh:  Quantity, SNR threshold (unitless). 
+                If mean(d)/std(d) < SNRthresh then it is excluded as an active channel
+    dRange:     Quantity, in mm if mean(d) < dRange(1) or > dRange(2) then it is excluded as an active channel
+    SDrange - will prune channels with a source-detector separation <SDrange(1) or > SDrange(2)
+
+    OUTPUTS:    input NDDataSetSchema with a the Measurement List "MeasList" reduced by pruned channels 
+    """
+
+    # create list of active channels if it does not exist yet
+    if not hasattr(data, "MeasList"):
+        # add measurement list to dat xarray object. it countains the full list of channels
+        data['MeasList'] = list(data.coords['channel'].values)
+
+    # for all active channels in the MeasList check if they meet the conditions or drop them from the list
+
+    ## find channels with bad snr
+    # calculate SNR of channel amplitudes and find channels with SNR below threshold. Then remove them from the MeasList
+    dat_snr = snr(data.amp) 
+    drop_list = dat_snr.where(dat_snr < SNRThresh).dropna(dim="channel").channel.values   
+    data = data.drop_sel(MeasList=drop_list)
+
+    ##
+
+
+
+    ##
+
+
+    return data
diff --git a/src/cedalion/typing.py b/src/cedalion/typing.py
@@ -2,9 +2,10 @@
 
 import xarray as xr
 
-from cedalion.dataclasses.xrschemas import LabeledPointCloudSchema, NDTimeSeriesSchema
+from cedalion.dataclasses.xrschemas import LabeledPointCloudSchema, NDTimeSeriesSchema, NDDataSetSchema
 
 LabeledPointCloud = Annotated[xr.DataArray, LabeledPointCloudSchema]
 NDTimeSeries = Annotated[xr.DataArray, NDTimeSeriesSchema]
+NDDataSet = Annotated[xr.Dataset, NDDataSetSchema]
 
 AffineTransform: TypeAlias = xr.DataArray