Skip to content

Commit

Permalink
Cumulative Commit: Pruning + Artefact Functionality & HowTo
Browse files Browse the repository at this point in the history
First version, to be continued
  • Loading branch information
avolu committed Feb 16, 2024
1 parent 51c9344 commit 961ee89
Show file tree
Hide file tree
Showing 8 changed files with 207 additions and 1 deletion.
86 changes: 86 additions & 0 deletions _doc/contributing_code.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
# A brief beginners guide for contributing functionality

The aim of this document is to help you get started with contributing your own code/functionality to cedalion. Since the toolbox is in its early stage the documentation is only beginning to be set up and information will be updated continuously.

## General Rules
When contributing code, try to incorporate it into the existing file and folder structure, which also represents cedalions package and subpackages. Python files (=modules) contain functions of the same category and folders (subpackages) contain python files of the same family. Examples:
- "SplineSG" and "tPCA" functions should be part of the same python file (module) "artefactcorr", which contains all functions of the "artefact correction" category
- "artefactcorr.py" and "quality.py" belong into the folder (subpackage) "sigproc", which contains all categories(modules) that belong to the"signal processing" family

Only if your new code does semantically or functionally not belong to any existing module or subpackage should you create new ones. But please do so in that case. To commit new code, please create a pull-request to @emidell on github. The code will be merged after a code review.

## File and folder structure
When you write code, you will have to integrate it into the existing structure. The cedalion directory is structured into 4 sections:

![parent_directories](img/contributing_code/dirs_parent.png)
1) **doc**: documentation.
2) **examples**: a directory with jupyter notebooks that exemplify an assembled pipeline or functionality. Should be executable in a standalone fashion and annotated to explain the code to a naive user. If you add significantly complex new functionality, you generate an example notebook here. For contributing new functions, you might want to start with a jupyter notebook, from which you then migrate matured code into functions in the source folder.
3) **src**: the source folder that is the codebase of the toolbox. More details below.
4) **tests**: unit tests for cedalion code, are run upon commit on github to check code consistency.

The following gives a brief overview of the **src** directory structure:

![src_directories](img/contributing_code/dirs_src.png)

Widely used general functionality is part of the files on the top level (e.g. nirs.py ).

| **directory** | **explanation** |
| ----------- | ----------- |
| **data** | Look up tables and other small datasets often required for ecexuting functions. |
| **dataclasses** | Dataclass definitions that are used in cedalion. Example: in xrschemas.py you can find that we work with xarray objects (see more detail in the next section). For time series data these have to have at least two dimensions: "time" and "channel". |
| **geometry** | Functions for geometric manipulations, e.g. for optode registration, building landmarks on a 3D head, head segmentation, etc. |
| **imagereco** | Functions for DOT image reconstruction |
| **io** | Functions for reading and writing data to and from cedalion. This includes for instance fnirs data in snirf format, probe geometries or reading anatomies (e.g. segmentation masks). |
| **models** | Functions for data modelling, for instance the General Linear Model (GLM).|
| **sigdecomp** | Functions for signal decomposition methods that are not part of a standard python distribution, e.g. advanced ICA methods.|
| **sigproc** | Functions for time series signal processing, i.e. filtering, artefact rejection, signal quality assessment.|
| **sim** | Functionality for simulation and data augmentation, for instance adding synthetic HRFs or creating toydata.|


## Data Classes and X-Arrays
Cedalion is using XArrays as main data structures for processing. There are currently two jupyter notebooks that explain these data structures:

[Using xarray-based data structures for calculating the Beer-Lambert transformation](https://github.com/ibs-lab/cedalion/blob/main/examples/new_conference_example1.ipynb)

[An example finger tapping analysis](https://github.com/ibs-lab/cedalion/blob/main/examples/new_conference_example2.ipynb)

## Example for contributing new functionality to Cedalion
In this example we will incorporate two new functions into the toolbox: A function that identifies motion artifacts in fNIRS data and a function to prune bad channels. Both functions are replicating the Homer3 functions "hmR_MotionArtifact" and "hmR_PruneChannels".

### Where do these functions belong?
The function to prune channels belongs to /siproc/quality.py and will simply be called "prune". It will then be available with

`import cedalion.sigproc.quality` and calling `quality.prune()`.

The function to identify motion artifacts belongs to a new module in the sigproc subpackage, since at the time of this example sigproc only has a module "frequency" and "quality". We will call this module "artifact" and the function "id_motion". It will then be available with

`import cedalion.sigproc.artifact` and calling `artifact.id_motion()`.

### Adding a new function
The barebone structure of a function in cedalion is the following:

```
import cedalion.dataclasses as cdc
import cedalion.typing as cdt
from cedalion import Quantity, units
@cdc.validate_schemas
def function_name(inputVar1: cdt.NDTimeSeries, inputVar2: Quantity):
"""What does this function do?.
[1] Authors et al., "title", Journal vol., year, doi:
"""
#
# YOUR CODE
#
return something
```

This generates a new function *function_name* with two input arguments *inputVar1* and *inputVar2* that returns what it did as *something*.
The input arguments should have expected data types assigned to them. Here, *inputVar1* is expected to be of the data type "NDTimeSeriesSchema", which is an xarray that contains at least two dimensions "channel" and "time". *inputVar2* is expected to be of the data type *Quantity* ==[TBD? @emiddell]==.
The function is wrapped by putting `@cdc.validate_schemas`in front, which will check these data types and assert an error at runtime, if the inputs do not match the expected type.
Binary file added _doc/img/contributing_code/dirs_parent.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added _doc/img/contributing_code/dirs_src.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Empty file.
6 changes: 6 additions & 0 deletions src/cedalion/dataclasses/xrschemas.py
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,12 @@ def wrapper(*args, **kwargs):
)


NDDataSetSchema = DataArraySchema(
dims=["channel", "time"],
coords={"time": ["time", "samples"], "channel": ["channel"], "geo3d": ["label", "pos"] },
)


# FIXME better location?
def build_timeseries(
data: np.ndarray,
Expand Down
55 changes: 55 additions & 0 deletions src/cedalion/sigproc/artifact.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
import numpy as np

import cedalion.dataclasses as cdc
import cedalion.typing as cdt
from cedalion import Quantity, units


@cdc.validate_schemas
def id_motion(fNIRSdata: cdt.NDTimeSeries,
tMotion: Quantity, tMask: Quantity, STDEVthresh: Quantity, AMPthresh: Quantity):
"""Identify motion artifacts in an fNIRS input data array. If any active
data channel exhibits a signal change greater than std_thresh or amp_thresh,
then a segment of data around that time point is marked as a motion artifact.
Based on Homer3 [1] v1.80.2 "hmR_MotionArtifact.m"
Boston University Neurophotonics Center
https://github.com/BUNPC/Homer3
[1] Huppert, T. J., Diamond, S. G., Franceschini, M. A., & Boas, D. A. (2009).
"HomER: a review of time-series analysis methods for near-infrared spectroscopy of the brain".
Appl Opt, 48(10), D280–D298. https://doi.org/10.1364/ao.48.00d280
INPUTS:
amplitudes: NDTimeSeries, input fNIRS data array with at least time and channel dimensions.
Expectation is data in optical density units.
tMotion: Quantity, time in seconds for motion artifact detection. Checks for signal change indicative
of a motion artifact over time range tMotion.
tMask: Quantity, time in seconds to mask around motion artifacts. Mark data over +/- tMask seconds
around the identified motion artifact as a motion artifact.
STDEVthresh: Quantity, threshold for std deviation of signal change. If the signal d for any given active
channel changes by more that stdev_thresh * stdev(d) over the time interval tMotion, then
this time point is marked as a motion artifact. The standard deviation is determined for
each channel independently. Typical value ranges from 5 to 20. Use a value of 100 or greater
if you wish for this condition to not find motion artifacts.
AMPthresh: Quantity, threshold for amplitude of signal change. If the signal d for any given active channel
changes by more that amp_thresh over the time interval tMotion, then this time point is marked
as a motion artifact. Typical value ranges from 0.01 to 0.3 for optical density units.
Use a value greater than 100 if you wish for this condition to not find motion artifacts.
OUTPUS:
fNIRSdata: NDTimeSeries, input fNIRS data array with an additional dimension "artefact_mask" that is a boolean
array with "true" for data included and "false" to indicate motion artifacts.
PARAMETERS:
tMotion: 0.5
tMask: 1.0
STDEVthresh: 50.0
AMPthresh: 5.0
"""



return sci
58 changes: 58 additions & 0 deletions src/cedalion/sigproc/quality.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

import cedalion.dataclasses as cdc
import cedalion.typing as cdt
import cedalion.sigproc
from cedalion import Quantity, units

from .frequency import freq_filter, sampling_rate
Expand Down Expand Up @@ -41,3 +42,60 @@ def sci(amplitudes: cdt.NDTimeSeries, window_length: Quantity):
sci /= windows.std("window").prod("wavelength")

return sci


@cdc.validate_schemas
def snr(amplitudes: cdt.NDTimeSeries):
"""Calculates channel SNR of each channel and wavelength as the ratio betwean mean and std
of the amplitude signal
"""
snr = amplitudes.mean("time") / amplitudes.std("time")

return snr


#@cdc.validate_schemas
def prune(data: cdt.NDTimeSeries, SNRThresh: Quantity, dRange: Quantity, SDrange: Quantity):
"""Prune channels from the measurement list if their signal is too weak, too strong, or their
standard deviation is too great. This function updates MeasListAct based on whether data 'd'
meets these conditions as specified by'dRange' and 'SNRthresh'.
Based on Homer3 [1] v1.80.2 "hmR_PruneChannels.m"
Boston University Neurophotonics Center
https://github.com/BUNPC/Homer3
[1] Huppert, T. J., Diamond, S. G., Franceschini, M. A., & Boas, D. A. (2009).
"HomER: a review of time-series analysis methods for near-infrared spectroscopy of the brain".
Appl Opt, 48(10), D280–D298. https://doi.org/10.1364/ao.48.00d280
INPUTS:
amplitues: NDDataSetSchema, input fNIRS data array with at least time and channel dimensions and geo3D coordinates.
SNRthresh: Quantity, SNR threshold (unitless).
If mean(d)/std(d) < SNRthresh then it is excluded as an active channel
dRange: Quantity, in mm if mean(d) < dRange(1) or > dRange(2) then it is excluded as an active channel
SDrange - will prune channels with a source-detector separation <SDrange(1) or > SDrange(2)
OUTPUTS: input NDDataSetSchema with a the Measurement List "MeasList" reduced by pruned channels
"""

# create list of active channels if it does not exist yet
if not hasattr(data, "MeasList"):
# add measurement list to dat xarray object. it countains the full list of channels
data['MeasList'] = list(data.coords['channel'].values)

# for all active channels in the MeasList check if they meet the conditions or drop them from the list

## find channels with bad snr
# calculate SNR of channel amplitudes and find channels with SNR below threshold. Then remove them from the MeasList
dat_snr = snr(data.amp)
drop_list = dat_snr.where(dat_snr < SNRThresh).dropna(dim="channel").channel.values
data = data.drop_sel(MeasList=drop_list)

##



##


return data
3 changes: 2 additions & 1 deletion src/cedalion/typing.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,10 @@

import xarray as xr

from cedalion.dataclasses.xrschemas import LabeledPointCloudSchema, NDTimeSeriesSchema
from cedalion.dataclasses.xrschemas import LabeledPointCloudSchema, NDTimeSeriesSchema, NDDataSetSchema

LabeledPointCloud = Annotated[xr.DataArray, LabeledPointCloudSchema]
NDTimeSeries = Annotated[xr.DataArray, NDTimeSeriesSchema]
NDDataSet = Annotated[xr.Dataset, NDDataSetSchema]

AffineTransform: TypeAlias = xr.DataArray

0 comments on commit 961ee89

Please sign in to comment.