Dataset
This class loads empirical datasets. Datasets are stored as matrices and can include functional (e.g. fMRI) and structural (e.g. DTI) data.
Format
Empirical datasets are
stored in the neurolib/data/datasets directory. In each dataset, subject-wise functional
and structural data is stored as MATLAB .mat matrices that can be opened in
Python using SciPy's loadmat function. Structural data are
\(N \times N\), and functional time series are \(N \times t\) matrices, \(N\) being the number
of brain regions and \(t\) the number of time steps. Example datasets are included in neurolib
and custom datasets can be added by placing them in the dataset directory.
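As a minimal sketch of the storage format described above, the snippet below writes two toy .mat files and reads them back with SciPy's loadmat. The file names and the key "tc" are assumptions made for illustration (the key "sc" appears as an example further down this page); the actual keys in a given dataset may differ.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

N, t = 4, 10  # toy sizes: 4 brain regions, 10 time steps
rng = np.random.default_rng(0)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names and keys, chosen only for this example.
    sc_path = os.path.join(tmp, "structural.mat")
    ts_path = os.path.join(tmp, "functional.mat")
    savemat(sc_path, {"sc": rng.random((N, N))})
    savemat(ts_path, {"tc": rng.random((N, t))})

    # loadmat returns a dict; keys starting with "__" hold file metadata.
    sc_file = loadmat(sc_path)
    data_keys = [k for k in sc_file if not k.startswith("__")]
    Cmat = sc_file["sc"]           # structural matrix, N x N
    bold = loadmat(ts_path)["tc"]  # functional time series, N x t

print(data_keys)
print(Cmat.shape, bold.shape)
```

Listing the non-metadata keys first is a convenient way to find out under which name a matrix was saved before indexing into the loaded dictionary.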
Structural DTI data
To simulate a whole-brain network model, first we need to load the structural connectivity
matrices from a DTI data set. The matrices are usually a result of processing DTI data and
performing fiber tractography using software like FSL or
DSIStudio. The handling of the datasets is done by the
Dataset class, and the attributes in the following refer to its instances.
Upon initialization, the subject-wise data set is loaded from disk. For all examples
in this paper, we use freely available data from the ConnectomeDB of the
Human Connectome Project (HCP). For a given parcellation of the brain
into \(N\) brain regions, these matrices are the \(N \times N\) adjacency matrix self.Cmat,
i.e. the structural connectivity matrix, which determines the coupling strengths between
brain areas, and the fiber length matrix Dmat which determines the signal
transmission delays. The two example datasets currently included in neurolib use the 80
cortical regions of the AAL2 atlas to define the brain areas and are
sorted in a LRLR-ordering.
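To illustrate how a fiber length matrix can determine transmission delays, here is a minimal sketch that divides each fiber length by a constant signal speed. The function name, the assumption that lengths are given in millimeters, and the default speed of 20 m/s are all hypothetical choices for this example and are not taken from the Dataset class.

```python
import numpy as np

def delays_ms(dmat_mm, signal_speed_m_per_s=20.0):
    """Convert fiber lengths (assumed in mm) into delays in ms.

    1 m/s equals 1 mm/ms, so dividing mm by m/s gives ms directly.
    """
    return np.asarray(dmat_mm, dtype=float) / signal_speed_m_per_s

# Toy 2-region fiber length matrix (illustrative values only).
Dmat = np.array([[0.0, 100.0],
                 [100.0, 0.0]])
delays = delays_ms(Dmat)
print(delays)
```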
Connectivity matrix normalization
The elements of the structural connectivity matrix Cmat are typically the number
of reconstructed fibers from DTI tractography. Since the number of fibers depends on the
method and the parameters of the (probabilistic or deterministic) tractography, they need to
be normalized using one of the three implemented methods. The first method, max,
divides the entries of Cmat by the largest entry, such that the largest
entry becomes 1. The second method, waytotal, divides the entries of each column of
Cmat by the number of fiber tracts generated from the respective brain region during
probabilistic tractography in FSL, which is contained in the waytotal.txt file.
The third method, nvoxel, divides the entries of each column of Cmat by the
size (e.g. the number of voxels) of the corresponding brain area. The last two methods yield
an asymmetric connectivity matrix, while the first one keeps Cmat symmetric.
All normalization steps are done on the subject-wise matrices Cmats and
Dmats. In a final step, all matrices can also be averaged across all subjects
to yield one Cmat and Dmat per dataset.
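The three normalization methods and the final averaging step can be sketched with NumPy as follows. This is an illustration of the column-wise description given in this section, not neurolib's own code: the function normalize_cmat and its waytotal and nvoxel arguments are hypothetical, and the toy values stand in for the contents of the waytotal.txt and nvoxel.txt files.

```python
import numpy as np

def normalize_cmat(cmat, method="max", waytotal=None, nvoxel=None):
    """Normalize one subject's connectivity matrix (hypothetical helper)."""
    cmat = np.asarray(cmat, dtype=float)
    if method == "max":
        # Largest entry becomes 1; symmetry is preserved.
        return cmat / np.max(cmat)
    if method == "waytotal":
        # Broadcasting over rows divides column j by waytotal[j].
        return cmat / np.asarray(waytotal, dtype=float)[np.newaxis, :]
    if method == "nvoxel":
        # Same column-wise division, using region sizes instead.
        return cmat / np.asarray(nvoxel, dtype=float)[np.newaxis, :]
    raise NotImplementedError(f"Unknown normalization method: {method}")

# Toy subject-wise matrices (illustrative values only).
Cmats = [
    np.array([[0.0, 4.0, 2.0], [4.0, 0.0, 8.0], [2.0, 8.0, 0.0]]),
    np.array([[0.0, 2.0, 6.0], [2.0, 0.0, 4.0], [6.0, 4.0, 0.0]]),
]
waytotal = np.array([2.0, 4.0, 8.0])

max_normed = [normalize_cmat(c, "max") for c in Cmats]
way_normed = normalize_cmat(Cmats[0], "waytotal", waytotal=waytotal)

# Average across subjects to obtain one matrix per dataset.
Cmat = np.mean(np.stack(max_normed), axis=0)
```

In this toy case the max-normalized matrices stay symmetric, while the waytotal-normalized one does not, because each column is scaled by a different factor.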
Functional MRI data
Subject-wise fMRI time series must be in a \((N \times t)\)-dimensional format, where \(N\) is the
number of brain regions and \(t\) the length of the time series. Each region-wise time series
represents the BOLD activity averaged across all voxels of that region, which can also be obtained
from software like FSL. Functional connectivity (FC) captures the spatial correlation structure
of the BOLD time series averaged across the entire time of the recording. FC matrices are
accessible via the attribute FCs and are generated by computing the Pearson correlation
of the time series between all regions, yielding a \(N \times N\) matrix for each subject.
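A minimal sketch of the FC computation described here, assuming each subject's BOLD data is an \(N \times t\) NumPy array. The helper name fc and the random toy data are illustrative only; np.corrcoef treats each row as one variable, which matches the region-by-time layout.

```python
import numpy as np

def fc(bold):
    """Pearson correlation between all pairs of regions (rows) of an N x t array."""
    return np.corrcoef(bold)

rng = np.random.default_rng(42)
# Toy data: 3 subjects, 5 regions, 200 time steps each.
bolds = [rng.standard_normal((5, 200)) for _ in range(3)]
FCs = [fc(b) for b in bolds]

print(FCs[0].shape)
```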
To capture the temporal fluctuations of time-dependent FC(t), which are lost when averaging
across the entire recording time, functional connectivity dynamics matrices (FCDs) are
computed as the element-wise Pearson correlation of time-dependent FC(t) matrices in a moving
window across the BOLD time series of a chosen window length of, for example, 1 min. This
yields a \(t_{FCD} \times t_{FCD}\) FCD matrix for each subject, with \(t_{FCD}\) being the number
of steps the window was moved.
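The moving-window procedure can be sketched as follows. This is an illustrative implementation under stated assumptions, not necessarily the one used by neurolib: the function name fcd, the window and step sizes (given in samples rather than seconds), and the use of only the upper triangle of each FC(t) matrix are choices made for this example.

```python
import numpy as np

def fcd(bold, window_size=30, step_size=5):
    """Correlate windowed FC(t) matrices of an N x t BOLD array with each other."""
    n_regions, n_steps = bold.shape
    # FC is symmetric with a unit diagonal, so the upper triangle suffices.
    triu = np.triu_indices(n_regions, k=1)
    starts = range(0, n_steps - window_size + 1, step_size)
    fc_vectors = np.array(
        [np.corrcoef(bold[:, s:s + window_size])[triu] for s in starts]
    )
    # Rows are windows, so this yields a t_FCD x t_FCD matrix.
    return np.corrcoef(fc_vectors)

rng = np.random.default_rng(0)
bold = rng.standard_normal((5, 200))  # toy data: 5 regions, 200 time steps
fcd_matrix = fcd(bold)
print(fcd_matrix.shape)
```

To express the window length in seconds (for example 1 min), the number of samples would be the window duration divided by the sampling interval of the BOLD recording.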
Source code in neurolib/utils/loadData.py
__init__(datasetName=None, normalizeCmats='max', fcd=False, subcortical=False)
Load the empirical data sets that are provided with neurolib.
Right now, datasets work on a per-subject basis. A dataset must be located
in the neurolib/data/datasets/ directory. Each subject's dataset
must be in the subjects subdirectory of that folder. In each subject
folder there is a directory called functional for time series data
and a directory called structural for the structural connectivity data.
See loadData.loadSubjectFiles() for more details on which files are
being loaded.
The structural connectivity data (accessible using the attribute
loadData.Cmat), can be normalized using the normalizeCmats flag.
This defaults to "max", which normalizes the Cmat by its maximum.
Other options are waytotal or nvoxel, which normalize the
Cmat by dividing every row of the matrix by the values in the waytotal or
nvoxel files that are provided in the datasets.
Info: waytotal.txt and nvoxel.txt are files extracted from
the tractography of DTI data using probtrackX from the FSL pipeline.
Individual subject data is provided with the class attributes:
- self.BOLDs: BOLD timeseries of each individual
- self.FCs: Functional connectivity of BOLD timeseries

Mean data is provided with the class attributes:
- self.Cmat: Structural connectivity matrix (for coupling strengths between areas)
- self.Dmat: Fiber length matrix (for delays)
- self.BOLDs: BOLD timeseries of each area
- self.FCs: Functional connectivity matrices of each BOLD timeseries
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| datasetName | str | Name of the dataset to load | None |
| normalizeCmats | str | Normalization method for the structural connectivity matrix. normalizationMethods = ["max", "waytotal", "nvoxel"] | 'max' |
| fcd | bool | Compute FCD matrices of BOLD data, defaults to False | False |
| subcortical | bool | Include subcortical areas from the atlas or not, defaults to False | False |
Source code in neurolib/utils/loadData.py
getDataPerSubject(name, apply='single', apply_function=None, apply_function_kwargs={}, normalizeCmats='max')
Load data of a certain kind for all subjects of the current dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of data type, e.g. "bold" or "cm" | required |
| apply | str, optional | Apply function per subject ("single") or on all subjects ("all"), defaults to "single" | 'single' |
| apply_function | function, optional | Apply function on data, defaults to None | None |
| apply_function_kwargs | dict, optional | Keyword arguments of function, defaults to {} | {} |
Returns:
| Type | Description |
|---|---|
| list[np.ndarray] | Subject-wise data, after applying the function |
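One plausible reading of the apply parameter is sketched below as a standalone function. This is an assumption about the intended semantics, not a copy of neurolib's code: the helper apply_to_subjects is hypothetical, "single" is taken to mean that the function is applied to each subject's array separately, and "all" that it receives the whole list of subject arrays at once.

```python
import numpy as np

def apply_to_subjects(subject_data, apply="single", apply_function=None,
                      apply_function_kwargs=None):
    """Hypothetical stand-in illustrating the "single" and "all" modes."""
    kwargs = apply_function_kwargs or {}
    values = list(subject_data)
    if apply_function is None:
        return values  # nothing to apply, return the raw subject-wise data
    if apply == "single":
        return [apply_function(v, **kwargs) for v in values]
    if apply == "all":
        return apply_function(values, **kwargs)
    raise ValueError(f"Unknown apply mode: {apply}")

rng = np.random.default_rng(1)
bolds = [rng.standard_normal((3, 50)) for _ in range(2)]  # 2 toy subjects

# Per subject: one FC matrix for each subject.
fcs = apply_to_subjects(bolds, apply="single", apply_function=np.corrcoef)

# Across subjects: one mean time series array for the whole dataset.
mean_bold = apply_to_subjects(
    bolds, apply="all", apply_function=np.mean,
    apply_function_kwargs={"axis": 0},
)
```

The sketch uses None instead of {} as the default for the keyword arguments, which avoids sharing one mutable dictionary between calls.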
Source code in neurolib/utils/loadData.py
loadDataset(datasetName, normalizeCmats='max', fcd=False, subcortical=False)
Load data into accessible class attributes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| datasetName | str | Name of the dataset (must be in | required |
| normalizeCmats | str, optional | Normalization method for Cmats, defaults to "max" | 'max' |
Raises:
| Type | Description |
|---|---|
| NotImplementedError | If unknown normalization method is used |
Source code in neurolib/utils/loadData.py
loadMatrix(matFileName, key='', verbose=False)
Function to furiously load .mat files with scipy.io.loadmat. Info: More formats are supported but commented out in the code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| matFileName | str | Filename of matrix to load | required |
| key | str | .mat file key in which data is stored (example: "sc") | '' |
Returns:
| Type | Description |
|---|---|
| numpy.ndarray | Loaded matrix |
Source code in neurolib/utils/loadData.py