Dataset
This class loads empirical datasets. Datasets are stored as matrices and can include functional (e.g. fMRI) and structural (e.g. DTI) data.
Format
Empirical datasets are stored in the neurolib/data/datasets directory. In each dataset,
subject-wise functional and structural data is stored as MATLAB .mat matrices that can be
opened in Python using SciPy's loadmat function. Structural data are \(N \times N\), and
functional time series are \(N \times t\) matrices, \(N\) being the number of brain regions
and \(t\) the number of time steps. Example datasets are included in neurolib and custom
datasets can be added by placing them in the dataset directory.
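The storage format described above can be illustrated with a small round trip through SciPy. This is a minimal sketch: the file names and the key "tc" are hypothetical, the key "sc" is borrowed from the loadMatrix example, and real datasets may use other keys.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

N, t = 4, 10  # toy sizes: 4 brain regions, 10 time steps
rng = np.random.default_rng(0)

with tempfile.TemporaryDirectory() as tmp:
    sc_path = os.path.join(tmp, "sc_example.mat")    # hypothetical file name
    ts_path = os.path.join(tmp, "bold_example.mat")  # hypothetical file name

    # Write toy matrices; "sc" and "tc" are assumed keys for this sketch.
    savemat(sc_path, {"sc": rng.random((N, N))})
    savemat(ts_path, {"tc": rng.random((N, t))})

    # loadmat returns a dict that maps each stored key to a NumPy array.
    structural = loadmat(sc_path)["sc"]
    functional = loadmat(ts_path)["tc"]

print(structural.shape)  # structural data: N x N
print(functional.shape)  # functional time series: N x t
```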
Structural DTI data
To simulate a whole-brain network model, we first need to load the structural connectivity
matrices from a DTI dataset. The matrices are usually the result of processing DTI data and
performing fiber tractography using software like FSL or DSIStudio. The handling of the
datasets is done by the Dataset class, and the attributes in the following refer to its
instances. Upon initialization, the subject-wise dataset is loaded from disk. For all examples
in this paper, we use freely available data from the ConnectomeDB of the
Human Connectome Project (HCP). For a given parcellation of the brain
into \(N\) brain regions, these matrices are the \(N \times N\) adjacency matrix self.Cmat,
i.e. the structural connectivity matrix, which determines the coupling strengths between
brain areas, and the fiber length matrix Dmat, which determines the signal
transmission delays. The two example datasets currently included in neurolib use the 80
cortical regions of the AAL2 atlas to define the brain areas and are
sorted in an LRLR ordering.
Connectivity matrix normalization
The elements of the structural connectivity matrix Cmat are typically the number
of reconstructed fibers from DTI tractography. Since the number of fibers depends on the
method and the parameters of the (probabilistic or deterministic) tractography, they need to
be normalized using one of the three implemented methods. The first method, max, simply
divides the entries of Cmat by the largest entry, such that the largest
entry becomes 1. The second method, waytotal, divides the entries of each column of
Cmat by the number of fiber tracts generated from the respective brain region during
probabilistic tractography in FSL, which is contained in the waytotal.txt file.
The third method, nvoxel, divides the entries of each column of Cmat by the
size, e.g., the number of voxels, of the corresponding brain area. The last two methods yield
an asymmetric connectivity matrix, while the first one keeps Cmat symmetric.
All normalization steps are done on the subject-wise matrices Cmats and Dmats.
In a final step, all matrices can also be averaged across all subjects
to yield one Cmat and Dmat per dataset.
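The three normalization methods and the final averaging step can be sketched in NumPy as follows. This is an illustration under stated assumptions, not neurolib's implementation: the function name normalize_cmat is hypothetical, and waytotal and nvoxel are assumed to be length-\(N\) vectors with one value per brain region.

```python
import numpy as np


def normalize_cmat(cmat, method="max", waytotal=None, nvoxel=None):
    """Hypothetical helper that normalizes one subject's connectivity matrix."""
    cmat = np.asarray(cmat, dtype=float)
    if method == "max":
        # Divide by the largest entry so that the maximum becomes 1.
        return cmat / cmat.max()
    if method == "waytotal":
        # Broadcasting a row vector divides column j by waytotal[j].
        return cmat / np.asarray(waytotal, dtype=float)[np.newaxis, :]
    if method == "nvoxel":
        # Same column-wise division, using the region sizes in voxels.
        return cmat / np.asarray(nvoxel, dtype=float)[np.newaxis, :]
    raise NotImplementedError(f"Unknown normalization method: {method!r}")


# Toy symmetric fiber-count matrix for a single subject (made-up numbers).
cmat = np.array([[0.0, 10.0, 4.0],
                 [10.0, 0.0, 2.0],
                 [4.0, 2.0, 0.0]])
waytotal = np.array([100.0, 200.0, 400.0])  # made-up tract counts per region

by_max = normalize_cmat(cmat, "max")
by_waytotal = normalize_cmat(cmat, "waytotal", waytotal=waytotal)

# Averaging across subjects: stack the subject-wise matrices and take the mean.
cmats = [by_max, normalize_cmat(2.0 * cmat, "max")]
mean_cmat = np.mean(np.stack(cmats), axis=0)
```

In this toy case the max method leaves the matrix symmetric, while the column-wise division does not, because each column is scaled by a different factor.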
Functional MRI data
Subject-wise fMRI time series must be in a \((N \times t)\)-dimensional format, where \(N\) is the
number of brain regions and \(t\) the length of the time series. Each region-wise time series
represents the BOLD activity averaged across all voxels of that region, which can also be obtained
from software like FSL. Functional connectivity (FC) captures the spatial correlation structure
of the BOLD time series averaged across the entire time of the recording. FC matrices are
accessible via the attribute FCs and are generated by computing the Pearson correlation
of the time series between all regions, yielding an \(N \times N\) matrix for each subject.
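A minimal sketch of the FC computation for one subject, using random numbers in place of real BOLD data. np.corrcoef treats each row as one variable, which matches the \(N \times t\) layout.

```python
import numpy as np

rng = np.random.default_rng(0)
N, t = 5, 200                        # toy sizes: 5 regions, 200 time steps
bold = rng.standard_normal((N, t))   # stand-in for one subject's BOLD data

# Pearson correlation between all pairs of region-wise time series.
fc = np.corrcoef(bold)

print(fc.shape)  # one N x N matrix per subject
```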
To capture the temporal fluctuations of time-dependent FC(t), which are lost when averaging
across the entire recording time, functional connectivity dynamics matrices (FCDs) are
computed as the element-wise Pearson correlation of time-dependent FC(t) matrices, each obtained
in a window of a chosen length (for example, 1 min) that is moved across the BOLD time series. This
yields a \(t_{FCD} \times t_{FCD}\) FCD matrix for each subject, with \(t_{FCD}\) being the number
of steps the window was moved.
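The moving-window procedure can be sketched as below. The function name fcd_matrix, the window and step sizes (given in samples, not minutes), and the choice to correlate only the upper triangle of each FC(t) are assumptions of this sketch, not a description of neurolib's code.

```python
import numpy as np


def fcd_matrix(bold, window, step):
    """Hypothetical helper: FCD from an N x t BOLD array via a moving window."""
    n_regions, n_steps = bold.shape
    # FC(t) is symmetric with a unit diagonal, so the upper triangle
    # already contains every distinct entry.
    upper = np.triu_indices(n_regions, k=1)
    fc_vectors = []
    for start in range(0, n_steps - window + 1, step):
        fc_t = np.corrcoef(bold[:, start:start + window])
        fc_vectors.append(fc_t[upper])
    # Correlate every windowed FC(t) with every other one.
    return np.corrcoef(np.array(fc_vectors))


rng = np.random.default_rng(0)
bold = rng.standard_normal((5, 300))  # toy data: 5 regions, 300 samples
fcd = fcd_matrix(bold, window=60, step=20)

print(fcd.shape)  # (number of window positions, number of window positions)
```

With 300 samples, a window of 60 and a step of 20, the window fits at 13 positions, so the toy FCD matrix is 13 x 13. For real data, the window length in samples follows from the desired duration divided by the sampling interval of the recording.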
Source code in neurolib/utils/loadData.py
__init__(datasetName=None, normalizeCmats='max', fcd=False, subcortical=False)
Load the empirical datasets that are provided with neurolib.
Right now, datasets work on a per-subject basis. A dataset must be located
in the neurolib/data/datasets/ directory. Each subject's data
must be in the subjects subdirectory of that folder. Each subject
folder contains a directory called functional for time series data
and a directory called structural for the structural connectivity data.
See loadData.loadSubjectFiles() for more details on which files are
being loaded.
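The directory layout described above can be reproduced in a temporary folder to make it concrete. This is only a sketch: the dataset name and subject folder names are hypothetical, and the data files themselves are left out.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # "my_dataset" and the subject folder names are hypothetical.
    dataset_dir = Path(tmp) / "datasets" / "my_dataset"
    for subject in ("subject_01", "subject_02"):
        for kind in ("functional", "structural"):
            (dataset_dir / "subjects" / subject / kind).mkdir(parents=True)

    # Walk the tree the way a per-subject loader could discover its inputs.
    subjects_dir = dataset_dir / "subjects"
    subjects = sorted(p.name for p in subjects_dir.iterdir() if p.is_dir())
    layout = {
        s: sorted(q.name for q in (subjects_dir / s).iterdir())
        for s in subjects
    }

print(layout)
```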
The structural connectivity data (accessible using the attribute
loadData.Cmat) can be normalized using the normalizeCmats flag.
This defaults to "max", which normalizes the Cmat by its maximum.
Other options are waytotal or nvoxel, which normalize the
Cmat by dividing every row of the matrix by the waytotal or
nvoxel files that are provided in the datasets.
Info: waytotal.txt and nvoxel.txt are files extracted from
the tractography of DTI data using probtrackX from the FSL pipeline.
Individual subject data is provided with the class attributes:
self.BOLDs: BOLD time series of each individual
self.FCs: Functional connectivity of BOLD time series
Mean data is provided with the class attributes:
self.Cmat: Structural connectivity matrix (for coupling strengths between areas)
self.Dmat: Fiber length matrix (for delays)
self.BOLDs: BOLD time series of each area
self.FCs: Functional connectivity matrices of each BOLD time series
Parameters:
Name | Type | Description | Default |
---|---|---|---|
datasetName | str | Name of the dataset to load | None |
normalizeCmats | str | Normalization method for the structural connectivity matrix. normalizationMethods = ["max", "waytotal", "nvoxel"] | 'max' |
fcd | bool | Compute FCD matrices of BOLD data, defaults to False | False |
subcortical | bool | Include subcortical areas from the atlas or not, defaults to False | False |
getDataPerSubject(name, apply='single', apply_function=None, apply_function_kwargs={}, normalizeCmats='max')
Load data of a certain kind for all subjects of the current dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of data type, e.g. "bold" or "cm" | required |
apply | str, optional | Apply function per subject ("single") or on all subjects ("all"), defaults to "single" | 'single' |
apply_function | function, optional | Apply function on data, defaults to None | None |
apply_function_kwargs | dict, optional | Keyword arguments of function, defaults to {} | {} |
Returns:
Type | Description |
---|---|
list[np.ndarray] | Subject-wise data, after the function has been applied |
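The difference between apply="single" and apply="all" can be illustrated with a stand-alone helper. This is a sketch of the documented semantics as read from the parameter table, not the body of getDataPerSubject; the name apply_to_subjects is hypothetical, and what the method actually returns in the "all" case is an assumption.

```python
import numpy as np


def apply_to_subjects(data, apply="single", apply_function=None,
                      apply_function_kwargs=None):
    """Hypothetical stand-in for the per-subject function application."""
    kwargs = apply_function_kwargs or {}
    if apply_function is None:
        # No function given: return the subject-wise data unchanged.
        return list(data)
    if apply == "single":
        # Call the function once per subject.
        return [apply_function(d, **kwargs) for d in data]
    if apply == "all":
        # Call the function once on the whole list of subjects.
        return apply_function(list(data), **kwargs)
    raise ValueError(f"Unknown apply mode: {apply!r}")


rng = np.random.default_rng(0)
bolds = [rng.standard_normal((4, 50)) for _ in range(3)]  # 3 toy subjects

# One FC matrix per subject.
fcs = apply_to_subjects(bolds, "single", np.corrcoef)

# One array for the whole group: the mean time series across subjects.
mean_bold = apply_to_subjects(bolds, "all", np.mean, {"axis": 0})
```

The sketch uses None instead of {} as the default for the keyword arguments, which avoids sharing one mutable dictionary between calls.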
loadDataset(datasetName, normalizeCmats='max', fcd=False, subcortical=False)
Load data into accessible class attributes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
datasetName | str | Name of the dataset (must be in | required |
normalizeCmats | str, optional | Normalization method for Cmats, defaults to "max" | 'max' |
Raises:
Type | Description |
---|---|
NotImplementedError | If unknown normalization method is used |
loadMatrix(matFileName, key='', verbose=False)
Function to furiously load .mat files with scipy.io.loadmat. Info: More formats are supported but commented out in the code.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
matFileName | str | Filename of matrix to load | required |
key | str | .mat file key in which data is stored (example: "sc") | '' |
Returns:
Type | Description |
---|---|
numpy.ndarray | Loaded matrix |
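A hedged sketch of how a key-based loader built on scipy.io.loadmat could look. The helper load_matrix is hypothetical, and its fallback for an empty key (use the only non-metadata entry in the file) is an assumption of this example, not documented behavior of loadMatrix.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_matrix(mat_file_name, key=""):
    """Hypothetical loader: return one matrix from a .mat file."""
    contents = loadmat(mat_file_name)
    if key:
        return contents[key]
    # loadmat adds metadata entries such as "__header__"; skip those.
    data_keys = [k for k in contents if not k.startswith("__")]
    if len(data_keys) != 1:
        raise KeyError(f"Expected exactly one data key, found {data_keys}")
    return contents[data_keys[0]]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.mat")  # hypothetical file name
    savemat(path, {"sc": np.eye(3)})

    by_key = load_matrix(path, key="sc")
    by_default = load_matrix(path)
```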