dsmtools.preprocessing
One of the major features of this package, and a prerequisite for running a neural network on neuron morphology data, is converting SWC files into sequences of quantifiable features. This module provides classes to:
- run quality control on a batch of SWC dataframes to make sure the structures can be converted to sequences.
- convert a batch of SWC files into sequences of tree node traversals annotated with useful features.
The classes under this module also use multiprocessing to boost speed, as the tasks are highly homogeneous.
Traversal of Binary Tree
A binary tree can be traversed in preorder, inorder, or postorder, which is the simplest way of converting a tree structure to a sequence. Here, we don't include the breadth- and depth-first methods, which produce more diverse results for general tree structures.
How do you get each of them? As a simple explanation, when you arrive at a tree node, whose children are typically recognized as the roots of its subtrees, do the following steps recursively:
Order | Step 1 | Step 2 | Step 3 |
---|---|---|---|
pre | output the current node | traverse the left subtree | traverse the right subtree |
in | traverse the left subtree | output the current node | traverse the right subtree |
post | traverse the left subtree | traverse the right subtree | output the current node |
This workflow is executed from the root to the leaves for every node in the tree. As the tree and its subtrees share the same structure, it is usually implemented with recursion.
Any binary tree can be uniquely represented by its pre+inorder or post+inorder traversal pair, and only by such pairs. A single traversal of any kind can be produced by multiple different binary trees, the number of which is given by the Catalan number.
The matter of left or right is decided by the builder of the tree. The two sides are essentially interchangeable, so you can simply call the side you start with the left.
A full explanation of tree traversal, if you are interested.
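As a concrete sketch of the three orders, here is a toy recursion on `(value, left, right)` tuples; this tuple tree is only an illustration, not this package's data structure:

```python
# Toy binary tree: (value, left, right); None marks a missing child.

def preorder(node):
    if node is None:
        return []
    value, left, right = node
    # output the current node first, then the left and right subtrees
    return [value] + preorder(left) + preorder(right)

def inorder(node):
    if node is None:
        return []
    value, left, right = node
    # left subtree, then the current node, then the right subtree
    return inorder(left) + [value] + inorder(right)

def postorder(node):
    if node is None:
        return []
    value, left, right = node
    # both subtrees first, the current node last
    return postorder(left) + postorder(right) + [value]

tree = ('a', ('b', ('d', None, None), ('e', None, None)), ('c', None, None))
```

Running the three functions on the same tree yields three different sequences of the same five nodes.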
As you may note, these 3 orders of traversal make sense only for binary trees. A node with more than 2 children would give more combinations of where to put the current node in the sequence. So how do we deal with neuron tree structures, which can have nodes with many children, especially at the soma? This is answered in the following section.
Neuron Tree to Sequence
To make a neuron tree traversable, some preprocessing steps have to be done. During quality control, the program detects any node with more than 2 children, except for the root node (soma), and turns it into a series of bifurcations by a specific priority (further explained in the QC section).
Now the whole neuron tree is still not binary, but all of its subtrees are, so we can reconstruct multiple binary trees from subsets of nodes excluding the root. After getting the traversal sequence for each tree, the program chains the sequences into one by a specific priority. You can also interpret this as an extended definition of traversal:
Order | Step 1 | Step 2 | Step 3 | ... | Step n |
---|---|---|---|---|---|
extended pre | parent | child 1 | child 2 | ... | child n-1 |
extended post | child 1 | child 2 | child 3 | ... | parent |
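The extended definition above can be sketched the same way, with toy `(value, children)` nodes standing in for the package's classes; each child's sequence is produced recursively and chained before or after the parent:

```python
# Toy n-ary tree node: (value, [children]).

def extended_traversal(node, order='pre'):
    value, children = node
    chained = []
    for child in children:
        # chain the child sequences in the order the children are stored
        chained += extended_traversal(child, order)
    # 'pre': parent before all child sequences; 'post': parent after them
    return [value] + chained if order == 'pre' else chained + [value]

stem = ('root', [('s1', [('a', []), ('b', [])]), ('s2', [])])
```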
You may still wonder:
What does the specific priority mean?
Subtrees are prioritized by total path length, which you can take as a weight. Every time we run into a node, we compare the weights of its 2 subtrees and decide whether the lesser (default) or the greater will be traversed first. The same applies to the final chaining of sequences, and we use this priority when converting multifurcating nodes to binary ones.
What kind of traversal order are you using?
Though we provide the API for all 3 orders, we only use preorder for our dataset generation. A preorder sequence can be mapped to multiple different tree structures. That said, we found that one order is quite enough for information embedding in our application, and there is no significant difference between them.
Now that we need only 1 traversal, why binary trees?
As you may have found, as long as the tree is traversed by a determined priority and only 1 order is enough, it no longer matters whether the tree is binary. That's true, but we still want this package to give you a chance to try the complete sequence representation of a tree in your own application. Another reason is that bifurcations make up the majority of neuron tree branch nodes, and bifurcations are what interest us most in analysis.
How many binary trees are there?
It depends on the number of stems your neuron has. A stem is a branch emerging from the root, so one binary tree corresponds to one stem. However, our program adopts a more interesting strategy to build binary trees, and you can actually get more than that. By default, we build trees for the axon and the dendrite separately. For neurons whose axon emerges from the dendrite rather than as a stem, this makes a difference, in that the axon would otherwise be counted as part of a dendrite binary tree. The Customization section has a more instructive guide.
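To illustrate the weight priority from the first answer, here is a toy computation of subtree total path length and a lesser-first ordering; the `id -> (parent, xyz)` layout is a stand-in, not this package's SWC format:

```python
import math

# toy layout: id -> (parent id, xyz); -1 marks the root's parent
nodes = {
    1: (-1, (0.0, 0.0, 0.0)),   # root
    2: (1, (1.0, 0.0, 0.0)),    # short stem
    3: (1, (0.0, 5.0, 0.0)),    # long stem...
    4: (3, (0.0, 9.0, 0.0)),    # ...continuing here
}

def children(nid):
    return [k for k, (p, _) in nodes.items() if p == nid]

def edge_len(nid):
    p, xyz = nodes[nid]
    return math.dist(xyz, nodes[p][1])

def subtree_weight(nid):
    # total path length of the subtree rooted at nid (edge to parent included)
    return edge_len(nid) + sum(subtree_weight(c) for c in children(nid))

# lesser-first (the default): the lighter subtree is traversed first
order = sorted(children(1), key=subtree_weight)
```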
Reasons for Quality Control
Before converting your SWC files to our sequences, some prerequisites should be met. They are:
- 1 tree for 1 SWC, i.e. no more than 1 root node in an SWC file.
- Short segments are removed.
- All non-root branching nodes are binary.
remove multiple trees
As a usual practice, a single SWC file contains a single neuron tree, but this may not hold for some automatic tracing algorithms or erroneous tracings.
Although our final transformation of a neuron tree to a sequence doesn't require a single connected component, some processing steps do rely on this, so if you need all the unconnected components, we suggest separating them beforehand.
Otherwise, our program will mark the first root in the table and retain only the component it's connected to.
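A toy sketch of this step, with an SWC reduced to a `{node: parent}` mapping (a simplification of the real dataframe):

```python
# keep only the component connected to the first root (-1 marks a root)

def retain_first_component(parent):
    roots = [n for n, p in parent.items() if p == -1]
    first_root = roots[0]  # the first root in the table wins
    kept = {first_root}
    # a node is kept once its parent is kept; iterate until stable
    changed = True
    while changed:
        changed = False
        for n, p in parent.items():
            if n not in kept and p in kept:
                kept.add(n)
                changed = True
    return {n: p for n, p in parent.items() if n in kept}

swc = {1: -1, 2: 1, 3: 2, 10: -1, 11: 10}  # two roots: 1 and 10
```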
guarantee bifurcation
As described above, nodes with many children allow multiple traversal choices. Although we don't strictly need a binary tree to get a pre/postorder sequence under the extended definition, the nature of neuron branching and the unique representation property of binary trees encourage us to assume one.
As already suggested, this QC step uses the subtree total path length as the branching priority. To be exact, a subtree branch of lesser scale is adjusted closer to the root. This way, if you traverse the tree from the lesser to the greater, the adjusted lesser part will always be visited first, equivalent to the traversal without the adjustment; the same holds if you start from the greater. Yet we still implement the traversal unit as binary, so the adjustment is necessary.
Together with this adjustment, the tree nodes are sorted so that node numbers increase from the root to the leaves and the maximum node ID equals the tree size. This step is favorable because we need to assign new nodes for the adjustment, and the final result will be sorted again. As this sort_swc function doesn't connect tree components (unlike the Vaa3D plugin sort_neuron_swc), it requires the input SWC to have only 1 root, which is why the first QC step is necessary.
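The degrading step can be sketched as follows; this toy function assumes the children are already sorted lesser-first and only mimics the idea of inserting virtual bifurcation nodes close to the root, not the package's exact implementation:

```python
# turn one multifurcation into a chain of bifurcations;
# children are assumed sorted lesser-first, so lesser branches
# end up closer to the root; new ids start past the current maximum

def degrade_to_bifurcation(parent_id, children, next_id):
    """Return (edges, next_id), edges as (child, parent) pairs."""
    edges = []
    current = parent_id
    while len(children) > 2:
        lesser, *children = children
        virtual = next_id          # new virtual bifurcation node
        next_id += 1
        edges.append((lesser, current))   # lesser child branches off here
        edges.append((virtual, current))  # the rest continue on a new node
        current = virtual
    for c in children:
        edges.append((c, current))
    return edges, next_id
```

For a node with three children, the result is two stacked bifurcations instead of one trifurcation.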
prune short segments
We found that the main skeleton of a neuron is sufficient for encoding the tree structure, and removing short segments reduces the computational load. The program prunes the tree iteratively until no segment is shorter than a given threshold. The pruning won't cause a tree to 'shrink', i.e. as long as a path is longer than the threshold, it won't be affected at all. You can see this pruning as removing branches that look like one. Its logic is identical to that of the Vaa3D plugin sort_neuron_swc.
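The iterative pruning can be sketched like this, assuming each node is stored as `(parent, edge length to parent)`; the real implementation works on SWC dataframes:

```python
# repeatedly remove terminal segments shorter than thr;
# a segment runs from a terminal up to the nearest branch node or root

def prune_short_segments(tree, thr):
    tree = dict(tree)
    while True:
        counts = {}
        for p, _ in tree.values():
            counts[p] = counts.get(p, 0) + 1
        # terminals have no children
        terminals = [n for n in tree if counts.get(n, 0) == 0]
        removed = False
        for t in terminals:
            seg, length = [t], tree[t][1]
            p = tree[t][0]
            # extend the segment upward through pass-through nodes
            while p in tree and counts.get(p, 0) == 1:
                seg.append(p)
                length += tree[p][1]
                p = tree[p][0]
            if length < thr:
                for n in seg:
                    del tree[n]
                removed = True
        if not removed:
            return tree

# toy tree: a 1-long twig (node 3) next to a 10-long branch (node 4)
swc = {1: (-1, 0.0), 2: (1, 10.0), 3: (2, 1.0), 4: (2, 10.0)}
```

Note how, after the twig is removed, node 2 merges into node 4's segment, yet that path survives because it exceeds the threshold: paths longer than the threshold never shrink.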
Sequence Customization
The data preparation module designed for our deep learning models can be easily understood and modified. You are encouraged to learn from the processing code and use its infrastructure to customize subtree sequences, in terms of not only different traversal orders or their combinations, but also the node types.
There is a trend of disintegrating a neuron into multiple subunits by some algorithm and quantifying their features, going beyond the conventional axon-dendrite dichotomy. We provide this very dichotomy as a function, but you can follow it to try your own partition of neuron nodes. Given any set of nodes, even if they are unconnected or lack terminals, the program can find binary trees for you.
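How a forest can be discovered inside an arbitrary node set may be sketched like this; a node whose parent falls outside the set starts a new subtree (toy `{node: parent}` layout again, not the actual API):

```python
def find_subtree_roots(parent, node_set):
    node_set = set(node_set)
    # a node whose parent is outside the set starts a new subtree
    return sorted(n for n in node_set if parent[n] not in node_set)

# toy topology: 1 is the root; the "axon" (4, 5) emerges from "dendrite" node 3
parent = {1: -1, 2: 1, 3: 2, 4: 3, 5: 4}
```

Selecting the axon and dendrite nodes separately yields one subtree each, while selecting them together yields a single subtree, which is why partitioning before discovery can change the resulting forest.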
Three classes are designed for data preparation besides QC: `NeuronSequenceDataset`, `NeuronTree` and `BinarySubtree`.
NeuronSequenceDataset
`NeuronSequenceDataset` is a high-level pipeline execution interface utilizing parallel computation. It commits data loading, quality control and traversal serially, but runs each step in parallel over the SWC files. As a good template to start with, it contains instances of parallel computation pairing a host processing function (ending with `_parallel`) with its own unit processing function (ending with `_proc`), for all three steps. In most cases, where you are satisfied with the multiprocessing design, you only need to inherit and override the unit functions and let the host invoke them for you.
In the traversal step, which you might be most interested in customizing, the high-level class calls a child process to execute binary tree reconstruction, traversal, and retrieval of spatial features indexed by the traversal within one SWC, where `NeuronTree` is put to use.
NeuronTree
`NeuronTree` is an SWC-level interface that finds binary subtrees from sets of nodes and commits traversals upon them.
After initialization, you can provide the interface with a set of nodes to find subtrees. The number of trees you get for a given node set depends on connectivity. If you specify all the nodes of a neuron tree, as the root is always ignored, it returns one subtree per stem. When you have multiple sets of nodes, like axon & dendrite, you can invoke this discovery process multiple times, once for each set. E.g., you can find axon trees first, and the class will store the found trees; then you invoke it for the dendrite, and the dendrite trees will be added to the storage. The ultimate number of subtrees can be larger than that of the node sets.
After the discovery process, you can use this interface to do a pre, in, or postorder traversal in a single shot. Here, you can also specify whether to go from lesser to greater subtrees or the other way around. If you'd like a more customizable traversal, you can retrieve the binary trees, traverse each of them yourself, and assemble the results as you like.
This class only allows SWC files whose non-root branch nodes are bifurcations, as it uses `BinarySubtree` for tree building.
BinarySubtree
`BinarySubtree` is the basic class that undertakes recursive searching of nodes, total subtree path length, and traversal. Its instances are basically tree nodes pointing to their children, so the built subtrees exposed by `NeuronTree` are actually their roots.
All the recursion functions are decorated to cache their results and to prevent the stack overflow caused by Python's recursion limit. Note that the caching applies to every node and can cause bugs if you modify the tree structure yourself, so you should keep the tree as what `NeuronTree` builds.
If you'd like other statistics of a binary subtree, you can write your own recursion functions. Please follow the instructions of `dsmtools.utils.misc.safe_recursion_bootstrap` if you want to bypass the stack problem. Caching can only be used by a tree node member function, so ideally you could inherit `BinarySubtree`, but building it would then require refactoring `NeuronTree`.
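The exact interface of `safe_recursion_bootstrap` isn't reproduced here, but the general generator-trampoline idea it alludes to looks like this sketch, where recursion depth lives on a heap-allocated stack of generators instead of Python's call stack:

```python
from types import GeneratorType

def bootstrap(func, stack=[]):
    # trampoline: drive generator-based recursion with an explicit stack
    def wrapped(*args, **kwargs):
        if stack:  # called from inside the trampoline: hand back a generator
            return func(*args, **kwargs)
        to = func(*args, **kwargs)
        while True:
            if isinstance(to, GeneratorType):
                stack.append(to)
                to = next(to)            # run until the first yield
            else:
                stack.pop()              # a final value was yielded
                if not stack:
                    break
                to = stack[-1].send(to)  # resume the caller with the value
        return to
    return wrapped

@bootstrap
def depth(parent_of, n):
    # edges from n up to the root; `yield` replaces return/recursive calls
    if parent_of[n] == -1:
        yield 0
    else:
        d = yield depth(parent_of, parent_of[n])
        yield d + 1
```

With this pattern, a chain far deeper than Python's default recursion limit (about 1000) can be processed without raising `RecursionError`.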
Below is the API documentation of the above classes, where you can learn more about their usage and implementation. Have fun!
```python
class NeuronSequenceDataset:
    """
    An interface for generating a dataset from multiple SWC files that can be further converted into a form
    usable by our deep learning models. See [the above guide](#neuronsequencedataset) for a more detailed
    description of its principles.

    The default usage is: load the SWC files given as paths during initialization, run the quality
    control defined by `SWCQualityControl`, then find binary trees in each neuron tree and commit
    traversals to get the sequences. These steps are defined as member functions and should be invoked
    in this order.

    All the above procedures are coded as a host function paired with a unit function that can be called
    in parallel for different SWC dataframes. You'll find 3 pairs of them, including:

    |Host|Unit|
    |----|----|
    |read_parallel|read_proc|
    |qc_parallel|qc_proc|
    |make_sequence_parallel|make_sequence_proc|

    See their own function documentation for more info.

    If you need to adapt some or all of the sequence conversion to your liking, you can inherit this class
    and override its functions. Overriding only the unit function will not change how the host invokes it
    and maintains the multiprocessing feature.

    The result of this class is multiple dataframes in an OrderedDict indexed by paths, which can be
    retrieved as a property or saved as a pickle. This result can be further fed to the deep learning model
    class in the `dsmtools.modeling` module. Here, you also have access to the SWC dataframes after QC
    as a member property (also an OrderedDict).

    **Note**: Do not modify the retrieved property data, as this can cause bugs. Deep copy them if you need.
    """

    def __init__(self,
                 swc_paths: Iterable[Union[str, 'PathLike[str]']],
                 show_failure=True,
                 debug=False,
                 chunk_size=1,
                 jobs=1):
        """Set up the class, specifying common processing options.

        :param swc_paths: An iterable of file paths to the SWC files to load. Note they will be forced as strings.
        :param show_failure: Whether to print failed SWC files during processing.
        :param debug: Whether to print traceback messages when processes fail.
        :param jobs: Number of processes for parallel computation.
        :param chunk_size: The chunk size for parallel processing.
        """

        self._swc_paths: tuple[str] = tuple(str(i) for i in swc_paths)
        self._swc_raw: Optional[OrderedDict[str, pd.DataFrame]] = None
        self._swc_qc: Optional[OrderedDict[str, pd.DataFrame]] = None
        self._sequences: Optional[OrderedDict[str, tuple[pd.DataFrame, str]]] = None
        self.chunk_size = chunk_size
        self.jobs = jobs
        self.show_failure = show_failure
        self.debug = debug

    def read_parallel(self):
        """
        Read all the given SWC files and store them as an OrderedDict of dataframes within this object.
        The multiprocessing parameters are given by the class initialization.
        """

        with Pool(self.jobs) as p:
            res = p.map(self.read_proc, self._swc_paths, chunksize=self.chunk_size)
        res = filter(lambda i: i[1] is not None, zip(self._swc_paths, res))
        self._swc_raw = OrderedDict(res)
        print(f'Successfully loaded {len(self._swc_raw)} SWC files.')

    def read_proc(self, path: str) -> Optional[pd.DataFrame]:
        """
        Loading of a single SWC file, with a try-except mechanism to prevent interruption.
        Failed processes will print a message and return None.

        :param path: Path to the SWC.
        :return: An SWC dataframe, or None if failed with some error.
        """

        try:
            return utils.swc.read_swc(path)
        except:
            if self.show_failure:
                if self.debug:
                    traceback.print_exc()
                print(f'Some error occurred when reading SWC: {path}, better check your file.')
            return None

    def qc_parallel(self, qc_len_thr=10, min_node_count=10):
        """
        Commit quality control on all the loaded SWC dataframes. This operation is not inplace; it sets up
        a new OrderedDict, which can be referenced by the property `NeuronSequenceDataset.swc_after_qc`.
        Here, you can specify QC parameters passed to the unit processing function.

        To prevent empty dataframes, this function applies a final filter on the QC results, the parameter
        of which is also customizable.

        Before running this function, it will check if SWC files are loaded. If not, it will invoke
        `NeuronSequenceDataset.read_parallel`.

        The multiprocessing parameters are given by the class initialization.

        :param qc_len_thr: The pruning length threshold for quality control.
        :param min_node_count: The minimum allowed count of nodes for the final SWC.
        """

        if self._swc_raw is None:
            warn("No SWC, running read_parallel..")
            self.read_parallel()
        qc_proc = partial(self.qc_proc, qc_len_thr=qc_len_thr)
        with Pool(self.jobs) as p:
            res = p.map(qc_proc, self._swc_raw.keys(), chunksize=self.chunk_size)
        res = filter(lambda i: i[1] is not None and len(i[1]) >= min_node_count, zip(self._swc_raw.keys(), res))
        self._swc_qc = OrderedDict(res)
        print(f'Successfully quality-controlled {len(self._swc_qc)} SWC files.')

    def qc_proc(self, key: str, qc_len_thr: float) -> Optional[pd.DataFrame]:
        """
        Quality control of a single SWC file, with a try-except mechanism to prevent interruption.
        Failed processes will print a message and return None.

        It serially commits retaining the tree from the detected root, bifurcation checking, and removing
        short segments. It's advised that this order is maintained.

        :param key: The key of the loaded SWC, should be a string path.
        :param qc_len_thr: The pruning length threshold for quality control.
        :return: An SWC dataframe, or None if failed with some error.
        """

        try:
            swc = self._swc_raw[key].copy(deep=True)
            SWCQualityControl(swc).retain_only_1st_root().degrade_to_bifurcation().prune_by_len(qc_len_thr)
            return swc
        except:
            if self.show_failure:
                if self.debug:
                    traceback.print_exc()
                print(f'Some error occurred during quality control with SWC: {key}.')
            return None

    def make_sequence_parallel(self, ordering='pre', lesser_first=True):
        """Convert all neuron trees passing the previous procedures to sequences. This operation is not
        inplace; it sets up a new OrderedDict, which can be referenced by the property
        `NeuronSequenceDataset.result`.
        Here, you can specify traversal parameters passed to the unit processing function.

        Before running this function, it will check if quality control is done. If not, it will invoke
        `NeuronSequenceDataset.qc_parallel`.

        The multiprocessing parameters are given by the class initialization.

        :param ordering: The traversal type, either 'pre', 'in', or 'post'.
        :param lesser_first: Whether to traverse from lesser to greater; affects both within and among subtrees.
        """

        if self._swc_qc is None:
            warn("No QC SWC, running qc_parallel with default parameters..")
            self.qc_parallel()
        assert ordering in ['pre', 'in', 'post']
        make_seq = partial(self.make_sequence_proc, ordering=ordering, lesser_first=lesser_first)
        with Pool(self.jobs) as p:
            res = p.map(make_seq, self._swc_qc)
        res = filter(lambda i: i[1] is not None, zip(self._swc_qc.keys(), res))
        self._sequences = OrderedDict(res)
        print(f'Successfully made {len(self._sequences)} sequences.')

    def make_sequence_proc(self, key: str, ordering: str, lesser_first: bool):
        """Convert an SWC dataframe to a feature dataframe indexed by a traversal sequence.

        It uses `NeuronTree` to build binary trees separately for axon and dendrite from an SWC, and
        traverses the trees with the parameters specified, which by default is preorder traversal with
        lesser subtrees prioritized.

        The final dataframe is similar to an SWC, but with some additional features, and is indexed by
        the traversal sequence.

        The added features are CCFv3 region abbreviations retrieved by the `dsmtools.utils.ccf_atlas`
        module (in column 'region'), and topological annotations (in column 'node_type') of four types:
        GBTR, representing general, branch, terminal, and root.

        :param key: The key of the loaded SWC, should be a string path.
        :param ordering: The traversal type, either 'pre', 'in', or 'post'.
        :param lesser_first: Whether to traverse from lesser to greater; affects both within and among subtrees.
        :return: A feature dataframe indexed by the traversal sequence, or None if failed with some error.
        """
        try:
            # deep copy the QC dataframe before adding new columns
            swc = self._swc_qc[key].copy(deep=True)
            nt = NeuronTree(swc)
            axon_index = swc.index[swc['type'] == utils.swc.Type.AXON]
            dendrite_index = swc.index[swc['type'].isin([utils.swc.Type.APICAL, utils.swc.Type.BASAL])]
            nt.find_binary_trees_in(axon_index)
            nt.find_binary_trees_in(dendrite_index)
            traversal = nt.collective_traversal(ordering=ordering, lesser_first=lesser_first)
            # node type, topological annotation
            root = swc.index[swc['parent'] == -1]
            children_counts = swc['parent'].value_counts()
            branch = children_counts.index[children_counts.values > 1]
            terminal = swc.index.difference(swc.loc[filter(lambda x: x != -1, swc['parent'])].index)
            swc['node_type'] = 'G'
            swc.loc[root, "node_type"] = 'R'
            swc.loc[branch, 'node_type'] = 'B'
            swc.loc[terminal, 'node_type'] = 'T'
            # ccf region annotation
            swc['region'] = utils.ccf_atlas.annotate_swc(swc)
            return swc.loc[traversal, ['x', 'y', 'z', 'type', 'node_type', 'region']]
        except:
            if self.show_failure:
                if self.debug:
                    traceback.print_exc()
                print(f'Some error occurred when converting SWC to sequence by default: {key}.')
            return None

    def pickle_qc(self, path: Union[str, 'PathLike[str]']):
        """Save the SWC dataframes after QC as an ordered dict to a pickle file.

        :param path: The path to save the pickle.
        """
        with open(path, 'wb') as f:
            pickle.dump(self._swc_qc, f)

    def pickle_result(self, path: Union[str, 'PathLike[str]']):
        """Save the generated sequences as an ordered dict to a pickle file.

        :param path: The path to save the pickle.
        """
        with open(path, 'wb') as f:
            pickle.dump(self._sequences, f)

    @property
    def swc_after_qc(self) -> Optional[OrderedDict[str, pd.DataFrame]]:
        """Reference to the SWC dataframes after quality control. If no QC is done, it's None."""
        return self._swc_qc

    @property
    def result(self) -> Optional[OrderedDict[str, pd.DataFrame]]:
        """The sequence dataframes as an OrderedDict, indexed by paths."""
        return self._sequences
```
An interface for generating a dataset from multiple SWC files that can be further converted into a form usable by our deep learning models. See the above guide for a more detailed description of its principles.
The default usage is: load the SWC files given as paths during initialization, run the quality control defined by `SWCQualityControl`, then find binary trees in each neuron tree and commit traversals to get the sequences. These steps are defined as member functions and should be invoked in this order.
All the above procedures are coded as a host function paired with a unit function that can be called in parallel for different SWC dataframes. You'll find 3 pairs of them, including:
Host | Unit |
---|---|
read_parallel | read_proc |
qc_parallel | qc_proc |
make_sequence_parallel | make_sequence_proc |
See their own function documentation for more info.
If you need to adapt some or all of the sequence conversion to your liking, you can inherit this class and override its functions. Overriding only the unit function will not change how the host invokes it and maintains the multiprocessing feature.
The result of this class is multiple dataframes in an OrderedDict indexed by paths, which can be retrieved as a property or saved as a pickle. This result can be further fed to the deep learning model class in the `dsmtools.modeling` module. Here, you also have access to the SWC dataframes after QC as a member property (also an OrderedDict).
Note: Do not modify the retrieved property data, as this can cause bugs. Deep copy them if you need.
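The host/unit pairing can be imitated in miniature; this sketch is not the package's code and uses the thread-based `multiprocessing.dummy.Pool` (same `Pool` API) so the toy classes avoid pickling concerns:

```python
from multiprocessing.dummy import Pool  # thread-based Pool, identical API

class Dataset:
    def __init__(self, items, jobs=2):
        self.items, self.jobs = items, jobs

    def read_parallel(self):
        # the host owns the pool logic and maps the unit over the items
        with Pool(self.jobs) as p:
            return p.map(self.read_proc, self.items)

    def read_proc(self, item):
        return item.upper()          # the default unit behavior

class MyDataset(Dataset):
    def read_proc(self, item):
        # overriding only the unit keeps the host's parallel machinery
        return item[::-1]
```

The real class follows the same shape with a process pool, which is why overriding a `_proc` function is enough to customize a whole step.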
```python
def __init__(self, swc_paths: Iterable[Union[str, 'PathLike[str]']],
             show_failure=True, debug=False, chunk_size=1, jobs=1)
```
Set up the class, specifying common processing options.
Parameters
- swc_paths: An iterable of file paths to the SWC files to load. Note they will be forced into strings.
- show_failure: Whether to print failed SWC files during processing.
- debug: Whether to print traceback messages when processes fail.
- chunk_size: The chunk size for parallel processing.
- jobs: Number of processes for parallel computation.
```python
def read_parallel(self)
```
Read all the given SWC files and store them as an OrderedDict of dataframes within this object. The multiprocessing parameters are given by the class initialization.
```python
def read_proc(self, path: str) -> Optional[pd.DataFrame]
```
Loading of a single SWC file, with a try-except mechanism to prevent interruption. Failed processes will print a message and return None.
Parameters
- path: Path to the SWC.
Returns
An SWC dataframe, or None if failed with some error.
```python
def qc_parallel(self, qc_len_thr=10, min_node_count=10)
```
Commit quality control on all the loaded SWC dataframes. This operation is not inplace; it sets up a new OrderedDict, which can be referenced by the property `NeuronSequenceDataset.swc_after_qc`. Here, you can specify QC parameters passed to the unit processing function.
To prevent empty dataframes, this function applies a final filter on the QC results, the parameter of which is also customizable.
Before running this function, it will check if SWC files are loaded. If not, it will invoke `NeuronSequenceDataset.read_parallel`.
The multiprocessing parameters are given by the class initialization.
Parameters
- qc_len_thr: The pruning length threshold for quality control.
- min_node_count: The minimum allowed count of nodes for the final SWC.
```python
def qc_proc(self, key: str, qc_len_thr: float) -> Optional[pd.DataFrame]
```
Quality control of a single SWC file, with a try-except mechanism to prevent interruption. Failed processes will print a message and return None.
It serially commits retaining the tree from the detected root, bifurcation checking, and removing short segments. It's advised that this order is maintained.
Parameters
- key: The key of the loaded SWC, should be a string path.
- qc_len_thr: The pruning length threshold for quality control.
Returns
An SWC dataframe, or None if failed with some error.
```python
def make_sequence_parallel(self, ordering='pre', lesser_first=True)
```
Convert all neuron trees passing the previous procedures to sequences. This operation is not inplace; it sets up a new OrderedDict, which can be referenced by the property `NeuronSequenceDataset.result`. Here, you can specify traversal parameters passed to the unit processing function.
Before running this function, it will check whether quality control is done. If not, it will invoke NeuronSequenceDataset.qc_parallel.
The multiprocessing parameters are given by the class initialization.
Parameters
- ordering: The traversal type, either 'pre', 'in', or 'post'.
- lesser_first: Whether to traverse lesser subtrees before greater ones; affects ordering both within and among subtrees.
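The keep-only-successes step at the end of this method can be illustrated with plain data; the keys and results below are made up:

```python
from collections import OrderedDict

# Hypothetical per-file results: the converter returns None on failure,
# and failed items are filtered out while insertion order is preserved.
keys = ['a.swc', 'b.swc', 'c.swc']
results = [(1, 2), None, (3,)]          # pretend b.swc failed
sequences = OrderedDict(
    (k, r) for k, r in zip(keys, results) if r is not None
)
print(f'Successfully made {len(sequences)} sequences.')
```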
183 def make_sequence_proc(self, key: str, ordering: str, lesser_first: bool): 184 """Convert an SWC dataframe to a feature dataframe indexed by a traversal sequence. 185 186 It uses `NeuronTree` to build binary trees separately for axon and dendrite from an SWC, and traverses the trees 187 with the parameters specified, which by default is preorder traversal and lesser subtrees prioritized. 188 189 The final dataframe is similar to an SWC, but different as it's with some additional features and indexed by 190 the traversal sequence. 191 192 The added features are CCFv3 region abbreviations retrieved by `dsmtools.utils.ccf_atlas` module (in columns 193 'region'), and topological annotations (in column 'node_type') of four types: GBTR, 194 representing general, branch, terminal, and root. 195 196 :param key: The key of the loaded SWC, should be a string path. 197 :param ordering: The traversal type, either 'pre', 'in', or 'post'. 198 :param lesser_first: Whether traversing from lesser to greater, affect both within and among subtrees. 
199 :return: 200 """ 201 try: 202 # shallow copy the qc dataframe for new columns 203 swc = self._swc_qc[key].copy(deep=True) 204 nt = NeuronTree(swc) 205 axon_index = swc.index[swc['type'] == utils.swc.Type.AXON] 206 dendrite_index = swc.index[swc['type'].isin([utils.swc.Type.APICAL, utils.swc.Type.BASAL])] 207 nt.find_binary_trees_in(axon_index) 208 nt.find_binary_trees_in(dendrite_index) 209 traversal = nt.collective_traversal(ordering=ordering, lesser_first=lesser_first) 210 # node type, topological annotation 211 root = swc.index[swc['parent'] == -1] 212 children_counts = swc['parent'].value_counts() 213 branch = children_counts.index[children_counts.values > 1] 214 terminal = swc.index.difference(swc.loc[filter(lambda x: x != -1, swc['parent'])].index) 215 swc['node_type'] = 'G' 216 swc.loc[root, "node_type"] = 'R' 217 swc.loc[branch, 'node_type'] = 'B' 218 swc.loc[terminal, 'node_type'] = 'T' 219 # ccf region annotation 220 swc['region'] = utils.ccf_atlas.annotate_swc(swc) 221 return swc.loc[traversal, ['x', 'y', 'z', 'type', 'node_type', 'region']] 222 except: 223 if self.show_failure: 224 if self.debug: 225 traceback.print_exc() 226 print(f'Some error occurred when converting SWC to sequence by default: {key}.') 227 return None
Convert an SWC dataframe to a feature dataframe indexed by a traversal sequence.
It uses NeuronTree to build binary trees separately for axon and dendrite from an SWC, and traverses the trees with the parameters specified, which default to preorder traversal with lesser subtrees prioritized.
The final dataframe is similar to an SWC, but differs in that it carries additional features and is indexed by the traversal sequence.
The added features are CCFv3 region abbreviations retrieved by the dsmtools.utils.ccf_atlas module (in column 'region'), and topological annotations (in column 'node_type') of four types, GBTR, representing general, branch, terminal, and root.
Parameters
- key: The key of the loaded SWC, should be a string path.
- ordering: The traversal type, either 'pre', 'in', or 'post'.
- lesser_first: Whether to traverse lesser subtrees before greater ones; affects ordering both within and among subtrees.
Returns
A feature dataframe indexed by the traversal, or None if failed with some error.
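The node-type labeling can be reproduced on a toy SWC. This is a simplified sketch of the annotation logic only (the CCF region lookup is omitted), with made-up node IDs:

```python
import pandas as pd

# Toy SWC indexed by node ID: 1 is the root, 2 branches into 3 and 4
swc = pd.DataFrame({'parent': [-1, 1, 2, 2]}, index=[1, 2, 3, 4])

root = swc.index[swc['parent'] == -1]
children_counts = swc['parent'].value_counts()
branch = children_counts.index[children_counts.values > 1]
# terminals: nodes that are nobody's parent
terminal = swc.index.difference(swc.loc[swc['parent'] != -1, 'parent'])

swc['node_type'] = 'G'                   # general by default
swc.loc[root, 'node_type'] = 'R'
swc.loc[branch, 'node_type'] = 'B'
swc.loc[terminal, 'node_type'] = 'T'
```

Note the assignment order: labeling R, then B, then T last means a node satisfying several conditions keeps the most specific label.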
229 def pickle_qc(self, path: Union[str, 'PathLike[str]']): 230 """Save the SWC dataframes after qc as an ordered dict to a pickle file. 231 232 :param path: the path to save the pickle 233 """ 234 with open(path, 'wb') as f: 235 pickle.dump(self._swc_qc, f)
Save the SWC dataframes after qc as an ordered dict to a pickle file.
Parameters
- path: the path to save the pickle
237 def pickle_result(self, path: Union[str, 'PathLike[str]']): 238 """Save the generated sequences as an ordered dict to a pickle file. 239 240 :param path: the path to save the pickle 241 """ 242 with open(path, 'wb') as f: 243 pickle.dump(self._sequences, f)
Save the generated sequences as an ordered dict to a pickle file.
Parameters
- path: the path to save the pickle
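For example, the saved pickle can be reloaded as the same ordered dict; the file name and sequence content below are hypothetical:

```python
import pickle
from collections import OrderedDict
from pathlib import Path
from tempfile import TemporaryDirectory

sequences = OrderedDict({'neuron_001.swc': (1, 2, 3)})  # made-up content
with TemporaryDirectory() as d:
    path = Path(d) / 'result.pickle'
    with open(path, 'wb') as f:
        pickle.dump(sequences, f)       # what pickle_result does internally
    with open(path, 'rb') as f:
        restored = pickle.load(f)
```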
9class SWCQualityControl: 10 """Interface for executing SWC quality controls. See [the above explanation](#reasons-for-qc) for its principle. 11 12 To use this class, initialize it with an SWC dataframe (used as reference, so all the changes are inplace), and 13 call each member function in a specific order (the recommended order is implemented by that of 14 `NeuronSequenceDataset.qc_proc`). 15 """ 16 17 def __init__(self, swc: pd.DataFrame): 18 """Set the referenced SWC dataframe to modify. 19 20 :param swc: An SWC dataframe, all changes are in place. 21 """ 22 23 self._swc = swc 24 25 def retain_only_1st_root(self): 26 """ 27 Find the first root(parent=-1) in the SWC dataframe and remove all the other nodes outside this component. 28 """ 29 30 roots = self._swc[self._swc.parent == -1] 31 if len(roots) == 1: 32 return self 33 if len(roots) == 0: 34 raise RuntimeError("No root in this SWC.") 35 visited = get_subtree_nodes(self._swc, roots.iloc[0, 0]) 36 self._swc.drop(index=self._swc.index.difference(visited), inplace=True) 37 return self 38 39 def degrade_to_bifurcation(self): 40 """Turn all branch nodes with more than 2 children into multiple serial bifurcations except for the root node. 41 42 It commits a node ID sorting at first to find the start number for new nodes (as new bifurcations can be 43 generated). Then it iterates through the unqualified nodes and turn them into bifurcations. 44 45 The new bifurcations from the original branch node are arranged in a way that subtrees of lesser total path 46 length are nearer to the root. 
47 """ 48 49 sort_swc(self._swc) 50 swc = self._swc.copy(deep=True) 51 new_ind = max(swc.index) 52 child_dict = get_child_dict(swc) 53 for node, children in child_dict.items(): 54 if len(children) <= 2 or swc.at[node, 'parent'] == -1: 55 continue 56 # get path length of all subtrees and sort 57 len_list = [get_path_len(swc, get_subtree_nodes(swc, c)) for c in children] 58 # from short to long disintegrate child segments to make them bifurcation 59 # short child seg would be brought up to a parent node first 60 # the parent node share the coordinate with the original branch node 61 for i in np.argsort(len_list)[:-1]: 62 to_move = children[i] 63 multi = self._swc.at[to_move, 'parent'] 64 parent = self._swc.at[multi, 'parent'] 65 new_ind += 1 66 self._swc.loc[new_ind] = self._swc.loc[multi].copy(deep=True) # new node 67 self._swc.at[new_ind, 'parent'] = parent 68 self._swc.at[multi, 'parent'] = new_ind 69 self._swc.at[to_move, 'parent'] = new_ind 70 sort_swc(self._swc) 71 return self 72 73 def prune_by_len(self, len_thr=10): 74 """ 75 This function iteratively prune the tree. Every time it finds the short terminal branches and delete them, 76 and new terminal branches emerge, until no short branches detected. This way, the tree is ensured to be with 77 no terminal branches shorter than the threshold, but the main skeleton to the farthest reach is maintained. 
78 79 :param len_thr: The min length allowed for terminal branches, default as 10 80 """ 81 82 while True: 83 segments = get_segments(self._swc) 84 drop_ind = [] 85 terminal_branch = np.unique([row['parentSeg'] for ind, row in segments.iterrows() if row['childSeg'] == []]) 86 for ind in terminal_branch: 87 # retain non-terminal branches 88 child_seg = [i for i in segments.at[ind, 'childSeg'] if segments.at[i, 'childSeg'] == []] 89 # get length for all segments 90 seg_len = [get_path_len(self._swc, segments.at[c, 'nodes']) for c in child_seg] 91 # all are terminal branches, then the longest branch must be retained 92 if child_seg == segments.at[ind, 'childSeg']: 93 k = np.argmax(seg_len) 94 child_seg.pop(k) 95 seg_len.pop(k) 96 # drop short branches 97 drop_ind.extend(chain.from_iterable(segments.at[c, 'nodes'] 98 for c, s in zip(child_seg, seg_len) if s <= len_thr)) 99 if not drop_ind: 100 break 101 self._swc.drop(index=drop_ind, inplace=True) 102 return self
Interface for executing SWC quality controls. See the above explanation for its principles.
To use this class, initialize it with an SWC dataframe (used as a reference, so all changes are inplace), and call each member function in a specific order (the recommended order is the one implemented by NeuronSequenceDataset.qc_proc).
17 def __init__(self, swc: pd.DataFrame): 18 """Set the referenced SWC dataframe to modify. 19 20 :param swc: An SWC dataframe, all changes are in place. 21 """ 22 23 self._swc = swc
Set the referenced SWC dataframe to modify.
Parameters
- swc: An SWC dataframe, all changes are in place.
25 def retain_only_1st_root(self): 26 """ 27 Find the first root(parent=-1) in the SWC dataframe and remove all the other nodes outside this component. 28 """ 29 30 roots = self._swc[self._swc.parent == -1] 31 if len(roots) == 1: 32 return self 33 if len(roots) == 0: 34 raise RuntimeError("No root in this SWC.") 35 visited = get_subtree_nodes(self._swc, roots.iloc[0, 0]) 36 self._swc.drop(index=self._swc.index.difference(visited), inplace=True) 37 return self
Find the first root (parent = -1) in the SWC dataframe and remove all nodes outside this root's connected component.
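A simplified sketch of this step, with a hypothetical `subtree_nodes` helper standing in for `get_subtree_nodes`:

```python
import pandas as pd

def subtree_nodes(swc: pd.DataFrame, root: int) -> set:
    # hypothetical stand-in for get_subtree_nodes: BFS over the child relation
    children = {}
    for node, parent in swc['parent'].items():
        children.setdefault(parent, []).append(node)
    visited, queue = set(), [root]
    while queue:
        n = queue.pop()
        visited.add(n)
        queue.extend(children.get(n, []))
    return visited

# two disconnected components, each with its own root (1 and 3)
swc = pd.DataFrame({'parent': [-1, 1, -1, 3]}, index=[1, 2, 3, 4])
first_root = swc.index[swc['parent'] == -1][0]
keep = subtree_nodes(swc, first_root)
swc = swc.drop(index=swc.index.difference(keep))
```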
39 def degrade_to_bifurcation(self): 40 """Turn all branch nodes with more than 2 children into multiple serial bifurcations except for the root node. 41 42 It commits a node ID sorting at first to find the start number for new nodes (as new bifurcations can be 43 generated). Then it iterates through the unqualified nodes and turn them into bifurcations. 44 45 The new bifurcations from the original branch node are arranged in a way that subtrees of lesser total path 46 length are nearer to the root. 47 """ 48 49 sort_swc(self._swc) 50 swc = self._swc.copy(deep=True) 51 new_ind = max(swc.index) 52 child_dict = get_child_dict(swc) 53 for node, children in child_dict.items(): 54 if len(children) <= 2 or swc.at[node, 'parent'] == -1: 55 continue 56 # get path length of all subtrees and sort 57 len_list = [get_path_len(swc, get_subtree_nodes(swc, c)) for c in children] 58 # from short to long disintegrate child segments to make them bifurcation 59 # short child seg would be brought up to a parent node first 60 # the parent node share the coordinate with the original branch node 61 for i in np.argsort(len_list)[:-1]: 62 to_move = children[i] 63 multi = self._swc.at[to_move, 'parent'] 64 parent = self._swc.at[multi, 'parent'] 65 new_ind += 1 66 self._swc.loc[new_ind] = self._swc.loc[multi].copy(deep=True) # new node 67 self._swc.at[new_ind, 'parent'] = parent 68 self._swc.at[multi, 'parent'] = new_ind 69 self._swc.at[to_move, 'parent'] = new_ind 70 sort_swc(self._swc) 71 return self
Turn all branch nodes with more than 2 children into multiple serial bifurcations except for the root node.
It first commits a node ID sorting to find the starting number for new nodes (as new bifurcations can be generated). Then it iterates through the unqualified nodes and turns them into bifurcations.
The new bifurcations from the original branch node are arranged in a way that subtrees of lesser total path length are nearer to the root.
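The parent-pointer surgery can be sketched on a toy trifurcation. This assumes the subtree path lengths already rank the children short to long as 3, 4, 5; the real method computes and sorts those lengths itself:

```python
import pandas as pd

# Toy SWC: node 2 is a trifurcation with children 3, 4, 5 (the root 1 is
# exempt in the real QC).
swc = pd.DataFrame({'parent': [-1, 1, 2, 2, 2]}, index=[1, 2, 3, 4, 5])

new_ind = max(swc.index)
for to_move in [3, 4]:                    # all but the longest subtree
    multi = swc.at[to_move, 'parent']     # the over-full branch node
    parent = swc.at[multi, 'parent']
    new_ind += 1
    swc.loc[new_ind] = swc.loc[multi].copy()   # insert a copy of it
    swc.at[new_ind, 'parent'] = parent
    swc.at[multi, 'parent'] = new_ind
    swc.at[to_move, 'parent'] = new_ind
```

Each pass lifts one short child onto a freshly inserted copy of the branch node, so after the loop every node has at most two children and the shorter subtrees sit nearer the root.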
73 def prune_by_len(self, len_thr=10): 74 """ 75 This function iteratively prune the tree. Every time it finds the short terminal branches and delete them, 76 and new terminal branches emerge, until no short branches detected. This way, the tree is ensured to be with 77 no terminal branches shorter than the threshold, but the main skeleton to the farthest reach is maintained. 78 79 :param len_thr: The min length allowed for terminal branches, default as 10 80 """ 81 82 while True: 83 segments = get_segments(self._swc) 84 drop_ind = [] 85 terminal_branch = np.unique([row['parentSeg'] for ind, row in segments.iterrows() if row['childSeg'] == []]) 86 for ind in terminal_branch: 87 # retain non-terminal branches 88 child_seg = [i for i in segments.at[ind, 'childSeg'] if segments.at[i, 'childSeg'] == []] 89 # get length for all segments 90 seg_len = [get_path_len(self._swc, segments.at[c, 'nodes']) for c in child_seg] 91 # all are terminal branches, then the longest branch must be retained 92 if child_seg == segments.at[ind, 'childSeg']: 93 k = np.argmax(seg_len) 94 child_seg.pop(k) 95 seg_len.pop(k) 96 # drop short branches 97 drop_ind.extend(chain.from_iterable(segments.at[c, 'nodes'] 98 for c, s in zip(child_seg, seg_len) if s <= len_thr)) 99 if not drop_ind: 100 break 101 self._swc.drop(index=drop_ind, inplace=True) 102 return self
This function iteratively prunes the tree. Each iteration finds the short terminal branches and deletes them, whereupon new terminal branches emerge, until no short branches are detected. This way, the tree is guaranteed to have no terminal branches shorter than the threshold, while the main skeleton to the farthest reach is maintained.
Parameters
- len_thr: The minimum length allowed for terminal branches, 10 by default.
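The prune-until-stable loop can be sketched on a parent table with unit-length edges, so a branch's "length" is simply its node count (the real code measures euclidean path length per segment); `prune_short_terminals` is a hypothetical helper:

```python
def prune_short_terminals(parent: dict, len_thr: int) -> dict:
    while True:
        tips = [n for n in parent if n not in set(parent.values())]
        branches = {}                       # branch point -> its branches
        for t in tips:
            nodes, n = [], t
            while True:                     # climb until a branch point/root
                nodes.append(n)
                p = parent[n]
                if p == -1 or sum(v == p for v in parent.values()) > 1:
                    break
                n = p
            branches.setdefault(p, []).append(nodes)
        drop = []
        for bp, blist in branches.items():
            heads = {b[-1] for b in blist}
            children = [k for k, v in parent.items() if v == bp]
            cand = sorted(blist, key=len)
            if set(children) == heads:      # all siblings terminal:
                cand = cand[:-1]            # spare the longest one
            drop += [n for b in cand if len(b) <= len_thr for n in b]
        if not drop:
            return parent
        for n in drop:
            del parent[n]

# root 1 - branch 2 with a long chain (3-4-5-6) and a short spur (7)
tree = prune_short_terminals({1: -1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 2}, 1)
```

The spur 7 is dropped in the first iteration; the surviving chain then becomes terminal itself, but as the longest (and only) branch it is spared, so the loop terminates.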
138class NeuronTree: 139 """Interface for generating binary trees and commit traversals for the whole SWC neuron tree. 140 See [the above guide](#neurontree) for a more detailed description of its principles. 141 142 With this interface, you can provide a set of nodes to its binary tree find function 143 `NeuronTree.find_binary_trees_in`. The method will take your input as the foreground, and do a backtracking and 144 merge to generate multiple trees (the set of nodes given can be disconnected). 145 146 For tree generation, it implements a tree building method `NeuronTree.build_a_tree` using `BinarySubtree` as units. 147 At last, the root `BinarySubtree` nodes of the built trees will be saved. 148 149 You can repeat this process multiple times and the interface will aggregate your results by default, 150 which is accessible as `NeuronTree.binary_trees`. You are free to use this property to traverse them with the 151 `BinaryTree` methods yourself or use `NeuronTree.collective_traversal` to get it at once. 152 """ 153 154 def __init__(self, swc: pd.DataFrame, tree_node_class=BinarySubtree): 155 """ 156 Initialize by assigning an SWC dataframe as a reference. 157 It will only use the xyz and parent fields, and will check if they exist at this stage. 158 Any change of the original copy of the dataframe is deprecated, as the initialization info would be outdated. 159 160 :param swc: An SWC dataframe, no modification takes place. 161 :param tree_node_class: The binary tree node class to use for building, as you can provide your own. 162 """ 163 164 self._df = swc[['x', 'y', 'z', 'parent']] 165 self._tree_class = tree_node_class 166 self._trees = list[tree_node_class]() 167 self._child_dict = get_child_dict(self._df) 168 169 @property 170 def binary_trees(self): 171 return self._trees 172 173 @safe_recursion_bootstrap 174 def build_a_tree(self, rt: int, within: set, stack: LifoQueue): 175 """Build a binary tree from a root using a specified tree node class. 
176 177 :param rt: The root to start with, and to proceed in recursion. 178 :param within: The range of node IDs to build trees from, children outside will be ignored. 179 :param stack: A new queue.LifoQueue to handle recursion in decorators. 180 :return: The root of the binary tree as the specified tree node class. 181 """ 182 children = [] 183 for i in self._child_dict[rt]: 184 if i in within: 185 children.append((yield self.build_a_tree(i, within, stack=stack))) 186 yield self._tree_class(rt, self._df.loc[rt, ['x', 'y', 'z']], children) 187 188 def find_binary_trees_in(self, index: Sequence[int], overwrite=False): 189 """Find the largest binary trees within a subset of nodes of an SWC. 190 191 With this method, you can build binary trees only upon the 'foreground' nodes in the SWC, so you can get 192 traversals for specific structures. 193 Naturally, if the nodes are not contiguous, there will be multiple binary trees. Here, the root of the 194 whole neuron will not be counted even if provided. 195 196 By merging any founded backtracking, it determines which tree each node belongs to, and ensures the trees 197 are as deep as they can be, working like a union-find. The backtracking starts from 'terminals', 198 the nodes without children in the set, rather than the real terminals of the SWC. 199 200 After backtracking, it invokes `NeuronTree.build_a_tree` to build the trees as linked nodes recursively. By 201 default, it uses `BinarySubtree` as the tree node class. You can change it during initialization. 202 203 This method will save the results in the property `NeuronTree.binary_trees`. It's a list and every time you 204 invoke the function, the new results will be appended, unless you specify to overwrite the whole list. 205 206 :param index: A sequence containing nodes including the subtrees to search. 207 :param overwrite: Whether to overwrite the existing trees or extend. 
208 """ 209 210 index = pd.Index(index) 211 child_count = self._df.loc[index, 'parent'].value_counts() 212 ends = index.difference(child_count.index) 213 sorted_nodes = {} 214 root_index = self._df.index[self._df['parent'] == -1] 215 for n in ends: 216 backtrack = [] 217 while n in index and n not in root_index: # root can't be included in a binary tree, must stop 218 backtrack.append(n) 219 n = self._df.loc[n, 'parent'] 220 for root, tree in sorted_nodes.items(): 221 if n in tree: 222 sorted_nodes[root] |= set(backtrack) 223 break 224 else: # when current subtrees contain none of backtrack, continue backtrack 225 continue 226 break # complete backtracking, break and start from next terminal 227 else: # execute when cur_id run out of nodes, i.e. meeting a new ending 228 sorted_nodes[backtrack[-1]] = set(backtrack) 229 230 trees = [self.build_a_tree(root, nodes, stack=LifoQueue()) for root, nodes in sorted_nodes.items()] 231 if overwrite: 232 self._trees = trees 233 else: 234 self._trees.extend(trees) 235 236 def collective_traversal(self, ordering='pre', lesser_first=True) -> tuple[int]: 237 """Obtain binary tree traversals of three orders for a list of trees and combine them into one by some priority. 238 239 You can specify which of pre, in and postorder traversal is performed, as well as whether the lesser subtrees 240 are traversed first than the greater. 241 This method will also chain the traversals of each binary tree into one, by the same order with the traversal in 242 each subtree. 243 244 :param ordering: The traversal type, either 'pre', 'in', or 'post'. 245 :param lesser_first: Whether traversing from lesser to greater, affect both within and among subtrees. 246 :return: A list of node IDs. 
247 """ 248 249 trees = sorted(self._trees, key=lambda x: x.tot_path_len(stack=LifoQueue()), reverse=not lesser_first) 250 if ordering == 'pre': 251 traversal = (t.preorder(lesser_first, stack=LifoQueue()) for t in trees) 252 elif ordering == 'in': 253 traversal = (t.inorder(lesser_first, stack=LifoQueue()) for t in trees) 254 elif ordering == 'post': 255 traversal = (t.postorder(lesser_first, stack=LifoQueue()) for t in trees) 256 else: 257 raise "Must specify an ordering among 'pre', 'in' and 'post'." 258 return sum(traversal, ())
Interface for generating binary trees and commit traversals for the whole SWC neuron tree. See the above guide for a more detailed description of its principles.
With this interface, you can provide a set of nodes to its binary tree finding method NeuronTree.find_binary_trees_in. The method takes your input as the foreground, and does a backtracking and merge to generate multiple trees (the set of nodes given can be disconnected).
For tree generation, it implements a tree building method NeuronTree.build_a_tree using BinarySubtree as units. At last, the root BinarySubtree nodes of the built trees are saved.
You can repeat this process multiple times, and the interface will aggregate your results by default, accessible as NeuronTree.binary_trees. You are free to use this property and traverse the trees with the BinarySubtree methods yourself, or use NeuronTree.collective_traversal to get everything at once.
154 def __init__(self, swc: pd.DataFrame, tree_node_class=BinarySubtree): 155 """ 156 Initialize by assigning an SWC dataframe as a reference. 157 It will only use the xyz and parent fields, and will check if they exist at this stage. 158 Any change of the original copy of the dataframe is deprecated, as the initialization info would be outdated. 159 160 :param swc: An SWC dataframe, no modification takes place. 161 :param tree_node_class: The binary tree node class to use for building, as you can provide your own. 162 """ 163 164 self._df = swc[['x', 'y', 'z', 'parent']] 165 self._tree_class = tree_node_class 166 self._trees = list[tree_node_class]() 167 self._child_dict = get_child_dict(self._df)
Initialize by assigning an SWC dataframe as a reference. It will only use the xyz and parent fields, and will check that they exist at this stage. Changing the original dataframe afterwards is discouraged, as the initialization info would become outdated.
Parameters
- swc: An SWC dataframe, no modification takes place.
- tree_node_class: The binary tree node class to use for building, as you can provide your own.
36 def wrapped_func(*args, stack: LifoQueue, **kwargs): 37 if not stack.empty(): # where next goes to, return the generator directly 38 return func(*args, stack=stack, **kwargs) 39 recur = func(*args, stack=stack, **kwargs) # initialized generator 40 while True: 41 if isinstance(recur, Generator): 42 stack.put(recur) 43 recur = next(recur) 44 else: # it's a number then, computation done for this branch 45 stack.get() 46 if stack.empty(): 47 break 48 recur = stack.queue[-1].send(recur) # send the result back to its parent 49 return recur
Build a binary tree from a root using a specified tree node class.
Parameters
- rt: The root to start with, and to proceed in recursion.
- within: The range of node IDs to build trees from; children outside this range will be ignored.
- stack: A new queue.LifoQueue to handle recursion in decorators.
Returns
The root of the binary tree as the specified tree node class.
188 def find_binary_trees_in(self, index: Sequence[int], overwrite=False): 189 """Find the largest binary trees within a subset of nodes of an SWC. 190 191 With this method, you can build binary trees only upon the 'foreground' nodes in the SWC, so you can get 192 traversals for specific structures. 193 Naturally, if the nodes are not contiguous, there will be multiple binary trees. Here, the root of the 194 whole neuron will not be counted even if provided. 195 196 By merging any founded backtracking, it determines which tree each node belongs to, and ensures the trees 197 are as deep as they can be, working like a union-find. The backtracking starts from 'terminals', 198 the nodes without children in the set, rather than the real terminals of the SWC. 199 200 After backtracking, it invokes `NeuronTree.build_a_tree` to build the trees as linked nodes recursively. By 201 default, it uses `BinarySubtree` as the tree node class. You can change it during initialization. 202 203 This method will save the results in the property `NeuronTree.binary_trees`. It's a list and every time you 204 invoke the function, the new results will be appended, unless you specify to overwrite the whole list. 205 206 :param index: A sequence containing nodes including the subtrees to search. 207 :param overwrite: Whether to overwrite the existing trees or extend. 
208 """ 209 210 index = pd.Index(index) 211 child_count = self._df.loc[index, 'parent'].value_counts() 212 ends = index.difference(child_count.index) 213 sorted_nodes = {} 214 root_index = self._df.index[self._df['parent'] == -1] 215 for n in ends: 216 backtrack = [] 217 while n in index and n not in root_index: # root can't be included in a binary tree, must stop 218 backtrack.append(n) 219 n = self._df.loc[n, 'parent'] 220 for root, tree in sorted_nodes.items(): 221 if n in tree: 222 sorted_nodes[root] |= set(backtrack) 223 break 224 else: # when current subtrees contain none of backtrack, continue backtrack 225 continue 226 break # complete backtracking, break and start from next terminal 227 else: # execute when cur_id run out of nodes, i.e. meeting a new ending 228 sorted_nodes[backtrack[-1]] = set(backtrack) 229 230 trees = [self.build_a_tree(root, nodes, stack=LifoQueue()) for root, nodes in sorted_nodes.items()] 231 if overwrite: 232 self._trees = trees 233 else: 234 self._trees.extend(trees)
Find the largest binary trees within a subset of nodes of an SWC.
With this method, you can build binary trees only upon the 'foreground' nodes in the SWC, so you can get traversals for specific structures. Naturally, if the nodes are not contiguous, there will be multiple binary trees. Here, the root of the whole neuron will not be counted even if provided.
By merging backtracking paths as they are found, it determines which tree each node belongs to and ensures the trees are as deep as they can be, working like a union-find. The backtracking starts from 'terminals', i.e. the nodes without children in the given set, rather than the real terminals of the SWC.
After backtracking, it invokes NeuronTree.build_a_tree to build the trees as linked nodes recursively. By default, it uses BinarySubtree as the tree node class; you can change this during initialization.
This method saves its results in the property NeuronTree.binary_trees. It's a list, and every time you invoke the function the new results are appended, unless you specify to overwrite the whole list.
Parameters
- index: A sequence of node IDs within which to search for subtrees.
- overwrite: Whether to overwrite the existing trees or extend.
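The backtracking-and-merge grouping can be sketched with a plain parent dict; the foreground selection below is made up, and the root-exclusion detail is simplified to a parent check:

```python
# Toy parent table and a hypothetical foreground selection (e.g. axon nodes).
parent = {1: -1, 2: 1, 3: 2, 4: 3, 5: 2, 6: 5, 7: 3, 8: 1, 9: 8}
index = {3, 4, 5, 6, 7, 8, 9}

fg_parents = {parent[n] for n in index}
ends = sorted(index - fg_parents)           # nodes with no foreground child
trees = {}                                  # tree root -> set of its nodes
for n in ends:
    backtrack = []
    while n in index and parent[n] != -1:   # stop at background or the root
        backtrack.append(n)
        n = parent[n]
        merged = False
        for root, nodes in trees.items():
            if n in nodes:                  # path touches a known tree
                nodes |= set(backtrack)
                merged = True
                break
        if merged:
            break
    else:                                   # ran off the set: a new tree
        trees[backtrack[-1]] = set(backtrack)
```

Here nodes 3-4-7 form one tree, 5-6 another, and 8-9 a third: siblings attached through the background node 2, or through the root 1, end up in separate trees.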
236 def collective_traversal(self, ordering='pre', lesser_first=True) -> tuple[int]: 237 """Obtain binary tree traversals of three orders for a list of trees and combine them into one by some priority. 238 239 You can specify which of pre, in and postorder traversal is performed, as well as whether the lesser subtrees 240 are traversed first than the greater. 241 This method will also chain the traversals of each binary tree into one, by the same order with the traversal in 242 each subtree. 243 244 :param ordering: The traversal type, either 'pre', 'in', or 'post'. 245 :param lesser_first: Whether traversing from lesser to greater, affect both within and among subtrees. 246 :return: A list of node IDs. 247 """ 248 249 trees = sorted(self._trees, key=lambda x: x.tot_path_len(stack=LifoQueue()), reverse=not lesser_first) 250 if ordering == 'pre': 251 traversal = (t.preorder(lesser_first, stack=LifoQueue()) for t in trees) 252 elif ordering == 'in': 253 traversal = (t.inorder(lesser_first, stack=LifoQueue()) for t in trees) 254 elif ordering == 'post': 255 traversal = (t.postorder(lesser_first, stack=LifoQueue()) for t in trees) 256 else: 257 raise ValueError("Must specify an ordering among 'pre', 'in' and 'post'.") 258 return sum(traversal, ())
Obtain binary tree traversals of three orders for a list of trees and combine them into one by some priority.
You can specify which of pre-, in-, and postorder traversal is performed, as well as whether the lesser subtrees are traversed before the greater. This method also chains the traversals of the individual binary trees into one, using the same priority as the traversal within each subtree.
Parameters
- ordering: The traversal type, either 'pre', 'in', or 'post'.
- lesser_first: Whether to traverse lesser subtrees before greater ones; affects ordering both within and among subtrees.
Returns
A tuple of node IDs.
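The three orders and the chaining can be sketched with a minimal stand-in for the real tree nodes; `Node` and `traverse` below are hypothetical, without the caching and recursion safety of BinarySubtree:

```python
# Children are ordered as lesser/greater instead of left/right, and each
# traversal returns a tuple of node IDs, as the real methods do.
class Node:
    def __init__(self, key, lesser=None, greater=None):
        self.key, self.lesser, self.greater = key, lesser, greater

def traverse(node, ordering='pre', lesser_first=True):
    if node is None:
        return ()
    a = traverse(node.lesser, ordering, lesser_first)
    b = traverse(node.greater, ordering, lesser_first)
    if not lesser_first:
        a, b = b, a                      # greater subtree first instead
    if ordering == 'pre':
        return (node.key,) + a + b
    if ordering == 'in':
        return a + (node.key,) + b
    if ordering == 'post':
        return a + b + (node.key,)
    raise ValueError("Must specify an ordering among 'pre', 'in' and 'post'.")

# two disjoint trees (assumed already sorted by weight) chained into one
trees = [Node(1, Node(2), Node(3)), Node(4, Node(5))]
collective = sum((traverse(t, 'pre') for t in trees), ())
```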
12class BinarySubtree: 13 """Binary tree implemented as linked nodes. 14 See [the above guide](#binarysubtree) for a more detailed description of its principles. 15 16 The reason why the tree is reconstructed instead of using the form of SWC comes from the benefit of node-level 17 caching and easy recursion design. 18 19 Its definition is still different from a general binary tree in some ways. It keeps an order of children by weight 20 rather, named as 'lesser' and 'greater' than left and right. When it comes to a linear connected node the child will 21 be lesser. 22 23 The weight is calculated as the total path length of subtrees, so 24 the determination of the subtree structure is necessary to build the parent. The construction of the tree has to be 25 recursive, and once it's completed, there should be no more modification. 26 27 This way, all properties obtained by recursion is definite, so we can apply caching on them, such as the total 28 path length itself, and traversal. 29 """ 30 31 def __init__(self, key: int, coord: Sequence[float], children: Sequence): 32 """Set up a tree node with already constructed children nodes. 33 34 :param key: The SWC node ID. 35 :param coord: The 3D coordinate of the node. 36 :param children: A sequence containing at most 2 already constructed child BinarySubtree. 
```python
class BinarySubtree:
    def __init__(self, key: int, coord: Sequence[float], children: Sequence):
        """Set up a tree node with already constructed child nodes.

        :param key: The SWC node ID.
        :param coord: The 3D coordinate of the node.
        :param children: A sequence containing at most 2 already constructed child BinarySubtree.
        """
        self._key = key
        self._coord = np.array(coord)
        assert len(children) <= 2
        self._lesser: Optional[BinarySubtree] = None
        self._greater: Optional[BinarySubtree] = None
        if len(children) == 2:
            if children[0].tot_path_len(stack=LifoQueue()) <= children[1].tot_path_len(stack=LifoQueue()):
                self._lesser = children[0]
                self._greater = children[1]
            else:
                self._lesser = children[1]
                self._greater = children[0]
        elif len(children) == 1:
            self._lesser = children[0]

    @property
    def key(self) -> int:
        """Node ID."""
        return self._key

    @property
    def coord(self) -> Sequence[float]:
        """Node coordinate."""
        return self._coord

    @property
    def lesser(self) -> 'BinarySubtree':
        """Child tree node of the subtree with lesser total path length."""
        return self._lesser

    @property
    def greater(self) -> 'BinarySubtree':
        """Child tree node of the subtree with greater total path length."""
        return self._greater

    @cache
    @safe_recursion_bootstrap
    def tot_path_len(self, stack: LifoQueue) -> float:
        """
        Recursive sum of path length of the subtree, implemented as recursion safe, and with a cache feature.
        It can be used as a normal recursion, but must be provided with a new stack of `LifoQueue`.

        Usually this value is already calculated and cached after the tree construction completes.

        :param stack: A newly initialized `LifoQueue` passed by keyword, will be passed down in the recursion.
        :return: Total path length of this subtree.
        """
        s = 0.0
        if self._lesser is not None:
            s += (yield self._lesser.tot_path_len(stack=stack)) + np.linalg.norm(self._coord - self._lesser.coord)
        if self._greater is not None:
            s += (yield self._greater.tot_path_len(stack=stack)) + np.linalg.norm(self._coord - self._greater.coord)
        yield s

    @cache
    @safe_recursion_bootstrap
    def preorder(self, lesser_first, stack: LifoQueue) -> tuple[int]:
        """
        Preorder traversal for this subtree, implemented as recursion safe, and with a cache feature.
        It can be used as a normal recursion, but must be provided with a new stack of `LifoQueue`.

        :param lesser_first: Whether to traverse the lesser subtree before the greater.
        :param stack: A newly initialized `LifoQueue` passed by keyword, will be passed down in the recursion.
        :return: A tuple of node IDs.
        """
        a = (yield self._lesser.preorder(lesser_first, stack=stack)) if self._lesser is not None else tuple()
        b = (yield self._greater.preorder(lesser_first, stack=stack)) if self._greater is not None else tuple()
        yield (self._key,) + a + b if lesser_first else (self._key,) + b + a

    @cache
    @safe_recursion_bootstrap
    def inorder(self, lesser_first, stack: LifoQueue) -> tuple[int]:
        """
        Inorder traversal for this subtree, implemented as recursion safe, and with a cache feature.
        It can be used as a normal recursion, but must be provided with a new stack of `LifoQueue`.

        :param lesser_first: Whether to traverse the lesser subtree before the greater.
        :param stack: A newly initialized `LifoQueue` passed by keyword, will be passed down in the recursion.
        :return: A tuple of node IDs.
        """
        a = (yield self._lesser.inorder(lesser_first, stack=stack)) if self._lesser is not None else tuple()
        b = (yield self._greater.inorder(lesser_first, stack=stack)) if self._greater is not None else tuple()
        yield a + (self._key,) + b if lesser_first else b + (self._key,) + a

    @cache
    @safe_recursion_bootstrap
    def postorder(self, lesser_first, stack: LifoQueue) -> tuple[int]:
        """
        Postorder traversal for this subtree, implemented as recursion safe, and with a cache feature.
        It can be used as a normal recursion, but must be provided with a new stack of `LifoQueue`.

        :param lesser_first: Whether to traverse the lesser subtree before the greater.
        :param stack: A newly initialized `LifoQueue` passed by keyword, will be passed down in the recursion.
        :return: A tuple of node IDs.
        """
        a = (yield self._lesser.postorder(lesser_first, stack=stack)) if self._lesser is not None else tuple()
        b = (yield self._greater.postorder(lesser_first, stack=stack)) if self._greater is not None else tuple()
        yield a + b + (self._key,) if lesser_first else b + a + (self._key,)
```
Binary tree implemented as linked nodes. See the guide above for a more detailed description of its principles.

The tree is reconstructed rather than kept in SWC form because linked nodes allow node-level caching and a straightforward recursive design.

Its definition still differs from a general binary tree in some ways. Children are ordered by weight and named 'lesser' and 'greater' rather than left and right. A node with a single child (a linear connection) stores that child as the lesser subtree.

The weight is the total path length of the subtree, so a subtree's structure must be determined before its parent can be built. Construction is therefore recursive from the leaves up, and once it completes there should be no further modification.

This way, every property obtained by recursion is deterministic, so such results can be cached, including the total path length itself and the traversals.
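The ordering and construction rules above can be illustrated with a minimal, self-contained sketch. `MiniSubtree` below is a hypothetical simplification, not the package's class: it uses plain recursion and no caching, but applies the same lesser/greater rule based on total path length.

```python
import numpy as np

class MiniSubtree:
    """Simplified sketch of the lesser/greater ordering (hypothetical class)."""

    def __init__(self, key, coord, children=()):
        assert len(children) <= 2
        self.key = key
        self.coord = np.asarray(coord, dtype=float)
        self.lesser = self.greater = None
        if len(children) == 2:
            # the subtree with greater total path length becomes 'greater'
            a, b = children
            self.lesser, self.greater = (a, b) if a.tot_path_len() <= b.tot_path_len() else (b, a)
        elif len(children) == 1:
            self.lesser = children[0]  # a single child is always the lesser

    def tot_path_len(self):
        # sum of edge lengths over the whole subtree
        s = 0.0
        for child in (self.lesser, self.greater):
            if child is not None:
                s += child.tot_path_len() + np.linalg.norm(self.coord - child.coord)
        return s

    def preorder(self, lesser_first=True):
        a = self.lesser.preorder(lesser_first) if self.lesser else ()
        b = self.greater.preorder(lesser_first) if self.greater else ()
        return (self.key,) + a + b if lesser_first else (self.key,) + b + a

# construction is bottom-up: children must exist before the parent
tip = MiniSubtree(5, (0, 9, 0))
mid = MiniSubtree(4, (0, 5, 0), (tip,))
leaf = MiniSubtree(3, (0, 0, 2))
root = MiniSubtree(1, (0, 0, 0), (leaf, mid))
print(root.greater.key)     # the longer branch through node 4 is 'greater'
print(root.preorder())      # (1, 3, 4, 5)
print(root.tot_path_len())  # 2 + (5 + 4) = 11.0
```

Once built this way, the structure is never modified, which is what makes caching the recursive results safe in the real implementation.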
```python
def __init__(self, key: int, coord: Sequence[float], children: Sequence):
    """Set up a tree node with already constructed children nodes.

    :param key: The SWC node ID.
    :param coord: The 3D coordinate of the node.
    :param children: A sequence containing at most 2 already constructed child BinarySubtree.
    """
    self._key = key
    self._coord = np.array(coord)
    assert len(children) <= 2
    self._lesser: Optional[BinarySubtree] = None
    self._greater: Optional[BinarySubtree] = None
    if len(children) == 2:
        if children[0].tot_path_len(stack=LifoQueue()) <= children[1].tot_path_len(stack=LifoQueue()):
            self._lesser = children[0]
            self._greater = children[1]
        else:
            self._lesser = children[1]
            self._greater = children[0]
    elif len(children) == 1:
        self._lesser = children[0]
```
Set up a tree node with already constructed child nodes.

Parameters
- key: The SWC node ID.
- coord: The 3D coordinate of the node.
- children: A sequence containing at most 2 already constructed child `BinarySubtree` nodes.
greater: Child tree node of the subtree with greater total path length.
```python
def wrapped_func(*args, stack: LifoQueue, **kwargs):
    if not stack.empty():  # where next goes to, return the generator directly
        return func(*args, stack=stack, **kwargs)
    recur = func(*args, stack=stack, **kwargs)  # initialized generator
    while True:
        if isinstance(recur, Generator):
            stack.put(recur)
            recur = next(recur)
        else:  # it's a number then, computation done for this branch
            stack.get()
            if stack.empty():
                break
            recur = stack.queue[-1].send(recur)  # send the result back to its parent
    return recur
```
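The wrapper above is a trampoline: the decorated method is written as a generator that `yield`s sub-calls instead of recursing, and the wrapper drives those generators with an explicit `LifoQueue` so the Python call stack never grows. The sketch below exercises the same trampoline standalone; `countdown` is a toy stand-in for the tree methods (any name and recursion here is illustrative, not from the package).

```python
from collections.abc import Generator
from functools import wraps
from queue import LifoQueue

def safe_recursion_bootstrap(func):
    # same trampoline as shown above, wrapped as a decorator
    @wraps(func)
    def wrapped_func(*args, stack: LifoQueue, **kwargs):
        if not stack.empty():  # inner call: hand the raw generator back
            return func(*args, stack=stack, **kwargs)
        recur = func(*args, stack=stack, **kwargs)
        while True:
            if isinstance(recur, Generator):
                stack.put(recur)
                recur = next(recur)
            else:  # a plain value: this branch is done
                stack.get()
                if stack.empty():
                    break
                recur = stack.queue[-1].send(recur)  # resume the parent
        return recur
    return wrapped_func

@safe_recursion_bootstrap
def countdown(n, stack: LifoQueue):
    # generator-style recursion: `yield` where a plain function would `return`
    if n == 0:
        yield 0
    else:
        yield (yield countdown(n - 1, stack=stack)) + 1

# runs far deeper than Python's default recursion limit of ~1000 frames
print(countdown(10000, stack=LifoQueue()))
```

Each top-level call must receive a freshly created `LifoQueue` by keyword; the same stack object is then threaded through all inner calls, which is why the tree methods pass `stack=stack` downward.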
Recursive sum of path length of the subtree, implemented as recursion safe, and with a cache feature. It can be used as a normal recursion, but must be provided with a new stack of `LifoQueue`.

Usually this value is already calculated and cached after the tree construction completes.

Parameters
- stack: A newly initialized `LifoQueue` passed by keyword, will be passed down in the recursion.

Returns
Total path length of this subtree.
Preorder traversal for this subtree, implemented as recursion safe, and with a cache feature. It can be used as a normal recursion, but must be provided with a new stack of `LifoQueue`.

Parameters
- lesser_first: Whether to traverse the lesser subtree before the greater.
- stack: A newly initialized `LifoQueue` passed by keyword, will be passed down in the recursion.

Returns
A tuple of node IDs.
Inorder traversal for this subtree, implemented as recursion safe, and with a cache feature. It can be used as a normal recursion, but must be provided with a new stack of `LifoQueue`.

Parameters
- lesser_first: Whether to traverse the lesser subtree before the greater.
- stack: A newly initialized `LifoQueue` passed by keyword, will be passed down in the recursion.

Returns
A tuple of node IDs.
Postorder traversal for this subtree, implemented as recursion safe, and with a cache feature. It can be used as a normal recursion, but must be provided with a new stack of `LifoQueue`.

Parameters
- lesser_first: Whether to traverse the lesser subtree before the greater.
- stack: A newly initialized `LifoQueue` passed by keyword, will be passed down in the recursion.

Returns
A tuple of node IDs.
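The relationship between the three traversals, and the effect of the `lesser_first` flag, can be sketched with plain recursion on a minimal node type (`Node` and the free functions below are hypothetical illustrations, not the package's API):

```python
class Node:
    """Minimal binary node with lesser/greater children (illustrative only)."""
    def __init__(self, key, lesser=None, greater=None):
        self.key, self.lesser, self.greater = key, lesser, greater

def preorder(n, lesser_first=True):
    if n is None:
        return ()
    a, b = preorder(n.lesser, lesser_first), preorder(n.greater, lesser_first)
    return (n.key,) + (a + b if lesser_first else b + a)  # node first

def inorder(n, lesser_first=True):
    if n is None:
        return ()
    a, b = inorder(n.lesser, lesser_first), inorder(n.greater, lesser_first)
    return a + (n.key,) + b if lesser_first else b + (n.key,) + a  # node between

def postorder(n, lesser_first=True):
    if n is None:
        return ()
    a, b = postorder(n.lesser, lesser_first), postorder(n.greater, lesser_first)
    return (a + b if lesser_first else b + a) + (n.key,)  # node last

# node 2 has a single child, stored as its lesser subtree
root = Node(1, Node(2, Node(4)), Node(3))
print(preorder(root))   # (1, 2, 4, 3)
print(inorder(root))    # (4, 2, 1, 3)
print(postorder(root))  # (4, 2, 3, 1)
print(preorder(root, lesser_first=False))  # (1, 3, 2, 4)
```

As noted in the guide above, the pre+inorder or post+inorder pair identifies the tree uniquely, while flipping `lesser_first` simply mirrors the role of the two children in every sequence.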