dsmtools.preprocessing

A major feature of this package, and a prerequisite for running a neural network on neuron morphology data, is converting SWC files into sequences of quantifiable features. This module provides classes to:

  • perform quality control on a batch of SWC dataframes to make sure the structures can be converted to sequences.
  • convert a batch of SWC files into sequences of tree-node traversals annotated with useful features.

The classes under this module also use multiprocessing to speed things up, as the tasks are highly homogeneous.

Traversal of Binary Tree

A binary tree can be traversed in preorder, inorder, or postorder, which is the simplest way of converting a tree structure into a sequence. Here, we don't include the breadth- and depth-first methods that produce more diverse results for general tree structures.

How do you get each of them? As a simple explanation, when you arrive at a tree node, whose children are typically recognized as the roots of its subtrees, do the following steps recursively:

|Order|Step 1|Step 2|Step 3|
|----|----|----|----|
|pre|output the current node|traverse the left subtree|traverse the right subtree|
|in|traverse the left subtree|output the current node|traverse the right subtree|
|post|traverse the left subtree|traverse the right subtree|output the current node|

This workflow is executed from the root to the leaves, at every node of the tree. Since a tree and its subtrees share the same structure, it is usually implemented with recursion.
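
To make the three orders concrete, here is a minimal recursive sketch; the `Node` class is a throwaway illustration, not part of this package:

```python
# A throwaway binary tree node for demonstration (not a dsmtools class).
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def traverse(node, order='pre'):
    """Return the keys of a binary tree in 'pre', 'in', or 'post' order."""
    if node is None:
        return []
    left, right = traverse(node.left, order), traverse(node.right, order)
    if order == 'pre':
        return [node.key] + left + right
    if order == 'in':
        return left + [node.key] + right
    return left + right + [node.key]  # 'post'

#       1
#      / \
#     2   3
#    / \
#   4   5
root = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(traverse(root, 'pre'))   # [1, 2, 4, 5, 3]
print(traverse(root, 'in'))    # [4, 2, 5, 1, 3]
print(traverse(root, 'post'))  # [4, 5, 2, 3, 1]
```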

Any binary tree can be uniquely represented by either the pre+inorder or the post+inorder traversal pair. A single traversal of any kind, however, can be produced by many different binary trees, the number of which is given by the Catalan number.

Which subtree counts as left and which as right is decided by the builder of the tree. They are essentially interchangeable, so you can simply call the one you start with the left.

A full explanation of tree traversal, if you are interested.

As you may have noticed, these 3 orders of traversal make sense only for binary trees. A node with more than 2 children gives more combinations of where to put the current node in the sequence. So how do we deal with neuron tree structures, which can have nodes with many children, especially at the soma? This is answered in the following section.

Neuron Tree to Sequence

To make a neuron tree traversable, some preprocessing steps have to be done. In quality control, the program detects any node with more than 2 children, except for the root node (soma), and turns it into a series of bifurcations by a specific priority (further explained in the QC section).

Now, the whole neuron tree is still not binary, but all of its subtrees are, so we can reconstruct multiple binary trees from subsets of nodes excluding the root. After getting the traversal sequence for each tree, the program chains the sequences into one by a specific priority. You can also interpret this as an extended definition of traversal:

|Order|Step 1|Step 2|Step 3|...|Step n|
|----|----|----|----|----|----|
|extended pre|parent|child 1|child 2|...|child n-1|
|extended post|child 1|child 2|child 3|...|parent|
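
The extended preorder can be sketched as follows, ordering subtrees by their total path length ("weight"), lesser first; the data structures here are hypothetical stand-ins, not the package's internals:

```python
# Hypothetical n-ary tree node for demonstration (not a dsmtools class).
class NAryNode:
    def __init__(self, key, edge_len=0.0, children=None):
        self.key = key
        self.edge_len = edge_len        # length of the edge to the parent
        self.children = children or []

def weight(node):
    """Total path length of the subtree rooted at this node."""
    return node.edge_len + sum(weight(c) for c in node.children)

def extended_preorder(node, lesser_first=True):
    """Output the parent first, then each child's subtree, ordered by weight."""
    order = [node.key]
    for child in sorted(node.children, key=weight, reverse=not lesser_first):
        order += extended_preorder(child, lesser_first)
    return order

root = NAryNode('soma', 0.0, [
    NAryNode('a', 5.0, [NAryNode('a1', 2.0), NAryNode('a2', 7.0)]),
    NAryNode('b', 3.0),
])
# subtree 'b' weighs 3, subtree 'a' weighs 5 + 2 + 7 = 14, so 'b' goes first
print(extended_preorder(root))  # ['soma', 'b', 'a', 'a1', 'a2']
```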

You may still wonder:

  1. What does the specific priority mean?
    By the total path length of the subtree, which you can take as its weight. Every time we run into a node, we compare the weights of its 2 subtrees and decide whether the lesser (default) or the greater is traversed first. The same is true for the final chaining of sequences, and we use the same priority when converting multifurcating branch nodes to binary ones.

  2. What kind of traversal order are you using?
    Though we provide the API for all 3 orders, we only use preorder for our dataset generation. A preorder sequence can be mapped to multiple different tree structures. That said, we found that one order is quite enough for information embedding in our application, and there is no significant difference between them.

  3. Now that we need only 1 traversal, why must the trees be binary?
    As you may have found, as long as the tree is traversed by a determined priority and only 1 order is enough, whether the tree is binary no longer matters. That's true, but we still want this package to give you the chance to try the complete sequence representation of a tree for your own application. Another reason is that bifurcations make up the majority of neuron tree branch nodes, and bifurcations are what we are more interested in for analysis.

  4. How many binary trees are there?
    It depends on the number of stems your neuron has. A stem is a branch emerging from the root, so one binary tree corresponds to one stem. However, our program adopts a more flexible strategy to build binary trees, and you can actually get more than that. By default, we build trees for the axon and the dendrite separately. For neurons whose axon emerges from the dendrite rather than from a stem, this makes a difference, in that the axon would otherwise be counted as part of a dendrite binary tree. The Customization section has a more instructive guide.

Reasons for Quality Control

Before converting your SWC files to our sequences, some prerequisites should be met. They are:

  1. 1 tree for 1 SWC, i.e. no more than 1 root node in an SWC file.
  2. Short segments are removed.
  3. All non-root branching nodes are binary.

remove multiple trees

As a usual practice, a single SWC file contains only a single neuron tree, but this may not hold for some automatic tracing algorithms and erroneous tracings.

Although our final transformation of a neuron tree to sequences doesn't require that there be only one connected component, some processing steps still rely on this assumption, so if you need all the unconnected components, we suggest separating them beforehand.

Otherwise, our program will mark the first root in the table and retain the component it's connected to.
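
A sketch of that retention step on a bare parent table (a hypothetical stand-in for the SWC dataframe the package actually processes):

```python
# Keep only the component connected to the first root (parent == -1).
def retain_first_root(parent: dict) -> dict:
    roots = [n for n, p in parent.items() if p == -1]
    first = roots[0]
    # build children lists, then collect the first root's component
    children = {}
    for n, p in parent.items():
        children.setdefault(p, []).append(n)
    keep, stack = set(), [first]
    while stack:
        n = stack.pop()
        keep.add(n)
        stack.extend(children.get(n, []))
    return {n: p for n, p in parent.items() if n in keep}

# two trees in one table: 1-2-3 (rooted at 1) and 4-5 (rooted at 4)
parent = {1: -1, 2: 1, 3: 2, 4: -1, 5: 4}
print(sorted(retain_first_root(parent)))  # [1, 2, 3]
```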

guarantee bifurcation

As described above, nodes with many children allow multiple traversal choices. Although we don't really need a binary tree to get a pre/postorder sequence under the extended definition, the nature of neuron branching and the unique representation of binary trees encourage us to assume it.

As already suggested, this QC step uses the subtree total path length as the branching priority. To be exact, a subtree branch of lesser scale is adjusted closer to the root. This way, if you traverse the tree from the lesser to the greater, the adjusted lesser part will always be visited first, equivalent to the way it would be visited without the adjustment; the same holds if you start from the greater. Yet we still implement the traversal unit as binary, so the adjustment is necessary.
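
A sketch of this adjustment on a dict-based tree (a hypothetical stand-in; the package's `SWCQualityControl` works on SWC dataframes):

```python
from itertools import count

def weight(t):
    """Total path length of a subtree (edge to parent + descendants)."""
    return t['len'] + sum(weight(c) for c in t['children'])

def degrade_to_bifurcation(t, new_key=None):
    """Recursively turn multifurcations into chains of bifurcations,
    keeping lesser-weight subtrees closer to the root."""
    if new_key is None:
        new_key = count(1000)  # ids for inserted zero-length nodes
    kids = sorted((degrade_to_bifurcation(c, new_key) for c in t['children']),
                  key=weight)
    while len(kids) > 2:
        # wrap the two heaviest subtrees under a new zero-length node;
        # the wrapper outweighs all remaining kids, so ascending order holds
        heavy = kids.pop()
        second = kids.pop()
        kids.append({'key': next(new_key), 'len': 0.0,
                     'children': [second, heavy]})
    return dict(t, children=kids)

trifurcation = {'key': 'n', 'len': 1.0, 'children': [
    {'key': 'a', 'len': 1.0, 'children': []},
    {'key': 'b', 'len': 2.0, 'children': []},
    {'key': 'c', 'len': 4.0, 'children': []},
]}
fixed = degrade_to_bifurcation(trifurcation)
# 'n' now has 2 children: 'a' (lesser), and a new node wrapping 'b' and 'c'
```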

Together with this adjustment, the tree nodes are sorted so that node IDs increase from the root to the leaves and the maximum node ID equals the tree size. This step is useful because new nodes have to be assigned during the adjustment, and the final result will be sorted again. As this sort_swc function doesn't connect tree components (unlike the Vaa3D plugin sort_neuron_swc), it requires the input SWC to have only 1 root, which is why the first QC step is necessary.

prune short segments

We found the main skeleton of a neuron is sufficient for encoding the tree structure, and removing short segments reduces the computational load. The program prunes the tree iteratively until no segment is shorter than a given threshold. The pruning won't cause a tree to 'shrink', i.e. as long as a path is longer than the threshold, it won't be affected at all. You can see this pruning as removing branches that look like one. Its logic is identical to that of the Vaa3D plugin sort_neuron_swc.
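
A simplified sketch of such iterative pruning on a parent/length table (hypothetical structures; the real implementation operates on SWC dataframes):

```python
def prune_short_segments(parent, length, thr):
    """Iteratively drop terminal segments whose path length is below thr.
    `parent` maps node -> parent id (-1 for root); `length` maps node ->
    length of the edge to its parent."""
    parent, length = dict(parent), dict(length)
    changed = True
    while changed:
        changed = False
        children = {}
        for n, p in parent.items():
            children.setdefault(p, []).append(n)
        for tip in [n for n in parent if n not in children]:  # terminal nodes
            seg, total, n = [tip], length[tip], parent[tip]
            # walk up through pass-through nodes to the nearest branch or root
            while n != -1 and len(children[n]) == 1:
                seg.append(n)
                total += length[n]
                n = parent[n]
            if total < thr:
                for m in seg:
                    del parent[m], length[m]
                changed = True
                break  # restart with a fresh children map
    return parent

# main path 1-2-3-4 (edges of length 5) with a short spur 5 hanging off node 2
parent = {1: -1, 2: 1, 3: 2, 4: 3, 5: 2}
length = {1: 0.0, 2: 5.0, 3: 5.0, 4: 5.0, 5: 1.0}
print(sorted(prune_short_segments(parent, length, 3.0)))  # [1, 2, 3, 4]
```

Note how the long path is left untouched: only the spur, whose whole segment is shorter than the threshold, is removed.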

Sequence Customization

The data preparation module designed for our deep learning models can be easily understood and modified. You are encouraged to learn from the processing code and use its infrastructure to customize subtree sequences, in terms of not only different traversal orders or their combinations, but also the node types.

There is a trend of disintegrating the neuron into multiple subunits by some algorithm and quantifying their features, going further than the conventional axon-dendrite dichotomy. We provide the function for this very dichotomy, but you can follow it to try your own partition of neuron nodes. Given any set of nodes, even if they are unconnected or without terminals, the program can find binary trees for you.

Aside from QC, three classes are designed for data preparation: NeuronSequenceDataset, NeuronTree and BinarySubtree.

NeuronSequenceDataset

NeuronSequenceDataset is a high-level pipeline execution interface utilizing parallel computation. It commits data loading, quality control and traversal serially, but each step runs in parallel across files.

As a good template to start with, it implements parallel computation by pairing a host processing function (ending with _parallel) with its unit processing function (ending with _proc), for all three steps. In most cases, where you are satisfied with the multiprocessing design, you only need to inherit the class and override the unit functions, and let the hosts invoke them for you.

In the traversal step, which you might be most interested in customizing, the high-level class calls a child process to execute binary tree reconstruction, traversal, and retrieval of spatial features indexed by the traversal within one SWC, which is where NeuronTree comes into play.

NeuronTree

NeuronTree is an SWC-level interface to find binary subtrees by sets of nodes and commit traversal upon them. After initialization, you can provide the interface with a set of nodes to find subtrees.

The number of trees you can get for a given node set depends on the connectivity. If you specify all the nodes of a neuron tree, as the root is always ignored, it will return one subtree per stem. When you have multiple sets of nodes, like axon & dendrite, you can invoke this discovery process multiple times, once per set. E.g., you can find axon trees first, and the class will store the found trees; then you invoke it for the dendrite, and this time the dendrite trees will be added to the storage. The ultimate number of subtrees can be larger than that of the node sets.

After the discovery process, you can use this interface to traverse all subtrees by pre, in, or postorder in a single shot. Here, you can also specify whether to go from lesser to greater subtrees or the other way around. If you'd like a more customized traversal, you can retrieve the binary trees, traverse each of them yourself, and assemble the results as you like.

This class only allows SWC with bifurcations, except at the root, as it uses BinarySubtree for tree building.

BinarySubtree

BinarySubtree is the basic class that undertakes recursive searching of nodes, total subtree path length, and traversal. Its instances are basically tree nodes pointing to their children, so the built subtrees exposed by NeuronTree are actually their roots.

All the recursion functions are decorated to cache data and to prevent the stack overflow caused by Python's recursion limit. Note that the caching applies to every node and can cause bugs if you modify the tree structure yourself, so you should keep the tree as NeuronTree builds it.

If you'd like other stats of the binary subtree, you can write your own recursion functions. Please follow the instructions of dsmtools.utils.misc.safe_recursion_bootstrap if you want to bypass the stack problem. Caching can only be used by a tree node member function, so ideally you would inherit BinarySubtree, but building the trees would then require refactoring NeuronTree.


Below are the class API documentations of the above classes for you to learn more about their usage and implementation. Have fun!

"""
.. include:: preprocessing_guide.md
"""

from .swc_qc import SWCQualityControl
from .sequencing import NeuronSequenceDataset
from .neuron_tree import BinarySubtree, NeuronTree


__all__ = ['NeuronSequenceDataset', 'SWCQualityControl', 'NeuronTree', 'BinarySubtree']

class NeuronSequenceDataset:
    """
    An interface for generating a dataset from multiple SWC files, to be further converted into a dataset that can
    be used by our deep learning models. See [the above guide](#neuronsequencedataset) for a more detailed
    description of its principles.

    The default usage is: load SWC files with the paths given during initialization, then operate a quality
    control defined by `SWCQualityControl`, and then
    find binary trees in the neuron tree and commit traversals to get the sequences. These steps are defined as
    member functions and should be invoked in this order.

    All the above procedures are coded as a host function paired with its unit function that can be called in
    parallel for different SWC dataframes. You'll find 3 pairs of them, including:

    |Host|Unit|
    |----|----|
    |read_parallel|read_proc|
    |qc_parallel|qc_proc|
    |make_sequence_parallel|make_sequence_proc|

    See their own function documentation for more info.

    If you need to customize some or all of the sequence conversion, you can inherit this class
    and override its functions. Overriding only the unit functions will not change how the hosts invoke them, and
    will maintain the multiprocessing feature.

    The result of this class is multiple dataframes in an OrderedDict indexed by paths, which can be retrieved
    as a property or saved as a pickle. This result can be further fed
    to the deep learning model classes in the `dsmtools.modeling` module. Here, you also have access to
    the SWC dataframes after QC as a member property (also an OrderedDict).

    **Note**: Do not modify the retrieved property data, as that can cause bugs. Deep copy them if you need to.
    """

    def __init__(self,
                 swc_paths: Iterable[Union[str, 'PathLike[str]']],
                 show_failure=True,
                 debug=False,
                 chunk_size=1,
                 jobs=1):
        """Set up the class, specifying common processing options.

        :param swc_paths: An iterable of file paths to the SWC files to load. Note they will be forced as strings.
        :param show_failure: Whether to print failed SWC files during processing.
        :param debug: Whether to print traceback messages when processes fail.
        :param chunk_size: The chunk size for parallel processing.
        :param jobs: Number of processes for parallel computation.
        """

        self._swc_paths: tuple[str] = tuple(str(i) for i in swc_paths)
        self._swc_raw: Optional[OrderedDict[str, pd.DataFrame]] = None
        self._swc_qc: Optional[OrderedDict[str, pd.DataFrame]] = None
        self._sequences: Optional[OrderedDict[str, tuple[pd.DataFrame, str]]] = None
        self.chunk_size = chunk_size
        self.jobs = jobs
        self.show_failure = show_failure
        self.debug = debug

    def read_parallel(self):
        """
        Read all the given SWC files and store them as an OrderedDict of dataframes within this object.
        The multiprocessing parameters are given by the class initialization.
        """

        with Pool(self.jobs) as p:
            res = p.map(self.read_proc, self._swc_paths, chunksize=self.chunk_size)
            res = filter(lambda i: i[1] is not None, zip(self._swc_paths, res))
            self._swc_raw = OrderedDict(res)
        print(f'Successfully loaded {len(self._swc_raw)} SWC files.')

    def read_proc(self, path: str) -> Optional[pd.DataFrame]:
        """
        Loading of a single SWC file, with a try-except mechanism to prevent interruption.
        Failed processes will print a message and return None.

        :param path: Path to the SWC file.
        :return: An SWC dataframe, or None if failed with some error.
        """

        try:
            return utils.swc.read_swc(path)
        except Exception:
            if self.show_failure:
                if self.debug:
                    traceback.print_exc()
                print(f'Some error occurred when reading SWC: {path}, better check your file.')
            return None

    def qc_parallel(self, qc_len_thr=10, min_node_count=10):
        """
        Commit quality control on all the loaded SWC dataframes. This operation is not inplace
        and sets up a new OrderedDict, which can be referenced by the property `NeuronSequenceDataset.swc_after_qc`.
        Here, you can specify QC parameters passed to the unit processing function.

        To prevent empty dataframes, this function applies a final filter on the QC results, the parameter of
        which is also customizable.

        Before running this function, it will check if SWC files are loaded. If not, it will invoke
        `NeuronSequenceDataset.read_parallel`.

        The multiprocessing parameters are given by the class initialization.

        :param qc_len_thr: The pruning length threshold for quality control.
        :param min_node_count: The minimum allowed count of nodes for the final SWC.
        """

        if self._swc_raw is None:
            warn("No SWC, running read_parallel..")
            self.read_parallel()
        qc_proc = partial(self.qc_proc, qc_len_thr=qc_len_thr)
        with Pool(self.jobs) as p:
            res = p.map(qc_proc, self._swc_raw.keys(), chunksize=self.chunk_size)
            res = filter(lambda i: i[1] is not None and len(i[1]) >= min_node_count, zip(self._swc_raw.keys(), res))
            self._swc_qc = OrderedDict(res)
        print(f'Successfully quality-controlled {len(self._swc_qc)} SWC files.')

    def qc_proc(self, key: str, qc_len_thr: float) -> Optional[pd.DataFrame]:
        """
        Quality control of a single SWC file, with a try-except mechanism to prevent interruption.
        Failed processes will print a message and return None.

        It serially commits retaining the tree of the first detected root, degrading multifurcations to
        bifurcations, and removing short segments. It's advised that this order is maintained.

        :param key: The key of the loaded SWC, should be a string path.
        :param qc_len_thr: The pruning length threshold for quality control.
        :return: An SWC dataframe, or None if failed with some error.
        """

        try:
            swc = self._swc_raw[key].copy(deep=True)
            SWCQualityControl(swc).retain_only_1st_root().degrade_to_bifurcation().prune_by_len(qc_len_thr)
            return swc
        except Exception:
            if self.show_failure:
                if self.debug:
                    traceback.print_exc()
                print(f'Some error occurred during quality control with SWC: {key}.')
            return None

    def make_sequence_parallel(self, ordering='pre', lesser_first=True):
        """Convert all neuron trees passing previous procedures to sequences. This operation is not inplace
        and sets up a new OrderedDict, which can be referenced by the property `NeuronSequenceDataset.result`.
        Here, you can specify some traversal parameters passed to the unit processing function.

        Before running this function, it will check if quality control is done. If not, it will invoke
        `NeuronSequenceDataset.qc_parallel`.

        The multiprocessing parameters are given by the class initialization.

        :param ordering: The traversal type, either 'pre', 'in', or 'post'.
        :param lesser_first: Whether to traverse from lesser to greater, affecting orders both within and among subtrees.
        """

        if self._swc_qc is None:
            warn("No QC SWC, running qc_parallel with default parameters..")
            self.qc_parallel()
        assert ordering in ['pre', 'in', 'post']
        make_seq = partial(self.make_sequence_proc, ordering=ordering, lesser_first=lesser_first)
        with Pool(self.jobs) as p:
            res = p.map(make_seq, self._swc_qc, chunksize=self.chunk_size)
            res = filter(lambda i: i[1] is not None, zip(self._swc_qc.keys(), res))
            self._sequences = OrderedDict(res)
        print(f'Successfully made {len(self._sequences)} sequences.')

    def make_sequence_proc(self, key: str, ordering: str, lesser_first: bool):
        """Convert an SWC dataframe to a feature dataframe indexed by a traversal sequence.

        It uses `NeuronTree` to build binary trees separately for axon and dendrite from an SWC, and traverses the trees
        with the parameters specified, which by default is preorder traversal with lesser subtrees prioritized.

        The final dataframe is similar to an SWC, but different in that it carries some additional features and is
        indexed by the traversal sequence.

        The added features are CCFv3 region abbreviations retrieved by the `dsmtools.utils.ccf_atlas` module (in column
        'region'), and topological annotations (in column 'node_type') of four types: GBTR,
        representing general, branch, terminal, and root.

        :param key: The key of the loaded SWC, should be a string path.
        :param ordering: The traversal type, either 'pre', 'in', or 'post'.
        :param lesser_first: Whether to traverse from lesser to greater, affecting orders both within and among subtrees.
        :return: A feature dataframe indexed by the traversal, or None if failed with some error.
        """
        try:
            # deep copy the QC dataframe before adding new columns
            swc = self._swc_qc[key].copy(deep=True)
            nt = NeuronTree(swc)
            axon_index = swc.index[swc['type'] == utils.swc.Type.AXON]
            dendrite_index = swc.index[swc['type'].isin([utils.swc.Type.APICAL, utils.swc.Type.BASAL])]
            nt.find_binary_trees_in(axon_index)
            nt.find_binary_trees_in(dendrite_index)
            traversal = nt.collective_traversal(ordering=ordering, lesser_first=lesser_first)
            # node type, topological annotation
            root = swc.index[swc['parent'] == -1]
            children_counts = swc['parent'].value_counts()
            branch = children_counts.index[children_counts.values > 1]
            terminal = swc.index.difference(swc.loc[filter(lambda x: x != -1, swc['parent'])].index)
            swc['node_type'] = 'G'
            swc.loc[root, "node_type"] = 'R'
            swc.loc[branch, 'node_type'] = 'B'
            swc.loc[terminal, 'node_type'] = 'T'
            # ccf region annotation
            swc['region'] = utils.ccf_atlas.annotate_swc(swc)
            return swc.loc[traversal, ['x', 'y', 'z', 'type', 'node_type', 'region']]
        except Exception:
            if self.show_failure:
                if self.debug:
                    traceback.print_exc()
                print(f'Some error occurred when converting SWC to sequence by default: {key}.')
            return None

    def pickle_qc(self, path: Union[str, 'PathLike[str]']):
        """Save the SWC dataframes after QC as an ordered dict to a pickle file.

        :param path: The path to save the pickle.
        """
        with open(path, 'wb') as f:
            pickle.dump(self._swc_qc, f)

    def pickle_result(self, path: Union[str, 'PathLike[str]']):
        """Save the generated sequences as an ordered dict to a pickle file.

        :param path: The path to save the pickle.
        """
        with open(path, 'wb') as f:
            pickle.dump(self._sequences, f)

    @property
    def swc_after_qc(self) -> Optional[OrderedDict[str, pd.DataFrame]]:
        """Reference to the SWC dataframes after quality control. If no QC is done, it's None."""
        return self._swc_qc

    @property
    def result(self) -> Optional[OrderedDict[str, pd.DataFrame]]:
        """The sequence dataframes as an OrderedDict, indexed by paths."""
        return self._sequences

An interface for generating dataset from multiple SWC files to be further converted into a dataset that can be used by our deep learning models. See the above guide for a more detailed description of its principles.

The default usage should be: loading SWC files with path given during initialization, then operating a quality control defined by SWCQualityControl, and then finding binary trees in the neuron tree and committing traversals to get the sequences. They defined as member functions are should be invoked in this order.

All the above procedures are coded as a host function paired with its unit function that can be called in parallel. for different SWC dataframes. You'll find 3 pairs of them, including:

Host Unit
read_parallel read_proc
qc_parallel qc_proc
make_sequence_parallel make_sequence_proc

See their own function documentation for more info.

If you need specifying some or all of the sequence conversion to your like, you can inherit this class and derive its functions. Deriving only the unit function will not change how the host invoke them and maintain the multiprocessing feature.

The result of this class is multiple dataframes in an OrderedDict indexed by paths, which can be retrieved as a property or saved as a pickle. This result can be further fed to the deep learning model class in the dsmtools.modeling module. Here, you also have access to the SWC dataframes after QC as a member property (also an OrderedDict).

Note: Do not modify the retrieved property data, which can cause bugs. Deep copy them if you need.

NeuronSequenceDataset( swc_paths: Iterable[Union[str, os.PathLike[str]]], show_failure=True, debug=False, chunk_size=1, jobs=1)
52    def __init__(self,
53                 swc_paths: Iterable[Union[str, 'PathLike[str]']],
54                 show_failure=True,
55                 debug=False,
56                 chunk_size=1,
57                 jobs=1):
58        """Set up the class, specifying common processing options.
59
60        :param swc_paths: An iterable of file paths to the SWC files to load. Note they will be forced as strings.
61        :param show_failure: Whether to print failed swc during processing.
62        :param debug: Whether to print traceback messages when processes fail.
63        :param chunk_size: The chunk size for parallel processing.
64        :param jobs: number of processes for parallel computation.
65        """
66
67        self._swc_paths: tuple[str] = tuple(str(i) for i in swc_paths)
68        self._swc_raw: Optional[OrderedDict[str, pd.DataFrame]] = None
69        self._swc_qc: Optional[OrderedDict[str, pd.DataFrame]] = None
70        self._sequences: Optional[OrderedDict[str, tuple[pd.DataFrame, str]]] = None
71        self.chunk_size = chunk_size
72        self.jobs = jobs
73        self.show_failure = show_failure
74        self.debug = debug

Set up the class, specifying common processing options.

Parameters
  • swc_paths: An iterable of file paths to the SWC files to load. Note they will be forced as strings.
  • show_failure: Whether to print failed swc during processing.
  • debug: Whether to print traceback messages when processes fail.
  • chunk_size: The chunk size for parallel processing.
  • jobs: number of processes for parallel computation.
def read_parallel(self):
76    def read_parallel(self):
77        """
78        Read all the given SWC files and store them as an OrderedDict of dataframes within this object.
79        The multiprocessing parameters are given by the class initialization.
80        """
81
82        with Pool(self.jobs) as p:
83            res = p.map(self.read_proc, self._swc_paths, chunksize=self.chunk_size)
84            res = filter(lambda i: i[1] is not None, zip(self._swc_paths, res))
85            self._swc_raw = OrderedDict(res)
86        print(f'Successfully loaded {len(self._swc_raw)} SWC files.')

Read all the given SWC files and store them as an OrderedDict of dataframes within this object. The multiprocessing parameters are given by the class initialization.

def read_proc(self, path: str) -> Optional[pandas.core.frame.DataFrame]:
 88    def read_proc(self, path: str) -> Optional[pd.DataFrame]:
 89        """
 90        Loading of a single SWC file, with a try except mechanism to prevent interruption.
 91        Failed processes will print a message and return None.
 92
 93        :param path: Path to the SWC.
 94        :return:  An SWC dataframe, or None if failed with some error.
 95        """
 96
 97        try:
 98            return utils.swc.read_swc(path)
 99        except:
100            if self.show_failure:
101                if self.debug:
102                    traceback.print_exc()
103                print(f'Some error occurred when reading SWC: {path}, better check your file.')
104            return None

Loading of a single SWC file, with a try except mechanism to prevent interruption. Failed processes will print a message and return None.

Parameters
  • path: Path to the SWC.
Returns

An SWC dataframe, or None if failed with some error.

def qc_parallel(self, qc_len_thr=10, min_node_count=10):
106    def qc_parallel(self, qc_len_thr=10, min_node_count=10):
107        """
108        Commit quality control on all the loaded SWC dataframes. This operation is not inplace
109        and set up a new OrderedDict, which can be referenced by the property `NeuronSequenceDataset.swc_after_qc`.
110        Here, you can specify QC parameters passed to the unit processing function.
111
112        To prevent empty dataframes, this function applies a final filter on the QC results, the parameter of
113        which is also customizable.
114
115        Before running this function, it will check if SWC files are loaded. If not, it will invoke
116        `NeuronSequenceDataset.read_parallel`.
117
118        The multiprocessing parameters are given by the class initialization.
119
120        :param qc_len_thr: The pruning length threshold for quality control.
121        :param min_node_count: The minimum allowed count of nodes for the final SWC.
122        """
123
124        if self._swc_raw is None:
125            warn("No SWC, running read_parallel..")
126            self.read_parallel()
127        qc_proc = partial(self.qc_proc, qc_len_thr=qc_len_thr)
128        with Pool(self.jobs) as p:
129            res = p.map(qc_proc, self._swc_raw.keys(), chunksize=self.chunk_size)
130            res = filter(lambda i: i[1] is not None and len(i[1]) >= min_node_count, zip(self._swc_raw.keys(), res))
131            self._swc_qc = OrderedDict(res)
132        print(f'Successfully quality-controlled {len(self._swc_qc)} SWC files.')

Commit quality control on all the loaded SWC dataframes. This operation is not inplace and set up a new OrderedDict, which can be referenced by the property NeuronSequenceDataset.swc_after_qc. Here, you can specify QC parameters passed to the unit processing function.

To prevent empty dataframes, this function applies a final filter on the QC results, the parameter of which is also customizable.

Before running this function, it will check if SWC files are loaded. If not, it will invoke NeuronSequenceDataset.read_parallel.

The multiprocessing parameters are given by the class initialization.

Parameters
  • qc_len_thr: The pruning length threshold for quality control.
  • min_node_count: The minimum allowed count of nodes for the final SWC.
def qc_proc( self, key: str, qc_len_thr: float) -> Optional[pandas.core.frame.DataFrame]:
134    def qc_proc(self, key: str, qc_len_thr: float) -> Optional[pd.DataFrame]:
135        """
136        Quality control of a single SWC file, with a try except mechanism to prevent interruption.
137        Failed processes will print a message and return None.
138
139        It serially commits retaining tree from the root detected, bifurcation checking, and removing short segments.
140        It's advised that this order is maintained.
141
142        :param key: The key of the loaded SWC, should be a string path.
143        :param qc_len_thr: The pruning length threshold for quality control.
144        :return: An SWC dataframe, or None if failed with some error.
145        """
146
147        try:
148            swc = self._swc_raw[key].copy(deep=True)
149            SWCQualityControl(swc).retain_only_1st_root().degrade_to_bifurcation().prune_by_len(qc_len_thr)
150            return swc
151        except Exception:
152            if self.show_failure:
153                if self.debug:
154                    traceback.print_exc()
155                print(f'Some error occurred during quality control with SWC: {key}.')
156            return None

Quality control of a single SWC file, with a try/except mechanism to prevent interruption. Failed processes print a message and return None.

It sequentially retains only the tree from the first detected root, degrades multifurcations to bifurcations, and prunes short segments. It's advised to maintain this order.

Parameters
  • key: The key of the loaded SWC, should be a string path.
  • qc_len_thr: The pruning length threshold for quality control.
Returns

An SWC dataframe, or None if failed with some error.

def make_sequence_parallel(self, ordering='pre', lesser_first=True):
158    def make_sequence_parallel(self, ordering='pre', lesser_first=True):
159        """Convert all neuron trees passing previous procedures to sequences. This operation is not inplace
160        and set up a new OrderedDict, which can be referenced by the property `NeuronSequenceDataset.result`.
161        Here, you can specify some traversal parameters passed to the unit processing function.
162
163        Before running this function, it will check if quality control is done. If not, it will invoke
164        `NeuronSequenceDataset.qc_parallel`.
165
166        The multiprocessing parameters are given by the class initialization.
167
168        :param ordering: The traversal type, either 'pre', 'in', or 'post'.
169        :param lesser_first: Whether traversing from lesser to greater, affect both within and among subtrees.
170        """
171
172        if self._swc_qc is None:
173            warn("No QC SWC, running qc_parallel with default parameters..")
174            self.qc_parallel()
175        assert ordering in ['pre', 'in', 'post']
176        make_seq = partial(self.make_sequence_proc, ordering=ordering, lesser_first=lesser_first)
177        with Pool(self.jobs) as p:
178            res = p.map(make_seq, self._swc_qc)
179            res = filter(lambda i: i[1] is not None, zip(self._swc_qc.keys(), res))
180            self._sequences = OrderedDict(res)
181        print(f'Successfully made {len(self._sequences)} sequences.')

Convert all neuron trees that passed the previous procedures to sequences. This operation is not in place; it sets up a new OrderedDict, which can be referenced through the property NeuronSequenceDataset.result. Here, you can specify traversal parameters passed to the unit processing function.

Before running this function, it will check if quality control is done. If not, it will invoke NeuronSequenceDataset.qc_parallel.

The multiprocessing parameters are given by the class initialization.

Parameters
  • ordering: The traversal type, either 'pre', 'in', or 'post'.
  • lesser_first: Whether to traverse from lesser to greater subtrees; this affects ordering both within and among subtrees.
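The three orderings accepted here are the standard binary-tree traversals described at the top of this page. A minimal sketch on nested (value, left, right) tuples:

```python
def traverse(tree, ordering='pre'):
    # tree is a nested tuple (value, left, right), or None for an empty subtree
    if tree is None:
        return ()
    v, left, right = tree
    if ordering == 'pre':
        return (v,) + traverse(left, ordering) + traverse(right, ordering)
    if ordering == 'in':
        return traverse(left, ordering) + (v,) + traverse(right, ordering)
    if ordering == 'post':
        return traverse(left, ordering) + traverse(right, ordering) + (v,)
    raise ValueError("ordering must be 'pre', 'in' or 'post'")

t = (1, (2, None, None), (3, (4, None, None), None))
print(traverse(t, 'pre'))   # (1, 2, 3, 4)
print(traverse(t, 'in'))    # (2, 1, 4, 3)
print(traverse(t, 'post'))  # (2, 4, 3, 1)
```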
def make_sequence_proc(self, key: str, ordering: str, lesser_first: bool):
183    def make_sequence_proc(self, key: str, ordering: str, lesser_first: bool):
184        """Convert an SWC dataframe to a feature dataframe indexed by a traversal sequence.
185
186        It uses `NeuronTree` to build binary trees separately for axon and dendrite from an SWC, and traverses the trees
187        with the parameters specified, which by default is preorder traversal and lesser subtrees prioritized.
188
189        The final dataframe is similar to an SWC, but different as it's with some additional features and indexed by
190        the traversal sequence.
191
192        The added features are CCFv3 region abbreviations retrieved by `dsmtools.utils.ccf_atlas` module (in columns
193        'region'), and topological annotations (in column 'node_type') of four types: GBTR,
194        representing general, branch, terminal, and root.
195
196        :param key: The key of the loaded SWC, should be a string path.
197        :param ordering: The traversal type, either 'pre', 'in', or 'post'.
198        :param lesser_first: Whether traversing from lesser to greater, affect both within and among subtrees.
199        :return:
200        """
201        try:
202            # shallow copy the qc dataframe for new columns
203            swc = self._swc_qc[key].copy(deep=True)
204            nt = NeuronTree(swc)
205            axon_index = swc.index[swc['type'] == utils.swc.Type.AXON]
206            dendrite_index = swc.index[swc['type'].isin([utils.swc.Type.APICAL, utils.swc.Type.BASAL])]
207            nt.find_binary_trees_in(axon_index)
208            nt.find_binary_trees_in(dendrite_index)
209            traversal = nt.collective_traversal(ordering=ordering, lesser_first=lesser_first)
210            # node type, topological annotation
211            root = swc.index[swc['parent'] == -1]
212            children_counts = swc['parent'].value_counts()
213            branch = children_counts.index[children_counts.values > 1]
214            terminal = swc.index.difference(swc.loc[filter(lambda x: x != -1, swc['parent'])].index)
215            swc['node_type'] = 'G'
216            swc.loc[root, "node_type"] = 'R'
217            swc.loc[branch, 'node_type'] = 'B'
218            swc.loc[terminal, 'node_type'] = 'T'
219            # ccf region annotation
220            swc['region'] = utils.ccf_atlas.annotate_swc(swc)
221            return swc.loc[traversal, ['x', 'y', 'z', 'type', 'node_type', 'region']]
222        except Exception:
223            if self.show_failure:
224                if self.debug:
225                    traceback.print_exc()
226                print(f'Some error occurred when converting SWC to sequence by default: {key}.')
227            return None

Convert an SWC dataframe to a feature dataframe indexed by a traversal sequence.

It uses NeuronTree to build binary trees separately for axon and dendrite from an SWC, and traverses the trees with the specified parameters, by default preorder traversal with lesser subtrees prioritized.

The final dataframe is similar to an SWC, but it carries some additional features and is indexed by the traversal sequence.

The added features are CCFv3 region abbreviations retrieved by the dsmtools.utils.ccf_atlas module (in column 'region'), and topological annotations (in column 'node_type') of four types, GBTR, representing general, branch, terminal, and root.

Parameters
  • key: The key of the loaded SWC, should be a string path.
  • ordering: The traversal type, either 'pre', 'in', or 'post'.
  • lesser_first: Whether to traverse from lesser to greater subtrees; this affects ordering both within and among subtrees.
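The GBTR annotation logic can be sketched on a toy parent column (pandas only; the 'region' annotation needs the real atlas module and is omitted here):

```python
import pandas as pd

# toy SWC fragment: chain 1-2, node 2 branching to 3 and 4, then chain 4-5
swc = pd.DataFrame({'parent': [-1, 1, 2, 2, 4]}, index=[1, 2, 3, 4, 5])

swc['node_type'] = 'G'                                # general by default
swc.loc[swc['parent'] == -1, 'node_type'] = 'R'       # root
counts = swc['parent'].value_counts()
swc.loc[counts[counts > 1].index, 'node_type'] = 'B'  # branch: more than one child
has_child = pd.Index(sorted(set(swc['parent']) - {-1}))
swc.loc[swc.index.difference(has_child), 'node_type'] = 'T'  # terminal: no children
print(swc['node_type'].tolist())  # ['R', 'B', 'T', 'G', 'T']
```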
def pickle_qc(self, path: Union[str, os.PathLike[str]]):
229    def pickle_qc(self, path: Union[str, 'PathLike[str]']):
230        """Save the SWC dataframes after qc as an ordered dict to a pickle file.
231
232        :param path: the path to save the pickle
233        """
234        with open(path, 'wb') as f:
235            pickle.dump(self._swc_qc, f)

Save the SWC dataframes after QC as an ordered dict to a pickle file.

Parameters
  • path: the path to save the pickle
def pickle_result(self, path: Union[str, os.PathLike[str]]):
237    def pickle_result(self, path: Union[str, 'PathLike[str]']):
238        """Save the generated sequences as an ordered dict to a pickle file.
239
240        :param path: the path to save the pickle
241        """
242        with open(path, 'wb') as f:
243            pickle.dump(self._sequences, f)

Save the generated sequences as an ordered dict to a pickle file.

Parameters
  • path: the path to save the pickle
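Both pickle helpers follow the same round-trip pattern. A sketch with a hypothetical result dict, written to a temporary file:

```python
import os
import pickle
import tempfile
from collections import OrderedDict

# hypothetical sequences keyed by SWC path, standing in for the real result
result = OrderedDict([('a.swc', [1, 2, 3]), ('b.swc', [4, 5])])
path = os.path.join(tempfile.mkdtemp(), 'result.pickle')
with open(path, 'wb') as f:
    pickle.dump(result, f)
with open(path, 'rb') as f:
    restored = pickle.load(f)
print(restored == result)  # True
```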
swc_after_qc: Optional[collections.OrderedDict[str, pandas.core.frame.DataFrame]]

Reference to the SWC files after quality control. If no QC is done, it's None.

result: Optional[collections.OrderedDict[str, pandas.core.frame.DataFrame]]

The sequence dataframe as an OrderedDict, indexed by paths.

class SWCQualityControl:
  9class SWCQualityControl:
 10    """Interface for executing SWC quality controls. See [the above explanation](#reasons-for-qc) for its principle.
 11
 12    To use this class, initialize it with an SWC dataframe (used as reference, so all the changes are inplace), and
 13    call each member function in a specific order (the recommended order is implemented by that of
 14    `NeuronSequenceDataset.qc_proc`).
 15    """
 16
 17    def __init__(self, swc: pd.DataFrame):
 18        """Set the referenced SWC dataframe to modify.
 19
 20        :param swc: An SWC dataframe, all changes are in place.
 21        """
 22
 23        self._swc = swc
 24
 25    def retain_only_1st_root(self):
 26        """
 27        Find the first root(parent=-1) in the SWC dataframe and remove all the other nodes outside this component.
 28        """
 29
 30        roots = self._swc[self._swc.parent == -1]
 31        if len(roots) == 1:
 32            return self
 33        if len(roots) == 0:
 34            raise RuntimeError("No root in this SWC.")
 35        visited = get_subtree_nodes(self._swc, roots.iloc[0, 0])
 36        self._swc.drop(index=self._swc.index.difference(visited), inplace=True)
 37        return self
 38
 39    def degrade_to_bifurcation(self):
 40        """Turn all branch nodes with more than 2 children into multiple serial bifurcations except for the root node.
 41
 42        It commits a node ID sorting at first to find the start number for new nodes (as new bifurcations can be
 43        generated). Then it iterates through the unqualified nodes and turn them into bifurcations.
 44
 45        The new bifurcations from the original branch node are arranged in a way that subtrees of lesser total path
 46        length are nearer to the root.
 47        """
 48
 49        sort_swc(self._swc)
 50        swc = self._swc.copy(deep=True)
 51        new_ind = max(swc.index)
 52        child_dict = get_child_dict(swc)
 53        for node, children in child_dict.items():
 54            if len(children) <= 2 or swc.at[node, 'parent'] == -1:
 55                continue
 56            # get path length of all subtrees and sort
 57            len_list = [get_path_len(swc, get_subtree_nodes(swc, c)) for c in children]
 58            # from short to long disintegrate child segments to make them bifurcation
 59            # short child seg would be brought up to a parent node first
 60            # the parent node share the coordinate with the original branch node
 61            for i in np.argsort(len_list)[:-1]:
 62                to_move = children[i]
 63                multi = self._swc.at[to_move, 'parent']
 64                parent = self._swc.at[multi, 'parent']
 65                new_ind += 1
 66                self._swc.loc[new_ind] = self._swc.loc[multi].copy(deep=True)       # new node
 67                self._swc.at[new_ind, 'parent'] = parent
 68                self._swc.at[multi, 'parent'] = new_ind
 69                self._swc.at[to_move, 'parent'] = new_ind
 70        sort_swc(self._swc)
 71        return self
 72
 73    def prune_by_len(self, len_thr=10):
 74        """
 75        This function iteratively prune the tree. Every time it finds the short terminal branches and delete them,
 76        and new terminal branches emerge, until no short branches detected. This way, the tree is ensured to be with
 77        no terminal branches shorter than the threshold, but the main skeleton to the farthest reach is maintained.
 78
 79        :param len_thr: The min length allowed for terminal branches, default as 10
 80        """
 81
 82        while True:
 83            segments = get_segments(self._swc)
 84            drop_ind = []
 85            terminal_branch = np.unique([row['parentSeg'] for ind, row in segments.iterrows() if row['childSeg'] == []])
 86            for ind in terminal_branch:
 87                # retain non-terminal branches
 88                child_seg = [i for i in segments.at[ind, 'childSeg'] if segments.at[i, 'childSeg'] == []]
 89                # get length for all segments
 90                seg_len = [get_path_len(self._swc, segments.at[c, 'nodes']) for c in child_seg]
 91                # all are terminal branches, then the longest branch must be retained
 92                if child_seg == segments.at[ind, 'childSeg']:
 93                    k = np.argmax(seg_len)
 94                    child_seg.pop(k)
 95                    seg_len.pop(k)
 96                # drop short branches
 97                drop_ind.extend(chain.from_iterable(segments.at[c, 'nodes']
 98                                                    for c, s in zip(child_seg, seg_len) if s <= len_thr))
 99            if not drop_ind:
100                break
101            self._swc.drop(index=drop_ind, inplace=True)
102        return self

Interface for executing SWC quality controls. See the above explanation for its principle.

To use this class, initialize it with an SWC dataframe (used as a reference, so all changes are in place), and call its member functions in a specific order (the recommended order is the one used by NeuronSequenceDataset.qc_proc).

SWCQualityControl(swc: pandas.core.frame.DataFrame)
17    def __init__(self, swc: pd.DataFrame):
18        """Set the referenced SWC dataframe to modify.
19
20        :param swc: An SWC dataframe, all changes are in place.
21        """
22
23        self._swc = swc

Set the referenced SWC dataframe to modify.

Parameters
  • swc: An SWC dataframe, all changes are in place.
def retain_only_1st_root(self):
25    def retain_only_1st_root(self):
26        """
27        Find the first root(parent=-1) in the SWC dataframe and remove all the other nodes outside this component.
28        """
29
30        roots = self._swc[self._swc.parent == -1]
31        if len(roots) == 1:
32            return self
33        if len(roots) == 0:
34            raise RuntimeError("No root in this SWC.")
35        visited = get_subtree_nodes(self._swc, roots.iloc[0, 0])
36        self._swc.drop(index=self._swc.index.difference(visited), inplace=True)
37        return self

Find the first root (parent = -1) in the SWC dataframe and remove all nodes outside its connected component.
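The idea can be sketched on a plain parent map with two roots, keeping only the component reachable from the first:

```python
# toy parent map with two roots (1 and 10); keep only the first root's component
parent = {1: -1, 2: 1, 3: 2, 10: -1, 11: 10}
children = {}
for n, p in parent.items():
    children.setdefault(p, []).append(n)
first_root = next(n for n, p in parent.items() if p == -1)
keep, stack = set(), [first_root]
while stack:                        # walk the subtree under the first root
    n = stack.pop()
    keep.add(n)
    stack.extend(children.get(n, []))
dropped = set(parent) - keep        # everything outside this component
print(sorted(keep), sorted(dropped))  # [1, 2, 3] [10, 11]
```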

def degrade_to_bifurcation(self):
39    def degrade_to_bifurcation(self):
40        """Turn all branch nodes with more than 2 children into multiple serial bifurcations except for the root node.
41
42        It commits a node ID sorting at first to find the start number for new nodes (as new bifurcations can be
43        generated). Then it iterates through the unqualified nodes and turn them into bifurcations.
44
45        The new bifurcations from the original branch node are arranged in a way that subtrees of lesser total path
46        length are nearer to the root.
47        """
48
49        sort_swc(self._swc)
50        swc = self._swc.copy(deep=True)
51        new_ind = max(swc.index)
52        child_dict = get_child_dict(swc)
53        for node, children in child_dict.items():
54            if len(children) <= 2 or swc.at[node, 'parent'] == -1:
55                continue
56            # get path length of all subtrees and sort
57            len_list = [get_path_len(swc, get_subtree_nodes(swc, c)) for c in children]
58            # from short to long disintegrate child segments to make them bifurcation
59            # short child seg would be brought up to a parent node first
60            # the parent node share the coordinate with the original branch node
61            for i in np.argsort(len_list)[:-1]:
62                to_move = children[i]
63                multi = self._swc.at[to_move, 'parent']
64                parent = self._swc.at[multi, 'parent']
65                new_ind += 1
66                self._swc.loc[new_ind] = self._swc.loc[multi].copy(deep=True)       # new node
67                self._swc.at[new_ind, 'parent'] = parent
68                self._swc.at[multi, 'parent'] = new_ind
69                self._swc.at[to_move, 'parent'] = new_ind
70        sort_swc(self._swc)
71        return self

Turn all branch nodes with more than 2 children into multiple serial bifurcations except for the root node.

It first sorts node IDs to find the starting number for new nodes (as new bifurcations can be generated). Then it iterates through the unqualified nodes and turns them into bifurcations.

The new bifurcations from the original branch node are arranged in a way that subtrees of lesser total path length are nearer to the root.
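The rewiring can be sketched on a plain parent map. Here, node 2 has three children, and subtree_len holds hypothetical path lengths of each child subtree:

```python
from collections import Counter

# toy parent map: node 2 has three children (3, 4, 5)
parent = {1: -1, 2: 1, 3: 2, 4: 2, 5: 2}
subtree_len = {3: 5.0, 4: 1.0, 5: 3.0}   # hypothetical subtree path lengths

new_id = max(parent)
for child in sorted([3, 4, 5], key=subtree_len.get)[:-1]:  # shortest first; longest stays deepest
    multi = parent[child]            # the currently over-branched node
    new_id += 1
    parent[new_id] = parent[multi]   # the new node takes the old node's parent...
    parent[multi] = new_id           # ...the old node hangs below it...
    parent[child] = new_id           # ...and the short child moves up beside it
# every node now has at most two children
counts = Counter(p for p in parent.values() if p != -1)
print(max(counts.values()))  # 2
```

The shortest subtree ends up attached nearest the root, matching the arrangement described above; in the real method, the new nodes also copy the original branch node's coordinates.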

def prune_by_len(self, len_thr=10):
 73    def prune_by_len(self, len_thr=10):
 74        """
 75        This function iteratively prune the tree. Every time it finds the short terminal branches and delete them,
 76        and new terminal branches emerge, until no short branches detected. This way, the tree is ensured to be with
 77        no terminal branches shorter than the threshold, but the main skeleton to the farthest reach is maintained.
 78
 79        :param len_thr: The min length allowed for terminal branches, default as 10
 80        """
 81
 82        while True:
 83            segments = get_segments(self._swc)
 84            drop_ind = []
 85            terminal_branch = np.unique([row['parentSeg'] for ind, row in segments.iterrows() if row['childSeg'] == []])
 86            for ind in terminal_branch:
 87                # retain non-terminal branches
 88                child_seg = [i for i in segments.at[ind, 'childSeg'] if segments.at[i, 'childSeg'] == []]
 89                # get length for all segments
 90                seg_len = [get_path_len(self._swc, segments.at[c, 'nodes']) for c in child_seg]
 91                # all are terminal branches, then the longest branch must be retained
 92                if child_seg == segments.at[ind, 'childSeg']:
 93                    k = np.argmax(seg_len)
 94                    child_seg.pop(k)
 95                    seg_len.pop(k)
 96                # drop short branches
 97                drop_ind.extend(chain.from_iterable(segments.at[c, 'nodes']
 98                                                    for c, s in zip(child_seg, seg_len) if s <= len_thr))
 99            if not drop_ind:
100                break
101            self._swc.drop(index=drop_ind, inplace=True)
102        return self

This function iteratively prunes the tree: each pass finds the short terminal branches and deletes them, which can expose new terminal branches, until no short branch is detected. This ensures the tree has no terminal branches shorter than the threshold while the main skeleton to the farthest reach is maintained.

Parameters
  • len_thr: The min length allowed for terminal branches, default as 10
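A simplified sketch of the iterative loop on a plain parent map. Branch "length" is counted in nodes here rather than path length, and the real method's keep-the-longest-branch safeguard is omitted:

```python
def prune(parent, len_thr):
    # repeatedly delete terminal segments shorter than len_thr (in node count)
    parent = dict(parent)
    while True:
        children = {}
        for n, p in parent.items():
            children.setdefault(p, []).append(n)
        drop = []
        for leaf in (n for n in parent if n not in children):
            seg, n = [leaf], parent[leaf]
            while n != -1 and len(children[n]) == 1:  # walk up the unbranched chain
                seg.append(n)
                n = parent[n]
            if len(seg) < len_thr:
                drop.extend(seg)
        if not drop:          # no short terminal branch left: done
            return parent
        for n in drop:
            del parent[n]

# main chain 1-2-3-4-5 with short side twigs 6 (at node 2) and 7 (at node 3)
pruned = prune({1: -1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 2, 7: 3}, len_thr=2)
print(sorted(pruned))  # [1, 2, 3, 4, 5]
```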
class NeuronTree:
138class NeuronTree:
139    """Interface for generating binary trees and commit traversals for the whole SWC neuron tree.
140    See [the above guide](#neurontree) for a more detailed description of its principles.
141
142    With this interface, you can provide a set of nodes to its binary tree find function
143    `NeuronTree.find_binary_trees_in`. The method will take your input as the foreground, and do a backtracking and
144    merge to generate multiple trees (the set of nodes given can be disconnected).
145
146    For tree generation, it implements a tree building method `NeuronTree.build_a_tree` using `BinarySubtree` as units.
147    At last, the root `BinarySubtree` nodes of the built trees will be saved.
148
149    You can repeat this process multiple times and the interface will aggregate your results by default,
150    which is accessible as `NeuronTree.binary_trees`. You are free to use this property to traverse them with the
151    `BinaryTree` methods yourself or use `NeuronTree.collective_traversal` to get it at once.
152    """
153
154    def __init__(self, swc: pd.DataFrame, tree_node_class=BinarySubtree):
155        """
156        Initialize by assigning an SWC dataframe as a reference.
157        It will only use the xyz and parent fields, and will check if they exist at this stage.
158        Any change of the original copy of the dataframe is deprecated, as the initialization info would be outdated.
159
160        :param swc: An SWC dataframe, no modification takes place.
161        :param tree_node_class: The binary tree node class to use for building, as you can provide your own.
162        """
163
164        self._df = swc[['x', 'y', 'z', 'parent']]
165        self._tree_class = tree_node_class
166        self._trees = list[tree_node_class]()
167        self._child_dict = get_child_dict(self._df)
168
169    @property
170    def binary_trees(self):
171        return self._trees
172
173    @safe_recursion_bootstrap
174    def build_a_tree(self, rt: int, within: set, stack: LifoQueue):
175        """Build a binary tree from a root using a specified tree node class.
176
177        :param rt: The root to start with, and to proceed in recursion.
178        :param within: The range of node IDs to build trees from, children outside will be ignored.
179        :param stack: A new queue.LifoQueue to handle recursion in decorators.
180        :return: The root of the binary tree as the specified tree node class.
181        """
182        children = []
183        for i in self._child_dict[rt]:
184            if i in within:
185                children.append((yield self.build_a_tree(i, within, stack=stack)))
186        yield self._tree_class(rt, self._df.loc[rt, ['x', 'y', 'z']], children)
187
188    def find_binary_trees_in(self, index: Sequence[int], overwrite=False):
189        """Find the largest binary trees within a subset of nodes of an SWC.
190
191        With this method, you can build binary trees only upon the 'foreground' nodes in the SWC, so you can get
192        traversals for specific structures.
193        Naturally, if the nodes are not contiguous, there will be multiple binary trees. Here, the root of the
194        whole neuron will not be counted even if provided.
195
196        By merging any founded backtracking, it determines which tree each node belongs to, and ensures the trees
197        are as deep as they can be, working like a union-find. The backtracking starts from 'terminals',
198        the nodes without children in the set, rather than the real terminals of the SWC.
199
200        After backtracking, it invokes `NeuronTree.build_a_tree` to build the trees as linked nodes recursively. By
201        default, it uses `BinarySubtree` as the tree node class. You can change it during initialization.
202
203        This method will save the results in the property `NeuronTree.binary_trees`. It's a list and every time you
204        invoke the function, the new results will be appended, unless you specify to overwrite the whole list.
205
206        :param index: A sequence containing nodes including the subtrees to search.
207        :param overwrite: Whether to overwrite the existing trees or extend.
208        """
209
210        index = pd.Index(index)
211        child_count = self._df.loc[index, 'parent'].value_counts()
212        ends = index.difference(child_count.index)
213        sorted_nodes = {}
214        root_index = self._df.index[self._df['parent'] == -1]
215        for n in ends:
216            backtrack = []
217            while n in index and n not in root_index:  # root can't be included in a binary tree, must stop
218                backtrack.append(n)
219                n = self._df.loc[n, 'parent']
220                for root, tree in sorted_nodes.items():
221                    if n in tree:
222                        sorted_nodes[root] |= set(backtrack)
223                        break
224                else:  # when current subtrees contain none of backtrack, continue backtrack
225                    continue
226                break  # complete backtracking, break and start from next terminal
227            else:  # execute when cur_id run out of nodes, i.e. meeting a new ending
228                sorted_nodes[backtrack[-1]] = set(backtrack)
229
230        trees = [self.build_a_tree(root, nodes, stack=LifoQueue()) for root, nodes in sorted_nodes.items()]
231        if overwrite:
232            self._trees = trees
233        else:
234            self._trees.extend(trees)
235
236    def collective_traversal(self, ordering='pre', lesser_first=True) -> tuple[int]:
237        """Obtain binary tree traversals of three orders for a list of trees and combine them into one by some priority.
238
239        You can specify which of pre, in and postorder traversal is performed, as well as whether the lesser subtrees
240        are traversed first than the greater.
241        This method will also chain the traversals of each binary tree into one, by the same order with the traversal in
242        each subtree.
243
244        :param ordering: The traversal type, either 'pre', 'in', or 'post'.
245        :param lesser_first: Whether traversing from lesser to greater, affect both within and among subtrees.
246        :return: A list of node IDs.
247        """
248
249        trees = sorted(self._trees, key=lambda x: x.tot_path_len(stack=LifoQueue()), reverse=not lesser_first)
250        if ordering == 'pre':
251            traversal = (t.preorder(lesser_first, stack=LifoQueue()) for t in trees)
252        elif ordering == 'in':
253            traversal = (t.inorder(lesser_first, stack=LifoQueue()) for t in trees)
254        elif ordering == 'post':
255            traversal = (t.postorder(lesser_first, stack=LifoQueue()) for t in trees)
256        else:
257            raise ValueError("Must specify an ordering among 'pre', 'in' and 'post'.")
258        return sum(traversal, ())

Interface for generating binary trees and committing traversals for the whole SWC neuron tree. See the above guide for a more detailed description of its principles.

With this interface, you can provide a set of nodes to its binary tree find function NeuronTree.find_binary_trees_in. The method will take your input as the foreground, and do a backtracking and merge to generate multiple trees (the set of nodes given can be disconnected).

For tree generation, it implements a tree building method NeuronTree.build_a_tree using BinarySubtree as units. At last, the root BinarySubtree nodes of the built trees will be saved.

You can repeat this process multiple times and the interface will aggregate your results by default, accessible as NeuronTree.binary_trees. You are free to use this property and traverse the trees with the BinarySubtree methods yourself, or use NeuronTree.collective_traversal to get everything at once.

NeuronTree(swc: pandas.core.frame.DataFrame, tree_node_class=<class 'dsmtools.preprocessing.BinarySubtree'>)
154    def __init__(self, swc: pd.DataFrame, tree_node_class=BinarySubtree):
155        """
156        Initialize by assigning an SWC dataframe as a reference.
157        It will only use the xyz and parent fields, and will check if they exist at this stage.
158        Any change of the original copy of the dataframe is deprecated, as the initialization info would be outdated.
159
160        :param swc: An SWC dataframe, no modification takes place.
161        :param tree_node_class: The binary tree node class to use for building, as you can provide your own.
162        """
163
164        self._df = swc[['x', 'y', 'z', 'parent']]
165        self._tree_class = tree_node_class
166        self._trees = list[tree_node_class]()
167        self._child_dict = get_child_dict(self._df)

Initialize by assigning an SWC dataframe as a reference. It only uses the xyz and parent fields, and checks that they exist at this stage. Avoid changing the original dataframe afterwards, as the information captured at initialization would become outdated.

Parameters
  • swc: An SWC dataframe, no modification takes place.
  • tree_node_class: The binary tree node class to use for building, as you can provide your own.
def build_a_tree(*args, stack: queue.LifoQueue, **kwargs):
36    def wrapped_func(*args, stack: LifoQueue, **kwargs):
37        if not stack.empty():               # where next goes to, return the generator directly
38            return func(*args, stack=stack, **kwargs)
39        recur = func(*args, stack=stack, **kwargs)       # initialized generator
40        while True:
41            if isinstance(recur, Generator):
42                stack.put(recur)
43                recur = next(recur)
44            else:       # it's a number then, computation done for this branch
45                stack.get()
46                if stack.empty():
47                    break
48                recur = stack.queue[-1].send(recur)     # send the result back to its parent
49        return recur

Build a binary tree from a root using a specified tree node class.

Parameters
  • rt: The root to start with, and to proceed in recursion.
  • within: The range of node IDs to build trees from, children outside will be ignored.
  • stack: A new queue.LifoQueue to handle recursion in decorators.
Returns

The root of the binary tree as the specified tree node class.
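The decorator source shown above drives generator-based recursion on an explicit stack, so deep trees never hit Python's recursion limit. A self-contained sketch of the same trampoline pattern, computing the depth of a chain far deeper than the default limit:

```python
from collections.abc import Generator
from queue import LifoQueue

def safe_recursion_bootstrap(func):
    # trampoline: generators stand in for stack frames on an explicit LifoQueue
    def wrapped(*args, stack: LifoQueue, **kwargs):
        if not stack.empty():                  # nested call: hand back the generator
            return func(*args, stack=stack, **kwargs)
        recur = func(*args, stack=stack, **kwargs)
        while True:
            if isinstance(recur, Generator):
                stack.put(recur)
                recur = next(recur)
            else:                              # a plain value: this frame is done
                stack.get()
                if stack.empty():
                    break
                recur = stack.queue[-1].send(recur)  # resume the parent frame
        return recur
    return wrapped

@safe_recursion_bootstrap
def depth(parent, n, stack: LifoQueue):
    # yields sub-calls instead of calling itself, so no real recursion happens
    if parent[n] == -1:
        yield 0
    else:
        yield (yield depth(parent, parent[n], stack=stack)) + 1

chain = {i: i - 1 for i in range(50000)}
chain[0] = -1
d = depth(chain, 49999, stack=LifoQueue())
print(d)  # 49999
```

A plain recursive version of depth would raise RecursionError on this chain; here the call depth stays constant while the LifoQueue grows instead.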

def find_binary_trees_in(self, index: Sequence[int], overwrite=False):
188    def find_binary_trees_in(self, index: Sequence[int], overwrite=False):
189        """Find the largest binary trees within a subset of nodes of an SWC.
190
191        With this method, you can build binary trees only upon the 'foreground' nodes in the SWC, so you can get
192        traversals for specific structures.
193        Naturally, if the nodes are not contiguous, there will be multiple binary trees. Here, the root of the
194        whole neuron will not be counted even if provided.
195
196        By merging any founded backtracking, it determines which tree each node belongs to, and ensures the trees
197        are as deep as they can be, working like a union-find. The backtracking starts from 'terminals',
198        the nodes without children in the set, rather than the real terminals of the SWC.
199
200        After backtracking, it invokes `NeuronTree.build_a_tree` to build the trees as linked nodes recursively. By
201        default, it uses `BinarySubtree` as the tree node class. You can change it during initialization.
202
203        This method will save the results in the property `NeuronTree.binary_trees`. It's a list and every time you
204        invoke the function, the new results will be appended, unless you specify to overwrite the whole list.
205
206        :param index: A sequence containing nodes including the subtrees to search.
207        :param overwrite: Whether to overwrite the existing trees or extend.
208        """
209
210        index = pd.Index(index)
211        child_count = self._df.loc[index, 'parent'].value_counts()
212        ends = index.difference(child_count.index)
213        sorted_nodes = {}
214        root_index = self._df.index[self._df['parent'] == -1]
215        for n in ends:
216            backtrack = []
217            while n in index and n not in root_index:  # root can't be included in a binary tree, must stop
218                backtrack.append(n)
219                n = self._df.loc[n, 'parent']
220                for root, tree in sorted_nodes.items():
221                    if n in tree:
222                        sorted_nodes[root] |= set(backtrack)
223                        break
224                else:  # no existing tree contains n yet; keep backtracking
225                    continue
226                break  # merged into an existing tree; start from the next terminal
227            else:  # runs when n leaves the node set, i.e. a new tree is found
228                sorted_nodes[backtrack[-1]] = set(backtrack)
229
230        trees = [self.build_a_tree(root, nodes, stack=LifoQueue()) for root, nodes in sorted_nodes.items()]
231        if overwrite:
232            self._trees = trees
233        else:
234            self._trees.extend(trees)

Find the largest binary trees within a subset of nodes of an SWC.

With this method, you can build binary trees only upon the 'foreground' nodes in the SWC, so you can get traversals for specific structures. Naturally, if the nodes are not contiguous, there will be multiple binary trees. Here, the root of the whole neuron will not be counted even if provided.

By merging any found backtracking paths, it determines which tree each node belongs to, and ensures the trees are as deep as possible, working like a union-find. The backtracking starts from 'terminals', the nodes without children within the set, rather than the real terminals of the SWC.

After backtracking, it invokes NeuronTree.build_a_tree to build the trees as linked nodes recursively. By default, it uses BinarySubtree as the tree node class. You can change it during initialization.

This method will save the results in the property NeuronTree.binary_trees. It's a list and every time you invoke the function, the new results will be appended, unless you specify to overwrite the whole list.

Parameters
  • index: A sequence of node IDs within which to search for subtrees.
  • overwrite: Whether to overwrite the existing trees or extend the list.
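To make the merging idea concrete, here is a standalone sketch, not the package's implementation (`find_trees` and `parent` are illustrative names), of backtracking over a plain parent table, where `parent` maps each node ID to its parent and -1 marks the root:

```python
def find_trees(parent, index):
    """Group a node subset into the largest trees it contains.

    parent: dict mapping node ID -> parent ID (-1 for the root).
    index: the subset of node IDs to build trees from.
    """
    index = set(index)
    terminals = index - {parent[n] for n in index}   # no child inside the subset
    trees = {}                                       # root ID -> set of member nodes
    for n in terminals:
        backtrack = []
        while n in index and parent[n] != -1:        # never include the real root
            backtrack.append(n)
            n = parent[n]
            merged = next((nodes for nodes in trees.values() if n in nodes), None)
            if merged is not None:                   # met an existing tree: merge
                merged |= set(backtrack)
                break
        else:                                        # left the subset: a new tree
            trees[backtrack[-1]] = set(backtrack)
    return trees

parent = {1: -1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 1, 7: 6}
trees = find_trees(parent, {2, 3, 4, 5, 6, 7})
# two trees: one rooted at 2 covering {2, 3, 4, 5}, one rooted at 6 covering {6, 7}
```

The result is independent of which terminal is processed first: a path either founds a new tree or is absorbed into the one it runs into, so each tree ends up as deep as the subset allows.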
def collective_traversal(self, ordering='pre', lesser_first=True) -> tuple[int]:
236    def collective_traversal(self, ordering='pre', lesser_first=True) -> tuple[int]:
237        """Obtain a traversal in one of three orders for each binary tree and chain them into one by priority.
238
239        You can specify which of pre-, in- and postorder traversal is performed, as well as whether the lesser
240        subtrees are traversed before the greater ones.
241        This method also chains the traversals of the binary trees into one, applying the same ordering among
242        trees as within each tree.
243
244        :param ordering: The traversal type, either 'pre', 'in', or 'post'.
245        :param lesser_first: Whether to traverse from lesser to greater; affects order both within and among subtrees.
246        :return: A tuple of node IDs.
247        """
248
249        trees = sorted(self._trees, key=lambda x: x.tot_path_len(stack=LifoQueue()), reverse=not lesser_first)
250        if ordering == 'pre':
251            traversal = (t.preorder(lesser_first, stack=LifoQueue()) for t in trees)
252        elif ordering == 'in':
253            traversal = (t.inorder(lesser_first, stack=LifoQueue()) for t in trees)
254        elif ordering == 'post':
255            traversal = (t.postorder(lesser_first, stack=LifoQueue()) for t in trees)
256        else:
257            raise ValueError("Must specify an ordering among 'pre', 'in' and 'post'.")
258        return sum(traversal, ())

Obtain a traversal in one of three orders for each binary tree and chain them into one by priority.

You can specify which of pre-, in- and postorder traversal is performed, as well as whether the lesser subtrees are traversed before the greater ones. This method also chains the traversals of the binary trees into one, applying the same ordering among trees as within each tree.

Parameters
  • ordering: The traversal type, either 'pre', 'in', or 'post'.
  • lesser_first: Whether to traverse from lesser to greater; affects order both within and among subtrees.
Returns

A tuple of node IDs.
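The chaining step can be illustrated with made-up per-tree results (the traversal tuples and path lengths below are arbitrary): trees are sorted by their total path length and the traversals are concatenated.

```python
# hypothetical (preorder tuple, total path length) pairs for two binary trees
per_tree = [((4, 2, 5), 10.0), ((7, 6), 3.0)]

# lesser_first=True: sort trees ascending by weight, then concatenate traversals
ordered = sorted(per_tree, key=lambda t: t[1])
flat = sum((trav for trav, _ in ordered), ())
# flat == (7, 6, 4, 2, 5)
```

With `lesser_first=False` the sort is simply reversed, mirroring how the flag also flips the order within each subtree.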

class BinarySubtree:
 12class BinarySubtree:
 13    """Binary tree implemented as linked nodes.
 14    See [the above guide](#binarysubtree) for a more detailed description of its principles.
 15
 16    The reason why the tree is reconstructed instead of using the form of SWC comes from the benefit of node-level
 17    caching and easy recursion design.
 18
 19    Its definition still differs from a general binary tree in some ways. It orders children by weight,
 20    naming them 'lesser' and 'greater' rather than left and right. When a node has a single, linearly
 21    connected child, that child is stored as the lesser one.
 22
 23    The weight is calculated as the total path length of each subtree, so
 24    a subtree's structure must be determined before its parent can be built. The construction of the tree has to be
 25    recursive, and once it's completed, there should be no further modification.
 26
 27    This way, all properties obtained by recursion are deterministic, so we can apply caching to them, such as the
 28    total path length itself and the traversals.
 29    """
 30
 31    def __init__(self, key: int, coord: Sequence[float], children: Sequence):
 32        """Set up a tree node with already constructed children nodes.
 33
 34        :param key: The SWC node ID.
 35        :param coord: The 3D coordinate of the node.
 36        :param children: A sequence containing at most 2 already constructed child BinarySubtree.
 37        """
 38        self._key = key
 39        self._coord = np.array(coord)
 40        assert len(children) <= 2
 41        self._lesser: Optional[BinarySubtree] = None
 42        self._greater: Optional[BinarySubtree] = None
 43        if len(children) == 2:
 44            if children[0].tot_path_len(stack=LifoQueue()) <= children[1].tot_path_len(stack=LifoQueue()):
 45                self._lesser = children[0]
 46                self._greater = children[1]
 47            else:
 48                self._lesser = children[1]
 49                self._greater = children[0]
 50        elif len(children) == 1:
 51            self._lesser = children[0]
 52
 53    @property
 54    def key(self) -> int:
 55        """Node ID."""
 56        return self._key
 57
 58    @property
 59    def coord(self) -> Sequence[float]:
 60        """Node coordinate."""
 61        return self._coord
 62
 63    @property
 64    def lesser(self) -> 'BinarySubtree':
 65        """Child tree node of the subtree with lesser total path length."""
 66        return self._lesser
 67
 68    @property
 69    def greater(self) -> 'BinarySubtree':
 70        """Child tree node of the subtree with greater total path length."""
 71        return self._greater
 72
 73    @cache
 74    @safe_recursion_bootstrap
 75    def tot_path_len(self, stack: LifoQueue) -> float:
 76        """
 77        Recursive sum of path length of the subtree, implemented as recursion safe, and with a cache feature.
 78        It can be used as a normal recursion, but must be provided with a new stack of `LifoQueue`.
 79
 80        Usually this value is already calculated and cached after the tree construction completes.
 81
 82        :param stack: A newly initialized `LifoQueue` passed by keyword, will be passed down in the recursion.
 83        :return: Total path length of this subtree.
 84        """
 85        s = 0.0
 86        if self._lesser is not None:
 87            s += (yield self._lesser.tot_path_len(stack=stack)) + np.linalg.norm(self._coord - self._lesser.coord)
 88        if self._greater is not None:
 89            s += (yield self._greater.tot_path_len(stack=stack)) + np.linalg.norm(self._coord - self._greater.coord)
 90        yield s
 91
 92    @cache
 93    @safe_recursion_bootstrap
 94    def preorder(self, lesser_first, stack: LifoQueue) -> tuple[int]:
 95        """
 96        Preorder traversal for this subtree, implemented as recursion safe, and with a cache feature.
 97        It can be used as a normal recursion, but must be provided with a new stack of `LifoQueue`.
 98
 99        :param lesser_first: Whether to traverse the lesser subtree before the greater.
100        :param stack: A newly initialized `LifoQueue` passed by keyword, will be passed down in the recursion.
101        :return: A tuple of node IDs.
102        """
103        a = (yield self._lesser.preorder(lesser_first, stack=stack)) if self._lesser is not None else tuple()
104        b = (yield self._greater.preorder(lesser_first, stack=stack)) if self._greater is not None else tuple()
105        yield (self._key,) + a + b if lesser_first else (self._key,) + b + a
106
107    @cache
108    @safe_recursion_bootstrap
109    def inorder(self, lesser_first, stack: LifoQueue) -> tuple[int]:
110        """
111        Inorder traversal for this subtree, implemented as recursion safe, and with a cache feature.
112        It can be used as a normal recursion, but must be provided with a new stack of `LifoQueue`.
113
114        :param lesser_first: Whether to traverse the lesser subtree before the greater.
115        :param stack: A newly initialized `LifoQueue` passed by keyword, will be passed down in the recursion.
116        :return: A tuple of node IDs.
117        """
118        a = (yield self._lesser.inorder(lesser_first, stack=stack)) if self._lesser is not None else tuple()
119        b = (yield self._greater.inorder(lesser_first, stack=stack)) if self._greater is not None else tuple()
120        yield a + (self._key,) + b if lesser_first else b + (self._key,) + a
121
122    @cache
123    @safe_recursion_bootstrap
124    def postorder(self, lesser_first, stack: LifoQueue) -> tuple[int]:
125        """
126        Postorder traversal for this subtree, implemented as recursion safe, and with a cache feature.
127        It can be used as a normal recursion, but must be provided with a new stack of `LifoQueue`.
128
129        :param lesser_first: Whether to traverse the lesser subtree before the greater.
130        :param stack: A newly initialized `LifoQueue` passed by keyword, will be passed down in the recursion.
131        :return: A tuple of node IDs.
132        """
133        a = (yield self._lesser.postorder(lesser_first, stack=stack)) if self._lesser is not None else tuple()
134        b = (yield self._greater.postorder(lesser_first, stack=stack)) if self._greater is not None else tuple()
135        yield a + b + (self._key,) if lesser_first else b + a + (self._key,)

Binary tree implemented as linked nodes. See the above guide for a more detailed description of its principles.

The reason why the tree is reconstructed instead of using the form of SWC comes from the benefit of node-level caching and easy recursion design.

Its definition still differs from a general binary tree in some ways. It orders children by weight, naming them 'lesser' and 'greater' rather than left and right. When a node has a single, linearly connected child, that child is stored as the lesser one.

The weight is calculated as the total path length of each subtree, so a subtree's structure must be determined before its parent can be built. The construction of the tree has to be recursive, and once it's completed, there should be no further modification.

This way, all properties obtained by recursion are deterministic, so we can apply caching to them, such as the total path length itself and the traversals.

BinarySubtree(key: int, coord: Sequence[float], children: Sequence)
31    def __init__(self, key: int, coord: Sequence[float], children: Sequence):
32        """Set up a tree node with already constructed children nodes.
33
34        :param key: The SWC node ID.
35        :param coord: The 3D coordinate of the node.
36        :param children: A sequence containing at most 2 already constructed child BinarySubtree.
37        """
38        self._key = key
39        self._coord = np.array(coord)
40        assert len(children) <= 2
41        self._lesser: Optional[BinarySubtree] = None
42        self._greater: Optional[BinarySubtree] = None
43        if len(children) == 2:
44            if children[0].tot_path_len(stack=LifoQueue()) <= children[1].tot_path_len(stack=LifoQueue()):
45                self._lesser = children[0]
46                self._greater = children[1]
47            else:
48                self._lesser = children[1]
49                self._greater = children[0]
50        elif len(children) == 1:
51            self._lesser = children[0]

Set up a tree node with already constructed children nodes.

Parameters
  • key: The SWC node ID.
  • coord: The 3D coordinate of the node.
  • children: A sequence containing at most 2 already constructed child BinarySubtree.
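The constructor logic can be mirrored in a minimal standalone analogue (illustrative `MiniNode`, plain recursion, no caching or recursion safety): the heavier child, by total path length, becomes `greater`, and a lone child becomes `lesser`.

```python
import math

class MiniNode:
    """Standalone analogue of the constructor logic, without caching or a stack."""
    def __init__(self, key, coord, children=()):
        self.key, self.coord = key, coord
        assert len(children) <= 2
        self.lesser = self.greater = None
        if len(children) == 2:
            # the subtree with greater total path length becomes 'greater'
            self.lesser, self.greater = sorted(children, key=MiniNode.tot_path_len)
        elif len(children) == 1:      # a single, linearly connected child is 'lesser'
            self.lesser = children[0]

    def tot_path_len(self):
        return sum(c.tot_path_len() + math.dist(self.coord, c.coord)
                   for c in (self.lesser, self.greater) if c is not None)

leaf_a = MiniNode(3, (0.0, 1.0, 0.0))
leaf_c = MiniNode(5, (5.0, 0.0, 1.0))
mid_b = MiniNode(4, (5.0, 0.0, 0.0), [leaf_c])        # subtree path length 1.0
root = MiniNode(1, (0.0, 0.0, 0.0), [mid_b, leaf_a])  # leaf_a (weight 0.0) becomes 'lesser'
```

Note that the children must be fully constructed before the parent, since their weights decide the `lesser`/`greater` assignment; this is why the real class is built bottom-up and treated as immutable afterwards.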
key: int

Node ID.

coord: Sequence[float]

Node coordinate.

lesser: 'BinarySubtree'

Child tree node of the subtree with lesser total path length.

greater: 'BinarySubtree'

Child tree node of the subtree with greater total path length.

def tot_path_len(*args, stack: queue.LifoQueue, **kwargs):
36    def wrapped_func(*args, stack: LifoQueue, **kwargs):
37        if not stack.empty():               # where next goes to, return the generator directly
38            return func(*args, stack=stack, **kwargs)
39        recur = func(*args, stack=stack, **kwargs)       # initialized generator
40        while True:
41            if isinstance(recur, Generator):
42                stack.put(recur)
43                recur = next(recur)
44            else:       # it's a number then, computation done for this branch
45                stack.get()
46                if stack.empty():
47                    break
48                recur = stack.queue[-1].send(recur)     # send the result back to its parent
49        return recur

Recursive sum of path length of the subtree, implemented as recursion safe, and with a cache feature. It can be used as a normal recursion, but must be provided with a new stack of LifoQueue.

Usually this value is already calculated and cached after the tree construction completes.

Parameters
  • stack: A newly initialized LifoQueue passed by keyword, will be passed down in the recursion.
Returns

Total path length of this subtree.
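The wrapper source shown above implements a generator "trampoline": recursive calls yield sub-generators onto an explicit stack, so deep trees never hit Python's recursion limit. A runnable sketch of the same pattern, minus the caching layer and with an illustrative `depth` function in place of the tree methods:

```python
from collections.abc import Generator
from queue import LifoQueue

def safe_recursion_bootstrap(func):
    """Drive a generator-style recursion on an explicit stack (no caching here)."""
    def wrapped(*args, stack: LifoQueue, **kwargs):
        if not stack.empty():            # nested call: hand the raw generator back
            return func(*args, stack=stack, **kwargs)
        recur = func(*args, stack=stack, **kwargs)
        while True:
            if isinstance(recur, Generator):
                stack.put(recur)         # descend into the sub-call
                recur = next(recur)
            else:                        # a final value: pop and resume the parent
                stack.get()
                if stack.empty():
                    break
                recur = stack.queue[-1].send(recur)
        return recur
    return wrapped

@safe_recursion_bootstrap
def depth(node, stack: LifoQueue):
    """Depth of a tree of nested tuples, e.g. ((), ((),)) has depth 3."""
    best = 0
    for child in node:
        best = max(best, (yield depth(child, stack=stack)))
    yield best + 1

print(depth((((),), ()), stack=LifoQueue()))  # → 3, works far beyond the recursion limit
```

Each call yields where a plain recursion would call itself, and yields its final value where a plain recursion would return; the driver loop feeds results back into suspended parents with `send`. This is why every entry point needs a fresh `LifoQueue` passed by keyword.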

def preorder(*args, stack: queue.LifoQueue, **kwargs):

Preorder traversal for this subtree, implemented as recursion safe, and with a cache feature. It can be used as a normal recursion, but must be provided with a new stack of LifoQueue.

Parameters
  • lesser_first: Whether to traverse the lesser subtree before the greater.
  • stack: A newly initialized LifoQueue passed by keyword, will be passed down in the recursion.
Returns

A tuple of node IDs.

def inorder(*args, stack: queue.LifoQueue, **kwargs):

Inorder traversal for this subtree, implemented as recursion safe, and with a cache feature. It can be used as a normal recursion, but must be provided with a new stack of LifoQueue.

Parameters
  • lesser_first: Whether to traverse the lesser subtree before the greater.
  • stack: A newly initialized LifoQueue passed by keyword, will be passed down in the recursion.
Returns

A tuple of node IDs.

def postorder(*args, stack: queue.LifoQueue, **kwargs):

Postorder traversal for this subtree, implemented as recursion safe, and with a cache feature. It can be used as a normal recursion, but must be provided with a new stack of LifoQueue.

Parameters
  • lesser_first: Whether to traverse the lesser subtree before the greater.
  • stack: A newly initialized LifoQueue passed by keyword, will be passed down in the recursion.
Returns

A tuple of node IDs.
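As a cross-check for the three orders, here is a plain recursive reference (illustrative `Node` and `traverse` names; no caching or recursion safety) over a lesser/greater node:

```python
from collections import namedtuple

Node = namedtuple('Node', 'key lesser greater')

def traverse(n, order='pre', lesser_first=True):
    """Return the traversal of a lesser/greater binary tree as a tuple of keys."""
    if n is None:
        return ()
    a = traverse(n.lesser, order, lesser_first)
    b = traverse(n.greater, order, lesser_first)
    first, second = (a, b) if lesser_first else (b, a)
    if order == 'pre':
        return (n.key,) + first + second
    if order == 'in':
        return first + (n.key,) + second
    return first + second + (n.key,)     # 'post'

t = Node(1, Node(2, None, None), Node(3, Node(4, None, None), None))
print(traverse(t, 'pre'))   # → (1, 2, 3, 4)
print(traverse(t, 'in'))    # → (2, 1, 4, 3)
print(traverse(t, 'post'))  # → (2, 4, 3, 1)
```

The `lesser_first` flag simply swaps which child's traversal is emitted first, matching the `a`/`b` swap in the cached methods above.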