dsmtools.utils.swc
This module contains basic handlers for SWC as pandas dataframes.
What is SWC
An SWC is a text file containing a space-delimited table that holds all the geometric data required for reconstructing a neuron morphology. Usually with 7 columns, illustrated by:
# n type x y z r parent
1 1 0 0 0 1 -1
2 1 2 0 0 1 1
3 2 -3 0 0 0.7 1
4 3 20 0 0 1 2
- The 1st column is node ID, must be a positive integer.
- The 2nd column is node type: 1 for soma, 2 for axon, 3 for basal dendrite, 4 for apical dendrite.
- The 3rd-5th columns are the 3D coordinates of the node in space.
- The 6th column is the node radius.
- The 7th column is the parent node ID, corresponding to some record in the first column. As all node IDs are positive, -1 is used for no parent, and the node without a parent is a root.
SWC Files allow you to add comments started in each line with '#', including the header. The naming of the columns in the file, as such, does not really matter. The dataframe column names used in this package are consistent with the previous example.
Pandas Dataframe
The pandas package is perfect for handling table data like what SWC files typically hold. The retrieval of nodes as records is based on the hash table index of node IDs, and nodes can also be easily filtered with logical operations.
However, as objects in Python are usually passed by reference and pandas indexing can introduce shallow copy of dataframes, you should make yourself familiar with the usage of the pandas package and avoid the problems for yourself. Functions in this module have already been implemented in a way that is minimally bug-causing.
1"""This module contains basic handlers for SWC as pandas dataframes. 2 3## What is SWC 4 5An SWC is a text file containing a space-delimited table that holds all the geometric data required for 6reconstructing a neuron morphology. Usually with 7 columns, illustrated by: 7 8``` 9# n type x y z r parent 101 1 0 0 0 1 -1 112 1 2 0 0 1 1 123 2 -3 0 0 0.7 1 134 3 20 0 0 1 2 14``` 15 16* The 1st column is node ID, must be a positive integer. 17* The 2nd column is node type: 1 for soma, 2 for axon, 3 for basal dendrite, 4 for apical dendrite. 18* The 3rd-5th columns are the 3D coordinates of the node in space. 19* The 6th column is the node radius. 20* The 7th column is the parent node ID, corresponding to some record in the first column. As all node IDs are positive, 21-1 is used for no parent, and the node without a parent is a root. 22 23SWC Files allow you to add comments started in each line with '#', including the header. The naming of the columns 24in the file, as such, does not really matter. The dataframe column names used in this package 25are consistent with the previous example. 26 27## Pandas Dataframe 28 29The [pandas](https://pandas.pydata.org/) package is perfect for handling table data like what SWC files typically hold. 30The retrieval of nodes as 31records is based on the hash table index of node IDs, and nodes can also be easily filtered with logical operations. 32 33However, as objects in Python are usually passed by reference and pandas indexing can introduce shallow copy of 34dataframes, you should make yourself familiar with the usage of the pandas package and avoid the problems for yourself. 35Functions in this module have already been implemented in a way that is minimally bug-causing. 36""" 37import warnings 38from typing import Iterable, Union, Sequence 39from os import PathLike 40from enum import IntEnum 41from queue import LifoQueue, SimpleQueue 42from .misc import safe_recursion_bootstrap 43 44import numpy as np 45import pandas as pd 46 47 48__all__ = ['read_swc', 'sort_swc', 'get_child_dict', 'get_path_len', 'get_segments', 'get_subtree_nodes'] 49 50 51class Type(IntEnum): 52 """ 53 An IntEnum class for clear type identification in the code. Numbering is consistent with the general SWC protocol. 54 """ 55 SOMA = 1 56 AXON = 2 57 BASAL = 3 58 APICAL = 4 59 60 61def read_swc(path: Union[str, 'PathLike[str]'], more_features: Sequence[str] = tuple()) -> pd.DataFrame: 62 """A wrapper reading function for SWC based on `pandas.read_csv`. 63 64 This wrapper does the following job more than just load the 7-column table: 65 66 * If your table is delimited by comma, it's still readable with this wrapper. 67 * Specification of `more_features` with more column names to load more features on the nodes. 68 69 :param path: A string or path like object to an SWC file. 70 :param more_features: A sequence of column names with proper order, default as empty. 71 :return: A `pandas.DataFrame`. 72 """ 73 n_skip = 0 74 with open(path, "r") as f: 75 for line in f.readlines(): 76 line = line.strip() 77 if line.startswith("#"): 78 n_skip += 1 79 else: 80 break 81 names = ["n", "type", "x", "y", "z", "r", "parent"] 82 used_cols = [0, 1, 2, 3, 4, 5, 6] 83 if more_features: 84 names.extend(more_features) 85 used_cols = list(range(len(used_cols) + len(more_features))) 86 try: 87 df = pd.read_csv(path, index_col=0, skiprows=n_skip, sep=" ", 88 usecols=used_cols, 89 names=names 90 ) 91 except: 92 df = pd.read_csv(path, index_col=0, skiprows=n_skip, sep=",", 93 usecols=used_cols, 94 names=names 95 ) 96 if df['parent'].iloc[0] != -1: 97 print('In func readSWC: ', path.split('\\')[-1], ' not sorted') 98 return df 99 100 101def sort_swc(swc: pd.DataFrame) -> None: 102 """Use recursion to give the SWC a new set of id starting from 1, and make sure the ID of parents are smaller than 103 their children. 104 105 The SWC has to contain only 1 root (parent as -1), and the modification is inplace (doesn't return a new dataframe). 106 As you may ask, it is different from the Vaa3D *sort_neuron_swc* plugin, it doesn't use a LUT and node with the same 107 coordinates and radius won't be merged, which is why only a single root is allowed. 108 109 The recursion used by this piece of code is safe from overflow. You don't need to reset the recursion limit. Yet, it 110 doesn't mean it wouldn't fall into an endless loop. Try not to input a ring-like structure. 111 112 :param swc: An SWC dataframe to be sorted. 113 """ 114 115 assert 'n' not in swc.columns 116 root = swc.loc[swc['parent'] == -1].index 117 assert len(root) == 1 118 root = root[0] 119 120 child_dict = get_child_dict(swc) 121 ind = [0] 122 swc['n'] = -1 123 124 @safe_recursion_bootstrap 125 def dfs(node: int, stack: LifoQueue): 126 ind[0] += 1 127 swc.loc[node, 'n'] = ind[0] 128 for i in child_dict[node]: 129 yield dfs(i, stack=stack) 130 yield 131 132 dfs(root, stack=LifoQueue()) 133 par_to_change = swc.loc[swc['parent'] != -1, 'parent'] 134 swc.loc[par_to_change.index, 'parent'] = swc.loc[par_to_change, 'n'].tolist() 135 swc.set_index('n', inplace=True) 136 swc.sort_index(inplace=True) 137 138 139def get_child_dict(swc: pd.DataFrame) -> dict[int, list[int]]: 140 """Get a Python dict object that maps from the SWC node IDs to their children. 141 142 As for the SWC dataframe, you can find the parent for a node by index a hash table, but it's not so straightforward 143 for locating their children. Therefore, a one-to-many mapping is needed. This function iterates through the SWC 144 to document the children for each node as a list, and assemble them as a dict, so that you can find a node's 145 children with constant time as well. 146 147 Nodes without any children will be given an empty list. 148 149 :param swc: An SWC dataframe, no modification takes place. 150 :return:A Python dict mapping from node ID as int to lists of children node IDs. 151 """ 152 child_dict = dict([(i, []) for i in swc.index]) 153 for ind, row in swc.iterrows(): 154 p_idx = int(row['parent']) 155 if p_idx in child_dict: 156 child_dict[p_idx].append(ind) 157 else: 158 if p_idx != -1: 159 warnings.warn(f"Node index {p_idx} doesn't exist in SWC, but referenced as parent.") 160 return child_dict 161 162 163def get_path_len(swc: pd.DataFrame, nodes: Iterable[int] = None) -> float: 164 """Calculate the total path length within an SWC, given a subset of node IDs. 165 166 The path length is defined as the sum of the cartesian distance between pairs of connected nodes, which 167 can be a good indicator for the size of a neuron structure. 168 169 This function also offers an argument to select only part of the neuron tree to calculate the path length. The nodes 170 specified are seen as children nodes. 171 172 This SWC can also be one without a parent. The aggregation overlooks nodes without parent found. 173 174 :param swc: An SWC dataframe, no modification takes place. 175 :param nodes: A set of node indices as children among the connections. 176 :return: The sum of path length. 177 """ 178 if nodes is None: 179 down_nodes = swc.index[swc['parent'].isin(swc.index)] 180 else: 181 par = swc.loc[pd.Index(nodes).intersection(swc.index), 'parent'] # input nodes must be in index 182 down_nodes = par.index[par.isin(swc.index)] # their parents must be in index 183 up_nodes = swc.loc[down_nodes, 'parent'] 184 return np.linalg.norm(swc.loc[down_nodes, ['x', 'y', 'z']].values - 185 swc.loc[up_nodes, ['x', 'y', 'z']].values, axis=1).sum() 186 187 188def get_segments(swc: pd.DataFrame) -> pd.DataFrame: 189 """Convert SWC to a new dataframe of segments. 190 191 A segment is defined as a series connected points between critical nodes (root, branches & terminals), 192 usually start from the first node out of the branching point and ends at the next branching point or terminal. 193 194 The conversion can manifest the topology of a neuron tree and convenience some operations at segment levels, such 195 as pruning. 196 197 Utilizing pandas powerful expressiveness, this function will set up a new dataframe object formatted like: 198 199 | tail(index) | nodes | parentSeg | childSeg | 200 |-------------|-------------------|-----------|-------------| 201 | 1 | [1] | -1 | [2, 20, 90] | 202 | ... | ... | ... | ... | 203 | 26 | [26, 25, ..., 18] | 17 | [27, 50] | 204 | ... | ... | ... | ... | 205 206 The index is set as the end point of each segment, which are unique. The *nodes* field contains the list of nodes 207 for the segment from the end to the start. The *parentSeg* field is similar to the parent column of and SWC. 208 Also, the *childSeg* field functions like what `get_child_dict` would do for an SWC. 209 210 Root nodes are seen as a single segment, with *parentSeg* as -1. 211 212 :param swc: An SWC dataframe, no modification takes place. 213 :return: A new dataframe similar to SWC, seeing segments as nodes. 214 """ 215 children_counts = swc['parent'].value_counts() 216 branch = children_counts.index[children_counts.values > 1] 217 terminal = swc.index.difference(swc.loc[filter(lambda x: x != -1, swc['parent'])].index) 218 219 # assign each node with a segment id 220 # seg: from soam_far 2 soam_near or from terminal2(branch-1) or branch2(branch2branch-1) 221 # (down->up) 222 segments = pd.DataFrame(columns=['tail', 'nodes', 'parentSeg', 'childSeg']) 223 segments.set_index('tail', inplace=True) 224 temp_index = set(swc.index) 225 for ind in terminal: 226 seg = [] 227 while True: 228 seg.append(ind) 229 temp_index.remove(ind) 230 ind = swc.loc[ind, 'parent'] 231 if ind in branch or ind == -1: # soma will be a single node segment if it's a branch point 232 segments.loc[seg[0]] = [seg, ind, []] 233 seg = [] 234 if ind not in temp_index: 235 break 236 237 for ind in segments.index.intersection(branch): 238 child_seg_id = segments.index[segments['parentSeg'] == ind].tolist() 239 segments.at[ind, 'childSeg'] = child_seg_id 240 segments.loc[child_seg_id, 'parentSeg'] = ind 241 242 return segments 243 244 245def get_subtree_nodes(swc: pd.DataFrame, n_start: int) -> list[int]: 246 """Retrieve IDs of all nodes under a specified node in an SWC by BFS. 247 248 **Note**: there should be no ring in the SWC, or it will raise an error. 249 250 :param swc: An SWC dataframe, no modification takes place. 251 :param n_start: the node ID of the subtree root to start with. 252 :return: A list of node IDs. 253 """ 254 sq = SimpleQueue() 255 sq.put(n_start) 256 visited = set() 257 child_dict = get_child_dict(swc) 258 while not sq.empty(): 259 head = sq.get() 260 visited.add(head) 261 for ch in child_dict[head]: 262 if ch in visited: 263 raise RuntimeError("Detected a ring in the SWC, couldn't a get subtree.") 264 else: 265 sq.put(ch) 266 return list(visited)
62def read_swc(path: Union[str, 'PathLike[str]'], more_features: Sequence[str] = tuple()) -> pd.DataFrame: 63 """A wrapper reading function for SWC based on `pandas.read_csv`. 64 65 This wrapper does the following job more than just load the 7-column table: 66 67 * If your table is delimited by comma, it's still readable with this wrapper. 68 * Specification of `more_features` with more column names to load more features on the nodes. 69 70 :param path: A string or path like object to an SWC file. 71 :param more_features: A sequence of column names with proper order, default as empty. 72 :return: A `pandas.DataFrame`. 73 """ 74 n_skip = 0 75 with open(path, "r") as f: 76 for line in f.readlines(): 77 line = line.strip() 78 if line.startswith("#"): 79 n_skip += 1 80 else: 81 break 82 names = ["n", "type", "x", "y", "z", "r", "parent"] 83 used_cols = [0, 1, 2, 3, 4, 5, 6] 84 if more_features: 85 names.extend(more_features) 86 used_cols = list(range(len(used_cols) + len(more_features))) 87 try: 88 df = pd.read_csv(path, index_col=0, skiprows=n_skip, sep=" ", 89 usecols=used_cols, 90 names=names 91 ) 92 except: 93 df = pd.read_csv(path, index_col=0, skiprows=n_skip, sep=",", 94 usecols=used_cols, 95 names=names 96 ) 97 if df['parent'].iloc[0] != -1: 98 print('In func readSWC: ', path.split('\\')[-1], ' not sorted') 99 return df
A wrapper reading function for SWC based on pandas.read_csv
.
This wrapper does the following job more than just load the 7-column table:
- If your table is delimited by comma, it's still readable with this wrapper.
- Specification of
more_features
with more column names to load more features on the nodes.
Parameters
- path: A string or path like object to an SWC file.
- more_features: A sequence of column names with proper order, default as empty.
Returns
A
pandas.DataFrame
.
102def sort_swc(swc: pd.DataFrame) -> None: 103 """Use recursion to give the SWC a new set of id starting from 1, and make sure the ID of parents are smaller than 104 their children. 105 106 The SWC has to contain only 1 root (parent as -1), and the modification is inplace (doesn't return a new dataframe). 107 As you may ask, it is different from the Vaa3D *sort_neuron_swc* plugin, it doesn't use a LUT and node with the same 108 coordinates and radius won't be merged, which is why only a single root is allowed. 109 110 The recursion used by this piece of code is safe from overflow. You don't need to reset the recursion limit. Yet, it 111 doesn't mean it wouldn't fall into an endless loop. Try not to input a ring-like structure. 112 113 :param swc: An SWC dataframe to be sorted. 114 """ 115 116 assert 'n' not in swc.columns 117 root = swc.loc[swc['parent'] == -1].index 118 assert len(root) == 1 119 root = root[0] 120 121 child_dict = get_child_dict(swc) 122 ind = [0] 123 swc['n'] = -1 124 125 @safe_recursion_bootstrap 126 def dfs(node: int, stack: LifoQueue): 127 ind[0] += 1 128 swc.loc[node, 'n'] = ind[0] 129 for i in child_dict[node]: 130 yield dfs(i, stack=stack) 131 yield 132 133 dfs(root, stack=LifoQueue()) 134 par_to_change = swc.loc[swc['parent'] != -1, 'parent'] 135 swc.loc[par_to_change.index, 'parent'] = swc.loc[par_to_change, 'n'].tolist() 136 swc.set_index('n', inplace=True) 137 swc.sort_index(inplace=True)
Use recursion to give the SWC a new set of id starting from 1, and make sure the ID of parents are smaller than their children.
The SWC has to contain only 1 root (parent as -1), and the modification is inplace (doesn't return a new dataframe). As you may ask, it is different from the Vaa3D sort_neuron_swc plugin, it doesn't use a LUT and node with the same coordinates and radius won't be merged, which is why only a single root is allowed.
The recursion used by this piece of code is safe from overflow. You don't need to reset the recursion limit. Yet, it doesn't mean it wouldn't fall into an endless loop. Try not to input a ring-like structure.
Parameters
- swc: An SWC dataframe to be sorted.
140def get_child_dict(swc: pd.DataFrame) -> dict[int, list[int]]: 141 """Get a Python dict object that maps from the SWC node IDs to their children. 142 143 As for the SWC dataframe, you can find the parent for a node by index a hash table, but it's not so straightforward 144 for locating their children. Therefore, a one-to-many mapping is needed. This function iterates through the SWC 145 to document the children for each node as a list, and assemble them as a dict, so that you can find a node's 146 children with constant time as well. 147 148 Nodes without any children will be given an empty list. 149 150 :param swc: An SWC dataframe, no modification takes place. 151 :return:A Python dict mapping from node ID as int to lists of children node IDs. 152 """ 153 child_dict = dict([(i, []) for i in swc.index]) 154 for ind, row in swc.iterrows(): 155 p_idx = int(row['parent']) 156 if p_idx in child_dict: 157 child_dict[p_idx].append(ind) 158 else: 159 if p_idx != -1: 160 warnings.warn(f"Node index {p_idx} doesn't exist in SWC, but referenced as parent.") 161 return child_dict
Get a Python dict object that maps from the SWC node IDs to their children.
As for the SWC dataframe, you can find the parent for a node by index a hash table, but it's not so straightforward for locating their children. Therefore, a one-to-many mapping is needed. This function iterates through the SWC to document the children for each node as a list, and assemble them as a dict, so that you can find a node's children with constant time as well.
Nodes without any children will be given an empty list.
Parameters
- swc: An SWC dataframe, no modification takes place.
Returns
A Python dict mapping from node ID as int to lists of children node IDs.
164def get_path_len(swc: pd.DataFrame, nodes: Iterable[int] = None) -> float: 165 """Calculate the total path length within an SWC, given a subset of node IDs. 166 167 The path length is defined as the sum of the cartesian distance between pairs of connected nodes, which 168 can be a good indicator for the size of a neuron structure. 169 170 This function also offers an argument to select only part of the neuron tree to calculate the path length. The nodes 171 specified are seen as children nodes. 172 173 This SWC can also be one without a parent. The aggregation overlooks nodes without parent found. 174 175 :param swc: An SWC dataframe, no modification takes place. 176 :param nodes: A set of node indices as children among the connections. 177 :return: The sum of path length. 178 """ 179 if nodes is None: 180 down_nodes = swc.index[swc['parent'].isin(swc.index)] 181 else: 182 par = swc.loc[pd.Index(nodes).intersection(swc.index), 'parent'] # input nodes must be in index 183 down_nodes = par.index[par.isin(swc.index)] # their parents must be in index 184 up_nodes = swc.loc[down_nodes, 'parent'] 185 return np.linalg.norm(swc.loc[down_nodes, ['x', 'y', 'z']].values - 186 swc.loc[up_nodes, ['x', 'y', 'z']].values, axis=1).sum()
Calculate the total path length within an SWC, given a subset of node IDs.
The path length is defined as the sum of the cartesian distance between pairs of connected nodes, which can be a good indicator for the size of a neuron structure.
This function also offers an argument to select only part of the neuron tree to calculate the path length. The nodes specified are seen as children nodes.
This SWC can also be one without a parent. The aggregation overlooks nodes without parent found.
Parameters
- swc: An SWC dataframe, no modification takes place.
- nodes: A set of node indices as children among the connections.
Returns
The sum of path length.
189def get_segments(swc: pd.DataFrame) -> pd.DataFrame: 190 """Convert SWC to a new dataframe of segments. 191 192 A segment is defined as a series connected points between critical nodes (root, branches & terminals), 193 usually start from the first node out of the branching point and ends at the next branching point or terminal. 194 195 The conversion can manifest the topology of a neuron tree and convenience some operations at segment levels, such 196 as pruning. 197 198 Utilizing pandas powerful expressiveness, this function will set up a new dataframe object formatted like: 199 200 | tail(index) | nodes | parentSeg | childSeg | 201 |-------------|-------------------|-----------|-------------| 202 | 1 | [1] | -1 | [2, 20, 90] | 203 | ... | ... | ... | ... | 204 | 26 | [26, 25, ..., 18] | 17 | [27, 50] | 205 | ... | ... | ... | ... | 206 207 The index is set as the end point of each segment, which are unique. The *nodes* field contains the list of nodes 208 for the segment from the end to the start. The *parentSeg* field is similar to the parent column of and SWC. 209 Also, the *childSeg* field functions like what `get_child_dict` would do for an SWC. 210 211 Root nodes are seen as a single segment, with *parentSeg* as -1. 212 213 :param swc: An SWC dataframe, no modification takes place. 214 :return: A new dataframe similar to SWC, seeing segments as nodes. 215 """ 216 children_counts = swc['parent'].value_counts() 217 branch = children_counts.index[children_counts.values > 1] 218 terminal = swc.index.difference(swc.loc[filter(lambda x: x != -1, swc['parent'])].index) 219 220 # assign each node with a segment id 221 # seg: from soam_far 2 soam_near or from terminal2(branch-1) or branch2(branch2branch-1) 222 # (down->up) 223 segments = pd.DataFrame(columns=['tail', 'nodes', 'parentSeg', 'childSeg']) 224 segments.set_index('tail', inplace=True) 225 temp_index = set(swc.index) 226 for ind in terminal: 227 seg = [] 228 while True: 229 seg.append(ind) 230 temp_index.remove(ind) 231 ind = swc.loc[ind, 'parent'] 232 if ind in branch or ind == -1: # soma will be a single node segment if it's a branch point 233 segments.loc[seg[0]] = [seg, ind, []] 234 seg = [] 235 if ind not in temp_index: 236 break 237 238 for ind in segments.index.intersection(branch): 239 child_seg_id = segments.index[segments['parentSeg'] == ind].tolist() 240 segments.at[ind, 'childSeg'] = child_seg_id 241 segments.loc[child_seg_id, 'parentSeg'] = ind 242 243 return segments
Convert SWC to a new dataframe of segments.
A segment is defined as a series connected points between critical nodes (root, branches & terminals), usually start from the first node out of the branching point and ends at the next branching point or terminal.
The conversion can manifest the topology of a neuron tree and convenience some operations at segment levels, such as pruning.
Utilizing pandas powerful expressiveness, this function will set up a new dataframe object formatted like:
tail(index) | nodes | parentSeg | childSeg |
---|---|---|---|
1 | [1] | -1 | [2, 20, 90] |
... | ... | ... | ... |
26 | [26, 25, ..., 18] | 17 | [27, 50] |
... | ... | ... | ... |
The index is set as the end point of each segment, which are unique. The nodes field contains the list of nodes
for the segment from the end to the start. The parentSeg field is similar to the parent column of and SWC.
Also, the childSeg field functions like what get_child_dict
would do for an SWC.
Root nodes are seen as a single segment, with parentSeg as -1.
Parameters
- swc: An SWC dataframe, no modification takes place.
Returns
A new dataframe similar to SWC, seeing segments as nodes.
246def get_subtree_nodes(swc: pd.DataFrame, n_start: int) -> list[int]: 247 """Retrieve IDs of all nodes under a specified node in an SWC by BFS. 248 249 **Note**: there should be no ring in the SWC, or it will raise an error. 250 251 :param swc: An SWC dataframe, no modification takes place. 252 :param n_start: the node ID of the subtree root to start with. 253 :return: A list of node IDs. 254 """ 255 sq = SimpleQueue() 256 sq.put(n_start) 257 visited = set() 258 child_dict = get_child_dict(swc) 259 while not sq.empty(): 260 head = sq.get() 261 visited.add(head) 262 for ch in child_dict[head]: 263 if ch in visited: 264 raise RuntimeError("Detected a ring in the SWC, couldn't a get subtree.") 265 else: 266 sq.put(ch) 267 return list(visited)
Retrieve IDs of all nodes under a specified node in an SWC by BFS.
Note: there should be no ring in the SWC, or it will raise an error.
Parameters
- swc: An SWC dataframe, no modification takes place.
- n_start: the node ID of the subtree root to start with.
Returns
A list of node IDs.