SuchTree Class API Reference
Overview
The SuchTree class provides high-performance phylogenetic tree manipulation using Cython. It supports:
- Fast tree traversal and node queries
- Patristic distance calculations
- Topological analysis
- Multiple tree formats (Newick, URL, file path)
- Integration with NetworkX and igraph
Some best practices to keep in mind :
- Use node IDs for performance-critical code
- Prefer bulk methods (
distances_bulk) for multiple calculations - Cache frequently-used properties (like RED values)
- Use traversal generators for memory efficiency
The supporting SuchLinkedTrees class provides several useful bookkeeping features for working with
co-phylogeny datasets, most importantly the ability to perform reciprocal masking of one tree by
clades in its linked tree. Subset masking is thread-safe, fast and non-destructive. However, please
note that SuchTree 1.3 is the last release that will include this interface for SuchLinkedTrees. The
next major release will include a modernized, multi-tree container with a new interface.
Initialization
class SuchTree(tree_input: Union[str, Path])
Example:
tree = SuchTree("(A:0.1,B:0.2,(C:0.3,D:0.4));")
tree = SuchTree("https://example.com/tree.newick")
Core Properties
Tree Structure
| Property | Type | Description | Example |
|---|---|---|---|
size |
int |
Total nodes | tree.size → 7 |
depth |
int |
Max depth | tree.depth → 3 |
num_leaves |
int |
Leaf count | tree.num_leaves → 4 |
root_node |
int |
Root ID | tree.root_node → 0 |
polytomy_epsilon |
float |
Polytomy resolution | 1e-20 |
Node Collections
| Property | Type | Description | Contains |
|---|---|---|---|
leaves |
Dict[str, int] |
Name → ID | {'A': 1, 'B': 2} |
leaf_nodes |
Dict[int, str] |
ID → Name | {1: 'A', 2: 'B'} |
internal_nodes |
np.ndarray |
Internal IDs | [0, 3, 4] |
all_nodes |
np.ndarray |
All IDs | [0, 1, 2, 3, 4, 5, 6] |
leaf_node_ids |
np.ndarray |
Leaf IDs | [1, 2, 5, 6] |
leaf_names |
list |
Leaf names | ['A', 'B', 'C', 'D'] |
Example Usage
Here are some examples of operations you can do with SuchTree.
# Initialize a tree and print some basic properties
tree = SuchTree("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;")
print(f"Tree depth: {tree.depth}")
print(f"Leaf names: {tree.leaf_names}")
# Find some node relationships
node_id = tree.leaves['A']
parent_id = tree.get_parent(node_id)
children = tree.get_children(parent_id)
# Distance analysis
dist = tree.distance('A', 'C')
dist_matrix = tree.pairwise_distances(['A', 'B', 'C', 'D'])
# Calculate RED values
red_values = tree.relative_evolutionary_divergence()
# Export tree as a networkx graph (requires networkx)
nx_graph = tree.to_networkx_graph()
Error Handling
NodeNotFoundError: Invalid leaf nameInvalidNodeError: Invalid node IDTreeStructureError: Invalid tree operationsValueError: Invalid input formatTypeError: Incorrect argument type
Core Methods
Node Relationships
get_parent(node: Union[int, str]) -> int
Args: node: Node identifier as either integer ID or leaf name string
Returns: Integer ID of parent node. Returns -1 if called on root node.
Raises: NodeNotFoundError: If leaf name doesn't exist in the tree InvalidNodeError: If node ID is out of valid range (0 <= id < tree.size)
Example:
parent_id = tree.get_parent("A")
parent_id = tree.get_parent(5)
get_children(node: Union[int, str]) -> Tuple[int, int]
Args: node: Node identifier as either integer ID or leaf name string
Returns: Tuple of (left_child, right_child) node IDs. Returns (-1, -1) for leaf nodes.
Raises: NodeNotFoundError: If leaf name doesn't exist InvalidNodeError: If node ID is invalid
Note:
For multifurcating trees, only the first two children are returned. Use
traverse_children() method for complete child iteration.
get_ancestors(node: Union[int, str]) -> Generator[int, None, None]
get_descendants(node_id: int) -> Generator[int, None, None]
get_leaves(node: Union[int, str]) -> np.ndarray
get_support(node: Union[int, str]) -> float
Tree Navigation
is_leaf(node: Union[int, str]) -> bool
is_internal(node: Union[int, str]) -> bool
is_leaf but provides clearer intent.
is_ancestor(ancestor: Union[int, str], descendant: Union[int, str]) -> int
1if ancestor of descendant-1if descendant is ancestor0if no direct relationship
is_descendant(descendant: Union[int, str], ancestor: Union[int, str]) -> bool
is_root(node: Union[int, str]) -> bool
is_sibling(node1: Union[int, str], node2: Union[int, str]) -> bool
has_children(node: Union[int, str]) -> bool
is_internal but may be more intuitive for some users.
has_parent(node: Union[int, str]) -> bool
is_root.
common_ancestor(a: Union[int, str], b: Union[int, str]) -> int
path_between_nodes(a: Union[int, str], b: Union[int, str]) -> List[int]
Distance Analysis
SuchTree was originally built to compute patristic (leaf-to-leaf) distances as efficiently as possible. You have two choices to make :
- Do you want to use leaf names or leaf IDs?
- Do you want to to calculate distances for single a pair of leafs, or a table of leaf pairs?
It is important to remember that even if you have two trees with exactly the same
leaf names, the leafs will have different node IDs if the topologies are different.
For those cases, it is better to use use the leaf name to ID mappings (SuchTree.leaves
and SuchTree.leaf_nodes) to associate your leaf nodes with other data.
distance(a: Union[int, str], b: Union[int, str]) -> float
Args: a: First node identifier (ID or name) b: Second node identifier (ID or name)
Returns: Sum of branch lengths along the path between nodes via their most recent common ancestor (MRCA)
Raises: NodeNotFoundError: If either node name doesn't exist InvalidNodeError: If either node ID is invalid
Complexity: O(h) where h is the height of the tree. Uses cached ancestor paths for optimal performance.
Example:
dist = tree.distance("A", "B")
dist = tree.distance(2, 5)
distances_bulk(pairs: np.ndarray) -> np.ndarray
distances_by_name(pairs: List[Tuple[str, str]]) -> List[float]
pairwise_distances(nodes: list = None) -> np.ndarray
distance_to_root(node: Union[int, str]) -> float
nearest_neighbors(node: Union[int, str], k=1) -> List[Tuple[Union[int, str], float]]
Tree Traversal
SuchTree includes a collection of generators for traversing the tree in different ways. They all work the same way, other than the different order of traversal.
traverse_inorder(include_distances: bool = True) -> Generator
traverse_preorder(from_node: Union[int, str] = None) -> Generator
"""
Iterate through nodes in pre-order traversal (parent before children).
Args:
from_node: Starting node (default: root). Can be ID or name.
Yields:
Node IDs in traversal order
Raises:
NodeNotFoundError: If from_node name doesn't exist
InvalidNodeError: If from_node ID is invalid
Memory:
O(h) space complexity due to stack implementation, where h is tree height
Example:
```python
for node_id in tree.traverse_preorder():
print(f"Visiting node {node_id}")
traverse_postorder(from_node: Union[int, str] = None) -> Generator
traverse_levelorder(from_node: Union[int, str] = None) -> Generator
traverse_leaves_only(from_node: Union[int, str] = None) -> Generator
traverse_internal_only(from_node: Union[int, str] = None) -> Generator
traverse_with_depth(from_node: Union[int, str] = None) -> Generator[Tuple[int, int], None, None]
traverse_with_distances(from_node: Union[int, str] = None) -> Generator[Tuple[int, float, float], None, None]
Topological Analysis
bipartition(node: Union[int, str], by_id=False) -> frozenset
bipartitions(by_id=False) -> Generator[frozenset, None, None]
quartet_topology(a: Union[int, str], b: Union[int, str], c: Union[int, str], d: Union[int, str]) -> frozenset
quartet_topologies_bulk(quartets: Union[list, np.ndarray]) -> np.ndarray
quartet_topologies_by_name(quartets: List[Tuple[str, str, str, str]]) -> List[frozenset]
Graph Operations
adjacency_matrix(from_node: Union[int, str] = None) -> Dict[str, Any]
laplacian_matrix(from_node: Union[int, str] = None) -> Dict[str, Any]
incidence_matrix(from_node: Union[int, str] = None) -> Dict[str, Any]
degree_sequence(from_node: Union[int, str] = None) -> Dict[str, Any]
Export & Conversion
to_networkx_graph(from_node: Union[int, str] = None) -> 'networkx.Graph'
to_newick(include_support=True, include_distances=True) -> str
relative_evolutionary_divergence() -> Dict[int, float]