TY - JOUR

T1 - A STRUCTURE THEOREM FOR ROOTED BINARY PHYLOGENETIC NETWORKS AND ITS IMPLICATIONS FOR TREE-BASED NETWORKS

AU - Hayamizu, Momoko

N1 - Funding Information:
\ast Received by the editors November 4, 2019; accepted for publication (in revised form) June 27, 2021; published electronically October 19, 2021. https://doi.org/10.1137/19M1297403 Funding: This work was funded by JST PRESTO grants JPMJPR16EB and JPMJPR1929. \dagger Department of Statistical Modeling, The Institute of Statistical Mathematics, and JST PRESTO, Tokyo, Japan. Current address: Department of Applied Mathematics, Waseda University, and JST PRESTO, Tokyo, Japan (hayamizu@waseda.jp).
Publisher Copyright:
© 2021 SIAM.

PY - 2021

Y1 - 2021

N2 - Attempting to recognize a tree inside a phylogenetic network is a fundamental undertaking in evolutionary analysis. In the last few years, therefore, "tree-based"phylogenetic networks, which are defined by a spanning tree called a "subdivision tree"that is an embedding of a phylogenetic tree on the same leaf-set, have attracted the attention of many theoretical biologists. However, the application of such networks is still not easy, due to many important computational problems whose time complexities are unknown or not clearly understood. In this paper, we provide a general framework for solving those various old or new problems on tree-based phylogenetic networks from a coherent perspective, rather than analyzing the complexity of each individual problem or developing an algorithm one by one. More precisely, we establish a structure theorem that gives a way to canonically decompose any rooted binary phylogenetic network N into maximal zig-zag trails that are uniquely determined by N, and furthermore use it to characterize the set of subdivision trees of N in the form of a direct product. From these main results, we derive a series of linear time (and linear time delay) algorithms for solving the following problems: given a rooted binary phylogenetic network N, (1) determine whether or not N has a subdivision tree and find one if there exists any (decision/search problems); (2) measure the deviation of N from being tree-based (deviation quantification problem); (3) compute the number of subdivision trees of N (counting problem); (4) list all subdivision trees of N (enumeration problem); and (5) find a subdivision tree to maximize or minimize a prescribed objective function (optimization problem). All algorithms proposed here are optimal in terms of time complexity. Our results not only imply and unify various known results in the relevant literature, but also answer many open questions and moreover enable novel applications, such as the estimation of a maximum likelihood tree underlying a tree-based network. The results and algorithms in this paper also apply for a special class of rooted nonbinary phylogenetic networks.

AB - Attempting to recognize a tree inside a phylogenetic network is a fundamental undertaking in evolutionary analysis. In the last few years, therefore, "tree-based"phylogenetic networks, which are defined by a spanning tree called a "subdivision tree"that is an embedding of a phylogenetic tree on the same leaf-set, have attracted the attention of many theoretical biologists. However, the application of such networks is still not easy, due to many important computational problems whose time complexities are unknown or not clearly understood. In this paper, we provide a general framework for solving those various old or new problems on tree-based phylogenetic networks from a coherent perspective, rather than analyzing the complexity of each individual problem or developing an algorithm one by one. More precisely, we establish a structure theorem that gives a way to canonically decompose any rooted binary phylogenetic network N into maximal zig-zag trails that are uniquely determined by N, and furthermore use it to characterize the set of subdivision trees of N in the form of a direct product. From these main results, we derive a series of linear time (and linear time delay) algorithms for solving the following problems: given a rooted binary phylogenetic network N, (1) determine whether or not N has a subdivision tree and find one if there exists any (decision/search problems); (2) measure the deviation of N from being tree-based (deviation quantification problem); (3) compute the number of subdivision trees of N (counting problem); (4) list all subdivision trees of N (enumeration problem); and (5) find a subdivision tree to maximize or minimize a prescribed objective function (optimization problem). All algorithms proposed here are optimal in terms of time complexity. Our results not only imply and unify various known results in the relevant literature, but also answer many open questions and moreover enable novel applications, such as the estimation of a maximum likelihood tree underlying a tree-based network. The results and algorithms in this paper also apply for a special class of rooted nonbinary phylogenetic networks.

KW - counting

KW - decision/search

KW - deviation quantification

KW - enumeration

KW - optimization

KW - phylogenetic network

KW - phylogenetic tree

KW - subdivision tree

KW - tree-based network

UR - http://www.scopus.com/inward/record.url?scp=85133230113&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85133230113&partnerID=8YFLogxK

U2 - 10.1137/19M1297403

DO - 10.1137/19M1297403

M3 - Article

AN - SCOPUS:85133230113

SN - 0895-4801

VL - 35

SP - 2490

EP - 2516

JO - SIAM Journal on Discrete Mathematics

JF - SIAM Journal on Discrete Mathematics

IS - 4

ER -