TY - GEN
T1 - Cluster analysis of regulatory sequences with a log likelihood ratio statistics-based similarity measure
AU - Zheng, Huiru
AU - Wang, Haiying
AU - Hu, Jinglu
PY - 2007/12/1
Y1 - 2007/12/1
N2 - Upstream regions in the DNA sequence are characterized by the presence of short regulatory motifs, which function as target binding sites for transcription factors. Finding two genes with common motifs in their regulatory regions may aid users in identifying co-regulated genes or inferring regulatory modules. By modelling pattern occurrences in the regulatory regions with Poisson statistics, this paper presents a log likelihood ratio statistics-based distance measure to calculate pair-wise similarities between sequences. To perform cluster analysis of regulatory sequences, this paper introduces two clustering algorithms on the basis of the incorporation of the log likelihood ratio statistics-based distance into hierarchical clustering and Self-Organizing Map. The proposed approach has been tested on a synthetic dataset and a real biological example. The results indicate that, in comparison to traditional distance functions, the log likelihood ratio statistics-based similarity measure offers considerable improvements in the process of regulatory sequence-based gene classification.
AB - Upstream regions in the DNA sequence are characterized by the presence of short regulatory motifs, which function as target binding sites for transcription factors. Finding two genes with common motifs in their regulatory regions may aid users in identifying co-regulated genes or inferring regulatory modules. By modelling pattern occurrences in the regulatory regions with Poisson statistics, this paper presents a log likelihood ratio statistics-based distance measure to calculate pair-wise similarities between sequences. To perform cluster analysis of regulatory sequences, this paper introduces two clustering algorithms on the basis of the incorporation of the log likelihood ratio statistics-based distance into hierarchical clustering and Self-Organizing Map. The proposed approach has been tested on a synthetic dataset and a real biological example. The results indicate that, in comparison to traditional distance functions, the log likelihood ratio statistics-based similarity measure offers considerable improvements in the process of regulatory sequence-based gene classification.
KW - Cluster analysis
KW - Log likelihood ratio
KW - Poisson distribution
KW - Regulatory sequence
UR - http://www.scopus.com/inward/record.url?scp=47649108422&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47649108422&partnerID=8YFLogxK
U2 - 10.1109/BIBE.2007.4375719
DO - 10.1109/BIBE.2007.4375719
M3 - Conference contribution
AN - SCOPUS:47649108422
SN - 1424415098
SN - 9781424415090
T3 - Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE
SP - 1220
EP - 1224
BT - Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE
T2 - 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE
Y2 - 14 January 2007 through 17 January 2007
ER -