Identifying Differential Methylation in Cancer Epigenetics via a Bayesian Functional Regression Model

Farhad Shokoohi, David A. Stephens, Celia M.T. Greenwood: Identifying Differential Methylation in Cancer Epigenetics via a Bayesian Functional Regression Model. In: submitted, Forthcoming.

Abstract

DNA methylation plays an essential role in regulating gene activity, modulating disease risk, and determining treatment response. Next-generation sequencing technologies give researchers insight into methylation patterns at single-nucleotide resolution. However, complex features inherent in the data obtained via these technologies pose challenges beyond the typical big data problems. Identifying differentially methylated cytosines (DMCs) or regions is one such challenge. Current methodologies for identifying DMCs fall short in handling low read-depth data and missing values, capturing functional data patterns, accommodating multiple covariates (categorical, continuous, or a combination), and supporting multiple group comparisons. We have developed an efficient method to identify DMCs based on a Bayesian functional regression approach, termed DMCFB, that tackles these shortcomings. Through simulation studies, we establish that DMCFB outperforms current methods and yields better smoothing and efficient imputation. We apply the proposed method to a dataset of patients with acute promyelocytic leukemia and control samples. With DMCFB, we discovered many new DMCs and, more importantly, observed enhanced consistency of differential methylation within islands and at their adjacent shores. Furthermore, we detected differential methylation at more of the binding sites of the fused gene involved in this cancer.

BibTeX

@article{Shokoohi2020AoASDMCFB,
title = {Identifying Differential Methylation in Cancer Epigenetics via a Bayesian Functional Regression Model},
author = {Farhad Shokoohi and David A. Stephens and Celia M.T. Greenwood},
url = {https://www.biorxiv.org/content/10.1101/2021.03.21.436232v1},
doi = {10.1101/2021.03.21.436232},
year  = {2021},
date = {2021-12-31},
journal = {submitted},
abstract = {DNA methylation plays an essential role in regulating gene activity, modulating disease risk, and determining treatment response. Next-generation sequencing technologies give researchers insight into methylation patterns at single-nucleotide resolution. However, complex features inherent in the data obtained via these technologies pose challenges beyond the typical big data problems. Identifying differentially methylated cytosines (DMCs) or regions is one such challenge. Current methodologies for identifying DMCs fall short in handling low read-depth data and missing values, capturing functional data patterns, accommodating multiple covariates (categorical, continuous, or a combination), and supporting multiple group comparisons. We have developed an efficient method to identify DMCs based on a Bayesian functional regression approach, termed DMCFB, that tackles these shortcomings. Through simulation studies, we establish that DMCFB outperforms current methods and yields better smoothing and efficient imputation. We apply the proposed method to a dataset of patients with acute promyelocytic leukemia and control samples. With DMCFB, we discovered many new DMCs and, more importantly, observed enhanced consistency of differential methylation within islands and at their adjacent shores. Furthermore, we detected differential methylation at more of the binding sites of the fused gene involved in this cancer.},
keywords = {Bayesian computation, Bisulfite sequencing, Differentially methylated region, Functional Regression Model, Natural Cubic Splines, Next-generation sequencing},
pubstate = {forthcoming},
tppubtype = {article}
}