Integrative annotation of variants from 1092 humans: application to cancer genomics.
Khurana E., Fu Y., Colonna V., Mu XJ., Kang HM., Lappalainen T., Sboner A., Lochovsky L., Chen J., Harmanci A., Das J., Abyzov A., Balasubramanian S., Beal K., Chakravarty D., Challis D., Chen Y., Clarke D., Clarke L., Cunningham F., Evani US., Flicek P., Fragoza R., Garrison E., Gibbs R., Gümüş ZH., Herrero J., Kitabayashi N., Kong Y., Lage K., Liluashvili V., Lipkin SM., MacArthur DG., Marth G., Muzny D., Pers TH., Ritchie GRS., Rosenfeld JA., Sisu C., Wei X., Wilson M., Xue Y., Yu F., 1000 Genomes Project Consortium, Dermitzakis ET., Yu H., Rubin MA., Tyler-Smith C., Gerstein M.
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations ("ultrasensitive") and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, "motif-breakers"). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.