Computational Biology and Biomedical Informatics
100 College St., cbb-registrar@yale.edu
http://cbb.yale.edu
M.S., Ph.D.
Directors of Graduate Studies
Mark Gerstein (Bass 432A, 203.432.6105, cbb-dgs@yale.edu)
Steven Kleinstein (300 George St., Suite 505, 203.785.6685, cbb-dgs@yale.edu)
Professors Frederick Altice (Internal Medicine; Infectious Diseases; Epidemiology of Microbial Diseases), Marcus Bosenberg (Dermatology; Pathology), Cynthia Brandt (Emergency Medicine; Anesthesiology), Joseph Chang (Statistics and Data Science), Kei-Hoi Cheung (Emergency Medicine; Anesthesiology), Ronald Coifman (Mathematics; Computer Science), Stephen Dellaporta (Molecular, Cellular, and Developmental Biology), Rong Fan (Biomedical Engineering; Pathology), Richard Flavell (Immunobiology), Joel Gelernter (Psychiatry; Genetics), Mark Gerstein (Biomedical Informatics; Molecular Biophysics and Biochemistry; Computer Science; Statistics and Data Science), Antonio Giraldez (Genetics), Jeffrey Gruen (Genetics; Investigative Medicine; Pediatrics), Murat Gunel (Neurosurgery; Genetics), Ira Hall (Genetics), Amy Justice (Internal Medicine; Public Health), Naftali Kaminski (Internal Medicine), Steven Kleinstein (Pathology; Immunobiology), Yuval Kluger (Pathology), Harlan Krumholz (Internal Medicine; Investigative Medicine; Public Health), Haifan Lin (Cell Biology; Genetics), Shuangge (Steven) Ma (Biostatistics), Zongming Ma (Statistics and Data Science), Andrew Miranker (Molecular Biophysics and Biochemistry; Chemical and Environmental Engineering), James Noonan (Genetics; Neuroscience), Corey O’Hern (Mechanical Engineering and Materials Science; Applied Physics; Physics), Xenophon Papademetris (Biomedical Informatics and Data Science; Radiology and Biomedical Imaging), Lajos Pusztai (Internal Medicine), Anna Pyle (Molecular, Cellular, and Developmental Biology; Chemistry), David Stern (Pathology), Hemant Tagare (Radiology and Biomedical Imaging; Biomedical Engineering), Jeffrey Townsend (Public Health; Ecology and Evolutionary Biology), John Tsang (Immunobiology), Hua Xu (Biomedical Informatics and Data Science), Heping Zhang (Biostatistics; Statistics and Data Science), Hongyu Zhao (Biostatistics; Statistics and Data Science), Steven Zucker (Computer Science; Electrical Engineering; Biomedical Engineering)
Associate Professors Julien Berro (Molecular Biophysics and Biochemistry), Sidi Chen (Genetics; Neurosurgery), Forrest Crawford (Biostatistics; Ecology and Evolutionary Biology), Samah Jarad (Emergency Medicine; Biostatistics), Smita Krishnaswamy (Genetics; Computer Science), Bluma Lesch (Genetics), Jun Lu (Genetics), Ted Melnick (Biostatistics; Emergency Medicine), Kathryn Miller-Jensen (Engineering and Applied Science), John Murray (Psychiatry; Neuroscience; Physics), Renato Polimanti (Psychiatry), Edward Stites (Laboratory Medicine), Andrew Taylor (Emergency Medicine), Zuoheng (Anita) Wang (Biostatistics), Yize Zhao (Biostatistics)
Assistant Professors Arnaud Augert (Pathology), David Braun (Medical Oncology), Purushottam Dixit (Biomedical Engineering), Salil Garg (Laboratory Medicine; Pathology), Leying Guan (Biostatistics), Mary-Anne Hartley (Biomedical Informatics and Data Science), Albert Higgins-Chen (Psychiatry; Pathology), Jeffrey Ishizuka (Internal Medicine; Medical Oncology; Pathology), Rohan Khera (Internal Medicine, Cardiovascular Medicine; EPH Biostatistics), Monkol Lek (Genetics), Benjamin Machta (Physics), Robert McDougal (Biostatistics), Jacob Musser (Molecular, Cellular, and Developmental Biology), C. Brandon Ogbunu (Ecology and Evolutionary Biology), Carlos Oliveira (Pediatrics; Infectious Diseases), Steven Reilly (Genetics), Wade Schulz (Laboratory Medicine), Serena Tucci (Anthropology), David van Dijk (Internal Medicine, Cardiology; Computer Science), Rex Ying (Computer Science), Jack Zhang (Molecular Biophysics and Biochemistry)
Fields of Study
Computational biology and biomedical informatics (CB&B) is a rapidly developing multidisciplinary field. The past two decades have witnessed a revolution in the biological and biomedical sciences driven by the development of technologies such as high-dimensional phenotypic profiling, next-generation sequencing, macromolecular structure determination and high-resolution imaging, wearable sensor devices, and large-scale electronic health records. These data-generation technologies demand new computational analysis approaches, which, in turn, have given rise to the field of CB&B.
The Yale Computational Biology and Biomedical Informatics program combines research training opportunities in a range of fields within the biological and biomedical sciences, as well as the computational sciences, applied mathematics, statistics, and data science. The scope and balance of a student's program are highly individualized. Each student in the CB&B program develops, with the assistance of faculty advisers, a specific program of coursework, independent reading, and research that provides depth of coverage and fits their background, interests, and career goals.
To enter the Ph.D. program, students apply to the CB&B Track within the interdepartmental graduate program in Biological and Biomedical Sciences (BBS), https://medicine.yale.edu/bbs.
Integrated Graduate Program in Physical and Engineering Biology (PEB)
Students applying to one of the tracks of the Biological and Biomedical Sciences program may simultaneously apply to be part of the PEB program. See the description under Non-Degree-Granting Programs, Councils, and Research Institutes for course requirements, and http://peb.yale.edu for more information about the benefits of this program and application instructions.
Special Requirements for the Ph.D. Degree
With the help of a faculty advisory committee, each student plans a program that includes courses, seminars, laboratory rotations, and independent reading. Students are expected to gain competence in three core areas: (1) computational biology and biomedical informatics, (2) biological sciences, and (3) informatics (including computer science, applied mathematics, statistics, and data science). While the courses taken to satisfy the core areas of competency may vary considerably, all students are required to take the following courses: CB&B 7400 and CB&B 7520. CB&B requires a minimum of ten course credits. Completion of the core curriculum will typically take three to four terms, depending in part on the prior training of the student. With approval of the CB&B director of graduate studies (DGS), students may take one or two undergraduate courses to satisfy areas of minimum expected competency. Students will typically take two to three courses each term and three research rotations (CB&B 7110, CB&B 7120, CB&B 7130) during the first year. In addition to all other requirements, students must successfully complete IBIO 6010, Fundamentals of Research: Responsible Conduct of Research (or another course that covers the material), prior to the end of their first year of study. After the first year, students will start working in the laboratory of their Ph.D. thesis supervisor. Students must pass a qualifying examination normally given no later than the end of the third year. There is no foreign language requirement. Students will serve as teaching assistants in two terms. In their fourth year of study, all students must successfully complete CB&B 5030, RCR Refresher Course.
M.D.-Ph.D. Students
Students pursuing the joint M.D.-Ph.D. degrees must satisfy the course requirements listed above for Ph.D. students. With approval of the DGS, some courses taken toward the M.D. degree can be counted toward the ten required course credits. Such courses must have a graduate course number, and the student must register for them as graduate courses (in which grades are received). Laboratory rotations are available but not required. One teaching assistantship is required.
Master’s Degree
Terminal Master’s Degree Program Students can be admitted for a terminal M.S. degree in Computational Biology and Biomedical Informatics with the goal of gaining competency in three core areas: (1) computational biology and biomedical informatics, (2) biological and medical sciences, and (3) informatics (including computer science, applied mathematics, statistics, and data science). This is a two-year program and is not part of the BBS program. Students must complete twelve courses at Yale, including at least four graduate CB&B courses (including CB&B 7400 and CB&B 7520), two graduate courses in the biological and medical sciences, three graduate courses in areas of informatics, and three additional courses in any of the three core areas. In addition, M.S. students must take a one-term graduate seminar on research ethics and attend a CB&B seminar series. Finally, students must meet all of the graduate school’s requirements for the two-year terminal M.S. degree. Auditing courses is discouraged; audited courses do not satisfy the degree requirements.
Terminal M.S. degree students are also expected to complete an M.S. project, write a research paper describing it, and defend the project in a seminar. In the seminar, students present the project, answer questions about it, and demonstrate breadth of knowledge across their coursework and track of study. The paper is evaluated by the student’s research supervisor and a second reader from the CB&B faculty. Students are expected to identify a faculty member to supervise the M.S. project by the end of the first year or early in the second year. Completion of the research paper is facilitated by enrolling in CB&B 7140 and CB&B 7150.
M.S. (en route to the Ph.D.) Students enrolled in the Ph.D. program may be awarded an M.S. degree en route as they satisfy the requirements for the Ph.D. degree. To qualify for the en route M.S. degree, a student must (1) complete two years (four terms) of study in the Ph.D. program; (2) complete the required coursework for the Ph.D. program, with ten required course credits taken at Yale including three successful research rotations; and (3) meet the graduate school’s grade requirements.
CB&B 5030b, Responsible Conduct of Research, Refresher Course Steven Kleinstein
The NIH requires that students receive training in the responsible conduct of research every four years. This course meets that requirement for fourth-year students.
HTBA
CB&B 5620b / AMTH 765b / ENAS 5620b / INP 562b / INP 7562b / MB&B 5620b / PHYS 5620b, Modeling Biological Systems II Thierry Emonet, Jing Yan, and Damon Clark
This course covers advanced topics in computational biology. How do cells compute, how do they count and tell time, how do they oscillate and generate spatial patterns? Topics include time-dependent dynamics in regulatory, signal-transduction, and neuronal networks; fluctuations, growth, and form; mechanics of cell shape and motion; spatially heterogeneous processes; diffusion. This year, the course spends roughly half its time on mechanical systems at the cellular and tissue level, and half on models of neurons and neural systems in computational neuroscience. Prerequisite: a 200-level biology course or permission of the instructor.
HTBA
CB&B 5750a, Bioinformatics Applications in Biomedicine Jihoon Kim
This course covers the latest advances in bioinformatics in the context of human diseases. Students learn background knowledge and practical skills to analyze omics data for human disease research. By the end of this course, students should be able to: (1) process bioinformatics data with Linux-based pipelines and data tools, (2) apply exploratory data analysis techniques in Python and R, (3) perform analysis of DNA, RNA, and protein data, and (4) conduct a biobank-scale analysis using a platform such as the All of Us Research Workbench.
M 12:20pm-2:50pm
CB&B 5800a, Bioinformatics Algorithms in Genomics Haoyu Cheng
This course introduces key algorithms used in computational genomics, with a focus on both classical bioinformatics methods and emerging machine learning and deep learning approaches. Topics covered include sequence alignment, genome assembly and comparative genomics, variant identification and analysis, and gene expression and regulation, along with advanced techniques for specialized applications such as cancer genomics. Through hands-on exercises and projects, students gain practical experience in implementing algorithms and analyzing real-world genomic data. By the end of the course, students are prepared to conduct independent genomic analyses or develop novel bioinformatics algorithms to tackle emerging challenges in genomics.
W 2:30pm-5pm
CB&B 6340a, Computational Methods for Informatics Robert McDougal
This course introduces the key computational methods and concepts necessary for taking an informatics project from start to finish: using APIs to query online resources, reading and writing common biomedical data formats, choosing appropriate data structures for storing and manipulating data, implementing computationally efficient and parallelizable algorithms for analyzing data, and developing appropriate visualizations for communicating health information. The FAIR data-sharing guidelines are discussed. Current issues in big health data are discussed, including successful applications as well as privacy and bias concerns. This course has a significant programming component, and familiarity with programming is assumed. Prerequisite: CPSC 223 or equivalent, or permission of the instructor.
W 11am-11:50am, W 1pm-1:50pm, TTh 3pm-4:20pm
CB&B 6380a, Clinical Database Management Systems and Ontologies Kei-Hoi Cheung and George Hauser
This course introduces databases and ontologies in the clinical/public health domain. It reviews how data and information are generated in clinical/public health settings. It introduces different approaches to representing, modeling, managing, querying, and integrating clinical/public health data. In terms of database technologies, the course describes two main approaches, SQL databases and non-SQL (NoSQL) databases, and shows how these technologies can be used to build electronic health records (EHR), data repositories, and data warehouses. In terms of ontologies, it discusses how ontologies are used in connecting and integrating data with machine-readable knowledge. The course reviews the major theories, methods, and tools for the design and development of databases and ontologies. It also includes clinical/public health use cases demonstrating how databases and ontologies are used to support clinical/public health research.
Th 1pm-2:50pm
CB&B 6470a / GENE 6450a, Statistical Methods in Human Genetics Hongyu Zhao
Probability modeling and statistical methodology for the analysis of human genetics data are presented. Topics include population genetics, single locus and polygenic inheritance, linkage analysis, quantitative trait analysis, association analysis, haplotype analysis, population structure, whole genome genotyping platforms, copy number variation, pathway analysis, and genetic risk prediction models. Offered every other year. Prerequisites: genetics; BIS 505; S&DS 541 or equivalent; or permission of the instructor.
Th 10am-11:50am
CB&B 6550a / GENE 6550a, Stem Cells: Biology and Application In-Hyun Park
This course is designed for first-year or second-year students to learn the fundamentals of stem cell biology and to gain familiarity with current research in the field. The course is presented in a lecture and discussion format based on primary literature. Topics include stem cell concepts, methodologies for stem cell research, embryonic stem cells, adult stem cells, cloning and stem cell reprogramming, and clinical applications of stem cell research. Prerequisites: undergraduate-level cell biology, molecular biology, and genetics.
Th 1:30pm-3pm
CB&B 6663b / AMTH 5520b / CPSC 5520b / GENE 6630b, Deep Learning Theory and Applications Smita Krishnaswamy
Deep neural networks have gained immense popularity within the past decade due to their success in many important machine-learning tasks such as image recognition, speech recognition, and natural language processing. This course provides a principled and hands-on approach to deep learning with neural networks. Students master the principles and practices underlying neural networks, including modern methods of deep learning, and apply deep learning methods to real-world problems including image recognition, natural language processing, and biomedical applications. Course work includes homework, a final exam, and a final project—either group or individual, depending on enrollment—with both a written and oral (i.e., presentation) component. The course assumes basic prior knowledge in linear algebra and probability. Prerequisites: CPSC 202 and knowledge of Python programming.
HTBA
CB&B 7110a and CB&B 7120b and CB&B 7130b, Lab Rotations Steven Kleinstein
Three 2.5–3-month research rotations in faculty laboratories are required during the first year of graduate study. These rotations are arranged by each student with individual faculty members.
HTBA
CB&B 7140a, Research Paper in Computational Biology and Biomedical Informatics Anthony Lisi and Hua Xu
This two-semester, single-credit, pass/fail course must be completed as part of the terminal M.S. degree program in computational biology and biomedical informatics (CB&B). Students work with a faculty supervisor in designing their project and writing their research paper. The syllabus details the intended scope and process for writing the research paper. In the broadest terms, the research paper must be of publishable quality and defensible in a public scientific forum. The student’s research supervisor is responsible for managing the intended product. The preferred format of the research paper is the style and length of a publishable, peer-reviewed paper, templated based on the journal submission. Prerequisite: second-year enrollment in the program.
HTBA
CB&B 7400a, Introduction to Health Informatics Tsung-Ting Kuo and Andrew Loza
The course provides an introduction to clinical and translational informatics. Topics include (1) overview of biomedical informatics, (2) design, function, and evaluation of clinical information systems, (3) clinical decision-making and practice guidelines, (4) clinical decision support systems, (5) informatics support of clinical research, (6) privacy and confidentiality of clinical data, (7) standards, and (8) topics in translational bioinformatics. Permission of the instructor required.
TTh 1:15pm-2:30pm
CB&B 7520b / CPSC 7520b / MB&B 7520b / MB&B 753 / MB&B 754 / MCDB 7520b, Biomedical Data Science: Mining and Modeling Mark Gerstein and Matthew Simon
Biomedical data science encompasses the analysis of gene sequences, macromolecular structures, and functional genomics data on a large scale. It represents a major practical application for modern techniques in data mining and simulation. Specific topics to be covered include sequence alignment, large-scale processing, next-generation sequencing data, comparative genomics, phylogenetics, biological database design, geometric analysis of protein structure, molecular-dynamics simulation, biological networks, normalization of microarray data, mining of functional genomics data sets, and machine-learning approaches to data integration. Prerequisites: biochemistry and calculus, or permission of the instructor.
MW 1pm-2:15pm