RCytoGPS: an R Package for Reading and Visualizing Cytogenetics Data

Dwayne G. Tally (1), Zachary B. Abrams (2), Lynne V. Abruzzo (3), and Kevin R. Coombes (2)
(1) Department of Biology, Indiana State University, Terre Haute, IN 47809 USA
(2) Department of Biomedical Informatics, Wexner Medical Center, The Ohio State University, Columbus, OH 43210 USA
(3) Department of Pathology, Wexner Medical Center, The Ohio State University, Columbus, OH 43210 USA

Cytogenetics data, or karyotypes, are among the most common forms of genetic data used clinically. Karyotypes are stored as standardized text strings written in the International System for Human Cytogenomic Nomenclature (ISCN). Historically, these data have not been used in large-scale computational analyses because of limitations in the format and structure of ISCN text. Recently developed computational tools such as CytoGPS have made such analyses possible. However, CytoGPS only parses and maps cytogenetic data; it does not help users analyze, visualize, or interpret their data. To address this problem, we developed RCytoGPS, a publicly available R package that takes JSON files generated by CytoGPS and converts them into R objects. The conversion produces a list containing five elements: source, raw, frequency, size, and cytoband locations (CL). Source is a character vector that holds the names of the input JSON files. Raw is a list of lists, one per input JSON file, each containing the binary loss-gain-fusion (LGF) matrix for all karyotypes and a status report flagging any karyotypes that failed processing because of syntax errors. Frequency is a data frame summarizing the frequency of losses, gains, and fusions at the level of cytogenetic bands. Size is the total number of clones processed, which can exceed the number of karyotypes because each karyotype can contain multiple clones. CL describes the cytoband locations (chromosome name, start and end base pairs) in build GRCh38 of the human genome, along with the standard names of chromosome arms and cytogenetic bands. RCytoGPS can also generate several visualizations to help interpret results. These visualizations, known as idiograms, display the frequency of cytogenetic events computed from the LGF matrix. RCytoGPS streamlines large-scale karyotype analyses, thus advancing the field of computational cytogenetic pathology.
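
To make the five-element structure concrete, the following is a minimal sketch in Python rather than R, and it does not call RCytoGPS or CytoGPS. The band labels, the matrix values, the file name, the column ordering, and the choice to report frequency as a fraction of clones are all assumptions made for illustration. It shows how a per-band frequency summary could be derived from a binary LGF matrix.

```python
# Toy illustration only: band labels, matrix values, file name, and the
# dictionary layout are assumptions, not output from CytoGPS or RCytoGPS.

bands = ["1p36", "1q21", "17p13"]      # hypothetical cytoband labels
events = ["Loss", "Gain", "Fusion"]    # the three LGF event types

# One row per clone. For each band there are three binary flags,
# in the order given by `events` (assumed column layout).
lgf_rows = [
    # 1p36     1q21     17p13
    [1, 0, 0,  0, 1, 0,  1, 0, 0],
    [0, 0, 0,  0, 1, 0,  1, 0, 0],
    [1, 0, 0,  0, 0, 0,  0, 0, 1],
    [0, 0, 0,  0, 1, 0,  1, 0, 0],
]


def band_frequencies(rows, bands, events):
    """Return {band: {event: fraction of clones carrying that event}}."""
    n_clones = len(rows)
    freq = {}
    for b, band in enumerate(bands):
        freq[band] = {}
        for e, event in enumerate(events):
            col = b * len(events) + e          # column for this band/event pair
            freq[band][event] = sum(row[col] for row in rows) / n_clones
    return freq


# A dictionary mirroring the five elements named in the abstract.
result = {
    "source": ["toy_input.json"],                       # hypothetical file name
    "raw": [{"LGF": lgf_rows, "Status": []}],           # one entry per input file
    "frequency": band_frequencies(lgf_rows, bands, events),
    "size": len(lgf_rows),                              # number of clones
    # Genomic coordinates are deliberately left as None in this toy example.
    "CL": [{"Band": band, "Start": None, "End": None} for band in bands],
}

for band, per_event in result["frequency"].items():
    print(band, per_event)
```

An idiogram would then plot these per-band values alongside each chromosome. The real package works from the CytoGPS JSON output and the GRCh38 cytoband coordinates, not from hand-written rows like these.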

Additional Abstract Information

Presenter: Dwayne Tally

Institution: Indiana State University

Type: Poster

Subject: Computer Science

Status: Approved

Time and Location

Session: Poster 5
Date/Time: Tue 12:30pm-1:30pm
Session Number: 4023