Investigating the Multifunctional Lifecycle of CNVs in the Malaria Parasite
Noah Brown
April 12th, 2023

Committee Members:
Dr. Alan Bergland (First Reader)
Dr. Martin Wu
Dr. Eyleen O'Rourke
Dr. Yuh-Hwa Wang
Dr. Jennifer Guler (thesis mentor)

Summary

Malaria is a bloodborne parasitic infection that kills more than 600,000 people yearly. It is estimated that up to half of all humans who have ever lived have died from malaria, predominantly children. Because malaria is such an ancient disease with a strong impact on human health, the genomes of both the malaria parasite and humans have been shaped by coevolution. Anti-malarial drugs are the strongest weapons we have to combat malaria; however, parasites continue to gain resistance to these drugs at the genetic level. The acquisition of redundant copies of a genomic segment, otherwise known as copy number variations (CNVs), contributes to the adaptation of malaria by harboring genes that confer resistance. This research will examine the lifecycle and multifunctionality of CNVs in two forms in which they appear in the malaria parasite.

1) ‘Rare’ CNVs that spontaneously arise in a parasite genome can prime subpopulations to adapt more rapidly to future selective stressors. The rate at which these rare CNVs form will be examined with and without the application of low-level drug stress to simulate dynamic in vivo situations. Both laboratory and clinical lines will be assessed using parasite-optimized single cell sequencing and long-read sequencing pipelines. Determining the rate of rare CNV formation will provide insight into the prevalence of mechanisms that contribute to malarial genomic heterogeneity and evolution.

2) Extrachromosomal DNA (ecDNA) has recently been identified in the parasite. It conveys a major resistance advantage against an anti-malarial compound by carrying CNVs of a relevant gene.
The origins, maintenance, and replication of these molecules in malaria are largely unknown due to an inefficient ecDNA enrichment pipeline. To better understand this phenomenon, this work will develop a parasite-specific ecDNA enrichment pipeline for use in sequencing and imaging to begin inferring modes of function. Further, this research will assess the prevalence of ecDNA in other parasite models by examining likely target regions (i.e., existing chromosomal CNVs) with and without the application of drug stresses that are known to stimulate structural rearrangements. Understanding the function and pervasiveness of malaria ecDNA is important in determining its contribution to anti-malarial drug resistance.

Specific Aims

Malaria is a prolific parasitic illness that kills more than half a million people every year, placing it among the three infectious diseases that cause the most deaths annually (1, 2). Anti-malarial drugs are available, but studies show increasing resistance to our frontline drugs in Plasmodium falciparum, the species responsible for deadly malaria (3–5). This resistance is largely driven by large genomic rearrangements called copy number variations (CNVs), which harbor copies of genomic segments encoding proteins that confer resistance to anti-malarial drugs. Our lab has identified several innate mechanisms in Plasmodium that enhance its ability to generate CNVs, indicating their importance as a means of rapid evolution in parasite populations (6–8). My thesis will characterize two different CNV acquisition pathways with diverse functions in the malaria parasite. First, I will study, at the single cell level, the random and constitutive genesis of rare CNVs that primes parasite populations for resistance to stressors; I hypothesize that low-level drug stress increases the rate at which these CNVs are formed (Aim 1).
Additionally, I will develop a pipeline to explore the abundance of newly discovered extrachromosomal DNA (ecDNA), which harbors CNVs and has been shown to confer 10-fold higher resistance than a comparable line with no ecDNA (6) (Aim 2). Broadly, I hypothesize that parasites use these adaptive mechanisms in real-world settings to expand their ability to become resistant to treatment. The information derived from my studies will elucidate previously undefined mechanisms of genetic variation in malaria.

AIM 1 - Determining the extent to which rare CNVs act as a preemptive mechanism in the malaria parasite: CNVs are important for the evolution of malaria, and we have demonstrated several ways in which the parasite’s genome is set up to facilitate the production of CNVs (7, 8). They arise randomly at a low level, even in the absence of selective stress, creating genomic heterogeneity across populations (9). These ‘rare’ CNVs prime populations to respond rapidly to incoming stress by supplying individuals with a relevant, preexisting CNV that conveys a fitness advantage. Moreover, it would be beneficial for populations if this process were amplified during stress to produce more favorable CNVs. Therefore, I hypothesize that this basal rate of spurious CNV generation increases during times of stress. I will observe the effect of drug stress on both lab lines and clinical samples. Observing these rare CNVs requires genetic analysis at the single-genome level, which is inherently noisy. To combat this, I will employ a tandem sequencing approach using two methods. First, I will use short-read sequencing, which can be applied at the single-parasite level to identify specific CNVs with accurate base-calling; second, I will use Nanopore long-read sequencing, which provides cross-validation and more readily identifies large structural rearrangements. In both techniques, we will consider foremost the number of CNVs per genome pre- and post-treatment with a stressor.
Further, I will assess CNV breakpoints and recurrent sequence frequencies to better understand the mechanism and function of CNV formation.

Impact: Upon completion of this aim, I will understand how stress impacts rare CNV development across the genome, which contributes to genomic diversity.

AIM 2 - Developing a pipeline to analyze the characteristics and abundance of ecDNA: Little is yet known about the structure, genesis, and maintenance of Plasmodium ecDNA due to an inadequate enrichment pipeline. Moreover, ecDNA has been identified in only one model. I will use charge-based gradients and structure-specific nucleases to separate and enrich ecDNA with a high yield while preserving its native 3D structure, which will also make it possible to identify ecDNA in other models. I hypothesize that characteristics such as structure, sequence, and nuclear location can be resolved to better understand ecDNA’s modes of function, and further, that ecDNA exists in other models and that its abundance can be increased by drug stress. Upon developing an enrichment pipeline, I will assess ecDNA structure through microscopy and infer potential functionality through sequencing (Aim 2A). Chromosomal CNVs likely increase the rate of ecDNA generation without any selective pressure (6). Therefore, I will assess whether ecDNA can be stimulated in lines with preexisting CNVs, with or without drug treatment. I will also test clinical lines and samples with growth patterns that indicate the presence of ecDNA (Aim 2B).

Impact: Completion of this aim will provide a method with which to enrich and study ecDNA in the parasite and other models, and to assess the overall contribution of ecDNA to antimalarial drug resistance.

Broad Impacts: The study of these mechanisms will contribute to our knowledge of how malaria adapts and will further shape our understanding of genomic resistance in similar models.
Ultimately, this can help create strategies that mitigate the formation of CNVs as a viable approach to combating malaria.

Background and Significance

Malaria Biology: Malaria has been a global health threat since ancient times. It is estimated to have killed more people than any other infectious disease, and still kills more than 600,000 annually (10). Only recently in human history have efforts to eliminate malaria begun, and while it has all but been eradicated in North America and Europe, it persists heavily in all other tropical and temperate parts of the world. There are more than 450 million documented cases annually, and the disease takes its toll not only in human life, but also in the economic growth of poverty-stricken nations (10). Plasmodium, the genus responsible for deadly malaria, is a protozoan blood-borne pathogen with a multiphase life cycle that makes its treatment and tracking difficult (Fig. 1). A single invaded red blood cell (RBC) can produce 10-30 progeny (11). In an infected RBC (iRBC), the parasite subsists primarily off existing components of the host-cell cytoplasm, among them hemoglobin (12). While vaccines against malaria exist, they only provide protection in the short term, making them a poor choice for the young, who are disproportionately represented in the death toll (13). Therefore, responsive treatment with anti-malarials is viewed as the most viable option.

Treating malaria is predicated on the appearance of symptoms. In the human host, the immune system often fails to detect the parasites due to a variety of evasion mechanisms, which allows the infection to persist (14). Symptoms in mild cases are flu-like, while more severe cases include jaundice, seizures, and kidney failure (14). Without reliable, widely available testing, it is generally infeasible to apply early-stage treatment in malarial infections, especially in areas with underdeveloped healthcare.
Therefore, while the mechanisms of action vary, the most prevalent anti-malarial drugs are aimed at reducing the parasitic load during the blood phase (15). As such, human influence on the evolution of malaria is greatest during the blood stage of the parasite life cycle (Fig. 1).

Malaria has a long history of adapting to, and subsequently antiquating, anti-malarial drugs. Some drugs have lasted as little as 5 years after being released for public use (15). Further, we now have evidence that Southeast Asian malarial populations are developing resistance to the world's current frontline drug, artemisinin. With no drugs in trial that promise results similar to those artemisinin once delivered, we must confront the idea that we are losing the adaptive tug-of-war. It is imperative to broaden our approach and understanding of the mechanisms that underlie malaria’s ability to evolve. In this research, I propose to elucidate two of these mechanisms: the generation of rare CNVs that occur at the single cell level (Aim 1), and the generation of extrachromosomal DNA that carries CNVs of genes related to anti-malarial resistance (Aim 2).

Copy number variations as a mechanism of resistance: Genetic diversity, and the propensity to generate genetic diversity, underpin the malaria parasite’s ability to adapt to antimalarial drugs (16). Malaria has an arsenal of genomic adaptive mechanisms that generate this diversity. It possesses a highly A-T rich genome (>80%), which leads to breakage and subsequent mutations and structural rearrangements (7, 17). Among the most important structural rearrangements are copy number variations (CNVs), which are deletions or amplifications of segments of the parasite’s haploid genome. Amplifications of copy number directly increase the transcriptional output of that region of the genome.
For example, amplification of pfmdr1, which encodes a small-molecule efflux pump, increases the parasite’s ability to export small molecules (i.e., drugs) from the cytosol (3). Amplification also creates a state of pseudo-polyploidy, which is integral to the accumulation of point mutations (3, 18). In this state, CNVs generated and maintained in populations have a higher chance of acquiring beneficial SNPs, which may persist in parasites even when the CNVs themselves are selected against (Fig. 2).

Figure 1: Parasite life cycle with an emphasis on selection during drug treatments in the blood stage.

Figure 2: Long-term resistance acquired through CNV genesis.

Formation of CNVs in Plasmodium: CNVs are canonically generated through imprecise repair of double strand breaks (DSBs) or through replication errors (19). DSBs are caused by damage from a variety of sources, including reactive oxygen species, which increase when the parasite undergoes prolonged stressful conditions such as anti-malarial drug stress. Because Plasmodium lacks a non-homologous end joining (NHEJ) repair pathway, initial CNV generation more likely proceeds through microhomology-mediated end joining (MMEJ). Our lab has found support for this in A-T rich monomeric repeats that form hairpins at CNV junction sites. These hairpins cause polymerase stalling or attract enzymatic activity, leading to double strand breaks that are repaired by MMEJ (Fig. 3) (20). The ubiquity of these sites across the genomes of multiple unrelated lines indicates their role in the development of structural rearrangements. Further, after an initial CNV is formed by MMEJ, intramolecular looping between the identical DNA regions of the CNV can cause rapid amplification through homologous recombination, despite the haploid nature of the parasite (7, 8).

Rare CNVs: The dynamics of CNVs in evolving populations are not well understood.
One reason for this is that most CNVs have been identified by analyzing bulk DNA from antimalarial-selected parasites, where CNVs are present in the majority of parasites. However, CNVs that occur at the single cell level, hereafter referred to as ‘rare’ CNVs, undoubtedly remain undetected. Contemporary investigations in other organisms have analyzed single cells to detect rare CNVs in populations (21–23). These low-frequency CNVs can arise without specific pressure from outside forces, and are thought to reflect random-chance genomic events mediated by existing cellular machinery. Rare CNVs can often be deleterious or offer no specific benefit in an ideal population, and are even selected against in the long term. However, due to the highly mutable nature of the Plasmodium genome, these rare events are more likely to occur than in other organisms (24). For this reason, we expect that parasites take advantage of the resulting increases in population-wide genomic heterogeneity (7). In particular, we hypothesize that under non-lethal, unselective stress, underlying mechanisms will increase the rate of rare CNV formation, thereby expanding the genetic diversity of the population. In turn, the population will be more likely to be ‘primed’ for a strong selective pressure, because it already includes one or more individuals with a relevant CNV to resist that stress. Importantly, selection for these individuals happens more rapidly than in the conventional stress-induced CNV-acquisition model, wherein CNVs are only generated after the application of strong selective forces. Stressing parasites will serve as our means of observing this phenomenon, as well as a reflection of the dynamics of real-world malarial cases.
Thus, it is the aim of this research to determine the impact of stress on the generation of rare CNVs.

Several methods are commonly used to detect CNVs: qPCR-based methods, microarrays, and whole genome sequencing (WGS) (25). Of these, only WGS effectively covers the whole genome, which is necessary given the random nature of rare CNVs. However, WGS performed on bulk DNA (i.e., DNA extracted from millions of cells) fails to detect even some subpopulational CNVs that occur in significant portions of the population (~10% or less). Sensitivity is lost in this scenario due to the averaging of signal; a real CNV existing on a single cell’s genome is treated as noise relative to millions of other cells’ genomes. Therefore, WGS must be done at the single cell level to detect rare CNVs.

Single Cell Sequencing: Single cell techniques are not new, but have become much more reliable and cost effective in recent years (23). They are used to assess heterogeneity in populations. However, single cell approaches in Plasmodium have fallen behind due to inherent challenges of the parasite genome; in particular, its AT richness precludes unbiased primer binding (26, 27). Fortunately, our lab has largely overcome these challenges through the application of a robust pipeline, and was the first to use single cell techniques to detect rare CNVs in the parasite genome (8). This is due to our use of a highly parasite-optimized whole genome amplification (WGA) procedure, MALBAC.

Figure 3: Generation of structural rearrangements at A-T rich tracts within the parasite genome.

Next generation sequencing, namely Illumina short-read (ISR) sequencing, requires 1 ng to 1 µg of genomic DNA, which is far more than a single parasite genome contains (~25 fg), and therefore some form of amplification is required. A commonly used method, multiple displacement amplification (MDA), has been used to successfully amplify single parasite genomes with high genome coverage to detect SNPs.
However, this method cannot be used to evaluate CNVs, due to low coverage uniformity and the generation of chimeric reads (28, 29). Our optimized multiple annealing and looping-based amplification cycles (MALBAC) procedure does not exhibit these issues.

Extrachromosomal DNA: EcDNA was identified in multiple kingdoms more than half a century ago (30). Only recently, however, has research interest in the subject reignited. Most models come from studies in human cancer, as ecDNA containing oncogenes is commonplace in many cancers (31). EcDNA is conventionally represented as a double-stranded, circular molecule between 1 and 30 kb (though it can be larger) that is acentromeric and atelomeric (32). However, there is notable variety in these characteristics across models, which makes the investigation of ecDNA structure and function challenging and context dependent. Unlike transposable elements, ecDNA has no known dedicated pathway of generation, and is instead thought to arise from chance genomic events (31, 32). The mechanisms that facilitate the genesis, maintenance, and replication of these molecules have been described in several organisms, but most cannot feasibly apply to Plasmodium ecDNA given the species’ unique genome. Specifically, its small haploid genome and lack of an NHEJ pathway preclude many of the postulated mechanisms. EcDNA in malaria is therefore likely novel in its most basic characteristics. While this work will primarily focus on ecDNA as a transcriptional hub, ecDNA has been noted to have several other functions, such as serving as enhancers and molecular sponges, undergoing gene fusion, and participating in cell signaling and communication (33). Because of increased interest in ecDNA, computational and molecular toolkits are being rapidly developed to identify and study ecDNA in a variety of models (34–36).

Extrachromosomal DNA in malaria: Our lab has recently identified ecDNA in the malaria parasite (37).
It carries multiple copies of an ~80 kb amplicon that contains the gene dihydroorotate dehydrogenase (dhodh), which encodes a critical component of the pyrimidine biosynthesis pathway. Dhodh amplifications are selected for by an anti-malarial compound, DSM1. Importantly, we saw that highly resistant lines that contained ecDNA had 10-fold higher resistance to DSM1, despite having a total copy number (including chromosomal copies of dhodh) similar to that of moderately resistant lines (Fig. 4). This indicates that ecDNA may confer a major adaptive advantage under drug stress. In this model, we found that approximately half of the amplicons were contained on the ecDNA. Removal of DSM1 caused total copy number to diminish, likely through rapid removal of ecDNA (37).

The basic mechanisms for maintenance of malarial ecDNA are still virtually unknown. One model for the genesis of ecDNA in parasites is through existing chromosomal CNVs. The state of pseudo-polyploidy allows for intramolecular looping, and ultimately budding off into a separate molecule (Fig. 5). Conversely, ecDNA has been shown to increase chromosomal copies through reintegration into the chromosome in other models, and in soon-to-be-published work in malaria (8, 38). Deep sequencing of enriched ecDNA revealed head-to-tail repeats of the same CNV previously described on the chromosome, with the same breakpoints. This is consistent with the MMEJ-based structural rearrangement model described above, as the junctional sites between each CNV provide sequence homology (Fig. 3). However, because the ecDNA and chromosomal copies of the dhodh amplicon share the same breakpoints and are identical in sequence, PCR- and sequencing-based methods are unable to differentiate between them. This drives the need for techniques that enrich for ecDNA and remove linear DNA prior to analysis.
Figure 4: Three lines of Plasmodium exhibit vastly different resistance to drug (DSM1) irrespective of dhodh copy number. The highly resistant clone’s resistance is indicative of the presence of ecDNA.

The abundance of this molecule implies that it has the ability to replicate. In Plasmodium, areas of high AT content can be used as an indefinite origin of replication, which qualifies nearly any intergenic region (90% AT on average). Further, sequencing also identified a highly amplified (5000x chromosomal coverage), A-T rich (88.2%), 714 bp sequence within the dhodh amplicon. The sequence is part of the coding region of SAC3, a gene which we believe to be inconsequential for resistance. Rather than an artifact of MDA amplification (a known phenomenon), we hypothesize this structure to be biologically relevant as an exaggerated origin of replication and/or a pseudo-centromere, due to its similarity to chromosomal centromeres (~2 kb, 97% AT content) (39, 40). Further investigation of this region is critical to understanding the function of ecDNA.

Research Design and Methods

CNVs are a major adaptive mechanism that shapes the way malaria evolves, which presents a challenge to anti-malarial drug treatment (9, 27, 41). Recent studies of CNVs have revealed many open questions, and I will apply new tools and techniques developed by our lab to improve our understanding of the lifecycle of CNVs in the malarial parasite. Does drug stress increase the rate at which random, rare CNVs arise in single parasite genomes (AIM 1)? Is extrachromosomal DNA an abundant adaptive mechanism that underlies the expansion of CNVs in populations under stress (AIM 2)?
AIM 1 - Determine if CNVs are a preemptive adaptive mechanism in the malaria parasite

The parasite’s genome is naturally prone to generating CNVs in several different ways, including a high frequency of AT-rich features that are involved in their formation (23). Because of these inherent genomic features, there is a question of whether CNVs are randomly and constitutively generated across the genome at a low rate in order to “prime” the parasites for a sudden need to adapt to environmental changes. Further, there is strong evidence that environmental stress increases the rate of this CNV generation (19) and prompts the acquisition of beneficial structural rearrangements. I will study the relationship between stress and the rate of rare CNV formation.

To investigate this phenomenon, single cell whole genome sequencing must be used, as sequencing bulk DNA diminishes our ability to detect rare CNVs. Single cell genomics has several complications that can limit the quality of the results. Fortunately, our lab has well-established pipelines and techniques to mitigate these problems. Single cell isolation followed by short-read sequencing has been used to detect CNVs in a variety of organisms. This approach facilitates the detection of subpopulational CNVs because the rare CNV signal from a single parasite is not averaged into background noise, as it is in bulk DNA analysis. However, genomic amplification in Plasmodium presents a particular challenge due to the parasite’s haploid nature, small genome, and A-T richness (8). Our lab has recently developed a whole genome amplification (WGA) technique as part of a WGS pipeline that overcomes these challenges (Fig. 6). Here, I will use these methods to determine the relationship between cellular stress and rare CNV generation in parasite populations.
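The signal-averaging argument for single cell sequencing can be illustrated with simple arithmetic. The sketch below is illustrative only and is not part of our analysis pipeline: the function name bulk_depth_ratio and the example carrier fractions are assumptions, and it presumes a haploid genome (one baseline copy) and ideal, noise-free read depth.

```python
def bulk_depth_ratio(carrier_fraction, copies_in_carrier, baseline_copies=1):
    """Expected read-depth ratio at a locus in a pooled (bulk) sample.

    carrier_fraction  -- fraction of genomes in the pool carrying the CNV (0..1)
    copies_in_carrier -- copy number of the locus in carrier genomes
    baseline_copies   -- copy number in non-carriers (1 for a haploid genome)
    """
    if not 0.0 <= carrier_fraction <= 1.0:
        raise ValueError("carrier_fraction must be between 0 and 1")
    # Bulk sequencing reports the population-average copy number.
    mean_copies = ((1.0 - carrier_fraction) * baseline_copies
                   + carrier_fraction * copies_in_carrier)
    return mean_copies / baseline_copies


if __name__ == "__main__":
    # Hypothetical carrier fractions: one single cell, 10% of a bulk
    # population, and one parasite in a million.
    for fraction in (1.0, 0.1, 1e-6):
        ratio = bulk_depth_ratio(fraction, copies_in_carrier=2)
        print(f"carrier fraction {fraction:g}: depth ratio {ratio:.6f}")
```

Under these assumptions, a duplication carried by 10% of parasites shifts bulk read depth by only about 10%, and one carried by a single parasite in a million is indistinguishable from the baseline, whereas the same duplication doubles read depth when the carrier cell is sequenced alone.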
While single cell genomics is ideal for detecting rare CNVs at the single parasite level, using only one sequencing technique limits the robustness of our results. Therefore, I propose to use a tandem sequencing approach, combining short-read next generation sequencing (NGS) of single cells with Nanopore long-read sequencing, to provide a more robust result focused on the overall change in the number of CNVs per genome with and without drug stress. While different kinds of information are derived from each approach, they can be reconciled to improve confidence in our results.

Figure 5: Generation of ecDNA through intramolecular looping and microhomology-based repair.

Our established single cell genomics pipeline: Due to the nature of the parasite’s lifecycle (Fig. 1), a single trophozoite can contain more than 20 copies of the genome at a time, inhibiting our ability to achieve true single cell genomics, the only scenario in which our signal-to-noise ratio is maximized. To circumvent this, we synchronize and enrich for early-stage parasites (which contain only one copy of the genome). We then isolate early-stage parasites using microscopy or flow sorting. Since short-read sequencing requires nanogram levels of genomic material for library construction, we then conduct WGA through MALBAC. MALBAC involves a quasi-linear pre-amplification, which reduces the bias associated with exponential amplification (42). Because MALBAC was originally designed for human cells, our lab altered the original protocol to suit the small, AT-rich parasite genome through the use of a low-temperature polymerase and AT-rich degenerate primers, which help prevent polymerase bias (8). Following MALBAC, we construct libraries for Illumina short-read sequencing; we ensure equal fragment length and proper adapter ligation using minimal pre-sequencing amplification (4 cycles) in order to reduce further bias.
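Two quantitative points in this pipeline can be sketched with back-of-the-envelope arithmetic: how much amplification a single ~25 fg genome needs to reach the ~1 ng sequencing input, and why exponential amplification compounds small per-cycle efficiency differences more than linear amplification does. The sketch below is a toy model, not a description of MALBAC chemistry; the function names and the per-cycle efficiencies (0.9 and 1.0) are hypothetical values chosen only for illustration.

```python
import math


def fold_amplification_needed(input_fg, required_ng):
    """Fold increase needed to go from input_fg femtograms to required_ng nanograms."""
    required_fg = required_ng * 1e6  # 1 ng = 1,000,000 fg
    return required_fg / input_fg


def doublings_needed(fold):
    """Number of perfect doublings that would give the requested fold increase."""
    return math.log2(fold)


def exponential_bias(eff_a, eff_b, cycles):
    """Ratio of locus A to locus B product when every product is re-copied each cycle."""
    return ((1.0 + eff_a) / (1.0 + eff_b)) ** cycles


def linear_bias(eff_a, eff_b, cycles):
    """Ratio of locus A to locus B product when only the original template is copied."""
    return (1.0 + eff_a * cycles) / (1.0 + eff_b * cycles)


if __name__ == "__main__":
    fold = fold_amplification_needed(input_fg=25, required_ng=1)
    print(f"fold amplification needed: {fold:,.0f}")
    print(f"equivalent perfect doublings: {doublings_needed(fold):.1f}")
    # Hypothetical loci: A copies at 90% efficiency per cycle, B at 100%.
    print(f"exponential A/B ratio after 15 cycles: {exponential_bias(0.9, 1.0, 15):.3f}")
    print(f"linear A/B ratio after 15 cycles: {linear_bias(0.9, 1.0, 15):.3f}")
```

In this idealised model the under-amplified locus falls to less than half the coverage of its neighbour under exponential amplification, while under linear amplification it stays near 90%. Real protocols are only quasi-linear during pre-amplification and are followed by PCR, so the actual gain in uniformity will differ.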
For analysis, we use two methods of CNV detection (breakpoint-based and read depth-based) and then assess the overlap between the two to limit the chance of false positives.

Improving our single cell pipeline: Since publishing this pipeline, we have introduced improvements to increase throughput and limit error. We have integrated automated robotic pipetting (Mosquito LV) into the MALBAC procedure, as well as flow sorting to increase parasite isolation efficiency (Preliminary Data - Fig. 7). Additionally, before sequencing, we conduct a number of quality control steps to ensure that only the highest quality samples proceed to analysis, while also keeping in mind the need for higher sample throughput. These include gels to assess levels of amplification, High Resolution Melt analysis to detect potential cross-contamination events, and ddPCR (see Aim 2) to calculate coverage uniformity at several loci (8). We are also collaborating with a bioinformatics group at the University of South Florida (USF) to design an improved pipeline specifically for analyzing CNVs of haploid, single-cell origin. Their strategy relies on having a high enough throughput of samples (>40 per treatment group) to normalize the data based on the reproducible read coverage bias that results from uneven priming (8).

Preliminary Data 1: During my initial studies on this project, I spearheaded improvement of our cell isolation step to ensure high throughput and reliability. We have historically had near 100% successful amplification of parasite DNA when sorting 2 cells into a well, but when sorting 1 cell, our success rate falls to ~15%. This problem stems from either A) failure of the cell sorter to successfully deposit a droplet containing an iRBC or B) failure to amplify after an iRBC is deposited.
To rule out the former, we designed an experiment in which late-stage parasites (>10n) were sorted into a 96-well plate containing reagents for a WGA method that is more sensitive (but less uniform) than MALBAC (Fig. 7B). Either 1, 2, or 10 cells were sorted into each well. Following WGA, samples were analyzed using qPCR to measure the presence of two targets: Cytochrome-B (Cyt-B), a ~17-copy mitochondrial gene, and HSP70, a single-copy chromosomal gene. We also assessed droplet placement accuracy using a separate colorimetric assay, in which a well turns blue if the droplet is successfully deposited in it (Fig. 7A). First, we found that the flow sorter can have variable accuracy in terms of droplet placement in the wells. Fortunately, it can be recalibrated to achieve perfect accuracy (Fig. 7A). We will incorporate this recalibration step in all future experiments to ensure sorting success. Second, the two-target qPCR revealed that, in 80% of the wells, parasite DNA was deposited and amplified. Positive controls consisting of 2 or more cells had a 100% detection rate, validating our previous results (Fig. 7B).

Figure 6: Overview of Single Cell Pipeline.

Figure 7: Accuracy assessment of flow sorting. A colorimetric assay was used to verify proper droplet deposition in the wells, regardless of whether the droplet contained a cell. Recalibration of the flow sorter improved accuracy (A). A WGA kit followed by a two-target qPCR was used to determine the presence of parasite DNA. Darker blue wells indicate positive controls with the respective number of cells sorted into them. Results are recorded in a table to the right (B).

In current experiments (data not shown), we have seen similar results. Three plates with different cell lines and treatments were used. Each had 54 samples containing two ring-stage parasites that were sorted into a 96-well plate and then amplified using our MALBAC procedure.
Using ddPCR to assess coverage at two different loci as in Liu et al., we detected 100% amplification efficiency with relatively uniform coverage in two of the plates.

Tandem sequencing with Nanopore long reads: While NGS is highly accurate and relatively inexpensive, its application to detecting CNVs in single cell genomics is limited by a signal-to-noise ratio that requires bioinformatic processing to increase our confidence in correctly identified CNVs. The key feature of Nanopore long-read sequencing is that, unlike NGS, it does not rely on PCR amplification. Amplification bias diminishes our confidence in CNV calling from short-read sequencing, whereas intact genomic fragments in Nanopore long-read sequencing are sequenced as is, giving us a direct view of the sequence of the sample. This approach is especially adept at capturing structural rearrangements within genomes, as the longer reads provide an accurate depiction of the intact DNA structure. With Nanopore long-read sequencing, our lab has seen reads as long as 1 Mb, long enough to span one of the smaller parasite chromosomes in its entirety. Our lab has established a Plasmodium-optimized pipeline for the detection of CNVs in our parasites with and without drug stress (Table 1). Experiments and data acquisition for this pipeline will largely be carried out by other members of the lab, but the final comparisons and conclusions between NGS and Nanopore long-read data will fall under my purview.

Preliminary Data 2: Drug treatment dosage and parasite staging present a potential problem for consistency and reproducibility in both single cell and long-read techniques. This could affect our ability to properly stimulate increased rare CNV generation. Fortunately, in a pilot Nanopore long-read experiment (performed principally by Dr. Shiwei Liu), we have shown that we can achieve an appropriate treatment dosage and duration to detect changes in rates of structural variation (Table 1).
Parasites were treated with either 10 x EC50 DSM1 (antimalarial) or 24 x EC50 aphidicolin (DNA replication inhibitor) for 12 hours. We achieved high coverage, quantified various structural variations, and found 4-fold and 9-fold differences between untreated and treated groups for DSM1 and aphidicolin, respectively. Through this analysis, we can begin to predict which mechanisms underlie the differences between treatments and, ultimately, which circumstances maximize the rate of rare CNV generation. The treatment dosage and timing from this experiment will inform future NGS and Nanopore long-read experiments.

Table 1: Pilot Nanopore long-read experiment with DSM1 and aphidicolin.

Planned Studies

Impact of stress on CNV rate: I will treat laboratory lines of Plasmodium that are known to undergo rare CNV formation with two antimalarials, artemisinin and DSM1, at sub-lethal levels. By comparing these two drugs, we will begin to gather information about the types of stress that induce rare CNVs, or whether the kind of stress matters at all. These compounds act in very different ways: artemisinin is a fast-acting drug, while DSM1 causes a ‘delayed parasite death’. Additionally, they are administered at very different concentrations in clinical applications. Therefore, we will apply different doses of each drug according to the following regimen: low (EC50), intermediate (10 x EC50), and high (100 x EC50) for 2-4 hours (artemisinin) or 12 hours (DSM1) in at least 40 cells. As a positive control, we will treat the parasites with aphidicolin, a DNA replication inhibitor that has been shown to induce CNVs through several mechanisms (43). Using flow cytometry, we can verify and normalize for differences in parasite growth and health for the purposes of reproducibility. Treated parasites will be isolated and analyzed concurrently with an untreated group of the same cell line (Fig. 8).
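As a rough illustration of the regimen above and of how a treated group could be compared against its untreated counterpart, the following Python sketch enumerates the drug-by-dose arms and computes a fold change in CNVs per genome. The names (TreatmentArm, build_arms, cnv_rate, fold_change) are hypothetical and not part of our pipeline. The dose multiples, exposure windows, and 40-cell minimum are taken from the regimen in the text, and any counts passed in would come from the CNV calls themselves.

```python
from dataclasses import dataclass
from itertools import product

# Dose multiples and exposure windows follow the regimen described in the text.
# All identifiers here are hypothetical and used only for illustration.
DOSE_X_EC50 = {"low": 1, "intermediate": 10, "high": 100}
EXPOSURE_HOURS = {"artemisinin": (2, 4), "DSM1": (12, 12)}
MIN_CELLS_PER_ARM = 40


@dataclass(frozen=True)
class TreatmentArm:
    drug: str
    dose_label: str
    dose_x_ec50: int
    min_hours: int
    max_hours: int
    min_cells: int = MIN_CELLS_PER_ARM


def build_arms():
    """Enumerate every drug-by-dose combination in the planned regimen."""
    arms = []
    for drug, (label, multiple) in product(EXPOSURE_HOURS, DOSE_X_EC50.items()):
        shortest, longest = EXPOSURE_HOURS[drug]
        arms.append(TreatmentArm(drug, label, multiple, shortest, longest))
    return arms


def cnv_rate(cnv_count, genomes):
    """Rare CNVs per genome analyzed."""
    if genomes <= 0:
        raise ValueError("genomes must be positive")
    return cnv_count / genomes


def fold_change(treated_rate, untreated_rate):
    """Treated rate expressed relative to the untreated ('basal') rate."""
    if untreated_rate <= 0:
        raise ValueError("untreated rate must be positive to form a ratio")
    return treated_rate / untreated_rate
```

For example, fold_change(cnv_rate(8, 40), cnv_rate(2, 40)) corresponds to a four-fold increase over the untreated group; these counts are placeholders chosen only to show the arithmetic and are not results.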
The untreated group will provide the ‘basal’ CNV rate for comparison. Sample treatment for Nanopore long-read will be carried out in the same manner. The only difference is that, instead of ~40 single cell replicates, Nanopore long-read sequencing will require 3 biological replicates of bulk sample. In fact, because single cell sequencing requires relatively little material, treated (and untreated) cultures can be split to produce both NGS and Nanopore long-read data, minimizing cross-experiment variance (Fig. 8).

NGS will be performed by the UVA genomics core using methods previously described (8). I will process raw sequence files and assess quality and evenness of coverage using several tools from the UVA Rivanna suite: BBTools, FastQC, Qualimap, and Samtools. The resulting files will then be shared with our collaborators at USF, Feifei Xiao and Xuanxuan Yu, to undergo a custom CNV analysis approach. CNVs called by their method will then be scrutinized and reassessed biologically using qPCR on the remaining amplified DNA sample, with primers designed to target the detected CNVs.

Our Nanopore long-read analysis pipeline is based on an R-Shiny applet shared by our collaborator, Emily Ebel (38). It allows for direct visualization by first BLASTing 0.5-1 kb ‘blocks’ of the reference genome against the reads and then plotting the read on the x-axis and the reference gene ‘blocks’ on the y-axis (38). Next, we filter out low-quality reads (<90% identity) to remove reads that have a high chance of producing false positive CNVs. Reads are then filtered to those that have multiple gene ‘block’ hits on the same read. The nature of the visual output allows us to identify CNVs manually by eye.

Impact of parasite origin on CNV rate: Clinical and lab lines of malaria differ in their levels of within-host competition, ultimately leading to differences between the two.
Notably, clinically derived parasites often have a large variety of phenotypes, increasing populational diversity and adaptability (44). Further, many CNVs of unknown clinical or adaptive significance that are not detected in laboratory parasites have been discovered in global surveys of field populations (27). Thus, it is possible that laboratory and clinical parasite strains carry distinct basal and stress-triggered amplification profiles that affect their ability to acquire antimalarial resistance.

To deepen the impact of our conclusions, we aim to perform a similar analysis as above on clinically isolated samples. Nanopore long-read sequencing (NRL) would be a likely first approach if the patient is untreated, due to the relative simplicity of the protocol and its virtues for detecting structural rearrangements. Samples that have already been treated with an anti-malarial will likely have to be used in a single cell pipeline, as most drugs in clinical settings will reduce parasitemia to an unusable level for NRL. However, the ideal situation would be one where we could perform both NGS and NRL (Fig. 8). Our collaborator in the UVA hospital (Dr. Chris Moore) informs us when a malaria-infected patient has been identified; we have a standing IRB protocol to receive blood left over from clinical testing for our experiments. These parasites can be used in our analysis in two ways: 1) iRBCs received from an antimalarial-treated patient will serve as a real-world parallel to our lab-line studies. We can compare the CNVs per genome of these clinical samples to those of our lab lines or to non-treated clinical samples. 2) If we receive iRBCs from a patient who has not been exposed to anti-malarials, we can incubate and treat these parasites using our own methods and analysis.

Correlation of short and long read data for detection of CNVs: There are fundamental differences between short and long read sequencing that need to be reconciled to quantify CNVs.
Parasites from the same biological replicate will be split to generate both short and long read sequencing data. Long read sequencing is loaded with bulk DNA, meaning that individual genomes cannot be distinguished as they can in short read sequencing. Therefore, we take the total number of ‘genomes’ in long read sequencing to be the normalized average coverage across 1 kb bins, whereas a single genome for Illumina short reads is simply the single cell it is derived from. These approaches also have different mechanisms for detecting CNVs, which affects the sensitivity of each approach. Therefore, using data from both approaches to calculate the precise number of CNVs in each treatment group is uninformative. Instead, we will focus on the relative comparison of the number of CNVs detected before and after treatment for each technique. We predict that the difference between rare CNV levels pre- and post-treatment will be consistent between techniques, giving us a concrete scalar to represent the effect of drug stress on population-wide CNV formation in our parasites. Further, our commonly used cell lines carry known CNVs that will serve as an internal control for our ability to accurately detect the position and number of CNVs. We will also consider the frequency of CNV occurrence at a particular location in the genome to determine whether certain treatments have repeatable effects on structural rearrangements. We will also investigate potential sequence homologies of our observed rare CNVs to determine whether there is a structural basis for the propensity of CNVs to form. Finally, the function of the genic region within each observed CNV will be noted and an attempt at correlation will be made.

Figure 8: Collection of experimental lines to minimize intra-experimental variance.

Expected Outcomes

I expect that, upon treatment, the rat