Kylie Munyard B.Sc. (Hons) PhD
The production of the complete alpaca genome sequence by the researchers at the Laboratory for Genomic Diversity, University of Maryland, is a wonderful achievement, and one of incredible value to camelid scientists. It isn't, sadly, the answer to every question about the genetics of alpacas. Before we can use the sequence we need to interpret it. We need to identify which pieces of DNA are genes, which are regulatory regions, which are non-functional or of no known function, and which are markers that can be used for further study. Such interpretation is called "annotation" and is a time-consuming and difficult process. Luckily for those of us interested in working on the genetics of alpacas, there are a number of other genomes with just that kind of information available, most notably the human, but also the mouse, dog and cattle genomes. We can take advantage of the information from other species to help us interpret the information in the alpaca sequence.
However, even if the entire alpaca sequence were completely annotated we would still need much more research. Why? Because it is the sequence from a single alpaca. We can identify specific genes in that single animal, but that doesn't tell us anything about how those genes vary in the population as a whole. The good news is that we can use the sequence to increase the rate of progress of genetic research.
As an example, consider the path that our research has taken in the area of alpaca fibre colour. This research was done before the alpaca sequence was available. We wanted to study the melanocortin-1 receptor gene (MC1R), which is one of the key genes in pigmentation in all mammals. The MC1R gene is well characterised in humans, cows, and mice. That means that the sequence, plus the phenotypic effects of the gene's alleles, were well known. So, we identified the MC1R sequence from each of these species and compared them. What we were looking for was a region of sequence that was identical across all three species, a so-called "consensus" sequence (Figure 1). Why? Because that would tell us that that particular piece of sequence was highly conserved across some quite diverse species. The logical conclusion was that it was likely to be the same in alpacas too. We needed to find this conserved sequence in order to use a molecular genetics technique called Polymerase Chain Reaction, or PCR. PCR is a DNA photocopying method which can make millions of copies of a particular piece of DNA. The only drawback is that in order to be able to perform PCR, you need to know a little bit about the sequence first, hence our need to identify some consensus sequence. The good news is that as little as 15 known bases at each end of the desired sequence (called PCR primer sites) is enough to allow PCR to be performed. Once that particular piece of DNA, called the target DNA, has been copied by PCR, it is a simple procedure to obtain its complete DNA sequence. This procedure is relatively cheap, so the next step is to obtain the sequence of the MC1R gene from a large number of alpacas.
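The consensus search described above can be illustrated with a short Python sketch. It assumes the sequences have already been aligned so that they are all the same length. The function name `conserved_runs` and the toy sequences are invented for illustration and are not real MC1R data.

```python
def conserved_runs(aligned_seqs, min_len=15):
    """Return (start, end) pairs for stretches where every sequence is identical.

    `end` is exclusive. Only stretches of at least `min_len` bases are kept,
    reflecting the roughly 15 known bases needed for a PCR primer site.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    runs = []
    run_start = None
    for i in range(length):
        column = {s[i] for s in aligned_seqs}
        identical = len(column) == 1 and "-" not in column  # "-" marks an alignment gap
        if identical and run_start is None:
            run_start = i
        elif not identical and run_start is not None:
            if i - run_start >= min_len:
                runs.append((run_start, i))
            run_start = None
    # Close a run that reaches the end of the alignment.
    if run_start is not None and length - run_start >= min_len:
        runs.append((run_start, length))
    return runs


# Invented toy data: two shared blocks separated by three variable bases.
left = "ATGGCTGTGCAGGGA"    # 15 bases, identical in all three "species"
right = "CAGAGAAGACTTCTGG"  # 16 bases, identical in all three "species"
human = left + "TCC" + right
cow = left + "TTC" + right
mouse = left + "ACG" + right

runs = conserved_runs([human, cow, mouse])
print(runs)
```

Each reported stretch is a candidate primer site. In practice the alignment itself would be produced by dedicated software, and primer choice involves more than identity alone.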
Once we have those sequences, we compare them. We note any regions of DNA that are different between the different alpacas (Figure 2). We then look to see if any of those changes might be capable of causing a change to the protein that is coded for by the MC1R gene. Differences can include the way that the gene is regulated, or something as simple as a single letter change in the DNA that causes a change in an amino acid (the building blocks of proteins) at a key point in the protein. Once all the probable causative changes (or mutations or polymorphisms) have been identified, we look at the phenotype and pedigree of the animals themselves. What we are looking for is a change that is always associated with a particular colour, and is never associated with any other colour. We try to match mutations with effects. Sometimes, you need to combine mutations to get any good matches. In our research, we found two mutations that seem to be correlated with particular colours. Both of these changes were capable of causing the MC1R protein to lose its function. Taken together, we saw that a particular combination of these two was always present in alpacas that had non-black skin (Figure 3). The interesting thing was that in this group of animals the fibre colour ranged from dark brown to white! In the other group, the black-skinned group, the colours ranged from black to white. This puzzled us, until we considered information about how MC1R works in other species. We realised that what we were seeing might be a combination of mutations that produced a loss-of-function MC1R allele. When MC1R is fully functional, it, along with the Agouti signalling protein, controls the proportion of black versus yellow pigment granules in the skin and hair (many, many other genes are involved in pigmentation as well, but these two are the central ones). When MC1R is non-functional, only yellow pigment can be produced.
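The comparison step can be sketched in Python as follows. This is a deliberately simplified illustration: it treats each animal as having a single sequence (real animals carry two copies of each gene), and the animals, sequences and phenotype labels are invented rather than taken from our data. The function names are hypothetical.

```python
def variable_sites(seqs):
    """Return {position: {animal: base}} for positions where the animals differ."""
    names = list(seqs)
    length = len(seqs[names[0]])
    sites = {}
    for i in range(length):
        bases = {name: seqs[name][i] for name in names}
        if len(set(bases.values())) > 1:
            sites[i] = bases
    return sites


def perfectly_associated(site_bases, phenotype):
    """True if each base occurs in only one phenotype group, and each group shows only one base."""
    base_to_groups = {}
    group_to_bases = {}
    for animal, base in site_bases.items():
        group = phenotype[animal]
        base_to_groups.setdefault(base, set()).add(group)
        group_to_bases.setdefault(group, set()).add(base)
    one_group_per_base = all(len(g) == 1 for g in base_to_groups.values())
    one_base_per_group = all(len(b) == 1 for b in group_to_bases.values())
    return one_group_per_base and one_base_per_group


# Invented toy data, for illustration only.
seqs = {
    "alpaca1": "ACGTACGTAC",
    "alpaca2": "ACGTACGTAC",
    "alpaca3": "ACGAACGTAC",
    "alpaca4": "ACGAACGTGC",
}
phenotype = {
    "alpaca1": "black skin",
    "alpaca2": "black skin",
    "alpaca3": "non-black skin",
    "alpaca4": "non-black skin",
}

sites = variable_sites(seqs)
for position, bases in sorted(sites.items()):
    print(position, perfectly_associated(bases, phenotype))
```

In this toy data the change at position 3 tracks skin colour exactly, while the change at position 8 does not, so only the first would be worth following up.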
Yellow pigment granules in the skin and hair of mammals can range in appearance from white to very dark brown, depending on the action of lots of other genes. Our hypothesis is that we have found a non-functional MC1R allele. Non-functional MC1R alleles are recessive to normally functioning MC1R. This means that if an animal has one copy of each (i.e. is heterozygous) it will appear to be normal, but can pass on a non-functional allele to its cria. We use the letters E and e to denote functional and non-functional MC1R alleles respectively. The letter "E" comes from the historical naming of the MC1R locus as the Extension locus, before the exact gene was known. So, a homozygous functional MC1R animal has the genotype EE, and will be whatever colour its Agouti gene dictates. A homozygous non-functional MC1R animal has the genotype ee, and will be only able to produce yellow pigment (of whatever intensity). A heterozygous animal will be Ee, and it will also be the colour dictated by its Agouti gene. We have sequenced the MC1R gene from over 50 Australian alpacas, and so far this hypothesis fits the data. Our work is continuing; we now need to look at families of alpacas and ensure that our genotypes match the observed phenotypes.
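The consequences of this recessive inheritance for a mating can be worked through with a small Python sketch. It assumes simple Mendelian inheritance, in which each parent passes on either of its two alleles with equal probability. The function names `cross` and `yellow_only` are invented for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Return the expected proportion of each cria genotype from two parents.

    Genotypes are two-letter strings such as "EE", "Ee" or "ee".
    """
    combos = Counter(
        "".join(sorted(a + b))  # sorting puts "E" before "e", so "eE" becomes "Ee"
        for a, b in product(parent1, parent2)
    )
    total = sum(combos.values())
    return {genotype: Fraction(count, total) for genotype, count in combos.items()}


def yellow_only(genotype):
    """Under the hypothesis in the text, only ee animals are restricted to yellow pigment."""
    return genotype == "ee"


carriers = cross("Ee", "Ee")
for genotype, share in carriers.items():
    print(genotype, share, "yellow pigment only" if yellow_only(genotype) else "colour set by Agouti")
```

Two heterozygous parents, both of normal appearance, are therefore expected to produce ee cria one time in four on average.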
A further line of enquiry has led from these results, though, and this is where the existence of alpaca genome sequence makes life much simpler for us. We noted that alpacas with different intensities of fibre colour (i.e. white, pale fawn, chestnut and brown) all had the same MC1R genotype (Figure 4). This means that the intensity of fibre colour is not controlled by the MC1R gene. Quite a few genes are known in other species that act to dilute colour intensity. You are probably aware of the relationship between chestnut and palomino in horses. This is caused by the presence of one mutated allele of a gene called Membrane Associated Transport Protein, or MATP. Two copies of the mutated allele lead to very pale, almost white "cremello" horses. This pattern of dilution seemed like a good model for alpacas. MATP mutations are responsible for the dilution of bay horses to buckskin, as well as some kinds of albinism in humans.
The sequence of MATP is well known in humans, mice, horses and cows. So, we found that sequence. Then, instead of comparing the sequences of all those species, we turned to the alpaca genome sequence. We used the sequence from the cow to "fish out" matching sequences from the alpaca genome. We then designed PCR primers to match the alpaca gene sequence exactly, and to amplify it. The benefit of working this way is that you know that you will be able to amplify the alpaca DNA. We are currently working on sequencing the protein coding regions of the MATP gene (the exons) and hope to have results by the end of the year.
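The "fishing" and primer design steps can be sketched in Python as follows. This is a toy illustration only: real searches use dedicated sequence-search software, and real primer design also weighs factors such as melting temperature. The sequences are invented, the function names are hypothetical, and the primer length is shortened to four bases so the example stays readable.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite DNA strand, read in the usual direction."""
    return seq.translate(COMPLEMENT)[::-1]


def best_match(query, genome):
    """Slide the query along the genome and return (start, identity) of the closest window."""
    best_start, best_identity = 0, -1.0
    for start in range(len(genome) - len(query) + 1):
        window = genome[start:start + len(query)]
        matches = sum(q == g for q, g in zip(query, window))
        identity = matches / len(query)
        if identity > best_identity:
            best_start, best_identity = start, identity
    return best_start, best_identity


def simple_primers(target, primer_len=15):
    """Forward primer copies the start of the target; reverse primer is the
    reverse complement of its end."""
    forward = target[:primer_len]
    reverse = reverse_complement(target[-primer_len:])
    return forward, reverse


# Invented toy data: a short "alpaca" fragment and a "cow" query with one mismatch.
alpaca_fragment = "TTTTT" + "ACGTTGCAAG" + "CCCCC"
cow_query = "ACGTTGGAAG"

start, identity = best_match(cow_query, alpaca_fragment)
alpaca_target = alpaca_fragment[start:start + len(cow_query)]
forward, reverse = simple_primers(alpaca_target, primer_len=4)
print(start, identity, forward, reverse)
```

The point of the sketch is that the primers are built from the alpaca sequence that was found, not from the cow query, so they match the alpaca DNA exactly even where the two species differ.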
The approach to finding genes that control traits that I have outlined so far is called the candidate gene approach. This is only one way to find genes that are responsible for desired (or deleterious) traits. If you don't have the luxury of detailed information about a trait, as we do for colour genetics, then a different approach is needed, called linkage analysis. And again, the alpaca genome sequence is very, very useful for this approach.
Along with all the useful genes in our genome are a number of types of DNA that are called markers. Markers come in a variety of types, but the essence of them is that they allow the genome to be broken into manageable pieces. Consider a comparison between the size of the genome and the size of a small library. A mammalian genome is equivalent to 120 books, each with 1000 pages, with each page containing 25,000 letters of DNA. That's a lot to search through. It's especially difficult when you don't even know what the gene looks like. Another analogy is that of a street directory. Searching blindly through a mammalian genome is like asking someone to find a particular building when they don't know what the building looks like, which street it's on, or even which city it is in. The markers in this example are the road names, and the symbols on the map. The complete sequence of the alpaca genome will allow us to identify thousands of markers, and to use them to find genes of interest.
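The library comparison can be checked with a few lines of Python, using only the figures given above. The variable names are chosen for illustration.

```python
books = 120
pages_per_book = 1000
letters_per_page = 25_000

total_letters = books * pages_per_book * letters_per_page
print(f"{total_letters:,} letters of DNA")
```

That works out to three billion letters, the approximate size of a mammalian genome.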
How do scientists use markers to find genes? Well, the first step is to get a family of alpacas in which the trait of interest is apparent. This could be colour, a particular disease, or even ear shape. The nature of the trait doesn't matter; the only criterion is that the trait must first be established to actually be inherited. For each member in the family (and it's better to have more than one family) you need a thorough phenotype, and a pedigree. Each member is then assigned a code designating its status for the trait being studied, and a pedigree is created. Scientific pedigrees differ from breeder pedigrees in that the names of the animals are unimportant; only the trait status and relationship to the other animals matter. This pedigree is the framework upon which the DNA science is based. It is critical that the data in the pedigree is accurate and complete. This is where the relationship between breeders and scientists is so vitally important. Without the breeders, the science would be glacially slow, if it were possible at all.
The next step is to genotype each animal in the pedigree for the complete set of markers. This is an expensive process. Each individual in the study families is analysed at the site of each marker, and its genotype is recorded. Every marker has different alleles, just like a gene does. So, for a given marker, animal A might have alleles 1 and 2, while animal B might have alleles 3 and 4. The next step is to look for correlation between a particular marker allele and the trait of interest. For example, we might be looking for the appaloosa gene. So, our family would consist of members with and without appaloosa markings. If the animals with appaloosa markings tend to have a particular allele at a particular marker, and the non-appaloosa animals tend to have a different one, that is a good sign. What this result means is that the particular marker might be near to the gene that controls appaloosa. If alleles from two or more markers show this same correlation, then the gene of interest is very likely to be near those markers. This narrows the search area from the whole of the genome to a much smaller, specific area. Once the target region in the genome is identified, we can search through the area for genes that might be capable of affecting (or causing) the trait of interest. Depending on the density of markers used to define it, this search area might include tens or even hundreds of genes. Then, for each possible gene, we go through the whole candidate gene process, until we find the gene and the specific mutation responsible for the trait. For many traits, a single gene is responsible; however, for most of the traits of interest to alpaca breeders (e.g. fibre diameter, lustre) it is probable that more than one gene is involved. The linkage analysis approach will help to identify multiple genes as easily as single genes.
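The search for a correlation between marker alleles and a trait can be sketched in Python as follows. This is a much-simplified illustration: it only compares allele frequencies between animals with and without the trait, whereas a real linkage analysis also uses the family relationships recorded in the pedigree. The animals, markers and genotypes are invented, and the function name is hypothetical.

```python
from collections import Counter


def allele_frequency_gap(genotypes, has_trait):
    """For each marker, find the allele whose frequency differs most between
    animals with and without the trait.

    genotypes: {animal: {marker: (allele, allele)}}
    has_trait: {animal: bool}
    Returns {marker: (allele, frequency_with_trait - frequency_without_trait)}.
    """
    markers = list(next(iter(genotypes.values())))
    result = {}
    for marker in markers:
        counts = {True: Counter(), False: Counter()}
        for animal, calls in genotypes.items():
            counts[has_trait[animal]].update(calls[marker])

        def freq(group, allele):
            total = sum(counts[group].values())
            return counts[group][allele] / total if total else 0.0

        alleles = sorted(set(counts[True]) | set(counts[False]))
        best = max(alleles, key=lambda a: abs(freq(True, a) - freq(False, a)))
        result[marker] = (best, freq(True, best) - freq(False, best))
    return result


# Invented toy data: two animals with the trait (A, B) and two without (C, D).
genotypes = {
    "A": {"M1": (1, 2), "M2": (1, 2)},
    "B": {"M1": (1, 3), "M2": (1, 2)},
    "C": {"M1": (3, 4), "M2": (1, 2)},
    "D": {"M1": (2, 3), "M2": (1, 2)},
}
has_trait = {"A": True, "B": True, "C": False, "D": False}

gaps = allele_frequency_gap(genotypes, has_trait)
print(gaps)
```

In this toy data allele 1 of marker M1 is seen only in animals with the trait, which would make M1 worth a closer look, while marker M2 shows no difference at all. With so few animals such a pattern could easily arise by chance, which is one reason large, well-documented families are needed.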
This article is a very, very brief introduction to some of the scientific processes that are used to find out about genes that control traits. However, I hope that you now have a feeling for why genetic research takes so long, costs so much, and is so dependent on data and samples from breeders.