Introduction
The alpaca is one of the two domestic South American camelids and is raised primarily for fiber production. Alpaca fiber is composed of alpha keratins and keratin-associated proteins (KAPs) (Powell & Rogers 1997). The latter are a complex group of proteins with high levels of either cysteine or glycine and tyrosine. These proteins are hypothesized to play an important role in defining the physical-mechanical properties of the fiber; therefore, sequence variation and variation in the expression of the genes encoding KAPs may be associated with variation in fiber quality (Gong et al. 2012, Powell & Rogers 1997). KAPs are divided into two main groups: those with high sulfur content and those with high glycine and tyrosine content. The first group can be separated into two subgroups: high-sulfur proteins, with less than 30% cysteine residues, and very high-sulfur (ultra-high-sulfur) proteins, with more than 30% cysteine residues (Gillespie & Broad 1972, Gillespie 1972).
The genetic improvement of alpacas through selective breeding is of interest to breeders and to the textile industry that processes alpaca fiber. Breeders aim to increase their revenue by increasing the quantity of fiber produced per sheared animal and by reducing the diameter of the individual fleece fibers. Meanwhile, the textile industry seeks a larger supply of extra-fine and fine fiber for the manufacture of textiles with higher market value (Gutiérrez 2008).
Single nucleotide polymorphisms (SNPs) are single-base differences in the DNA sequence among individuals of a species. These molecular markers are important tools for the genetic improvement of animal species through genomic selection. SNPs are valuable in association studies owing to their abundance and widespread distribution across the genome. They are also useful for establishing kinship relationships, as the majority of SNPs are unique to a given population (Pierce 2006).
Several previous studies have examined SNPs in alpacas. According to Jones et al. (2019), the gray color in alpacas is the result of a SNP, c.376G>A (p.Gly126Arg), located in exon 3 of the KIT gene. Likewise, Salas (2019) found 34 SNP-type variants in genes involved in the structure and development of alpaca fiber (KRT31, HOXC13 and EDAR); however, no association was found between the gene fragments analyzed and the diameter of the alpaca fiber. Foppiano (2016) identified 27 SNP markers in five genes of the keratin-associated protein family (KRTAP1-2, KRTAP6-1, KRTAP9-2, KRTAP11-1, KRTAP13-1) and conducted an association study between fiber diameter and two SNPs of the KRTAP11-1 gene, but did not find a significant association in a sample of 152 white Huacaya alpacas from Puno (Peru). Fernandez et al. (2019), using a bioinformatics analysis based on orthologous genes from sheep, goat, and human, identified 48 SNPs in 22 alpha keratin genes in alpacas. Finally, Calderon et al. (2021) reported the development of a 76K alpaca SNP microarray, which included 302 SNPs found in candidate genes for fiber quality and color.
The objective of this study was to identify single nucleotide polymorphism molecular markers within genes encoding keratin-associated proteins (KRTAPs) in alpacas. To identify reliable SNPs suitable for inclusion in an alpaca SNP microarray, the following quality parameters were considered: the absence of other SNPs in the flanking sequence (70 bp), the minor allele frequency, the genotyping rate (the proportion of samples with analyzed sequences), and the Illumina Score.
Material and methods
Genomic data. To identify SNPs by sequence comparison, we used the alpaca reference genome (VicPac3.1, GenBank accession number: GCA_000164845.4) and nine alpaca genomes available in NCBI. Three of these were generated by the Universidad Peruana Cayetano Heredia (accession number: PRJNA340289) and six by the Universidad Nacional Agraria La Molina (accession number: PRJNA685331) (Annex 1). We also used sequence reads from 300 reduced representation DNA libraries: 150 with DNA fragments generated by digestion with the ApeKI enzyme and 150 generated by double digestion with the PstI and MspI enzymes (Calderon et al. 2021).
The sequences, number of exons, and location (scaffold and chromosome) of the 34 keratin-associated protein genes reported in alpacas were obtained from the NCBI. The start and end positions of each exon, intron, and untranslated region were determined with the Sequence Viewer (https://www.ncbi.nlm.nih.gov/projects/sviewer/).
Identification of single nucleotide polymorphisms. Once the positions of each exon had been obtained, a coordinate file (BED file) of the exons, introns, and untranslated regions (UTRs) of each gene was generated. The sequencing data from the nine alpaca genomes and the 300 alpaca reduced representation DNA libraries described above were aligned to the VicPac3.1 reference genome using BWA v.0.7.10 (Li & Durbin 2010). SNPs were then identified using BCFtools (https://samtools.github.io/bcftools/bcftools.html).
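The coordinate file mentioned above can be illustrated with a minimal sketch. It assumes that the positions read from the viewer are 1-based and inclusive, whereas the BED format uses 0-based, half-open intervals. The function name, the feature name, and the coordinates in the usage note are hypothetical and are not taken from the study.

```python
def to_bed_line(scaffold: str, start_1based: int, end_1based: int, name: str) -> str:
    """Convert a 1-based inclusive interval into a tab-separated BED line.

    BED stores the start as 0-based and the end as exclusive, so only the
    start coordinate needs to be shifted by one.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return f"{scaffold}\t{start_1based - 1}\t{end_1based}\t{name}"


def write_bed(features, path):
    """Write (scaffold, start, end, name) tuples to a BED file, one per line."""
    with open(path, "w") as handle:
        for scaffold, start, end, name in features:
            handle.write(to_bed_line(scaffold, start, end, name) + "\n")
```

For example, a hypothetical exon spanning positions 101 to 200 of a scaffold would be written with start 100 and end 200, which keeps the interval length at 100 bp.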
Selection of SNPs. A reliable SNP should be devoid of other SNPs in its flanking sequences (35 bp 5’ of the SNP and 35 bp 3’ of the SNP, for a total of 71 bp including the SNP). Selection of SNPs for inclusion in a microarray requires a minor allele frequency (MAF) ≥ 0.05, a genotyping rate > 45%, and an Illumina Score ≥ 0.6 (Laurie et al. 2010). The absence of SNPs within the flanking regions was assessed by creating a VCF file with the genotype information using the KGD v0.8.2 software with default parameters (https://github.com/AgResearch/KGD). The MAF and genotyping rate of each SNP were calculated using the PLINK software, and the Illumina Score was calculated using the Illumina Design Studio (https://designstudio-array.illumina.com). On this basis, 35 KRTAP SNPs were included in the 76K alpaca SNP microarray (Calderon et al. 2021).
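The selection criteria listed above can be expressed as a simple filter. The following is only a sketch of the thresholds, not the workflow used in the study (which relied on KGD, PLINK, and the Illumina Design Studio); the `Candidate` class, its field names, and the example values used to exercise it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    scaffold: str
    position: int           # position of the SNP within the scaffold
    maf: float              # minor allele frequency (0 to 0.5)
    genotyping_rate: float  # fraction of samples with a genotype (0 to 1)
    illumina_score: float   # design score (0 to 1)


def has_flanking_snp(candidate, all_snp_positions, flank=35):
    """Return True if another SNP lies within `flank` bp on either side."""
    return any(
        scaffold == candidate.scaffold
        and position != candidate.position
        and abs(position - candidate.position) <= flank
        for scaffold, position in all_snp_positions
    )


def passes_selection(candidate, all_snp_positions):
    """Apply the four thresholds stated in the text."""
    return (
        not has_flanking_snp(candidate, all_snp_positions)
        and candidate.maf >= 0.05
        and candidate.genotyping_rate > 0.45
        and candidate.illumina_score >= 0.6
    )
```

A candidate is rejected as soon as one criterion fails, so a SNP with a high MAF is still discarded if a second SNP lies within its 35 bp flanks.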
Validation of SNPs. The 35 selected KRTAP SNPs were validated by genotyping 937 white Huacaya alpacas (58 males and 879 females) with the 76K alpaca SNP microarray. Genotyping was done at NEOGEN GeneSeek Laboratories (Nebraska, United States). For this study, only the genotyping results of the SNPs located in KRTAPs were analyzed. Quality control was performed using PLINK v1.90p (Chang et al. 2015). Samples with a genotyping rate ≥ 90% and SNPs with a genotyping rate ≥ 90% were retained.
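The two quantities used in this quality control, the genotyping rate and the MAF, reduce to simple arithmetic. The sketch below illustrates that arithmetic only and is not the PLINK implementation; it assumes that genotypes are coded as the number of copies of one allele (0, 1, or 2), with `None` for a missing call.

```python
def genotyping_rate(genotypes):
    """Fraction of non-missing genotype calls."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the less common allele among the called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # the MAF is undefined when no genotype was called
    allele_frequency = sum(called) / (2 * len(called))
    return min(allele_frequency, 1 - allele_frequency)


def is_retained(genotypes, min_rate=0.90):
    """Apply the genotyping-rate threshold stated in the text."""
    return genotyping_rate(genotypes) >= min_rate
```

With the genotypes `[0, 0, 1, 2, None]`, four of five calls are present (rate 0.8) and the counted allele appears 3 times among 8 alleles, giving a MAF of 0.375.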
Results
The locations and sequences of the 34 KRTAP genes annotated in the alpaca reference genome were obtained. According to the classification of Gong et al. (2012), 19 of these genes encode high sulfur content proteins, 6 encode ultra-high sulfur content proteins, and 9 encode high glycine and tyrosine content proteins (Table 1).
A total of 67 SNPs were identified (Table 2), of which 48 are distributed in 18 KRTAPs encoding high sulfur content proteins, 8 in 5 KRTAPs encoding ultra-high sulfur content proteins, and 11 in 9 KRTAPs encoding high glycine and tyrosine content proteins.
Table 2: Location of the keratin-associated protein genes (scaffold and chromosome), position of each SNP within the scaffold, and 71 bp sequence containing the SNP (alleles shown within brackets). HS: KRTAP encoding high sulfur content proteins. UHS: KRTAP encoding ultra-high sulfur content proteins. HGT: KRTAP encoding high glycine and tyrosine content proteins. Chr: Chromosome.
Gene | Group | Scaffold | Chr | Position | 71 bp sequence |
---|---|---|---|---|---|
KRTAP7-1 (LOC102544636) | HGT | ABRR03000004.1 | 1 | 12420218 | TATGGCGGCAGCTTCTGCAGGCCATGGGGCTCCAG[C/G]TCTGGCTTTGGCTACAGCACCTTCTGATGGACCAA |
KRTAP8-1 (LOC116280298) | HGT | ABRR03000004.1 | 1 | 12436062 | ATGGCAGCACCTACTCCCCAGTGGGCTATGGCTTCGGCTA[T/C]GGCTACAACGGCTGTGGGGCTTTCGGCTACCGAAGATCCT |
KRTAP8-1 (LOC116280298) | HGT | ABRR03000004.1 | 1 | 12436103 | GGCTACAACGGCTGTGGGGCTTTCGGCTACCGAAGATCCT[G/A]CCCATTCTTTTTCTAGTGATTTGCTGAAATCCCCAAGGAG |
KRTAP19-1L (LOC116276605) | HGT | ABRR03000004.1 | 1 | 12804959 | GATTTTAGGAATATTTATGATTTCATAATTCTCTGCTTCATGTCTCTTATTTGTGCTTTC[A/G]TAACTGTGGCATTTCCCTATTCTTTTGCAATAAATTTCCCTAAGATAAAACAGCAtacat |
KRTAP19-3L (LOC116280314) | HGT | ABRR03000004.1 | 1 | 12746893 | CTGTGGCTATGGTTGCCTCCGAGGTCTGGGCTATGGCTACGGCGCTGGCTATGGTGGCTA[T/C]GGATACGGCTGCTACCGCCCGTGTTACTACGGAAGATACTTGTCCTCTGGCTTCTACTGA |
KRTAP20-2L (LOC116286138) | HGT | ABRR03000004.1 | 1 | 12616128 | TAGCCACAGCCTAGGCCACCATAGCCACAGCCATAACCATAGCCCAGGCCAC[C/T]GTAACCATAACCCAGACCTCCATAGTAGTTGCCGTAGTAGCACATGGTTTCA |
KRTAP20-2L (LOC116286182) | HGT | ABRR03000004.1 | 1 | 12651734 | ACCAGTATCTTCCATAGCAGCATGGGCGGTAGCAGGCATAGCCATAGCCAC[C/T]GTAGCCACATCCATAGCCACATCCCAGGCCACCATAGCCATAGCCCAGGCC |
KRTAP20-2L (LOC116276588) | HGT | ABRR03000004.1 | 1 | 12660986 | AGCCAGAAGACCAGTATCTTCCATAGCAGCATGGGCGGTAGCAGGCATAGCCATAGCCAC[C/T]GTAGCCACATCCATAGCCACATCCCAGGCCACCATAGCCATAGCCCAGGCCACCATAGTA |
KRTAP20-2L (LOC116276577) | HGT | ABRR03000004.1 | 1 | 12678801 | AGCAGCATGGGCGGTAGCAGGCATAGCCATAGCCACCGTAG[C/T]CACATCCATAGCCACATCCCAGGCCACCATAGCCATAGCCC |
KRTAP21-1L (LOC116286002) | HGT | ABRR03000004.1 | 1 | 12501902 | TTTGAGAAACCTCCTCTTTCAACCCACCTTAATTCACATCCT[T/C]TGACAACATGTGTTGCAACTACTATGGCAACTCCTGTGGCTA |
KRTAP21-1L (LOC116286002) | HGT | ABRR03000004.1 | 1 | 12502328 | TCCAAGAAATTACATGCCTGAAAGAAGGTTTGACCTCCAA[T/C]TACCCGGTTTTAGATTGGCATCTTTGTAGCCTGAGGCTCC |
KRTAP4-1 (LOC102543867) | UHS | ABRR03077387.1 | 16 | 400197 | GCCTGTGTGCTGCCAACCTACCTGCTGTCGCCCCAGGTGCTGCAACTCCAGCTGCTGCCG[C/G]CCCATCTGCTGAGGATCCTCTTTTTGCTGAACTTCATTCCTCACCACCAGCCCTGAGCCA |
KRTAP4-1 (LOC102543867) | UHS | ABRR03077387.1 | 16 | 403985 | ATTCTAacttttccatgttttaaatgaCCACCAAGTTTATTATCCTGTGACACTACCAAG[G/C]CACTGGAGAAAGAGCCTCCATCTGCGTTCAGTACCACCCCTGCTTCCCAGAAATCTCTTT |
KRTAP4-1 (LOC102543867) | UHS | ABRR03077387.1 | 16 | 404135 | GTTGGATCAAACCCCCAAGTCTCTGGCTGTTACACTTGCCTTGACCTTCATAACTGCATG[T/C]TAACTATTTCTCAATAATACTATCTTGATATCATAAATTTGTGTATCCTTCTTTACTTAT |
KRTAP4-3L (LOC102529268) | UHS | ABRR03077387.1 | 16 | 500673 | GCCAGACCAGCAGCTGTGAGACTGGCTGTGGCATTGGTGGTAGCA[T/C]TGGCTGTGGCCAGGAGGGTGGCAGCGGAGCTCTGAGCTGCCGCAC |
KRTAP4-7L (LOC107032861) | UHS | ABRR03077387.1 | 16 | 411461 | CAGGTGTGGTTCCAGCTGCTGCAGGCCCACCTGCTGCATCTCTAGCTGCT[G/A]TCGTCCCAGCTGCTACCAGACCACGTGCTGCCGCCCCAGATGCTGTGTGT |
KRTAP9-2L (LOC102544126) | UHS | ABRR03077387.1 | 16 | 379872 | CCACGTGCTGCCGCCCTAGCTGTGTGTCCAGCTGCTGCCAGCCCTCCTGCTGATCACCTC[A/T]CCAAGAGCCATCCCCTGCATCCAACACAATCTGTCAACTGAGTTGCCGTTTTGGGGGCAA |
KRTAP9-4L (LOC102534338) | UHS | ABRR03001291.1 | na | 3527 | CTGGTTACACAGGTAGGCTGGCAGCAGGTTTGTCCACAGCTGCTGG[A/G]CTCACAGCAGGTGGGCTGGCAGCAGGTGGTCACACAAGTAGGTTTG |
KRTAP9-4L (LOC102534338) | UHS | ABRR03001291.1 | na | 3574 | CTCACAGCAGGTGGGCTGGCAGCAGGTGGTCACACAAGTAGGTTTG[C/T]GGCAGGTGGTGCTGCAGCAGGTGGTCTGAGTAGCTTCATAGCAAGT |
KRTAP3-1 (LOC102528725) | HS | ABRR03077387.1 | 16 | 514886 | ATCAGTCTCCTTCAGCCTACCTGCTGCGACTGCTG[T/C]CCCCCACCCTGCTGTGAGCCTGACACCTACGTGCC |
KRTAP3-3L (LOC102528468) | HS | ABRR03077387.1 | 16 | 526383 | GATAAGTCCTGCCGGTGTGGAGTCTGCCTGCCCAG[T/C]ACCTGCCCACACACGGTTTGGTTACTGGAGCCAAC |
KRTAP3-3L (LOC102528468) | HS | ABRR03077387.1 | 16 | 526561 | TCACAACCTACActcagccctgctctgagccctgC[G/A]TCCCAAGATGCTGCTGACCGATGACTGCTTCGCTC |
KRTAP3-3L (LOC102528209) | HS | ABRR03077387.1 | 16 | 532094 | ATTCCCAAGGCCATGGCTTGCTGTGTTTCCTGCGA[C/T]GGCTGCAGTGTTCGCACCGGCCCCGCCACCACCAT |
KRTAP3-3L (LOC102528209) | HS | ABRR03077387.1 | 16 | 532200 | GCCTGCCCAGCACCTGCCCACACACGGTTTGGTTA[C/T]TGGAGCCAACCTGCTGTGACAActgccccccaccc |
KRTAP10-1L (LOC107034113) | HS | ABRR03000004.1 | 1 | 12853114 | aaaagatggcTGTACTGAGCAGAGACTTTGATCTCAATACTCAGAAGGATCTAGACCCTC[A/G]GCAAGTTGAATAATAGAAACCAGAGCTAGAGATTGGTCTATAACACGATGACTGGCAGCT |
KRTAP10-1L (LOC107034113) | HS | ABRR03000004.1 | 1 | 12853197 | GAGCTAGAGATTGGTCTATAACACGATGACTGGCAGCTGCTGGAGGTAAAGCAGGTTGGA[C/G]GGCAGAATCTGGATCCACAGCTCAGGGAGGGGAAGCCACAGACTCCATAACCCAGAGGTC |
KRTAP10-1L (LOC107034113) | HS | ABRR03000004.1 | 1 | 12888037 | GCTGATCAATAGAAGCCAGACTGACAGGTTGGTCTGTAA[C/T]AGGATGATTGGCAGCTCCTAGGGGCAAAGTAGGTTGGAC |
KRTAP10-1L (LOC107034113) | HS | ABRR03000004.1 | 1 | 12888077 | AGGATGATTGGCAGCTCCTAGGGGCAAAGTAGGTTGGAC[G/A]GTAGAAACCGAATCCATGGCTCAGAGGAGGGAAGCCACA |
KRTAP10-1L (LOC107034113) | HS | ABRR03000004.1 | 1 | 12888117 | GTAGAAACCGAATCCATGGCTCAGAGGAGGGAAGCCACA[G/T]ACTCCACAGCCCAGGGGTCTGAAGCCATGGGACCCAAAG |
KRTAP10-1L (LOC107034113) | HS | ABRR03000004.1 | 1 | 12888170 | ACTCCACAGCCCAGGGGTCTGAAGCCATGGGACCCAAAGCCCAGTGAGTAGC[A/G]GCTTCTGGACCCAGCGCCCAGGAAGCAGCCTCTTCCGGACCCAGAGCCTGGA |
KRTAP10-1L (LOC107034113) | HS | ABRR03000004.1 | 1 | 12888236 | GCCTCTTCCGGACCCAGAGCCTGGAGACCTAGCATAAG[T/C]TTTCTGGAAGGGACTGCAGAGCATGGAGGTCCTTGGGC |
KRTAP10-1L (LOC107034113) | HS | ABRR03000004.1 | 1 | 12888275 | TTTCTGGAAGGGACTGCAGAGCATGGAGGTCCTTGGGC[G/A]GTAGCAGGATGTCTGGCAGGGGCTGGATACCACATAAG |
KRTAP10-1L (LOC107034113) | HS | ABRR03000004.1 | 1 | 12888315 | GTAGCAGGATGTCTGGCAGGGGCTGGATACCACATAAGA[C/T]GTCTGGCACCTGATAGGCTTACAACAGGTCTCCTGACAG |
KRTAP10-1L (LOC107034113) | HS | ABRR03000004.1 | 1 | 12888357 | GTCTGGCACCTGATAGGCTTACAACAGGTCTCCTGACAGCC[G/A]CTGTGAAGAGAGGAGTCCTGctggcaggtgctgggagagca |
KRTAP10-10 (LOC116284217) | HS | ABRR03077307.1 | 18 | 8124973 | GCCTCAACTGTCTCTGCTTTGAGATCAAGCTCCGATGATGACCCAGTGTCCCTGCCCCTT[G/T]GGGGAGCGGCCACCCCCCGGGGCCCACATGCCCTGTTCCCTGAGGGGCCTCAGGACACCT |
KRTAP11-1 (LOC102544905) | HS | ABRR03000004.1 | 1 | 12376109 | ACATGTCCTACAACTGCTCCACAAGGAATTGCTCT[T/G]CCAGGCTGATTGGGGGACAATACTCTGTCCCCGTG |
KRTAP12-1L (LOC102528691) | HS | ABRR03009911.1 | na | 764 | GAGGACTGGCGGGAGGGGGCCACACACACAGGGGCTTGCAGCTCACGGGCAGGCG[C/T]GTGGTCAGCTTGCAGCTCACAGGCACGCACACGACAGGCCTGCGGCTCAGTGACA |
KRTAP12-1L (LOC116283958) | HS | ABRR03000004.1 | 1 | 3777628 | TGAGGAGCCCAACCCTGCGGAAGAACCTCCTGGATCAC[C/T]GAGTTTCCGCATCAAAGTTGGGGACGCTGTGCTTGTGC |
KRTAP12-2L (LOC116280071) | HS | ABRR03000004.1 | 1 | 3775478 | CCTGCCGGAGACAGAGAGGGACCCTGGGAACCAGGGCTTCCAGAGGAGATGCGCAGATTC[A/G]TAGAAAGATCCAGATCGTCCTGCCACGCTCTTGCCCCAGAGTTCAGAGGGTTGGTTCATC |
KRTAP12-2L (LOC102530622) | HS | ABRR03001991.1 | na | 807 | acaatacttcaataaaaaaataagaattaaaaaaaatggagggacACACTCCATTGAGAA[A/T]GCATGATTTATTTGTCACACATGGGATGTGGCTTCCTGCCGGAGACAGAGAGGGACCCTG |
KRTAP12-2L (LOC102530622) | HS | ABRR03001991.1 | na | 902 | CCTGCCGGAGACAGAGAGGGACCCTGGGAACCAGGGCTTCCAGAGGAGATGCGCAGATTC[A/G]TAGAAAGATCCAGATCGTCCTGCCACGCTCTTGCCCCAGAGTTCAGAGGGTTGGTTCATC |
KRTAP12-2L (LOC102530622) | HS | ABRR03001991.1 | na | 1519 | CTTAGGGCTGGGCTGGCGTCGGGAGGAGGGCGGTGATGTCTGCAGCCTCCCTGCCCGGGC[G/T]GCCTTTATACCCGGGCCGTGGGCGTCCCAGCAGCACAGAAGCTCACCTGCTGACTTCCTC |
KRTAP13-1L (LOC102533350) | HS | ABRR03000004.1 | 1 | 12923762 | TGCTGCTCTGGAaacttctcctcctgctcccttgg[G/A]GGCCACTTGCACTACCCAAGCTCTTCCTGTGGCTC |
KRTAP13-1L (LOC102533350) | HS | ABRR03000004.1 | 1 | 12923825 | TGTGGCTCCTCCTACCCCAGCAACCTGATCTACAC[T/C]ACGGAcctctgctctcccagcacctgccagCTGAG |
KRTAP13-1L (LOC102533350) | HS | ABRR03000004.1 | 1 | 12923905 | GTACAGTGGCTGTCAGGAGACCTGCTGGGAGCCCA[C/G]CAGGTGCCAGATGTCCCATGGTGTGTCCAGCCCCT |
KRTAP13-1L (LOC102533350) | HS | ABRR03000004.1 | 1 | 12924045 | CTCTGGGCTCTGGGTCCGGCAAAGGCTACTTCCTG[G/T]TCTATGGATCCAGAAGCTGTGGGTCCAGTGGATCT |
KRTAP13-1L (LOC102532866) | HS | ABRR03000004.1 | 1 | 12846534 | TACCGCTGGAAGACCTCCACGCTCTGCCGTCCCTG[G/C]CAGACGACTTACTCTGGGTCTCTGGGCTTTGGCTC |
KRTAP13-1L (LOC102532037) | HS | ABRR03000004.1 | 1 | 12859532 | GTCAGGAGACCAGCTGTGAGCCTATCAGATGCCAG[G/T]CATCCTGCTGCCCCCAGAGGACCTCCAAGCTCTGC |
KRTAP13-1L (LOC102532037) | HS | ABRR03000004.1 | 1 | 12859761 | TGGGTCCAATTTCTGGTACCCAATCAACTTTCCTT[C/G]CAGAAGTTTCTGTTCATCTTGTTACTGGCCAATTT |
KRTAP16-1 (LOC102544939) | HS | ABRR03077387.1 | 16 | 353702 | AGGGCCTGTTTGCCTGCCCAGTTCCTGCCGGAGCC[G/A]GACTTGGCAACTGGTGACTCAAGATAGCTGCAGAT |
KRTAP16-1 (LOC102544939) | HS | ABRR03077387.1 | 16 | 354957 | AGTGTGCCCCACGCCTAGCTGCTGTTCATCTGTCC[G/A]CTCCGTGGCCAATGGCCGCCAGTCTGTCTGCTGTG |
KRTAP16-1 (LOC102544939) | HS | ABRR03077387.1 | 16 | 355358 | CCGTCTACCTCGTGCCGACCTCTCTGCTGCCGCCCAGGGTCTTCTGCATCTGTCATCTGC[C/A]GGCCAATTTGCTCTCGAACTTTCTACATACCCAGCTCCTGCAAACAACCATGCACTCCTT |
KRTAP16-1 (LOC102544939) | HS | ABRR03077387.1 | 16 | 355869 | CTGCAGTGCACGCCTCTGACTACAGAGAAAATGTGACTGCCCTCCCCAAAGCTCATCTGA[C/T]ttagaatctttttctttctgcactATCACTCACCATCTGCTTATGCTTCAAAGAACTCAC |
KRTAP24-1 (LOC102532787) | HS | ABRR03000004.1 | 1 | 12979560 | ATATGCTACAGAACCCACTGTATTATCCCAGTGAC[T/G]CCTTCTGTTGCTCTTTGCTCCAGCGATGTAAGCCC |
KRTAP24-1 (LOC102532787) | HS | ABRR03000004.1 | 1 | 12979659 | TACCAAGGAACTCTCTGGCTTCTGGATAACTGCCA[A/T]GAAACCTGTGGTGAAGCACCAATCTGTGAATCTCC |
KRTAP24-1 (LOC102532787) | HS | ABRR03000004.1 | 1 | 12979802 | AGTGGGCAAAATACGCAGTGCCTGTGAAACTACCA[A/G]TGTCGGACCCAGCCCCAGCTGCAACCCATGCACTC |
KRTAP24-1 (LOC102532787) | HS | ABRR03000004.1 | 1 | 12980054 | CATACCAAATGGCTTCTCACCCTCATCTTGTATTG[C/T]CAACAGCTGCCGATTCCaaaattatttaagaagaa |
KRTAP26-1 (LOC107034115) | HS | ABRR03000004.1 | 1 | 12947455 | CTCCGGAAATCTCTGTCatatttctcttccctcctccattGCCCTCTGC[T/C]CTACCAATGTGAGCTGTGGAGATGTCCTCTGCTTGCCCAGCAGCTATCC |
KRTAP26-1 (LOC107034115) | HS | ABRR03000004.1 | 1 | 12947628 | AGCCACCTGGGAGCCCAGCCTCTGTGAGACTTCCAGCTGCCCTTCCACTGCTTGCTATGT[G/T]CCCAGACCCTGCCAAGGGACCAACCTTCTTCCTGCTTCTTACAtctctggctcctgcctc |
KRTAP26-1 (LOC107034115) | HS | ABRR03000004.1 | 1 | 12947708 | CTTCCTGCTTCTTACAtctctggctcctgcctcccagtgTCCTGCAGACCTC[T/A]GAGCTATGCGTCCAGCAGCTGCCGACCCCTGAGCCTCCTCACCTATGGATGC |
KRTAP26-1 (LOC107034115) | HS | ABRR03000004.1 | 1 | 12947881 | ACCTCTGCGTCCTCTCTTCAGTGGATGCCAACCTCTGACACAAGTGTTCAGTCCTTGT[C/T]GTCCATCCTGCTCTGCATTGGGAGGCCAGTAGCTTCCTTGTTCCAGCTAATAATCATG |
KRTAP27-1 (LOC102533595) | HS | ABRR03000004.1 | 1 | 12931105 | ggattttttttaccCAGCAGCTGCCACAGTAGGAC[C/T]TGGCTCCTGGACAACTTTCAAGAAACCTGTTGTGA |
KRTAP27-1 (LOC102533595) | HS | ABRR03000004.1 | 1 | 12931214 | TGTCCACAGGGGATAGCTGTGTGCAAACTGCCTGC[C/G]TCCCCCGAGTTGTCCAAACGAATTGTTCTAATTCC |
KRTAP27-1 (LOC102533595) | HS | ABRR03000004.1 | 1 | 12931576 | ACTTATGAGCCAACTTGCTGTGTTACTGGTGGTTT[G/A]CAGTTGCCTAGTGAATgaagaatgtgaaaaatgtg |
KRTAP29-1 (LOC102544666) | HS | ABRR03077387.1 | 16 | 361421 | CTGCCAATCAACTTATTACCAACCTGTCTGCTACTTTTTTAAGTCTTGTCAATCAGTTCC[C/T]TGCGTGCCTGTGCCCCACCAGCTGTGTCcttgtgttttctgttcttgCAATCCTGCTTGC |
KRTAP29-1 (LOC102544666) | HS | ABRR03077387.1 | 16 | 361604 | CCAGCCAGTGGCTAACCCTTGTTCTGTAAAGAACCCTTGCAAACCAGCTTCCTGCAGCAC[C/T]GTCCCTTCTGGCCAACCAACTTGTGGTGAACCTACTTCCTGCAATCAAAGTGCCTGCAAA |
KRTAP29-1 (LOC102544666) | HS | ABRR03077387.1 | 16 | 361748 | CTGTGTGACAGGTTCTGGCAAATCATCCAGTGGAGGTTCCAATCGCTTCCGAACCACTGC[T/C]CCAAGTCTGCCAGGCCAGCACCTGCTAGCCAACTTCCTGCCAACCCAGCCAGGAGTCCAG |
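
The sequences in Table 2 follow the pattern of a 5’ flank, the two alleles within brackets, and a 3’ flank. A minimal sketch of how such an entry can be split into its parts is given below; the function name and the returned field names are hypothetical, and lower-case bases are accepted because some flanks in the table contain them.

```python
import re

# 5' flank, [allele1/allele2], 3' flank
DESIGN_PATTERN = re.compile(r"^([ACGTacgt]+)\[([ACGT])/([ACGT])\]([ACGTacgt]+)$")


def parse_design_sequence(sequence):
    """Split a bracketed SNP sequence into flank lengths and alleles."""
    match = DESIGN_PATTERN.match(sequence.strip())
    if match is None:
        raise ValueError("sequence does not match FLANK[A/B]FLANK")
    left, allele_1, allele_2, right = match.groups()
    return {
        "left_flank": len(left),
        "alleles": (allele_1, allele_2),
        "right_flank": len(right),
        "total_length": len(left) + 1 + len(right),
    }


# First row of Table 2 (KRTAP7-1, position 12420218)
example = "TATGGCGGCAGCTTCTGCAGGCCATGGGGCTCCAG[C/G]TCTGGCTTTGGCTACAGCACCTTCTGATGGACCAA"
parsed = parse_design_sequence(example)
```

For this row the parser reports two 35 bp flanks around a C/G SNP, that is, 71 bp in total.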
Table 3 shows the results of the validation of 35 SNPs in 937 alpacas that were genotyped. After quality control, 936 alpacas and all SNPs were retained. Three SNPs were monomorphic (MAF=0), nine had MAFs between 0.006 and 0.049, and 23 had MAFs ≥ 0.05.
Table 3: Validation results for the 35 KRTAP gene SNPs included in the alpaca microarray.
Gene | Scaffold | Scaffold position | MAF |
---|---|---|---|
KRTAP10-1L (LOC107034113) | ABRR03000004.1 | 12853197 | 0.03472 |
KRTAP10-1L (LOC107034113) | ABRR03000004.1 | 12853114 | 0.06624 |
KRTAP10-1L (LOC107034113) | ABRR03000004.1 | 12888170 | 0.1977 |
KRTAP11-1 (LOC102544905) | ABRR03000004.1 | 12376109 | 0.4764 |
KRTAP13-1L (LOC102532037) | ABRR03000004.1 | 12859761 | 0.03636 |
KRTAP13-1L (LOC102533350) | ABRR03000004.1 | 12924045 | 0.02724 |
KRTAP13-1L (LOC102533350) | ABRR03000004.1 | 12923762 | 0.05609 |
KRTAP13-1L (LOC102533350) | ABRR03000004.1 | 12923825 | 0.3223 |
KRTAP19-1L (LOC116276605) | ABRR03000004.1 | 12804959 | 0.01389 |
KRTAP19-3L (LOC116280314) | ABRR03000004.1 | 12746893 | 0.03904 |
KRTAP20-2L (LOC116276588) | ABRR03000004.1 | 12660986 | 0 |
KRTAP24-1 (LOC102532787) | ABRR03000004.1 | 12979659 | 0.0187 |
KRTAP24-1 (LOC102532787) | ABRR03000004.1 | 12980054 | 0.1534 |
KRTAP24-1 (LOC102532787) | ABRR03000004.1 | 12979560 | 0.1629 |
KRTAP24-1 (LOC102532787) | ABRR03000004.1 | 12979802 | 0.2198 |
KRTAP26-1 (LOC107034115) | ABRR03000004.1 | 12947628 | 0.295 |
KRTAP8-1 (LOC116280298) | ABRR03000004.1 | 12436103 | 0.4618 |
KRTAP12-2L (LOC102530622) | ABRR03001991.1 | 807 | 0.296 |
KRTAP12-2L (LOC102530622) | ABRR03001991.1 | 902 | 0.3007 |
KRTAP12-2L (LOC102530622) | ABRR03001991.1 | 1519 | 0.3331 |
KRTAP10-10 (LOC116284217) | ABRR03077307.1 | 8124973 | 0 |
KRTAP16-1 (LOC102544939) | ABRR03077387.1 | 354957 | 0.006417 |
KRTAP16-1 (LOC102544939) | ABRR03077387.1 | 355358 | 0.08333 |
KRTAP16-1 (LOC102544939) | ABRR03077387.1 | 355869 | 0.2449 |
KRTAP29-1 (LOC102544666) | ABRR03077387.1 | 361604 | 0.1487 |
KRTAP29-1 (LOC102544666) | ABRR03077387.1 | 361421 | 0.1496 |
KRTAP29-1 (LOC102544666) | ABRR03077387.1 | 361748 | 0.4322 |
KRTAP3-3L (LOC102528209) | ABRR03077387.1 | 532094 | 0.007479 |
KRTAP3-3L (LOC102528209) | ABRR03077387.1 | 532200 | 0.06912 |
KRTAP3-3L (LOC102528468) | ABRR03077387.1 | 526383 | 0.3525 |
KRTAP4-1 (LOC102543867) | ABRR03077387.1 | 404135 | 0.09936 |
KRTAP4-1 (LOC102543867) | ABRR03077387.1 | 400197 | 0.3979 |
KRTAP4-1 (LOC102543867) | ABRR03077387.1 | 403985 | 0.4995 |
KRTAP4-7L (LOC107032861) | ABRR03077387.1 | 411461 | 0.04011 |
KRTAP9-2L (LOC102544126) | ABRR03077387.1 | 379872 | 0 |
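
The three MAF classes reported for the validation can be recomputed directly from the MAF column of Table 3. The short sketch below copies the 35 values from the table in row order; the class labels and the function name are ours and are used only for illustration.

```python
# MAF column of Table 3, in row order
TABLE_3_MAF = [
    0.03472, 0.06624, 0.1977, 0.4764, 0.03636, 0.02724, 0.05609,
    0.3223, 0.01389, 0.03904, 0, 0.0187, 0.1534, 0.1629,
    0.2198, 0.295, 0.4618, 0.296, 0.3007, 0.3331, 0,
    0.006417, 0.08333, 0.2449, 0.1487, 0.1496, 0.4322, 0.007479,
    0.06912, 0.3525, 0.09936, 0.3979, 0.4995, 0.04011, 0,
]


def tally_maf_classes(maf_values, threshold=0.05):
    """Count monomorphic, low-frequency, and common SNPs."""
    monomorphic = sum(maf == 0 for maf in maf_values)
    low = sum(0 < maf < threshold for maf in maf_values)
    common = sum(maf >= threshold for maf in maf_values)
    return monomorphic, low, common


counts = tally_maf_classes(TABLE_3_MAF)
print(counts)  # (3, 9, 23)
```

The tally gives 3 monomorphic SNPs, 9 with a MAF below 0.05, and 23 with a MAF of at least 0.05, so 32 of the 35 SNPs are polymorphic in the genotyped population.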
Discussion
Sixty-seven SNPs were identified in the 34 KRTAP genes annotated in the alpaca reference genome. Of these, 35 SNPs were included in the 76K alpaca SNP microarray (Calderon et al. 2021); 32 of them were confirmed as true SNPs and three were monomorphic in the alpaca population analyzed. The 32 validated SNPs are located in 14 KRTAP genes. The three monomorphic SNPs indicate that these polymorphisms are absent from the alpaca population used for our analysis and/or occur at low frequency in general. This can happen with any SNP, because allele frequencies vary with the population sampled, unless the frequencies are close to 50%.
One limitation of this study is that gene locations may shift as the reference genome improves. A better genome assembly is necessary to determine accurately the physical locations of genes and, consequently, of SNPs. In addition, sequencing errors may be present in the genomes and in the reduced representation libraries. The use of these libraries is nevertheless valuable, because it ensures that the sequenced fragments are covered by enough reads to generate a more robust consensus sequence.
In the KRTAP11-1 gene, one SNP (g.12376109T>G) was detected, which corresponds to SNP33 identified by Foppiano (2016). Foppiano (2016) documented five additional SNPs in the same gene, although these variants showed no association with fiber diameter. Our study did not detect all of the SNPs reported by Foppiano (2016), possibly because of the stringent quality filters described above, whereas Foppiano (2016) used only allele frequency as a filtering criterion.
The KRTAP13-1 gene is present in three copies (LOC102533350, LOC102532866, LOC102532037). The SNP identified in KRTAP13-1 (LOC102532866) (g.12846534G>C) matches the one reported by Foppiano (2016), who noted that it changes Tryptophan to Cysteine in the protein sequence, consistent with our results. Foppiano (2016) aligned the human KRTAP13-1 gene sequence with the alpaca genome (Vicugna_pacos-2.0.1) to identify the alpaca gene sequence, and reported three additional SNPs in LOC102532866 that our study did not detect because of the stringent quality filters applied during SNP identification. However, our study identified SNPs in all three copies of the KRTAP13-1 gene annotated in the NCBI. We identified four SNPs in LOC102533350, two of which are non-synonymous and change Threonine to Serine (g.12923905C>G) and Valine to Phenylalanine (g.12924045G>T) in the protein sequence. Similarly, we observed two SNPs in LOC102532037 that change the protein sequence, from Alanine to Serine (g.12859532G>T) and from Serine to Cysteine (g.12859761C>G). In goats, Fang et al. (2010) identified a SNP (T>G) in the coding region of this gene that was significantly associated with fiber diameter in Xinjiang (cashmere) goats. In our study, we identified G>T SNPs at positions 12924045 (LOC102533350) and 12859532 (LOC102532037) of the same gene. We recommend using these KRTAP13-1 SNPs in association analyses with fiber characteristics in alpacas.
In the KRTAP24-1 gene, four SNPs were identified and validated, three of which are non-synonymous. One changes Alanine to Valine (g.12980054C>T) and is similar to the SNP (c.656T/C) identified by Wang et al. (2019) in goats, which changes Valine to Alanine in the protein sequence and was reported to affect fiber diameter in cashmere goats. The other two, which change Glutamine to Histidine (g.12979659A>T) and Asparagine to Serine (g.12979802A>G), have not been reported in previous studies in other species. The fourth SNP identified in this study is synonymous (g.12979560T>G). Nevertheless, all four should be considered in association studies of fiber characteristics in alpacas.
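The distinction between synonymous and non-synonymous SNPs used throughout this discussion depends on how the substitution changes the affected codon. The sketch below classifies a single-base change with the standard genetic code. The codons in the usage note are illustrative: they were chosen to reproduce amino acid changes named in the text, and the reading frame and strand of each alpaca gene are assumptions that would have to be taken from the annotation.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, position, alternative):
    """Classify a substitution at `position` (0, 1, or 2) of `codon`.

    Returns the reference amino acid, the alternative amino acid, and
    the class of the change (one-letter codes, '*' for a stop codon).
    """
    codon = codon.upper()
    mutated = codon[:position] + alternative.upper() + codon[position + 1:]
    reference_aa = CODON_TABLE[codon]
    alternative_aa = CODON_TABLE[mutated]
    if reference_aa == alternative_aa:
        kind = "synonymous"
    elif alternative_aa == "*":
        kind = "nonsense"
    else:
        kind = "non-synonymous"
    return reference_aa, alternative_aa, kind
```

For instance, `classify_snp("GCC", 1, "T")` returns Alanine to Valine as a non-synonymous change, whereas `classify_snp("ACT", 2, "G")` leaves Threonine unchanged and is therefore synonymous.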
For the KRTAP26-1 gene, four SNPs were identified: one synonymous SNP, which was validated, and three non-synonymous SNPs, which were not validated and which produce the following changes in the protein sequence: Serine to Proline (g.12947455T>C), Leucine to Glutamine (g.12947708T>A), and Arginine to Cysteine (g.12947881C>T). Similarly, Li et al. (2017) identified four SNPs in this gene in Merino x Southdown sheep; however, the SNP (c.277A/G, which changes Serine to Glycine) that was associated with lower mean wool diameter, standard deviation of wool diameter, mean comfort factor, and higher wash performance was not found in our study. Again, it would be important to analyze the association of these SNPs with fiber characteristics in alpacas.
A SNP was identified in the untranslated region (UTR) of the KRTAP9-2 gene, but it was excluded from further analysis because it did not meet the allele frequency criterion (≥ 5%). Foppiano (2016) also reported a SNP in the 3'UTR, which was not detected in our study, and two additional SNPs in the exonic region, which were not included in our analysis because they did not meet the quality parameters required for inclusion in the 76K SNP microarray. In cashmere-producing goats, Yu et al. (2008) identified a 30-nucleotide insertion/deletion in the exon of the KRTAP9-2 gene, and Wang et al. (2012) identified a C/T SNP in the same gene. Furthermore, Wang et al. (2014), in a gene expression profiling study across different phases of the fiber cycle, found significant differences in the expression of this gene between goats with high and low cashmere fiber production. Consequently, the search for additional molecular markers in alpacas, such as insertion/deletion mutations, remains warranted.
In the single exon of the KRTAP7-1 gene, we found a non-synonymous SNP that changes Serine to Arginine (g.12420218C>G). By comparison, Liu et al. (2014) identified a non-synonymous SNP in Chinese Merino sheep that changes Asparagine to Lysine in the protein sequence. In Liaoning goats, Jin et al. (2011) found that the expression of this gene was 1.28 times higher in secondary follicles than in primary follicles, suggesting that it may play an important role in regulating fiber diameter.
For the KRTAP8-1 gene, we found two SNPs, one synonymous (g.12436062T>C) and one non-synonymous (g.12436103G>A). In contrast, Liu et al. (2011) found two SNPs in goats, both of them synonymous, one of which (T113G) was significantly associated with cashmere fleece weight but not with fiber diameter. Our non-synonymous SNP is included in the 76K alpaca SNP chip and should be considered in association studies with fiber characteristics.
For the KRTAP20-2 gene, we identified four SNPs (Table 2), one of which is non-synonymous and changes Alanine to Valine (g.12678801C>T), while the other three are synonymous. Bai et al. (2018) also reported a non-synonymous SNP (c.160A>T) in sheep (Merino x Southdown), which generates a premature stop codon and is associated with mean fiber diameter and wool curvature.
In the single exon of the KRTAP21-1 gene we did not find any exonic SNPs, while Li et al. (2019) reported four exonic SNPs in Merino x Southdown lambs and found one associated with a positive variation in wool washing performance. However, two SNPs were identified in the UTR of the KRTAP21-1 gene, which might be involved in gene expression regulation.
The validation of 35 of the 67 SNPs identified in KRTAP genes was possible because they had been incorporated into the 76K alpaca SNP microarray. Alternative approaches, such as gene sequence comparisons among genomes as they become available, will be used to validate the remaining SNPs (Foppiano 2016, Palloti et al. 2023).
To summarize, 32 of the 67 identified SNPs have been validated in a population of 936 white Huacaya alpacas. While the inclusion of these SNPs in the 76K alpaca SNP chip will help to elucidate their association with fiber characteristics, it is equally important to study the biochemical properties of these proteins and their expression levels in the follicle in future genome-wide association studies in alpacas.