biotech report

AssignmentInstruction.docx

Assignment instructions (please answer the highlighted questions, and divide the report into sections following the rubric)

Background

In this exercise, you will explore different gene-finding approaches, learn various things about a contig from a genome assembly, and annotate the genes it encodes (see the sequence assembly lecture if you need a reminder of the vocabulary).

There are two main methods for automatic gene prediction: ab initio methods and comparative methods. Ab initio methods use the DNA sequence as the only input and are referred to as intrinsic methods. There are several features that can be identified in a genomic sequence and used to identify genes computationally. Such features are related either to the signals that regulate the biological mechanisms of gene expression (signal sensors), or to biases in sequence composition in DNA regions that are translated into proteins (content sensors). Signal sensors are typically splice-sites (donor: GTRAGT, acceptor: YAG, branch-site: CTRAY), the start of translation (codon ATG), and the end of translation (codons TGA, TAA, and TAG).
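
As a minimal illustration of the signal sensors listed above, the following Python sketch expands the consensus motifs (IUPAC codes: R = A or G, Y = C or T) into regular expressions and reports every forward-strand match. It is a hypothetical teaching example and not how FGENESH or AUGUSTUS work internally: real predictors score candidate sites with trained probabilistic models, and motifs this short match very often by chance.

```python
import re

# IUPAC ambiguity codes needed for the consensus motifs in the text
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

# Consensus signal motifs as listed in the text; "|" separates alternatives
SIGNALS = {
    "donor": "GTRAGT",
    "acceptor": "YAG",
    "branch_site": "CTRAY",
    "start": "ATG",
    "stop": "TGA|TAA|TAG",
}


def to_regex(motif: str) -> str:
    """Expand IUPAC letters in a motif into regular-expression character classes."""
    return "".join(IUPAC[base] for base in motif)


def scan_signals(seq: str) -> dict[str, list[int]]:
    """Return 1-based start positions of every forward-strand match, overlaps included."""
    seq = seq.upper()
    hits = {}
    for name, motif in SIGNALS.items():
        pattern = "|".join(to_regex(alt) for alt in motif.split("|"))
        # A zero-width lookahead lets overlapping matches be reported
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?=(?:{pattern}))", seq)]
    return hits


print(scan_signals("CCATGGTAAGTTAG"))
# {'donor': [6], 'acceptor': [12], 'branch_site': [], 'start': [3], 'stop': [7, 12]}
```

Scanning only the forward strand keeps the sketch short; genes on the reverse strand would need the same scan on the reverse complement.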

The content sensor most commonly used is bias in codon usage: regions of DNA coding for a protein use some codons more frequently than others. Both signal sensors and content sensors must be trained, i.e., we must start from a set of observations (such as known genes) from which we build a sensor model. Predicting a gene therefore involves looking for new features in the genomic sequence that resemble our model. The resemblance is expressed in terms of probabilities, for example how much more likely a region is under the coding model than under a non-coding (background) model.
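
The train-then-score idea can be sketched in a few lines of Python. The example below is a hypothetical, deliberately simplified content sensor: it estimates codon frequencies from a set of known coding sequences (adding pseudocounts so unseen codons keep a small probability) and scores a candidate region in one reading frame as a log-odds ratio against a uniform background of 1/64 per codon. Positive scores mean the region resembles the training genes more than random sequence. Gene finders such as AUGUSTUS and FGENESH use richer probabilistic models (hidden Markov models with higher-order sequence statistics), so treat this only as an illustration of the principle.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_counts(seq: str, frame: int = 0) -> Counter:
    """Count non-overlapping codons in one reading frame (frame = 0, 1 or 2)."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))


def train_codon_model(coding_seqs: list[str], pseudocount: float = 1.0) -> dict[str, float]:
    """Estimate codon probabilities from known coding sequences (the training step)."""
    counts = Counter()
    for s in coding_seqs:
        counts.update(codon_counts(s))
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def log_odds(seq: str, model: dict[str, float], frame: int = 0) -> float:
    """Sum of log2(P_coding / P_background) over codons; background is uniform (1/64)."""
    background = 1 / len(CODONS)
    score = 0.0
    for codon, n in codon_counts(seq, frame).items():
        if codon in model:  # codons containing N or other ambiguity codes are skipped
            score += n * math.log2(model[codon] / background)
    return score


model = train_codon_model(["ATGGCTGCTGCTTAA"])
print(log_odds("GCTGCTGCT", model) > 0)  # True: resembles the training codon usage
print(log_odds("CCCCCCCCC", model) < 0)  # True: codon never seen in training
```

A real training set would contain many verified genes from the same or a closely related species, which is why both ab initio programs ask you to choose an organism-specific model.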

Comparative methods are also called extrinsic methods. They include two strategies: those that use homology with sequences from other genes, also called homology-based, and those that make comparisons with genomic sequences from other genomes, also called comparative-genomics-based. Homology-based methods predict a gene from the alignment of a protein sequence, or an RNA sequence in the form of a full-length mRNA, cDNA or EST (expressed sequence tag), with the genome sequence that we want to annotate. The known sequence (also called evidence) guides the prediction. There are several ways of applying homology-based methods. The simplest is to accept the alignment of the known sequence to the genome as the gene prediction. More advanced methods use the known sequence as a guide and try to complete the evidence to yield a complete gene structure. The efficacy of the latter method depends on the number of known gene sequences; hence it is limited by the completeness of biological databases. Comparative-genomics-based methods hypothesise that sequences conserved between two relatively closely related genomes are functional and likely to code for a gene.

Annotating a genome usually combines several gene-prediction methods and may also involve predicting other biological signals (such as transcription start sites and promoter regions). In this practical we will use two ab initio approaches (FGENESH and AUGUSTUS) and a homology-based approach (BLAST) to annotate a DNA sequence.

You are required to write a report of more than 1,250 words on your analysis.
Refer to the general assignment preparation guidelines you were given previously.


Introduction: Provide appropriate background to the aims of your analysis. Include the difference between ab initio and homology-based approaches and explain under what circumstances one approach would be more suitable than the other. 


Methods: Make sure your methods section states which kind of BLAST you used for each task and explains why. (Also include a description of the two gene-prediction programs, FGENESH and AUGUSTUS.)
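
The choice of BLAST program follows from the molecule type of the query and of the database. The lookup below is a small illustrative helper (the function name `choose_blast` is hypothetical and not part of any BLAST package) that encodes the five standard NCBI BLAST programs. For example, a predicted polypeptide searched against a protein database calls for blastp, while an untranslated genomic sequence searched against a protein database calls for blastx.

```python
# Query type, database type, and what gets translated, for the standard NCBI BLAST programs
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide", "nothing translated"),
    "blastp": ("protein", "protein", "nothing translated"),
    "blastx": ("nucleotide", "protein", "query translated in six frames"),
    "tblastn": ("protein", "nucleotide", "database translated in six frames"),
    "tblastx": ("nucleotide", "nucleotide", "query and database translated in six frames"),
}


def choose_blast(query: str, database: str, compare_proteins: bool = False) -> str:
    """Pick a BLAST program from the molecule types ('nucleotide' or 'protein').

    compare_proteins only matters for nucleotide-vs-nucleotide searches: True selects
    tblastx (translated comparison), False selects blastn (direct DNA comparison).
    """
    if query == "nucleotide" and database == "nucleotide":
        return "tblastx" if compare_proteins else "blastn"
    for program, (q, db, _) in BLAST_PROGRAMS.items():
        if (q, db) == (query, database):
            return program
    raise ValueError(f"unsupported combination: {query!r} vs {database!r}")


print(choose_blast("protein", "protein"))     # blastp: predicted polypeptide vs protein database
print(choose_blast("nucleotide", "protein"))  # blastx: 20-kb genomic sequence vs protein database
```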


Results: Based on the tasks carried out in the prac on your allocated 20-kb DNA sequence (attachment:) (make sure you note which sequence you worked on), present the findings in this section of the report. Specifically, we are looking for:

1. Present the gene models from the program you believe gives the best, most accurate ab initio annotation. Provide a summary diagram showing the entire 20 kb of DNA annotated with all predicted genes. Your diagram should indicate the base-pair positions of all gene features. You must show exons, introns, start and stop sites, and the gene direction. You should also provide the polypeptide sequence encoded by each gene. State in the text which annotation method you chose to depict in your diagram and why you chose it (i.e. its strengths and weaknesses compared with the other methods). Explain the support you obtained for these predictions. Use the appropriate BLAST program to determine the type of protein produced by each gene predicted by your chosen ab initio method. Present the evidence and comment on your confidence in each BLAST result.


2. Use the homology-based method to give an independent prediction of the gene models in your 20-kb sequence, and report on it. Provide a summary diagram showing the entire 20 kb of DNA, annotated with all predicted genes. Your diagram should indicate the base-pair positions of all gene features. You must show exons, introns, start and stop sites, and the gene direction. You should also provide the polypeptide sequence encoded by each gene. Determine the type of protein produced by each gene predicted by the homology-based search. Present the evidence and comment on how confident you are in each BLASTx result.

3. Identify the one type of gene that is found on all the DNA fragments allocated to the class. Either download a few other sequences to identify the gene type they have in common, or check with your classmates which gene types they found.
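
Results items 1 and 2 both ask for the polypeptide encoded by each gene. That means joining the exons in order and, for a gene on the reverse strand, taking the reverse complement before translating. The sketch below assumes exon coordinates are 1-based and inclusive (common in gene-prediction output, but check the format of the tool you used) and uses the standard genetic code; the function names are illustrative only. FGENESH and AUGUSTUS typically report predicted proteins themselves, so this is mainly useful for checking a model or for building one from BLASTx hit coordinates.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def splice_cds(genome: str, exons: list[tuple[int, int]], strand: str) -> str:
    """Join exons given as 1-based inclusive (start, end) genomic coordinates.

    Exons are joined in genomic order; for a '-' strand gene the joined sequence is
    reverse-complemented so the coding sequence reads 5'->3' from the start codon.
    """
    cds = "".join(genome[start - 1:end] for start, end in sorted(exons)).upper()
    return reverse_complement(cds) if strand == "-" else cds


def translate(cds: str) -> str:
    """Translate codon by codon, stopping at the first stop codon; unknown codons -> X."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy forward-strand gene: exon 1-6, intron 7-18 (GT...AG), exon 19-24
genome = "ATGAAA" + "GTAAGTCCCCAG" + "TTTTAA"
print(translate(splice_cds(genome, [(1, 6), (19, 24)], "+")))  # MKF
# The same gene read from the reverse-complemented sequence, i.e. on the '-' strand
print(translate(splice_cds(reverse_complement(genome), [(1, 6), (19, 24)], "-")))  # MKF
```

The strand argument corresponds to the gene direction you are asked to show in each diagram.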

Tip for visualisation purposes: Geneious Prime is very convenient for showing annotations along a sequence. You can download a free trial from here: https://www.geneious.com/prime-features. Alternatively, visualise the results in PowerPoint or similar software.

Discussion: Include the following discussion points:

1. Compare the ab initio predictions and the BLASTx-based homology predictions for your sequence, and comment on which approach worked better for annotating your sequence.


2. Comment on why BLASTx was used for the homology-based prediction. How is BLASTx different to the other kinds of BLAST? 


3. Discuss the predicted biological and/or biochemical function of the gene found in Results #3 above. Include the nature of the conserved functional motif(s) and the homology to characterised genes. You will need to search the scientific literature in order to determine the biological and/or biochemical function of the chosen gene. 
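
For discussion point 2, the key difference is that BLASTx translates the nucleotide query in all six reading frames (three on each strand) and searches those conceptual translations against a protein database, whereas blastn compares DNA with DNA and blastp compares protein with protein. The sketch below shows only the six-frame translation step, not the alignment or scoring. The frame labels follow a simple convention assumed here (minus-strand frames counted from the 5' end of the reverse complement), which may not match the numbering in a given BLAST report.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate_frame(seq: str, offset: int) -> str:
    """Translate from the given offset to the end; stops kept as '*', unknown codons as 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(offset, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Return the conceptual translation of all six reading frames of a DNA sequence."""
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(forward, offset)
        frames[f"-{offset + 1}"] = translate_frame(reverse, offset)
    return frames


for frame, peptide in six_frame_translation("ATGAAATAG").items():
    print(frame, peptide)
```

Because every frame is searched, BLASTx can detect a protein-coding region without knowing its strand or reading frame in advance, which suits an unannotated 20-kb genomic sequence.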


Visualisation results:

Fgenesh:

Augustus:

>42 CTGGTTCATCCACCTCGTGGTAGTTCTGCTTTACACTCAACCACGGAACCAGGACACTGCCAAAGACGGA GAGAGCGGCCATGGCAGCAGCTGCCAAGAATGCCTGGTCTAGAGCATGGTTGTATGCTATGAGTACCTTG GGCAGCTCCTCCGCGGATACGATATTCTTGATGTCAGTTGCTCCGGTTGTAGTGATAATGGTTGGATTTA CCGATGGCACATTGTGAGCAAGTGAGTTGACAAGGGAGTTTGTAAAGATCGAGTTTCCGATACTGACAAA GATCATTCCGCCAAGTGTTTGCAGAAACATAATGATAGATGTACCAATAGGTATGTCCTTTGTTTCTAGG GCTGCCTGAACAGCCACCAAAGGCTGTCTTGAACCGAGACTGTATCCGATGCCGAACAGGATCTGATAAC CAATCCATTTCCCGCTTGAGGTATGAACTGCAAAAGTAGAGAGAAGACCTGCTCCGACAGAGGAAAGAAT TGCTGCCGCAATCATGAACGGCGTATAGTATCCTAGAGCCGTCACCAGGCCGCCGACCATTATAGACATG AGAATTGTTGAAATCATCATTGGAAGGTTTCGAATGCCGGATTCAATAGCTGAAACTCCTTGTATTGCCT GGAACCAAATGGGGATGTAGTATATGACAAGAAAAAATGCCGAACTTGTACAAAACCCGTAGAGGCAACC AGCCCACACGGAACGATTGCGAAAGATGCGCGGAGGCACTGTTGCAGCTTCTTGTAGGCGAAGCTGGATA AGCAGAAAAGCTATGATCAGAACGCCAGAAAGAACCAGAAGAGCAATGATTCGCCCATCATTCCAGCTGT ACTTCGAACCTCCCCATTGTAGGGTGAGCAACAGGCAAACCATTGCTGGTAAGAAAATCAGGGTCCCAAT GGGATCCAAGACTCTTATGCGATCCATCCAAGTTTGAAGCGTTGGCACATCTCGTTGCGGATCCTTGAGG ATCAACCAGATGGCCAGAGCCGTAACGCCGCCCAGTGGTAGATTTATATAAAAGCACCATCGCCAAGACT AGTGGTATGAATGTTAATAAGAAAAAGGCGCATTTGGTATTCATAATCCTTTGTGTTTGGAACGTACCAC ATTGTCAGTAAGGACGCCTCCGATGATAGGACCAGCCACAGATGCTATCCCATACATACCGCCCGCTAAA CCGATATATGTTGGTCGTTTTGCCAGAGGAGAAGTATGGGTGATGATGAGCAGAGCACCGGAAAATATAC CGGCAGAACCAGATCCAGCAATTGCTCGTCCGACGATAAAAGCCGTTGATGAAGGTGCGACGGCGCATAT TAAGCTACCGAGCTCGAATACGGCGATGGTCGATAAGAAAACCAATTTGATGCTCCAAAATGTGTAAATC TTGCCGAAAATGAGTTGCAGGGAGGCAGTTGTCAGCAAGTAAGCTATAGTCAAGTAGTATTTAGTAGGAG TTTTCAAGGTACTGTAGTAAATGCTTTCTAGGTTGGCCTACCACTTCCGTACCACCCTACATCTAACACG CTACCAAAGTCATCAGTGATTTTTGGAATCGCTGTGGTTATCACCGAGTTGCTTTATCACGGTCATAGTA TTAGTTCCTTCAATAGCAAAGTGCGAGAGAAATGAAAACCATACTCTAAAGCCATCAAGAAGACTGCAAG ACACAGAGAGACCACTATGAGAAACATTTTCGCGCCCTTTGGGTAATTTGTGTCATCTTTCGCTCGGCCA ACTGTTTCTCGTGATCCTGAAGTAGAGCTGTCATTTTTTTCACAGATATCCTCATTGGCCAGTGTTGATA GTGTGCCATTTTTCAGAACATCTGTCAATTCTAAGGAATCGCCAGTAGTTCCCTTCGACCGTTCTGTGTT AAGCGAGAGCATGATATTGAGCTATAATGAGTATAACGACGTTCAGTGTTTTCTCTAAATTCTTCCCTCC 
TCATTTGCGACAATGGAATCTACGAAACACAGCTTTCGCTAAGTAGCTACCAGACTACTATTGCATAATA AGACACGTGAAATTAGTGCTGCTCCGGTTTTTGCGACAATTACCTAAGAGAGCCAAAAGCGACTTTTCAA GGTGGAGCTTAGGTACCGGTACTGGTAGTGATCTAACGTACATATCCTAGCTTTCTTGAGACCATACACG AGAAGACGGTGTTCGGCATCTTACTTTTCGTCCGGTGCCTAGTGGATTACTTGCGCAGGAAAGCCATCAT TCGGTGACTTTTTACCAGTCAAGTCGTTCAAAATTTCATCGGCTGCACTAGGAAGTATCGAGGCGACTGT GCGTATCAGGGCTAATACTTCGGCTGATATAGTCCTCCCTAGAAACAAGGACATGGTCATAGAGGCCCCG ACAACTTGAAATGCGAGAGAAGGAATCTCACTGGTACTCTTGAATTCATGCTGCTGCCCGTCGACAGAGC TTCTAGAAATACGGAGGATGATATGCTTGCATGAAATAATGGACCCATTCGGGTTCGGTTTCCTATTGCA GAAGGCCTGCATCAAATGAAACCGTCATCTCGGTTGAAGTAGTCAGCATAGAGTTAGTGCTAAAACACAC AAAATTGATTTGGATGCTATGAGACAAGACGAGGAGACCGACTTCTACCTACGTAATACGATTATCGTGC AAGGTTATGATATGCACGGAGGCGGCCTATAGCACTCGGAATCAGCCCTAGGTATCAATATTGTAAAGGT TGAAAAGCCGTTAAGTTAGAGTAAAATATTATTATGTATATGATGTGATGCAAATAAGATCAGCCATGTG CCCTGAATAACCCTAATTCAACTCCGCATACTATGGCAGCAGCAATAACAGCAAGAAGGAATCACCCTTT CGCAGAGAAATGGCCATCTCTAAAACACAAACAGCAGTCATCCAGACAAACGAAAAGTCATCGCTTCCAT TGATCGTATCGCAGTCGGTACCAACGTTTCAACTGCCTTCCGAGCAATATGTACTGGTTCGTGTGCTGGC CGTGGCCCTGAATCCTACAGATTTCAAAATGGTACAGCACTTTCCAATCGGAGACAATCTCGCTGGTTGC GACTTTTGCGGAATTATCGAGGCATGCGGCAGCAACGACACTGCCAACCAATTTCCAATTGGAACACGAG TATGCGGTGCGGTGTTTCCTTACAACCCCAGTCAGCGCCAAAGCGGTGCCTTTGCGGAATGGGTGATTGC CGACTCTGGTTTATTGCTGAAGGTCCCGGAAGGTTGGGATGACCTCGATGCAGCGGCTTTAGGTGGAGTC AGCTGGGGTACCTCAGTACTGGCTTTTTATGATCCTGATGCGCTGGCACTCATTGGTCGTCCCACTAAGC CGAGTGAGAAAAGGGAACCAGTTTTGGTTTACGGAGCTGGCACAGGATCTGGCACAATGGCTTGTCAATT ACTAAGGCTGTAAGTCATACCGTATACAGCTTGATAGACAGAGATAATTGCAGGCTAACTCAAAGATCGT CTACATTTCCAGAGCTGGATACGCACCAATTGCAGTTGCTTCTTCGAAATCGGCAGCTATGGCTATGGAA TACGGCGCAATTGGTGTTGCAGATTATACATCCTCAACTTGTGCTGAAACAATCAAACAACTCGCTGGCG GCGCTCCAATCCGATATGTTCTTGATTGCATAACCACGGCGGAATCAGCTGCACTTTGCTTCAGTCTTAT TGCCCGTACTGGTGGTCGCTATGCTTGCCTTGAAGAGTTGAGACCAGCTTGGAGAACACGTCGAACAGTT CGCGTTAAGGAAGTCATGGGCTTCGAAGGCCTTGGAATAAAAATTGATTTAGGACCGACGGCCTATTCAC GCGACGTCAATCTTACCCTTCGCAAAATTTGCTGCGAAGCTACCAAGGAGATTCAGACGGTTCTCGATGA 
AGGGTTACTAAAGCCACATCCTGTTCGTGAGGTTACAGGACAATGGCAAGGCATCATTGATGGACTAGCT ATGCTTCAGCGAGGAGACGTAAAAGGACAAAAGCTAGTGGTCAGGGTTTCTACGATATAAACGTAGAAGC AATAGCAAGAAAAGAGTTGGAAGGGAAGCCAGCGATGTATATTCTTTTCTTGTATGTACCTAGGTAGCCA TACGAACCATATTTTCTCTGGGGTGTTCCTTTTTCAGCTGCTGCATATATCTACTTGTTACGAAACCGAC TTGCAGGGATTGTAAGATCATTTAGGAAAAGCGTGTATTAGACAGGACATCAGAATGTGGAAAGCTGATT GTTCCCAGCCCTCCTCTCCTCAGCCATCATTTTAGGCATCTGCGTCTCCTCCCCTTCTTCATCTCAGTCA ACCTGCGCACTACCTTTACACCGGCACGCACCGTAATTGATGAATGCTCATTCAACAAATTGGACAAAAA ACATGCTACTTAGTATTCACGGATCAGTTGGCGGGGTCATCCGAGGTCAGATGGTCTGTGAGAGACTACC TTCGAACTTCTGTTGCAAATAATTTTGTCTCCATATTGCCATCGACAGTAGTCTTTATCACCACAACTAT GGATGAGAAGTGTGAAATGATCAAGTTGCTTGGGGGAATTTTTATTACTGATCTAAAGAAGTACCCTTTC TTAGATTTGCCTGAATGATGCCTGAGCTCCTGCACATCCCTCGTCTTCTGAGGTCAATTGCTTTGTTTGA TTGTTGATGATATTGGTAGAGGGTCTGATCTTCGGTGAGTGTCGCCAAAGCAGCCCTCCTTTGTCTGTAT GTTTTCAATGTAAAATCATGACTATTATAAGAATCCGGCTTGCATGAAGCTCTCTATTGTTCATTCTATT TAGCAGGAGTCTCTCTTCGCAATCGTTCTTCCGCTCTTATGAACATAAAGCGAGCCGAACTGCTGAGGAC ATCAGGTATAAGACGACTCGCCGGTAAGCCTAAGATTCCATTTTCGGGAAAGAAACATCATTGTATATTT GATCACTCTTCTGGAACCGGACGTTAAGAGGCATAATCAAATATAGTCCTTGATTTGCTGCTCAGTCCAC TCCCAAAATTTATACGCACCACCTGTGCCACCCTCCGACTCGAGCTTCGTCGCATTTATCAGGTCCGGTC GCAAGGGGTAGAATCTTCCGAAAGGAGCCACTAGAGAAAATCGCTAATATTAGCAACTGGATAGGAAAAC AGAAAGTATGGTGTAATACGTACCCCATTCTCCCGACTTTTCCAAGGTGATTTCTGGCGACAATGCGGCA AAAAGCTCGGTGTATGCACCCATTATAGGGGGATAACCGACTACTTTGACAATTGCCTTCAAGGCTGCGG GTTGGTCGCGAGCAAGTTCCGAGGTCAGATTACCTGGGTTGATCGGTACACTCACAATGCCATCTGCTTT GTGACGCCGGGCATACTCCACACCAAGTGCCCAAGCACCACACTTGCTGATTCCGTATCTCTCAGTTGCT GGTTTGGGCTCATGATAATCCAGATTGTCGAGTGCAACGCCCACATTTTCTGCACCGAAAAGCTCTAATC CAAATGAAGAAAGCCAGATCACACGGACCGAACCTTCGGGCTCGGATTTTGCTGTCTTTGCCAGAAGAGC TGTAACAAGCCTGGTGAAGAGATGCGTTCCGATACAATTGACACCGAGCGCAAGCTCATGGCCTTGAACA GTATTCGTTGCCTCTGGTGTCCCAACCATTACGCCGGCATTGTTGAAAAGAACATGGAGCGTTTGTTCCT GCGACAAGAATAGCTCGGCGGATTGCTTCACCGTGCTGAGATCGCTAAGATCCAAATGAAGAAAATGCAA TGAACCGGTAGAACTGGGCTGAGAATTCCTGATCTCGTCAATAGCTACGCTCGCCTTCTCCGTAGAGCGC 
GCAGCCATCCACACTTTAGCCCCCTTTGAGTAGAGAATGCGAGCTGTTTCTTTTCCAACCCCTGTGTTGG AGCCGGTAACTAGATACACCTTGCCTTCAAGACTAGGCACATTTTTCTCGGTAAATTCGGGTTTAGGAGG GCGAAATGAAGCCCAAGTGCTTCGTAGTGATGGACCAGGTGCCATTACTTTTCTTATCGTATTGCAGATT GACGTGTTTTCCGGGTCGTAAATGTTTCGTAGAAGTAAACACAGCTCTGGTTGACATGTATCTTTTCCAT AAATAGTGTACCAAGCGCCATTGGACGACTAGAATTTACGGTAGTAACTCGGAGTTGTTGGATCATGGCC CTTCAGCCGGTGCATTCGATTCGGATTCGGAGCATCAGCAAACCAGAATCTAGTTTGTCAAAAGTCATAA TCTACTACCTAACTACTACACACCAGCCGGCTCATCTATTTATGAAAAGTTGACACATTATTGCAACCCG CCATCCTTGTTATGCGACTTACTGTTAAAGCTACAATTAGTACTGTTTTATAATTTGGCCATACTCAAAC TAAACGACTCACAAATTTGGGACATATCTGCTTCAATCTCTGCCACATGAATAGCTCCTCTCGGAGCGAT ACATGCACATACTTTGATTGTCCTCCAGGATGCTCACTGATTTAAGTATATGGCATTTGTATGGGAATGT AGGTTTTATCTTCGCAATAATATCGGAGTTCTATACTGGAGTGATTAATGTCGAGATAATAATAGTCGAT CGTCTACGAGAAGTAAACGAATACCAAAGTGGCTTGAAGTAATAGCGTCTCATTCTTACCGTGTTTAATG ATATTCAGAAACTAGTTTCGTGGTCTCACTCATTGAAGACACATCTCAACTTCATGCGTATCTAGAAGAA AATAGGACAAAGACCTCACATAAATGTGGAATGAAAATTACAGGACTACCATCAGGGAGGACAAATTCTG ACCTTGGTATCGATCTACAGTCGATAACCAACTTACCAAAGTCGCTTTATTCTGATCTTCATCTCACGAT CTGGAGGCATAGTACCTGCTCCGCAATAGGTTTGGTTATTCTGCGCTGGTCGGTCTGAAGGAGATCCGTC CAATTTCACCCATTTAACAACCTCTATGTCGAATTTCGATACTAACATGCCTATTGTAGTCAGGATCTCG TGTTTAGCGAAGTGTCGTCCAGGACAAACGGGTGGACCGCCGCCTACAATGATGGATTAAATTAGTATCA CGATGAATGGATGAGCACGAAAATGAGAGTTGATCATACCGAAAGGAAAGTAGGAACTCGGCCGCCCCGC CATAGCAAAGACACGTCTGCGCTCAGTCTTGCCGTCCTCCTTGGTGTCCTCCACGTATTTGATGTGTCGC TCAGCCCAGAATTCAGAAGCAGGGTGTCCACTTGATCCCCAGACACTTTCTTCGTAATGCGCAGCCATCA TAGGGACCTGTAACATGGCCCCCTTTCGAATATTGTATCCGTCCATGGTGATGTTTTCTTTGGCGTGGCG AATAATGTTGAAATTCATGTGCAATCGCAGAACTTCGGTAAAAACTGACTGTAGCAAGGGCAAGACTACT AACTTCTGAATATCACAAATATGTTCACCAGTTACAGGGTCGATGGAGTAGGTTGTAGCTACTTCTTCCC GGACCGCTTGTAGTAGGCACGAATCCTGGAAGATCCGGAGAATCATCCACATTGTGGTTGGGATAGTATT GGAATTCTGTCTGTAGATAACCAATGTTAGAAAGAAGCGTGCGAATGATATCTCACATCTCATGAGAGAG GAGAACGTACGCGAAGAGAAGAGTTCCAAGTGCACCAGATATGGATACGTCTGGAAAGCCACTTTCCTTG AACCACTTTGCAATCTCGCGACAAACGCGAGCACCGAAATGAGGTTCCCAGCTCGATTCGGCGTCCGGCC 
CATTCCAGTCAAAGTTGTCCCAAGCAGCATGAACATACCTCTCGATCGTAGAGAGGTACTTGTCCTGGGC ATTGTAAGGGCCTGGGTTGAGCCATTTGGGAAAGCCCAATGTAAGCATAAACACATTATGATCGAATTCC CAAAATGTATCCAAGAAGTCTGGGTCCAACTCGAAGACCTTAGGGCCCAACAAAGTACTCATGGCGCATT TGGTCACCTCGCGTCGGCACGTTTCAACCACACCTAGCGTAGTCCACTTTTCAAGAGGATATTTATTAAG ATCGTGCGCAAGCTGATTGCTATATGCCTGAATAATAGGCTTCAAGTGTTGCGTTCTCGCCAAATATTCT GTATGAATTTGTTCGTATCTCGCCCAGTATCGCTGTTCCTTTGGTGTGCCTTCCGTGCCTGGTGCAGGAA CATGACCTCGCCCGGACTTGTCGTTGGCAAAGCGCTCTACATGGCTCTTGGGCATTTTGTAAAGAGTGGG AAGAACAGCCTGAGTGAAAATTTCTTCGCTTCCGACTTTGTTGTCCCTGCCGAATATGGACTGTATATGT TTTGTTCCGGAGATGAGGTAGACCGGCACTGTTCCCATGTAGAATTTCGCAATGCTGTTCTTGCTCAGTA GTTTCCTGTTGGCAATAACAAGCCACGTTATCAGTTCATGAATTCCCCCATATTGGAATATACCCAGTAG AATCAACACCGAGGGAAATCGATAGTACTAACATTACGCGTTTCATAAACTTTTCATTATTGGTCAAGAA CTGAACAGTGTTATATATGCCAGGAATCGGGTCAGATAGTTTGACGGGCTCTCCCCGTGAACTTATAAAC CTCCATGCTGTGAGGGACAAAAGGATGAGGAGCGGCCAAATGACGTTACGATTTCCAAGGATTCCCAGCG TATATTGTGTTAGAGGTTCCGTGACCTGCATTGCTTGACTGAGACGTTGATCCATCGTTGCCCCCATAAG AAGCTCGTCCTCACAATCAAGTTGCAAATATCATCCGTTCTTCAGTCAAGCGGACTCGACTATGGAAACA ATATATGGCTATGACACAGTGAAACAATCGACTACAAACGTATAATCGTATTGATTCAGTTGTTCTATGA TACTACTTCAGTAGGTACATAGGTTTTAGAGCTGACACTTGGTCATCAGGTGGTTATTTCAGTAAAGTCT CCTCCGATTGGTGCGAGACCCCGATTTCACAGACGTCTCCCCGAATACATACCCCGATCCCGAAGTGCTC TCAGAGGATCCATTTCAGCCGAGGTGATTTCCAGCAACGTAAGTAATTTGACCATTTAGAATCCCTAATT CTTGCTTTATCAGTGTGTCATGAATGCTTATTTGACTACCATAATCCGAGCCAGTGAATGACGGTAGCTA ACTAGGGCGAGGATGATAAAGCCCCAACGAAGGCCTAACACCACCAGCAAGACGTTGACGAGACCGTCAC ATACTCGACATACACATTATATCGCCTATCGATAAATAAGACCTGTAAAATAAGCTTCATATCAAGAAAT GAGACAATAATGTACAGTGTCCATCGAATCTTAATGTATTTAGCCCGTGCCTTCAGTAGCTGAGATTCAT TCCAAGGTTCTTAACTCTTTACTTTAACTCGCTCAGTATATCAAACATTCCTGCAAGCACTGTCATAATT CAAGGTTCAACAAACATTGTCACAGGCTATTTCACTTCGACGGGCGTTAAAGTCACATTCAATGGTATCC GATCCCACACGTTGTAATTCTTCTGCCTCTCGATCCAGCTATGGCTGCCATTCGCAAGTTTCATATCAAA ATTATAGAGAATCTTAGCCAAGATAAGACGCATTTCCGTCATTGCAAGACTAGGGAGTGTTAGGTATGTC CAATACCGGTACGCACGAGCTCAACAAGGGTAAACAGAAAGATGAAGAGTACTAACTTTCTTCCGATGCA 
ATTCCGCGGTCCCATGCTAAATGCTTGAACGGCTTCCAACATATCACTGTGTTCCTTGGCTACTTTGACA TCCACCATAAAACGTTCAGGGCGAAACGACCAAGGGTCGACCCAGTTCTCCTTGCTGTGATTTATCGACC AATGTTGGATCTCAACTAGGGTCTAAGTTTTAAGAGATGAGCTTGGATAGTCACAAGAAAGCTTTTGCTG AAACGGGAGAAAAGAAAGAAAAAAAAAAAACTAAACTCACATCACCGGGAACGAATCGCCCTGCGATCTC AGCACCTCCTGTAGGGACTTCTCGCACCATTCCAGAGGTCAAAGGTGGGTACATTCGCAAAGATTCGTTT AATACAGGAATCAAGTATGTCAATTTGTTTACTGATTGAATATTTATATCCGCGTCTGTTGAGAAGGCAG ATCTGATCTCCGCGGTCACCTTCTCCAAGACCTCTGGATGCGTCAGAAGGAAATAGGTTGTGCCAGATAG ACAAGTTGCAGTGGTTTCGGAGCCAGCTAGGAGTAGGACAGTGGCATTCACGCTGAGTTTATCCTAATAG ATTTTTAGTAGAGAGTCCCGCAATACACAGAGTGTCCTGGATTGCGTGGGTGCATGAGAGGATCACTTAC TAGAAGTGGTTTTGTTAGCCACCATATTACCACCAATCATCATTTTAAGCAAAGAGCTTACTTACCAAAA GATAGCCCCTAAAGAATGGAGACTCAAGATTTAGTTACGGTTATTCCCGTTTTCAGCAGACCTGGTGAGG ACAACATTACCCATTCATCTCGTCTTTTAGCCAAACCTTCAAAAAGGTCATCATGCCTTTGCTTCATTGC CAGCCTTGACTTTAGCATACTATCGATATAACCGCGCATTGTTTTCATCGCGAGAAATCCACCGGCTTTG TGGATCAATTCGACCAGCGGCCTAGCACCGCCATAACCTAGAGCTACCATGAACGCATGAAACTTGATAG CCTTCATGATAGTAGCAACCCATGGGTGATAATTGACTGTTTGAAGACAATGGAACGACTCTCCAAAAAT CAGATCACTCGAGAGGTCAAAGGTTGTCCAGTTATACCATGCCGCCATATCCGGATTTTTGCCAGACTCC GTTCGTAGTCTATCCACTAGCATGTTTATGTGTTTGAGCATGGTGGGTTCTTGCTGTCTCATGGAGCTGT CCGAGAAGCCATGAGAAAGCGCTCGTCTCAGTTGTTGATGTTCGTCTTGGTCGGCATTAACAATGGTGGT AAGTTGGCCTTTGACATTTCGCGTGAACTCGGTAGACTTGGCCATTTCCACCATACCACCTACAGCCCCA ACACGGTGGTAATAAATATCCTTCCATGCCCGTGGATCGGTGAATGAGAGGTGGTTGGGCGCAATGCGCA CAACCGGCCCATACATGTCATGGAGTCGTTGGGTGTAAAATGCTTGTTCGCCGCGCGCATGACGAATAGC CCATGGAATACGGGAGATACAATGAACTATTGGCCCAGGAAAGTGACACAAAGGGTGGAAGAAGATATTG TATATGGCCGTTACTGCGCTCCAAAGACAAAATACCTTTCGAGCTGAAATATCAGTTGTTGCAAGAGATT CCAGGGTATAGAAAGGTAATGGTAAGAGACACTTCACGCACCGCTGTTAGGATAATTCCAAGATTGACCC AAAAGCCCAACTTATCAGGTTGGAGAAGTTGCCAAAGAGCTATAATAGTAGTCATGTTGTTCTTGGTTCC AAGGTTTTACGTGGTCTGGAGATTGCTCAATAGGACTGTTGATTCATCAGGGACAAAGAAATATGGAAGG AAGCTAGACTGGTTCTTACTATCTAATAGCTCGTACTTACGACCGTTTTGGAACCAACCAACCTGCTGTA GCGCCATTCAACATACTATTCCACCTGAACTTCATGATTGTGTATTATTCTCCATCCGGCATTTTCGGCA 
AACCTTTGATTGTCGCAAACAATACAAGCCACGTTGCCTAAGCGAATAGGGAGTGAAATTATCTGGAAAC GCCCAAGTTAATACATCAAGTTAGTGACAGCTGGAAGAAGAGAAGAAGGAAACACAAGCATACATCGGCT GTGTATAGAATCGGACGTCGTAGCCCTAACTAGCCATCCGAATCGGCCCCTACAGCCCAATCAGATGCCG AGCAATGGTCGGGTCATGTCTAAACACCGCCGAATTTACGAATGGAAGGTGTAAGAACAAAAGTTTCAGA AGTAGGATTATTATTGTACATTGCGATTCCAGCGGAAATAGTAATTTCTGTTTCGTTATGTCCGTTTTGA TTCGCCTGGAGTCCGTTGTTGTATCCATGCCCCCCGTTGATTGTAGACATGCACCCAGTTAGCTAAGCTC AGTCACCACGTTATTGAAATATCTGACCTGAGTGAAGTTTTGCCACTCTTCCCTTCTGGCATTTGGTCAA ATTGTAAGTGGTGCAATAGAAGCCTTTTGGCCTCTAGGATTCAGATTGTCGAAACCTTACATTGTGGGAC TGTTATCCATGTGCAAAACATCTCAAAGTACTACTGATTTAGACTCCAATAAATCACTTTCTTTATCCAC AAACACCAGATACATACATACATACGTGAGGCATCGTATGCACCAGGCATTATGGAGGCCGTCTTAGAGA CAGGAATGACATTCGTGTCGAGCCTCACAATGAGGGACATGTTGGCTTTGGCCGTTTCCACGGTTTGAGA ACATCTAAAGTACTGGAATATGATGAACGTTATCTCATGATACTATTGTGTACAGACTCTTCTGGTGCCA ATCACTTGGATGATCTATAACCTCTATTTTCACCCGCTGGCGCGCTTCCCTGGACCCTTTTGGCACCGTG CTACACGTCTTGCATACGTTATCAAAATGAACAAAGGAACACTTGCTTTCGACGTGCTTCCTATGCATAA GAAGTACGGTCCCGTTGTTCGAATTGCGCCTAACGAACTCTCCTTTCAAACTCCCCAGGCATGGAAAGAT ATTTATGGTCATCGGACCGGCACCGCTGCCGGAGCTGAGGAAATGGACAAGTATCATACATTCTACCGTA CCAAGGGAGAGGTTCTCAGCATCAGCTCAGGAGGTCGCGAGTATCATGGAATTCTTCGTCGTCAATTAGC ATACGGATTCAGCGACCGCGCCATGCGCGAGCAGGAGCCTATCATTGGCAGTTACATTGATTTGCTGATC AAGAGACTCCATGAAAACTGCGTTGATCCCAACGTGAAGGATCCCAAGACAGGAAACCCAGCGGAGAAAA TGCTAAACATGGTCTCCTGGTACAATTGGACCACATTTGATGTGATTGGTGATTTGGTCTTTGGCGAGTC CTTTGGATCGCTTGAAAATGGAAACTATGACCCTTGGGTAGCCGCTATCAATGACTCTATCAAATTCCTT GGTGTTATCAATGGAGTCAAGCATATGGGTCTTGAATCTCTGTTCATTTGGGTTGTTAAAAAACTTAATA CCGGCCGCCGTGAGCACACAGATCGATTGGTCAAGAAGCTTCAGAAACGTATCAATCTCGGCGTTGAGCG TTTAGACTTGATCGAAGGCTTGCTACAGAAGAAAGATGAATGGGTGAGGGAGGCCCCTCTTCTCAAAGTC TCTGAGATTGAACAATGTTTTTTTTTTTTTTTTTTTTTGTTTTGTTTTTCTTACATCCGAGTGCATGCTG ACATTTTATAGAACCTATCTATTCACCACCTTGAAGCCAACGGAAGTAGCATCTTGATTGCCGGTTCAGA AACAACAGCTACAATGTTGAGTGGTGTTACCTATATGCTCCTTACACACCCAGAGGCCCTCCGCAAAGTT ACCAAAGAAGTGCGCACCACTTTCAAGTCCAGCGAAGAGATCACATTGACATCTGTCAGTTCTCTTACGT 
ATATGCTTGCCTGCTTGAATGAGTCACTGCGTGCTTACCCTCCTGTCCCTTTTGGCATGCCTCGCCAGGT TCCAAAGGGCGGAGCCACCATTGCCGGAGAATATGTCCCAGAAGACGTGAGTTTATTCCCCACTGTTGTT GTCCATGGCTCCCACTCCATGGGCTTCTCGTTGTTTGTTTCAACTTACTAAACATGGACCAATCCAAGAC CGTCGTCGCTATTTGGCACTGGGCTGCCTACCATAACGACCAGCTATGGACCGATCCATTCGGCTATCAT CCAGAGCGCTATCTCCACGATCCCAGATTTGCTAATGATGCTTTTGCAATCCACAATCCATTCTCCCACG GGCCCAGAAATTGCATTGGAAGACAGTAAGCCAAACCAACAACTCACAAGGATGAGATGAACTATTAATT GTATGATGTTATTGTCACTGACTTTCTCGTCTAGTTTGGCATACGCAGAAATGAGACTTGTATTGGCGCG CCTCCTTTTCGATTTTGAGATGCGTTTGGCAGATCCTGACTTAGACTGGTTGAACCACAAATCTTATGTG CTGTGGTCGAAGCCAGCTTTGAACGTATATTTGACTCCTCGTGAATTTTAAATACGTTTTCATGGAATTT ATATACTGGTATCCTCCTGAGGTGGAATCTGCCAGCAATGAGAGCTAACCTTGATTACAACTAGAATGAA ATCCTTCCAAGGTTATTGTTCTTGATTTTAGTGACTGCCTTTTTGTTTTATTTAAATTTGGAGGTTTGGC GATACAACATTGAATAGATTGTAGCAAGGAATCCTGCCAGGCTTGATTAGTTTCTCAGAGTAAGTTTGTA CCTCTGCCGGTATATATATACATGGATGTATATGTAGATCTTCAAAGTATGTCATGAGGAACATCAGGGG AGACGTCTACATATGACATAAATTATCCACTATGAATAAATAATTCATGGAATCCATGGGTCCCAAATTC ATCTACCCTGGACCTACACCGAAATTACAACTGGCACGCTTTACTTCCACAAATGTTGTAAGGTGCCAAT AGTATGGGCCTCAAGGCTGGCAGGGCTGGGATGTCTAAGTAAGAAGTCTAAACGTGGGCAGACCACCAGT AACTAGATCTAATAATACATGGCAGTCTGCGTCTGGTTGCATATGTAACAGTTTATCTCCTGCATATGCC TCGGTGATCTGAAAATACTGCAACTTCATAACCGCAACTGAATCACATTGAACGTATCCGCTAAATCTAA CATCAGCTCATCGCAAGAACGCACAGATTTCACGTTAAACGCAGAGTCAAATCAAGCACAATGCAGCTCA CGCATTACGTCGTTAAAGGTCAACCCCAAAGCATTCGGGCGACCATTCCATATGGGGCAAGATTGTCCCG TATCCTCTCATTATGGGACCTTGTGTATCCCCTCGTGTCCTCGTATCGTTATACCGCATTTGATATTAAC CACTCAGATGTTCCACCTCCTTCGGAAAAATTCATGATCAACCACTTCACGCAGACTCACAACATTCTCG ATCCTCACTTGTTCTTCGGCGTGTTCAACTGCGCAAGCTTCTTCTGCACCTCCGCCTCGACCCTCCGCTG CAACTCCGGCCGATACCCAAACATAAACAAATACTCCAAAAACACAAACAACGGCGCCAAAAACAGCGCC TGGAAAAGATTATCCAACAAAGCCGGCGATCGTCCCTCGTAGGCTCCATGGCCAATAAACTGAGCAATCC ATGACACGATAAAGATTCCGAAGGACCAGGATGTAGCAGTCGCGCCGTAGGTCATTGTGAGGTAATTCAT GTAGGCGGCGGCACCCAGCAGAGTAGGAACGCATATCAGACCGACGACCGGCTCGAGGAGGATGTAGCCC AGGGCGTAGATTAGGGATTGGATTGTGCCGGCGTTGAGAGGGAGGTATTTGTATTGAAGAAAGTCCGGCA 
GGGGGATGAGCGTTGGTGTGTTGGTGAACTGTGATGAGACGACGGTCAATACACAATACCCCTTTATAAA GGAGGAAGAAAGAAGCTTACCAATTGAATACTGGTTATGAGGATGACGGGAACGAAGATCATGTGTATCC GGACATTGACCTGTGGGACGTTTATCAGCTGGCTGGCAATAAATAATAGCATGAGAGGATCGTCAGGAAC CGCATACCGGATTATGGTGGTAAGTGCCATACTATTTGAATGTTAGCGACGATTCTCTGGTCTAAATAGG TACAGTATCTGCTGGAGAAACTCACAAAGACGAGTTGTCTCTCAAGATCCAAAGTCATCATTATGAGATG ATATTGTGTGTGGGGTTTCAATCCCCAACAGTCAACGACAGGGTGTTGAGCTTCAGTGGGCGGTGGTCTT ATGAAAAGAACTATAAAACAGTTCAAGTGTCAATATAGCTATCAGCGCGGTTGTTAGGTCTTGTTTTGCG ACGTTGCAGATCGTTGCGTGGTTGTCGAAACAGTTGTTGTCTGAACTGTTGGGCTGTCGTTGGATGATGA TATGATAAGATGTTCTGGTAATACGTAGTTAACTAACTGAGAACGGACTTAGCAATACCTAACCAGCGCT GGATGCCGAGGCCACGTCCACCAAGCAAATGTGTCTGGATCACATGCCCTCCCTATCCCCCGCACTCGAT TTCATGCTCGATTATGGGGTGATGATATGGCACGAAATTTAGGCCTGAAATCGATTTATCGATCCATCGA TTGATTGATTGATTGATTGATTGATTTCACTCGACTATGGTACAGGGAACATGATCTATTACTGTGTACA ACGTAGCGACCCGTTCTGATGCATTCGATAATATCTGTACGAGCCATGACATGTTACGATACAATGGGAA TTGGTGTAGTCTGGTATTTCCCTACCTATGGCCCCGGCATACGGGTGTCTCAGTTGCTCAGTTGAAAATA CGTACAAAATATTTGCAAGCGCAGGGTTGAAGAGTCAAGACCATTGAGTTTTGGCGTGGCTTGTCAAGAC TTTCCAGGCAATGGTGGGTCACAGGTCACGGATCACGTGATCGAATGGAAACCCGCTCAGAATCCCGCGT CGTTAAGATGACTCTGCAGAGTACTCTATTTCAACAGTCTCGGGGCAGAAGACGTTGACTGGCGACATCC ATCGCAGGCGCAGCACAGAGCAGAATTTGCCTCTCGATAAGATAGACAAAAAACTTCTGCGACAAGAATA TCTGTGTCCATCCTGGTCTCCATGTCTTTAGTCTAAAGAGCTCTTCTTTTCGCATTGCCTTTGGACATGC TGAGCTTGGCCGGTTCCTTCTTTTGATTCTGATCGAGGGCTATTGCATAGCGGTTATTATCGCATACGCT TCGATTTCCGCATACATTCATTCCCACCGAACACCCCTCGGAAGCACAGAGAATGTTCTTCCTCAAAGAG GAAACCAAAGTCGTCACCCTCCACCCGTCCTTTTTCGGACCCAATGTCAAGGAATATCTGATCAACCGCT TGAACGAAGAAGAAGAGGGACGATGTACGGGTGATCATTTCGTGATTTGTGTCATGGACATGGTGGATAT TGGCGACGGGCGAGTGATACCGGGCAGTGGACATGCAGAATATACTATCAAATATCGCGCGATCATATGG AAGCCTTTCCGAGGGGAGACGGTACGCTTGTCTTTTCTTTCCATCTCGTATCTGTGGTTGTATGCTAACT CGCGTATAGGTCGATGCCATCGTGACCTCGGTTAAACCCACCGGTATCTTCACGCTCGCGGGCCCTCTGT CGGTGTTTATTGCGCGCAAAAATATACCATCAGATATCAAATGGGAACCCGGCACCGTGCCGCCCCAATA TACGGATCATGCAGACCAGGTTATTGAGCGAGGAACGAGTCTGCGGTTGAAGATACTGGGTGTGAAGCCA 
GATGTGTCTGCGATTAATGCGATTGGCACCATTAAGGAGGATTATCTTGGGTAAGATTATGCACATGACT GTTACTGAATTTGCTACTGACAGTTTTACGATAGCACCCTCTAAGGTTCCTTCACTATTTATAGAGAATC GTCTCCCTCAGCACTCGATGATGCCGAATTTCTACCCGCGACTGCCCTACGAAGAATGGCGGCGTATATG CCTGGTTCTGCGAGTGTTCGACAACAAGTGACTTCGGGCTGGTGATATGTCACTCGCAAGGAAGCTACCA AGTGCAGACGTCACAACCGGATAGTAATGAGCACGAACTATTTTGGCGACATCAGAAAAAGAAAATATTC TGTATCGTTATACATTATACAACAAAACGTGGATGAGTCGGGCGCACTATGTAGTAAATTGTACTTGACG GCGCATCACAGAATTGGCATAGTCAGCGTACATAATACAAATTAAGTTACAGTAACTAGTTAGTTACGAG CTGCGTATGTACGTAGTTGCTTTAAGTTTACTCTGATACAGTAAGCAACACAACAGAGGCTAGAATGGTG TGGGCCTTGCGATCCTCTAATAAGAGTTCTAGAATGAAGATTAGGCTCGGGCAATTTCGCAGGGCGGAGA AGAGAGAATATGCCCTGCCATGCGAATCATATTTCTACAGCATATGTAGCACACTGCTCTCAGAGGCAGG GCCGGTGGTTCGCTGAATCTCCGTATTTCTTACTTGACAGCAGAGACTGTACATCAATAACAACCAGTTT CCTGCCTTCGTTCTGGGAGGAGGACCGTTGCACTAGTTAACCCGTTCTGTGCTAAATTACTCGCAGACCC TACTAGCGCGGCGGACTAAAGATGAAACGCTGGTGCCCTTGCTTAGTCCCTCCCTGCCATAGTTATTTGA TTAGATTATCGTCTTGGAGCACCGGCCCTGGTCCTACACTACGTAGTAGTATGTGACTATGTATGCAGCC TGTATGGAGTAAGCCCAGAGACTGACTTTATGGAGTATGGCGTGTAACAACAAGACTACGTGTAACAAAA TAAATGAAGGAAACGCGTGGAGAACTACGGCGTAGTTCCTCTGTTGTTCCTCAGCTCCTGTCGCAAACCA AGGGTTCTAGCGGAGCGCGAACCAAACAACGAAGTTACTCTAGATCGAGTCTAGATAAAGACAAAAACAA TTGCTTTCGTCCTGACTCTTCCTCTTTTTCTTTTCCTCTTTTTCACTCCTCTCTCATTCCTTTGACATTG TTTACATCTGCAGTCCCGCCACCCCGGCATCTTCCTTCCTCTCGCTCCTCTTGCATCCCTCTCCCTCGCA TCAACATTGAACAAACAGTTCTCTGTTCTGTTGATTGTTTACATACCCCGCCGTCCTTCCTGAGTCGCAC AGTGTCAACAATCTTTTCGAACAAACTTGATCCTAAGACGACAAACAATGGCGCTCAAAGCAGTCTACGA GCGATTCCTCTCCTCCCCAACCTCTGGCGCCCTGAGCGCCGATGTCTCAGTCAACTACATCACGACGGCA ACCTCAATCACGGGCAGCGAACAGGTCCTCAAGCATCTGAGTAACCAGGAGAAGGCAGTCAAGAAAAAGG GAGACAAGATCCTCGGCATCGTCGAGACTCCGGATACTATTGTCCTGGACGTGGAGACTACACTGGAGTT TGTGACTGGGGGTGGGGCTTATCTGCCCTCGCTGGACGATAATTTCCTAGCCGACCGCGTAGTGACTTTT CCCACGGTACGTTTTGTGTTTTCTCTATCGTCGTATGTAATAGGCCCCTTTGCTGATTGATGACTGCAGA TACACATTGTTCACTTCAATTCGCAATCCCAGATCCAACAGGTCAGAGTCTATTGGGACCAGGGTTCGCT CTTGAAACAAGTCGACGTCATCGGTTCTCGTGGAAAGAATTGGCCCATTCGCGATGCTAAGGATCAGTCG 
AAGTTGATTGCCAATGTAGCAACTGCACAGGGTATCACCACACCCCCGCCCCAAAAACAGGTCCCCGTCG CAGTTGGAAAAGCGCCTGGTTCGTCTAGACCTCCTTCGCGCTCGCCCAGCGAGATTTTTGGCGGCAGTGA TGACGTTGAGAACGAGCCCACTCCTCGCGCTCGCGATAGCATCATTTCGCCTCGAGCTGGCGCTGCGAAG CACTTCCGACCTGTCCGTGTTTTTGGTGAAGACGATGAAGATGTCACTTCAAGCCCCATCAAGCCCAAGA TTGGTTCGAACAAGGGATTCCAACCCGTGCGCGTATTCGACGTAGAGCAACATGAAGCCGAAGTCGCTCA GCAGCCACGAGCCGGAGCGACTAAGAACTTCCAACCCATCCGTGTCTTCGA

42_002196668.1[250000..270000].fa
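
Before running the predictions, it can be worth confirming that the allocated sequence was read correctly. The hypothetical helper below parses FASTA text and reports sequence length and GC content. It assumes each header line starting with '>' sits on its own line, and it ignores spaces and line breaks inside the sequence. If the coordinates in the file name are 1-based and inclusive (an assumption), the region 250000..270000 should contain 270000 - 250000 + 1 = 20,001 bp.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; whitespace inside sequence lines is dropped."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif line and header is not None:
            chunks.append("".join(line.split()).upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T)."""
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


# Assumes 1-based inclusive coordinates for the region named in the file
EXPECTED_LENGTH = 270000 - 250000 + 1  # 20001

records = read_fasta(">42\nACGT GGCC\nttaa\n")
for name, seq in records.items():
    print(name, len(seq), round(gc_content(seq), 3))  # 42 12 0.5
```

For the real data, pass the contents of the .fa file to `read_fasta` and compare each sequence length with `EXPECTED_LENGTH`.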