·
Engenharia Florestal ·
Genética Molecular
· 2023/2
Send your question to AI and receive an answer instantly
Recommended for you
59
Slide - Técnicas Inovadoras de Melhoramento de Precisão - 2023-2
Genética Molecular
USP
77
Slide - Biologia Sintética - 2023-2
Genética Molecular
USP
55
Slide - Métodos de Transformação de Plantas - 2023-2
Genética Molecular
USP
57
Slide - Marcadores Moleculares - Uso no Melhoramento - 2023-2
Genética Molecular
USP
65
Slide - Estudos das Ômicas - 2023-2
Genética Molecular
USP
7
P1 - Genética Molecular 2022 2
Genética Molecular
USP
58
Aula 7 - Marcadores Moleculares 2022-2
Genética Molecular
USP
85
Aula - Estrutura e Expressão de Genes
Genética Molecular
USP
77
Aula 4 - Tecnologia do Dna Recombinante
Genética Molecular
USP
52
Aula 6 - Métodos de Transformação de Plantas
Genética Molecular
USP
Preview text
LGN0232 - Genética Molecular Bancos de Dados Biológicos Antonio Figueira CENA figueira@cena.usp.br Roteiro da Aula 1. Definição de Banco de Dados Biológicos 2. Bioinformática 3. Recursos oferecidos pelo NCBI Forma de busca de informações: palavra-chave, sequências de nucleotídeos ou amino ácidos, espécies, artigos, autores,... 4. Utilização da plataforma BLAST Dogma Central da Biologia Molecular Genoma Transcritoma Proteoma Avanços tecnológicos recentes permitiram o surgimento da Era das Ômicas FENÓTIPO Ambiente Projetos de Sequenciamento Aumento do Número de Projetos de Sequenciamento Novas tecnologias e redução de custos Compartilhamento das Informações Obtidas + Banco de Dados Biológicos Banco de Dados Biológicos (BDB) O que são? São repositórios online que centralizam as informações genéticas (sequências) de DNA, RNA ou proteína, dentre outros Centralizar os dados, torná-los públicos e permitir o acesso a informações geradas Objetivos do BDS: Permite, por exemplo, comparar genes/genomas de espécies distintas. Homologia • Homologia: dividem a mesma ancestralidade com significado evolutivo • Permite inferências sobre a funcionalidade das sequencias identificadas Homologia – conceito fundamental na biologia Nature is prodigal in variety, but niggard in inovation - Charles Darwin A análise de sequências objetiva encontrar similaridades importantes que permitam inferir sobre homologia Exemplos: Órgãos homólogos – asas de morcego e mãos de humanos (mesma origem) Órgãos similares – asas de morcego e asas de borboleta (mesma função) Bioinformática Produção massiva de sequências de DNA, mRNA, proteínas • A bioinformática consiste no desenvolvimento de métodos computacionais, matemáticos e estatísticos para organizar e analisar informações biológicas em grande escala e de maneira integrada - Bancos de Dados Biológicos Organização e Armazenamento Visualização e Análise -- Ferramentas computacionais Compreensão do significado biológico >LT594788.1 Theobroma cacao genome assembly, chromosome: I ATCGGCAGTGACGTTTTATGATGATGAGATCATTGCTCTTGCACAGCCATTTAAACATTCCATGGTAGGA AAGTTTTCACGTATGCCCCGGTTGAATGACATTAGGGTTGCTTTCAAAGGAATCGGGCTAGTGGGTGCAT ATGAAATTCGTTGGTTGGATTATAAGCACATCCTGATTCATTTATCTAATGAGCAAGATCTGAATCATTT ATGGATGCGTCAAGCATGGTTCATTGCAAACCAGAAGATGAGAGTCTTTAAGTGGACTCCGGATTTCCAA TCGAAAAGGGAATCCTTCTTGGTTCCCGTTTGGGTCTCATTTTCGAACCTGCGGGCTCATCTATATGAAA AATCGACACTTCCGATGATTGCTAAGTCGGTGGGGAGACCACTTTTTATTGATGAAGCTACGGCAAATGG CACACGACCAAGTGTGGCCCGAGTGTGTGTTGAGTACGACTGCCAGCAGCCCCCTCTTGAACAGATCTGG ATCGTGACTAGGGATAGAAGCACAGGAAACATCACTGGAGGATTTCAACAGAAAGTAGAGTTTGCCAGGC TTCCTGACTATTGCAATCACTGTTGCCATGTGGGACATAGTATTGCAACATGTCTGGTGATGGGTCACAG TAAGGACAAGCCAAGAAAGGCACGGCCTAAGCCCCTTGTGGATAAAAAGCAGGAAGATGATGATTGGAAA AGAGAGAAAAGTAAGGAAACAGGTGATCTAATGGTTAATGGCGATAAAAGGAAAAATTCGATCCAAACAG AATCGAAAAAGCAGAGCGTGAAATGGGTGAAGGTTGAAAAGGGTGGCACAAGCGGGTTCAAGGATGCCCA CGGCGTAGAAGTCAATCTGGAGAGTAGTGGAGCAGATCCCGTGCAGATCTCGAATGGTTTTAGGGTGCTA GAAGCAATGGAGGATGGCGGGGATGTTAGATCCGCAAAACAGGGGAGAACAGAGAAGGTGAACAGTACCA TGCAATTTTTAAAAAATATTTTTAGGGAGAAAGAAAGGCAGTCGACGGAGATGGAAAGATGCTCGGGAAA GATAAATGGCGACGAAACGACATTAGAAGCTCTACCGATAAAACGGACTGCAGATGGAGTGAATCGGGAC AAGCTAAAATCTTCTACAGTGGGTGTGATCGAGGGTCCAAAGCAGAAGGAGAGTGAGGTTAAGCAAAGTT CTGTGCAGACGTTGATGGCTGAAATTTGGCGGACAGGAGCAGATACTCACGAGAGTGTAGAAAATATTGC AGACTTTGATCGAGTTCAATGGGCGATGGATGCAGGTCGTGTGACGTCCTGGAAGGCAAAAAAAAAGAGC AACAGAAAACTTGAGGACCGACTGTCGGGGACGGCCGTGCAAGGTGATGGTCAGACAGTACCGGAGGTCG AACAATGCTTGGGGAGTCCAAAACAGTGGGTGTACCGTCTAAACGTGGACGGTGAAAAGGTGCTGAAGGG TGGTGAAAATGTGCAGTTGAGTCAACTCGACAGTAATAGTGTAGTGAGTTCTCGTGGCTGTCTTAAACTC GGTACTGTTCACTCTCATGTAGCCAACTCCCGTGCGGTACATGCAGTGAAAGGAAGTATACACCGGTTGG AAGAAAATGCTTTACTAGGGGAACCAGCAGCTAGTTCACGTGAAGTGATGGAAGAAAATGCAGAACACGA TCCAAACTTGGGATCCAACCTGGGTATATGTGGTTACAATAAAGAAATAAGTTCGGTTCCTTCATGTGCA GGAACTAATTCTGCTGACTTTCACGCACATTTGGAAGCAAACAAACAACAGGAGAACAACAATCGAGGGC AAGTAAATCAAATCGAAACTGATGATAGCAGTAGATCAGTGCTCCATGTGGACTCGGGAGAGATTTTGGA CAGCCAGCATATTAAATACCACCCCATGGTTTCCAGGAGAAGAAAATCCGATAGTGAAGTTATATATATC CCTTCAGAGGATATTCTTTCAGAGAATGATGCTCATATGTTGATGGATGGGTCTGATGAAGAATCCATCT CCAAGCAATTTACCACTAGAACTTACCCATGATCAGTGCCCTGCTTTGGAATGTAAGGGGAGTGACTGGA AAAGCAATCCAAAGGAGAATTAAAAAACTGCAGATGATGCACCAAATAAAGATATTGGTTATCCTGGAAC CAATGGTAACTGTTGATCGAATTGAATTTTTTAGGAGAAAATTAGGCTTTGAGGGGGCGGCCTTTAATTG TTCTCAAAAAATTTGGATTTTTTGGATGCACGGCATCACTTGCACAACCAGGTTTGATCATCCCCAATGC TTGCATGTTCAATTATGTTTCCCGTAGCTTCCTGTCCCTATTGAAGCTTCATTTGTTTATGCTAAATGTA CTAGAATGGAACGACTTGCTTTATGGGATTTTATGAGACGTATTGCAGAGGATGTACAGGGTCCTTGGCT GGCTGGAGGCGACTTTAATGTTATTTTAAGGTGAGAAGAGAGATTTTTGGGTGCAGACCCACATACTGGA GCCATGGAAGATTTTGCAAATGCCTTACTTGATTGTGGGTTAGTAGATGCAGGGTTTGAAGGCAACAATT TTACGTGGACTAACTCCCGGATGTTCCAAAGATTAGATCGGATTCTCTATAACCCACAGTGGGTAGCTCA Genoma do cacaueiro Banco de Dados NCBI –National Center for Biotechnology Information • https://www.ncbi.nlm.nih.gov/ • Iniciado em 1988 – ligado a biblioteca de medicina • Missão: melhor entendimento dos processos moleculares que afetam a saúde humana • Understanding nature's mute but elegant language of living cells is the quest of modern molecular biology. • NCBI cria banco de dados públicos e recursos de biologia computacional e disseminação de informações NCBI - PubMed https://pubmed.ncbi.nlm.nih.gov/ Origem do NCBI -> National Library of Medicine Histórico de Sequências no NCBI https://www.ncbi.nlm.nih.gov/genbank/statistics/ International Nucleotide Sequence Database Collaboration insdc.org/ International Nucleotide Sequence Database Collaboration https://www.ddbj.nig.ac.jp/statistics/index-e.html NCBI NCBI Home Resource List (A-Z) All Resources Chemicals & Bioassays Data & Software DNA & RNA Domains & Structures Genes & Expression Genetics & Medicine Genomes & Maps Homology Literature Proteins Sequence Analysis Taxonomy Training & Tutorials Variation Welcome to NCBI The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information. About the NCBI | Mission | Organization | NCBI News & Blog Submit Deposit data or manuscripts into NCBI databases Download Transfer NCBI data to your computer Learn Find help documents, attend a class or watch a tutorial Develop Use NCBI APIs and code libraries to build applications Analyze Identify an NCBI tool for your data analysis task Research Explore NCBI research and collaborative projects Popular Resources PubMed Bookshelf PubMed Central BLAST Nucleotide Genome SNP Gene Protein PubChem NCBI News & Blog Announcing GenBank release 252.0 19 Oct 2022 Now over 3 billion records! GenBank release 252.0 (10/17/2022) is now available on the NCBI FTP site. This release has 20.35 trillion bases and 3.10 billion records. The current release has 240,539,282 traditional records containing 1,562,963,366,851 base pairs of sequence data. There are also 2,167,900,306 WGS records containing 18,231,960,808,828 base pairs of sequence data, 574,020,080 .... Continue https://www.ncbi.nlm.nih.gov/ NCBI https://www.ncbi.nlm.nih.gov/home/analyze/ NCBI - TaxBrowser https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Root NCBI - Genomas https://www.ncbi.nlm.nih.gov/Traces/wgs/ Tipos de Bancos de Dados Biológicos GenBank Protein Data Bank (PDB) Swiss Prot Protein Information Resources (PIR) Ribosomal Database Project https://medium.com/omixdata/bancos-de-dados-biol%C3%B3gicos-parte-i-o-ncbi-c16dfc1b0a84 Banco de Dados Biológicos (BDB) Tipos de Bancos de Dados Nível de Curadoria: -Preliminar – sequências não terminadas - localizadas nos centros de sequenciamento -Arquivo – repositório da informação - redundante (várias sequências do mesmo gene) - submissor mantém o controle editorial sobre registros -Com Curadoria – não redundante – ex. RefSeq NCBI https://www.ncbi.nlm.nih.gov/refseq/ - cada registro pretende conter conhecimento sobre a sequencia -Revisado - Kyoto Encyclopedia of Genes and Genomes - KEGG Genes, genomas, enzimas, rotas metabólicas Outros bancos específicos.. Proteínas https://www.expasy.org/ - Expasy – Instituto Suíço de Bioinformática https://proteininformationresource.org/ https://www.uniprot.org/ Estrutura de Proteínas Protein Data Bank - https://www.rcsb.org/ Structure (NCBI) - https://www.ncbi.nlm.nih.gov/structure Outros bancos específicos.. Genomas espécie-específicos • http://www.yeastgenome.org/ • http://flybase.org/ • http://www.maizegdb.org/ • http://rice.plantbiology.msu.edu/ • https://solgenomics.net/ • https://cocoa-genome-hub.southgreen.fr/ NCBI (National Center for Biotechnology Information) – fundado em 1988 O website https://www.ncbi.nlm.nih.gov/ foi criado em 1994 Literature — Repositório de artigos científicos, livros, entre outros. Um dos bancos mais utilizados dessa categoria é o PubMed Central É de uso gratuito e acolhe diversos BDBs, separados por categorias: Genes — São encontradas sequências gênicas e anotações para estudo de estrutura de ortólogos, expressão e evolução Proteins — Apresenta dados como sequências proteicas, estruturas tridimensionais (3D) e domínios proteicos Genomes — Possui bancos de sequências genômicas, dados de genômica funcional e origem de amostras biológicas. Um dos principais bancos dessa categoria é o Nucleotide, que tem o GenBank como um dos seus principais componentes BLAST — É uma ferramenta que realiza consultas em diferentes bancos de dados, como Nucleotide e Protein PubChem — Repositório de informações químicas, rotas metabólicas e ferramentas para screening de atividade biológica https://medium.com/omixdata/bancos-de-dados-biol Banco de Dados Biológicos (BDB) Busca nos bancos de dados • Por texto – palavra chave, número das sequências, espécie, gênero,... • Por sequência de nucleotídeos ou amino ácidos • Uso de programa específico - BLAST • Basic Local Alignment Search Tool • BLAST para nucleotídeos e amino ácidos U.S. National Library of Medicine NCBI National Center for Biotechnology Information Sign in to NCBI NCBI HOME LITERATURE HEALTH GENOMES GENES PROTEINS CHEMICALS POPULAR RESOURCES ▼ All Databases Search NCBI Analyze NCBI provides a wide variety of data analysis tools that allow users to manipulate, align, visualize and evaluate biological data. Selected Analysis Tools All Tools Literature Health Genomes Genes Proteins Chemicals Filter this table Tools Description Amino Acid Explorer Explores amino acid properties, substitutions and functions Assembly Archive Links the raw sequence information found in the Trace Archive with assembly information found in GenBank/EMBL/DDBJ Basic Local Alignment Search Tool (BLAST) Finds regions of local similarity between biological sequences Batch Entrez Retrieves records specified in an uploaded file of identifiers BioAssay Services Tools that summarize the biological test results in the PubChem database BLAST Link (BLink) Displays the results of a pre-computed BLAST search of a protein against all other protein sequences at NCBI BLAST Microbial Genomes Finds regions of local similarity between query sequences and sequences from complete microbial genomes found in GenBank U.S. National Library of Medicine NCBI National Center for Biotechnology Information Sign in to NCBI NCBI HOME LITERATURE HEALTH GENOMES GENES PROTEINS CHEMICALS POPULAR RESOURCES ▼ All Databases Search NCBI cp4 epsps agrobacterium Analyze NCBI provides a wide variety of data analysis tools that allow users to manipulate, align, visualize and evaluate biological data. Selected Analysis Tools All Tools Literature Health Genomes Genes Proteins Chemicals Filter this table Tools Description Amino Acid Explorer Explores amino acid properties, substitutions and functions Assembly Archive Links the raw sequence information found in the Trace Archive with assembly information found in GenBank/EMBL/DDBJ Basic Local Alignment Search Tool (BLAST) Finds regions of local similarity between biological sequences Batch Entrez Retrieves records specified in an uploaded file of identifiers BioAssay Services Tools that summarize the biological test results in the PubChem database BLAST Link (BLink) Displays the results of a pre-computed BLAST search of a protein against all other protein sequences at NCBI BLAST Microbial Genomes Finds regions of local similarity between query sequences and sequences from complete microbial genomes found in GenBank National Library of Medicine National Center for Biotechnology Information Log in Search NCBI cp4 epsps agrobacterium Search Results found in 6 databases Literature Bookshelf 0 MeSH 0 NLM Catalog 0 PubMed 44 PubMed Central 212 Genomes Assembly 0 BioCollections 1 BioProject 0 BioSample 0 Genome 0 Nucleotide 22 SRA 0 Taxonomy 0 Genes Gene 1 GEO DataSets 0 GEO Profiles 0 HomoloGene 0 PopSet 0 Clinical ClinicalTrials.gov 0 ClinVar 0 dbGaP 0 dbSNP 0 dbVar 0 GTR 0 MedGen 0 OMIM 0 Proteins Conserved Domains 0 Identical Protein Groups 0 Protein 12 Protein Family Models 0 Structure 9 PubChem BioAssays 0 Compounds 0 Pathways 0 Substances 0 FOLLOW NCBI https://www.ncbi.nlm.nih.gov/nuccore/AB209952.1 https://www.ncbi.nlm.nih.gov/nuccore/AB209952.1 FEATURES Location/Qualifiers source 1..2457 /organism="Glycine max" /mol_type="genomic DNA" /cultivar="Roundup Ready 30-4-2" /db_xref="taxon:3847" /clone="pC4a-TOPO" /transgenic /country="Japan" source 1..265 /organism="Cauliflower mosaic virus" /mol_type="genomic DNA" /db_xref="taxon:10641" source 298..510 /organism="Petunia x hybrida" /mol_type="genomic DNA" /db_xref="taxon:4102" /note="synonym: Petunia hybrida" source 511..2457 /organism="Agrobacterium sp. CP4" /mol_type="genomic DNA" /strain="CP4" /db_xref="taxon:268951" gene 1..265 /gene="CaMV35S" regulatory 1..265 /regulatory_class="promoter" /gene="CaMV35S" misc_feature 266..297 /note="Cauliflower mosaic virus 35S promoter" /note="nontranslation region" gene 298..1881 /gene="cp4epsps" CDS 298..1881 /gene="cp4epsps" /note="5-enol-pyruvylshikimate-3-phospate synthase (EPSPS) class 2 precursor" /codon_start=1 /product="5-enol-pyruvylshikimate-3-phospate synthase class 2 precursor" /protein_id="BAA98423.1" /translation="MAQQNRNAQIGQTTLNPIMSHFKPQVKSSFSLYFGSKGLNSKAN SMLVLKKDSIFQFKQCFSRFISASAVTACHLMAGGSSRATARKSSGLVFSGYGTVDGGKSI SHRSFFGLGALGSTEGIIVEDGWTDGFDVSQQDIGAQIAMGRAGRIGTIVGDCIGVNLGL APEAPLDGFNVGAGRIITELEGDVDIYKIGKGGAMQIARGMRGMVEDIDIVGGVNGKLVI EDGQDRPLYVKPGTPIPTPYTIGPYPTGVQPSLASQVSVLSLGLSNLGAIPTTLVEGFA EDGDRLPIVPTKRPTGYPVYAVTGRILEPSGSGAALVVALLGNRPIIVIRGTTHENGVPM KMLQFGGANVIETFGLGITDGIPVTVETGAMGRIITVVESKLGRGALFSYTPVIATIMTDKF VITIILVLMNPTRITGLAIGEGGMARMYVERITVDIAGGEDVAIAGRTDGSYGVGVDEG APPPMDIEDPIPVAAIFAQLFQAAYGGRGTVGRVVFIEGGEDGVDAIVMGGETFSITDL LVRARGDPGKGIALAASGAAVATHIHDAIRMAIFSLWLFGVPDMAATIAIFSPVTESTDF MDLNGLAGAKIELSDKFA" REPEAT REGION 2186..2439 /note="repeated fragment of cp4epsps" /truncate cp4epsps" ORIGIN 1 tgaaaaaggg aggtgcctcc tacaacagtc atctattgca taaaggagaa gccaacgttg 61 aaatgcctcc tccgcaaggt ggtccaagga tgctaacggg gatctgagga cacgtgcacc 121 aaaaaggagg agttccaacc aaggtcttag gcttgtggaa tgtactgata attcaattct 181 atgtaagaga tgaagcacta tccaaccttt caaacattaa actcgtgctg ctatctaatg 241 gtttcctact agaaaccgaa cgagataaaa tccagtgttc tgctcagagc gagaagacgc 301 ncaacaaata aacaatnpc xaaatgcttc caatttgcat 361 aaccacccag ttccaggttt ccengatt nttttttttg gatcaaaaca atattatttattt aacgaatga 421 tcgcaatttc atgatattgt tttgggaaaa tagagaaaat ccggcccgtt aacaatagga 481 tttgaagatt cttgttggaa atcgccgaaa cctgaccggt gtgcgaggtt gcttggtgaa 541 gggaaatacc atgtgtgatt cgatggtccg 601 attacctgca aagatgttct atgcgtgacc tcaacaacgc ctaagtcgga atccgggccg 661 ctgctcatcg tgcgacgagc taagaccgcc taacatgtac gacacaaact ttaccaaacg 721 atacggaggg tggaaaatag tgatcatcga cgcccgggga gagctatcgccggcagccgtg 781 cccagcctga gatgccaaag gaaaagatgt ggcgaagtgg ttcatatttt cttagacgac 841 agggagtcta cgtcgacgct gacgaccgga ttgctccgag ccagaacatg gctgaattgg 901 atggcggcgt tggatctgag gaataaaacg ttagcggtgg atcacacaac gaggtcatgg 961 tcgaaccttt tgctggtgcc agtttttcgt tctctggggt ggctgccgpg cgggtcggtt 1021 gaaataggca agcgctagct tggattggtc gaagcgtgtt accgacctgt accgttcttc 1081 agatctacta actgtctcig tcatgagcgg gattgtggtg gacggtggtg ccggcttcga 1141 gtctagacat tgcagcctca gggcaccacc actaaaaggg gtaaatggat tcgtgatgtc 1201 cttcacctag gtggagacat gactagatgg ttgcacatgg gcgacgggct gcgagaccgg 1261 taccatcgga aaacatcaat ggaactcatc taaccagact gacggagatg ccgtttcca 1321 gagcaggaga gcgaaggcgt tgaagcgggc agcggccagg gcagccagag cagataaacc 1381 agtttaatga gtcggtaaaa ttgaaggacg acggttccac ccatcaccat cgatgatcgt 1441 aagtccagga aaagagtaat ccctctaccc gtgcaaatgc tgatactagt cctcagacta 1501 gctggtgtgg ttggcgagga gtatattggg cttgcgcatg cccagcggaa tactgggacg 1561 tggaatggct ggcggtcgtc cccatgggaa ctgtgtggat gcgaggggag cccgtgtcac 1621 accgccagta atagcggctt gggctcttcc aagtaaaaat gcttcaggtt tatagacggc 1681 tgattcttca atgattgagt ctgtgcccat cccggtgaag actattccaa tagctggcag 1741 gatggcaaat gccgtaggac tgaacgctgg agggctaccc ggtgagacca tcccaactag 1801 gcaaagagct agaggaagaa atcttatctt gcctccgcac gagagccggc accggcacgg 1861 tagctagaat agcaggttcc tgtcagtagc gtgctgtacg ttcgccgctg aaggtgacca 1921 htattgttaa aaaaataaag ttagtgacca gttcatacca aatatgaaac aatgccagtg 1981 gctacctaaa tggcagccat gaatatgatt ctaaacaagc agaggagtgt aacgaggagt 2041 tagattttat ttattctaag ttttgggaaa gaggagatga tcctcctgtt 2081 atctagactc 2101 ttaatagcct cgctggccgg cgtctgagct tctttggcgc cgccgctcag cccgtcttgg 2161 tgctggttgg AB209952: 1 segment 1 of 1 Glycine max transgenic cp4epsps gene for 5-enol-pyruvylshikimate-3-phosphate synthase class 2 precursor, complete cds GenBank: AB209952.1 GenBank Graphics >AB209952.1:298-1881 Glycine max transgenic cp4epsps gene for 5-enol-pyruvylshikimate-3-phosphate synthase class 2 precursor, complete cds ATGGCAAATTAACAACGTGCAACAGGAACAACAACCCATTCCTACGTAACACCC AAGTTCATGAAGAGGAGACGATGTTGTTTTCCTTTCGTTTTTTTTTCTTTGTATATTCTATT GGTTTTGAACAAAAGATTCAATTTAATTAAGGTTTTTATGTCAGTAGATGGTGATAACA GCCTGCAGCTGTTCAGCAAAGACGCGCGGCCAGCCCCGTCGGTGGTGCCAGC TCCGAGTTCCAGTTCACCGGAACAGGACGACCAGTTCCTGTTGAGTAGAGATAAC GGCATCGAGTCCCCTTGTGGACAGCGGGACGGAAGGACTTGTGTGGGTGCGGGGGG CGGCAACGACCCTGAGGGAACACCAAGCGCGAAGGCTGTGCGTGAGTGCTGCGGGC GCGCGGAATGGCTGGATCCAAACAACAAGGAGGCTTCATCAAGTGGACCTTCGGCTTG CGCGCTGGATTTCGATTGCGACCGAGCCGCGAGGCCGCGGCGTCTACTGGTTTC GAATAGTGGGATTGCGAAGTGGAACGGAAGTAAAGTTCCATGATCGGAATCGA CGAGGCCGATACGTCAAGACCCTGCCGCCGTACACGACTTTGCCGTACGAAGATTTC CAACACGCCGCGCAAAAGATTGAAGGCTGGAGACCGGTGCGACGGTAGAGTGGC GGTGTCGGGCAAATCGATTGAGATGGGGGACCTGAGCCTGCGAGATTTCGATGC GGCTTTGATCCGGAAGAAGTACGAGGATAGCATTTGCCGGGGCGGAGATCGG CGGCAAGTATCTGCGGGGCAGAAGCTGCGGGAGCGAGGAATCGGCTGCGTGGT AACCCTGGTATTGCGCTAGGCTATCATCCGCACGGCACCGAACCATCGTGACGA AACTGCCCCATAAGGAGAAACGGAAGGCCCCGTCTTTGCCCTCAAGAGTGGG CGATGAGATCCATTGAGGCCGCGGCAAGAGCTGAGAGGCGTGGCTTCCCGA CGGAAGACTTGGCCGAGACCGAACACAACCCTGGCCGACTGCGTCAAGGCC GGACTGCCGCGGCACCTCAGCTCAGCAGTACGTCATCGCAGGACCATTGACCTG AATCCCCTGAAGCGTGAGTGACCGAGGAATTGAGGTCGAAGACCACGAGACA CGTCCGACACCTCCGCTCGGTCGATACGTCGACGACGTGCCAGTCGAGGAAGCC GGATGCGAAGATCGGTTCCGCCAGGCCGTCGCACGCGAGCCATCGTGAGCACC AACCTTCGACCAGCAAAGGAAAGGGCAAAAGTACTGACGCGTTACCAGCAAGA GCTGGTGTCGCGCGAGA Glycine max transgenic cp4epsps gene for 5-enol-pyruvylshikimate-3-phosphate synthase class 2 precursor, complete cds GenBank: AB209952.1 GenBank FASTA Graphics Link To This View Feedback Gene cp4epsps BRD948231 cp4epsps Repeat region Features nontranslation region regulatory Features biosrc Features Agrobacterium sp. C /othersynonym: Pet... AB209952.1: 1..2.5K (2,457 nt) Tracks shown: 6/7 Busca por BLAST •Busca por nucleotídeos ou amino ácidos (proteínas) •Comparação de sequencias para identificar similaridade significativa de DNA ou PTN para inferir função, origem, filogenia • Realiza comparações entre pares de sequencias, buscando regiões com similaridade local • Alinhamento local (segmentos) é a base da busca por BLAST • Usa algoritmos para gerar alinhamento de sequências Basic Local Alignment Search Tool Alinhamento Global: é feito quando comparamos uma sequência de aminoácidos ou nucleotídeos com outra ao longo de toda sua extensão Alinhamento Local: a comparação entre duas sequências não é feita ao longo de toda sua extensão, mas sim através de pequenas regiões destas. O BLAST é o principal programa para realizar o alinhamento local Comparando sequências - Alinhamento Comparando sequências - Alinhamento O alinhamento de sequencias consiste em comparar duas sequencias (de nucleotídeos ou aminoácidos) de forma a identificarmos o grau de identidade/similaridade entre elas Identidade Número de posições invariáveis em duas sequências (nucleotídeos ou aminoácidos) alinhadas. BLAST Similaridade Grau de semelhança entre duas sequencias de proteínas expressa em percentual de amino ácidos com característica similar alinhados Busca por BLAST • Identidade: ocorrência do exato mesmo nucleotídeo ou amino ácido na mesma posição em sequencias alinhadas • Similaridade: ocorrência de amino ácidos equivalentes (quimicamente) na mesma posição • Homologia: dividem a mesma ancestralidade com significado evolutivo Homologia – conceito fundamental na biologia Algoritmos em Blast: • Não avaliam homologia • Medem similaridade e identidade entre sequências A análise de sequências objetiva encontrar similaridades importantes que permitam inferir sobre homologia Exemplos: Órgãos homólogos – asas de morcego e mãos de humanos (mesma origem) Órgãos similares – asas de morcego e asas de borboleta (mesma função) Alinhamento Global Alinhamento Local BLAST Comparando sequências - Alinhamento XM_010026669.3 PREDICTED: Syzygium oleosum rubisco accumulation factor 1.1, chloroplastic (LOC115690120), mRNA Sequence ID: XM_030616373.2 Length: 1930 Number of Matches: 1 Range 1: 56 to 1928 GenBank Graphics Score 2082 bits(1127) Expect 0.0 Identities 1643/1886(87%) Gaps 60/1886(3%) Strand Plus/Plus Query 24 aataaAaagcTC-aaagccatcagtactgaacttcaagactagggaacc 82 | || ||||| ||||| |||||||||||||||||||| |||| 115 Sbjct 56 AAGAAAATCTAAAGGGCATCAGTACTGAACTTCAAGACCATGGAGATC 115 Query 83 acagtgatcaccctccgccggaggaccggccggcgcc--t 139 ||| | | |||||||||||||||||||||||||||| 174 Sbjct 116 ACAGTGACCACCCTCCGCCGGAGGACCGGCCGGCGCCTCACACC 174 Query 140 ccg---ccaccattgaggccctacatacaaacccacc-----c 193 | || ||||| ||| | ||||||||||||| | || 234 Sbjct 175 CCACCACCACCATTGAAAACCCTAAGTGAAAAGACCCACCCCCTAACC 235 Query 194 aaatttctaattccacctaccactacaactctctcccctcacaccac 253 | |||| | ||| ||||||| ||||||||| || |||| | 290 Sbjct 236 AAATTTCAAGTCCAACCTACCACCACAACCTCTCTACCCCTACCAC 290 Query 254 cttgcgccacgccaccgaccgctatggtccccagaatgctgccaggc 313 | | || || |||| ||||||||| || | || || Sbjct 291 C-TGCAATGCCGACACCCCTCGCGATGAAGCCATCTCCGCTACGCC 348 Query 314 acccggagggtgacctccacccttccctggttggtggggggcgggct 373 | || || |||| ||||||||||||| | | Sbjct 349 ACCCGGAGCGCTG -----------------ACCTCAGCCACTA 369 Query 374 tccggctccgccctcctccggt---ctccgtgctaggcctcaaaa 408 ||| |||||| || ||||||| | ||| | Sbjct 409 TCCGGCTCCGACCCACCCTCCGGTCTCCCCGTTC 440 NH National Library of Medicine National Center for Biotechnology Information BLAST Check out the ClusteredNR database on BLAST+ Learn more Give us feedback Home Recent Results Saved Strategies Help Log in Basic Local Alignment Search Tool BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance. Learn more Web BLAST Nucleotide BLAST nucleotide › nucleotide blastx translated nucleotide › protein tblastn protein › translated nucleotide Protein BLAST protein › protein NEWS BLAST Quick Start guides! Need some help getting started with BLAST? Thu, 22 Jun 2023 More BLAST news... Busca por BLAST Tipos de BLAST de acordo com o tipo de sequencia fornecida e qual o tipo buscado Artigo BLAST Busca por BLAST Nucleotide Sequence Translated Protein Sequence blastn Nucleotide DB tblastn Protein Sequence blastx blastp Protein DB tblastx Translated DB (contain amino acid sequences) Em 6 quadros Em 6 quadros >gi|226347322|gb|FJ830553.1| Anabaena planctonica CENA210 ribulose-1,5- bisphosphate carboxylase/oxygenase large subunit (rbcL) gene, partial cds CCGGCGAAATTAAAGGTCACTACCTCAACGTTACCGCTCCTACCTGCGAAGAAATGTTGAAACGGGCTGA GTACGCTAAAGAACTCAAAATGCCCATCATCATGCACGACTACCTAACCGCAGGTTTCACCGCTAACACC ACATTGGCTCGTTGGTGTCGTGATAACGGTATTTTATTGCACATTCACCGTGCTATGCACGCTGTAATTG ACCGTCAAAAAAATCACGGTATCCACTTCCGCGTATTAGCTAAAGCCCTCCGCTTGTCCGGTGGTGATCA CATCCACACTGGTACAGTTGTTGGTAAGTTAGAAGGTGAACGCGGTATTACCATGGGCTTCGTTGACTTA TTACGTGAAAACTACGTTGAGCAAGACAAGTCTCGCGGTATTTACTTTACCCAAGATTGGGCGTCTCTAC CTGGTGTAATGGCCGTTGCTTCTGGTGGTATCCACGTATGGCATATGCCCGCGTTGGTTGAGATCTTCGG TGATGACTCCGTATTACAATTCGGTGGTGGTACACTCGGACATCCTTGGGGTAACGCTCCTGGTGCTACA GCTAACCGCGTAGCTCTAAAAGCAGTTGTTCAAGCTCGTAACGAAGGCCGTAACTTAGCTCGTGAAGGTA ACGATATTATCCGCGAAGCTGCTAAGTGGTCTCCTGAGTTGGCTGTTGCTTGCGAACTG >gi|226347323|gb|ACO50079.1| ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit [Anabaena planctonica CENA210] GEIKGHYLNVTAPTCEEMLKRAEYAKELKMPIIMHDYLTAGFTANTTLARWCRDNGILLHIHRAMHAVID RQKNHGIHFRVLAKALRLSGGDHIHTGTVVGKLEGERGITMGFVDLLRENYVEQDKSRGIYFTQDWASLP GVMAVASGGIHVWHMPALVEIFGDDSVLQFGGGTLGHPWGNAPGATANRVALKAVVQARNEGRNLAREGN DIIREAAKWSPELAVACEL Formato FASTA: formato universalmente aceito para ser processado Identificador - linha do nome (máximo 80 caracteres por linha) • Nossa sequência –> query (consulta) • O resultado da busca em BLAST pode ser um ou mais hits em sequências-sujeito (subject), ou seja, sequências pertencentes ao banco • Os melhores resultados de escores são relatados, • usar valor E • valor E <0.01 Quanto menor o e-value, mais significativo o alinhamento!!! Busca por BLAST Busca por BLAST BLAST® » blastn suite Standard Nucleotide BLAST Enter accession number(s), gi(s), or FASTA sequence(s) Query subrange Clear From To Or, upload file Escolher arquivo Nenhum arquivo escolhido Job Title Enter a descriptive title for your BLAST search Align two or more sequences Choose Search Set Database Standard databases (nr etc.) rRNAsITS databases Genomic + transcript databases Belacoronavirus Organism Nucleotide collection (nr/nt) Enter organism name or id—completions will be suggested Add organism Exclude Common Models (XM/XP) Uncultured/environmental sample sequences Limit to Sequences from type material Enter Query Create custom database Enter an Entrez query to limit search Program Selection Optimize for Highly similar sequences (megablast) More dissimilar sequences (discontiguous megablast) Somewhat similar sequences (blastn) Choose a BLAST algorithm BLAST Search database Nucleotide collection (nr/nt) using Megablast (Optimize for highly similar sequences) Show results in a new window Algorithm parameters Busca por BLAST Glycine max transgenic cp4epsps gene for 5-enol- pyruvylshikimate-3-phospate synthase class 2 precursor, complete cds Busca por BLAST Glycine max transgenic cp4epsps gene for 5-enol- pyruvylshikimate-3-phospate synthase class 2 precursor, complete cds Busca por BLAST Glycine max transgenic cp4epsps gene for 5-enol- pyruvylshikimate-3-phospate synthase class 2 precursor, complete cds Busca por BLAST Glycine max transgenic cp4epsps gene for 5-enol- pyruvylshikimate-3-phospate synthase class 2 precursor, complete cds Busca por BLAST pesticidal protein (plasmid) [Bacillus thuringiensis] GenBank: AKJ62760.1 Bacillus thuringiensis strain T29 Cry1Ac gene, partial cds GenBank: MK882923.1 >gi|47933333|gb|AY262820.1| Pinus radiata cellulose synthase (CesA10) mRNA, complete cds Length=4482 Score = 7374 bits (3720), Expect = 0.0 Identities = 3741/3741 (100%), Gaps = 0/3741 (0%) Strand=Plus/Plus Query 1 GCACGAGGATTTATAATCCGGAATTACGTGATTATCATTGGTTTACACGTTAGCGTGGGAGCTGGTGAT 60 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 1 GCACGAGGATTTATAATCCGGAATTACGTGATTATCATTGGTTTACACGTTAGCGTGGGAGCTGGTGAT 60 Query 61 ATTTTAGTTTTTATCCGAAACTTTCGGGCGTGAGCAAGAAAGGGTGAAGAAAGGTTGGAACAGTGGTG 120 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 61 ATTTTAGTTTTTATCCGAAACTTTCGGGCGTGAGCAAGAAAGGGTGAAGAAAGGTTGGAACAGTGGTG 120 Query 121 GAATGGGTGTGAGAAGGTTGTAACTCCCAAG 180 ||||||||||||||||||||||||||| Sbjct 121 GAATGGGTGTGAGAAGGTTGTAACTCCCAAG 180 Query 181 TATTAGAGTGTTGCGAAGCGAAACGAATTTA 240 ||||||||||||||||||||||||||| Sbjct 181 TATTAGAGTGTTGCGAAGCGAAACGAATTTA 240 Query 241 TTCTGTGAAGCTTTTTAGTCTCTTGTTCATG 300 ||||||||||||||||||||||||||| Sbjct 241 TTCTGTGAAGCTTTTTAGTCTCTTGTTCATG 300 Barra = Identidade >gi|47933335|gb|AY262821.1| Pinus radiata cellulose synthase (CesA2) mRNA, partial cds Length=3603 Score = 866 bits (437), Expect = 0.0 Identities = 977/1157 (84%), Gaps = 0/1157 (0%) Strand=Plus/Plus Query 1450 GAAGACCTTCAAAATGAGTAGATGGAACGACCTCGAACCCCTCAAGAAGAGAGTGGTTCCTATTGCT 1509 |||| |||||||||||||| || | ||||||||||||||| || || | || | || || | || Sbjct 697 GAAGACCCTGGAAATGAGTACATGGAACGACCTCCAACCCCTCTCTCAAGATGCCAAAGGTTTCACT 756 Query 1510 CTCTCCAAGATCAATAGACGAGGCTTAACCGGCTCTACGCGTATCGGGCGCTTTC 1569 ||||| || | || | || || ||| ||| ||||||||| || ||| Sbjct 757 TCTTCCAAGAGTCAATAAACGTAGGCTAAACGCTCTACCGCATACGGATGCCTTT 816 Query 1570 TTCTTCCGCTACCGGGAATATTGCATAAGCGTATAAGCGATATTACGATGTGACTGTTTACTCTGTGAAGCT 1629 ||| || | || | || || | | ||| ||| |||| |||| | || | || | | || | Sbjct 817 TTCTTCCCGCTCCGGGAATGTTGCATTACGCTCTAAGCGTCATTACAATGTGACTGTTTAATCTGTAAGC 876 Query 1630 GTAATAGTGATGGAAGAGG 1647 | | || |||||| || Sbjct 877 GTAATAATGAGTGGAAGGCT 896 Query 1690 CCCACTAGATAGGGAAAGGA 1749 |||| |||| ||||| |||| Sbjct 936 CCCACTAGATAGGGAAAGGA 937 Busca por BLASTp BLAST® » blastp suite Standard Protein BLAST Enter Query Sequence Enter accession number(s), gi(s), or FASTA sequence(s) Clear Query subrange From To Or, upload file Escolher arquivo Nenhum ar...ivo escolhido Job Title Enter a descriptive title for your BLAST search more… □ Align two or more sequences Choose Search Set Standard Databases Organism Exclude Optional Optional □ Standard databases (nr etc.) New □ Exp... Select to compare standard and experimental database Add organism Program Selection Algorithm Quick BLASTP (Accelerated protein-protein BLAST) blastp (protein-protein BLAST) Non-redundant protein sequences (nr) RefSeq Select proteins... Job Title gi|47933334|gb|AAG63935.1| cellulose synthase… Enter a descriptive title for your BLAST search Choose Search Set Database Non-redundant protein sequences (nr) Organism Optional Enter organism common name or id-completions will be suggested Enter organism common name, binomial, or tax id. Only 20 top taxa will be shown. Entrez Query Optional Enter an Entrez query to limit search Program Selection Algorithm blastp (protein-protein BLAST) PSI-BLAST (Position-Specific Iterated BLAST) PHI-BLAST (Pattern Hit Initiated BLAST) Choose a BLAST algorithm BLAST Search database nr using Blastp (protein-protein BLAST) Show results in a new window Algorithm parameters Human Mouse Rat Arabidopsis thaliana Oryza sativa Gallus gallus Bos taurus Pan troglodytes Danio rerio Microbes Drosophila melanogaster Apis mellifera Basic BLAST Choose a BLAST program to run. nucleotide blast Search a nucleotide database using a nucleotide query Algorithms: blastn, megablast, discontinuous megablast protein blast Search protein database using a protein query Algorithms: blastp, psi-blast, phi-blast blastx Search protein database using a translated nucleotide query tblastn Search translated nucleotide database using a protein query tblastx Search translated nucleotide database using a translated nucleotide query Specialized BLAST Choose a type of specialized search (or database name in parentheses.) Search trace archives Find conserved domains in your sequence (cds) Find sequences with similar conserved domain architecture (cdart) Tip of the Day How to Search Custom Databases in Web-Blast Using Entrez Queries A powerful feature of the BLAST Web interface is the ability to limit BLAST searches to a subset of any database using a standard Entrez query. Skillful use of Entrez queries allows the equivalent of on-the-fly construction of databases of exact composition More tips… Enter Query Sequence Enter accession number, gi, or FASTA sequence Clear >gi|47933333|gb|AY262820.1| Pinus radiata cellulose synthase (CesA10) mRNA, complete cds GCACCGAGTGGTGGTACCAGTCACGGTACTCTAACTTAACACGAACCAGCTCA... Or, upload file Genetic code Standard (1) [?] Job Title gi|47933333|gb|AY262820.1| Pinus radiata cellulose… Enter a descriptive title for your BLAST search Choose Search Set Database Non-redundant protein sequences (nr) Organism Optional Enter organism common name, binomial, or tax id. Only 20 top taxa will be shown. Entrez Query Optional Enter an Entrez query to limit search BLAST Search database nr using Blastx (search protein databases using a translated nucleotide query) Show results in a new window Algorithm parameters >gi|47933334|gb|AAQ63935.1| cellulose synthase [Pinus radiata] Length=1096 Score = 2221 bits (5754), Expect = 0.0 Identities = 1096/1096 (100%), Positives = 1096/1096 (100%), Gaps = 0/1096 (0%) Frame = +1 Query 649 MEARTNTAA…CNE 828 MEARTNTAA…CNE Sbjct 1 MEARTNTAA…CNE 60 Query 829 CAFPVCRPC…TQGNR 1008 CAFPVCRPC…TQGNR Sbjct 61 CAFPVCRPC…TQGNR 120 Query 1009 NEKQQIAEAM…SEYR 1188 NEKQQIAEAM…SEYR Sbjct 121 NEKQQIAEAM…SEYR 180 Query 1189 IAAPPTGGGS…KQDK 1368 IAAPPTGGGS…KQDK Sbjct 181 IAAPPTGGGS…KQDK 240 Query 1369 NTLQVTSDTY…IVVL 1548 NTLQVTSDTY…IVVL Sbjct 241 NTLQVTSDTY…IVVL 300 >gi|47933336|gb|AAQ63936.1| cellulose synthase [Pinus radiata] Length=1066 Score = 1813 bits (4695), Expect = 0.0 Identities = 890/1066 (83%), Positives = 972/1066 (91%), Gaps = 9/1066 (0%) Frame = +1 Query 760 ICQICGEDVL…SPQV 939 +CQIC+DVGL…+CSP+V Sbjct 3 VCQICGDDVG…KHGSPRV 62 Query 940 DGDKEDEDAD…PELPQL 1116 +GD+ ADD++++++ Sbjct 63 EDGEDGADDY…ARSESES 122 Query 1117 QVPLTINGQA…PADK 1296 Q+P +TINGQH…P+AKD Sbjct 123 QIPRLTINGS…DHSRD 181 Query 1297 FNSYGFGNVA…DEA 1476 FNSYGFGNW…+DEA Sbjct 182 FNSYGFGNVA…MDEA 241 Query 1477 RQPLSKRVPI…FAI 1656 RQPLRKRVPI…FAI Sbjct 242 RQPLRKRVPI…FAI 300 ESTUDO DIRIGIDO 1. Bancos de dados públicos e internacionais: NCBI, EMBL, DDBJ; 2. Definição de Bioinformática; 3. Análise da sequência no NCBI; 4. Busca de sequências por similaridade; 5. BLAST e Banco de dados de sequências.
Send your question to AI and receive an answer instantly
Recommended for you
59
Slide - Técnicas Inovadoras de Melhoramento de Precisão - 2023-2
Genética Molecular
USP
77
Slide - Biologia Sintética - 2023-2
Genética Molecular
USP
55
Slide - Métodos de Transformação de Plantas - 2023-2
Genética Molecular
USP
57
Slide - Marcadores Moleculares - Uso no Melhoramento - 2023-2
Genética Molecular
USP
65
Slide - Estudos das Ômicas - 2023-2
Genética Molecular
USP
7
P1 - Genética Molecular 2022 2
Genética Molecular
USP
58
Aula 7 - Marcadores Moleculares 2022-2
Genética Molecular
USP
85
Aula - Estrutura e Expressão de Genes
Genética Molecular
USP
77
Aula 4 - Tecnologia do Dna Recombinante
Genética Molecular
USP
52
Aula 6 - Métodos de Transformação de Plantas
Genética Molecular
USP
Preview text
LGN0232 - Genética Molecular Bancos de Dados Biológicos Antonio Figueira CENA figueira@cena.usp.br Roteiro da Aula 1. Definição de Banco de Dados Biológicos 2. Bioinformática 3. Recursos oferecidos pelo NCBI Forma de busca de informações: palavra-chave, sequências de nucleotídeos ou amino ácidos, espécies, artigos, autores,... 4. Utilização da plataforma BLAST Dogma Central da Biologia Molecular Genoma Transcritoma Proteoma Avanços tecnológicos recentes permitiram o surgimento da Era das Ômicas FENÓTIPO Ambiente Projetos de Sequenciamento Aumento do Número de Projetos de Sequenciamento Novas tecnologias e redução de custos Compartilhamento das Informações Obtidas + Banco de Dados Biológicos Banco de Dados Biológicos (BDB) O que são? São repositórios online que centralizam as informações genéticas (sequências) de DNA, RNA ou proteína, dentre outros Centralizar os dados, torná-los públicos e permitir o acesso a informações geradas Objetivos do BDS: Permite, por exemplo, comparar genes/genomas de espécies distintas. Homologia • Homologia: dividem a mesma ancestralidade com significado evolutivo • Permite inferências sobre a funcionalidade das sequencias identificadas Homologia – conceito fundamental na biologia Nature is prodigal in variety, but niggard in inovation - Charles Darwin A análise de sequências objetiva encontrar similaridades importantes que permitam inferir sobre homologia Exemplos: Órgãos homólogos – asas de morcego e mãos de humanos (mesma origem) Órgãos similares – asas de morcego e asas de borboleta (mesma função) Bioinformática Produção massiva de sequências de DNA, mRNA, proteínas • A bioinformática consiste no desenvolvimento de métodos computacionais, matemáticos e estatísticos para organizar e analisar informações biológicas em grande escala e de maneira integrada - Bancos de Dados Biológicos Organização e Armazenamento Visualização e Análise -- Ferramentas computacionais Compreensão do significado biológico >LT594788.1 Theobroma cacao genome assembly, chromosome: I ATCGGCAGTGACGTTTTATGATGATGAGATCATTGCTCTTGCACAGCCATTTAAACATTCCATGGTAGGA AAGTTTTCACGTATGCCCCGGTTGAATGACATTAGGGTTGCTTTCAAAGGAATCGGGCTAGTGGGTGCAT ATGAAATTCGTTGGTTGGATTATAAGCACATCCTGATTCATTTATCTAATGAGCAAGATCTGAATCATTT ATGGATGCGTCAAGCATGGTTCATTGCAAACCAGAAGATGAGAGTCTTTAAGTGGACTCCGGATTTCCAA TCGAAAAGGGAATCCTTCTTGGTTCCCGTTTGGGTCTCATTTTCGAACCTGCGGGCTCATCTATATGAAA AATCGACACTTCCGATGATTGCTAAGTCGGTGGGGAGACCACTTTTTATTGATGAAGCTACGGCAAATGG CACACGACCAAGTGTGGCCCGAGTGTGTGTTGAGTACGACTGCCAGCAGCCCCCTCTTGAACAGATCTGG ATCGTGACTAGGGATAGAAGCACAGGAAACATCACTGGAGGATTTCAACAGAAAGTAGAGTTTGCCAGGC TTCCTGACTATTGCAATCACTGTTGCCATGTGGGACATAGTATTGCAACATGTCTGGTGATGGGTCACAG TAAGGACAAGCCAAGAAAGGCACGGCCTAAGCCCCTTGTGGATAAAAAGCAGGAAGATGATGATTGGAAA AGAGAGAAAAGTAAGGAAACAGGTGATCTAATGGTTAATGGCGATAAAAGGAAAAATTCGATCCAAACAG AATCGAAAAAGCAGAGCGTGAAATGGGTGAAGGTTGAAAAGGGTGGCACAAGCGGGTTCAAGGATGCCCA CGGCGTAGAAGTCAATCTGGAGAGTAGTGGAGCAGATCCCGTGCAGATCTCGAATGGTTTTAGGGTGCTA GAAGCAATGGAGGATGGCGGGGATGTTAGATCCGCAAAACAGGGGAGAACAGAGAAGGTGAACAGTACCA TGCAATTTTTAAAAAATATTTTTAGGGAGAAAGAAAGGCAGTCGACGGAGATGGAAAGATGCTCGGGAAA GATAAATGGCGACGAAACGACATTAGAAGCTCTACCGATAAAACGGACTGCAGATGGAGTGAATCGGGAC AAGCTAAAATCTTCTACAGTGGGTGTGATCGAGGGTCCAAAGCAGAAGGAGAGTGAGGTTAAGCAAAGTT CTGTGCAGACGTTGATGGCTGAAATTTGGCGGACAGGAGCAGATACTCACGAGAGTGTAGAAAATATTGC AGACTTTGATCGAGTTCAATGGGCGATGGATGCAGGTCGTGTGACGTCCTGGAAGGCAAAAAAAAAGAGC AACAGAAAACTTGAGGACCGACTGTCGGGGACGGCCGTGCAAGGTGATGGTCAGACAGTACCGGAGGTCG AACAATGCTTGGGGAGTCCAAAACAGTGGGTGTACCGTCTAAACGTGGACGGTGAAAAGGTGCTGAAGGG TGGTGAAAATGTGCAGTTGAGTCAACTCGACAGTAATAGTGTAGTGAGTTCTCGTGGCTGTCTTAAACTC GGTACTGTTCACTCTCATGTAGCCAACTCCCGTGCGGTACATGCAGTGAAAGGAAGTATACACCGGTTGG AAGAAAATGCTTTACTAGGGGAACCAGCAGCTAGTTCACGTGAAGTGATGGAAGAAAATGCAGAACACGA TCCAAACTTGGGATCCAACCTGGGTATATGTGGTTACAATAAAGAAATAAGTTCGGTTCCTTCATGTGCA GGAACTAATTCTGCTGACTTTCACGCACATTTGGAAGCAAACAAACAACAGGAGAACAACAATCGAGGGC AAGTAAATCAAATCGAAACTGATGATAGCAGTAGATCAGTGCTCCATGTGGACTCGGGAGAGATTTTGGA CAGCCAGCATATTAAATACCACCCCATGGTTTCCAGGAGAAGAAAATCCGATAGTGAAGTTATATATATC CCTTCAGAGGATATTCTTTCAGAGAATGATGCTCATATGTTGATGGATGGGTCTGATGAAGAATCCATCT CCAAGCAATTTACCACTAGAACTTACCCATGATCAGTGCCCTGCTTTGGAATGTAAGGGGAGTGACTGGA AAAGCAATCCAAAGGAGAATTAAAAAACTGCAGATGATGCACCAAATAAAGATATTGGTTATCCTGGAAC CAATGGTAACTGTTGATCGAATTGAATTTTTTAGGAGAAAATTAGGCTTTGAGGGGGCGGCCTTTAATTG TTCTCAAAAAATTTGGATTTTTTGGATGCACGGCATCACTTGCACAACCAGGTTTGATCATCCCCAATGC TTGCATGTTCAATTATGTTTCCCGTAGCTTCCTGTCCCTATTGAAGCTTCATTTGTTTATGCTAAATGTA CTAGAATGGAACGACTTGCTTTATGGGATTTTATGAGACGTATTGCAGAGGATGTACAGGGTCCTTGGCT GGCTGGAGGCGACTTTAATGTTATTTTAAGGTGAGAAGAGAGATTTTTGGGTGCAGACCCACATACTGGA GCCATGGAAGATTTTGCAAATGCCTTACTTGATTGTGGGTTAGTAGATGCAGGGTTTGAAGGCAACAATT TTACGTGGACTAACTCCCGGATGTTCCAAAGATTAGATCGGATTCTCTATAACCCACAGTGGGTAGCTCA Genoma do cacaueiro Banco de Dados NCBI –National Center for Biotechnology Information • https://www.ncbi.nlm.nih.gov/ • Iniciado em 1988 – ligado a biblioteca de medicina • Missão: melhor entendimento dos processos moleculares que afetam a saúde humana • Understanding nature's mute but elegant language of living cells is the quest of modern molecular biology. • NCBI cria banco de dados públicos e recursos de biologia computacional e disseminação de informações NCBI - PubMed https://pubmed.ncbi.nlm.nih.gov/ Origem do NCBI -> National Library of Medicine Histórico de Sequências no NCBI https://www.ncbi.nlm.nih.gov/genbank/statistics/ International Nucleotide Sequence Database Collaboration insdc.org/ International Nucleotide Sequence Database Collaboration https://www.ddbj.nig.ac.jp/statistics/index-e.html NCBI NCBI Home Resource List (A-Z) All Resources Chemicals & Bioassays Data & Software DNA & RNA Domains & Structures Genes & Expression Genetics & Medicine Genomes & Maps Homology Literature Proteins Sequence Analysis Taxonomy Training & Tutorials Variation Welcome to NCBI The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information. About the NCBI | Mission | Organization | NCBI News & Blog Submit Deposit data or manuscripts into NCBI databases Download Transfer NCBI data to your computer Learn Find help documents, attend a class or watch a tutorial Develop Use NCBI APIs and code libraries to build applications Analyze Identify an NCBI tool for your data analysis task Research Explore NCBI research and collaborative projects Popular Resources PubMed Bookshelf PubMed Central BLAST Nucleotide Genome SNP Gene Protein PubChem NCBI News & Blog Announcing GenBank release 252.0 19 Oct 2022 Now over 3 billion records! GenBank release 252.0 (10/17/2022) is now available on the NCBI FTP site. This release has 20.35 trillion bases and 3.10 billion records. The current release has 240,539,282 traditional records containing 1,562,963,366,851 base pairs of sequence data. There are also 2,167,900,306 WGS records containing 18,231,960,808,828 base pairs of sequence data, 574,020,080 .... Continue https://www.ncbi.nlm.nih.gov/ NCBI https://www.ncbi.nlm.nih.gov/home/analyze/ NCBI - TaxBrowser https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Root NCBI - Genomas https://www.ncbi.nlm.nih.gov/Traces/wgs/ Tipos de Bancos de Dados Biológicos GenBank Protein Data Bank (PDB) Swiss Prot Protein Information Resources (PIR) Ribosomal Database Project https://medium.com/omixdata/bancos-de-dados-biol%C3%B3gicos-parte-i-o-ncbi-c16dfc1b0a84 Banco de Dados Biológicos (BDB) Tipos de Bancos de Dados Nível de Curadoria: -Preliminar – sequências não terminadas - localizadas nos centros de sequenciamento -Arquivo – repositório da informação - redundante (várias sequências do mesmo gene) - submissor mantém o controle editorial sobre registros -Com Curadoria – não redundante – ex. RefSeq NCBI https://www.ncbi.nlm.nih.gov/refseq/ - cada registro pretende conter conhecimento sobre a sequencia -Revisado - Kyoto Encyclopedia of Genes and Genomes - KEGG Genes, genomas, enzimas, rotas metabólicas Outros bancos específicos.. Proteínas https://www.expasy.org/ - Expasy – Instituto Suíço de Bioinformática https://proteininformationresource.org/ https://www.uniprot.org/ Estrutura de Proteínas Protein Data Bank - https://www.rcsb.org/ Structure (NCBI) - https://www.ncbi.nlm.nih.gov/structure Outros bancos específicos.. Genomas espécie-específicos • http://www.yeastgenome.org/ • http://flybase.org/ • http://www.maizegdb.org/ • http://rice.plantbiology.msu.edu/ • https://solgenomics.net/ • https://cocoa-genome-hub.southgreen.fr/ NCBI (National Center for Biotechnology Information) – fundado em 1988 O website https://www.ncbi.nlm.nih.gov/ foi criado em 1994 Literature — Repositório de artigos científicos, livros, entre outros. Um dos bancos mais utilizados dessa categoria é o PubMed Central É de uso gratuito e acolhe diversos BDBs, separados por categorias: Genes — São encontradas sequências gênicas e anotações para estudo de estrutura de ortólogos, expressão e evolução Proteins — Apresenta dados como sequências proteicas, estruturas tridimensionais (3D) e domínios proteicos Genomes — Possui bancos de sequências genômicas, dados de genômica funcional e origem de amostras biológicas. Um dos principais bancos dessa categoria é o Nucleotide, que tem o GenBank como um dos seus principais componentes BLAST — É uma ferramenta que realiza consultas em diferentes bancos de dados, como Nucleotide e Protein PubChem — Repositório de informações químicas, rotas metabólicas e ferramentas para screening de atividade biológica https://medium.com/omixdata/bancos-de-dados-biol Banco de Dados Biológicos (BDB) Busca nos bancos de dados • Por texto – palavra chave, número das sequências, espécie, gênero,... • Por sequência de nucleotídeos ou amino ácidos • Uso de programa específico - BLAST • Basic Local Alignment Search Tool • BLAST para nucleotídeos e amino ácidos U.S. National Library of Medicine NCBI National Center for Biotechnology Information Sign in to NCBI NCBI HOME LITERATURE HEALTH GENOMES GENES PROTEINS CHEMICALS POPULAR RESOURCES ▼ All Databases Search NCBI Analyze NCBI provides a wide variety of data analysis tools that allow users to manipulate, align, visualize and evaluate biological data. Selected Analysis Tools All Tools Literature Health Genomes Genes Proteins Chemicals Filter this table Tools Description Amino Acid Explorer Explores amino acid properties, substitutions and functions Assembly Archive Links the raw sequence information found in the Trace Archive with assembly information found in GenBank/EMBL/DDBJ Basic Local Alignment Search Tool (BLAST) Finds regions of local similarity between biological sequences Batch Entrez Retrieves records specified in an uploaded file of identifiers BioAssay Services Tools that summarize the biological test results in the PubChem database BLAST Link (BLink) Displays the results of a pre-computed BLAST search of a protein against all other protein sequences at NCBI BLAST Microbial Genomes Finds regions of local similarity between query sequences and sequences from complete microbial genomes found in GenBank U.S. National Library of Medicine NCBI National Center for Biotechnology Information Sign in to NCBI NCBI HOME LITERATURE HEALTH GENOMES GENES PROTEINS CHEMICALS POPULAR RESOURCES ▼ All Databases Search NCBI cp4 epsps agrobacterium Analyze NCBI provides a wide variety of data analysis tools that allow users to manipulate, align, visualize and evaluate biological data. Selected Analysis Tools All Tools Literature Health Genomes Genes Proteins Chemicals Filter this table Tools Description Amino Acid Explorer Explores amino acid properties, substitutions and functions Assembly Archive Links the raw sequence information found in the Trace Archive with assembly information found in GenBank/EMBL/DDBJ Basic Local Alignment Search Tool (BLAST) Finds regions of local similarity between biological sequences Batch Entrez Retrieves records specified in an uploaded file of identifiers BioAssay Services Tools that summarize the biological test results in the PubChem database BLAST Link (BLink) Displays the results of a pre-computed BLAST search of a protein against all other protein sequences at NCBI BLAST Microbial Genomes Finds regions of local similarity between query sequences and sequences from complete microbial genomes found in GenBank National Library of Medicine National Center for Biotechnology Information Log in Search NCBI cp4 epsps agrobacterium Search Results found in 6 databases Literature Bookshelf 0 MeSH 0 NLM Catalog 0 PubMed 44 PubMed Central 212 Genomes Assembly 0 BioCollections 1 BioProject 0 BioSample 0 Genome 0 Nucleotide 22 SRA 0 Taxonomy 0 Genes Gene 1 GEO DataSets 0 GEO Profiles 0 HomoloGene 0 PopSet 0 Clinical ClinicalTrials.gov 0 ClinVar 0 dbGaP 0 dbSNP 0 dbVar 0 GTR 0 MedGen 0 OMIM 0 Proteins Conserved Domains 0 Identical Protein Groups 0 Protein 12 Protein Family Models 0 Structure 9 PubChem BioAssays 0 Compounds 0 Pathways 0 Substances 0 FOLLOW NCBI https://www.ncbi.nlm.nih.gov/nuccore/AB209952.1 https://www.ncbi.nlm.nih.gov/nuccore/AB209952.1 FEATURES Location/Qualifiers source 1..2457 /organism="Glycine max" /mol_type="genomic DNA" /cultivar="Roundup Ready 30-4-2" /db_xref="taxon:3847" /clone="pC4a-TOPO" /transgenic /country="Japan" source 1..265 /organism="Cauliflower mosaic virus" /mol_type="genomic DNA" /db_xref="taxon:10641" source 298..510 /organism="Petunia x hybrida" /mol_type="genomic DNA" /db_xref="taxon:4102" /note="synonym: Petunia hybrida" source 511..2457 /organism="Agrobacterium sp. CP4" /mol_type="genomic DNA" /strain="CP4" /db_xref="taxon:268951" gene 1..265 /gene="CaMV35S" regulatory 1..265 /regulatory_class="promoter" /gene="CaMV35S" misc_feature 266..297 /note="Cauliflower mosaic virus 35S promoter" /note="nontranslation region" gene 298..1881 /gene="cp4epsps" CDS 298..1881 /gene="cp4epsps" /note="5-enol-pyruvylshikimate-3-phospate synthase (EPSPS) class 2 precursor" /codon_start=1 /product="5-enol-pyruvylshikimate-3-phospate synthase class 2 precursor" /protein_id="BAA98423.1" /translation="MAQQNRNAQIGQTTLNPIMSHFKPQVKSSFSLYFGSKGLNSKAN SMLVLKKDSIFQFKQCFSRFISASAVTACHLMAGGSSRATARKSSGLVFSGYGTVDGGKSI SHRSFFGLGALGSTEGIIVEDGWTDGFDVSQQDIGAQIAMGRAGRIGTIVGDCIGVNLGL APEAPLDGFNVGAGRIITELEGDVDIYKIGKGGAMQIARGMRGMVEDIDIVGGVNGKLVI EDGQDRPLYVKPGTPIPTPYTIGPYPTGVQPSLASQVSVLSLGLSNLGAIPTTLVEGFA EDGDRLPIVPTKRPTGYPVYAVTGRILEPSGSGAALVVALLGNRPIIVIRGTTHENGVPM KMLQFGGANVIETFGLGITDGIPVTVETGAMGRIITVVESKLGRGALFSYTPVIATIMTDKF VITIILVLMNPTRITGLAIGEGGMARMYVERITVDIAGGEDVAIAGRTDGSYGVGVDEG APPPMDIEDPIPVAAIFAQLFQAAYGGRGTVGRVVFIEGGEDGVDAIVMGGETFSITDL LVRARGDPGKGIALAASGAAVATHIHDAIRMAIFSLWLFGVPDMAATIAIFSPVTESTDF MDLNGLAGAKIELSDKFA" REPEAT REGION 2186..2439 /note="repeated fragment of cp4epsps" /truncate cp4epsps" ORIGIN 1 tgaaaaaggg aggtgcctcc tacaacagtc atctattgca taaaggagaa gccaacgttg 61 aaatgcctcc tccgcaaggt ggtccaagga tgctaacggg gatctgagga cacgtgcacc 121 aaaaaggagg agttccaacc aaggtcttag gcttgtggaa tgtactgata attcaattct 181 atgtaagaga tgaagcacta tccaaccttt caaacattaa actcgtgctg ctatctaatg 241 gtttcctact agaaaccgaa cgagataaaa tccagtgttc tgctcagagc gagaagacgc 301 ncaacaaata aacaatnpc xaaatgcttc caatttgcat 361 aaccacccag ttccaggttt ccengatt nttttttttg gatcaaaaca atattatttattt aacgaatga 421 tcgcaatttc atgatattgt tttgggaaaa tagagaaaat ccggcccgtt aacaatagga 481 tttgaagatt cttgttggaa atcgccgaaa cctgaccggt gtgcgaggtt gcttggtgaa 541 gggaaatacc atgtgtgatt cgatggtccg 601 attacctgca aagatgttct atgcgtgacc tcaacaacgc ctaagtcgga atccgggccg 661 ctgctcatcg tgcgacgagc taagaccgcc taacatgtac gacacaaact ttaccaaacg 721 atacggaggg tggaaaatag tgatcatcga cgcccgggga gagctatcgccggcagccgtg 781 cccagcctga gatgccaaag gaaaagatgt ggcgaagtgg ttcatatttt cttagacgac 841 agggagtcta cgtcgacgct gacgaccgga ttgctccgag ccagaacatg gctgaattgg 901 atggcggcgt tggatctgag gaataaaacg ttagcggtgg atcacacaac gaggtcatgg 961 tcgaaccttt tgctggtgcc agtttttcgt tctctggggt ggctgccgpg cgggtcggtt 1021 gaaataggca agcgctagct tggattggtc gaagcgtgtt accgacctgt accgttcttc 1081 agatctacta actgtctcig tcatgagcgg gattgtggtg gacggtggtg ccggcttcga 1141 gtctagacat tgcagcctca gggcaccacc actaaaaggg gtaaatggat tcgtgatgtc 1201 cttcacctag gtggagacat gactagatgg ttgcacatgg gcgacgggct gcgagaccgg 1261 taccatcgga aaacatcaat ggaactcatc taaccagact gacggagatg ccgtttcca 1321 gagcaggaga gcgaaggcgt tgaagcgggc agcggccagg gcagccagag cagataaacc 1381 agtttaatga gtcggtaaaa ttgaaggacg acggttccac ccatcaccat cgatgatcgt 1441 aagtccagga aaagagtaat ccctctaccc gtgcaaatgc tgatactagt cctcagacta 1501 gctggtgtgg ttggcgagga gtatattggg cttgcgcatg cccagcggaa tactgggacg 1561 tggaatggct ggcggtcgtc cccatgggaa ctgtgtggat gcgaggggag cccgtgtcac 1621 accgccagta atagcggctt gggctcttcc aagtaaaaat gcttcaggtt tatagacggc 1681 tgattcttca atgattgagt ctgtgcccat cccggtgaag actattccaa tagctggcag 1741 gatggcaaat gccgtaggac tgaacgctgg agggctaccc ggtgagacca tcccaactag 1801 gcaaagagct agaggaagaa atcttatctt gcctccgcac gagagccggc accggcacgg 1861 tagctagaat agcaggttcc tgtcagtagc gtgctgtacg ttcgccgctg aaggtgacca 1921 htattgttaa aaaaataaag ttagtgacca gttcatacca aatatgaaac aatgccagtg 1981 gctacctaaa tggcagccat gaatatgatt ctaaacaagc agaggagtgt aacgaggagt 2041 tagattttat ttattctaag ttttgggaaa gaggagatga tcctcctgtt 2081 atctagactc 2101 ttaatagcct cgctggccgg cgtctgagct tctttggcgc cgccgctcag cccgtcttgg 2161 tgctggttgg AB209952: 1 segment 1 of 1 Glycine max transgenic cp4epsps gene for 5-enol-pyruvylshikimate-3-phosphate synthase class 2 precursor, complete cds GenBank: AB209952.1 GenBank Graphics >AB209952.1:298-1881 Glycine max transgenic cp4epsps gene for 5-enol-pyruvylshikimate-3-phosphate synthase class 2 precursor, complete cds ATGGCAAATTAACAACGTGCAACAGGAACAACAACCCATTCCTACGTAACACCC AAGTTCATGAAGAGGAGACGATGTTGTTTTCCTTTCGTTTTTTTTTCTTTGTATATTCTATT GGTTTTGAACAAAAGATTCAATTTAATTAAGGTTTTTATGTCAGTAGATGGTGATAACA GCCTGCAGCTGTTCAGCAAAGACGCGCGGCCAGCCCCGTCGGTGGTGCCAGC TCCGAGTTCCAGTTCACCGGAACAGGACGACCAGTTCCTGTTGAGTAGAGATAAC GGCATCGAGTCCCCTTGTGGACAGCGGGACGGAAGGACTTGTGTGGGTGCGGGGGG CGGCAACGACCCTGAGGGAACACCAAGCGCGAAGGCTGTGCGTGAGTGCTGCGGGC GCGCGGAATGGCTGGATCCAAACAACAAGGAGGCTTCATCAAGTGGACCTTCGGCTTG CGCGCTGGATTTCGATTGCGACCGAGCCGCGAGGCCGCGGCGTCTACTGGTTTC GAATAGTGGGATTGCGAAGTGGAACGGAAGTAAAGTTCCATGATCGGAATCGA CGAGGCCGATACGTCAAGACCCTGCCGCCGTACACGACTTTGCCGTACGAAGATTTC CAACACGCCGCGCAAAAGATTGAAGGCTGGAGACCGGTGCGACGGTAGAGTGGC GGTGTCGGGCAAATCGATTGAGATGGGGGACCTGAGCCTGCGAGATTTCGATGC GGCTTTGATCCGGAAGAAGTACGAGGATAGCATTTGCCGGGGCGGAGATCGG CGGCAAGTATCTGCGGGGCAGAAGCTGCGGGAGCGAGGAATCGGCTGCGTGGT AACCCTGGTATTGCGCTAGGCTATCATCCGCACGGCACCGAACCATCGTGACGA AACTGCCCCATAAGGAGAAACGGAAGGCCCCGTCTTTGCCCTCAAGAGTGGG CGATGAGATCCATTGAGGCCGCGGCAAGAGCTGAGAGGCGTGGCTTCCCGA CGGAAGACTTGGCCGAGACCGAACACAACCCTGGCCGACTGCGTCAAGGCC GGACTGCCGCGGCACCTCAGCTCAGCAGTACGTCATCGCAGGACCATTGACCTG AATCCCCTGAAGCGTGAGTGACCGAGGAATTGAGGTCGAAGACCACGAGACA CGTCCGACACCTCCGCTCGGTCGATACGTCGACGACGTGCCAGTCGAGGAAGCC GGATGCGAAGATCGGTTCCGCCAGGCCGTCGCACGCGAGCCATCGTGAGCACC AACCTTCGACCAGCAAAGGAAAGGGCAAAAGTACTGACGCGTTACCAGCAAGA GCTGGTGTCGCGCGAGA Glycine max transgenic cp4epsps gene for 5-enol-pyruvylshikimate-3-phosphate synthase class 2 precursor, complete cds GenBank: AB209952.1 GenBank FASTA Graphics Link To This View Feedback Gene cp4epsps BRD948231 cp4epsps Repeat region Features nontranslation region regulatory Features biosrc Features Agrobacterium sp. C /othersynonym: Pet... AB209952.1: 1..2.5K (2,457 nt) Tracks shown: 6/7 Busca por BLAST •Busca por nucleotídeos ou amino ácidos (proteínas) •Comparação de sequencias para identificar similaridade significativa de DNA ou PTN para inferir função, origem, filogenia • Realiza comparações entre pares de sequencias, buscando regiões com similaridade local • Alinhamento local (segmentos) é a base da busca por BLAST • Usa algoritmos para gerar alinhamento de sequências Basic Local Alignment Search Tool Alinhamento Global: é feito quando comparamos uma sequência de aminoácidos ou nucleotídeos com outra ao longo de toda sua extensão Alinhamento Local: a comparação entre duas sequências não é feita ao longo de toda sua extensão, mas sim através de pequenas regiões destas. O BLAST é o principal programa para realizar o alinhamento local Comparando sequências - Alinhamento Comparando sequências - Alinhamento O alinhamento de sequencias consiste em comparar duas sequencias (de nucleotídeos ou aminoácidos) de forma a identificarmos o grau de identidade/similaridade entre elas Identidade Número de posições invariáveis em duas sequências (nucleotídeos ou aminoácidos) alinhadas. BLAST Similaridade Grau de semelhança entre duas sequencias de proteínas expressa em percentual de amino ácidos com característica similar alinhados Busca por BLAST • Identidade: ocorrência do exato mesmo nucleotídeo ou amino ácido na mesma posição em sequencias alinhadas • Similaridade: ocorrência de amino ácidos equivalentes (quimicamente) na mesma posição • Homologia: dividem a mesma ancestralidade com significado evolutivo Homologia – conceito fundamental na biologia Algoritmos em Blast: • Não avaliam homologia • Medem similaridade e identidade entre sequências A análise de sequências objetiva encontrar similaridades importantes que permitam inferir sobre homologia Exemplos: Órgãos homólogos – asas de morcego e mãos de humanos (mesma origem) Órgãos similares – asas de morcego e asas de borboleta (mesma função) Alinhamento Global Alinhamento Local BLAST Comparando sequências - Alinhamento XM_010026669.3 PREDICTED: Syzygium oleosum rubisco accumulation factor 1.1, chloroplastic (LOC115690120), mRNA Sequence ID: XM_030616373.2 Length: 1930 Number of Matches: 1 Range 1: 56 to 1928 GenBank Graphics Score 2082 bits(1127) Expect 0.0 Identities 1643/1886(87%) Gaps 60/1886(3%) Strand Plus/Plus Query 24 aataaAaagcTC-aaagccatcagtactgaacttcaagactagggaacc 82 | || ||||| ||||| |||||||||||||||||||| |||| 115 Sbjct 56 AAGAAAATCTAAAGGGCATCAGTACTGAACTTCAAGACCATGGAGATC 115 Query 83 acagtgatcaccctccgccggaggaccggccggcgcc--t 139 ||| | | |||||||||||||||||||||||||||| 174 Sbjct 116 ACAGTGACCACCCTCCGCCGGAGGACCGGCCGGCGCCTCACACC 174 Query 140 ccg---ccaccattgaggccctacatacaaacccacc-----c 193 | || ||||| ||| | ||||||||||||| | || 234 Sbjct 175 CCACCACCACCATTGAAAACCCTAAGTGAAAAGACCCACCCCCTAACC 235 Query 194 aaatttctaattccacctaccactacaactctctcccctcacaccac 253 | |||| | ||| ||||||| ||||||||| || |||| | 290 Sbjct 236 AAATTTCAAGTCCAACCTACCACCACAACCTCTCTACCCCTACCAC 290 Query 254 cttgcgccacgccaccgaccgctatggtccccagaatgctgccaggc 313 | | || || |||| ||||||||| || | || || Sbjct 291 C-TGCAATGCCGACACCCCTCGCGATGAAGCCATCTCCGCTACGCC 348 Query 314 acccggagggtgacctccacccttccctggttggtggggggcgggct 373 | || || |||| ||||||||||||| | | Sbjct 349 ACCCGGAGCGCTG -----------------ACCTCAGCCACTA 369 Query 374 tccggctccgccctcctccggt---ctccgtgctaggcctcaaaa 408 ||| |||||| || ||||||| | ||| | Sbjct 409 TCCGGCTCCGACCCACCCTCCGGTCTCCCCGTTC 440 NH National Library of Medicine National Center for Biotechnology Information BLAST Check out the ClusteredNR database on BLAST+ Learn more Give us feedback Home Recent Results Saved Strategies Help Log in Basic Local Alignment Search Tool BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance. Learn more Web BLAST Nucleotide BLAST nucleotide › nucleotide blastx translated nucleotide › protein tblastn protein › translated nucleotide Protein BLAST protein › protein NEWS BLAST Quick Start guides! Need some help getting started with BLAST? Thu, 22 Jun 2023 More BLAST news... Busca por BLAST Tipos de BLAST de acordo com o tipo de sequencia fornecida e qual o tipo buscado Artigo BLAST Busca por BLAST Nucleotide Sequence Translated Protein Sequence blastn Nucleotide DB tblastn Protein Sequence blastx blastp Protein DB tblastx Translated DB (contain amino acid sequences) Em 6 quadros Em 6 quadros >gi|226347322|gb|FJ830553.1| Anabaena planctonica CENA210 ribulose-1,5- bisphosphate carboxylase/oxygenase large subunit (rbcL) gene, partial cds CCGGCGAAATTAAAGGTCACTACCTCAACGTTACCGCTCCTACCTGCGAAGAAATGTTGAAACGGGCTGA GTACGCTAAAGAACTCAAAATGCCCATCATCATGCACGACTACCTAACCGCAGGTTTCACCGCTAACACC ACATTGGCTCGTTGGTGTCGTGATAACGGTATTTTATTGCACATTCACCGTGCTATGCACGCTGTAATTG ACCGTCAAAAAAATCACGGTATCCACTTCCGCGTATTAGCTAAAGCCCTCCGCTTGTCCGGTGGTGATCA CATCCACACTGGTACAGTTGTTGGTAAGTTAGAAGGTGAACGCGGTATTACCATGGGCTTCGTTGACTTA TTACGTGAAAACTACGTTGAGCAAGACAAGTCTCGCGGTATTTACTTTACCCAAGATTGGGCGTCTCTAC CTGGTGTAATGGCCGTTGCTTCTGGTGGTATCCACGTATGGCATATGCCCGCGTTGGTTGAGATCTTCGG TGATGACTCCGTATTACAATTCGGTGGTGGTACACTCGGACATCCTTGGGGTAACGCTCCTGGTGCTACA GCTAACCGCGTAGCTCTAAAAGCAGTTGTTCAAGCTCGTAACGAAGGCCGTAACTTAGCTCGTGAAGGTA ACGATATTATCCGCGAAGCTGCTAAGTGGTCTCCTGAGTTGGCTGTTGCTTGCGAACTG >gi|226347323|gb|ACO50079.1| ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit [Anabaena planctonica CENA210] GEIKGHYLNVTAPTCEEMLKRAEYAKELKMPIIMHDYLTAGFTANTTLARWCRDNGILLHIHRAMHAVID RQKNHGIHFRVLAKALRLSGGDHIHTGTVVGKLEGERGITMGFVDLLRENYVEQDKSRGIYFTQDWASLP GVMAVASGGIHVWHMPALVEIFGDDSVLQFGGGTLGHPWGNAPGATANRVALKAVVQARNEGRNLAREGN DIIREAAKWSPELAVACEL Formato FASTA: formato universalmente aceito para ser processado Identificador - linha do nome (máximo 80 caracteres por linha) • Nossa sequência –> query (consulta) • O resultado da busca em BLAST pode ser um ou mais hits em sequências-sujeito (subject), ou seja, sequências pertencentes ao banco • Os melhores resultados de escores são relatados, • usar valor E • valor E <0.01 Quanto menor o e-value, mais significativo o alinhamento!!! Busca por BLAST Busca por BLAST BLAST® » blastn suite Standard Nucleotide BLAST Enter accession number(s), gi(s), or FASTA sequence(s) Query subrange Clear From To Or, upload file Escolher arquivo Nenhum arquivo escolhido Job Title Enter a descriptive title for your BLAST search Align two or more sequences Choose Search Set Database Standard databases (nr etc.) rRNAsITS databases Genomic + transcript databases Belacoronavirus Organism Nucleotide collection (nr/nt) Enter organism name or id—completions will be suggested Add organism Exclude Common Models (XM/XP) Uncultured/environmental sample sequences Limit to Sequences from type material Enter Query Create custom database Enter an Entrez query to limit search Program Selection Optimize for Highly similar sequences (megablast) More dissimilar sequences (discontiguous megablast) Somewhat similar sequences (blastn) Choose a BLAST algorithm BLAST Search database Nucleotide collection (nr/nt) using Megablast (Optimize for highly similar sequences) Show results in a new window Algorithm parameters Busca por BLAST Glycine max transgenic cp4epsps gene for 5-enol- pyruvylshikimate-3-phospate synthase class 2 precursor, complete cds Busca por BLAST Glycine max transgenic cp4epsps gene for 5-enol- pyruvylshikimate-3-phospate synthase class 2 precursor, complete cds Busca por BLAST Glycine max transgenic cp4epsps gene for 5-enol- pyruvylshikimate-3-phospate synthase class 2 precursor, complete cds Busca por BLAST Glycine max transgenic cp4epsps gene for 5-enol- pyruvylshikimate-3-phospate synthase class 2 precursor, complete cds Busca por BLAST pesticidal protein (plasmid) [Bacillus thuringiensis] GenBank: AKJ62760.1 Bacillus thuringiensis strain T29 Cry1Ac gene, partial cds GenBank: MK882923.1 >gi|47933333|gb|AY262820.1| Pinus radiata cellulose synthase (CesA10) mRNA, complete cds Length=4482 Score = 7374 bits (3720), Expect = 0.0 Identities = 3741/3741 (100%), Gaps = 0/3741 (0%) Strand=Plus/Plus Query 1 GCACGAGGATTTATAATCCGGAATTACGTGATTATCATTGGTTTACACGTTAGCGTGGGAGCTGGTGAT 60 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 1 GCACGAGGATTTATAATCCGGAATTACGTGATTATCATTGGTTTACACGTTAGCGTGGGAGCTGGTGAT 60 Query 61 ATTTTAGTTTTTATCCGAAACTTTCGGGCGTGAGCAAGAAAGGGTGAAGAAAGGTTGGAACAGTGGTG 120 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 61 ATTTTAGTTTTTATCCGAAACTTTCGGGCGTGAGCAAGAAAGGGTGAAGAAAGGTTGGAACAGTGGTG 120 Query 121 GAATGGGTGTGAGAAGGTTGTAACTCCCAAG 180 ||||||||||||||||||||||||||| Sbjct 121 GAATGGGTGTGAGAAGGTTGTAACTCCCAAG 180 Query 181 TATTAGAGTGTTGCGAAGCGAAACGAATTTA 240 ||||||||||||||||||||||||||| Sbjct 181 TATTAGAGTGTTGCGAAGCGAAACGAATTTA 240 Query 241 TTCTGTGAAGCTTTTTAGTCTCTTGTTCATG 300 ||||||||||||||||||||||||||| Sbjct 241 TTCTGTGAAGCTTTTTAGTCTCTTGTTCATG 300 Barra = Identidade >gi|47933335|gb|AY262821.1| Pinus radiata cellulose synthase (CesA2) mRNA, partial cds Length=3603 Score = 866 bits (437), Expect = 0.0 Identities = 977/1157 (84%), Gaps = 0/1157 (0%) Strand=Plus/Plus Query 1450 GAAGACCTTCAAAATGAGTAGATGGAACGACCTCGAACCCCTCAAGAAGAGAGTGGTTCCTATTGCT 1509 |||| |||||||||||||| || | ||||||||||||||| || || | || | || || | || Sbjct 697 GAAGACCCTGGAAATGAGTACATGGAACGACCTCCAACCCCTCTCTCAAGATGCCAAAGGTTTCACT 756 Query 1510 CTCTCCAAGATCAATAGACGAGGCTTAACCGGCTCTACGCGTATCGGGCGCTTTC 1569 ||||| || | || | || || ||| ||| ||||||||| || ||| Sbjct 757 TCTTCCAAGAGTCAATAAACGTAGGCTAAACGCTCTACCGCATACGGATGCCTTT 816 Query 1570 TTCTTCCGCTACCGGGAATATTGCATAAGCGTATAAGCGATATTACGATGTGACTGTTTACTCTGTGAAGCT 1629 ||| || | || | || || | | ||| ||| |||| |||| | || | || | | || | Sbjct 817 TTCTTCCCGCTCCGGGAATGTTGCATTACGCTCTAAGCGTCATTACAATGTGACTGTTTAATCTGTAAGC 876 Query 1630 GTAATAGTGATGGAAGAGG 1647 | | || |||||| || Sbjct 877 GTAATAATGAGTGGAAGGCT 896 Query 1690 CCCACTAGATAGGGAAAGGA 1749 |||| |||| ||||| |||| Sbjct 936 CCCACTAGATAGGGAAAGGA 937 Busca por BLASTp BLAST® » blastp suite Standard Protein BLAST Enter Query Sequence Enter accession number(s), gi(s), or FASTA sequence(s) Clear Query subrange From To Or, upload file Escolher arquivo Nenhum ar...ivo escolhido Job Title Enter a descriptive title for your BLAST search more… □ Align two or more sequences Choose Search Set Standard Databases Organism Exclude Optional Optional □ Standard databases (nr etc.) New □ Exp... Select to compare standard and experimental database Add organism Program Selection Algorithm Quick BLASTP (Accelerated protein-protein BLAST) blastp (protein-protein BLAST) Non-redundant protein sequences (nr) RefSeq Select proteins... Job Title gi|47933334|gb|AAG63935.1| cellulose synthase… Enter a descriptive title for your BLAST search Choose Search Set Database Non-redundant protein sequences (nr) Organism Optional Enter organism common name or id-completions will be suggested Enter organism common name, binomial, or tax id. Only 20 top taxa will be shown. Entrez Query Optional Enter an Entrez query to limit search Program Selection Algorithm blastp (protein-protein BLAST) PSI-BLAST (Position-Specific Iterated BLAST) PHI-BLAST (Pattern Hit Initiated BLAST) Choose a BLAST algorithm BLAST Search database nr using Blastp (protein-protein BLAST) Show results in a new window Algorithm parameters Human Mouse Rat Arabidopsis thaliana Oryza sativa Gallus gallus Bos taurus Pan troglodytes Danio rerio Microbes Drosophila melanogaster Apis mellifera Basic BLAST Choose a BLAST program to run. nucleotide blast Search a nucleotide database using a nucleotide query Algorithms: blastn, megablast, discontinuous megablast protein blast Search protein database using a protein query Algorithms: blastp, psi-blast, phi-blast blastx Search protein database using a translated nucleotide query tblastn Search translated nucleotide database using a protein query tblastx Search translated nucleotide database using a translated nucleotide query Specialized BLAST Choose a type of specialized search (or database name in parentheses.) Search trace archives Find conserved domains in your sequence (cds) Find sequences with similar conserved domain architecture (cdart) Tip of the Day How to Search Custom Databases in Web-Blast Using Entrez Queries A powerful feature of the BLAST Web interface is the ability to limit BLAST searches to a subset of any database using a standard Entrez query. Skillful use of Entrez queries allows the equivalent of on-the-fly construction of databases of exact composition More tips… Enter Query Sequence Enter accession number, gi, or FASTA sequence Clear >gi|47933333|gb|AY262820.1| Pinus radiata cellulose synthase (CesA10) mRNA, complete cds GCACCGAGTGGTGGTACCAGTCACGGTACTCTAACTTAACACGAACCAGCTCA... Or, upload file Genetic code Standard (1) [?] Job Title gi|47933333|gb|AY262820.1| Pinus radiata cellulose… Enter a descriptive title for your BLAST search Choose Search Set Database Non-redundant protein sequences (nr) Organism Optional Enter organism common name, binomial, or tax id. Only 20 top taxa will be shown. Entrez Query Optional Enter an Entrez query to limit search BLAST Search database nr using Blastx (search protein databases using a translated nucleotide query) Show results in a new window Algorithm parameters >gi|47933334|gb|AAQ63935.1| cellulose synthase [Pinus radiata] Length=1096 Score = 2221 bits (5754), Expect = 0.0 Identities = 1096/1096 (100%), Positives = 1096/1096 (100%), Gaps = 0/1096 (0%) Frame = +1 Query 649 MEARTNTAA…CNE 828 MEARTNTAA…CNE Sbjct 1 MEARTNTAA…CNE 60 Query 829 CAFPVCRPC…TQGNR 1008 CAFPVCRPC…TQGNR Sbjct 61 CAFPVCRPC…TQGNR 120 Query 1009 NEKQQIAEAM…SEYR 1188 NEKQQIAEAM…SEYR Sbjct 121 NEKQQIAEAM…SEYR 180 Query 1189 IAAPPTGGGS…KQDK 1368 IAAPPTGGGS…KQDK Sbjct 181 IAAPPTGGGS…KQDK 240 Query 1369 NTLQVTSDTY…IVVL 1548 NTLQVTSDTY…IVVL Sbjct 241 NTLQVTSDTY…IVVL 300 >gi|47933336|gb|AAQ63936.1| cellulose synthase [Pinus radiata] Length=1066 Score = 1813 bits (4695), Expect = 0.0 Identities = 890/1066 (83%), Positives = 972/1066 (91%), Gaps = 9/1066 (0%) Frame = +1 Query 760 ICQICGEDVL…SPQV 939 +CQIC+DVGL…+CSP+V Sbjct 3 VCQICGDDVG…KHGSPRV 62 Query 940 DGDKEDEDAD…PELPQL 1116 +GD+ ADD++++++ Sbjct 63 EDGEDGADDY…ARSESES 122 Query 1117 QVPLTINGQA…PADK 1296 Q+P +TINGQH…P+AKD Sbjct 123 QIPRLTINGS…DHSRD 181 Query 1297 FNSYGFGNVA…DEA 1476 FNSYGFGNW…+DEA Sbjct 182 FNSYGFGNVA…MDEA 241 Query 1477 RQPLSKRVPI…FAI 1656 RQPLRKRVPI…FAI Sbjct 242 RQPLRKRVPI…FAI 300 ESTUDO DIRIGIDO 1. Bancos de dados públicos e internacionais: NCBI, EMBL, DDBJ; 2. Definição de Bioinformática; 3. Análise da sequência no NCBI; 4. Busca de sequências por similaridade; 5. BLAST e Banco de dados de sequências.