[BioPython] Entrez

2024. 9. 19. 23:09ㆍBioinformatics

0. Bio.Entrez lets you search and retrieve data hosted at NCBI.

The basic usage looks like this.

from Bio import Entrez
Entrez.email = "your_email@~~.com"  # NCBI asks for a contact address
# retmode can be "text", "xml", ...; Entrez.read can only parse XML
handle = Entrez.efetch(db="nucleotide", id="id", rettype="gb", retmode="xml")
record = Entrez.read(handle)
handle.close()
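
Which reader to use depends on the retmode you asked for. Here is a minimal sketch of that choice. The helper name `read_efetch_result` is hypothetical (it is not part of Biopython), and the handle below is a stand-in built with `io.StringIO` so the snippet runs without contacting NCBI.

```python
import io


def read_efetch_result(handle, retmode):
    """Read an efetch handle according to the retmode that was requested."""
    if retmode == "xml":
        # Entrez.read only understands XML. Imported here so that the
        # text branch below also runs where Biopython is not installed.
        from Bio import Entrez
        return Entrez.read(handle)
    if retmode == "text":
        # Plain-text output (e.g. a GenBank flat file) comes back as one string.
        return handle.read()
    raise ValueError(f"unsupported retmode: {retmode!r}")


# Stand-in for a text-mode handle; a real one would come from Entrez.efetch(...).
fake_handle = io.StringIO("placeholder GenBank text\n")
print(read_efetch_result(fake_handle, "text"))
```

For text-mode GenBank output, passing the handle to `Bio.SeqIO.read(handle, "genbank")` instead of `handle.read()` should give a `SeqRecord` rather than a raw string.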

1. Trying it out

Let's find out which species the id NC_001367.1 belongs to and look at some of its properties.

from Bio import Entrez
Entrez.email = "my_@email.com"
handle = Entrez.efetch(db="nucleotide", id="NC_001367.1", rettype="gb", retmode="xml")
records = Entrez.parse(handle)
for record in records:
    print(record["GBSeq_locus"])
    print(record["GBSeq_definition"])
    ...

output

NC_001367
Tobacco mosaic virus, complete genome
single RNA
6395 bp
3 journals
541501
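
The loop above elides most of its print calls, so the following is only a guess at how lines such as "single RNA" and "6395 bp" could be produced. It assumes the standard GBSeq XML keys `GBSeq_strandedness`, `GBSeq_moltype`, `GBSeq_length` and `GBSeq_references`. The function `summarize_gbseq` is hypothetical, and `sample` is a hand-written stand-in for a parsed record, not real Entrez output.

```python
def summarize_gbseq(record):
    """Return a few human-readable lines describing one GBSeq record."""
    references = record.get("GBSeq_references", [])
    return [
        record.get("GBSeq_locus", "?"),
        record.get("GBSeq_definition", "?"),
        f'{record.get("GBSeq_strandedness", "?")} {record.get("GBSeq_moltype", "?")}',
        f'{record.get("GBSeq_length", "?")} bp',
        f"{len(references)} references",
    ]


# Hand-written stand-in for a parsed record, reusing values from the output above.
sample = {
    "GBSeq_locus": "NC_001367",
    "GBSeq_definition": "Tobacco mosaic virus, complete genome",
    "GBSeq_strandedness": "single",
    "GBSeq_moltype": "RNA",
    "GBSeq_length": "6395",
    "GBSeq_references": [{}, {}, {}],  # placeholder entries, contents omitted
}
for line in summarize_gbseq(sample):
    print(line)
```

Using `.get` with a fallback keeps the summary from failing on records that lack one of these fields.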

2. Entrez.einfo()

Use this when you want to check which databases Entrez offers.

from Bio import Entrez 
Entrez.email = "youremail" 
handle = Entrez.einfo() 
record = Entrez.read(handle) 
print(record) 
print(len(record["DbList"]))

output

{'DbList': ['pubmed', 'protein', 'nuccore', 'ipg', 'nucleotide', 'structure', 'genome', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo', 'books', 'cdd', 'clinvar', 'gap', 'gapplus', 'grasp', 'dbvar', 'gene', 'gds', 'geoprofiles', 'medgen', 'mesh', 'nlmcatalog', 'omim', 'orgtrack', 'pmc', 'popset', 'proteinclusters', 'pcassay', 'protfam', 'pccompound', 'pcsubstance', 'seqannot', 'snp', 'sra', 'taxonomy', 'biocollections', 'gtr']}
40
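
One practical use of this list is checking a database name before passing it to another Entrez call. A small sketch follows. `has_database` is a hypothetical helper, and `sample` is a shortened stand-in for the record printed above.

```python
def has_database(einfo_record, name):
    """Return True if `name` appears in the DbList of an einfo record."""
    return name in einfo_record.get("DbList", [])


# Shortened stand-in for the record printed above.
sample = {"DbList": ["pubmed", "protein", "nuccore", "nucleotide", "taxonomy"]}
print(has_database(sample, "nucleotide"))
print(has_database(sample, "swissprot"))
```

As far as I know, `Entrez.einfo` also accepts a `db` argument (for example `db="pubmed"`), in which case the parsed record carries a `DbInfo` entry describing that one database instead of `DbList`.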

3. Entrez.esearch()

Use Entrez.esearch() when you want to search one of the 40 Entrez databases. Call esearch, then read the result with Entrez.read.

from Bio import Entrez 

Entrez.email = "youremail"
handle = Entrez.esearch(db = "pubmed", term = "bioinformatics") 
record = Entrez.read(handle) 
print(record["Count"])

output
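
The parsed esearch record holds more than `Count`: it should also include an `IdList` with the ids of the matching entries. The sketch below pulls both out. `summarize_search` is a hypothetical helper, and the ids in `sample` are made up for illustration, not real PubMed results.

```python
def summarize_search(record):
    """Return (total_hits, returned_ids) from a parsed esearch record."""
    # Count arrives as a string, so convert it before doing arithmetic.
    total = int(record["Count"])
    ids = list(record.get("IdList", []))
    return total, ids


# Made-up stand-in for a parsed esearch record.
sample = {"Count": "3", "IdList": ["111", "222", "333"]}
total, ids = summarize_search(sample)
print(total, ids)
```

`Count` is the total number of hits, while `IdList` only holds as many ids as were requested. To my knowledge esearch returns 20 ids by default, and passing `retmax` to `Entrez.esearch` raises that limit.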
