[Ferritin] Analysis of Ferritin in Lactiplantibacillus plantarum

2024. 11. 16. 21:59 Data_Analysis

1. Building the dataset of ferritin expressed in Lactiplantibacillus plantarum

I opened the NCBI Protein site and searched for "Ferritin Lactiplantibacillus plantarum".

The search returned 43 results, and I combined all of them into a single file, ferritinL.plantarum.raw.fasta.

>ALO75854.1 ferritin (plasmid) [Lactiplantibacillus plantarum]
MKYTKTKAVLNQLVADLSQMSMIIHQTHWYMRGPNFLKLHPLMDEFMEEIDSQLDVISERLIALDGSPYS
TLKEMAENTKIQDWPGEWDKTTPERLAHLVDGYRYLEDLYQHGIEVSDVEKDFSTQDIFIGLKTAIEKKI
WMIQAELGSAPEVDE

>TEA94412.1 ferritin [Lactiplantibacillus plantarum]
MKYTKTKAVLNQLVADLSQMSMIIHQTHWYMRGPNFLKLHPLMDEFMEEIDSQLDVISERLIALDGSPYS
TLKEMAENTKIQDWPGEWDKTTPERLAHLVDGYRCLEDLYQHGIEVSDVEKDFSTQDIFIGLKTAIEKKI
WMIQAELGSAPEIDE

>TEA92010.1 ferritin [Lactiplantibacillus plantarum]
MKYTKTKAVLNQLVADLSQMSMIIHQTHWYMRGPNFLKLHPLMDEFMEEIDSQLDVISERLIALDGSPYS
TLKEMAENTKIQDWPGEWDKTTPERLAHLVDGYRYLEDLYQHGIEVSDVEKDFSTQDIFIGLKTAIEKKI
WMIQAELGSAPEIDE

>TEA91991.1 ferritin, partial [Lactiplantibacillus plantarum]
MKYTKTKAVLNQLVADLSQMSMIIHQTHWYMRGPNFLKLH

>PHM02949.1 ferritin (plasmid) [Lactiplantibacillus plantarum]
MKYTKTKAVLNQLVADLSQMSMIIHQTHWYMRGPNFLKLHPLMDEFMEEIDSQLDVISERLIALDGSPYS
TLKEMAENTKIQDWPGEWDKTTPERLAHLVDGYRYLEDLYQHGIEVSDVEKDFSTQDIFIGLKTAIEKKI
WMIQAELGSAPEIDE



>BBA81365.1 ribonucleoside-diphosphate reductase, beta chain [Lactiplantibacillus plantarum]
MATDLAYYQKLLSNGNYKAINWDRVSDAIDKSTWEKLTEQFWLDTRIPVSNDMADWRELDDDHRWVVGHV
FGGLTLLDTLQSQDGLQALRRNVLTSHETAVLNNIQFMESVHAKSYSTIFETLNTPDEINEIFDWSDSEE
FLQAKAQWIYKLYDNIDEDPLKQKVANVFLETFLFYSGFYTPLYYLGHNQLPNVAEIIKLILRDESVHGT
YIGYKFQLGFKDRSEKQQAEFKDWMFDFLYKLYENEENYIHLVYDQIGWSDEVLTFSRYNANKALMNLGQ
DALFPDTAEDVNPVVMNGISTGTSNHDFFSQVGNGYRLGQVEAMQDTDYDIGNPDD

>CDN28226.1 hypothetical protein LP80_1530 [Lactiplantibacillus plantarum]
MSELTIDEQYAAELKQSDIDHHVPTAGAMTNHILSNLMVAYVKLTQVKWYVKGPQSLALRTAYQRLLDQN
VRQFAELGELLLDENQKPSSTTAELTKYSMLEENGAFKYQSADELVAATIKDFDTENLFVDRAIKLAEKE
TRPALAAWLVAYRGSNNRNIRELQVYLGNDARTGLDEEDEDDD

The records above are an excerpt from the file. A closer look shows that it also contains proteins other than ferritin.
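To see at a glance which annotations are mixed into the raw file, the FASTA headers can be tallied by protein name. The following is a minimal sketch using only the standard library; the function name `tally_annotations` is hypothetical, and it assumes NCBI-style headers of the form `>ACCESSION name [Organism]`. The example headers are taken from the excerpt above, with the sequences shortened for brevity.

```python
from collections import Counter
import re


def tally_annotations(fasta_text):
    """Count the protein names found in NCBI-style FASTA headers."""
    counts = Counter()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # Assumed header layout: ">ACCESSION protein name [Organism]"
        header = line[1:].strip()
        _, _, rest = header.partition(" ")
        # Remove the trailing "[Organism]" part to keep only the protein name.
        name = re.sub(r"\s*\[[^\]]*\]\s*$", "", rest)
        counts[name] += 1
    return counts


# Headers copied from the excerpt; sequences truncated for illustration only.
example = """>ALO75854.1 ferritin (plasmid) [Lactiplantibacillus plantarum]
MKYTKTKAVLNQ
>TEA94412.1 ferritin [Lactiplantibacillus plantarum]
MKYTKTKAVLNQ
>BBA81365.1 ribonucleoside-diphosphate reductase, beta chain [Lactiplantibacillus plantarum]
MATDLAYYQKLL
>CDN28226.1 hypothetical protein LP80_1530 [Lactiplantibacillus plantarum]
MSELTIDEQYAA
"""

for name, n in tally_annotations(example).most_common():
    print(n, name)
```

Running the same function on the full raw file (read with `open(...).read()`) would list every annotation that the search pulled in, which makes it easier to decide what to filter out in the next step.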


2. Data preprocessing

I removed duplicate sequences, partial sequences, sequences containing "X", and sequences that are not annotated as ferritin.

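from Bio import SeqIO

# Input: raw NCBI download. Output: filtered, de-duplicated FASTA.
raw_path = "/home/rudlab/projects/ferritinL.plantarumAn/data/ferritinL.plantarum.raw.fasta"
clean_path = "/home/rudlab/projects/ferritinL.plantarumAn/data/ferritinL.plantarum.fasta"

seen = set()  # sequences already written, used to drop exact duplicates
with open(clean_path, "w") as handle:
    for record in SeqIO.parse(raw_path, "fasta"):
        sequence = str(record.seq)  # plain string, safe to hash and compare
        # Keep only records whose description contains "ferritin".
        # This is a substring match, so "ferritin-like" annotations also pass.
        if "ferritin" not in record.description:
            continue
        # Drop sequences containing the unknown-residue code X.
        if "X" in sequence:
            continue
        # Drop fragments.
        if "partial" in record.description or "truncated" in record.description:
            continue
        # Write each distinct sequence only once.
        if sequence not in seen:
            seen.add(sequence)
            SeqIO.write(record, handle, "fasta")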
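As a sanity check on the filter, it can help to count how many records are dropped for each reason. The sketch below repeats the same checks in the same order without Biopython; the function name `drop_reason` and the reason labels are hypothetical, and the example records reuse headers from the raw excerpt with shortened sequences.

```python
from collections import Counter


def drop_reason(description, sequence, seen):
    """Return why a record would be dropped, or None if it is kept.

    Mirrors the checks of the filter above, in the same order.
    """
    if "ferritin" not in description:
        return "not ferritin"
    if "X" in sequence:
        return "contains X"
    if "partial" in description or "truncated" in description:
        return "partial"
    if sequence in seen:
        return "duplicate"
    seen.add(sequence)
    return None


# (description, sequence) pairs; sequences truncated for illustration only.
records = [
    ("TEA92010.1 ferritin [Lactiplantibacillus plantarum]", "MKYTKTKAVLNQ"),
    ("PHM02949.1 ferritin (plasmid) [Lactiplantibacillus plantarum]", "MKYTKTKAVLNQ"),
    ("TEA91991.1 ferritin, partial [Lactiplantibacillus plantarum]", "MKYTKTKAVLNQ"),
    ("BBA81365.1 ribonucleoside-diphosphate reductase, beta chain "
     "[Lactiplantibacillus plantarum]", "MATDLAYYQKLL"),
]

seen = set()
summary = Counter(drop_reason(d, s, seen) or "kept" for d, s in records)
print(summary)
```

If the "kept" count does not match the number of records in the cleaned FASTA file, one of the filter conditions is probably behaving differently from what was intended.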

 

 

3. Multiple sequence alignment (MSA)

https://www.ebi.ac.uk/jdispatcher/msa/muscle?stype=protein

 

I ran the MSA with MUSCLE through the EMBL-EBI web service linked above.

The processed FASTA file from step 2 is given as the input.

 

Output:

CLUSTAL multiple sequence alignment by MUSCLE (3.8)


EQM54203.1      -------MKYT-------------KTKEVLNQLVADLSQMSMIIHQTHWYMRGPNFLKLH
TEA94412.1      -------MKYT-------------KTKAVLNQLVADLSQMSMIIHQTHWYMRGPNFLKLH
ALO75854.1      -------MKYT-------------KTKAVLNQLVADLSQMSMIIHQTHWYMRGPNFLKLH
KLD56812.1      -------MKYT-------------KTKAVLNQLVADLSQMSMIIHQTHWYMRGPNFLKLH
TEA92010.1      -------MKYT-------------KTKAVLNQLVADLSQMSMIIHQTHWYMRGPNFLKLH
BBM21960.1      MSELTIDEQYAAELKQSDIDHHVPTAGAMTNHILSNLMVAYVKLSQVKWYVKGPQSLALR
EPD22853.1      MSELTIDEQYAAELKQSDIDHHVPTAGAMTNHILSNLMVAYVKLTQVKWYVKGPQSLALR
                        :*:             .:  : *:::::*    : : *.:**:.**: * *.

EQM54203.1      PLMDEFMEEIDSQLDVISERLIALDGNPYSTLKEMADNTKIKDWPGTWDKTTPERLAHLV
TEA94412.1      PLMDEFMEEIDSQLDVISERLIALDGSPYSTLKEMAENTKIQDWPGEWDKTTPERLAHLV
ALO75854.1      PLMDEFMEEIDSQLDVISERLIALDGSPYSTLKEMAENTKIQDWPGEWDKTTPERLAHLV
KLD56812.1      PLMDEFMEEIDSQLDVISERLIALDGSPYSTLKEMVENTKIQDWPGEWDKTTPERLAHLV
TEA92010.1      PLMDEFMEEIDSQLDVISERLIALDGSPYSTLKEMAENTKIQDWPGEWDKTTPERLAHLV
BBM21960.1      TEYQQLIDQNVRQFAELGDLLLDENQKPSSTTAELTKYSMLEENGAFKYQSADELVAATI
EPD22853.1      TAYQRLLDQNVRQFAELGELLLDENQKPSSTTAELTKYSMLEENGAFKYQSVDELVAATI
                .  : ::::   *:  :.: *:  : .* **  *:.. : :::. . . ::. * :*  :

EQM54203.1      DGYRYLEDLYQHGIEVSDVEKDFSTQDIFIGLKTAIEKKIWMIQAELGSAPEI-------
TEA94412.1      DGYRCLEDLYQHGIEVSDVEKDFSTQDIFIGLKTAIEKKIWMIQAELGSAPEI-------
ALO75854.1      DGYRYLEDLYQHGIEVSDVEKDFSTQDIFIGLKTAIEKKIWMIQAELGSAPEV-------
KLD56812.1      DGYRYLEDLYQHGIEVSDVEKDFSTQDIFIGLKTAIEKKIWMIQAELGSAPEI-------
TEA92010.1      DGYRYLEDLYQHGIEVSDVEKDFSTQDIFIGLKTAIEKKIWMIQAELGSAPEI-------
BBM21960.1      KDFDTENLFVDRAIKLAEKETRPALAAWLVAYRGSNNRNIRELQAYLGNDARTGLDEEDE
EPD22853.1      KDFDTENLFVDRAIKLAEKENRPALAAWLVAYRGSNNRNIRELQAYLGNDARTGLDEEDE
                ..:   : : :..*:::: *.  :    ::. . : :.:*. :** **. .         

EQM54203.1      --DE
TEA94412.1      --DE
ALO75854.1      --DE
KLD56812.1      --DE
TEA92010.1      --DE
BBM21960.1      DDDD
EPD22853.1      DDDD
                  *:
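The alignment above appears to contain two groups: five nearly identical sequences and two longer ones that align poorly against them. One way to put a number on this is pairwise percent identity. The sketch below parses CLUSTAL-formatted text and compares aligned rows; `parse_clustal` and `percent_identity` are hypothetical helper names, and the identity definition used here (matches divided by the columns where at least one sequence has a residue) is one of several common conventions, chosen as an assumption. Only the first block of the output is used as example data.

```python
def parse_clustal(text):
    """Collect aligned sequences from CLUSTAL-formatted text, keyed by ID."""
    aligned = {}
    for line in text.splitlines():
        # Skip blank lines, the header, and conservation rows (start with a space).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        seq_id, segment = line.split()[:2]
        aligned[seq_id] = aligned.get(seq_id, "") + segment
    return aligned


def percent_identity(a, b):
    """Identity over columns where at least one sequence has a residue."""
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# First block of the MUSCLE output above, three rows only.
block = """CLUSTAL multiple sequence alignment by MUSCLE (3.8)

EQM54203.1      -------MKYT-------------KTKEVLNQLVADLSQMSMIIHQTHWYMRGPNFLKLH
TEA94412.1      -------MKYT-------------KTKAVLNQLVADLSQMSMIIHQTHWYMRGPNFLKLH
BBM21960.1      MSELTIDEQYAAELKQSDIDHHVPTAGAMTNHILSNLMVAYVKLSQVKWYVKGPQSLALR
"""

aln = parse_clustal(block)
ids = list(aln)
for i, first in enumerate(ids):
    for second in ids[i + 1:]:
        print(first, second, round(percent_identity(aln[first], aln[second]), 1))
```

Feeding the complete alignment file to `parse_clustal` would give identities over the full length rather than over this single block.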

 

 

4. Drawing a WebLogo

https://weblogo.berkeley.edu/logo.cgi

 


I drew the sequence logo on this site.

 

Output: (sequence logo image)

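The empty regions in the logo plot presumably appear because the ferritin-like domain protein, DNA-binding ferritin-like protein, and DPS family sequences differ in length from the ferritin sequences, which leaves gaps in the alignment.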
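One way to check this guess is to compute, for each alignment column, the fraction of sequences that have a gap there. Columns where most sequences are gaps are candidates for the empty regions in the logo. The sketch below is a minimal version of that calculation; `gap_fractions` is a hypothetical helper name, and the example data is the final four-column block of the MUSCLE output from step 3. How WebLogo itself treats gapped columns is not assumed here.

```python
def gap_fractions(rows):
    """Fraction of gap characters ('-') in each alignment column."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    # zip(*rows) walks the alignment column by column.
    return [col.count("-") / len(col) for col in zip(*rows)]


# Final block of the alignment (C-terminal columns), in output order.
last_block = {
    "EQM54203.1": "--DE",
    "TEA94412.1": "--DE",
    "ALO75854.1": "--DE",
    "KLD56812.1": "--DE",
    "TEA92010.1": "--DE",
    "BBM21960.1": "DDDD",
    "EPD22853.1": "DDDD",
}

fractions = gap_fractions(list(last_block.values()))
for i, f in enumerate(fractions, start=1):
    print(f"column {i}: {f:.2f} gaps")
```

Applied to the full alignment, the same function would show whether the gap-heavy columns line up with the blank stretches of the logo.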