Installation
pip install propy3
Download proteins from Uniprot
You can get a protein sequence from the Uniprot website by providing a Uniprot ID:
from propy.GetProteinFromUniprot import GetProteinSequence as gps
uniprotid = "P48039"
proseq = gps(uniprotid)
print(proseq)
gives
MQGNGSALPNASQPVLRGDGARPSWLASALACVLIFTIVVDILGNLLVILSVYRNKKLRNAGNIFVVSLAVA\
DLVVAIYPYPLVLMSIFNNGWNLGYLHCQVSGFLMGLSVIGSIFNITGIAINRYCYICHSLKYDKLYSSKNS\
LCYVLLIWLLTLAAVLPNLRAGTLQYDPRIYSCTFAQSVSSAYTIAVVVFHFLVPMIIVIFCYLRIWILVLQ\
VRQRVKPDRKPKLKPQDFRNFVTMFVVFVLFAICWAPLNFIGLAVASDPASMVPRIPEWLFVASYYMAYFNS\
CLNAIIYGLLNQNFRKEYRRIIVSLCTARVFFVDSSNDVADRVKWKPSPLMTNNNVVKVDSV
You can get all sub-sequences of length window × 2 + 1 whose central residue is the given amino acid ToAA.
from propy import GetSubSeq
subseq = GetSubSeq.GetSubSequence(proseq, ToAA="S", window=5)
print(subseq)
gives
[
"MQGNGSALPNA",
"ALPNASQPVLR",
"DGARPSWLASA",
"PSWLASALACV",
"LLVILSVYRNK",
"NIFVVSLAVAD",
"PLVLMSIFNNG",
"LHCQVSGFLMG",
"FLMGLSVIGSI",
"LSVIGSIFNIT",
"CYICHSLKYDK",
"YDKLYSSKNSL",
"DKLYSSKNSLC",
"YSSKNSLCYVL",
"DPRIYSCTFAQ",
"CTFAQSVSSAY",
"FAQSVSSAYTI",
"AQSVSSAYTIA",
"GLAVASDPASM",
"ASDPASMVPRI",
"WLFVASYYMAY",
"MAYFNSCLNAI",
"RRIIVSLCTAR",
"VFFVDSSNDVA",
"FFVDSSNDVAD",
"VKWKPSPLMTN",
]
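To make the windowing rule concrete, here is a minimal pure-Python sketch of the same idea. It is not the propy implementation: the function name windows_around is hypothetical, and skipping residues that sit too close to either end of the sequence is an assumption, although it is consistent with the listing above.

```python
def windows_around(sequence, residue, window):
    """Return every full-length window centred on `residue`.

    Hypothetical helper for illustration; not part of propy.
    """
    found = []
    for i, aa in enumerate(sequence):
        # Assumption: centres too close to either end are skipped,
        # so every result has exactly 2 * window + 1 residues.
        if aa == residue and i - window >= 0 and i + window < len(sequence):
            found.append(sequence[i - window : i + window + 1])
    return found


# The first 17 residues of the sequence downloaded earlier.
print(windows_around("MQGNGSALPNASQPVLR", "S", 5))
# ['MQGNGSALPNA', 'ALPNASQPVLR']
```

The two results match the first two entries of the output shown above.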
You can also get several protein sequences by providing a file containing Uniprot IDs of these proteins.
from propy.GetProteinFromUniprot import GetProteinSequenceFromTxt as gpst
tag = gpst("propy/data", "target.txt", "target1.txt")
prints
--------------------------------------------------------------------------------
The 1 protein sequence has been downloaded!
MADSCRNLTYVRGSVGPATSTLMFVAGVVGNGLALGILSARRPARPSAFAVLVTGLAATDLLGTSFLSPAVFVAYARNSSLLGLARGGPALCDAFAFAMTFFGLASMLILFAMAVERCLALSHPYLYAQLDGPRCARLALPAIYAFCVLFCALPLLGLGQHQQYCPGSWCFLRMRWAQPGGAAFSLAYAGLVALLVAAIFLCNGSVTLSLCRMYRQQKRHQGSLGPRPRTGEDEVDHLILLALMTVVMAVCSLPLTIRCFTQAVAPDSSSEMGDLLAFRFYAFNPILDPWVFILFRKAVFQRLKLWVCCLCLGPAHGDSQTPLSQLASGRRDPRAPSAPVGKEGSCVPLSAWGEGQVEPLPPTQQSSGSAVGTSSKAEASVACSLC
--------------------------------------------------------------------------------
TODO: HTTP Error 300!
The downloaded protein sequences have been saved in “propy/data/target1.txt”.
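If you want to inspect an ID file before downloading, a small sketch like the following may help. It assumes one Uniprot ID per line, which this page does not confirm, and the helper name read_uniprot_ids and the second ID are hypothetical placeholders.

```python
def read_uniprot_ids(lines):
    """Collect non-empty, whitespace-trimmed IDs from an iterable of lines.

    Hypothetical helper; assumes one ID per line.
    """
    ids = []
    for line in lines:
        entry = line.strip()
        if entry:  # ignore blank lines
            ids.append(entry)
    return ids


# "P00000" is a placeholder, not a real entry used by this tutorial.
print(read_uniprot_ids(["P48039\n", "\n", "  P00000 \n"]))
# ['P48039', 'P00000']
```

The function accepts any iterable of strings, so an open file object can be passed directly.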
You can check whether the input sequence is a valid protein sequence:
from propy import ProCheck
temp = ProCheck.ProteinCheck(proseq)
print(temp)
which prints 350. This output is the number of amino acids in the protein sequence if it is valid; otherwise 0.
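The return convention (length if valid, 0 otherwise) can be pictured with the sketch below. It is an illustration only: the name checked_length is hypothetical, and treating exactly the 20 standard one-letter codes as valid is an assumption about what the library accepts.

```python
# Assumption: only the 20 standard amino acid codes count as valid.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def checked_length(sequence):
    """Return the sequence length if every residue is standard, else 0.

    Hypothetical helper for illustration; not part of propy.
    """
    cleaned = sequence.strip().upper()
    if cleaned and all(aa in STANDARD_AMINO_ACIDS for aa in cleaned):
        return len(cleaned)
    return 0


print(checked_length("MQGNGSALPNA"))  # 11
print(checked_length("MQGNGXALPNA"))  # 0, because X is not a standard code
```

Because 0 is falsy in Python, a result of this form can be used directly in an if statement to filter out invalid sequences.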
Obtaining the property from the AAindex database
You can get the properties of amino acids from the AAindex database by providing a property name (e.g., KRIW790103). The output is returned as a dictionary.
If you provide the directory containing the AAindex database, the program reads the property from those local files. The database can be downloaded from ftp://ftp.genome.jp/pub/db/community/aaindex/ and consists of three files: aaindex1, aaindex2 and aaindex3.
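Once you have a property as a dictionary, one common use is to average it over a sequence. The sketch below assumes the dictionary maps one-letter amino acid codes to numeric values; the helper name mean_property and the toy values are hypothetical and are not taken from AAindex.

```python
def mean_property(sequence, prop):
    """Average a per-residue property over a sequence.

    Residues missing from `prop` are ignored. Hypothetical helper.
    """
    values = [prop[aa] for aa in sequence if aa in prop]
    return sum(values) / len(values) if values else 0.0


# Toy values for illustration only; not real AAindex data.
toy_property = {"A": 1.0, "G": 0.0, "S": 2.0}

print(mean_property("AGS", toy_property))  # 1.0
```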