propy.PseudoAAC

Instead of using the conventional 20-D amino acid composition to represent a protein sample, Prof. Kuo-Chen Chou proposed the pseudo amino acid (PseAA) composition in order to include sequence-order information. Based on the concept of Chou’s pseudo amino acid composition, the PseAA server was designed in a flexible way, allowing users to generate various kinds of pseudo amino acid composition for a given protein sequence by selecting different parameters and their combinations. This module computes two types of PseAA descriptors: Type I and Type II.

References

[1] Kuo-Chen Chou. Prediction of Protein Cellular Attributes Using Pseudo-Amino Acid Composition. PROTEINS: Structure, Function, and Genetics, 2001, 43: 246-255.
[2] http://www.csbio.sjtu.edu.cn/bioinf/PseAAC/
[3] http://www.csbio.sjtu.edu.cn/bioinf/PseAAC/type2.htm
[4] Kuo-Chen Chou. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics, 2005, 21: 10-19.

Authors: Dongsheng Cao and Yizeng Liang. Date: 2012.9.2 Email: oriental-cds@163.com

The hydrophobicity values are from JACS, 1962, 84: 4240-4246 (C. Tanford).

The hydrophilicity values are from PNAS, 1981, 78: 3824-3828 (T.P. Hopp & K.R. Woods).

The side-chain masses of the 20 amino acids are from:

CRC Handbook of Chemistry and Physics, 66th ed., CRC Press, Boca Raton, Florida (1985).

R.M.C. Dawson, D.C. Elliott, W.H. Elliott, K.M. Jones, Data for Biochemical Research, 3rd ed., Clarendon Press, Oxford (1986).

propy.PseudoAAC.GetAAComposition(ProteinSequence: str) → Dict[Any, Any][source]

Calculate the composition of amino acids for a given protein sequence.

Parameters:ProteinSequence (str) – a pure protein sequence
Returns:

result – a dict containing the composition of the 20 amino acids.

Return type:

Dict[Any, Any]

Examples

>>> from propy.GetProteinFromUniprot import GetProteinSequence
>>> from propy.PseudoAAC import GetAAComposition
>>> protein = GetProteinSequence(ProteinID="Q9NQ39")
>>> result = GetAAComposition(protein)
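
As a rough illustration of what the composition represents, the sketch below computes the percentage of each of the 20 standard residues in a sequence. The helper name aa_composition_sketch, the residue ordering, and the rounding are assumptions made for illustration; this is not propy's implementation.

```python
# Hypothetical helper for illustration only; not part of propy.
AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"  # the 20 standard one-letter codes


def aa_composition_sketch(sequence: str) -> dict:
    """Return the percentage of each standard amino acid in `sequence`."""
    length = len(sequence)
    if length == 0:
        raise ValueError("sequence must not be empty")
    # Percentage of residues equal to each amino acid, rounded to 3 decimals.
    return {aa: round(sequence.count(aa) / length * 100, 3) for aa in AMINO_ACIDS}


composition = aa_composition_sketch("ACDA")
```

The result always has 20 keys, and residues absent from the sequence get a value of 0.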
propy.PseudoAAC.GetAPseudoAAC(ProteinSequence, lamda: int = 30, weight: float = 0.5)[source]

Compute all type II pseudo-amino acid composition descriptors based on the given properties. Note that the number of PAAC descriptors depends strongly on the lamda value: if lamda = 20, we obtain 20+20=40 PAAC descriptors. The magnitude of these values depends on both lamda and weight.

Parameters:
  • ProteinSequence (str) – a pure protein sequence
  • lamda (int) – reflects the rank of correlation and is a non-negative integer, such as 15. Note that (1) lamda should NOT be larger than the length of the input protein sequence; (2) lamda must be a non-negative integer, such as 0, 1, 2, …; (3) when lamda = 0, the output of the PseAA server is the 20-D amino acid composition.
  • weight (float) – lets users put weight on the additional PseAA components with respect to the conventional AA components. Any value in the range from 0.05 to 0.7 can be selected for the weight factor.
Returns:

result – contains calculated 20+lamda PAAC descriptors

Return type:

Dict[Any, Any]

Examples

>>> from propy.GetProteinFromUniprot import GetProteinSequence
>>> from propy.PseudoAAC import GetAPseudoAAC
>>> protein = GetProteinSequence(ProteinID="Q9NQ39")
>>> result = GetAPseudoAAC(protein)
propy.PseudoAAC.GetAPseudoAAC1(ProteinSequence, lamda=30, weight=0.5)[source]

Compute the first 20 type II pseudo-amino acid composition descriptors based on [_Hydrophobicity, _hydrophilicity].

propy.PseudoAAC.GetAPseudoAAC2(ProteinSequence, lamda=30, weight=0.5)[source]

Compute the last lamda type II pseudo-amino acid composition descriptors based on [_Hydrophobicity, _hydrophilicity].

propy.PseudoAAC.GetCorrelationFunction(Ri='S', Rj='D', AAP=None)[source]

Computing the correlation between two given amino acids using the given properties.

Parameters:
  • Ri (str) – the first amino acid (one-letter code)
  • Rj (str) – the second amino acid (one-letter code)
  • AAP (List[Any]) – contains the properties, each of which is a dict.
Returns:

result – the correlation value between the two amino acids.

Examples

>>> from propy.PseudoAAC import GetCorrelationFunction, _Hydrophobicity
>>> result = GetCorrelationFunction(Ri="S", Rj="D", AAP=[_Hydrophobicity])
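
A minimal sketch of the idea, assuming the correlation is the mean squared difference between the two residues' property values, averaged over the supplied properties, as in Chou's formulation [1]. The name correlation_sketch and the toy property values are hypothetical, and the properties are assumed to be normalized already; this is not propy's implementation.

```python
# Hypothetical helper for illustration only; not part of propy.
def correlation_sketch(ri: str, rj: str, properties: list) -> float:
    """Mean squared difference of property values between residues ri and rj."""
    if not properties:
        raise ValueError("at least one property dict is required")
    # One squared difference per property, averaged over all properties.
    return sum((prop[ri] - prop[rj]) ** 2 for prop in properties) / len(properties)


# Toy, already-normalized property tables (made-up values).
toy_props = [{"S": 0.5, "D": -0.5}, {"S": 1.0, "D": -1.0}]
theta = correlation_sketch("S", "D", toy_props)
```

The value is symmetric in the two residues and is zero when both residues are the same.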
propy.PseudoAAC.GetPseudoAAC(ProteinSequence: str, lamda: int = 30, weight: float = 0.05, AAP=None)[source]

Compute all type I pseudo-amino acid composition descriptors based on the given properties. Note that the number of PAAC descriptors depends strongly on the lamda value: if lamda = 20, we obtain 20+20=40 PAAC descriptors. The magnitude of these values depends on both lamda and weight. You must pass the properties to use via AAP.

Parameters:
  • ProteinSequence (str) – a pure protein sequence
  • lamda (int) – reflects the rank of correlation and is a non-negative integer, such as 15. Note that (1) lamda should NOT be larger than the length of the input protein sequence; (2) lamda must be a non-negative integer, such as 0, 1, 2, …; (3) when lamda = 0, the output of the PseAA server is the 20-D amino acid composition.
  • weight (float) – lets users put weight on the additional PseAA components with respect to the conventional AA components. Any value in the range from 0.05 to 0.7 can be selected for the weight factor.
  • AAP (List[Any]) – contains the properties, each of which is a dict form.
Returns:

result – a dict containing the calculated 20+lamda PAAC descriptors.

Examples

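>>> from propy.GetProteinFromUniprot import GetProteinSequence
>>> from propy.PseudoAAC import GetPseudoAAC, _Hydrophobicity, _hydrophilicity
>>> protein = GetProteinSequence(ProteinID="Q9NQ39")
>>> result = GetPseudoAAC(protein, AAP=[_Hydrophobicity, _hydrophilicity])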
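
To show how lamda and weight shape the output, the sketch below assembles a type I descriptor vector from 20 composition values and lamda sequence-order correlation factors, following the general form in [1]: every component is divided by the sum of the compositions plus weight times the sum of the correlation factors. The name pseaac_vector_sketch and the toy inputs are hypothetical, and propy may scale or round its output differently.

```python
# Hypothetical helper for illustration only; not part of propy.
def pseaac_vector_sketch(frequencies: list, thetas: list, weight: float = 0.05) -> list:
    """Combine 20 composition values and lamda correlation factors into one vector."""
    denominator = sum(frequencies) + weight * sum(thetas)
    first_20 = [f / denominator for f in frequencies]            # composition part
    last_lamda = [weight * t / denominator for t in thetas]      # sequence-order part
    return first_20 + last_lamda


# Toy inputs: two residue types at 50% each, and lamda = 2 correlation factors.
freqs = [0.5, 0.5] + [0.0] * 18
thetas = [4.0, 0.0]
vector = pseaac_vector_sketch(freqs, thetas, weight=0.05)
```

With lamda = 2 the vector has 20 + 2 = 22 components that sum to 1, and a larger weight shifts more of that total onto the last lamda components.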
propy.PseudoAAC.GetPseudoAAC1(ProteinSequence, lamda=30, weight=0.05, AAP=None)[source]

Compute the first 20 type I pseudo-amino acid composition descriptors based on the given properties.

propy.PseudoAAC.GetPseudoAAC2(ProteinSequence, lamda: int = 30, weight: float = 0.05, AAP=None)[source]

Compute the last lamda type I pseudo-amino acid composition descriptors based on the given properties.

propy.PseudoAAC.GetSequenceOrderCorrelationFactor(ProteinSequence, k: int = 1, AAP=None)[source]

Compute the sequence-order correlation factor with gap equal to k based on the given properties.

Parameters:
  • ProteinSequence (str) – a pure protein sequence
  • k (int) – the gap.
  • AAP (List[Any]) – contains the properties, each of which is a dict form.
Returns:

result – the correlation factor value with the gap equal to k.

Examples

>>> from propy.GetProteinFromUniprot import GetProteinSequence
>>> from propy.PseudoAAC import GetSequenceOrderCorrelationFactor
>>> protein = GetProteinSequence(ProteinID="Q9NQ39")
>>> result = GetSequenceOrderCorrelationFactor(protein)
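
A minimal sketch of a gap-k factor, assuming it is the average, over all residue pairs k positions apart, of the mean squared property difference between the two residues, as described in [1]. The name order_correlation_factor_sketch and the toy property table are hypothetical; this is not propy's implementation.

```python
# Hypothetical helper for illustration only; not part of propy.
def order_correlation_factor_sketch(sequence: str, k: int, properties: list) -> float:
    """Average pairwise correlation over all residue pairs separated by gap k."""
    n = len(sequence)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(sequence)")
    if not properties:
        raise ValueError("at least one property dict is required")
    total = 0.0
    for i in range(n - k):
        ri, rj = sequence[i], sequence[i + k]
        # Mean squared difference of the two residues over all properties.
        total += sum((p[ri] - p[rj]) ** 2 for p in properties) / len(properties)
    return total / (n - k)


toy_prop = {"A": 1.0, "C": -1.0}  # made-up, already-normalized values
theta_1 = order_correlation_factor_sketch("ACAC", 1, [toy_prop])
theta_2 = order_correlation_factor_sketch("ACAC", 2, [toy_prop])
```

In the alternating toy sequence, neighbours always differ, while residues two positions apart are identical, so the gap-2 factor is zero.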
propy.PseudoAAC.GetSequenceOrderCorrelationFactorForAPAAC(ProteinSequence, k=1)[source]

Compute the sequence-order correlation factors with gap equal to k based on [_Hydrophobicity, _hydrophilicity] for APAAC (type II PseAAC).

Parameters:
  • ProteinSequence (str) – a pure protein sequence
  • k (int) – the gap.
Returns:

result – the correlation factor value with the gap equal to k.

Examples

>>> from propy.GetProteinFromUniprot import GetProteinSequence
>>> from propy.PseudoAAC import GetSequenceOrderCorrelationFactorForAPAAC
>>> protein = GetProteinSequence(ProteinID="Q9NQ39")
>>> result = GetSequenceOrderCorrelationFactorForAPAAC(protein)
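
For the amphiphilic (type II) variant, the sketch below assumes that each property contributes its own factor, computed as the average product of the property values of residues k positions apart, as described in [4]. The name apaac_correlation_factors_sketch and the toy tables are hypothetical; this is not propy's implementation.

```python
# Hypothetical helper for illustration only; not part of propy.
def apaac_correlation_factors_sketch(sequence: str, k: int, properties: list) -> list:
    """Return one gap-k factor per property: the mean product of paired values."""
    n = len(sequence)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(sequence)")
    return [
        sum(prop[sequence[i]] * prop[sequence[i + k]] for i in range(n - k)) / (n - k)
        for prop in properties
    ]


# Made-up, already-normalized stand-ins for the two property tables.
hydrophobicity_toy = {"A": 1.0, "C": -1.0}
hydrophilicity_toy = {"A": 0.5, "C": 2.0}
taus = apaac_correlation_factors_sketch(
    "ACAC", 1, [hydrophobicity_toy, hydrophilicity_toy]
)
```

Unlike the squared-difference form used for type I, a product can be negative, so these factors keep the sign of the correlation.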
propy.PseudoAAC.NormalizeEachAAP(AAP)[source]

All of the amino acid indices are centralized and standardized before the calculation.

Parameters:AAP (dict) – a dict containing the properties of the 20 amino acids.
Returns:

result – a dict containing the normalized properties of the 20 amino acids.

Examples

>>> from propy.PseudoAAC import NormalizeEachAAP, _Hydrophobicity
>>> result = NormalizeEachAAP(AAP=_Hydrophobicity)
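
Centralizing and standardizing can be sketched as subtracting the mean and dividing by the standard deviation. The helper below assumes the population standard deviation is used; the name normalize_property_sketch and the three-residue toy table are hypothetical, and this is not propy's implementation.

```python
import math


# Hypothetical helper for illustration only; not part of propy.
def normalize_property_sketch(prop: dict) -> dict:
    """Shift values to zero mean and scale them to unit (population) variance."""
    values = list(prop.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        raise ValueError("all property values are identical")
    return {aa: (v - mean) / std for aa, v in prop.items()}


toy = {"A": 1.0, "C": 2.0, "D": 3.0}  # made-up values
normalized = normalize_property_sketch(toy)
```

After normalization, properties measured on different scales contribute comparably to the correlation values.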