PolyCTLDesigner: a program for constructing polyepitope immunogens






1. REQUIREMENTS.

Note! To use the program you need to install the following software:

  1. Python 2.7.x (not Python 3.x). It can be downloaded from http://www.python.org

  2. numpy - it can be downloaded from http://numpy.org/; I'm currently using numpy version 1.5.1.

  3. biopython - it can be downloaded from http://biopython.org/; I'm currently using biopython version 1.59.

  4. graph - the python-graph package; I'm using python-graph version 1.3.1, but newer versions should also work.

       The package can be downloaded from http://code.google.com/p/python-graph/.

  5. pyevolve - this module can be found at http://pyevolve.sourceforge.net/; I'm currently using pyevolve version 0.5.
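
Before running the program it may help to check that the requirements listed above are importable. The following is only a sketch: the import names are assumptions (biopython is imported as "Bio"; python-graph 1.3.x is assumed to be importable as "graph", as the list above suggests), and the helper names are hypothetical.

```python
import importlib
import sys

# Assumed import names for the packages listed above (not taken from the
# PolyCTLDesigner sources): biopython -> "Bio", python-graph -> "graph".
REQUIRED_MODULES = ["numpy", "Bio", "graph", "pyevolve"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def is_python_27(version_info=sys.version_info):
    """Return True if the given version tuple is Python 2.7.x."""
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    if not is_python_27():
        print("Warning: Python 2.7.x is required, found %d.%d"
              % sys.version_info[:2])
    for name in missing_modules(REQUIRED_MODULES):
        print("Missing module: %s" % name)
```

If the script prints nothing, the interpreter version and all four modules were found.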



2. PACKAGE DESCRIPTION.

TEpredict.py - This module contains functions to predict T-cell epitopes (either MHC class I- or II-restricted).
The module has detailed comments, so if you are interested in a particular function or class, I advise you to have a look at those comments.

LocusGroupAlleleFreq.py - This module contains the genotypic frequencies of various HLA class I alleles.
The data were taken from http://www.ncbi.nlm.nih.gov/projects/gv/mhc/ihwg.cgi

PolyCTLDesigner2.py - This is the main module of the PolyCTLDesigner package. However, the majority of the functions defined here are deprecated. The only functions actively used to date are:
  1. chooseNlink - adds amino acid residues to a peptide's N-terminus to optimise its binding to TAP;
  2. graph_cre4 - selects optimal spacer sequences for all possible pairs of peptides;
  3. graph_autoselect - selects the minimal set of peptides covering the desired HLA repertoire with the desired redundancy rate. By default the redundancy rate is 5, i.e. the program will pick at least 5 epitopes restricted by each particular HLA allele, provided that the number of such peptides in the set is greater than or equal to 5.
The module source code has commented lines with examples, but I advise you to use the examples from the designer.py module.
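
The selection rule described for graph_autoselect can be illustrated with a simple greedy cover. This is a minimal sketch under stated assumptions, not the algorithm actually used in PolyCTLDesigner2.py: the function name select_peptides, the input format (peptide -> set of restricting alleles) and the peptide/allele names are all hypothetical, and a greedy strategy does not guarantee a truly minimal set.

```python
def select_peptides(restrictions, redundancy=5):
    """Greedily pick peptides until each allele is covered `redundancy` times.

    restrictions maps peptide -> set of HLA alleles restricting it.
    An allele with fewer than `redundancy` peptides only needs all of them.
    """
    # Count peptides per allele, then cap the requirement at `redundancy`.
    need = {}
    for alleles in restrictions.values():
        for allele in alleles:
            need[allele] = need.get(allele, 0) + 1
    for allele in need:
        need[allele] = min(need[allele], redundancy)

    chosen = []
    remaining = set(restrictions)
    while any(n > 0 for n in need.values()):
        # Pick the peptide that helps the most under-covered alleles;
        # sorting makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining),
                   key=lambda p: sum(1 for a in restrictions[p] if need[a] > 0))
        chosen.append(best)
        remaining.remove(best)
        for allele in restrictions[best]:
            if need[allele] > 0:
                need[allele] -= 1
    return chosen


# Hypothetical toy data.
restrictions = {
    "PEP_A": {"HLA-A*02:01", "HLA-A*03:01"},
    "PEP_B": {"HLA-A*02:01"},
    "PEP_C": {"HLA-B*07:02"},
    "PEP_D": {"HLA-A*03:01", "HLA-B*07:02"},
}
print(select_peptides(restrictions, redundancy=1))  # -> ['PEP_A', 'PEP_C']
```

With redundancy=2 every allele needs two peptides, so all four toy peptides are selected.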

epitopes_count.py - This module contains functions to construct spacer sequences, to rank the spacers for a given pair of peptides and to choose the optimal one.
To change the ranking function or the weights of the selected criteria, modify the function return_edge in the epitopes_count.py module (line #33); the weight is computed by the code in lines #119-120.

construct_graph.py - Using the output of the previous module, this module creates a directed weighted graph with nodes corresponding to target epitopes and edges corresponding to optimal epitope junctions.

TSP_polyCTL.py - This module is used to find the longest simple path with the least weight in the constructed digraph.
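
The idea behind construct_graph.py and TSP_polyCTL.py can be sketched on a toy example. The code below is an illustration only: it assumes that a path through all epitopes exists and finds the lightest one by exhaustive search, which is feasible only for a handful of epitopes. The function name best_path, the weight dictionary format and the epitope labels are hypothetical and are not taken from the package.

```python
from itertools import permutations


def best_path(nodes, weights):
    """Return (path, total_weight) for the lightest path visiting all nodes.

    weights maps (source, target) -> junction weight; a missing key means
    there is no edge. Returns (None, None) if no such path exists.
    """
    best, best_weight = None, None
    for order in permutations(sorted(nodes)):
        total = 0
        for pair in zip(order, order[1:]):
            if pair not in weights:
                break  # this ordering uses a non-existent junction
            total += weights[pair]
        else:
            # Reached only if every consecutive pair had an edge.
            if best_weight is None or total < best_weight:
                best, best_weight = list(order), total
    return best, best_weight


# Hypothetical junction weights between three epitopes (lower is better).
weights = {
    ("E1", "E2"): 1, ("E2", "E1"): 4,
    ("E2", "E3"): 2, ("E3", "E2"): 1,
    ("E1", "E3"): 5, ("E3", "E1"): 1,
}
print(best_path(["E1", "E2", "E3"], weights))  # -> (['E3', 'E1', 'E2'], 2)
```

Because the number of orderings grows factorially with the number of epitopes, exhaustive search like this quickly becomes impractical for larger sets.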

designer.py - This is the control flow module. The most important functions of the module are described below.
                            
3. THE MOST IMPORTANT FUNCTIONS OF DESIGNER.PY MODULE.
       
predict_all(x):
    predict(x, sel_mods, sel_mods.keys(), tap = tap,
                prot = prot, immprot = immprot, N=11, thrsh_t=1.0,
                thrsh = 6.3, method = 'N-extended',thrsh_p=10,thrsh_i=10)
    This function is used to predict HLA class I binders (potential CTL epitopes).
    If you want to use other parameters for prediction, you should change them here:
      x           - an instance of class "Antigen"
      thrsh     - the T-cell epitope prediction method threshold; in general, you should make it higher to increase the specificity
      thrsh_t  - the TAP-binding prediction threshold; in general, you should make it higher to increase the sensitivity
      N          - the maximal length of N-extended precursor peptides to be considered (it might be increased up to 15-18)
      method   - it can be set to "9" to consider only nonameric peptides when predicting peptide-binding to TAP
                      or to "N-extended", if you want to consider longer peptides
      tap         - can be set either to tap or to False (to disable TAP-filtering)
      prot       - can be set either to prot or to False (to disable proteasome-filtering)
      immprot  - can be set either to immprot or to False (to disable immunoproteasome-filtering)
      thrsh_p  - the proteasomal cleavage prediction threshold (an integer from 1 to 10);
                      in general, you should make it higher to increase the sensitivity
      thrsh_i   - the immunoproteasomal cleavage prediction threshold (an integer from 1 to 10);
                      in general, you should make it higher to increase the sensitivity
   NOTE! If you would like to use alleles other than those selected by default, make sure you have changed the sel_alleles pattern as desired (line 34).
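
To make the N and method parameters more concrete, here is one possible reading of "N-extended precursors": peptides that share the epitope's C-terminus and are extended at the N-terminus up to N residues in total. This is an assumption about the terminology, not code from TEpredict.py; the function name, its arguments and the example sequence are hypothetical.

```python
def n_extended_precursors(protein, start, epitope_len=9, max_len=11):
    """List peptides sharing the epitope's C-terminus, epitope_len..max_len long.

    start is the 0-based index of the epitope's first residue in protein.
    """
    end = start + epitope_len
    precursors = []
    for length in range(epitope_len, max_len + 1):
        begin = end - length
        if begin < 0:
            break  # not enough upstream residues for a longer precursor
        precursors.append(protein[begin:end])
    return precursors


# Hypothetical sequence; the nonamer starts at index 4.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
print(n_extended_precursors(protein, 4, max_len=11))
# -> ['YIAKQRQIS', 'AYIAKQRQIS', 'TAYIAKQRQIS']
```

Under this reading, method = "9" would correspond to considering only the first element of the list (the nonamer itself).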
            
predict_II(x):
    predict(x, modelsII, modelsII.keys(), thrsh=2)
    This function is used to predict HLA class II binders (potential T-helper epitopes).
    x                      - an instance of class <Antigen>
    modelsII           - models to be used for Th epitope prediction
    modelsII.keys() - selected HLA class II allomorphs
    thrsh                 - the prediction method threshold (by default it is set to 2, i.e. to the second top percentile score)
        
producePolyE(file2open=False, ag=False, ouname = 'polyCTL_out.log',
              THRSH=4, FLANK='ADLVKV', useGA = False, tap = tap,
              prot = prot, improt = immprot, predefined = False):

    This function predicts CTL epitopes in antigens provided in Fasta or GenPept format,
    selects the minimal set of epitopes covering the desired HLA repertoire with the
    desired redundancy rate and constructs the polyepitope.
    Depending on the size of your Fasta file, this function may run for up to
    several days (typically, with 5-6 antigens of about 200 aa each, you'll
    get the results after 6-10 hours).
    
    file2open   - file with your fasta (or genbank) records
    ag              - instead of reading the file you can give the function an
                        instance of class <Antigen>
    ouname       - output file name
    THRSH      - threshold for predicting both proteasomal and immunoproteasomal
                        cleavage (an integer from 1 to 10)
    FLANK       - the spacer motif used to produce the polytope
                        It can be either fixed, e.g. 'ADLVKV'
                        or degenerate, e.g. [['A','S','N','R'],
                                                     ['D','L','I','A','T'],
                                                     ['L','G','A'],
                                                     ['A','K','S','N','V']]
    useGA         - set this parameter to True if you want to use the genetic
                        algorithm-based TSP-solver
                        NOTE!!! This parameter should be set to True if you use the degenerate
                                     spacer motif.
    predefined  - it is set to False by default, but you can set it
                         to a list of preselected CTL epitopes
    NOTE!!! The parameters prot and improt shouldn't be changed!!!
                 You can set the parameter tap to False if you really don't need TAP-binding affinity prediction.
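
To get a feel for the difference between a fixed and a degenerate FLANK value, the sketch below expands a motif into the full-length spacer sequences it encodes. It only illustrates the notation used above; how producePolyE actually uses the motif (for example, how spacer length is varied) is not shown here, and the function name expand_motif is hypothetical.

```python
from itertools import product


def expand_motif(flank):
    """Return all full-length spacer sequences encoded by a FLANK value."""
    if isinstance(flank, str):
        return [flank]  # a fixed motif encodes exactly one sequence
    # A degenerate motif lists the allowed residues for each position.
    return ["".join(residues) for residues in product(*flank)]


degenerate = [['A', 'S', 'N', 'R'],
              ['D', 'L', 'I', 'A', 'T'],
              ['L', 'G', 'A'],
              ['A', 'K', 'S', 'N', 'V']]

spacers = expand_motif(degenerate)
print(len(spacers))   # -> 300  (4 * 5 * 3 * 5 combinations)
print(spacers[0])     # -> ADLA
print(expand_motif('ADLVKV'))  # -> ['ADLVKV']
```

The number of candidate spacers grows multiplicatively with each degenerate position, so a degenerate motif gives a much larger search space than a fixed one.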
    
producePolyTh(file2open=False, ag=False, ouname = 'polyCTL_out.log'):
    This function predicts T-helper epitopes in antigens
    provided in Fasta or GenPept format (file2open).
    It predicts Th epitopes restricted by ~50 HLA-DR alleles
    using ProPred (TEPITOPE) predictive models and returns a set of
    10 peptides containing the most Th epitopes restricted by the most HLA
    class II allomorphs, i.e. the most promiscuous HLA binders.
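
The notion of "most promiscuous" binders can be sketched as a simple ranking by the number of alleles each peptide is predicted to bind. This is a simplified illustration, not the ranking implemented in producePolyTh: the function name, the input format and the peptide/allele names are hypothetical.

```python
def most_promiscuous(binders, top_n=10):
    """Rank peptides by the number of distinct alleles they bind.

    binders maps peptide -> set of HLA class II alleles.
    Ties are broken alphabetically by peptide.
    """
    ranked = sorted(binders, key=lambda p: (-len(binders[p]), p))
    return ranked[:top_n]


# Hypothetical prediction results.
binders = {
    "PEPTIDE_1": {"DRB1*01:01", "DRB1*04:01", "DRB1*07:01"},
    "PEPTIDE_2": {"DRB1*01:01"},
    "PEPTIDE_3": {"DRB1*04:01", "DRB1*15:01"},
}
print(most_promiscuous(binders, top_n=2))  # -> ['PEPTIDE_1', 'PEPTIDE_3']
```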
    

4. POLYCTLDESIGNER USAGE.

1. To produce a polyepitope composed of predicted CTL epitopes from a preselected set of antigens, use the following function:

        producePolyE(file2open='YourFastaFile', FLANK = [['A','S','P','G','N','R'],
                                                         ['D','L','I','A','T'],
                                                         ['L','G','A'],
                                                         ['A','K','S','N','V']],
                     useGA = True, THRSH=4)

    Here you should provide the function with the path to your file with antigen sequences (in Fasta or GenPept format).
    If you want to use another spacer motif, modify the value of the FLANK variable as desired.
    If you don't want to use this function, comment out the corresponding lines in the designer.py module.

2. If you don't want to use the GA-based TSP-solver and/or a degenerate spacer motif, you can use the following function:

        producePolyE(file2open='YourFastaFile', FLANK = 'ADLVKV',
                     useGA = False)

    This function optimizes epitope junctions only for proteasomal and/or immunoproteasomal cleavage, without minimizing the number of junctional epitopes. It uses a fixed spacer sequence, e.g. ADLVKV, and only the length of the spacer is varied.
    If you don't want to use this function, comment out the corresponding lines in the designer.py module.

3. To choose promiscuous HLA II binders to be used/tested as potential Th epitopes, use the following function:

         producePolyTh(file2open='YourFastaFile')

    If you don't want to use this function, comment out the corresponding lines in the designer.py module.

4. If you don't want to predict T-cell epitopes, you can construct a polyepitope from your own predefined set of epitopes. In this case producePolyE should be provided with the list of predefined epitopes:

         producePolyE(predefined=['EPITOPESEQUENCE1','EPITOPESEQUENCE2','EPITOPESEQUENCE3',
                                  '...','EPITOPESEQUENCEN'],
                      FLANK = [['A','S','P','G','N','R'],
                               ['D','L','I','A','T'],
                               ['L','G','A'],
                               ['A','K','S','N','V']],
                      useGA = True, THRSH=4)

    If you don't want to use this function, comment out the corresponding lines in the designer.py module.

The output will be written to the file 'polyCTL_out.log'.

When you've made all the necessary changes to the designer.py module code, save it and run the script either by double-clicking the file or by running the following command:

python designer.py

If you have any problems (or questions, suggestions etc.), please feel free to email me:
Den.Antonets@gmail.com
I'll be happy to help you!

Best Regards,
Denis