A Tool to Detect Plant Metabolic Gene Clusters


Metabolic gene clusters are operon-like structures of functionally linked metabolic genes. They are a common feature in prokaryotes and filamentous fungi. In recent years, more than a dozen metabolic gene clusters have been characterised in a variety of plant species. PhytoClust is a computational tool to identify and analyze metabolic gene clusters in plants. The program takes a plant genome as input and runs a hidden Markov model (HMM) based search to detect co-located enzymes according to user-defined detection criteria. It is built on antiSMASH as the core detection tool and uses all the secondary programs that antiSMASH uses.


PhytoClust takes FASTA or EMBL files as input for the cluster search. You can choose from a selection of plant genomes already stored on the server or upload your own sequence. There are two search options for detecting gene cluster candidates. First, you can search against a list of currently known gene cluster types from different plant species, using either the default settings or your own parameters for the cluster range (the range in which the respective enzymes must be found) and the flanking region (the region adjacent to the cluster range that is searched for additional secondary metabolism enzymes). Alternatively, you can create your own cluster rule by selecting up to four enzyme families from plant secondary metabolism and setting the parameters for the cluster range and flanking region. This lets you search for completely new cluster types or test whether your gene cluster candidate can be found in other species as well.
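To make the roles of the cluster range and the flanking region concrete, here is a minimal sketch of how a user-defined cluster rule could be applied to annotated genes on one chromosome. It is an illustration only, not the PhytoClust or antiSMASH implementation: the `Gene` record, the `find_clusters` function, the coordinates and the 20 kb parameter values are all hypothetical, and the enzyme family labels are assumed to come from a prior HMM search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    start: int   # start coordinate in bp
    end: int     # end coordinate in bp
    family: str  # enzyme family label from a prior HMM search, "" if none


def find_clusters(genes, required, cluster_range, flanking):
    """Return (core_ids, flank_ids) for every window holding all required families.

    Simplification for illustration: overlapping windows are not merged.
    """
    genes = sorted(genes, key=lambda g: g.start)
    required = set(required)
    hits = []
    for i, first in enumerate(genes):
        if first.family not in required:
            continue
        # Rule enzymes that lie within cluster_range bp of the window start.
        core = [g for g in genes[i:]
                if g.family in required and g.end - first.start <= cluster_range]
        if {g.family for g in core} != required:
            continue
        # Extend the window on both sides and collect any other annotated enzymes.
        lo = first.start - flanking
        hi = max(g.end for g in core) + flanking
        flank = [g for g in genes
                 if g.family and g not in core and g.start >= lo and g.end <= hi]
        hits.append(([g.gene_id for g in core], [g.gene_id for g in flank]))
    return hits


# Hypothetical annotation: two rule enzymes close together, one extra enzyme nearby.
genes = [
    Gene("g1", 1000, 2000, "TS"),
    Gene("g2", 5000, 6000, "C p450"),
    Gene("g3", 15000, 16000, "Glycosyl TF"),
    Gene("g4", 500000, 501000, "TS"),
]
hits = find_clusters(genes, {"TS", "C p450"}, cluster_range=20000, flanking=20000)
print(hits)
```

In this toy input, `g1` and `g2` satisfy the rule within the cluster range, `g3` is picked up as an additional enzyme in the flanking region, and the distant `g4` is ignored.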

If you leave your email address, you will receive a notification with a download link once your calculations are finished. Your results remain available for download for 7 days after the calculations are finished.


Once the job is submitted, the program searches the genome sequence for the given cluster types. After completion, the results can be examined in the browser or downloaded for further offline analysis. Moreover, for a selection of species you can run a co-expression analysis on the detected putative gene clusters. If any of the gene cluster candidates show co-expression, the results are displayed with the respective putative cluster.

List of known metabolic gene clusters in plants with high-quality genome assemblies

List of secondary metabolism enzyme families in plants

  • 2-Isopropylmalate synthase (2Isopropylmalate S)
  • 2-Oxoglutarate-dependent dioxygenase (2OG-dep Diox)
  • Acyl-activating enzymes/CoA ligase (AAE/CoA ligase)
  • Acyl transferase (Acyl TF)
  • Aldehyde dehydrogenase (Aldehyde DH)
  • Carboxy methylesterase (Carboxy methylE)
  • Carboxylesterase (CarboxylE)
  • Cytochrome P450 (C p450)
  • Dehydrogenase/Reductase (DH/Reductase)
  • Dioxygenase (Diox)
  • Glucuronosyltransferase (Sugar transferase) (GlucuronosylTF)
  • Glutathione-S-transferase (GHS S TF)
  • Glycosyl transferase (Glycosyl TF)
  • Glycoside hydrolase family 1 (Glycosyl hydrolase fam 1)
  • Marneral synthase/Oxidosqualene cyclase/Prenyltransferase (Marneral S OSC Prenyl TF)
  • Methylene bridge-forming enzymes (MBE)
  • Methyltransferase (MethyTF)
  • NADPH-dependent dehydrogenase (NADPH dep DH)
  • Oxidoreductase (OxidoR)
  • Pathogenesis related lipase-like proteins (PRLIP)
  • Polyketide synthase (PKS)
  • Serine carboxy-peptidase like acyltransferase (SCPL) (SCP like acylTF)
  • Terpene synthase (TS)
  • Transaminase
  • Tryptophan synthase alpha homolog = Indole-3-glycerol phosphate lyase (Trp S alpha homolog)

Co-expression analysis

Co-expression analysis (based on Pearson correlation) for the detected putative metabolic gene clusters is currently available for Arabidopsis thaliana, Solanum lycopersicum (tomato), Solanum tuberosum (potato), and Oryza sativa (rice). The data for the co-expression module were taken from the following publications.


Gan X. et al., Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature (2011) 477(7365):419-23


Itkin M. et al., Biosynthesis of antinutritional alkaloids in solanaceous crops is mediated by clustered genes. Science (2013) 341(6142):175-9


The Potato Genome Sequencing Consortium, Genome sequence and analysis of the tuber crop potato. Nature (2011) 475:189–195


Sato Y. et al., Field transcriptome revealed critical developmental and physiological transitions involved in the expression of growth potential in japonica rice. BMC Plant Biology (2011) 11:10
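As an illustration of the Pearson-based co-expression check described in this section, the sketch below computes pairwise correlations between the expression profiles of genes in one putative cluster. It is only an assumed outline of such an analysis: the function names, the example expression values and the 0.8 cutoff are hypothetical and are not taken from PhytoClust.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # Assumption of this sketch: a constant profile counts as uncorrelated.
        return 0.0
    return cov / (sd_x * sd_y)


def coexpressed_pairs(expression, threshold=0.8):
    """List gene pairs whose correlation reaches the (hypothetical) threshold."""
    pairs = []
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if r >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Hypothetical expression values for three genes across four samples.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
pairs = coexpressed_pairs(expression)
print(pairs)
```

Here only `g1` and `g2` rise and fall together across the samples, so they are the only pair reported; `g3` correlates negatively with both and is left out.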


Program source code: