PHYTOPHTHORA FUNCTIONAL GENOMICS DATABASE (PFGD)

INTRODUCTION

PFGD is a comprehensive repository of information on plant functional genomics (fungi and plants). It contains data derived from over 600 publications, and the database is updated periodically.

The PFGD database has several search and browse options. The search option lets you look for specific types of plant or fungal genes in categories such as development, metabolism, signaling, stress response, photosynthesis and transcription factors. Browse options include gene locus, gene product name, function/description, taxon, keywords, citations, genome sequence source, PubMed ID, author names, publication dates, organism and others.

This course includes an overview of the PFGD database and its search options, together with some basic information about plant functional genomics (fungi and plants).

How to use the database

1. The Phytophthora Functional Genomics Database contains gene sequences of some economically important crops such as corn, rice, cotton, soybean and potato.

2. The gene sequences are collected from NCBI, Ensembl, EMBL-EBI and DFCI databases and can be accessed through a web browser.

3. The data are also provided in Excel format.

What are the current applications of PFGD?

1. This course covers the Phytophthora Functional Genomics Database (PFGD) and its main features.

2. It is aimed at anyone interested in plant-fungus interaction, gene discovery, microarrays and bioinformatics tools.

3. You will learn how to use PFGD tools such as BLAST, TBLASTN and BLASTP.

4. The course is suitable for beginners with no background in bioinformatics as well as for intermediate and advanced students.

5. It covers bioinformatics, genomics, proteomics and genome annotation.

6. It is for students who want to know more about genome annotation and gene discovery.

7. The course contains 20 lectures and 8 hours of content.

Phytophthora Functional Genomics Database (PFGD) – How It Works

PFGD is an open source, web-based system that supports functional genomics data management and analysis of multiple species. PFGD was created to help researchers and students in the fields of genomics and functional biology. The system is designed to allow researchers to easily integrate, manage, store and analyze different types of data related to gene function and regulation. It is designed to support the broad field of functional genomics research, including transcriptome, epigenetics, proteomics, metabolomics, and phenotyping of various model organisms and plants.

PFGD integrates with third party tools to support its users, allowing them to view their data in context using tools like KEGG, GO, PANTHER, ChEBI, Reactome, etc. In addition, PFGD can also support the analysis of user data using custom pipelines or external software like EMMA, ANNOVAR, VARIABLE SELECTION, etc.

What are its major functions?

The Phytophthora Functional Genomics Database (PFGD) is an open source sequencing analysis tool that allows researchers to conduct a variety of bioinformatics analyses on their sequencing data. In this article, we will explain how to get started with PFGD, what types of analyses can be done, and how to get help from the community.

What is the database?

PFGD is a web database of plant genes that allows anyone to conduct their own experiments using our tools. It’s built by scientists, for scientists, and is free to all.

PFGD is the culmination of several years of development work by my lab, and it has been designed to allow anyone with basic programming skills to use our tools to analyse their own data.

By doing so, they can discover new genes, and develop hypotheses about how those genes are involved in the growth, development, and stress response of plants.

DATA CONTENT

PFGD makes use of the XGI system for automated analysis and annotation of transcript data, including ESTs, consensus sequences and full length ORFs (UniORF) as well as genomic data. Sequence data are processed through a series of analysis operations, each operation building upon the results of the previous stage. Functional assay experiments are manually curated whereas expression data are imported in an automated bulk upload.

Transcript data

The PFGD transcript data content consists of quality screened EST and derived consensus sequences for P. infestans and P. sojae as well as UniORF sequences for P. infestans. Using the XGI transcript pipeline, raw public EST and cDNA data are gathered from NCBI and analyzed. Where available, quality scores for EST sequences are incorporated into the database for use by downstream analysis components. Detailed metadata concerning sequence origins such as submitting organization, organism, library details and cloning methodology, are captured and are viewable in the PFGD interface.

Before being used in analyses, raw EST data are screened for quality. Screening operations include removal of most common vector sequences, poly(A/T) trimming, N trimming, adapter/linker removal and poor quality read trimming. Vector screening and adapter/linker screening remove sequence contamination of the insert that typically arises as part of the cloning process. In addition, the fidelity of a sequence read typically degrades toward the end of the sequence, resulting in base-calling errors, which are trimmed out as part of this process. Finally, low complexity sequences represented by polyadenylated regions can produce many false positive matches in downstream analyses. The end result of the quality screens is a high quality ‘approved’ sequence that is then deposited in the database. An EST that has failed the XGI vector screen analysis for one or more reasons is not included in subsequent analyses, but may still be inspected through the interface as a failed EST.
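Two of the screening steps named above, N trimming and poly(A/T) trimming, can be illustrated with a short Python sketch. This is not the XGI implementation: the function name `trim_est`, the minimum run length of 10 and the minimum approved length of 50 are assumptions chosen only for illustration, and vector, adapter and quality-score trimming are left out.

```python
import re

def trim_est(seq, min_poly=10, min_len=50):
    """Trim terminal Ns and poly(A)/poly(T) runs from an EST read.

    Returns the trimmed sequence, or None when the read is too short
    to keep. All thresholds are illustrative assumptions.
    """
    s = seq.upper().strip("N")                 # N trimming at both ends
    s = re.sub(rf"A{{{min_poly},}}$", "", s)   # poly(A) tail at the 3' end
    s = re.sub(rf"^T{{{min_poly},}}", "", s)   # poly(T) run at the 5' end
    s = s.strip("N")                           # Ns exposed by the trimming
    return s if len(s) >= min_len else None

if __name__ == "__main__":
    read = "NN" + "ACGT" * 20 + "A" * 15
    print(trim_est(read))
```

A read that falls below the length threshold returns `None`, which mirrors the idea of a failed EST that is excluded from later analyses.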

Database structure of PFGD

Approved EST data are clustered and assembled into contigs to produce consensus sequences. These aggregate the high quality sequence information of their member ESTs and are used in all downstream analyses. Consensus sequences are analyzed using NCBI’s blastx algorithm (6) to search for potential homologues against NCBI’s nr database (7). BlimpsSearcher analysis (8) against the Blocks+ database (9) is used to identify protein blocks. InterProScan (10) is run to integrate results from a variety of protein motif analysis tools using the InterPro database (11). Each of these analyses is followed by a stage that uses the results to associate Gene Ontology (GO) terms (12) with the sequences. Pexfinder (Phytophthora Extracellular Proteins Finder) (13), co-developed by NCGR and OSU OARDC and based on SignalP (14), has also been incorporated. Pexfinder predicts proteins secreted through the pathogen plasma membrane by scanning for signal peptides in cDNA sequences. The results of the pipeline analyses are stored in the PFGD database and are updated periodically depending on the number of public sequences available for analysis.

UniORFs are full length P. infestans cDNA sequences derived from selected consensus sequences of interest, chosen on the basis of sequence analysis and annotation from PFGD. UniORFs are used in functional assays on potato and tomato as well as in gene expression experiments using microarrays based on tomato or potato consensus sequences annotated and stored in SolGD. UniORFs are also annotated using the post-clustering analyses described above.

The Genomic Pipeline consists of a suite of tools to automate the generation of an annotated genome. It is based on a pipeline approach that starts with raw sequence data and then produces a genome assembly that includes gene models (gene structure annotation), protein sequences, repeat annotations, and quality metrics. These tools are used to create a draft genome assembly which is then improved through several iterations of refinement.

Genome assembly is the process of stitching together fragments of DNA into a single continuous sequence. When we get the genome sequence for a species, it is usually split up into chunks or contigs, each one representing a piece of the whole genome. It’s a very complicated process, so it is often outsourced to a sequencing center.

Programs are used to annotate the protein sequences in the assembly using the InterPro database. Results that are in common between overlapping pieces are merged before being stored.

The GenScan program is used to predict the genes and exon-intron organization in the genomic sequences.

Genomics is a rapidly evolving field, and the genomic pipeline is continually revised to keep up with new advances. The results of the genomic pipeline are stored in the PFGD database and are updated periodically.

Genomic data

PFGD genomic data are processed using the XGI genomic pipeline. Public sequence information is gathered from NCBI’s High Throughput Genomic (HTG) division (7) for the species of interest. HTG sequences are from large scale genome sequencing centers and are submitted as in process assemblies in various stages of completeness, often containing two or more contigs. PFGD does not assemble its genomic data, but takes data from GenBank as submitted by the sequencing centers. Each genomic sequence is then separated into its constituent contigs and analyzed in pieces using a sliding window of length 10 000 with an overlap of 3000 bp. These pieces are processed using blastx (6) against NCBI’s nr database (7), and with blastn and tblastx (6) against the consensus sequences produced by the PFGD transcript pipeline. BlimpsSearcher (8) and InterProScan (10) are used as described previously. Analysis results that are in common between overlapping pieces are merged before being stored. GenScan (15), which performs ab initio gene prediction on the genomic sequences, is run on the complete contig sequences, providing the opportunity to define and compare the genes and exon intron organization of the sequences. The results of the genomic pipeline are stored in the PFGD database and are updated periodically depending on the number of sequences available for analysis.
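The windowing step described above (pieces of length 10 000 with a 3000 bp overlap) can be sketched in Python as follows. The function names and the tuple layout of a hit are hypothetical, and the merge shown is only one plausible reading of "results in common between overlapping pieces are merged"; the text does not say how XGI does it.

```python
def sliding_windows(contig_length, window=10_000, overlap=3_000):
    """Return (start, end) pairs, 0-based and end-exclusive, covering a contig."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    if contig_length <= 0:
        return []
    windows = []
    start = 0
    while True:
        end = min(start + window, contig_length)
        windows.append((start, end))
        if end >= contig_length:
            break
        start += step
    return windows

def merge_hits(hits):
    """Collapse hits reported by two overlapping windows into one record.

    Each hit is (window_start, local_start, local_end, subject_id);
    this layout is an assumption made for the sketch.
    """
    merged = {(w + s, w + e, subject) for (w, s, e, subject) in hits}
    return sorted(merged)

if __name__ == "__main__":
    for start, end in sliding_windows(25_000):
        print(start, end)
```

With the default settings each window starts 7000 bp after the previous one, so a feature shorter than the 3000 bp overlap is always contained whole in at least one window.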

Functional assay data

Functional assay experiments performed on host plants using full length P. infestans ORFs (UniORFs) are manually curated by the OSU OARDC into PFGD via the functional assay curation interface. A user account management system allows only privileged users to create or edit experiments, although all PFGD users can view the experiments already curated in PFGD.

Gene expression data

An extensible MIAME compliant data model, designed in such a way that it integrates well with the existing data model for transcript and genomic data, stores the gene expression data. Tomato gene expression experiments have been imported into this gene expression component using an automated upload mechanism. Sequence and gene expression data are tightly integrated through the user interface.

USER INTERFACE

The PFGD web interface provides a powerful set of search tools to access, compare, and save transcript, genomic, gene expression and functional assay data. Sequence and analysis data are logically organized and searchable in a variety of ways. The PFGD interface allows intuitive navigation of all publicly available transcript and genomic data (Figure 2). A list of libraries is provided in a hierarchical structure based on the organism, sequence type, library name and organization which produced the library for user selection. Precise delineation of library selection can be accomplished by using the Boolean logic operators (AND, OR, NOT) in conjunction with scope delimiters (STRICT, LOOSE), thus enabling virtual Northerns and in silico subtractions. More sophisticated queries using keyword searches on features and GO annotations are also supported. The interface also gives users the ability to search for results and annotation based on specific types of analysis tool output (e.g. only retrieve sequences that have InterPro results) or combinations of output (retrieve only sequences with both Blocks+ and InterPro results). Access to functional assays is provided for each curated UniORF.
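One way to picture the Boolean library selection described above is as set algebra over the consensus sequences found in each library. The sketch below uses Python sets; the library names, sequence identifiers and function names are invented for illustration, and the STRICT and LOOSE scope delimiters are not modelled because the text does not define them.

```python
# Hypothetical mapping from library name to the consensus IDs it contains.
LIBRARIES = {
    "mycelium": {"c1", "c2", "c3"},
    "zoospore": {"c2", "c3", "c4"},
    "infected_leaf": {"c3", "c5"},
}

def in_all(*names):
    """AND: sequences present in every named library."""
    return set.intersection(*(LIBRARIES[n] for n in names))

def in_any(*names):
    """OR: sequences present in at least one named library."""
    return set.union(*(LIBRARIES[n] for n in names))

def subtract(keep, *remove):
    """NOT: an in silico subtraction, keep minus everything in remove."""
    return LIBRARIES[keep] - in_any(*remove)

if __name__ == "__main__":
    print(sorted(in_all("mycelium", "zoospore")))
    print(sorted(subtract("infected_leaf", "mycelium", "zoospore")))
```

In this toy data set the subtraction leaves only the sequence seen exclusively in the infected-leaf library, which is the kind of candidate an in silico subtraction is meant to surface.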

The interface presents data in a variety of formats, including graphical depictions of sequences decorated with their predicted features, multiple sequence alignments (MSAs) with highlighted sequence variants and detailed reports of analysis operations. These annotations can also be downloaded in batches using the interface.

Sequence details

The sequence detail views capture all information relevant at the individual sequence level, and other information can be accessed through this display. This includes quality trimming, base composition, as well as sequence metadata and clustering information. For Consensus and UniORF sequences, it is also a portal to all analyses run on the sequence.

Oomycetes, in particular Phytophthora spp., comprise a unique branch of destructive eukaryotic plant pathogens that are responsible for a number of the world’s most devastating diseases of dicot plants. Not only are these diseases difficult to manage, but they also cause enormous economic damage to important crop species such as potato, tomato and soybean, as well as environmental damage in natural ecosystems. Although they resemble fungi in their growth habit, oomycetes are only distantly related to them: they belong to the stramenopiles and are more closely related to brown algae and diatoms than to true fungi. Despite being closely related to one another, Phytophthora species have evolved to infect vastly different plant hosts, with some species even being host specific. The genomes of two of these oomycetes, Phytophthora infestans and Phytophthora sojae, have been sequenced, and these have provided valuable insights into the molecular mechanisms that underpin the host range of these organisms. However, much remains to be learnt about these species, and the availability of a reference genome for a third oomycete, Phytophthora ramorum, has enabled systematic investigation of the genome of this organism.

Phytophthora species have a remarkable ability to manipulate biochemical, physiological and morphological processes in their host plants via effector molecules. In collaboration with The Ohio State University Ohio Agricultural Research and Development Center (OSU OARDC), we have constructed the Phytophthora Functional Genomics Database. We are experimentally characterizing effector gene and protein sequences, gene expression patterns, biological activity, as well as cellular responses and localization during infection. PFGD is a publicly accessible, web-based information resource designed to capture these heterogeneous data in a useful and intuitive way for biological researchers.

PFGD interrelates functional assays, transcript, genomic and expression analysis. PFGD was built upon data formerly available from the Phytophthora Genome Consortium and the Syngenta Phytophthora Consortium, as well as from all publicly available Phytophthora infestans and Phytophthora sojae transcript data and P. infestans genomic data, all of which are analyzed and annotated using NCGR’s XGI (Genome Initiative for species X) automated computational pipeline. PFGD integrates with NCGR’s Solanaceae Genomics Database (SolGD) to explore plant-pathogen interactions. SolGD hosts the expression studies of Solanaceae response to P. infestans isolates and will soon incorporate expression profiles for unique open reading frames (UniORFs) in comparable plant-pathogen systems as data become available. These federated databases are designed to provide significant insight into key molecular processes regulating an economically important pathosystem and a useful and accessible computational platform for the study of disease resistance in crop plants.

PFGD ARCHITECTURE

PFGD integrates transcript, genomic, gene expression and functional assay data from a number of sources and allows users to access and compare data via a multifaceted web interface (Figure 1). PFGD has a modular architecture, which can best be described as a local federation of databases. This allows each module to be updated independently reflecting the reality of differential data generation in the scientific community. For example, the functional assays are curated in real time as data become available, whereas the transcript module is updated based on expressed sequence tag (EST) count thresholds set for each species. A core database module has been developed to encapsulate data common to more than one module. For instance, the same user login information can be used to access MyData modules of both PFGD and SolGD.

A brief description of PFGD architecture.

All publicly available transcript and genomic data from P. infestans and P. sojae have been analyzed using NCGR’s XGI system. Analysis is done within each species and the results can be used in cross species comparisons to identify species specific gene sequences. Comparative transcript data are used in genomic analyses by aligning gene sequences to genomic contigs to validate ab initio gene predictions. Functional assay experiments performed on host plants with full length P. infestans ORFs (UniORFs) are manually curated at OSU and uploaded to PFGD. Transcript data, TIGR Tentative Consensus or consensus sequences generated by XGI, for Solanaceae host plants are analyzed using the XGI system and imported into SolGD. Gene expression experiments performed on Solanaceae with P. infestans UniORFs will be imported into SolGD as they become available and will be tightly integrated with PFGD. Various user interface tools to query and visualize the above mentioned data content are provided.

Phytophthora Functional Genomics Database (PFGD): functional genomics of Phytophthora–plant interactions

PFGD has become a valuable tool for plant pathologists, researchers, and the public to discover and share information about Phytophthora species. The PFGD hosts information on Phytophthora species from all over the world, including new isolates and unpublished data, as well as information on the genes involved in Phytophthora plant interactions.

PFGD includes expressed sequence tags (ESTs) from Phytophthora infestans and Phytophthora sojae. The raw EST and cDNA data are gathered from public submissions at NCBI.

Role of Phytophthora in plant disease and yield loss

PFGD is a web-based service for searching and visualizing genomic and transcript data, and it has been used extensively in our lab. PFGD uses a query language called GeneQuery to allow users to navigate through large amounts of transcript or genomic data. It can also be used to search and visualize genomic and transcript data in the context of annotations and gene families.

SolGD, the companion database to PFGD, is a web-based database that integrates various types of biological data for the plant family Solanaceae. It is designed to provide the user with a central resource to store, access and analyze data.

Why is functional genomics important?

Phytophthora is a genus of oomycetes, fungus-like protists that can cause devastating diseases in plants. It ranks among the most economically important groups of plant pathogens. The genus consists of more than 100 species, many of which are of major importance in agriculture. Phytophthora was first described by Heinrich Anton de Bary in 1876.

Oomycetes have a distinct evolutionary relationship with other eukaryotes and have evolved to produce a complex and diverse range of effector proteins that are delivered into host cells to suppress host immunity and to enable infection. The genome of Phytophthora sojae, a major pathogen of soybean, was among the first oomycete genomes to be sequenced and has provided a wealth of information about the evolution of these effectors and the mechanisms of pathogenesis. However, the majority of oomycetes remain poorly studied, and most of the available information is limited to the genome sequences of a few species. This is due in part to their unique evolutionary history and to the fact that some of them are obligate biotrophs that cannot be grown apart from their host plants.

These pathogens produce highly specialized cell types to aid their survival and growth within the host, and are able to overcome a range of host defense mechanisms. This requires the expression of a number of virulence-associated genes, including those encoding effector proteins that manipulate host cellular processes to the benefit of the pathogen.

There are two main groups of effectors in oomycetes: RxLR and Crinkler (CRN) effectors. RxLR effectors are small proteins of about 100 amino acids, and contain an N-terminal signal peptide, an RxLR motif a short distance downstream of it, and a C-terminal effector domain (3). RxLR effectors are subdivided into two main classes based on their effector domain.
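The domain layout described above suggests a simple first-pass screen for candidate RxLR effectors: look for the four-residue pattern R-x-L-R (x being any amino acid) shortly after the signal peptide. The Python sketch below does this with a regular expression. The fixed signal peptide length of 20 residues and the 40-residue search window are assumptions made for illustration; a real screen would predict the signal peptide with a dedicated tool such as SignalP.

```python
import re

# R, any residue, L, R.
RXLR = re.compile(r"R.LR")

def find_rxlr(protein, signal_peptide_len=20, search_window=40):
    """Return the 0-based position of the first RxLR motif found in the
    window that follows the assumed signal peptide, or None."""
    region = protein.upper()[signal_peptide_len:signal_peptide_len + search_window]
    match = RXLR.search(region)
    return signal_peptide_len + match.start() if match else None

if __name__ == "__main__":
    # Invented toy sequence: 20 residues, a short spacer, RSLR, then a tail.
    toy = "M" + "A" * 19 + "GSG" + "RSLR" + "E" * 50
    print(find_rxlr(toy))
```

A match on this pattern alone is weak evidence, since four-residue motifs occur by chance; it is best read as a filter that narrows a list of secreted proteins rather than as a classifier.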

RxLR effectors are small proteins

As noted above, RxLR effectors share an N-terminal signal peptide, an RxLR motif and a C-terminal effector domain. They have a role in a wide range of processes, including host cell entry, modulation of host cell signaling, suppression of plant immunity, and induction of disease symptoms (4).

RxLR effectors are secreted proteins that are translocated into host cells during infection. Because they act inside host cells, they are classed as cytoplasmic effectors, in contrast to apoplastic effectors, which act in the extracellular space. They should not be confused with bacterial type III effectors, which are delivered by the type III secretion system.

The Pexfinder program (Phytophthora Extracellular Proteins Finder) is designed to identify candidate secreted proteins in transcript data. It scans cDNA sequences for signal peptides, building on SignalP predictions.

Main classes of RxLR effectors

There are two main classes of RxLR effectors: one class comprises proteins that contain a plant-targeting peptide (PTP) and an RxLR motif; the other contains proteins that lack a PTP. A further distinction is drawn between RxLR-like effectors and CRN effectors. RxLR-like effectors contain a conserved RxLR motif, but their effector domain is not typical of RxLR effectors. CRN effectors are divided into two groups based on the number of repeats in the conserved motif.

CRN effectors contain a conserved sequence motif of ~30 amino acids and a C-terminal CRN domain.

These two classes of effectors have distinct functions. Class I effectors are translocated into host cells by the RxLR motif, whereas class II effectors are translocated by the Crinkler (CRN) domain. The RxLR and CRN domains are the most abundant effector domains in oomycetes.

Phytophthora genomes encode hundreds of RxLR effectors, and these effectors are often clustered in the genome.

PHYTOPHTHORA INFESTANS

We use the following approach to annotate the drug:

1. We use the available expression data to identify the drugs that are likely to be effective.

2. We use the drug’s target genes to predict the disease that the drug is likely to treat.

3. We use the disease’s target genes to predict the drug that is likely to treat the disease.

4. The Ingenuity Pathway Analysis (IPA) software was used to perform pathway analysis.

IPA is a web-based software that allows you to identify biological pathways that are overrepresented in your gene list.

We performed a global analysis using IPA.

IPA calculates a p-value for each biological pathway represented in your gene list. The p-value is the probability of seeing an overlap at least as large as the observed one between your gene list and the pathway if the genes had been drawn at random. The lower the p-value, the stronger the evidence that the association is not due to chance.

A p-value of less than 0.05 is conventionally treated as significant.
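A common way to compute an over-representation p-value of this kind is the one-sided hypergeometric test, which is equivalent to a right-tailed Fisher's exact test. The Python sketch below shows the arithmetic using only the standard library. It illustrates the general technique and does not reproduce IPA's internals; the function and parameter names are hypothetical.

```python
from math import comb

def enrichment_p_value(population, pathway_size, sample_size, overlap):
    """P(X >= overlap) when sample_size genes are drawn without replacement
    from a population that contains pathway_size pathway genes."""
    max_k = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total

if __name__ == "__main__":
    # Toy numbers: 10 genes in total, 5 in the pathway, 5 in the gene list,
    # all 5 of which fall in the pathway.
    print(enrichment_p_value(10, 5, 5, 5))
```

When many pathways are tested at once, the 0.05 threshold is normally applied after a multiple-testing correction rather than to each raw p-value.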

We performed a gene set enrichment analysis (GSEA) using the Molecular Signature Database (MSigDB).

I do not use functional assays to determine if a drug will work or not. Instead, I use gene expression data to predict whether a drug will work or not. This is because gene expression data are much easier to obtain than functional assays. In addition, gene expression data can be easily integrated with other data sources.

Conclusion

PFGD was designed with the mission of helping researchers around the world perform fast and accurate gene function prediction based on sequence homology.

1. PFGD is a freely available database of functional genomic sequences that allows users to search for candidate genes in a large number of genomes.

2. The first release includes over 500 bacterial genomes, and this number is growing. Genomes of other prokaryotes are in progress.

3. PFGD has been developed with the goal of supporting large-scale comparative genomic analyses.

4. PFGD has collected data from thousands of published articles and over 700,000 protein sequences. Information is organized by sequence, function, organism, and other relevant information.

5. Accessing PFGD is easy. Simply search for a gene of interest and then use the “Genes” link to view all of its functions.

6. PFGD contains over 300 different genes related to phytopathogens, many of which are new to science. This list includes genes associated with pathogenicity and virulence, plant resistance mechanisms, environmental sensing, detoxification, and the synthesis of secondary metabolites.
