Thursday, 01 June 2006

A Quick Guide to PRIDE: The PRoteomics IDEntifications Database

The advent of high-throughput proteomics has enabled the identification of ever-increasing numbers of proteins. Correspondingly, the number of publications centred on these protein identifications has increased dramatically. However, the mechanisms for publishing these identifications have not kept pace technically.
Long lists of identifications are either published directly in the article, making for a voluminous and rather tedious read, or are placed on the publisher’s website as supplementary information. In either case, these lists are typically provided as PDF documents with a custom layout, which makes it practically impossible for computer programmes to interpret them, let alone query them efficiently.

What is PRIDE?

PRIDE (www.ebi.ac.uk/pride), the open-source ‘PRoteomics IDEntifications database’, has been developed through a collaboration between the EMBL–European Bioinformatics Institute, Ghent University in Belgium (www.proteomix.org), the Yonsei Proteome Research Centre (YPRC) in Seoul, Korea, and Manchester University in the UK. PRIDE offers a web-based query interface, a user-friendly data-upload facility and a documented application programming interface for direct computational access. The complete PRIDE database, source code, data and support tools are freely available for web access, or for download and local installation.

The original motivation behind the development of PRIDE was to provide a common data exchange format and repository to support proteomics publications. Its purpose has since broadened: the hope is that PRIDE will also provide a reference set of tissue-based identifications for use by the proteomics community.

The PRIDE data model has been designed with flexibility in mind. The current iteration supports identifications from both LC–MS-based and gel-based techniques. Processed peak lists arising from MS, MS–MS and higher MS levels are supported, and post-translational modifications (both natural and artefactual) can be included in the data set.

PRIDE is closely linked to the Human Proteome Organisation’s Proteomics Standards Initiative (HUPO-PSI; psidev.sourceforge.net). It is intended that PRIDE will allow data transfer using the mzData and analysisXML data transfer standards, which are being developed by the PSI.
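To make the flexibility of this data model more concrete, here is a minimal Python sketch of how its main ingredients might nest: protein identifications from LC–MS-based or gel-based work, the supporting peptides, optional processed peak lists at a given MS level, and post-translational modifications flagged as natural or artefactual. All class and field names (`ProteinIdentification`, `PeakList`, `artefactual` and so on) are hypothetical and are not taken from the PRIDE XML schema; the accession and peak values are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Technique(Enum):
    """Separation technique behind an identification (hypothetical labels)."""
    LC_MS = "LC-MS"
    GEL = "gel"


@dataclass
class Modification:
    """A post-translational modification, flagged as natural or artefactual."""
    name: str
    position: int  # hypothetical convention: 1-based residue index in the peptide
    artefactual: bool = False


@dataclass
class PeakList:
    """A processed peak list: ms_level 1 = MS, 2 = MS-MS, >2 = higher MS levels."""
    ms_level: int
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


@dataclass
class PeptideIdentification:
    sequence: str
    modifications: List[Modification] = field(default_factory=list)
    spectrum: Optional[PeakList] = None


@dataclass
class ProteinIdentification:
    accession: str
    technique: Technique
    peptides: List[PeptideIdentification] = field(default_factory=list)

    def modified_peptides(self) -> List[str]:
        """Return the sequences of peptides that carry at least one modification."""
        return [p.sequence for p in self.peptides if p.modifications]


# Usage with placeholder values (not real PRIDE data).
protein = ProteinIdentification(
    accession="EXAMPLE_0001",
    technique=Technique.LC_MS,
    peptides=[
        PeptideIdentification("PEPTIDEK"),
        PeptideIdentification(
            "MSEQK",
            modifications=[Modification("oxidation", 1, artefactual=True)],
            spectrum=PeakList(ms_level=2, peaks=[(175.12, 1200.0)]),
        ),
    ],
)
print(protein.modified_peptides())  # ['MSEQK']
```

Making the peak list optional is a simplification for this sketch, not a statement about what PRIDE requires.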
The controlled vocabularies being developed by the PSI General Proteomics Standards workgroup will increase the power of PRIDE to support diverse proteomics data sets in a way that users can query effectively.

What can I do with PRIDE?

PRIDE allows users to retrieve the complete set of protein identifications for a publication, along with the supporting peptide identifications and hyperlinks to further information. PRIDE can also be queried to find all relevant data sets for a particular protein of interest or for a given tissue. Retrieved data can be formatted in two ways: the familiar HTML format, which is easily readable and hyperlinked, or XML, which can be readily parsed by simple scripts or computer programmes.

We also encourage proteomics researchers to submit their identifications to PRIDE. This obviates the need to publish long, impractical PDF tables as supplementary information; instead, authors of proteomics papers need only include the accession number(s) for their data set(s) and a reference to the PRIDE database URL in their publications.

Figure 1. Summary of data types captured in PRIDE

Submitting data to PRIDE

Submission to PRIDE is open to any proteomics laboratory generating protein and peptide identification data. PRIDE supports private data upload, allowing submitters to restrict access to their data to an individual user or to a group of collaborators. This allows data to be shared among laboratories, or with journal editors and peer reviewers, without being available to the general public. When the authors decide to make their data publicly available (e.g. after a publication has been accepted), sharing their findings with the community is simply a matter of clicking a button. It is, after all, the aim of PRIDE to create an open, public repository of protein identifications.
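Because retrieved data can be exported as XML, a short script is enough to pull out, for example, the peptides supporting each protein. The following is a minimal sketch using Python’s standard `xml.etree.ElementTree` module. The document it parses is a hypothetical, simplified stand-in: its element and attribute names (`experiment`, `protein`, `peptide`, `accession`, `sequence`) are assumptions and do not reproduce the actual PRIDE XML schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, simplified export. The element and attribute names are
# illustrative only and do NOT follow the real PRIDE XML schema.
EXAMPLE_XML = """\
<experiment accession="EXAMPLE-1">
  <protein accession="PROT_A">
    <peptide sequence="PEPTIDEK"/>
    <peptide sequence="SAMPLER"/>
  </protein>
  <protein accession="PROT_B">
    <peptide sequence="LIGHTK"/>
  </protein>
</experiment>
"""


def peptides_by_protein(xml_text: str) -> Dict[str, List[str]]:
    """Map each protein accession to the peptide sequences that support it."""
    root = ET.fromstring(xml_text)
    result: Dict[str, List[str]] = {}
    # iter() walks every <protein> element at any depth below the root.
    for protein in root.iter("protein"):
        accession = protein.get("accession")
        # findall() returns only the direct <peptide> children of this protein.
        result[accession] = [pep.get("sequence") for pep in protein.findall("peptide")]
    return result


for accession, peptides in peptides_by_protein(EXAMPLE_XML).items():
    print(accession, ", ".join(peptides))
```

Adapting this to a real PRIDE XML file would mainly mean changing the tag and attribute names passed to `iter()`, `findall()` and `get()` to match the published schema.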
Data can be submitted to PRIDE by completing the (free) registration with the PRIDE system and following the ‘Upload Data’ hyperlink on the left of the PRIDE web page. The submission format is an XML document that follows the current PRIDE XML schema, which can be downloaded from the PRIDE homepage via the ‘How To Submit Data to PRIDE’ hyperlink. Please contact the PRIDE team if you require any support with exporting your data to this format.

The PRIDE software development team is nearing completion of a Microsoft Excel-based workbook that will allow data from small-scale experiments to be exported directly to PRIDE XML. Please contact the PRIDE team for further details of this new feature.

Retrieving data from PRIDE

Data in PRIDE can be retrieved as a brief summary (EMBL-EBI developers are currently expanding PRIDE’s statistical reporting functions), as a PRIDE.xml file or as a formatted HTML table. The database can be searched by following the ‘Search PRIDE’ hyperlink on the left of the PRIDE homepage.

PRIDE internationalisation

The PRIDE web pages and user interface have been translated into several languages, including Korean, French and German, with Chinese and Polish translations under way. The language is selected automatically from the locale settings of your computer or Internet browser; if your locale is not implemented in PRIDE, the pages are shown in the default language, English.
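The guide does not say exactly how the locale is detected. One common mechanism, assumed here rather than documented for PRIDE, is to read the HTTP `Accept-Language` header that browsers send and pick the highest-weighted language the site supports, falling back to English. The sketch below does this in Python; `SUPPORTED` is a hypothetical set built from the languages named in this guide (which lists them only as examples), and `parse_accept_language` and `choose_language` are illustrative helpers, not part of PRIDE.

```python
from typing import List, Tuple

# Hypothetical: English plus the translations the guide names as complete.
# Chinese and Polish are omitted because they were still under way.
SUPPORTED = {"en", "ko", "fr", "de"}
DEFAULT = "en"


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language value into (primary language, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        # Reduce a tag such as "fr-CA" to its primary subtag "fr".
        prefs.append((tag.strip().split("-")[0].lower(), q))
    # sorted() is stable, even with reverse=True, so equal weights keep the browser's order.
    return sorted(prefs, key=lambda item: item[1], reverse=True)


def choose_language(header: str) -> str:
    """Return the first supported language the client accepts, else the English default."""
    for lang, q in parse_accept_language(header):
        if q > 0 and lang in SUPPORTED:
            return lang
    return DEFAULT


for header in ["ko-KR,ko;q=0.9,en;q=0.8", "pl-PL,pl;q=0.9", "zh-CN, fr;q=0.5"]:
    print(header, "->", choose_language(header))
```

For example, a browser preferring `ko-KR,ko;q=0.9,en;q=0.8` gets Korean, while one sending only `pl-PL,pl;q=0.9` falls back to English, matching the default behaviour the guide describes.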