Pedro Download

History

by Kevin Garwood

In 2001, the COGEME consortium was developing a standard data model for describing proteomics experiments. The committee knew that widespread adoption of the standard would be greatly helped by software tools that supported the model. Parts of the model were volatile and changed frequently. Manchester University's Norman Paton led a research group to develop a set of tools that could support scientists using an evolving model.

One of the researchers working with him was Chris Taylor, who spent about a year interviewing various scientists about what they would want in a data model. While he gathered requirements about the model, he also gathered requirements about what a data entry tool might look like.

By the summer of 2002, Norman and Chris were spread too thinly over the tasks they had to accomplish. It became very difficult for Chris to continue gathering feedback from scientists and to find time to code a well-designed data entry tool. Norman sought help from the ESNW, and asked to contract me to develop the software tool.

Traditional project funding tends to pay for research, not development. This often results in software that is robust enough to demonstrate concepts but not robust enough to serve an end-user community. There is also usually pressure to publish papers, which distracts people from putting time into documenting and testing research software.

The ESNW's funding ethos respected software development as a service for accomplishing research ends. When Norman approached me, my only real concern was "Is there going to be a user base for this?", to which he answered "Yes". I was able to do development and feel that it was an activity on a par with the development of the data model.

I spent from October 2002 to February 2003 developing the first release of Pedro. It remains one of the best teams I've worked with. We had Norman to manage the project. Chris Taylor was able to devote all of his time to polishing the data model the COGEME group was trying to promote. The interviews he conducted in the previous year allowed development to start from end-user requirements rather than untested hypotheses. The division of labour was a pairing of a domain scientist and a developer.

While Chris was off trying to finish the design of the data model, I had to develop a model to use for feature testing. I don't really know much about proteomics, so I decided to investigate other domains that I knew more about. When Chris finished the PEDRo XML Schema, we began testing with that. The tool was not written with any domain-specific code but was driven by the proteomics use-case.

By early January 2003, Pedro began catching the attention of other people. Andy Brass began to think the tool would have applications in environmental genomics. Chris Wroe, a researcher on the MyGrid project, thought Pedro could be used to annotate descriptions of bioinformatics services. My interactions with MyGrid were fostered by the fact that the P.I. was Carole Goble, who also happens to be one of the ESNW's co-directors. Chris Wroe and his MyGrid colleague Phil Lord have both given feedback on how Pedro's support for controlled vocabularies should be enhanced. Some of the deficiencies they identified have since been remedied, so proteomics scientists who eventually need keyword services will not encounter them.

While Carole's group began to evaluate the tool for their own domain, Chris Taylor, Norman and others from COGEME were busily preparing a paper: "A systematic approach to modeling, capturing and disseminating proteomics experimental data", which was published in the March 2003 issue of Nature Biotechnology.

From March onwards, Pedro began to be downloaded by more and more people. I was put on another project until November 2003. Norman negotiated with the ESNW to secure blocks of my time to do periodic maintenance of the tool. After that project ended, the ESNW allowed me to work full time on Pedro until April 2004.

A number of people in the department began to recognise that Pedro could create data entry forms for a variety of scientific disciplines. Originally known only as the PEDRo Data Entry Tool, the application began to warrant its own separate identity. I wanted to separate the tool from the model so I began thinking of candidate names. Faced with the frightening prospect of me naming the tool "Mad Zillion" or "Oregano Glow", Norman decided the tool should simply be called Pedro - not the contorted proteomics acronym PEDRo - just Pedro.

In the ongoing Pedro story, Chris Taylor has now moved down to the EBI and has joined Weimin Zhu's database group. Chris's group wants to use Pedro for the HUPO tissue projects. Kai Runte, a developer in the group, has begun jointly developing parts of the front end. Working with him and the others is forcing me to make my source code easier to understand. Their suggestions have been welcome and will improve the quality of the product.

Between February 2003 and January 2004, Pedro had more than 285 unique downloads by people in 22 countries. It has crossed domains and has been entered in the European Academic Software Competition.

Please see the Models and People pages for a perspective on the people who have helped make this tool a success.