As we approach CASP11, I’d like to share with you this story that highlights the importance of collaborative research. According to R&D Magazine News published today, an old mathematical puzzle attributed to Euclid may soon be solved. The puzzle states that there exist infinitely many pairs of prime numbers whose difference is two (these numbers are called twin primes). Prime numbers are very useful in cryptography. They are used to ensure data security and protection in applications such as online banking and internet shopping.
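To make the definition concrete, here is a small Python sketch (our own illustration, not from the news story) that lists twin-prime pairs below a bound using a simple sieve of Eratosthenes. The function names are hypothetical. The conjecture claims that this list never ends, no matter how large the bound.

```python
def primes_below(n):
    """Return all primes < n using a sieve of Eratosthenes."""
    if n < 3:
        return []
    is_prime = [True] * n
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Mark every multiple of p, starting at p*p, as composite.
            for multiple in range(p * p, n, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def twin_primes_below(n):
    """Return pairs (p, p + 2) where both members are prime and p + 2 < n."""
    primes = primes_below(n)
    prime_set = set(primes)
    return [(p, p + 2) for p in primes if p + 2 in prime_set]


print(twin_primes_below(50))
# [(3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43)]
```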

In April 2013, Yitang Zhang presented a weaker version of the result, showing that there are infinitely many pairs of primes separated by at most 70 million. Soon after, James Maynard, a recent Oxford graduate, reduced the gap to 600. What happened next? Hundreds of researchers have been collaborating to reduce the gap to two by submitting their research results to Polymath (the online collaborative platform that inspired us to create WeFold). According to Maynard, “It is quite unusual for me since I’m used to working alone. But it’s really worthwhile to work within a community.”

Through the Polymath collaborative effort, the gap continues to decrease.

I hope to work with you soon in WeFold2!



WeFold is an experiment designed to compete and collaborate within CASP. We began discussing and preparing the WeFold gateway in 2011, but the project’s blueprint was pretty much undefined until the first day of CASP10. In May 2012 the WeFold participants logged into the WeFold gateway and started the real discussion about how to combine the different components of their protein structure prediction pipelines into new pipelines.

After the official CASP10 results were released, we spent several months analyzing them and concluded that the mix-and-match methods we collaboratively generated produced statistically significant improvements relative to the base methods. In fact, we were impressed that combinations of methods that were not optimized to work with each other could achieve peak performance for certain CASP10 targets in both the tertiary structure and refinement categories.

When we were in the first stages of preparing the WeFold paper (I’m very excited to report that we submitted it last week!!!), I read a New York Times article by Thomas Friedman entitled “Collaborate vs. Collaborate” [1]. The article describes how industry uses coopetition, or cooperative competition, to thrive in today’s markets and better serve customers, and suggests that Congress adopt the practice to achieve similar results. I immediately adopted the word coopetition to describe what we have done with WeFold: I changed the title of the paper to “WeFold: Large-scale coopetition for protein structure prediction” and changed the entire introduction of the paper to emphasize the coopetition concept.

Last week, George Khoury sent me a paper that gives a comprehensive literature overview of the field of coopetition from a business perspective [2]. One reference in particular caught my attention and helped me explain WeFold as a natural consequence of the CASP experiment. Hu et al. [3] compare business networks with many players, or agents, that use either strictly competitive or strictly cooperative strategies. They show that, starting from a competitive environment, the system evolves into a state that lies midway between competition and cooperation and offers the best profit prospects for the system as a whole. They conclude that a competitive outset is advantageous for the whole system compared with a totally cooperative outset, because the former evolves into coopetition, which is the best of all the approaches. By analogy, CASP is the best approach to dealing with a grand challenge, and WeFold is its natural byproduct, the natural result of CASP’s evolution.

Finally, I’d like to draw a distinction between WeFold-style coopetition, which refers to large-scale, open collaboration within a competitive context, and the specific, targeted collaborations between 2 or 3 players that sometimes occur, usually between leaders in complementary areas of the same field. To illustrate the difference, I’ll use the Netflix challenge, which another WeFolder, Jaume Bacardit, brought to my attention. In 2006, Netflix organized a competition that challenged the computer science community to develop methods that could beat the accuracy of its movie recommendation system by 10% [4]. It took 3 years for participants to meet the 10% challenge, and progress prizes were awarded along the way. Many teams participated, achieving an 8% improvement by the end of the first year. However, by year two they had reached a plateau where barely any progress was made. Then some teams began to “blend”, which resulted in further improvements. This is another example that supports Hu et al.’s theory that a strictly competitive system is likely to evolve into a coopetitive one. But the interesting observation here is that initially the blending included only the leaders of the competition, and the resulting teams could not make substantial progress. It was the collaboration between the leaders and the most outlying teams, i.e., the also-rans, that made them win. This unlikely collaboration proved to be key towards the end of the competition because the combined approaches captured effects that the mainstream ones had neglected [5,6].
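For readers unfamiliar with the term, “blending” in the Netflix Prize meant combining the predictions of several models into one. The Python sketch below is a deliberately minimal, hypothetical illustration: two predictors, a single least-squares weight, and made-up ratings. The actual winning teams blended many predictors with far more elaborate methods.

```python
import math


def blend_weight(pred_a, pred_b, truth):
    """Least-squares weight w for the blend w*a + (1 - w)*b.

    Minimizing sum((w*(a - b) + b - y)**2) over w gives
    w = sum((a - b) * (y - b)) / sum((a - b)**2).
    """
    num = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, truth))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    return num / den if den else 0.5  # identical predictors: any weight works


def blend(pred_a, pred_b, w):
    """Weighted combination of two prediction lists."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


def rmse(pred, truth):
    """Root-mean-square error, the metric used to score the Netflix Prize."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, truth)) / len(truth))


# Hypothetical ratings on a 1-5 scale: two models that err in opposite directions.
truth = [4, 3, 5, 2]
model_a = [5, 3, 4, 2]
model_b = [3, 3, 6, 2]

w = blend_weight(model_a, model_b, truth)
blended = blend(model_a, model_b, w)
print(w, rmse(model_a, truth), rmse(model_b, truth), rmse(blended, truth))
```

In this toy case the two models’ errors cancel exactly, so the blend is perfect. On real data a blend is usually only somewhat better than its best component, and it helps most when the components make different kinds of errors, which is consistent with the observation that pairing leaders with dissimilar “also-rans” paid off.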

With this in mind, I’d like to invite all CASP participants, as well as non-CASP participants who have interesting ideas and methods to share, to join the WeFolders in another experiment in coopetition, which we’ll call WeFold2.

Talk to you soon!


1. Friedman, T. L. “Collaborate vs. Collaborate”, The New York Times, 2013.
2. Stein, H. D. “Literature Overview on the Field of Co-opetition”, Business: Theory and Practice 11(3): 256-265, 2010.
3. Hu, Y., Houdet, J. and Duong, T. “A Multi-Agent Model of Cooperative and Competitive Strategies in Supply Chain”, Journal of Fudan University, Shanghai 9(1): 873-879, 2008.
4. Bennett, J. and Lanning, S. “The Netflix Prize”, Proceedings of KDD Cup and Workshop 2007, San Jose, California, Aug 12, 2007.
5. Buskirk, E. V. “How the Netflix Prize Was Won”, Wired, 2009.
6. Lohr, S. “Netflix Competitors Learn the Power of Teamwork”, 2009.


A planned outage at NERSC is affecting the WeFold gateway. It should be back by 21:30 PST.

The gateway for the collaborative effort is ready. I’m very thankful to NERSC (the National Energy Research Scientific Computing Center) for hosting both the gateway and the data that we will share during this experiment.

The gateway has a public front page that is open to everyone and a password-protected area for those who want to participate in the experiment. The front page has an opening message and a table that shows the names of the participants (those who have contacted me) and how they plan to contribute.

Below the WeFold title and image there is a “Targets” button that you need to click to enter the protected area. There is also a panel on the right-hand side with links to each of the targets. Target IDs will be updated to their current numbers once they are released.

Inside each target’s discussion group, logged-in users can join the group, post comments, upload and download files, open a Jmol viewer popup, drag the name of a PDB file onto the viewer to visualize the protein, and create forums for specific discussions within the group. The main discussion about each target will take place in the comments, and uploaded documents will make it possible to share data. The site accepts files with the extensions pdb, mol, xyz, pqr, txt, gz, zip, tgz, tar, and z.

There is also a protected area for general discussions that are not particularly related to a specific target. This General Discussions group is for back and forth discussions among participants.

If you’re interested in participating in this effort please contact me at

Thanks to all who responded to this invitation! This table summarizes the responses I’ve received so far:


I have received a NERSC allocation for those who need computing cycles for this collaborative effort. Please let me know if you need some.

More news coming soon…


We had a very good discussion at the Zing Conference on Protein and RNA Structure Prediction (many thanks to Andrzej Kloczkowski for organizing the discussion panel and to John Moult for chairing it).

A number of people were very supportive of the idea while others expressed some doubts.

The good news is that there are 10 groups that are willing to participate in this effort. Andriy Kryshtafovych, one of the CASP organizers, suggested starting with CASP ROLL, as this allows plenty of time to work on each target.

The two main concerns were:

  • Collaborations already happen in CASP and therefore, there is no need to organize this collaborative effort. As a member of the audience pointed out, the collaborative effort will allow groups and individuals that currently do not participate in CASP to join the CASP groups in the discussion and prediction of some targets. Furthermore, the goal of this collaborative effort is not to replace the current way people collaborate in CASP but to offer a new way to connect people and leverage their skills. Moreover, if collaboration between 2 groups has proven successful, then on probabilistic grounds alone there is a good chance that collaboration among several groups will be equally or more fruitful.
  • Sharing and stealing ideas. To address this concern we will set up a wiki system (see below) that will keep track of everybody’s contributions within a password-protected environment. In addition, those who do not feel comfortable sharing their methods or ideas can still share some results without revealing the method used to achieve them. We’re interested in the aggregation of methods and/or ideas to solve a difficult problem.

What’s next? This week we’ll set up the web infrastructure for this collaboration. We will use this blog for the open discussion and will set up a wiki (that uses the OpenID protocol) for issues specific to CASP participation and predictions. Anybody will be able to register on the wiki, and only registered users will be able to read its contents and add their comments and/or data. I will let you know when it is ready.

I have been participating in CASP since 1998 in the ab‐initio/free modeling (FM) category. Our physics‐based method was very computationally intensive and I was as convinced then as I am now that using human knowledge and intuition to guide the search process could significantly accelerate the time to solution. We began developing a tool to support human‐computer interaction in 2001 and we had a prototype ready to try the human steering approach in 2003.

Although we found that the combination of human intuition and computer power was an improvement over the original method, we realized that it was not good enough. The main reason was that the interaction process was focused on our physics‐based method and, although this had strengths, it also had many weaknesses. What we actually needed to crack this difficult puzzle was the aggregation of different methods and ideas. However, a large‐scale collaboration in the context of CASP didn’t seem feasible then.

In 2009 the mathematician Timothy Gowers created a project called Polymath. The Polymath project had one ambitious goal: doing math research in an open way. Using blogs and wikis to mediate a fully open collaboration, this project let any person follow along and contribute ideas to the solution. And it worked! The contributors found the proof of the density Hales‐Jewett theorem and published 2 papers about their research. Currently, there are 8 active Polymath projects.

Since I couldn’t attend the CASP9 meeting and I wanted to prepare a review of the results in the free modeling category, I read the assessors’ report available at the Prediction Center site. This report includes some discussion points from the FM roundtable. The following comments from roundtable participants suggest that the community is finally ready for a collaborative effort:

“We are stuck in a very deep local minimum.”

“The present dead‐lock situation in CASP comes from the fact that almost all participants apply the same methods, there are no innovators.”

“Get these top predictors to work as a group to solve these tough problems rather than perfecting one method of their own.”

“The less impossible scenario is to have one open‐source platform for the whole community, like SBML or Cytoscape, where developers in the field contribute to it without any reservation.”


After reading these comments I felt encouraged to write an email message to about 20 representatives of different groups to find out how receptive they were to the idea of a collaborative effort during CASP. The feedback was very positive and encouraged me to scale up the conversation.


The proposal

Tim Gowers started the Polymath conversation with a post on his blog entitled “Is massively collaborative mathematics possible?” and, based on the results, the answer is: yes, it is!

The question for us is: Is a massively collaborative group effort possible during CASP?

We don’t know the answer yet, but a number of CASP participants think we should try it, and that by itself is a very positive step forward!

The idea is to have a forum for online discussion and collaboration during CASP10 (or even sooner with the Rolling CASP experiment). The invitation would be open to groups and individuals who want to participate in this experiment by sharing their insight and observations, their data, their most recent code, or whatever they’d like to contribute. This collaborative experiment will let different groups or individuals work on different components of the prediction pipeline, making it possible to leverage expertise like never before. For example, one group posts the best alignment, another group models the loops, a person ponders a scoring function, and another group implements those ideas and applies the scoring function to the current set of structures, and so on.
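One way to picture this division of labor is to treat each pipeline stage as an interchangeable component contributed by a different group. The Python sketch below is purely hypothetical: the stage names, group labels, target ID and toy “model” (a dict of annotations) are placeholders, not WeFold or CASP code. It only shows how a few contributed components already yield several distinct combined pipelines.

```python
from itertools import product

# Hypothetical stage implementations, keyed by (made-up) contributing group.
# Each stage takes a model description (a dict) and returns an updated copy.
ALIGNERS = {
    "group_A": lambda m: {**m, "alignment": "A"},
    "group_B": lambda m: {**m, "alignment": "B"},
}
LOOP_MODELERS = {
    "group_C": lambda m: {**m, "loops": "C"},
}
SCORERS = {
    "group_D": lambda m: {**m, "score_fn": "D"},
    "group_E": lambda m: {**m, "score_fn": "E"},
}


def run_pipeline(target, stages):
    """Apply the stages in order to a fresh model for the target."""
    model = {"target": target}
    for stage in stages:
        model = stage(model)
    return model


def mix_and_match(target):
    """Build and run one pipeline per combination of contributed components."""
    results = {}
    for (a, align), (l, loops), (s, score) in product(
        ALIGNERS.items(), LOOP_MODELERS.items(), SCORERS.items()
    ):
        results[(a, l, s)] = run_pipeline(target, [align, loops, score])
    return results


pipelines = mix_and_match("T0001")  # hypothetical target ID
print(len(pipelines))  # 2 aligners x 1 loop modeler x 2 scorers = 4
```

The number of candidate pipelines grows multiplicatively with each new contribution, which is why choosing which models to submit becomes a question the group has to settle.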

What would be the goal? The immediate goal would be to submit predictions for a small group of targets. The long‐term goal would be to shake up the field and push it out of its “very deep local minimum”.

What would be the advantage of proceeding this way? There are some potential advantages to this approach. Michael Nielsen, co‐author of the Polymath project paper published in Nature, addresses this issue in his blog very eloquently; therefore, I will paraphrase his words here.

First, you can think of this forum as a way of scaling up the scientific conversation, so that conversations can become widely distributed in both time and space. Today, only a small group of people have the opportunity to listen as D. Baker, Y. Zhang, J. Skolnick (fill in the name of your favorite protein scientist here) brainstorm with the members of their labs or in the post-CASP meeting; why not have hundreds of talented people listening in? Why not enable people who have specific expertise to contribute their insights, combine all that insight, and test the combined knowledge during CASP?

The exchange should be informal and rapid fire: let’s shoot ideas, let’s combine ideas, let’s test these ideas!

Second, as Gaurav Chopra put it, a collaborative effort makes perfect sense given how a prediction pipeline divides into parts, such as alignment, scoring and refinement, each of which can be handled by the best methods and by the people who specialize in it.

Finally, another benefit of this forum is to make the conversations searchable so that future protein modelers can also benefit from this insight.

In summary, advantages of this open collaboration include:

  • Aggregate methodologies and knowledge
  • Leverage knowledge and strengths of different groups
  • Encourage CASP outsiders to contribute their insight and perspectives
  • Encourage groups that focus on specific aspects of protein structure prediction to contribute
  • Make expertise searchable


The resources

We are very fortunate to have the support of the National Energy Research Scientific Computing Center (NERSC). They will help us get the infrastructure (web services, wikis, groupware, file sharing, etc.) that we need to run this collaborative effort.


The ground rules

Of course, there is a lot to be discussed about how to implement this collaborative effort, how to choose the targets, how to choose the models for submission, etc. Let’s start with some of the ground rules set by Gowers for his Polymath project and then add more specific rules for this project.

1. Comments should be concise.

2. Comments should be easy to understand so others get the idea and can build on it.

3. Stupid comments are welcome. Not stupid like “unintelligent” but stupid like not fully thought through.

4. If you can see why somebody else’s comment is stupid, point it out in a polite way. And if someone points out that your comment is stupid, do not take offense.

5. Don’t actually use the word “stupid”.

6. If you are convinced that you could answer a question but it would just need a couple of weeks to polish your thoughts and try a few things out, then resist the temptation to do that. Instead, explain concisely why you think it is feasible to answer the question and see if the collective approach gets to the answer more quickly. Only go off on your own if there is a general consensus that that is what you should do.

7. If you think of an approach that is completely different from the current one then you should suggest starting a different track. We need to decide how we’re going to choose which models to submit.

8. If the experiment results in something publishable then the paper should be submitted under a pseudonym with a link to the people who contributed and the entire online discussion.

9. The collaborative experiment should focus on a small number of targets, leaving plenty of time for the individual groups that want to participate in CASP to do so as usual.


A final invitation

I hope you’re interested in joining me in these discussions. We’re going to discuss this collaborative effort at the Zing Conference on Protein and RNA Structure Prediction next week. I’ll post a summary here so that everyone knows what we discussed and can comment further.

We can’t guarantee that this experiment will produce groundbreaking results. However, no matter what the short‐term results may be, let’s hope that it will create the spark to fire up the field and to speed up the rate at which discoveries are made.