Data and Code
This section of my site links to the datasets and code that I actively maintain and have made publicly available.
S4PRED
A state-of-the-art single-sequence secondary structure predictor for proteins. The code for the S4PRED tool is available on GitHub. The S4PRED data is hosted on the Jones Group servers of UCL Computer Science.
- Model parameter files (Download)
- Real-labelled training set (Download)
- Real-labelled validation set (Download)
- Real-labelled test set (Download)
- Pseudo-labelled training set (Download)
See the GitHub repository for further details on the data, the code, and how to use the tool. If you'd like to use the S4PRED model without running it on your own hardware, try the PSIPRED Workbench.
DARK
A tool for generating, scoring, and designing massive numbers of hallucinated protein sequences. The code for the DARK tool is available on GitHub.
- Model parameter files
- Training set
- Validation set
- Test set
All of this data is currently accessible through Google Drive. This is a temporary location, so please see the relevant guidance here for downloading these datasets directly.