SPREAD: Spatial Phylogenetic Reconstruction of Evolutionary Dynamics

Authors
Filip Bielejec (filip.bielejec@rega.kuleuven.be)
Philippe Lemey (philippe.lemey@rega.kuleuven.be)
Andrew Rambaut (a.rambaut@ed.ac.uk)
Marc Suchard (msuchard@ucla.edu)
Contact
Filip Bielejec (filip.bielejec@rega.kuleuven.be)
Download
A compiled, runnable program can be downloaded from: http://www.phylogeography.org/SPREAD.html
Get the source code on GitHub: https://github.com/phylogeography/SPREAD . You can download this project in either zip or tar formats. You can also clone the project with Git by running:
$ git clone git://github.com/phylogeography/SPREAD
  
In this supplement we give a detailed description of the program's functionality, describe some possible analyses and give general guidelines on running SPREAD. We present an example for each of the possible analyses. The example files used in the visualisations can be accessed from http://www.phylogeography.org/SPREAD.html (right click and ’save target as’).

1 Recommended platform, hardware and software for SPREAD

SPREAD will run on a variety of platforms, as long as a suitable Java Runtime Environment (JRE) is present. Recommended JREs include OpenJDK and Sun JRE, but the application will also run on other runtime environments. SPREAD has been tested on well-established operating systems, namely Debian GNU/Linux, Mac OS X and Windows XP, as well as on more esoteric ones such as Nokia Linux Maemo 5 and Linux MeeGo. For most of the templates the following hardware is sufficient:
The Time Slicer analysis is more resource-hungry, so we recommend running it with the following hardware:

2 Visualising a location-annotated MCC tree

This tutorial assumes the user has generated a location-annotated Maximum Clade Credibility (MCC) tree. A good tutorial on doing so can be found in the Tree Summary section of the BEAST tutorial wiki (http://beast.bio.ed.ac.uk/Tree_summary).
The example tree file for this analysis can be found at http://www.kuleuven.be/aidslab/phylogeography/SPREAD_files/H5N1_HA_discrete_MCC.tre and the location coordinates file at http://www.kuleuven.be/aidslab/phylogeography/SPREAD_files/locationCoordinates_H5N1.txt. This example considers influenza A H5N1 diffusion among 7 discrete locations. We will visualise this data using SPREAD's own map and virtual globe software.

Loading the data

Click on the Discrete Model tab, then on the Load tree file button, and navigate to the location of your MCC tree file to load it.
figure fig01.png
Figure 1 Loading MCC tree file.
SPREAD will now set the working directory to the one containing your file. This means that KML output generated by SPREAD will be saved in this directory.
figure fig02.jpg
Figure 2 Message in Terminal.
To view the MCC tree in its geographic context, we have to associate each location with a particular latitude and longitude. To do this, you can either use the editor supplied with SPREAD to generate the input file, or load a previously prepared tab-delimited file listing each location with its latitude and longitude. For the H5N1 example, the file should look like this:
Fujian 25.917 118.283
Guangdong 22.87 113.483
Guangxi 23.6417 108.1
Hebei 39.3583 116.6417
Henan 33.875 113.5
HongKong 22.3 114.167
Hunan 27.383 111.517
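If you prepare this file outside SPREAD, a short script can check it before loading. The Python sketch below is not part of SPREAD: the function name read_locations and the range checks are assumptions made for illustration, and it expects one tab-delimited "name, latitude, longitude" record per line as described above.

```python
def read_locations(text):
    """Parse tab-delimited 'name<TAB>latitude<TAB>longitude' lines into a dict.

    Hypothetical helper for illustration; not SPREAD's own parser.
    """
    locations = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 tab-delimited fields, got {len(fields)}"
            )
        name = fields[0]
        latitude = float(fields[1])
        longitude = float(fields[2])
        # Assumed sanity checks: decimal degrees within the usual bounds.
        if not -90.0 <= latitude <= 90.0:
            raise ValueError(f"line {line_number}: latitude {latitude} out of range")
        if not -180.0 <= longitude <= 180.0:
            raise ValueError(f"line {line_number}: longitude {longitude} out of range")
        locations[name] = (latitude, longitude)
    return locations


example = "Fujian\t25.917\t118.283\nGuangdong\t22.87\t113.483\n"
coordinates = read_locations(example)
print(len(coordinates), "locations read")
```

To check a file on disk, read its contents with open(path).read() and pass the text to the function.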
figure fig003.png
Figure 3 Location coordinates editor.
Go to the Discrete Model tab and click on the Setup location coordinates button. You should see the list of discrete locations parsed from your tree. Click on the appropriate fields and fill them with latitude and longitude coordinates. When you have finished editing, save the file and load it by clicking on the Done button. Alternatively, use the Load button and navigate to your previously prepared location coordinates file to load it.
In either case, when you click on the Done button the Terminal tab should show how many discrete locations have been read by SPREAD, with their names and corresponding latitude and longitude coordinates. If you forgot to save the edited file, you can still copy and paste this output.
figure fig0003.png
Figure 4 Terminal output.

Setting the visualisation attributes

Now that your data is loaded, you can start setting the attributes.
figure fig03.png
Figure 5 Color chooser.

Generating the visualisations

Once you are satisfied with the specified attributes, click on the Generate KML button to generate output for viewing in virtual globe software. If everything went fine, you should see a message in the Terminal tab indicating how long it took SPREAD to generate the file. If something went wrong, SPREAD will also show a warning or an error message there.
figure fig04.jpg
Figure 6 Message in Terminal.
The visualisation can be opened for viewing in Google Earth (www.google.com/earth/). Once opened in Google Earth, the visualisation will have a slider indicating the time component of the tree. Clicking on the play button starts an animation of the viral diffusion through time.
Click on the Plot map button in the SPREAD menu to view the visualisation in SPREAD's built-in map.
figure H5N1_discrete_vis.png
Figure 7 Screenshot of the SPREAD output.

3 Identifying well-supported rates using a Bayes factor test

This tutorial assumes the user has generated a BEAST log file with rate indicators, following the Bayesian stochastic search variable selection (BSSVS) procedure described at http://beast.bio.ed.ac.uk/BSSVS. The test aims to identify the rates that are frequently invoked to explain the diffusion process and to visualise them in virtual globe software or using SPREAD's own map.
The log file used in this example can be accessed from http://www.kuleuven.be/aidslab/phylogeography/SPREAD_files/H5N1_HA_discrete_rateMatrix.log and the location coordinates file can be downloaded from http://www.kuleuven.be/aidslab/phylogeography/SPREAD_files/locationCoordinates_H5N1.txt.

Loading the data

Go to the Rate Indicator BF tab and load your log file, using the appropriate button and the supplied file chooser to navigate to the directory containing it.
figure fig06.png
Figure 8 Loading the log file.
To analyse the results in the log file we will need a tab-delimited file with location names and their latitude and longitude coordinates. For the H5N1 example this file should look like this:
Fujian 25.917 118.283
Guangdong 22.87 113.483
Guangxi 23.6417 108.1
Hebei 39.3583 116.6417
Henan 33.875 113.5
HongKong 22.3 114.167
Hunan 27.383 111.517
You can either use the location coordinates editor to prepare it, or load a previously saved file. In either case, when you click on the Done button the Terminal tab should show how many discrete locations have been read by SPREAD, with their names and corresponding latitude and longitude coordinates. If you forgot to save the edited file, you can still copy and paste this output.
figure fig006.png
Figure 9 Terminal output.

Setting the visualisation attributes

Now that your data is loaded, you can start setting the attributes.

Generating the visualisations

Once you are happy with the specified plotting attributes, click on the Generate KML button. SPREAD will now write the KML file to its current working directory; you can view this file using Google Earth or any other software capable of reading the format. You can also see the visualisation using SPREAD's own map by clicking the Plot button.
figure H5N1_bf_vis.png
Figure 10 Generated visualisations.
Both plotting and generating the KML output file should print to the Terminal tab a list of the rates yielding a Bayes factor above the specified cut-off. You can copy and paste this output for later use. By default the rates are sorted in ascending order.
figure fig08.png
Figure 11 Terminal tab output.
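The idea behind the Bayes factor for a single rate can be illustrated with a short Python sketch. It assumes the usual posterior-odds-over-prior-odds form, with the posterior probability of a rate taken as the mean of its 0/1 indicator column in the log file and the prior probability supplied by the caller. This text does not state the exact formula SPREAD applies, so the function name bayes_factor, the parameter prior_prob and the sample values are all hypothetical.

```python
def bayes_factor(indicator_samples, prior_prob):
    """Posterior odds divided by prior odds for one rate indicator.

    indicator_samples: 0/1 values of one indicator across posterior samples.
    prior_prob: assumed prior probability that the rate is included.
    """
    if not indicator_samples:
        raise ValueError("no indicator samples supplied")
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must lie strictly between 0 and 1")
    posterior_prob = sum(indicator_samples) / len(indicator_samples)
    if posterior_prob >= 1.0:
        # The rate is included in every sample, so the posterior odds are unbounded.
        return float("inf")
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


# Hypothetical indicator column: the rate is switched on in 3 of 4 samples.
samples = [1, 1, 1, 0]
print(bayes_factor(samples, prior_prob=0.5))
```

A rate would then be reported when this value exceeds the chosen cut-off.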

4 Visualising a continuous MCC tree

This tutorial assumes the user has set up a BEAST phylogeographic analysis in continuous space and generated a Maximum Clade Credibility (MCC) tree using TreeAnnotator. A good tutorial on the necessary steps can be found at http://beast.bio.ed.ac.uk/Continuous_phylogeographic_analysis.
This visualisation projects the MCC tree onto a grid of geographical coordinates, with polygons representing the uncertainty in location coordinates. The example considers raccoon rabies diffusion in the north-eastern United States.
The example tree file used in the presented analysis can be accessed from here http://www.kuleuven.be/aidslab/phylogeography/SPREAD_files/RacRABV_cont_0.8_MCC_snyder.tre.

Loading the data

Click on the Continuous model tab and load your MCC tree into SPREAD. If you now open the Terminal tab, SPREAD will show a message with the path to your file, indicating that it has set the working directory to the one your file is in (generated output will be saved there).
figure fig09.png
Figure 12 Loading tree file.

Setting the visualisation attributes

Generating the visualisations

Once you are satisfied with the specified attributes, click on the Generate KML button to generate output for viewing in virtual globe software. If everything went fine, you should see a message in the Terminal tab indicating how long it took SPREAD to generate the file. The generated output can now be opened for viewing in Google Earth (www.google.com/earth/). Once opened in Google Earth, the visualisation will have a slider indicating the time component of the tree. Clicking on the play button starts an animation of the viral diffusion through time. Click on the Plot map button in the SPREAD menu to view the visualisation in SPREAD's built-in map.
figure RacRABV_cont_vis.png

Figure 13 Screenshot of the SPREAD output.

5 Summarising full posterior distribution

This tutorial assumes the user has set up a BEAST phylogeographic analysis in continuous space and generated a trees file, as well as an MCC tree file using TreeAnnotator. A good tutorial on the necessary steps can be found at http://beast.bio.ed.ac.uk/Continuous_phylogeographic_analysis.
This analysis allows one to summarise and visualise the full posterior distribution of the trees obtained in a continuous phylogeographic analysis. To achieve this, SPREAD creates a time line according to the MCC tree length, slices through each phylogeny at particular points in time, imputes the unobserved descendant locations for those time points, and contours them by creating polygons, a natural representation of the uncertainty in these inferences. In this template we also visualise the MCC tree that gave rise to the time slices by drawing its branches.
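The slicing step can be illustrated with a minimal Python sketch. It is a simplification and not SPREAD's implementation: where SPREAD imputes the unobserved location, the sketch substitutes plain linear interpolation along the branch, and the function name slice_branch, the (height, latitude, longitude) tuples and the example values are hypothetical.

```python
def slice_branch(parent, child, slice_height):
    """Return the (latitude, longitude) where a branch crosses slice_height.

    parent and child are (height, latitude, longitude) tuples, with the
    parent node at the greater height. Returns None if the branch does not
    span slice_height. Linear interpolation is an assumed stand-in for the
    imputation described in the text.
    """
    parent_height, parent_lat, parent_lon = parent
    child_height, child_lat, child_lon = child
    if not child_height <= slice_height <= parent_height:
        return None
    if parent_height == child_height:
        return (child_lat, child_lon)  # zero-length branch
    fraction = (slice_height - child_height) / (parent_height - child_height)
    latitude = child_lat + fraction * (parent_lat - child_lat)
    longitude = child_lon + fraction * (parent_lon - child_lon)
    return (latitude, longitude)


# Two hypothetical branches sharing one parent node.
branches = [
    ((10.0, 40.0, -75.0), (0.0, 42.0, -71.0)),
    ((10.0, 40.0, -75.0), (6.0, 41.0, -77.0)),
]
crossings = [slice_branch(parent, child, 5.0) for parent, child in branches]
points = [point for point in crossings if point is not None]
print(points)
```

Collecting such points across all trees in the posterior sample gives, for each slice, the cloud of locations that is then contoured into polygons.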
 
Since SPREAD version 1.0.2, the user can choose to supply custom slice heights instead of having them defined automatically according to the MCC tree. This can be done by choosing the appropriate analysis type and then loading a text file with the custom slice heights, listed in a single column and in ascending order (from the tips to the root of the phylogeny).
figure fig10.png
Figure 14 Choosing analysis type.
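A file of custom slice heights could be prepared with a short script such as the Python sketch below. Only the single-column, ascending layout comes from the description above; the file name slice_heights.txt, the root height, the number of slices and the even spacing are illustrative assumptions.

```python
def slice_heights(root_height, n_slices):
    """Return n_slices evenly spaced heights in ascending order, ending at root_height."""
    if root_height <= 0 or n_slices < 1:
        raise ValueError("root_height must be positive and n_slices at least 1")
    step = root_height / n_slices
    return [step * index for index in range(1, n_slices + 1)]


def format_heights(heights):
    """Format heights as one value per line, checking that they ascend."""
    for previous, current in zip(heights, heights[1:]):
        if current <= previous:
            raise ValueError("slice heights must be in ascending order")
    return "".join(f"{height:g}\n" for height in heights)


def write_heights(path, heights):
    """Write the single-column slice heights file."""
    with open(path, "w") as handle:
        handle.write(format_heights(heights))


if __name__ == "__main__":
    # Hypothetical values: a tree 10 time units deep, cut into 5 slices.
    write_heights("slice_heights.txt", slice_heights(root_height=10.0, n_slices=5))
```

Heights that are not evenly spaced can be passed straight to write_heights as a list, provided they are already in ascending order.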
 
We will use the same raccoon rabies data as in the continuous tree visualisation. The MCC tree file used in this analysis can be downloaded from http://www.kuleuven.be/aidslab/phylogeography/SPREAD_files/RacRABV_cont_0.8_MCC_snyder.tre and the trees file from http://www.kuleuven.be/aidslab/phylogeography/SPREAD_files/RacRABV_cont.trees.
Note that this analysis is much more resource-hungry, and you might want to increase the heap space memory limit for your Java Virtual Machine. This can be done by starting SPREAD from the command line with the following command:
java -Xmx2024m -jar spread.jar

Loading the data

First you need to load the MCC tree that gives rise to the time intervals. Do this using the Load tree file button. Next, import the trees file using the Load trees file button. Once you are done, start setting the appropriate attributes.

Setting the visualisation attributes

Now that your data is loaded, you can start setting the plotting attributes. The Time Slicer template allows the following options to be specified:

Generating the visualisations

When you are satisfied with the specified attributes, click on the Generate KML button to export the results to Keyhole Markup Language (KML), or on the Plot button to visualise them on a map. Depending on the size of your trees file, the analysis might take some time. You can follow its progress in the Terminal tab.
Once SPREAD is done, you can open the resulting KML file in virtual globe software and animate the spatial diffusion over time.
figure RacRABV_Time_Slicer_vis.png
Figure 15 Generated output.

6 Tips & tricks

For some Mac users SPREAD fails to generate the KML output of the Bayes factor test, even though the Terminal output reports that it has done so. This problem can be fixed by starting SPREAD with more memory available. To do so, start SPREAD from the command line with the following arguments:
java -Xmx2024m -jar spread.jar