This is a tutorial on using the program PyRate for the analysis of cultural evolution. PyRate was originally developed by Daniele Silvestro and colleagues in a series of papers. I highly recommended reading these papers for additional details, Methods in Ecology and Evolution and Systematic Biology.

The first application of this methodology to cultural data can be found in Palgrave Communications.

Users of this tutorial should first start by creating a folder on the desktop title PyRate_CE_Tutorial as this folder will be where results are stored. Users should then download the PyRate program from Daniele Silvestro’s GitHub site at Simply click on the clone or download button and choose the download as zip option. Unzip the folder and place the contents of the pyrate_master folder in the PyRate_CE_Tutorial folder. (You will want to see all the files, such as, and pyrate_libraries in the PyRate_CE_Tutorial folder)

This tutorial we use a combination of R (for data manipulation and plotting) and Python (run through a terminal window) for analysis. R (required) and R Studio (recommended) can be downloaded from the following websites (R and R Studio). Commands that use Python will start with “python2.7” and should be run in a terminal window. All other commands will be run in R so they can be copied and pasted into the R console. (Note: This tutorial was created on and for Mac users, please adjust file paths as necessary for a Windows machine)

For more inexperienced users of the R programming environment, many functions require the installation of additional packages. For packages that are not currently installed on the user’s local machine they can be installed by the following commands. (Note: a list of installed packages on the user’s local machine can be found by checking for package updates under the Tools menu)


For this research the following packages will be necessary: RCurl; dplyr; stringr; scales; laser; tidyr

Once the packages have been installed, they will need to be loaded into the current R session (and reloaded every time that R is restarted). Loading libraries can be done with the following commands


Getting the raw data from github

In this tutorial, we will be working with two data sets. The first contains a list of all the car models produced by American manufacturers between 1896 and 2014. This data was used in the Palgrave Communications paper (Gjesfjeld et al. 2016). By going through the tutorial similar results should be obtained. The second data set contains a list of archaeological pottery ware types from the American Southwest that have had there names and dates randomized. Using the commands below, load the data into R from my github site, they data can also be downloaded manually from

car_models <- read.csv(text=getURL(""),header=T)

The first thing we want to do is look at the structure of our data. We can look at the first few entries by using the following command:

##              make           model first_year last_year
## 1        Allstate           A-230       1952      1953
## 2        Allstate           A-240       1952      1953
## 3      AM_General             DJ5       1969      1974
## 4      AM_General             FJ8       1973      1974
## 5      AM_General          Hummer       1992      2001
## 6 American_Austin American_Austin       1930      1935
##           Ware most_recent_age..BP. oldest_age..BP. trait
## 1 Brown_Ware_A                   66             265     1
## 2 Brown_Ware_A                   66             265     5
## 3 Brown_Ware_A                   66             265     3
## 4 Brown_Ware_A                 1130            1265     5
## 5 Brown_Ware_A                 1130            1265     1
## 6 Brown_Ware_A                 1165            1365     2

We can quickly see that the format of the two data sets is different. The car models has a date for the first and last production year of each model, or lineage. In contrast, the pottery data set has multiple dates for each type of ware. This is because these data sets are at different scales. The pottery data contains information for every pottery sherd in our analysis, where as the car model only shows information for each car model (not every single car).

The important difference is that with the pottery data set, we have enough additional information in order to evaluate each pottery ware based on multiple dates. This is particularly useful for archaeological data where we might not be sure that the most recent and oldest ages are completely accurate. Given the detailed historical record of car models, we have high confidence that the first and last years of production for each car model is correct, so we are more interested in knowing the origination and extinction rate rather than the preservation rate. Throughout this tutorial I will refer to the pottery data set as an occurrence data set (as we have dates for each individual occurrence) and the car models as a lineage data set (where we have dates for each lineage). Of course, we can quickly change the pottery data set from occurrence data to lineage data by summarizing the most recent and oldest dates for each ware type into a single entry (i.e. Brown_Ware_A 66 1465). However, we this should only be done if the research is highly confident in the dates being used.

Please also not that the dates in both data sets are in years before present (or BP). This means that 0 is the most recent time unit and the higher the number the further back in time. Please make sure that the your own data follows this date structure (i.e. please do not use Gregorian calendar dates).

Formatting Data For PyRate

Lineage data (car model) for PyRate

In order to use lineage data with PyRate, we must further transform our data into a structure that can be read by the program. We will use the function lineage_pyrate. The main output of this function is an R object and text file titled species_pyrate_data.

##   clade species ts te
## 1     1     307 63 62
## 2     1     308 63 62
## 3     2     861 46 41
## 4     2    1117 42 41
## 5     2    1269 23 14
## 6     3     394 85 80

We can see that the names of the car models and manufacturers have been replaced by numeric values. The lineage_pyrate function also produces two csv files which provide the keys for linking the numeric values to their names, these can be found in your desktop folder. In all of these files we are loosely borrowing from biology nomenclature by referring to the car models as species and the manufacturers of cars as clades. These terms are used in the most generic way possible with species representing our smallest taxonomic unit (car models or lineages) and clades representing a way to group species, in this case by their manufacturer.

It is possible that you may not have a adequate grouping variable (clade) for lineages. If so, simply adjust the numeric clade value to 0 using the following command.


We will also want to write a text file to be used with the PyRate analysis later on.

write.table(species_pyrate_data,"~/Desktop/PyRate_CE_Tutorial/lineage_pyrate_data.txt",quote=FALSE,sep="\t",row.names = FALSE)

Formatting Ocurrence Data for PyRate

As discussed above, data sets in occurrence format (pottery_data) provide the opportunity to incoporate uncertainty about the preservation of artifacts when analyzing origination and extinction rates. If a researcher does not have a high level of certainty that the dates of first and last occurrence are correct, such as is often the case with fossils or artifacts, preservation can be informative. Please see the discussion by Silvestro et al. (2014) for further details. This function will produce an R object title occurrence_pyrate_data that structures the pottery data into four categories.

##        Species  Status min_age max_age trait
## 1 Brown_Ware_A extinct      66     265     1
## 2 Brown_Ware_A extinct      66     265     5
## 3 Brown_Ware_A extinct      66     265     3
## 4 Brown_Ware_A extinct    1130    1265     5
## 5 Brown_Ware_A extinct    1130    1265     1
## 6 Brown_Ware_A extinct    1165    1365     2

The four columns in this object are: Species: which once again refers the smallest taxonomic unit which is each pottery sherd; Status: which identifies whether the sherd or ware is still used today, with archaeological data these are all extinct but could be extant with modern technological data min_age: this refers to the most recent age assigned to each sherd (dates should be in BP) max_age: this refers to the oldest age assigned to each sherd (dates should be in BP) trait: this refers to some discrete trait that is known about each entry. In this example, each trait could represent an archaeological site in which the sherd was found at.

This function writes also writes a text file directly to the PyRate_CE_Tutorial folder on your desktop. This text file then needs to be modified into a python readable file for analysis in PyRate.This is accomplished by calling the pyrate_utilities.r file which will extract the ages and create a readable python file


Additional options, such as replicates, can be included in the extract.ages function. See Silvestro et al. 2015 for details.

A python file titled should have been created and available in the PyRate_CE_Tutorial folder.

Summary Plots

Prior to starting the PyRate analysis, it is good practice to get a sense of the data by developing a series of plots. Using the function below, the following plots are created

  1. Log Lineages Through Time
  2. Cumulative Diversity Through Time
  3. Average Lifespan Through Time
  4. Histogram of Lifespans

Car Data (Lineage Data)

summary_plots(species_pyrate_data,5) #The numeral indicates the x-axis increments with the number 5 indicating 5-year time bins