Dynamic Risk Scales Decay Over Time: Evidence for Reassessment
By Seung C. Lee, Kelly M. Babchishin, Kimberly P. Mularczyk and R. Karl Hanson
Abstract
Many forensic evaluators (e.g., psychologists, psychiatrists) and community supervision officers use dynamic (i.e., putatively changeable) risk assessment tools to determine the reoffending risk of justice-involved individuals. Although developers of dynamic risk tools recommend regular reassessments, there is little research examining the extent to which the predictive accuracy of dynamic risk assessments decays over time. This study investigated how the predictive accuracy of two popular dynamic risk assessment tools for sexual reoffending, the ACUTE-2007 and the STABLE-2007, decays over time, using two independent samples of men under community supervision adjudicated for sexual offences (N = 795 for Study 1; N = 4,221 for Study 2). Overall, reassessments using the ACUTE-2007 and STABLE-2007 predicted sexual, violent, and any recidivism (including technical violations) better than the initial assessments with those tools. This study further found that the more recent the assessment (i.e., the closer in time to the outcome), the better its predictive accuracy. In conclusion, reassessments using the ACUTE-2007 should occur at every meeting with supervisees (e.g., probationers, parolees), or at least every 30 days during community supervision. For the STABLE-2007, the current results support reassessment every six months rather than every 12 months. Because updating STABLE-2007 scores requires more extensive effort (e.g., an interview and a review of file information), however, the decision about when to update the STABLE-2007 needs to balance the greater predictive accuracy of more recent assessments against the cost of new assessments.
Author’s Note
The views expressed are those of the authors and do not necessarily reflect those of Public Safety Canada. Correspondence concerning this report should be addressed to:
Research Division
Public Safety Canada
340 Laurier Avenue West
Ottawa, Ontario
K1A 0P8
Email: PS.CPBResearch-RechercheSPC.SP@ps-sp.gc.ca
Acknowledgements
The authors would like to thank Leigh Greiner (BC Corrections) for providing access to this dataset and L. Maaike Helmus for merging the administrative datasets and saving us considerable work in the process. The authors would like to thank Amel Loza-Fanous for her feedback on this report.
Declaration of Conflicting Interests: R. Karl Hanson is a co-author and a certified trainer for STABLE-2007 and ACUTE-2007. The Government of Canada holds the copyright for these measures, and none of the authors receives royalties from these measures.
Introduction
Risk assessment is a central component of effective correctional interventions. Correctional officers use risk assessment tools to identify individuals at high risk for reoffending. Risk assessment tools also help indicate which individuals may require more intensive interventions (the Risk principle) to manage their criminogenic needs and reduce the likelihood of reoffending (the Need principle; Andrews & Bonta, 2010; Bonta & Andrews, 2017; Hanson et al., 2015). A wide variety of structured risk assessment tools have been developed to classify the risk levels of individuals adjudicated for different types of offences (e.g., violent, sexual, and general offences; Bourgon et al., 2018; Kelley et al., 2020; Neal & Grisso, 2014). These structured tools show broadly similar accuracy in predicting reoffending (Campbell et al., 2009; Hanson & Morton-Bourgon, 2009; Tully et al., 2013; Yang et al., 2010).
Structured risk assessment tools include factors that can be classified as static or dynamic (Bonta & Andrews, 2017; Hanson, 1998). Risk scores based on static risk factors (fixed features of individuals, such as demographics or criminal history) can inform estimated recidivism risk and intervention strategies (Hanson et al., 2017). Static risk tools, however, are poorly equipped to assess reductions or increases in risk-relevant factors (Hanson, 1998; Harris & Hanson, 2010). Dynamic risk tools, in contrast, include risk factors that are amenable to change or intervention. Consequently, dynamic tools may require more professional expertise and time to score than simple static risk tools, but they have greater potential for supporting inferences about treatment needs and case formulation (Polaschek & Yesberg, 2018; Wong et al., 2009). The items on a dynamic assessment can help identify the characteristics of an individual that are amenable to change and that, when targeted, can reduce their overall risk of reoffending.
Dynamic risk factors can be further defined as being stable or acute. Stable dynamic risk factors are durable, changing infrequently over months to years (Hanson & Harris, 2000; Serin et al., 2019). Examples of stable dynamic risk factors include emotion regulation, poor impulse control, problem-solving, and work ethic (Andrews & Bonta, 2010; Polaschek & Yesberg, 2018; Zamble & Quinsey, 1997). Acute dynamic risk factors are those that change quickly, over minutes to days (e.g., access to victims; Hanson & Harris, 2000; Serin et al., 2019). Reoffending is theorized to occur when environmental triggers interact with stable dynamic risk factors (enduring psychological vulnerabilities), which can be buffered or amplified by acute dynamic risk factors (Polaschek & Yesberg, 2018; Zamble & Quinsey, 1997). When risk-related changes in the environment occur, the assessment of acute risk factors (e.g., loss of employment, victim access) can be particularly crucial. Thus, circumstantial changes in the lives of individuals serving sentences in the community can lead to violations of parole or new offences (Andrews & Bonta, 2010; Douglas & Skeem, 2005; Serin et al., 2019).
Given that both static and dynamic risk tools offer benefits, evaluators commonly use both types. Static tools can be used to estimate recidivism risk in tandem with measures of dynamic risk factors, which help therapists and case managers make informed “real-time” adjustments to their interventions (Seto & Fernandez, 2011). There is widespread agreement that risk to reoffend changes (Blumstein & Nakamura, 2009; Hanson, 2018; Harris & Rice, 2007); despite this, little research has examined when and to what extent risk to reoffend, as measured by dynamic risk tools, changes (e.g., decays), and whether incorporating observed changes on dynamic tools improves prediction.
Assessing Change in Recidivism Risk
Reassessment of dynamic risk assessment tools improves the prediction of criminal recidivism (Babchishin & Hanson, 2020; Cohen et al., 2016; de Vries Robbé et al., 2015; Howard & Dixon, 2013; Labrecque et al., 2014; Lloyd et al., 2020; for an exception, see Viljoen et al., 2017). Specifically, reassessment has been found to add incremental predictive validity to initial risk assessments, and the most proximal risk assessments predicted reoffending best. Promising results have also been found for dynamic risk tools designed specifically to measure sexual recidivism (Babchishin & Hanson, 2020; van den Berg et al., 2018). Given that the most recent risk score predicts reoffending better than the initial risk score (e.g., Babchishin & Hanson, 2020; Hanson et al., 2021; Lloyd et al., 2020), it follows that the predictive accuracy of dynamic tools decays after a certain amount of time has passed (i.e., the decay of predictive accuracy due to risk-relevant change). The decay of predictive accuracy may be more pertinent for acute risk factors than stable risk factors. Acute risk factors are meant to change rapidly, in contrast to stable risk factors that are considered relatively enduring qualities.
Dynamic Predictions of ACUTE-2007 and STABLE-2007
The ACUTE-2007 (Brankley et al., 2019; Hanson et al., 2007) and STABLE-2007 (Fernandez et al., 2014; Hanson et al., 2007; Hanson et al., 2015) are two dynamic tools designed to assess the likelihood of sexual recidivism. Both the ACUTE-2007 and STABLE-2007 are widely used by correctional officers, forensic experts, and mental health practitioners (Bourgon et al., 2018; Hill & Demetrioff, 2019; Kelley et al., 2020; Neal & Grisso, 2014). The STABLE-2007 includes thirteen dynamic items assessing atypical sexual interests, emotional identification with children, and relationship stability. The ACUTE-2007 evaluates imminent indicators of risk, such as preoccupation with sexual fantasies and victim access (Harris & Hanson, 2010). The chronic propensities associated with sexual recidivism are those assessed by the STABLE-2007, broadly grouped into sex crime specific factors (atypical sexual interests, emotional congruence with children, low sexual self-regulation) and general criminality (antisocial peers, hostility, impulsivity, opposition to supervision; Brouillette-Alarie & Hanson, 2015).
Early research on the predictive accuracy of the ACUTE-2007 and STABLE-2007 focused on scores from the first risk assessments conducted. These studies found that the first assessment on these dynamic risk tools predicted recidivism and still discriminated between recidivists and non-recidivists up to five years later (Brankley et al., 2021; Hanson et al., 2007; Hanson et al., 2015). Further, the first dynamic risk assessment scores incrementally contributed to predictive accuracy after accounting for static scores (e.g., Static-99/R; Babchishin & Hanson, 2020; Hanson et al., 2007; Helmus et al., 2021). However, consistent with the findings of other dynamic change studies (e.g., Lloyd et al., 2020; Hanson et al., 2021), recent studies have found that reassessment with the ACUTE-2007 improves predictive accuracy (Babchishin & Hanson, 2020; Babchishin et al., 2020). Specifically, the most proximal ACUTE-2007 scores predicted sexual recidivism better than the first assessment scores (Babchishin & Hanson, 2020).
Risk tool developers recommend regular reassessment with the ACUTE-2007 (e.g., at each meeting with supervisees) and the STABLE-2007 (e.g., every six to twelve months), and reassessment is currently practised in community corrections (Brankley et al., 2019; Fernandez et al., 2014; Hanson et al., 2007). There is, however, little empirical evidence on how often dynamic risk should be reassessed for optimal predictive accuracy. Therefore, this study sought to determine the rate at which the predictive accuracy of these measures decays after specific periods have elapsed.
Cox Regression with Time-Dependent Covariates for Dynamic Predictions
Although there is no conventional approach to evaluating dynamic predictions, one common method is Cox regression survival analysis with time-dependent covariates (Altman & de Stavola, 1994; Singer & Willett, 2003). This method has several advantages for examining whether including dynamic scores improves prediction. First, Cox regression survival analysis handles incomplete follow-up time (Singer & Willett, 2003). In longitudinal data analyses, the follow-up time for each individual typically varies because of different start times (i.e., the date of release into the community in a given study) and end times (e.g., the date they reoffended, died, or left the study). Second, Cox regression does not limit the number of assessments and allows unequal time intervals between assessments. In contrast, pre-post change analyses, which are standard in studies of institutional treatment, use only two assessments, with similar intervals before and after treatment. Although pre-post analyses are commonly used, they make it difficult to differentiate true change from measurement error (e.g., regression to the mean; Singer & Willett, 2003).
Cox regression allows for time-dependent covariates, such as dynamic risk scores. Time-dependent covariates, however, require models that impute the expected score at the time of the outcome. The need for models forces the data analyst to consider how risk scores change, which, in turn, produces different models that can be tested against each other. For example, Cox regression can compare the predictive accuracy of the following ways of defining dynamic risk scores: 1) use only the score from the first assessment (a “static” one-time assessment of potentially dynamic factors), 2) update the value of the predictor with each new assessment (the fully dynamic model), or 3) use scores from different time periods prior to the recidivism event (e.g., 30 days, 180 days). This last type of analysis (using a range of time periods) can identify the extent to which there is decay in predictive accuracy as assessments get more distal from the recidivism event.
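To make the time-dependent set-up concrete, the following Python sketch fits a one-covariate Cox model by maximizing the partial likelihood on counting-process rows of the form (start, stop], where each row carries the score in force during that interval. It is a minimal illustration under stated assumptions (a single predictor, the Breslow treatment of tied event times, plain Newton-Raphson, invented toy data) and is not the authors' analysis code; the function names are hypothetical.

```python
from math import exp, log

# Each row: (start, stop, score, event). The score is the dynamic risk score
# in force during the interval (start, stop]; event = 1 only on the row in
# which the recidivism event occurred.
Row = tuple[float, float, float, int]


def risk_set(rows: list[Row], t: float) -> list[float]:
    """Scores of all rows whose interval (start, stop] contains time t."""
    return [x for start, stop, x, _ in rows if start < t <= stop]


def partial_loglik(rows: list[Row], beta: float) -> float:
    """Cox partial log-likelihood (Breslow handling of tied event times)."""
    ll = 0.0
    for _start, stop, x, event in rows:
        if event:
            at_risk = risk_set(rows, stop)
            ll += beta * x - log(sum(exp(beta * xj) for xj in at_risk))
    return ll


def fit_cox(rows: list[Row], max_iter: int = 50, tol: float = 1e-10) -> float:
    """Newton-Raphson estimate of the log hazard ratio per score point."""
    beta = 0.0
    for _ in range(max_iter):
        gradient = 0.0
        information = 0.0
        for _start, stop, x, event in rows:
            if not event:
                continue
            at_risk = risk_set(rows, stop)
            weights = [exp(beta * xj) for xj in at_risk]
            s0 = sum(weights)
            mean = sum(w * xj for w, xj in zip(weights, at_risk)) / s0
            mean_sq = sum(w * xj * xj for w, xj in zip(weights, at_risk)) / s0
            gradient += x - mean                # first-derivative contribution
            information += mean_sq - mean ** 2  # observed-information contribution
        if information <= 0:
            break
        step = gradient / information
        beta += step
        if abs(step) < tol:
            break
    return beta


# Invented toy data: person A is rescored on day 30, person D on day 40.
rows: list[Row] = [
    (0, 30, 1, 0), (30, 60, 3, 1),  # A: score rises, reoffends on day 60
    (0, 50, 2, 1),                  # B: reoffends on day 50
    (0, 90, 0, 0),                  # C: censored on day 90
    (0, 40, 1, 0), (40, 90, 0, 0),  # D: score falls, censored on day 90
    (0, 70, 1, 1),                  # E: reoffends on day 70
]
beta = fit_cox(rows)
print(f"beta = {beta:.3f}, hazard ratio per point = {exp(beta):.2f}")
```

Only the `score` column changes between the competing definitions (first score only, every new score, or a window-projected score), which is what allows the resulting models to be fitted on the same rows and compared.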
Present Study
The purpose of the current study was to explore the extent to which the predictive accuracy of two common dynamic risk assessment tools for sexual recidivism (ACUTE-2007 and STABLE-2007) decays over time. Specifically, the predictive accuracy of these risk tools was compared in a fully dynamic model and when the assessments were conducted within 30, 45, 60, 120, or 180 days prior to recidivism.
A recent study (Babchishin & Hanson, 2020) found that the most recent ACUTE-2007 assessments predicted recidivism better than the first assessment, and were also more accurate than other methods of predicting recidivism (e.g., examining the highest or lowest risk score). Based on these findings, it was hypothesized that the predictive validity of the ACUTE-2007 would diminish over time. As well, given the findings from other similar studies (Lloyd et al., 2020), it was hypothesized that the predictive validity of the STABLE-2007 would also diminish over time, but at a slower rate than that of the acute dynamic tool.
General Method
Overview
The present research included two independent studies. Study 1 used the development sample (i.e., the Dynamic Supervision Project; Hanson et al., 2007) on which the ACUTE-2007 and STABLE-2007 were developed. Study 2 used an administrative field-validity sample built from records of the day-to-day supervision of clients by British Columbia Corrections. Sample descriptions are provided in the respective Participants sections.
Measures
ACUTE-2007 (Hanson et al., 2007)
The ACUTE-2007 is an empirically derived risk tool used to assess and track rapid changes in sexual reoffending risk over time by evaluating acute dynamic risk factors for adult males who were charged with or convicted of a sexually motivated offence. The ACUTE-2007 has seven items (e.g., victim access, sexual preoccupation, substance abuse). The items are considered to represent current expressions of chronic, risk-relevant factors (Fernandez et al., 2014), and the total score is calculated by summing all item scores (range 0 to 21; higher scores indicate higher acute dynamic risk). Research suggests that it measures one latent factor and that the measurement model is invariant across time (Babchishin & Hanson, 2020). Reassessment of the ACUTE-2007 at each scheduled meeting with supervisees is recommended. Scoring the ACUTE-2007 adds five to ten minutes to a routine supervisory session (Hanson et al., 2007).
The ACUTE-2007 was previously found to predict sexual, violent, and any recidivism and to add predictive accuracy beyond that of static risk tools (Hanson et al., 2007). Further, the most recent ACUTE-2007 score and the average of all previous ACUTE-2007 scores are more predictive of recidivism than the first ACUTE-2007 score or the most extreme ACUTE-2007 score (lowest or highest; Babchishin & Hanson, 2020). In the current study, because the ACUTE-2007 has only seven items, total scores were calculated only for individuals who had no missing items.
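The scoring and missing-data rule for the ACUTE-2007 can be sketched as below. This is a hypothetical helper, not an official scoring program; it assumes each item is scored 0 to 3, which is consistent with seven items summing to the 0 to 21 range given above.

```python
from typing import Optional, Sequence

N_ACUTE_ITEMS = 7
ITEM_MAX = 3  # assumption: 0-3 per item, consistent with the 0-21 total range


def acute_total(items: Sequence[Optional[int]]) -> Optional[int]:
    """Sum the seven ACUTE-2007 item scores.

    Returns None when any item is missing, mirroring the rule used in this
    study that totals were computed only when no items were missing.
    """
    if len(items) != N_ACUTE_ITEMS:
        raise ValueError(f"expected {N_ACUTE_ITEMS} items, got {len(items)}")
    if any(item is None for item in items):
        return None
    for item in items:
        if not 0 <= item <= ITEM_MAX:
            raise ValueError(f"item score out of range: {item}")
    return sum(items)


print(acute_total([0, 1, 0, 2, 0, 0, 1]))     # complete assessment -> 4
print(acute_total([0, None, 0, 2, 0, 0, 1]))  # one missing item -> None
```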
STABLE-2007 (Fernandez et al., 2014; Hanson et al., 2007)
The STABLE-2007 was designed to measure stable dynamic risk factors for adult males who were charged with or convicted of a sexually motivated offence. The STABLE-2007 is one of the most widely used measures of dynamic risk for sexual recidivism (Kelley et al., 2020; McGrath et al., 2010). The STABLE-2007, for example, is used by probation officers in England, Ireland, and Wales to identify risk-relevant issues for their case reports and improve their confidence and consistency in decision making (McNaughton Nicholls et al., 2010; Walker & O’Rourke, 2013).
The STABLE-2007 has 13 items (e.g., cooperation with supervision, deviant sexual interests, emotional identification with children, impulsivity), and the total score is calculated by summing all item scores (range 0 to 26, or 0 to 24 for individuals who did not offend against a child younger than 14). Higher scores indicate higher stable dynamic risk.
The STABLE-2007 is scored by trained evaluators (e.g., parole officers, psychologists) based on information collected during an interview and a review of available file information and, if possible, consultation with collateral informants (e.g., spouse). The interview usually takes 90 to 120 minutes, although the time decreases with increased experience and prior knowledge of the case (Fernandez et al., 2014). Reassessment of the STABLE-2007 every six to twelve months is recommended (Fernandez et al., 2014).
Based on 21 studies (n = 6,955) from Canada, U.S., U.K., and Austria, a meta-analysis concluded that considering the STABLE-2007 along with the Static-99R (a static, actuarial risk tool) significantly improves the prediction of sexual, violent, and any recidivism (Brankley et al., 2021). In the current study, as recommended by the STABLE-2007 user guidance manual, scores were calculated for individuals only if there was no more than one item with missing information (e.g., the emotional identification with children item; Fernandez et al., 2014).
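A comparable sketch for the STABLE-2007 total, again hypothetical rather than official. It assumes each item is scored 0 to 2 (consistent with 13 items summing to 26, or 12 items to 24), that the emotional identification with children item sits at an assumed position in the item list and is not scored when there was no child victim younger than 14, and that available items are simply summed, without prorating, when one item is missing.

```python
from typing import Optional, Sequence

N_STABLE_ITEMS = 13
ITEM_MAX = 2   # 0-2 per item, consistent with the 0-26 total range
EIC_INDEX = 2  # hypothetical position of the emotional identification with children item


def stable_total(items: Sequence[Optional[int]],
                 child_victim_under_14: bool) -> Optional[int]:
    """Sum STABLE-2007 item scores, allowing at most one missing item."""
    if len(items) != N_STABLE_ITEMS:
        raise ValueError(f"expected {N_STABLE_ITEMS} items, got {len(items)}")
    # Without a child victim under 14, that item is not scored (maximum 24).
    scored = [s for i, s in enumerate(items)
              if child_victim_under_14 or i != EIC_INDEX]
    if sum(s is None for s in scored) > 1:
        return None  # more than one missing item: no total is computed
    present = [s for s in scored if s is not None]
    if any(not 0 <= s <= ITEM_MAX for s in present):
        raise ValueError("item score out of range")
    # Assumption: the available items are summed without prorating.
    return sum(present)


print(stable_total([1] * 13, child_victim_under_14=True))   # -> 13
print(stable_total([1] * 13, child_victim_under_14=False))  # -> 12
```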
Recidivism
Three recidivism outcomes were examined: sexual recidivism, violent recidivism, and any recidivism (including technical violations). In Study 1, sexual recidivism was defined as any sexually motivated crime (contact or non-contact offence) after release, whether or not the name of the charge or conviction was explicitly sexual (e.g., a break and enter conviction where the nature of the crime showed that the individual was motivated to commit a sexual assault). The Study 1 definition of sexual recidivism also included sexual breaches, defined as official sanctions for sexually motivated violations of the conditions of community supervision (e.g., being in the company of children contrary to a supervision condition). In Study 2, sexual recidivism was identified only when the name of the charge or conviction (contact or non-contact sexual offence) was explicitly sexual (e.g., sexual assault).
Violent recidivism was defined as all crimes that involved a confrontation with the victim. Violent recidivism included contact sexual offences but excluded non-contact sex offences and sexually motivated breaches. Any recidivism was defined as all sexual, violent, or non-violent crimes, as well as all technical offences (e.g., violation of conditional release), regardless of whether they were sexually motivated or not. The category of “any recidivism” incorporated the above two recidivism categories (sexual and nonsexual violent), with the addition of non-violent offences and technical offences.
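The nested outcome definitions can be summarized as a small classification rule. The sketch below is illustrative only; the boolean fields are hypothetical summaries of a coded offence record, not variables from either dataset.

```python
from dataclasses import dataclass


@dataclass
class NewOffence:
    sexually_motivated: bool    # sexual motive, whatever the charge label
    charge_label_sexual: bool   # charge/conviction name is explicitly sexual
    victim_confrontation: bool  # involved a confrontation with the victim
    is_breach: bool             # violation of supervision conditions


def outcome_categories(o: NewOffence, study: int) -> set[str]:
    """Return the recidivism outcomes a new offence counts toward."""
    cats = {"any"}  # every new offence or technical violation counts as "any"
    if study == 1:
        is_sexual = o.sexually_motivated  # includes sexually motivated breaches
    else:
        is_sexual = o.charge_label_sexual  # Study 2: explicit charge label only
    if is_sexual:
        cats.add("sexual")
    # Violent: confrontation with the victim (covers contact sexual offences,
    # not non-contact sexual offences or breaches).
    if o.victim_confrontation and not o.is_breach:
        cats.add("violent")
    return cats


breach = NewOffence(True, False, False, True)
print(sorted(outcome_categories(breach, study=1)))  # -> ['any', 'sexual']
print(sorted(outcome_categories(breach, study=2)))  # -> ['any']
```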
Procedure
The datasets were constructed in a “person-period” format for discrete-time survival analysis (Singer & Willett, 2003). That is, each individual had a separate record (i.e., a new row) for each period between risk assessments. Each individual could therefore have multiple risk assessment records (in multiple rows), ordered from the initial (baseline) assessment after release into the community onward. This format accommodates varying intervals between assessments, with each subsequent assessment date marking the end of the previous assessment period.
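A minimal sketch of this layout, assuming that time is measured in days since release and that each person has a single recidivism flag; the function and field names are illustrative, not the study's variable names.

```python
from typing import TypedDict


class PeriodRow(TypedDict):
    id: int
    start: int  # day the score took effect (days since release)
    stop: int   # day of the next assessment, or end of follow-up
    score: int
    event: int  # 1 only on the final row of a recidivist


def to_person_period(person_id: int, assessments: list[tuple[int, int]],
                     end_day: int, reoffended: bool) -> list[PeriodRow]:
    """Turn (day, score) assessments into one row per assessment period."""
    if not assessments:
        raise ValueError("need at least one assessment")
    if end_day <= assessments[-1][0]:
        raise ValueError("follow-up must end after the last assessment")
    rows: list[PeriodRow] = []
    for i, (day, score) in enumerate(assessments):
        last = i == len(assessments) - 1
        # The next assessment date closes the current period.
        stop = end_day if last else assessments[i + 1][0]
        rows.append({"id": person_id, "start": day, "stop": stop,
                     "score": score, "event": int(last and reoffended)})
    return rows


for row in to_person_period(1, [(0, 2), (15, 4)], end_day=40, reoffended=True):
    print(row)
```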
In addition, risk scores were organized into forward-projected timeframes of equal length (i.e., 30, 45, 60, 120, and 180 days). First, each timeframe was projected forward from an individual's first assessment, and any assessment scored within this first timeframe was replaced by the first assessment score. For example, assume that an individual had a score of 2 at the first assessment after release and a score of 4 at a second assessment 15 days later. With the 30-day timeframe, the score of 4 at the second assessment is replaced by the score of 2 from the first assessment.
Next, the assessment closest to the end of the previous timeframe was selected as the next assessment, and any assessments scored within the new timeframe were replaced by the first score of that timeframe. The same process was repeated until the last assessment. In the person-period format, outcome information (e.g., recidivism events) appears in each individual's last row. Because the projected timeframes differed in length across models, more recidivism events were counted within longer projected timeframes.
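The windowing step can be sketched as follows. This is a minimal, hypothetical helper; it assumes days since release as the time scale and that an assessment falling exactly on a window boundary opens a new window, since the text does not say how boundary cases were handled.

```python
def project_scores(assessments: list[tuple[int, int]],
                   window: int) -> list[tuple[int, int]]:
    """Carry each window's first score forward over the rest of that window.

    assessments: (day, score) pairs sorted by day.
    Returns (day, score_used) pairs.
    """
    if not assessments:
        return []
    anchor_day, anchor_score = assessments[0]
    projected = []
    for day, score in assessments:
        if day >= anchor_day + window:
            # First assessment after the window closes opens the next window.
            anchor_day, anchor_score = day, score
        projected.append((day, anchor_score))
    return projected


# The 30-day example from the text: the day-15 score of 4 is replaced by 2.
print(project_scores([(0, 2), (15, 4)], window=30))  # -> [(0, 2), (15, 2)]
```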
Plan of Analysis
Harrell’s C index (Harrell et al., 1996)
Harrell’s C index was used to compare predictive accuracy (discrimination) across the different fixed follow-up timeframes. It estimates the probability that, in a randomly selected pair of individuals, the individual with the higher risk score will reoffend before the other. Harrell’s C is calculated from survival data and does not require fixed follow-up times. It can range from 0 to 1, with .50 indicating the level of prediction expected by chance. Given its similarity to the Area Under the Curve (AUC), the same interpretations of effect size magnitude apply (i.e., values of .56, .64, and .71 correspond to small, moderate, and large effects, respectively; Helmus & Babchishin, 2017; Rice & Harris, 2005). Harrell’s C was computed using the survConcordance function of the R “survival” package (Version 3.1-11; Therneau, 2020) in R (Version 4.0.0; R Core Team, 2013). Although the C values provide some guidance about the relative predictive accuracy of the different models, they cannot be directly compared using standard statistical tests because the comparisons involve non-nested models, each with a different number of recidivists (Volinsky & Raftery, 2000).
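The pairwise logic behind Harrell’s C can be illustrated with a short Python function. This is a simplified sketch with one risk score per person and invented toy data; it is not the survConcordance implementation used in the study, and it skips pairs with tied follow-up times for simplicity.

```python
def harrell_c(times: list[float], events: list[int], scores: list[float]) -> float:
    """Harrell's C for one risk score per person.

    A pair is comparable when the person with the shorter follow-up time
    reoffended (event = 1); it is concordant when that person also had the
    higher risk score. Tied scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored case can only be the longer-surviving member
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: days to reoffence or censoring, event flags, risk scores.
print(harrell_c([5, 10, 15, 20], [1, 1, 0, 1], [4, 3, 2, 1]))  # -> 1.0
```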
Cox Regression with Time-Dependent Covariates (Singer & Willett, 2003)
A series of Cox regressions with time-dependent covariates was conducted to examine the extent to which models that integrate multiple assessments improve prediction. Three models were tested: 1) initial baseline scores (the first assessment after release), 2) scores updated every 30 days or every 180 days, and 3) fully dynamic scores, updated with each assessment conducted in the field within the time frame. For example, consider an individual who was assessed on March 1 and again on March 14, and who was known to have reoffended on March 20. In the 30-day analysis, the risk scores used for Cox regression would be those from the March 1 assessment, because the March 14 assessment fell within the same 30-day window. In the fully dynamic model, the risk scores would be those from March 14. If the individual had instead reoffended on April 15, more than 30 days after the last assessment, they would be considered a non-recidivist in the 30-day analysis and a recidivist in the 180-day analysis.
Because the models are non-nested, comparing them requires fit indices. One of the most commonly used fit indices is the Bayesian Information Criterion (BIC; Raftery, 1995; Volinsky & Raftery, 2000). The BIC starts with the discrepancy between the observed and predicted values (indexed by -2 times the log-likelihood [-2LL]) and then adds a penalty proportional to the number of predictor variables (BIC = -2LL + k ln(n), where k is the number of parameters and n is the number of recidivists; Raftery, 1995; Volinsky & Raftery, 2000). Smaller BIC values indicate better-fitting models. BIC differences of 0-2, 2-6, 6-10, and 10 or higher represent “weak,” “positive,” “strong,” and “very strong” evidence, respectively, that the model with the lower BIC fits better (Gordon, 2012).
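The BIC calculation and the evidence categories can be expressed directly in code. In this sketch the function names are hypothetical, and the handling of values falling exactly on a boundary (2, 6, or 10) is an assumption, because the categories above share their endpoints. When two models have the same number of parameters and the same number of recidivists, the penalty term cancels, so the BIC difference reduces to the difference in -2LL.

```python
from math import log


def bic(loglik: float, n_params: int, n_events: int) -> float:
    """BIC = -2LL + k ln(n), with n the number of recidivists (events)."""
    return -2.0 * loglik + n_params * log(n_events)


def evidence(delta_bic: float) -> str:
    """Label a BIC difference; exact boundary values go to the higher band."""
    d = abs(delta_bic)
    if d < 2:
        return "weak"
    if d < 6:
        return "positive"
    if d < 10:
        return "strong"
    return "very strong"


# Study 1, ACUTE-2007, sexual recidivism: baseline BIC minus fully dynamic BIC.
delta = 488.54 - 480.66
print(f"{delta:.2f}", evidence(delta))  # -> 7.88 strong
```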
Study 1
Participants
This study included 795 individuals from a study of community supervision outcomes known as the Dynamic Supervision Project (Hanson et al., 2007; Hanson et al., 2015), which was used to develop the ACUTE-2007 and STABLE-2007 tools. All individuals in this study were adult males starting a period of community supervision (probation or parole) between 2001 and 2005 following a conviction for a sexual offence, with follow-up continuing until 2011. On average, the individuals were 39 years old (SD = 13.5, range 18 to 84 years), and 72% (566/789) had no prior charges or convictions for a sex offence. About 45% of the sample had victimized children aged 12 or younger, 33% had victimized adults aged 18 or older, and 12% had committed non-contact sexual offences. Approximately 20% of the individuals self-identified as being of Indigenous heritage, and 5% had previously been diagnosed as developmentally delayed (low intellectual functioning). No information was available in this dataset for other offender groups.
Information concerning new offences was gathered from reviews of provincial and national criminal history records, as well as from supervising officers, local police jurisdictions, and searches of newspaper databases. The average length of follow-up was 6.5 years (SD = 2.6, Mdn = 7.5, ranging from 0.1 to 10.1 years). The sample had a total of 6,656 ACUTE-2007 and 1,243 STABLE-2007 assessments. The average number of ACUTE-2007 assessments per individual was 9.1 (SD = 9.0, Mdn = 6.0, ranging from 1 to 69), and of STABLE-2007 assessments per individual was 1.6 (SD = 0.9, Mdn = 1.0, ranging from 1 to 5; Table 1). Average time between ACUTE-2007 assessments was 35 days (SD = 38, Mdn = 28) and 242 days (SD = 100, Mdn = 203) for the STABLE-2007 (Table 1).
Variables | Study 1 (n = 795) | Study 2 (n = 4,221)
---|---|---
Demographic information | |
Age, M (SD) | 39.5 (13.5) | 40.8 (13.7)
Racialized group, % (n/N) | |
White | | 63.1% (2,575/4,078)
Indigenous status | 19.8% (153/773) | 22.2% (906/4,078)
East Indian | | 3.8% (153/4,078)
East Asian | | 2.8% (113/4,078)
Black | | 1.4% (58/4,078)
Hispanic | | 1.4% (58/4,078)
Average score of risk scales | |
Static-99R, M (SD) | 2.5 (2.3) | 2.4 (2.5)
Static-99R risk level, % (n/N) | |
I | 4.2% (33/789) | 5.2% (216/4,165)
II | 15.7% (124/789) | 17.6% (733/4,165)
III | 48.9% (386/789) | 46.0% (1,914/4,165)
IVa | 21.8% (172/789) | 20.6% (860/4,165)
IVb | 9.4% (74/789) | 10.6% (442/4,165)
ACUTE-2007ᵃ, M (SD) | 1.9 (2.2) | 2.2 (2.4)
STABLE-2007ᵃ, M (SD) | 7.8 (4.8) | 7.5 (4.8)
Average number of risk assessments, M (SD) | |
ACUTE-2007 | 9.1 (9.0), Mdn = 6 | 13.7 (11.1), Mdn = 11
STABLE-2007 | 1.6 (0.9), Mdn = 1 | 2.6 (1.8), Mdn = 2
Average days between risk assessments, M (SD) | |
ACUTE-2007 | 35.2 (37.7), Mdn = 28 | 40.4 (76.3), Mdn = 29
STABLE-2007 | 242.1 (100.2), Mdn = 203 | 244.6 (212.7), Mdn = 190
Average follow-up years, M (SD) | 6.5 (2.6) | 4.6 (2.5)

Note. ᵃ Average scores at the first assessment.
Results and Discussion
ACUTE-2007
The first analysis examined whether reassessment improved prediction (Table 2). For these analyses, recidivism was counted only if it occurred within 180 days of the last assessment; an individual who reoffended more than 180 days after the last assessment was counted as a non-recidivist. Three models were tested with this data structure: a) baseline, i.e., the first assessment treated as a “static” variable; b) the fully dynamic model, in which scores were updated with each new assessment; and c) the 180-day model, in which scores were updated only once during each 180-day period. For all recidivism outcomes (sexual, violent, and any), the fully dynamic model fit the data best, followed by the 180-day model. The baseline assessment was consistently the weakest predictor, although still statistically significant. In Study 1, the differences between the fully dynamic model and the 180-day model for the ACUTE-2007 were not strong (∆ BIC values of 4.4, 5.0, and 5.2 for sexual, violent, and any recidivism, respectively).
As a second examination of the value of reassessment, the dataset was reorganized so that recidivism was counted only if it occurred within 30 days of the last assessment. The same three models were then tested: a) baseline (first) assessment, b) fully dynamic, and c) scores updated every 30 days. As can be seen in Table 3, the baseline model was again consistently the worst fit to the data. There were only small differences between the fully dynamic and the 30-day models. In Study 1, the 30-day model was best for all recidivism outcomes. The BIC differences, however, tended to be small, particularly for sexual and violent recidivism (∆ BIC < 3.0). Nevertheless, there was sufficient evidence that reassessment improved prediction to justify examining patterns of decay in predictive accuracy across different time frames.
The next set of ACUTE-2007 analyses (Table 4) compared the predictive accuracy of risk scores over time periods ranging from 30 days to 180 days. For each time period, recidivism events that occurred after the time limit were not considered. Consequently, the number of recidivism events varied across analyses, being largest for the 180-day period and smallest for the 30-day period. In Study 1, there was a consistent pattern: ACUTE-2007 scores with shorter projected timeframes (30 and 45 days) had higher predictive accuracy than the 120-day or 180-day projections. For example, for any recidivism (Table 4; Study 1), the C values were .73 and .72 for 30 days and 45 days, respectively, compared to .69 at 120 days and .66 at 180 days. For the ACUTE-2007 in Study 1, the rank-order correlation (Kendall’s τ) between the C values and the length of the timeframe was -.467 (expected ranks [1 to 5] nested within outcomes).
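The rank-order correlation can be illustrated with a small Kendall’s tau-a function (a hypothetical helper; tau-a makes no adjustment for ties). The example uses only the four any-recidivism C values quoted for Study 1, because the 60-day value is not given in the text, so it does not reproduce the reported -.467, which was computed with ranks nested within outcomes.

```python
from itertools import combinations


def kendall_tau_a(x: list[float], y: list[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Timeframe (days) against C for any recidivism, Study 1 values quoted above.
print(kendall_tau_a([30, 45, 120, 180], [0.73, 0.72, 0.69, 0.66]))  # -> -1.0
```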
Consistent with similar analyses of another version of this dataset (Babchishin & Hanson, 2020), the current analyses found strong evidence that reassessment using the ACUTE-2007 improves prediction over the initial baseline assessment. Current user recommendations are to score the ACUTE-2007 after each meeting with supervisees during community supervision, but not more than once a week (Brankley et al., 2019). In Study 1, most assessments were scored approximately one month apart (median = 28 days), a frequency of contact consistent with supervisory practices in Canada at that time. The current analyses found the highest predictive accuracy for assessments 30 or 45 days prior to the recidivism event, and no advantage for the fully dynamic model over the 30-day projections. Little empirical difference would be expected, however, between the fully dynamic model and the 30-day projections, because the average gap between assessments was approximately 30 days. Nevertheless, the results support current recommendations to rescore the ACUTE-2007 every 30 days. Although decay was evident between the fully dynamic model and the 180-day projections, the 45-day projections had effect sizes similar to those of the 30-day projections.
STABLE-2007
Direct comparisons between different dynamic models were not possible in Study 1, given that most cases were assessed only once. It was possible, however, to examine how the predictive accuracy of the STABLE-2007 changed with proximity to the recidivism events. As can be seen in Table 4, the closer the assessment to the recidivism event, the greater the predictive accuracy. There was a strong monotonic relationship between the rank order of the follow-up time and the size of Harrell’s C (Kendall’s τ = -.993, with expected ranks nested within recidivism types). The C values for the short timeframes (30 days and 45 days) were large (.79 to .90), exceeding the AUC values typically observed for recidivism risk tools (which are usually in the .70 range). The results from Study 1 generally support the value of reassessments with the STABLE-2007. Direct comparisons between different timeframes, however, require more assessments, which were available in Study 2.

| Study 1 – ACUTE-2007 | Sexual Recidivism (43/736) C [95% CI] | BIC | ∆ BIC | Violent Recidivism (49/734) C [95% CI] | BIC | ∆ BIC | Any Recidivism (131/721) C [95% CI] | BIC | ∆ BIC |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | .666 [.573, .759] | 488.54 | 7.88 | .642 [.555, .728] | 552.71 | 7.79 | .635 [.582, .688] | 1,489.31 | 19.89 |
| 180 Days | .695 [.602, .787] | 485.04 | 4.37 | .649 [.563, .735] | 549.88 | 4.96 | .658 [.605, .711] | 1,474.58 | 5.17 |
| Dynamic | .692 [.600, .783] | 480.66 | ǂ | .641 [.536, .746] | 544.92 | ǂ | .667 [.582, .752] | 1,469.42 | ǂ |


| Study 2 – ACUTE-2007 | Sexual Recidivism (104/4,108) C [95% CI] | BIC | ∆ BIC | Violent Recidivism (316/4,022) C [95% CI] | BIC | ∆ BIC | Any Recidivism (727/3,929) C [95% CI] | BIC | ∆ BIC |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | .626 [.565, .687] | 1,546.90 | 13.62 | .634 [.599, .670] | 4,729.40 | 37.17 | .678 [.654, .702] | 10,969.60 | 110.18 |
| 180 Days | .652 [.590, .713] | 1,542.38 | 9.10 | .635 [.600, .671] | 4,722.27 | 30.04 | .680 [.656, .704] | 10,943.50 | 84.14 |
| Dynamic | .671 [.610, .732] | 1,533.28 | ǂ | .666 [.631, .701] | 4,692.23 | ǂ | .708 [.684, .732] | 10,859.40 | ǂ |


| Study 2 – STABLE-2007 | Sexual Recidivism (71/4,221) C [95% CI] | BIC | ∆ BIC | Violent Recidivism (264/4,134) C [95% CI] | BIC | ∆ BIC | Any Recidivism (632/4,059) C [95% CI] | BIC | ∆ BIC |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | .686 [.607, .765] | 1,048.16 | 7.78 | .647 [.605, .688] | 3,889.50 | 17.22 | .693 [.666, .721] | 9,488.79 | 17.60 |
| 180 Days | .697 [.617, .776] | 1,042.09 | 1.71 | .652 [.610, .693] | 3,872.45 | 5.75 | .692 [.664, .719] | 9,485.65 | 14.47 |
| Dynamic | .701 [.622, .780] | 1,040.38 | ǂ | .657 [.609, .706] | 3,866.70 | ǂ | .696 [.669, .724] | 9,471.19 | ǂ |

Note. C: Harrell’s concordance index. ǂ: Reference group (the best-fitting model). ∆ BIC: BIC difference from the reference model, with 0-2 “weak”, 2-6 “positive”, 6-10 “strong”, and 10+ “very strong” evidence for a better model fit. Bolded values represent statistically significant predictors (p < .05).
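The ∆ BIC column can be recomputed from the BIC column. The sketch below is a minimal illustration, not the authors' procedure: it subtracts the lowest BIC (the ǂ reference model) from each model's BIC and applies the verbal bands from the note above. Treating each band's lower bound as inclusive is an assumption, and the helper names are hypothetical. Because the published BICs are rounded, recomputed differences can be off by .01 from the ∆ BIC column (e.g., 485.04 - 480.66 = 4.38, versus the 4.37 reported for Study 1 ACUTE-2007 sexual recidivism).

```python
def delta_bic(bic_by_model):
    """Difference between each model's BIC and the lowest (best-fitting) BIC."""
    best = min(bic_by_model.values())
    return {model: round(bic - best, 2) for model, bic in bic_by_model.items()}


def evidence_label(delta):
    """Map a BIC difference to the verbal bands in the table note.

    Lower bounds are treated as inclusive (e.g., exactly 2.0 counts as
    "positive"); the note does not say how boundary values are handled.
    """
    if delta < 2:
        return "weak"
    if delta < 6:
        return "positive"
    if delta < 10:
        return "strong"
    return "very strong"


# Study 2, STABLE-2007, sexual recidivism: BIC values from the table above
bics = {"Baseline": 1048.16, "180 Days": 1042.09, "Dynamic": 1040.38}
for model, d in delta_bic(bics).items():
    label = "reference" if d == 0 else evidence_label(d)
    print(f"{model}: delta BIC = {d:.2f} ({label})")
```

This prints 7.78 (strong) for the baseline model, 1.71 (weak) for the 180-day model, and 0.00 for the dynamic reference model, matching the Study 2 STABLE-2007 comparisons reported in the text.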

| Study 1 – ACUTE-2007 | Sexual Recidivism (21/736) C [95% CI] | BIC | ∆ BIC | Violent Recidivism (19/734) C [95% CI] | BIC | ∆ BIC | Any Recidivism (61/721) C [95% CI] | BIC | ∆ BIC |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | .703 [.575, .830] | 225.45 | 9.34 | .677 [.538, .816] | 196.04 | 7.36 | .703 [.627, .779] | 635.19 | 19.61 |
| 30 Days | .733 [.606, .860] | 216.10 | ǂ | .664 [.526, .803] | 188.68 | ǂ | .732 [.656, .807] | 615.57 | ǂ |
| Dynamic | .713 [.586, .839] | 218.90 | 2.80 | .670 [.531, .808] | 189.16 | 0.48 | .723 [.648, .798] | 620.01 | 4.44 |


| Study 2 – ACUTE-2007 | Sexual Recidivism (68/4,108) C [95% CI] | BIC | ∆ BIC | Violent Recidivism (187/4,022) C [95% CI] | BIC | ∆ BIC | Any Recidivism (458/3,930) C [95% CI] | BIC | ∆ BIC |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | .671 [.595, .747] | 927.92 | 8.04 | .653 [.607, .698] | 2,651.96 | 28.74 | .689 [.660, .719] | 6,529.71 | 66.77 |
| 30 Days | .687 [.611, .763] | 921.71 | 1.83 | .667 [.622, .713] | 2,626.07 | 2.84 | .708 [.679, .737] | 6,470.73 | 7.78 |
| Dynamic | .692 [.616, .768] | 919.88 | ǂ | .672 [.636, .708] | 2,623.23 | ǂ | .711 [.682, .740] | 6,462.94 | ǂ |


| Study 2 – STABLE-2007 | Sexual Recidivism (24/4,221) C [95% CI] | BIC | ∆ BIC | Violent Recidivism (74/4,134) C [95% CI] | BIC | ∆ BIC | Any Recidivism (196/4,059) C [95% CI] | BIC | ∆ BIC |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | .787 [.614, .960] | 303.70 | 4.44 | .649 [.559, .739] | 967.00 | 5.01 | .745 [.693, .797] | 2,639.47 | 5.78 |
| 30 Days | .793 [.620, .966] | 299.26 | ǂ | .652 [.563, .742] | 961.99 | ǂ | .746 [.694, .798] | 2,633.69 | ǂ |
| Dynamicᵃ | .793 [.620, .966] | 299.26 | ǂ | .652 [.563, .742] | 961.99 | ǂ | .746 [.694, .798] | 2,633.69 | ǂ |

Note. C: Harrell’s concordance index. ǂ: Reference group (the best-fitting model). ∆ BIC: BIC difference from the reference model, with 0-2 “weak”, 2-6 “positive”, 6-10 “strong”, and 10+ “very strong” evidence for a better model fit. Bolded values represent statistically significant predictors (p < .05). ᵃ The 30-day and dynamic models were identical because there were no reassessments within the 30-day timeframes.
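Harrell’s C extends the AUC to time-to-event data: among pairs of individuals for whom it can be determined who recidivated first, it is the proportion in which that person also had the higher risk score. The sketch below is a minimal O(n²) version for a single fixed score per person, with tied scores counted as one half and pairs with tied follow-up times skipped for simplicity. The analyses reported here used updated (time-varying) scores and may treat ties differently, so this illustrates the index rather than reproducing the published values; the data are hypothetical.

```python
def harrells_c(times, events, scores):
    """Harrell's concordance index for right-censored follow-up data.

    times  -- follow-up time for each person (e.g., days)
    events -- True if the person recidivated at that time, False if censored
    scores -- risk score (higher = higher predicted risk)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored person cannot be the earlier recidivist
        for j in range(n):
            # usable pair: i recidivated before j's follow-up ended
            if times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs: at least one event is required")
    return concordant / usable


# Hypothetical example: 4 people, follow-up in days
times = [100, 200, 300, 400]
events = [True, False, True, False]
scores = [5, 3, 2, 4]
print(harrells_c(times, events, scores))  # 0.75
```

Three of the four usable pairs are ordered correctly; the discordant pair is the person who recidivated on day 300 with a lower score than the person still offence-free on day 400.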

| Timeframe | Sexual: C [95% CI] | Sexual: # of recidivists/N | Violent: C [95% CI] | Violent: # of recidivists/N | Any: C [95% CI] | Any: # of recidivists/N |
|---|---|---|---|---|---|---|
| ACUTE-2007ᵃ | | | | | | |
| 30 Days | .733 [.606, .860] | 21/736 | .664 [.526, .803] | 19/734 | .732 [.656, .807] | 61/721 |
| 45 Days | .743 [.628, .858] | 27/736 | .681 [.555, .807] | 24/734 | .715 [.646, .784] | 75/721 |
| 60 Days | .688 [.588, .789] | 35/736 | .650 [.538, .761] | 30/734 | .688 [.625, .751] | 90/721 |
| 120 Days | .6950 [.596, .794] | 35/736 | .683 [.577, .789] | 32/734 | .691 [.635, .748] | 110/721 |
| 180 Days | .6947 [.602, .787] | 43/736 | .649 [.563, .735] | 49/734 | .658 [.605, .711] | 131/721 |
| STABLE-2007ᵇ | | | | | | |
| 30 Days | .895 [.658, .999] | 12/795 | .903 [.633, .999] | 9/794 | .824 [.644, .999] | 21/790 |
| 45 Days | .819 [.636, .999] | 17/795 | .804 [.603, .999] | 14/794 | .790 [.650, .929] | 33/790 |
| 60 Days | .818 [.645, .992] | 20/795 | .799 [.614, .985] | 18/794 | .773 [.648, .898] | 40/790 |
| 120 Days | .697 [.570, .824] | 33/795 | .679 [.547, .811] | 31/794 | .697 [.607, .788] | 67/790 |
| 180 Days | .700 [.590, .811] | 40/795 | .642 [.527, .758] | 37/794 | .691 [.610, .772] | 78/790 |

Note. C: Harrell’s concordance index. Bolded values represent statistically significant predictors (p < .05). ᵃ Total of 6,604 ACUTE-2007 assessments for sexual recidivism, 6,656 assessments for violent recidivism, and 6,238 assessments for any recidivism. Kendall’s tau between recency and C values = -.467 for ACUTE-2007. ᵇ Total of 1,239 STABLE-2007 assessments for sexual recidivism, 1,243 assessments for violent recidivism, and 1,216 assessments for any recidivism. Kendall’s tau between recency and C values = -.933 for STABLE-2007.
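The Kendall’s tau values in the note above can be reproduced from the C values in the table under one reading of "expected ranks nested within recidivism types": compare timeframes only within the same outcome (10 pairs per outcome, 30 in total), count the pairs in which C rises or falls as the timeframe lengthens, and pool the counts. This reading is our assumption rather than a documented procedure, and the function name is hypothetical, but it recovers -.933 for the STABLE-2007 and -.467 for the ACUTE-2007.

```python
from itertools import combinations


def nested_kendall_tau(c_by_outcome):
    """Kendall's tau between timeframe rank and C, pairing timeframes
    only within the same outcome and pooling the pair counts.

    Each list must be ordered from the shortest to the longest timeframe.
    Tied C values count as neither concordant nor discordant.
    """
    concordant = discordant = n_pairs = 0
    for c_values in c_by_outcome.values():
        for i, j in combinations(range(len(c_values)), 2):
            n_pairs += 1
            # j is the longer timeframe, so a positive difference means
            # C rose as the timeframe lengthened (concordant with rank)
            diff = c_values[j] - c_values[i]
            if diff > 0:
                concordant += 1
            elif diff < 0:
                discordant += 1
    return (concordant - discordant) / n_pairs


# Study 1 C values from the table above: 30, 45, 60, 120, 180 days
stable_study1 = {
    "sexual": [.895, .819, .818, .697, .700],
    "violent": [.903, .804, .799, .679, .642],
    "any": [.824, .790, .773, .697, .691],
}
acute_study1 = {
    "sexual": [.733, .743, .688, .6950, .6947],
    "violent": [.664, .681, .650, .683, .649],
    "any": [.732, .715, .688, .691, .658],
}
print(round(nested_kendall_tau(stable_study1), 3))  # -0.933
print(round(nested_kendall_tau(acute_study1), 3))   # -0.467
```

Applied to the Study 2 values in Table 5, the same function gives -.533 for the ACUTE-2007 and -.800 for the STABLE-2007, matching the note to that table.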
Study 2
Participants
Study 2 included 4,221 adult men who were provincially sentenced (i.e., to less than two years) for a sexual offence and supervised in the community between 2005 and 2013 by British Columbia Corrections (see Helmus et al., 2021). Follow-up extended to 2013. Like those in Study 1, the individuals in Study 2 were, on average, 40 years old (SD = 13.7, range 18 to 90 years), and 73% (3,057/4,166) had no prior charges or convictions for a sexual offence. Of the total sample, 63% were White, 22% were of Indigenous heritage, and 15% were from other ethno-cultural groups (e.g., Black, East Asian, Hispanic).
The recidivism information included all charges and convictions within the province of British Columbia up to June 4, 2013; charges occurring outside the province were not captured. The average length of follow-up was 4.6 years (SD = 2.5, Mdn = 4.5, range 0.1 to 8.5 years). The ACUTE-2007 and STABLE-2007 assessments were completed by probation officers between December 13, 2004, and June 4, 2013. During follow-up, the sample had a total of 56,091 ACUTE-2007 assessments; the average number of assessments per individual was 13.7 (SD = 11.1, Mdn = 11.0, range 1 to 85; Table 1). The sample had a total of 11,101 STABLE-2007 assessments, with an average of 2.6 assessments per individual (SD = 1.8, Mdn = 2.0, range 1 to 12; Table 1). The average time between assessments was 40 days for the ACUTE-2007 (SD = 76, Mdn = 29) and 245 days for the STABLE-2007 (SD = 213, Mdn = 190; Table 1).
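Summaries such as the number of assessments per person and the gap between consecutive assessments follow directly from assessment dates. The sketch below pools all consecutive gaps across people, which is an assumption (gaps could instead be averaged within each person first); the data and the function name are hypothetical. A few long gaps pull the mean well above the median, the same skew visible in the Study 2 ACUTE-2007 figures (mean 40 days, median 29).

```python
from datetime import date
from statistics import mean, median


def assessment_summary(dates_by_person):
    """Assessment counts per person and pooled gaps (days) between
    consecutive assessments. People with one assessment add no gaps."""
    counts = [len(dates) for dates in dates_by_person.values()]
    gaps = []
    for dates in dates_by_person.values():
        ordered = sorted(dates)
        gaps.extend((later - earlier).days for earlier, later in zip(ordered, ordered[1:]))
    return {
        "mean_count": mean(counts),
        "median_count": median(counts),
        "mean_gap": mean(gaps) if gaps else None,
        "median_gap": median(gaps) if gaps else None,
    }


# Hypothetical assessment dates for three supervisees
dates_by_person = {
    "A": [date(2010, 1, 4), date(2010, 2, 1), date(2010, 3, 1)],  # gaps: 28, 28
    "B": [date(2010, 1, 10), date(2010, 4, 10)],                  # gap: 90
    "C": [date(2010, 5, 1)],                                      # no gap
}
summary = assessment_summary(dates_by_person)
print(summary["median_gap"], round(summary["mean_gap"], 1))  # 28 48.7
```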
Results and Discussion
ACUTE-2007
Like Study 1, Study 2 found strong evidence that reassessment improves prediction. As can be seen in Table 2, the 180-day reassessments were better than the baseline assessments, and the best-fitting models were those with fully dynamic scores (i.e., updated with each new assessment). All differences between the dynamic and baseline models were very strong (∆ BIC of 13.6 to 110.2), as were most of the differences between the dynamic model and the 180-day reassessments (∆ BIC of 9.1 to 84.1).
When the dataset was reorganized for 30-day projections (i.e., recidivism at 31 days was not counted; Table 3), the dynamic model was still meaningfully better than the baseline model (∆ BIC of 8.0 to 66.8). There was little difference, however, between the fully dynamic ACUTE-2007 and the 30-day reassessments for sexual or violent recidivism (∆ BIC < 3). For any recidivism, there was strong evidence (∆ BIC = 7.8) that the fully dynamic model fit the data better than the 30-day reassessments.
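Models with updated scores are usually fitted on data in which each person's follow-up is split at every reassessment (the start/stop, or counting-process, layout used for Cox regression with time-varying covariates). The sketch below builds those rows for one person, carrying each score forward until the next assessment. The optional `window` argument is a hypothetical reading of an "X-day projection": a score is carried forward for at most X days, follow-up beyond that is dropped, and a recidivism event falling there is not counted. The report does not spell out its exact data construction, so the function and its rules are illustrative assumptions.

```python
def to_intervals(assess_days, scores, end_day, event, window=None):
    """Split one person's follow-up into (start, stop, score, event) rows.

    assess_days -- sorted assessment days, counted from the start of follow-up
    scores      -- the score recorded at each assessment
    end_day     -- day follow-up ended (recidivism or censoring)
    event       -- True if follow-up ended in recidivism
    window      -- if given, carry each score forward at most this many days
    """
    rows = []
    for k, (day, score) in enumerate(zip(assess_days, scores)):
        next_day = assess_days[k + 1] if k + 1 < len(assess_days) else end_day
        stop = min(next_day, end_day)
        if window is not None:
            stop = min(stop, day + window)  # score "expires" after the window
        if stop <= day:
            continue  # assessment at or after the end of follow-up
        # only the row that reaches the end of follow-up can carry the event
        rows.append((day, stop, score, event and stop == end_day))
    return rows


# Hypothetical supervisee: assessed on days 0, 28, and 60; recidivated on day 100
print(to_intervals([0, 28, 60], [3, 5, 2], end_day=100, event=True))
# fully dynamic: [(0, 28, 3, False), (28, 60, 5, False), (60, 100, 2, True)]
print(to_intervals([0, 28, 60], [3, 5, 2], end_day=100, event=True, window=30))
# 30-day window: [(0, 28, 3, False), (28, 58, 5, False), (60, 90, 2, False)]
```

A baseline-only model corresponds to passing just the first assessment: `to_intervals([0], [3], 100, True)` yields the single row `(0, 100, 3, True)`.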
As can be seen in Table 5, the ACUTE-2007 models with the shortest projected timeframes showed larger C values than those with longer projected timeframes. The same pattern applied to all three recidivism outcomes (sexual, violent, any). The correlation between the rank order of the follow-up time and the size of Harrell’s C was large (Kendall tau = -.533, with expected ranks [1 to 5] nested within recidivism types).
Consistent with Study 1, there was strong evidence that ACUTE-2007 reassessments improve prediction. In terms of decay over time, the most recent assessments were the most accurate; however, there was little difference between the fully dynamic model and the 30-day projections for sexual and violent recidivism (although results favoured the fully dynamic model). For any recidivism, there was strong support for the fully dynamic model over the 30-day reassessments. Although little empirical difference would be expected between the dynamic model and the 30-day reassessments (the median gap between assessments was 29 days), it is possible that acute variables have a different relationship to general recidivism than to sexual or violent recidivism (see General Discussion below).
STABLE-2007
The models with fully dynamic STABLE-2007 scores fit better than the baseline models for sexual, violent, and any recidivism (Δ BIC = 7.78 to 17.60; Table 2). Compared with the 180-day reassessments, the fully dynamic STABLE-2007 scores performed similarly for sexual recidivism (Δ BIC = 1.71), fit somewhat better for violent recidivism (Δ BIC = 5.75), and fit much better for any recidivism (Δ BIC = 14.47; Table 2).
When considering 30-day projections (Table 3), the 30-day reassessments were a somewhat better fit than the baseline model for sexual, violent, and any recidivism (∆ BIC of 4.4, 5.0, and 5.8, respectively). These differences were similar in magnitude to those between the baseline model and the 180-day model (∆ BIC of 6.1, 11.5, and 3.1, respectively; Table 2). The fully dynamic model was identical to the 30-day model because no cases had more than one STABLE-2007 assessment in the 30 days before a recidivism event.
Consistent with the findings for the ACUTE-2007, the STABLE-2007 models with the shortest follow-up timeframes (30 and 45 days) had the highest predictive accuracy for sexual, violent, and any recidivism (Table 5). There was a strong correlation between predictive accuracy (Harrell’s C) and the time from the recidivism event: accuracy diminished as the follow-up timeframes became longer (Kendall tau = -.800, with expected ranks nested within recidivism types).

| Timeframe | Sexual: C [95% CI] | Sexual: # of recidivists/N | Violent: C [95% CI] | Violent: # of recidivists/N | Any: C [95% CI] | Any: # of recidivists/N |
|---|---|---|---|---|---|---|
| ACUTE-2007ᵃ | | | | | | |
| 30 Days | .687 [.611, .763] | 68/4,108 | .667 [.622, .713] | 187/4,022 | .708 [.679, .737] | 458/3,930 |
| 45 Days | .651 [.582, .720] | 79/4,108 | .659 [.616, .702] | 211/4,022 | .706 [.679, .733] | 534/3,930 |
| 60 Days | .634 [.564, .705] | 77/4,108 | .664 [.624, .704] | 236/4,022 | .698 [.671, .724] | 562/3,930 |
| 120 Days | .654 [.588, .719] | 92/4,108 | .665 [.628, .701] | 292/4,022 | .697 [.672, .721] | 673/3,930 |
| 180 Days | .652 [.590, .713] | 104/4,108 | .635 [.600, .671] | 316/4,022 | .680 [.656, .704] | 727/3,930 |
| STABLE-2007ᵇ | | | | | | |
| 30 Days | .793 [.620, .966] | 24/4,221 | .652 [.563, .742] | 74/4,134 | .746 [.694, .798] | 196/4,059 |
| 45 Days | .779 [.633, .925] | 32/4,221 | .670 [.597, .743] | 106/4,134 | .728 [.684, .771] | 278/4,059 |
| 60 Days | .773 [.633, .912] | 33/4,221 | .663 [.596, .730] | 124/4,134 | .718 [.679, .757] | 337/4,059 |
| 120 Days | .722 [.627, .818] | 53/4,221 | .649 [.600, .697] | 216/4,134 | .700 [.669, .731] | 514/4,059 |
| 180 Days | .697 [.617, .776] | 71/4,221 | .650 [.608, .691] | 264/4,134 | .696 [.669, .724] | 632/4,059 |

Note. C: Harrell’s concordance index. Bolded values represent statistically significant predictors (p < .05). ᵃ Total of 56,091 ACUTE-2007 assessments for sexual recidivism, 52,716 assessments for violent recidivism, and 46,479 assessments for any recidivism. Kendall’s tau between recency and C values = -.533 for ACUTE-2007. ᵇ Total of 11,101 STABLE-2007 assessments for sexual recidivism, 10,372 assessments for violent recidivism, and 9,300 assessments for any recidivism. Kendall’s tau between recency and C values = -.800 for STABLE-2007.
General Discussion
In order to implement effective interventions that reduce the likelihood of reoffending among those on community supervision, it is crucial that dynamic risk factors (i.e., criminogenic needs) be accurately assessed (Andrews et al., 1990; Andrews & Bonta, 2010; Andrews & Dowden, 2006). Many forensic evaluators working with individuals serving sentences in the community (e.g., parole and probation officers) currently use dynamic risk assessment tools and regularly reassess the risk of reoffending during routine supervision. The ACUTE-2007 and STABLE-2007 are the most widely used dynamic risk tools for this purpose (Bourgon et al., 2018; Hill & Demetrioff, 2019; Kelley et al., 2020; Neal & Grisso, 2014). The decay period of these risk tools, however, had not previously been examined. Using novel statistical analyses and two independent samples, the current study found that predictive accuracy increased the closer the assessment was to the recidivism event. Nevertheless, the baseline assessments remained significant (and moderately large) predictors in all analyses: within the range studied, there was no time limit beyond which the assessments failed to indicate the relative risk of recidivism. Consequently, decisions concerning the ideal reassessment period need to balance the increased accuracy of recent assessments against the costs and administrative burdens of frequent, repeated assessments.
The current study has several important strengths. First, the value of regular reassessment with dynamic risk assessment tools was supported by two independent field samples of individuals who were serving part of their sentence in the community under supervision. Furthermore, the assessment results could have real consequences for the individuals (e.g., increased intensity of supervision, home visits). Given this, the current studies should be considered field validity studies (Edens & Boccaccini, 2017), which supports applying the findings to other criminal justice samples, particularly within Canada. Second, some previous research has focused on whether score changes between two time points (e.g., pre- and post-treatment) predict reoffending (e.g., de Vries Robbé et al., 2015; Olver et al., 2007; Vose et al., 2013) or whether reassessment scores projecting over varying time periods predict reoffending better than the initial (baseline) scores (Viljoen et al., 2017). This study added to this research by evaluating whether predictive accuracy declined across different timeframes, using models that more closely resembled the typical contact schedules outlined in correctional staff's case management plans. Finally, this study compared, both directly and indirectly, the patterns of declining predictive accuracy for an acute dynamic risk tool (ACUTE-2007) and a stable dynamic risk tool (STABLE-2007). In doing so, it contributes to our understanding of these commonly used dynamic risk tools and clarifies their capacity to predict reoffending among individuals on community supervision at varying assessment intervals.
Overall, the total scores of the ACUTE-2007 and STABLE-2007 significantly predicted sexual, violent, and any recidivism in both samples. Higher predictive accuracy was observed in the developmental sample (Study 1) than in the administrative sample (Study 2). This difference was anticipated, given the greater breadth and depth of information on which the Study 1 dynamic risk assessments were based (e.g., comprehensive recidivism information, well-trained correctional officers, more comprehensive evaluations; see the Dynamic Supervision Project; Hanson et al., 2007; Hanson et al., 2015).
Across the samples, the dynamic versions of the ACUTE-2007 and STABLE-2007 predicted sexual, violent, and any recidivism better than the first assessments with those tools. In other words, reassessment with dynamic risk tools can improve the prediction of recidivism. The current findings are consistent with a growing body of research supporting the reassessment of dynamic risk assessment tools for general and sexual recidivism (e.g., Babchishin & Hanson, 2020; Hanson et al., 2021; Lloyd et al., 2020).
As hypothesized, this study also found consistent patterns of declining predictive accuracy for the ACUTE-2007 and STABLE-2007 as the projected timeframes of risk scores became longer, particularly for the STABLE-2007 (average Kendall’s tau of -.87 for the STABLE-2007 versus -.50 for the ACUTE-2007). Specifically, Harrell’s C values generally fell as the projected timeframes became longer (30 days to 180 days) for all three types of recidivism (i.e., sexual, violent, and any recidivism). The direct comparisons of different models also indicated that the closer the assessment to the outcome, the greater the predictive accuracy. Reassessment of the ACUTE-2007 every 30 days and every 180 days both predicted recidivism better than the first assessments; however, the evidence was stronger for the scores assessed every 30 days (i.e., greater Δ BIC against the first assessments).
In both samples, there was stronger evidence for the value of reassessing the ACUTE-2007 than the STABLE-2007. Reassessment of the ACUTE-2007 every 30 days improved the prediction of sexual, violent, and any recidivism compared with the baseline scores. Although assessing the ACUTE-2007 more often than every 30 days did not improve the prediction of sexual and violent recidivism, the two models (30-day versus fully dynamic) were likely based on similar data, given that the typical (median) time between ACUTE-2007 assessments was about 30 days.
Direct comparisons indicated that reassessing the STABLE-2007 every 180 days improved the prediction of sexual, violent, and any recidivism compared with the baseline scores. As well, there was a strong relationship between the recency of the STABLE-2007 assessment (180 to 30 days) and its predictive accuracy. The direct comparison between the baseline and 30-day assessments also favoured the 30-day assessments, but not by much. Consequently, the study provided only weak evidence concerning the pattern of decay. Nevertheless, the overall pattern is consistent with the predictive accuracy of ACUTE-2007 scores declining more quickly and more obviously than that of STABLE-2007 scores. This pattern would be expected given that the ACUTE-2007 is intended to assess rapidly changing features, whereas stable dynamic factors are conceptualized as relatively enduring qualities.
The overall findings support regular reassessment with the ACUTE-2007 and STABLE-2007 to improve their predictive accuracy and to better inform frontline community supervision officers. Specifically, given the ease of rescoring the ACUTE-2007 (five to ten minutes), the current recommendation to rescore it at every meeting, or at least every 30 days, seems reasonable. For the STABLE-2007, however, updating scores requires more extensive effort (e.g., interview and review of file information). Consequently, the decision about when to update the STABLE-2007 needs to balance the increased predictive accuracy of more recent assessments against the cost of new assessments. Current recommendations are to rescore the STABLE-2007 every six months or every 12 months. The current results support the shorter of these two options (six months), as well as rescoring after major changes in risk-relevant characteristics (e.g., successfully completing treatment).
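The rescoring recommendations discussed above can be expressed as a simple check. The sketch below is a hypothetical helper, not part of any official ACUTE-2007 or STABLE-2007 material: it flags the ACUTE-2007 at a supervision meeting (no more than once a week) or once 30 days have passed, and flags the STABLE-2007 after roughly six months (taken here as 182 days, an assumption) or after a major risk-relevant change.

```python
from datetime import date, timedelta

ACUTE_MAX_GAP = timedelta(days=30)    # rescore at least every 30 days
ACUTE_MIN_GAP = timedelta(days=7)     # but not more than once a week
STABLE_MAX_GAP = timedelta(days=182)  # "six months", approximated in days


def reassessments_due(today, last_acute, last_stable, meeting=False, major_change=False):
    """Return the tools that are due for rescoring on `today` (hypothetical rule)."""
    due = []
    since_acute = today - last_acute
    if (meeting and since_acute >= ACUTE_MIN_GAP) or since_acute >= ACUTE_MAX_GAP:
        due.append("ACUTE-2007")
    if major_change or today - last_stable >= STABLE_MAX_GAP:
        due.append("STABLE-2007")
    return due


# Meeting on 15 March 2024: last ACUTE-2007 14 days earlier, last STABLE-2007 166 days earlier
print(reassessments_due(date(2024, 3, 15), date(2024, 3, 1), date(2023, 10, 1), meeting=True))
# ['ACUTE-2007']
```

Setting `major_change=True` in the same call would add the STABLE-2007, reflecting the suggestion to rescore after events such as completing treatment.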
Limitations
The 95% confidence intervals of Harrell’s C for each timeframe were quite broad because of the small number of sexual recidivism events. Consequently, the confidence intervals overlapped (reflecting low statistical power and a risk of Type II error), even in a study with 4,221 participants (Study 2). In addition, it was not possible to directly compare Harrell’s C values across the models because the sample sizes varied.
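Why few events widen the interval can be seen with a simple approximation. The sketch below swaps in the Hanley and McNeil (1982) standard error for an AUC, which is not how the confidence intervals in this report were computed; it only illustrates that, at a fixed sample size and accuracy, the interval narrows mainly as the number of recidivists grows. The inputs are illustrative: C = .70, 4,221 people, and either 24 or 632 recidivists.

```python
import math


def hanley_mcneil_se(auc, n_events, n_nonevents):
    """Approximate standard error of an AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_events - 1) * (q1 - auc ** 2)
        + (n_nonevents - 1) * (q2 - auc ** 2)
    ) / (n_events * n_nonevents)
    return math.sqrt(variance)


def ci95(auc, n_events, n_nonevents):
    """Normal-approximation 95% confidence interval around the AUC."""
    half_width = 1.96 * hanley_mcneil_se(auc, n_events, n_nonevents)
    return auc - half_width, auc + half_width


n_total = 4221
for n_events in (24, 632):
    low, high = ci95(0.70, n_events, n_total - n_events)
    print(f"{n_events} recidivists: [{low:.3f}, {high:.3f}]")
```

With 24 recidivists the interval is roughly ±.12; with 632 it is roughly ±.02, even though the total sample is the same.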
In Study 1, most individuals on community supervision had only one STABLE-2007 assessment (at baseline) over the follow-up period; therefore, it was not possible to conduct the three-model comparison (i.e., baseline versus 180 days versus fully dynamic). Consequently, direct model comparisons for the STABLE-2007 could be drawn only from the administrative sample (Study 2).
The ACUTE-2007 has previously been found to assess the same underlying constructs across follow-up (measurement invariance over time; Babchishin & Hanson, 2020), but measurement invariance of the STABLE-2007 has not yet been tested. Determining whether changes in dynamic scores reflect true change, as opposed to measurement error (Asparouhov & Muthén, 2009), is an important topic for future research.
Another potential limitation is that the current samples, which received provincial community supervision sentences in Canada, may not be representative of individuals who received federal sentences in Canada. In addition, individuals who commit different types of sexual crimes (e.g., rapists vs. child molesters) might show different rates of decay in the predictive accuracy of dynamic risk tools.
Implications for Research
More studies with a larger number of STABLE-2007 reassessments are needed to replicate the results of Study 2. In particular, research could profitably examine whether the decline in predictive accuracy is slower for the STABLE-2007 than for the ACUTE-2007. Although the current results could be interpreted as supporting this hypothesis, the pattern of results was not entirely consistent.
Several studies have found that dynamic risk tools such as the STABLE-2007 and ACUTE-2007 incrementally predict sexual recidivism above and beyond static risk tools such as the Static-99R (Brankley et al., 2021; Hanson et al., 2007; Helmus et al., 2021). Some researchers have provided estimated recidivism rates for combined static and dynamic risk assessments (Static-99R and STABLE-2007; Brankley et al., 2017). When static and dynamic risk tools are used in combination, the risk levels of the static tool have typically been adjusted using the first STABLE-2007 assessment (Brankley et al., 2017, 2019); the current study suggests that adjusting the Static-99R using the most recent assessment may provide a more accurate assessment than using the first assessment.
The current study examined only discrimination (relative risk), not calibration (the match between expected and observed recidivism rates; Helmus & Babchishin, 2017). Although several studies (including the current study) have found that later assessments improve relative risk (discrimination) compared with earlier assessments, little is known about how reassessments should inform absolute risk estimates. Given that recidivism risk is expected to decline the longer individuals remain offence-free in the community (Hanson, 2018; Hanson et al., 2018), the relationship between dynamic risk factors and absolute recidivism risk likely also changes over time.
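Discrimination and calibration can diverge. One common calibration index is the expected/observed (E/O) ratio: the sum of predicted recidivism probabilities divided by the number of recidivists actually observed. The sketch below uses hypothetical probabilities and outcomes, and the function name is illustrative. It shows a tool that ranks people well (every recidivist comes from the higher-probability group) while still over-predicting, as might happen if absolute risk declines with time offence-free while the relative ordering holds.

```python
def expected_observed_ratio(predicted_probs, outcomes):
    """E/O: total predicted recidivism divided by observed recidivism.

    Values above 1 indicate over-prediction; values below 1, under-prediction.
    """
    expected = sum(predicted_probs)
    observed = sum(1 for recidivated in outcomes if recidivated)
    if observed == 0:
        raise ValueError("E/O is undefined when no recidivism was observed")
    return expected / observed


# Hypothetical group: 50 people at 10% predicted risk, 50 at 30%
predicted = [0.10] * 50 + [0.30] * 50
# All 10 observed recidivists come from the higher-risk half
observed = [False] * 50 + [True] * 10 + [False] * 40
print(f"E/O = {expected_observed_ratio(predicted, observed):.2f}")  # E/O = 2.00
```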
Conclusions
The current research suggests that the predictive accuracy of dynamic risk assessments decays over time, likely because individuals change on risk-relevant propensities. As such, regular reassessment with dynamic risk tools helps corrections officers evaluate individuals’ risk more accurately, and acute tools appear to benefit from more frequent reassessment than stable tools. Intervention plans should be updated according to reassessment results for the most effective management of criminogenic needs, such as psychological problems (e.g., emotional collapse, sexual preoccupation) and/or situational changes (e.g., victim access, loss of employment).
References
- Altman, D. G., & de Stavola, B. L. (1994). Practical problems in fitting a proportional hazards model to data with updated measurements of the covariates. Statistics in Medicine, 13(4), 301-341. https://doi.org/10.1002/sim.4780130402
- Andrews, D. A., & Bonta, J. (2010). Rehabilitating criminal justice policy and practice. Psychology, Public Policy, and Law, 16(1), 39–55. https://doi.org/10.1037/a0018362
- Andrews, D. A., Bonta, J., & Hoge, R. D. (1990). Classification for effective rehabilitation: Rediscovering psychology. Criminal Justice and Behavior, 17(1), 19–52. https://doi.org/10.1177/0093854890017001004
- Andrews, D. A., & Dowden, C. (2006). Risk principle of case classification in correctional treatment: A meta-analytic investigation. International Journal of Offender Therapy and Comparative Criminology, 50(1), 88–100. https://doi.org/10.1177/0306624X05282556
- Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16(3), 397-438. https://doi.org/10.1080/10705510903008204
- Babchishin, K., & Hanson, R. K. (2020). Monitoring changes in risk of reoffending: A prospective study of 632 men on community supervision. Journal of Consulting and Clinical Psychology, 88(10), 886–898. https://doi.org/10.1037/ccp0000601
- Bonta, J., & Andrews, D. A. (2017). The psychology of criminal conduct (6th ed.). Routledge.
- Bourgon, G., Mugford, R., Hanson, R. K., & Coligado, M. (2018). Offender risk assessment practices vary across Canada. Canadian Journal of Criminology and Criminal Justice, 60(2), 167–205. https://doi.org/10.3138/cjccj.2016-0024
- Brankley, A. E., Babchishin, K. M., Chankin, L., Barsetti, I., & Hanson, R. K. (2019). ACUTE-2007 Evaluator Workbook. [Unpublished report]. Public Safety Canada.
- Brankley, A. E., Babchishin, K. M., & Hanson, R. K. (2021). STABLE-2007 demonstrates predictive and incremental validity in assessing risk-relevant propensities for sexual offending: A meta-analysis. Sexual Abuse, 33(1), 34-62. https://doi.org/10.1177/1079063219871572
- Brankley, A. E., Helmus, L. M., & Hanson, R. K. (2017). STABLE-2007 evaluator workbook: Updated recidivism rates (includes combinations with Static-99R, Static-2002R, and Risk Matrix 2000) [Unpublished report]. Public Safety Canada.
- Brouillette-Alarie, S., Babchishin, K. M., Hanson, R. K., & Helmus, L. M. (2016). Latent constructs of the Static-99R and Static-2002R: A three-factor solution. Assessment, 23(1), 96–111. https://doi.org/10.1177/1073191114568114
- Brouillette-Alarie, S., & Hanson, R. K. (2015). Comparison of two measures of recidivism risk assessment of sexual offenders. Canadian Journal of Behavioural Science, 47(4), 292–304. https://doi.org/10.1037/cbs0000019
- Blumstein, A., & Nakamura, K. (2009). Redemption in the presence of widespread criminal background checks. Criminology, 47(2), 327–359. https://doi.org/10.1111/j.1745-9125.2009.00155.x
- Campbell, M. A., French, S., & Gendreau, P. (2009). The prediction of violence in adult offenders: A meta-analytic comparison of instruments and methods of assessment. Criminal Justice and Behavior, 36(6), 567-590. https://doi.org/10.1177/0093854809333610
- Cohen, T. H., Lowenkamp, C. T., & VanBenschoten, S. W. (2016). Examining changes in offender risk characteristics and recidivism outcomes: A research summary. Federal Probation, 80(2), 57–68. https://www.uscourts.gov/sites/default/files/80_2_8_0.pdf
- de Vries Robbé, M., de Vogel, V., Douglas, K. S., & Nijman, H. L. I. (2015). Changes in dynamic risk and protective factors for violence during inpatient forensic psychiatric treatment: Predicting reductions in post-discharge community recidivism. Law and Human Behavior, 39(1), 53–61. https://doi.org/10.1037/lhb0000089
- Douglas, K., & Skeem, J. (2005). Violence risk assessment: Getting specific about being dynamic. Psychology, Public Policy, and Law, 11(3), 347–383. https://doi.org/10.1037/1076-8971.11.3.347
- Edens, J. F., & Boccaccini, M. T. (2017). Taking forensic mental health assessment ‘out of the lab’ and into ‘the real world’: Introduction to the special issue on the field utility of forensic assessment instruments and procedures. Psychological Assessment, 29(6), 710-719. https://doi.org/10.1037/pas0000475
- Fernandez, Y., Harris, A. J. R., Hanson, R. K., & Sparks, J. (2014). STABLE-2007 coding manual: Revised 2014 (Unpublished manual). Public Safety Canada.
- Gordon, R. (2012). Applied statistics for the social and health sciences. Routledge.
- Hanson, R. K. (1998). What do we know about sex offender risk assessment? Psychology, Public Policy, and Law, 4(1–2), 50–72. https://doi.org/10.1037/1076-8971.4.1-2.50
- Hanson, R. K. (2018). Long-term recidivism studies show that desistance is the norm. Criminal Justice and Behavior, 45(9), 1340–1346. https://doi.org/10.1177/0093854818793382
- Hanson, R. K., Babchishin, K. M., Helmus, L. M., Thornton, D., & Phenix, A. (2017). Communicating the results of criterion referenced prediction measures: Risk categories for the Static-99R and Static-2002R sexual offender risk assessment tools. Psychological Assessment, 29(5), 582-597. https://doi.org/10.1037/pas0000371
- Hanson, R. K., Bourgon, G., McGrath, R., Kroner, D., D’Amora, D., Thomas, S., & Tavarez, L. (2017). A five-level risk and needs system: Maximizing assessment results in corrections through the development of a common language. The Council of State Governments Justice Center. https://csgjusticecenter.org/wp-content/uploads/2017/01/A-Five-Level-Risk-and-Needs-System_Report.pdf
- Hanson, R. K., & Harris, A. J. R. (2000). Where should we intervene? Dynamic predictors of sexual offense recidivism. Criminal Justice and Behavior, 27(1), 6–35. https://doi.org/10.1177/0093854800027001002
- Hanson, R. K., Harris, A. J. R., Letourneau, E., Helmus, L. M., & Thornton, D. (2018). Reductions in risk based on time offense free in the community: Once a sexual offender, not always a sexual offender. Psychology, Public Policy, and Law, 24(1), 48-63. https://doi.org/10.1037/law0000135
- Hanson, R. K., Harris, A., Scott, T., & Helmus, L. (2007). Assessing the risk of sexual offenders on community supervision: The Dynamic Supervision Project (Corrections Research User Rep. No. 2007-05). Public Safety Canada.
- Hanson, R. K., Helmus, L.-M., & Harris, A. J. R. (2015). Assessing the risk and needs of supervised sexual offenders: A prospective study using STABLE-2007, Static-99R, and Static-2002R. Criminal Justice and Behavior, 42(12), 1205–1224. https://doi.org/10.1177/0093854815602094
- Hanson, R. K., & Morton-Bourgon, K. E. (2009). The accuracy of recidivism risk assessments for sexual offenders: A meta-analysis of 118 prediction studies. Psychological Assessment, 21(1), 1–21. https://doi.org/10.1037/a0014421
- Hanson, R. K., Newstrom, N., Brouillette-Alarie, S., Thornton, D., & Miner, M. H. (2021). Does reassessment improve prediction? A prospective study of the Sexual Offender Treatment Intervention and Progress Scale (SOTIPS). International Journal of Offender Therapy and Comparative Criminology, 65(16), 1775-1803. https://doi.org/10.1177/0306624X20978204
- Harrell, F. E., Lee, K. L., & Mark, D. B. (1996). Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine, 15(4), 361–387. https://doi.org/10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4
- Harris, A. J. R., & Hanson, R. K. (2010). Clinical, actuarial and dynamic risk assessment of sexual offenders: Why do things keep changing? Journal of Sexual Aggression, 16(3), 296–310. https://doi.org/10.1080/13552600.2010.494772
- Harris, G. T., & Rice, M. E. (2007). Adjusting actuarial violence risk assessments based on aging or the passage of time. Criminal Justice and Behavior, 34(3), 297–313. https://doi.org/10.1177/0093854806293486
- Helmus, L. M., & Babchishin, K. M. (2017). Primer on risk assessment and the statistics used to evaluate its accuracy. Criminal Justice and Behavior, 44(1), 8–25. https://doi.org/10.1177/0093854816678898
- Helmus, L. M., Hanson, R. K., Murrie, D. C., & Zabarauckas, C. L. (2021). Field validity of Static-99R and STABLE-2007 with 4,433 men serving sentences for sexual offences in British Columbia: New findings and cumulative meta-analysis. Psychological Assessment, 33(7), 581–595. https://doi.org/10.1037/pas0001010
- Hill, D., & Demetrioff, S. (2019). Clinical-forensic psychology in Canada: A survey of practitioner characteristics, attitudes, and psychological assessment practices. Canadian Psychology/Psychologie Canadienne, 60(1), 55–63. https://doi.org/10.1037/cap0000152
- Howard, P., & Dixon, L. (2013). Identifying change in the likelihood of violent recidivism: Causal dynamic risk factors in the OASys violence predictor. Law and Human Behavior, 37(3), 163–174. https://doi.org/10.1037/lhb0000012
- Kelley, S. M., Ambroziak, G., Thornton, D., & Barahal, R. M. (2020). How do professionals assess sexual recidivism risk? An updated survey of practices. Sexual Abuse, 32(1), 3–29. https://doi.org/10.1177/1079063218800474
- Labrecque, R. M., Smith, P., Lovins, B. K., & Latessa, E. J. (2014). The importance of reassessment: How changes in the LSI-R risk score can improve the prediction of recidivism. Journal of Offender Rehabilitation, 53(2), 116–128. https://doi.org/10.1080/10509674.2013.868389
- Leite, W. L. (2007). A comparison of latent growth models for constructs measured by multiple items. Structural Equation Modeling, 14(4), 581–610. https://doi.org/10.1080/10705510701575438
- Liu, Y., Millsap, R. E., West, S. G., Tein, J.-Y., Tanaka, R., & Grimm, K. J. (2017). Testing measurement invariance in longitudinal data with ordered-categorical measures. Psychological Methods, 22(3), 486–506. https://doi.org/10.1037/met0000075
- Lloyd, C., Hanson, R., Richards, D., & Serin, R. (2020). Reassessment improves prediction of criminal recidivism: A prospective study of 3,421 individuals in New Zealand. Psychological Assessment, 32(6), 568–581. https://doi.org/10.1037/pas0000813
- McGrath, R. J., Cumming, G., Burchard, B., Zeoli, S., & Ellerby, L. (2010). Current practices and emerging trends in sexual abuser management: The Safer Society 2009 North American Survey. Safer Society Press. https://www.deslibris.ca/ID/223961
- McNaughton Nicholls, C., Callanan, M., Legard, R., Tomaszewski, W., Purdon, S., & Webster, S. (2010). Examining implementation of the stable and acute dynamic risk assessment tool pilot in England and Wales. Ministry of Justice. https://espace.library.uq.edu.au/view/UQ:236394
- Neal, T. M. S., & Grisso, T. (2014). Assessment practices and expert judgment methods in forensic psychology and psychiatry: An international snapshot. Criminal Justice and Behavior, 41(12), 1406–1421. https://doi.org/10.1177/0093854814548449
- Olver, M. E., Wong, S. C. P., Nicholaichuk, T., & Gordon, A. (2007). The validity and reliability of the Violence Risk Scale-Sexual Offender version: Assessing sex offender risk and evaluating therapeutic change. Psychological Assessment, 19(3), 318–329. https://doi.org/10.1037/1040-3590.19.3.318
- Phenix, A., Fernandez, Y., Harris, A. J. R., Helmus, M., Hanson, R. K., & Thornton, D. (2017). Static-99R Coding Rules – Revised 2016. Research Report 2017-R012. Public Safety Canada. http://www.saarna.org
- Polaschek, D. L. L., & Yesberg, J. A. (2018). High-risk violent prisoners’ patterns of change on parole on the DRAOR’s dynamic risk and protective factors. Criminal Justice and Behavior, 45(3), 340–363. https://doi.org/10.1177/0093854817739928
- Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111–163. https://doi.org/10.2307/271063
- R Core Team. (2013). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing, Vienna. http://www.R-project.org/
- Rice, M. E., & Harris, G. T. (2005). Comparing effect sizes in follow-up studies: ROC Area, Cohen’s d, and r. Law and Human Behavior, 29(5), 615–620. https://doi.org/10.1007/s10979-005-6832-7
- Serin, R. C., Chadwick, N., & Lloyd, C. D. (2019). Integrating dynamic risk assessment into community supervision practice. In D. L. L. Polaschek, A. Day, & C. R. Hollin (Eds.), The Wiley International Handbook of Correctional Psychology (pp. 725–743). John Wiley & Sons, Ltd. https://doi.org/10.1002/9781119139980.ch45
- Seto, M. C., & Fernandez, Y. M. (2011). Dynamic risk groups among adult male sexual offenders. Sexual Abuse: A Journal of Research and Treatment, 23(4), 494–507. https://doi.org/10.1177/1079063211403162
- Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press.
- Therneau, T. (2020). Package “survival” (Version 3.1-11) [Computer software].
- Tully, R. J., Chou, S., & Browne, K. D. (2013). A systematic review on the effectiveness of sex offender risk assessment tools in predicting sexual recidivism of adult male sex offenders. Clinical Psychology Review, 33(2), 287–316. https://doi.org/10.1016/j.cpr.2012.12.002
- van den Berg, J. W., Smid, W., Schepers, K., Wever, E., van Beek, D., Janssen, E., & Gijs, L. (2018). The predictive properties of dynamic sex offender risk assessment instruments: A meta-analysis. Psychological Assessment, 30(2), 179–191. https://doi.org/10.1037/pas0000454
- Viljoen, J. L., Gray, A. L., Shaffer, C., Bhanwer, A., Tafreshi, D., & Douglas, K. S. (2017). Does reassessment of risk improve predictions? A framework and examination of the SAVRY and YLS/CMI. Psychological Assessment, 29(9), 1096–1110. https://doi.org/10.1037/pas0000402
- Volinsky, C. T., & Raftery, A. E. (2000). Bayesian information criterion for censored survival models. Biometrics, 56(1), 256–262. https://doi.org/10.1111/j.0006-341X.2000.00256.x
- Vose, B., Smith, P., & Cullen, F. T. (2013). Predictive validity and the impact of change in total LSI-R score on recidivism. Criminal Justice and Behavior, 40(12), 1383–1396. https://doi.org/10.1177/0093854813508916
- Ward, T., & Beech, A. R. (2015). Dynamic risk factors: A theoretical dead-end? Psychology, Crime & Law, 21(2), 100–113. https://doi.org/10.1080/1068316X.2014.917854
- Wong, S., Olver, M., & Stockdale, K. (2009). The utility of dynamic and static factors in risk assessment, prediction, and treatment. In Handbook of violence risk assessment and treatment: New approaches for mental health professionals (pp. 83–120). Springer Publishing Co.
- Yang, M., Wong, S. C. P., & Coid, J. (2010). The efficacy of violence prediction: A meta-analytic comparison of nine risk assessment tools. Psychological Bulletin, 136(5), 740–767. https://doi.org/10.1037/a0020473
- Zamble, E., & Quinsey, V. (1997). The criminal recidivism process. Cambridge University Press.