Clinical Review (insufficient quality of evidence to enable a Clear Recommendation):
In the absence of RCTs, observational studies have generally found that neither serial inflammatory biomarkers (e.g., erythrocyte sedimentation rate [ESR], C-reactive protein [CRP]) nor routinely repeated imaging accurately predict long-term treatment success for osteomyelitis or PJI for individual patients, nor have they been shown to meaningfully alter treatment decisions beyond clinical observation. Thus, following inflammatory biomarkers and repeated imaging may not offer benefit or contribute to high value care in most patients. Nonetheless, repeated imaging may be useful for patients who are clinically failing therapy to inform source control attempts, identify mechanical complications such as pathological fracture, and/or to trigger reconsideration of the initial diagnosis.
ESR and CRP are the two serum biomarkers most commonly used to establish the diagnosis of osteomyelitis and to monitor its response to therapy, in conjunction with clinical acumen, history and physical examination findings, and diagnostic imaging. Other serum biomarkers, such as procalcitonin, have not been shown to have better sensitivity.
No RCTs have been published to assess the impact of serial biomarkers on the treatment outcomes of osteomyelitis or joint infections. However, numerous observational studies of various designs, sizes, and quality have been published that evaluate the utility of such tests.
Among many studies, Carragee et al. conducted a retrospective chart review of 44 cases to describe the clinical use of ESR in monitoring outcomes for pyogenic vertebral osteomyelitis.(Carragee, Kim et al. 1997) The 44 patients had ESR tested at or before time of diagnosis and at least twice during the following month. The study revealed a correlation of ESR with response to treatment, in that those with a decline in ESR were unlikely to have clinical failure. Indeed, a rapid decline of ESR (> 50% in the first month) was rarely seen in treatment failure. However, failure of the ESR to decline did not predict treatment failure. In fact, by approximately two weeks after antibiotic treatment, 19 of 32 patients had ESRs that were actually higher than at the time of diagnosis. Yet these patients went on to achieve clinical cure without surgery. Thus, the accuracy of the ESR at predicting who would fail and require some modification to the treatment regimen was poor. It is unclear how the test could alter therapy compared to clinical observation alone.
Similarly, in observational studies of 345, 79, and 38 patients with vertebral osteomyelitis, ESR and CRP levels were assessed and compared between patients who did or did not achieve long-term treatment success.(Kowalski, Berbari et al. 2006, Nanni, Boriani et al. 2012, Park, Cho et al. 2016) There was no relationship between ESR and CRP and risk of subsequent relapse at any time point.
In a study of 45 patients, Yoon et al. found that an ESR > 55 mm/h and CRP > 27.5 mg/L at four weeks after antibiotic treatment were associated with a higher rate of treatment failure, defined as disease progression or recurrence, with an OR (95% CI) of 5.2 (1.0-26.6; p = 0.04).(Yoon, Chung et al. 2010) However, the maximal sensitivity and specificity for ESR were 70% and 40%, respectively, and for CRP were 40% and 86%, respectively (+LR ≤ 3 and -LR ≥ 0.7). Thus, while elevated biomarker values were statistically associated with short-term treatment failure at the population level, the accuracy of the tests for shifting post-test probabilities at the level of individual patients was poor.
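The likelihood ratios quoted for this study follow directly from the reported sensitivity and specificity. As a minimal sketch of that conversion (the function name `likelihood_ratios` is a hypothetical helper for illustration, not something taken from the cited study):

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (+LR, -LR) for a test, with inputs given as fractions.

    +LR = sensitivity / (1 - specificity)
    -LR = (1 - sensitivity) / specificity
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    return positive_lr, negative_lr


# Maximal sensitivity and specificity reported by Yoon et al. at four weeks.
esr_pos, esr_neg = likelihood_ratios(0.70, 0.40)
crp_pos, crp_neg = likelihood_ratios(0.40, 0.86)

print(f"ESR: +LR = {esr_pos:.2f}, -LR = {esr_neg:.2f}")
print(f"CRP: +LR = {crp_pos:.2f}, -LR = {crp_neg:.2f}")
```

A +LR close to 1 or a -LR close to 1 means the result barely changes the probability of the outcome; by this calculation the ESR in particular is nearly uninformative in either direction.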
Babouee Flury et al. conducted a retrospective study of 61 patients with vertebral osteomyelitis and found that the only independent predictor of switch to oral antibiotics was a lower CRP at two weeks compared with baseline, with an OR of 0.7 per 10 mg/L increase in CRP (p = 0.041).(Babouee Flury, Elzi et al. 2014) Thus, CRP correlated with physicians’ clinically-based assessments that patients were improving. Furthermore, nearly all patients achieved clinical response, whether their CRP fell or not. Again, it is unclear how biomarker information could have changed management.
Similarly, in a retrospective analysis of 21 patients with postoperative wound infections after spinal surgery, Khan et al. found that ESR levels did not correlate with clinical improvement.(Khan, Smith et al. 2006) However, the authors reported that CRP levels tracked well with clinical response: decreases in CRP levels paralleled patients clinically responding, while CRP levels remained elevated in patients who were clinically failing. Patients with clinical failure demonstrated persistent sinus tract drainage of pus while on therapy, required repeat surgical debridement, and/or had persistent erythema at the infection site. Yet, all of the clinically failing patients were known to be failing anyway based on physical exam, so it is unclear what new information the CRP added. In other words, CRP did not provide additional, practice-altering information over and above physical exam findings that were consistent with treatment failure.
Michail et al. conducted a prospective study to examine the performance of serum inflammatory markers in the diagnosis and monitoring of patients with DFI.(Michail, Jude et al. 2013) Of 61 patients (average age 63) with untreated foot infection, 27 had a diagnosis of osteomyelitis based on clinical exam and confirmed with imaging. The remainder (n = 34) were diagnosed with soft tissue infection only. Serologic markers (e.g., ESR, CRP) were obtained in all patients at baseline, one week, three weeks, and three months. At baseline, serologic markers were significantly higher in patients with osteomyelitis compared to those with soft tissue infection. After initiation of antibiotics, serologic markers declined. While it took, on average, seven days for CRP to return to near normal, ESR remained high until month three in those with underlying osteomyelitis. Unfortunately, outcomes are not described in this study, so it cannot be ascertained whether either marker accurately predicted clinical response to therapy, and in a manner distinct from clinical observation alone.
Tardaguila-Garcia et al. conducted an observational cohort study to analyze the predictive role of inflammatory markers in the healing time of DFO either managed by surgery or antibiotic treatment.(Tardaguila-Garcia, Garcia Alvarez et al. 2020) They found no correlation of inflammatory markers with healing time regardless of treatment group.
Van Asten et al. conducted a cohort study of 24 patients with DFO to determine if inflammatory markers could be used to monitor the treatment of DFI.(van Asten, Jupiter et al. 2017) The biomarkers of interest included ESR, CRP, PCT, interleukins (IL-2, IL-6, IL-8), and TNF. The authors reported that CRP, ESR, PCT, and IL-6 levels significantly declined in the group with osteomyelitis after starting therapy. However, outcomes were not assessed, and it was therefore again not possible to determine how such levels could have altered outcomes or management compared to clinical observation alone.
In a second, longitudinal cohort study, Van Asten et al. evaluated trajectories of biomarkers, including ESR, CRP, and WBC count, in 122 patients treated for DFO.(van Asten, Jupiter et al. 2017) Initial inflammatory levels did not correlate with long-term outcomes. The authors found that CRP and ESR fell less rapidly in patients who had poor outcomes compared to those who healed with therapy. However, no formal ROC curves were calculated, and the graphs demonstrate considerable overlap between the values over time, suggesting accuracy was low at distinguishing outcomes in individual patients.
Lin et al. sought to determine the association between both ESR and CRP and osteomyelitis recurrence.(Lin, Vasudevan et al. 2016) They reviewed records of 81 males and 27 females with a median age of 54 years (range 10 to 87) who underwent antibiotic and surgical treatment for primary (n=68) or recurrent (n=40) osteomyelitis that was related (n=26) or unrelated (n=82) to a prosthesis. Of the 40 cases of osteomyelitis recurrence at a median 23.4 (range, 0.6-74.0) months of follow up, 7 and 33 were related and unrelated to a prosthesis, respectively. Risk factors for osteomyelitis recurrence were ESR ≥ 20 mm/h, infection with MRSA, and infection in the lower limb. Evaluating numerous cut-points of both ESR and CRP by regression analyses, they were able to find statistically significant relationships between individual test levels and hazard ratios of recurrence of osteomyelitis among the entire cohort. However, the sensitivity and specificity for both tests at predicting relapse in individual patients ranged from 50% to 85% (with most values being in the 60%-70% range), resulting in relatively poor likelihood ratios (+LR < 5 and -LR > 0.6) at all cut-point values analyzed. Thus, irrespective of hazard ratios for predicting the proportion of patients who would relapse across a population, the tests remained relatively inaccurate for predicting who would relapse among individual patients.
Faizal et al. reported on 51 adult patients with skull-base osteomyelitis, for whom ESR and CRP were ordered at initiation of therapy and at end of therapy, between weeks 6 and 8.(Faizal, Surendran et al. 2020) Upon completion of eight weeks of antibiotic therapy, 30 of the 51 (59%) patients were asymptomatic. Of these 30 patients, only three had achieved normal ESR and CRP values. Yet all 30 of these patients continued to be asymptomatic throughout the period of follow up, indicating the testing was not useful. Furthermore, the authors tried to establish best cut-off values for ESR or CRP, which, while still reflecting abnormal levels, had fallen enough that they could be considered indicative of treatment success. The best sensitivity and specificity they could achieve were 70%-80% and 60%, respectively (+LR < 3 and -LR > 0.3), and the best correlation between ESR/CRP and PET scan was 60%-70%.
Ghani et al. examined the usefulness of CRP testing in determining whether a PJI had been treated successfully.(Ghani, Hutt et al. 2020) They found no difference between the mean CRP values of successful vs. unsuccessful treatment groups. Similar studies suggest that serial CRP monitoring is not reliable in determining infection specifically in two-stage revision procedures. Ghanem et al. sought to determine the usefulness of CRP as a test to determine the eradication of infection and the success in DAIR and single-stage revision.(Ghanem, Azzam et al. 2009) The best area under the ROC curve was 0.55 (poor capacity to distinguish), which was not statistically significant. They concluded that CRP often does not normalize even when the infection is eradicated.
Shukla et al. performed a study looking at 87 infected total hip arthroplasties treated with antibiotic spacer and six weeks antibiotics.(Shukla, Ward et al. 2010) ESR and CRP were obtained before reimplantation. They came to a similar conclusion that ESR and CRP were not sufficiently rigorous tests and frequently remained elevated in patients whose infection had been eradicated.
Recently, Maier et al. evaluated the ESR:CRP ratio (ECR) as a marker for predicting infection resolution in 179 patients with acute PJI, acute hematogenous PJI, or chronic PJI who underwent DAIR.(Maier, Klemt et al. 2020) The area under the ROC curve was calculated to evaluate ECR as a diagnostic marker for predicting postoperative reinfection in patients who underwent DAIR. Statistically significant differences in ECR were found in patients who underwent DAIR revision vs. total joint arthroplasty for chronic infection (1.23 vs. 2.33; p = 0.04). There was no significant difference in ECR in patients who underwent DAIR for acute infection (p = 0.7) and acute hematogenous infection (p = 0.6). In patients who underwent DAIR for chronic PJI, ECR demonstrated a sensitivity and specificity of 75% and 84%, respectively, for the prediction of postoperative reinfection, which was significantly higher than that of ESR alone (sensitivity, 67%; specificity, 47%; p < 0.001) or CRP alone (sensitivity, 50%; specificity, 26%; p < 0.001). Nevertheless, that superior accuracy still resulted in only marginally useful likelihood ratios (+LR < 5, -LR ≈ 0.3).
Finally, in one of the largest studies conducted to date, Bejon et al. also came to a similar conclusion.(Bejon, Byren et al. 2011) They analyzed 3,732 serially obtained CRPs from 151 total joint arthroplasty patients (71 hip, 76 knee, and four elbow revisions) who had undergone two-stage revision for PJI, and 109 patients who had undergone DAIR (51 hip replacement, 50 knee replacements, and eight other joints). They reported that CRP values and changes in values were inaccurate at predicting treatment success, with poor ROCs. As Dr. Bejon and colleagues noted in their discussion, “CRP could not be recommended as a diagnostic test based on the sensitivity and specificity values indicated by ROCs. This does not reflect limited power of the study, but the wide scatter of individual readings in both outcome groups, as found in previous studies.”
Collectively, the data are not compelling that inflammatory biomarker values over time can accurately predict osteomyelitis outcomes in a manner distinct from clinical observation, or inform a change in management to improve outcomes in individual patients. These lab tests may be used more for clinician psychological reassurance than to inform beneficial patient care decisions. If so, they are potential examples of wasteful, low-value care. Absent more compelling prospective data that demonstrate their ability to alter clinical decision-making in a manner that improves outcomes, we do not recommend their routine monitoring to assess response to therapy.
In a study of osteomyelitis in children, 164 MRIs were ordered over time for 59 patients.(Courtney, Flynn et al. 2010) All repeat MRIs continued to show evidence of osteomyelitis due to abnormal marrow signal, including in patients who went on to treatment success. Of the 104 repeat MRIs (subtracting out the 59 baseline MRIs), 28 were ordered within the first two weeks of therapy, all due to “worsening clinical course.” Eight (29%) of these resulted in a change in management. Of the remaining 76 repeat MRIs that were ordered after the first two weeks of therapy, only three (4%) changed management. Thus, 8 of the 11 (73%) studies that changed management did so within the first two weeks of therapy. Overall, 10 of the 11 studies that changed management were triggered because of clinical signs and symptoms of failure of response to therapy. Thus, in this uncontrolled case series, MRI was informative only to confirm clinical suspicion of failure of response to therapy, and to guide changes in management; MRI was not informative as general surveillance in patients with clinical response.
In a case series of 79 patients with vertebral osteomyelitis who had repeat imaging, the median duration of antimicrobial therapy was 58 days.(Kowalski, Berbari et al. 2006) The median follow up was 739 days. Imaging was repeated by physician choice at 4 to 8 weeks, likely triggered by clinical concerns of treatment failure. The finding of an improved MRI at 4 to 8 weeks was predictive of long-term clinical success, achieved in 26 of 27 (96%) patients with improved MRI. However, those patients were clinically improved anyway, so it is unclear how the MRI information could have changed management. Furthermore, when including patients with stable MRI findings at 4 to 8 weeks, the positive predictive value fell dramatically, to 47 of 65 (72%). Worsening MRI findings failed to reliably predict poor outcome, as only 5 of 14 (36%) patients with worsening MRI at 4 to 8 weeks experienced clinical failure at long-term follow up. By univariate analysis, the strongest predictor of long-term treatment success was clinical improvement at follow-up.
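The predictive values in this case series are simple proportions of the reported counts. The sketch below recomputes them, which also makes explicit that the three figures answer different questions (success after an improved MRI, success after an improved or stable MRI, and failure after a worsening MRI). The variable names and the helper `percent` are ours, chosen for illustration only.

```python
def percent(events, total):
    """Return events/total as a percentage rounded to the nearest integer."""
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("need 0 <= events <= total and total > 0")
    return round(100 * events / total)


# Counts reported by Kowalski et al. for MRI repeated at 4 to 8 weeks.
success_if_improved = percent(26, 27)            # long-term success, improved MRI
success_if_improved_or_stable = percent(47, 65)  # long-term success, improved or stable MRI
failure_if_worsening = percent(5, 14)            # clinical failure, worsening MRI

print(f"Success after improved MRI: {success_if_improved}%")
print(f"Success after improved or stable MRI: {success_if_improved_or_stable}%")
print(f"Failure after worsening MRI: {failure_if_worsening}%")
```

Read this way, a worsening MRI was followed by long-term success more often than by failure, which is why it could not serve as a trigger for changing therapy on its own.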
In an additional study of 29 patients with vertebral osteomyelitis, all patients had baseline MRIs and repeat MRIs at three months, and 22 patients had additional repeat imaging at six months.(Zarrouk, Feydy et al. 2007) Antibiotic therapy was administered for an average of 14 weeks. All patients were described to have treatment success at 18 months of follow up. Nevertheless, abnormal MRIs, principally due to marrow edema, persisted in 67% of patients at three months of therapy, and in 15% at six months of therapy. None of those patients experienced clinical failure, and there were no differences in imaging studies in patients who had persistent pain or neurological sequelae from infection compared to those who did not. Finally, 30% of patients had epidural abscesses on imaging at baseline, and all had resolved by three months, in parallel with clinical response. Persistent bony abnormalities on follow-up MRI that did not predict clinical failure in improving patients have been described in multiple other case series as well.(De Korvin, Provensol et al. 1994, Niccoli Asabella, Iuele et al. 2015, Riccio, Chu et al. 2015)
Nuclear medicine scans have also failed to distinguish patients with osteomyelitis who went on to have long-term treatment success from those who did not, generally because the tests were overly sensitive and continued to show bone abnormalities in responding patients.(Scoles, Hilty et al. 1980, Alazraki, Fierer et al. 1985) Studies of CT scans have been inconclusive due to small sample sizes and a lack of correlation between changes in radiographic results and long-term treatment success.(Kattapuram, Phillips et al. 1983)
In a study of 51 adult patients with skull-base osteomyelitis, PET scans were obtained at initiation of therapy and at end of therapy, between weeks 6 and 8.(Faizal, Surendran et al. 2020) After completion of initial antibiotics, the PET scan was repeated every three months, until it was normal or the patient was asymptomatic and had normal ESR and CRP. Among the 21 patients who continued to have symptoms at eight weeks, nine were continued on antibiotic therapy for up to six months, and four received treatment for up to 15 months. Whether or not it was necessary to continue therapy because of positive PET scans could not be determined. Overall, this study found that biomarkers and PET scans did not predict clinical failure in patients who were clinically responding, did not correlate well with one another, and appeared to result in extreme prolongation of therapy in a subset of patients, without clear benefit.
In two studies totaling 35 patients with vertebral osteomyelitis who had serial PET scans, PET scan uptake tended to decrease on antibiotic therapy, consistent with response to therapy.(Niccoli Asabella, Iuele et al. 2015, Riccio, Chu et al. 2015) Yet, results significantly overlapped, making it impossible to distinguish someone adequately treated from someone who was not. Furthermore, the small sample sizes, variable follow-up, and variable antibiotic treatments administered made it impossible to discern if the PET scan results precipitated a change in clinical outcomes.
In a third study of 38 patients with vertebral osteomyelitis, serial PET scans were more sensitive and specific than ESR or CRP, achieving approximately an 80% sensitivity and specificity for predicting “response”.(Nanni, Boriani et al. 2012) That combination of sensitivity and specificity results in +LR and -LR of about 4 and 0.3, which reflects only a modest ability to change post-test probability. Furthermore, the definition of “response” was vague, defined as, “assessed during therapy on the basis of clinical status and inflammatory indexes.” Yet, ESR and CRP may not be relevant to assessing therapeutic response. The only meaningful definition of success is: did patients achieve long-term success without clinical evidence of relapse (of signs and symptoms of infection)?
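To illustrate what a “modest ability to change post-test probability” looks like in practice, the following sketch converts a pre-test probability into a post-test probability through odds, using the approximate 80% sensitivity and specificity quoted above. The 20% pre-test probability is a purely hypothetical value chosen for illustration; it is not taken from the cited study, and the function name is ours.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must be strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


sensitivity, specificity = 0.80, 0.80          # approximate values quoted in the text
pos_lr = sensitivity / (1.0 - specificity)     # about 4
neg_lr = (1.0 - sensitivity) / specificity     # 0.25, i.e. about 0.3

assumed_pre_test = 0.20  # HYPOTHETICAL pre-test probability, for illustration only

after_positive = post_test_probability(assumed_pre_test, pos_lr)
after_negative = post_test_probability(assumed_pre_test, neg_lr)

print(f"After a positive scan: {after_positive:.0%}")
print(f"After a negative scan: {after_negative:.0%}")
```

Under that assumed starting point, a positive result raises the probability of the predicted outcome only to about even odds, and a negative result still leaves a residual probability of roughly 6%, so neither result settles the question for an individual patient.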
Cumulatively, no data indicate that routine surveillance imaging of any type, including MRI or PET, is clinically impactful or results in improved patient outcomes. Marrow signal abnormalities can persist for many months, even in patients who are successfully treated, and cannot accurately distinguish those who will achieve treatment success from those who will not. The use of overly sensitive imaging studies to monitor therapy may trigger inappropriately and unnecessarily long courses of antibiotic therapy, exposing patients to harm from drug side effects and selection for antibiotic resistance. The primary driver of clinical decision-making (e.g., whether to prolong antibiotics from a standard six-week course of therapy, whether to evaluate for the need for a source control procedure, etc.) should be clinical response to therapy.
Therefore, we do not recommend routine serial imaging in patients with osteomyelitis to determine response to therapy. However, it is rational to repeat imaging in patients who are clinically not responding to antimicrobial therapy to evaluate the need for and feasibility of achieving source control, or to reconsider the initial diagnosis.