Psychometric Properties of the Scoliosis Research Society Questionnaire (Version 22r) Domains Among Adults With Spinal Deformity: A Rasch Measurement Theory Analysis
Abstract
Objective
Patients with adult spinal deformity (ASD) have lower health-related quality of life (HRQoL) than the general population. Applying Rasch measurement theory (RMT), this study tested the revised Scoliosis Research Society-22 (SRS-22r) HRQoL instrument among symptomatic adult patients with degenerative spinal disorders and varying degrees of ASD.
Methods
SRS-22r data from 637 outpatient spine clinic patients with degenerative spine conditions were investigated for unidimensionality, item/scale fit, differential item functioning (DIF), scale coverage/targeting, and person separation index (PSI) using RMT.
Results
Unidimensionality of the SRS-22r was not supported for either the total score or for 3 of its 5 domains. Item fit was acceptable for 11/22 items. The individual domains showed good coverage regardless of the degree of structural disorders. Ordered thresholds were achieved by merging response categories in some of the items. DIF for age or sex was found in 11/22 items and in some domain items. The PSI exceeded 0.7 for the SRS-22r total score.
Conclusion
The individual domain scores of the SRS-22r perform better than the total score, providing good coverage and targeting among patients with ASD. Refinements of items and domains may improve the structural validity of the instrument to meet the criteria for measuring ASD patients, even when multidimensionality persists.
INTRODUCTION
Adult spinal deformity (ASD) is a common problem, with a reported prevalence of 32%, increasing with age [1]. In the population over age 60, a prevalence of degenerative scoliosis as high as 68% has been reported [2]. The prevalence of spinal deformities is expected to rise further with increased life expectancy and population aging [1]. ASD patients have been reported to have lower health-related quality of life (HRQoL) when compared to the unaffected population standardized by age [3]. Patients with symptomatic spinal structural disorders have pain and limitations in functional abilities as well as problems with self-image and mental health issues [3]. Compared to other prevalent chronic diseases, such as arthritis, congestive heart failure, chronic lung disease and diabetes, impaired HRQoL has been found to be even more prominent in patients with ASD [4].
Spinal structural disorders in adults develop gradually over the years due to multiple etiologies, such as spinal degeneration, idiopathic scoliosis, neuromuscular or congenital origin, and obliquity of the pelvis [5]. Low back pain and sciatica are usually the main symptoms in the early phases of ASD [6]. Patients’ HRQoL is affected in the early phases of sagittal malalignment long before the visible loss of sagittal or coronal balance [7,8]. Thus, it is essential to monitor the HRQoL of the patients with spinal disorders and detect problems associated with their spinal structural changes.
The Scoliosis Research Society (SRS) questionnaire is a deformity-specific patient-reported outcome (PRO) instrument used to measure HRQoL outcomes of patients with spinal deformity [3,8-10]. Thus far the SRS questionnaire is the only disease-specific instrument available to measure HRQoL in patients of all ages with spinal deformity.
The revised Scoliosis Research Society-22 (SRS-22r) has 22 items [11,12] which are based on a 5-point symmetrical agree-disagree Likert scale. The response options are “very good, good, fair, poor, very poor,” “none, mild, moderate, moderate to severe, severe,” or “very often, often, sometimes, rarely, never.” The items are scored from 1 to 5, with 1 being the worst and 5 the best result. The SRS questionnaire and the scoring guide are freely available on the patient outcomes webpage: www.srs.org/professionals. The questionnaire can be scored as subtotals for individual domains (function/activity, pain, self-image/appearance, mental health) or as a total score (subtotal + satisfaction with management domain) (Table 1).
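The scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item-to-domain allocation below is the commonly published SRS-22r allocation (Table 1 is not reproduced here), domain scores are assumed to be the mean of the answered items, and the function name `score_srs22r` is hypothetical. The official scoring guide should be consulted before any real use.

```python
from statistics import mean

# Assumed item-to-domain allocation; verify against the official scoring guide.
DOMAINS = {
    "function": [5, 9, 12, 15, 18],
    "pain": [1, 2, 8, 11, 17],
    "self_image": [4, 6, 10, 14, 19],
    "mental_health": [3, 7, 13, 16, 20],
    "satisfaction": [21, 22],
}


def score_srs22r(responses):
    """Return mean scores per domain plus the subtotal and the total.

    responses maps item number (1-22) to a score from 1 (worst) to 5 (best);
    unanswered items are simply left out of the mapping.
    """
    def mean_of(items):
        answered = [responses[i] for i in items if i in responses]
        return mean(answered) if answered else None

    scores = {name: mean_of(items) for name, items in DOMAINS.items()}
    # Subtotal: the four non-management domains (20 items).
    subtotal_items = [
        i for name, items in DOMAINS.items() if name != "satisfaction" for i in items
    ]
    scores["subtotal"] = mean_of(subtotal_items)
    # Total: subtotal items plus the satisfaction with management items.
    scores["total"] = mean_of(subtotal_items + DOMAINS["satisfaction"])
    return scores
```

Keeping the domain scores separate from the total mirrors the two ways of scoring the questionnaire mentioned in the text.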
Since 1999 the SRS questionnaires have had different versions: 22, 22r (revised), 23, 24, and 30. In 2021, the SRS removed all but the revised version of SRS-22 from their webpage (www.srs.com) and now recommends that all practitioners utilize the SRS-22r and its various translations. A translation code from all versions to the SRS-22r has been published by the developers of the questionnaires [12]. Thus far the SRS-22r has been more widely translated, validated, and revised in the adolescent scoliosis population [11] than among adults with degenerative spinal deformity [3,13-15]. Both the measurement properties of different translations [14,16] and the structural validity of the SRS-22r total score [17-19] have been debated.
The SRS questionnaire has previously been found to be culturally and linguistically valid among adult patients with degenerative spinal complaints in Finland [10]. However, the structural validity of the SRS-22r domains has not been investigated using a partial credit model based on the Rasch measurement theory (RMT) model. Shortcomings in an instrument’s structure may lead to bias when comparing large patient cohorts with different ages, diagnoses, cultures, and languages between centers or during follow-ups [17]. The RMT provides a tool to investigate the ability of the SRS-22r to measure a latent trait such as function, pain, self-image, mental health, or satisfaction with management. A scale measuring one latent trait can be considered unidimensional and linear, which is essential when measuring longitudinal changes in scores. Item and scale fit in the predefined model as well as construct validity can be tested using the RMT. Furthermore, response bias in each of the scale items can be tested for different degrees of structural disorders using differential item functioning (DIF). Currently, the RMT analysis can be considered one of the gold standard statistical techniques for instrument development and psychometric validation research [20].
This study aimed to evaluate the applicability of the SRS-22r domains in clinical practice among all patients with subacute and chronic spinal degenerative conditions, with special emphasis on the level of structural disorders. Unidimensionality, item and scale fit, residual correlation, DIF, scale coverage/targeting, and person separation were investigated using RMT based on predefined hypotheses.
MATERIALS AND METHODS
A total of 991 consecutive patients with prolonged degenerative thoracolumbar disorder referred for specialist consultation to the spine clinic during 12 months in 2013 and 2014 were recruited to the study. Inclusion criteria were age over 18 years, ability to communicate in the official language and having full spine radiographs. Patients with specific health conditions, including malignancy, pregnancy, neuromuscular disease, or acute fracture were excluded. Altogether 874 patients met the inclusion criteria and 670 gave their written informed consent to participate in the study.
The patients completed the SRS-22r version of the questionnaire. Based on the spinal radiographs, the patients were classified into 3 categories of sagittal structural disorder severity (none or mild, moderate, and severe) according to the SRS-Schwab simplified classification as previously described [8]. All radiographic parameters were measured by a senior spine deformity surgeon. The study protocol was approved by the Central Finland Healthcare District Research Ethical Committee, Jyväskylä, Finland (17U/2012).
RMT is a mathematical model designed to evaluate the properties of measurement instruments [21]. RMT analysis calculates the extent to which the observed responses fit the responses predicted by the predefined measurement model and assesses the unidimensionality of the scale and the precision of measurement [22,23]. The model is based on latent trait theory and the application of additive conjoint measurement [24]. For Rasch analysis, sample sizes of ≥200 subjects can be considered very good, sizes of 100–199 adequate, sizes of 50–100 doubtful, and sizes of <50 subjects inadequate [25].
The study applied the RMT using RUMM2030 software to measure construct validity, model and individual item fit, and reliability. The polytomous partial credit model [26] was chosen. The RMT mathematical model describes the process and pertinent psychometric criteria for fit statistics and reliability [26-29]. Person estimation was conducted with the weighted maximum likelihood method. Analyses were conducted using statistical and illustrative tests in the software. Unidimensionality is one of the main assumptions of the RMT and refers to whether the items in a PRO instrument measure a single construct or a specific latent trait, such as pain or function.
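To make the partial credit model mentioned above more concrete, the following minimal Python sketch computes the category probabilities for a single polytomous item. It is an illustration of the general model form only, not the estimation routine of the software used in the study; the function name `pcm_probabilities` and the example values are hypothetical.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta: person location in logits.
    thresholds: the m threshold locations of the item, in logits.
    Returns m + 1 probabilities for categories 0..m.
    """
    # Category x has exponent sum_{k<=x} (theta - tau_k); category 0 has 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, a person located exactly at an item's first threshold is equally likely to respond in the two categories adjacent to that threshold, which is the property used when thresholds are inspected for ordering.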
The unidimensionality of the SRS-22r total score and its domains was examined. Principal component analysis (PCA) was conducted to define the “Rasch factor,” i.e., the first factor identified with the highest eigenvalue. After identifying the Rasch factor, the existence of residual factors was examined by dividing the SRS-22r items into 2 groups according to their correlation coefficients with the second factor identified in the PCA. The items with correlation coefficients over +0.3 and those with correlation coefficients below -0.3 formed the 2 sets of items. Person estimates were calculated for each patient from both sets of items. The estimates from the 2 item sets were then compared patient by patient in a series of independent-samples t-tests. A proportion of less than 5% of significant t-tests at a 0.05 probability was used as the criterion for unidimensionality. Further, residual correlations between each item pair were calculated to identify item dependency. We used a value equal to or over 0.2 to recognize residual correlations [30]. Higher values are generally considered to indicate similarity between items and hence either item redundancy or the existence of another latent trait after controlling for the influence of the primary factor. If unidimensionality was violated, testlets based on residual correlation between items were formed.
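The patient-by-patient comparison described above can be sketched as follows. This is a simplified illustration that assumes each person estimate comes with a standard error and that a normal approximation (critical value 1.96) is adequate; the function name and inputs are hypothetical, and the software used in the study may compute the test differently.

```python
import math


def proportion_significant(est_a, se_a, est_b, se_b, critical=1.96):
    """Share of patients whose two person estimates differ significantly.

    est_a, est_b: person estimates (logits) from the two item sets.
    se_a, se_b: the corresponding standard errors.
    """
    significant = 0
    for a, sa, b, sb in zip(est_a, se_a, est_b, se_b):
        # t statistic for the difference between two independent estimates.
        t = (a - b) / math.sqrt(sa ** 2 + sb ** 2)
        if abs(t) > critical:
            significant += 1
    return significant / len(est_a)


def supports_unidimensionality(proportion, limit=0.05):
    """Apply the criterion of fewer than 5% significant t-tests."""
    return proportion < limit
```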
A testlet is formed by summing the response categories of suitable items into one item. Thus, testlets are item bundles that share a common content. To alleviate the influence of item dependency, each bundle is considered as a single polytomous item. The resulting polytomous RMT model is then applied to analyze the testlets. Items with residual correlations over 0.2 were combined to form testlets. Next, another set of independent-samples t-tests was conducted to investigate whether the violation of unidimensionality had been corrected. The authors hypothesized that the SRS-22r instrument and its 5 separate domains would exhibit a unidimensional structure.
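A minimal sketch of the two steps described above, flagging locally dependent item pairs and summing selected items into a testlet, might look as follows. The data structures and function names are hypothetical, and the residual correlations are assumed to have been exported from the Rasch software beforehand.

```python
def flag_dependent_pairs(residual_corr, cutoff=0.2):
    """Return the item pairs whose residual correlation is at or above the cutoff.

    residual_corr maps an (item, item) tuple to the residual correlation.
    """
    return sorted(pair for pair, r in residual_corr.items() if r >= cutoff)


def form_testlet(responses, items):
    """Sum the scores of the chosen items into one polytomous super-item.

    responses maps item number to score. Returns the testlet score and the
    responses to the items that were not bundled.
    """
    testlet_score = sum(responses[i] for i in items)
    remaining = {i: s for i, s in responses.items() if i not in items}
    return testlet_score, remaining
```

In the analysis, the final choice of items for a testlet also took item content into account, so the flagged pairs are a starting point rather than an automatic rule.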
To investigate the fit of the SRS-22r to the RMT, overall goodness-of-fit and item fit statistics were calculated. Chi-square (χ2) values (item-trait interaction) and standardized fit residuals (item–person interaction) were investigated to identify item fit. χ2-values can be used to investigate how well the difficulty of the item matches the ability of the respondent and hence whether the item correctly discriminates between different states of the trait being measured. The standardized fit statistics provide information on how much a response differs from the model expectation. Very low standardized fit statistics indicate redundancy and high values indicate poor fit (Supplementary material 1). The authors hypothesized that the p-values of the chi-square test statistics after Bonferroni adjustment would be nonsignificant, indicating good fit.
The fit residual calculation (item–person interaction score) shows the level of divergence of the item for the persons who fit the model. The divergence calculation yields a residual score that approximates a standard normal distribution with an expected mean of 0 and a standard deviation of 1. In the RMT model, fit residuals between -2.5 and +2.5 are generally considered to indicate acceptable fit. Values below or over this range indicate over- or underdiscrimination in relation to average discrimination ability and thus poor fit of the item to the RMT model and measurement disturbances. A fit residual far below this range can also point to redundancy of the given item, as the item may not contribute any new information to the scale.
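The two item fit criteria described above can be combined in a short screening sketch. The function names are hypothetical, and the fit residuals and chi-square p-values are assumed to come from the Rasch software; the code only applies the Bonferroni-adjusted significance level and the ±2.5 residual range.

```python
def bonferroni_alpha(alpha, n_tests):
    """Bonferroni-adjusted significance level for n_tests comparisons."""
    return alpha / n_tests


def classify_item_fit(fit_residual, chi2_p, n_items, alpha=0.05, limit=2.5):
    """Return a list of fit problems for one item, or ['acceptable']."""
    flags = []
    if chi2_p < bonferroni_alpha(alpha, n_items):
        flags.append("significant chi-square")
    if fit_residual > limit:
        flags.append("underdiscrimination")
    elif fit_residual < -limit:
        flags.append("overdiscrimination (possible redundancy)")
    return flags or ["acceptable"]
```

With 22 items, the adjusted significance level is 0.05 / 22, i.e. roughly 0.0023.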
The targeting ability of each item was examined by investigating the order of the thresholds of the response categories. A threshold indicates the point at which a response is equally likely (50% probability) to fall into either of the 2 adjacent categories. Disordered thresholds indicate that the response categories resemble each other too closely to detect which category the answer should fall into. The authors hypothesized that the thresholds of the SRS-22r would be ordered.
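Checking threshold order and rescoring an item after merging adjacent response categories can be sketched as below. The threshold values in the usage note are invented for illustration, and the helper names are hypothetical.

```python
def thresholds_ordered(thresholds):
    """True if every threshold is located above the previous one."""
    return all(lower < upper for lower, upper in zip(thresholds, thresholds[1:]))


def collapse(score, merged_groups):
    """Rescore a response after merging adjacent categories.

    merged_groups lists the original scores grouped by new category,
    e.g. [[1], [2], [3, 4], [5]] merges original categories 3 and 4.
    The new categories are numbered from 0.
    """
    for new_score, group in enumerate(merged_groups):
        if score in group:
            return new_score
    raise ValueError(f"score {score} is not covered by merged_groups")
```

For instance, an item with hypothetical thresholds of -1.2, 0.3, 0.1 and 1.5 logits is disordered; after merging two adjacent categories the item has one threshold fewer and is re-estimated before the order is checked again.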
The targeting and coverage of the SRS-22r scale were examined to investigate whether the questionnaire captures the whole spectrum of the subject matter in the sample and to obtain information about the range in which the questionnaire functions best in a distinct patient group. Person and item locations were examined to determine whether the distribution of items matched the patient distribution on the scale. Differences in person-item distribution and in the functioning of the SRS-22r total score and individual domains were examined in subgroups by age, sex, and degree of spinal deformity. Analysis of variance (ANOVA) was used to test the statistical significance of differences between groups, including differences in the mean score between patients with different degrees of spinal structural disorders. The authors hypothesized that no significant differences would be observed when the type I error rate (alpha) was set to 0.05.
The person separation index (PSI) was calculated to investigate the sensitivity of the instrument to discriminate between patients of varying health status [31]. The PSI ranges between 0 and 1, with a higher value indicating better sensitivity. Values exceeding 0.7 are generally considered acceptable. The authors hypothesized a minimum PSI value of 0.80.
DIF was used to test for possible response bias between subgroups in each item. DIF occurs when, for example, men and women within the same sample and at the same level of the measured trait respond differently to an individual item. Uniform DIF (UD) means that the difference in probability remains constant at different levels of measurement. Nonuniform DIF (NUD), in turn, means that the difference between groups varies across levels of measurement. If the response distribution is similar between the subgroups under examination, then no DIF exists between the groups. If the distribution is similar in shape but follows different logit values, UD is confirmed. If the shape of the distribution is different, NUD is confirmed. DIF was analyzed for age and sex. The authors hypothesized that there would be no DIF for age or sex in any of the items. Bonferroni-adjusted ANOVA was used to identify potential item DIF.
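The person separation index described above is commonly computed as the share of the observed variance in person estimates that is not attributable to measurement error. The sketch below assumes that definition and assumes that person estimates and their standard errors are available; the function name is hypothetical, and the software used in the study may apply refinements not shown here.

```python
from statistics import mean, variance


def person_separation_index(estimates, standard_errors):
    """PSI = (observed variance - mean error variance) / observed variance."""
    observed_var = variance(estimates)  # sample variance of person locations
    error_var = mean([se ** 2 for se in standard_errors])
    return (observed_var - error_var) / observed_var


def meets_hypothesis(psi, minimum=0.80):
    """Compare a PSI value with the hypothesized minimum of 0.80."""
    return psi >= minimum
```

A short scale tends to produce large standard errors relative to the spread of person estimates, which lowers the index.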
RESULTS
A total of 637 patients with complete data and a signed informed consent (mean± SD, aged 54.8± 15.3 years; 56.2% female) were included in the final analysis (effective response rate: 64.3%). Overall, 407 patients (64%) had none or mild, 159 moderate (25%) and 71 severe spinal structural disorders (11%). Patient characteristics are presented in Table 2.
1. SRS-22r Total Score
The unidimensionality of the SRS-22r items was not supported, as 20.57% of t-tests were significant at 0.05 probability (Table 3). A residual correlation over 0.2 was found between 38 item pairs (residual correlation matrix; see Supplementary material 2). Creating testlets using residual correlations or clinical and logical associations between relevant items did not produce a unidimensional scale.
The item fit statistics calculated for each SRS-22r item revealed significant chi-square values after Bonferroni adjustment in 12 items (items 3, 7–8, 11, 13–14, 16–18, 20–22). Further, fit residuals falling outside the range of -2.5 to +2.5 were found in 11 items (items 7–8, 10–11, 13–17, 21–22).
When the 22 items of the SRS-22r were investigated as one scale, 16 of the 22 items had ordered thresholds. The remaining 6 items (11, 15, 17, 18, 19, and 22) had disordered thresholds.
Patients scored within the range set for the coverage of the scale. No statistically significant differences were observed for age (p= 0.68) or sex (p= 0.06) in the person and item distribution (Supplementary material 3). However, the person-item threshold distribution differed between patients grouped by degree of structural spinal disorders (p= 0.01), although the difference was not significant after Bonferroni adjustment (Supplementary material 4). The PSI for the 22 items was 0.89 (Table 3), indicating that the scale differentiates well between patients at different levels of the measured trait. Cronbach alpha was 0.89 for the SRS-22r. Five items (5, 6, 9, 10, 19) showed DIF for age and one item (12) DIF for sex (Table 4). As unidimensionality was not achieved for the total score with reasonable adjustments, the subsequent adjustment analyses are reported only for the domains of the SRS-22r.
2. Function/Activity (F/A) Domain
In the function/activity domain, 4.9% of the t-tests were significant (p < 0.05), thereby supporting its unidimensionality (Table 3). Residual correlations over 0.2 were noted in 5 of the 10 item pairs (residual correlation matrix; see Supplementary material 2). The item fit statistics in the function domain indicated good fit of the items to the RMT model (Table 3). The PSI for the domain was 0.77, below the hypothesized value of 0.8 (Table 3). Items 15 and 18 had disordered thresholds. Merging the response categories scored 1 to 3 in item 15 and those scored 3 and 4 in item 18 led to ordered response category thresholds in each of the 5 function domain items (Fig. 1). The person-item threshold distribution showed only minor exceptions in the coverage of the function domain at the lower extremity of the scale (Supplementary material 5A). Subgroup analysis revealed significant differences between the severity classes in the person-item distribution of the function domain, with patients with more severe deformity having lower mean logit values (p < 0.001) (Fig. 2A). Uniform DIF for age and/or sex was observed in all the function domain items except item 18 (Table 4).

Fig. 1. Item response category thresholds (IRCTs) of the function/activity domain after merging response categories 1, 2, and 3 in item 1 and 2 and 3 in item 5. IRCTs of the pain domain after merging response categories 1 and 2 in item 4 and 0–2 and 3–4 in item 17. IRCTs of the self-image domain after merging response categories 1 and 2 in item 4 and 0–2 and 3–4 in item 17. IRCTs of the satisfaction with management domain after merging response categories 3 ‘probably not’ and 4 ‘definitely not’ in item 2. No merging to achieve ordered thresholds was required for the mental health domain.

Fig. 2. Person-item threshold distributions showing the differences between groups by degree of spinal deformity in the distribution of person scores and items of the SRS-22r. None or mild deformity (blue), moderate (red), and marked structural disorder (green). Function/activity (A), pain (B), self-image (C), mental health (D), satisfaction with management (E). SRS-22r, revised Scoliosis Research Society-22; SD, standard deviation.
3. Pain Domain
In the original version of the pain domain, 6.6% of the t-tests were significant, indicating violation of the unidimensionality assumption (Table 3). Rescoring the items did not lead to a unidimensional scale structure, as the percentage of significant t-tests was unchanged. Nine out of 10 item pairs showed residual correlations over 0.2 (residual correlation matrix; see Supplementary material 2). The formation of a testlet by combining items 1 (‘Which one of the following best describes the amount of pain you have experienced during the past 6 months?’), 2 (‘Which one of the following best describes the amount of pain you have experienced over the last month?’), and 17 (‘In the last 3 months have you taken any days off of work, including household work, or school because of back pain?’) according to their residual correlations and contents led to a unidimensional scale, as 1.2% of the t-tests were significant (Table 3). All the pain domain items except item 8 showed acceptable fit residuals, and according to the Bonferroni-corrected chi-square tests, the item-trait interactions showed no significant distortions (Table 3). The PSI for the pain domain was 0.67 (Table 3). After testlet formation, the PSI increased to 0.85 (Table 3). In the pain domain, items 11 and 17 had disordered thresholds. Merging item response categories “nonnarcotics daily or less” in item 11 and categories “0–2 days absence” and “over 3 days absence” in item 17 led to ordered thresholds (Fig. 1). Overall, the patients’ scores indicated that coverage of the pain domain was good (Supplementary material 5B). The patients’ logit values did not differ between the deformity severity subgroups (p = 0.9) (Fig. 2B). Age (p= 0.0018) was associated with the item location distribution. No DIF was observed in any of the pain domain items (Table 4).
4. Self-Image Domain
In the self-image domain, 9.91% of the t-tests were significant, indicating violation of the unidimensionality assumption (Table 3). A residual correlation of over 0.2 was found in 7 out of 10 item pairs. To achieve unidimensionality, items 4, 6, 10, 14, and 19 were pooled to form a testlet based on item content. The testlet reduced the proportion of significant t-tests to 3.1% (Table 3). All the self-image items except item 4 showed good fit to the RMT model (Table 3). Both the fit residual and Bonferroni-corrected chi-square statistic for item 4 indicated poor fit to the RMT model (Table 3). The PSI of the self-image domain was 0.76 (Table 3). To achieve ordered thresholds, response categories “somewhat happy” and “neither happy nor unhappy” in item 4 were merged (Fig. 1). Coverage of the self-image domain was good, with a minor discrepancy, as 4 patients scored beyond the range covered by the scale (Supplementary material 5C). There was a statistically significant difference in person-item distribution by age (p< 0.001), sex (p= 0.01), and degree of deformity (p < 0.001). Uniform DIF was observed across the age groups in items 4, 6, and 19, and nonuniform DIF was observed between sexes in item 10 (Table 4).
5. Mental Health Domain
In the mental health domain, the proportion of significant t-tests was 6.3%, and hence the domain was not unidimensional (Table 3). Residual correlations over 0.2 were found in 6 of the 10 item pairs. No clear testlet solution that would achieve unidimensionality was available. All the items in the domain showed ordered thresholds as well as good fit to the RMT model (Fig. 1, Table 3). The PSI of the domain was 0.90. The domain covered the patients well, as only a few outliers were found at both extremities of the range (Supplementary material 5D). Coverage was equal in terms of degree of spine deformity (p= 0.32), age (p= 0.64), and sex (p= 0.70). DIF was detected in 2 out of 5 items (Table 4).
6. Satisfaction With Management Domain
The satisfaction with management domain met the criterion for unidimensionality, as 1.9% of the t-tests were significant (Table 3). No residual correlation was found between the 2 items. The item fit statistics indicated good fit of the 2 items to the RMT model (Table 3). Item 22 had disordered threshold categories. Merging response categories ‘probably not’ and ‘definitely not’ in item 22 produced ordered thresholds. The PSI value of the satisfaction with management domain was 0.33 (Table 3). The patients’ satisfaction with management scores showed that the domain covered the sample well (Supplementary material 5E). There was no discrepancy in the person-item distribution for age (p= 0.21), sex (p= 0.26), or degree of deformity (p= 0.66). No DIF was observed in either item (Table 4).
An overall summary of the RMT statistics for the SRS-22r domains is presented in Table 5.
DISCUSSION
The performance and structural validity of the SRS-22r questionnaire differed according to whether it was analyzed as the total score or as the individual domains. The SRS-22r total score showed poor structural validity when inserted into the RMT model. It seems that the construct validity of the SRS-22r improves when it is divided into distinct subscales. Nonetheless, the total score and its 5 domains provided sufficient coverage and targeting in all the spinal deformity severity categories.
In the present study on adults with degenerative spine conditions, the unidimensionality of the SRS-22r total score was not supported. The present findings are in line with previous findings of multidimensionality of the SRS questionnaires [17,32]. Jain et al. [18] and Caronni et al. [19] introduced a reduced, unidimensional and linear 7-item version (SRS-7) of the SRS-22 that met the Rasch criteria among adolescents with scoliosis. Jain et al. [33] validated the SRS-7 version on adults, but the fit for RMT was not separately tested. Four of the pooled items in the SRS-7 were from the self-image domain and one each from the pain, function/activity, and mental health domains. The short version of the SRS instrument was found to be good for assessing global changes but to lack the individual aspects of spinal deformity [17,33]. Mannion et al. [17] performed structural factor analysis on different linguistic versions of the SRS-22. They suggested that removing the worst fitting items (3, 14, 15, 17), one from each nonmanagement domain, would improve the multidimensional instrument together with standardization and validation of the items across language versions [17]. In the early revisions, items 17 and 18 were rephrased for the present SRS-22r after further adaptations among adolescents and adults [11,13].
The concept of HRQoL is multidimensional [34], and thus it is plausible that RMT analysis does not support the unidimensionality of the SRS questionnaires. Our findings indicate that the structural validity of the SRS-22r could be enhanced by reevaluating its content and removing the aforementioned potentially misfitting items. Moreover, the fact that the individual SRS-22r domains showed better structural validity leads us to recommend that the individual domain scores rather than the total score be used in clinical work and research. This might provide more accurate patient-reported outcome measure (PROM) data.
To the best of the authors’ knowledge, the performance of no SRS instrument has previously been evaluated in participants with different degrees of ASD severity. The domains of the SRS-22r seem to work well irrespective of the degree of spinal structural disorders. The sample used in the present analysis represents the population visiting an orthopedic spine center due to prolonged degenerative thoracolumbar disorders. In adolescents, the SRS-22 was found to be inferior to the specific Spinal Appearance Questionnaire (SAQ) in detecting patients who required surgery and had greater curve magnitude [35]. In ASD, pain, disability, and sagittal structural changes, rather than the deformity magnitude, cause deterioration in HRQoL and are the main drivers for seeking surgical treatment [36]. Adults also comprise a very heterogeneous group with respect to their spinal disorders and medical conditions compared to adolescents with idiopathic scoliosis. The degree of spinal deformity may affect the completion of the total score so that more respondents have, for example, higher scores on harder items, which affects the person-item distribution. Further studies could focus on performing the RMT analysis for different stages of spinal deformity, a task that was beyond the scope of this study. Also, the previously studied HRQoL instruments failed to account sufficiently for neurogenic injury or impairment [37]. The SRS version 30 total score has been structurally evaluated in relation to radicular symptoms [9], but further studies are required to evaluate the validity of the SRS-22r for measuring neurogenic impairment. Structural validity of the SRS or other deformity-specific HRQoL instruments has not been mathematically analyzed in large patient cohorts or with RMT. The SRS-22r domains are reported separately in several studies, but to the authors’ best knowledge few studies report results in comparison between the individual SRS domains [14,38].
Compared to the SRS-22r total score or the other domains, the function/activity domain best met the RMT model criteria. It was found to be unidimensional with both good item fit and coverage and an acceptable PSI level. The domain achieved ordered thresholds in all items after merging the response categories in items asking about current level of activity and the frequency of going out compared to friends. Potential item response bias, compromising fit to the RMT model, was noted when patients were divided into subgroups by age or sex. The majority of the function/activity domain items showed DIF for at least one tested age or sex group. Patients with a high degree of sagittal deformity also had lower logit values and hence a different person-item distribution in the function/activity domain. This may indicate that physical functioning and the capability to perform daily activities depend on the degree of spinal deformity and that this difference is detected by the SRS-22r function/activity domain [3,4].
The pain domain showed good item fit and sensitivity, and no DIF was found. The domain items ask about pain during the last 6 months, during the past month and during rest, the use of pain medication, and the frequency of absence from work or school (item 17). The last 2 items may also measure other traits that patients cannot clearly differentiate from their spinal condition when filling in the questionnaire. Item 17 showed misfit to the RMT rating scale structure parameters (Andrich thresholds), indicating that the response categories did not match the item’s intended meaning. Altogether 40% of the participants were not available for employment or school, which could explain the confusion over the response categories in this older population. Adapting item 17 to better serve ASD patients who may be students, in employment or retired, can be recommended. Pain was the only SRS-22r domain that showed no response bias between the age or sex groups. The domain functioned well across all degrees of spinal deformity. However, the pain scale differed between the age groups in its coverage and targeting.
Although the self-image/appearance domain did not show unidimensionality, it showed good item fit in 4/5 of the items, sensitivity and coverage. Item 4 (“If you had to spend the rest of your life with your back shape as it is right now, how would you feel about it?”) showed potential misfit to the RMT model. The sensitivity of the domain was acceptable. It was also multidimensional; however, removing or modifying item 4 might improve the fit of the domain to the RMT model. This domain might also improve the value of the SRS instrument in measuring HRQoL in all degrees of ASD, as the other spine questionnaires do not place similar emphasis on the emotional and psychological functions [39].
The mental health domain items were taken with permission from the Rand Corporation’s SF-36 instrument. All the mental health items are good measures of mental well-being problems, as demonstrated by their ordered thresholds, good sensitivity, coverage and fit to the RMT model. However, in this study, the mental health domain was not unidimensional. In another patient cohort with prolonged back pain and associated depression and distress, the SRS-22r has also shown a multidimensional structure [40]. Potential age-related response bias was found for item 20, which asks how often the respondent has been a happy person, and sex-related bias for item 16, which asks whether the respondent has felt downhearted and blue. Such bias may be explained by the multidimensionality of the measured trait and respondents’ interpretation of the positive vs negative tone of the item (happy vs. blue).
Satisfaction with management is rarely covered in the PROMs used for spinal problems. This 2-item unidimensional domain showed good coverage and fit to the RMT model and no DIF. Merging 2 response categories (“probably not” and “definitely not”) in item 22 resulted in ordered thresholds. The domain is simple and short, has good structural and psychometric validity, and can be recommended for clinical use.
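The merging of response categories described above can be sketched as a simple rescoring step. The 0-based scores, the labels of the three remaining response options, the function name, and the example responses are assumptions made for illustration; only the two merged categories are taken from the analysis.

```python
# Hypothetical 0-based scoring of the five response options of item 22.
ORIGINAL_SCORES = {
    "definitely not": 0,
    "probably not": 1,
    "not sure": 2,
    "probably yes": 3,
    "definitely yes": 4,
}


def merge_adjacent(scores, low_label, high_label):
    """Collapse two adjacent categories into one and close the gap above them."""
    low, high = scores[low_label], scores[high_label]
    if high - low != 1:
        raise ValueError("only adjacent categories can be merged")
    # Scores above the lower category shift down by one; the rest are unchanged.
    return {label: (s if s <= low else s - 1) for label, s in scores.items()}


MERGED_SCORES = merge_adjacent(ORIGINAL_SCORES, "definitely not", "probably not")

# Made-up responses, recoded onto the four-category scale.
responses = ["probably not", "definitely yes", "definitely not", "not sure"]
recoded = [MERGED_SCORES[r] for r in responses]
```

After rescoring, the item has four ordered categories instead of five, and the thresholds would then be re-estimated on the recoded data.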
The strength of this study was the consecutive-sample cohort of symptomatic adult patients with a wide range of degrees of spinal deformity. The dropout rate during recruitment was low, and thus the results can be generalized to real-life studies of this patient population. RMT was applied to a sufficiently large sample to provide reliable information on the psychometric and structural properties of the SRS-22r. Furthermore, to the best of our knowledge, the individual SRS-22r domains have not previously been evaluated with RMT among adults. Chi-square statistics can be sensitive to large sample sizes. As our sample size was ample, it could potentially result in significant chi-square statistics, even for a well-fitting measure. The limitations of the current study are that the analysis included mostly preoperative patients and that it was a single-center study conducted in one spine clinic. In the future, the SRS-22r and its domain scores could be structurally investigated and developed among adult patients who have undergone surgery due to spinal deformity.
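The sensitivity of chi-square statistics to sample size, noted above, can be illustrated with a generic Pearson goodness-of-fit statistic. This is a simplified sketch and not the item-trait interaction statistic used in the analysis; the proportions and the function name are hypothetical, and only the sample size of 637 is taken from the study.

```python
def pearson_chi_square(observed_props, expected_props, n):
    """Pearson chi-square statistic for n respondents spread over categories."""
    return sum(
        (n * o - n * e) ** 2 / (n * e)
        for o, e in zip(observed_props, expected_props)
    )


# Chi-square critical value for 4 degrees of freedom at alpha = 0.05.
CRITICAL_DF4_ALPHA05 = 9.488

expected = [0.20, 0.20, 0.20, 0.20, 0.20]  # hypothetical model-expected proportions
observed = [0.25, 0.18, 0.20, 0.22, 0.15]  # hypothetical observed proportions

# The same relative discrepancy, evaluated at two sample sizes.
chi2_small = pearson_chi_square(observed, expected, n=100)
chi2_large = pearson_chi_square(observed, expected, n=637)
```

With the discrepancy held constant, the statistic grows in proportion to the sample size, so a deviation that is not significant with 100 respondents exceeds the critical value with 637.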
CONCLUSION
The results of the present RMT analysis show that, among ASD patients, the individual domain scores of the SRS-22r perform better than the total score. Refining items and domains may improve the validity of the instrument for use with adult patients with spinal deformities, even when multidimensionality between domains persists. The questionnaire largely performed equally across age groups and sexes. The SRS-22r domains were able to differentiate between degrees of spinal deformity.
SUPPLEMENTARY MATERIALS
Supplementary materials 1-5 can be found via https://doi.org/10.14245/ns.2143354.677.
Item-trait interaction formulae in Rasch measurement theory.
Residual correlation matrix of the SRS-22r domains. SRS-22r, revised Scoliosis Research Society-22.
Person-Item threshold distribution of the SRS-22r questionnaire total score. SRS-22r, revised Scoliosis Research Society-22.
Person-Item threshold distribution of the SRS-22r total score according to different stages of deformity. The scale provided coverage for patients located between -4.5 and 4 logits. All patients were inside the range where the scale provided coverage. Mild or no deformity (blue), moderate (red), and marked deformity (green). SRS-22r, revised Scoliosis Research Society-22.
Person-Item threshold distribution of the SRS-22r domains, with grouping set to an interval length of 0.20, making 55 groups. Function/activity (A), pain (B), self-image (C), mental health (D), satisfaction with management (E). SRS-22r, revised Scoliosis Research Society-22.
Notes
Conflict of Interest
The authors have nothing to disclose.
Funding/Support
This study received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author Contribution
Conceptualization: KK, JY, AH, JR; Data curation: KK, MU, JR; Formal analysis: MU, JR; Methodology: KK, JY, AH, JR; Project administration: KK, JY, AH; Visualization: KK; Writing - original draft: KK, SH, JR; Writing - review & editing: KK, SH, MU, JY, AH, JR.