What Does Camp Do for Kids? Chapter 4
The research problem lies in assessing the type and degree of influence that the organized camping experience has on the self constructs of youth. To determine this influence, a random effects model of meta-analysis was employed. This meta-analytical technique allows research results to be combined and compared, providing information that can be generalized to the population. This chapter addresses the treatment of data, the reliability of that treatment, the data analysis, and the results. The guiding work for the procedures described in this chapter was The Handbook of Research Synthesis (Cooper & Hedges, 1994).
Treatment of the Data
The Sample
A total sample of sixty-one (61) studies was evaluated for their ability to supply relevant data with regard to the research question. Of this sample, twenty-two (22) studies measured a construct of self and provided sufficient data from which to calculate an effect size. Sufficient data is represented by a minimum of mean and standard deviation values of the pre- and post-treatment measures and an N value. These 22 studies provided sufficient data to identify a sample of thirty-seven (37) effect sizes. These effect sizes represent 37 independent measures, taken at thirty-six (36) different camps. The 37 cases were then subjected to the methodology described in Chapter 3. Appendix B contains a list of the entire sample evaluated for inclusion in the meta-analysis and the reason for any study's exclusion from the analysis.
Coding Procedures
The coding protocol was developed by the researcher, using the research questions as a foundation for the data extracted. The coding sheet was then reviewed by a panel of experts. The coding sheet was amended and the data recoded. A panel of coders was then employed to verify the coding process. Suggestions from this experience were used to amend the coding key a second time. The data was recoded. The final data coding sheet and key were then reviewed and confirmed by the panel of experts, and the data recoded for the final time (Cooper & Hedges, 1994; Electric..., 1987; Sacks et al., 1994). The primary focus of the coding was on the quality of the research method employed and the accurate extraction of data used to identify the potential moderators of the effect. A weighting factor was assigned to each study based on the coders' estimation of quality derived from the coding process (Cooper & Hedges, 1994; Rosenthal, 1984; Wolfe, 1986).
Reliability of the Coding Procedure
The Effective Reliability in Table 4.1 expresses a measure of the level of repeatability of the quality rating aspect of the coding process that could be expected if the effort were to be duplicated. In this case (r = .9982) the coding key provides sufficient information for a high level of repeatability of the coding used to estimate the quality rating of each study.
The quality rating was used to compute a weighting factor and combined with the inverse of each study's variance for the evaluation of the quality weighted effect sizes (Cooper & Hedges, 1994). The use of weightings for calculating effect size is discussed later in this chapter in the section on Results. Reliability of the coding of the data extracted from the sample can be established through the initial coding and the subsequent recodings during the development and verification of the coding key.
Table 4.1  Effective Reliability of the Mean of Judges' Ratings for Quality
Relationship of Coders                                    r
Primary Coder / Verification Coder J                      0.9960
Primary Coder / Verification Coder M                      0.9919
Verification Coder J / Verification Coder M               0.9960
Mean r                                                    0.9946
Effective reliability of the mean of judges' ratings      0.9982
Rosenthal (1984). N = 35
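The effective reliability in Table 4.1 can be reproduced from the three pairwise correlations. The sketch below assumes the Spearman-Brown form that Rosenthal (1984) gives for the reliability of the mean of n judges, R = n * mean r / (1 + (n - 1) * mean r); the function name is illustrative only.

```python
from statistics import mean

def effective_reliability(pairwise_r, n_judges):
    """Spearman-Brown reliability of the mean rating of n_judges judges."""
    mean_r = mean(pairwise_r)
    return n_judges * mean_r / (1 + (n_judges - 1) * mean_r)

# Pairwise correlations among the three coders, taken from Table 4.1.
pairwise = [0.9960, 0.9919, 0.9960]
mean_r = mean(pairwise)
reliability = effective_reliability(pairwise, n_judges=3)

print(round(mean_r, 4), round(reliability, 4))  # 0.9946 0.9982
```

With three coders and a mean pairwise r of .9946, the formula returns the .9982 reported in the table.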
Data Analysis
The Sample of Studies
Studies included in the meta-analysis were selected based on the following criteria:
 The study was experimental, pre-experimental, or quasi-experimental in design.
 Age was initially a criterion for including studies in the meta-analysis. This criterion was eliminated as a parameter for the decision to include or not include a study in the final analysis. The decision to eliminate the age criterion approximately doubled the number of effect sizes included in the study. The final range of ages included in the analysis is from six to twenty years old. Thus, use of this approach enhanced generalizability by expanding the population of studies to reflect the range of ages of subjects that attend camps. Furthermore, there is some difficulty in associating developmental stage with chronological age. The elimination of the age criterion was thought to better represent the spectrum of youth that experience the organized camping environment.
 The study measured a construct of self, as defined in Chapter 2, primarily either self-concept or self-esteem.
 Individual studies reflected an acceptable level of validity, which was evaluated based on criteria outlined in the Coding Key (Appendix G). Statistical methods employed depended on information provided in the study (Cooper & Hedges, 1994; Hunt, 1997; Rosenthal, 1984; Wolfe, 1986). No studies were eliminated from the sample for failure to meet this criterion.
 The study provided adequate statistical information or data to be useful in the greater meta-analysis: either a reported effect size or the data to calculate an effect size.
 Studies which exhibited a fundamental flaw, defined as not meeting criteria one through five above, were not included. Notation about a study's exclusion was recorded (Cooper & Hedges, 1994; Hedges & Olkin, 1985; Hunt, 1997; Hunter, Schmidt & Jackson, 1982; Light & Pillemer, 1984; Wachter & Straf, 1990) and can be found in Appendix B.
 Unpublished studies and journals that are not refereed were screened for appropriate compliance with human subjects procedural protocols that were in effect at the time the study was conducted. A study's mention of parental permission and a subject's freedom to discontinue participation at any time were taken as evidence of compliance. Given the refereeing process for journals and dissertations, these studies were assumed to be in compliance.
The Random Effects Model
The random effects model of meta-analysis presumes that the sample of studies analyzed is drawn from a greater population of studies that differs from the studies in the sample in two ways. First, the population and the sample of studies differ in characteristics and effect size parameters. In the fixed effects model, by contrast, studies are in groups with similar characteristics and effect size parameters, and the greater population is composed of these groups. The second difference between population and sample in the random effects model is that the characteristics of the subjects in the sample studies differ from the greater population as a result of the randomization process. The random effects model mathematically reduces to a fixed effects model if the variance across the effect sizes is homogeneous. An identified random effect that is significant is, by definition, generalizable to the population (Cooper & Hedges, 1994).
The precedent set by Andrews, Guitar, & Howie (1980) for including pre-experimental studies recognizes that randomization is accomplished through the spectrum of subjects that results from combining numerous studies. The spectrum of subjects in the various studies included in this meta-analysis supports this notion of randomization. An overview of the studies in this research supports the arguments for randomization and generalizability. The thirty-seven effect size cases combined in this research were generated from samples with a cumulative total of 1139 subjects. The profiles available covered a broad spectrum of cultural, socioeconomic, gender, and ability variables across the sample analyzed in this research.
The data was analyzed according to the procedures described in the Chapter 3 section titled Statistical Analysis and Interpretation. In the majority of cases the means and standard deviations of the pre- and post-treatment, or control and experimental, groups were used to generate a Hedges' g for each study. Appendix J contains a matrix of data that was used to calculate the mean effect sizes for the sample. The combination of effect sizes was analyzed for homogeneity of variance using the Q statistic. The Q was then compared to the critical χ² value (p < .05, df = k - 1 studies); the hypothesis of homogeneity was rejected when the Q value exceeded the χ² value (Cooper & Hedges, 1994).
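As an illustration of these two steps, the following sketch computes a small-sample-corrected standardized mean difference and the Q homogeneity statistic. It is a minimal sketch and not the procedure from Chapter 3: it assumes two independent groups (a pre/post design would need a different variance formula), it applies the common correction factor 1 - 3/(4df - 1), and the three studies are invented for the example.

```python
import math

def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Corrected standardized mean difference (b minus a) and its variance.

    Assumes two independent groups; the variance formula is the usual
    large-sample approximation for that design.
    """
    df = n_a + n_b - 2
    pooled_sd = math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df)
    d = (mean_b - mean_a) / pooled_sd
    g = (1 - 3 / (4 * df - 1)) * d  # small-sample correction
    variance = (n_a + n_b) / (n_a * n_b) + g ** 2 / (2 * (n_a + n_b))
    return g, variance

def q_statistic(effects, variances):
    """Q homogeneity statistic and the inverse-variance weighted mean."""
    weights = [1 / v for v in variances]
    mean_effect = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - mean_effect) ** 2 for w, e in zip(weights, effects))
    return q, mean_effect

# Hypothetical studies: (mean, SD, n) for comparison, then for treatment.
studies = [
    (50.0, 10.0, 30, 53.0, 10.0, 30),
    (48.0, 9.0, 25, 55.0, 11.0, 25),
    (51.0, 12.0, 40, 51.5, 12.0, 40),
]
effects, variances = zip(*(hedges_g(*s) for s in studies))
q, weighted_mean = q_statistic(effects, variances)

CHI2_CRITICAL_DF2 = 5.991  # chi-square critical value, p < .05, df = k - 1 = 2
print(f"Q = {q:.3f}, reject homogeneity: {q > CHI2_CRITICAL_DF2}")
```

For the 37 cases in this chapter, the same comparison would use df = 36 and the critical value of 50.9985 shown in Table 4.2.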
Rejection of the hypothesis of homogeneity indicates a variance component of the mean effect size that significantly exceeds zero: the variance is due to more than chance. The unconditional variance component was then recalculated according to Cooper & Hedges (1994), and the conditional variance component isolated. Based on the conditional variance, a 95% confidence interval was used to establish the significance of the random variance and the generalizability of that variance to the population. An interval that does not include zero indicates statistical significance and generalizability of the random variance (Cooper & Hedges, 1994).
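The logic of this step can be sketched in code. The example below assumes the method-of-moments estimate of the between-study variance component, one common approach to the random effects model; it may differ in detail from the calculations actually used here, and the input values are invented.

```python
import math

def random_effects_summary(effects, variances, z_crit=1.96):
    """Random-effects mean and confidence interval (method-of-moments sketch)."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed_mean = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed_mean) ** 2 for wi, e in zip(w, effects))
    # Between-study variance component, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Each study's total variance is its sampling variance plus tau2.
    w_star = [1 / (v + tau2) for v in variances]
    mean = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return {
        "q": q,
        "tau2": tau2,
        "mean": mean,
        "ci": (mean - z_crit * se, mean + z_crit * se),
    }

# Hypothetical effect sizes and sampling variances.
summary = random_effects_summary([0.10, 0.45, 0.30, -0.05], [0.02, 0.03, 0.01, 0.04])
low, high = summary["ci"]
print(f"mean = {summary['mean']:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
print("interval excludes zero:", low > 0 or high < 0)
```

When the effect sizes are homogeneous, tau2 is truncated to zero and the result coincides with the fixed effects estimate, which matches the reduction described in the section on the Random Effects Model.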
In order to identify moderator variables that are correlated with the random effect, explaining some of the associated random variance, a triangulation approach to analysis was used. The first angle of analysis used in the identification of moderator variables was the use of datapoint line plots to identify relationships (Electric..., 1994; Cooper & Hedges, 1994; Rosenthal, 1984). Moderator variables were identified through visible relationships to the distribution of effect sizes from the 37 cases taken from the sample of studies. Potential moderators were then included in a regression analysis. In order for this to be meaningful, the coded data was arranged in some order so that identified variances could be explained and graphic representations could be visually interpreted. The ordering of the data for each of the moderators is presented in the Variable Key in Appendix K.
A stepwise linear regression was performed on the Hedges' g, Pearson r, and Fisher Zr effect sizes as the second angle of analysis (Cooper & Hedges, 1994). In the regression, moderators that contributed to the random variance were isolated for analysis in order to add to the explanation of the random variance (Appendix L). SPSS 8.0 for Windows was used to calculate the stepwise regression.
The final angle of analysis used to identify moderators was the comparison of various combinations of the effect sizes from the 37 cases (Cooper & Hedges, 1994; Hunt, 1997; Wachter & Straf, 1990). These combinations were based on the moderator variables identified in the plot and regression analyses described above. Effect sizes were combined based on an individual study's relationship to these moderator variables and then compared to other combinations in order to identify the largest effect.
As an example of effect size sensitivity analysis, all the studies from day camps could be combined to generate a random effect size for day camps. A similar combined random effect could be generated for resident camps. Comparing the two would give an idea about the influence of the day versus resident camp environment on the magnitude of the associated random effect. There was an insufficient amount of data on day camps in this study to actually make this comparison.
Results
Interpretation of the effect size requires some caution, as different metrics have different scales. Hedges' g was used to calculate the initial effect size for most of the cases. Because of the limitations of the data included in some studies, calculating Pearson's r was necessary to establish an initial effect in these cases. Hedges' g, a dichotomous measure, is most appropriate for this data because it is based on the difference between pre and post means and an associated estimate of standard deviation (Cooper & Hedges, 1994). A problem exists with dichotomous, d-index, estimators: they tend to overestimate the population effect for small sample sizes (Hunt, 1997). Each initial g or r was transformed into the other measure so that an r and a g were available for each case.
The Pearson r is applicable to correlational data as well as dichotomous data. There are different equations for calculating r for each of the d and r indices. Also, Pearson's r is readily interpretable (Cooper & Hedges, 1994; Hunt, 1997). As a comparative measure, Fisher's Z transformation of r was also calculated. The Fisher Zr is a z score transformation that corrects r for the skew that arises in the distribution of r as the population value of r gets farther from zero (Cooper & Hedges, 1994). For smaller sample sizes and an aggregated r value that is less than .25, calculating Zr provides no additional meaningful information (Hunt, 1997).
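The conversions among the three metrics can be sketched as follows. The g-to-r formula shown, r = g / sqrt(g² + 4), is the common approximation for two groups of equal size and is an assumption here; the study-specific equations actually used may differ. Fisher's Zr is the inverse hyperbolic tangent of r.

```python
import math

def r_from_g(g):
    """Approximate r from a standardized mean difference (equal group sizes)."""
    return g / math.sqrt(g ** 2 + 4)

def g_from_r(r):
    """Inverse of r_from_g under the same equal-group-size assumption."""
    return 2 * r / math.sqrt(1 - r ** 2)

def fisher_zr(r):
    """Fisher's Z transformation, 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

# Hypothetical effect size, used only to show the change of scale.
g = 0.50
r = r_from_g(g)
print(f"g = {g:.3f}, r = {r:.3f}, Zr = {fisher_zr(r):.3f}")
```

Because r is bounded by -1 and 1 while g is not, the same effect appears smaller on the r scale, which is why the metrics should not be compared directly.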
Appendix M presents a graph of the g, r, and Zr effect sizes, arranged from highest to lowest value. This graph provides a good visual interpretation of the relationship between the three effect size calculations. The Pearson r can be seen to provide the tightest distribution around the zero axis, a function of the limits of the r scale, -1 to 1 (Hunt, 1997). Because of the Pearson r distribution's applicability to d-index data, the ease of interpretation of r, the lack of need for the Fisher's Zr transformation, and some small sample sizes among the studies that would overstate the dichotomous Hedges' g values, the Pearson r will be used for the interpretation of the random effect. The distribution of r's for each case is also presented in Appendix M.
All three effect size estimators (r, g, and Zr) were used as dependent variables in three separate regression analyses; this was done in order to identify any potential differences in the estimators and to enhance identification of moderators. The final weighting of effect sizes was based on the inverse of the effect's variance (Cooper & Hedges, 1994). Recall from earlier discussion that the inverse of the variance provides more weight to those effect sizes that are the result of studies done with greater precision. Use of the quality weighting factor generated during the coding process did not produce significantly different results, and in light of Cooper & Hedges' (1994) preference for the inverse of the variance, quality weighting was dropped from the analysis. Dropping quality weighting also spares the reader the confusion of interpreting an additional set of results.
Only sensitivity analyses that identified a moderator variable as contributing to the variance of the effect are discussed in the data interpretation. Effect sizes could not be compared based on gender, socioeconomic status, or cultural background variables because this information was not available in enough detail across the cases to provide a meaningful interpretation of the influences of these potential moderators. The variables of the timing of the measure, length of the camp session, sample size, research design, instrument used to measure the treatment, alpha level or Type I error probability, and study quality weighting were found not to be moderating variables in this analysis.
Interpretation of the Random Effect
Table 4.2 presents the mean effect size estimates, Q statistics, and confidence intervals for the sample of 37 effect sizes. These results indicate a significant positive random effect that is generalizable to the population (Cooper & Hedges, 1994). The magnitude of the positive effect (r = .1023) can be interpreted as small (Cohen, 1969; Cooper & Hedges, 1994).
Table 4.2  Mean Random Effect Comparison for g, r, and Zr.
                         Hedges' g         Pearson r         Fisher Zr
Mean Random Effect       .2517             .1023             .4308
95% Confidence           .1168  .3866      .0457  .1606      .3748  .4858
Q statistic              165.9668          127.2930          89.0058
χ² (p < .05, df = 36)    50.9985           50.9985           50.9985
N = 37
An intuitive and easily understood tool for interpreting the effect size is the Binomial Effect Size Display (BESD), a common way of interpreting an effect size (Cooper & Hedges, 1994; Rosenthal, 1984). The BESD answers the question: What is the effect on, or change in, the success rate as a result of the treatment? Because the difference in success rates is identical to r, interpretation is simplified (Rosenthal, 1984). In this case r = .1023; thus 10.23% more of the population experiencing camp achieve significant positive increases in a construct of self than the rest of the population. According to the prescribed calculation for the BESD, .5 ± r/2 (Cooper & Hedges, 1994), this effect can be restated as a change in the number of people affected from 45% to 55%. This restatement provides an easily understood interpretation: while the r represents a small effect, an effect on 10% of the population is significant.
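The BESD arithmetic for the reported effect is short enough to check directly. The sketch below applies .5 ± r/2 to r = .1023; the function name is illustrative.

```python
def besd(r):
    """Binomial effect size display: success rates of .5 - r/2 and .5 + r/2."""
    return 0.5 - r / 2, 0.5 + r / 2

low, high = besd(0.1023)
print(f"{low:.1%} to {high:.1%}")  # 44.9% to 55.1%
```

Rounded to whole percentages, these are the 45% and 55% success rates given above, and their difference is r itself.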
Because a meta-analysis relies on locating research results, an estimate should be calculated of how many nonsignificant results would be required to reverse the findings of a study (Cooper & Hedges, 1994; Hunt, 1997). This file-drawer effect, as it is known, is calculated using a formula that sums the effect size from each case divided by its associated standard error. The sum is then divided by 1.96 and squared, and the number of studies in the meta-analysis, 37 in this case, is subtracted from the result. The value 1.96, a constant, represents the z score for a two-tailed significance test at the 95 percent confidence level (Cooper & Hedges, 1994). The result is the number of studies required to reverse the findings of the study. In the case of this meta-analysis the file-drawer effect is 296 studies. Thus, it would take 296 studies with an average result of nonsignificance to negate the findings of this meta-analysis.
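The file-drawer calculation, as described in the paragraph above, can be written out as a short function. This sketch follows that verbal description only; the effect sizes and standard errors below are invented, so the output does not reproduce the 296 studies reported for this meta-analysis.

```python
def file_drawer_n(effects, standard_errors, z_crit=1.96):
    """Number of null-result studies needed to reverse the finding.

    Sums each effect size divided by its standard error, divides the sum
    by z_crit, squares it, and subtracts the number of studies.
    """
    k = len(effects)
    z_sum = sum(e / se for e, se in zip(effects, standard_errors))
    return (z_sum / z_crit) ** 2 - k

# Hypothetical effect sizes and standard errors for five cases.
effects = [0.30, 0.12, 0.25, 0.05, 0.40]
standard_errors = [0.10, 0.08, 0.12, 0.09, 0.15]
print(round(file_drawer_n(effects, standard_errors)))
```

The z_crit parameter is exposed so that a different significance criterion can be substituted without changing the rest of the calculation.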
Identifying Moderator Variables
The stepwise regression analysis of the g, r, and Zr effect sizes indicates that the philosophy of the camping program and, to a lesser extent, age were explanatory, or moderator, variables for the variance of the random effect. Table 4.3 presents the results of the regression. The moderators of camp philosophy and age account for 33% of the variance (R² = .330, for r = .1023). The graphs in Appendix N provide a corresponding visual interpretation of the age and camp philosophy moderators and their relationship to the random effect, r.
Table 4.3  Results of the Stepwise Regression Analysis.
                            Predictors
Dependent Variables         Camp Philosophy     Age
Hedges' g                   2.667
  Significance level        .012                .035
Pearson r                   3.054
  Significance level        .004                .028
Fisher Zr                   2.919
  Significance level        .006                .031
Identifying the camp philosophy and the age of the subjects as the meaningful moderator variables permitted an effect size sensitivity analysis, as described in the above section on the Random Effects Model. Sensitivity to the camp philosophy was explored by recognizing a positive correlation with r and sequentially eliminating the lowest ranked category for the camp philosophy. For the purpose of this study, coded camp philosophies were divided into three categories termed campgoal 2, campgoal 3, and campgoal 4:
 Campgoal 2) A structured environment focused on competence or knowledge development. Enhancement of a self construct is not part of the camp philosophy.
 Campgoal 3) An experience designed for the development of personal leisure skills or an environment in which to have fun. Enhancement of a self construct is not a stated part of the philosophy. The focus of the camp is also not based on developing a competence. This category is operationally termed general camp.
 Campgoal 4) The development of some construct of self was a focus of the program and camp philosophy. Effort, recognized in the primary study, was made to create an environment that was conducive to enhancing a self construct.
Table 4.4  Effect Size Sensitivity Analysis for the Campgoal Moderator Variable.
Campgoal Categories                                      Random effect  r
2  competence, 3  general camp, 4  self focus            .1023
3  general camp, 4  self focus                           .1585
4  self focus                                            .2006
Table 4.4 presents the change in the random effect for the sensitivity analysis of the camp philosophy moderator variable. Eliminating the campgoal 2 camps, those focused on competence development, produced a significant increase in the r value. Similarly, when the camps with a focus on self enhancement were taken alone, the effect increased again. The implication is that programs designed with the intent of enhancing a self construct have a greater effect on the evaluation of self. Appendix O provides a graph of the relationship between r and age, sorted by the campgoal categories.
Evaluation of the age variable in the sensitivity analysis showed that eliminating the 13 to 18 year old category created the largest random effect obtained by eliminating any one age group (Table 4.5). This table highlights the need to take the data in the context of other moderators. Removing age category 2 would increase the overall effect to r = .1208, but eight to twelve year olds would still be included in the identified random effect. The information in this table should be viewed in the context of the plot in Appendix O, as the outcomes from the sensitivity analysis are likely related to the camp philosophy. This is particularly true because the breakdown of the age categories does not provide for easy interpretation. Examination of the graph in Appendix P also shows that the youngest age group had the majority of the highest scores. The negative regression correlation between r and age (t = -2.292, p < .031) also indicates that there is a greater influence on the self at younger ages.
The results indicate that the camp philosophy has an influence on the magnitude of the random effect. This correlation suggests that a camp philosophy that is oriented toward enhancing a camper's self constructs has a more significant effect. To a lesser extent, age is negatively correlated to the random effect, indicating that the effect is greater for younger campers than for older campers.
Table 4.5  Effect Size Sensitivity Analysis for the Age Moderator Variable.
Age Category Removed      Resulting Random Effect  r
full study                .1023
1  age 6 - 10             .0655
2  age 8 - 12             .1208
3  age 10 - 15            .0802
4  age 13 - 18            .1257
5  age 7 - 15             .1129
6  age 8 - 20             .1107
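The structure of a sensitivity analysis like the one in Table 4.5 can be sketched as a loop that removes one category at a time and recomputes the combined effect. The example assumes a method-of-moments random effects mean and uses invented cases; it illustrates the procedure and does not reproduce the values in the table.

```python
def random_effect_mean(effects, variances):
    """Random-effects mean using a method-of-moments variance component."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # c is zero when only one case remains; no variance component then.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    return sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)

def leave_one_category_out(cases):
    """cases: list of (category, effect, variance) tuples."""
    results = {
        "full study": random_effect_mean(
            [e for _, e, _ in cases], [v for _, _, v in cases]
        )
    }
    for category in sorted({c for c, _, _ in cases}):
        kept = [(e, v) for c, e, v in cases if c != category]
        results[category] = random_effect_mean(
            [e for e, _ in kept], [v for _, v in kept]
        )
    return results

# Hypothetical cases: (age category, r, sampling variance).
cases = [
    (1, 0.25, 0.02), (1, 0.18, 0.03),
    (2, 0.05, 0.02), (2, 0.10, 0.04),
    (4, -0.02, 0.03), (4, 0.04, 0.02),
]
for removed, effect in leave_one_category_out(cases).items():
    print(removed, round(effect, 4))
```

The sequential elimination used for the campgoal categories in Table 4.4 follows the same pattern, with categories dropped cumulatively rather than one at a time.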
Summary
A sufficient sample of studies was gathered to produce effect sizes and perform a meta-analysis. Reliability of the coding process was established through expert and verification coder review. An effect size was generated and evaluated as being a significant positive random effect. By definition this positive effect is generalizable to the population. The Pearson r was identified as the most useful effect size estimator because of its generality and ease of interpretation. The data was then analyzed using datapoint line plots, regression analysis, and an effect size sensitivity analysis in order to identify moderator variables. The explanatory variables were identified as a camp philosophy related to enhancing self and the age of the camper.