Evaluating Arguments, Part 8: Variables, Sample Sizes, and Experimentation

Statistical and empirical arguments are crucial to the formation of justified beliefs. As a result, knowing how to evaluate such arguments is of paramount importance. So today I will explore yet another way to evaluate an argument.


(1) The argument infers a faulty generalization from a small sample, a single experiment, or a lone observation, or it fails to account for confounding variables or complexities within an issue.

This is highly relevant to psychological research, which in turn has a bearing on a number of philosophical questions that such research poses.

To think critically about such experimentation and research, use the following procedure.

Was the study conducted in the laboratory or in a natural setting? Was the sample representative across time, culture, socioeconomic status, mood, age, sex, etc? Were all possible confounding factors accounted for?

In order to critically analyze psychological and statistical findings:

  • Identify the aim

Identify in the procedure:

  • Participants: sampling method, sample size, and variables in the experiment
  • Type: Questionnaire? Survey? Observational? Experimental?
  • Method: Was the experiment a repeated measures experiment (i.e. the same sample group with different conditions applied)? Was it a longitudinal study (conducted over a long period of time)? Was it an independent measures experiment (i.e. one group tested under one condition and another group under a different condition)? Was the study cross-cultural?

Identify findings:

  • Actual numerical results (exact numbers, trends, or correlations)

Identify conclusions:

  • Generalization(s) drawn from data
  • Evaluate for: internal validity; the benefits and drawbacks of the sample; controls; demand characteristics; and the social desirability effect (subjects know they are in a study, so they want to appear better or more desirable than they in fact are)
  • Demand characteristics are an experimental artifact (a form of bias): participants form an interpretation of the experiment’s purpose and consciously or subconsciously change their behavior to fit it, responding the way they believe the researcher desires, or altering their behavior once they realize which factor is being observed
  • Extraneous/confounding variables (other things that may have influenced results)
  • External validity: how results can be generalized outside the context of the particular experiment
  • Ecological validity (whether behavior observed in the study reflects behavior in everyday settings) is lower in a lab than in the real world
  • Locational validity (based on the particular sample and population being observed)
    • A study is more locationally valid if it tests across a variety of backgrounds and cultures
  • Temporal validity: cultures, individuals, and trends (psychological or otherwise) change over time
  • Cultural dimension: different perspectives and values within cultures can introduce confounding factors into the experiment

What is the nature of the sample? Is it an opportunity (convenience) sample? This is a sample of whoever happens to be around and agrees to participate, for example, when you set up your station in a supermarket. But we must ask: What type of people shop at the supermarket? What type of people say yes to participating? Is there a gender imbalance in who does the shopping in this community? Were students used for the study? Students have several characteristics that may limit the findings’ generalizability to the broader population: they (i) have a strong need or desire for social approval, (ii) were pre-selected for competence in cognitive skills by their admission to the university, and (iii) tend to be more egocentric and self-focused than adults.

Is the sample self-selected? If so, why suppose that those who volunteer properly represent the population at large?

It is also important to consider participant variability, that is, the extent to which the participants may share a common set of traits that can bias the outcome of the study. If a survey concerns anxiety about mathematics and relies on volunteers, then the people most likely to volunteer may be those who feel very strongly about math or have a lot of anxiety about it, thus skewing the results.

Generally, the best sampling method is random sampling. A random sample is one in which every member of the target population has an equal chance of being selected. A stratified sample is yet another sampling tool: it draws random samples from each subpopulation in a target population. For example, if 20% of the student body is Indian and we have 30 participants, we may want to randomly select 6 of them from among the Indian students. In this way, the sample more accurately reflects the actual distribution of the given population.
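
To make the two methods concrete, here is a minimal Python sketch. The population figures mirror the example above (20% of a student body, 30 participants); the group labels and sizes are otherwise illustrative.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical student body of 200: 20% Indian, 80% other.
population = [{"id": i, "group": "Indian" if i < 40 else "Other"}
              for i in range(200)]

# Simple random sample: every student has an equal chance of selection,
# so the subgroup proportions can drift by chance.
simple_sample = random.sample(population, 30)

# Stratified sample: draw randomly *within* each subgroup, in proportion
# to its share of the population (20% of 30 = 6).
indian = [p for p in population if p["group"] == "Indian"]
other  = [p for p in population if p["group"] == "Other"]
stratified_sample = random.sample(indian, 6) + random.sample(other, 24)

print(sum(p["group"] == "Indian" for p in simple_sample))      # may drift from 6
print(sum(p["group"] == "Indian" for p in stratified_sample))  # exactly 6 by design
```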

All of this falls under the umbrella term selection bias.

Check for the aim, procedure, and findings, along with the validity and reliability.

Ecological Validity: The extent to which the results predict behavior outside the laboratory. Does the study represent what happens in real life? If the experiment was conducted in a lab and participants did things they would not normally do in real life, the study likely lacks ecological validity. It could also be the case that the situation in which the experiment took place was so well-controlled that normal influences on behavior were eliminated, thereby leading, again, to a lack of ecological validity.

Cross-cultural Validity: Is the research relevant to other cultures, or is it ethnocentric (based on the values, practices, and beliefs of a single culture)?

Is this study replicable, and is the hypothesis falsifiable?

Is the study based on a representative group of people (sample)?

  • Is there a bias in the sample? Is one group overrepresented? (e.g. gender, ethnicity, culture, socioeconomic status, etc.)

Was the study conducted in a laboratory or a natural setting?

  • Laboratory settings are artificial.

Were the participants asked to do things that are far from real life?

Are the findings of the study supported/questioned by the findings of other studies?

Do the findings have practical relevance? To what extent? What are the limitations?

Variables in an experiment need to be operationalized; that is, they need to be written in such a way that it is clear exactly what is being measured. Be specific! Don’t just say “noise”; say “loud music played at volume level 35”. Don’t just say “recall”; say “number of words remembered from a list of 20 words after 15 minutes”.
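
As a rough sketch of the discipline this imposes, one could record each variable alongside its exact measurement procedure; the class and field names here are merely illustrative, and the wording mirrors the examples above.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    name: str                # the vague, everyday label
    operationalization: str  # exactly what is measured, and how

independent = Variable(
    name="noise",
    operationalization="loud music played at volume level 35",
)
dependent = Variable(
    name="recall",
    operationalization="number of words remembered from a list of 20 words "
                       "after 15 minutes",
)
```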

Confounding variables are undesirable variables that influence the relationship between the independent and dependent variables.

  • Demand characteristics occur when participants act differently simply because they know they are in an experiment.
  • Researcher bias occurs when the experimenter (selectively) sees what he or she is looking for. The expectations of the researcher may consciously or subconsciously affect the findings of the study. A double-blind design helps combat this: not only do the participants not know whether they are in the treatment or control group, but the person administering the treatments and placebos knows neither the aim of the study nor which group is which (see the sketch after this list).
  • Participant variability occurs when characteristics of the sample affect the dependent variable.
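
Here is a minimal sketch of one way a double-blind assignment can be set up, assuming a neutral third party holds the code key; all participant IDs and labels are hypothetical.

```python
import random

random.seed(0)

participants = [f"P{i:02d}" for i in range(1, 21)]  # hypothetical participant IDs

# A neutral third party (not the experimenter) shuffles participants and
# assigns them to coded groups "A" and "B".
random.shuffle(participants)
assignment = {pid: ("A" if i % 2 == 0 else "B")
              for i, pid in enumerate(participants)}

# Only the third party holds the key mapping codes to conditions, and it
# stays sealed until the analysis is complete.
sealed_key = {"A": "treatment", "B": "placebo"}

# The experimenter administering doses sees only the codes, so neither
# they nor the participants know who is receiving the treatment.
for pid, code in sorted(assignment.items()):
    print(pid, code)
```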

Also consider artificiality: when the situation created in a study is so unlikely to occur in real life, one should remain skeptical about whether the findings have any validity.

Bidirectional ambiguity is present when two variables are correlated but it is ambiguous, indeterminable, or unknown whether the first causes the second or the second causes the first. In a correlational study, since no independent variable is manipulated, it is impossible to know whether x causes y, whether y causes x, whether the two interact to cause the behavior, or whether the relationship is merely coincidental.
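
A tiny numerical illustration: the Pearson correlation of x with y equals that of y with x, so the statistic itself carries no information about direction. The data below are made up.

```python
from statistics import correlation  # available in Python 3.10+

hours_of_exercise = [1, 3, 2, 5, 4, 6]   # made-up data
mood_score        = [4, 6, 5, 9, 7, 10]  # made-up data

# Correlation is symmetric, so it cannot tell us whether exercise
# improves mood, good mood prompts exercise, or neither causes the other.
print(correlation(hours_of_exercise, mood_score))
print(correlation(mood_score, hours_of_exercise))
```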

Be cognizant, moreover, of researcher degrees of freedom. This means choosing when to stop recording data, which variables to follow, which comparisons to make, and which statistical methods to use. Now, all of these are key components of nearly every scientific study. If, however, the researchers monitor the data or outcomes in any way while making these meta-observational choices, they can consciously or subconsciously exploit their degrees of freedom to reach apparently statistically significant results (usually a p-value of less than 0.05).

For instance, if you survey the data as you collect it, you might decide that, once you cross over the p = 0.05 threshold, you can stop collecting data and publish. Tracking the data isn’t by itself dubious (and is often required in, say, medical studies). But this kind of tracking should be done independently of the primary investigator who is collecting the data to publish (or, at the very least, the number of subjects needs to be determined prior to collecting the data). Anything a researcher changes about the research during or after collecting the data introduces bias.
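
A short simulation makes the danger vivid. Below, both groups are drawn from the same distribution, so there is no real effect, yet peeking after every few observations and stopping as soon as p < 0.05 produces “significant” results far more often than the nominal 5%. The test uses SciPy’s standard two-sample t-test; the sample sizes and peeking schedule are arbitrary choices for illustration.

```python
import random
from scipy.stats import ttest_ind  # standard two-sample t-test

random.seed(1)

def peeking_experiment(max_n=100, peek_every=10):
    """Sample two groups from the SAME distribution (no true effect),
    peeking after every `peek_every` observations and stopping early
    the moment p < 0.05."""
    a, b = [], []
    for _ in range(max_n):
        a.append(random.gauss(0, 1))
        b.append(random.gauss(0, 1))
        if len(a) % peek_every == 0:
            if ttest_ind(a, b).pvalue < 0.05:
                return True  # declared "significant": a false positive
    return False

false_positives = sum(peeking_experiment() for _ in range(1000))
# With a single pre-planned test the rate would be ~5%; peeking inflates it.
print(f"False-positive rate with peeking: {false_positives / 1000:.1%}")
```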

To determine the reliability and validity of empirical evidence, ensure the following conditions are met:

  • Decisions about data collection are predetermined, and the study’s methods are registered before any data are collected
  • Report effect sizes and confidence intervals (a worked example follows this list)
    • Effect sizes that are very small (like a one-week cold being reduced on average by one hour) may reflect subtle but systematic bias, errors, or unknown factors influencing the results
  • Minimize effect of bias and unrelated variables
  • Not only statistically significant results (i.e. results unlikely to be due to random variation or noise), but also results whose effect size is significant (a reasonable signal-to-noise ratio)
  • A pattern of independent replication consistent with the study’s findings
  • Evidence proportional to the plausibility of the claim
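
As a worked example of the effect-size-and-interval reporting urged above, here is a sketch using Cohen’s d and an approximate 95% confidence interval for a difference in means. The scores are fabricated, and the normal critical value 1.96 is used for brevity (a t critical value would be slightly wider at these sample sizes).

```python
import math
from statistics import mean, stdev

# Made-up scores for a treatment group and a control group.
treatment = [12, 15, 14, 10, 13, 16, 14, 12]
control   = [11, 12, 10,  9, 13, 11, 10, 12]

n1, n2 = len(treatment), len(control)
m1, m2 = mean(treatment), mean(control)
s1, s2 = stdev(treatment), stdev(control)

# Cohen's d: difference in means divided by the pooled standard deviation.
pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m1 - m2) / pooled_sd

# Approximate 95% confidence interval for the difference in means.
se = math.sqrt(s1**2 / n1 + s2**2 / n2)
diff = m1 - m2
ci = (diff - 1.96 * se, diff + 1.96 * se)

print(f"Cohen's d = {d:.2f}")
print(f"95% CI for mean difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```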

All of the above information is tremendously useful when you come across someone attempting to make a point based on statistical research, whether in psychology (where such arguments appear most often) or elsewhere. Knowing these methods of evaluation will equip you with the skeptical tools necessary to formulate objective and sound judgements, ultimately in hopes of finding truth!

Author: Joe

Email: [email protected]