
Descriptive Statistics in Clinical Research: Types of Data and Variables, Numerical Data, and Categorical Data

Writer: MaytaMayta

1. Introduction to Descriptive Statistics

Descriptive statistics are the foundational tools for summarizing, organizing, and visualizing data in both clinical and epidemiological studies. Before moving on to more complex analytical methods (like regression or hypothesis testing), it’s crucial to:

  1. Identify the type of data (numerical vs. categorical).

  2. Use appropriate descriptive measures (mean, median, proportions).

  3. Visually inspect data distributions (histograms, boxplots, bar charts).

These steps allow researchers to understand the “shape” of the data, spot potential errors or outliers, and communicate findings clearly.


 

2. Types of Data and Variables

A. Numerical Data

  1. Continuous: Variables that can take on any value within a range or interval, theoretically without breaks.

    • Examples: Serum creatinine level, body weight, blood pressure, blood glucose levels.

  2. Discrete: Variables that can only take specific (often integer) values.

    • Examples: Number of hospital admissions, number of adverse events, number of children in a family.

B. Categorical Data

  1. Nominal: Categories without any intrinsic order.

    • Examples: Blood group (A, B, AB, O), type of health insurance, eye color.

  2. Ordinal: Categories that have a meaningful order, but intervals between categories may not be equal.

    • Examples: Staging of cancer (Stage I, II, III, IV), Likert scale responses (Strongly disagree, Disagree, Neutral, Agree, Strongly agree), NYHA functional classes (I, II, III, IV).

Key Takeaway: Correctly classifying variables ensures that the right descriptive statistics and visualization methods are applied.
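As a minimal sketch of how this classification can drive later choices, the four variable types can be recorded in a small codebook and mapped to a suggested summary. The variable names and the `suggested_summary` helper are hypothetical, chosen only for illustration.

```python
from enum import Enum


class VarType(Enum):
    CONTINUOUS = "continuous"
    DISCRETE = "discrete"
    NOMINAL = "nominal"
    ORDINAL = "ordinal"


# Hypothetical codebook: variable names are illustrative, not from a real dataset.
CODEBOOK = {
    "serum_creatinine": VarType.CONTINUOUS,
    "hospital_admissions": VarType.DISCRETE,
    "blood_group": VarType.NOMINAL,
    "nyha_class": VarType.ORDINAL,
}


def suggested_summary(var_type):
    """Map a variable type to the descriptive summary discussed in this article."""
    if var_type in (VarType.CONTINUOUS, VarType.DISCRETE):
        return "mean (SD) if roughly symmetric, otherwise median (IQR)"
    if var_type is VarType.ORDINAL:
        return "counts, percentages, and cumulative percentages"
    return "counts and percentages"


for name, var_type in CODEBOOK.items():
    print(f"{name}: {var_type.value} -> {suggested_summary(var_type)}")
```

Writing the type down explicitly for each variable before analysis makes it harder to, for example, average a nominal code by accident.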


 

3. Descriptive Statistics for Numerical Data

A. Distribution-Based Checks

Before computing summary statistics, it is helpful to assess the distribution of your numerical data:

  • Normality: A “bell-shaped” or Gaussian distribution is common in many biological measurements, but not guaranteed.

  • Skewness: Indicates asymmetry in the distribution. A right-skew (tail to the right) often arises in variables like hospital length of stay or annual income.

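  • Kurtosis: Describes the heaviness of the tails of the distribution relative to a normal distribution, that is, how prone the data are to extreme values. It is often loosely described as “peakedness,” but it is driven mainly by the tails.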

Diagnostic Tools:

  • Histogram: Quick visual to see if data cluster around the mean or exhibit skew.

  • Boxplot: Reveals median, quartiles, and potential outliers.

  • Q-Q Plot: Checks normality by comparing quantiles of the sample data to a normal distribution.
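Alongside the plots, skewness and kurtosis can be computed numerically. The sketch below uses the simple moment-based formulas (no small-sample correction), so values may differ slightly from those reported by statistical packages that apply a correction. The length-of-stay data are made up for illustration.

```python
import statistics


def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    mean = statistics.fmean(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2^1.5. Positive means a longer right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3. Zero for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical hospital length of stay in days, with one long stay.
length_of_stay = [1, 2, 2, 3, 3, 3, 4, 4, 5, 21]

print(f"skewness: {skewness(length_of_stay):.2f}")
print(f"excess kurtosis: {excess_kurtosis(length_of_stay):.2f}")
```

A clearly positive skewness here agrees with what a histogram of the same data would show: most stays are short, with a tail to the right.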

B. Measures of Central Tendency

  1. Mean (Arithmetic Average)

    • Definition: Sum of all observations divided by the number of observations.

    • Use Case: Data that appear to be roughly symmetric with few extreme outliers.

    • Interpretation: Represents the “balance point” of the distribution.

  2. Median

    • Definition: The middle value when observations are sorted; with an even number of observations, the average of the two middle values.

    • Use Case: Skewed or heavily outlier-prone data (e.g., incomes, hospital lengths of stay).

    • Interpretation: Half of the observations lie below and half above the median.
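To see why the median is preferred for outlier-prone data, the following sketch compares the two measures on a small, invented set of hospital stays in which one patient stayed far longer than the rest.

```python
import statistics

# Hypothetical lengths of stay in days; the 30-day stay is an extreme value.
length_of_stay_days = [2, 3, 3, 4, 5, 6, 30]

mean_los = statistics.fmean(length_of_stay_days)
median_los = statistics.median(length_of_stay_days)

print(f"mean:   {mean_los:.2f} days")   # pulled upward by the 30-day stay
print(f"median: {median_los} days")     # the 4th of 7 sorted values

# Dropping the extreme value moves the mean a lot and the median only a little.
without_outlier = length_of_stay_days[:-1]
print(f"mean without outlier:   {statistics.fmean(without_outlier):.2f} days")
print(f"median without outlier: {statistics.median(without_outlier)} days")
```

The single long stay shifts the mean by several days, while the median barely changes.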

C. Measures of Dispersion

  1. Standard Deviation (SD)

    • Definition: The square root of the variance (the average squared deviation from the mean); informally, the typical distance of an observation from the mean.

    • Use Case: Paired with the mean in symmetrically distributed data.

    • Interpretation: Indicates clustering of data around the mean; a small SD means points are tight around the mean.

  2. Interquartile Range (IQR)

    • Definition: The difference between the 75th percentile (Q3) and 25th percentile (Q1).

    • Use Case: When the median is used (due to skew or outliers).

    • Interpretation: The interval from Q1 to Q3 contains the central 50% of data points; its width, the IQR, is a more robust measure of spread than the SD for skewed or outlier-prone distributions.
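Both measures of dispersion can be computed with Python's standard `statistics` module, as in the sketch below. Note that `statistics.stdev` returns the sample SD (dividing by n - 1), and that quartiles depend on the interpolation convention: `statistics.quantiles` defaults to its "exclusive" method, so other software may report slightly different Q1 and Q3. The blood pressure readings and the `spread_summary` helper are illustrative assumptions.

```python
import statistics


def spread_summary(values):
    """Return the sample SD, the quartiles, and the IQR for a list of numbers."""
    sd = statistics.stdev(values)  # sample SD, divides by n - 1
    q1, _median, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return {"sd": sd, "q1": q1, "q3": q3, "iqr": q3 - q1}


# Hypothetical systolic blood pressure readings (mmHg).
sbp = [118, 122, 126, 130, 134, 138, 142, 150, 168]

summary = spread_summary(sbp)
print(f"SD:  {summary['sd']:.1f} mmHg")
print(f"IQR: {summary['iqr']:.1f} mmHg (Q1 {summary['q1']:.1f}, Q3 {summary['q3']:.1f})")
```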


 

4. Descriptive Statistics for Categorical Data

When dealing with nominal or ordinal variables, the following are standard ways to summarize:

  1. Frequency Counts

    • Definition: The number of observations falling into each category.

    • Example: If your dataset has 100 patients and 30 of them have type 2 diabetes, the count is 30 for that category.

  2. Percentages or Proportions

    • Definition: The fraction (or percentage) of observations in each category.

    • Example: In the example above, 30 of 100 patients having type 2 diabetes corresponds to a proportion of 0.30, or 30%.

Additional Tip: For ordinal data, you can also present cumulative frequencies (e.g., proportion at or below a certain stage of disease).
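The counts, percentages, and cumulative percentages described above can be tabulated in a few lines. This is a sketch under simple assumptions: the cancer stage data are invented, the `frequency_table` helper is hypothetical, and the category order is passed in explicitly because ordinal categories do not sort correctly by default.

```python
from collections import Counter


def frequency_table(values, order):
    """Return (category, count, percent, cumulative percent) rows in the given order."""
    counts = Counter(values)
    total = len(values)
    rows = []
    running = 0
    for category in order:
        count = counts[category]  # Counter gives 0 for categories with no observations
        running += count
        rows.append((category, count, 100 * count / total, 100 * running / total))
    return rows


# Hypothetical cancer stages for 10 patients.
stages = ["I"] * 4 + ["II"] * 3 + ["III"] * 2 + ["IV"] * 1

for category, count, pct, cum_pct in frequency_table(stages, ["I", "II", "III", "IV"]):
    print(f"Stage {category:<3} n={count}  {pct:5.1f}%  cumulative {cum_pct:5.1f}%")
```

For nominal variables the cumulative column has no meaning and can simply be ignored.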

 

5. Visualization Techniques

Visualization is a powerful aid in understanding and communicating data distributions. Appropriate charts and plots depend on the type of variable:

  1. Histograms (Numerical Data)

    • Show the frequency (or density) of observations binned across intervals of continuous or discrete data.

    • Great for spotting skew, multi-modal distributions (multiple peaks), or outliers.

  2. Boxplots (Numerical Data)

    • Display median, IQR, and outliers.

    • Very useful for comparing distributions across multiple groups (e.g., comparing blood pressure across different treatment arms).

  3. Bar Charts (Categorical Data)

    • Display counts or percentages in categories.

    • Simple, clear way to communicate categorical frequencies.

  4. Pie Charts (Categorical Data)

    • Illustrate how a whole is divided among categories.

    • Less commonly recommended in scientific literature compared to bar charts, as it can be harder to compare relative sizes of slices precisely.
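The points a boxplot draws as outliers are usually chosen by a simple rule. The sketch below assumes the common 1.5 × IQR convention (Tukey's fences); plotting libraries may use a different multiplier or quartile method, so treat this as an illustration of the idea rather than a reproduction of any particular tool. The `boxplot_outliers` helper is hypothetical.

```python
import statistics


def boxplot_outliers(values, multiplier=1.5):
    """Return the values lying beyond the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


# Hypothetical number of clinic visits per patient; 50 may be a data-entry error.
visits = [1, 2, 3, 4, 5, 6, 7, 50]
print(boxplot_outliers(visits))
```

A flagged value is only a candidate for review: it may be a typing error, or it may be a genuine but unusual patient.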


 

6. Clinical Relevance of Descriptive Statistics

  • Quality Control: Basic descriptive statistics are often the first step in identifying data-entry errors (e.g., an outlier that is clearly a mistyping).

  • Contextual Understanding: Clinically, understanding the distribution of age, gender, and comorbidities helps gauge whether a study population matches one’s own patients.

  • Hypothesis Formation: Observations about skewed distributions or unusual frequency counts can lead to new hypotheses or sub-analyses.

Example: In a study of systolic blood pressure among hypertensive patients, a histogram with a heavy right tail suggests a non-normal distribution. In that case, the median (IQR) is usually a better summary of central tendency and variability than the mean (SD).
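A small reporting helper can make this choice explicit. In the sketch below, the decision about skew is left to the analyst (after looking at a histogram) and passed in as a flag; the `describe` function and its output format are assumptions for illustration, not a reporting standard.

```python
import statistics


def describe(values, skewed):
    """Format a numerical variable as median (IQR) if skewed, else mean (SD)."""
    if skewed:
        q1, median, q3 = statistics.quantiles(values, n=4)
        return f"median {median:.1f} (IQR {q1:.1f} to {q3:.1f})"
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, divides by n - 1
    return f"mean {mean:.1f} (SD {sd:.1f})"


# Hypothetical systolic blood pressure readings (mmHg) with a right tail.
sbp = [132, 135, 138, 140, 142, 145, 150, 168, 190]
print(describe(sbp, skewed=True))
print(describe(sbp, skewed=False))
```

Printing both forms side by side shows how far the tail pulls the mean away from the median.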


 

7. Conclusion

Descriptive statistics form the backbone of quantitative clinical research. By properly classifying your variables and choosing appropriate measures of central tendency and spread, you can provide an accurate and meaningful summary of your dataset. Coupled with the right visualization techniques, descriptive statistics lay the foundation for subsequent inferential analyses and evidence-based conclusions.


Message for International Readers
Understanding My Medical Context in Thailand

By Uniqcret, M.D.
 

Dear readers,
 

My name is Uniqcret, which is my pen name used in all my medical writings. I am a Doctor of Medicine trained and currently practicing in Thailand, a developing country in Southeast Asia.
 

The medical training environment in Thailand is vastly different from that of Western countries. Our education system heavily emphasizes rote memorization—those who excel are often seen as "walking encyclopedias." Unfortunately, those who question, critically analyze, or solve problems efficiently may sometimes be overlooked, despite having exceptional clinical thinking skills.
 

One key difference is in patient access. In Thailand, patients can walk directly into tertiary care centers without going through a referral system or primary care gatekeeping. This creates an intense clinical workload for doctors and trainees alike. From the age of 20, I was already seeing real patients, performing procedures, and assisting in operations—not in simulations, but in live clinical situations. Long work hours, sometimes exceeding 48 hours without sleep, are considered normal for young doctors here.
 

Many of the insights I share are based on first-hand experiences, feedback from attending physicians, and real clinical practice. In our culture, teaching often involves intense feedback—what we call "โดนซอย" (being sliced). While this may seem harsh, it pushes us to grow stronger, think faster, and become more capable under pressure. You could say our motto is “no pain, no gain.”
 

Please be aware that while my articles may contain clinically accurate insights, they are not always suitable as direct references for academic papers, as some content is generated through AI support based on my knowledge and clinical exposure. If you wish to use the content for academic or clinical reference, I strongly recommend cross-verifying it with high-quality sources or databases. You may even copy sections of my articles into AI tools or search engines to find original sources for further reading.
 

I believe that my knowledge—built from real clinical experience in a high-intensity, under-resourced healthcare system—can offer valuable perspectives that are hard to find in textbooks. Whether you're a student, clinician, or educator, I hope my content adds insight and value to your journey.
 

With respect and solidarity,

Uniqcret, M.D.

Physician | Educator | Writer
Thailand
