Survey Data Analysis for Dissertation

Written by Marcus Whitfield




Most dissertation guides tell you how to design a survey. Very few tell you what to do once the responses arrive. You've sent out 200 questionnaires. 89 people responded. You've got a spreadsheet. Now what?

That's the gap this post fills. From cleaning your raw data to interpreting statistical tests, here's the full workflow for analysing survey results.

Step 1: Data Cleaning

Your raw data is always messier than you expect. Before you run a single statistical test, spend time cleaning.

Incomplete responses. Decide your threshold for missing data. If someone answered three questions out of forty, should you include them? Usually not. If someone skipped one question out of forty? Usually yes. A common rule is to exclude any respondent with more than 10% missing data. If a respondent is missing only one or two responses on a 40-item survey, you can sometimes impute (estimate) each missing value as the mean of that person's other responses, but imputation papers over genuinely missing information. Many analysts simply exclude incomplete responses, which is the more conservative choice.
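As a sketch of this rule in Python with pandas (the data, the 10% threshold, and the column names are all illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical 40-item survey, one row per respondent.
df = pd.DataFrame(np.full((3, 40), 4.0), columns=[f"q{i}" for i in range(1, 41)])
df.iloc[1, 3:] = np.nan     # respondent 2 answered only the first 3 items
df.iloc[2, 0] = np.nan      # respondent 3 skipped a single item

# Exclude anyone missing more than 10% of items.
missing_frac = df.isna().mean(axis=1)
kept = df[missing_frac <= 0.10].copy()

# Optional person-mean imputation for the remaining small gaps
# (less conservative than simply leaving the value missing).
kept = kept.apply(lambda row: row.fillna(row.mean()), axis=1)
```

Here respondent 2 (92.5% missing) is dropped, while respondent 3's single gap is filled with their own mean response.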

Outliers and impossible values. Check your data for impossible entries. If you asked age and someone entered 999 or "abc," these need removing or recoding. Check for outliers that might be errors. A score of 95 on a scale that goes to 50 suggests a data entry mistake.
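A minimal pandas sketch of these range checks, using made-up values and plausible bounds (adjust the valid ranges to your own questionnaire):

```python
import numpy as np
import pandas as pd

ages = pd.Series([23, 45, 999, 31, -2])    # 999 and -2 are impossible entries
scores = pd.Series([42, 38, 95, 17, 50])   # the scale maximum is 50, so 95 is suspect

# Recode impossible ages to missing rather than guessing a replacement.
ages_clean = ages.where(ages.between(16, 100), np.nan)

# Flag out-of-range scores for checking against the original questionnaires.
suspect = scores[~scores.between(0, 50)]
```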

Midpoint bias. In Likert scale surveys, some respondents choose the middle option for every item. This pattern often signals that the respondent wasn't engaged. You might exclude respondents who gave identical answers to 80% or more of your items, or flag them and analyse them separately. Midpoint bias weakens your data but isn't always grounds for exclusion; it depends on your research question.

Reverse scoring. Some survey items are worded negatively ("I don't enjoy my job") while others are positive ("I enjoy my job"). Before analysis, you need to reverse score the negative items so all items point in the same direction. If your scale runs 1 to 5 (strongly disagree to strongly agree), a response of 5 on a negative item gets recoded as 1. Most statistical software handles this automatically if you code it correctly.
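The recoding rule for a 1-to-5 scale is (max + min) − response, i.e. 6 − x. A sketch with hypothetical item names:

```python
import pandas as pd

responses = pd.DataFrame({
    "enjoy_job": [5, 2, 4],          # positively worded item
    "dont_enjoy_job": [1, 4, 2],     # negatively worded item, needs reversing
})

# Reverse-score the negative items so all items point the same way.
negative_items = ["dont_enjoy_job"]
responses[negative_items] = 6 - responses[negative_items]
```

After recoding, a respondent who strongly agreed with "I don't enjoy my job" (5) now scores 1, matching the direction of the positive items.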

Step 2: Choosing the Right Descriptive Statistics

Descriptive statistics summarise your data before you run any inferential tests. Which descriptive statistics you report depends on your data type.

For normally distributed continuous data (like age, test scores, or rating scales when you have many respondents), report the mean and standard deviation. "The mean age was 34.2 years (SD = 12.4)."

For skewed data or ordinal data (like Likert items, which are technically ordinal not interval), report the median and interquartile range. "The median satisfaction rating was 4.0 (IQR = 3.0 to 5.0)." The interquartile range tells you the spread of the middle 50% of responses.

For categorical data (yes/no, gender, department), report frequency counts and percentages. "42 participants (47.2%) identified as female, 45 (50.6%) as male, and 2 (2.2%) as non-binary."
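All three kinds of descriptive can be pulled from a few lines of pandas; the sample values below are invented for illustration:

```python
import pandas as pd

age = pd.Series([22, 25, 31, 34, 40, 47, 52, 61])       # continuous
satisfaction = pd.Series([2, 3, 3, 4, 4, 4, 5, 5])      # ordinal Likert item
gender = pd.Series(["female", "male", "female", "non-binary",
                    "male", "female", "male", "female"])  # categorical

mean, sd = age.mean(), age.std()                          # mean and SD
median = satisfaction.median()                            # median
iqr = satisfaction.quantile(0.75) - satisfaction.quantile(0.25)  # IQR
counts = gender.value_counts()                            # frequencies
percents = gender.value_counts(normalize=True) * 100      # percentages
```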

Test for normality. If you're unsure whether your data are normally distributed, run the Shapiro-Wilk test or the Kolmogorov-Smirnov test. Most statistical software can do this in one click. If p < 0.05, the test indicates your data deviate from normality: use medians instead of means.
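In SciPy this is one function call. The sketch below generates two simulated samples (one normal, one heavily skewed) purely to show how the p-values behave:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=50, scale=10, size=89)   # simulated normal data
skewed_sample = rng.exponential(scale=2.0, size=89)     # simulated skewed data

# Shapiro-Wilk: p < 0.05 means the normality assumption is rejected.
_, p_normal = stats.shapiro(normal_sample)
_, p_skewed = stats.shapiro(skewed_sample)
```

For the skewed sample, p comes out far below 0.05, so you would report medians and use non-parametric tests.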

Step 3: Visual Presentation

A good chart can communicate what numbers can't. Choose the chart type that matches your data type.

Bar charts for categorical data. Show the frequency of each category. "Responses to the question 'How often do you exercise?' displayed as horizontal bars showing the number of respondents in each category (daily, several times weekly, weekly, rarely)."

Histograms for continuous data. A histogram shows the distribution: put the variable on the x-axis and frequency on the y-axis. This reveals whether your data are normally distributed or skewed. If the histogram is bell-shaped, your data are probably normal; if it's skewed left or right, they're not.

Box plots for comparing distributions between groups. If you want to compare satisfaction scores between departments, a box plot shows the median, quartiles, and outliers for each department. This makes visual comparison straightforward.

Scatter plots for relationships between two continuous variables. Put one variable on the x-axis and one on the y-axis. Each point is one respondent. The pattern tells you whether the relationship is linear, curved, or absent.
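Two of these chart types can be sketched in a few lines of matplotlib. The satisfaction scores below are simulated stand-ins for real survey data:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                  # non-interactive backend; saves to file
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
dept_a = rng.normal(4.2, 0.8, 45)      # hypothetical satisfaction scores
dept_b = rng.normal(3.4, 1.1, 44)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

ax1.hist(np.concatenate([dept_a, dept_b]), bins=12)   # distribution shape
ax1.set_xlabel("Satisfaction score")
ax1.set_ylabel("Frequency")

ax2.boxplot([dept_a, dept_b])                         # medians, quartiles, outliers
ax2.set_xticks([1, 2], ["Dept A", "Dept B"])
ax2.set_ylabel("Satisfaction score")

fig.savefig("survey_plots.png", dpi=150)
```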

Step 4: Inferential Statistics for Survey Data

Inferential statistics let you test whether patterns in your sample reflect real patterns in the population.

Chi-square test for comparing proportions between groups. You want to know whether the proportion of yes/no responses differs between departments. Chi-square tests this. If p < 0.05, the difference is statistically significant: if there were truly no difference, a result this extreme would occur less than 5% of the time. Run this test when you have categorical data split into groups.
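With SciPy, the test runs directly on a contingency table of counts. The table below is invented for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 table: yes/no answers across three departments.
table = np.array([
    [30, 12, 18],   # "yes" counts per department
    [10, 25, 14],   # "no" counts per department
])

# chi2 = test statistic, p = significance, dof = degrees of freedom,
# expected = counts predicted if responses were independent of department.
chi2, p, dof, expected = chi2_contingency(table)
```

Degrees of freedom are (rows − 1) × (columns − 1) = 2, which is what goes in the χ²(2) of the APA report.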

Independent samples t-test for comparing means between two groups. Do Department A and Department B have different mean satisfaction scores? The t-test answers this. Report the t-statistic, degrees of freedom, and p-value. "Department A (M = 4.2, SD = 0.8) reported higher satisfaction than Department B (M = 3.4, SD = 1.1), t(87) = 3.24, p = 0.002."

Mann-Whitney U test for comparing medians between two groups when data aren't normally distributed. This is the non-parametric equivalent of the t-test. Use it when your data are ordinal (Likert items) or skewed.
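The t-test and its non-parametric counterpart are one call each in SciPy; the two department samples below are simulated to match the example figures above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
dept_a = rng.normal(4.2, 0.8, 45)   # hypothetical satisfaction scores
dept_b = rng.normal(3.4, 1.1, 44)

# Parametric comparison of means (assumes roughly normal data).
t_stat, p_t = stats.ttest_ind(dept_a, dept_b)   # df = 45 + 44 - 2 = 87

# Non-parametric alternative for ordinal or skewed data.
u_stat, p_u = stats.mannwhitneyu(dept_a, dept_b)
```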

One-way ANOVA for comparing means across three or more groups. You have four departments and want to test whether satisfaction differs between them. ANOVA tests this. If p < 0.05, at least one department differs from the others. You then need post-hoc tests (like Tukey's HSD) to see which departments differ from which.
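A sketch of the omnibus test plus Tukey post-hocs in SciPy, on four simulated department samples (`tukey_hsd` requires SciPy 1.8 or later):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical satisfaction scores for four departments.
groups = [rng.normal(mu, 0.8, 22) for mu in (4.2, 3.4, 4.1, 4.0)]

# Omnibus test: does at least one department differ?
f_stat, p = stats.f_oneway(*groups)

# Pairwise post-hoc comparisons to locate the difference.
posthoc = stats.tukey_hsd(*groups)
```

`posthoc.pvalue` is a 4×4 matrix of pairwise p-values, one for each department pair.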

Pearson correlation for the relationship between two normally distributed continuous variables. Does workload correlate with stress? Pearson's r ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation). r = 0 means no relationship. Report r, degrees of freedom, and p-value. "Workload and self-reported stress were positively correlated, r(87) = 0.64, p < 0.001."

Spearman's rho (rank correlation) for relationships between ordinal variables or when data aren't normally distributed. This is the non-parametric equivalent of Pearson's r. Use it for correlating Likert items.
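Both correlations are single calls in SciPy. The workload and stress values below are simulated with a built-in positive relationship, purely to illustrate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
workload = rng.normal(30, 5, 89)                 # hypothetical hours per week
stress = 0.6 * workload + rng.normal(0, 4, 89)   # built-in positive relationship

r, p_r = stats.pearsonr(workload, stress)        # parametric, df = n - 2 = 87
rho, p_rho = stats.spearmanr(workload, stress)   # rank-based alternative
```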

Step 5: Reporting Statistics in APA Format

Your university probably requires APA (American Psychological Association) format for reporting. The conventions are strict but exist to make results clear.

For means and standard deviations: "(M = 4.2, SD = 1.1)" where M is the mean and SD is the standard deviation.

For t-tests: "t(87) = 3.24, p = 0.002" where 87 is the degrees of freedom, 3.24 is the t-statistic, and p is the significance level.

For chi-square: "χ2(2) = 8.47, p = 0.014" where χ2 is chi-square, (2) is degrees of freedom, 8.47 is the test statistic, and p is the significance.

For correlation: "r(87) = 0.64, p < 0.001" where r is Pearson's correlation coefficient, 87 is degrees of freedom, 0.64 is the correlation strength, and p is the significance.

For ANOVA: "F(3,86) = 5.23, p = 0.002" where F is the test statistic, (3,86) is degrees of freedom (between groups, within groups), 5.23 is the F-statistic, and p is the significance.


Step 6: The Likert Scale Debate

Here's where things get heated amongst statisticians. Likert items (1=strongly disagree, 5=strongly agree) are technically ordinal data. Ordinal data should be analysed with non-parametric tests. Yet many researchers treat Likert items as interval data and use parametric tests like t-tests and ANOVA.

What does this mean in practice? If you have many Likert items (10 or more) that you've combined into a single scale, and the combined scale is reasonably normally distributed, treating it as interval data is usually acceptable. Most journal articles do this. If you have only a few items, or the individual items are heavily skewed towards one end, be more conservative: use medians and non-parametric tests.

The honest answer: there's no single right way. Different supervisors will have different preferences. Ask yours early. Document your choice. Justify it. "Likert items were combined into a composite satisfaction scale (alpha = 0.83). Although Likert data are technically ordinal, the composite scale approximated normal distribution (Shapiro-Wilk p = 0.083) and contained 12 items, justifying the use of parametric tests (Field, 2013)."
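The alpha in that justification is Cronbach's alpha, the internal-consistency reliability of the composite scale. It can be computed by hand from item and total-score variances; the formula is α = k/(k−1) × (1 − Σ item variances / total variance). A sketch with invented Likert responses:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) array of scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()      # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Five hypothetical respondents answering three related Likert items.
demo = np.array([[4, 5, 4], [2, 2, 3], [5, 4, 5], [3, 3, 2], [1, 2, 1]])
alpha = cronbach_alpha(demo)
```

Values above roughly 0.7 are conventionally taken as acceptable reliability for a composite scale.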

Frequently Asked Questions

Q: How many responses do I need for statistical tests? A: Rules of thumb vary, but most parametric tests require at least 30 participants per group. If you're doing correlations, 30 in total is often sufficient. If you're comparing four groups with ANOVA, aim for at least 20 per group. Smaller samples make Type II errors (failing to detect real effects) more likely. Always report your sample size, and your power calculations if you did them.

Q: What do I do if my survey responses are very low? A: Low response rates don't necessarily invalidate your analysis, but they raise questions about bias. If you had 200 people in your sampling frame and 35 responded, that's a 17.5% response rate. You should discuss who responded and who didn't. Were respondents different in any measurable way from non-respondents? Did older people respond more than younger people? Did management respond more than staff? This response bias matters more than the absolute number.

Q: Should I report p-values or effect sizes? A: Both. p-values tell you whether an effect is statistically significant (unlikely to have occurred by chance). Effect sizes tell you how big the effect is. A large sample can produce a highly significant p-value for a tiny effect. Report both so readers understand the practical importance of your findings.
