This post shares tips on analyzing survey data. Some come from Dave’s vlog and some are my own. First, a note about survey research.
Surveys can be fully quantitative, with every question or item suited to statistical analysis, or they can be partly or mainly qualitative. A qualitative survey includes open-ended questions that respondents answer in sentences or paragraphs. This post mainly addresses issues in quantitative survey research. If you need help writing a paper or editing your thesis, you should check out this detailed post.
A disclaimer about Dave’s vlog on this topic: This is one of Dave’s more technical vlogs, and you need some baseline knowledge of research analysis methods to benefit from some of the content. Still, Dave provides a great summary of key things to keep in mind as you design your survey research and prepare to analyze survey data, whether it is a small class project or your dissertation. You can view Dave’s full vlog here:
First of all, before you begin your analysis, you must think about your research question and how the survey / questionnaire relates to your research question. How are you going to operationalize the variables specified in your research question? That is, how is the survey data going to describe phenomena that you are interested in observing and measuring? Also, if you made some hypotheses, how are you going to determine whether they are confirmed or rejected by the data?
This post was written by Stephanie A. Bosco-Ruggiero (PhD candidate in Social Work at Fordham University Graduate School of Social Service) on behalf of Dave Maslach for the R3ciprocity project (Check out the YouTube Channel or the writing feedback software). R3ciprocity helps students, faculty, and research folk by providing a real and authentic look into doing research. It provides solutions and hope to researchers around the world.
Creating a data analysis plan
Focus on your research questions before you do anything else, and come up with a data analysis plan. If your research is purely quantitative (no open-ended questions requiring content analysis), outline the statistical procedures you are going to use to answer your research question. Do you want to use bivariate or multivariate analyses? That is, do you want to measure the association between two variables, or do you want to observe how more than two variables impact an outcome or relate to each other? Some common bivariate analyses are the Pearson chi-square test and bivariate correlation. For a more rigorous multivariate analysis you might use a multiple linear regression or a cluster analysis.
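To make the two bivariate analyses mentioned above a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data are made up, and the names `pearson_r`, `chi_square`, `hours`, `satisfaction`, and `table` are hypothetical. In practice you would most likely use your statistics package rather than hand-rolled functions, and this sketch applies no continuity correction.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Made-up continuous data: weekly study hours and a 1-5 satisfaction rating.
hours = [1, 2, 3, 4, 5]
satisfaction = [2, 4, 5, 4, 5]
r = pearson_r(hours, satisfaction)

# Made-up 2x2 table: rows are two respondent groups, columns are yes/no answers.
table = [[30, 20], [20, 30]]
chi2, df = chi_square(table)

# With exactly 1 degree of freedom the p-value has a closed form.
# This shortcut is NOT valid for larger tables.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"Pearson r = {r:.3f}")
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```

The point is only to show what each statistic is computed from: the correlation works on pairs of continuous values, while the chi-square test works on counts in categories.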
Regression analysis is a large topic, so we are not going to cover it here. People spend many courses trying to understand it, and much of that effort goes into thinking about when the regression assumptions do and do not hold.
Avoiding confirmation bias
The key thing is to specify as much of the analysis as possible before you touch the data. Why, you ask? We have a tendency as humans to look to confirm our hypotheses, and the goal in science is to objectively confirm or reject (falsify) your hypotheses. By specifying as much of the analysis upfront as possible, you prevent yourself from being human and selecting analytical methods that are more likely to confirm your hypothesis as you proceed through your research.
Now, sometimes you do have to adjust your data analysis plan (more about that at the end), and that is ok in some instances. But don’t keep changing your research questions and data analysis plan as you go because you want to arrive at some predetermined finding or don’t like what your original plan produced.
(This is Dave: Personally, I think you are OK to adjust as you go as long as you are upfront and clear about this in your analysis. If you have gone through the review process of a major journal, you will know that revising and improving clarity is a major part of writing papers. Yes, we know that there is debate about HARKing (hypothesizing after the results are known) and such right now, but writing a paper is virtually impossible to do without this trial and error process. If we knew what the answer was upfront, which is what pre-specification presumes, then it would not be research.)
This pitfall of wanting to change our questions or plan to find something interesting or confirm our hypotheses is known as confirmation bias. We all want to find something interesting in our data, and all the better if our analyses confirm what we thought would happen, but we can’t will our results. They are what they are. By creating a data analysis plan early on, you are more likely to stick to it and not make too many adjustments based on what you’re seeing in the data, or learning, along the way. At some point, you just have to say, I will find what I find even if it’s not that interesting.
Here are tips Dave shared about things you should think about and steps you should take as you go through the process of planning your study and analyzing your data.
1. Consider construct validity. First, ask yourself whether your survey items measure your concept precisely enough. That is, are your main idea and hypothesis accurately measured by the set of questions in your survey tool? If you adjust your hypotheses along the way, you may want to consider using a different survey tool if your original tool no longer measures the concept you are studying. To assess construct validity, become familiar with past studies and the tools they used to analyze and observe your key ideas or research question. Read more about construct validity here.
2. Run your frequencies and plot your data. So you’ve gathered 100 completed surveys and you have them in hand or the data online. After you enter the data into a data analysis software platform (e.g., R, SAS, SPSS), run your frequencies. Simply look at your numbers. Can you glean anything from the descriptive data? Is there an imbalance in who answered your survey (e.g., by gender)? Just take a look at the data and become familiar with the raw results. You can plot your continuous data as well. Dave says, “Always plot the data!” Try simple histograms, scatterplots (x-y plots), or line plots. Then look at the data and see what it’s telling you. Take a look at the outliers and anomalies that show up in the data. Plotting lets you see how the data compare with what you would expect to see.
3. Explore your data. To run a given statistical test, your data has to be appropriate for that test. First think about the basics. If your data is categorical (e.g., colors, states), you can’t run tests that are appropriate only for continuous variables (e.g., interval or ratio data). You also need to confirm that your continuous data is appropriate for certain tests. Is it normally distributed? Does it contain numerous outliers? Learn about the requirements for running an ANOVA, regression, chi-square, etc. Then you have to decide which are your independent and dependent variables. If you are doing multivariate analyses, carefully consider which independent variables you want to include in the model. Also, do you want to create any interaction variables? Which control variables do you want to include so you can clearly understand which independent variables cause your dependent variable to vary?
4. Run your analyses. Run your bivariate and/or multivariate analyses. When conducting a multiple linear regression, consider adding variables to the model one at a time (as in a stepwise regression) so you can see what each one changes. If you remove or add a variable, do your findings suddenly become significant? Think about why this might be. A particular variable might also make your model unstable. Figure out which variable is causing the problem, and find out why. Is it highly correlated with another variable? It’s ok to run a bunch of analyses that you never report on. (Dave: Maybe. We always have to be clear on what we report on and don’t run. It’s just far more efficient use of your time to document, document, document.)
You just want to become familiar with your data and various results. If you want to run a bunch of bivariate analyses to become familiar with what you are going to see in your multivariate analyses, go right ahead. It doesn’t mean you have to report on every single test you run. Also, you are not deviating from your analysis plan by running more tests than you need. You are only deviating from your plan if you keep changing the variables you are looking at, make a major change to your research methodology, or completely change the focus of your study or how you are going to analyze your data (e.g., scrapping a regression analysis plan to do a factor analysis, or moving from a cross-sectional analysis to a longitudinal study).
5. If you conduct a one-way ANOVA or regressions, run a post hoc analysis. If you find a difference in means between your groups, find out where the significant differences are. To do this, run a post hoc test, also known as a multiple comparisons test. For example, if you have groups of freshman, sophomore, and junior high school students taking a standardized test, and your ANOVA results are significant, run a post hoc test to determine whether all three groups have significantly different scores or whether the difference lies between two specific groups. You have to choose your post hoc test carefully (e.g., Tukey’s test) based on the characteristics of your data.
6. Double check your work and output. We have all made mistakes at one time or another in analyzing our data or interpreting our results. Double check everything you’ve done after you’ve run all of your analyses. Do some of the results seem really off, or is the data not behaving as expected? Retrace your steps and make sure you entered all of the correct variables and ran the right tests. You can even have a student assistant double check your work, or have a colleague look at any puzzling results.
7. Think about how your findings are different or similar to other studies’ findings. You should have conducted a literature review in the study planning stages to find out who has studied your concept, or closely related concepts, prior, and what they discovered. Are you going to confirm past findings or try to refute them? What should you include or not include in the analysis? How many research questions should you have and have you made them straightforward enough that they are easily analyzed? Take a look at your frequencies and think about whether the data is lining up with what was found in previous studies.
8. Continuously write up your results. Obviously, people from a range of disciplines read this blog, so we can’t describe exactly how you’re going to write up your results because there are different formatting requirements in each discipline. We can tell you, however, that whatever the format, you are going to need to understand and write up your results and interpret them in a discussion section (or something similar). As soon as you look at your output, you can start writing notes about what you’re seeing and what it might mean. How does it relate to prior findings in this area of research? If your hypothesis was rejected or the null could not be rejected, think about why. If you found something completely new that has not been found before in your field, discuss why these results might have come about at the present time or in your particular study.
9. Leftover data. Dave advises that you don’t need to use all of the data in the survey in your analysis. Save some for future research. You don’t want to go overboard in reporting every single result. Stick to what you wanted to look at according to your research questions, hypotheses, and data analysis plan. Of course, dissertation data and analyses can provide the perfect content for several peer reviewed research manuscripts (journal articles). Save all of your data! (Dave: Indeed, a good research project should have room for 3 to many more studies with the data).
10. Think about future studies. What did you find in your data that was particularly interesting and that you might want to explore further? Jot down some ideas for future studies that look at different angles of what you studied or that take your research to the next level. You might look at a similar set of research questions using a different research methodology or set of tests, or you might focus on a particular finding and explore it using qualitative techniques (e.g., focus groups, interviews).
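The "run your frequencies" and "look at the outliers" steps in the list above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the responses are made up, the names `frequency_table`, `iqr_outliers`, `gender`, and `ages` are hypothetical, and the 1.5 x IQR rule is just one common convention for flagging values worth a second look, not the method from Dave’s vlog.

```python
from collections import Counter
import statistics

def frequency_table(values):
    """Return (value, count, percent) rows, most common value first."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, 100 * count / total)
            for value, count in counts.most_common()]

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Made-up survey responses for illustration only.
gender = ["F", "F", "F", "M", "M", "F", "F", "M", "F", "F"]
ages = [22, 24, 25, 25, 26, 27, 28, 29, 30, 71]

# A frequency table with a crude text bar chart, so imbalances stand out.
for value, count, percent in frequency_table(gender):
    print(f"{value:>3} {count:>3} {percent:5.1f}% {'#' * count}")

# Flagged values are candidates for checking (data entry error? real case?),
# not values to delete automatically.
print("Possible outliers in age:", iqr_outliers(ages))
```

A proper histogram or scatterplot from your statistics package will show more than this text output, but the idea is the same: look at the raw distribution before running any tests.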
Check out Dave’s vlog about Mistakes most PhDs make in their Doctoral Research and Scientific Careers:
Here are a few more tips to consider as you conduct your quantitative survey research:
- Keep careful track of who you administered your questionnaire to (create IDs for anonymity), if you changed any items along the way, and your process for distributing your survey. You’ll have to write all of this up later on.
- Don’t forget about Ethics Approval! Every university and research institution has an Ethics Board or IRB. It is super important that you obtain approval before you start doing research. Our research has consequences, some of them can be rather nasty, and we have to think about what those consequences are. If you need help, check out this video on the IRB Approval process:
- Be upfront about the limitations of your research. Don’t try to hide them. Any good study discloses its limitations so the reader understands where there may be a lack of validity or reliability in the numbers. For example, if you have a small sample or it is skewed in some way, note that. You want your reader to understand whether these results might be similar for a different population.
- If you don’t actually find much that is interesting or all of your hypotheses must be rejected, don’t despair, you are not the only one. Report your results and if you’ve done a good job you will get a good grade or your degree. Journals are full of studies that found nothing significant and nothing particularly interesting. (Dave: Yes and No. You have to think about why that is interesting to not find results.) This is research too and others in your field will learn from your results. Your results might even help you come up with a new theory. As Dave says, “That is the beauty of science….don’t be afraid to explore different avenues, you should be writing down and clear about the stuff you’ve done, be very systematic.” He also advised that a spurious result can be really interesting.
- If you do need to change your data analysis plan, that is ok. As long as your new plan helps you come up with results that best answer your research questions, that is fine. You do have to write up a new data analysis plan and stick to it. Don’t keep changing it up. It’s great if you can stick to your original plan, but that often does not happen.
- Do keep a codebook. A codebook is your menu of variables, their names, and their numerical codes. You should include all of your created interaction variables. Also keep a careful log of all of your recoded variables and their new responses and codes. You will thank yourself later.
- Did you benefit from this post? Do you know of anyone at all that could use feedback on their writing or editing of their documents? I would be so grateful if you read this post on how to get feedback on your writing using R3ciprocity.com or let others know about the R3ciprocity Project. THANK YOU in advance! You are the bees knees.
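Going back to the codebook tip in the list above, here is one minimal way a codebook and a recode log could look if you keep them in code. It is a sketch, not a standard format: the variables, labels, and numeric codes are invented, and the names `codebook`, `recode_log`, `decode`, and `recode` are hypothetical. A spreadsheet or the variable view in your statistics package does the same job.

```python
# Hypothetical codebook: one entry per variable, with its label and codes.
codebook = {
    "gender": {
        "label": "Respondent gender",
        "codes": {1: "Female", 2: "Male", 3: "Other", 9: "Missing"},
    },
    "satisfaction": {
        "label": "Overall satisfaction (5-point scale)",
        "codes": {1: "Very dissatisfied", 2: "Dissatisfied", 3: "Neutral",
                  4: "Satisfied", 5: "Very satisfied"},
    },
}

# Running log of every recode, so each change can be written up later.
recode_log = []

def decode(variable, value):
    """Look up the text label for a numeric code."""
    return codebook[variable]["codes"].get(value, "Unknown code")

def recode(values, mapping, source, new_name, note):
    """Apply a recode mapping and record what was done and why."""
    recode_log.append(
        {"source": source, "new_variable": new_name,
         "mapping": dict(mapping), "note": note}
    )
    # Values without an entry in the mapping are left unchanged.
    return [mapping.get(v, v) for v in values]

# Example: collapse the 5-point scale into satisfied (1) vs. not satisfied (0).
satisfaction = [1, 2, 3, 4, 5]
satisfied = recode(
    satisfaction,
    {1: 0, 2: 0, 3: 0, 4: 1, 5: 1},
    source="satisfaction",
    new_name="satisfied",
    note="Collapsed to binary for the chi-square analysis",
)

print(decode("gender", 1))
print(satisfied)
print(recode_log)
```

Whatever form it takes, the useful habit is that every recoded or created variable gets an entry at the moment you create it.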
If you enjoyed this blog, check out these other blogs on r3ciprocity.com: