Data Cleaning Using MS Excel


Last time we discussed briefly about data collection methods. Once we have collected the data, we need to edit it with the aim of making it clean for the next step of analysis. We shall assume that after collection, our data was entered into MS Excel.
Editing is the process of examining or scrutinizing data in order to identify any errors, mistakes or omissions. Under this process, we focus on the completeness of data (no missing values), distribution of the data and extreme or outliers in the data. Some of the errors that that we may have to deal with under this process are sampling errors, Non-sampling errors, biased errors, non-biased errors, and positive or negative errors.
Sampling errors occur due to the type of sampling method used during the collection of data. Sampling errors mean the difference between the estimate of a value as obtained from the sample and the actual value. Non-sampling errors take place if randomization was not used during sample selection. Biased errors are due to the biasedness of the investigator himself or even the measuring instrument during the collection of data.
If data is not checked and cleaned to remove the errors, mistakes and omissions, the results may be statistically invalid which in turn give misleading conclusions and recommendations. This can cost your company dearly.
Now, let’s see how you can clean your data in Excel.

The preliminary step is of course to create a backup copy of your original data in case you make undesired changes, you will not have lost your data.
1. Checking for missing data
Missing data cause a lot of problems during analysis especially if not properly handled. In MS Excel, one can find out the cells that are blank through a simple procedure and choose to replace the blanks with a zero, NA or whichever way he chooses to handle missing values.
The first step is to select the entire dataset using CTRL+A followed by CTRL+G. This should open a Go To dialogue window.


Click on Special.




Select Blank. Click OK. Excel will look for blank cells and will highlight them. Enter the value for example zero and press CTRL+ENTER. If you want to enter different values in each cell, you will have to enter manually one by one.
2. Replacing values
Sometimes you may want to find a value and replace it with a new one. In this case, press CTRL+F. A Find and Replace window will appear.


Type in the value you want to find. Click find all. Click replace. Type in the value you want to replace with. Click replace.

3. Conversion of Text to Numbers
At times you find that whoever entered your data may have stored numbers as texts. This may cause trouble during analysis both in Excel and other statistical programs if data is imported to them. To change say from General format to Number format, type 1 in a blank cell then convert it into number type (see the picture below).


Put the cursor in the cell in which you typed 1 and press CTRL+C. Select the cells containing the numbers you want to convert. You may want to use CTRL+SHIFT+ Arrow Key if you want to select a row(s) or a column(s). Click Paste Special. A Paste Special window appears as shown below.


Select multiply and click OK. You have successfully converted text numbers to numbers.

4. Lastly, you can create a macro in excel to automate your data cleaning process.
I understand that Excel my not be the best in data cleaning compared to STATA, SPPS and R but it gives you an easy starting point. Some of the processes that are complicated in other programs may be simple in MS Excel.  Besides, many people are conversant with MS Excel more than other programs. I wish you all the best in your data cleaning using excel.

Comments

Popular posts from this blog

Microsoft Excel for Data Analysis

Data Collection