I do a lot of work with excel for manipulating data to import into databases and such. One thing I have had to do in the past is delete duplicate rows. After a little bit of searching I have discovered an easy way to do this programmatically without manually deleting them. Thanks to this site http://www.rondebruin.nl/delete.htm for providing the script. He doesn’t have a comment section on his website, so Ron, if you are reading this, Thank you! I appreciate it. You have saved me tons of time and headaches! I thought I would re-post his script with an explanation of how I was able to apply his script to my excel documents.
So the scenario that I ran into was that I had 21000 rows of items where I could not have duplicates. To make this easy lets just include one column of the data and around 20 rows, and we will say that the column is A.
- What you first do is sort the column by A, this way the duplicates that you have appear one after another. The icon to sort looks like this, you most likely want to sort A-Z and sort anything that looks like a number as a number. It is on the far right-hand side of the ribbon on the Home Tab in Excel 2007.
- Then in column B1 you would do the following formula (those are double quotes)
- =IF(A2=A1,”YES”,””)
- Basically what you are saying above is if the CellA2 (Widget1) is equal to CellA1(Widget2), then place a “YES” in column B1, so we know that cell A1(widget1) is a duplicate of cell A2 (widget1). This will help the script tell whether or not this is a row that we have to delete.
- So with that logic, if you look at my example above, Cell A1(widget1) is the same as A2(widget1), and Cell A2(widget1) is the same as A3(widget1), but A3(widget1) is different from A4(widget2) so it does not put a “YES” in Cell B3, but a blank space instead.
- Now we copy that formula to all cells by selecting cell B1, and copying that formula. Easiest way to do this is, after you have cell B1 selected, in the bottom right-hand corner of the cell is a small black square that when you just put your cursor over it, it will turn into a plus symbol. Double-click when it turns into a plus and it should copy it to all cells in the B column. You could also drag that plus down B as well (I realize this is basic excel but perhaps there is someone on here who is fairly new to excel)
- Now we need to go to the Developer Tab in Excel so we can paste and execute the script. If its not already there, it can be shown by viewing the following link (different versions of excel have different ways of doing this so I will refer you to this link) http://msdn.microsoft.com/en-us/library/bb608625.aspx The Developer Tab should now appear on the ribbon.
- Click on the Developer tab that appeared, then Click the Visual Basic icon on the left-hand side of the ribbon which will bring up a separate Visual Basic window.
- Assuming that you are working on Sheet1 then Double-Click on the “Sheet1” icon in the left-hand pane of the Visual Basic window. This will create a blank white page in the right hand pane.
- Now here’s where we use the VB code, which I borrowed from the following site http://www.rondebruin.nl/delete.htm . The most important part of this code is the items that I have in bold. We basically tell it, if cell B has a “YES” in it then delete the entire row. You can change these 2 values to adapt the code to your excel sheet if you have a different cell or different wording. Another thing is that I think this is case-sensitive so if nothing happens then double-check that you have the correct row in there.
Sub Loop_Example() Dim Firstrow As Long Dim Lastrow As Long Dim Lrow As Long Dim CalcMode As Long Dim ViewMode As Long With Application CalcMode = .Calculation .Calculation = xlCalculationManual .ScreenUpdating = False End With 'We use the ActiveSheet but you can replace this with 'Sheets("MySheet")if you want With ActiveSheet 'We select the sheet so we can change the window view .Select 'If you are in Page Break Preview Or Page Layout view go 'back to normal view, we do this for speed ViewMode = ActiveWindow.View ActiveWindow.View = xlNormalView 'Turn off Page Breaks, we do this for speed .DisplayPageBreaks = False 'Set the first and last row to loop through Firstrow = .UsedRange.Cells(1).Row Lastrow = .UsedRange.Rows(.UsedRange.Rows.Count).Row 'We loop from Lastrow to Firstrow (bottom to top) For Lrow = Lastrow To Firstrow Step -1 'We check the values in the B column in this example With .Cells(Lrow, "B") If Not IsError(.Value) Then If .Value = "YES" Then .EntireRow.Delete 'This will delete each row with the Value "YES" 'in Column A, case sensitive. End If End With Next Lrow End With ActiveWindow.View = ViewMode With Application .ScreenUpdating = True .Calculation = CalcMode End With End Sub
9. Now here’s where we use the code from http://www.rondebruin.nl/delete.htm to find cells in row B that have a YES in them, and then delete the column. Copy the code above from “Sub Loop Example ()” to “End Sub” . The most important part of this code is the items that I have bolded. We basically tell it, if cell B has a “YES” in it then delete it. You can change these 2 values to adapt the code to your excel sheet. Another thing is that I think this is case-sensitive so if nothing happens then double-check that you have the correct Column in there and correct case-sensitive “YES”.
10. When you are confident that you have the correct values in there then you run the script by clicking the “Play” (run) button which is located on the top of the screen under the “debug” menu option (see below).
11. Did it run? Switch back to the excel document and you should see the following with the duplicates deleted.
12. Now you can just select the entire column B (where you had the YES’s previously and it now says “#REF!”) and delete the entire column. You should how have gotten rid of all your duplicates
In a real world scenario you probably have more than one column, all you have to do is insert a new blank row next to the column that you want to check for duplicates and place the formula there, then modify the VB Script above to use whatever cell it created, so instead of With .Cells(Lrow, “B”) use, say, With .Cells(Lrow, “D”) to check column D for the “YES”.
If you do not see that it deleted duplicate rows (#REF!) make sure that you have the correct column in the script, and that the case that you have entered in the formula in Column B is the same as in the part of the VB code that you modified to say “YES”
That should be it. Also visit his site http://www.rondebruin.nl/tips.htm for other tips and examples of doing tricky things in excel.