One of those simple but indispensable tricks
Back to a “real” post now after all the book stuff, but it’s going to be a short one while I get back on my feet.
Let’s say you have multiple worksheets (or workbooks) that all contain the same sort of data:
Multiple Worksheets (or Workbooks), All Contain The Same Type of Data
You Want to Combine ALL of Them Into a Single PowerPivot Table
These worksheets all come to you separately, but really you just want them as one big table.
Naturally, if it’s a small number of sheets, and each sheet isn’t massive, you can just copy paste them all into one table in Excel, then copy/paste into PowerPivot, or link the table into PowerPivot, or export as CSV so you can import it.
And you could also use Paste Append to directly paste into PowerPivot.
But if the combined data set exceeds 1 million rows, you won’t be able to combine the sheets into one – you will exceed the worksheet row limit. And a data set of that size is not something you can paste into PowerPivot directly with Paste Append – pasting large data sets into PowerPivot takes forever, if it completes at all.
Here’s what I do when I find myself in this position:
1) Remove the Header Row
Delete the Header Row – We Don’t Want The Header
Duplicated Multiple Times in the Combined Data Set
My goal is to have the data start in cell A1:
Important that the Data Starts in A1
2) Save each sheet as CSV.
When you save as CSV, it saves only the active sheet, so you have to repeat this for each sheet (or each workbook).
This yields, for instance:
Now you Have a Bunch of Individual CSV Files
3) Open a Command Prompt
Start –> All Programs –> Accessories
Yielding:
Yep, That’s a Cmd Prompt
4) Change Directory to the Folder Containing Your CSV Files
This Isn’t a Cmd Prompt Tutorial, but the CD Command Changes Folder
5) Copy Them All Into One File
Copy *.csv combined.csv
This part is really the only “trick” in the whole process.
This command merges all of your CSV files into a single CSV file. If I look back in Windows Explorer now:
6) Now Import that CSV into PowerPivot
Last Step
Bonus: Macro to Save All Workbooks in a Folder as CSV
If you have a bunch of workbooks in a folder and you need to save them all as CSV, here’s a macro that will do it all for you:
Sub ProcessWorkbooksInFolder()
Dim sPath As String
Dim sFile As String
Dim sDir As String
Dim oWB As Workbook
‘***Set this to your folder of workbooks***
sPath = “C:\Users\rob\desktop\csvfiles”
If Right$(sPath, 1) <> “\” Then sPath = sPath & “\”
sDir = Dir$(sPath & “*.xlsx”, vbNormal)
Do Until LenB(sDir) = 0
Set oWB = Workbooks.Open(sPath & sDir)
SaveAsCSV
oWB.Close False
sDir = Dir$
Loop
End Sub
Sub SaveAsCSV()
Dim sNewPath As String
sNewPath = GetFileName(ActiveWorkbook.FullName)
sNewPath = sNewPath & “.csv”
ActiveWorkbook.SaveAs Filename:=sNewPath, FileFormat:=xlCSV, CreateBackup:=False
End Sub
Function GetFileName(sFullPath As String) As String
GetFileName = Replace(sFullPath, “.xlsx”, “”, 1, -1, vbTextCompare)
GetFileName = Replace(sFullPath, “.xlsb”, “”, 1, -1, vbTextCompare)
GetFileName = Replace(sFullPath, “.xlsm”, “”, 1, -1, vbTextCompare)
GetFileName = Replace(sFullPath, “.xls”, “”, 1, -1, vbTextCompare)
End Function



Great info, Rob! Do you also know how to break a csv file into a gruop of smaller files (without opening it)?
It is very odd that this action (combining datasets) isn´t a native part of PowerPivot. Consolidation of data must be one of the most common work-tasks for the average Exceljockey.
I know that Kasper de Jong posted another way around this problem a while back using Excel´s data connection as the method.
Best / Erik
Nice VBA. I’ve been using that DOS command to combine my UPS Billing files for each week into one large csv for PowerPivot. Would be nice if PowerPivot could do an incremental update each week instead of importing all of the data even though the only data changed is in the newest file.
For splitting large files, you can try this macro.
http://answers.microsoft.com/en-us/office/forum/office_2007-customize/vba-code-to-split-text-file/32e26d6c-7422-46df-b9c1-741d9319de95
Otherwise try searching for text file splitter.
cheers/Andrew
Thanks. I’ll try it out.
When confronted with three large spreadsheets, I’ve found creating linked tables to accomplish the first step of the goal. Then defining links between them provides a means of joining the tables. Or do i not underatand what you were trying to create?
If you are more comfortable working within powerpivot and know a little SQL then the following post is quite useful : http://www.contextures.com/PowerPivot-Identical-Excel-Files.html
plus you don’t have to ‘save as’ with each file and use a cmd window.
the example works fine but with my sheets I did not succeed. Although the sheets have identical headers however different number of rows connection was not possible. Is there a limit to the number of columns?
This is pretty great. However, the remove header and/or saving each csv filter is still a manual process for each sheet. Group all sheets and deleting the row work might work for the remove header.
Any way to quickly save all sheets into separate csv files without a macro?
Is there any way of doing this for Non-excel databases? I use Dynamics NAV & want to merge the Sales Line, Sales Invoice Line & Sales Cr.Memo line tables before creating a report from the combined table.
Many thanks in advance
Rob, this is superb. It is very nice and simple way of doing it. I exercised it with .csv files 2; 3 and up to 4 GB. Now, I hit 4,2 GB and it is not working. I have 16 GB RAM, so I can assume that local memory is not an issue… Any ideas why if .csv > 4GB it is impossible to import CSV into PowerPivot? Thank you.
When does it fail? On import into PowerPivot, or on creation of the combined CSV?
Also, are you running 32 or 64-bit Excel? If 32 bit, Excel can only use something like 2 GB of RAM, no matter how much RAM is on your computer. I prefer 64-bit Excel purely because of PowerPivot.
But if your office version is 32-bit it is apparently not possible to install the 64-bit add-in? Our IT department recently installed the powerpivot add-in however it does not work correctly, e.g. i cannot get data from different sources into powerpivot, only by copy and paste.
How do I find out if they installed the correct 32-bit add-in or is there a way to test it?
Hi, thank you for your reply. it fails when I tried to import CSV into PowerPivot. This particular CSV can hold 4,02 GB (app 16,5M lines X 36 columns)- no problem. I have 64-bit excel.
I am not a pro and start exercising PowerPivot only few months ago, but fail sharply after 4 GB looks really strange and therefore I refer to RAM-issue…
36 columns is a lot. Do you need all of those? Can you import just a subset of them? PowerPivot has no problem with massive row counts but columns are a killer with large data sets. Also, if you have calculated columns added to that table (in PowerPivot), try deleting them, then importing the data, then adding the columns back. That sometimes works.
Thank you for your comment. I followed your suggestion:
1- Decrease number of columns from 35 to 28. Result- same file is not 4,2 GB it is now 3,37 GB.
2- Load .csv as text to PowerPivot- no problem. Now even faster.
Result: that particular problem- solved.
3- Increase .csv with 28 columns by *2 lines. Result- 28 columns X 30M lines, .csv=6,6GB
4- Load 6,6 GB.csv as text to PowerPivot. Same error as before. “an error occurred while loading the file” -> table import wizard -> unhandled exception occurred in a component in your application… object reference not set to an instance of an object”
I think it is obvious that it is impossible (in my case) to load .csv>4,00 GB into PowerPivot.
Why did I start using .csv solution?
Previous exercise with PowerPivot: -> load data via sql directly from source (ORACLE). Very heavy sql query with many tables. Case: query should be completed between 05.00 and 23.00 (tables updated between 23.00 and 05.00). in given time I can load max 4M lines, so combining multiple workbooks seems like a good alternative. Run sql in excel-> create data workbooks with <1M lines:
Option1 – “UNION ALL” SQL http://www.contextures.com/PowerPivot-Identical-Excel-Files.html for same reasons as above can’t combine more than 4 GB of excel books (in this case max 6 books or 5M lines X 36 columns;
Option2- use Rob’s example. Can combine more compressed .csv files (15M lines X 36 columns or app 20-22M lines X 28 columns.
Conclusion: (in my case) no matter how I doing it in 64-bit excel I am hitting the wall after 4 GB data source.
Question: how to load more lines with 28 columns within max 4GB data file? So fare .csv is the best, but kind of not good enough for my requirements.