Combine Multiple Worksheets/Workbooks into a Single PowerPivot Table

One of those simple but indispensable tricks

Back to a “real” post now after all the book stuff, but it’s going to be a short one while I get back on my feet.

Let’s say you have multiple worksheets (or workbooks) that all contain the same sort of data:

image  image  image

Multiple Worksheets (or Workbooks), All Contain The Same Type of Data

You Want to Combine ALL of Them Into a Single PowerPivot Table

These worksheets all come to you separately, but really you just want them as one big table.

Naturally, if it’s a small number of sheets, and each sheet isn’t massive, you can just copy paste them all into one table in Excel, then copy/paste into PowerPivot, or link the table into PowerPivot, or export as CSV so you can import it.

And you could also use Paste Append to directly paste into PowerPivot.

But if the combined data set exceeds 1 million rows, you won’t be able to combine the sheets into one – you will exceed the worksheet row limit.  And a data set of that size is not something you can paste into PowerPivot directly with Paste Append – pasting large data sets into PowerPivot takes forever, if it completes at all.

Here’s what I do when I find myself in this position:


1) Remove the Header Row

image

Delete the Header Row – We Don’t Want The Header
Duplicated Multiple Times in the Combined Data Set

My goal is to have the data start in cell A1:

image

Important that the Data Starts in A1

2) Save each sheet as CSV.

When you save as CSV, it saves only the active sheet, so you have to repeat this for each sheet (or each workbook).

This yields, for instance:

image

Now you Have a Bunch of Individual CSV Files

3) Open a Command Prompt

image

Start –> All Programs –> Accessories

Yielding:

image

Yep, That’s a Cmd Prompt

4) Change Directory to the Folder Containing Your CSV Files

image

This Isn’t a Cmd Prompt Tutorial, but the CD Command Changes Folder

5) Copy Them All Into One File

image

Copy *.csv combined.csv

This part is really the only “trick” in the whole process.

This command merges all of your CSV files into a single CSV file.  If I look back in Windows Explorer now:

image

 

6) Now Import that CSV into PowerPivot

image

Last Step

Bonus:  Macro to Save All Workbooks in a Folder as CSV

If you have a bunch of workbooks in a folder and you need to save them all as CSV, here’s a macro that will do it all for you:

Sub ProcessWorkbooksInFolder()

    Dim sPath As String
    Dim sFile As String
    Dim sDir As String
    Dim oWB As Workbook
   
    ‘***Set this to your folder of workbooks***
    sPath = “C:\Users\rob\desktop\csvfiles”
   
    If Right$(sPath, 1) <> “\” Then sPath = sPath & “\”
    sDir = Dir$(sPath & “*.xlsx”, vbNormal)
   
    Do Until LenB(sDir) = 0
        Set oWB = Workbooks.Open(sPath & sDir)

        SaveAsCSV       
       
        oWB.Close False
        sDir = Dir$
    Loop

End Sub

Sub SaveAsCSV()
   
    Dim sNewPath As String
    sNewPath = GetFileName(ActiveWorkbook.FullName)
    sNewPath = sNewPath & “.csv”
    ActiveWorkbook.SaveAs Filename:=sNewPath, FileFormat:=xlCSV, CreateBackup:=False

End Sub

Function GetFileName(sFullPath As String) As String
    GetFileName = Replace(sFullPath, “.xlsx”, “”, 1, -1, vbTextCompare)
    GetFileName = Replace(sFullPath, “.xlsb”, “”, 1, -1, vbTextCompare)
    GetFileName = Replace(sFullPath, “.xlsm”, “”, 1, -1, vbTextCompare)
    GetFileName = Replace(sFullPath, “.xls”, “”, 1, -1, vbTextCompare)
End Function

23 Responses to Combine Multiple Worksheets/Workbooks into a Single PowerPivot Table

  1. David Hager says:

    Great info, Rob! Do you also know how to break a csv file into a gruop of smaller files (without opening it)?

  2. Erik Olsson Dibbern says:

    It is very odd that this action (combining datasets) isn´t a native part of PowerPivot. Consolidation of data must be one of the most common work-tasks for the average Exceljockey.

    I know that Kasper de Jong posted another way around this problem a while back using Excel´s data connection as the method.

    Best / Erik

  3. Donald Parish says:

    Nice VBA. I’ve been using that DOS command to combine my UPS Billing files for each week into one large csv for PowerPivot. Would be nice if PowerPivot could do an incremental update each week instead of importing all of the data even though the only data changed is in the newest file.

  4. Andrew Sears says:

    For splitting large files, you can try this macro.
    http://answers.microsoft.com/en-us/office/forum/office_2007-customize/vba-code-to-split-text-file/32e26d6c-7422-46df-b9c1-741d9319de95

    Otherwise try searching for text file splitter.

    cheers/Andrew

  5. Bob Pappas says:

    When confronted with three large spreadsheets, I’ve found creating linked tables to accomplish the first step of the goal. Then defining links between them provides a means of joining the tables. Or do i not underatand what you were trying to create?

  6. dan says:

    If you are more comfortable working within powerpivot and know a little SQL then the following post is quite useful : http://www.contextures.com/PowerPivot-Identical-Excel-Files.html
    plus you don’t have to ‘save as’ with each file and use a cmd window.

    • SAP user says:

      the example works fine but with my sheets I did not succeed. Although the sheets have identical headers however different number of rows connection was not possible. Is there a limit to the number of columns?

  7. David says:

    This is pretty great. However, the remove header and/or saving each csv filter is still a manual process for each sheet. Group all sheets and deleting the row work might work for the remove header.

    Any way to quickly save all sheets into separate csv files without a macro?

  8. Paddy says:

    Is there any way of doing this for Non-excel databases? I use Dynamics NAV & want to merge the Sales Line, Sales Invoice Line & Sales Cr.Memo line tables before creating a report from the combined table.

    Many thanks in advance

  9. Alex says:

    Rob, this is superb. It is very nice and simple way of doing it. I exercised it with .csv files 2; 3 and up to 4 GB. Now, I hit 4,2 GB and it is not working. I have 16 GB RAM, so I can assume that local memory is not an issue… Any ideas why if .csv > 4GB it is impossible to import CSV into PowerPivot? Thank you.

    • powerpivotpro says:

      When does it fail? On import into PowerPivot, or on creation of the combined CSV?

      Also, are you running 32 or 64-bit Excel? If 32 bit, Excel can only use something like 2 GB of RAM, no matter how much RAM is on your computer. I prefer 64-bit Excel purely because of PowerPivot.

      • SAP user says:

        But if your office version is 32-bit it is apparently not possible to install the 64-bit add-in? Our IT department recently installed the powerpivot add-in however it does not work correctly, e.g. i cannot get data from different sources into powerpivot, only by copy and paste.
        How do I find out if they installed the correct 32-bit add-in or is there a way to test it?

  10. Alex says:

    Hi, thank you for your reply. it fails when I tried to import CSV into PowerPivot. This particular CSV can hold 4,02 GB (app 16,5M lines X 36 columns)- no problem. I have 64-bit excel.
    I am not a pro and start exercising PowerPivot only few months ago, but fail sharply after 4 GB looks really strange and therefore I refer to RAM-issue…

    • powerpivotpro says:

      36 columns is a lot. Do you need all of those? Can you import just a subset of them? PowerPivot has no problem with massive row counts but columns are a killer with large data sets. Also, if you have calculated columns added to that table (in PowerPivot), try deleting them, then importing the data, then adding the columns back. That sometimes works.

  11. Alex says:

    Thank you for your comment. I followed your suggestion:
    1- Decrease number of columns from 35 to 28. Result- same file is not 4,2 GB it is now 3,37 GB.
    2- Load .csv as text to PowerPivot- no problem. Now even faster.
    Result: that particular problem- solved.
    3- Increase .csv with 28 columns by *2 lines. Result- 28 columns X 30M lines, .csv=6,6GB
    4- Load 6,6 GB.csv as text to PowerPivot. Same error as before. “an error occurred while loading the file” -> table import wizard -> unhandled exception occurred in a component in your application… object reference not set to an instance of an object”
    I think it is obvious that it is impossible (in my case) to load .csv>4,00 GB into PowerPivot.
    Why did I start using .csv solution?
    Previous exercise with PowerPivot: -> load data via sql directly from source (ORACLE). Very heavy sql query with many tables. Case: query should be completed between 05.00 and 23.00 (tables updated between 23.00 and 05.00). in given time I can load max 4M lines, so combining multiple workbooks seems like a good alternative. Run sql in excel-> create data workbooks with <1M lines:
    Option1 – “UNION ALL” SQL http://www.contextures.com/PowerPivot-Identical-Excel-Files.html for same reasons as above can’t combine more than 4 GB of excel books (in this case max 6 books or 5M lines X 36 columns;
    Option2- use Rob’s example. Can combine more compressed .csv files (15M lines X 36 columns or app 20-22M lines X 28 columns.
    Conclusion: (in my case) no matter how I doing it in 64-bit excel I am hitting the wall after 4 GB data source.
    Question: how to load more lines with 28 columns within max 4GB data file? So fare .csv is the best, but kind of not good enough for my requirements.

  12. Jean-Pierre Heileman says:

    Hello
    There is à problème with importing data from CSV file when a text column contains both numeric & text values : if there are more numeric values then text value are loss and vice et versa.
    I tried to use “schema.ini” to force text columns to be text but it doesnt work …
    does somebody had solve this problème ?

  13. Dennis Sevilla says:

    How do you edit the CSV if it’s great than 1M rows and you want to delete the header before combining it?

    Also, is there a way to import the data into an existing powerpivot model? For ex. I have monthly data that I need to refresh every month in an existing powerpivot model but I’m not sure how to update it without creating a new table and remapping all the formulas.

    • Chris Gilbert says:

      I use a text editor (EverEdit (free version)) which can load a .CSV file FAST! Remove the top line, re-save, and you’re good to go!

      • Dennis Sevilla says:

        Thanks! Do you know what the max # of rows EverEdit can adjust? I tried Notepad++ and it capped out.

        Re: PowerQuery, I don’t think I can do that for this use case since it’s just a bunch of 5M+ row files that I need to combine and see in one table…

  14. Chris Gilbert says:

    You might want to try the latest PowerQuery Add-in for Excel (from Microsoft, and now part of their Power BI offering.) Do a web search for the download location.

    • Chris Gilbert says:

      With PowerQuery, you should change the type from “number” to “text” of your mixed-type field (like the Column3 entry below) in order to force it to load even the all-numeric entries as text, and this should eliminate the data loss issues. (I had tried the schema.ini route, but found it to not be reliable.)

      Table.TransformColumnTypes(Source,{{“Column1″, type number}, {“Column2″, type number}, {“Column3″, type text},…

Leave a Reply