If you work with a BI tool such as MicroStrategy, I’m sure at some point you have come across the need to split large PDF files / reports into smaller separate files.
I have searched far and wide and the only solutions I have found require you to purchase a tool. One such tool is PDF Splitter. It is not very expensive but if you need to split a lot of large files, it can be slow.
I have developed a script in Python that I use in a production environment and it is much faster than an application like PDF Splitter, not to mention free.
You will need to install the Python 3 interpreter, which is a free download from www.python.org. You will also need the PyPDF2 package, which you can download from here. To install the PyPDF2 package, first extract the contents, then browse to the extracted directory from a command prompt and run the following command: python setup.py install.
You can download the PDF Splitter Python code from my GitHub repository here.
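To give a feel for the approach before you download anything, here is a minimal sketch of the core idea: the PDF is cut at its bookmarks, so each bookmark's section runs from its own start page up to the page before the next bookmark. This is not the code from the repository. The function name `bookmark_page_ranges` and the input shape (a list of title and 0-based start page pairs) are assumptions made for illustration. Reading the bookmarks and copying the pages would be done with PyPDF2, and those calls are left out here because their names differ between PyPDF2 versions.

```python
def bookmark_page_ranges(bookmarks, total_pages):
    """Turn bookmark start pages into page ranges.

    bookmarks   -- list of (title, start_page) tuples, start_page is 0-based
                   (hypothetical input shape, assumed for this sketch)
    total_pages -- number of pages in the source PDF

    Returns a list of (title, start, end) tuples where end is exclusive.
    """
    # Sort by start page so each section ends where the next one begins.
    ordered = sorted(bookmarks, key=lambda item: item[1])
    ranges = []
    for index, (title, start) in enumerate(ordered):
        if index + 1 < len(ordered):
            end = ordered[index + 1][1]
        else:
            end = total_pages  # the last section runs to the end of the file
        # Skip a bookmark that shares its start page with the next one,
        # since it would produce an empty file.
        if end > start:
            ranges.append((title, start, end))
    return ranges
```

Each resulting range would then become one output file named after its bookmark title.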
The Python script takes 4 parameters.
- Source PDF File: This is the source PDF file that you want to split
- Output Directory: This is the output directory where the split files are placed
- Output Name Prefix: This is a naming prefix that will be prepended to the name of each file. The script automatically names the files after the bookmarks. So for example, if your PDF file has a bookmark labeled “EAT_AT_JOES_STORE_1” and you enter an output prefix of “TEST_RUN – ”, the splitter will name the resulting file “TEST_RUN – EAT_AT_JOES_STORE_1.pdf”.
- Delete Source File: This is a True/False Boolean value that will delete the source file if set to True
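The four parameters above could be handled roughly like this. It is a minimal sketch, not the actual script: the helper names (`parse_bool`, `parse_args`, `output_path`) are hypothetical, and the replacement of characters that Windows does not allow in file names is my own assumption about what a bookmark title might need.

```python
from pathlib import Path


def parse_bool(text):
    """Interpret a command-line string such as 'True' or 'false' as a bool."""
    value = text.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise ValueError(f"expected True or False, got {text!r}")


def parse_args(argv):
    """Unpack the four positional parameters (argv excludes the script name)."""
    if len(argv) != 4:
        raise SystemExit(
            "usage: Split_PDF_Reports.py SOURCE_PDF OUTPUT_DIR PREFIX DELETE_SOURCE"
        )
    source_pdf, output_dir, prefix, delete_source = argv
    return Path(source_pdf), Path(output_dir), prefix, parse_bool(delete_source)


def output_path(output_dir, prefix, bookmark_title):
    """Build the output file path: <output_dir>/<prefix><bookmark title>.pdf"""
    # Assumption: replace characters that Windows forbids in file names.
    safe_title = "".join(
        "_" if ch in '<>:"/\\|?*' else ch for ch in bookmark_title
    )
    return Path(output_dir) / f"{prefix}{safe_title}.pdf"
```

In a real script, `parse_args(sys.argv[1:])` would supply the values, and the source file would only be deleted after all output files were written successfully.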
I have this script running automatically in a production environment using Windows Task Scheduler. I create the task, pass the appropriate parameters and schedule it to run accordingly and it works great!
If you need help scheduling this to run automatically in Windows, here is how I did it:
- Create a batch file (for example Execute_Split_PDF_Reports.bat)
- Add the following line of code to the batch file: "C:\Program Files (x86)\Python35-32\python.exe" <Path to the Python script>\Split_PDF_Reports.py %1 %2 %3 %4 (Note: The values %1, %2, etc. are the parameters that will be passed from the Scheduled Task we will create in later steps. Use plain straight quotes around the interpreter path, not curly quotes.)
- Replace the <Path to the Python script> portion of the above code with the path to where you saved the Split_PDF_Reports.py Python script. (Also, if you installed the Python interpreter anywhere other than the default directory, you may need to edit that path as well.)
- Create a new Scheduled Task by going to Start->Control Panel->Administrative Tools->Task Scheduler
- Under the General tab, make sure you choose the option “Run whether user is logged on or not”; otherwise the task will not run unless a user is actively logged in. If this is a production server, you definitely want this option selected.
- Go to the Triggers tab and set up the schedule you want.
- Go to the Actions tab and click “New…”. In the dropdown labeled “Action”, choose “Start a program”. Then, next to the text box labeled “Program/script”, click Browse and choose the batch file we created in the first step. Next, in the “Add arguments (optional)” text box, add the 4 parameter values for the Python script, separated by spaces (Note: If your parameter values contain spaces, make sure you surround them with quotes). Finally, enter the path where the batch file is located in the optional “Start in” text box. For example, if the batch file is located in the folder C:\Programs\BatchFiles, then that folder path is what you would enter in this field.
- Click OK, then OK again in the main window, and you’re done!
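If you are unsure how the argument string for the scheduled task should look, a small helper like the one below can build it for you. This is only a sketch: the function names and the example paths are made up for illustration, and it handles the common case of values containing spaces, not values that themselves contain quote characters.

```python
def quote_if_needed(value):
    """Wrap a value in double quotes when it is empty or contains a space."""
    if value == "" or " " in value:
        return f'"{value}"'
    return value


def build_arguments(values):
    """Join the parameter values into one space-separated argument string."""
    return " ".join(quote_if_needed(v) for v in values)


# Hypothetical example values for the four parameters.
example = build_arguments([
    r"C:\Reports\Daily Sales.pdf",  # source PDF (contains a space)
    r"C:\Reports\Split",            # output directory
    "TEST_RUN - ",                  # output name prefix (contains spaces)
    "False",                        # delete source file
])
print(example)
```

The printed string is what you would paste into the arguments text box of the scheduled task.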
Please leave your comments below. Enjoy!