Concatenate images with OpenCV

The PDF file format has several advantages. When you generate a PDF file, you can be certain that it will be displayed the same way on every platform. In print, it will look the same everywhere.
On the other hand, the PDF format isn't easily editable. The choices made by the file viewer can also be counterproductive in some specific scenarios. When displaying full-page images from comics, for example, the resolution of the image has nothing to do with the resolution of the PDF or the display, so some pixelation can occur.
Scientific publications in PDF (or those same comics) sometimes embed infographics, charts and pictures that you may need to reuse.
The pdfimages tool can extract JPEG images embedded in a PDF file (it also handles other image formats that we aren't going to use here). You can find a precompiled Win32 binary here: http://www.foolabs.com/xpdf/download.html. You can keep only the pdfimages.exe file and delete everything else. When it saves the images, the tool doesn't 'reinterpret' them, which is important for JPEG images: the JPEG data is written out as-is, so there is no quality loss from an extra compression/decompression cycle.
The option to extract the files as JPEG is -j:
pdfimages.exe -j "pdf_file_name" 0
To keep the numbering of the saved files the same as in the PDF file, you need to provide a prefix for the saved files. You can use whatever you want, of course. Here, I use '0'.
OpenCV to the rescue

Some tools split images when embedding them in a PDF. You can't see it on screen, but when the images are dumped from the PDF, they are extracted in several pieces.
I coded a little tool with OpenCV that takes all the images in a folder and concatenates them into logical 'pages'. OpenCV is overkill for a job like that, but using it means you can later add effects (for example) to the images.
To read folders and files in a cross-platform way (for a future Linux port, for example), I use a simple header: tinydir.h (by Cong Xu and Baudoin Feildel, available here: https://github.com/cxong/tinydir).
To handle different cases of image splitting, the tool accepts the number of successive images to reassemble as a parameter. For example, to concatenate the images 5 by 5, pass 5 as that parameter on the command line.
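As a rough illustration of the grouping step (not the tool's actual source), here is a minimal C++ sketch that splits a list of extracted file names into consecutive groups of N, each group standing for one output 'page'. The function name groupIntoPages is hypothetical, and the sketch assumes the file names are zero-padded the way pdfimages writes them, so that sorting them alphabetically keeps the extraction order.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not taken from the tool's source.
// Splits `files` into consecutive groups of `groupSize` names; each group
// would then be concatenated into one page. The last group may be shorter
// if the number of files is not a multiple of `groupSize`.
std::vector<std::vector<std::string> > groupIntoPages(
    std::vector<std::string> files, std::size_t groupSize)
{
    std::vector<std::vector<std::string> > pages;
    if (groupSize == 0) {
        return pages; // no sensible grouping, return nothing
    }

    // Assumption: names are zero-padded (0-000.jpg, 0-001.jpg, ...),
    // so alphabetical order matches extraction order.
    std::sort(files.begin(), files.end());

    for (std::size_t i = 0; i < files.size(); i += groupSize) {
        const std::size_t end = std::min(i + groupSize, files.size());
        pages.push_back(std::vector<std::string>(files.begin() + i,
                                                 files.begin() + end));
    }
    return pages;
}
```

With five files and a group size of 2, this yields three groups: two full ones and a last one holding the single remaining file. Whether a short final group should be saved or dropped is a design choice the sketch leaves open.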
By default, the tool is going to save the pictures sequentially from 000.jpg to 999.jpg (you can edit the code if you need to go over that).
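If you need more than 1000 pages, the file naming is the part to change. The following is only a sketch of how such zero-padded names could be built with the standard library; pageFileName and its digits parameter are hypothetical names, not the ones used in the tool.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: builds a zero-padded output name such as "042.jpg".
// `digits` is the minimum number of digits; 3 matches the 000.jpg to
// 999.jpg range, 4 would allow sorted names up to 9999.jpg.
std::string pageFileName(unsigned int index, int digits = 3)
{
    std::ostringstream name;
    name << std::setw(digits) << std::setfill('0') << index << ".jpg";
    return name.str();
}
```

Note that the width is only a minimum: an index of 1000 with three digits still produces a valid name (1000.jpg), but it no longer sorts correctly next to the three-digit names, which is why widening the padding is the safer change.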
For the moment, the tool works only with JPEG files (with the .jpg extension). You need to add checks for other file formats if you want to concatenate PNG or other images.
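One possible way to add those checks is to compare the file extension, lower-cased, against a list of accepted extensions. This is a sketch under my own assumptions: lowerExtension and hasAcceptedExtension are hypothetical helpers, and the accepted list is whatever you decide to support.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns the extension of a bare file name
// (no directory part), lower-cased and including the dot,
// or an empty string if the name has no dot.
std::string lowerExtension(const std::string& fileName)
{
    const std::string::size_type dot = fileName.find_last_of('.');
    if (dot == std::string::npos) {
        return std::string();
    }
    std::string ext = fileName.substr(dot);
    for (std::string::size_type i = 0; i < ext.size(); ++i) {
        ext[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(ext[i])));
    }
    return ext;
}

// Hypothetical helper: true if the file's extension is in `accepted`.
// Entries of `accepted` are expected in lower case, with the dot (".jpg").
bool hasAcceptedExtension(const std::string& fileName,
                          const std::vector<std::string>& accepted)
{
    const std::string ext = lowerExtension(fileName);
    return std::find(accepted.begin(), accepted.end(), ext) != accepted.end();
}
```

With an accepted list of ".jpg", ".jpeg" and ".png", a name like SCAN.JPG is kept while notes.txt or a file without extension is skipped. Matching on the extension only is a shortcut: it trusts the file name and doesn't check the actual content of the file.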
The code isn't completely clean, but it provides a good base to start from. For example, the proper way to handle memory would be to free the temporary images held in RAM by the vector. However, the tool is fast and is only meant to run for a short time (it shouldn't stay in memory for long). The memory allocated for the program will be properly freed by the OS at the end of execution.
The code compiles with OpenCV 2.4.9 and Visual Studio 2012, but it should work with any OpenCV version from 2.0 onward and a fairly recent Visual Studio (anything newer than VS 2005). The linked libraries are declared with #pragma comment(lib,"") at the top of the main cpp file. You need to change those lines to match your installation. (Declaring the libraries this way is helpful when several versions of OpenCV are installed side by side on the system.)
The code is available on GitHub: https://github.com/Pseudopode/OpenCVConcat, along with an archive containing the compiled binary. The same archive is available here: https://dl.dropboxusercontent.com/u/1412774/Blog/BlogSpot/OpenCV_Concat/OpenCVConcat.1.0.7z.