PHP Libraries Hate RAM
I’ve come to the conclusion that PHP libraries are simply designed to eat up RAM and do their very best to never spit it back out. There seems to be an assumption that everything will be built in RAM and then, at the last possible moment, dumped out to the browser.
Sadly, this doesn’t work if what you’re building in RAM happens to be a zip file containing a whole heap of images. There are a few zip libraries around for PHP but none of them can directly stream the created zip file back out to the browser. Most of them create the entire zip file in RAM and then tell you to just ‘echo zip->file();’ which is just plain crazy. Others can “stream” but only to disk where they have random access.
How has PHP gone this long without recreating the ever so useful ZipOutputStream from Java?

May 25th, 2009 at 9:25 am
Hi,
What about ext/zip, which allows you to create a stream from a zip entry? Creating an image from an archive is then as simple as:
$im = imagecreatefromjpeg('zip://pics2009.zip#image1.jpeg');
See http://www.php.net/manual/en/zip.examples.php
It does not support write mode for the streams but it’s coming. It is based on the well known libzip.
May 25th, 2009 at 9:29 am
oopsie, misread the post, coffee++ :)
But that’s still valid. ext/zip does not create the archive in RAM but on disk. You can then send it to the client (using X-Sendfile, for example).
libzip also supports Java’s zip stream (a single piece of data compressed using zip, like in SOAP).
May 25th, 2009 at 9:38 am
Yeah, creating on disk is better than RAM but still means creating a ton of temporary files and dealing with that mess. It would be much nicer to be able to simply stream it out to the browser directly. I’m clearly going to have to settle with temp files though.
May 25th, 2009 at 9:42 am
If you create a zip stream only (like java zip stream), yes, it would be better and it is technically possible. However if you create a zip archive, it is not possible to output it directly to the browser, for obvious reasons :)
May 25th, 2009 at 9:49 am
The Java ZipOutputStream correctly handles multiple entries – so new ZipOutputStream(new FileOutputStream(myFile)); will create a valid zip file. Alternatively you can do new ZipOutputStream(request.getOutputStream()); and it will go straight to the browser – with the right headers it will download as a valid zip file.
I’m looking for the PHP equivalent but can’t find one.
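For reference, here is a minimal sketch of the Java approach described above. The class name StreamingZipSketch and the helper methods are made up for illustration, and a ByteArrayOutputStream stands in for the servlet response stream so the example runs on its own. In a real servlet you would pass the response's output stream instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any library.
final class StreamingZipSketch {

    // Writes each named payload as its own zip entry directly to `out`.
    static void writeZip(Map<String, byte[]> files, OutputStream out) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(out);
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            zip.putNextEntry(new ZipEntry(file.getKey()));
            zip.write(file.getValue());
            zip.closeEntry();
        }
        // finish() writes the zip index but leaves the underlying stream open,
        // which is usually what you want for a stream owned by the container.
        zip.finish();
        zip.flush();
    }

    // Reads the archive back and returns the entry names, to check it is valid.
    static List<String> listEntries(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.txt", "first".getBytes(StandardCharsets.UTF_8));
        files.put("b.txt", "second".getBytes(StandardCharsets.UTF_8));

        // Stand-in for request.getOutputStream() / a response stream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeZip(files, sink);

        System.out.println(listEntries(sink.toByteArray())); // prints [a.txt, b.txt]
    }
}
```

The only zip-specific state held between entries is whatever ZipOutputStream keeps internally. The payloads themselves can come from any source and be written in chunks.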
May 25th, 2009 at 9:52 am
Yes, but are you sure that it does not create a temporary file?
It sounds very hard (impossible) to finalize an archive index when the archive content is not even known.
May 25th, 2009 at 11:12 am
I’m sure it doesn’t create a temporary file. When closing off the zip index you don’t need the file content – just some of the meta data (name, size, date etc). It’s pretty simple to store just that meta data in memory instead of the entire file so you can write the index at the end.
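One way to see this from the outside is to count bytes as they reach the underlying stream. The sketch below is illustrative only: ZipIndexSketch, CountingOutputStream and measure are made-up names, and the payload is random data so that compression does not hide the effect. It records how many bytes have already been emitted once the last entry is closed, and how many more arrive when the index is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demonstration class, not part of any library.
final class ZipIndexSketch {

    // Counts every byte passed through to the wrapped stream.
    static final class CountingOutputStream extends FilterOutputStream {
        long count;

        CountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            count += len;
        }
    }

    // Returns {bytes emitted before the index was written, total bytes emitted}.
    static long[] measure(int entries, int entrySize) throws IOException {
        CountingOutputStream sink = new CountingOutputStream(new ByteArrayOutputStream());
        ZipOutputStream zip = new ZipOutputStream(sink);

        byte[] payload = new byte[entrySize];
        new Random(42).nextBytes(payload); // random bytes barely compress

        for (int i = 0; i < entries; i++) {
            zip.putNextEntry(new ZipEntry("image" + i + ".bin"));
            zip.write(payload);
            zip.closeEntry();
        }
        long beforeIndex = sink.count; // all entry data has already gone out
        zip.finish();                  // only the index is written here
        return new long[] {beforeIndex, sink.count};
    }

    public static void main(String[] args) throws IOException {
        long[] result = measure(3, 64 * 1024);
        System.out.println("bytes emitted before the index: " + result[0]);
        System.out.println("bytes used by the index:        " + (result[1] - result[0]));
    }
}
```

With three 64 KB entries, the entry data accounts for nearly all of the output and the index adds only a small tail, which fits the idea that just the per-entry metadata has to be remembered until the end.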
May 25th, 2009 at 12:05 pm
Interesting, I’d like to take a look and see if I can add this kind of feature to ext/zip (I maintain it). Thanks for the discussion and suggestions!
May 25th, 2009 at 12:19 pm
Cool. The OpenJDK project provides the Java source code under GPL license so that’s probably your best bet. It’s also included with the JDK download, but under a proprietary license which at least in the past had some pretty overbearing clauses about tainting etc. Not sure if that got cleaned up at some point, but OpenJDK is a safe bet anyway.
May 25th, 2009 at 12:21 pm
Oh, I won’t read the sources, just see how it works from a runtime point of view. The PHP license is not GPL compatible, so I won’t taint my code with possible GPL code (lawyers--) :)
May 25th, 2009 at 12:32 pm
True, I don’t know why I was thinking PHP was GPL….
May 25th, 2009 at 8:11 pm
I have a lot of experience with this problem because the company I work for (Kink.com) offers .zip file downloads of all of our content. The issue is that it is nearly impossible to create a reliable streaming option for zip files, because you can’t know the final length of the file before the entire file has been created. The zip specification is crappy in this regard. You need this information to set the Content-Length: header so that downloads don’t look like they are unlimited in the browser download window. People like to have an idea of when a download will complete.
I tried implementing a dynamic streaming solution in PHP, Python and Java. PHP and Python have no clue about streaming (which you figured out already). They love to hold data in memory which is terrible for trying to create a large scale concurrent streaming system. Java does streaming really well, but it is impossible to know the total length of the resulting file until after it is created. On top of it, each file tends to get created with random sizes (even without compression), so you can’t cache this data anywhere.
We ended up punting on all of this and just create and cache the entire zip files. They get copied out to our CDN with the rest of our video/picture data. It doubled our storage requirements, but what’s a few extra TB these days. =)
May 25th, 2009 at 8:40 pm
Ah good point Jon. I hadn’t thought about the lack of a known content length for the browser’s progress bar. That makes me feel better about having to use temporary files. The zip files that are created are just thrown out once the download finishes in this case, since they’re fairly highly customised. Depending on traffic/load we can always investigate a caching layer to keep them around for longer.
May 26th, 2009 at 8:26 am
Right Jon, those were my thoughts too.
About PHP having no clue about streams, I think you should really consider actually reading the docs about the stream API in PHP (userland or internals). It is one of the most powerful out there. The problem is that some extensions do not rely on it (due to the underlying libraries, for example), but the stream API itself in PHP is great, try it.
May 26th, 2009 at 8:38 am
Yeah, I’ve read the docs on the stream support, it’s pretty good, but as you mentioned a lot of stuff doesn’t use it, which makes it far less useful. The Java stream stuff is long winded (more so than the PHP stuff) but because it’s been there from day one and most of the Java libraries were developed purely in Java rather than using existing C libraries, it’s very comprehensively used.
Such is life.