OK, maybe not frequently. And I might not have actually been asked them by anyone. But they are questions.
No, really, why?
Because it made sense to me. It’s great having magazines preserved as digital scans, but without the aid of an online index or a lot of time, finding content is difficult. OCR’ing means most of the text is now searchable, and using the “Find” utility built into every PDF reader, it’s possible to find pretty much exactly what you’re looking for. Better still, most modern operating systems have “full text indexing”, so you can search across multiple files for any given word or phrase. The first mention of the SAM Coupe? The first review of Tetris? Just bang it into Spotlight on the Mac and off it goes.
Why are the files so big?
PREVIOUS ANSWER: Because they have to be. As I’m taking fairly low-quality JPG images, decompressing them and then compressing them again, it’s difficult to maintain the image quality without introducing a ton of artefacts (i.e. crap) into the image. Plus all the text and layout information takes up space too.
However, I’ve got the process more or less as good as it’s going to get now, so each issue of each mag shouldn’t be more than 10% bigger than all the source JPGs put together. Plus you get all of that OCR’d goodness.
NEW, BETTER ANSWER: I’m now using different OCR software which doesn’t recompress the images. The main benefit is that image quality isn’t affected, but it also means each file should only be a little bigger than the source images it was built from.
Some people have commented that the image quality of some of the collections is noticeably worse than that of the raw JPGs. This is certainly true in places, but please remember that some source scans are pretty ropey to start with. I’ve also done some “blind taste tests” with random pages from different collections, and I’m happy the quality is mostly fine. This only affects collections released before December 2011, as I won’t be recompressing anything for future releases. Please feel free to contact me if you think any individual issues or entire older collections could do with a re-encode – I’m happy to spend the time if necessary.
But some of the files are HUGE
Er, yes. The first few torrents I put together were much larger than they needed to be, as I hadn’t yet refined the whole compression process. I’m working through repacking the first load, so eventually the original collections will be retired. Apologies to everyone who spent time and bandwidth downloading and seeding these – the new collections are exactly the same, just half the size (or smaller).
But the files are still too big for me, I only want a couple of issues
No problem, each issue is an individual PDF – just set your torrent client to only download the individual files you want.
Why not use ComicBookReader (.CBZ/.CBR) format?
ComicBookReader is great, but it’s not as seamless or as ubiquitous as PDF. Although the files would be a bit smaller (due to extra compression), they wouldn’t show up in things like OS X Cover Flow. To me, the convenience of being able to open them on the iPad and in OS X Preview makes the extra few MB of disk space worth it. If you disagree, I’m sure there are several utilities available that will convert PDFs into ComicBookReader format (I haven’t tried any of them; please let me know if you have a favourite and I’ll pass on the recommendation).
I’m trying to “copy” the text from the magazines, and the results are a bit rubbish
Yeah, they are. The OCR has done its best, but low-res scans, random fonts, and some frankly criminal use of colour and layout (it was the ’80s, I s’pose) mean that you’re lucky if more than a paragraph per page has been OCR’d “perfectly”. However, my aim was to create a locally SEARCHABLE archive (in my case, using OS X’s Spotlight), rather than something that could be used for grabbing the text – the underlying text is just for searching, leaving the image for human eyes to decode perfectly 🙂
Did you scan all of these mags yourself?
Good God no. They’ve been leeched, snaffled and spirited from various different sites around the Interweb. A fair chunk of them came from the sublime World of Spectrum site; sincere thanks to Martijn for lifting the bandwidth limits for me. Others came from torrent sites of varying degrees of respectability. Kudos to those who spent the time and energy scanning the original mags – I know from personal experience how much “fun” this is. I’ve tried to credit the original scanners where I can – I’ve seen some sites (like archive.org) redistributing the scans and crediting them as “scanned by Ken D”, and it’s just not true; I’m just the OCR’ing man. I’m not one to take credit where it’s not due, honest!
The only ones that are completely “my own work” are the Amiga Shopper scans. I do have plans to do my own scans of Amiga Format and some Amiga Powers – watch the main blog page for any news on this.
Is this legal?
Probably not. Don’t tell the police. But I’d hope that whoever owns the copyrights on these mags respects that no-one’s making a profit out of their property; these mags are presented for anyone who enjoyed them the first time around and can use them to bore younger relatives with pictures of computers with tape drives and single-button joysticks.
Can I redistribute the collections?
Help yourself. It’d be nice if you left the OCR’ing credit with me, and please don’t sell them anywhere (including eBay), but otherwise do what you like with them. And please, keep seeding the torrents for as long as you can – it’s the decent thing to do. Every little helps.
I’ve been trying to download a certain collection for ages, but there’s no seeds.
First, try updating the trackers within the torrents. Over time, the trackers embedded within the original torrent file may have (i.e. probably have) moved, gone offline, or simply stopped working. I regularly update the tracker list used by my own seeds, so if you can’t find any sources, try replacing the tracker list on the torrent with the one on this page.
If that doesn’t work, or doesn’t make any sense, please drop me a mail and I’ll do what I can to help.
I’ve just started downloading, but the torrent looks like it’s going to take ages
I’ve got the torrents hosted on a very fast connection, but I can’t use the bandwidth during the day (UK time). Leave your torrents running overnight, and you should see a big jump in download speed. And leave the torrent seeding afterwards, so everyone will benefit 🙂
I hate torrents, can’t you upload them to megadownfileshare4u.net for me please?
I don’t trust file hosting sites – too many adverts and paywalls for my liking – but if someone wants to upload them to their file hoster of choice and send me the links, I’ll happily add them here so others can use them.
I’ve been trying to download a certain collection for ages, but my internet connection is too slow. Can you send me it on DVD?
It’s not something I’m keen to do a lot of, but if you’re REALLY stuck, mail me and we’ll see if I can work something out. No promises though. If you’re just looking for the magazine scans without OCR, Mort sells a fine collection at http://www.zzap64.co.uk.
I’ve got some old magazines missing from your collection – would you like them?
Ooooh yes please. I can’t offer you cash for them I’m afraid (I’m skint), but I will pay for postage. Wherever possible, I’ll return the mags to you after scanning if you like. However, scanning “perfect bound” (i.e. hard spine) mags involves a very sharp craft knife, so I’m afraid those would be a one-way trip 😦
I’ve got scans of some old magazines missing from your collection, or know where to get them. Can you OCR them?
Absolutely – drop me a mail.
This is all too good to be true, I must reward you!
I’m not doing any of this for profit, or even to break even – this was a project for my own amusement, and I thought it would be good Karma to share with like-minded old farts. If you do feel the need to say “thanks”, I’ve added a PayPal button below. Any amount (no matter how small) will be gratefully received and will go towards more mags, replacement hard drives, beer and old computery bits from eBay that my wife insists I don’t really need. But not necessarily in that order.
Do you have a collection of tatty, rotting magazines that haven’t been scanned yet that you’d like to donate to a good home? Or have a question/comment? Feel free to contact me at email@example.com