|
Post by drdave on Jul 17, 2013 1:22:11 GMT
Hi. Has anybody gone to the trouble of putting together a zipped version of the comic? I'd really like to have it on my tablet when I'm abroad. Oh, please don't go "buy the books" on me, because I did. If this is out of place, delete away.
|
|
|
Post by Señor Goose on Jul 17, 2013 8:13:15 GMT
Hmm, good question. Let me go math.
|
|
|
Post by Señor Goose on Jul 17, 2013 8:30:50 GMT
Alright, assuming an average page size of 400 KB, and currently there are 1,224 pages in the comic, we're looking at 489,600 KB of data. Uncompressed, that is. After compressing a picture for testing, I found it only came out about 1% smaller. That means the entire folder would only be one percent smaller too: 484,704 KB instead of 489,600 KB. So if you have half a Gig of memory on your tablet, great.
|
|
|
Post by GK Sierra on Jul 17, 2013 9:16:13 GMT
Hi. Has anybody gone to lengths and arranged a zipped version of the comic? I'd really like to have it on my tablet when I'm abroad. Oh, please don't go "buy the books" on me because I did. If this is out of place, delete away. Go to one of the GKC comics (today's, for example) and right click on it, then click "view image". Observe how the URLs are ordered, one after the other. Now write a macro that grabs all those images. Should be pretty simple, all you have to do is tell one value to increase by one for each operation. That's the only way I can think of.
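For what it's worth, here's a rough sketch of that idea in Python. It assumes the pages really are numbered sequentially as /comics/00000001.jpg onward and that 1,224 is the current count; adjust as the comic grows:

# Rough sketch: fetch the sequentially numbered pages into a local folder.
# Assumes /comics/00000001.jpg .. /comics/00001224.jpg; bump LAST_PAGE as needed.
import os
import time
import urllib.request

LAST_PAGE = 1224

os.makedirs("gkc", exist_ok=True)
for n in range(1, LAST_PAGE + 1):
    name = "%08d.jpg" % n
    dest = os.path.join("gkc", name)
    if os.path.exists(dest):        # skip pages already on disk, so re-runs only grab new ones
        continue
    url = "http://www.gunnerkrigg.com/comics/" + name
    try:
        urllib.request.urlretrieve(url, dest)
        print("saved", name)
    except OSError as exc:          # a missing or renamed page shouldn't kill the whole run
        print("skipped", url, exc)
    time.sleep(1)                   # be gentle with the server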
|
|
|
Post by Toloc on Jul 17, 2013 9:32:20 GMT
Possible, yes; it should be relatively easy to build a script to do it, even. The pages are JPEGs, so further compression would be pointless. I don't think size would be much of an issue, ~15 MB a chapter I'd guess, and one wouldn't have to carry the whole thing all the time. Not too sure if the Great Creator would be pleased, of course.
|
|
|
Post by eightyfour on Jul 17, 2013 9:58:56 GMT
The complete comic, uncompressed, is currently 223 MB. I know that because I save every page to my disk the day it is posted. I've had too many good comics disappear into the digital Nirvana on me in the past, so now I just make sure I always have a copy. Not that I expect this to happen with GC. And no, I'm sorry, but I'm not gonna make my archive available online (at least not unless Tom gives his explicit OK, which I don't expect to happen either). But as others have said, it's fairly easy to write a script that grabs all the images for you. Wget is a handy tool for this kind of thing.
|
|
|
Post by TBeholder on Jul 17, 2013 9:59:16 GMT
Alright, assuming an average page size of 400 KB, and currently there are 1,224 pages in the comic, we're looking at 489,600 KB of data. Uncompressed, that is. After compressing a picture for testing, I found it only came out about 1% smaller. That means the entire folder would only be one percent smaller too: 484,704 KB instead of 489,600 KB. So if you have half a Gig of memory on your tablet, great.

Add some overhead for the index, too. But hey, compressing what, JPEG? Wasn't this obvious from the start? Then again, simply having it all as one chunk may save about as much as the actual "compression" above, and there are Comix and other fancy viewers. So it may make sense to pack it anyway; not even necessarily with actual compression, just tarball it, rename it *.cbt and don't bother the CPU with extra unpacking. Or make one "archive" per chapter, for that matter. Why not?

Observe how the URLs are ordered, one after the other. Now write a macro that grabs all those images. Should be pretty simple, all you have to do is tell one value to increase by one for each operation. That's the only way I can think of.

That, or as a lazy and nice option: use HTTrack, then find all the pictures under "www.gunnerkrigg.com/comics/" in the created mirror. It also goes slightly easier on the server: it fetches the HTML pages as well, but it throttles speed, reuses connections (of course, wget can likewise be told "--limit-rate=20000" if you care) and "views" the external ads once per mirrored page if you didn't exclude them. The good part here is that you can update the mirror later... not that it would be too hard to enhance the same batch script so that it skips existing files.
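For anyone who wants to try the .cbt route, here's a minimal sketch in Python. It assumes the pages have already been sorted into one folder per chapter under gkc/ (that layout is made up for illustration); a .cbt is just a plain tar that viewers like Comix can open directly:

# Minimal sketch: pack each chapter folder into an uncompressed .cbt (plain tar).
# The gkc/<chapter>/ layout is an assumption; adapt to however you store the pages.
import tarfile
from pathlib import Path

for chapter_dir in sorted(Path("gkc").iterdir()):
    if not chapter_dir.is_dir():
        continue
    # mode "w" writes an uncompressed tar; the JPEGs wouldn't shrink anyway
    with tarfile.open(chapter_dir.with_suffix(".cbt"), "w") as cbt:
        for page in sorted(chapter_dir.glob("*.jpg")):
            cbt.add(page, arcname=page.name)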
|
|
|
Post by Xan on Jul 17, 2013 12:14:58 GMT
For a Windows-based solution, I used Woofy several times. You basically write regular expressions for the comic page image and the "Next" (or "Previous"?) link, and it will crawl the site for you.

Edit: A suitable definition was constructed by the inquisitive mind of the formidable GK Sierra:

comic "Gunnerkrigg Court"
start_at "http://gunnerkrigg.com"

for page in visit("""href="(?<content>[^\n]*?)"><img src="http://www.gunnerkrigg.com/images/prev_a.jpg">"""):
    download("""<img class="comic_image" src="(?<content>/comics/[^"]*?)">""")
|
|
|
Post by Señor Goose on Jul 17, 2013 15:48:20 GMT
I've occasionally considered building a collection of every page, but I don't know any way to do that aside from saving every single one individually. I am not good with coding.
|
|
|
Post by Xan on Jul 17, 2013 15:54:42 GMT
|
|
|
Post by GK Sierra on Jul 17, 2013 19:46:40 GMT
|
|
|
Post by sapientcoffee on Jul 17, 2013 20:19:30 GMT
So if you have half a Gig of memory on your tablet, great. Depending on the data plan/availability of free wi-fi, it might be the best way to go even with limited memory.
|
|
|
Post by GK Sierra on Jul 17, 2013 20:30:35 GMT
I'm trying to make a new XML/definition file for Woofy, but I'm running into some trouble. If someone who is better with regular expressions could debug this for me, I would be very grateful.

Error on start:

GKC Extension(1,1): BCE0043: Unexpected token: <.
GKC Extension(3,16): BCE0044: unexpected char: '!'.

Code:

<?xml version="1.0" encoding="utf-8" ?>
<comicInfo friendlyName="Gunnerkrigg Court">
<startUrl><![CDATA[http://gunnerkrigg.com/]]></startUrl>
<firstIssue><![CDATA[http://gunnerkrigg.com/comics/00000001.jpg]]></firstIssue>
<comicRegex><![CDATA[(?<content>/comics/[0-9]{8}_[^.]*\.(jpg)]]></comicRegex>
<backButtonRegex><![CDATA[<a\shref="(?<content>/d/comics/[0-9]{8}\.html)"\starget="_self"><img src="http://www.gunnerkrigg.com/images/prev_a.jpg" </a>]]></backButtonRegex>
</comicInfo>

(The </a>]]></backButtonRegex> is the tail end of the backButtonRegex line, not a separate element.)

Edit:

comic "Gunnerkrigg Court"
start_at "http://gunnerkrigg.com/"

for page in visit("""href="(?<content>[^\n]*?)"><img src="http://www.gunnerkrigg.com/images/prev_a.jpg" alt="">"""):
    download("""<img class="comic_image" src="(?<content>/comics/[^"]*?)">""")

This one seems to be working better, but it keeps getting hung up:

[1:46:19 PM] Woofy 1.20 (c) Vlad Iliescu
[1:46:19 PM] code.google.com/p/woofy/
[1:46:23 PM] [GKC Extension visit] starting at gunnerkrigg.com/
[1:46:23 PM] [GKC Extension download] found 1 strips
[1:46:23 PM] [GKC Extension download] downloading gunnerkrigg.com/comics/00001224.jpg to C:\Users\■■■■■■■\Desktop\GKC\GKC Extension\00001224.jpg
[1:46:23 PM] [GKC Extension download] WARNING: already downloaded gunnerkrigg.com/comics/00001224.jpg.
[1:46:23 PM] [GKC Extension visit] found 0 links
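As an aside, the image regex can be sanity-checked outside Woofy with a few lines of Python. Note that Python's re module wants (?P<content>...) where the definitions above use (?<content>...); this is only a rough check against the live front page:

# Rough check: run the comic-image pattern from the definition against the front page.
import re
import urllib.request

html = urllib.request.urlopen("http://www.gunnerkrigg.com/").read().decode("utf-8", "replace")
match = re.search(r'<img class="comic_image" src="(?P<content>/comics/[^"]*?)">', html)
print(match.group("content") if match else "no match")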
|
|
|
Post by GK Sierra on Jul 17, 2013 20:56:24 GMT
Nevermind, it's working now.
For reference:
comic "Gunnerkrigg Court" start_at "http://gunnerkrigg.com" for page in visit("""href="(?<content>[^\n]*?)"><img src="http://www.gunnerkrigg.com/images/prev_a.jpg">"""): download("""<img class="comic_image" src="(?<content>/comics/[^"]*?)">""")
I'll post a Rapidshare to the .zip when I'm done.
|
|
|
Post by GK Sierra on Jul 17, 2013 23:27:20 GMT
|
|
|
Post by Señor Goose on Jul 18, 2013 2:07:25 GMT
Aaaand GK Sierra delivers. Well done.
|
|
|
Post by Señor Goose on Jul 18, 2013 2:12:29 GMT
Wait, what'd I guess, 480MB? That's within 5% of the actual value! Whoo!
|
|
|
Post by GK Sierra on Jul 18, 2013 4:25:14 GMT
Wait, what'd I guess, 480MB? That's within 5% of the actual value! Whoo! Well-guestimated, good sir!
|
|
|
Post by hslugs on Jul 18, 2013 15:13:11 GMT
|
|
|
Post by Xan on Jul 18, 2013 16:04:28 GMT
Whoa, 224 MB for me. Are current images larger than they were originally, or am I missing some pages?.. Interesting. I ran your script and got the same size.
|
|
|
Post by sapientcoffee on Jul 18, 2013 18:27:13 GMT
Whoa, 224 MB for me. Are current images larger than they were originally, or am I missing some pages?.. Hmm, looking at GK Sierra's zip, I see .jpgs, but also .jpg_original files. Looks like there are two files for each page.
|
|
|
Post by TBeholder on Jul 18, 2013 22:13:29 GMT
Pic 78k, original 75k? What the microsoft does it stuff in there? Don't tell me its name and URL take 3k.
|
|
|
Post by Señor Goose on Jul 19, 2013 4:35:24 GMT
Wait, what'd I guess, 480MB? That's within 5% of the actual value! Whoo! Well-guestimated, good sir!
|
|
|
Post by hslugs on Jul 19, 2013 7:18:19 GMT
OK, now that I have GKC.zip... Pic 78k, original 75k? What the microsoft does it stuff in there? Don't tell me its name and URL take 3k. Well: a comment, an (XML-adorned) URL and lots of padding. exiv2 ex 00000001.jpg will dump the stuff. And "GKC Extension.txt" actually shows how it was done.
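If anyone is curious where those extra kilobytes live but doesn't have exiv2 handy, here's a rough Python sketch that walks a JPEG's marker segments and prints their sizes; the comment and XMP/EXIF blocks show up as COM and APP1 segments. It only handles the common marker layout, so treat it as illustrative:

# Rough sketch: list the marker segments at the start of a JPEG and their sizes,
# to see how much room the metadata (APP1 = EXIF/XMP, COM = comment) takes up.
import struct
import sys

def list_segments(path):
    with open(path, "rb") as f:
        data = f.read()
    assert data[:2] == b"\xff\xd8", "not a JPEG"
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:                      # start of scan: compressed image data follows
            print("SOS + image data: %d bytes" % (len(data) - pos))
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        print("marker FF%02X: %d bytes" % (marker, length + 2))
        pos += 2 + length

list_segments(sys.argv[1])                      # e.g. python jpeg_segments.py 00000001.jpg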
|
|
|
Post by GK Sierra on Jul 19, 2013 7:36:11 GMT
Well-guestimated, good sir!
|
|
|
Post by Daedalus on May 29, 2014 19:30:07 GMT
Link is broken. Do you still have this file to re-upload?
|
|
|
Post by GK Sierra on May 29, 2014 22:09:48 GMT
Link is broken. Do you still have this file to re-upload? Sure thing. Current to Wednesday's page: www.filedropper.com/gkc. It won't stay up for more than a couple days, so PM me if you missed it.
|
|
|
Post by forestflight on Jun 1, 2014 4:06:09 GMT
...The new, current file is 256.2MB... I just have to ask, what was in the other 186.8MB?
* data archival nerd here *
|
|
|
Post by GK Sierra on Jun 2, 2014 0:37:13 GMT
...The new, current file is 256.2MB... I just have to ask, what was in the other 186.8MB? * data archival nerd here * I took out the jpeg_originals to prevent size creep. The comic is probably going to get much longer, and I know some people like to read on mobile devices that are already stuffed with music and apps.
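For anyone rolling their own archive, a rough sketch of doing the same in Python, i.e. zipping a folder of saved pages while leaving out the .jpg_original copies (the gkc/ folder name is an assumption):

# Rough sketch: zip the saved pages, skipping the duplicate ".jpg_original" files.
import zipfile
from pathlib import Path

pages = Path("gkc")
# ZIP_STORED = no compression; the JPEGs are already compressed
with zipfile.ZipFile("GKC.zip", "w", zipfile.ZIP_STORED) as zf:
    for page in sorted(pages.rglob("*")):
        if not page.is_file() or page.name.endswith(".jpg_original"):
            continue
        zf.write(page, arcname=str(page.relative_to(pages)))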
|
|
|
Post by Daedalus on Jun 4, 2014 15:36:13 GMT
Link is broken. Do you still have this file to re-upload? Sure thing. Current to Wednesday's page: www.filedropper.com/gkc. It won't stay up for more than a couple days, so PM me if you missed it. Is there any way to download this without creating an account? Or, failing that, to create an account without paying?
|
|