|
Post by drdave on Jul 17, 2013 1:22:11 GMT
Hi. Has anybody gone to the trouble of putting together a zipped version of the comic? I'd really like to have it on my tablet when I'm abroad. Oh, please don't go "buy the books" on me, because I did. If this is out of place, delete away.
|
|
|
Post by Señor Goose on Jul 17, 2013 8:13:15 GMT
Hmm, good question. Let me go math.
|
|
|
Post by Señor Goose on Jul 17, 2013 8:30:50 GMT
Alright, assuming an average page size of 400 KB, and currently there are 1,224 pages in the comic, we're looking at 489,600 KB of data. Uncompressed, that is. After compressing a picture for testing, I found it only came out about 1% smaller. That means the entire folder would only be one percent smaller too: 484,704 KB instead of 489,600 KB. So if you have half a Gig of memory on your tablet, great.
|
|
|
Post by GK Sierra on Jul 17, 2013 9:16:13 GMT
Hi. Has anybody gone to lengths and arranged a zipped version of the comic? I'd really like to have it on my tablet when I'm abroad. Oh, please don't go "buy the books" on me because I did. If this is out of place, delete away. Go to one of the GKC comics (today's, for example) and right click on it, then click "view image". Observe how the URLs are ordered, one after the other. Now write a macro that grabs all those images. Should be pretty simple, all you have to do is tell one value to increase by one for each operation. That's the only way I can think of.
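For what it's worth, here's a rough sketch of that idea in Python. It assumes the pages really are numbered sequentially as /comics/00000001.jpg onward and that 1,224 is the current count; adjust as the comic grows:

# Rough sketch: fetch the sequentially numbered pages into a local folder.
# Assumes /comics/00000001.jpg .. /comics/00001224.jpg; bump LAST_PAGE as needed.
import os
import time
import urllib.request

LAST_PAGE = 1224

os.makedirs("gkc", exist_ok=True)
for n in range(1, LAST_PAGE + 1):
    name = "%08d.jpg" % n
    dest = os.path.join("gkc", name)
    if os.path.exists(dest):        # skip pages already on disk, so re-runs only grab new ones
        continue
    url = "http://www.gunnerkrigg.com/comics/" + name
    try:
        urllib.request.urlretrieve(url, dest)
        print("saved", name)
    except OSError as exc:          # a missing or renamed page shouldn't kill the whole run
        print("skipped", url, exc)
    time.sleep(1)                   # be gentle with the server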
|
|
|
Post by Toloc on Jul 17, 2013 9:32:20 GMT
Possible, yes; it should be relatively easy to build a script to do it, even. The pages are JPEGs, so further compression would be pointless. I don't think size would be much of an issue, ~15 MB a chapter I'd guess, and one wouldn't have to carry the whole thing all the time. Not too sure if the Great Creator would be pleased, of course.
|
|
|
Post by eightyfour on Jul 17, 2013 9:58:56 GMT
The complete comic, uncompressed, is currently 223 MB. I know that because I save every page to my disk the day it is posted. I've had too many good comics disappear into the digital Nirvana on me in the past, so now I just make sure I always have a copy. Not that I expect this to happen with GC. And no, I'm sorry, but I'm not gonna make my archive available online (at least not unless Tom gives his explicit OK, which I don't expect to happen either). But as others have said, it's fairly easy to write a script that grabs all the images for you. Wget is a handy tool for this kind of thing.
|
|
|
Post by TBeholder on Jul 17, 2013 9:59:16 GMT
Alright, assuming an average page size of 400 KB, and currently there are 1,224 pages in the comic, we're looking at 489,600 KB of data. Uncompressed, that is. After compressing a picture for testing, I found it only came out about 1% smaller. That means the entire folder would only be one percent smaller too: 484,704 KB instead of 489,600 KB. So if you have half a Gig of memory on your tablet, great.

Add some overhead for the index, too. But hey, compressing what, JPEG? Wasn't this obvious from the start? Then again, simply having it all as one chunk may save about as much as the actual "compression" above, and there are Comix and other fancy viewers. So it may make sense to pack it anyway; not even necessarily with actual compression, just tarball it, rename it *.cbt and don't bother the CPU with extra unpacking. Or make one "archive" per chapter, for that matter. Why not?

Observe how the URLs are ordered, one after the other. Now write a macro that grabs all those images. Should be pretty simple, all you have to do is tell one value to increase by one for each operation. That's the only way I can think of.

That, or as a lazy and nice option: use HTTrack, then find all the pictures under "www.gunnerkrigg.com/comics/" in the created mirror. It also goes slightly easier on the server: it fetches the HTML pages as well, but it throttles speed, reuses connections (of course, wget can likewise be told "--limit-rate=20000" if you care) and "views" the external ads once per mirrored page if you didn't exclude them. The good part here is that you can update the mirror later... not that it would be too hard to enhance the same batch script so that it skips existing files.
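For anyone who wants to try the .cbt route, here's a minimal sketch in Python. It assumes the pages have already been sorted into one folder per chapter under gkc/ (that layout is made up for illustration); a .cbt is just a plain tar that viewers like Comix can open directly:

# Minimal sketch: pack each chapter folder into an uncompressed .cbt (plain tar).
# The gkc/<chapter>/ layout is an assumption; adapt to however you store the pages.
import tarfile
from pathlib import Path

for chapter_dir in sorted(Path("gkc").iterdir()):
    if not chapter_dir.is_dir():
        continue
    # mode "w" writes an uncompressed tar; the JPEGs wouldn't shrink anyway
    with tarfile.open(chapter_dir.with_suffix(".cbt"), "w") as cbt:
        for page in sorted(chapter_dir.glob("*.jpg")):
            cbt.add(page, arcname=page.name)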
|
|
|
Post by Xan on Jul 17, 2013 12:14:58 GMT
For a Windows-based solution, I used Woofy several times. You basically write regular expressions for the comic page image and the "Next" (or "Previous"?) link, and it will crawl the site for you.

Edit: A suitable definition was constructed by the inquisitive mind of the formidable GK Sierra:

comic "Gunnerkrigg Court"
start_at "http://gunnerkrigg.com"

for page in visit("""href="(?<content>[^\n]*?)"><img src="http://www.gunnerkrigg.com/images/prev_a.jpg">"""):
    download("""<img class="comic_image" src="(?<content>/comics/[^"]*?)">""")
|
|
|
Post by Señor Goose on Jul 17, 2013 15:48:20 GMT
I've occasionally considered building a collection of every page, but I don't know any way to do that aside from saving every single one individually. I am not good with coding.
|
|
|
Post by Xan on Jul 17, 2013 15:54:42 GMT
|
|
|
Post by GK Sierra on Jul 17, 2013 19:46:40 GMT
|
|
|
Post by sapientcoffee on Jul 17, 2013 20:19:30 GMT
So if you have half a Gig of memory on your tablet, great. Depending on the data plan/availability of free wi-fi, it might be the best way to go even with limited memory.
|
|
|
Post by GK Sierra on Jul 17, 2013 20:30:35 GMT
I'm trying to make a new XML/definition file for Woofy, but I'm running into some trouble. If someone who is better with regular expressions could debug this for me, I would be very grateful.

Error on start:

GKC Extension(1,1): BCE0043: Unexpected token: <.
GKC Extension(3,16): BCE0044: unexpected char: '!'.

Code:

<?xml version="1.0" encoding="utf-8" ?>
<comicInfo friendlyName="Gunnerkrigg Court">
<startUrl><![CDATA[http://gunnerkrigg.com/]]></startUrl>
<firstIssue><![CDATA[http://gunnerkrigg.com/comics/00000001.jpg]]></firstIssue>
<comicRegex><![CDATA[(?<content>/comics/[0-9]{8}_[^.]*\.(jpg)]]></comicRegex>
<backButtonRegex><![CDATA[<a\shref="(?<content>/d/comics/[0-9]{8}\.html)"\starget="_self"><img src="http://www.gunnerkrigg.com/images/prev_a.jpg" </a>]]></backButtonRegex>
</comicInfo>

(The </a>]]></backButtonRegex> is the tail end of the backButtonRegex line, not a separate element.)

Edit:

comic "Gunnerkrigg Court"
start_at "http://gunnerkrigg.com/"

for page in visit("""href="(?<content>[^\n]*?)"><img src="http://www.gunnerkrigg.com/images/prev_a.jpg" alt="">"""):
    download("""<img class="comic_image" src="(?<content>/comics/[^"]*?)">""")

This one seems to be working better, but it keeps getting hung up:

[1:46:19 PM] Woofy 1.20 (c) Vlad Iliescu
[1:46:19 PM] code.google.com/p/woofy/
[1:46:23 PM] [GKC Extension visit] starting at gunnerkrigg.com/
[1:46:23 PM] [GKC Extension download] found 1 strips
[1:46:23 PM] [GKC Extension download] downloading gunnerkrigg.com/comics/00001224.jpg to C:\Users\■■■■■■■\Desktop\GKC\GKC Extension\00001224.jpg
[1:46:23 PM] [GKC Extension download] WARNING: already downloaded gunnerkrigg.com/comics/00001224.jpg.
[1:46:23 PM] [GKC Extension visit] found 0 links
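As an aside, the image regex can be sanity-checked outside Woofy with a few lines of Python. Note that Python's re module wants (?P<content>...) where the definitions above use (?<content>...); this is only a rough check against the live front page:

# Rough check: run the comic-image pattern from the definition against the front page.
import re
import urllib.request

html = urllib.request.urlopen("http://www.gunnerkrigg.com/").read().decode("utf-8", "replace")
match = re.search(r'<img class="comic_image" src="(?P<content>/comics/[^"]*?)">', html)
print(match.group("content") if match else "no match")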
|
|
|
Post by GK Sierra on Jul 17, 2013 20:56:24 GMT
Nevermind, it's working now.
For reference:
comic "Gunnerkrigg Court" start_at "http://gunnerkrigg.com" for page in visit("""href="(?<content>[^\n]*?)"><img src="http://www.gunnerkrigg.com/images/prev_a.jpg">"""): download("""<img class="comic_image" src="(?<content>/comics/[^"]*?)">""")
I'll post a Rapidshare to the .zip when I'm done.
|
|
|
Post by GK Sierra on Jul 17, 2013 23:27:20 GMT
|
|
|
Post by Señor Goose on Jul 18, 2013 2:07:25 GMT
Aaaand GK Sierra delivers. Well done.
|
|
|
Post by Señor Goose on Jul 18, 2013 2:12:29 GMT
Wait, what'd I guess, 480MB? That's within 5% of the actual value! Whoo!
|
|
|
Post by GK Sierra on Jul 18, 2013 4:25:14 GMT
Wait, what'd I guess, 480MB? That's within 5% of the actual value! Whoo! Well-guestimated, good sir!
|
|
|
Post by hslugs on Jul 18, 2013 15:13:11 GMT
|
|
|
Post by Xan on Jul 18, 2013 16:04:28 GMT
Whoa, 224 MB for me. Are current images larger than they were originally, or am I missing some pages?.. Interesting. I ran your script and got the same size.
|
|
|
Post by sapientcoffee on Jul 18, 2013 18:27:13 GMT
Whoa, 224 MB for me. Are current images larger than they were originally, or am I missing some pages?.. Hmm, looking at GK Sierra's zip, I see .jpgs, but also .jpg_original files. Looks like there are two files for each page.
|
|
|
Post by TBeholder on Jul 18, 2013 22:13:29 GMT
Pic 78k, original 75k? What the microsoft does it stuff in there? Don't tell me its name and URL take 3k.
|
|
|
Post by Señor Goose on Jul 19, 2013 4:35:24 GMT
Wait, what'd I guess, 480MB? That's within 5% of the actual value! Whoo! Well-guestimated, good sir!
|
|
|
Post by hslugs on Jul 19, 2013 7:18:19 GMT
OK, now that I have GKC.zip... Pic 78k, original 75k? What the microsoft does it stuff in there? Don't tell me its name and URL take 3k. Well: a comment, an (XML-adorned) URL and lots of padding. exiv2 ex 00000001.jpg will dump the stuff. And "GKC Extension.txt" actually shows how it was done.
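If anyone is curious where those extra kilobytes live but doesn't have exiv2 handy, here's a rough Python sketch that walks a JPEG's marker segments and prints their sizes; the comment and XMP/EXIF blocks show up as COM and APP1 segments. It only handles the common marker layout, so treat it as illustrative:

# Rough sketch: list the marker segments at the start of a JPEG and their sizes,
# to see how much room the metadata (APP1 = EXIF/XMP, COM = comment) takes up.
import struct
import sys

def list_segments(path):
    with open(path, "rb") as f:
        data = f.read()
    assert data[:2] == b"\xff\xd8", "not a JPEG"
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:                      # start of scan: compressed image data follows
            print("SOS + image data: %d bytes" % (len(data) - pos))
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        print("marker FF%02X: %d bytes" % (marker, length + 2))
        pos += 2 + length

list_segments(sys.argv[1])                      # e.g. python jpeg_segments.py 00000001.jpg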
|
|
|
Post by GK Sierra on Jul 19, 2013 7:36:11 GMT
Well-guestimated, good sir!
|
|
|
Post by Daedalus on May 29, 2014 19:30:07 GMT
Link is broken. Do you still have this file to re-upload?
|
|
|
Post by GK Sierra on May 29, 2014 22:09:48 GMT
Link is broken. Do you still have this file to re-upload? Sure thing. Current to Wednesday's page: www.filedropper.com/gkc. It won't stay up for more than a couple days, so PM me if you missed it.
|
|
|
Post by forestflight on Jun 1, 2014 4:06:09 GMT
...The new, current file is 256.2MB... I just have to ask, what was in the other 186.8MB?
* data archival nerd here *
|
|
|
Post by GK Sierra on Jun 2, 2014 0:37:13 GMT
...The new, current file is 256.2MB... I just have to ask, what was in the other 186.8MB? * data archival nerd here * I took out the jpeg_originals to prevent size creep. The comic is probably going to get much longer, and I know some people like to read on mobile devices that are already stuffed with music and apps.
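For anyone rolling their own archive, a rough sketch of doing the same in Python, i.e. zipping a folder of saved pages while leaving out the .jpg_original copies (the gkc/ folder name is an assumption):

# Rough sketch: zip the saved pages, skipping the duplicate ".jpg_original" files.
import zipfile
from pathlib import Path

pages = Path("gkc")
# ZIP_STORED = no compression; the JPEGs are already compressed
with zipfile.ZipFile("GKC.zip", "w", zipfile.ZIP_STORED) as zf:
    for page in sorted(pages.rglob("*")):
        if not page.is_file() or page.name.endswith(".jpg_original"):
            continue
        zf.write(page, arcname=str(page.relative_to(pages)))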
|
|
|
Post by Daedalus on Jun 4, 2014 15:36:13 GMT
Link is broken. Do you still have this file to re-upload? Sure thing. Current to Wednesday's page: www.filedropper.com/gkc. It won't stay up for more than a couple days, so PM me if you missed it. Is there any way to download this without creating an account? Or, failing that, to create an account without paying?
|
|