Saturday, November 01, 2014
Discussion and Sharing of Web Comics

TOPIC: Download MegaTokyo WebComic

Download MegaTokyo WebComic 4 years 7 months ago #6187

  • cYo
  • OFFLINE
  • Moderator
  • Posts: 3396
  • Thank you received: 628
  • Karma: 171
"Save as..." and open with ComicRack 0.9.117 and up.

hf

File Attachment:

File Name: MegaTokyo.cbw
File Size: 2602
The administrator has disabled public write access.

Re:Download MegaTokyo WebComic 4 years 7 months ago #6543

  • damocles
  • OFFLINE
  • Expert Boarder
  • Posts: 95
  • Thank you received: 16
  • Karma: 14
So how does one deal with the situation when Fred uses PNG format rather than GIF?

Re:Download MegaTokyo WebComic 4 years 7 months ago #6544

  • cYo
  • OFFLINE
  • Moderator
  • Posts: 3396
  • Thank you received: 628
  • Karma: 171
Good point.
Maybe I'll add something for the next version.

Re:Download MegaTokyo WebComic 4 years 2 months ago #9589

  • uTP226
  • OFFLINE
  • Fresh Boarder
  • Posts: 3
  • Karma: 0
I tried making a BrowserScraper version for MT, but something does not work. I tested the regexes in Expresso against the source code of MT and they seemed to work. I don't know how ComicRack handles relative paths, so the problem may lie there.
<Images>
  <Image Url="?http://megatokyo.com/strip/1">
    <Parts>
      <Part>src=&quot;(?&lt;link&gt;strips/\d+\.\w{3})&quot;</Part>
      <Part>&lt;div class="navcontrols top"&gt;\s*&lt;ul class="prevnext"&gt;&lt;li class="prev"&gt;&lt;a href="\./strip/\d+"&gt;Prev&lt;/a&gt;&lt;/li&gt;&lt;li class="next"&gt;&lt;a href="\./(?&lt;link&gt;strip/\d+)"&gt;Next&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;</Part>
    </Parts>
  </Image>
</Images>
(The spaces after </, ">\, ="\, ></, d+) are created by the forum.)

I am not very good with regex so excuse the rather ugly second expression. :laugh:

Does someone see what mistake I made, or is the regexp correct and the problem lies at my end?
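
For anyone who wants to test patterns like these outside ComicRack, here is a minimal Python sketch. It is not ComicRack's own matching code: Python spells named groups `(?P<link>...)` where .NET and Expresso use `(?<link>...)`, and the HTML fragments are made-up stand-ins for the MegaTokyo page source, not copies of it. It only checks that the first expression picks up both GIF and PNG file names.

```python
import re

# First <Part> from the post above, unescaped from XML and rewritten with
# Python's (?P<name>...) spelling for the named group.
IMAGE_PATTERN = re.compile(r'src="(?P<link>strips/\d+\.\w{3})"')

# Hypothetical page fragments (assumed markup, not copied from megatokyo.com).
SAMPLE_GIF = '<img src="strips/0001.gif" alt="strip" />'
SAMPLE_PNG = '<img src="strips/1200.png" alt="strip" />'


def find_image_link(html):
    """Return the text captured by the 'link' group, or None if nothing matches."""
    match = IMAGE_PATTERN.search(html)
    return match.group("link") if match else None


if __name__ == "__main__":
    print(find_image_link(SAMPLE_GIF))
    print(find_image_link(SAMPLE_PNG))
```

Since the pattern uses `\w{3}`, any three-character extension matches (`.gif`, `.png`, `.jpg`), while a four-letter `.jpeg` would not.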

Re:Download MegaTokyo WebComic 4 years 1 month ago #10348

  • Helmic
  • OFFLINE
  • Expert Boarder
  • (Don't be) That guy
  • Posts: 149
  • Thank you received: 80
  • Karma: 50
The missing pages are bugging me to no end, so any updates on this? As far as I know I still can't simply edit the provided .cbw to look for both PNGs and GIFs, and while I can use either the BrowseScraper or IndexScraper types to find the pages the comics are on, I can't actually retrieve the images themselves because the source writes them as "strips/[digits].[file extension]" and there's no way to prepend a forward slash so that ComicRack can find the URL. If there haven't been any fixes for these issues, are there any workarounds?

What I've got so far:
<?xml version="1.0"?>
<WebComic xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
	<Info>
		<Series>MegaTokyo</Series>
		<Summary>Based in a fictional version of Tokyo, MegaTokyo follows the adventures of a pair of American anime and video game enthusiasts.  The series started off as all humor, but as it shifted to having longer storylines since one of the creators left, and the remaining one, Fred Gallagher, decided to do the series by himself and take it in a more serious direction.</Summary>
		<Year>2001</Year>
		<Writer>Fred Gallagher, Rodney Caston</Writer>
		<Penciller>Fred Gallagher, Rodney Caston</Penciller>
		<Genre>Humor</Genre>
		<Format>Web Comic</Format>
		<BlackAndWhite>Yes</BlackAndWhite>
		<Web>www.megatokyo.com</Web>
	</Info>
	<Images>
		<Image Url="!http://www.megatokyo.com/archive.php">
			<Parts>
				<Part>&quot;(?&lt;link&gt;\./strip/\d+)&quot;</Part>
				<Part>&quot;(?&lt;link&gt;strips/\d+\.\w+)&quot;</Part>
			</Parts>
		</Image>
	</Images>
</WebComic>

Any extra spaces are the forums being stupid.
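
The relative-path problem can be reproduced outside ComicRack. The sketch below uses Python's standard `urllib.parse.urljoin`, which follows the usual URL resolution rules. Whether ComicRack resolves links the same way is an assumption, and `0001.gif` is only an example file name.

```python
from urllib.parse import urljoin

# Example values: the page a strip is shown on and the relative image
# path found in its source ("0001.gif" is an illustrative file name).
PAGE_URL = "http://www.megatokyo.com/strip/1"
RELATIVE_IMAGE = "strips/0001.gif"


def resolve(base, link):
    """Resolve a link found on a page against that page's URL."""
    return urljoin(base, link)


if __name__ == "__main__":
    # Resolved against the strip page, the path lands under /strip/.
    print(resolve(PAGE_URL, RELATIVE_IMAGE))
    # A leading slash makes the path relative to the site root instead.
    print(resolve(PAGE_URL, "/" + RELATIVE_IMAGE))
```

The first call gives `http://www.megatokyo.com/strip/strips/0001.gif` and the second gives `http://www.megatokyo.com/strips/0001.gif`, which is the difference the missing leading slash makes.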
Request webcomics on IRC: http://widget.mibbit.com/?server=irc.rizon.net&channel=%23comicrack
server: irc.rizon.net
channel: #comicrack
Or in the webcomics subforum: http://comicrack.cyolito.com/forum/22-web-comics
Last Edit: 4 years 1 month ago by Helmic.

Re:Download MegaTokyo WebComic 4 years 4 weeks ago #10362

  • Stonepaw
  • OFFLINE
  • Moderator
  • Posts: 848
  • Thank you received: 237
  • Karma: 159
Huh, I can't get this working either. :huh:

Re:Download MegaTokyo WebComic 4 years 4 weeks ago #10377

  • Helmic
  • OFFLINE
  • Expert Boarder
  • (Don't be) That guy
  • Posts: 149
  • Thank you received: 80
  • Karma: 50
Attached is a (temporarily) fixed version. I simply checked to see which pages didn't load, then inserted lines specifically marking them as JPEGs. The ones that didn't load after that I marked as PNGs. Unfortunately this doesn't automatically update like it would if ComicRack could read the URLs the webpage provides, but it shouldn't be too difficult to get later pages if you follow my process.
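
The manual process above (try the default, then JPEG, then PNG) could be automated roughly as follows. This is only a sketch: `url_exists` is a hypothetical callback you would have to supply yourself (for example an HTTP HEAD request), the four-digit zero padding of file names and the `.jpg` spelling are assumptions, and the extension order simply mirrors the steps described.

```python
BASE_URL = "http://www.megatokyo.com/strips/"  # image folder seen in the page source
EXTENSIONS = ("gif", "jpg", "png")             # order mirrors the manual process


def find_strip_url(number, url_exists, extensions=EXTENSIONS):
    """Return the first candidate URL that url_exists() accepts, else None.

    url_exists is a caller-supplied function (hypothetical here) that takes
    a URL and returns True if the image can be fetched.
    """
    for ext in extensions:
        candidate = f"{BASE_URL}{number:04d}.{ext}"  # 4-digit padding is assumed
        if url_exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Stand-in for a real network check: pretend only these files exist.
    fake_server = {BASE_URL + "0001.gif", BASE_URL + "0002.png"}
    for n in (1, 2, 3):
        print(n, find_strip_url(n, fake_server.__contains__))
```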

File Attachment:

File Name: MegaTokyo-20101001.cbw
File Size: 89014

Re: Download MegaTokyo WebComic 3 years 10 months ago #11150

  • Alien
  • OFFLINE
  • Fresh Boarder
  • Posts: 5
  • Karma: 0
I haven't used CR in a while [haven't had time really], so was unaware of its new [to me at least] webcomic function.

For downloading comics I mostly use either Orbit Downloader, with its batch download function, or TcD [Text Control Downloader], or a combination of both. The domain lease for TcD's homepage [tcd.in] appears to have run out without being renewed, but that link points to a backup of the site on archive.org, & you can still get the most recent version [2.2.4] from there.

The help does a pretty good job of explaining how to use it, but I'll give an example for MT.

First, set the download folder option, then open its commands window, & type the following commands, followed by Enter key [just as you would in a DOS command prompt window].

*** doesn't work ***
fusker megatokyo.com/strips/[1-1291]
img
keep /strips/
******************
Ignore the above code, I'll make a new post with code I've actually tested first. :blush: :pinch:

At this point you can either download the list of images you've just generated with TcD itself, simply by using the get command, or you can save the list to a file for use with another prog if you prefer.

Be advised, the second step - the img command - can take quite a while, & sometimes cause the prog to become unresponsive. In situations like that I tend to do them in batches of 200-300.
No crops were circled, or animals mutilated in the making of this sig.
Last Edit: 3 years 10 months ago by Alien. Reason: realised that what I'd posted wouldn't work

Re: Download MegaTokyo WebComic 3 years 10 months ago #11154

  • Alien
  • OFFLINE
  • Fresh Boarder
  • Posts: 5
  • Karma: 0
Ok, sorry about not actually checking it first, some sites have a few quirks that need to be worked around [perhaps intended to discourage hot-linking or whatever], & I forgot that MT is 1 of them.

If you have TcD installed, paste the following into Notepad [or similar text editor]:
title 1-250
fusker http://megatokyo.com/strip/[1-250]
img
keep /strips/
replace /strip/strips/ /strips/
save 0001-0250.olt
[250 is just how many I usually do in 1 go, you can change it to suit your own needs/preferences]
& save it with .flux as the filename extension. If you've already set your download folder, then when you click that file it will automatically open & run with TcD, & when it's done it'll save a list of the comic image URLs to a file called 0001-0250.olt. .olt is the file extension that Orbit Downloader uses for its imported lists [though I think it can use .txt as well]. You can change it to something else if you're using a different prog that requires a different extension but still reads a plain unformatted list of 1 URL per line.
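
If you don't fancy typing one script per batch, the .flux files could be generated. The Python sketch below only templates the six commands quoted above with different ranges. It has not been run against TcD, and the function names and the total of 600 are just illustrative.

```python
def flux_script(first, last):
    """Build the text of one TcD .flux script for strips first..last."""
    return "\n".join([
        f"title {first}-{last}",
        f"fusker http://megatokyo.com/strip/[{first}-{last}]",
        "img",
        "keep /strips/",
        "replace /strip/strips/ /strips/",
        f"save {first:04d}-{last:04d}.olt",
    ]) + "\n"


def batches(total, size=250):
    """Yield (first, last) pairs that cover 1..total in steps of size."""
    for first in range(1, total + 1, size):
        yield first, min(first + size - 1, total)


if __name__ == "__main__":
    # 600 is an arbitrary example; use the number of the latest strip.
    for first, last in batches(600):
        name = f"{first:04d}-{last:04d}.flux"
        with open(name, "w", encoding="ascii") as handle:
            handle.write(flux_script(first, last))
        print("wrote", name)
```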

The title part just sets the title of the command window that's running the script, which I find useful if you're running several at once.

The replace command just fixes the previously mentioned weirdness with the URLs. IIRC comicgenesis/keenspot do something similar with some of the comics hosted on their servers, so if you read any of them it's a command worth remembering.
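
For anyone post-processing a saved list with something other than TcD, the same fix is a one-line substitution. This Python sketch assumes TcD's replace is a plain substring replacement, which I haven't verified, and the URLs are example values.

```python
def fix_strip_url(url):
    """Mimic 'replace /strip/strips/ /strips/' on a single URL."""
    return url.replace("/strip/strips/", "/strips/")


def fix_list(urls):
    """Apply the fix to every URL in a list, leaving correct ones untouched."""
    return [fix_strip_url(u) for u in urls]


if __name__ == "__main__":
    # Example URLs only; the file names are illustrative.
    broken = [
        "http://megatokyo.com/strip/strips/0001.gif",
        "http://megatokyo.com/strips/0002.gif",
    ]
    for url in fix_list(broken):
        print(url)
```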

If you'd rather use TcD to do the actual downloading [& on a few sites you may need to as progs like OD won't work, can't remember which ones though off the top of my head - my reading list of comics is over 100 long, & I tend to read them sporadically, rather than constantly keeping up to date], then just replace the save line with the get command [doesn't need any arguments, switches, etc]. Or you could do both if you have trouble with 1 particular site & want to compare the list of images to the actual images downloaded.

I should also mention that when you use TcD to download, it adds a number to the beginning of the filename. Sometimes this is useful, other times it's a pain. Renaming them isn't that much of a hassle though: there are a few freeware file renaming progs about on the net, or you could just as easily do it with a .bat file.
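
As an alternative to a .bat file, a short Python script could do the renaming. The exact shape of the number TcD adds isn't described here, so the pattern below (digits followed by an underscore, hyphen, or space) is purely an assumption to adjust to whatever your files actually look like.

```python
import re
from pathlib import Path

# Assumed prefix shape: digits followed by '_', '-' or a space,
# e.g. "017_0017.gif". Adjust to match the real file names.
PREFIX = re.compile(r"^\d+[_\- ]+")


def strip_prefix(filename):
    """Return filename without the assumed leading number prefix."""
    return PREFIX.sub("", filename, count=1)


def rename_all(folder):
    """Rename every file in folder whose name carries the assumed prefix."""
    for path in sorted(Path(folder).iterdir()):
        new_name = strip_prefix(path.name)
        if path.is_file() and new_name != path.name:
            target = path.with_name(new_name)
            if not target.exists():  # never overwrite an existing file
                path.rename(target)


if __name__ == "__main__":
    print(strip_prefix("017_0017.gif"))
```

Names without a separator after the digits, such as a plain `0017.gif`, are left alone, so running it twice does no harm.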
Last Edit: 3 years 10 months ago by Alien.