
TOPIC: Comic Vine Scraper

Comic Vine Scraper 1 year 10 months ago #44221

  • solidus0079
  • Senior Boarder
  • Posts: 66
  • Thank you received: 3
  • Karma: 1
iohanr wrote:
Anyway, on a somewhat unrelated topic but in response to jkthemac, do you keep your comics in cbr format? I convert everything to cbz and save the metadata within each book so that if I ever lose the ComicRackDB.xml (or want to start over) I don't have to rescrape anything. I've had to re-do my DB several times as I experimented with moving it to MySQL and then back again to the local XML file. The nice thing is all of the comics that had already been scraped, converted to cbz and "updated" with metadata didn't need to be rescraped again. As soon as I "scanned book folders" in ComicRack, all of the metadata got populated again.
I know you were more asking jkthemac, but I've tried the various formats available and I run a custom export setting for conversion to CBZ with max compression level. I find it smaller than CBR and even CB7. Plus I do the embedding of metadata like you do, which CBR can't do due to licensing. CB7 can also do this but I've found that file operations like syncing to iPad take forever, and there's no real space savings compared to max level zip compression (which actually makes a difference believe it or not).

I've recently started converting from JPG to WebP at 80% quality in addition to max zip level. I'm loving the 50-60% space savings and don't really notice any quality loss. I'm probably going to convert my whole library. It'd be nice to get a TB back.
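
For anyone curious what a max-compression CBZ with embedded metadata looks like in practice, here is a minimal Python sketch. It assumes the metadata lives in a `ComicInfo.xml` file at the root of the archive; the function names `write_cbz` and `read_metadata` and the field names in the usage line are made up for illustration. It only covers the zip and metadata part, not the image conversion.

```python
import zipfile
import xml.etree.ElementTree as ET


def write_cbz(dest, pages, metadata):
    """Write page images plus a ComicInfo.xml into a zip at max deflate level.

    dest     -- path or file-like object for the .cbz
    pages    -- dict mapping archive file name -> image bytes
    metadata -- dict mapping ComicInfo element name -> value (hypothetical fields)
    """
    root = ET.Element("ComicInfo")
    for tag, value in metadata.items():
        ET.SubElement(root, tag).text = str(value)
    xml_bytes = ET.tostring(root, encoding="utf-8")

    # compresslevel=9 is the highest deflate level (needs Python 3.7+).
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as cbz:
        for name in sorted(pages):
            cbz.writestr(name, pages[name])
        cbz.writestr("ComicInfo.xml", xml_bytes)


def read_metadata(src):
    """Return the embedded ComicInfo.xml fields as a dict (empty if absent)."""
    with zipfile.ZipFile(src) as cbz:
        if "ComicInfo.xml" not in cbz.namelist():
            return {}
        root = ET.fromstring(cbz.read("ComicInfo.xml"))
    return {child.tag: child.text for child in root}
```

Something like `write_cbz("book.cbz", pages, {"Series": "Doctor Strange", "Number": 1})` would then produce a file whose metadata travels with it, which is why a rescan can repopulate the library without rescraping.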
The administrator has disabled public write access.

Comic Vine Scraper 1 year 10 months ago #44222

  • jkthemac
  • Platinum Boarder
  • Posts: 760
  • Thank you received: 248
  • Karma: 55
iohanr wrote:
Anyway, on a somewhat unrelated topic but in response to jkthemac, do you keep your comics in cbr format? I convert everything to cbz and save the metadata within each book so that if I ever lose the ComicRackDB.xml (or want to start over) I don't have to rescrape anything.

CBZ for exactly that reason, but since I switched to SQL I haven't needed to worry: my NAS is rock solid and the SQL database is independent of ComicRack and my PC, so it makes everything crash proof. Indeed, I am reluctant to rescrape because it will only mess up my carefully corrected and reorganised database. I usually turn off multiple fields if I rescrape. Comic Vine is hardly the most consistent of databases; they are forever changing their mind on how to group annuals and older indicia. In the last three years they have changed the early X-Men indicia at least three times, and the Fantastic Four annuals twice. (I sometimes think I should just publish my Marvel database; it is probably more accurate than Comic Vine and more logical than Marvel.com, but of course it isn't complete.)
Last Edit: 1 year 10 months ago by jkthemac.

Comic Vine Scraper 1 year 10 months ago #44226

  • cbanack
  • Platinum Boarder
  • Posts: 1318
  • Thank you received: 503
  • Karma: 181
Hey guys, the latest version of Comic Vine Scraper (1.0.91) is now available at the usual spot.

This version throttles API requests properly, which means that you should not run into any more 'scraped too many comics' limitations while scraping. This won't be true if you try to use it today, though, because Comic Vine has indicated that there is a bug with their API limiting code that will not be fixed until Monday at the earliest.

So feel free to download this release, but don't be surprised if you're still having problems this weekend. Hopefully that will go away early next week...
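
To give a rough idea of what throttling API requests means here, a minimal Python sketch follows. This is not the scraper's actual code: the class name `Throttle` and the interval you pass in are assumptions for illustration, and Comic Vine's real limits are not stated in this post.

```python
import time


class Throttle:
    """Block so that successive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first

    def wait(self):
        """Sleep just long enough to respect the interval, then record the time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

The idea is to call `wait()` immediately before every API request, so a long scraping run spreads its requests out instead of sending them in a burst.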
The following user(s) said Thank You: forkicks, tglass1976, actioncomics, oraclexview, bi0mech, kenjio, James Spaceman, romsnesrom, Marv74br, ClayM, and 8 others

Comic Vine Scraper 1 year 10 months ago #44227

  • krandor
  • Gold Boarder
  • Posts: 190
  • Thank you received: 16
  • Karma: 2
cbanack wrote:
Hey guys, the latest version of Comic Vine Scraper (1.0.91) is now available at the usual spot.

This version throttles API requests properly, which means that you should not run into any more 'scraped too many comics' limitations while scraping. This won't be true if you try to use it today, though, because Comic Vine has indicated that there is a bug with their API limiting code that will not be fixed until Monday at the earliest.

So feel free to download this release, but don't be surprised if you're still having problems this weekend. Hopefully that will go away early next week...

Thanks for coming out of retirement to help with this. Your work on the scraper is always appreciated.

Comic Vine Scraper 1 year 10 months ago #44228

  • cYo
  • Moderator
  • Posts: 3476
  • Thank you received: 675
  • Karma: 181
Thanks for all the hard work.
ComicRack would never have become what it is without your contributions over all those years.
Try to still stay around :)
The following user(s) said Thank You: fieldhouse, krandor

Comic Vine Scraper 1 year 10 months ago #44233

  • whinkle
  • Fresh Boarder
  • Posts: 18
  • Thank you received: 3
  • Karma: 1
The ComicRack community has probably just saved ComicVine. Maybe they know it.

I’m not a coder. But I have a lot of e-comics. When I discovered ComicRack and CVS, I began posting to the ComicVine wiki with updates/info on comics they did not have, because I have always thought that on a wiki, everyone should give back.

If you update the wiki diligently, it’s not hard to rocket to the top of ComicVine contributors. I have 46704 wikipoints now. I only did that as a way of paying back access to the API because of CVS.

I stopped contributing to the CV wiki when they became such jerks about the API.

I am glad to hear that this issue may be resolved. We CR folks need CVS. And CV needs CVS.

Hope this is a win-win.

Wade Hinkle
The following user(s) said Thank You: 600WPMPO

Comic Vine Scraper 1 year 10 months ago #44234

  • solidus0079
  • Senior Boarder
  • Posts: 66
  • Thank you received: 3
  • Karma: 1
whinkle wrote:
The ComicRack community has probably just saved ComicVine. Maybe they know it.

I’m not a coder. But I have a lot of e-comics. When I discovered ComicRack and CVS, I began posting to the ComicVine wiki with updates/info on comics they did not have, because I have always thought that on a wiki, everyone should give back.

If you update the wiki diligently, it’s not hard to rocket to the top of ComicVine contributors. I have 46704 wikipoints now. I only did that as a way of paying back access to the API because of CVS.

I stopped contributing to the CV wiki when they became such jerks about the API.

I am glad to hear that this issue may be resolved. We CR folks need CVS. And CV needs CVS.

Hope this is a win-win.

Wade Hinkle
Yeah, I mean I get having to look out for the health and safety of their website and database, but their tone and solutions were totally wrong on this. Web 2.0 (is that even a term anymore?) is tricky business since everything's crowdsourced. Sure, we're scraping their site top to bottom, but at the same time many of those people are the reason why there's stuff to be scraped.

I'm not an editor at your level by any means, but I'm correcting and updating stuff when I see it. Like a Doctor Strange issue that was titled "Dr. Strange". It drove me crazy having Doctor Strange split into two different titles in ComicRack. I doubt anyone would have noticed it otherwise. CR and CVS give a totally different perspective and new sets of eyes on the data.

It does seem that they've come to their senses.
Last Edit: 1 year 10 months ago by solidus0079.

Comic Vine Scraper 1 year 10 months ago #44235

  • oraclexview
  • Moderator
  • aka SoundWave
  • Posts: 906
  • Thank you received: 182
  • Karma: 37
solidus0079 wrote:
I'm not an editor at your level by any means, but I'm correcting and updating stuff when I see it. Like a Doctor Strange issue that was titled "Dr. Strange". It drove me crazy having Doctor Strange split into two different titles in ComicRack. I doubt anyone would have noticed it otherwise.
In your example this was actually done on purpose. Doctor Strange and Dr. Strange were two completely separate titles. They weren't the same series, hence the separate entries. So not the best example, yet I do get your point and agree. :P B) ComicVine needs the scraper and the ComicRack community as much as they both need ComicVine. It's a symbiotic relationship, and I'm glad that cbanack came out of retirement for one last Rescue Ranger hail mary save in overtime to help keep that relationship intact.


cbanack wrote:
Hey guys, the latest version of Comic Vine Scraper (1.0.91) is now available at the...
My friend, you deserve more thanks and karma than this site will allow to be given in one sitting. As always, you're the man!

One question though... with this release, have you reimplemented the features of the scraper (albeit updated to work more efficiently with the new API limits) that you disabled earlier due to the initial API issues?
Last Edit: 1 year 10 months ago by oraclexview.

Comic Vine Scraper 1 year 10 months ago #44236

  • krandor
  • Gold Boarder
  • Posts: 190
  • Thank you received: 16
  • Karma: 2
solidus0079 wrote:
whinkle wrote:
The ComicRack community has probably just saved ComicVine. Maybe they know it.

I’m not a coder. But I have a lot of e-comics. When I discovered ComicRack and CVS, I began posting to the ComicVine wiki with updates/info on comics they did not have, because I have always thought that on a wiki, everyone should give back.

If you update the wiki diligently, it’s not hard to rocket to the top of ComicVine contributors. I have 46704 wikipoints now. I only did that as a way of paying back access to the API because of CVS.

I stopped contributing to the CV wiki when they became such jerks about the API.

I am glad to hear that this issue may be resolved. We CR folks need CVS. And CV needs CVS.

Hope this is a win-win.

Wade Hinkle
Yeah, I mean I get having to look out for the health and safety of their website and database, but their tone and solutions were totally wrong on this. Web 2.0 (is that even a term anymore?) is tricky business since everything's crowdsourced. Sure, we're scraping their site top to bottom, but at the same time many of those people are the reason why there's stuff to be scraped.

I'm not an editor at your level by any means, but I'm correcting and updating stuff when I see it. Like a Doctor Strange issue that was titled "Dr. Strange". It drove me crazy having Doctor Strange split into two different titles in ComicRack. I doubt anyone would have noticed it otherwise. CR and CVS give a totally different perspective and new sets of eyes on the data.

It does seem that they've come to their senses.

Definitely agree. I update stuff all the time when I see it is wrong. The one that often gets me is dates that are wrong. Since I often sort by published date, seeing an issue out-of-order is a good sign there is an issue with the date fields.
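
The sort-by-date check can be sketched in a few lines of Python. It assumes each issue is an (issue number, published date) pair for a single series, and it flags any issue dated earlier than the latest date seen among lower-numbered issues. The function name and the data layout are made up for illustration, and a flagged issue is only a suspect: the wrong date could just as well belong to an earlier issue that was dated too late.

```python
from datetime import date


def out_of_order(issues):
    """Return issue numbers dated earlier than the latest date seen before them.

    issues -- iterable of (issue_number, published_date) for one series
    """
    suspects = []
    latest = None  # latest published date among the issues accepted so far
    for number, published in sorted(issues, key=lambda item: item[0]):
        if latest is not None and published < latest:
            suspects.append(number)
        else:
            latest = published
    return suspects
```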

Comic Vine Scraper 1 year 10 months ago #44237

  • cbanack
  • Platinum Boarder
  • Posts: 1318
  • Thank you received: 503
  • Karma: 181
oraclexview wrote:
One question though... with this release, have you reimplemented the features of the scraper (albeit updated to work more efficiently with the new API limits) that you disabled earlier due to the initial API issues?

Unfortunately no, those features were disabled because they went 'outside' the API and accessed the Comic Vine website directly. That is not allowed, but we never knew that until they implemented a new system that detects that kind of behaviour and bans it.

There was some talk (see comment #2) about putting the data that those features need into the API, but so far it hasn't happened.
The following user(s) said Thank You: krandor