TOPIC: Comic Vine Scraper

Comic Vine Scraper 3 years 8 months ago #38739

  • forkicks
  • Platinum Boarder
  • Posts: 871
  • Thank you received: 109
  • Karma: 37
Requiring a per user API key doesn't seem that bad.

Also, loved the part about the Wednesdays :)

fK

Comic Vine Scraper 3 years 8 months ago #38740

  • ClayM
  • Senior Boarder
  • Posts: 60
  • Thank you received: 5
  • Karma: 0
forkicks wrote:
Requiring a per user API key doesn't seem that bad.

Also, loved the part about the Wednesdays :)

fK

Getting your own API key seems reasonable to me!

Comic Vine Scraper 3 years 8 months ago #38743

  • sykoone
  • Expert Boarder
  • Posts: 153
  • Thank you received: 16
  • Karma: 5
If an option were added to allow users to enter their own key, I'd gladly do it to help lighten the load a bit.

Comic Vine Scraper 3 years 8 months ago #38744

  • WraithTDK
  • Junior Boarder
  • Posts: 36
  • Thank you received: 1
  • Karma: 0
KnobblySavage wrote:
Well, for those of us who are already registered Comic Vine users, getting an api key can be done at this page: www.comicvine.com/api/

Would it be possible to add the option of entering our own api-key in the settings, and if the field is left blank it defaults to yours, so nobody is forced to register at CV if they do not want to, but those of us who do can lighten the load on your api key at least a little bit?

(Also: thanks for your great work :))

I think this is a great idea. I don't even think it should be optional. Using the scraper is already optional to begin with. If you're using it, and you're getting that much value from it, I don't think it's unreasonable to ask people to spend 2 minutes to register for a CV account that they don't have to pay for.
I am currently reading every Marvel superhero comic book ever printed, in chronological order, and blogging about the milestones, footnotes, and other interesting moments I read at http://www.wraithscomicjourney. I'll be adding DC when I hit 1985, and other companies when they launch.

Comic Vine Scraper 3 years 8 months ago #38745

  • WraithTDK
  • Junior Boarder
  • Posts: 36
  • Thank you received: 1
  • Karma: 0
cbanack wrote:
Yup, Comic Vine has re-enabled access to my API key, so the scraper is working again. :)

They added some more server hardware this morning to support the extra load, but they're still not very happy with our recent spike in data usage. Comic Vine Scraper is by far the biggest single user of the Comic Vine API; my contact there tells me we generate around half a million database accesses per day (and significantly more on Wednesdays).

What the hell are you people scraping??? :blink:

So anyway, we're officially "on notice". There are a few things we can do to make sure we don't get locked out again:

1) if you're not using the latest version of Comic Vine Scraper (1.0.76), update now please! Some of the older versions (the ones that return WAY too many search results) are really inefficient with the new Comic Vine database. I'm worried that some people are still using those old versions, and if they are, they are putting a significant and needless load on the Comic Vine servers.

2) I know some people like to rescrape their entire collection. Please do this as infrequently as possible. The vast majority of comic book metadata does not ever change once it has been entered, and even the bits that do don't change that frequently. Personally, I only do a full-collection rescrape about once a year.

3) If you've copied any of the Comic Vine scraper code and used it to write your own app, PLEASE be sure that you are using your own API key. API keys are free and easy to get, and you'll only have to update one line of code. If you're not sure whether you're using your own API key, or you want me to help you obtain and install your own key, just PM me.
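Point 2 above (rescrape as rarely as possible) boils down to a simple check before each scrape. A minimal Python sketch, assuming each book carries a `last_scraped` date; that field is hypothetical and used only for illustration, since how ComicRack or the scraper actually records it isn't covered here. The threshold is the once-a-year habit mentioned above.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical threshold: the "about once a year" full rescrape mentioned above.
RESCRAPE_INTERVAL = timedelta(days=365)


def needs_rescrape(last_scraped: Optional[datetime],
                   now: Optional[datetime] = None,
                   interval: timedelta = RESCRAPE_INTERVAL) -> bool:
    """Return True if a book was never scraped, or was last scraped
    at least `interval` ago. `last_scraped` is a hypothetical field."""
    if last_scraped is None:
        return True  # never scraped: one scrape is always justified
    if now is None:
        now = datetime.now()
    return now - last_scraped >= interval


if __name__ == "__main__":
    today = datetime(2013, 6, 1)
    print(needs_rescrape(None, today))                  # True
    print(needs_rescrape(datetime(2013, 3, 1), today))  # False (92 days ago)
    print(needs_rescrape(datetime(2012, 1, 1), today))  # True (over a year ago)
```

Books that come back False are simply skipped, and every skipped book is one less hit on the Comic Vine servers.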

If our combined load on the Comic Vine servers stays really high, Comic Vine will ban my API key again, and Comic Vine Scraper will stop working. If that happens, I'll need to make some significant changes to the scraper: everyone will have to have a Comic Vine account and enter their own API key into the settings dialog before the scraper will work.

I don't really want to do that, so everyone please try not to abuse the Comic Vine servers! ;)

:whistle: Welllll...this is awkward. :whistle:

I'm responsible for a lot of this. A lot a lot. I've been trying to organize my collection for the past few months, and have significantly whittled it down, eliminating duplicates, etc., so that it's now down to 80,000 files.

Now, I've never done a full scrape of everything all at once, but in the early days I must have been scraping tens of thousands a day. Over time I've greatly refined my process, so I'm doing a fraction of what I used to, but I've still probably been doing a few thousand scrapes a day (just not tens of thousands, like I used to). All that being said:

Why I did it:

Basically, it comes down to two things: 1.) I didn't know it would cause a problem. I figured it would check the database one book at a time, and since it only downloaded a small text file's worth of data, and even then only when it found a match, it wasn't a big deal. 2.) My lack of experience with smart lists and methodology left me scraping in a very, very inefficient manner.

How I've cut back *significantly* (and how you can too!)

1.) This line:



Is your best friend. Build smart lists around it, and it will show you only the files that have not been scraped, so you're not wasting your time with rescrapes. When I started, I'd scrape thousands and thousands of books, manually identify the ones it wouldn't recognize, stop for the night, come back, and rescrape EVERYTHING. Using this line saves ME a ton of time and prevents a flood of server hits. If you want to scrape everything that hasn't been scraped in a particular directory, add this line:



2.) Get your files named consistently. Once everything is scraped, you can use the organizer script to rename your files according to metadata, which is awesome. But until then, you can save yourself a lot of time by making sure that your files have fairly "clean" names before you scrape them. Easiest way to do that? Ant Renamer. There are similar products, but this one is free and works well. Get to know it.

For example, I downloaded tons of books that had numbers at the start of the file name. Some were story-arc bundles with the files numbered chronologically; some were just packs where the files had the year/month in front. What I did was gather all the ones with numbers in front, sort them by name, and then use the "character deletion" option to remove all the numbers. The scraper went from recognizing NONE of the files (because it was trying to match the numbers as part of the name) to recognizing the vast majority of them.
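The two tips above (only scrape what hasn't been scraped yet, and clean up file names before scraping) can be sketched together in Python outside ComicRack. This is an illustration under assumptions, not how ComicRack's smart lists or Ant Renamer work internally: the `Book` record and its `cv_id` field are hypothetical stand-ins for "has this book been scraped?", and the regular expression is only a rough equivalent of deleting the leading characters with Ant Renamer.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Book:
    """Hypothetical stand-in for a comic in the library."""
    filename: str
    cv_id: Optional[int] = None  # Comic Vine issue ID; None = never scraped


# Leading digits, spaces, dots, underscores and dashes, e.g. "01 - " or "2012-05 ".
LEADING_NUMBERS = re.compile(r"^[\d\s._-]+")


def unscraped(books: List[Book]) -> List[Book]:
    """Smart-list style filter: keep only books with no Comic Vine ID."""
    return [b for b in books if b.cv_id is None]


def clean_name(filename: str) -> str:
    """Strip a leading numeric prefix; keep the original if nothing is left."""
    cleaned = LEADING_NUMBERS.sub("", filename)
    return cleaned or filename


if __name__ == "__main__":
    library = [
        Book("01 - Amazing Spider-Man 300.cbr"),
        Book("2012-05 Batman 001.cbz", cv_id=12345),  # already scraped: skipped
        Book("Saga 001.cbz"),
    ]
    for book in unscraped(library):
        print(clean_name(book.filename))
    # Amazing Spider-Man 300.cbr
    # Saga 001.cbz
```

Like the character-deletion trick, the regex is blunt: a title that genuinely starts with a number (100 Bullets, say) would lose it, so it makes sense to run it only on files already gathered because they carry a numeric prefix.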

and finally:

My advice & notes to cbanack

1.) First of all, allow me to apologize. For what it's worth, my part in this was entirely a result of ignorance, not malice or lack of caring. I've cut back quite a bit, and I'll ration myself further.

2.) As I said before, I definitely think the individual API idea is a good one. I'm a regular contributor to Comic Vine, and would be happy to use my own API key. In fact, if you can give me a tutorial on HOW (I'm in IT, but I don't do any coding; so you'll have to guide me a bit), I'll be happy to replace yours with mine right now.

3.) I haven't updated the scraper in months, because I didn't know there was an update. Some way to easily check for script updates would be useful.
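An update check like the one suggested in point 3 mostly comes down to comparing version numbers. A minimal Python sketch, assuming versions are dotted numbers like 1.0.76 (the current version cbanack mentions) and that the latest version number has already been fetched from somewhere; where it would be published isn't specified in this thread. Comparing the parts as integers rather than as text matters: as strings, "1.0.9" would sort after "1.0.76".

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "1.0.76" into (1, 0, 76) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def is_outdated(installed: str, latest: str) -> bool:
    """True if the installed scraper version is older than the latest one."""
    return parse_version(installed) < parse_version(latest)


if __name__ == "__main__":
    print(is_outdated("1.0.9", "1.0.76"))   # True: 9 < 76 numerically
    print(is_outdated("1.0.76", "1.0.76"))  # False: already up to date
    print("1.0.9" < "1.0.76")               # False: plain string comparison gets it wrong
```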



Again, really, really sorry. :( :blush:
Last Edit: 3 years 8 months ago by WraithTDK.
The following user(s) said Thank You: 600WPMPO

Comic Vine Scraper 3 years 8 months ago #38746

  • actioncomics
  • Senior Boarder
  • Posts: 43
  • Thank you received: 6
  • Karma: 2
I have to apologize too, because I have scraped a lot this last week.

If possible, could someone give us instructions on where to put our own API key? I have one and would be willing to change it if necessary.

Comic Vine Scraper 3 years 8 months ago #38747

  • kenjio
  • Platinum Boarder
  • Posts: 597
  • Thank you received: 127
  • Karma: 32
I'm also on board with the idea of getting individual API keys.
If enough of us "heavy" scrapers get our own, cbanack's key may not be overloaded, and can still be used for the occasional scrapers out there.
I'm baaaaaaaaaaaaaaack!!

Comic Vine Scraper 3 years 8 months ago #38748

  • perezmu
  • Platinum Boarder
  • Posts: 1114
  • Thank you received: 64
  • Karma: 51
Same here. One user, one API key, so everyone is responsible for their own usage!

As a matter of fact, the numbers don't strike me as too high. Just consider: this week's 0-day releases plus hitlists added up to almost 4,000 comics (yes, Wednesday!)... If just 100 people scraped all of them, that would mean 400,000 comics being scraped every week...
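The back-of-the-envelope numbers above, worked through in Python. The 4,000 comics and 100 users come from the post; the per-day figure simply spreads the weekly total evenly, which is an assumption, since most of it presumably lands on Wednesday.

```python
comics_per_week = 4000  # this week's 0-day releases plus hitlists (approx.)
users = 100             # hypothetical number of people scraping all of them

scrapes_per_week = comics_per_week * users
scrapes_per_day = scrapes_per_week / 7  # spread evenly; real load peaks on Wednesday

print(scrapes_per_week)        # 400000
print(round(scrapes_per_day))  # 57143
```

If each scrape needs more than one API request (a search followed by an issue lookup, for example; this is an assumption about how the scraper works), the request count to compare against the half-million-per-day figure quoted earlier would be a multiple of this.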

Comic Vine Scraper 3 years 8 months ago #38749

  • perezmu
  • Platinum Boarder
  • Posts: 1114
  • Thank you received: 64
  • Karma: 51
HOW TO GET YOUR OWN API KEY TO WORK WITH CVS:

1) Download the latest version of CVS (Comic Vine Scraper) and install it.

2) Register at Comic Vine.

3) Log in to Comic Vine and go to www.comicvine.com/api to grab your API key.

4) In your ComicRack scripts directory (most likely C:\Users\****yourUserName****\AppData\Roaming\cYo\ComicRack\Scripts\Comic Vine Scraper), edit the file cvconnection.py and, on line 29, replace the public API key with your own.

You are all set!
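Why your own key spreads the load: the key travels with every request, so Comic Vine counts each request against whichever key sent it. The Python sketch below builds a search request URL just to show where the key goes. It is not the code in cvconnection.py; the base URL is an assumption (Comic Vine's API address has changed over time, so check the API page above), and the `api_key`, `format`, `resources` and `query` parameters are based on Comic Vine's public API documentation and worth verifying there.

```python
from urllib.parse import urlencode

# Assumption: check www.comicvine.com/api for the current base URL.
BASE_URL = "https://comicvine.gamespot.com/api"


def search_url(api_key: str, query: str, resources: str = "volume",
               base_url: str = BASE_URL) -> str:
    """Build a Comic Vine search URL whose request counts against *your* key."""
    key = (api_key or "").strip()
    if not key:
        raise ValueError("no API key configured")
    params = {
        "api_key": key,  # every request carries the key
        "format": "json",
        "resources": resources,
        "query": query,
    }
    return f"{base_url}/search/?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url("YOUR_KEY_HERE", "Amazing Spider-Man"))
    # https://comicvine.gamespot.com/api/search/?api_key=YOUR_KEY_HERE&format=json&resources=volume&query=Amazing+Spider-Man
```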
The following user(s) said Thank You: 600WPMPO, actioncomics, SiPfan, vc4u, boshuda, KnobblySavage, baltoc, Ron Post-modernist, Couverdude

Comic Vine Scraper 3 years 8 months ago #38751

  • 600WPMPO
  • Moderator
  • Posts: 3788
  • Thank you received: 557
  • Karma: 233
cbanack wrote:
KnobblySavage wrote:
Would it be possible to add the option of entering our own api-key in the settings, and if the field is left blank it defaults to yours, so nobody is forced to register at CV if they do not want to, but those of us who do can lighten the load on your api key at least a little bit?

Yeah, that's not a bad idea. I'll wait and see how things go over the next little while.
I'm also in favour of individual API keys. :)

Only doubt: If, someday, ComicVine bans an individual's key, how would he scrape? :unsure:
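KnobblySavage's suggestion quoted above (use your own key if one is entered in the settings, otherwise fall back to the shared one) is simple to express. A minimal Python sketch; `DEFAULT_API_KEY` is a placeholder, not the scraper's real key, and trying the shared key after a rejected personal one is only one possible answer to the banned-key question, not something this thread decides.

```python
from typing import List, Optional

DEFAULT_API_KEY = "shared-default-key"  # placeholder for the key shipped with the scraper


def keys_to_try(user_key: Optional[str],
                default_key: str = DEFAULT_API_KEY) -> List[str]:
    """Return API keys in the order they should be tried: the user's own
    key first (if the settings field isn't blank), then the shared key."""
    keys = []
    if user_key and user_key.strip():
        keys.append(user_key.strip())
    if default_key not in keys:
        keys.append(default_key)
    return keys


if __name__ == "__main__":
    print(keys_to_try(""))           # ['shared-default-key']
    print(keys_to_try("  my-key "))  # ['my-key', 'shared-default-key']
```

The trade-off: if a banned personal key silently falls back to the shared one, heavy users land right back on the shared key, which is exactly the load this idea is meant to move. Stopping with an error instead would keep that load off.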
Now Playing: The ComicRack Manual (Online)

See my new comics & gadgets on: Tumblr!