Python Scripts for ComicRack

TOPIC: New Advanced Option for ComicVineScraper

New Advanced Option for ComicVineScraper 4 years 8 months ago #30961

  • Wolverutto
  • Offline
  • Senior Boarder
  • Posts: 42
  • Karma: 0
Hi

I added a new advanced option to ComicVine scraper. I am neither a ComicVine Scraper developer nor one of the creators, so I hope I didn't need to ask for permission. The code below has been tested and it works perfectly, but if you want to feel safe you can wait until one of the developers checks it before using it. To install, just replace these two files in the ComicVine scraper folder.

New Advanced Option
MINIMUM_SERIES_LENGTH=XXXX (any numeric value)
Thanks to this option, the scraper will always ask you to confirm before searching the database when "series" is a very short word, like "ma", "3a" etc.
The default value is 3: series names of one or two characters trigger the confirmation, while names of three or more characters are scraped as usual.

I added this option because when the value of "series" is a very short word (1 or 2 characters), the scraper searches the ComicVine database for all the possible matches, which for words like "ma" can number in the thousands or more. Also, scraping cannot be stopped at that point; you need to wait until it downloads all the possible matches, and that takes a while, especially with a slow connection.
Sometimes I scrape a stack of 100 comics or more. Before doing so I use the TagsFromName script to obtain the series name, but occasionally you get names like "ma" or "xx" if the file name doesn't follow the standard pattern.

So I altered two files, "scrapeengine.py" and "configuration.py".
All my changes have been highlighted with a "Wolverutto" comment.
Basically, I just added a new integer option to the config class and this code to the engine:
if len(book.series_s) < self.config.min_series_length_n:
    manual_search_b = True
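
For readers who want to try the idea outside the scraper, here is a minimal, self-contained sketch of the same check. It is not the code from the attached zip: the `Config` class and the `needs_manual_search` function are hypothetical stand-ins, and only the attribute name `min_series_length_n` and the default of 3 come from the post above.

```python
class Config(object):
    """Hypothetical stand-in for the scraper's config class (an assumption)."""

    def __init__(self, min_series_length_n=3):
        # Minimum number of characters a series name needs before the
        # scraper searches automatically; 3 is the default named in the post.
        self.min_series_length_n = min_series_length_n


def needs_manual_search(series_s, config):
    """Return True when the series name is shorter than the configured minimum."""
    return len(series_s) < config.min_series_length_n


config = Config()
for name in ["ma", "3a", "X-O", "Batman"]:
    print("%s -> %s" % (name, needs_manual_search(name, config)))
```

With the default of 3, "ma" and "3a" are flagged for confirmation, while "X-O" and "Batman" are not.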

File Attachment:

File Name: CV_min_length.zip
File Size: 15 KB
Hic
Last Edit: 4 years 8 months ago by Wolverutto. Reason: Files fixed
The administrator has disabled public write access.

Re: New Advanced Option for ComicVineScraper 4 years 8 months ago #31101

  • cbanack
  • Offline
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
This is an interesting problem...I've left a note in the Comic Vine Scraper issue tracker for me to look into this again at some point, perhaps I'll integrate your solution or something similar into the main codebase for the scraper.

Are there very many series out there with very short, common names that cause this problem? For me this problem manifests itself more often when I'm trying to scrape series like "Batman" or "Spiderman" or "Superman", because those terms appear in the names of so many series. (The actual problem here is that ComicVine's API won't let you load more than 20 results for each web query, so it takes much longer to load 200 or 2000 search results than it should.)
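
To make the cost of that limit concrete, here is a rough sketch of the arithmetic, assuming only the 20-results-per-query figure mentioned above. The function name `requests_needed` is made up for illustration, and nothing here calls the real ComicVine API.

```python
PAGE_SIZE = 20  # per-query result limit mentioned in the post above


def requests_needed(total_results, page_size=PAGE_SIZE):
    """Number of web queries needed to page through total_results matches."""
    if total_results <= 0:
        return 0
    # Integer ceiling division: a partly filled last page still costs a query.
    return (total_results + page_size - 1) // page_size


for total in (20, 200, 2000):
    print("%d results -> %d queries" % (total, requests_needed(total)))
```

So 200 matches take 10 round trips and 2000 matches take 100, which is why broad search terms feel slow.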

Note that you CAN cancel your scrape if you accidentally search for "ma" or something like that. Just click on the cancel button or close the progress bar window. It's not very responsive (unfortunately there's not much I can do about that).

Also, another way you can bypass this problem is by telling the scraper that a specific folder always contains comics from one particular series. This can be done with CVINFO files:

code.google.com/p/comic-vine-scraper/wiki/AdvancedFeatures
Last Edit: 4 years 8 months ago by cbanack.

Re: New Advanced Option for ComicVineScraper 4 years 8 months ago #31102

  • Wolverutto
  • Offline
  • Senior Boarder
  • Posts: 42
  • Karma: 0
Oh, I don't know why, but I was convinced that you couldn't stop the procedure. I tried just now and it stopped, so I don't think that my addition is really necessary.

So you created the script?
I have to thank you: apart from being a good tool for comic lovers, it is well documented. I had never seen a line of Python in my entire life, but it didn't take long to work out what to add, thanks to all your comments throughout the code.
Bye
Hic

Re: New Advanced Option for ComicVineScraper 4 years 8 months ago #31104

  • cbanack
  • Offline
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
Yes, I wrote pretty much all of the code that currently makes up the scraper, although it was originally based on a Comic Vine project that was started a few years ago by some other users (at least one of them is still active in this forum).

I'm glad you found the code helpful. As far as I'm concerned, the most important thing when it comes to writing good code is making it easy to understand and maintain. So if someone who has never seen Python in their life was able to go in and make useful modifications like you did, I must be doing something right! :)
The following user(s) said Thank You: donspace

Re: New Advanced Option for ComicVineScraper 4 years 8 months ago #31107

  • Madmatx
  • Offline
  • Platinum Boarder
  • Posts: 457
  • Thank you received: 63
  • Karma: 19
The CVINFO option is great. It's especially nice for titles like "The Spider", "The Avengers" and "Batman", and it certainly would help if you read something called "ma".

I use it for all my top picks that I keep in separate folders outside of the drive where Library Organizer puts everything.

Re: New Advanced Option for ComicVineScraper 4 years 7 months ago #32990

  • CodeComix
  • Offline
  • Fresh Boarder
  • Posts: 18
  • Karma: 0
Would it be possible for the scraper to scrape Story Arc from ComicVine? CR has the ability to group by it, and I think it would be cool to have that....

thanks:)

Re: New Advanced Option for ComicVineScraper 4 years 7 months ago #32991

  • cbanack
  • Offline
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
Well, that depends on what you mean by "story arc".

The scraper already scrapes what ComicVine calls a story arc (e.g. Age of Ultron) into each comic; it puts that data into the "Alternate Series or Storyline Title" field.

The scraper does not (and cannot) populate ComicRack's "Story Arc" field, however, because that information is not available in the ComicVine database.

This is explained in more detail here.
Last Edit: 4 years 7 months ago by cbanack.

Re: New Advanced Option for ComicVineScraper 4 years 7 months ago #32992

  • donspace
  • Offline
  • Senior Boarder
  • Posts: 67
  • Thank you received: 17
  • Karma: 7
Corey ... forgive me if I've said it before, but you deserve all the credit in the world for what you do. Yeah, it's great you adopted the script and maintain it, but what I'm mostly talking about is how responsive you always are when someone brings up a question or whatever.

It borders on ridiculous. It's like, if we don't see a response within a few hours at the most, we start thinking you must have got struck by lightning or something! Thanks for staying on top of it like you do ...

D
Java devil, you are now my bitch.

Re: New Advanced Option for ComicVineScraper 4 years 7 months ago #32993

  • cbanack
  • Offline
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
Thanks for that; I just do what I can... :blush:

It's really cool to have a community of people that are so interested in my little side project. :)
Last Edit: 4 years 7 months ago by cbanack.