
TOPIC: Comic Vine Scraper

Comic Vine Scraper 1 year 1 week ago #49876

  • fieldhouse
  • Expert Boarder
  • Posts: 96
  • Thank you received: 10
  • Karma: 1
cbanack wrote:
Drybonz wrote:
Here's a minor search issue I found today. If you search "Chamber of Chills" it does not show "Chamber of Chills Magazine" (sometimes this is called Chamber of Chills v2) as a search result. If I search just "Chamber" it does show "Chamber of Chills Magazine" as a result.

*edit* Same problem for the title "This Magazine is Haunted". Looks like the word "magazine" is screwing things up.

Perfect, this is exactly the info I need. I'm traveling right now, but when I get home this should be pretty straightforward to fix. The scraper tries to 'fix' queries before sending them to Comic Vine, but that's now causing more problems than it solves, so I just have to take that bit out.

Anything with a colon gets borked. I suspect that the colon is being included as part of the word during the filtering. Why you can't search for everything after the colon is beyond me.

Example:
1001 Arabian Knights - The Adventures of Sinbad ...turns up nothing
The Adventures of Sinbad ...turns up nothing
1001 Arabian Knights ...again, nada
1001 Arabian Knights: The Adventures of Sinbad ...nope, doesn't exist :pinch:
1001 Arabian ...bingo! hey, did you mean 1001 Arabian Knights: The Adventures of Sinbad? :silly:
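
To illustrate the suspicion above, here is a minimal sketch of how a colon can end up glued to a word when a query is split on whitespace only. This is not the scraper's actual filtering code; `naive_tokens` and `cleaned_tokens` are hypothetical helpers written just for this example.

```python
import string

def naive_tokens(query):
    # Split on whitespace only, so punctuation stays attached to the word.
    return query.lower().split()

def cleaned_tokens(query):
    # Strip leading/trailing punctuation from each token and drop empty ones.
    stripped = (tok.strip(string.punctuation) for tok in query.lower().split())
    return [tok for tok in stripped if tok]

title = "1001 Arabian Knights: The Adventures of Sinbad"
print(naive_tokens(title))    # contains 'knights:' (colon still attached)
print(cleaned_tokens(title))  # contains 'knights'
```

With the naive split, "knights:" would never match a plain "knights", which would be consistent with the searches above failing until the query is cut off before the colon.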
The administrator has disabled public write access.

Comic Vine Scraper 1 year 1 week ago #49877

  • Drybonz
  • Gold Boarder
  • Posts: 318
  • Thank you received: 3
  • Karma: 11
cbanack wrote:
Perfect, this is exactly the info I need. I'm traveling right now, but when I get home this should be pretty straightforward to fix. The scraper tries to 'fix' queries before sending them to Comic Vine, but that's now causing more problems than it solves, so I just have to take that bit out.

Thanks for staying active and keeping the scraper up to date. I don't know what I would do without it. Thank you.

Comic Vine Scraper 1 year 3 days ago #49964

  • cbanack
  • Platinum Boarder
  • Posts: 1371
  • Thank you received: 566
  • Karma: 188
Hi guys,

There's a new release of Comic Vine Scraper (v 1.0.96) available in the usual place.

There was some old code in the scraper that didn't play very well with the new Comic Vine Search API, so I removed it. This should make searches work a lot better. (The scraper should find your comic on the first try more often now, without you having to strip words out of the search.) More details here, if you're interested.

Happy Scraping!
Last Edit: 1 year 3 days ago by cbanack.
The following user(s) said Thank You: forkicks, Drybonz, oraclexview, rmagere, Xelloss, Yhoogi, fieldhouse, Dereck, Targg, pueblo, and 3 others

Comic Vine Scraper 1 year 2 days ago #49966

  • ghoti
  • Junior Boarder
  • Posts: 25
  • Thank you received: 4
  • Karma: 0
thanks for all you do cbanack. cheers mate

Comic Vine Scraper 1 year 2 days ago #49967

  • Targg
  • Senior Boarder
  • Posts: 55
  • Thank you received: 11
  • Karma: 5
Thanks for the update! It is working much better now!

The update seems to have overwritten the option to not scrape the day when scraping the release date (day, month, year). I know that there is a way to scrape only the month and year, but I can't find where it was shown in the forums. Does anyone know where in the code/options to turn off scraping of the day field?

Comic Vine Scraper 1 year 2 days ago #49983

  • n8thagr8
  • Fresh Boarder
  • Posts: 13
  • Karma: 0
Looks like that was added in the 1.0.95 update. So it's not that you used to be able to turn it off; before that release the scraper didn't scrape the day at all.

github.com/cbanack/comic-vine-scraper/releases

Released Feb 17, 2018 (see all changes here)

-fixed issue searching for comics with 'vs' in the title.
-scraper now includes the release 'day' in scraped metadata, where possible (instead of just 'month' and 'year'.)
-the 'Direct URL' search feature works better now: you can paste the full URL of a Comic Vine volume OR issue into the search dialog to 'short circuit' the search and force the scraper to find the series that you are looking for.

Comic Vine Scraper 1 year 2 days ago #49984

  • n8thagr8
  • Fresh Boarder
  • Posts: 13
  • Karma: 0
I did find how to turn it off, though: open the cvdb.py file in Notepad and press Ctrl+F to search for "published". You'll see this section:
# grab the published (front cover) date
   if "cover_date" in dom.results.__dict__ and \
      is_string(dom.results.cover_date) and \
      len(dom.results.cover_date) > 1:
      try:
         parts = [int(x) for x in dom.results.cover_date.split('-')]
         issue.pub_year_n = parts[0] if len(parts) >= 1 else None
         issue.pub_month_n = parts[1] if len(parts) >=2 else None
         issue.pub_day_n = parts[2] if len(parts) >= 3 else None
      except:
         pass # got an unrecognized date format...? should be "YYYY-MM-DD"

All you need to do is comment out the line about pulling the day so it looks like this:
# grab the published (front cover) date
   if "cover_date" in dom.results.__dict__ and \
      is_string(dom.results.cover_date) and \
      len(dom.results.cover_date) > 1:
      try:
         parts = [int(x) for x in dom.results.cover_date.split('-')]
         issue.pub_year_n = parts[0] if len(parts) >= 1 else None
         issue.pub_month_n = parts[1] if len(parts) >=2 else None
         # issue.pub_day_n = parts[2] if len(parts) >= 3 else None
      except:
         pass # got an unrecognized date format...? should be "YYYY-MM-DD"

Then save the file and restart ComicRack, and it won't pull the published day anymore.

As always, be sure to save a copy of that file before messing with it. To find the file, go to Edit->Preferences->Scripts and double-click on Comic Vine.
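
If you want to try the date-splitting logic outside the scraper first, here is a small standalone sketch of the same idea. `parse_cover_date` and its `include_day` flag are hypothetical names made up for this example (they are not part of cvdb.py); the only thing taken from the snippet above is the "YYYY-MM-DD" split.

```python
def parse_cover_date(cover_date, include_day=True):
    """Return (year, month, day) from a 'YYYY-MM-DD' string.

    Parts that are missing, or skipped via include_day=False, come back as None.
    """
    try:
        parts = [int(x) for x in cover_date.split('-')]
    except (AttributeError, ValueError):
        # Not a string, or not made of numeric parts: treat as unknown.
        return (None, None, None)
    year = parts[0] if len(parts) >= 1 else None
    month = parts[1] if len(parts) >= 2 else None
    day = parts[2] if include_day and len(parts) >= 3 else None
    return (year, month, day)

print(parse_cover_date("2018-02-17"))                     # (2018, 2, 17)
print(parse_cover_date("2018-02-17", include_day=False))  # (2018, 2, None)
```

Passing include_day=False has the same effect as commenting out the pub_day_n line: year and month are kept and the day is left empty.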

Comic Vine Scraper 1 year 2 hours ago #50009

  • pueblo
  • Junior Boarder
  • Posts: 30
  • Thank you received: 3
  • Karma: 0
Hi folks,
Is it just me or is Comic Vine Scraper always ignoring volume and year information?
For example, I have a comic book:

Series: Adventure Comics
Year: 2010
Volume: 2009
Number: 519

Each time I run CV Scraper it shows an additional window, “Choose a Comic Book Series”, and first on the list is Adventure Comics from vol 1938.
The correct one (from vol 2009) is second on the list. I always have to choose it manually.
Am I doing something wrong? How can I force it to check the volume or year? Maybe it’s a CV Search API limitation?
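
To make the expected behaviour concrete, here is a rough sketch of picking a series by matching the Volume field against each candidate's start year. It is not how the scraper actually chooses a series; `pick_series` and the (name, start year) tuples are assumptions for illustration, using the two Adventure Comics volumes mentioned above.

```python
def pick_series(candidates, volume_year):
    """candidates: list of (name, start_year) tuples in search-result order."""
    matches = [c for c in candidates if c[1] == volume_year]
    # Auto-pick only when exactly one candidate matches; otherwise
    # return None so the user can be asked to choose.
    return matches[0] if len(matches) == 1 else None

results = [("Adventure Comics", 1938), ("Adventure Comics", 2009)]
print(pick_series(results, 2009))  # ('Adventure Comics', 2009)
```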

Comic Vine Scraper 11 months 4 weeks ago #50011

  • Targg
  • Senior Boarder
  • Posts: 55
  • Thank you received: 11
  • Karma: 5
Is it the only series? I can reproduce the error with Adventure Comics, but on every other series I try with multiple volumes, it picks the right volume if the volume field is filled in.

Comic Vine Scraper 11 months 4 weeks ago #50012

  • pueblo
  • Junior Boarder
  • Posts: 30
  • Thank you received: 3
  • Karma: 0
I’ve tested a few other series. It’s better. The volume is selected correctly, but I still have to manually confirm each issue.

Adventures of Superman #465 V1987 – shows additional window (unnecessary) but at least correct volume is selected
All-American Comics #61 V1939 – same as above
Detective Comics #554 V1937 – same as above

When I run CV Scraper for a few Detective Comics issues (460, 471, 481) with the volume field filled in (they are, however, each in different folders), this additional window shows up for each issue. I have to confirm it manually every time, even though I have the option ‘try to choose the correct series automatically’ (in the CV Scraper settings) selected.
Any ideas how to speed up this process?