TOPIC: Comic Vine Scraper

Comic Vine Scraper 6 months 1 week ago #47435

  • Crazher
  • Fresh Boarder
  • Posts: 10
  • Karma: 1
I had the latest ComicVine Scraper (1.0.93), but I reinstalled it just in case. It still does not update the day, just the year and month.

i.imgur.com/bLcnoHh.png - Notice how the day is not listed. I have re-scraped after reinstalling 1.0.93.
comicvine.gamespot.com/the-unbeatable-sq...-girl-1/4000-475464/ - Clearly shows the day.

If this is supposed to work, I hope I can sort it out, as I would love to sort my comics with that little extra touch of having the release date as well, since my sorting usually goes \Year\Month\Day. It's not a big issue obviously, but it would be a nice touch to have :)
Last Edit: 6 months 1 week ago by Crazher.
The administrator has disabled public write access.

Comic Vine Scraper 6 months 1 week ago #47436

  • boshuda
  • Gold Boarder
  • Posts: 295
  • Thank you received: 64
  • Karma: 8
ComicVine doesn't want there to be published days in their database. They only want days for the release dates. On their site, I believe it's 'Cover Date' versus 'In Store Date'. Published is month/year only. On shelves has the day, month, and year. This corresponds to the Wednesdays that comics traditionally hit the shelves. Based on your picture of the ComicVine website, one of two things probably happened: they changed that restriction for Cover Date recently, or whoever is putting the 'Cover Date' day in there is going to get yelled at by PikaHyper. I know, because I was yelled at for doing that :blush: .

So, if you want to sort by day as well, you have to use the Released date in ComicRack. That date doesn't appear on the screen you showed. You have to enable the catalog tab somewhere under Preferences, and/or you can enable the 'Released' column in the details view in the library browser. I often do the latter. Unfortunately, you can't use that data to sort when you sync to your Android (and probably iOS) device. At least not as far as I've been able to find.

Comic Vine Scraper 6 months 1 week ago #47437

  • cbanack
  • Platinum Boarder
  • Posts: 1327
  • Thank you received: 508
  • Karma: 182
Hi guys,

I took an hour this afternoon to look into this, and it turns out Crazher is not imagining things. :laugh: The scraper's ability to scrape the publication DAY has been deliberately disabled. It hasn't been scraping the DAY for at least 4 years! I had completely forgotten about this, and to my everlasting shame, the reason why I made this change is now lost forever, thanks to the very poor comment I added to the code when I did it.

The relevant code is in cvdb.py. It looks like this:

         issue.pub_year_n = parts[0] if len(parts) >= 1 else None
         issue.pub_month_n = parts[1] if len(parts) >=2 else None
         # corylow: can we ever add this back in??
         #issue.pub_day_n = parts[2] if len(parts) >= 3 else None

Notice the last two lines that start with # (which means they are commented out). The last line in particular is key -- that's the line that scrapes the publication DAY. But it's 'turned off' by that '#' on the front. The line above it (which is meant to explain) is marvelously cryptic. What was I thinking? I don't know. I must have had a good reason...but it's long gone from my mind now.
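
To make the effect of that '#' concrete, here is a minimal, self-contained sketch. It is not the actual cvdb.py code: the Issue class, the parse_pub_date function, the scrape_day flag, and the assumption that the date arrives as a 'YYYY-MM-DD' style string split on '-' are all made up for illustration.

```python
class Issue(object):
    """Hypothetical stand-in for the scraper's issue object."""
    pub_year_n = None
    pub_month_n = None
    pub_day_n = None


def parse_pub_date(date_text, scrape_day=False):
    """Fill in year/month (and optionally day) from a 'YYYY-MM-DD' string.

    ASSUMPTION: the real scraper may obtain and split its date differently.
    scrape_day=False mimics the commented-out line; True mimics removing '#'.
    """
    issue = Issue()
    parts = [int(p) for p in date_text.split('-')] if date_text else []
    issue.pub_year_n = parts[0] if len(parts) >= 1 else None
    issue.pub_month_n = parts[1] if len(parts) >= 2 else None
    if scrape_day:
        issue.pub_day_n = parts[2] if len(parts) >= 3 else None
    return issue
```

With scrape_day left off, the day stays empty no matter what the date string contains, which matches what Crazher is seeing. With it on, a date that only has a year and month still leaves the day empty instead of failing.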

Anyway, all I need to do is remove the # from the front of that last line and rebuild a new version of the Scraper, and we should start scraping the publication DAY as Crazher is requesting. But I'm not going to do that just yet, because I'm worried that whatever caused me to disable this years ago might still be a lingering problem. I don't want to break the scraper for everyone.

Instead, maybe some of you guys could test it out first? If a number of you try it, and lots of scraping is completed without any issues, then I will put the change into a new 'official' release of the scraper for everyone.

Testing should be easy:

Just find the cvdb.py source file that is installed on your computer. It's in the folder where ComicRack stores its installed plugins, which I believe is ..\ComicRack\Data\Scripts\ComicVineScraper\. (Someone around here will know the exact spot.) Then open cvdb.py, go to line 518 and change it from this:

#issue.pub_day_n = parts[2] if len(parts) >= 3 else None

to this:

issue.pub_day_n = parts[2] if len(parts) >= 3 else None

That should involve pressing the delete or backspace key exactly once; don't change spaces or anything else. Voila! If you weren't before, you are now a programmer. B)

Save the file, then open ComicRack and try scraping. You should see the DAY of publication being scraped now (it's in the info dialog for your comic) as well as the MONTH and YEAR. Scrape a bunch more comics (or just continue your regular scraping activities) but keep an eye out for problems. Hopefully there will be none. If enough people report back here that they've tried this and they aren't running into any issues, then I'll make the change official.
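
If you'd rather not edit by hand, the one-character change described above could be sketched roughly like this. The helper name enable_day_scraping is hypothetical, and it only works on text you pass in. Reading and writing the real cvdb.py (and backing it up first) is left to you, since the exact install folder may differ on your machine.

```python
import re

# Matches the disabled line: optional indentation, a single '#', then the
# pub_day_n assignment. The explanatory comment line above it is not matched.
DISABLED_DAY_LINE = re.compile(r'^([ \t]*)#(issue\.pub_day_n\b.*)$', re.MULTILINE)


def enable_day_scraping(source_text):
    """Return (new_text, count) with the leading '#' removed from the
    pub_day_n line. Indentation and everything else are left untouched.
    count is how many lines were changed (0 if already enabled)."""
    return DISABLED_DAY_LINE.subn(r'\1\2', source_text)
```

A count of 1 means the line was found and re-enabled. A count of 0 means the file was already patched, or the line looks different in your version, in which case it is safer to edit by hand.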

Cheers!
Last Edit: 6 months 1 week ago by cbanack.
The following user(s) said Thank You: Crazher

Comic Vine Scraper 6 months 1 week ago #47438

  • Crazher
  • Fresh Boarder
  • Posts: 10
  • Karma: 1
Sounds awesome :D

Well, I did find this from 2013: github.com/cbanack/comic-vine-scraper/issues/271 while searching for it earlier, where you mentioned that it was due to ComicVine not liking how the Scraper was accessing the information. Perhaps that was it? Although someone did post later, back in July 2016, that there is an API feature to do it (for the store date, that is).

I tried your hotfix, and it worked: i.imgur.com/Q3MszUd.png :) Whether there is an issue with how it accesses the information and the load on ComicVine is something outside my scope of knowledge, though.

Now... Another question.. Can I make the "Number of" be scraped too? xD As in, this is #2 of "X"?

Edit: Also, quick note, you misspelled the file in this part "Just find the csvb.py source" :P
Last Edit: 6 months 1 week ago by Crazher.

Comic Vine Scraper 6 months 1 week ago #47439

  • oraclexview
  • Moderator
  • aka SoundWave
  • Posts: 906
  • Thank you received: 182
  • Karma: 37
Lol, and here I was thinking Crazher was talking about the comic's Release Date.

Great find cbanack about the comic's Cover Date.

Comic Vine Scraper 6 months 1 week ago #47440

  • cbanack
  • Platinum Boarder
  • Posts: 1327
  • Thank you received: 508
  • Karma: 182
Crazher wrote:
Now... Another question.. Can I make the "Number of" be scraped too? xD As in, this is #2 of "X"?

Not as easily as re-enabling a commented-out line of code, that's for sure. :laugh:

In fact (though I can't remember the details precisely) it's likely not a possibility at all. When the scraper was originally written, we went through the data that ComicVine's interface provides very carefully, and tried to make use of anything we could. So even without spending another hour looking into it, I can still be 99% sure that if this information isn't being scraped, it's simply because it is not available to us. That's true regardless of whether or not it is visible on the ComicVine website.

Edit: Also, quick note, you misspelled the file in this part "Just find the csvb.py source" :P

Oops, that's what I get for rushing. Thanks for pointing it out, I've fixed it. :)
Last Edit: 6 months 1 week ago by cbanack.

Comic Vine Scraper 6 months 1 week ago #47441

  • Reason
  • Junior Boarder
  • Posts: 32
  • Thank you received: 1
  • Karma: -1
for some reason I'm not finding the file to change :(

Comic Vine Scraper 6 months 1 week ago #47442

  • Crazher
  • Fresh Boarder
  • Posts: 10
  • Karma: 1
Aww, that's too bad, but nevertheless thanks a ton for this plugin. Even after all these years it's still working perfectly and has come to great use :D And I am glad I got the day part in there too!

Reason wrote:
for some reason I'm not finding the file to change :(

Reason: It's in C:\Users\YOURNAME\AppData\Roaming\cYo\ComicRack\Scripts\Comic Vine Scraper\
Last Edit: 6 months 1 week ago by Crazher.

Comic Vine Scraper 6 months 1 week ago #47443

  • Reason
  • Junior Boarder
  • Posts: 32
  • Thank you received: 1
  • Karma: -1
Crazher wrote:
Aww, that's too bad, but nevertheless thanks a ton for this plugin. Even after all these years it's still working perfectly and has come to great use :D And I am glad I got the day part in there too!

Reason wrote:
for some reason I'm not finding the file to change :(

Reason: It's in C:\Users\YOURNAME\AppData\Roaming\cYo\ComicRack\Scripts\Comic Vine Scraper\

Thanks, found it. Going to give it a try now.

Comic Vine Scraper 6 months 1 week ago #47444

  • boshuda
  • Gold Boarder
  • Posts: 295
  • Thank you received: 64
  • Karma: 8
cbanack wrote:
Crazher wrote:
Now... Another question.. Can I make the "Number of" be scraped too? xD As in, this is #2 of "X"?

Not as easily as re-enabling a commented-out line of code, that's for sure. :laugh:

In fact (though I can't remember the details precisely) it's likely not a possibility at all. When the scraper was originally written, we went through the data that ComicVine's interface provides very carefully, and tried to make use of anything we could. So even without spending another hour looking into it, I can still be 99% sure that if this information isn't being scraped, it's simply because it is not available to us. That's true regardless of whether or not it is visible on the ComicVine website.

Edit: Also, quick note, you misspelled the file in this part "Just find the csvb.py source" :P

Oops, that's what I get for rushing. Thanks for pointing it out, I've fixed it. :)

I think a lot of the problem is that the number of issues they supply is wrong. So you would have to pull the total entries and count them yourself. But that adds the problem of series that change numbering in the middle, then pick it back up, like Amazing Spider-Man. So while there are probably only 500 total entries, the last issue in the series is closer to 700. Which number should be used? Not to mention those weird #1000000 issues from DC...
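
To show why the two answers disagree, here is a rough sketch. The function name and the sample issue numbers are invented for illustration, and nothing here talks to ComicVine. It just takes a list of issue numbers as text and reports both candidates.

```python
def issue_count_candidates(issue_numbers):
    """Return both possible answers for 'this is #N of X'.

    'entries' counts every entry in the list; 'highest_number' is the
    largest issue number that parses as a number. Non-numeric entries
    (hypothetical examples: 'Annual') still count as entries.
    """
    numeric = []
    for number in issue_numbers:
        try:
            numeric.append(float(number))
        except ValueError:
            pass  # not a plain number, so it cannot be the highest
    return {
        "entries": len(issue_numbers),
        "highest_number": max(numeric) if numeric else None,
    }
```

For a series that skips from #3 to #600, the entry count stays small while the highest number jumps, and a single #1000000 issue would push the highest number to a value that is useless as a count. Neither answer is obviously the right "of X".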