Python Scripts for ComicRack

TOPIC: Comic Vine Scraper further Development

Comic Vine Scraper further Development 2 years 1 week ago #43550

  • hyperspacerebel
  • Junior Boarder
  • Posts: 31
  • Thank you received: 9
  • Karma: 1
GCD definitely lags behind sites like ComicVine when it comes to data for newer comics. I find it most useful for older, more niche comics that CV doesn't have, because GCD tends to have at least a modicum of data for them. I also like GCD's story-based data system, where most metadata isn't attached directly to the issue but to the individual stories/sections that make up the comic. So you can find individual writers, character lists, page counts, etc. for each story within the issue, instead of having them all clumped together as issue metadata. I've often wished ComicRack supported defining stories within an issue and letting us tag those, but I imagine integrating something like that would be a huge undertaking, so we'll probably never see it.
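To make the story-based idea concrete, here's a rough Python sketch of such a model, plus the flattening step a tool would need to squeeze per-story credits into a single set of issue-level fields. The class and field names (`Story`, `Issue`, `all_writers`, `total_pages`) are hypothetical and not GCD's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Story:
    """One story/section within an issue, carrying its own credits (hypothetical model)."""
    title: str
    writers: List[str] = field(default_factory=list)
    characters: List[str] = field(default_factory=list)
    page_count: int = 0


@dataclass
class Issue:
    """An issue is mostly a container; the metadata lives on its stories."""
    series: str
    number: str
    stories: List[Story] = field(default_factory=list)

    def all_writers(self) -> List[str]:
        """Flatten per-story writers into one issue-level list, keeping first-seen order."""
        seen: List[str] = []
        for story in self.stories:
            for writer in story.writers:
                if writer not in seen:
                    seen.append(writer)
        return seen

    def total_pages(self) -> int:
        """Issue page count derived from the stories' page counts."""
        return sum(story.page_count for story in self.stories)
```

For example, an issue with a 12-page lead story by "A. Writer" and a 6-page backup by "B. Writer" and "A. Writer" flattens to `["A. Writer", "B. Writer"]` and 18 pages, which is exactly the per-story detail that gets lost when only issue-level fields exist.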

Comic Vine Scraper further Development 2 years 1 week ago #43551

  • Jothay
  • Senior Boarder
  • Posts: 45
  • Thank you received: 23
  • Karma: 6
I'll have some architecture documentation up shortly for how I'm building it; see below:

Windows Server 2012 R2 with
* 4GB RAM (scale up for prod)
* 80-128GB HDD space

Solution will be:
* Hosted on GitHub (once I get the basics set up)
* C# .NET 4.5.2 compiled using Visual Studio (you can download Visual Studio Community Edition to compile it yourself)
* Using the Repository Pattern with Entity Framework 6.1.3 Code First, with the following projects:
* * 01.ComicVine.API: Core interfaces and enums
* * 02.ComicVine.API.DataModel: EF CodeFirst Database Schema
* * 03.ComicVine.API.Models: Concrete Data Transfer Objects
* * 04.ComicVine.API.Repositories: Repository actions
* * 04.ComicVine.API.Repositories.Testing: Unit Tests of Repositories
* * 05.ComicVine.API.Mappings: Mapping between EF Entities and Models
* * 05.ComicVine.API.Mappings.Testing: Unit Tests for Mappings
* * 06.ComicVine.API.BusinessWorkflows: Business Logic to call data based on the Service Request
* * 06.ComicVine.API.BusinessWorkflows.Testing: Unit Tests for BusinessWorkflows
* * 07.ComicVine.API.Services: ServiceStack Services Layer
* * 07.ComicVine.API.Services.Testing: Unit Tests for Services
* Much of the system will be generated using T4 template code generation. This lets the Code First schema serve as the baseline: changes are made there, and a ton of boilerplate code supporting the full system is generated automatically
* Goal is 98% Unit Test coverage on BusinessWorkflows project, 80% everywhere else (we'll probably sit at 95% coverage on full solution)
* A separate solution will be created to contain the Windows service, which will handle pulling deltas from ComicVine and pushing them into the local database.
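The layering above is C#/.NET; purely as an illustration of the same idea (not Jothay's code), here's a minimal Python sketch of a repository behind an interface, a mapping layer from a storage entity to a DTO, and a delta-sync step that only stores records newer than what's already held. All names (`VolumeEntity`, `VolumeModel`, `VolumeRepository`, `apply_deltas`) are hypothetical, and keying deltas on a `date_last_updated` timestamp is an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass
class VolumeEntity:
    """Storage-side row (stands in for an EF Code First entity)."""
    id: int
    name: str
    date_last_updated: datetime  # assumed change-tracking field


@dataclass
class VolumeModel:
    """Data transfer object handed up to the services layer."""
    id: int
    name: str


def to_model(entity: VolumeEntity) -> VolumeModel:
    """Mapping layer: entity -> DTO, dropping storage-only fields."""
    return VolumeModel(id=entity.id, name=entity.name)


class VolumeRepository(ABC):
    """Repository interface; workflows depend on this, not on the database."""

    @abstractmethod
    def get(self, volume_id: int) -> Optional[VolumeEntity]: ...

    @abstractmethod
    def upsert(self, entity: VolumeEntity) -> None: ...


class InMemoryVolumeRepository(VolumeRepository):
    """Test double; a real implementation would talk to the database."""

    def __init__(self) -> None:
        self._rows: Dict[int, VolumeEntity] = {}

    def get(self, volume_id: int) -> Optional[VolumeEntity]:
        return self._rows.get(volume_id)

    def upsert(self, entity: VolumeEntity) -> None:
        self._rows[entity.id] = entity


def apply_deltas(repo: VolumeRepository, incoming: Iterable[VolumeEntity]) -> int:
    """Delta sync: write only records that are new or newer. Returns the count written."""
    written = 0
    for entity in incoming:
        current = repo.get(entity.id)
        if current is None or entity.date_last_updated > current.date_last_updated:
            repo.upsert(entity)
            written += 1
    return written
```

Because the sync step only sees the `VolumeRepository` interface, it can be unit tested against the in-memory double, which is the same reason the plan pairs each layer with its own `.Testing` project.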

Last Edit: 2 years 1 week ago by Jothay.

Comic Vine Scraper further Development 2 years 1 week ago #43564

  • Jothay
  • Senior Boarder
  • Posts: 45
  • Thank you received: 23
  • Karma: 6
Alphabetically, I'm down to Series on the api documentation page.

Comic Vine Scraper further Development 2 years 1 week ago #43567

  • krandor
  • Gold Boarder
  • Posts: 178
  • Thank you received: 13
  • Karma: 2
Jothay wrote:
Alphabetically, I'm down to Series on the api documentation page.

Great work, Jothay. Don't forget, we don't need every single API call listed on their documentation page... only the ones CVS calls.

Comic Vine Scraper further Development 2 years 1 week ago #43568

  • Jothay
  • Senior Boarder
  • Posts: 45
  • Thank you received: 23
  • Karma: 6
In order to store the data that CVS calls, I wanted to be safe and store the contents of all the different tertiary calls as well. Besides, my code generation means I only really write the tables; the rest of the code takes an hour per layer instead of days.
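The actual generator uses T4 templates that emit C#; the underlying idea can be sketched in Python with `string.Template`: the table definitions are the only hand-written input, and each layer's boilerplate is stamped out from them. The tables, columns, and template below are hypothetical, and the output is Python mapping functions rather than C#.

```python
from string import Template

# Hypothetical table definitions: the only part written by hand.
TABLES = {
    "Volume": ["Id", "Name", "StartYear"],
    "Issue": ["Id", "VolumeId", "IssueNumber"],
}

# Hypothetical per-layer template. The literal { } are fine because
# Template only substitutes $-prefixed placeholders.
MAPPING_TEMPLATE = Template(
    "def map_${table}(entity):\n"
    "    return {${fields}}\n"
)


def generate_mappings(tables):
    """Emit one mapping function per table, copying only its declared columns."""
    chunks = []
    for table, columns in tables.items():
        fields = ", ".join(f"'{col}': entity['{col}']" for col in columns)
        chunks.append(MAPPING_TEMPLATE.substitute(table=table.lower(), fields=fields))
    return "\n".join(chunks)
```

Adding a column to `TABLES` and regenerating updates every mapping at once, which is why writing the tables is most of the work. The generated source can be loaded with `exec(generate_mappings(TABLES), namespace)` to try it out.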

Comic Vine Scraper further Development 2 years 1 week ago #43569

  • krandor
  • Gold Boarder
  • Posts: 178
  • Thank you received: 13
  • Karma: 2
Jothay wrote:
In order to store the data that CVS calls, I wanted to be safe and store the contents of all the different tertiary calls as well. Besides, my code generation means I only really write the tables; the rest of the code takes an hour per layer instead of days.

I am not a DB guy and don't claim to be. Whatever is easier and best for you to implement. May make things easier in the future.

Comic Vine Scraper further Development 2 years 1 week ago #43574

  • jkthemac
  • Platinum Boarder
  • Posts: 760
  • Thank you received: 247
  • Karma: 55
It is not a bad idea to replicate the existing functionality, because my guess is that a solid repository will make the existing system look antiquated and prompt the CV guys to change their approach. In an ideal world, simply demonstrating a more up-to-date approach would be enough to make them take it on board as their own.

Comic Vine Scraper further Development 2 years 1 week ago #43576

  • krwren
  • Junior Boarder
  • Posts: 37
  • Thank you received: 1
  • Karma: 1
Looks good. I don't have time to volunteer to help code it, but I will look over the code when posted and give any constructive feedback I see.

Comic Vine Scraper further Development 2 years 1 week ago #43587

  • boshuda
  • Gold Boarder
  • Posts: 280
  • Thank you received: 62
  • Karma: 7
I've got the code branched (cloned, whatever) and have made some version of the changes. I want to make it a more complete removal, clean it up, and test it out before pushing my changes into a proper release. Plus I'd like cbanack's blessing before making a proper release. The quick-and-dirty fix is to simply comment out the offending code in cvdb.py and hope nobody clicks the 'more covers' link. However, since there's a workaround, the urgency seems lessened. I would like to more fully remove the hidden option and the 'more covers' link, as well as change the signature of the function and everything that calls it, all the way out to the option.

cbanack, what do you do to test CVS? You have the Run Unit Tests in there, as well as the Run Comic Vine Scraper which scrapes the simulated books. But do you have some [in]formal testing procedure you follow before releasing? Maybe a checklist of things to look for (version incrementing, readme updating, anything like that)?

Comic Vine Scraper further Development 2 years 1 week ago #43588

  • cbanack
  • Platinum Boarder
  • Posts: 1318
  • Thank you received: 503
  • Karma: 181
boshuda wrote:
I've got the code branched (cloned, whatever) and have made some version of the changes. I want to make it a more complete removal, clean it up, and test it out before pushing my changes into a proper release. Plus I'd like cbanack's blessing before making a proper release. The quick-and-dirty fix is to simply comment out the offending code in cvdb.py and hope nobody clicks the 'more covers' link. However, since there's a workaround, the urgency seems lessened. I would like to more fully remove the hidden option and the 'more covers' link, as well as change the signature of the function and everything that calls it, all the way out to the option.

It looks like the ability to get those two currently problematic pieces of data (community ratings and alternative covers) might be coming to the official Comic Vine API soon. So rather than completely removing all references to that data and the features that use it, you might want to just remove the bad http request from the code and make the data always be blank/empty. I.e. make it so clicking on 'Search for More Covers' just doesn't find anything, and scraping the community rating just never finds any value. The rest of the CVS code should be able to handle not finding any data in these two cases, since that already happens with some comics.

And in the future if those two pieces of data become part of the proper Comic Vine API (or the new API server that's being built), it would be a simple matter to start obtaining those details again during scrapes, and then all the features would just start working again.
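The suggestion above amounts to keeping the function signatures but short-circuiting the HTTP request, so callers see the same "nothing found" result they already handle. A minimal Python sketch, assuming hypothetical function names and a hypothetical module-level switch; these are not the actual cvdb.py functions.

```python
import logging

log = logging.getLogger("scraper")

# Hypothetical switch: flip to True once the official API exposes this data.
FETCH_EXTRA_DETAILS = False


def fetch_more_covers(issue_id):
    """Return alternate cover URLs for an issue (hypothetical name).

    While FETCH_EXTRA_DETAILS is off, no HTTP request is made and an empty
    list is returned, which callers already treat as 'no covers found'.
    """
    if not FETCH_EXTRA_DETAILS:
        log.debug("more-covers lookup disabled; no covers for issue %s", issue_id)
        return []
    raise NotImplementedError("re-enable once an API endpoint for covers exists")


def fetch_community_rating(issue_id):
    """Return the community rating, or None when no value is available (hypothetical name)."""
    if not FETCH_EXTRA_DETAILS:
        log.debug("rating lookup disabled; no rating for issue %s", issue_id)
        return None
    raise NotImplementedError("re-enable once an API endpoint for ratings exists")
```

Nothing above the data layer changes: 'Search for More Covers' simply finds nothing, and re-enabling later means replacing the two `raise` lines with real API calls and flipping the switch.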

Since you're forking the scraper, I would also suggest taking some time to go through the code and rebrand it; give it a new name and author (maybe 'Comic Vine Scraper Remake', or 'Boshuda's Scraper', or whatever.) That will avoid a lot of confusion and ensure people know which fork they are using. If you search for uses of the strings 'Comic Vine Scraper' and 'Cory Banack', that should find you most of what you'd want to change.
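That search can be scripted. A minimal Python sketch that walks a source tree and reports every line mentioning either string; the set of file extensions to scan is an assumption and may need adjusting for the actual repository layout.

```python
import os

BRAND_STRINGS = ("Comic Vine Scraper", "Cory Banack")


def find_brand_strings(root, extensions=(".py", ".txt", ".xml", ".ini")):
    """Yield (path, line_number, line) for each line that mentions a brand string.

    Only files whose names end with one of `extensions` are read; undecodable
    bytes are replaced rather than raising, so odd encodings don't stop the scan.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if any(s in line for s in BRAND_STRINGS):
                        yield path, lineno, line.rstrip("\n")
```

Running `for hit in find_brand_strings("path/to/source"): print(hit)` gives a checklist of places to rename; each hit still needs a human decision, since some mentions (such as credits to the original author) may be worth keeping.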

cbanack, what do you do to test CVS? You have the Run Unit Tests in there, as well as the Run Comic Vine Scraper which scrapes the simulated books. But do you have some [in]formal testing procedure you follow before releasing? Maybe a checklist of things to look for (version incrementing, readme updating, anything like that)?

  • I make sure the version number is changed (I never re-release with the same version number, as that just confuses people.)
  • I run the unit tests. The 'Run Comic Vine Scraper' launch target is mostly used during actual development, to iteratively test new features.
  • When I'm ready to do real testing (i.e. just before I release a new version) I usually try to grab a pile of recent releases and scrape them, utilizing as many different variations of settings and features as possible, and obviously focusing more heavily on any areas that I've changed.
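The version-number rule in the checklist above is easy to automate. A minimal Python sketch that refuses a version string that was already released or isn't newer than the latest release; the dotted-integer version format and where the list of released versions comes from are assumptions.

```python
def parse_version(text):
    """Turn '1.0.117' into (1, 0, 117) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.strip().split("."))


def check_release_version(new_version, released_versions):
    """Raise ValueError if new_version was already released or is not newer than the latest."""
    new = parse_version(new_version)
    released = [parse_version(v) for v in released_versions]
    if new in released:
        raise ValueError(f"version {new_version} was already released")
    latest = max(released_versions, key=parse_version) if released_versions else None
    if latest is not None and new <= parse_version(latest):
        raise ValueError(f"version {new_version} is not newer than {latest}")
    return new
```

Comparing tuples of integers matters here: as plain strings, "1.0.10" would sort before "1.0.9".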
Last Edit: 2 years 1 week ago by cbanack.
The following user(s) said Thank You: boshuda