Python Scripts for ComicRack

TOPIC: Comic Vine Scraper further Development

Comic Vine Scraper further Development 2 years 2 months ago #43535

  • cYo
  • Moderator
  • Posts: 3476
  • Thank you received: 676
  • Karma: 182
This should be the new thread to discuss further development of the ComicVine scraper plugin.

Topics may be:
* Short term maintenance
* Mid term improvements and changes
* Even getting away from ComicVine and coming up with an independent solution

I'm happy to support any real attempts with changes and/or new features in ComicRack.
The administrator has disabled public write access.

Comic Vine Scraper further Development 2 years 2 months ago #43537

  • iohanr
  • Junior Boarder
  • Posts: 23
  • Thank you received: 17
  • Karma: 8
fieldhouse wrote:
I have some hardware that could be used short term if that helps.

I'll have to think about cloud hosting. It really depends on sizing - how big the initial database is, how quickly it grows. Plus how optimized the API calls are. We don't want to simply offload the main CV servers and point everyone at an EC2 T1.Micro if it throttles everything back to a couple of scrapes a second. That'd be worse than having to swap public IPs every couple of days.

EDIT: how about Heroku or Cloud Foundry?

The cloud hosting providers that offer "free" hosting have different restrictions that may be unacceptable in some cases, e.g. "must sleep 6 hours in a 24-hour period". Even AWS offers a free instance that is severely limited in terms of bandwidth transfer. They would be fine for simple development and testing of apps, but not for the amount of data I foresee we would be transferring on an ongoing basis once this database is up and running.

Even though I refer to my basement as a "ghetto" data center, I do have extensive experience with networking and systems, and I have 150Mbps of bandwidth (up/down) - which is more than the company I work for has in their "real" data center. I can support and manage whatever platform you guys want - Windows or Linux. I am willing to build a whole new server with the specs you require (and yes, I am willing to foot the bill for it, since I will own the server(s) anyway), and make it as reliable as I can with a dedicated UPS, good airflow, etc. I suspect that for a DB server servicing a constant stream of queries, we will need something with lots of CPU and RAM. I doubt the database will be so big that a few TBs of RAID HD space wouldn't be sufficient. There is a finite number of back issue comics after all, and even new comics being added to the DB would not grow the DB exponentially - it would be more of a slow growth. My IP hardly ever changes, but if it does, I have my own domain name and I use Google Domains so that any change in my IP is dynamically updated to any hostname I want. The domain I have is "routex.net". If we point something like cvapi.routex.net to a server in my basement, it will always work, regardless of whether my public IP changes. The developers could have access via VPN or SSH.

So, basically what I'm saying is that I can get this stuff up and running pretty quickly and easily, and I will leave the important stuff - coding and db design - to you guys! The servers are the easiest part. Coordinating with CV and getting something going as far as a regular DB pull is probably the hard part. I don't even think the CVS part is difficult, as cbanack has done the majority of the work and if we can get the same functionality as CV's DB up and running, I would imagine forking CVS would just mean a few minor tweaks to where it points its API calls and some minor changes to disable HTML scraping.
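As a rough illustration of the "few minor tweaks to where it points its API calls" idea, here is a minimal Python sketch, not taken from the actual CVS source. It assumes the fork would build every request URL from a single base setting. The function name build_url, the constant names, the "issues" resource and the query parameters are placeholders for illustration, and the mirror host is simply the example hostname mentioned above.

```python
from urllib.parse import urlencode

# Assumption: the fork keeps its API root in one place instead of
# hard-coding it at every call site. Both values are illustrative.
COMICVINE_BASE = "https://comicvine.gamespot.com/api"
MIRROR_BASE = "https://cvapi.routex.net/api"  # hypothetical mirror host


def build_url(base, resource, params):
    """Join an API root, a resource name and query parameters into one URL."""
    # Sort the parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return "{0}/{1}/?{2}".format(base.rstrip("/"), resource.strip("/"), query)


# Switching servers then only means passing a different base.
mirror_url = build_url(MIRROR_BASE, "issues", {"format": "json", "api_key": "KEY"})
```

With this layout, pointing the scraper at a mirror is a one-line configuration change rather than an edit to every request.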
The following user(s) said Thank You: 600WPMPO, oraclexview

Comic Vine Scraper further Development 2 years 2 months ago #43538

  • iohanr
  • Junior Boarder
  • Posts: 23
  • Thank you received: 17
  • Karma: 8
It also occurs to me that we could use Heroku or some other free cloud-hosted server as a "backup". Although my ISP (Verizon FiOS - fiber-optic to my house) has been rock solid and I can't remember the last time it was down (it's been years), it doesn't hurt to have a backup at a different site, especially if it's free and wouldn't take any production traffic on a regular blue-sky day.

Comic Vine Scraper further Development 2 years 2 months ago #43539

  • jkthemac
  • Platinum Boarder
  • Posts: 768
  • Thank you received: 253
  • Karma: 55
I know it is very early to be discussing the functionality of a future database query system, but I would reiterate what I have posted on the CV forum: the main thing missing from their API is a quick and easy way of grabbing data for comics released in a particular week.

Imagine you had the ability to query the database with a 'week-start' date and it returned all of the main scraped data for every comic in one go. All of the pain of scraping a new comic list would evaporate immediately.

At present you could use the CV API in this manner for weekly releases but it would still require multiple calls to obtain the data.
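To make the 'week-start' idea concrete, here is a minimal Python sketch of what such a query could look like against a local copy of the data. Everything in it is an assumption for illustration: the function name issues_for_week, the record fields (series, number, release_date) and the sample records are all hypothetical and do not come from the CV API or from CVS.

```python
from datetime import date, timedelta


def issues_for_week(issues, week_start):
    """Return every issue released in the 7 days beginning at week_start."""
    week_end = week_start + timedelta(days=6)  # inclusive last day of the week
    return [i for i in issues if week_start <= i["release_date"] <= week_end]


# Hypothetical records standing in for rows of a local comics database.
SAMPLE_ISSUES = [
    {"series": "Example Comic A", "number": "1", "release_date": date(2015, 3, 4)},
    {"series": "Example Comic B", "number": "12", "release_date": date(2015, 3, 10)},
    {"series": "Example Comic C", "number": "7", "release_date": date(2015, 3, 11)},
]

# One call returns the whole week's releases.
this_week = issues_for_week(SAMPLE_ISSUES, date(2015, 3, 4))
```

The point of the sketch is that the caller supplies only a date and gets the complete release list back in a single request, instead of issuing one call per comic.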

Also while people think about forking the scraper, you could split the functionality to separate out weekly release scrapes. This would reduce the uncertainty of which comic was being looked for by the user. The task of adding a few random old comics is a very different task to adding a brand new comic released in the last few weeks. There is no reason the scraper needs to treat these as the same.
Last Edit: 2 years 2 months ago by jkthemac.

Comic Vine Scraper further Development 2 years 2 months ago #43540

  • boshuda
  • Gold Boarder
  • Posts: 296
  • Thank you received: 65
  • Karma: 8
I'm not much of a developer, but it is finally becoming my day job (kids, don't let your first job be in any way related to test if you have any hope of ever actually developing code), and my life is finally slowing down. I would love to either fork or help fork the scraper. I don't know much about API development, web servers, and that side of the world, but I would love to learn. Even if my only assistance is on things like code reviews, documentation, etc., I'm here to help.

Comic Vine Scraper further Development 2 years 2 months ago #43542

  • Fuzzyluzzi
  • Gold Boarder
  • Posts: 310
  • Thank you received: 45
  • Karma: 11
Are there no other options besides ComicVine? Comicbookdb?

Comic Vine Scraper further Development 2 years 2 months ago #43545

  • boshuda
  • Gold Boarder
  • Posts: 296
  • Thank you received: 65
  • Karma: 8
Fuzzyluzzi wrote:
Are there no other options besides ComicVine? Comicbookdb?
I'm pretty sure ComicVine is the only DB with an API. The rest have explicit restrictions against doing an HTML scrape. But maybe that's changed.

Comic Vine Scraper further Development 2 years 2 months ago #43546

  • hyperspacerebel
  • Junior Boarder
  • Posts: 31
  • Thank you received: 9
  • Karma: 1
GCD has a weekly database dump. Maybe down the line, if this database project gets off the ground, we can think about doing a similar thing with the GCD database and giving CVS users a choice. One step at a time though.
Last Edit: 2 years 2 months ago by hyperspacerebel.

Comic Vine Scraper further Development 2 years 2 months ago #43548

  • iohanr
  • Junior Boarder
  • Posts: 23
  • Thank you received: 17
  • Karma: 8
I actually met the GCD guys at Awesome Con last year - nice folks. They would probably be open to working something out with us on this project. That's another avenue worth exploring in the future...

Comic Vine Scraper further Development 2 years 2 months ago #43549

  • mcpierce
  • Senior Boarder
  • Posts: 46
  • Thank you received: 2
  • Karma: 1
I've been wondering about this, and about whether there were alternatives. I like the idea of a non-profit (NP) doing this and would gladly support them for it.

Do we know the breadth of their database as compared to CV? Do any publishers support any of them by providing credits in advance?
Check out The Comic Book Update podcast!
www.comicbookupdate.com/