Python Scripts for ComicRack

TOPIC: Comic Vine Scraper suggestion

Comic Vine Scraper suggestion 3 years 7 months ago #38294

  • jkthemac
  • Platinum Boarder
  • Posts: 760
  • Thank you received: 248
  • Karma: 55
While Comic Vine struggles to get its searches back into order, I find myself wishing for an option in the CV-Scraper to use existing scraped comics within a directory to help find the relevant comic.

At present CV is so far off the mark that even fairly mainstream comics such as Uncanny X-Men or Wonder Woman can only be found by cutting the search string down to Uncanny or Wonder.

I find myself making increasing use of the very useful cvinfo files to get things scraped, but it would be really cool if you could make the scraper at least make use of the information often sitting right there in other comics within the folder.

I envision the scraper first checking whether the folder already contains scraped comics that all belong to the same volume and, if so, checking whether the search string pattern-matches those comics. If it does, the scraper would still search Comic Vine normally, but it would add the probable volume whenever the search results don't include it, and would treat that volume as the most likely and/or automatic choice, depending on the "try to choose the correct series automatically" toggle.

This would also make the scraper more robust when Comic Vine was struggling, making for less misdirected user frustration.
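
A minimal sketch of the folder check described above, in plain Python. Everything here is an assumption made for illustration: `ScrapedComic`, `common_volume`, `name_matches`, and `add_probable_volume` are hypothetical names, volumes are represented by bare ids, and the "pattern match" is simplified to "every word of the search string appears in the series name". It is not how the Comic Vine Scraper is actually structured.

```python
from collections import namedtuple

# Hypothetical record for a comic that has already been scraped; the real
# scraper's data model is not shown in this thread.
ScrapedComic = namedtuple("ScrapedComic", ["series_name", "volume_id"])


def common_volume(scraped_comics):
    """Return the single volume id shared by all comics, or None."""
    volume_ids = {comic.volume_id for comic in scraped_comics}
    return volume_ids.pop() if len(volume_ids) == 1 else None


def name_matches(search_string, series_name):
    """Loose match: every word of the search string occurs in the name."""
    words = search_string.lower().split()
    name = series_name.lower()
    return bool(words) and all(word in name for word in words)


def add_probable_volume(search_string, search_results, scraped_comics):
    """Put the folder's volume first in the results when it looks relevant.

    search_results is the list of volume ids returned by the normal search.
    The list is returned unchanged when the folder's comics do not agree on
    a volume or when the search string does not match their series name.
    """
    volume_id = common_volume(scraped_comics)
    if volume_id is None:
        return list(search_results)
    if not any(name_matches(search_string, comic.series_name)
               for comic in scraped_comics):
        return list(search_results)
    others = [v for v in search_results if v != volume_id]
    return [volume_id] + others
```

With a folder of two comics from the same volume, `add_probable_volume("Uncanny", results, folder)` would place that volume at the head of `results` even if the search itself never returned it.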

Another idea that would work well for the folder organisers amongst us would be an option to automatically generate a cvinfo file while scraping if all comics in the folder after scraping are from the same volume.
Last Edit: 3 years 7 months ago by jkthemac.

Comic Vine Scraper suggestion 3 years 7 months ago #38296

  • cbanack
  • Platinum Boarder
  • Posts: 1318
  • Thank you received: 503
  • Karma: 181
jkthemac wrote:
While Comic Vine struggles to get its searches back into order, I find myself wishing for an option in the CV-Scraper to use existing scraped comics within a directory to help find the relevant comic.
I considered adding a feature like this a few years ago, back when I was actively working on the Comic Vine Scraper (as opposed to maintaining it, which is what I am doing now.) The main reason I decided not to do it was that it simply wasn't necessary; back when searching worked properly it did a very good job of finding the right comic on its own. This is also not an easy feature to implement, since the ComicVineScraper does not know how to read your comic files directly out of your folders--it only interacts with ComicRack's database, which does not organize the comics into folders. So the first step to this feature involves scanning the entire database to reconstruct a temporary model of your comic collection's file structure in memory. There would be performance costs to this for people with large collections.

These performance costs are not worth it, given that the feature is really just a band-aid to make things less annoying while we wait for ComicVine to fix their search API. Once things are fixed, things will go back to the way they were and the feature wouldn't be helping anything anymore. But it would still be costing memory and slowing things down.

At present CV is so far off the mark that even fairly mainstream comics such as Uncanny X-Men or Wonder Woman can only be found by cutting the search string down to Uncanny or Wonder.
Yes, as far as I am concerned, the scraper is broken right now. I'm kind of amazed that anyone is still trying to use it; I've simply stopped scraping my comics until ComicVine is fixed.

This would also make the scraper more robust when Comic Vine was struggling, making for less misdirected user frustration.
It might help with that, but honestly I have no interest in trying to work around a web API that is so severely broken. If ComicVine can't manage to return the correct results to the Comic Vine Scraper, then ultimately no amount of work on my part is going to fix that. I hate to say it, but it comes down to this: if ComicVine fixes the search problem, there's no reason for me to build a bunch of temporary partial fixes, and if they don't fix the problem, then Comic Vine Scraper is dead. It simply can't function without a working search API from ComicVine!

(I don't have any reason to think that they're not planning to fix the problem. And I'm just as frustrated as everyone else with how long it is taking. Maybe more so, since I've been getting a steady stream of complaints for weeks now.)

Another idea that would work well for the folder organisers amongst us would be an option to automatically generate a cvinfo file while scraping if all comics in the folder after scraping are from the same volume.
This is a good idea, except that I don't really want to put something like that into the scraper. The CVINFO file was never intended to be a mainstream feature; the only reason it is popular right now is that it offers another way to avoid using the broken search feature. As I mentioned here, I'm sure someone could whip up a script to create CVINFO files fairly easily. I don't have time to do it myself. :dry:
Last Edit: 3 years 7 months ago by cbanack.
The following user(s) said Thank You: jkthemac

Comic Vine Scraper suggestion 3 years 7 months ago #38383

  • jkthemac
  • Platinum Boarder
  • Posts: 760
  • Thank you received: 248
  • Karma: 55
Hi,
Sorry to hear you sounding so despondent about this. I can assure you that even in its current state your scraper tool is still doing an excellent job. One just has to change all of the series names to a single word and it operates fine, which suggests that CV only has a problem with how it parses multi-word search terms.

To play devil's advocate (I understand if you don't want to push the script forward), you wouldn't need to create a database to make use of the comics in a folder. You could just use the XML files in the archives where present, and if all of the XML data matched a volume, add that volume to the results.
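
A rough sketch of what reading the XML from the archives could look like, using only the Python standard library. It assumes the metadata sits in a `ComicInfo.xml` entry with `Series` and `Volume` elements, and it only handles `.cbz` files (which are zip archives); `.cbr` and `.cb7` would need extra libraries. The function names `read_series_and_volume` and `folder_volume` are hypothetical, and none of this is code from the scraper itself.

```python
import os
import zipfile
import xml.etree.ElementTree as ET


def read_series_and_volume(cbz_path):
    """Return (Series, Volume) from ComicInfo.xml in a .cbz, or None.

    None is returned when the archive is unreadable, has no ComicInfo.xml,
    or the XML cannot be parsed.
    """
    try:
        with zipfile.ZipFile(cbz_path) as archive:
            for name in archive.namelist():
                # Zip entry names always use forward slashes.
                if name.rsplit("/", 1)[-1].lower() == "comicinfo.xml":
                    root = ET.fromstring(archive.read(name))
                    return (root.findtext("Series"), root.findtext("Volume"))
    except (zipfile.BadZipFile, ET.ParseError):
        pass
    return None


def folder_volume(folder):
    """Return the (Series, Volume) shared by every tagged .cbz, else None.

    Archives without usable metadata are skipped ("where present").
    """
    found = set()
    for filename in sorted(os.listdir(folder)):
        if filename.lower().endswith(".cbz"):
            info = read_series_and_volume(os.path.join(folder, filename))
            if info is not None and info[0]:
                found.add(info)
    return found.pop() if len(found) == 1 else None
```

Note that `folder_volume` opens every archive in the folder, so its cost grows with the number of issues stored there.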

Comic Vine Scraper suggestion 3 years 7 months ago #38384

  • cYo
  • Moderator
  • Posts: 3476
  • Thank you received: 675
  • Karma: 181
Yes, the scraper is great. The problem lies with the Comic Vine website.

But it would still be great to add some stuff like looking up existing scraped comics in the database and using this info to get better results.

I know that you're currently only in maintenance mode with the scraper (like I am with ComicRack), but as this is open source, maybe some people would be willing (if you allow it) to join your project on Google Code and contribute.

Comic Vine Scraper suggestion 3 years 7 months ago #38389

  • cbanack
  • Platinum Boarder
  • Posts: 1318
  • Thank you received: 503
  • Karma: 181
jkthemac wrote:
To play devil's advocate (I understand if you don't want to push the script forward), you wouldn't need to create a database to make use of the comics in a folder. You could just use the XML files in the archives where present, and if all of the XML data matched a volume, add that volume to the results.

As I mentioned, the scraper doesn't know how to directly read .cbz files. Not only would it be a lot of effort for me to add this ability, but in many cases it still would not solve the performance issues I mentioned above. Some series have hundreds of issues--in those cases, loading in all those cbz/cbr/cb7 files as a precursor to scraping one comic would be painfully slow, especially for people who store data on network drives.

Setting aside the technical hurdles associated with your suggestion (my fault for bringing them up when I don't really want to debate them), the reason I'm not interested in implementing your suggestion is that it's a temporary workaround. A band-aid. A way to bypass the broken search feature and get the scraper up and running again. Don't get me wrong; performance issues notwithstanding, I do think your idea would work. It could make the current search API bug a lot less annoying for some people.

But the thing is, I've been getting lots of these suggestions over the last two weeks, and they all have one thing in common: no one was asking for them before the ComicVine search API broke, and no one will really need them once the search API has been fixed. I just don't have the spare time to work on something like that.

cYo wrote:
But it would still be great to add some stuff like looking up existing scraped comics in the database and using this info to get better results.

For what it's worth, when I originally looked into this idea, I settled on a different approach to try to get the same effect: the scraper "remembers" which series you have scraped in the past, and then biases the sort order of your search results so that the remembered series (basically the series in your collection) are more likely to appear at the top of the list. This is fast and efficient, even if it isn't as robust as checking against all of the other comics in the folder. Of course, if the correct series isn't in the list in the first place, then this approach doesn't help at all. :unsure:
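
To illustrate the general idea of biasing the sort order (not the scraper's actual code), here is a minimal sketch. It is deliberately simplified: remembered series are moved strictly to the front rather than merely made "more likely" to rank highly, series are represented by bare ids, and `bias_results` is a hypothetical name.

```python
def bias_results(search_results, remembered_series):
    """Move remembered series to the front, keeping the original order.

    Python's sort is stable, so results with the same key stay in the
    order the search returned them. The key is False for remembered
    series and True for everything else, and False sorts first.
    """
    remembered = set(remembered_series)
    return sorted(search_results,
                  key=lambda series_id: series_id not in remembered)
```

The sketch also shows the limitation mentioned above: if the remembered series is not among `search_results` to begin with, the list comes back unchanged.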

I know that you're currently only in maintenance mode with the scraper (like I am with ComicRack), but as this is open source, maybe some people would be willing (if you allow it) to join your project on Google Code and contribute.

I try to discourage people from "helping" me with the scraper, for two reasons. 1) I don't want to be responsible for fixing bugs created by someone else, and 2) I'm really picky about which new features get added into the scraper. I feel really bad rejecting features that someone has already put a lot of effort into implementing.

That said, I have no problem if someone wants to create and distribute their own fork of Comic Vine Scraper. It is, after all, an open source project!

If someone is willing to go to that much effort, however, then there is a much easier way to work around CV's broken search API. Just create a small, temporary script that goes through your comic collection and generates CVINFO files. This has been suggested a few times (including by jkthemac) and I think it is definitely the easiest and most bulletproof way to work around the current difficulties.
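
A sketch of what such a temporary script might look like. Several things here are assumptions rather than facts from this thread: that a CVINFO file is a plain text file named `cvinfo` holding the Comic Vine URL of the volume, that the caller can already supply each comic's file path and volume URL (how to get those out of ComicRack is not shown), and the hypothetical function name `write_cvinfo_files`.

```python
import os


def write_cvinfo_files(comics, filename="cvinfo"):
    """Write a CVINFO file into every folder whose comics share one volume.

    comics is an iterable of (file_path, volume_url) pairs, where
    volume_url is None for a comic that has not been scraped yet.
    Folders containing unscraped comics or more than one volume are
    left alone. Returns the list of folders that were written to.
    """
    urls_by_folder = {}
    for file_path, volume_url in comics:
        folder = os.path.dirname(file_path)
        urls_by_folder.setdefault(folder, set()).add(volume_url)

    written = []
    for folder, urls in sorted(urls_by_folder.items()):
        if len(urls) == 1 and None not in urls:
            with open(os.path.join(folder, filename), "w") as handle:
                handle.write(next(iter(urls)) + "\n")
            written.append(folder)
    return written
```

Skipping mixed folders keeps the script from writing a CVINFO file that would point some of the comics in a folder at the wrong volume.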
Last Edit: 3 years 7 months ago by cbanack.

Comic Vine Scraper suggestion 3 years 7 months ago #38408

  • shaba21
  • Fresh Boarder
  • Posts: 4
  • Thank you received: 1
  • Karma: 0
If you have the Comic Vine website open, or use the "view on web" feature from a previously scraped comic, you can paste the URL of the desired volume into the scraper's search and it will find the issue. It's a little bit cumbersome, but it works until CV gets their search fully working again.
Last Edit: 3 years 7 months ago by shaba21.