TOPIC: Comic Vine Scraper 1.0.53-58

Re: Comic Vine Scraper 1.0.53 4 years 10 months ago #29412

  • perezmu
  • Offline
  • Platinum Boarder
  • Posts: 1114
  • Thank you received: 64
  • Karma: 51
cbanack wrote:
Ah yes, those algorithms do look pretty straightforward. If they work as well as we're hoping for matching comic book covers, this feature may be something I can do. I will look into it as time permits over the coming weeks...

EDIT: Oh yeah, and thanks for the links + source code, ComicTagger and fK. :)

Wow, Cory, if you could add something like this to the scraper you'd be even more loved than cYo :P

BTW, if memory serves me well, some time ago you were teasing a standalone version of the script, right? Any news on that?
The topic has been locked.

Re: Comic Vine Scraper 1.0.53 4 years 10 months ago #29414

  • Madmatx
  • Offline
  • Platinum Boarder
  • Posts: 457
  • Thank you received: 63
  • Karma: 19
I just scraped a bunch of new comics and I love the new feature that shows the cover art for the issue being scraped. It's nice to know for sure you have the correct series/volume without having to hit the Show Issues button.

Thanks!

Re: Comic Vine Scraper 1.0.53 4 years 10 months ago #29430

  • forkicks
  • Offline
  • Platinum Boarder
  • Posts: 871
  • Thank you received: 109
  • Karma: 37
Heh, i just realized i was the one that said that matching images was very hard to do in the other thread where this is mentioned.

But I am now convinced it would absolutely rock :). Imagine this: getting a bunch of new books, pressing "scrape", waiting a few seconds, and presto, no prompts, no questions, no nothing, all books scraped automagically.

fK

Re: Comic Vine Scraper 1.0.53 4 years 10 months ago #29438

  • cbanack
  • Offline
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
perezmu wrote:
Wow, Cory, if you could add something like this to the scraper you'd be even more loved than cYo :P
I agree, it would be a really great new feature, and I'm definitely going to look into it seriously in the next little while (before summer, for sure, since that's when I usually put all my side projects on hold and go outside.)

But for now, I make no promises that this will happen (see the response to fK that I'm about to write.)
BTW, if memory serves me well, some time ago you were teasing a standalone version of the script, right? Any news on that?
Yeah, I did start doing that a while ago. I was trying to trick the scraper into thinking it was still running inside ComicRack, so that it could run standalone. But the code and the user interface turned into a real mess so I gave up, at least for now.

I was actually aiming for something kind of like what ComicTagger has done, so I guess maybe someone else beat me to it. :) If I do ever make a standalone scraper, I think I'll write it from scratch in Java, so it can run on any OS.

Re: Comic Vine Scraper 1.0.53 4 years 10 months ago #29439

  • perezmu
  • Offline
  • Platinum Boarder
  • Posts: 1114
  • Thank you received: 64
  • Karma: 51
forkicks wrote:
Heh, i just realized i was the one that said that matching images was very hard to do in the other thread where this is mentioned.

But I am now convinced it would absolutely rock :). Imagine this: getting a bunch of new books, pressing "scrape", waiting a few seconds, and presto, no prompts, no questions, no nothing, all books scraped automagically.

fK

I keep toying with the idea of some magic script that would use image recognition techniques to automagically tag ad pages... ;) ... anyone?

Re: Comic Vine Scraper 1.0.53 4 years 10 months ago #29441

  • cbanack
  • Offline
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
forkicks wrote:
But I am now convinced it would absolutely rock :). Imagine this: getting a bunch of new books, pressing "scrape", waiting a few seconds, and presto, no prompts, no questions, no nothing, all books scraped automagically.
It would be pretty great if I can make it work like that, but at the same time, it would be nearly useless if I can't. Imagine having to go through every comic after scraping, comparing the titles to the covers manually just so that you could find that 1 out of 50 (or 100 or 500) that got scraped wrong. If you're forced to personally verify the scraper results anyway, it would be faster (and probably way less irritating) to just check each book as you scrape it, like we do now.

So in terms of image identification algorithms, there are two issues: false positives (where the algorithm thinks two comic covers are the same, but they aren't) and false negatives (where the algorithm thinks two comic covers are different, but they aren't).

An automatic scraping feature is really only a good idea if one of the image matching algorithms can be made to work with basically NO false positives at all. A few false negatives aren't a big deal, because it just means you'll have to identify those comics manually, but any false positives will lead to incorrectly scraped comics.
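
To make the trade-off concrete, here is a minimal Python sketch (not code from the scraper). It assumes some matcher that returns a similarity score between 0 and 1 for a pair of covers, plus a hand-labelled list of test pairs; the names count_errors and lowest_safe_threshold and all of the scores are made up for illustration. It picks the lowest cut-off that gives zero false positives on the test pairs, then shows how many false negatives that costs.

```python
def count_errors(scored_pairs, threshold):
    """Return (false_positives, false_negatives) at a given threshold.

    scored_pairs holds (similarity_score, really_same) tuples.
    """
    false_positives = 0
    false_negatives = 0
    for score, really_same in scored_pairs:
        predicted_same = score >= threshold
        if predicted_same and not really_same:
            false_positives += 1      # would be scraped wrong
        elif not predicted_same and really_same:
            false_negatives += 1      # user just gets asked as usual
    return false_positives, false_negatives


def lowest_safe_threshold(scored_pairs):
    """Lowest observed score that, used as a threshold, yields zero false
    positives on this data. Returns None if no observed score is safe."""
    for candidate in sorted({score for score, _ in scored_pairs}):
        false_positives, _ = count_errors(scored_pairs, candidate)
        if false_positives == 0:
            return candidate
    return None


# Hypothetical labelled test data, invented for this example.
pairs = [(0.99, True), (0.97, True), (0.95, False), (0.80, True), (0.40, False)]
threshold = lowest_safe_threshold(pairs)
print(threshold, count_errors(pairs, threshold))
```

A threshold chosen this way is only as trustworthy as the test pairs behind it, so it says nothing about covers unlike the ones tested.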

Not that I'm trying to be overly negative; I just don't want to promise anything just yet in case I can't deliver. I did read through a couple image matching algorithms and papers last night, and it looks like some of them might be really good about not giving false positives, so it's definitely worth further investigation.
Last Edit: 4 years 10 months ago by cbanack.

Re: Comic Vine Scraper 1.0.53 4 years 10 months ago #29444

  • 600WPMPO
  • Offline
  • Moderator
  • Posts: 3788
  • Thank you received: 557
  • Karma: 233
cbanack wrote:
So in terms of image identification algorithms, there are two issues: false positives (where the algorithm thinks two comic covers are the same, but they aren't) and false negatives (where the algorithm thinks two comic covers are different, but they aren't).

An automatic scraping feature is really only a good idea if one of the image matching algorithms can be made to work with basically NO false positives at all. A few false negatives aren't a big deal, because it just means you'll have to identify those comics manually, but any false positives will lead to incorrectly scraped comics.
What we need is high specificity, even at the cost of lower sensitivity.

You can never get NO false positives. You can only reduce false positives to a minimum. So, while image comparisons are cool, I doubt if they would be able to totally eliminate the need for user interaction. And talking of image identification capabilities, why not have them incorporated in ComicRack to identify duplicates? I bet that would be way less tough. Just saying, you know. That way, we can know if this stuff really works.
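
For reference, sensitivity and specificity can be written out as a small Python sketch. The function names and the counts below are made up for illustration and are not measurements of any real matcher.

```python
def sensitivity(true_pos, false_neg):
    """Share of genuinely matching covers that the matcher accepts."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of genuinely different covers that the matcher rejects."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 100 matching pairs and 100 non-matching pairs.
print(sensitivity(true_pos=90, false_neg=10))   # 10 matches missed
print(specificity(true_neg=99, false_pos=1))    # 1 wrong match accepted
```

A matcher tuned for automatic scraping would trade the first number away to push the second as close to 1 as possible.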
Last Edit: 4 years 10 months ago by 600WPMPO.

Re: Comic Vine Scraper 1.0.53 4 years 10 months ago #29446

  • forkicks
  • Offline
  • Platinum Boarder
  • Posts: 871
  • Thank you received: 109
  • Karma: 37
Oh, i have no problems admitting this isn't easy (i already did), or that it's not without issues. But an approach like (very simply) subtracting one image from another then checking if any non-black pixels exist (meaning it didn't match) is basically foolproof. If one image minus another equals total black, then it's the same image. Any other cases would need human interaction. I also know this is lollypop land and is not real, because the images don't have the same size and resizing will introduce artifacts that prevent this (naive) method from working.
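
The naive subtraction idea can be sketched in a few lines of Python. This is only an illustration: it assumes each image is already decoded into a list of rows of grayscale values (0 to 255), and the function name images_identical is invented for the example.

```python
def images_identical(image_a, image_b):
    """Subtract image_b from image_a pixel by pixel. The images match only
    if every difference is zero, i.e. the result is completely black."""
    if len(image_a) != len(image_b):
        return False                      # different heights can't match
    for row_a, row_b in zip(image_a, image_b):
        if len(row_a) != len(row_b):
            return False                  # different widths can't match
        for pixel_a, pixel_b in zip(row_a, row_b):
            if abs(pixel_a - pixel_b) != 0:
                return False              # a non-black pixel: no match
    return True


cover = [[0, 128], [255, 64]]
exact_copy = [[0, 128], [255, 64]]
resized_copy = [[0, 129], [255, 64]]      # one pixel off by 1, as after resizing

print(images_identical(cover, exact_copy))
print(images_identical(cover, resized_copy))
```

The second comparison fails because of a single pixel that differs by one, which is exactly the resizing problem described above.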

But using other, more robust methods will give you results that can be interpreted as confidence levels. If there is a (very) high level of confidence, make it automatic; if not, ask the user. Or just put a button labelled "Suicide button, you're on your own" and the user can choose to let it go automatic. I'm only saying this because of all the comics i have tried, and it's been quite a few, it hasn't failed once. I am highly confident in the scraper picking the right comic at this point - i was even before the right cover appeared, but now it's even more obvious.
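
As one possible illustration of "confidence levels", here is a Python sketch using an average hash (a common perceptual-hash technique, not something the scraper is known to use). It assumes both covers are non-empty and have already been downscaled to the same small grayscale grid; the function names and the 0.98 cut-off are arbitrary choices for the example.

```python
def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def match_confidence(pixels_a, pixels_b):
    """Fraction of hash bits that agree, from 0.0 to 1.0."""
    hash_a = average_hash(pixels_a)
    hash_b = average_hash(pixels_b)
    if len(hash_a) != len(hash_b):
        raise ValueError("both images must be downscaled to the same size")
    agreeing = sum(1 for bit_a, bit_b in zip(hash_a, hash_b) if bit_a == bit_b)
    return agreeing / len(hash_a)


def decide(confidence, auto_threshold=0.98):
    """'auto' means accept without a prompt, 'ask' means show the user."""
    return "auto" if confidence >= auto_threshold else "ask"


local_cover = [[10, 200], [220, 30]]
same_cover_rescaled = [[12, 198], [221, 29]]    # slightly different pixel values
other_cover = [[200, 10], [30, 220]]

print(decide(match_confidence(local_cover, same_cover_rescaled)))
print(decide(match_confidence(local_cover, other_cover)))
```

Unlike plain subtraction, small pixel differences don't change the hash bits here, so a rescaled copy of the same cover can still score full confidence.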

It's a fine piece of software.
fK
The following user(s) said Thank You: 600WPMPO

Re: Comic Vine Scraper 1.0.53 4 years 10 months ago #29475

  • fieldhouse
  • Offline
  • Expert Boarder
  • Posts: 89
  • Thank you received: 10
  • Karma: 1
It would be great if we could convince the scanners that they need to publish CRC hash info to a common database similar to what the anime community has with anidb.net. Then as long as you have the original unaltered file it is recognizable no matter how the filename is changed. Since it's doubtful that will happen, visual characteristics seem like a good alternative.
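
A rough Python sketch of the CRC idea, using the standard zlib module. The database here is just a dict with one invented entry; no such shared comic database is assumed to exist, and the function names are made up for the example.

```python
import zlib


def crc32_of_bytes(data):
    """CRC-32 of some bytes, as 8 uppercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")


def crc32_of_file(path, chunk_size=65536):
    """CRC-32 of a file's contents, read in chunks so large scans fit in memory."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08X")


def identify(data, database):
    """Look the contents up by checksum; the filename never matters."""
    return database.get(crc32_of_bytes(data))


# Hypothetical database entry, keyed by checksum.
database = {"CBF43926": "Example Series #1"}

print(identify(b"123456789", database))      # contents known to the database
print(identify(b"something else", database)) # unknown contents give None
```

This only works for the original unaltered file: any change to the contents changes the checksum.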

Image matching sounds really useful. I could definitely skim through a page of possible matched thumbnails faster than I could read through a page of all of the metadata for the same comics.

You could use some of the suggestions on stackoverflow or why not just use an existing image matching service like Tineye or Google's "search by image"?

For example:

Re: Comic Vine Scraper 1.0.53 4 years 10 months ago #29503

  • cbanack
  • Offline
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
fieldhouse wrote:
You could use some of the suggestions on stackoverflow or why not just use an existing image matching service like Tineye or Google's "search by image"?
Well, we're not really trying to use image matching to find out which comic a given cover image belongs to--Google or TinEye could help with that, but the scraper is already pretty good at doing it just based on the comic's filename.

This is how scraping a comic usually works:
1) the scraper shows your comic cover, alongside the cover of the ComicVine issue that it thinks is a match.
2) you look at the two images and decide if they are the same, and if so, you click ok and the scrape proceeds.

What we're talking about is automating step 2. So if the two images are the same, don't even show them to the user, just "click ok" automatically and keep going. As I mentioned in a previous post, this will require a very good image-matching algorithm, so we don't automatically "click ok" when we shouldn't. Otherwise people will be forced to always double check their scraped comics just to make sure that none of them were scraped wrong.
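
In rough Python terms, automating step 2 might look something like the sketch below. This is not the scraper's actual code: find_candidate, covers_match and ask_user are hypothetical callables standing in for the filename-based search, the image-matching algorithm and the existing confirmation dialog.

```python
def scrape_batch(comics, find_candidate, covers_match, ask_user):
    """Scrape each comic, skipping the confirmation prompt only when the
    matcher says the local cover and the candidate cover are the same."""
    results = {}
    for comic in comics:
        candidate = find_candidate(comic)          # step 1: filename-based guess
        if candidate is not None and covers_match(comic, candidate):
            results[comic] = candidate             # step 2 automated: "click ok"
        else:
            # Not sure enough: fall back to showing both covers to the user.
            results[comic] = ask_user(comic, candidate)
    return results


# Tiny stand-ins so the sketch can be run; all values are invented.
guesses = {"a.cbz": "Issue 1", "b.cbz": "Issue 2"}
prompted = []

def fake_ask_user(comic, candidate):
    prompted.append(comic)
    return candidate

results = scrape_batch(
    ["a.cbz", "b.cbz"],
    find_candidate=guesses.get,
    covers_match=lambda comic, candidate: comic == "a.cbz",
    ask_user=fake_ask_user,
)
print(results, prompted)
```

A false negative from covers_match only costs one extra prompt, while a false positive silently stores the wrong issue, which is why the matcher has to be so strict.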

On the bright side, a simple google search turns up many different kinds of image matching algorithms; I plan to implement a few in the coming weeks to see how well they work.