
TOPIC: Comic Vine Scraper

Comic Vine Scraper 8 months 1 week ago #49029

  • Xelloss
  • Platinum Boarder
  • Posts: 596
  • Thank you received: 150
  • Karma: 30
krandor wrote:
Thanks Xelloss. Glad my research could be of help. I'll have to test it out tonight. A wonky counter is certainly better than having to wait for 5000 entries to be pulled down.

I am not worried about that; I am worried about the patch filtering out results it shouldn't filter...

For example, I had to change the perfect-match condition many times so that "good matches" didn't stop the result search. The version I uploaded (the latest one), for example, ignores differences in characters that are not letters or digits, which used to stop the search even when the results were fine. There could be many more cases like that. The problem is that with just ONE false negative, none of the results that follow will be downloaded, so I could tinker with it a bit more to avoid these cases.

A case I fixed, for example, was a comic called Cap'tain something (or something like that). When the script reached that comic, searches such as "Captain" would wrongly treat it as a bad match, and every comic after it was ignored (since CV sees "cap'tain" and "captain" as the same word, many comics with "captain" came after that one).
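
A minimal sketch of a check that ignores everything except letters and digits. The names `words` and `is_good_match` are hypothetical, "Cap'tain Something" is only a placeholder title, and the rule "every search word appears in the result name" is an assumption for illustration, not the patch's actual perfect-match condition:

```python
import re

def words(name):
    """Split a name into lower-case words, dropping every character that is
    not an ASCII letter or digit (so "Cap'tain" becomes "captain")."""
    cleaned = (re.sub(r"[^0-9a-z]", "", w) for w in name.lower().split())
    return [w for w in cleaned if w]  # skip tokens like "&" that end up empty

def is_good_match(search, result_name):
    """Hypothetical rule: every word of the search also appears in the result."""
    result_words = set(words(result_name))
    return all(w in result_words for w in words(search))

print(is_good_match("Captain", "Cap'tain Something"))  # True
```

Without the normalization, "cap'tain" and "captain" would compare as different words and the filter would treat the result as a miss.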

One solution I thought about was to stop the search only when two consecutive comics fail the perfect-match check. That would make it less likely for the search to stop because a correct match was wrongly rejected... (after all, ALL comics after a non-perfect match should also be non-perfect matches)
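
The two-consecutive-misses idea can be sketched as follows, assuming the results arrive as an iterable in ranked order and `is_match` is whatever good-match check the filter uses (both names are hypothetical here). Because it stops iterating as soon as the limit is hit, it also works on a lazy generator that downloads results on demand:

```python
def take_until_misses(results, is_match, max_misses=2):
    """Return the results seen before `max_misses` consecutive misses.

    A single bad result (possibly a false negative) does not end the scan;
    only a run of `max_misses` misses in a row does. The miss that triggers
    the stop is not included in the returned list.
    """
    kept = []
    misses = 0
    for result in results:
        if is_match(result):
            misses = 0  # a good match resets the run of misses
        else:
            misses += 1
            if misses >= max_misses:
                break  # stop consuming, so no further results are fetched
        kept.append(result)
    return kept

names = ["Captain America", "Cap'tain Something", "Captain Marvel", "Thor", "Hulk"]
print(take_until_misses(names, lambda n: n.startswith("Captain")))
```

Raising `max_misses` trades a little extra downloading for more tolerance of false negatives.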

Another, more robust solution would be to measure how different each match is from the search (with a string-distance algorithm) and stop once a certain error percentage is reached... but that would be too complex for a temporary patch :P
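
As a sketch of the distance-based alternative: Python's standard `difflib.SequenceMatcher.ratio()` returns a similarity score between 0 and 1. It is not an edit-distance (Levenshtein) algorithm, just a readily available stand-in, and the 0.6 cut-off is an arbitrary assumption that would need tuning against real searches:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def take_while_similar(search, names, min_ratio=0.6):
    """Keep names until one falls below `min_ratio` similarity to the search."""
    kept = []
    for name in names:
        if similarity(search, name) < min_ratio:
            break  # error percentage reached: stop downloading here
        kept.append(name)
    return kept

print(take_while_similar("Avengers", ["Avengers", "New Avengers", "Xyz Qwerty"]))
```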

Also, I would limit the first request to only 20-30 comics instead of 100. In most cases the search finds the first non-match within the first 20 or 30 comics, and it is a pity to download 100 comics in EVERY case. Limiting it to 20-30 comics would speed up the search in 90% of the cases and only make it a little slower in the other 10%.
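
The smaller first request could look like this, where `fetch_page(offset, limit)` is a hypothetical wrapper around one search call (not the actual Comic Vine API or the scraper's code) and 25/100 are sizes picked from the range suggested above. Results are yielded lazily, so if the consumer stops at the first mismatch, only the small first page is ever downloaded:

```python
def fetch_results(fetch_page, first_limit=25, limit=100):
    """Yield search results lazily: a small first request, then full pages.

    `fetch_page(offset, limit)` is a hypothetical callable returning a list of
    at most `limit` results starting at `offset`; a short (or empty) list
    means there are no more results.
    """
    offset, size = 0, first_limit
    while True:
        page = fetch_page(offset, size)
        yield from page
        if len(page) < size:
            return  # last page reached
        offset += len(page)
        size = limit  # after the first request, use the normal page size

# Usage with a fake backend that records each request.
data = list(range(250))
calls = []
def fake_fetch(offset, limit):
    calls.append((offset, limit))
    return data[offset:offset + limit]

gen = fetch_results(fake_fetch)
first_ten = [next(gen) for _ in range(10)]
print(calls)  # [(0, 25)]
```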
Last Edit: 8 months 1 week ago by Xelloss.
The following user(s) said Thank You: kino13

Comic Vine Scraper 8 months 1 week ago #49030

  • kino13
  • Senior Boarder
  • Posts: 58
  • Thank you received: 6
  • Karma: 0
Hello Xelloss,

I have tried your patched version, and I am not saying it is perfect (I got a couple of searches with zero results), but it is a huge improvement over what we had until now.

It is way faster, and yes, the correct match is usually found among the first results.

Thanks a lot, let's wait for more people to try it.

At least we can get by until the API is fixed.

Edit: OK, I found some comics that give no results. That was expected, but again, this is better than it was before.
For example, "Batman and Nightwing" #23.
with no power comes no responsibility. except that wasn't true
Last Edit: 8 months 1 week ago by kino13.
The following user(s) said Thank You: Xelloss

Comic Vine Scraper 8 months 1 week ago #49031

  • Xelloss
  • Platinum Boarder
  • Posts: 596
  • Thank you received: 150
  • Karma: 30
I was working on improving it right now, and I have made a more robust filter... What I need most is exactly what you are giving me: cases of comics that are not recognised.

Please send me any comic you have problems with so I can improve the filter :)

I will look into the case you mention and upload the new patch :P

Edit: Please send me the exact text the script shows when no results are given (the script shows the search it made in an editable field). If you can add the file name as well, even better. If there are results but not the one you need, just send me the file name.

From what I am seeing of how the search API works, with a bit of work it could work even BETTER than before (for example, for comics that weren't returned before for whatever reason).
Last Edit: 8 months 1 week ago by Xelloss.

Comic Vine Scraper 8 months 1 week ago #49032

  • Xelloss
  • Platinum Boarder
  • Posts: 596
  • Thank you received: 150
  • Karma: 30
kino13 wrote:
As an example "Batman and Nightwing" number 23

This comic is not recognised, not because of the API problem or the patch, but because Comic Vine renamed it to Batman and Robin (of course, the unpatched version will show it, because it shows every comic with Batman in its name XD, but it didn't show up with the previous version of the API either).

comicvine.gamespot.com/batman-and-robin-...eptance/4000-422432/

All the same, this would be a GREAT case for what I was talking about: using the new API results for comics that are not EXACT matches :)

I tinkered a bit and this should work better (not for this case, for other cases I was trying):

File Attachment:

File Name: cvdb-2.zip
File Size: 8 KB


Now the filter that separates good matches from bad matches is more "friendly".

Also try comics with a lot of results, such as Avengers or Captain America; it should work fine with them too (returning ALL the results in every case, and filtering out what needs to be filtered).

By the way, I don't think they will fix it, because I don't think it is broken; I think it works the way they want it to. I think it is on purpose, because of what I explained in the previous posts... The new search is designed to (almost) always return something, showing the correct results first and less relevant results after them... So if you misspell a name or use a wrong word, it will usually still show what you are looking for among the first results... It IS quite clever indeed (much like Google search). Now the question is how we tell the script to STOP downloading results once we have what we need, but not before.
Last Edit: 8 months 1 week ago by Xelloss.

Comic Vine Scraper 8 months 1 week ago #49033

  • krandor
  • Gold Boarder
  • Posts: 313
  • Thank you received: 34
  • Karma: 5
Yeah, Batman and Nightwing is definitely a corner case, and just one of the weird naming/numbering things that both DC and Marvel do (though Marvel is worse with some of their 5.NOW and 5.MU stuff). That one will always be tricky.

I wish I could give you more karma, Xelloss... your work and cbanaks's in helping keep ComicRack a very useful program is always much appreciated.

Comic Vine Scraper 8 months 1 week ago #49034

  • Xelloss
  • Platinum Boarder
  • Posts: 596
  • Thank you received: 150
  • Karma: 30
krandor wrote:
Yeah, Batman and Nightwing is definitely a corner case, and just one of the weird naming/numbering things that both DC and Marvel do (though Marvel is worse with some of their 5.NOW and 5.MU stuff). That one will always be tricky.

I wish I could give you more karma, Xelloss... your work and cbanaks's in helping keep ComicRack a very useful program is always much appreciated.

Please don't compare me with the GREAT developers of this project; what I do are always minor tweaks and small fixes here and there... I don't make anything new or build anything from scratch. The real work is done by developers such as cbanaks and cyo; I am only trying to give them back SOMETHING for all the work they did :)
Last Edit: 8 months 1 week ago by Xelloss.

Comic Vine Scraper 8 months 1 week ago #49044

  • krandor
  • Gold Boarder
  • Posts: 313
  • Thank you received: 34
  • Karma: 5
Xelloss wrote:

Please don't compare me with the GREAT developers of this project; what I do are always minor tweaks and small fixes here and there... I don't make anything new or build anything from scratch. The real work is done by developers such as cbanaks and cyo; I am only trying to give them back SOMETHING for all the work they did :)

Don't sell yourself short. You contribute a ton to the community.

Comic Vine Scraper 8 months 1 week ago #49046

  • krandor
  • Gold Boarder
  • Posts: 313
  • Thank you received: 34
  • Karma: 5
Played with this last night, and it's working much better and much faster, with similar results to the old CV search. There are a few things that didn't get matched, but they are the same things the old search didn't match on (stuff numbered with v1, v2, stuff with numbers in front of the name, or stuff named differently than Comic Vine has it), and I'm used to those and know how to re-search them to get the right result.

So far no "misses" that would be different from old CV search.
The following user(s) said Thank You: Xelloss

Comic Vine Scraper 8 months 1 week ago #49047

  • Xelloss
  • Platinum Boarder
  • Posts: 596
  • Thank you received: 150
  • Karma: 30
krandor wrote:
Played with this last night, and it's working much better and much faster, with similar results to the old CV search. There are a few things that didn't get matched, but they are the same things the old search didn't match on (stuff numbered with v1, v2, stuff with numbers in front of the name, or stuff named differently than Comic Vine has it), and I'm used to those and know how to re-search them to get the right result.

So far no "misses" that would be different from old CV search.

Could you post the searches that don't work? (even if the old CV search didn't work for them either)

The good thing about the new search is that it can be used for non-perfect matches like those, just by tinkering a bit with the filter.
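
One way to tinker with the "v1, v2" cases is to strip a trailing volume marker from the series name before searching. A minimal sketch; the `strip_volume` name and the pattern are assumptions, not part of the scraper, and it does not handle numbers in front of the name:

```python
import re

# Hypothetical pattern for a trailing volume marker such as "v2" or "Vol. 3".
VOLUME_SUFFIX = re.compile(r"\s+(?:v|vol\.?\s*)\d+$", re.IGNORECASE)

def strip_volume(series):
    """Remove a trailing volume marker, so "X-Men v2" is searched as "X-Men"."""
    return VOLUME_SUFFIX.sub("", series.strip())

print(strip_volume("Avengers Vol. 3"))  # Avengers
```

A name with a trailing number that is not a volume marker, such as "Marvel 1602", is left unchanged because the pattern requires a "v" or "vol" prefix.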

Comic Vine Scraper 8 months 1 week ago #49048

  • krandor
  • krandor's Avatar
  • Offline
  • Gold Boarder
  • Posts: 313
  • Thank you received: 34
  • Karma: 5
Xelloss wrote:

Could you post the searches that don't work? (even if the old CV search didn't work for them either)

The good thing about the new search is that it can be used for non-perfect matches like those, just by tinkering a bit with the filter.

Sure thing. I have a thousand or so older comics I need to run through, and I'll start making a note of things that don't match.
The following user(s) said Thank You: Xelloss