Python Scripts for ComicRack

TOPIC: Comic Vine Scraper Patch [Not Official]

Comic Vine Scraper Patch [Not Official] 8 months 2 weeks ago #49050

  • Xelloss
  • Platinum Boarder
  • Posts: 596
  • Thank you received: 150
  • Karma: 30
I created this topic to discuss the patch I am making for the recent problems with the Comic Vine Scraper (until an official fix is released).

The latest version of the patch:

File Attachment:

File Name: cvdb2.1-2-3.zip
File Size: 9 KB


(Replace the file in the original Comic Vine Scraper folder if you want to test it. Remember to restart ComicRack after replacing the file for the patch to take effect.)

The idea of this patch is to make the scraper more usable after the changes to the CV API.

(If you want to know more about this, read the latest posts in the official Comic Vine Scraper topic in News.)

PLEASE POST ALL YOUR PROBLEMS HERE (comics that are shown that you would rather not see, comics that are not found, bugs, errors, performance issues, etc.) EVEN IF THE OLD SEARCH DIDN'T FIND THEM (we can analyze whether we can do something to make these comics show up now).
Last Edit: 8 months 1 week ago by Xelloss.
The administrator has disabled public write access.

Comic Vine Scraper Patch [Not Official] 8 months 2 weeks ago #49059

  • krandor
  • Gold Boarder
  • Posts: 313
  • Thank you received: 34
  • Karma: 5
I will post one thing that works with your patch that didn't work before: volumes that start with a number because people named the files in some kind of chronological order. I saw this yesterday but wasn't sure whether the volume script had done something to them. I just ran a few comics, had one like this, and the volume script did NOT catch it.

What I saw in the CR series field was "03 - Incredible Hulks". In the past, this would choke in the scrape and I'd have to remove the 03 and tell it to search again. Now, it found this one fine.

If this holds true, that is a HUGE improvement. Those series where people put numbers in front of series names were a PITA.
The following user(s) said Thank You: Xelloss

Comic Vine Scraper Patch [Not Official] 8 months 2 weeks ago #49060

  • Xelloss
  • Platinum Boarder
  • Posts: 596
  • Thank you received: 150
  • Karma: 30
It is probably because of an improvement I made in the last update...

The differences in how the API search has worked:

Before the API search update:

If you searched for A B C, the API would look for comic volumes containing A and B and C.

If you searched for 03 - Incredible Hulks, the API would not show Incredible Hulks because 03 was not in the name.

After the API search update:

If you search for A B C, the API looks for comic volumes containing A or B or C.

If you search for 03 - Incredible Hulks, the API shows Incredible Hulks because it shows, for example, all comics that have Hulks (or Incredible) in the name. Of course, it also shows hundreds of other comics along with this one.
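
To make the AND/OR difference concrete, here is a small sketch. It is not the Comic Vine API code (which is not shown in this topic); the function names and the sample titles are made up for illustration, and it assumes a "word" is any run of letters or digits.

```python
import re

def tokens(text):
    """Return the set of lowercase alphanumeric words in text."""
    return set(re.findall(r"[^\W_]+", text.lower()))

def matches_all(query, title):
    """Old behavior (AND): every search word must appear in the title."""
    return tokens(query) <= tokens(title)

def matches_any(query, title):
    """New behavior (OR): a single shared word is enough."""
    return bool(tokens(query) & tokens(title))

# Hypothetical result titles, only for illustration.
titles = ["Incredible Hulks", "Hulks Unleashed", "Incredible Hercules", "Batman"]
query = "03 - Incredible Hulks"

and_hits = [t for t in titles if matches_all(query, t)]
or_hits = [t for t in titles if matches_any(query, t)]
```

With AND, nothing matches because no title contains 03. With OR, every title sharing at least one word with the search comes back, which is why the result list gets so long.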

With the old version of my patch:

If you searched for 03 - Incredible Hulks, the API returned Incredible Hulks among hundreds of other comics, as described above, BUT the script did not show it because I filtered out all comics that didn't have 03 in the name.

With the new version of my patch:

If you search for 03 - Incredible Hulks, the API still returns Incredible Hulks among hundreds of other comics. The script first filters them as before, but after finding no results, it then filters out only comics with more than 1 word missing... so it ignores the missing 03 and shows the comic anyway...


This is one of many conditions I added to improve the results :)

Important note: The missing-words option is only enabled after the "no missing words" criterion returns no results. The allowed number of missing words is then incremented by 1 each time, and only each time, the previous criterion didn't return ANY result. Once at least 1 result is returned, that becomes the final criterion, and the script returns all comics with that number of missing words (no more).

The maximum number of missing words allowed is 5/6 of the number of words searched (truncated to an integer). This analysis is done only on the first 100 results; if nothing is found within those 100 results, no results are shown (to avoid loading more results when nothing matches).

Also, even with the "no missing words" criterion, the script will not filter out results with 20% or less of the words missing (this is for very long searches, where misspelling a word is common).
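
The rules above can be sketched roughly like this. This is only an illustration of the idea, not the code in the patch: the function names, the `sample_size` parameter, and the way words are split are all assumptions, and the 20% and 5/6 limits are simply truncated with integer division.

```python
import re

def words(text):
    """Lowercase the text and keep only runs of letters and digits."""
    return re.findall(r"[^\W_]+", text.lower())

def count_missing(search_words, title):
    """Count how many search words do not appear in the title."""
    title_words = set(words(title))
    return sum(1 for w in search_words if w not in title_words)

def filter_results(search, titles, sample_size=100):
    """Relax the missing-words limit one step at a time until something matches."""
    search_words = words(search)
    n = len(search_words)
    if n == 0:
        return []
    counts = [count_missing(search_words, t) for t in titles]
    base = n // 5          # up to 20% missing is tolerated from the start
    limit = (5 * n) // 6   # never allow more than 5/6 of the words to be missing
    for allowed in range(base, limit + 1):
        # Only the first sample_size results decide whether this level is enough.
        if any(c <= allowed for c in counts[:sample_size]):
            return [t for t, c in zip(titles, counts) if c <= allowed]
    return []

hulks = filter_results("03 - Incredible Hulks",
                       ["Incredible Hulks", "Hulk", "Incredible Hercules"])
avengers = filter_results("Uncany Avengers",
                          ["Uncanny Avengers", "The Avengers", "X-Men"])
```

In the first call only "Incredible Hulks" survives (1 word missing). In the second call the misspelled word is missing everywhere, so every title containing "Avengers" is kept, which is the weakness of this approach.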
Last Edit: 8 months 2 weeks ago by Xelloss.

Comic Vine Scraper Patch [Not Official] 8 months 2 weeks ago #49061

  • krandor
  • Gold Boarder
  • Posts: 313
  • Thank you received: 34
  • Karma: 5
Good work!

I just tested another one. I didn't have any more with numbers in front, but I had a few starting with a parenthesis (not sure where they came from, but I know I once bulk renamed a bunch that started with numbers because it was annoying, and it may have left a parenthesis behind).

Anyway, I had a group with the series set to ") The Transformers".

Again, it worked fine as far as identifying it as Transformers. I had to tell it which volume it was since the cover image was different, but that is no big deal.

I'll continue to let you know what I find.
The following user(s) said Thank You: Xelloss

Comic Vine Scraper Patch [Not Official] 8 months 2 weeks ago #49062

  • Xelloss
  • Platinum Boarder
  • Posts: 596
  • Thank you received: 150
  • Karma: 30
The filter, the script and, I think, the API only see numbers and letters...

One of the first changes I made to the filter was exactly that: when filtering results, it only sees numbers and letters, not symbols (as the API does the same).

So ") Transformers" and "transformers" (all in lowercase) are exactly the same to the script.
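
As a minimal sketch of that idea (the `normalize` name is made up here, and the real filter may split words differently), comparing names after dropping symbols and case could look like this:

```python
import re

def normalize(name):
    """Keep only letters and digits, lowercase, joined by single spaces."""
    return " ".join(re.findall(r"[^\W_]+", name.lower()))

same = normalize(") Transformers") == normalize("transformers")
```

Here `same` is true, because the parenthesis and the capital letter are both ignored before comparing.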

Comic Vine Scraper Patch [Not Official] 8 months 2 weeks ago #49063

  • krandor
  • Gold Boarder
  • Posts: 313
  • Thank you received: 34
  • Karma: 5
I'm starting to wonder if this OR structure with changes like you have made might not actually be better long term.

Comic Vine Scraper Patch [Not Official] 8 months 2 weeks ago #49064

  • Xelloss
  • Platinum Boarder
  • Posts: 596
  • Thank you received: 150
  • Karma: 30
That was exactly what I was saying in the other topic... However, after much trial and error, I discovered that the sorting of results in the new API is not as reliable as it should be... so I am not sure this won't cause a lot of bugs we didn't have before...

Also, when there are more than 100 results, we now don't know how many results we have to wait for... the bar will sometimes show thousands when there are really only hundreds, and it will stop downloading them "randomly" (from the user's point of view, at least).

Also, if you play a bit with the filter rules, you can easily see cases where the filter is not as reliable as I would like it to be...

For example: try searching for Uncany Avengers (Uncany with only one n). Instead of telling you the comic doesn't exist, it will show you ALL Avengers comics (which will take a while).
Last Edit: 8 months 2 weeks ago by Xelloss.

Comic Vine Scraper Patch [Not Official] 8 months 2 weeks ago #49066

  • beardyandy
  • Senior Boarder
  • Posts: 48
  • Thank you received: 5
  • Karma: 0
Thanks for looking at the CV Scraper, Xelloss.

I just got a reproducible error. Posting it in case it relates to the lines you've changed.

I'm deleting the file anyway (but will keep it for a couple of days in case you want to look).

--------------------------------------------------------------------------------
CV Scraper Version:  1.0.93
Running As:          ComicRack Plugin (CR version 0.9.178)
Cache Directory:     C:\Users\name\AppData\Roaming\Comic Vine Scraper\localCache
Settings File:       C:\Users\name\AppData\Roaming\Comic Vine Scraper\settings.dat
--------------------------------------------------------------------------------

--------------------------------------------------------------------
[X] Series          [X] Volume          [X] Number          
[X] Title           [X] Published       [X] Released        
[X] Crossovers      [X] Publisher       [X] Imprint         
[X] Writer          [X] Penciller       [X] Inker           
[X] Colorist        [X] Letterer        [X] Cover Art       
[X] Editor          [X] Summary         [X] Characters      
[X] Teams           [X] Locations       [X] Webpage         
-------------------------------------------------------------------
[X] Overwrite Existing        [ ] Ignore Blanks             
[X] Convert Imprints          [X] Autochoose Series         
[X] Download Thumbs           [X] Preserve Thumbs           
[ ] Confirm Issues            [ ] Rescraping: Notes         
[ ] Fast Rescrape             [ ] Rescraping: Tags          
[X] Summary Dialog            
-------------------------------------------------------------------

======> scraping next comic book: 'BD FR - Wunderwaffen présente Zeppelin's war - Les Raiders de la nuit v1.cbz'
trying to match this book automatically...
saved window geometry: comicformLocation comicformSize
------------------- PYTHON ERROR ------------------------
Caught UnicodeDecodeError: ('unknown', u'\xe9', 0, 1, '')
Traceback (most recent call last):
  File "\\name\portableapps\comicrack\Data\Scripts\Comic Vine Scraper\scrapeengine.py", line 143, in scrape
  File "\\name\portableapps\comicrack\Data\Scripts\Comic Vine Scraper\scrapeengine.py", line 257, in _ScrapeEngine__scrape
  File "\\name\portableapps\comicrack\Data\Scripts\Comic Vine Scraper\scrapeengine.py", line 426, in _ScrapeEngine__scrape_book
  File "\\name\portableapps\comicrack\Data\Scripts\Comic Vine Scraper\automatcher.py", line 38, in find_series_ref
  File "\\name\portableapps\comicrack\Data\Scripts\Comic Vine Scraper\automatcher.py", line 75, in __find_best_series
  File "\\name\portableapps\comicrack\Data\Scripts\Comic Vine Scraper\db.py", line 156, in query_series_refs
  File "\\name\portableapps\comicrack\Data\Scripts\Comic Vine Scraper\cvdb.py", line 143, in _query_series_refs
  File "\\name\portableapps\comicrack\Data\Scripts\Comic Vine Scraper\cvdb.py", line 213, in __query_series_refs

wrote debug logfile: cvs-debug-log-2018-02-06.txt
Last Edit: 8 months 2 weeks ago by beardyandy.

Comic Vine Scraper Patch [Not Official] 8 months 2 weeks ago #49067

  • Xelloss
  • Platinum Boarder
  • Posts: 596
  • Thank you received: 150
  • Karma: 30
Definitely the lines I added... Mmmh... It seems to be something about Unicode...

Could you post the filename of the file that caused this bug? It really worries me... I didn't think Unicode could cause a problem here D:

Which version did you use? The latest one?
Last Edit: 8 months 2 weeks ago by Xelloss.

Comic Vine Scraper Patch [Not Official] 8 months 2 weeks ago #49068

  • krandor
  • Gold Boarder
  • Posts: 313
  • Thank you received: 34
  • Karma: 5
You were asking for things that didn't match. Here is an interesting case.

Filename - "Batman v2 50 (2016) (Webrip) (The Last Kryptonian-DCP.cbz" (correct, no closing parenthesis)

So CVS searched for

Batman (The Last Kryptonian-DCP

Which of course failed. I just removed the extra stuff and it worked.

Definitely a case where they named the file wrong.
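
For what it's worth, one possible way to cope with names like this is sketched below. It is only an idea, not something the scraper or the patch is known to do, and the `strip_parenthetical` helper is hypothetical: it removes complete "(...)" groups and then cuts the text at any opening parenthesis that was never closed.

```python
import re

def strip_parenthetical(text):
    """Drop "(...)" groups, then anything after an unclosed "("."""
    # Remove balanced, non-nested parenthetical groups first.
    text = re.sub(r"\([^()]*\)", " ", text)
    # Whatever follows a leftover "(" was never closed, so cut it off.
    text = text.split("(", 1)[0]
    # Collapse the extra whitespace left behind.
    return " ".join(text.split())

cleaned = strip_parenthetical("Batman (The Last Kryptonian-DCP")
```

With the search string from this post, `cleaned` ends up as just "Batman". The obvious risk is a series whose real name contains a parenthesis, so this would need testing on more files.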