Python Scripts for ComicRack

TOPIC: Other Ideas for ComicVine Scraper

Other Ideas for ComicVine Scraper 4 years 8 months ago #31122

  • Wolverutto
  • Senior Boarder
  • Posts: 42
  • Karma: 0
Two ideas and one bug, actually.
I'm posting this just to show the changes I made, so that you can add them to the Scraper if you like. If anyone is interested, I can also upload a script that can be used separately from ComicVine Scraper.

1. Limit the length of the summary that is downloaded from Comic Vine. I don't mind when people write poems (well, actually I do), but sometimes they really write too much, so I would like the summary to be truncated (see below for how I did it).

2. Associate each series with a SeriesGroup and a Genre automatically (see below).
Example:
Batman=[Batman] [Superheroes: DC comics]
Detective Comics=[Batman] [Superheroes: DC comics]
I use SeriesGroup a lot, so I find it useful.
It can be integrated into CV Scraper or used as a stand-alone script, which is what I do (see below).

3. I would like to not have to click twice when there is only one issue in a series.
You click once to choose the series, and then you are asked again to choose the only issue.
I think it's a bug, because your code addresses this case but it still happens.

SO...

To limit the length of the Summary I changed one line. I set the limit at 500 characters and added a small hack so that words are not cut in half.
In pluginbookdata.py:
if "summary_s" in ok_to_update:
         self.__crbook.Summary = self.summary_s[:500].rstrip("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-'")[:-1] + "..."
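
A more general way to cut at a word boundary, sketched here as a standalone helper. The name truncate_summary and its parameters are made up for illustration and are not part of ComicVine Scraper. Unlike the one-liner, it leaves short summaries untouched and does not depend on a fixed list of letters.

```python
def truncate_summary(text, limit=500, suffix="..."):
    """Shorten text to roughly `limit` characters without splitting a word.

    Hypothetical helper for illustration only; not part of the scraper.
    The result can be up to len(suffix) characters longer than `limit`.
    """
    if len(text) <= limit:
        return text  # short summaries are returned unchanged

    cut = text[:limit]
    if not text[limit].isspace():
        # The cut may have landed inside a word: back up to the last
        # whitespace character, if there is one.
        last_space = max(cut.rfind(" "), cut.rfind("\n"), cut.rfind("\t"))
        if last_space > 0:
            cut = cut[:last_space]
    return cut.rstrip() + suffix
```

If the text contains no whitespace at all before the limit, the sketch falls back to a hard cut at the limit.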

About SeriesGroup,
If we write something like this in a configuration file:
Superman=>[Superman] [Superheroes]
Batman=>[Batman] [Superheroes]
Detective Comics=>[Batman] [Superheroes]
The first value is searched for in the series name, so "Batman" matches "Batman", "Batman and Robin" and so on.
The value in the first pair of square brackets is the SeriesGroup, so Batman and Detective Comics get the same one.
The value in the second pair of square brackets is the Genre.
The script stops after the first match, so a series called "Superman and Batman" will be grouped as [Superman] with the rules shown above.
You can add as many genres as you want for a book, just separate them with a comma, like [Horror, Fantasy].

Here is the code. I don't actually use it integrated with ComicVine Scraper but as a side script; if someone finds it useful I can upload a zip.
import re

#seriesName is obvious, and lines is the list of all the lines of
#the text file with all the options, possibly the same file as
#the one CV uses.
def FindSerieGroup(seriesName, lines):
  for line in lines:
    match = re.match(r"^(.*)=>\[(.*?)\] \[(.*)\]$", line.strip())
    #lines that are not rules do not match and are skipped
    if match and match.group(1) in seriesName:
      #first rule whose key occurs in the series name wins
      return [match.group(2), match.group(3)]
  #no rule matched
  return None

The two matches are returned to the calling function and the values applied to the respective tags (SeriesGroup, Genre).
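
As a rough, self-contained illustration of the rule format described above, here is a sketch run against the three example rules. The names RULE, find_series_group and rules are chosen for this example only, and how the returned values get written to the book's SeriesGroup and Genre tags is left out because it depends on the calling script.

```python
import re

# One rule per line: key=>[SeriesGroup] [Genre]
RULE = re.compile(r"^(.*)=>\[(.*?)\] \[(.*)\]$")

def find_series_group(series_name, lines):
    """Return (series_group, genre) for the first matching rule, else None."""
    for line in lines:
        match = RULE.match(line.strip())
        # Skip lines that are not rules, and rules with an empty key
        # (an empty key would otherwise match every series name).
        if match and match.group(1) and match.group(1) in series_name:
            return match.group(2), match.group(3)
    return None

rules = [
    "Superman=>[Superman] [Superheroes]",
    "Batman=>[Batman] [Superheroes]",
    "Detective Comics=>[Batman] [Superheroes]",
]

print(find_series_group("Batman and Robin", rules))
print(find_series_group("Superman and Batman", rules))
print(find_series_group("Hellboy", rules))
```

Because the first matching rule wins, the order of the lines in the configuration file matters: "Superman and Batman" lands in the Superman group only because the Superman rule comes first.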

For the single-issue problem, if I am not wrong this change should do the job:
if len(issue_refs) == 1 and not issue_num_s == '' and not force_b: etc...

Ciao
Hic
Last Edit: 4 years 8 months ago by Wolverutto.
The administrator has disabled public write access.

Re: Other Ideas for ComicVine Scraper 4 years 8 months ago #31167

  • cbanack
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
Items number 1 and 2 are definitely the kind of thing that would go better in a second script; in fact, a couple weeks ago someone started a thread for a general purpose post-scraping data "correction" script. Those two ideas would fit well in that.

comicrack.cyolito.com/forum/16-developer...vine-in-your-library

As for item number 3, can you send me the name of a file that exhibits this behaviour? It currently seems to be working ok for me, but I'd like to try to duplicate the problem that you're seeing.

Re: Other Ideas for ComicVine Scraper 4 years 8 months ago #31172

  • Wolverutto
  • Senior Boarder
  • Posts: 42
  • Karma: 0
Sorry, I forgot to say which file I had changed.
It's in scrapeengine.py,
in def __choose_issue_ref, at point #2, where the comment says "# 2. if we don't know the issue number, and there is only one issue in ..."

if len(issue_refs) == 1 and not issue_num_s and not force_b:
     result = IssueFormResult("OK", list(issue_refs)[0])

but this is preceded by:
issue_num_s = '' if not book.issue_num_s else book.issue_num_s

If I guessed right, "not issue_num_s" is always False, even when issue_num_s == '', so the AND always fails (at least, it does on my PC).

Ciao
Hic

Re: Other Ideas for ComicVine Scraper 4 years 8 months ago #31175

  • cbanack
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
Actually, in Python an empty string evaluates as false, so you can test for one with the 'not' operator. The following code:
x = 'hello'
y = ''
if x:
  print 'x is not empty'
if y:
  print 'y is not empty'
will print only one line:
x is not empty
Thus, as far as I can see, the code you're referring to is working correctly. It should work as follows for each comic being scraped:
  • If there is NO issue number available, but there is ONLY one issue in the series, choose that issue automatically.
  • If there is NO issue number available and the series has two or more issues, show the issue chooser dialog so the user can select the correct issue.
  • If there IS an issue number available, search through the series and automatically choose the issue based on the comic's issue number. If none of the available issue numbers match, show the issue chooser dialog so the user can select the correct issue. This happens even if there is only one issue to choose from; the user needs to confirm that choice since it doesn't match up with the comic's issue number.
As you can see, there are still two ways that you might see the issue chooser dialog, even when a series contains only one issue: 1) if the scraper couldn't find an issue number in your comic's filename, or 2) if it found an issue number that doesn't match any of comic vine's issue numbers for that series.
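
The three rules above can be summed up in a small sketch. This is not the scraper's actual code: auto_choose_issue and its arguments are hypothetical names, issue numbers are modelled as plain strings, and the force_b flag from the real method is ignored. A return value of None stands for "show the issue chooser dialog".

```python
def auto_choose_issue(issue_num_s, series_issue_nums):
    """Return the issue number to pick automatically, or None to ask the user.

    issue_num_s       -- issue number parsed from the comic ('' if unknown)
    series_issue_nums -- list of issue numbers available in the chosen series
    """
    if not issue_num_s:
        # No issue number known: only a single-issue series can be
        # resolved without asking.
        if len(series_issue_nums) == 1:
            return series_issue_nums[0]
        return None

    # An issue number is known: pick it automatically only on an exact match.
    if issue_num_s in series_issue_nums:
        return issue_num_s

    # Known number with no match: ask, even if the series has one issue.
    return None
```

With these rules, a comic parsed as issue '200500' in a series that only contains issue '1' still ends up at the dialog, because the known number does not match.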

Is it possible that you ran into one of these two situations? If not, please send me the filename for a comic that causes the problem, so I can step through the code myself and see what's going on.
Last Edit: 4 years 8 months ago by cbanack.

Re: Other Ideas for ComicVine Scraper 4 years 8 months ago #31181

  • Wolverutto
  • Senior Boarder
  • Posts: 42
  • Karma: 0
I tried, as an example, "Marvel Milestones: Venom & Hercules", and it shows me the issue form.

Anyway, I am falling in love with Python; it is much better than C++ for small win32 applications.

Thanks for your time.
Bye
Hic

Re: Other Ideas for ComicVine Scraper 4 years 8 months ago #31182

  • cbanack
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
Wolverutto wrote:
I tried as an example: "Marvel Milestones: Venom & Hercules", it shows me the issue form.
Is this the name of the file? "Marvel Milestones: Venom & Hercules.cbz"? I need to know whether the filename contains any kind of issue number.

Re: Other Ideas for ComicVine Scraper 4 years 8 months ago #31183

  • Wolverutto
  • Senior Boarder
  • Posts: 42
  • Karma: 0
No number at all.
Is that the normal behaviour?
If so, I will change the code to suit my needs; I would like to skip the choice every time there is only one issue.
Hic

Re: Other Ideas for ComicVine Scraper 4 years 8 months ago #31184

  • Wolverutto
  • Senior Boarder
  • Posts: 42
  • Karma: 0
I am sorry, there's a number.
The file is "200500 Marvel Milestones - Blade, Man-Thing & Satana.cbz"
Does the scraper try to get a number from there?

I think I had misunderstood how you wanted it to behave. So in my case it would be better to remove the issue_num_s check from the code entirely, since I want the issue to be assigned automatically whenever there is only one issue in the database.

Sorry if I wasted your time.
Hic
Last Edit: 4 years 8 months ago by Wolverutto.

Re: Other Ideas for ComicVine Scraper 4 years 8 months ago #31185

  • cbanack
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
Yes, that's what's happening: '200500' is being parsed as an issue number by ComicRack and also by ComicVineScraper. (It seems like 200500 is much too large to be a valid issue number, but we have to allow it because every now and then a comic series will come out with a silly issue number like "1,000,000".)

So the scraper thinks the comic is issue #200500, and since there is only an issue #1 for that series, it asks you to confirm that you want to change the issue number for this comic. If you renamed your comic to "Marvel Milestones - Blade, Man-Thing & Satana.cbz", it would work as you want (only one confirmation).

Don't worry, you didn't waste my time. It's useful for me to go back and review these things from time to time. :)