TOPIC: Comic Vine Scraper 1.0.44-47

Re: Comic Vine Scraper 1.0.44-47 5 years 8 months ago #21201

  • Samael69
Hey Cory, I have a ton of files that need scraping and that have leading numbers (story-arc sequencing). Currently these have to be manually scraped one at a time. Could I modify the __cleanup_search_terms method to ignore leading numbers? I haven't done Python, but if I modify the cvdb.py script, will it automatically become "live", or do I have to compile it in some manner? One would think it would be as simple as a regex string sub, similar to what's already there.

Thanks.

**Update**
Adding this to the method seems to do the trick... I'll just comment it out when I'm done.

search_terms_s = re.sub(r"(\b\d+\b)", '', search_terms_s)
Last Edit: 5 years 8 months ago by Samael69.
The topic has been locked.

Re: Comic Vine Scraper 1.0.44-47 5 years 8 months ago #21202

  • cbanack
Yes, that should work exactly as you are expecting; just change the file and restart ComicRack, no need to recompile or anything. Use Python's "re" module, like the examples you see there. For example, to remove all the digits that appear at the start of your comic name, use something like:

search_terms_s = re.sub(r'^\d+', '', search_terms_s)

That will turn:

"11. Batman: The Dark Guy"

into:

". Batman: The Dark Guy"

As long as you make your modifications on the first line of __cleanup_search_terms(), the rest of the code in that function will take care of things like that pesky extra period...

(Feel free to post an example if you want me to suggest a different regex pattern.)
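If you would rather not rely on the later cleanup steps, a minimal sketch of a pattern that also consumes the separators after the number could look like this. The helper name strip_leading_number and the exact separator set are assumptions for illustration, not part of cvdb.py:

```python
import re

# Hypothetical pattern: leading digits plus any spaces, periods,
# hyphens or underscores that directly follow them.
LEADING_NUMBER = re.compile(r'^\d+[\s.\-_]*')

def strip_leading_number(search_terms_s):
    """Remove a leading sequence number and the separators after it."""
    return LEADING_NUMBER.sub('', search_terms_s)

print(strip_leading_number("11. Batman: The Dark Guy"))
print(strip_leading_number("015 The Mighty Thor v1 411"))
```

Be aware that a pattern like this also strips numbers that are really part of the title (a series whose name starts with a number, for instance), so it is probably best used only on batches you know are sequence-numbered.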


If you have problems, you can even run ComicRack with the debug console, then modify the code more like this:

log.debug("BEFORE: ", search_terms_s)
search_terms_s = re.sub(r'^\d+', '', search_terms_s)
log.debug("AFTER: ", search_terms_s)

That's useful if you need to debug your regex pattern.
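The same before/after check can also be tried outside ComicRack in a plain Python session. This is only a sketch; preview_pattern is a made-up helper, and it uses print rather than the scraper's own log module:

```python
import re

def preview_pattern(pattern, names):
    """Return (before, after) pairs so a pattern can be checked by eye."""
    return [(name, re.sub(pattern, '', name)) for name in names]

samples = ["11. Batman: The Dark Guy", "Batman 11"]
for before, after in preview_pattern(r'^\d+', samples):
    print("BEFORE: %r  AFTER: %r" % (before, after))
```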

Good luck!

Re: Comic Vine Scraper 1.0.44-47 5 years 8 months ago #21203

  • cbanack
Samael69 wrote:
search_terms_s = re.sub(r"(\b\d+\b)", '', search_terms_s)
Yup, that looks like it should work; just be aware that it will remove ALL standalone groups of digits from the name, not just the ones at the beginning. If you want to deal only with digit groups at the beginning, see the example I mentioned in my last post.

But honestly, removing all of the groups of digits is not likely to impede your searching/scraping very much, so it's probably not a big deal either way. :cheer:
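To make the difference between the two patterns concrete, here is a small side-by-side sketch on a sample name (the name is just an illustration):

```python
import re

name = "015 The Mighty Thor v1 411"

# Removes every standalone group of digits ("015" and "411").
# The "1" in "v1" stays, because there is no word boundary between "v" and "1".
all_groups = re.sub(r"(\b\d+\b)", '', name)

# Removes only the digits at the very start of the string.
leading_only = re.sub(r'^\d+', '', name)

print(repr(all_groups))
print(repr(leading_only))
```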
Last Edit: 5 years 8 months ago by cbanack.

Re: Comic Vine Scraper 1.0.44-47 5 years 8 months ago #21204

  • Samael69
It still searches every issue. So, say

015 The Mighty Thor v1 411
029 The Mighty Thor v1 412
035 The Mighty Thor v1 413

While it will remove the leading numbers, it doesn't group them and forces a search every time, as if the "When several comics appear to be from the same series" checkbox is unchecked, but at least it's one less step since I don't have to remove the numbers every time.

Is there somewhere else I can put the "sub" so that grouping works properly?

BTW, thanks for the help. It's kind of an oddball situation, but I have 1200 of these things, so it will take all day to go through them otherwise.
Last Edit: 5 years 8 months ago by Samael69.

Re: Comic Vine Scraper 1.0.44-47 5 years 8 months ago #21205

  • cbanack
There is a function called unique_series_s() in comicbook.py that is responsible for ensuring that comics from the same series are grouped together (so you don't get re-asked about each one.)

code.google.com/p/comic-vine-scraper/sou...tils/comicbook.py#48

Basically, you have to make it so that unique_series_s() returns the exact same string for any two comics that are from the same series -- AND a different string if they are from different series (so no, you can't just make it return "The Mighty Thor" for everything. :pinch: )

Another regular expression substitution on "sname" (i.e. after the first line of the function), similar to what you've already done, would probably do the trick; I'm not at home right now so I can't try it out to be absolutely sure.
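As a rough illustration of the grouping idea only: the sketch below is not the real unique_series_s() from comicbook.py. The functions series_key and group_by_series are hypothetical, and it assumes the series name arrives as a plain string with the sequence number still at the front:

```python
import re
from collections import defaultdict

def series_key(sname):
    """Hypothetical grouping key: drop a leading sequence number,
    then normalize case and spacing."""
    sname = re.sub(r'^\d+[\s.\-_]*', '', sname)
    return ' '.join(sname.lower().split())

def group_by_series(snames):
    """Collect names that share the same key into one group."""
    groups = defaultdict(list)
    for sname in snames:
        groups[series_key(sname)].append(sname)
    return dict(groups)

names = ["015 The Mighty Thor", "029 The Mighty Thor", "007 Batman"]
print(group_by_series(names))
```

The point is that the two Thor entries produce an identical key while the Batman entry produces a different one, which is the property the grouping needs.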


Alternatively, if you don't have too many different series, it might just be easier to multi-select them all in ComicRack and set their series names manually; the scraper should use your manually set series name whenever one is available. For example, select all your "Mighty Thor" comics, open the comic info dialog, and set their series name to "The Mighty Thor". Then when you scrape them, they should all be from that same series...
Last Edit: 5 years 8 months ago by cbanack.

Re: Comic Vine Scraper 1.0.44-47 5 years 8 months ago #21206

  • perezmu
Samael69 wrote:
BTW, thanks for the help. It's kind of an oddball situation, but I have 1200 of these things, so it will take all day to go through them otherwise.

Hey Samael, you are not alone in this... check this script, and run it before scraping :) . You might need to commit proposed values or make a simple rename of the comics before scraping, though.

Remove Leading Numbers script

Re: Comic Vine Scraper 1.0.44-47 5 years 8 months ago #21207

  • cbanack
perezmu wrote:
Samael69 wrote:
BTW, thanks for the help. It's kind of an oddball situation, but I have 1200 of these things, so it will take all day to go through them otherwise.

Hey Samael, you are not alone in this... check this script, and run it before scraping :) . You might need to commit proposed values or make a simple rename of the comics before scraping, though.

Remove Leading Numbers script
Clearly this is something I'll need to look into for a future scraper feature...

Re: Comic Vine Scraper 1.0.44-47 5 years 8 months ago #21208

  • perezmu
cbanack wrote:
Clearly this is something I'll need to look into for a future scraper feature...

I've been doing this forever, running the above script before scraping, without any problems... :)

Re: Comic Vine Scraper 1.0.44-47 5 years 8 months ago #21209

  • Samael69
perezmu wrote:
Samael69 wrote:
BTW, thanks for the help. It's kind of an oddball situation, but I have 1200 of these things, so it will take all day to go through them otherwise.

Hey Samael, you are not alone in this... check this script, and run it before scraping :) . You might need to commit proposed values or make a simple rename of the comics before scraping, though.

Remove Leading Numbers script

In this case, I want the leading numbers. These are pre-sorted story-arc packs.

Re: Comic Vine Scraper 1.0.44-47 5 years 8 months ago #21210

  • perezmu
Samael69 wrote:
In this case, I want the leading numbers. These are pre-sorted story-arc packs.

Oh! Sorry, I misunderstood. Here is what I'd do: sort the issues by these numbers, set Alternate Series to the name of the story arc you want, use the autonumber script to fill the alternate number field, remove the leading numbers, and then scrape, making sure not to overwrite the Alternate Series and alternate number, which will now hold the info you need.
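The "keep the number, but move it out of the name" step can be sketched in plain Python as below. This is only an illustration of the idea; split_arc_position and the dictionary keys are made up here and are not ComicRack's API or the autonumber script:

```python
import re

def split_arc_position(filename_s):
    """Return (arc_position, remaining_name).
    arc_position is None when there is no leading number."""
    match = re.match(r'^(\d+)[\s.\-_]*(.*)$', filename_s)
    if match is None:
        return None, filename_s
    return int(match.group(1)), match.group(2)

files = ["015 The Mighty Thor v1 411", "029 The Mighty Thor v1 412"]
for f in files:
    position, name = split_arc_position(f)
    # Hypothetical record: the arc position is preserved separately,
    # so the cleaned name can be scraped without losing the reading order.
    record = {"alt_number": position, "name": name}
    print(record)
```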