Python Scripts for ComicRack

TOPIC: Req? Find Duplicates Based on CVDB#?

Req? Find Duplicates Based on CVDB#? 4 years 4 months ago #34335

  • schroder08
  • Senior Boarder
  • Posts: 52
  • Thank you received: 10
  • Karma: 3
This is something I think would be helpful, if it's possible. I was looking through my New 52 First Wave list the other day and noticed I had a few duplicates. I was lazy at first and turned on Show Duplicates under Views, and my list emptied. Rather odd, but it probably meant some minor flag differed on one of them. I tried rescraping (though I chose not to clear data first), but that didn't help. That's when I thought: I scrape all my comics, so even if some minor data changes they should still have the same CVDB#. I wonder if there's a way to find dupes that way! I am guessing it would need to be some kind of GUI script, so one could go through and decide which copies to keep (unless there's a way to get this into a smart list that I didn't think of, which is also possible). I also realize this would bring up "duplicates" such as a color version and a B&W version of the same comic, but those could be skipped over manually.

So, just throwing this out there, is it doable? I have as much programming experience as I can Google up, although none of that would be practical, nor would I know how to implement it in the best way possible. Thanks for your time and consideration on this.
Last Edit: 4 years 4 months ago by schroder08. Reason: Changed the symbol.
The administrator has disabled public write access.
The following user(s) said Thank You: Saurian333

Re: Req? Find Duplicates Based on CVDB#? 4 years 4 months ago #34353

  • wojosama
  • Gold Boarder
  • Posts: 180
  • Thank you received: 45
  • Karma: 11
If the list in question is a reading list and not a smart list then you may have added some files twice, which I don't believe would be considered a duplicate. EDIT: I thought you could add a book to a reading list more than once but it looks like I was mistaken.

Also, just to put it out there, these are the only fields that are used in determining duplicates (at least in the case of the duplicates smartlist, but I think both methods are the same):
Series
Volume
Number
Format
Year
Month
Day*
Language*

All of them are in the details tab and nothing outside of the Details tab is used.
*not used by scraper AFAIK, so shouldn't be set to anything unless done manually.
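
To make the matching rule concrete, here is a minimal sketch of field-based duplicate detection. It is an illustration only, not ComicRack's actual implementation: the Book tuple and the find_field_duplicates helper are hypothetical, and only the field names come from the list above. It shows how a mismatch in a single field (Language, in this made-up data) is enough to keep two copies from being grouped.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a ComicRack book; only the field names come from the list above.
Book = namedtuple('Book', 'series volume number format year month day language')

def find_field_duplicates(books):
    """Group books by the full tuple of detail fields; return groups with more than one member."""
    groups = defaultdict(list)
    for book in books:
        groups[tuple(book)].append(book)
    return [group for group in groups.values() if len(group) > 1]

# Made-up sample data: a and b agree on every field, c differs only in Language.
a = Book('Batman', 2011, '1', '', 2011, 11, None, '')
b = Book('Batman', 2011, '1', '', 2011, 11, None, '')
c = b._replace(language='af')

exact_pairs = find_field_duplicates([a, b])   # one group containing both copies
hidden_pairs = find_field_duplicates([a, c])  # empty: the Language mismatch splits the pair
```

Keying on the whole tuple is what makes the check strict: every listed field has to agree before two books count as the same.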

If you are scraping everything, even without clearing data, then as long as you have the overwrite option enabled it should be showing them as duplicates.

There might be a smartlist expression to match the Notes field, but I don't know. I'll keep fiddling with the expressions and see if I can make it work, but for now, take a look at your files and make sure that they are truly duplicates.
Last Edit: 4 years 4 months ago by wojosama.
The following user(s) said Thank You: schroder08

Re: Req? Find Duplicates Based on CVDB#? 4 years 2 months ago #35873

  • schroder08
  • Senior Boarder
  • Posts: 52
  • Thank you received: 10
  • Karma: 3
Sorry to dig this up from the grave, but I felt I had some new thoughts to add in favor of this idea. I'd previously responded (or at least tried to), perhaps twice, but I managed to lose my post both times and so I felt I was not meant to reply then. Anyway...

My problem before about the dupes was that there were random comics with a language of Afrikaans and some with English, while the majority have it blank, so that was one reason they didn't show. However...

I admit that I have some difficulty with consistency (catching Limited Series, including the "of" number, sometimes labeling something as TPB, or GN, or not, etc.), and some of that can be helped by better organizing my setup and using the tools available to me. Even so, since I do scrape all comics, the CVDB# would be one consistent thread that could be used to compare and look for duplicates. This is useful now: I notice that there are re-scans of older comics, and if one were to add a re-scan to their library but not select all the same flags as on the older copy, taa daa, they have a dupe that doesn't show up as one.

My other pro argument for this is same name/volume but different other stuff, like Publisher. For example, The Uncanny from Wyldcard and Uncanny from Dynamite. I can ignore them, but I don't like my Duplicate smartlist to sit there with 2 comics in it.

Anyway, these were just some additional thoughts on this, in case anyone has the skill/interest to implement it. Thanks!

Re: Req? Find Duplicates Based on CVDB#? 4 years 2 months ago #35901

  • Caliph
  • Fresh Boarder
  • Posts: 10
  • Thank you received: 13
  • Karma: 8
Save the following code to a file named 'ComicVineSmartLists.py' and place it in the ComicRack/Scripts folder. Make sure you restart ComicRack, then create a new smart list. In the new smart list, set the field to search to User Scripts and you should then see 'Find comics missing CVDB id' and 'Find duplicate CVDB'. Pick either one and it should generate a new smart list showing either comics missing ComicVine data or all of your duplicates.

A couple of gotchas - I have done little to no testing and there is no standardized way to find the returned ComicVine Id. I stash them in the Tags field in the format 'CVDB######' with the hash marks representing the pure numeric ID. They are also usually stored in the Web field in the form of a URL. This script can find them in either location. If you store it in another format or field then we would have to edit the appropriate places in the script.
#Imports
import clr, re
clr.AddReference('System.Windows.Forms')
from System.Windows.Forms import MessageBox
from System.IO import *

#Entry point

#Debugging switches
debug = True
logging = True


#@Name  Find comics missing CVDB id
#@Hook  CreateBookList
#@Key   findMissingCVDB
#@PCount 0
def findMissingCVDB(books,a,b):
    missingComics = []
    for book in books:
        cvid = None
        cvid = findCvidInBook(book)
        if not cvid:
            missingComics.append(book)

    return missingComics

#@Name  Find duplicate CVDB comics
#@Hook  CreateBookList
#@Key   findDuplicateCVDB
#@PCount 0
def findDuplicateCVDB(books,a,b):
    print 'find duplicate cvdb'
    #Dictionary of all cvbd tags
    cvdbTags = {}
    #Comics to return
    retComics = []
    for book in books:
        cvid = findCvidInBook(book)
        if cvid and cvid >=0 :
            if cvdbTags.has_key(cvid):
                retComics.append(book)
                retComics.append(cvdbTags[cvid])
            else:
                cvdbTags[cvid] = book

    return retComics

def findCvidInBook(book):
    cvid = None
    #try search in tags for CVDB##### or skip it if CVDBSKIP
    cvid = searchBook(r'(?i)CVDB(\d{1,}|SKIP)',book.Tags)
    #if nothing in tags then search Web field
    if not cvid:
        cvid = searchBook(r'(?i)comicvine.com/\s*/\d{1,}\-(\d{1,})',book.Web)

    return cvid

def searchBook(regex, stringToSearch):
    searchResult = re.search(regex, stringToSearch)
    return extractSearchBookResult(searchResult)

def extractSearchBookResult(searchResult):
    tag = None
    if searchResult:
        tag = searchResult.group(1).lower()
        tag = extractTag(tag)
    return tag

def extractTag(tag):
    if tag:
        try:
            tag = -1 if tag == 'skip' else int(tag)
        except:
            tag = None
    else:
        tag = None

    if tag < -1:
        print 'ComicVine tag was less than -1?', tag
    return tag
The following user(s) said Thank You: cYo, Saurian333, keving

Re: Req? Find Duplicates Based on CVDB#? 4 years 2 months ago #35902

  • cbanack
  • Platinum Boarder
  • Posts: 1318
  • Thank you received: 503
  • Karma: 181
Might be useful: the latest versions of the scraper always store the ComicVine series (volume) and issue IDs in the custom data fields for each comic.

Re: Req? Find Duplicates Based on CVDB#? 4 years 2 months ago #35903

  • Caliph
  • Fresh Boarder
  • Posts: 10
  • Thank you received: 13
  • Karma: 8
cbanack wrote:
Might be useful: the latest versions of the scraper always store the ComicVine series (volume) and issue IDs in the custom data fields for each comic.
That sounds promising. Can you be a bit more specific about the custom data fields, or tell me how I can access them from the ComicRack API?

Also, there is a bug in the script above: I confused the \S and \s regex operators and was searching for the exact opposite of what I wanted. Just attaching it this time, since that's probably easier than copying and pasting given Python's finickiness with whitespace.
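
For anyone following along, the difference between the two operators can be seen in a small standalone sketch. The Web-field value below is hypothetical (the slug and numbers are made up, shaped only to fit the script's pattern); the two patterns are the Web-field patterns from the two versions of the script.

```python
import re

# Hypothetical Web-field value; the slug and the numbers are invented for illustration.
web = 'http://www.comicvine.com/example-series-1/37-123456/'

# \s matches whitespace only, so \s* cannot consume the 'example-series-1' slug
# and the pattern never reaches the '/37-123456' part.
buggy = re.search(r'(?i)comicvine.com/\s*/\d{1,}\-(\d{1,})', web)

# \S matches non-whitespace, so \S* consumes the slug and the trailing number is captured.
fixed = re.search(r'(?i)comicvine.com/\S*/\d{1,}\-(\d{1,})', web)

issue_id = int(fixed.group(1)) if fixed else None
```

With this sample value the first search finds nothing, while the second captures the number after the hyphen.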

Edit - just pasting the code since I can't figure out how the attachment feature works
#Imports
import clr, re
clr.AddReference('System.Windows.Forms')
from System.Windows.Forms import MessageBox
from System.IO import *

#Entry point

#Debugging switches
debug = True
logging = True


#@Name  Find comics missing CVDB id
#@Hook  CreateBookList
#@Key   findMissingCVDB
#@PCount 0
def findMissingCVDB(books,a,b):
    missingComics = []
    for book in books:
        cvid = None
        cvid = findCvidInBook(book)
        if not cvid:
            missingComics.append(book)

    return missingComics

#@Name  Find duplicate CVDB comics
#@Hook  CreateBookList
#@Key   findDuplicateCVDB
#@PCount 0
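def findDuplicateCVDB(books,a,b):
    print 'find duplicate cvdb'
    #First book seen for each ComicVine ID
    cvdbTags = {}
    #IDs whose first book is already in the result
    reported = set()
    #Comics to return
    retComics = []
    for book in books:
        cvid = findCvidInBook(book)
        #Ignore books with no ID and books marked CVDBSKIP (-1)
        if cvid and cvid >=0 :
            if cvid in cvdbTags:
                #Add the first copy only once, even with three or more copies
                if cvid not in reported:
                    retComics.append(cvdbTags[cvid])
                    reported.add(cvid)
                retComics.append(book)
            else:
                cvdbTags[cvid] = book

    return retComics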

def findCvidInBook(book):
    #try search in tags for CVDB##### or skip it if CVDBSKIP
    cvid = searchBook(r'(?i)CVDB(\d+|SKIP)',book.Tags)
    #if nothing in tags then search Web field
    if not cvid:
        cvid = searchBook(r'(?i)comicvine\.com/\S*/\d+-(\d+)',book.Web)

    return cvid

def searchBook(regex, stringToSearch):
    #Treat a missing field as empty so re.search never receives None
    searchResult = re.search(regex, stringToSearch or '')
    return extractSearchBookResult(searchResult)

def extractSearchBookResult(searchResult):
    #Return the captured ID as an int (-1 for SKIP), or None if nothing matched
    tag = None
    if searchResult:
        tag = searchResult.group(1).lower()
        tag = extractTag(tag)
    return tag

def extractTag(tag):
    #Convert the captured text to an int; 'skip' becomes the sentinel -1
    if not tag:
        return None
    try:
        tag = -1 if tag == 'skip' else int(tag)
    except ValueError:
        return None

    if tag < -1:
        print 'ComicVine tag was less than -1?', tag
    return tag
Last Edit: 4 years 2 months ago by Caliph.

Re: Req? Find Duplicates Based on CVDB#? 4 years 2 months ago #35904

  • cYo
  • Moderator
  • Posts: 3476
  • Thank you received: 675
  • Karma: 181
The commands to get custom values are described in the wiki

comicrack.cyolito.com/documentation/wiki?id=developer:api
Last Edit: 4 years 2 months ago by cYo.

Re: Req? Find Duplicates Based on CVDB#? 4 years 2 months ago #35905

  • Caliph
  • Fresh Boarder
  • Posts: 10
  • Thank you received: 13
  • Karma: 8
cYo wrote:
The commands to get custom values are described in the wiki

comicrack.cyolito.com/documentation/wiki?id=developer:api
Oh nice, so with ShowCustomScriptValues = true in ComicRack.ini I can see the key for ComicVine and then check that value as well. Will try and get that in tonight.

Re: Req? Find Duplicates Based on CVDB#? 4 years 2 months ago #35906

  • cYo
  • Moderator
  • Posts: 3476
  • Thank you received: 675
  • Karma: 181
No need to set the flag, as cbanack chose to save the scraper values without a leading '.', which would have made them private (and the flag would have been needed).

Just turn on the custom settings in preferences.

BTW: Great work with the script :)

Re: Req? Find Duplicates Based on CVDB#? 4 years 2 months ago #35912

  • 600WPMPO
  • Moderator
  • Posts: 3788
  • Thank you received: 557
  • Karma: 232
First off, a +1 karma for giving us a nice script! :-)

I have packaged your script into the crplugin format (this is just a renamed zip file), which can simply be double-clicked to be installed (like any other script). I have also added an icon, which will be displayed in the Script Packages dialog.

File Attachment:

File Name: Find Dupli...crplugin
File Size:4 KB




The Custom tab can be turned on by simply checking the relevant option from the Preferences dialog (Behavior tab -> Application section).



If I have the CVDB info only in the Notes field (and in neither the Tags field nor the custom field), the script doesn't 'see' it.
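
One way the lookup could be widened to cover this case is sketched below. It is only a sketch under assumptions: it assumes the book object exposes a Notes property alongside Tags, and that the notes carry the same 'CVDB######' marker the script already looks for in Tags. The FakeBook class, the helper name, and the sample note text are all hypothetical, there only so the idea can be tried outside ComicRack.

```python
import re

# Same 'CVDB' + digits marker the script searches for in Tags.
CVDB_PATTERN = re.compile(r'(?i)CVDB(\d+)')

class FakeBook(object):
    """Hypothetical stand-in for a ComicRack book; Notes is an assumed property name."""
    def __init__(self, tags='', notes=''):
        self.Tags = tags
        self.Notes = notes

def find_cvid_in_tags_or_notes(book):
    """Return the numeric ID from Tags, falling back to Notes; None if neither holds one."""
    for text in (book.Tags, book.Notes):
        match = CVDB_PATTERN.search(text or '')
        if match:
            return int(match.group(1))
    return None

# Made-up sample data.
notes_only = FakeBook(notes='Scraped from ComicVine [CVDB123456]')
untagged = FakeBook()

found = find_cvid_in_tags_or_notes(notes_only)
not_found = find_cvid_in_tags_or_notes(untagged)
```

In the posted script the equivalent change would presumably be one more fallback search inside findCvidInBook, placed between the Tags search and the Web search.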



And lastly, a request to add another script in this to find books which have the custom fields for CVDB empty. :-)
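
A rough sketch of what such a list function might look like follows. Both the GetCustomValue call and the 'comicvine_issue' key are assumptions here: the getter should be checked against the developer API wiki linked earlier in the thread, and the key against what the Custom tab actually shows. FakeBook is a hypothetical stand-in so the filter can be tried outside ComicRack.

```python
# Assumed key name; check the Custom tab in ComicRack for the key the scraper actually writes.
CV_ISSUE_KEY = 'comicvine_issue'

class FakeBook(object):
    """Hypothetical stand-in exposing only the custom-value getter assumed above."""
    def __init__(self, title, custom=None):
        self.Title = title
        self._custom = custom or {}

    def GetCustomValue(self, key):
        return self._custom.get(key)

def find_missing_custom_cvdb(books, a=None, b=None):
    """Return books whose custom ComicVine issue value is absent or blank."""
    missing = []
    for book in books:
        value = book.GetCustomValue(CV_ISSUE_KEY)
        if not (value or '').strip():
            missing.append(book)
    return missing

# Made-up sample data.
scraped = FakeBook('scraped', {CV_ISSUE_KEY: '123456'})
unscraped = FakeBook('unscraped')
blank = FakeBook('blank', {CV_ISSUE_KEY: '  '})

result = [book.Title for book in find_missing_custom_cvdb([scraped, unscraped, blank])]
```

To show up under User Scripts it would presumably also need the same kind of #@Name, #@Hook CreateBookList, #@Key and #@PCount header lines that the posted script uses.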
Last Edit: 4 years 2 months ago by 600WPMPO.
The following user(s) said Thank You: Saurian333, keving