Python Scripts for ComicRack

TOPIC: Work In Progress and some thoughts...

Work In Progress and some thoughts... 6 years 10 months ago #11812

  • perezmu
  • perezmu's Avatar
  • Offline
  • Platinum Boarder
  • Posts: 1114
  • Thank you received: 64
  • Karma: 51
Hello, everyone!

I've been silent for a while regarding my scripts... Many other things have taken my time lately and I have not had the chance to work seriously on anything. I have a couple of projects half cooked, so I thought I would share them with you to see how much interest there is in them, as a way to encourage and somehow commit myself to work harder on them.

So, this is my current WIP:

- Improvement of the Infopanel Metadata script: I am trying to implement some buttons for changing the size of the font and/or cover, as cYo requested. Here I am having problems, since my script behaves differently on different computers... I guess there is such a variety of IE engines out there (7, 8, 9, x32, x64...). It is a mess. I am seriously considering trying to develop this script directly in Flash... Any thoughts?

- Improvement of the Coverflow script: This was much less successful than I expected, but I like the idea, and I would like to create something good here, with more info. The ideal would be a combination of the coverflow and infopanel metadata scripts... Again, I am thinking of doing this in Flash, or at least making use of HTML5, which would require an IE9 update...

- Dupes management: This is just a concept I am working on, not yet written a single line of code, but at least, after much thought, I guess I know what I want it to do. It is frustrating for me to spend countless hours removing dupes... so I definitely will try to do something in that respect. The idea is to (i) identify dupes, (ii) go through a series of rules in a given order provided by the user to remove those files, like 'keep all named c2c and delete all noads', 'prefer JohnDoe's scans to JaneDoe's scans', 'prefer larger files', 'prefer more covers'... Of course these will need to be based on (a) filename, (b) number of pages, and (c) file size. This will take a while, but I hope I will at least be able to automate some of the worst work.
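The ordered-rule idea above can be sketched in a few lines of Python. This is only an illustration of the concept, not working ComicRack code: the rule names, dict fields, and the "keep the survivors of each rule until one remains" policy are all assumptions.

```python
# Illustrative sketch of the rule cascade: each rule inspects a group of
# duplicate entries and returns the subset it would keep.

def prefer_c2c(group):
    """If any scan is marked 'c2c', keep those and drop the rest."""
    c2c = [b for b in group if 'c2c' in b['filename'].lower()]
    return c2c or group

def prefer_larger_files(group):
    """Keep only the largest file(s) in the group."""
    biggest = max(b['size'] for b in group)
    return [b for b in group if b['size'] == biggest]

def resolve_dupes(group, rules):
    """Apply the user's rules in order until one candidate remains."""
    for rule in rules:
        group = rule(group)
        if len(group) == 1:
            break
    return group

dupes = [
    {'filename': 'Comic 001 (noads).cbz', 'size': 10000000},
    {'filename': 'Comic 001 (c2c).cbz',   'size': 12000000},
]
keep = resolve_dupes(dupes, [prefer_c2c, prefer_larger_files])
```

Here `keep` ends up holding only the c2c scan; swapping the rule order changes the outcome, which is exactly why the user-provided ordering matters.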

Now, some thoughts on scripting and a couple of ideas:

- Hard to keep up: Man, if I look at some of the most evolved scripts out there, I cannot understand them... the level in some scripts is just incredible, I could never do such a thing... meaning that it is somewhat daunting to try to put out my humble 'copy & paste' scripts... Uf... you guys rock!

- Missing issues: I think this, along with duplicates, is one of the most interesting problems for 'scripters' out there to try and solve. I have given this much thought, but since I have other projects at hand, I cannot get into them. So I share two possible approaches to this problem, in case someone wants to try them, or at least discuss them here:
  1. Using "What's scanned" lists: We have, at least for DC and Marvel, well-kept and thorough lists of all comics published and what is scanned - I tried with Nerone's Marvel list. The first idea was to convert this list into a '.cbl' ComicRack list. Ideally, importing it would match the comics I have in my library and create 'fileless' entries for my missing comics. In practice the matching is imperfect, so I tried to reduce this effect by (i) importing the cbl list into an empty library, so a fileless entry is created for all the comics, (ii) scraping these comics with CVS, so the naming would match the one in my library, (iii) exporting the scraped list, and (iv) importing it against my library. This worked better, but there are still failures... I am attaching a test list for you to try.

    I think the idea is neat, but it does not work too well. I have two problems: (i) the matching between the list and my library is far from perfect, and many fileless comics are created for comics I do have... (ii) if I add a new comic to the library, I need to manually remove the previous fileless comic - the dupes script described above could take care of this easily via 'prefer ecomics to fileless entries'.

    As a side note, I want to stress that this is A LOT of work. First, you need to convert Nerone's list to a complete list of comics, then make sure there are no strange characters that are illegal in the cbl files, then scrape the whole thing... I began with it, but eventually gave up. If anyone thinks this is worth pursuing further, I can provide the work I did so far.


  2. Using Comicvine: Another possible approach to finding missing issues is to check a series against Comicvine. This should not be very difficult to do if your comics are scraped according to CV: you choose a comic or a series in your library, look it up in Comicvine to identify which series it belongs to, and then compare the issues in Comicvine to the ones in the library... This could probably be done locally, checking against CVS's local cache files, but it is limited by the quality and completeness of CV data.
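Once both issue lists are in hand, the comparison itself is trivial. A minimal sketch, assuming the ComicVine issue numbers and the library's issue numbers have already been fetched as lists of strings (strings, because CV allows numbers like '0.5'):

```python
# Hypothetical sketch: the missing issues are just the ComicVine numbers
# that do not appear in the library.

def missing_issues(comicvine_numbers, library_numbers):
    have = set(library_numbers)
    return [n for n in comicvine_numbers if n not in have]

cv = ['1', '2', '3', '4', '0.5']
lib = ['1', '3', '4']
print(missing_issues(cv, lib))  # -> ['2', '0.5']
```

The hard part, as the thread notes, is not this comparison but reliably matching a library series to the right Comicvine series in the first place.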

Well, I think that was more than enough for a single post. I would really like to get some discussion going, and would love to get more people involved in developing these.

Cheers :laugh:
Last Edit: 6 years 10 months ago by perezmu.
The administrator has disabled public write access.

Re: Work In Progress and some thoughts... 6 years 10 months ago #11818

  • Stonepaw
  • Stonepaw's Avatar
  • Offline
  • Moderator
  • Posts: 921
  • Thank you received: 268
  • Karma: 173
Hi perezmu, good to see you are still scripting.
perezmu wrote:
Improvement of the Infopanel Metadata script: I am trying to implement some buttons for changing the size of the font and/or cover, as cYo requested. Here I am having problems, since my script behaves differently on different computers... I guess there is such a variety of IE engines out there (7, 8, 9, x32, x64...). It is a mess. I am seriously considering trying to develop this script directly in Flash... Any thoughts?
Yes, the IE version/non-standards mess is really annoying. If you go with Flash, there is no guarantee that the user has Flash installed. It's not too much trouble for someone to install it, but it's something to consider.
- Improvement of the Coverflow script: This was much less successful than I expected, but I like the idea, and I would like to create something good here, with more info. The ideal would be a combination of the coverflow and infopanel metadata scripts... Again, I am thinking of doing this in Flash, or at least making use of HTML5, which would require an IE9 update...
If I recall correctly, IE9 will not be usable on XP. XP users would have to install something like Chrome Frame in order to use HTML5. Flash or JavaScript may be a better option.
- Dupes management: This is just a concept I am working on, not yet written a single line of code, but at least, after much thought, I guess I know what I want it to do. It is frustrating for me to spend countless hours removing dupes... so I definitely will try to do something in that respect. The idea is to (i) identify dupes, (ii) go through a series of rules in a given order provided by the user to remove those files, like 'keep all named c2c and delete all noads', 'prefer JohnDoe's scans to JaneDoe's scans', 'prefer larger files', 'prefer more covers'... Of course these will need to be based on (a) filename, (b) number of pages, and (c) file size. This will take a while, but I hope I will at least be able to automate some of the worst work.
I have no suggestions here.

Re: Work In Progress and some thoughts... 6 years 10 months ago #11826

  • 600WPMPO
  • 600WPMPO's Avatar
  • Offline
  • Moderator
  • Posts: 3788
  • Thank you received: 557
  • Karma: 233
Keep working, perezmu!.. I always like to call you the grand-daddy of all scripters, and it's good to see that you are still kicking ass!! B)
perezmu wrote:
Improvement of the Infopanel Metadata script..I am seriously considering trying to develop this script directly in Flash... Any thoughts?
Just a coincidence.. I was going to ask you to remove the IE elements from it, if possible. Flash is a great idea. If a user can install the .NET Framework to use ComicRack™, then why wouldn't they install Flash to use the InfoPanel..?
perezmu wrote:
Dupes management: This is just a concept I am working on, not yet written a single line of code, but at least, after much thought, I guess I know what I want it to do. It is frustrating for me to spend countless hours removing dupes... so I definitely will try to do something in that respect. The idea is to (i) identify dupes, (ii) go through a series of rules in a given order provided by the user to remove those files, like 'keep all named c2c and delete all noads', 'prefer JohnDoe's scans to JaneDoe's scans', 'prefer larger files', 'prefer more covers'... Of course these will need to be based on (a) filename, (b) number of pages, and (c) file size. This will take a while, but I hope I will at least be able to automate some of the worst work.
You are on the right track.. with all these options at hand, this would be a great script! Currently, finddups.pl (from malor89's perl scripts) does this best for me. However, it runs outside of ComicRack™, and thus relies heavily on correct filenames rather than metadata. This idea of yours is the most promising of all the current WIPs. :cheer:
perezmu wrote:
Missing issues Using "What's scanned" lists........ I want to stress that this is A LOT of work. First, you need to convert Nerone's list to a complete list of comics, then make sure there are no strange characters that are illegal in the cbl files, then scrape the whole thing... I began with it, but eventually gave up. If anyone thinks this is worth pursuing further, I can provide the work I did so far.
I would love to see what insanity you have cooked up, and would try it on an empty library on my other laptop ;) Still, as you say, it's a lot of work and, IMHO, largely impractical for finding missing issues. But it's a good one for just keeping all the 'what's scanned' lists in your ComicRack™ library!
perezmu wrote:
Using Comicvine
A good option would be to make fileless entries from the Comic Vine database. Maybe cbanack could add an option to the scraper for this, like "Make fileless entries for issues not present in the ComicRack™ library". I hope he reads this!

All the best...
Now Playing: The ComicRack Manual (Online)

See my new comics & gadgets on: Tumblr!
Last Edit: 6 years 10 months ago by 600WPMPO.

Re: Work In Progress and some thoughts... 6 years 10 months ago #11830

  • cbanack
  • cbanack's Avatar
  • Offline
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
600WPMPO wrote:
perezmu wrote:
Using Comicvine
A good option would be to make fileless entries from the Comic Vine database. Maybe cbanack could add an option to the scraper for this, like "Make fileless entries for issues not present in the ComicRack™ library". I hope he reads this!

I think the only really useful information you might get back from ComicVine would be the first and last issue number for a particular series--there's nothing more than the obvious numbers in between.

An ambitious script writer could try to steal some of the code from Comic Vine Scraper to get this information, but that code is all tangled up with the nightmare of caching data locally--something I intend to remove as soon as ComicVine fixes their API. Besides, there are easier ways to get that information from ComicVine; if someone is working seriously on a script like this, I would be happy to offer some advice.

On the other hand, you could make the problem a lot simpler by just ignoring the first and last issues (just assume that whatever the user has is the correct first and last issue). Then all you are looking for is the gaps in between, which is still pretty useful. You can also take advantage of one useful fact to write a gap-finder:
If your comics have been scraped by CVS, then any comics with the same SeriesName and Volume number (i.e. year) will be from the same real-life series.

You can use this fact to group the comics in your library together, which should make it pretty easy to look for gaps. Or maybe this is what the missing comic finder script already does? :dry:
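The grouping-and-gap idea above can be sketched quickly. This is only an illustration with plain dicts standing in for ComicRack book objects; the field names and the "assume the user's lowest and highest issues are the endpoints" policy follow the description in this post, not any actual script.

```python
from collections import defaultdict

def find_gaps(books):
    """Bucket books by (SeriesName, Volume) and report integer gaps
    between the lowest and highest issue the user owns."""
    groups = defaultdict(set)
    for b in books:
        try:
            num = int(b['number'])
        except ValueError:
            continue  # skip non-integer issues like '0.5'
        groups[(b['series'], b['volume'])].add(num)
    gaps = {}
    for key, nums in groups.items():
        expected = set(range(min(nums), max(nums) + 1))
        missing = sorted(expected - nums)
        if missing:
            gaps[key] = missing
    return gaps

books = [
    {'series': 'X-Men', 'volume': 1991, 'number': '1'},
    {'series': 'X-Men', 'volume': 1991, 'number': '2'},
    {'series': 'X-Men', 'volume': 1991, 'number': '5'},
]
print(find_gaps(books))  # -> {('X-Men', 1991): [3, 4]}
```

Because CVS guarantees that the same SeriesName plus Volume means the same real-life series, the grouping key needs nothing fancier than that tuple.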
Last Edit: 6 years 10 months ago by cbanack.

Re: Work In Progress and some thoughts... 6 years 10 months ago #11835

  • pescuma
  • pescuma's Avatar
  • Offline
  • Expert Boarder
  • Posts: 115
  • Thank you received: 16
  • Karma: 21
cbanack wrote:
I think the only really useful information you might get back from ComicVine would be the first and last issue number for a particular series--there's nothing more than the obvious numbers in between.

Could I add a feature request here: an option to save a file, in the same folder as the scraped comic, with the first and last issues of the series (and maybe some more metadata)? The filename could be something like "Series Name vVersion.txt". This way other scripts could use this information. In the series info panel script I compute the missing files based on the gaps, but it would be a lot better if the first and last issues were known.

I'd say a file format like:
Series Name: Xyz
First Issue: 1
Last Issue: 124
Special Issues: 0.5 10000
CVDB: http://...
maybe some more data

If I can dream a little bit more here, maybe Stonepaw could handle moving this file together with the comics in his library organizer script (a very good one, by the way). To make this better, maybe we could come up with a filename extension like .cdb or something.
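Since the format proposed above is just "key: value" lines, any script wanting to consume such a file could parse it in a few lines. A minimal sketch, assuming the suggested (hypothetical) format:

```python
# Minimal parser for the proposed series-info file format.

def parse_series_info(text):
    info = {}
    for line in text.splitlines():
        if ':' not in line:
            continue
        key, _, value = line.partition(':')  # split on the first ':' only
        info[key.strip()] = value.strip()
    # 'Special Issues' is a space-separated list of issue numbers
    if 'Special Issues' in info:
        info['Special Issues'] = info['Special Issues'].split()
    return info

sample = """Series Name: Xyz
First Issue: 1
Last Issue: 124
Special Issues: 0.5 10000"""
info = parse_series_info(sample)
print(info['Last Issue'])  # -> 124
```

Splitting on the first colon only also keeps a `CVDB: http://...` line intact, since the URL's own colons end up in the value.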


@ perezmu

About missing issues: there is code inside the series info panel that gets all selected comics and creates a struct with the series name at the first level, the volumes at the second, and the comics in that subset at the third. It also has a wrapper for the book object so you don't have to deal with the Field/ShadowField duplicity everywhere.
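The wrapper idea can be sketched like this. ComicRack books expose both edited fields (e.g. Series) and "Shadow" variants merged from filename data (e.g. ShadowSeries); the fallback policy and the stand-in book class below are assumptions for illustration, not the actual series info panel code.

```python
class BookWrapper(object):
    """Hide the Field/ShadowField duplicity behind one attribute lookup."""

    def __init__(self, book):
        self._book = book

    def __getattr__(self, name):
        # Prefer the Shadow field if it holds something useful,
        # otherwise fall back to the plain field.
        shadow = getattr(self._book, 'Shadow' + name, None)
        if shadow not in (None, ''):
            return shadow
        return getattr(self._book, name)

# Stand-in for a real ComicRack book object:
class FakeBook(object):
    Series = ''
    ShadowSeries = 'X-Men'   # parsed from the filename
    Volume = 1991
    ShadowVolume = None

b = BookWrapper(FakeBook())
print(b.Series)  # -> X-Men
print(b.Volume)  # -> 1991
```

With something like this, the rest of a script can just say `b.Series` and stop worrying about which of the two fields is populated.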
Last Edit: 6 years 10 months ago by pescuma.

Re: Work In Progress and some thoughts... 6 years 10 months ago #11836

  • perezmu
  • perezmu's Avatar
  • Offline
  • Platinum Boarder
  • Posts: 1114
  • Thank you received: 64
  • Karma: 51
Hey there...

Back to the missing issues thing... I thought of using Comicvine or the "lists" not only for special numbers (like 0.5, 1,000,000 and so on), but also for larger gaps in the original numbering... you know, this thing of changing volume, then going back, and you find that the same volume is 1-355, 500-633, for example... This is what Comicvine could add to a simple first-to-last check.

@pescuma,

Thanks for that, I will check it out. I must admit yours and cbanack's scripts take me a loooong time to understand, but I will see if I can use your structure.

In any case, I think I will work with the duplicates first :)

Cheers!

Re: Work In Progress and some thoughts... 6 years 10 months ago #11846

  • cbanack
  • cbanack's Avatar
  • Offline
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
Sorry guys, I'm not really interested in expanding the ComicVine scraper into a generic source of ComicVine data for other scripts--also, there are many people who would not be impressed if I started adding extra files to their comic book directories. :unsure:

The portion of ComicVine scraper code that loads data from ComicVine is actually pretty small, and could easily be modified to work inside someone else's script: basically, the cvconnection.py, xml2py.py, and ipypulldom.py files are all you need. You could use cvconnection.py as a starting point to construct your own queries of the comicvine database. The last two files (xml2py and ipypulldom) are just tools for parsing the xml from comicvine into a "dom" object; they can be used as-is. There's also the cvdb.py file, which shows how to interpret the "dom" object, but otherwise is probably too complicated and specific to make it worth copying directly.

Re: Work In Progress and some thoughts... 6 years 10 months ago #11850

  • perezmu
  • perezmu's Avatar
  • Offline
  • Platinum Boarder
  • Posts: 1114
  • Thank you received: 64
  • Karma: 51
I agree on the new files thingy...!
cbanack wrote:
The portion of ComicVine scraper code that loads data from ComicVine is actually pretty small, and could easily be modified to work inside someone else's script: basically, the cvconnection.py, xml2py.py, and ipypulldom.py files are all you need. You could use cvconnection.py as a starting point to construct your own queries of the comicvine database. The last two files (xml2py and ipypulldom) are just tools for parsing the xml from comicvine into a "dom" object; they can be used as-is. There's also the cvdb.py file, which shows how to interpret the "dom" object, but otherwise is probably too complicated and specific to make it worth copying directly.

Thanks for the tips. Of course this was my idea: to use code from your script (or even my old one), not to try and expand it. Again, while you keep the local cache, I could even look there, without needing to contact CV that much... :)

Thanks for your tips!

Cheers! :laugh:

Re: Work In Progress and some thoughts... 6 years 10 months ago #11851

  • 600WPMPO
  • 600WPMPO's Avatar
  • Offline
  • Moderator
  • Posts: 3788
  • Thank you received: 557
  • Karma: 233
perezmu wrote:
..Of course this was my idea, to use code from your script (or even my old one), not to try and expand it...Thanks for your tips!



And life comes full circle..! :laugh:

Re: Work In Progress and some thoughts... 6 years 10 months ago #11852

  • cbanack
  • cbanack's Avatar
  • Offline
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
perezmu wrote:
Thanks for the tips. Of course this was my idea: to use code from your script (or even my old one), not to try and expand it. Again, while you keep the local cache, I could even look there, without needing to contact CV that much... :)

If you have any questions about my code, or the ComicVine API, please feel free to ask!

Also, be careful if you're trying to use CVS's local cache:

1) it won't be in the same directory as your scripts, so you'll have to do some fancy (and potentially dangerous) tricks with the filesystem to even get at the cache files.

2) the format of those files is ... complicated. Most long-time CVS users will actually have their cache in two locations, and in two formats, because I've had to change the way caching was done several times. The current scraper code still uses the old cache files, but it is not pretty.

3) those cache files may not always be there--they are just a hack that makes up for a major shortcoming in the way that ComicVine presents its search results. If ComicVine ever fixes up their API, removing the cache will be the first thing I do!
The administrator has disabled public write access.
