Python Scripts for ComicRack

TOPIC: Problems and Solutions - put your Ideas for smartlist scripts here!

Problems and Solutions - put your Ideas for smartlist scripts here! 4 weeks 2 days ago #48299

  • Xelloss
  • Platinum Boarder
  • Posts: 455
  • Thank you received: 117
  • Karma: 24
As you already know if you read my topics, I am a fan of custom smartlist scripts and the things you can do with them... Any time I have a problem building a list the way I want and can't find a way to do it with the current tools, I try making a tool for it (and for other situations too, trying not to make the tools too specific).

So, the idea of this topic is for everyone to talk about problems you find making smartlists, and if there is no solution with the current options and scripts, I'll make a script that helps with that :)

Have you tried making a smartlist and couldn't find a way to do it? Post it here! (even if you think it is impossible!)
Last Edit: 4 weeks 2 days ago by Xelloss.
The administrator has disabled public write access.

Problems and Solutions - put your Ideas for smartlist scripts here! 2 weeks 5 days ago #48436

  • beardyandy
  • Fresh Boarder
  • Posts: 15
  • Thank you received: 1
  • Karma: 0
Hi Xelloss,

Thanks for the offer ;) Very timely but zero problems if you can't help as this is quite specific to my current situation.

1)
I'm just having to rescan my entire (large) library after some hard disk crashes and rescues, and I've ended up with some horrible duplication. Sometimes the file/size will be exactly the same and I can detect this with a tool like DoubleKiller (it detects duplicates through checksums). However this isn't perfect, as sometimes the file will be subtly different after conversion or numerous scans with library organiser.

Now, assuming I've scanned and filed them, I can detect duplicates by checking whether the filename has (1), (2) etc. in it (I'm talking about Windows' naming of duplicates in this case). But that won't list the other duplicate in the set, the one that doesn't include the (1).
Or I can detect them through the duplicates script by checking whether the CVDB ID is the same, but that will include all scans, even if they are significantly different.

But is there some way you can think of to filter when the filesize is within a certain range of another file (and therefore probably from the same scan originally)? E.g. the filename is the same and the filesizes are within 10 (or 100, etc.) KB of each other. That would let me remove all the duplicates that are very similar; then I can deal with the other cases where the duplicate exists because the scan is significantly different and I need to make a conscious choice about which one I'd like to keep.

I suspect I really just need to sit down with a coffee and look properly at the duplicate script that already exists (?)
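That filename-plus-size filter can be sketched in plain Python (just to illustrate the idea, not a ready ComicRack script; the 100 KB tolerance is an arbitrary default):

```python
import re
from collections import defaultdict

def find_near_duplicates(files, tolerance=100 * 1024):
    """Group files whose base name matches (ignoring a Windows-style
    ' (1)', ' (2)' suffix) and whose sizes all fall within `tolerance`
    bytes of each other. `files` is a list of (filename, size) tuples."""
    groups = defaultdict(list)
    for name, size in files:
        # strip a trailing " (n)" that Windows adds to duplicated names
        base = re.sub(r' \(\d+\)(?=\.[^.]+$)', '', name)
        groups[base].append((name, size))

    near_dupes = []
    for base, entries in groups.items():
        if len(entries) < 2:
            continue
        entries.sort(key=lambda e: e[1])
        # keep the group only if the largest and smallest are close enough
        if entries[-1][1] - entries[0][1] <= tolerance:
            near_dupes.append([name for name, _ in entries])
    return near_dupes
```

Anything the function returns would be a set of "probably the same scan" files where you only need to keep one.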

2)
Secondly, can you think of a more general way of putting ranges into smartlists that wouldn't normally have them, e.g. Series starts with characters A-M/a-m? It may just be a regular expression, but I'm still trying to understand those.

3)
An unrelated one: have you come across any tools for checking the integrity of cbz files, to find any that have unreadable images so I can delete them? They appear in the GUI when using the cbz export function, but that isn't ideal when dealing with a lot of them. I'll try the Perl script that's in the list again, but I remember that didn't work for me before. Worst case I'll write one for 7-Zip to test them all, but I'm not confident I'll catch them all.

Problems and Solutions - put your Ideas for smartlist scripts here! 2 weeks 5 days ago #48438

  • Xelloss
  • Platinum Boarder
  • Posts: 455
  • Thank you received: 117
  • Karma: 24
About the first problem (let's analyse that one first):

What you are trying to identify are duplicate comics... and by that you mean:

1- The same comic book?
2- The same comic book from the same scanner?

If it is the first, and you have your comics scraped with ComicVine, it is as easy as using my SameXdifferentY script to find comics with the same comicvine_issue custom value (if you lost the custom values, you can copy them back from your Notes with a script I have).

If it is the second, that is a more difficult problem because, as you said, the comic files can be a bit different due to how ComicRack stores data. However, I have been working on an already usable script that finds duplicate pages in different books. With a bit of tinkering, I could make it find, for example, comics that have more than 10 pages in common, and so are surely duplicates :) (this method uses image hashing, so as long as the images are exactly the same (the images themselves, not the comic file) it will find duplicates quickly)

Tell me which case it is, and I will help you with either one :P
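For the curious, the exact-hash idea can be sketched in plain Python (just an illustration of the approach, not the actual script; hashing the raw page bytes only matches pages that are byte-identical):

```python
import hashlib
import zipfile

IMAGE_EXTS = ('.jpg', '.jpeg', '.png', '.gif')

def page_hashes(cbz_path):
    """Return the set of MD5 digests of every image page in a cbz archive."""
    hashes = set()
    with zipfile.ZipFile(cbz_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(IMAGE_EXTS):
                hashes.add(hashlib.md5(zf.read(name)).hexdigest())
    return hashes

def likely_duplicates(path_a, path_b, min_common=10):
    """Treat two books as likely duplicates when at least `min_common`
    pages hash to identical values."""
    return len(page_hashes(path_a) & page_hashes(path_b)) >= min_common
```

This only handles zip-based cbz files; a perceptual hash (rather than MD5) would be needed to also catch recompressed or resized pages.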
Last Edit: 2 weeks 5 days ago by Xelloss.

Problems and Solutions - put your Ideas for smartlist scripts here! 2 weeks 5 days ago #48439

  • jkthemac
  • Platinum Boarder
  • Posts: 766
  • Thank you received: 253
  • Karma: 55
A quick answer to 2 is indeed a regex smartlist:
Name "A-M"
Match [Series] regex "^[A-Ma-m]"

(Open a new smartlist select query and paste this in over whatever is there.)

I am not convinced there is a practical reason for such a script.
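For anyone still getting comfortable with regex, the pattern in that query can be tried out in plain Python (just an illustration; the smartlist rule presumably applies the same kind of match to the Series field):

```python
import re

# Series whose first character is A through M, upper- or lower-case,
# exactly as in the smartlist rule above.
pattern = re.compile(r'^[A-Ma-m]')

series = ["Astonishing X-Men", "Batman", "mouse guard", "Nova", "Zatanna"]
in_range = [s for s in series if pattern.match(s)]
# → ['Astonishing X-Men', 'Batman', 'mouse guard']
```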
Last Edit: 2 weeks 5 days ago by jkthemac.
The following user(s) said Thank You: Xelloss

Problems and Solutions - put your Ideas for smartlist scripts here! 2 weeks 5 days ago #48440

  • Xelloss
  • Platinum Boarder
  • Posts: 455
  • Thank you received: 117
  • Karma: 24
About 3: as comic files are compressed archives (rar, tar, zip, 7zip, etc.), you can use any archive integrity tool for that... outside ComicRack...

I could make a script that opens file by file, page by page, and looks for errors... but it would not be optimized at all... I am sure there are good programs for this that are a much better solution.
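For the zip-based cbz files specifically, Python's standard zipfile module can run the CRC test without any external program; a minimal sketch (cbr/rar files would need different tooling):

```python
import zipfile

def check_cbz(path):
    """Return None if every entry's CRC checks out, the name of the
    first bad entry otherwise, or 'not a zip' if the file won't open."""
    try:
        with zipfile.ZipFile(path) as zf:
            return zf.testzip()  # None means all CRCs matched
    except zipfile.BadZipFile:
        return 'not a zip'
```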

Problems and Solutions - put your Ideas for smartlist scripts here! 2 weeks 4 days ago #48449

  • jkthemac
  • Platinum Boarder
  • Posts: 766
  • Thank you received: 253
  • Karma: 55
Xelloss wrote:
About 3: as comic files are compressed archives (rar, tar, zip, 7zip, etc.), you can use any archive integrity tool for that... outside ComicRack...

I could make a script that opens file by file, page by page, and looks for errors... but it would not be optimized at all... I am sure there are good programs for this that are a much better solution.

Indeed, I use WinRAR for this; it can test a whole directory of archives at once for integrity, and always seems very quick.
Last Edit: 2 weeks 4 days ago by jkthemac.
The following user(s) said Thank You: Xelloss

Problems and Solutions - put your Ideas for smartlist scripts here! 2 weeks 4 days ago #48451

  • krandor
  • Gold Boarder
  • Posts: 204
  • Thank you received: 21
  • Karma: 4
For the duplicate issue, the Duplicate Manager script lets you tell it to watch things like files within certain filesizes when deciding which one to keep. My main issue with the script is that it doesn't use CVDB values, so I hacked together a very messy version that does the duplicate identification by CVDB value instead of series, volume, etc. Those fields caused me issues with graphic novels, and with Marvel possibly having 2-3 #1s in a single year and no easy way to tell them apart.

If somebody wants a copy of my messy hacked DM script to clean it up a bit, they are welcome to it.
The following user(s) said Thank You: Xelloss

Problems and Solutions - put your Ideas for smartlist scripts here! 2 weeks 4 days ago #48452

  • jkthemac
  • Platinum Boarder
  • Posts: 766
  • Thank you received: 253
  • Karma: 55
The 'same X different Y' script by Xelloss can also be used to find duplicates with 'comicvine_issue'

One could theoretically use this with another rule to limit the results to specific file size ranges, or you could just group by 'comicvine_issue' and make decisions on each one.
Last Edit: 2 weeks 4 days ago by jkthemac.

Problems and Solutions - put your Ideas for smartlist scripts here! 2 weeks 4 days ago #48462

  • Xelloss
  • Platinum Boarder
  • Posts: 455
  • Thank you received: 117
  • Karma: 24
jkthemac wrote:
The 'same X different Y' script by Xelloss can also be used to find duplicates with 'comicvine_issue'

One could theoretically use this with another rule to limit the results to specific file size ranges, or you could just group by 'comicvine_issue' and make decisions on each one.

I also made a script that scrapes the comicvine_issue back out of the Notes if the custom values were deleted in a reimport... If anybody is interested, I could post it here in the forum :)
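The recovery itself is presumably just a regex over the Notes field; a hypothetical sketch (the exact marker the scraper leaves in Notes is an assumption here, so adjust the pattern to whatever your Notes actually contain):

```python
import re

# Assumed note format: the scraper leaves something like
# "... ComicVine [Issue #123456] ..." in the Notes field.
ISSUE_RE = re.compile(r'Issue #(\d+)')

def recover_issue_id(notes):
    """Pull a ComicVine issue id back out of a Notes string, or None."""
    m = ISSUE_RE.search(notes or '')
    return m.group(1) if m else None
```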

Problems and Solutions - put your Ideas for smartlist scripts here! 2 weeks 4 days ago #48464

  • beardyandy
  • Fresh Boarder
  • Posts: 15
  • Thank you received: 1
  • Karma: 0
Arse! Just wrote a long reply and it crashed. It was probably too wordy anyway.

Thanks for all the suggestions. I think it's probably best if I go look at some of these other scripts as they sound hopeful. More generally I need to catch up with a lot of the recent efforts you have all put into the forum. Thanks in advance for the undoubtedly useful things I'm about to discover.

Xelloss
You're right, I'm currently concerned with "2- Same comic book from the same scanner". Though in my examples I won't necessarily have captured the scanner for some of the first comics I scanned, so I can't rely on that field. Hence why I was trying to judge from filesize.

That said, I can genuinely see the use (outside of my specific current situation) for anything that does actual hashing of the jpgs, if you'd really be willing to tinker with that. Going forward it would help detect reposts of comics, for example. I like hashing; it would give near-100% confidence that it's the same, rather than having to use some judgement. Even better, perhaps, if it could hash the pages (5-10 feels right to me) but also double-check that the total number of pages was the same.
I'm not sure if you've ever come across DoubleKiller, but it's something I use extensively. Hashing gives me the confidence to just press delete, rather than second-guess the answer.
What it then allows you to do is easily choose just one file from each set of duplicates to delete, just to help ensure you don't accidentally delete both. Helpful when working with a lot of files. It would be the answer here if the files weren't subtly changed, of course.

What's the output of that? Could it be directed to a text file? After this first pass of the ComicVine scraper I'm going to have to catch quite a few examples where it's detected volume 2 of a series as volume 1 (and therefore a duplicate, but an incorrect one). It would seem that the hashing method would be excellent for detecting those problems.

I also need to look a little more myself at what information is preserved in the cbz files; I didn't know, for example, that Notes was still there. You mentioned you had a script for recovering the CVDB ID from that back into the custom field? Could I have a look, please?

jkthmac- thanks for the regex I understand the regex better now I've seen the answer ;) Just reading through your excellend guide on regex so thanks for both.
As an aside It's actually very helpful for me at the moment to both:
1) Deal with more manageable subsets of the extensive library to work on whilst I rescape it.
2) Help spilt some of the library over disks until new ones arrive. I can just tag quickly, rather than have to put the whole lot through datamanager.
That said - I agree, I suspect it will be of use to others

Krandor - I'd love to look at that script if possible, please, and will look at Duplicate Manager again properly. I suspect it's the answer to a lot going forward. If, as I seem to remember, it allows checking that two files are of a similar filesize (e.g. within 10% and/or 1 MB of each other), then I think it's a good answer.

It's that problem of the relationship between two files I've been trying to get my head around...
Not: check that the series is the same and the filesize is 10-11 MB.
But instead: check that the series is the same and the filesizes are within 1 MB of each other.

3.
On the checking of archives - totally agree that WinRAR/7-Zip would be a good answer and simple to script. I just wasn't aware whether ComicRack was doing something clever to check that the image files were actually valid jpg files, as I (perhaps wrongly?) presume 7-Zip is just checking that the file can be extracted, i.e. not that the jpg is itself readable.
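Right, an archive test only proves the bytes extract with the correct CRC, not that the image decodes. A cheap middle ground (a sketch using only the standard library; a full decode with an imaging library would be the thorough check) is to look for the JPEG start/end markers:

```python
import zipfile

def shallow_jpeg_check(data):
    """Cheap structural check: a valid JPEG starts with an SOI marker
    (FF D8) and ends with an EOI marker (FF D9). This catches truncated
    files but not all corruption."""
    return data[:2] == b'\xff\xd8' and data[-2:] == b'\xff\xd9'

def bad_pages(cbz_path):
    """List the jpg entries in a cbz whose bytes fail the shallow check."""
    bad = []
    with zipfile.ZipFile(cbz_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(('.jpg', '.jpeg')):
                if not shallow_jpeg_check(zf.read(name)):
                    bad.append(name)
    return bad
```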

Really pleased to see the forum is more active than I previously thought. It's given me the incentive to look much more thoroughly at what I've missed, and I hope to be able to help in return sometime. Thanks again, and apologies for the length and any lack of clarity.