Python Scripts for ComicRack

TOPIC: Problems and Solutions - put your Ideas for smartlist scripts here!

Problems and Solutions - put your Ideas for smartlist scripts here! 9 months 2 weeks ago #48465

  • Xelloss
  • Xelloss's Avatar
  • Offline
  • Platinum Boarder
  • Posts: 582
  • Thank you received: 146
  • Karma: 29
I am currently working on a script that detects scanner information based on duplicate pages... It has yet to be fully released, but the hashing part seems to work quite well right now... You can find it here:

comicrack.cyolito.com/forum/13-scripts/3...eta-scripts?start=50

If you run it once on any comic, it will scan all comics with scan information in them (I can easily modify it to do all comics if you want) and store an array of all page hashes in a custom value called page_hashes (which it uses for another thing, but you don't need to worry about that part)

Then if you use my Same X Different Y script you can find all comics with the same page_hashes, and my First X of Y script to select only the first of each duplicate pair (or group), and presto! instant duplicate comic deletion XD
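The page-hashing idea above can be sketched in plain Python (this is an illustration, not Xelloss' actual script: real ComicRack scripts use the IronPython API rather than reading archives directly, and the hash algorithm here is an assumption):

```python
# Hash every page image in a .cbz archive, then compare the hash lists of
# two comics: identical lists mean byte-identical scans.
import hashlib
import zipfile

def page_hashes(cbz_path):
    """Return an ordered list of SHA-1 hashes, one per page image."""
    hashes = []
    with zipfile.ZipFile(cbz_path) as zf:
        for name in sorted(zf.namelist()):
            if name.lower().endswith((".jpg", ".jpeg", ".png", ".gif", ".webp")):
                hashes.append(hashlib.sha1(zf.read(name)).hexdigest())
    return hashes

def same_comic(path_a, path_b):
    """Two files with identical page-hash lists are byte-identical scans."""
    return page_hashes(path_a) == page_hashes(path_b)
```

Storing the list in a custom value (as the script does with page_hashes) then lets smartlists compare comics without rehashing the files.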

Of course I can make a script that does all that automatically, but if you don't want to wait you can just do that XD

If you don't have many versions of the same comic from different scanners, and have them ComicVine scraped, you can just use the Same X Different Y script to search for files with the same series, same volume and same month of release... That would surely find all duplicates, although it will also flag different versions of the same comic as duplicates...
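The metadata-based search described above boils down to a grouping step, sketched here outside ComicRack (the dict keys mimic scraped ComicVine fields and are illustrative, not the actual smartlist syntax):

```python
# Group comics by (series, volume, year, month) and report any group with
# more than one file as a likely duplicate set.
from collections import defaultdict

def find_metadata_duplicates(comics):
    """comics: iterable of dicts with 'series', 'volume', 'year', 'month', 'path'."""
    groups = defaultdict(list)
    for c in comics:
        groups[(c["series"], c["volume"], c["year"], c["month"])].append(c["path"])
    # Only groups with 2+ files are duplicate candidates.
    return {k: v for k, v in groups.items() if len(v) > 1}
```

As the post warns, this flags different editions of the same issue too, so the result is a candidate list for review rather than a safe delete list.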

About the script that copies the CVDB from the notes to the comicvine_issue custom value, I think I have it somewhere; let me look for it and I will post it here when I do (I never posted it here, I think)
Last Edit: 9 months 2 weeks ago by Xelloss.
The administrator has disabled public write access.
The following user(s) said Thank You: beardyandy

Problems and Solutions - put your Ideas for smartlist scripts here! 9 months 2 weeks ago #48466

  • Xelloss
  • Xelloss's Avatar
  • Offline
  • Platinum Boarder
  • Posts: 582
  • Thank you received: 146
  • Karma: 29
OK, I remember now, I didn't make a script that only did that... but if you use this other script I made some time ago:

comicrack.cyolito.com/forum/13-scripts/3...-format?limitstart=0

with the latest database (it is in the post), it will repopulate not only the comicvine_issue custom value but also comicvine_volume for all selected comics... Remember this only works if you still have the notes from the scraper... (those are stored in the comic files if you store data inside the files, unlike the custom values, which are only stored in the ComicRack library)
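Pulling the CVDB id back out of the scraper's Notes field is essentially one regex. A minimal sketch, assuming the note contains a "CVDB&lt;digits&gt;" token (the exact wording varies between scraper versions, and this is not the script's actual parsing logic):

```python
# Extract the ComicVine issue id from a scraper-written Notes string.
import re

CVDB_RE = re.compile(r"CVDB(\d+)")

def extract_cvdb(notes):
    """Return the CVDB issue id as a string, or None if the note has none."""
    m = CVDB_RE.search(notes or "")
    return m.group(1) if m else None
```

The id could then be written into the comicvine_issue custom value for each selected book.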
Last Edit: 9 months 2 weeks ago by Xelloss.

Problems and Solutions - put your Ideas for smartlist scripts here! 9 months 1 week ago #48504

  • krandor
  • krandor's Avatar
  • Offline
  • Gold Boarder
  • Posts: 310
  • Thank you received: 32
  • Karma: 5
beardyandy wrote:
Krandor - I'd love to look at that script if possible please, and I will look at duplicate manager again properly. I suspect it's the answer to a lot going forward. If, as I seem to remember, it allows checking that two files are of a similar filesize (e.g. within 10% and/or 1MB of each other) then I think it's a good answer.

I'll do so, along with my notes. Basically I created my changed version because some GNs come out the same year as the comics, leading to two "volumes" with the same volume number per comicvinescraper. I worked around that with an organizer script based on number of pages. The final straw was Secret Wars, where you had a volume that ran during Secret Wars and then a new ongoing right after (like A-Force), and again comicvinescraper showed both as the same volume. So I finally decided I needed something to detect duplicates by CVDB ID and not Series/Volume.

I'm working on learning Python and have some other things I'd like to add to duplicate manager once I learn more, so I hope I can eventually fork the project into something I can release. For now I have my own private branch, but I'm happy to send you a copy and I'll explain the caveats when I do. It is a hack and nothing more, and I really want to one day be able to do more than that and make a releasable fork.
The following user(s) said Thank You: beardyandy

Problems and Solutions - put your Ideas for smartlist scripts here! 9 months 1 day ago #48602

  • beardyandy
  • beardyandy's Avatar
  • Offline
  • Senior Boarder
  • Posts: 47
  • Thank you received: 5
  • Karma: 0
Hey Krandor,

Not chasing you for that, but I've been using Xelloss' hashing script and it's excellent, although I'm a little unsure of a couple of the details of how it works.

If I'm right, it makes searching for genuine duplicates a definite match. Actually, I think it has great potential, if anyone was willing to do it... as it gives a definite fingerprint for each scan, there could potentially be a database of each scan and its relevant ComicVine information/scanner info etc. Although that may attract unwanted attention. Not sure if that's where you're going with it, Xelloss?
That's something I'd definitely give my data/time to help with (although my skills are lacking)

I don't (yet) have the Python skills to hack apart duplicate manager to include the page_hashes field, as you have done with comicvine_issue, but if you're serious about developing it further I'd definitely give it a look. Just a thought

Thank you all so much.

Problems and Solutions - put your Ideas for smartlist scripts here! 9 months 1 day ago #48603

  • beardyandy
  • beardyandy's Avatar
  • Offline
  • Senior Boarder
  • Posts: 47
  • Thank you received: 5
  • Karma: 0
Back to the subject of script suggestions.

Would there be some way to chain a workflow?

Not sure what people's workflow is but mine is largely...
Convert to cbz
Scan with comicvine scraper
Apply Data Manager rules
Update
Move with File manager

I appreciate exceptions would be the problem here, and it may not work for everyone, but is there some way of getting 80% of the way there in a single click?
Last Edit: 9 months 1 day ago by beardyandy.

Problems and Solutions - put your Ideas for smartlist scripts here! 9 months 5 hours ago #48609

  • Xelloss
  • Xelloss's Avatar
  • Offline
  • Platinum Boarder
  • Posts: 582
  • Thank you received: 146
  • Karma: 29
My system would be more or less:

- Move comics to the main comic folder
- Scan the folder to import new comics
- Scan for the scanner name in the file (scanner script)
- Autopopulate data from already organized books to new books (autopopulate script)
- Remove comics added by mistake
- Scrape books with an unrecognised CV volume (cvscraper, manually)
- Scrape books with a recognised CV volume (cvscraper, automatic)
- Decide group, subgroup, format and count (1 if a one-shot, 0 if not) for new-volume comics
- Complete automatic field rules (Data Manager script)
- Search for any kind of inconsistency with my 40+ smartlists designed to catch errors in the above procedures, and fix them
- Move files to the correct folder (Library Organizer script)
- Load the resolution and delete ads from all pages in all comics, and also recognise multiple covers (with page previews)
- Export files to new, clean cbz files
- Update data into the comics
- Add tags with my autotag scripts
- Fix comments (extract cover information) with my clean comment script
- Search for duplicate pages in new comics and delete them (custom script)
- Search for comics with low resolution, to download new HD ones if possible (custom script)
- Search for missing comics (Missing Comics script)
- Search for new events or changes to smartlists in collections
- Search for finished series, to set the Finished field to Yes and correct the count (custom script and manual checking)

I am surely forgetting many steps that some smartlist will warn me to check XD

As you can see, there are still a lot of manual criteria and things to check that are impossible to automate... So in my case a one-button solution is impossible :/

Problems and Solutions - put your Ideas for smartlist scripts here! 8 months 3 weeks ago #48649

  • Xelloss
  • Xelloss's Avatar
  • Offline
  • Platinum Boarder
  • Posts: 582
  • Thank you received: 146
  • Karma: 29
beardyandy wrote:
Hey Krandor,

Not chasing you for that, but I've been using Xelloss' hashing script and it's excellent, although I'm a little unsure of a couple of the details of how it works.

If I'm right, it makes searching for genuine duplicates a definite match. Actually, I think it has great potential, if anyone was willing to do it... as it gives a definite fingerprint for each scan, there could potentially be a database of each scan and its relevant ComicVine information/scanner info etc. Although that may attract unwanted attention. Not sure if that's where you're going with it, Xelloss?
That's something I'd definitely give my data/time to help with (although my skills are lacking)

I don't (yet) have the Python skills to hack apart duplicate manager to include the page_hashes field, as you have done with comicvine_issue, but if you're serious about developing it further I'd definitely give it a look. Just a thought

Thank you all so much.

The problem with the duplicate-hash thing is that many comics share some pages... (that is exactly what my script uses to find comic scanners: the shared pages). But it would be easy to make a script that checks, for example, whether two comics share 80% (to say something) of their scans... That would surely be a duplicate comic (you cannot assume the opposite, as the same comic can be scanned by different scanners, and those scans will have different hashes)
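The 80%-overlap idea can be sketched as a simple set comparison over the stored page hashes (the threshold and the use of the smaller comic as the denominator are illustrative choices, not what any released script does):

```python
# Flag two comics as probable duplicates when a large fraction of their
# page hashes coincide. 0.8 is the arbitrary threshold from the post.
def hash_overlap(hashes_a, hashes_b):
    """Fraction of the smaller comic's pages that also appear in the other."""
    a, b = set(hashes_a), set(hashes_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def probable_duplicate(hashes_a, hashes_b, threshold=0.8):
    # A high overlap strongly suggests the same scan; a low overlap proves
    # nothing, since different scanners produce entirely different hashes.
    return hash_overlap(hashes_a, hashes_b) >= threshold
```

Using the smaller comic as the denominator means a stripped-down version (ads removed) still matches its full-page sibling.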
Last Edit: 8 months 3 weeks ago by Xelloss.

Problems and Solutions - put your Ideas for smartlist scripts here! 8 months 3 weeks ago #48656

  • beardyandy
  • beardyandy's Avatar
  • Offline
  • Senior Boarder
  • Posts: 47
  • Thank you received: 5
  • Karma: 0
Xelloss -

I'm not sure I understand.

I'm presuming these shared pages are the ones the scanners add to 'advertise' themselves?

Just because some of these pages are shared between comics, I'm not sure it's relevant to finding duplicates... in the sense that I'd consider a duplicate to have 100% of the same pages for the purposes of automatically deleting it. Anything less than that would need looking at more carefully.
For example, if you've two copies but one page is corrupted, you'd probably need/ want to manually decide on the good one (?)

Actually that raises something - I know in ComicRack you can mark a page as deleted in the comic. I've not used it much, but I presume it's so it doesn't show those 'advertising' pages. I'm hoping/assuming it doesn't actually delete the page but just sets a flag to not display it (?). As you can now identify those pages (e.g. ones that turn up in more than one file), if you could set that flag through a script, that would be potentially helpful (but I'll admit it's probably not high on the list)

----
On the hashing...

Interestingly, I seem to have a couple of examples where the same page_hash value exists for two versions of the same scan with very different file sizes: 700MB vs 1.4GB. I had to delete them but am looking out for others. I'm not getting too excited until I see a few more, as they may have been nuked releases.

Also, I think I may have examples where two versions exist with different scanners but the same hash - but I'm not 100% sure that I haven't just scraped the incorrect scanner, so I was waiting for a definite answer. It could indicate that some people have re-released other people's scans, though.

Problems and Solutions - put your Ideas for smartlist scripts here! 8 months 3 weeks ago #48659

  • Xelloss
  • Xelloss's Avatar
  • Offline
  • Platinum Boarder
  • Posts: 582
  • Thank you received: 146
  • Karma: 29
The deleted pages in ComicRack are, as you said, flagged as deleted, not actually deleted... BUT if you export the comic to another format (as I do), with pages flagged as deleted and with the option to remove deleted pages, the new file will not have those pages...

The shared pages are not always just scanner advertising; they are usually ads for other comics, or previews shared across many comics, or a lot of other things... Scanners usually reuse pages if they are repeated in many comics... (but they only reuse their own pages, the ones they scanned before for another comic)

A duplicate can share less than 100% of its pages if the version you have was altered before... (mine don't have ads, for example), or alt covers were added, etc... So that's why I proposed 80% of the pages (to say something)

As for looking for them manually, my idea was to mark comics so you can review them easily; automatic deletion is always dangerous if you don't see what you are really doing... In any case you can make it mark the 100% matches and the >80% but <100% matches differently :)

"As you can now identify those pages (e.g. it turns on in more than 1 file), if you could set that tag through a script that would be potentially helpful (but I'll admit, probably not high on the list)"

mmmh... I will think about it... :P (it is easy to read that flag; I only wonder how many people use it)

"same page_hash value exists for two versions of the same scan with very different file sizes. 700Mb vs 1.4Gb."

That happens A LOT, especially with scanners that release many resolutions of the same comic... as the ad page is usually the same one... You can NEVER say a comic is the same based on only a few shared pages...

About other people using pages from other scanners... Well, my script is based on people NOT doing that... D:

Problems and Solutions - put your Ideas for smartlist scripts here! 7 months 1 week ago #48788

  • pweasel
  • pweasel's Avatar
  • Offline
  • Expert Boarder
  • Posts: 127
  • Thank you received: 18
  • Karma: 8
Hi there,
I have a severe case of duplicate entries of the same file across several issues.

Is there a way to trim the supernumerary entries in the database automagically, or with minimal user interaction? (I'm lazy and there are thousands of these)
CRW 0.9.178 x64 on Win10
CRA 1.80 on Nexus 10
Last Edit: 7 months 1 week ago by pweasel. Reason: wrong pic