Python Scripts for ComicRack

TOPIC: Find Similar Sized Files - Same X different Y ?

Find Similar Sized Files - Same X different Y ? 10 months 5 days ago #50638

  • beardyandy
Is there a way to find similarly sized files, e.g. within 1 MB of each other?

I'm trying to find duplicates, so I'm using Xelloss' excellent Same X different Y, but I think that will only match exact sizes.

PS: I'd use the duplicate manager percentage rules for this, but I'm also dealing with mis-scraped comics, so I don't want to automate it yet.
The administrator has disabled public write access.

Find Similar Sized Files - Same X different Y ? 10 months 4 days ago #50642

  • Xelloss
Mmh... no, my script won't work for that... let me see if I can add that function to the script, it shouldn't be hard...


Ok, I have found some problems with the logic of this when trying to apply it.

You want files that are similar in size. For example, FILEA is 13.2 MB and FILEB is 13.7 MB, so they are similar within a 1 MB margin... ok, that is easy...

Now FILEC is 14.4 MB... that is similar to FILEB, but not to FILEA... which is a problem...

My script first puts the comics into groups of "similar" ones, and then searches for differences inside each group... but that only works if the "similar" comparison is transitive (so that it makes logical partitions). Similar as in "1 MB difference in size" is not transitive... ergo, it can't be used as a "similar" rule for the grouping process...
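To make the non-transitivity concrete, here is a minimal sketch using the three sizes from the example. The `similar` helper and its `tolerance_mb` parameter are made up for illustration; they are not part of the Same X different Y script.

```python
# Sizes in MB, taken from the FILEA/FILEB/FILEC example.
sizes = {"FILEA": 13.2, "FILEB": 13.7, "FILEC": 14.4}


def similar(a_mb, b_mb, tolerance_mb=1.0):
    """Hypothetical helper: True when two sizes differ by at most tolerance_mb."""
    return abs(a_mb - b_mb) <= tolerance_mb


a_b = similar(sizes["FILEA"], sizes["FILEB"])  # A is similar to B
b_c = similar(sizes["FILEB"], sizes["FILEC"])  # B is similar to C
a_c = similar(sizes["FILEA"], sizes["FILEC"])  # ...but A is not similar to C

print(a_b, b_c, a_c)
```

A relation has to be transitive before it can split a collection into clean, non-overlapping groups, and this one is not.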

To give another example, imagine a large comic collection... if I only split the comics into separate groups when they are more than 1 MB apart from every other comic in the group, believe me, all the comics will end up in the same group, as there will almost surely be a "connection comic" somewhere in the middle of the size range...
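The chaining effect can be sketched like this. The `chain_groups` function and the collection sizes are invented for illustration (101 made-up comics spaced 0.5 MB apart); this is not how the actual script groups files.

```python
def chain_groups(sizes_mb, tolerance_mb=1.0):
    """Hypothetical grouping: a comic joins the current group when it is
    within tolerance_mb of the previous (next smaller) comic."""
    groups = []
    for size in sorted(sizes_mb):
        if groups and size - groups[-1][-1] <= tolerance_mb:
            groups[-1].append(size)  # linked through its neighbour
        else:
            groups.append([size])    # gap too large, start a new group
    return groups


# Made-up collection: sizes from 10 MB to 60 MB in 0.5 MB steps.
collection = [10 + 0.5 * i for i in range(101)]
groups = chain_groups(collection)

print(len(groups))                   # number of groups found
print(groups[0][0], groups[0][-1])   # smallest and largest size in the first group
```

With these sizes everything lands in a single group, even though the smallest and largest comics are 50 MB apart, because every comic has a neighbour less than 1 MB away.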

I could also read what you are asking as "just compare the size of each comic rounded to whole MB"... but then look at this example:

COMICA 1.51MB > 2MB (rounded to MB)
COMICB 1.49MB > 1MB (rounded to MB)

This is transitive... however...

They are almost the same size... but they would be different for the script...

The same happens if I just cut off the decimals:

COMICA 1.1MB > 1MB (cut to MB)
COMICB 0.9MB > 0MB (cut to MB)
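Both bucketing ideas can be sketched in a few lines. The function names are hypothetical, and the rounding uses floor(x + 0.5) as an assumption about what "rounded to MB" means here (Python's built-in `round` rounds halves to the nearest even number).

```python
import math


def rounded_bucket(size_mb):
    """Hypothetical: size rounded to the nearest whole MB (halves round up)."""
    return int(math.floor(size_mb + 0.5))


def truncated_bucket(size_mb):
    """Hypothetical: size with the decimals cut off."""
    return int(size_mb)


# Rounding example: 1.51 MB and 1.49 MB differ by only 0.02 MB.
round_a, round_b = rounded_bucket(1.51), rounded_bucket(1.49)

# Truncation example: 1.1 MB and 0.9 MB differ by only 0.2 MB.
cut_a, cut_b = truncated_bucket(1.1), truncated_bucket(0.9)

print(round_a, round_b)
print(cut_a, cut_b)
```

Grouping by bucket is transitive, because two comics are either in the same bucket or not, but comics that sit just either side of a bucket boundary are never compared with each other.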

So, if you explain the situation you want to solve... perhaps I can find some kind of solution and add a feature to the script that can help you with it...
Last Edit: 10 months 4 days ago by Xelloss.