General discussion about ComicRack

TOPIC: Rule 42

Rule 42 3 years 8 months ago #38674

  • fieldhouse
  • Expert Boarder
  • Posts: 89
  • Thank you received: 10
  • Karma: 1
I see a lot of discussion around modifying files, converting archive formats, embedding extra data in archives, etc. What I haven't seen much of (if any) is a discussion of why modifying the original files could be detrimental.

Here's a direct copy of a post from elsewhere that talks about why modifying archives and their contents isn't a good idea. This particular example is focused on one specific file transfer technology, but the story is similar for virtually any other mechanism. Even cloud storage and deduplication are negatively affected, since they look for identical files or file blocks to save space. Note that the article specifically calls out ComicRack as a "problem child". That bothers me, since I see ComicRack as an immensely powerful and extremely useful tool for managing what would otherwise be a jumbled chaos of files. I would much rather see ComicRack be part of the solution instead of being called out as part of the problem.

Just some food for thought and a (lonely) voice in the wilderness that hopefully provides a counterpoint to the general recommendation that the best way to maintain your library is to convert and embed.

=========================

*** Server command: !42
*** Life, the Universe and Everything

=======================================================================
A tale of TTH and repacks
=======================================================================


Basics of DirectConnect
DirectConnect (DC) is a peer-to-peer file sharing system.
You use a program (client) to share your files and download other
people's files.

To do so, the client connects to a central server (hub). The hub
provides a list of other users connected to the hub. You can then
either browse the filelist of a specific user from that list or
do a search for a file among the shares of all users.


TTH (Tiger Tree Hash)
When downloading, you may notice that you are (usually) grabbing
(parts of) the same files from multiple users (this is segmented
downloading). And you may also see that the search results will
be grouped and ordered by the number of users sharing that exact
same file. Both mechanisms work by virtue of TTH.

The TTH (Tiger Tree Hash) is a 39-character string of text
that serves as a digital fingerprint of the binary contents of
a file. If two files have the same TTH, they are, for all
practical purposes, byte for byte identical. If two files have
a different TTH, there's a binary difference somewhere, even
though they may appear the same.
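
To make the fingerprint idea concrete, here is a minimal sketch of a tree hash in Python. It is an illustration only, not what any DC client actually runs: Python's standard library has no Tiger implementation, so SHA-256 truncated to 24 bytes stands in for it (an assumption, which means the output is NOT a real TTH), and the names `tree_hash` and `_h` are made up for this example. The 1024-byte leaves, the 0x00/0x01 prefixes and the base32 encoding follow the usual description of the scheme.

```python
import base64
import hashlib

LEAF_SIZE = 1024  # the file is hashed in 1024-byte leaves


def _h(data: bytes) -> bytes:
    # Stand-in for Tiger (assumption): SHA-256 truncated to 24 bytes (192 bits).
    return hashlib.sha256(data).digest()[:24]


def tree_hash(data: bytes) -> str:
    """Return a 39-character base32 fingerprint of the given bytes."""
    # Split into leaves; an empty input still has one (empty) leaf.
    leaves = [data[i:i + LEAF_SIZE] for i in range(0, len(data), LEAF_SIZE)] or [b""]
    # Leaf hashes are prefixed with 0x00.
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # Internal nodes hash the pair of children, prefixed with 0x01.
                nxt.append(_h(b"\x01" + level[i] + level[i + 1]))
            else:
                # An odd node at the end is promoted unchanged.
                nxt.append(level[i])
        level = nxt
    # 24 bytes encode to 39 base32 characters once the padding is dropped.
    return base64.b32encode(level[0]).decode("ascii").rstrip("=")
```

Because every leaf feeds into the root, changing a single byte anywhere in the file changes the final string, while the same bytes always give the same string regardless of what the file is called.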

When downloading a file, your client will ask the other users if
they have a file with the same TTH and if so, will start grabbing
segments of the file from those users too. This can speed up the
download tremendously and will tax the user you originally
started downloading from less. This process is sometimes referred
to as "using alternate sources".


Repacks and the problems they pose
Say you rename the file. No problem, the filename has nothing to
do with the contents of the file. The TTH remains the same and
others will still reap the benefit of using alternate sources
and higher speeds.

Say you alter the contents of the file. For instance, you change
the order of the scans in the archive, add, remove, rename or
change pages in the archive or use a program that stores extra
info about the file in the file itself, like ComicRack can do.
This will change the TTH of the file. Your file is no longer the
same as the other files out there. It will probably be unique.
We call this a "repack".
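
The difference between a rename and a repack can be checked with a short Python sketch. It makes a few assumptions: SHA-256 stands in for TTH (any content hash behaves the same way for this purpose), the archive is a throwaway zip with fake page data, and `fingerprint` and `demo` are hypothetical helper names, not part of any client or of ComicRack.

```python
import hashlib
import os
import tempfile
import zipfile


def fingerprint(path):
    """Hash the raw bytes of a file; the filename is never part of the input."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def demo():
    """Return fingerprints (original, after rename, after adding metadata)."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "issue-001.cbz")
        with zipfile.ZipFile(original, "w") as zf:
            zf.writestr("page01.jpg", b"fake image bytes")
        before = fingerprint(original)

        # Rename only: the bytes on disk are untouched.
        renamed = os.path.join(tmp, "Issue 001 (2010).cbz")
        os.rename(original, renamed)
        after_rename = fingerprint(renamed)

        # Repack: embed an extra metadata entry inside the archive.
        with zipfile.ZipFile(renamed, "a") as zf:
            zf.writestr("ComicInfo.xml", "<ComicInfo/>")
        after_repack = fingerprint(renamed)
    return before, after_rename, after_repack
```

Calling `demo()` should give the same fingerprint before and after the rename, and a different one once the extra entry has been written into the archive.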

Your file will end up at the bottom of search results, since
you will be the only one sharing your unique file. That means
probably no-one will download from you. Which means you are
effectively not sharing. And DirectConnect is all about sharing.

If people do download your file, there will be no alternate
sources, since the TTH is unique. They will have to download
the entire file from you alone, which will tax your bandwidth.

This is why repacks are detrimental (bad) for DirectConnect and
why OPs (operators, hub police) will kick you for having them.


In summary
Renaming a file = no problem
Altering a file in any other way = repack = bad for DirectConnect
The administrator has disabled public write access.

Rule 42 3 years 8 months ago #38675

  • cYo
  • Moderator
  • Posts: 3476
  • Thank you received: 676
  • Karma: 182
That's the reason why writing info to comic files is turned off by default in CRW if anybody is wondering.

Rule 42 3 years 8 months ago #38682

  • 600WPMPO
  • Moderator
  • Posts: 3788
  • Thank you received: 557
  • Karma: 233
cYo wrote:
That's the reason why writing info to comic files is turned off by default in CRW if anybody is wondering.
I've always felt that the user should be allowed (and informed about) choosing or rejecting this option in the installation wizard itself.
Now Playing: The ComicRack Manual (Online)

See my new comics & gadgets on: Tumblr!

Rule 42 3 years 8 months ago #38683

  • cbanack
  • Platinum Boarder
  • Posts: 1328
  • Thank you received: 508
  • Karma: 182
600WPMPO wrote:
cYo wrote:
That's the reason why writing info to comic files is turned off by default in CRW if anybody is wondering.
I've always felt that the user should be allowed (and informed about) choosing or rejecting this option in the installation wizard itself.

Yeah, I feel that way too. I'm not normally a fan of pestering the user with popup dialogs, but in this case it's too hard to make the right default choice for most users. There are valid reasons for not wanting to modify your cbz files (mostly related to preserving hashes of your files), but there are also valid reasons for wanting to embed metadata in your files (moving your cbz files to a new computer, recovering from database corruption, sharing with other programs that understand ComicRack's metadata format). You can get burned either way. There really isn't a good default behaviour; users should be forced to choose.
The following user(s) said Thank You: 600WPMPO

Rule 42 3 years 8 months ago #38684

  • kenjio
  • Platinum Boarder
  • Posts: 597
  • Thank you received: 127
  • Karma: 32
This has long been a problem, and has caused no end of arguments with the people "in charge" of that-place-we-know. I'm not going to get into the discussion of how these people behave, but I do see their point.
It's the same with other download methods. I'm a bit OCD with my files. I hate .nfo files and the like, just as much as I hate scanner images.
That's why what is downloaded remains available in the download folder for a certain amount of time, and I make a copy that I can alter to my liking.
Not the best way, but as long as digital comics are simple zip or rar archives full of images, I don't see how this can be solved.
I'm baaaaaaaaaaaaaaack!!

Rule 42 3 years 8 months ago #38690

  • fieldhouse
  • Expert Boarder
  • Posts: 89
  • Thank you received: 10
  • Karma: 1
kenjio wrote:
That's why what is downloaded remains available in the download folder for a certain amount of time, and I make a copy that I can alter to my liking.
Not the best way, but as long as digital comics are simple zip or rar archives full of images, I don't see how this can be solved.
Hey, sharing is caring. So at least you cared for a while :)


There really doesn't seem to be an easy way to both index the files and keep them recognizably the same for comparison purposes. Extended Attributes and NTFS Alternate Data Streams aren't portable. ComicBookInfo JSON stored in the zip file's comment doesn't work with many readers, including ComicRack and CDisplay. Storing ComicInfo.xml as a separate file doesn't seem supportable...

The crazy thing is, zip, rar, etc. are just being used as collection containers, since the images are already compressed, so you may as well just create a new container format that has a hash of the data independent of the metadata section. Something similar to how .torrent files are structured: bencoded chunks of info with a SHA hash that still allow the modification of trackers, comments and some flags.
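
A rough Python sketch of what "a hash of the data independent of the metadata section" could look like with today's zip-based files. Everything here is an assumption for illustration: the choice of SHA-256, the `METADATA_NAMES` set, and the helper names `payload_hash` and `make_cbz` are hypothetical, and no existing reader or DC client computes this value.

```python
import hashlib
import io
import zipfile

# Assumption: entries with these names (case-insensitive) count as metadata.
METADATA_NAMES = {"comicinfo.xml"}


def payload_hash(archive: bytes) -> str:
    """Hash only the non-metadata entries of a zip archive held in memory."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        # Sort names so the result does not depend on storage order.
        for name in sorted(zf.namelist()):
            if name.lower() in METADATA_NAMES:
                continue
            data = zf.read(name)  # decompressed entry contents
            # Feed name and length as well, so entry boundaries are unambiguous.
            digest.update(name.encode("utf-8") + b"\x00")
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


def make_cbz(entries: dict) -> bytes:
    """Build a small in-memory zip from a {name: bytes} mapping (test helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

With this, a plain archive and the same archive with an embedded ComicInfo.xml give the same payload hash even though the files themselves differ byte for byte. It only helps if the tools doing the comparison agree to use such a hash, which is exactly the part that is missing today.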

Wish there was a good solution.

Rule 42 3 years 8 months ago #38691

  • cYo
  • Moderator
  • Posts: 3476
  • Thank you received: 676
  • Karma: 182
fieldhouse wrote:
The crazy thing is, zip, rar, etc. are just being used as collection containers, since the images are already compressed, so you may as well just create a new container format that has a hash of the data independent of the metadata section.

The world would already be a better place if we could get rid of all the solid archive craziness.