Python Scripts for ComicRack

TOPIC: ComicBookXML

ComicBookXML 9 years 7 months ago #937

  • unteins
Hi everyone,

cYo wrote back in October:

comicrack.cyolito.com/index.php?option=c...=view&id=7&Itemid=21

The last bit, about a comic book archive format that is easier to use than CBZ and CBR, has been on my mind for a while. I decided to go ahead and start hacking something together in Python (my first Python program) to convert a CBZ file into an XML format. I am basing the XML metadata format on the ComicInfo.xml format, so it should be easy to convert back to CBZ and generate the ComicInfo.xml for import. Ideally, if the format stabilizes, it would be great if ComicRack could just read it natively.

Here are some Pros and Cons of this format and I am interested in feedback. I want to get the format as right as possible, then promote it to the people who actually package comics in digital form.

Pros:

XML is not proprietary, so anyone can read and write the format.
XML is well known, and tools for working with it exist in many languages.
XML makes it easy to define new file formats.
XML can be edited without recompressing an archive, which is a lot faster than decompressing a CBZ, editing the ComicInfo.xml file inside, and then recompressing the CBZ.
The file can hold a scaled-down first-page image of any size.
Page order doesn't have to be based on file names.
Image data can be written in any order, or easily appended.

Cons:

XML doesn't support raw binary data, so base64 encoding must be used.
Because of the base64 encoding, a ComicBookXML file is about 35% larger than the CBZ file.
Decoding base64 is an extra step when working with the file.
Parsing XML incurs some overhead.
The format is monolithic: you can't edit a single image in place, you have to extract it first.

There may be other considerations as well, but that's my list so far. The XML file could be compressed, which would bring it back to roughly the CBZ size, but then you're right back to the same problem of having to decompress it to edit the metadata. Using an ODF-style format might be an idea, but it has pretty much the same pros and cons. I didn't see any new techniques for dealing with the binary data.
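To put a number on the size penalty listed in the cons, here is a small sketch that measures base64 growth with Python's standard library. The function name `base64_overhead` and the synthetic sample data are just for illustration. Unwrapped base64 turns every 3 bytes into 4 characters (about a third larger), and wrapping the output at 76 characters per line adds one newline per line on top of that.

```python
import base64


def base64_overhead(raw):
    """Return the fractional size growth for unwrapped and line-wrapped base64."""
    unwrapped = base64.b64encode(raw)    # one long run of characters, no line breaks
    wrapped = base64.encodebytes(raw)    # newline after every 76 output characters
    return len(unwrapped) / len(raw) - 1, len(wrapped) / len(raw) - 1


if __name__ == "__main__":
    # Synthetic stand-in for one page image (about 300 KB); any bytes will do,
    # because base64 growth depends only on the input length.
    sample = bytes(range(256)) * 1200
    plain, wrapped = base64_overhead(sample)
    print(f"unwrapped: +{plain:.1%}, wrapped: +{wrapped:.1%}")
```

The wrapped figure is the one that lines up with the "about 35%" estimate, since that is the layout you get when the encoded data is broken into 76-character lines.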

Anyway, I wanted to get this out into discussion. With some luck I will have a two way converter done this weekend, at least a passably functional one.
The administrator has disabled public write access.

Re:ComicBookXML 9 years 7 months ago #938

  • wadegiles
An example of what you are proposing would be helpful. I'm not sure I understand how the file format you've outlined above will replace CBZ files.

If you want to talk about storing binary images in an editable text file format, one choice would be PostScript. Images can be encoded and the resulting data inserted with the proper commands. Of course, this would require a PostScript interpreter to be integrated into ComicRack. The files would be rather large as well.

Re:ComicBookXML 9 years 7 months ago #941

  • unteins
Ok, so, here would be a quick example of what I have so far:
<?xml version="1.0"?>
<ComicBookXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<version>0.1</version>
<file>test.cbz</file>
<imagefile>RacerX-Countdown32-p01.jpg</imagefile>
<imagedata>
/9j/4AAQSkZJRgABAgEBLAEsAAD/4SebRXhpZgAATU0AKgAAAAgABwESAAMAAAABAAEAAAEaAAUA
AAABAAAAYgEbAAUAAAABAAAAagEoAAMAAAABAAIAAAExAAIAAAAcAAAAcgEyAAIAAAAUAAAAjodp

...

msB1ih46WW1swv7BZ07UZA8BqRpxfvXLWPJYY0wgNwtiea9BNSbKBs4ieerKLAcQ/I2HBeVrE7AN
pruYVyox4u0aE84vLxDzf5eGTLyW8eFRKNpW56TV6H5C/s4mG2xHt8AoCv/Z

</imagedata>
</ComicBookXML>

Obviously the middle portion is truncated. Anyway, that's a base64-encoded image in the middle of XML. You can see from the repeated characters that there is some room for compressing the string, but it would need to be more of a string-packing algorithm and not gzip or bzip or anything like that, because XML doesn't like binary.
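For anyone who wants to experiment with the layout in the example, here is a rough sketch of a round trip using only the Python standard library. It is not the actual converter script. The element names (`ComicBookXML`, `version`, `file`, `imagefile`, `imagedata`) come from the example; the function names, the flat alternating imagefile/imagedata layout, the sort-by-name page order, and the omission of the namespace attributes are all assumptions.

```python
import base64
import io
import zipfile
import xml.etree.ElementTree as ET

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def cbz_to_cbx(cbz_bytes, source_name):
    """Build a ComicBookXML document (as bytes) from the bytes of a CBZ archive."""
    root = ET.Element("ComicBookXML")
    ET.SubElement(root, "version").text = "0.1"
    ET.SubElement(root, "file").text = source_name
    with zipfile.ZipFile(io.BytesIO(cbz_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(IMAGE_EXTENSIONS):
                continue  # skip ComicInfo.xml and any other non-image entries
            ET.SubElement(root, "imagefile").text = name
            encoded = base64.encodebytes(archive.read(name)).decode("ascii")
            ET.SubElement(root, "imagedata").text = "\n" + encoded
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def cbx_to_cbz(cbx_bytes):
    """Rebuild a CBZ archive (as bytes) from a ComicBookXML document."""
    root = ET.fromstring(cbx_bytes)
    names = [element.text for element in root.findall("imagefile")]
    # b64decode ignores the line breaks inside the element text by default.
    blobs = [base64.b64decode(element.text) for element in root.findall("imagedata")]
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as archive:
        for name, blob in zip(names, blobs):
            archive.writestr(name, blob)
    return out.getvalue()
```

Pairing each `imagefile` with the `imagedata` that follows it works for a flat layout like the one above, but nesting both inside a per-page element would probably be more robust if the format grows.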

I looked briefly at PostScript, but it has the same issues. The image data appears to be encoded into a string. It doesn't look like base64, but it may very well be; I didn't dig too deeply. Since it uses a text encoding, you have file size problems of similar magnitude with PostScript.

I think XML is preferable because the data is self-describing, so you can create a format that tells the interpreter what to do with the data. Plus, any application that speaks XML could be used to manipulate the data.

With PostScript, you need a PostScript interpreter, which may or may not be part of your programming toolkit. XML seems to be much more widely used, and tools for working with it are common in many languages. Base64 encoding is also a pretty well-known web standard, so that should be easy as well.

I am hoping to get some more progress made on this over the weekend and have something more together in a few days. If the format seems acceptable, hopefully it could make its way into ComicRack. I will also be lobbying ComicBookLover on the Mac to support it if the kinks get worked out.

Re:ComicBookXML 9 years 7 months ago #943

  • wadegiles
Now I get the idea you are working on implementing. Since formatting of text and vector graphics is not relevant to an eComic format, PostScript would be overkill. Image data in PostScript is text-encoded as well (typically ASCIIHex or ASCII85 rather than Base64).

I like this concept a lot. Have you also looked into XSD (XML Schema Definition) for your implementation? A schema would make it possible to verify that a particular ComicBookXML file is valid, and would let others extend the definition of a valid ComicBookXML file.
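Python's standard library has no XSD validator, so as a stopgap until a real schema exists, a hand-written structural check can catch the most obvious problems. The sketch below is not XSD validation, only a stand-in for it. The function name `check_cbx` and the rules it enforces (root element name, a `version` element, matching `imagefile`/`imagedata` counts, decodable base64) are assumptions based on the example posted earlier in this thread.

```python
import base64
import xml.etree.ElementTree as ET


def check_cbx(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != "ComicBookXML":
        problems.append(f"unexpected root element <{root.tag}>")
    if root.find("version") is None:
        problems.append("missing <version>")

    names = root.findall("imagefile")
    blobs = root.findall("imagedata")
    if len(names) != len(blobs):
        problems.append("imagefile/imagedata counts differ")

    for name, blob in zip(names, blobs):
        payload = "".join((blob.text or "").split())  # drop line wrapping
        try:
            base64.b64decode(payload, validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            problems.append(f"bad base64 for {name.text}")
    return problems
```

A real XSD would express the same rules declaratively and could be checked by any schema-aware XML tool, which is the advantage over a script like this.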

Re:ComicBookXML 9 years 7 months ago #944

  • unteins
Great! Glad to hear this isn't a total waste of time.

My primary goal is to bootstrap the format and resolve the major kinks. I want to have a working demo in the next few days. If a native format reader is built into ComicRack, that would be awesome, but for now my focus is importing via a translation from .cbx to .cbz and exporting from .cbz to .cbx. I want the format to get stable ASAP so it can be used.

I am not an XML expert, so more advanced XML features will come later. One thought I had is that it might be possible to write some XSLT to transform the CBX to HTML for viewing online, yet another upside to using XML.

I am also looking into an XML-safe encoding that is more compact, like RLE. Now that I have thought of using XSLT, I need to try to make the encoding scheme safe for that.

Re:ComicBookXML 9 years 7 months ago #976

  • unteins
OK, so it took a little longer than expected, but I got a bi-directional CBZ to CBX converter working. It doesn't handle ComicInfo.xml files yet; I hope to get to that soon. It works from the command line. I am sure it is trivial to get it to run from a menu in ComicRack, I just haven't gotten there yet.

The CBX file format needs some work. I am no XML expert, but I will keep working on it.

So, what does it do? If you pass a .cbz file to the script on the command line (CBXConverter.py somefile.cbz), it will convert it to a .cbx file of the same name. If you pass the .cbx file back to the script, it will convert it back to .cbz. If you pass it multiple files, it will convert each one to the opposite format, so if you pass one .cbz and one .cbx file, it will output one .cbx and one .cbz file.
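The "convert each file to the opposite format" behaviour comes down to a lookup on the file extension. The sketch below shows one way to do it; it is not the code in the attached script, and the names `OPPOSITE`, `target_path`, and `plan_conversions` are made up for the example.

```python
from pathlib import Path

# Each supported extension maps to the format it should be converted to.
OPPOSITE = {".cbz": ".cbx", ".cbx": ".cbz"}


def target_path(source):
    """Return the output path for a source file, or raise ValueError if unsupported."""
    path = Path(source)
    try:
        return path.with_suffix(OPPOSITE[path.suffix.lower()])
    except KeyError:
        raise ValueError(f"unsupported file type: {path.name}") from None


def plan_conversions(arguments):
    """Pair every command-line argument with the file it would be converted to."""
    return [(Path(argument), target_path(argument)) for argument in arguments]
```

Calling `plan_conversions(sys.argv[1:])` before doing any work means an unsupported extension is reported up front instead of halfway through a batch.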

It doesn't do much error checking, so it will probably fail in a multitude of cases, but I tried it on random comics I have and so far no failures.

Now for the big question: Does anything read a .cbx file directly? Nope, nothing does, you can only convert one to the other. But hopefully, in a few weeks, the format will be more stable and ComicRack will read the files directly.

Does it work with .cbr, .cbt, .pdf or anything other than .cbz? Nope, it won't even work with .zip, so you have to choose a .cbz file or a .cbx file to do the conversion.

.cbx files are larger than .cbz files by about 35%. There might be some solutions for this in the future, but for now that's just how it is. If you zip the .cbx file, it will shrink down to just slightly larger than the .cbz.
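The reason zipping recovers most of the space is that base64 text only uses 64 different characters (plus line breaks), so a general-purpose compressor can pack each character into roughly 6 bits again. The sketch below illustrates this with seeded pseudo-random bytes standing in for already-compressed JPEG data; the function name `size_report` is hypothetical, and the exact numbers will vary with the input.

```python
import base64
import random
import zlib


def size_report(raw):
    """Compare raw size, base64 size, and the size of the base64 text after deflate."""
    encoded = base64.encodebytes(raw)
    return {
        "raw": len(raw),
        "base64": len(encoded),
        "base64+deflate": len(zlib.compress(encoded, 9)),
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sizes
    # Pseudo-random bytes are a stand-in for JPEG data, which is already
    # compressed and therefore looks close to random to deflate.
    page = bytes(rng.getrandbits(8) for _ in range(200_000))
    print(size_report(page))
```

The deflated size should land close to the raw size, which is consistent with a zipped .cbx ending up only slightly larger than the .cbz.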

Anyway, I am open for suggestions and improvements. I want to integrate this into ComicRack as soon as possible. I also hope cYo will add some kind of direct support for the files in ComicRack, but that probably requires a slightly more functional file format definition first.

Anyway, let me know what you think, it is my first Python script, so give me a little slack.

File Attachment:

File Name: CBXConverter.zip
File Size:2 KB

Re:ComicBookXML 9 years 7 months ago #1002

  • unteins
So, this is why a CBX format is really cool.

Attached is a very very basic CBX viewer that runs in Firefox 2 (tested on Mac, but should work on any platform). I tested it in Safari 3 and it doesn't work, but I think it can be fixed. I haven't tested in any version of IE or any other browsers.

Right now, all it does is read the cbx and put the pages into the browser as a single web page. It doesn't scale for the screen or any other useful stuff like that, yet.
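The same "all pages on one web page" idea can also be sketched outside the browser. The example below is not the attached viewer. It is a Python illustration that turns a CBX document into a standalone HTML page using `data:` URIs, assuming the flat `imagefile`/`imagedata` layout from the earlier example. The function name `cbx_to_html` and the fallback to `image/jpeg` for unknown extensions are assumptions.

```python
import html
import mimetypes
import xml.etree.ElementTree as ET


def cbx_to_html(cbx_text, title="CBX preview"):
    """Render every page of a CBX document as an <img> tag in one HTML page."""
    root = ET.fromstring(cbx_text)
    names = root.findall("imagefile")
    blobs = root.findall("imagedata")
    tags = []
    for name, blob in zip(names, blobs):
        filename = name.text or ""
        # Guess the MIME type from the file name; fall back to JPEG (an assumption).
        mime = mimetypes.guess_type(filename)[0] or "image/jpeg"
        # Remove the line wrapping so the base64 payload is one unbroken string.
        payload = "".join((blob.text or "").split())
        alt = html.escape(filename, quote=True)
        tags.append(f'<img alt="{alt}" src="data:{mime};base64,{payload}">')
    body = "\n".join(tags)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```

Because the image data is already base64 in the CBX file, it can be dropped into the `src` attribute without decoding and re-encoding it.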

But all that is possible. It can be extended to have previous and next buttons and a table of contents. A lot can be done with it. The same file could be viewed online then downloaded and viewed in a desktop application that supports cbx.

To use the viewer unzip the files and then put a file called test.cbx in the folder. Open cbx.html in Firefox and you should see your comic.

Let me know if you have problems. I will keep extending this application and the conversion scripts. I will continue to refine the cbx spec as well.

File Attachment:

File Name: cbxviewer.zip
File Size:2 KB

Re:ComicBookXML 9 years 7 months ago #1020

  • unteins
Slightly better CBXViewer.

Now you can actually flip through the pages.

The images are scaled to approximately fit on screen. If you want them bigger, use the browser's Make Text Bigger function to scale them to a comfortable size.

The next feature I'd like to add is selecting a file from a drop-down list. I don't know if that is possible in JavaScript, though, so it might not work out.

Attachment CBXViewer.zip not found


Re:ComicBookXML 9 years 3 months ago #1664

  • AlucardNoir
OK, so I'm aware that there hasn't been any development on this for about 4 months now, but if there ever is, I was thinking: if the .jpeg, .png or .gif images normally used in comics were converted to .svg before the .cbx file is created, then the entire file would be pure XML with no further encoding. That would of course need a reader that supports both .xml and .svg, but the advantages would be great, especially considering the way CR works now.

Re:ComicBookXML 8 years 7 months ago #3153

  • unteins
Hmmm, interesting idea. I wonder if there are any tools to convert between image formats in Adobe AIR. Also, I would need to figure out whether SVG is easily supported in desktop applications.

I've come back up from being busy and have some interest in pushing forward on this again. I will do some research and see how it goes. My biggest issue is trying to get anyone besides me to support the effort in the long run. No point in a new comic book file format if I'm the only one using it.