Differences

This shows you the differences between two versions of the page.

guides:creating_webcomics [2010/03/25 22:37]
cyo fixed header
guides:creating_webcomics [2016/02/24 01:49] (current)
72.220.136.78 [WebComics]
Line 1: Line 1:
 ====== WebComics ======

-ComicRack supports WebComics (.cbw) files. With WebComics ComicRack can read Comics directly from Web Pages and display them as if they where standard eComics (CBR, CBZ). WebComics can be exported to other format. If the definition supports it, WebComics can update itself to add new pages (like for daily or weekly comics).
+ComicRack supports WebComics (.cbw) files. With WebComics, ComicRack can read comics directly from web pages and display them as if they were standard eComics (CBR, CBZ). WebComics can be exported to other formats. If the definition supports it, a WebComic can update itself to add new pages (as with daily or weekly comics).

 ===== File Format =====

-WebComics is an XML based file with the .cbw extension. The basic structure is the following:
+A WebComic is an XML-based file with the .cbw extension. The basic structure is the following:

 <code xml>
Line 13: Line 13:
     <Variable Key="Base" Value="http://www.milehighcomics.com/firstlook/marvel/avengers500/" />
   </Variables>
+  <Compositing/>
   <Images>
     <Image Url="{Base}cover.jpg" />
Line 28: Line 29:
 ==== Variables ====

-This is a an optional collection to define text you can reuse in the image entries.
+This is an optional collection that defines textual variables you can reuse in the image entries (with the {key} construct).

+There are two predefined variables:
+  * **ComicFileName** is only the name of the WebComic (e.g. Dragon.cbw)
+  * **ComicFilePath** is the path of the WebComic (e.g. c:\users\peter\comics)
 +
+==== Compositing ====
+
+With this element it is possible to generate multi-image layouts, meaning that every page of the WebComic is built from multiple images.
+
+<code xml>
+<Compositing
+  Rows="Images in each row"
+  Columns="Images in each column"
+  PageWidth="page width in pixels"
+  PageHeight="page height in pixels"
+  RightToLeft="True/False"
+  BackgroundColor="Html Color, default is white"
+  BorderWidth="percent of page width"/>
+</code>
+
+There are two layout methods available. If PageWidth and PageHeight are not specified, each page of the WebComic is built out of multiple images that are aligned in a grid of rows x columns.
+
+If PageWidth and PageHeight are specified, every page has the same size and is filled up with images.
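
As a rough illustration of the two layout methods, the fragments below show one Compositing element for each. The attribute names come from the listing above; the values (a 2 x 2 grid, a 1200 x 1800 pixel page, the colors and border width) are made up for this example and have not been tested against ComicRack.

<code xml>
<!-- Grid layout: no PageWidth/PageHeight, so each page is a grid of images (example values) -->
<Compositing Rows="2" Columns="2" BorderWidth="3"/>

<!-- Fixed-size layout: every page is 1200 x 1800 pixels and is filled up with images (example values) -->
<Compositing PageWidth="1200" PageHeight="1800" BackgroundColor="black"/>
</code>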
 +
+Additionally, each Image element can have its own Compositing element. Defining a new Compositing element for an Image will start a new page. The last defined Compositing is the default for all following images. If no image has a definition, the WebComic definition is used.
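
A minimal sketch of per-image compositing, assuming the Compositing element is written as a child of the Image element, as the indentation in the scraper examples on this page suggests. The file names and grid sizes are hypothetical, and {Base} is assumed to be defined in the Variables collection.

<code xml>
<Images>
  <!-- Own definition: starts a new page with a 2 x 1 grid -->
  <Image Url="{Base}strip01.jpg">
    <Compositing Rows="2" Columns="1"/>
  </Image>
  <!-- No own definition: the last defined Compositing still applies -->
  <Image Url="{Base}strip02.jpg"/>
  <!-- A new definition starts a new page with a different layout -->
  <Image Url="{Base}strip03.jpg">
    <Compositing Rows="1" Columns="1"/>
  </Image>
</Images>
</code>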
 +
+Please note: Use the PageWidth and PageHeight layout only if needed. It forces ComicRack to download all images during page discovery, not only when a page is displayed in the reader.
+
 ==== Images / Image ====

-This is a collection of Image entries the define the actual pages of the eComics. In the simplest case this can be direct link to a image on the internet or it can be complex scraping definitions.
+This is a collection of Image entries that define the actual pages of the WebComic. In the simplest case these are direct links to images on the internet; they can also be complex scraping definitions.

 ===== Image Types =====
Line 41: Line 69:
  
 <code xml>
-<Image PageLinkType="Url" Url="http://somelinke.jpg"/>
+<Image Type="Url" Url="http://lala.com/someimage.jpg"/>
 </code>
  
-This is the simplest type of page link. It just tells ComicRack to add the linked image as a new page. As this type is the default, you can omit the //PageLinkType// attribute.
+This is the simplest type of page link. It just tells ComicRack to add the linked image as a new page. As this type is the default, you can omit the //Type// attribute.
  
 This type supports defining references to multiple pages with one entry. The syntax is

 <code xml>
-<Image PageLinkType="Url" Url="http://lala.com/page[format:a-b].jpg"/>
+<Image Type="Url" Url="http://lala.com/page[format:a-b].jpg"/>
 </code>
  
Line 57: Line 85:
  
 <code xml>
-<Image PageLinkType="Url" Url="http://lala.com/page[00:8-11].jpg"/>
+<Image Type="Url" Url="http://lala.com/page[00:8-11].jpg"/>
 </code>
  
Line 63: Line 91:
  
 <code xml>
-<Image PageLinkType="Url" Url="http://lala.com/page08.jpg"/>
-<Image PageLinkType="Url" Url="http://lala.com/page09.jpg"/>
-<Image PageLinkType="Url" Url="http://lala.com/page10.jpg"/>
-<Image PageLinkType="Url" Url="http://lala.com/page011.jpg"/>
+<Image Type="Url" Url="http://lala.com/page08.jpg"/>
+<Image Type="Url" Url="http://lala.com/page09.jpg"/>
+<Image Type="Url" Url="http://lala.com/page10.jpg"/>
+<Image Type="Url" Url="http://lala.com/page11.jpg"/>
 </code>

 ==== BrowseScraper ====

-The browse scraper type is intended for WebComics that do not have an index page, but rather a start page with an image on it and a next button to get to the next page.
+The BrowseScraper type is intended for WebComics that do not have an index page, but rather a start page with an image on it and a next button to get to the next page.

 BrowseScrapers use [[http://www.regexbuddy.com/regex.html|Regular Expressions]]. The basic structure is

 <code xml>
-<Image PageLinkType="BrowseScraper" Url="start page link|regex for the image link|regex for the next page link"/>
+<Image Type="BrowseScraper" Url="start page link|regex for the image link|regex for the next page link"/>
 </code>
  
Line 85: Line 113:
 </code>

-Or you can define the three parts separately (for example if they are very complex or contain the | delimiter in the regular expression):
+Or you can define the three parts separately (for example if they are very complex or contain the | delimiter in the regular expression). The general form of the Part element is:
+
+<code xml>
+<Part
+   MaximumMatches="maximum returned matches"
+   Reverse="True/False"
+   Sort="True/False"
+   AddOwn="True/False"
+   Cut="regular expression">regular expression</Part>
+</code>
+
+The Cut attribute lets you run a regular expression before the main regular expression is applied. The main regular expression will then only match against the result of the Cut expression.
+This way you can also specify the additional attributes:

 <code xml>
 <Image Url="?start page link"/>
+  <Compositing/>
   <Parts>
-    <Part>regex for the image link</Part>
+    <Part MaximumMatches="3" Sort="True">regex for the image link</Part>
     <Part>regex for the next page link</Part>
   </Parts>
Line 96: Line 137:
 </code>
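
To illustrate the Cut attribute, here is a hypothetical Part that first cuts the page down to one block and then searches for the image link inside it. The div id and both expressions are invented for this example and would need to be adapted to the HTML of the actual page.

<code xml>
<!-- Cut first narrows the page to the <div id="comic"> block (hypothetical markup), -->
<!-- then the main expression looks for a .jpg link only inside that block. -->
<Part Cut="&lt;div id=&quot;comic&quot;&gt;[\s\S]*?&lt;/div&gt;">src=&quot;(?&lt;link&gt;[^&quot;]+\.jpg)&quot;</Part>
</code>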
  
-So lets do this with an example. Our example is a classic daily web comics: www.penny-arcade.com. We want all the 2010 issues.
+So let's try this with an example: a classic daily web comic, www.penny-arcade.com. We want all the 2010 issues.
  
 <code xml>
 <Image Url="?http://www.penny-arcade.com/comic/2010/1/1/"/>
+  <Compositing Rows="5" BorderWidth="3"/>
   <Parts>
     <Part>&quot;http.*/\d\d.*jpg&quot;</Part>
Line 107: Line 149:
 </code>

-The //Url// is the start page for our scraper. The first part defines the regular expression for getting the link to the Jpeg image. The second part gets the link for ComicRack to move one forward. If this part does not match, or the link is one that ComicRack already scraped, the scraping ends.
+The //Url// is the start page for our scraper. The first part defines the regular expression for getting the link to the JPEG image(s). The second part gets the link that ComicRack follows to move one page forward. If this part does not match, or the link is one that ComicRack has already scraped, the scraping ends.

 Also note that, as you are in an XML file, you need to write special characters like " or > as their XML entities (like &quot; or &gt;).
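
For instance, a regular expression containing quotes and angle brackets would be escaped like this before it goes into a Part element. The expression itself is only an illustration.

<code xml>
<!-- Raw regex:  src="(?<link>.*\.jpg)"  -->
<!-- Escaped for use inside the XML file: -->
<Part>src=&quot;(?&lt;link&gt;.*\.jpg)&quot;</Part>
</code>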
Line 115: Line 157:
 ==== IndexScraper ====

-The browser scraper is intended for web comics that have a central index page for all the pages. The general format is
+The IndexScraper is intended for web comics that have a central index page for all their pages. The general format is

 <code xml>
-<Image PageLinkType="IndexScraper" Url="index page|[!]regex for page links|[!]regex for page links on these pages|...|[!]regex for the images"/>
+<Image Type="IndexScraper" Url="index page|[!]regex for page links|[!]regex for page links on these pages|...|[!]regex for the images"/>
 </code>

 The scraper supports a chain of n pages to get from the index pages to the actual images. This way it supports links like //Index Page->Month Links->Day Pages->Images on day Page//.

-The optional //!// in front of a regex tells the scraper to reverse the matches. This is helpful if the index page lists newest first.
+The optional //!// in front of a regex tells the scraper to reverse the matches. This is helpful if the index page lists newest first. Alternatively, you can set the **Reverse** attribute on the Part element. You can also sort the matches by their text by specifying the **Sort** attribute.
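
A sketch of reversing and sorting matches, using the Part attributes listed for the BrowseScraper. The regular expression is a placeholder, and the first two Parts are assumed to behave the same way, as the description above implies.

<code xml>
<!-- Reversed with the ! prefix -->
<Part>!href=&quot;(?&lt;link&gt;.+\?p=\d+)</Part>

<!-- Reversed with the Reverse attribute instead -->
<Part Reverse="True">href=&quot;(?&lt;link&gt;.+\?p=\d+)</Part>

<!-- Sorted by the matched text -->
<Part Sort="True">href=&quot;(?&lt;link&gt;.+\?p=\d+)</Part>
</code>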
  
-As with the IndexScraper you can also omit the PageLinkType and simply start the Url with an //!// or put the regexes into a part list.
+As with the BrowseScraper, you can also omit the Type attribute and simply start the Url with an //!//, or put the regular expressions into a part list.
  
 Let's look at an example: http://www.abandoncomic.com/ - Abandon: First Vampire
Line 134: Line 176:
      <Part>!href=&quot;(?&lt;link&gt;.+\?p=\d+)</Part>
      <Part>src=&quot;(?&lt;link&gt;.*/comics/.*\.jpg)&quot;</Part>
-    </Image>
+    </Parts>
   </Image>
 </code>
Line 145: Line 187:
 ===== How to create WebComics =====

-Go with the browser of your choice to your WebComic page. Descide if you need to create a //Url// bases (simple) or a regex based (//BrowseScraper//, //IndexScraper//) WebComics.
+Go to your WebComic page with the browser of your choice. Decide whether you need to create a //Url//-based (simple) or a regex-based (//BrowseScraper//, //IndexScraper//) WebComic.

 To find the regular expressions, select "View Source" in your browser and copy the HTML code into a regex testing tool of your choice. Play around to find the regular expression. When you think you're done, put the expressions into the WebComic file and open it with ComicRack.

-Please note that ComicRack works with the .NET dialect of RegEx. If the expression contains a //link// group, this one is used. Otherwise the matched expression is used.
+Please note that ComicRack works with the [[http://msdn.microsoft.com/en-us/library/hs600312.aspx|.NET implementation]] of RegEx. If the expression contains a //link// group, that group is used. Otherwise the whole matched expression is used.

+If you start ComicRack with the -ssc command-line switch, ComicRack will display a log of all the scraping actions. This may help when debugging WebComics.
 ===== Useful links =====

-[[http://www.regexbuddy.com/]]
+[[wp>Regex|Regular expression]]
+
+[[http://msdn.microsoft.com/en-us/library/hs600312.aspx|.NET Framework Regular Expressions]] - MSDN library regex documentation
+
+[[http://www.regexbuddy.com/|RegexBuddy]] - regex testing utility, commercial software
+
+[[http://www.ultrapico.com/Expresso.htm|Expresso]] - regex testing utility, freeware
+

-[[wp>regex]]
