Python Scripts for ComicRack

TOPIC: [REQUEST] New parsing script

[REQUEST] New parsing script 6 years 11 months ago #10736

  • freMea
Hi,

I know this question has been discussed a lot. I tried to modify the filename-parsing scripts I found to suit my needs, but since I have no coding skills, I failed.

I need such a script because French and Belgian comic books use a special naming format.

Examples of my file paths and names (I listed every variant of the same name I could think of):
S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand - THS - les 12 travaux d'hercules, comma test
S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand - HS - les 12 travaux d'hercules, comma test
S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand.HS.les 12 travaux d'hercules, comma test
S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand, HS, les 12 travaux d'hercules, comma test

S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand - Tome 037- les 12 travaux d'hercules, comma test
S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand.Tome 037.les 12 travaux d'hercules, comma test
S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand-T037-les 12 travaux d'hercules, comma test
S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand, T.037, les 12 travaux d'hercules, comma test

S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand - Volume 037- les 12 travaux d'hercules, comma test
S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand, Vol.037, les 12 travaux d'hercules, comma test
S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand.V037.les 12 travaux d'hercules, comma test
S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand, V.037-les 12 travaux d'hercules, comma test
S:\Manga - BD\le bon, la brute, et le truand\le 3 bon, la brute, et le truand,Vol037,les 12 travaux d'hercules, comma test

I wrote a regex that parses all of these into:
Series, Volume or Number (HS marks a special issue, "hors-série"), and Title

The regex, successfully tested in RegexBuddy, is:
^.*\\(?<Series>.+?(?=\.|,?\s?T\.|,?\s?V(?:ol)?(?:\.|\d+?)|,?\s?HS|\s?-)).*?(?:(?<Number>(?<=T*)HS)|(?<Volume>(?<=\D)\d+))[\s,\._-]*(?<Title>.*$)



First, this kind of regex is mainly designed for .NET, and I don't know whether it is supported in an IronPython script.

If it is, I'd like a kind coder to take up the challenge of writing a parser script using this regex. :)
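Python's re module does not accept the .NET syntax as-is. Below is a sketch of the same regex rewritten in Python syntax, assuming the full path (folders included) is what gets parsed, as in the examples above. The helper name parse_path is hypothetical, and the pattern was only checked against the sample names listed here, in standard Python rather than inside ComicRack's IronPython.

```python
import re

# Sketch: freMea's .NET regex rewritten in Python syntax (not tested inside ComicRack).
#  * .NET named groups (?<Name>...) are written (?P<Name>...) in Python.
#  * The lookbehind (?<=T*) is dropped: Python only accepts fixed-width
#    lookbehinds, and "zero or more T" is always satisfied anyway.
FREMEA_PATTERN = (
    r'^.*\\'                                       # folders, up to the last backslash
    r'(?P<Series>.+?(?=\.|,?\s?T\.|,?\s?V(?:ol)?(?:\.|\d+?)|,?\s?HS|\s?-))'
    r'.*?'                                         # skips "Tome", "Vol", "T", " - "...
    r'(?:(?P<Number>HS)|(?P<Volume>(?<=\D)\d+))'   # HS special issue, or a volume number
    r'[\s,\._-]*'                                  # separators before the title
    r'(?P<Title>.*$)'
)
FREMEA_REX = re.compile(FREMEA_PATTERN)


def parse_path(path):
    """Return a dict with Series, Number, Volume, Title (unused group -> None), or None."""
    match = FREMEA_REX.search(path)
    return match.groupdict() if match else None


print(parse_path(r"S:\Manga - BD\le bon, la brute, et le truand"
                 r"\le 3 bon, la brute, et le truand, Vol.037, les 12 travaux d'hercules, comma test"))
```

Only one of Number and Volume is filled for a given name; the other comes back as None, so a calling script can decide which ComicRack field to set.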

Re: [REQUEST] New parsing script 6 years 11 months ago #10738

  • Franck
What is your final goal?
Once you've extracted the series name, volume and title, what is the script supposed to do? Store them in ComicRack fields?

Re: [REQUEST] New parsing script 6 years 11 months ago #10742

  • freMea
Well, referring to sample.py (see below), and if I understood its purpose correctly, 'ParseComicPath' is designed to run when ComicRack (CR) starts and retrieve info tags from the file path and file name. But does 'prefilled' mean that the fields will only be greyed out with the values it guessed? At the very least, I want CR to make the 'Series' value permanent whenever it exists (see the regex above), so that my books are indexed by series as soon as CR starts.


from sample.py:
#
# ParseComicPath - Sample parser for proposed comic values
#
# path : string containing the full path of the eComic
# proposed : object with properties Series/Volume/Number/Count/Year
#
# the values in proposed are prefilled with the guessed ComicRack made.
# You can change these values or replace them as you like.
#
def ParseComicPath (path, proposed):
	proposed.Series = "-" + proposed.Series
	proposed.Title = "This is a sample title"
	proposed.Volume = 1
	proposed.Number = "1.5"
	proposed.Count = 56
	proposed.Year = 2000
	proposed.Format = "Annual"
	proposed.CoverCount = 1

So, to sum up, Franck, the answer to your question is YES.
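A minimal sketch of a ParseComicPath function along these lines, assuming the proposed object exposes the Series/Volume/Number/Title properties shown in sample.py. It only handles the hyphen-separated variants ("series - Tome 037- title", "-T037-", "- HS -") and returns without touching ComicRack's guesses for any other layout. HYPHEN_SPLIT, EXTENSION (including its list of extensions) and FakeProposed are hypothetical names for this example.

```python
import re

# Hypothetical helpers for this sketch (not part of ComicRack):
# split "series - token - title" on hyphens, tolerating missing spaces
# ("-T037-", "Tome 037- les ..."), and strip an assumed list of extensions.
HYPHEN_SPLIT = re.compile(r'\s*-\s*')
EXTENSION = re.compile(r'\.(?:cbz|cbr|cb7|pdf)$', re.IGNORECASE)


def ParseComicPath(path, proposed):
    # Keep only the file name: everything after the last backslash.
    name = EXTENSION.sub('', path.split('\\')[-1])
    parts = HYPHEN_SPLIT.split(name, maxsplit=2)
    if len(parts) != 3:
        return  # no "series - token - title" layout: keep ComicRack's guesses
    series, token, title = parts
    digits = re.search(r'\d+', token)
    if token.upper().endswith('HS'):
        proposed.Number = 'HS'  # special issue (HS or THS)
    elif digits:
        proposed.Volume = int(digits.group())
    else:
        return  # middle part is neither HS nor a volume: keep the guesses
    proposed.Series = series
    proposed.Title = title


class FakeProposed(object):
    """Stand-in for ComicRack's proposed object, for testing outside ComicRack."""
    Series, Title, Number, Volume = '', '', '', -1


p = FakeProposed()
ParseComicPath(r"S:\Manga - BD\le bon, la brute, et le truand"
               r"\le 3 bon, la brute, et le truand - Tome 037- les 12 travaux d'hercules, comma test.cbz", p)
print("%s | %s | %s" % (p.Series, p.Volume, p.Title))
```

Handling the dot and comma variants would mean swapping the hyphen split for a fuller pattern, such as freMea's regex translated to Python syntax.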

Re: [REQUEST] New parsing script 6 years 9 months ago #11695

  • Franck
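Your regexp is not compatible with Python (for example, Python writes named groups as (?P<Name>...) instead of .NET's (?<Name>...)). You should test it with "Kodos - The Python Regex Debugger".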

Adapt the FREMEA_PATTERN regexp to your needs and save this script as freMea.py:
# freMea.py
##########################################################################

import re

FREMEA_PATTERN = r'^(?P<Series>.+?)[ -]+Volume[ -]+(?P<Volume>[0-9].*?)[ -]+(?P<Title>.*$)'
# Pass the flags when compiling: re.search() does not accept a flags
# argument together with an already-compiled pattern.
FREMEA_REX = re.compile(FREMEA_PATTERN, re.IGNORECASE | re.DOTALL)

#@Name	freMea
#@Hook	Books
#@Description Parse the file name and store its parts in fields
def parseAlbumInfo(books):
	for book in books:
		print("book.FileName=" + book.FileName)
		nameRegex = FREMEA_REX.search(book.FileName)
		if nameRegex:
			# The "Volume" part is stored as the book number.
			book.Series = nameRegex.group('Series')
			book.Number = nameRegex.group('Volume')
			book.Title = nameRegex.group('Title')
Last Edit: 6 years 9 months ago by Franck.
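Before adapting FREMEA_PATTERN, it may help to check which of freMea's names it already handles. This is a standalone sketch in standard Python (outside ComicRack), assuming book.FileName holds the name without its folders; NAMES and try_names are hypothetical. Among the variants listed in the first post, only the spelled-out " - Volume 037- " form matches the sample pattern; "Tome", "T037", "Vol.037" and the HS forms need a broader pattern.

```python
import re

# Franck's sample pattern, with the flags passed to re.compile
# (re.search rejects a flags argument given an already-compiled pattern).
FREMEA_PATTERN = r'^(?P<Series>.+?)[ -]+Volume[ -]+(?P<Volume>[0-9].*?)[ -]+(?P<Title>.*$)'
FREMEA_REX = re.compile(FREMEA_PATTERN, re.IGNORECASE | re.DOTALL)

# A few of freMea's file names, folders dropped (assumed to match book.FileName).
NAMES = [
    "le 3 bon, la brute, et le truand - Volume 037- les 12 travaux d'hercules, comma test",
    "le 3 bon, la brute, et le truand - Tome 037- les 12 travaux d'hercules, comma test",
    "le 3 bon, la brute, et le truand, Vol.037, les 12 travaux d'hercules, comma test",
]


def try_names(names):
    """Map each name to its (Series, Volume, Title) groups, or None if no match."""
    results = {}
    for name in names:
        match = FREMEA_REX.search(name)
        results[name] = match.groups() if match else None
    return results


for name, groups in sorted(try_names(NAMES).items()):
    print("%s -> %s" % (name, groups))
```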