Python Scripts for ComicRack

TOPIC: [MOD] Import tags from filename with regular expressions

[MOD] Import tags from filename with regular expressions 2 years 4 weeks ago #43800

  • freMea
  • Offline
  • Junior Boarder
  • Posts: 38
  • Thank you received: 5
  • Karma: 1
Authors: Yellow with update tweaks by freMea
Version: 0.2
Last updated: October 25, 2015
Note: You may be interested in a more advanced script that does the same job, with a match preview and more. Have a look at the Priat plugin.


Description

When scrapers fail or don't provide everything you want, it is sometimes useful to apply values found in the file name to the tags of the selected books. This script lets you do that with a GUI and regular expressions.

Please see the original topic for more info and sample patterns.

The script supports the following tags:

Textual:
Writer, Publisher, Penciller, Inker, Series, Number, AlternateSeries, AlternateNumber, Title, Summary, Notes, Genre, Colorist, Editor, Letterer, CoverArtist, Web, Imprint, Tags

Numeric:
Count, Year, Month, Volume, AlternateCount, Rating
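The split between textual and numeric tags suggests that captured values need a type conversion before they are applied. The post does not show how the script does this, so the following is only a minimal sketch under that assumption: the tag names come from the lists above, while `to_number` and `coerce_tags` are hypothetical helper names.

```python
# Tag names are taken from the lists above. The conversion rules are an
# assumption for illustration, not the script's actual behaviour.
TEXT_TAGS = {
    "Writer", "Publisher", "Penciller", "Inker", "Series", "Number",
    "AlternateSeries", "AlternateNumber", "Title", "Summary", "Notes",
    "Genre", "Colorist", "Editor", "Letterer", "CoverArtist", "Web",
    "Imprint", "Tags",
}
NUMERIC_TAGS = {"Count", "Year", "Month", "Volume", "AlternateCount", "Rating"}


def to_number(value):
    """Return value as int, else as float, else None if it is not a number."""
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return None


def coerce_tags(groups):
    """Map captured regex groups to typed tag values (hypothetical helper)."""
    result = {}
    for name, value in groups.items():
        if not value:
            continue  # the group captured nothing
        if name in NUMERIC_TAGS:
            number = to_number(value)
            if number is not None:
                result[name] = number
        elif name in TEXT_TAGS:
            result[name] = value.strip()
        # any other group name (for example a throwaway group) is ignored
    return result


if __name__ == "__main__":
    print(coerce_tags({"Series": "Captain America Comics", "Volume": "1941",
                       "Number": "001", "Year": "1941"}))
```

Note that Number is in the textual list, so in this sketch a value such as 001 keeps its leading zeros.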


Tips

(Forum posts have a link limit, so I can't give you the links here. Sorry. Search for these tools with your favorite Internet search engine.)

In ComicRack, Python scripts use the .NET regex engine to handle regular expressions, so it is recommended that you use a regex checker with that engine to test your expressions against your comic paths.

The best offline tool I know is RegexBuddy, but it's not freeware. It supports lots of regex engines, including .NET of course. Use it if you plan to test regular expressions in other contexts such as web development. It's really professional.

The freeware tool I know is Expresso. It supports the .NET engine only. Give it a try.

Online checkers that work for .NET exist too. Regex Storm is free to use.
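If none of these tools is at hand, a rough check can also be scripted. The sketch below is an assumption-laden illustration: it uses standard Python's `re` module, not the .NET engine the script runs on, so named groups are written `(?P<name>...)` and results may differ from what ComicRack produces. The helper name `check`, the sample pattern and the sample paths are all made up for the example.

```python
import re


def check(pattern, paths):
    """Run one pattern over several paths; return (path, groups or None)."""
    regex = re.compile(pattern)
    results = []
    for path in paths:
        found = regex.search(path)
        results.append((path, found.groupdict() if found else None))
    return results


# Hypothetical pattern and paths, for illustration only.
SAMPLE_PATTERN = (r"(?P<Series>[^\\]+)\sV(?P<Volume>\d+)"
                  r"\s#(?P<Number>\d+)\s\((?P<Year>\d+)\)\.\w+$")
SAMPLE_PATHS = [
    r"C:\Comics\Captain America Comics V1941 #001 (1941).cbz",
    r"C:\Comics\misc scan.cbz",
]

if __name__ == "__main__":
    for path, groups in check(SAMPLE_PATTERN, SAMPLE_PATHS):
        print(path, "->", groups if groups is not None else "no match")
```

Treat the output as a first sanity check only and confirm the final pattern in a .NET checker.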


Tweaks and fixes of this modded script

new: script in crplugin package for auto and easy installation.
new: button with icon in browser toolbar next to your other scripts and scrapers.
new: last applied pattern is auto restored as sample the next time you open the script.
new: patterns are saved in a sub folder of your documents directory for easy restore/backup and load/save.
new: the open/save pattern file dialog always opens the directory where your patterns are stored.
fix: opening the file dialog to load a pattern file and then cancelling could cause an error.


Installation

  1. Download the zip in attachment.
  2. Extract it.
  3. If you want sample and default patterns to start with, move the ComicRack - Tags from filename patterns folder to your documents directory so that you get a path like this (if you haven't changed your documents location):
    C:\Users\(User_Name)\Documents\ComicRack - Tags from filename patterns
  4. Double-click TagsFromName x.x (freMea).crplugin to install the script into ComicRack.

Thanks to the original author for this script.

I hope you will enjoy the improvements and share your feedback.

edit: please subscribe to the topic to be informed of new releases.
Attachments:
Last Edit: 2 years 3 weeks ago by freMea.
The administrator has disabled public write access.
The following user(s) said Thank You: perezmu, jkthemac

[MOD] Import tags from filename with regular expressions 2 years 4 weeks ago #43805

  • jkthemac
  • Offline
  • Platinum Boarder
  • Posts: 766
  • Thank you received: 253
  • Karma: 55
Nice to see some work on this script again. Thanks. I have always thought this script could do with a little bit of work.

If you fancy reworking it further there are a couple of things that could make it better.

For a start, it is possible for a Regex expression to match the filename successfully but not actually scrape any data. It would be a good idea to highlight these instances much like a failed match, so that one could select another pattern.
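A minimal sketch of the check described above, assuming that "no data" means every named group is empty or did not participate. It uses standard Python `re` syntax rather than .NET, and `classify` is a hypothetical helper, not part of the script.

```python
import re


def classify(pattern, filename):
    """Return 'no match', 'empty match' or 'ok' for one filename."""
    found = re.search(pattern, filename)
    if found is None:
        return "no match"
    # Keep only groups that captured a non-empty string.
    captured = [value for value in found.groupdict().values() if value]
    return "ok" if captured else "empty match"


# Every part of this hypothetical pattern is optional, so it can match
# a filename without capturing anything.
LOOSE_PATTERN = r"(?P<Number>\d*)\s*(?P<Year>\d*)"

if __name__ == "__main__":
    for name in ["001 1941.cbz", "Captain America.cbz"]:
        print(name, "->", classify(LOOSE_PATTERN, name))
```

A GUI could then flag "empty match" rows the same way it flags failed matches.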

Secondly, I think the word 'tag' was always a poor choice because there is a database entry with that name in ComicRack so it can confuse.

If you really wanted to go to town, it would be possible for a failed match to result in the script trying other patterns and asking the user which of the results found would be preferable (if any) instead of asking the user to select another pattern.
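The fallback idea could look roughly like this. It is only a sketch under assumptions: standard Python `re` syntax instead of .NET, a hypothetical `candidates` helper, and made-up patterns. It collects every pattern that yields data so the user can be asked which result to keep.

```python
import re


def candidates(patterns, filename):
    """Try each (label, pattern) pair; return the ones that captured data."""
    found = []
    for label, pattern in patterns:
        result = re.search(pattern, filename)
        if result is None:
            continue
        data = {key: value for key, value in result.groupdict().items() if value}
        if data:
            found.append((label, data))
    return found


# Hypothetical saved patterns, ordered from most to least specific.
PATTERNS = [
    ("series #number (year)",
     r"^(?P<Series>.+?)\s#(?P<Number>\d+)\s\((?P<Year>\d{4})\)"),
    ("series number", r"^(?P<Series>.+?)\s(?P<Number>\d+)\.\w+$"),
    ("series only", r"^(?P<Series>.+)\.\w+$"),
]

if __name__ == "__main__":
    for label, data in candidates(PATTERNS, "Captain America 001.cbz"):
        print(label, "->", data)
```

Listing all candidates, rather than stopping at the first hit, leaves the final choice to the user, which is what the suggestion asks for.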
Last Edit: 2 years 4 weeks ago by jkthemac.

[MOD] Import tags from filename with regular expressions 2 years 4 weeks ago #43806

  • freMea
  • Offline
  • Junior Boarder
  • Posts: 38
  • Thank you received: 5
  • Karma: 1
v0.2 is out and fixes the restore last applied pattern issue in v0.1.

@jkthemac

Nice ideas that I'd like to see materialize, but I don't have the time or skills to go further.
Anyone who wants to contribute to this script's development or fork it is welcome.

[MOD] Import tags from filename with regular expressions 2 years 3 weeks ago #43811

  • rmagere
  • Offline
  • Gold Boarder
  • Posts: 221
  • Thank you received: 24
  • Karma: 7
freMea wrote:
v0.2 is out and fixes the restore last applied pattern issue in v0.1.

@jkthemac

Nice ideas that I'd like to see materialize, but I don't have the time or skills to go further.
Anyone who wants to contribute to this script's development or fork it is welcome.

Out of curiosity, what are the key differences between this script and Priat (which is the script I tend to use to import tags from filenames with regular expressions)?

Thank you

[MOD] Import tags from filename with regular expressions 2 years 3 weeks ago #43812

  • freMea
  • Offline
  • Junior Boarder
  • Posts: 38
  • Thank you received: 5
  • Karma: 1
Thanks for letting me know about Priat. I didn't know about it. It seems more advanced. I will test it.

Working on this script was just for fun, to fix some behavior that made the old one unusable for me.

[MOD] Import tags from filename with regular expressions 2 years 3 weeks ago #43817

  • jkthemac
  • Offline
  • Platinum Boarder
  • Posts: 766
  • Thank you received: 253
  • Karma: 55
Priat's regex engine has some flaws which stop it working properly for some conditional folder structures.

Imagine you had a structure Publisher\Series Group\MainCharacter\Foldername\Filename
where it is possible that you have no {Series Group} or {MainCharacter}

For example, my file:
N:\Comics\htdocs\Marvel\Captain America\Captain America Comics V1941\Captain America Comics V1941 #001 (1941).cbz
has no Main character because I don't bother when that replicates my series group.

Priat does not handle the regex that can match this properly, and as far as I can remember this script did, because it was fixed to handle regex in the standard manner.

For example this would be my regex:
^N\:\\Comics\\htdocs\\(?<Publisher>[^\\]*)\\((?<SeriesGroup>[^\\]*)\\)?((?<MainCharacter>[^\\]*)\\)?(?<seriesgroup>[^\\]*)\\(?<dummy>[^\\]*)\\(?<Series>[^\\]*)\sV(?<Volume>[\d]*)\s#(?<Number>[\d]*)\s\((?<Year>[\d]*)\)\....
It works in regex checkers like this one but not in Priat.

I think it gets thrown by unnamed matching groups.
Last Edit: 2 years 3 weeks ago by jkthemac.

[MOD] Import tags from filename with regular expressions 2 years 3 weeks ago #43819

  • freMea
  • Offline
  • Junior Boarder
  • Posts: 38
  • Thank you received: 5
  • Karma: 1
What regex engine does Priat use? The online checker you referred to uses PCRE by default and can use other engines, but the engine should be the same as Priat's to avoid surprises in the results. I guess you asked for help in the Priat topic; what was the answer, if any?

edit: your regex checker only provides the PCRE, Python and JavaScript engines. Your regex fails in the Python and JavaScript ones. I tested it in a .NET checker and it seems to work. Here are the results:
Match 1:	N:\Comics\htdocs\Marvel\Captain America\Captain America Comics V1941\Captain America Comics V1941 #001 (1941).cbz
Group "Publisher":	Marvel
Group 1 did not participate in the match
Group "SeriesGroup" did not participate in the match
Group 2 did not participate in the match
Group "MainCharacter" did not participate in the match
Group "seriesgroup":	Captain America
Group "dummy":	Captain America Comics V1941
Group "Series":	Captain America Comics
Group "Volume":	1941
Group "Number":	001
Group "Year":	1941

But you have two SeriesGroup groups in your regex… Be careful to use the tag names Priat expects; they may be case sensitive. That reminds me that my script doesn't support SeriesGroup.

Priat seems great, but the bad news is that the plugin is not written in Python but in C#, and the sources are not provided. Plus, its author has been offline for a year now.
Last Edit: 2 years 3 weeks ago by freMea.
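One certain reason for the Python failure mentioned above is syntax: Python's `re` module writes named groups as `(?P<name>...)` and rejects the .NET form `(?<name>...)`. The sketch below rewrites the group syntax and runs the regex from the earlier post against the same path. The helper name `dotnet_to_python` is hypothetical, and it only handles the `(?<name>` form (not `(?'name'...)`, and it does not distinguish escaped parentheses). Also note that .NET numbers unnamed groups before named ones, which is why the listing above shows "Group 1" and "Group 2" for the two bracketed optional parts, while Python numbers all groups left to right.

```python
import re


def dotnet_to_python(pattern):
    """Rewrite (?<name>...) as (?P<name>...); lookbehinds are left alone."""
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)


# Regex and path copied from the posts above.
DOTNET_PATTERN = (
    r"^N\:\\Comics\\htdocs\\(?<Publisher>[^\\]*)\\((?<SeriesGroup>[^\\]*)\\)?"
    r"((?<MainCharacter>[^\\]*)\\)?(?<seriesgroup>[^\\]*)\\(?<dummy>[^\\]*)\\"
    r"(?<Series>[^\\]*)\sV(?<Volume>[\d]*)\s#(?<Number>[\d]*)\s\((?<Year>[\d]*)\)\...."
)
PATH = (r"N:\Comics\htdocs\Marvel\Captain America"
        r"\Captain America Comics V1941"
        r"\Captain America Comics V1941 #001 (1941).cbz")

converted = dotnet_to_python(DOTNET_PATTERN)
result = re.match(converted, PATH)

if __name__ == "__main__":
    if result is None:
        print("no match")
    else:
        # Groups that did not participate are reported as None.
        for name, value in sorted(result.groupdict().items()):
            print(name, "=", value)
```

With the syntax converted, the named groups should come out the same as in the .NET listing, with SeriesGroup and MainCharacter not participating.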

[MOD] Import tags from filename with regular expressions 2 years 3 weeks ago #43824

  • rmagere
  • Offline
  • Gold Boarder
  • Posts: 221
  • Thank you received: 24
  • Karma: 7
Thank you for the responses - I suspected there were differences as jkthemac referred to this plugin a few times but never to Priat in his posts.
Given my basic needs (i.e. usually to do with title, year, series), limited knowledge of regex and having never tried to reimport from my full library I had never come across issues.

Thank you

[MOD] Import tags from filename with regular expressions 2 years 3 weeks ago #43828

  • jkthemac
  • Offline
  • Platinum Boarder
  • Posts: 766
  • Thank you received: 253
  • Karma: 55
To be honest, I only ever used these things as a way of reloading data if the database became corrupted. Nowadays I use SQL, so I haven't had any problems for years.

I don't think anyone was working on maintaining Priat until now, but I do remember the Import Tags script having issues with non-standard interpretations of regex, which were fixed. I haven't tested that on this regex either, so I might be wrong in this instance.

My regex really doesn't have anything that should cause different implementations to go wrong. Obviously, the other versions you checked are programming-language implementations that need different syntax, so they wouldn't be expected to work in this case.

I knocked this regex up yesterday to demonstrate where the issues were, so I am not worried about things like case sensitivity, although I think Priat is not case sensitive and uses reflection to allow any field to be filled. (I am not currently trying to use that regex for any practical purpose.)

The area of concern is that Priat struggles with the part that copes with the SeriesGroup and/or MainCharacter not being present. If you use a checker you will see that you can add a folder after the Captain America folder, or remove the Captain America folder, and the results stay correct: a MainCharacter field is added, or the SeriesGroup is removed, as necessary.

Ironically this is less of a concern in Priat because you can easily tick the issues that do conform and change the regex for those that don't. It just gets frustrating if you try and write code in the interface and it gets fussy despite the code being correct. I remember swearing at it last time I used it properly.

But in an ideal world Priat should conform to .NET fully, and I think the problem is that my example uses unnamed matching groups because I use brackets to allow for the flexibility.

P.S. I think the regex I posted was when I was checking that series group was properly double instantiating series group so I dropped the capitals to see both. But I can't remember now, regex can get complex very quickly and I would need to totally retrace my steps to remember my workings.
Last Edit: 2 years 3 weeks ago by jkthemac.

[MOD] Import tags from filename with regular expressions 2 years 3 weeks ago #43829

  • freMea
  • Offline
  • Junior Boarder
  • Posts: 38
  • Thank you received: 5
  • Karma: 1
jkthemac wrote:
To be honest, I only ever used these things as a way of reloading data if the database became corrupted. Nowadays I use SQL, so I haven't had any problems for years.

Well, there would be no problem like that for any users if CR could save all metadata into cbz/cb7. Just kudos this topic to make a point about it so maybe cYo will finally implement this option.