Python Scripts for ComicRack

TOPIC: CreateBookList Custom Scripts "options" value (and some of my beta scripts)

CreateBookList Custom Scripts "options" value (and some of my beta scripts) 1 year 3 weeks ago #46380

  • jkthemac
It is interesting. The false hits come in clusters. If I weed out some of the false hits by excluding words in the series name, another three of the false hits disappear.

The false hits seem to be matching each other.

For example, I get false positives on 'Father’s Day' and 'Ragnarök', but if I weed out any series with 'Day' in the name, then Ragnarök is no longer a false positive.

issue 474589 volume 77682
&
issue 473755 volume 75887

Edgar Allan Poe’s The Fall Of The House Of Usher [Issue 402270, Volume 61310] matched with the aforementioned Gødland Finale.
Last Edit: 1 year 3 weeks ago by jkthemac.

CreateBookList Custom Scripts "options" value (and some of my beta scripts) 1 year 3 weeks ago #46381

  • Xelloss
That fits with what I was saying in my last post...

Special characters make different texts appear the same... so, as you need 2 comics to make a hit, you need pairs of texts with special characters to see the false hit XD

I am working on correcting this now :)
Last Edit: 1 year 3 weeks ago by Xelloss.

CreateBookList Custom Scripts "options" value (and some of my beta scripts) 1 year 3 weeks ago #46382

  • boshuda
Try just adding this at the beginning of the script:
import sys
reload(sys)  # Python 2 only: brings back sys.setdefaultencoding, which is removed at startup
sys.setdefaultencoding("utf-8")

It's discouraged, and might blow some stuff up in IronPython, but it seems to be necessary to deal with the non-ASCII characters that pop up from time to time. If you find a better solution, by all means post it here.
Last Edit: 1 year 3 weeks ago by boshuda.

CreateBookList Custom Scripts "options" value (and some of my beta scripts) 1 year 3 weeks ago #46383

  • Xelloss
Tried that... didn't work D:

CreateBookList Custom Scripts "options" value (and some of my beta scripts) 1 year 3 weeks ago #46384

  • boshuda
Really? That prevents the Duplicates manager from blowing up on 'weird' characters. Are you going to have to do a character-by-character check and look for something to blow up with a try block and then do a byte-check compare? Ugh. Whatever. Just thinking it through. Ignore me and post a solution when you find it. I'm curious :).

CreateBookList Custom Scripts "options" value (and some of my beta scripts) 1 year 3 weeks ago #46385

  • jkthemac
It is perhaps worth noting that the actual series data in the XML is probably the problem. The names are not properly encoded there.

On second thought, even if I change them it doesn't make any difference. Ignore me.
Last Edit: 1 year 3 weeks ago by jkthemac.

CreateBookList Custom Scripts "options" value (and some of my beta scripts) 1 year 3 weeks ago #46386

  • Xelloss
I think I found the problem and the solution...

I converted all values to strings to make the group names... (for example, volume is an integer, and I needed a string for a group "ID name")

This doesn't work with unicode objects... so it gave "" as a result (that's why every unicode text was the same: they were all "")

I am now trying to convert the unicode object to a Python string first, so that it gives a string instead of nothing... If this throws an exception, I will use the old method instead. I will then use this "converted Python string" to make the groups (and in doing so compare the values)

It will not differentiate things such as "Gødland Final" from "Gødland Final", but I don't think that will be a problem XD

edit: Sorry but something came up here in the office (I am at work XD). I have it resolved, but I want to make some more tests before uploading it!
Last Edit: 1 year 3 weeks ago by Xelloss.

CreateBookList Custom Scripts "options" value (and some of my beta scripts) 1 year 3 weeks ago #46387

  • Xelloss
Ok, I don't have much time now to test it correctly, but if you want to give it a try, this is the beta version.

If you have many comics with special characters in the series name, please check whether the script's results are correct :P

In any case, if you can give me the volume and comic ids of comics you found problems with (the more, and the more varied, the better) so I can run some tests, I would be grateful.

edit: I found a more elegant solution to the above... and an easier one... instead of converting integers to strings with str() I now use unicode(), which of course doesn't have problems with unicode characters...

Here is the file:


File Attachment:

File Name: SameXDiffe...-2-3.zip
File Size:2 KB


To sum it up:

The problem was that I needed all "x" values in "string" format, because I used them as a string id to make the different groups in a dictionary. As I didn't know whether the value was a string, an integer, a float, etc. (it could be any of them, depending on the values the user used), I just converted "anything" to a string with str(). If the value was already a string, it just kept it as it was. OR SO I THOUGHT

The thing is that str() works great with unicode strings (transforming them to "Python" strings) UNLESS there is some non-ASCII character in them. In that case, instead of ignoring the character, it just gave "" as the answer for the whole string. This made every "unicode string with a special character" the same for the algorithm, so they all ended up in the same group (giving false positives).
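
To illustrate why the false hits came in clusters, here is a minimal sketch of the effect. The names lossy_key and group_by_series are made up for this example and are not taken from the script, the ids are placeholders, and lossy_key only imitates the behaviour described above (a non-ASCII text turning into an empty key); it is not how str() itself is implemented.

```python
from collections import defaultdict


def lossy_key(value):
    # Hypothetical stand-in for the faulty conversion: any text that
    # contains a non-ASCII character collapses to an empty key.
    text = u"%s" % (value,)
    try:
        text.encode("ascii")
    except UnicodeError:
        return u""
    return text


def group_by_series(books, make_key):
    # Group placeholder issue ids by the key derived from the series name.
    groups = defaultdict(list)
    for series, issue_id in books:
        groups[make_key(series)].append(issue_id)
    return dict(groups)


books = [
    (u"Father\u2019s Day", 101),  # curly apostrophe is non-ASCII
    (u"Ragnar\xf6k", 102),        # o with diaeresis is non-ASCII
    (u"Batman", 103),             # plain ASCII placeholder series
]

collapsed = group_by_series(books, lossy_key)
# Both non-ASCII series now share the key u"", so they look like one group.
print(sorted(collapsed.items()))
```

Both non-ASCII series end up under the same empty key, which is how unrelated comics get matched with each other, and why excluding one of them makes the other false hit disappear too.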

This was all because I had no idea Python has two types of string, Python strings and unicode strings... (everything works almost the same with the two, but not very well when you mix them).

To solve it (in the latest version; in the previous one I did something more complicated with exceptions that worked too) I just used the unicode version of str(), that is, unicode(), and used unicode strings instead of Python strings to make the groups. This seemed to work fine in my tests, but I ask you to test it with yours if you can.

IN OTHER WORDS: If you are making a script, and you are working with values with special characters (any text from Comic Vine), and you want to convert an integer, float, or whatever to a string-type value, use unicode() and not str(), especially if you don't know the type of the thing you are converting.
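
A minimal sketch of that advice follows. The helper group_key, the "|" separator and the volume numbers are assumptions for illustration, not the code in the attached script. The try/except is only there so the same snippet also runs where unicode() does not exist (Python 3, where str is already a Unicode type).

```python
try:
    text_type = unicode  # Python 2 / IronPython 2.x
except NameError:
    text_type = str      # Python 3: str is already Unicode text


def group_key(*values):
    # Convert every value (text, int, float, ...) to Unicode text and
    # join the parts into a single key usable in a dictionary.
    return u"|".join(text_type(v) for v in values)


# Placeholder volume numbers, used only to show mixed types in one key.
key_a = group_key(u"G\xf8dland Finale", 1)
key_b = group_key(u"Ragnar\xf6k", 2)

# The two keys stay distinct even though both contain non-ASCII text.
print(key_a != key_b)
```

This assumes the text values are already unicode objects or numbers. A byte string containing non-ASCII characters would still need to be decoded explicitly before being passed in.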
Last Edit: 1 year 3 weeks ago by Xelloss.
The following user(s) said Thank You: boshuda, jkthemac

CreateBookList Custom Scripts "options" value (and some of my beta scripts) 1 year 3 weeks ago #46403

  • jkthemac
That seems to have done the trick, thanks!

CreateBookList Custom Scripts "options" value (and some of my beta scripts) 1 year 3 weeks ago #46404

  • Xelloss
OK, I will package the script then and make a topic for it, as it is not beta anymore :P