Python Scripts for ComicRack

TOPIC: Bedetheque Scraper 2 - v4.9

Bedetheque Scraper 2 - v4.9 7 months 1 week ago #48798

  • mizio66
  • Offline
  • Platinum Boarder
  • Started reading comics at 4... and still counting!
  • Posts: 459
  • Thank you received: 149
  • Karma: 69
Hi, the ISBN code has not been touched... by me. So if BDTQ changed some of its HTML, it might no longer work.
As I don't have a lot of time to check the script, I cannot promise when, but I'll have a look. I scraped quite a few albums in the last few weeks with no issues, but I will check the things above.
Cheers!
The administrator has disabled public write access.

Bedetheque Scraper 2 - v4.9 7 months 1 week ago #48800

  • Chgros
  • Offline
  • Fresh Boarder
  • Posts: 2
  • Karma: 0
Well, I am no Python developer at all, but I repaired the ISBN data.

I don't know if there are other changes in the HTML on the Bedetheque side.


File: BedethequeScraper2.py

Change
ALBUM_ISBN_PATTERN = r'<label>ISBN\s:\s.*?\">(.*?)<'
to
ALBUM_ISBN_PATTERN = r'<label>ISBN\s:\s?</label>(.*?)</'
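
For anyone who wants to try the change before editing the script, here is a minimal sketch comparing the two patterns. Only the two pattern strings come from this post. The HTML snippets, the ISBN value and the helper name extract_isbn are made up for illustration and are not copied from Bedetheque or from the script.

```python
import re

# The two patterns quoted in the post above.
OLD_ALBUM_ISBN_PATTERN = r'<label>ISBN\s:\s.*?\">(.*?)<'
NEW_ALBUM_ISBN_PATTERN = r'<label>ISBN\s:\s?</label>(.*?)</'

# Hypothetical markup (assumption): a layout where the value sits inside an
# attributed tag, and a layout where it directly follows the closing label.
OLD_STYLE_HTML = '<li><label>ISBN : </label><span itemprop="isbn">978-2-0000-0000-0</span></li>'
NEW_STYLE_HTML = '<li><label>ISBN : </label>978-2-0000-0000-0</li>'


def extract_isbn(pattern, html):
    """Return the first captured group, stripped, or None when nothing matches."""
    match = re.search(pattern, html)
    if match is None:
        return None
    return match.group(1).strip()


if __name__ == "__main__":
    print(extract_isbn(OLD_ALBUM_ISBN_PATTERN, NEW_STYLE_HTML))
    print(extract_isbn(NEW_ALBUM_ISBN_PATTERN, NEW_STYLE_HTML))
```

If the real page looks like the second snippet, the old pattern finds nothing because it waits for a `">` that never comes, while the new one captures the text right after the closing label.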

Bedetheque Scraper 2 - v4.9 7 months 3 days ago #48849

  • Chgros
  • Offline
  • Fresh Boarder
  • Posts: 2
  • Karma: 0
It seems something changed in the Bedetheque HTML (or quickscrape is broken).

If I use quickscrape, the title and series are wrong.
If I scrape the series, I get the correct information.

Example:
www.bedetheque.com/BD-Star-Wars-Poe-Dame...on-Black-294102.html

Also, if I try to quickscrape some albums, it freezes.
Example: www.bedetheque.com/BD-Nyarlathotep-60843.html

Can someone look at this?

Bedetheque Scraper 2 - v4.9 7 months 2 days ago #48850

  • mizio66
  • Offline
  • Platinum Boarder
  • Started reading comics at 4... and still counting!
  • Posts: 459
  • Thank you received: 149
  • Karma: 69
That is not an album link... Please check the manual for instructions...

Bedetheque Scraper 2 - v4.9 5 months 4 weeks ago #49252

  • pascal
  • Offline
  • Fresh Boarder
  • Posts: 10
  • Karma: 0
Hello,

First, thank you for the script.
I tried it on a few series, but I must admit I found the results quite poor.

----
The major problem I'm experiencing is that series or books are not found: the scraper does not seem to find any match for what I select in CR.
Well, to be honest, it found one: Incal (L').
But it typically ends like this:
Calling 'BD_start'...

=========================- Begin! -=========================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Nom Série = J.K.J. Bloche - T09 - Absent No = []
Recherche sur le Web avec Nom de Série: J.K.J. Bloche - T09 - Absent
Recherche générique dans www.bedetheque.com/search/tout?RechTexte...20absent&RechWhere=0
# Temps nécessaire: 0:00:02

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

My first thought was that I was doing something wrong.
I read the doc,
I went through this topic in the forum,
I tried to understand the script using debug mode,
and I understand it works fine with the quickscrape button. So I assume this has to do with the tool not recognizing the book on Bedetheque.

Should I do some kind of preprocessing on the title of the file (the book), or is there anything I should have done before launching the tool on my selection of books?

Any thoughts?

Thank you in advance.
Pascal

Bedetheque Scraper 2 - v4.9 5 months 4 weeks ago #49255

  • ninjaw
  • Offline
  • Senior Boarder
  • Posts: 63
  • Thank you received: 7
  • Karma: -2
This isn't very clear at all. However, it is clear that you are trying to search with no information except the file name, so you get no results.
Try to enter at least the name of the series. Your issue is not with the script but with ComicRack.
Last Edit: 5 months 4 weeks ago by ninjaw.

Bedetheque Scraper 2 - v4.9 5 months 4 weeks ago #49256

  • StudioNeuneu
  • Offline
  • Fresh Boarder
  • Posts: 11
  • Thank you received: 2
  • Karma: 0
I think you need at least the name of the series and the number of the comic (if there is no number, it will always find comic number 1).
Try renaming your files with the name of the series and the number of the comic (for example: J.K.J Bloche #09.cbz). ComicRack will fill in the series name and the number automatically, so you don't need to do anything except scrape your book.
That is what I do, and except for some files, it generally works.
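
If you have a lot of files in the "Series - TNN - Title" style, the renaming could be sketched like this. It is only an illustration under my own assumptions: the function name to_hash_format, the list of extensions and the exact pattern are hypothetical and are not part of the scraper or of ComicRack. It only computes the new name and does not touch any file.

```python
import re

# Assumption: the archive extensions we care about.
COMIC_EXTENSIONS = (".cbz", ".cbr", ".cb7", ".pdf")

# Assumption: names look like "Series - T09 - Title" (the title part is optional).
TOME_PATTERN = re.compile(
    r"^(?P<series>.+?)\s+-\s+T(?P<number>\d+)(?:\s+-\s+(?P<title>.+))?$"
)


def to_hash_format(filename):
    """Turn 'Series - T09 - Title.cbz' into 'Series #09.cbz'.

    Names that do not follow the TNN convention are returned unchanged.
    """
    stem, ext = filename, ""
    for known in COMIC_EXTENSIONS:
        if filename.lower().endswith(known):
            stem, ext = filename[:-len(known)], filename[-len(known):]
            break
    match = TOME_PATTERN.match(stem)
    if match is None:
        return filename
    return "%s #%s%s" % (match.group("series"), match.group("number"), ext)


if __name__ == "__main__":
    print(to_hash_format("J.K.J. Bloche - T09 - Absent.cbz"))
```

The extension is stripped by comparing against a known list rather than by splitting on the last dot, because series names such as J.K.J. Bloche contain dots themselves.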

Bedetheque Scraper 2 - v4.9 5 months 4 weeks ago #49257

  • ninjaw
  • Offline
  • Senior Boarder
  • Posts: 63
  • Thank you received: 7
  • Karma: -2
I agree with that. However, the best way to use the tool for a whole collection is to mass-rename your whole JKJ collection, then autonumber it, then scrape, then do a FILE RENAMING.

PS: Congrats on reading JKJ. You really should drop the TXX format and use the American #XX format, for lots of reasons.
Last Edit: 5 months 4 weeks ago by ninjaw.

Bedetheque Scraper 2 - v4.9 5 months 4 weeks ago #49258

  • pascal
  • Offline
  • Fresh Boarder
  • Posts: 10
  • Karma: 0
Thank you all,
I have no doubt I made a mistake.
I firmly believe the script is running OK; I just need to understand the patterns it expects in order to work.

So I understand, and yes, it works as long as the true name of the series is in the Series field of CR.

In my example, the series name (which, by the way, is the filename) is: J.K.J. Bloche - T09 - Absent No
where:
J.K.J. Bloche is the series name
T09 is the number of the comic
Absent No is the title of the comic

Looking up the series name on Bedetheque, it is actually Jerome K Jerome Bloche (not J.K.J.), which explains why it is not working.
So I need to preprocess this name somehow before running the scraper.

Keep in mind that the scraper looks first for the name of the series, and second for the number of the comic.
Often, the name of the series will not match 100% with what I have.
Another example is the series name Bruce J Hawker, while the series name on Bedetheque is Bruce J. Hawker.
:)
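
Here is a rough sketch of the kind of preprocessing I have in mind. It is my own assumption of how it could be done, not something the scraper provides: normalize, series_for_search and the ALIASES table are hypothetical names. Normalization handles small differences such as a missing dot, and the alias table covers abbreviations that cannot be guessed.

```python
import re
import unicodedata

# Hypothetical alias table (assumption): normalized local name -> name to search for.
ALIASES = {
    "jkj bloche": "Jerome K Jerome Bloche",
}


def normalize(name):
    """Drop accents, punctuation and case, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    without_punct = re.sub(r"[^\w\s]", "", without_accents)
    return " ".join(without_punct.lower().split())


def series_for_search(local_name):
    """Return the aliased series name if one is known, else the name unchanged."""
    return ALIASES.get(normalize(local_name), local_name)


if __name__ == "__main__":
    # Both spellings normalize to the same key.
    print(normalize("Bruce J Hawker") == normalize("Bruce J. Hawker"))
    print(series_for_search("J.K.J. Bloche"))
```

The result of series_for_search would then be put in the Series field of CR before scraping. Whether the scraper itself already tolerates such punctuation differences is something I have not checked.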

Bedetheque Scraper 2 - v4.9 5 months 4 weeks ago #49259

  • ninjaw
  • Offline
  • Senior Boarder
  • Posts: 63
  • Thank you received: 7
  • Karma: -2
You are correct, except that the comic number is 09, under the # label.
There is no T in ComicRack, and you should avoid V too for French comics.