Python Scripts for ComicRack

TOPIC: Bedetheque Scraper 2 - v4.9

Re: Bedetheque Scraper 2 - v1.1 6 years 5 months ago #15156

  • mizio66
I am seeing some issues with Unicode characters, of which French has many. It does not happen every time, and since I'm leaving for some R&R tomorrow, I won't be able to fix it quickly if it happens to you.

SOLVED in 1.2

So, the quick solution is to edit the BDTranslations.ini file in the script folder (C:\Users\YOURUSERNAMEHERE\AppData\Roaming\cYo\ComicRack\Scripts\Bedetheque Scraper 2 on Windows 7) and replace all accented characters with plain letters.

Or switch to English :-)
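
For anyone who would rather not edit the file by hand, the accent removal can be sketched in a few lines. This is a standalone Python 3 illustration, not part of the scraper; `strip_accents` is a hypothetical helper name, and applying it to BDTranslations.ini would also require knowing the file's encoding, which this thread does not state.

```python
import unicodedata

def strip_accents(text):
    """Replace accented letters with their unaccented base letters."""
    # NFD decomposition splits "é" into "e" plus a combining accent mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have the Unicode category "Mn"; drop them.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_accents("Série terminée"))  # Serie terminee
```

Ligatures such as "œ" have no decomposition and pass through unchanged, so they would still need replacing by hand.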

I know Baudelaire would kill me instantly on the spot and so would {Oo}... but some days we can survive...

Ciao, I'll drink a mojito to your health!

M
Last Edit: 6 years 4 months ago by mizio66.
The administrator has disabled public write access.
The following user(s) said Thank You: Ludwig

Re: Bedetheque Scraper 2 - v1.1 6 years 4 months ago #15280

  • 600WPMPO
I have added the Bedetheque Scraper manual (made by mizio66, all credit to him) to the first post...

Re: Bedetheque Scraper 2 - v1.2 6 years 4 months ago #15290

  • mizio66
New version 1.2 is out, see the first post for the download.

I hope the Unicode bugs are gone, please test. The forms should also be more responsive and the HTTP requests a bit faster... hopefully...

Enjoy,

M
Last Edit: 6 years 4 months ago by mizio66.

Some issues. 6 years 4 months ago #15328

  • Uthopia
Hi,

Very good work, but there are still some issues caused by capitalization.

Series names on bedetheque.com:
- Rubrique-à-Brac (www.bedetheque.com/serie-1018-BD-Rubrique-a-Brac.html)
- Sabre et l'épée (Le) (www.bedetheque.com/serie-12721-BD-Sabre-et-l-epee.html)

The script looks for:
- Rubrique-À-Brac
- Sabre Et L'Épée (Le)

and returns no results.


Another issue:
Series names on bedetheque.com:
- The bridge (www.bedetheque.com/serie-18572-BD-The-bridge.html)
- The Girl from Ipanema (www.bedetheque.com/serie-11333-BD-The-Girl-from-Ipanema.html)

The script looks for:
- Bridge (The)
- Girl From Ipanema (The)

It seems that on the site the article "The" stays in first place.
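
The mismatch is what you get when every leading article is moved to the end of the name. A minimal sketch of a swap that only touches French articles, going by the examples in this post: this is an assumption about how the site files names, not the scraper's actual code, and `move_french_article` and `FRENCH_ARTICLES` are hypothetical names.

```python
import re

# Assumption: only these French articles are moved to the end of the name,
# as in "Sabre et l'épée (Le)". English "The" is left in place.
FRENCH_ARTICLES = ("Le", "La", "Les")

def move_french_article(name):
    """Turn "Le Sabre et l'épée" into "Sabre et l'épée (Le)"."""
    pattern = r"^(%s)\s+(.+)$" % "|".join(FRENCH_ARTICLES)
    match = re.match(pattern, name, flags=re.IGNORECASE)
    if match is None:
        # No leading French article: return the name untouched.
        return name
    article, rest = match.groups()
    return "%s (%s)" % (rest, article.capitalize())

print(move_french_article("Le Sabre et l'épée"))  # Sabre et l'épée (Le)
print(move_french_article("The bridge"))          # The bridge
```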

Finally, there is another issue with “I.R.$.”, I don't know why.

Thank you for this handy script and all the work that goes with it.
Sorry for my bad English.

Have a good day.

Re: Some issues. 6 years 4 months ago #15335

  • mizio66
They should all have been fixed, thanks for reporting!

Quote:
Series names on bedetheque.com:
- Rubrique-à-Brac (www.bedetheque.com/serie-1018-BD-Rubrique-a-Brac.html)
- Sabre et l'épée (Le) (www.bedetheque.com/serie-12721-BD-Sabre-et-l-epee.html)

The cause of the error was a misapplied conversion of the series name to title case (e.g. from "bernard prince" to "Bernard Prince"). When a word began with an accented letter, that letter was capitalized too, and it broke the scrape...

Quote:
Another issue:
Series names on bedetheque.com:
- The bridge (www.bedetheque.com/serie-18572-BD-The-bridge.html)
- The Girl from Ipanema (www.bedetheque.com/serie-11333-BD-The-Girl-from-Ipanema.html)

Also fixed: I removed the swap of "The", as bedetheque.com does not use it...

Quote:
At last another issue for “I.R.$.“ don't know why.

This was the $; I had already noticed and fixed it before you reported it...
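
The title-case problem is easy to reproduce outside ComicRack. The sketch below is standalone Python 3, not the scraper's code: it shows `str.title()` producing exactly the names reported above, and one possible way around it, comparing names case-insensitively instead of re-capitalizing them. `same_series` is a hypothetical helper, and whether the actual fix works this way is not stated in this thread.

```python
def same_series(a, b):
    """Compare two series names without regard to letter case."""
    # casefold() is a stronger lower() intended for caseless matching.
    return a.casefold() == b.casefold()

site_name = "Rubrique-à-Brac"

# title() capitalizes every letter that follows a non-letter,
# including the "à" after the hyphen.
print(site_name.title())                          # Rubrique-À-Brac
print("Sabre et l'épée".title())                  # Sabre Et L'Épée

# A caseless comparison still matches the site's spelling.
print(same_series(site_name, site_name.title()))  # True
```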

I will release version 1.3 soon.

ciao

M
Last Edit: 6 years 4 months ago by mizio66.

Re: Bedetheque Scraper 2 - v1.3 6 years 4 months ago #15342

  • mizio66
New version 1.3 is out, see the first post for the download.

Some more fixes for specific scraping cases, see my previous post.

I also added a button for scraping a link from www.bedetheque.com directly: this should solve the worst cases... please test it and let me know!

Manual updated and sent to 600...

Enjoy,

M

Re: Bedetheque Scraper 2 - v1.3 6 years 4 months ago #15344

  • Uthopia
Everything seems to work perfectly.
The errors mentioned above are resolved.

Quickscrape is a very good idea (not tested yet)

Thx U again for all the good work.

Re: Bedetheque Scraper 2 - v1.3 6 years 4 months ago #15359

  • bakker_be
Hi there, I just installed version 1.3 and it is amazing how well it works compared to the beta :-)
Direct scrape doesn't seem to work for me, however, and gives conflicting messages. I first tried a TinTin album, and then a HS of Bernard Prince. TinTin raised an error, while Bernard Prince was reported as renamed, but nothing was actually changed.
Debug Log
Wednesday 29 June 2011 21:56:01
Caught AttributeError: 'NoneType' object has no attribute 'group'
C:\Users\Administrator.000\AppData\Roaming\cYo\ComicRack\Scripts\Bedetheque Scraper 2\BedethequeScraper2.py,423,parseAlbumInfo
Rename Log
** Scraping started **  1 Comic(s)   
============ Wednesday 29 June 2011 21:56:00 ===========
Wednesday 29 June 2011 21:56:01 >    serie-8372-BD-Tintin__1.html#32570Failed!   'NoneType' object has no attribute 'group'
Wednesday 29 June 2011 21:56:01 > [serie-8372-BD-Tintin__1.html#32570]    ** Skipped **


Renamed: 0   
Skipped: 1   
============= Wednesday 29 June 2011 21:56:01 =============   


 ** Scraping started **  1 Comic(s)   
============ Wednesday 29 June 2011 21:59:23 ===========
Wednesday 29 June 2011 21:59:23 > [album-8429-BD-D-hier-et-d-aujourd-hui.html]    ** Renamed **

Re: Bedetheque Scraper 2 - v1.3 6 years 4 months ago #15363

  • 600WPMPO
I have added the Bedetheque Scraper manual v1.3 (made by mizio66, all credit to him) to the first post...

Re: Bedetheque Scraper 2 - v1.4 6 years 4 months ago #15369

  • mizio66
Released v1.4, which should fix all of these in the direct link scraper... as I said, there was not a lot of testing there, but more fixes can still come!

Errors reported and fixed:
- Note that the TinTin error reported above is NOT an error: that is a SERIE link, not an ALBUM link. An album's link always includes ALBUM (strange, who would have guessed!) in the text of the link itself.

- The Bernard Prince error was due to the missing number for that release. It should be fixed now as well.

- I think cover retrieval for fileless comics was not always working; it should work better now.
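
The "'NoneType' object has no attribute 'group'" message in the log above is what Python raises when a regular expression finds no match and the code calls .group() on the result anyway. A minimal sketch of checking a link before parsing it, assuming album pages are named like the one in the rename log (album-<id>-BD-<title>.html); `ALBUM_LINK` and `album_id` are hypothetical names, not taken from BedethequeScraper2.py.

```python
import re

# Assumption: album links look like "album-8429-BD-D-hier-et-d-aujourd-hui.html".
ALBUM_LINK = re.compile(r"album-(\d+)-BD-[^/]*\.html")

def album_id(link):
    """Return the numeric album id from a link, or None for non-album links."""
    match = ALBUM_LINK.search(link)
    if match is None:
        # Calling match.group() here would raise the AttributeError from the log.
        return None
    return match.group(1)

print(album_id("album-8429-BD-D-hier-et-d-aujourd-hui.html"))  # 8429
print(album_id("serie-8372-BD-Tintin__1.html#32570"))          # None
```

With a check like this, a SERIE link can be rejected with a clear message instead of a stack trace.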

Enjoy, file is in the first post as usual!

M