Python Scripts for ComicRack

TOPIC: Bonelli (www.sergiobonelli.it) Scraper v4 BETA (Italian publisher)

Bonelli (www.sergiobonelli.it) Scraper v3.3 BETA (Italian publisher) 2 years 2 months ago #43137

  • rmagere
  • Gold Boarder
  • Posts: 221
  • Thank you received: 24
  • Karma: 7
Hi, I'm not sure whether this is a temporary problem or limited to me, but as of today the scraper has stopped working.

On 11/09/15 I scraped a couple of issues with no problem whatsoever. Today (16/09) I tried to do the same, and the only result I get is "Ignorati" (ignored).
  • I thought the "lista delle collane" (series list) might be out of date, so I tried to generate a new one. The process started and ended almost immediately, and when I checked, the file "Collane_Bonelli.txt" was empty.
  • I thought the website had changed. However, when I compared the links in a backup copy of "Collane_Bonelli.txt" with those on the website, they were still correct.
  • Other scrapers are working (both Comicvine and C.O.A.).
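
The empty "Collane_Bonelli.txt" described above suggests a guard that may be worth having: only overwrite the existing list when regeneration actually produced entries. This is a minimal sketch in plain Python, not the scraper's real code; the function name replace_if_not_empty is hypothetical and nothing here is taken from the script itself.

```python
import os
import shutil


def replace_if_not_empty(new_lines, target_path):
    """Write new_lines to target_path only when there is something to write.

    Returns True if the file was replaced, False if the old file was kept.
    """
    entries = [line.strip() for line in new_lines if line.strip()]
    if not entries:
        # Regeneration produced nothing: keep the existing list untouched.
        return False
    if os.path.exists(target_path):
        # Keep one backup copy next to the original before overwriting it.
        shutil.copyfile(target_path, target_path + ".bak")
    with open(target_path, "w") as handle:
        handle.write("\n".join(entries) + "\n")
    return True
```

With a guard like this, a failed regeneration would leave the previous list in place instead of replacing it with an empty file.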

Any ideas/suggestions?

Thank you
The administrator has disabled public write access.
The following user(s) said Thank You: duque

Bonelli (www.sergiobonelli.it) Scraper v3.3 BETA (Italian publisher) 2 years 2 months ago #43139

  • mizio66
  • Platinum Boarder
  • Started reading comics at 4... and still counting!
  • Posts: 451
  • Thank you received: 143
  • Karma: 67
I know. It's a (stupid) double space in the HTML code, added in a place where there used to be a single one. Not sure whether it was on purpose or not...
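
For illustration only, here is one way a match could be made tolerant of this kind of change, assuming the scraper looks for literal HTML fragments. The helper name tolerant_pattern and the markup fragment are hypothetical; the real page structure is not shown in this thread.

```python
import re


def tolerant_pattern(literal):
    """Build a regex that matches `literal`, accepting any run of whitespace
    wherever the literal itself contains whitespace."""
    parts = literal.split()
    return re.compile(r"\s+".join(re.escape(part) for part in parts))


# Hypothetical markup fragment, used only to show the idea.
pattern = tolerant_pattern('<div class="title">')
print(bool(pattern.search('<div class="title">')))   # single space: True
print(bool(pattern.search('<div  class="title">')))  # double space: True
```

This only covers whitespace differences; it would not help if the site changes the tags or attributes themselves.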

Be patient, I'll fix it...

Ciao,

m
The following user(s) said Thank You: rmagere, duque

Bonelli (www.sergiobonelli.it) Scraper v3.3 BETA (Italian publisher) 2 years 2 months ago #43143

  • rmagere
Hi Mizio
Thank you very much!

Bonelli (www.sergiobonelli.it) Scraper v3.4 BETA (Italian publisher) 2 years 1 month ago #43157

  • mizio66
Ufff, it was not just a "couple of spaces"... I think they made some low-level changes to the structure, but it should hopefully work for now.

Please test... the work on the site may not be finished yet, so another fix might be required.

Link to v3.4b

As usual, read the manual!

Enjoy,

M
Last Edit: 2 years 1 month ago by mizio66.
The following user(s) said Thank You: luke_70it, duque

Bonelli (www.sergiobonelli.it) Scraper v3.4 BETA (Italian publisher) 2 years 1 month ago #43158

  • rmagere
Thank you!
I will test later today and provide an update.

--EDIT--

Tested - all the issues I tried worked! :laugh: :silly: :laugh:

Thank you so much!
Last Edit: 2 years 1 month ago by rmagere.

Bonelli (www.sergiobonelli.it) Scraper v3.4 BETA (Italian publisher) 2 years 1 month ago #43177

  • rmagere
rmagere wrote:
Tested - all the issues I tried worked! :laugh: :silly: :laugh:

Interestingly, the scraper works perfectly, but the site has lost information. For example, Color Zagor now only scrapes issues 1 and 2, and no longer issue 3 as it used to, because issue 3 does not appear in "Arretrati" (back issues). However, the page that was scraped before the changes still exists. Similarly, Romanzi Bonelli jumps from 8 to 10, skipping 9, as it does not exist in "Arretrati".
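
A quick way to spot gaps like the Romanzi Bonelli one is to compare the scraped issue numbers against a continuous range. This is a minimal sketch; the function name missing_issues is hypothetical, and the sample list assumes issues 1 to 8 and 10 were scraped, which is my reading of the description above rather than actual scraper output.

```python
def missing_issues(scraped_numbers):
    """Return the issue numbers absent between the lowest and highest scraped."""
    present = set(scraped_numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


# Assumed sample: Romanzi Bonelli with issue 9 not listed.
print(missing_issues([1, 2, 3, 4, 5, 6, 7, 8, 10]))  # [9]
```

Note that this only finds holes inside the range. A missing latest issue, as in the Color Zagor case, would not show up this way.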

Bonelli (www.sergiobonelli.it) Scraper v3.4 BETA (Italian publisher) 1 year 2 weeks ago #46479

  • rmagere
It seems that some changes to the website have broken the "Genera Collane" function of the script.

Today I tried to scrape some data and, as the list was getting old, generated a new "Collane_Bonelli.txt". However, the file now contains only the following entries:
In libreria|sezioni/3460/188//in-libreria?q=&sezione_ricerca=10&tag_55=
Prossimamente in libreria|sezioni/311/175/edicola?q=&sezione_ricerca=10&tag_16=
In libreria|sezioni/3460/188//in-libreria?q=&sezione_ricerca=11&tag_55=
Prossimamente in libreria|sezioni/3460/189/in-libreria?q=&sezione_ricerca=11&tag_55=
In libreria|sezioni/3460/188/in-libreria?q=&sezione_ricerca=12&tag_55=
Prossimamente in libreria|sezioni/3460/189/in-libreria?q=&sezione_ricerca=12&tag_55=
In libreria|sezioni/3460/188//in-libreria?q=&sezione_ricerca=14&tag_55=
Prossimamente in libreria|sezioni/3460/189/in-libreria?q=&sezione_ricerca=14&tag_55=
In libreria|sezioni/3460/188//in-libreria?q=&sezione_ricerca=25&tag_55=
Prossimamente in libreria|sezioni/3460/189/in-libreria?q=&sezione_ricerca=25&tag_55=
In libreria|sezioni/3460/188//in-libreria?q=&sezione_ricerca=42&tag_55=
Prossimamente in libreria|sezioni/3460/189/in-libreria?q=&sezione_ricerca=42&tag_55=
In libreria|sezioni/3460/188//in-libreria?q=&sezione_ricerca=38&tag_55=
Prossimamente in libreria|sezioni/3460/189/prossimamente-in-libreria?q=&sezione_ricerca=38&tag_55=
I tried to regenerate it again, with the same outcome.

I recovered the file from a backup, and the script works fine with a correct "Collane_Bonelli.txt" (except when gathering data for the most recently released issue).
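
A freshly generated list could be sanity-checked before it replaces a good one. The sketch below assumes the file holds one "name|relative_url" entry per line, as in the lines pasted above, and flags two symptoms visible there: the same name repeated and paths containing "//". The function names are hypothetical and the checks are only a heuristic, not part of the script.

```python
from collections import Counter


def parse_series_list(text):
    """Split 'name|relative_url' lines into (name, url) pairs, skipping others."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or "|" not in line:
            continue
        name, url = line.split("|", 1)
        pairs.append((name.strip(), url.strip()))
    return pairs


def suspicious_entries(pairs):
    """Return warnings for repeated names and for paths containing '//'."""
    warnings = []
    counts = Counter(name for name, _ in pairs)
    for name, count in sorted(counts.items()):
        if count > 1:
            warnings.append("name repeated %d times: %s" % (count, name))
    for _, url in pairs:
        if "//" in url:
            warnings.append("double slash in path: %s" % url)
    return warnings


# First three entries of the broken file quoted above.
sample = "\n".join([
    "In libreria|sezioni/3460/188//in-libreria?q=&sezione_ricerca=10&tag_55=",
    "Prossimamente in libreria|sezioni/311/175/edicola?q=&sezione_ricerca=10&tag_16=",
    "In libreria|sezioni/3460/188//in-libreria?q=&sezione_ricerca=11&tag_55=",
])
for warning in suspicious_entries(parse_series_list(sample)):
    print(warning)
```

On the three sample lines this reports one repeated name and two paths with a double slash, which would be enough to refuse the new file and keep the backup.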

Bonelli (www.sergiobonelli.it) Scraper v3.4 BETA (Italian publisher) 11 months 4 days ago #46660

  • rmagere
rmagere wrote:
It seems that some changes to the website have broken the "Genera Collane" function of the script.
Today I tried to scrape some data and, as the list was getting old, generated a new "Collane_Bonelli.txt".

Besides breaking "Genera Collane", the changes to the Bonelli website have made it harder to find the right page to use manually.

For example, selecting "Arretrati" for Dampyr now sends you to the new shop page "shop.sergiobonelli.it/sezioni/10002/fume...ag_64=Dampyr&tag_0=1". However, the old page "www.sergiobonelli.it/sezioni/470/dampyr470" still works.

That's not the case for Adam Wild: the old page stored in "Collane_Bonelli.txt" returns "page not found", and I cannot work out the required page URL manually (I can use the quickscraper on the issue page to get around the problem).

Also, the script sometimes fails to obtain the Release Date and the Summary (e.g. it fails on Maxi Dampyr 8 and Lilith 17, but works on Maxi Dampyr 7).
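
Since some old pages still work and others do not, one possible approach is to try several candidate pages in order and use the first that responds. This is only a sketch under assumptions: first_working_url is a hypothetical helper, the first candidate below is a made-up placeholder, and the reachability check is a stand-in so the example runs without network access.

```python
def first_working_url(candidates, is_reachable):
    """Return the first candidate for which is_reachable(url) is true, else None."""
    for url in candidates:
        if is_reachable(url):
            return url
    return None


# Stand-in check; a real one would request the page and inspect the response.
known_good = {"www.sergiobonelli.it/sezioni/470/dampyr470"}
candidates = [
    "example.invalid/hypothetical-new-page",       # placeholder, not a real page
    "www.sergiobonelli.it/sezioni/470/dampyr470",  # old Dampyr page from this post
]
print(first_working_url(candidates, lambda url: url in known_good))
```

For a series like Adam Wild, where no known page works, the function returns None and the manual quickscraper route would still be needed.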
Last Edit: 11 months 4 days ago by rmagere.

Bonelli (www.sergiobonelli.it) Scraper v3.4 BETA (Italian publisher) 11 months 4 days ago #46661

  • mizio66
Clearly I need to rework that... be patient...

M

Bonelli (www.sergiobonelli.it) Scraper v3.4 BETA (Italian publisher) 11 months 4 days ago #46666

  • rmagere
mizio66 wrote:
Clearly I need to rework that... be patient...

Thank you!

I just like to keep a log of what I find for when you have some spare time ;)