Python Scripts for ComicRack

TOPIC: Bonelli (www.sergiobonelli.it) Scraper v4 BETA (Italian publisher)

Bonelli (www.sergiobonelli.it) Scraper v4 BETA (Italian publisher) 10 months 2 weeks ago #46774

  • mizio66
  • Platinum Boarder
  • Started reading comics at 4... and still counting!
  • Posts: 451
  • Thank you received: 143
  • Karma: 67
A new release is out! Links are on the first page.

PLEASE NOTE

- The website has changed quite a lot since the last release, so this release fixes most of the issues but not all of them.
- I suggest scraping issues one at a time at first and verifying the data, so you can always undo with CTRL+Z.
- The Quickscrape is needed in some cases (like Orfani or NN Annozero) because Bonelli decided to give several series the same Collana (series) name, so we now have duplicate numbering. To Quickscrape, find the volume on the shop.bonelli site and copy the link to that page.
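
For anyone who wants to sanity-check a link before pasting it into the Quickscrape box, here is a minimal sketch. It assumes the single-issue links keep the shape of the ones quoted in this thread (shop.sergiobonelli.it/scheda/<numeric id>/<slug>.html). The name `parse_scheda_link` is made up for illustration and is not part of the scraper.

```python
import re

# Assumed link shape, taken from the links quoted in this thread:
#   shop.sergiobonelli.it/scheda/<numeric id>/<slug>.html
# The scheme (http:// or https://) is optional.
SCHEDA_RE = re.compile(
    r"^(?:https?://)?shop\.sergiobonelli\.it/scheda/(\d+)/([^/]+)\.html$"
)


def parse_scheda_link(link):
    """Return (issue_id, slug) for a single-issue shop link, else None.

    Hypothetical helper: it only checks the shape of the link and
    does not contact the website.
    """
    match = SCHEDA_RE.match(link.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


print(parse_scheda_link("shop.sergiobonelli.it/scheda/37925/Zombie.html"))
print(parse_scheda_link("www.sergiobonelli.it/sezioni/3217/lukas3217"))
```

A series page (a "sezioni" link) returns None here, which is a quick way to tell that it is not a single-volume link.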

Report any issues and I will try to have a look (be patient though!)

Enjoy!

M
Last Edit: 10 months 2 weeks ago by mizio66.
The following user(s) said Thank You: rmagere, duque

Bonelli (www.sergiobonelli.it) Scraper v4 BETA (Italian publisher) 10 months 1 week ago #46776

  • rmagere
  • Gold Boarder
  • Posts: 221
  • Thank you received: 24
  • Karma: 7
As always: thank you so much!
I look forward to trying it out :)

P.S. same message for the diabolik one!


Also, regarding Orfani, I found that the old scraper with the old series link was still working. I will double-check over the weekend before upgrading to the latest version of the scraper.
It seems that although they created new shop pages for many series and deleted the old series pages, for some they have left the old links active.
Last Edit: 10 months 1 week ago by rmagere.

Bonelli (www.sergiobonelli.it) Scraper v4 BETA (Italian publisher) 10 months 1 week ago #46782

  • rmagere
  • Gold Boarder
  • Posts: 221
  • Thank you received: 24
  • Karma: 7
I'll use this post to log the issues I encounter so I'll be editing it over time.

Issues:
  • Titles with accented letters drop the letter. E.g. "verità" becomes "verit"
  • Martin Mystere does not scrape, or rather some of the issues do not: I found the range 115-225 problematic. I tried naming the series both Martin Mystere and Martin Mystere Bimestrale
  • Dylan Dog Color Fest does not work
  • Martin Mystere Gigante does not work. I wonder if the issue is the è in Mystère; I have tried with both è and e, but it does not scrape either way
  • Maxi Dylan Dog does not scrape
  • Speciale Brad Barron does not scrape
  • Speciale Dylan Dog does not scrape
  • Speciale Martin Mystere does not scrape
  • Storie da Altrove does not scrape
  • Lukas and Lukas Reborn are merged

It also appears that, when searching the website, not all issues are listed in the results. However, when you open one of the issues that were found, you can then scroll to the other issues that did not appear at first (but the link is randomish).

As always thank you so much for your work on the scrapers!
Last Edit: 10 months 6 days ago by rmagere.

Bonelli (www.sergiobonelli.it) Scraper v4 BETA (Italian publisher) 10 months 1 week ago #46785

  • mizio66
  • Platinum Boarder
  • Started reading comics at 4... and still counting!
  • Posts: 451
  • Thank you received: 143
  • Karma: 67
As I was trying to say, the changes made to the organization of the albums and to the collana naming leave the scraper powerless in some cases.
You have to use the quick scrape for most of those you mentioned above, or use the collana name as shown on the site, scrape (mind that there could be duplicate numbers, so this will not always work) and then rename the series as you like.

To verify, just go to the details of any of the arretrati and you will see that the same collana name applies to most of them.

The accented letters... well, I suppose it has to do with the codepage, as in some cases they work and in others they don't. We need a pattern to understand when it happens.
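
To illustrate the codepage guess, here is a small sketch of one way an accented letter can vanish. This is only an assumption about the mechanism, not a diagnosis of the scraper: if the page bytes are UTF-8 but get decoded as ASCII with errors ignored, "verità" comes out as "verit", which matches the symptom reported above. The function names are made up for the example.

```python
# -*- coding: utf-8 -*-

# "verità", written with an escape so the example does not depend on
# the encoding of this source file.
title = u"verit\u00e0"

# Assumption: the site serves UTF-8, so the title arrives as these bytes.
raw = title.encode("utf-8")


def decode_lossy(data):
    """Decode as ASCII and silently drop anything non-ASCII.

    This reproduces the reported symptom: the accented letter is lost.
    """
    return data.decode("ascii", "ignore")


def decode_utf8(data):
    """Decode with the encoding the bytes were actually written in."""
    return data.decode("utf-8")


lossy = decode_lossy(raw)
correct = decode_utf8(raw)

print(repr(lossy))
print(repr(correct))
```

If this is what happens, titles would break only on pages where the bytes pass through a lossy decode, which might explain why some work and others do not.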

Enjoy,

M
The following user(s) said Thank You: rmagere

Bonelli (www.sergiobonelli.it) Scraper v4 BETA (Italian publisher) 10 months 1 week ago #46789

  • rmagere
  • Gold Boarder
  • Posts: 221
  • Thank you received: 24
  • Karma: 7
Oh yes Bonelli made a right mess of their website :(

Regarding the pattern, that's why I was planning to update the above post to keep track of what I encounter. If you let me know what information is useful to post, I'll do so after my scrapes.

In the new scraper there is no "collana bonelli", so I am not sure how it determines what to use for the different series (nor, to be honest, can I figure out how Bonelli names their arretrati pages). One thing I noticed is that the search implemented by Bonelli is very, very bad, and so is their arretrati list when a series checkbox is selected (e.g. Martin Mystere might list all numbers from 100 to 200 except for some random ones, 116, 121, etc., which are only found by going to issue 117 and going back one page).
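
A quick way to spot such holes, as a rough sketch: given the issue numbers a listing did return, compute which numbers in the expected range are absent. The numbers below are just the example from this post, and `missing_issues` is a hypothetical helper, not something the scraper provides.

```python
def missing_issues(listed, first, last):
    """Return the issue numbers in [first, last] that are not in `listed`."""
    expected = set(range(first, last + 1))
    return sorted(expected - set(listed))


# Example data only: a listing of 100-200 with 116 and 121 absent,
# as in the Martin Mystere case described above.
listed = [n for n in range(100, 201) if n not in (116, 121)]

gaps = missing_issues(listed, 100, 200)
print(gaps)
```

The gaps are the issues that would need the quick scrape or a manual lookup.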

Not sure what they were thinking....

Bonelli (www.sergiobonelli.it) Scraper v4 BETA (Italian publisher) 10 months 1 week ago #46790

  • mizio66
  • Platinum Boarder
  • Started reading comics at 4... and still counting!
  • Posts: 451
  • Thank you received: 143
  • Karma: 67
Me neither.
The collane file is no longer needed, as they changed the structure of the site.

When you find something, always provide the link to the bonelli.shop page, so that I can test it.

Ciao,

M

Bonelli (www.sergiobonelli.it) Scraper v4 BETA (Italian publisher) 10 months 6 days ago #46795

  • rmagere
  • Gold Boarder
  • Posts: 221
  • Thank you received: 24
  • Karma: 7
mizio66 wrote:
When you find something, always provide the link to the bonelli.shop page, so that I can test it.

Will do, though I am not fully clear on which link would be useful: the only links I tend to refer to are the single-issue links.

E.g. for Lukas (due to the merger of series that you highlighted in a previous post), the comic scraped is Lukas Reborn instead.
So Lukas 9 (shop.sergiobonelli.it/scheda/37925/Zombie.html) is scraped as Lukas Reborn 9 (shop.sergiobonelli.it/scheda/39319/Il-prescelto.html)
Arretrati link: shop.sergiobonelli.it/sezioni/10002/fume...tag_64=Lukas&tag_0=1

Are the above the links I should provide or should I look for something else?
Last Edit: 10 months 6 days ago by rmagere.

Bonelli (www.sergiobonelli.it) Scraper v4 BETA (Italian publisher) 10 months 6 days ago #46796

  • mizio66
  • Platinum Boarder
  • Started reading comics at 4... and still counting!
  • Posts: 451
  • Thank you received: 143
  • Karma: 67
No, that's OK.

And this is what I mentioned: logically, it is not possible to scrape such albums automatically. How do we know whether it is one or the other?
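
The ambiguity can be shown with a small sketch that uses the two Lukas links from the post above. It assumes the scraper only knows the collana name and the issue number. `find_ambiguous` is a made-up name for illustration and is not how the scraper is actually written.

```python
from collections import defaultdict

# (collana name as shown on the site, issue number, shop page).
# Both entries come from the links quoted in this thread: after the
# merger, "Lukas" number 9 points at two different albums.
candidates = [
    ("Lukas", 9, "shop.sergiobonelli.it/scheda/37925/Zombie.html"),
    ("Lukas", 9, "shop.sergiobonelli.it/scheda/39319/Il-prescelto.html"),
]


def find_ambiguous(entries):
    """Group pages by (collana, number); keep keys with more than one page."""
    by_key = defaultdict(list)
    for collana, number, link in entries:
        by_key[(collana, number)].append(link)
    return {key: links for key, links in by_key.items() if len(links) > 1}


ambiguous = find_ambiguous(candidates)
for (collana, number), links in ambiguous.items():
    print(collana, number, "->", len(links), "possible pages")
```

With only the name and number to go on, nothing in the data says which page is the right one.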

So, to solve this, you have to use the quick scrape: just paste the same link as below in the box and it will work.

Ciao,

M

Bonelli (www.sergiobonelli.it) Scraper v4 BETA (Italian publisher) 10 months 6 days ago #46797

  • rmagere
  • Gold Boarder
  • Posts: 221
  • Thank you received: 24
  • Karma: 7
Quickscrape is exactly what I have been up to :)

In the case of Lukas the following pages still work:
Lukas: www.sergiobonelli.it/sezioni/3217/lukas3217
Lukas Reborn: www.sergiobonelli.it/sezioni/3375/lukas-reborn

So v3.4 can scrape them automatically.
I wonder if it would work to have a hybrid mode, with 3.4 as the default behaviour when the "sezioni" pages still exist and 4.0 used where those pages no longer exist.
The actual "issue scraping" would have to be 4.0 in both cases, though, as the links in those sections have changed to point to a shop.bonelli page with the new layout.
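
Roughly what I have in mind, as a sketch only: pick the listing strategy per series depending on whether an old "sezioni" page is known and still answers. The function names, the mode labels and the `page_exists` callback are all hypothetical; a real version would need an actual HTTP check and the real series table.

```python
def choose_listing_mode(series, sezioni_links, page_exists):
    """Pick how to list the issues of a series.

    Returns ("sezioni", link) when an old series page is known and still
    exists (3.4-style listing), otherwise ("shop", None) for the 4.0-style
    lookup. Issue pages themselves are always scraped the 4.0 way.
    """
    link = sezioni_links.get(series)
    if link is not None and page_exists(link):
        return ("sezioni", link)
    return ("shop", None)


# The two pages reported as still working in this post.
sezioni_links = {
    "Lukas": "www.sergiobonelli.it/sezioni/3217/lukas3217",
    "Lukas Reborn": "www.sergiobonelli.it/sezioni/3375/lukas-reborn",
}


def fake_page_exists(link):
    """Stand-in for a real HTTP check, so the example runs offline."""
    return link in sezioni_links.values()


print(choose_listing_mode("Lukas", sezioni_links, fake_page_exists))
print(choose_listing_mode("Orfani", sezioni_links, fake_page_exists))
```

Passing the existence check in as a function keeps the decision separate from the network code, so it can be tried without hitting the site.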

I have found that using the following search in Google, "N°, Titolo, M/A prima edizione site:www.sergiobonelli.it/sezioni/" (with or without the name of the series), helps me find the "hidden" collane that are still alive (www.google.com/search?q=N%C2%B0%2C+Titol...2F&ie=utf-8&oe=utf-8)
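
If it helps, the search can be generated per series with a few lines. This is a sketch that builds a Google search URL from the query text above plus an optional series name. `build_search_url` is a made-up helper, and the only query parameter it sets is `q`.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2 / IronPython

# Query text as given above; \u00b0 is the degree sign in "N°".
BASE_QUERY = u"N\u00b0, Titolo, M/A prima edizione site:www.sergiobonelli.it/sezioni/"


def build_search_url(series=None):
    """Return a Google search URL for the base query, optionally plus a series name."""
    query = BASE_QUERY if series is None else BASE_QUERY + u" " + series
    # Encode to UTF-8 bytes first so quoting behaves the same on Python 2 and 3.
    return "https://www.google.com/search?q=" + quote_plus(query.encode("utf-8"))


print(build_search_url())
print(build_search_url(u"Orfani"))
```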
Last Edit: 10 months 6 days ago by rmagere.

Bonelli (www.sergiobonelli.it) Scraper v4 BETA (Italian publisher) 10 months 3 days ago #46806

  • rmagere
  • Gold Boarder
  • Posts: 221
  • Thank you received: 24
  • Karma: 7
Ok so I had some time to go through an old "Collana Bonelli" and here are my findings on what can be used for a new "Collana Bonelli" based scraper:

Almost all series still have a dedicated page. Some have kept the same link; some have changed to a link that is the same as before but with the "sezione" number added at the end; and some (the ones with links ending in "/arretrati") have changed more significantly, with the new link aligned to the format of the others. I have also found a new series for Orfani. All series that were not immediately obvious were found by adding the series name to the Google search in the previous post.

Given that the series that used to have "arretrati" now have links closer to the other series, I wonder whether, rather than removing the single pages, they are simply hiding them and making them more consistent. If that were the case, we could expect the links for the missing series to reappear in the future and, maybe, the series that do not yet end with the "sezione" number to be changed to do so.

Below in the spoiler section the different links (hopefully I have made no mistakes :) ):

New Series
Warning: Spoiler! [ Click to expand ]


Unchanged Links
Warning: Spoiler! [ Click to expand ]


Links to which Sezioni number was added at the end
Warning: Spoiler! [ Click to expand ]


Bigger Changes
Warning: Spoiler! [ Click to expand ]


Still Exists but has no issues stored in it
Warning: Spoiler! [ Click to expand ]


Series I could not find
Warning: Spoiler! [ Click to expand ]


Hope this might be of help.