Python Scripts for ComicRack

TOPIC: [Request]Myanimelist Scraper for manga

[Request]Myanimelist Scraper for manga 6 years 5 months ago #13421

  • arathon
  • Junior Boarder
  • Posts: 27
  • Thank you received: 1
  • Karma: 1
I'm a manga reader, and I want to ask if you could make a manga scraper like the ComicVine one for comics. I tried to do it myself, but unsuccessfully: I'm not a coder and I don't know Python well. I've found that MyAnimeList provides an API for its database, and there are also some unofficial APIs that implement additional functionality. In fact, for now the only site that offers some documentation is myanimelist.net; even though it has a smaller database than MangaUpdates, it is useful enough.

I hope that this great community appreciates my request.


P.S.: The only bad side of this site is the publisher data: they list the magazine rather than the main publisher.

Re: [Request]Myanimelist Scraper for manga 6 years 5 months ago #13422

  • cbanack
  • Platinum Boarder
  • Posts: 1318
  • Thank you received: 503
  • Karma: 181
The API is here:

myanimelist.net/modules.php?go=api#animemangasearch

I took a quick look at it, and everything you'd need to write a scraper does seem to be there. The API is much simpler than ComicVine's, which is good because it makes it easier to learn and use, but it also means that the resulting scraper will be a lot simpler than Comic Vine Scraper, and will probably behave somewhat differently.

I don't have time these days to write a second scraper app--just keeping ComicVine up to date and functional is using up most of my (scarce) free time. But I am willing to offer bits of advice and suggestions if someone else wants to make this their project. :)
The following user(s) said Thank You: arathon

Re: [Request]Myanimelist Scraper for manga 6 years 5 months ago #13424

  • arathon
  • Junior Boarder
  • Posts: 27
  • Thank you received: 1
  • Karma: 1
It would be awesome if someone could do it. I hope that someone takes on this project.

Re: [Request]Myanimelist Scraper for manga 6 years 5 months ago #13426

  • cYo
  • Moderator
  • Posts: 3476
  • Thank you received: 675
  • Karma: 181
@cbanack:

You did not write your own transparent scraper proxy layer where you only have to reimplement the backend for a new scraper?

Just kidding :)

Re: [Request]Myanimelist Scraper for manga 6 years 3 months ago #15315

  • arathon
  • Junior Boarder
  • Posts: 27
  • Thank you received: 1
  • Karma: 1
Bump!!

Is there a very KIND person who can make a scraper for MyAnimeList/MangaUpdates???? :)

I noticed, though I may be wrong, that the Bedetheque scraper could be ported to these sites with "some" changes. Unfortunately I have zero knowledge of regular expressions and the other things involved; I have only studied C, so I can't help... sigh...

By the way, is there a way to debug a ComicRack script so I can at least see what every line of code does, and then try to modify it on my own? Thanks.

Re: [Request]Myanimelist Scraper for manga 6 years 3 months ago #15318

  • Stonepaw
  • Moderator
  • Posts: 920
  • Thank you received: 267
  • Karma: 173
Unfortunately I don't have time at the moment to write a scraper sorry :(
arathon wrote:
By the way, is there a way to debug a ComicRack script so I can at least see what every line of code does, and then try to modify it on my own? Thanks.
Yes there is: Linky

Re: [Request]Myanimelist Scraper for manga 6 years 3 months ago #15433

  • Joentjuh
  • Fresh Boarder
  • Posts: 17
  • Thank you received: 1
  • Karma: 1
A great idea, I would love to see a manga scraper plug-in. (have been thinking about this for a while now, though my ideas were a bit more ambitious than a simple scraper)

I do foresee a few 'minor' obstacles:

- Scanlations are usually released on a per chapter basis and some release groups are quite stingy about modifying 'their' releases (i.e. removing added content).
- Most on-line databases (those I know of at least) contain very little information about the individual chapters/volumes of a manga, if any (usually only the total, and often incorrect, number of volumes). MangaUpdates has a list (incomplete, but still better than nothing) of which scanlations were made by which groups... Sadly it has no API.
- There usually aren't any (volume) covers available for comparison (doesn't help individual chapters)... I'm working on a solution for this, yet it's nowhere near ready for public use (not quite sure how I'm going to link content).
- Assuming we're going to work on a per volume basis, there is nearly no information about which chapters belong to which volume (especially the case for the longer and/or ongoing series). Again, I'm working on it, but it can take a looong while and even then would need to be constantly updated.
- Scanlations come in many languages, as do the titles (and summaries).
- While digital comic releases often have some kind of naming standard, manga does not - ranging from very convoluted to less than you need.
- Assuming the ComicRack plug-in only uses the English content, you're still dealing with translations (which may differ per site or release group)
- Also not to forget is the display title (same issues as the above), some may like to use the English title, whilst others may like to use the Romaji or even the original title.

You could, of course, limit the plug-in to only use mangas officially released in English (i.e. TokyoPop or Viz)... but what would be the point in that.

Re: [Request]Myanimelist Scraper for manga 6 years 3 months ago #15436

  • arathon
  • Junior Boarder
  • Posts: 27
  • Thank you received: 1
  • Karma: 1
I've also thought about these problems. First, in this case we can't scrape per volume because there is almost no data anywhere, even though this could be accomplished by using the descriptions on the publishers' pages, like Panini, J-Pop, and Star Comics (all Italian publishers). But I think this is useless, and I prefer a general summary of the series to one per volume because of spoilers.

Second, scraping the info is rather difficult this time because, as you said, the names differ a lot, and for the same name you can find a lot of series. Combined with the lack of an API, that makes scraping pretty much impossible, also because MangaUpdates has no method for picking out the series that you want. I've tried some searches, and unfortunately in a lot of cases they keep returning too many series. For this, MyAnimeList is better, even though its database is very small.
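One way to cope with a search that returns too many series is to rank the results by how closely their titles match the query and only show the top few. The sketch below is an illustration, not part of any existing scraper: the function name `rank_candidates` and the candidate layout (dicts with a `title` and an optional `synonyms` list) are assumptions, and no real site's response format is implied. It uses only `difflib` from the standard library.

```python
import difflib


def rank_candidates(query, candidates, limit=5, cutoff=0.0):
    """Return up to `limit` candidates, best title match first.

    `candidates` is a list of dicts with a 'title' key and an optional
    'synonyms' list (a hypothetical layout, chosen for illustration).
    """
    wanted = query.lower().strip()

    def score(candidate):
        # Compare the query against the main title and every synonym,
        # and keep the best similarity ratio (0.0 to 1.0).
        names = [candidate["title"]] + list(candidate.get("synonyms", []))
        return max(
            difflib.SequenceMatcher(None, wanted, name.lower()).ratio()
            for name in names
        )

    scored = [(score(c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for s, c in scored if s >= cutoff][:limit]
```

Raising `cutoff` (for example to 0.6) would drop weak matches entirely instead of merely pushing them down the list; the right value would have to be found by trying it on real search results.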

But my HOPE that someone CRAZY will make this plugin still remains...

Re: [Request]Myanimelist Scraper for manga 6 years 3 months ago #15454

  • quidam
  • Platinum Boarder
  • not for nom
  • Posts: 448
  • Thank you received: 23
  • Karma: 30
mangaupdates.com would be perfect but i wouldn't complain about a MAL scraper either if that's easier.

a proper script should start with its own custom filename parser that would properly catch series title and volume/chapter values, as the one in CR is not well suited to the way scanlations are named and there are quite a few mismatches.
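A custom filename parser of the kind described above could look roughly like the following. This is a minimal sketch under stated assumptions: the function name `parse_manga_filename`, the regular expressions, and the returned keys are all made up for illustration, and they cover only a few common naming patterns (bracketed group tags, `v05`/`vol 5`, `c042`/`chapter 12.5`). It is not how ComicRack's own parser works.

```python
import re

# Assumed patterns; real scanlation names vary far more than this.
_EXT = re.compile(r"\.(cbz|cbr|zip|rar|7z|pdf)$", re.IGNORECASE)
_GROUP = re.compile(r"[\[\(]([^\]\)]+)[\]\)]")
_VOLUME = re.compile(r"(?<![A-Za-z])(?:v|vol|volume)[\s._]*(\d+)", re.IGNORECASE)
_CHAPTER = re.compile(
    r"(?<![A-Za-z])(?:c|ch|chapter)[\s._]*(\d+(?:\.\d+)?)", re.IGNORECASE
)


def _normalize(number):
    # Drop leading zeros but keep a decimal part: "042" -> "42", "010.5" -> "10.5".
    whole, _, frac = number.partition(".")
    whole = whole.lstrip("0") or "0"
    return whole + ("." + frac if frac else "")


def parse_manga_filename(filename):
    """Split a scanlation filename into title, volume, chapter and group tags."""
    name = _EXT.sub("", filename)
    # Anything in [] or () is treated as a tag (usually the release group,
    # but a year such as "(2011)" would be captured here too).
    groups = _GROUP.findall(name)
    name = _GROUP.sub(" ", name).replace("_", " ")

    volume = _VOLUME.search(name)
    chapter = _CHAPTER.search(name)

    # The title is assumed to be everything before the first volume/chapter marker.
    starts = [m.start() for m in (volume, chapter) if m]
    cut = min(starts) if starts else len(name)
    title = re.sub(r"[\s\-.]+$", "", name[:cut])
    title = re.sub(r"\s+", " ", title).strip()

    return {
        "title": title,
        "volume": _normalize(volume.group(1)) if volume else None,
        "chapter": _normalize(chapter.group(1)) if chapter else None,
        "groups": groups,
    }
```

For example, `parse_manga_filename("[GroupX] One_Piece_v05_c042.cbz")` yields the title `One Piece`, volume `5`, chapter `42`, and the tag `GroupX`, which could then be written to the "Scan Information" field. Names that use dots as word separators, or titles that themselves contain something like "Vol 2", would need extra rules.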

there's no way to scrape per chapter/volume because there's no source online that goes into this much detail. so a global series scraper has to be sufficient. and that's okay for me, i don't need to download volume covers per volume, etc. a custom plot summary per volume would be nice but i've seen such info only on publisher webpages (tokyopop, dark horse), not on mangaupdates or mal.

still scraping basic info about series would be fine.

cheers to anyone who's up to the task. ;]
Last Edit: 6 years 3 months ago by quidam.

Re: [Request]Myanimelist Scraper for manga 6 years 2 months ago #15723

  • Isuldor
  • Fresh Boarder
  • Posts: 11
  • Karma: 0
Joentjuh wrote:
- Scanlations are usually released on a per chapter basis and some release groups are quite stingy about modifying 'their' releases (i.e. removing added content).
A metadata scraper wouldn't need to remove any pages. If anything, we could credit the scanlators in the "Scan Information" field.
Joentjuh wrote:
- Most on-line databases (those I know of at least) contain very little information about the individual chapters/volumes of a manga... If any.
This is actually the most compelling argument against creating a scraper. There just isn't that much rich metadata to get for manga series. For instance, the staff on western comics changes over time, whereas manga series generally have a single author that never changes. It'd be a lot of work for a little gain. Improving the "proposed values" for manga filenames would probably make a bigger improvement for manga CR libraries. Maybe tagging a book as "Manga" in CR could engage a set of different filename regexes for proposed values.
Joentjuh wrote:
- While digital comic releases often have some kind of naming standard, manga does not - ranging from very convoluted to less than you need.
Chicken or egg problem here. We'd just have to try our best and hope that people care enough about library systems like CR to rename the files themselves, or petition the scanlators/scene to standardize.
Joentjuh wrote:
- Also not to forget is the display title (same issues as the above), some may like to use the English title, whilst others may like to use the Romaji or even the original title.
I think this could be solved with just a setting in the scraper. You could call them "User Preferences" :p
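Such a display-title setting could be as small as an ordered list of preferred title kinds with a fallback. The sketch below assumes a series record is a dict with `english`, `romaji` and `original` keys; those field names and the function name `choose_display_title` are hypothetical and are not taken from any site's API.

```python
def choose_display_title(series, preference=("english", "romaji", "original")):
    """Return the first non-empty title in the user's preferred order.

    `series` is a dict whose keys ('english', 'romaji', 'original') are
    assumed for illustration; missing or blank entries are skipped.
    """
    for key in preference:
        value = (series.get(key) or "").strip()
        if value:
            return value
    return ""
```

A user who prefers Romaji would pass `preference=("romaji", "english", "original")`; if the Romaji title is missing for a given series, the English one is used instead of leaving the field blank.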
Last Edit: 6 years 2 months ago by Isuldor.