Python Scripts for ComicRack

TOPIC: [Request]Myanimelist Scraper for manga

Re: [Request]Myanimelist Scraper for manga 5 years 10 months ago #19223

  • rdu90
  • Fresh Boarder
  • Posts: 16
  • Thank you received: 1
  • Karma: 0
Ok, so I connected to myanimelist.net,

but... I don't really see the kind of information I want (the most important to me being title, followed by publication date),

not to mention that their search API is very limited (unless there is more than I saw; for example, I don't see how you can search for a single issue).

So, now that I have the technical part at a good point, I need to find the data. If anyone can shed any light, shed.


aaron
The administrator has disabled public write access.

Re: [Request]Myanimelist Scraper for manga 5 years 10 months ago #19224

  • rdu90
  • Fresh Boarder
  • Posts: 16
  • Thank you received: 1
  • Karma: 0
Right now, what I see that works for my own personal selfish use case is scraping Wikipedia. For some series, like Bleach and Naruto, they even have volume breakdowns.

That will make this thing of limited value to anyone who doesn't read my rather small variety of manga.

Maybe I should just try to make the parser a plugin, so that additional ones can be added for the variety of sources that will probably be necessary for this to work on any kind of scale.

If there is no central database, then it's the only thing I can think of that will work.
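
A rough sketch of what that plugin idea could look like in Python: each source registers a lookup function, and the scraper asks them one by one. All the names here (SOURCES, register_source, lookup) and the sample entry are made up for illustration; none of this is ComicRack's or any site's actual API.

```python
# Hypothetical registry of metadata sources; nothing here is a real API.
SOURCES = {}


def register_source(name):
    """Decorator that adds a lookup function to SOURCES under `name`."""
    def decorator(func):
        SOURCES[name] = func
        return func
    return decorator


@register_source("mal")
def lookup_mal(series, number):
    # Stub: stands in for a source that has no per-issue data.
    return None


@register_source("wikipedia")
def lookup_wikipedia(series, number):
    # Stub: a real parser would scrape a volume table here.
    # The entry below is invented sample data, not real metadata.
    sample = {
        ("Example Series", 1): {"title": "Example Title", "published": "2010-01"},
    }
    return sample.get((series, number))


def lookup(series, number):
    """Try each registered source in name order; return (source, data) or None."""
    for name in sorted(SOURCES):
        data = SOURCES[name](series, number)
        if data is not None:
            return name, data
    return None
```

Adding another source would then only mean writing one more decorated function, without touching the lookup loop.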

Re: [Request]Myanimelist Scraper for manga 5 years 10 months ago #19240

  • Joentjuh
  • Fresh Boarder
  • Posts: 17
  • Thank you received: 1
  • Karma: 1
rdu90 wrote:
Ok, so I connected to myanimelist.net,

but... I don't really see the kind of information I want (the most important to me being title, followed by publication date),

not to mention that their search API is very limited (unless there is more than I saw; for example, I don't see how you can search for a single issue).

So, now that I have the technical part at a good point, I need to find the data. If anyone can shed any light, shed.

aaron
The way I see it, there is no need to look up single issues on sites like MAL or MU. Having series information is enough. If you know which volume you're looking up (by number), and know how many volumes a series has, it's only a small step to find the correct volume on a site like Amazon, Kinokuniya, or, if you're lucky, manga.joentjuh.nl.
rdu90 wrote:
Right now, what I see that works for my own personal selfish use case is scraping Wikipedia. For some series, like Bleach and Naruto, they even have volume breakdowns.

That will make this thing of limited value to anyone who doesn't read my rather small variety of manga.

Maybe I should just try to make the parser a plugin, so that additional ones can be added for the variety of sources that will probably be necessary for this to work on any kind of scale.

If there is no central database, then it's the only thing I can think of that will work.

Using Wikipedia as a primary source of information would probably make this plug-in useless to most people. A quick count of the series on MU reveals they have about 37,000 series listed (not counting doujinshi, oneshots, webcomics, etc.). Even if MAL is not as large as MU, they still have more than enough to make it a viable source of information.

Sure, Wikipedia may have very detailed information on a manga or two, but how useful (and reliable) is it really, even if there were some sort of standardisation for the layout of the content (which is a pain to scrape as it is now)?
If you know the series and were able to find it on MAL/MU, you already know a great deal. If you have the user tell you which volume you're dealing with (optionally via a list like in the ComicVine plug-in), you can then try looking for the cover for that specific volume. Have the user confirm it afterwards if you're not sure it's the correct volume.

There is no manga version of ComicVine, and the only way to get more (reliable) information is to go straight to the publisher's website (some publishers even have a catalogue), which usually has about the same, or less, information than what's provided on Amazon, among others.
Last Edit: 5 years 10 months ago by Joentjuh.

Re: [Request]Myanimelist Scraper for manga 5 years 10 months ago #19348

  • rdu90
  • Fresh Boarder
  • Posts: 16
  • Thank you received: 1
  • Karma: 0
Yes, the data is what makes this a difficult problem :angry:

Well, something will be better than nothing. I'll attempt something this weekend and try to get something that someone else can at least try out and give feedback on.

Re: [Request]Myanimelist Scraper for manga 5 years 10 months ago #19356

  • Joentjuh
  • Fresh Boarder
  • Posts: 17
  • Thank you received: 1
  • Karma: 1
Okay, the preliminary version of the API is operational.

Link

Currently you can search by MU ID, MAL ID, Title, and ISBN.
MU/MAL/ISBN always returns a single result (or nothing).
Title always returns an array (or nothing) and searches through all known titles from both MU and MAL.
ISBN searches through both ISBN-10 and ISBN-13. Only fully numerical values work and no content correction is done (so remove all spaces and hyphens)... Will fix this when I'm bored.
Both POST and GET should work; the form uses GET by default (variables in the URL).
Every "Search by" option formats the results in the same way, with the exception of ISBN (which lists only the applicable issue, not the entire list).
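
Since the ISBN search does no content correction yet, a client can clean the value up before sending it. A minimal sketch of that, assuming nothing about the API beyond what is described above; normalize_isbn is a made-up helper name, and the ISBNs in the usage note are dummy values.

```python
import re


def normalize_isbn(raw):
    """Strip spaces and hyphens from `raw`.

    Returns the digit string if it is exactly 10 or 13 digits long,
    otherwise None. ISBN-10 values ending in "X" are rejected on purpose,
    because the search is described as accepting only numerical values.
    """
    cleaned = re.sub(r"[\s-]", "", raw)
    if re.match(r"^(?:[0-9]{10}|[0-9]{13})$", cleaned):
        return cleaned
    return None
```

For example, normalize_isbn("978-0-00-000000-2") gives "9780000000002", while normalize_isbn("000000000X") gives None. No checksum validation is done here.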

The data itself should be self-explanatory, a few comments though:
- mal_id can be "0", meaning no lookup for MAL has been done yet (or nothing was found). Lookups are done automatically whenever the manga entry itself is updated; since I've only added this recently, there are quite a few that have not been done yet
- volumes is a guessed value, use status for the source
- path is the absolute path to the "raw" files (read: large in size and dimensions) and should be prepended to the cover, front, side, and back fields. Use thumbnail_path for the thumbnails
- titles is a collection of all synonyms/aliases found on both MU and MAL
- issues is a list of all volumes in the database; volumes not in the database are not listed and no placeholder entry is added.
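
To make the field notes above concrete, here is a small sketch of how a client might handle one entry. The field names come from the list above, but the exact shape of the response (a plain dict, with the image fields next to path) is an assumption, the helper names are made up, and the sample values are invented.

```python
def has_mal_lookup(entry):
    """mal_id "0" means no MAL lookup was done yet, or nothing was found."""
    return str(entry.get("mal_id", "0")) != "0"


def image_urls(entry, thumbnails=False):
    """Prefix each image field that is present with path (or thumbnail_path)."""
    base = entry["thumbnail_path"] if thumbnails else entry["path"]
    urls = {}
    for field in ("cover", "front", "side", "back"):
        filename = entry.get(field)
        if filename:
            urls[field] = base + filename
    return urls


# Invented sample entry; example.invalid is a placeholder host.
sample = {
    "mal_id": "0",
    "path": "http://example.invalid/raw/",
    "thumbnail_path": "http://example.invalid/thumbs/",
    "cover": "vol01.jpg",
    "front": "vol01_front.jpg",
}
```

With this sample, has_mal_lookup(sample) is False, and image_urls(sample, thumbnails=True) builds the URLs from thumbnail_path instead of the raw path.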

Hope it makes sense, would love some feedback and/or tips.
I will leave the structure as it is for now, but can easily change it upon request.

-- Edit:
I have just put my new search engine online, which uses a custom implementation of search relevancy. I will try to clean the code up a bit and then also use it for the API. Search results will then be listed by a relevancy percentage... See "search: dragon" for an example.
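
For anyone wondering what a relevancy percentage over a list of titles can look like, here is a generic sketch using Python's difflib. This is not the custom implementation mentioned above (which is not described here); it is only one simple way to score a query against all known titles of a series. The function names and the sample series are made up.

```python
from difflib import SequenceMatcher


def relevancy(query, titles):
    """Best similarity between query and any title, as a percentage (0-100)."""
    q = query.lower()
    best = 0.0
    for title in titles:
        best = max(best, SequenceMatcher(None, q, title.lower()).ratio())
    return int(round(best * 100))


def rank(query, series_list):
    """Return (score, series) pairs, highest score first."""
    scored = [(relevancy(query, s["titles"]), s) for s in series_list]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Scoring against every synonym and keeping the best match means a series is found under any of its aliases, which fits the titles collection described above.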
Last Edit: 5 years 10 months ago by Joentjuh. Reason: Update

Re: [Request]Myanimelist Scraper for manga 5 years 8 months ago #20673

  • arathon
  • Junior Boarder
  • Posts: 27
  • Thank you received: 1
  • Karma: 1
Today I found something interesting in the XBMC forum: there is an Anime News Network scraper, but for movies and anime. However, that site provides lots of information about manga (only manga, not manhwa). The DB is certainly not as huge as MangaUpdates or MyAnimeList, but it is quite useful. Hope the script will help.

Here is the link to the forum topic:
forum.xbmc.org/showthread.php?t=69405
Last Edit: 5 years 8 months ago by arathon.

Re: [Request]Myanimelist Scraper for manga 5 years 6 months ago #22465

  • Joentjuh
  • Fresh Boarder
  • Posts: 17
  • Thank you received: 1
  • Karma: 1
Time to resurrect this topic.
rdu90 wrote:
Yes, the data is what makes this a difficult problem :angry:

Well, something will be better than nothing. I'll attempt something this weekend and try to get something that someone else can at least try out and give feedback on.

Any chance the project is not entirely off the table?

The cover database has grown quite a bit in the past couple of months and has become the largest currently online, with thousands of volumes still in the processing queue.
The API is in dire need of upgrading (it got a bit neglected during past database upgrades); I will give it higher priority if someone is actually working on a plug-in for ComicRack.

Re: [Request]Myanimelist Scraper for manga 5 years 5 months ago #22734

  • rdu90
  • Fresh Boarder
  • Posts: 16
  • Thank you received: 1
  • Karma: 0
No, it isn't entirely off the table. I allowed myself to backburner this once I clearly understood the problem.

My problem is that nothing I can do actually solves my problem. There is no source offering the format I need with the information I want.

My thought is that the approach is wrong, and I haven't done the research to find out the best way to solve that problem.

So, here is what I did. I wrote Python to change my ComicRack database. I embedded that into ComicRack, and it worked. Ok, so then, how do I get the data? Oh wait, there isn't anything. I can get covers, but what good is that if I don't know what volume the single issue I have belongs to? Some people collect volumes, so it works well for them, but I don't. How do I get the information necessary to figure out volumes?

Now, here is how I use ComicRack.

I get a cbr/cbz from some online source and I manually put some of the data into my local database. I don't fill out everything, but I put in the title, number, publisher and month/year. So, I'm building a database of my own for a handful of series that I'm interested in... maybe others are doing the same. So, maybe it is just better to be the guy that does THIS series, and if someone gets up earlier than I do, they put in this week's issue and I just get it when I run the plugin. And then some other gal out there puts in data for her series, and that's available to me if I ever get those issues.

So what I think is, there is no central database until we build it. And it doesn't really have to be that high tech. I have thoughts on how to just have a versioned file out on the internet that anyone can update, or just a method for people to share amongst themselves.

Now, is that possible given the online capabilities of ComicRack? I can think of a way to merge two databases given what I know of the API now.

thoughts?
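
To make the merge idea a bit more concrete, here is a minimal sketch that works on plain dicts rather than on ComicRack's own objects. The layout (entries keyed by series and number, with title/publisher/month/year fields) is assumed from the description above, merge_entries is a made-up name, and no ComicRack API is used.

```python
def merge_entries(mine, theirs):
    """Merge two {(series, number): fields} dicts.

    Shared data fills in whatever is missing locally; any non-empty
    local value wins over the shared one.
    """
    merged = {}
    for key in set(mine) | set(theirs):
        combined = dict(theirs.get(key, {}))
        for field, value in mine.get(key, {}).items():
            if value not in (None, ""):
                combined[field] = value
        merged[key] = combined
    return merged
```

Letting local values win is only one possible policy; a shared, versioned file would probably also need some way to flag conflicts instead of silently picking a side.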

Re: [Request]Myanimelist Scraper for manga 5 years 4 months ago #23309

  • Elhan
  • Fresh Boarder
  • Posts: 8
  • Thank you received: 1
  • Karma: 0
I haven't written any ComicRack scripts yet, but this summer I am planning to give it a try. I will let you know if I make progress.
"In a world without walls and fences, who needs Windows and Gates" :D

Re: [Request]Myanimelist Scraper for manga 5 years 4 months ago #23453

  • Elhan
  • Fresh Boarder
  • Posts: 8
  • Thank you received: 1
  • Karma: 0
Guys, there isn't much info about manga on MyAnimeList. Are there other sites that give good information? If a site has an API, that would be great; otherwise I will try to parse the site and scrape the information.
"In a world without walls and fences, who needs Windows and Gates" :D