Python Scripts for ComicRack

TOPIC: MAL scraper (in dev, looking for feedback)

MAL scraper (in dev, looking for feedback) 4 years 2 months ago #35588

  • Musera
  • Fresh Boarder
  • Posts: 4
  • Thank you received: 2
  • Karma: 1
Okay, to begin: this is not yet complete, but it shouldn't take too long. Right now it just has functions to get data from MAL, a search, and a basic matching algorithm. So overall it is only the beginning, but it is just a scraper, so it doesn't need a huge amount. Before proceeding, however, I want some feedback on how to handle a couple of things.
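To give an idea of what a basic matching step could look like, here is a minimal sketch, not the scraper's actual code. It assumes the search returns a plain list of candidate titles and scores them with `difflib.SequenceMatcher` from the standard library; the names `normalize`, `best_match` and the `threshold` value are hypothetical.

```python
import difflib


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences don't hurt the score."""
    return " ".join(title.lower().split())


def best_match(query, candidates, threshold=0.6):
    """Return (title, score) for the closest candidate, or None if none clears the threshold.

    The threshold of 0.6 is an arbitrary assumption and would need tuning on real data.
    """
    scored = [
        (difflib.SequenceMatcher(None, normalize(query), normalize(c)).ratio(), c)
        for c in candidates
    ]
    if not scored:
        return None
    score, title = max(scored)
    return (title, score) if score >= threshold else None
```

Something like `best_match("one piece", ["One Piece", "Naruto"])` would pick the first candidate; anything below the threshold is left for the user to resolve by hand.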

First up, here is the data the API returns:
[Spoiler: API response data not included]


As you can tell, it's not exactly in line with ComicRack's metadata fields. It's not far off, but I'd like feedback on which field goes where nonetheless.

Now the big problem: chapters. MAL only keeps track of the number of chapters and nothing else; in addition, it doesn't even keep that count for any ongoing manga I've looked at. So how to handle it? I've got a couple of ideas but want yours before proceeding.

That's all. The scraping functions should be finished in a couple of days, and then I've just got to build a GUI. The GUI is going to take some time because I don't want to base it on cbanack's ComicVine scraper. I'm looking to develop a more streamlined interface focussing on scraping multiple files at once. That's not to insult the fantastic ComicVine scraper, but I always felt it made the process of scraping multiple files needlessly long.
The following user(s) said Thank You: lg5, rmagere

Re: MAL scraper (in dev, looking for feedback) 4 years 2 months ago #35591

  • cYo
  • Moderator
  • Posts: 3476
  • Thank you received: 675
  • Karma: 181
As I was never a big manga reader, I can't really help here.
Maybe some others can weigh in.

Anyway, thank you for your effort :)

Re: MAL scraper (in dev, looking for feedback) 4 years 2 months ago #35597

  • lg5
  • Junior Boarder
  • Posts: 35
  • Karma: -1

Re: MAL scraper (in dev, looking for feedback) 4 years 2 months ago #35602

  • Musera
  • Fresh Boarder
  • Posts: 4
  • Thank you received: 2
  • Karma: 1
I'm still looking for feedback on what was mentioned in the OP, but I've now created a mockup of what I'm thinking of for the GUI. It's just a quick image done in Paint, but it gives a good idea of what I'm after.



Hopefully that is pretty self-explanatory, but I'll go through it and some of the features I'm looking at adding. (Remember, this is a mockup and may not be representative of the final product!)

Firstly, the left pane is a tree; files are grouped by their assumed series, based on their filenames. The tree can be expanded to show which files are included in each set. As mentioned earlier, MAL does not keep information on chapters, so the chapter or volume information is pulled out of the filename using regular expressions (this might be pretty iffy, but I won't know until I have the chance to examine filenames).
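As a rough sketch of the filename parsing and grouping described above (an illustration only, since real filenames haven't been examined yet): it assumes markers such as `v01`, `vol 1`, `c012`, `ch.12` or `chapter 12.5`, and treats everything before the first marker as the series name. The patterns and the helper names `parse_filename` and `group_by_series` are hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumed marker styles; real-world filenames will likely need more patterns.
VOLUME_RE = re.compile(r"\b(?:v|vol|volume)[\s.]*(\d+)", re.IGNORECASE)
CHAPTER_RE = re.compile(r"\b(?:c|ch|chapter)[\s.]*(\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_filename(filename):
    """Split a filename into an assumed series name, volume and chapter."""
    # Drop the extension and treat underscores as spaces so \b works as expected.
    stem = os.path.splitext(filename)[0].replace("_", " ")
    vol = VOLUME_RE.search(stem)
    chap = CHAPTER_RE.search(stem)
    # The series name is assumed to be everything before the first marker.
    cut = min([m.start() for m in (vol, chap) if m] or [len(stem)])
    return {
        "series": stem[:cut].strip(" -"),
        "volume": int(vol.group(1)) if vol else None,
        "chapter": float(chap.group(1)) if chap else None,
    }


def group_by_series(filenames):
    """Group filenames under a case-insensitive series key, as in the tree pane."""
    groups = defaultdict(list)
    for name in filenames:
        groups[parse_filename(name)["series"].lower()].append(name)
    return dict(groups)
```

Files without any recognizable marker come back with `None` for both volume and chapter, which is where a manual override in the GUI would come in.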

The middle panel is pretty self-explanatory: it is just the search results for the selected set. Double-clicking a result selects that manga for the set. In addition, if a single file is selected in the tree, it can be assigned a manga individually (useful if a mistake is made in matching file groups).

The right panel (top and bottom) just shows more in-depth information about the manga selected in the middle panel. The "SET" button on the bottom right selects that manga as the one to be used; this is the same as double-clicking in the middle panel.

As for how it works in the background: searches will be done concurrently, probably 4 at a time, following the order in the left panel (which will likely be sorted alphabetically, but I didn't do that in the mockup). I'm not fully sure how I'm going to download images; possibly 10 per set, with any not fetched being downloaded when the user selects the manga.
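The "4 at a time" idea could be sketched with a small thread pool. This is only an illustration in standard Python 3 using `concurrent.futures`; ComicRack scripts run under IronPython, as far as I know, so the real thing would probably need `threading` or a .NET equivalent instead. `search_all` and the `search_fn` callback are hypothetical names, with `search_fn` standing in for whatever function queries MAL.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(series_names, search_fn, max_workers=4):
    """Run search_fn once per series, with at most max_workers searches in flight.

    Results are returned as (series, result) pairs in alphabetical order,
    matching the assumed ordering of the left panel.
    """
    ordered = sorted(series_names, key=str.lower)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though calls overlap in time.
        results = list(pool.map(search_fn, ordered))
    return list(zip(ordered, results))
```

Because `pool.map` keeps the input order, the results can be written straight back into the tree without any extra bookkeeping.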

Also, each file that is not seen as the first chapter will first be searched for inside ComicRack; if a previous entry is located, then information will be grabbed from that.
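A minimal sketch of that lookup, under stated assumptions: the library is modelled here as a plain list of dicts with `series`, `chapter` and `file` keys rather than ComicRack's real book objects, and a file with an unknown chapter is treated like a first chapter (that is, it goes to the online search). The function name `inherit_metadata` is hypothetical.

```python
def inherit_metadata(book, library):
    """Return series-level fields copied from the nearest earlier chapter, or None.

    `book` and the entries of `library` are plain dicts in this sketch;
    this is not ComicRack's actual book API.
    """
    chapter = book.get("chapter")
    if chapter is None or chapter <= 1:
        return None  # first or unknown chapter: fall through to an online search
    earlier = [
        b for b in library
        if b["series"].lower() == book["series"].lower()
        and b.get("chapter") is not None
        and b["chapter"] < chapter
    ]
    if not earlier:
        return None
    source = max(earlier, key=lambda b: b["chapter"])
    # Per-file fields are not inherited; everything else is assumed series-level.
    return {k: v for k, v in source.items() if k not in ("chapter", "file")}
```

If this returns `None`, the file would simply go through the normal search path.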

A few minor things I left off the mockup: a "search again" button, an "advanced" button (which will be set-based and allow the user to fill in fields that the scraper doesn't), and a quick way of setting the chapter/volume in case of erroneous entries from the scraper (which unfortunately is pretty likely).

Any feedback on the GUI would be greatly appreciated. I will probably finish off the scraper functions in the next couple of days and then begin work on the GUI. It's got quite a bit to it and I'm not experienced with Python, so it will take some time. In addition, I'm coming down with a cold, so that'll delay it further. Basically I have no release date, but I'm thinking around mid-August.
Last Edit: 4 years 2 months ago by Musera.

Re: MAL scraper (in dev, looking for feedback) 3 years 10 months ago #37552

  • fieldhouse
  • Expert Boarder
  • Posts: 88
  • Thank you received: 9
  • Karma: 1
Definitely interested in this, though the CV Scraper has spoiled us all and is a hard act to follow :D