General discussion about ComicRack

TOPIC: Using Marvel or DC wikias as scraping source.. could it be possible?

Using Marvel or DC wikias as scraping source.. could it be possible? 1 year 1 day ago #46552

  • Xelloss
  • Platinum Boarder
  • Posts: 455
  • Thank you received: 117
  • Karma: 24
I have been researching this for years now, as these wikias are usually a much better source of comic info than Comicvine for DC and Marvel comics (they have separate entries for alternate versions of characters, much better plot texts, and a lot more information overall).

The first problem I found when trying to scrape info from the wikias is that the comic info on these sites is not "formatted" (you can't use an API to just load the characters in a comic, or the dates, etc.). You have to data-mine the content to get that, and that is TRICKY...

What I have been trying lately is to download the complete wikias as text and run automatic string searches to mine the data, with few and frustrating results so far...

Is there someone out there working on a scraper that pulls comic info into ComicRack from the wikias? I could really give him/her a hand with that!

Rather than scraping comics completely from the wikias, I was just trying to complement the info where Comicvine lacks the depth I want (character management, for example).
Last Edit: 1 year 1 day ago by Xelloss.
The administrator has disabled public write access.

Using Marvel or DC wikias as scraping source.. could it be possible? 11 months 2 weeks ago #46626

  • perezmu
  • Platinum Boarder
  • Posts: 1114
  • Thank you received: 64
  • Karma: 51
Marvel has its own API that allows scraping. I tried it out when it first came out, and even though it worked, it seemed to me no improvement over Comicvine. Most of all, they have a very low limit on daily API hits, so even with everyone using their own API key, I find that very, very limiting...

Using Marvel or DC wikias as scraping source.. could it be possible? 11 months 2 weeks ago #46627

  • Xelloss
  • Platinum Boarder
  • Posts: 455
  • Thank you received: 117
  • Karma: 24
I DIDN'T know that, but I was talking about the Marvel WIKI, not the official site, which is not that good...

I have been thinking about an offline scraper, as you can download the complete wikia database every week and it only weighs 36 MB (compressed).
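
For anyone who wants to try the offline route, here is a rough Python sketch of reading such a dump one page at a time. It assumes the weekly download is a standard MediaWiki XML export that has already been decompressed; the function names and the sample page below are made up for illustration, not taken from a real dump.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    `source` is a file name or a binary file object. Pages are streamed,
    so the whole dump never has to be parsed into memory at once.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # drop this page's children to keep memory low


# Hypothetical one-page dump, only to show the shape of the output.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example Comic Vol 1 1</title>
    <revision><text>Some [[wikitext]] here</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

With a real dump you would pass the path of the decompressed XML file instead of the in-memory sample.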

But I don't have the time (at least for now) for such an ambitious and complicated project...

Using Marvel or DC wikias as scraping source.. could it be possible? 11 months 1 week ago #46636

  • jkthemac
  • Platinum Boarder
  • Posts: 766
  • Thank you received: 253
  • Karma: 55
Might I suggest it could be easier to do a reverse search on every character that has a reference to the specific issue, rather than data mining the character info in the issue page itself.
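
A minimal sketch of that reverse search in Python, assuming character pages reference issues through ordinary `[[wiki links]]` and that you already have the page text in hand. The page titles and the `issues_to_characters` helper are hypothetical.

```python
import re
from collections import defaultdict

# Matches [[Target]], [[Target#Section]] and [[Target|label]];
# only the link target is captured.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def issues_to_characters(character_pages, issue_titles):
    """Map each wanted issue title to the characters whose pages link to it.

    character_pages: dict of character page title -> wikitext
    issue_titles:    set of issue page titles we care about
    """
    index = defaultdict(set)
    for character, wikitext in character_pages.items():
        for target in WIKILINK.findall(wikitext):
            target = target.strip()
            if target in issue_titles:
                index[target].add(character)
    return index


# Made-up pages, only to show the idea.
character_pages = {
    "Character A": "Appears in [[Example Comic Vol 1 1]] and "
                   "[[Example Comic Vol 1 2|issue 2]].",
    "Character B": "Debuted in [[Example Comic Vol 1 2]]. "
                   "See also [[Character A]].",
}
wanted = {"Example Comic Vol 1 1", "Example Comic Vol 1 2"}
index = issues_to_characters(character_pages, wanted)
```

The catch is that every character page has to be scanned, even when only a handful of issues are wanted.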

Using Marvel or DC wikias as scraping source.. could it be possible? 11 months 1 week ago #46639

  • Xelloss
  • Platinum Boarder
  • Posts: 455
  • Thank you received: 117
  • Karma: 24
Mmmh... I don't think so...

With only a few comics there are many more characters than issues (as every issue has many characters).

With more and more comics, I found that the number of characters and the number of comics become about the same...
Last Edit: 11 months 1 week ago by Xelloss.

Using Marvel or DC wikias as scraping source.. could it be possible? 11 months 1 week ago #46646

  • jkthemac
  • Platinum Boarder
  • Posts: 766
  • Thank you received: 253
  • Karma: 55
I am not suggesting it as the quicker option, just easier to code for.

Using Marvel or DC wikias as scraping source.. could it be possible? 11 months 4 days ago #46695

  • Xelloss
  • Platinum Boarder
  • Posts: 455
  • Thank you received: 117
  • Karma: 24
Mmmh... I understand what you are talking about... there are pages that contain only the list of appearances for each character, while the characters on each comic's page are mixed in with other things...

All the same, to do that I first have to separate the character pages from the comic pages... and in doing so I get a list of all the character links, which I can use to pinpoint the characters in the comic pages...

Another reason I want to search by comic is that, to do it your way, I have to analyse the whole database even for a few comics... My way, with the list of characters already built and saved in a file, I only have to read the pages of the selected comics...
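
Roughly, in Python, that plan could look like the sketch below. It assumes character pages and comic pages can be told apart by a marker string in their text (the two template markers here are placeholders, not the wikias' real template names), and every page title in the sample is made up.

```python
import json
import os
import re
import tempfile

# Captures the target of [[Target]], [[Target#Section]] and [[Target|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def classify(wikitext, character_marker, comic_marker):
    """Label a page 'character', 'comic' or 'other' by its marker string."""
    if character_marker in wikitext:
        return "character"
    if comic_marker in wikitext:
        return "comic"
    return "other"


def build_character_list(pages, character_marker, comic_marker):
    """One pass over (title, wikitext) pairs; returns sorted character titles."""
    return sorted(
        title for title, text in pages
        if classify(text, character_marker, comic_marker) == "character"
    )


def characters_in_comic(wikitext, known_characters):
    """Linked titles that are known characters, in page order, no repeats."""
    found = []
    for target in WIKILINK.findall(wikitext):
        target = target.strip()
        if target in known_characters and target not in found:
            found.append(target)
    return found


# Placeholder markers and made-up pages (assumptions, not real wiki content).
CHARACTER_MARKER = "{{Character Template"
COMIC_MARKER = "{{Comic Template"
SAMPLE_PAGES = [
    ("Character A", "{{Character Template\n| Name = A\n}}"),
    ("Character B", "{{Character Template\n| Name = B\n}}"),
    ("Example Comic Vol 1 1",
     "{{Comic Template\n| Appearing = [[Character B]], [[Some Place]], "
     "[[Character A|A]], [[Character B]]\n}}"),
]

characters = build_character_list(SAMPLE_PAGES, CHARACTER_MARKER, COMIC_MARKER)

# Save the list once, then reload it whenever a few comics are scraped.
path = os.path.join(tempfile.mkdtemp(), "characters.json")
with open(path, "w", encoding="utf-8") as fh:
    json.dump(characters, fh)
with open(path, encoding="utf-8") as fh:
    known = set(json.load(fh))

found = characters_in_comic(SAMPLE_PAGES[2][1], known)
```

The full pass over the database happens only when the character list is rebuilt; scraping a few comics afterwards touches just those comics' pages.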

The real reason, though, is that one of the main reasons I want this scraper is to separate the characters into groups (Main Characters, Extras, Flashbacks, Mentions, etc.) and into teams (which characters are part of a team in each comic), and this info is only present on the comic page... (The other main reason is to know which Earth each character is from in each comic... for example, right now all Batmans are just Batman, and that ruins the character analysis script I am making XD)
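
As a sketch of what that could look like in Python: the code below assumes the appearances section of a comic page uses bold headings for the groups, bullets for the characters, nested bullets for team members, and page titles of the form "Name (Universe)". That layout, the heading names and every title in the sample are assumptions for illustration, so the patterns would need adjusting against the real pages.

```python
import re

HEADING = re.compile(r"^'''(.+?):?'''\s*$")        # '''Main Characters:'''
BULLET = re.compile(r"^(\*+)\s*(.*)$")             # * item / ** nested item
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)")         # target of the first link
UNIVERSE = re.compile(r"^(.*?)\s*\(([^()]+)\)$")   # Name (Universe)


def split_universe(title):
    """'Batman (Earth-Two)' -> ('Batman', 'Earth-Two'); else (title, None)."""
    m = UNIVERSE.match(title)
    return (m.group(1), m.group(2)) if m else (title, None)


def parse_appearances(section):
    """Turn an appearances section into one dict per linked entry."""
    rows = []
    group = None
    parent = None  # last top-level entry; a team if nested bullets follow it
    for raw in section.splitlines():
        line = raw.strip()
        heading = HEADING.match(line)
        if heading:
            group, parent = heading.group(1).strip(), None
            continue
        bullet = BULLET.match(line)
        link = WIKILINK.search(bullet.group(2)) if bullet else None
        if not link:
            continue
        name, universe = split_universe(link.group(1).strip())
        row = {"group": group, "team": None, "name": name,
               "universe": universe, "is_team": False}
        if len(bullet.group(1)) == 1:
            parent = row
        elif parent is not None:
            parent["is_team"] = True
            row["team"] = parent["name"]
        rows.append(row)
    return rows


# Made-up section text; real pages may be laid out differently.
SAMPLE = """'''Main Characters:'''
* [[Batman (Earth-Two)|Batman]]
'''Extras:'''
* [[Example Team (Earth-Two)|Example Team]]
** [[Character A (Earth-Two)|Character A]]
'''Flashbacks:'''
* [[Batman (Earth-One)|Batman]]
"""

rows = parse_appearances(SAMPLE)
```

Keeping the universe as its own field means the two Batmans in the sample stay distinct entries even though they share a display name.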
Last Edit: 11 months 4 days ago by Xelloss.