Monday September 06 , 2010

Grand Comics Database - API
A place to meet other Developers
#8987
Re:Grand Comics Database - API 1 Month, 2 Weeks ago Karma: 0
It's a huge job to create a database big enough to even compare with ComicVine and such.
Can you imagine the number of books you would have to fill in with metadata?
You probably don't have that many slaves (hmm... users) to do such a task.
KK1098
Fresh Boarder
Posts: 11
#9623
Re:Grand Comics Database - API 1 Week ago Karma: 2
600WPMPO wrote:
Yeah.. I remember that..

Can we have a reverse Comic Vine scraper? We scrape our comics from Comic Vine, and then reverse scrape them to the database, where they can be edited (like a wiki) until the boss considers an entry done and locks it as final?


Looks like y'all are working on this scraper based on other threads. I'll just point out that scrapers are always hard to maintain unless the site you're scraping agrees to never change their UI. However, the GCD supplies MySQL data dumps every other week. The schema will gradually change as field definitions are improved but significant changes will be announced in advance and are generally easier to absorb than HTML restructuring.
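To illustrate why an announced schema change is easier to absorb than an HTML restructuring, here is a minimal sketch of a check you could run after loading each new dump. Everything named in it is an assumption for illustration: the table and column names are hypothetical (the real dump schema is not described in this thread), and SQLite stands in for MySQL so the example is self-contained. Against an actual MySQL server you would query `information_schema.columns` instead of using `PRAGMA table_info`.

```python
import sqlite3

# Hypothetical tables and columns that a downstream tool relies on.
# These are NOT the real GCD schema, just placeholders for the sketch.
EXPECTED_COLUMNS = {
    "series": {"id", "name"},
    "issue": {"id", "series_id", "number"},
}


def missing_columns(conn, expected):
    """Return {table: set of expected columns absent from the database}."""
    problems = {}
    for table, columns in expected.items():
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
        present = {row[1] for row in rows}  # row[1] is the column name
        missing = columns - present
        if missing:
            problems[table] = missing
    return problems


def demo():
    """Simulate a dump in which issue.number was renamed to issue_number."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE series (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE issue (id INTEGER PRIMARY KEY, "
        "series_id INTEGER, issue_number TEXT)"
    )
    return missing_columns(conn, EXPECTED_COLUMNS)
```

Calling `demo()` reports that the `issue` table no longer has the expected `number` column, which is the kind of early, explicit failure a scraper rarely gives you when page markup changes.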

Also, our data is licensed under a Creative Commons 3.0 Attribution license, which is even more permissive than the non-commercial version that ComicVine uses.

---
Henry Andrews
GCD Board Member / Lead Programmer
handrews
Fresh Boarder
Posts: 7
#9624
Re:Grand Comics Database - API 1 Week ago Karma: 2
DouglasBubbletrousers wrote:
Ok, I am totally fed up with Comicvine at this point. Not only do entries I've corrected change frequently

At the GCD we have a staff of approvers who, among other things, try to keep entries from getting thrashed between two different opinions. We keep a change history (which can be seen on our beta site, and will soon be visible in production once we're done testing it and migrating our old change history to the new format) so that approvers and editors can see if there's been a past controversy and avoid repeating it.

but they ignore reported bugs.

The GCD maintains a site that tracks open error reports (errors.comics.org/) and open technical bugs (dev.comics.org/bugs/). If you report either sort of problem, you are emailed whenever there is activity on your report, and you can go see what folks are working on if they're not working on your bugs.

--
Henry Andrews
GCD Board Member / Lead Programmer
handrews
Fresh Boarder
Posts: 7
#9625
Re:Grand Comics Database - API 1 Week ago Karma: 2
DouglasBubbletrousers wrote:
matrixik wrote:
Why do you want to build everything from scratch if GCD is under heavy development and they are trying to address all the problems you write about?
Simply give them feedback.

GCD indexing rules - adding and correcting data


Mainly for control, not just of the data being inputted/modified,


The control argument is always one where you have to decide on the tradeoff yourself. The GCD has its own goals and its own inertia, and that's what you deal with in return for the huge pile of data, and I'd be lying if I claimed that our direction and priorities would always match yours. However...

but also for development of the API to match CR needs specifically and the data standards (how to identify volumes, numbering of characters, etc.)

...you are also welcome to join our mailing lists to lobby for the data standards you want. You'll almost certainly find that many of the improvements you want are already in our plan. And if they're not we'd love to hear about them. The gcd-main and gcd-tech Google groups are the place to start.

Also, for fun

Speaking from experience, this is a really non-trivial project. That's not meant to be as discouraging as it sounds. It *is* a lot of fun, but it's a ton of work. You'll have advantages over us in that you're starting from scratch instead of trying to migrate data that's been collected originally in paper form starting in 1978 and in a series of increasingly detailed digital forms starting in 1994. But there's a lot of work to do, not just for the data itself, but for the approval process, UI design, and then keeping the system running once you have it up.

---
Henry Andrews
GCD Board Member / Lead Programmer
handrews
Fresh Boarder
Posts: 7
#9626
Re:Grand Comics Database - API 1 Week ago Karma: 2
spinekill wrote:
It looks like GCD doesn't have cover artist. Things like this are why it would be best to have a custom database. Maybe I'll try to work on something and report back if my lazy ass actually starts this.

We do in fact have cover artist. There is a sequence of type Cover (always the first sequence) which has the credits for the cover. Currently, if there are multiple covers, then the first several sequences will be of type cover and have the credits for each variant cover. In the future, we plan to have a more clear notion of variant issues instead of just stuffing a bunch of extra cover sequences in. I'm sure you can find something else that you want that we don't have, but I just had to point out that the GCD does have cover artists.
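As a rough sketch of how a client could read cover credits under the layout described above (cover sequences come first, with one per variant cover), the following picks up the leading run of cover-type sequences. The `Sequence` class and its field names are hypothetical stand-ins for illustration, not the actual GCD table or column names.

```python
from dataclasses import dataclass
from itertools import takewhile


@dataclass
class Sequence:
    # Hypothetical fields for illustration only.
    number: int   # position of the sequence within the issue
    type: str     # e.g. "cover" or "story"
    credits: str  # free-text credits for the sequence


def cover_credits(sequences):
    """Return the credits of the leading run of cover sequences, in order."""
    ordered = sorted(sequences, key=lambda s: s.number)
    covers = takewhile(lambda s: s.type.lower() == "cover", ordered)
    return [s.credits for s in covers]


# Made-up example issue: two variant covers followed by a story.
issue = [
    Sequence(2, "story", "Writer C; Artist D"),
    Sequence(0, "cover", "Artist A"),
    Sequence(1, "cover", "Artist B (variant)"),
]
```

With this made-up issue, `cover_credits(issue)` returns the credits for the two covers and stops at the first non-cover sequence. If variant issues are modeled separately in the future, as planned, this leading-run logic would need to change.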

---
Henry Andrews
GCD Board Member / Lead Programmer
handrews
Fresh Boarder
Posts: 7
#9627
Re:Grand Comics Database - API 1 Week ago Karma: 2
matrixik wrote:

Why?
1. Comics and manga at one place (hurray!)
2. Multi-language


I realize this was all in regard to a DB project here, but since someone pointed out this thread to me and I'm being the GCD advocate here (hope y'all don't mind), I'll comment on how the GCD relates to all of this.

The GCD is multi-language and encourages the indexing of manga / manhwa / manhua / etc., but has only supported Unicode since late 2009. We need more folks who read Asian and other non-Latin-alphabet languages to start contributing in order to build up those areas. We do have some manga etc. in translation, and we have lots of European and Latin American comics in a variety of languages. For branching into new languages, we also need multi-lingual volunteers who can keep the non-English indexing communities in touch with the English-speaking lists. For instance, we have a Dutch mailing list, and one of the longtime project members who speaks both English and Dutch helps make certain that the Dutch contributors know what's being decided about the project direction, and the English users know what the Dutch contributors need.


2. Will you allow adding doujinshi? (Don't know name for comics... Self-published fanzine?)


The GCD currently allows anything that has at least half comics content, and this may be expanded in the future.


If you allow adding all of that, then this db will grow rapidly and will soon be the biggest db on the internet. GCD doesn't have many comics in different languages. The Doujinshi & Manga Lexicon (mostly doujinshi) has over 300k objects (more than ComicVine or ComicBookDB have).


Yes, it's huge. The GCD currently has over 600k issues, although only just under 140k of them are fully indexed. Of the 600k, about 280k are in English, with the rest mostly in western and northern European languages (60k Spanish, 54k Swedish, 42k Dutch, 42k Norwegian, 40k German, etc.). Given that we probably don't overlap much with the site you mention, that's at least 900k issues to deal with! Not even counting the countries that neither site covers well.
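For anyone who wants to check the arithmetic, here is a quick sketch using only the rounded figures quoted in this thread (in thousands of issues). The variable names are just for illustration, and every number is approximate.

```python
# Rounded figures quoted in this thread, in thousands of issues.
gcd_total = 600        # "over 600k issues"
gcd_indexed = 140      # "just under 140k" fully indexed
lexicon_total = 300    # Doujinshi & Manga Lexicon, "over 300k objects"

by_language = {
    "English": 280,
    "Spanish": 60,
    "Swedish": 54,
    "Dutch": 42,
    "Norwegian": 42,
    "German": 40,
}

listed = sum(by_language.values())        # issues in the languages named
other_languages = gcd_total - listed      # the "etc." remainder
indexed_share = gcd_indexed / gcd_total   # fraction fully indexed
combined = gcd_total + lexicon_total      # assumes little or no overlap

print(f"listed languages: {listed}k, other: {other_languages}k")
print(f"fully indexed: {indexed_share:.0%}")
print(f"combined: {combined}k")
```

The six languages named account for roughly 518k of the 600k, a bit under a quarter of the issues are fully indexed, and the 900k figure is simply the two totals added together on the assumption that the two sites barely overlap.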


Some problems that can appear when designing a database are described here: New Fun Schema - GCD


I'm glad someone read that page :-D I spent most of a summer working with project members trying to hammer out many of the most difficult concepts, like how to deal with various formats that might have different notions of issue numbers, volume numbers, book numbers, volume titles, etc. There's a tremendous number of cases, and that's just dealing mostly with U.S., European, and translated Manga. And that was the 2nd pass at an improved database schema after a previous group spent a year on the first try.

---
Henry Andrews
GCD Board Member / Lead Programmer
handrews
Fresh Boarder
Posts: 7
#9628
Re:Grand Comics Database - API 1 Week ago Karma: 25
handrews wrote:

Henry, I think it is great having you in this discussion. As you may have read before, I support using data from GCD or other databases out there, rather than building our own.

Any news on the API?... it always seems more interesting than going to the MySQL dumps...


Looks like y'all are working on this scraper based on other threads. I'll just point out that scrapers are always hard to maintain unless the site you're scraping agrees to never change their UI. However, the GCD supplies MySQL data dumps every other week. The schema will gradually change as field definitions are improved but significant changes will be announced in advance and are generally easier to absorb than HTML restructuring.

---
Henry Andrews
GCD Board Member / Lead Programmer


Regarding this, I was the original creator of the script to get info from ComicVine. We are not using raw HTML scraping, but the API provided by ComicVine. This is the reason I originally supported ComicVine instead of other databases out there, regardless of the limitations of this API and the quality of the data (which, by the way, seems to have improved lately)!

Cheers!
perezmu
Platinum Boarder
Posts: 487
#9635
Re:Grand Comics Database - API 1 Week ago Karma: 0
Excellent that someone from GCD heard my (our) voice and that I didn't waste my time arguing for GCD.

handrews wrote:
matrixik wrote:

Some problems that can appear when designing a database are described here: New Fun Schema - GCD

I'm glad someone read that page :-D

Sometimes I amaze myself with how much I read (even if I don't need it for anything useful).

Cheers
matrixik
Junior Boarder
Posts: 20
Last Edit: 2010/08/30 11:26 By matrixik.
I am what I am.
 
#9643
Re:Grand Comics Database - API 1 Week ago Karma: 2
perezmu wrote:

Henry, I think it is great having you in this discussion. As you may have read before, I support using data from GCD or other databases out there, rather than building our own.

Any news on the API?... it always seems more interesting than going to the MySQL dumps...


We still don't have any technical volunteers who can focus on it. We have three active server-side programmers, and all of us have a huge backlog of stuff to work on with the core data structures. We're integrating some badly needed new UI design volunteers (we now finally have one person who is focusing on that, and maybe one or two further volunteers who just joined the lists but haven't started working on anything yet). No one has stepped up on the API yet, although the new UI guy has a friend who might take this on. If we don't get new API-centric volunteers, the core tech team will eventually get to the API once some of the most critical data structure fixes are finally complete, such as having proper database records for creators and characters.


Regarding this, I was the original creator of the script to get info from ComicVine. We are not using raw HTML scraping, but the API provided by ComicVine. This is the reason I originally supported ComicVine instead of other databases out there, regardless of the limitations of this API and the quality of the data (which, by the way, seems to have improved lately)!


That makes a lot more sense. If they've got an API up and running already, we can't offer a solid timeline on when we can compete with that, as much as I'd like to.
handrews
Fresh Boarder
Posts: 7
#9648
Re:Grand Comics Database - API 6 Days, 20 Hours ago Karma: 0
hey. thanks for taking the time to respond.
spinekill
Fresh Boarder
Posts: 3