General discussion about ComicRack

TOPIC: COMIC.ORG offline "pseudo scraper" - Need help with a project!

COMIC.ORG offline "pseudo scraper" - Need help with a project! 8 months 1 week ago #47130

  • Xelloss
  • Platinum Boarder
  • Posts: 455
  • Thank you received: 117
  • Karma: 24
You are surely used to very good Internet connections... My connection here is only 3 Mbps (with luck), and it takes me ages...

In addition, even with a good connection, crawling image by image would take hours... (I am quite used to using that kind of program). We are talking about a million covers...
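
To put rough numbers on that, here is a back-of-the-envelope sketch. Only the cover count and the two connection speeds come from this thread; the average cover size (100 KB) and the per-request overhead (0.2 s) are assumptions for illustration, not measured values.

```python
def estimate_hours(num_covers, avg_cover_kb, link_mbps, overhead_s=0.0):
    """Rough sequential download time in hours.

    avg_cover_kb and overhead_s are illustrative assumptions, not measurements.
    """
    total_megabits = num_covers * avg_cover_kb * 8 / 1000.0  # KB -> megabits
    transfer_s = total_megabits / link_mbps
    return (transfer_s + num_covers * overhead_s) / 3600.0


COVERS = 1000000      # "a million covers"
AVG_KB = 100          # assumed average cover size

for mbps in (3, 100):
    hours = estimate_hours(COVERS, AVG_KB, mbps, overhead_s=0.2)
    print("%d Mbps: about %.0f hours" % (mbps, hours))
```

Under these assumptions the per-request overhead dominates on a fast link, which is why crawling image by image stays slow even at 100 Mbps.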
Last Edit: 8 months 1 week ago by Xelloss.
The administrator has disabled public write access.

COMIC.ORG offline "pseudo scraper" - Need help with a project! 8 months 1 week ago #47131

  • DanielFJorge
  • Fresh Boarder
  • Posts: 8
  • Thank you received: 6
  • Karma: 1
Well... that is true... my connection is 100 Mbps... it is a pity... I do not have time to actively contribute to this... But I think it is a really nice project...

COMIC.ORG offline "pseudo scraper" - Need help with a project! 8 months 1 week ago #47133

  • Xelloss
  • Platinum Boarder
  • Posts: 455
  • Thank you received: 117
  • Karma: 24
I think yours is a fantastic idea, I have thought of that in the past... but it is NOT THAT simple to do...

COMIC.ORG offline "pseudo scraper" - Need help with a project! 8 months 1 week ago #47138

  • perezmu
  • Platinum Boarder
  • Posts: 1114
  • Thank you received: 64
  • Karma: 51
Also, the existing ComicVine scraper uses some library to do this kind of image comparison; you can check its source code...
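
Which library or method the scraper actually uses is not stated here, so the following is only a generic sketch of one common approach to cover comparison, an average hash. It assumes each cover has already been decoded and shrunk to a small grayscale grid (8x8 in the test data) by some image library; the function names and the threshold are made up for illustration.

```python
def average_hash(gray_grid):
    """Return a bit list: 1 where a pixel is brighter than the grid mean.

    gray_grid is a small 2D list of grayscale values (0-255), assumed to be
    the cover already downscaled by an image library of your choice.
    """
    pixels = [p for row in gray_grid for p in row]
    mean = sum(pixels) / float(len(pixels))
    return [1 if p > mean else 0 for p in pixels]


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(1 for a, b in zip(hash_a, hash_b) if a != b)


def looks_like_same_cover(grid_a, grid_b, max_distance=5):
    """Treat two covers as a match when few hash bits differ.

    max_distance is an arbitrary threshold that would need tuning.
    """
    distance = hamming_distance(average_hash(grid_a), average_hash(grid_b))
    return distance <= max_distance
```

Because the hash only records which pixels are above the mean, a uniformly brighter or darker scan of the same cover still produces the same bits.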

Cheers! :)

COMIC.ORG offline "pseudo scraper" - Need help with a project! 6 months 4 weeks ago #47337

  • Xelloss
  • Platinum Boarder
  • Posts: 455
  • Thank you received: 117
  • Karma: 24
I just wanted to say I have NOT been making any progress with this lately...

However, I HAVE been making some progress with a new tool to import data from the DC and Marvel wikias into the comics in ComicRack...

There is still a lot of work to be done... but things look promising... (today I got the beginning of a database of characters data-mined from the wikia, which includes a data tree with the team members in each comic)
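
For anyone curious what such a tree could look like, here is a minimal sketch. The layout (comic, then team, then members) and the sample entries are placeholders for illustration, not the tool's actual structure or real wikia data.

```python
from collections import defaultdict


def new_character_db():
    """comic key -> team name -> set of member names."""
    return defaultdict(lambda: defaultdict(set))


def add_appearance(db, comic, team, member):
    """Record that `member` appears in `comic` as part of `team`."""
    db[comic][team].add(member)


def members_in_comic(db, comic):
    """All characters listed for a comic, regardless of team."""
    found = set()
    # .get() avoids creating an empty entry for an unknown comic.
    for members in db.get(comic, {}).values():
        found.update(members)
    return sorted(found)


db = new_character_db()
# Placeholder entries, not real wikia data.
add_appearance(db, "Example Comic #1", "Team A", "Hero One")
add_appearance(db, "Example Comic #1", "Team A", "Hero Two")
add_appearance(db, "Example Comic #1", "Team B", "Hero Three")
print(members_in_comic(db, "Example Comic #1"))
```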

I will put out an experimental release when I have something that can be used without any coding, but I just wanted to post that I am still working on my projects and making some progress XD
Last Edit: 6 months 4 weeks ago by Xelloss.
The following user(s) said Thank You: Alan Scott, romsnesrom

COMIC.ORG offline "pseudo scraper" - Need help with a project! 6 months 3 weeks ago #47350

  • Alan Scott
  • Gold Boarder
  • Posts: 264
  • Thank you received: 20
  • Karma: 10
That would be fantastic! It would be great whenever I'm reading a comic and looking for more info pertaining to it to be able to do so from within ComicRack, similar to what I already can do thanks to the ComicVine script. Both Wikias are very well maintained so that data would be great to access.
... The failure to appreciate... is perfectly understandable, because the readership never evaluates old material in the context of the cultural climate in which it was created, or the state of the art at the time it was created.
Marty Pasko

COMIC.ORG offline "pseudo scraper" - Need help with a project! 6 months 3 weeks ago #47364

  • Xelloss
  • Platinum Boarder
  • Posts: 455
  • Thank you received: 117
  • Karma: 24
The problem with wikias is that their information is not stored in a database; it sits somewhere between a data format and plain text... so reading info from it is tricky and always comes with an error margin... (for example: someone used spaces to separate characters, others used *, and others ;... of course, as it is a very serious wikia there is an official format, but not everybody respects it)
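
The separator problem can be illustrated with a small tolerant splitter. This is only a sketch under assumptions: the sample strings are invented, and real wikia markup has more variants than the `*` and `;` cases mentioned above. Space-separated lists are deliberately left unsplit, since a separating space cannot be told apart from the space inside a name.

```python
import re

# Split on newlines, ';' and '*' (assumed separators, based on the examples above).
_SEPARATORS = re.compile(r"[;\*\n]+")


def split_characters(raw):
    """Return a de-duplicated list of names from a loosely formatted field."""
    names = []
    for chunk in _SEPARATORS.split(raw):
        name = chunk.strip()
        if name and name not in names:
            names.append(name)
    return names


print(split_characters("* Hero One\n* Hero Two\n"))
print(split_characters("Hero One; Hero Two;Hero One"))
```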

So I have to run tests and tests and tests... picking the most used formats... and trying to reach 90% of the info without breaking the code... which is a hard task...
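
One way to decide which formats to support first is to tally them over a sample of pages and check how much of the sample the supported formats cover. A rough sketch, where `classify_format`, its rules, and the sample fields are hypothetical stand-ins:

```python
from collections import Counter


def classify_format(raw):
    """Guess which separator style a raw character field uses (hypothetical rules)."""
    if "*" in raw:
        return "bullet"
    if ";" in raw:
        return "semicolon"
    return "other"


def coverage(samples, supported):
    """Fraction of samples whose format is in the supported set."""
    if not samples:
        return 0.0
    hits = sum(1 for raw in samples if classify_format(raw) in supported)
    return hits / float(len(samples))


# Invented sample fields, standing in for text pulled from wiki pages.
samples = ["* A\n* B", "A; B", "* C", "A B", "* D\n* E"]
print(Counter(classify_format(s) for s in samples).most_common())
print(coverage(samples, {"bullet", "semicolon"}))
```

Tracking the coverage figure while adding formats gives a concrete way to see how close the parser is to the 90% target.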

What I mean is, don't expect it to be a 100% accurate scraper like the ComicVine one... that will never happen... what I am trying to achieve is something that reads 90-95% of the info there with the fewest errors possible...

AND THAT IS ONLY the data mining... Then I will begin the part where I sync that with comics scraped with ComicVine... which will also have an error margin... (I will not make an interactive GUI like the ComicVine one, as it is A LOT OF WORK and I am not good at coding GUIs; it will be more of an automatic process, at least for now)
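
The automatic sync step could be sketched roughly like this. It assumes both sides can be reduced to a (series, issue number) key; the record layout and the normalization rules are assumptions for illustration, not ComicRack's or ComicVine's actual data model. Comics that fail to match are returned separately, so the error margin stays visible.

```python
def make_key(series, number):
    """Normalize series name and issue number into a comparable key."""
    clean_series = " ".join(series.lower().split())
    clean_number = str(number).strip().lstrip("0") or "0"
    return (clean_series, clean_number)


def sync(library_comics, wiki_records):
    """Pair library comics with wiki records; return (matched, unmatched).

    Both inputs are lists of dicts with 'series' and 'number' keys
    (a hypothetical layout).
    """
    by_key = {}
    for record in wiki_records:
        by_key[make_key(record["series"], record["number"])] = record

    matched, unmatched = [], []
    for comic in library_comics:
        record = by_key.get(make_key(comic["series"], comic["number"]))
        if record is None:
            unmatched.append(comic)
        else:
            matched.append((comic, record))
    return matched, unmatched
```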

But I am making some progress in my free time :)
Last Edit: 6 months 3 weeks ago by Xelloss.