Discussion and Sharing of Web Comics

TOPIC: Regex help

Regex help 7 years 2 months ago #18205

  • Fuzzyluzzi
  • Gold Boarder
  • Posts: 314
  • Thank you received: 49
  • Karma: 12
Been working on making some CBWs for sites. Hit a snag on a site that uses Previous|Index|Next.
The regular expression only grabs .jpg. Does anyone know the pattern needed to grab any valid picture format (.png, .jpeg, .gif)?

Re: Regex help 7 years 2 months ago #18214

  • Helmic
  • Expert Boarder
  • (Don't be) That guy
  • Posts: 150
  • Thank you received: 82
  • Karma: 52
I'd have to see what your regex is, but it's simply a matter of not specifying that the text has to have ".jpg" somewhere in it. I'll use
src="(?<link>[^"]+)"

to find a URL. (?<link>regexgoeshere) is simply the "capture group" that marks the text you want to use as the exact URL for the image. [^"]+ isn't too complicated either: the stuff in the brackets is the set of characters I'm talking about. The caret means anything BUT whatever is inside the brackets, so I want any character that isn't a quote. The plus sign following it means at least one of those characters needs to be present, but otherwise grab as many as possible. Altogether, you could read it as: src, equals, quote, text (if it's not another quote, it's part of the URL I want to capture), quote. That regex should grab any image on most webpages.
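
If you want to try this outside the program, here is a minimal sketch in Python. One assumption to flag: Python's re module writes named groups as (?P<link>...) rather than (?<link>...), so the pattern is adjusted for that. The sample HTML is the example tag used in this post.

```python
import re

# Python spells named groups (?P<name>...); the post's (?<link>...) is the same idea.
SRC_PATTERN = re.compile(r'src="(?P<link>[^"]+)"')

# Sample tag taken from the post; any HTML string would do here.
html = '<img id="comic" src="http://www.webcomic.com/comics/page1.png" alt="Page1">'

# search() finds the first place the pattern matches, or returns None.
match = SRC_PATTERN.search(html)
link = match.group("link") if match else None
print(link)
```

The named group holds only the text between the quotes, so the src=" prefix and the closing quote are matched but not captured.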

Obviously, you just want one image, the comic image. To specify that, you need to include some more text. Let's say the comic image is located here:
<img id="comic" src="http://www.webcomic.com/comics/page1.png" alt="Page1">

A lot of webcomics will include something like id="comic" or what have you somewhere near the URL, so you should use it to identify your comic image URL. You can then modify your regex into this
<img\sid="comic"\ssrc="(?<link>[^"]+)"

which should work 99% of the time. But what if the text on the page source looked like this?
<img alt="Sometimes I like to include alt text, sometimes I don't." src="http://www.webcomic.com/comics/page1.png" id="comic">

You'll still want to specify <img as the first thing to look for, but how do you skip over that variable alt text? I use [^<>]* to specify any character other than a less-than or greater-than sign, zero or more repetitions. That keeps the match inside a single tag. So your regex may look like
<img[^<>]*src="(?<link>[^"]+)"\sid="comic"
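
A rough way to check that idea, sketched in Python (again using Python's (?P<link>...) spelling for the named group). The banner image and its URL are made up for the test; only the comic tag comes from the example above.

```python
import re

# [^<>]* skips any attributes before src but can never leave the current tag.
COMIC_PATTERN = re.compile(r'<img[^<>]*src="(?P<link>[^"]+)"\sid="comic"')

# Hypothetical page: a banner image first, then the comic tag from the post.
page = (
    '<img src="http://www.webcomic.com/images/banner.png" id="banner">\n'
    '<img alt="Sometimes I like to include alt text, sometimes I don\'t." '
    'src="http://www.webcomic.com/comics/page1.png" id="comic">'
)

match = COMIC_PATTERN.search(page)
comic_link = match.group("link") if match else None
print(comic_link)
```

The banner is skipped because its id is not "comic". Note that this pattern still expects id="comic" to come directly after the src attribute, so it is tied to this particular attribute order.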

But sometimes there isn't any convenient little identifying text that you can use.
<img src="http://www.webcomic.com/comics/page1.png">

Since the URL is still predictable, all I have to do is match the pattern:
<img[^<>]*src="(?<link>http://www.webcomic.com/comics/[^"]+)"

Notice I don't include "page" as part of the pattern. Most artists use a predictable naming scheme for their images, but they'll often break it for a special or guest comic. I also keep the <img[^<>]*src="(?<link>regex)" so that I know the URL belongs to an image actually displayed on the page and isn't just being linked to by someone in the comments.
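
Here is a small Python sketch of the same approach, under a few assumptions: the page contents, the header image, and the guest comic filename are invented for illustration, and the named group uses Python's (?P<link>...) spelling. It also runs the fixed part of the URL through re.escape so the dots only match literal dots.

```python
import re

# Fixed part of the image URL, taken from the example in the post.
BASE = "http://www.webcomic.com/comics/"

# re.escape makes the dots in the URL literal instead of "any character".
PREFIX_PATTERN = re.compile(
    r'<img[^<>]*src="(?P<link>' + re.escape(BASE) + r'[^"]+)"'
)

# Hypothetical page: a plain link, a site header image, and a guest comic.
page = (
    '<a href="http://www.webcomic.com/comics/page1.png">old page</a>\n'
    '<img src="http://www.webcomic.com/images/header.gif">\n'
    '<img src="http://www.webcomic.com/comics/guest-special.jpg">'
)

links = [m.group("link") for m in PREFIX_PATTERN.finditer(page)]
print(links)
```

The plain link is ignored because it is not inside an <img tag, and the header image is ignored because its URL is outside the comics folder. The guest comic still matches even though its filename doesn't follow the "page" naming scheme.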

Knowing just that about regexes should get you the vast majority of webcomics. This website should be able to teach you just about anything else you want to know about regexes.
Request webcomics on IRC: http://widget.mibbit.com/?server=irc.rizon.net&channel=%23comicrack
server: irc.rizon.net
channel: #comicrack
Or in the webcomics subforum: http://comicrack.cyolito.com/forum/22-web-comics

Re: Regex help 7 years 2 months ago #18218

  • cYo
  • Moderator
  • Posts: 3476
  • Thank you received: 676
  • Karma: 184
Great post :)
You could add it to the wiki as a kind of tips and tricks for "WebComic developers" (TM by 600 :)).

Re: Regex help 7 years 2 months ago #18263

  • g1zm02k
  • Fresh Boarder
  • Posts: 9
  • Thank you received: 3
  • Karma: 4
You could just replace the 'jpg' with (jpg|gif|png|jpeg), which lets the regex match any of the four extensions instead of only one.
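
Since the original regex wasn't posted, here is only a guess at how that might look, sketched in Python. The pattern and the sample URLs are hypothetical; (?:...) is used so the alternation groups the extensions without creating an extra capture, and the named group is written in Python's (?P<link>...) form.

```python
import re

# Hypothetical pattern: a src URL that must end in one of four image extensions.
EXT_PATTERN = re.compile(r'src="(?P<link>[^"]+\.(?:jpg|jpeg|gif|png))"')

# Made-up samples: four image formats plus one file that should not match.
samples = [
    'src="http://www.webcomic.com/comics/page1.jpg"',
    'src="http://www.webcomic.com/comics/page2.png"',
    'src="http://www.webcomic.com/comics/page3.jpeg"',
    'src="http://www.webcomic.com/comics/page4.gif"',
    'src="http://www.webcomic.com/comics/readme.txt"',
]

matched = [bool(EXT_PATTERN.search(s)) for s in samples]
print(matched)
```

The dot before the group is escaped so it only matches a literal dot in front of the extension.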
This signature intentionally left blank ... well, apart from this bit with text!
Last Edit: 7 years 2 months ago by g1zm02k.

Re: Regex help 7 years 2 months ago #18273

  • 600WPMPO
  • Moderator
  • Posts: 3788
  • Thank you received: 560
  • Karma: 235
cYo wrote:
..."WebComic developers" (TM by 600 :)).
:blush:

Re: Regex help 7 years 2 months ago #18280

  • Helmic
  • Expert Boarder
  • (Don't be) That guy
  • Posts: 150
  • Thank you received: 82
  • Karma: 52
cYo wrote:
Great post :)
You could add it to the wiki as a kind of tips and tricks for "WebComic developers" (TM by 600 :)).

I demand to be referred to and credited as such, quotes and all.