RawDev.net - Just another Zabreznik.si Sites site


Size of XKCD

Saturday, March 15th, 2008 by Marko Zabreznik

As the last post was about the size of bash.org, this one is about xkcd, the famous comic site. A simple set of scripts gets you the whole set and a few stats.
Use the scripts wisely; they put a strain on the servers.

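#!/bin/bash
# Fetch xkcd pages 1-395 and the comic image of each; pages already saved are skipped.
mkdir -p xkcd comics
echo "Downloading 395 pages."
for i in $(seq 1 395); do
	if [ -s "xkcd/$i" ]; then
		continue
	fi
	echo -n "$(date +%H:%M:%S): Trying $i ..."
	lynx -source "http://xkcd.com/$i" > "xkcd/$i"
	echo -n " Done. Image:.. "
	# Pull the image file name out of the saved page, taking the first match
	# (needs an awk with multi-character RS support, such as gawk).
	img=$(awk 'BEGIN{FS="<img src=\"http://imgs.xkcd.com/comics/";RS="\" title="}/<img/{print $2}' "xkcd/$i" | head -n 1)
	# -P saves the image into the comics/ directory.
	wget -q -P comics "http://imgs.xkcd.com/comics/$img"
	echo " Done."
	# Pause between requests to go easy on the server.
	sleep 2s
done
echo "All done."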

This piece of code does something special: it pulls the image file name out of each saved page and uses wget to download that image.
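To make the file-name trick easier to follow, here is a minimal sketch of just that step, using sed instead of the awk one-liner. It assumes the page markup looks like `<img src="http://imgs.xkcd.com/comics/NAME" title="...">`; the function name `extract_image_name` and the sample file name are made up for illustration.

```shell
#!/bin/bash
# Sketch only: print the comic image file name found in a saved xkcd page.
# Assumes markup of the form <img src="http://imgs.xkcd.com/comics/NAME" ...>.

extract_image_name() {
	# $1: path to a saved HTML page.
	# Keep only the part of the src attribute after the comics/ prefix,
	# and stop at the first matching line.
	sed -n 's|.*<img src="http://imgs\.xkcd\.com/comics/\([^"]*\)".*|\1|p' "$1" | head -n 1
}

# Usage with a made-up sample page (the file name is hypothetical):
sample=$(mktemp)
echo '<img src="http://imgs.xkcd.com/comics/example_comic.png" title="Sample" alt="Sample" />' > "$sample"
extract_image_name "$sample"   # prints example_comic.png
rm -f "$sample"
```

The printed name can then be appended to the image base URL and handed to wget, which is what the download loop does.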

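<?php
// $n: current page number; $vse ("all"): running total of parsed entries.
$n=1;
$vse=0;
while ($n < 410) {
	// $fajl ("file"): raw HTML of the page saved as original/<n>.
	unset ($fajl);
	$fajl=file_get_contents("original/".$n);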

	preg_match_all("|
<p class=\"quote\">(.*)<b>#(.*)</b>(.*)
<p class=\"qt\">(.*)

|Us", $fajl, $out);
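	// $out[0] holds the full matches; capture groups 2 and 4 are the
	// entry number (after the "#") and the entry text.
	$i=0;
	while (isset($out[0][$i])) {
		echo '('.$out[2][$i].")\n".$out[4][$i]."\n";
		echo $out[2][$i]."\n".$out[4][$i]."\n";
		$i++;
		$vse++;
	}
	$n++;
}
// Print the total number of entries found across all pages.
echo "\n(".$vse.")";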

And a parser that builds the final big file of everything, which coincidentally also makes the comments easy to read.
The comics make up most of the download, at about 22 MB.
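If you want to check the numbers yourself, something like the following could do it. This is only a sketch, not the script used for the figure above: `dir_stats` is a made-up helper, and it assumes the `xkcd` and `comics` directories created by the download script.

```shell
#!/bin/bash
# Sketch only: report how many files a directory holds and their total size in bytes.

dir_stats() {
	local dir=$1
	local count bytes
	# Count regular files (assumes file names contain no newlines).
	count=$(find "$dir" -type f | wc -l)
	# Sum the sizes by streaming every file through wc -c.
	bytes=$(find "$dir" -type f -exec cat {} + | wc -c)
	# $(( )) strips any padding some wc implementations add.
	echo "$dir: $((count)) files, $((bytes)) bytes"
}

# Usage: report on the directories filled by the download script, if present.
for d in xkcd comics; do
	if [ -d "$d" ]; then
		dir_stats "$d"
	fi
done
```

Counting bytes this way reads every file once, which is fine for a few hundred comics; `du -sh comics` is a quicker way to get a rounded figure.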

And as usual, the download link: LINK (22 MB), or email me for the data.

Posted in Hacking, Scripting