<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>RawDev.net &#187; script</title>
	<atom:link href="http://rawdev.net/tag/script/feed/" rel="self" type="application/rss+xml" />
	<link>http://rawdev.net</link>
	<description>Just another Zabreznik.si Sites site</description>
	<lastBuildDate>Tue, 27 Jul 2010 17:48:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Size of XKCD</title>
		<link>http://rawdev.net/2008/03/15/size-of-xkcd/</link>
		<comments>http://rawdev.net/2008/03/15/size-of-xkcd/#comments</comments>
		<pubDate>Sat, 15 Mar 2008 21:04:40 +0000</pubDate>
		<dc:creator>Marko Zabreznik</dc:creator>
				<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[script]]></category>
		<category><![CDATA[xkcd]]></category>

		<guid isPermaLink="false">http://www.rawdev.net/2008/03/15/size-of-xkcd/</guid>
		<description><![CDATA[The last post was about the size of bash.org; this one is about xkcd, the famous comic site. A simple set of scripts gets you the whole set and a few stats. Use the script wisely, it&#8217;s a strain on the servers. #!/bin/bash echo "Downloading 395 pages." for i in `seq 1 395`; do if [...]]]></description>
			<content:encoded><![CDATA[<p>The last post was about the size of bash.org; this one is about xkcd, the famous comic site. A simple set of scripts gets you the whole set and a few stats.<br />
Use the script wisely, it&#8217;s a strain on the servers.</p>
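<pre>#!/bin/bash
# Make sure the output directories exist before writing into them.
mkdir -p xkcd comics
echo "Downloading 395 pages."
for i in `seq 1 395`;
do
	# Skip pages that were already saved by an earlier run.
	if [ -s "xkcd/$i" ]; then
		continue
	else
		echo -n "`date +%H:%M:%S`: Trying $i ..."
		lynx --source "http://xkcd.com/$i" &gt; "xkcd/$i"
		echo -n " Done. Image:.. "
		# Pull the image file name out of the saved page and fetch it into comics/.
		wget -q -P "comics" -nH "http://imgs.xkcd.com/comics/"`awk 'BEGIN{FS="&lt;img src=\"http://imgs.xkcd.com/comics/";RS="\" title="}/&lt;img/{print $2}' "xkcd/$i"`
		echo " Done."
		# Pause between requests to go easy on the server.
		sleep 2s
	fi
done
echo "All done."</pre>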
<p>This piece of code does something special: it takes the name of the image from the saved page and uses wget to download it.</p>
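<p>To look at the extraction step on its own, here is a minimal sketch. It assumes the saved page references the image under http://imgs.xkcd.com/comics/, as the script above does. The function name extract_image_name is made up for illustration, and it uses grep and sed instead of the awk one-liner.</p>
<pre>#!/bin/bash
# Hypothetical helper: print the file name of the first comic image
# referenced in a saved page (path given as the first argument).
extract_image_name() {
	# grep -o prints only the matching URL, head keeps the first hit,
	# and sed strips everything up to the last slash.
	grep -o 'http://imgs.xkcd.com/comics/[^"]*' "$1" | head -n 1 | sed 's|.*/||'
}</pre>
<p>Called as extract_image_name "xkcd/1", it should print just the image file name, which can then be appended to the comics URL for wget.</p>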
<pre>$n=1;
$vse=0;
while ($n &lt; 410) {
	unset ($fajl);
	$fajl=file_get_contents("original/".$n);

	preg_match_all("|&lt;p class=\"quote\"&gt;(.*)&lt;b&gt;#(.*)&lt;/b&gt;(.*)&lt;/p&gt;&lt;p class=\"qt\"&gt;(.*)&lt;/p&gt;|Us", $fajl, $out);
	$i=0;
	while (isset($out[0][$i])) {
		echo '('.$out[2][$i].")\n".$out[4][$i]."\n";
		echo $out[2][$i]."\n".$out[4][$i]."\n";
		$i++;
		$vse++;
	}
	$n++;
}
echo "\n(".$vse.")";</pre>
<p>The parser above builds the final big file of everything, which conveniently also makes the comments easy to read.<br />
The comics make up most of the download, at ~22 MB.</p>
<p>And as usual, the download link: <a href="http://upload2.net/page/download/uQpZwshE4OlMJ2W/xkcd.tar.gz.html" title="xkcd comics archive" target="_blank">LINK</a> (22 MB), or email me for the data.</p>
]]></content:encoded>
			<wfw:commentRss>http://rawdev.net/2008/03/15/size-of-xkcd/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Size of Bash.org</title>
		<link>http://rawdev.net/2008/03/15/size-of-bash-or/</link>
		<comments>http://rawdev.net/2008/03/15/size-of-bash-or/#comments</comments>
		<pubDate>Sat, 15 Mar 2008 06:20:26 +0000</pubDate>
		<dc:creator>Marko Zabreznik</dc:creator>
				<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[bash.org]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[quote]]></category>
		<category><![CDATA[script]]></category>

		<guid isPermaLink="false">http://www.rawdev.net/?p=1</guid>
		<description><![CDATA[I spent the last few hours on a simple question: how large is the world&#8217;s largest IRC quote database (bash.org)? Specifically, how large are the quotes themselves? So first I had to get them all; a simple bash script was sufficient. #!/bin/bash echo "Downloading 409 pages." for i in `seq 1 409`; do if [ [...]]]></description>
			<content:encoded><![CDATA[<p>I spent the last few hours on a simple question: how large is the world&#8217;s largest IRC quote database (bash.org)?<br />
Specifically, how large are the quotes themselves?</p>
<p>So first I had to get them all; a simple bash script was sufficient.</p>
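<pre>#!/bin/bash
# Make sure the output directory exists before writing into it.
mkdir -p original
echo "Downloading 409 pages."
for i in `seq 1 409`;
do
	# Skip pages that were already saved by an earlier run.
	if [ -s "original/$i" ]; then
		continue
	else
		echo -n "`date +%H:%M:%S`: Trying $i ..."
		lynx --source "http://www.bash.org/?browse=$i" &gt; "original/$i"
		echo " Done."
		# Wait between requests to go easy on the server.
		sleep 10s
	fi
done
echo "All done."</pre>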
<p><em>Please do not use that script; it is a strain on the bash.org servers. Instead, you can grab the original files at the end of the article.</em><br />
After a couple of hours that was done, and I had my next script ready as well:</p>
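<pre>&lt;?php
$n=1;    // current page number
$vse=0;  // total number of quotes found
while ($n &lt; 410) {
	unset ($fajl);
	$fajl=file_get_contents("original/".$n);

	// Capture the quote number (group 2) and the quote text (group 4).
	preg_match_all("|&lt;p class=\"quote\"&gt;(.*)&lt;b&gt;#(.*)&lt;/b&gt;(.*)&lt;/p&gt;&lt;p class=\"qt\"&gt;(.*)&lt;/p&gt;|Us", $fajl, $out);
	$i=0;
	while (isset($out[0][$i])) {
		echo '('.$out[2][$i].")\n".$out[4][$i]."\n";
		echo $out[2][$i]."\n".$out[4][$i]."\n";
		$i++;
		$vse++;
	}
	$n++;
}
// Print the total so the count can be checked.
echo "\n(".$vse.")";
</pre>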
<p>The last line is there to make sure I got all of them, 20440 at the time.<br />
I ran it from the shell and redirected the output to &#8220;final&#8221;: php parser.php &gt; final</p>
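<p>Checking the size of the resulting file is then a short step. A minimal sketch, assuming 1 MB means 1,000,000 bytes; the function name size_in_mb is made up for illustration.</p>
<pre>#!/bin/bash
# Hypothetical helper: print the size of a file in megabytes
# (1 MB = 1,000,000 bytes here), rounded to one decimal place.
size_in_mb() {
	local bytes
	# wc -c prints the byte count followed by the file name; keep only the count.
	bytes=$(wc -c "$1" | awk '{print $1}')
	awk -v b="$bytes" 'BEGIN { printf "%.1f\n", b / 1000000 }'
}</pre>
<p>Called as size_in_mb final, it prints the size of the parsed quote file.</p>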
<p>So the conclusion: the size of bash.org is ~<strong>5 MB</strong>.<br />
These are the files if you want them: <a href="http://upload2.net/page/download/ZXuG8SjBioIIL5F/bash.tar.gz.html">link</a> (or email me).</p>
]]></content:encoded>
			<wfw:commentRss>http://rawdev.net/2008/03/15/size-of-bash-or/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
