Two blog posts in one weekend. Impressive for me!
I’ve made a lot of progress with the Cloud-Bible project (the name is growing on me, slowly).
Switching from the simpler-to-follow (but very presumptuous) big “switch” clause which previously handled the reverse-engineered tags (oh, I didn’t mention that the XML I have isn’t in precise OSIS format, did I? :() to the more complex (but elegant?) recursive-function method brought more robust operation (a good thing – variances in the XML layout are handled now), but performance has greatly decreased (a bad thing – from 0.008s up to 0.014s average).
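For flavour, the old approach looked roughly like this (a sketch from memory – the tag names here are illustrative, not my real reverse-engineered set):

// Old approach (sketch): one big switch over the element name.
switch ($x->getName ()) {
    case 'verse':
        // ... emit verse markup
        break;
    case 'note':
        // ... emit a footnote reference
        break;
    default:
        // anything unexpected fell through silently (the "presumptuous" part)
}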
I’ve tried a lot of ideas to increase the performance again, including passing everything by reference (so the stack isn’t filled with copies of SimpleXML objects) – all to no avail.
I did, however, have (some) victory by reworking this (which is slower by 0.000006s per iteration than the replacement further down):
function process_xml ($xml, $children=false) {
    if ($children) foreach ($xml->children () as $x) process_xml ($x);
    else {
        // ... processing here
    }
}
I wasn’t sure from the start that the additional recursion (in the foreach()) was a good thing, and my trial-and-error confirmed the doubt:
function process_xml ($xml, $children=false) {
    $to_process = ($children) ? $xml->children () : array ($xml);
    foreach ($to_process as $x) {
        // ... processing here
    }
}
Surprisingly (to me), the latter performed faster – by ~0.000006s per iteration, to be precise. Not much, but it adds up: larger XML files can reach 1,000 or more iterations, recovering around half the performance lost earlier!
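(A difference that small is well below the timer’s useful resolution for a single call, so it has to be measured by averaging over many calls. The harness looks something like this – a sketch, with an arbitrary run count:)

// Timing sketch: time many calls and average.
$runs = 10000;
$start = microtime (true);
for ($i = 0; $i < $runs; $i++) process_xml ($xml, true);
printf ("%.9fs per call\n", (microtime (true) - $start) / $runs);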
The jury (in my mind) is therefore still out on whether to process the XML on each request (allowing users to turn features on and off, and to include user-specific markup like highlighting etc) or whether to build a cache of HTML files alongside their XML counterparts. (Certainly, the latter would be faster. And given the Bible isn’t expected to change any time soon…)
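If I do go the cache route, the request handler would boil down to something like this (a sketch – the file layout and the render_chapter() helper are hypothetical):

// Cache sketch: serve pre-rendered HTML when it's newer than the XML,
// otherwise render and store it for next time.
$xml_file  = 'bible/mark/1.xml';   // hypothetical layout
$html_file = 'cache/mark/1.html';
if (is_file ($html_file) && filemtime ($html_file) >= filemtime ($xml_file)) {
    readfile ($html_file);  // cache hit: no XML processing at all
} else {
    $html = render_chapter ($xml_file);  // hypothetical wrapper around process_xml()
    file_put_contents ($html_file, $html);
    echo $html;
}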
Some of the changes introduced include:
- Red-letter (Jesus’ words in red; sketched after this list)
- Handling of quotation marks (IE doesn’t conform to the standards, but besides that, the XML I use is UTF-8-encoded and provides the quotation-mark sets as needed)
- (Corrected) handling of references, variances etc. Before the recursive code was introduced, I was missing any “notes” which bundled more than one reference inside them.
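On the red-letter point: strict OSIS marks Jesus’ speech with <q who="Jesus">; assuming my XML does something similar, the check inside the recursive handler looks roughly like this (a sketch – my tags differ from strict OSIS):

// Red-letter sketch: wrap quoted speech attributed to Jesus in a styled span.
if ($x->getName () == 'q' && (string) $x['who'] == 'Jesus') {
    echo '<span class="red-letter">';
    process_xml ($x, true);  // recurse into the quote's children
    echo '</span>';
}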
So it’s progress.
While on the subject of performance, I’ve been trying to work out some targets. How much memory will I allow this script to consume before it’s “too much”? How fast must it execute to remain acceptable?
Currently the figures for Mark chapter 1 come in at:
- 0.013s execution time
- 0.18MB peak memory usage
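(Both figures are easy to capture with PHP’s built-ins – something like this sketch:)

// Measurement sketch: wall-clock time plus peak memory for one request.
$start = microtime (true);
process_xml ($chapter, true);  // $chapter: Mark 1, already loaded as SimpleXML
printf ("%.3fs execution time\n", microtime (true) - $start);
printf ("%.2fMB peak memory usage\n", memory_get_peak_usage () / 1048576);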
Interestingly, memory usage jumped up hugely after a tiny change in the recursive function (I can’t even remember what it was now). I thought about it for a moment at the time, and decided it actually makes a lot of sense: each recursive call is shoving data onto the stack, eating memory and generally doing a whole lot of “weight throwing”. I went as far as jotting down my expectation of memory usage per iteration: a very wide but short kind of bell curve, shooting up initially as the XML’s full depth is reached, then bobbing up and down, and finally tailing off rapidly as the end is reached.
Due to my geeky nature, I recorded the memory usage over Mark 1 and went to graph it. But I never got that far – the results were quite clear just from looking at them: an immediate shoot upward (9KB across the first 9 iterations), then a very slow increase (about 1.5KB every 20 iterations) thereafter. And it never decreased.
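(Recording it needs only a line inside process_xml()’s loop, along these lines – a sketch, with an arbitrary log destination:)

// Memory-trace sketch: one line per iteration, for graphing later.
static $iteration = 0;
file_put_contents ('mem.log',
    sprintf ("%d\t%d\n", ++$iteration, memory_get_usage ()), FILE_APPEND);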
PHP, I believe, uses a garbage collector, and presumably it considers my usage so low that running it mid-script would do more harm than good. Fair play!
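(If I ever want to test that theory: PHP 5.3+ exposes the cycle collector directly, so one could force a collection mid-run and see whether reported usage drops – a sketch:)

// GC sketch (PHP 5.3+): force the cycle collector mid-run.
$before = memory_get_usage ();
$cycles = gc_collect_cycles ();  // returns the number of cycles collected
printf ("collected %d cycles, freed %d bytes\n", $cycles, $before - memory_get_usage ());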
My concern really boils down to this: let’s assume that this project really takes off. If we aim to handle 100 requests per second (that’s around 40 times what Wikipedia’s English wiki received on average in Q1 2003)… I want to service each request within 0.1s, and not exceed 512MB of memory, on a dual-core Intel (comparable to what I’m running on right now).
I’m working on the grossly oversimplified (and wildly inaccurate) calculation:
MaxHits = Min((CpuCores/ExecutionTime),(MemoryLimit/PeakMemoryUsage))
Currently that stands at:
MaxHits = Min((2/0.013),(512/0.26)) = 154
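Or as code (the same sum, with units of seconds and MB):

// Capacity sketch: throughput is capped by whichever runs out first,
// CPU time or memory.
function max_hits ($cores, $exec_time, $mem_limit, $peak_mem) {
    return min ($cores / $exec_time, $mem_limit / $peak_mem);
}
echo round (max_hits (2, 0.013, 512, 0.26));  // 154 – execution time is the bound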
So we’re in. We’re limited hugely by the execution time. I can afford to go up as far as 0.02s per request, but that’s it.
We’ll see what happens!
Hi,
I had the exact same thoughts a few months ago and came up with a solution similar to what you’re thinking of (still in progress, though).
There are some Cloud Bible apps, but I felt very uncomfortable using them, as they were very confusing and browsing around needed lots of clicks. I tried to overcome this problem in my app.
Please have a look.
http://www.cloudstudybible.com/
John.