[viff-devel] Mystery of the quadratic running time solved?

Marcel Keller mkeller at cs.au.dk
Tue Mar 10 10:24:07 PDT 2009


>>> I think we would get the same result if we started a LoopingCall that
>>> executes process_deferred_queue with an interval of, say, 100 ms:
>>>
>>>   http://twistedmatrix.com/documents/8.2.0/api/twisted.internet.task.LoopingCall.html
>>>
>>> This should work since the runUntilCurrent method runs through the
>>> waiting calls and will trigger our process_deferred_queue method.
>>>
>>> And voilà -- no hacking of the Twisted source needed.
>> I'm not sure, but LoopingCall._reschedule() looks more like it
>> schedules the calls at certain ticks, not as soon as possible after the
>> interval has elapsed. This might not cost too much time but still
>> doesn't feel very elegant. Furthermore, setting the interval very low
>> leads to high CPU usage when waiting. Again, this is not too bad but
>> not elegant either. The same applies if using reactor.callLater()
>> directly.
> 
> A looping call is just a higher level wrapper for doing
> 
>   def reschedule(func):
>     func()
>     reactor.callLater(interval, reschedule, func)
> 
>   reschedule(func)
> 
> It will execute the function when the (now + interval) time has been
> reached and when the control flow returns to the reactor's event loop.
> We probably won't need the extra logic in a looping call, so we can just
> let the function reschedule itself like above.

That's what I meant with calling reactor.callLater() directly.

> If we do this with an interval of 0, then the function will be called on
> each iteration through the reactor's event loop -- just like your
> loopCall I believe?

Not exactly, because it also sets the timeout of the select call to 0,
leading to 100% CPU usage while we are waiting.
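
To illustrate what I mean, here is a simplified model of how an event
loop picks the select timeout from its pending delayed calls. It is only
a sketch and not Twisted's actual code; select_timeout and due_times are
names made up for this example.

```python
def select_timeout(due_times, now):
    """Toy model: how long the event loop may block in select().

    due_times holds the absolute times of pending delayed calls
    (hypothetical input). None means "block until there is I/O".
    """
    if not due_times:
        return None
    # Never block past the earliest delayed call, and never go negative.
    return max(0.0, min(due_times) - now)


now = 10.0
idle = select_timeout([], now)              # nothing scheduled: sleep until data arrives
polling = select_timeout([now + 0.5], now)  # a call due in 0.5 s: sleep at most that long
busy = select_timeout([now + 0.0], now)     # a call with delay 0: select returns at once
```

In this model, a function that reschedules itself with delay 0 keeps the
timeout at 0 on every iteration, so the loop never sleeps.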

>>>> diff -r e2759515f57f viff/runtime.py
>>>> --- a/viff/runtime.py	Thu Mar 05 21:02:57 2009 +0100
>>>> +++ b/viff/runtime.py	Fri Mar 06 13:43:14 2009 +0100
>>>> @@ -306,6 +306,8 @@
>>>>                  deferred = deq.popleft()
>>>>                  if not deq:
>>>>                      del self.incoming_data[key]
>>>> +                # just queue
>>>> +                self.factory.runtime.queue_deferred(deferred)
>>> Why is this done?
>> At this time, we shouldn't call the callbacks because we might recurse
>> into selectreactor.doSelect(). However, we want to know which
>> deferreds are ready so we can call deferred.callback() later.
> 
> Uhh, this sounds a bit dangerous -- do we know exactly which Deferreds
> we can invoke callback on and which we cannot? As I remember, we invoke
> callback in a couple of other locations, is that safe?

Yes, it is safe because the callback is called only once. When the data
arrives, the Deferreds are paused, appended to the queue, and the
callback is called. The Deferreds in the queue are unpaused and removed
in process_deferred_queue(). As far as I know you can pause and unpause
Deferreds as you like.

>>> If that doesn't matter, then I think this would be faster:
>>>
>>>   queue, self.deferred_queue = self.deferred_queue, []
>>>   map(Deferred.unpause, queue)
>>>
>>> My idea is that looping over the list with map is faster than
>>> repeatedly popping items from the beginning (an O(n) operation).
>> But map() still would need O(n) time because that is the nature of
>> calling a function n times, isn't it? Maybe the function calls are
>> optimized but the code in the function still is called n times.
> 
> Each pop(0) call is an O(n) operation, so we get O(n^2) here -- it is an
> expensive way to loop through a list. And now that I look at it, using
> map will still unpause the Deferreds in the order as you added them.

OK, I wasn't aware that pop(0) is O(n), but I still think that the
complexities should be added resulting in running time O(n) again. Using 
a linked list would be more reasonable, of course.
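
For reference, a small sketch of the cost in question. It counts how many
elements a Python list has to shift when it is drained with pop(0), and
shows collections.deque (already used for incoming_data) as the
alternative. The function names are made up for this example, and the
shift count is a cost model rather than a measurement.

```python
from collections import deque


def drain_with_pop0(items):
    """Drain a list from the front and count the element shifts."""
    queue = list(items)
    out = []
    shifts = 0
    while queue:
        # pop(0) moves every remaining element one slot to the left.
        shifts += len(queue) - 1
        out.append(queue.pop(0))
    return out, shifts


def drain_with_deque(items):
    """Drain a deque from the front; popleft() shifts nothing."""
    queue = deque(items)
    out = []
    while queue:
        out.append(queue.popleft())
    return out


order, shifts = drain_with_pop0(range(1000))
same_order = drain_with_deque(range(1000)) == order
```

For n = 1000 the model gives n(n-1)/2 = 499500 shifts, so the per-pop
costs multiply with the number of pops instead of adding up to O(n).
Both variants hand out the items in arrival order.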

> The difference is then that anything added to the queue as a result of
> the unpause calls will be processed the next time the code is called.

Yes, and the Deferreds that were already in the queue would wait. I
considered it safer if the Deferreds are processed in the order they arrive.
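
The two behaviours being compared can be sketched with a toy queue of
plain callables. ToyQueue and its methods are hypothetical and stand in
for the runtime's queue of paused Deferreds; no Twisted code is involved.

```python
from collections import deque


class ToyQueue:
    """Hypothetical stand-in for the runtime's queue of paused Deferreds."""

    def __init__(self):
        self.pending = deque()
        self.log = []

    def add(self, name, child=None):
        def task():
            self.log.append(name)
            if child is not None:
                self.add(child)  # work that shows up while we are processing
        self.pending.append(task)

    def process_snapshot(self):
        # Swap the queue out first: new work waits for the next call.
        batch, self.pending = self.pending, deque()
        for task in batch:
            task()

    def process_until_empty(self):
        # Pop from the front until nothing is left: new work runs in this call.
        while self.pending:
            self.pending.popleft()()


a = ToyQueue()
a.add("first", child="late")
a.add("second")
a.process_snapshot()

b = ToyQueue()
b.add("first", child="late")
b.add("second")
b.process_until_empty()
```

After these calls a.log is ["first", "second"] with "late" still pending,
while b.log is ["first", "second", "late"]. Both variants keep the order
of arrival; they only differ in when the late item is handled.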

>>> A question springs to my mind: calling
>>>
>>>   reactor.runUntilCurrent()
>>>   reactor.doIteration(0)
>>>
>>> is the same as calling
>>>
>>>   reactor.iterate(0)
>>>
>>> and the documentation for that method says:
>>>
>>>   [...] This method is not re-entrant: you must not call it recursively;
>>>   in particular, you must not call it while the reactor is running.
>>>
>>> How does your code ensure that we only call myIteration when we're
>>> not in a call made by the reactor? And could we simply call
>>> reactor.iterate instead?
>> We actually call it recursively but it should be reentrant if it's not
>> called from doIteration(). doIteration() is the same as
>> select.doSelect(), which certainly is not reentrant. We however call
>> it from the loop call (process_deferred_queue()) after doIteration().
>>
>> Calling reactor.iterate() is not enough because it doesn't call
>> process_deferred_queue().
> 
> So if we call reactor.iterate and then runtime.process_deferred_queue as
> 
>   reactor.callLater(0, reactor.iterate)
>   reactor.callLater(0, runtime.process_deferred_queue)
> 
> we should be fine? They would then both run one after another just as
> soon as the event loop is reached.

Isn't that more or less the same as your definition of reschedule()
above? Again, I would say that callLater(0, ...) sets the timeout of the
select call to 0, resulting in 100% CPU usage while we are waiting.

> My goal is to violate as few Twisted docstrings as possible :-) And to
> use the tools provided by Twisted as much as possible.

Sure, it is always wise to use libraries in the way they're meant to be
used. I'm in favor of the two-thread solution because I think that it can
be implemented without any changes to Twisted.

> I would also like to hear what the Twisted guys have to say about
> calling reactor.iterate like this, it would be nice if they could say
> that it *is* in fact safe to do it like this.

Jean-Paul Calderone wrote in his reply to my mail that there is a way to
make reactor.run() return without stopping the reactor. Only then is a
call to reactor.iterate() safe.

