[viff-devel] Profiling the remaining 80%
Martin Geisler
mg at daimi.au.dk
Mon Dec 8 05:46:18 PST 2008
Hi Peter, hi everybody
You might have seen the graphs here:
http://viff.dk/files/cace-wp4-amsterdam-2008.pdf
The one where we short-circuit the Shamir sharing and recombination
shows that "the rest" still take up some 80% of the running time.
Running the benchmark with the time command gives:
./benchmark.py player-1.ini -c 10000
19.57s user 0.50s system 86% cpu 23.296 total
This is the time for the entire benchmark, including the initial
sharing of input numbers and a 3 second count-down. The numbers you
see in the graphs excludes this -- they only time the secure
multiplications themselves.
If I understand these numbers correctly, then the benchmark used 19.57
CPU seconds in user space, 0.50 seconds in kernel space and 23.296
seconds in total. As mentioned, the player sleeps for 3 seconds after
initialising the TCP connections, this is just so that I can see that
all three terminals are roughly synchronized.
But does that mean that almost no time was spend waiting for the
network?
Running the benchmark with the --profile flag gives:
2131588 function calls (1834818 primitive calls) in 30.045 CPU seconds
Ordered by: internal time, call count
List reduced from 256 to 40 due to restriction <40>
ncalls tottime percall cumtime percall filename:lineno(function)
166690 4.995 0.000 22.401 0.000 defer.py:306(_runCallbacks)
72 3.114 0.043 15.685 0.218 selectreactor.py:82(doSelect)
The line above says that we spend 3.1 secs in the doSelect function in
Twisted. I tried running the benchmarks with a countdown of 10
seconds, and the time spend in doSelct went up to 10.119 sec. So only
about 0.114 sec are actually spend waiting for network traffic.
90013 1.802 0.000 2.418 0.000 field.py:371(__mul__)
16668 1.631 0.000 4.962 0.000 shamir.py:30(share)
The Shamir sharing method is one of the blocks which we think could be
accelerated by dedicated NaCl code. But maybe not since each call
takes less than 1 ms?
203362 1.545 0.000 13.071 0.000 defer.py:168(addCallbacks)
20004 1.466 0.000 4.233 0.000 runtime.py:189(__init__)
260039 1.281 0.000 1.281 0.000 field.py:339(__init__)
80004 0.919 0.000 1.233 0.000 field.py:342(__add__)
33336 0.706 0.000 0.706 0.000 runtime.py:579(_expect_data)
10000 0.689 0.000 1.799 0.000 shamir.py:96(recombine)
104 0.682 0.007 12.319 0.118 basic.py:345(dataReceived)
33336 0.680 0.000 2.065 0.000 runtime.py:613(_expect_share)
33338 0.667 0.000 11.637 0.000 runtime.py:272(stringReceived)
33340 0.658 0.000 1.887 0.000 runtime.py:311(sendData)
70019 0.619 0.000 11.009 0.000 defer.py:283(_startRunCallbacks)
20000 0.616 0.000 3.899 0.000 passive.py:389(shamir_share)
70019 0.562 0.000 11.444 0.000 defer.py:229(callback)
113348 0.533 0.000 11.673 0.000 defer.py:185(addCallback)
70010 0.512 0.000 1.078 0.000 runtime.py:66(__init__)
33342 0.497 0.000 0.913 0.000 abstract.py:164(write)
10000 0.483 0.000 9.848 0.001 passive.py:163(share_recombine)
80006 0.439 0.000 3.628 0.000 runtime.py:213(_callback_fired)
30006 0.404 0.000 3.577 0.000 runtime.py:597(_exchange_shares)
23336 0.358 0.000 0.703 0.000 random.py:148(randrange)
23336 0.345 0.000 0.345 0.000 random.py:218(_randbelow)
30002 0.331 0.000 17.727 0.001 runtime.py:358(inc_pc_wrapper)
33342 0.316 0.000 1.230 0.000 basic.py:357(sendString)
20004 0.219 0.000 4.950 0.000 runtime.py:221(gather_shares)
33349 0.205 0.000 0.417 0.000 tcp.py:260(startWriting)
70018 0.202 0.000 0.202 0.000 defer.py:162(__init__)
10000 0.200 0.000 13.496 0.001 passive.py:138(mul)
33336 0.197 0.000 0.317 0.000 runtime.py:615(<lambda>)
1 0.183 0.183 5.223 5.223 benchmark.py:197(begin)
105 0.167 0.002 0.167 0.002 tcp.py:364(writeSomeData)
33351 0.156 0.000 0.211 0.000 abstract.py:267(startWriting)
1 0.151 0.151 14.350 14.350 benchmark.py:247(run_test)
10000 0.141 0.000 10.808 0.001 runtime.py:539(schedule_callback)
33340 0.140 0.000 2.028 0.000 runtime.py:315(sendShare)
23336 0.130 0.000 0.833 0.000 random.py:212(randint)
10000 0.122 0.000 9.970 0.001 runtime.py:559(callback_wrapper)
./benchmark.py player-1.ini -c 10000 --profile
111.08s user 5.99s system 97% cpu 2:00.45 total
(The big user time here is because it takes ages to load the 11 MiB
profiling trace after the benchmark was run.)
Sigurd, could you try posting a profile report too? As far as I
remember, you got slightly more detail with your Mac version of
Python.
The above report is for player 1 since that player waits the least.
Try making a report with player 1 on your Mac and the other players on
DAIMI computers.
--
Martin Geisler
VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.
More information about the viff-devel
mailing list