Tuesday, September 13, 2016

DCMTK and 10 Gb/s

During testing on a 10 Gb/s network with the tools mentioned in the previous post I saw very low bandwidth utilization: ~1.5 Gb/s instead of the ~8 Gb/s I measured with iperf and with my own Java tool for network transfer speed measurement.

Investigation showed that DCMTK unfortunately always sets the TCP send and receive buffer sizes via setsockopt:

bufLen = 65536; // a socket buffer size of 64K gives best throughput for image transmission
...

(void) setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char *) &bufLen, sizeof(bufLen));
(void) setsockopt(sock, SOL_SOCKET, SO_RCVBUF, (char *) &bufLen, sizeof(bufLen));

This was probably a good approach before TCP autotuning was introduced, but on newer kernels (> 2.6) that support it, explicitly setting SO_SNDBUF or SO_RCVBUF switches the autotuning feature off for that socket, which results in terrible performance over 10 Gb/s (see https://www.psc.edu/index.php/networking/641-tcp-tune).
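To illustrate (a minimal sketch with plain POSIX sockets, not DCMTK code): reading the buffer size back with getsockopt shows the kernel's autotuned default first, then the pinned value after an explicit setsockopt (Linux reports back twice the requested size), after which autotuning no longer applies to that socket:

#include <cstdio>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    int bufLen = 0;
    socklen_t optLen = sizeof(bufLen);

    /* Kernel default; still subject to autotuning as the connection ramps up. */
    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char *) &bufLen, &optLen);
    printf("default SO_SNDBUF: %d\n", bufLen);

    /* Explicitly setting SO_SNDBUF pins the buffer (Linux reports back
       twice the requested value) and disables autotuning for this socket. */
    bufLen = 65536;
    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char *) &bufLen, sizeof(bufLen));
    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, (char *) &bufLen, &optLen);
    printf("pinned SO_SNDBUF:  %d\n", bufLen);

    close(sock);
    return 0;
}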


When I commented out these setsockopt lines in
./dcmnet/libsrc/dulfsm.cc
./dcmnet/libsrc/dul.cc
the newly built tools could reach ~5.2 Gb/s bandwidth utilization.

DCMTK adds 1 sec delay

When I measured transfer speed with a simple DCMTK server (C-MOVE SCP/C-STORE SCU) and client (C-MOVE SCU/C-STORE SCP), I was surprised to see that there is always a 1 sec delay before the actual C-STORE transfer starts.

Checking the source code revealed that in dimmove.cc the selectReadable function sets:

timeout = 1; /* poll wait until an assoc req or move rsp */

which is then passed, together with both the association and the sub-association in assocList, to:
ASC_selectReadableAssociation(assocList, assocCount, timeout)

That call leads to DcmTransportConnection::fastSelectReadableAssociation, which, instead of performing a single select on both the association and the sub-association, first selects on the association alone and only after that 1 sec timeout expires selects on the sub-association.

This one-second select timeout is experienced as 1 sec of extra latency by anyone who relies on the callback implementation for the C-STORE SCP on the client side.
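For comparison, a single select over both sockets would return as soon as either becomes readable, with no serialized waits. Here is a minimal sketch of that pattern in plain POSIX terms (assocFd and subAssocFd are assumed, already-connected socket descriptors; this is not DCMTK's actual code):

#include <stddef.h>
#include <sys/select.h>

/* Sketch: wait on BOTH sockets with one select call, so whichever
   becomes readable first is served immediately. Returns the readable
   descriptor, or -1 on timeout/error. */
int waitForReadable(int assocFd, int subAssocFd, int timeoutSec)
{
    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(assocFd, &fds);
    FD_SET(subAssocFd, &fds);

    struct timeval tv;
    tv.tv_sec = timeoutSec;
    tv.tv_usec = 0;

    int maxFd = (assocFd > subAssocFd) ? assocFd : subAssocFd;
    if (select(maxFd + 1, &fds, NULL, NULL, &tv) <= 0)
        return -1;                      /* timeout or error */
    if (FD_ISSET(subAssocFd, &fds))
        return subAssocFd;              /* incoming C-STORE sub-operation */
    return assocFd;                     /* e.g. a C-MOVE response */
}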

When I implemented the C-STORE SCP in its own thread (instead of using the callback in DIMSE_moveUser), this 1 sec delay obviously disappeared.
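Here is a hedged sketch of that workaround (the handleStoreRequests helper is hypothetical, standing in for the usual C-STORE SCP loop built from ASC_receiveAssociation and DIMSE_storeProvider; the DIMSE_moveUser call itself is elided):

#include <thread>
#include "dcmtk/dcmnet/dimse.h"   /* DCMTK association/DIMSE types */

/* Hypothetical helper: accepts the sub-association on 'net' and serves
   C-STORE requests in a loop; body elided in this sketch. */
static void handleStoreRequests(T_ASC_Network *net)
{
    (void) net;   /* ... ASC_receiveAssociation + DIMSE_storeProvider ... */
}

/* Sketch: drive the C-MOVE from the main thread while a dedicated thread
   receives the C-STORE sub-operations, so nothing is serialized behind
   the 1 sec select in the callback path. */
static void runMove(T_ASC_Network *net, T_ASC_Association *assoc)
{
    std::thread storeScp(handleStoreRequests, net);

    /* ... send the C-MOVE request on 'assoc' (DIMSE_moveUser with no
       sub-operation callback); parameters elided ... */
    (void) assoc;

    storeScp.join();
}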

Thursday, February 4, 2016

DCMTK through localhost is slow

Recently I created a small DICOM server application for testing purposes, built on 64-bit Ubuntu 15.10. As a first step I created a C-STORE SCU and SCP pair of applications, and I was happy with the performance I measured with the SCU accessing the SCP through the loopback interface (127.0.0.1): 1000 CT images in uncompressed explicit little endian format were received in less than a second. So far so good. I then created the C-MOVE SCU and SCP applications (which included the C-STORE parts as well) and executed the same tests. To my surprise it was extremely slow: it took about 5 minutes to receive the same data.

First I was concerned that I was doing something wrong in my code, having not much experience with DCMTK. After spending too much time reviewing my code without any results, I moved the C-STORE SCP part to a separate process and boom, got excellent performance. Another try, running the C-MOVE SCP and SCU on separate machines (or the C-MOVE SCU in a Docker container): again excellent performance. Moreover, if I used another C-MOVE SCU tool like gdcmscu, it also gave excellent performance over the loopback interface...

But DCMTK's movescu tool connecting to my C-MOVE SCP gave the very same slow result, so it seems to be a strange DCMTK-SCU-to-DCMTK-SCP behavior over the loopback interface.

All in all, after trying different things like disabling multi-threading support in DCMTK and profiling my SCU/SCP pair - which did not lead anywhere - my Google searches focused on people reporting slowness over the loopback interface, and I found this link useful:

http://stackoverflow.com/questions/5832308/linux-loopback-performance-with-tcp-nodelay-enabled

After setting

export TCP_NODELAY=0

for DCMTK (see ./dcmtk/src/dcmtk/config/docs/envvars.txt for the environment variables configuring the DCMTK runtime), the performance through loopback became significantly better (1 min instead of 4 mins), but still not as good as with the plain C-STORE SCP/SCU pair (1 sec).
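For reference, at the socket level this environment variable presumably controls whether DCMTK applies the TCP_NODELAY socket option: 1 disables Nagle's algorithm (small writes are sent immediately), 0 leaves it enabled (small writes may be coalesced before transmission). A minimal sketch of the underlying call, not DCMTK's code:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Sketch: apply the TCP_NODELAY setting to a connected socket.
   enabled = 1 disables Nagle's algorithm, enabled = 0 keeps it on. */
void applyNoDelay(int sock, int enabled)
{
    (void) setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                      (char *) &enabled, sizeof(enabled));
}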

After unsetting TCP_NODELAY I tried increasing the TCP buffer length instead; envvars.txt describes it as:

"TCP_BUFFER_LENGTH
Affected: dcmnet
Explanation: By default, DCMTK uses a TCP send and receive buffer length of 64K. If the environment variable TCP_BUFFER_LENGTH is set, it specifies an override for the TCP buffer length. The value is specified in bytes, not in Kbytes."

Long story short: after increasing the buffer length with

export TCP_BUFFER_LENGTH=129076

(129075 still results in slow performance), performance became excellent: 1 sec for 1000 CT images (instead of 5 mins).

The root cause is not clear yet. Why does it happen over the loopback interface and seemingly not over eth0?

Friday, January 1, 2016

Docear integration with PDFXCviewer through PlayOnLinux

Started exploring Docear (http://www.docear.org/), "a unique solution to academic literature management, i.e. it helps you organizing, creating, and discovering academic literature". Downloading and starting it on my Ubuntu 15.10 went without problems. Configuring it to use the recommended PDF viewer application (PDFXCviewer) needed some tweaking, maybe because I installed it with PlayOnLinux.

In Docear's preferences, under PDF Management, the following command needs to be given:

playonlinux*--run*PDFXCview*/A*page=$PAGE*`winepath -w "$FILE"`