Recently I created a small DICOM server application for testing purposes, building it on 64-bit Ubuntu 15.10. As a first step I created a C-STORE SCU and SCP pair of applications, and I was happy with the performance I measured with the SCU accessing the SCP through the loopback interface (127.0.0.1): 1000 CT images in uncompressed explicit little endian format were received in less than a second. So far so good. I then created the C-MOVE SCU and SCP applications (which included the C-STORE parts as well) and ran the same tests. To my surprise it was extremely slow: it took about 5 minutes to receive the same data.
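To put those timings in perspective, here is a rough back-of-the-envelope calculation. It assumes a typical 512 x 512, 16-bit CT slice (about 0.5 MiB of pixel data per image); the actual image size is not stated above, so treat the numbers as orders of magnitude only.

```python
# Rough throughput estimate. The image dimensions are an assumption,
# not something measured in the tests described above.
ROWS, COLS, BYTES_PER_PIXEL = 512, 512, 2   # assumed typical CT slice
IMAGES = 1000


def throughput_mib_per_s(seconds: float) -> float:
    """MiB/s needed to move all pixel data in the given time."""
    total_bytes = ROWS * COLS * BYTES_PER_PIXEL * IMAGES
    return total_bytes / (1024 * 1024) / seconds


fast = throughput_mib_per_s(1)        # C-STORE case: about 1 second
slow = throughput_mib_per_s(5 * 60)   # C-MOVE case: about 5 minutes
print(f"fast: {fast:.0f} MiB/s, slow: {slow:.2f} MiB/s")
```

Under that assumption the slow case moves well under 2 MiB/s, which is far below what a loopback interface can carry and points to stalls rather than a bandwidth limit.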
At first I was concerned that I was doing something wrong in my code, since I do not have much experience with DCMTK. After spending too much time reviewing my code without any results, I moved the C-STORE SCP part to a separate process and boom, got excellent performance. Another try, running the C-MOVE SCP and SCU on separate machines (or the C-MOVE SCU in a Docker container), and boom, again excellent performance. Plus, if I used another C-MOVE SCU tool such as gdcmscu, it also gave excellent performance over the loopback interface...
But DCMTK's movescu tool connecting to my C-MOVE SCP gave the very same slow result, so is this some kind of strange DCMTK SCU to DCMTK SCP behavior over the loopback interface?
All in all, after trying different things such as disabling multi-threading support in DCMTK and profiling my SCU/SCP pair, none of which led anywhere, I focused my Google searches on people reporting slowness over the loopback interface and found this link useful:
After setting the TCP_NODELAY environment variable for DCMTK (see ./dcmtk/src/dcmtk/config/docs/envvars.txt for the environment variables that configure DCMTK at runtime), the performance through loopback became significantly better (1 min instead of 4 mins), but still not as good as with the C-STORE SCP/SCU pair (1 sec).
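For context, TCP_NODELAY is also the name of the standard socket option that turns off Nagle's algorithm, which otherwise holds back small writes while earlier data is still unacknowledged. The sketch below only shows that socket option in plain Python; it is not DCMTK code, and how the DCMTK environment variable maps onto the option is described in envvars.txt and not restated here.

```python
import socket

# Toggle the TCP_NODELAY socket option on an unconnected TCP socket and
# read it back. Nothing is sent; this only shows the option itself.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    before = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    after = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

print("Nagle disabled before:", bool(before), "after:", bool(after))
```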
After unsetting TCP_NODELAY I tried increasing TCP_BUFFER_LENGTH, which envvars.txt describes as follows:
"Explanation: By default, DCMTK uses a TCP send and receive buffer
length of 64K. If the environment variable TCP_BUFFER_LENGTH is set,
it specifies an override for the TCP buffer length. The value is
specified in bytes, not in Kbytes."
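For reference, this is roughly what a TCP send and receive buffer length means at the socket level. The sketch is plain Python, not DCMTK code, and it assumes the buffer length ends up as the SO_SNDBUF and SO_RCVBUF socket options; that mapping is an assumption, not something confirmed here. The kernel may round, double (Linux does), or cap the requested size, so the value read back can differ from the request.

```python
import socket


def request_buffers(size: int) -> tuple[int, int]:
    """Ask the kernel for send/receive buffers of `size` bytes on a TCP
    socket and return what it actually granted as (send, receive)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
        snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv


# 65536 is the 64K default quoted above; the other two values are the
# ones tried in this investigation.
for requested in (65536, 129075, 129076):
    print(requested, request_buffers(requested))
```

Comparing the granted sizes for the two neighbouring values on the affected machine would be one way to check whether the kernel treats them differently.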
Long story short: after increasing the buffer length with
export TCP_BUFFER_LENGTH=129076 (129075 still results in slow performance)
the performance became excellent: 1 sec for 1000 CT images (instead of 5 mins).
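A sharp threshold like 129075 versus 129076 can be located with a binary search over the buffer length. The sketch below shows the idea only: `smallest_fast_value` and `fake_is_fast` are hypothetical names, and the predicate is a stand-in that mimics the observation above. A real predicate would run the transfer with TCP_BUFFER_LENGTH set to the candidate value and time it.

```python
from typing import Callable


def smallest_fast_value(lo: int, hi: int, is_fast: Callable[[int], bool]) -> int:
    """Binary search for the smallest value in (lo, hi] for which
    is_fast() is true, assuming is_fast(lo) is False, is_fast(hi) is
    True, and the behaviour flips exactly once in between."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fast(mid):
            hi = mid   # mid is fast, so the threshold is at or below mid
        else:
            lo = mid   # mid is slow, so the threshold is above mid
    return hi


def fake_is_fast(value: int) -> bool:
    """Stand-in for a real timed transfer (hypothetical)."""
    return value >= 129076


print(smallest_fast_value(65536, 262144, fake_is_fast))
```

With a real timed transfer as the predicate, each probe takes as long as one test run, so the search needs around 18 runs for the range used here.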
The root cause is not clear yet. Why does this happen over the loopback interface and seemingly not over eth0?