[gnutls-help] Regression bug between 2.x and 3.2?

Mon Jun 16 19:38:32 CEST 2014

Hello Nikos,

> Does making that change address your issue?

It's not that I cannot reissue gnutls_record_send() upon EAGAIN; it's that I can no longer distinguish
between two different write conditions that I formerly could:  whether send() failed to complete because
the internal device timeout expired, or merely because send() was not able to write everything at once.

Our push callback is an extended send()-like call, with only a simple addition:  if the socket is
not ready to accept data, it blocks only for a certain amount of time before failing with EAGAIN
(if the socket becomes ready sooner, it does send() and returns).  A blocking send() would
have blocked indefinitely waiting for the socket to become ready.  A non-blocking send() would have returned
immediately with EAGAIN.  So our callback is a half-blocking send(), which lets us control the time spent waiting.

Now compare:

OLD (2.4.2 in our case):

gnutls_record_send():
  push():
    send() > 0 ? return how many sent
    else wait for ready or timeout:
      ready ? send() and return how many sent
      timeout ? return -1 (EAGAIN)
  nothing written ? bail out with an error (including EAGAIN)
  not all written ? repeat push()
  else return success
error == EAGAIN ? timeout : some other error

NEW (3.2.13 for us):

gnutls_record_send():
   push():
     send() > 0 ? return how many sent
     else wait for ready or timeout:
       ready ? send() and return how many sent
       timeout ? return -1 (EAGAIN)
   not all written ? bail out with EAGAIN
   error ? bail out with an error (including EAGAIN)
error == EAGAIN ???

So the outer code can no longer figure out what the problem was (there may not have been one at all, actually).
The push callback cannot set any other errno, because only EAGAIN/EINTR are treated specially, with
all others being de facto fatal.

The new flow control has merged the two formerly distinct states into one.  There is no way to tell a partial
write apart from a time-out situation without rewriting the callback in a way that is no longer send()-like.

The new behavior is not very logical -- there is very little cost to making another pass when the
push callback cannot send everything, compared with a complete stack unwind up to the point of
gnutls_record_send() followed by redoing the entire sequence.  To work around the timeout issue, which is
what we have to distinguish in particular, we will have to track it all the way down with an additional
context (and even more CPU cycles), which was not necessary before.
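
For illustration only, the kind of additional context I mean could look roughly like the sketch below.
All the names here (struct push_state, push_note_result, classify_eagain) are hypothetical; nothing in
it is a GnuTLS API.  The push callback records whether its wait actually expired, and the outer code
consults that flag once gnutls_record_send() has reported EAGAIN.

```c
#include <stdbool.h>

/* What the outer code should conclude from an EAGAIN (hypothetical names). */
enum send_outcome {
    SEND_RETRY_NOW,  /* no timeout seen:  just a partial write, reissue the send */
    SEND_TIMED_OUT   /* the push callback really waited out its timeout */
};

/* Extra state shared between the push callback and the outer code. */
struct push_state {
    bool timed_out;  /* true if the last push call ended in a timeout */
};

/* Called by the push callback at the end of every call. */
static void push_note_result(struct push_state *st, bool timed_out)
{
    st->timed_out = timed_out;
}

/* Called by the outer code after the record-layer send reported EAGAIN.
 * Reads and clears the flag so that a stale timeout is never reported twice. */
static enum send_outcome classify_eagain(struct push_state *st)
{
    bool timed_out = st->timed_out;
    st->timed_out = false;
    return timed_out ? SEND_TIMED_OUT : SEND_RETRY_NOW;
}
```

This is the bookkeeping that the old flow made unnecessary, since there an EAGAIN from
gnutls_record_send() could only mean a timeout.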

Anton Lavrentiev
Contractor NIH/NLM/NCBI