Background

https://zhuanlan.zhihu.com/p/337199868

Bad Case

cat /tmp/shell/data.bin |\
    tee >(ssh localhost "cat - > /dev/null") |\
    cat > /dev/null

If the data.bin is big enough (e.g. generated via head -$((200000)) /dev/urandom > data.bin), then it might failed with error:

tee: ‘standard output’: Resource temporarily unavailable

Good Case

cat /tmp/shell/data.bin |\
    tee >(ssh localhost "cat - > /dev/null" | cat) |\
    cat > /dev/null

or

cat /tmp/shell/data.bin |\
    tee >(ssh localhost "cat - > /dev/null" > /dev/null) |\
    cat > /dev/null

Deep Analysis

Let’s first modify the script a bit to help us get more information:

cat /tmp/shell/data.bin |\
    tee >(ssh localhost "cat - > /dev/null") |\
    cat > /dev/null &

pstree -pa $$

Run it:

💤 strace -ff -o strace ./a.sh
a.sh,144826 ./a.sh   
  ├─cat,144827 /tmp/shell/data.bin
  ├─cat,144829     
  ├─pstree,144831 -pa 144826
  └─tee,144828 /dev/fd/63
      └─a.sh,144830 ./a.sh
          └─ssh,144832 localhost cat - > /dev/null

Now we get the strace for each process, let’s start from the root process a.sh,144826:

...
pipe([3, 4])                            = 0
...
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f5f81798e50) = 144827
...
close(4)                                = 0
close(4)                                = -1 EBADF (Bad file descriptor)
pipe([4, 5])                            = 0
...
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f5f81798e50) = 144828
...

The last line is the starting point of tee process, at which point, it has following entires in the file entry table:

fd description
0 stdin
1 stdout
2 stderr
3 output of cat /tmp/shell/data.bin
4 pipe read end (for | cat > /dev/null)
5 pipe write end (for | cat > /dev/null)

Now let’s look into tee,144828:

...
close(4)                                = 0
dup2(3, 0)                              = 0
close(3)                                = 0
dup2(5, 1)                              = 1
close(5)                                = 0
...
pipe([3, 4])                            = 0
...
dup2(4, 63)                             = 63
close(4)                                = 0
...
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f5f81798e50) = 144830
...

The last line is the starting point of a.sh,144830, at which point, it has following entires in the file entry table:

fd description
0 output of cat /tmp/shell/data.bin
1 pipe write end (for | cat > /dev/null)
2 stderr
3 pipe read end (for >())
63 pipe write end (for >())

Now let’s look into a.sh,144830:

...
dup2(3, 0)                              = 0
close(3)                                = 0
close(63)                               = 0
...
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f5f81798e50) = 144832
...

The last line is the starting point of ssh,144832, at which point, it has following entires in the file entry table:

fd description
0 pipe read end (for >())
1 pipe write end (for | cat > /dev/null)
2 stderr

Now let’s look into ssh,144832:

...
dup(0)                                  = 4
dup(1)                                  = 5
dup(2)                                  = 6
fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
fcntl(5, F_SETFL, O_WRONLY|O_NONBLOCK)  = 0
openat(AT_FDCWD, "/dev/null", O_WRONLY) = 7
dup2(7, 1)                              = 1
close(7)                                = 0
...

At this point, the file entry table is as below:

fd description
0 pipe read end (for >())
1 /dev/null
2 stderr
3 some socket (not important)
4 pipe read end (for >())
5 pipe write end (for | cat > /dev/null)

Notice that ssh also set O_NONBLOCK on fd4 and fd5, which in this case is the pipe read end (for >()) and pipe write end (for | cat > /dev/null). The formmer one only affects the ssh itself, however, the latter one is also used by tee!

Summary

In Linux, we have following kernel data structure in turns of duplicated file entries:

kernel data structure

(From APUE3 3.10)

Regarding the file table:

The kernel maintains a file table for all open files. Each file table entry contains

  1. The file status flags for the file, such as read, write, append, sync, and nonblocking
  2. The current file offset
  3. A pointer to the v-node table entry for the file

(From APUE3 3.10)

kernel data structure2

(From CSAPP2 10.6)

File table: The set of open files is represented by a file table that is shared by all processes. Each file table entry consists of (for our purposes) the current file position, a reference count of the number of descriptor entries that currently point to it, and a pointer to an entry in the v-node table. Closing a descriptor decrements the reference count in the associated file table entry. The kernel will not delete the file table entry until its reference count is zero.

(From CSAPP2 10.6)

Back into our case, when ssh set the O_NONBLOCK flag onto fd5, it actually modifies the file table of the pipe, which in turns affects tee as tee is another user of the pipe (via its fd1).

So this explains that when tee write the content forward to its stdout, because the stdout is now in non-blocking mode, it might return EAGAIN when the pipe is fed up. In this case, because tee do not have retry logic, it will just error out.

How About the Good Case

So let’s switch to see why does the Good Case will work. Let’s look into the strace of the ssh:

(the following output is from the 2nd Good Case)

...
openat(AT_FDCWD, "/dev/null", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
dup2(3, 1)                              = 1
close(3)                                = 0
...
dup(0)                                  = 4
dup(1)                                  = 5
dup(2)                                  = 6
fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
fcntl(5, F_SETFL, O_WRONLY|O_NONBLOCK)  = 0
...

You can see that before ssh doing all the dup and setting, it will firstly duplicate fd1 to a newly opened file : /dev/null. So later, when it dup fd1 again and do setting on it, ssh is actually setting the file /dev/null, not the pipe used by tee.