What's going on in there, feat. tmux

One of the things that I have occasionally struggled with is figuring out what is going on at the interface between terminal and shell. It doesn't help that I spend most of my time inside tmux, which is itself a terminal application that implements terminal emulation among a bunch of other features.

Once you dig into the command line, you find that different environments have different quirks, and things get more complicated still with tmux in the picture. When you type something into your shell and it doesn't behave the way you want, troubleshooting can be daunting, because you have a pretty complex process hierarchy and manipulations could be happening at every layer of it.

Let's look at the process tree of a typical tmux setup. I'm going to assume the programs that I've traditionally used (that is, alacritty as the terminal emulator), but it won't be any different with a different stack.

This is not verbatim output from pstree; I cleaned it up so that only relevant information is shown. The part that matters is the structure of the process hierarchy.

In this example we have tmux open with a window split into two panes, each running a terminal.

init (or launchd)
├─ tmux: server
│  ├─ zsh
│  └─ zsh
└─ alacritty
   └─ zsh
      └─ tmux attach

tmux has a client-server design, and that is what we're seeing in action here. First we see two zsh instances running under the server. These are the actual shells of the terminals that we are going to interact with inside tmux.

The other zsh, running under alacritty, is just the shell that alacritty spawns on launch, and we ran tmux attach inside it to attach to the tmux server. So the tmux instance at the bottom is a tmux client. The client-server architecture is what lets us create a tmux session, detach from it, and reattach later, locally or over SSH. This way I can keep terminal sessions persistent on the system and SSH into the machine over the network or internet to get at my ongoing command line state at any time.
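
To make the detach-and-reattach idea concrete, here is a toy sketch in Python. It is not how tmux is implemented; it only borrows the one thing I'm sure of, which is that tmux clients talk to the server over a Unix domain socket. The names `serve`, `attach_and_send`, and `demo` are made up for illustration. The point is that the state lives in the server, so a second client that connects later still sees what the first one did.

```python
import os
import socket
import tempfile
import threading


def serve(sock_path, ready, client_count):
    """Toy server: remembers everything any client has ever sent."""
    history = []  # state that outlives every individual client
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(sock_path)
        srv.listen()
        ready.set()  # tell the caller it is safe to connect
        for _ in range(client_count):  # serve a fixed number of clients, then exit
            conn, _addr = srv.accept()
            with conn:
                history.append(conn.recv(1024).decode())
                conn.sendall("\n".join(history).encode())


def attach_and_send(sock_path, text):
    """Toy client: connect, send one line, return the server's full history."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(sock_path)
        cli.sendall(text.encode())
        return cli.recv(4096).decode()


def demo():
    sock_path = os.path.join(tempfile.mkdtemp(), "toy.sock")
    ready = threading.Event()
    server = threading.Thread(target=serve, args=(sock_path, ready, 2), daemon=True)
    server.start()
    ready.wait()
    first = attach_and_send(sock_path, "ls")    # first client, then "detach"
    second = attach_and_send(sock_path, "pwd")  # a brand-new client "reattaches"
    server.join()
    return first, second
```

Calling `demo()` returns what each client saw. The second client gets both lines back, even though it never sent the first one.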

Now let's follow the flow of information as we start typing into the alacritty window.

Keystrokes are potentially modified by the terminal emulator's own configuration. alacritty's configuration file lets us specify key mappings to tweak the behavior to our needs. For example, shift+tab has no official terminal standard and does not come mapped by default in many terminals, but a de facto convention exists in terminal land: the three-character escape sequence <Esc> [ Z. So my alacritty config includes, among other things, the keyboard mapping { key = "Tab", mods = "Shift", chars = "\u001b[Z" }.
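
As a rough mental model of that mapping step, here is a small Python sketch. The tables and the `encode_key` function are hypothetical and have nothing to do with alacritty's real internals. They only show the idea of user bindings being consulted before the defaults, and what bytes the shift+tab mapping above ends up producing.

```python
# Hypothetical model of a terminal emulator's key-binding lookup.
# Keys are (key name, set of modifiers); values are the characters to send.
DEFAULT_BINDINGS = {
    ("Tab", frozenset()): "\t",
}

USER_BINDINGS = {
    # Same mapping as the config entry above: shift+tab sends <Esc> [ Z
    ("Tab", frozenset({"Shift"})): "\u001b[Z",
}


def encode_key(key, mods=()):
    """Return the bytes sent to the application, or None if nothing is mapped."""
    combo = (key, frozenset(mods))
    for table in (USER_BINDINGS, DEFAULT_BINDINGS):  # user config wins
        if combo in table:
            return table[combo].encode()
    return None
```

With these tables, `encode_key("Tab", ["Shift"])` gives the three bytes `b"\x1b[Z"`, while a plain `encode_key("Tab")` gives `b"\t"`.
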
The keystroke stream leaves the terminal emulator through a pseudo-terminal (pty), and the zsh running directly under the terminal emulator sits on the other end of it. Typically, as explained in the preceding post, the shell's responsibilities include reading that input and managing the command line, but in the current state we are running a tmux client from this shell. While the client is in the foreground, the shell just waits for it to exit, and the client reads the pty's input itself. Terminal applications sometimes need to take over the entire terminal like this; Vim and less are other examples. Taking over involves the application, the kernel's terminal driver, and the terminal emulator working together. I don't know every detail of who is responsible for what, but broadly speaking there are two state changes that a full-screen terminal program will trigger:
- First, the terminal emulator is switched to the alternate screen by an escape sequence that the program prints. In this mode the existing scrollback of the shell session is hidden and a blank buffer is shown, and data pushed off the top does NOT enter the scrollback.
- Second, the terminal is switched into raw mode. This one is not an escape sequence: the program asks the kernel's terminal driver for it through the termios interface. Normally the driver works in line-oriented "cooked" mode, echoing what you type, holding the line until you hit enter, and turning keys such as Ctrl+C and Ctrl+Backslash into signals (Ctrl+Backslash sends SIGQUIT, which makes it a handy emergency shortcut for terminating a stuck process). In raw mode all of that is turned off, so every byte, including those control keys, is delivered untouched to the running application. The shell's prompt editing and command launching are out of the picture too, simply because the shell is not reading input while its child is in the foreground. Since the tmux client needs to provide the entire tmux interface through this terminal, it runs in this mode and sees the whole input stream. In the general case, keep in mind that the shell also has its own configuration for how keys get mapped, mostly for customizing how the command prompt behaves.
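
Both state changes can be poked at from Python. This is a sketch under a couple of assumptions: the alternate-screen sequences are the xterm-style ones (private mode 1049), which is what most modern terminals accept, and raw mode is demonstrated on a throwaway pty created with `pty.openpty()` rather than on your real terminal. The helper names `line_flags` and `raw_mode_demo` are mine.

```python
import os
import pty
import termios
import tty

# xterm-style escape sequences a full-screen program prints to the terminal.
ENTER_ALT_SCREEN = b"\x1b[?1049h"
LEAVE_ALT_SCREEN = b"\x1b[?1049l"


def line_flags(fd):
    """Report the terminal-driver flags that cooked mode depends on."""
    lflag = termios.tcgetattr(fd)[3]  # index 3 holds the local-mode flags
    return {
        "ICANON": bool(lflag & termios.ICANON),  # line buffering
        "ECHO": bool(lflag & termios.ECHO),      # echo typed characters
        "ISIG": bool(lflag & termios.ISIG),      # turn Ctrl+C, Ctrl+\ into signals
    }


def raw_mode_demo():
    """Switch a fresh pty to raw mode and push Ctrl+Backslash through it."""
    master_fd, slave_fd = pty.openpty()
    try:
        before = line_flags(slave_fd)
        tty.setraw(slave_fd)              # what a full-screen program asks for
        after = line_flags(slave_fd)
        os.write(master_fd, b"\x1c")      # the terminal emulator side "types" Ctrl+\
        received = os.read(slave_fd, 16)  # the application side reads it
        return before, after, received
    finally:
        os.close(master_fd)
        os.close(slave_fd)
```

After `tty.setraw`, all three flags come back as off, and the Ctrl+Backslash byte (`0x1c`) arrives on the application side as plain data instead of being turned into a signal.
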
The tmux client consumes the keyboard input directly and passes it on to the tmux server. The client is also responsible for drawing everything the user sees (the tmux child terminals and the bits of tmux interface). I may have one tmux client on the same machine and another one SSH'd into it from another machine. I'll make a pstree example to show this later.
The tmux server is responsible for driving its own terminals, which are running zsh as we discussed earlier. There is another layer of configuration here. Just like the terminal emulator, tmux is itself emulating the terminals that we run inside of it, so all of the attendant configurability is available here as well. Here is an illustrative example of something I configure for tmux:
```
bind -n C-h if -F '#{||:#{==:1,#{window_panes}},#{||:#{m/r:^N?VIM,#{pane_title}},#{&&:#{window_zoomed_flag},#{==:ssh,#{pane_current_command}}}}}' "send-keys C-h" "select-pane -L"
```
As you can see, tmux has a powerful set of logical conditions that can be used to control behavior contextually. This configuration line specifies the following logic:
```
When ctrl+h is received by tmux,
if (window_panes == 1 OR (pane_title starts with "NVIM" or "VIM") OR (window_zoomed_flag is set AND pane_current_command is ssh))
  send ctrl + h to child
else
  select the pane to the left
```
This is how I implement a streamlined keyboard navigation interface in tmux. If I'm in vim inside tmux, it sends the ctrl+h through to vim, and if I'm in a regular tmux pane (e.g. running a shell), it moves focus to the pane to the left.
Finally, the keystrokes land in the actual shell that we've got open inside tmux.
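
If the nested tmux format syntax is hard to read, the same decision can be written out as a plain Python function. This is only a restatement of the condition for checking your understanding; tmux evaluates the format string itself, and `on_ctrl_h` is a made-up name. The parameters mirror the tmux format variables used in the binding.

```python
import re


def on_ctrl_h(window_panes, pane_title, window_zoomed_flag, pane_current_command):
    """Return the tmux command the binding above would run for ctrl+h."""
    only_one_pane = window_panes == 1
    # m/r:^N?VIM is a regex match anchored at the start of the pane title
    vim_in_pane = re.search(r"^N?VIM", pane_title) is not None
    zoomed_ssh = bool(window_zoomed_flag) and pane_current_command == "ssh"

    if only_one_pane or vim_in_pane or zoomed_ssh:
        return "send-keys C-h"   # pass the key through to the program in the pane
    return "select-pane -L"      # otherwise move focus to the pane on the left
```

For example, `on_ctrl_h(2, "NVIM main.c", False, "nvim")` passes the key through, while `on_ctrl_h(2, "zsh", False, "zsh")` moves focus left. A title like `"my VIM"` does not count as vim, because the pattern is anchored to the start of the title.
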

init (or launchd)
├─ tmux: server (4)
│  ├─ zsh (5)
│  └─ zsh
└─ alacritty (1)
   └─ zsh (2)
      └─ tmux attach (3)

The diagram above repeats the process tree, with numbers marking the order of the steps we just walked through.