
Conversation

@qiluo-msft
Collaborator

Why I did it

Many docker-*-sidecar containers created in the future will potentially reuse the code inside the docker-telemetry-sidecar folder. So this change extracts the common Python/Bash functions used in docker-telemetry-sidecar into the common Python library (sonic-py-common) and the buildimage scripts folder.

Work item tracking
  • Microsoft ADO (number only):

How I did it

How to verify it

Passing unit tests.

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Qi Luo <qiluo-msft@users.noreply.github.com>
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft marked this pull request as ready for review January 8, 2026 22:42
Copilot AI review requested due to automatic review settings January 8, 2026 22:42
Contributor

Copilot AI left a comment

Pull request overview

This PR refactors common Python and Bash functions from docker-telemetry-sidecar into shared libraries for reuse across future sidecar containers. The refactoring extracts file synchronization, CONFIG_DB operations, and Kubernetes pod control logic into common modules.

Key Changes:

  • Created new sonic_py_common.sidecar_common Python module with utilities for file sync, CONFIG_DB operations, and subprocess management (see the usage sketch after this list)
  • Extracted Kubernetes pod control logic into a shared k8s_pod_control.sh script that can be parameterized by service name
  • Refactored systemd_stub.py to use the new common library instead of duplicated code
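
For orientation, a minimal usage sketch of the new module follows; the CONFIG_DB key "FEATURE|telemetry", its "TABLE|entry" separator, and the command passed to run() are illustrative assumptions, not taken from this PR.

# Hypothetical consumer of sonic_py_common.sidecar_common; the key format and
# the example command are assumptions for illustration only.
from sonic_py_common import sidecar_common

# Read one field from CONFIG_DB; returns None if the field or DB is unavailable.
state = sidecar_common.db_hget("FEATURE|telemetry", "state")

# Write a field back; db_hset returns True on success, False on failure.
if state is None:
    sidecar_common.db_hset("FEATURE|telemetry", "state", "enabled")

# Run a command and inspect (returncode, stdout, stderr).
rc, out, err = sidecar_common.run(["ls", "/etc/sonic"])
if rc != 0:
    print(f"listing failed: {err}")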

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file
File - Description
src/sonic-py-common/sonic_py_common/sidecar_common.py - New common library providing file sync, CONFIG_DB operations, and process execution utilities
src/sonic-py-common/tests/test_sidecar_common.py - Unit tests for the new common library
files/scripts/k8s_pod_control.sh - New shared Kubernetes pod control script parameterized by service name
dockers/docker-telemetry-sidecar/systemd_stub.py - Refactored to import and use functions from sidecar_common instead of defining them locally
dockers/docker-telemetry-sidecar/systemd_scripts/telemetry.sh - Simplified to delegate to shared k8s_pod_control.sh script
dockers/docker-telemetry-sidecar/cli-plugin-tests/test_systemd_stub.py - Updated to work with refactored code and patched common library functions
dockers/docker-telemetry-sidecar/Dockerfile.j2 - Added copying and permissions for new k8s_pod_control.sh script
rules/scripts.mk - Added k8s_pod_control.sh to build system
rules/docker-telemetry-sidecar.mk - Added k8s_pod_control.sh to container files and added unit test target


Comment on lines +92 to +152
def db_hget(key: str, field: str) -> Optional[str]:
    """Get a single field value from CONFIG_DB hash."""
    db = _get_config_db()
    if db is None:
        return None
    try:
        table, entry_key = _split_redis_key(key)
        entry: Dict[str, str] = db.get_entry(table, entry_key)
    except Exception as e:
        logger.log_error(f"db_hget failed for {key} field {field}: {e}")
        return None

    val = entry.get(field)
    if val is None or val == "":
        return None
    return val


def db_hgetall(key: str) -> Dict[str, str]:
    """Get all field-value pairs from CONFIG_DB hash."""
    db = _get_config_db()
    if db is None:
        return {}
    try:
        table, entry_key = _split_redis_key(key)
        entry: Dict[str, str] = db.get_entry(table, entry_key)
        return entry or {}
    except Exception as e:
        logger.log_error(f"db_hgetall failed for {key}: {e}")
        return {}


def db_hset(key: str, field: str, value: str) -> bool:
    """Set a single field value in CONFIG_DB hash."""
    db = _get_config_db()
    if db is None:
        return False
    try:
        table, entry_key = _split_redis_key(key)
        entry: Dict[str, str] = db.get_entry(table, entry_key)
        entry[field] = value
        db.set_entry(table, entry_key, entry)
        return True
    except Exception as e:
        logger.log_error(f"db_hset failed for {key} field {field}: {e}")
        return False


def db_del(key: str) -> bool:
    """Delete an entry from CONFIG_DB."""
    db = _get_config_db()
    if db is None:
        return False
    try:
        table, entry_key = _split_redis_key(key)
        # In ConfigDBConnector, setting an empty dict deletes the key
        db.set_entry(table, entry_key, {})
        return True
    except Exception as e:
        logger.log_error(f"db_del failed for {key}: {e}")
        return False

Copilot AI Jan 8, 2026

The db_hget, db_hgetall, db_hset, and db_del functions lack test coverage in the new test suite. While these functions are tested indirectly through the systemd_stub tests, the common library should have dedicated tests for these database operations to ensure they work correctly in isolation and handle edge cases properly.
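
For illustration, a sketch (not part of this PR) of what dedicated db_hget tests could look like; _get_config_db and _split_redis_key are patched, so no real CONFIG_DB connection or key-format assumption is needed.

# Hypothetical pytest-style tests; the "FEATURE|telemetry" key is only a label
# because _split_redis_key is stubbed out.
from unittest import mock

from sonic_py_common import sidecar_common


def test_db_hget_returns_field_value():
    fake_db = mock.MagicMock()
    fake_db.get_entry.return_value = {"state": "enabled"}
    with mock.patch.object(sidecar_common, "_get_config_db", return_value=fake_db), \
         mock.patch.object(sidecar_common, "_split_redis_key",
                           return_value=("FEATURE", "telemetry")):
        assert sidecar_common.db_hget("FEATURE|telemetry", "state") == "enabled"


def test_db_hget_returns_none_for_missing_field():
    fake_db = mock.MagicMock()
    fake_db.get_entry.return_value = {}
    with mock.patch.object(sidecar_common, "_get_config_db", return_value=fake_db), \
         mock.patch.object(sidecar_common, "_split_redis_key",
                           return_value=("FEATURE", "telemetry")):
        assert sidecar_common.db_hget("FEATURE|telemetry", "state") is None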

Comment on lines +44 to +55
def run(args: List[str], *, text: bool = True, input_bytes: Optional[bytes] = None) -> Tuple[int, str | bytes, str | bytes]:
    """Run a subprocess command and return (returncode, stdout, stderr)."""
    logger.log_debug("Running: " + " ".join(args))
    p = subprocess.Popen(
        args,
        text=text,
        stdin=subprocess.PIPE if input_bytes is not None else None,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = p.communicate(input=input_bytes if input_bytes is not None else None)
    return p.returncode, out, err

Copilot AI Jan 8, 2026

The run function lacks test coverage. Although run_nsenter is tested indirectly through mocking, the base run function should have its own tests to ensure proper command execution, error handling, and stdin/stdout/stderr handling work correctly.
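
As a sketch (again, not in the PR), direct tests for run() could exercise it against trivial commands, assuming POSIX echo and cat are available in the test environment.

# Hypothetical direct tests for run(); they rely only on echo/cat being present.
from sonic_py_common import sidecar_common


def test_run_captures_stdout_and_returncode():
    rc, out, err = sidecar_common.run(["echo", "hello"])
    assert rc == 0
    assert out.strip() == "hello"
    assert err == ""


def test_run_passes_input_bytes_to_stdin():
    rc, out, _err = sidecar_common.run(["cat"], text=False, input_bytes=b"payload")
    assert rc == 0
    assert out == b"payload"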

Comment on lines +67 to +80
def _get_config_db() -> Optional[ConfigDBConnector]:
    """Get or create ConfigDBConnector instance (singleton pattern)."""
    global _config_db
    if _config_db is None:
        try:
            db = ConfigDBConnector()
            db.connect()
            _config_db = db
            logger.log_info("Connected to CONFIG_DB via ConfigDBConnector")
        except Exception as e:
            logger.log_error(f"Failed to connect to CONFIG_DB: {e}")
            _config_db = None
            # Do not cache failed connection; try again next time
    return _config_db

Copilot AI Jan 8, 2026

The _get_config_db function has problematic error handling in the singleton pattern. When a connection fails, it sets _config_db = None but the comment says "Do not cache failed connection; try again next time". However, since _config_db is checked with if _config_db is None at the start, it will attempt to reconnect on the next call. But the exception is caught broadly with except Exception as e:, which could mask critical errors. Consider using more specific exception handling or adding a cooldown mechanism to avoid repeated connection attempts on persistent failures.
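
One hedged way to realize the suggested cooldown is sketched below; the _last_connect_failure and _RETRY_COOLDOWN_SEC names are invented for this sketch, and ConfigDBConnector and logger are assumed to be imported as in the PR's module.

# Sketch of a connection-retry cooldown, not the PR's implementation.
import time

_config_db = None             # cached connector, as in the PR
_last_connect_failure = None  # hypothetical: monotonic time of last failed connect
_RETRY_COOLDOWN_SEC = 30      # hypothetical cooldown interval


def _get_config_db():
    global _config_db, _last_connect_failure
    if _config_db is not None:
        return _config_db
    # Skip reconnect attempts while a recent failure is still cooling down.
    if (_last_connect_failure is not None
            and time.monotonic() - _last_connect_failure < _RETRY_COOLDOWN_SEC):
        return None
    try:
        db = ConfigDBConnector()
        db.connect()
        _config_db = db
    except Exception as e:  # broad catch kept as in the PR; only the cooldown is new
        _last_connect_failure = time.monotonic()
        logger.log_error(f"Failed to connect to CONFIG_DB: {e}")
    return _config_db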

Comment on lines +111 to +125
cmd_start() {
    if command -v systemd-cat >/dev/null 2>&1; then
        # background + pipe to journald with distinct priorities
        ( kill_pods ) \
            > >(systemd-cat -t "${SERVICE_NAME}-start" -p info) \
            2> >(systemd-cat -t "${SERVICE_NAME}-start" -p err)
    else
        # background + pipe to syslog via logger in case systemd-journald is masked/disabled
        ( kill_pods ) \
            > >(logger -t "${SERVICE_NAME}-start" -p user.info) \
            2> >(logger -t "${SERVICE_NAME}-start" -p user.err)
    fi &
    disown
    exit 0
}

Copilot AI Jan 8, 2026

The cmd_start function backgrounds the kill_pods operation and immediately exits with code 0, which means it reports success before the pods are actually deleted. If kill_pods fails (returns non-zero), the calling code will never know. Consider either making start blocking or adding a mechanism to track and report background operation failures.

COPY ["systemd_stub.py", "/usr/bin/"]
COPY ["systemd_scripts/", "/usr/share/sonic/systemd_scripts/"]
COPY ["files/container_checker", "/usr/share/sonic/systemd_scripts/container_checker"]
COPY ["files/telemetry.sh", "/usr/share/sonic/systemd_scripts/telemetry_v1.sh"]

Copilot AI Jan 8, 2026

The COPY command on line 23 copies k8s_pod_control.sh to /usr/share/sonic/scripts/k8s_pod_control.sh, but there's no explicit creation of the /usr/share/sonic/scripts/ directory. While COPY will create intermediate directories, it's better practice to explicitly create the directory first with RUN mkdir -p /usr/share/sonic/scripts before copying files to it, especially for clarity and to ensure proper permissions.

Suggested change
COPY ["files/telemetry.sh", "/usr/share/sonic/systemd_scripts/telemetry_v1.sh"]
COPY ["files/telemetry.sh", "/usr/share/sonic/systemd_scripts/telemetry_v1.sh"]
RUN mkdir -p /usr/share/sonic/scripts

@qiluo-msft
Collaborator Author

COPY will create intermediate directories, no harm.

shift
fi

# SERVICE_NAME must be set by caller (e.g., "telemetry", "telemetry")

Copilot AI Jan 8, 2026

The comment on line 19 mentions "telemetry" twice, which appears to be a copy-paste error. The second instance should likely be a different service name or removed entirely to make it a general comment.

Suggested change
# SERVICE_NAME must be set by caller (e.g., "telemetry", "telemetry")
# SERVICE_NAME must be set by caller (e.g., "telemetry")

        return False
    try:
        table, entry_key = _split_redis_key(key)
        entry: Dict[str, str] = db.get_entry(table, entry_key)

Copilot AI Jan 8, 2026

The db_hset function has a potential bug where it retrieves the current entry using get_entry, modifies the returned dict by adding a field, then sets it back. However, if get_entry returns None (which can happen when the entry doesn't exist), the code will fail with an AttributeError when trying to set entry[field]. The function should initialize entry as an empty dict when get_entry returns None.

Suggested change
entry: Dict[str, str] = db.get_entry(table, entry_key)
entry: Dict[str, str] = db.get_entry(table, entry_key) or {}

"""Set a field in a CONFIG_DB hash."""
if key not in config_db:
config_db[key] = {}
config_db[key][field] = value

Copilot AI Jan 8, 2026

In the fake_db_hset function, there's a missing return value. The function should return True to indicate success (matching the signature of the real db_hset function which returns a boolean). This could cause tests to fail or behave unexpectedly.

Suggested change
config_db[key][field] = value
config_db[key][field] = value
return True

Copilot uses AI. Check for mistakes.
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@make1980 left a comment

:shipit:
