Skip to content

could provide DNS records for mgd #6077

@davepacheco

Description

@davepacheco

From this TODO:

// TODO this should be ServiceName::Mgd, but in the upgrade
// path, that does not exist because RSS has not
// created it. So we just piggyback on Dendrite's SRV
// record.

As I understand the history here (which could definitely be wrong), at one point we changed RSS to make DNS records for this, but we couldn't rely on them because systems that were set up by RSS prior to that change (like dogfood) wouldn't have them.

Fortunately, Reconfigurator can now ensure that DNS records are consistently created for all systems.

If the DnsConfigBuilder already has the information it needs to create the correct records, then I think you can just change it to do so and that will cause RSS to generate the records and the records will be generated on existing systems the next time somebody executes a blueprint. I think @jgallagher confirmed this while working on #6050. If you need to change the DnsConfigBuilder interface, it still shouldn't be too bad. I think it only has two consumers: RSS and this function:

/// Returns the expected contents of internal DNS based on the given blueprint
pub fn blueprint_internal_dns_config(
blueprint: &Blueprint,
sleds_by_id: &BTreeMap<SledUuid, Sled>,
overrides: &Overridables,
) -> DnsConfigZone {
// The DNS names configured here should match what RSS configures for the
// same zones. It's tricky to have RSS share the same code because it uses
// Sled Agent's _internal_ `OmicronZoneConfig` (and friends), whereas we're
// using `sled-agent-client`'s version of that type. However, the
// DnsConfigBuilder's interface is high-level enough that it handles most of
// the details.
let mut dns_builder = DnsConfigBuilder::new();
for (_, zone) in
blueprint.all_omicron_zones(BlueprintZoneFilter::ShouldBeInInternalDns)
{
let (service_name, port) = match &zone.zone_type {
BlueprintZoneType::BoundaryNtp(
blueprint_zone_type::BoundaryNtp { address, .. },
) => (ServiceName::BoundaryNtp, address.port()),
BlueprintZoneType::InternalNtp(
blueprint_zone_type::InternalNtp { address, .. },
) => (ServiceName::InternalNtp, address.port()),
BlueprintZoneType::Clickhouse(
blueprint_zone_type::Clickhouse { address, .. },
) => (ServiceName::Clickhouse, address.port()),
BlueprintZoneType::ClickhouseKeeper(
blueprint_zone_type::ClickhouseKeeper { address, .. },
) => (ServiceName::ClickhouseKeeper, address.port()),
BlueprintZoneType::CockroachDb(
blueprint_zone_type::CockroachDb { address, .. },
) => (ServiceName::Cockroach, address.port()),
BlueprintZoneType::Nexus(blueprint_zone_type::Nexus {
internal_address,
..
}) => (ServiceName::Nexus, internal_address.port()),
BlueprintZoneType::Crucible(blueprint_zone_type::Crucible {
address,
..
}) => (ServiceName::Crucible(zone.id), address.port()),
BlueprintZoneType::CruciblePantry(
blueprint_zone_type::CruciblePantry { address },
) => (ServiceName::CruciblePantry, address.port()),
BlueprintZoneType::Oximeter(blueprint_zone_type::Oximeter {
address,
}) => (ServiceName::Oximeter, address.port()),
BlueprintZoneType::ExternalDns(
blueprint_zone_type::ExternalDns { http_address, .. },
) => (ServiceName::ExternalDns, http_address.port()),
BlueprintZoneType::InternalDns(
blueprint_zone_type::InternalDns { http_address, .. },
) => (ServiceName::InternalDns, http_address.port()),
};
// This unwrap is safe because this function only fails if we provide
// the same zone id twice, which should not be possible here.
dns_builder
.host_zone_with_one_backend(
zone.id,
zone.underlay_address,
service_name,
port,
)
.unwrap();
}
let scrimlets = sleds_by_id.values().filter(|sled| sled.is_scrimlet);
for scrimlet in scrimlets {
let sled_subnet = scrimlet.subnet();
let switch_zone_ip = overrides.switch_zone_ip(scrimlet.id, sled_subnet);
// unwrap(): see above.
dns_builder
.host_zone_switch(
scrimlet.id,
switch_zone_ip,
overrides.dendrite_port(scrimlet.id),
overrides.mgs_port(scrimlet.id),
overrides.mgd_port(scrimlet.id),
)
.unwrap();
}
dns_builder.build_zone()
}

We still have to make sure that a blueprint does get executed on customer systems, and it may take two releases before we can rely on these being present (i.e., deploy the release that creates the records and execute a blueprint; then deploy another release that relies on the records). But there's definitely a path at this point to getting new DNS records installed.

The reason this is important is that without this, the code has to hardcode the port number to use, which I'm guessing is part of the reason for #6076. Might this also help with #5092?

CC @internet-diglett

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions