ACR Private Endpoint
By default, an Azure Container Registry accepts connections from the public internet. To limit access to a private network (e.g., an Azure Virtual Network), ACR previewed Service Endpoint a couple of years ago so that customers could restrict access to specific VNets. However, Service Endpoint doesn’t address the data exfiltration concern: after customers enable the Azure Container Registry Service Tag on their VNets, they have no easy way to configure firewall rules that limit outbound requests to specific registries.
Its successor, Private Endpoint, was later introduced to solve the data exfiltration problem. After customers enable private endpoints for a registry, the target registry is “projected” into the VNet and each of its endpoints gets a dedicated IP address. With dedicated IP addresses, network administrators can easily write firewall rules that filter outbound traffic. A bonus of using private endpoints is that clients now reach the registry through a set of dedicated IPs, which avoids the potential SNAT port exhaustion problem that can occur when connecting to the public endpoints. For example, assume you create myregistry.azurecr.io in the West US region. After you enable the private endpoint in the subnet 192.168.1.0/24, you will see two IPs allocated for the myregistry.azurecr.io endpoints.
- myregistry.azurecr.io: 192.168.1.5
- myregistry.westus.data.azurecr.io: 192.168.1.4
[NOTE] The actual IP allocation is based on the address availability of the subnet.
In the above example, both endpoints must be set up properly in the VNet DNS server and firewall (if you choose the Azure private DNS service, they should be set up for you automatically). myregistry.westus.data.azurecr.io is the “invisible underlying” endpoint that actually serves the image layer data, so it is just as important as the myregistry.azurecr.io endpoint. Refer to Azure Container Registry: Image Pull Flow for details of the data flow.
When you pull an image from myregistry.azurecr.io, what you see at the application level is requests being sent to 192.168.1.5 and 192.168.1.4. The underlying Azure SDN (Software Defined Network) stack looks up the actual ACR servers backing the private endpoint IPs and routes the requests to them. The data transmission is private and secured; it always stays on the Azure backbone.
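Because both endpoints have to resolve to private IPs, it can help to verify DNS from a machine inside the VNet. The following is a minimal sketch, not an official tool: the function name check_private_resolution and the injectable resolve parameter are hypothetical, and the hostnames and subnet are the example values used above.

```python
import ipaddress
import socket
from typing import Callable, Dict, List


def check_private_resolution(
    hostnames: List[str],
    subnet: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Return {hostname: True if it resolves to an address inside subnet}."""
    network = ipaddress.ip_network(subnet)
    results: Dict[str, bool] = {}
    for host in hostnames:
        try:
            address = ipaddress.ip_address(resolve(host))
        except (OSError, ValueError):
            # Resolution failed or returned something that is not an IP.
            results[host] = False
            continue
        results[host] = address in network
    return results


if __name__ == "__main__":
    # Example values from this post; replace them with your own registry.
    report = check_private_resolution(
        ["myregistry.azurecr.io", "myregistry.westus.data.azurecr.io"],
        "192.168.1.0/24",
    )
    for host, ok in report.items():
        print(f"{host}: {'private' if ok else 'NOT private'}")
```

If either hostname is reported as not private, the pull would likely reach a public endpoint (or fail at the firewall) for that part of the flow.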
ACR Geo Replication
ACR Geo Replication automatically replicates data to the additional regions you configure, providing a highly available, low-latency registry service around the world. It works seamlessly with the ACR Private Endpoint feature to provide the same highly available service in customers’ private network environments. For example, if you add an East US replication to myregistry.azurecr.io, the system automatically provisions an additional IP in the VNet for the new East US data endpoint.
- myregistry.azurecr.io: 192.168.1.5
- myregistry.westus.data.azurecr.io: 192.168.1.4
- myregistry.eastus.data.azurecr.io: 192.168.1.6
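The hostnames that need DNS records and firewall rules follow a visible pattern in the examples above. A small sketch that derives them is shown below; the naming pattern and the region normalization (lowercase, spaces removed) are inferred from the examples in this post rather than taken from a specification, and the function names are hypothetical.

```python
from typing import List


def data_endpoint(registry: str, region: str) -> str:
    """Build the regional data endpoint name, e.g. 'West US' -> 'westus'."""
    region_id = region.replace(" ", "").lower()
    return f"{registry}.{region_id}.data.azurecr.io"


def required_fqdns(registry: str, regions: List[str]) -> List[str]:
    """Registry endpoint first, then one data endpoint per replicated region."""
    return [f"{registry}.azurecr.io"] + [data_endpoint(registry, r) for r in regions]


if __name__ == "__main__":
    for fqdn in required_fqdns("myregistry", ["West US", "East US"]):
        print(fqdn)
```

For the example registry this yields three names, one per IP allocated in the VNet.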
Once you see the above setup in your VNet, you may start to wonder how Azure SDN determines which region a request should be routed to. If requests go to West US, will they automatically fail over to East US if West US has an outage, and vice versa?
Simply put, public endpoints and private endpoints use the same routing mechanism for Geo Replication. After you create the first geo-replication region for myregistry.azurecr.io, the system creates a dedicated TMP (Traffic Manager Profile) in Azure that is used to route requests for myregistry.azurecr.io. In this case, the TMP has two regional ACR servers as upstreams: East US and West US. It decides the best routing path based on the client’s connection latency to the upstream ACR servers.
[TIP] You can run nslookup myregistry.azurecr.io in a public network environment to inspect the TMP setup.
In the Private Endpoint scenario, when requests are sent to 192.168.1.5 (myregistry.azurecr.io), the underlying Azure SDN queries TMP for the best routing path. If TMP determines that West US provides the lower latency to the client, it instructs Azure SDN to route the requests to West US. Once the routing path is settled, the subsequent layer data requests are pinned to 192.168.1.4 (myregistry.westus.data.azurecr.io).
In a disaster recovery scenario, if TMP detects that West US is offline, it automatically fails over to East US and directs new client connections to send their requests to East US.
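The routing and failover behavior described above can be modeled in a few lines. This is only an illustrative sketch of latency-based selection with failover, not how Traffic Manager is actually implemented; the Replica type, the pick_replica function, and the latency numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Replica:
    region: str
    latency_ms: float  # hypothetical measured latency from the client
    online: bool = True


def pick_replica(replicas: Iterable[Replica]) -> Optional[Replica]:
    """Pick the online replica with the lowest latency, or None if all are down."""
    candidates = [r for r in replicas if r.online]
    return min(candidates, key=lambda r: r.latency_ms, default=None)


if __name__ == "__main__":
    east = Replica("East US", 70.0)
    # Normal operation: the lower-latency region wins.
    print(pick_replica([Replica("West US", 12.0), east]))
    # West US outage: new connections are directed to East US.
    print(pick_replica([Replica("West US", 12.0, online=False), east]))
```

Note that the model only covers which region new connections are sent to; it does not capture the DNS caching delay on the client side.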
To test the failover behavior, you can follow the instruction and temporarily disable the West US endpoint. After it is disabled, clients usually take about 15 minutes (up to 30 minutes) to refresh their local DNS cache and start sending requests to East US. While the endpoint is disabled, you can continue to push or import images to the registry, and the data will still be replicated to West US. After you re-enable the West US endpoint, requests will be routed to West US again.
VNet Peering and Hub-Spoke
Geo replication leverages TMP to determine the connection latency and choose the best routing path. If the VNet is in the same region as one of the replication regions, requests are routed to that region most of the time, because intra-region connection latency is usually the lowest. When your service network grows across multiple regions, you typically need to create a corresponding private endpoint for the registry in each regional VNet. However, a private endpoint per VNet may not work for the following reasons.
- The existing VNet may not have enough IPs reserved for the private endpoints.
- Each registry can have at most 200 private endpoints. If you have more than 200 VNets, you will hit this limit.
- Your network administrator wants to use a cloud appliance solution (e.g., Azure Firewall Service) to centrally control and monitor the network traffic.
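The first two constraints can be estimated up front. The sketch below is a rough back-of-the-envelope check with hypothetical function names: it assumes one IP for the registry endpoint plus one per replicated region (as in the examples above), that Azure reserves 5 addresses in every subnet, and the 200-endpoint limit quoted in this post.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # assumption: addresses Azure reserves in each subnet
MAX_PRIVATE_ENDPOINTS_PER_REGISTRY = 200  # limit quoted in this post


def ips_needed(region_count: int) -> int:
    """One IP for the registry endpoint plus one data endpoint per region."""
    return 1 + region_count


def subnet_can_host(subnet: str, ips_in_use: int, region_count: int) -> bool:
    """Check whether the subnet has enough free addresses for one private endpoint."""
    network = ipaddress.ip_network(subnet)
    free = network.num_addresses - AZURE_RESERVED_PER_SUBNET - ips_in_use
    return free >= ips_needed(region_count)


def within_endpoint_limit(vnet_count: int) -> bool:
    """One private endpoint per VNet must stay within the per-registry limit."""
    return vnet_count <= MAX_PRIVATE_ENDPOINTS_PER_REGISTRY


if __name__ == "__main__":
    print(ips_needed(2))                                  # West US + East US
    print(subnet_can_host("192.168.1.0/24", 0, 2))
    print(within_endpoint_limit(250))
```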
A Hub-Spoke network provides a solution by hosting the Azure service in a central Hub VNet (e.g., you only need to create one private endpoint for myregistry.azurecr.io in the Hub VNet). Workloads running in other VNets can access the shared Azure service through VNet Peering.
┌───────────────────────────────────┐               ┌───────────────────────────────────┐
│  myregistry.azurecr.io (West US)  │               │  myregistry.azurecr.io (East US)  │
└───────────────────────────────────┴───────────────┴─────────────────┬─────────────────┘
                                                                      │
┌───────────────────────────────────┐               ┌─────────────────┼──────────────────┐
│                                   │               │                 │                  │
│   ┌───────────────────────────┐   │               │  ┌──────────────▼─────────────┐    │
│   │                           │   ├───────────────►  │ 192.168.1.5 (myregistry)   │    │
│   │        AKS Cluster        │   │ VNet Peering  │  │ 192.168.1.4 (westus.data)  │    │
│   │                           │   ◄───────────────┤  │ 192.168.1.6 (eastus.data)  │    │
│   └───────────────────────────┘   │               │  └────────────────────────────┘    │
│ Spoke VNet (West US) 192.168.2/24 │               │  Hub VNet (East US) 192.168.1/24   │
└───────────────────────────────────┘               └────────────────────────────────────┘
The above diagram illustrates a simplified example of a Hub-Spoke network topology. The myregistry.azurecr.io private endpoint is provisioned in the Hub VNet (East US). The AKS cluster running in the Spoke VNet (West US) accesses myregistry.azurecr.io through the peering network.
One caveat of the above network setup is that requests from the AKS cluster to myregistry.azurecr.io will be routed to East US even though the registry has a replication in the same West US region as the AKS cluster. This is because the private endpoint is provisioned in the Hub VNet, and TMP resolves the routing path based on the Hub VNet, which is closest to the East US replication.
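To make the caveat concrete, here is a toy model of the decision. It is a sketch under stated assumptions, not a description of Azure internals: the latency table is made up, routed_replica is a hypothetical helper, and the only point it illustrates is that the replica is chosen relative to the region of the VNet hosting the private endpoint rather than the region of the workload.

```python
from typing import Dict, List, Tuple

# Hypothetical round-trip latencies in milliseconds between regions.
LATENCY_MS: Dict[Tuple[str, str], float] = {
    ("West US", "West US"): 2.0,
    ("West US", "East US"): 65.0,
    ("East US", "West US"): 65.0,
    ("East US", "East US"): 2.0,
}


def routed_replica(
    endpoint_vnet_region: str,
    replicas: List[str],
    latency: Dict[Tuple[str, str], float] = LATENCY_MS,
) -> str:
    """Choose the replica closest to the VNet that hosts the private endpoint."""
    return min(replicas, key=lambda region: latency[(endpoint_vnet_region, region)])


if __name__ == "__main__":
    replicas = ["West US", "East US"]
    # AKS in West US, private endpoint in the Hub VNet (East US).
    print(routed_replica("East US", replicas))
    # Same workload with a private endpoint in its own West US VNet.
    print(routed_replica("West US", replicas))
```

In the Hub-Spoke case the workload's own region never enters the decision, which is why the West US replication goes unused.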