Proximity Reads for Amazon Multi-AZ ElastiCache for Redis Using Lettuce and Redisson

Date: 2023-02-03

Why Amazon ElastiCache for Redis Needs a Multi-AZ Deployment

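Amazon ElastiCache for Redis is a web service that makes it easy to set up, manage, and scale a distributed in-memory data store or cache environment in the cloud. It provides a high-performance, scalable, and cost-effective caching solution and removes much of the complexity of deploying and managing a distributed cache environment. Redis is an essential piece of middleware in almost every distributed application system.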

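When you use Amazon ElastiCache for Redis, cluster high availability is critical for key workloads. In many situations ElastiCache for Redis may need to replace a primary node; these include certain types of planned maintenance as well as unexpected events such as the failure of a primary node or an Availability Zone. Following well-architected design principles, we generally recommend enabling a Multi-AZ deployment for ElastiCache for Redis to provide a more robust caching service.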

The Side Effect of an Amazon ElastiCache for Redis Multi-AZ Deployment

Deploying ElastiCache for Redis across multiple Availability Zones

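Each Amazon Web Services Region has multiple isolated locations known as Availability Zones (AZs). An AZ code consists of its Region code followed by a letter identifier, for example us-east-1a.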

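When launching Amazon ElastiCache for Redis, we can choose the Region and the Virtual Private Cloud (VPC), and then select the corresponding cache subnet group, as shown in the following screenshots: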

Enabling cluster mode:

Subnet group settings:

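In this experiment the cluster is deployed in the Hong Kong Region with 3 shards and 2 replicas per shard. The primary nodes of the shards are placed in ap-east-1a, ap-east-1b, and ap-east-1c respectively, as shown in the following screenshot: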

The "Side Effect" of a Multi-AZ deployment

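Because AZs are isolated from one another and physically separated by some distance, there is a certain amount of latency between AZs in the same Region, and latency between AZs is higher than latency within an AZ (in most cases this latency is negligible). The figure below shows the latency (ping RTT) that we actually measured in the Hong Kong Region: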

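Although cross-AZ latency can usually be ignored, in some business systems a single API call, from request to response, may involve a very large number of Redis operations (generally more reads than writes). This is especially true of fintech systems, which include many intermediate data validation steps and may also exchange data between the cloud and an on-premises data center. In a typical scenario at an online securities customer, one API call involved more than 30 microservices and more than 900 Redis operations in total, most of them reads. Assume that the latency of the Redis operations themselves (for example ClusterBasedCmdsLatency) is negligible. If all 900 Redis operations cross AZs (that is, the AZ where the application runs, for example in an EKS pod, differs from the AZ of the Redis primary node being read), the total cross-AZ read latency is 0.359 ms × 900 or 0.554 ms × 900, about 323 ms or 499 ms, whereas same-AZ reads take only 0.085 ms × 900, about 76.5 ms. The gap is substantial. Once interaction with the on-premises data center or third-party interfaces, calls between microservices, and business logic processing are added, a typical API call takes close to 2 seconds from invocation to return, which is unacceptable for many latency-sensitive systems such as securities order placement. Cross-AZ traffic charges are a further cost that users often take into account.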
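The estimate above is a simple multiplication, and it can be reproduced with a few lines of Java. This is a minimal sketch that assumes the 900 operations are strictly sequential and that each one costs exactly one network round trip; the class and method names are illustrative, and the RTT values are the ping figures quoted in the text.

```java
/** Back-of-the-envelope estimate of network time spent on sequential Redis calls. */
final class CrossAzLatencyEstimate {

    /** Total round-trip time in milliseconds for a number of strictly sequential operations. */
    static double totalMs(int operations, double rttMs) {
        return operations * rttMs;
    }

    /** Prints the three cases discussed in the text (one same-AZ path, two cross-AZ paths). */
    static void printSummary() {
        int ops = 900;
        System.out.printf("same AZ    : %.1f ms%n", totalMs(ops, 0.085));
        System.out.printf("cross AZ #1: %.1f ms%n", totalMs(ops, 0.359));
        System.out.printf("cross AZ #2: %.1f ms%n", totalMs(ops, 0.554));
    }
}
```

Pipelining or parallel calls would lower the totals, so the sequential figure is best read as an upper bound for this workload.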

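The rest of this article therefore looks at how a distributed application system can keep the high availability of a Multi-AZ, cluster-mode Amazon ElastiCache for Redis deployment while reading from the nearest node, that is, reading from the ElastiCache shard node (primary or replica) located in the same AZ as the application.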

This is illustrated in the following diagram:

Note: to restrict the applications in an AZ to reading from and writing to the ElastiCache nodes in the same AZ, keep the following in mind:

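1. ElastiCache does not offer a write forwarding capability like the one in Aurora global database clusters. When a client writes to a read-only replica node in its AZ, the cluster uses its topology to redirect the request to the primary node of that shard.
2. Every data slot must be present on some ElastiCache node inside the AZ. Otherwise reads are also redirected to a node in another AZ that holds the required slot.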
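The second point can be checked mechanically. Redis Cluster maps every key to one of 16384 hash slots using CRC16 of the key modulo 16384, so an AZ can serve all reads locally only if the nodes in that AZ together cover all 16384 slots. The following is a minimal sketch of that check; the class name is illustrative, the slot ranges are supplied by the caller, and hash-tag handling (the {...} syntax) is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.List;

final class SlotCoverage {
    static final int SLOT_COUNT = 16384;

    /** CRC16, XMODEM variant (polynomial 0x1021, initial value 0), the checksum Redis Cluster uses. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    /** Hash slot of a key; hash tags are not handled in this sketch. */
    static int slotOf(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    /** True if the inclusive [start, end] slot ranges together cover every slot. */
    static boolean coversAllSlots(List<int[]> ranges) {
        BitSet covered = new BitSet(SLOT_COUNT);
        for (int[] range : ranges) {
            covered.set(range[0], range[1] + 1); // BitSet.set takes an exclusive upper bound
        }
        return covered.cardinality() == SLOT_COUNT;
    }
}
```

With 3 shards and at least one node of every shard in each AZ, the ranges held by the AZ-local nodes add up to the full slot space, which is the layout this article relies on.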

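This article focuses on proximity reads and adds a small number of writes to the tests, to reflect the read-heavy, write-light way ElastiCache is typically used.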

We have found that, for systems written in Java, most users choose Redisson or Lettuce.

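In cluster mode, by default, a read command sent to a replica node is redirected to the primary node of the shard unless the client has marked the connection as readonly. Redisson obtains the topology of the entire cluster from any shard node (see reference link). As shown in the figure below, the Redisson client fetches the overall cluster topology when the Spring Boot application starts: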
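For readers who have not seen a redirection on the wire: in the Redis Cluster protocol a node that will not serve a request answers with an error of the form MOVED <slot> <host>:<port>, and the client retries against the node named in the reply. Lettuce and Redisson handle this internally, so the sketch below is only an illustration of what the reply contains; the class name is hypothetical.

```java
/** Parsed form of a Redis Cluster "MOVED <slot> <host>:<port>" redirection. */
final class MovedRedirect {
    final int slot;
    final String host;
    final int port;

    private MovedRedirect(int slot, String host, int port) {
        this.slot = slot;
        this.host = host;
        this.port = port;
    }

    /** Returns the parsed redirection, or null if the message is not a MOVED reply. */
    static MovedRedirect parse(String error) {
        String[] parts = error.trim().split(" ");
        if (parts.length != 3 || !"MOVED".equals(parts[0])) {
            return null;
        }
        int colon = parts[2].lastIndexOf(':');
        if (colon < 0) {
            return null;
        }
        return new MovedRedirect(
                Integer.parseInt(parts[1]),
                parts[2].substring(0, colon),
                Integer.parseInt(parts[2].substring(colon + 1)));
    }
}
```

Each redirection costs an extra round trip, possibly to another AZ, which is why the configurations in the rest of the article aim to send reads to a node that can answer them directly.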

To make the application interact with Redis shard nodes in the same AZ, we analyze and test the problem from two angles:

How the application discovers which AZ it is in
How the application obtains connections to nodes in the same AZ

How the application discovers which AZ it is in

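The metadata of an AWS EC2 instance contains placement information (Fargate excepted), including the AZ in which the instance runs. All metadata, the AZ included, can be retrieved from the EC2 metadata IP address 169.254.169.254 (see reference link).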

For example, http://169.254.169.254/latest/meta-data/placement/availability-zone returns the AZ code of the instance:
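The same lookup can be done from plain Java without any SDK. The sketch below assumes the instance uses IMDSv2, where a session token is first requested with a PUT and then passed along with the metadata request; on instances that still allow IMDSv1 the token step is optional. The class name, the 60-second token lifetime, and the one-second timeouts are illustrative choices, and the code only works when run on an EC2 instance.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class AzLookup {
    private static final String IMDS = "http://169.254.169.254";

    /** Fetches the AZ code from the instance metadata service using the IMDSv2 token flow. */
    static String availabilityZone() throws IOException {
        String token = request("PUT", "/latest/api/token",
                "X-aws-ec2-metadata-token-ttl-seconds", "60");
        return request("GET", "/latest/meta-data/placement/availability-zone",
                "X-aws-ec2-metadata-token", token);
    }

    /** Sends one request with a single header and returns the first line of the response. */
    private static String request(String method, String path, String header, String value)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(IMDS + path).openConnection();
        conn.setRequestMethod(method);
        conn.setRequestProperty(header, value);
        conn.setConnectTimeout(1000);
        conn.setReadTimeout(1000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line = in.readLine();
            return line == null ? "" : line.trim();
        } finally {
            conn.disconnect();
        }
    }

    /** For standard AZ codes, the Region is the AZ code without its trailing letter. */
    static String regionOf(String az) {
        return az.substring(0, az.length() - 1);
    }
}
```

Resolving the AZ once at startup and caching the result is usually enough, since an instance does not change AZ while it is running.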

For a Java application, the AWS utility class EC2MetadataUtils can retrieve the AZ of the instance.

The Java application needs to include the AWS SDK dependency:

Lettuce configuration snippet:

RedisClusterClient redisClusterClient = RedisClusterClient.create(RedisURI.create(host, port));
StatefulRedisClusterConnection<String, String> connection = redisClusterClient.connect();
// Read from the node with the lowest measured latency, whether primary or replica.
connection.setReadFrom(ReadFrom.LOWEST_LATENCY);

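We tested with 900 reads and 20 writes. Once the cluster topology has stabilized, 900 reads plus 20 writes take about 110000000 nanoseconds, that is 110 ms, or roughly 0.12 ms per Redis operation on average, which is very close to the ping latency within an AZ.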
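The test code itself is not shown in the article, so the following is only a sketch of how such a measurement could be taken with System.nanoTime and how the per-operation average follows from the total. The class name, the key naming, and the functional-interface parameters are assumptions made for illustration; in a real test the two callbacks would wrap the Redis client's set and get calls.

```java
import java.util.function.BiConsumer;
import java.util.function.Function;

final class RedisOpTimer {

    /** Runs the given number of writes, then reads, and returns the elapsed nanoseconds. */
    static long time(int gets, int sets,
                     BiConsumer<String, String> set, Function<String, String> get) {
        long start = System.nanoTime();
        for (int i = 0; i < sets; i++) {
            set.accept("test" + i, "value" + i);
        }
        for (int i = 0; i < gets; i++) {
            // Cycle over the keys written above so that every read can find a value.
            get.apply("test" + (i % Math.max(sets, 1)));
        }
        return System.nanoTime() - start;
    }

    /** Average latency per operation in milliseconds. */
    static double averageMs(long totalNanos, int operations) {
        return totalNanos / 1_000_000.0 / operations;
    }
}
```

For the stabilized figure quoted above, averageMs(110_000_000L, 920) gives about 0.12 ms per operation.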

The following is the test result from one of the AZs:

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v2.7.3)

2022-09-17 16:30:43.328 INFO 7 --- [ main] com.rkdevblog.redis.RedisApplication : Starting RedisApplication v0.0.1-SNAPSHOT using Java 1.8.0_342 on redis-2048-6948f7f7b7-4prll with PID 7 (/root/app.jar started by root in /)
2022-09-17 16:30:43.330 INFO 7 --- [ main] com.rkdevblog.redis.RedisApplication : No active profile set, falling back to 1 default profile: "default"
2022-09-17 16:30:44.438 INFO 7 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Multiple Spring Data modules found, entering strict repository configuration mode
2022-09-17 16:30:44.441 INFO 7 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data Redis repositories in DEFAULT mode.
2022-09-17 16:30:44.466 INFO 7 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 7 ms. Found 0 Redis repository interfaces.
2022-09-17 16:30:45.147 INFO 7 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http)
2022-09-17 16:30:45.164 INFO 7 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2022-09-17 16:30:45.164 INFO 7 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.65]
2022-09-17 16:30:45.263 INFO 7 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2022-09-17 16:30:45.263 INFO 7 --- [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 1771 ms
2022-09-17 16:30:46.925 INFO 7 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2022-09-17 16:30:46.939 INFO 7 --- [ main] com.rkdevblog.redis.RedisApplication : Started RedisApplication in 4.453 seconds (JVM running for 5.194)
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 405088888
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 314916382
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 312353554
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 325471497
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 250399989
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 295906367
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 217750668
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 151018800
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 127229434
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 117712139
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 152999887
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 146169291
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 264680129
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 170438929
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 183907318
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 146358012
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 128591240
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 128826140
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 107487644
here is the applications’ zone ap-east-1a
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 107505052

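Many customers are concerned not only about latency but also about cross-AZ traffic. We therefore extended the way Lettuce reads so that an application reads only from ElastiCache nodes in its own AZ. Lettuce already contains an internal class that implements reading from the same subnet (see io.lettuce.core.ReadFrom), and we use it to read from the same subnet, which here means the same AZ.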

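We copied this internal class into our project so that it can be used directly, and defined a new class, MyReadFrom in this example, that provides same-subnet reads. Sample code: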

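import io.lettuce.core.ReadFrom;

public class MyReadFrom {

    // Returns a ReadFrom that only selects nodes whose IP address falls inside the given CIDR.
    // ReadFromSubnet is the Lettuce internal class copied into this project.
    public static final ReadFrom SAME_SUBNET(String cidr) {
        return new ReadFromSubnet(cidr);
    }
}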
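The decision a subnet-based read preference has to make for each node is whether the node's IP address falls inside a CIDR block. The sketch below shows that test for IPv4 in plain Java. It is not the Lettuce implementation, only an illustration of the idea, and the class name is hypothetical. Newer Lettuce releases (6.1 and later, as far as we are aware) also expose a public ReadFrom.subnet(...) factory, which may make copying the internal class unnecessary; that is worth checking against the version in use.

```java
/** Minimal IPv4 CIDR matcher, e.g. new Ipv4Cidr("172.31.16.0/20").contains("172.31.17.5"). */
final class Ipv4Cidr {
    private final int network;
    private final int mask;

    Ipv4Cidr(String cidr) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected address/prefix: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("invalid prefix length: " + prefix);
        }
        // A shift by 32 is a no-op for int in Java, so a /0 prefix needs its own case.
        this.mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        this.network = toInt(parts[0]) & mask;
    }

    /** True if the dotted-quad address lies inside this block. */
    boolean contains(String ip) {
        return (toInt(ip) & mask) == network;
    }

    /** Packs a dotted-quad IPv4 address into a 32-bit integer. */
    private static int toInt(String dotted) {
        String[] octets = dotted.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + dotted);
        }
        int value = 0;
        for (String octet : octets) {
            int n = Integer.parseInt(octet);
            if (n < 0 || n > 255) {
                throw new IllegalArgumentException("octet out of range: " + dotted);
            }
            value = (value << 8) | n;
        }
        return value;
    }
}
```

Because each subnet in a VPC belongs to exactly one AZ, matching on the subnet CIDR is equivalent to matching on the AZ, which is what makes this approach work.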

In ElastiCache for Redis, the subnet CIDR ranges of the shards can be obtained from the console (from the ElastiCache subnet group):

Add them to the project configuration file (application.yml):

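When the client is initialized, there is no need to specify the AZ explicitly. The AZ is obtained as described earlier, and the corresponding subnet range is read from the configuration file: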
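The lookup step described here amounts to mapping an AZ code to a CIDR string. The sketch below shows one way to do that over flattened configuration keys. The redis.subnets.<zone> key naming and the class name are assumptions made for illustration, and any CIDR value passed in is whatever the subnet group in your own account uses.

```java
import java.util.Map;

/** Resolves the subnet CIDR configured for an Availability Zone. */
final class ZoneSubnetConfig {
    // Hypothetical key prefix; adapt it to the structure of your own application.yml.
    private static final String PREFIX = "redis.subnets.";

    private final Map<String, String> properties;

    ZoneSubnetConfig(Map<String, String> properties) {
        this.properties = properties;
    }

    /** Returns the CIDR configured for the zone, failing fast if none is configured. */
    String subnetFor(String zone) {
        String cidr = properties.get(PREFIX + zone);
        if (cidr == null || cidr.trim().isEmpty()) {
            throw new IllegalStateException("no subnet configured for zone " + zone);
        }
        return cidr.trim();
    }
}
```

Failing at startup when a zone has no entry is a deliberate choice in this sketch: a silent fallback would quietly bring back cross-AZ reads.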

get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 103029797
here is the applications’ zone ap-east-1c
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 141577691
here is the applications’ zone ap-east-1c
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 154775122
here is the applications’ zone ap-east-1c
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 122297530
here is the applications’ zone ap-east-1c
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 147463058
here is the applications’ zone ap-east-1c
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 108557362
here is the applications’ zone ap-east-1c
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 97147899
here is the applications’ zone ap-east-1c
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 96497072
here is the applications’ zone ap-east-1c
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 126204608
here is the applications’ zone ap-east-1c
get 900/set 20 lettuce redis total time using RedisAdvancedClusterCommands ==> 136220539

Using Redisson

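Built on the NIO-based Netty framework, Redisson makes full use of the strengths of the Redis key-value database. On top of the common interfaces of the Java utility packages, it provides a set of commonly used utility classes with distributed characteristics, which greatly reduces the difficulty of designing and developing large-scale distributed systems.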

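Redisson can read from a cluster-mode Redis cluster with load balancing, including implementations that pick shard nodes by round robin or at random. The interface is LoadBalancer: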

package org.redisson.connection.balancer;

import java.util.List;
import org.redisson.connection.ClientConnectionsEntry;

public interface LoadBalancer {
    ClientConnectionsEntry getEntry(List<ClientConnectionsEntry> clients);
}

Looking at RoundRobinLoadBalancer and RandomLoadBalancer, which implement this interface, we can see that a custom implementation of the interface can achieve same-AZ reads.

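We create a MyLoadBalancer class that implements the interface. The core idea is to use IP addresses to filter out shard nodes in AZs other than the application's, so that only connections to shard nodes in the same AZ are added to the pool of usable connections. Sample code: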

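import java.net.InetSocketAddress;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import org.redisson.connection.ClientConnectionsEntry;
import org.redisson.connection.balancer.LoadBalancer;
import org.redisson.misc.RedisURI;

public class MyLoadBalancer implements LoadBalancer {
    // Addresses of the nodes in the application's own AZ.
    // MyWeightEntry is the project's own weight holder and is not shown in this article.
    private final Map<InetSocketAddress, MyWeightEntry> weights = new ConcurrentHashMap<>();
    private final AtomicInteger index = new AtomicInteger(-1);

    public MyLoadBalancer(Map<String, Integer> weights) {
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            System.out.println("redis uri " + entry.getKey());
            RedisURI uri = new RedisURI(entry.getKey());
            InetSocketAddress addr = new InetSocketAddress(uri.getHost(), uri.getPort());
            System.out.println("addr >>> " + addr.getHostName());
            this.weights.put(addr, new MyWeightEntry(entry.getValue()));
        }
    }

    @Override
    public ClientConnectionsEntry getEntry(List<ClientConnectionsEntry> clients) {
        // Keep only the nodes in the same AZ.
        List<ClientConnectionsEntry> local = clients.stream()
                .filter(c -> this.weights.containsKey(c.getClient().getAddr()))
                .collect(Collectors.toList());
        // Fall back to all nodes if none match, instead of dividing by zero below.
        List<ClientConnectionsEntry> pool = local.isEmpty() ? clients : local;
        int ind = Math.abs(this.index.incrementAndGet() % pool.size());
        return pool.get(ind);
    }
}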

@Bean
public RedissonClient redisson() {
    String zone = getAvailabilityZone();
    System.out.println("With redisson ::::::: here is the applications' zone " + zone);
    // Comma-separated addresses of the nodes in the application's own AZ, read from configuration.
    String[] nodes = env.getProperty("redis.redisson." + zone).split(",");
    Map<String, Integer> zoneMap = new HashMap<>();
    Arrays.stream(nodes).forEach(s -> zoneMap.put(s, 1));
    Config config = new Config();
    config.useClusterServers()
            .setScanInterval(2000)
            .setMasterConnectionPoolSize(30)
            .setSubscriptionConnectionPoolSize(10)
            .setSlaveConnectionPoolSize(10)
            .setReadMode(ReadMode.MASTER_SLAVE)
            .setLoadBalancer(new MyLoadBalancer(zoneMap))
            .setSlaveConnectionMinimumIdleSize(10)
            .setNodeAddresses(Arrays.stream(nodes).collect(Collectors.toList()));
    return Redisson.create(config);
}

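// EC2_METADATA_ROOT is a constant defined in this class; it is assumed to be "/latest/meta-data",
// matching the metadata URL shown earlier.
public static String getAvailabilityZone() {
    return EC2MetadataUtils.getData(EC2_METADATA_ROOT + "/placement/availability-zone");
}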
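The selection logic in MyLoadBalancer, filtering by zone and then rotating through what is left, can be tried without a Redis cluster. The following sketch isolates that logic behind a generic type so it can be unit tested; it is an illustration, not Redisson code, and the class name is hypothetical. It falls back to the full candidate list when no local node is available, which trades a cross-AZ read for availability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Round-robin selection restricted to nodes in the caller's own zone. */
final class ZoneAwareRoundRobin<T> {
    private final Set<T> localNodes;
    private final AtomicInteger index = new AtomicInteger(-1);

    ZoneAwareRoundRobin(Set<T> localNodes) {
        this.localNodes = localNodes;
    }

    /** Picks the next local candidate, or the next of all candidates if none is local. */
    T pick(List<T> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates");
        }
        List<T> local = new ArrayList<>();
        for (T candidate : candidates) {
            if (localNodes.contains(candidate)) {
                local.add(candidate);
            }
        }
        List<T> pool = local.isEmpty() ? candidates : local;
        // floorMod keeps the position non-negative even after the counter overflows.
        int position = Math.floorMod(index.incrementAndGet(), pool.size());
        return pool.get(position);
    }
}
```

If strict same-AZ reads matter more than availability, the fallback branch could throw instead; which behavior is right depends on the workload.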

The same test with 900 reads and 20 writes also settles at a total of about 100 ms, which is very close to the ping latency within an AZ.

Redisson with 20 set and 900 get:::::>>> 90287160
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 95619938
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 104916029
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 89409202
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 91133678
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 89742063
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 92031802
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 90864236
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 93690500
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 185000259
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 186251204
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 129376250
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 94480145
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 124260166
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 126388665
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 92145180
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 87230076
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 94824036
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 91171170
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 93827133
test 888 value >>> test1664085899594
Redisson with 20 set and 900 get:::::>>> 103257769

Conclusion

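By using AWS EC2 metadata on the application side to obtain the AZ in which the application runs, and by using or extending Lettuce or Redisson, we can have the application read from the shard node with the lowest latency, or use the CIDR of the application's subnet to force it to read from shard nodes in the same AZ. For systems that depend heavily on ElastiCache, this lowers the latency of Redis operations and reduces both the latency amplification and the additional traffic charges caused by crossing AZs.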

Original source: https://aws.amazon.com/cn/blogs/china/proximity-reads-for-amazon-multi-az-elasticache-for-redis-using-lettuce-and-redisson/

Author: 李俊杰