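Caching overview
- In high-concurrency, multi-user systems, caching is commonly used to improve performance
- Common choices include memcached, Redis, and in-process memory caches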
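Symptoms
- Shortly after a product went live, the service raised alerts, and the logs showed SQL timeout errors. Specifically:
- Many page operations timed out or failed
- The load on the DB host was high; the DBA found a large number of identical SQL statements being executed over and over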
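Diagnosis
Take a jstack dump of the application service
- As described in my earlier blog post, I analyzed the stack dump with the stackAnalysis tool and found a large number of threads doing the same thing: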
40 threads at (state = RUNNABLE, locks_locked = [0x0000000725b33848, 0x0000000725b338f0, 0x0000000737ff37d0, 0x0000000737f88f08, 0x0000000737f817c8, 0x00000007fc8ba580, 0x0000000725d8e638, 0x0000000725d8e6e0, 0x0000000738274490, 0x0000000725b5f720, 0x0000000725b5f7c8, 0x00000007384c03f8, 0x00000007231683a8, 0x0000000723168450, 0x0000000731980608, 0x0000000725d27ab8, 0x0000000725d2fcd8, 0x00000007384b16c8, 0x0000000723221798, 0x00000007232299c0, 0x000000072efb1228, 0x00000007005b70c0, 0x00000007005aff10, 0x0000000738321660, 0x00000007318cb948, 0x00000007318c4780, 0x0000000737c7de70, 0x0000000725a02d30, 0x0000000725a02dd8, 0x00000007fc8f8b60, 0x00000007232918f8, 0x000000072329db00, 0x000000073186ee08, 0x0000000725b7b928, 0x0000000725b7bb98, 0x0000000738066408, 0x00000007230a6ef8, 0x00000007230a0160, 0x0000000738191a18, 0x0000000737f619e8, 0x0000000737f5a6d8, 0x00000007fc8b9518, 0x0000000725ba54d0, 0x0000000725ba5578, 0x0000000738239a40, 0x0000000725e885c0, 0x0000000725e810e8, 0x00000007b24ac378, 0x00000007230c47e8, 0x00000007230c4890, 0x0000000731907c58, 0x00000007005345a0, 0x000000070052d098, 0x0000000731a6d400, 0x00000007231879f8, 0x0000000723187aa0, 0x000000073846aa20, 0x00000007231e7128, 0x00000007231e71d0, 0x0000000731958f38, 0x00000007231b2500, 0x00000007231b25a8, 0x00000007fc8f8dc0, 0x0000000725e1af28, 0x0000000725e1afd0, 0x0000000738323388, 0x00000007319ad368, 0x00000007319a6588, 0x00000007384894f0, 0x00000007318b8af8, 0x00000007318b1ba8, 0x00000007380c9908, 0x0000000725c5e478, 0x0000000725c5e520, 0x0000000738256338, 0x00000007230c7cd0, 0x00000007230b9440, 0x000000072e8c7810, 0x0000000725dcd8d0, 0x0000000725dc66d8, 0x0000000732c2df18, 0x00000007232425a0, 0x0000000723242648, 0x0000000732c31da0, 0x0000000731a4fd78, 0x0000000731a4fe20, 0x0000000738139a10, 0x0000000725cda198, 0x0000000725cda240, 0x0000000738066638, 0x0000000702b936b8, 0x0000000702b929a0, 0x00000007384893f0, 0x00000007230f9150, 0x00000007230f91f8, 0x0000000738036fc8, 0x000000073198d218, 
0x000000073198d2c0, 0x00000007384710c8, 0x00000007231b0bf0, 0x00000007231b0c98, 0x00000007fc8bdea8, 0x00000007318a5808, 0x000000073189e0c8, 0x0000000731870018, 0x0000000723279d10, 0x0000000723279db8, 0x0000000738471170, 0x000000072e8fabd8, 0x000000072e8f8af8, 0x0000000732c51a38, 0x00000007319c69a8, 0x00000007319b9238, 0x0000000737fd5758, 0x0000000725b0c488, 0x0000000725b0c530, 0x00000007381f44a0, 0x0000000731a095b8, 0x0000000731a09660, 0x0000000735cbb2b0]) :
"http-bio-*-exec-*" daemon prio=* tid=******** nid=******** runnable [********]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
	at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
	at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
	- locked <********> (a com.mysql.jdbc.util.ReadAheadInputStream)
	at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
	at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
	- locked <********> (a com.mysql.jdbc.JDBC4Connection)
	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
	- locked <********> (a com.mysql.jdbc.JDBC4Connection)
	at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
	- locked <********> (a com.mysql.jdbc.JDBC4Connection)
	at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.execute(NewProxyPreparedStatement.java:67)
	at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:56)
	at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:70)
	at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:57)
	at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:259)
	at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:132)
	at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:105)
	at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:81)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:104)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:98)
	at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:358)
	at com.sun.proxy.$Proxy18.selectList(Unknown Source)
	at org.mybatis.spring.SqlSessionTemplate.selectList(SqlSessionTemplate.java:198)
	at org.apache.ibatis.binding.MapperMethod.executeForMany(MapperMethod.java:114)
	at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:58)
	at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:43)
	at com.sun.proxy.$Proxy46.selectAllValidActivityPush(Unknown Source)
	at com.xxxx.xxxx.module.inbox.InboxAgent.selectActivityPush(InboxAgent.java:612)
	at com.xxxx.xxxx.service.SystemMessageService.getActivityPushMessage(SystemMessageService.java:975)
	at com.xxxx.xxxx.service.login.logic.impl.LogicLoginServiceImpl.updateLoginUser(LogicLoginServiceImpl.java:438)
	at com.xxxx.xxxx.service.login.logic.impl.LogicLoginServiceImpl.updateLoginUser(LogicLoginServiceImpl.java:374)
	at com.xxxx.xxxx.web.controller.login.LoginController.login(LoginController.java:119)
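So 40 threads are all waiting for the DB to return data, and the stack (InboxAgent.selectActivityPush, reached from LoginController.login) is enough to pinpoint the problematic code path.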
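The stackAnalysis tool is the author's own. To illustrate the idea behind it (group threads whose stacks are identical and count each group), here is a minimal in-process sketch built on the standard Thread.getAllStackTraces() API. StackGrouper, fakeDbRead and sameStackCount are hypothetical names; fakeDbRead parks on a latch to stand in for a thread stuck in socketRead0, and a "signature" is assumed to be simply the top few frames of a stack.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: group live threads by identical stacks, like a tiny stackAnalysis.
class StackGrouper {
    // Signature = top maxFrames frames; threads with the same signature are doing the same work.
    static Map<String, List<String>> groupByStack(int maxFrames) {
        Map<String, List<String>> groups = new HashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] frames = e.getValue();
            String sig = Arrays.toString(Arrays.copyOf(frames, Math.min(maxFrames, frames.length)));
            groups.computeIfAbsent(sig, k -> new ArrayList<>()).add(e.getKey().getName());
        }
        return groups;
    }

    // Stand-in for a slow DB read: every worker blocks here, like the threads in socketRead0.
    static void fakeDbRead(CountDownLatch release) {
        try {
            release.await();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts n workers in fakeDbRead and returns how many threads share worker-0's stack.
    static int sameStackCount(int n) {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread w = new Thread(() -> fakeDbRead(release), "worker-" + i);
            w.setDaemon(true);
            w.start();
            workers.add(w);
        }
        try {
            // Wait until every worker is parked inside await()
            while (!workers.stream().allMatch(w -> w.getState() == Thread.State.WAITING)) {
                Thread.sleep(10);
            }
            int count = 0;
            for (List<String> names : groupByStack(10).values()) {
                if (names.contains("worker-0")) {
                    count = names.size();
                }
            }
            return count;
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            release.countDown(); // let the workers finish
        }
    }

    public static void main(String[] args) {
        System.out.println(sameStackCount(40) + " threads share the same stack");
    }
}
```

main starts 40 workers to mirror the dump above; against a real jstack file the same grouping would be done on the text of each thread's frames instead.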
Analysis
- A close look at the corresponding code shows the following caching strategy:
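Object getObject() {
    Object o = getFromCache();
    if (o == null) {
        // cache miss: fall back to the DB
        o = getFromDb();
        if (o != null) {
            setToCache(o);
        }
        // a null result is never cached, so the next call goes to the DB again
    }
    return o;
}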
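At first glance this looks fine, but look closer: when getFromDb() returns null, i.e. the database has no matching row, nothing is written to the cache. Every thread then runs getFromDb() again, so every single request penetrates through to the DB.
Under heavy user traffic this puts very high pressure on the database (and the stack above comes from only one of several web application servers).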
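A small runnable reproduction of the strategy above, assuming in-memory ConcurrentHashMaps stand in for both the cache and the database (NaiveCacheAside, getFromDb and the key names are hypothetical). It counts DB round trips: the existing key reaches the DB once, while the missing key reaches it on every call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reproduction of the original cache-aside strategy.
class NaiveCacheAside {
    final Map<String, Object> cache = new ConcurrentHashMap<>(); // stands in for memcached/redis
    final Map<String, Object> db = new ConcurrentHashMap<>();    // stands in for the database
    final AtomicInteger dbCalls = new AtomicInteger();           // counts DB round trips

    Object getFromDb(String key) {
        dbCalls.incrementAndGet();
        return db.get(key); // null when the row does not exist
    }

    Object getObject(String key) {
        Object o = cache.get(key);
        if (o == null) {
            o = getFromDb(key);
            if (o != null) {
                cache.put(key, o); // only non-null results are cached
            }
        }
        return o;
    }

    public static void main(String[] args) {
        NaiveCacheAside s = new NaiveCacheAside();
        s.db.put("present", "row");
        for (int i = 0; i < 100; i++) {
            s.getObject("present"); // DB once, then cache hits
            s.getObject("missing"); // DB on every call
        }
        System.out.println("DB calls: " + s.dbCalls.get()); // prints "DB calls: 101"
    }
}
```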
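Solution
- When the database has no data, put an invalid placeholder object into the cache that means “the data is empty, no need to query the DB”, as follows: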
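Object getObject() {
    Object o = getFromCache(key);
    if (o == null) {
        o = getFromDb();
        if (o != null) {
            setToCache(key, o);
        } else {
            // remember the miss so later requests stop at the cache
            setToCache(key, invalidObject);
        }
    }
    // compare by value (e.g. invalidObject is a fixed marker string): a remote cache
    // (memcached/redis) returns a deserialized copy, so == would never match
    return invalidObject.equals(o) ? null : o;
}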
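The same reproduction with the placeholder applied. Assumptions: the placeholder is a fixed marker string (INVALID_OBJECT, hypothetical) so that equals() still matches after a remote cache deserializes it, and the maps stand in for the cache and the DB as before. The missing key now reaches the DB only once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of caching a placeholder for rows that do not exist.
class NullObjectCache {
    // Fixed marker string: compares by value, so equals() still works on a deserialized copy.
    static final String INVALID_OBJECT = "__NO_DATA__";

    final Map<String, Object> cache = new ConcurrentHashMap<>(); // stands in for memcached/redis
    final Map<String, Object> db = new ConcurrentHashMap<>();    // stands in for the database
    final AtomicInteger dbCalls = new AtomicInteger();

    Object getFromDb(String key) {
        dbCalls.incrementAndGet();
        return db.get(key);
    }

    Object getObject(String key) {
        Object o = cache.get(key);
        if (o == null) {
            o = getFromDb(key);
            // cache the miss as well, so later callers stop at the cache
            cache.put(key, o != null ? o : INVALID_OBJECT);
        }
        return INVALID_OBJECT.equals(o) ? null : o;
    }

    public static void main(String[] args) {
        NullObjectCache s = new NullObjectCache();
        s.db.put("present", "row");
        for (int i = 0; i < 100; i++) {
            s.getObject("present");
            s.getObject("missing"); // still returns null to the caller
        }
        System.out.println("DB calls: " + s.dbCalls.get()); // prints "DB calls: 2"
    }
}
```

In practice the placeholder is usually given a shorter expiry than real data, so a row inserted later becomes visible without an explicit invalidation; the in-memory map here has no expiry at all.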
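- Going further, the getFromDb() call above can still run in several threads at once: every request that misses the cache before the first one has filled it goes to the DB. Depending on the business scenario, a distributed lock can be added: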
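Object getObject() {
    Object o = getFromCache(key);
    if (o == null) {
        // lock first, so unlock() in finally only runs for a lock we actually hold
        if (cache.lock(key)) {
            try {
                // re-check: another thread may have filled the cache before we got the lock
                o = getFromCache(key);
                if (o == null) {
                    o = getFromDb();
                    if (o != null) {
                        setToCache(key, o);
                    } else {
                        setToCache(key, invalidObject);
                    }
                }
            } finally {
                cache.unlock(key);
            }
        }
        // lock not acquired: o is still null; depending on the business, wait briefly and re-read the cache
    }
    // value comparison, since a remote cache returns a copy
    return invalidObject.equals(o) ? null : o;
}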
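A runnable sketch of the locked variant under concurrency. The distributed lock is replaced by an in-process stand-in (ConcurrentHashMap.putIfAbsent, in the spirit of Redis SET with NX); a real distributed lock would also need an owner token and an expiry so that a crashed holder cannot block the key forever. LockedCache and its helpers are hypothetical. Threads that lose the race wait briefly and re-read the cache instead of returning null, and the re-check after acquiring the lock means the DB is queried once no matter how many threads miss at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: placeholder caching plus a per-key lock around the DB load.
class LockedCache {
    static final String INVALID_OBJECT = "__NO_DATA__";

    final Map<String, Object> cache = new ConcurrentHashMap<>();
    final Map<String, Object> db = new ConcurrentHashMap<>();
    final Map<String, Boolean> locks = new ConcurrentHashMap<>(); // in-process stand-in for a distributed lock
    final AtomicInteger dbCalls = new AtomicInteger();

    boolean tryLock(String key) { return locks.putIfAbsent(key, Boolean.TRUE) == null; } // like SET NX
    void unlock(String key) { locks.remove(key); }

    Object getFromDb(String key) {
        dbCalls.incrementAndGet();
        sleepQuietly(50); // simulate a slow query so many threads miss at once
        return db.get(key);
    }

    Object getObject(String key) {
        Object o = cache.get(key);
        while (o == null) {
            if (tryLock(key)) {
                try {
                    o = cache.get(key); // re-check: another thread may have filled it already
                    if (o == null) {
                        Object fromDb = getFromDb(key);
                        o = fromDb != null ? fromDb : INVALID_OBJECT;
                        cache.put(key, o);
                    }
                } finally {
                    unlock(key); // reached only when we hold the lock
                }
            } else {
                sleepQuietly(5); // lost the race: wait briefly, then read the cache again
                o = cache.get(key);
            }
        }
        return INVALID_OBJECT.equals(o) ? null : o;
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Fires `threads` concurrent lookups of a missing key and returns the number of DB calls.
    static int concurrentDbCalls(int threads) {
        LockedCache c = new LockedCache();
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    return;
                }
                c.getObject("missing");
            });
            t.start();
            workers.add(t);
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return c.dbCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("DB calls with 20 threads: " + concurrentDbCalls(20)); // prints "DB calls with 20 threads: 1"
    }
}
```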
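Takeaways and recommendations
- Think multi-threaded: assume every line of code may be executed by many threads concurrently under high load
- Think frugally: for every line of code, and especially every network operation (cache or DB), ask whether it can be saved altogether or whether several operations can be merged into one
- Think in batches: can several actions be done in one go? For example, when shopping at the market, everyone buys all five vegetables in a single trip rather than making one trip per vegetable. Why not write code the same way?
- Think every piece of logic through carefully: any oversight can take production down, crash the service, and cause serious consequences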
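The batching advice applies directly to cache access: most cache servers offer a multi-key read (for example Redis MGET or a memcached multi-get), so N lookups can cost one round trip instead of N. The sketch below only counts round trips on a hypothetical in-memory client (CountingCache, getAll are made-up names; put merely seeds data and is not counted), mirroring the five-vegetable example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical remote-cache client that only counts network round trips.
class CountingCache {
    private final Map<String, String> data = new HashMap<>();
    int roundTrips = 0;

    void put(String key, String value) { data.put(key, value); } // seeding only, not counted

    // One round trip per key.
    String get(String key) {
        roundTrips++;
        return data.get(key);
    }

    // One round trip for the whole batch, in the spirit of Redis MGET.
    Map<String, String> getAll(List<String> keys) {
        roundTrips++;
        Map<String, String> out = new HashMap<>();
        for (String k : keys) {
            String v = data.get(k);
            if (v != null) {
                out.put(k, v);
            }
        }
        return out;
    }
}

class BatchDemo {
    public static void main(String[] args) {
        List<String> veggies = Arrays.asList("cabbage", "carrot", "tomato", "potato", "onion");
        CountingCache cache = new CountingCache();
        for (String v : veggies) {
            cache.put(v, v + "-price");
        }

        for (String v : veggies) {
            cache.get(v); // one trip to the market per vegetable
        }
        int oneByOne = cache.roundTrips;

        cache.roundTrips = 0;
        Map<String, String> all = cache.getAll(veggies); // all five in one trip
        System.out.println(oneByOne + " round trips vs " + cache.roundTrips + " (" + all.size() + " values)");
        // prints "5 round trips vs 1 (5 values)"
    }
}
```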