一次golang fasthttp踩坑经验
一个简单的系统,结构如下:
我们的服务A接受外部的http请求,然后通过golang的fasthttp将请求转发给服务B,流程非常简单。线上运行一段时间之后,发现服务B完全不再接收任何请求,查看服务A的日志,发现大量的如下错误
从错误原因看是因为连接被占满导致的。进入服务A的容器中(服务A和服务B都是通过docker启动的),通过netstat -anlp查看,发现有大量的tpc连接,处于ESTABLISH。我们采用的是长连接的方式,此时心里非常疑惑:1. fasthttp是能够复用连接的,为什么还会有如此多的TCP连接,2.为什么这些连接不能够使用了,出现上述异常,原因是什么?
从fasthttpclient源码出发,我们调用请求转发的时候是用的是
f.Client.DoTimeout(req, resp, f.ExecTimeout),其中f.Client是一个fasthttp.HostClient,f.ExecTimeout设置的是5s。追查代码,直到client.go中的这个方法
func (c *HostClient) doNonNilReqResp(req *Request, resp *Response) (bool, error) { if req == nil { panic("BUG: req cannot be nil") } if resp == nil { panic("BUG: resp cannot be nil") } atomic.StoreUint32(&c.lastUseTime, uint32(time.Now().Unix()-startTimeUnix)) // Free up resources occupied by response before sending the request, // so the GC may reclaim these resources (e.g. response body). resp.Reset() // If we detected a redirect to another schema if req.schemaUpdate { c.IsTLS = bytes.Equal(req.URI().Scheme(), strHTTPS) c.Addr = addMissingPort(string(req.Host()), c.IsTLS) c.addrIdx = 0 c.addrs = nil req.schemaUpdate = false req.SetConnectionClose() } cc, err := c.acquireConn() if err != nil { return false, err } conn := cc.c resp.parseNetConn(conn) if c.WriteTimeout > 0 { // Set Deadline every time, since golang has fixed the performance issue // See https://github.com/golang/go/issues/15133#issuecomment-271571395 for details currentTime := time.Now() if err = conn.SetWriteDeadline(currentTime.Add(c.WriteTimeout)); err != nil { c.closeConn(cc) return true, err } } resetConnection := false if c.MaxConnDuration > 0 && time.Since(cc.createdTime) > c.MaxConnDuration && !req.ConnectionClose() { req.SetConnectionClose() resetConnection = true } userAgentOld := req.Header.UserAgent() if len(userAgentOld) == 0 { req.Header.userAgent = c.getClientName() } bw := c.acquireWriter(conn) err = req.Write(bw) if resetConnection { req.Header.ResetConnectionClose() } if err == nil { err = bw.Flush() } if err != nil { c.releaseWriter(bw) c.closeConn(cc) return true, err } c.releaseWriter(bw) if c.ReadTimeout > 0 { // Set Deadline every time, since golang has fixed the performance issue // See https://github.com/golang/go/issues/15133#issuecomment-271571395 for details currentTime := time.Now() if err = conn.SetReadDeadline(currentTime.Add(c.ReadTimeout)); err != nil { c.closeConn(cc) return true, err } } if !req.Header.IsGet() && req.Header.IsHead() { resp.SkipBody = true } if c.DisableHeaderNamesNormalizing { resp.Header.DisableNormalizing() } br := c.acquireReader(conn) if err = resp.ReadLimitBody(br, c.MaxResponseBodySize); err != nil { c.releaseReader(br) c.closeConn(cc) // Don‘t retry in case of ErrBodyTooLarge since we will just get the same again. retry := err != ErrBodyTooLarge return retry, err } c.releaseReader(br) if resetConnection || req.ConnectionClose() || resp.ConnectionClose() { c.closeConn(cc) } else { c.releaseConn(cc) } return false, err }
请注意c.acquireConn()这个方法,这个方法即从连接池中获取连接,如果没有可用连接,则创建新的连接,该方法实现如下
func (c *HostClient) acquireConn() (*clientConn, error) { var cc *clientConn createConn := false startCleaner := false var n int c.connsLock.Lock() n = len(c.conns) if n == 0 { maxConns := c.MaxConns if maxConns <= 0 { maxConns = DefaultMaxConnsPerHost } if c.connsCount < maxConns { c.connsCount++ createConn = true if !c.connsCleanerRun { startCleaner = true c.connsCleanerRun = true } } } else { n-- cc = c.conns[n] c.conns[n] = nil c.conns = c.conns[:n] } c.connsLock.Unlock() if cc != nil { return cc, nil } if !createConn { return nil, ErrNoFreeConns } if startCleaner { go c.connsCleaner() } conn, err := c.dialHostHard() if err != nil { c.decConnsCount() return nil, err } cc = acquireClientConn(conn) return cc, nil }
其中 ErrNoFreeConns 即为errors.New("no free connections available to host"),该错误就是我们服务中出现的错误。那原因很明显就是因为!createConn,即无法创建新的连接,为什么无法创建新的连接,是因为连接数已经达到了 maxConns = DefaultMaxConnsPerHost = 512(默认值)。连接数达到最大值了,但是为什么连接没有回收也没有复用,从这块看,还是没有看出来。又仔细的查了一下业务代码,发现很多服务A到服务B的请求,都是因为超时了而结束的,即达到了 f.ExecTimeout = 5s。
又从头查看源码,终于发现了玄机。
温馨提示: 本文由杰米博客推荐,转载请保留链接: https://www.jmwww.net/file/web/9929.html