Updated: Jun 17, 2018
We have some Scala code that consumes text from the InputStream of an HTTP response. Like any good Scala developer, I handed the response over to a function that returns a String. This function performs some validations, such as checking that the response status is 200, and then consumes the InputStream. It did so by creating a scala.io.Source from the InputStream and calling source.mkString to read the response body.
Or so I thought.
Apparently, scala.io.Source is an Iterator[Char] and inherits the mkString() function from TraversableOnce. Calling TraversableOnce.mkString() appends every member of the TraversableOnce instance to a StringBuilder. That is fine unless the instance is actually an abstraction over an IO-bound stream, in which case it consumes the stream one character at a time. This is, as some of you might know, a terribly inefficient way to consume an InputStream, especially a network-bound one. When we ran a thread profiler, we were horrified to discover that we were spending almost 70% of the time waiting on IO while consuming these HTTP responses.
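The two ways of reading the stream can be sketched roughly as follows. This is a minimal illustration, not our production code: the object name, the helper names (readWithMkString, readWithGetLines, timeNanos) and the in-memory payload are all made up for the example. An in-memory stream will not reproduce the numbers from a real network response, and the behavior of mkString on a Source may differ between Scala versions.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.io.Source

// Hypothetical names throughout; a sketch for comparison only.
object SourceReadSketch {

  // The approach described above: Source is an Iterator[Char],
  // and mkString is called directly on it.
  def readWithMkString(in: InputStream): String = {
    val source = Source.fromInputStream(in, "UTF-8")
    try source.mkString finally source.close()
  }

  // The alternative: iterate over lines and join them again.
  // Note that this normalizes line terminators to "\n" and
  // drops a trailing newline, so it is not byte-for-byte identical.
  def readWithGetLines(in: InputStream): String = {
    val source = Source.fromInputStream(in, "UTF-8")
    try source.getLines().mkString("\n") finally source.close()
  }

  // Runs a block once and returns its result with the elapsed nanoseconds.
  def timeNanos[A](block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block
    (result, System.nanoTime() - start)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for an HTTP response body (assumption: plain UTF-8 text).
    val payload = (1 to 10000).map(i => s"line $i").mkString("\n").getBytes("UTF-8")

    val (_, mkStringTime) = timeNanos(readWithMkString(new ByteArrayInputStream(payload)))
    val (_, getLinesTime) = timeNanos(readWithGetLines(new ByteArrayInputStream(payload)))

    println(s"Time with mkString: $mkStringTime, time with getLines: $getLinesTime")
  }
}
```

A single un-warmed run like this is only a rough indication; for anything you intend to rely on, measure against the real response stream.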
Time with mkString: 548707000, time with getLines: 1868000
This is a difference of about x300 in favor of the version with getLines()!
Edit: I have created an issue in the Scala issue tracker for this bug.
This post was written by Shai Yallin
You can follow him on Twitter