There are many posts online dedicated to questions an applicant for a developer role can be asked. Most of those questions are basic, intro level, aiming at verifying that a candidate is above some basic level of knowledge (for example, checkout the 100+ Coding Interview Questions for Programmers post - those questions are either entry level, algorithmic or very simple solving questions). This is not how we interview and those are not the questions we ask.
In an interview, the questions we ask take about 15 min to answer, and there are no single correct answers. What we are looking for is to evaluate the candidate based on the discussion itself, their thought process, and their experience.
The first thing we are checking is if we can actually talk with the candidate? Can we understand each other? Have a good conversation about a technical problem? Pros / Cons of different options? Does the candidate understand my requirements? Can they find solutions? Find missing needs? Detect blind spots? Can they challenge?
Let’s dive into an example:
We have 10 image files on a disk, on my MacBook. I want to sharpen those images using an algorithm called USM - it is a CPU intensive algorithm, with running time that depends on the image size - running about 2-10ms per megabyte of image size.
Given an image sharpening library is available, how do we write the code to sharpen the images?
Any programming language can be used here. We are not even looking for code that actually compiles - we are looking for ideas and patterns, not exact syntax.
At this point, most users answer something like -
Well, that was a great PoC. We have an application to sharpen images.
However, since we are working for Wix, we have a bit more images that we need to sharpen. 1,200,000 images to be exact. We have all the files in a drive, with a text file containing all the file names.
How does the answer above change now that we have 1,200,000 images to process?
At this point, there are two directions candidates take - either scaling out, or continuing to work on a single host.
For the people who are suggesting to scale out, we will ask what approach they consider? Building their own scale-out infrastructure? Using Kubernetes, and if so then how? Using functions? Again how would those be used? We will spend a few minutes looking into proposed solutions and understanding how they intend to build such a system, then refocus on the single host option - as this is still a simple enough problem that does not require long dev cycles.
Lets dig into single host - assuming the MacBook in question has 16 Cores (it has a powerful CPU), how can we minimize the time for the image processing to complete?
The first answer that we commonly hear is to split the names array into X groups, and process each in turn. This means loading all file names into memory as an array, splitting that array into X arrays, each containing 1/X of the names, and processing the files in turn. The code for such a solution looks something like the following:
With this solution, we wait for the candidate to raise any of the following concerns - and if not, we will actively ask the following questions.
Is there a concern about the memory load of loading 1,200,000 file names? 1,200,000 file names take about 1GByte of memory, so most MacBooks today should be able to handle it.
What is the right value for X? How does it correlate to the number of CPU Cores? Many people will answer 16, as that’s the number of Cores, but that can leave the computer unresponsive. Maybe it is better to assume 15?
What will be the running time of such an application? Assuming each image is about ~10 MBytes, it can be sharpened within 100 ms. Add about 10-50 ms for reading each file and writing it back to the hard drive - the process will take about 4-5 hours of runtime to complete.
Given the process will take hours to run, is there a concern about bugs, crashes, restarts, etc? How do you handle those?
Is this the most efficient way to run this process? (answer - no).
So, how can we make this process run faster? What is the optimal algorithm?
At this point, normally the candidate starts thinking creatively. We are looking for two aspects in this part of the interview - their ability to be creative and their understanding of IO.
Some leading questions -
What if the groups are not balanced? What if group 2 ends up with all the large images, while group 1 with all the small images? What if group 1 completes the processing way before group 2?
How do we utilize the resources to the best possible extent?
The answer we are looking for is splitting the process per resource, using different queues - one for IO, another - for CPU. The candidate can use an Actor model, or a consumer / producer model, or just a simple queueing model.
One such solution is below -
The follow-up questions for this solution are -
How many concurrent operations do we allow for each queue? What values for X, Y and Z? A good answer is 15 for the processing queue (number of Cores - 1) and 1 for both IO queues.
Can we improve this solution in any way?
To summarise, when we interview people for technical positions, asking a question like that takes up a large part of the interview process. However, it is not the only screening means.
We are also asking each candidate to actually write code as part of the interview - on an actual computer, with all the online resources they are used to. This can be done in the office, at home, or while doing pair programming with one of our developers.
We are also diving into the things the candidate has done at their previous places of work, what they have built, what was their part in the decision making process (technical aspects of it), etc.
Last, we take the insights from all the interviews to help us better understand if we have a good fit for the candidate. A question we have raised in this post plays a large part in forming that understanding.
This post was written by Yoav Abrahami
You can follow him on Twitter
For more engineering updates and insights:
Visit us on GitHub
Subscribe to our YouTube channel