Starting with a monolith application is not really uncommon. But when the demand arises it is important to have a plan or path to go distributed, either as a Big Bang change or as a phased approach. I took the phased approach, and the phases sort of happened naturally (without my even knowing the right technical terms, BUT the concept and vision were clear). I will try to tell the story in this post.
I am sure there's always a better way of doing it, but this is how I have approached it.
Firstly, let's set some functional specifications for our "sample app":
- The app will take requests from users via the browser (there's no restriction on how many users can hit the app in a given second).
- The app will show the response/result of the request via dynamic web pages.
- The app will then do some heavy computing tasks, during which it may call an external IO for reading (not shown in the design, as the read operation is assumed to be fast enough).
- The app will then write to a slow IO.
Let the story begin.
Back story:
I had an idea for a web-based application to solve a certain business problem. I wasn't sure whether the application would be adopted, or whether I would use it long term. But I wanted to build something quickly and start using it, to see if it actually solved the problem or became useful to other people or to me.
Really, most new ideas start like this and evolve from here. To build the perfect solution there are just too many things to consider: it takes too long to build, there are too many scenarios and edge cases, and it is too much effort for something that may or may not last long OR, worse, that someone else will do before you.
Phase-1:
So I did the terrible thing and built a monolithic Java application. I used Spring Boot, with Thymeleaf as the templating engine (do not ask why, I just did it). The "sample app" looked something like this:
Let's call this "The Monolith" and dissect a little bit what's really happening here:
- The app is on Heroku ecosystem.
- The app's source code is on Git (Bitbucket / Github).
- There's a deployment pipeline to automate the deploy onto Heroku.
- Auto scaling is set up (for the Dynos to auto scale) based on the default p95 response time (min 1, max 2). Reference here: https://blog.heroku.com/heroku-autoscaling
- Dyno type was Standard-1x.
Now let's dissect how the "sample app" is working here:
- A Spring Boot based Java app. Thymeleaf is the templating engine. Tomcat is the web server.
- By design/nature Tomcat will use 1 thread per request. This will cause some issues, which I will point out in the next section.
- When the request is received the "sample app" reads a BIG file that contains json data and does some heavy cpu intensive computing.
- Once the computation is finished the "sample app" makes write to our SLOW IO.
Let's discuss the issues that will happen (and has happened) in this scenario:
- The app is running on Standard-1x with a scale of max 2. So it is restricted to 512 threads only.
- The app will take some time during the heavy process (let's assume the IO read is fast enough and negligible. Also for simplicity let's not worry about memory consumption for now).
- The app will take even longer for the IO write because the IO response is slow.
Now, a couple of things will happen here, and things are about to go terribly wrong when our "sample app" starts gaining popularity (meaning more users making more requests).
Everything that can go wrong will go wrong:
- Due to slow processing (read + compute), each request thread will take longer to finish.
- The above will have a cascading effect that will cause the app to hit the 512 max thread limit.
- To make matters even worse, due to the slow IO write the threads will take even longer, and eventually many (most likely at random) will hit Heroku's 30-second "time out quickly" limit. The randomness depends on how the IO service performs as the number of simultaneous write operations increases.
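A back-of-the-envelope way to see how quickly that 512 limit gets hit is Little's law: threads busy at once is roughly requests per second multiplied by the seconds each request holds a thread. Here is a minimal sketch of that arithmetic. The load figures are hypothetical, picked only for illustration, and `ThreadDemand` is not part of the real app.

```java
/** Rough estimate of concurrent request threads using Little's law. */
class ThreadDemand {

    /** The thread limit quoted above for 2 Standard-1x Dynos. */
    static final int THREAD_LIMIT = 512;

    /** Threads busy at steady state = requests per second * seconds each request holds a thread. */
    static long requiredThreads(double requestsPerSecond, double secondsPerRequest) {
        return (long) Math.ceil(requestsPerSecond * secondsPerRequest);
    }

    public static void main(String[] args) {
        // Hypothetical load figures: {requests per second, seconds per request}.
        double[][] scenarios = { {20, 2.0}, {50, 8.0}, {50, 12.0} };
        for (double[] s : scenarios) {
            long needed = requiredThreads(s[0], s[1]);
            String verdict = needed > THREAD_LIMIT ? "over the limit" : "ok";
            System.out.printf("%.0f req/s x %.1f s = %d threads (%s)%n",
                    s[0], s[1], needed, verdict);
        }
    }
}
```

The point of the sketch: the request rate does not even need to grow. If the slow IO alone stretches each request from 8 to 12 seconds, the same 50 requests per second go from 400 busy threads to 600.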
Phase-2:
So I read a few blogs here and there and took the next quick action (ignore the money/cost side for now, let's say I had a money-growing tree in my backyard). I won't say whether it was good or bad. But in a real-life scenario it served me ok.
Here's what I did:
- Added threading capability (on top of Tomcat's request thread). So the "sample app" will create a background thread per request (depending on if the request calls for heavy compute and/or IO write).
- Scaled Dynos vertically. I upgraded the Web Dynos to Performance-m type. Reference here: https://devcenter.heroku.com/articles/dyno-types
- Made some optimisations to how the slow IO write is called.
Why I did it:
- Serving the user request is priority #1. It does not have to return the computed result right away, but the "sample app" should give the requesting user some indication right away that it is working on the request. Simply waiting for a response causes frustration (and when users are frustrated they do interesting things like hitting the refresh button and resubmitting the request, which makes things even worse :).. but it's fair).
- A nested/child thread will work ok in this context. All the request thread has to do is work out whether the processing is going to be long or not (in this context, heavy computing and/or slow IO write), and if the answer is yes, create a new thread and offload the heavy task to the child thread. Then it immediately responds to the user's request saying "Trust me, I am working on it, check back in a moment or I will let you know when I am done."
- I did not go crazy and create child threads without any limit. I used ThreadExecutionPool sized to match the number of cores/shares (Compute 11x).
- The vertical scaling made sure that the "sample app" and its threads had enough memory (2.5G) and enough CPU (11x) on a dedicated environment (probably c-2-mid in AWS terms).
- I did a few things right when I was building the app:
- I made sure that the application maintained strict layers of services and that services were constrained by their domain.
- All the service classes are stateless, which inherently made them somewhat thread safe.
- Session variables and/or shared vars were kept to a bare minimum and only in the controller space. So none of these vars flowed through to the services.
- When something was said to be a POJO, I made sure the object remained as dumb as it could be.
- Used some patterns (e.g. flyweight) to keep the memory consumption low during read+compute.
- Separation of concerns was strictly maintained. The read was isolated from the write, and so was the compute. Controllers only handle GET and POST and nothing creepy.
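The offloading idea above can be sketched roughly like this. It is a minimal sketch, not the actual code from my app: `OffloadSketch`, `heavyTask` and `handle` are hypothetical names, the sleep stands in for the heavy compute and slow IO write, and sizing the pool with `availableProcessors()` is my assumption for "matching the number of cores".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OffloadSketch {

    // Bounded pool: one worker thread per available core, so child threads cannot grow without limit.
    static final ExecutorService POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    /** Hypothetical stand-in for the heavy compute plus the slow IO write. */
    static String heavyTask(String payload) {
        try {
            Thread.sleep(200); // simulate slow work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "done:" + payload;
    }

    /** What the request thread does: hand off heavy work to the pool and return immediately. */
    static CompletableFuture<String> handle(String payload, boolean heavy) {
        if (!heavy) {
            // Light requests are answered directly on the request thread.
            return CompletableFuture.completedFuture("done:" + payload);
        }
        return CompletableFuture.supplyAsync(() -> heavyTask(payload), POOL);
    }

    public static void main(String[] args) {
        CompletableFuture<String> result = handle("report-42", true);
        // The request thread is free here and can tell the user "I am working on it".
        System.out.println("Finished already? " + result.isDone());
        System.out.println(result.join()); // blocks only for the demo
        POOL.shutdown();
    }
}
```

Because the pool is fixed in size, extra heavy tasks wait in the executor's internal queue instead of spawning more threads, which is the whole point of not going crazy with child threads.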
The Result:
The issues that I was facing went away. Requests and responses were smooth enough. Things started to look good.
BUT things still weren't really up to the mark:
- Cost (billing) was through the roof. Of course I do not have a money-growing tree in any yard where I live. At first it was ok. But the Dynos scaled to MAX (6) during spikes, and it was spiking often as the "sample app" gained popularity. The number of regular users started to increase exponentially. The cost was adding up.
- The slow IO got even worse. As the number of threads increased (with each Dyno scale up), it could not handle the load and started to time out and/or reject requests. So I had to do some creepy programming to keep retrying, which made my code ugly.
- The slow IO is the main culprit here. While the IO write threads were getting stuck, the request threads kept creating more threads (per scaled Dyno), which made the IO write even slower.
- The performance of the read+compute side was relatively ok.
It was ok, it worked ok. But it was not sustainable. I started to foresee the things that would eventually blow up.
Phase-3:
I finally listened to what Heroku has been telling me repeatedly all along.
Reference here: https://devcenter.heroku.com/articles/background-jobs-queueing
Here's what I did:
- Divided the app into 2. Web and Worker.
- Web will do web processing.
- Thread will process fast read+compute operation as it was doing before. But no IO write anymore.
- Rather it will write the tasks (with its payload) to the queue.
- I had suffered enough with the inconsistent behavior of Heroku affinity. So I switched to Redis for sessions. My session was already lean, so this wasn't an issue at all. In fact it went more smoothly than I anticipated.
- The worker would pick tasks from the queue and perform the slow IO operation.
- I had to refactor things a little bit (e.g. dividing the project into 2, making commons libs shared between both to avoid serialize and de-serialize issues (check the videos, you will know), parent/child poms, etc.). But it was worth it.
- I decreased the Web Dynos Autoscale max down to 3. But I have noticed that the "sample app" rarely reaches 3. At the same peak as before it reaches 2 Dynos most of the time. Note to future self: Decrease the max to 2.
- Each Dyno spun up 3 workers. Thus the "sample app" at its max would spin up 9 workers. Yes, they would perhaps run for longer, but that's acceptable. (I measured the avg processing time; it was within reasonable parameters.)
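The Web/Worker split above boils down to a producer and a fixed number of consumers around a queue. Here is a minimal in-process sketch of that shape. To keep it runnable without a broker, a `LinkedBlockingQueue` stands in for the RabbitMQ queue and a list stands in for the slow IO; those stand-ins, and every name in the block, are assumptions for illustration and not my actual code.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class WebWorkerSketch {

    /** Pushes taskCount tasks through a queue drained by workerCount workers; returns what was written. */
    static List<String> run(int taskCount, int workerCount) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // stand-in for the RabbitMQ queue
        List<String> written = new CopyOnWriteArrayList<>();       // stand-in for the slow IO
        CountDownLatch done = new CountDownLatch(taskCount);
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);

        for (int i = 0; i < workerCount; i++) {
            workers.execute(() -> {
                try {
                    while (true) {
                        String task = queue.take(); // blocks until a task arrives
                        Thread.sleep(50);           // simulate the slow IO write
                        written.add(task);
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // shutdownNow() ends the loop
                }
            });
        }

        // "Web" side: enqueue the task with its payload and move on, no IO write here.
        for (int i = 1; i <= taskCount; i++) {
            queue.put("task-" + i);
        }

        boolean finished = done.await(10, TimeUnit.SECONDS);
        workers.shutdownNow();
        if (!finished) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Tasks written: " + run(9, 3).size());
    }
}
```

However many tasks the web side enqueues, the slow IO only ever sees `workerCount` concurrent writes. That cap is what took the pressure off the IO in my case.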
Result:
- Costs are now slashed to half. Which is always good (more money at beer-o-clock).
- User experience was improved by 10x. The users would still wait with a "processing" gif on their page and due to the nature of the application this is normal and standard. But the thread response was faster. So the average "result produce" time decreased significantly.
- The slow IO was also happy, as it was getting far fewer writes at any given time. The rejections stopped. It would still time out occasionally, but rather than handling this with creepy code (I removed the creepy code), the job/task would simply go back to the queue to be processed again. And I did not have to do anything extra for it. It was by design with RabbitMQ and Spring-Rabbit-MQ.
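To show what "go back to the queue" means in practice, here is a small simulation. In my real setup the broker did the requeueing for me, so this block only imitates that behaviour with a plain in-memory queue. The `Task` class, the fake `writeToSlowIo` failure rule and the `maxAttempts` cap are all assumptions added for the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class RequeueSketch {

    /** A task plus how many times it has been attempted so far. */
    static final class Task {
        final String id;
        final int attempts;

        Task(String id, int attempts) {
            this.id = id;
            this.attempts = attempts;
        }
    }

    /** Hypothetical slow IO: times out on the first attempt of any task whose id ends with "flaky". */
    static boolean writeToSlowIo(Task task) {
        return !(task.id.endsWith("flaky") && task.attempts == 0);
    }

    /** Drains the queue; a failed task goes back to the tail instead of being retried in a loop. */
    static List<String> drain(Queue<Task> queue, int maxAttempts) {
        List<String> log = new ArrayList<>();
        while (!queue.isEmpty()) {
            Task task = queue.poll();
            if (writeToSlowIo(task)) {
                log.add("ok:" + task.id);
            } else if (task.attempts + 1 < maxAttempts) {
                queue.add(new Task(task.id, task.attempts + 1)); // requeue for a later attempt
                log.add("requeued:" + task.id);
            } else {
                log.add("dropped:" + task.id); // give up after maxAttempts tries
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Queue<Task> queue = new ArrayDeque<>();
        queue.add(new Task("a-flaky", 0));
        queue.add(new Task("b", 0));
        System.out.println(drain(queue, 3)); // prints [requeued:a-flaky, ok:b, ok:a-flaky]
    }
}
```

Notice that the failed task does not block the one behind it. Task "b" is written while "a-flaky" waits its turn at the back of the queue, which is much kinder to the slow IO than hammering it in a retry loop.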
Things are finally looking up. This is where I paused scaling my real life application.
But I have been thinking that I can do even better.
Phase-4:
This is a future phase and in theory this should work a lot better, both from costing and performance perspective. Sure, it will need further refactoring but that will be worth it.
Here's what I am thinking of doing:
- Remove the Multi-threading from the web application by not creating any child threads from request threads.
- Make the request/response process super dumb. All the Web App should do is get the request from the user and raise a task/job in the queue for someone to process.
- Divide the Web application further into 2, making the "sample app" a collection of 3 small apps:
- One for request handling
- One for reading+computing
- One for writing to slow IO
- Completely separate the code bases and have different pipelines, as these are really 3 different apps. So the total number of code bases would possibly be 4 (3 apps + 1 commons lib).
- Refactor to check statuses from queue rather than result on a CompletableFuture from thread. I should be able to even enhance user experience (Note to future self: Do the cool stuff here.).
- Decrease the Dyno size vertically, back to Standard-1x.
- Increase the number of workers per Dyno (I should be able to do this creating a clever pipeline .. post/experiment for another day).
- Utilise Rails Autoscale to scale the worker processes up and down. Reference here: https://devcenter.heroku.com/articles/rails-autoscale
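I have not built this phase yet, so take the following as a thinking-out-loud sketch of the "check statuses instead of waiting on a CompletableFuture" idea. The request handler hands back a job id right away, the workers report progress, and the browser polls by id. The `ConcurrentHashMap` is only a stand-in for whatever shared store the three apps would actually use, and all names here are hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class JobStatusSketch {

    enum Status { QUEUED, PROCESSING, DONE }

    // Stand-in for a shared status store that all three apps could reach.
    private final Map<String, Status> statuses = new ConcurrentHashMap<>();

    /** Request handler: register the job and hand back an id immediately. */
    String submit() {
        String jobId = UUID.randomUUID().toString();
        statuses.put(jobId, Status.QUEUED);
        return jobId;
    }

    /** Workers report progress; updates for unknown ids are ignored. */
    void update(String jobId, Status status) {
        statuses.replace(jobId, status);
    }

    /** Polled from the browser; returns null for an unknown id. */
    Status statusOf(String jobId) {
        return statuses.get(jobId);
    }

    public static void main(String[] args) {
        JobStatusSketch tracker = new JobStatusSketch();
        String jobId = tracker.submit();
        System.out.println(jobId + " -> " + tracker.statusOf(jobId));
        tracker.update(jobId, Status.PROCESSING); // read+compute app picks it up
        tracker.update(jobId, Status.DONE);       // slow IO writer finishes
        System.out.println(jobId + " -> " + tracker.statusOf(jobId));
    }
}
```

With something like this, no request thread holds a future open, so the web app stays dumb and the "processing" page only needs the job id.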
Benefits:
- Costs will go down even more (if not significantly).
- I could possibly even decrease the Web Autoscaling max down to only 2, because all the Web app will do is get the request and add a task/job to the queue.
- Standard-1x Dyno is going to run all the time anyway with Number of process types = unlimited. I should take advantage of this by creating more worker processes and offloading the heavy lifting to the workers.
Live coding of the "Sample App":
Coming soon.... (check in 3 days time)
Here're some good articles from Heroku:
- https://devcenter.heroku.com/articles/custom-domains -- It does not have to have herokuapp in the url.
- https://devcenter.heroku.com/articles/using-terraform-with-heroku -- Terraform
- https://www.heroku.com/private-spaces -- for dedicated environment, AZs for multi-region.
- https://blog.heroku.com/private-spaces-internal-routing -- for VPC and/or VPN and security. This is particularly handy for a microservice architecture. Something I am meaning to do as phase 2 of this project: a-cqrs-microservice-architecture-my-way.html
- https://devcenter.heroku.com/articles/scaling
- https://devcenter.heroku.com/articles/pipelines