Recently we moved 8 AWS accounts under a new organization. It was full of ups and downs; there was hope, there was recklessness, there was fear, happiness, an adrenaline rush, and a feeling of failure.
It all started as the same old story: with a spark of an idea, maybe a blog post, maybe a podcast session, or a solutions engineer mentioning it in every email.
Why did we do it?
The new hype, a.k.a. advised industry practice, nowadays is splitting the workload into multiple business unit accounts. The golden path is built around the hype, and new accounts are popping up every now and then; so it makes sense to move to a better AWS account structure.
The old structure resembled a monolith. There was a prod account first, and all new accounts went under it, naturally, like a proper startup. One by one, requirement by requirement. You can even say we are moving to a microservice architecture, discuss its trade-offs, and mention that even Amazon moved back to monoliths[1], allegedly[2], but let’s not start that debate here.
Let’s be straight here: the real reason was that we wanted a backup account. We wanted to back up our data in another account within the same region, to save network costs & protect against ransomware. We discussed creating a separate backup account that had no ties to the production account.
What are the risks with this account? The production account will not have permissions over it, its credentials will be separate, all access will be separate. So the only risk is losing the credentials to the backup account. If we kept the old structure, where the prod account is the root account in the organization, then breaching the prod account would potentially mean that the backup account is breached, too. As you can guess, prod is pretty busy with a lot of visitors, like good children making their grandmother happy over the summer by leveraging her coast-side property.
The prod account is prone to a security breach; it is just a time bomb.
But then we thought, together with our beloved account manager, who is a very cool guy: what if we move to a different structure and have an umbrella account on top of prod and backup, where prod doesn’t have access to backup but the root account would have potential access to it? We think the risk of losing root account access is the same as the risk of losing access to a standalone backup account.
Upon this revelation, we decided to change the organization structure and create a new one.
Step 1: create a new root account
That’s rather straightforward, you might think. Of course not.
For ownerless accounts like this, we employ this strategy: create a Google Group, add people to that group, and use the group’s email as the root email. We created the group, then proceeded to create the new AWS account. But we didn’t receive the confirmation email. A support case on Google, check the moderation settings on the Google Group, search email logs… A couple of hours later, we saw that the emails were pending moderator approval. We approved them, verified the account, then created the organization.
For the record, we had used this strategy before on other AWS accounts we already have. It works. At this specific moment, Google decided to filter messages coming from SES for some reason.
Verifying the account is not easy either, of course: we had to catch the finance team at an available moment to configure the payment method properly. It’s not an easy task.
Step 2: move accounts one by one to new organization
You have an existing organization? You want all of them to go under another organization? Nope, you cannot do it in bulk.
For every account we have, we need to exit the current organization, then accept the invite from the new organization.
But wait, you cannot leave an organization if you don’t have a default payment method! Who is going to pay for your account’s bill?
We know you have been using AWS for some years, creating and destroying services and generating revenue for us, but in order to add a payment method you have to be verified. An Amazon bot will call you on your phone and make you enter a code you see on the screen.
Some of these accounts were created by individuals, so they have some backup payment methods, verified accounts; they were easy to move: just find the person & make the change.
Some of them belonged to groups, with backup payment methods and verified accounts, luckily. Hats off to those people who configured their virtual cards before. We added ourselves to the required group emails, and voila! Change the root email password, log in with it, accept the new invitation.
Not so easy ones
Wait, there are more accounts that belong to some groups, that had no payment method and were never verified. How did that happen?
You know, the good part of the organization structure is that you can create accounts very easily. But when you do that, they don’t have any payment method, no verification, nothing. You can create them, yes, but you are gonna have fun if you want to move them to another organization.
For verification, we chose a sacrifice amongst us to provide their cell phone number. AWS couldn’t call us at first; we had to contact support, wait for correspondence, then make them call us. I assume it’s not their fault, 3rd world country problems.
Then the sacrificed one also provided their card info as the payment method; that’s more about start-up problems. We couldn’t wait for the finance team to come to our rescue for each individual case; it would take forever.
After repeating all the steps for each individual account, we moved them to the new organization.
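The per-account dance in this step can be sketched roughly as below. This is a minimal sketch, not what we actually ran: it assumes both arguments behave like boto3 Organizations clients (one with credentials in the account being moved, one in the new organization's management account), and `DryRunClient` is a hypothetical stand-in that only records calls so the order can be checked without touching AWS. The payment method and phone verification are assumed to be sorted out already; without them the leave call is rejected.

```python
from typing import Any, Dict, List, Tuple


def move_account(member_org: Any, new_mgmt_org: Any, account_id: str) -> List[str]:
    """Move one account: leave the old org, get invited, accept the invite.

    member_org   -- Organizations client using credentials of the account being moved
    new_mgmt_org -- Organizations client for the new organization's management account
    """
    steps: List[str] = []

    # 1. Leave the current organization. This only works once the account
    #    can stand alone (payment method added, phone verified).
    member_org.leave_organization()
    steps.append("left old organization")

    # 2. Invite the now-standalone account from the new organization.
    response: Dict[str, Any] = new_mgmt_org.invite_account_to_organization(
        Target={"Id": account_id, "Type": "ACCOUNT"}
    )
    handshake_id = response["Handshake"]["Id"]
    steps.append(f"invited with handshake {handshake_id}")

    # 3. Accept the invitation from inside the member account.
    member_org.accept_handshake(HandshakeId=handshake_id)
    steps.append("accepted invitation")
    return steps


class DryRunClient:
    """Hypothetical stand-in: records calls instead of talking to AWS."""

    def __init__(self, name: str, log: List[Tuple[str, str]]) -> None:
        self.name = name
        self.log = log

    def leave_organization(self) -> Dict[str, Any]:
        self.log.append((self.name, "leave_organization"))
        return {}

    def invite_account_to_organization(self, Target: Dict[str, str]) -> Dict[str, Any]:
        self.log.append((self.name, "invite_account_to_organization"))
        return {"Handshake": {"Id": "h-dryrun"}}  # made-up handshake id

    def accept_handshake(self, HandshakeId: str) -> Dict[str, Any]:
        self.log.append((self.name, "accept_handshake"))
        return {}
```

With real clients you would pass two `boto3.client("organizations")` objects built from two different sessions. There is no bulk version, so this gets called once per account.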
Step 3: move the root
After all accounts under the old organization are moved, now it’s time to move the root account. We gotta be careful, this is the production account. We might lose access, we will lose access, so we need a backup.
For previous accounts, we didn’t have an access issue. Maybe I should have mentioned in the beginning that we are using Identity Center SSO access over our dear Google integration. We used root accounts, yes, but it is also possible to use an Administrator account to leave an organization and join another one.
The tricky part is, you are logged in to your account using SSO over the old organization (assuming it’s not the root account, of course). Now, when we leave the organization, our session is not logged out yet. When we figured that out, it made the job easier because we could skip the hard work of finding the root account’s password. We still have to verify the root account and add a payment method, but for the old root account, that’s not an issue.
Another tricky part for this account is, the root account belongs to a very busy person, who is almost impossible to get our hands on. And no, it is quite unsafe to get his account credentials; because, while we don’t maintain a perfect system, we do have that 2FA enabled for the root account.
In this case, we figured that an Administrator would work fine. As a backup, we created an IAM user with Administrator access, in case our SSO-logged-in user got logged out and we had to fall back to it.
We got the plan, backup user, previous experience from other accounts; we are ready. But I am still logged in as SSO-admin user. That will turn out to be a mistake.
The next step is deleting the old organization first, because the account cannot accept an invitation if it is already in an organization itself. The problem is, we could lose the SSO-initiated session. But nothing to worry about; we tried it before and the session continues for some time, and we have the backup user still.
So, delete the old org, accept the invitation, and voila! Everything is smooooooth.
Well, almost everything.
Step 4: SSO session? What session?
At this point, I felt safe: the new organization is set up, all accounts are moved. Since I saw the prod account under the new organization, I removed the backup user, because there shouldn’t be any dangling over-privileged user hanging around, right? Right.
The organizations have this handy start page for SSO-login. If you haven’t tried it, it’s amazing; it reduced our login management significantly.
Now, this login page can have an alias. I, the guy obsessed with that Developer Experience thingy, wanted to keep the old alias so that everyone in the company could use the same link to log in to AWS consoles.
But the new organization cannot use the same alias; because it’s already in use! The deleted old organization is still using it! Or AWS thinks that way, resulting in the same thing.
I went back to the old root account, prod, to see if the alias still existed. Because you can add an alias and then remove it, too. And apparently, I thought, removing an organization doesn’t remove the alias. I have to check to be sure. I cannot see it because the prod account is not the root account in the organization anymore. I thought, okay, let me get out of the organization, create the org again, and see if the alias still exists. Go to organization settings, leave the organization, create a new organization, and BAM!
Remember, I was still logged in as the SSO admin user from the old organization. And I had removed the backup admin user as well.
When I created the organization from the prod account again, AWS console remembered that: oh wait, what is this session doing? I think it is time to EXPIRE YOUR SSO SESSION.
You remembered to do it right now? I MEAN, NOW?????
As non-root users, we have lost our SSO logins to the prod account; it’s not in an organization so we cannot use org-level SSO login.
We only have a root user in our prod account.
Step 5: get the access back
We contacted the root account holder, now it’s time to wait.
Simultaneously, we looked into whether we had any assumable roles capable of creating an admin user in the console. Or any programmatic user with credentials lying around somewhere, maybe?
Now this was a good opportunity to evaluate our security mechanisms. It turned out that we did a pretty good job! Because it is kind of impossible to do that. No exposed credentials; provisioning works with assume-roles with some strong identity policies; we need to create some noise to get access back.
We are waiting for the root account owner to get back to us. Meanwhile, there is not much we can do if an incident happens. We have tools in place that let us operate without logging in to the console; we would be fine for a wide variety of cases; but still, the possibility is there.
At this moment, we are hoping; we don’t have a strategy. We have a strategy of waiting; but that’s not a strategy, that’s waiting helplessly. Waiting was a good strategy for the org alias to be refreshed in AWS caches, but not here.
After a while, we got hold of the root account & got our access back.
There are certainly a few lessons to take from this little adrenaline rush. Don’t use SSO users for such a task; don’t trust AWS caches or session durations; keep that backup user if you want to play with organization settings; and so on.
We also learned about our security precautions and how hackable our system is. I knew that it would be hard to get in, but seeing it in action was a relief as well.
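The first lesson can be turned into a small pre-flight guard. This is a sketch under a couple of assumptions: the caller ARN is whatever STS `GetCallerIdentity` returns in its `Arn` field, and Identity Center sessions show up as assumed roles whose name starts with `AWSReservedSSO_`. The function names are hypothetical, and the account ID and user names in the usage note are made up.

```python
import re

# Identity Center (SSO) sessions appear as assumed roles named
# AWSReservedSSO_<PermissionSet>_<suffix>; the partition may be aws, aws-cn, etc.
SSO_ROLE_ARN = re.compile(
    r"^arn:aws[a-z-]*:sts::\d{12}:assumed-role/AWSReservedSSO_[^/]+/.+$"
)


def is_sso_session(caller_arn: str) -> bool:
    """Return True if the ARN looks like an Identity Center (SSO) session."""
    return SSO_ROLE_ARN.match(caller_arn) is not None


def assert_safe_for_org_changes(caller_arn: str) -> None:
    """Refuse to continue when the current identity depends on org-level SSO."""
    if is_sso_session(caller_arn):
        raise RuntimeError(
            "Logged in through SSO: leaving or deleting the organization can "
            "expire this session. Switch to a backup IAM user or root first."
        )
```

Called with an ARN like `arn:aws:iam::123456789012:user/backup-admin` it passes quietly; called with an `AWSReservedSSO_...` assumed-role ARN it raises before any organization setting is touched. It does not check that the backup user still exists, which was the other half of the mistake.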