WTF moments with HashiCorp Vault

Some things I found confusing or did not like about Vault

Fernando Villalba
Aug 27, 2020

Vault is, for the most part, great. I don’t currently know of any other product that comes close to providing what it does, even among the hundreds of managed tools and services that GCP and AWS offer. The closest I’ve seen is GCP’s Secret Manager, but even that doesn’t offer dynamic secrets, so as far as I know Vault is still the best there is in this domain.

That being said, this article is not a love letter extolling all its virtues. It is mostly a therapy session listing some things I wish were better about it and the issues I have encountered on my journey to provision and configure Vault.

It’s worth noting that I still have a lot to learn about Vault, so there may be more things to mention that I haven’t seen. This is not an exhaustive list of issues, only the ones I could remember at this point.

Vault High Availability

I had a bit of an anxiety attack when it came to picking an HA automation framework and platform for Vault because I couldn’t find an ideal solution to provision the cluster. I knew I could have done it myself from scratch, but I certainly didn’t want to, as I was sure someone must have done it before.

And sure enough there was a multitude of choices available, but all of them felt less than ideal to me. My original idea was to run Vault on Kubernetes, but HashiCorp didn’t offer much in the form of official Kubernetes deployment automation that I could use and adapt. There was only something that Google had done for GCP (but I was using AWS) and a Helm chart that was provided but not heavily promoted at the time, and that felt a bit too beta for my taste (also, I am not a big Helm fan yet).

So I ended up asking a HashiCorp engineer for his opinion. He advised using managed instance groups, especially with Consul as a backend, which he also recommended, so that’s what I did.

HashiCorp seems to be promoting Vault on Kubernetes a lot more now, so if I had to do this again from scratch, perhaps that is the route I would take.

The best automation I could find for Vault in AWS with managed groups was created by Gruntwork and hosted by HashiCorp here. This was a great start and I was lucky to stand on the shoulders of giants, but there was still a lot of work to do because these Terraform scripts were missing a lot of things I really wanted:

  • Completely automated upgrades. Their implemented and recommended upgrade procedure for Vault was very manual, which was not how I wanted to conduct my upgrades, so I had to implement a blue/green style deployment procedure.
  • An automated procedure for Consul snapshots, every 10 or 15 minutes.
  • Automated disaster recovery to restore from backup. I wanted to be able to destroy the cluster completely and, when recreating it, pick up from the last backup that was taken.
  • Encrypted boot and internal Consul communication (such as RPC and gossip), which were not implemented by default.
  • An automated approach to storing the secrets needed for provisioning Vault and Consul. Sure, Vault manages your secrets, but what manages your Vault’s secrets?
  • Endpoints for internal communication wherever possible. I don’t think Vault should send traffic over the internet when internal endpoints are available.
  • Peering and routing across multiple accounts, which are also not part of these scripts and needed to be added.

Those were just some of the changes I had to make, so it wasn’t exactly a walk in the park or a few days’ task. It took a while to complete, provision and test. Overall I was quite happy with the end result, but it was far from painless and very time consuming.
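To give a flavour of one item on that list, here is a minimal sketch of a periodic Consul snapshot job in Python. It is not what I actually deployed. It assumes the consul binary is on the PATH and relies on the standard consul snapshot save command. The function names, the file naming scheme and the 10 minute default are made up for illustration.

```python
import subprocess
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def snapshot_command(backup_dir: Path, now: datetime) -> List[str]:
    """Build the `consul snapshot save` command for a timestamped file."""
    # Hypothetical naming scheme: consul-<UTC timestamp>.snap
    name = now.strftime("consul-%Y%m%dT%H%M%SZ.snap")
    return ["consul", "snapshot", "save", str(backup_dir / name)]


def run_forever(backup_dir: Path, interval_seconds: int = 600) -> None:
    """Take a snapshot, sleep, repeat. Raises if a snapshot fails."""
    while True:
        command = snapshot_command(backup_dir, datetime.now(timezone.utc))
        subprocess.run(command, check=True)
        time.sleep(interval_seconds)
```

In a real setup you would more likely trigger a single snapshot from cron or a systemd timer and ship the file to durable storage, rather than keep a loop like this running.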

Everything is a Path in Vault

In Vault, everything is a path, and your policies can and will grant access to anything based on a path.

In theory, this sounds amazing, but I personally find the path structure counterintuitive and weird to use.

For example, you may be misled into thinking you can list all paths, which would have been useful for finding configuration endpoints. But this is not always possible, even with a root token:

curl --header "X-Vault-Token: $(cat ~/.vault-token)" --request LIST "${VAULT_AH_ADD}/v1/sys/"

will not list all the endpoints under the system backend, which would have been useful in some cases.

Let’s say I want to do something simple and list all the auth methods I have set up. Let’s try to do it in a way that would make sense to me.

So auth is under sys, maybe I can do this:

curl -s --header "X-Vault-Token: $(cat ~/.vault-token)" --request LIST "${VAULT_AH_ADD}/v1/sys/auth/" | jq

{"errors": ["1 error occurred:\n\t* unsupported path\n\n"]}

Maybe I try this instead?

curl -s --header "X-Vault-Token: $(cat ~/.vault-token)" --request LIST "${VAULT_AH_ADD}/v1/auth/" | jq

{"errors": ["no handler for route 'auth/'"]}

Nope, that doesn’t work either. So I go to the documentation where I find that this is the right way to list auth backends:

curl -s --header "X-Vault-Token: $(cat ~/.vault-token)" --request GET "${VAULT_AH_ADD}/v1/sys/auth" | jq

{
  "token/": {
    "accessor": "auth_token_blah",
    "config": {
      "default_lease_ttl": 0,
      "force_no_cache": false,
      "max_lease_ttl": 0,
      "token_type": "default-service"
    },
    ...

So to get a listing here, we use the GET method instead of the LIST method. That would be okay with me if it were consistent, but in other places, such as kv, you use the LIST method to list secrets.
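The inconsistency is easier to see side by side. The sketch below builds both kinds of request with Python’s standard library. The helper names vault_request and send are hypothetical, and the kv/metadata/database path in the comments assumes a KV2 mount named kv.

```python
import json
import urllib.request


def vault_request(addr: str, token: str, path: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request against the Vault HTTP API."""
    url = f"{addr.rstrip('/')}/v1/{path.lstrip('/')}"
    return urllib.request.Request(url, method=method, headers={"X-Vault-Token": token})


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Listing auth methods is a plain GET on sys/auth:
#   send(vault_request(addr, token, "sys/auth"))
# Listing KV secrets uses the custom LIST verb instead:
#   send(vault_request(addr, token, "kv/metadata/database", method="LIST"))
```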

The Confusing Key Value Version 2 Store

There is another thing about paths that confuses everyone I have worked with: the KV2 path structure.

KV2 allows you to keep multiple versions of a secret, in case you mess up. However, the way it was implemented makes it confusing to use at first.

For example, if you have a path kv/database/ where all your DB secrets reside and you create a policy like this:

path "kv/database/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}

The above will grant access to nothing, because as far as the API is concerned the path kv/database/ does not exist. Confused? So was I, and so was everyone else I have seen using Vault.

The reason is that KV2 inserts a “hidden” segment right after the mount point, and each one performs a different function: data (to read and write), metadata (to list), delete, undelete, destroy, etc. The secrets you see as kv/database/ actually live under kv/data/database/.

So to grant the correct permissions on the above path, your policy needs to look like this:

path "kv/+/database/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}

The plus is essentially a wildcard matching a single directory, so the above grants access to all secrets under database that reside under any single parent directory of the kv mount (data, metadata, and so on).
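To make the matching rule concrete, here is a rough Python approximation of how such a policy path is compared with a request path. This is my own simplification, not Vault’s actual implementation: it only handles + as a whole segment and a trailing *, and it ignores how Vault chooses between several matching policies. The function name is hypothetical.

```python
import re


def policy_path_matches(pattern: str, request_path: str) -> bool:
    """Approximate Vault policy matching for '+' segments and a trailing '*'."""
    prefix_glob = pattern.endswith("*")
    if prefix_glob:
        pattern = pattern[:-1]  # the trailing '*' means "anything after this prefix"
    parts = []
    for segment in pattern.split("/"):
        # '+' stands for exactly one non-empty path segment
        parts.append("[^/]+" if segment == "+" else re.escape(segment))
    regex = "/".join(parts) + (".*" if prefix_glob else "")
    return re.fullmatch(regex, request_path) is not None


# The KV2 API path for a secret includes the hidden "data" segment:
print(policy_path_matches("kv/database/*", "kv/data/database/postgres"))    # False
print(policy_path_matches("kv/+/database/*", "kv/data/database/postgres"))  # True
```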

These details are abstracted away when you use the UI or the CLI, but not when you use the API or a client library, so make sure to communicate them to anyone you give permissions to in Vault.
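For anyone calling the API directly, the translation the UI and CLI do for you looks roughly like the sketch below. The prefixes (data, metadata, delete, undelete, destroy) are the ones KV2 uses. The operation names and the function itself are my own labels for illustration.

```python
# Which hidden KV2 prefix each kind of operation goes through.
KV2_PREFIXES = {
    "read": "data",
    "write": "data",
    "list": "metadata",
    "delete": "delete",      # soft-delete specific versions
    "undelete": "undelete",
    "destroy": "destroy",    # permanently remove specific versions
}


def kv2_api_path(mount: str, secret_path: str, operation: str) -> str:
    """Translate a 'logical' KV2 secret path into the path the API expects."""
    prefix = KV2_PREFIXES[operation]
    return "/".join([mount.strip("/"), prefix, secret_path.strip("/")])


print(kv2_api_path("kv", "database/postgres", "read"))  # kv/data/database/postgres
print(kv2_api_path("kv", "database", "list"))           # kv/metadata/database
```

A policy has to be written against these translated paths, which is why the plain kv/database/* rule matches nothing.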

Inspecting audit logs can be tedious

One of the greatest things about Vault is that you can see who requested what secret, when and from where. This is particularly handy if you use Vault to generate dynamic secrets, because if anyone were to steal those generated credentials you could find out straight away which token generated them, and for which user or service, and take action accordingly.

Vault sends all sensitive information hashed to the logs, which is a great thing, but sometimes it can be a little heavy-handed with hashing, and there is no simple way (that I know of) to selectively disable hashing for certain values.

For example, when using dynamic secrets for DBs, Vault will generate a username and password and both are hashed in the logs. It would have been nice if only the password were hashed, but that’s not the case. So in order to search for your generated DB user v-oidc-fer-test-A1blahblah, you need to hash that value and then search for the hash.

The only way I know of to compute that hash is by using an API endpoint like this:

curl -s --header "X-Vault-Token: $(cat ~/.vault-token)" --request POST --data '{"input":"v-oidc-fer-test-A1blahblah"}' "${VAULT_AH_ADD}/v1/sys/audit-hash/file" | jq .hash

"hmac-sha256:d3626e5969..."
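The same lookup can be wrapped in a small helper. The sketch below uses only Python’s standard library. The function names are hypothetical, the audit device is assumed to be mounted at file as in the command above, and it assumes the response carries a top-level hash field, as the jq filter above implies.

```python
import json
import urllib.request


def audit_hash_request(addr: str, token: str, value: str, device: str = "file") -> urllib.request.Request:
    """Build the POST request that asks Vault to hash `value` for an audit device."""
    url = f"{addr.rstrip('/')}/v1/sys/audit-hash/{device}"
    body = json.dumps({"input": value}).encode("utf-8")
    headers = {"X-Vault-Token": token, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, method="POST", headers=headers)


def audit_hash(addr: str, token: str, value: str, device: str = "file") -> str:
    """Return the hash string to search for in the audit log."""
    with urllib.request.urlopen(audit_hash_request(addr, token, value, device)) as response:
        return json.load(response)["hash"]
```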

Even if scripted, this is a little tedious to do every time. It would be a little nicer if this function were at least available in the UI, but it isn’t because…

The User Interface is missing a lot of features

Some developers and sysadmins look at UIs with disdain because everything should be done in code and because the command line is more powerful. While I do agree with both of those arguments, I still think having a good UI is very useful.

First, the UI is useful when you are delegating access to someone who is not familiar with Vault, and overall it is just so much more convenient for updating your KV secrets, etc. In this sense the Vault UI is generally very good.

Second, it is very useful for viewing existing and available configuration, or for testing something you will later implement in code. This is where I find the Vault UI a bit underwhelming in terms of the configuration you can view and apply.

In terms of how many of Vault’s features and configuration options each interface exposes, the order is the following:

API → CLI → UI CLI → UI

Anything you want to do in Vault will be available via the API, followed by the CLI with fewer features, then the UI CLI (a drop-down menu in the UI that gives you access to a more limited CLI) and lastly the UI.

Most administrative tasks fall outside the scope of the Vault UI and you won’t be able to do them there. This is totally fine with me as I prefer to do all the configuration in Terraform. However, I do find UIs very useful to orientate myself, especially to find existing configuration options, check that they are correct and see what else I can add. It’s also quicker than going through the API documentation to find out how to fetch these config values.

For example, you cannot view the roles you have configured for the AWS auth method in the UI, and it is the same with many other auth methods, with a few exceptions. Instead you need to do that via the CLI or the API. You also cannot view the roles configured for DB dynamic secrets, etc.

Actually, as of the writing of this article I don’t seem to be able to display the AWS auth roles in any way.

Beware of Upgrades

I love that Vault has fast release cycles and that they are continuously improving it, but sadly I have encountered issues with upgrades far too often, even with minor releases.

Some of these bugs feel to me like they could easily have been avoided with better automated testing. For example, when they released version 1.5.1 they forgot to pack the UI with Vault… think about that for a second: how do you build without the UI and have all your tests pass?

And then in the next release, 1.5.2, they patched a security issue with the AWS auth method but forgot to include the allowed accepted headers for it, which also seems like something that could easily have been avoided with a comprehensive set of tests.

I don’t want to be too judgemental with HashiCorp here because I really appreciate the great job they are doing overall. I am simply saying that you as a user should be mindful of this when doing your upgrades, and perhaps apply them to a lower tier Vault and wait for at least a couple of days of real life usage before applying them to production.

Also, always check their upgrade guides before upgrading, and bear in mind that if you don’t wait a few days before upgrading even to a minor version you may be the first one reporting something that will end up going on this guide.

You can’t search secrets

If you have your secrets organised in a hierarchy and you do not remember where something is, tough luck. There isn’t an out of the box way to search for secrets in Vault. You can create a script for it, sure, but it will likely be slow, as you have to traverse all the “directories” and do a list in each of them. See this for reference:

https://github.com/hashicorp/vault/issues/5275
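A search script of that kind boils down to a recursive walk. The sketch below is one possible shape for it, under the assumption that list_keys is a function you supply that performs one LIST call for a given prefix and returns the key names, with sub-“directories” ending in a slash (which is how Vault reports them). Both function names are hypothetical.

```python
from typing import Callable, Iterator, List


def walk_secrets(list_keys: Callable[[str], List[str]], prefix: str = "") -> Iterator[str]:
    """Yield the full path of every secret below `prefix`, one LIST call per directory."""
    for key in list_keys(prefix):
        full_path = prefix + key
        if key.endswith("/"):
            # A trailing slash marks a "directory": descend into it.
            yield from walk_secrets(list_keys, full_path)
        else:
            yield full_path


def search_secrets(list_keys: Callable[[str], List[str]], term: str, prefix: str = "") -> List[str]:
    """Return every secret path that contains `term`."""
    return [path for path in walk_secrets(list_keys, prefix) if term in path]
```

Every directory costs one round trip to Vault, which is exactly why this approach gets slow on a large tree.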

Weird Issues?

I often encounter things with Vault that I don’t know whether to call issues or not. Some of them have been filed as bugs for months, or over a year, and not fixed. They seem to affect mostly edge cases or functionality that is not widely used, but overall it sometimes feels like you are using a product that is still in beta. For example, this issue I had with path permissioning with multiple plus signs, or this other issue with documented functionality not working as expected. None of these were really deal breakers, but they can be very annoying, especially when you spend a lot of time trying to figure out what you have done wrong when in reality it wasn’t your fault.

Conclusion

Vault is a great tool and it really has little competition in what it does. However, if I had to do it again and there was a managed cloud service that did the same, I would probably give Vault a miss, because the amount of work involved in setting it up and getting it to work nicely is quite substantial.

I have ex-colleagues who started trying to use Vault and gave up because of this complexity. The problem is that your alternatives are not great if you are in AWS. They are a little better in GCP, but Vault is still king for secret management, so you may still want to use it, because secret management is very, VERY important.
