Firestore (and Firebase) is a really great solution for many different use cases. Everything that does so much gets complicated very quickly, even if it looks simple on the surface.
Here are my personal favorites among the problems that can bite you badly, based on more than ten mobile applications built in Flutter and React Native and on the code audits we have carried out at LeanCode.
This series of articles comprehensively describes the pros and cons of using Firestore as the backend for your next mobile application. In this series, we will show you that making this decision is not a simple process, and you need to analyze your app from multiple perspectives.
Would you like to learn from our experience what can go wrong with Firestore and Firebase, and what their biggest disadvantages and limitations are? Read on to find out!
Being able to access the data directly from the end-user devices creates some non-trivial problems. Typically, your backend system would act as an intermediary that handles all the cross-cutting concerns. Here there isn’t one. Everything that would normally be done there is now the responsibility of Firestore. And this is a bad idea.
Of course, there are cases when that is not a problem. For example, you can “easily” shield yourself from bad actors by just ignoring/sanitizing malformed data on the client. There are cases when you don’t need any authorization besides simple “these users can only write these documents and read from those.” I would even say that most projects fall under this category in the beginning. Then the complexity creeps in, and you find yourself deep in the broken Firestore rules, with validation code everywhere in your app. And no security.
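As a rough illustration of the "ignore/sanitize malformed data on the client" approach, here is a minimal sketch. The `Activity` shape and the field names `userId` and `distanceM` are made up for this example; the point is only that every client has to re-validate whatever it reads, because anyone with write access could have stored anything.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Activity:
    # Hypothetical model, not taken from any real app.
    user_id: str
    distance_m: float


def parse_activity(raw: Any) -> Optional[Activity]:
    """Return a validated Activity, or None if the document is malformed."""
    if not isinstance(raw, dict):
        return None
    user_id = raw.get("userId")
    distance = raw.get("distanceM")
    if not isinstance(user_id, str) or not user_id:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(distance, bool) or not isinstance(distance, (int, float)):
        return None
    if distance < 0:
        return None
    return Activity(user_id=user_id, distance_m=float(distance))


def parse_all(raw_docs: List[Any]) -> List[Activity]:
    """Keep only well-formed documents and silently drop the rest."""
    parsed = (parse_activity(raw) for raw in raw_docs)
    return [activity for activity in parsed if activity is not None]
```

This works while the model is small. Every new field and every new client platform needs the same checks again, which is how the validation code ends up "everywhere in your app."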
Uh oh. I think this is the thing that made me (and well… my clients) scream in agony. Firestore pricing looks normal — you pay for ingress/egress, storage & operations. That’s understandable. What makes it hard is expense monitoring. Or, to be more specific — lack of it.
Firestore doesn’t give you a way to check how much you use. You can see how many ops you’ve already used, but when it comes to storage and ingress/egress, you’re pretty much left to yourself. Firestore doesn’t give you anything meaningful there. All you have is a single “storage used” on your GCP bill. It doesn’t tell you how much data you really have or how much new data you’re creating, and it only means how much they’ve billed you—nothing else. You can try to derive the changes from it, but that won’t be anywhere near “accurate.”
Theoretically, the documentation tells you how to calculate the storage you use (or will use). You can calculate everything yourself, but that requires you to download every single document in the database or do the calculation up-front when uploading the document for the first time. It’s also painfully complicated (for such a simply stated problem), terribly slow, and will cost you money just to calculate how much you will pay.
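To give a feeling for what "calculate everything yourself" means, below is a sketch of the per-document calculation. It follows the rules from Firestore's storage size documentation as I understand them (string size is the UTF-8 byte length plus 1, a document name adds 16 bytes, a document adds 32 bytes); treat the constants as assumptions and check them against the current documentation. Nested maps, timestamps, references, and integer document IDs are deliberately left out.

```python
def string_size(text: str) -> int:
    """UTF-8 encoded length plus 1 byte."""
    return len(text.encode("utf-8")) + 1


def document_name_size(path: str) -> int:
    """Sum of the string sizes of every path segment plus 16 bytes."""
    return sum(string_size(segment) for segment in path.split("/")) + 16


def value_size(value) -> int:
    """Size of a single field value for the types this sketch supports."""
    if value is None:
        return 1
    if isinstance(value, bool):  # must be checked before int
        return 1
    if isinstance(value, (int, float)):
        return 8
    if isinstance(value, str):
        return string_size(value)
    if isinstance(value, bytes):
        return len(value)
    if isinstance(value, list):
        return sum(value_size(item) for item in value)
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def document_size(path: str, fields: dict) -> int:
    """Document name size + field names + field values + 32 bytes."""
    field_bytes = sum(string_size(k) + value_size(v) for k, v in fields.items())
    return document_name_size(path) + field_bytes + 32


task = {
    "type": "Personal",
    "done": False,
    "priority": 1,
    "description": "Learn Cloud Firestore",
}
print(document_size("users/jeff/tasks/my_task_id", task))  # prints 147
```

Note that this covers only the document itself. Index entries are billed on top of it and have their own size rules, so the real number is higher.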
What counts as storage used? Well, everything. Documents, collections (i.e., paths, as a collection isn’t really a thing when we’re talking about storage space), indices, you name it. You pay for every byte that you create and for every byte that Firestore creates for you. And it creates a lot.
By default, Firestore indexes all of the fields in your documents. All of them. You have to disable unwanted indices explicitly, and you can only create 200 exemption rules (as of 21–09–2020). This makes it extremely important to model your data carefully because one wrong index can result in a tremendous amount of unused data. I am guilty of overlooking this. In Activy, where we use Firestore to sync activities and rankings, we generated almost 24GB of indices for every 1GB of data. We haven’t used any of them.
So, when it comes to the pricing, you have to be really, really, really careful, even for simple cases. As I said, it only takes one bad actor to pollute your data.
Google does not make any promises regarding latency. That alone might be reason enough to reject Firestore as your database. Without any assurance, you can’t design your product well. Even if the latency were high, as long as you knew it, you could work around it. For example, you could hide the latency by starting the request earlier in the process or by doing it entirely in the background. This would increase complexity (which Firestore tries to avoid) but would be doable. Without a known RTT (round-trip time, i.e., latency times 2), you can only measure and hope it will be consistent.
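Here is a minimal sketch of what "starting the request earlier" could look like. Both coroutines are stand-ins invented for this example: `slow_upload` simulates a remote write with a fixed delay and `user_reviews_summary` simulates the time a user spends on a screen. No Firestore SDK is involved.

```python
import asyncio
import time


async def slow_upload(payload: str, latency_s: float) -> str:
    """Stand-in for a remote write (hypothetical, just sleeps)."""
    await asyncio.sleep(latency_s)
    return f"stored:{payload}"


async def user_reviews_summary(duration_s: float) -> None:
    """Stand-in for the time the user spends before moving on."""
    await asyncio.sleep(duration_s)


async def save_with_early_start(payload: str):
    start = time.perf_counter()
    # Start the request as soon as the data is known...
    upload = asyncio.create_task(slow_upload(payload, 0.2))
    # ...and let it run while the user is busy with something else.
    await user_reviews_summary(0.2)
    result = await upload
    return result, time.perf_counter() - start


if __name__ == "__main__":
    result, elapsed = asyncio.run(save_with_early_start("activity-1"))
    print(result, f"{elapsed:.2f}s")
```

The two 0.2 s waits overlap instead of adding up. The trick only helps when you can budget for the latency, which is exactly what is missing when there is no guaranteed upper bound.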
And the measurements aren’t that good. Over one second for a small query is a really long time. This mostly coincides with our benchmarks in Activy, which uses Firestore quite extensively. It works more or less the same as in the article, i.e.:
Uploading the document (to a known path) takes more than 300ms in Activy. Waiting for processing and sending notification takes another couple hundred ms (~200ms in our case). All of this gives 1s at best. When comparing that to a simple WebSockets server running on the smallest GCP instance (as per the article), Firestore looks terrible. Even if you add message processing, some small database, and such, you won’t get more than, say, 500ms RTT on the smallest instances possible.
Accepting this kind of latency might be feasible for some applications, especially in their early stages, but using Firestore for near real-time communication is shooting yourself in the foot. You wouldn’t be fast even if you did your best.
Firestore, even though it is somewhat powerful, is rather limited compared to traditional databases (be it an RDBMS or another document database). Combining the basic queries with the index-all-by-default approach gives you a great starting point, but you need to model your data for searchability upfront. There are several limitations that make using Firestore painful. Some of them (e.g., the limits of OR or array-contains/array-contains-any) are not that awkward, but the first limitation, namely that you can do range queries only on a single field, is irritating. It’s pretty common to do “get me all transactions from this date range that are valued no less than X,” and this single rule disallows that. Also, at the time of writing, Firestore does not support “negated” queries (like not-in or plain old !=), making common queries unrepresentable.
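Under the single-range-field restriction, the usual workaround for the "date range plus minimum value" query is to let the database handle one range and filter the other one in memory. The sketch below only simulates that: `query_by_date_range` is a hypothetical stand-in for the one range filter the database could evaluate, and the `Transaction` type is made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Transaction:
    day: date
    amount: int


def query_by_date_range(store: List[Transaction], start: date, end: date):
    """Simulates the single range filter evaluated by the database."""
    return [t for t in store if start <= t.day <= end]


def transactions_in_range_at_least(store, start, end, min_amount):
    # Every document in the date range is downloaded (and billed as a read)...
    candidates = query_by_date_range(store, start, end)
    # ...and the second range condition is applied on the client.
    return [t for t in candidates if t.amount >= min_amount]
```

The cost is the important part: you pay for, and wait for, every candidate document, even if the second condition throws most of them away.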
Document databases tend to have limited processing capabilities. That means you can’t compute values based on query results directly in the database as you can in, e.g., SQL, and you can’t do joins either. To overcome this, Google introduced the MapReduce approach that somewhat mitigates this issue. MapReduce became the de facto standard for document databases (even MongoDB supports it!). Unfortunately, Firestore does not have anything like that. You can implement it yourself using so-called aggregation queries, but that solution is really far from perfect. You do have control over the process, but Firestore is unable to optimize any of this. You effectively have to do optimistically concurrent transactions to update a collection that works as a map-reduce index. This can work for simple cases where your source collections aren’t modified frequently, but under load your retries will eat all of your performance.
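To show why retries hurt, here is a toy model of an optimistically concurrent update of an aggregate document. Everything in it is an assumption made for illustration: `InMemoryDoc` is not a Firestore API, it just keeps a value with a version number and rejects writes based on a stale version. The `before_commit` hook exists only so that a competing writer can be simulated.

```python
class Conflict(Exception):
    """Raised when the document changed between read and write."""


class InMemoryDoc:
    def __init__(self, value: int) -> None:
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def compare_and_set(self, new_value: int, expected_version: int) -> None:
        if self.version != expected_version:
            raise Conflict()
        self.value = new_value
        self.version += 1


def add_to_total(doc, amount, max_attempts=5, before_commit=None) -> int:
    """Read-modify-write with retries; returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        total, version = doc.read()
        if before_commit is not None:
            before_commit(attempt)  # hook for simulating other writers
        try:
            doc.compare_and_set(total + amount, version)
            return attempt
        except Conflict:
            continue  # someone else won the race: re-read and try again
    raise Conflict(f"gave up after {max_attempts} attempts")
```

Each failed attempt repeats the read and the write. With many writers hitting the same aggregate document, most of the work is wasted on lost races.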
You develop version 1 of your mobile or web application with Firestore. Everything is great. Everything syncs correctly, everything works fast, the development was a pleasure. Your userbase is getting bigger, you become famous, and money starts flowing. To make you even richer, you begin to think about new features. And you decide to implement them.
This is where the pain starts. You’ve developed your app, you’ve gained a userbase, all of your business logic is in the app, running directly on end-user devices, and you need to migrate the data. Migrations are always tricky (esp. in NoSQL databases), but here, you’re in an even worse situation. You need to migrate some data and build your app to handle multiple versions of said data.
Consider the situation where you need to modify the model. It doesn’t really matter whether it is just a quick fix that adds a field or a complete revamp of the data, although adding things is much easier to migrate. If you have to do it once, it is doable but needs to be accounted for upfront: the previous version of your app must have been built so that it does not crash when the model changes slightly. The new version needs to handle the data from the old version (so, migrate it) and save the new data in a way that won’t break the old app.
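A minimal sketch of that two-way compatibility, assuming a made-up change where version 2 of the model splits a `name` field into `firstName` and `lastName`. The `schemaVersion` field and both function names are hypothetical. The new app upgrades old documents when reading and keeps writing the old field so that version 1 clients still find what they expect.

```python
CURRENT_SCHEMA = 2


def upgrade(doc: dict) -> dict:
    """Bring a document of any known version up to the current model."""
    version = doc.get("schemaVersion", 1)  # v1 documents carry no version
    upgraded = dict(doc)  # never mutate the caller's copy
    if version < 2:
        first, _, last = doc.get("name", "").partition(" ")
        upgraded["firstName"] = first
        upgraded["lastName"] = last
        upgraded["schemaVersion"] = 2
    return upgraded


def to_storage(doc: dict) -> dict:
    """Write v2 fields while keeping the v1 'name' field for old clients."""
    stored = dict(doc)
    stored["name"] = f'{doc["firstName"]} {doc["lastName"]}'.strip()
    stored["schemaVersion"] = CURRENT_SCHEMA
    return stored
```

Even in this toy case there are now two representations of the same fact in every document. An old client that edits only `name` silently puts them out of sync, and that is the kind of bug you have to plan for.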
Why not just abandon the old version, you might ask? Because you can’t force your users to upgrade the app. Some folks will plainly refuse to upgrade, but even if you can ignore them, the upgrade isn’t instantaneous, and you can’t control it. Extensive upgrades aren’t easy with standard backend systems either, but there at least you control everything, and it is up to you when the upgrade will (or will not) be finished.
It gets even trickier when you need to migrate data more frequently. Having to handle three or even four versions will result in much grief and many, many bugs. And if you deploy a rogue version of the app with a critical error… You can’t really take it back fast enough. It might do immense damage before you can revert it.
Because Firestore querying is limited and there is no map-reduce, one data model won’t handle all the cases. You will end up either downloading much more data than you need and doing everything in memory, or duplicating the data. You can leverage Cloud Functions here to make it automatic, but you will need to handle all the CRUD actions yourself.
This also means that there will be some delay between adding/modifying the document and it being propagated to other collections. It is worth designing your app so that it handles eventual consistency well, but sometimes that is just overkill. Or worse, your business (or regulatory) requirements forbid you from being eventually consistent.
If you go with data duplication (and do the mapping yourself), you will end up with multiple separate copies of the same data, just structured differently. This not only increases the storage cost but also significantly increases the complexity. You now have not just a single document to version but also multiple related documents that need to be upgraded with care (and possibly atomically, which might make the process even harder).
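To make "handle all the CRUD actions yourself" concrete, here is a toy fan-out that keeps two duplicated views in sync with a source collection. It runs purely in memory. The view names `by_user` and `by_day` and the document fields are invented for the example, and `before`/`after` mimic the old and new state that a change trigger would hand you.

```python
from typing import Optional


def apply_change(views: dict, doc_id: str,
                 before: Optional[dict], after: Optional[dict]) -> None:
    """Propagate one create/update/delete to every duplicated view."""
    by_user = views.setdefault("by_user", {})
    by_day = views.setdefault("by_day", {})
    if before is not None:
        # Update or delete: remove the stale copies first.
        by_user.get(before["userId"], {}).pop(doc_id, None)
        by_day.get(before["day"], {}).pop(doc_id, None)
    if after is not None:
        # Create or update: write the fresh copies.
        by_user.setdefault(after["userId"], {})[doc_id] = after
        by_day.setdefault(after["day"], {})[doc_id] = after
```

In this sketch all views change in one function call. In a real setup each copy would presumably be a separate write that happens some time after the source document changed, and that gap is where the eventual consistency described above comes from.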
This paragraph will be a short one.
There are no backups in Firestore. You can export data to a GCS bucket, but that isn’t really a backup; it’s just an export. Firestore doesn’t ensure consistency, and the exports are mostly manual (you can automate them, but you have to write the script yourself). The timings aren’t really predictable.
It’s just not a backup.
What matters the most is to be aware of those limitations before you design your system. Those problems are not always red flags, and there are certain business cases where using it as your backend might still be a very reasonable decision. You have to weigh all the pros and cons and decide for yourself.
In the next article from our series, we will describe how to use the benefits of Firestore and Firebase the right way and which steps you should consider when you have already found yourself in the traps described above.
Follow the link to read Why Firestore? - 6 things you need to know before using Firestore.