This guide covers some of the key concepts in data architecture and best practices for structuring the JSON data in your Firebase Realtime Database.
Building a properly structured database requires quite a bit of forethought. Most importantly, you need to plan for how data is going to be saved and later retrieved to make that process as easy as possible.
How data is structured: it's a JSON tree
All Firebase Realtime Database data is stored as JSON objects. You can think of the database as a cloud-hosted JSON tree. Unlike a SQL database, there are no tables or records. When you add data to the JSON tree, it becomes a node in the existing JSON structure with an associated key. You can provide your own keys, such as user IDs or semantic names, or they can be provided for you if you are using "Append List" action.
For example, consider a chat application that allows users to store a basic profile and contact list. A typical user profile is located at a path, such as /users/$user_id. The user Rashmi Chaudhuri might have a database entry that looks something like this:
Best practices for data structure
Avoid nesting data
Because the Firebase Realtime Database allows nesting data up to 32 levels deep, you might be tempted to think that this should be the default structure. However, when you fetch data at a location in your database, you also retrieve all of its child nodes. In addition, when you grant someone read or write access at a node in your database, you also grant them access to all data under that node. Therefore, in practice, it's best to keep your data structure as flat as possible.
For an example of why nested data is bad, consider the following multiply-nested structure:
With this nested design, iterating through the data becomes problematic. For example, listing the titles of chat conversations requires the entire chats tree, including all members and messages, to be downloaded to the client.
Flatten data structures
If the data is instead split into separate paths, also called denormalization, it can be efficiently downloaded in separate calls, as it is needed. Consider this flattened structure:
It's now possible to iterate through the list of rooms by downloading only a few bytes per conversation, quickly fetching metadata for listing or displaying rooms in a UI. Messages can be fetched separately and displayed as they arrive, allowing the UI to stay responsive and fast.
Create data that scales
When building apps, it's often better to download a subset of a list. This is particularly common if the list contains thousands of records. When this relationship is static and one-directional, you can simply nest the child objects under the parent.
Sometimes, this relationship is more dynamic, or it may be necessary to denormalize this data. Many times you can denormalize the data by using a query to retrieve a subset of the data, as discussed in Retrieve Data.
But even this may be insufficient. Consider, for example, a two-way relationship between users and groups. Users can belong to a group, and groups comprise a list of users. When it comes time to decide which groups a user belongs to, things get complicated.
What's needed is an elegant way to list the groups a user belongs to and fetch only data for those groups. An index of groups can help a great deal here:
You might notice that this duplicates some data by storing the relationship under both Ada's record and under the group. Now alovelace is indexed under a group, and techpioneers is listed in Ada's profile. So to delete Ada from the group, it has to be updated in two places.
This is a necessary redundancy for two-way relationships. It allows you to quickly and efficiently fetch Ada's memberships, even when the list of users or groups scales into the millions or when Realtime Database security rules prevent access to some of the records.
This approach, inverting the data by listing the IDs as keys and setting the value to 1, makes checking for a key as simple as reading /users/$user_id/groups/$group_id and checking if it exists. The index is faster and a good deal more efficient than querying or scanning the data.