Staff Advice - Common Mistakes
Session Management
Session management in this project is nothing like session management on the web. If that’s what you were thinking, you’re probably overcomplicating it.
To see multiple sessions in use, look at this snippet from the starter code’s client_test.go file:
aliceDesktop, err = client.InitUser("alice", defaultPassword)
aliceLaptop, err = client.GetUser("alice", defaultPassword)
err = aliceDesktop.StoreFile(aliceFile, []byte(contentOne))
invite, err := aliceLaptop.CreateInvitation(aliceFile, "bob")
The user struct returned by InitUser and GetUser is like a Python instance of a class. In other words, if you call aliceDesktop = InitUser(...) and aliceLaptop = GetUser(...), then these are two separate instances whose initial attributes are set at the time each instance is created.
If your client API modifies attributes within these instances (e.g. a list of files), one instance will not see changes made through the other. To compensate, you would have to “sync” the user struct with its latest state in Datastore at the beginning and end of every client API function call, so that each instance is up to date before any operation uses the attributes of the user.
That sync is not free: if we store any mutable data in the user struct, every Client API operation has to download and re-upload it, which may lead to a lot of Datastore bandwidth when appending to a file!
Consider the case where we have a million files. In order to support multiple user sessions, we have to sync the user struct, which costs O(N) bandwidth with N = 1,000,000. This is incredibly inefficient if we’re only appending one byte to one file! As such, we recommend thinking about how to flatten your data structures to solve this efficiency problem (see the above section on how to do so).
One final note about sessions: if you do everything cleanly/correctly, multiple user sessions should be supported without any additional work.
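To make the stale-session problem concrete, here is a minimal sketch, not the starter code’s API. It assumes a hypothetical in-memory map standing in for Datastore and two made-up user types: CachedUser keeps a mutable file list inside the struct, while StatelessUser keeps nothing mutable and looks each file up by a key derived from the username and filename. The key derivation is plain concatenation for illustration only; a real design would derive it from a secret.

```go
package main

import "fmt"

// datastore is a hypothetical in-memory stand-in for the shared Datastore.
var datastore = map[string][]byte{}

// CachedUser keeps a mutable file list inside the struct,
// so every session holds its own private snapshot.
type CachedUser struct {
	Username string
	Files    map[string]bool
}

// StatelessUser keeps no mutable data; files are looked up on demand.
type StatelessUser struct {
	Username string
}

// fileKey derives a per-file Datastore key (illustration only, not secure).
func fileKey(username, filename string) string {
	return username + "/" + filename
}

func (u *CachedUser) StoreFile(name string, content []byte) {
	u.Files[name] = true // only this session's copy learns about the file
	datastore[fileKey(u.Username, name)] = content
}

func (u *CachedUser) HasFile(name string) bool {
	return u.Files[name]
}

func (u *StatelessUser) StoreFile(name string, content []byte) {
	datastore[fileKey(u.Username, name)] = content
}

func (u *StatelessUser) HasFile(name string) bool {
	_, ok := datastore[fileKey(u.Username, name)]
	return ok
}

func main() {
	desktop := &CachedUser{Username: "alice", Files: map[string]bool{}}
	laptop := &CachedUser{Username: "alice", Files: map[string]bool{}}
	desktop.StoreFile("a.txt", []byte("hi"))
	fmt.Println("cached laptop sees a.txt:", laptop.HasFile("a.txt")) // false

	desktop2 := &StatelessUser{Username: "alice"}
	laptop2 := &StatelessUser{Username: "alice"}
	desktop2.StoreFile("b.txt", []byte("hi"))
	fmt.Println("stateless laptop sees b.txt:", laptop2.HasFile("b.txt")) // true
}
```

In the second design there is nothing to sync, which is one way to read the note that multiple sessions can work without additional effort.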
Append Efficiency
As a reminder, here’s what the append efficiency requirement states:
The bandwidth of the AppendToFile() operation MUST scale linearly with only the size of data being appended and the number of users the file is shared with, and nothing else.
Here are some things that append should NOT scale linearly with:
- The size of previous appends. If I make one append of a billion bytes and another append that’s one byte, the second append should be fast and shouldn’t have to download the previous append - or any previous file content, for that matter!
- The number of previous appends. If I make one billion appends to a file, and then make one more append, that very last append should be fast. I shouldn’t have to download metadata that scales with the number of appends to the file!
- The number of files a user has. Designs that store maps or lists of filenames in user structs typically fail this test.
This isn’t a comprehensive list of everything that append should NOT scale with: see if you can think of more things that may grow linearly that shouldn’t.
NOTE: The append tests are capped at a 10% penalty. If you’re overwhelmed with other components, we recommend focusing on those (e.g. sharing, revocation, etc.) and coming back to append when you’re confident in the rest of your implementation. Sharing and revoking are weighted much more heavily than append efficiency!
Sharing and Revocation
A few notes on common design patterns we saw:
- Non-revoked users should not have to receive the file again.
- Consider a case where A shares with B and C, then revokes access from B. C should still be able to access the file (load, store, append, etc.) without having to call AcceptInvitation() again.
- If you’re updating any metadata while revoking, you’ll have to design your revoke function so that non-revoked users receive the updated metadata without calling AcceptInvitation() again.
- When designing your revocation scheme, pay close attention to the critical design requirement around revocations: revocations are only defined in the case where the owner is revoking access from a direct (top-level) child. All other revocations are undefined, and you don’t need to worry about them.
- If you’re stuck on revocation, think about “permission groups.” Can you assign users into groups based on where they lie on a sharing tree?
- Revoked users can circumvent the API to regain access.
- We noticed some groups simply set a flag to support revoke, and checked the flag in their API functions. This is not enough, because a revoked user does not have to go through your API.
- Note the revoked user can always record any information they have (or had) access to, which includes the location of the file and the keys used to decrypt the file before their access is revoked.
- The revoked user can then use this information to directly access Datastore and regain access to the file.
- The revoked user should not be able to learn ANYTHING about the file after their access is revoked. For example, they shouldn’t even be able to detect when the file contents are updated!
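The notes above can be sketched as follows. This is only one possible shape for a revocation scheme, under stated assumptions: the datastore map, the File struct, and the share, loadVia, and revoke helpers are all hypothetical names, and each direct child gets its own pointer record (a rough take on “permission groups”). Encryption is omitted, so the sketch only shows the relocation and re-pointing step. A real design would also re-encrypt under a fresh key at the marked line.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// datastore is a hypothetical in-memory stand-in for the shared Datastore.
var datastore = map[string][]byte{}

// randomLocation returns a fresh, unguessable Datastore key.
func randomLocation() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// File is the owner's record: where the content currently lives, plus the
// location of one pointer record per direct child.
type File struct {
	Location string
	Pointers map[string]string // direct child -> pointer record location
}

// share creates a pointer record for a direct child and returns its location.
func share(f *File, child string) string {
	p := randomLocation()
	datastore[p] = []byte(f.Location)
	f.Pointers[child] = p
	return p
}

// loadVia follows a pointer record to the current content.
func loadVia(pointer string) ([]byte, bool) {
	loc, ok := datastore[pointer]
	if !ok {
		return nil, false
	}
	content, ok := datastore[string(loc)]
	return content, ok
}

// revoke moves the content, deletes the revoked child's pointer record,
// and re-points every remaining direct child at the new location.
func revoke(f *File, child string) {
	content := datastore[f.Location]
	delete(datastore, f.Location)
	delete(datastore, f.Pointers[child])
	delete(f.Pointers, child)
	f.Location = randomLocation()
	datastore[f.Location] = content // a real design would re-encrypt with a new key here
	for _, p := range f.Pointers {
		datastore[p] = []byte(f.Location)
	}
}

func main() {
	f := &File{Location: randomLocation(), Pointers: map[string]string{}}
	datastore[f.Location] = []byte("contents")
	bPtr := share(f, "B")
	cPtr := share(f, "C")
	recorded := f.Location // B writes down the raw location before being revoked

	revoke(f, "B")

	_, cOK := loadVia(cPtr)
	_, bOK := loadVia(bPtr)
	_, rawOK := datastore[recorded]
	fmt.Println("C can still load:", cOK)
	fmt.Println("B can load via pointer:", bOK)
	fmt.Println("B's recorded location still exists:", rawOK)
}
```

C keeps working without another AcceptInvitation()-style step because its pointer record was updated in place, while everything B could have recorded now points at nothing.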