Database size not decreasing after deleting huge amount of files

database
february_2019

#1

I made some extensive cleanup of my Google Drive and removed hundreds of thousands of files. However, my database file inside the .config/Insync directory didn’t change a bit in size. It was still around 1GB(!).

I poked around in it and noticed that the table gd_base_files has a flag called removed, which was set for the files I had deleted. I then removed all entries in other tables that referenced these files’ ids, followed by the flagged entries themselves. After vacuuming the database, it is now only 75MB and Insync still works just fine.
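For anyone who wants to try the same cleanup, here is a rough sketch of the steps, assuming the database is a SQLite file. Only the table gd_base_files and its removed flag come from what I saw; the id column name and the list of referencing tables are placeholders you would have to look up in your own schema. Stop Insync and back up the file first.

```python
import sqlite3


def purge_removed(db_path, child_tables):
    """Delete rows flagged as removed, plus rows that reference them, then VACUUM.

    child_tables is a list of (table, column) pairs naming the tables that
    reference gd_base_files ids. These names are assumptions, not the real
    Insync schema.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # run all deletes in a single transaction
            for table, column in child_tables:
                conn.execute(
                    f"DELETE FROM {table} WHERE {column} IN "
                    "(SELECT id FROM gd_base_files WHERE removed = 1)"
                )
            deleted = conn.execute(
                "DELETE FROM gd_base_files WHERE removed = 1"
            ).rowcount
        # VACUUM rebuilds the file so the freed pages are returned to the
        # filesystem; it has to run outside a transaction.
        conn.execute("VACUUM")
    finally:
        conn.close()
    return deleted
```

Deleting rows alone only marks pages as free inside the file, which is why the file size does not shrink until the VACUUM step runs.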

But how is this supposed to work without human intervention? Will the database stay at 1GB and only grow further? Will the records of those long deleted files still be in that database? Why?


#2

I’ve noticed this too.

Speculation:
Preserving metadata of deleted files can help future optimizations.

Example:
User moves lots of files out of the Insync folder (potentially into the recycle bin) on the local computer. Insync syncs this change and moves the corresponding files on GD to the trash.
Later, the user decides to move these files back into Insync. By preserving metadata, Insync can tell (by MD5 and inode) that these are the same files the user removed earlier, restore them server-side from GD’s trash, and move them to the correct new location, without uploading any significant amount of data.

Of course, Insync doesn’t currently use this sort of optimization. But Dropbox, for example, does have this optimization. It makes sense for Insync to preserve such metadata in the database if they plan to do this optimization in the future.
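To make the restore-instead-of-upload idea concrete, here is a minimal sketch of the matching step. It is purely illustrative and not how Insync or Dropbox actually implement it: TrashedRecord, plan_sync and the file ids are made-up names, and it matches on MD5 plus size only, leaving out the inode check because that is platform-specific.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TrashedRecord:
    file_id: str  # hypothetical remote id kept in the local database
    md5: str
    size: int


def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def plan_sync(local_files, trashed):
    """Decide, per re-added local file, whether to restore or upload.

    local_files maps a path to its content; trashed is the preserved
    metadata of files that were moved to the remote trash earlier.
    """
    by_hash = {}
    for rec in trashed:
        by_hash.setdefault((rec.md5, rec.size), []).append(rec)

    plan = []
    for path, data in sorted(local_files.items()):
        candidates = by_hash.get((md5_of(data), len(data)))
        if candidates:
            # Each trashed record can be restored only once.
            rec = candidates.pop()
            plan.append(("restore", path, rec.file_id))
        else:
            plan.append(("upload", path, None))
    return plan
```

A file whose content matches a trashed record gets a "restore" action carrying the old id; everything else falls back to a normal upload.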

Another possibility, of course, is that they simply haven’t figured out a safe way to remove such metadata. Sync logic can get very complex.


#3

Thing is: Google keeps files in the Trash for 30 days. In the database however I’ve found records of files deleted a much longer time ago. This is also a data privacy issue if one can track down files I’ve long deleted …


#4

True, it should be totally safe to delete the metadata of files that have been permanently deleted from Google Drive, and Insync doesn’t do that either.

However…

“Google keeps files in the Trash for 30 days”

This isn’t true. Dropbox, OneDrive, Gmail, Google Photos, etc. all do this, but not Google Drive.
If you haven’t cleaned out your trash for a long time, you’d probably be surprised at how much storage the trashed files are taking up.
I even wrote a Python script just to help me clean Google Drive’s trash:


#5

Well, then Insync should periodically check which items are still in the trash and which have been purged from it, and remove the purged ones from the database. (And maybe run a VACUUM once in a while.)
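Something along these lines is what I have in mind, as a sketch only. It assumes a SQLite database with the gd_base_files table and its removed flag; the id column and the ids_still_in_trash input (the set of ids the server still reports as trashed, however the client obtains it) are assumptions for illustration.

```python
import sqlite3


def prune_purged(conn, ids_still_in_trash, vacuum=True):
    """Drop records flagged as removed that are no longer in the remote trash.

    Records whose ids are still in the trash are kept, so they could still
    be restored later. Returns the number of records deleted.
    """
    flagged = [
        row[0]
        for row in conn.execute("SELECT id FROM gd_base_files WHERE removed = 1")
    ]
    purged = [(fid,) for fid in flagged if fid not in ids_still_in_trash]

    with conn:  # commit the deletes as one transaction
        conn.executemany("DELETE FROM gd_base_files WHERE id = ?", purged)

    if vacuum and purged:
        # Only rebuild the file when something was actually removed.
        conn.execute("VACUUM")
    return len(purged)
```

Running the VACUUM only when rows were deleted keeps the periodic check cheap, since rebuilding a 1GB file on every pass would be wasteful.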