File Integrity Checking

Hi @marte. Thanks for the feedback.

General comment:
I am a little concerned by your responses, however (or maybe I’m just misunderstanding :slight_smile: ). So far I have not received a clear answer from anyone, in a simple format, stating categorically either “Yes, we check file integrity for each file after it’s synced, using mechanism XX” or “No, we don’t”.
If we as users are to use your product with full trust and accept its results without hesitation, then we should have a guarantee that it’s doing its job 100%. Now, correct me if I’m wrong, but a file syncing tool has two jobs as far as I am concerned: 1. provide a channel to move the content of a file from side A to side B, and 2. check that both sides match after the move has completed.
Making a statement like “File corruption due to network transfer shouldn’t be a concern” is very dangerous, as there are lots of things that can go wrong with a network transfer over a WAN link. Sure, you are using TLS to secure the data packets and confirm the integrity of each packet as it goes across, but that in itself still does not guarantee the integrity of a file as a whole (especially with large files synced over WAN links that may experience temporary breaks at any time). One can “assume” that the file will be okay, but unless the file as a whole is checked in some way (like a hash check) on both sides after a sync, it’s impossible to guarantee its integrity.
As more users are totally replacing their local file servers with Google Drive, it’s imperative that we know, when using your tool to help us sync content from local to cloud, that there is no way whatsoever for us to corrupt any of our valuable data in the process.
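
To make clear what I mean by a whole-file hash check, here is a minimal sketch in Python. It is only my illustration, not a claim about how your tool works: `file_md5` and `matches_expected` are names I made up, and the expected checksum is assumed to come from whatever the remote side reports for the file after upload (for Google Drive, I would assume something like the `md5Checksum` value the Drive API returns for a file).

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare the local file's MD5 with a checksum reported by the other side.

    expected_md5 is a hypothetical input: wherever the remote checksum comes from,
    it is normalised here (whitespace stripped, lowercased) before comparing.
    """
    return file_md5(path) == expected_md5.strip().lower()
```

If `matches_expected` returned `False` after a sync, I would expect the tool to at least flag the file, and ideally transfer it again, rather than only writing a log entry.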

Question responses:

  1. So what happens if it’s not in the same upload session and it times out? Does the sync for that file start again? What happens to the partially synced file on the target side? Does it get removed, or is it just left there before the new sync starts? Who cleans up all these partials over time?
  2. If you are logging MD5 mismatches, where is this logged? In your logs.db? So other than logging it, you are basically not taking any corrective action at all? What if there is a mismatch not because of an API metadata failure but because of a genuine file difference? If I were to search logs.db for MD5 mismatches, what field or parameter value should I be searching for?

Werner