I have lots of old SQL dumps stored in backups. I wanted to find a way to check to see if I was storing the same files over and over again. I did not want to check them line by line, because it would take too long. I remembered that message digests are a way to check to see if a file has been tampered. So, if I create a message digest of two files that I think are the same, a matching digest should (ideally) prove that they are the same.
By the way, what is a message digest? It’s “ … a cryptographic hash function containing a string of digits created by a one-way hashing formula”. ( https://www.techopedia.com/definition/4024/message-digest ). In other words, it is the result of sending a file or string through a one-way function and outputting the result. Ideally, it can be used to check to see if a file has been modified. If two files are related, but slightly different, they will generate two different message digests.
Back to digests. I like the idea of taking the sql dumps and generating a message digest. However, I noticed that the SQL dumps usually have a timestamp showing when the dump was created listed inside the SQL comments. This will automatically create a different digest. Can I remove the SQL comments and create a digest from that?
It turns out that I can. It works nicely.
> grep –regexp=“^–.*” <path-to-sql-dump>
shows all the SQL comments in the file
> grep –regexp=“^[^–.*]” <path-to-sql-dump>
shows everything but the SQL comments. Pipe that result into a digest function
> grep –regexp=“^[^–.*]” <path-to-sql-dump> | md5
shows the resulting digest using md5. Similarly, using “openssl sha1”, “shasum”, “shasum -a 512”, “shasum -a 512224” and “shasum -a 512256” will generate different digests, which can all be used to compare SQL commands in a SQL dump file.
I’m a little sad that “shasum” did not work completely. It adds the file name after the digest and hyphen, allowing storage of the digests. However, since the file is piped into the command, there is no file name to add to the end of the file. I’m sure there’s a way to add it to a file, though. Maybe something like?
> grep –regexp=“^[^–.*]” <path-to-sql-dump> | shasum; echo !!:2
then search for ‘- line-break’ and replace with ‘- ‘. … Maybe, maybe …