Message digests OR I should have known that

I have lots of old SQL dumps stored in backups. I wanted a way to check whether I was storing the same files over and over, without comparing them line by line, because that would take too long. I remembered that message digests are a way to check whether a file has been tampered with. So, if I create message digests of two files that I think are the same, matching digests should (ideally) prove that they are identical.

By the way, what is a message digest? It’s “… a cryptographic hash function containing a string of digits created by a one-way hashing formula” ( https://www.techopedia.com/definition/4024/message-digest ). In other words, it is the result of running a file or string through a one-way function and recording the output. Ideally, it can be used to check whether a file has been modified. If two files are related but slightly different, they will generate two different message digests.

Back to digests. I like the idea of taking the SQL dumps and generating a message digest of each. However, I noticed that the dumps usually carry a timestamp, inside the SQL comments, showing when the dump was created. That alone will produce a different digest every time. Can I strip the SQL comments and create a digest from what remains?

It turns out that I can. It works nicely.

> grep --regexp="^--.*" <path-to-sql-dump>

shows all the SQL comments in the file

> grep --regexp="^[^-.*]" <path-to-sql-dump>

shows everything but the SQL comments. Pipe that result into a digest function:

> grep --regexp="^[^-.*]" <path-to-sql-dump> | md5

shows the resulting digest using md5. Similarly, “openssl sha1”, “shasum”, “shasum -a 512”, “shasum -a 512224” and “shasum -a 512256” will each generate a different digest, any of which can be used to compare the SQL statements in dump files.

I’m a little sad that “shasum” did not work completely. Normally it prints the file name after the digest, which makes the digests convenient to store and check later. However, since the file is piped into the command, there is no file name to print, and shasum outputs a hyphen instead. I’m sure there’s a way to add the name back, though. Maybe something like?

> grep --regexp="^[^-.*]" <path-to-sql-dump> | shasum; echo !!:2

then search for ‘- line-break’ and replace with ‘- ‘. … Maybe, maybe …


My adventure with AWS, part 1

I decided to try out Amazon AWS to find out what it could do for me. As it turns out, I also had a project (WorldWarIICasualtyProject.org, or ww2cp for short) that I wanted to host cheaply. Amazon AWS promises to do exactly that.

First things first, I had to create an account. It’s not hard, provided you have a credit card handy. Amazon tests the card to see if it has money behind it (in case it’s a debit card). I chose a debit card first because there are no default brakes on AWS spending. (If the debit card runs out of money, I assume AWS stops the services, but am I really going to test that?) You have to protect your accounts to prevent others from spending your money, and you also have to watch what you decide to activate, because there are no automatic shutdowns if you spend too much. Live dangerously? Not really. Just figure out how to set billing alarms and stay on top of them.

AWS also recommends not using your root (or first) account for daily use, in case it gets compromised. I detoured over to IAM (Identity and Access Management) and created a separate account to use every day. Oddly, it’s possible to assign almost all root powers to any child account, so, once again, be careful.

Once I was satisfied with the child accounts, I started testing S3 (Simple Storage Service). S3 operates on the concept of “buckets” that hold pretty much everything. Amazon has built a pseudo-folder structure to allow some organization, but really everything goes into one big bucket.

One cool thing about S3 buckets is that they get mirrored to other nodes within a region. The idea is that this should make it easier to pull the data from the bucket regardless of where a browser is within a region. This becomes important when using Route53.

Route53 is Amazon’s version of DNS. I bought ww2cp from NameCheap.com and used them as the DNS host, pointing to a placeholder page while I figured out what I was going to do with the website. I discovered that the S3 buckets I created earlier could serve the website directly, provided I let Route53 handle the DNS. Coolness!

Weird fact: Route53 assigned four name servers to resolve ww2cp. When I used nslookup to check the IP address for the website, I would get a rotating set of four “web” servers instead of the single (parking) IP address I used before. I bet that has to do with the S3 mirroring I mentioned above.

Setting up Route53 to handle DNS is not hard. (There once was a time when AWS documentation was cryptic and undecipherable. If you read the same docs often enough, they start to make sense.) Anyway, I set up Route53 to handle the DNS services required to make the S3 bucket host the files for the website. I updated the name server information over at NameCheap … and nothing happened. For some odd reason, my changes to the name servers at NameCheap kept reverting to their original settings. Eventually, I had to get NameCheap tech support involved to make the name server changes stick, but they did.