Python updates and OpenCV

I recently updated my laptop to run Python 3.7.0. I forgot that updates knocks out all the Python virtual environments that expect Python 3.6. I’ve worked my way through the virtual environments, removed them, recreated them and restarted them.

> rm -rf venv

> python3 -m venv venv

> source venv/bin/activate

I noticed that I don’t have a requirements.txt file to make it simple to reload Python modules. I’ve been creating that file as needed. I also noticed I was having trouble with loading the modules needed for an OpenCV project.

i installed OpenCV with homebrew and I knew it was outdated, so I upgraded it (and other dependent files) with brew. I reloaded all the related Python modules, but still could not get the sample file to run.

> brew outdated

> brew upgrade opencv

Instructions here show full steps for installing OpenCV. Note step 6 where a symlink is set up to point from the openCV libraries installed by home-brew and pointing back to the virtual environment

> cd venv/lib/python3.7/site-packages/

> ln -s /usr/local/opt/opencv3/lib/python3.7/site-packages/cv2.cpython-37m-darwin.so cv2.so

Note that file paths now show 3.7. If and when Python 3.8 or Python 4 show up, the paths will probably need to be updated to show the Python version.

Advertisements

installation of OpenCV, finally

I mentioned in another post about my plans for WorldWarIICasualtyProject.org. In short, the U.S. National Archives has scanned pages out of books that list American casualties that took place in World War II. I was curious to find out more, but discovered that those records did not exist in searchable form. I thought it would be interesting to figure out how to scan them (gif files, really!) and read the data as OCR.

I decided to go ahead on my initial plan: use a Python OCR module to read the scan. However … the Python module I tracked down (pytesseract) also required PIL (another module, part of the dependencies) and strongly suggested I install the python science packages. I figured I would need them at some point, so I installed numpy, scipy, matplotlib, scikit-image, scikit-learn, ipython, and pandas. ( https://www.learnopencv.com/install-opencv3-on-macos/)

At this point, I paused. I found several pages that suggested OpenCV be installed with Homebrew. That’s not a big deal because I use Homebrew for python 2/3. It gets confusing here. At one time, OpenCV was kept in a specialized area named “homebrew/science” but was moved to “homebrew/core”. I’m told “homebrew/science” is empty, so there should be no reason to link to it. We’ll see.

Note: use ‘> brew tap’ to list all taps connected for homebrew

Also note: opencv3 does not exist anymore. I think it has been renamed to opencv. Opencv2 has been renamed ‘opencv@2’. … So confusing …

Then there’s the question of linking OpenCV to “… Homebrew Python’s site-packages directory”. What? See https://www.learnopencv.com/install-opencv3-on-macos/

I’m sticking with these instructions: https://robferguson.org/blog/2017/10/06/how-to-install-opencv-and-python-using-homebrew-on-macos-sierra/ except for the part where I tap into homebrew/science (it doesn’t exist any more) and I install opencv3. (It’s been renamed to opencv).

2018-02-21 update

I installed OpenCV through homebrew. Lots of dependencies were installed. Interestingly enough, I can see opencv through the default homebrew python3 install, but not in virtual environment I created for custom work. In other words:

> python3
>>> import cv2
>>> cv2.__version__
‘3.4.0’

However, when I go to the virtual environment set up for ww2cp, I don’t see it.

> source ww2venv/bin/activate
(ww2venv) > python3
>>> import cv2
Traceback (most recent call last):
File “<stdin>”, line 1, in <module>
ModuleNotFoundError: No module named ‘cv2’

So, following the instructions here: (https://robferguson.org/blog/2017/10/06/how-to-install-opencv-and-python-using-homebrew-on-macos-sierra/), I set up a symbolic link between homebrew’s openCV install and the site-packages inside the ww2 venv folder.

Now it works!

Message digests OR I should have known that

I have lots of old SQL dumps stored in backups. I wanted to find a way to check to see if I was storing the same files over and over again. I did not want to check them line by line, because it would take too long. I remembered that message digests are a way to check to see if a file has been tampered. So, if I create a message digest of two files that I think are the same, a matching digest should (ideally) prove that they are the same.

By the way, what is a message digest? It’s “ … a cryptographic hash function containing a string of digits created by a one-way hashing formula”. ( https://www.techopedia.com/definition/4024/message-digest ). In other words, it is the result of sending a file or string through a one-way function and outputting the result. Ideally, it can be used to check to see if a file has been modified. If two files are related, but slightly different, they will generate two different message digests.

Back to digests. I like the idea of taking the sql dumps and generating a message digest. However, I noticed that the SQL dumps usually have a timestamp showing when the dump was created listed inside the SQL comments. This will automatically create a different digest. Can I remove the SQL comments and create a digest from that?

It turns out that I can. It works nicely.

> grep –regexp=“^–.*” <path-to-sql-dump>

shows all the SQL comments in the file

> grep –regexp=“^[^–.*]” <path-to-sql-dump>

shows everything but the SQL comments. Pipe that result into a digest function

> grep –regexp=“^[^–.*]” <path-to-sql-dump> | md5

shows the resulting digest using md5. Similarly, using “openssl sha1”, “shasum”, “shasum -a 512”, “shasum -a 512224” and “shasum -a 512256” will generate different digests, which can all be used to compare SQL commands in a SQL dump file.

I’m a little sad that “shasum” did not work completely. It adds the file name after the digest and hyphen, allowing storage of the digests. However, since the file is piped into the command, there is no file name to add to the end of the file. I’m sure there’s a way to add it to a file, though. Maybe something like?

> grep –regexp=“^[^–.*]” <path-to-sql-dump> | shasum; echo !!:2

then search for ‘- line-break’ and replace with ‘- ‘. … Maybe, maybe …