Cleaning up Mercurial & Git repos

Recently, I cloned a repo, one I had created on my laptop and pushed to Bitbucket, back down to my desktop. That’s when I discovered that I had included all of the project’s vendor files in the repository. That’s a no-no. It wastes space, mainly because the dependent libraries can be downloaded again by running Composer once the new clone is created. So, I decided to remove the vendor files from the repository without deleting them from the working directory.

I thought I had solved the problem by updating Mercurial’s .hgignore file, which tells Mercurial which files and folders not to track. Unfortunately, the vendor files had already been committed when I created the repository. Going forward, .hgignore would keep new files and folders in the vendor directory out of the repository, but I still needed to forget the files that were already tracked. It turns out that ‘hg forget file_name’ does the trick.
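For reference, here is a minimal sketch of the ignore entry, assuming the commands run from the project root and the Composer folder is named vendor. The `syntax: glob` line matters because .hgignore patterns default to regular expressions. As the paragraph above says, this only keeps new files out; files that are already tracked still have to be forgotten.

```shell
# Run from the project root (assumption: the Composer folder is "vendor").
# "syntax: glob" switches the patterns that follow from the default
# regexp syntax to shell-style globs.
cat >> .hgignore <<'EOF'
syntax: glob
vendor/**
EOF

# .hgignore does not affect files that are already tracked; those would still
# need something like "hg forget vendor" (not run here).
cat .hgignore
```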

I wanted to be sure it worked, so I created a ‘tests’ folder and touched a file inside that folder. Sure enough, the file appeared in Mercurial’s commit list. I ran a commit, then tried

> hg forget a.php

This removed the file from the committed list, but the file still appeared in the staging area, which did not make sense at first. A forgotten file is simply untracked again, and Mercurial keeps listing untracked files until they are ignored. When I updated .hgignore to ignore the ‘tests’ directory, the new file disappeared from the staging area. That’s what I wanted.

Next, I needed to forget all the files in the vendor directory. I changed to the vendor directory and entered the following command to forget the .json files:

> hg forget -I **.json .

That removed the .json files from the commit list. I did the same for .js, .map, .txt, .conf, .tpl, .yml, .css and .html files. Oddly, only one .php file in the vendor directory was forgotten this way. I wonder what would happen if I tried ‘hg forget -I vendor/** .’ instead. (I may try that if I have to clone this repo again.)
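One possible explanation for the lone .php file, offered as a guess since I can’t re-check that vendor tree: the pattern wasn’t quoted, so the shell saw `**.php` before Mercurial did. In bash with default options (globstar off), `**` behaves like `*`, so the shell expands `**.php` to the .php files sitting directly in the current directory (Composer puts `autoload.php` right in vendor/), and hg receives that file name instead of the pattern. A pattern with no match in the current directory, like `**.json`, is passed through unchanged, so hg applied those recursively itself. The throwaway tree below uses hypothetical file names and shows the difference without needing hg installed.

```shell
# Throwaway Composer-style tree; the file names are hypothetical examples.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/vendor/composer"
touch "$demo_dir/vendor/autoload.php" \
      "$demo_dir/vendor/composer/ClassLoader.php" \
      "$demo_dir/vendor/composer/installed.json"
cd "$demo_dir/vendor" || exit 1

# Unquoted: the shell expands the glob itself. With globstar off, ** acts
# like *, so only .php files in the current directory match.
unquoted_php=$(echo **.php)
# No .json file in the current directory, so the shell leaves the word
# as-is and hg gets the literal pattern.
unquoted_json=$(echo **.json)

echo "hg forget -I $unquoted_php ."    # what hg would see for .php
echo "hg forget -I $unquoted_json ."   # what hg would see for .json

# Quoting keeps the pattern away from the shell so hg can match it recursively:
#   hg forget -I '**.php' .
```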


It takes less command line work to forget files in git. This command did it all:

> git rm -r --cached vendor

where rm (remove), -r (recursively), --cached (from the index only, leaving the files on disk), vendor (the folder name). That’s much easier.
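Here is a small, self-contained run of that command in a throwaway repository; the vendor/autoload.php file and the commit identity are made up for the demo. It also adds vendor/ to .gitignore, the Git counterpart of the .hgignore step, so the folder doesn’t sneak back in on the next `git add .`.

```shell
# Throwaway repo so the demo can't touch a real project.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
mkdir -p vendor
echo '<?php // placeholder' > vendor/autoload.php
echo 'placeholder' > index.php
git add .
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "initial commit with vendor"

# rm = remove, -r = recurse into the folder, --cached = drop the files from
# the index only; the copies in the working directory stay put.
git rm -r -q --cached vendor
echo 'vendor/' >> .gitignore
git add .gitignore
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "stop tracking vendor"

ls vendor       # the files are still on disk
git ls-files    # but only .gitignore and index.php are tracked
```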


Still valid!

I remembered that I had written about using Mercurial and Git at the same time, and that post came in handy when I decided to rebuild a helper application and (eventually) post it to Heroku. Heroku uses Git to push files back and forth. As mentioned before, I started with Mercurial, and I want to stick with it because I’m used to SourceTree, Atlassian’s desktop client. I followed my old instructions and they worked.

I noticed that GitHub now allows unlimited private repositories. The lack of them was one of the reasons I chose Bitbucket over GitHub so many years ago. However, GitHub would still cost me $7/month, which I don’t have to pay with Bitbucket. Ideally, the school would pay for it, but that’s not happening for a while.

(future me here)

Now I remember why I did not drop Bitbucket entirely: I like its free private repositories. What I should have done was create the repo as a Git repo and push it up to Bitbucket. I was not thinking.

hg vs git? Why not both?

For my projects, I prefer Mercurial over Git as the DVCS. Mostly, it’s a personal preference. Four years ago, when I was investigating what to use instead of Subversion, Mercurial seemed easier, so I stuck with it. Bitbucket, which at one time only hosted Mercurial repos, is free for personal projects, while GitHub charges at least $7/month to host private projects. SourceTree (Atlassian’s desktop app for DVCS) first handled Mercurial repos, then eventually handled Git repos. These people feel that git is good for distributed projects, while Mercurial is good for personal repositories.

SourceTree makes Git really easy to use. Eventually, I’ll get used to the command line instructions for git. I do know that more and more projects are hosted on GitHub and provide git instructions to download the files. I’ve used Heroku for a class on web applications, and it made deployment very simple, as long as the project was stored in a Git repo. So … for now, I’ll run them both.

Usually, I start personal projects in Mercurial, then set them up and back them up to Bitbucket using SourceTree. I’ve realized that I need to set Mercurial (through SourceTree) to ignore Git’s files. Before setting up Git, I open the .hgignore file and add ‘.git’ to the glob section. It’s much easier to set up hg to ignore files before they are committed into the hg repository. I’ve forgotten in the past and found myself trying to run ‘hg remove’ (I think) on all the extra files so that I could then ignore them in .hgignore. That’s too much effort when I can start off on the right foot with the correct setting.
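A minimal version of that ignore entry, assuming .hgignore lives at the project root and the commands run there before ‘git init’. The `syntax: glob` line goes right before the pattern because .hgignore patterns default to regular expressions, and as an unanchored regexp `.git` would also match names such as `digit.php` (the dot matches any character).

```shell
# Run from the project root, before "git init".
# Glob syntax keeps ".git" literal; under the default regexp syntax the dot
# would match any character.
printf 'syntax: glob\n.git\n' >> .hgignore
cat .hgignore
```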

I then set up Git to watch the project directory with a simple ‘git init’ command. Before I add or commit anything to the Git repo, I update the .gitignore file to ignore all the Hg files by adding ‘.hg/’. I can then safely run ‘git add .’ and ‘git commit -m “first message”’ without worrying about all the Hg files. It seems to work, so why not both?
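Those steps, sketched as a throwaway demo: the directory gets a stand-in .hg folder so there is something for Git to ignore, and the commit identity is passed with `-c` only so the demo runs on a machine with no Git identity configured. In a real project, skip the mktemp/cd lines and run the rest from the project root.

```shell
# Throwaway directory with a stand-in for Mercurial's metadata folder.
project=$(mktemp -d)
cd "$project" || exit 1
mkdir .hg && touch .hg/dirstate
printf 'syntax: glob\n.git\n' > .hgignore
echo 'hello' > index.php

git init -q
# Ignore Mercurial's folder before the first "git add".
echo '.hg/' >> .gitignore
git add .
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "first message"

git ls-files    # .gitignore, .hgignore and index.php; nothing under .hg/
```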