Some time ago I realized that writing unittests for my Python software was really necessary. With most of my Python source code covered by tests, I could modify different sections of the code with confidence and sleep soundly, as long as all the tests passed. If the test coverage was close to 100% and all the tests passed, that was an indication that I had not unwittingly broken other parts of the code.

Otherwise this could happen when there are not enough tests for the code:

So I was very happy with my tests until I read this tweet:

Mocking network calls

I remembered reading somewhere about a way to mock things when running unittests. After some googling I found out that Python 3.3+ ships the mock library as part of unittest (unittest.mock).

So it appears that it is possible to mock the network call in the tests. Tests that do not perform network calls have the advantage that they run quickly and are not affected by network connection problems.
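As a minimal sketch of the idea, the snippet below replaces urllib.request.urlopen with a mock that hands back canned bytes. The helper fetch_text, the URL and the response body are all made up for illustration; only unittest.mock.patch and urlopen come from the standard library.

```python
import io
import urllib.request
from unittest import mock


def fetch_text(url):
    """Hypothetical helper: download a URL and return its body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# Placeholder URL; it is never contacted because urlopen is patched.
URL = "http://example.invalid/resource"

# While the patch is active, urlopen is a Mock that returns the canned handle.
canned_handle = io.BytesIO(b"hello from the mock")
with mock.patch("urllib.request.urlopen", return_value=canned_handle) as mocked:
    body = fetch_text(URL)

# The mock records how it was called, so the test can check that as well.
mocked.assert_called_once_with(URL)
```

Because fetch_text looks up urlopen on the urllib.request module at call time, patching "urllib.request.urlopen" is enough here. Code that does `from urllib.request import urlopen` would need the patch applied to the name in its own module instead.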

I noticed that some tests in Biopython are flagged as "offline only", so they do not run on Travis CI, because they need to make network calls. One set of these tests is in the file test_Entrez_online.py.

Let us examine one of them:

def test_read_from_url(self):
    """Test Entrez.read from URL"""
    handle = Entrez.einfo()
    self.assertTrue(handle.url.startswith(URL_HEAD + "einfo.fcgi?"), handle.url)
    self.assertTrue(URL_TOOL in handle.url)
    self.assertTrue(URL_EMAIL in handle.url)
    rec = Entrez.read(handle)
    handle.close()
    self.assertTrue(isinstance(rec, dict))
    self.assertTrue('DbList' in rec)
    # arbitrary number, just to make sure that DbList has contents
    self.assertTrue(len(rec['DbList']) > 5)

Well, this test is actually testing two things:

  • that the URL of the handle was constructed in a correct way.
  • that the data obtained from the network call is parsed and the result has contents under the key DbList.

We could simplify the test to focus on testing that the data returned from the network call was parsed correctly:

def test_read_from_url(self):
    """Test Entrez.read from URL"""
    handle = Entrez.einfo()
    rec = Entrez.read(handle)
    handle.close()

    self.assertTrue(isinstance(rec, dict))
    self.assertTrue('DbList' in rec)
    # arbitrary number, just to make sure that DbList has contents
    self.assertTrue(len(rec['DbList']) > 5)

One way of looking at this test is by drawing boxes:

The function Bio.Entrez._open also does two things: (i) it constructs the URL that will be used for the network call and (ii) it performs the actual network call by calling Bio._py3k.urlopen. However, Bio._py3k.urlopen is actually a function from the Python standard library (urllib.request.urlopen).

If we refactor Bio.Entrez._open to remove the lines that construct the URL, the only remaining job of _open is to perform the network call (via urlopen). Since we don't really need to test _open itself, we can mock it.
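A rough sketch of what such a refactoring could look like follows. The names build_url and open_url, the base URL and the parameters are hypothetical and are not Biopython's actual code; the point is only that URL construction becomes a pure function that can be tested without any network access or mocking.

```python
import unittest
import urllib.request
from urllib.parse import urlencode

# Placeholder base URL for the sketch, not the real NCBI endpoint.
BASE_URL = "https://example.invalid/eutils/"


def build_url(cgi, params):
    """Pure function: assemble the request URL, no network involved."""
    return BASE_URL + cgi + "?" + urlencode(sorted(params.items()))


def open_url(url):
    """Thin wrapper whose only job is the network call (the part to mock)."""
    return urllib.request.urlopen(url)


def einfo(**params):
    """Combine both steps, as the unrefactored function would."""
    return open_url(build_url("einfo.fcgi", params))


class BuildUrlTest(unittest.TestCase):
    def test_url_is_constructed_correctly(self):
        url = build_url("einfo.fcgi", {"tool": "demo", "email": "me@example.org"})
        self.assertTrue(url.startswith(BASE_URL + "einfo.fcgi?"), url)
        self.assertIn("tool=demo", url)
        # urlencode percent-encodes the "@" in the e-mail address.
        self.assertIn("email=me%40example.org", url)
```

With this split, the URL assertions that were dropped from the simplified test can live in their own test case, and open_url is the single place that has to be replaced by a mock.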

I added a patch to the test that mocks the network call performed by Bio.Entrez._open and returns data from NCBI in binary form.

The patch from the mock library can be used as a decorator. So the test ends up like this:

# patch is imported with: from unittest.mock import patch
@patch("Bio.Entrez._open", return_value=_binary_to_string_handle(open("Tests/Entrez/einfo1.xml", "rb")))
def test_read_from_url(self, mock_open):
    """Test Entrez.read from URL"""
    handle = Entrez.einfo()
    rec = Entrez.read(handle)
    handle.close()
    self.assertTrue(isinstance(rec, dict))
    self.assertTrue('DbList' in rec)
    # arbitrary number, just to make sure that DbList has contents
    self.assertTrue(len(rec['DbList']) > 5)

The expected data from the call to NCBI is stored in the file einfo1.xml; it is read by Entrez.einfo (through the mocked _open) and parsed by Entrez.read. Thus we test only these two functions, without the need to make any network call.

Therefore this unittest will run much faster and can be unflagged so that it is also run by Travis CI.
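One detail worth keeping in mind with this approach: a handle passed as return_value is created once and shared by every call to the mock, so a second read finds it already exhausted. The sketch below illustrates the difference between return_value and side_effect. FakeEntrez and the XML bytes are invented for the example and are not a real NCBI response or Biopython code.

```python
import io
from unittest import mock

# Made-up placeholder bytes, only shaped like an XML reply.
CANNED_XML = b"<eInfoResult><DbList><DbName>pubmed</DbName></DbList></eInfoResult>"


class FakeEntrez:
    """Hypothetical stand-in for a module that exposes a network-facing _open()."""

    @staticmethod
    def _open(url):
        raise RuntimeError("tests must not reach the network")

    @classmethod
    def einfo(cls):
        return cls._open("einfo.fcgi")


# return_value: one handle object is shared by every call to the mock.
with mock.patch.object(FakeEntrez, "_open", return_value=io.BytesIO(CANNED_XML)):
    shared_first = FakeEntrez.einfo().read()
    shared_second = FakeEntrez.einfo().read()  # same handle, already at EOF

# side_effect: the callable runs on every call, so each call gets a new handle.
with mock.patch.object(
    FakeEntrez, "_open", side_effect=lambda url: io.BytesIO(CANNED_XML)
) as fake_open:
    fresh_first = FakeEntrez.einfo().read()
    fresh_second = FakeEntrez.einfo().read()
```

For a test that calls the mocked function only once, as above, return_value is sufficient. If several tests or several calls need the same data, side_effect with a small factory is probably the safer choice.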
