A good friend, Sam Matthews, gave a talk in December 2014 at a conference of the Australian Modernist Studies Network on “Transnational Modernisms”. Sam spoke about his discovery of a reference to a print-shop from Balzac’s “Two Poets” in Christina Stead’s novel Seven Poor Men of Sydney. Sam later suggested that I check whether we could use Lateral’s text matching service (the “Recommender (BYO documents!)” API) to confirm this reference to Balzac and potentially uncover others. This is hardly a conclusive experiment, but as you’ll see below, the preliminary results are very encouraging.
In case you would like to search for references to Balzac’s works yourself, you can do so by reusing the API key I created (b4de9b9183df4cbf8d70cde15609800a).
This is how I proceeded:
- I downloaded the Complete works of Balzac from Project Gutenberg. This gives one HTML file for each of Balzac’s works.
- I split each work into paragraphs, labelling the paragraphs by their work and position within the work. Balzac wrote many paragraphs, it turns out!
- I subscribed to the API at Lateral, obtaining an API key.
- I installed Francis Tzeng’s Python package for accessing the Lateral API.
- Using the Python package, I added the paragraphs of Balzac to the Lateral recommender. Short paragraphs containing too few meaningful words were rejected; in total, over 21,000 paragraphs of Balzac were indexed.
- Again using the Python package, I searched for the paragraphs of Balzac closest to the passage of Stead that Sam had indicated to me (see below).
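The splitting and labelling step can be sketched roughly as follows. This is only an illustration using Python's standard library, not the script I actually ran: the function name `label_paragraphs`, the `MIN_WORDS` threshold, and the choice to number paragraphs from 1 (counting the short ones that get skipped) are all assumptions. The label format, work title plus a five-digit position, mirrors the document IDs that appear in the results further down. The calls to the Lateral API itself are left out, since they depend on the client package.

```python
from html.parser import HTMLParser

MIN_WORDS = 10  # hypothetical threshold; the real cut-off is not stated in this post


class ParagraphExtractor(HTMLParser):
    """Collect the whitespace-normalised text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while we are outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # tolerate a missing </p> before the next <p>
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
        self._buffer = None

    def close(self):
        super().close()
        self._flush()  # keep a final paragraph that was never closed


def label_paragraphs(work_title, html, min_words=MIN_WORDS):
    """Return (document_id, text) pairs such as ("TWO POETS-00019", "...")."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    labelled = []
    for position, text in enumerate(parser.paragraphs, start=1):
        # Skip paragraphs too short to carry much meaning (assumed rule).
        if len(text.split()) >= min_words:
            labelled.append((f"{work_title}-{position:05d}", text))
    return labelled
```

Each returned pair would then be handed to the recommender as one document, with the label serving as its ID.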
The passage of Stead’s novel that evokes the print-shop appears below (from Chapter 3):
devil’s kitchen where the word is made bread … triangular park … A wide old doorway opened beside the tobacconist’s shop, and over it was a name, white on blue, “Tank Steam Press, Ground Floor.” The tobacconist owned the old single-storey building and rented out to several establishments the mouldy apartments of the ground and first floor. In the attic was the man who did heliogravure. The building had once been a private house. Its court was now a cart-dock and opened into the other street. Its first-floor bathroom at the head of the stairs contained the old water-closet, used by all the workers in the house, a gas-ring to make tea, and the usual broken chairs and out-of-date telephone directories. The distinctive smell of the building came from this closet and from the printing-ink. Joseph walked through the old doorway, went by a staircase and entered the large airy double room occupied by the Press. He opened the glass back-door and moved about among the presses, curiously inspecting the jobs in their various stages, picking up a paper, looking through the bills on a bill-hook, putting his finger in the dust in the little glassed-in office of Chamberlain, the owner, and shutting off the stove, lighted by the cleaner, because the day was warm enough.
Below are the paragraphs of Balzac that are semantically closest to the text above, according to Lateral, listed in order of increasing distance. As you can see, the 1st and the 9th closest paragraphs (of over 21,000!) come from “Two Poets”, and inspection reveals that they do indeed concern the print-shop! You can click the links to fetch the corresponding paragraphs using the API. The results ranked in between seem to be architectural descriptions.
[
{
"distance": 0.034905,
"document_id": "TWO POETS-00019"
},
{
"distance": 0.035945,
"document_id": "THE COLLECTION OF ANTIQUITIES-00557"
},
{
"distance": 0.037409,
"document_id": "SONS OF THE SOIL-01139"
},
{
"distance": 0.038067,
"document_id": "A MAN OF BUSINESS-00034"
},
{
"distance": 0.038168,
"document_id": "URSULA-01020"
},
{
"distance": 0.038216,
"document_id": "COUSIN PONS-01938"
},
{
"distance": 0.03837,
"document_id": "COLONEL CHABERT-00023"
},
{
"distance": 0.038545,
"document_id": "COUSIN BETTY-01508"
},
{
"distance": 0.038823,
"document_id": "TWO POETS-00018"
},
{
"distance": 0.038891,
"document_id": "RISE AND FALL OF CESAR BIROTTEAU-01382"
},
{
"distance": 0.039151,
"document_id": "THE RED INN and others-00045"
},
{
"distance": 0.039195,
"document_id": "THE LESSER BOURGEOISIE(The Middle Classes)-00635"
},
{
"distance": 0.039369,
"document_id": "SCENES FROM A COURTESAN'S LIFE-00761"
},
{
"distance": 0.039377,
"document_id": "THE TWO BROTHERS-00663"
},
{
"distance": 0.039471,
"document_id": "HONORINE-00036"
},
{
"distance": 0.039808,
"document_id": "Z. MARCAS-00043"
},
{
"distance": 0.039896,
"document_id": "RISE AND FALL OF CESAR BIROTTEAU-00623"
},
{
"distance": 0.040041,
"document_id": "THE VILLAGE RECTOR-00313"
},
{
"distance": 0.040253,
"document_id": "A WOMAN OF THIRTY-00700"
},
{
"distance": 0.04031,
"document_id": "CATHERINE DE' MEDICI-01059"
}
]
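To read the ranks of a given work off a response like this one, something along the following lines would do. It is a small sketch using only the standard library; the helper name `ranks_for_work` is my own invention, and it assumes the document IDs follow the "work title, hyphen, position" pattern seen above. The sample data is just the first nine entries of the list.

```python
import json


def ranks_for_work(results, work_title):
    """Return the 1-based ranks (by increasing distance) of a work's paragraphs."""
    ordered = sorted(results, key=lambda r: r["distance"])
    ranks = []
    for rank, result in enumerate(ordered, start=1):
        # Split on the last hyphen only, since titles may contain hyphens.
        work, _, _position = result["document_id"].rpartition("-")
        if work == work_title:
            ranks.append(rank)
    return ranks


# The first nine results from the response above.
SAMPLE = json.loads("""
[
  {"distance": 0.034905, "document_id": "TWO POETS-00019"},
  {"distance": 0.035945, "document_id": "THE COLLECTION OF ANTIQUITIES-00557"},
  {"distance": 0.037409, "document_id": "SONS OF THE SOIL-01139"},
  {"distance": 0.038067, "document_id": "A MAN OF BUSINESS-00034"},
  {"distance": 0.038168, "document_id": "URSULA-01020"},
  {"distance": 0.038216, "document_id": "COUSIN PONS-01938"},
  {"distance": 0.03837, "document_id": "COLONEL CHABERT-00023"},
  {"distance": 0.038545, "document_id": "COUSIN BETTY-01508"},
  {"distance": 0.038823, "document_id": "TWO POETS-00018"}
]
""")

print(ranks_for_work(SAMPLE, "TWO POETS"))  # prints [1, 9]
```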