So I launched my first Twitter bot today: The Gas Face Bot, a bot that simply picks a random proper noun from a huge list taken from CMU's NELL project and tweets it out in the form "_____ gets the #gasface". With approximately 2.5 million nouns, I think this bot will own that hashtag for quite some time.
Picking a random piece of data from 2.5 million choices turned out to be a sticky wicket, especially since I was using Google Apps Script to run the bot. Do I store the data in the PropertiesService? In a cloud MySQL database? I churned through a couple of ideas before I hit on the simple one of storing the data in text files: split the 2.5 million item list into files of 10,000 lines each (using the beloved Unix command split, which I stupidly did not reach for until after I had tried to roll my own inefficient text splitter), store the files in a Google Drive directory, and write code to access these files using the DriveApp class.
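For the curious, here is roughly what the "grab a random file from Drive" step could look like. This is a minimal sketch, not the bot's actual code: the function names are made up for illustration, and `folder` is assumed to be whatever `DriveApp.getFolderById(...)` returns for the directory holding the split files. The random number source is passed in only so the function is easy to try out.

```javascript
// Sketch only. `folder` is assumed to behave like an Apps Script Folder,
// e.g. DriveApp.getFolderById(FOLDER_ID), where FOLDER_ID is a placeholder.

// Collect every file in the folder into a plain array.
function listChunkFiles(folder) {
  var files = [];
  var iterator = folder.getFiles(); // Apps Script FileIterator
  while (iterator.hasNext()) {
    files.push(iterator.next());
  }
  return files;
}

// Pick one chunk file uniformly at random and return its contents as text.
// `rng` defaults to Math.random and is injectable for testing.
function readRandomChunk(folder, rng) {
  rng = rng || Math.random;
  var files = listChunkFiles(folder);
  if (files.length === 0) {
    throw new Error('No chunk files found in folder');
  }
  var file = files[Math.floor(rng() * files.length)];
  return file.getBlob().getDataAsString();
}
```

At 10,000 lines per file, 2.5 million items works out to roughly 250 files, so listing the folder on every run is cheap. One caveat: if the last file is shorter than the others, its entries are slightly more likely to be picked.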
Splitting up the files made it much easier to handle the data in Google Apps Script (each file holds at most 10,000 lines, which can quickly be split up into an Array), and made it much more likely that a random pick would land somewhere in the middle of the list. So imagine my surprise when one of the very first tweets it sent out referenced one of my wife's favorite bands:
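The split-into-an-Array step is simple enough to sketch too. Again, this is an illustration under assumptions rather than the bot's real code: it assumes each file has exactly one noun per line, and the function names are hypothetical.

```javascript
// Sketch only: assumes one noun per line in the chunk text.

// Split a chunk's text into an array of non-empty lines.
function toLines(text) {
  return text.split(/\r?\n/).filter(function (line) {
    return line.trim() !== '';
  });
}

// Pick one line uniformly at random from the chunk.
// `rng` defaults to Math.random and is injectable for testing.
function pickRandomNoun(text, rng) {
  rng = rng || Math.random;
  var lines = toLines(text);
  if (lines.length === 0) {
    throw new Error('Chunk is empty');
  }
  return lines[Math.floor(rng() * lines.length)].trim();
}

// Build the tweet text in the bot's "_____ gets the #gasface" form.
function formatTweet(noun) {
  return noun + ' gets the #gasface';
}
```

Filtering out blank lines matters because a text file usually ends with a newline, which would otherwise leave an empty last entry in the Array and an occasional tweet with nobody getting the gas face.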
"Julie Ruin gets the #gasface" (The Gas Face Bot, @thegasfacebot, January 18, 2016)
I think this was the Twitter bot gods sending me a message, but I can't figure out if it's a good or bad omen.