I Automated Infosec "Thought Leadership", and it's Hilarious


Being a Thought Leader is Hard

@thought__leader - Thinking thoughts for you, so you don’t have to.

The infosec industry is full of “thought leaders”. These are people who are on the forefront of the industry, keeping up with latest trends, technologies, and philosophies.

Or they are heavy on the buzzwords and prolific on LinkedIn/Twitter. That works, too.

Unfortunately, both of these take way too much time for me. In fact, I’d argue that they take too long for our industry. So this weekend, I set out to automate thought leadership, so that we can spend more time doing things that matter - things like coming up with marketing for the next CVE or finding obscure reflective XSS bugs that affect literally no one.

This resulted in @thought__leader. And it’s hilarious.

Becoming a Thought Leader

I created @thought__leader to be simple. Basically, it works by taking in a whole bunch of tweets from other thought leaders and spitting out new tweets using parts of what it gathered.

Finding Thought Leaders

First, I had to find thought leaders. My criteria were simple: the candidate had to have over 10k Twitter followers and they had to have a bunch of tweets. After all, I’m looking for thought leaders, not thought thinkers. This included infosec researchers, journalists, and more.

I also Googled “infosec thought leaders” and added people listed in the blog posts. They checked out somehow, I guess.

Then I added @threatbutt.

This actually narrowed down the list of people quite a bit. All in all, I came up with these 99 accounts in no particular order. I’m sure I missed someone obvious.

On a serious note, while we’re poking fun at the concept of thought leadership, I should note that these folks are actually fantastic contributors to the field.

Gathering Thoughts

The next step was to download a whole bunch of premium thought leadership using the python-twitter library.

First, we connect to the Twitter API:

import twitter
api = twitter.Api(consumer_key='....',
          consumer_secret='....',
          access_token_key='....',
          access_token_secret='....')

Then, for every user, we can query the statuses/user_timeline endpoint to grab their most recent tweets. Since rate limits allow 180 requests per 15 minutes, I kept track of where I was and ran the script a few times. I also chose to get rid of retweets and replies to help keep original content where possible.

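tweet_corpus = ""
tweets = {}  # per-user progress between runs, e.g. {"some_user": {"max_id": 123}}
with open("thought_leaders.txt", "r") as users:
	for user in users:
		user = user.strip()
		if not user:
			continue
		# max_id=None grabs the newest tweets; later passes resume below the oldest tweet seen
		timeline = api.GetUserTimeline(screen_name=user, count=200, exclude_replies=True, max_id=tweets.get(user, {}).get("max_id"))
		if not timeline:
			continue
		# max_id is inclusive, so step back by one to avoid re-fetching the same tweet
		tweets[user] = {"max_id": min(t.id for t in timeline) - 1}
		# Drop retweets and keep one tweet per line
		tweet_corpus += '\n'.join(t.text for t in timeline if not t.text.startswith("RT")) + "\n"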
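The 180-requests-per-15-minutes limit mentioned above works out to one request every five seconds. If you'd rather let the script pace itself than re-run it by hand, something like the following sketch could work. Note that `Throttle` and `min_interval` are hypothetical helpers of my own for illustration, not part of python-twitter, and the numbers are just the ones quoted above.

```python
import time

REQUESTS_PER_WINDOW = 180   # limit quoted in the post
WINDOW_SECONDS = 15 * 60    # 15 minutes

def min_interval(requests=REQUESTS_PER_WINDOW, window=WINDOW_SECONDS):
    """Seconds to leave between calls so we stay under the limit."""
    return window / requests

class Throttle:
    """Hypothetical helper: blocks just long enough to keep calls evenly spaced."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock      # injectable so the pacing can be tested without waiting
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        now = self.clock()
        if self.last_call is not None:
            remaining = self.interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

Calling `throttle.wait()` right before each `api.GetUserTimeline(...)` request would then space the requests roughly five seconds apart.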

Running this a few times resulted in just over 52k tweets. The leadership sitting in memory at this point is palpable. Now let’s become a thought leader.

Generating Thoughts

To become a beacon of infosec futurism, we need to be able to create new content from the tweets we grabbed. Taking a cue from other Twitter bots like @horse_ebooks or the awesome Subreddit Simulator project, we’ll use Markov Chains.

If you’re new to Markov Chains, this explanation from the Subreddit Simulator description will work well enough:

The text for [tweets] are generated using “markov chains”, a random process that’s “trained” from looking at real data. If you’ve ever used a keyboard on your phone that tries to predict which word you’ll type next, those are often built using something similar.
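To make that a little more concrete, here is a toy sketch of the idea. It is a deliberately simplified, single-word-state chain I wrote for illustration, not what the bot actually runs. The `build_chain` and `generate` names, the `<START>`/`<END>` markers, and the three-line corpus are all made up for this example.

```python
import random
from collections import defaultdict

def build_chain(lines):
    """Map each word to the list of words seen right after it."""
    chain = defaultdict(list)
    for line in lines:
        words = ["<START>"] + line.split() + ["<END>"]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def generate(chain, rng=random, max_words=30):
    """Walk the chain from <START>, picking a random successor at each step."""
    word, out = "<START>", []
    while len(out) < max_words:
        word = rng.choice(chain[word])
        if word == "<END>":
            break
        out.append(word)
    return " ".join(out)

# Made-up corpus, one "tweet" per line
toy_corpus = [
    "the cloud is the new perimeter",
    "the perimeter is dead",
    "the cloud is dead",
]
chain = build_chain(toy_corpus)
print(generate(chain))
```

Because successors are stored with repeats, words that follow each other more often in the corpus get picked more often. That is the whole trick: the output is stitched together from real fragments, so it sounds plausible without meaning anything.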

Fortunately for us, there’s a Python library called markovify that makes building Markov Chains super easy.

To use the library, we just need a corpus of data to work with. In our case, it’s the tweet_corpus we built up earlier. From the markovify readme, if we are using newline-delimited data, we can send our corpus to an instance of the markovify.NewlineText class to let it build up our Markov Chains.

import markovify
m = markovify.NewlineText(tweet_corpus)

From here, we can generate new tweets by making calls like this:

m.make_short_sentence(140)
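One thing to watch for: make_short_sentence can return None when it fails to build a sentence that fits. A small retry wrapper is one way to handle that. This is just a sketch, and `generate_tweet` is a hypothetical helper name, not something markovify provides. It takes any callable, so it is shown here with a stand-in generator instead of a trained model.

```python
def generate_tweet(make_sentence, max_chars=140, attempts=10):
    """Call make_sentence(max_chars) until it returns text, or give up."""
    for _ in range(attempts):
        tweet = make_sentence(max_chars)
        if tweet:
            return tweet
    return None  # caller decides what to do when no tweet could be built

# Stand-in generator that fails twice before producing something
canned = iter([None, None, "Cyber is the new cloud"])
print(generate_tweet(lambda max_chars: next(canned)))
```

With the model from above, the call would look like `generate_tweet(m.make_short_sentence)`.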

Here are some samples:

Hackers hijacking water treatment plant controls shows how to resist and even with no security #6wordcyber
Dell apologizes for HTTPS certificate for Google Maps tampering http://t.co/WyNrI7Snk3
Clearly this ex-defense minister is worried @kevinmitnick could whistle into a website for that gig.
Management will spend $12k on IM but wont pay for a Diffie-Hellman 1024-bit pair in 154 minutes http://t.co/UZerPfZbS3
Man-on-the-side content injection attacks in wake of Ashley Madison passwords http://t.co/zw4fqvMAbk
Just had my frist experience with @SixtUSA and it wants its network management problems back.
Working on Tool to Test Network Security. And guess what? It is an even bigger God-complex https://t.co/vPcPyorOQp
Let's do the talk again, for sure, especially as a geek, this makes me miss TrueCrypt
Microsoft and fixes for Adobe Flash Player or Windows, it's time for Valentine's Day - https://t.co/L6tkJEmjld

Awesome. All that’s left is to saturate the infosec Twittersphere with glorious thought leadership. Right now, I have the bot tweeting every 7 minutes. I’m also shamelessly re-grabbing the latest tweets every hour and retraining the model. I’ll open source the code eventually.

Consider thought leadership solved. Now we can get back to naming those CVEs.