If you are a data scientist like me, then you love data and are always looking for patterns and predictions that can be made from it. And since BlackHat, a well-known cybersecurity conference, was coming up, I thought: let’s see what kind of data I could find to predict the importance of AI at BlackHat.
Note that this was not an exhaustive analysis; it was based on basic, publicly accessible information. Still, there were a few surprising outcomes that made me wonder: is AI hitting its peak interest, or have AI/cyber experts simply moved on from BlackHat?
In summary, here were a few key observations:
- The number of times AI/ML terms were used in BlackHat briefings or abstract descriptions dropped significantly from 2018 to 2019.
- Yet, the number of sponsors/exhibitors using AI/ML terms remained about the same from 2018 to 2019.
- The types of companies that presented briefings using AI/ML terms changed dramatically in 2019. In 2017-2018, many of the bigger, well-known brands presented; in 2019, the presenters were mostly smaller private companies or academic institutions. For example, Google was represented in 2017, but had no briefings mentioning AI/ML terms in 2018 or 2019.
To be clear, while the data and trends tell one story, the same data could support other interpretations. For example, perhaps because 2018 saw the maximum number of briefings using AI/ML terms, the BlackHat organizers decided to stop accepting as many AI/ML topics. Or perhaps AI/ML and cyber experts decided to go to other conferences (including AI conferences with cyber tracks). Or … (please post your best conspiracy ideas in the comments!)
Here is how I went about it:
- Collected my datasets:
  - Used BlackHat’s publicly available information on its conferences from 2016-2019.
  - Also used various references to settle on the most commonly used terms for AI/ML.
  - For reference, used Google search (for AI/ML term count trends from 2016-2019) and the AI Index 2018 Annual Report for both trends and AI/ML terms.
- Using the Python libraries “BeautifulSoup”, “Requests” and “Selenium”, I pulled all the abstracts and company information for 2016-2019 from the Black Hat website.
- Next, I cleaned the data and put it in a usable format for analysis (removed HTML tags, separated the body text from the header, removed unnecessary spaces/newlines).
- I decided on a list of common AI/ML keywords that one could potentially use in an abstract about AI/ML.
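As a rough illustration of the cleaning step, here is a minimal sketch that strips HTML tags and collapses runs of spaces and newlines. It swaps in Python’s standard-library `html.parser` in place of BeautifulSoup so it runs without extra installs; the names `clean_abstract` and `_TextExtractor` are hypothetical, and the sketch does not attempt the header/body separation, which depends on the page layout.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> contents."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Join with spaces so text from adjacent block elements does not run together.
        return " ".join(self._parts)


def clean_abstract(html):
    """Strip tags, then collapse all whitespace runs into single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", parser.text()).strip()


print(clean_abstract("<div><p>Machine   learning\n for <b>malware</b></p></div>"))
```

Joining text nodes with a space is a design choice: it keeps words from separate paragraphs apart, at the cost of splitting a word that is broken by an inline tag mid-word.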
Keywords I used were:
[‘AI’, ‘machine learning’, ‘artificial intelligence’, ‘neural network’, ‘Natural language processing’, ‘Computer vision’, ‘Supervised learning’, ‘Unsupervised learning’, ‘Reinforcement learning’, ‘classification’, ‘Deep learning’, ‘NLP’, ‘Cluster’, ‘data science’]
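A minimal sketch of how hits for this keyword list might be counted in a cleaned abstract. The matching rules are assumptions for illustration: case-insensitive, whole-word matches, plus an optional trailing “s” so plurals such as “neural networks” are counted. `count_keywords` and `PATTERNS` are hypothetical names.

```python
import re
from collections import Counter

# Keyword list from the post.
KEYWORDS = ['AI', 'machine learning', 'artificial intelligence', 'neural network',
            'Natural language processing', 'Computer vision', 'Supervised learning',
            'Unsupervised learning', 'Reinforcement learning', 'classification',
            'Deep learning', 'NLP', 'Cluster', 'data science']

# Assumption: case-insensitive whole-word matching with an optional plural "s".
# The \b boundaries keep "AI" from matching inside words like "email" and keep
# "Supervised learning" from matching inside "Unsupervised learning".
PATTERNS = {
    kw: re.compile(r"\b" + re.escape(kw) + r"s?\b", re.IGNORECASE) for kw in KEYWORDS
}


def count_keywords(text):
    """Return a Counter of keyword -> number of matches in text (zero counts omitted)."""
    counts = Counter()
    for kw, pattern in PATTERNS.items():
        hits = len(pattern.findall(text))
        if hits:
            counts[kw] = hits
    return counts


print(count_keywords("Machine learning meets AI. Neural networks and deep learning detect malware."))
```

Note that with whole-word matching, “Cluster” also matches “clusters” but not “clustering”; widening that is a judgment call about how loose the matching should be.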
Top AI/ML topical keywords and their frequencies were:
| Keyword | Frequency of keyword |
|---|---|
| AI / Artificial intelligence | 17 |
I also calculated various statistics, such as keyword frequency per year and keyword frequency per year broken down by BlackHat’s track labels.
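The per-year and per-track statistics could be tallied along these lines. This sketch assumes each briefing has already been reduced to a year, a track label and a `Counter` of keyword hits, and it folds “AI” and “artificial intelligence” into one label, as in the table above. The records, the track names (“Track A”, “Track B”) and the helpers `merge_synonyms` and `frequency_by` are hypothetical placeholders, not the real BlackHat data.

```python
from collections import Counter, defaultdict

# Hypothetical synonym map: fold variants into the label used in the table.
CANONICAL = {
    "AI": "AI / Artificial intelligence",
    "artificial intelligence": "AI / Artificial intelligence",
}


def merge_synonyms(counts):
    """Fold synonym keywords into one canonical label, summing their counts."""
    merged = Counter()
    for kw, n in counts.items():
        merged[CANONICAL.get(kw, kw)] += n
    return merged


def frequency_by(briefings, group_key):
    """Sum merged keyword counts per group, e.g. per year or per (year, track)."""
    totals = defaultdict(Counter)
    for b in briefings:
        totals[group_key(b)].update(merge_synonyms(b["counts"]))
    return dict(totals)


# Toy placeholder records, NOT the real BlackHat data.
briefings = [
    {"year": 2018, "track": "Track A", "counts": Counter({"AI": 2, "machine learning": 1})},
    {"year": 2018, "track": "Track B", "counts": Counter({"artificial intelligence": 1})},
    {"year": 2019, "track": "Track A", "counts": Counter({"machine learning": 1})},
]

by_year = frequency_by(briefings, lambda b: b["year"])
by_year_track = frequency_by(briefings, lambda b: (b["year"], b["track"]))
print(by_year[2018].most_common())
```

Passing the grouping as a function keeps one aggregation routine for both views; `most_common()` on a year’s `Counter` then gives a ranked keyword list in the same shape as the table.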
More information can be found here.
This was a simple attempt to use data to understand the meta-trend of interest in AI as it intersects with cybersecurity. But the conclusions were definitely surprising and worth investigating further, to understand why BlackHat had so few AI/ML-related briefings in 2019 compared with 2016-2018.