Co-written with Praveen Hebbagodi, CTO of Netenrich.
Netenrich’s Knowledge Now (KNOW) is a free, AI-based threat intelligence news aggregator that provides broader and deeper context on emerging threats and attacks, all in one place. We believe KNOW fills a very specific and much-needed role in the current cybersecurity landscape. But before we go any further, let’s look at the main inspiration behind KNOW: Google News.
What is Google News?
Google News is the world’s largest news aggregator. It presents a continuous flow of articles organized from thousands of publishers and magazines.
The screenshot above shows a typical Google News homepage. So how exactly are these news stories collated? Once you log into your account, your stories are personalized for you: the system’s algorithm determines which news stories, images, and videos appear, and in what order.
Along with the regular news, the Google News Product Experience team can add temporary topics for major events like the FIFA World Cup or elections.
What is KNOW?
KNOW’s overall layout is quite similar to that of Google News. As you can see here, KNOW correlates global news around a specific threat by adding diverse perspectives from different publishers.
There is another feature that you will find extremely valuable. Check out what happened recently after Iranian hackers deployed attacks via the Dharma ransomware.
KNOW curated all the related articles and singled out the ransomware responsible in a story card. The story card also provides:
- A list of threat actors the ransomware was historically linked to.
- Recent sandbox sightings.
- Historic sandbox sightings.
- A link to the malware’s threat intelligence page.
Articles in KNOW are categorized as follows:
- COVID-19 cyber threat.
- Data breach.
- Threat actor.
- Cloud security.
- Emerging threat.
- Zero-day & exploit.
Why KNOW? How does it change the game?
To understand the importance of KNOW, let’s wrap our heads around what security teams and analysts go through every single day in threat research.
- Security teams must sift through millions of digital signals every single day while accounting for the ever-changing global threat landscape.
- Filtering out risk and identifying IOCs from these signals is an extremely cumbersome process that may take hours or even days.
Leveraging KNOW allows security teams to:
- Access free threat intelligence, in context, including the latest information from social media, technical reports and advisories, worldwide threat feeds, and more, all from a single screen.
- Read up on the hottest-trending security stories of the moment, easily accessible on the fly via mobile.
KNOW for nerds – What’s going on behind the scenes?
Step #1: Stacking
The first step in KNOW is to retrieve the most relevant articles for curation. The collected articles are first passed through a stacked ensemble of classifiers to separate cybersecurity articles from non-cybersecurity ones.
Stacking is a method for ensembling multiple classification or regression models. The primary idea is to train several different weak learners and combine them by training a meta-model that makes predictions based on the weak models’ individual predictions. The intuition is that each weak model can learn a different part of the problem, but not the whole problem. Once the articles are classified, we discard the non-cybersecurity articles altogether.
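As a minimal sketch of the idea, here is a stacked ensemble built with scikit-learn. The toy dataset, base learners, and meta-model are all illustrative assumptions, not the production classifiers:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Toy stand-in for article feature vectors labelled
# cybersecurity (1) vs. non-cybersecurity (0).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Several weak learners, each seeing the problem differently...
base_learners = [
    ("nb", GaussianNB()),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# ...and a meta-model trained on their individual predictions.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print(round(stack.score(X_test, y_test), 2))
```

Articles predicted as class 0 would then simply be dropped from the pipeline.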
Step #2: Capturing context
The remaining articles are then vectorized by embedding their text so as to capture as much of the linguistic meaning of the words and sentences as possible. These vectors capture the grammar and semantics of the words, making it possible to perform mathematical operations on them. Sentence encoding, character embedding, and dependency-tree encoding generate semantically useful vectors for each article.
We use character embeddings to handle very specific words such as surnames, as well as words produced by typos and erroneous spellings. Word embeddings can handle only the words already present in their vocabulary during training. Character-level embeddings resolve this problem by finding a numeric representation of words based on their character-level composition.
Dependency-tree encoding, on the other hand, can encode not only the syntactic structure of a sentence but also several aspects of its semantics. The vectors generated through dependency trees help represent the grammatical context of sentences. The vectors from the three embedding techniques are concatenated into one single vector per article.
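The concatenation step itself is simple: each encoder produces a vector, and the article is represented by the three vectors joined end to end. A sketch with NumPy, where the dimensions are illustrative rather than the production sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-article vectors from each of the three encoders
# (random here; in the pipeline they come from the trained models).
sentence_vec = rng.standard_normal(512)   # sentence encoding
char_vec = rng.standard_normal(128)       # character-level embedding
dep_tree_vec = rng.standard_normal(256)   # dependency-tree encoding

# One combined representation per article.
article_vec = np.concatenate([sentence_vec, char_vec, dep_tree_vec])
print(article_vec.shape)  # → (896,)
```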
Step #3: Clustering
Articles whose text content is similar are clustered together. This clustering is a combination of a Gaussian mixture model (GMM) and a K-means model. GMM clustering can accommodate clusters that have different sizes and correlation structures. This clustering is done on a broader scale, making further intra-cluster clustering necessary to generate optimally clustered data.
For intra-cluster clustering, a K-means model ensures that semantically different content within the same cluster is further divided into sub-clusters. Text analysis then determines a title for each cluster via keyword extraction.
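The two-stage clustering can be sketched with scikit-learn. The blob data stands in for the concatenated article vectors from Step #2, and the component counts are arbitrary assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Toy stand-in for the concatenated article embeddings.
X, _ = make_blobs(n_samples=300, centers=4, n_features=16, random_state=0)

# Coarse pass: GMM accommodates clusters of different sizes and shapes.
gmm = GaussianMixture(n_components=4, random_state=0).fit(X)
coarse = gmm.predict(X)

# Refinement: split each coarse cluster into semantic sub-clusters.
sub_labels = np.empty(len(X), dtype=object)
for c in np.unique(coarse):
    idx = np.where(coarse == c)[0]
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X[idx])
    for i, s in zip(idx, km.labels_):
        sub_labels[i] = (c, s)  # (coarse cluster, sub-cluster)

print(len(set(sub_labels.tolist())))  # up to 4 coarse × 2 sub-clusters
```

Each `(coarse, sub)` pair would then become one story cluster on the page.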
A scoring method takes into account the recency of the articles in each cluster and, simultaneously, each cluster’s clustering quality, which is measured using graph theory. This score determines the order in which clusters are displayed on the web page.
Step #4: Entity recognition
Entity recognition/extraction is applied extensively to the clustered data. Entities such as threat actors and malware names present in the articles are extracted for proper segmentation into different sections. A bidirectional LSTM-CRF (BI-LSTM-CRF) model with word- and character-level embeddings extracts cyber threat entities from the texts. The LSTM layer filters out unwanted information so that only the important features are kept, and the CRF layer deals with the sequential nature of the data.
The BI-LSTM-CRF model can efficiently use both past and future input features thanks to its bidirectional LSTM component. This model produces a vector representation of each word in both directions, where the forward pass accesses past information and the backward pass accesses future information. A Conditional Random Field (CRF) layer is then added at the end. This model is robust and depends less on word embeddings alone than other similar models.
Using character embeddings in addition to word embeddings helps extract more information from the texts. A highway network combines the vectors obtained through word and character embeddings, adjusting the relative contribution of each. Its output is a modified vector representation of each word that draws on both embedding steps.
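A highway layer mixes a non-linear transform of its input with the raw input itself, using a learned gate to decide the balance per dimension. A minimal NumPy sketch for one token, with made-up dimensions and random weights standing in for trained parameters:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
dim = 8

# Hypothetical concatenated word + character embedding for one token.
x = rng.standard_normal(dim)
W_h, b_h = rng.standard_normal((dim, dim)), np.zeros(dim)  # transform weights
W_t, b_t = rng.standard_normal((dim, dim)), np.zeros(dim)  # gate weights

# Highway layer: the transform gate t decides, per dimension, how much
# of the non-linear transform H(x) to keep versus the raw input x.
t = sigmoid(W_t @ x + b_t)
H = np.tanh(W_h @ x + b_h)
y = t * H + (1.0 - t) * x

print(y.shape)  # → (8,)
```

When the gate saturates toward 0, the layer simply passes the embedding through unchanged, which is what lets it adjust the relative contribution of the two embedding types.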
Step #5: The finishing touches
With a deep model architecture like ours, training error degrades rapidly as data passes through the network. This degradation is not due to over-fitting but to the depth of the model. Residual and skip connections help solve this problem: they allow gradients to flow through the network directly, without passing through non-linear activation functions. And rather than the usual softmax layer, a CRF layer is used as the last layer.
A softmax layer normalizes its outputs into a probability distribution that sums to 1 for each token, and it makes each token’s tagging decision independently of the tags around it. A CRF layer, by contrast, scores entire tag sequences: it sits on top of the BI-LSTM layer and can efficiently predict the current tag based on the previously assigned tags. The CRF layer adds constraints to the final predicted labels to ensure that they are valid (for example, an “inside” tag cannot directly follow an “outside” tag). These constraints are learned automatically from the training dataset during training, which makes the CRF a better output layer for this model than softmax.
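To see why sequence-level constraints matter, here is a toy Viterbi decode over hypothetical BIO tags for malware names. The emission and transition scores are invented for illustration; a real CRF would learn them during training:

```python
import numpy as np

tags = ["O", "B-MAL", "I-MAL"]  # hypothetical BIO tag set

# Per-token emission scores from the BiLSTM (made up), for the
# three tokens "Dharma", "ransomware", "spreads".
emissions = np.array([
    [2.0, 1.0, 0.1],
    [0.2, 0.3, 2.5],
    [3.0, 0.5, 0.4],
])

# Transition scores; a large negative score forbids invalid
# moves such as O -> I-MAL.
NEG = -1e4
trans = np.array([
    [0.5, 0.5, NEG],   # from O
    [0.2, 0.2, 1.0],   # from B-MAL
    [0.2, 0.2, 0.5],   # from I-MAL
])

# Viterbi decoding: the best-scoring tag sequence under the constraints.
n, k = emissions.shape
score = emissions[0].copy()
back = np.zeros((n, k), dtype=int)
for i in range(1, n):
    cand = score[:, None] + trans + emissions[i]
    back[i] = cand.argmax(axis=0)
    score = cand.max(axis=0)
best = [int(score.argmax())]
for i in range(n - 1, 0, -1):
    best.append(int(back[i][best[-1]]))
best.reverse()
print([tags[t] for t in best])  # → ['B-MAL', 'I-MAL', 'O']
```

Per-token argmax over the emissions alone would pick `I-MAL` for the second token without a preceding `B-MAL`; the transition scores push the decoder onto the valid sequence instead.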
We built KNOW because we realized there was a massive gap in the cybersecurity space. Collecting threat intelligence is crucial for proactive security, but the process is cumbersome and labor-intensive. Our idea was to take as much of that toil out of cybersecurity as possible. No other tool offers the same level of contextual data along with a user-friendly interface, and the best part is that you can use KNOW absolutely free. Go subscribe to KNOW right now, and level up your SecOps by empowering your team to KNOW what’s going on in the global threat landscape.