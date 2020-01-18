Machine learning is very much a technology of the moment. Security is a growing problem for many companies, and machine learning is one of the ways to deal with it. ML can help cybersecurity systems analyze and learn from patterns to prevent similar attacks and respond to changing behavior.

To learn more about machine learning and its application in cybersecurity, we spoke to Emmanuel Tsukerman, a cybersecurity data scientist and author of the Machine Learning for Cybersecurity Cookbook. The book covers modern AI techniques for building powerful cybersecurity solutions for malware, pentesting, social engineering, data protection, and intrusion detection. In 2017, Tsukerman’s anti-ransomware product was listed among the top 10 ransomware products of 2018 by PC Magazine. In this interview, Emmanuel talks about how ML algorithms help solve cybersecurity problems and gives a brief overview of some chapters in his book. He also discusses the rise of deepfakes and malware classifiers.

Learn how to use machine learning for cybersecurity

Using machine learning in cybersecurity scenarios enables systems to identify different types of attacks at different security levels and to adopt the right plan of action (POA). Can you share some examples of successful uses of ML for cybersecurity that you have seen recently?

A new and interesting development in cybersecurity is that the bad guys have started to catch up with technology. In particular, they have started using deepfake technology to commit crimes. For example, they used AI to mimic a CEO’s voice and defraud a company of $243,000. On the other hand, the use of ML in malware classifiers is quickly becoming the industry standard because of the enormous number of never-before-seen samples (over 15,000,000) generated each year.

Stay up to date with developments in technology to defend against attacks

Machine learning is used not only by ethical practitioners but also by cybercriminals, who use it for ML-based intrusions. How can organizations counter such scenarios and ensure the security of confidential organizational and personal data?

The main tools that organizations have to defend against attacks are staying current and pen testing. Staying current, of course, requires keeping informed about the latest developments in technology and its applications. For example, it is important to know that hackers can now use AI-based voice imitation to impersonate anyone they want. This knowledge should be shared across the organization so that individuals are not caught off guard.

The other way to improve security is to run regular pen tests using the latest attack methods, whether by attempting to evade the company’s antivirus, sending phishing messages, or infiltrating the network. In all cases, it is important to use the most dangerous techniques, which are often ML-based.

How ML algorithms and GANs help solve cybersecurity problems

You mentioned various algorithms in your book, such as clustering, gradient boosting, random forests, and XGBoost. How do these algorithms help solve cybersecurity problems?

Unless a machine learning model is constrained in some way (e.g., in computation, time, or training data), there are five types of algorithms that have historically performed best: neural networks, tree-based methods, clustering, anomaly detection, and reinforcement learning (RL). These are not necessarily disjoint; anomaly detection, for example, can be performed with neural networks. To keep it simple, we will stick to these five classes.

Neural networks shine when there are large amounts of data for visual, auditory, or textual problems. For this reason, they are used in deepfakes and their detection, lie detection, and speech recognition. There are many other uses as well. One of the most interesting applications of neural networks (and deep learning), however, is generating data via Generative Adversarial Networks (GANs). GANs can be used to generate password guesses and evasive malware. For more details, see the Machine Learning for Cybersecurity Cookbook.

The next class of models that perform well is tree-based. This includes random forests and gradient boosting trees. These perform well on structured data with many features. For example, the PE header of PE files (including malware) can be featurized to yield approximately 70 numerical features. It is convenient and effective to build an XGBoost model (a gradient boosting model) or a random forest model on this data. Chances are that other algorithms will not outperform it.
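As a rough illustration of the tree-based approach, here is a minimal sketch of a toy random forest built from depth-1 trees (decision stumps) in plain Python. The three features (section count, entropy, import count) and the synthetic samples are assumptions made up for the example; they are not the roughly 70 PE header features mentioned above, and in practice one would more likely reach for a library such as XGBoost or scikit-learn than hand-roll the model.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    """Most common 0/1 label (ties go to 1)."""
    return int(sum(labels) * 2 >= len(labels))

def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: fall back to a constant prediction
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Bagging plus a random feature subset per tree, as in a random forest."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]        # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)   # feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps; 1 means 'malware'."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

def make_samples(n, seed=1):
    """Synthetic (sections, entropy, imports) rows; purely illustrative."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        is_malware = i % 2
        sections = rng.randint(3, 8)  # uninformative noise feature
        entropy = rng.uniform(6.8, 8.0) if is_malware else rng.uniform(4.0, 6.0)
        imports = rng.randint(0, 40) if is_malware else rng.randint(50, 300)
        rows.append((sections, entropy, imports))
        labels.append(is_malware)
    return rows, labels

rows, labels = make_samples(200)
forest = fit_forest(rows, labels)
print(predict(forest, (5, 7.5, 5)))    # high entropy, few imports
print(predict(forest, (5, 4.5, 150)))  # moderate entropy, many imports
```

The synthetic classes are deliberately easy to separate, so the sketch only shows the mechanics (bootstrap sampling, random feature subsets, voting), not realistic detection accuracy.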

Next up is clustering. Clustering shines when you want to segment a population automatically. For example, you may have a large collection of malware samples and want to divide them into families. Clustering is a natural choice for this problem.
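To make the idea of grouping samples into families concrete, here is a minimal k-means sketch in plain Python. The two-dimensional feature vectors are hypothetical (imagine two normalized behavioral scores per sample), and k-means is only one of several clustering algorithms that could be used here; the source does not say which one the book uses.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, iterations=20):
    # Farthest-point initialization: deterministic and adequate for a toy example.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points,
                             key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # move the centroid to the mean of its members
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assign(points, centroids), centroids

# Hypothetical feature vectors for six samples from two made-up families.
samples = [(0.90, 0.10), (0.85, 0.15), (0.95, 0.05),
           (0.10, 0.80), (0.20, 0.90), (0.15, 0.85)]
families, centers = kmeans(samples, k=2)
print(families)
```

With real malware, most of the work lies in choosing features and the number of clusters; the loop itself stays this simple.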

Anomaly detection lets you fend off unseen and unknown threats. For example, if a hacker uses a new tactic to infiltrate your network, an anomaly detection algorithm can protect you even if the tactic has never been documented.
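One simple way to flag behavior that deviates from a learned baseline is a robust z-score built on the median and the median absolute deviation (MAD). The sketch below assumes a hypothetical metric (failed logins per hour) and uses the commonly cited 0.6745 scaling constant and 3.5 cutoff for this statistic; a production system would typically use richer features and models.

```python
import statistics

def fit_baseline(values):
    """Learn the median and median absolute deviation (MAD) of normal traffic."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    return median, mad

def anomaly_score(x, median, mad):
    """Modified z-score: distance from the median in robust units."""
    if mad == 0:
        return 0.0 if x == median else float("inf")
    return 0.6745 * abs(x - median) / mad

def is_anomaly(x, baseline, cutoff=3.5):
    median, mad = baseline
    return anomaly_score(x, median, mad) > cutoff

# Hypothetical history: failed logins per hour during normal operation.
history = [3, 5, 4, 6, 5, 4, 3, 5, 6, 4]
baseline = fit_baseline(history)

print(is_anomaly(5, baseline))   # a typical hour
print(is_anomaly(40, baseline))  # a burst no rule was written for
```

Nothing in the detector describes a specific attack; it only knows what normal looks like, which is why this family of methods can react to tactics that have not been seen before.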

Finally, RL algorithms are well suited to dynamic problems, such as a penetration test on a network. The DeepExploit framework covered in the book uses an RL agent on top of Metasploit to learn from previous pen tests and become better and better at finding vulnerabilities.
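The "learn from previous attempts" idea can be sketched with the simplest form of reinforcement learning, an epsilon-greedy multi-armed bandit. This is not DeepExploit's algorithm and it makes no Metasploit calls: the action names and their success probabilities are invented, and the target is simulated with a random number generator.

```python
import random

def run_bandit(success_prob, episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that learns which action succeeds most often."""
    rng = random.Random(seed)
    actions = list(success_prob)
    value = {a: 0.0 for a in actions}  # running estimate of each action's success rate
    pulls = {a: 0 for a in actions}    # how often each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore
        else:
            action = max(actions, key=value.get)  # exploit the best estimate so far
        # Simulated environment: reward 1 on success, 0 on failure.
        reward = 1.0 if rng.random() < success_prob[action] else 0.0
        pulls[action] += 1
        value[action] += (reward - value[action]) / pulls[action]  # incremental mean
    return value, pulls

# Hypothetical actions and success rates against a simulated target.
simulated_target = {"exploit_a": 0.05, "exploit_b": 0.10, "exploit_c": 0.80}
value, pulls = run_bandit(simulated_target)
best_action = max(value, key=value.get)
print(best_action, pulls)
```

A full RL agent would also track state (which hosts and services have been discovered so far), but the loop of acting, observing a reward, and updating an estimate is the same.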

Generative Adversarial Networks (GANs) are a popular branch of ML used to train systems against counterfeit data. How can they help with malware detection and with protecting systems by correctly identifying intrusions?

A good way to think about GANs is as a pair of neural networks pitted against each other. The loss of one is the objective of the other. As the two networks are trained, each becomes better at its job. We can then take whichever side of the tug of war we need, separate it from its rival, and use it. In other cases, we can “freeze” one of the networks, meaning that we do not train it and only use it for scoring. In the case of malware, the book covers MalGAN, a GAN for malware evasion. One network, the detector, is frozen; in this case it is an implementation of MalConv. The other network, the adversarial network, is trained to modify malware until MalConv’s detection score drops to zero. As it trains, it gets better and better at this.

In a practical situation, we would want to unfreeze both networks. Then we can take the trained detector and use it as part of our anti-malware solution. We would then be confident that it is very good at detecting evasive malware. The same ideas can be applied in a range of cybersecurity contexts, such as intrusion detection and deepfakes.
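The frozen-detector setup can be illustrated with a deliberately simplified sketch. It is not MalGAN or MalConv: the detector is a hand-written logistic scorer with invented weights over hypothetical API-import features, and a greedy search stands in for the trained adversarial network. The sketch also assumes the adversary may only add features, never remove them, so that the sample's original behavior is preserved.

```python
import math

# Hypothetical frozen detector: fixed weights over binary "imports API X" features.
WEIGHTS = {
    "CreateRemoteThread": 2.5, "WriteProcessMemory": 2.0, "CryptEncrypt": 1.5,
    "GetSystemTime": -0.8, "LoadIcon": -1.2, "MessageBoxW": -1.0,
}
BIAS = -1.0

def detect(features):
    """Detection score in (0, 1); the weights are never updated (frozen)."""
    z = BIAS + sum(WEIGHTS[f] for f in features)
    return 1 / (1 + math.exp(-z))

def evade(features, threshold=0.5, max_additions=10):
    """Greedy stand-in for the adversary: add benign-looking features
    until the frozen detector's score falls below the threshold."""
    current = set(features)
    for _ in range(max_additions):
        if detect(current) < threshold:
            break
        candidates = [f for f in WEIGHTS if f not in current and WEIGHTS[f] < 0]
        if not candidates:
            break  # nothing left to add; evasion failed
        current.add(min(candidates, key=WEIGHTS.get))  # most score-lowering feature
    return current

original = {"WriteProcessMemory", "CryptEncrypt"}
modified = evade(original)
print(round(detect(original), 3), round(detect(modified), 3))
print(sorted(modified - original))
```

In the two-sided setting described above, samples like `modified` would be fed back into the detector's training data, which is what is meant to make the final detector harder to evade.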

How the Machine Learning for Cybersecurity Cookbook can help you easily implement ML for cybersecurity problems

Which of the tools and recipes mentioned in your book can help cybersecurity professionals easily implement machine learning and make it part of their day-to-day work?

The Machine Learning for Cybersecurity Cookbook offers an astounding 80+ recipes. Which recipes are most useful varies between professionals, and even for one person different recipes apply at different points in their career. For cybersecurity experts who want to work with malware, the foundational chapter, Chapter 2: ML-based malware detection, offers a solid introduction to building a malware classifier. For more advanced malware analysts, Chapter 3: Advanced malware detection offers more sophisticated and specialized techniques, such as dealing with obfuscation and script malware.

Any cybersecurity expert would benefit from understanding Chapter 4, “ML for Social Engineering”. In fact, everyone should learn how ML can be used to trick unsuspecting users as part of their cybersecurity training. This chapter really shows that you have to be careful, because machines are getting better at imitating people. On the other hand, ML also provides the tools to recognize when such an attack is under way.

Chapter 5, “Penetration Tests with ML”, is a technical chapter best suited to cybersecurity experts who work on penetration testing. It covers 10 ways to improve pen testing with ML, including neural network-assisted fuzzing and DeepExploit, a framework that uses a reinforcement learning (RL) agent on top of Metasploit to perform automatic pen testing.

Chapter 6, “Automatic Intrusion Detection,” has broader appeal, because many cybersecurity experts need to know how to protect a network from intruders. They would benefit from seeing how ML can be used to stop zero-day attacks on their network. The chapter also covers many other use cases, such as spam filtering, botnet detection, and insider threat detection, which are more useful to some readers than to others.

Chapter 7, “Securing and Attacking Data with ML,” provides great content for cybersecurity professionals who want to use ML to improve their password security and other forms of data security.

Chapter 8, “Secure and Private AI,” is invaluable to data scientists in the cybersecurity field. The recipes in this chapter include federated learning and differential privacy (which allow an ML model to be trained on customer data without compromising privacy) and adversarial robustness testing (which helps improve the robustness of ML models against adversarial attacks).

Your book talks about using machine learning to generate custom malware in order to improve security. Can you explain how this works and why it matters?

Typically, you want to find your vulnerabilities before someone else (who might be up to no good) does. For this reason, pen testing has always been an important step in ensuring security. To pen test your antivirus well, it is important to use the latest malware evasion techniques, as the bad guys are sure to try them. Today, these are deep learning-based techniques for modifying malware.

About Emmanuel’s personal success in cybersecurity

Dr. Tsukerman, in 2017 your anti-ransomware product was included in the top 10 ransomware products of 2018 by PC Magazine. In your experience, why are ransomware attacks increasing and what makes an effective anti-ransomware product? In 2018, you also developed an ML-based malware detection system for Palo Alto Networks’ WildFire service for over 30,000 customers. Can you tell us more about this project?

If you follow cybersecurity news, you will see that ransomware continues to be a major threat. The reason is that ransomware offers cybercriminals an extremely attractive weapon. First, it is very difficult to trace the culprit from the malware or from a crypto wallet address. Second, the payouts can be massive, whether from hitting the right target (e.g., a HIPAA-compliant healthcare organization) or from hitting a large number of targets (e.g., all traffic to an e-commerce website). Third, ransomware is offered as a service, which effectively democratizes it!

On the other hand, much of the risk of ransomware can be mitigated with common-sense tactics. First, back up your data. Second, use an anti-ransomware solution that offers guarantees. A generic antivirus can offer no guarantee: it either catches the ransomware or it does not. If it does not, your data is toast. However, certain anti-ransomware solutions, such as the one I developed, do offer guarantees (e.g., no more than 0.1% of your files lost). Finally, since millions of new ransomware samples are developed each year, the malware solution must include a machine learning component to catch zero-day samples. This is another component of the anti-ransomware solution I developed.

The Palo Alto Networks project is a similar application of ML to malware detection. The one difference is that, unlike the anti-ransomware service, which is an endpoint security tool, it offers protection from the cloud. Since Palo Alto Networks is a firewall service provider, this makes a lot of sense: ideally, the malicious sample is stopped at the firewall and never even reaches the endpoint.

To learn how to implement the techniques outlined in this interview, read the Machine Learning for Cybersecurity Cookbook. Don’t wait; the bad guys aren’t waiting.

Author bio

Emmanuel Tsukerman graduated from Stanford University and received his PhD from UC Berkeley. In 2017, Dr. Tsukerman’s anti-ransomware product was listed among the top 10 ransomware products of 2018 by PC Magazine. In 2018, he designed an ML-based malware detection system for Palo Alto Networks’ WildFire service, which serves over 30,000 customers. In 2019, Dr. Tsukerman launched his first course in cybersecurity data science.

About the book

The Machine Learning for Cybersecurity Cookbook guides you through building classifiers and features for malware, which you can train and test on real-world samples. You’ll also learn how to build self-learning, reliable systems for cybersecurity tasks such as malicious URL detection, spam email detection, intrusion detection, network protection, tracking user and process behavior, and more!

