Database Security: An Overview and Analysis of Current Trend

International Journal of Management, Technology, and Social Sciences (IJMTS), 4(2), 53-58. ISSN: 2581-6012, 2019

6 Pages Posted: 19 Dec 2019 Last revised: 20 May 2020

Prantosh Paul

Raiganj University

P. S. Aithal

Institute of Management and Commerce, Srinivas University

Date Written: October 30, 2019

Information is the core and most vital asset today. The discipline that deals with information is called Information Science. Information Science covers a range of information-related activities: the collection, selection, organization, processing, management, and dissemination of information and content. Information Technology plays a leading role in these activities. Information Technology has several components, namely Database Technology, Web Technology, Networking Technology, Multimedia Technology, and traditional Software Technology, all of which contribute to building and advancing society. Database Technology is concerned with the database, which is a repository of related data held in a container or base. Data in a database are normally stored in different forms, and Database Technology plays the leading role in managing them. Databases have become very important in recent years because they are widely used in organizations and institutions, both for-profit and nonprofit. Today most organizations and sectors that handle sensitive and important data keep it in databases, so database security has become an important concern. The security of a large-scale database depends on a variety of defensive methods. This paper discusses the basics of databases, including their meaning, characteristics, and role, with special focus on the security challenges that databases face. It also highlights the basics of security management and the tools available for it. In short, the paper presents the different areas of database security in simple terms.

Keywords: Database, Database Technology, Security Technology, IT Management, Information Networking, Privacy and Security Management, Trust Management, Cloud Computing


P. S. Aithal (Contact Author)


Database security refers to the range of tools, controls, and measures designed to establish and preserve database confidentiality, integrity, and availability. This article will focus primarily on confidentiality since it’s the element that’s compromised in most data breaches.

Database security must address and protect the following:

Database security is a complex and challenging endeavor that involves all aspects of information security technologies and practices. It’s also naturally at odds with database usability. The more accessible and usable the database, the more vulnerable it is to security threats; the more invulnerable the database is to threats, the more difficult it is to access and use. (This paradox is sometimes referred to as Anderson’s Rule.)

By definition, a data breach is a failure to maintain the confidentiality of data in a database. How much harm a data breach inflicts on your enterprise depends on a number of consequences or factors:

Many software misconfigurations, vulnerabilities, or patterns of carelessness or misuse can result in breaches. The following are among the most common types and causes of database security attacks.

Insider threats

An insider threat is a security threat from any one of three sources with privileged access to the database:

Insider threats are among the most common causes of database security breaches and are often the result of allowing too many employees to hold privileged user access credentials.

Human error

Accidents, weak passwords, password sharing, and other unwise or uninformed user behaviors continue to be the cause of nearly half (49%) of all reported data breaches.

Exploitation of database software vulnerabilities

Hackers make their living by finding and targeting vulnerabilities in all kinds of software, including database management software. All major commercial database software vendors and open source database management platforms issue regular security patches to address these vulnerabilities, but failure to apply these patches in a timely fashion can increase your exposure.

SQL/NoSQL injection attacks

A database-specific threat, these involve the insertion of arbitrary SQL or non-SQL attack strings into database queries served by web applications or HTTP headers. Organizations that don’t follow secure web application coding practices and perform regular vulnerability testing are open to these attacks.
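The following Python sketch makes the injection risk concrete. It uses the standard-library `sqlite3` module with an in-memory database, and the `users` table, its rows, and the helper names are hypothetical. The first lookup splices user input directly into the SQL text and is vulnerable. The second passes the same input as a bound `?` parameter, which is the standard secure-coding remedy. The example illustrates the single technique only and is not a complete defense.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory table with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "a-secret"), ("bob", "b-secret")])
    return conn

def lookup_unsafe(conn, name):
    # VULNERABLE: user input becomes part of the SQL statement itself.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def lookup_safe(conn, name):
    # SAFER: the ? placeholder binds the input as a value, never as SQL syntax.
    return conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

conn = build_demo_db()
attack = "x' OR '1'='1"
print(lookup_unsafe(conn, attack))  # both rows' secrets leak
print(lookup_safe(conn, attack))    # []
```

The parameterized version also handles legitimate names containing quotes correctly. For this reason, parameter binding is generally preferred over escaping strings by hand.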

Buffer overflow exploitations

Buffer overflow occurs when a process attempts to write more data to a fixed-length block of memory than it is allowed to hold. Attackers may use the excess data, stored in adjacent memory addresses, as a foundation from which to launch attacks.

Malware

Malware is software written specifically to exploit vulnerabilities or otherwise cause damage to the database. Malware may arrive via any endpoint device connecting to the database’s network.

Attacks on backups

Organizations that fail to protect backup data with the same stringent controls used to protect the database itself can be vulnerable to attacks on backups.

These threats are exacerbated by the following:

Denial of service (DoS/DDoS) attacks

In a denial of service (DoS) attack, the attacker deluges the target server—in this case the database server—with so many requests that the server can no longer fulfill legitimate requests from actual users, and, in many cases, the server becomes unstable or crashes.

In a distributed denial of service (DDoS) attack, the deluge comes from multiple servers, making it more difficult to stop the attack.

Because databases are nearly always network-accessible, any security threat to any component within or portion of the network infrastructure is also a threat to the database, and any attack impacting a user’s device or workstation can threaten the database. Thus, database security must extend far beyond the confines of the database alone.

When evaluating database security in your environment to decide on your team’s top priorities, consider each of the following areas:

In addition to implementing layered security controls across your entire network environment, database security requires you to establish the correct controls and policies for access to the database itself. These include:

Database security policies should be integrated with and support your overall business goals, such as protection of critical intellectual property and your cybersecurity policies and cloud security policies. Ensure you have designated responsibility for maintaining and auditing security controls within your organization and that your policies complement those of your cloud provider in shared responsibility agreements. Security controls, security awareness training and education programs, and penetration testing and vulnerability assessment strategies should all be established in support of your formal security policies.

Today, a wide array of vendors offer data protection tools and platforms. A full-scale solution should include all of the following capabilities:


Because databases are rich in information, there will always be people who take risks to exploit them for ulterior purposes, and others who are committed to finding ways to counter these threats. The purpose of database security research is to prevent databases from being illegally used or destroyed. This paper reviews the main literature on database security research in recent years. First, we classify these papers, using the factors that influence database security as the classification criteria. While comparing traditional and machine learning (ML) methods, we intersperse explanations of key concepts to make these methods easier to understand. Second, we find that the related research has achieved encouraging results but also has shortcomings, such as weak generalization and deviation from real-world conditions. We then propose possible future work and, finally, summarize the main contributions.

Keywords: Database Security, Threat Agent, Traditional Approaches, Machine Learning


1. Introduction

Databases are widely used in production and daily life, but the data they hold are under severe security threats. Owing to the growth of computer networks, technical loopholes, and other factors, databases are frequently attacked [1]. In January 2019, data from a Philippine financial services company were leaked: records of more than 900,000 customers were stolen by unauthorized hackers. In September 2019, Facebook confirmed that the phone numbers of 419 million users had been leaked. In 2018, losses from network security incidents reached $45 billion, and most of these incidents involved databases. These cases show that the study of database security is urgent.

As data and database functions grow more complex, and as attackers' methods and technology improve, traditional methods can no longer keep pace. Machine learning (ML) can turn sequential scanning into a computational model and DBA (Database Administrator) experience into a predictive model. This makes intrusion detection more intelligent and dynamic, so that it can adapt to rapidly changing workloads [2], and today's computing power is sufficient to support such models. As a result, more and more papers apply machine learning to database security threats. However, few works have organized these approaches in a way that shows where machine learning outperforms traditional methods for particular types of threats.

This paper first identifies the sources of database security threats, as shown in Figure 1. We then review the papers that address these threats and find that machine learning has certain advantages. Finally, we point out the shortcomings of the relevant research and possible research directions.

The rest of this paper is organized as follows. Section 2 summarizes database security issues and solutions. Sections 3, 4, 5, and 6 examine solutions to database security threats from four aspects: ineffective data protection, user exceptions, vulnerability of the defense system, and external attacks. Section 7 discusses research prospects and summarizes the paper.

2. Database Security Issues and Solutions

With the development of IT, database security risks have become manifold [3]. Our review of the research on database security finds four factors closely related to it: data, user roles, the defense system, and external factors. Accordingly, we distinguish four main threat sources: ineffective data protection, abnormal users, a fragile defense system, and external attacks. Data threats can be further divided into three categories: data tampering, data exposure, and data being monitored or collected. User exceptions are subdivided into illegal behavior, unauthorized access, and weak security awareness. A weak defense system can be divided into vulnerabilities and inaccurate identification. External attacks are the main source of database security threats and cause the most serious damage. They include many secondary categories, such as spam, malicious traffic, SQL injection, illegal access, malware, DDoS attacks, and bypass and physical attacks. Researchers use a series of methods to deal with these threats and their impact, as shown in Table 1. The following four sections take up these four threat sources in turn and analyze in detail how each threat works and how it can be countered.

Figure 1. Sources of database security threats.

Table 1. Solutions and damage to database security threats.

3. Ineffective Data Protection Problems and Solutions

Because databases store large amounts of data, data is the most closely watched of the factors related to database security, and the data in a database face serious threats. In January 2018, data from an Indian citizen database were leaked, including private information such as fingerprints and general personal information such as birthdays. The main threat to data is ineffective protection, which leads to data exposure, data tampering, and data being monitored or collected. This section focuses on these threats.

3.1. Data Exposure Problems and Solutions

Data exposure means that data in a database are stored in clear text, so an attacker who breaks through the defense system can easily read them. In 2012, the database of Rambler in Russia was leaked. Even more alarmingly, nearly 100 million user passwords were exposed because they had been stored in plain text. Unfortunately, much data is still stored in clear text today for the sake of access efficiency.
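Passwords, in particular, do not need to be stored in recoverable form at all. A common remedy is to store only a random salt and a slow salted hash, then compare digests in constant time at login. The sketch below does this with Python's standard library, using PBKDF2-HMAC-SHA256 through `hashlib.pbkdf2_hmac`. This technique does not come from the cited papers. The iteration count and function names are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor; tune for your hardware

def hash_password(password, salt=None):
    """Return (salt, digest) computed with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for every stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, stored_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("guess123", salt, stored))  # False
```

With this approach, a leaked table exposes only salts and digests. The per-user salt also prevents identical passwords from producing identical stored values.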

Most researchers focus on data encryption. Ni et al. [4] proposed encrypting sensitive data and the database itself, but this method works only for specific systems and scales poorly. Wang et al. [5] designed a general database encryption and decryption engine. It encrypted data on the application side, used different user IDs to distinguish transmission commands, and finally used the user's private key for encrypted storage, but the authors did not clarify how keys are generated and distributed. Huang et al. [6] adopted a weighted encryption scheme with related access control policies. However, the encryption and decryption process may be excessively cumbersome, which makes application efficiency unsatisfactory. Zhang et al. [7] first classified the users of a web server into ordinary users and high-level users, and then encrypted only the high-level users' data. The method secured high-level users' data but may neglect ordinary users. Mei et al. [8] improved the AES algorithm and applied it within the database management system, encrypting user names, passwords, databases, and user activity with AES. This method had a wide range of applications and high reliability, but it processed only binary files. Dandekar et al. [9] combined SHA-256 with ASCII control-character replacement to hide database data. They used the SHA-256 algorithm to generate hash values, with the SOH character replacing binary information, and then compared the hash values with the encoded information, but the efficiency of this approach was poor. Andrey et al. [10] embedded special code elements and representative data into a symmetric cryptographic algorithm. They replaced plaintext elements with elements of a sequence associated with the key and then restored the plaintext with the key. The method effectively simplified encryption but provided weaker data security. Awais et al. [11] applied parallel query execution techniques and AES to different data records. They used hash functions on metadata and multithreading in .NET applications, and applied AES encryption before inserting data into the table. However, combining multiple technologies can cause conflicts. Uma et al. [12] used AES encryption and MD5 code conversion in the database of a medical records security system. AES processes medical data in 128-bit blocks, each divided into four basic parts. MD5 divides medical data of any length into 512-bit blocks and produces a fixed 128-bit result. The efficiency of this method is not high. He et al. [13] used quantum cryptography to encrypt database data, combining key dilution with auxiliary parameters. Only a few quanta were sent over the quantum channel to generate the initial key, which was then diluted by bitwise addition of several consecutive bits. The strategy performed well, but quantum cryptography is not yet mature. Fortunately, machine learning models have also been applied to data encryption. Shumeet [14] used a DNN (Deep Neural Network) to hide image data from the database inside other images. A DNN is a neural network with multiple hidden layers, and its basic structure is shown in Figure 2. He used a large number of bits to embed the RGB pixels of a full-color image into another image of the same size. A deep compression network then hid the encoded result within the appearance of the host image. Experiments showed that the hiding effect was good, but the hidden images required considerable extra storage space. Once an attacker detected a large number of hidden images, the image contents were easy to recognize.
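Python's standard `hashlib` module makes it easy to check the fixed-size digests mentioned above. The sketch below confirms that MD5 processes 512-bit blocks and always emits 128 bits, and that SHA-256 emits 256 bits, regardless of input length. The helper names are hypothetical. The sketch illustrates only the hash functions, not the full schemes of Dandekar et al. [9] or Uma et al. [12]. Note that MD5 is no longer considered collision-resistant and should not be chosen for new security designs.

```python
import hashlib

def digest_bits(algorithm, data):
    """Length in bits of the digest produced by a hashlib algorithm."""
    return len(hashlib.new(algorithm, data).digest()) * 8

def block_bits(algorithm):
    """Internal block size of the hash function, in bits."""
    return hashlib.new(algorithm).block_size * 8

short_record = b"id=42"
long_record = b"x" * 100_000
for algo in ("md5", "sha256"):
    print(algo, block_bits(algo), digest_bits(algo, short_record), digest_bits(algo, long_record))
# md5 512 128 128
# sha256 512 256 256
```

Because the output length never depends on input size, a stored digest can act as a compact fingerprint of a record of any length.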

Figure 2. Basic structure of deep neural network.

There are other ways to address database data exposure. Wang et al. [15] first designed a designated-verifier signature scheme based on authentication. After the root node is signed with this scheme, users need the server's participation to verify data with an MHT (Merkle hash tree). Experiments showed that verification was fast and that database data could be protected effectively, but the method was difficult to operate. Jovan et al. [16] brought blockchain technology to database security. Their system sent different coded data blocks through separate channels and used blockchains to store the encoding matrices of distributed storage systems. However, in some scenarios each channel must be highly uncorrelated to avoid data interaction, which limits this method. Researchers have also used auditing. Vitthal et al. [17] proposed data auditing on the public cloud by third-party auditors, who could read the data, but the cost might be unduly high. Modeling methodologies have also been considered. Minh et al. [18] attempted to build a common model for database data security using cloud services, performing a feasibility analysis of information to create risk models. On the machine learning side, Boudheb et al. [19] used genetic algorithms and Naive Bayes to protect medical data. A genetic algorithm is a computational model that simulates the natural selection and genetic mechanisms of Darwinian biological evolution. Naive Bayes assumes that features are conditionally independent given the class. It combines the prior probability of each class with the feature likelihoods to obtain the posterior probability, and then assigns the object to the class with the maximum posterior probability [20]. The calculation steps are shown in Figure 3. Medical data come from many sources and are stored in complex ways, so the choice of security features played a decisive role in training the model. The paper used the most representative security features, such as patient identification, birthday, and blood type.
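The maximum-posterior rule described above can be sketched as a small categorical Naive Bayes classifier in plain Python. The training rows are hypothetical access records (role, time of day, table touched) labeled `normal` or `suspicious`. They are not the medical features or the genetic-algorithm pipeline of Boudheb et al. [19]. Scores are summed in log space to avoid underflow. Add-one (Laplace) smoothing is an added assumption that keeps an unseen feature value from zeroing out a class.

```python
import math
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (feature index, class) -> value counts
    vocab = defaultdict(set)       # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            counts[(i, label)][value] += 1
            vocab[i].add(value)
    return priors, counts, vocab, len(labels)

def predict(model, row):
    """Return the class with the largest log posterior (up to a shared constant)."""
    priors, counts, vocab, total = model
    best_label, best_score = None, -math.inf
    for label, n_label in priors.items():
        score = math.log(n_label / total)  # log prior P(class)
        for i, value in enumerate(row):
            # log likelihood P(value | class), with add-one smoothing
            score += math.log((counts[(i, label)][value] + 1) / (n_label + len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [("nurse", "day", "records"), ("nurse", "day", "records"),
        ("doctor", "day", "records"), ("nurse", "night", "billing"),
        ("guest", "night", "records"), ("guest", "night", "billing")]
labels = ["normal", "normal", "normal", "normal", "suspicious", "suspicious"]
model = train(rows, labels)
print(predict(model, ("guest", "night", "records")))  # suspicious
print(predict(model, ("nurse", "day", "records")))    # normal
```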

3.2. Data Tampering Problems and Solutions

Data tampering means that data in the database have been illegally altered, so that the original data are lost, replaced, or have records added or removed. In January 2010, the website of an educational examination center was hacked. An intruder logged into the database and added a record stating that a certain person had passed an exam, which seriously violated the fairness of the examination.

Some research aims to prevent data tampering. Piggin et al. [21] used honeypot technology in common physical components of a database system to lure attackers into modifying fake data and so protect the truly valuable data. However, there is a risk that attackers could use the honeypot itself to launch attacks. Elena et al. [22] implemented data entry through spin current. They exploited the high variability in the resistance of magnetic tunnel junction devices, together with a special configuration of the read-operation reference units, to make data physically unclonable. The method's effectiveness was proven in theory but lacked practical verification. Advances in machine learning also offer opportunities here. Some researchers have focused on ECG (electrocardiogram) data, which is physically unclonable and can effectively counter data tampering. Yin et al. [23] learned and extracted different features before training a neural network, minimizing the overlap between the intra-individual and inter-individual distributions of cosine/Hamming distances, but the approach required a large amount of computation. Kiran [24] introduced a least absolute shrinkage and selection operator (LASSO) to identify the most appropriate ECG features. This method effectively avoided random, correlated, and overfitting features, reduced the feature space, and improved prediction speed, at the cost of slightly lower detection accuracy. He also proposed an efficient ECG feature extraction method [25] that extracted six optimal segments by priority and normalized their positions, but it may overfit.

Figure 3. Steps of simple Bayesian calculation.

Other research deals with data tampering after it has happened. Li et al. [26] aimed to keep the normal data query service running after part of the data had been polluted. They used query service rules to decide whether to return the legitimate part of the data that the user requested. This method improved database usability but could not locate the polluted data. Yin et al. [27] designed a tamper-detection mechanism that applied signatures both horizontally and vertically, so that tampering with a data table could be detected through the signatures. However, the system was costly and cumbersome to operate. Xian et al. [28] took a simpler approach. After responding to a query request, the server sent the query party several items: the verification value, the mask of the verification tree, the signature of the mask, and the number of root nodes of the verification tree. The query party could then verify whether the data had been tampered with. This method effectively reduced computation but also reduced security. Among machine learning methods, Lai et al. [29] used the K-means clustering algorithm. K-means randomly selects k points as initial centroids and assigns each point in the dataset to the cluster of its nearest centroid. It then recomputes each centroid as the mean of its cluster and repeats the assignment and update steps until the clusters stop changing. To detect tampered web page data, they collected information from the front pages of some websites and built detection rules by classifying the data, which let them determine whether a web page had been altered. However, this method required tuning the detector, which takes rich experience in dealing with hackers, and incorrect tuning would greatly reduce recognition performance.
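The verification-tree idea behind Wang et al. [15] and Xian et al. [28] is commonly built on a Merkle hash tree. The sketch below is a generic version, not either paper's scheme. It hashes records with SHA-256, pairs hashes upward to a single root, and then checks one record against the root using only the sibling hashes on its path. It assumes the root itself is trusted, for example because the data owner has signed it. Duplicating the last node on an odd-sized level is one convention among several. The function names and records are hypothetical.

```python
import hashlib

def h(data):
    """SHA-256 digest used for both leaves and internal nodes."""
    return hashlib.sha256(data).digest()

def _next_level(level):
    """Hash adjacent pairs; an odd last node is paired with itself."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def merkle_root(records):
    level = [h(r) for r in records]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def merkle_proof(records, index):
    """Sibling hashes from leaf `index` up to the root, each with a left/right flag."""
    level = [h(r) for r in records]
    proof = []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))  # True: sibling is on the left
        level = _next_level(level)
        index //= 2
    return proof

def verify(record, proof, root):
    """Rebuild the path to the root and compare with the trusted root."""
    node = h(record)
    for sibling_hash, sibling_is_left in proof:
        node = h(sibling_hash + node) if sibling_is_left else h(node + sibling_hash)
    return node == root

records = [b"id=1,grade=B", b"id=2,grade=C", b"id=3,grade=A"]
root = merkle_root(records)          # assumed to be signed or otherwise trusted
proof = merkle_proof(records, 1)
print(verify(b"id=2,grade=C", proof, root))  # True
print(verify(b"id=2,grade=A", proof, root))  # False: the record was altered
```

The verifier needs only one hash per tree level, so checking a single record costs logarithmic rather than linear work in the table size.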

3.3. Data Monitoring or Collection Problems and Solutions

In this threat, an attacker eavesdrops on or collects data during transmission and then analyzes them to learn about the target. Recently, some social software has been exposed for monitoring users' chat records, which seriously violates user privacy.

Data encryption is the most common solution. Kushko et al. [30] proposed a new method to protect network data transmission. They concealed the interaction between nodes in the network and used encryption, multicast, and packet retransmission for traffic exchange, but the operation was too complicated. Andrey et al. [31] combined homomorphic encryption with a logistic regression model. Homomorphic encryption allows certain algebraic operations to be performed directly on ciphertext, and the result is still encrypted. Decrypting it gives the same result as performing the operations on the plaintext. They reduced the storage needed for the encrypted database by using an approximate homomorphic encryption scheme, and sped up computation with an accelerated gradient method for training the logistic regression model. However, logistic regression is prone to underfitting. Beyond encryption, Li et al. [32] used remote method invocation so that the server could receive and parse network packets forwarded by a server-side proxy. The server then filtered the address information securely and invoked the JDBC driver to connect to each database management system, exchange data, and return the results. However, the solution was costly to implement.
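A toy version of the Paillier cryptosystem can illustrate the homomorphic property described above. Paillier is additively homomorphic: multiplying two ciphertexts produces an encryption of the sum of their plaintexts. This is a substituted example, not the approximate scheme used by Andrey et al. [31]. The tiny primes make it completely insecure, so it only demonstrates that an operation on ciphertext decrypts to the matching operation on plaintext. The function names are hypothetical, and the code needs Python 3.9+ for `math.lcm`.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """c = g^m * r^n mod n^2 with g = n + 1 and random r coprime to n (0 <= m < n)."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, private_key, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)

n, sk = keygen()
rng = random.Random(0)
c_sum = add_encrypted(n, encrypt(n, 20, rng), encrypt(n, 22, rng))
print(decrypt(n, sk, c_sum))  # 42
```

The party that computes `add_encrypted` never sees 20, 22, or 42. Only the holder of the private key can read the result.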

4. User Exception Problems and Solutions

User exceptions are the hardest database security threats to guard against. In March 2017, Tencent and the Jingdong security team jointly uncovered a case of insider theft. An insider at Jingdong stole more than 5 billion pieces of information, which were then sold for profit through various illegal channels, causing Jingdong huge economic and reputational losses. Researchers subdivide user anomaly threats into illegal behavior, unauthorized access, and weak security awareness.

4.1. Illegal Acts Problems and Solutions

Illegal behavior refers to user actions that violate the user's assigned role or the behavior rules of the database, such as unauthorized access to the database or illegal operations in the database system.

Researchers aim to detect such behavior. Chen et al. [33] used C and C# to track and analyze database operations, together with database and server status, in real time, but the efficiency needs improvement. Machine learning models have been introduced to increase processing speed. Liu [34] used the naive Bayesian classification algorithm to build a profile for each database role, trained a user behavior database, and then classified database transactions against that behavior database, but the work lacked experimental support. Andrey et al. [35] used the K-means clustering algorithm to process text log information. They converted the text logs into clustering vectors, computed outliers, and ranked the resulting anomalies to find the cluster to which each user-behavior log most likely belonged. However, the processing accuracy needs improvement, and the method applies only to single-structure text logs.
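The K-means workflow used by Lai et al. [29] and Andrey et al. [35] can be sketched as follows, with the distance to the nearest centroid serving as a simple outlier score. The two-dimensional points are hypothetical log features, such as queries per minute and thousands of rows read. Real systems would first vectorize the text logs, which this sketch does not attempt. For reproducibility, it seeds the centroids with the first k points instead of choosing them at random.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute means."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic seeding for the demo
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids

def outlier_scores(points, centroids):
    """Distance from each point to its nearest centroid; larger means more anomalous."""
    return [min(math.dist(p, c) for c in centroids) for p in points]

points = [(1, 1), (10, 10), (1.2, 0.9), (10.5, 9.5), (0.8, 1.1), (9.8, 10.2), (30, 2)]
centroids = kmeans(points, k=2)
scores = outlier_scores(points, centroids)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(points[most_anomalous])  # (30, 2)
```

The outlier still pulls its cluster's centroid toward itself. This is one reason the accuracy of plain K-means-based anomaly ranking can be limited.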

4.2. Unauthorized Access Problems and Solutions

Unauthorized access refers to users illegally accessing data beyond their privileges, for example through delegation. Through privilege escalation, an ordinary user can become an administrator or even a super administrator and then obtain other users' data.

Access control is a widely used solution. Xu et al. [36] assigned each user a multilevel role name from which internal roles are derived before permissions are granted, but this method could not effectively resist covert-channel access. He et al. [37] used a security baseline to evaluate database access control and took measures to improve it after quantifying a score. However, they gave no specific method for improving access control. An et al. [38] used the history of multiple connection pools and different configurations to achieve strict, dynamic access control. Yang et al. [39] proposed refining database access control through permission extension. They split the primary key of the permission table into a corresponding storage structure and saved permission information as built-in key values to achieve finer-grained access control, but the applicable scenarios were limited.
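The schemes above build on role-based access control. The following sketch is a generic deny-by-default check with single-parent role inheritance. It does not reproduce the multilevel role naming of Xu et al. [36] or the permission-table extension of Yang et al. [39]. The roles, tables, and permissions are hypothetical.

```python
# Hypothetical roles: each role inherits the permissions of its parent role.
ROLE_PARENT = {"auditor": None, "clerk": None, "manager": "clerk", "dba": "manager"}
ROLE_PERMISSIONS = {
    "auditor": {("audit_log", "read")},
    "clerk": {("orders", "read")},
    "manager": {("orders", "update")},
    "dba": {("users", "read"), ("users", "update")},
}

def effective_permissions(role):
    """Union of a role's own permissions and those of all its ancestors."""
    permissions = set()
    while role is not None:
        permissions |= ROLE_PERMISSIONS.get(role, set())
        role = ROLE_PARENT.get(role)
    return permissions

def check_access(user_roles, table, action):
    """Deny by default; allow only if some assigned role grants (table, action)."""
    return any((table, action) in effective_permissions(r) for r in user_roles)

print(check_access({"manager"}, "orders", "read"))  # True (inherited from clerk)
print(check_access({"clerk"}, "orders", "update"))  # False
print(check_access({"auditor"}, "users", "read"))   # False
```

Denying by default means that a missing or misspelled grant fails closed, which is the safer failure mode for the privilege-escalation risk described above.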

4.3. Weak Security Awareness Problems and Solutions

Weak security awareness means that database users, to save themselves trouble, create attack points that attackers may exploit, such as setting weak passwords or leaving the database's default password unchanged. Because awareness can be improved through education, there are few technical research papers on this threat. Yung et al. [40] used a questionnaire to investigate how security awareness affects security performance management and the use of information technology in banks. They concluded that compliance had a significant impact on information security management performance and information technology capabilities.

5. Defense System Vulnerability Problems and Solutions

The vulnerability of a database defense system appears at two layers: the operating system layer and the database layer. At the operating system layer, the user's host can easily be taken over by hackers and used for attacks. At the database layer, the problems are an unclear division of storage authority and incorrect configuration by the DBA. For example, SQL Server had a weakness in which the default password of SA, the super administrator account, was empty, so attackers could log in to SQL Server directly through the SA account without a password. There are two reasons for the vulnerability of database defense systems: first, defects in the initial configuration, and second, inaccurate identification by the system.

5.1. Bug Problems and Solutions

A bug here refers to a design defect in the database defense system. In May 2011, hackers used an Oracle database user account to break into the database of the Korea Convention and Exhibition Center. The system was compromised because the DBSNMP user still had its default password.

Most researchers adopt a strategy of defending in advance. Gao [41] designed a database security evaluation model for SQL Server, Sybase, and Oracle that could evaluate the overall security of a database. Kozlov et al. [42] used fuzzy logic to evaluate the security of an enterprise information management system. They expressed all possible threats to the system as a function, with each value of the function representing a possible threat, and constructed an attack tree to deal with each threat. Zhang et al. [43] presented an intelligent security assessment method for system software. They used crawlers to obtain natural-language evaluation data and then applied various machine learning methods to derive security evaluation indicators and build a security assessment model, but the method was not easy to implement.
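Kozlov et al. [42] combine fuzzy logic with attack trees, but the fuzzy scoring is not reproduced here. The sketch below shows only basic attack-tree evaluation. Leaves are attacker capabilities, AND nodes need all of their children, and OR nodes need any one child. The root goal is reachable if the tree evaluates to true. The tree is loosely modeled on the default-password incident above, and its node names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One attack-tree node; kind is "leaf", "and", or "or"."""
    name: str
    kind: str = "leaf"
    children: list = field(default_factory=list)

def achievable(node, capabilities):
    """True if the attacker's capabilities satisfy this node."""
    if node.kind == "leaf":
        return node.name in capabilities
    results = [achievable(child, capabilities) for child in node.children]
    if node.kind == "and":
        return all(results)
    if node.kind == "or":
        return any(results)
    raise ValueError(f"unknown node kind: {node.kind}")

# Hypothetical goal: read the customer table.
tree = Node("read customer table", "or", [
    Node("log in as DBSNMP", "and", [
        Node("default password unchanged"),
        Node("listener reachable"),
    ]),
    Node("injectable web form"),
])

print(achievable(tree, {"default password unchanged", "listener reachable"}))  # True
print(achievable(tree, {"listener reachable"}))  # False
```

In this model, changing the default password removes one leaf and cuts off an entire branch. This is how attack trees help rank which defensive fix matters most.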

5.2. Inaccurate Identification Problems and Solutions

Inaccurate identification means that, during identification, the database wrongly classifies illegal users as normal users or normal users as illegal users. Biometric identification is a good identification method because it is fast, secure, and developing rapidly, but biometric verification also has problems.
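The two kinds of misidentification correspond to two metrics commonly used to evaluate biometric systems. The false acceptance rate (FAR) measures illegal users who are accepted, and the false rejection rate (FRR) measures legitimate users who are rejected. This sketch computes both for a given match-score threshold, using hypothetical score lists.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Accept a sample when its match score >= threshold; return (FAR, FRR)."""
    false_accepts = sum(score >= threshold for score in impostor_scores)
    false_rejects = sum(score < threshold for score in genuine_scores)
    far = false_accepts / len(impostor_scores)  # illegal users accepted
    frr = false_rejects / len(genuine_scores)   # legitimate users rejected
    return far, frr

genuine = [0.91, 0.85, 0.78, 0.66, 0.95]   # hypothetical scores of real users
impostor = [0.12, 0.35, 0.70, 0.40, 0.05]  # hypothetical scores of impostors
for t in (0.6, 0.75):
    print(t, error_rates(genuine, impostor, t))
# 0.6 (0.2, 0.0)
# 0.75 (0.0, 0.2)
```

Raising the threshold lowers FAR and raises FRR. For this reason, the threshold is a policy decision rather than a purely technical one.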

Prabu et al. [44] used an effective linear binary pattern and a scale-invariant Fourier transform to process hand-type and iris biometrics and store them in a database, and then used neural network and Bayesian network classifiers for detection; because the two biometric features were mixed together, recognition was inefficient. Musab et al. [45] used a CNN (Convolutional Neural Network) to improve face recognition. A CNN is a feed-forward neural network with a deep structure that includes convolution operations; its basic structure is shown in Figure 4 below. The authors improved the CNN by adding a standardization operation between the input layer and the output layer; the improvement could accelerate network standardization, but over-fitting remained a problem in face recognition. Aishwarya et al. [46] used aggregation and RF (Random Forest) to improve the face recognition rate. RF introduces random attribute selection into the training of each decision tree [47]. They used local aggregation to store the features of detected face images and then used RF to train on and classify the face images; this method consumed a lot of storage space. Moreover, Csaba [48] pointed out the inherent problems of biometrics: lack of relevant features, high cost, and privacy issues.

Figure 4. Basic structure of convolutional neural network.
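To make the convolution, activation and pooling stages of a CNN concrete, the sketch below runs a single forward pass on a tiny grayscale image in pure Python. It is a minimal illustration, not the network from [45]: the kernel is a hand-picked edge detector rather than a learned one, and the fully connected output layers are omitted. As in most deep learning frameworks, the "convolution" is computed as cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a list-of-lists image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def relu(fmap):
    """Element-wise ReLU activation."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fill a window are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


# 5x5 image with a dark-to-bright vertical edge, and a hand-picked edge kernel
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1],
          [-1, 1]]
print(max_pool(relu(conv2d(image, kernel))))  # [[2, 0], [2, 0]]
```

The pooled map keeps a strong response only where the edge sits, which is the kind of local feature the later layers of a face-recognition CNN combine.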

6. External Attack Problems and Solutions

External attack refers to external attackers directly threatening database security by various means. For example, ransomware that appeared in the first half of 2019 encrypted important data in users' systems and caused great damage to social service infrastructure on a global scale. As the main threat to database security, external attacks can be roughly divided into seven categories: spam, malicious traffic, SQL injection, illegal access, malware, DDoS attacks, and bypass and physical attacks.

6.1. Spam Problems and Solutions

Spam refers to large volumes of e-mail that attackers send to users containing phishing content, advertisements, viruses and the like. Junk mail occupies a large amount of storage space in the e-mail database, and users may suffer economic losses after clicking on such e-mails. The operation of the mail system involves multiple databases and protocols; the specific process is shown in Figure 5.

Researchers use machine learning methods to improve spam detection. He et al. [49] used a linguistic decision tree to improve spam detection based on semantic features. A linguistic decision tree classifies samples with different linguistic attributes through a tree structure. They extracted feature information from spam and decomposed junk mail into several feature subsets according to the meaning of the attributes. They then processed, classified and trained on these feature subsets with the linguistic decision tree to obtain a spam classification model; however, they did not consider that the machine learning model itself could be attacked.
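A decision tree chooses which attribute to split on by how much that attribute reduces uncertainty about the label. The sketch below computes the classic information-gain criterion (as in ID3) for crisp spam features; a linguistic decision tree such as the one in [49] works with fuzzy linguistic attributes instead, so this is a simplified stand-in. The feature names and four-message dataset are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(samples, labels, attribute):
    """Entropy reduction obtained by splitting the samples on one attribute."""
    n = len(labels)
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(sample[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical e-mail features
samples = [
    {"has_link": True,  "known_sender": False},
    {"has_link": True,  "known_sender": True},
    {"has_link": False, "known_sender": True},
    {"has_link": False, "known_sender": False},
]
labels = ["spam", "spam", "ham", "ham"]
best = max(["has_link", "known_sender"],
           key=lambda a: information_gain(samples, labels, a))
print(best)  # has_link
```

Here `has_link` separates the classes perfectly (gain of 1 bit) while `known_sender` tells us nothing (gain of 0), so the tree would split on `has_link` first.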

6.2. Malicious Traffic Problems and Solutions

Malicious traffic refers to large numbers of requests forged by external attackers with various tools to prevent normal users from accessing the database. In August 2019, the snapex platform was hit by a malicious traffic saturation attack, which left platform users temporarily unable to access it; some users suffered economic losses because they could not trade virtual currency.

Researchers deal with malicious traffic in many ways. Zhang et al. [50] used security auditing. They first captured users' database access operations, then submitted the data to auditors for analysis, and finally fed back the results. The method relied too heavily on the expertise of auditors and was inefficient for large-scale data. Yu [51] introduced a CNN (convolutional neural network) for intrusion detection. He numerically encoded the KDD99 network traffic dataset, set the learning rate, number of iterations, sample size and other parameters of the CNN, and finally trained it on the traffic data to generate a traffic classification model. Stochastic gradient descent was used during training to accelerate convergence; however, classification accuracy for some other types of malicious traffic was low, and the method was evaluated only on a classical dataset, not in practical applications.

Figure 5. Workflow of mail system.
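Before a CNN can be trained on KDD99 records, symbolic fields must be turned into numbers and numeric fields rescaled. The sketch below shows one common way to do this, one-hot encoding for the `protocol_type` field (values `tcp`, `udp`, `icmp`) and min-max scaling to [0, 1] for `src_bytes`. It is an assumed preprocessing scheme, not necessarily the one used in [51], and it handles only a hypothetical two-field subset of a KDD99 row.

```python
PROTOCOLS = ["tcp", "udp", "icmp"]  # values of KDD99's protocol_type field


def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 indicator vector."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def min_max(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in column]


def encode(records):
    """records: list of (protocol_type, src_bytes) pairs (hypothetical subset)."""
    scaled_bytes = min_max([src_bytes for _, src_bytes in records])
    return [one_hot(proto, PROTOCOLS) + [b]
            for (proto, _), b in zip(records, scaled_bytes)]


rows = encode([("tcp", 0), ("udp", 500), ("icmp", 1000)])
print(rows)  # [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.5], [0.0, 0.0, 1.0, 1.0]]
```

In a real pipeline the minimum and maximum would be taken from the training split only and reused on test traffic, so that the model never sees statistics from data it is evaluated on.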

6.3. SQL Injection Problems and Solutions

SQL injection means that external attackers submit crafted query code to the database and obtain the desired data based on the database's responses. For example, by entering 1' OR '1' = '1 into the login fields, an attacker can make the database execute SELECT * FROM users WHERE (username = '1' OR '1' = '1') AND (password = '1' OR '1' = '1'). Because both conditions are always true, the check on the user name and password is bypassed and the attacker logs directly into the database management system.
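The login bypass described above can be reproduced with Python's built-in `sqlite3` module and an in-memory database. The sketch contrasts a query built by string concatenation with a parameterized query, which is a widely used general defense rather than one of the surveyed detection methods. The `users` table and the `alice` account are hypothetical.

```python
import sqlite3


def setup():
    """In-memory database with one hypothetical account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn


def login_unsafe(conn, username, password):
    # VULNERABLE: user input is spliced directly into the SQL text
    query = ("SELECT * FROM users WHERE username = '" + username +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchall()


def login_safe(conn, username, password):
    # Placeholders make the driver treat input strictly as data, never as SQL
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchall()


conn = setup()
payload = "1' OR '1'='1"
print(login_unsafe(conn, payload, payload))  # [('alice', 's3cret')]
print(login_safe(conn, payload, payload))    # []
```

With concatenation the payload closes the string literal and adds an always-true `OR` clause; with placeholders the whole payload is compared literally against the stored name and password, so no rows match.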

There are many ways to address SQL injection. Li et al. [52] prevented SQL injection by evaluating the integrity of user behavior policies. The administrator stored the mandatory access policy and the digest of the behavior constraint code in the database management system. After a user submitted a transaction, the TSB (trusted software base) computed the digest of the behavior constraint code and compared it with the stored value to detect anomalies; however, this approach could not handle logical operations or concurrent transactions. Ma et al. [53] defined a set of triples for each database access, consisting of the user name, the password and the SQL injection detection result, to describe the probability of database intrusion, and then applied a pattern matching algorithm. If the user name and password matched and the SQL injection detection result was normal, the user was allowed access; however, SQL injection detection took too long, so processing time was long and the method could not meet real-time requirements. Xing et al. [54] proposed a real-time SQL injection detection model with two parts: a real-time detection module, which inspected access packets in the actual network environment, and a model training module. The model training module used an improved convolutional neural network and normalized the SQL injection samples to the range 0 to 1. During training it used the ReLU activation function and the ADM algorithm, together with a dropout strategy to prevent over-fitting; this method was slow to detect.
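The digest comparison in Li et al.'s scheme follows a simple pattern: store a hash of the approved behavior ahead of time, recompute it at run time, and compare. The sketch below shows that pattern with SHA-256 from `hashlib` and a constant-time comparison from `hmac`. The "constraint code" is represented here by a hypothetical SQL template string; the actual TSB and policy format in [52] are not reproduced.

```python
import hashlib
import hmac


def digest(constraint_code: str) -> str:
    """SHA-256 digest of a behavior-constraint string."""
    return hashlib.sha256(constraint_code.encode("utf-8")).hexdigest()


# Stored in advance by the administrator (hypothetical approved statement)
REFERENCE = digest("SELECT balance FROM accounts WHERE id = ?")


def is_consistent(observed_code: str) -> bool:
    """True if the observed behavior matches the stored reference digest."""
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(digest(observed_code), REFERENCE)


print(is_consistent("SELECT balance FROM accounts WHERE id = ?"))           # True
print(is_consistent("SELECT balance FROM accounts WHERE id = 1 OR 1 = 1"))  # False
```

Any change to the statement, including an injected `OR` clause, changes the digest, which is why this check flags tampering but cannot reason about the logic of legitimate variations.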

Specific machine learning methods are also used to resist SQL injection attacks. Hu et al. [55] used vulnerability mining to address SQL injection. They labelled PHP SQL injection code, transformed it with a bag-of-words model, and classified it with SVMs (support vector machines). Finally, new PHP files were fed into the model for classification. Only SQL injection attacks in PHP code were handled. Solomon et al. [56] used a pattern-driven corpus to reduce the harm of SQL injection attacks on back-end databases. They extracted SQL injection code, applied regex-constrained analytical learning, hashed the features into a matrix suitable for classifiers, and split the feature matrix into a training set and a test set. Finally, they trained a support vector machine on the feature matrix to obtain an SVM classifier; this approach was limited to specific types of SQL injection attacks.
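Both studies turn SQL text into fixed-length numeric vectors before training an SVM. The sketch below shows one plausible version of that front end: a regular-expression tokenizer followed by the hashing trick, which maps each token to one of `dim` buckets and counts occurrences. The token pattern and the bucket count are assumptions for illustration; the classifier itself is not shown.

```python
import re
import zlib

# Hypothetical token pattern: words, numbers, comment markers, quotes, operators
TOKEN = re.compile(r"[a-z_]+|\d+|--|'|[=()*;,]")


def tokens(query):
    """Lower-case the query and split it into SQL-ish tokens."""
    return TOKEN.findall(query.lower())


def hash_features(query, dim=16):
    """Bag-of-words vector of fixed length via the hashing trick."""
    vec = [0] * dim
    for tok in tokens(query):
        # crc32 is stable across runs, unlike Python's salted built-in hash()
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1
    return vec


q = "SELECT * FROM t WHERE id = '1' OR '1'='1'"
print(tokens("admin'--"))  # ['admin', "'", '--']
print(len(tokens(q)))      # 18
```

Hashing keeps the vector size fixed no matter how many distinct tokens appear, at the cost of occasional collisions between tokens that share a bucket.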

6.4. Illegal Access Problems and Solutions

Illegal access refers to an external attacker accessing the database for illegal purposes, either to damage the database or to obtain desired data. Attackers write code that bypasses the database management system and its authorization mechanism and directly accesses and modifies the data through the operating system.

A small number of researchers adopt distinctive approaches. Xiang et al. [57] used the SSL protocol to build a secure channel between the server and database users and adopted double authentication. Fu [58] proposed a strategy for building a database security defense system, using antivirus software and data mining technology to analyze database information in depth. Seok et al. [59] combined a convolutional neural network with a learning classifier system to improve detection. They extracted features from logs to obtain feature vectors, selected features by rules, trained a convolutional neural network, obtained a chromosome model with better characteristics through genetic operations, and classified the features extracted from new transactions to find anomalies; however, the poor genetic operations made it difficult to ensure stable classification results.
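A secure channel such as the one in Xiang et al.'s design depends mostly on client-side TLS settings. The sketch below uses Python's standard `ssl` module to build a client context that verifies the server certificate and host name, rejects protocol versions older than TLS 1.2, and optionally presents a client certificate so that both sides are authenticated. The file paths are hypothetical parameters, and real database drivers usually expose their own TLS options on top of this.

```python
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS context for connecting to a database server."""
    # Defaults: server certificate required and host name checked
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Refuse legacy protocol versions
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        # Optional client certificate, so the server can authenticate the client too
        ctx.load_cert_chain(cert_file, key_file)
    return ctx


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

A socket would then be wrapped with something like `ctx.wrap_socket(sock, server_hostname="db.example.internal")`, where the host name is a placeholder.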

Most researchers use intrusion detection methods. Wang et al. [60] monitored, counted and analyzed logs and illegal access behavior in the SQL Server 2000 database. Li et al. [61] used SQL statement structure, statement operation data and system behavior to detect whether a transaction was submitted by a normal user; this method was prone to single-task misses. Tang et al. [62] focused on illegal scanning of server ports. They extracted several features from the logs with a Naive Bayes algorithm and set a threshold; if the threshold was exceeded, the Naive Bayes model classified the behavior as scanning and blocked it. This method relied on log information and could not handle half-open scanning. Zhang [63] combined a support vector machine with an ant colony algorithm to build an intrusion detection classifier. The specific process is shown in Figure 6. The recognition accuracy of the model exceeded 95%, and recognition was fast. However, the ant colony algorithm easily falls into local optima, which left the parameters suboptimal and in turn affected classification accuracy. Jong et al. [64] addressed illegal connections through traffic analysis. They developed an abnormal link detection system that could detect the real-time data flow of a MySQL database, but the method had only a narrow range of application. Salimov et al. [65] detected and countered attacking IP addresses that exceeded a specified threshold; they created a whitelist for database access and used honeypot technology to lure attackers to specific databases, where the attackers were analyzed. Finally, a model was established according to threat level. Sharmila et al. [66] used density-based clustering and supervised learning for database intrusion detection. They constructed clusters of normal user behavior; each transaction submitted by a user was either located in a normal behavior cluster or, because of its local outlier factor, classified into an abnormal behavior cluster, so that intrusions could be identified. However, this method had a high false positive rate.

Figure 6. Network intrusion detection classifier construction process.
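The threshold-plus-whitelist idea in Salimov et al.'s approach can be sketched in a few lines: count requests per source address within one time window, ignore whitelisted addresses, and flag every other address whose count exceeds the threshold. The whitelist entry, the threshold and the sample addresses (from documentation ranges) are hypothetical, and the honeypot and threat-level model are not covered.

```python
from collections import Counter

WHITELIST = {"10.0.0.5"}  # hypothetical trusted application server
THRESHOLD = 3             # hypothetical max requests per window


def flag_sources(request_ips, whitelist=WHITELIST, threshold=THRESHOLD):
    """Return non-whitelisted source IPs whose request count exceeds the threshold."""
    counts = Counter(ip for ip in request_ips if ip not in whitelist)
    return sorted(ip for ip, n in counts.items() if n > threshold)


window = ["10.0.0.5"] * 10 + ["203.0.113.7"] * 5 + ["198.51.100.2"] * 2
print(flag_sources(window))  # ['203.0.113.7']
```

The trusted server sends the most requests yet is never flagged, which shows why a whitelist is usually paired with such thresholds to keep false positives down.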

6.5. Malware Problems and Solutions

Malicious software (malware) is software that attackers secretly install on a victim's terminal to monitor, collect or even destroy resources, including database data. For example, an attacker can obtain database administrator privileges with a rootkit tool and then copy, modify, delete or otherwise destroy database data.

While mobile phones bring convenience and entertainment, there are also malicious applications that steal data, monitor calls and even damage phone storage. Liu et al. [67] used the K-nearest neighbor (KNN) algorithm to detect Android malware. KNN finds the K records in the training set that are closest to a new sample and assigns the new sample the majority class among them. They used reverse engineering to obtain application feature information and trained the KNN algorithm to obtain a malware detection model. Chen et al. [68] resisted repackaging attacks on Android applications with encryption. They encrypted the application's code; if the application was repackaged, the decrypted code would change, revealing that the application had become malware. However, this method had the same problem.
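The KNN decision rule described above fits in a few lines: sort the training records by distance to the new sample, keep the K nearest, and take a majority vote. The two features used here (counts of dangerous permissions and of SMS API calls) and the training points are hypothetical stand-ins for the reverse-engineered features in [67].

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); returns the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical features: (dangerous permissions, SMS API calls)
train = [
    ((0, 0), "benign"), ((1, 0), "benign"), ((0, 1), "benign"),
    ((5, 4), "malware"), ((6, 5), "malware"), ((5, 6), "malware"),
]
print(knn_predict(train, (5, 5)))  # malware
print(knn_predict(train, (1, 1)))  # benign
```

KNN has no training phase beyond storing the labelled records, so its cost is paid at prediction time, when every query is compared against the whole training set.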

Some researchers use general methods to resist all kinds of malware. Liu et al. [69] converted common malicious code into images and resized the images to the same size. The image samples were then fed into a convolutional neural network for training to obtain a classification model. This method detected malware well, but key feature information in the malicious images could be lost when they were resized to a common size. Ajay et al. [70] used virtual machine monitoring technology and machine learning to build a malware detection system. They reconstructed executable files and then detected and classified them with an advanced client-assisted automatic multi-level malware detection system; however, virtual machine monitoring could sharply increase costs and require complex configuration.
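The information loss that Liu et al.'s approach risks can be seen directly. The sketch below lays a binary's bytes out as rows of a grayscale image (zero-padding the last row) and then shrinks it with nearest-neighbor sampling. The row width, output size and sample bytes are hypothetical, and the method in [69] may use a different resizing algorithm.

```python
def bytes_to_image(data: bytes, width: int):
    """Lay bytes out row by row as 0-255 pixels, zero-padding the final row."""
    rows = [list(data[i:i + width]) for i in range(0, len(data), width)]
    if rows and len(rows[-1]) < width:
        rows[-1] += [0] * (width - len(rows[-1]))
    return rows


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize: each output pixel copies one input pixel."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


img = bytes_to_image(bytes(range(10)), width=4)
print(img)                        # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 0]]
print(resize_nearest(img, 2, 2))  # [[0, 2], [4, 6]]
```

After resizing, bytes 1, 3, 5 and 7 and the entire last row (bytes 8 and 9) are gone: if a distinctive code fragment lived there, the classifier never sees it.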

6.6. DDoS Attack Problems and Solutions

DDoS is one of the most common types of malicious traffic. Large-scale DDoS requests occupy network bandwidth and continually submit queries to the database, preventing real users from accessing the actual services [71]. In May 2016, Anonymous, the world's largest hacker organization, launched a short-term DDoS attack on bank websites around the world, which paralyzed the network systems of many central banks. DDoS attacks can also cause websites to fail to load, software to become unavailable, and game account logins to fail.

Bashar et al. [72] carefully reviewed the application of artificial intelligence and statistical methods to resisting DDoS attacks in recent years. They classified DDoS attacks by the type of vulnerability exploited, the degree of automation and the degree of dynamics. After summarizing previous work, they proposed better methods for dealing with DDoS attacks and pointed out that existing classifications of DDoS attacks were not detailed enough to block certain specific attacks. Liu et al. [73] resisted volumetric attacks through traffic control. They used router-based access control lists to drop traffic that had not passed through all MBOXes, preventing large volumes of unknown traffic from blocking normal database queries and other database operations. Shan et al. [74] used feature learning, multi-kernel learning and autoencoders to predict DDoS attacks. Multi-kernel learning combines several kernel functions in a support vector machine. An autoencoder is a special neural network that transforms its input into features and then reconstructs the original input from those features. They first trained a multi-level autoencoder, then used it to extract features from the data, and finally applied a multi-kernel learning algorithm to obtain a unified detection model. However, multi-kernel learning is sensitive to large sample sizes, which made computation time- and space-consuming and resulted in poor detection performance.
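The core object in multi-kernel learning is a combined kernel, a non-negative weighted sum of base kernels, which is itself a valid kernel and can be plugged into an SVM. The sketch below combines a linear kernel with an RBF kernel. The weights and the RBF width `gamma` are fixed hypothetical values; in multi-kernel learning proper, as in [74], the weights are learned from data.

```python
import math


def linear_kernel(x, y):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def combined_kernel(x, y, weights=(0.3, 0.7)):
    """Non-negative weighted sum of base kernels (weights are hypothetical, not learned)."""
    return weights[0] * linear_kernel(x, y) + weights[1] * rbf_kernel(x, y)


print(combined_kernel((1, 0), (1, 0)))  # approximately 1.0
print(combined_kernel((1, 0), (0, 1)))  # 0.7 * e**-1, about 0.2575
```

Every evaluation of the combined kernel costs as much as all base kernels together, and an SVM needs it for every pair of training samples, which is consistent with the scalability problem noted for large samples.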

6.7. Bypass and Physical Attack Problems and Solutions

Bypass (side-channel) and physical attacks are attacks on database hardware. For example, if a laser fault injector is used to inject faults into the chip supporting the database, the system will work abnormally.

In recent years, there have been only a few related research papers. To resist bypass attacks, Mohammad et al. [75] used Intel cache monitoring technology and hardware performance counters to obtain fine-grained hardware information and applied a Gaussian anomaly detection method to detect cache-based bypass attacks on virtual machines. However, this method could only be used on devices with Intel processors; it also targeted specific bypass attacks and scaled poorly. Jeyavijayan et al. [76] used emerging materials, such as silicon nanowire field-effect transistors and nanoelectromechanical switches, to make electronic devices. Experimental results showed that nanomaterials had security advantages over traditional materials, but many challenges remain, including device stability and the need for new protocols.
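Gaussian anomaly detection fits an independent normal distribution to each feature of known-good measurements and flags a new measurement whose joint density falls below a threshold `epsilon`. The sketch below assumes two hypothetical cache-counter features; the sample values and `epsilon` are illustrative, not taken from [75], and the code assumes every feature has non-zero variance.

```python
import math


def fit(samples):
    """Per-feature mean and variance of normal-behavior samples."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(d)]
    var = [sum((s[j] - mu[j]) ** 2 for s in samples) / n for j in range(d)]
    return mu, var


def density(x, mu, var):
    """Product of independent univariate Gaussian densities (variances must be > 0)."""
    p = 1.0
    for xj, m, v in zip(x, mu, var):
        p *= math.exp(-((xj - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
    return p


def is_anomalous(x, mu, var, epsilon):
    return density(x, mu, var) < epsilon


# Hypothetical counters per interval: (cache misses, cache occupancy)
normal = [(10, 50), (12, 52), (11, 48), (9, 50)]
mu, var = fit(normal)
print(is_anomalous((11, 50), mu, var, epsilon=1e-3))  # False
print(is_anomalous((40, 90), mu, var, epsilon=1e-3))  # True
```

Choosing `epsilon` trades missed attacks against false alarms; it is usually tuned on a small labelled validation set.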

7. Conclusions

As the preceding sections show, researchers study database security from four perspectives: data, users, the protection system and external attackers. Efficient methods with high recognition rates have been developed for data tampering, data exposure and illegal access. However, some methods for dealing with database security threats are feasible only in theory and need to be improved to meet practical needs.

1) The generalization ability of the methods needs improvement. In reality, database security is threatened in many ways, yet most research focuses on only one aspect or even one specific threat. Although some methods perform well in carefully designed experiments, their protection may be far from ideal in real environments. Ideally, a solution should extend to as many aspects of database security as possible, and achieving this has become a difficult research problem.

2) Some methods need to better meet practical requirements. Putting any idea or scheme into practice involves conditions such as security, ease of implementation and understanding, and cost. Previous studies paid more attention to security and accuracy, and some methods were too complex and costly. Future research should consider convenience, simplicity, efficiency, user experience, cost and other factors while ensuring security.

Having discussed recent research on database security and its existing problems, we now analyze possible future research directions. Methods from similar scenarios can be combined: for example, the DNN method for image data proposed by Shumeet [11] could be combined with the Naive Bayes method for text data by Boudheb et al. [16] to protect data effectively in a hospital DBMS. Different types of data could also be encrypted selectively according to their frequency of use and privacy level. Applying a K-means clustering model based on user behavior information to database logs is another possible research direction. After extracting feature information about vulnerabilities, various machine learning methods could be used to detect vulnerabilities quickly and automatically, and the same processing approach could be applied to other database threats. When machine learning is used to deal with database security threats, ensemble learning is a reasonable idea; for example, better classification results can be obtained with the AdaBoost algorithm.
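To illustrate why boosting can outperform a single weak classifier, the sketch below implements textbook AdaBoost with one-dimensional decision stumps on a hypothetical toy dataset (for example, a single anomaly score per database transaction). On this data no single stump classifies more than four of the six points correctly, but three boosting rounds classify all six. This is a generic sketch, not a method from the surveyed papers.

```python
import math


def stump_predict(t, p, x):
    """Decision stump: predict p above the threshold t, -p otherwise."""
    return p if x > t else -p


def fit_stump(xs, ys, w):
    """Return (threshold, polarity, weighted error) of the best stump."""
    sx = sorted(set(xs))
    thresholds = [sx[0] - 1] + [(a + b) / 2 for a, b in zip(sx, sx[1:])]
    best = None
    for t in thresholds:
        for p in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(t, p, xi) != yi)
            if best is None or err < best[2]:
                best = (t, p, err)
    return best


def adaboost(xs, ys, rounds):
    """Labels must be +1/-1; returns a list of (alpha, threshold, polarity)."""
    w = [1.0 / len(xs)] * len(xs)
    model = []
    for _ in range(rounds):
        t, p, err = fit_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero / log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, p))
        # Up-weight misclassified points, down-weight correct ones, renormalize
        w = [wi * math.exp(-alpha * yi * stump_predict(t, p, xi))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in model)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # hypothetical labels
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

Each round re-weights the samples so that the next stump concentrates on the points the ensemble still gets wrong; the weighted vote of three weak stumps then captures the "inside the middle interval" pattern that no single threshold can express.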

In this paper, we found that database threats involve data, users, the protection system and external attacks, and we described their harms and the solutions in the existing literature. We then summarized the current shortcomings of related research and finally proposed several possible research directions.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.


Copyright © 2023 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License .



Research Integrity and Assurance


Research Data Security

SUMMARY TABLE:  PROTECTING YOUR RESEARCH DATA

The security of information at Princeton University is directed by the Data Governance Steering Committee, which oversees the actions of the Privacy Policy Committee, the Information Security Policy, and the Data Management Advisory Group. Researchers in the biomedical as well as social and behavioral sciences are expected to be proactive in designing and performing research to ensure that the dignity, welfare, and privacy of individual research subjects are protected and that information about an individual remains confidential. The protection of research data is a fundamental responsibility that is rooted in regulatory and ethical principles and should be upheld by all data stewards.

The Research Data Security Guidelines pertain to researchers and research team members who obtain, access or generate research data, regardless of whether the data is associated with funding or not. These guidelines help Princeton University researchers understand the sensitivity of the data they are collecting and develop appropriate data protection plans, know the appropriate mediums and places to store data, understand how and when to dispose of data, prepare their research data for public use, understand how to keep research data secure while traveling, and know what to do in the event of theft, loss, or unauthorized use of confidential research data. These guidelines can also be used as part of the data management planning process in conjunction with other tools such as the DMPTool to help meet federal funding agency requirements and prepare research data for public use.

Anyone who conducts research with human subjects at Princeton University has a responsibility to protect the data collected and used for their research. This is especially important when the data (a) contain personal identifiers or enough detailed information that the identity of participating human subjects can be inferred, (b) contain information that is highly sensitive, or (c) are covered by a restricted use agreement. The guidelines below are intended to help researchers understand when and how to use the most effective and efficient methods for storing and analyzing confidential research data so that those data are adequately protected from theft, loss or unauthorized use.

As a general practice, researchers working with human subjects should avoid collecting personally identifiable information (PII) whenever possible. Perhaps the best way to protect a research subject's identity is by not knowing that identity in the first place. However, in many cases, the collection of PII is necessary for carrying out a research project. There are many ways in which PII arises in the normal course of conducting research. If subjects sign informed consent agreements, their signatures are identifying information that must be securely stored. If subjects are awarded a prize or paid for their participation in a study, the researcher needs enough identifying information to enable delivery of the payment or prize. In some cases, researchers may need to merge data from different sources (e.g., survey responses and biological data), a step that can only be carried out with some form of personal identifier. Likewise, longitudinal studies usually require storage of detailed personal identifiers so that subjects can be contacted for subsequent interviews over long periods of time.

PII is defined as information that is uniquely associated with an individual person. The HIPAA privacy rules identify 18 items (such as name, mailing address, email address, social security number, etc.) that are considered to be forms of PII. While the list is regarded as comprehensive, it is not necessarily exhaustive.

It is sometimes possible to infer the identity of someone participating in a research study even when the data for the study do not contain any explicit identifiers such as those listed above. For example, by cross-referencing certain variables such as state of residence, occupation, education, age, sex, and race, it might be possible to infer the identity of a research subject. As such, the absence of personal identifiers from a research data set does not obviate the need for secure storage and protection. Similarly, when research data sets are being made available for public use, the data need to be stripped of all personal identifiers and coded in a manner that does not allow anyone to infer the identity of a subject. This is often a difficult task because the identity of individuals can be inferred by using data sets from multiple sources. The proliferation of public use datasets and publicly available records has increased the odds of being able to infer someone's identity by merging multiple data sources through a phenomenon known as the Mosaic effect. Researchers who produce or share anonymous public use data files need to consider whether the data they are using or releasing could be used in combination with other publicly available data to infer individual identities. Researchers are encouraged to consult the Institutional Review Board to determine if their proposed research involves human subjects.
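One quick way to see whether a data set is exposed to this kind of cross-referencing is to count how many records share each combination of quasi-identifiers (such as state, occupation, age and sex). A combination held by only one record singles that person out; the smallest group size is the "k" in the k-anonymity model. The column names and records below are hypothetical, and passing this check is not by itself sufficient protection.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("state", "occupation", "age", "sex")  # hypothetical columns


def group_sizes(records, keys=QUASI_IDENTIFIERS):
    """Number of records sharing each quasi-identifier combination."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def unique_combinations(records, keys=QUASI_IDENTIFIERS):
    """Combinations that identify exactly one record."""
    return [combo for combo, n in group_sizes(records, keys).items() if n == 1]


def min_group_size(records, keys=QUASI_IDENTIFIERS):
    """Smallest group size (the 'k' in k-anonymity)."""
    return min(group_sizes(records, keys).values())


records = [
    {"state": "NJ", "occupation": "teacher", "age": 34, "sex": "F"},
    {"state": "NJ", "occupation": "teacher", "age": 34, "sex": "F"},
    {"state": "NJ", "occupation": "surgeon", "age": 61, "sex": "M"},
]
print(unique_combinations(records))  # [('NJ', 'surgeon', 61, 'M')]
print(min_group_size(records))       # 1
```

Coarsening values (for example, replacing exact ages with age bands) merges groups and raises the minimum group size before release.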

Research data are considered highly sensitive when there is a heightened risk that disclosure may result in embarrassment or harm to the research subject. Data on topics such as sexual behavior, illegal drug use, criminal behavior, crime victimization or mental health are considered highly sensitive. Information that could have adverse consequences for subjects or damage their financial standing, employability, insurability, or reputation should be adequately protected from public disclosure, theft, loss or unauthorized use, especially if it includes PII.

Many researchers at Princeton University receive data from outside agencies or institutions that are subject to restricted use agreements (also called data sharing agreements). These are legal contracts that impose restrictions on the researchers' use of the data and sometimes include detailed procedures for secure storage, restricted access and analysis of the data. As part of the agreement, certain government agencies may also visit the researcher (or "licensee") to conduct a compliance audit. In other cases, restricted use agreements may simply prevent public release of the data or sale of the data to a third party. But in cases where an agreement does not specify data security procedures, researchers must consider the need to keep their data secure so that the potential for harm to any individuals or organizations is minimized. When faced with two sets of data security requirements (e.g., one from the Princeton University IRB and one from a restricted use agreement), the researcher should always default to the requirements with higher standards for data protection.

Researchers who work with open public records that contain PII (e.g., voter registration files, telephone directories, occupational license registries, property tax records, firearms registries, criminal records) may not meet the regulatory definition of research involving human subjects. However, researchers are advised to use caution when dealing with public records data that contain sensitive information. Merging and publishing sensitive information from publicly available records has the potential to embarrass or harm individuals described in the records even though the information is already public. Researchers are encouraged to consult the Institutional Review Board to determine if their proposed research involves human subjects and whether the risk of harm has been adequately minimized.

Public use data files are files from which all PII has been removed and the data are coded in such a way as to make identification of research subjects extremely unlikely. Researchers who work with public use data sets that do not contain PII may not meet the regulatory definition of research involving human subjects. However, some restricted use agreements nevertheless require local IRB review. As such, researchers are encouraged to consult the Institutional Review Board to determine if their proposed research requires IRB review.

It is important to understand the differences between the terms anonymous and confidential as they are used in different phases of a research study. When subjects are recruited for a research project, their involvement can be described as anonymous if it is impossible for anyone (even the researcher) to know whether or not those individuals participated in the study. For example, participation in an online survey that cannot be linked in any way to the individual would be considered anonymous. However, when participation is confidential, the research team knows that a particular individual has participated in the research and is obligated to protect that information from disclosure to others outside of the team, except as clearly noted in the consent document. Thus, in this example, if study participants sign a consent form, the consent documents the subject's participation in the study and must be treated as a confidential document, even if there is no way to connect data about them to their identity. In terms of the research data that are produced by a study, those data are anonymous if no one, not even the researcher, can connect the information back to the individual who provided it. The data do not contain any PII and it is not possible to infer the identity of anyone in the study.

When data are confidential, there continues to be a link between the data and the identity of the individual who provided it. The link usually takes the form of a study ID number that is common to both the de-identified data and the corresponding list of names or other types of PII. The research team is obligated to protect both the PII and the links from unintended disclosure according to the terms of the protocol approval by the IRB and the terms of the informed consent document. In order to protect against accidental disclosure, the subject's name or other identifiers should be stored separately from their research data and replaced with a unique code to create a new identity for the subject.
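The separation described above can be sketched as splitting each record into two stores: a de-identified data set keyed by a random study ID, and a linkage table mapping study IDs back to names, which would be kept in a separate, more tightly controlled location. The field names are hypothetical, and a real study would typically have several identifier fields to remove rather than one.

```python
import secrets


def pseudonymize(records, id_field="name"):
    """Split records into (de-identified data, linkage table study_id -> identifier)."""
    linkage = {}
    deidentified = []
    for rec in records:
        # Random code, so the study ID cannot be derived from the identity
        study_id = secrets.token_hex(8)
        linkage[study_id] = rec[id_field]
        clean = {k: v for k, v in rec.items() if k != id_field}
        clean["study_id"] = study_id
        deidentified.append(clean)
    return deidentified, linkage


data, link = pseudonymize([{"name": "Ana", "score": 7}, {"name": "Ben", "score": 9}])
# 'data' can be analyzed; 'link' must be stored separately and protected
```

Using random codes rather than, say, a hash of the name matters: a hash of a known name can be recomputed by anyone, which would silently re-link the data.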

Cybersecurity data science: an overview from machine learning perspective

Journal of Big Data, volume 7, Article number 41 (2020)


In a computing context, cybersecurity has recently been undergoing massive shifts in technology and operations, and data science is driving the change. Extracting security incident patterns or insights from cybersecurity data and building corresponding data-driven models is the key to making a security system automated and intelligent. To understand and analyze actual phenomena with data, various scientific methods, machine learning techniques, processes, and systems are used, which is commonly known as data science. In this paper, we focus on and briefly discuss cybersecurity data science, where data is gathered from relevant cybersecurity sources and analytics complement the latest data-driven patterns to provide more effective security solutions. The concept of cybersecurity data science makes the computing process more actionable and intelligent than traditional approaches in the cybersecurity domain. We then discuss and summarize a number of associated research issues and future directions. Furthermore, we provide a machine learning based multi-layered framework for cybersecurity modeling. Overall, our goal is not only to discuss cybersecurity data science and relevant methods but also to focus on their applicability to data-driven intelligent decision making for protecting systems from cyber-attacks.

Introduction

Due to the increasing dependency on digitalization and the Internet of Things (IoT) [ 1 ], various security incidents such as unauthorized access [ 2 ], malware attacks [ 3 ], zero-day attacks [ 4 ], data breaches [ 5 ], denial of service (DoS) [ 2 ], and social engineering or phishing [ 6 ] have grown at an exponential rate in recent years. For instance, according to statistics from the AV-TEST institute in Germany [ 7 ], fewer than 50 million unique malware executables were known to the security community in 2010. By 2012 this number had roughly doubled to around 100 million, and in 2019 more than 900 million malicious executables were known, a number that is likely to keep growing. Cybercrime and attacks can cause devastating financial losses and affect both organizations and individuals. A data breach is estimated to cost 8.19 million USD in the United States and 3.9 million USD on average [ 8 ], and the annual cost of cybercrime to the global economy is estimated at 400 billion USD [ 9 ]. According to Juniper Research [ 10 ], the number of records breached each year is expected to nearly triple over the next 5 years. It is therefore essential that organizations adopt and implement a strong cybersecurity approach to mitigate these losses. According to [ 11 ], the national security of a country depends on businesses, government, and individual citizens having access to highly secure applications and tools, and on the capability to detect and eliminate such cyber-threats in a timely way. Therefore, effectively identifying various cyber incidents, whether previously seen or unseen, and intelligently protecting the relevant systems from such cyber-attacks is a key issue to be solved urgently.
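The AV-TEST figures quoted above imply an average annual growth rate for known malware executables. As a rough worked example, take 50 million (2010) and 900 million (2019) as the endpoint values. The average compound rate $r$ over the nine-year span is then:

$$
r = \left(\frac{900 \times 10^{6}}{50 \times 10^{6}}\right)^{1/9} - 1 = 18^{1/9} - 1 \approx 0.379
$$

The 2010 count was below 50 million and the 2019 count was above 900 million. The true nine-year ratio therefore exceeds 18, so roughly 38% per year is a lower bound on the average compound growth rate.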

Fig. 1 Popularity trends of data science, machine learning and cybersecurity over time, where the x-axis represents the timestamp information and the y-axis represents the corresponding popularity values

Cybersecurity is a set of technologies and processes designed to protect computers, networks, programs and data from attack, damage, or unauthorized access [ 12 ]. In the context of computing, cybersecurity is currently undergoing massive shifts in technology and operations, and data science (DS) is driving the change. Machine learning (ML), a core part of “Artificial Intelligence” (AI), can play a vital role in discovering insights from data. Machine learning can significantly change the cybersecurity landscape, and data science is leading a new scientific paradigm [ 13 , 14 ]. The popularity of these related technologies is increasing day by day, as shown in Fig. 1, which is based on data from the last five years collected from Google Trends [ 15 ]. The x-axis of the figure represents timestamps as particular dates, and the y-axis represents the corresponding popularity on a scale from 0 (minimum) to 100 (maximum). As Fig. 1 shows, the popularity values of these areas were below 30 in 2014 but exceed 70 in 2019, so popularity more than doubled. In this paper, we focus on cybersecurity data science (CDS), which is broadly related to these areas in terms of security data processing techniques and intelligent decision making in real-world applications. Overall, CDS is security data-focused, applies machine learning methods to quantify cyber risks, and ultimately seeks to optimize cybersecurity operations. This paper is therefore intended for academic and industry researchers who want to study and develop data-driven smart cybersecurity models based on machine learning techniques. Accordingly, great emphasis is placed on a thorough description of various types of machine learning methods, their relationships, and their usage in the context of cybersecurity.
This paper does not describe all of the different techniques used in cybersecurity in detail; instead, it gives an overview of cybersecurity data science modeling based on artificial intelligence, particularly from a machine learning perspective.

The ultimate goal of cybersecurity data science is data-driven intelligent decision making from security data for smart cybersecurity solutions. CDS represents a partial paradigm shift from traditional well-known security solutions such as firewalls, user authentication and access control, and cryptography systems, which might not be effective enough for today’s needs in the cyber industry [ 16 , 17 , 18 , 19 ]. The problem is that these solutions are typically handled statically by a few experienced security analysts, and data management is done in an ad-hoc manner [ 20 , 21 ]. An increasing number of cybersecurity incidents in the different forms mentioned above continue to appear over time, and conventional solutions have shown limitations in mitigating these cyber risks, while numerous advanced attacks are created and spread very quickly throughout the Internet. Several researchers use various data analysis and learning techniques to build cybersecurity models, which are summarized in the “Machine learning tasks in cybersecurity” section. However, a comprehensive security model based on the effective discovery of security insights and the latest security patterns could be more useful. To address this issue, we need to develop more flexible and efficient security mechanisms that can respond to threats and update security policies to mitigate them intelligently and in a timely manner. Achieving this goal requires analyzing a massive amount of relevant cybersecurity data generated from various sources, such as network and system sources, and discovering insights or proper security policies automatically with minimal human intervention.

Analyzing cybersecurity data and building the right tools and processes to successfully protect against cybersecurity incidents goes beyond a simple set of functional requirements and knowledge about risks, threats, or vulnerabilities. To extract the insights or patterns of security incidents effectively, researchers can use several machine learning techniques, such as feature engineering, data clustering, classification, and association analysis, or neural network-based deep learning techniques. These are briefly discussed in the “Machine learning tasks in cybersecurity” section. These learning techniques can find anomalies or malicious behavior and the data-driven patterns of associated security incidents in order to make intelligent decisions. Thus, based on the concept of data-driven decision making, we focus on cybersecurity data science. Here, data are gathered from relevant cybersecurity sources such as network activity, database activity, application activity, or user activity, and the analytics complement the latest data-driven patterns to provide corresponding security solutions.

The contributions of this paper are summarized as follows.

We first briefly discuss the concept of cybersecurity data science and relevant methods to understand its applicability to data-driven intelligent decision making in the domain of cybersecurity. For this purpose, we also review and briefly discuss different machine learning tasks in cybersecurity, and summarize various cybersecurity datasets, highlighting their usage in different data-driven cyber applications.

We then discuss and summarize a number of associated research issues and future directions in the area of cybersecurity data science that could help both academia and industry in further research and development in relevant application areas.

Finally, we provide a generic multi-layered framework of the cybersecurity data science model based on machine learning techniques. Within this framework, we briefly discuss how the cybersecurity data science model can be used to discover useful insights from security data and make data-driven intelligent decisions to build smart cybersecurity systems.

The remainder of the paper is organized as follows. The “Background” section summarizes the background of our study and gives an overview of the technologies related to cybersecurity data science. The “Cybersecurity data science” section defines and briefly discusses cybersecurity data science, including various categories of cyber incident data. In the “Machine learning tasks in cybersecurity” section, we briefly discuss various categories of machine learning techniques, including their relations with cybersecurity tasks, and summarize a number of machine learning based cybersecurity models in the field. The “Research issues and future directions” section briefly discusses and highlights various research issues and future directions in the area of cybersecurity data science. In the “A multi-layered framework for smart cybersecurity services” section, we suggest a machine learning-based framework for building a cybersecurity data science model and discuss its various layers and their roles. In the “Discussion” section, we highlight several key points regarding our study. Finally, the “Conclusion” section concludes this paper.

Background

In this section, we give an overview of the technologies related to cybersecurity data science, including various types of cybersecurity incidents and defense strategies.

Over the last half-century, the information and communication technology (ICT) industry has evolved greatly and become ubiquitous and closely integrated with modern society. Protecting ICT systems and applications from cyber-attacks has therefore become a major concern for security policymakers in recent years [ 22 ]. The act of protecting ICT systems from various cyber-threats or attacks has come to be known as cybersecurity [ 9 ]. Several aspects are associated with cybersecurity: measures to protect information and communication technology; the raw data and information it contains, and their processing and transmission; the associated virtual and physical elements of the systems; the degree of protection resulting from the application of those measures; and, eventually, the associated field of professional endeavor [ 23 ]. Craigen et al. defined “cybersecurity as a set of tools, practices, and guidelines that can be used to protect computer networks, software programs, and data from attack, damage, or unauthorized access” [ 24 ]. According to Aftergood et al. [ 12 ], “cybersecurity is a set of technologies and processes designed to protect computers, networks, programs and data from attacks and unauthorized access, alteration, or destruction”. Overall, cybersecurity is concerned with understanding diverse cyber-attacks and devising corresponding defense strategies that preserve the properties defined below [ 25 , 26 ].

Confidentiality is a property used to prevent the access and disclosure of information to unauthorized individuals, entities or systems.

Integrity is a property used to prevent any modification or destruction of information in an unauthorized manner.

Availability is a property used to ensure timely and reliable access to information assets and systems by an authorized entity.
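One common mechanism for supporting the integrity property is a keyed message authentication code, and a short sketch helps show how it works. An HMAC does not itself prevent modification. It lets a holder of the key detect any unauthorized change to stored or transmitted data. The sketch below uses Python's standard `hmac` and `hashlib` modules. The key and record contents are hypothetical, and in practice the key would come from a secret store rather than source code.

```python
import hashlib
import hmac

SECRET_KEY = b"example-key-do-not-use-in-production"  # hypothetical shared key

def tag(message: bytes, key: bytes = SECRET_KEY) -> str:
    """Compute an HMAC-SHA256 tag to store or send alongside the data."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, expected_tag: str, key: bytes = SECRET_KEY) -> bool:
    """Return True only if the message is unchanged since it was tagged."""
    # compare_digest compares in constant time, avoiding timing side channels
    return hmac.compare_digest(tag(message, key), expected_tag)

record = b"account=42;balance=100"
t = tag(record)
print(verify(record, t))                        # True
print(verify(b"account=42;balance=999", t))     # False: modification detected
```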

The term cybersecurity applies in a variety of contexts, from business to mobile computing, and can be divided into several common categories. Network security mainly focuses on securing a computer network from cyber attackers or intruders. Application security keeps software and devices free of risks or cyber-threats. Information security mainly considers the security and privacy of relevant data. Operational security includes the processes of handling and protecting data assets. Typical cybersecurity systems are composed of network security systems and computer security systems containing a firewall, antivirus software, or an intrusion detection system [ 27 ].

Cyberattacks and security risks

The risk associated with any attack typically involves three security factors: threats, i.e., who is attacking; vulnerabilities, i.e., the weaknesses they are attacking; and impacts, i.e., what the attack does [ 9 ]. A security incident is an act that threatens the confidentiality, integrity, or availability of information assets and systems. Several types of cybersecurity incidents may result in security risks to an organization’s systems and networks or to an individual [ 2 ]. These are:

Unauthorized access, which describes the act of accessing networks, systems, or data without authorization, resulting in a violation of a security policy [ 2 ];

Malware, also known as malicious software, is any program or software that is intentionally designed to cause damage to a computer, client, server, or computer network, e.g., botnets. Examples of different types of malware include computer viruses, worms, Trojan horses, adware, ransomware, spyware, and malicious bots [ 3 , 26 ]. Ransom malware, or ransomware, is an emerging form of malware that prevents users from accessing their systems, personal files, or devices, and then demands an anonymous online payment in order to restore access;

Denial-of-Service is an attack meant to shut down a machine or network, making it inaccessible to its intended users by flooding the target with traffic that triggers a crash. The Denial-of-Service (DoS) attack typically uses one computer with an Internet connection, while distributed denial-of-service (DDoS) attack uses multiple computers and Internet connections to flood the targeted resource [ 2 ];

Phishing, a type of social engineering, is a fraudulent attempt to obtain sensitive information such as banking and credit card details, login credentials, or personally identifiable information. The attacker disguises themselves as a trusted individual or entity in an electronic communication such as an email, text, or instant message [ 26 ]. Social engineering in general covers a broad range of malicious activities accomplished through human interactions;

Zero-day attack is the term used to describe the threat of an unknown security vulnerability for which no patch has yet been released or of which the application developers were unaware [ 4 , 28 ].
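The flooding behavior behind the denial-of-service item above can be illustrated with a simple rate check. This is a minimal sketch, not a production DoS defense. It counts each source's requests inside a sliding time window and flags sources that exceed a limit. The window length, the limit, and the IP addresses are hypothetical values chosen for the example.

```python
from collections import defaultdict, deque

class FloodDetector:
    """Flag sources whose request count within a sliding time window exceeds a limit."""

    def __init__(self, window_s=1.0, max_requests=100):
        self.window_s = window_s            # illustrative defaults, not recommended values
        self.max_requests = max_requests
        self.history = defaultdict(deque)   # source -> timestamps of its recent requests

    def observe(self, source, timestamp):
        """Record one request; return True if `source` is now over the limit."""
        recent = self.history[source]
        recent.append(timestamp)
        while recent and timestamp - recent[0] > self.window_s:
            recent.popleft()                # forget requests older than the window
        return len(recent) > self.max_requests

detector = FloodDetector(window_s=1.0, max_requests=5)
flood = [detector.observe("10.0.0.1", i * 0.05) for i in range(10)]   # 10 requests in 0.5 s
normal = [detector.observe("10.0.0.2", i * 0.5) for i in range(5)]    # 2 requests per second
print(flood)
print(normal)
```

Per-source counting like this catches a single-machine DoS. A DDoS spreads traffic across many machines, so each source may stay under the limit, and aggregate traffic to the target would also need to be monitored.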

Besides the attacks mentioned above, privilege escalation [ 29 ], password attacks [ 30 ], insider threats [ 31 ], man-in-the-middle attacks [ 32 ], advanced persistent threats [ 33 ], SQL injection attacks [ 34 ], cryptojacking attacks [ 35 ], and web application attacks [ 30 ] are well-known security incidents in the field of cybersecurity. A data breach, also known as a data leak, is another type of security incident that involves the unauthorized access of data by an individual, application, or service [ 5 ]. Thus, all data breaches are security incidents, but not all security incidents are data breaches. Most data breaches occur in the banking industry, involving credit card numbers and personal information, followed by the healthcare sector and the public sector [ 36 ].
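SQL injection, one of the incidents listed above, has a well-known root cause and a standard mitigation, and a small example makes both concrete. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table, column names, and payload are hypothetical. The unsafe function splices user input into the SQL text. The safe function passes the input as a bound parameter, so the database treats it as data rather than as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "s1"), ("bob", "s2")])

def lookup_unsafe(name):
    # VULNERABLE: user input becomes part of the SQL statement itself
    return conn.execute(f"SELECT secret FROM users WHERE name = '{name}'").fetchall()

def lookup_safe(name):
    # Parameterized query: `name` is bound as a value and never parsed as SQL
    return conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

payload = "x' OR '1'='1"
print(lookup_unsafe(payload))  # the injected OR clause makes every row match
print(lookup_safe(payload))    # no user is literally named "x' OR '1'='1"
```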

Cybersecurity defense strategies

Defense strategies are needed to protect data or information, information systems, and networks from cyber-attacks or intrusions. More granularly, they are responsible for preventing data breaches or security incidents and for monitoring and reacting to intrusions. An intrusion can be defined as any kind of unauthorized activity that causes damage to an information system [ 37 ]. An intrusion detection system (IDS) is typically defined as “a device or software application that monitors a computer network or systems for malicious activity or policy violations” [ 38 ]. However, traditional well-known security solutions such as anti-virus software, firewalls, user authentication, access control, data encryption, and cryptography systems might not be effective enough for today’s needs in the cyber industry [ 16 , 17 , 18 , 19 ]. IDS, on the other hand, addresses these issues by analyzing security data from several key points in a computer network or system [ 39 , 40 ]. Moreover, intrusion detection systems can be used to detect both internal and external attacks.

Intrusion detection systems fall into different categories according to their scope of use. For instance, host-based intrusion detection systems (HIDS) and network intrusion detection systems (NIDS) are the most common types, covering scopes from single computers to large networks. A HIDS monitors important files on an individual system, while a NIDS analyzes and monitors network connections for suspicious traffic. Similarly, based on methodology, signature-based IDS and anomaly-based IDS are the most well-known variants [ 37 ].
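The HIDS task of monitoring important files is often implemented as file integrity monitoring, and a short sketch shows the basic idea. This is one simple approach, not the design of any particular HIDS product. It records a SHA-256 hash for each monitored file as a known-good baseline and later reports files whose hash changed or which disappeared. The monitored file `app.conf` is hypothetical and is created in a temporary directory for the demonstration.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_baseline(paths):
    """Record a known-good SHA-256 hash for each monitored file."""
    return {str(p): sha256_of(p) for p in paths}

def check_integrity(baseline):
    """Return (path, status) for every monitored file that changed or disappeared."""
    alerts = []
    for path, expected in baseline.items():
        if not Path(path).exists():
            alerts.append((path, "deleted"))
        elif sha256_of(path) != expected:
            alerts.append((path, "modified"))
    return alerts

# Usage: monitor a hypothetical configuration file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "app.conf"
    config.write_text("debug=false\n")
    baseline = build_baseline([config])
    before = check_integrity(baseline)  # nothing has changed yet
    config.write_text("debug=true\n")
    after = check_integrity(baseline)   # the edit is reported as "modified"
print(before, after)
```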

Signature-based IDS: A signature can be a predefined string, pattern, or rule that corresponds to a known attack. In a signature-based IDS, an attack is detected when its corresponding pattern is identified. A signature can be a known pattern or byte sequence in network traffic, or a sequence used by malware. Anti-virus software, for instance, uses such sequences or patterns as signatures and performs matching operations to detect attacks. Signature-based IDS is also known as knowledge-based or misuse detection [ 41 ]. This technique can process a high volume of network traffic efficiently, but it is strictly limited to known attacks. Thus, detecting new or unseen attacks is one of the biggest challenges faced by signature-based systems.
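Signature matching at its simplest can be sketched in a few lines, and the same sketch shows its main limitation. The two byte patterns below are hypothetical, illustrative signatures, not entries from a real signature database. Production engines use optimized multi-pattern matching and far richer rules. A trivially re-encoded version of a known attack slips past an exact-match signature, which is the "known attacks only" weakness described above.

```python
# Hypothetical signature database: name -> byte pattern (illustrative, not real signatures)
SIGNATURES = {
    "demo-path-traversal": b"../../",
    "demo-shell-exec": b"/bin/sh -c",
}

def match_signatures(payload: bytes, signatures=SIGNATURES):
    """Return the names of all signatures whose byte pattern occurs in the payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]

print(match_signatures(b"GET /../../etc/passwd HTTP/1.1"))       # ['demo-path-traversal']
print(match_signatures(b"GET /index.html HTTP/1.1"))             # []
# A URL-encoded variant of the same traversal attack is missed:
print(match_signatures(b"GET /..%2f..%2fetc/passwd HTTP/1.1"))   # []
```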

Anomaly-based IDS: The concept of anomaly-based detection overcomes the issues of signature-based IDS discussed above. In an anomaly-based intrusion detection system, the behavior of the network is first examined to find dynamic patterns. From these patterns, the system automatically creates a data-driven model that profiles normal behavior, and it then flags deviations from this profile as anomalies [ 41 ]. Anomaly-based IDS can therefore be treated as a dynamic approach that follows behavior-oriented detection. Its main advantage is the ability to identify unknown or zero-day attacks [ 42 ]. However, an identified anomaly or abnormal behavior is not always an indicator of intrusion. It may also result from benign factors such as policy changes or the offering of a new service.
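The profile-then-deviate idea can be sketched with the simplest possible statistical profile. This sketch substitutes a mean and standard deviation for the richer data-driven models the text refers to, and it flags observations more than three standard deviations from the normal mean. The training values (connections per minute) and the 3-sigma threshold are hypothetical choices.

```python
import statistics

def fit_profile(normal_values):
    """Profile normal behaviour as the mean and sample standard deviation."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def is_anomalous(value, profile, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mean, std = profile
    return abs(value - mean) > threshold * std

# Hypothetical training data: connections per minute observed during normal operation
profile = fit_profile([20, 22, 19, 21, 23, 20, 18, 22])
print(is_anomalous(21, profile))  # False: within the normal range
print(is_anomalous(95, profile))  # True: an attack burst, or possibly a new service
```

The second comment reflects the caveat in the text. The detector only knows that 95 is unusual, not that it is malicious, so a legitimately launched new service would trigger the same alert until the profile is refit.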

In addition, a hybrid detection approach [ 43 , 44 ] that combines the misuse and anomaly-based techniques discussed above can be used to detect intrusions. In a hybrid system, misuse detection identifies known types of intrusions and anomaly detection identifies novel attacks [ 45 ]. Besides these approaches, stateful protocol analysis can also be used to detect intrusions. Like the anomaly-based method, it identifies deviations of protocol state, but it uses predetermined universal profiles based on accepted definitions of benign activity [ 41 ]. In Table 1, we summarize these common approaches, highlighting their pros and cons. Once detection is complete, an intrusion prevention system (IPS), which is intended to prevent malicious events, can mitigate the risks in different ways: manually, by providing notifications, or through an automatic process [ 46 ]. Among these approaches, an automatic response system could be more effective, as it does not involve a human interface between the detection and response systems.
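The hybrid flow and the automatic versus manual response options can be combined in one small sketch. This is an assumption-laden illustration, not a reference IPS design. It checks hypothetical known-attack signatures first and then falls back to a simple statistical anomaly test. The `respond` function either acts automatically ("block") or defers to a human ("notify-analyst"). The signature, the traffic rates, and the response labels are all hypothetical.

```python
import statistics

# Hypothetical misuse-detection signatures for known attacks
KNOWN_SIGNATURES = {"demo-sqli": b"' OR '1'='1"}

def hybrid_detect(payload: bytes, rate: float, profile, threshold=3.0):
    """Misuse detection first (known attacks), then anomaly detection (novel behaviour)."""
    for name, pattern in KNOWN_SIGNATURES.items():
        if pattern in payload:
            return ("known-attack", name)
    mean, std = profile
    if abs(rate - mean) > threshold * std:
        return ("anomaly", f"rate {rate} deviates from normal mean {mean:.1f}")
    return ("benign", None)

def respond(verdict, automatic=True):
    """Turn a verdict into an action; automatic mode acts without waiting for an analyst."""
    kind, _ = verdict
    if kind == "benign":
        return "allow"
    return "block" if automatic else "notify-analyst"

normal_rates = [20, 22, 19, 21]
profile = (statistics.mean(normal_rates), statistics.stdev(normal_rates))
print(respond(hybrid_detect(b"user=x' OR '1'='1", 20, profile)))        # block
print(respond(hybrid_detect(b"GET /", 60, profile)))                    # block
print(respond(hybrid_detect(b"GET /", 60, profile), automatic=False))   # notify-analyst
print(respond(hybrid_detect(b"GET /", 21, profile)))                    # allow
```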

We are living in the age of data, advanced analytics, and data science, which are related to data-driven intelligent decision making. The process of searching for patterns or discovering hidden and interesting knowledge from data is known as data mining [ 47 ]. In this paper, however, we use the broader term “data science” rather than data mining. The reason is that data science, in its most fundamental form, is all about understanding data. It involves studying, processing, and extracting valuable insights from a set of information. In addition to data mining, data analytics is also related to data science. The general concept of “data analytics” [ 47 ] brings together data mining, knowledge discovery, and machine learning (which refers to creating algorithms and programs that learn on their own), along with the original data analysis and descriptive analytics from the statistical perspective. Nowadays, many researchers use the term “data science” to describe the interdisciplinary field of data collection, preprocessing, inference, and decision making through data analysis. According to Cao et al. [ 47 ], “data science is a new interdisciplinary field that synthesizes and builds on statistics, informatics, computing, communication, management, and sociology to study data and its environments, to transform data to insights and decisions by following a data-to-knowledge-to-wisdom thinking and methodology”. As a high-level statement in the context of cybersecurity, we can conclude that it is the study of security data to provide data-driven solutions for given security problems, also known as “the science of cybersecurity data”.
Figure 2 shows the typical data-to-insight-to-decision transfer at different periods and general analytic stages in data science, in terms of a variety of analytics goals (G) and approaches (A) to achieve the data-to-decision goal [ 47 ].

Fig. 2 Data-to-insight-to-decision analytic stages in data science [ 47 ]

Given the analytic power of data science, including machine learning techniques, it can be a viable component of security strategies. With data science techniques, security analysts can manipulate and analyze security data more effectively and efficiently, uncovering valuable insights. Thus, data science methodologies, including machine learning techniques, can be well utilized in cybersecurity across several steps. These include understanding the problem, gathering security data from diverse sources, preparing data to feed into the model, and building and updating data-driven models to provide smart security services. This motivates us to define cybersecurity data science and to work in this research area.

Cybersecurity data science

In this section, we briefly discuss cybersecurity data science including various categories of cyber incidents data with the usage in different application areas, and the key terms and areas related to our study.

Understanding cybersecurity data

Data science is largely driven by the availability of data [ 48 ]. Cybersecurity data science is based on datasets, which typically represent a collection of information records consisting of several attributes or features and related facts. Thus, it is important to understand the nature of cybersecurity data containing various types of cyberattacks and relevant features. Raw security data collected from relevant cyber sources can be used to analyze the various patterns of security incidents or malicious behavior and to build a data-driven security model to achieve our goal. Several datasets exist in the area of cybersecurity, covering intrusion analysis, malware analysis, anomaly, fraud, and spam analysis, and they are used for various purposes. In Table 2, we summarize several such datasets that are accessible on the Internet, including their features and attacks, and highlight their usage based on machine learning techniques in different cyber applications. Effectively analyzing and processing these security features, building a target machine learning-based security model according to the requirements, and eventually making data-driven decisions could help provide intelligent cybersecurity services. These services are discussed briefly in the “A multi-layered framework for smart cybersecurity services” section.

Defining cybersecurity data science

Data science is transforming the world’s industries. It is critically important for the future of intelligent cybersecurity systems and services because “security is all about data”. When we seek to detect cyber threats, we analyze security data in the form of files, logs, network packets, or other relevant sources. Traditionally, security professionals did not use data science techniques to make detections based on these data sources. Instead, they used file hashes, custom-written rules such as signatures, or manually defined heuristics [ 21 ]. Although these techniques have their own merits in several cases, they require too much manual work to keep up with the changing cyber threat landscape. Data science, in contrast, can bring a massive shift in technology and operations. Machine learning algorithms can learn or extract insights about security incident patterns from training data in order to detect and prevent them. For instance, these techniques can be used to detect malware or suspicious trends, or to extract policy rules.

Today, the entire security industry is moving towards data science because of its capability to transform raw data into decisions. To do this, several data-driven tasks can be involved, such as: (i) data engineering, focusing on practical applications of data gathering and analysis; (ii) reducing data volume, which deals with filtering significant and relevant data for further analysis; (iii) discovery and detection, which focuses on extracting insights, incident patterns, or knowledge from data; (iv) automated models, which focus on building data-driven intelligent security models; (v) targeted security alerts, focusing on generating noteworthy security alerts based on discovered knowledge while minimizing false alerts; and (vi) resource optimization, which deals with using the available resources to achieve the target goals of a security system. When making data-driven decisions, behavioral analysis can also play a significant role in the domain of cybersecurity [ 81 ].
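Tasks (ii) and (v) above, reducing data volume and generating targeted alerts, can be illustrated with a deliberately crude sketch. A fixed count threshold stands in here for the knowledge-based alert prioritization the text describes. The function collapses repeated raw events into one alert per (rule, source) pair and drops pairs seen fewer than `min_count` times. The event fields, rule names, and threshold are hypothetical.

```python
from collections import Counter

def targeted_alerts(events, min_count=3):
    """Collapse raw events into one alert per (rule, source) seen at least min_count times."""
    counts = Counter((e["rule"], e["source"]) for e in events)
    return [
        {"rule": rule, "source": source, "count": n}
        for (rule, source), n in sorted(counts.items())
        if n >= min_count
    ]

# Hypothetical raw detector output: 9 events from 3 (rule, source) pairs
events = (
    [{"rule": "failed-login", "source": "10.0.0.7"}] * 5
    + [{"rule": "failed-login", "source": "10.0.0.8"}]
    + [{"rule": "port-scan", "source": "10.0.0.9"}] * 3
)
for alert in targeted_alerts(events):
    print(alert)
```

Nine raw events become two alerts, and the single failed login from `10.0.0.8` is suppressed. Whether such suppression is acceptable depends on the security goals, which is why the text ties alert generation to discovered knowledge rather than to fixed thresholds.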

Thus, the concept of cybersecurity data science incorporates the methods and techniques of data science and machine learning as well as the behavioral analytics of various security incidents. The combination of these technologies has given birth to the term “cybersecurity data science”. The term refers to collecting a large amount of security event data from different sources and analyzing it with machine learning technologies to detect security risks or attacks, either through the discovery of useful insights or through the latest data-driven patterns. It is, however, worth remembering that cybersecurity data science is not just a collection of machine learning algorithms. It is rather a process that can help security professionals or analysts scale and automate their security activities in a smart and timely way. Therefore, the formal definition can be as follows: “Cybersecurity data science is a research or working area existing at the intersection of cybersecurity, data science, and machine learning or artificial intelligence, which is mainly security data-focused, applies machine learning methods, attempts to quantify cyber-risks or incidents, and promotes inferential techniques to analyze behavioral patterns in security data. It also focuses on generating security response alerts, and eventually seeks for optimizing cybersecurity solutions, to build automated and intelligent cybersecurity systems.”

Table 3 highlights some key terms associated with cybersecurity data science. Overall, the outputs of cybersecurity data science are typically security data products, which can be a data-driven security model, policy rule discovery, risk or attack prediction, potential security service and recommendation, or the corresponding security system depending on the given security problem in the domain of cybersecurity. In the next section, we briefly discuss various machine learning tasks with examples within the scope of our study.

Machine learning tasks in cybersecurity

Machine learning (ML) is typically considered a branch of “Artificial Intelligence”. It is closely related to computational statistics, data mining and analytics, and data science, and it focuses on making computers learn from data [ 82 , 83 ]. Thus, machine learning models typically comprise a set of rules, methods, or complex “transfer functions” that can be applied to find interesting data patterns or to recognize or predict behavior [ 84 ], which could play an important role in the area of cybersecurity. In the following, we discuss different methods that can be used to solve machine learning tasks and how they are related to cybersecurity tasks.

Supervised learning

Supervised learning is performed when specific targets are defined to be reached from a certain set of inputs, i.e., it is a task-driven approach. In the area of machine learning, the most popular supervised learning techniques are classification and regression methods [ 129 ]. These techniques are commonly used to classify or predict outcomes for a particular security problem. For instance, classification techniques can be used in the cybersecurity domain to predict a denial-of-service attack (yes, no) or to identify different classes of network attacks such as scanning and spoofing. ZeroR [ 83 ], OneR [ 130 ], Naive Bayes [ 131 ], Decision Tree [ 132 , 133 ], K-nearest neighbors [ 134 ], support vector machines [ 135 ], adaptive boosting [ 136 ], and logistic regression [ 137 ] are well-known classification techniques. In addition, Sarker et al. have recently proposed the BehavDT [ 133 ] and IntruDtree [ 106 ] classification techniques, which can effectively build data-driven predictive models. Regression techniques, on the other hand, are useful for predicting continuous or numeric values, e.g., the total number of phishing attacks in a certain period or network packet parameters. Regression analyses can also be used to detect the root causes of cybercrime and other types of fraud [ 138 ]. Linear regression [ 82 ] and support vector regression [ 135 ] are popular regression techniques. The main difference between classification and regression is that the output variable in regression is numerical or continuous, while the predicted output in classification is categorical or discrete. Ensemble learning is an extension of supervised learning that combines different simple models, e.g., Random Forest learning [ 139 ], which generates multiple decision trees to solve a particular security task.
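ZeroR and OneR, the two simplest classifiers named above, are short enough to implement directly and show how a classifier learns a categorical output from labeled data. The sketch below is a plain-Python illustration of the general OneR idea, not the implementation from the cited works. It works on categorical features only and does not handle ties or numeric discretization. The connection records, feature names, and labels are hypothetical.

```python
from collections import Counter, defaultdict

def zero_r(labels):
    """ZeroR baseline: always predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows, labels):
    """OneR: choose the single feature whose value -> majority-class rule makes the fewest errors.

    rows: list of dicts with categorical feature values; labels: the class of each row.
    Returns (feature_name, rule) where rule maps each feature value to a predicted class.
    """
    best = None
    for feature in rows[0]:
        by_value = defaultdict(Counter)          # feature value -> class counts
        for row, label in zip(rows, labels):
            by_value[row[feature]][label] += 1
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[0]:
            best = (errors, feature, rule)
    return best[1], best[2]

# Hypothetical labelled connection records
rows = [
    {"protocol": "tcp", "high_syn_rate": "yes"},
    {"protocol": "tcp", "high_syn_rate": "no"},
    {"protocol": "udp", "high_syn_rate": "no"},
    {"protocol": "tcp", "high_syn_rate": "yes"},
    {"protocol": "udp", "high_syn_rate": "no"},
]
labels = ["dos", "normal", "normal", "dos", "normal"]
print(zero_r(labels))         # majority-class baseline
print(one_r(rows, labels))    # the feature that best separates dos from normal
```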

Unsupervised learning

In unsupervised learning problems, the main task is to find patterns, structures, or knowledge in unlabeled data, i.e., it is a data-driven approach [ 140 ]. In the area of cybersecurity, cyber-attacks such as malware can stay hidden in various ways, including changing their behavior dynamically and autonomously to avoid detection. Clustering techniques, a type of unsupervised learning, can help uncover hidden patterns and structures in datasets and thereby identify indicators of such sophisticated attacks. Similarly, clustering techniques can be useful for identifying anomalies and policy violations, and for detecting and eliminating noisy instances in data. K-means [ 141 ] and K-medoids [ 142 ] are popular partitioning clustering algorithms. Single linkage [ 143 ] and complete linkage [ 144 ] are well-known hierarchical clustering algorithms used in various application domains. Moreover, a bottom-up clustering approach proposed by Sarker et al. [ 145 ] can also be used, taking into account the data characteristics.
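K-means, the first partitioning algorithm named above, is easy to sketch in plain Python and shows how structure emerges from unlabeled data. The sketch implements the standard Lloyd iteration. It repeatedly assigns each point to its nearest centroid and then moves each centroid to the mean of its points, stopping when the centroids stop changing. The two-dimensional host features are hypothetical, and the random initialization is seeded so that runs are reproducible.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: assign points to the nearest centroid, then move centroids to the mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # initial centroids drawn from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:          # assignments have stabilised
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical per-host features: (connections per minute, distinct ports contacted)
hosts = [(1, 1), (1.5, 2), (2, 1.5), (1.2, 1.1), (8, 8), (9, 9.5), (8.5, 9), (9.2, 8.8)]
centroids, clusters = kmeans(hosts, k=2)
for c in clusters:
    print(c)
```

On data this well separated the two groups are recovered. Real security features usually need scaling first, because k-means uses Euclidean distance and a feature with a large range would otherwise dominate.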

Besides, feature engineering tasks such as optimal feature selection or extraction for a particular security problem could be useful for further analysis [ 106 ]. Recently, Sarker et al. [ 106 ] proposed an approach for selecting security features according to their importance scores. Moreover, principal component analysis, linear discriminant analysis, Pearson correlation analysis, and non-negative matrix factorization are popular dimensionality reduction techniques for such issues [ 82 ]. Association rule learning is another example, where machine learning based policy rules can prevent cyber-attacks. In an expert system, the rules are usually defined manually by a knowledge engineer working in collaboration with a domain expert [ 37 , 140 , 146 ]. Association rule learning, in contrast, discovers rules or relationships among a set of available security features or attributes in a given dataset [ 147 ]. To quantify the strength of relationships, correlation analysis can be used [ 138 ]. Many association rule mining algorithms have been proposed in the machine learning and data mining literature, such as logic-based [ 148 ], frequent pattern based [ 149 , 150 , 151 ], and tree-based [ 152 ] approaches. Recently, Sarker et al. [ 153 ] proposed an association rule learning approach based on non-redundant rule generation that can be used to discover a set of useful security policy rules. Moreover, AIS [ 147 ], Apriori [ 149 ], Apriori-TID and Apriori-Hybrid [ 149 ], FP-Tree [ 152 ], RARM [ 154 ], and Eclat [ 155 ] are well-known association rule learning algorithms that can solve such problems by generating a set of policy rules in the domain of cybersecurity.
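The core of Apriori-style association rule learning fits in a compact plain-Python sketch. It is far less efficient than the optimized algorithms cited above and is not the authors' method. The code searches level by level for frequent itemsets, pruning any candidate that has an infrequent subset. It then derives rules `lhs -> rhs` whose confidence, support(lhs ∪ rhs) / support(lhs), meets a threshold. The session attributes and both thresholds are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search for itemsets whose support >= min_support."""
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= b for b in baskets) / len(baskets)

    level = {frozenset([item]) for b in baskets for item in b}
    level = {s for s in level if support(s) >= min_support}
    frequent, k = {}, 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

def association_rules(frequent, min_confidence):
    """Rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), sup, confidence))
    return found

# Hypothetical per-session security attributes
sessions = [
    {"failed_login", "foreign_ip", "blocked"},
    {"failed_login", "foreign_ip", "blocked"},
    {"failed_login", "local_ip"},
    {"foreign_ip", "blocked"},
    {"failed_login", "foreign_ip", "blocked"},
]
rules = association_rules(frequent_itemsets(sessions, min_support=0.6), min_confidence=0.9)
for lhs, rhs, sup, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.1f} confidence={conf:.2f}")
```

A rule such as `foreign_ip -> blocked` only records co-occurrence in the data. It would need review by an analyst before being adopted as a security policy.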

Neural networks and deep learning

Deep learning is a part of machine learning in the area of artificial intelligence; it is a computational model inspired by the biological neural networks in the human brain [ 82 ]. Artificial Neural Networks (ANNs) are frequently used in deep learning, and the most popular neural network learning algorithm is backpropagation [ 82 ]. It performs learning on a multi-layer feed-forward neural network consisting of an input layer, one or more hidden layers, and an output layer. The main difference between deep learning and classical machine learning lies in how their performance changes as the amount of security data increases. Typically, deep learning algorithms perform well when data volumes are large, whereas classical machine learning algorithms perform comparatively better on small datasets [ 44 ]. In our earlier work, Sarker et al. [ 129 ], we illustrated the effectiveness of these approaches on contextual datasets. Deep learning approaches mimic the mechanisms of the human brain to interpret large amounts of data or complex data such as images, sounds, and texts [ 44 , 129 ]. In terms of feature extraction for building models, deep learning reduces the effort of designing a feature extractor for each problem compared with classical machine learning techniques. Besides these characteristics, deep learning models typically take longer to train than classical machine learning models, whereas at test time the situation is typically the opposite [ 44 ]. Thus, deep learning relies more on high-performance machines with GPUs than classical machine learning algorithms do [ 44 , 156 ]. The most popular deep neural network learning models include the multi-layer perceptron (MLP) [ 157 ], the convolutional neural network (CNN) [ 158 ], and the recurrent neural network (RNN) or long short-term memory (LSTM) network [ 121 , 158 ]. Recently, researchers have used these deep learning techniques for different purposes in the domain of cybersecurity, such as detecting network intrusions and detecting and classifying malware traffic [ 44 , 159 ].
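To make the backpropagation procedure concrete, the following minimal NumPy sketch trains a feed-forward network with one hidden layer by full-batch gradient descent on binary cross-entropy. The XOR-style toy data stand in for a security feature matrix and are purely hypothetical, as are the layer size, learning rate, and number of epochs; real intrusion or malware detectors would use a deep learning framework and far larger datasets.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=8, lr=0.5, epochs=5000, seed=0):
    """Train a one-hidden-layer feed-forward network with backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1).astype(float)
    n = len(X)
    losses = []
    for _ in range(epochs):
        # Forward pass: input -> hidden (sigmoid) -> output (sigmoid).
        h = sigmoid(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # Binary cross-entropy; the small epsilon avoids log(0).
        losses.append(float(-np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))))
        # Backward pass: for sigmoid output + cross-entropy, dL/dz_out = (p - y) / n.
        d_out = (p - y) / n
        d_hidden = (d_out @ W2.T) * h * (1 - h)  # chain rule through the hidden sigmoid
        # Gradient-descent updates for all weights and biases.
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hidden)
        b1 -= lr * d_hidden.sum(axis=0)
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) >= 0.5).astype(int).ravel()

# Hypothetical toy data: the label depends non-linearly on two binary features.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
params, losses = train_mlp(X, y)
```

A single-layer model cannot separate this XOR pattern, which is why the hidden layer and the backpropagated gradient through it are needed.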

Other learning techniques

Semi-supervised learning can be described as a hybridization of the supervised and unsupervised techniques discussed above, as it works on both labeled and unlabeled data. In the area of cybersecurity, it can be useful when data must be labeled automatically, without human intervention, to improve the performance of cybersecurity models. Reinforcement learning is another type of machine learning, in which an agent creates its own learning experiences by interacting directly with the environment, i.e., an environment-driven approach; the environment is typically formulated as a Markov decision process, and decisions are taken based on a reward function [ 160 ]. Monte Carlo learning, Q-learning, and Deep Q-Networks are the most common reinforcement learning algorithms [ 161 ]. For instance, in a recent work [ 126 ], the authors present an approach for detecting botnet traffic or malicious cyber activities using reinforcement learning combined with a neural network classifier. In another work [ 128 ], the authors discuss the application of deep reinforcement learning to intrusion detection for supervised problems and obtained the best results with the Deep Q-Network algorithm. In the context of cybersecurity, genetic algorithms, which use fitness, selection, crossover, and mutation to search for optimal solutions, could also be used to solve a similar class of learning problems [ 119 ].
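As a minimal illustration of Q-learning, the following Python sketch learns a policy for a toy five-state corridor in which moving right eventually yields a reward. The environment, reward, and hyperparameters (learning rate alpha, discount gamma, exploration rate epsilon) are hypothetical stand-ins; a security application would need a carefully designed state space and reward function, and Deep Q-Networks replace the table with a neural network when the state space is large.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor MDP.

    States 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the last state ends the episode with reward 1; other steps give 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Q-learning update: bootstrap from the best action in the next state;
            # the terminal state contributes no future value.
            target = r if s_next == goal else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
policy = ["right" if q[1] > q[0] else "left" for q in Q[:-1]]
```

The agent is never told that "right" is correct; it discovers this from the delayed reward alone, which is the environment-driven character of reinforcement learning described above.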

The various types of machine learning techniques discussed above can be useful for building effective security models in the domain of cybersecurity. In Table  4 , we summarize several machine learning techniques that are used to build various types of security models for various purposes. Although these are typically individual learning-based security models, in this paper we focus on a comprehensive cybersecurity data science model and the relevant issues, in order to build a data-driven intelligent security system. In the next section, we highlight several research issues and potential solutions in the area of cybersecurity data science.

Research issues and future directions

Our study reveals several research issues and challenges in the area of cybersecurity data science related to extracting insights from relevant data for data-driven intelligent decision making in cybersecurity solutions. In the following, we summarize these challenges, ranging from data collection to decision making.

Cybersecurity datasets : Source datasets are the primary component for work in the area of cybersecurity data science. Most existing datasets are old and might be insufficient for understanding the recent behavioral patterns of various cyber-attacks. Although the data can be transformed into a meaningful form after several processing tasks, there is still a lack of understanding of the characteristics of recent attacks and how they occur. Thus, machine learning algorithms trained on such data may achieve a low accuracy rate for the target decisions. Therefore, establishing a large number of recent datasets for particular problem domains such as cyber risk prediction or intrusion detection is needed, which could be one of the major challenges in cybersecurity data science.

Handling quality problems in cybersecurity datasets : Cyber datasets might be noisy, incomplete, insignificant, or imbalanced, or may contain inconsistent instances related to a particular security incident. Such problems in a dataset may affect the quality of the learning process and degrade the performance of machine learning-based models [ 162 ]. To make data-driven intelligent decisions for cybersecurity solutions, such problems in the data need to be dealt with effectively before building the cyber models. Therefore, understanding such problems in cyber data and effectively handling them using existing algorithms, or algorithms newly proposed for a particular problem domain such as malware analysis or intrusion detection and prevention, could be another research issue in cybersecurity data science.
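Two common quality problems, missing values and class imbalance, can be illustrated with a short NumPy sketch: column-mean imputation and random oversampling of the minority class. Both are simple baselines chosen here for illustration (more elaborate methods, such as model-based imputation or synthetic oversampling, exist), and the toy flow records are hypothetical.

```python
import numpy as np

def impute_column_means(X):
    """Replace NaN entries with the mean of the observed values in each column."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equally frequent."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, cnt in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - cnt, replace=True)
        idx.extend(members.tolist() + extra.tolist())
    idx = np.array(idx)
    return X[idx], y[idx]

# Hypothetical flow records: [packets, bytes]; label 1 = attack (the minority class).
X_raw = np.array([[1.0, 200.0], [np.nan, 180.0], [3.0, np.nan], [50.0, 9000.0]])
y = np.array([0, 0, 0, 1])
X_clean = impute_column_means(X_raw)
X_bal, y_bal = random_oversample(X_clean, y)
```

Oversampling should be applied only to the training split; otherwise duplicated attack records leak into the evaluation data and inflate the measured detection rate.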

Security policy rule generation : Security policy rules reference security zones and enable a user to allow, restrict, and track traffic on the network based on the corresponding user or user group, service, or application. During execution, the policy rules, including both general and more specific rules, are compared against the incoming traffic in sequence, and the first rule that matches the traffic is applied. The policy rules used in most cybersecurity systems are static and are either generated by human experts or ontology-based [ 163 , 164 ]. Although association rule learning techniques produce rules from data, they suffer from redundant rule generation [ 153 ], which makes the policy rule-set complex. Therefore, understanding such problems in policy rule generation and effectively handling them using existing algorithms, or algorithms newly proposed for a particular problem domain such as access control [ 165 ], could be another research issue in cybersecurity data science.
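The sequential matching of policy rules can be sketched as follows. The rule fields (user group, zone, application), the default-deny fallback, and the example rules are hypothetical simplifications of what a real firewall policy contains; the point is that rules are checked in order and the first match decides, so more specific rules are placed before more general ones.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    # Hypothetical rule fields; None acts as a wildcard ("any").
    name: str
    action: str                      # "allow" or "deny"
    user_group: Optional[str] = None
    zone: Optional[str] = None
    application: Optional[str] = None

    def matches(self, traffic: dict) -> bool:
        # Every non-wildcard field must equal the corresponding traffic attribute.
        return all(
            want is None or traffic.get(field) == want
            for field, want in (("user_group", self.user_group),
                                ("zone", self.zone),
                                ("application", self.application))
        )

def evaluate(rules, traffic, default="deny"):
    """Compare the rules in order and apply the first one that matches."""
    for rule in rules:
        if rule.matches(traffic):
            return rule.name, rule.action
    return "default", default

policy = [
    Rule("block-p2p", "deny", application="bittorrent"),            # specific rule first
    Rule("staff-web", "allow", user_group="staff", application="web"),
    Rule("dmz-any", "allow", zone="dmz"),                            # general rule last
]
print(evaluate(policy, {"user_group": "staff", "zone": "trust", "application": "web"}))
```

Because "block-p2p" precedes "dmz-any", peer-to-peer traffic in the DMZ is denied even though the more general rule would allow it; reordering the list changes the outcome.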

Hybrid learning method : Most commercial products in the cybersecurity domain rely on signature-based intrusion detection techniques [ 41 ]. However, missing features or insufficient profiling can cause these techniques to miss unknown attacks. In that case, anomaly-based detection techniques, or a hybrid technique combining signature-based and anomaly-based detection, can be used to overcome such issues. A hybrid technique combining multiple learning techniques, or a combination of deep learning and machine learning methods, can be used to extract the target insight for a particular problem domain such as intrusion detection, malware analysis, or access control, and to make intelligent decisions for the corresponding cybersecurity solutions.
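A minimal sketch of the hybrid idea combines a signature check for known attacks with a simple statistical anomaly check for deviations from a normal profile. The signatures, the use of request size as the only profiled feature, and the z-score threshold of 3 are hypothetical choices for illustration; practical hybrid detectors profile many features and would typically learn the anomaly model with machine learning.

```python
import statistics

# Hypothetical signature set: byte patterns of known attacks.
SIGNATURES = {b"' OR 1=1 --": "sql-injection", b"/etc/passwd": "path-traversal"}

def signature_check(payload: bytes):
    """Return the name of the first known signature found in the payload, if any."""
    for pattern, name in SIGNATURES.items():
        if pattern in payload:
            return name
    return None

class HybridDetector:
    def __init__(self, baseline_sizes, z_threshold=3.0):
        # Profile of normal behavior: mean and standard deviation of request sizes.
        self.mu = statistics.mean(baseline_sizes)
        self.sigma = statistics.stdev(baseline_sizes)
        self.z_threshold = z_threshold

    def inspect(self, payload: bytes):
        # 1) Signature-based detection catches known attacks.
        sig = signature_check(payload)
        if sig:
            return "alert", f"signature:{sig}"
        # 2) Anomaly-based detection flags large deviations from the normal profile,
        #    which may reveal unknown attacks that have no signature yet.
        z = abs(len(payload) - self.mu) / self.sigma
        if z > self.z_threshold:
            return "alert", f"anomaly:z={z:.1f}"
        return "ok", None

detector = HybridDetector([100, 110, 95, 105, 98, 102, 101, 99])
print(detector.inspect(b"GET /?q=' OR 1=1 --"))
```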

Protecting the valuable security information : Another consequence of a cyber data attack is the loss of extremely valuable data and information, which could be damaging for an organization. With the use of encryption or highly complex signatures, one can prevent others from probing into a dataset. In such cases, cybersecurity data science can be used to build a robust, data-driven protocol to protect such security information. To achieve this goal, cyber analysts can develop algorithms that analyze the history of cyberattacks to detect the most frequently targeted chunks of data. Thus, understanding such data protection problems and designing corresponding algorithms to handle them effectively could be another research issue in the area of cybersecurity data science.
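As a starting point for analyzing the history of cyberattacks, the following sketch ranks data assets by how often past incidents targeted them. The incident-log format and asset names are hypothetical; a real analysis would also weight incidents by impact and recency.

```python
from collections import Counter

# Hypothetical incident history: each record names the data asset an attack targeted.
incident_log = [
    {"attack": "sqli", "target": "customers.credit_cards"},
    {"attack": "phishing", "target": "hr.salaries"},
    {"attack": "sqli", "target": "customers.credit_cards"},
    {"attack": "xss", "target": "web.sessions"},
    {"attack": "insider", "target": "hr.salaries"},
    {"attack": "ransomware", "target": "customers.credit_cards"},
]

def most_targeted(incidents, top_n=3):
    """Rank data assets by how often past incidents targeted them."""
    return Counter(rec["target"] for rec in incidents).most_common(top_n)

print(most_targeted(incident_log, top_n=2))
```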

Context-awareness in cybersecurity : Existing cybersecurity work mainly relies on cyber data containing several low-level features. When data mining and machine learning techniques are applied to such datasets, patterns that describe the data properly can be identified. However, broader contextual information [ 140 , 145 , 166 ], such as temporal and spatial context, relationships among events or connections, and dependencies, can be used to decide whether suspicious activity exists. For instance, some approaches may consider individual connections as DoS attacks, while security experts might not treat them as malicious by themselves. Thus, a significant limitation of existing cybersecurity work is the lack of use of contextual information for predicting risks or attacks. Therefore, context-aware adaptive cybersecurity solutions could be another research issue in cybersecurity data science.
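The DoS example can be made concrete with temporal context: a single connection looks harmless, but many connections from the same source within a short window may not. The following sketch counts connections per source in a sliding time window; the window length, threshold, and example traffic (using documentation IP ranges) are hypothetical.

```python
from collections import defaultdict, deque

def flag_bursts(connections, window_s=10.0, max_conns=100):
    """Flag sources whose connection count within a sliding time window exceeds max_conns.

    connections: iterable of (timestamp_seconds, source_ip), sorted by timestamp.
    """
    recent = defaultdict(deque)  # per-source timestamps inside the current window
    flagged = set()
    for ts, src in connections:
        q = recent[src]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and ts - q[0] > window_s:
            q.popleft()
        if len(q) > max_conns:
            flagged.add(src)
    return flagged

# Hypothetical traffic: one connection per second from a normal client,
# plus 500 connections within five seconds from another source.
normal = [(float(i), "198.51.100.7") for i in range(60)]
burst = [(10 + i * 0.01, "203.0.113.9") for i in range(500)]
events = sorted(normal + burst)
print(flag_bursts(events))
```

Each individual connection from either source is identical in isolation; only the temporal relationship among them distinguishes the burst.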

Feature engineering in cybersecurity : The efficiency and effectiveness of machine learning-based security models have always been a major challenge due to the high volume of network data with a large number of traffic features. The high dimensionality of the data has been addressed using several techniques, such as principal component analysis (PCA) [ 167 ] and singular value decomposition (SVD) [ 168 ]. In addition to the low-level features in the datasets, the contextual relationships between suspicious activities might be relevant. Such contextual data can be stored in an ontology or taxonomy for further processing. Thus, how to effectively select the optimal features or extract the significant features, considering both the low-level and the contextual features, for effective cybersecurity solutions could be another research issue in cybersecurity data science.
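As an illustration of dimensionality reduction, the following NumPy sketch computes principal components through the singular value decomposition of mean-centered data, which links the two techniques mentioned above. The synthetic traffic-feature matrix, with three redundant noisy copies of three base features, is hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance_ratio = (S ** 2) / np.sum(S ** 2)
    return Xc @ Vt[:n_components].T, explained_variance_ratio[:n_components]

# Hypothetical traffic-feature matrix: 200 flows x 6 features, where the
# last three features are noisy copies of the first three (redundancy).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = np.hstack([base, base + 0.01 * rng.normal(size=(200, 3))])
Z, ratio = pca(X, n_components=3)
print(Z.shape, ratio.sum())
```

Because half of the features are redundant, three components retain almost all of the variance, halving the input size for a downstream security model.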

Remarkable security alert generation and prioritizing : In many cases, a cybersecurity system may not be well defined and may raise a substantial number of false alarms, which are undesirable in an intelligent system. For instance, an IDS deployed in a real-world network generates around nine million alerts per day [ 169 ]. A network-based intrusion detection system typically inspects incoming traffic, matching it against associated patterns to detect risks, threats, or vulnerabilities, and generates security alerts. However, responding to each such alert individually might not be effective, as it consumes a huge amount of time and resources and may consequently result in a self-inflicted DoS. To overcome this problem, high-level management is required that correlates the security alerts, considering the current context and their logical relationships, and prioritizes them before reporting them to users, which could be another research issue in cybersecurity data science.
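A minimal sketch of alert correlation and prioritization: alerts that share a source and a signature are merged into one report, and the reports are ranked by their highest severity and then by volume. The alert tuple format, the severity scale of 1 to 5, and the grouping key are hypothetical; real correlation engines also use time windows, network topology, and asset value.

```python
from collections import defaultdict

def correlate_and_prioritize(alerts):
    """Merge alerts sharing (source, signature), then rank groups by severity and volume.

    alerts: iterable of (timestamp, source_ip, signature, severity) tuples.
    """
    groups = defaultdict(lambda: {"count": 0, "max_severity": 0, "first_seen": None})
    for ts, src, sig, sev in alerts:
        g = groups[(src, sig)]
        g["count"] += 1
        g["max_severity"] = max(g["max_severity"], sev)
        g["first_seen"] = ts if g["first_seen"] is None else min(g["first_seen"], ts)
    # Highest severity first; among equal severity, the noisier group first.
    return sorted(groups.items(), key=lambda kv: (-kv[1]["max_severity"], -kv[1]["count"]))

# Hypothetical alert stream: a noisy low-severity scan and two severe injection alerts.
alerts = [(t, "198.51.100.20", "port-scan", 1) for t in range(1000)]
alerts += [(500, "203.0.113.5", "sql-injection", 5), (501, "203.0.113.5", "sql-injection", 4)]
ranked = correlate_and_prioritize(alerts)
for key, info in ranked:
    print(key, info)
```

Here 1002 raw alerts collapse into two reports, and the rare but severe injection attempt is reported before the noisy scan.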

Recency analysis in cybersecurity solutions : Machine learning-based security models typically use a large amount of static data to generate data-driven decisions. Anomaly detection systems rely on constructing such a model from the patterns of normal behavior and anomalies. However, normal behavior in a large and dynamic security system is not well defined and may change over time, which can be viewed as an incrementally growing dataset. The patterns in such incremental datasets may change in several cases, which often results in a substantial number of false alarms, known as false positives. Thus, a recent malicious behavioral pattern is more likely to be interesting and significant than older ones for predicting unknown attacks. Therefore, effectively using the concept of recency analysis [ 170 ] in cybersecurity solutions could be another issue in cybersecurity data science.
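One simple way to let recent patterns outweigh older ones is exponential time decay, where each occurrence of a pattern contributes a weight that halves every half_life time units. This is a generic weighting scheme shown for illustration, not the RecencyMiner approach of [ 170 ]; the event format, pattern names, and half-life are hypothetical.

```python
from collections import defaultdict

def recency_weighted_counts(events, now, half_life):
    """Count pattern occurrences, weighting each by 0.5 ** (age / half_life)."""
    scores = defaultdict(float)
    for ts, pattern in events:
        age = now - ts
        scores[pattern] += 0.5 ** (age / half_life)
    return dict(scores)

# Hypothetical history (timestamps in days): pattern "A" occurred ten times long ago,
# pattern "B" four times in the last few days.
events = [(d, "A") for d in range(10)] + [(d, "B") for d in range(96, 100)]
scores = recency_weighted_counts(events, now=100, half_life=7)
print(scores)
```

Pattern A has more raw occurrences, but pattern B, seen only recently, receives the higher recency-weighted score.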

The most important work for an intelligent cybersecurity system is to develop an effective framework that supports data-driven decision making. In such a framework, we need to consider advanced data analysis based on machine learning techniques, so that the framework is capable of minimizing these issues and providing automated and intelligent security services. Thus, a well-designed security framework for cybersecurity data, together with its experimental evaluation, is a very important direction and a big challenge as well. In the next section, we suggest and discuss a data-driven cybersecurity framework based on machine learning techniques that considers multiple processing layers.

A multi-layered framework for smart cybersecurity services

As discussed earlier, cybersecurity data science is data-focused, applies machine learning methods, attempts to quantify cyber risks, promotes inferential techniques to analyze behavioral patterns, focuses on generating security response alerts, and ultimately seeks to optimize cybersecurity operations. Hence, we briefly discuss a framework with multiple data processing layers that can potentially be used to discover security insights from raw data to build smart cybersecurity systems, e.g., dynamic policy rule-based access control or intrusion detection and prevention systems. To make data-driven intelligent decisions in the resulting cybersecurity system, it is necessary to understand the security problems and the nature of the corresponding security data, and to analyze that data extensively. For this purpose, our suggested framework not only considers machine learning techniques to build the security model but also takes into account incremental learning and dynamism, to keep the model up to date, as well as the corresponding response generation, which could make it more effective and intelligent in providing the expected services. Figure 3 shows an overview of the framework, involving several processing layers, from raw security event data to services. In the following, we briefly discuss the working procedure of the framework.

Figure 3: A generic multi-layered framework based on machine learning techniques for smart cybersecurity services

Security data collecting

Collecting valuable cybersecurity data is a crucial step, which forms a connecting link between security problems in cyberinfrastructure and the corresponding data-driven solution steps in this framework, shown in Fig. 3. The reason is that cyber data serve as the source for setting up the ground truth of the security model, which affects the model's performance. The quality and quantity of cyber data determine the feasibility and effectiveness of solving the security problem according to our goal. Thus, the concern is how to collect valuable data that meet the unique needs of building data-driven security models.

The general step to collect and manage security data from diverse data sources depends on the particular security problem and project within the enterprise. Data sources can be classified into several broad categories, such as network, host, and hybrid [ 171 ]. Within the network infrastructure, the security system can leverage different types of security data, such as IDS logs, firewall logs, network traffic data, packet data, and honeypot data, for providing the target security services. For instance, whether a given IP address is malicious could be determined by analyzing data on IP addresses and their cyber activities. In the domain of cybersecurity, the network source mentioned above is considered the primary security event source to analyze. In the host category, data are collected from an organization’s host machines, where the data sources can be operating system logs, database access logs, web server logs, email logs, application logs, etc. Collecting data from both the network and host machines is considered the hybrid category. Overall, in a data collection layer, network activity, database activity, application activity, and user activity can be the possible security event sources in the context of cybersecurity data science.
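The IP example above can be sketched as follows: given firewall log entries, the code flags source addresses that have enough activity and a high proportion of denied connections. The log format, thresholds, and addresses (taken from documentation ranges) are hypothetical; in practice such a heuristic would be one input among many to a learned model.

```python
from collections import defaultdict

def suspicious_ips(firewall_log, min_events=20, max_deny_ratio=0.5):
    """Flag IPs with enough activity and a high proportion of denied connections.

    firewall_log: iterable of (source_ip, action) with action in {"allow", "deny"}.
    Returns {ip: deny_ratio} for the flagged addresses.
    """
    totals = defaultdict(int)
    denies = defaultdict(int)
    for ip, action in firewall_log:
        totals[ip] += 1
        if action == "deny":
            denies[ip] += 1
    # Require a minimum number of events so that a few denials do not dominate.
    return {ip: denies[ip] / n for ip, n in totals.items()
            if n >= min_events and denies[ip] / n > max_deny_ratio}

log = ([("192.0.2.10", "allow")] * 50 + [("192.0.2.10", "deny")] * 2
       + [("203.0.113.77", "deny")] * 40 + [("203.0.113.77", "allow")] * 5
       + [("198.51.100.3", "deny")] * 3)
flagged = suspicious_ips(log)
print(flagged)
```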

Security data preparing

After the raw security data have been collected from various sources according to the problem domain discussed above, this layer is responsible for preparing the raw data for building the model by applying various necessary processes. However, not all of the collected data contribute to the model building process in the domain of cybersecurity [ 172 ]. Therefore, useless data should be filtered out of the data captured by the network sniffer. Moreover, data might be noisy, have missing or corrupted values, or have attributes of widely varying types and scales. High-quality data are necessary for achieving higher accuracy in a data-driven model, which learns a function that maps an input to an output based on example input-output pairs. Thus, a procedure for data cleaning and for handling missing or corrupted values might be required. Moreover, security data features or attributes can be of different types, such as continuous, discrete, or symbolic [ 106 ]. Beyond a solid understanding of these types of data and attributes and their permissible operations, the data and attributes need to be preprocessed and converted into the target type. Besides, the raw data can be of different types, such as structured, semi-structured, or unstructured. Thus, normalization, transformation, or collation can be useful to organize the data in a structured manner. In some cases, natural language processing techniques might be useful depending on the data type and characteristics, e.g., for textual contents. As both the quality and quantity of data determine the feasibility of solving the security problem, effective preprocessing and management of the data and their representation can play a significant role in building an effective security model for intelligent services.
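The following sketch shows two of the preparation steps mentioned above on mixed-type records: a symbolic attribute is one-hot encoded and continuous attributes are min-max scaled to [0, 1]. The field names (protocol, duration, bytes) are hypothetical, and a production pipeline would fit the encoding and scaling parameters on training data only and reuse them for new data.

```python
import numpy as np

def preprocess(records, categorical_key, numeric_keys):
    """Convert mixed-type records into a numeric matrix.

    The symbolic attribute is one-hot encoded; continuous attributes are
    min-max scaled to [0, 1].
    """
    categories = sorted({r[categorical_key] for r in records})
    one_hot = np.array([[1.0 if r[categorical_key] == c else 0.0 for c in categories]
                        for r in records])
    numeric = np.array([[float(r[k]) for k in numeric_keys] for r in records])
    lo, hi = numeric.min(axis=0), numeric.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    scaled = (numeric - lo) / span
    return np.hstack([one_hot, scaled]), categories

# Hypothetical connection records.
records = [
    {"protocol": "tcp", "duration": 0.5, "bytes": 1200},
    {"protocol": "udp", "duration": 2.0, "bytes": 300},
    {"protocol": "icmp", "duration": 0.0, "bytes": 64},
]
X, categories = preprocess(records, "protocol", ["duration", "bytes"])
print(categories)
print(X)
```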

Machine learning-based security modeling

This is the core step, where insights and knowledge are extracted from data through the application of cybersecurity data science. In this section, we particularly focus on machine learning-based modeling, as machine learning techniques can significantly change the cybersecurity landscape. The security features or attributes and their patterns in data are of high interest to discover and analyze in order to extract security insights. To achieve this goal, a deeper understanding of the data and machine learning-based analytical models utilizing a large amount of cybersecurity data can be effective. Thus, various machine learning tasks can be involved in this model building layer, depending on the solution perspective. The first is security feature engineering, which is mainly responsible for transforming raw security data into informative features that effectively represent the underlying security problem to the data-driven models. Thus, several data-processing tasks, such as feature transformation and normalization, feature selection by taking into account a subset of available security features according to their correlations or importance in modeling, or feature generation and extraction by creating brand-new principal components, may be involved in this module, depending on the security data characteristics. For instance, the chi-squared test, analysis of variance test, correlation coefficient analysis, feature importance, discriminant and principal component analysis, or singular value decomposition can be used for analyzing the significance of the security features to perform the security feature engineering tasks [ 82 ].
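Correlation coefficient analysis, one of the options listed above, can be sketched as a simple filter method that ranks features by the absolute Pearson correlation with the class label. The synthetic features and labels are hypothetical; note that Pearson correlation only captures linear relationships, which is one reason the chi-squared test, analysis of variance, or model-based feature importance may be preferred.

```python
import numpy as np

def rank_features_by_correlation(X, y, names):
    """Rank features by the absolute Pearson correlation with the label."""
    scores = []
    for j, name in enumerate(names):
        col = X[:, j]
        # Constant columns carry no information; give them a score of 0.
        r = 0.0 if col.std() == 0 else np.corrcoef(col, y)[0, 1]
        scores.append((name, abs(float(r))))
    return sorted(scores, key=lambda s: s[1], reverse=True)

rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=300)                 # hypothetical labels: 1 = attack
X = np.column_stack([
    y * 5 + rng.normal(0, 1, 300),               # failed_logins: strongly related to the label
    rng.normal(64, 1, 300),                      # packet_ttl: unrelated noise
    np.ones(300),                                # constant_flag: carries no information
])
ranking = rank_features_by_correlation(X, y, ["failed_logins", "packet_ttl", "constant_flag"])
print(ranking)
```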

Another significant module is security data clustering, which uncovers hidden patterns and structures in huge volumes of security data to identify where new threats exist. It typically involves grouping security data with similar characteristics, which can be used to solve several cybersecurity problems, such as detecting anomalies and policy violations. The malicious behavior or anomaly detection module is typically responsible for identifying deviations from known behavior, where clustering-based analysis and techniques can also be used to detect malicious behavior or anomalies. In the cybersecurity area, attack classification or prediction is treated as one of the most significant modules; it is responsible for building a prediction model to classify attacks or threats and to predict future ones for a particular security problem. Predicting denial-of-service attacks, or a spam filter separating spam from other messages, could be relevant examples. The association learning or policy rule generation module can play a role in building an expert security system that comprises several IF-THEN rules that define attacks. Thus, in a problem of policy rule generation for a rule-based access control system, association learning can be used, as it discovers the associations or relationships among a set of available security features in a given security dataset. The popular machine learning algorithms in these categories are briefly discussed in the “Machine learning tasks in cybersecurity” section. The model selection or customization module is responsible for choosing whether to use an existing machine learning model or to customize one. Analyzing data and building models based on traditional machine learning or deep learning methods could achieve acceptable results in certain cases in the domain of cybersecurity. However, in terms of effectiveness and efficiency or other performance measurements, considering time complexity, generalization capacity, and, most importantly, the impact of the algorithm on the detection rate of a system, machine learning models need to be customized for a specific security problem. Moreover, customizing the related techniques and data could improve the performance of the resulting security model and make it more applicable in a cybersecurity domain. The modules discussed above can work separately or in combination, depending on the target security problems.
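Clustering-based anomaly detection can be sketched by treating the centroids of normal behavior as a profile and flagging points that lie unusually far from every centroid. Here the centroids are fixed by hand rather than learned (in practice they might come from a clustering step such as K-means), and the synthetic data and the 99th-percentile threshold are hypothetical.

```python
import numpy as np

def nearest_centroid_distance(X, centroids):
    """Euclidean distance from each row of X to its closest centroid."""
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).min(axis=1)

def fit_threshold(X_normal, centroids, quantile=0.99):
    """Distance exceeded by only (1 - quantile) of the normal training data."""
    return np.quantile(nearest_centroid_distance(X_normal, centroids), quantile)

def is_anomalous(X, centroids, threshold):
    return nearest_centroid_distance(X, centroids) > threshold

rng = np.random.default_rng(0)
# Hypothetical normal behavior: two groups of 2-D feature vectors around fixed centroids.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
X_normal = np.vstack([rng.normal(c, 1.0, size=(500, 2)) for c in centroids])
threshold = fit_threshold(X_normal, centroids)
X_new = np.array([[0.1, -0.2], [10.3, 9.8], [5.0, 5.0], [30.0, -4.0]])
flags = is_anomalous(X_new, centroids, threshold)
print(flags)
```

The quantile sets the trade-off discussed in the research issues above: a lower quantile catches more deviations but raises more false alarms on normal traffic.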

Incremental learning and dynamism

In our framework, this layer is concerned with finalizing the resulting security model by incorporating additional intelligence according to the needs. This is possible through further processing in several modules. For instance, the post-processing and improvement module in this layer could play a role in simplifying the extracted knowledge according to particular requirements by incorporating domain-specific knowledge. As attack classification or prediction models based on machine learning techniques strongly rely on the training data, they can hardly be generalized to other datasets, which could be a significant limitation for some applications. To address such limitations, this module is responsible for utilizing domain knowledge in the form of a taxonomy or ontology to improve attack correlation in cybersecurity applications.
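A minimal sketch of how a taxonomy can support attack correlation: each alert type points to a parent category, and two alert types are treated as related if they share a category, with the closest shared category returned. The taxonomy below is a small hypothetical example; an ontology would typically encode richer relations than a single parent link, and this code assumes the taxonomy contains no cycles.

```python
# Hypothetical attack taxonomy: each alert type or category points to its parent.
TAXONOMY = {
    "syn-flood": "dos", "udp-flood": "dos", "dos": "availability-attack",
    "sql-injection": "injection", "xss": "injection", "injection": "web-attack",
}

def ancestors(node):
    """Return the chain of categories from node up to its taxonomy root."""
    chain = [node]
    while node in TAXONOMY:
        node = TAXONOMY[node]
        chain.append(node)
    return chain

def correlated(alert_a, alert_b):
    """Return the closest category shared by two alert types, or None if unrelated."""
    b_chain = set(ancestors(alert_b))
    for category in ancestors(alert_a):
        if category in b_chain:
            return category
    return None

print(correlated("syn-flood", "udp-flood"))
```

Two alert types never seen together in training data can still be linked through the taxonomy, which is how domain knowledge can compensate for the limited generalization of purely data-driven models.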

Another significant module, recency mining and security model updating, is responsible for keeping the security model up to date for better performance by extracting the latest data-driven security patterns. The knowledge extracted in the earlier layer is based on a static initial dataset and reflects the overall patterns in that dataset. However, such knowledge may not guarantee high performance in several cases, because incremental security data carry recent patterns. In many cases, such incremental data may contain different patterns that conflict with existing knowledge. Thus, applying the concept of RecencyMiner [ 170 ] to incremental security data to extract new patterns can be more effective than relying on the existing old patterns. The reason is that recent security patterns and rules are more likely to be significant than older ones for predicting cyber risks or attacks. Rather than processing the whole security data again, recency-based dynamic updating according to the new patterns would be more efficient in terms of processing and outcome. This could make the resulting cybersecurity model intelligent and dynamic. Finally, the response planning and decision-making module is responsible for making decisions based on the extracted insights and taking the necessary actions to protect the system from cyber-attacks, thereby providing automated and intelligent services. The services might differ depending on the particular requirements of a given security problem.
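Updating a model without reprocessing all data can be illustrated with the simplest possible "model": a running mean and variance of one feature, maintained with Welford's online algorithm. This is a generic incremental-learning example, not RecencyMiner [ 170 ]; it could be combined with time decay to favor recent behavior, and the feature values below are hypothetical.

```python
import math

class IncrementalProfile:
    """Running mean and variance of one feature, updated one observation at a time (Welford)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two observations.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    def zscore(self, x):
        return (x - self.mean) / math.sqrt(self.variance)

profile = IncrementalProfile()
for x in [2, 4, 4, 4]:   # initial batch, e.g. requests per second for one host
    profile.update(x)
for x in [5, 5, 7, 9]:   # a later batch; the first batch is not reprocessed
    profile.update(x)
print(profile.mean, profile.variance, profile.zscore(9))
```

Each update costs constant time and memory, whereas recomputing the statistics from scratch would require storing and rescanning every past observation.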

Overall, this framework is a generic description which potentially can be used to discover useful insights from security data, to build smart cybersecurity systems, to address complex security challenges, such as intrusion detection, access control management, detecting anomalies and fraud, or denial of service attacks, etc. in the area of cybersecurity data science.

Although several research efforts have been directed towards cybersecurity solutions in different directions, as discussed in the “Background”, “Cybersecurity data science”, and “Machine learning tasks in cybersecurity” sections, this paper presents a comprehensive view of cybersecurity data science. For this, we conducted a literature review to understand cybersecurity data, various defense strategies including intrusion detection techniques, and the different types of machine learning techniques used in cybersecurity tasks. Based on our discussion of existing work, we identified several research issues related to security datasets, data quality problems, policy rule generation, learning methods, data protection, feature engineering, security alert generation, recency analysis, etc. that require further research attention in the domain of cybersecurity data science.

The scope of cybersecurity data science is broad. Several data-driven tasks, such as intrusion detection and prevention, access control management, security policy generation, anomaly detection, spam filtering, fraud detection and prevention, and the detection of and defense against various types of malware attacks, can be considered within the scope of cybersecurity data science. Such task-based categorization could be helpful for security professionals, including researchers and practitioners, who are interested in the domain-specific aspects of security systems [ 171 ]. The output of cybersecurity data science can be used in many application areas, such as Internet of Things (IoT) security [ 173 ], network security [ 174 ], cloud security [ 175 ], mobile and web applications [ 26 ], and other relevant cyber areas. Moreover, intelligent cybersecurity solutions are important for the banking industry, the healthcare sector, and the public sector, where data breaches typically occur [ 36 , 176 ]. Besides, data-driven security solutions could also be effective in AI-based blockchain technology, where AI works with huge volumes of security event data to extract useful insights using machine learning techniques, and blockchain serves as a trusted platform to store such data [ 177 ].

Although in this paper we discuss cybersecurity data science with a focus on moving from raw security data to data-driven decision making for intelligent security solutions, it could also be related to big data analytics in terms of data processing and decision making. Big data deals with datasets that are too large or complex, characterized by high data volume, velocity, and variety. Big data analytics mainly consists of two parts: data management, involving data storage, and analytics [ 178 ]. Analytics typically describes the process of analyzing such datasets to discover patterns, unknown correlations, rules, and other useful insights [ 179 ]. Thus, several advanced data analysis techniques, such as AI, data mining, and machine learning, could play an important role in processing big data by converting big problems into small problems [ 180 ]. To do this, strategies such as parallelization, divide-and-conquer, incremental learning, sampling, granular computing, and feature or instance selection can be used to make better decisions, reduce costs, or enable more efficient processing. In such cases, the concept of cybersecurity data science, particularly machine learning-based modeling, could be helpful for process automation and decision making for intelligent security solutions. Moreover, researchers could consider modified algorithms or models for handling big data on parallel computing platforms such as Hadoop and Storm [ 181 ].
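The divide-and-conquer strategy can be illustrated with a map-reduce style aggregation in plain Python: the event stream is split into chunks, each chunk is summarized independently (the map step), and the partial results are merged (the reduce step). The chunks are processed sequentially here for simplicity; because each map call is independent, they could in principle be distributed across workers, which is the idea behind the MapReduce model used by Hadoop. The event format is hypothetical, and this code does not use the API of any such platform.

```python
from collections import Counter
from functools import reduce

def chunked(seq, size):
    """Split a sequence into consecutive chunks of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def map_count(chunk):
    """Map step: count event types within one chunk (independent of other chunks)."""
    return Counter(event["type"] for event in chunk)

def reduce_counts(a, b):
    """Reduce step: merge partial counts; addition is associative, so merge order does not matter."""
    return a + b

# Hypothetical security event stream.
events = [{"type": t} for t in ["login", "scan", "login", "malware", "scan", "login"] * 1000]
partials = [map_count(c) for c in chunked(events, 500)]   # sequential here; parallelizable in principle
total = reduce(reduce_counts, partials, Counter())
print(total)
```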

Based on the concept of cybersecurity data science discussed in this paper, future work could involve building a data-driven security model for a particular security problem, empirically evaluating the effectiveness and efficiency of the model, and assessing its usability in real-world application domains.

Motivated by the growing significance of cybersecurity, data science, and machine learning technologies, in this paper we have discussed how cybersecurity data science applies to data-driven intelligent decision making in smart cybersecurity systems and services. We have also discussed how it can impact security data, both in terms of extracting insight into security incidents and the dataset itself. We aimed to work on cybersecurity data science by discussing the state of the art concerning security incident data and the corresponding security services. We also discussed how machine learning techniques can have an impact in the domain of cybersecurity, and examined the security challenges that remain. In terms of existing research, much focus has been placed on traditional security solutions, with less available work on machine learning technique-based security systems. For each common technique, we have discussed relevant security research. The purpose of this article is to share an overview of the conceptualization, understanding, modeling, and thinking about cybersecurity data science.

We have further identified and discussed various key issues in security analysis to signpost future research directions in the domain of cybersecurity data science. Based on this knowledge, we have also provided a generic multi-layered framework for a cybersecurity data science model based on machine learning techniques, where data are gathered from diverse sources and the analytics incorporate the latest data-driven patterns to provide intelligent security services. The framework consists of several main phases: security data collecting, security data preparing, machine learning-based security modeling, and incremental learning and dynamism for smart cybersecurity systems and services. We specifically focused on extracting insights from security data, setting out a research design with particular attention to concepts for data-driven intelligent security solutions.

Overall, this paper aimed not only to discuss cybersecurity data science and relevant methods but also to discuss the applicability towards data-driven intelligent decision making in cybersecurity systems and services from machine learning perspectives. Our analysis and discussion can have several implications both for security researchers and practitioners. For researchers, we have highlighted several issues and directions for future research. Other areas for potential research include empirical evaluation of the suggested data-driven model, and comparative analysis with other security systems. For practitioners, the multi-layered machine learning-based model can be used as a reference in designing intelligent cybersecurity systems for organizations. We believe that our study on cybersecurity data science opens a promising path and can be used as a reference guide for both academia and industry for future research and applications in the area of cybersecurity.

Availability of data and materials

Not applicable.

Abbreviations

AI: Artificial Intelligence

ICT: Information and communication technology

IoT: Internet of Things

DDoS: Distributed Denial of Service

IDS: Intrusion detection system

IPS: Intrusion prevention system

HIDS: Host-based intrusion detection systems

NIDS: Network Intrusion Detection Systems

SIDS: Signature-based intrusion detection system

AIDS: Anomaly-based intrusion detection system


Acknowledgements

The authors would like to thank all the reviewers for their rigorous review and comments over several revision rounds. The reviews were detailed and helped to improve and finalize the manuscript. The authors are highly grateful to them.

Author information

Authors and affiliations

Swinburne University of Technology, Melbourne, VIC, 3122, Australia

Iqbal H. Sarker

Chittagong University of Engineering and Technology, Chittagong, 4349, Bangladesh

La Trobe University, Melbourne, VIC, 3086, Australia

A. S. M. Kayes, Paul Watters & Alex Ng

University of Nevada, Reno, USA

Shahriar Badsha

Macquarie University, Sydney, NSW, 2109, Australia

Hamed Alqahtani


Contributions

This article not only discusses cybersecurity data science and relevant methods but also discusses their applicability to data-driven intelligent decision making in cybersecurity systems and services. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Iqbal H. Sarker.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Sarker, I.H., Kayes, A.S.M., Badsha, S. et al. Cybersecurity data science: an overview from machine learning perspective. J Big Data 7, 41 (2020). https://doi.org/10.1186/s40537-020-00318-5


Received: 26 October 2019

Accepted: 21 June 2020

Published: 01 July 2020

DOI: https://doi.org/10.1186/s40537-020-00318-5


What Is Database Security?

Database security includes a variety of measures used to secure database management systems from malicious cyber-attacks and illegitimate use. Database security programs are designed to protect not only the data within the database, but also the data management system itself, and every application that accesses it, from misuse, damage, and intrusion.

Database security encompasses tools, processes, and methodologies which establish security inside a database environment.

Database Security Threats

Many software vulnerabilities, misconfigurations, and patterns of misuse or carelessness can result in breaches. Here are some of the best-known causes and types of database security threats.

Insider Threats

An insider threat is a security risk from one of the following three sources, each of which has privileged means of entry to the database:

An insider threat is one of the most common causes of database security breaches, and it often occurs because too many employees have been granted privileged user access.


Human Error

Weak passwords, password sharing, accidental erasure or corruption of data, and other undesirable user behaviors are still the cause of almost half of data breaches reported.

Exploitation of Database Software Vulnerabilities

Attackers constantly attempt to isolate and target vulnerabilities in software, and database management software is a highly valuable target. New vulnerabilities are discovered daily, and all open source database management platforms and commercial database software vendors issue security patches regularly. However, if you don’t apply these patches quickly, your database might be exposed to attack.

Even if you do apply patches on time, there is always the risk of zero-day attacks, in which attackers exploit a vulnerability that the database vendor has not yet discovered and patched.


SQL/NoSQL Injection Attacks

A database-specific threat involves inserting arbitrary SQL or non-SQL (NoSQL) attack strings into database queries. Typically, these are queries built from web application form input or received via HTTP requests. Any database system is vulnerable to these attacks if developers do not adhere to secure coding practices and the organization does not carry out regular vulnerability testing.
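As a minimal illustration of the secure coding practice referred to above, the sketch below uses Python's built-in `sqlite3` module and a hypothetical `users` table. It contrasts a query built by string concatenation with a parameterized query. The same principle (never splice untrusted input into query text) applies to other SQL drivers and, through driver-specific APIs, to NoSQL query builders.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical 'users' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: the input is pasted into the SQL text, so a value such as
    # "x' OR '1'='1" rewrites the WHERE clause and matches every row.
    query = "SELECT name, email FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # SAFE: the ? placeholder passes the value separately from the SQL text,
    # so the driver always treats it as data, never as SQL syntax.
    return conn.execute(
        "SELECT name, email FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(len(find_user_unsafe(conn, payload)))  # 2: every row leaks
    print(len(find_user_safe(conn, payload)))    # 0: no user has that literal name
```

Parameterization removes the injection channel itself, which is why it is usually preferred over trying to escape or filter dangerous characters by hand.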

Buffer Overflow Attacks

A buffer overflow occurs when a process tries to write more data to a fixed-length block of memory than the block can hold. Attackers can use the excess data, which spills into adjacent memory addresses, as a starting point for launching attacks.

Denial of Service (DoS/DDoS) Attacks

In a denial of service (DoS) attack, the cybercriminal overwhelms the target service (in this instance, the database server) with a large number of fake requests. As a result, the server cannot process genuine requests from actual users and often crashes or becomes unstable.

In a distributed denial of service (DDoS) attack, fake traffic is generated by a large number of computers participating in a botnet controlled by the attacker. This generates very large traffic volumes, which are difficult to stop without a highly scalable defensive architecture. Cloud-based DDoS protection services can scale up dynamically to address very large DDoS attacks.

Malware

Malware is software written to exploit vulnerabilities or otherwise cause harm to a database. It can arrive through any endpoint device connected to the database’s network. Malware protection is important on every endpoint, but especially on database servers, because of their high value and sensitivity.

An Evolving IT Environment

The evolving IT environment is making databases more susceptible to threats. Here are trends that can lead to new types of attacks on databases, or may require new defensive measures:

How Can You Secure Your Database Server?

A database server is a physical or virtual machine running the database. Securing a database server, also known as “hardening”, is a process that includes physical security, network security, and secure operating system configuration.


Ensure Physical Database Security

Refrain from sharing a server for web applications and database applications, if your database contains sensitive data. Although it could be cheaper, and easier, to host your site and database together on a hosting provider, you are placing the security of your data in someone else’s hands.

If you do rely on a web hosting service to manage your database, you should ensure that it is a company with a strong security track record. It is best to stay clear of free hosting services due to the possible lack of security.

If you manage your database in an on-premise data center, keep in mind that your data center is also prone to attacks from outsiders or insider threats. Ensure you have physical security measures, including locks, cameras, and security personnel in your physical facility. Any access to physical servers must be logged and only granted to authorized individuals.

In addition, do not leave database backups in locations that are publicly accessible, such as temporary partitions, web folders, or unsecured cloud storage buckets.

Lock Down Accounts and Privileges

Let’s consider the Oracle database server. After the database is installed, the Oracle database configuration assistant (DBCA) automatically expires and locks most of the default database user accounts.

If you install an Oracle database manually, this doesn’t happen, and default privileged accounts are not expired or locked. By default, their passwords are the same as their usernames, and an attacker will try these credentials first when connecting to the database.

It is critical to ensure that every privileged account on a database server is configured with a strong, unique password. If accounts are not needed, they should be expired and locked.
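The sketch below shows one way to turn such an audit into action. It assumes a hypothetical inventory (the `Account` records) that a DBA has already compiled from the DBMS catalog. It then emits Oracle-style `ALTER USER` statements that expire and lock accounts that are not needed, and replaces default passwords with random ones generated by Python's `secrets` module.

```python
import secrets
import string
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical inventory record compiled by a DBA from the DBMS catalog."""
    username: str                # assumed trusted (from the catalog, not user input)
    needed: bool                 # is the account required by any application or person?
    uses_default_password: bool  # e.g. the password still equals the username


def strong_password(length: int = 20) -> str:
    """Generate a random password with a cryptographically secure RNG."""
    # No double quote in the alphabet, so the value is safe inside "..." below.
    alphabet = string.ascii_letters + string.digits + "-_.!%"
    return "".join(secrets.choice(alphabet) for _ in range(length))


def hardening_statements(accounts: list[Account]) -> list[str]:
    """Return Oracle-style statements that lock unneeded accounts and
    replace default passwords on the accounts that remain in use."""
    statements = []
    for acct in accounts:
        if not acct.needed:
            statements.append(
                f"ALTER USER {acct.username} PASSWORD EXPIRE ACCOUNT LOCK;"
            )
        elif acct.uses_default_password:
            statements.append(
                f'ALTER USER {acct.username} IDENTIFIED BY "{strong_password()}";'
            )
    return statements


if __name__ == "__main__":
    inventory = [
        Account("SCOTT", needed=False, uses_default_password=True),
        Account("APP_OWNER", needed=True, uses_default_password=True),
        Account("REPORTING", needed=True, uses_default_password=False),
    ]
    for stmt in hardening_statements(inventory):
        print(stmt)
```

In practice, generated passwords should be delivered through a secure channel such as a password vault rather than printed or logged; the sketch returns them in plain statements only to stay short.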

For the remaining accounts, access has to be limited to the absolute minimum required. Each account should only have access to the tables and operations (for example, SELECT or INSERT) required by the user. Avoid creating user accounts with access to every table in the database.
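Least privilege can be written down as an explicit table of accounts, tables, and operations. The hypothetical helper below turns such a table into `GRANT` statements and refuses wildcard table access or operations outside a small allowlist. The account and table names are illustrative assumptions, and exact `GRANT` syntax varies slightly between database products.

```python
ALLOWED_OPERATIONS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def grant_statements(grants: dict[str, dict[str, set[str]]]) -> list[str]:
    """Build GRANT statements from {account: {table: {operations}}}.

    Refuses wildcard tables and any operation outside ALLOWED_OPERATIONS,
    so no account can be granted blanket access by accident.
    """
    statements = []
    for account, tables in sorted(grants.items()):
        for table, ops in sorted(tables.items()):
            if table == "*" or table.endswith(".*"):
                raise ValueError(f"{account}: wildcard table access is not allowed")
            disallowed = set(ops) - ALLOWED_OPERATIONS
            if disallowed:
                raise ValueError(f"{account}: disallowed operations {sorted(disallowed)}")
            if not ops:
                continue  # nothing to grant on this table
            statements.append(f"GRANT {', '.join(sorted(ops))} ON {table} TO {account};")
    return statements


if __name__ == "__main__":
    policy = {
        "report_user": {"sales.orders": {"SELECT"}},
        "app_user": {"sales.orders": {"SELECT", "INSERT"}, "sales.customers": {"SELECT"}},
    }
    for stmt in grant_statements(policy):
        print(stmt)
```

Keeping the policy as data makes it reviewable and lets a request for `ALL` privileges or a whole schema fail loudly instead of slipping through.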

Regularly Patch Database Servers

Ensure that patches remain current. Effective database patch management is a crucial security practice because attackers are actively seeking out new security flaws in databases, and new viruses and malware appear on a daily basis.

A timely deployment of up-to-date versions of database service packs, critical security hotfixes, and cumulative updates will improve the stability of database performance.
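A simple first step in patch management is knowing which servers lag behind the latest patched release. The sketch below compares dotted version strings numerically. The server names, versions, and minimum version are hypothetical; a real inventory would come from vendor advisories and an asset database.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '19.21.0' into (19, 21, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_patch(installed: str, minimum: str) -> bool:
    """True if the installed version is older than the minimum patched one."""
    return parse_version(installed) < parse_version(minimum)


def patch_report(servers: dict[str, str], minimum: str) -> list[str]:
    """Names of servers (hypothetical inventory) that still need patching."""
    return sorted(name for name, ver in servers.items() if needs_patch(ver, minimum))


if __name__ == "__main__":
    inventory = {"db-prod-1": "19.21.0", "db-prod-2": "19.9.0", "db-test-1": "18.14"}
    print(patch_report(inventory, minimum="19.21.0"))  # ['db-prod-2', 'db-test-1']
```

Comparing the parts as integers matters: compared as plain strings, "19.9.0" would sort after "19.21.0" and an outdated server could be missed.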

Disable Public Network Access

Organizations store their application data in databases. In most real-world scenarios, end users don’t require direct access to the database. Thus, you should block all public network access to database servers unless you are a hosting provider. Ideally, an organization should set up gateway servers (VPN or SSH tunnels) for remote administrators.

Encrypt All Files and Backups

Irrespective of how solid your defenses are, there is always a possibility that a hacker may infiltrate your system. Yet, attackers are not the only threat to the security of your database. Your employees may also pose a risk to your business. There is always the possibility that a malicious or careless insider will gain access to a file they don’t have permission to access.

Encrypting your data makes it unreadable to attackers and employees alike: without the encryption key, they cannot access it, which provides a last line of defense against unwelcome intrusions. Encrypt all important application files, data files, and backups so that unauthorized users cannot read your critical data.

Database Security Best Practices

Here are several best practices you can use to improve the security of sensitive databases.

Actively Manage Passwords and User Access

If you have a large organization, consider automating access management with password management or access management software. Such software gives authorized users a short-term password, carrying only the rights they need, each time they need to access a database.

It also keeps track of the activities completed during that time frame and stops administrators from sharing passwords. Administrators may feel that sharing passwords is convenient, but doing so makes effective database accountability and security almost impossible.
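The short-lived credential model described above can be sketched as a small broker. The `CredentialBroker` class below is hypothetical and is not the API of any particular product. Each checkout produces a fresh random password bound to specific rights and an expiry time, and every checkout and use is appended to an audit log, so access is attributable to an individual rather than to a shared password.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Lease:
    user: str
    password: str
    rights: frozenset[str]
    expires_at: float


@dataclass
class CredentialBroker:
    """Hypothetical broker issuing one-off, time-limited database passwords."""
    ttl_seconds: float = 900.0                 # lease lifetime (15 minutes)
    clock: Callable[[], float] = time.time     # injectable for testing
    audit_log: list[tuple] = field(default_factory=list)
    _leases: dict[str, Lease] = field(default_factory=dict, init=False, repr=False)

    def check_out(self, user: str, rights: set[str]) -> Lease:
        """Issue a fresh random password limited to `rights` and a fixed lifetime."""
        lease = Lease(user, secrets.token_urlsafe(24), frozenset(rights),
                      self.clock() + self.ttl_seconds)
        self._leases[lease.password] = lease
        self.audit_log.append((self.clock(), user, "check_out", sorted(rights)))
        return lease

    def authorize(self, password: str, right: str) -> bool:
        """Allow `right` only for a known, unexpired lease that includes it."""
        lease = self._leases.get(password)
        allowed = (lease is not None
                   and self.clock() < lease.expires_at
                   and right in lease.rights)
        user = lease.user if lease else None
        self.audit_log.append((self.clock(), user, f"use:{right}", allowed))
        return allowed


if __name__ == "__main__":
    broker = CredentialBroker(ttl_seconds=60)
    lease = broker.check_out("dba_alice", {"SELECT"})
    print(broker.authorize(lease.password, "SELECT"))  # True (within 60 s)
    print(broker.authorize(lease.password, "DELETE"))  # False: right not granted
```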

In addition, the following security measures are recommended:

Test Your Database Security

Once you have put in place your database security infrastructure, you must test it against a real threat. Auditing or performing penetration tests against your own database will help you get into the mindset of a cybercriminal and isolate any vulnerabilities you may have overlooked.

To make sure the test is comprehensive, involve ethical hackers or recognized penetration testing services in your security testing. Penetration testers provide extensive reports listing database vulnerabilities, and it is important to quickly investigate and remediate these vulnerabilities. Run a penetration test on a critical database system at least once per year.

Use Real-Time Database Monitoring

Continually scanning your database for breach attempts increases your security and lets you rapidly react to possible attacks.

In particular, File Integrity Monitoring (FIM) can help you log all actions carried out on the database server and alert you to potential breaches. When FIM detects a change to important database files, ensure security teams are alerted and able to investigate and respond to the threat.
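At its core, file integrity monitoring compares cryptographic digests of monitored files against a trusted baseline. The minimal sketch below uses Python's `hashlib` and `pathlib`; which files to monitor, and how alerts reach the security team, are left as assumptions. Production FIM tools also record who made a change and protect the baseline itself from tampering.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths) -> dict[str, str]:
    """Map each monitored file that currently exists to its SHA-256 digest."""
    return {str(p): sha256_of(Path(p)) for p in paths if Path(p).is_file()}


def compare(baseline: dict[str, str], current: dict[str, str]):
    """Return (added, removed, modified) paths between two snapshots."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    modified = sorted(p for p in baseline.keys() & current.keys()
                      if baseline[p] != current[p])
    return added, removed, modified
```

A typical loop takes the baseline at a known-good moment, re-runs `snapshot` on a schedule, and raises an alert whenever any of the three lists returned by `compare` is non-empty.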

Use Web Application and Database Firewalls

You should use a firewall to protect your database server from database security threats. By default, the firewall should deny all traffic. It should also stop your database from initiating outbound connections unless there is a specific reason for doing so.
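The default-deny principle can be illustrated with a tiny rule evaluator. In this sketch, the single rule, the application subnet `10.0.1.0/24`, and port 5432 (PostgreSQL's default) are hypothetical. Because there are no outbound rules, every connection the database server tries to initiate is refused.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    direction: str  # "in" (to the database server) or "out" (from it)
    network: str    # permitted remote network, CIDR notation
    port: int       # destination port


# Hypothetical policy: only the application subnet may reach the database
# listener. There are no "out" rules, so outbound connections are all denied.
RULES = (Rule("in", "10.0.1.0/24", 5432),)


def permitted(direction: str, remote_ip: str, port: int, rules=RULES) -> bool:
    """Allow traffic only if an explicit rule matches; otherwise deny."""
    addr = ipaddress.ip_address(remote_ip)
    for rule in rules:
        if (rule.direction == direction
                and rule.port == port
                and addr in ipaddress.ip_network(rule.network)):
            return True
    return False  # default deny: anything not explicitly allowed is blocked
```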

In addition to protecting the database with a firewall, you should deploy a web application firewall (WAF). This is because attacks aimed at web applications, including SQL injection, can be used to gain illicit access to your databases.

A database firewall will not stop most web application attacks, because traditional firewalls operate at the network layer, while web application attacks operate at the application layer (layer 7 of the OSI model). A WAF operates at layer 7 and can detect malicious web application traffic, such as SQL injection attacks, and block it before it harms your database.
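To show what inspecting layer 7 means, the toy sketch below decodes the query string of a request URL (application data that a network-layer firewall never looks at) and checks the values against a few crude SQL injection patterns. This is only an illustration under that assumption: commercial WAFs rely on far more thorough parsing, normalization, and scoring, and signature lists like this one are easy to evade.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Toy signatures for illustration only; real WAFs go far beyond this.
SQLI_PATTERNS = (
    re.compile(r"'\s*or\s*'?\d*'?\s*=\s*'?\d*", re.IGNORECASE),  # ' OR '1'='1
    re.compile(r"\bunion\s+(all\s+)?select\b", re.IGNORECASE),   # UNION SELECT
    re.compile(r";\s*drop\s+table\b", re.IGNORECASE),            # ; DROP TABLE
)


def looks_like_sqli(url: str) -> bool:
    """Decode the query string (application-layer data a network firewall
    never inspects) and flag values matching a crude SQL injection pattern."""
    params = parse_qs(urlsplit(url).query)
    values = [v for vals in params.values() for v in vals]
    return any(p.search(v) for p in SQLI_PATTERNS for v in values)
```

A WAF in front of the application would run checks of this kind (much more robustly) before the request ever reaches code that talks to the database; it complements parameterized queries rather than replacing them.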

Imperva Database Security

Imperva provides an industry-leading Web Application Firewall, which can prevent web application attacks that affect databases, including SQL injection. We also provide file integrity monitoring (FIM) and file security technology, defending sensitive files from cybercriminals and malicious insiders.

In addition, Imperva protects all cloud-based data stores to ensure compliance and preserve the agility and cost benefits you get from your cloud investments:

Cloud Data Security – Simplify securing your cloud databases to catch up and keep up with DevOps. Imperva’s solution enables cloud-managed services users to rapidly gain visibility and control of cloud data.

Database Security – Imperva delivers analytics, protection, and response across your data assets, on-premise and in the cloud – giving you the risk visibility to prevent data breaches and avoid compliance incidents. Integrate with any database to gain instant visibility, implement universal policies, and speed time to value.

Data Risk Analysis – Automate the detection of non-compliant, risky, or malicious data access behavior across all of your databases enterprise-wide to accelerate remediation.


Data Security Research Paper Samples for Students


Risk Management Research Paper Examples

Boss, I Think Someone Stole Our Customer Data: Research Paper

“Boss, I Think Someone Stole Our Customer Data” presents four potential solutions to the problem of Flayton Electronics’ recent security leak, in which thousands of customers had their information leaked and abused because a downed firewall left them vulnerable. The company must determine what step to take next; this paper addresses the most effective of the four commentaries and the project management plan that arises from it.

Free Research Paper About Cost Implication Of Social Media On Personal Data Security


Privacy Laws In Information Technology Research Paper Examples

Information pervasiveness, along with all its benefits, brings concerns with respect to security issues. Data is no longer hidden behind the walls of companies. It does not reside only on mainframes physically isolated within an organization where all kind of physical security measures are taken to defend the data and the systems. Systems are increasingly open and interconnected, which poses new challenges for security technologies.

Good Research Paper On The Literature Review Part 2


As organizational needs expand beyond current limits, companies seek flexible, cost-effective, and proven ways of providing their services without compromising security. Cloud computing presents businesses with a massive opportunity for growth and for efficient consumer IT service delivery over the internet. However, the added risk that comes from combining various technologies results in security and privacy issues. This paper explores the main vulnerabilities and threats highlighted in the cloud computing literature as the basis for developing countermeasures.


AnyFreePapers

Research Paper on Database Security


Database security includes several technologies to ensure the protection of databases and applications, servers, and systems that use databases. To implement the comprehensive protection, many different tools are used, starting with the physical security and ending with antivirus software.

Among the most common threats to the functioning of database systems are unauthorized access to and use of data, damage caused by malicious software, hacker attacks intended to temporarily disable the system, technical problems with database equipment, physical damage to equipment from natural disasters, power supply failures, errors in system design, software failures, and problems related to the human factor.


Database systems are usually equipped with serious protection to prevent any unauthorized access to system data and management. A number of specialized software tools with fairly high efficiency exist for this purpose. However, because these systems are basically repositories of diverse information provided for general or limited access, the main feature such protection requires is flexibility.

Some companies take a comprehensive approach to protecting their database systems and develop methods of universal protection. Such complex security is primarily a standardization of premises, equipment, and software. It also involves developing a strict procedure for personnel access to protected data.

Information is the most valuable product of our time: whoever owns the information owns the world. Under these conditions, data protection is of paramount importance. Huge amounts of money are spent on database security, but the effectiveness of the means used is far from absolute. Students interested in this topic should give full consideration to how these systems function and carefully explore the technologies for protecting them. Young researchers need to see clearly how effective a particular approach to information security is. Students should carefully consider the flexibility of software protection and explain the controversial aspects of the defense.

To write a research paper on database security, you must study the subject of your investigation thoroughly. The topic is too complex to take up without a solid foundation, so first study the subject of your research in depth, and only then present your opinions on it. This must be done skillfully: it is not enough to present the results of your research; you need persuasive arguments that your ideas deserve attention and that the subject of your work is genuinely relevant. Studying examples of research papers on database security can help you understand the basic principles of an interesting and detailed scientific text.



A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity. © Copyright 2023 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.


Data Security Research Paper

Health EHR Policy: Pros and Cons

Health information technology can advance the health of individuals and help providers improve quality and achieve cost savings in patient care. In 2009, Congress passed, and then-President Obama signed into law, the Health Information Technology for Economic and Clinical Health (HITECH) Act as part of the American Recovery and Reinvestment Act (Buntin, Burke, Hoaglin and Blumenthal, 2011). Authorized by the HITECH Act, the Office of the National Coordinator for Health Information Technology (ONC) has worked on health IT. The Health IT created legislation and regulations to provide requirements and certification criteria that the EHRs must meet to ensure health care

Strengths And Weaknesses Of HIPAA System

Information security is the process of protecting information against unauthorized access, disclosure, disruption, modification, use, or destruction. In other words, information security means defending information in whatever form the data may take. Although every organization employs information security to protect its secret data, security breaches or identity theft may still take place. A security breach means illegal access to defined categories of personal information; in other words, illegal access to personal information in order to use, destroy, or amend it (Cate, 2008, p.4).

Hrm/531 Week 1 Data Analysis Paper

Many people in the company need access to data to help them do their job better. The main questions revolve around who needs what data, and who chooses what data gets to be shared. Looking at all the pieces, as well as the IT and information assets, the governance of the data belongs to a data owner (Khatri & Brown, 2010). The main questions to be answered must include who is the data owner? Who is responsible for data quality? And who enforces the data access controls? (Brown, 2012).

The Benefits Of The Fourth Amendment

The Fourth Amendment allows the NSA to conduct searches of phone records to find evidence of a crime. The NSA recently went to Apple to try to access suspects' phone records, although this requires a court order. Some of the most common requests for phone files seek clues in robberies and kidnappings or serve suicide prevention. President George W. Bush signed the USA PATRIOT Act, which gave the government greater access to telephone and other communications. The NSA was also conducting wiretaps and surveillance. The increased use of computers, phones and other electronics has led to more electronic crime, which the NSA can uncover under the USA PATRIOT Act. Law enforcement can also get access to hard drives and emails. Although, the fourth

Information that the company holds on any service users, staff or other professionals (such as private contact numbers, or information in client files relating to third parties) is confidential and should not be shared with anyone, as it falls under the Data Protection Act. All information the company holds about any staff member or service user is confidential and cannot be shared outside the company unless the individual consents. The company has many procedures that must be followed to keep this information confidential. Managers keep staff files and information such as contact details, supervision notes and emergency contact details in locked cabinets, and service users' files and information are stored in lockable cabinets that only staff can access.

The HIPAA Act Of 1996: Protecting Patient Confidentiality

Healthcare providers and organizations are obligated by laws and regulations to protect patient confidentiality. Patient information may only be disclosed to those directly involved in the patient’s care or those the patient identifies as able to receive the information. The HIPAA Act of 1996 is the federal law mandating that healthcare organizations and clinicians safeguard patients’ medical information. This law works with the Health Information Technology for Economic and Clinical Health Act to include security standards for protecting electronic health information. The healthcare organization is legally responsible for establishing procedures to prevent data

Biometric Safe Research Paper

Looking for the best biometric safe to keep your guns and valuables secure? Finding the best biometric gun safe can be difficult, especially if you don't know where to start or don't know much about the various aspects of biometric safes. Here are some tips for first choosing the aspects that are most important to you and then finding the best biometric safe to fit them.

Universal Patient Identifier

Universal patient identifiers can safely improve efficiency in connecting patients to their healthcare records. Although many patients escape harm from adverse events caused by misidentification in existing patient-matching technology, misidentification can still have significant financial ramifications for hospital systems. “Denied claims can become a huge waste of time and money for any practice manager; per a recent MGMA Connection article the average cost to rework a claim is $25. When you multiply that cost by dozens of denied claims, it quickly adds up”. (Taufen, A., MA., 2014). Moreover, healthcare organizations risk squandering money due to patient misidentification, consequently resulting in claim

Patient Confidentiality

Technology has become an essential part of everyday life; therefore, it makes sense for doctors and hospitals to replace old-fashioned paper charting with technology for accessing patient records. Electronic health records (EHR) provide quick access to information, as doctors no longer have to wait for other providers to fax previous records to them. The accessibility of electronic health records helps medical providers make quick medical care decisions by giving them access to previous care provided to patients, including treatment and diagnosis. Quick access to information through EHR enables health care providers to treat patients faster, as there is no need for records to be mailed or

Secure Attachment Research Paper

Different forms of attachment develop between mother and infant at around 8 or 9 months of age. There are four types of attachment: secure, avoidant, resistant, and disorganized. Secure attachment occurs in about 60-65% of babies in the United States. When the mother leaves, the baby may or may not cry, but when she comes back, all the infant wants to do is be with her. It is the infant’s way of saying that it missed its mother and that everything is okay now that she is back. A positive aspect of secure attachment is that children interact positively with their peers, have better friendships, and have fewer conflicts. Avoidant attachment occurs in about

Ethical Use Of Information Technology Essay

There are many privacy areas impacted by the use of technology. These areas include:

The Information Commissioner's Office (ICO)

In a health and social care setting, protecting sensitive information is vital to good care practice. It is the duty of employers to ensure that their policies and procedures adequately cover data protection and meet the Care Quality Commission standards. The laws that should be followed are the Data Protection Act 1998 and the Freedom of Information Act 2000. The Information Commissioner's Office (ICO) deals primarily with breaches of information should they occur. Below is a description of the Data Protection Act and the Freedom of Information Act.

Pros And Cons Of Internet Privacy

While the Internet-based economy provides many benefits, it also raises new concerns for maintaining the privacy of information. “Internet privacy is the privacy and security level of personal data published via the Internet. It is a broad term that refers to a variety of factors, techniques and technologies used to protect sensitive and private data, communications, and preferences.”[1]

Persuasive Essay On Information Technology

Technology is growing at a fast pace, and every day we see a new product or service become available. Often it is hard even to keep up with the latest phone, computer, game console, or software. There are so many different gadgets to choose from, and even the internet suffers from information overload. As a result, we can no longer truly expect to have privacy. However, does all this new technology really benefit us? Will we allow technology to overtake our world? We can already see the ramifications of so much technology: adults and children have become sedentary, which is affecting their health. On the other hand, we can also see all the good technology can do.


Cybersecurity


Cybersecurity Award 2023

Call for Nominations: Cybersecurity Award 2023

The Cybersecurity Award is presented to authors whose work represents outstanding and groundbreaking research in all essential aspects of cybersecurity. The award will be bestowed upon three distinguished papers focused on the following perspectives:

Track A: Best Theoretical Research Paper
Track B: Best Practical Research Paper
Track C: Best Machine Learning and Security Paper

The award carries a USD 1500 prize for every winning paper and comes with a statue and certificate to commemorate.


An ensemble deep learning based IDS for IoT using Lambda architecture

Authors: Rubayyi Alghamdi and Martine Bellaiche

Cancelable biometric schemes for Euclidean metric and Cosine metric

Authors: Yubing Jiang, Peisong Shen, Li Zeng, Xiaojie Zhu, Di Jiang and Chi Chen

RBFK cipher: a randomized butterfly architecture-based lightweight block cipher for IoT devices in the edge computing environment

Authors: Sohel Rana, M. Rubaiyat Hossain Mondal and Joarder Kamruzzaman

Tackling imbalanced data in cybersecurity with transfer learning: a case with ROP payload detection

Authors: Haizhou Wang, Anoop Singhal and Peng Liu

Practical autoencoder based anomaly detection by using vector reconstruction error

Authors: Hasan Torabi, Seyedeh Leili Mirtaheri and Sergio Greco

Most recent articles

Survey of intrusion detection systems: techniques, datasets and challenges

Authors: Ansam Khraisat, Iqbal Gondal, Peter Vamplew and Joarder Kamruzzaman

Fuzzing: a survey

Authors: Jun Li, Bodong Zhao and Chao Zhang

Review and insight on the behavioral aspects of cybersecurity

Authors: Rachid Ait Maalem Lahcen, Bruce Caulkins, Ram Mohapatra and Manish Kumar

Detecting telecommunication fraud by understanding the contents of a call

Authors: Qianqian Zhao, Kai Chen, Tongxin Li, Yi Yang and XiaoFeng Wang

A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation strategy, attacks, public datasets and challenges

Authors: Ansam Khraisat and Ammar Alazab

Most accessed articles

Thematic Series

Data-Driven Security Edited by: Yang Liu, Xinming Ou, Xinyu Xing, Guozhu Meng

Data Security and Privacy Edited by: Dan Lin, Jingqiang Lin and Bo Luo

Information Abuse Prevention Edited by: Gang Li and Jianlong Tan

2018 System Security Edited by: Peng Liu

AI and Security Edited by: Xiaofeng Wang

Why submit to us

• 1st open access journal on Cybersecurity
• APC fully covered by IIE, CAS
• Served by a dedicated international editorial board to give thorough, swift editorial responses

Aims and scope

This journal aims to systematically cover all essential aspects of cybersecurity, with a focus on reporting on cyberspace security issues, the latest research results, and real-world deployment of security technologies.

The journal publishes research articles and reviews in the areas including, but not limited to:

• Cryptography and its applications
• Network and critical infrastructure security
• Hardware security
• Software and system security
• Cybersecurity data analytics
• Data-driven security and measurement studies
• Adversarial reasoning
• Malware analysis
• Privacy-enhancing technologies and anonymity
• IoT security
• AI security

Call for Papers

Thematic Series: Security and Safety of Autonomous Driving Systems
Submission Due: February 28, 2023
Guest Editors: Yinxing Xue (University of Science and Technology of China, China); Yuqun Zhang (Southern University of Science and Technology, China); Xi Zheng (Macquarie University, Australia)

Thematic Series: New Advanced Techniques in Secure Data Analysis
Submission Due: February 28, 2023
Guest Editors: Zheli Liu (Nankai University, China); Yang Xiang (Swinburne University of Technology, Australia); Lingyu Wang (Concordia University, Canada)

Upcoming Events

NDSS Symposium 2023 (27 Feb–3 March 2023, San Diego, CA, USA)
32nd USENIX Security Symposium (9–11 August 2023, Anaheim, CA, USA)

Editor-in-Chief: MENG Dan

Full Professor at the Institute of Information Engineering (IIE), Chinese Academy of Sciences (CAS). His work focuses on network and system security and parallel distributed processing. He has led important research projects, including the Dawning supercomputers, the National Science and Technology Major Project, the National High Technology Research and Development Program of China, and the Strategic Priority Research Program of CAS. He has published over one hundred peer-reviewed papers. He is the director of IIE, having previously served as deputy director of IIE and as deputy director of the High Technology Research and Development Bureau of CAS.

Executive Editor-in-Chief: LIU Peng

LIU Peng received his BS and MS degrees from the University of Science and Technology of China and his PhD from George Mason University in 1999. Dr. Liu is a Professor of Information Sciences and Technology, founding Director of the Center for Cyber-Security, Information Privacy, and Trust, and founding Director of the Cyber Security Lab at Penn State University. His research interests span all areas of computer and network security. He has published a monograph and over 260 refereed technical papers. His research has been sponsored by NSF, ARO, AFOSR, DARPA, DHS, DOE, AFRL, NSA, TTC, CISCO, and HP. He has served as program (co-)chair or general (co-)chair for over 10 international conferences (e.g., Asia CCS 2010) and workshops (e.g., MTD 2016). He chaired the Steering Committee of SECURECOMM during 2008-14. He has served on over 100 program committees and reviewed papers for numerous journals. He is an associate editor for IEEE TDSC. He is a recipient of the DOE Early Career Principal Investigator Award. He co-led the effort to make Penn State an NSA-certified National Center of Excellence in Information Assurance Education and Research. He has advised or co-advised over 30 PhD dissertations to completion.

Affiliated with

The Institute of Information Engineering (IIE) is a national research institute in Beijing that specializes in comprehensive research on theories and applications related to information technology.

IIE strives to be a leading global academic institution by creating first-class research platforms and attracting top researchers. It also seeks to become an important national strategic power in the field of information technology.

IIE’s mission is to promote China’s innovation and industrial competitiveness by advancing information science, standards, and technology in ways that enhance economic security and public safety as well as improve our quality of life.

Annual Journal Metrics

Citation impact: 1.740 Source Normalized Impact per Paper (SNIP); 1.242 SCImago Journal Rank (SJR); 6.1 CiteScore

Speed: 11 days to first decision for all manuscripts (median); 51 days to first decision for reviewed manuscripts only (median)

Usage: 284,555 downloads (2022); 223 Altmetric mentions (2022)

COMMENTS

  1. Database Security: An Overview and Analysis of Current Trend

  2. Database security

    In this paper, we first survey the most relevant concepts underlying the notion of database security and summarize the most well-known techniques. We focus on access control systems, on which a large body of research has been devoted, and describe the key access control models, namely, the discretionary and mandatory access control models, and ...

  3. Security Analysis, Threats, & Challenges in Database

    This research paper assesses existing explorations and research challenges in this specific area. (Content uploaded by Muhammed Rijah.)

  4. Security of Database Management Systems (PDF)

    This paper addresses relational database threats and security technique considerations in relation to the following situations: threats, countermeasures (computer-based controls) and database...

  5. Database Security: An Essential Guide

    When evaluating database security in your environment to decide on your team's top priorities, consider each of the following areas: Physical security: Whether your database server is on-premises or in a cloud data center, it must be located within a secure, climate-controlled environment. (If your database server is in a cloud data center ...

  6. The Overview of Database Security Threats' Solutions: Traditional and

    The purpose of database security research is to prevent the database from being illegally used or destroyed. This paper introduces the main literature in the field of database security research in recent years. First, we classify these papers; the classification criteria are the factors that influence database security.

  7. Research Paper: Database Security

    Topic: Research Paper on Database Security Assignment. Over the last several years, the issue of database security has increasingly been brought to the forefront. Part of the reason for this is that larger amounts of data are being stored online and in mainframe computers.

  8. Research Data Security

    The protection of research data is a fundamental responsibility, rooted in regulatory and ethical principles and should be upheld by all data stewards. The Research Data Security Guidelines pertain to researchers and research team members who obtain, access or generate research data, regardless of whether the data is associated with funding or ...

  9. Database Security: Attacks and Techniques (PDF)

    security is an important factor in providing integrity, confidentiality and availability of data. This paper provides a general review of the need for database security, the attacks possible on databases, and their prevention techniques. Keywords: Access Control, Active Attack, Attacker, Database, SQLIA. 1. INTRODUCTION. The data plays a crucial role in ...

  10. Cybersecurity data science: an overview from machine learning

    In a computing context, cybersecurity has recently been undergoing massive shifts in technology and operations, and data science is driving the change. Extracting security incident patterns or insights from cybersecurity data and building corresponding data-driven models is key to making a security system automated and intelligent. To understand and analyze the actual phenomena with data ...

  11. What is Database Security

    Database security programs are designed to protect not only the data within the database, but also the data management system itself, and every application that accesses it, from misuse, damage, and intrusion. Database security encompasses tools, processes, and methodologies which establish security inside a database environment.

  12. Data Security Research Paper Samples for Students

    17 samples of this type: WowEssays.com paper writer service proudly presents a free database of data security research papers meant to help struggling students tackle their writing challenges.

  13. Research Paper on Database Security

    Free Database Security Research Paper: Database security includes several technologies to ensure the protection of databases and of the applications, servers, and systems that use databases. To implement comprehensive protection, many different tools are used, starting with physical security and ending with antivirus software.

  14. Review Paper on Data Security in Cloud Computing Environment

    Abstract: In today's scenario, cloud computing is a growing concept in the computer and information technology field for storing data, because huge amounts of sensitive or confidential data cannot be stored on a physical storage device.

  15. Data Security Procedures for Researchers (PDF)

    often simplify and accelerate the research data flow. Reduce the data security threat level a priori by acquiring and handling only the minimum amount of sensitive data strictly needed for the research study. ... Paper-based surveys should also be designed so that PII is removable. Refer to Figure 1 for a mock-up of

  16. Database Security Research Paper

    The major properties of database security are (1) confidentiality, (2) integrity, and (3) availability. Fig. 1 depicts the major properties of database security. [Fig. 1: Properties of DB Security] Confidentiality gives rise to the concept of data hiding: part of the data is shared with the outside world and the rest remains hidden.

  17. Data Security Research Paper

    Data security comprises the protective digital privacy measures applied to prevent unauthorized access to computers, websites, and databases. It also protects data from corruption. Data security is essential for IT organizations of every size and type.

  18. Data Security and Privacy in Cloud Computing

    In this paper, we will review different security techniques and challenges for data storage security and privacy protection in the cloud computing environment. As Figure 1 shows, this paper presents a comparative research analysis of the existing research work regarding the techniques used in the cloud computing through data security aspects ...

  19. A Review on Database Security (PDF)

    This paper is all about the security of database management systems, as an example of how application security can be designed and implemented for a specific task. There is substantial current interest in DBMS security because databases are newer than ... (International Journal of Science and Research (IJSR), ISSN (Online): 2319-7064, Volume 3, Issue ...)

  20. Cybersecurity

    The award will be bestowed upon three distinguished papers focused on the following perspectives: Track A, Best Theoretical Research Paper; Track B, Best Practical Research Paper; Track C, Best Machine Learning and Security Paper. The award carries a USD 1500 prize for every winning paper and comes with a commemorative statue and certificate.
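The "Database security" entry above names the discretionary and mandatory access control models. The rough Python sketch below contrasts the two; it is not code from any of the listed papers. Under discretionary control, an object's owner grants privileges. Under mandatory control, the system compares security labels. The class and function names, the three clearance levels, and the choice of the Bell-LaPadula rules (no read up, no write down) as the mandatory model are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordered security levels for the mandatory model."""
    PUBLIC = 0
    CONFIDENTIAL = 1
    SECRET = 2


@dataclass
class DiscretionaryACL:
    """DAC sketch: the owner of an object decides who may do what."""
    owner: str
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, grantor: str, grantee: str, privilege: str) -> None:
        # In this simplified model only the owner may hand out privileges.
        if grantor != self.owner:
            raise PermissionError(f"{grantor} does not own this object")
        self.grants.setdefault(grantee, set()).add(privilege)

    def allowed(self, user: str, privilege: str) -> bool:
        # The owner holds every privilege; others need an explicit grant.
        return user == self.owner or privilege in self.grants.get(user, set())


def mac_allowed(subject: Level, obj: Level, action: str) -> bool:
    """MAC sketch using the Bell-LaPadula rules."""
    if action == "read":
        return subject >= obj  # simple security property: no read up
    if action == "write":
        return subject <= obj  # *-property: no write down
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    acl = DiscretionaryACL(owner="alice")
    acl.grant("alice", "bob", "SELECT")
    print(acl.allowed("bob", "SELECT"))  # True
    print(acl.allowed("bob", "UPDATE"))  # False
    print(mac_allowed(Level.SECRET, Level.CONFIDENTIAL, "read"))   # True: reading down
    print(mac_allowed(Level.SECRET, Level.CONFIDENTIAL, "write"))  # False: writing down
```

The key design difference is where the decision lives. In the DAC sketch it depends on grants that a user chose to make. In the MAC sketch it depends only on labels that users cannot change.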
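The "Database Security: Attacks and Techniques" entry above lists SQL injection attacks (SQLIA) among database threats. The minimal sketch below assumes Python's built-in sqlite3 module and a hypothetical users table. It shows why building SQL by string concatenation is unsafe, and how parameter placeholders stop the classic `' OR '1'='1` payload. Passwords are stored in plaintext only to keep the example short; a real system would store salted password hashes.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    # Plaintext passwords only to keep the demo short.
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "s3cret"), ("bob", "hunter2")],
    )
    return conn


def login_unsafe(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # VULNERABLE: user input is pasted directly into the SQL text.
    sql = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(sql).fetchone()[0] > 0


def login_safe(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # "?" placeholders make the driver treat the input strictly as data.
    sql = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(sql, (name, password)).fetchone()[0] > 0


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "' OR '1'='1"
    print(login_unsafe(conn, "alice", payload))  # True: injected OR matches every row
    print(login_safe(conn, "alice", payload))    # False: payload is just a wrong password
```

In the unsafe version, the payload turns the condition into `... AND password = '' OR '1'='1'`, which is true for every row. The safe version never reinterprets the input as SQL.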
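The "Data Security Procedures for Researchers" entry above recommends two things: collect only the minimum sensitive data, and design surveys so that PII is removable. One possible approach, sketched here under stated assumptions, stores PII and research answers separately and links them through a keyed pseudonym. The pseudonym is computed with HMAC-SHA256 from Python's standard library. The field names, the use of email as the linking identifier, and the 16-character pseudonym length are hypothetical choices for illustration.

```python
import hashlib
import hmac
import secrets

# Hypothetical PII field names; a real study defines its own.
PII_FIELDS = {"name", "email", "phone"}


def pseudonym(key: bytes, identifier: str) -> str:
    """Keyed, repeatable pseudonym (truncated HMAC-SHA256 hex digest)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def split_record(key: bytes, record: dict) -> tuple[dict, dict]:
    """Separate PII from research data, linked only by a pseudonym.

    Assumes every record carries an "email" field used as the identifier.
    """
    pid = pseudonym(key, record["email"])
    pii = {k: v for k, v in record.items() if k in PII_FIELDS}
    research = {k: v for k, v in record.items() if k not in PII_FIELDS}
    pii["pid"] = pid
    research["pid"] = pid
    return pii, research


if __name__ == "__main__":
    # Store the key apart from the research data, with restricted access.
    key = secrets.token_bytes(32)
    record = {"name": "A. Participant", "email": "a@example.org", "age": 34, "score": 7}
    pii, research = split_record(key, record)
    print(research)  # only age, score and the pseudonym
```

Deleting the PII store and the key leaves research records that cannot be linked back through the pseudonym. Other fields may still be identifying on their own, however.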
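The "Database Security Research Paper" entry above describes confidentiality as data hiding: part of the data is shared with the outside world and the rest stays hidden. A SQL view that exposes only non-sensitive columns is one common way to express this. The sketch below assumes Python's built-in sqlite3 module and a hypothetical employees table. SQLite has no user accounts or GRANT statement. In a server DBMS, the view would typically be paired with privileges that let ordinary users query the view but not the underlying table.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical employees table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        "id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
        [("Alice", "Finance", 91000), ("Bob", "IT", 78000)],
    )
    # The view shares only non-sensitive columns; salary stays hidden.
    conn.execute(
        "CREATE VIEW employee_directory AS SELECT name, dept FROM employees"
    )
    return conn


def directory(conn: sqlite3.Connection) -> tuple[list[str], list[tuple]]:
    """Return the column names and rows visible through the view."""
    cur = conn.execute("SELECT * FROM employee_directory ORDER BY name")
    rows = cur.fetchall()
    columns = [d[0] for d in cur.description]
    return columns, rows


if __name__ == "__main__":
    conn = build_db()
    columns, rows = directory(conn)
    print(columns)  # ['name', 'dept']
    print(rows)     # [('Alice', 'Finance'), ('Bob', 'IT')]
```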