Lior Rokach is an Associate Professor of Information Systems and Software Engineering at Ben-Gurion University of the Negev. Dr. Rokach is a recognized expert in intelligent information systems and has held several leading positions in this field. His main areas of interest are Machine Learning, Information Security, Recommender Systems and Information Retrieval.
Data Mining with Decision Trees: Theory and Applications. Lior Rokach and Oded Maimon. Series in Machine Perception and Artificial Intelligence, Vol. 61. World Scientific Publishing, 2007, 270 p., Hardcover. ISBN: 981-2771-719
A Survey of Data Leakage Detection and Prevention Solutions. Asaf Shabtai, Yuval Elovici, Lior Rokach. SpringerBriefs in Computer Science. Springer, 2012, Hardcover. ISBN: 978-1-4614-2052-1
Pattern Classification Using Ensemble Methods. Lior Rokach. Series in Machine Perception and Artificial Intelligence, Vol. 75. World Scientific Publishing, 2010, 225 p., Hardcover. ISBN: 981-4271-063
The Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers. Oded Maimon and Lior Rokach (Eds.). Springer, 2005, XXXVI, 1383 p., 400 illus., Hardcover. ISBN: 0-387-24435-2
The Data Mining and Knowledge Discovery Handbook, Second Edition: A Complete Guide for Practitioners and Researchers. Oded Maimon and Lior Rokach (Eds.). Springer, 2010, 1305 p., Hardcover. ISBN: 0387098224
Decomposition Methodology for Knowledge Discovery and Data Mining: Theory and Applications. Oded Maimon and Lior Rokach. Series in Machine Perception and Artificial Intelligence, Vol. 61. World Scientific Publishing, 2005, 323 p., Hardcover. ISBN: 981-256-079-3
Rokach, L., and Mitra, P. Parsimonious Citer-Based Measures: Artificial Intelligence Domain as a Case Study. 2013. JASIST.
Fire, M.; Tenenboim, L.; Lesser, O.; Puzis, R.; Rokach, L.; and Elovici, Y. Computationally Efficient Link Prediction in Variety of Social Networks. 2013. ACM Transactions on Intelligent Systems and Technology.
Shani, G.; Rokach, L.; Shapira, B.; Hadash, S.; and Tangi, M. Investigating Confidence Displays for Top-N Recommendations. 2013. JASIST.
Dror, M.; Shabtai, A.; Rokach, L.; and Elovici, Y. OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage. 2013. IEEE Trans. Knowl. Data Eng.
Fire, M.; Katz, G.; Rokach, L.; and Elovici, Y. Links Reconstruction Attack. 2013. Springer New York.
inproceedings (10)
Tenenboim-Chekina, L.; Rokach, L.; and Shapira, B. Ensemble of Feature Chains for Anomaly Detection. 2013. In Multiple Classifier Systems, Volume 7872, 295-306, Springer Berlin Heidelberg.
Khalastchi, E.; Kalech, M.; and Rokach, L. Sensor fault detection and diagnosis for autonomous systems. 2013. In The 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS2013).
Tenenboim-Chekina, L.; Barad, O.; Shabtai, A.; Mimran, D.; Rokach, L.; Shapira, B.; and Elovici, Y. Detecting Application Update Attack on Mobile Devices through Network Features. 2013. In INFOCOM.
Choudhury, S. R.; Tuarob, S.; Mitra, P.; Rokach, L.; and Giles, C. L. ChemXSeer Figure Search: A Chemical Literature Figure Search Engine. 2013. In JCDL '13: 13th ACM/IEEE-CS Joint Conference on Digital Libraries Proceedings.
Rokach, L.; Kalech, M.; Provan, G.; and Feldman, A. Machine-Learning-Based Circuit Synthesis. 2013. In IJCAI.
Rokach, L.; Mitra, P.; Kataria, S.; Huang, W.; and Giles, L. A Supervised Learning Method for Context-Aware Citation Recommendation in a Large Corpus. 2013. In The 10th Workshop on Large-Scale Distributed Systems for Information Retrieval, LSDS-IR 2013, Co-located with ACM WSDM 2013.
Ofek, N.; Daranyi, S.; and Rokach, L. Linking Motif Sequences to Tale Type Families by Machine Learning. 2013. In Workshop on Computational Models of Narrative.
Ofek, N.; Caragea, C.; Biyani, P.; Yen, J.; Rokach, L.; and Mitra, P. Improving Sentiment Analysis in an Online Cancer Survivor Community Using Dynamic Sentiment Lexicon. 2013. In First International Workshop on Public Health in the Digital Age: Social Media, Crowdsourcing and Participatory Systems, WWW, 2013.
Most recommender systems, such as collaborative filtering, cannot provide personalized recommendations until a user profile has been created. This is known as the new user cold-start problem. Several systems try to learn the new users' profiles as part of the sign-up process by asking them to provide feedback regarding several items. We present a new, anytime preferences elicitation method that uses the idea of pairwise comparison between items. Our method uses a lazy decision tree, with pairwise comparisons at the decision nodes. Based on the user's response to a certain comparison, we select on-the-fly which pairwise comparison should be asked next. A comparative field study has been conducted to examine the suitability of the proposed method for eliciting the user's initial profile. The results indicate that the proposed pairwise approach provides more accurate recommendations than existing methods and requires less effort when signing up newcomers.
In this paper, we introduce a novel approach to generate an intention prediction model of user interactions with systems. As part of this new approach, we include personal aspects, such as user characteristics, that can increase prediction accuracy. The model is automatically trained according to the user's fixed attributes (e.g., demographic data such as age and gender) and the user's sequences of actions in the system. The generated model has a tree structure. The building blocks of each node can be any probabilistic sequence model [such as hidden Markov models (HMMs) and conditional random fields (CRFs)] and each node is split according to user attributes. Thus, we refer to this algorithm as an attribute-driven model tree. The new model was first tested on simulated data in which users with different attributes (such as age and gender) behave differently when trying to accomplish various tasks. We then validated the ability of the algorithm to discover the relevant attributes. We tested our algorithm on two real datasets: one from a web application and one from a mobile application. The results were encouraging and indicate the capability of the proposed method to discover the correct user intention model and to increase intention prediction accuracy compared with single HMM or CRF models.
User authentication based on username and password is the most common means to enforce access control. This form of access restriction is prone to hacking since stolen usernames and passwords can be exploited to impersonate legitimate users in order to commit malicious activity. Biometric authentication incorporates additional user characteristics, such as the manner in which the keyboard is used, in order to identify users. We introduce a novel approach for user authentication based on the keystroke dynamics of the password entry. A classifier is tailored to each user and the novelty lies in the manner in which the training set is constructed. Specifically, only the keystroke dynamics of a small subset of users, which we refer to as representatives, is used along with the password entry keystroke dynamics of the examined user. The contribution of this approach is twofold: it reduces the possibility of overfitting, while allowing scalability to a high volume of users. We propose two strategies to construct the subset for each user. The first selects the users whose keystroke profiles govern the profiles of all the users, while the second strategy chooses the users whose profiles are the most similar to the profile of the user for whom the classifier is constructed. Results are promising, reaching in some cases a 90% area under the curve. In many cases, a higher number of representatives deteriorates the accuracy, which may imply overfitting. An extensive evaluation was performed using a dataset containing over 780 users.
Detecting and preventing data leakage and data misuse poses a serious challenge for organizations, especially when dealing with insiders with legitimate permissions to access the organization's systems and its critical data. In this paper, we present a new concept, Misuseability Weight, for estimating the risk emanating from data exposed to insiders. This concept focuses on assigning a score that represents the sensitivity level of the data exposed to the user and by that predicts the ability of the user to maliciously exploit this data. Then, we propose a new measure, the M-score, which assigns a misuseability weight to tabular data, discuss some of its properties, and demonstrate its usefulness in several leakage scenarios. One of the main challenges in applying the M-score measure is in acquiring the required knowledge from a domain expert. Therefore, we present and evaluate two approaches toward eliciting misuseability conceptions from the domain expert.
Decision trees have three main disadvantages: reduced performance when the training set is small, rigid decision criteria, and the fact that a single "uncharacteristic" attribute might "derail" the classification process. In this paper we present ConfDTree - a post-processing method which enables decision trees to better classify outlier instances. This method, which can be applied to any decision tree algorithm, uses confidence intervals in order to identify these hard-to-classify instances and proposes alternative routes. The experimental study indicates that the proposed post-processing method consistently and significantly improves the predictive performance of decision trees, particularly for small, imbalanced or multi-class datasets, for which an average improvement of 5%-9% in the AUC performance is reported.
Dahan, H.; Maimon, O.; Cohen, S.; and Rokach, L. Proactive data mining using decision trees. 2012. In Electrical & Electronics Engineers in Israel (IEEEI), 2012 IEEE 27th Convention of, 1-5, IEEE. Abstract:
Most of the existing data mining algorithms are 'passive'. That is, they produce models which can describe patterns, but leave the decision on how to react to these patterns in the hands of the user. In contrast, in this work we describe a proactive approach to data mining, and describe an implementation of that approach, using decision trees. We show that the proactive role requires the algorithms to consider additional domain knowledge, which is exogenous to the training set. We also suggest a novel splitting criterion, termed maximal-utility, which is driven by the proactive agenda.
Khalastchi, E.; Kalech, M.; Rokach, L.; Shicel, Y.; and Bodek, G. Sensor fault detection and diagnosis for autonomous systems. 2012. In 23rd International Workshop on Principles of Diagnosis (DX 2012).
Fire, M.; Kagan, D.; Puzis, R.; Rokach, L.; and Elovici, Y. Data mining opportunities in geosocial networks for improving road safety. 2012. In Electrical & Electronics Engineers in Israel (IEEEI), 2012 IEEE 27th Convention of, 1-4, IEEE. Abstract:
Traffic measurements, road safety studies, and surveys are required for efficient road planning and ensuring the safety of transportation. Unfortunately, these methods can be cumbersome and very expensive. In this paper we point out a source of transportation information that is based on collaborative community-based navigation applications, such as Waze. Partial and anonymized information publicly exposed by Waze through their application provides valuable information that can significantly ease future transportation studies. Moreover, we show that Waze user reports may expose locations plagued with accidents but lacking police coverage. This knowledge may help police departments to improve road safety by relocating police units to these locations. Lastly, the data discussed in this paper connects transportation and road safety research to location-based services and social network platforms.
Khalastchi, E.; Kalech, M.; and Rokach, L. Multi-Layered Model Based Diagnosis in Robots. 2012. In 23rd International Workshop on Principles of Diagnosis (DX 2012).
Figueiras-Vidal, A., and Rokach, L. An Exploration of Research Directions in Machine Ensemble Theory and Applications. 2012. In European Symposium on Artificial Neural Networks, Computational Intelligence, 221-226.
Huang, W.; Kataria, S.; Caragea, C.; Mitra, P.; Giles, C. L.; and Rokach, L. Recommending citations: translating papers into references. 2012. In 21st ACM International Conference on Information and Knowledge Management, CIKM'12, Maui, HI, USA, October 29 - November 02, 2012, 1910-1914.
Chekina, L.; Rokach, L.; and Shapira, B. Introducing diversity among the models of multi-label classification ensemble. 2012. In European Symposium on Artificial Neural Networks, Computational Intelligence, 239-244.
Rokach, L.; Feldman, A.; Kalech, M.; and Provan, G. Machine-learning-based circuit synthesis. 2012. In Electrical & Electronics Engineers in Israel (IEEEI), 2012 IEEE 27th Convention of, 1-5, IEEE.
Schclar, A.; Rokach, L.; and Amit, A. Diffusion Ensemble Classifiers. 2012. In IJCCI 2012 - Proceedings of the 4th International Joint Conference on Computational Intelligence, Barcelona, Spain, 5 - 7 October, 2012, 443-450.
Bitton, Y.; Fire, M.; Kagan, D.; Shapira, B.; Rokach, L.; and Bar-Ilan, J. Social Network Based Search for Experts. 2012. In Symposium on Human-Computer Interaction and Information Retrieval.
Online social networking sites have become increasingly popular over the last few years. As a result, new interdisciplinary research directions have emerged in which social network analysis methods are applied to networks containing hundreds of millions of users. Unfortunately, links between individuals may be missing due to imperfect acquisition processes or because they are not yet reflected in the online network (i.e., friends in the real world have not formed a virtual connection). Existing link prediction techniques lack the scalability required for full application on a continuously growing social network, which may be adding users with thousands of connections every day. The primary bottleneck in link prediction techniques is extracting the structural features required for classifying links. In this paper we propose a set of simple, easy-to-compute structural features that can be analyzed to identify missing links. We show that a machine learning classifier trained using the proposed simple structural features can successfully identify missing links even when applied to the hard problem of classifying links between individuals who have at least one common friend. A new friends measure that we developed is shown to be a good predictor for missing links, and an evaluation experiment was performed on five large social network datasets: Facebook, Flickr, YouTube, Academia and TheMarker. Our methods can provide social network site operators with the capability of helping users to find known, offline contacts and to discover new friends online. They may also be used for exposing hidden links in an online social network.
Rokach, L.; Luke, K.; Aydin, A.; and Schwaiger, R. Recommenders Benchmark Framework. 2011. In 11th International Conference on Innovative Internet Community Services (I²CS 2011), June 15-17, 2011, Deutsche Telekom Laboratories, Berlin, Germany, 115-126.
Bercovitch, M.; Renford, M.; Hasson, L.; Shabtai, A.; Rokach, L.; and Elovici, Y. HoneyGen: An automated honeytokens generator. 2011. In 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011, Beijing, China, 10-12 July, 2011, 131-136. Abstract:
Honeytokens are artificial digital data items planted deliberately into a genuine system resource in order to detect unauthorized attempts to use information. The honeytokens are characterized by properties which make them appear as genuine data items. Honeytokens are also accessible to potential attackers who intend to violate an organization's security in an attempt to mine information in a malicious manner. One of the main challenges in generating honeytokens is creating data items that appear real and that are difficult to distinguish from real tokens. In this paper we present "HoneyGen" - a novel method for generating honeytokens automatically. HoneyGen creates honeytokens that are similar to the real data by extrapolating the characteristics and properties of real data items. The honeytoken generation process consists of three main phases: rule mining, in which various types of rules that characterize the real data are extracted from the production database; honeytoken generation, in which an artificial relational database is generated based on the extracted rules; and likelihood rating, in which a score is calculated for each honeytoken based on its similarity to the real data. A Turing-like test was performed in order to evaluate the ability of the method to generate honeytokens that cannot be detected by humans as honeytokens. The results indicate that participants were unable to distinguish honeytokens having a high likelihood score from real tokens.
Chekina, L.; Rokach, L.; and Shapira, B. Meta-learning for Selecting a Multi-label Classification Algorithm. 2011. In Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, Vancouver, BC, Canada, December 11, 2011, 220-227. Abstract:
Although various algorithms for multi-label classification have been developed in recent years, there is little, if any, information as to when each method is beneficial. The main goal of this paper is to compare the classification performance of several multi-label algorithms and to develop a set of rules or tools that will help in selecting the optimal algorithm according to a specific dataset and target evaluation measure. We utilize a meta-learning approach allowing fast automatic selection of the most appropriate algorithm for an unseen dataset based on its descriptive characteristics. We also define a list of characteristics specific for multi-label datasets. The experimental results indicate the applicability and usefulness of the meta-learning approach.
Kisilevich, S.; Keim, D. A.; Byshko, R.; Tsibelman, M.; and Rokach, L. Developing a Price Management Decision Support System for Hotel Brokers using Free and Open Source Tools. 2011. In ICEIS 2011 - Proceedings of the 13th International Conference on Enterprise Information Systems, Volume 2, Beijing, China, 8-11 June, 2011, 147-156.
In this paper we propose a new access control mechanism, Dynamic Sensitivity-Based Access Control (DSBAC), designed to regulate users' access to sensitive data stored in relational databases. The DSBAC is an extension of the basic mandatory access control (MAC) mechanism, and it uses the M-score (Misuseability score) measure in order to assign, dynamically, an access class to each set of tuples.
misc (3)
Shapira, B.; Mimran, D.; Meyer, J.; Rokach, L.; Peretz, S.; Glass, G.; Henke, K.; and Schneider, L. A system for detecting usability problems of users while using their mobile devices. 2011 (September 28). EP Patent 2,369,481.
Schclar, A.; Rokach, L.; Shapira, B.; Glass, G.; Jepsen, K.; and Henke, K. System and method for the detection of usability problems in an interactive application. 2011. EP Patent 2,367,113.
Many applications that employ data mining techniques involve mining data that include private and sensitive information about the subjects. One way to enable effective data mining while preserving privacy is to anonymize the data set that includes private information about subjects before it is released for data mining. One way to anonymize a data set is to manipulate its content so that the records adhere to k-anonymity. Two common manipulation techniques used to achieve k-anonymity of a data set are generalization and suppression. Generalization refers to replacing a value with a less specific but semantically consistent value, while suppression refers to not releasing a value at all. Generalization is more commonly applied in this domain since suppression may dramatically reduce the quality of the data mining results if not properly used. However, generalization presents a major drawback as it requires a manually generated domain hierarchy taxonomy for every quasi-identifier in the data set on which k-anonymity has to be performed. In this paper, we propose a new method for achieving k-anonymity named K-anonymity of Classification Trees Using Suppression (kACTUS). In kACTUS, efficient multidimensional suppression is performed, i.e., values are suppressed only on certain records depending on other attribute values, without the need for manually produced domain hierarchy trees. Thus, in kACTUS, we identify attributes that have less influence on the classification of the data records and suppress them if needed in order to comply with k-anonymity. The kACTUS method was evaluated on 10 separate data sets to assess its accuracy as compared to other k-anonymity generalization- and suppression-based methods. Encouraging results suggest that kACTUS' predictive performance is better than that of existing k-anonymity algorithms. Specifically, on average, the accuracies of TDS, TDR, and kADET are lower than that of kACTUS by 3.5, 3.3, and 1.9 percent, respectively, despite their usage of manually defined domain trees. The accuracy gap increases to 5.3, 4.3, and 3.1 percent, respectively, when no domain trees are used.
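The k-anonymity condition and the effect of suppression described in this abstract can be illustrated with a minimal sketch. It is not the kACTUS algorithm: kACTUS suppresses values only on selected records, guided by a classification tree, whereas this example suppresses one attribute in every record. The names `is_k_anonymous`, `suppress`, the `*` marker and the toy records are all hypothetical.

```python
from collections import Counter

SUPPRESSED = "*"  # hypothetical marker standing in for a value that is not released


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def suppress(records, attribute):
    """Return copies of the records with one attribute suppressed in every record.

    This is whole-column suppression, a simplification of the multidimensional
    suppression described in the abstract.
    """
    return [{**r, attribute: SUPPRESSED} for r in records]


# Toy data set (invented for illustration): "age" and "zip" are quasi-identifiers.
records = [
    {"age": 34, "zip": "84105", "label": "yes"},
    {"age": 34, "zip": "84106", "label": "no"},
    {"age": 51, "zip": "84105", "label": "yes"},
    {"age": 51, "zip": "84105", "label": "no"},
]
qi = ["age", "zip"]

before = is_k_anonymous(records, qi, k=2)
after = is_k_anonymous(suppress(records, "zip"), qi, k=2)
print(before, after)
```

In the toy data the two records with age 34 have distinct zip codes, so the set is not 2-anonymous until the zip attribute is suppressed.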
Zilberman, P.; Shabtai, A.; and Rokach, L. Analyzing Group Communication for Preventing Accidental Data Leakage via Email. 2010.
Weiss, Y.; Fledel, Y.; Elovici, Y.; and Rokach, L. Cost-Sensitive Detection of Malicious Applications in Mobile Devices. 2010. In Mobile Computing, Applications, and Services - Second International ICST Conference, MobiCASE 2010, Santa Clara, CA, USA, October 25-28, 2010, Revised Selected Papers, 382-395.
Marom, N.; Rokach, L.; and Shmilovici, A. Using the confusion matrix for improving ensemble classifiers. 2010. In Electrical and Electronics Engineers in Israel (IEEEI), 2010 IEEE 26th Convention of, 555-559, IEEE. Abstract:
The code matrix makes it possible to convert a multi-class problem into an ensemble of binary classifiers. We suggest a new unweighted framework for iteratively extending the code matrix, which is based on the confusion matrix. The confusion matrix holds important information which is exploited by the suggested framework. Evaluating the confusion matrix at each iteration makes it possible to decide which one-against-all classifier should be added next to the current code matrix. We demonstrate the benefits of the method by applying it to an Error Correcting Code based ensemble and to AdaBoost. We use orthogonal arrays as the basic code matrix.
One of the challenges that companies face when launching a campaign to promote new services is selecting the 'right' customers for the campaign, i.e., customers with the highest probability of a positive response. Active learning can be used to efficiently identify this set of customers. It can also avoid approaching non-relevant customers and reduce the campaign's cost. The problem is more challenging when parallel campaigns for multiple new services are launched, given a constraint on the number of promotions that can be offered to the same customer during a defined period of time. The goal is to maximize the total net profit. In this paper we present MultiCamp, a new cost-sensitive active learning based algorithm that uses the Hungarian Algorithm to find the optimal match between campaigns and customers. MultiCamp was tested on a real-world dataset using a decision tree classifier. Results were compared to a random baseline, indicating the superiority of the proposed algorithm.
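The matching step mentioned in this abstract is an instance of the assignment problem: pair each customer with one campaign so that total expected profit is maximal. The sketch below solves a tiny instance by exhaustive search rather than by the Hungarian Algorithm, which reaches the same optimum in polynomial time. It does not reproduce MultiCamp; the function name `best_assignment` and the profit figures are hypothetical.

```python
from itertools import permutations


def best_assignment(profit):
    """Find the one-to-one matching of rows to columns with maximal total profit.

    profit[i][j] is the expected net profit of offering campaign j to customer i.
    Exhaustive search is used for clarity; it is only feasible for small matrices.
    """
    n = len(profit)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(profit[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


# Invented expected-profit matrix: 3 customers (rows) x 3 campaigns (columns).
expected_profit = [
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
]

total, match = best_assignment(expected_profit)
print(total, match)  # match[i] is the campaign assigned to customer i
```

A greedy choice per customer would give both customer 0 and customer 2 campaign 0; the one-to-one constraint is what makes a global matching necessary.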
Baltrunas, L.; Kaminskas, M.; Ricci, F.; Rokach, L.; Shapira, B.; and Luke, K. Best usage context prediction for music tracks. 2010. In Proceedings of the 2nd Workshop on Context Aware Recommender Systems.
Rokach, L., and Schclar, A. k-Anonymized Reducts. 2010. In 2010 IEEE International Conference on Granular Computing, GrC 2010, San Jose, California, USA, 14-16 August 2010, 392-395. Abstract:
Privacy preserving data mining aims to prevent the violation of privacy that might result from mining sensitive data. This is commonly achieved by data anonymization. One way to anonymize data is adherence to the k-anonymity concept, which requires that the probability of identifying an individual by linking databases does not exceed 1/k. In this paper we propose an algorithm which utilizes rough set theory to achieve k-anonymity. The basic idea is to partition the original dataset into several disjoint reducts such that each one of them adheres to k-anonymity. We show that it is easier to make each reduct comply with k-anonymity if it does not contain all quasi-identifier attributes. Moreover, our procedure ensures that even if the attacker attempts to rejoin the reducts, k-anonymity is still preserved.
Rokach, L., and Itach, E. An Ensemble Method for Multi-label Classification using an Approximation Algorithm for the Set Covering Problem. 2010. 37.
Gershman, A.; Meisels, A.; Luke, K.; Rokach, L.; Schclar, A.; and Sturm, A. A Decision Tree Based Recommender System. 2010. In 10th International Conference on Innovative Internet Community Services (I²CS), Jubilee Edition 2010, June 3-5, 2010, Bangkok, Thailand, 170-179.
Kisilevich, S.; Keim, D.; and Rokach, L. Geo-Spade: A Generic Google-Earth Based Framework For Analysis And Exploration Of Spatiotemporal Data. 2010. In 12th International Conference on Enterprise Information Systems (ICEIS 2010), 13-20.
Gafny, M.; Shabtai, A.; Rokach, L.; and Elovici, Y. Detecting data misuse by applying context-based data linkage. 2010. In Proceedings of the 2010 ACM workshop on Insider threats, 3-12, ACM.
Kisilevich, S.; Keim, D.; and Rokach, L. A Novel Approach to Mining Travel Sequences Using Collections of Geotagged Photos. 2010. In Geospatial Thinking, 163-182, Springer Berlin Heidelberg.
Traditionally, users are authenticated based on a username and password. However, a logged-in station is still vulnerable to imposters when the user leaves her computer without logging off. Keystroke dynamics methods can be useful to continuously verify a user after the authentication process has successfully ended. Within the last decade several studies proposed the use of keystroke dynamics as a behavioral biometric tool to verify users. We propose a new method for compactly representing the keystroke patterns by joining similar pairs of consecutive keystrokes. The proposed method clusters di-graphs based on their temporal features. The proposed method was evaluated on 10 legitimate users and 15 imposters. Encouraging results suggest that the detection performance of the proposed method is better than that of existing methods. Specifically, we reach a False Acceptance Rate (FAR) of 0.41% and a False Rejection Rate (FRR) of 0.63%.
Kisilevich, S.; Keim, D. A.; and Rokach, L. GEO-SPADE - A Generic Google Earth-based Framework for Analyzing and Exploring Spatio-temporal Data. 2010. In ICEIS 2010 - Proceedings of the 12th International Conference on Enterprise Information Systems, Volume 5, HCI, Funchal, Madeira, Portugal, June 8 - 12, 2010, 13-20.
Shimshon, T.; Moskovitch, R.; Rokach, L.; and Elovici, Y. Continuous Verification Using Keystroke Dynamics. 2010. In 2010 International Conference on Computational Intelligence and Security, CIS 2010, Nanning, Guangxi Zhuang Autonomous Region, China, December 11-14, 2010, 411-415. Abstract:
Traditionally, user authentication is based on a username and password. However, a logged-in station is still vulnerable to imposters when the user leaves her computer without logging off. Keystroke dynamics methods can be useful for continuously verifying a user once the authentication process has successfully ended. However, current methods require long sessions and significant amounts of keystrokes to reliably verify users. We propose a new method that compactly represents the keystroke patterns by joining similar pairs of consecutive keystrokes. This automatically created representation reduces the session size required for inducing the user's verification model. The proposed method was evaluated on 21 legitimate users and 165 attackers. The results were encouraging and suggest that the detection performance of the proposed method is better than that of existing methods. Specifically, we attained a false acceptance rate (FAR) of 3.47% and a false rejection rate (FRR) of 0% using only 250 keystrokes.
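For readers unfamiliar with the two error rates reported in these keystroke-dynamics abstracts, the following sketch shows how FAR and FRR are typically computed from verification scores. It assumes a verifier that outputs a similarity score per session and accepts the session when the score reaches a threshold; the function name `far_frr`, the scores and the threshold are invented and do not come from the papers.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for a verifier that accepts a session when score >= threshold.

    FAR: fraction of impostor sessions that are wrongly accepted.
    FRR: fraction of genuine sessions that are wrongly rejected.
    """
    false_accepts = sum(s >= threshold for s in impostor_scores)
    false_rejects = sum(s < threshold for s in genuine_scores)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Invented similarity scores for illustration only.
genuine = [0.9, 0.8, 0.75, 0.6]
impostor = [0.1, 0.3, 0.65, 0.2, 0.4]

far, frr = far_frr(genuine, impostor, threshold=0.62)
print(f"FAR={far:.2%} FRR={frr:.2%}")
```

Raising the threshold lowers FAR at the cost of a higher FRR, which is why the two rates are reported together.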
Harel, A.; Shabtai, A.; Rokach, L.; and Elovici, Y. M-score: estimating the potential damage of data leakage incident by assigning misuseability weight. 2010. In Proceedings of the 2010 ACM workshop on Insider threats, 13-20, ACM.
Tenenboim-Chekina, L.; Rokach, L.; and Shapira, B. Identification of label dependencies for multi-label classification. 2010. In Proceedings of the second International Workshop on Learning from Multi-Label data, 53-60.
misc (2)
Rokach, L.; Antwarg, L.; and Shapira, B. Next-step prediction system and method. 2010 (August 25). EP Patent 2,221,719.
Kisilevich, S.; Rokach, L.; Elovici, Y.; and Shapira, B. Efficient multi-dimensional suppression for k-anonymity. 2010. EP Patent 2,228,735.
Rokach, L., and Elovici, Y. An Overview of IDS Using Anomaly Detection. 2009. Database Technologies: Concepts, Methodologies, Tools, and Applications, 384-394.
Chizi, B.; Rokach, L.; and Maimon, O. A Survey of Feature Selection Techniques. 2009. Encyclopedia of Data Warehousing and Mining, Second Edition (4 Volumes), 1888-1895.
Itach, E.; Tenenboim, L.; and Rokach, L. An Ensemble Method for Multi-label Classification using a Transportation Model. 2009. 49.
Moskovitch, R.; Feher, C.; Messerman, A.; Kirschnick, N.; Mustafic, T.; Çamtepe, S. A.; Löhlein, B.; Heister, U.; Möller, S.; Rokach, L.; and Elovici, Y. Identity theft, computers and behavioral biometrics. 2009. In IEEE International Conference on Intelligence and Security Informatics, ISI 2009, Dallas, Texas, USA, June 8-11, 2009, Proceedings, 155-160. Abstract:
The increase of online services, such as eBanks and WebMails, in which users are verified by a username and password, is increasingly exploited by identity theft procedures. Identity theft is a fraud in which someone pretends to be someone else in order to steal money or get other benefits. To overcome the problem of identity theft, an additional security layer is required. Within the last decades the option of verifying users based on their keystroke dynamics was proposed during login verification. Thus, the imposter has to be able to type in a similar way to the real user in addition to having the username and password. However, verifying users upon login is not enough, since a logged-in station/mobile is vulnerable to imposters when the user leaves her machine. Thus, verifying users continuously based on their activities is required. Within the last decade there has been growing interest in and use of biometric tools; however, these are often costly and require additional hardware. Behavioral biometrics, in which users are verified based on their keyboard and mouse activities, potentially present a good solution. In this paper we discuss the problem of identity theft and propose behavioral biometrics as a solution. We survey existing studies, list the challenges and propose solutions.
Tenenboim, L.; Rokach, L.; and Shapira, B. Multi-label classification by analyzing labels dependencies. 2009. In European conference on machine learning (ECML)/principles and practice of knowledge discovery in databases (PKDD)-1st international workshop on learning from multi-label data (MLD'2009), 117-131.
A data warehouse is a special database used for storing business-oriented information for future analysis and decision-making. In business scenarios where some of the data or the business attributes are fuzzy, it may be useful to construct a warehouse that can support the analysis of fuzzy data. Here, we outline how Kimball's methodology for the design of a data warehouse can be extended to the construction of a fuzzy data warehouse. A case study demonstrates the viability of the methodology.
Naamani, L.; Rokach, L.; and Shmilovici, A. A logistic regression method for cost sensitive active learning. 2008. In Electrical and Electronics Engineers in Israel, 2008. IEEEI 2008. IEEE 25th Convention of, 707-710, IEEE. Abstract:
Direct marketing involves offering a product or service to a carefully selected group of customers, the ones expected to render the most profits. Active learning is a data mining policy which actively selects unlabeled instances for labeling. In this research our goal is to construct a model that minimizes the net acquisition cost of selection of instances for labeling and at the same time maximizes the net profit gained from approaching selected customers. We present a new framework which combines a cost-sensitive active learning algorithm with a logistic regression classifier. We evaluated the framework on two benchmark datasets. The results appear encouraging.
Gershman, A.; Grubshtein, A.; Meisels, A.; Rokach, L.; and Zivan, R. Scheduling meetings by agents. 2008. In Proc. 7th International Conference on Practice and Theory of Automated Timetabling (PATAT 2008), Montreal, August 2008.
Rokach, L.; Meisels, A.; and Schclar, A. Anytime AHP Method for Preferences Elicitation in Stereotype-Based Recommender System. 2008. In ICEIS 2008 - Proceedings of the Tenth International Conference on Enterprise Information Systems, Volume AIDSS, Barcelona, Spain, June 12-16, 2008, 268-275.
Ben-Shimon, D.; Tsikinovsky, A.; Rokach, L.; Meisels, A.; Shani, G.; and Naamani, L. Recommender System from Personal Social Networks. 2007. In Advances in Intelligent Web Mastering, Proceedings of the 5th Atlantic Web Intelligence Conference - AWIC 2007, Fontainebleau, France, June 25-27, 2007, 47-55.
Shani, G.; Rokach, L.; Meisels, A.; Naamani, L.; Piratla, N. M.; and Ben-Shimon, D. Establishing User Profiles in the MediaScout Recommender System. 2007. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2007, part of the IEEE Symposium Series on Computational Intelligence 2007, Honolulu, Hawaii, USA, 1-5 April 2007, 470-476. Abstract:
The MediaScout system is envisioned to function as a personalized media (audio, video, print) service within mobile phones, online media portals, sling boxes, etc. The MediaScout recommender engine uses a novel stereotype-based recommendation engine. Upon the registration of new users, the system must decide how to classify them into existing stereotypes. In this paper we present a method to achieve this classification through an anytime, interactive questionnaire, created automatically upon the generation of new stereotypes. A comparative study performed on the IMDB database illustrates the advantages of the new system.
Zakin, O.; Levi, M.; Elovici, Y.; Rokach, L.; Shafrir, N.; Sinter, G.; and Pen, O. Identifying computers hidden behind a NAT using machine learning techniques. 2007. In The 6th European Conference on Information Warfare and Security, 335-340.
Shani, G.; Meisels, A.; Gleyzer, Y.; Rokach, L.; and Ben-Shimon, D. A stereotypes-based hybrid recommender system for media items. 2007. In AAAI Workshop on Intelligent Techniques for Web Personalization, Vancouver, 76-83, The AAAI Press.
Romano, R.; Rokach, L.; and Maimon, O. Cascaded Data Mining Methods for Text Understanding, with medical case study. 2006. In Workshops Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), 18-22 December 2006, Hong Kong, China, 458-462. Abstract:
Substantial electronically stored textual data, such as clinical narrative reports, often need to be retrieved to find relevant information for clinical and research purposes. The context of negation, a negative finding, is of special importance, since many of the most frequently described findings are such. Hence, when searching free-text narratives for patients with a certain medical condition, if negation is not taken into account, many of the documents retrieved will be irrelevant. We present a new cascaded pattern learning method for automatic identification of negative context in clinical narrative reports. Studying the training corpora, the classification errors, and the patterns selected by the classifier, we noticed that it is possible to create a more powerful ensemble structure than the structure obtained from a general-purpose ensemble method (such as AdaBoost). We compare the new algorithm to previous methods proposed for the same task on similar medical narratives, and show its advantages: improved accuracy compared to other machine learning methods, and much faster development than manual knowledge engineering techniques with matching accuracy.
Rokach, L.; Romano, R.; and Maimon, O. Automatic Identification of Negated Concepts in Narrative Clinical Reports. 2006. In ICEIS 2006 - Proceedings of the Eighth International Conference on Enterprise Information Systems: Databases and Information Systems Integration, Paphos, Cyprus, May 23-27, 2006, 257-262.
Decision trees are considered to be one of the most popular approaches for representing classifiers. Researchers from various disciplines such as statistics, machine learning, pattern recognition, and data mining considered the issue of growing a decision tree from available data. This paper presents an updated survey of current methods for constructing decision tree classifiers in a top-down manner. The paper suggests a unified algorithmic framework for presenting these algorithms and describes the various splitting criteria and pruning methodologies.
Averbuch, M.; Maimon, O.; Rokach, L.; and Ezer, E. Free-text information retrieval system for a rapid enrollment of patients into clinical trials. 2005. Clinical Pharmacology & Therapeutics, 77(2).
The idea of decomposition methodology is to break down a complex data mining task into several smaller, less complex and more manageable sub-tasks that are solvable using existing tools, and then to join their solutions together in order to solve the original problem. In this paper we provide an overview of decomposition methods in classification tasks, with emphasis on elementary decomposition methods. We present the main properties that characterize various decomposition frameworks and the advantages of using these frameworks. Finally, we discuss the uniqueness of decomposition methodology as opposed to other closely related fields, such as ensemble methods and distributed data mining.
2004 (6)
article (1)
Maimon, O., and Rokach, L. Ensemble of Decision Trees for Mining Manufacturing Data Sets. 2004. Machine Engineering, 4(1-2).
incollection (1)
Zeira, G.; Maimon, O.; Last, M.; and Rokach, L. Change detection in classification models induced from time series data. 2004. Data Mining in Time Series Databases, M. Last, A. Kandel, and H. Bunke (Editors), Volume 57, 101-125, World Scientific Publishing Company Incorporated.
Averbuch, M.; Karson, T.; Ben-Ami, B.; Maimon, O.; and Rokach, L. Context-sensitive medical information retrieval. 2004. In Medinfo 2004: proceedings of the 11th World Conference on Medical Informatics, Volume 107, 282, OCSL Press.
Rokach, L.; Maimon, O.; and Averbuch, M. Information Retrieval System for Medical Narrative Reports. 2004. In Flexible Query Answering Systems, 6th International Conference, FQAS 2004, Lyon, France, June 24-26, 2004, Proceedings, 217-228.
phdthesis (1)
Rokach, L. Decomposition Methodology in Data Mining with Emphasis on Feature Set Decomposition Approach. 2004. Ph.D. Thesis, Tel Aviv University.
2003 (2)
inproceedings (1)
Rokach, L.; Maimon, O.; and Lavi, I. Space Decomposition in Data Mining: A Clustering Approach. 2003. In Foundations of Intelligent Systems, 14th International Symposium, ISMIS 2003, Maebashi City, Japan, October 28-31, 2003, Proceedings, Volume 2871, 24-31.
misc (1)
Maimon, O.; Ezer, E.; Rokach, L.; and Averbuch, M. Medical data storage system and method. 2003. February 13. US Patent App. 10/365,405.
Maimon, O.; Rokach, L.; and Lavi, I. Space decomposition in data mining-a clustering approach. 2002. In Electrical and Electronics Engineers in Israel, 2002. The 22nd Convention of, 101-104, IEEE. Abstract:
Decomposition may divide the database horizontally (subsets of rows or tuples) or vertically. It may be aimed at minimizing space and time needed for the classification of a dataset (e.g. sampling, windowing) or rather attempt to improve accuracy (e.g. bagging, boosting). This paper presents a horizontal space-decomposition algorithm, exploiting the K-means clustering algorithm. It is aimed at decreasing error rate compared to the simple classifier embedded in it while being rather understandable.
Maimon, O., and Rokach, L. Improving Supervised Learning by Feature Decomposition. 2002. In Foundations of Information and Knowledge Systems, Second International Symposium, FoIKS 2002 Salzau Castle, Germany, February 20-23, 2002, Proceedings, 178-196.
2001 (2)
inproceedings (1)
Rokach, L., and Maimon, O. Theory and Applications of Attribute Decomposition. 2001. In Proceedings of the 2001 IEEE International Conference on Data Mining, 29 November - 2 December 2001, San Jose, California, USA, 473-480. Abstract:
This paper examines the attribute decomposition approach with simple Bayesian combination for dealing with classification problems that contain a high number of attributes and a moderate number of records. According to the attribute decomposition approach, the set of input attributes is automatically decomposed into several subsets. A classification model is built for each subset, then all the models are combined using simple Bayesian combination. This paper presents the theoretical and practical foundation for the attribute decomposition approach. A greedy procedure, called D-IFN, is developed to decompose the input attribute set into subsets and build a classification model for each subset separately. The results achieved in the empirical comparison testing with well-known classification methods (like C4.5) indicate the superiority of the decomposition approach.
misc (1)
Harari, Y.; Rokach, L.; Klevansky, Y.; Galili, B.; and Tsenter, I. Method and system for enabling the exchange, management and supervision of leads and requests in a network. 2001. US Patent App. 09/801,560.