Software and Information Systems Engineering |
Evaluating anomaly explanations using the ground truth
|
Students
|
Guides
|
Abstract
Machine and deep learning algorithms are widely used for anomaly detection. In many cases, explanations for the anomalies are required to highlight the features that most merit investigation by domain experts. Proper evaluation frameworks for anomaly explanation methods are currently lacking, particularly evaluation approaches in which the outcomes of various explanation methods are compared to the underlying reasons for which a certain outcome was returned, i.e., the ground truth. In this setting, the ground truth enables examination of the correctness and robustness of the explanations. This is extremely important in anomaly detection, a field in which the ability to provide useful information about the source of the anomaly is critical. In this work, we present a framework that includes (1) a dataset of digital circuits containing generated model-based anomalies, (2) anomaly ground truth explanations, created by an algorithm we present in this work, and (3) metrics for evaluating the correctness and robustness of anomaly explanations. The proposed framework can be used to detect anomalies with any anomaly detector, explain them using various methods, and evaluate the explanations using correctness and robustness metrics. Such evaluation is possible due to our use of ground truth explanations. We demonstrate the use of the framework with three common model-agnostic explanation methods and show the results of
three correctness metrics and a robustness metric calculated by comparing the explanations to the ground truth.
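As a simplified, hypothetical illustration of a correctness metric of this kind (the feature indices, scores, and the precision@k choice below are assumptions, not the framework's actual definitions):

```python
import numpy as np

# One possible correctness metric: precision@k of the explanation's top-ranked features
# against the ground-truth features that were injected as the anomaly's cause.
def precision_at_k(explanation_scores, ground_truth_features, k):
    top_k = np.argsort(np.abs(explanation_scores))[::-1][:k]
    return len(set(top_k) & set(ground_truth_features)) / k

explanation_scores = np.array([0.02, 0.71, 0.05, 0.40, 0.01])  # e.g., SHAP values for one anomaly
ground_truth = {1, 3}                                           # features that actually caused the anomaly
print(precision_at_k(explanation_scores, ground_truth, k=2))    # -> 1.0: both causes ranked first
```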
|
|
Improving text editing by integrating voice & gaze in multimodal interfaces
|
Students
Hila Cohen
|
Maxim Zhivodrov
|
Sean Avrutin
|
|
Guides
|
Abstract
Nowadays, the process of editing text and correcting spelling errors in text editors is carried out using a mouse and a keyboard. The user has to move the mouse to the word they wish to correct, click on it, and choose the preferred option. This process significantly slows down the work pace, especially when working on long texts, where the number of correction and editing operations is high.
In our project, we propose to develop and evaluate an innovative method of text editing based on natural interaction between the user and the computer. Using this method, the user edits the text using only gaze and voice.
We developed a user-friendly interface for editing and correcting text. One of our leading principles was that the interface should be as intuitive and natural as possible, so that users can quickly get used to our innovative method even after prolonged use of the traditional method of correcting and editing text with a mouse and keyboard.
In the research part, we conducted an experiment comparing our proposed method with three other methods: mouse and voice, voice only, and the traditional method (mouse and keyboard). As part of the experiment, we measured, among other things, the time it took each participant to complete an editing or correction task and the effort they had to invest in order to complete it. Moreover, we asked each participant for their subjective opinion on aspects of using the system, such as usability and effort.
|
|
Impact of Surgeon and Hospital Procedural Volume on Outcomes after CABG
|
Students
Yaara Rumney
|
Daniel Gorsia
|
Hilla Peterfreund
|
|
Guides
|
Abstract
The main goal of our study is to examine whether the experience of a medical center or a surgeon has an impact on the risk of complications and on the outcomes of bypass surgery.
The research data include over one and a half million records of surgeries performed worldwide over the last ten years, which allows us to examine trends over time.
Because a medical-center effect can stem from work procedures, patient population, and other parameters, we had to use a model that accounts for this effect. We built a GLMM, a regression model with a random effect for each medical center. The purpose of the model is to test whether experience has an impact on mortality while adjusting for the variance that exists between the various medical centers.
We also trained XGBoost models - machine learning models based on decision trees that predict the likelihood of mortality and complications following surgery. In our research we created two models for predicting mortality and two models for complications, where one model of each pair excludes the experience variables and the other includes them. Finally, we performed a SHAP analysis examining the effect of each measure on the prediction of mortality and complications following surgery.
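A rough sketch of the prediction-and-explanation step described above (the feature names, data, and model settings are hypothetical):

```python
import numpy as np
import shap
from xgboost import XGBClassifier

# Hypothetical data: columns stand for [surgeon_volume, hospital_volume, age, comorbidity_score].
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = (0.5 * X[:, 2] - 0.3 * X[:, 0] + rng.normal(scale=0.5, size=1000) > 0.8).astype(int)  # mortality

# One model with the experience (volume) features, one without them.
with_experience = XGBClassifier(n_estimators=100, max_depth=3).fit(X, y)
without_experience = XGBClassifier(n_estimators=100, max_depth=3).fit(X[:, 2:], y)

# SHAP analysis of the model that includes the experience features.
explainer = shap.TreeExplainer(with_experience)
shap_values = explainer.shap_values(X)
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))
```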
The results of the analysis showed a negative correlation between the experience of a surgeon or a medical center and adverse surgical outcomes: the greater the experience, the smaller the chance of mortality and complications.
|
|
Multi AI Agents Simulation System
|
Students
Eran Toutian
|
Daniel Ben simon
|
Lasry Alona
|
Chen Shoresh
|
|
Guides
|
Abstract
The project's goal is to make it easier for programmers, observers, and analysts to draw conclusions and make decisions when they run machine learning algorithms that use intelligent agents.
Prof. Guy Shani has learning algorithms that operate intelligent agents, and he wanted a simulation that would help him understand the agents' execution processes as conveniently as possible.
The simulation is a visual tool into which the series of actions performed by the agents is loaded, allowing the programmer, observer, or analyst to make decisions more easily than is possible by reading the raw output alone.
The simulation we built in this project uses the Unity platform, which provides an interface and toolbox for creating simulations of multi-agent algorithms.
The simulation we built contains two different environments:
1. Sokoban: a video game in which the player's goal is to push objects to different destinations using a minimum number of movements. The objects are boxes, and the agents are the characters who push them towards the targets.
2. Multi-Agent Elevator Movement Simulation System: here, the agents are the elevators, and the objects are people scattered on the different floors who want to reach different destinations.
The simulation is not strictly necessary for understanding the processes that occur while the algorithm is running, but it is a valuable tool that shortens reasoning time and improves the quality of the conclusions and the decision-making processes that the programmer, observer, or analyst must perform.
|
|
Pepper - Social Robot
|
Students
Itzhak Finkelstein
|
Itamar Sigel
|
Zahi Kapri
|
Noa Weiss
|
|
Guides
|
Abstract
This project is a collaboration with the Cognition, Aging and Rehabilitation Lab at Ben Gurion University. Our project focuses on people who have suffered a stroke and must endure a difficult rehabilitation process.
The uniqueness of our system is the involvement of robotics in the rehabilitation process using games; social robots have been shown to motivate and encourage patients throughout a long-term process. A session with the robot involves a movement detection system that records patients while they play the games. The data from the system is sent to an algorithm that classifies which compensatory limb movements the patient has made, and our robot, Pepper, decides which feedback to present to the patient.
Our team improved and further developed the algorithm, which was trained on data collected and labeled by multiple physiotherapists and uses a time-series-based model. By transferring the problem to the image-processing domain and then applying transfer learning, we significantly improved both our score and running times. After analyzing the algorithm's results, Pepper
decides, using a set of rules, which feedback to present to the patient.
We developed services that connect all of these technologies and run the entire process.
These services include the movement tracking service, which we developed using Python, Arduino controllers, and the Motive software; it records and generates time-series data files automatically when a patient presses a button.
These data files are sent to the algorithm service, which classifies which compensatory limb movements were made in each data file.
The algorithm service communicates with Pepper's feedback system, which decides which feedback Pepper should present to the patient based on a set of rules suggested by therapists; the rules take into account the time it took the patient to finish the game and the patient's previous compensatory movements.
|
|
SummarEyes - Text summarization based on eye movement
|
Students
Matan Shoushan
|
Ido Kestenbaum
|
Adir Biran
|
|
Guides
|
Abstract
Text summarization is a complex task that has been researched extensively in the past, with prior research focusing on textual features (font size, word location, beginning of a paragraph, etc.).
In this project, we research another way of summarizing texts: mining participants' eye movement data while they read. During the project, we conducted an experiment including 80 participants and 320 texts in order to collect the participants' eye movement data during natural reading.
The data is received from the GazePoint system, which calibrates to the participant's eyes and then processes and outputs a large amount of information at a high rate of 40 records per second. From this information we can derive the gaze location on the screen, the dwell time at each point, the location and order of fixations, and additional parameters.
There are many things that affect the eye movement data, such as the calibration of the participant to the system, the participant's posture during the experiment, whether they wear glasses or contact lenses, the participant's mother tongue, and more.
We use this data to create the summarization model, after gaining a deep understanding of the data and finding the most relevant features to maximize the model's accuracy.
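As a rough, hypothetical illustration of how aggregated gaze features could be turned into an extractive summarization signal (the column names and the "most reading time" heuristic are assumptions, not the project's actual features):

```python
import pandas as pd

# Hypothetical fixation records: one row per fixation, with the sentence it landed on.
fixations = pd.DataFrame({
    "sentence_id": [0, 0, 1, 2, 2, 2],
    "duration_ms": [180, 220, 150, 300, 260, 240],
})

# Aggregate simple gaze features per sentence.
features = fixations.groupby("sentence_id")["duration_ms"].agg(
    total_fixation_ms="sum", n_fixations="count", mean_fixation_ms="mean"
)

# A naive extractive "summary": pick the sentences that attracted the most reading time.
top_sentences = features.sort_values("total_fixation_ms", ascending=False).head(2)
print(top_sentences.index.tolist())
```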
In addition, we evaluate our model and compare it to existing text summarization models that are based on textual features only, using their gold-standard summaries and metrics that can compare both abstractive and extractive summaries.
|
|
Systance - A stance detection framework
|
Students
Chen Avraham
|
Gal Tadir
|
Adi Avinun
|
Iris Dreizenshtok
|
|
Guides
|
Abstract
Today, the field of stance detection, which deals with predicting a person's stance on a particular subject, is one of the hottest areas in the market. Marketing companies, polling firms, and those involved in the capital market invest huge sums in trying to find an accurate algorithm that, given a sentence, can identify a person's opinion on a particular subject. The development of technology, and in particular the field of natural language processing, is the basis for studies that try to identify a user's opinion on social networks. Many researchers have developed different methods for predicting stances, and the results do look promising.
However, one of the major problems the domain encounters is that each method is tailored to a specific dataset. Therefore, when a researcher chooses an algorithm that has yielded high accuracy, there is no guarantee that this accuracy will be reproduced on the dataset they are interested in (one of the reasons for this is overfitting), or that the algorithm will even run on it.
For these reasons, our group chose to implement a system that, given a dataset, allows a comparison between different stance detection models. The system allows the user to select a dataset from the ten datasets we offer or insert their own, select among the six algorithms we implemented, and compare the results. Information and statistics are provided and displayed in a dashboard that we created, which allows the user to analyze the data more conveniently.
In addition, our project has a research aspect in which we develop a new method for classifying stance. The method works with datasets from Twitter. For each tweet we created a graph that represents its propagation on Twitter. From the graph we extracted network features, which we fed into mainstream machine learning algorithms (SVM, logistic regression) or attached to existing methods. Since the method has not achieved high accuracy in all cases, we decided not to integrate it into our system at this stage, as it requires more in-depth research.
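A minimal sketch of the propagation-graph idea, assuming toy retweet/reply edge lists and illustrative network features (not the project's exact feature set):

```python
import networkx as nx
import numpy as np
from sklearn.svm import SVC

def propagation_features(edges):
    """Toy network features for a single tweet's propagation graph (retweet/reply edges)."""
    g = nx.DiGraph(edges)
    depth = nx.dag_longest_path_length(g) if nx.is_directed_acyclic_graph(g) else 0
    return [g.number_of_nodes(), g.number_of_edges(),
            np.mean([d for _, d in g.out_degree()]), depth]

# Hypothetical training data: one propagation graph per tweet, labeled with its stance.
graphs = [[("root", "a"), ("root", "b"), ("a", "c")],
          [("root", "a")],
          [("root", "a"), ("a", "b"), ("b", "c"), ("c", "d")]]
labels = ["favor", "against", "favor"]

X = np.array([propagation_features(g) for g in graphs])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:1]))
```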
|
|
Cell Death Annotation Tool using Active Learning
|
Students
Liat Cohen
|
Amit Sultan
|
Yarin Hayun
|
Haim Reyes
|
|
Guides
|
Abstract
In 2012, a biological mechanism which promotes propagating cell death was discovered. This mechanism might indicate the existence of communications between neighboring cells and that cells influence each other. To investigate cell death in tissues, researchers need to tag experiments with hundreds of microscope images, tagging dead cells in every image.
To make this process more efficient we developed a web application named “Cell Death Annotation Tool using Active Learning” which will serve as an easy-to-use generic platform for tagging cells in microscopy image stacks.
The system we developed is a two-phase platform for automating the cell tagging process. The first phase is basic image processing and initial tagging using StarDist, a CNN-based tool. As additional preparation for phase two, the algorithm also computes the cells' routes (i.e., tracks) throughout the image stacks and extracts relevant features from them.
The second phase improves the tagging using the extracted features, which are fed to an ML model, XGBoost, that classifies each cell tagging as either a true positive (the cell died in the frame) or a false positive (an error that occurred in the initial processing of the image stack).
The model utilizes Active Learning, an iterative learning method that includes the researchers in the model’s learning process. At the end of each tagging process the researcher will go through the results and correct them if necessary.
At the end of the active learning process, given satisfactory results, the model will tag images automatically, without manual intervention.
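A simplified sketch of the second-phase classifier and a single active-learning query step (the track features, labels, and query size are hypothetical):

```python
import numpy as np
from xgboost import XGBClassifier

# Hypothetical per-detection track features (e.g., intensity change, track length, displacement)
# and labels: 1 = true dead-cell tagging, 0 = false positive from the initial StarDist pass.
rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(200, 3))
y_labeled = (X_labeled[:, 0] + 0.5 * X_labeled[:, 1] > 0).astype(int)
X_pool = rng.normal(size=(1000, 3))  # detections not yet reviewed by the researcher

model = XGBClassifier(n_estimators=100, max_depth=3)
model.fit(X_labeled, y_labeled)

# One active-learning step: ask the researcher to review the most uncertain detections.
proba = model.predict_proba(X_pool)[:, 1]
uncertainty = np.abs(proba - 0.5)
query_idx = np.argsort(uncertainty)[:20]   # the 20 detections closest to the decision boundary
print("indices to send for manual review:", query_idx[:5])
```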
The project is done in collaboration between Michael Overholtzer’s lab at Memorial Sloan Kettering Cancer Center, NY and Assaf Zaritsky’s lab at Ben Gurion University.
|
|
Conflict-Free Multi-Agent Meeting
|
Students
Oran Shichman
|
Tomer Godelli
|
Oscar Epshtein
|
Shahar idan Freiman
|
|
Guides
|
Abstract
Multi-Agent Meeting (MAM) is the problem of finding a meeting location for multiple agents and paths to that location. Practically, a solution to MAM may contain conflicting paths. A related problem that plans conflict-free paths to a given set of goal locations is the Multi-Agent Path Finding problem (MAPF). In this project, we solve the Conflict-Free Multi-Agent Meeting problem (CF-MAM). In CF-MAM, we find a meeting location for multiple agents (as in MAM) as well as conflict-free paths (as in MAPF) to that location. We introduce two algorithms, which combine MAM and MAPF solvers, for optimally solving CF-MAM. We compare the algorithms experimentally, showing the pros and cons of each algorithm.
|
|
Temporal Data Imputation
|
Students
|
Guides
|
Abstract
Missing values are a common problem that occurs in the data gathering process in many domains. A common approach to handling missing values is imputation, in which values are injected in place of the missing ones. Most of the common standard imputation methods rely on inter-attribute correlations to estimate a missing value, while effective temporal imputation needs to employ time dependencies and make use of the temporal characteristics of the data along time. Today, this issue is a significant challenge, especially in the field of medicine. To overcome this problem and efficiently impute missing data in multivariate symbolic time-series datasets, we propose a new way of looking at the problem. In our method, the multivariate time-series data is first transformed into discrete temporal data, where each time-point value is classified into a state based on cutoffs obtained from different discretization methods. Then, for the imputation process, we use a sliding-window approach, using the existing data within the window to predict the missing values. Our hypothesis is that turning the data into discrete values using discretization methods will allow the model to learn from both the global and local structure of the data and to perform better imputation. In our experiments we will compare our method to other methods that perform imputation over discrete values. In addition, we will experimentally compare the results of state-of-the-art temporal classification techniques with and without our new algorithm on different temporal datasets, using different ratios of missing data.
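A toy sketch of the two steps described above (discretization into states, then window-based imputation); the cutoff choice, window size, and majority-vote rule are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical univariate series with missing values; the real method is multivariate and symbolic.
series = pd.Series([1.2, 1.5, np.nan, 2.1, 2.3, np.nan, 3.0, 2.8])

# Step 1: discretize observed values into states using equal-frequency cutoffs (one possible method).
cutoffs = series.dropna().quantile([0.33, 0.66]).values
def to_state(x):
    return int(np.searchsorted(cutoffs, x))          # 0 = low, 1 = medium, 2 = high

states = series.apply(lambda x: to_state(x) if pd.notna(x) else None)

# Step 2: sliding-window imputation - predict a missing state from its neighbors inside the window.
window = 1
imputed = states.copy()
for i in states[states.isna()].index:
    neighbors = states.iloc[max(0, i - window): i + window + 1].dropna()
    if len(neighbors):
        imputed.iloc[i] = neighbors.mode().iloc[0]   # majority state within the window
print(imputed.tolist())
```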
|
|
Identifying new connections between miRNAs using embedding techniques
|
Students
|
Guides
|
Abstract
This is a bioinformatics research project that deals with finding associations between miRNA molecules. miRNAs (microRNAs) are small non-coding RNA molecules that exist in the cells of most organisms, including humans. These molecules have a significant role in disease detection and prevention and in the development of medicines, and the field has thus become an attractive research domain.
The project aims to uncover new associations between miRNAs using NLP techniques that have not been used in previous studies related to miRNAs. It uses miRNA profiles, which show miRNA expression in samples from different organs of healthy and ill humans, and converts them into textual data in order to apply word embedding algorithms. This is implemented in three steps, beginning with the conversion of miRNA profiles into documents that contain the miRNAs as terms, producing a corpus of text documents that represent the healthy and ill samples. Next, word embedding algorithms are applied to produce the embedding vectors; each vector represents a miRNA molecule, taking into account its appearances across the corpus. Last, similarity indices are computed on the embedding vectors and the raw profiles. These indices help us find similar miRNAs and discover new associations.
After the pipeline is executed, the miRNA pairs with high computed similarity are compared to pairs already known to be associated, in order to deduce new associations. Among the newly discovered similar pairs, those indicated as associated are the ones that showed high similarity across multiple tissues, which may point to a potential collaboration between them.
|
|
Improving text correction and editing by integrating voice only
|
Students
Adi Flint
|
Inbal Bitton
|
Noa Shabtay
|
|
Guides
|
Abstract
Today, when technology is gaining momentum and is an integral part of our daily routine, the need for usable and efficient interfaces is rising. The world today offers users a wide range of natural interfaces in order to increase efficiency and productivity in our work, communication, and personal lives.
In our research, we address the question of what is the most natural and effective way to correct errors and edit text in documents. Several methods and methodologies have been proposed over the years as alternatives to the keyboard-and-mouse method, such as gaze recognition, combined voice and gaze use, and so on.
We developed a system that offers an alternative method based on interaction through voice recognition for correcting errors and editing text, and in an experiment we perform we examine whether this method is a more effective, efficient, and useful alternative for the user.
|
|
Automated machine learning (AutoML) using convolutional neural network (CNN)
|
Students
|
Guides
|
Abstract
This research aims to examine the usefulness of a convolutional neural network (CNN) as a meta-learner for machine learning algorithm selection by representing datasets as images. One of the most challenging steps in applying machine learning to a new dataset is the algorithm selection task. There are several available platforms for this task, but none uses computer vision in its solution. In the last decade, deep neural networks have made a significant breakthrough in image classification tasks, as can be seen in many studies and in extensive everyday use. Our approach creates an image representation for each dataset. This transformation allows us to use a CNN, which excels in image classification, and we utilize it to predict the best-performing algorithm. Our study is based on the hypothesis that datasets with similar image representations also have similar algorithm rankings. Moreover, because each whole dataset serves as a single instance, our training data is too scarce to train a full CNN model from scratch. As a solution to this problem, we examine the idea of utilizing a pre-trained CNN model instead of training the model from scratch. Specifically, we perform transfer learning from the third edition of Google's Inception CNN and adapt it to our task. Our experiments show that our proposed approach yields predictive performance on par with popular AutoML methods for algorithm recommendation.
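A minimal transfer-learning sketch in the spirit described above: a frozen InceptionV3 base with a new head that predicts the best-performing algorithm from a dataset's image representation (the image-encoding step and the portfolio size are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_ALGORITHMS = 5  # hypothetical size of the candidate algorithm portfolio

# Pre-trained InceptionV3 base, kept frozen; only the new head is trained.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3), pooling="avg")
base.trainable = False

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_ALGORITHMS, activation="softmax"),  # probability of each algorithm being best
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(dataset_images, best_algorithm_ids, epochs=10)  # trained on (image, best-algorithm) pairs
```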
|
|
Uncovering fundamental principles of muscle regeneration using quantitative live-cell imaging and machine learning
|
Students
|
Guides
|
Abstract
Upon muscle injury, specialized forms of muscle stem cells differentiate, migrate, and fuse into existing muscle fibers to repair the muscle. In order to decipher the regulation of this complex process, I am developing a computational framework to quantitatively monitor it using live-cell imaging data acquired from Dr. Ori Avinoam's lab at the Weizmann Institute of Science. My framework includes image processing to identify and track the cells, and machine learning to score the cells' differentiation state and decipher the relations between cell migration, differentiation, and fusion. Preliminary results indicate that cells tend to fuse into existing muscle fibers in a regulated manner and that cells conduct local communication that intensifies over the course of differentiation. Together, these findings suggest a hypothesized mechanism in which the differentiation stage of the cells is related to cell-cell communication: cells process information received from local neighboring cells and then decide accordingly where to migrate, when to differentiate, and when to fuse. These technical advances will allow us to shed light on the cellular and molecular mechanisms regulating muscle fiber formation and regeneration. In the longer term, these can improve post-injury regeneration by identifying novel mechanisms that enhance muscle function and lead to rapid muscle regeneration and recovery from fatigue.
|
|
Remote Motor Synchronization Project
|
Students
Yarden Schwartz
|
Eden Mizrahi
|
Avital Zehavi
|
Or Alfasi
|
|
Guides
|
Abstract
The subject of our project is remote motor synchronization, that is, performing synchronized motor actions by people who do not see each other. The system is designed to help researchers conduct research on the subject. The researchers' aim is to examine whether remote motor synchronization is possible and what its effects are on users. Our system provides the basis on which synchronization between remote people will take place. This is a first-of-its-kind system; one of its features is the ability to isolate one side of the interaction (i.e., one user) and thereby control and change the other user's ability to synchronize. With this control, we will be able to examine in a supervised way the effect of certain parameters on the ability to synchronize and on the user's feelings.
The final product is a system that includes an Android app in which the experiment for the study is performed. The system allows researchers to change parameters of the behavior of one of the parties and of the delay of the communication network, to collect and maintain data on the subjects' conduct in the game, and to prepare the data for analysis. The app contains a task shared by the human user and a virtual user (the "Agent"). During the experiment, the user plays against 5 "players", each embodied by different behavioral parameters - the level of delay and the reaction rate of the agent. The various parameters should make the user feel as if they are playing against 5 different players during the game.
As part of the project, we built the user interface of the app, developed an algorithm to calculate the timing of the agent's next action in the game, and defined the parameters that affect the agent's level of synchronization and its pace of action. The algorithm is based on the history of the agent's and user's actions in the game and on the parameters set for the relevant agent.
When we finished building the system, we performed a user experiment during which the subjects played with the app. We collected data on the course of each user's games and analyzed the data to check the robustness of the system and the quality of the experiments.
|
|
The Velodrom - An analytics and visualization system for a cycling team
|
Students
Amit Nachimovitz
|
Yaar May-paz
|
Inbar Tzur
|
|
Guides
|
Abstract
Israel Start-Up Nation is an Israeli cycling team rated as UCI WorldTour by the International Cycling Union.
To determine the training program for each cyclist, the team manager and coaches need information about the cyclist's physical condition and recent performance during training. Each cyclist in the team is equipped with a cycling computer that collects such information during the ride.
To obtain this information, the team uses an external API. The team's information manager uses a script to extract the information from the API and then produces various reports based on it. The reports are displayed in Google Sheets as various graphs for the selected cyclists. The main drawbacks of this process are that the information retrieval is not automatic and requires manual activation, and that the information is not stored by the team but only on the external API's servers.
The project goal is to create a centralized system that will automatically perform the processes of retrieving, processing, and storing the information, as well as enabling convenient and accessible presentation of the information using several customizable charts.
The system performs the task of collecting the information using scheduled processes that are activated automatically and retrieve information from the external API. The information collected is processed and stored in a SQL Server database that was created as part of the project. Then, another automated process updates the summary tables in a Data Warehouse that was created specifically for the project and the expected information needs, from which the information for the various charts is retrieved (designed for quick retrieval).
To display the information for the team coach or management, we built a Web application that allows users to view different graphs about the cyclists as well as perform various administrative actions.
|
|
OpenCoVid: COVID-19 use cases powered by computer vision
|
Students
Avihai Serfati
|
Dvir Simhon
|
Assaf Attias
|
|
Guides
|
Abstract
In December 2019, a pandemic named COVID-19 began spreading across the world. According to the WHO, as we write these lines, more than 169 million cases have been discovered. In order to contain the pandemic, the WHO published emergency instructions, such as wearing a face mask in public places and keeping a social distance of 2 meters between people.
The need to enforce and supervise people's compliance with these instructions created many difficulties. These difficulties led many people, ourselves included, to look for efficient, available, and accessible solutions that would make this coping easier.
We offer an efficient solution based on computer vision, focusing on automatic, real-time monitoring of people. The goal: check whether people wear face masks and measure the distance between the people who were detected.
Our solution uses a deep learning algorithm combined with geometric techniques to calculate distances. We use a neural network to analyze video streams with a state-of-the-art object detection algorithm named YOLO. Our solution has 2 main parts:
1. Mask detection model.
2. Algorithm for calculating distance between people.
Our software displays the outcomes of the analysis on screen in real time and marks the violators. Moreover, the software displays a results summary at every moment: how many people wore masks and how many social distancing violations were detected.
In our tests, we reached 81% accuracy with our mask-wearing model and 93% accuracy in distance calculation (60 cm safety margin, between people 2-3 meters apart).
Compared to other solutions available on the market today, we offer a dedicated and free solution. Our system does not require any dedicated equipment: you can run it on any camera output and receive immediate and accurate results.
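A simplified sketch of the geometric distance-estimation part, assuming person bounding boxes from a detector such as YOLO and an assumed average person height for pixel-to-meter conversion (not the project's exact method):

```python
import numpy as np

ASSUMED_PERSON_HEIGHT_M = 1.7  # assumption used to convert pixels to meters

def center_and_scale(box):
    """box = (x1, y1, x2, y2) in pixels; returns center point and meters-per-pixel estimate."""
    x1, y1, x2, y2 = box
    center = np.array([(x1 + x2) / 2, (y1 + y2) / 2])
    meters_per_pixel = ASSUMED_PERSON_HEIGHT_M / max(y2 - y1, 1)
    return center, meters_per_pixel

def estimated_distance(box_a, box_b):
    ca, sa = center_and_scale(box_a)
    cb, sb = center_and_scale(box_b)
    pixel_dist = np.linalg.norm(ca - cb)
    return pixel_dist * (sa + sb) / 2          # average the two scale estimates

# Two hypothetical person detections (e.g., produced by YOLO):
print(estimated_distance((100, 200, 160, 480), (420, 210, 485, 500)))
```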
|
|
Bounded Suboptimal Search In Multiplayer Game Trees
|
Students
Peleg Biton
|
Shachar Meretz
|
Asaf Zaks
|
Omer Nagar
|
|
Guides
|
Abstract
For many years, game-playing systems have been designed to beat world champions. The concept behind these systems is based on decision trees: the system maps the game states as a search tree, where each node represents a game state and edges represent actions between states. Systems based on naive algorithms that investigate all existing nodes cannot complete the task in a feasible time. Therefore, such systems limit the number of investigated nodes and determine a state's value by a heuristic calculation. Since the quality of the calculation increases the farther the evaluated nodes are from the initial node, pruning techniques have been developed to prune nodes in a way that does not affect the search optimality, allowing the search to reach deeper.
In this project, we defined conditions for performing a bounded suboptimal search in multiplayer game trees based on optimal pruning techniques. By imposing these conditions on existing optimal algorithms, we created two algorithms that have the potential to prune more, since they settle for a suboptimal solution (within a constant bound). The aim is to reach deeper nodes and make better decisions, despite the cost we agreed to pay.
In the experiments we performed using a random tree generator and a four-player Rolit game simulator, the suboptimal algorithms investigate deeper nodes and achieve better results than the optimal algorithms.
|
|
Meta Aggregator for Group Decision-Making Cases
|
Students
Michal Ezrets
|
Ortal Parpara
|
Raz Klein
|
|
Guides
|
Abstract
In the context of collective decision-making and problem-solving, whether it is in a group setting or a dispersed crowd, arriving at the optimal decision is considered a primary goal.
Composing methods for combining decisions made by multiple individuals into one collective decision is an essential issue in various research. Yet, most of the available techniques for aggregating a collective's final decision come down to a simple, deterministic aggregation rule. While simplicity has its advantages, it has been shown that different techniques are preferable in different situations. Thus, it would be beneficial to find a way to know which method to use in each given case and increase the probability of choosing the right decision.
We propose a meta-aggregation algorithm that, given a set of responses, suggests the method most likely to perform a successful aggregation of the final collective decision. At the core of the meta-aggregation procedure lies a machine-learning classification model that learns from past decision-making cases and their associated features. The classified instance is a decision-making case, and the classification indicates which rule-based method is expected to perform a successful aggregation. Our implementation includes a multi-label classification technique, psychology- and statistics-based features, and an additional novel aggregation method that solves difficult cases that classic aggregation methods could not.
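A simplified, single-label sketch of the idea (the project itself uses multi-label classification; the case features, labels, and rule names below are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Features describe a decision-making case (e.g., group size, answer entropy, confidence spread),
# and the label indicates which aggregation rule succeeded on that case.
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 3))
y = rng.integers(0, 3, size=300)          # 0 = majority, 1 = surprisingly popular, 2 = confidence weighted

meta = RandomForestClassifier(n_estimators=200, random_state=7).fit(X, y)

# For a new case, pick the rule the meta-model expects to aggregate correctly.
new_case = rng.normal(size=(1, 3))
rules = ["majority", "surprisingly popular", "confidence weighted"]
print("recommended rule:", rules[int(meta.predict(new_case)[0])])
```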
Experimental results show an increase in the success rate of about 20% when using the meta-aggregation program, compared to the best rule-based technique.
This approach has the potential to assist in better characterizing collective decisions and problem-solving cases.
|
|
Twixper: Twitter Experimentation Platform
|
Students
Dekel Levy
|
Tal Frimerman
|
Nir Dzouraev
|
|
Guides
|
Abstract
Social networks are one of the main channels through which people learn about what is happening in the world. It is critical to be able to conduct experiments on social media in order to better understand the social interactions between people, develop ways to ensure the health of online conversations, and build better social platforms.
In our project, we developed Twixper - a platform for organic experimentation on Twitter with consenting individual users. It allows researchers to manipulate the experience of real Twitter users with real content and record their interactions with it.
Our system consists of a website for researchers and a mobile app for participants.
Researchers can view experiments they created and download detailed logs of participants' activity. Consenting individuals can participate in an experiment by installing the mobile app we developed. The app emulates the organic Twitter experience and supports core Twitter functionality, including viewing the feed, reacting to content, posting new content, following and unfollowing accounts, searching, viewing profile information, and more.
The system supports several experimental manipulations including removing content from the feed, injecting content to the feed, pixelating media or removing it from tweets. Finally, researchers can download the complete log of activity in their experiments and assess the effectiveness of their manipulation on organic behavior of users on Twitter.
|
|
A Framework for Privacy Preserving Cloud-based ML
|
Students
|
Guides
|
Abstract
Recent advances in cloud computing and machine learning (ML) capabilities have made it possible for individuals to access state-of-the-art algorithms that until recently were only accessible to few. These capabilities, however, come with a significant potential risk: the loss of privacy. Solutions such as Homomorphic encryption keep the data encrypted at all times, but they are computationally expensive and require cooperation from the cloud’s service provider. Differential privacy solutions ensure a certain degree of obfuscation, but still enable an attacker to infer various forms of information about the processed content. In this study we propose a novel solution that offers many of the advantages of Homomorphic encryption at no additional computational cost at the cloud side, and at a limited cost at the client side. By encrypting our data using randomly generated deconvolutional nets and training an architecture that translates the cloud’s outputs into meaningful classifications, we are able to create a secure and simple encryption solution. Empiric analysis demonstrates the effectiveness and robustness of our approach.
|
|
Defining Evading Policy for Pursuit-Evasion Problems
|
Students
|
Guides
|
Abstract
The pursuit-evasion problem describes two agents, a pursuer and an evader, operating in a 3D environment. The evader aims to reach a goal from a pre-defined set of static goals by reaching its position. The pursuer's purpose is to prevent the evader from reaching a goal by reaching the evader's position. Once the pursuer captures the evader, both agents disappear from the environment. The research focuses on the evader, and the main goal is to develop a method that helps the evader find the optimal policy. The optimal policy allows the evader to maximize the chances of reaching a goal and minimize the chances of getting caught. The main challenge is that the evader has no information about the location or movement of the pursuer, whereas the pursuer knows where the evader is at all times. To find the optimal evading policy, the pursuit-evasion problem is formalized as a Q-learning model. According to this formalization, a state is defined as a tuple that contains the evader's position (x, y, z) and speed. The speed is the change in position per time unit, i.e., (dx, dy, dz). An action is defined as the change in the evader's speed (1, -1, or 0 for each axis). Through the experiments, the evader generates millions of trajectories and examines for each of them whether it reached a goal, was caught by the pursuer, or ran out of time. Eventually, the evader learns the policy that allows it to reach as many goals as possible.
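A minimal tabular Q-learning sketch using the state and action encoding described above; the environment dynamics, reward, and hyper-parameters are placeholder assumptions:

```python
import itertools
import random
from collections import defaultdict

ACTIONS = list(itertools.product([-1, 0, 1], repeat=3))   # change of speed per axis
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

Q = defaultdict(float)   # Q[(state, action)] -> value; state = (x, y, z, dx, dy, dz)

def choose_action(state):
    """Epsilon-greedy action selection over the 27 possible speed changes."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Standard one-step Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One illustrative transition: the evader changes its speed, moves, and receives a small step penalty.
s = (0, 0, 0, 0, 0, 0)
a = choose_action(s)
s_next = (s[0] + s[3] + a[0], s[1] + s[4] + a[1], s[2] + s[5] + a[2],
          s[3] + a[0], s[4] + a[1], s[5] + a[2])
update(s, a, reward=-1, next_state=s_next)
```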
|
|
Pseudomonas AeruginoSite - Web platform for exploration of bacterial defense systems
|
Students
Inon Ben zekri
|
Ovadia ido Efroni
|
Alon Golombek
|
Tomer Seinfeld
|
|
Guides
|
Abstract
This bioinformatics development project provides a platform for mining the defense systems of the bacterium Pseudomonas aeruginosa via a user-friendly web application. Due to the vast amount of existing information, the lack of a Pseudomonas aeruginosa-specialized interface, and the absence of tools for researching defense systems and their phenotypic or genomic correlations, the need for a cross-referenced information platform arose.
This project revolves around the defense systems of Pseudomonas aeruginosa and provides information regarding ~18 different defense systems in ~6000 strains, each of their ~6000 genes, and ~50,000 gene clusters of the bacterium. The project enables researchers to explore the defense systems of bacterial species and to better understand their association with genomic, environmental, and phenotypic traits.
The application provides a variety of displays and visualization tools that help researchers gain insights with a few clicks, among them: 1. An interactive tabular display of the genes of the desired strains. 2. An interactive Circos Genome Browser of a desired strain's genome, with information about the genes associated with specific defense systems. 3. A circular phylogenetic tree with the distribution of the desired associated data (i.e., defense systems, metadata, and more). 4. Correlation tests between any pair from the following groups: defense systems, clusters, strains, gene attributes (e.g., genome size and more), and metadata. 5. Download of any data from the system's database for personal use.
The goal of this project is to supply researchers and the scientific community with a web interface to consume, map, and visualize the rather complex information on Pseudomonas Aeruginosa.
|
|
Identifying bug-inducing commits
|
Students
|
Guides
|
Abstract
Predicting defects using machine learning is a known problem that has been researched for many years. The main goal of this thesis is to detect code defects before they are committed to the repository. In order to achieve this goal, we will investigate two things. First, we will find a set of features that indicate the introduction of a bug into the source code. The features will be based on static analysis, and these features should allow us to identify the insertion of a bug into the code. Second, we will build a model that, with the help of the feature set we found, will enable us to identify whether a particular commit has induced a bug in the code. This model is based on Generative Adversarial Networks (GANs). Using a GAN, we can generate additional samples and improve bug prediction.
|
|
Boosting Anomaly Detection Using Unsupervised Diverse Test-Time Augmentations
|
Students
|
Guides
|
Abstract
Anomaly detection is a well-known task that has been studied for decades. Anomalies are observations that do not meet the expected behavior with respect to some context or domain. Anomalous events occur relatively infrequently, yet they can have serious and dangerous consequences in domains such as intrusion detection in cyber security, credit card fraud, health care and insurance, and industrial damage. Test-time augmentation (TTA) is the application of a data augmentation technique on the test set. This is done by generating multiple augmented copies of each test sample, predicting each of them, and combining the results with the result of the original sample. TTA is more efficient than data augmentation at the training phase, since it does not require re-training the model, while preserving accuracy and introducing robustness. To the best of our knowledge, no existing studies utilize TTA for anomaly detection on tabular data.
We propose a TTA-based method to improve the performance of anomaly detection. We take a test sample's nearest neighbors and generate its augmentations using the centroids of a k-Means model trained on the sample's neighborhood. Our advanced approach utilizes a Siamese network to learn an appropriate distance metric to use when retrieving a test sample's neighborhood. We show that for all eight datasets we evaluated, anomaly detection that uses our TTA approach improves AUC significantly. Moreover, for almost all evaluated datasets (except one), the learned distance metric approach shows a better improvement than the nearest-neighbors model with the Euclidean distance metric.
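A minimal sketch of the neighborhood-centroid augmentation step, assuming a Euclidean nearest-neighbors search and an Isolation Forest as the underlying detector (both illustrative choices):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))            # hypothetical tabular training data
x_test = rng.normal(size=(1, 4))               # one test sample

detector = IsolationForest(random_state=0).fit(X_train)   # any anomaly detector could be used here
nn = NearestNeighbors(n_neighbors=50).fit(X_train)

# TTA step: the augmentations are the k-Means centroids of the test sample's neighborhood.
_, idx = nn.kneighbors(x_test)
neighborhood = X_train[idx[0]]
centroids = KMeans(n_clusters=5, n_init=10, random_state=0).fit(neighborhood).cluster_centers_

# Combine the scores of the original sample and its augmentations (simple average here).
scores = detector.score_samples(np.vstack([x_test, centroids]))
print("aggregated anomaly score:", scores.mean())
```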
|
|
KarmaLegoWeb Time Interval Mining System
|
Students
Yiftah Szoke
|
Omer Hofman
|
Roi Katz
|
|
Guides
|
Abstract
KarmaLegoWeb is a Temporal Data Mining project. Today, to extract Time Interval Related Patterns (TIRPs) from massive data, a researcher is required to use several applications and possess highly technical programming knowledge.
KarmaLegoWeb simplifies this process into a single web application where every user can upload time-based data, extract unique patterns, analyze them from several different angles using advanced visual tools, and share their findings with the world.
The client of this project is Dr. Robert Moskovitch and the CDALab laboratory. The project goal is to create an all-in-one time-based data mining platform that makes the use of time-based algorithms efficient, simple, fast, and stable. Our system has been introduced to the laboratory and already aids various studies.
The chosen solution is a web application that provides a simple and accessible UI over the technical aspects of time-interval pattern mining. In building the project, we used many technologies, such as Python, Flask, React, Apache, and more.
In the project we faced many challenges, including designing and creating the server- and client-side applications and building a visualization module that can display the algorithm output in an interactive way, from which the user can derive new insights. In addition, we integrated several lab projects (thousands of lines of code written in different languages) into one web app.
We achieved all the defined goals and today the system is available at the following link: https://icc.ise.bgu.ac.il/njsw22
|
|
Bidi Research Platform
|
Students
Razy Alchech
|
Mohsen Abdalla
|
Evgeny Umansky
|
|
Guides
|
Abstract
Today we are living in a technological age where we can build highly sophisticated applications and websites. Nevertheless, sometimes the interfaces of those applications or sites do not display the content in a way that suits users whose language is written from right to left (RTL), such as Hebrew and Arabic. This is due to the lack of standards and design guidelines for BiDi languages, and the lack of research that deals with bidirectional languages (i.e. languages whose main writing direction is from right to left).
To help researchers in the field, we developed a web-based system for conducting research in languages with different writing directions. There is currently no similar system on the market. Our system will be used by researchers to build and run experiments with human participants. The system supports the researchers in developing the experiments and the experimental stimuli. It further supports the researchers by saving metrics on the performance of different participants, providing statistical analyses, offering flexibility in manipulating the presentation direction of different controls, and ensuring consistency in presenting the experiment content in different languages and writing directions - from left to right and from right to left.
Our system provides the researchers with all the above in order to improve the user experience and also to refine existing design standards, which will help website and application developers.
The system's customers are researchers who will use a system built to conveniently construct experiments, conduct them, and analyze their results.
The system supports 4 languages: Hebrew, Arabic, Russian and English. The multiplicity of languages in the system will allow researchers to get a more general picture of users' preferences for how content and controls are displayed in interfaces, on the one hand, and a more specific picture when referring to a particular language, on the other hand.
Finally, we believe that our system will contribute greatly to the issue of displaying content controls on multilingual sites. It is also a research-support system that can further evolve and promote positive change in the field of user experience.
|
|
Prediction of Multiple Sclerosis Prognosis
|
Students
Guy Zamostiano
|
Itai Katz
|
Shira Wertheim
|
|
Guides
|
Abstract
Multiple sclerosis (MS) is the most prevalent chronic inflammatory disease of the central nervous system (CNS), affecting more than two million people worldwide. The course of MS is highly varied and unpredictable. In most patients, the disease is characterized initially by episodes of reversible neurological deficits.
MS patients are usually divided into groups based on the severity of their condition. Our research focuses on two main groups: relapsing-remitting MS (RRMS) patients and secondary progressive MS (SPMS) patients. RRMS patients very often experience the disease in a mild form that does not affect their daily routine, while SPMS patients' condition deteriorates constantly.
Our project addresses the challenge of identifying patients in the RRMS group who are most likely to have their medical condition worsen in the following years and move to the SPMS group. Our efforts focus on developing a model that uses the patient's clinical data from the moment he or she was diagnosed in order to face this challenge. To produce such a model, we make use of machine learning algorithms and methods.
To achieve the project's goals, we used a variety of machine learning models such as logistic regression and random forest. In addition, we supported our solution with advanced tools, such as Lasso regression and SHAP to select the most influential features and a GAN architecture to enrich our datasets and achieve more promising results. We also used leave-one-out cross-validation to make our results more reliable.
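A small sketch of evaluating such a classifier with leave-one-out cross-validation (the clinical features, labels, and L1-penalized model are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical clinical features at diagnosis and a binary label: 1 = progressed to SPMS.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 5))                       # small cohorts make LOOCV attractive
y = (X[:, 0] + 0.8 * X[:, 2] + rng.normal(scale=0.5, size=60) > 0).astype(int)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)  # L1 ~ Lasso-style feature selection
scores = cross_val_score(clf, X, y, cv=LeaveOneOut(), scoring="accuracy")
print("LOOCV accuracy:", scores.mean())
```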
|
|
Developing interactive technology for remote Interpersonal Motor Synchronization
|
Students
|
Guides
|
Abstract
A multitude of research in the last decade has been devoted to interpersonal motor synchrony: the time-based alignment of people’s movement. Studies generally show that after engaging in short sessions of temporally synchronous motor activity with another person, pro-social effects such as increased empathy, willingness to help, and perceived similarity are demonstrated.
In our research, we developed and examined smartphone-based remote interpersonal motor synchronization (IMS) activities to test whether remote IMS leads to prosocial effects similar to co-present synchronization. In addition, we examine the effect of network latency and aspects of information exchange on remote IMS.
For our research, we developed an application using Flutter and WebRTC communication technology. Pairs of users perform activities through the application in real time. We examine different configurations of IMS using two activities: (1) Tapping activity: in-phase synchronized tapping between two remote users, and (2) Slingshot activity: an anti-phase activity of passing a puck between peers with different speeds. Each of the activities requires a different temporal pattern and different movement, and has different information exchange. We measure achieved synchronization using different indices, and administer a prosocial attitude questionnaire. We hypothesize that successful remote IMS (as measured by subjective and objective synchronization measures) will lead to more prosocial attitudes. We also hypothesize that increased network latency will reduce subjective and objective synchronization measures.
|
|
ModaMedic V2 - Patient information data extraction and analysis system in collaboration with the Orthopedic Department - Soroka Medical Center
|
Students
Shay Eretz kdosha
|
Gal Buzaglo
|
Noy Harari
|
Sahar Ben baruch
|
|
Guides
|
Abstract
Today, surgeons in the Orthopedic Department at Soroka Hospital perform surgeries and have no objective indication of the success of the surgery. Physicians make decisions about continuing treatment based on the patient's description of his or her feelings, as they cannot obtain non-subjective information about the change that has occurred in the patient's daily life following surgery. Dr. Alex Geftler, a senior physician in the Orthopedic Department at Soroka University Medical Center, raised the need for a solution to this problem and also served as our source of knowledge for the medical processes involved in developing the system.
Our system enables the analysis and presentation of objective mobility data on patients before and after surgery. The system monitors metrics that allow physicians to study the patient's condition, draw conclusions about the success of the surgery and its impact on the patient's daily life, and decide on continued treatment. It can also compare a patient to a similar group of patients according to different criteria in order to check his or her rate of progress and estimate the rate and duration of recovery more accurately. The system also allows the patient to view his or her measurements, watch home physiotherapy training videos to speed up recovery before or after surgery, and communicate with the physician through a "messages board."
The project team received approval from the Helsinki Committee and carried out a deployment with the participation of patients from the Orthopedic Department at Soroka Hospital, which included training the application's users in the use of the system as well as providing technical support from the development team. We received positive feedback from most users of the system and from the medical staff, and a meeting was held to assess future development. It can be concluded that there is significant potential for improving service in the public health system. Furthermore, this project will provide an infrastructure for performing time-dependent analyses of the unique information collected in follow-up projects. We would like to enable research questions to be answered for various needs, such as patient classification, predicting diagnoses according to time-dependent patterns, and predicting the success of surgeries.
|
|
Evaluation system for ML-based predictions' explanations
|
Students
Hen Debi
|
Shiran Golzar
|
Michal Talmor
|
|
Guides
|
Abstract
In recent years, machine learning algorithms have been widely used. In many cases, the decisions made by these algorithms are made in a way that is non-transparent to the user. Providing explanations for the results of machine learning algorithms is essential for researchers and users to make responsible decisions in many areas.
Explainability is based on the idea that learning algorithms usually treated as a "black box" will become "transparent box" algorithms and show the user the features that were dominant in every decision the model made.
As the field develops, there are still not enough tools to determine whether the explanations received are indeed correct, consistent, robust, or satisfy other evaluation aspects.
In this research, we address this need and provide an innovative evaluation of explanations obtained through XAI, so that the user can rely on these explanations. As part of our project, we developed 4 innovative methods for evaluating explanation models, based on 3 types of evaluation: robustness, faithfulness, and consistency. All our methods were tested on the existing explanation models SHAP and LIME.
The first method evaluates the explanations' robustness, testing how sensitive the explanation model is to noise. The second method evaluates faithfulness and uses counterfactual examples to test how faithful the explanations are. The last two methods evaluate consistency, using an entropy measure and cluster interactions.
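A simplified sketch in the spirit of the robustness evaluation: perturb a sample with small noise and measure how much its SHAP explanation changes (the noise scale and cosine-similarity score are illustrative choices, not the project's exact definitions):

```python
import numpy as np
import shap
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Train a tree model and build its SHAP explainer on synthetic "medical-like" data.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
model = XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)
explainer = shap.TreeExplainer(model)

# Perturb one sample with small Gaussian noise.
x = X[:1]
x_noisy = x + np.random.default_rng(0).normal(scale=0.05, size=x.shape)

phi = explainer.shap_values(x)[0]
phi_noisy = explainer.shap_values(x_noisy)[0]

# Robustness score: cosine similarity between the two explanation vectors (1.0 = unchanged).
cosine = np.dot(phi, phi_noisy) / (np.linalg.norm(phi) * np.linalg.norm(phi_noisy))
print("explanation robustness (cosine similarity):", cosine)
```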
As part of our research, we tested our evaluation methods on 3 different datasets with 3 tree-based prediction models, such as XGBoost and random forest. All the datasets include medical features used to predict disease in patients.
Moreover, we developed a user interface that allows researchers to run explanations on machine learning models and obtain our developed evaluations as well.
|
|
DonateItApp
|
Students
Merav Shaked
|
Tair Cohen
|
Gal Rosenthal
|
Yuval Ben eliezer
|
|
Guides
|
Abstract
Today, there are a variety of platforms in Israel that allow donations to nonprofit organizations as well as the sale of second-hand products, but they do not provide the combination of the two, which we believe will benefit and improve nonprofit organizations' ability to raise money and increase the motivation for, and value of, selling second-hand products.
DonateItApp is a website and mobile app that serves as a social platform aiming to help small nonprofit organizations through the sale of second-hand products - the money paid for the products is donated to an organization - as well as to increase awareness of these organizations. Currently, we have collaborated with several nonprofit organizations.
The donor uploads an item with its details. Other users can view the products that were uploaded. After selecting the requested product, the user is asked to pay for it, according to the value determined by the donor. With the consent of both parties, the money will be transferred to the nonprofit-organization of choice and the product will be transferred between the two. Thanks to our application, both users are satisfied, one by donating a product and the other by donating money to a good cause. In addition, users can see the distribution of their donations among the various nonprofit-organizations to which they have donated. Also, any user can search for products or nonprofit-organizations by relevant filters.
After viewing a particular product, the system will recommend products according to pre-known criteria. Each nonprofit-organization has a dedicated user and with it they can publish events and edit the personal page.
https://donateitapp.herokuapp.com
|
|
RepFeed: A representative social media feed
|
Students
Itay Merhav
|
Matan Bruker
|
Aviran Goel
|
Doron Shamai
|
|
Guides
|
Abstract
In recent decades, political polarization in almost all western societies has increased considerably. Many see social media platforms and the personalized feeds that they provide as key vectors for polarization.
Empirical evidence supports these claims by showing that ranking algorithms contribute to the creation of filter bubbles and echo chambers -- environments where only one side of a debate is over-represented and discussed.
RepFeed aims to reverse these trends by providing users with an authentic and diverse set of political opinions. RepFeed is a system that includes a database and a server, integrates with Twitter's web interface via a Chrome extension, and adds two social feeds.
The first feed allows users to see an aggregate feed of content from people along different points on the political spectrum. The second feed allows users to drill down on the point of view of a specific cross-section of the population, for example, seeing the feed of individuals of a certain age range, gender, state, race and political party.
RepFeed is based on a unique panel of registered American voters on Twitter, which powers the organic, dynamic and live content that is available in RepFeed.
|
|
Recommendation System For OCL Constraints
|
Students
Amit Wolf
|
Ohad Nave
|
Idan Albilia
|
|
Guides
|
Abstract
OCL is a language used to define constraints on objects and pre/post conditions on operations when modeling software systems. The language was developed to overcome the design limitations of software systems modeling using UML and its use significantly improves the accuracy of system specifications.
Despite the many benefits of OCL, in practice non-expert users have difficulty writing legal and correct OCL constraints. Our research attempts to solve this problem by using machine learning techniques to recommend constraints for a given model. With the help of a large dataset containing a sufficient number of models describing different software systems, together with their corresponding OCL constraints, we study different aspects of those models, mainly the patterns in their structure and the semantics of their constraints, so that we can eventually predict new constraints for a model received as input to the system.
The main research objective consists of several solution levels built on top of each other; due to the complexity of each level, the ultimate goal will be realized in further research, for which our system and the accompanying study form most of the necessary foundation.
The system has two main components: the first is responsible for preparing the data used for the learning process, and the second is responsible for prediction procedures and the execution of experiments. In the data-preparation process, the system parses the entire dataset while verifying its legality by interfacing with external tools that were adapted to our system, dismantles each OCL constraint into the language's building blocks, and places the legal information into a final database. The second component extracts features of all the objects in the database, from features derived from the raw information to more advanced features related to graph embeddings. This component includes an experimental system that is able to predict on three different levels of solution, using a large number of different parameters and possible values, and exports the experimental data to disk.
|
|
Predicting the Success of Social Media Marketing Campaigns
|
Students
|
Guides
|
Abstract
The size and growth speed of social media networks is tremendous, with 4.14 billion active users in 2020. As social media attracts a significant number of users, it is of great interest to brands and e-commerce marketing efforts. Advertising efforts in social media are carried out by companies through digital campaign management. A digital campaign manager must consider dozens of performance measures in every time period and decide whether a marketing campaign is going to be profitable in the future or not. Based on the campaign manager's decision, actions such as pausing the campaign or scaling it up are taken. There is a large number of measures and statistics (e.g., sales history) that the campaign manager should consider while taking an action, which makes this task very challenging. In this study, we collect unique data from real e-commerce companies' campaigns, derived from the largest social media networks, Facebook and Instagram. We then suggest a machine-learning, data-based method for assisting campaign managers with their daily decisions. Using our method, campaign managers can make greater profits by scaling promising campaigns further or by pausing unprofitable ones, saving considerable expenses.
|
|
Monitoring and Forecasting Changes in Online Feature Selection
|
Students
Gal Atedgi
|
Roei Cohen
|
Roi Nissan
|
|
Guides
|
Abstract
Our project is a research project that deals with the implementation of Online Feature Selection (OFS) algorithms, validation of the algorithms' results, evaluation of the algorithms with different hyper-parameters, and comparison of the results in a comprehensive review article.
First, we implemented five OFS algorithms in Python (Alpha-Investing (open-source code), F-OSFS, OSFS, SAOLA, and FIRES (open-source code)), which are considered state of the art in the field of machine learning.
The next stage was validation: for each algorithm, we used the same conditions described in its original article, with the same hyper-parameters and the same datasets.
Our goal was to obtain results similar to those reported by the articles' authors, so we could show that the algorithms we reconstructed from the articles were implemented correctly.
Then, after the validation stage, we continued to the experimental stage, which included the 5 OFS algorithms, 4 online learning (OL) algorithms (KNN, NN, RF, NB), and 10 time-stamped datasets. We also performed experiments without any OFS algorithm.
The experimental setup was predefined and included 5 different window sizes (100, 200, 300, 500, 1000) and different hyper-parameters for each experiment (for example, K=3/5 in KNN).
The OFS algorithms were evaluated through the OL algorithms mentioned above, and the resulting evaluation measures were documented in reports that were later used to write the comprehensive review article.
Finally, after analyzing the results, the most noticeable conclusion was that the benefit of an OFS algorithm depends on the specific dataset: it can either improve the results or worsen them.
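To make the evaluation procedure concrete, the following is a minimal, hedged sketch of a sliding-window (prequential-style) evaluation of an OFS step combined with an online learner. The select_features function is a deliberately naive placeholder, not one of the five implemented algorithms, and the window size and data here are illustrative only.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def select_features(X_window, y_window, k=10):
    # Placeholder OFS step: keep the k features with the highest variance.
    return np.argsort(X_window.var(axis=0))[-k:]

def prequential_evaluation(X, y, window_size=100, k_features=10, k_neighbors=5):
    accuracies = []
    for start in range(0, len(X) - 2 * window_size, window_size):
        train = slice(start, start + window_size)
        test = slice(start + window_size, start + 2 * window_size)
        feats = select_features(X[train], y[train], k_features)
        clf = KNeighborsClassifier(n_neighbors=k_neighbors)
        clf.fit(X[train][:, feats], y[train])
        preds = clf.predict(X[test][:, feats])
        accuracies.append(accuracy_score(y[test], preds))
    return np.mean(accuracies)

# Example with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))
y = (X[:, 3] + X[:, 7] > 0).astype(int)
print(prequential_evaluation(X, y))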
|
|
A tool for interactive visualization and analysis of biological data
|
Students
Matan Gadasi
|
David Zaltsman
|
Gal Burabia
|
Chen Arazi
|
|
Guides
Isana Veksler-lublinsky
|
Meirav simha Maimon
|
|
Abstract
The project is a research project in the field of bioinformatics, focusing on the visualization and analysis of biological data, with an emphasis on microRNA data and genes.
The study of microRNA data has gained momentum in recent years thanks to new technologies that enable mapping the genome of living organisms. Genome mapping creates a huge amount of data that biological researchers must study in order to find connections between microRNAs and their target genes. Understanding these connections can lead to finding ways to treat many diseases. A prominent related example is the COVID-19 vaccine, which is based in part on RNA technology.
Today, there are several microRNA data research systems that allow interactive visualization of the data along with the use of statistical algorithms to explore the data. However, the main disadvantage of these systems is that they do not support exploring two heat maps simultaneously, which impairs the user's ability to understand the connections between the different maps.
Our system is a unique web application that supports exploring two heat maps and the connections between them. In such a way, it is easier for the user to see the connections between the maps and find new insights. The system uses machine learning algorithms and statistics algorithms to find insights from the data and uses interactive visualization tools such as interactive heat maps to present the data in a user-friendly way.
|
|
CRM-CMS Real Estate
|
Students
David Fadida
|
Gal Azaria
|
Eitan Platok
|
|
Guides
|
Abstract
The purpose of the system is to provide a solution to the problem of managing the information that flows through the organization and, in addition, to enable real estate agents to manage information, assets, and customers in the best and most efficient way, thus generating more revenue for the firm.
According to our client's work methodology, the system should support a method called Pipeline, effectively making real estate agents work solely according to the method and tasks the manager defines for them. This method is implemented in the system by preventing agents from exiting a page until they have completed their task.
As part of the system's capabilities, appropriate reports are generated, which we characterized together with the client, allowing the real estate agent, office manager, and company manager to track progress, sales, training, phone calls, etc.
In addition, the system enables office management and agent management according to a variety of permissions, allowing the manager to monitor and manage all the employees under him efficiently and accurately without having to perform operations manually.
The system is cloud-based: the backend and frontend servers are hosted on AWS EC2, and the database is hosted on AWS RDS.
|
|
WeKeyLeak
|
Students
Aviv Amsellem
|
Yarden Curiel
|
|
Guides
|
Abstract
Cyber attacks are a major concern in the internet security community. Corporations and governments invest enormous effort in securing and protecting their data from malicious acts.
Covert channel attacks focus on communication channels that were not intended to transfer data. In this way, covert channel attacks have the potential to exfiltrate data from machines through non-standard communication channels. One kind of covert channel attack, which we demonstrate in our final project, is an optical channel attack that uses light.
In our scientific project, “WeKeyLeaks”, we engage with the cyber field and demonstrate the possibility of transferring and leaking data through an optical covert channel. Using an RGB keyboard and a standard smartphone camera, the project provides a proof of concept that RGB keyboards can transfer/leak a considerable amount of data per second.
In our attack model, the base assumption is that there is malware inside the victim's computer that can find sensitive data and control the RGB keyboard's lights.
The model is based on two main modules. The first is the RGB keyboard, which is used to transfer or leak the data. The second is a standard camera, which receives the data using computer vision.
In conclusion, we found that the optimal light exposure intensity for accurate data transfer is, on average, 500 lux. At this intensity, we managed to reach up to 100% accuracy for more than 130 characters in a reasonable amount of time.
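As an illustration of how such an optical channel can carry data, the following sketch shows one simple way to map bytes to keyboard colors and back. The 2-bit-per-color encoding here is our own simplified assumption for illustration; it is not necessarily the encoding used in the project.

# Each 2-bit symbol becomes one of four distinct colors that a camera can later decode.
SYMBOL_TO_COLOR = {
    0b00: (255, 0, 0),     # red
    0b01: (0, 255, 0),     # green
    0b10: (0, 0, 255),     # blue
    0b11: (255, 255, 255), # white
}
COLOR_TO_SYMBOL = {v: k for k, v in SYMBOL_TO_COLOR.items()}

def encode(data: bytes):
    """Turn bytes into a sequence of RGB colors (4 colors per byte, MSB first)."""
    colors = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            colors.append(SYMBOL_TO_COLOR[(byte >> shift) & 0b11])
    return colors

def decode(colors):
    """Inverse of encode(); assumes the camera recovered the colors exactly."""
    data = bytearray()
    for i in range(0, len(colors), 4):
        byte = 0
        for color in colors[i:i + 4]:
            byte = (byte << 2) | COLOR_TO_SYMBOL[color]
        data.append(byte)
    return bytes(data)

assert decode(encode(b"secret")) == b"secret"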
|
|
Media Slant: monitor and measure biases in different media outlets
|
Students
|
Guides
|
Abstract
The existence of media slant might seem obvious to the general public, but this matter has been troubling researchers in the fields of social science and computer science for years. I show that when conservative and liberal sources cover an event, the tone of the coverage matches the themes, persons, and entities (TPE) that they choose to focus on and that support their political view. I trained a Deep Neural Network model over 2 years' worth of news data on the following task: predict which aspects of news events liberals and conservatives will emphasize. I use this model to predict their responses to the 10 highly controversial episodes being monitored. Moreover, using simulated data, I identify the core TPE that minimize and maximize the polarization between liberals and conservatives. In addition, I suggest, through the use of autoencoders, a method to predict changes in an event's presentation, that is, the way an event is going to unfold in each political view. While most of the previous work in the field focused on predicting the sentiment of an article or simply identifying slanted text, my approach offers a more direct measure of media slant that takes article representations into consideration. For performing these tasks, a wealth of articles from the Global Data on Events, Location, and Tone (GDELT) dataset is utilized by monitoring articles from 25 different media sources.
|
|
Robo Advisor
|
Students
Noa Gorengot
|
Omer Shlomo
|
Israel aviel Fedida
|
Moran Chery
|
|
Guides
|
Abstract
In recent years there has been a surge in the number of companies providing digital investment consulting services and in the amounts of money managed through this service. According to forecasts, this surge is expected to continue to rise in the coming years. However, many people are still afraid to enter the investment world, as they have no knowledge in the field and there is concern about the various costs involved in building and managing the portfolio.
Robo Advisors are methods for automating the allocation of assets using a computerized algorithm. The use of technology is vital in any competitive market. In our case: costs such as management fees, trading fees and more can be significantly reduced. All this while creating a portfolio that is hopefully tailored for the customer.
In this project, we built a platform for constructing investment portfolios through a Robo Advisor that uses various algorithms to build an optimal investment portfolio adapted to the client's needs and character. This is done via a dedicated questionnaire the user answers; we analyze the results and assign a risk profile. This profile is later used by various portfolio algorithms to select the percentage of each asset in the portfolio.
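The following is a minimal sketch of the questionnaire-to-portfolio flow described above. The scoring thresholds and asset allocations are hypothetical placeholders, not the platform's actual algorithms.

def risk_profile(answers):
    """answers: list of integers 1-5 from the questionnaire; higher = more risk-tolerant."""
    score = sum(answers) / (5 * len(answers))   # normalized to [0, 1]
    if score < 0.35:
        return "conservative"
    if score < 0.7:
        return "balanced"
    return "aggressive"

# Hypothetical asset allocations per profile (fractions of the portfolio).
ALLOCATIONS = {
    "conservative": {"bonds": 0.70, "stocks": 0.20, "cash": 0.10},
    "balanced":     {"bonds": 0.40, "stocks": 0.50, "cash": 0.10},
    "aggressive":   {"bonds": 0.15, "stocks": 0.80, "cash": 0.05},
}

profile = risk_profile([4, 5, 3, 4, 2])
print(profile, ALLOCATIONS[profile])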
In addition, to help provide information for potential investors, we have built an information center that will make investment information accessible to the public. Furthermore, we have built a forum where users can discuss the results they have received from the system and share additional knowledge.
|
|
Leveraging text summarization accessibility using different visualizations
|
Students
Yasmin Avraham
|
Rotem Miara
|
Meytal Yaniv
|
Roman Grig
|
|
Guides
|
Abstract
Our project is part of a study on text summarization conducted at the university. Our research question is which visualization best presents a text summary so that it is understood most effectively. We argue that presenting the summary in a readable and convenient way will help the subject get the most out of the summary and save valuable time. The goal is to improve text accessibility using various text highlighting and emphasis methods to display text summaries.
We will study the impact of visualization techniques on user effectiveness, time efficiency, and satisfaction, using methods that are well known from the relevant literature, such as font size and text highlighting, as well as additional visualization techniques developed specially for this project, such as a gradually increasing font and a gradually highlighted background. The impact of each visualization on the test results will be compared with the plain-text and summarized-text methods. We use a BERT-based model, which ranks the sentences in the text according to their relevance to the text's main subject.
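The following is a simplified sketch of extractive sentence ranking of the kind described above. TF-IDF vectors stand in here for the BERT sentence embeddings used in the project, and the example sentences are invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_sentences(sentences, top_k=2):
    # Score each sentence by its similarity to the text as a whole,
    # then return the indices of the top-ranked sentences.
    vectorizer = TfidfVectorizer()
    sent_vecs = vectorizer.fit_transform(sentences)
    doc_vec = vectorizer.transform([" ".join(sentences)])
    scores = cosine_similarity(sent_vecs, doc_vec).ravel()
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]

sentences = [
    "The experiment compares several text visualizations.",
    "Participants read texts and answer comprehension questions.",
    "The weather was pleasant on the day of the study.",
]
print(rank_sentences(sentences))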
In our experiment, texts are presented to the subject in various visualizations. Immediately after reading each text, the subject summarizes it and answers comprehension questions.
We then measure the time for each action and the number of correct answers. We also check how compatible the subjects' summaries are with the sentences that the BERT algorithm ranked as highly important. Finally, we analyze all the results obtained.
We have developed a system for planning and performing the experiment. The examiner can design tests for the experiment either by selecting specific texts and visualizations or by creating a test randomly. In addition, the examiner can view the test results and download them. The subject answers tests created by the examiner: he receives texts and questions, and after answering, he ranks which visualization was most comfortable and effective for him.
|
|
CAPI - Emotion Prediction
|
Students
Yuval Khoramian
|
Ron Zeidman
|
Hod Twito
|
Shoham Zarfati
|
|
Guides
|
Abstract
In this project, we explore the possibility of predicting a user's emotions using machine learning methods. In our exploration, we studied previous attempts at solving this problem through different lenses, using keyboard data, mouse data, and images of the user's face. To collect the data we need, we created two pieces of software: a library that can collect data on different channels, which by default collects mouse, keyboard, and camera data; and a graphical user interface that builds on the library, runs on the user's computer, and prompts the user for labels. We gave the program to a group of participants to collect their data. We then analyzed the data we collected and built models that could predict the user's emotional state with varying results. As we expected, the models that rely on facial data achieved the best results; models based on keyboard and mouse data achieved comparably worse results, but on par with what we saw in the literature. In addition to using multiple feature channels, we researched the psychology of identifying emotions and decided to use four types of labels: categorical (traditional) emotions such as happiness and sadness, and three continuous emotion dimensions – valence, arousal, and dominance – which describe a more nuanced image of the user's emotional state. We trained models for various combinations of these labels, as well as ensemble models that use multiple feature channels at once.
|
|
Time Series Pattern Discovery of Electrical Consumption Data
|
Students
|
Guides
|
Abstract
In this research, we build an algorithm that can detect recurrent patterns in time series data. Smart meters installed in 129 households in Israel record the electricity consumption of the entire house every minute.
NILM (non-intrusive load monitoring) has been a subject of study in recent years, with applications in energy efficiency and more. The main difficulty is that all appliances affect the aggregated electricity consumption, and we do not know which appliances are active at any given moment. For this reason, we gather information about the signatures of the main electricity consumers in each house and then build a model that can detect the load creators.
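As a simple illustration of detecting appliance activity in an aggregated signal, the sketch below flags large step changes between consecutive minute readings. The threshold and readings are hypothetical; the actual signature-based model in this research is more involved.

import numpy as np

# Hypothetical per-minute household consumption in watts.
power = np.array([210, 215, 212, 1710, 1705, 1712, 1708, 260, 255, 258])

steps = np.diff(power)                     # change between consecutive minutes
threshold = 500                            # watts; would be tuned per appliance signature
on_events = np.where(steps > threshold)[0] + 1
off_events = np.where(steps < -threshold)[0] + 1
print("on at minutes:", on_events, "off at minutes:", off_events)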
|
|
Zero-Shot Super Resolution for Microscopic Images
|
Students
Lior Baruchovich
|
Dor Elkabetz
|
Tzlil Polyak
|
|
Guides
|
Abstract
The quality of microscopic images involves tradeoffs between several factors that every biologist must consider (such as the health of the sample versus the image resolution). A researcher with an average-performing microscope cannot mitigate these tradeoffs with the traditional equipment at their disposal. To mitigate them, there have been attempts to enhance image quality using deep learning algorithms. Unfortunately, these algorithms require large sets of high-definition microscopic images, which can be very costly in terms of resources. To address this problem, we introduce the ZSSR (Zero-Shot Super Resolution) algorithm to the microscopy domain. This algorithm, originally intended for natural images, is trained using patches from the original image itself, making it cheaper for researchers to improve their images. In our research we found the best way to improve microscopic images using ZSSR, and we showed through our experiments that, under the same conditions, it is comparable to other state-of-the-art algorithms that were originally designed for microscopic images.
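The following is a minimal PyTorch sketch of the zero-shot idea: a small network is trained only on pairs created by downscaling the input image itself, and is then used to refine an upscaled version of that image. The architecture, scale factor, and training schedule are illustrative assumptions, not the configuration used in our experiments.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySRNet(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x):
        # Predict a residual on top of the interpolated input.
        return x + self.body(x)

def zero_shot_train(image, scale=2, steps=200, lr=1e-3):
    """image: tensor of shape (1, C, H, W) with values in [0, 1]."""
    net = TinySRNet(image.shape[1])
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        # Training pair from the image itself: downscale it ("LR son"),
        # then learn to recover the original ("HR father"). A fuller
        # implementation would also augment with random crops and flips.
        lr_son = F.interpolate(image, scale_factor=1 / scale, mode="bicubic",
                               align_corners=False)
        lr_up = F.interpolate(lr_son, size=image.shape[-2:], mode="bicubic",
                              align_corners=False)
        loss = F.mse_loss(net(lr_up), image)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Finally, upscale the original image and refine it with the trained net.
    with torch.no_grad():
        up = F.interpolate(image, scale_factor=scale, mode="bicubic",
                           align_corners=False)
        return net(up).clamp(0, 1)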
|
|
Beautiful Images
|
Students
Yarden Levy
|
Liron Oskar
|
Shir Ben dor
|
Tali Schvartz
|
|
Guides
|
Abstract
It is said that “one image is worth a thousand words”, so how many words is one beautiful image worth? Do people tend to consistently prefer certain images?
The aim of this project is to develop an end-to-end system that allows us to develop and administer experiments regarding the above-mentioned research questions. Specifically, the researchers are interested in studying people's personal aesthetic preferences and the degree to which those preferences are consistent.
The system we develop for our final project includes a database of hundreds of aesthetic images which were downloaded from publicly open websites of beautiful photos. It serves as a platform for two types of games:
1. Consistency of personal preferences. In the first phase, users will sign up to the system and evaluate at least 60 images from the database on a 1-10 scale of how much they like each image. In the second phase (the game), some of the images that were rated by the user will be sampled randomly and presented. The user will need to identify those images that he ranked as the most beautiful in the first phase. The users’ scores in the game will be based on their success in this activity.
2. Identifying others’ personal preferences. The aim of this advanced game is to test whether a user can guess the aesthetic preferences of other users. This game can inform us if images which are preferred by one person are also preferred by others (i.e., if aesthetic preferences are universal).
Running the experiment with real users is one of the project's requirements; we expect to recruit more than 100 people for this experiment. To analyze the results, we will use machine learning algorithms, for example clustering users by their aesthetic preferences.
|
|
Evaluating the Robustness of AI Fraud Detection Systems
|
Students
|
Guides
|
Abstract
In recent years, financial fraud detection systems (FDS) have become very efficient at detecting financial frauds. This includes machine learning based algorithms such as KNN, SVM and Linear Regression, Deep-Learning methods such as Deep Neural Networks (DNN) or anomaly detection Autoencoders and even traditional rule-based methods. These models aim at detecting anomalous financial activity and reporting frauds to an operator (e.g., a credit card company, a bank or insurance company or an official state authority).
As a result, financial fraudsters face a substantial risk of getting detected by such FDSs. This raises the need for an attacker to choose his victims wisely: the attacker (i.e., a fraudster) would like to target the most vulnerable victims in terms of risk and profit. To this end, an attacker may use Adversarial Machine Learning (AML) to analyse its victims' behavioural patterns and deduce a hidden ranking of them.
In this research we study the application of adversarial attack-based ranking techniques to the fraud detection domain. To this end, we identify the main challenges and design a novel approach for ranking financial entities based on their susceptibility to financial fraud attacks. Using a private E-commerce dataset, we evaluate the vulnerability of real-world users to financial frauds. We also present an ML-based approach for predicting such scores. This tool, when at the disposal of an attacker, acts as a golden compass - it allows an attacker to minimize his risk and to maximize his profit, by attacking highly "fraudable" accounts, avoiding the most immune-to-fraud, i.e., least "fraudable" ones. The findings we present in this research are also valuable to the defender: by learning the hidden adversarial ranking, a defender (i.e., a financial company) may put extra protection on highly “fraudable” users in the form of 2FA, human expert analysis or lower detection thresholds.
|
|
CanvasCache
|
Students
|
Guides
|
Abstract
Web browsers such as Firefox are software applications for accessing the World Wide Web. When a user requests a web page from a particular website, the web browser retrieves the necessary content from a web server, renders it (mostly HTML), and then displays the page on the user's device. To compute and render pages downloaded from the internet as quickly as possible, nearly all of today's computers use small, fast memory units called caches.
In this research, we use a side channel through the last-level cache of Intel processors to implement a pixel-stealing attack on the Firefox web browser. To construct this attack, we needed both to find an execution path that causes different cache activity depending on the color of pixels on the victim's screen, and to find a way to efficiently detect this activity among all cache sets while the computation occurs.
To evaluate the effectiveness of our attack, we apply it on three different computers with different generations of Intel processors, steal pixels at a variety of speeds, and show how pixel stealing can be used to fully reconstruct images of complex figures and text. Finally, we plan to perform responsible disclosure to the appropriate parties.
|
|
Micro-architectural hardware fingerprinting
|
Students
|
Guides
|
Abstract
In recent years, researchers have discovered multiple methods for identifying computers. Studies aim to use every aspect and component of the computer to create a more stable, robust, and accurate fingerprint. Examples of fingerprint sources include software and networking configurations. Among the different techniques presented to date, none directly uses CPU components to extract a fingerprint. This study demonstrates a way to use minor hardware manufacturing differences in the micro-architectural components of the CPU to produce a fingerprint, making it possible to tell apart devices with identical software and hardware components. The fingerprint relies on Physically Unclonable Function (PUF) concepts. Specifically, we rely on a micro-architectural CPU property known as port contention, which is triggered by creating a race condition between micro-operations that use the same port on the same physical core with Hyper-Threading enabled. To carry out our research, we will first design an algorithm that classifies a small number of devices, then scale it to a large dataset and improve the accuracy by using deep learning algorithms, and finally find a way to run the method remotely in JavaScript instead of native code.
|
|
Targeting organelle-organelle organization via microscopy-based high-content phenotypic screening and generative neural networks
|
Students
|
Guides
|
Abstract
Cells are the fundamental unit of structure and function of all organisms. Cell organelles are the molecular machines that define cell architecture. Disruption in cell organization determined by the cell’s organelles composition in space and improper organelle-organelle organization leads to impaired cell function in many diseases. Thus, discovery of drugs that revert the cell structure and organization to its “healthy” state is an initial step in some drug discovery pipelines. High-content image-based screening is emerging as a powerful technology to identify phenotypic differences in cell populations with several applications including drug screening. While current computational approaches pool image-based features from different modalities, each of a distinct organelle, I am developing new methodology to measure alterations in the spatial dependencies between different organelles and apply it to identify new treatments that interfere with specific spatial dependencies between organelles. The methodology is based on measuring the reconstruction error of generative neural networks that map the different modalities to one another. Preliminary results indicate that this approach is more sensitive, specific, and complementary in relation to the state of the art. Overall, the methodology will enable discovery and mechanistic interpretability of the effects each treatment has on specific aspects of cell organization in terms of “breaking” existing relations between multiple cell structures, which are currently inaccessible.
|
|
Robustness of DRL Models Against Adversarial Attacks
|
Students
|
Guides
|
Abstract
Deep learning and deep reinforcement learning (DRL) algorithms have become the state-of-the-art solution in multiple domains. However, these algorithms are also highly vulnerable to noise and adversarial attacks. Improving the robustness of DRL agents against such attacks is important for safety-critical domains, where even a small change in the agent's decision can cause great damage. In this research, we focus on increasing the robustness of DRL algorithms through the use of prediction intervals (PIs). Our work centers on the adaptation of PIVEN – a recently proposed PI generation approach for regression problems – to settings where the data contains varying degrees of noise. We embed PIVEN in various DRL-based architectures (e.g., DQN, dueling-DQN) and explore their performance under noise in the training and test phases. Additionally, we design novel architectures and training setups that enable the DRL algorithms to effectively utilize PIVEN's prediction intervals. Our evaluation, conducted on the well-known CartPole problem, shows that our approach maintains equal performance in noiseless settings while exhibiting lower degradation in performance in high-noise settings.
|
|
Using Genetic Programming to Evolve Behavioral Programming Source Code
|
Students
|
Guides
|
Abstract
Ever since Genetic Programming (GP) was first introduced by John R. Koza in 1992, much research has been done on using GP to generate code automatically. However, despite many implementations and improvements, such as using ASTs or evolving bytecode, success has been limited to the generation of small programs of about 200 lines of code.
There are a few reasons why this field has stagnated, such as the enormous search space of potential programs, the complex syntax of modern languages, and the difficulty of evaluating and testing the generated code.
To tackle these problems, we propose to use GP in conjunction with Behavioral Programming - a programming paradigm with unique characteristics that allows for the overcoming of the aforementioned obstacles. Behavioral Programming (BP) models programs as a set of behavioral threads (b-threads), each aligned to a single behavior or requirement of the system. To evolve behavioral programs we will develop effective genetic operators, representation, and evaluation methods for BP.
We will evaluate our method on three domains.
The simplicity of this paradigm, along with simple syntax, the option to use validation to test programs, and the program consisting of small independent chunks, allow us to effectively generate behavioral programs using GP.
|
|
Learning Centrality Measures on Graphs
|
Students
|
Guides
|
Abstract
In graph theory and network analysis, centrality measures identify the most important nodes within a graph. Applications of centrality measures include controlling and monitoring traffic in complex networks, finding influential people in social networks, and detecting bots on the internet.
One of the main problems with centrality measures is that they fit specific graph structures and for many graphs, a specific centrality measure cannot depict the real centrality value of the nodes.
We suggest a model, based on Routing Betweenness Centrality (RBC), that can learn a centrality measure. Our model learns a routing function (R) and a transition matrix (T) from the graph's node embeddings, and then uses the learned R and T as inputs to the RBC algorithm to compute each node's centrality.
To evaluate the model's performance, we use correlation measures, including the Kendall, Spearman, and Pearson coefficients, to compute the correlation between the model's approximated node centrality vector and the actual node centrality vector.
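A small sketch of the evaluation step described above, assuming hypothetical centrality vectors: the approximated vector is compared to the exact one using the Kendall, Spearman, and Pearson coefficients.

import numpy as np
from scipy.stats import kendalltau, spearmanr, pearsonr

true_centrality = np.array([0.10, 0.35, 0.05, 0.50, 0.25])    # e.g., exact RBC values
approx_centrality = np.array([0.12, 0.30, 0.07, 0.45, 0.28])  # model's approximation

tau, _ = kendalltau(true_centrality, approx_centrality)
rho, _ = spearmanr(true_centrality, approx_centrality)
r, _ = pearsonr(true_centrality, approx_centrality)
print(f"Kendall tau={tau:.3f}, Spearman rho={rho:.3f}, Pearson r={r:.3f}")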
|
|
Risk-Oriented Resource Allocation in Swarm Robotics
|
Students
|
Guides
Asaf Shabtai
|
Yuval Elovici
|
|
Abstract
The use of swarm robotics in various military and civil tasks is gaining popularity. During a mission, swarm members require access to different resources (both data and capabilities) to effectively perform their tasks. These resources may have different levels of sensitivity, and some of them may be highly classified and need to be protected. Since the risk level of each swarm member may change during the mission, the decision on how to deploy the resources among the swarm members is crucial. In this research, we present a novel framework for distributing resources among the swarm members such that three main goals are achieved: (1) each member can access the resources it needs to perform its tasks (either locally or remotely), (2) the overall risk to the resources during the mission is minimized, and (3) the resources can be redeployed during the mission in response to changes in the risk level of swarm members. We evaluated the initial resource allocation provided by the proposed framework in various use cases and showed that it outperforms a baseline resource allocation approach in terms of the mission's risk. We also evaluated dynamic, efficient heuristics and showed that they help maintain a low mission risk after the reallocation of resources following changes in the risk level of swarm members.
|
|
Predicting Protest by social media
|
Students
|
Guides
|
Abstract
A protest is a public expression of objection, disapproval, or dissent towards an idea or action. Protests can disrupt the regular movement of vehicles on the road and may involve violence and environmental damage, so early forecasting of protest events allows the authorities to take early steps to maintain order and the security of residents. Studies have shown that social networks play a significant role in transmitting information, recruiting people, and organizing protests, and that there is also a connection between exceptional and controversial events and subsequent protest events.
Following this theory of protest, various learning-based approaches have been presented over the past decade to predict demonstrations. These approaches include analyzing the structure of information propagation on social networks, analyzing the text of posts with well-known NLP tools such as emotion analysis, and using the GDELT database of significant events. In our research, we focus on three groups of protest events in the US (Ferguson I (2014), Ferguson II (2014-2015), and Charlottesville (2017)), which we aim to predict several days in advance with machine learning tools; for this purpose we use an extensive collection of tweets from the period in which the events occurred. The method we offer is an innovative multi-level method. First, we estimate the number of potential demonstrators each day by analyzing the content posted by each user. Second, we developed a new tf-idf-based method to weigh the tweets, and we use it to weigh the emotions and semantics that we extract from the tweets. Finally, we forecast the demonstration event according to the features described above.
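As an illustration of tf-idf-based tweet weighting, the sketch below assigns each (invented) tweet a weight from its tf-idf terms; such weights could then scale the emotion and semantic features extracted from the tweets. This is a simplified stand-in, not the exact weighting scheme developed in the research.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = [
    "everyone meet at the square tonight to protest",
    "just had a great lunch downtown",
    "protest march planned for saturday bring signs",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(tweets)
tweet_weights = np.asarray(tfidf.sum(axis=1)).ravel()   # one weight per tweet
print(tweet_weights)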
|
|
Interpretable Context-Aware Recommender Systems Utilizing Evolutionary Algorithms
|
Students
|
Guides
|
Abstract
A context-aware recommender system (CARS) utilizes users’ context to provide personalized services. Contextual information can be derived from sensors in order to improve the accuracy of the recommendations.
In our work, we focus on CARSs with high-dimensional contextual information that typically impacts the recommendation model, for example, by increasing the model’s dimensionality and sparsity. Generating accurate recommendations is not enough to constitute a useful system from the user’s perspective, since the use of some contextual information may cause problems, such as draining the user’s battery, raising privacy concerns, and more. Previous studies suggested reducing the amount of contextual information utilized by using domain knowledge to select the most suitable information. This approach is only applicable when the set of contexts is small enough to handle and sufficient for preventing sparsity. Moreover, hand-crafted context information may not represent an optimal set of features for the recommendation process. Another approach is to compress the contextual information into a denser latent space, but this may limit the ability to explain the recommended items to the users or compromise their trust. In this work, we present a multi-step approach for selecting low-dimensional subsets of contextual information and incorporating them explicitly within CARSs. At the core of our approach is a novel feature selection algorithm based on genetic algorithms, which outperforms state-of-the-art dimensionality reduction CARS algorithms by improving recommendation accuracy and interpretability. Over the course of evolution, thousands of diverse feature subsets are generated; a deep context-aware model is produced for each feature subset, and the subsets are stacked together. The resulting stacked model is accurate and only uses interpretable, explicit features. Our approach includes a mechanism of tuning the different underlying algorithms that affect user concerns, such as privacy and battery consumption. We evaluated our approach on two high-dimensional context-aware datasets derived from smartphones. An empirical analysis of our results confirms that our proposed approach outperforms state-of-the-art CARS models while improving transparency and interpretability for the user. In addition to the empirical results, we present several use cases, examples and methodology of how researchers, domain experts and CARS modelers can tweak the feature selection algorithm to improve various user concerns and interpretability.
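The following is an illustrative sketch of genetic-algorithm feature selection of the kind at the core of our approach: individuals are binary masks over the contextual features, and fitness is the cross-validated accuracy of a simple classifier trained on the selected subset. The operators, classifier, and parameters are assumptions for illustration, not the algorithm's actual implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    if mask.sum() == 0:
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

def ga_feature_selection(X, y, pop_size=20, generations=10, mutation_rate=0.05):
    n_features = X.shape[1]
    population = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in population])
        # Keep the better half as parents (simple truncation selection).
        parents = population[np.argsort(scores)[-pop_size // 2:]]
        children = []
        while len(children) < pop_size:
            p1, p2 = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_features)              # one-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            flip = rng.random(n_features) < mutation_rate  # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        population = np.array(children)
    scores = np.array([fitness(ind, X, y) for ind in population])
    return population[scores.argmax()]

# Example with synthetic data.
X = rng.normal(size=(300, 12))
y = (X[:, 2] - X[:, 5] > 0).astype(int)
best_mask = ga_feature_selection(X, y)
print("selected features:", np.flatnonzero(best_mask))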
|
|
Behavior analysis of healthcare professionals in social media during the Coronavirus pandemic using machine learning
|
Students
|
Guides
|
Abstract
The COVID-19 pandemic has affected populations worldwide, with extreme health, economic, social, and political implications. Healthcare professionals (HCPs) are at the core of pandemic response and are one of the most crucial factors in maintaining coping capacities. Yet, they are also vulnerable to mental health effects, managing a long-lasting emergency under lack of resources and complicated personal concerns.
Our objective is to analyse the state of mind of HCPs as expressed in online discussions published on Twitter in light of COVID-19, from the pandemic onset until the end of 2020.
The population for this study was selected from the followers of a few hundred Twitter accounts of healthcare organizations and common HCP points of interest. We use active learning, a process that iteratively combines machine learning and manual data labeling, to select a large-scale population of Twitter accounts maintained by English-speaking HCPs, focusing on individuals rather than official organizations. We obtain topic distributions via the Latent Dirichlet Allocation (LDA) algorithm and analyse the topics and emotions in the discourse of HCPs during 2020. We define a measure of topic cohesion and describe the most cohesive topics. The emotions expressed in tweets during 2020 are compared to those of 2019. Finally, the emotion intensities are cross-correlated with the pandemic waves to identify causal relationships.
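A minimal sketch of the LDA topic-extraction step mentioned above, using placeholder tweets; the real analysis runs on the large-scale HCP corpus with tuned topic counts.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "long shifts in the covid ward again tonight",
    "we are short on protective equipment at the clinic",
    "grateful for my colleagues during this pandemic",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # per-tweet topic distribution

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {top_terms}")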
|
|
Time Series Anomaly Detection using Capsule Network
|
Students
|
Guides
|
Abstract
Anomaly detection is the identification of data points that deviate from a dataset’s normal behavior. In recent years, many papers proposed convolutional neural networks (CNNs) based architectures to achieve this goal. The main problem with CNNs is that they do not take into account spatial hierarchies between objects, thus failing to identify the existence of elements in an incorrect order. To handle this shortcoming, we apply Capsule Networks, a novel neural architecture that represents an object as a hierarchy of its sub-components, as well as the way in which they are integrated. In our study, we will explore the adaptation and expansion of Capsule networks to the domain of anomaly detection in time-series data, and test different combinations of elements from the architecture. We performed our experiments on data extracted from SCADA systems, a multi-sensor environment. Our current models reach near state-of-the-art results, without data preprocessing and model optimization. In addition, we report our findings regarding the relevance of various components in the Capsule Networks architecture and propose adaptations that make our approach more effective in the field of anomaly detection.
|
|
Classification of Tabular Data using CNN
|
Students
|
Guides
|
Abstract
Convolutional neural networks (CNNs) have been widely used in image classification tasks and have achieved great results compared with traditional methods. Their main advantage is the ability to extract hidden features automatically using local connectivity and spatial locality. However, conventional machine learning methods such as gradient boosting trees, SVM, and random forest still dominate when using tabular data. A possible reason is the unsuitability of the tabular data structure to the CNN input.
We propose a new generic method for representing tabular data as images that can be used for data visualization and for classification using a CNN. Our approach is based on user-oriented data visualization ideas, especially pixel-oriented techniques, which reduce visual clutter and allow a large amount of data to be visualized. Our technique transforms each instance of the tabular data into a 2D representation in which strongly correlated features are adjacent to each other. To reorder the features, we split them into clusters according to their correlation. Then, for each cluster, we use dimensionality reduction techniques to find the relative positions of the features inside the cluster. Finally, we order the clusters according to the mutual information between them. The result is 2D images that can be used for user interpretation and CNN classification.
We applied our method to an RNA dataset and compared the performance with previous work and non-CNN state-of-the-art classifiers. The initial results show that our approach is as accurate as other algorithms that use tabular representation. Our next steps are to conduct a user study to test the quality of our visualization and apply our approach to more datasets.
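The sketch below illustrates the core idea under simplifying assumptions: features are reordered so that correlated ones are adjacent (here via hierarchical clustering on the correlation matrix, standing in for the full cluster-plus-mutual-information procedure), and each row of the table is then reshaped into a small 2D image.

import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def table_to_images(X, side):
    """X: (n_samples, n_features) array; side*side must be >= n_features."""
    # Place correlated features next to each other via hierarchical clustering
    # on a distance derived from the feature-feature correlation matrix.
    corr = np.corrcoef(X, rowvar=False)
    dist = 1 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    order = leaves_list(linkage(squareform(dist, checks=False), method="average"))
    X_ordered = X[:, order]
    # Pad to a full square and reshape each instance into a 2D grid.
    padded = np.zeros((X.shape[0], side * side))
    padded[:, : X.shape[1]] = X_ordered
    return padded.reshape(-1, side, side)

# Example: 100 random instances with 7 features mapped to 3x3 images.
images = table_to_images(np.random.rand(100, 7), side=3)
print(images.shape)  # (100, 3, 3)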
|
|
Searching for Class Models
|
Students
|
Guides
Arnon Sturm
|
Ron zvi Stern
|
|
Abstract
Models in model-based development play a major role and serve as the main design artifacts, in particular class models. As there are difficulties in developing high-quality models, different repositories of models have been established to address that challenge, so that developers have a reference model. Following the existence of such repositories, there is a need for tools that can retrieve similar high-quality models. To search for models in these repositories, we propose a greedy algorithm that matches the developer's intention by considering semantic similarity, structure similarity, and type similarity. The initial evaluation indicates that the algorithm achieves high performance in finding relevant class model fragments. Though additional examination is required, the proposed algorithm can be easily adapted to other modeling languages for searching models and their encapsulated knowledge.
|
|
Cyber Threat Reports TTPs Extraction
|
Students
|
Guides
Bracha Shapira
|
Rami Puzis
|
|
Abstract
Cyber Threat Intelligence (CTI) refers to information about threats and threat actors that helps mitigate harmful events in cyberspace. Cyber Threat Reports (CTR) are human-readable CTI in the form of textual and unstructured reports. Each report focuses on a malware and discusses its characteristics and modus operandi.
The goal of this thesis is to develop a system that can extract malwares' TTPs (Tactics, Techniques, and Procedures) from CTRs and provide an explanation of the classification. Tactics describe what the adversary wants to achieve, and techniques describe how the adversary achieves it. This classification has been shown to be particularly valuable for characterizing threat actors' behaviors and improving defensive countermeasures.
Previous work has been done in this field, but we believe we can improve on its performance and achieve better results than our competitors. Our hypotheses include:
1. Using advanced NLP methods such as transformers can yield better results than those of current works, as can using deep learning models, which has not been done before.
2. Utilizing the relationships between the tactics and techniques can help achieve better results.
As part of the research process, we are experimenting with many different embedding models and machine learning techniques. These include LSTM (long short-term memory) networks with BERT sentence embeddings, SMOBI (smoothed binary embedding), LogEntropy, and Word2Vec models.
Our evaluation metrics are precision, recall, and the F0.5 score. Thus far, our preliminary results show that our new techniques work well, and even outperform some of the main competing approaches.
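A small sketch (not the thesis' evaluation code) of computing the F0.5 score reported above, which weights precision more heavily than recall; the labels are hypothetical.

from sklearn.metrics import precision_score, recall_score, fbeta_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical TTP labels per sentence
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
f05 = fbeta_score(y_true, y_pred, beta=0.5)
# F_beta = (1 + beta^2) * p * r / (beta^2 * p + r); with beta = 0.5 precision dominates.
print(f"precision={p:.3f}, recall={r:.3f}, F0.5={f05:.3f}")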
|
|
Adapting Machine Learning Models to Dynamic Environments Using Shadow Models
|
Students
|
Guides
|
Abstract
The need to respond to changes in the analyzed data (i.e., changes in the class distribution or the addition of new classes) is a challenging problem for any supervised learning model. Identifying these changing circumstances in a timely manner and selecting the right learning model to address them remains an open problem. This problem is exacerbated when previously unseen classes of items are introduced during the system’s lifetime.
Our proposed approach to this challenge consists of three parts:
a) We train multiple learning models “in the background” and then use a meta-learner to select the most suitable model for the next time step. Simply put, our meta-model predicts which of our currently trained models will be most effective in the next time step.
b) To address the limited amounts of information in any given dataset, we train our meta-model across multiple datasets. In addition to improving our model’s performance, the said model can be applied to any new dataset without additional training.
c) To address the challenge of new class detection, we will use deep reinforcement learning together with a novel embedding-based approach we developed.
Our current evaluation is focused on the detection of changes in the environment, without the inclusion of new classes. Extensive evaluation on 50 datasets shows that our proposed model outperforms multiple popular and highly effective models (e.g., Adaboost, Random Forest, Neural nets).
|
|
How Polynomial Regression Improves DeNATing
|
Students
|
Guides
|
Abstract
The ubiquity of Network Address Translation (NAT) and mobile hotspots, which aggregate the source IP addresses of connected devices into a single IP address, makes it difficult for an observer on the Internet to learn anything about the internal network. The IP Identification header field of Domain Name System queries and the TCP Timestamp header field of TCP SYN packets are the main features used for counting the devices in the internal network and associating packets with these devices, a task also known as DeNATing. In this paper, we introduce a new method that relies on polynomial least-squares curve fitting for DeNATing. We evaluate our model on multiple real-world datasets containing Windows and Unix devices behind a router using NAT and behind a mobile hotspot. The proposed method outperforms the state of the art on all of the datasets used and on all types of devices. Successful DeNATing may help in cybersecurity, anti-fraud, and other use cases.
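The sketch below illustrates the underlying idea with hypothetical values: a least-squares polynomial is fitted to the observed TCP Timestamp values of one device over time, and the residual of a new packet against the fitted curve indicates whether it is consistent with that device.

import numpy as np

# Hypothetical (arrival_time, timestamp_value) observations attributed to one device.
times = np.array([0.0, 0.5, 1.1, 1.6, 2.2, 2.9])
tsvals = np.array([1000, 1050, 1111, 1159, 1221, 1292], dtype=float)

coeffs = np.polyfit(times, tsvals, deg=2)     # least-squares polynomial fit
model = np.poly1d(coeffs)

new_time, new_tsval = 3.4, 1340.0
residual = abs(model(new_time) - new_tsval)
# A small residual suggests the packet is consistent with this device's timestamp curve.
print(f"predicted={model(new_time):.1f}, residual={residual:.1f}")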
|
|
Network Anomaly Detection via Temporal Data Analytics
|
Students
|
Guides
|
Abstract
The continued worldwide growth in computer networks and network applications has caused an increase in the number of cyberattacks, coupled with the appearance of novel attack vectors that require new methods for their detection. While meaningful work has been done on detecting network attacks (intrusions) via the detection of anomalous behavior, our work focuses on the intrinsic temporal dimension of network data, which has yet to be fully utilized. We suggest the use of frequent Time Interval-Related Patterns (TIRPs) to represent the network's dynamics and behavior. Since attacks can be seen as anomalous periods of network traffic, TIRPs that are frequent in mostly normal periods may be missing during an attack or may appear with different property values, such as their repetitiveness, duration, and more. To evaluate the framework, we used CSE-CIC-IDS2018, a relatively recent dataset created for the evaluation of anomaly-based intrusion detection systems that includes different attack scenarios, such as brute force, DDoS, and more. Experiments showed high accuracy in detecting anomalous time periods. This can benefit supervised anomaly detection of flows, since only time windows identified as anomalous need to be inspected for malicious flows, making detection both more efficient and more accurate.
|
|
Inverse Reinforcement Learning for Cost-Effective Solutions
|
Students
|
Guides
|
Abstract
Deep learning and deep reinforcement learning-based methods currently achieve state-of-the-art results in multiple domains. One of the major challenges of using these approaches is to effectively train them for complex decision making in areas of high uncertainty. One approach for overcoming this difficulty is Inverse Reinforcement Learning (IRL), where the algorithm learns from the actions of human experts. While highly effective at times, this approach is problematic in cases where personal style and preferences can result in widely changing behavior. In this research we focus on the identification and representation of various “personalities” and the ways in which they could be strategically activated in alternating ways to improve the algorithm’s performance. By grouping human experts according to their preferences, we will create several DRL agents, each modeling a different personality type. Then, based on interactions with other agents, we will alternate between these personalities to maximize our agent’s performance. For our evaluation we chose the game of poker, which contains both high degrees of uncertainty and human players with widely-varying styles.
|
|
Episodic application of clinical GL-based DSS
|
Students
|
Guides
|
Abstract
A clinical guideline (GL) is a document that consists of a set of procedural instructions, founded on evidence-based research, for diagnosis, treatment and management of patients in a specific healthcare area, such as Type 2 Diabetes. Some specific, usually more rigid, types of GLs are referred to as protocols, as in the Oncology domain, in which chemotherapy protocols are common.
Clinical GLs support medical personnel such as physicians, nurses, and others, to make better and more consistent decisions regarding the medical care they provide to patients, and reduce the variability of the quality of care.
Over the past several decades, multiple efforts have been made to develop decision-support systems (DSSs) for automated GL application, using GLs formalized as computer-interpretable guidelines (CIGs). However, none of the previous efforts concentrated on an episodic application of clinical GLs. Such an episodic application should fit the physician’s workflow and apply the DSS in a realistic fashion.
In this research study, we aim to develop a DSS that supports an episodic application of GLs. Using plan recognition techniques and a rich, Asbru-language-based representation of CIGs, we shall infer the physician's objectives, determine which of their actions were performed according to the GL and which were not, and thus identify the stage of GL application they are in. We shall then be able to critique the physicians' actions and, using the Picard GL-application engine, provide recommendations for their next one or more actions.
|
|
Two Ways of Treatment for Prostate Cancer
|
Students
Shachar Ron
|
Alon Gutman
|
Matan Anavi
|
|
Guides
|
Abstract
There are two main treatment methods for high-risk prostate cancer patients.
The most prevalent method is to irradiate only the prostate itself, while some centers irradiate the lymph nodes in addition to the prostate.
The medical world has yet to produce significantly superior findings about which method yields better results, and this question remains unanswered.
In this project, we gathered data about high-risk patients from three hospitals – Soroka, Tel Hashomer, and Beilinson. The first two use the first method, and the last uses the second method.
Based on the data, which consists of each patient's medical parameters and survival, we developed a GUI that enables a doctor to enter a patient's details and obtain the model's preferred method of treatment for that patient.
|
|
Tele-Vol
|
Students
Erez Shalom
|
Noam Tractinsky
|
Ziv Gura
|
Nadav Chapnick
|
|
Guides
|
Abstract
The project is part of a collaboration with the Milbat Association at Sheba-Tel Hashomer Hospital, which works to adapt technological solutions for the elderly and disabled persons. As part of the collaboration with Milbat and due to the Coronavirus epidemic, Milbat has identified an urgent need for a volunteer management system that will enable remote communication between its volunteers and the elderly who need help at home. Before the Coronavirus epidemic, the only way for the volunteers of the various organizations to communicate with the elderly was face-to-face meetings.
During the Coronavirus epidemic, volunteers could no longer reach the isolated elderly because both sides were at risk and could have been infected with the virus. Therefore, the elderly were left alone, with no people to talk to and to convey relevant guidance.
The solution proposed by our project is to develop an accessible web application that can be easily adopted by the elderly. The application allows the elderly and the volunteers to interact remotely via video call.
The uniqueness of our system is the accessible design for the elderly, and the smart matching between the volunteers and the elderly. The matching is executed by an algorithm we wrote that is based, among other things, on common interests and language.
As part of the volunteers' administration system, the application allows the assignment of appointments between volunteers and the elderly, filling in feedback on each meeting, and monitoring the meetings.
|
|
Time Interval-Related Patterns Clustering
|
Students
|
Guides
|
Abstract
The purpose of my research is to find the relations between entities' (in our case, patients') behavior and their characteristics and diagnoses, in order to better understand the population at risk and the response to medications and treatments, and to improve the treatment of those patients. Behavior is a sequence of events with temporal relations between them, so we can represent behavior as Time Interval-Related Patterns (TIRPs). The method is composed of two main stages: temporal abstraction and TIRP mining.
In the abstraction phase, we transform the time series data into time intervals, using a discretization method to create cutoffs. To increase the dependency between the TIRPs and the characteristic variable, we developed a discretization method that finds the cutoffs most correlated with given groups, i.e., a split of the entities by an atemporal variable (e.g., age).
The second phase is TIRP mining: after transforming the data into a time-interval representation, we use the KarmaLego algorithm, an algorithm for pattern mining in temporal data. The patterns are composed of intervals and the temporal relations between each pair of intervals.
The frequent TIRPs found by KarmaLego serve as the clusters, and every entity in whose data a TIRP appears at least once belongs to that TIRP's cluster. In this way, we can analyze the relations between entities' behavior, characteristics, and diagnoses. We use this to better investigate the influence of treatments on patients in relation to their characteristics and to personalize the treatments for these patients.
|
|
CyberSeq: Real-Time Cyber-Attacks on Next-Generation Sequencing Devices and Practical Defenses
|
Students
|
Guides
|
Abstract
In this research, we address the problems of cybersecurity and information security in medical devices, specifically in a Next-Generation Genome Sequencing (NGS) device. Our case study is the Oxford Nanopore Technologies MinION. We discuss the possible outcomes and dangers of cyber attacks on such devices, including equipment malfunction and the alteration of sequencing results. We then demonstrate a practical, relatively easy-to-perform cyber attack, showing the real danger in such a scenario. Finally, we discuss possible solutions to these security problems and implement and evaluate at least one solution, as is customary in the field of cybersecurity.
|
|