Dr. Rakesh Kanji Assistant Professor (SG) (91)01792-239249 rakesh.kanji@juit.ac.in
Dr. Rakesh Kanji is currently an Assistant Professor in the Department of CSE, Jaypee University of Information Technology, Waknaghat-173234. Before that he spent almost two years in academics, at UEM, Kolkata (2018-2019) and Narula Institute of Technology (JIS Group), Kolkata (2019-2020). He completed his Ph.D. at IIT Jodhpur in 2018, working on drug side-effect prediction. He has published papers in reputed international journals and conferences. His research interests include personalized medicine and natural language processing. He is UGC-NET certified for the post of Assistant Professor.
Open Project Titles:
1. Scalable recommendation system
Recommender systems, especially those employing collaborative filtering techniques, require large amounts of training data, which causes scalability problems. The scalability problem arises when the amount of data used as input to a recommender system grows quickly. In this era of big data, more and more items and users are rapidly being added to such systems, and the problem is becoming common. Two common approaches to the scalability problem are dimensionality reduction and clustering-based techniques, which search for similar users within small clusters instead of the complete database. Autoencoders can be trained on the data to learn hidden parameters that capture the information in far fewer dimensions, and these parameters can then be used to generate the latent representation of new or incoming query data.
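The clustering-based idea can be illustrated with a minimal sketch in plain Python. It assumes the cluster centroids are already known and uses a toy rating matrix; the names `assign_clusters` and `nearest_in_cluster`, the centroids, and the data are all hypothetical and are not taken from the project itself. It shows only how the neighbour search is restricted to one cluster rather than the complete database.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def assign_clusters(ratings, centroids):
    """Place every user in the cluster whose centroid is most similar."""
    clusters = {i: [] for i in range(len(centroids))}
    for user, vec in ratings.items():
        best = max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))
        clusters[best].append(user)
    return clusters

def nearest_in_cluster(user, ratings, clusters, k=1):
    """Return the k most similar users, searching only the user's own cluster."""
    for members in clusters.values():
        if user in members:
            candidates = [m for m in members if m != user]
            break
    else:
        return []  # unknown user
    candidates.sort(key=lambda m: cosine(ratings[user], ratings[m]), reverse=True)
    return candidates[:k]

# Toy data (assumption): rows are users, columns are ratings for four items.
ratings = {
    "u1": [5, 4, 0, 0],
    "u2": [4, 5, 0, 1],
    "u3": [0, 0, 5, 4],
    "u4": [1, 0, 4, 5],
}
# Assumed, pre-computed centroids; in practice they would come from a
# clustering algorithm such as k-means.
centroids = [[1, 1, 0, 0], [0, 0, 1, 1]]

clusters = assign_clusters(ratings, centroids)
neighbours = nearest_in_cluster("u1", ratings, clusters)
```

With this layout each query compares a user against the members of one cluster only, so the cost of a lookup depends on the cluster size rather than on the total number of users.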
2. Semi-parameter-free recommendation system
Model-based algorithms typically make recommendations by first building a model of user ratings. Regression, matrix factorization and Bayesian methods are the popular ones in this category. These methods either explore a "latent space" or build a model to capture the relationship. Any such model has to make assumptions about the data, such as its distribution (normal, exponential) or whether its relationship with other data is linear or non-linear. These assumptions bias the recommendations, so there is a strong need for parameter-free approaches that are simple, fast and easily adapted to large data. One way is to compute biclusters, which consist of the most interconnected users and items. We start by finding candidate users who are very similar to each other, and then identify the shared items that increase their similarity.
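The two steps described above (find very similar users, then identify their shared items) can be sketched roughly as follows. This is only an illustration under stated assumptions: Jaccard similarity on item sets is used as the similarity measure, the bicluster is grown with a simple "user has all shared items" rule, and the function name `seed_bicluster` and the data are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def seed_bicluster(user_items):
    """Build one bicluster from the most similar pair of users.

    Step 1: pick the pair of users with the highest Jaccard similarity.
    Step 2: keep the items they share.
    Step 3: add every other user who has interacted with all shared items.
    """
    if len(user_items) < 2:
        raise ValueError("need at least two users")
    names = sorted(user_items)  # sorted for deterministic tie-breaking
    u, v = max(combinations(names, 2),
               key=lambda pair: jaccard(user_items[pair[0]], user_items[pair[1]]))
    users = {u, v}
    items = user_items[u] & user_items[v]
    for w in names:
        if w not in users and items and items <= user_items[w]:
            users.add(w)
    return users, items

# Toy interaction data (assumption): user -> set of items they rated.
user_items = {
    "alice": {"i1", "i2", "i3"},
    "bob": {"i1", "i2", "i3", "i4"},
    "carol": {"i1", "i2", "i5"},
    "dave": {"i6", "i7"},
    "erin": {"i1", "i2", "i3", "i8"},
}

users, items = seed_bicluster(user_items)
```

No distributional assumption or learned parameter is involved here; the only design choice is the rule for admitting further users, which a fuller method would likely relax.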
3. Text Classification
With advances in technology and the spread of the internet, users generate a huge volume of data on the web. The data may come from online businesses, social networking sites, blogs, social media, etc. People have changed the way they exchange ideas. A large volume of data is generated when people express their opinions about products through comments, reviews, tweets, etc. All the data generated by this kind of information exchange is unstructured, so it must be converted into a structured form before any information or patterns can be extracted from it. This has led to the emergence of text preprocessing, analysis, categorization and pattern extraction, which separate positive tweets or reviews, types of documents, keywords, etc. Such identification is useful for speeding up document search or for stopping contentious tweets. We are currently applying different machine learning models to numeric representations of sentences, finding keywords that capture global sentence similarity, and employing graph neural networks on them.
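As a small illustration of classifying text through a numeric representation, the sketch below turns each sentence into a bag-of-words count vector and assigns the label whose summed training vector is most similar. This is a deliberately simple stand-in, not the graph neural network approach mentioned above; the function names and the example sentences are assumptions made for illustration.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words representation: word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(examples):
    """Sum the word counts of all training sentences for each label."""
    centroids = {}
    for text, label in examples:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids

def predict(text, centroids):
    """Return the label whose summed vector is most similar to the text."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Toy labelled reviews (assumption).
examples = [
    ("great product and fast delivery", "positive"),
    ("love this phone great battery", "positive"),
    ("terrible quality and slow delivery", "negative"),
    ("hate this phone bad battery", "negative"),
]

centroids = train(examples)
label_a = predict("great battery and great phone", centroids)
label_b = predict("bad quality terrible", centroids)
```

Words shared by both classes (such as "phone" or "battery") contribute to both similarities, so the decision rests on the distinguishing keywords, which is the intuition behind looking for keywords that capture sentence similarity.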
4. Spam detection
Online reviews are often the primary factor in a customer's decision to purchase a product or service, and are a valuable source of information for determining public opinion on these products or services. Because of their impact, manufacturers and retailers are highly concerned with customer feedback and reviews. Reliance on online reviews raises the concern that wrongdoers may create false reviews to artificially promote or devalue products and services. This practice is known as opinion (review) spam, where spammers manipulate and poison reviews (i.e., write fake, untruthful, or deceptive reviews) for profit or gain. Since not all online reviews are truthful and trustworthy, it is important to develop techniques for detecting review spam. A challenge in this field is that the training datasets are imbalanced, so synthetic data needs to be added during the machine learning training phase. We are interested in using the Synthetic Minority Oversampling Technique (SMOTE) or generative adversarial networks (GANs) to generate such data.
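The core idea of SMOTE can be sketched in a few lines: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The version below is a simplified illustration in plain Python, not a reference implementation; the function name `smote`, its parameters, and the feature vectors are assumptions.

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by interpolating between neighbours.

    minority    -- list of feature tuples from the minority class
    n_synthetic -- number of synthetic samples to create
    k           -- number of nearest neighbours considered for each sample
    seed        -- seed for the random generator, for reproducibility
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Toy minority-class feature vectors (assumption), e.g. features of spam reviews.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_samples = smote(minority, n_synthetic=5)
```

Because every synthetic point is a convex combination of two real minority samples, the new data stays inside the region already occupied by the minority class.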