A Method For Solving The Cold Start Problem In Recommendation Systems

Ossama H. Embarak
ohke1@hw.ac.uk
School of Mathematics & Computer Sciences, Department of Computer Science, Heriot Watt University, UK

Abstract — Recommendation systems have become essential in web applications that provide mass services; they aim to automatically suggest items (services) of interest to users. The most widely used technique in such systems is collaborative filtering (CF), which suffers from several problems such as the cold-start problem, the privacy problem, the user identification problem, and the scalability problem. In this paper, we address the cold-start problem by giving recommendations to new users who have no stored preferences, and by recommending items that no user of the community has seen yet. Although many studies have addressed the cold-start problem, each solves only item cold start or user cold start, and the proposed solutions still suffer from the privacy problem. We therefore developed a privacy-protected model that solves the cold-start problem in both cases (user and item cold start). We suggest two types of recommendation (node recommendation and batch recommendation), and we compare the suggested method with three alternative methods (the Triadic Aspect method, the Naïve Filterbots method, and the MediaScout Stereotype method) using a dataset collected from an online news web site. We measured novelty, coverage, and precision, and found that our method achieves a higher level of novelty in batch recommendation, and higher levels of coverage and precision in node recommendation, compared with the three alternative methods.

Keywords — The Cold Start Problem, Recommendation Systems, Personalization Systems, Adaptive Web Systems.

1. PREVIOUS SOLUTIONS

Several solutions have been proposed for the cold-start problem. One of them, the Naïve Filterbots model, depends on the Naïve Filterbots algorithm to inject pseudo users, or bots, into the system [1]. These bots rate items according to the attributes of items or users, for example according to the average rating of users with some demographic similarity. Once the filterbots are defined and injected into the user-item matrix, the system treats their ratings like any other existing user-item ratings, and standard CF algorithms are then applied to generate recommendations. This method is an extension of RipperBots [2], where filterbots were automated agents that rate all or most items using information-filtering techniques and produce ratings in binary form, whereas Naïve Filterbots uses average ratings. After the ratings are injected into the system, user-based and item-based algorithms are used to calculate predictions. The user-based algorithm relies on the Pearson correlation coefficient to find the similarity between users:

sim(u, v) = \frac{\sum_{i \in I_u \cap I_v} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I_u \cap I_v} (r_{u,i} - \bar{r}_u)^2}\ \sqrt{\sum_{i \in I_u \cap I_v} (r_{v,i} - \bar{r}_v)^2}}    (1)

where sim(u, v) is the similarity between users u and v, r_{u,i} and r_{v,i} are the ratings given to item i by users u and v, \bar{r}_u and \bar{r}_v are the average ratings of users u and v over all items, and I_u \cap I_v is the set of items rated by both users u and v.
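As a concrete illustration of this baseline, the following sketch computes Pearson similarities over a small user-item matrix in which a filterbot row has already been appended as an ordinary user, and produces a user-based prediction. The toy matrix, the filterbot values, and the mean-centred prediction rule are assumptions for the sketch, not the exact configuration used in [1].

import numpy as np

def pearson_sim(r_u, r_v):
    """Pearson correlation over items co-rated by two users (equation (1))."""
    co_rated = ~np.isnan(r_u) & ~np.isnan(r_v)
    if co_rated.sum() < 2:
        return 0.0
    u, v = r_u[co_rated], r_v[co_rated]
    du, dv = u - np.nanmean(r_u), v - np.nanmean(r_v)
    denom = np.sqrt((du ** 2).sum()) * np.sqrt((dv ** 2).sum())
    return float((du * dv).sum() / denom) if denom > 0 else 0.0

def predict(ratings, target_user, item):
    """Mean-centred weighted sum over neighbours who rated the item."""
    r_t = ratings[target_user]
    num = den = 0.0
    for v, r_v in enumerate(ratings):
        if v == target_user or np.isnan(r_v[item]):
            continue
        s = pearson_sim(r_t, r_v)
        num += s * (r_v[item] - np.nanmean(r_v))
        den += abs(s)
    base = np.nanmean(r_t)
    return base if den == 0 else base + num / den

# Toy matrix: rows 0-2 are real users, row 3 is an injected filterbot
# that rates every item with a hypothetical item-average rating (NaN = unrated).
ratings = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [3.3, 2.0, 4.0, 2.3],   # filterbot row
])
print(predict(ratings, target_user=0, item=2))

Because the filterbot is the only "user" who rated item 2, it is the row that allows a prediction for that cold item at all, which is the point of the injection step.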
The Triadic Aspect model depends on users' demographic information such as age, gender, and job [3]. It uses the triadic aspect model suggested by Hofmann [4], which models three-way co-occurrence data: ages, genders, and jobs. An observation is a triple (a, g, j) corresponding to a user u with these three features. The model steps can be summarized as follows:

i. Find the potential cause

p(a, g, j) = \sum_{z} P(z) P(a \mid z) P(g \mid z) P(j \mid z)    (2)

where p(a, g, j) is the probability of the potential cause for the n(a, g, j) users who have features <a, g, j>.

ii. Compute the likelihood of the latent causes over users with features <a, g, j>

L = \sum_{a, g, j} n(a, g, j) \log p(a, g, j)    (3)

where L represents the likelihood value, and all calculated latent variables z are considered as user types.

iii. Parameter estimation. Each user has a set of parameters over the user types (the latent variables calculated in the previous step). These parameters are estimated iteratively until a local maximum is reached. The maximization process consists of two steps, the Expectation step (E)

P(z \mid a, g, j) = \frac{P(z) P(a \mid z) P(g \mid z) P(j \mid z)}{\sum_{z'} P(z') P(a \mid z') P(g \mid z') P(j \mid z')}    (4)

and the Maximization step (M)

P(a \mid z) \propto \sum_{g, j} n(a, g, j) P(z \mid a, g, j)    (5)

P(g \mid z) \propto \sum_{a, j} n(a, g, j) P(z \mid a, g, j)    (6)

P(j \mid z) \propto \sum_{a, g} n(a, g, j) P(z \mid a, g, j)    (7)

iv. Prediction. The rating distribution of a new item y for an existing user u is

P(R_y = v \mid u) = \sum_{z} P(R_y = v \mid z) P(z \mid u)    (8)

where the rating distribution of item y is obtained from the rating values provided by the existing user u under the different latent causes (variables) z. The rating distribution of a new item for a new user u is

P(R_y = v \mid u) = \sum_{z} P(R_y = v \mid z) P(z \mid a, g, j)    (9)

Since the user is new, the distribution of the latent variables z over user u is not available; therefore the collected demographic data substitutes for the distribution over the user type.
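To make the EM recursion concrete, the sketch below fits the three-way aspect model of equations (2)-(7) on co-occurrence counts n(a, g, j) and then produces the new-user mixture P(z | a, g, j) used in equation (9). The count tensor, the number of latent types, and the initialisation are illustrative assumptions rather than the settings used in [3].

import numpy as np

rng = np.random.default_rng(0)

def fit_triadic_aspect(n, num_z=3, iters=50):
    """EM for the triadic aspect model: n[a, g, j] are co-occurrence counts."""
    A, G, J = n.shape
    p_z = np.full(num_z, 1.0 / num_z)
    p_a = rng.dirichlet(np.ones(A), size=num_z)   # P(a|z), rows sum to 1
    p_g = rng.dirichlet(np.ones(G), size=num_z)   # P(g|z)
    p_j = rng.dirichlet(np.ones(J), size=num_z)   # P(j|z)
    for _ in range(iters):
        # E-step, equation (4): responsibilities P(z | a, g, j)
        joint = (p_z[:, None, None, None]
                 * p_a[:, :, None, None]
                 * p_g[:, None, :, None]
                 * p_j[:, None, None, :])         # shape (Z, A, G, J)
        resp = joint / joint.sum(axis=0, keepdims=True)
        # M-step, equations (5)-(7): re-estimate conditionals from weighted counts
        w = resp * n[None, :, :, :]
        p_a = w.sum(axis=(2, 3)); p_a /= p_a.sum(axis=1, keepdims=True)
        p_g = w.sum(axis=(1, 3)); p_g /= p_g.sum(axis=1, keepdims=True)
        p_j = w.sum(axis=(1, 2)); p_j /= p_j.sum(axis=1, keepdims=True)
        p_z = w.sum(axis=(1, 2, 3)); p_z /= p_z.sum()
    return p_z, p_a, p_g, p_j

def new_user_type_distribution(features, params):
    """P(z | a, g, j) for a new user, the mixture weights of equation (9)."""
    a, g, j = features
    p_z, p_a, p_g, p_j = params
    post = p_z * p_a[:, a] * p_g[:, g] * p_j[:, j]
    return post / post.sum()

# Hypothetical counts over 4 age bands, 2 genders, 3 job groups.
counts = rng.integers(0, 20, size=(4, 2, 3)).astype(float)
params = fit_triadic_aspect(counts)
print(new_user_type_distribution((1, 0, 2), params))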
The MediaScout Stereotype model [5] is a stereotype approach that combines elements of content-based and collaborative filtering. A set of stereotype content-based profiles is created, and a similarity vector over these stereotypes serves as the user profile. New users are classified into clusters through an interactive questionnaire generated automatically from the stereotypes after each update, while existing users are automatically reclassified to new stereotypes during the update process and do not need to go through the questionnaire again. A relevance value is calculated by matching an item profile against a stereotype profile, as shown in equation (10), and this value is used to generate recommendations.

relevance(i, u) = \sum_{s \in stereotypes} v(s) \cdot relevance(i, s)    (10)

where relevance(i, u) is the relevance of item i to user u, relevance(i, s) is the relevance of item i to stereotype s, and v(s) is the user's similarity to stereotype s.

2. SUGGESTED METHOD

The main goal of web personalization and recommendation systems is to equip users with what they are looking for on a particular web site. We suggest a method we call the Active Node Technique (ANT), which focuses on users' browsing targets, since these reflect their power of thinking about specific items. We assume that "abstract users are similar and think the same way about certain item(s) of a particular web site"; hence "users who go through a specific path have similar interests in the nodes of this path, which differ from those of other paths, and they should inherit the benefits of this path". Therefore, we collect users' abstract click-streams, which reflect their power of thinking about specific item(s). The collected abstract click-streams are used to create abstract loop-less sessions (maximal sessions) that capture the abstract users' preferences. The collected maximal sessions are evaluated to remove extremely low and extremely high sessions (to avoid the robustness problem), and then every sub-session is absorbed into its super-session (to reduce storage space without losing data quality). We then create integrated routes (elastic routes), which represent the largest abstract loop-less routes visited by abstract users through their click-streams on a specific web site; these routes are in turn used to generate the recommendation sets delivered to site visitors, regardless of their personal data.

Providing recommendations to new users is valid under this concept, because any new user who enters the web site follows a specific path and performs his own click-streams, which express his thinking. The system can therefore provide recommendations based on his online maximal preferences and the path(s) he follows. As indicated earlier, users are identified by their online browsing targets, so the privacy problem vanishes under this concept.

A. Method Illustration & Implementation

As soon as users' preferences are collected, a significance value is calculated for each session; a session is significant if it makes a clear difference to the stored integrated routes. The session significance value also estimates how much the session reflects the user's power of thinking, and it depends on the time the user spends during the session. Very short sessions as well as very long sessions are not valid, because they involve unimportant browsing. Equation (11) calculates the threshold of each node (item), which in turn is used to find the significance of every session, as shown in equation (12).

Threshold(x_j) = \frac{\sum_{i=1}^{k} time_i(x_j)}{k}    (11)

where the numerator is the total time spent on item x_j by site users over the k sessions that contain it, and the denominator k is the number of sessions containing the item. The threshold value is updated whenever the created maximal forward sessions change.

Sig(s_j) = \frac{\sum_{i=1}^{n} time(x_i) - Min_{TH}}{Max_{TH} - Min_{TH}}    (12)

where Sig(s_j) is the significance value of session s_j, \sum_{i=1}^{n} time(x_i) is the total time spent on all items in session s_j, Max_{TH} is the maximum threshold value among the items of s_j, and Min_{TH} is the minimum threshold value among the items of s_j.

All significant sessions are input to the absorption process (AP): if P(S_k) is the ordered set of pages visited in session S_k, then whenever P(S_i) ⊆ P(S_j), the integrated route profile (IRP) stores only S_j with appropriately recalculated weights. Therefore, as soon as an absorption case is detected, we update the larger session and remove the smaller one, and then integrated routes are generated and stored in the integrated route profile (IRP).
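The following sketch shows one way equations (11)-(12) and the absorption step could be wired together: sessions are lists of (page, time) pairs, thresholds are per-item averages, and a session whose page sequence is contained in another session is absorbed into it. The data structures and the containment test are illustrative assumptions, not the exact implementation of the ANT.

from collections import defaultdict

def item_thresholds(sessions):
    """Equation (11): average time spent on each item over the sessions containing it."""
    total, count = defaultdict(float), defaultdict(int)
    for session in sessions:
        for page, t in session:
            total[page] += t
            count[page] += 1
    return {page: total[page] / count[page] for page in total}

def significance(session, thresholds):
    """Equation (12): total session time scaled by the min/max item thresholds."""
    th = [thresholds[page] for page, _ in session]
    spread = max(th) - min(th)
    total_time = sum(t for _, t in session)
    return 0.0 if spread == 0 else (total_time - min(th)) / spread

def is_subsequence(small, big):
    """True if the page sequence of `small` appears in order inside `big`."""
    it = iter(big)
    return all(page in it for page in small)

def absorb(sessions):
    """Keep only super-sessions; drop any session contained in another one."""
    paths = [[page for page, _ in s] for s in sessions]
    kept = []
    for i, s in enumerate(sessions):
        absorbed = any(i != j and is_subsequence(paths[i], paths[j])
                       for j in range(len(sessions)))
        if not absorbed:
            kept.append(s)
    return kept

# Hypothetical sessions: (page, seconds spent).
sessions = [
    [("home", 12), ("sports", 40), ("match-report", 95)],
    [("home", 8), ("sports", 30)],                       # absorbed by the first session
    [("home", 5), ("business", 60), ("markets", 70)],
]
th = item_thresholds(sessions)
print([round(significance(s, th), 2) for s in sessions])  # inspect before pruning extremes
print(len(absorb(sessions)))                              # the short sports session is absorbed -> 2 remain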
Integrated routes can be generated whenever there is an intersection between the end of one absorbed session and the beginning of another. The main goal of integrated routes is to generate maximal stretchable paths of interest. Creating such integrated routes reduces the number of stored sessions and provides more flexibility for generating recommendations with the active node technique.

One further question must be asked: what about newly added items? We consider as new items not only items newly added to the web site, but also all items that have never been visited and therefore still have a threshold weight (average time spent on the item) of zero. We use the hyperlink structure around a new item to calculate a weight for it and then add it to the recommendation set. When a new item is added, there is at least one link from some requested item to the new item (e.g. when a new book is added to Amazon, it is linked from a 'New Books' page as well as from other pages related to its category). To exploit this, we use a 'virtual weight', which reflects the weight expected to be given to the new item by all site visitors whose preferences relate to it, as shown in fig. 1. Every hyperlink between nodes is represented by e, where e = 1 if the hyperlink (or semantic relationship) exists and e = 0 otherwise. In addition, every item that appears or is selected in sequence with another item stored in an integrated route has a real weight w.

Let N be the new item and let X = {x_1, x_2, x_3, ..., x_n} be the set of items that link to N. Let A be the active node, and suppose there is a path A → x_i for some x_i in X. Equation (13) calculates the virtual weight of the link A → N between the active node A and the new item N.

W_v(A \to N \mid A \to x_i) = e \cdot w_r(A, x_i) \cdot \frac{Threshold(x_i)}{Threshold(A)}    (13)

Substituting the threshold values using equation (11) gives the virtual weight as follows.

W_v(A \to N \mid A \to x_i) = e \cdot w_r(A, x_i) \cdot \frac{\sum_{j=1}^{n} time_j(x_i) / n}{\sum_{j=1}^{k} time_j(A) / k}    (14)

W_v(A \to N \mid A \to x_i) = e \cdot w_r(A, x_i) \cdot \frac{k \sum_{j=1}^{n} time_j(x_i)}{n \sum_{j=1}^{k} time_j(A)}    (15)

W_v(A \to N \mid A \to x_i) = \frac{k}{n} \cdot e \cdot w_r(A, x_i) \cdot \frac{\sum_{j=1}^{n} time_j(x_i)}{\sum_{j=1}^{k} time_j(A)}    (16)

where W_v(A \to N \mid A \to x_i) is the virtual weight between the active node A and the new item N via item x_i, w_r(A, x_i) is the real weight between the active item A and item x_i, n is the number of times item x_i is found in the integrated routes, k is the number of times item A is found in the integrated routes, \sum_{j=1}^{n} time_j(x_i) is the time spent by all site visitors on item x_i as stored in the IRP, and \sum_{j=1}^{k} time_j(A) is the time spent by all site visitors on item A as stored in the IRP.

If the collected data are in rating format, then instead of using the time consumed in user sessions, the virtual weight between item N and item A via item x_i can be calculated as shown in equation (17).

W_v(A \to N \mid A \to x_i) = e \cdot w_r(A, x_i) \cdot \frac{k}{n} \cdot \frac{\sum_{j=1}^{n} R_j(x_i)}{\sum_{j=1}^{k} R_j(A)}    (17)

where W_v(A \to N \mid A \to x_i) is the virtual weight between the active node A and the new item N via item x_i, and e = 1 if a hyperlink (semantic relationship) exists between items x_i and N, otherwise e = 0. Here w_r(A, x_i) is the number of times items A and x_i appear (are purchased) together, k is the number of users who rated item A, n is the number of users who rated item x_i, \sum_{j=1}^{n} R_j(x_i) is the total of the ratings given by all users to item x_i, and \sum_{j=1}^{k} R_j(A) is the total of the ratings given by all users to item A.
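A minimal sketch of the virtual-weight calculation of equations (13)/(16) is given below, assuming the IRP exposes per-item total times, occurrence counts, and a co-occurrence count for the real weight; averaging over all items that link to the new item anticipates equation (18) below. The dictionaries and the example numbers are hypothetical.

def virtual_weight(active, new_item, via, irp, links):
    """Equations (13)/(16): expected weight of a new item as seen from the active node."""
    e = 1 if (via, new_item) in links else 0                       # hyperlink / semantic relation
    w_real = irp["co_occurrence"].get((active, via), 0)            # real weight w_r(A, x_i)
    thr_via = irp["total_time"][via] / irp["occurrences"][via]     # Threshold(x_i)
    thr_active = irp["total_time"][active] / irp["occurrences"][active]
    return e * w_real * thr_via / thr_active

def average_virtual_weight(active, new_item, linking_items, irp, links):
    """Average of the per-path virtual weights over all items that link to the new item."""
    weights = [virtual_weight(active, new_item, x, irp, links) for x in linking_items]
    return sum(weights) / len(weights) if weights else 0.0

# Hypothetical IRP statistics and link structure.
irp = {
    "total_time":    {"sports": 900.0, "match-report": 300.0},
    "occurrences":   {"sports": 30,    "match-report": 10},
    "co_occurrence": {("sports", "match-report"): 12},
}
links = {("match-report", "transfer-news")}   # new page linked only from match-report
print(average_virtual_weight("sports", "transfer-news", ["match-report"], irp, links))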
Fig. 1 A new item in a virtual link to an old one.

The average virtual weight of any new item can be predicted by considering all of its hyperlink (semantic) relationships, using equation (18).

W_v(A \to N) = \frac{\sum_{i=1}^{n} W_v(A \to N \mid A \to x_i)}{n}    (18)

The virtual weight is not a random weight; it is a logical weight calculated from users' browsing preferences via the hyperlink (or semantic) structure between items, and it takes into account the real weights and the calculated thresholds of the items involved.

B. Generating Recommendations

Two types of recommendation are generated for new users based on the created integrated routes and the user's online maximal forward reference: batch recommendation and node recommendation. In node recommendation, the system creates the recommendation set from the nodes directly linked to the active node (page). Node recommendation is not an item-to-item recommendation, because it does not depend on item attributes; it depends on the nodes visited in association with the active node along specific path(s) with high relative weight (high power-of-thinking weight). Highly weighted nodes that are stored in the IRP and have a sequential association with the new user's online path are selected for recommendation. The rule used to find candidates for node recommendation is

Find(x_i \mid A \xrightarrow{e} x_i \subset IR_j)

where A is the active node selected by the new user, and x_i ranges over the items associated with the active node A and stored in the integrated route IR_j. All such x_i are candidates for recommendation, and only the top n items are selected.

In batch recommendation, the recommendation set is generated from the top N highly weighted nodes on the user's expected paths, based on his or her online maximal path. The rule used to find candidates for batch recommendation is

Find(x_i \mid CMP \subset IR_j)

where CMP is the user's current online maximal path and IR_j ranges over the stored integrated routes of which CMP is a subset; the recommendation set keeps the top n candidates.

New items are also included in the recommendation set; as indicated before, we treat newly added items, as well as old items that have never been visited, as new items whose threshold weights equal zero. All candidates selected for node or batch recommendation are checked for direct links to new items (nodes with a threshold equal to zero), and then, by applying the virtual weight equation (17), the top n items are selected for recommendation.
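The two selection rules could be realised along the lines of the sketch below, where integrated routes are stored as ordered page lists with per-node weights; the route store, the weighting, the simplified set-based containment test, and the top-n cut are illustrative assumptions.

def node_candidates(active, routes, weights, top_n=5):
    """Nodes that directly follow the active node in any stored integrated route."""
    candidates = set()
    for route in routes:
        for pos, page in enumerate(route[:-1]):
            if page == active:
                candidates.add(route[pos + 1])
    return sorted(candidates, key=lambda p: weights.get(p, 0.0), reverse=True)[:top_n]

def batch_candidates(current_maximal_path, routes, weights, top_n=5):
    """Top-weighted nodes from integrated routes that contain the user's maximal path."""
    cmp_set = set(current_maximal_path)       # containment simplified to a set check
    candidates = set()
    for route in routes:
        if cmp_set.issubset(route):
            candidates.update(set(route) - cmp_set)
    return sorted(candidates, key=lambda p: weights.get(p, 0.0), reverse=True)[:top_n]

# Hypothetical integrated routes and node weights (e.g. threshold values from the IRP).
routes = [
    ["home", "sports", "match-report", "transfer-news"],
    ["home", "business", "markets"],
]
weights = {"match-report": 30.0, "transfer-news": 0.0, "markets": 25.0, "business": 18.0}
print(node_candidates("sports", routes, weights))
print(batch_candidates(["home", "sports"], routes, weights))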
3. DESCRIPTION OF EXPERIMENTS

The standard datasets used to evaluate many collaborative filtering algorithms are MovieLens, Book-Crossing, and the Jester joke dataset, but these datasets provide neither the site structure nor semantic relationships between objects, and they are rating-based, which makes them unsuitable for testing our method. Therefore, we collected a dataset that includes the site structure, in three phases, each covering a three-month period.

In the first phase (January 2009 to March 2009), log data were cleaned and used to generate integrated routes (the ANT method), with the time spent per page treated as the page weight reflecting how valuable the page is to site visitors. For the stereotype method, we created an initial stereotype profile with specific relevance values for different topics (news, sports, business, technology, etc.), while the cleaned data were used to generate user affinity stereotype profiles, with the time spent per page used as the relevance value. For the Naïve Filterbots method, we created an item-item matrix from the cleaned data together with its weights (average time durations). We then created item-feature bots that generate ratings (using average weights) for any new item based on item features (news, sports, business, technology, etc.), injected these filterbots into the item-item matrix like any other existing items with their ratings (we implemented Naïve Filterbots for item-based prediction only), and then applied an item-based algorithm to calculate predictions and produce recommendation sets. For the demographic-based model, we determined the age, gender, and job of all users involved in the training (264 users took part), and then, by implementing the Triadic Aspect model, we calculated item rating predictions from the users' demographic features (users were classified by gender, age, and profession).

In the second phase (April 2009 to June 2009), we provided recommendations to all users involved in the training, based on each method. The recommendations were displayed to users in one set, without separation by method. We collected the users' selections, and the collected data were input to the ANT to update the integrated routes stored in the IRP from the first phase. In the third phase (August 2009 to October 2009), we again provided recommendations to users based on the different methods, and we collected the users' selections for processing by the ANT. The recommendation sets collected during the second and third phases, together with the users' selections, were used for evaluation.
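As an illustration of how the Naïve Filterbots baseline could be wired up in this setting, the sketch below builds a weight matrix whose last row is an injected feature bot that rates every item of its category with a hypothetical category average, and then makes an item-based prediction from item-item cosine similarity. The matrix, the feature averages, and the cosine choice are assumptions for the sketch, not the configuration used in the experiment.

import numpy as np

def cosine_item_sim(matrix, i, j):
    """Cosine similarity between two item columns, ignoring missing (NaN) entries."""
    mask = ~np.isnan(matrix[:, i]) & ~np.isnan(matrix[:, j])
    if not mask.any():
        return 0.0
    a, b = matrix[mask, i], matrix[mask, j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def item_based_predict(matrix, user, target_item):
    """Weighted sum of the user's known weights by similarity to the target item."""
    num = den = 0.0
    for j in range(matrix.shape[1]):
        if j == target_item or np.isnan(matrix[user, j]):
            continue
        s = cosine_item_sim(matrix, target_item, j)
        num += s * matrix[user, j]
        den += abs(s)
    return num / den if den else 0.0

# Columns: sports-1, sports-2, tech-1, sports-3 (new).  Rows 0-1: real users'
# average-time weights; row 2: injected feature bot giving the new column
# something to match on, which is what enables a prediction for the cold item.
matrix = np.array([
    [40.0, 60.0, 30.0, np.nan],
    [55.0, np.nan, 20.0, np.nan],
    [52.0, 52.0, np.nan, 52.0],    # filterbot row (hypothetical category average)
])
print(item_based_predict(matrix, user=0, target_item=3))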
Fig. 2 shows that batch recommendation achieved the highest level of novelty, because its recommendations were created from all visited nodes in the stored super-sessions of the current user's online maximal path. In node recommendation the level of novelty was lower than in batch recommendation, because recommendation candidates are restricted to nodes with a virtual or hyperlink relationship to the visited node; the candidate nodes available for node recommendation are therefore fewer than those available for batch recommendation. Naïve Filterbots achieved a high level of novelty, similar to node recommendation, but as the training sessions progressed its novelty declined, because the re-injected ratings do not differ much from the previously injected ratings (we used each item's average time as the default prediction; in the implementation, items' time durations were used as ratings). Naïve Filterbots started with a high level of novelty because all site nodes were injected with artificial ratings, so the non-repeated candidates available for recommendation were also numerous; we should note that these injected ratings are rule-based and do not reflect actual ratings made by users. The stereotype and aspect models also started with a high level of novelty (though lower than the active node and Naïve Filterbots models), because during the first training sessions there were still many novel items (categorized under specific stereotypes and demographic categories) that had never been included in the recommendation sets. Nevertheless, their novelty declined dramatically, as shown in fig. 2, because both models (stereotype and triadic aspect) classify users by stereotype and demographic data. Once a user is classified into a specific stereotype or demographic category, he or she always receives recommendations based on that class, and the novelty of the recommended items depends only on changes to the user's category, e.g. the addition of new item(s) to it.

Fig. 2 Novelty of recommendations.

For coverage, we measured how well the recommended sets cover the items of the target sets. Fig. 3 shows the level of coverage for the suggested method and for the three alternative methods. Node recommendation achieves the highest level of coverage because its candidates are provided on the basis of their virtual and/or hyperlink relationships to the visited node, and the target sets stored in the integrated routes are likewise those virtually and/or semantically related to the visited node. Batch recommendation achieves lower coverage than node recommendation, but its coverage increased as the number of visited nodes grew: the increase in the number of visitors leads to an increase in the stored integrated routes, which in turn increases the candidates available for recommendation. Therefore, in the first iterations the number of integrated routes was lower than in the later sessions, and hence the coverage level was lower than in those later sessions. The Triadic Aspect model achieved a coverage level similar to batch recommendation in the first iterations, but its coverage declined in the last iterations because the item predictions are affected by the growing number of users with specific demographic data, which also affects the users' calculated parameters. Hence some items with high predictions that appeared in previous recommendation sessions obtain lower prediction values, which prevents them from being included in the recommendation sets. Both the Naïve Filterbots and stereotype models achieved lower coverage than the active node method. In Naïve Filterbots, all unrated items are injected with artificial ratings, which effectively treats all site nodes as visited, so candidate items for recommendation are widely available and the match between recommended items and target sets becomes very low (this also explains its high level of novelty). The stereotype model achieved a lower level of coverage than the active node method because the system was updated using implicit feedback that treated all items not selected by a user as disliked. Hence, all previously selected items in the target sets vanish from the recommendation sets after some iterations, and the match between new recommendation sets and target sets becomes low.

Fig. 3 Coverage of recommendations.

Fig. 4 shows the calculated accuracy levels, where node recommendation achieved the highest level of precision. The increase in novelty does not contradict the increase in coverage, because novelty is measured between the items of the recommendation sets, while coverage and precision are calculated between the recommendation sets and the target sets.
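To make these measurements concrete in outline, the sketch below computes the three quantities as they are described here: novelty as the fraction of recommended items not offered in earlier recommendation sets, and coverage and precision as overlap between a recommendation set and a target set. The exact definitions used in the experiment may differ, so this is an assumed reading rather than the evaluation code.

def novelty(recommended, previously_recommended):
    """Share of the recommendation set never offered in earlier sets."""
    if not recommended:
        return 0.0
    return len(set(recommended) - set(previously_recommended)) / len(recommended)

def coverage(recommended, target):
    """Share of the target set that the recommendation set manages to cover."""
    if not target:
        return 0.0
    return len(set(recommended) & set(target)) / len(set(target))

def precision(recommended, target):
    """Share of recommended items that actually appear in the target set."""
    if not recommended:
        return 0.0
    return len(set(recommended) & set(target)) / len(set(recommended))

# Hypothetical sets from one evaluation iteration.
recommended = ["match-report", "transfer-news", "markets"]
earlier = ["match-report"]
target = ["transfer-news", "markets", "tech-review"]
print(novelty(recommended, earlier), coverage(recommended, target),
      precision(recommended, target))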
In addition, the scope of the recommendations generated by the active node method is the users' power of thinking, whereas in the Triadic Aspect model the generated recommendations are restricted to the latent demographic parameters, in the stereotype model they are restricted to the users' stereotype categories, and in Naïve Filterbots they are restricted to the injected filterbots, which causes a severe robustness problem and steers recommendations toward specific highly weighted items.

Fig. 4 Precision of recommendations.

4. CONCLUSION & FUTURE WORK

The suggested method characterizes users by their browsing targets (power of thinking) on the web site. We collected users' preferences without their personal data, and all sessions are evaluated (before being stored in the integrated routes) in order to avoid the robustness problem. The active node method provides two different recommendation methods: node recommendation, which generates recommendations based on the virtual relationships between the active node and the candidate items, and batch recommendation, which generates recommendations using all super-sessions of the visited online maximal session. Our experiments showed that batch recommendation achieved the highest level of novelty, while node recommendation achieved the highest levels of coverage and precision. Nevertheless, we used only one web site and a limited number of users. What happens with many web sites, or with a growing number of online users: does this affect system performance? Does the active node technique need adaptation to fit specific system requirements? Would merging the active node technique with the semantic structure improve the level of novelty for node recommendation and/or the levels of coverage and precision for batch recommendation? All these questions still need to be investigated.

REFERENCES

[1] S. Park, D. Pennock, O. Madani, N. Good, and D. DeCoste, "Naïve filterbots for robust cold-start recommendations," in Proc. KDD, New York, NY, USA, pp. 699-705, 2006.
[2] N. Good, B. Schafer, J. Konstan, A. Borchers, B. Sarwar, J. Herlocker, and J. Riedl, "Combining collaborative filtering with personal agents for better recommendations," in Proceedings of the AAAI-99 Conference, pp. 439-446, 1999.
[3] X. Lam, T. Vu, T. Le, and A. Duong, "Addressing cold-start problem in recommendation systems," in Proceedings of the 2nd International ACM Conference on Ubiquitous Information Management and Communication, New York, NY, USA, pp. 208-211, 2008.
[4] T. Hofmann, "Probabilistic latent semantic indexing," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA, pp. 50-57, 1999.
[5] G. Shani, L. Rokach, A. Meisles, Y. Gleizer, and D. Ben-Shimon, "A stereotypes-based hybrid recommender system for media items," in Proc. AAAI Workshop on Recommender Systems, 2007.