Graphs & Tensors

[SCIE] Sunhwa Jo, Beomjun Park, Suan Lee, and Jinho Kim. 2021. "OLGAVis: On-Line Graph Analysis and Visualization for Bibliographic Information Network" Applied Sciences 11, no. 9: 3862. (IF: 2.474, JCR: Q1)

Sunhwa Jo, Beomjun Park, Suan Lee, and Jinho Kim
Applied Sciences
April 2021

Abstract

Real-world systems that are composed of various types of components, their interactions, and relationships, and numerous applications are often modeled as graphs or network structures to represent and analyze the presence of relationship, shape, and meaning of objects. Network-structured data are used for various exploration and in-depth analysis through visualization of information in various fields. In particular, online bibliographic databases are a service that is used for a myriad of purposes, such as simple search of research materials as well as understanding the history and flow of research, current status, and trends. A visualization tool that can intuitively perform exploration and analysis by modeling the data provided by the online bibliographic database in a network structure will be a very meaningful study for the exploration of various information using a large amount of complex bibliographic data. This study has modeled an online bibliographic database as an information network, and further developed a prototype of a visualization tool that provides an interactive interface for easily and efficiently performing visual exploration and multidimensional analysis. The visualization tool that was developed through this study will be used to conveniently perform various online analysis of the bibliographic data, and the information and knowledge acquired as a result of the analysis are expected to contribute to the research development of various researchers. Furthermore, this visualization tool can be applied to other types of data in the future, and it is expected to develop into a useful tool for various information network analysis by improving, supplementing, and expanding the functions and performance of the developed prototype.

Keywords

bibliographic information network; Information Network OLAP; information network visualization

Minkyu Kim and Suan Lee, “Predicting Revenues of Seoul Commercial Alley using Neural Tensor Factorization,” Big Data and Smart Computing (BigComp), 2021 IEEE International Conference on. IEEE, 2021.

Minkyu Kim and Suan Lee
Big Data and Smart Computing (BigComp), 2021 IEEE International Conference on. IEEE, 2021.
January 17-20, 2021

Abstract

Those who want to start their own businesses must decide a location and service to start. In order to make the decision, they must know characteristics of the location and service, such as average revenues and floating population. However, it is usually very difficult to collect and analyze these characteristics. Therefore, we propose a novel deep learning model named Neural Tensor Factorization (NeuralTF) that automatically analyzes the characteristics for predicting revenues, and a method for recommending appropriate location or service to start their businesses based on the predicted revenues. NeuralTF is a combination of Tensor Factorization(TF) and Deep Neural Network(DNN). We compare NeuralTF with other machine learning models using Seoul Commercial Alley dataset. In addition, we compare performances of NeuralTF when TF and DNN components share the embedding space and when they do not.

Keywords

Recommender System, Tensor Factorization, Neural Network, Deep Learning, Neural Tensor Factorization

[SCIE] Seok Kang, Suan Lee, and Jinho Kim, “Distributed graph cube generation using Spark framework,” The Journal of Supercomputing 76.10 (2020): 8118-8139. (IF: 2.469, JCR: Q2)

Seok Kang, Suan Lee, and Jinho Kim
Cluster Computing
October 2020

Abstract

Graph OLAP is a technology that generates aggregates or summaries of a large-scale graph based on the properties (or dimensions) associated with its nodes and edges, and in turn enables interactive analyses of the statistical information contained in the graph. To efficiently support these OLAP functions, a graph cube is widely used, which maintains aggregate graphs for all dimensions of the source graph. However, computing the graph cube for a large graph requires an enormous amount of time. While previous approaches have used the MapReduce framework to cut down on this computation time, the recently developed Spark environment offers superior computational performance. To leverage the advantages of Spark, we propose the GraphNaïve and GraphTDC algorithms. GraphNaïve sequentially computes graph cuboids for all dimensions in a graph, while GraphTDC computes them after first creating an execution plan. We also propose the Generate Multi-Dimension Table method to efficiently create a multidimensional graph table to express the graph. Evaluation experiments demonstrated that the GraphTDC algorithm significantly outperformed Spark SQL’s built-in library DataFrame, as the size of graphs increased.

Keywords

Distributed parallel processing, Spark framework, Resilient distributed dataset, Graph cube, Data cube, Online analytical processing

[SCIE] Svetlana Kim, Suan Lee, Jinho Kim, and Yong-Ik Yoon, “MRTensorCube: tensor factorization with data reduction for context-aware recommendations,” The Journal of Supercomputing 76.10 (2020): 7847–7857 (IF: 2.469, JCR: Q2)

Svetlana Kim, Suan Lee, Jinho Kim, and Yong-Ik Yoon
The Journal of Supercomputing
October 2020

Abstract

Context information can be an important factor of user behavior modeling and various context recognition recommendations. However, state-of-the-art context modeling methods cannot deal with contexts of other dimensions such as those of users and items and cannot extract special semantics. On the other hand, some tasks for predicting multidimensional relationships can be used to recommend context recognition, but there is a problem with the generation recommendations based on a variety of context information. In this paper, we propose MRTensorCube, which is a large-scale data cube calculation based on distributed parallel computing using MapReduce computation framework and supports efficient context recognition. The basic idea of MRTensorCube is the reduction of continuous data combined partial filter and slice when calculating using a four-way algorithm. From the experimental results, it is clear that MRTensor is superior to all other algorithms.

Keywords

Context awareness, Tensor data cube, MapReduce framework

[KCI후보] 유영석, 박범준, 조선화, 이수안, 김진호, “그래프 구조를 갖는 서지 빅데이터의 효율적인 온라인 탐색 및 분석을 지원하는 그래픽 인터페이스 개발,” 한국빅데이터학회지, 한국빅데이터학회, 제5권, 제1호, pp. 77-88, 2020.08.25.

박범준, 조선화, 이수안, 신지운, 유혁상, 김진호
한국빅데이터학회지, 한국빅데이터학회
2020년 08월

Abstract

최근 다양한 실세계의 복잡한 관계를 그래프의 형태로 구성하고 분석하는 다양한 연구들이 진행되고 있다. 특히 DBLP와 같은 컴퓨터 분야 문헌 데이터 시스템은 논문의 저자, 그리고 논문과 논문들이 서로 인용 관계로 표현되는 대표적인 그래프 데이터이다. 그래프 데이터는 저장 구조 및 표현이 매우 복잡하므로, 문헌 빅데이터의 검색과 분석, 그리고 시각화는 매우 어려운 작업이다. 본 논문에서는 문헌 빅데이터를 그래프의 형태로 시각화한 그래픽 사용자 인터페이스 도구, 즉 EEUM을 개발하였다. EEUM은 그래프 데이터를 시각적으로 표시하여 연결된 그래프 구조에 따라 문헌 데이터를 브라우징 하는 기능을 제공하며, 문헌 빅데이터에 대한 검색 및 관리, 분석이 가능하도록 구현하였다. 또한 EEUM을 DBLP가 제공하는 문헌 그래프 빅데이터에 적용하여 편리하게 검색, 탐색 및 분석하는 할 수 있음을 시연한다. EEUM을 이용하여 모든 연구 분야에서 영향력 있는 저자나 논문을 쉽게 찾을 수 있으며, 여러 저자와 논문 사이의 모든 관계를한 눈에 볼 수 있는 등 복잡한 문헌 그래프 빅데이터의 검색 및 분석 도구로 편리하게 사용할 수 있다. Recently, many researches habe been done to organize and analyze various complex relationships in real world, represented in the form of graphs. In particular, the computer field literature data system, such as DBLP, is a representative graph data in which can be composed of papers, their authors, and citation among papers. Becasue graph data is very complex in storage structure and expression, it is very difficult task to search, analysis, and visualize a large size of bibliographic big data. In this paper, we develop a graphic user interface tool, called EEUM, which visualizes bibliographic big data in the form of graphs. EEUM provides the features to browse bibliographic big data according to the connected graph structure by visually displaying graph data, and implements search, management and analysis of the bibliographc big data. It also shows that EEUM can be conveniently used to search, explore, and analyze by applying EEUM to the bibliographic graph big data provided by DBLP. Through EEUM, you can easily find influential authors or papers in every research fields, and conveniently use it as a search and analysis tool for complex bibliographc big data, such as giving you a glimpse of all the relationships between several authors and papers.

Keywords

그래프 데이터, 문헌 빅데이터, 그래픽 인터페이스, 시각화 도구 Graph Data, Bibliographic Big Data, Graphic Interface, Visualization Tool

SungJin Park, Suan Lee, and Jinho Kim, “Estimating Revenues of Seoul Commercial Alley Services using Tensor Decomposition & Generating Recommendation System,” Big Data and Smart Computing (BigComp), 2020 IEEE International Conference on. IEEE, 2020.

SungJin Park, Suan Lee, and Jinho Kim
Big Data and Smart Computing (BigComp)
February 19-22, 2020

Abstract

As for people who wish to start their own businesses, their concerns are whether they could survive during their operations because most stores or services in Seoul are not able to survive for more than 5 year possibly due to the poor decision of location/service to start. In order to solve this problem, using big data could be helpful to increase the survival rate. Singular Value Decomposition (SVD) has been widely used in finding the similarity between all pairs of alleys and obtaining predictions from unknown relevance scores. Since tensor decomposition is the extension of SVD for multi-dimensional data, using this method to find the similarity between all pair of alleys could be the solution of increasing survival rate. This paper aims to generate good prediction tensor, TENSORCABS, that is able to recommend users appropriate alley location to start their businesses or the appropriate services to start at the user’s desired location. Both CP and Tucker decompositions are used and compared to evaluate which method has better performance. Also, r-square for regression problem and precision & recall for top-k recommendation performance are used to evaluate the TENSORCABS. As results, actual and predicted values are good-fitted, and prediction tensor performs well on the top-k recommendation. In addition, Tucker outperforms CP for this situation. Therefore, the proposed method has advantages that can handle high-dimensional data and can use decomposition for recommendations of various perspectives. With this method, users are able to obtain recommendations of the appropriate alleys with predicted revenues for opening any business service or the appropriate services with predicted revenues on a user’s desired alley location.

Keywords

Recommendation System, Tucker Decomposition, Canonical Polyadic, Collaborative Filtering, Top-k Recommendation, Tensor, Commercial Alley

Heesang Kim, Suan Lee, SungJin Park, and Jinho Kim, “Detecting Spammers on Social Networks using Strongly Connected Components in the Distributed Environment,” The 2nd International Workshop on Big Data Analysis for Smart Energy, 2019.

Heesang Kim, Suan Lee, SungJin Park, and Jinho Kim
The 2nd International Workshop on Big Data Analysis for Smart Energy
14 Oct. 2017

Abstract

Recently, several studies attempt to process big data using Apache Spark. In addition, spammers are increasing lately, and they are exposing indiscriminate information on social networks (SNS) that users do not want. Previous relationshipbased spammer detections are not suitable for big data graph, because the detecting spammers in big data graphs requires a long computation time. Therefore, we propose an efficient spammer detection scheme based on social relations using Strongly Connected Components (SCC), which quickly finds cyclical relationships on Spark GraphX. We test our proposed spammer detection method through experiments, and it is able to find spammers in big data graph quickly.

Keywords

Spammer, Spark, Strongly Connected Components, Social Networks

Suan Lee, Young-Seok You, Sungjin Park, and Jinho Kim, “EEUM: Explorable and Expandable User-interactive Model for Browsing Bibliographic Information Networks,” Proceedings of the Seventh International Conference on Emerging Databases: Technologies, Applications, and Theory. ACM, 2017.

Suan Lee, Young-Seok You, Sungjin Park, and Jinho Kim
Proceedings of the Seventh International Conference on Emerging Databases: Technologies, Applications, and Theory
14 Oct. 2017

Abstract

A suitable user interactive model is required to navigate efficiently in information network for users. In this paper, we have developed EEUM (Explorable and Expandable User-interactive Model) that can be used conveniently and efficiently for users in bibliographic information networks. The system shows the demonstration of efficient search, exploration, and analysis of information network using EEUM. EEUM allows users to find influential authors or papers in any research field. Also, users can see all relationships between several authors and papers at a glance. Users are able to analyze after searching and exploring (or navigating) bibliographic information networks efficiently by using EEUM.

Keywords

Information networks, Graph database, Data visualization, User-interactive model

Young Seok You, Suan Lee, and Jinho Kim, “Design and Development of Visualization Tool for Movie Review and Sentiment Analysis,” Proceedings of the Sixth International Conference on Emerging Databases: Technologies, Applications, and Theory. ACM, 2016.

Young Seok You, Suan Lee, and Jinho Kim
Proceedings of the Sixth International Conference on Emerging Databases: Technologies, Applications, and Theory. ACM
17-19 Oct. 2016

Abstract

Typically, application or website shows the comments of people in a list format. This list means in seeing chronologically or log of recommends. However, it is difficult to grasp because of reading and knowing all countless comments of the topic at a glance. Therefore, it requires a lot of ability to grasp information at a glance via picking only the important information. In this paper, we design and develop a visualization tool that can identify a number of reviews containing comments on the movie at a glance. Review assumed to be extracted from the Amazon and IMDb that are both subjective information. The tool that we develop visualizes sentimental analysis of the review on pre-made Sentiment Dictionary with objective information of a movie. Our proposed system can search and display one or more movies. Users can determine the relationship between movies by clustering sentiment of positive/negative reviews and movie's factors. In the future, based on all the reviews on Amazon and grasp the reviews on a variety of movies and products, as well, it will be used as tools to help users of a rational choice.

Keywords

Review DAta, Sentiment Analysis, Visualization

Information

Publication