
Identifying City Phenomena using Natural Language Processing (NLP) and Geo-location Technology - Internship story @ RISE Ltd.

I had the opportunity to join RISE Ltd. (now CYENS), a research company in Cyprus, for my mandatory internship. I was assigned to one of its research groups, MRG DARE (Digitally Enhanced Urban Environment), which focuses on innovation and technology exploration for digital twins, smart cities and urban development. During my 6-month part-time internship I worked on a dedicated project: "Identifying city phenomena using Natural Language Processing (NLP) and Geo-location technology". Through the project I learned how spatial data can contribute to analysis in urban science and how, combined with machine learning technology (in this case NLP), it can be a powerful tool for data science applications in the context of a smart city.

Tasks and Objectives

The objectives and tasks of the internship are the following:

  • Work on different methods for acquiring online data, i.e. web scraping or through APIs

  • Learn to process and model geolocated social media data for event identification in the context of Smart City development.

  • Improve my geospatial skills and learn how to integrate them with AI, IoT, etc.

  • Learn how to automate tasks with various tools, e.g. scripting languages and open-source software.

  • Acquire data modelling and visualization skills from spatial and non-spatial data sources.

The data used in this project is travel review data from TripAdvisor, covering popular venues around the world and all venues in the city of Nicosia. Travel review sites are used by travelers and locals to rate facilities, services and attractions around a city. Although most reviews target the specific venue whose services the reviewer used, they often also mention information about the surrounding context. Such information might refer to the city, the country, or the immediate location of the venue. In this work, we try to extract some of this information to get an idea of the "quality" of these contexts. Such data can be very useful for smart city applications, to identify issues and perceptions around city neighborhoods, and for travel applications, to identify new places or unknown events around the city.

Methodology

Natural language processing is a field of Artificial Intelligence that gives machines the ability to read, understand and derive meaning from human languages in a way that is valuable. The technology can be applied to TripAdvisor reviews to identify events, dangers, or visitors' stigma towards a specific venue. In this project, common NLP techniques such as part-of-speech (POS) tagging, Latent Dirichlet Allocation (LDA) topic modelling, sentiment analysis and cosine similarity were used, along with geo-location techniques such as geocoding and distance dependency clustering, to map the city phenomena.
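To make one of these techniques concrete, here is a minimal sketch of cosine similarity between two texts, using simple word counts as the vectors. It is only an illustration under assumptions: the tokenizer, the function names and the review snippets are invented here and are not taken from the project code, which may have used a different vectorization (for example TF-IDF).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the words that appear in both texts.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text cannot be compared
    return dot / (norm_a * norm_b)


# Hypothetical review snippets, invented for illustration only.
review = "The bus connection to the site was late and crowded"
query = "late bus connection"
unrelated = "Lovely dessert, friendly staff"

print(cosine_similarity(review, query))      # shares three words with the query
print(cosine_similarity(review, unrelated))  # shares no words
```

A score close to 1 means the two texts use nearly the same words in the same proportions, while a score of 0 means they share no words at all. This is what makes the measure useful for matching reviews against a short description of a phenomenon.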


Project Workflow

Results and Conclusions

Using natural language processing and geo-location technology, we can extract information that is valuable for improving the city, such as incident reports and public opinion within the city. Sample results from the project are as follows:

Incident-Epicure restaurant


Some particular events or phenomena can be identified by querying the reviews for keywords. Example: a fire incident in Venice.
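A keyword query of this kind can be sketched in a few lines. The record layout (`venue`, `text`), the function name `find_reviews` and the sample reviews below are assumptions made for illustration and do not come from the project data.

```python
import re


def find_reviews(reviews, keywords):
    """Return the reviews whose text mentions any keyword as a whole word."""
    if not keywords:
        return []
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [review for review in reviews if pattern.search(review["text"])]


# Hypothetical records, invented for illustration; not project data.
reviews = [
    {"venue": "Venue A", "text": "There was a fire in the kitchen during dinner."},
    {"venue": "Venue B", "text": "Great view, friendly staff."},
    {"venue": "Venue C", "text": "Smoke everywhere, we had to leave early."},
]

incidents = find_reviews(reviews, ["fire", "smoke"])
for review in incidents:
    print(review["venue"], "-", review["text"])
```

Matching on whole words keeps a query for "fire" from also returning reviews that only mention a "fireplace".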

Negative review-Peru


In March, there were 5 complaints about the bus connection to Machu Picchu

Negative review-Venice


In March, there were 5 complaints about the bridge
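Monthly counts like the ones in the Peru and Venice examples can be produced by grouping negative reviews by month. The sketch below is a simplification under assumptions: it treats a low star rating as a stand-in for the output of sentiment analysis, and the field names, the threshold, the dates and the review texts are all hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_complaints(reviews, keyword, max_rating=2):
    """Count low-rated reviews mentioning the keyword, per (year, month)."""
    counts = Counter()
    for review in reviews:
        is_negative = review["rating"] <= max_rating  # assumed proxy for sentiment
        # Plain substring match, so "bus" would also match "business".
        mentions = keyword.lower() in review["text"].lower()
        if is_negative and mentions:
            counts[(review["date"].year, review["date"].month)] += 1
    return counts


# Hypothetical records, invented for illustration; not project data.
reviews = [
    {"date": date(2019, 3, 2), "rating": 1, "text": "The bus was an hour late."},
    {"date": date(2019, 3, 15), "rating": 2, "text": "Bus queue took forever."},
    {"date": date(2019, 3, 20), "rating": 5, "text": "Smooth bus ride up the hill."},
    {"date": date(2019, 4, 1), "rating": 1, "text": "Overpriced tickets."},
]

for (year, month), count in sorted(monthly_complaints(reviews, "bus").items()):
    print(f"{year}-{month:02d}: {count} complaint(s)")
```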

Nicosia


Negative reviews within city clusters in Nicosia
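As an illustration of how venues might be grouped into city clusters before counting negative reviews, the sketch below links venues that lie within a fixed distance of each other. It is a simple stand-in (single-linkage grouping on great-circle distance), not necessarily the clustering method used in the project. The coordinates, the 0.5 km threshold and the review counts are hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_by_distance(points, max_km):
    """Label points so that any two within max_km end up in the same cluster."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= max_km:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


# Hypothetical venues, invented for illustration; not project data.
venues = [
    {"name": "Venue A", "coords": (35.1700, 33.3600), "negative_reviews": 3},
    {"name": "Venue B", "coords": (35.1705, 33.3610), "negative_reviews": 2},
    {"name": "Venue C", "coords": (35.1900, 33.3800), "negative_reviews": 1},
]

labels = cluster_by_distance([v["coords"] for v in venues], max_km=0.5)
negatives_per_cluster = Counter()
for venue, label in zip(venues, labels):
    negatives_per_cluster[label] += venue["negative_reviews"]

print(labels)
print(dict(negatives_per_cluster))
```

The pairwise comparison is quadratic in the number of venues, which is acceptable for a single city but would need a spatial index for much larger datasets.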

Sample results (1)

Nicosia result: Cluster plotting

Nicosia result: Topic modelling using LDA algorithm

Nicosia result: Public opinion based on LDA results

Some recommendations for future studies and interns:

  • Add more data from multiple sources, e.g. Twitter, Google reviews, etc.

  • Perform more comprehensive data cleaning and filtering

  • Create better visualizations, e.g. a dashboard, a website with recommendation features, time series

  • Be proactive, always eager to learn more

  • The task assigned to you may be something you have never worked on before; always look for references in books, websites and forums, or ask your colleagues.
