스노우플레이크 해커톤 도전기: 토지가격과 인구밀집도의 숨은 관계 파헤치기 🔍🏙️

안녕하세요, 여러분! 평소에 데이터에 관심이 많아서 관련 컨퍼런스도 몇 번 참여했던 스노우플레이크에서 해커톤을 진행한다는 사실을 얼마 전에 알았습니다. 🤓✨

게다가 1위 상품은 M4 맥북 에어인데요. 제가 한번 맛나게 가져가보겠습니다! 💻🏆

그래서 오늘부터 약 3일간 해커톤 준비를 해보려고 합니다. 짧은 시간이지만 최선을 다해볼게요!

🤔 어떤 주제로 도전할까? 해커톤 주제 선정 과정

해커톤에 참가하기로 결정했을 때 가장 먼저 맞닥뜨린 문제는 바로 ‘주제 선정’이었어요. 수많은 데이터와 가능성 속에서 무엇을 선택할지 고민이 되더라고요. 그러다 미시경제학 수업 시간에 들었던 한 가지 질문이 떠올랐습니다.

“토지가 비싸기 때문에 사람이 많이 모이는 걸까, 사람이 많이 모이기 때문에 토지가 비싼 걸까?” 🏢💰

이 닭이 먼저냐 달걀이 먼저냐 같은 질문이 문득 흥미롭게 느껴졌어요. 마침 마샬의 지가이론과 관련된 내용도 공부하고 있었거든요! 그래서 이번 기회에 실제 데이터로 이 관계를 증명해보면 어떨까 하는 생각이 들었죠. 💡

📊 주어진 데이터 살펴보기

스노우플레이크 해커톤에서는 정말 풍부한 데이터를 제공해 주었어요:

부동산 데이터: 서울 3개구(서초구, 영등포구, 중구)의 아파트 시세 (2020-2024)
인구 데이터: 행정안전부 제공 인구 통계
유동인구 데이터: SKT 유동인구 데이터 (2021-2023)
소비 데이터: 신한카드 소비내역
소득 데이터: KCB 자산소득 데이터

이 데이터들을 보자마자 ‘이거다!’ 싶었어요. 아파트 시세는 토지 가치를 간접적으로 보여주고, 인구와 유동인구 데이터는 사람들의 밀집도를 나타내니까요. 이 두 변수 사이의 관계를 분석하면 제 질문에 답할 수 있을 것 같았습니다. 🧐📈

🧪 분석 방법론 정하기

자, 이제 어떻게 분석할지 방법론을 정해야 했어요. 사실… 제가 파이썬 모델링을 한 번도 해본 적이 없어서 처음에는 좀 두려웠어요. 😅 그래도 도전해보기로 했습니다!

주요 분석 방법은 다음과 같이 정했어요:

그랜저 인과관계 검정(Granger Causality Test) 👉 두 변수 간의 시간적 선후관계 검증
구조방정식 모델링(SEM) 👉 토지가격-인구밀집도-상권발달의 순환 구조 분석
패널 데이터 분석 👉 지역별 특성을 통제하면서 관계 분석

그리고 기술 스택으로는:

Snowflake 👉 대용량 데이터 처리
Python 👉 통계 분석 및 모델링
Streamlit 👉 인터랙티브 시각화

파이썬을 처음 사용하지만, 이번 기회에 배워보자는 마음으로 도전하기로 했습니다! 💪

🚀 프로젝트 흐름 계획하기

단 3일이라는 짧은 시간 안에 프로젝트를 완성해야 하므로, 압축적인 로드맵을 다음과 같이 설계했어요:

1일차: 데이터 준비 및 탐색적 분석 🛠️🔎

Snowflake 연결 및 데이터 로드
기본 데이터 전처리 (결측치 처리, 이상치 제거 등)
기본 상관관계 분석
지역별 특성 탐색

2일차: 인과관계 및 패널 데이터 분석 ⚖️📊

그랜저 인과관계 검정
구조방정식 모델링
고정효과 모델 구축
지역별 특성 통제 분석

3일차: 시각화 및 발표 자료 준비 💻📝

Streamlit 대시보드 개발
인터랙티브 시각화 구현
인사이트 도출
피치 덱 작성

🎯 비즈니스 임플리케이션

이 분석을 통해 얻을 수 있는 비즈니스적 가치도 정리해봤어요:

부동산 개발사 👉 부동산 가격 상승 가능성이 높은 지역 예측
정책 입안자 👉 주택 공급 정책의 효과 예측 모델
소매업 👉 상권 발달 예측을 통한 입지 선정

😅 솔직한 고민들

사실 이 모든 계획을 세우고 나니, ‘내가 이걸 할 수 있을까?’ 하는 의문이 들었어요. 파이썬 모델링 경험이 전혀 없는데다, 통계 분석도 학부 수준의 지식밖에 없거든요. 게다가 이번에는 혼자서 진행하는 해커톤이라 모든 걸 스스로 해결해야 한다는 부담감도 있습니다! 🤯

하지만 해커톤은 배움의 장이라고 생각하며 도전해보기로 했습니다. 처음부터 완벽하게 할 필요는 없으니, 기본부터 차근차근 배워나가면 될 것 같아요. 무엇보다 이런 복잡한 미시경제학적 질문을 데이터로 풀어보는 과정 자체가 정말 흥미롭거든요. 😊

🌟 마치며

해커톤은 아직 시작하지 않았지만, 준비 과정에서 이미 많은 것을 배우고 있어요. 경제학 이론을 실제 데이터로 검증해본다는 생각에 설렙니다!

다음 포스트에서는 실제 해커톤 진행 과정과 분석 결과를 공유해 드릴게요. 여러분도 관심 있는 주제가 있다면, 데이터를 통해 분석해보는 도전을 해보세요! 💯

그럼 다음 포스트에서 만나요~ 👋

#스노우플레이크해커톤 #데이터분석 #경제학 #토지가격 #인구밀집도 #파이썬입문 #도전기

My Snowflake Hackathon Journey: Uncovering the Hidden Relationship Between Land Prices and Population Density 🔍🏙️

Hello everyone! I’ve always been interested in data and have attended a few Snowflake conferences, so I was excited to find out recently that Snowflake is hosting a hackathon! 🤓✨

And guess what? The first prize is an M4 MacBook Air! I’m definitely going to try to win that beautiful machine! 💻🏆

So, I’m starting my hackathon preparation today and will be working intensively for the next 3 days. It’s a short timeframe, but I’m determined to make the most of it!

🤔 What Topic Should I Choose? The Selection Process

When I decided to participate in the hackathon, the first challenge I faced was selecting a topic. With so much data and so many possibilities, it was hard to choose. Then one question from my microeconomics class came back to me:

“Do people cluster together because land is expensive, or is land expensive because people cluster together?” 🏢💰

This chicken-or-egg question suddenly seemed fascinating to me. I had been studying Marshall’s land value theory in my microeconomics class, and I thought this would be the perfect opportunity to test this relationship with real data! 💡

📊 Exploring the Available Data

The Snowflake Hackathon provided an incredible wealth of data:

Real Estate Data: Apartment prices in three Seoul districts (Seocho-gu, Yeongdeungpo-gu, Jung-gu) from 2020-2024
Population Data: Official population statistics from the Ministry of the Interior and Safety
Floating Population Data: SKT floating population data (2021-2023)
Consumption Data: Shinhan Card transaction records
Income Data: KCB asset and income data

Looking at these datasets, I had my “eureka” moment! Apartment prices could serve as a proxy for land value, while population and floating population data represented people’s concentration. Analyzing the relationship between these variables could answer my question. 🧐📈

🧪 Developing the Methodology

Now I needed to decide how to analyze the data. I’ll be honest… I’ve never done Python modeling before, so I was a bit intimidated at first. 😅 But I decided to give it a try anyway!

I selected these primary analytical methods:

Granger Causality Test 👉 To test temporal precedence between the two variables, i.e. whether past values of one help predict the other
Structural Equation Modeling (SEM) 👉 To analyze the circular structure of land price-population density-commercial development
Panel Data Analysis 👉 To analyze relationships while controlling for regional characteristics
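To make the Granger idea concrete for myself, here is a minimal sketch in plain Python. It is not the hackathon analysis: the two series are synthetic, the names `density` and `price` are placeholders I made up, and the lag length of 2 is an arbitrary assumption. The test compares a model that predicts a series from its own past against one that also uses the other series’ past, and the F statistic measures how much the extra lags help.

```python
import random

def ols_rss(X, y):
    """Fit y = X b by least squares (normal equations); return the residual sum of squares."""
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def granger_f(x, y, lags=2):
    """F statistic for 'lagged x helps predict y beyond lagged y alone'."""
    restricted, unrestricted, target = [], [], []
    for t in range(lags, len(y)):
        own = [y[t - i] for i in range(1, lags + 1)]
        other = [x[t - i] for i in range(1, lags + 1)]
        restricted.append([1.0] + own)            # y's own past only
        unrestricted.append([1.0] + own + other)  # plus x's past
        target.append(y[t])
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    dof = len(target) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic data (an assumption for illustration): density leads price by one step.
random.seed(0)
n = 200
density = [random.gauss(0, 1) for _ in range(n)]
price = [0.0]
for t in range(1, n):
    price.append(0.5 * price[t - 1] + 0.8 * density[t - 1] + random.gauss(0, 0.5))

f_density_to_price = granger_f(density, price)
f_price_to_density = granger_f(price, density)
print(f"density -> price F = {f_density_to_price:.1f}")
print(f"price -> density F = {f_price_to_density:.1f}")
```

Because the synthetic price series is built from lagged density, the first F statistic should come out far larger than the second. With real data I would probably reach for `grangercausalitytests` in statsmodels instead of hand-rolling this, and I would need to make the series stationary first (for example by differencing), since the test assumes that.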

And for the technology stack:

Snowflake 👉 For processing large datasets
Python 👉 For statistical analysis and modeling
Streamlit 👉 For interactive visualization

Even though Python is new to me, I decided to embrace this as a learning opportunity! 💪

🚀 Planning the Project Flow

With only 3 days to complete the project, I designed a compressed roadmap:

Day 1: Data Preparation and Exploratory Analysis 🛠️🔎

Setting up Snowflake connections and loading data
Basic data preprocessing (handling missing values, removing outliers)
Basic correlation analysis
Exploring regional characteristics

Day 2: Causality and Panel Data Analysis ⚖️📊

Granger causality testing
Structural equation modeling
Building fixed-effects models
Analysis controlling for regional differences
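To convince myself why the fixed-effects step matters, I put together this small sketch. The panel is invented (three districts, three periods each, with made-up index values), and the helper names are hypothetical. The idea is the “within” transformation: subtract each district’s own average from its observations, so that permanent differences between districts drop out before the slope is estimated.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical panel rows: (district, floating population index, price index).
# Invented numbers: each district has a different baseline price level.
panel = [
    ("Seocho-gu", 1.0, 21.0), ("Seocho-gu", 2.0, 23.1), ("Seocho-gu", 3.0, 24.9),
    ("Yeongdeungpo-gu", 4.0, 13.0), ("Yeongdeungpo-gu", 5.0, 15.1), ("Yeongdeungpo-gu", 6.0, 16.9),
    ("Jung-gu", 7.0, 5.0), ("Jung-gu", 8.0, 7.1), ("Jung-gu", 9.0, 8.9),
]

def slope(xs, ys):
    """Simple OLS slope of ys on xs."""
    mx, my = fmean(xs), fmean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def within_transform(rows):
    """Subtract each district's own mean from x and y."""
    groups = defaultdict(list)
    for district, x, y in rows:
        groups[district].append((x, y))
    xs, ys = [], []
    for obs in groups.values():
        mx = fmean(x for x, _ in obs)
        my = fmean(y for _, y in obs)
        xs += [x - mx for x, _ in obs]
        ys += [y - my for _, y in obs]
    return xs, ys

# Pooled regression ignores districts; the fixed-effects slope uses demeaned data.
pooled = slope([x for _, x, _ in panel], [y for _, _, y in panel])
fixed_effects = slope(*within_transform(panel))
print(f"pooled slope: {pooled:.2f}, fixed-effects slope: {fixed_effects:.2f}")
```

In this toy panel the pooled slope comes out negative, because the districts with the highest population index happen to have the lowest baseline prices, while the within-district slope is close to 2. That gap is exactly the kind of distortion that controlling for regional characteristics is meant to remove.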

Day 3: Visualization and Presentation Preparation 💻📝

Streamlit dashboard development
Implementing interactive visualizations
Drawing key insights
Creating the pitch deck

🎯 Business Implications

I also considered the business value this analysis could provide:

Real Estate Developers 👉 Predicting areas likely to experience price increases
Policy Makers 👉 Models to forecast effects of housing supply policies
Retail Businesses 👉 Site selection based on commercial area development predictions

😅 My Honest Concerns

After making all these plans, I did wonder, “Can I really do this?” I have no experience with Python modeling, and my statistics knowledge is limited to undergraduate courses. Plus, I’m doing this hackathon solo, which means I’ll have to handle everything by myself! 🤯

But I see a hackathon as a place to learn, so I decided to go for it. I don’t need to be perfect from the start, and I can pick up the basics step by step as I go. Most importantly, I find the process of answering complex microeconomic questions with data absolutely fascinating. 😊

🌟 Concluding Thoughts

The hackathon hasn’t even started yet, but I’m already learning so much during the preparation process. I’m excited about testing economic theory with real data!

In my next post, I’ll share the actual hackathon process and analysis results. If you have a topic you’re interested in, I encourage you to challenge yourself to analyze it through data! 💯

See you in the next post! 👋

#SnowflakeHackathon #DataAnalysis #Economics #LandPrices #PopulationDensity #PythonBeginner #MyJourney

Snowflake Hackathon – First Place Is Mine (M4 MacBook Air, Yum)