Python으로 MongoDB 이용하기 01

PyMongo Tutorial

PyMongo 라이브러리를 통해 Python으로 MongoDB를 이용할 수 있다. 우선, MongoDB를 로컬이나 클라우드 환경에 설치한 다음 PyMongo의 MongoClient 메서드를 이용해 DB에 연결할 수 있다. PyMongo 공식 문서의 내용을 바탕으로 한글로 재구성하였으며, 상세한 설명은 원문을 참고하면 된다.

MongoDB 관련 글

PyMongo 설치

환경

Windows 10 pro
Python: 3.8.3
PyMongo: 3.11.0

CMD를 열어 아래 명령어를 실행하면 PyMongo가 설치된다.

pip install pymongo

DB연결 - MongoClient

MongoDB에 연결하는 방법은 두 가지가 있다. 첫 번째 방법은 MongoClient의 값으로 MongoDB 서버 URI를 파라미터로 입력하는 것이고, 두 번째 방법은 MongoDB 서버의 호스트와 포트 각각의 값을 파라미터로 입력하는 것이다. 나는 로컬 환경에서 MongoDB를 실행 중이어서 고정IP 대신 localhost를 입력하였다.실제 운영중인 DB 서버를 연결하려면 localhost 대신 운영중인 DB 서버의 URI를 입력하거나 HOST, PORT를 입력하면 된다. (계정 정보가 있는 경우는 username과 password 인자를 추가하면 된다.)

from pymongo import MongoClient

# 방법1 - URI
# mongodb_URI = "mongodb://localhost:27017/"
# client = MongoClient(mongodb_URI)

# 방법2 - HOST, PORT
client = MongoClient(host='localhost', port=27017)

print(client.list_database_names())

['admin', 'config', 'local']

DB 접근

DB에 접근하는 방법도 두 가지가 있다. 하나는 메서드 형태로 . 뒤에 DB명을 입력하는 것이고, 다른 하나는 Dictionary를 인덱싱하는 형태로 대괄호 안에 DB명 문자열을 입력하는 것이다.

# DB 접근
# 방법1
# db = client.mydb

# 방법2
db = client['mydb']

Collection 접근

Collection 접근 방법도 DB 접근 방법과 유사하다.

# Collection 접근
# 방법1
# collection = db.myCol
# 방법2
collection = db['myCol']

Documents

Document 생성

MongoDB는 Data를 JSON 스타일의 Document로 저장한다. JSON과 유사한 Python Dictionary 데이터를 MongoDB Collection에 Document로 저장할 수 있다.

import datetime
post = {"author": "Mike",
        "text": "My first blog post!",
        "tags": ["mongodb", "python", "pymongo"],
        "date": datetime.datetime.utcnow()
        }
post

{'author': 'Mike',
 'text': 'My first blog post!',
 'tags': ['mongodb', 'python', 'pymongo'],
 'date': datetime.datetime(2020, 11, 4, 12, 42, 56, 217943)}

Collection 접근 및 Document 추가

posts라는 Collection에 접근하여 위에서 만든 Dictionary 데이터를 추가한다. 이 때, 데이터 추가 후 자동으로 생성되는 _id 값을 post_id에 저장한다.

# Collection 접근 - 'posts' Collection
posts = db.posts

# Document 추가 - insert_one() 메서드 이용
post_id = posts.insert_one(post).inserted_id
print(post_id)

5fa2a1d5a7bd62cd633fdb5b

db 인스턴스의 list_collection_names()를 호출하면 DB에 존재하는 Collection들의 목록을 출력할 수 있다.

# Collection 리스트 조회
db.list_collection_names()

['posts']

단일 Document 조회 - find_one() 메서드 이용

find_one() 메서드는 MongoDB에서 사용할 수 있는 가장 기본적인 쿼리이다. 이 메서드는 쿼리와 일치하는 단일 Document를 불러온다. 일치하는 항목이 없다면 아무것도 불러오지 않는다. 쿼리와 일치하는 Document가 해당 Collection에 하나만 있다는 것을 확인할 때 유용하다. 혹은 쿼리와 일치하는 첫 번째 Document를 불러올 때도 유용하다.

# Collection 내 단일 Document 조회
import pprint
pprint.pprint(posts.find_one())

{'_id': ObjectId('5fa2a1d5a7bd62cd633fdb5b'),
 'author': 'Mike',
 'date': datetime.datetime(2020, 11, 4, 12, 42, 56, 217000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}

find_one() 메서드는 SQL 문법의 WHERE 구문과 같이 특정 요소를 필터링할 수 있는 쿼리도 가능하다. 아래는 posts Collection에서 author가 Mike인 단일 Document를 불러오는 예제이다.

# 쿼리를 통한 Documents 조회
pprint.pprint(posts.find_one({"author": "Mike"}))

{'_id': ObjectId('5fa2a1d5a7bd62cd633fdb5b'),
 'author': 'Mike',
 'date': datetime.datetime(2020, 11, 4, 12, 42, 56, 217000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}

# _id를 통한 Documents 조회 - _id는 binary json 타입으로 조회해야 함
pprint.pprint(posts.find_one({"_id": post_id}))
print(type(post_id))

{'_id': ObjectId('5fa2a1d5a7bd62cd633fdb5b'),
 'author': 'Mike',
 'date': datetime.datetime(2020, 11, 4, 12, 42, 56, 217000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
<class 'bson.objectid.ObjectId'>

# _id 값이 str인 경우 조회 안 됨.
post_id_as_str = str(post_id)
pprint.pprint(posts.find_one({"_id": post_id_as_str}))

None

# _id 값이 str인 경우 bson(binary json) 변환 후 조회
from bson.objectid import ObjectId

bson_id = ObjectId(post_id_as_str)
pprint.pprint(posts.find_one({"_id": bson_id}))

{'_id': ObjectId('5fa2a1d5a7bd62cd633fdb5b'),
 'author': 'Mike',
 'date': datetime.datetime(2020, 11, 4, 12, 42, 56, 217000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}

여러 Documents 추가 - insert_many() 메서드 이용

insert_many()에 여러 딕셔너리가 포함된 리스트를 파라미터로 입력해주면, 복수 개의 Documents를 Collection에 추가할 수 있다.

# Bulk Insert

new_posts = [
    
    {
     "author": "Mike",
     "text": "Another post!",
     "tags": ["bulk", "insert"],
     "date": datetime.datetime(2009, 11, 12, 11, 14)
    },
     
    {
     "author": "Eliot",
     "title": "MongoDB is fun",
     "text": "and pretty easy too!",
     "date": datetime.datetime(2009, 11, 10, 10, 45)
    }
    
]

result = posts.insert_many(new_posts)
result.inserted_ids

[ObjectId('5fa2a1eda7bd62cd633fdb5c'), ObjectId('5fa2a1eda7bd62cd633fdb5d')]

여러 Documents 조회 - find() 메서드 이용

# Collection 내 모든 Documents 조회
for post in posts.find():
    pprint.pprint(post)

{'_id': ObjectId('5fa2a1d5a7bd62cd633fdb5b'),
 'author': 'Mike',
 'date': datetime.datetime(2020, 11, 4, 12, 42, 56, 217000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
{'_id': ObjectId('5fa2a1eda7bd62cd633fdb5c'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['bulk', 'insert'],
 'text': 'Another post!'}
{'_id': ObjectId('5fa2a1eda7bd62cd633fdb5d'),
 'author': 'Eliot',
 'date': datetime.datetime(2009, 11, 10, 10, 45),
 'text': 'and pretty easy too!',
 'title': 'MongoDB is fun'}

# 쿼리를 통한 Documents 조회
for post in posts.find({"author": "Mike"}):
    pprint.pprint(post)

{'_id': ObjectId('5fa2a1d5a7bd62cd633fdb5b'),
 'author': 'Mike',
 'date': datetime.datetime(2020, 11, 4, 12, 42, 56, 217000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
{'_id': ObjectId('5fa2a1eda7bd62cd633fdb5c'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['bulk', 'insert'],
 'text': 'Another post!'}

카운팅

# 컬렉션 내 도큐먼트 수 조회
posts.count_documents({})

# 쿼리를 통한 도큐먼트 수 조회
posts.count_documents({"author": "Mike"})

범위 쿼리

MongoDB는 여러 고급 쿼리를 사용할 수 있다. 예를 들어, 특정 날짜 이전의 Document들을 author별로 정렬하려면 아래와 같이 쿼리를 사용하면 된다.

# 범위 쿼리
d = datetime.datetime(2009, 11, 12, 12)
for post in posts.find({"date": {"$lt": d}}).sort("author"):
    pprint.pprint(post)

{'_id': ObjectId('5fa2a1eda7bd62cd633fdb5d'),
 'author': 'Eliot',
 'date': datetime.datetime(2009, 11, 10, 10, 45),
 'text': 'and pretty easy too!',
 'title': 'MongoDB is fun'}
{'_id': ObjectId('5fa2a1eda7bd62cd633fdb5c'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['bulk', 'insert'],
 'text': 'Another post!'}

인덱싱

import pymongo
# Indexing
result = db.profiles.create_index([('user_id', pymongo.ASCENDING)],
                                  unique=True)

# 두 개의 인덱스 확인
sorted(list(db.profiles.index_information()))

['_id_', 'user_id_1']

# 유저 정보 관련 도큐먼트 생성
user_profiles = [
    
    {"user_id": 211, "name": "Luke"},
    {"user_id": 212, "name": "Ziltoid"}
    
]

# 도큐먼트 추가
result = db.profiles.insert_many(user_profiles)

# 이미 컬렉션에 있는 user_id이면 추가 방지
new_profile = {"user_id": 213, "name": "Drew"}
duplicate_profile = {"user_id": 212, "name": "Tommy"}

# user_id 신규 이므로 정상 추가 됨
result = db.profiles.insert_one(new_profile)

try:
    # user_id 기존에 있으므로 추가 안 됨
    result = db.profiles.insert_one(duplicate_profile)
except:
    print("Error")

Error

for doc in db.profiles.find():
    pprint.pprint(doc)

{'_id': ObjectId('5fa2a1f4a7bd62cd633fdb5e'), 'name': 'Luke', 'user_id': 211}
{'_id': ObjectId('5fa2a1f4a7bd62cd633fdb5f'), 'name': 'Ziltoid', 'user_id': 212}
{'_id': ObjectId('5fa2a1f5a7bd62cd633fdb60'), 'name': 'Drew', 'user_id': 213}

MongoDB 관련 글

Twitter Facebook LinkedIn

Python으로 MongoDB 이용하기 01

PyMongo Tutorial

PyMongo Tutorial

MongoDB 관련 글

PyMongo 설치

DB연결 - MongoClient

DB 접근

Collection 접근

Documents

Document 생성

Collection 접근 및 Document 추가

단일 Document 조회 - find_one() 메서드 이용

여러 Documents 추가 - insert_many() 메서드 이용

여러 Documents 조회 - find() 메서드 이용

카운팅

범위 쿼리

인덱싱

MongoDB 관련 글

공유하기

댓글남기기

참고

Spark Kafka 설치 방법(Docker Compose)

Running LLM locally with GGUF files

GGUF 파일로 로컬에서 LLM 실행하기

LLM 모델 저장 형식 GGML, GGUF