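This week covers Performance, mainly indexes. MongoDB uses B-trees to keep keys ordered. Building the right indexes can significantly improve read performance, while writes become slower because they also have to update the index.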
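As a rough illustration of that trade-off, the sketch below keeps keys in a sorted Python list as a stand-in for a B-tree (a simplification for brevity, not how MongoDB stores its indexes). Lookups use binary search, while every insert pays to keep the keys ordered. The class and method names are hypothetical.

```python
import bisect


class SortedIndex:
    """Toy ordered index: a sorted list of (key, doc_position) pairs."""

    def __init__(self):
        self.entries = []

    def insert(self, key, position):
        # Extra work on every write: find the slot and keep keys ordered.
        bisect.insort(self.entries, (key, position))

    def find(self, key):
        # Binary search for the first entry with this key, then walk forward.
        i = bisect.bisect_left(self.entries, (key,))
        positions = []
        while i < len(self.entries) and self.entries[i][0] == key:
            positions.append(self.entries[i][1])
            i += 1
        return positions


index = SortedIndex()
for pos, student_id in enumerate([30, 10, 20, 10]):
    index.insert(student_id, pos)
print(index.find(10))
```

Without the index, finding student_id 10 would mean checking every document; with it, the search touches only a few entries, at the cost of an ordered insert per write.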
Some examples:
db.students.ensureIndex({
student_id: 1,
class: -1
}, {
unique: true,
dropDups: true,
sparse: true,
background: true
})
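//create an index
//unique: guarantees that the keys in the index are unique
//dropDups: removes documents with duplicate keys; which one is kept is arbitrary
//sparse: the index only contains entries for documents that have the indexed
//field. Any document that is missing the field is not indexed.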
db.foo.ensureIndex({a:1, b:1})
//multikey index: a key whose value is an array (key:[array])
//invalid, at most one indexed field per document may be an array:
//db.foo.insert({a:[1,2,3], b:[5,6,7]})
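The multikey rule can be sketched in plain Python. This is a simplified model, not MongoDB's implementation: the helper `index_entries` is hypothetical and only shows that an array field fans out into one index entry per element, and that two array fields in the same compound index are rejected.

```python
def index_entries(doc, fields):
    """Return the index keys one document contributes to a compound index."""
    array_fields = [f for f in fields if isinstance(doc.get(f), list)]
    if len(array_fields) > 1:
        # Mirrors the server refusing to index parallel arrays.
        raise ValueError("cannot index parallel arrays: %s" % array_fields)
    if not array_fields:
        return [tuple(doc.get(f) for f in fields)]
    array_field = array_fields[0]
    entries = []
    for element in doc[array_field]:
        # One entry per array element, other fields repeated as-is.
        entries.append(
            tuple(element if f == array_field else doc.get(f) for f in fields)
        )
    return entries


print(index_entries({"a": [1, 2, 3], "b": 5}, ["a", "b"]))
```

Calling it with `{"a": [1, 2, 3], "b": [5, 6, 7]}` raises the error, which corresponds to the invalid insert noted above.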
db.places.ensureIndex({location:'2d',type:1})
db.places.find({location:{$near:[74,140]}}).limit(3)
//geo index~
db.system.indexes.find()
//list all indexes in the current db
db.students.getIndexes()
//list the indexes on the students collection
db.students.dropIndex({student_id:1})
//drop an index
db.students.stats()
db.students.totalIndexSize()
//total size of the indexes; indexes are not free
db.students.find({}).hint({student_id:1, class:-1})
//tell MongoDB which index to use
//In pymongo, the argument to hint() is a list of tuples.
db.system.profile.find()
db.system.profile.find({millis:{$gt:1000}}).sort({ts:-1})
mongod --profile 1 --slowms 100
db.setProfilingLevel(1,100)
db.getProfilingLevel()
//Useful for performance tuning
//Some comments:
//Use dot notation to index fields inside embedded documents
//Index creation runs in the foreground by default;
//it is faster but may block other writers
//A background index creation still blocks the mongo shell that
//you are using to create the index, although the database server
//will continue to take requests.
//A mongod instance can only build one background index at a time
//per database.
//$gt/$lt/$ne/$exists/$regex may not use an index efficiently
//index cardinality
//regular : 1:1
//sparse : <= collection documents
//multikey : >> collection documents
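To make the three cardinalities concrete, here is a small sketch that counts how many entries a single-field index would hold. It is a simplified model (the function name and sample documents are made up, and an empty array is treated as one entry).

```python
def count_index_entries(docs, field, sparse=False):
    """Count the entries a single-field index would hold for docs."""
    total = 0
    for doc in docs:
        if field not in doc:
            # A sparse index skips documents missing the field;
            # a regular index still stores one entry for them.
            total += 0 if sparse else 1
        elif isinstance(doc[field], list):
            # Multikey: one entry per distinct array element.
            total += len(set(doc[field])) or 1
        else:
            total += 1
    return total


docs = [
    {"name": "a", "nick": "al", "tags": ["x", "y", "z"]},
    {"name": "b", "tags": ["x", "y"]},
    {"name": "c", "tags": ["z", "w", "v"]},
]
regular = count_index_entries(docs, "name")
sparse = count_index_entries(docs, "nick", sparse=True)
multikey = count_index_entries(docs, "tags")
print(regular, sparse, multikey)
```

With three documents, the regular index has 3 entries, the sparse index on `nick` has 1, and the multikey index on `tags` has 8.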
Explain command
For an explanation of each field, see: http://docs.mongodb.org/manual/reference/explain/
db.zips.find({"_id":"35004"}).explain()
{
"cursor" : "BtreeCursor _id_", // uses an index; BasicCursor would mean no index was used
"isMultiKey" : false,
"n" : 1, // number of documents returned
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0, // query time in ms; the built-in slow-query threshold is 100 ms
"indexBounds" : {
"start" : {
"_id" : "35004"
},
"end" : {
"_id" : "35004"
}
},
"server" : "freyr.lan:27017"
}
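A quick way to read such output is to check a few fields mechanically. The sketch below is only an illustration over the explain format shown above; the function name and the "10x scanned vs returned" threshold are arbitrary assumptions, not anything MongoDB defines.

```python
def review_explain(plan):
    """Return a list of warnings for an explain() document in the format above."""
    warnings = []
    if plan.get("cursor", "").startswith("BasicCursor"):
        warnings.append("no index used (full collection scan)")
    if plan.get("scanAndOrder"):
        warnings.append("sort done in memory, not by the index")
    returned = plan.get("n", 0)
    scanned = plan.get("nscanned", 0)
    # Assumed rule of thumb: scanning far more than is returned is suspicious.
    if scanned > 10 * max(returned, 1):
        warnings.append("scanned %d entries to return %d" % (scanned, returned))
    if plan.get("millis", 0) > 100:
        warnings.append("slower than the 100 ms slow-query threshold")
    return warnings


good_plan = {"cursor": "BtreeCursor _id_", "n": 1, "nscanned": 1,
             "scanAndOrder": False, "millis": 0}
bad_plan = {"cursor": "BasicCursor", "n": 1, "nscanned": 29467,
            "scanAndOrder": True, "millis": 150}
print(review_explain(good_plan))
print(review_explain(bad_plan))
```

The `bad_plan` values are invented for the example; the `good_plan` values are taken from the output above.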
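Week5 Aggregation
This week introduces the Aggregation framework; for a comparison with traditional SQL, see the SQL mapping chart.
Pipeline concept
The basic idea of Aggregation is much like Unix pipes, a chained series of operations: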
$project -> $match -> $group -> $sort -> $skip -> $limit -> $unwind (unjoin data)
1:1      -> n:1    -> n:1    -> 1:1   -> n:1   -> n:1    -> 1:n
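The pipe analogy can be sketched with Python generators, each stage consuming the output of the previous one. This is a toy model, not the aggregation framework itself; the stage function names and the sample `posts` data are hypothetical.

```python
from collections import defaultdict


def match_stage(docs, predicate):      # n:1, drops documents
    return (d for d in docs if predicate(d))


def unwind_stage(docs, field):         # 1:n, one output per array element
    for d in docs:
        for value in d[field]:
            out = dict(d)
            out[field] = value
            yield out


def group_count_stage(docs, key):      # n:1, one output per distinct key
    counts = defaultdict(int)
    for d in docs:
        counts[d[key]] += 1
    return ({"_id": k, "count": v} for k, v in counts.items())


def sort_stage(docs, key):             # 1:1, same documents, new order
    return iter(sorted(docs, key=lambda d: d[key]))


posts = [
    {"author": "a", "tags": ["db", "nosql"]},
    {"author": "b", "tags": ["db"]},
    {"author": "c", "tags": []},
]
matched = match_stage(posts, lambda d: d["tags"])
unwound = unwind_stage(matched, "tags")
grouped = group_count_stage(unwound, "tags")
result = list(sort_stage(grouped, "_id"))
print(result)
```

Three posts become two after the match, three after the unwind, and two groups at the end, which matches the n:1 and 1:n labels on the stages.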
Some examples:
db.products.aggregate([{
$group: {
_id: "$category",
"num_products": {
$sum: 1
}
}
}])
//A little like upsert: iterates over all documents and returns new result documents
{_id:{"manufacture":"$manufacture",category:"$category"}}
//group by multiple keys:
//compound _id
//_id can be a document, but it must be unique
db.zips.aggregate([{
$group: {
_id: "$state",
population: {
$sum: "$pop"
}
}
}])
//total population of each state
db.zips.aggregate([{
$group: {
_id: "$state",
postal_codes: {
$addToSet: "$_id"
}
}
}])
//collect all postal codes of each state
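What `$group` does with `$sum` and `$addToSet` can be mimicked in a few lines of Python. This is a minimal sketch under simplifying assumptions: the `group` helper is hypothetical, accumulators only read plain fields (no `$sum: 1` literal), and the sample documents use made-up population numbers.

```python
def group(docs, key, accumulators):
    """accumulators maps output field -> (operator, source field)."""
    groups = {}
    for doc in docs:
        out = groups.setdefault(doc[key], {"_id": doc[key]})
        for name, (op, source) in accumulators.items():
            if op == "$sum":
                out[name] = out.get(name, 0) + doc[source]
            elif op == "$addToSet":
                values = out.setdefault(name, [])
                if doc[source] not in values:
                    values.append(doc[source])
            else:
                raise ValueError("unsupported accumulator: %s" % op)
    return list(groups.values())


zips = [
    {"_id": "35004", "state": "AL", "pop": 100},
    {"_id": "35005", "state": "AL", "pop": 50},
    {"_id": "10001", "state": "NY", "pop": 200},
]
by_state = group(zips, "state", {
    "population": ("$sum", "pop"),
    "postal_codes": ("$addToSet", "_id"),
})
print(by_state)
```

Each distinct `state` becomes one output document whose `_id` is the grouping key, the same shape the shell examples return.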
db.zips.aggregate([{
$project: {
_id: 0,
city: {
$toLower: "$city"
},
pop: "$pop",
state: "$state",
zip: "$_id"
}
}])
//or
db.zips.aggregate([{
$project: {
_id: 0,
city: {
$toLower: "$city"
},
pop: 1,
state: 1,
zip: "$_id"
}
}])
//reshape the documents
db.zips.aggregate([{$match:{pop:{$gt:100000}}}])
//zip codes with a population greater than 100000
db.zips.aggregate([{$sort:{state:1,city:1}}])
//sort by state, then city
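Limitations:
- the result is limited to a 16MB document
- a pipeline can use at most 10% of the memory on a machine
- sharding: part of the pipeline has to run on mongos
- alternatives: mapreduce/hadoop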
Week6 Application Engineering
This week discusses the following topics:
- Durability of writes
- Availability and fault tolerance
- Scaling
Write concern
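This can usually be configured in the MongoDB driver, e.g. pymongo. Worth mentioning: pymongo released a new version, pymongo 2.4, before the course had even finished.
The old Connection class now carries the warning: DEPRECATED: Please use mongo_client instead.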
import pymongo
m = pymongo.MongoClient()
print m.write_concern
# {}
m.write_concern = {'w': 2, 'wtimeout': 1000}
print m.write_concern
# {'wtimeout': 1000, 'w': 2}
m.write_concern['j'] = True
print m.write_concern
# {'wtimeout': 1000, 'j': True, 'w': 2}
m.write_concern = {'j': True}
print m.write_concern
# {'j': True}
m.write_concern['w'] = 0
# w=0 disables write acknowledgement
# SafeMode == {'w': 1,'j':False}
# w=n: wait for the write to be acknowledged by n nodes
# j: journal, wait until the write is committed to the journal
# w='majority'
# wtimeout=1000
# with 3 mongod nodes, w=4 and no wtimeout, the write waits for the
# TCP timeout, which may be more than 10 minutes
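The interaction of `w`, the number of nodes and `wtimeout` can be summarized in a small decision function. This is only a sketch of the rules listed in the comments above, assuming a healthy replica set; the function names and the returned strings are hypothetical and nothing here talks to a server.

```python
def majority(members):
    """Smallest number of nodes that forms a majority of the set."""
    return members // 2 + 1


def write_outcome(w, members, wtimeout=None):
    """Describe what a write with this concern would do on a healthy set."""
    needed = majority(members) if w == "majority" else w
    if needed == 0:
        return "unacknowledged"
    if needed <= members:
        return "acknowledged by %d node(s)" % needed
    # More acknowledgements requested than there are nodes.
    if wtimeout is None:
        return "blocks until the connection times out"
    return "fails after %d ms (wtimeout)" % wtimeout


print(write_outcome(4, 3))
print(write_outcome(4, 3, wtimeout=1000))
print(write_outcome("majority", 3))
```

The first call is the trap described above: `w` larger than the set with no `wtimeout`. Setting `wtimeout` turns an indefinite wait into a bounded failure.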
Possible network errors
- The TCP network connection between the application and the server was reset between the time of the write and the time of the getLastError call.
- The MongoDB server terminates between the write and the getLastError call.
- The network fails between the time of the write and the time of the getLastError call.
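In all three cases the application cannot tell whether the write was applied. One way to reason about it is sketched below with an in-memory stand-in for a collection (everything here, including `FakeCollection` and `AckLost`, is hypothetical and not part of pymongo): if the document carries its own `_id`, a retry that hits a duplicate key shows the first attempt had landed.

```python
class AckLost(Exception):
    """The write was sent, but no acknowledgement came back."""


class FakeCollection:
    def __init__(self, drop_first_ack=False):
        self.docs = {}
        self.drop_next_ack = drop_first_ack

    def insert(self, doc):
        if doc["_id"] in self.docs:
            raise KeyError("duplicate key: %r" % doc["_id"])
        self.docs[doc["_id"]] = doc       # the server applies the write
        if self.drop_next_ack:
            self.drop_next_ack = False
            raise AckLost()               # but the reply never arrives


def insert_with_retry(collection, doc):
    """Insert doc (which carries its own _id); retry once if the ack is lost."""
    try:
        collection.insert(doc)
        return "inserted"
    except AckLost:
        try:
            # On a real network the first write may not have been applied.
            collection.insert(doc)
            return "inserted on retry"
        except KeyError:
            # Duplicate _id: the first attempt did reach the server.
            return "first attempt had succeeded"


flaky = FakeCollection(drop_first_ack=True)
print(insert_with_retry(flaky, {"_id": 1, "name": "x"}))
```

This only works for operations that are safe to repeat; an update such as an increment would need a different approach.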
Replication
A replica set needs at least 3 mongod nodes. If the primary goes down, the secondaries elect a new primary; this is transparent to the client. When the old primary comes back up, it rejoins the set (failover and rollback, or copying data from the current primary).
Both the primary and the secondaries can serve reads, but only the primary accepts writes.
Replication between primary and secondary is asynchronous:
a secondary queries the primary's oplog.rs collection by timestamp.
Run `show collections` in the mongo shell to see more details.
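The pull-by-timestamp idea can be modeled in a few lines. This is a toy sketch, not the real oplog format: the `Node` class, integer timestamps and the `sync` function are all assumptions made for illustration.

```python
class Node:
    def __init__(self):
        self.data = {}
        self.oplog = []  # list of (timestamp, key, value), oldest first

    def write(self, ts, key, value):
        self.data[key] = value
        self.oplog.append((ts, key, value))

    def last_ts(self):
        return self.oplog[-1][0] if self.oplog else 0


def sync(secondary, primary):
    """Pull and apply every primary oplog entry newer than the secondary's last."""
    since = secondary.last_ts()
    pending = [op for op in primary.oplog if op[0] > since]
    for ts, key, value in pending:
        secondary.write(ts, key, value)
    return len(pending)


primary, secondary = Node(), Node()
primary.write(1, "a", 1)
primary.write(2, "b", 2)
lag_before = len(primary.oplog) - len(secondary.oplog)
applied = sync(secondary, primary)
print(lag_before, applied, secondary.data == primary.data)
```

Because the secondary only catches up when it pulls, there is a window in which it is behind the primary; that lag is what "asynchronous" means here.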
Replica set node types:
- regular
- arbiter
- delayed -> cannot become primary
- hidden -> never primary
Create:
mongod --replSet rs1 --logpath "1.log" --dbpath ~/data/rs1 --port 27017 --fork --shardsvr/--configsvr
mongo --port xxxx
Config:
config = {}
rs.initiate(config)
rs.status()
rs.slaveOk()
//can read from secondary now
rs.isMaster()
rs.conf()
rs.help()
Client
pymongo.MongoReplicaSetClient
Sharding
Needs a config server (another mongod instance).
Sharding implications:
1. Every document must contain the shard key
2. The shard key is immutable
3. The shard key should be indexed
4. An update must specify the shard key or set the `multi` parameter
5. A query without the shard key goes to all shards
6. No unique index unless it begins with, or is part of, the shard key
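Point 5 is easy to see with a toy router. The sketch below assumes range-based chunks on a hypothetical shard key `student_id`; the split points, shard names and `target_shards` function are invented for illustration and do not reflect how mongos stores its chunk metadata.

```python
import bisect

# Hypothetical chunk layout on shard key "student_id":
# keys < 100 -> shard0, 100..199 -> shard1, >= 200 -> shard2.
SPLIT_POINTS = [100, 200]
SHARDS = ["shard0", "shard1", "shard2"]


def target_shards(query, shard_key="student_id"):
    """Return the shards an equality query must be sent to."""
    if shard_key in query:
        # Targeted: the key value picks exactly one chunk.
        chunk = bisect.bisect_right(SPLIT_POINTS, query[shard_key])
        return [SHARDS[chunk]]
    # No shard key in the query: scatter-gather across every shard.
    return list(SHARDS)


print(target_shards({"student_id": 150}))
print(target_shards({"class": "math"}))
```

A query that includes the shard key touches one shard, while a query without it has to be broadcast, which is why the shard key choice matters for read performance.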
-END-