Full Scenario Architecture
The layered diagram covers the access, control, data/compute, and cloud/DR zones, annotated with VLAN IDs and expected bandwidth, to speed onboarding and align on delivery scope.
User & Lab Edge
- Work Clients 1–4, LIMS stations, and lab Wi-Fi carry run preparation.
- Dedicated NE1072T uplinks feed the SR630 Crown gateway, which enforces ACLs.
Instrument Fabric
- New core switches host VLAN4/7 because NE10032 ports are unavailable.
- ACLs isolate the fabric; it reaches Crown data domains only through SR630 policies.
Storage & Compute Plane
- New 500TB Storwize/Quantum pools plus a Cohesity tier form the landing zone.
- SR630/R740xd plus GPU and high-memory nodes deliver WGS/PacBio compute.
Hybrid & Resiliency
- Direct Connect, Snowball, and partner MPLS carry global transport.
- The Cohesity cloud tier provides policy-based backup, extra landing capacity, and offline retention.
Sequencers
- MGISeq2K and NovaSeq stream raw reads and QC metrics in real time.
- Instrument APIs forward run manifests to SR630.
Landing & Storage
- 500TB staging pools automatically run checksum verification and trigger a Cohesity snapshot.
- Policy engine tags clinical vs. preclinical buckets.
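The checksum step at landing can be sketched as follows. This is a minimal illustration, assuming SHA-256 digests and a manifest that maps relative file paths to expected digests. The function names, the manifest shape, and the choice of hash are assumptions for illustration, not details from the design. The Cohesity snapshot trigger is left out because its interface is not described here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large run files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_run(run_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest does not match."""
    failures = []
    for rel_path, expected in manifest.items():
        target = run_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

An empty result from `verify_run` would be the point at which a snapshot could safely be requested; any listed path would hold the run in staging.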
SR630 Control
- IAM and ACLs determine who can promote data upstream.
- The automation gateway schedules transfers and writes an audit log at every hop.
Compute Cluster
- R740xd/GPU nodes run WGS, WTS, and PacBio pipelines.
- Signed results are staged for AWS export or local delivery.
AWS Analytics
- Direct Connect/S3/Snowball pipelines hand data off to global bioinformatics.
- Results sync back to Crown LIMS and the Cohesity cloud tier.
Design Priorities
Scope
- Fast-track VLAN4/7 commissioning for sequencers.
- Reserve 2×42U racks plus UPS footprint.
- Align the AWS Direct Connect handoff with MPLS delivery.
Delivery Window
Milestones
- Dec 20 · Power and network baseline ready
- Jan 15 · Storage and backup commissioning
- Feb 1 · Sequencer and AWS automation online
Current State Snapshot (CBSD)
- No spare racks; evaluate lab-adjacent space or reclaim archive aisle capacity immediately.
- Existing NE10032/NE1072T capacity cannot be reused; plan for new core/ToR switches to host VLAN4/7.
- The Storwize pool is at capacity with zero landing space remaining; procure an additional 500TB of staging storage immediately.
- Confirm that CRAC tonnage covers the 38k BTU/h heat load in the server room; current HVAC capacity is unknown.
- No dedicated 30A circuits in the lab; NovaSeq, PacBio, and tape feeds plus UPS capacity must be revalidated.
- UPS coverage is missing for both server room and lab racks; budget for dual UPS strings and bypass panels.
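To make the 38k BTU/h heat load easier to compare against CRAC and electrical ratings, the short conversion below expresses it in kW and tons of refrigeration. It uses the standard factors (1 kW = 3,412.14 BTU/h, 1 ton = 12,000 BTU/h); the function name is illustrative only.

```python
BTU_PER_HOUR_PER_KW = 3412.14      # 1 kW of heat = 3,412.14 BTU/h
BTU_PER_HOUR_PER_TON = 12_000.0    # 1 ton of refrigeration = 12,000 BTU/h


def heat_load(btu_per_hour: float) -> tuple[float, float]:
    """Convert a heat load in BTU/h to (kW, tons of refrigeration)."""
    return btu_per_hour / BTU_PER_HOUR_PER_KW, btu_per_hour / BTU_PER_HOUR_PER_TON


kw, tons = heat_load(38_000)
print(f"{kw:.1f} kW, {tons:.2f} tons")  # 11.1 kW, 3.17 tons
```

This covers only the stated load; any safety margin or redundancy on top of it is a sizing decision the plan does not specify.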
Readiness Checklist · 8 Domains
- Finalize new rack locations and floor loading for 2×42U racks plus UPS blocks.
- Deploy new core/ToR switches to deliver 10/40Gb with dual fiber links from the lab to the server room.
- Define zoning and tiering for 500TB clinical and 500TB preclinical storage.
- Deliver 30A×1 (PacBio), 15A×1 (NovaSeq), and 20–30A×2 (tape) circuits before Dec 20.
- Provide redundant UPS strings and static bypass for sequencers, storage, and core network.
- Lock ownership between facility staff and external electricians, with clear outage windows.
- Build a redundant VLAN4 (172.23.63.0/24) fabric with ACLs toward Crown core services.
- Extend the 40Gb GPFS VLAN7 (10.0.6.0/24) to AFM/NSD/Protocol nodes.
- Budget Direct Connect/S3/Snowball automation plus VPN to CBCN.
- Provision dual 500TB landing zones on the 40Gb storage fabric.
- Use Cohesity clusters for both backup and extended-capacity tiers.
- Confirm retention targets and DR metrics (RPO/RTO).
- On-prem HPC: PowerEdge R740xd compute nodes plus SR630 management nodes.
- Add GPU and high-memory nodes for WGS/PacBio (>3000 CPU hrs/sample).
- Deploy two QC workstations plus NAS on VLAN4 for run validation.
- Data path: sequencer → NE1072T → SR630 gateway → Storwize/Quantum → analytics cluster.
- Automate AWS transfers (Direct Connect/S3/Snowball), budgeted at ≥12k USD/month.
- Integrate LIMS and Cohesity for end-to-end audit trails.
- Validate HVAC capacity, temperature/humidity monitoring, and alarm points.
- Engineer NovaSeq chimney exhaust and cold/hot aisle separation.
- Improve cabling baffles to avoid hotspots.
- Milestones: Dec 20 power/network, Jan 15 storage/backup, Feb 1 run.
- Clarify the CapEx/OpEx split and the Phase-1 cloud fallback.
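The ">3000 CPU hrs/sample" figure in the checklist can be turned into a rough node count. The sketch below is a back-of-envelope estimate only: it assumes 48 cores per node (two 24-core CPUs), perfect core utilization, and a 720-hour month, and the sample volume in the example is a hypothetical input, not a number from this plan.

```python
import math

CPU_HOURS_PER_SAMPLE = 3000   # from the checklist (stated there as a lower bound)
CORES_PER_NODE = 48           # assumption: 2 x 24-core CPUs per node


def wall_clock_hours_per_sample(cores: int = CORES_PER_NODE) -> float:
    """Hours one node needs per sample if the pipeline used every core perfectly."""
    return CPU_HOURS_PER_SAMPLE / cores


def nodes_needed(samples_per_month: int, hours_per_month: float = 720.0,
                 cores: int = CORES_PER_NODE) -> int:
    """Smallest node count whose monthly core-hours cover the sample load."""
    demand = samples_per_month * CPU_HOURS_PER_SAMPLE
    return math.ceil(demand / (cores * hours_per_month))


print(wall_clock_hours_per_sample())        # 62.5
print(nodes_needed(samples_per_month=40))   # 4 (40 samples/month is hypothetical)
```

Real pipelines rarely scale perfectly across cores, so the result is best read as a floor to be refined with measured throughput.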
Sequencer Architecture
MGISeq2K
- VLAN4 / 172.23.63.6, 10Gb into the new NE1072T, dual uplink to core.
- Writes to the 500TB preclinical landing zone (Storwize → GPFS → Cohesity/Tape).
- MGI Agent on the SR630 gateway syncs to the on-prem cluster or AWS buckets.
- Power: UPS plus a dedicated 15A circuit with monitoring.
NovaSeq 6000
- VLAN4 / 172.23.63.7 with isolated ACLs and redundant 10Gb paths.
- Feeds the 500TB clinical pool, which is then shared over the 40Gb storage LAN.
- WGS/SNV workloads need ≥64GB RAM and 3000 CPU hrs/sample.
- 15A circuit plus UPS; confirm the chimney exhaust path.
Shared Components
- SR630 management nodes and SR650/PacBio nodes for future growth.
- NAS/Protocol/NSD/AFM run on the 40Gb VLAN7 GPFS fabric.
- The Crown interface (179.17.1.0/24) and AWS Direct Connect traverse the SR630 gateway.
- Cohesity delivers cold tiering, air-gap snapshots, and export workflows.
Network & Address Plan
| VLAN | Purpose | Subnet / Rate | Key Nodes |
|---|---|---|---|
| 2 | IPMI / Management | 172.23.64.0/24 · 1Gb | Sw-Mgt-01/02, PDU, UPS, SR630 BMC |
| 3 | Platform Services | 172.23.69.0/24 · 1Gb | AFM, Protocol, NSD, Archive Fabric |
| 4 | Sequencer & Workstations | 172.23.63.0/24 · 10Gb | MGISeq, NovaSeq, PacBio, Bionano, QC clients |
| 5 | AWS Transfer / Automation | 10.0.50.0/24 · 1Gb | SR630 Gateway, AWS Transfer Family, Automation Agents |
| 6 | Crown Interface | 179.17.1.0/24 | SR630 Crown Data Gateway, VPN |
| 7 | Data / GPFS | 10.0.6.0/24 · 40Gb | AFM/NSD/Protocol/Archive/SR630/SR650 |
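Before the VLAN/IP/ACL plan is locked, a quick consistency check of the table above may be useful. The sketch below uses Python's standard `ipaddress` module to look for overlapping subnets, subnets outside private address space, and sequencer addresses that do not fall inside VLAN4. The `check_plan` helper and the data layout are illustrative assumptions; the subnets and sequencer addresses are copied from this document.

```python
import ipaddress
from itertools import combinations

# VLAN ID -> subnet, copied from the table above.
PLAN = {
    2: "172.23.64.0/24",
    3: "172.23.69.0/24",
    4: "172.23.63.0/24",
    5: "10.0.50.0/24",
    6: "179.17.1.0/24",
    7: "10.0.6.0/24",
}
# Sequencer addresses from the architecture section, both expected on VLAN4.
SEQUENCERS = {"MGISeq2K": "172.23.63.6", "NovaSeq 6000": "172.23.63.7"}


def check_plan(plan, hosts, host_vlan=4):
    """Return human-readable findings; an empty list means nothing was flagged."""
    nets = {vlan: ipaddress.ip_network(cidr) for vlan, cidr in plan.items()}
    findings = []
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            findings.append(f"VLAN {a} overlaps VLAN {b}")
    for vlan, net in nets.items():
        if not net.is_private:
            findings.append(f"VLAN {vlan} subnet {net} is not private address space")
    for name, addr in hosts.items():
        if ipaddress.ip_address(addr) not in nets[host_vlan]:
            findings.append(f"{name} ({addr}) is outside VLAN {host_vlan}")
    return findings


for finding in check_plan(PLAN, SEQUENCERS):
    print(finding)
```

Run against the table as written, the check flags only the VLAN 6 subnet (179.17.1.0/24) as lying outside private address space. Whether that range is intentional is worth confirming with the Crown network owners before ACLs are built on it.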
Storage & Tiering Strategy
Landing & Primary
- Greenfield build: deploy dual 500TB landing pods (Storwize + NVMe cache) with no legacy storage.
- Define clinical vs. preclinical zoning and growth policy before hardware arrives.
- Commission the new 40Gb fabric (replacement core + FC Switch 8969-F24) to link sequencer → compute → storage.
Backup & Archive
- Cohesity acts as the backup and extra storage tier, with immutable snapshots.
- Policy-based tiering from Cohesity to AWS Glacier Deep Archive (0.002 USD/GB) or the N. Virginia promo (1 USD/TB).
- Expansion shelves can be added to Cohesity to absorb landing overflow before cloud tiering.
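The archive rate above can be turned into a monthly figure for planning. The sketch below assumes the 0.002 USD/GB rate is charged per month and that 1 TB is counted as 1,024 GB; neither assumption is stated in the source. Under those assumptions, 100TB comes to about 205 USD/month, which is in line with the Glacier figure quoted in the transfer options for this project.

```python
GB_PER_TB = 1024  # assumption: binary units, 1 TB = 1,024 GB


def monthly_archive_cost(tb: float, usd_per_gb_month: float) -> float:
    """Monthly storage cost in USD for `tb` terabytes at a per-GB monthly rate."""
    return tb * GB_PER_TB * usd_per_gb_month


print(round(monthly_archive_cost(100, 0.002), 2))   # 204.8
print(round(monthly_archive_cost(500, 0.002), 2))   # 1024.0
```

Retrieval, request, and early-deletion charges are not modeled here and would need to come from the current AWS price list.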
Data Transfer & Cloud Connectivity
Cross-border traffic of 57TB/month requires ≥264 Mbps of sustained throughput for uploads to complete within 30 days.
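The throughput requirement can be checked with a short calculation. The sketch below assumes decimal terabytes (1 TB = 10^12 bytes). Under that assumption, moving 57TB in exactly 30 days at a perfectly steady rate needs about 176 Mbps, and the 264 Mbps figure corresponds to roughly 1.5× headroom on top of that (equivalently, finishing in about 20 days). The headroom reading is an interpretation; the source does not say how the number was derived.

```python
def required_mbps(terabytes: float, days: float, headroom: float = 1.0) -> float:
    """Sustained rate in Mbps to move `terabytes` (decimal TB) in `days`, scaled by `headroom`."""
    bits = terabytes * 1e12 * 8
    seconds = days * 86_400
    return bits / seconds / 1e6 * headroom


print(round(required_mbps(57, 30)))                # 176
print(round(required_mbps(57, 30, headroom=1.5)))  # 264
print(round(required_mbps(57, 20)))                # 264
```

The same function can be reused to test other monthly volumes or transfer windows against a candidate circuit size.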
| Option | Description | Timeline / SLA | Reference Cost |
|---|---|---|---|
| AWS Direct + Public | Suzhou → HK POP → Glacier / Direct Connect → Zurich/Geneva. | Depends on 1Gbps internet; Snowball availability is limited; cross-border links are self-managed. | 100TB transfer ≈12,237 USD/month; Glacier Deep Archive 205 USD/month. |
Short term: AWS S3 + Snowball. Long term: blend dedicated circuits/VPN with AWS, combined with Cohesity tiering.
Cost Breakdown for PPT
| Category | Scope / Key Actions | Estimate | Notes |
|---|---|---|---|
| Facility | New racks, floor reinforcement, HVAC, UPS, structured cabling | 45k – 80k USD | Power/CRAC upgrades due Dec 20 |
| Network | New core/ToR, 40Gb storage switches, fiber plant | 35k – 60k USD (excl. circuits) | Direct Connect / partner VPN adds 12k–20k USD/month OpEx |
| Compute | SR630 management, R740xd compute, GPU/RAM nodes, workstations | 150k – 250k USD | Includes Crown interface/Agent gateway, NAS |
| Storage | New Storwize clusters, NVMe landing shelves, 40Gb FC, NAS | 300k – 1.7M USD | Variante 1–4 are quotes from overseas vendors |
| Backup | Cohesity clusters, expansion shelves, Glacier tiering | 120k – 220k USD | Includes 5-year 24×7×4 intl. support |
| Data Transfer | AWS fees / circuits / VPN / Snowball | 12k USD+/month (cloud), plus 15k–25k USD/month if dedicated circuits are used | Tie into Cohesity Cloud Tier automation |
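For the summary slide, the one-time estimates in the table above can be rolled up into a single range. The sketch below treats the Facility, Network, Compute, Storage, and Backup rows as one-time spend and leaves out Data Transfer because it is quoted per month; that split is an assumption about how the table is meant to be read.

```python
# One-time estimates in thousands of USD (low, high), copied from the table above.
CAPEX_KUSD = {
    "Facility": (45, 80),
    "Network": (35, 60),
    "Compute": (150, 250),
    "Storage": (300, 1700),
    "Backup": (120, 220),
}


def capex_range(items):
    """Sum the low and high ends of each category's estimate."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


low, high = capex_range(CAPEX_KUSD)
print(f"CapEx: {low}k - {high}k USD")  # CapEx: 650k - 2310k USD
```

The Storage row alone accounts for 1,400k of the 1,660k spread between the low and high totals, so narrowing the storage variant is what tightens the overall estimate most.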
Global Pricing Benchmarks
Benchmarks sourced from U.S. / EU vendors to guide CBSD procurement negotiations.
| Category | Vendor / Model | Highlights | Ref. Price (USD) |
|---|---|---|---|
| HPC Node | Dell PowerEdge R740xd (US) | 2×Gold 6248R, 384GB RAM, 4×GPU ready, 25Gb NIC | ~34,500 w/ NBD ProSupport |
| Storage | Quantum QXS-4 (EU) | 24×16TB NL-SAS, dual controllers, 40GbE | ~92,000 per array |
| Backup | Cohesity C4000 Cluster (US) | 4×nodes, 120TB usable, CloudArchive/CloudTier licenses | ~185,000 incl. 3-year support |
| Cloud Network | AWS Direct Connect 1Gb (HK) | Cross connect + port + data transfer | 0.25 USD/h port + 0.025 USD/GB data |
| Dedicated Circuit | Equinix ECX + Partner MPLS | 1Gb dual sites + proactive monitoring | ~32,000/month (36 mo term) |
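The Direct Connect benchmark in the table above combines an hourly port charge with a per-GB data charge, so a monthly estimate depends on volume. The sketch below applies the two listed rates to the project's 57TB/month figure, assuming a 720-hour month and decimal units (1 TB = 1,000 GB). The cross-connect fee and any partner circuit charges are not priced separately in the table and are left out.

```python
PORT_USD_PER_HOUR = 0.25   # from the table above
DATA_USD_PER_GB = 0.025    # from the table above


def direct_connect_monthly(tb_transferred: float, hours: float = 720.0) -> float:
    """Port-hours plus per-GB data charge, using decimal TB (1 TB = 1,000 GB)."""
    return PORT_USD_PER_HOUR * hours + DATA_USD_PER_GB * tb_transferred * 1000


print(round(direct_connect_monthly(57), 2))  # 1605.0
```

At this volume the data charge (1,425 USD) dominates the port charge (180 USD), so the estimate scales almost linearly with monthly transfer volume.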
Plan → Quote → Build → Run
Baseline
- Inventory room/HVAC/network/storage assets.
- Answer eight key questions (power, racks, network, storage, compute, flows, environment, budget).
- Lock VLAN/IP/ACL plus the AWS dataflow diagram.
Design & Pricing
- Split the budget by facility/network/compute/storage/backup/transfer.
- Collect Quantum/Cohesity/circuit lead times.
- Approve the Phase-1 cloud vs. Phase-2 on-prem compute strategy.
Procure & Deploy
- Execute room retrofit, power, UPS, and cabling; stand up the 10/40Gb network.
- Rack SR630/R740xd/Storwize/Cohesity; finalize VLAN/ACL.
- Test sequencer UPS, automation gateways, and NAS/workstation connectivity.
Operate
- Automate LIMS → dataflow → AWS Transfer → Cohesity tiering & CloudArchive.
- Enable monitoring for HVAC, UPS, current, VLAN health, and Spectrum Scale.
- Publish SOPs for room checks, incident response, and cloud cost control.
Open Questions
- Power work window: are external electricians required, and what is the downtime plan?
- Rack plan: lab-adjacent space, existing room, or reclaimed archive aisle? What is the floor loading?
- Network: new 10Gb core, fiber plant, and the sequencer VLAN to Crown ACL baseline.
- Data governance: retention policy, storage/backup ownership, DR runbooks.
- Bioinformatics: timing of the switch from Phase-1 cloud to Phase-2 on-prem; CBCN VPN security controls.
- Automation: sequencer → storage → analysis → results pipeline plus LIMS integration.
- Environment: HVAC capability, NovaSeq chimney exhaust, sensor locations.
- Budget & milestones: capital ceiling, Feb 1 readiness, long-lead item mitigation.