Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.1.0a3 - 2026-04-20
Added
- Phase 7: MkDocs + Material documentation site at https://psxdata.readthedocs.io — 12 pages covering getting started, 6 tutorial guides, full API reference, changelog, and contributing.
- Phase 7: .readthedocs.yaml v2 build config for Read the Docs free-tier hosting.
- Phase 7: examples/quickstart.py, a runnable demo script covering all 5 core functions.
- Phase 7: Expanded all 8 module-level function docstrings (Args, Returns, Raises, Example) for clean API reference rendering via mkdocstrings.
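The docstring layout named in the last entry (Args, Returns, Raises, Example) can be sketched as below. This is only an illustration of the section order: `fetch_rows` and its parameters are hypothetical and are not part of the psxdata API.

```python
def fetch_rows(symbol: str, limit: int = 10) -> list[dict]:
    """Return placeholder rows for a ticker symbol.

    Hypothetical function used only to illustrate the docstring layout.

    Args:
        symbol: Ticker symbol, for example "ENGRO".
        limit: Maximum number of rows to return.

    Returns:
        A list of dictionaries, one per row.

    Raises:
        ValueError: If symbol is empty or limit is negative.

    Example:
        >>> fetch_rows("ENGRO", limit=2)
        [{'symbol': 'ENGRO', 'row': 0}, {'symbol': 'ENGRO', 'row': 1}]
    """
    if not symbol or limit < 0:
        raise ValueError("symbol must be non-empty and limit non-negative")
    # Placeholder data: one dict per requested row.
    return [{"symbol": symbol, "row": i} for i in range(limit)]
```

Sectioned docstrings in this style are what mkdocstrings can render into separate parameter, return, and exception blocks on an API reference page.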
Changed
- pyproject.toml: added docs optional dependency group (mkdocs-material, mkdocstrings[python]), Documentation URL, and updated package description.
Unreleased
Added
- Phase 4: Added CONTRIBUTING.md guide for adding new API endpoints: router pattern, registry wiring, response envelope, typed response_model, error codes, Depends() injection, and TestClient fixture conventions.
- Phase 4: Added test-api CI job that installs .[dev,api] and runs tests/unit/api/ in isolation from the core test environment.
- Phase 4: Added api/schemas.py with six Pydantic v2 models (MetaSingle, MetaList, ErrorDetail, ErrorEnvelope, HealthData, HealthResponse) forming the standardized response envelope.
- Phase 4: Added GET /health endpoint returning {"data": {"status": "ok"}, "meta": {"timestamp": ..., "cached": false}} using typed response_model=HealthResponse.
- Phase 4: Added 17 unit tests covering schemas, error envelope paths, and the health route.
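The shape of the GET /health response listed above can be sketched with the standard library alone. This is a rough illustration, not the project's code: the real endpoint uses the Pydantic models in api/schemas.py, the helper name `health_envelope` is hypothetical, and the ISO 8601 UTC timestamp format is an assumption because the entry elides it.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def health_envelope(now: Optional[datetime] = None) -> dict[str, Any]:
    """Build a dict shaped like the documented GET /health response."""
    # Assumption: timestamps are ISO 8601 strings in UTC.
    moment = now or datetime.now(timezone.utc)
    return {
        "data": {"status": "ok"},
        "meta": {"timestamp": moment.isoformat(), "cached": False},
    }


if __name__ == "__main__":
    print(health_envelope())
```

Accepting `now` as a parameter keeps the helper deterministic under test, which is one way to assert on the full payload without patching the clock.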
Changed
- CI lint job now installs .[dev,api] so mypy can resolve FastAPI imports when checking api/.
- CI test job now ignores tests/unit/api/; API tests require FastAPI extras and run in the dedicated test-api job.
- .gitignore: fixed self-ignoring pattern; contributors now receive the file on clone. Personal paths moved to maintainer's global gitignore.
- api/main.py: replaced three ad-hoc exception handlers with five spec-compliant handlers; PSXUnavailableError → 503, InvalidSymbolError → 404, all errors return {"error": {"status", "code", "message"}} envelope.
- api/routers/__init__.py: registered health_router; switched from relative to absolute imports.
- README.md, ARCHITECTURE.md, CONTRIBUTING.md: updated API response envelope documentation to include count on list responses and the error envelope shape.
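The exception-to-status mapping described in the api/main.py entry can be sketched as follows. This is a framework-free illustration under stated assumptions: the exception classes are local stand-ins for the library's own, the "code" strings and the 500 fallback are hypothetical, and only the 503 and 404 statuses and the envelope keys come from the entry.

```python
from typing import Any


class PSXDataError(Exception):
    """Stand-in for the library's root exception."""


class PSXUnavailableError(PSXDataError):
    """Stand-in: the upstream PSX site could not be reached."""


class InvalidSymbolError(PSXDataError):
    """Stand-in: the requested ticker symbol is unknown."""


# Status codes follow the changelog entry; the "code" strings are hypothetical.
_ERROR_MAP: dict[type[Exception], tuple[int, str]] = {
    PSXUnavailableError: (503, "psx_unavailable"),
    InvalidSymbolError: (404, "invalid_symbol"),
}


def error_envelope(exc: Exception) -> tuple[int, dict[str, Any]]:
    """Return (HTTP status, body) using the {"error": {...}} envelope shape."""
    for exc_type, (status, code) in _ERROR_MAP.items():
        if isinstance(exc, exc_type):
            break
    else:
        # Assumed fallback for exceptions that have no explicit mapping.
        status, code = 500, "internal_error"
    return status, {"error": {"status": status, "code": code, "message": str(exc)}}
```

For example, `error_envelope(InvalidSymbolError("Unknown symbol: XYZ"))` yields status 404 with the message carried in the envelope. Keeping the mapping in one table means each handler can share the same body-building logic.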
[Unreleased — pre-Phase 4]
Added
- Phase 0: Probed all 8 PSX endpoints. Confirmed rendering modes: /sector-summary and /financial-reports require Playwright; all other endpoints work with plain requests.
- Phase 0: Captured HTML fixtures for all 5 key endpoints (historical_engro, trading_panel, screener, sector_summary, financial_reports).
- Phase 0: Added tools/probe_endpoints.py, a reusable diagnostic that probes all PSX endpoints and writes docs/PSX_ENDPOINTS.md.
- Phase 0: Added tools/capture_fixtures.py, a reusable fixture capture that saves stamped HTML snapshots to tests/fixtures/.
- Phase 0.5: Repository infrastructure: issue templates, PR template, CI/CD workflows, community files, GitHub labels, milestones, and development roadmap issues.
- Phase 2: Added psxdata/exceptions.py: 10-class exception hierarchy rooted at PSXDataError.
- Phase 2: Added psxdata/constants.py: all PSX endpoint URLs, request headers, retry/rate-limit config, cache settings, and full COLUMN_MAP from Phase 0 fixtures.
- Phase 2: Added psxdata/utils.py: chunk_date_range (configurable date splitter), RateLimiter (thread-safe, injectable clock), validate_ohlc_dataframe (flags/drops bad rows, adds is_anomaly column).
- Phase 2: Added psxdata/parsers/normalizers.py: parse_date_safely (never-raises, multi-format + fuzzy fallback), coerce_numeric, normalize_column_name.
- Phase 2: Added psxdata/parsers/html.py: dynamic HTML table parser using COLUMN_MAP; unknown headers fall back to normalize_column_name with a logged warning.
- Phase 2: Added psxdata/cache/disk_cache.py: DiskCache backed by diskcache + parquet; historical data never expires, today's data expires after 15 minutes.
- Phase 2: Added psxdata/models/schemas.py: 7 thin Pydantic v2 models (OHLCVRow, Quote, IndexRecord, SectorSummary, TickerInfo, DebtInstrument, EligibleScrip).
- Phase 2: Added psxdata/scrapers/base.py: BaseScraper with persistent session, exponential backoff retry, rate limiter, and Playwright context manager.
- Phase 3: Added scrapers for all 8 PSX endpoints: historical.py (POST, all-time OHLCV), realtime.py (15 trading-board combos), indices.py (18 indices), sectors.py (37 sectors), fundamentals.py (financial reports), screener.py (1000+ tickers with fundamentals), debt_market.py (4 instrument tables), eligible_scrips.py (9 category tables).
- Phase 3: Deprecated Playwright for scraping; all endpoints confirmed accessible via plain requests + BeautifulSoup. _playwright_page() retained in BaseScraper for tooling only.
- Phase 3 API: Added public package interface: stocks(), tickers(), quote(), indices(), sectors(), fundamentals(), debt_market(), eligible_scrips() exported from psxdata/__init__.py.
- Phase 5: Added full unit test suite: 80+ tests covering parsers, validators, cache, utils, and scraper reliability (mocked failure modes).
- Phase 5: Added integration test suite: real PSX endpoint tests for all 8 scrapers, marked @pytest.mark.integration.
- Phase 5: Added tests/fixtures/: static HTML/JSON snapshots for deterministic unit tests.
- Phase 6: Published to PyPI as psxdata==0.1.0a1.
- Phase 6: Added 5-job gated publish pipeline (.github/workflows/publish.yml): tag verification → unit tests → build+twine check → TestPyPI → PyPI (manual approval gate).
- Phase 6: Removed playwright from core dependencies; it is now optional tooling only.
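The Phase 2 entry for psxdata/utils.py describes RateLimiter as thread-safe with an injectable clock. A minimal sketch of that idea follows. It is not the library's implementation: the constructor parameters, the `acquire` method name, and the `FakeClock` helper are all assumptions made for illustration.

```python
import threading
import time
from typing import Callable, Optional


class RateLimiter:
    """Enforce a minimum interval between calls (illustrative sketch)."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock  # injectable so tests need not wait in real time
        self._sleep = sleep
        self._lock = threading.Lock()
        self._last: Optional[float] = None

    def acquire(self) -> float:
        """Block until a call is allowed; return the seconds waited."""
        with self._lock:  # serialize callers so the interval holds across threads
            now = self._clock()
            wait = 0.0
            if self._last is not None:
                wait = max(0.0, self._last + self._min_interval - now)
            if wait > 0:
                self._sleep(wait)
            self._last = now + wait
            return wait


class FakeClock:
    """Deterministic clock for tests: sleeping simply advances the time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds
```

With `clock = FakeClock()` and `RateLimiter(1.0, clock=clock, sleep=clock.sleep)`, the first `acquire()` returns immediately and a second call made at the same instant waits one simulated second. Injecting both the clock and the sleep function is what lets such a test run without real delays.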
Changed
- Corrected ARCHITECTURE.md scraper→endpoint map: /screener and /trading-panel use requests + BeautifulSoup; /sector-summary and /financial-reports use Playwright.
- CHANGELOG.md: Added entries for Phases 3 through 6.
0.1.0a1 - 2026-04-19
First PyPI release. Core scraping library complete for all 8 PSX endpoints with caching, validation, and a clean public Python API.