Chatak Eco Acoustic Analytics Platform

Table of Contents
1. Executive Summary: Project Overview
2. Project Objectives
3. User Workflow
4. Implementation Details: Features Developed; Technology Stack
5. System Architecture: Components Overview; Module-Level Breakdown; Data Pipeline; Acoustic Event Detection (AED); Clustering; Annotation
6. Limitations: Known Issues
7. Future Enhancements: Planned Improvements
8. Conclusion: Summary

Executive Summary

Chatak Eco Soundscape is a desktop platform for ecological soundscape analysis and biodiversity monitoring. By automating species detection (BirdNet), general acoustic event detection (PANNs), and unsupervised clustering of field recordings, the platform turns raw audio collected at monitoring sites into validated, searchable biodiversity insights. This report describes the system's objectives, user workflow, implementation, and architecture, along with its known limitations and planned enhancements.

Project Objectives
- Automate acoustic monitoring of biodiversity in ecological research
- Provide species detection using location-aware BirdNet AI models
- Enable general acoustic event detection (vehicles, animals, environmental sounds)
- Support audio clustering for discovering unknown sound patterns
- Facilitate collaborative annotation and data validation
- Generate comprehensive biodiversity insights and analytics

Project Overview

Chatak Eco Soundscape is a comprehensive desktop application designed for ecological soundscape analysis and biodiversity monitoring. The application enables researchers, ecologists, and environmental scientists to upload, process, and analyze audio recordings from field sites to detect and identify species, acoustic events, and biodiversity patterns. The system combines traditional web technologies (React, Node.js) with advanced machine learning models (BirdNet, PANNs) in a unified Electron-based desktop application, providing both online and offline capabilities for field research scenarios.

User Workflow

1. Authentication
   └─> Log in with Google OAuth or local credentials
2. Project Setup
   └─> Create a new project
   └─> Define project metadata (name, description, dates)
3. Site Management
   └─> Add monitoring sites with GPS coordinates
   └─> View sites on an interactive map
4. Recording Upload (see the metadata sketch after this workflow)
   └─> Upload audio files (multiple formats supported)
   └─> System extracts metadata automatically
   └─> Files stored locally or on S3
5. Audio Processing [Optional]
   └─> Segment long recordings into chunks
   └─> Process segments in parallel
6. Acoustic Event Detection
   General Acoustic Event Detection
   └─> Run PANNs-based AED
   └─> Detect vehicles, animals, human sounds, etc.
   └─> Review events by type and confidence
   └─> View spectrograms with detection overlays
   BirdNet Species Detection (see the detection sketch after this workflow)
   └─> Configure location-based filtering
   └─> Run BirdNet analysis
   └─> Review detected species with confidence scores
   └─> Listen to audio snippets
7. Audio Clustering
   └─> Extract audio features
   └─> Run HDBSCAN clustering
   └─> Explore clusters visually
   └─> Discover unknown sound patterns
8. Annotation & Validation
   └─> Manually verify automated detections
   └─> Add custom labels
   └─> Submit to volunteer annotation system [Optional]
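To make step 4 concrete, the following is a minimal sketch of automatic metadata extraction using the soundfile dependency listed in the technology stack; the file path is a hypothetical example rather than an actual application path.

```python
# Minimal sketch of automatic metadata extraction (workflow step 4),
# assuming the soundfile library from the project's Python services.
import soundfile as sf

# Hypothetical upload path; the real service would receive this from the API.
info = sf.info("uploads/recording_001.wav")

# soundfile exposes the fields the pipeline stores: duration, rate, channels.
print(f"duration: {info.duration:.1f}s, sample rate: {info.samplerate} Hz, "
      f"channels: {info.channels}, format: {info.format}")
```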
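For the BirdNet step (6), here is a minimal sketch of location-aware species detection using the birdnetlib package named in the technology stack. The recording path, coordinates, date, and 0.25 confidence threshold are illustrative assumptions, not values taken from the application.

```python
# Minimal sketch of location-aware BirdNet species detection via birdnetlib.
from datetime import datetime

from birdnetlib import Recording
from birdnetlib.analyzer import Analyzer

# Load the bundled BirdNET model once; it can be reused across recordings.
analyzer = Analyzer()

recording = Recording(
    analyzer,
    "site_12/recording_001.wav",  # hypothetical file path
    lat=47.61,                    # site GPS coordinates enable
    lon=-122.33,                  # location-based species filtering
    date=datetime(2024, 5, 14),   # week of year narrows candidate species
    min_conf=0.25,                # assumed confidence threshold
)
recording.analyze()

# Each detection carries species names, a time window, and a confidence score.
for det in recording.detections:
    print(det["common_name"], det["start_time"], det["end_time"], det["confidence"])
```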
Implementation Details

Key Features Developed

PROJECT & SITE MANAGEMENT
- Multi-project workspace organization
- Geographic site management with GPS coordinates
- Interactive map visualization (MapLibre GL)
- Hierarchical data structure: Projects → Sites → Recordings

AUDIO RECORDING MANAGEMENT
- Support for multiple audio formats (WAV, MP3, FLAC, M4A)
- Large file handling
- Dual storage: local filesystem + AWS S3 (optional)
- Metadata extraction (duration, sample rate, channels)

BIRDNET ACOUSTIC EVENT DETECTION
- AI-powered bird species identification
- Location-based species filtering (latitude/longitude)
- Confidence scoring for each detection
- Audio snippet extraction for each event
- Embedding extraction for clustering

PANNS ACOUSTIC EVENT DETECTION
- Deep learning-based general sound event detection
- Frequency range estimation for each event type
- Temporal overlap merging
- Spectrogram visualization with detection overlays

AUDIO CLUSTERING
- HDBSCAN clustering algorithm
- Audio feature extraction (MFCC, spectral features)
- Embedding-based clustering
- Cluster visualization and exploration

ANNOTATION & VALIDATION
- Manual event labeling interface
- Volunteer crowdsourcing system
- Quality control mechanisms

AUDIO SEGMENTATION
- Time-based audio splitting
- Configurable segment duration

Minimal sketches of the segmentation and clustering features follow this list.
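First, a minimal sketch of time-based audio splitting with a configurable segment duration, assuming the pydub dependency from the technology stack; the file naming scheme and the 60-second default are assumptions made for the example.

```python
# Minimal sketch of time-based segmentation with a configurable duration.
# pydub relies on the external FFmpeg binary (also in the stack) for
# non-WAV input formats.
from pydub import AudioSegment

def segment_recording(path: str, segment_seconds: int = 60) -> list[str]:
    """Split one recording into fixed-length WAV chunks; return chunk paths."""
    audio = AudioSegment.from_file(path)
    step_ms = segment_seconds * 1000
    out_paths = []
    for i, start in enumerate(range(0, len(audio), step_ms)):
        chunk = audio[start:start + step_ms]  # last chunk may be shorter
        out_path = f"{path.rsplit('.', 1)[0]}_seg{i:04d}.wav"
        chunk.export(out_path, format="wav")
        out_paths.append(out_path)
    return out_paths

# Hypothetical usage; segments can then be processed in parallel (step 5).
segments = segment_recording("site_12/recording_001.wav", segment_seconds=60)
```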
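Second, a minimal sketch of the clustering feature, pairing MFCC-based feature extraction with HDBSCAN, using the librosa and hdbscan packages from the technology stack. The snippet paths, embedding recipe (mean and standard deviation of 20 MFCCs), and min_cluster_size value are placeholders for illustration.

```python
# Minimal sketch: MFCC embeddings per clip, clustered with HDBSCAN.
import hdbscan
import librosa
import numpy as np

def embed(path: str, sr: int = 48000) -> np.ndarray:
    """Summarize one clip as mean+std of its MFCCs (a simple fixed-size embedding)."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical event-snippet paths; in practice these are the (typically
# hundreds of) audio snippets produced by the detection stages.
clip_paths = ["snippets/event_0001.wav", "snippets/event_0002.wav"]
X = np.stack([embed(p) for p in clip_paths])

# HDBSCAN needs no preset cluster count; label -1 marks noise/outliers,
# which makes it well suited to discovering unknown sound patterns.
clusterer = hdbscan.HDBSCAN(min_cluster_size=5)
labels = clusterer.fit_predict(X)
```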
Technology Stack

FRONTEND TECHNOLOGIES
Platform: Electron 37.4.0 (desktop application)
UI Framework: React 18.3.1
Language: TypeScript 5.8.3
Build Tool: Vite 5.4.19
Styling: TailwindCSS 3.4.17
UI Components: Radix UI (accessible component library)
State: React Query (TanStack Query)
Routing: React Router 6.30.1
Maps: MapLibre GL 3.6.2
Charts: Recharts 2.15.4

BACKEND TECHNOLOGIES
Runtime: Node.js 18+
Framework: Express.js 4.18.2
Database ORM: Sequelize 6.35.2
Authentication: JWT + Google OAuth 2.0
File Storage: AWS SDK (S3) + local filesystem
Audio: FFmpeg (external binary)

PYTHON SERVICES
Version: Python 3.10.9 (virtual environment)
Audio Processing:
- librosa 0.10.0 (audio analysis)
- soundfile 0.10.0 (audio I/O)
- pydub 0.25.1 (audio manipulation)
Machine Learning:
- PyTorch 1.9.0+ (deep learning framework)
- TensorFlow 2.10.0-2.16.0 (BirdNet models)
- scikit-learn 1.0.0+ (classical ML)
- HDBSCAN 0.8.29+ (clustering)
- UMAP 0.5.3+ (dimensionality reduction)
ML Models:
- BirdNetLib 0.18.0 (bird species detection)
- PANNs Inference 0.1.0 (general audio tagging)
Visualization:
- matplotlib 3.5.0-3.8.0 (spectrograms)
- Pillow 8.0.0+ (image processing)

DATABASE
Primary: AWS RDS PostgreSQL
Alternative: MySQL (via Sequelize)
Connection: connection pooling with SSL support

System Architecture Overview

The application provides a cross-platform desktop interface built with Electron and React. Users can access project dashboards, upload audio recordings, view analytics, and manage configuration parameters. The UI layer focuses on delivering an intuitive, responsive, and offline-capable experience while communicating with the backend through secure API calls.

The backend exposes a structured REST API implemented with Node.js and Express. All incoming requests pass through the authentication middleware (Google OAuth + JWT validation) before being routed to the appropriate service modules.

Components Overview

Detection Engines (PANNs, BirdNet, Feature Extraction)
Specialized Python-based models perform the core acoustic analysis: the Feature Extractor generates MFCC and spectral embeddings, PANNs CNN14 handles general-purpose acoustic event detection (AED), and BirdNet performs species-level bird call identification. This layer is isolated in a dedicated environment to ensure reproducibility, version control, and high-performance inference.

Core Services
The File Upload Service handles validated audio uploads and stores files either locally or in AWS S3. The Audio Processing Service coordinates segmentation, feature extraction, and pre-processing.

Analysis Modules (Spectrograms, Clustering, Annotation)
Post-processing is handled by structured analysis components: the Spectrogram Generator creates visual plots for inspection, the Clustering Engine groups acoustically similar audio events using embeddings, and the Annotation Manager supports manual review and labeling workflows. These modules refine raw model outputs into structured, validated, and human-understandable insights.

Data Layer
All metadata, analysis outputs, clustering results, and annotations are stored in AWS RDS PostgreSQL, ensuring reliability, ACID compliance, and scalability. Audio files are stored via the File Upload Service, either locally (during development) or in AWS S3 (in production). This layered storage architecture separates large binary files from structured metadata for optimized performance.

Architecture highlights:
- Six-layer architecture separating the UI (Electron + React), API (Express.js), core services (file upload, audio processing), AI detection (BirdNet + PANNs), analysis modules (clustering + annotation), and data layer (PostgreSQL + local/S3 storage)
- Dual AI detection engines: BirdNet for species-specific bird identification (3,000+ species) and PANNs CNN14 for general acoustic events (527 classes), both feeding into HDBSCAN clustering and expert annotation workflows
- Data flows top-down from user uploads through preprocessing, segmentation, AI analysis, clustering discovery, and annotation validation, with all results persisted in the PostgreSQL database
- External integrations include Google OAuth for user authentication

Module-Level Breakdown

DATA PIPELINE MODULE
• Upload workflow: file validation (format, size, integrity) → metadata extraction (duration, sample rate, channels) → database entry with user-provided context (location, date, project)
• Preprocessing standardization: format conversion to WAV → resampling to 48 kHz (optimized for bird vocalizations) → amplitude normalization for consistent signal levels across recordings (sketched after this breakdown)
• Parallel processing generates spectrograms for visualization, extracts acoustic features for clustering, and prepares segments for AI inference, while maintaining complete data lineage for validation

ACOUSTIC EVENT DETECTION (AED) MODULE
• Dual-model strategy: BirdNet (3,000+ bird species with location filtering, TensorFlow, confidence thresholding) and an optional PANNs CNN14 (527 AudioSet classes covering animal sounds, human activities, and natural phenomena; PyTorch, confidence thresholding)
• Location-aware filtering: BirdNet uses GPS coordinates to filter the species database down to regionally probable candidates, significantly reducing false positives from geographically impossible species
• Complementary coverage: BirdNet excels at fine-grained species discrimination, while PANNs captures non-avian events (habitat quality indicators, human disturbance, equipment issues) for comprehensive acoustic scene understanding
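As referenced in the Data Pipeline module above, here is a minimal sketch of the preprocessing standardization step (WAV conversion, 48 kHz resampling, amplitude normalization), assuming the librosa and soundfile dependencies from the technology stack; the input and output paths are illustrative.

```python
# Minimal sketch of preprocessing standardization from the data pipeline.
import librosa
import numpy as np
import soundfile as sf

TARGET_SR = 48000  # 48 kHz, per the pipeline's bird-vocalization tuning

def standardize(in_path: str, out_path: str) -> None:
    # librosa resamples to the target rate on load and downmixes to mono.
    y, _ = librosa.load(in_path, sr=TARGET_SR, mono=True)
    peak = float(np.max(np.abs(y)))
    if peak > 0:
        y = y / peak  # peak-normalize for consistent levels across recordings
    # Writing with soundfile performs the format conversion to WAV.
    sf.write(out_path, y, TARGET_SR, subtype="PCM_16")

# Hypothetical paths: convert an uploaded MP3 into a standardized WAV.
standardize("uploads/recording_001.mp3", "processed/recording_001.wav")
```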
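And for the PANNs side of the AED module, a minimal sketch using the panns_inference package from the technology stack. The default CNN14 checkpoint, CPU device, file path, and top-five reporting are assumptions made for the example.

```python
# Minimal sketch of general audio tagging with PANNs CNN14 (527 AudioSet classes).
import librosa
import numpy as np
from panns_inference import AudioTagging, labels

# PANNs CNN14 expects 32 kHz mono input.
audio, _ = librosa.load("processed/recording_001.wav", sr=32000, mono=True)
audio = audio[None, :]  # add a batch dimension: (1, n_samples)

# checkpoint_path=None falls back to the package's default CNN14 weights.
tagger = AudioTagging(checkpoint_path=None, device="cpu")
clipwise_output, embedding = tagger.inference(audio)

# clipwise_output holds one probability per AudioSet class; report the top 5.
scores = clipwise_output[0]
for idx in np.argsort(scores)[::-1][:5]:
    print(f"{labels[idx]}: {scores[idx]:.3f}")
```

The embedding returned alongside the class scores is the same kind of representation the clustering engine can consume, which is how PANNs detections feed into the HDBSCAN workflow described above.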