Small files can degrade the performance of Spark and Hive queries. Each file carries a fixed overhead to open, read, and process, and that overhead adds up when a dataset is split into many small files. The problem is common with event streaming and IoT sensor data, which tend to arrive as many small files. To detect the issue, check for data skew across partitions and for Spark jobs that read or write many small files. Mitigation techniques include designing the file hierarchy carefully, repartitioning before writing, and using Delta Lake optimizations or Databricks Auto Optimize to merge small files.
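The detection step can be sketched in plain Python, as a rough illustration rather than a definitive tool. The sketch assumes the table is reachable as a local directory tree with one subdirectory per partition; the 16 MiB "small file" threshold, the 128 MiB target file size, and the helper names `scan_partitions` and `target_file_count` are all hypothetical choices made for this example, not values taken from Spark, Hive, or Databricks.

```python
import math
import os
from collections import defaultdict

# Assumed values for illustration only; tune them for your storage and engine.
SMALL_FILE_BYTES = 16 * 1024 * 1024    # files below this count as "small"
TARGET_FILE_BYTES = 128 * 1024 * 1024  # desired size of each compacted file


def scan_partitions(root, small_file_bytes=SMALL_FILE_BYTES):
    """Walk a table directory and summarize the data files in each partition.

    Returns {partition_path: (file_count, small_file_count, total_bytes)},
    where partition_path is relative to root.
    """
    stats = defaultdict(lambda: [0, 0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Skip marker and hidden files such as _SUCCESS or .crc files.
            if name.startswith(("_", ".")):
                continue
            size = os.path.getsize(os.path.join(dirpath, name))
            entry = stats[os.path.relpath(dirpath, root)]
            entry[0] += 1
            entry[1] += int(size < small_file_bytes)
            entry[2] += size
    return {partition: tuple(values) for partition, values in stats.items()}


def target_file_count(total_bytes, target_file_bytes=TARGET_FILE_BYTES):
    """Number of output files needed so each is roughly target_file_bytes."""
    return max(1, math.ceil(total_bytes / target_file_bytes))
```

A partition with a high small-file count and a low total size is a candidate for compaction. The number returned by `target_file_count` could then be used as the argument to `DataFrame.repartition` before rewriting that partition in Spark. Tables on object stores or HDFS would need a different listing mechanism than `os.walk`.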