feat: add file scanner utility with demo program#3568
feat: add file scanner utility with demo program#3568deepin-bot[bot] merged 1 commit intolinuxdeepin:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Johnson-zs The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Reviewer's GuideImplements a new asynchronous FileScanner utility with a threaded ScannerWorker for directory statistics (including hard-link deduplication and throttled progress reporting), integrates it into dfm-base, and adds an optional GUI demo/test application wired via a new OPT_ENABLE_BUILD_TESTS CMake option. Sequence diagram for asynchronous FileScanner lifecyclesequenceDiagram
actor User
participant DemoMainWindow
participant FileScanner
participant FileScannerPrivate
participant WorkerThread as QThread_workerThread
participant ScannerWorker
participant LocalFS
participant InfoFactory
User->>DemoMainWindow: select_directories()
DemoMainWindow->>FileScanner: setOptions(options)
DemoMainWindow->>FileScanner: start(urls)
FileScanner->>FileScannerPrivate: startWorker(urls)
alt worker_thread_not_created
FileScannerPrivate->>WorkerThread: create()
end
alt worker_not_created
FileScannerPrivate->>ScannerWorker: create()
FileScannerPrivate->>ScannerWorker: moveToThread(WorkerThread)
end
FileScannerPrivate->>ScannerWorker: setUrls(urls)
FileScannerPrivate->>ScannerWorker: setOptions(options)
FileScannerPrivate->>WorkerThread: start()
FileScannerPrivate->>ScannerWorker: start() queued
activate ScannerWorker
ScannerWorker->>ScannerWorker: reset_state()
alt first_url_is_local_file_scheme
ScannerWorker->>LocalFS: scanLocalPaths_with_fts()
else other_protocols
ScannerWorker->>InfoFactory: scanOtherProtocols_via_InfoFactory()
end
loop periodic_progress
ScannerWorker-->>FileScannerPrivate: resultReady(currentResult,false)
FileScannerPrivate->>FileScanner: onWorkerResultReady(result,false)
FileScanner-->>DemoMainWindow: progressChanged(result)
end
ScannerWorker-->>FileScannerPrivate: resultReady(finalResult,true)
ScannerWorker-->>FileScannerPrivate: finished()
Deactivate ScannerWorker
FileScannerPrivate->>FileScanner: onWorkerFinished()
FileScanner-->>DemoMainWindow: finished(lastResult,true)
rect rgb(230,230,230)
QCoreApplication->>FileScanner: aboutToQuit()
FileScanner->>FileScannerPrivate: stopWorker()
FileScannerPrivate->>ScannerWorker: stop()
end
Class diagram for FileScanner and ScannerWorker utilityclassDiagram
class FileScanner {
+ScanOption enum
+ScanResult struct
+FileScanner(QObject_parent)
+~FileScanner()
+void setOptions(ScanOptions_options)
+ScanOptions options()
+ScanResult result()
+bool isRunning()
+void start(QList_QUrl_urls)
+void stop()
+progressChanged(ScanResult_result)
+finished(ScanResult_result,bool_success)
-QScopedPointer_FileScannerPrivate_d
}
class FileScannerPrivate {
+FileScannerPrivate(FileScanner_qq)
+~FileScannerPrivate()
+void startWorker(QList_QUrl_urls)
+void stopWorker()
+void onWorkerResultReady(FileScanner_ScanResult_result,bool_isFinal)
+void onWorkerFinished()
-FileScanner_q
-QThread_workerThread
-ScannerWorker_worker
-FileScanner_ScanResult_lastResult
-FileScanner_ScanOptions_options
}
class FileScanner_ScanResult {
+qint64 totalSize
+qint64 progressSize
+int fileCount
+int directoryCount
+bool isValid()
+void clear()
}
class ScannerWorker {
+ScannerWorker(QObject_parent)
+~ScannerWorker()
+void setUrls(QList_QUrl_urls)
+void setOptions(FileScanner_ScanOptions_options)
+void start()
+void stop()
+resultReady(FileScanner_ScanResult_result,bool_isFinal)
+finished()
-void scanLocalPaths()
-void scanOtherProtocols()
-void processFileWithStat(QUrl_url,stat64_pointer_statBuf,bool_followSymlink)
-bool shouldStop() const
-bool isInodeProcessed(quint64_device,quint64_inode)
-void markInodeProcessed(quint64_device,quint64_inode)
-void emitProgress(bool_force)
-bool isSpecialSystemFile(QString_path) const
-QList_QUrl_urls
-FileScanner_ScanOptions_options
-FileScanner_ScanResult_currentResult
-atomic_bool_stopped
-QElapsedTimer_progressTimer
-qint64 lastEmittedSize
-QHash_quint64_QSet_quint64_processedInodes
-static QSet_QString_kSpecialSystemFiles
}
FileScanner o-- FileScannerPrivate
FileScannerPrivate --> ScannerWorker
FileScanner ..> FileScanner_ScanResult
ScannerWorker ..> FileScanner_ScanResult
Flow diagram for ScannerWorker scanning logic with protocol selection and deduplicationflowchart TD
A_start["Start scan in ScannerWorker.start"] --> B_checkScheme{First_URL_scheme_is_file}
B_checkScheme -->|Yes| C_local["scanLocalPaths using fts"]
B_checkScheme -->|No| D_remote["scanOtherProtocols using InfoFactory"]
subgraph Local_scan
C_local --> C1_iterFTS["Iterate entries via fts_read"]
C1_iterFTS --> C2_type{Entry_type}
C2_type -->|Directory| C_dir["directoryCount++ and progressSize+=4096"]
C_dir --> C3_singleDepth{SingleDepth_option}
C3_singleDepth -->|Yes| C_skip["fts_set FTS_SKIP for children"]
C3_singleDepth -->|No| C1_iterFTS
C2_type -->|Regular_file| C_file["Check st_nlink and size"]
C_file --> C4_nlink{st_nlink>1}
C4_nlink -->|Yes and not_processed| C5_mark["markInodeProcessed and add_size"]
C4_nlink -->|Yes and processed| C6_countOnly["fileCount++ only"]
C4_nlink -->|No| C7_add["fileCount++ and add_size"]
C5_mark --> C8_progress["progressSize+=st_size"]
C6_countOnly --> C8_progress
C7_add --> C8_progress
C2_type -->|Symlink| C_sym["fileCount++ (optionally follow target)"]
C_sym --> C9_emitLocal["emitProgress_if_throttling_allows"]
C8_progress --> C9_emitLocal
C9_emitLocal --> C1_iterFTS
end
subgraph Remote_scan
D_remote --> D1_initQueue["Initialize directoryQueue with source_urls"]
D1_initQueue --> D2_loop{directoryQueue_not_empty_and_not_stopped}
D2_loop -->|Dir| D_dir["directoryCount++ and progressSize+=4096"]
D_dir --> D3_singleDepth{SingleDepth_option}
D3_singleDepth -->|Yes| D4_emitRemote["emitProgress_if_throttling_allows"]
D3_singleDepth -->|No| D5_iterDir["Create AbstractDirIterator and enqueue children"]
D5_iterDir --> D4_emitRemote
D2_loop -->|File| D_file["fileCount++ and add childInfo.size"]
D_file --> D4_emitRemote
D4_emitRemote --> D2_loop
end
C1_iterFTS -->|fts_read_null_or_stopped| E_postLocal["Adjust directoryCount by excluding source dirs unless IncludeSource"]
D2_loop -->|Queue_empty_or_stopped| E_postRemote["Adjust directoryCount by excluding source dirs unless IncludeSource"]
E_postLocal --> F_finish["emitProgress(force=true) and emit finished"]
E_postRemote --> F_finish
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 5 issues, and left some high level feedback:
- In
FileScanner::~FileScanneryou dereferenced->workerThreadwithout a null check (QThread::currentThread() != d->workerThreadfollowed byd->workerThread->quit()), which will crash if no worker thread was ever started; guard the whole block with aif (d->workerThread)check. - The
isFinal/successsemantics are currently inconsistent:ScannerWorker::emitProgresspasses theforceflag asisFinalandFileScannerPrivate::onWorkerResultReadyignores it, whileFileScannerPrivate::onWorkerFinishedalways emitsfinished(..., true)even on stop; either wireisFinal/successcorrectly (including cancellation) or remove these parameters to avoid confusion. - Thread and worker lifetime management is split between
FileScannerPrivate::~FileScannerPrivateandFileScanner::~FileScanner(both attempting to stop/quit/wait/delete the same QThread/worker), which risks double management; consider centralizing ownership and shutdown logic in one place to make the lifecycle clear and safe.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `FileScanner::~FileScanner` you dereference `d->workerThread` without a null check (`QThread::currentThread() != d->workerThread` followed by `d->workerThread->quit()`), which will crash if no worker thread was ever started; guard the whole block with a `if (d->workerThread)` check.
- The `isFinal`/`success` semantics are currently inconsistent: `ScannerWorker::emitProgress` passes the `force` flag as `isFinal` and `FileScannerPrivate::onWorkerResultReady` ignores it, while `FileScannerPrivate::onWorkerFinished` always emits `finished(..., true)` even on stop; either wire `isFinal`/`success` correctly (including cancellation) or remove these parameters to avoid confusion.
- Thread and worker lifetime management is split between `FileScannerPrivate::~FileScannerPrivate` and `FileScanner::~FileScanner` (both attempting to stop/quit/wait/delete the same QThread/worker), which risks double management; consider centralizing ownership and shutdown logic in one place to make the lifecycle clear and safe.
## Individual Comments
### Comment 1
<location> `src/dfm-base/utils/filescanner.cpp:67-71` </location>
<code_context>
+
+FileScannerPrivate::~FileScannerPrivate()
+{
+ if (workerThread) {
+ workerThread->quit();
+ workerThread->wait(5000);
+ delete worker;
+ delete workerThread;
+ }
+}
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid double-deleting workerThread by relying on QObject parenting or changing the parent.
`workerThread` is constructed with `q` as its parent (`workerThread = new QThread(q);`), but is also deleted here. When `FileScanner` is destroyed, it will delete its children (including `workerThread`), so this risks a double delete. Either construct `workerThread` with `this` as the parent and keep the manual delete, or keep the parent as `q` and remove `delete workerThread;`. Review `worker`’s ownership as well to avoid the same issue.
</issue_to_address>
### Comment 2
<location> `src/dfm-base/utils/filescanner.cpp:152-153` </location>
<code_context>
+{
+ d->stopWorker();
+
+ if (QThread::currentThread() != d->workerThread) {
+ d->workerThread->quit();
+ d->workerThread->wait(5000); // 最多等 5 秒
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against null workerThread when tearing down in FileScanner destructor.
`d->workerThread` is dereferenced here without a null check. If no scan was started, it stays null and this is UB; combined with the earlier manual deletion risk, this could also be a use-after-free. Please guard with something like `if (d->workerThread && QThread::currentThread() != d->workerThread)` or centralize thread shutdown in `FileScannerPrivate` and avoid touching `workerThread` here.
</issue_to_address>
### Comment 3
<location> `src/dfm-base/utils/filescanner.cpp:78-79` </location>
<code_context>
+void FileScannerPrivate::startWorker(const QList<QUrl> &urls)
+{
+ // 如果已有工作线程在运行,先停止
+ if (workerThread && workerThread->isRunning()) {
+ stopWorker();
+ }
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Starting a new scan while the previous worker is still shutting down can race and duplicate signal connections.
On `startWorker`, when `workerThread` is running you call `stopWorker()` (which only sets `stopped = true`) and then immediately reuse the same `worker`/`workerThread` and attach new `resultReady`/`finished` connections. The previous run may still be unwinding and emit `finished()` later, so the old `onWorkerFinished` can run concurrently with the new task and you can end up with multiple active connections to the same signals. Please ensure the previous thread is fully stopped before reuse (e.g. `quit()` + `wait()` or a state flag), or create a new worker/thread per start and dispose the old pair on completion.
Suggested implementation:
```cpp
void FileScannerPrivate::startWorker(const QList<QUrl> &urls)
{
// 如果已有工作线程在运行,先停止
if (workerThread && workerThread->isRunning()) {
// 确保上一次扫描的线程已经完全退出,避免信号重复连接和并发回调
workerThread->quit();
workerThread->wait();
}
```
This change assumes the current implementation of `startWorker` begins immediately after the comment (i.e. right after the line we modified). If there is still an explicit call to `stopWorker()` inside this function, it should be removed to avoid redundant state changes, since we now directly manage the thread shutdown with `quit()`/`wait()`.
</issue_to_address>
### Comment 4
<location> `src/dfm-base/utils/filescanner.cpp:124` </location>
<code_context>
+ disconnect(worker, &ScannerWorker::resultReady, this, &FileScannerPrivate::onWorkerResultReady);
+ disconnect(worker, &ScannerWorker::finished, this, &FileScannerPrivate::onWorkerFinished);
+
+ emit q->finished(lastResult, true);
+}
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** `finished(success)` always reports success=true, even on user stop or traversal errors.
Since `FileScanner::finished` exposes a `success` flag, hardcoding it to `true` means callers can’t distinguish a complete traversal from one that hit `FTS_ERR/FTS_DNR/FTS_NS`, was explicitly stopped, or ended early based on `isFinal` in `resultReady`. Please track a success state in the worker (e.g., set to false on errors or `stop()`), and emit that value here instead of always `true`.
Suggested implementation:
```cpp
void FileScannerPrivate::onWorkerFinished()
{
// 断开连接
disconnect(worker, &ScannerWorker::resultReady, this, &FileScannerPrivate::onWorkerResultReady);
disconnect(worker, &ScannerWorker::finished, this, &FileScannerPrivate::onWorkerFinished);
const bool success = worker ? worker->isSuccess() : true;
emit q->finished(lastResult, success);
}
#include "filescanner.h"
```
To fully implement the behavior you described, `ScannerWorker` needs to track and expose its success state:
1. In the `ScannerWorker` class (likely in `filescanner.h/.cpp` or a related worker file):
- Add a private member, initialized to `true` at the start of each scan:
- `bool m_success { true };`
- Add a public const accessor:
- `bool isSuccess() const { return m_success; }`
2. Ensure `m_success` is set to `false` in these situations:
- When traversal hits `FTS_ERR`, `FTS_DNR`, `FTS_NS`, or analogous error conditions in your implementation.
- When the user explicitly stops the scan via `stop()` or equivalent API.
- When early termination is triggered based on `isFinal` logic, if you consider that a non-successful completion.
3. Reset `m_success` to `true` at the beginning of each new traversal (e.g., in the worker’s `start()` / `run()` / setup method) so each scan reports success independently.
After these changes, `FileScanner::finished(lastResult, success)` will correctly distinguish a complete traversal from one that ended due to errors or early termination.
</issue_to_address>
### Comment 5
<location> `src/dfm-base/utils/filescanner.h:139` </location>
<code_context>
+ * @brief 进度更新信号
+ * @param result 当前统计结果
+ *
+ * 发送频率:每200ms或每处理1000个文件
+ */
+ void progressChanged(const ScanResult &result);
</code_context>
<issue_to_address>
**nitpick:** The documented progress signal frequency does not match the implementation.
The comment says `progressChanged` is emitted every 200ms or every 1000 files, but `emitProgress()` currently uses a 500ms interval and a 10MB size-delta threshold. This discrepancy can confuse callers relying on the documented behavior. Please update either the implementation or the comment (e.g., to describe the 500ms/10MB strategy) so they match.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
a545f79 to
de702a9
Compare
1. Implement FileScanner class for asynchronous file system statistics collection 2. Add ScannerWorker with optimized local file traversal using fts(3) API 3. Support both local filesystem and other protocols (SMB, SFTP) via InfoFactory 4. Include hard link deduplication using device+inode tracking 5. Add progress reporting with throttling 6. Create test/demo program with GUI for directory statistics 7. Add CMake option OPT_ENABLE_BUILD_TESTS to control test program compilation Log: Added file scanner utility for directory statistics with demo program Influence: 1. Test FileScanner with various directory structures and file types 2. Verify hard link handling and deduplication functionality 3. Test progress reporting and throttling behavior 4. Verify both local filesystem and protocol-based scanning 5. Test demo program with single and multiple directory selections 6. Verify scanner stops correctly during application shutdown 7. Test with special system files and symlink handling feat: 添加文件扫描器工具及演示程序 1. 实现 FileScanner 类用于异步文件系统统计收集 2. 添加 ScannerWorker,使用 fts(3) API 优化本地文件遍历 3. 支持本地文件系统和其他协议(SMB、SFTP)通过 InfoFactory 4. 包含使用设备+inode 跟踪的硬链接去重功能 5. 添加节流进度报告 6. 创建带有 GUI 的测试/演示程序用于目录统计 7. 添加 CMake 选项 OPT_ENABLE_BUILD_TESTS 控制测试程序编译 Log: 新增文件扫描器工具及目录统计演示程序 Influence: 1. 使用各种目录结构和文件类型测试 FileScanner 2. 验证硬链接处理和去重功能 3. 测试进度报告和节流行为 4. 验证本地文件系统和基于协议的扫描 5. 测试演示程序的单选和多选目录功能 6. 验证应用程序关闭时扫描器正确停止 7. 测试特殊系统文件和符号链接处理
deepin pr auto reviewGit Diff 代码审查报告总体评价这段代码实现了一个异步文件扫描器,用于统计目录的文件数量和大小,并提供了一个测试程序。整体设计合理,使用了Qt的信号槽机制和线程模型,代码结构清晰,但存在一些可以改进的地方。 详细审查1. CMakeLists.txt语法逻辑
代码质量
代码性能
代码安全
2. filescanner.h语法逻辑
代码质量
代码性能
代码安全
3. filescanner.cpp语法逻辑
代码质量
代码性能
代码安全
4. 测试代码语法逻辑
代码质量
代码性能
代码安全
改进建议1. 代码质量改进
2. 代码性能改进
3. 代码安全改进
4. 其他改进
总结这段代码整体质量较高,设计合理,实现正确。主要改进点在于:
建议在后续开发中,逐步实现这些改进,以提高代码质量和用户体验。 |
|
/forcemerge |
|
This pr force merged! (status: blocked) |
Log: Added file scanner utility for directory statistics with demo program
Influence:
feat: 添加文件扫描器工具及演示程序
Log: 新增文件扫描器工具及目录统计演示程序
Influence:
Summary by Sourcery
Introduce an asynchronous file scanning utility with directory statistics support and a GUI-based demo application, guarded by a new optional tests build flag.
New Features:
Build:
Tests: