Implementation of a subset of git features
- Learn how git works in depth
- Try Scala 3
- Have several loosely-coupled interchangeable components thanks to hexagonal architecture
- Try to integrate practices and patterns from DDD
- (double loop) TDD approach
- motivations and presentation of the objectives
- generated project: `sbt new scala/scala3.g8`
- hash a blob
  - What is a blob?
  - SHA1 of file with a prefix: `blob <content_size>\0<content>`
  - Hash of a blob: `echo -n 'test content' | git hash-object --stdin`
  - Comparing with the SHA1 hash of the same string with its prefix (`printf` is used because not every shell's `echo` interprets `\0`): `printf 'blob 12\0test content' | shasum -a 1`
📺 Episode 2: Refactoring to use hexagonal architecture and introduce concepts like Command and UseCase
- refactoring and extension of the code to support other input options (file, write in database, type, etc.)
- setup domain and infrastructure packages (hexagonal architecture)
- write a test for Main
- introducing a `HashObjectCommand`
- add zio (resource management, streaming, retries, parallelism, etc.)
- objective of the chapter: making a commit
- hash stdin string
  - change the way the command is used: `hash-object --text "test content"` instead of `hash-object "test content"`
- Fix the encoding issue
- Hashing a stream of bytes (ZStream and ZSink)
- Write test to hash a file
- Refactor so the hash object usecase accepts several types of command
- Implement hashing a file
- Model the return type of the usecase with a richer type
- Update test to hash several files and implement
- [Refactor/hexagonal arch.] extract reading a file and have the implementation in the infrastructure package.
- problem in the hash object usecase
- fixing the problem
- [Business Logic] write a blob in git objects directory
- create an ObjectRepository
- [/] write a test for HashObjectUseCase verifying that the repository is called
- [Business Logic] write a blob in git objects directory
- create an ObjectRepository
- write a test for HashObjectUseCase verifying that the repository is called
- [/] create the implementation for the repository and test
- what to test? we are looking to test compatibility with Git: right place, right format
- [Business Logic] write a blob in git objects directory
- Object Repository File System
- Refactor the ObjectRepositoryFileSystemSpec to generate a single hash to avoid a "cache" issue.
- Implement Object Repository File System
- Check that hash object use case is calling the object repository with the right value (with the blob + size prefix)
- Put things together: hash and save a blob from the app and try to read it with git
- Test missing: not call the repository when the save option is false
- refactor main to extract the parsing and the formatting part
- [Business Logic] read and write git index file
- read the git index file
- [Business Logic] read and write git index file
- create a dummy index file and read it
- refactor the code to use case classes
- [Business Logic] read and write git index file
- productionize the code
- [Business Logic] write a tree in git object directory
- refactor the MainSpec to separate the concerns
- use a more specific type than string for dealing with files
- [Business Logic] write a tree in git object directory
- [Business Logic] write a commit (with a tree hash provided)
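
The use case and repository steps listed above (check that the repository receives the blob with its size prefix, and that it is not called when the save option is false) can be illustrated with a small sketch. It is written in Python for brevity, not in the project's Scala 3, and it is not the project's actual code: the names `HashObjectCommand`, `HashObjectUseCase` and `ObjectRepository` come from the outline, while the `handle` and `save` methods, the field names and `InMemoryObjectRepository` are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Protocol


class ObjectRepository(Protocol):
    """Port: the domain only knows that objects can be saved somewhere."""

    def save(self, hex_hash: str, store: bytes) -> None: ...


@dataclass
class HashObjectCommand:
    content: bytes
    save: bool = False  # hypothetical flag for the "save option"


class HashObjectUseCase:
    def __init__(self, repository: ObjectRepository) -> None:
        self._repository = repository

    def handle(self, command: HashObjectCommand) -> str:
        # The stored value is the content with its "blob <size>\0" prefix.
        size = str(len(command.content)).encode("ascii")
        store = b"blob " + size + b"\0" + command.content
        hex_hash = hashlib.sha1(store).hexdigest()
        if command.save:
            self._repository.save(hex_hash, store)
        return hex_hash


class InMemoryObjectRepository:
    """Test adapter: records what the use case asked to save."""

    def __init__(self) -> None:
        self.saved: dict[str, bytes] = {}

    def save(self, hex_hash: str, store: bytes) -> None:
        self.saved[hex_hash] = store
```

A file-system adapter would implement the same `save` method, so the use case can be tested without touching the disk.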
Source: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain
Git uses the concept of objects. There are three main types of object (a fourth type, the annotated tag, is not covered here):
- blobs. A blob represents the content of a file. It is stored in a file named after the SHA1 hash of that content prefixed with a `blob <content_size>\0` header.
- trees. Trees represent the hierarchy between blobs. A tree lists the blobs and other trees it contains, each with its mode, hash and name. For instance:
100644 blob dc711f442241823069c499197accce1537f30928 .gitignore
100644 blob e5d351c3cd44aa1d8c1cb967c7e7fde1dee4b0ad README.md
100644 blob 7a010b786eb29b895ba5799306052b996516d63b build.sbt
040000 tree 8bac5f27882165d313f5732bb4f140003156c693 project
040000 tree 163727ec9bd17ef32ee088a52a31fe0b483fa18f src
  - there are different types of files: `100644` is a normal file, `100755` is an executable file, `120000` is for symbolic links, `040000` is for trees and `160000` is for sub-modules
- commits. Commits are used to capture:
  - the `tree` snapshot of the code
  - the `parent`(s) commits. Usually a commit has only one parent, but it can have 0 to n parents. The first commit does not have any parent. A merge commit has several parents (usually 2).
  - the `author`
  - the `committer`
  - a blank line
  - the commit `message`
These objects are stored in `.git/objects`. Each object file, whether it represents a blob, a tree or a commit, is stored in a directory named after the first two characters of its hexadecimal hash. For example, the object with the hash `dc711f442241823069c499197accce1537f30928` is stored in the folder `.git/objects/dc`.
The filename is the rest of the hash, without those first two characters: `711f442241823069c499197accce1537f30928` (the prefix `dc` has been removed). The file corresponding to the hash `dc711f442241823069c499197accce1537f30928` is therefore `.git/objects/dc/711f442241823069c499197accce1537f30928`.
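
As a minimal sketch of this naming rule in Python (`object_path` is a hypothetical helper, and the `.git` location is passed in rather than discovered):

```python
from pathlib import Path


def object_path(git_dir: Path, hex_hash: str) -> Path:
    """Map an object hash to its loose-object file under <git_dir>/objects."""
    # First two hex characters name the directory, the remaining 38 the file.
    return git_dir / "objects" / hex_hash[:2] / hex_hash[2:]


print(object_path(Path(".git"), "dc711f442241823069c499197accce1537f30928").as_posix())
# prints .git/objects/dc/711f442241823069c499197accce1537f30928
```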
zlib is a C library used for data compression. It only supports one algorithm: DEFLATE (also used in the zip archive format), which is widely used. Git compresses each object file with zlib before writing it to `.git/objects`.
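
A short sketch of that compression step, using Python's standard `zlib` module rather than the project's Scala code. The helper names are hypothetical, and the sketch assumes that the bytes being compressed are the header (`<type> <size>\0`) followed by the content:

```python
import zlib


def deflate_object(kind: str, body: bytes) -> bytes:
    """Compress '<kind> <size>', a NUL byte and the body with zlib."""
    store = f"{kind} {len(body)}".encode("ascii") + b"\0" + body
    return zlib.compress(store)


def inflate_object(data: bytes) -> bytes:
    """Recover the header and body from the bytes of an object file."""
    return zlib.decompress(data)


compressed = deflate_object("blob", b"test content")
print(inflate_object(compressed))  # b'blob 12\x00test content'
```

The hash is computed on the uncompressed bytes, so the compression level does not affect the object's name.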
Format of the index file: https://git-scm.com/docs/index-format
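
According to that format description, the index file starts with a 12-byte header: the 4-byte signature `DIRC`, a 4-byte version number and a 32-bit count of index entries, all in network byte order. A minimal Python sketch of reading it (`parse_index_header` is a hypothetical name, and the entries that follow the header are not parsed here):

```python
import struct


def parse_index_header(data: bytes) -> tuple[int, int]:
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    # ">4sII": big-endian, 4-byte signature, two unsigned 32-bit integers.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


# Hand-built header for illustration: version 2, 3 entries.
print(parse_index_header(b"DIRC" + struct.pack(">II", 2, 3)))  # (2, 3)
```

On a real repository the input would be the bytes of `.git/index`.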
- `git cat-file` show information about an object
  - `-p <hash>` show the content of an object. `hash` can be `master^{tree}` to reference the tree object pointed to by the last commit on master.
  - `-t <hash>` show the type of object
- `git hash-object` (explicit)
- `git update-index` register file contents in the working tree to the index
- `git write-tree` writes the staging area to a tree object
- `git ls-files`
  - `--stage` or `-s` show all tracked files with their mode, object hash and stage number
- `zlib-flate -uncompress < .git/objects/18/7fbaf52b4fdebd0111740829df5b51edc8b029` another program that inflates (decompresses) object files
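
To make the link between these commands and the object files concrete, here is a rough Python sketch of what `git cat-file -t` and `-p` have to do for a loose object: inflate the file, split the header from the body, and for a tree decode the binary entries. The function names are hypothetical, packfiles are ignored, and the demo writes its own object into a temporary directory rather than reading a real repository.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def read_object(git_dir: Path, hex_hash: str) -> tuple[str, bytes]:
    """Return (type, body) of a loose object, like cat-file -t and -p combined."""
    path = git_dir / "objects" / hex_hash[:2] / hex_hash[2:]
    store = zlib.decompress(path.read_bytes())
    header, _, body = store.partition(b"\0")
    kind, _size = header.decode("ascii").split(" ")
    return kind, body


def parse_tree(body: bytes) -> list[str]:
    """Render tree entries as '<mode> <type> <hash>' TAB '<name>' lines."""
    lines, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode("ascii").rjust(6, "0")
        name = body[space + 1:nul].decode("utf-8")
        hex_hash = body[nul + 1:nul + 21].hex()  # 20 raw bytes after the NUL
        kind = "tree" if mode == "040000" else "commit" if mode == "160000" else "blob"
        lines.append(f"{mode} {kind} {hex_hash}\t{name}")
        pos = nul + 21
    return lines


# Demo: write one loose blob into a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    store = b"blob 12\0test content"
    hex_hash = hashlib.sha1(store).hexdigest()
    path = Path(tmp) / "objects" / hex_hash[:2] / hex_hash[2:]
    path.parent.mkdir(parents=True)
    path.write_bytes(zlib.compress(store))
    print(read_object(Path(tmp), hex_hash))
```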
- https://git-scm.com/book/sv/v2/Git-Internals-Git-Objects
- https://stackoverflow.com/questions/4084921/what-does-the-git-index-contain-exactly
- https://git-scm.com/docs/gitglossary
- https://github.com/git/git/blob/master/Documentation/technical/index-format.txt
- https://git-scm.com/book/en/v2/Git-Internals-Packfiles
- Good explanations about the format of git tree https://stackoverflow.com/questions/14790681/what-is-the-internal-format-of-a-git-tree-object
