We have a legacy project on Windows; it dates back to the 1990s. Until very recently it was not under any version control, but it has now been moved into Git.
The question is about all the prior snapshots. As of now, they are spread across network shared folders, with folder names usually suggesting the part of the project, the date or product version, and sometimes the specific programmer from whose development machine the snapshot was copied.
Needless to say, this takes a lot of disk space, mostly on many copies of exactly identical files, and now that Covid-19 makes many of us work from home, that is becoming a problem.
These legacy sources are rarely needed, but sometimes one just has to search for a specific identifier (variable, function, file, etc.) or even a statement, to find out when and why it was introduced.
Copying them onto my home machine would be slow and would occupy too much of my local disk. The same goes for the other developers. Even if I zipped (or PPMd-compressed) the files on the server, it would still be a huge blob to copy over RDP, and I would not be able to search for text inside that archive without a speed penalty, or without unpacking it first, which would take a big hit on local disk space on every developer machine.
I can't move them into the existing Git repository: when I set it up for my own convenience, I didn't even know about these archives, much less have access to them.
Anyway, Git is not a solution here. Git (or any other DVCS) will de-duplicate the data and greatly cut the disk space required, indeed. But Git maintains a "one true current snapshot" model, which is useful for development but not for "industrial archeology" (yes, I could set up multiple repositories and switch them to different branches, but that defeats the idea). I just can't run a text search through ALL the snapshots stored inside a Git repo, across all their branches and revisions/commits, with the same ease that I can run grep (or a similar GUI tool) over thousands of mostly plain-text files, even on a shared network folder.
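As a baseline, even the folders as they are today can answer the "which snapshot first mentions this identifier" question with a small script. This is only a minimal sketch of that workflow; the function name and extension list are my own, and in practice you would point it at the network share:

```python
import os

def snapshots_containing(root, needle, exts={".c", ".h", ".cpp", ".txt"}):
    """Return the set of top-level snapshot folders under `root`
    whose text files contain the byte string `needle`."""
    hits = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() not in exts:
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    if needle in f.read():
                        # first path component under root = snapshot folder
                        rel = os.path.relpath(path, root)
                        hits.add(rel.split(os.sep)[0])
            except OSError:
                pass  # unreadable file on the share; skip it
    return hits

# Hypothetical usage against a share of dated snapshot folders:
# for snap in sorted(snapshots_containing(r"\\server\legacy", b"FooBar")):
#     print(snap)
```

Sorting the resulting folder names (which usually embed dates) then hints at when the identifier appeared. It works, but it re-reads everything over the network on every query, which is exactly the pain point.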
Moreover, I don't have the information to recover the "real" scheme of all the "branching" that happened among people I never met. While guessing the timestamps of each folder is doable, even if time-consuming, inventing some formally correct relationship between snapshots like "from John" and "from Mary" is not.
Furthermore, putting the files into Git would lose their timestamp metadata. While timestamps don't matter much for sources that were in a DVCS from the get-go, for pre-VCS sources they carry a lot of meaning, even if only for guessing which modules were alive and which were derelicts that people just couldn't practically delete without a VCS.
So, how do people manage situations like this?
I think I need some system that:
- runs on Windows
- deduplicates files
- maintains the original paths, names and timestamps
- has useful, easy-to-learn full-text search that does not restrict me to one "current" snapshot out of the many stored
- ideally, full-text search should work like the usual "Search text in files", with no practical performance hit compared to just storing files and folders "as is" (so a Zip archive is not an option, and it has no deduplication either): zero learning curve and the freedom to use whatever grep-like and diff-like tools one might prefer
- doesn't require formal relationships between snapshots, though optional timeline relations would be a nice bonus; it would even be okay to have no concept of snapshots at all, which would be no worse than what we have today
- can be easily and efficiently copied over a relatively slow network (so a natively de-duplicating filesystem for Windows Server would not help much: it would be problematic to copy the data over the network without re-duplication, and such a filesystem will probably not be available on non-Server development machines)
- has a low maintenance burden
- ideally detects and fixes, or at least warns about, random data degradation events (bit flips on HDD/SSD, etc.)
- never needs to modify the data after initially storing it; in fact, strictly read-only would be better
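For the data-degradation point specifically, even without a dedicated system, a checksum manifest built once at archiving time can detect bit rot later. A minimal sketch under my own naming (SHA-256 over every file, manifest as a plain dict you could dump to JSON):

```python
import hashlib
import os

def build_manifest(root):
    """Record the SHA-256 of every file under `root`
    as a dict of relative path -> hex digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            manifest[os.path.relpath(path, root)] = h.hexdigest()
    return manifest

def verify(root, manifest):
    """Return relative paths whose current hash differs from the
    recorded one (bit rot, truncation, or tampering)."""
    current = build_manifest(root)
    return sorted(p for p, digest in manifest.items()
                  if current.get(p) != digest)
```

This only warns; it cannot fix anything by itself, so it would need to be paired with a second copy or parity data for actual repair.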
I think hypothetically this could mostly be implemented on top of NTFS 5 symbolic links, and perhaps somebody has already implemented it?
Or maybe I just fail to think outside the box and am missing a different workflow that is easy to learn and use.
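A closely related variation of that NTFS idea is content-hash deduplication via hard links (Python's `os.link` creates NTFS hard links on Windows, provided all copies live on the same volume). A sketch, under the assumption that the archive is strictly read-only after this runs:

```python
import hashlib
import os

def dedupe_tree(root):
    """Replace byte-identical files under `root` with hard links to a
    single canonical copy. Paths, names and directory layout are
    preserved; returns the number of bytes reclaimed."""
    first_seen = {}  # sha256 digest -> path of the canonical copy
    saved = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                # whole-file read for brevity; chunk it for huge files
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest in first_seen:
                size = os.path.getsize(path)
                os.remove(path)
                os.link(first_seen[digest], path)  # hard link, not a copy
                saved += size
            else:
                first_seen[digest] = path
    return saved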