% Goddess of Justice DB, the database used for storage on IzaroDFS
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The Legrand Orange Book
% LaTeX Template
% Version 2.4 (26/09/2018)
%
% This template was downloaded from:
% http://www.LaTeXTemplates.com
%
% Original author:
% Mathias Legrand (legrand.mathias@gmail.com) with modifications by:
% Vel (vel@latextemplates.com)
%
% License:
% CC BY-NC-SA 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
%
% Compiling this template:
%
% 1) pdflatex main
% 2) makeindex main.idx -s StyleInd.ist
% 3) biber main
% 4) pdflatex main x 2
%
% After this, when you wish to update the bibliography/index use the appropriate
% command above and make sure to compile with pdflatex several times
% afterwards to propagate your changes to the document.
%
% Chapter heading images should have a 2:1 width:height ratio,
% e.g. 920px width and 460px height.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%----------------------------------------------------------------------------------------
% PACKAGES AND OTHER DOCUMENT CONFIGURATIONS
%----------------------------------------------------------------------------------------
\documentclass[12pt,fleqn]{book} % Default font size and left-justified equations
\usepackage[T1]{fontenc}
\usepackage[automake]{glossaries}
\usepackage{amssymb}
\usepackage{wasysym}
\usepackage{eurosym}
\usepackage{amsmath}
\usepackage{numprint}
\usepackage{bytefield}
\usepackage{siunitx}
\usepackage{placeins}
\usepackage{pgf-umlsd}
\usepackage{pgf-umlcd}
\usepackage{adjustbox}
\usepackage{multirow}
\usepackage{enumitem}
\usepackage{hhline}
\usepackage{pgfplots}
\usepackage{float}
\usepackage{fp}
\let\Oldsection\section
\renewcommand{\section}{\FloatBarrier\Oldsection}
\let\Oldsubsection\subsection
\renewcommand{\subsection}{\FloatBarrier\Oldsubsection}
\let\Oldsubsubsection\subsubsection
\renewcommand{\subsubsection}{\FloatBarrier\Oldsubsubsection}
\author{Ludovic `Archivist' Lagouardette}
\title{SStorage}
\date{2019}
\makeglossaries
\input{structure.tex} % Insert the commands.tex file which contains the majority of the structure behind the template
%\hypersetup{pdftitle={Title},pdfauthor={Author}} % Uncomment and fill out to include PDF metadata for the author and title of the book
%----------------------------------------------------------------------------------------
\begin{document}
%----------------------------------------------------------------------------------------
% TITLE PAGE
%----------------------------------------------------------------------------------------
\begingroup
\thispagestyle{empty} % Suppress headers and footers on the title page
\begin{tikzpicture}[remember picture,overlay]
\node[inner sep=0pt] (background) at (current page.center) {\includegraphics[width=\paperwidth]{background.pdf}};
\draw (current page.center) node [fill=ocre!30!white,fill opacity=0.6,text opacity=1,inner sep=1cm]{\Huge\centering\bfseries\sffamily\parbox[c][][t]{\paperwidth}{\centering SStorage\\[15pt] % Book title
{\Large Whitepaper}\\[20pt] % Subtitle
{\small Ludovic `Archivist' Lagouardette}}}; % Author name
\end{tikzpicture}
\vfill
\endgroup
\frontmatter
%----------------------------------------------------------------------------------------
% COPYRIGHT PAGE
%----------------------------------------------------------------------------------------
\newpage
~\vfill
\thispagestyle{empty}
\noindent Copyright \copyright\ 2019 NekoIT\\ % Copyright notice
\noindent \textsc{Published by NekoIT}\\ % Publisher
\noindent\includegraphics[width=8cm]{Pictures/nekoit_title}
\noindent \textsc{https://archivist.nekoit.xyz}\\ % URL
\noindent This document is under the Creative Commons BY-NC-SA 3.0 license.\\ % License information, replace this with your own license (if any)
\noindent \textit{First printing, 2019} % Printing/edition date
%----------------------------------------------------------------------------------------
% TABLE OF CONTENTS
%----------------------------------------------------------------------------------------
%\usechapterimagefalse % If you don't want to include a chapter image, use this to toggle images off - it can be enabled later with \usechapterimagetrue
\chapterimage{chapter_head_1.jpg} % Table of contents heading image
\cleardoublepage % Forces the first chapter to start on an odd page so it's on the right side of the book
\pagestyle{fancy} % Enable headers and footers again
\chapter{Introduction}
\pagestyle{empty} % Disable headers and footers for the following pages
\tableofcontents % Print the table of contents itself
\pagestyle{fancy} % Enable headers and footers again
\mainmatter
\part{Project presentation}
\chapter{The state of cloud storage}
\vspace{-5em}
\begin{center}
{\tiny All brands and companies belong to their rightful owners. We are independent from them and are only quoting them for the sake of comparison.}
\end{center}
\vspace{5em}
\vspace{-3em}\hspace{-1.4em}Cloud storage has become popular for several reasons: it saves space on mobile devices, backs up important data, and gives access to the same data from multiple devices.
\section{Competition}
Multiple service providers, such as Google, Amazon, Ovh and Apple, as well as some smaller actors, have tackled the task of storing data, each with its own use cases, pricing tables and options.
This section reports on the current state of the competition. We do not aim to cover the whole market, only a number of important or interesting actors.
\subsection{Google Drive and Google Cloud Platform}
Google is famous for its economic impact on the software development industry. This is also true of its Google Drive and Google Cloud Platform products.
Google Drive could be labeled the cheapest way to back up multiple terabytes\autocite{ltt_backup}. Another of their products, Google Cloud Storage, is a cheap yet very efficient tool to store data for applications and websites.
They do not, however, offer any kind of protection on their services: the data stored on their side is not encrypted, and they may use it for advertisement purposes, for example. Their offer is nonetheless not misleading, even though their product is not privacy centered at all.
\subsection{Amazon Cloud Drive and their variety of services}
It is possible to expand at length on how varied and efficient Amazon cloud storage is. They provide nearly all types of storage for any type of data structure, from the typical file system to the most advanced database layouts.
Their prices are slightly higher than those of Google. Like Google, however, they only offer encryption of the data while in transit.
Most Amazon cloud services are targeted at professional users.
\subsection{Operation Tulip (NextCloud over Ceph)}
Operation Tulip is an open-source initiative that offers a simple cloud suite with file storage and tools such as a calendar and an online LibreOffice implementation.
This service is in open beta\footnote{It can be tested by anyone.} and uses open-source software, Ceph and NextCloud, to hold encrypted data with storage redundancy.
It is not meant for actively working on the data but rather as a backup solution and cold storage.
\subsection{Backblaze}
Backblaze is a data backup company that offers multiple solutions with diverse options. They permit the use of some forms of secure encryption. However, they do not encrypt all of the metadata and reserve the right to sell it to a variety of companies.
One of their offers also provides storage for live data. Their prices are relatively competitive compared to Amazon Cloud Storage, for example.
\subsection{Dropbox}
Dropbox is a well-known actor in cloud backup storage. They provide multiple pricing tiers, from a free offer to several paid storage offers. All of them are meant for dead storage, for roaming and for sharing files.
\subsection{Tarsnap}
Tarsnap is a small actor based in Canada. They offer cold storage services that are encrypted and open-source on the client side. Their pricing is on an \textit{``as you go''} basis, charging for network traffic as well as for storage used.
The performance of the service was not tested, but judging from its software architecture and design, it may be relatively slow as active storage because it is delta based, and it is unlikely to be usable with a high degree of concurrency.
\section{Technology}
Multiple technologies and their open-source counterparts can be used to handle online data storage. In this section we explore those possibilities by comparing commercial and free solutions where possible.
\subsection{Google Spanner and CockroachDB}
Google Spanner and CockroachDB are two database systems for geo-replicated databases. They use a clock-based mechanism for handling transactions, which makes them as fast as their clock synchronization allows. CockroachDB, however, has lower requirements on clock accuracy than Google Spanner\autocite{cockroach_atomic}.
Google Spanner is a proprietary product from Google. CockroachDB is an open-source project from Cockroach Labs made to implement as many of Google Spanner's features as possible. It also aims to be compatible with PostgreSQL to ease application porting\autocite{cockroach_postgres}.
Both of those tools can be used to implement either a block-based storage or an object storage on which a geo-replicated filesystem can be built.
We considered using CockroachDB as a back-end for SStorage, but latency tests led us to choose a custom data server instead. SStorage resolves database conflicts in a similar way (see the sequence diagrams \ref{fig:confirmation_proto} and \ref{fig:2user_confirmation_proto} on pages \pageref{fig:confirmation_proto} and \pageref{fig:2user_confirmation_proto}).
\subsection{Ceph, RADOS and CRUSH}
Ceph is a distributed data storage system. It is built on RADOS (Reliable Autonomic Distributed Object Store), a storage system designed around the idea of placing data in a predictable location computed by a deterministic function. This placement algorithm is named CRUSH, for Controlled Replication Under Scalable Hashing.
The placement of data in the SStorage system follows some concepts from Ceph, RADOS and CRUSH.
Ceph is an actively developed open-source project, and CERN is among its large-scale users. It serves as a back-end for many types of storage, from filesystems to block devices and storage for scientific data.
As mentioned in the listing of other actors, the Operation Tulip project uses it to store the files it manipulates. Other actors not mentioned there, such as Ovh, use it to store virtual machines and as storage for cloud computing, for example.
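The idea of computing where data lives, rather than looking it up in an index, can be illustrated with a minimal sketch. It uses rendezvous hashing, a much simpler technique than CRUSH and not the algorithm used by Ceph or by SStorage; the node names and the \texttt{place} function are hypothetical.

```python
from __future__ import annotations

import hashlib


def place(object_name: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Return the nodes that should hold `object_name`.

    Rendezvous (highest-random-weight) hashing, used here as a simple
    stand-in for CRUSH: the answer is computed from the object name and
    the node list alone, without any lookup table.
    """
    def weight(node: str) -> int:
        # Each (node, object) pair gets a pseudo-random but repeatable score.
        digest = hashlib.sha256(f"{node}/{object_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The highest-scoring nodes hold the replicas.
    return sorted(nodes, key=weight, reverse=True)[:replicas]


# Hypothetical node names, for illustration only.
NODES = ["store-a", "store-b", "store-c"]
print(place("block-0001", NODES))
```

Because the result depends only on the object name and the node list, any client holding the same list finds the same nodes without asking a central index.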
\subsection{NextCloud}
NextCloud is an open-source system written in PHP, meant to be used as a front-end for cloud hosting. It supports WebDAV and other protocols, and provides multiple productivity features such as text editing, spreadsheets and calendars.
It is slow, which we attribute to its being written in a programming language unsuited to performance-critical applications.
It supports end-to-end encryption.
\section{Hardware and hosting}
Calling it cloud storage does not mean that the data lives in some phantasmagorical place. We therefore study here the options for deploying one's own cluster of servers to host one's own data.
To that end, we compare the price of the hardware required to deploy such a solution online for around 50\si{\tera{}B (\pm 5\percent)} of storage.
\begin{table}[h]
\centering
\begin{tabular}{|l|l|l|l|}
\hline
System & Upfront price & Price per GB per year & Depreciation (years) \\\hhline{|=|=|=|=|}
SuperMicro SC825TQ-560LP $\times 3$ & USD15100 & USD0.21 & 5 \\
and SuperMicro 5018D-MF (new) & +USD900/month & & \\\hline
HP~ProLiant~DL180-G5 $\times 4$ & USD3350 & USD0.21 & 3 \\
(refurbished) & +USD900/month & & \\\hline
Ovh rented servers $\times 4$ & USD780/month & USD0.20 & 0 \\\hline
\end{tabular}
\textit{Performance also decreases with each category down: the higher the upfront price, the higher the performance. Monthly fees for non-rented items cover the housing of the hardware.}
\caption{Server pricing}
\label{tab:server_pricing}
\end{table}
As \autoref{tab:server_pricing} shows, buying hardware in 2019 is not as cost-efficient as renting servers if you cannot host the servers on your own premises.
\subsection{Brand new hardware}
Brand new hardware is generally a real investment for an individual or a new company. It is also a technical choice that can have lifelong consequences for the business, as computers are subdivided into families that each have specific features and behaviours.
Of those families, named architectures, we will consider two: \texttt{x86\_{}64}, also referred to as \texttt{amd64}; and the most recent architecture from the \texttt{ARM} group, \texttt{ARMv8}, and its variants.
Both of them offer the minimal set of features needed to implement a type of storage named a \texttt{memory mapped hash table}.
\subsubsection{\texttt{x86\_{}64} architecture}
This architecture is common to most modern computers: laptops, workstations and servers. It is well documented, which makes it easy to write software for.
It has the major drawback of being power hungry: having been extended over time, it tends to consume more power and to require proportional cooling.
Taking the SuperMicro SC825TQ-560LP as an example, we estimate a price of around 4'500USD per server, with three such servers for data storage, plus one other server to handle the coordination of data storage.
For coordination, a server along the lines of a SuperMicro 5018D-MF is appropriate. We estimate its price at about 1'600USD when equipped with a network card able to properly handle the connections to the storage servers.
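As a check, these two estimates account for the upfront price listed for new hardware in \autoref{tab:server_pricing}, namely three storage servers plus one coordination server:
\begin{equation*}
3 \times 4500 + 1600 = 15100 \text{ USD}
\end{equation*}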
\subsubsection{\texttt{ARMv8} architecture}
Since this architecture is new, it is hard to venture a price for it. Storage-oriented servers equipped with ThunderX2 CPUs from Cavium would do well as storage servers, and ThunderX2 servers adapted to computationally heavy loads would fit the role of coordination server.
This setup is however untested, and we are not able to test it at the time of writing. These servers also may not run some operating systems critical to the safety of network infrastructure, such as OpenBSD, at the time of writing\footnote{Last verified on January 14th 2020.}.
\subsection{Refurbished hardware}
We looked into the products of professional sellers of refurbished hardware. For those with a small budget, we advise a constellation of HP~ProLiant~DL180-G5 servers, at about 950USD per server (disks being new), plus any server with network connectivity decent enough not to become a bottleneck in coordination.
\subsection{Rented dedicated servers}
As for dedicated servers, Ovh rents servers for 230USD per month per server, with an added 80USD per month for the coordination server. This does not include any additional backup server needed to guarantee fast replication if one of the three servers fails, but it does take all hosting costs into account.
\section{The users}
We conducted a survey on Telegram and Discord about privacy in cloud storage and cloud storage usage. With hindsight, some questions led to unsurprising responses given the population surveyed (Telegram being quite security oriented), but the results may still be interesting in certain regards.
The survey was conducted on 23 people, most of them from computer science and programming related groups on Telegram and from politics and economics related groups on Discord.
\begin{figure}[h]
\centering
\begin{tikzpicture}
\begin{axis}[
ybar,
enlargelimits=0.15,
legend style={at={(0.5,-0.15)},
anchor=north,legend columns=-1},
ylabel={Percentage of surveyed people},
symbolic x coords={Not so concerned,Concerned,Very concerned},
xtick=data,
nodes near coords,
nodes near coords align={vertical},
]
\addplot coordinates {(Not so concerned,18.2) (Concerned,13.8) (Very concerned,68.2)};
\end{axis}
\end{tikzpicture}
\caption{Concerns about privacy}
\label{fig:privacy_concerns}
\end{figure}
\FPeval{telegramvpn}{round(800/15, 2)}
\FPeval{notelegramvpn}{round(400/8, 2)}
First of all, most of the people surveyed were Telegram users. Among Telegram users, \telegramvpn\% use a VPN, and \notelegramvpn\% of the surveyed people who do not use Telegram use a VPN.
The survey also shows that the privacy-concerned population has significantly more trust in cryptography approved by the open-source community than in government-approved cryptography.
Many people also use encrypted hard drives, but most are cold to the use of cloud storage. From a few interviews, most of them harbor distrust of the major cloud storage service providers. Our goal is to provide a solution that these people can trust to store their hot and cold data.
\begin{figure}
\centering
\begin{tikzpicture}
\begin{axis}[
ybar,
enlargelimits=0.15,
legend style={at={(0.5,-0.15)},
anchor=north,legend columns=-1},
ylabel={Percentage of surveyed people},
symbolic x coords={A,B,C,D},
xtick=data,
nodes near coords,
nodes near coords align={vertical},
]
\addplot coordinates {(A,15.8) (B,52.6) (C,63.2) (D,78.9)};
\end{axis}
\end{tikzpicture}
\begin{minipage}{0.82\textwidth}
\begin{itemize}
\item A: Hard drive encryption (hardware or commercial solution)
\item B: Hard drive encryption (open-source solution)
\item C: VPN
\item D: Telegram
\end{itemize}
\end{minipage}
\caption{Use of privacy enabling tools}
\label{fig:privacy_tools}
\end{figure}
\begin{figure}
\centering
\begin{tikzpicture}
\begin{axis}[
ybar,
enlarge x limits=0.15,
ymin=0, ymax=100,
xmin=A, xmax=D,
legend style={at={(0.5,0.0)},
anchor=north,legend columns=-1},
ylabel={Percentage of surveyed people},
symbolic x coords={A,B,C,D},
xtick=data,
nodes near coords,
nodes near coords align={vertical},
]
\addplot coordinates {(A,86.4) (B,9.1) (C,0.0) (D,4.5)};
\end{axis}
\end{tikzpicture}
\begin{minipage}{0.82\textwidth}
\begin{itemize}
\item A: Open-source community approved cryptography
\item B: Government approved cryptography
\item C: Hardware implemented cryptography
\item D: I don't know
\end{itemize}
\end{minipage}
\caption{Opinions on cryptography}
\label{fig:crypto_tools}
\end{figure}
\begin{figure}
\centering
\begin{tikzpicture}
\begin{axis}[
ybar,
enlargelimits=0.15,
legend style={at={(0.5,-0.15)},
anchor=north,legend columns=-1},
ylabel={Percentage of surveyed people},
symbolic x coords={Yes (Encrypted),Yes,No},
xtick=data,
nodes near coords,
nodes near coords align={vertical},
]
\addplot coordinates {(Yes (Encrypted),27.3) (Yes,18.2) (No,54.5)};
\end{axis}
\end{tikzpicture}
\caption{Use of private online data storage}
\label{fig:cloud_usage}
\end{figure}
\chapter{A personal view on privacy}
\vspace{-5em}
\begin{center}
{\tiny This chapter expresses the views of the author of both this documentation and the associated software, and only his.}
\end{center}
\vspace{5em}
\vspace{-3em}\hspace{-1.4em}Privacy is a daily concern. Every day, people use objects made to guarantee some level of it, from curtains to acoustic insulation, from locked doors to security cabinets. Privacy concerns doctors, lawyers, engineers, inventors, chefs, military staff\ldots{} but also each and every one of us to some degree.
Sometimes privacy is an indirect concern: an archival company should not take a peek at your doctor's or lawyer's files and cases. Sometimes such an indirect concern reaches someone through friends, business partners, lovers\ldots{} You would not want someone to learn your friend's secrets through you.
50 years ago, someone wanting to learn your secrets had to listen in on your telephone line, break into your office and open your safe, enter your house or hire a detective. Today, that person may simply be able to buy your secrets.
\vspace{1em}In this chapter we will talk about a variety of topics related to digital privacy, from its premises to its implementation.
\section{On terms of service}
In the software and service industry, terms and conditions of service are the typical way for a company to announce the type of data it collects and the use it makes of that data.
Those conditions may also state to whom the data sent to the service belongs, with some services taking ownership of, for example, any and all pictures uploaded to them.
This is, in my opinion, problematic from a moral standpoint when it is not made explicit: the service acquires your information with your consent, but on terms you may not entirely agree with, for the simple reason that those terms are buried in a huge quantity of legal information.
The extension of that issue arises when the very same terms and conditions allow the company to sell or provide the information, generally non-anonymized, to a third party without any additional request for consent. This is extremely common among companies that offer services ``for free'' or for very low prices compared to the actual cost of the service.
\section{On advertisement}
Advertisement is an issue closely related to the one above. Most online advertisements run code on the computer that displays them, to ensure that the advertisement is seen by a human and not by a computer. The advertisement also collects information to uniquely identify the user and to link the user to the data in the page and website being visited. This allows the advertisement company to run finely targeted advertisement.
One of the bad consequences of this is what we saw in the Cambridge Analytica scandal, revealed in 2018, in which the company of the same name was tasked with influencing voters in the United States of America by specifically targeting them, through advertisement and viral videos, on positions of the opposing party they were likely to disagree with.
In that scandal, it appeared that the database developed could, for example, accurately point out users who were in favour of free access to guns\autocite{cambridge_analytica}.
Those are however not the average practices. For example, the DuckDuckGo search engine only serves advertisement based on the exact query entered and nothing more, without collecting any data. In a grayer area, Twitter provides access to tweet statistics and advertises this feature to all users, letting you know how it makes its money and making it very easy to go from user to customer, which is a better practice than silently collecting data to sell to companies only.
\section{On manipulation and political acts}
As technology evolved, we gained the power to link multiple pieces of data about users, as presented in the previous part of this chapter, where we also showed how data collection can be used to further a political agenda through targeted advertisement. But this omits the clearest and most easily forgotten form of political warfare: collecting the other side's information at the source. This also applies to people who may want to blackmail someone or simply ruin their reputation, for hidden motives or just for the challenge of it.
We have seen such breaches numerous times. From the Watergate scandal half a century ago to the Apple iCloud breaches of 2014\autocite{bbc_icloud}, stealing data is a typical way of spying on your opponent whatever the game, be it politics, gambling, contests, art, etc. It can also be used to obtain an unfair advantage in trials and other circumstances.
We see here the importance of privacy for public figures, just as we saw it for voters in the previous part. The same goes for other methods, such as viral advertising and scientific cherry-picking, when pushing a political agenda.
It is also valid on a bigger scale, for example when a country standardizes an encryption scheme so as to ensure that it remains breakable, as happened in 1977 with the 56-bit DES encryption primitive\autocite{wiki:Data_Encryption_Standard}.
\section{On misrepresentation of encryption}
\begin{center}
\textit{If you want to further your understanding of encryption before proceeding, I advise you to read \autoref{annex:encryption_popularized}.}
\end{center}
\vspace{2em}
\hspace{-1.4em}Encryption is often misrepresented, both by many governmental figures and by many commercial software providers.
Nowadays, most web communications are encrypted and at least partially authenticated. Authentication is done through systems based on asymmetric cryptography, relying on an intermediary named a certificate authority. Some certificates are included in most mainstream browsers (for example, the certificates of \texttt{google.com} are embedded in Android devices and in Google Chrome), which secures the communication with those entities as long as the private key is not compromised.
This does not mean that data sent to those services is encrypted once stored by the provider: most providers do not store data encrypted, as doing so raises computing costs by a very significant margin whenever they need to access that data.
Similarly, it is considered bad practice to store passwords in a readable format. To protect them, specific cryptographic techniques exist that make it possible to verify a password against a form of that password transformed by a one-way transformation named a cryptographic hash function. That being said, some companies still store passwords in readable form in their databases.
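As a minimal sketch of that technique, assuming Python's standard library: the password is run through PBKDF2-HMAC-SHA256, a deliberately slow derivation built on a hash function, together with a random salt. Only the salt and the digest are stored. The function names and the iteration count are illustrative assumptions, not the practice of any particular provider.

```python
from __future__ import annotations

import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative value; tune it to the hardware in use


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); these are the only values to store."""
    salt = os.urandom(16)  # a fresh salt makes equal passwords hash differently
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("guess", salt, digest))                          # False
```

The stored digest cannot be turned back into the password, so a leaked database forces an attacker to guess candidates one by one.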
Storing data and passwords in readable form means that, should access to a company's database or to any vulnerable part of its computer system be compromised, the data could be entirely compromised. This has happened a lot in recent years, and the phenomenon is bound to multiply if companies do not start caring for their customers' privacy.
Such a compromise can happen in various ways, and having physical access to a machine makes it easy to access most if not all of the data on it. Since most companies rent their disk space, servers or computing power from other companies, you also rely on those companies' contractors not to sell your data under the terms of their hosting services, and encrypting the data in transit will not protect you from this issue.
Meanwhile, all companies play the game of demagogy and present themselves as perfectly secure. Some have the transparency to show you how their technology works, relying on open-source software to provide their services.
\vspace{1em}There is also the topic of back-doors in those services for governmental checks. The main issue with them is the following: if the government can access your data without you knowing, then virtually anyone can do the same. A back-door adds a critical point of failure to the system. It also means the service is not usable for sensitive topics such as defense and military uses.
On an even more worrisome note, some companies boast of encrypting user data while they only ever ensure this encryption in transit, or advertise it while not all of their offers actually feature encryption of user data.
\chapter{Izaro storage}
The way we decided to implement our storage focuses on protection, obfuscation and performance. We explain here the influences and consequences of these ideas.
\section{Goals}
We want to provide an online storage with the following properties:
\vspace{0.6em}First of all, it must be geo-replicated. It is not acceptable to lose access to the service because of the loss of one server farm on our side.
\vspace{0.3em}Then, the data must be protected: we ourselves should be entirely unable to read it, and we should also be unable to read any metadata that is not absolutely required to provide the service.
\vspace{0.3em}Also, any part of the data must be fast enough to access that, given a good enough network connection, it is hard to tell our service apart from an encrypted hard drive. The same goes for writing.
\vspace{0.3em}Finally, it must be flexible and adaptable to multiple use cases.
\vspace{1em}This leads us to the following idea: we aim to create a service that stores encrypted data in a layout similar to that of a disk, so that it has the same capabilities as a hard disk drive. The key used to decipher the data is stored online, but encrypted using a key derived from the user's password. Authentication requires the user to be able to read the password in order to get a token. It is possible to leave that token disabled and enable it only with a second authentication factor.
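The key handling described above can be sketched as follows. This is a toy construction under stated assumptions, not the SStorage implementation: the record layout, the function names and the iteration count are hypothetical, and the data key is masked with a XOR instead of a real cipher so that the example needs only Python's standard library. A real client would use an authenticated cipher from a vetted library.

```python
from __future__ import annotations

import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative value


def _derive(password: str, salt: bytes) -> tuple[bytes, bytes]:
    """Derive a 32-byte mask and a 32-byte MAC key from the password."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, ITERATIONS, dklen=64
    )
    return material[:32], material[32:]


def wrap_key(data_key: bytes, password: str) -> dict:
    """Build the record that could be stored online (toy construction)."""
    if len(data_key) != 32:
        raise ValueError("this sketch expects a 32-byte data key")
    salt = os.urandom(16)  # fresh salt, so the mask is never reused
    mask, mac_key = _derive(password, salt)
    wrapped = bytes(a ^ b for a, b in zip(data_key, mask))
    tag = hmac.new(mac_key, salt + wrapped, hashlib.sha256).digest()
    return {"salt": salt, "wrapped": wrapped, "tag": tag}


def unwrap_key(record: dict, password: str) -> bytes:
    """Recover the data key; fails on a wrong password or a modified record."""
    mask, mac_key = _derive(password, record["salt"])
    expected = hmac.new(
        mac_key, record["salt"] + record["wrapped"], hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, record["tag"]):
        raise ValueError("wrong password or corrupted record")
    return bytes(a ^ b for a, b in zip(record["wrapped"], mask))
```

In this sketch the server only ever sees the salt, the masked key and the tag; the data key itself exists in readable form on the client alone.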
  342. We want our system to be protected from the point of view of our customers, as such, we aim at it having a code-base readable and short enough to be explored completely in 3 days by a developer with access to enough documentation.
  343. \section{Principles}
  344. Our project aim to follow the following principles:
  345. \begin{itemize}
  346. \item Principle of least knowledge: if it is possible for us to never have access to a readable form of some data, then we should not make it mandatory or provide alternatives.
  347. \item Principle of greater usage: if it is possible, we have to use the most out of the algorithms we use, be it cryptographic primitives or other algorithms.
  348. \item Principle of openness: we aim to disclose any incident that may happen, and to disclose any request by officials to access anyone's data.
  349. \end{itemize}
  350. The principle of least knowledge is upheld in the very design of the system: only the user can make sense of the address space of both the file system and the block device. To provide an analogy, the data is stored in multiple boxes. The user-side software randomly labels the boxes and seals them (that seal is the encryption). If you store data that overflows from one box, it is stored in multiple boxes. Deciphering any data requires knowing which box is the first one and which is the next one, but that very piece of information is not readable by the server: it is stored in one of the sealed boxes.
  351. Furthermore, the label of each block of data can be used as an input to the encryption process. This is an example of the principle of greater usage: we will use any additional information that can help make the system safer.
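The box analogy can be sketched in a few lines of Python. Everything here is hypothetical and chosen for illustration: the block and label sizes, the \texttt{store} and \texttt{load} helpers, the plain dictionary standing in for the server, and above all the \texttt{seal} function, which is a toy SHAKE-256 keystream and not a vetted cipher. The sketch only shows the two ideas above: the pointer to the next box lives inside the sealed box, and the label of a box is an input to its encryption.

```python
import hashlib
import os

PAYLOAD_SIZE = 32          # assumed number of data bytes per box
LABEL_SIZE = 16            # assumed size of a random label
END = bytes(LABEL_SIZE)    # an all-zero label marks the end of a chain


def seal(key: bytes, label: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a keystream derived from the key and the label.

    Applying it twice with the same key and label restores the data.
    """
    stream = hashlib.shake_256(key + label).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def store(server: dict, key: bytes, data: bytes) -> bytes:
    """Split data into randomly labelled sealed boxes; return the first label."""
    chunks = [data[i:i + PAYLOAD_SIZE]
              for i in range(0, len(data), PAYLOAD_SIZE)] or [b""]
    labels = [os.urandom(LABEL_SIZE) for _ in chunks]
    for index, chunk in enumerate(chunks):
        next_label = labels[index + 1] if index + 1 < len(labels) else END
        # The pointer to the next box is sealed together with the payload.
        server[labels[index]] = seal(key, labels[index], next_label + chunk)
    return labels[0]


def load(server: dict, key: bytes, first_label: bytes) -> bytes:
    """Follow the chain of boxes, which only the key holder can do."""
    parts = []
    label = first_label
    while label != END:
        plain = seal(key, label, server[label])
        label, chunk = plain[:LABEL_SIZE], plain[LABEL_SIZE:]
        parts.append(chunk)
    return b"".join(parts)
```

The server-side dictionary only maps random labels to opaque bytes; without the key it cannot tell which box follows which.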
  352. As for the openness principle, it is just as stated: we will disclose any demands as soon as they are made, as well as our responses to them. We will disclose any security issue or concern we receive. We will provide tools for anyone to be informed of this information through multiple channels.
  353. \section{Consequences of those principles in the design}
  354. The first consequence of that design is that it is impossible for us to decipher data sent to us. We did, however, have to make a trade-off to maintain healthy performance when reading or writing sequential data of considerable size: if a big file is written at once and the client sends the data in order, the big file will be stored approximately sequentially in our database. This can, however, be blurred by not writing the big file in order.
  355. Another consequence is that the only permissions that can be handled on a block (a unit of a file system) are making it non-readable, read-only, or writable, and that this unit of file system is not suitable for implementing system access control (for example, Microsoft Windows permissions, Linux system permissions\ldots{}), meaning those are to be enforced by the client computer. Whatever the permissions, this means that having any file-system write permission allows a user to perform any operation on the file system as a whole.
  356. \section{Data life cycle}
  357. \autoref{tab:data_collection} lists the data that we may collect in any interaction with our software, sorted by interaction.
  358. \begin{table}[h]
  359. \centering
  360. \begin{tabular}{lll}
  361. \hline
  362. \ttfamily User action & \ttfamily Data collected & \ttfamily Reason \\\hhline{===}
  363. \multirow{4}{*}{Account creation} & \multicolumn{1}{l}{Email} & \multicolumn{1}{l}{Authentication} \\\cline{2-3}
  364. & \multicolumn{1}{l}{Nickname} & \multicolumn{1}{l}{Authentication} \\\cline{2-3}
  365. & \multicolumn{1}{l}{Password (Obfuscated)} & \multicolumn{1}{l}{Authentication} \\\cline{2-3}
  366. & \multicolumn{1}{l}{Time of creation} & \multicolumn{1}{l}{Bookkeeping} \\\hline
  367. \multirow{1}{*}{Connection} & \multicolumn{1}{l}{Connection time} & \multicolumn{1}{l}{Authentication} \\\hline
  368. \multirow{3}{*}{Payment (below 20\euro)} & \multicolumn{1}{l}{Amount} & \multicolumn{1}{l}{Accounting} \\\cline{2-3}
  369. & \multicolumn{1}{l}{Time of payment} & \multicolumn{1}{l}{Accounting} \\\cline{2-3}
  370. & \multicolumn{1}{l}{Type of payment} & \multicolumn{1}{l}{Accounting} \\\hline
  371. \multirow{4}{*}{Payment (above 20\euro)} & \multicolumn{1}{l}{Amount} & \multicolumn{1}{l}{Accounting} \\\cline{2-3}
  372. & \multicolumn{1}{l}{Time of payment} & \multicolumn{1}{l}{Accounting/Bookkeeping} \\\cline{2-3}
  373. & \multicolumn{1}{l}{Type of payment} & \multicolumn{1}{l}{Accounting} \\\cline{2-3}
  374. & \multicolumn{1}{l}{Invoice address} & \multicolumn{1}{l}{Accounting} \\\hline
  375. \multirow{2}{*}{Writing data} & \multicolumn{1}{l}{Server time} & \multicolumn{1}{l}{Data protection (Consensus system)} \\\cline{2-3}
  376. & \multicolumn{1}{l}{Number of used blocks} & \multicolumn{1}{l}{Accounting/Bookkeeping} \\\hline
  377. \end{tabular}
  378. \caption{Table of collected data}
  379. \label{tab:data_collection}
  380. \end{table}
  381. \chapter{A personal view on business practices}
  382. \vspace{-5em}
  383. \begin{center}
  384. {\tiny This chapter expresses the view of the author of both this documentation and the software associated with it and only of him.}
  385. \end{center}
  386. \vspace{5em}
  387. \vspace{-3em}\hspace{-1.4em}On the internet, we encounter a huge variety of businesses and company cultures. We also encounter an equally huge variety of bad business practices.
  388. As a person with a peculiar way of seeing good and bad, I will call bad any practice that actively does things that may be hurtful to a customer or have dire consequences for them.
  389. \section{On selling the user}
  390. For some companies, users are the resource that they sell to make their money. I would like to lay out three forms of business here:
  391. \begin{enumerate}
  392. \item Companies that sell products and services
  393. \item Companies that showcase or aggregate products and services of other companies and sell them
  394. \item Companies that provide a service to someone for free in exchange for their personal information in order to sell said information to other companies
  395. \end{enumerate}
  396. I think it is fairly easy to see how users could be troubled by the idea that the service provided to them is a honeypot for advertising information. I distinguish this from showing advertisements to the users of one's platform: such a platform provides information (advertisements) to its users but does not sell the information of any individual user, which makes it akin to advertisements on television or on cars.
  397. I see two issues in selling personal information from users \emph{even if they signed a contract agreeing that you do}. The first is that users do not know with whom their personal information has been exchanged; in short, they lose control of their privacy. Coming from a country where keeping video surveillance records for longer than necessary is illegal and where one can forbid companies from using one's phone number for any kind of communication, I was raised with the idea that you are free to exchange your own personal information but not anyone else's.
  398. This leaves you master of your information at all times.
  399. The second problem is a moral one, and it is twofold. First, the user becomes a product, and no longer so much a human. This means that, rather than respecting users as individuals, you have to get a lot of them to create significant revenue. Second, it is about how explicitly companies following that kind of model inform their users of the way the model works and of the alternatives they provide to having one's data sold, which are generally nonexistent.
  400. I do understand that people would be willing to sell their data for a service free of charge. It should be equally understandable that people would be willing to pay for that service in order to be in a situation where their data is not sold to third parties.
  401. \section{On misrepresenting the invisible}
  402. Some companies are proud to showcase their services or products as first-class, whether it is customer support or products, while they actually rely on third-world exploitation or on exploiting illegal immigrants\autocite{apple_sweatshop}.
  403. There is no need to detail the moral implications of lying and of employing low-qualification labor as cheaply as possible. I will focus on how bad that is for your customers. First of all, under-qualified labor is very likely to provide wrong advice about products or to underperform badly when providing a service, not to mention when building or assembling any sort of product.
  404. Some also present their products as so advanced that they are magical, while they actually provide very little added value.
  405. \vspace{1em}On this side is also the representation of the mythical beast named the Cloud. People tend to picture it as their data being stored in some imaginary location, as it effectively can be anywhere in the world. If a company could make sure that the data about a single customer is in a handful of its data centers, so that it could say ``your data is located in our data center of \emph{name of the city} in \emph{name of the country}'', it would make for a nice improvement.
  406. \section{On practice of transparent business}
  407. I think any level of openness of practices in a business is good. When you show the insides of your business and how it works, I think it brings you both trust and customers.
  408. Companies with open business practices have, in my opinion, the greatest chance of surviving for a long time and of avoiding scandal. I think it can be as open as presenting to the customers the cost of the products broken down by components and margin. This builds a relationship of trust between the business and its customers, making the business more capable of actually changing the price of its products when the cost it incurs to make them changes.
  409. \part{Functional specification}
  410. \chapter{Client capabilities}
  411. \section{Key header}
  412. \begin{itemize}[label={\Square}]
  413. \item Can be version-controlled
  414. \item Can store a pointer to a root
  415. \item Does not permit retrieval of the key alone
  416. \item Can be encrypted with a password key generator
  417. \item Can be encrypted with a one-time pad (USB drive based OTP)
  418. \end{itemize}
  419. \section{System root header}
  420. \begin{itemize}[label={\Square}]
  421. \item Is extendable
  422. \item Can store a pointer to a device and its type
  423. \item Can store an attribute pointer for a device
  424. \item The root is on a non-shared block device and device UUID.
  425. \end{itemize}
  426. \section{Argus block device}
  427. \begin{itemize}[label={\Square}]
  428. \item Requires enough RAM space to store its metadata for access
  429. \item Reading data is $O(1)$
  430. \item Uses record $x$ and $y$ as $iv$ and/or $nonce$
  431. \item Uses stream position as position
  432. \item Can list multiple keys and ciphers in its header, applied in sequential order
  433. \item Supports transactional access
  434. \end{itemize}
  435. \section{Izaro file system}
  436. Let $n$ be the number of files in the file system. Let $m$ be the size of a given file.
  437. \begin{itemize}[label={\Square}]
  438. \item All directory changes are transactional and atomic
  439. \item Opening a file is $O(\log n)$ or better
  440. \item Reading a block from an open file is $O(\log m)$ or better
  441. \item Writing a block to an open file is $O(\log m)$ or better
  442. \item Changing file attributes is atomic
  443. \item Uses record $x$, $y$, and UUID as $iv$ and/or $nonce$
  444. \item Uses file position as position inside files
  445. \item Uses position in block as position inside non-files elements
  446. \item[] On clients implementing a POSIX interface
  447. \begin{itemize}[label={\Square}]
  448. \item Append is atomic if possible
  449. \item Advisory locking is implemented on a file-by-file basis
  450. \end{itemize}
  451. \item POSIX permissions are implemented
  452. \begin{itemize}[label={\Square}]
  453. \item An association file only modifiable by user 0 (root) maps UIDs to user names
  454. \item Permissions are handled by the client only
  455. \item User 0 (root) can alter any permission
  456. \end{itemize}
  457. \item NT permissions MAY NOT be implemented
  458. \begin{itemize}[label={\Square}]
  459. \item NT clients MAY NOT perform ACL checks
  460. \item Any NT user can alter file data
  461. \item NT users cannot alter file permissions
  462. \item Items created by NT users are created with the permission of their parent item
  463. \end{itemize}
  464. \item Verifies the long-term lock of devices before use, at least every 30 seconds
  465. \item Can long-term lock a device
  466. \begin{itemize}[label={\Square}]
  467. \item The lock is marked immediately
  468. \item The lock is active after 61 seconds of waiting have passed
  469. \item[] This allows critical maintenance and version upgrade scripts to run properly.
  470. \end{itemize}
  471. \end{itemize}
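The timing of the long-term lock can be illustrated with a short Python sketch. The function names, the state names, and the use of plain floating-point seconds are assumptions; only the 30 second check interval and the 61 second activation delay come from the list above. One plausible reading of these two values is that a client checking at least every 30 seconds sees the mark at least twice before the lock becomes active.

```python
from typing import Optional

MAX_CHECK_INTERVAL_S = 30.0  # clients verify the lock at least this often
ACTIVATION_DELAY_S = 61.0    # delay between marking and activation


def lock_state(marked_at: Optional[float], now: float) -> str:
    """Return "unlocked", "marked" (not yet effective) or "active"."""
    if marked_at is None:
        return "unlocked"
    if now - marked_at < ACTIVATION_DELAY_S:
        return "marked"
    return "active"


def must_recheck(last_check: float, now: float) -> bool:
    """Tell a client whether it has to verify the lock again before use."""
    return now - last_check >= MAX_CHECK_INTERVAL_S
```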
  472. \section{Heavy-clients}
  473. \begin{itemize}[label={\Square\Square}]
  474. \item[{TD}]
  475. \item Can obtain a shared secret from the data interface
  476. \item Can obtain a token from a connection
  477. \item Can obtain a token from user input
  478. \item Can synchronize a supported filesystem
  479. \item Can display usage statistics
  480. \item Can detect a block device and mount it
  481. \item Can mount IzaroFS using FUSE on systems that support it
  482. \item Can mount IzaroFS in slow mode on other systems
  483. \end{itemize}
  484. \subsection{Command-line interface client}
  485. \begin{itemize}[label={\Square}]
  486. \item Can be called with automated tools
  487. \item Supports callback scripts
  488. \end{itemize}
  489. \subsubsection{General options}
  490. \textit{Any option marked with a * must be order-independent, provided it follows the command that defines it.}
  491. \begin{itemize}[label={\Square}]
  492. \item \texttt{----help}*: provides a list of commands and of all the general options; if used with a command, displays the help page of that command.
  493. \item \texttt{----verbose}*: sets a flag that makes every operation output its information on \texttt{stderr}
  494. \item \texttt{----out-info}*: sets a flag that makes every operation output more information in a computer readable format on \texttt{stdout}
  495. \end{itemize}
  496. \subsubsection{Commands}
  497. \begin{itemize}[label={\Square}]
  498. \item \texttt{help}: same as the similar option.
  499. \item \texttt{list}: enters list mode for the next command; if there are no further arguments, displays a list of all accessible roots and their types if they are known.
  500. \begin{itemize}[label={\Square}]
  501. \item \texttt{help}: provide help on the subcommands
  502. \item \texttt{mountable}: Display all accessible roots that are mountable as file systems.
  503. \item \texttt{in-use}: Display roots that are currently in use.
  504. \item \texttt{connections}: Display all registered connections in a \texttt{csv} format with headers as in \autoref{fig:connection_csv_columns}.
  505. \item \texttt{endpoints}: Display all registered endpoints in a \texttt{csv} format with headers as in \autoref{fig:endpoints_csv_columns}.
  506. \end{itemize}
  507. \item \texttt{mount}: mounts a mountable file system; the identifier of the file system must be provided as the next argument. Outputs \texttt{OK} on success and \texttt{KO} on failure.
  508. \begin{itemize}[label={\Square}]
  509. \item \texttt{help}: provide help on the subcommands
  510. \item \texttt{----read-only}*: set up the read only flag of the file system
  511. \end{itemize}
  512. \item \texttt{login}: performs an interactive login sequence, no information is stored on disk.
  513. \begin{itemize}[label={\Square}]
  514. \item \texttt{help}: provide help on the subcommands
  515. \item \texttt{----persistent}*: trips the system into persistent login, with the key stored on the system in a configuration file. If the \texttt{----unsafe} flag is not activated, does nothing.
  516. \item \texttt{----unsafe}*: allows unsafe behaviours.
  517. \end{itemize}
  518. \item \texttt{logout}: performs a logout of the selected user and also removes persistent login information. Outputs \texttt{OK} on success and \texttt{KO} on failure.
  519. \item \texttt{umount}: performs the unmounting of the provided filesystem. Outputs \texttt{OK} on success and \texttt{KO} on failure.
  520. \begin{itemize}[label={\Square}]
  521. \item \texttt{help}: provide help on the subcommands
  522. \item \texttt{----NOW}*: forces the unmounting to happen even if unterminated operations are pending.
  523. \end{itemize}
  524. \item \texttt{enable}: enables a storage registered as block device storage. Outputs \texttt{OK} on success and \texttt{KO} on failure.
  525. \begin{itemize}[label={\Square}]
  526. \item \texttt{help}: provide help on the subcommands
  527. \item \texttt{----read-only}*: set up the read only flag of the device and prohibits its use for write operations.
  528. \end{itemize}
  529. \item \texttt{disable}: disables a storage registered as block device storage. Outputs \texttt{OK} on success and \texttt{KO} on failure.
  530. \begin{itemize}[label={\Square}]
  531. \item \texttt{help}: provide help on the subcommands
  532. \item \texttt{----NOW}*: stops the device immediately, ignoring any currently executing operations.
  533. \end{itemize}
  534. \end{itemize}
  535. Computer readable output format should be a subset of JSON, hence readable by any JSON parser, with the values shown in \autoref{fig:computer_readable_output}.
  536. \begin{figure}[h]
  537. \begin{center}
  538. \begin{minipage}[c]{0.9\textwidth}
  539. \begin{itemize}
  540. \item \texttt{status}: \texttt{"OK"} if the command is a success, \texttt{"KO"} if the command failed.
  541. \item \texttt{error\_{}code}: an integer representing the error as a hash of the line of code that failed.
  542. \end{itemize}
  543. \end{minipage}
  544. \end{center}
  545. \caption{Computer readable output contents}
  546. \label{fig:computer_readable_output}
  547. \end{figure}
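A script consuming this output could look like the following Python sketch. The helper name \texttt{parse\_cli\_result} and the sample payloads are hypothetical; in particular, whether \texttt{error\_{}code} is present on success is not stated above, so the sketch treats it as optional.

```python
import json
from typing import Optional, Tuple


def parse_cli_result(stdout_text: str) -> Tuple[bool, Optional[int]]:
    """Parse the computer readable output into (success, error_code)."""
    result = json.loads(stdout_text)
    status = result["status"]
    if status not in ("OK", "KO"):
        raise ValueError("unexpected status: %r" % (status,))
    # error_code is treated as optional (assumption, see the text above).
    return status == "OK", result.get("error_code")


# Hypothetical sample of a failed command.
success, error_code = parse_cli_result('{"status": "KO", "error_code": 42}')
```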
  548. Multiple types of data are provided as CSV with the separator ``\texttt{|}'' and the first line always being a header.
  549. \begin{figure}[h]
  550. \begin{center}
  551. \begin{minipage}[c]{0.9\textwidth}
  552. \begin{itemize}
  553. \item \texttt{user}: the email address of the user
  554. \item \texttt{token}: the token as a hexadecimal string
  555. \item \texttt{"connected since"}: Unix timestamp, in seconds, of the moment the connection was opened.
  556. \item \texttt{"server time offset"}: The estimated offset of time between the server and client in microseconds.
  557. \item \texttt{persistent}: \texttt{true} if the connection is stored on disk, \texttt{false} if it is stored only in memory.
  558. \end{itemize}
  559. \end{minipage}
  560. \end{center}
  561. \caption{Connection information}
  562. \label{fig:connection_csv_columns}
  563. \end{figure}
  564. \begin{figure}[h]
  565. \begin{center}
  566. \begin{minipage}[c]{0.9\textwidth}
  567. \begin{itemize}
  568. \item \texttt{user}: the email address of the user that owns the connection used
  569. \item \texttt{"unix user"}: UID of the user that mounted the endpoint
  570. \item \texttt{"endpoint file"}: the file where the endpoint is mounted if applicable, may be a device on NT systems
  571. \item \texttt{"root id"}: the identifier of the root
  572. \end{itemize}
  573. \end{minipage}
  574. \end{center}
  575. \caption{Active endpoint data}
  576. \label{fig:endpoints_csv_columns}
  577. \end{figure}
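The connection listing can be read with the standard Python \texttt{csv} module by setting the separator to ``\texttt{|}''. The sketch below is only an illustration: the function name, the keys of the returned dictionaries, and the sample row are assumptions, and it supposes that header names containing spaces are quoted as displayed in \autoref{fig:connection_csv_columns}.

```python
import csv
import io
from typing import Dict, List


def parse_connections(text: str) -> List[Dict[str, object]]:
    """Parse the output of the connection listing into typed rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    rows = []
    for row in reader:
        rows.append({
            "user": row["user"],
            "token": bytes.fromhex(row["token"]),
            "connected_since": int(row["connected since"]),
            "server_time_offset_us": int(row["server time offset"]),
            "persistent": row["persistent"] == "true",
        })
    return rows


# Hypothetical sample output: a header line followed by one connection.
SAMPLE = (
    'user|token|"connected since"|"server time offset"|persistent\n'
    "alice@example.org|0a1b2c|1600000000|-1500|false\n"
)
connections = parse_connections(SAMPLE)
```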
  578. \subsection{Desktop client}
  579. \textit{Only set to support the core subset of features for heavy-clients.}
  580. The desktop application is expected to work entirely on at least Windows and Linux, and to also work, for the IzaroFS part, on any system with FUSE.
  581. For that reason, any cosmetic shaders should be made with OpenGL, and with a widely compatible version at that. Development of those shaders will be left to a professional specially trained in making shaders.
  582. Those cosmetic items must have a textual fallback accessible by hovering over their element. If they represent performance statistics, any visual may be clickable and print the statistics to a file.
  583. \begin{figure}[H]
  584. \centering
  585. \includegraphics[width=0.9\columnwidth]{Pictures/desktopmain}
  586. \caption{Desktop application main view mockup}
  587. \label{fig:mockup-desktopmain}
  588. \end{figure}
  589. In \autoref{fig:mockup-desktopmain}, the circle and pie will actually be replaced by a rendered graphic: its inner part will spin if and only if time synchronization is happening normally, its outer part will grow brighter when progress is made on the selected file system, and its inner part will transition between green and red through yellow to show data congestion.
  590. Other rendered elements could be added to the application for aesthetic purposes, on the sole condition that they provide a visually appealing way to understand the status of the system or provide meaningful information about the system (updates, maintenance\ldots{}).
  591. \begin{figure}[H]
  592. \centering
  593. \includegraphics[width=0.9\columnwidth]{Pictures/loginpage}
  594. \caption{Desktop application login and register page mockup}
  595. \label{fig:mockup-loginpage}
  596. \end{figure}
  597. Behaviour of the login part of the page is expected to be identical to the behaviour of the login command.
  598. \begin{figure}[H]
  599. \centering
  600. \includegraphics[height=0.4\textheight,width=0.9\columnwidth,keepaspectratio]{Pictures/confirmregister}
  601. \caption{Desktop application registration confirmation page mockup}
  602. \label{fig:mockup-confirmregister}
  603. \end{figure}
  604. \begin{figure}[H]
  605. \centering
  606. \includegraphics[height=0.4\textheight,width=0.9\columnwidth,keepaspectratio]{Pictures/newdevice}
  607. \caption{Desktop application block device creation page mockup}
  608. \label{fig:mockup-newdevice}
  609. \end{figure}
  610. \begin{figure}[H]
  611. \centering
  612. \includegraphics[height=0.4\textheight,width=0.9\columnwidth,keepaspectratio]{Pictures/newfs}
  613. \caption{Desktop application Izaro filesystem creation page mockup}
  614. \label{fig:mockup-newfs}
  615. \end{figure}
  616. \chapter{Front-end capabilities}
  617. \section{Web interface}
  618. \begin{itemize}[label={\Square}]
  619. \item[] Account management
  620. \begin{itemize}[label={\Square}]
  621. \item Can create an account
  622. \item Can share a 2FA secret
  623. \item Can delete an account
  624. \item Can confirm a payment
  625. \item Can inform of payment lateness
  626. \item Can self-wipe account
  627. \end{itemize}
  628. \item[] Authentication
  629. \begin{itemize}[label={\Square}]
  630. \item Can generate a token from a connection request
  631. \item Can confirm a token with a 2FA subtoken
  632. \item Can invalidate a token
  633. \item Can generate a shared secret to handle UDP connections
  634. \item Can regenerate a password block from an older password block
  635. \end{itemize}
  636. \item Provides a payment link
  637. \item Provides a link to this document
  638. \item Provides all personal information stored on our side for the logged in user
  639. \item Provides a link to our company balance
  640. \end{itemize}
  641. \section{Data interface}
  642. \begin{itemize}[label={\Square}]
  643. \item Supports data interaction actions
  644. \item Can execute a chain of actions
  645. \item Can confirm a chain of actions
  646. \item Can clear unconfirmed actions
  647. \item Can execute a single action and confirm it
  648. \item Can read usage statistics
  649. \item[] Shared systems
  650. \begin{itemize}[label={\Square}]
  651. \item Can lock and unlock a named mutex
  652. \item Can compare and swap a semaphore
  653. \item Can provide a list of the other gateways
  654. \end{itemize}
  655. \item[] Administration
  656. \begin{itemize}[label={\Square}]
  657. \item Can read global statistics
  658. \item Can read list of UUIDs
  659. \item Can list users
  660. \item Can list connections
  661. \item Can wipe account
  662. \item Can register a spare data server. MUST be leader.
  663. \item Can register a spare coordination server. MUST be leader.
  664. \end{itemize}
  665. \item[] Automation
  666. \begin{itemize}[label={\Square}]
  667. \item Can replace a dead server
  668. \item Will duplicate if spares available following the rules below:
  669. If any category $N$ of $A$, $B$ or $AB$ has less than 1 server, make a copy of $N$;
  670. \\
  671. else if there are as many of $A$, $B$ and $AB$, copy an $A$ server;
  672. \\
  673. else if there are more $A$ than $B$, copy a $B$ server;
  674. \\
  675. else if there are more $B$ than $AB$, copy an $AB$ server.
  676. \end{itemize}
  677. \end{itemize}
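The duplication rules above can be written as a small decision function. This Python sketch follows the rules in the order they are listed; the function name is hypothetical, and two points are assumptions because the rules do not settle them: empty categories are checked in the order $A$, $B$, $AB$, and \texttt{None} is returned when no rule applies.

```python
from typing import Optional


def choose_category_to_copy(a: int, b: int, ab: int) -> Optional[str]:
    """Pick which server category a spare should duplicate, if any."""
    counts = {"A": a, "B": b, "AB": ab}
    # Rule 1: any category with fewer than one server is copied first.
    for name, count in counts.items():
        if count < 1:
            return name
    # Rule 2: as many A, B and AB servers: copy an A server.
    if a == b == ab:
        return "A"
    # Rule 3: more A than B: copy a B server.
    if a > b:
        return "B"
    # Rule 4: more B than AB: copy an AB server.
    if b > ab:
        return "AB"
    # No listed rule applies (assumption: nothing is duplicated).
    return None
```

Starting from one server of each kind, successive spares are then assigned to $A$, $B$, and $AB$ in turn, which keeps the three categories balanced.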
  678. \section{Leadership interface}
  679. \section{Alternative interface}
  680. \chapter{Back-end capabilities}
  681. \section{GoJ Database}
  682. \begin{itemize}[label={\Square}]
  683. \item[] Data
  684. \begin{itemize}[label={\Square}]
  685. \item Can perform operations in bulk (see the list of operation codes)
  686. \item Can read a record
  687. \item Can write a record
  688. \item Can confirm a record
  689. \item Can validate a record
  690. \item Can remove a record
  691. \item Can test existence of a record
  692. \item Can allocate a record
  693. \item Can read a record's metadata
  694. \end{itemize}
  695. \item[] Statistics
  696. \begin{itemize}[label={\Square}]
  697. \item Can provide statistics given the statistics format
  698. \item Can alert in case of suspicious statistics
  699. \end{itemize}
  700. \item Can stream a list of all records, then stream all transformations that happened since the streaming started
  701. \end{itemize}
  702. \section{Performance requirements}
  703. \begin{itemize}[label={\Square}]
  704. \item Less than 20\% below the server disk performance in terms of latency of page reads
  705. \item Assumes available RAM to be at least 0.4\% of the disk capacity
  706. \item Assumes a POSIX environment (Both Linux and OpenBSD should be supported as first class targets)
  707. \end{itemize}
  708. \part{Technical specification}
  709. \chapter{Protocols}
  710. \section{About \texttt{Andrew}}
  711. Most of the code implementing the protocols is expected to use a small code generator named Andrew that generates parsers from YAML files.
  712. \section{\texttt{izaro-storage} queries}
  713. \begin{figure}[H]
  714. \centering
  715. \begin{bytefield}[bitwidth=1.06em, bitformatting={\tiny\ttfamily}]{32}
  716. \bitheader{0-31} \\
  717. \bitbox{14}{\texttt{unused}}
  718. \bitbox{1}{\texttt{\footnotesize 2S}}
  719. \bitbox{1}{\texttt{\footnotesize B}}
  720. \bitbox{16}{\texttt{operation code}} \\
  721. \wordbox{2}{\texttt{request identifier}} \\
  722. \wordbox{1}{\textit{optional\ldots{}} \texttt{continuation}}
  723. \end{bytefield}
  724. \begin{itemize}
  725. \item[] \texttt{2S}: the operation is two-stepped (requires a confirmation to be applied) if set
  726. \item[] \texttt{B}: the operation is a bulk operation if set
  727. \item[] operation code is a big endian integer, see page \pageref{fig:gp_class_diagram_regulated}
  728. \end{itemize}
  729. \caption{Common request format (storage)}
  730. \label{fig:common_format_storage}
  731. \end{figure}
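To make the layout of \autoref{fig:common_format_storage} concrete, here is a hedged Python sketch that packs and unpacks the first two fields with the standard \texttt{struct} module. Several points are assumptions not settled by the figure: bit 0 is taken to be the most significant bit of the first word, the request identifier is taken to be a big endian unsigned integer, and the optional continuation word is left out. The function names are hypothetical.

```python
import struct

# 32-bit flags and operation code word, then 64-bit request identifier.
HEADER = struct.Struct(">IQ")

# With bit 0 as the most significant bit, bit 14 (2S) and bit 15 (B)
# sit just above the 16-bit operation code.
FLAG_TWO_STEP = 1 << 17
FLAG_BULK = 1 << 16


def pack_header(opcode: int, request_id: int,
                two_step: bool = False, bulk: bool = False) -> bytes:
    """Build the fixed part of the common request format (12 bytes)."""
    if not 0 <= opcode <= 0xFFFF:
        raise ValueError("operation code must fit in 16 bits")
    word = opcode
    if two_step:
        word |= FLAG_TWO_STEP
    if bulk:
        word |= FLAG_BULK
    return HEADER.pack(word, request_id)


def unpack_header(data: bytes) -> dict:
    """Decode the fixed part of the common request format."""
    word, request_id = HEADER.unpack_from(data)
    return {
        "opcode": word & 0xFFFF,
        "two_step": bool(word & FLAG_TWO_STEP),
        "bulk": bool(word & FLAG_BULK),
        "request_id": request_id,
    }
```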
  732. \begin{figure}[H]
  733. \centering
  734. \begin{bytefield}[bitwidth=1.06em]{32}
  735. \bitheader{0-31} \\
  736. \wordbox{1}{$x$ (Big endian integer) (see page \pageref{fig:gp_class_diagram_regulated})} \\
  737. \wordbox{1}{$y$ (Big endian integer) (see page \pageref{fig:gp_class_diagram_regulated})} \\
  738. \wordbox{4}{\texttt{UUID}}
  739. \end{bytefield}
  740. \begin{itemize}
  741. \item[] a zeroed UUID means the record is invalid
  742. \end{itemize}
  743. \caption{Record identifier format}
  744. \label{fig:record_identifier}
  745. \end{figure}
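The record identifier of \autoref{fig:record_identifier} maps directly onto a 24 byte structure. The following Python sketch is an illustration under assumptions: $x$ and $y$ are treated as unsigned 32-bit integers, the UUID is stored as its 16 raw bytes, and the function names are hypothetical.

```python
import struct
import uuid
from typing import Tuple

# x and y as big endian 32-bit integers, followed by the 16-byte UUID.
RECORD_ID = struct.Struct(">II16s")


def pack_record_id(x: int, y: int, record_uuid: uuid.UUID) -> bytes:
    """Serialize a record identifier (24 bytes)."""
    return RECORD_ID.pack(x, y, record_uuid.bytes)


def unpack_record_id(data: bytes) -> Tuple[int, int, uuid.UUID]:
    """Parse a record identifier from the start of a buffer."""
    x, y, raw_uuid = RECORD_ID.unpack_from(data)
    return x, y, uuid.UUID(bytes=raw_uuid)


def is_valid_record(record_uuid: uuid.UUID) -> bool:
    """A zeroed UUID means the record is invalid."""
    return record_uuid.int != 0
```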
  746. \begin{figure}[H]
  747. \centering
  748. \begin{bytefield}[bitwidth=0.48em, bitformatting={\tiny\ttfamily}]{64}
  749. \bitheader{0,31,63} \\
  750. \wordbox{2}{\texttt{Common request format (storage)}} \\
  751. \wordbox{3}{\texttt{Record identifier}} \\
  752. \wordbox{1}{\texttt{Timestamp (Big endian integer)} (see page \pageref{fig:gp_class_diagram_regulated})} \\
  753. \end{bytefield}
  754. \begin{itemize}
  755. \item[] Timestamp of 0 means latest valid value
  756. \end{itemize}
  757. \caption{Request format for read (storage)}
  758. \label{fig:read_format_storage}
  759. \end{figure}
  760. \begin{figure}[H]
  761. \centering
  762. \begin{bytefield}[bitwidth=0.46em, bitformatting={\tiny\ttfamily}]{64}
  763. \bitheader{0,31,63} \\
  764. \wordbox{2}{\texttt{Common request format (storage)}} \\
  765. \wordbox{3}{\texttt{Record identifier}} \\
  766. \begin{rightwordgroup}{$16\,\mathrm{KiB}$}
  767. \wordbox[lrt]{1}{\texttt{Database page}} \\
  768. \skippedwords \\\wordbox[lrb]{1}{}
  769. \end{rightwordgroup} \\
  770. \wordbox{1}{\texttt{Timestamp (Big endian integer)} (see page \pageref{fig:gp_class_diagram_regulated})}
  771. \end{bytefield}
  772. \begin{itemize}
  773. \item[] A timestamp of 0, or the maximum value ({\footnotesize\texttt{std::numeric\_limits<uint64\_t>::max()}}), means server time should be used
  774. \end{itemize}
  775. \caption{Request format for write (storage)}
  776. \label{fig:write_format_storage}
  777. \end{figure}
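The part of the write request that follows the common request format can be assembled as in this Python sketch. It assumes an already serialized 24 byte record identifier and an unsigned 64-bit timestamp; the function names are hypothetical. The two special timestamp values are the ones given in \autoref{fig:write_format_storage}.

```python
import struct

PAGE_SIZE = 16 * 1024        # one database page, 16 KiB
RECORD_ID_SIZE = 24          # x, y and UUID
MAX_TIMESTAMP = 2**64 - 1    # std::numeric_limits<uint64_t>::max()


def pack_write_body(record_id: bytes, page: bytes, timestamp: int = 0) -> bytes:
    """Record identifier, database page, then big endian timestamp."""
    if len(record_id) != RECORD_ID_SIZE:
        raise ValueError("record identifier must be 24 bytes")
    if len(page) != PAGE_SIZE:
        raise ValueError("database page must be exactly 16 KiB")
    return record_id + page + struct.pack(">Q", timestamp)


def uses_server_time(timestamp: int) -> bool:
    """A timestamp of 0 or the maximum value asks for the server time."""
    return timestamp in (0, MAX_TIMESTAMP)
```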
  778. \begin{figure}[H]
  779. \centering
  780. \begin{bytefield}[bitwidth=0.48em, bitformatting={\tiny\ttfamily}]{64}
  781. \bitheader{0,31,63} \\
  782. \wordbox{2}{\texttt{Common request format (storage)}} \\
  783. \wordbox{3}{\texttt{Record identifier}} \\
  784. \wordbox{1}{\texttt{Timestamp (Big endian integer)} (see page \pageref{fig:gp_class_diagram_regulated})} \\
  785. \end{bytefield}
  786. \begin{itemize}
  787. \item[] An invalid timestamp triggers the behaviours described on page \pageref{time_sync_equation}
  788. \end{itemize}
  789. \caption{Request format for confirm (storage)}
  790. \label{fig:confirm_format_storage}
  791. \end{figure}
  792. \begin{figure}[H]
  793. \centering
  794. \begin{bytefield}[bitwidth=0.48em, bitformatting={\tiny\ttfamily}]{64}
  795. \bitheader{0,31,63} \\
  796. \wordbox{2}{\texttt{Common request format (storage)}} \\
  797. \wordbox{3}{\texttt{Record identifier}} \\
  798. \wordbox{1}{\texttt{Timestamp (Big endian integer)} (see page \pageref{fig:gp_class_diagram_regulated})} \\
  799. \end{bytefield}
  800. \begin{itemize}
  801. \item[] An invalid timestamp implies the request is invalid; a zeroed timestamp means remove all; a specific value removes the target value.
  802. \end{itemize}
  803. \caption{Request format for remove (storage)}
  804. \label{fig:remove_format_storage}
  805. \end{figure}
  806. \begin{figure}[H]
  807. \centering
  808. \begin{bytefield}[bitwidth=1.06em, bitformatting={\tiny\ttfamily}]{32}
  809. \bitheader{0,31} \\
  810. \wordbox{4}{\texttt{Common request format (storage)}} \\
  811. \bitbox{32}{\texttt{Size}} \\
  812. \begin{rightwordgroup}{Repeats \\ \texttt{Size} times}
  813. \wordbox{6}{\texttt{Record identifier}} \\
  814. \wordbox{2}{\texttt{Timestamp (Big endian integer)} (see page \pageref{fig:gp_class_diagram_regulated})}
  815. \end{rightwordgroup} \\
  816. \end{bytefield}
  817. \begin{itemize}
  818. \item[] Timestamps are to be interpreted as in the read operation, except that they cannot be omitted
  819. \end{itemize}
  820. \caption{Request format for bulk read (storage)}
  821. \label{fig:bulk_read_format_storage}
  822. \end{figure}
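The repeated part of the bulk read request can be handled as in the following Python sketch, which covers only the bytes after the common request format. It assumes that \texttt{Size} is a big endian unsigned 32-bit integer and that each entry is a serialized 24 byte record identifier followed by a 64-bit timestamp; the function names are hypothetical.

```python
import struct
from typing import List, Tuple

SIZE = struct.Struct(">I")      # number of entries
ENTRY = struct.Struct(">24sQ")  # record identifier, then timestamp


def pack_bulk_read_body(entries: List[Tuple[bytes, int]]) -> bytes:
    """Serialize the entry count followed by every (identifier, timestamp)."""
    parts = [SIZE.pack(len(entries))]
    for record_id, timestamp in entries:
        parts.append(ENTRY.pack(record_id, timestamp))
    return b"".join(parts)


def unpack_bulk_read_body(data: bytes) -> List[Tuple[bytes, int]]:
    """Parse the entry count, then that many fixed-size entries."""
    (count,) = SIZE.unpack_from(data, 0)
    return [
        ENTRY.unpack_from(data, SIZE.size + index * ENTRY.size)
        for index in range(count)
    ]
```

Since every entry has a fixed size of 32 bytes, the position of entry $i$ is known without parsing the previous ones.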
  823. \begin{figure}[H]
  824. \centering
  825. \begin{bytefield}[bitwidth=1.06em]{32}
  826. \bitheader{0,31} \\
  827. \wordbox{4}{\texttt{Common request format (storage)}} \\
  828. \bitbox{32}{IP address (see page \pageref{fig:gp_class_diagram_regulated})}\\
  829. \bitbox{16}{UDP port (see page \pageref{fig:gp_class_diagram_regulated})}
  830. \bitbox[lrt]{16}{} \\
  831. \bitbox[lr]{32}{Timestamp (see page \pageref{fig:gp_class_diagram_regulated})} \\
  832. \bitbox[lrb]{16}{}
  833. \bitbox[t]{16}{} \\
  834. \end{bytefield}
  835. \begin{itemize}
  836. \item[] The timestamp is a cutoff: data labeled with any earlier time won't be sent
  837. \end{itemize}
  838. \caption{Stream writer query}
  839. \label{fig:stream_writes_request}
  840. \end{figure}
  841. \begin{figure}[H]
  842. \centering
  843. \begin{bytefield}[bitwidth=1.06em]{32}
  844. \bitheader{0,31} \\
  845. \wordbox{4}{\texttt{Common request format (storage)}} \\
  846. \bitbox{16}{\texttt{UDP port} (see page \pageref{fig:gp_class_diagram_regulated})}
  847. \bitbox[lrt]{16}{} \\
  848. \bitbox[lr]{32}{\texttt{Timestamp} (see page \pageref{fig:gp_class_diagram_regulated})} \\
  849. \bitbox[lrb]{16}{}
  850. \bitbox{1}{\texttt{N}}
  851. \bitbox{15}{\texttt{Unused}} \\
  852. \end{bytefield}
  853. \begin{itemize}
\item[-] Data labeled at any time before this timestamp will not be written.
  855. \end{itemize}
\texttt{N}: if this flag is set, data labeled before the limit is not ignored; instead, only the latest version of that data is kept.
  857. \caption{Stream reader query}
  858. \label{fig:stream_reads_request}
  859. \end{figure}
  860. \section{Operation codes}
  861. \begin{itemize}
  862. \item $0$=Statistics
  863. \item $1$=Read
  864. \item $2$=Write
  865. \item $3$=Confirm
  866. \item $4$=Validate
  867. \item $5$=Allocate
  868. \item $6$=Meta read
  869. \item $129$=Read $n$
  870. \item $130$=Write $n$
  871. \item $131$=Confirm $n$
  872. \item $132$=Validate $n$
  873. \item $133$=Allocate $n$
  874. \item $134$=Meta read $n$
  875. \item $257$=Unconfirmed read
  876. \item $262$=Meta unconfirmed read
\item $385$=Unconfirmed read $n$
\item $390$=Meta unconfirmed read $n$
  879. \item $513$=Uncommitted read
  880. \item $518$=Meta uncommitted read
  881. \item $641$=Uncommitted read $n$
  882. \item $646$=Meta uncommitted read $n$
  883. \end{itemize}
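Codes such as $129$, $257$, $513$ and $641$ suggest that an operation code combines a base operation with modifier bits. The sketch below is a minimal illustration under that assumption: the names \texttt{base\_op}, \texttt{make\_opcode}, \texttt{base\_of} and \texttt{is\_bulk}, as well as the bit layout ($128$ for the ``$n$'' variants, $256$ for unconfirmed, $512$ for uncommitted), are inferred from the listed values and are not stated by this specification.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; the bit layout is an assumption inferred from the
// listed codes (e.g. 129 = 128 + 1, 513 = 512 + 1, 641 = 512 + 128 + 1).
enum class base_op : std::uint16_t {
    statistics = 0, read = 1, write = 2, confirm = 3,
    validate = 4, allocate = 5, meta_read = 6
};

constexpr std::uint16_t bulk_flag        = 128; // "n" variants
constexpr std::uint16_t unconfirmed_flag = 256; // unconfirmed reads
constexpr std::uint16_t uncommitted_flag = 512; // uncommitted reads

// Combine a base operation with the assumed modifier bits.
constexpr std::uint16_t make_opcode(base_op op, bool bulk = false,
                                    bool unconfirmed = false,
                                    bool uncommitted = false) {
    return static_cast<std::uint16_t>(
        static_cast<std::uint16_t>(op)
        | (bulk ? bulk_flag : 0)
        | (unconfirmed ? unconfirmed_flag : 0)
        | (uncommitted ? uncommitted_flag : 0));
}

// Extract the base operation (low 7 bits) from a code.
inline base_op base_of(std::uint16_t code) {
    assert(code < 1024); // every listed code fits in 10 bits
    return static_cast<base_op>(code & 0x7F);
}

// True for the "n" (bulk) variants.
inline bool is_bulk(std::uint16_t code) {
    return (code & bulk_flag) != 0;
}
```

Under this assumed layout, \texttt{make\_opcode(base\_op::read, true)} yields $129$ (Read $n$) and \texttt{base\_of(646)} yields the meta read operation.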
  884. \section{\texttt{izaro-storage} replies}
  885. \begin{figure}[H]
  886. \centering
  887. \begin{bytefield}[bitwidth=0.48em, bitformatting={\tiny\ttfamily}]{64}
  888. \bitheader{0,31,63} \\
  889. \wordbox{3}{\texttt{Record identifier}} \\
  890. \bitbox{64}{\texttt{Timestamp} (see page \pageref{fig:gp_class_diagram_regulated})} \\
  891. \bitbox{64}{\texttt{Offset} (see page \pageref{fig:gp_class_diagram_regulated})} \\
  892. \bitbox{29}{\texttt{Unused}}
  893. \bitbox{1}{\texttt{\tiny V}}
  894. \bitbox{1}{\texttt{\tiny R}}
  895. \bitbox{1}{\texttt{\tiny C}}
  896. \end{bytefield}
  897. \begin{itemize}
  898. \item[] \texttt{R}: is set if the record is removed (not eligible for cleanup)
  899. \item[] \texttt{C}: is set if the record is confirmed
  900. \item[] \texttt{V}: is set if the record is validated/committed
  901. \end{itemize}
  902. \caption{Record format (storage)}
  903. \label{fig:record_storage}
  904. \end{figure}
  905. \begin{figure}[H]
  906. \centering
  907. \begin{bytefield}[bitwidth=0.48em, bitformatting={\tiny\ttfamily}]{64}
  908. \bitheader{0,31,63} \\
  909. \bitbox{64}{\texttt{request identifier}} \\
  910. \wordbox[lrt]{5}{\texttt{record information}} \\
\begin{rightwordgroup}{16\,KiB}
  912. \bitbox[lrb]{32}{}
  913. \bitbox[lrt]{32}{} \\
  914. \wordbox[lr]{1}{\vspace{0.96em}\texttt{Database page}} \\
  915. \skippedwords \\\wordbox[lrb]{1}{}
  916. \end{rightwordgroup} \\
  917. \end{bytefield}
  918. \begin{itemize}
  919. \item[-] the request identifier must be equal to the request identifier from the query
  920. \item[-] in case of a write request or confirm request, the database page may be omitted or contain invalid information
  921. \end{itemize}
  922. \caption{Common reply format (storage)}
  923. \label{fig:common_format_reply_storage}
  924. \end{figure}
  925. \begin{figure}[H]
  926. \centering
  927. \begin{bytefield}[bitwidth=0.48em, bitformatting={\tiny\ttfamily}]{64}
  928. \bitheader{0,31,63} \\
  929. \bitbox{64}{\texttt{request identifier}} \\
\begin{rightwordgroup}{48 bytes}
  931. \bitbox[lrt]{64}{} \\
  932. \wordbox[lr]{1}{\vspace{0.96em}\texttt{unused}} \\
  933. \skippedwords \\\wordbox[lrb]{1}{}
  934. \end{rightwordgroup} \\
  935. \bitbox{64}{\texttt{number of unused pages}} \\
\bitbox{64}{\texttt{number of free pages due to deletions}} \\
  937. \bitbox{64}{\texttt{number of pages}} \\
  938. \bitbox{64}{\texttt{total size of the record table}} \\
  939. \bitbox{64}{\texttt{total size of the delete table}} \\
  940. \bitbox{64}{\texttt{unreclaimable pages}} \\
  941. \bitbox{64}{\texttt{available records in the table}} \\
  942. \bitbox{64}{\texttt{configured synchronization rate}} \\
  943. \bitbox{64}{\texttt{duration of the last synchronization}} \\
  944. \bitbox{64}{\texttt{longest duration of synchronization}} \\
  945. \bitbox{64}{\texttt{average duration of synchronizations}} \\
  946. \end{bytefield}
  947. \begin{itemize}
  948. \item[-] all values are big endian (see page \pageref{fig:gp_class_diagram_regulated})
  949. \item[-] all times are in \si{\micro\second}
  950. \end{itemize}
  951. \caption{Statistics reply format (storage)}
  952. \label{fig:stats_format_reply_storage}
  953. \end{figure}
  954. \section{\texttt{izaro-coordinate} queries}
  955. \begin{figure}[H]
  956. \centering
  957. \begin{bytefield}[bitwidth=0.48em]{64}
  958. \bitheader{0,31,63} \\
  959. \wordbox{1}{\texttt{operation}} \\
  960. \wordbox{2}{\texttt{UUID}} \\
  961. \wordbox{2}{\texttt{token}} \\
  962. \begin{rightwordgroup}{nonce}
  963. \wordbox{1}{\texttt{request identifier}} \\
  964. \wordbox{1}{\texttt{}}
  965. \end{rightwordgroup} \\
  966. \bitbox[lrt]{64}{} \\
  967. \wordbox[lr]{1}{\vspace{0.96em}\texttt{payload}} \\
  968. \skippedwords \\\wordbox[lrb]{1}{}
  969. \end{bytefield}
  970. \begin{itemize}
\item[] a zeroed UUID denotes an administration packet
  972. \end{itemize}
  973. \caption{Data packet}
  974. \label{fig:data_packet}
  975. \end{figure}
  976. \subsection{User payloads}
  977. %\begin{figure}[H]
  978. % \centering
  979. % \begin{bytefield}[bitwidth=0.48em]{64}
  980. %
  981. % \bitheader{0,31,63} \\
  982. % \wordbox{3}{\texttt{record identifier}} \\
  983. % \wordbox{1}{\texttt{timestamp}} \\
  984. % \end{bytefield}
  985. % \begin{itemize}
  986. % \item[] the data fetched is the last before the timestamp
  987. % \item[] the timestamp is aligned on the coordinator's timestamping
  988. % \item[] if the timestamp is omitted the last confirmed page is read
  989. % \end{itemize}
  990. % \caption{Read request}
  991. % \label{fig:read_request}
  992. %\end{figure}
  993. %
  994. %\begin{figure}[H]
  995. % \centering
  996. % \begin{bytefield}[bitwidth=0.48em]{64}
  997. %
  998. % \bitheader{0,31,63} \\
  999. % \begin{rightwordgroup}{$32Kio$}
  1000. % \bitbox[lrt]{64}{} \\
  1001. % \wordbox[lr]{1}{\vspace{0.96em}\texttt{file page}} \\
  1002. % \skippedwords \\\wordbox[lrb]{1}{}
  1003. % \end{rightwordgroup} \\
  1004. % \end{bytefield}
  1005. % \begin{itemize}
  1006. % \item[] the data will be stored on the time of the server
  1007. % \end{itemize}
  1008. % \caption{Allocate$+$Write request}
  1009. % \label{fig:allocate_write_request}
  1010. %\end{figure}
  1011. \subsection{Root user payloads}
  1012. \section{\texttt{izaro-coordinate} replies}
  1013. \begin{figure}[H]
  1014. \centering
  1015. \begin{bytefield}[bitwidth=0.48em]{64}
  1016. \bitheader{0,31,63} \\
  1017. \wordbox{1}{\texttt{page count}} \\
  1018. \wordbox{3}{\texttt{page max}} \\
  1019. \wordbox{3}{\texttt{record identifier}} \\
  1020. \end{bytefield}
  1021. \begin{itemize}
\item[] the data will be stored using the server's time
  1023. \end{itemize}
  1024. \caption{Userdata reply}
  1025. \label{fig:userdata_reply}
  1026. \end{figure}
  1027. \section{\texttt{izaro-coordinate} timing protocol and consensus}
Additional information regarding constraints is given in the technical specification, page \pageref{time_sync_chapter}.
  1029. \begin{figure}[H]
  1030. \centering
  1031. \begin{sequencediagram}
  1032. \newthread{a}{: Client}
  1033. \newinst[7]{b}{: Server}
  1034. \mess[1]{a}{$t_{client}$}{b}
  1035. \mess[1]{b}{$t_{client},t_{server}$}{a}
  1036. \end{sequencediagram}
  1037. \caption{Izaro client time synchronization}
  1038. \label{fig:client_time_proto}
  1039. \end{figure}
  1040. \begin{figure}[H]
  1041. \centering
  1042. \begin{sequencediagram}
  1043. \newthread{a}{: DataServer}
  1044. \newinst[7]{b}{: CoordServer}
  1045. \mess[1]{a}{$t_{client}$}{b}
  1046. \mess[1]{b}{$t_{client},t_{server}$}{a}
  1047. \mess[1]{a}{$t_{client},t_{server},t_{return},t_{round}$}{b}
  1048. \mess[1]{b}{$t_{client},t_{server},t_{return},t_{round},t_{sync}$}{a}
  1049. \end{sequencediagram}
  1050. \begin{enumerate}
  1051. \item $t_{client}$: Initial time as seen by the client
  1052. \item $t_{server}$: Time at reception of the first server packet as seen by the server
  1053. \item $t_{return}$: Time at reception of the first server reply by the data server
  1054. \item $t_{round}$: Time taken for a write round from the server
\item $t_{sync}$: An implementation-defined value computed from the protocol's historical data, e.g.
  1056. $t_{sync}=(\overline{t_{return}-t_{client}+t_{round}})\times 1.7$
  1057. \end{enumerate}
  1058. \caption{Izaro time synchronization}
  1059. \label{fig:time_proto}
  1060. \end{figure}
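The example formula for $t_{sync}$ can be sketched as follows. This is only one possible implementation of an implementation-defined value: the names \texttt{sync\_sample} and \texttt{compute\_t\_sync} are hypothetical, and the use of microseconds and of a plain arithmetic mean over the whole history are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One completed synchronization exchange; times assumed in microseconds.
struct sync_sample {
    std::int64_t t_client; // initial time as seen by the client
    std::int64_t t_return; // reception time of the first server reply
    std::int64_t t_round;  // time taken for a write round
};

// Mean of (t_return - t_client + t_round) over the history, scaled by a
// safety factor (1.7 in the example formula).
inline double compute_t_sync(const std::vector<sync_sample>& history,
                             double safety_factor = 1.7) {
    assert(!history.empty()); // the mean is undefined without samples
    double sum = 0.0;
    for (const sync_sample& s : history)
        sum += static_cast<double>(s.t_return - s.t_client + s.t_round);
    return sum / static_cast<double>(history.size()) * safety_factor;
}
```

With two samples whose terms are $150$ and $300$, the mean is $225$ and the resulting $t_{sync}$ is $225 \times 1.7 = 382.5$.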
  1061. \begin{figure}[H]
  1062. \centering
  1063. \begin{sequencediagram}
  1064. \newthread{a}{: CoordServer}
  1065. \newinst[7]{b}{: DataServer}
  1066. \mess[1]{a}{$t_{1}$ data write}{b}
  1067. \mess[1]{b}{unconfirmed record}{a}
  1068. \begin{callself}{a}{wait $t_{sync}$}{return $t_{2}$}
  1069. \end{callself}
  1070. \mess[1]{a}{$t_{1} - t_{sync} < x < t_{2}$ unconfirmed read}{b}
  1071. \mess[1]{b}{same unconfirmed record}{a}
  1072. \mess[1]{a}{validate record}{b}
  1073. \mess[1]{b}{validated record}{a}
  1074. \end{sequencediagram}
  1075. \caption{Izaro single write confirmation}
  1076. \label{fig:confirmation_proto}
  1077. \end{figure}
  1078. \begin{figure}[H]
  1079. \centering
  1080. \begin{sequencediagram}
  1081. \newthread{a}{: Client}
  1082. \newinst[4]{b}{: CoordServer}
  1083. \newinst[4]{c}{: DataServer}
  1084. \mess[1]{a}{try lock id}{b}
  1085. \begin{callself}{b}{check lock}{return success}
  1086. \end{callself}
  1087. \mess[1]{b}{log lock state id}{c}
  1088. \begin{callself}{b}{wait $t_{sync}$}{return;}
  1089. \end{callself}
  1090. \mess[1]{b}{return lock success}{a}
  1091. \prelevel\prelevel
  1092. \prelevel\prelevel
  1093. \begin{callself}{c}{save lock}{return;}
  1094. \end{callself}
  1095. \postlevel\postlevel
  1096. \postlevel
  1097. \end{sequencediagram}
  1098. \caption{Izaro synchronization information logging (success)}
  1099. \label{fig:synchro_log}
  1100. \end{figure}
  1101. \begin{figure}[H]
  1102. \centering
  1103. \begin{sequencediagram}
  1104. \newthread{a}{: Client}
  1105. \newinst[4]{b}{: CoordServer}
  1106. \newinst[4]{c}{: DataServer}
  1107. \mess[1]{a}{try lock id}{b}
  1108. \begin{callself}{b}{check lock}{return failure}
  1109. \end{callself}
  1110. \mess[1]{b}{log lock failure}{c}
  1111. \prelevel\prelevel
  1112. \mess[1]{b}{return lock failure}{a}
  1113. \begin{callself}{c}{log failure}{return;}
  1114. \end{callself}
  1115. \end{sequencediagram}
  1116. \caption{Izaro synchronization information logging (failure)}
  1117. \label{fig:synchro_log_fail}
  1118. \end{figure}
  1119. \begin{figure}[H]
  1120. \centering
  1121. \begin{footnotesize}
  1122. \begin{sequencediagram}
  1123. \newinst{da}{: Client \#1}
  1124. \newinst[4]{dc}{: Server}
  1125. \newinst[4]{db}{: Client \#2}
  1126. \mess[1]{da}{$t_{1}$ data write}{dc}
  1127. \mess[1]{dc}{unconfirmed record $A$}{da}
  1128. \begin{callself}{da}{wait $t_{sync}$}{return $t_{2}$}
  1129. \postlevel\postlevel
  1130. \postlevel\postlevel
  1131. \end{callself}
  1132. \mess[1]{da}{$t_{1} - t_{sync} < x < t_{2}$ unconfirmed read}{dc}
  1133. \mess[1]{dc}{self unconfirmed record is first}{da}
  1134. \postlevel
  1135. \mess[1]{da}{validate record}{dc}
  1136. \mess[1]{dc}{validated record}{da}
  1137. \prelevel\prelevel
  1138. \prelevel\prelevel
  1139. \prelevel\prelevel
  1140. \prelevel\prelevel
  1141. \prelevel\prelevel
  1142. \prelevel\prelevel
  1143. \prelevel\prelevel
  1144. \prelevel\prelevel
  1145. \prelevel\prelevel
  1146. \mess[1]{db}{$t_{1}'$ data write}{dc}
  1147. \mess[1]{dc}{unconfirmed record $B$}{db}
  1148. \begin{callself}{db}{wait $t_{sync}$}{return $t_{2}'$}
  1149. \postlevel\postlevel
  1150. \postlevel\postlevel
  1151. \end{callself}
  1152. \mess[1]{db}{$t_{1}'-t_{sync} < x < t_{2}'$ unconfirmed read}{dc}
  1153. \mess[1]{dc}{self unconfirmed record is not first}{db}
  1154. \begin{callself}{db}{retry}{}
  1155. \end{callself}
  1156. \end{sequencediagram}
  1157. \end{footnotesize}
  1158. \caption{Izaro two writes commit/cancellation decision}
  1159. \label{fig:2user_confirmation_proto}
  1160. \end{figure}
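The commit/cancellation decision taken by each writer in the diagrams above can be sketched as follows. This is a minimal illustration, not the reference implementation: the names \texttt{unconfirmed\_record}, \texttt{decision} and \texttt{decide} are hypothetical, and ``first'' is assumed to mean the record with the smallest timestamp inside the open window $t_{1} - t_{sync} < x < t_{2}$.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct unconfirmed_record {
    std::uint64_t writer;    // hypothetical writer identifier
    std::int64_t  timestamp; // time at which the record was labeled
};

enum class decision { validate, retry };

// Decide whether `self` may validate its write performed at t1, after
// having waited t_sync and observed the time t2.
inline decision decide(const std::vector<unconfirmed_record>& records,
                       std::uint64_t self, std::int64_t t1,
                       std::int64_t t_sync, std::int64_t t2) {
    assert(t_sync >= 0 && t1 <= t2);
    const unconfirmed_record* first = nullptr;
    for (const unconfirmed_record& r : records) {
        // Keep only records in the open window t1 - t_sync < x < t2.
        if (r.timestamp <= t1 - t_sync || r.timestamp >= t2) continue;
        // Assumption: the earliest timestamp wins; ties keep the first seen.
        if (first == nullptr || r.timestamp < first->timestamp) first = &r;
    }
    return (first != nullptr && first->writer == self) ? decision::validate
                                                       : decision::retry;
}
```

With two unconfirmed records labeled at times $100$ (writer 1) and $105$ (writer 2) and $t_{sync}=20$, writer 1 validates while writer 2 has to retry.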
  1161. \begin{figure}[H]
  1162. \centering
  1163. \begin{minipage}[t][\textheight-2\fboxsep-2\fboxrule][t]{0.7\textwidth}
  1164. \begin{footnotesize}
  1165. \begin{sequencediagram}
  1166. \newinst{db}{: Client}
  1167. \newinst[4]{da}{: CoordServer}
  1168. \newinst[4]{dc}{: DataServer}
  1169. \mess[1]{db}{send query}{da}
  1170. \mess[1]{da}{$t_{1}$ data write}{dc}
  1171. \mess[1]{dc}{unconfirmed record $A$}{da}
  1172. \prelevel
  1173. \prelevel
  1174. \begin{callself}{da}{wait $t_{sync}$}{return $t_{2}$}
  1175. \postlevel
  1176. \end{callself}
  1177. \mess[1]{da}{$t_{1} - t_{sync} < x < t_{2}$ unconfirmed read}{dc}
  1178. \mess[1]{dc}{self unconfirmed record is first}{da}
  1179. \mess[1]{da}{confirm record}{dc}
  1180. \mess[1]{dc}{confirmed record}{da}
  1181. \mess[1]{da}{wait continuation}{db}
  1182. \mess[1]{db}{send continuation query}{da}
  1183. \mess[1]{da}{$t_{3}$ data write}{dc}
  1184. \mess[1]{dc}{unconfirmed record $B$}{da}
  1185. \prelevel
  1186. \prelevel
  1187. \begin{callself}{da}{wait $t_{sync}$}{return $t_{4}$}
  1188. \postlevel
  1189. \end{callself}
\mess[1]{da}{$t_{3} - t_{sync} < x < t_{4}$ unconfirmed read}{dc}
  1191. \mess[1]{dc}{self unconfirmed record is first}{da}
  1192. \mess[1]{da}{commit record}{dc}
  1193. \mess[1]{dc}{committed record}{da}
  1194. \mess[1]{da}{return}{db}
  1195. \end{sequencediagram}
  1196. \end{footnotesize}
  1197. \end{minipage}
  1198. \caption{Izaro two queries confirmation and validation}
  1199. \label{fig:2query_confirmation_proto}
  1200. \end{figure}
  1201. \chapter{Storage layer}
  1202. \section{\texttt{izaro-storage}}
  1203. \begin{itemize}[label={\Square}]
  1204. \item Receive a query
  1205. \item Route a query
  1206. \begin{itemize}[label={\Square}]
  1207. \item Route to read
  1208. \item Route to write
  1209. \item Route to confirm
  1210. \item Route to commit/validate
  1211. \item Route to remove
  1212. \item Route to test
  1213. \item Route to allocate
  1214. \item Route to initiate streaming
  1215. \item Route to log lock information
  1216. \end{itemize}
  1217. \item Preemptively perform time synchronization queries
  1218. \item Perform side-effect
  1219. \begin{itemize}[label={\Square}]
  1220. \item Write
  1221. \item Confirm
  1222. \item Commit/validate
  1223. \item Remove
\item Test
  1225. \item Allocate
  1226. \item Streaming initiation
  1227. \item Log lock information
  1228. \end{itemize}
  1229. \item Perform response
  1230. \begin{itemize}[label={\Square}]
  1231. \item Respond to read
  1232. \item Respond to write
  1233. \item Respond to confirm
  1234. \item Respond to commit/validate
  1235. \item Respond to remove
  1236. \item Respond to test
  1237. \item Respond to allocate
  1238. \item Respond to initiate streaming
  1239. \item Responses are cached for at least 1 second
\item Responses are handled as an outgoing queue
  1241. \end{itemize}
  1242. \end{itemize}
  1243. \begin{itemize}[label={\Square}]
  1244. \item Can provide statistics given the statistics format
  1245. \item Statistics are readable as column data
  1246. \item Statistics items values start after the 16th byte of each line
  1247. \item Statistic items have units if applicable
  1248. \end{itemize}
  1249. \section{\texttt{gplib}: General Purpose POSIX based tools library}
  1250. \begin{itemize}[label={\Square}]
  1251. \item Managed file descriptor
  1252. \item Managed memory mapping
  1253. \item Managed stored array
  1254. \item Managed stored hash map
  1255. \item Managed stored indexed array
  1256. \item Regulated to big endian integer
  1257. \item Single Write Multiple Read adapter for containers
  1258. \item Region based MT adapter for containers
  1259. \end{itemize}
  1260. \begin{figure}[h]
  1261. \centering
  1262. \begin{tikzpicture}
  1263. \begin{class}[text width=0.80\textwidth]{shared\string_fd}{0,14}
  1264. \attribute{- fd : int}
  1265. \attribute{- last\string_error : int}
  1266. \attribute{- guard : std::atomic\string_int*}
  1267. \operation{+ shared\string_fd()}
  1268. \operation{- shared\string_fd( fd : int )}
  1269. \operation{+ \string~shared\string_fd()}
  1270. \operation{+ is\string_valid() : bool}
  1271. \operation{+ has\string_failed() : bool}
  1272. \operation{+ value() : int}
  1273. \operation{+ read( buffer : std::string\string_view ) : std::string\string_view}
  1274. \operation{+ write( buffer : std::string\string_view ) : std::string\string_view}
  1275. \operation{+ connect( addr : address )}
  1276. \operation{+ bind( addr : address )}
  1277. \operation{+ send( buffer : std::string\string_view, addr : address )}
  1278. \operation{+ receive( buffer : std::pair<std::string\string_view, address> ) : std::string\string_view}
  1279. \operation{+ accept() : shared\string_fd}
  1280. \operation{\underline{+ create( filename : std::string\string_view, mode : int ) : shared\string_fd}}
  1281. \operation{\underline{+ open( filename : std::string\string_view, flags : int ) : shared\string_fd}}
  1282. \operation{\underline{+ socket( domain : int, proto : int, flags : int ) : shared\string_fd}}
  1283. \operation{\underline{+ unix\string_socket( proto : int ) : shared\string_fd}}
  1284. \operation{\underline{+ unix\string_socket\string_pair( proto : int ) : std::pair<shared\string_fd, shared\string_fd>}}
  1285. \end{class}
  1286. \begin{class}[text width=0.80\textwidth]{shared\string_mmap}{0,0}
  1287. \attribute{- off : size\string_t}
  1288. \attribute{- sz : size\string_t}
  1289. \attribute{- ptr : void*}
  1290. \attribute{- guard : std::atomic\string_int*}
  1291. \operation{+ shared\string_mmap()}
  1292. \operation{+ shared\string_mmap( fd : shared\string_fd, size : size\string_t, offset : size\string_t )}
  1293. \operation{+ \string~shared\string_mmap()}
  1294. \operation{+ operator()$<$typename T$>$() : T*}
  1295. \operation{+ size() : size\string_t}
  1296. \operation{+ begin() : uint8\string_t*}
  1297. \operation{+ end() : uint8\string_t*}
  1298. \operation{+ offset() : size\string_t}
  1299. \operation{+ advise(adv\string_value)}
  1300. \end{class}
  1301. \composition{shared\string_mmap}{fd}{1}{shared\string_fd}
  1302. \end{tikzpicture}
  1303. \caption{\texttt{gplib} shared\_{}fd and shared\_{}mmap}
  1304. \label{fig:gp_class_diagram_mmap}
  1305. \end{figure}
  1306. \begin{figure}[h]
  1307. \centering
  1308. \begin{tikzpicture}
  1309. \begin{class}[text width=0.80\textwidth]{shared\string_fd}{0,6}
  1310. \end{class}
  1311. \begin{class}[text width=0.80\textwidth]{shared\string_mmap}{0,3}
  1312. \end{class}
  1313. \begin{class}[text width=0.80\textwidth]{stored\string_cyclic\string_queue}{0,0}
  1314. \attribute{+ $<<$T$>>$ : Type}
  1315. \attribute{+ $<<$is\string_mt\string_safe$>>$ : bool}
  1316. \attribute{- guard : std::atomic\string_int*}
  1317. \operation{+ stored\string_cyclic\string_queue()}
  1318. \operation{+ \string~stored\string_cyclic\string_queue()}
  1319. \operation{\underline{+ create( filename, elem\string_count) : stored\string_cyclic\string_queue}}
  1320. \operation{\underline{+ open( filename : std::string\string_view ) : stored\string_cyclic\string_queue}}
  1321. \operation{+ push( elem : T )}
  1322. \operation{+ has\string_looped() : bool}
  1323. \operation{+ size() : size\string_t}
  1324. \operation{+ elem\string_size() : size\string_t}
  1325. \operation{+ begin() : iterator$<$T$>$}
  1326. \operation{+ end() : iterator$<$T$>$}
  1327. \end{class}
  1328. \composition{shared\string_mmap}{fd}{1}{shared\string_fd}
  1329. \composition{stored\string_cyclic\string_queue}{metadata, data\string_buffer}{2}{shared\string_mmap}
  1330. \end{tikzpicture}
  1331. \caption{\texttt{gplib} stored\_{}cyclic\_{}queue}
  1332. \label{fig:gp_class_diagram_scq}
  1333. \end{figure}
  1334. \begin{figure}[h]
  1335. \centering
  1336. \begin{tikzpicture}
  1337. \begin{class}[text width=0.80\textwidth]{shared\string_fd}{0,6}
  1338. \end{class}
  1339. \begin{class}[text width=0.80\textwidth]{shared\string_mmap}{0,3}
  1340. \end{class}
  1341. \begin{class}[text width=0.80\textwidth]{stored\string_indexed\string_array}{0,0}
  1342. \attribute{+ $<<$T$>>$ : Type}
  1343. \attribute{+ $<<$is\string_mt\string_safe$>>$ : bool}
  1344. \attribute{- guard : std::atomic\string_int*}
  1345. \operation{+ stored\string_indexed\string_array()}
  1346. \operation{+ \string~stored\string_indexed\string_array()}
\operation{\underline{+ create( filename, elem\string_count) : stored\string_indexed\string_array}}
\operation{\underline{+ open( filename : std::string\string_view ) : stored\string_indexed\string_array}}
  1349. \operation{+ push( elem : T )}
  1350. \operation{+ pop(): T}
  1351. \operation{+ delete()}
  1352. \operation{+ size() : size\string_t}
  1353. \operation{+ elem\string_size() : size\string_t}
  1354. \operation{+ begin() : iterator$<$T$>$}
  1355. \operation{+ end() : iterator$<$T$>$}
  1356. \operation{+ operator[]( idx : size\string_t )}
  1357. \end{class}
  1358. \composition{shared\string_mmap}{fd}{1}{shared\string_fd}
  1359. \composition{stored\string_indexed\string_array}{metadata, data\string_array, translation\string_array, reserve\string_array, delete\string_array}{5}{shared\string_mmap}
  1360. \end{tikzpicture}
  1361. \caption{\texttt{gplib} stored\_{}indexed\_{}array}
  1362. \label{fig:gp_class_diagram_sia}
  1363. \end{figure}
  1364. \begin{figure}[h]
  1365. \centering
  1366. \begin{tikzpicture}
  1367. \begin{class}[text width=0.80\textwidth]{shared\string_fd}{0,6}
  1368. \end{class}
  1369. \begin{class}[text width=0.80\textwidth]{shared\string_mmap}{0,3}
  1370. \end{class}
  1371. \begin{class}[text width=1.03\textwidth]{stored\string_hash\string_map}{0,0}
  1372. \attribute{+ $<<$Tk$>>$ : Type}
  1373. \attribute{+ $<<$Tv$>>$ : Type}
  1374. \attribute{\underline{+ $<<$TPred$>>$ $=$ std::function$<$std::optional$<$Tr$>$(std::pair<Tk,Tv>)$>$ : Type}}
  1375. \attribute{+ $<<$is\string_mt\string_safe$>>$ : bool}
  1376. \attribute{- guard : std::atomic\string_int*}
  1377. \operation{+ stored\string_hash\string_map()}
  1378. \operation{+ \string~stored\string_hash\string_map()}
\operation{\underline{+ create( filename, elem\string_count, rec\string_count, del\string_count) : stored\string_hash\string_map}}
\operation{\underline{+ open( filename : std::string\string_view ) : stored\string_hash\string_map}}
  1381. \operation{+ insert( key : Tk, value : Tv )}
  1382. \operation{+ get( key : Tk ): Tv}
  1383. \operation{+ get\string_or( key : Tk, or\string_v : Tv ): Tv}
  1384. \operation{+ get\string_filter( key : Tk, lambda : std::function ): Tv}
\operation{+ for\string_each\string_backlog$<$Tr$>$( key : Tk, lambda : TPred ): Tr}
  1386. \operation{+ delete( key : Tk )}
  1387. \operation{+ size() : size\string_t}
  1388. \operation{+ elem\string_size() : size\string_t}
  1389. \operation{+ begin() : iterator$<$T$>$}
  1390. \operation{+ end() : iterator$<$T$>$}
  1391. \end{class}
  1392. \composition{shared\string_mmap}{fd}{1}{shared\string_fd}
  1393. \composition{stored\string_hash\string_map}{metadata, data\string_array, record\string_array, delete\string_array}{4}{shared\string_mmap}
  1394. \end{tikzpicture}
  1395. \caption{\texttt{gplib} stored\_{}hash\_{}map}
  1396. \label{fig:gp_class_diagram_shm}
  1397. \end{figure}
  1398. \begin{figure}[h]
  1399. \centering
  1400. \begin{tikzpicture}
  1401. \begin{class}[text width=0.90\textwidth]{regulated}{0,0}
  1402. \attribute{+ $<<$T$>>$ : Type}
  1403. \attribute{+ data : std::array$<$uint8\string_t, sizeof(T)$>$}
  1404. \operation{+ regulated()}
  1405. \operation{+ regulated(value : T)}
  1406. \operation{+ \string~regulated()}
  1407. \operation{+ operator T() : T}
  1408. \end{class}
  1409. \end{tikzpicture}
  1410. \caption{\texttt{gplib} regulated to big endian value}
  1411. \label{fig:gp_class_diagram_regulated}
  1412. \end{figure}
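The \texttt{regulated} class above stores a value as bytes in big endian order, whatever the byte order of the host. The following is a minimal sketch of how such a class could be written, following the members shown in the diagram; restricting it to unsigned integer types is an assumption made to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

template <typename T>
class regulated {
    static_assert(std::is_unsigned<T>::value,
                  "this sketch only handles unsigned integers");
public:
    std::array<std::uint8_t, sizeof(T)> data{};

    regulated() = default;

    // Store the value most significant byte first.
    regulated(T value) {
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            data[sizeof(T) - 1 - i] = static_cast<std::uint8_t>(value & 0xFF);
            value = static_cast<T>(value >> 8);
        }
    }

    // Rebuild the host value from the big endian bytes.
    operator T() const {
        T value = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            value = static_cast<T>((value << 8) | data[i]);
        return value;
    }
};
```

Because the conversion works byte by byte with shifts, it gives the same stored layout on little endian and big endian hosts, so the \texttt{data} member can be copied directly into a packet field.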
  1413. \begin{figure}[h]
  1414. \centering
  1415. \begin{tikzpicture}
  1416. \begin{class}[text width=0.30\textwidth]{shared\string_fd}{0,6}
  1417. \end{class}
  1418. \begin{class}[text width=0.30\textwidth]{shared\string_mmap}{0,3}
  1419. \end{class}
  1420. \begin{class}[text width=0.30\textwidth]{stored\string_cyclic\string_queue}{-6,3}
  1421. \end{class}
  1422. \begin{class}[text width=0.30\textwidth]{stored\string_indexed\string_array}{6,3}
  1423. \end{class}
  1424. \begin{class}[text width=0.30\textwidth]{stored\string_hash\string_map}{6,6}
  1425. \end{class}
  1426. \composition{shared\string_mmap}{}{}{shared\string_fd}
  1427. \composition{stored\string_hash\string_map}{}{}{shared\string_mmap}
  1428. \composition{stored\string_indexed\string_array}{}{}{shared\string_mmap}
  1429. \composition{stored\string_cyclic\string_queue}{}{}{shared\string_mmap}
  1430. \end{tikzpicture}
  1431. \caption{\texttt{gplib} container classes}
  1432. \label{fig:gp_class_diagram_global}
  1433. \end{figure}
  1434. \chapter{Coordination layer}
  1435. \section{Client to \texttt{izaro-coordinate}}
Queries are performed asynchronously; queries SHOULD be terminated by a specified, relatively short timeout. A query is composed of an ordered sequence of operations, executed with the help of two (2) stack-based caches. A query is not considered performed until all of its operations have been performed.
One of the caches is the query cache, a stack of operations executed from top to bottom.
The other cache is the data cache, onto which data is pushed and from which data is popped.
Queries can be split into multiple packets, each tagged with an index and a size. The reply to an incomplete query will always be an error until all parts are received; this error reply lists the missing parts of the query. The reply to an unfinished query will always be an error, except if the query is in the waiting state.
Replies MAY also be split in a similar fashion to queries. The client can at any moment request a given index of an incomplete reply. Errors only report the stack positions of the query stack and of the data stack.
The query stack only contains operations; the data stack contains tagged unions of record identifiers, records and pages.
Both the query stack and the data stack MUST be stored in memory-mapped temporary files, so that they cannot exceed RAM capacity in a way that allows denial of service. Those temporary files MUST be on a form of live disk.
Writes pop a record identifier, then pop a page.
Reads push a page, then a record identifier.
Assertions expect a record identifier on top of the stack, but do not change it.
Allocations push a record identifier.
Large queries MAY be implemented to wait for a longer timeout.
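The data stack discipline described above can be sketched as follows. This is a minimal in-memory illustration: the type layouts and the names \texttt{data\_stack}, \texttt{do\_read}, \texttt{do\_write} and \texttt{do\_assert} are hypothetical, and a \texttt{std::vector} stands in for the memory-mapped temporary file required by the specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <variant>
#include <vector>

struct record_id { std::uint64_t value; };           // hypothetical layout
struct record    { record_id id; std::uint64_t timestamp; };
struct page      { std::vector<std::uint8_t> bytes; };

// Tagged union of the three kinds of items held by the data stack.
using data_item = std::variant<record_id, record, page>;

class data_stack {
    std::vector<data_item> items; // stand-in for the mapped temporary file
public:
    void push(data_item item) { items.push_back(std::move(item)); }
    std::size_t size() const { return items.size(); }

    template <typename T>
    bool top_is() const {
        return !items.empty() && std::holds_alternative<T>(items.back());
    }
    // Pop the top item, failing if it is not of the expected kind.
    template <typename T>
    T pop() {
        if (!top_is<T>()) throw std::runtime_error("unexpected item on data stack");
        T value = std::get<T>(std::move(items.back()));
        items.pop_back();
        return value;
    }
};

// Read: push the page, then its record identifier.
inline void do_read(data_stack& s, record_id id, page p) {
    s.push(std::move(p));
    s.push(id);
}
// Write: pop a record identifier, then pop a page.
inline std::pair<record_id, page> do_write(data_stack& s) {
    record_id id = s.pop<record_id>();
    page p = s.pop<page>();
    return {id, std::move(p)};
}
// Assertion: expects a record identifier on top, leaves the stack unchanged.
inline bool do_assert(const data_stack& s) { return s.top_is<record_id>(); }
```

Since a read leaves a record identifier above its page, a read can be followed directly by a write, which consumes the two items in the opposite order.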
  1448. \begin{itemize}[label={\Square}]
  1449. \item Receive a query
  1450. \item Provide a connection hook
  1451. \item Can read a raw connection block
  1452. \item Perform non query operations
  1453. \begin{itemize}[label={\Square}]
  1454. \item Perform a long term lock
  1455. \item Release a long term lock
  1456. \item Lock a guard
  1457. \item Unlock a guard
  1458. \item Lock on a named region lock
  1459. \item Unlock on a named region lock
  1460. \item Create a named region lock
  1461. \item Inflate named region lock
  1462. \item Deflate named region lock
  1463. \item Get named region lock statistics
  1464. \end{itemize}
  1465. \item Perform a query
  1466. \begin{itemize}[label={\Square}]
  1467. \item Preemptively perform time synchronization queries
  1468. \item Create a query stack
  1469. \item Prepare a query stack
  1470. \item Execute a query stack
  1471. \begin{itemize}[label={\Square}]
  1472. \item Assert timestamp, pop and write
  1473. \item Assert/pop record identifier
  1474. \item Assert/pop page
  1475. \item Allocate and push
  1476. \item Confirm and wait
  1477. \item Read and push
  1478. \item Pop and write
  1479. \item Pop and delete
  1480. \item Clear and push
  1481. \item Assert timestamp
  1482. \item Cancel
  1483. \item Commit
\item Query stack pop until size $n$
  1485. \item Interweave $n$ elements with the $n$ next elements
  1486. \end{itemize}
  1487. \end{itemize}
  1488. \item Cleanup old data
  1489. \begin{itemize}[label={\Square}]
  1490. \item Cache
  1491. \item Timeout an old query
  1492. \item Incomplete query stack
  1493. \item List of past requests
  1494. \end{itemize}
  1495. \item Perform response
  1496. \begin{itemize}[label={\Square}]
  1497. \item Serialize a query stack state
  1498. \item Return a query stack state
  1499. \item Serialize a data stack state
  1500. \item Return a data stack state
  1501. \item Return a wait mark
  1502. \item Return a valid error
  1503. \end{itemize}
  1504. \item Keep track of user service usage
  1505. \begin{itemize}[label={\Square}]
  1506. \item Keep track of user write count
  1507. \item Keep track of user snapshots
  1508. \item Keep track of user delete count
  1509. \end{itemize}
  1510. \item Keep track of service activity
  1511. \begin{itemize}[label={\Square}]
  1512. \item Keep track of network usage
  1513. \item Keep track of non storage operations per minute
  1514. \item Keep track of storage operations per minute
\item In case of low storage operations per minute
  1516. \begin{itemize}[label={\Square}]
\item Trigger a cleanup, either until the deletion table is filled to a given threshold or until the cleanup is complete
  1518. \item Perform a replica clone if an extra server is available
  1519. \end{itemize}
\item In case of high non-storage operations per minute
  1521. \begin{itemize}[label={\Square}]
  1522. \item Request increasing the cyclic log size if the storage is planned to last less than a minute
\item Send a report of high non-storage usage
  1524. \end{itemize}
  1525. \item Log activity reports
  1526. \item Track handlers on exit and log statistics
  1527. \end{itemize}
  1528. \end{itemize}
Region locks have a name, a slicing and a size. Any position within the size can be locked rapidly, which locks the whole slice of the region containing it. This is useful for locking parts of files.
The size of slices is fixed. Inflating the region lock adds slices to it. Deflating the region lock removes slices from it if and only if they are not locked; otherwise it fails. Gradual deflation is advised unless large portions of the region can be expected to be unlocked.
Coordination servers are responsible for slicing big data plans across data server constellations, trying to distribute pages equally across all constellations, even if that means assigning new UUIDs to a user during allocating operations.
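The behaviour of a named region lock can be sketched as follows. This is a single-threaded illustration under stated assumptions: the class name \texttt{region\_lock} and its member names are hypothetical, deflation is assumed to remove slices from the end of the region, and a real implementation would need atomic or otherwise synchronized slice flags.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class region_lock {
    std::string name_;
    std::size_t slice_size_;   // fixed for the lifetime of the lock
    std::vector<bool> locked_; // one flag per slice
public:
    region_lock(std::string name, std::size_t slice_size, std::size_t slices)
        : name_(std::move(name)), slice_size_(slice_size), locked_(slices, false) {
        assert(slice_size > 0);
    }
    const std::string& name() const { return name_; }
    std::size_t size() const { return slice_size_ * locked_.size(); }

    // Lock the whole slice containing `position`; returns false if the
    // position is out of range or the slice is already locked.
    bool try_lock(std::size_t position) {
        if (position >= size()) return false;
        std::size_t slice = position / slice_size_;
        if (locked_[slice]) return false;
        locked_[slice] = true;
        return true;
    }
    void unlock(std::size_t position) {
        if (position < size()) locked_[position / slice_size_] = false;
    }
    // Inflating adds unlocked slices at the end of the region.
    void inflate(std::size_t slices) {
        locked_.resize(locked_.size() + slices, false);
    }
    // Deflating removes trailing slices only if none of them is locked.
    bool deflate(std::size_t slices) {
        if (slices > locked_.size()) return false;
        for (std::size_t i = locked_.size() - slices; i < locked_.size(); ++i)
            if (locked_[i]) return false;
        locked_.resize(locked_.size() - slices);
        return true;
    }
};
```

With a slice size of 4096 and four slices, locking position 5000 locks the second slice, so a deflation by three slices fails while a deflation by two slices succeeds.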
  1532. \begin{figure}[h]
  1533. \centering
  1534. \begin{sequencediagram}
  1535. \newthread{a}{: CoordServer}
  1536. \newinst[4]{b}{: DataServer}
  1537. \mess[1]{a}{Query}{b}
  1538. \begin{callself}{b}{Perform}{}
  1539. \end{callself}
  1540. \mess[1]{b}{Reply}{a}
  1541. \prelevel\prelevel
  1542. \prelevel\prelevel
  1543. \prelevel
  1544. \begin{callself}{a}{Timeout}{Results}
  1545. \postlevel\postlevel
  1546. \postlevel\postlevel
  1547. \end{callself}
  1548. \end{sequencediagram}
  1549. \caption{Successful query}
  1550. \label{fig:cli_to_coord_success}
  1551. \end{figure}
  1552. \begin{figure}[H]
  1553. \centering
  1554. \begin{sequencediagram}
  1555. \newthread{a}{: Client}
  1556. \newinst[4]{b}{: Server}
  1557. \mess[1]{a}{Query}{b}
  1558. \begin{callself}{b}{Perform}{}
  1559. \end{callself}
  1560. \prelevel\prelevel
  1561. \prelevel
  1562. \begin{callself}{a}{Timeout}{Error}
  1563. \postlevel\postlevel
  1564. \postlevel\postlevel
  1565. \end{callself}
  1566. \begin{sdblock}{loop}{while no results received}
  1567. \mess[1]{a}{Query with same id}{b}
  1568. \begin{callself}{b}{Perform}{Cached results}
  1569. \end{callself}
  1570. \mess[1]{b}{Reply?}{a}
  1571. \prelevel\prelevel
  1572. \prelevel\prelevel
  1573. \prelevel
  1574. \begin{callself}{a}{Timeout}{Results or Error}
  1575. \postlevel\postlevel
  1576. \postlevel\postlevel
  1577. \end{callself}
  1578. \end{sdblock}
  1579. \end{sequencediagram}
  1580. \caption{Delayed success query}
  1581. \label{fig:cli_to_coord_dsuccess}
  1582. \end{figure}
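The delayed success case in the figure above relies on the server recognising a repeated query identifier and answering from cached results instead of performing the operation again. A minimal sketch of that idea follows; the names \texttt{Server}, \texttt{handle} and \texttt{query\string_with\string_retry} are hypothetical, and lost replies are simulated with a counter rather than a real network timeout.

```python
class Server:
    """Toy server caching results by query id so that retries are not re-executed."""

    def __init__(self):
        self.cache = {}
        self.executions = 0

    def handle(self, query_id, payload):
        if query_id not in self.cache:
            self.executions += 1
            self.cache[query_id] = payload.upper()   # stand-in for the real operation
        return self.cache[query_id]


def query_with_retry(server, payload, query_id, lost_replies, max_attempts=5):
    """Resend the query with the same id until a reply gets through.

    `lost_replies` is the number of initial replies that are dropped, which
    stands in for the client-side timeout of the sequence diagram.
    """
    for attempt in range(1, max_attempts + 1):
        result = server.handle(query_id, payload)
        if attempt > lost_replies:
            return result, attempt
        # Reply lost: the client times out and retries with the same id.
    raise TimeoutError("no reply received")
```

With two lost replies the client receives its answer on the third attempt, while the operation itself runs only once on the server.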
  1583. \section{\texttt{izaro-coordinate} to \texttt{izaro-storage}}
The list of writes MUST be kept and checked for timestamp consistency before confirmation. It MUST be stored in memory-mapped temporary files so that it cannot exceed live memory capacity in a way that allows denial of service.
If an operation takes more than $t_{sync}$ to perform, its confirmation is not guaranteed to be atomic and MAY result in a form of corruption. Any operation exceeding the threshold of operations advised by the server MAY have these side effects. It is advised that such operations use either a regional form of locking or a per-entity form of locking performed on the client side in accordance with the locking constraints. The server MUST loosely guarantee that any operation that takes fewer than the advised number of steps is atomic and sequentially consistent.
This MAY imply that only a group of operations that takes less than $t_{sync}$ will be atomic.
Any operation that takes less than $t_{sync}$ to perform MUST be atomic by design or fail.
  1588. \section{Coordination layer high-availability}
Each coordination server in a cluster is assigned a number ID by the leader. This number ID must be greater than any number ID already in the cluster. The leader is always the node with the lowest number ID. Only the leader can perform locking operations, new node inclusion and replication orders.
In case of partition, each node is to try to contact the node with the lowest ID and register it as its potential leader. A leader emerges if a strict majority of the nodes refer to it as leader. If a node fails to obtain said majority within a defined time frame, it is to disconnect from the cluster, wait for a leader with a lower ID to emerge, then request a new number ID from that leader.
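The two rules above (IDs only grow, and a leader needs a strict majority of referrals) can be sketched as follows. The function names \texttt{assign\string_id} and \texttt{elect} are hypothetical, and whether a node counts its own referral is not stated in the text; here every entry of the referral map counts as one vote.

```python
def assign_id(existing_ids):
    """Leader side: a new node gets an ID greater than every ID in the cluster."""
    return max(existing_ids, default=0) + 1


def elect(referrals, cluster_size):
    """Return the node that a strict majority refers to as leader, else None.

    `referrals` maps a node ID to the ID of the node it registered as its
    potential leader. `cluster_size` is the total number of nodes, including
    the ones that could not be reached.
    """
    counts = {}
    for candidate in referrals.values():
        counts[candidate] = counts.get(candidate, 0) + 1
    for candidate, votes in counts.items():
        if 2 * votes > cluster_size:   # strict majority
            return candidate
    return None
```

A split such as two nodes referring to one candidate and two to another yields no leader, which is the situation in which the nodes without a majority are expected to disconnect and wait.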
  1591. \section{\texttt{gplib}: General Purpose POSIX based tools library}
  1592. \begin{itemize}[label={\Square}]
  1593. \item Managed stored lock table
  1594. \item Managed stored circular log
  1595. \item Managed stored lock queue
  1596. \item Bloom-filter
  1597. \item Region locker
  1598. \end{itemize}
  1599. \begin{figure}[h]
  1600. \centering
  1601. \begin{tikzpicture}
  1602. \begin{class}[text width=0.20\textwidth]{shared\string_fd}{-1,6}
  1603. \end{class}
  1604. \begin{class}[text width=0.20\textwidth]{shared\string_mmap}{-1,3}
  1605. \end{class}
  1606. \begin{class}[text width=0.20\textwidth]{stored\string_hash\string_map}{-1,0}
  1607. \end{class}
  1608. \begin{class}[text width=0.40\textwidth]{lock\string_id}{-11,-3}
  1609. \attribute{value : std::array$<$char, 4096$>$}
  1610. \end{class}
  1611. \begin{class}[text width=0.20\textwidth]{lock\string_key}{-5,6}
  1612. \attribute{value : uint64\string_t}
  1613. \end{class}
  1614. \begin{class}[text width=0.40\textwidth]{lock\string_tout}{-11,3}
  1615. \attribute{timestamp : regulated$<$uint64\string_t $>$}
  1616. \end{class}
  1617. \begin{class}[text width=0.30\textwidth]{exclusivity\string_t}{-11,0}
  1618. \attribute{value : uint8\string_t}
  1619. \attribute{\underline{free = 0}}
  1620. \attribute{\underline{exclusive = 0xFF}}
  1621. \end{class}
  1622. \begin{class}[text width=0.30\textwidth]{locktable}{-1,-3}
  1623. \attribute{\underline{Tk = lock\string_id}}
  1624. \attribute{\underline{Tv = lock\string_info}}
  1625. \inherit{stored\string_hash\string_map}
  1626. \end{class}
  1627. \begin{class}[text width=0.20\textwidth]{lock\string_info}{-5,-1}
  1628. \end{class}
  1629. \composition{shared\string_mmap}{}{}{shared\string_fd}
  1630. \aggregation{locktable}{$<<$Type$>>$}{}{lock\string_id}
  1631. \aggregation{locktable}{$<<$Type$>>$}{}{lock\string_info}
  1632. \composition{stored\string_hash\string_map}{}{}{shared\string_mmap}
  1633. \composition{lock\string_info}{id}{}{lock\string_id}
  1634. \composition{lock\string_info}{key}{}{lock\string_key}
  1635. \composition{lock\string_info}{timeout}{}{lock\string_tout}
  1636. \composition{lock\string_info}{state}{}{exclusivity\string_t}
  1637. \end{tikzpicture}
  1638. \caption{\texttt{gplib} lock classes}
  1639. \label{fig:gp_class_diagram_locks}
  1640. \end{figure}
  1641. \section{Communication between coordination servers}
  1642. Each server is to hold a certain amount of data about the cluster:
  1643. \begin{itemize}
  1644. \item Index ID
  1645. \item List of the other coordination servers and some information about them:
  1646. \begin{itemize}
  1647. \item Index ID
  1648. \item A contact target (tuple of a domain or IP address and a port)
  1649. \end{itemize}
  1650. \item Index ID of the leader
  1651. \end{itemize}
Any server with a lower Index ID (also referred to as the number ID) has execution authority over the servers ranked below it, that is, those with a higher Index ID.
  1653. \subsection{Leadership}
In a normal state, the leader is the server with the lowest number ID that is accessible from every coordination server. The leader is responsible for all locking operations within the cluster and for streaming the logs of those operations, so that the locking state of the cluster can be reconstructed as accurately as possible in case of failure.
If the leader loses access to a majority of the connected servers for a time period defined in its configuration, it is to announce that it drops leadership, and the cluster enters a degraded state.
In degraded state, no lock or server replacement operation can be performed. Normal operations can still be performed for a configuration-defined time, during which the shared time state is considered not to have degraded. In that state, the servers are to perform a partition resolution.
  1657. \subsection{Partition resolution}
In partition resolution mode, every server is to try to contact the other servers in Index ID order, and to assign and send to each of them a partition resolution value that depends on how low the Index ID of the target server ranks in the list of reachable servers ordered by Index ID. If a majority of servers can reach a given server with a partition resolution value of 1, that server is elected leader. Otherwise, if it has a majority once partition resolution values of 1 and 2 are counted together, it is elected leader.
  1659. \begin{figure}[H]
  1660. \centering
  1661. \begin{sequencediagram}
  1662. \newinst{a}{: Server \#2}
  1663. \newinst[3]{b}{: Server \#3}
  1664. \newinst[3]{c}{: Server \#4}
  1665. \mess[2]{b}{connect}{a}
  1666. \prelevel\prelevel\prelevel
  1667. \mess[2]{c}{connect}{a}
  1668. \mess[2]{b}{$PTR=1$}{a}
  1669. \prelevel\prelevel\prelevel
  1670. \mess[2]{c}{$PTR=1$}{a}
  1671. \begin{callself}{a}{Majority of $PTR=1$}{Set as leader}
  1672. \end{callself}
  1673. \mess[3]{a}{$Leader=2$}{b}
  1674. \prelevel\prelevel\prelevel\prelevel
  1675. \mess[3]{a}{$Leader=2$}{c}
  1676. \end{sequencediagram}
  1677. \caption{Best case PTR}
  1678. \label{fig:best_case_ptr}
  1679. \end{figure}
  1680. Between every step, there must be a synchronization between parties that managed to communicate.
  1681. \begin{table}[H]
  1682. \centering
  1683. \begin{tabular}{|c|c|c|c|c|}
  1684. \hline
  1685. & \#2 & \#3 & \#4 & \#5 \\
  1686. \hline
  1687. \#2 & X & X & X & X \\
  1688. \hline
  1689. \#3 & 1 & X & X & X \\
  1690. \hline
  1691. \#4 & 0 & 1 & X & X \\
  1692. \hline
  1693. \#5 & 0 & 1 & 1 & X \\
  1694. \hline
  1695. \end{tabular}\\
  1696. \caption{Network availability table}
  1697. \begin{tabular}{|l|l|l|l|}
  1698. \hline
  1699. \#2 & \#3 & \#4 & \#5 \\
  1700. \hline
  1701. L=0 & L=0 & L=0 & L=0 \\
  1702. \hline
  1703. L=1 & L=0 & L=0 & L=0 \\
  1704. \hline
  1705. L=1 PTR(3$\rightarrow{}$1) & L=0 Exp \#2 & L=0 Exp \#2 & L=0 Exp \#2 \\
  1706. \hline
  1707. L=0 PTR(3$\rightarrow{}$1) Exp \#3 & L=1 PTR(3$\rightarrow{}$1, 4$\rightarrow{}$1, 5$\rightarrow{}$1) & L=0 Exp \#3 & L=0 Exp \#3 \\
  1708. \hline
  1709. L=0 Rejected, Bad ID & L=1 PTR(2$\rightarrow{}$2, 4$\rightarrow{}$1, 5$\rightarrow{}$1) & L=0 & L=0 \\
  1710. \hline
  1711. L=0 Reset & L=1 PTR(2$\rightarrow{}$2, 4$\rightarrow{}$1, 5$\rightarrow{}$1) & L=0 & L=0 \\
  1712. \hline
  1713. L=0 Index changed to \#6 & L=1 PTR(6$\rightarrow{}$2, 4$\rightarrow{}$1, 5$\rightarrow{}$1) & L=0 & L=0 \\
  1714. \hline
  1715. L=0 & L=1 & L=0 & L=0 \\
  1716. \hline
  1717. \end{tabular}
\caption{PTR workflow}
\label{tab:ptr_workflow}
  1720. \end{table}
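The election rule can be checked against the network availability table above with a short sketch. Several points are assumptions made for the example rather than statements of the protocol: links are taken to be symmetric, a server ranks its targets within the list of servers it can see with itself included, a server does not vote for itself, and the function names are hypothetical.

```python
def ptr_values(reachable):
    """Compute the partition resolution values sent by every server.

    `reachable` maps each server ID to the set of peers it can contact.
    Returns votes[target][voter] = rank of `target` among the servers the
    voter can see (itself included), ordered by Index ID; rank 1 is lowest.
    """
    votes = {server: {} for server in reachable}
    for voter, peers in reachable.items():
        ordered = sorted(peers | {voter})
        for rank, target in enumerate(ordered, start=1):
            if target != voter:
                votes[target][voter] = rank
    return votes


def elected(votes, cluster_size):
    """Two-stage rule: strict majority of value 1, else of values 1 and 2."""
    for accepted in ({1}, {1, 2}):
        for target in sorted(votes):
            count = sum(1 for value in votes[target].values() if value in accepted)
            if 2 * count > cluster_size:
                return target
    return None


# Links read from the availability table: 2-3, 3-4, 3-5 and 4-5.
reachable = {2: {3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
votes = ptr_values(reachable)
leader = elected(votes, cluster_size=len(reachable))
```

Under these assumptions server \#3 receives the values $2\rightarrow{}2$, $4\rightarrow{}1$ and $5\rightarrow{}1$, has no majority with value 1 alone, and is elected once values 1 and 2 are counted together, which agrees with the workflow table.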
  1721. \section{Communication between coordination servers}
Communication between coordination servers can be done in either anonymous or identified mode. In anonymous mode, the Index ID sent in the requests is $0$. $0$ is otherwise an illegal Index ID.
Anonymous connections are used for the following operations:
  1724. \begin{itemize}
  1725. \item Connection as a new member
  1726. \item ID Reset announcement after falling behind in Index ID over
  1727. \item Querying the currently known leadership
  1728. \end{itemize}
The Index ID works as a header. Its format is a one-byte size field, followed by a multi-precision big-endian unsigned integer of up to 256 bytes.
Identified connections are used for the following operations:
  1731. \begin{itemize}
  1732. \item Querying a heartbeat from the leader
  1733. \item Querying the full cluster information
  1734. \item Sending lock information for safekeeping
\item Sending back-end cluster change updates
  1736. \end{itemize}
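The Index ID header can be sketched as below. The text does not say how a one-byte size field describes a 256-byte integer, nor how many bytes the anonymous value $0$ occupies; this sketch assumes the size byte holds the length directly (so at most 255 bytes) and encodes $0$ on a single byte. The function names are hypothetical.

```python
def encode_index_id(index_id):
    """One size byte followed by the big-endian unsigned integer."""
    if index_id < 0:
        raise ValueError("Index ID must be unsigned")
    length = max(1, (index_id.bit_length() + 7) // 8)
    if length > 255:
        raise ValueError("does not fit the one-byte size field of this sketch")
    return bytes([length]) + index_id.to_bytes(length, "big")


def decode_index_id(buffer):
    """Return (index_id, remaining bytes after the header)."""
    if not buffer:
        raise ValueError("empty buffer")
    length = buffer[0]
    body = buffer[1:1 + length]
    if len(body) != length:
        raise ValueError("truncated Index ID header")
    return int.from_bytes(body, "big"), buffer[1 + length:]


def is_anonymous(index_id):
    """Index ID 0 marks an anonymous request."""
    return index_id == 0
```

A receiver would decode the header first, then decide from \texttt{is\string_anonymous} whether the remaining bytes may carry one of the identified operations.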
  1737. \section{$A/B/AB$ fragments}
Let $X$ be a number that can be written using $262144$ binary digits, represented as $X={X_{high}X_{low}}_{2^{17}}$.
The $A$ part of $X$, written $A(X)$, is equal to $X_{high}$. The $B$ part of $X$, written $B(X)$, is equal to $X_{low}$.
The $AB$ part is the base $2^6$ digit-wise modular addition of $A$ and $B$, written as $(A+B)(X)$.
  1741. \section{Ownership of read-only blocks}
Ownership of blocks is stored on a global level, following the same idea as payments.
An ownership is the combination of a Block UUID, a User ID, a read flag and a write flag. Creation ownership is enforced above all as a read-write relationship. This means that the list of ownerships is not tested when the user is the owner of the block.
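A minimal sketch of the resulting access check follows. The record layout mirrors the four fields named above; the \texttt{creators} map, the function name \texttt{is\string_allowed} and the use of strings for identifiers are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ownership:
    block_uuid: str
    user_id: str
    read: bool
    write: bool


def is_allowed(block_uuid, user_id, want_write, creators, ownerships):
    """Check access to a block.

    `creators` maps a block UUID to the user that created it; `ownerships`
    is an iterable of Ownership records.
    """
    if creators.get(block_uuid) == user_id:
        return True   # creation ownership is read-write, the list is not consulted
    for entry in ownerships:
        if entry.block_uuid == block_uuid and entry.user_id == user_id:
            if entry.write if want_write else entry.read:
                return True
    return False
```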
  1744. \chapter{Time synchronization}\label{time_sync_chapter}
  1745. \section{Steadiness requirement}
The time is directed by the coordination server. Each server must synchronize to that time.
The time is never guaranteed to be the real time, but it is guaranteed that the value returned by any time query is strictly greater than the highest time recorded by the server. When the system is running in normal or degraded state, the difference between two timestamps is expressed in nanoseconds; otherwise its unit is undefined.
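The steadiness guarantee can be illustrated with a small wrapper that never returns a value lower than or equal to one it has already returned. This is a single-threaded sketch; the class name \texttt{SteadyClock} and the use of \texttt{time.time\string_ns} as the underlying source are assumptions.

```python
import time


class SteadyClock:
    """Return strictly increasing timestamps, whatever the underlying source does."""

    def __init__(self, source=time.time_ns):
        self._source = source
        self._last = 0   # highest timestamp handed out so far

    def now(self):
        candidate = self._source()
        if candidate <= self._last:
            # The source stalled or went backwards: step just past the last value.
            candidate = self._last + 1
        self._last = candidate
        return candidate
```

When the source stalls, the returned values advance by one unit per query, which keeps the ordering guarantee at the cost of drifting away from the real time.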
  1748. \section{Storage side requirement}
The storage side MUST store its natural clock offset and the actual server offset as obtained through the time synchronization protocol.
Along with the server time, a value named $t_{sync}$ MUST be provided, representing the time required for the server to communicate with all storages, summed with the reply time of the slowest storage to copy back.
$t_{sync}$ MAY be calculated pessimistically to provide atomicity for longer operations.
The storage side MUST refuse any transaction that would write a node earlier in time than a node already in the same record chain.
The storage side MAY free any non-protected record that is labeled after $t_{last}$.
  1754. \begin{center}
  1755. \label{time_sync_equation}
  1756. \begin{minipage}{0.6\textwidth}
  1757. $t_{last} = t_{global} + t_{sync}$
  1758. \\\\
  1759. $t_{global} = t_{client} + t_{offset}$
  1760. \\\\
with $t_{offset} = \overline{t_{client}-t_{server}+\left(\frac{t_{return} - t_{client}}{2}\right)}$
  1762. \\\\
$\overline{n}$ being the average recorded value of $n$, and
$t_{client}$, $t_{server}$, $t_{return}$ being as defined on page \pageref{fig:time_proto}.
  1765. \end{minipage}
  1766. \end{center}
  1767. Storage servers MUST be able to provide their last registered timestamp.
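The equations above can be transcribed literally as follows. The sketch takes the three timestamps of each exchange as given, since their definition belongs to the time synchronization protocol figure; the function names and the representation of samples as triples are assumptions.

```python
from statistics import mean


def t_offset(samples):
    """Average of t_client - t_server + (t_return - t_client) / 2 over all samples.

    `samples` is an iterable of (t_client, t_server, t_return) triples.
    """
    return mean(tc - ts + (tr - tc) / 2 for tc, ts, tr in samples)


def t_global(t_client, offset):
    """t_global = t_client + t_offset"""
    return t_client + offset


def t_last(t_global_value, t_sync):
    """t_last = t_global + t_sync"""
    return t_global_value + t_sync
```

Averaging over several exchanges smooths out the variation of the round trip time between individual samples.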
  1768. \chapter{Client side}
It is advised that all of the primitives used for encryption and random number generation be copies of the primitives from the OpenBSD project wherever possible, and from other public domain projects otherwise. They are to be contained in a separate library dynamically linked with the client. They are to operate only on memory allocated through a randomized-layout memory allocator and blurred in randomized data.
  1770. \section{Key blocks and system root layout}
A key block is a tuple of a Block ID, a cipher identifier, and a root position in that block. Here, the cipher identifier is a list of tuples, each containing the identifier of a cipher followed by a key of the appropriate size.
The ciphers in a key block are listed in the order in which they are used to encrypt a data point.
  1773. The system root layout is composed of a sequence of authentication headers, each of them containing a token that is required to prove ownership of a key to the authentication system.
  1774. It is also composed of a separate sequence containing encrypted key blocks. Both sequences have to be refreshed together at each authentication.
Storage of an additional initialization vector is possible. The two sequences are separate entities but are considered to follow each other: the authentication sequence first, then the key sequence. No padding should be inserted between them.
  1776. \section{Authentication}
Along with the key blocks and authentication headers, the system also contains a space to store a nonce. Any multi-precision integer of up to 256 bytes can be stored in that space.
Transmission of that number is done as a hexadecimal number in a null-terminated string.
  1779. Whatever the authentication mode, the workflow is the following:
  1780. \begin{enumerate}
  1781. \item A websocket over SSL connection is established
  1782. \item The user provides its User ID, blocking authentications for that user for 30 seconds
  1783. \item The server provides back the user's authentication header and the nonce as a JSON document
  1784. \begin{itemize}
  1785. \item[\texttt{"auth\string_head"}:] the authentication headers as a byte array
\item[\texttt{"nonce"}:] the nonce as a hexadecimal string
  1787. \end{itemize}
\item The user deciphers the authentication headers, then prepends a new authentication header
\item The user then sends the deciphered token, as well as the new authentication header and the new nonce
\item If the server reads the same token as the deciphered token, then the authentication headers, the token and the nonce are replaced. The server then transmits an authentication token
  1791. \end{enumerate}
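The nonce transmission and the JSON document of step 3 can be sketched as follows. Only the two field names come from the workflow above; representing the byte array as a JSON array of integers, using lowercase hexadecimal digits, and the function names are assumptions.

```python
import json

NONCE_MAX_BYTES = 256


def nonce_to_wire(nonce):
    """Hexadecimal digits followed by a terminating NUL byte."""
    if not 0 <= nonce < 1 << (8 * NONCE_MAX_BYTES):
        raise ValueError("nonce must be an unsigned integer of up to 256 bytes")
    return format(nonce, "x").encode("ascii") + b"\x00"


def nonce_from_wire(wire):
    """Parse the hexadecimal digits found before the first NUL byte."""
    text, _, _ = wire.partition(b"\x00")
    return int(text.decode("ascii"), 16)


def challenge_document(auth_head, nonce):
    """Build the JSON document the server sends back in step 3."""
    return json.dumps({"auth_head": list(auth_head), "nonce": format(nonce, "x")})
```

The deciphering of the authentication headers themselves depends on the authentication mode and is left out of this sketch.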
  1792. \begin{figure}[H]
  1793. \centering
  1794. \begin{bytefield}[bitwidth=0.13em]{192}
  1795. \bitheader{0,7,31,63,71,95} \\
  1796. \bitbox[lrtb]{8}{\texttt{s}}
  1797. \bitbox[tr]{184}{}\\
  1798. \wordbox[lr]{1}{\texttt{Forward randomized padding of length s bytes}}\\
  1799. \skippedwords \\\wordbox[lbr]{1}{}\\
  1800. \bitbox[rl]{64}{\texttt{Root count}}
  1801. \bitbox[rl]{32}{\texttt{token sz}}
  1802. \bitbox[rl]{96}{\texttt{Authentication token}}\\
  1803. \begin{rightwordgroup}{this element is repeated\\for every root}
  1804. \wordbox[ltr]{1}{\texttt{record identifier}}\\
  1805. \bitbox[lrt]{8}{\texttt{n}}
  1806. \bitbox[rt]{184}{\texttt{root name}}\\
  1807. \bitbox[lrt]{8}{\texttt{m}}
  1808. \bitbox[rt]{184}{\texttt{root metadata}}\\
  1809. \bitbox[lrt]{8}{\texttt{c}}
  1810. \bitbox[rt]{64}{\texttt{cipher id}}
  1811. \bitbox[rt]{120}{\texttt{KEY}}\\
  1812. \bitbox[lrt]{64}{\texttt{cipher id}}
  1813. \bitbox[rt]{128}{\texttt{KEY}}\\
  1814. \bitbox[lrt]{64}{\texttt{cipher id}}
  1815. \bitbox[rt]{128}{\texttt{KEY}}
  1816. \end{rightwordgroup}\\
  1817. \wordbox[ltr]{1}{\vspace{0.96em}\texttt{\ldots}} \\
  1818. \skippedwords \\\wordbox[lrb]{1}{}
  1819. \end{bytefield}
  1820. \caption{Authentication header layout}
  1821. \label{fig:authheader}
  1822. \end{figure}
In \autoref{fig:authheader}, each \texttt{cipher id} corresponds to an identifier specifically attributed to one supported cipher. The identifiers are specified as big-endian 64-bit integers. Each is associated with the related key size. Identifiers whose most significant byte is 0 are explicitly unused and are free for any user-defined primitives. The other variables respectively refer to:
  1824. \begin{itemize}
  1825. \item[\texttt{n}:] length of the \texttt{root name}
  1826. \item[\texttt{m}:] length of the \texttt{root metadata}
  1827. \item[\texttt{c}:] number of ciphers in the cryptographic pipeline
  1828. \end{itemize}
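Reading the cipher part of one root entry can be sketched as below: a count \texttt{c}, then \texttt{c} pairs of a big-endian 64-bit identifier and a key whose size depends on that identifier. The identifiers and key sizes in \texttt{KEY\string_SIZES} are invented for the example and do not correspond to any attributed cipher; the function names are hypothetical.

```python
import struct

# Hypothetical registry: cipher identifier -> key size in bytes.
KEY_SIZES = {
    0x0100000000000001: 16,
    0x0000000000000001: 4,   # most significant byte 0: user-defined range
}


def is_user_defined(cipher_id):
    """Identifiers whose most significant byte is 0 are free for user primitives."""
    return cipher_id >> 56 == 0


def parse_pipeline(buffer):
    """Return ([(cipher_id, key), ...], number of bytes consumed)."""
    count = buffer[0]
    offset = 1
    entries = []
    for _ in range(count):
        (cipher_id,) = struct.unpack_from(">Q", buffer, offset)
        offset += 8
        key_size = KEY_SIZES[cipher_id]
        key = buffer[offset:offset + key_size]
        if len(key) != key_size:
            raise ValueError("truncated key")
        offset += key_size
        entries.append((cipher_id, key))
    return entries, offset
```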
  1829. \subsection{Password mode}
In password mode, the password gets blurred into an expanded key. This expanded key, along with the nonce, encrypts the authentication header.
This expansion is to use PBKDF2/HMAC-SHA1 with the user name as salt.
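The expansion can be sketched with the PBKDF2 implementation of the Python standard library. The iteration count and the output length are not specified in this document and are placeholder assumptions here, as is the function name; how the nonce is combined with the expanded key is not covered.

```python
import hashlib


def expand_password(password, user_name, iterations=100_000, length=32):
    """Expand a password into key material with PBKDF2/HMAC-SHA1.

    The user name is used as the salt; `iterations` and `length` are
    placeholder values chosen for this sketch.
    """
    return hashlib.pbkdf2_hmac(
        "sha1",
        password.encode("utf-8"),
        user_name.encode("utf-8"),
        iterations,
        dklen=length,
    )
```

Since the salt is the user name, two users sharing a password obtain different expanded keys, while the same user always obtains the same one.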
  1832. \subsection{One-Time Pad key mode}
In one-time pad mode, the nonce corresponds to the offset from the start of the pad.
The pad information is stored in a complete directory. The directory is expected to have the following structure:
  1835. \begin{figure}[H]
  1836. \centering
  1837. \begin{minipage}{0.72\textwidth}
  1838. \begin{itemize}
  1839. \item[\texttt{/pad}:] A file containing raw high entropy data to use as a one-time pad
\item[\texttt{/userinfo.json}:] configuration information for the authentication, including the user name and any metadata (name and contents of certain blocks for example).
  1841. \end{itemize}
  1842. \end{minipage}
  1843. \caption{Directory structure of an authentication device}
  1844. \label{fig:pad_auth_structure}
  1845. \end{figure}
  1846. \section{Key generation}
  1847. \subsection{Random data sources}
We envision providing cheap entropy sources that can be used with commodity laptops and with cheap hardware:
  1849. \begin{itemize}
  1850. \item microphones
  1851. \item webcams
  1852. \item mouse movements
  1853. \item radio noise
  1854. \end{itemize}
For all of those signals, only the least significant bits will be used as entropy sources, meaning that, whatever happens, only small amounts of entropy will be gathered. Multiple sources could even be used in conjunction.
The entropy gathered from those sources will be expanded into the one-time pad used for authentication.
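Gathering the least significant bits of a stream of samples can be sketched as follows. The functions are hypothetical, the samples are assumed to be integers (for example raw audio or pixel values), and combining two sources with XOR is one possible reading of using sources in conjunction, not a choice made by this document.

```python
def low_bits(samples, bits_per_sample=1):
    """Pack the least significant bits of integer samples into bytes."""
    mask = (1 << bits_per_sample) - 1
    accumulator = 0
    filled = 0
    out = bytearray()
    for sample in samples:
        accumulator = (accumulator << bits_per_sample) | (sample & mask)
        filled += bits_per_sample
        while filled >= 8:
            filled -= 8
            out.append((accumulator >> filled) & 0xFF)
            accumulator &= (1 << filled) - 1
    return bytes(out)   # leftover bits that do not fill a byte are dropped


def combine(first, second):
    """XOR two byte strings, truncated to the shorter one."""
    return bytes(a ^ b for a, b in zip(first, second))
```

With one bit kept per sample, eight samples are needed for every byte of gathered data, which illustrates why only small amounts of entropy are collected.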
  1857. \subsection{Random data expansion}
  1858. TODO: examine the usability of a sliding key RC6 implementation in templated C++ with arbitrary $w$, $r$ and $k$ values.
  1859. \section{Library}
  1860. \subsection{Authentication API}
  1861. \subsection{Storage substructures API}
  1862. \subsubsection{\texttt{rstring}}
Recursive strings (or \texttt{rstring}) are persistent data structures. They represent an expandable span of characters through a recursive tree of pages. A recursive string can be block specific or block agnostic. Block-specific recursive strings are not advised for any string that may grow longer than 4GB, which corresponds to a recursion level of 2 in block-specific mode; strings of up to 147GB correspond to a recursion level of 3 in block-agnostic mode.
\begin{table}[H]
\centering
\begin{tabular}{|r|r|r|r|r|}
\hline
Recursion level & Near & Far & Page count (Near) & Page count (Far)\\
\hline
0 & 32576B & 32576B & 1 & 1\\
\hline
1 & 15MiB & 5MiB & 510 & 170\\
\hline
2 & 7.8GiB & 0.8GiB & 259'590 & 28'731\\
\hline
3 & 3.9TiB & 149GiB & 132'131'819 & 4'855'540\\
\hline
4 & 1.94PiB & 24TiB & 67'255'096'380 & 820'586'261\\
\hline
5 & 988PiB & 3.9PiB & 34'232'844'057'929 & 138'679'078'110\\
\hline
\end{tabular}
\caption{\texttt{rstring} maximum capacity by recursion level}
\label{tab:gp_rstring_sizes}
\end{table}
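The capacity columns of the table above appear to be the page count multiplied by the 32576 bytes of data that a level 0 page holds. The short sketch below reproduces that arithmetic; the function names are hypothetical, and since the table values are rounded, small differences in the last digit are to be expected.

```python
PAGE_PAYLOAD = 32576   # bytes of data in a level 0 page, from the table


def capacity_bytes(page_count):
    """Capacity reached with `page_count` data pages."""
    return page_count * PAGE_PAYLOAD


def human(size):
    """Format a byte count with binary prefixes."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB"):
        if size < 1024 or unit == "PiB":
            return f"{size:.2f}{unit}"
        size /= 1024


# Level 1 and level 2 in near pointer mode, using the page counts of the table.
level_1_near = human(capacity_bytes(510))
level_2_near = human(capacity_bytes(259_590))
```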
  1886. Conversion from block specific string to a block agnostic string is possible, leaving the originally written data in its former block.
\begin{figure}[H]
\centering
\begin{tikzpicture}
\begin{class}[text width=0.90\textwidth]{rstring}{0,0}
\attribute{+ $<<$sz$>>$ : size\string_t}
\attribute{+ $<<$mode = no\_{}assert$>>$ : mode\string_t}
\attribute{+ fptr : gp::far\string_ptr}
\operation{+ rstring(root : gp::far\string_ptr)}
\operation{+ rstring(root : gp::far\string_ptr, nonce\string_gen : function)}
\operation{+ read(position : size\string_t, length : size\string_t)}
\operation{+ write(position : size\string_t, data : std::string\string_view)}
\operation{+ append(data : std::string\string_view)}
\operation{+ \string~rstring()}
\end{class}
\end{tikzpicture}
\caption{\texttt{gplib} class diagram for \texttt{rstring} in deprived construction}
\label{fig:gp_class_diagram_rstring_noassert}
\end{figure}
  1905. \begin{figure}[H]
  1906. \centering
  1907. \begin{tikzpicture}
  1908. \begin{class}[text width=0.90\textwidth]{rstring}{0,0}
  1909. \attribute{+ $<<$mode = can\_{}assert$>>$ : mode\string_t}
  1910. \attribute{+ fptr : gp::far\string_ptr}
  1911. \attribute{+ lvl\string_value : uint64\string_t}
  1912. \attribute{+ lvl\string_cache : gp::rstring$<$can\_{}cache$>$}
  1913. \operation{+ rstring(root : gp::far\string_ptr)}
  1914. \operation{+ rstring(root : gp::far\string_ptr, nonce\string_gen : function)}
  1915. \operation{+ read(position : size\string_t, length : size\string_t)}
  1916. \operation{+ write(position : size\string_t, data : std::string\string_view)}
  1917. \operation{+ append(data : std::string\string_view)}
  1918. \operation{+ \string~rstring()}
  1919. \end{class}
  1920. \end{tikzpicture}
  1921. \caption{\texttt{gplib} class diagram for \texttt{rstring} in asserting construction}
  1922. \label{fig:gp_class_diagram_rstring_assert}
  1923. \end{figure}
  1924. \begin{figure}[H]
  1925. \centering
  1926. \begin{tikzpicture}
  1927. \begin{class}[text width=0.90\textwidth]{rstring}{0,0}
  1928. \attribute{+ $<<$mode = exclusive$>>$ : mode\string_t}
  1929. \attribute{+ fptr : gp::far\string_ptr}
  1930. \attribute{+ lvl\string_value : uint64\string_t}
  1931. \attribute{+ lvl\string_cache : gp::rstring$<$exclusive$>$}
  1932. \operation{+ rstring(root : gp::far\string_ptr)}
  1933. \operation{+ rstring(root : gp::far\string_ptr, nonce\string_gen : function)}
  1934. \operation{+ read(position : size\string_t, length : size\string_t)}
  1935. \operation{+ write(position : size\string_t, data : std::string\string_view)}
  1936. \operation{+ append(data : std::string\string_view)}
  1937. \operation{+ operator[](position : size\_{}t) : gp::pseudo\string_{}ptr}
  1938. \operation{+ \string~rstring()}
  1939. \end{class}
  1940. \end{tikzpicture}
  1941. \caption{\texttt{gplib} class diagram for \texttt{rstring} in exclusive construction}
  1942. \label{fig:gp_class_diagram_rstring_exclusive}
  1943. \end{figure}
  1944. \begin{figure}[H]
  1945. \centering
  1946. \begin{bytefield}[bitwidth=0.13em]{192}
  1947. \bitheader{0,31,63,95,127,159,191} \\
  1948. \bitbox[lrt]{32}{\texttt{level}}
  1949. \bitbox[lrt]{32}{\texttt{size}}
  1950. \bitbox[lrt]{8}{\texttt{1}}
  1951. \bitbox[lrt]{120}{\texttt{Unused}}\\
  1952. \bitbox[lrt]{128}{\texttt{ri\string_uuid}}
  1953. \bitbox[lrt]{32}{\texttt{ri\string_x}}
  1954. \bitbox[lrt]{32}{\texttt{ri\string_y}} \\
  1955. \wordbox{1}{\texttt{request\string_identifier}} \\
  1956. \bitbox[lrt]{192}{} \\
  1957. \wordbox[lr]{1}{\vspace{0.96em}\texttt{\ldots}} \\
  1958. \skippedwords \\\wordbox[lrb]{1}{}
  1959. \end{bytefield}
  1960. \caption{Non-zero level far pointer \texttt{rstring}}
  1961. \label{fig:nzfprstring}
  1962. \end{figure}
  1963. \begin{figure}[H]
  1964. \centering
  1965. \begin{bytefield}[bitwidth=0.13em]{192}
  1966. \bitheader{0,31,63,95,127,159,191} \\
  1967. \bitbox[lrt]{32}{\texttt{level}}
  1968. \bitbox[lrt]{32}{\texttt{size}}
  1969. \bitbox[lrt]{8}{\texttt{0}}
  1970. \bitbox[lrt]{120}{\texttt{Unused}}\\
  1971. \bitbox[lrt]{128}{\texttt{ri\string_uuid}}
  1972. \bitbox[lrt]{32}{\texttt{ri\string_x}}
  1973. \bitbox[lrt]{32}{\texttt{ri\string_y}} \\
  1974. \bitbox[lrt]{32}{\texttt{ri\string_x}}
  1975. \bitbox[lrt]{32}{\texttt{ri\string_y}}
  1976. \bitbox[lrt]{64}{\texttt{ri\string_coords}}
  1977. \bitbox[lrt]{64}{\texttt{ri\string_coords}} \\
  1978. \bitbox[lrt]{192}{} \\
  1979. \wordbox[lr]{1}{\vspace{0.96em}\texttt{\ldots}} \\
  1980. \skippedwords \\\wordbox[lrb]{1}{}
  1981. \end{bytefield}
  1982. \caption{Non-zero level near pointer \texttt{rstring}}
  1983. \label{fig:nznprstring}
  1984. \end{figure}
  1985. \begin{figure}[H]
  1986. \centering
  1987. \begin{bytefield}[bitwidth=0.13em]{192}
  1988. \bitheader{0,31,63,95,127,159,191} \\
  1989. \bitbox[lrtb]{32}{\texttt{0}}
  1990. \bitbox[lrtb]{32}{\texttt{size}}
  1991. \bitbox[lrt]{128}{\texttt{}}\\
  1992. \bitbox[lr]{192}{} \\
  1993. \wordbox[lr]{1}{\vspace{0.96em}\texttt{data}} \\
  1994. \skippedwords \\\wordbox[lrb]{1}{}
  1995. \end{bytefield}
  1996. \caption{Zero level \texttt{rstring}}
  1997. \label{fig:zrstring}
  1998. \end{figure}
  1999. \subsubsection{\texttt{rstring\string_table}}
  2000. \begin{figure}[H]
  2001. \centering
  2002. \begin{bytefield}[bitwidth=0.60em]{64}
  2003. \bitheader{0-3,31,63} \\
  2004. \bitbox[lrt]{1}{\texttt{0}}
  2005. \bitbox[lrt]{1}{\texttt{0}}
  2006. \bitbox[lrt]{1}{\texttt{F}}
  2007. \bitbox[lrt]{1}{\texttt{G}}
  2008. \bitbox[lrt]{60}{\texttt{}}\\
  2009. \bitbox[lrtb]{64}{\texttt{near\string_ptr<T>}}
  2010. \end{bytefield}
  2011. \vspace{0.5em}
  2012. \begin{bytefield}[bitwidth=0.60em]{64}
  2013. \bitheader{0-3,31,63} \\
  2014. \bitbox[lrtb]{1}{\texttt{1}}
  2015. \bitbox[lrtb]{1}{\texttt{0}}
  2016. \bitbox[lrt]{1}{\texttt{F}}
  2017. \bitbox[lrt]{1}{\texttt{G}}
  2018. \bitbox[lrt]{60}{\texttt{}}\\
  2019. \wordbox[lrtb]{3}{\texttt{far\string_ptr<T>}}
  2020. \end{bytefield}
  2021. \begin{itemize}
\item[\texttt{F}:] if set, \texttt{T} is a \texttt{hashseqfar}, else it is a \texttt{hashseqnear}
  2023. \item[\texttt{G}:] if set, the table uses \texttt{pointfar} as a first underlying structure, else it uses a \texttt{pointnear}
  2024. \end{itemize}
  2025. \caption{\texttt{rstring} table root}
  2026. \label{fig:rstringtable}
  2027. \end{figure}
  2028. \begin{figure}[H]
  2029. \centering
  2030. \begin{bytefield}[bitwidth=0.26em]{144}
\bitheader{0,31,63,95,127,143} \\
  2032. \bitbox[lrtb]{64}{\texttt{hash}}
  2033. \bitbox[lrtb]{64}{\texttt{near\string_ptr}}
  2034. \bitbox[lrtb]{16}{\texttt{ll}}\\
  2035. \bitbox[lr]{144}{\texttt{\ldots{}}}
  2036. \wordbox[lr]{1}{\vspace{0.96em}} \\
  2037. \skippedwords \\\wordbox[lrb]{1}{}
  2038. \end{bytefield}
  2039. \caption{\texttt{hashseqnear} substructure}
  2040. \label{fig:hashseqnear}
  2041. \end{figure}
  2042. \begin{figure}[H]
  2043. \centering
  2044. \begin{bytefield}[bitwidth=0.13em]{272}
  2045. \bitheader{0,31,63,95,127,159,191,223,255,271} \\
  2046. \bitbox[lrtb]{64}{\texttt{hash}}
  2047. \bitbox[lrtb]{192}{\texttt{far\string_ptr}}
  2048. \bitbox[lrtb]{16}{\texttt{ll}}\\
  2049. \bitbox[lr]{272}{\texttt{\ldots{}}}
  2050. \wordbox[lr]{1}{\vspace{0.96em}} \\
  2051. \skippedwords \\\wordbox[lrb]{1}{}
  2052. \end{bytefield}
  2053. \caption{\texttt{hashseqfar} substructure}
  2054. \label{fig:hashseqfar}
  2055. \end{figure}
  2056. \begin{figure}[H]
  2057. \centering
  2058. \begin{bytefield}[bitwidth=0.44em]{80}
  2059. \bitheader{0,31,63,79} \\
  2060. \bitbox[lrt]{64}{\texttt{near\string_ptr}}
  2061. \bitbox[lrt]{16}{\texttt{sz}}\\
  2062. \bitbox[tlr]{80}{\texttt{key data\ldots{}}}\\
  2063. \skippedwords \\
  2064. \bitbox[lrb]{16}{}
  2065. \bitbox[tlrb]{64}{\texttt{U}}
  2066. \end{bytefield}
  2067. \caption{\texttt{pointnear} substructure}
  2068. \label{fig:pointnear}
  2069. \end{figure}
  2070. \begin{figure}[H]
  2071. \centering
  2072. \begin{bytefield}[bitwidth=0.44em]{80}
  2073. \bitheader{0,31,63,79} \\
  2074. \wordbox[lrt]{2}{\texttt{far\string_ptr}}\\
  2075. \bitbox[lrb]{32}{}
  2076. \bitbox[lrbt]{16}{\texttt{sz}}
  2077. \bitbox[lrt]{32}{\texttt{}}\\
  2078. \bitbox[lr]{80}{\texttt{key data\ldots{}}}\\
  2079. \skippedwords \\
  2080. \bitbox[lrb]{48}{}
  2081. \bitbox[tlr]{32}{}\\
  2082. \wordbox[lrb]{2}{\texttt{U}}
  2083. \end{bytefield}
  2084. \caption{\texttt{pointfar} substructure}
  2085. \label{fig:pointfar}
  2086. \end{figure}
\texttt{U} is either a pointer to the sequel of the key information in a \texttt{rstring}, or a continuation of the data if \texttt{sz} is above the addressable size.
  2088. \subsection{Cryptographic API}
  2089. \subsubsection{Cryptographic pipelines}
Cryptographic pipelines are a means to stack encryptions and keys. Their implementation is a sequence of virtual page encrypter objects, each holding its specific key. Cryptographic pipelines must offer a way to clean their data up before deletion.
Cryptographic pipelines are expected to expose a function \texttt{decrypt} and a function \texttt{encrypt}, both taking a reference to a page, a record identifier and a \texttt{Maybe(UInt64)} monad or similar type. Neither the \texttt{encrypt} nor the \texttt{decrypt} function shall be able to fail with an exception. They should always leave the page in a valid state such that:
\begin{itemize}
  2092. \item[] $encrypt(decrypt(P_a,r_i,Maybe(n)),r_i,Maybe(n))=P_a$
  2093. \item[] $decrypt(encrypt(P_a,r_i,Maybe(n)),r_i,Maybe(n))=P_a$
  2094. \end{itemize}
  2095. $P_a$ being an arbitrary page, $r_i$ any record identifier, and $Maybe(n)$ a monadic value representing an integer $n$ or its absence.
  2096. Cryptographic pipelines are to always be declared in decryption order.
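The round trip property and the decryption-order declaration can be illustrated with two toy stages. They are deliberately trivial and offer no security; the class names, the \texttt{wipe} method used for cleanup, and the way the record identifier and optional integer are mixed in are all assumptions made for the example.

```python
class XorStage:
    """Toy stage: XOR with a repeating key and a tweak. Illustration only."""

    def __init__(self, key):
        if not key:
            raise ValueError("key must not be empty")
        self._key = bytearray(key)

    def encrypt(self, page, record_id, nonce):
        tweak = (record_id + (nonce or 0)) & 0xFF   # absent nonce counts as 0
        for i in range(len(page)):
            page[i] ^= self._key[i % len(self._key)] ^ tweak

    decrypt = encrypt   # XOR is its own inverse

    def wipe(self):
        for i in range(len(self._key)):
            self._key[i] = 0


class AddStage(XorStage):
    """Toy stage: byte-wise addition of the key modulo 256."""

    def encrypt(self, page, record_id, nonce):
        for i in range(len(page)):
            page[i] = (page[i] + self._key[i % len(self._key)]) & 0xFF

    def decrypt(self, page, record_id, nonce):
        for i in range(len(page)):
            page[i] = (page[i] - self._key[i % len(self._key)]) & 0xFF


class Pipeline:
    """Stages are declared in decryption order; encryption walks them backwards."""

    def __init__(self, stages_in_decryption_order):
        self._stages = list(stages_in_decryption_order)

    def decrypt(self, page, record_id, nonce=None):
        for stage in self._stages:
            stage.decrypt(page, record_id, nonce)

    def encrypt(self, page, record_id, nonce=None):
        for stage in reversed(self._stages):
            stage.encrypt(page, record_id, nonce)

    def wipe(self):
        for stage in self._stages:
            stage.wipe()
```

The two stages do not commute, so the round trip only holds because encryption applies them in the reverse of the declared order.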
  2097. \subsection{Distributed hash table}
The hash table used is a \texttt{byte-ll} hash table whose pointers are either 64-bit near pointers or 192-bit far pointers.
The record part of the table is a fully allocated \texttt{rstring\string_table} root containing a pointer to a \texttt{hashseqnear}/\texttt{hashseqfar}, as appropriate per their definitions. Each key/value pair is a separate \texttt{pointnear}/\texttt{pointfar} structure pointing to a \texttt{rstring} of the data.
  2100. \subsection{Distributed block}
Distributed blocks can be allowed to be cached on read. If they are, they are locked and the locks are refreshed on a regular basis.
Distributed blocks are a special implementation of a \texttt{rstring}; they are to be used directly in the locked mode as a block storage system.
\chapter{Block storage system}
The block storage system is an explicit use of the distributed block system, with the associated locking. It can be resized to larger sizes. It is allocated using the smallest cached block mode required.
Blocks are composed of a recursive construct of far pointers to full database pages.
\section{Header Block definition}
\chapter{Native file system}
\section{Header Block definition}
\section{Properties}
\subsection{Limits}
\subsection{Permissions}
\subsubsection{NT ACL}
\subsubsection{Unix users}
\subsubsection{ACL2Unix}
\section{Caution advised}
\subsection{Client omnipotence}
\subsection{Vulnerabilities by design}
\appendix
\part{Annexes}
%\setcounter{chapter}{1}
%\renewcommand\thechapter{\Alph{chapter}}
\chapter{Encryption popularized}
\label{annex:encryption_popularized}
\textbf{Encryption: }The act of transforming a message into a random-looking cipher-text. The original message is often named the plain-text.
\hspace{-1.4em}\textbf{Cipher: }The mathematical function that transforms a plain-text into a cipher-text.
\vspace{1.5em}\hspace{-1.4em}Encrypting data can be done in various ways. Each way has its own properties and its own resistance to certain types of attacks. All modern cryptography is key-based cryptography. This means that the way we encrypt data is not secret; what is secret is a value, named the key, that is used to encrypt the data.
\section{Properties of encryption}
This section explains some of the properties a cipher can have.
\subsection{Resistance}
The resistance of a cipher is generally expressed as a power of two (e.g.: $2^{103}$) or as a number of bits of entropy (e.g.: $103$~bits). Note that this scale is not linear: it is exponential.
This means that a cipher with $104$~bits of entropy is twice as hard to break as one of the same family with a resistance of $103$~bits. Comparing resistance between different families is not relevant.
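As a worked illustration of this exponential scale (the $113$-bit value is only an example chosen here), the ratio between two resistances within the same family is two raised to their difference in bits:
\[
\frac{2^{104}}{2^{103}} = 2^{1} = 2, \qquad \frac{2^{113}}{2^{103}} = 2^{10} = 1024.
\]
Each additional bit doubles the work required, so ten additional bits multiply it by roughly a thousand.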
\subsection{Compactness}
Compactness of a cipher means that if you encrypt a message of size $n$ you will obtain a cipher-text of the same size. Conversely, if a cipher can generate a cipher-text longer than its message, it is said to be non-compact.
For example, let's consider a simple cipher: for a message $A$, read it as a number and multiply it by a value that will be the key.
If your message is 8 digits long, like $00005555$, and the key is $12345678$, the cipher-text will be equal to $5555 \times 12345678 = 68580241290$, which makes an 11-digit cipher-text from an 8-digit message, and hence makes the transformation non-compact.
An example of a compact transformation is truncated addition, also named modulo addition, used by systems like one-time pads.
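For comparison, here is the same message and key under modulo addition, truncating to 8 digits (the modulus $10^{8}$ and the second message are assumptions made for this illustration):
\[
(00005555 + 12345678) \bmod 10^{8} = 12351233, \qquad (99990000 + 12345678) \bmod 10^{8} = 12335678.
\]
In both cases the cipher-text has exactly 8 digits, even when the plain sum overflows, and subtracting the key modulo $10^{8}$ recovers the message: $(12351233 - 12345678) \bmod 10^{8} = 00005555$.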
\subsection{Forward secrecy}
Forward secrecy is the property of an encryption system to keep protecting some messages when others have been compromised. The typical use-case is, for example, to prevent decryption of a message if the message before it was compromised.
Forward secrecy is very important for messaging systems, as it suits the fact that the key may be changed quite often, or that the key alone is not sufficient to break the encryption.
\subsection{Durability of secrecy}
The durability of an encryption system depends on two main factors: the resistance of the encryption scheme, all flaws taken into consideration, and the evolution of computers and their accessibility in the future.
You can think of it as encryption having an expiration date. Past this date, someone who started decrypting your data immediately after you encrypted it may have deciphered it.
\subsection{Encryption flaws/Cryptanalysis}
Some encryption systems have known flaws. Flaws in cryptography can either render the encryption obsolete or lower the work required to find a key. They always affect the durability of the secrecy, to different degrees.
For example, some flaws of the AES algorithm are reported to make it several thousand times faster to break than by brute force. A complete brute-force attack on the AES-256 variant must try up to $2^{256} \approx 1.16\times{}10^{77}$ keys; even with 50 unrealistic supercomputers, each able to test a billion billion ($10^{18}$) keys per second, this takes about $7\times{}10^{49}$ years at most. It is thus clear that breaking it, even with the advantage given by such flaws, is not realistic in a human lifetime.
Some other flaws completely compromise cryptographic systems, or are predicted to compromise them in theory in the years to come. An example is asymmetrical cryptography based on prime number factorization, which is currently threatened by quantum computing.
The search for flaws in cryptographic systems is named cryptanalysis.
\subsection{Side channel attacks}
Side-channel attacks are attacks on a cryptographic system that target not the way the system is designed but the way it is implemented. For example, the sound made by electric current in a CPU has been used against some implementations of the OpenSSL library to deduce part of the key that was being used.
It is very hard to predict side-channel attacks, and just as hard to prevent them.
A typical mitigation is, for example, to ensure that all cryptographic operations take a constant amount of time. This prevents a common attack called a timing side-channel attack.
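As a small illustration of this mitigation, the sketch below contrasts a comparison that stops at the first differing byte with one designed to take the same time wherever the inputs differ. The function names \texttt{leaky\_equal} and \texttt{constant\_time\_equal} are hypothetical; \texttt{hmac.compare\_digest} is the Python standard library function intended for this purpose. It only covers the comparison of secrets (such as authentication tags), not the cipher operations themselves.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on the first mismatch."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # returns sooner the earlier the mismatch occurs
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison designed not to reveal where the inputs differ."""
    return hmac.compare_digest(a, b)


expected_tag = b"\x13\x37\xc0\xde"
same = constant_time_equal(expected_tag, b"\x13\x37\xc0\xde")
different = constant_time_equal(expected_tag, b"\x13\x37\xc0\xdf")
```

Both functions return the same answers; only the way their running time depends on the inputs differs, which is what an attacker measuring response times would exploit.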
\subsection{Homomorphism}
Homomorphism means that for a message $A$ and an operation $f$ (for example, $f : x \rightarrow x \times 2$, the operation of multiplying by 2), if you apply a cipher to $A$ and get a cipher-text $B$, there exists a way to apply $f$ to $B$ such that deciphering the result gives you the result of applying $f$ to $A$.
Expressed more simply, it means that you can execute operations on encrypted data without needing to decipher it or understand it. Very few encryption mechanisms are fully homomorphic and those are mostly in research\autocite{Gentry:2009:FHE:1834954}.
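The doubling example can be tried with unpadded textbook RSA, which is homomorphic for multiplication only (so partially, not fully, homomorphic). This is a toy sketch: the parameters are far too small to be secure, the function names are hypothetical, and real RSA deployments use padding that deliberately removes this property.

```python
# Toy textbook RSA parameters (illustration only, trivially breakable).
P, Q = 61, 53
N = P * Q   # 3233, public modulus
E = 17      # public exponent
D = 2753    # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def encrypt(m: int) -> int:
    return pow(m, E, N)


def decrypt(c: int) -> int:
    return pow(c, D, N)


def double_encrypted(c: int) -> int:
    """Apply f: x -> x * 2 to a cipher-text without deciphering it."""
    return (c * encrypt(2)) % N


a = 21
b = encrypt(a)                        # cipher-text B of message A
result = decrypt(double_encrypted(b))
assert result == 2 * a                # holds as long as 2 * a < N
```

The party computing \texttt{double\_encrypted} only needs the public values \texttt{E} and \texttt{N}; it never sees the message or the private exponent.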
\section{Types of encryption}
Encryption comes in different forms depending on how it handles the cryptographic key. Some ciphers have only one key, which must be known both for encrypting and for deciphering the data; we call those symmetrical ciphers. Others have two keys, one for encrypting and one for deciphering; we call those asymmetrical ciphers.
\subsection{Symmetrical encryption}
Symmetrical encryption aims to encrypt data on a two-way channel. The key allows you to both write and read encrypted data, making it very suitable for securing a network channel once the keys have been safely exchanged, or for encrypting a disk.
\begin{figure}[h]
\begin{center}
\begin{itemize}
\item Rijndael
\item ChaCha20
\item Blowfish
\item Serpent
\item Twofish
\item CAST5
\item RC4 and RC6
\item DES
\item 3DES
\item Skipjack
\item IDEA
\end{itemize}
\end{center}
\caption{List of symmetrical ciphers (non-exhaustive)}
\label{fig:sym_ciphers}
\end{figure}
Symmetrical encryption also has the advantage that in some cases it is possible to interweave portions of the encrypted data together, making the data harder to decipher if a part of it is missing. This is especially used when encrypting data that is read sequentially, like network data, but this feature is not suitable for encrypting data on a disk or on any random access medium.
\subsection{Asymmetrical encryption}
The goal of asymmetrical encryption is to provide ways to authenticate messages, ways to encrypt a message with one key and decipher it with a different key, and more generally ways to exchange secrets.
\begin{figure}[h]
\begin{center}
\begin{itemize}
\item Prime factorization based (RSA)
\item Elliptic curve based (ECDSA)
\item Paillier crypto system
\item Lattice based (NTRU, BLISS\autocite{Gentry:2009:FHE:1834954})
\end{itemize}
\end{center}
\caption{List of asymmetrical ciphers (non-exhaustive)}
\label{fig:asym_ciphers}
\end{figure}
This field is evolving a lot nowadays, as the most widely used algorithms are not very resistant to being broken by quantum computers. In particular, systems based on the prime factor decomposition problem are hard to break with classical computers but, in theory, not as hard to break with quantum computers.
This kind of cryptographic system is generally used to exchange keys for symmetrical encryption.
\subsection{One-Time Pads}
One-time pads are a cryptographic technique that supposes both sides of a communication own a shared pad at least the size of the data to encrypt. Elements of the pad are added to the plain-text with a modulo addition to generate the cipher-text.
There is no known way to decipher the data without the pad, making the One-Time Pad (or OTP, not to be confused with One-Time Passwords) the safest cryptographic scheme, albeit an unrealistic one for large amounts of data.
For communications, it also relies on a separate channel to transmit the pad, which is not needed in the case of storage.
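A minimal sketch of the scheme on bytes, using addition modulo 256, is given below. The function names are hypothetical, and the sketch assumes the pad is truly random, kept secret and never reused, which is what the security of a one-time pad depends on.

```python
import secrets


def otp_encrypt(plain: bytes, pad: bytes) -> bytes:
    """Add each pad byte to the matching plain-text byte, modulo 256."""
    if len(pad) < len(plain):
        raise ValueError("the pad must be at least as long as the data")
    return bytes((p + k) % 256 for p, k in zip(plain, pad))


def otp_decrypt(cipher: bytes, pad: bytes) -> bytes:
    """Subtract each pad byte from the matching cipher-text byte, modulo 256."""
    if len(pad) < len(cipher):
        raise ValueError("the pad must be at least as long as the data")
    return bytes((c - k) % 256 for c, k in zip(cipher, pad))


message = b"attack at dawn"
pad = secrets.token_bytes(len(message))  # shared in advance by both sides
cipher = otp_encrypt(message, pad)
recovered = otp_decrypt(cipher, pad)
```

The cipher-text has the same length as the message, which also makes this a compact transformation, and the pad is as large as the data, which is why the scheme is impractical for large volumes.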
\backmatter
%----------------------------------------------------------------------------------------
% BIBLIOGRAPHY
%----------------------------------------------------------------------------------------
\chapter*{Bibliography}
\addcontentsline{toc}{chapter}{\textcolor{ocre}{Bibliography}} % Add a Bibliography heading to the table of contents
%------------------------------------------------
%\section*{Books}
%\addcontentsline{toc}{section}{Books}
\printbibliography[heading=bibempty]
\printglossary
%------------------------------------------------
\listoftables
%------------------------------------------------
\listoffigures
%------------------------------------------------
%\section*{Books}
%\addcontentsline{toc}{section}{Books}
%\printbibliography[heading=bibempty,type=book]
%----------------------------------------------------------------------------------------
% INDEX
%----------------------------------------------------------------------------------------
\cleardoublepage % Make sure the index starts on an odd (right side) page
\phantomsection
\setlength{\columnsep}{0.75cm} % Space between the 2 columns of the index
\addcontentsline{toc}{chapter}{\textcolor{ocre}{Index}} % Add an Index heading to the table of contents
\printindex % Output the index
\end{document}