What is compiler construction in computer science
A compiler is a structured program that translates source code into machine code; it is also known as a language translator. Each programming language has its own compiler or interpreter. Designing an interpreter is generally less involved than designing a compiler. Here in this article, we will discuss what compiler construction is in computer science.
There are numerous compilers available in the market. Compilers for high-level programming languages such as Pascal, FORTRAN, COBOL, C, C++, C#, and Java are widespread among software professionals.
What is compiler construction in computer science?
Principle of the compiler:
The source program is provided to the compiler. Usually, the compiler converts that source code directly into machine code, but some compilers translate the source code into assembly language first, and an assembler then converts it into machine code.
Importance of compiler construction:
The compiler is used, first of all, to convert human-comprehensible code in a programming language into a machine-comprehensible language, that is, binary (0, 1), so that the computer can read the data, carry out the operations, and generate an output.
Secondly, compilers give us the theoretical and practical knowledge required to implement a programming language. Once we have learned to build a compiler, we pretty much know the insides of many programming languages.
Example:
Let’s suppose there is a FORTRAN compiler for personal computers and another for Apple Macintosh computers. Furthermore, the compiler industry is quite competitive, so there are in fact many compilers for each language on each type of computer.
Phases of compilers
Every compiler carries out a series of well-defined phases. When one phase ends, the next phase starts. During compiler design, each phase is designed very carefully. The compiler is a system in which the source program passes through these diverse phases and is eventually converted into machine code.
These phases are as follows:
- Lexical Analyzer
- Syntax Analyzer
- Semantic Analyzer
- Intermediate code generator
- Code Generator
- Symbol table manager
- Error Handler
Lexical Analysis
Scanning, or lexical analysis, is the process of reading the source program and producing lexical tokens. Lexical analysis is handled by a lexer, also called a lexical analyzer.
Example:
Amount := salary + rent;
Tokens are as follows:
(1) Amount: identifier
(2) :=: assignment sign
(3) salary: identifier
(4) +: operator
(5) rent: identifier
A token is a lowest-level sequence of characters in the source; token classes include identifiers, numerical constants, literal strings, operator symbols, punctuation symbols, and keywords for control constructs such as assignment, conditionals, and loops.
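The scanning step described above can be sketched in a few lines of Python. This is a minimal, illustrative lexer; the token names (`IDENT`, `ASSIGN`, `OP`, `SEMI`) and patterns are assumptions chosen to match the "Amount := salary + rent;" example, not part of any real compiler.

```python
import re

# Illustrative token classes and their patterns (assumed for this example).
TOKEN_SPEC = [
    ("ASSIGN", r":="),                     # assignment sign
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"), # identifiers
    ("OP",     r"[+\-*/]"),                # binary operators
    ("SEMI",   r";"),                      # statement terminator
    ("SKIP",   r"\s+"),                    # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Scan the source string left to right, returning (kind, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind != "SKIP":  # whitespace is not passed on to the parser
            tokens.append((kind, match.group()))
    return tokens

print(tokenize("Amount := salary + rent;"))
```

Running this on the example statement yields the same five meaningful tokens listed above, with the whitespace discarded.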
Syntax Analysis
Firstly, the assembly of tokens into grammatical phrases is called syntax analysis, or parsing. Secondly, a context-free grammar is usually used to describe the structure of the language. Thirdly, BNF (Backus–Naur Form) notation, which is widespread in computer science, is used to write down and define the grammar of a programming language.
Finally, a syntax analyzer takes tokens as input and outputs error messages if the program's syntax is incorrect. There are many parsing algorithms. The most popular approaches are top-down parsing and bottom-up parsing.
Parse tree:
- Firstly, the syntactic structure of a string according to some formal grammar is known as a parse tree.
- Secondly, a program that builds parse trees is known as a parser.
- Thirdly, parsers build parse trees for both natural languages and computer programming languages.
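As a concrete illustration, here is a minimal parser sketch that builds a parse tree (as nested tuples) for the article's assignment example. The two-rule grammar, the token format, and the tree shape are all assumptions made for this sketch.

```python
# Assumed grammar for this sketch:
#   stmt -> IDENT ":=" expr ";"
#   expr -> IDENT ("+"|"-"|"*"|"/" IDENT)*
def parse_stmt(tokens):
    """Turn a list of (kind, lexeme) tokens into a nested-tuple parse tree."""
    pos = 0
    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    target = expect("IDENT")
    expect("ASSIGN")
    expr = ("ident", expect("IDENT"))
    while pos < len(tokens) and tokens[pos][0] == "OP":
        op = expect("OP")
        expr = ("binop", op, expr, ("ident", expect("IDENT")))
    expect("SEMI")
    return ("assign", target, expr)

tokens = [("IDENT", "amount"), ("ASSIGN", ":="), ("IDENT", "salary"),
          ("OP", "+"), ("IDENT", "rent"), ("SEMI", ";")]
print(parse_stmt(tokens))
```

If the token stream violates the grammar (say, a missing semicolon), `expect` raises a `SyntaxError`, which is exactly the "error notes on incorrect syntax" role described above.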
Semantic Analysis
Semantic analysis is the process of detecting semantic errors in the source program. At times, grammatically correct statements are not semantically correct. So, the compiler is equipped with semantic error-checking facilities.
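A simple semantic check is type checking: a statement like `amount := salary + name` can be syntactically valid yet semantically wrong if `name` is a string. The sketch below walks the parse tree and checks types against a symbol table; the tree shape and type names are assumptions for this example.

```python
def check_expr(node, symtab):
    """Return the type of an expression tree node, raising on semantic errors."""
    kind = node[0]
    if kind == "ident":
        name = node[1]
        if name not in symtab:
            raise NameError(f"undeclared identifier: {name}")  # semantic error
        return symtab[name]
    if kind == "binop":
        left = check_expr(node[2], symtab)
        right = check_expr(node[3], symtab)
        if left != right:
            raise TypeError(f"type mismatch: {left} {node[1]} {right}")
        return left
    raise ValueError(f"unknown node kind: {kind}")

# Assumed symbol table: identifier -> type
symtab = {"salary": "int", "rent": "int", "name": "string"}
print(check_expr(("binop", "+", ("ident", "salary"), ("ident", "rent")), symtab))
```

Adding `salary` and `rent` type-checks as `int`; adding `salary` and `name` would raise a `TypeError` even though the parser accepted the statement.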
Intermediate code generator
Firstly, the parse tree is produced by the syntax analysis phase. Secondly, the intermediate code generator transforms the parse tree into an intermediate language that represents the source program. Three-address code is a popular type of intermediate language.
Example:
amount := salary op rent
Here amount, salary, and rent are operands, and op is a binary operator.
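The translation from parse tree to three-address code can be sketched as follows. Each inner node of the expression gets a compiler-generated temporary (`t1`, `t2`, …); the tree shape and instruction format are assumptions carried over from the earlier examples.

```python
def to_tac(tree):
    """Translate an ("assign", target, expr) tree into three-address strings."""
    code = []
    temp = 0
    def emit(node):
        nonlocal temp
        if node[0] == "ident":
            return node[1]                 # leaf: just the variable name
        left = emit(node[2])               # compute operands first...
        right = emit(node[3])
        temp += 1
        name = f"t{temp}"                  # ...then store into a fresh temporary
        code.append(f"{name} := {left} {node[1]} {right}")
        return name
    code.append(f"{tree[1]} := {emit(tree[2])}")
    return code

tree = ("assign", "amount", ("binop", "+", ("ident", "salary"), ("ident", "rent")))
print(to_tac(tree))  # ['t1 := salary + rent', 'amount := t1']
```

Every instruction has at most one operator on the right-hand side, which is what makes three-address code convenient for the optimizer and code generator that follow.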
Code Optimizer
It is used to improve the output produced by the intermediate code generator. It refines the intermediate code so that the resulting machine code runs faster. The two most commonly used kinds of optimization are local optimization and loop optimization.
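One classic local optimization is constant folding: an instruction like `t1 := 4 * 2` is computed at compile time and rewritten as `t1 := 8`. The sketch below applies this to a list of three-address strings; the instruction format is an assumption from the earlier examples.

```python
import operator

# Supported constant operators for this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Fold constant binary operations in a list of three-address strings."""
    out = []
    for line in code:
        target, expr = line.split(" := ")
        parts = expr.split()
        if (len(parts) == 3 and parts[1] in OPS
                and parts[0].isdigit() and parts[2].isdigit()):
            expr = str(OPS[parts[1]](int(parts[0]), int(parts[2])))  # fold now
        out.append(f"{target} := {expr}")
    return out

print(fold_constants(["t1 := 4 * 2", "amount := salary + t1"]))
```

The first instruction folds to `t1 := 8`; the second is left alone because its operands are not constants. Loop optimization, by contrast, moves or eliminates work across whole loops rather than single instructions.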
Code Generator
It is the last phase of compilation, in which relocatable machine code or assembly code is produced. The statement “amount := salary + rent;” can be converted into assembly language.
The assembly language version of the statement:
LOAD salary
ADD rent
STORE amount
The assembler translates assembly code into machine code, since the CPU understands only machine code, not any high-level programming language.
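The code-generation step that produced the LOAD/ADD/STORE sequence above can be sketched directly. The accumulator-style mnemonics come from the article's example; the single-operator expression shape is an assumption of this sketch.

```python
def gen_asm(target, expr):
    """Emit accumulator-machine code for a (left, op, right) expression."""
    left, op, right = expr
    mnemonic = {"+": "ADD", "-": "SUB"}[op]   # assumed instruction set
    return [
        f"LOAD {left}",       # load first operand into the accumulator
        f"{mnemonic} {right}",  # combine with second operand
        f"STORE {target}",    # write the accumulator back to memory
    ]

for line in gen_asm("amount", ("salary", "+", "rent")):
    print(line)
```

This reproduces exactly the three-instruction sequence shown above for the statement “amount := salary + rent;”.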
Symbol-table Management
The table of identifiers and their attributes is called a symbol table.
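In practice a symbol table is often just a map from each identifier to its attributes, shared by all the phases above. The attribute fields here (`type`, `scope`) are illustrative assumptions.

```python
class SymbolTable:
    """Minimal symbol-table sketch: identifier -> attribute dictionary."""
    def __init__(self):
        self.entries = {}

    def insert(self, name, **attrs):
        if name in self.entries:
            raise KeyError(f"redeclaration of {name}")  # a semantic error
        self.entries[name] = attrs

    def lookup(self, name):
        # Returns the attributes, or None if the identifier is undeclared.
        return self.entries.get(name)

table = SymbolTable()
table.insert("salary", type="int", scope="global")
table.insert("rent", type="int", scope="global")
print(table.lookup("salary"))  # {'type': 'int', 'scope': 'global'}
```

The lexer inserts names as it scans, and the semantic analyzer and code generator look them up to find each identifier's type and storage.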
Error handling
Every phase of compilation can produce errors. These errors are collected and reported during compilation by the error-handling phase.
Top-down parser and its importance in the compiler:
Top-down Parsing
When the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol into the input, it is known as top-down parsing.
Types of top-down Parsing:
There are mainly two types of top-down parsing: recursive descent parsing, which uses backtracking, and predictive parsing, which does not.
Recursive descent parsing:
It is a common form of top-down parsing. It is called recursive because it uses recursive procedures to process the input. Recursive descent parsing may involve backtracking.
Backtracking:
This means that if one derivation of a production fails, the syntax analyzer restarts the process using a different rule of the same production. This technique may process the input string more than once to determine the right production.
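Backtracking can be illustrated with a tiny two-alternative grammar. The grammar `S -> "a" "b" | "a" "c"` is an assumption made up for this sketch: if the first alternative fails partway through, the parser rewinds to its saved position and tries the next one.

```python
def parse_S(tokens, pos):
    """Try each alternative of S -> "a" "b" | "a" "c" in order.

    Returns the position after a successful match, or None if all fail.
    """
    for alternative in (["a", "b"], ["a", "c"]):
        p = pos                      # save the backtrack point
        ok = True
        for expected in alternative:
            if p < len(tokens) and tokens[p] == expected:
                p += 1
            else:
                ok = False           # this alternative fails...
                break                # ...so rewind (p resets on next iteration)
        if ok:
            return p
    return None

print(parse_S(["a", "c"], 0))  # first alternative fails on "b"; second succeeds -> 2
```

Note that the token `"a"` is examined twice on the input `["a", "c"]`, which is exactly the repeated processing described above, and why naive backtracking can be exponential on ambiguous grammars.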
Importance of top-down parsing in compiler construction:
- Firstly, in compiler construction, top-down parsing is a parsing strategy in which one first looks at the highest level of the parse tree and works down the tree by applying the rewriting rules of a formal grammar. LL parsers are a type of parser that uses a top-down parsing strategy.
- Secondly, top-down parsing is essentially a strategy of analyzing unknown data relationships by hypothesizing general parse tree structures and then checking whether the known fundamental structures are compatible with the hypothesis. It occurs in the analysis of both natural languages and computer languages.
- Thirdly, top-down parsing can be viewed as an attempt to find left-most derivations of an input stream by searching for parse trees using a top-down expansion of the given formal grammar rules.
- Finally, simple implementations of top-down parsing do not terminate on left-recursive grammars, and top-down parsing with backtracking may have exponential time complexity with respect to the length of the input for ambiguous CFGs. On the other hand, more sophisticated top-down parsers created by Frost, Hafiz, and Callaghan do accommodate ambiguity and left recursion in polynomial time, and they produce polynomial-sized representations of the potentially exponential number of parse trees.
Hope you found this overview of what compiler construction is in computer science helpful.