Saturday, 6 December 2014

exam 70-461 summary

Chapter 1
Dated : 18-Oct-2014
T-SQL is based on a strong mathematical foundation.
Set theory + Predicate logic => Relational Model => SQL => T-SQL.
The SQL standard is maintained by the International Organization for Standardization (ISO) and the American National Standards Institute (ANSI).
Standard SQL is based on the relational model, which is a mathematical model for data management and manipulation
A relation in the relational model is what SQL calls a table. A relation has a heading and a body: the heading is a set of attributes, and the body is a set of tuples. The mathematical branches that the relational model is based on are:

·          Set theory
·          Predicate logic


“By a ‘set’ we mean any collection M into a whole of definite, distinct objects m (which are called the ‘elements’ of M) of our perception or of our thought.” (Georg Cantor's definition of a set)

The other branch of mathematics that the relational model is based on is called predicate logic.
 A predicate is an expression that when attributed to some object, makes a proposition either true or false. For example, “salary greater than $50,000” is a predicate

The relational model uses predicates as one of its core elements. You can enforce data
integrity by using predicates. You can filter data by using predicates


Name two aspects in which T-SQL deviates from the relational model.
A relation has a body with a distinct set of tuples, whereas a T-SQL table doesn't have to have a key and therefore can contain duplicate rows.
T-SQL allows referring to ordinal positions of columns in the ORDER BY clause.

Why are the terms “field” and “record” incorrect when referring to column and
row?
Because “field” and “record” describe physical things, whereas columns and
rows are logical elements of a table.
Why is the term “NULL value” incorrect?
Because NULL isn’t a value; rather, it’s a mark for a missing value


Make the Nonrelational Query Relational
·         Define an attribute name by assigning an alias to the expression YEAR(orderdate).
·         Add a DISTINCT clause to remove duplicates.
·         Also, remove the ORDER BY clause to return a relational result.
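Putting the three fixes together, a sketch against the sample Sales.Orders table used elsewhere in these notes:

```sql
-- Relational result: named attribute, no duplicates, no ORDER BY
SELECT DISTINCT YEAR(orderdate) AS orderyear
FROM Sales.Orders;
```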

Lesson-2 - Logical query processing

T-SQL has both logical and physical sides to it.
The logical side is the conceptual interpretation of the query that explains what the correct result of the query is.
The physical side is the processing of the query by the database engine.
Physical processing must produce the result defined by logical query processing.
Logical query processing - the conceptual interpretation of the query that defines the correct result.
The logical processing of a query must first know which table is being queried
before it can know which attributes can be returned from that table.
Therefore, contrary to the keyed-in order of the previous query, logical query processing has to be as follows.
FROM Sales.Shippers
SELECT shipperid, phone, companyname

Logical query processing follows this sequence:

1.     Evaluate the FROM Clause
2.     Evaluate the Where Clause
3.     Evaluate the Group by Clause
4.     Evaluate the Having Clause
5.     Evaluate the Select Clause
6.     Evaluate the Order by Clause


What is the difference between the WHERE and HAVING clauses?
The WHERE clause is evaluated before rows are grouped, and therefore is evaluated
per row. The HAVING clause is evaluated after rows are grouped, and therefore
is evaluated per group.
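A minimal sketch against Sales.Orders illustrating the two clauses (the date and the threshold are arbitrary):

```sql
SELECT empid, COUNT(*) AS numorders
FROM Sales.Orders
WHERE orderdate >= '20070101'  -- per-row filter, applied before grouping
GROUP BY empid
HAVING COUNT(*) > 100;         -- per-group filter, applied after grouping
```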

Dated : 19-Oct-2014
Chapter -2
Delimiting Identifiers
The rules say that the first character must be a letter in the range A through Z (lowercase or uppercase), underscore (_), at sign (@), or number sign (#).
Subsequent characters can include letters, decimal numbers, the at sign, dollar sign ($), number sign, or underscore. The identifier cannot be a reserved keyword in T-SQL, cannot have embedded spaces, and must not include supplementary characters.
An identifier that doesn't comply with these rules must be delimited; for example, 2006 must be delimited as "2006" or [2006].
Choosing a Data Type for Keys

Understanding the built-in tools T-SQL provides for generating surrogate keys like the
sequence object, identity column property, and the NEWID and NEWSEQUENTIALID
functions, and their impact on performance, is an important skill for the exam.
20-Oct-14
FLOAT is an approximate-number data type, which means that not all values in the data type range can be represented exactly.
DECIMAL/NUMERIC is a fixed-precision data type, which means that all values in the data type range can be represented exactly with the specified precision and scale.

Converting from Decimal or Numeric to float can cause some loss of precision

Use char when the sizes of the column data entries are consistent.
Use varchar when the sizes of the column data entries vary considerably.
Use varchar(max) when the sizes of the column data entries vary considerably, and the size might exceed 8,000 bytes.


What is the difference between the below conversion functions in SQL Server 2012?
1).Cast
2).Convert
3).Parse
4).Try_Cast
5).Try_Convert
6).Try_Parse.


TRY_PARSE ( string_value AS data_type [ USING culture ] )
This function works similarly to the well-known CAST, but returns NULL if the string_value does not represent a valid formatted value of the given data type.
PARSE ( string_value AS data_type [ USING culture ] )
This works pretty much the same as TRY_PARSE, but this function throws an error if the string_value passed to it is not a valid representation of the data type. The same parsings as above:
SELECT PARSE('13-12-2011' AS DATETIME USING 'da-DK') AS Result; -- succeeds
SELECT PARSE('13-12-2011' AS DATETIME USING 'en-US') AS Result; -- fails: there is no month 13 in en-US
TRY_CONVERT ( data_type [ ( length ) ], expression [, style ] )
This is pretty much the same as the well known CONVERT, but again with the NULL output for invalid input strings.
But what if there were millions of rows in the table, and multiple non-integer values? It would not be pleasant to try to identify them all. With TRY_PARSE, we can now identify the rows like this:

SELECT * FROM MyBadTableDesign
WHERE TRY_PARSE(val AS INT) IS NULL;

The following code fails with a conversion error:
SELECT CAST('abc' AS INT);
Conversely, the following code returns a NULL:
SELECT TRY_CAST('abc' AS INT);
24-Oct-14
String Length
LEN and DATALENGTH
LEN >> returns the number of characters (ignoring trailing spaces)
DATALENGTH >> returns the number of bytes
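The difference shows clearly with Unicode strings, where each character takes two bytes:

```sql
SELECT
  LEN(N'abc ')        AS num_chars,  -- 3: trailing space not counted
  DATALENGTH(N'abc ') AS num_bytes;  -- 8: 4 characters x 2 bytes each
```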

Would you use the type FLOAT to represent a product unit price?
No, because FLOAT is an approximate data type and cannot represent all values
precisely.

What is the difference between NEWID and NEWSEQUENTIALID?
The NEWID function generates GUID values in random order, whereas the
NEWSEQUENTIALID function generates GUIDs that increase in sequential
order.
When concatenating character strings, what is the difference between the plus
(+) operator and the CONCAT function?
The + operator by default yields a NULL result on NULL input, whereas the
CONCAT function treats NULLs as empty strings.
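A quick sketch of the difference:

```sql
SELECT
  'Smith' + ', ' + NULL       AS plus_result,   -- NULL
  CONCAT('Smith', ', ', NULL) AS concat_result; -- 'Smith, '
```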



differences between COALESCE and ISNULL
COALESCE accepts a list of inputs and returns the first one that is not NULL.
ISNULL is more limited than COALESCE: it accepts only two inputs, and the data type of its result is determined by the first input, whereas COALESCE determines the result type from all inputs by data type precedence.
DECLARE
@x AS VARCHAR(3) = NULL,
@y AS VARCHAR(10) = 'unknows';
SELECT COALESCE(@x, @y) AS [COALESCE], ISNULL(@x, @y) AS [ISNULL];

OUTPUT will be

COALESCE        ISNULL
----------        ---------
unknows           unk

NULLIF function. This function accepts two input expressions,
returns NULL if they are equal, and returns the first input if they are not. For example,
consider the expression NULLIF(col1, col2). If col1 is equal to col2, the function returns a
NULL; otherwise, it returns the col1 value.
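For example:

```sql
SELECT
  NULLIF(10, 10) AS equal_inputs,     -- NULL
  NULLIF(10, 20) AS different_inputs; -- 10
```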

IIF and CHOOSE are nonstandard T-SQL functions that were added to simplify
migrations from Microsoft Access.

IIF(<predicate>, <true_result>, <false_or_unknown_result>)
This expression is equivalent to the following.
CASE WHEN <predicate> THEN <true_result> ELSE <false_or_unknown_result> END
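A minimal sketch of both functions (CHOOSE returns the element at the given 1-based index):

```sql
SELECT
  IIF(2 > 1, 'yes', 'no')        AS iif_result,    -- 'yes'
  CHOOSE(2, 'Jan', 'Feb', 'Mar') AS choose_result; -- 'Feb'
```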

DATETIME data type uses 8 bytes of storage. SQL Server 2012 supports the DATE
data type, which uses 3 bytes of storage

new functions
EOMONTH , FORMAT , DATEFROMPARTS
SWITCHOFFSET
and TODATETIMEOFFSET
25-Oct-14
TRY_CAST returns a NULL instead of failing

The right form for a Unicode character string literal is to prefix the literal with a capital N and delimit the
literal with single quotation marks; for example, N'literal'

WHERE lastname = N'Davis'; best practice to use the proper form

pattern matching
'_D%': string where second character is D,
'[AC]%': string where first character is A or C,
'[0-9]%': string where first character is a digit
'[^0-9]%': string where first character is not a digit
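For example, against the sample HR.Employees table:

```sql
-- Last names whose second character is D
SELECT lastname
FROM HR.Employees
WHERE lastname LIKE N'_D%';
```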

The term three-valued logic refers to the three possible truth values of a predicate in T-SQL: true, false, and unknown.

Date and time literals can be language-dependent; for the DATETIME data type, prefer a language-neutral format such as 'YYYYMMDD'.

SQL predicates are used as search arguments, for example:
companyname LIKE 'A%'
orderdate >= '20120212' AND orderdate < '20120213'
A predicate is an expression that evaluates to TRUE, FALSE, or UNKNOWN. Predicates are used in the search condition of WHERE clauses and HAVING clauses, the join conditions of FROM clauses, and other constructs where a Boolean value is required.


To guarantee the order of the rows in the result of a query, use the ORDER BY clause.

You want result rows to be sorted by orderdate descending, and then by orderid,
descending.
ORDER BY orderdate DESC, orderid DESC.


WITH TIES returns, in addition to the TOP rows, all other rows that have the same value in the ORDER BY column as the last qualifying row.

SELECT TOP (1) WITH TIES * FROM #MyTable ORDER BY age DESC;

If several rows share the top age (say 20), all of them are returned.

Filtering Data with OFFSET-FETCH
query defines ordering based on order date descending, followed by order ID
descending; it then skips 50 rows and fetches the next 25 rows.
SELECT orderid, orderdate, custid, empid
FROM Sales.Orders
ORDER BY orderdate DESC, orderid DESC
OFFSET 50 ROWS FETCH NEXT 25 ROWS ONLY

26-oct-2014
What’s the difference between the ON and the WHERE clauses?
In an inner join, ON and WHERE play the same filtering role. In an outer join they differ: ON determines which rows from the nonpreserved side are matched, and rows from the preserved side without a match are still returned with NULLs in the nonpreserved side’s attributes; WHERE filters the final result after the outer rows have been added.

If the OUTER keyword is omitted from LEFT OUTER JOIN, RIGHT OUTER
JOIN, and FULL OUTER JOIN, the meaning is retained.
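The ON versus WHERE distinction can be sketched with a left outer join against the sample tables:

```sql
-- Customers that have no orders at all
SELECT C.custid, O.orderid
FROM Sales.Customers AS C
  LEFT OUTER JOIN Sales.Orders AS O
    ON C.custid = O.custid   -- matching: customers without orders are preserved
WHERE O.orderid IS NULL;     -- filtering: keeps only the unmatched outer rows
```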
27-OCT-2014

With subqueries, you can nest queries within queries. You can use self-contained
subqueries as well as correlated ones. You can use subqueries that return single-valued
results, multi-valued results, and table-valued results.

T-SQL supports four kinds of table expressions, which are named query expressions.
Derived tables and CTEs are types of table expressions that are visible only in the scope
of the statement that defined them. Views and inline table-valued functions are reusable
table expressions whose definitions are stored as objects in the database. Views
do not support input parameters, whereas inline table-valued functions do

What are the benefits of using a CTE over derived tables?

·         If you want to refer to one derived table from another, you need to nest
them. With CTEs, you separate those by commas, so the code is more modular and
easier to follow.

·         Because the CTE name is defined before the outer query that uses it, the
outer query is allowed to refer to multiple instances of the same CTE name.
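Both benefits show up in a sketch like the following, which refers to the same CTE twice to compare each year with the previous one (the names are illustrative):

```sql
WITH YearlyCounts AS
(
  SELECT YEAR(orderdate) AS orderyear, COUNT(*) AS numorders
  FROM Sales.Orders
  GROUP BY YEAR(orderdate)
)
SELECT Cur.orderyear, Cur.numorders, Prv.numorders AS prevyearnumorders
FROM YearlyCounts AS Cur
  LEFT OUTER JOIN YearlyCounts AS Prv
    ON Cur.orderyear = Prv.orderyear + 1;
```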

The APPLY operator has two
versions; the CROSS APPLY version doesn’t return left rows that get an empty set back
from the right side. The OUTER APPLY operator preserves the left side, and therefore,
does return left rows when the right side returns an empty set.


What is the difference between the result of T1 CROSS APPLY T2 and T1 CROSS JOIN
T2 (the right table expression isn’t correlated to the left)?

·         Both return the same result when there’s no correlation because CROSS
APPLY applies all rows from T2 to each row from T1
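A correlated example, where APPLY does something a join cannot express as directly: returning the three most recent orders per customer.

```sql
SELECT C.custid, A.orderid, A.orderdate
FROM Sales.Customers AS C
  CROSS APPLY (SELECT TOP (3) orderid, orderdate
               FROM Sales.Orders AS O
               WHERE O.custid = C.custid        -- correlation to the left row
               ORDER BY orderdate DESC, orderid DESC) AS A;
```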



1-Nov-2014

What are the clauses that you can use to define multiple grouping sets in the
same query?
GROUPING SETS, CUBE, and ROLLUP

Pivoting Data
Pivoting is a technique that groups and aggregates data, transitioning it from a state of rows
to a state of columns.

The PIVOT operator determines the grouping element by elimination—
it’s what’s left besides the spreading and aggregation elements.

You cannot specify a computation as input to the aggregate function, only a name of a column from the input table.
You cannot specify a computation as the spreading element, only a name of a column from the input table.
You cannot specify a subquery in the IN clause, only a static list.
You cannot specify multiple aggregate functions, only one.
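These restrictions explain why computations are usually prepared in a derived table first, as in this sketch against the sample Sales.OrderValues view:

```sql
SELECT custid, [2006], [2007], [2008]
FROM (SELECT custid, YEAR(orderdate) AS orderyear, val
      FROM Sales.OrderValues) AS D        -- computation done here, not in PIVOT
  PIVOT(SUM(val) FOR orderyear IN ([2006], [2007], [2008])) AS P;
```

Here the grouping element is custid, determined by elimination.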


WINDOW FUNCTIONS

Like grouped functions, window functions enable you to perform data analysis computations.
With window functions, you define the set of rows per function, and the function returns one result value for each underlying row. You define the set of rows for the function to work with using a clause called OVER.

window aggregate functions(SUM,AVG,COUNT) are applied to a window of rows defined by the OVER clause.
For example, the expression SUM(val) OVER() represents the grand total of all rows in the underlying query.
For example, the expression SUM(val) OVER(PARTITION BY custid) represents the current customer’s total.

SELECT custid, orderid,
val,
SUM(val) OVER(PARTITION BY custid) AS custtotal,
SUM(val) OVER() AS grandtotal
FROM Sales.OrderValues


Indicate the delimiters as one of three options:
UNBOUNDED PRECEDING or FOLLOWING, meaning the beginning or end of the partition,
respectively
CURRENT ROW, obviously representing the current row
<n> ROWS PRECEDING or FOLLOWING, meaning n rows before or after the current,
respectively


WITH RunningTotals AS
(
SELECT custid, orderid, orderdate, val,
SUM(val) OVER(PARTITION BY custid
ORDER BY orderdate, orderid
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS runningtotal
FROM Sales.OrderValues
)
SELECT *
FROM RunningTotals
WHERE runningtotal < 1000.00;


ROW_NUMBER, RANK, DENSE_RANK, and NTILE

The ROW_NUMBER function computes a unique sequential integer starting with 1 within
the window partition based on the window ordering.

RANK and DENSE_RANK differ from ROW_NUMBER in the sense that they assign the same
ranking value to all rows that share the same ordering value.

What are the clauses that the different types of window functions support?
Partitioning, ordering, and framing clauses.
What do the delimiters UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
represent?
The beginning and end of the partition, respectively.
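All four ranking functions side by side; ties in val make RANK and DENSE_RANK differ from ROW_NUMBER:

```sql
SELECT orderid, val,
  ROW_NUMBER() OVER(ORDER BY val) AS rownum,
  RANK()       OVER(ORDER BY val) AS rnk,
  DENSE_RANK() OVER(ORDER BY val) AS densernk,
  NTILE(10)    OVER(ORDER BY val) AS ntile10  -- 10 equally sized tiles
FROM Sales.OrderValues;
```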


Window Offset Functions
Window offset functions return an element from a single row that is at a given offset from the current row in the window partition, or from the first or last row in the window frame.

LAG, LEAD, FIRST_VALUE, and LAST_VALUE

SELECT custid, orderid, orderdate, val,
FIRST_VALUE(val) OVER(PARTITION BY custid
ORDER BY orderdate, orderid
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS first_val,
LAST_VALUE(val) OVER(PARTITION BY custid
ORDER BY orderdate, orderid
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING) AS last_val
FROM Sales.OrderValues;
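LAG and LEAD follow the same pattern but need no frame; they return NULL where no previous or next row exists:

```sql
SELECT custid, orderid, orderdate, val,
  LAG(val)  OVER(PARTITION BY custid
                 ORDER BY orderdate, orderid) AS prevval,
  LEAD(val) OVER(PARTITION BY custid
                 ORDER BY orderdate, orderid) AS nextval
FROM Sales.OrderValues;
```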

What is the default frame for window functions that support framing?
RANGE UNBOUNDED PRECEDING, which is equivalent to
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

What do the RANK and DENSE_RANK functions compute.
The RANK function returns one more than the number of rows that have a lower
ordering value than the current; the DENSE_RANK function returns one more than
the number of distinct ordering values that are lower than the current.
So, we find that RANK() skips ranking values after ties, whereas DENSE_RANK() keeps the ranking values consecutive.

Why are window functions allowed only in the SELECT and ORDER BY clauses of a
Query ?
Because they are supposed to operate on the underlying query’s result, which is
achieved when logical query processing gets to the SELECT phase.

03-NOV-2014

Full-text queries can search not only for words you provide in a query; they can search
for synonyms as well. SQL Server finds synonyms in thesaurus files. Each language has an
associated XML thesaurus file.

To create synonyms for the words searched, you can edit the thesaurus file.

SQL Server supports two very powerful predicates for limiting the result set of a query by using
full-text indexes. These two predicates are the CONTAINS and FREETEXT predicates.


04-NOV-2014

FOR XML can be used only in an outermost SELECT statement; as a subquery, it can also be used in INSERT, UPDATE, and DELETE statements, as well as in assignment statements.
In a FOR XML clause you can specify these modes.
1.     AUTO
2.     RAW
3.     EXPLICIT
4.     PATH


·         The AUTO mode with FOR XML generates nesting in the resulting XML based on the way the SELECT statement is specified.
·         The RAW mode with FOR XML generates a single <row> element per row returned by the SELECT statement.
·         The EXPLICIT mode gives more control on the shape of the XML that we need to generate. We can get attributes and elements together in deciding the shape of the XML. It requires a specific format for the resulting rowset that is generated because of query execution. This rowset format is then mapped into XML shape.
·         The PATH mode together with the nested FOR XML query capability provides the flexibility of the EXPLICIT mode in a simpler manner.


Examples: Players table

Id   FirstName      LastName    CITY
1    Sachin         Tendulkar   Mumbai
2    Saurav         Ganguli     Kolkata
3    Mahendrasinh   Dhoni       Chennai

SELECT * FROM Players FOR XML AUTO
<Players Id="1" FirstName="Sachin" LastName="Tendulkar" CITY="Mumbai" />
<Players Id="2" FirstName="Saurav" LastName="Ganguli" CITY="Kolkata" />
<Players Id="3" FirstName="Mahendrasinh" LastName="Dhoni" CITY="Chennai" />

SELECT * FROM Players FOR XML RAW('Cricketers')
<Cricketers Id="1" FirstName="Sachin" LastName="Tendulkar" CITY="Mumbai" />
<Cricketers Id="2" FirstName="Saurav" LastName="Ganguli" CITY="Kolkata" />
<Cricketers Id="3" FirstName="Mahendrasinh" LastName="Dhoni" CITY="Chennai" />

SELECT * FROM Players FOR XML PATH('Player')
<Player>
  <Id>1</Id>
  <FirstName>Sachin</FirstName>
  <LastName>Tendulkar</LastName>
  <CITY>Mumbai</CITY>
</Player>
<Player>
  <Id>2</Id>
  <FirstName>Saurav</FirstName>
  <LastName>Ganguli</LastName>
  <CITY>Kolkata</CITY>
</Player>
<Player>
  <Id>3</Id>
  <FirstName>Mahendrasinh</FirstName>
  <LastName>Dhoni</LastName>
  <CITY>Chennai</CITY>
</Player>


SELECT 
        1 AS Tag,
        NULL AS Parent,
        ID AS 'Player!1!Id',
        NULL AS 'Detail!2!FirstName',
        NULL AS 'Detail!2!LastName',
        NULL AS 'Detail!2!City'
FROM Players WHERE Id = 1
UNION ALL
SELECT 
        2 AS Tag,
        1 AS Parent,
        NULL,
        FirstName,
        LastName,
        City
FROM Players WHERE Id = 1
FOR XML EXPLICIT

<Player Id="1">
  <Detail FirstName="Sachin" LastName="Tendulkar" City="Mumbai" />
</Player>





The FOR XML AUTO option gives you nice XML documents with nested elements, and it is not
complicated to use. In AUTO and RAW modes, you can use the keyword ELEMENTS to produce
element-centric XML. The WITH NAMESPACES clause, preceding the SELECT part of the
query, defines namespaces and aliases in the returned XML.

A colon is used in XML to separate the namespace from the element name.

WITH XMLNAMESPACES('TK461-CustomersOrders' AS co)
SELECT [co:Customer].custid AS [co:custid],
       [co:Customer].companyname AS [co:companyname],
       [co:Order].orderid AS [co:orderid],
       [co:Order].orderdate AS [co:orderdate]
FROM Sales.Customers AS [co:Customer]
  INNER JOIN Sales.Orders AS [co:Order]
    ON [co:Customer].custid = [co:Order].custid
ORDER BY [co:Customer].custid, [co:Order].orderid
FOR XML AUTO, ELEMENTS, ROOT('CustomersOrders');

The FOR XML clause comes after the ORDER BY clause in a query.

In RAW and AUTO mode, you can also return the XSD schema of the document you are
creating. This schema is included inside the XML that is returned, before the actual XML data;
therefore, it is called inline schema. You return XSD with the XMLSCHEMA directive. This directive
accepts a parameter that defines a target namespace.

SELECT [Customer].custid AS [custid],[Customer].companyname AS [companyname],
[Order].orderid AS [orderid],[Order].orderdate AS [orderdate]
FROM Sales.Customers AS [Customer]
INNER JOIN Sales.Orders AS [Order]
ON [Customer].custid = [Order].custid
WHERE 1 = 2
FOR XML AUTO, ELEMENTS,
XMLSCHEMA('TK461-CustomersOrders');

XMLSCHEMA directive in the FOR XML clause

Format Return XML

Automatic format:
·         FOR XML AUTO automatically formats the XML returned.
·         FOR XML RAW automatically formats the XML returned.
Manually Format:
·          FOR XML PATH allows you to manually format the XML returned.
·         FOR XML EXPLICIT allows you to manually format the XML returned.



Shredding XML to Tables (convert xml to table)

The OPENXML function provides a rowset over in-memory XML documents by using
Document Object Model (DOM) presentation.

The OPENXML function accepts an optional third parameter, called flags, which allows you
to specify the mapping used between the XML data and the relational rowset.
1 means attribute-centric mapping: OPENXML (@DocHandle, '/CustomersOrders/Customer', 1)
2 means element-centric mapping: OPENXML (@DocHandle, '/CustomersOrders/Customer', 2)
3 (that is, 1+2) means both: OPENXML (@DocHandle, '/CustomersOrders/Customer', 3)


Flag value 8 can be combined with values 1 and 2 (1+2+8)

SELECT * FROM OPENXML (@DocHandle, '/CustomersOrders/Customer',11)
WITH (custid INT, companyname NVARCHAR(40));
-- Remove the DOM
EXEC sys.sp_xml_removedocument @DocHandle;

XQuery is a standard language for browsing XML instances and returning XML.


The real power of XQuery lies in its so-called FLWOR expressions. FLWOR is the acronym for
for, let, where, order by, and return. A FLWOR expression is essentially a for-each loop.

Which conditional expression is supported in XQuery? Answer: if..then..else.

The easiest way to create XML schemas is to create relational tables first, and then use the
XMLSCHEMA option of the FOR XML clause.
Store the resulting XML value (the schema) in a variable, and provide the variable as input to the CREATE XML SCHEMA COLLECTION statement.



What are the XML data type methods?
nodes(), exist(), value(), query(), and modify() are the XML data type methods.


What kind of XML indexes can you create?
1.     You create a PRIMARY XML index before any other XML indexes.
2.     A PATH XML index is especially useful if your queries specify path expressions.


Which XML data type method do you use to shred XML data to tabular format?
You use the nodes() method to shred XML data.
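A small sketch of exist(), nodes(), and value() together (the XML shape here is made up for illustration):

```sql
DECLARE @x AS XML = N'
<CustomersOrders>
  <Customer custid="1">
    <Order orderid="10692" />
    <Order orderid="10702" />
  </Customer>
</CustomersOrders>';

-- exist() returns 1 if the path matches at least one node
SELECT @x.exist('/CustomersOrders/Customer[@custid = 1]') AS customer_exists;

-- nodes() shreds the orders to rows; value() extracts a scalar from each node
SELECT T.c.value('@orderid', 'INT') AS orderid
FROM @x.nodes('/CustomersOrders/Customer/Order') AS T(c);
```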

07-NOV-2014

Database Schema and Table Schema

A database schema is a database-wide container of objects. A table schema is the definition of a table that
includes the CREATE TABLE statement with all the column definitions

·         The DATE, TIME, and DATETIME2 data types can store data more efficiently and with
better precision than DATETIME and SMALLDATETIME.
·         Use VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) instead of the deprecated
TEXT, NTEXT, and IMAGE data types.
·         Use ROWVERSION instead of the deprecated TIMESTAMP.
·         DECIMAL and NUMERIC are the same data type, but generally people prefer DECIMAL
because the name is a bit more descriptive. Use DECIMAL and NUMERIC instead
of FLOAT or REAL data types unless you really need floating-point precision and are
familiar with possible rounding issues.


Computed Columns:
CREATE TABLE Sales.OrderDetails
(
orderid INT NOT NULL,
unitprice MONEY NOT NULL, -- added so the computed column has its inputs
qty INT NOT NULL,
Initialcost AS unitprice * qty -- computed column
);

What types of table compression are available? You can use either page or row compression on a table. Page compression
includes row compression.


Default constraints are most useful when you have a column in a table that does not allow
NULL, but you don't want to prevent an INSERT from succeeding if it does not specify a value
for the column
CREATE TABLE Production.Products -- table name assumed from the constraint names
(
supplierid INT NOT NULL,
categoryid INT NOT NULL,
unitprice MONEY NOT NULL
  CONSTRAINT DFT_Products_unitprice DEFAULT(0),
discontinued BIT NOT NULL
  CONSTRAINT DFT_Products_discontinued DEFAULT(0)
);


What metadata tables give you a list of constraints in a database?
sys.key_constraints , sys.default_constraints , sys.foreign_keys



Views

CREATE VIEW Sales.OrderTotalsByYear
WITH SCHEMABINDING
AS
SELECT YEAR(O.orderdate) AS orderyear, SUM(OD.qty) AS qty
FROM Sales.Orders AS O
  INNER JOIN Sales.OrderDetails AS OD
    ON O.orderid = OD.orderid
GROUP BY YEAR(O.orderdate);
-- (FROM and GROUP BY completed so the fragment is a valid view definition)

WITH ENCRYPTION (with encryption of view code)
WITH SCHEMABINDING (which guarantees that the underlying table structures cannot be altered without dropping the view)
WITH VIEW_METADATA (returns the metadata of the view instead of the base table.)


If the tables are spread across multiple SQL Server instances, the view is
called a distributed partitioned view

Inline functions can be used to simulate parameterized views. T-SQL views cannot take
parameters. However, an inline table-valued function can return the same data as a
view and can accept parameters that can filter the results.
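A sketch of such a "parameterized view" (the function name is made up):

```sql
CREATE FUNCTION Sales.GetCustOrders (@custid AS INT) RETURNS TABLE
AS
RETURN
  SELECT orderid, orderdate
  FROM Sales.Orders
  WHERE custid = @custid;  -- the parameter filters in a way a view cannot
GO

SELECT orderid, orderdate
FROM Sales.GetCustOrders(1) AS O;
```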


A view can contain an ORDER BY clause only if TOP or OFFSET-FETCH is also used,
but even then no actual sorting of the outer query's results is guaranteed.


A synonym is a name that refers to another database object such as a table, view, function,
or stored procedure.

Does a synonym store T-SQL or any data? No, a synonym is just a name. All that is stored with a synonym is the object it
refers to.
Can synonyms be altered? No, to change a synonym, you must drop and recreate it.

CREATE SYNONYM Reports.TotalCustQtyByMonth FOR Sales.CustOrders;

CREATE SYNONYM Reports.TotalCustQtyByMonth FOR TSQL2012.Sales.CustOrders;
GO
SELECT custid, ordermonth, qty FROM Reports.TotalCustQtyByMonth; -- Succeeds
GO

Synonyms can be used to provide an abstraction layer to the user by presenting different
names for database objects.

Synonyms can refer to stored procedures.
Synonyms can refer to temporary tables.

Synonyms cannot refer to indexes; indexes are not database objects
that are scoped by schema names.
Database users are not database objects that are scoped by schema
names.

You can create a synonym that refers to a nonexistent object. In order to
use the synonym, however, you must ensure that the object exists.

08-NOV-2014

INSERT VALUES, INSERT SELECT, INSERT EXEC, SELECT INTO
Use the INSERT EXEC statement to insert the result of queries in a dynamic
batch or a stored procedure into the target table.

If the column has an IDENTITY property, you must normally skip it in
the INSERT statement and let the property assign the value. To provide your own
value, you need to turn on the IDENTITY_INSERT option, but that’s not what happens
normally.

SELECT INTO doesn’t copy indexes
SELECT INTO doesn’t copy constraints.
SELECT INTO doesn’t copy triggers.


SELECT INTO locks both data and metadata, and therefore can cause
blocking related to both.
If the CREATE TABLE and INSERT SELECT are executed in
different transactions, you hold locks on metadata only for a very short period.

An UPDATE with a variable can both modify a column value and collect
the result into a variable using one visit to the row

The join can be used to filter the updated rows.

The join gives you access to information in other tables that can be used
in the source expressions for the assignments

You cannot update multiple tables in one UPDATE statement.

With an UPDATE based on table expressions, you can invoke a window
function in the inner query’s SELECT list. You can then refer to the alias you assigned
to the result column in the outer UPDATE statement’s SET clause

An UPDATE based on a join cannot refer to window functions in the SET
Clause

Difference between DELETE and TRUNCATE

The DELETE statement writes significantly more to the transaction log compared to the
TRUNCATE statement. For DELETE, SQL Server records in the log the actual data that
was deleted. For TRUNCATE, SQL Server records information only about which pages
were deallocated. As a result, the TRUNCATE statement tends to be substantially faster.


The DELETE statement doesn’t attempt to reset an identity property

The DELETE statement is supported if there’s a foreign key pointing to the table in
question as long as there are no related rows in the referencing table. TRUNCATE is not
allowed if a foreign key is pointing to the table—even if there are no related rows in
the referencing table, and even if the foreign key is disabled.

The DELETE statement is allowed against a table involved in an indexed view. A TRUNCATE
statement is disallowed in such a case.

The DELETE statement requires DELETE permissions on the target table. The TRUNCATE
statement requires ALTER permissions on the target table


The TRUNCATE statement resets an IDENTITY property.
The TRUNCATE statement is disallowed when a foreign key pointing to
the table exists.
The TRUNCATE statement is disallowed when an indexed view based on
the table exists.


Using the IDENTITY Column Property
IDENTITY is a property of a column in a table. The property automatically assigns a value to
the column upon insertion. You can define it for columns with any numeric type that has a
scale of 0. This means all integer types, but also NUMERIC/DECIMAL with a scale of 0. When
defining the property, you can optionally specify a seed and an increment.


SQL Server provides two features to help you generate a sequence of keys: the IDENTITY
column property and the sequence object

To get the newly generated identity value, you can query these functions:
SCOPE_IDENTITY() returns the last identity value generated by your session and scope.
@@IDENTITY returns the last identity value generated by your session, in any scope.
IDENT_CURRENT('table') returns the last identity value generated in the input table, by any session.

Sequence object available from SQL Server 2012.

advantages of using a sequence object instead of IDENTITY

1.     One of the advantages of using a sequence object instead of IDENTITY is
that you can attach a DEFAULT constraint that has a call to the NEXT VALUE FOR
function to an existing column, or remove such a constraint from a column

2.     You can generate a new sequence value before using it by assigning the
value to a variable and later using the variable in an INSERT statement. This cannot
be done with IDENTITY.

3.     You can specify your own value for a column that has an IDENTITY
property, but this requires turning on the session option IDENTITY_INSERT, which
in turn requires special permissions. The sequence object is more flexible. You can
insert your own values into a column that normally gets its value from a sequence
object. And that’s without needing to turn on any special options and without
needing special permissions.
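A sketch of points 1 and 2 (the sequence name is made up):

```sql
CREATE SEQUENCE dbo.SeqOrderIDs AS INT
  START WITH 1
  INCREMENT BY 1;

-- Generate the value before using it (impossible with IDENTITY)
DECLARE @neworderid AS INT = NEXT VALUE FOR dbo.SeqOrderIDs;
SELECT @neworderid AS neworderid;
```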

What is the difference between a primary key and a unique key?

Ans: A primary key does not allow NULLs, but a unique key allows one NULL.
A primary key creates a clustered index on the column by default, whereas a unique key creates a nonclustered index by default.



Three Statements
1.     Sequence Object and Identity Columns
2.     MERGE
3.     OUTPUT Statement for Inserted and Deleted

With the MERGE statement, you can merge data from a source table or table expression into a target table.

Simple Merge Statement with Single Table
MERGE INTO Sales.MyOrders WITH (HOLDLOCK) AS TGT
USING (VALUES(@orderid, @custid, @empid, @orderdate))
       AS SRC(orderid, custid, empid, orderdate)
  ON SRC.orderid = TGT.orderid
WHEN MATCHED THEN
  UPDATE SET TGT.custid = SRC.custid,
             TGT.empid = SRC.empid,
             TGT.orderdate = SRC.orderdate
WHEN NOT MATCHED THEN
  INSERT VALUES(SRC.orderid, SRC.custid, SRC.empid, SRC.orderdate);

Simple Merge Statement with Two Tables one is Source and One is Target
MERGE INTO Sales.MyOrders AS TargetTable
USING Sales.Orders AS SourceTable
  ON SourceTable.orderid = TargetTable.orderid
     AND SourceTable.shipcountry = N'Norway'
WHEN MATCHED AND (TargetTable.custid <> SourceTable.custid
                  OR TargetTable.empid <> SourceTable.empid
                  OR TargetTable.orderdate <> SourceTable.orderdate) THEN
  UPDATE SET TargetTable.custid = SourceTable.custid,
             TargetTable.empid = SourceTable.empid,
             TargetTable.orderdate = SourceTable.orderdate
WHEN NOT MATCHED THEN
  INSERT VALUES(SourceTable.orderid, SourceTable.custid, SourceTable.empid, SourceTable.orderdate);

MERGE is very useful, especially when it comes to loading data warehouse tables, which can be very large and require specific actions to be taken when rows are or are not present.

CREATE TABLE dbo.tbl_Source (id INT, name NVARCHAR(100), qty INT);
CREATE TABLE dbo.tbl_Target (id INT, name NVARCHAR(100), qty INT);
 
--Synchronize source data with target
MERGE INTO dbo.tbl_Target AS t
    USING dbo.tbl_Source AS s    
        ON t.id = s.id
    WHEN MATCHED AND (t.name != s.name OR t.qty!= s.qty) THEN
        --Row exists and data is different
        UPDATE SET t.name = s.name, t.qty = s.qty
    WHEN NOT MATCHED THEN 
        --Row exists in source but not in target
        INSERT VALUES (s.id, s.name, s.qty) 
    WHEN NOT MATCHED BY SOURCE THEN 
        --Row exists in target but not in source
        DELETE
    OUTPUT $action, inserted.id, deleted.id;

The OUTPUT clause can be used with INSERT, UPDATE, DELETE, or MERGE to return the actual rows affected by these statements.
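A short sketch of OUTPUT outside of MERGE, reusing the tbl_Target table from the example above (the filter values are illustrative):

```sql
-- DELETE: the deleted virtual table exposes the removed rows.
DELETE FROM dbo.tbl_Target
OUTPUT deleted.id, deleted.name, deleted.qty
WHERE qty = 0;

-- INSERT: the inserted virtual table exposes the new rows.
INSERT INTO dbo.tbl_Target(id, name, qty)
OUTPUT inserted.id
VALUES (1, N'Widget', 10);
```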

Using EXEC() instead of sp_executesql
When using dynamic SQL, you have two choices: EXEC() / EXECUTE(), or sp_executesql. Early in my career, I used EXEC() a lot because it was much easier. I've since learned that sp_executesql is a lot better, for these main reasons:
1.         sp_executesql allows statements to be parameterized, so you keep passing strongly typed variables through the whole chain.
o    Therefore it is more secure than EXEC() in terms of SQL injection.
2.         sp_executesql can leverage cached query plans.
o    The T-SQL string is built only once; after that, every time the same query is called with sp_executesql, SQL Server retrieves the query plan from the cache and reuses it.
3.         Temp tables created inside EXEC() cannot use the temp-table caching mechanism.
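A hedged sketch of the parameterized form (the query and variable names are illustrative): the parameter stays strongly typed end to end, with no string concatenation of user input.

```sql
DECLARE @custid INT = 7;
DECLARE @sql NVARCHAR(200) =
    N'SELECT orderid, orderdate FROM Sales.Orders WHERE custid = @custid;';

-- The same string is reused for every @custid value,
-- so the cached plan can be reused as well.
EXEC sys.sp_executesql
    @stmt   = @sql,
    @params = N'@custid INT',
    @custid = @custid;
```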
What is an Index
An index is a database object that can be created on one or more columns (a maximum of 16 columns per key). When the index is created, SQL Server reads the column(s) and forms a data structure that minimizes the number of data comparisons. An index improves the performance of data retrieval but adds overhead to data modifications such as inserts, deletes, and updates. So whether an index pays off depends on how much data retrieval is performed against the table versus how many DML (INSERT, DELETE, and UPDATE) operations run against it.
SQL Server internally organizes data in a data file in pages.
A page is an 8 KB unit and belongs to a single object; for example, to a table or an index.
A page is the smallest unit of reading and writing. Pages are further organized into extents.
An extent consists of eight consecutive pages.

Pages are physical structures.
SQL Server organizes data in pages in logical structures.
SQL Server organizes tables as heaps or as balanced trees.
A table organized as a balanced tree is also known as a clustered table or a clustered index
Indexes are always organized as balanced trees.
Other indexes, which do not contain all of the data and instead serve as pointers to table rows for quick seeks, are called nonclustered indexes.

Heaps
A heap is quite a simple structure: data in a heap is not organized in any logical order.
A heap is just a bunch of pages and extents.

Index Allocation Map (IAM) pages
SQL Server traces which pages and extents belong to an object through special system
pages called Index Allocation Map (IAM) pages. Every table or index has at least one IAM
page, called first IAM. A single IAM page can point to approximately 4 GB of space. Large
objects can have more than one IAM page. IAM pages for an object are organized as a doubly
linked list; each page has a pointer to the previous and the next page.
SQL Server stores pointers to the first IAM pages in its own internal system tables. SQL Server uses IAM pages to scan heaps in physical (allocation) order.

If you do not create a clustered index explicitly or implicitly through primary key or unique
constraints, then a table is organized as a heap. SQL Server does not allocate any pages for
a table when you create it. It allocates the first page, and also the first IAM page, when you
insert the first row in the table. You can find general information about tables and indexes in
the sys.indexes catalog view.
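To see how a table is organized, you can query sys.indexes as mentioned above; a quick sketch (the table name is illustrative):

```sql
-- index_id 0 = heap, 1 = clustered index, > 1 = nonclustered index.
SELECT OBJECT_NAME(object_id) AS table_name,
       name                   AS index_name,
       type_desc
FROM sys.indexes
WHERE OBJECT_NAME(object_id) = N'MyOrders';
```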
Clustered Index
You organize a table as a balanced tree when you create a clustered index.
The structure resembles an inverted tree; it is called balanced because all leaf pages are at the same depth.
Every balanced tree has a single root page and at least one or more leaf pages.
In addition, it can have zero or more intermediate levels.
All data in a clustered table is stored in leaf pages.
Data is stored in logical order of the clustering key.
A clustering key can consist of a single column, or of multiple columns.
If the key consists of multiple columns, then this is a composite key.
SQL Server still uses IAM pages to follow the physical allocation.

The primary key created on the StudId column creates a clustered index on that column. A table can have only one clustered index. When creating the clustered index, SQL Server reads the StudId column and builds a tree on it (illustrated here as a simple binary search tree). This tree information is then stored separately on disk. Expand the table Student and then expand Indexes; you will see the index that was created for you when the primary key was defined. With the tree in place, a search for a student by StudId needs far fewer comparisons. Let us assume that you had entered some sample data in the Student table.
The index forms the binary tree specified below. Note that a given parent has at most two children. The left side always has a lesser value and the right side always has a greater value compared to the parent. (The tree can also be constructed the reverse way: left side higher, right side lower.)
Non Clustered Index
Nonclustered indexes are balanced tree structures that start from a root index node and include intermediate index nodes and leaf nodes.

Nonclustered indexes have a structure very similar to clustered ones: the root and the intermediate levels look the same as in a clustered index.
The leaf level is different because it does not hold all of the data; it contains the index keys and row locators that point into the base table, which is itself organized either as a heap or as a balanced tree.
You can have up to 999 nonclustered indexes on a single table.
You can have up to 16 columns in an index key.

A non-clustered index is useful for columns that have some repeated values. Say for example, AccountType column of a bank database may have 10 million rows. But, the distinct values of account type may be 10-15. A clustered index is automatically created when we create the primary key for the table. We need to take care of the creation of the non-clustered index.
Follow the steps below to create a Non-clustered index on our table Student based on the column class.
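If you prefer T-SQL over the SSMS steps, an equivalent sketch (the table and column names follow the Student example; the dbo schema is assumed):

```sql
-- Create a nonclustered index on the class column of Student.
CREATE NONCLUSTERED INDEX IX_Student_Class
    ON dbo.Student (class);
```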

A table can have more than one nonclustered index, but only one clustered index, which works on the balanced-tree concept. On a clustered table, every nonclustered index depends on the clustered index: its row locators are the clustering keys.

This can be easily explained with the concept of a book and its index pages at the end. Assume that you go to a bookshop and find a big 1,500-page C# book that claims to cover all of C#. At a glance, the book has beautiful color pages and shiny paper. But that is not the only mark of a good book, right? Once you are impressed, you want to see how your favorite topic, regular expressions, is explained in the book. What do you do? I peeped at you from behind and recorded what you did, as below:

You went to the index pages (25 pages in total). They are already sorted, so you easily found Regular Expressions, listed on index page 17.
Next, you noted down the numbers displayed next to it: 407, 816, 1200-1220.
The index pages act like a nonclustered index, and the page numbers act like a clustered index arranged in a tree. See how quickly you reached page 407: your mind effectively traversed the tree left and right to get there.


RID lookup

In order to seek a row, SQL Server needs to traverse the index to the leaf level, and
then read the appropriate page from the heap and retrieve the row from the page.
The operation of retrieving the row from the heap is called a RID lookup.

Because there is an index on the column in the WHERE clause, the SQL Server query execution engine uses the nonclustered index to retrieve data from the table.

However, the columns used in the SELECT clause are still not part of the index, and to display those columns, the engine will have to go to the base table again and retrieve those columns.
This particular behavior is known as bookmark lookup or key lookup.

The clustering key should be short and unique because it appears in all nonclustered indexes.
SQL Server 2012 can store index data column by column in what's called a columnstore index.
Columnstore indexes can speed up data warehousing queries by a large factor, from 10 to even 100 times.


The Key Lookup operator is used when SQL Server performs a nonclustered index seek to find a row,
but then also needs data from the underlying table, which is organized as a clustered index.

If a query requires sorting, the data needs to be sorted in memory, or must be spilled to tempdb if it does not fit in memory.


SQL Server processes a set-based solution, even if internally the execution plan for the query involves iterations.
As a result, if you know how to tune queries, you are often able to achieve much better performance compared to using iterative solutions.

It should be noted that there are exceptional cases where iterative solutions perform better than set-based ones—even with all of the extra overhead for the row-by-row operations.

What are the commands that are required to work with a cursor?
DECLARE, OPEN, FETCH in a loop, CLOSE, and DEALLOCATE.
What does the FAST_FORWARD cursor option mean?
That the cursor is read-only and forward-only.
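The five commands above can be sketched as a minimal skeleton (the query and variable names are illustrative):

```sql
DECLARE @orderid INT;

DECLARE C CURSOR FAST_FORWARD FOR    -- read-only, forward-only
    SELECT orderid FROM Sales.Orders;

OPEN C;
FETCH NEXT FROM C INTO @orderid;

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @orderid;                  -- per-row work goes here
    FETCH NEXT FROM C INTO @orderid;
END;

CLOSE C;
DEALLOCATE C;
```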

Covering index
A covering index is a non-clustered index which includes all columns referenced in the query.
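A sketch of a covering index built with INCLUDE (the key and included columns are illustrative); the query below can then be answered from the index alone, with no key lookup:

```sql
CREATE NONCLUSTERED INDEX IX_Orders_custid_covering
    ON Sales.Orders (custid)
    INCLUDE (orderdate, empid);

-- All referenced columns are in the index, so it covers this query.
SELECT custid, orderdate, empid
FROM Sales.Orders
WHERE custid = 7;
```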



Set Based Solution
Set-based solutions are based on principles from the relational model,
and this model is the foundation of SQL (the standard language) and T-SQL (the
dialect in SQL Server)
Because set-based solutions are declarative and iterative solutions are
imperative, set-based solutions tend to involve less code


Alternative to a Cursor
The approach with the TOP option does give you a correct alternative to
a cursor. However, keep in mind that it is more I/O-intensive.
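One way to sketch the TOP-based alternative (the table and key names are illustrative): repeatedly fetch the next key greater than the last one processed.

```sql
DECLARE @orderid INT = 0;

WHILE 1 = 1
BEGIN
    -- Fetch the next row in key order.
    SELECT TOP (1) @orderid = orderid
    FROM Sales.Orders
    WHERE orderid > @orderid
    ORDER BY orderid;

    IF @@ROWCOUNT = 0 BREAK;  -- no more rows

    PRINT @orderid;           -- per-row work goes here
END;
```

Each iteration issues its own seek, which is why this approach tends to be more I/O-intensive than a cursor.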

Table Variable.
Table variables are declared, as opposed to being created. They are named with the at sign
(@) as a prefix; for example, @T1. They are visible only to the batch that declared them and
are destroyed automatically at the end of the batch. They are not visible across batches in the
same level, and are also not visible to inner levels in the call stack.


Temporary tables are created in tempdb in the dbo schema.

Avoid named constraints on temporary tables; for example, CONSTRAINT PK_#T1 PRIMARY KEY(col1) can cause a name collision in tempdb when two sessions create the same temporary table.
Instead, define the constraints without names: PRIMARY KEY(col1), UNIQUE(col2), and so on.
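A sketch of a temporary table with unnamed constraints (the column names are illustrative), which avoids constraint-name collisions in tempdb across sessions:

```sql
CREATE TABLE #T1
(
    col1 INT NOT NULL PRIMARY KEY,  -- unnamed primary key constraint
    col2 INT NOT NULL UNIQUE        -- unnamed unique constraint
);
```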

A common misconception is that only temporary tables have a physical representation
in tempdb and that table variables reside only in memory. This isn't true: both temporary
tables and table variables have a physical representation in tempdb.

Interaction with transactions

Like a permanent table, changes applied to a temporary table are undone if the transaction rolls back; changes applied to a table variable are not undone by a rollback.

Difference between Temporary Table and Table Variable
SQL Server maintains distribution statistics (histograms) for temporary
tables but not for table variables.
Temporary tables and table variables differ in a number of ways, including scope, DDL
and indexing, interaction with transactions, and distribution statistics.


The following code demonstrates that table variables are not visible even across batches in
the same level.
DECLARE @T1 AS TABLE
(
col1 INT NOT NULL
);
INSERT INTO @T1(col1) VALUES(10);
GO
SELECT col1 FROM @T1;
GO

The SELECT fails with an error: GO starts a new batch, and the table variable declared in the previous batch is out of scope.


Difference between SET based and Iterative processing

Iterative refers to a processing methodology which focuses on one record at a time whereas set based refers to working with the entire record set at a time.

Examples of Iterative processing are CURSORS and WHILE loops.
Whereas examples of SET based are straightforward JOINs.
Iterative processing is quite common outside of a relational database system; think arrays and loops.

Nonrelational software systems typically have no set-based functionality, hence the widespread adoption of relational systems such as SQL Server, Oracle, and DB2. And I think that this is where the problem comes in.



26-Nov-14 ---------------------------------------------------------------------------


If a table is organized as a heap, then the only access method available to SQL Server is a table scan.
The scan is performed in no specific logical order; SQL Server uses Index Allocation
Map (IAM) pages to do the scan in physical allocation order.
SQL Server can use the allocation order scan when a table is clustered as well.
An allocation order scan is faster if a table is less physically fragmented; the scan is slower if the physical fragmentation is higher.
Allocation order scans are not affected by the logical fragmentation.
When SQL Server scans a clustered index, it can also scan in the logical order of the index by using the index order scan.
SQL Server uses the index leaf–level’s linked list to perform an index
order scan.


Join Algorithms
When performing joins, SQL Server uses different algorithms.
SQL Server supports three basic algorithms: nested loops, merge joins, and hash joins.
A hash join can be further optimized by using bitmap filtering;
a bitmap filtered hash join can be treated either as a fourth algorithm or as an enhancement of the third, the hash algorithm.


The nested loops algorithm is a very simple and, in many cases, efficient algorithm. SQL
Server uses one table for the outer loop. For each row in this outer input, SQL Server seeks for matching rows in the second table
example
SELECT o.OrderDate
FROM  Sales.SalesOrderHeader AS O
INNER JOIN Sales.SalesOrderDetail AS OD
ON O.SalesOrderID = OD.SalesOrderID
WHERE OD.SalesOrderID=48980
Only a nested loops join algorithm supports non-equijoins


Merge join is a very efficient join algorithm. However, it has its own limitations. It needs at
least one equijoin predicate and sorted inputs from both sides.

SELECT O.custid, O.orderdate, OD.orderid, OD.productid, OD.qty
FROM Sales.Orders AS O
INNER JOIN Sales.OrderDetails AS OD
ON O.orderid = OD.orderid;


The hash join algorithm uses a searching structure named a hash table.
This is not a searching structure you can build yourself, like the balanced trees used for indexes;
SQL Server builds the hash table internally.
It uses a hash function to split the rows from the smaller input into buckets.
After the hash table is built, SQL Server applies the hash function on each of the rows from the other input.
It checks to see into which bucket the row fits. Then it scans through all rows from the bucket.
This phase is called the probe phase.

In single-threaded mode, a hash join is usually slower than merge and nested loops joins that are supported by
existing indexes.

SQL Server can split rows from the probe input in advance, and perform partial joins in multiple threads.
The hash join is actually very scalable. This kind of optimization of a hash join is called a bitmap filtered hash join.

SQL Server uses two different algorithms for calculating aggregations.
If an input is ordered by the columns used in the GROUP BY clause, then SQL Server uses the stream aggregation
algorithm, which is implemented in the Stream Aggregate operator.
Stream aggregation is very efficient.
SQL Server might even decide to sort the input before performing the aggregation in order to make it possible to use the Stream Aggregate operator.

Stream aggregation might be faster than the hash aggregation.


Index Scan:
Since a scan touches every row in the table whether or not it qualifies, the cost is proportional to the total number of rows in the table. Thus, a scan is an efficient strategy if the table is small or if most of the rows qualify for the predicate.

Index Seek:
Since a seek only touches rows that qualify and pages that contain these qualifying rows, the cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table.

In general, the query optimizer tries to use an index seek, which means the optimizer has found a useful index for retrieving the record set.







